From ChatGPT Experiments to Production AI: What We Learned Integrating AI into Consultancy Work

13 February 2026 · 7 min read · 640 words

AI Implementation · Professional Services · Production AI

We started with a question: could AI make our consulting work better? After months of continuous trialling and refinement, the answer is nuanced — and more interesting than we expected.

The distance between AI hype and AI utility in professional services is measured in disciplined experimentation. Here's what that journey actually looks like.

Phase 1: The ChatGPT Honeymoon

Like most professional services firms in 2023, we started with ChatGPT. The first outputs were impressive, until they weren't. Generic summaries of complex documents. Plausible-sounding analysis that missed critical context. Confident assertions that were subtly wrong.

The problem wasn't the technology. It was our expectations. We were treating a general-purpose tool as a domain specialist.

Phase 2: Pattern Recognition

The second phase taught us something important: AI excels at synthesis, but fails at judgement.

Give it 500 pages of due diligence documents and it can identify every reference to revenue recognition across the entire set. Ask it whether the revenue recognition approach is appropriate for that specific industry vertical, and it guesses. Sometimes well, sometimes catastrophically.

This distinction — synthesis versus judgement — became our operating principle. AI handles the breadth. Experts handle the depth.
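
One way to picture the split, as a minimal sketch rather than a description of our actual tooling: a mechanical pass collects every passage that mentions a term, and each hit is handed to an expert with the judgement field left empty. The names here (`Reference`, `find_references`, `expert_assessment`) and the sample pages are hypothetical, and a plain keyword match stands in for whatever retrieval a real system would use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    document: str
    page: int
    excerpt: str
    # Deliberately left empty: appropriateness is an expert call, not a model output.
    expert_assessment: Optional[str] = None


def find_references(pages: dict[tuple[str, int], str], term: str) -> list[Reference]:
    """Breadth: return every page that mentions `term` (case-insensitive)."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for (document, page), text in sorted(pages.items()):
        match = pattern.search(text)
        if match:
            # Keep a short window of context around the first mention on the page.
            start = max(match.start() - 40, 0)
            excerpt = text[start:match.end() + 40].strip()
            hits.append(Reference(document, page, excerpt))
    return hits


# Illustrative data only; not taken from a real engagement.
pages = {
    ("spa.pdf", 12): "Revenue recognition follows delivery of each milestone.",
    ("accounts.pdf", 3): "The policy on revenue recognition changed in the prior year.",
    ("accounts.pdf", 4): "Headcount grew across all regions.",
}

references = find_references(pages, "revenue recognition")
```

The search finds the two pages that mention the term and says nothing about whether the policy is appropriate. That question stays with whoever fills in `expert_assessment`.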

Phase 3: The Tuning Revelation

This is where most firms get stuck, because this is where the work stops being exciting and starts being difficult.

Small configuration changes produce massive accuracy shifts. Temperature settings that work for creative scenario analysis fail for precise numerical outputs. Retrieval strategies need different weighting for regulatory guidance versus case precedent. Prompt architectures that excel at structured data analysis collapse when handling unstructured interview transcripts.

The hidden complexity of AI tuning deserves its own treatment — but the short version is that tuning is neither intuitive nor one-time work. It's continuous refinement. And it's where the real competitive advantage forms.
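
To make this concrete, here is a minimal sketch of keeping a separate configuration profile per task type instead of one global setting. Every name and number in it (`TaskProfile`, the task keys, the temperature values, the retrieval weights) is an illustrative assumption, not a recommended setting; workable values only come out of the continuous refinement described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    temperature: float                   # sampling temperature passed to the model
    retrieval_weights: dict[str, float]  # relative weight per source type


# Hypothetical profiles; the values are placeholders to be tuned per task.
PROFILES = {
    "scenario_analysis": TaskProfile(
        temperature=0.8,
        retrieval_weights={"regulatory_guidance": 0.3, "case_precedent": 0.7},
    ),
    "numerical_extraction": TaskProfile(
        temperature=0.0,
        retrieval_weights={"regulatory_guidance": 0.7, "case_precedent": 0.3},
    ),
}


def profile_for(task: str) -> TaskProfile:
    """Fail loudly on an unknown task instead of falling back to a default."""
    try:
        return PROFILES[task]
    except KeyError:
        raise ValueError(f"no tuned profile for task {task!r}") from None
```

Raising on an unknown task is a deliberate choice in this sketch: a silent default would let an untuned configuration reach users unnoticed.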

Phase 4: Mature Integration

Production AI doesn't look like the demos. It's less dramatic and more useful.

A financial advisory firm uses AI to synthesise thousands of pages of due diligence documents in hours, but every material finding goes through expert review. A pharmaceutical consultancy cross-references regulatory submissions across jurisdictions, but specialist interpretation determines the compliance implications. An assessment firm identifies patterns across interview transcripts, but experienced professionals make the judgement calls.

The expert isn't removed from the process. The expert is freed from the mechanical parts of the process.
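
A minimal sketch of that review gate, under stated assumptions: each finding carries a `material` flag, and nothing material is released without a named reviewer. `Finding`, `releasable` and the sample findings are hypothetical, for illustration only, and do not describe any particular firm's system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    summary: str
    material: bool
    reviewed_by: Optional[str] = None  # set only when an expert signs off


def releasable(findings: list[Finding]) -> list[Finding]:
    """Return the findings that may go into a deliverable.

    Material findings need an expert sign-off; non-material ones pass through.
    """
    return [f for f in findings if not f.material or f.reviewed_by is not None]


# Illustrative data only.
findings = [
    Finding("Revenue recognised before delivery in two contracts", material=True),
    Finding("Inconsistent date formats in appendix", material=False),
    Finding("Undisclosed related-party loan", material=True, reviewed_by="lead partner"),
]

released = releasable(findings)
held_back = [f for f in findings if f not in released]
```

Here the unreviewed material finding is held back, while the reviewed one and the non-material one are released.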

The Two Discoveries

Months of production deployment revealed two things that weren't obvious at the start:

Discovery 1: Custom tooling beats model access. Everyone can subscribe to Claude or GPT-4. The two structural moats in professional services AI are private data and custom tooling — not which model you can access.

Discovery 2: We need human oversight more, not less. The better AI gets at synthesis, the more critical expert judgement becomes. AI-generated analysis that's 95% correct is more dangerous than obviously wrong output, because it passes casual review. This false confidence is often behind why most AI projects fail.
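
A back-of-the-envelope illustration of why 95% is not reassuring. It rests on a simplifying assumption of ours, not a measured result: a report contains n independent claims, each correct with probability 0.95. The chance that at least one claim is wrong is then 1 - 0.95^n.

```python
def chance_of_at_least_one_error(claims: int, per_claim_accuracy: float = 0.95) -> float:
    """Probability that a report with `claims` independent claims contains an error."""
    return 1 - per_claim_accuracy ** claims


# Print the probability for a few report sizes.
for n in (1, 10, 20, 50):
    print(n, round(chance_of_at_least_one_error(n), 2))
```

Under these assumptions, a report with 20 claims has roughly a 64% chance of containing at least one error, and nothing in the output marks which claim it is.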

What We'd Tell Ourselves at the Start

If we were beginning again, three things:

First, skip the pilot mindset. Pilots are designed to prove a concept. Production deployment is designed to prove value. These require fundamentally different approaches, different metrics, and different timelines.

Second, invest in tuning before scaling. A poorly tuned AI system that reaches more users just amplifies its mistakes. Get the configuration right with a focused team before expanding.

Third, measure what matters. Not "hours saved" but "quality maintained or improved." Not "documents processed" but "insights surfaced that experts would have missed."

The journey from ChatGPT experiments to production AI isn't linear and it isn't quick. But for firms willing to do the disciplined work, the results are transformative. Assess your readiness — the path is clearer than the hype suggests.
