Financial Consultancy, Enterprise Transformation Results
Executive Summary
We built a production agentic workflow that automates complex financial report generation from specialist spreadsheets to presentation-ready deliverables. Seven coordinated AI agents process 53 data tables through 68 individually crafted prompts, generating 230 pieces of narrative content and assembling 82-slide presentations — with every output traceable to its source.
The practical impact: in the company we're working with, about half the team spends 40% of their time on admin and PowerPoint compilation. The automated pipeline now runs in about an hour. How long does human review take? We don't know yet; it's new, and we'll update this in a month once we have real data.
The Challenge
"The MP produces a DDA for the RA." If that sentence means nothing to you, you're in good company. But in our client's world, it's Tuesday. Every industry has its own dialect — acronyms stacked on acronyms, meaning compressed into shorthand that only insiders can parse.
The proprietary process sits entirely in Excel. Activity-based accounting methodology, industry benchmarks accumulated over decades, complex pivot tables — this is the intellectual property of consultants who are leaders in their field. We didn't replace any of it. We built the automation layer that takes the outputs of their expertise and transforms them into the final product.
Why It Failed Before
A year ago, we built this same workflow with GPT-3.5 and Bubble.io. The orchestration worked — we could wire stages together, move data between steps. But the LLM couldn't do the job. It couldn't cross-reference across documents. It couldn't handle the domain jargon. It couldn't maintain consistency across hundreds of prompts.
Fast forward to today. We ran a benchmark: could Claude 3.5 on claude.ai handle this workflow? We spent four hours before calling it. The model itself was capable — the raw intelligence wasn't the problem. But the tooling collapsed under the weight of the task. Context limits meant we couldn't hold enough workflow in a single session. No filesystem access. No persistent memory between steps.
Complex, multi-stage, domain-specific agentic workflows need purpose-built infrastructure. They need filesystem integration, persistent context, orchestration logic. General-purpose chat interfaces aren't designed for that. Not yet.
What We Built
The deliverable is a compiled financial assessment report: 82 slides, presentation-ready, with charts, tables, benchmarks, and written narrative.
The pipeline works in stages with managed dependencies.
Every dependency is explicit. You can't generate an executive summary before section summaries exist. You can't write narrative before data is extracted and validated.
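To make "managed dependencies" concrete, here is a minimal sketch in Python of how stage ordering can be declared and enforced. The stage names are ours for illustration, not the production pipeline's; `graphlib.TopologicalSorter` from the standard library yields each stage only after everything it depends on has completed.

```python
# A stage may only run once every stage it depends on has finished.
# Stage names are illustrative, not the production pipeline's.
from graphlib import TopologicalSorter

STAGES = {
    "extract_tables": set(),
    "validate_data": {"extract_tables"},
    "section_narrative": {"validate_data"},
    "section_summaries": {"section_narrative"},
    "executive_summary": {"section_summaries"},
    "assemble_slides": {"executive_summary", "section_narrative"},
}

# static_order() raises CycleError on circular dependencies and never
# yields a stage before its prerequisites, so an executive summary
# cannot be generated before the section summaries exist.
for stage in TopologicalSorter(STAGES).static_order():
    print("run:", stage)
```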
Every piece of generated content carries full provenance. Which source document. Which data point. How it was processed, and by which step in the pipeline. The audit trail is a first-class deliverable.
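As a sketch of what a provenance record can look like in practice (the field names below are ours, not the client's), something equivalent travels with every generated artefact:

```python
# Illustrative provenance record attached to every generated artefact.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Provenance:
    source_document: str      # workbook the value came from
    source_ref: str           # sheet and cell range it was read from
    pipeline_step: str        # stage that produced or transformed it
    prompt_id: Optional[str]  # prompt that wrote the surrounding narrative, if any

figure = Provenance(
    source_document="client_model.xlsx",
    source_ref="Benchmarks!C12",
    pipeline_step="extract_tables",
    prompt_id=None,  # raw extraction, no LLM involved
)
```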
Critically: zero tolerance for hallucinations. Financial figures are extracted and placed, never generated. The AI writes narrative around verified data. It does not invent data.
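One way to enforce that rule, sketched below with hypothetical helper names: narrative is generated as a template, verified figures are substituted in afterwards, and a final check rejects any number that did not come from the extraction step.

```python
import re

# Figures extracted from the source spreadsheets: the only numbers
# allowed to appear in the final narrative.
VERIFIED = {"total_revenue": "4.2m", "ebitda_margin": "18%"}

def place_figures(template: str) -> str:
    """Substitute extracted values; the model never writes the numbers."""
    return template.format(**VERIFIED)

def reject_unverified_numbers(text: str) -> None:
    """Fail loudly if the narrative contains a figure we did not extract."""
    for token in re.findall(r"\d[\d.,]*%?m?", text):
        if token not in VERIFIED.values():
            raise ValueError(f"unverified figure in narrative: {token}")

draft = place_figures(
    "Revenue reached {total_revenue} at an EBITDA margin of {ebitda_margin}."
)
reject_unverified_numbers(draft)  # passes: every figure has a source
```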
The Real Foundation: Understanding the Domain
Here's the contrarian claim: the hardest part of building an agentic workflow isn't the AI, the code, or the architecture. It's understanding the domain.
Without knowing what "The MP produces a DDA for the RA" means — really knowing, not just expanding the acronyms — no amount of prompt engineering will save you. Every synonym, every abbreviation, every implicit assumption baked into decades of client process had to be understood before a single line of code was written.
There are no spring chickens on this team. Business modelling, operations, financial analysis, process mapping — this knowledge was built over careers, not bootcamps. When you're staring at a workbook with thirty tabs of pivot tables and activity-based costings, grey hair is an asset.
An AI can only be as good as the instructions it receives. If you don't understand the business, your prompts are garbage — and your output is confident garbage.
We didn't start with code. We started with questions. What does this spreadsheet actually mean? What's the relationship between these tabs? Why is this benchmark structured this way? Six hours with coffee and a whiteboard — two people who understood both the technology and the business, sketching the process, mapping dependencies.
Designing the Workflow
The whiteboard sessions produced a design — stages, boundaries, data flows, dependencies, all mapped out before anyone opened an editor.
Key decisions:
Context Isolation: Each section of the final deliverable became an independent agent context with its own data and prompts. This prevents 'context poisoning' — when an AI agent's working memory bleeds across unrelated tasks, output quality degrades. Clean boundaries keep each agent focused on exactly what it needs to know.
Parallel Execution: The source documents are large. The workflows are numerous. Running everything sequentially would take too long. Parallel execution wasn't a nice-to-have — it was a requirement.
Explicit Dependencies: Data extraction before narrative generation. Level-one analysis before level-two summaries. Level-two summaries before level-three executive narrative. Every dependency was mapped, every ordering constraint deliberate (see the sketch after this list).
Prove Each Step: Extract one chart. Run one prompt. Merge one section into the template. Validate the output. Get each component right on its own before wiring them together.
Automatic Feedback: The system checks its own output — catches formatting errors, detects text overflow, validates data placement. When something fails, it knows, and it reports exactly what went wrong and where.
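A sketch of how these decisions combine, using Python's standard library with illustrative stage names (this is not the production code): independent sections run in parallel once their dependencies clear, each stage gets its own isolated context, and every output is validated before anything downstream starts.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from graphlib import TopologicalSorter

# "finance_section" and "ops_section" share no edge, so they run in parallel.
DEPS = {
    "extract": set(),
    "finance_section": {"extract"},
    "ops_section": {"extract"},
    "exec_summary": {"finance_section", "ops_section"},
}

def validate(stage: str, output: str) -> list:
    # Stand-in for the real checks: formatting, overflow, data placement.
    return [] if output else [f"{stage}: empty output"]

def run_stage(stage: str) -> str:
    context = {"stage": stage}               # isolated per-stage context
    output = f"<{context['stage']} output>"  # stand-in for the agent call
    errors = validate(stage, output)         # automatic feedback
    if errors:
        raise RuntimeError(f"{stage} failed: {errors}")  # what and where
    return output

sorter = TopologicalSorter(DEPS)
sorter.prepare()
with ThreadPoolExecutor(max_workers=4) as pool:
    while sorter.is_active():
        ready = sorter.get_ready()           # stages whose deps are all done
        futures = {pool.submit(run_stage, s): s for s in ready}
        for fut in as_completed(futures):
            stage = futures[fut]
            fut.result()                     # re-raises validation failures
            sorter.done(stage)               # unblocks dependent stages
            print("ok:", stage)
```

In production this would call agents rather than print, but the shape is the same: the scheduler, not the prompts, guarantees the ordering.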
The Result
Twenty million tokens to reach production testing. Seven specialist agents coordinated across 17 skills process 53 extracted data tables, execute 68 individually crafted prompts that generate 230 distinct pieces of narrative content, render 62 tables and diagrams, place 91 charts and images, and assemble it all into an 82-slide presentation.
Every chart placement, every narrative paragraph, every financial figure can be traced back to the specific spreadsheet cell it came from and the specific pipeline step that processed it. The audit trail isn't a feature. For financial reporting, it's the point.
The workflow runs on Sasha — our agentic workflow platform — which provides the observability and orchestration that a pipeline of this complexity demands.
Business Impact
The manual version of this process consumed days per person per week. Experienced consultants spending their time on compilation and formatting instead of analysis and client work. That time is now returned to them.
The pipeline runs in about an hour. Human review time? Honestly, we don't know yet — the system is new. We'll update this case study in a month once we have real data.
Why It Worked
A year ago, we couldn't build this. The models weren't capable enough. Today, the models are. But the technology was the straightforward part.
The months of work were spent understanding a client's business deeply enough to encode it — sitting with their spreadsheets, learning their jargon, mapping their processes, and asking the questions that only experience teaches you to ask.
If your organisation has complex, domain-heavy document workflows that consume specialist time, this is now solvable. The question isn't whether the AI is smart enough. It is. The question is whether the people building the workflow understand your business well enough to get it right.