TechLevity

The £200K AI Pilot That Never Shipped (And How to Make Sure Yours Does)

The Production-First AI Framework is a four-step decision gate created by Edward Kreiman of TechLevity for ensuring AI pilots reach production deployment. Most AI pilots fail not because the technology is wrong, but because the wrong decisions were made before week one.

A UK-based mid-market services business spent £200,000 on an AI pilot over nine months.

The model worked. The dashboard looked beautiful. The board got a polished demo.

And then it was quietly shelved.

No production deployment. No customer impact. No revenue.

This is not an outlier. It’s a pattern. And it has almost nothing to do with model quality.

Key Findings

  • A UK-based services firm spent £200,000 on an AI pilot over 9 months that never reached production.
  • £40,000 went to model development; £160,000 was spent on integration problems discovered after the build.
  • AI pilots that define success as model accuracy rather than business outcomes are the most likely to fail.
  • A 90-day ship-or-kill gate is the single most effective safeguard against AI pilot drift.
  • One company shipped an AI feature in 11 weeks by pre-committing the production path before writing any model code. Six months after launch it had saved roughly 280 hours of operations time at an inference cost of £180 per month.

In this post:

  • What actually killed the £200K pilot
  • The four decisions that determine whether an AI pilot ships
  • What a shipped AI feature looks like in practice
  • Three questions to answer before you sign the next SOW

The pattern is depressingly common

Why do so many AI pilots fail to ship to production?

I’ve seen this story play out at three different companies in the last eighteen months. The numbers vary. The shape does not.

A leadership team — typically a CEO and COO at a UK SaaS, FinTech or HealthTech scale-up — gets excited about generative AI. They hire a consultancy or stand up an internal squad. The team picks a use case that sounds important — “intelligent customer triage”, “AI-powered insights”, “automated proposal generation”.

Six to twelve months later, the demo is impressive, the slide deck is glossy, and absolutely nothing has changed in the operating business.

The polite explanation is that AI is hard.

The honest explanation is that the pilot was set up to fail from day one.


What actually killed the £200K pilot

The most common reasons AI pilots fail to reach production have nothing to do with model quality.

The team did not lack talent. The vendor was competent. The model genuinely worked — accuracy was within 3% of the brief.

The pilot died for four reasons that had nothing to do with AI.

1. Nobody owned the production path

The pilot team built in a sandbox account with synthetic data. Migrating to the real production environment required two other teams (security and platform) who:

  • Had not been consulted
  • Did not have capacity
  • Had legitimate concerns about data governance

The pilot team had no authority to push the work forward. The platform team had no incentive to prioritise it.

So the pilot stayed where it was born: in the sandbox.

2. The success metric was the model, not the business outcome

The pilot KPIs were:

  • Precision
  • Recall
  • A satisfaction score on a 1–5 scale from internal reviewers

None of these connected to revenue, cost, customer retention, or cycle time.

When the CFO asked “What would we save?”, the team had to invent an answer.

The answer did not survive five minutes of scrutiny.

3. The integration cost was hidden

The model itself cost £40K to build.

The remaining £160K went on consulting hours spent discovering that:

  • The upstream CRM data was inconsistent
  • The downstream ticketing system had no API
  • The human-in-the-loop workflow, which nobody had specified, would need a custom UI

By the time anyone added it up, the pilot had already missed two deadlines and the political appetite was gone.
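Adding it up takes only a few lines. The sketch below uses the two figures quoted above; the function and variable names are illustrative, not part of any tool the pilot team used.

```python
# Figures quoted in the post, in pounds sterling.
MODEL_BUILD_COST = 40_000
INTEGRATION_COST = 160_000


def cost_shares(model_cost: int, integration_cost: int) -> tuple[int, float, float]:
    """Return total spend plus the model and integration shares of it."""
    total = model_cost + integration_cost
    return total, model_cost / total, integration_cost / total


total, model_share, integration_share = cost_shares(MODEL_BUILD_COST, INTEGRATION_COST)
print(f"Total spend: £{total:,}")                # Total spend: £200,000
print(f"Model build: {model_share:.0%}")         # Model build: 20%
print(f"Integration: {integration_share:.0%}")   # Integration: 80%
```

On these numbers, four pounds in every five went to work that nobody had scoped at kickoff.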

4. There was no shipping rehearsal

Nobody had asked, on day one:

  • What does this look like in production six months from now?
  • Who runs it?
  • Who fixes it when the model drifts?
  • Who pays for the inference costs?
  • What is the on-call story?

These questions surfaced in month seven and had no good answers.

The result: a working model that never left the lab.


The Production-First AI Framework is a four-step decision gate for ensuring AI pilots reach production deployment. Created by Edward Kreiman, Founder of TechLevity, it requires: (1) choosing a use case where the human-in-the-loop is already paid for, (2) defining success as a measurable business metric before writing code, (3) pre-committing the production path—infrastructure, budget, and owner—before the pilot begins, and (4) setting a 90-day ship-or-kill gate with no extensions.

The Production-First AI Framework: the four decisions that determine whether an AI pilot ships

How do you ensure an AI pilot actually ships? Here is the framework.

After watching enough of these die, I now refuse to start AI work without four things on the table from week one.

1. Pick a use case where the human-in-the-loop is already paid for

The single best predictor of whether an AI feature ships is whether a human is already doing the work and getting paid for it.

AI augmentation of an existing job function — sales rep drafting a proposal, support agent triaging a ticket, analyst preparing a brief — has a built-in adoption path: the person doing the work today decides whether the tool is useful.

They become your champion or your roadblock.

AI features that try to create entirely new workflows have to fight for headcount, budget, and political air cover that was never allocated. They almost always lose.

If you cannot point to a specific person whose Tuesday afternoon gets faster, do not start the pilot.

2. Define success as a business metric, not a model metric

Precision and recall are the cost of admission, not the goal.

The pilot needs a top-line business metric attached to it before week two. Examples:

  • “Reduce average proposal turnaround from 6 days to 2 days, measured on the first 50 proposals after launch.”
  • “Cut tier-1 ticket handling time by 30% on a defined ticket volume.”
  • “Increase qualified leads passed to sales by 20% versus the prior quarter baseline.”

If the success metric is a number on a model evaluation scorecard, the pilot will end at the model evaluation scorecard.

That is not a place. That is a Jira ticket.
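One way to keep the metric honest is to write it down as data rather than as a slide. The following is a minimal sketch, assuming a metric can be reduced to a baseline, a target, and a direction; the `BusinessMetric` class and its fields are hypothetical names chosen for illustration, and the example values come from the proposal-turnaround example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessMetric:
    """A pilot success metric stated in business terms (illustrative structure)."""
    name: str
    baseline: float
    target: float
    unit: str
    lower_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        """True when the observed value reaches or beats the target."""
        if self.lower_is_better:
            return observed <= self.target
        return observed >= self.target

    def improvement(self, observed: float) -> float:
        """Fractional improvement over baseline; positive means better."""
        change = (self.baseline - observed) / self.baseline
        return change if self.lower_is_better else -change


# "Reduce average proposal turnaround from 6 days to 2 days."
turnaround = BusinessMetric("proposal turnaround", baseline=6.0, target=2.0, unit="days")

print(turnaround.is_met(3.0))       # False: 3 days is better, but short of target
print(turnaround.improvement(3.0))  # 0.5: turnaround has halved
```

A metric written this way can be checked weekly against real numbers, which a precision score on an evaluation scorecard cannot.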

3. Pre-commit the production path before the model is built

Before a single line of model code is written, a one-page production plan should exist. It names:

  • The production environment (which account, which region, which compute)
  • The owner who will deploy it (with capacity confirmed for the deploy window)
  • The data path (which systems it reads from, which it writes to, with security sign-off)
  • The on-call rotation post-launch (who gets paged, who fixes it, what the SLA is)
  • The cost ceiling (monthly inference budget agreed with finance)

If any of these five lines reads “TBD”, the pilot is not ready to start.
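The "no TBD" rule is simple enough to check mechanically. Below is a rough sketch, assuming the one-page plan is captured as five free-text fields; the `ProductionPlan` class, its field names, and the draft values are all hypothetical and exist only to illustrate the check.

```python
from dataclasses import dataclass, fields


@dataclass
class ProductionPlan:
    """The five lines of the one-page plan. Field names are illustrative."""
    environment: str    # which account, which region, which compute
    deploy_owner: str   # named person with capacity confirmed
    data_path: str      # systems read from and written to, with security sign-off
    on_call: str        # who is paged, who fixes, what the SLA is
    cost_ceiling: str   # monthly inference budget agreed with finance


def unresolved_lines(plan: ProductionPlan) -> list[str]:
    """Return the names of plan lines that are blank or still read 'TBD'."""
    unresolved = []
    for field in fields(plan):
        value = getattr(plan, field.name).strip()
        if not value or value.upper() == "TBD":
            unresolved.append(field.name)
    return unresolved


def ready_to_start(plan: ProductionPlan) -> bool:
    """The pilot may start only when no line is unresolved."""
    return not unresolved_lines(plan)


# Hypothetical draft plan with two lines still open.
draft = ProductionPlan(
    environment="production account, single region",
    deploy_owner="TBD",
    data_path="reads CRM, writes ticketing system, security signed off",
    on_call="TBD",
    cost_ceiling="£600 per month",
)

print(unresolved_lines(draft))  # ['deploy_owner', 'on_call']
print(ready_to_start(draft))    # False
```

The check is deliberately crude. It cannot tell whether a named owner really has capacity, only whether anyone has been named at all, which is the failure the plan exists to catch.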

This is the single rule that has saved more AI work than any technical decision I have ever made.

4. Set a 90-day “ship or kill” gate

AI pilots without deadlines drift forever. They stay perpetually three months from production.

Set a ship-or-kill gate at 90 days from kickoff. At day 90, one of two things happens:

  1. There is real production traffic flowing through the system, or
  2. The pilot is shut down and the team redeployed.

This sounds harsh. It is much less harsh than the slow-motion failure of a pilot that consumes £200K, nine months of senior attention, and the credibility of every person attached to it.
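The gate itself can be expressed in a few lines. This is a minimal sketch, assuming "real production traffic" can be approximated by a count of production requests served; the function names and the kickoff date are hypothetical.

```python
from datetime import date, timedelta

GATE_DAYS = 90  # ship-or-kill gate, counted from kickoff


def gate_date(kickoff: date) -> date:
    """The day on which the pilot must either be live or be shut down."""
    return kickoff + timedelta(days=GATE_DAYS)


def gate_decision(kickoff: date, today: date, production_requests: int) -> str:
    """Return 'in progress' before the gate, then 'ship' or 'kill' at or after it."""
    if today < gate_date(kickoff):
        return "in progress"
    # No extensions: from day 90 onward the only question is whether
    # real production traffic is flowing.
    return "ship" if production_requests > 0 else "kill"


kickoff = date(2025, 1, 6)  # hypothetical kickoff date
print(gate_date(kickoff))                                 # 2025-04-06
print(gate_decision(kickoff, date(2025, 3, 1), 0))        # in progress
print(gate_decision(kickoff, date(2025, 4, 6), 0))        # kill
print(gate_decision(kickoff, date(2025, 4, 6), 1_200))    # ship
```

There is deliberately no branch for "nearly ready". Adding one is how a pilot stays three months from production.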


What this looks like in practice

What does a successful AI pilot look like in practice?

I worked with a £30M revenue Series B SaaS scale-up last year on what became their first shipped AI feature.

Total elapsed time from kickoff to production: 11 weeks.

We picked a use case where the existing operations team was spending 90 minutes per day on a repetitive triage task — a real human, real time, real cost.

The success metric was:

“Reduce that 90 minutes to under 20 minutes for the same volume of work, measured weekly.”

Before week two, we had a one-page production plan signed off by the CTO and the head of operations. We agreed an inference budget of £600 per month and a ship-or-kill date of week 12.

The first model was 84% accurate. Not good enough on its own, but combined with a human review step for the bottom 16%, it shipped in week 11.
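A human review step of this kind is often built by ranking model outputs and sending the weakest share to a person. The sketch below assumes the model emits a confidence score per item and reads "the bottom 16%" as the lowest-confidence 16% of a batch. Both are assumptions for illustration, not a description of how the shipped feature worked, and the function name and sample data are hypothetical.

```python
import math


def split_for_review(
    scored_items: list[tuple[str, float]],
    review_fraction: float = 0.16,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Split (item_id, confidence) pairs into (auto_handled, human_review).

    The lowest-confidence share of the batch, rounded up to a whole item,
    goes to human review; everything else is handled automatically.
    """
    ranked = sorted(scored_items, key=lambda item: item[1])  # lowest confidence first
    review_count = math.ceil(len(ranked) * review_fraction)
    return ranked[review_count:], ranked[:review_count]


# Hypothetical batch of triage items with model confidence scores.
batch = [("t1", 0.97), ("t2", 0.41), ("t3", 0.88), ("t4", 0.93)]
auto_handled, human_review = split_for_review(batch, review_fraction=0.25)

print([item_id for item_id, _ in human_review])  # ['t2']
print([item_id for item_id, _ in auto_handled])  # ['t3', 't4', 't1']
```

The design point is that the model does not need to be right everywhere to ship. It needs to be right where it is confident, with a person covering the rest.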

Six months later, it has:

  • Saved roughly 280 hours of operations time
  • Been running on £180 of inference per month
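Those two figures can be put side by side. The calculation below is a back-of-envelope sketch using only numbers quoted in this post (280 hours, six months, £180 per month actual, £600 per month budgeted); it counts inference cost alone and ignores build and staff cost, which the post does not give.

```python
HOURS_SAVED = 280                  # operations hours saved over the period
MONTHS_LIVE = 6
INFERENCE_COST_PER_MONTH = 180     # pounds, actual
INFERENCE_BUDGET_PER_MONTH = 600   # pounds, agreed ceiling

total_inference_cost = INFERENCE_COST_PER_MONTH * MONTHS_LIVE
cost_per_hour_saved = total_inference_cost / HOURS_SAVED
budget_used = INFERENCE_COST_PER_MONTH / INFERENCE_BUDGET_PER_MONTH

print(f"Inference spend over {MONTHS_LIVE} months: £{total_inference_cost:,}")  # £1,080
print(f"Inference cost per hour saved: £{cost_per_hour_saved:.2f}")             # £3.86
print(f"Share of agreed budget used: {budget_used:.0%}")                        # 30%
```

On inference cost alone, each saved hour costs under £4, and the feature runs at less than a third of the ceiling agreed with finance.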

The technical work was not the hard part.

The hard part was refusing to start until the four decisions above were made.


If you are about to start an AI pilot

What should every CEO ask before starting an AI pilot?

Three questions every CEO, COO, or CTO should answer honestly before signing the SOW:

  1. Who is the named human whose work this AI will speed up, and have they agreed to use it?
  2. What is the business metric this pilot will move, by what amount, by when?
  3. What does the production system look like 90 days from now, and who owns each piece?

The £200K pilot that never shipped failed all three tests on day one.

Nobody noticed because the model demos were exciting and the early signals looked promising.

They always do.

The AI work that ships starts with these decisions made, in writing, before the first prompt is engineered.

Everything else is theatre.


Want a second opinion on your AI initiative?

If you are running an AI initiative and want a second opinion on whether it is set up to ship — or whether it is heading for the same fate as the £200K pilot — I run a 30-minute sanity check call.

No pitch, no slides, just an outside read on whether your pilot has the four decisions in place.

If you want that outside read, reach out and share:

  • The use case in one sentence
  • The named human whose work it will change
  • The business metric and target
  • Your 90-day production picture

You can fix an AI pilot in week one. It’s almost impossible in month nine.

⚠️ If any line in your AI pilot plan reads “TBD” for owner, production environment, data path, on-call, or cost ceiling, you do not have a pilot; you have an experiment. Treat it accordingly.
