General intelligence isn’t the bottleneck

Frontier models keep getting smarter. That doesn’t mean they’re better.

May 18, 2026

When a frontier lab releases a new model, you might assume it’s gotten better at everything. Reasoning, coding, Ancient Greek, customer service, all enhanced in one fell swoop. And as this model moves horizontally, mastering disciplines, it will gobble up all the AI products built atop it.

But we’ve seen evidence that this isn’t true.

Before we launched Apex – our own model, trained specifically for customer service – we ran Fin in production on Claude Sonnet 4.0. We tested later releases, Sonnet 4.5, 4.6, and Opus 4.5, and none outperformed Sonnet 4.0 on our RAG customer service task.

Turns out, the latest aren’t always the greatest – at everything.

As Fergal, our Chief AI Officer, puts it: “I think there will be some tasks that you can saturate in terms of intelligence and extra general intelligence doesn’t really move the needle for those tasks.” That means some AI products, built with specific judgment and expertise, can outperform the general models, even if they rely upon them.

Fergal recently joined The Chief AI Officer Show to talk about what three years of building AI products teaches you about models, pricing, and why Agent adoption is slower than it should be.

Unseen effort

Two years ago, many AI businesses heard the same criticism: “you’re just wrapping the model.” Cynics assumed that the journey from ChatGPT to Fin was easy – that what we were building was just a thin layer on top.

Fergal thinks that was “a total misunderstanding… People did not understand how much work you have to do around the model in order to deliver a product that was valuable.” He believes our investment in AI was “deeper and broader than any new entrant,” requiring a significant portion of R&D effort.

Today, people talk about “the harness around the model.” That framing is also flawed.

An AI model isn’t something you build a harness around, but a component, one building block in a larger engineered system. For Fin, the general models showed up at defined points in its architecture to deliver specific constrained functionality. Everything else was underlying infrastructure.

That will change over time. Models will take on more. But as they do, the pressure to specialize – rather than rely on horizontal general intelligence – increases alongside it. “You want to make sure that you’re continuing to add a big thick layer of value between the model and the set of business problems that you’re addressing.”

Why the frontier model isn’t always right

To explain the gap between a model’s general intelligence and its ability on specific tasks, and why AI products can survive outside of frontier labs, Fergal offers a thought experiment.

You’re hiring a retail assistant in a clothing store and you have two choices: a Nobel Prize-winning physicist with a really high IQ or a retail assistant with 20 years of expertise.

Initially, you might want to go for the physicist – they’re smarter. But in reality, you only need so much general intelligence for the task. You also need emotional intelligence, sales experience, and personality. That’s true in the real world and Fergal believes it’ll be true for models.

This is one of the reasons he and his team began building our own specialized model, Apex.

The other was the significant improvement in open-weight model performance. Once open-weight models began to approach frontier models, a new strategic option opened up: take a strong open-weight model, apply a significant reinforcement learning program, and teach it the exact tradeoffs your domain requires. Initially, Fergal just expected Apex to be cheaper to run, but during training, he soon realized “it’s going to be better by the time we’re done.”

The long-term play for application-layer companies is vertical integration: build or fine-tune models optimized for your specific domain, rather than relying on horizontal models carrying capability you don’t need.

The gamble of outcome-based pricing

When we launched Fin with outcome-based pricing at $0.99 per resolution, we were pricing at a loss. The cost to serve a single resolution (running GPT-4 with dedicated capacity) was around $1.50 at the time.

The decision to price that way came down to two bets Fergal owned: that inference costs would fall, and that resolution rates would rise. Both felt like genuine risks but he “was very confident in them.”
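The arithmetic behind those two bets can be sketched in a few lines. This is a rough illustration, not our actual cost model: only the $0.99 price and the roughly $1.50 launch cost come from the figures above. The assumption that cost per resolution scales as inference cost divided by resolution rate, the function names, and the example factors are all hypothetical.

```python
# Illustrative unit economics for outcome-based pricing.
# Only the two constants below come from the post; the scaling model
# and the example factors are hypothetical assumptions.

PRICE_PER_RESOLUTION = 0.99           # launch price, from the post
COST_PER_RESOLUTION_AT_LAUNCH = 1.50  # approximate launch cost, from the post


def margin_per_resolution(cost: float, price: float = PRICE_PER_RESOLUTION) -> float:
    """Profit (positive) or loss (negative) on one billed resolution."""
    return price - cost


def projected_cost_per_resolution(
    launch_cost: float,
    inference_cost_factor: float,
    resolution_rate_factor: float,
) -> float:
    """Assumed model: every conversation incurs inference cost, but only
    resolved conversations are billed, so cost per resolution scales with
    inference cost and inversely with resolution rate."""
    if inference_cost_factor <= 0 or resolution_rate_factor <= 0:
        raise ValueError("factors must be positive")
    return launch_cost * inference_cost_factor / resolution_rate_factor


# Loss on each resolution at launch.
launch_margin = margin_per_resolution(COST_PER_RESOLUTION_AT_LAUNCH)

# Fraction by which cost per resolution must fall just to break even.
required_cost_drop = 1 - PRICE_PER_RESOLUTION / COST_PER_RESOLUTION_AT_LAUNCH

# Hypothetical scenario: inference cost halves, resolution rate rises 1.5x.
scenario_cost = projected_cost_per_resolution(
    COST_PER_RESOLUTION_AT_LAUNCH,
    inference_cost_factor=0.5,
    resolution_rate_factor=1.5,
)
scenario_margin = margin_per_resolution(scenario_cost)

print(f"launch margin per resolution: {launch_margin:.2f}")
print(f"cost drop needed to break even: {required_cost_drop:.0%}")
print(f"scenario margin per resolution: {scenario_margin:.2f}")
```

Under this simplified model the two bets compound: a fall in inference cost and a rise in resolution rate both push cost per resolution down, which is why either one alone might not have closed the gap.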

However, the pricing model wasn’t primarily a unit economics decision. The more important reason was alignment. Outcome-based pricing makes every part of the organization – sales, customer success, R&D – want exactly what the customer wants: a resolution that actually works. “That’s not to be underestimated.”

Three years later, outcome-based pricing has become the expectation for high-value AI products. The market caught up.

Adoption is slower than the technology deserves

Fin now automates 83% of our own customer support volume and we’re not an outlier – we have customers with even higher automation rates. The technology works. And yet customer service adoption across the industry has been slow.

According to Fergal: “To my mind we should have seen much more radical adoption of Agents in customer service than we actually have.”

The reason is structural. Many businesses view customer service as a cost centre and are therefore slow to adopt new technology. Unlike coding Agents, there’s no product-led growth dynamic pulling people in – no engineer spinning up a trial, no bottom-up pressure from people who’ve already seen it work.

The skepticism is understandable, even if it’s expensive. When we tell people that 83% of our support volume is fully automated, they often don’t believe it. “In the back of their head they’d be like, it’s probably 30%.” We’ve had to learn to show the distribution of automation rates across our customer base to make the numbers credible – even internally.

AI is underhyped, Fergal argues, because people are pattern-matching to previous hype cycles and find it hard to take genuine capability seriously. The diffusion curve is playing out, but slower than the technology warrants.

To hear the full conversation and Fergal’s thoughts on evaluating models, the company’s initial reaction to ChatGPT, and whether society is prepared for the impacts of AI, tune in here.
