Last Thursday (Sept 11) we hosted an event called Building Frontier AI Products in San Francisco. We brought together Fergal Reid and Molly Mahar from our AI Group at Fin, Silas Alberti from Cognition, Niko Grupen from Harvey, and Brett Chen from Perplexity.
We were there to talk about the hard things: getting to reliability and scale with a probabilistic engine, breaking away from old norms of building software, and recalibrating to understand the human factors that make all of that hard.
Before the talks, we gathered to share technical insights and chat 1:1 with attendees, and during those conversations and afterwards the room was full of good questions. People were ready to dig into the tension between figuring out what AI can do and making it do that reliably in a world where we can’t control all outcomes.
Here’s what we talked about onstage.
Building purpose-built models for better value
Fergal’s talk was all about value, how decisions are made, and how Fin is owning more of the AI layer. He walked through the different layers at play in AI products — the model layer, the AI layer, the application layer — and made it clear that what’s made Intercom succeed is understanding that decisions about what’s possible come first from the AI group itself.
His team is doing old-school R&D work, really — it’s an experimental culture, exploratory and hypothesis-based, and that requires the team to be trusted to make value judgments as the people closest to the material work. He described how an understanding of AI value has to come first from whether an idea is possible, then whether it solves a real problem, and then whether it can be done quickly and reliably.
He also gave a high-level look at some recent explorations and investments that have resulted in new custom models for Fin — models that handle reranking, retrieval, issue summarization, and more — building to one clear takeaway on an emerging pattern we see:
“You can get really good performance by replacing LLM calls with more special purpose models.”
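To make that pattern concrete, here’s a minimal sketch of what it can look like in practice (our own illustration, not Fin’s actual models): swapping a prompt-based LLM reranking call for a small off-the-shelf cross-encoder from the sentence-transformers library. The model name is just an example.

```python
# Illustrative only: a special-purpose reranker in place of an LLM call.
# Model choice and top_k are example values, not Fin's actual stack.
from sentence_transformers import CrossEncoder

def rerank(query: str, docs: list[str], top_k: int = 5) -> list[str]:
    # A small cross-encoder scores each (query, doc) pair directly:
    # no prompt, no generation, and far lower latency and cost.
    # (In practice you'd load the model once at startup, not per call.)
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, d) for d in docs])
    ranked = sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)
    return [d for _, d in ranked[:top_k]]
```

A model like this does exactly one job, which is what makes it cheaper, faster, and easier to evaluate than routing the same work through a general-purpose LLM.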
Building intelligence that really scales
Brett Chen from Perplexity gave a talk all about hard engineering realities: the grunt work of scaling intelligence when you want to build truly decentralized agents that serve millions of users across many use cases and still be “production-grade”.
Chen’s work focuses on AI agent architecture, and in his view, Perplexity’s success at scaling is all about the details: reliability and availability, evaluation, and careful iteration are what make these systems truly production-ready and distinctive.
It’s insane to think of the impact of this barely three-year-old company, and it was great to get an inside view into products like Comet.
“If there’s one message I’d like you to take away, it’s that reliability is what distinguishes your product.”
Building alongside organizational human nature
Molly Mahar, our Principal AI Designer for Fin, gave a stellar talk about the organizational cracks that sit in tension with the experimental way our AI Group works. She gave great advice on how to work across a larger organization to manage assumptions — about ownership, about latency, about how well you can anticipate change.
Molly’s talk was a refreshing look at the human factors involved in getting great AI work out the door.
“Survival is about ruthless, constant re-prioritization. Because customers, execs, other teams — they all want something from you. Which is why NO is non-negotiable.”
We need human judgment to build AI for humans
Last up, Silas Alberti from Cognition and Niko Grupen from Harvey joined Fergal and our COO Des onstage for a great panel conversation.
The biggest theme that arose was that while the technology is advancing fast, human judgment still makes an enormous difference. Until we get to AGI and tools that can self-govern, knowing when a system is “good enough” to ship is key. That balance of making bets and then managing outcomes is still something they’re doing, day in and day out, as they build.
That theme resonated in the questions and conversations after the talks as well — nuanced questions that dug into the places where human experience and AI meet, questions we could only scratch the surface of, like “How do we think about the end-user experience — should we be thinking about a platform or split-screen operating system and not just our own OS?”
There’s so much more for us to talk about. It was clear on the night that people respond to authenticity — they don’t want the hype; they want to see and hear about the failed experiments and the messiness, not polished “look what we built” pitches. We’re excited to keep sharing more as we build.
Catch up with the whole event on demand