Big week. Huge. A wave of launches and commentary sparked new conversations about the direction of AI — and revived some old questions about the role of AI in visual media.
But first: Team Fin around the world
Des spoke at one of the four official TED conferences running this year, TED AI in Vienna:
Paul was in Munich talking to Andrey Khusid of Miro:
Finally, our SVP of Engineering Jordan Neill got on stage with Figma in San Francisco, comparing notes on product and on accelerating engineering with Figma's VP of Engineering, Marcel Weekes.
Keep up with all our upcoming events — including Pioneer; more on that in a minute.
“Wild times ahead”
It feels like the next wave of visual design tools arrived this week in one fell swoop:
OpenAI launched Sora 2, offering coherent generative video that (mostly) gets the physics right.
With its Claude Sonnet 4.5 release, Anthropic highlighted Imagine with Claude, promising real-time generative interface creation.
Most of the online chatter seems to be about Sora, which is undeniably impressive. In an echo of March's flood of Studio Ghibli posts (hard to believe that was only a few months ago), it has already spawned thousands of videos of an AI-generated Sam Altman doing occasionally illegal things.
But on our design team, the early excitement was for Imagine, which shows Anthropic moving definitively into visual forms — and points to what looks like a new way to design software. Unlike vibe coding, where you build the UI up front, the interface progressively vibe codes itself based on how you interact with it:
We can already imagine lots of implications and use cases for Fin and Intercom.
“A very new direction. Wild times ahead.” — Emmet Connolly, SVP of Design
What’s interesting is that both Anthropic and OpenAI recently produced highly polished, humanistic, and seemingly mostly analog campaigns for their brands:
We loved both campaigns, for what it's worth. It's a smart, counterintuitive move on both companies' parts to show that elevated production values — and good storytelling — are still what matter most. That will continue to be true, no matter how future images are produced.
Also this week, we spotted the design tool Pencil, an "open-source agent-driven MCP canvas" that lives in your codebase and works with Cursor, Claude, and others. It sets out to tackle the problem of getting Figma-like precision inside a vibe-coding tool. We're definitely curious.
Rebooting code and commerce
Just days before the Sora 2 announcement, OpenAI introduced GPT-5 upgrades to its coding tool Codex, signaling a focus on developer ecosystems. There's been intense competition in this space, so it will be interesting to see what Claude Code (Anthropic's code-agent tool) brings to the table next. The gaps seem to be narrowing: the agents and features in developer tools are converging.
As if they didn't have enough going on, OpenAI also added an e-commerce integration — a Stripe-powered feature called Instant Checkout that lets US users purchase from Etsy and Shopify sellers directly within ChatGPT. They also open-sourced the Agentic Commerce Protocol that powers it.
“AIs have quietly crossed a threshold”
After the wave of launches, we were digging into research and commentary about the progress of agents automating human work.
OpenAI (those guys again) introduced GDPval, "a new evaluation that measures model performance on economically valuable, real-world tasks across 44 occupations". Its benchmarks are drawn from real professional tasks across a range of industries, and it evaluates a range of models, not just OpenAI's own.
Ethan Mollick of One Useful Thing commented on the research:
AIs have quietly crossed a threshold: they can now perform real, economically relevant work.
But he pointed out that while human experts are still "just barely" ahead of AI agents, agents won't replace humans soon — because most benchmarked tasks ignore real-world workflow constraints. True adoption requires grounding agents in rules, oversight, and edge-case handling, rather than promising open-ended autonomy.
We're already tackling this problem with Fin Guidance and Fin Tasks, features that focus on complex queries and the context that informs them. The study confirms what we've been saying: frontier models are achieving near-human quality on one-shot tasks with discrete, well-specified outputs (draft emails, summaries, triage suggestions, first-draft proposals).
While agents, taken on their own, are amazing and may soon surpass human experts on average, real value — and true integration with the workplace — comes from designing for reliability and failure recovery.
Our parting thought
Pioneer, our annual summit for AI customer service leaders, is next week, and we’re excited.
We’ll be back next Friday to talk about everything that went down onstage.
See you then!