16 Comments
Philippe De Ridder's avatar

Darragh, really appreciate the transparency! Willing to share your story at our next AUTONOMOUS summit at the end of May? We typically get around 15,000 participants, and this time we're focused on exactly this type of fundamental transformation story. It would be great to have you!

Darragh Curran's avatar

Wow - thanks Philippe - that could be really cool - and there's a good chance I could be available too - want to follow up over email? darragh at intercom dot io

Philippe De Ridder's avatar

Great! following up over email

Sugendran Ganess's avatar

Curious on how you're measuring token spend. I'd like to do similar graphs with my team's data, but finding the Claude analytics data a bit more miss than hit. FWIW we're on the plan instead of on-demand billing

Darragh Curran's avatar

This is still a bit of work in progress, but we are on per token billing, and we're starting to pull data directly out of Anthropic's cost/usage APIs and get it into Snowflake
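As a rough illustration of the kind of rollup that feeds a token-spend graph, here is a minimal sketch in Python. It assumes usage data has already been exported into rows with hypothetical `day`, `model`, `input_tokens` and `output_tokens` fields; the field names and sample values are made up for illustration and are not the shape of any real API response or of Intercom's pipeline.

```python
from collections import defaultdict

# Hypothetical exported rows: one per (day, model). Field names are assumptions.
usage_rows = [
    {"day": "2024-01-01", "model": "model-a", "input_tokens": 1200, "output_tokens": 300},
    {"day": "2024-01-01", "model": "model-b", "input_tokens": 500, "output_tokens": 100},
    {"day": "2024-01-02", "model": "model-a", "input_tokens": 2000, "output_tokens": 700},
]


def tokens_per_day(rows):
    """Sum input and output tokens across all models for each day."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["day"]] += row["input_tokens"] + row["output_tokens"]
    # Sort by day so the result can be plotted directly as a time series.
    return dict(sorted(totals.items()))


print(tokens_per_day(usage_rows))
```

The same grouping could just as well be done in SQL once the rows are in a warehouse; the point is only that the graph needs one total per time bucket.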

Bruno Kiafuka's avatar

Interesting post Darragh, and love seeing the AI adoption 🚀... A few questions:

- Agents reproduce whatever patterns exist in the codebase, good or bad. At 93.6% agent-driven PRs, bad patterns can spread faster than any human team could. How are you thinking about preventing that?

- For non-R&D folks, agentic coding is wild. Do you see that path eventually leading to non-engineers shipping to production?

Excited for the next post 👏👏👏

Darragh Curran's avatar

Thanks Bruno.

re: reproducing good/bad patterns. yes - this requires you to be very deliberate - to be opinionated about what good looks like in various contexts - and make sure that both code generation and code review are respecting your wishes.

re: non engineers shipping to production - already happening - e.g. PM/Design shipping changes (e.g. small tweaks) but also building/shipping working prototypes/experiments. also folks outside R&D e.g. support ops team iterating on internal APIs used to power our instance of Fin, or a fun recent example, our Director of Tax shipping changes to our billing system and core product https://www.linkedin.com/feed/update/urn:li:activity:7451009289873395712/

Mark Goodhead's avatar

How do you prevent Goodhart's law seeping into the PRs per employee metric? I assume, even if it's not explicit in performance reviews (or is it?), all the public leaderboarding of this metric is going to naturally encourage people to break up PRs even more than they normally would? I'd be interested to see if LOC / PR has changed over time which might be one way to observe this

Darragh Curran's avatar

replied on x so that I can more easily share an image too (https://x.com/darraghcurran/status/2049451862869295240) - I've not been worried about this. For one, we've a very transparent and high-trust environment; if all a person attempted to do was jack up the PR count without accomplishing more, it'd be very obvious.

What we've actually seen wrt LOC per PR is that it's increased too. This makes sense when you think about the ease with which CC will write e.g. tests, or its growing ability to take on a larger scope of problems. We expect some downward pressure from our use of AI PR approval (https://ideas.fin.ai/p/ai-is-approving-our-pull-requests), which encourages smaller PRs.

One related takeaway for me is that I've seen people, internally and externally, get caught up on not finding the perfect measure, and allow that to hold them back from making progress in the right direction. I don't believe there's any one perfect measure for this; triangulation is very important, and it's a big trap to not pursue the bountiful gains available because the perfect measure doesn't exist.
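To make the triangulation idea concrete, here is a minimal sketch of tracking PRs per author alongside median LOC per PR, week by week. The record fields (`week`, `author`, `loc`) and the sample numbers are hypothetical, chosen only to show what a "PR splitting" pattern would look like: PR count per author rising while median size falls.

```python
from collections import defaultdict
from statistics import median

# Hypothetical merged-PR records; fields and values are illustrative only.
prs = [
    {"week": "W01", "author": "a", "loc": 120},
    {"week": "W01", "author": "b", "loc": 80},
    {"week": "W02", "author": "a", "loc": 40},
    {"week": "W02", "author": "a", "loc": 30},
    {"week": "W02", "author": "a", "loc": 50},
    {"week": "W02", "author": "b", "loc": 60},
]


def weekly_summary(prs):
    """Return PRs per active author and median LOC per PR for each week."""
    by_week = defaultdict(list)
    for pr in prs:
        by_week[pr["week"]].append(pr)

    summary = {}
    for week, items in sorted(by_week.items()):
        authors = {p["author"] for p in items}
        summary[week] = {
            "prs_per_author": len(items) / len(authors),
            "median_loc": median([p["loc"] for p in items]),
        }
    return summary


for week, stats in weekly_summary(prs).items():
    print(week, stats)
```

In this made-up data, W02 shows twice the PRs per author at less than half the median size, which is the signature to watch for. The thread reports the opposite trend in practice, with LOC per PR increasing.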

Nidhi Wadmark's avatar

Curious how your prioritization process changed. “39% faster ship to idea…” more velocity means more pressure on deciding what to build.

Thanks for penning the article. It is an insightful read.

Darragh Curran's avatar

For now at least we've no shortage of things we want to build - but I do think roadmapping/prioritisation does change in major ways when you can unlock an abundance of executional capacity - https://ideas.fin.ai/p/product-strategy-still-means-saying

Jordan Moore's avatar

For #2, why not compare to the previous year's Dec-Mar? Those charts just look like typical velocity coming out of the new year.

For #3, what is the X axis? It’s honestly hard to extrapolate from this because a lot of key information is missing. Also, what does agent-driven mean in a PR if only 19% are approved by agents?

Darragh Curran's avatar

fair point on time range of comparison - I picked the period aligned with our steep ramp in PR throughput - and this wasn't the pattern we saw e.g. last year, yes we'd get some months where it'd blip up maybe 10-50% then dip down - the start of this year has been unprecedented, and all signs suggest the trend continues.

re: #3 - do you mean the code quality? x-axis is time/weeks - y-axis is an aggregated view, based on code analysis, of contributions that make the code base less complex (positive bars) or more complex (negative bars).
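One way such a chart could be assembled, as a sketch only: the actual code analysis isn't described here, so this assumes each contribution comes with a hypothetical complexity score before and after the change, and simply buckets the improvements and regressions per week.

```python
from collections import defaultdict

# Hypothetical input: (week, complexity_before, complexity_after) per contribution.
# The scoring method itself is an assumption; any consistent metric would do.
contributions = [
    ("W01", 50, 42),  # simplification: complexity drops by 8
    ("W01", 30, 35),  # complexity rises by 5
    ("W02", 20, 26),  # complexity rises by 6
]


def weekly_bars(contributions):
    """Sum complexity reductions (positive) and increases (negative) per week."""
    bars = defaultdict(lambda: {"positive": 0, "negative": 0})
    for week, before, after in contributions:
        change = before - after  # > 0 means the code base got less complex
        if change >= 0:
            bars[week]["positive"] += change
        else:
            bars[week]["negative"] += change
    return dict(bars)


print(weekly_bars(contributions))
```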

Agent driven means an engineer is prompting Claude Code (in our case) with the context of the problem to solve, and the agent is writing all the code, submitting the PR etc. 93.6% of PRs are like this.

19% AI approved means PRs that are reviewed and approved without the need for a human reviewer.
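The two percentages measure different properties of a PR (who wrote it vs. who approved it), so they don't need to line up. A small sketch with hypothetical flags `agent_authored` and `ai_approved` on made-up PR records shows how each share would be computed independently:

```python
# Hypothetical PR records; the flag names and values are illustrative assumptions.
prs = [
    {"id": 1, "agent_authored": True, "ai_approved": True},
    {"id": 2, "agent_authored": True, "ai_approved": False},
    {"id": 3, "agent_authored": True, "ai_approved": False},
    {"id": 4, "agent_authored": False, "ai_approved": False},
]


def share(prs, flag):
    """Percentage of PRs for which the given boolean flag is set."""
    return 100 * sum(1 for p in prs if p[flag]) / len(prs)


print("agent-driven:", share(prs, "agent_authored"))
print("AI-approved:", share(prs, "ai_approved"))
```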

Jordan Moore's avatar

Yes— I meant Y axis, thank you for clarifying! Makes sense on this trend being different from past years.

Thank you for following up— that PR flow sounds agent-driven to me, very cool.

Anne Marie Kingsland's avatar

A full breakdown of our PR review Agent just published on the Intercom blog:

https://www.intercom.com/blog/ai-is-approving-our-pull-requests-heres-how-we-made-it-safe/

Darragh Curran's avatar

yep - PR auto approval flow is very cool - a small crack team of specialised sub agents meticulously reviewing every PR - write up on this topic coming very soon.