2× – nine months later: We did it

Apr 16

You can too.

20 Comments

Darragh, really appreciate the transparency! Would you be willing to share your story at our next AUTONOMOUS summit at the end of May? We typically get around 15,000 participants, and this time we're focused on exactly this kind of fundamental transformation story. It would be great to have you!


Darragh Curran

Apr 21

Wow - thanks Philippe - that could be really cool - and there's a good chance I could be available too - want to follow up over email? darragh at intercom dot io


Philippe De Ridder

Apr 21

Great! Following up over email.

Sugendran Ganess

Apr 20

Curious how you're measuring token spend. I'd like to build similar graphs with my team's data, but I'm finding the Claude analytics data more miss than hit. FWIW, we're on a plan rather than on-demand billing.


Darragh Curran

Apr 21

This is still a bit of a work in progress, but we're on per-token billing, and we're starting to pull data directly from Anthropic's cost/usage APIs into Snowflake.

Bruno Kiafuka

Apr 16

Interesting post, Darragh, and love seeing the AI adoption 🚀. A few questions:

- Agents reproduce whatever patterns exist in the codebase, good or bad. At 93.6% agent-driven PRs, bad patterns can spread faster than any human team could. How are you thinking about preventing that?

- For non-R&D folks, agentic coding is wild. Do you see that path eventually leading to non-engineers shipping to production?

Excited for the next post 👏👏👏


Darragh Curran

Apr 18

Thanks Bruno.

Re: reproducing good/bad patterns: yes, this requires you to be very deliberate, to be opinionated about what good looks like in various contexts, and to make sure that both code generation and code review respect your wishes.

Re: non-engineers shipping to production: this is already happening. PM/Design ship changes (e.g. small tweaks), but they also build and ship working prototypes/experiments. Folks outside R&D do it too, e.g. the support ops team iterating on internal APIs that power our instance of Fin, or, a fun recent example, our Director of Tax shipping changes to our billing system and core product: https://www.linkedin.com/feed/update/urn:li:activity:7451009289873395712/

Caoimhe Kennedy

Great read, appreciate your transparency!

Did you look at spend vs. weighted PRs merged over time? Was this an average over your 12 months?

Curious to learn more about what you saw there; it really tracks with my personal experience of token usage (rather than PRs) and with what I've seen in my organization. However, I'm also seeing that as people get familiar with agents, they become more efficient with token usage. We're currently using individual usage as a single metric as a starting point; the breakdowns you took on here are great food for thought.

Jason

May 28

This is such a great article, thank you for sharing, Darragh. I love the ambitious goals. We're an AI-first team and are absolutely most constrained at this point by the PR review process. We've automated a healthy portion of dev and PR review, but haven't removed the constraint that the final merge is gated by a human. I'd be curious to know what conditions you put in place to determine what can auto-merge vs. what is gated by human review.

Fun side note: our company (www.revenium.ai) is built to help companies' teams measure the effectiveness and output of their AI use, and we completely align with the metrics you published. Such a great read, thanks again!


Darragh Curran

May 30

Thanks Jason - Revenium looks cool.

RE: AI approval - there's a little bit of detail in https://ideas.fin.ai/p/ai-is-approving-our-pull-requests - and also in this recent webinar: https://ideas.fin.ai/p/webinar-q-and-a-how-fin-3xd-r-and.

We're starting in the lowest-risk areas: smaller changes, and not on super-critical, high-volume paths (e.g. more on new product builds that don't have usage yet, rather than iterating on core APIs).
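To make that risk-tiering idea concrete, here is a minimal sketch of an eligibility check for AI-only approval. It is not Intercom's actual policy: the `PullRequest` fields, the 200-line threshold, and the "critical" path prefixes are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only; not Intercom's real rules.
MAX_CHANGED_LINES = 200
CRITICAL_PATH_PREFIXES = ("app/api/core/", "app/billing/")


@dataclass
class PullRequest:
    changed_lines: int
    touched_paths: list[str] = field(default_factory=list)
    feature_has_production_usage: bool = True


def eligible_for_ai_approval(pr: PullRequest) -> bool:
    """Return True if a PR looks low-risk enough to skip human review (sketch)."""
    if pr.changed_lines > MAX_CHANGED_LINES:
        return False  # larger changes still go to a human reviewer
    if any(path.startswith(CRITICAL_PATH_PREFIXES) for path in pr.touched_paths):
        return False  # critical, high-volume paths stay human-gated
    # Prefer new product work that has no live usage yet.
    return not pr.feature_has_production_usage
```

Ordering the checks from cheapest to most contextual keeps the rule easy to audit; in practice the thresholds would be tuned against defect and revert data rather than fixed up front.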


Jason

May 30

Thanks, Darragh. Interesting and helpful article. We're doing something similar with a multi-agent review of PRs and have seen similar results in terms of the things it finds that a human never would. What we're missing now is the kind of outcome comparison you've done: defect/revert rates for AI-approved vs. human-reviewed PRs. That should help everyone get more comfortable going forward. Thanks!

Mark Goodhead

Apr 29

How do you prevent Goodhart's law from seeping into the PRs-per-employee metric? I assume that, even if it's not explicit in performance reviews (or is it?), all the public leaderboarding of this metric will naturally encourage people to break up PRs even more than they normally would. I'd be interested to see whether LOC per PR has changed over time, which might be one way to observe this.


Darragh Curran

Apr 29

Replied on X so I could more easily share an image too (https://x.com/darraghcurran/status/2049451862869295240). I've not been worried about this. For one, we have a very transparent, high-trust environment; if all a person attempted to do was jack up their PR count without accomplishing more, it'd be very obvious.

What we've actually seen with LOC per PR is that it has increased too. This makes sense when you consider how easily Claude Code (CC) writes e.g. tests, or its growing ability to take on problems of larger scope. We expect some downward pressure from our use of AI PR approval (https://ideas.fin.ai/p/ai-is-approving-our-pull-requests), which encourages smaller PRs.

One related takeaway for me: I've seen people, internally and externally, get caught up on not finding the perfect measure and let that hold them back from making progress in the right direction. I don't believe there's any one perfect measure for this; triangulation is very important, and it's a big trap to not pursue the bountiful gains available just because the perfect measure doesn't exist.
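One way to watch for the PR-splitting effect raised above is to track median lines changed per merged PR by month, alongside PR count. The sketch below assumes you can export each merged PR as a hypothetical `(month, lines_changed)` pair; it uses the median rather than the mean so a handful of very large PRs can't mask a broad shift toward many small ones.

```python
from collections import defaultdict
from statistics import median


def median_loc_per_pr_by_month(prs):
    """Group merged PRs by month and return the median lines changed per PR.

    prs: iterable of (month, lines_changed) tuples, e.g. ("2025-01", 120).
    Returns a dict {month: median}, ordered by month string.
    """
    by_month = defaultdict(list)
    for month, lines_changed in prs:
        by_month[month].append(lines_changed)
    return {month: median(values) for month, values in sorted(by_month.items())}
```

A rising PR count paired with a falling median would be one hint of splitting; in the spirit of triangulation, it is a signal to investigate rather than a verdict.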

Nidhi Wadmark

Apr 22

Curious: how did your prioritization process change? “39% faster ship to idea…”: more velocity means more pressure on deciding what to build.

Thanks for penning the article. It is an insightful read.


Darragh Curran

Apr 24

For now, at least, we have no shortage of things we want to build, but I do think roadmapping/prioritisation changes in major ways when you can unlock an abundance of executional capacity: https://ideas.fin.ai/p/product-strategy-still-means-saying

Jordan Moore

Apr 17

For #2, why not compare to the previous year's Dec-Mar? Those charts just look like typical velocity coming out of the new year.

For #3, what is the X axis? It’s honestly hard to extrapolate from this because a lot of key information is missing. Also, what does agent-driven mean in a PR if only 19% are approved by agents?


Darragh Curran

Apr 18

Fair point on the comparison time range. I picked the period aligned with our steep ramp in PR throughput, and this wasn't the pattern we saw last year, for example. Yes, we'd get some months where it'd blip up maybe 10-50% and then dip down, but the start of this year has been unprecedented, and all signs suggest the trend will continue.

Re: #3, do you mean the code quality chart? The x-axis is time (weeks); the y-axis is an aggregated view, based on code analysis, of contributions that make the codebase less complex (positive bars) or more complex (negative bars).

Agent-driven means an engineer prompts Claude Code (in our case) with the context of the problem to solve, and the agent writes all the code, submits the PR, etc. 93.6% of PRs are like this.

The 19% AI-approved are PRs that are reviewed and approved without the need for a human reviewer.
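The code-quality chart described in that reply (weekly bars, positive for simplifying contributions, negative for complexity-adding ones) can be approximated as below. This is a sketch under assumptions: it presumes each merged contribution already has a hypothetical `complexity_delta` from some code-analysis tool, defined as complexity before minus complexity after, and it buckets by ISO week.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date


def weekly_complexity_bars(
    contributions: Iterable[tuple[date, float]],
) -> dict[tuple[int, int], tuple[float, float]]:
    """Sum complexity deltas per ISO week into (positive, negative) bar heights.

    Each contribution is (merged_on, complexity_delta), where the delta is
    assumed to be (complexity before) - (complexity after), so a positive
    value means the change made the codebase less complex.
    """
    bars = defaultdict(lambda: [0.0, 0.0])
    for merged_on, delta in contributions:
        iso_year, iso_week, _ = merged_on.isocalendar()
        if delta > 0:
            bars[(iso_year, iso_week)][0] += delta  # simplifying contributions
        elif delta < 0:
            bars[(iso_year, iso_week)][1] += delta  # complexity-adding contributions
    return {week: (pos, neg) for week, (pos, neg) in sorted(bars.items())}
```

Keeping the positive and negative sums separate, rather than netting them, preserves the two-sided bars the reply describes.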


Jordan Moore

Apr 18

Yes, I meant the Y axis; thank you for clarifying! Makes sense that this trend is different from past years.

Thank you for following up; that PR flow sounds agent-driven to me. Very cool.


Anne Marie Kingsland

Apr 21

A full breakdown of our PR review agent was just published on the Intercom blog:

https://www.intercom.com/blog/ai-is-approving-our-pull-requests-heres-how-we-made-it-safe/

Darragh Curran

Apr 21

Yep, the PR auto-approval flow is very cool: a small crack team of specialised sub-agents meticulously reviewing every PR. A write-up on this topic is coming very soon.