The current impact of AI on engineering velocity

Justin Reock:

Wow. Hello. Welcome to DX Annual. Good morning, everybody.

Audience:

Morning.

Justin Reock:

Morning. Woo. All right. Love it. Yeah. Some good energy. So yeah, take a look around this room, right? Directors and VPs and CTOs. You’re all the people who aren’t just watching what’s happening with AI. You’re actually responsible for what happens next inside the organization. So this is the place to be today. So before we get into it, just a couple of really quick housekeeping items. We have two stages. So you’re sitting in the main stage right now and the breakout stage is just behind you at the other end of this floor. So after our opening general sessions, we’re going to split out into breakout tracks in the afternoon, and your agenda booklet and the app, which you’ll get some details on in a minute, have all the sessions mapped out. So just know where you want to be before the transition and you’ll be good.

Lunch runs from 12:30 to 1:30, and it’s going to be served on the first floor. So you’ll head downstairs when we break. And I also want to shout out the DXperts’ Lounge that can be found on the first floor. That’s where you can get some product demos. You can take a look at all of our research and ask some technical questions. So it’s really worth a visit down there. Each breakout session today has a live chat feature in the app, so use it throughout each talk to submit questions. You’ll be able to vote on questions and stuff like that and our hosts will be reading off of there. And so that’s where you’ll be kind of running things through for Q&A.

Our evening reception kicks off at 5:00 and we’ll have more on that as we get kind of closer to the end of the day. All right. So now let’s talk about why we’re actually here today. I’m really excited to hand things over to somebody who’s been driving towards today’s event for a long time, Greyson Junggren, co-founder and CRO of DX, to share some opening remarks. All right.

Greyson Junggren:

All right, man. Okay.

Announcer:

Please make your way to the second floor.

Greyson Junggren:

Everybody, please make your way to the second floor. Cool. Yeah. Thank you, Justin. Thank you all for being here. I know a lot of you have traveled from near and far. And I just want to start by saying we are so incredibly grateful for you guys coming out here, joining these sessions, learning with us today. The entire DX team is so excited for this. We’ve wanted to do this for a really long time. All of our customers, a lot of you all in the room have been asking for something like this for a very, very long time, to do this exact thing, to fill a room with you all, the practitioners, the people that are doing the real work, figuring out how AI is changing the way that we build software.

So today you’re going to hear from senior engineering leaders, DevProd leaders from companies like Dell, like Vanguard, like BNY, like Netflix, Uber, Dropbox, Airbnb, and several others. And you’re going to hear from them about what they’re seeing, how they’re rolling out AI, what they’re learning, and what they’re changing. And ultimately, our hope is that you walk away today with new connections, with people that are working through very similar challenges at similar stages as you all, as well as new ideas that you can take forward in your own organizations.

Greyson Junggren:

So let’s kick this thing off. Please join me in welcoming my co-founder and the CEO of DX, Abi Noda, for a fireside chat about what the data’s telling us, the actual impact AI’s had on developer productivity. And hosting the conversation is Brian Houck, a co-author of the SPACE framework and a leading developer productivity researcher from Microsoft.

Brian Houck:

I’m really excited to be here today. The lineup is incredible. There are so many interesting talks that I cannot wait to sort of dive into, but first we’re going to dig into one of my absolute favorite topics. So Abi and the team at DX have been working on a research report covering AI’s impact on engineering velocity. It’s not published yet, so we’re all going to get a little bit of a sneak peek into some of their early findings. Now, before we dive in, I do want to say we only have about 30 minutes, so we don’t actually have time for Q&A, but if there are some burning questions you have, please find Abi or me afterwards and we would be happy to talk about it in more depth.

All right. So to kick us off, Abi, you are sitting on one of the coolest, largest data sets on how engineers actually work in the real world. And this upcoming report is digging into what is happening as AI adoption really scales across our industry. So before we get into the findings, I’m interested in what motivated this research, what really sparked it.

Abi Noda:

Yeah. Thanks, Brian. And I just want to echo Greyson’s welcome. This is so exciting to finally have everyone in a room to be able to gather and learn from one another. So a little background on this study we’ve been working on. Nearly every conversation I’ve had with folks like you over the past few months has begun with folks expressing that, “My CEO, our executives are expecting these astronomical gains because there’s so much hype about AI in the media and hearsay. It’s not what we’re seeing on the ground. How do we close that gap? How do we set realistic expectations? How do we even know what to be aiming for?”

And we’re hearing this from, I bet everyone in this room has probably experienced this to some extent. And so at DX, we started asking ourselves the same question. What really are companies seeing? What can we actually see in the data? What is the impact we’re seeing? And so that’s what we’ve been digging into at DX. And this is just the beginning. As we’ll talk about today, there’s a lot more to unpack. There’s a lot of nuance in the data, but at least we have sort of a first glimpse at what we’re seeing at least over the past 12 months as AI adoption has really matured and soared across companies.

Brian Houck:

Awesome. Well, I can’t wait to dive into some of the findings. So I’m a researcher and I love nerding out over research methodology. And I don’t want to spend too much time on this, but I’d like it, if you could just walk us through a little bit, like how did DX actually investigate this? How did you design the study and how did you decide which companies to include in it?

Abi Noda:

So at DX, of course, our customer network is, at this point, about 500 companies. For the purpose of this study, we took a sample of those companies. Just because of the effort required, we couldn’t really include every company. What we looked for in terms of the selection of the companies were companies that have reached a point of maturity in their developers’ adoption of AI. So we defined that as, I believe it was over 75% monthly active usage of tools. And in almost all cases, there’s been a significant rise in adoption over the past 12 months. We also narrowed the scope to companies with over a hundred engineers. We also excluded companies that’d gone through major liquidity events or M&A, IPO, like major changes where regulatory impact could come into play.

And in terms of the methodology, it’s really hard. It’s really hard to single out the causality in terms of like what is AI and what are the other confounding factors. One of the things I see folks do a lot is cross-sectional analysis of, “Hey, the developers using AI are more productive than the ones who aren’t.” And there’s flaws I think in that approach because oftentimes the developers using AI were the ones who were already coding the most. And so they always look better. In our case, what we did is look at longitudinal data. So we looked at the organizational level. As AI adoption has matured, what has been the impact on the organizational velocity and throughput? And we can talk more about what we mean by throughput and velocity in a moment.

Brian Houck:

Yeah, I love that. And as somebody who has been studying the impact of AI on the industry, being able to control for all of those different biases is so challenging. And so I’m glad that you took a very thoughtful approach. So I’m curious, as you set out to do this research, what were you expecting to see? What were some of your hypotheses? And then ultimately, what did you actually see? Were those hypotheses correct?

Abi Noda:

So as a developer myself, I had some sort of gut feel hypotheses, but to be totally honest, from a study standpoint, we really didn’t know. We were really eager to just see … I really didn’t know what the answer was. And we were really pushing the team to, “What’s the answer, what’s the answer? We want to know.” We want to know just as much as you guys probably want to know. And what we ultimately found was that the TLDR is that most organizations fall kind of between that 10 to 15% mark in terms of … And we’ll talk more about why we chose PR throughput as the indicator of velocity in a moment, but about 10 to 15% increase in throughput. And the actual median was around 8%, the mean was around 11%. So we can talk more about that in a moment as well.

So it was much more modest than I think we expected. I think it’s much more modest than many CEOs who are talking to their CEO friends about the 10x gains that the companies are delivering would expect. I think in terms of our engagements with different companies and customers, it wasn’t completely surprising. I mean, we’ll talk about why the gains are more modest than what all of us might’ve expected in a moment, but there’s a lot of factors underlying that.

Brian Houck:

All right. Well, to kick us off with maybe a spicier question this morning, so one of the central themes of the SPACE framework is that developer productivity is nuanced, it’s complex, it’s about so much more than just the count of activities. And so given that, why did you choose PR throughput as sort of like your measurement? And do you have anything you could say about the nuance of how you measured that?

Abi Noda:

Yeah. So I mean, of course, measuring velocity or productivity is in and of itself really challenging. And so for this study, we actually did look across a number of different metrics, more across the full spectrum of SPACE or the DX Core 4. But in terms of what we wanted to focus on and publish, we felt like PR throughput was the most relevant right now and practical for the world, mostly because that’s what we see most organizations talking about and focusing on. So we kind of just wanted to meet the world where it’s at in terms of how it’s thinking about this problem.

We also explored our proprietary metric, true throughput, which adjusts PR throughput with AI into sort of a weighted PR throughput. So it takes some of the noise out. Again, we liked that signal, but for relevance and practicality, we felt like PR throughput is a more accessible metric to focus on. We have more data on the other metrics and how those have been affected as well. So we’ll be eventually publishing those as well, but really, it’s about practicality and relevance in the moment, why we focused on PR throughput.

Brian Houck:

Awesome. So I suspect for many of you in the audience, your lived experience is that a 5%, 7%, 10% increase in PR throughput probably matches what you were feeling. That probably rings pretty true for many of you. I see some head nods. But as you mentioned earlier, when I go and talk to business leaders, they’re often expecting a 20, 30, 40, 50% increase in sort of quote, unquote, “productivity”. And so I am just curious a little bit, as you went and talked to developers, why do you think the gains we are seeing might be lower than some people, particularly those on the outside of the industry, might expect?

Abi Noda:

Yeah. So our team went and conducted follow-up interviews to really ask the question, “Hey, why aren’t the gains higher than what you’re seeing?” And there were a number of different factors. These were sort of the coded categories of responses. And this isn’t the full list, by the way. These are just the top five. The top one probably wouldn’t surprise most of us in this room. It was that coding is not the primary bottleneck for engineers. And Brian, of course, at Microsoft, you guys pretty recently published a study on where engineering time is spent. And I believe it was 14% of developer time is actually spent coding. So it’s a very small part of where our time and money is going. And so if AI is only optimizing that, it partially explains maybe why the enterprise velocity gains are limited.

We also heard about how these AI tools and new automation introduce new bottlenecks. And we all are thinking about this, things like review and technical debt, cognitive debt, a new concept. We also heard about challenges with adoption. So it was pretty interesting, a lot of social friction and cultural clashes around adoption that’s sort of inhibiting full adoption. And there were a few more. Just for brevity, I won’t go into all of them, but those were some of the major themes we heard about.
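To put rough numbers on the first factor (an illustration only, not a calculation from the DX study): if coding is about 14% of developer time and AI speeds up only that slice, an Amdahl's-law-style bound caps the overall gain. The sketch below assumes the other 86% of time is untouched and that throughput is inversely proportional to the time each unit of work takes. The function name and the speedup values are hypothetical.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's-law-style overall speedup when only the coding share gets faster."""
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Time per unit of work after the speedup, relative to a baseline of 1.0.
    new_time = (1.0 - coding_share) + coding_share / coding_speedup
    return 1.0 / new_time


CODING_SHARE = 0.14  # share of time spent coding, as cited in the talk

for s in (1.5, 2.0, 10.0):  # hypothetical coding speedups, for illustration
    gain = (overall_speedup(CODING_SHARE, s) - 1.0) * 100
    print(f"coding {s}x faster -> overall gain of about {gain:.1f}%")

# Upper bound as the coding speedup grows without limit.
max_gain = (1.0 / (1.0 - CODING_SHARE) - 1.0) * 100
print(f"upper bound: about {max_gain:.1f}%")
```

Under these assumptions, even an unlimited coding speedup tops out at roughly a 16% overall gain, which is one way to read why organization-level numbers stay modest.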

Brian Houck:

So I did have a chance to preview a little bit of the data and we see that, or you saw rather that sort of the typical gains in throughput were 7.7%, but you don’t have to go that far out in the distribution before … There were some organizations seeing substantially higher gains. And I’m always interested in where do we have some wild outliers. And given that, I’m curious, as you look at some of these outliers, what set apart some of those companies that might have had some disproportionately high gains from the more typical experience?

Abi Noda:

So we don’t have a great answer to this yet. We really focused on, “Why aren’t the gains higher?” as the first question. The second question is, okay, for those for whom the gains are higher, why? And I think there’s sort of two parts to that question in terms of how we want to approach it. One is that, for the companies whose gains are measurably higher, we just want to double click into that to make sure, are these superficial gains? Are these real gains? We want to peel back the onion a little bit.

And then two, we want to understand, okay, assuming these are real true gains, what are the strategies they’re employing? And by the way, we have speakers today who I think will be sharing some stories. There are good success stories and strategies that’ll be shared. I think a lot of it boils down to, what I’ve seen is sort of an all-in culture, a fully bought-in culture around centralized rollout and championing of these AI tools, obviously something you work on at Microsoft with your counterparts, but that’s the biggest thing I’ve seen. It’s more of a cultural shift toward really infusing AI into the entire SDLC, not just coding, and a really aligned cultural push to drive that adoption through.

Brian Houck:

That actually aligns with some recent research that I had published that showed that even things just like leadership advocacy really increased the gains that people saw from AI. So you said something earlier that I think is really interesting and is often a misunderstood point about software engineering. Software engineering is so much more than coding. Coding may be the thing that we, as devs, want to do. It is like a central part of our identity, but to your point, it’s only about 14% of our day.

And one of the things that I’ve been seeing in some of my research is, while, yes, we are producing 5, 7, 10, 15% more pull requests, we are doing it substantially more efficiently. So for hands-on-keyboard time spent coding, we’re actually producing about 40% more PRs per hour of coding. And so that hints that we are reclaiming some of that coding time. Do you have any idea where that’s getting reinvested? Some of it clearly into more throughput, but where else is that maybe getting spread out in the SDLC?

Abi Noda:

So this has been a similarly perplexing question. So for the past six months, I’ve heard customers asking, “Look, we’re seeing X percentage self-reported time savings from developers. They’re saying they’re saving this much time, but we’re not seeing that show up in our output and throughput. So what’s the disconnect? Where’s that time going?” We don’t know. This requires additional research, but I will say, I think one preliminary hypothesis is if we anchor to that 14% number, if developers only spend 14% of their time coding, then it would make sense that the time they’re recouping is ratably distributed across their activities.

So only 14% of the time savings go back into more code, which would explain why we don’t see one-to-one output gains mapping to time savings. I think another hypothesis, if we refer back to the question of why aren’t the gains higher is that some of the time savings actually come with side effects that require more time. So more time overseeing the work, QAing the work, reviewing the work. I think there’s also some anecdotal, like we’ve seen in our research a little bit, that there’s also the lag time. So developers using these tools, like what do they do while they’re waiting for the agents to produce the code? Do they go play video games, go for a walk? So yeah, those are some of the hypotheses, but we don’t have a clear answer yet.
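The first hypothesis above can be made concrete with a small calculation. This is a sketch, not the study's method: it assumes reclaimed time is spread across activities in proportion to how time was already being spent. Every figure other than the 14% coding share (the hours saved and the non-coding breakdown) is made up for illustration.

```python
def redistribute_saved_time(saved_hours: float, time_shares: dict[str, float]) -> dict[str, float]:
    """Split saved hours across activities in proportion to existing time shares."""
    total = sum(time_shares.values())
    if total <= 0:
        raise ValueError("time_shares must sum to a positive number")
    return {
        activity: saved_hours * share / total
        for activity, share in time_shares.items()
    }


# Only the coding share comes from the talk; the other shares are hypothetical.
shares = {"coding": 0.14, "meetings": 0.30, "review_and_testing": 0.26, "other": 0.30}

# Hypothetical: a developer reports saving 4 hours per week.
reinvested = redistribute_saved_time(saved_hours=4.0, time_shares=shares)
for activity, hours in reinvested.items():
    print(f"{activity}: {hours:.2f} h")
```

With four hypothetical hours saved per week, only about 0.56 hours flow back into coding under this assumption, so self-reported savings would not map one-to-one onto output.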

Brian Houck:

So you actually just started touching on it, but I am curious, there is a reason that we don’t just measure lines of code as a good measure of productivity. Just writing more code isn’t always the right answer. And I’m curious, as we are increasing velocity, what are some of the potential unwanted side effects that you’ve been seeing?

Abi Noda:

Quality and cost are the two that I hear our customers talking about constantly, right? Cost is something we’re all trying to wrangle right now or just stay on top of, get some visibility into. And quality is something that I think we all understand is sort of an underlying risk, but I think it’s still early days. We did see like AWS and Amazon go very public talking about this, but there’s some delay between the technical debt that we’re perhaps creating and the later consequences that may follow.

I think the biggest thing that I’ve been talking to customers about is sort of this cultural risk of, I’ve been calling it false velocity, right? And I see this in so many organizations right now. And to some extent, even us as developer productivity leaders can fall into this trap of just being so focused right now on showing off how much faster and prolific we are with AI and not really focusing on what meaningful improvement is this leading to.

So what I mean by that is, you probably have engineers in your organization showing off really crazy things that they can do with Claude or whatever tool, but we’re not asking, “Okay, is your team’s product velocity actually increasing?” And leaders are talking about, “Oh, how many more PRs and how many more lines of code we’re generating,” but is their roadmap actually accelerating and is the quality of their products and code actually sustainable? So I think there’s this risk right now of just being so focused on showing off what we can do that we’re not paying attention to like, are we actually getting better? Are we actually materially improving our businesses?

Brian Houck:

So that’s really interesting to me because one of the things that I’ve been trying to look at is like, what is the actual innovation velocity as a result of AI? Are we just shifting bottlenecks around, as you previously hinted at, or are we actually able to deliver innovation faster? And so what’s sort of your recommendation to engineering leaders who want to look at where they can put AI in lots of places in the SDLC, not just code execution?

Abi Noda:

I mean, that’s the biggest thing when I talk to leaders and hear about where their heads are at right now. A lot of folks are at a point where you’ve rolled out one or more of the popular coding tools to developers. Adoption is at a fairly satisfactory place. Maybe you’re kind of double clicking into that a bit more. So a good amount of folks are asking, “What next?” And especially if you’re sitting at that kind of 10% number, you’re asking, “Okay, well, we wanted 10X, how do we get there?”

I think the things I’m hearing about and seeing are, one, very clearly looking left and right of code. So we’ve solved the coding part to some extent. We’ve accelerated that, but as we know, if that’s only 14% of where our time and money is going, what about the rest of the 86%? How do we accelerate and optimize that? So I think that’s a really important question.

In addition to thinking about the SDLC in terms of left and right of code, there’s also, how do we improve the developer experience more deeply? So especially if you’re a DX customer, you’re thinking about things like deep work and documentation, these friction points that developers experience, how can we leverage AI to materially improve those things? And the third theme that I’m seeing a lot of organizations focus on is this idea of… There’s so many names for it, autonomous engineering, async engineering, background agents, this idea of… I think today we’re still at a point where most of us are focused on leveraging AI to accelerate the human work. Humans are still in the cockpit, they’re steering, they’re monitoring the work of the agents. How do we complement that acceleration with augmentation, which is the idea of more autonomous agents that are truly augmenting your human workforce and working in parallel, not underneath our human developers? So those are the three big themes. And of course, using data to help inform and guide these investments is really important.

Brian Houck:

So you said a word there that I think is incredibly important for a lot of my research, human. For those of you who might be familiar with my work, I really enjoy looking at sort of framing the human context of how we work. And so I will look at things like, “How does having plants in your office make you more productive? How does access to sunlight change your productivity?” Even things like, “How does…” Exactly. How does spraying lavender in your face while you sleep improve your productivity? And so you and I are both huge fans of Dr. Margaret-Anne Storey, and she’s been talking a lot about cognitive debt and sort of the human cost of some of these AI transformations. And I’m curious, what have you looked at in that area?

Abi Noda:

I’m a big fan of that research and those ideas that Margaret-Anne Storey… Peggy has recently published. I think that falls under what I was talking about earlier in terms of the risks and how I think there is also a time delay. This is also recent and new. We’re not thinking about, “Okay, what are the consequences 12 months from now if we adopt certain ways of working?” So absolutely, I think the loss of human understanding of the systems that we’re building is a really interesting risk. There’s sort of different takes on how material of a risk that is, depending on how you look at it. Some people might argue, “Well, look, it doesn’t matter if the humans don’t understand the systems, because they can just use AI to quickly regain that understanding when needed.” That’s one argument. Others may argue, “No. If you need to call the mechanic, they better be able to work on the machine and have mastery over it.” So I don’t think we know yet, but I think it’s a really valid and interesting idea.

Brian Houck:

So we’ve sort of hinted at it throughout this talk, this notion of, “Are we actually delivering innovation faster or are we just moving bottlenecks around?” And my first reaction for anytime we have an unanswered question is, “Well, how do we try to measure it?” And I’m curious if you have any thoughts on how should we actually be measuring if we’re just sort of shifting our bottlenecks?

Abi Noda:

Yeah. And as everyone in this room is probably thinking about, and as we at DX have been talking about, I think in our approach to measurement, some things stay the same and some things need to evolve and are evolving. Our perspective has been that… This is a framework we published, I think it was last September. So it feels really old in my mind, but I think that much of this still holds true today. There’s also a lot that has changed and is new that we’ve been working on at DX. But to some extent, how we think about the overall software organization and things like quality and velocity, I think stay the same. And especially if we’re trying to understand how things are changing with AI, you need consistent measures pre, post, during this transformation. But there’s also a lot of new tools, new ways of working, new workflows, many new workflows being born every week.

And so how do we sort of measure these new ways of working? Well, those do require new approaches. And I’ll touch on two things and we can maybe double click into them some more if we have time. One is, I think increasingly, I think there’s a need to separate out how you’re measuring. So you’re really thinking about that acceleration. So the human lift, like how much faster are our humans is one bucket and then the augmentation, like how much more capacity are we generating and creating with agents? I think thinking of those as two separate buckets in your overall formula is really useful. And then the question of like how do we measure agents is really interesting. How do we measure ROI? We have some ideas here that we’ve seen out in the real world work pretty well. So this idea of reducing our understanding of agents to agent hourly rate, which kind of gives us this return on investment idea that we can compare to, say, human hourly rate, which gives us an interesting comparison point.

Something really new that was not even in our minds last September, but some of you may have heard about, is this idea of agent experience. So in the same way that DX was founded, this whole idea, “How do we measure developer effectiveness?” Well, you go to developers and you get feedback and signal from them. At DX, we’ve just released a very similar idea, but for agents. How do you measure AI agent engineering effectiveness? Well, you go to the agents and you get feedback from them on where their bottlenecks and constraints are. So we’ve just rolled that out. We’re surveying agents. That’s pretty crazy.

Brian Houck:

Okay. That is wild to me. And I know we’re almost at time. So do you have one bullet? What’s one finding you have on how to make agents more effective?

Abi Noda:

Well, you got to ask them, just like we ask our developers. So it’s really early days, but we’ve initially begun with measuring four factors and I won’t list all of them, but for example, we’ll ask the agents, “How were the requirements that you were given? How easily were you able to understand the code base that you were working within? How well were you steered by the human that you were pairing with?” So these are the types of questions we’re asking agents in a calibrated way. So we get quantitative metrics from that as well as qualitative feedback. So the agent will explain where they had challenges or bottlenecks. So that’s the rough idea.

Brian Houck:

I love it. I love that the same principles of measurement apply. So that is unfortunately all the time we have. We were able to cover a lot of ground, thankfully. Abi, thank you so much for sharing some of these early sneak peeks into your research. Can’t wait to read the full report. So if you had any questions that we weren’t able to get to today, please find us throughout the day, particularly Abi. He’s the man with all the answers. And just thank you so much. And with that, I will pass the mic back to Justin, I think, to introduce the next session.

Abi Noda:

Thank you, Brian. Thank you so much. It’s great.

Justin Reock:

All right. Thanks, Abi and Brian. So what Abi just walked us through is the macro view, like the data, the patterns, what’s actually shifting across the industry right now. So we’re going to zoom in a little bit. Data kind of tells us what’s happening, but it takes leaders to tell you how to react and to respond to that data. So we have three of those leaders that are going to join Abi on stage right now to dig into what it actually means to design an AI native engineering organization. So I’d like to welcome to the stage our next panelists. We have Tim Bozarth, who is the CVP of CoreAI at Microsoft. We have Taroon Mandhana, who is our CTO at Atlassian, and we have Nancy Wang, who’s the CTO at 1Password. Now, just like before, we only have 45 minutes and there’s a lot of ground to cover, so there won’t be time for audience Q&A in this session, but please find the group afterwards. Thanks so much.


Show notes

Most organizations are seeing modest gains from AI

Coding is only one part of the productivity equation

Why productivity gains are lower than many leaders expected

Beware of false velocity

The biggest opportunities lie beyond coding

Measurement frameworks are evolving

Cognitive debt is a new concern
