
From PR throughput to product velocity: How Dropbox is rethinking productivity in the agentic era

In this session from DX Annual, Uma Namasivayam, Senior Director of Engineering Productivity at Dropbox, shares how the company's developer productivity efforts evolved from improving developer experience to preparing for the agentic era. He explains how Dropbox approached AI adoption across its engineering organization, the impact it had on developer productivity, and why faster code generation is creating new bottlenecks in areas such as code review, validation, and CI/CD. He also discusses Dropbox's efforts to rethink engineering systems, measurement, and workflows, including the development of agentic tooling and new metrics designed to move beyond PR throughput and toward product velocity.

Show notes

Dropbox’s productivity journey started before AI

  • DXI helped Dropbox identify productivity problems as system problems rather than talent problems. When the company began measuring developer experience in 2023, it found significant variation across teams in DXI scores, PR throughput, and cycle time.
  • Measuring developer experience created a framework for prioritizing investments. The team used DXI to identify friction across areas such as debugging, documentation, and build systems while giving leadership a common language for discussing productivity.

AI adoption required more than access to tools

  • Dropbox combined executive support, developer segmentation, enablement, and strong guardrails to drive adoption. Different teams and developer roles were matched with different tools and workflows based on their needs.
  • The approach helped Dropbox increase AI adoption from roughly 30% to 100% within six months. During the same period, PR throughput doubled and developer satisfaction with AI tools increased significantly.

Engineers used their extra capacity to tackle neglected work

  • As AI increased throughput, engineers naturally pulled maintenance work, migrations, and technical debt from the backlog. Dropbox saw significant growth in these categories without any specific direction from leadership.
  • The additional capacity was often reinvested into engineering health. Teams used the opportunity to address long-standing issues that had accumulated over time rather than focusing exclusively on new feature development.

The next challenges are scale, trust, and measurement

  • Dropbox believes the move to agentic engineering creates three major challenges: scale, validation and trust, and measurement. Existing development systems were not designed for a world where AI dramatically increases code throughput.
  • As code generation accelerates, bottlenecks are shifting toward code review, validation, and CI/CD systems. The company is already seeing pressure move downstream in the software development lifecycle.

Agentic engineering requires redesigning the entire system

  • Uma compared the transition to the shift from steam-powered factories to electric factories. The biggest gains came from redesigning the entire system rather than simply replacing one technology with another.
  • Dropbox is investing in agentic workflows across the SDLC and building Nova as an orchestration layer. The company is evaluating roughly 30 development steps, and one in twelve pull requests is already being generated by Nova.

PR throughput is becoming a less useful measure of productivity

  • Dropbox believes traditional engineering metrics need to evolve alongside AI. As agentic workflows become more common, measuring productivity through pull request volume alone provides an incomplete picture of engineering output.
  • The company is increasingly focused on metrics such as AI contribution, loaded cost per PR, agentic workflow coverage, work distribution, and time to ship. The goal is to better connect engineering activity to customer value and business outcomes.

Timestamps

(00:00) Intro

(00:57) The beginning of Dropbox’s DX journey

(02:34) AI adoption at Dropbox: what made it work

(04:46) The results of Dropbox’s AI adoption efforts

(05:39) What the results mean for the business

(06:55) The phases of AI adoption and where they are now

(08:00) The new bottlenecks

(09:16) Three challenges Dropbox faces moving into agentic engineering

(10:05) How Dropbox is redesigning the SDLC for agentic engineering

(15:46) The new metrics that matter

(19:16) Final takeaways

Transcript

Erin:

Hi, everyone. Welcome back. We still have another minute as everyone gets settled in. If you were already on the main stage talk that we just had, if you don’t mind also squeezing in again, just because we’ll have the entire group here, want to make sure that everyone has a spot to sit if they can. We’re going to get ourselves moving.

Also, we’re saying as we go into these last two sessions, we’re not going to be doing live Q&A anymore, but the Q&A within the actual event app will still be open for you. So please keep submitting questions that you want answered. We’re going to hear from Uma in a second, as well as one more panel to close us out. We want to make sure that we’re still covering the things eventually that are coming up for you as you’re listening to those talks.

Okay. Well, all day, we’ve been hearing about pretty similar themes about how AI is changing the way that we build things, what we’re building. Our next speaker has spent pretty much the last year thinking really deeply about how we can get ahead of what he refers to as a potential scalability crisis. So as we dramatically increase our code generation, our systems are being pressure-tested, we’re returning to this industry-evergreen question of how we really measure developer productivity. Do our frameworks still apply? So pretty on theme with what we’ve talked about today, but we’re going to hear it from an engineering and productivity leader’s perspective. So Uma Namasivayam is the senior director focused on engineering productivity at Dropbox. So in a company where the entire product is built on making knowledge work more efficient, Uma and his team have really had to rethink what productivity means in the agentic era from the inside out. So please welcome Uma to the stage.

Uma Namasivayam:

Thank you, Erin. Appreciate it. Good afternoon, folks. So good to be here. So the title of my talk, “From PR Throughput to Product Velocity,” I chose this title really, really carefully because we have been counting PRs for a very long time and we are moving towards actually how fast we can ship products, and that’s a journey Dropbox has been going through for the last three years. But when I came here, when I listened to the last three talks, I realized that everybody’s saying the same thing. Either I’m very, very influential or something else is going on. But jokes aside, it is very, very good to see that the industry itself is converging towards a specific way of doing things. So we can actually learn from each other and actually take this industry innovation to the next level. So in my talk, I’m just going to talk about our journey through AI adoption, where are we today and where are we going forward? So that’s kind of a wrap-up for all the things that is happening today.

So our journey with developer experience started back in 2023. We partnered with DX. We ran a lot of benchmarks. The numbers weren’t great, to be honest. So the developer experience investments started in 2023. We did invest in infrastructure, the workflows, the upskilling of the team end-to-end. What we also realized was it was not a talent problem. It was more of a system problem.

Just to give you some perspective of how the metrics were shaping then, our baseline metrics. Our DXI score was 52. For the folks in the room, DXI is developer experience index. It’s DX’s effectiveness scale. So what it measures is it measures across 12 dimensions like production debugging, documentation, build and test, you name it. What was so useful about this metric was when we go to the board, we can actually talk about how are we doing on specific areas and how the teams themselves are doing. So we got a heat map of different teams across different dimensions, which was very, very, very powerful, and that’s what we started with. And PRs per engineer per month was at less than 13. Also, a huge variance between the different teams. It may sound very, very archaic looking at the number now with where agents are going, but that’s where we started with. Cycle time, the time it takes to actually write the code to land in production was five hours. And that was also kind of very, very high variance across the world. So overall, the numbers weren’t good, but we invested very, very aggressively over the last two years. And that actually helped us to get to the next phase of our world, which is AI adoption.

So as you all know, LLMs made code generation very, very next level of the model adoption. So late 2024, there were folks using AI organically, but it was not actually managed properly. There was no playbook. We were actually building the playbook as we ran through. So our actual AI adoption journey started in early 2025. And before I actually talk about how it actually panned out, I just want to also summarize what actually made it work.

So top-down executive support. I know a lot of folks before me also said the same thing, but there’s some nuance here also. Leadership coming in and saying that AI is important drives the right type of culture from top down, not like a mandate, but at least it helps. In addition to that, making sure your processes are also helped in a way where we are actually moving faster within the company. Speed is going to be of the most important essence. We actually looked at our whole legal procurement process to reduce the time it takes to bring more and more tools inside that all required leadership support and was really, really helpful.

Then was developer segmentation. Just like our DXI, we looked at how early adopters were using AI, how the skeptics are using AI, and we had very, very different approaches for them. In addition to that, just like the segmentation, different segments need different tools. A web developer versus mobile developer versus desktop developer has different types of AI. And we started matching those tools for the different developer segments and that all brought it together.

And enablement and change management, this is something that we did not take very lightly. Our training matters a lot. The models were evolving so fast. Cursor was evolving so fast. Claude came up with something very, very unique every single month. So we brought in folks from those organizations to give training for our developers. And AI champions, folks who actually did a really, really good job within an organization, they also were incentivized to come and show what wins they had. And this whole feedback loop made this process really, really fast.

And last but not least, but most important one, your safety and guardrail mechanism. We worked very closely with the security team and also making sure the code quality is not regressing. And this was something that was paramount for how it actually worked.

So this is all great. So what happened with these numbers? 100% adoption by year end. We went from literally 30% to 100% in the six months, and this is not just people using tools, people actually making behavior changes with all the tools that they have. It also gave us 2x more PR throughput. And this also has a variance. There are some folks in the teams who were going at 3 to [inaudible 00:09:33] when we started in 2025, but overall as an organization, we went to 2x. Our DXI score also increased because of the investments we made. Just a caveat here, if you are comparing the numbers from 2023, the DXI had some changes in how they were measuring it. So in general, the numbers actually went up, but that’s also a good investment that we made. In addition to that, the tool CSAT, in terms of how the developers thought about the tools, that also went up by quite a lot from 69% to 82%. So really good metrics, but what does it mean for the business?

So this is an important slide that I want to actually spend some time here. What you’re seeing in the chart here is the amount of PRs that are categorized by different categories like security/compliance, tech debt/migrations, you name it, from 2024 till end of last year. On the Y-axis, you have what we call the weighted PR count, which basically takes into account the complexity of PRs and whatnot. And we started our AI adoption phase in 2025, May or April, something around that time. You can see the orange line. You can see a huge increase in how the maintenance PRs or the CTO Orgs are going up. And this is a trend that we saw across all the different organizations also.
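The talk does not say how the weighted PR count is computed, only that it accounts for PR complexity. As a rough illustration, here is a minimal sketch that assumes each PR carries a category and a complexity tier, and that each tier maps to a fixed weight. The tier names, the weights, and the function name `weighted_pr_count` are all hypothetical, not Dropbox's scheme.

```python
from collections import defaultdict

# Hypothetical complexity weights; the real weighting is not described in the talk.
COMPLEXITY_WEIGHTS = {"small": 1.0, "medium": 2.0, "large": 4.0}


def weighted_pr_count(prs: list[tuple[str, str]]) -> dict[str, float]:
    """Sum complexity weights per category.

    Each PR is a (category, complexity_tier) pair, for example
    ("tech_debt", "large").
    """
    totals: dict[str, float] = defaultdict(float)
    for category, complexity in prs:
        totals[category] += COMPLEXITY_WEIGHTS[complexity]
    return dict(totals)
```

Plotting these per-category totals month by month would give a chart of the kind described, where a rise in the maintenance line shows up even if raw PR counts are dominated by small changes.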

So what we realized was engineers themselves, when they got the extra capacity, they were pulling in from the backlog to work on the tech debt that has been neglected for quite some time. So this was a very great insight. We didn’t do any sort of communication or whatnot. This happened very, very organically and the throughput actually goes up. So this is great. This is probably a time where I can say, “I’m done. Great presentation,” and move on, but there is more to that.

So we solve the great adoption problem, okay, which is amazing, but the reality is the bottleneck starts arising from now on, and that’s a phase we are in right now. So before I go into the bottleneck, we believe the AI code generation stages are in multiple phases. Early 2025 was all about autocomplete and chat-based assistant. Developers used to talk to Claude Code, get snippets. We had plugins for a VS Code, JetBrains and whatnot. And this was basically making the life faster. You saw the 2x increase coming from this, but right now, we are somewhere between three and four in Dropbox where AI agents are able to inspect the repo, edit files, run tests, and iterate on it. And the ultimate goal is folks can start managing async agents by themselves. And all it means that your code generation volume is going to go up through the roof pretty soon. So we are somewhere between two and three, depends on where it is. We have some high-achieving developers who are in four. So it depends on where they are, but the spectrum is between two, three, and four right now.

So the question I want to ask all of the leaders here, engineers here is, if your code throughput goes up by 3x tomorrow, would your SDLC be able to absorb it? We have seen some early signs it is not the case. It breaks everywhere. I’m sure it is a case for everybody out here also, but that is the problem statement that we are going after right now.

Claude, can you move this? Now, it’s all right. Okay. So what we have seen is the bottleneck has moved. AI-accelerated action is actually in the code generation phase. I’m not going to spend more time on that, but what’s happening right now is we are seeing the code review time actually increasing quite a bit. In addition to that, because of the speed at which code is getting generated, we also believe the cognitive overload for developers is high. They’re not able to actually make the right judgment calls on whether the code is right. In addition to that, they also believe agents are also going to deliver better code. So the quality aspect of code review is actually a little bit suspect at this point.

The CI/CD load is also going up. For every PR, we spin up a lot of builds and we are seeing an explosion in terms of the number of builds that is happening. There’s a cost concern as well, so that optimization has to be done there as well. And the validation time also is increasing, so your wait time for the PR cycle time is also going up. Pretty much your bottleneck is moving to the right. So faster code generation means more and more pressure during the downstream stage.

So to summarize, what are the three challenges that we are looking at? I talked about scale. So AI is actually letting engineers initiate more and more parallel work, expanding the engineering surface area also. So the system that we had in the past is definitely not built for it. So that is one problem. As a result, the validation and trust, the quality aspect of it, the security aspect of it is also a little bit suspect. So we have to make sure that we are thinking about those as we build the system again.

And I know the Uber folks also talked quite a bit about measurement. We also believe that when your AI throughput is increasing significantly, all the things that we did in the past in terms of measuring PR throughput is no longer valid. We need to think about it very, very differently in terms of how the customer value is actually getting generated. So we are also redesigning the entire measurement system for the agentic SDLC, and those are the three challenges we believe in the next six to 12 months that we need to solve for.

I think it was Airbnb that also came up with a very similar analogy. I didn’t talk to them, but we also have a very similar analogy here. So in the 1900s, we had a bunch of steam engines. Folks came up with an electric motor. There is one philosophy where people can just take away the steam engine and put the electric motor. You get some benefits. That’s where most of the engineering orgs are there. But the better way to do it is like rebuilding the factory, making sure your agentic SDLC is actually the center of it and reorganize everything around it. And that’s what we are going towards right now. And most orgs right now are just perhaps swapping the power source. We are trying to rebuild the factory as we speak.

So this also has a mindset shift that needs to happen. In the past, it was all about how can you fix tools for certain problems. Okay. AI coding assistant problem has been solved. Oh, now we need to fix the review checklist. Can you go buy some software? The metrics for the software will be different. Similarly, for CI pipeline and monitoring. Each challenge was treated very, very separately, and that was actually the methodology that was done in the past. Right now, in terms of how we want to think about redesigning is, let’s look at the SDLC end-to-end. How can we infuse agents across the board so we can move away from local optimization to system optimization? So the takeaway is the system is going to be the unit of analysis, not the tool itself. So your metrics, your scale, everything has to be adapted towards the entire system redesign.

So how is Dropbox responding for this one? So I talked about validation and trust, right? So the real challenge is when your code generation is actually moving really, really fast, the other parts of the system cannot actually keep up. So we need to bring some form of agentic workflows for the other parts of the system, so we can keep up with the speed and also have the right [inaudible 00:16:21] guardrails.

The next is also scale. This cannot be done with humans. You need to have some kind of an orchestration layer, bring it all together, and I’ll talk a little bit more about Nova in a bit, but that’s something that we are building also.

When the system is going to be a unit of design, the whole measurement processes also has to be thought across the system, and we are actually moving away from local metrics to system metrics. I’ll talk about it in a minute.

And I don’t want to actually undermine, learning and behavior shift is also very critical. We are getting to a path where engineers themselves have to move from the way they were coding to agentic engineering and managing agents and harnesses. That’s also an area we are investing.

So we want to expand beyond AI [inaudible 00:17:04] on coding. In Dropbox, we did an audit of the amount of SDLC steps that a developer has to do. We have close to 30 steps right now. And we are actually looking at each and every step. How can we bring agents? How can we learn from each of those rollouts and ideate from it and have different metrics for those things?

So we are building across the different areas. We are, right now, at close to five or six steps that are getting agentic and it’s increasing. Good example is code review. We have agents that are looking at the existing code review, looking at all the commits and providing a risk scale for this particular review and so that the user or the developer can actually look at it and make a call.

Flaky tests, any sort of anomalies that you have in the system is generally getting magnified with the speed. So the flaky behavior where the tool can also look at the issues and bring it all together is an area we’re investing. And we also heard from the other folks and from Netflix and also from Uber, migration is an area where we’re actively investing. These are areas where there is deep toil, and this is where we are also making agentic SDLC possible here from an investment perspective.

So Nova is the orchestration layer I talked about. We want to actually automate the SDLC, but it cannot be done manually. So this is kind of the add chat that Airbnb talked about. This is kind of the agent platform. You should think of it as it has the full codebase understanding of Dropbox. It knows how a Dropbox engineer works or the Dropbox practices are built in. The security checklists are there. And it also has connection to all the MCPs and internal bespoke systems that we have. It has a strong front end and also the async workflows.

So how it helps us in two different ways. One, a developer can actually initiate Nova job from anywhere like Slack, Jira, GitHub. So that actually increases the speed at which you can deliver. Second, for the agentic SDLC piece that I mentioned before, Nova becomes the orchestration platform that connects all the pieces of SDLC. And it can actually move from entry point to pull request in a very short period of time. This is a system that we are investing in. And again, this is also a very lighter investment that we are making for the folks that are thinking, why do we invest in here when we have Claude Code and others? The integration layer is needed anyways. In case there is a future system that is coming in, you can also connect to it. But our perspective is we don’t want to have a vendor model lock in. We want to have a harness that ties into from different vendors. So we can also have multiple vendors that are here. And that’s investment that we’re making in Nova.

And we all like this number. I know some folks also talked about one in nine millions of PRs that actually get generated. We are at one in 12. And again, this is an alpha product. We did not even do any sort of communications. We just opened up on a Slack channel. Engineers themselves are actually adopting Nova big time. And one in 12 PRs right now are produced by Nova end to end.

We believe as we start increasing more and more capabilities, and we are working with the developers, this number is only going to go up significantly. And we are using for migrations, we are using for PR generation across from spec in to spec out, and also for async workflows.

So really, really strong progress. Highly actually recommend that this is an area of orchestration that needs to be invested across the industry, and this will be a big thing in the future.

The last section, I’m going to talk about measuring what matters in the system. A lot of us before also talked about the same thing. PR throughput is the North Star, but it doesn’t work in the new system when we talk about agentic SDLC. The work done as a count of PRs is good, but we really need to think about what does actually the customer get in terms of revenue? What does the customer get in terms of value? And how the work actually happens?

So how are we thinking about the framework here? We are thinking about more of a product funnel framework. The stage one is like, how is the fuel? How is the token consumption happening across the different agents, different users? We need to have a full idea around that. And then adoption. How is the user actually using those tokens to build those products? Where is it getting invested? And also for the agent workflows.

One and two is great, but if you do all these things, how do you actually measure what kind of AI output are you getting? What portion of the PR is actually getting done by AI? We don’t know. These are the things that we need to start measuring.

And finally, impact. This is the most critical one because I’m sure, at some point, the CFO is going to ask about your token usage is super high. What’s going on in the ROI? So that comes up all the time. So impact is going to be critical. I know the Uber folks talked about time to ship and in general we lost your leverage. This is an area also we are actually investing.

A quick glance of what are the metrics that we are looking at. Number one, AI PRs. This is the actual contribution of AI to a PR. A quick stat here. Three months ago, this number was at 30%. Right now it’s at 60%. We didn’t do anything. All the models got better. And we also rolled out Nova. The numbers actually went up. Similarly, AI lines of code. This is all the lines of code that is getting delivered. What portion of it is actually delivered by AI? That is something that we are measuring. And the third one, we also want to measure the loaded cost of a PR. This is going to be an important measuring in the future because costs are also going up. This does not just look at the PR specifically, but also what does it take to look at reviewing the PR, landing the PR, build end-to-end PR. So we can also think about what models to invest in what features. So that is where the cost lens here comes into place.
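To make the first three metrics concrete, here is a minimal sketch of how they could be computed from per-PR records. Everything in it is an assumption for illustration: the `PullRequest` fields, the idea that "AI PRs" means the share of PRs with any AI-attributed lines, and the four cost components that make up the loaded cost. The talk does not specify how Dropbox attributes lines to AI or prices review time.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical record for one merged PR; all field names are illustrative."""
    total_lines: int        # lines of code delivered by the PR
    ai_lines: int           # lines attributed to AI (attribution method assumed)
    generation_cost: float  # e.g. token spend to produce the change
    review_cost: float      # reviewer time converted to a cost
    ci_cost: float          # build and test compute
    landing_cost: float     # merge and rollout overhead


def ai_pr_share(prs: list[PullRequest]) -> float:
    """Fraction of PRs that contain any AI-attributed lines (assumed definition)."""
    if not prs:
        return 0.0
    return sum(1 for pr in prs if pr.ai_lines > 0) / len(prs)


def ai_loc_share(prs: list[PullRequest]) -> float:
    """Fraction of all delivered lines that are attributed to AI."""
    total = sum(pr.total_lines for pr in prs)
    if total == 0:
        return 0.0
    return sum(pr.ai_lines for pr in prs) / total


def loaded_cost_per_pr(prs: list[PullRequest]) -> float:
    """Average end-to-end cost per PR: generation, review, CI, and landing."""
    if not prs:
        return 0.0
    total = sum(
        pr.generation_cost + pr.review_cost + pr.ci_cost + pr.landing_cost
        for pr in prs
    )
    return total / len(prs)
```

Keeping the cost components separate rather than storing one total makes it possible to see which stage drives the loaded cost, which matters if review and CI are becoming the expensive parts.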

The fourth one is work distribution. This goes back to the ROI. Looking at the developer time, as well as the agent time, where is it actually invested? Is it for tech debt? Is it for migrations? Is it for changing the business in terms of new features? That’s another area we are also investing in.

And fifth one for us is also very important. For us to reduce the bottleneck, agentic SDLC coverage is very, very critical. And we are at like 5 out of 30, like I mentioned, we want to get to 100%, but we want to keep on iterating on this one. And time to ship metric, the Uber folks talked about, I’m not going to go into the detail, but it’s again, idea to customer value, very, very ephemeral metric, but this is going to be the North Star for AI investment in the future.
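The coverage and work distribution metrics are simple ratios. A minimal sketch, assuming coverage is the number of agent-assisted SDLC steps divided by the total number of steps, and that work distribution is the share of combined developer and agent hours per category. The function names and the category labels are hypothetical.

```python
def agentic_coverage(agentic_steps: int, total_steps: int) -> float:
    """Share of SDLC steps that have an agentic workflow."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return agentic_steps / total_steps


def work_distribution(entries: list[tuple[str, float]]) -> dict[str, float]:
    """Share of total hours per category.

    Each entry is (category, hours); developer and agent time can both be
    logged as entries against the same category.
    """
    totals: dict[str, float] = {}
    for category, hours in entries:
        totals[category] = totals.get(category, 0.0) + hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {category: 0.0 for category in totals}
    return {category: hours / grand_total for category, hours in totals.items()}
```

With the figures from the talk, `agentic_coverage(5, 30)` comes out to roughly 17%.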

All the AI metrics are great, but when we think about quality, this is an important piece of the puzzle that we also need to bring together just as a checks and balances. So in Dropbox, we have every single system in terms of components, and those components have their own SLOs also. We want to make sure the SLOs are not degrading because of AI rollout. But some quality metrics that we are looking at is doing like A/B testing, human generated PR versus AI PR. How is the review SLA? Is it going up or going down? Pass rate for CI/CD. Is an AI PR actually passing better than human PR or at least at the same level? Is something that we are looking at. Defect ratio and rework rate similarly. That is also something that we are bringing together as a quality metric together.
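One way to picture the human versus AI comparison is to split PR outcomes into two cohorts and compute the same quality metrics for each. This is a minimal sketch under stated assumptions: the `PrOutcome` fields, the definition of rework as "needed a follow-up fix", and the use of mean review hours as a stand-in for review SLA are all hypothetical, not Dropbox's definitions.

```python
from dataclasses import dataclass


@dataclass
class PrOutcome:
    """Hypothetical per-PR quality record; field names are illustrative."""
    ai_generated: bool
    ci_passed_first_run: bool
    reworked: bool        # assumed to mean the PR needed a follow-up fix
    review_hours: float   # time from review request to approval


def cohort_summary(outcomes: list[PrOutcome], ai_generated: bool) -> dict[str, float]:
    """CI pass rate, rework rate, and mean review time for one cohort."""
    cohort = [o for o in outcomes if o.ai_generated == ai_generated]
    if not cohort:
        return {"count": 0, "ci_pass_rate": 0.0, "rework_rate": 0.0,
                "mean_review_hours": 0.0}
    n = len(cohort)
    return {
        "count": n,
        "ci_pass_rate": sum(o.ci_passed_first_run for o in cohort) / n,
        "rework_rate": sum(o.reworked for o in cohort) / n,
        "mean_review_hours": sum(o.review_hours for o in cohort) / n,
    }
```

Calling `cohort_summary(outcomes, True)` and `cohort_summary(outcomes, False)` on the same time window gives side-by-side numbers. A real comparison would also need to control for PR size and type, since AI-generated PRs may skew toward different kinds of work.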

So finally, to wrap this up, I want to leave with four different takeaways. I’m sure the world right now is moving towards agentic coding. Your PRs are going to go up very, very fast, if not for today, in the next couple of months. It is coming for sure. You’ve got to think about your bottleneck map. Where does the bottleneck exist? It’s most likely in the stages other than code generation. So let’s start looking at the whole system and you start thinking about the bottleneck overall.

Also start investing more in validation, not just generation, like I said before. The biggest risk is losing trust in the code base from an engineer and also the customer. So building guardrails for agents, building guardrails for the security before you scale is also going to be extremely critical.

And the third one that is close to my heart, you have to run it like a product, measure what matters. This starts from also looking at the system holistically, have a good really strong metric framework, have a really strong roadmaps, but then also look at rolling out things faster and also learning and iterating from it rather than just waiting for the perfection here. So run it like actual product and learn from it.

And the last one is build the connective tissue deliberately. Every company is going to be different. For some of them, it is going to be building it. For some of them it’s going to be buying it, but this is something that is going to be extremely critical as you build your existing systems and connect them all together. So this is an area of investment that folks have to do it in the future also.

With that, a lot of tools are out there. Things are actually going to accelerate from here. You’ll see more and more gains. I really believe we should all start rebuilding and learning from each other. And thank you so much.