Dropbox's journey with developer productivity metrics

Utsav Shah shares the story of Dropbox’s journey with measuring developer productivity. Utsav discusses what he learned about both system and survey-based measures, his opinion on the usefulness of common Git metrics, and more.


Abi: Thanks so much for being on the show today. Really excited to speak with you.

Utsav: Thank you for having me.

So, as I understand it, you have a pretty awesome story to tell about your time at Dropbox. I know you mentioned you were around 900 engineers at the time, this is several years ago. And when COVID hit, your CTO asked your group, and we can go more into that in a minute, to figure out a way to measure developer productivity, and this mandate came down to you. So I’d love to zoom back to that time and start with just what was going on at Dropbox when COVID hit?

Yeah. I think I’ll give a little bit of background of, how did we get here? So I started on a developer tools build infrastructure team, way back in 2017. It was my first job out of college, and I was really interested in this one particular team, and that’s not how most new grads think. So you have to ask yourself why. So I interned at Google right before and I had just fallen in love with the build infrastructure, like Blaze build X, Y, Z, and you have 10,000 machines just running commands for you. It was just unbelievable. And then when I was interviewing for my first job, I met the tech lead of my team at that point, Kyle, who said, “We’re trying to rebuild that here at Dropbox. We are doing our migration to Bazel and we really want to build that infrastructure.” So I’m like, “Okay, I’m going to work on that team.”

So from 2017 to 2020, pretty much I think in the middle of 2020 I was on all various flavors of build infrastructure, developer tools, developer velocity stuff. We had actually been running engineering effectiveness surveys all throughout that time and I had just learned a lot from the other engineers who’d been around at that time. But then by 2020 I was the tech lead of the team. I had kind of been around the block a little bit and COVID hit. Everyone had to work from home and the first question on everyone’s mind was like, “How is this affecting us?” First there’s of course stay safe, stay at home. A few weeks passed by, are we still getting our jobs done? Is anything happening? And it’s really hard to tell at the executive level. You can kind of tell, okay, are we still shipping products or not? But are people being more effective?

And I had opinions, for sure. And that question came down to me. But you hear a lot about leadership wanting to be metrics driven. So they kind of wanted metrics. And at that point I was dubious. I’m like there is not a single metric or there’s not maybe a couple of metrics that I can distill for 1000 engineers to kind of go back up and show you. But they still wanted metrics. So the first easy thing… I mean that’s all the background. I can talk about more details about what we try to do. I can go into something else.

Yeah. So I’m curious, you mentioned you had an opinion in terms of how COVID was impacting developer productivity at Dropbox. So what was your, on the ground, opinion?

Yeah. It’s again hard to tell on an aggregate what people are feeling. But if you look at just how much CI was costing us every month, it’s a good proxy of are people making any pull requests or not? That was all the exact same. I don’t think on any infrastructure graphs you could have noticed a difference between COVID and non COVID. The number of pull requests being merged, the number of commits across all of our different repositories. It all stayed the exact same.

So my opinion right off the bat, and I would’ve noticed if anything had changed because our spend would’ve changed, our profile of what we need to focus on would’ve changed, but everything looked the exact same. We can just carry on with our roadmaps. So, that was my original impression. The remote work I had not known, I had one remote worker on my team and he seemed to be fine and we were all doing the same. So my initial impression was there is no metric on this graph that can tell the difference between pre COVID and post COVID. But then leadership wanted a metric anyway so that they can catch if there’s any problems going on. So, that was my original opinion. Yeah.

Yeah that’s interesting. So sounds like you kind of just looked at the metrics you already had, particularly around your build tools and conveyed that to leadership. But it sounds like it wasn’t enough, they weren’t convinced.

Yes. So they were not happy with the state of metrics that we had and that’s for good reason. Things like commit throughput, it is an extremely bad proxy for anything. You need to at least be able to slice and dice per org, per group. I mean does it really make sense to measure the number of commits going into a shared repository between 30 teams? They were understandably not happy. And this is something that I wanted to solve a while back. The thing was that Dropbox had kind of started in early 2007, 2008. So we kind of have all of these homegrown tools. We had a homegrown CI tool, we were using Phabricator, which is the Facebook open sourced version of a code review tool. And one big challenge in my mind was that you cannot easily integrate these tools with everything else in the ecosystem. The thinking around developer productivity metrics and all of that has kind of really expanded from 2007.

And since GitHub has kind of stolen all of the market share, I shouldn’t say stolen, taken all the market share around repositories, all of the tools around measuring developer productivity like your GitPrime and Code Climate Velocity, tools that I wanted to use for an extremely long time, they don’t integrate with Phabricator. So my original thinking was that GitHub is popular, engineers like using GitHub. I need us to migrate to GitHub because not only does that open up all the developer velocity improvements of just having a tool that everyone knows how to use.

But also I get all these things, these amazing metrics. I could integrate it, GitPrime with our current set of Git metrics. But I think it’s just not enough. You need the full story, you need how quickly are Jira tasks getting done on time. And you kind of want to combine that with that qualitative stuff and surveys. So, that was what I had tried to do. I had started a conversation with GitHub and they just told me at that time I think this was early in the Microsoft acquisition era that there’s no way they’re going to support us. I was like, “Okay well I guess I’m going to wait.” Yeah.

That’s so funny. Well, I was working at GitHub at that time so that’s interesting to hear. So sounds like you had sort of an interest in this problem for a while. You mentioned you were aware of some of these vendor tools. I’m curious, were you also influenced by books like Accelerate or the SPACE Framework, which also came out a little bit after COVID hit. So I’m curious, what was your personal view on how this problem should be approached? Did it really boil down to the Git-based metrics or did you have kind of… Yeah, what’s your sort of high level view on the problem?

So yeah, I’d been interested in this problem since 2018. Since a year of joining in that role. That time I think SPACE wasn’t a thing. I think Code Climate Velocity and GitPrime were the only two vendors and now GitPrime has been acquired by Pluralsight. But that time GitPrime was the fancy tool, it’s the number one player in the space and they had a podcast episode which I loved and I was like, “I need to get this tool for us.” So my opinion kind of evolved over time because we used to do these engineering effectiveness surveys once a quarter and we used to get a lot of information from that. But then engineers complained about survey fatigue and it’s a pretty rational, it’s a good complaint, it’s a reasonable thing to say. It’s once a quarter, writing down 30 questions is kind of annoying. So we moved to once a half.

And then for 2020 when COVID happened we didn’t want to throw yet another survey when people are just adjusting, it’s still brand new. So we kind of waited and we sent it right towards the end and I think now we’re just doing once a year. So I had started seeing that information come in, outside of learning about the industry, and we also had an attempt to measure things like code review velocity, cycle time and stuff. So GitPrime was talking about all of these metrics that seemed pretty interesting, like code review cycle time. Yeah, that is pretty cool. So we tried to build a state machine type thing using our internal tools so that we don’t have to go through the effort of migrating to GitHub or something like that. And we don’t have to go through a vendor review. With an enterprise company and a big logo like Dropbox, it can take six months for the security approvals to come through, especially if you’re reading source code.

Can we hack something internally? And I think it was a valiant effort, I think we tried, but the code review metrics that we came up with were just never good enough to use. Some of the metrics we came up with. We started instrumenting local developer tools like Git and the linter and we started instrumenting code reviews. But we could never tell the full picture. And I was always suspicious of, can any tool really tell the full picture? Because a lot of information needs to be both from that low level information as well as the qualitative, survey information. At the same time from the surveys, the three biggest problems were always documentation, open source, and then the third place… No sorry. Not open source. The open layout, the open office layout and the third always traded places. And number one and two were always documentation and the open office layout.

So I was like, “Is there so much I can do as a developer effectiveness person if the biggest problems are stuff that’s really hard for me to fix?” So I think some other teams tried to build up a knowledge group and a technical writing group, we migrated documentation tools. Efforts that we tried and failed at, honestly speaking. So my opinion was just trying to absorb all this information. And over time I became more opinionated that there’s not just one metric that can give us an answer. There’s a ton of things. The linter getting slower in one week, by two x, is bad. Is it easy to put that into a large framework and measure that and show that in one metric somehow? No, that’s not really possible. The linters for the desktop team are going to be much slower than the linters for the server team.

So you need to be able to slice and dice. You really need to be good at handling data, looking at data and slicing and dicing appropriately. Sometimes you even need to look at team specific information. So, in my mind it’s really hard to come up with a good set of metrics in an aggregate. They can guide you, and maybe you can catch regressions using certain specific things, but by the end we had a table of 20 different metrics that we’d have to look at and of course team members leave so you don’t have enough context on why it matters that the linters got slower. It was just a challenge. Yeah.
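To make the slice-and-dice point concrete, here is a minimal sketch of how an aggregate P90 can hide a regression that only hits one group. The durations, the group names, and the helper functions are all hypothetical and invented for illustration; they are not Dropbox's tooling or data.

```python
from collections import defaultdict
from math import ceil


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p90_by_group(samples):
    """samples: iterable of (group, seconds) pairs. Returns {group: P90 in seconds}."""
    grouped = defaultdict(list)
    for group, seconds in samples:
        grouped[group].append(seconds)
    return {group: percentile(vals, 90) for group, vals in grouped.items()}


# Hypothetical linter durations in seconds. The server linter gets 2x slower
# in the second week; the desktop linter does not change.
desktop = [("desktop", s) for s in [20, 22, 25, 30, 31]]
last_week = [("server", s) for s in [2, 3, 3, 4, 5]] + desktop
this_week = [("server", s) for s in [4, 6, 6, 8, 10]] + desktop

before = p90_by_group(last_week)
after = p90_by_group(this_week)
overall_before = percentile([s for _, s in last_week], 90)
overall_after = percentile([s for _, s in this_week], 90)

print("overall P90:", overall_before, "->", overall_after)
print("server P90: ", before["server"], "->", after["server"])
print("desktop P90:", before["desktop"], "->", after["desktop"])
```

In this made-up data the overall P90 does not move at all, because it is dominated by the slower desktop runs, while the server P90 doubles. Only the per-group view shows the regression.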

Well I think this kind of predicament you were in is really the same place a lot of companies are in, even today, in terms of thinking about and measuring productivity. You mention, I think two things that are pretty common patterns. One being the difficulty of actually getting some of these metrics out of a set of disparate, especially homegrown systems and platforms. And on the survey side, the challenge of survey fatigue and running surveys at a frequency that’s useful for groups like yours, and balancing that with actually capturing actionable data.

So I’d love to go into a little bit more detail on all these areas, to really understand what you were seeing on the ground. I’d love to start with, you mentioned can any tool actually tell the full story? You talked about getting metrics around linters and metrics around code review. So I want to go into more of these specifics. I’m curious, you mentioned the scenario with the linter metrics where it might be faster for one group and slower for another group. So you need that context, and you certainly need to actually slice the data and divide the data properly. You had talked about code review cycle time as well earlier. So I’m curious, what was your experience looking at that metric? It sounds like you were able to suss it out of Phabricator. That’s of course a very common metric in solutions like GitPrime or Pluralsight Flow. So what was your experience with it?

Yeah, I think we got the metric out but we could never do anything actionable. At least for the first couple of years that we had that metric. Because there were all these states that were expected. A developer might open up a pull request and then they might be out for a vacation for a week. So it’s obvious, or it’s expected that the reviewer doesn’t review it and that kind of skews your metric because you’re like, “Oh this team is taking a week to review other people’s code.” No, not really. The developer’s just out. Or there’s some kind of developers who make a lot of pull requests, they assign reviewers but the reviewer has an understanding that I’m not going to look at this until the tests pass. So then there’s all of these human variables that come in. And because of all of those outlier cases, if the metric shifted from a P90 of two hours to a P90 of four hours, is there anything actionable we can do?

I don’t think so. And if there’s nothing you can do if the metric regresses by 100%, that’s not a useful metric for me. So, that’s the kind of issues. And then the question became, should we invest more in making that metric more reliable? And let’s try to think of what we can do as a developer tools group. If we invest more and we make this metric super accurate, what does that buy us? Maybe we can figure out, okay, some groups are slower at reviewing code than others. We can probably tell their managers. Somewhat helpful, not really helpful unless leadership really cares about pushing that for us.

Because there’s a lot of developer tools or developer effectiveness issues that are technical, the builds are slow, and there’s a lot of them that are organizational. That one pocket of the company has a different culture than the others and there’s no amount of tooling that can poke people. Maybe you can set up pull request reminders. I know I’m talking to a pull request reminders expert. So we built all of that stuff in house to try moving that metric, but those are obvious things to do, which you don’t really need a metric for anyway. So in my mind I decided okay, I’m not going to pay attention to this metric at all. The amount of investment it’ll take is too high, and the return on investment is too low for this to be useful.
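The P90 shift from two hours to four hours described above can be reproduced with a handful of outliers. The sketch below uses invented review records and a hypothetical `author_was_away` flag, which a real code review tool would not hand you for free; it only illustrates how sensitive the metric is to the human cases described.

```python
from math import ceil


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical (hours_until_first_review, author_was_away) records.
reviews = (
    [(0.5, False)] * 6
    + [(1.0, False)] * 6
    + [(2.0, False)] * 5
    + [(4.0, False)]
    + [(168.0, True)] * 2  # author opened the PR, then left for a week
)

# P90 over every review, including the two where nobody was expected to respond.
p90_all = percentile([hours for hours, _ in reviews], 90)

# P90 after dropping the reviews where the author was away.
p90_filtered = percentile([hours for hours, away in reviews if not away], 90)

print("P90 with vacation PRs:   ", p90_all, "hours")
print("P90 without vacation PRs:", p90_filtered, "hours")
```

Two pull requests out of twenty are enough to double the P90 in this made-up sample, which is why a 100% regression in the metric may not point to anything a tools team can act on.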

Yeah, well I can relate to that because at GitHub we were also trying to build a solution that provided code review metrics. And of course my previous company, Pull Panda did the same thing and it was… I remember the number one feature request for the pull request metrics was to exclude non-business hours. So like you mentioned, whether someone goes on vacation or people are just in multiple time zones, if you’re trying to mandate a six hour turnaround time SLA, you need really precise measurements that filter out all the noise. And I never got to that feature. Because like you said, it just didn’t seem worthwhile. I’m curious, you brought up a great point about how some problems or bottlenecks with developer productivity are tool, or revolve around tooling, and others revolve more around process, like code review. At the time, which do you think your leadership was really more interested in? Or was it just both or neither?

I don’t think they cared about the difference. I think the question from the CTO was like, “Can we ship code faster?” There were tons of reasons that we weren’t shipping code as quickly as we could. And I think if you’re at any company where you’ve accumulated 15 years of tech debt and various products and acquisitions and you’ve tried and failed to incorporate them, the amount of debt is immeasurably high. So, I think the CTO wanted another answer on why we are not shipping as quickly as we could. And I think any engineer who’s been at the company for three months, they don’t have to be a developer tools engineer, can kind of tell you the answer, that that’s tech debt and that’s probably the reason. And cutting off that tech debt is probably more important than spending a lot of time trying to invest into measuring productivity and optimizing that metric in a very quantitative way. So sometimes you just need to clean stuff up.

I’m curious, earlier you talked about the challenge… Well first of all, that tools like GitPrime weren’t even compatible with the systems you use. But it also sounds like you were kind of taking a good look at how you could potentially stitch together data from these different systems together to have your own homegrown solution. I’ve worked with many companies that, large enterprises that invest a lot of time and money and engineering resources into building these types of platforms. So I’m curious, why didn’t you… Well I know you did a little bit, but can you describe, paint a picture of how difficult it would’ve been to really build the platform that provides integrated metrics across all the different tools? I mean how many different tools and disparate areas of the organization were there? I’m curious to get the on the ground picture of that.

Yeah, yeah. I think the first philosophical point was that this is not Dropbox’s core competency, developing infrastructure for developer metrics. So the first sticking point was out of all the things my team could do, is this the most valuable? And going from that I can describe the various challenges. In our case you have server developers, desktop developers, mobile developers, developers on various different parts, contractors, QA engineers who don’t really fit the mold of regular developers in the sense of you don’t want the same metrics applying to them both. So you have all these different groups. Now within each specific group you have infrastructure engineers who just have faster build times, faster code review times, they make much more frequent commits. Product engineers necessarily, they need to be more careful, their code is going right at the end customer.

They’re using a different set of tools, there’s different IDEs. There’s no mandated IDE at Dropbox. Because you have several different languages, you have Python, you have Go. So trying to measure stuff within the IDE, there was no standardized tool or IDE that we could use to say, “What does your IDE performance look like?” You have different pockets of the code base that have various, different, even in-house tools. We had a version of, I don’t know if you’re familiar with JSX, but we had a version of that in Python. So none of the existing tools even know how to read how long it would take to render a Pyxl file. That’s what it was called. So if you just look at all of the different local developer tools, that already makes it challenging. Now we had one central CI system. It was somewhat easier to be able to get metrics from that system.

But of course, tests that run on the server side on Linux are going to be quicker than all of the various desktop tests. So now your desktop engineers are necessarily going to be slowed down. So, there’s that aspect and then there’s of course all of the tooling and CI processes if you had multiple task trackers. So trying to integrate with all of those was a challenge. Different teams have different processes. Once you migrate to Jira, it’s hyper customizable. So maybe some team is using Epics, some team is using some other mechanism and we were migrating to Jira because of this, because of the flexibility. And then you can get top level visibility into stuff because each team can have a customizable workflow for themselves. So, I think it’s just a very, very different landscape across the company. And then which area do you measure first? Which one do you care about first?

We also tried that for a bit. Like, why don’t we just try to solve this for one group of engineers in Seattle. We know that they have a large problem in developer efficiency. Can we just build an IDE plugin for them? So we even tried to take that approach. But then now how do you focus on 20 people and not 870? How do you figure out what that trade off is? So I think in the large company there were a lot of different tools and trying to measure everything would be a lot of work. We did measure some of the very basic ones. The tool that you use to interact with Phabricator, we instrumented that, Git, we instrumented that, CI system, we instrumented that. That gives you maybe a third of the picture and we just rolled with that.

Got it. Yeah. Thanks for sharing that. That was interesting to hear. So you mentioned another way you were trying to measure things is using surveys. Before we get into the challenges and approaches you had to that, what were people’s opinions on the surveys versus the system data? Was there maybe, was it a bias from leadership toward, were you kind of being swayed by people? Did executives prefer one versus the other?

I think executives didn’t really care too much in the beginning. We loved surveys, the developer tools teams, developer effectiveness teams in its various forms. We believed that the survey was just much, much, much better than any tool. The way we thought about it was tools can maybe help us catch regressions when the metric is good enough. It’s like, okay, P90 CI time that goes up for a certain set of developers that’s consistent, that is bad and that’s going to affect us in multiple ways, from a finance perspective and a developer effectiveness perspective. So we need to care about that. So we also got some metrics from tools around failure, like infrastructure failure. It’s great for really targeted things like that. But the survey was amazing for qualitative analysis and it wasn’t even that qualitative I would say. We kind of piggybacked on the company’s HR survey tool, Glint, and it let us do all sorts of interesting things like actually break down answers by org, by group.

So by tenure, by cohort. So we could do really fascinating things, like let’s try to find engineers who’ve worked somewhere else and have been at the company for three months or six months. Because these are the people who generally have the most useful opinions when you think about it. They’ve looked at how the outside world works and they come in and they’re like, “This is kind of strange. Why is this workflow so slow?” If they’ve been here for two years, they’ve kind of gotten used to it and then they won’t be complaining as much. So we had this pretty, I would say involved survey tool that helped us kind of break things down that’s like, “Okay, all the people who work in this org versus that org.” And then you can search by keyword. It’s like, “Let’s see what people are complaining about Bazel or about Git or some other tool like that.” So I think the executives did not care as much, but we loved the survey and to me it was the singular most important data point outside of talking to people directly. Yeah.
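The kind of cohort breakdown described above can be sketched in a few lines, assuming survey responses have been exported as plain records. The `Response` fields, the sample answers, and the helper names are hypothetical; this is not Glint's API or export format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    org: str
    tenure_months: int
    worked_elsewhere: bool
    score: int  # answer on a 0-10 scale
    comment: str


def recent_experienced_hires(responses, min_months=3, max_months=6):
    """People who have worked somewhere else and joined three to six months ago."""
    return [
        r for r in responses
        if r.worked_elsewhere and min_months <= r.tenure_months <= max_months
    ]


def mean_score_by_org(responses):
    """Average score per org."""
    by_org = {}
    for r in responses:
        by_org.setdefault(r.org, []).append(r.score)
    return {org: mean(scores) for org, scores in by_org.items()}


def mentioning(responses, keyword):
    """Responses whose free-form comment mentions a keyword, case-insensitively."""
    needle = keyword.lower()
    return [r for r in responses if needle in r.comment.lower()]


# Invented sample responses.
responses = [
    Response("desktop", 4, True, 3, "Bazel builds feel slow compared to my last job"),
    Response("desktop", 30, True, 7, "Fine"),
    Response("server", 5, True, 6, "Git checkout is slow"),
    Response("server", 2, False, 8, ""),
    Response("server", 6, True, 8, "bazel cache misses"),
]

cohort = recent_experienced_hires(responses)
scores = mean_score_by_org(cohort)
bazel_comments = mentioning(cohort, "Bazel")

print(scores)
print([r.comment for r in bazel_comments])
```

The point of the sketch is that every response carries its org and tenure, which is what a dedicated per-person survey link makes possible and an anonymous form does not.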

Really interesting to hear. And I’m curious, you mentioned it wasn’t just qualitative and there were I’m sure numerical breakdowns and analysis of the data. How did you design the survey questions? And I’m curious, did you ask about just people’s opinions about things or did you also ask people things like, “How often does your CI break?” Things like that? I’m curious. Yeah, just tell me more about the design. And for some background, at GitHub I worked with a team that designed and administered our developer satisfaction survey and we had a team of pretty senior engineering leaders plus Nicole Forsgren spending fairly significant time, continuing to iterate on both the design and the delivery of these surveys. I’m curious how you guys did it.

So we did not have nearly as much investment. And I also have to clarify that most of this stuff wasn’t done by me. I just piggybacked on stuff that some other engineers worked on. But we had this one particular staff engineer or senior engineer, Alex V, who was the TL of our sister team at that time who designed the survey a few years before I joined. And there were specific questions around how much has your developer experience changed over the last three months if you’ve been at the company for more than six months, let’s say. And you could rate that on a scale of one to 10. So large efforts and migrations around, okay, we are moving from the previous build system to this one, we could actually tell the difference. And I remember Q3 of 2017 when the team I was working on had shipped this big migration that made developer environments better.

You could actually see the difference from the previous year’s survey. 45% of developers said that they felt effective in the previous year’s survey and now it’s like 71%. So you can see a massive change. So we had these specific questions like, “In the last week, how many times did your dev environment break?” Exactly the kind of stuff you’re talking about. Which forces people not to think in the abstract. It’s like, “Oh in the last six months who knows?”

But in the last week, I have a metric and often humans tend to think maybe last week was not representative. But if you average that out across people, it is pretty representative. So there were specifically designed questions like, “In the last week, how many times did things break, compared to three months ago? What is your general feeling?” So there were those kinds of questions with a zero to 10, plus you could add extra information to any one of them. I wouldn’t say it was an extremely scientifically well designed survey. But I think Alex had certainly read a lot of information online about how to ask these kinds of questions because they weren’t just thrown in there. He had thought through those things. Yeah.

That’s funny. Was Alex just kind of fast? He just had a personal passion for survey design and psychometrics it sounds like.

No, I think he had a personal passion for developer velocity. And I think if you’ve been in the space, maybe this is an older version of me talking, but I think if you’ve been in the space at that time, you’re like none of the other metrics are going to help me, but I need some way to actually measure myself well. And the survey was the best answer to that. It also gives you enough information to be able to break it down. Yeah, we really need to focus on desktop engineers on this site because they are not having a good time, for example.

Yeah, that makes sense. Well if you don’t mind sharing, I’d love to hear a little bit about just the quick stats. I mean how many questions was the survey? How long was the survey typically for a developer to fill out, and what kind of participation rate were you guys able to get typically?

Yeah. I have to say I don’t remember all of the specifics, but it wasn’t short. It was at least a 20 to 30 minute survey if you wanted to fill stuff out in detail. And with all of these things you can skip all of the qualitative answers and just fill out one to 10 and let it go. But we got a pretty good response rate because it was pretty much the only survey that we sent out to every single engineer and it had all the mechanisms of the people tools or the people team’s survey tool. So it would send automatic reminders, it had a dedicated link for each person. So you know where the responses come from, it wasn’t just a standard Google form. And even though I’m working at a much smaller company now, I’ve been pushing for at least we need to get a better survey tool because we don’t need to have something as advanced as that particular one.

But we need to know where we are getting our responses from. We need to be able to break it down. And yeah, it wasn’t ginormous or anything, it was I think just 20 or 30 questions with a single text box for a free form response if there’s anything else. Plus for each multiple choice or ranking answer, you could add extra information to that particular answer if you wanted to. The first couple of years I think we ran it, we got a pretty good… I remember 2020 was actually a little low and we had a 65 or 70% response rate. And it’s primarily because of nudges. We could nudge teams. And previously when there was more organizational firepower behind this stuff, it was even higher because two could give nudges. We could also tell organizational leaders like, “Your group hasn’t filled it out. So please can you send them an email asking them to fill it out.” So I think it was purely a function of how often we pinged people and we nudged people.

That makes sense. And I mean that response rate sounds pretty good to me. I know at GitHub shortly after I left, the last one we did was around 40%, maybe lower.

Maybe that’s a good thing and people don’t have things to complain about. That’s how… If no one’s complaining, that’s always a great sign to me.

True. But I feel developers always have things to complain about. I’m not sure I trust that heuristic by itself. Well I’m curious, so you had talked about initially when you started this survey you were running it once a quarter. But as sort of around the tail end of you being at Dropbox, it had morphed to being just once per year. Can you talk about why did that happen and did you guys try things to keep it quarterly?

Yeah, I don’t think we tried to keep it quarterly because as we heard from people that there’s the survey fatigue, we also were not seeing too many changes quarterly. Once we kind of did our largest projects that got development environments to a reasonable state, the change did not happen as frequently. The Q3 and Q4 2017, I remember this somewhat, the results were very similar.

So even the ROI that we had was pretty low. So it’s like, “Okay, let’s just make it once, twice a year or once in two years.” And then it was kind of more organic. There was a little bit of management churn on these teams. Teams got split and reorganized and people got shuffled around and some people left and some people joined. And so the going from twice a year to once a year was more of an inorganic, you don’t have enough bandwidth to run this survey this quarter or this half, so let’s just wait until some other time. Plus when COVID happened you kind of paused on all of such things. So I think we were planning to send that survey out. We had to wait a little bit because we just don’t want to send the survey out right now. Let’s wait for some time.

That’s interesting. You mentioned just not having the capacity to run them more often. So I’m curious, what did that look like? I mean how much work was it each time you ran the survey and who was running it?

Yeah, I think the amount of work was in the analysis, in my mind. So I did the analysis the last few times before I moved out of the team. And in my mind, if you don’t spend the time to dig in, write a whole report on what you found interesting, things you’ve learned, which group is affected the most, which cohort is affected the most, then there’s no point, because it’s a reasonable amount of tax to ask everyone to fill out a survey and then not spend the time actually analyzing the results. So it was a non trivial amount of work. I would say the design of the survey questions themselves did not change much over time. Once we had found a good set, of course you want to add and subtract more tags. It’s like, “Okay, if we have more build tools, we want people to have an option on complaining about those build tools.”

So, there was a little bit of that ongoing work, but it was actually the analysis that took, I would say, a week sometimes. And since it was a larger company and since the survey tool was administered by the people team, sometimes you’d have to wait for the people team to give you access to the results. They wanted to do some analysis on it as well. So it was also kind of a cross-team effort. So adding both those things up, a week of work between getting access and analyzing, it meant that it’s just another task to be prioritized against other things. Plus, a team like ours, a build infrastructure, developer tools, developer velocity team, is affected by the standard organizational stuff. If this year we are trying to cut down on the amount we’re spending on CI, that is just much more important, especially if you go public, wink, wink.

So we had to kind of prioritize: how do we focus on X, Y, Z? How do we make sure we do a good job? Let’s just pause on the surveys and send it out next year, given that we know that we are not seeing that many changes year over year. And even if we did find out there are a lot of burning fires, there’s other stuff we need to work on. Okay, we can write a report, we can figure out a bunch of findings, but we just don’t have the people to work on those findings, so let’s just wait.

Yeah, that’s interesting you mentioned not having the people to work on the findings. Earlier you mentioned, when you were looking at the Git-based metrics, for example code review, how that wasn’t really something your team could do anything about, and you didn’t believe other teams would probably do anything about it either unless it was really made a priority of the organization. So I’m curious, was the scope of your engineering effectiveness survey really constrained to just the things that your specific team could change? Or did you also include topics like code review that might be more relevant to other leaders, but that your team can’t affect?

Yeah, I think this is a fascinating question. Because there’s all of the organizational things that the survey can give and share information about, but if there’s no leader who cares about moving that metric across the company, then does it really matter? And I think that was kind of the case. So our survey did not just involve things that my team of five people or eight people cared about. We had a whole application development group I think, which involved developer effectiveness, which was the global developer effectiveness team. But each specific group, the server engineers had a server platform, the desktop engineers had a desktop platform and it was kind of a shared effort in a sense. We ran the survey, we would collect the results and we would create a report and we would work with all of these different platform teams to give them specific insight which they can take back.

It’s like, “Oh, you know what, desktop platform, you should consider improving your CI, and let’s work together on fixing that.” Or, “You should think about prioritizing these efforts,” or, “Here are the set of problems that people have talked about.” So I think all the technical teams worked in tandem. So the survey helped all of the engineering teams that focused on developer effectiveness and developer tooling. As for some of the larger organizational stuff, like “we should improve code review cycles on X, Y, Z team,” I don’t think the org was set up in a way that we could easily make a change in that direction. And in my opinion, it’s really someone who is… So let’s say that we find out that there’s an org that has slower code review cycle time compared to everyone else. We can go and tell that leader that, you know what, this is a problem, and they can go and fix it.

But if that leader doesn’t care, we are not going to see a change. And then it’s kind of like, do we care enough to escalate that? It’s like, “Oh, you should solve this problem within your org.” Maybe they have other problems that they care about. Maybe it’s the CTO’s job to push on that org leader. And we did not really see that many org leaders caring that much about this stuff. So in practice it ended up being that we would make all of the technical improvements we could make and we would surface information as relevant, and the information was not super relevant for a lot of people.

Yeah. I’m curious, I mean let’s just talk about something like code review cycle time. Why do you think that wasn’t that relevant or important to the product engineering leaders?

Yeah. I think there were a ton of, I would say, more pressing problems, like burning fires at the company, so code review cycle time was not something… I think it would’ve been a luxury for a leader to care about that metric, given things like tech debt, things that we knew were problematic. And if you think about an organizational leader, they can say, “I want to spend, let’s say, 30% of engineering bandwidth on foundational improvements and 70% on making sure that we are shipping new product.” That 30% was already spent on things that they knew had to get fixed. So where does this finding about code review cycle time really slot in? It just did not seem that important. And I don’t have more specific information to give you other than this anecdotal data, but even in our mind it wasn’t that important a metric to push.

Because when you kind of combine that metric with anecdotal data from these surveys, engineers are not complaining about code reviews being slow. There’s a small percentage, there’s always that. But in general, engineers complained about the lack of documentation much more, and that was seen as a much bigger problem across the company. Everyone liked their coworkers, everyone kind of made sure that code reviews were happening, and everyone was upset that there wasn’t enough documentation and that was slowing us down. So to organizational leaders, things like documentation were clearly a higher priority. And to that effect, they actually tried to form a team to think about knowledge sharing and all of that. They didn’t spend time on these other things.

That’s really interesting. I mean, if you could kind of reinsert yourself into that environment, take something like code review cycle time. While it’s maybe not the thing that developers necessarily are complaining about, I think just from a rational standpoint, there clearly is a bottleneck there, and perhaps a pretty significant one in the context of how quickly things can go from idea to production. So yeah, I’m curious, if you could reinsert yourself into that environment, how could you get leadership to care about something like that, when they probably should? Especially since it seems like a pretty low-hanging thing; you don’t really need engineering investment to affect that. So what are your thoughts on that?

Yeah, I think the first thing I would need is a metric that I can defend. A metric where I can say this is scoped to your team, this is scoped to the senior engineers, this is not an intern struggling and their mentor not reviewing code quickly enough. That is a problem, but that’s a separate problem. So the first thing I would need is a metric that I can defend. And the second thing is for me to convince myself that this is a significant part of the engineering cycle, the whole product velocity. More like, this is a significant roadblock when an engineer is trying to ship a feature. Say it seemed like, for a certain group, their PRs were getting stuck on code review for days and weeks and everything else was smooth: they’re shipping their features quickly, on time, and they’re not needing to make these revert commits and revert their reverts and get that through.

There are all of these other metrics that are also relevant. If you take all of those things and then you show that code reviews are actually taking up 40% of developer time, with a metric that I can trust, then I would be happy to push myself or push other people and say, “You know what, you should care about this.” I think the way it was, it was kind of a chicken-and-egg problem, where if we didn’t have a good enough metric, we couldn’t use that to tell people that this is a problem. So I think a metric that we could defend, a metric that we could understand ourselves, where we actually genuinely believed that it is a source of a problem, could’ve made the difference.

Yeah, that makes sense. So you fairly recently started working at Vanta, and you’ve mentioned they’re going from a pretty small team, I think around 10, and are on their way to 50 engineers. Where are you at with all… Are you in the same place you were at with Dropbox a few years ago, or how are you viewing this problem or experiencing this problem there? What’s also the need? I mean, are they already hitting a point where engineering velocity is kind of a little elusive to understand?

Yeah, even with 10 engineers it’s hard to understand. I think even when I first started, since it was a smaller company, 50 or 60 people, and you set up one-on-ones with the CEO, you get to talk about anything. She knew my background, and she’s like, “Let’s talk about developer velocity. Is it as quick as it needs to be?” And no one even really knows. Even with five engineers, even with two engineers, maybe we can be shipping faster. So with 50 engineers it’s harder. You get anecdotes, you get information. I think everyone is interested. Our leadership has already talked about tooling and which metrics we should be using. And again, this time we’re on GitHub, so I can actually get these metrics now. So it is elusive. Without a good tool, there’s no point in trying to measure it.

There’s no point in trying to get it yourself. There are the standard metrics of how long CI builds take on each PR, which we try to have a measure for. And over time I think the plan would be to use a better survey system at least. Or even just investigate the market. I haven’t done any vendor research or anything this time. But once we get to that point where we care about this stuff and we’re at 70 or 80 engineers and we have some time… Also, at a startup you’re just trying to ship stuff as much as possible. So, once we get some more time, we’ll probably go back to doing a vendor review and pick something that combines maybe some of the information that we get from GitHub with a survey that we can send out to people, and have that broken down by small groups of teams. That is the ideal tool that I would be looking at maybe six months from now.

Got it. Well, Utsav, I’m really excited to continue to follow your journey at Vanta, and I really enjoyed this conversation. Thanks so much for being on the show today.

Yeah, thank you for having me again.