Abi Noda: Well, thanks, everyone, for joining. I’m Abi, CEO of DX, and I’m joined by Laura, CTO of DX. This is our ongoing webinar series, where we talk about developer productivity measurement, how to operationalize metrics, and how to improve productivity. Today we’re talking about a really fun topic, which is how to set targets and goals for developer productivity metrics. And I think this is probably one of the most challenging problems in this whole domain, Laura. Laura and I have been talking about this for weeks now, sharing ideas in advance of this webinar, so hopefully we have some good advice for you all. In terms of housekeeping, we’d love to hear your questions in the chat as we go through. We’ll try to address those, and at the end we’ll come back around and try to address any lingering questions as well. Laura, I’ll hand it to you.
Laura Tacho: Yeah, thanks, Abi. So as Abi said, we spend a lot of time thinking about operationalizing metrics, and neither of us believes the problem of what to measure is perfectly solved, but right now we feel pretty confident that Core 4 is at least the right place to start. Core 4 is the new framework from Abi and myself as co-authors, with collaboration from the authors of DORA, SPACE and DevEx. It’s great general guidance for getting started with measurement. Once organizations have these metrics, though, the question that Abi and I often get is what to do next. How do we actually get the metrics off the dashboard and into the teams? One thing we’ve been focusing on, and why we wanted to have this conversation, is how to set goals around those metrics so we can actually drive improvement. Abi, I just wanted to hear from you, since you’re also having lots of conversations with companies about this all the time. What are some of the challenges you’re seeing when organizations try to start using these metrics, especially to set targets and goals for continuous improvement?
Abi Noda: Yeah, I mean this has been a challenge for a long time. For folks who’ve been a part of the DORA community for years, there have been a lot of attempts within that world to figure out how to set goals around DORA metrics, and whether we should even be setting goals. There are a lot of stories of goal setting gone wrong, particularly with DORA metrics, but really with all types of metrics. One of the common patterns is blanket-setting targets on metrics in a haphazard way, and then the experience for teams is that some of these metrics don’t really make sense for them to target. And in some organizations there have definitely been reports of teams just fluffing the metrics, inflating the numbers to hit those targets. So that’s a story I think we hear over and over again, and it doesn’t even get to the root of what the problem is, but it shows how tricky this problem is for everyone.
Laura Tacho: Yeah, it’s definitely tricky. The classic stereotype is: our company decided we wanted to do DORA metrics, and now everyone has the target of qualifying in the elite category or hitting some external benchmark, regardless of context, regardless of how often that service might be updated, or whether it even makes sense. That’s sort of the trope of poor goal setting, because it ignores a lot of the context of organizations. One thing I see is metrics overwhelm, definitely, from companies that are trying to get started. There are just so many metrics out there. We have a firehose of data in some cases, and it can be really difficult for leaders, teams, or individuals to figure out where to focus. And that leads to a bit of analysis paralysis.
Abi Noda: Yeah. And not only are the numbers overwhelming, the numbers are scary. Organizations often roll out these metrics and explicitly or implicitly communicate that these numbers should be targeted, and the reaction from teams can be anything but frictionless; it can immediately create mistrust and fear around the numbers. We’ll talk about this more, but it’s so important to get ahead of that and always be communicating prior to rolling out the numbers. It’s often good to just clarify the intention: we’re not setting targets on these, these are numbers we’re observing to learn and monitor, not to target or evaluate teams. We’ll talk more about that later, but I think that’s another pretty common theme and challenge as well.
Laura Tacho: Yeah, and Abi mentioned already teams sort of fluffing the metrics. Whenever I bring up setting targets around metrics to executives or individual contributors alike, I’m sure a lot of you out there right now listening to this are scratching your head thinking, “But when we use metrics in public, aren’t teams just going to game the system?”
I think this fear of gamification of the metrics holds a lot of us back when it comes to using metrics in public or even setting targets around them, because we know that the consequences can often outweigh the benefits. It’s kind of like a fine art and a science to figure out the right way to do it. So we’ll definitely make sure to get into gamification as well. On the topic of gamification, I think kind of tying up this conversation and coming back to the DORA metrics, DORA is very practical and useful for a lot of companies, because of the external benchmarks. And I also see teams struggling to understand how to use benchmarks and how to set actual targets, if they should be aiming for the external benchmark, if they should be aiming for something else, like how to actually figure out what the number should be, what should they be striving for. I think that’s a pretty common struggle for these companies as well.
Abi Noda: And of course one thing that comes up often on this topic is Goodhart’s Law. So Laura, maybe just a quick primer on Goodhart’s Law. We’ll be talking about this theme a lot throughout this session, but just for folks who aren’t familiar with it.
Laura Tacho: Yeah, we’ll definitely go in depth into gamification. For those of you who haven’t heard of Goodhart’s Law before, it’s the idea that when a measure becomes a target, it ceases to be a good measure, because people are incentivized to distort the data or distort the system, so you’re not going to get an accurate reading. An example might be: we want to reduce the number of bugs, so we set that target, and then the classic trope is that people stop reporting bugs, artificially bringing that number down.
We actually end up with a system that’s overall lower quality than it was before, because now the bugs aren’t visible to us. This is a risk that I think is in the back of everyone’s mind as we talk about metrics out in the world, because it’s a natural human tendency to feel a little threatened, or a little scared, when specific numbers drive our evaluation or lead to our promotions or bonuses. So how do we avoid that? I think you and I can both talk a little bit about what we’ve learned and how to avoid it.
So with that, why don’t we jump into solution mode: what to do about this problem? We have data. We have a desire to improve. We know we want to drive accountability and we want to set targets around this data, but how do we actually do it? We need to take a few factors into account. Of course we want to avoid setting blanket targets. We need to find the right target and avoid gamification. One of the things, Abi, that we’ve really been focusing on a lot, and if you caught our last webinar we talked about this concept of diagnostic metrics and improvement metrics, is this idea of controllable input metrics and output metrics. Can you talk a little bit more about input versus output metrics and why it matters for goal setting?
Abi Noda: Yeah, I think this idea of controllable input metrics versus output metrics relates closely to the concepts of leading and lagging indicators, but I think the controllable input and output verbiage is clearer. One of the biggest mistakes organizations can make is to set targets on output metrics. There’s actually a whole dialogue within the DORA community about this problem as well. Output metrics, in contrast to controllable inputs, are things that teams either can’t directly control or that aren’t clearly actionable for them. So when you set a target on an output metric, it can create a poor experience for the team, because they don’t really know what to do, and it can often lead to them just distorting the numbers, targeting the number instead of actually improving the system.
So a really good example of this would be PR throughput. This is a good metric for tracking productivity. This is part of the Core 4. We’ve talked about this a lot, but it is an output metric. Think about what happens if you set PR throughput as a target for a team. From the team’s perspective, what should they actually do to improve that, for example? Are they supposed to just create more PRs, split up their PRs into smaller pieces arbitrarily? It’s not clear. And so, I think PR throughput is one of those that is clearly a useful output metric, but not a controllable input, and therefore not something we should be setting targets for teams around.
In contrast, if you look at a metric like code review turnaround SLAs, that’s a metric where you’re giving something to a team that they really can control. That’s a behavior they can control. It’s clearly linked to improving the system, shortening the time that people wait for code reviews. And if they do that, we should see impact on the output metric, which is PR throughput. If we can turn around PRs faster, we should see a higher flow of PRs through the system. So I think the takeaway here is before setting any targets, you need to first work out what are your controllable input metrics that lead to the outcome or output metrics that you’re trying to actually drive in your organization. And then, from there you can begin to zoom in and figure out how to operationalize that.
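To make the code review turnaround SLA idea concrete, here is a minimal sketch in Python. The event shape and the four-hour threshold are illustrative assumptions, not part of DX’s tooling; it simply shows how a team could track the share of first reviews that land within their own SLA, as a controllable input behind PR throughput.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record of when a review was requested and when the first review landed.
@dataclass
class ReviewEvent:
    pr_id: str
    review_requested_at: datetime
    first_review_at: datetime

# Illustrative SLA: first review within 4 hours (simplified to wall-clock time).
REVIEW_SLA = timedelta(hours=4)

def turnaround(event: ReviewEvent) -> timedelta:
    """Time a PR waited for its first review."""
    return event.first_review_at - event.review_requested_at

def sla_attainment(events: list[ReviewEvent]) -> float:
    """Fraction of PRs whose first review arrived within the SLA."""
    if not events:
        return 1.0
    met = sum(1 for e in events if turnaround(e) <= REVIEW_SLA)
    return met / len(events)

# A team could review this weekly and aim for, say, 90% attainment,
# rather than being told to "increase PR throughput" directly.
events = [
    ReviewEvent("PR-101", datetime(2024, 5, 6, 9, 0), datetime(2024, 5, 6, 11, 30)),
    ReviewEvent("PR-102", datetime(2024, 5, 6, 13, 0), datetime(2024, 5, 7, 9, 0)),
]
print(f"Review SLA attainment: {sla_attainment(events):.0%}")
```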
Laura Tacho: Yeah. And Abi, I think this is a pattern that we are pretty comfortable using in other parts of our lives, but I think because of the maybe overabundance of metrics in some scenarios and also the vast variety of metrics that we have, they’re sort of all presented on a dashboard. They’re all presented equally, when in fact they’re really not. They’re used for different use cases by different personas. It’s just easy to get a little bit lost in the details. And we often use health analogies, and so I’ll kind of share one with the audience here to help illustrate this idea of the controllable input versus output metric. So I have low iron and this is the output metric. I get this measured once a year on my blood panel with my doctor, so I’m seeing what’s my iron level once a year. Maybe more, hopefully not, but let’s just say it’s once a year. On a daily basis, that metric isn’t actually going to change. I just have my blood panel. That piece of paper is the same piece of paper.
In order to influence it, I have to look at controllable input metrics, actual things that are within my control, decisions that I’m making on a daily basis, in order to influence that number. So this might be whether I take my iron supplement every day, whether I take my vitamin C supplement every day, whether I’m hitting nutritional targets like eating broccoli and strawberries, or avoiding coffee at mealtimes, since it blocks iron absorption. If I look at those together in my daily habits, that is within my control, and I know that if I’m making progress on those and sticking to them, then I can actually change that output metric, which is the big-picture lagging indicator that gets measured much less frequently. Just like for PR throughput: we don’t know, is the controllable input metric typing faster? Is it reducing batch size? Is it reducing build times? We have to break down these big output metrics into much smaller chunks that are much more locally actionable for teams, and that’s really the area we want to focus on.
Abi Noda: I think communicating this is so important too: clearly calling out, here are the controllable inputs and here are the outputs we’re trying to drive. Imagine for a moment if we were to hide the left side of the slide you’re showing right now, if we were to hide the inputs. What behavior would it potentially incentivize if you only had the output metric of iron level? Would folks just go get iron injections? Would they overdose on iron supplementation to try to get that number up? That’s analogous to the types of harmful behaviors or outcomes we see in organizations that set targets on outputs. People don’t really do the right things to improve the system. Instead, they grasp for ways to just move the number, and that’s often not actually improving the system.
Laura Tacho: Yeah, I think one challenge I often hear from companies, or even from individual frontline managers, senior engineers, and staff engineers, is them saying, “Yeah, we’ve adopted DORA metrics or whatever it might be, and we’re supposed to work on change failure rate or we’re supposed to work on lead time, and it just feels untouchable for us. It feels so big and thorny, we don’t even know where to start.”
And they then project that inward and think it’s a failing on their part, of “I can’t figure this problem out,” when in reality we’re just looking at the wrong level, the wrong resolution of metric, and it needs to go through this translation or mapping process, this metrics-mapping process of breaking down these output metrics into controllable input metrics, so that we can better figure out which levers we actually want to pull or push to get the result that we want.
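As a rough illustration of what that metrics mapping can look like in practice, here is a small sketch; the specific inputs listed are examples for illustration, not a canonical list from the Core 4 or DORA.

```python
# Hypothetical metrics map: each output metric is broken down into
# candidate controllable inputs that a frontline team can actually act on.
METRICS_MAP = {
    "pr_throughput": [
        "code_review_turnaround_sla",
        "avg_pr_size (batch size)",
        "ci_build_time",
    ],
    "change_failure_rate": [
        "test_coverage_on_changed_code",
        "deployment_automation_level",
    ],
    "innovation_ratio": [
        "incident_time_to_resolve",
        "interrupt_load_per_week",
    ],
}

def breakdown(output_metric: str) -> list[str]:
    """Return the controllable inputs a team could target for a given output."""
    return METRICS_MAP.get(output_metric, [])

for output in METRICS_MAP:
    print(f"{output} <- {', '.join(breakdown(output))}")
```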
Abi Noda: So Laura, why don’t we apply this now to some engineering productivity metrics and use the DX Core 4 as an example. For folks not as familiar with the DX Core 4, we have a white paper and lots of content around this on our website at getdx.com, but Core 4 is our framework for measuring developer productivity that combines DORA, SPACE and DevEx. The Core 4 has four key metrics: PR throughput, the DXI (Developer Experience Index), change failure rate, and innovation ratio, or time spent on new capabilities. Since rolling this out, one of the biggest questions has been: how do we actually set targets around these? How do we take this framework and begin driving improvements across the organization? So Laura, if you want to go to the next slide, let’s talk through this idea of controllable inputs versus outputs here, because we see a lot of organizations right now getting tripped up with the Core 4.
So we already talked about why PR throughput is an output metric. When you look at the DX Core 4, there actually is some gray area. Technically a metric can be an input and an output, depending on how you want to draw the chart, but I think it’s safe to say change failure rate is also an output metric. You can’t really just control how many failures you produce. Failure is the outcome of how you’re deploying software, how you’re developing and testing, and how you’re ensuring quality in your process. And innovation ratio, or percentage of time spent on new capabilities, is also, I think, more of an output than an input. Technically, yes, you can just tell people, “Hey, here’s how to carve out your time.” But when we look at the data, we know that a lot of what pulls us away from creative, innovative feature development is reactive work: unplanned work, incidents, support, pings, interruptions. So again, I think innovation ratio is something that is less directly controllable and much more of an output.
When we look at the DXI, in contrast, that really is a controllable input, because the DXI is measuring specific areas of the development process that can be improved and optimized, where friction can be reduced. And I think the theory of improvement here, which translates into how we should be setting targets, is really: if we reduce friction in the developer experience, which we can do, and which teams can do without unhealthy behaviors because it’s by its nature focused on improving the system, then we expect to see better outcomes in these output metrics. We expect to see higher throughput, we expect to see better or stable quality, and we expect developers to be able to spend more time developing product. So that’s the quick version of how we apply this concept to the Core 4.
Laura Tacho: Yeah. I get asked a lot how great data-driven leaders operate, and I think this is a great example of two things data-driven leaders do, which is interrogate and experiment. So the idea here is that the DXI has a lot of levers right there; it’s a collection of levers, basically. These are concrete actions that teams can focus on that are going to influence the outputs. So we’d look at something like innovation ratio, and the way this all fits together is that we need to interrogate and then experiment. What does that actually look like? We can say, “Oh, we want to increase innovation ratio.” As Abi said, a low ratio is most likely because teams are spending time on unplanned work, incidents are taking too long, or quality is low. The DXI is going to have that information in there, because it’s pointing you to where the friction lies.
So we want to come with that interrogation into the DXI and try to figure out: where is my developer experience falling short? What are the actual things we should focus on? Are people not able to innovate because it takes too long to update dependencies, or because they’re being pulled in a bunch of different directions? Those are going to require really different interventions, and so we have to come with curiosity and just interrogate this data. Once you have done that interrogation, and let’s say it’s the case that we have too many interruptions due to incidents, then we can form some hypotheses about what is actually going to move the needle on this innovation ratio. So let’s say we want to cut the time it takes to resolve an incident in half, and maybe the key to that is that we have really poor production debugging tools and skills.
So that’s going to be an area of influence. We can validate that with some signals we’re getting from our DXI, then form the hypothesis and do the work. Shorter is better; you want short feedback loops here. But if we just come to a team and say, “Okay, fix your innovation ratio, make it 5% higher,” I know a team could easily do that just by ignoring bugs or no longer responding to incidents. That’s not what we want. We don’t want a distortion of the data or a distortion of the system. We want to actually improve the system, and that’s why we need to tell the more complete story with this metrics mapping: interrogating the data, forming hypotheses, and setting the right goals at the right level.
Abi Noda: I’ll share two more thoughts before we move into gamification specifically, Laura. For folks coming in here who are more familiar with DORA than Core 4, what we’re talking about here a hundred percent applies to DORA as well. Dr. Nicole Forsgren and the current folks leading the DORA organization are fairly public in saying that you shouldn’t set targets on the DORA metrics. Nicole recently did a webinar where she said optimizing the DORA metrics is not the goal, and in fact the DORA framework has the capability catalog, which is sort of the controllable input to the DORA metrics, which are an output.
And so again, whether it’s Core 4, DORA, or really any set of metrics, distinguishing the controllable inputs from the outputs, framing them clearly as two separate things, and then focusing on those inputs is how you’re going to create a healthy culture around targets and data-driven improvement. Laura, that brings us to talking a little more specifically about gamification, because even if you set targets on the right numbers, you can still run into negative behaviors and gamification in the organization. So let’s talk about how we counteract that.
Laura Tacho: Yeah, so I think my headline here is: we know how people react to systems of metrics, so it’s up to us to design a better system. We talked about Goodhart’s Law, which says that when a measure becomes a target, it’s no longer a good measure, because it just gets distorted. There’s also Campbell’s Law, which is roughly that the more a measure is used for decision making, the more pressure there is to corrupt it, and the less you can rely on it. There’s a lot of history about how human beings react, and so if we’re worried about people gaming the system, just know that’s not a problem with the developers. It’s a problem with the system, so let’s come together and make a better system. When it comes to gamification, there are really a couple of design flaws to avoid. One is one-dimensional metrics. In the scenario Abi is talking about, we set a target on PR throughput. That allows gamification, because we might inflate the numbers to satisfy our goal for PR throughput.
Meanwhile, everything else that’s important to the business falls by the wayside, because our view of system performance isn’t multidimensional; it’s one-dimensional. We’re overly fixated on that one thing. So that’s one big design flaw. The other is emphasizing incentives or rewards and celebrating hitting a threshold, rather than focusing on improvement and learning. Specifically, I can think of scenarios where a promotion is tied to your commit count. That’s a classic example of heavily incentivizing people to game the system; of course we’re going to get very low-quality commits in that scenario. So to avoid gamification, we want to make sure we’re using multidimensional measurements. As Abi mentioned before, targeting controllable inputs rather than outputs puts a lot more control in the teams’ hands and makes gamification a lot less likely, simply because we have more, and higher-fidelity, insight into what’s actually happening in the system.
We want to avoid overly incentivizing or rewarding hitting a specific threshold; instead, focus on the learning, the improvement, and the process. Giving teams time to do the work is also really important, because it removes the sense of threat, I would say. When people are feeling under pressure and threatened, when they don’t feel any sense of recourse, when they think an unfair expectation is being set on them, and when they don’t feel they actually have the room, skills, or autonomy to take action to improve the system, then we’re going to get those other behaviors where we’re distorting the system or distorting the data.
Abi Noda: I think the overarching theme is to really be inviting in this process. Don’t just drop targets and metrics on teams; instead, invite them to be a part of improving the organization and the business, and give them time and permission to go improve the system they exist within, right? Teams want to improve this stuff. They’d love nothing more than to have time to improve this stuff. So we just have to define the right framing and guardrails around that, make sure we’re instructing and inviting teams to go improve that system, and make sure we’re not making targets a scary, evaluative thing that’s a negative experience for teams. Make it positive and, like you said, Laura, celebrate the improvements and the effort; don’t make it just about numbers and evaluations.
Laura Tacho: Yeah. Ben Norris, you had a great question about how to help developers buy into some of the metrics, sharing this anecdote: “They make sense, but when I showed the developers some of the metrics that we were gathering, it was like maybe a little bit of a… it threw the team off a little bit.”
And I think for this question, Abi, your point about inviting them applies: making sure the intention is really clear, that these metrics are meant to improve the system, not to evaluate individual performance, not tied to your bonus or promotion. I hope that’s the case. It’s about helping the developers understand what’s being done with the metrics: improving the system. And then also, for things that are a little more controversial, and PR throughput is controversial, it was controversial even within our authoring team and we went back and forth on it, understanding what it’s being used for, who’s seeing the data, and what decisions are going to be made with the data, all of these things that demystify what the metric is being used for, will only help acceptance. Because in the absence of information, teams tend to make things up to fill the gaps and answer their own questions, and that’s the last thing we want. We want as much transparency as possible.
Abi Noda: Laura, I would also add that when you get that negative reaction from developers, when they say, “Hey, I’m not sure these are even the right metrics,” that’s usually, in my experience, a pretty telling signal. If developers are reacting to a metric and saying, “I don’t know if we should be looking at that.”
That’s probably a case where we’ve accidentally framed an output metric as an input. And I know a lot of leaders feel like, “Oh, developers are just always going to push back against metrics.”
And developers feel like, “Oh, leaders are just trying to bludgeon us with metrics and data-driven evaluation.” But I think in reality, that area of friction, when developers kind of push back on specific metrics, that’s a good signal to dig into. And I think nine times out of 10, it’s going to just be a misunderstanding or an opportunity to clarify, oh, this should actually be an output and we need to come up with a better controllable input that really makes sense to the developer and to teams.
Laura Tacho: Yeah. Yeah. Abi, I want to get into maybe some of the more tactical logistical parts about setting goals. One of the other questions I’m asked a lot is like, “Okay, let’s say we’ve got the right controllable input metrics. We’re avoiding gamification, but how many targets should you set in a given time period? What’s a reasonable amount of things to focus on at a time?”
Abi Noda: Yeah, I don’t know if there’s a magic number there, and metrics are tricky. Sometimes you have compound, composite metrics, so one metric is actually 20 underneath, but you can pick and choose. I think the anti-pattern to avoid is the one where you have 30 different metrics in a spreadsheet across five different dimensions and you’re telling teams, “Go make this all green,” or, “Make this all look great.” It’s important to really focus the organization and focus teams around a pretty narrow set of metrics and goals. If I had to pick, I’d say around five to six controllable input metrics at a time is about the max I would generally recommend, but that can shift every quarter or every six months. It’s a journey, and you can iterate on that as you go.
Laura Tacho: Yeah. I think it also highly depends on your maturity when it comes to continuous improvement. If you’re just getting started, one of the things I like to emphasize is that it’s not just about developer productivity metrics, it’s about metrics in general. So it’s not only “are we focusing on code review SLAs?” It’s “how, as a team, are we using metrics to inform our habit of continuous improvement?” That’s also something your teams need to focus on, and so smaller is better when you’re getting started. As you mature, then aiming for that five-to-six max that Abi shared is definitely a good number to have in mind.
Abi Noda: Laura, I see a really good question in the comments around a multidimensional set of metrics. The question comes from Seth Nelson: “How do you bridge the gap with the CEO?” The CEO who wants a graph with one number, right? I was recently in San Francisco with a team working on this exact problem: how do you map this out? Today we’ve talked about output metrics and inputs as two levels of metrics. If you were to visualize that as a pyramid, you would have your output metrics, or the Core 4, and below them you would have your controllable inputs, which are what the teams are really targeting. The output metrics, then, are the things you would typically be reporting up to executives. Again, that was one of the core goals of the DX Core 4 framework: what should the C level, developer productivity leaders, and platform leaders be aligning around at the leadership level?
And I will share that in this exercise with that team, we even explored rolling up the Core 4 into one composite index score. So you can always keep rolling up for reporting purposes, but again, it’s really important that those controllable inputs, the tactical-level things you’re telling teams to focus on, stay really clear. You can’t mix up the outputs, or the board-level metric, with the inputs, or we get right back into the problems we were talking about.
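For the “one number for the CEO” case, a rollup can be as simple as a weighted average of normalized scores. The normalization to a 0–100 scale and the equal weights below are purely illustrative assumptions, not DX’s actual index; the point is that the composite is for reporting up, while teams keep targeting the controllable inputs underneath.

```python
# Purely illustrative composite: each Core 4 dimension is first normalized
# elsewhere to a 0-100 score (higher = better), then combined with assumed weights.
WEIGHTS = {
    "pr_throughput": 0.25,
    "change_failure_rate": 0.25,  # assumed already inverted so higher = better
    "dxi": 0.25,
    "innovation_ratio": 0.25,
}

def composite_score(normalized_scores: dict[str, float]) -> float:
    """Weighted average of normalized scores, for executive reporting only."""
    return sum(WEIGHTS[k] * normalized_scores[k] for k in WEIGHTS)

print(composite_score({
    "pr_throughput": 72,
    "change_failure_rate": 88,
    "dxi": 65,
    "innovation_ratio": 54,
}))  # -> 69.75
```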
It is important, though, with that framing we were showing today of the outputs and inputs on the same slide, to show the whole organization how these things link and connect to one another. Even though developers aren’t targeting something like NPS, which might be a company-level metric, it’s fair to frame for them how code review turnaround actually links up to developer productivity, which then links up to something like NPS. Connecting the dots for everyone, making everything align from the C level down to the individual developer, is really important to do, and I think that can be done fairly simply once we understand these different levels of metrics.
Laura Tacho: Yeah, I think what you’re saying, Abi, leads really nicely into a question that a couple of folks have had about connecting the metrics to business impact. That expands on what you were just talking about: drawing the line between what we’re doing here in engineering and what the impact is on the business. The metrics-mapping concept, where we take an output metric and break it down into controllable input metrics, is the mindset you want when trying to connect what you’re doing in engineering back to, for example, a company OKR, or maybe the other way around. We say, “Okay, our company OKR is to improve time to market. What would we need to do in order to improve time to market? What are the things standing in our way? We’re losing time due to inefficient processes, we’re spending too much time on KTLO, we’re being interrupted too much by incidents.”
Then we can keep going; it’s like a recursive process, this breakdown of outputs into controllable inputs, and we figure out the right level at which to start setting targets or making goals. I think the other half of that question is figuring out the business value, the business case. If you go to Lenny’s Newsletter, I wrote an article introducing the Core 4 where I talked through my methodology for doing that, but you want to have either a time argument or a money argument. And in fact, those two things can be the same, because once you can express anything in time, it can be converted into money pretty easily.
But these are just some of the tools and building blocks of helping you put together a clear line from what we’re doing in engineering to those company OKRs. Use that metrics mapping process, make sure that you’re focusing on the controllable inputs. And then, if you need to sort of build out a business case, figure out why did you choose that controllable input? It’s not an accident. What were the hypotheses? What’s the data telling you about what time it’s going to save or money it’s going to save? And then start packaging that together into kind of a business case to help you defend those choices and make it clear why the goals exist to begin with.
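As one way to turn a time argument into a money argument, here is the kind of back-of-the-envelope arithmetic being described; the headcount, hours saved, and fully loaded cost below are made-up inputs you would replace with your own data.

```python
# Hypothetical business-case arithmetic: convert developer time saved into dollars.
devs = 400                          # developers affected (assumption)
hours_saved_per_dev_per_week = 1.5  # e.g., from faster code review turnaround (assumption)
weeks_per_year = 46                 # working weeks, rough assumption
fully_loaded_hourly_cost = 100      # USD per engineering hour, assumption

hours_saved_per_year = devs * hours_saved_per_dev_per_week * weeks_per_year
dollars_saved_per_year = hours_saved_per_year * fully_loaded_hourly_cost

print(f"{hours_saved_per_year:,.0f} hours ≈ ${dollars_saved_per_year:,.0f} per year")
# 400 * 1.5 * 46 = 27,600 hours ≈ $2,760,000 per year
```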
Abi Noda: I would also recommend folks go check out the webinar discussion, Laura, that you and I did about the Core 4 framework, because one of the key goals of the Core 4 is to identify the right set of top-level developer productivity metrics that do connect back to the business and can be translated into business value. The DXI, for example: the model we’ve developed connects it to dollars, software engineering hours saved, developer time saved. Or, as you mentioned, innovation ratio, which directly speaks to the concerns of finance and the CFO. So I’d point folks back to that conversation as well for a deeper dive into how we connect developer productivity to the business.
Laura Tacho: Yeah. All right. Abi, one last big question around goal setting, which is about external benchmarks. Could you shed a little light on what they’re useful for? Should teams be setting targets to hit the 75th percentile across the board? Where’s the nuance there, and what advice do you have?
Abi Noda: Yeah, I think top quartile, the 75th percentile, is a good place to aim, and we talked about this in a prior session we did on benchmarks. But it’s also important to call out that when it comes to target setting and goal setting, we need to be really thoughtful about how we do it. For some metrics, for example, higher isn’t always better. Take PR cycle time: you don’t necessarily want cycle time to be zero, right? That would mean you’re not doing code reviews.
So you have to think about it: some metrics need to be measured against an SLA or a threshold. For some metrics we want to set targets around a percentage increase, because different teams are going to be in different places, so we can’t just tell everyone, “Hey, hit this number.” Instead it’s, “Hey, we want to see everyone lift by this percentage.” And then, of course, for some metrics you can just set an absolute number. So those are all considerations, in addition to benchmarks, around how you actually set the number and what the guidance is for the actual target. And it really depends on the metric.
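To illustrate those three shapes of target, an SLA threshold, a relative lift from a team’s own baseline, and an absolute number, here is a small sketch; the metric names and values are placeholders, not recommended targets.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical target definitions: the same goal-setting exercise can produce
# three different shapes of target depending on the metric.
@dataclass
class Target:
    metric: str
    kind: Literal["sla", "relative_lift", "absolute"]
    value: float  # threshold in hours, lift as a fraction, or an absolute number

def is_met(target: Target, baseline: float, current: float) -> bool:
    if target.kind == "sla":            # e.g., code review turnaround under N hours
        return current <= target.value
    if target.kind == "relative_lift":  # e.g., each team improves from its own baseline
        return current >= baseline * (1 + target.value)
    return current >= target.value      # e.g., an absolute innovation-ratio floor

targets = [
    Target("code_review_turnaround_hours", "sla", 4),
    Target("dxi", "relative_lift", 0.05),
    Target("innovation_ratio", "absolute", 0.60),
]
print(is_met(targets[1], baseline=62, current=66))  # True: 66 >= 62 * 1.05 = 65.1
```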
Laura Tacho: Yeah, another thing I want to highlight is that the effort to move up the percentiles is not linear. It takes a lot more effort to go from the 75th to the 90th percentile than it does to go from the 50th to the 75th. Once you get into these higher percentiles, anyone who’s worked on high-availability systems and gone from three nines to four nines to five nines knows it’s not that easy; it’s astoundingly difficult. So it might be possible to move one metric up five points pretty easily; you might find opportunities in your business to do that.
For another metric, it might be really difficult. It might take two years to move that metric up five points. So with blanket, across-the-board targets, there’s generally a lot more nuance and a lot more context that really needs to be taken into consideration. I think Abi’s advice of understanding what’s realistic for the given context and the given team is really spot on. Metrics don’t replace strategy; they enhance it. We can’t just fall back and say, “Oh, well now we have the numbers. Let’s just do a 10% increase across the board.” It still requires human decision making, discipline, and prioritization. The numbers aren’t going to take the place of good strategy setting.
Good. So Abi, wrapping up, what would you say your advice is to companies who, now they have access to the data, maybe they’ve adopted the Core 4, they want to start using the data to drive improvement. What would you recommend that they do to set better targets around developer productivity?
Abi Noda: Yeah, I think if there’s one thing I’d hope folks take away today from the conversation, it is this idea of controllable inputs versus outputs. So when you leave the session today, go back to your team and bring this up. Start using that language of controllable inputs versus outputs. Start thinking about, with what you’re trying to do at your organization right now, what are the inputs? What are the outputs? It’s a fun exercise. It’s not always easy. As we were saying earlier, sometimes the metrics are kind of in the gray area. So really focus on clarifying that, put that on a slide. And then start thinking about, okay, who owns this? Is it teams or is it the platform organization? And then, what we didn’t talk about today is really change management. So getting leaders involved, getting the right comms out, how do you actually align this to an OKR process? Things like that. But that’s all really important too, once you try to take this concept of inputs and outputs, set targets around them and try to actually roll that out in your organization.
Laura Tacho: Yeah. And just keep in mind, if you’re getting a reaction to a metric that it feels unmovable, or “why are we even looking at this?”, it could be that you’re setting an output metric as a goal for a frontline team. That is valuable signal. Don’t just assume that you’re doing something wrong or missing information; it could just be that we’re looking at the wrong thing, and it’s easy enough to adjust that. Now that you have this framework of controllable inputs versus outputs, hopefully it can ease some of that challenge in your own organizations. All right, thanks, all, for joining us live. Thanks for your thoughtful questions, and keep an eye out for the next conversation that Abi and I will host around developer productivity metrics.
Abi Noda: Thanks, everyone.
Laura Tacho: Take care.