Podcast

A new way to measure developer productivity with Laura Tacho

In this episode, Abi is interviewed by Laura Tacho about the new paper he co-authored with Dr. Nicole Forsgren, Dr. Margaret-Anne Storey, and Dr. Michaela Greiler. Abi and Laura discuss the pitfalls of some of the common metrics organizations use, and how the new paper builds on prior frameworks such as DORA and SPACE to offer a new approach to measuring and improving developer productivity.

Timestamps

  • (2:20) Laura’s background
  • (3:59) Laura’s view on git metrics
  • (11:05) What developer experience (DevEx) is
  • (14:37) How the authors came together for this paper
  • (18:55) How DORA and SPACE are different
  • (22:38) Limitations of DORA metrics
  • (24:43) Employing the DORA metrics at GitHub
  • (27:47) What the SPACE framework is
  • (30:44) Whether to use DORA or SPACE or both
  • (33:54) Limitations of the SPACE framework
  • (37:29) The need for a new approach
  • (38:46) What the new DevEx paper solves
  • (40:13) The three dimensions of developer experience
  • (40:54) Flow state
  • (43:10) Feedback loops
  • (43:52) Cognitive load
  • (44:51) Why developer sentiment matters
  • (47:58) Using both perceptual and workflow measures
  • (50:59) Examples of perceptual and workflow measures
  • (54:05) How to collect metrics
  • (59:47) How other companies are measuring and improving developer experience
  • (01:02:56) Advice for earlier-stage or growing organizations

Listen to this episode on Spotify, Apple Podcasts, Google Podcasts, Overcast, or wherever you listen to podcasts.

Transcript

Abi: Laura, thanks so much for sitting down with me today and coming on the show.

Laura: Yeah. It’s great to be here, Abi.

Abi: Well, we’re here to talk about the new paper I’ve published along with my co-authors, Dr. Margaret-Anne Storey, who goes by Peggy, Dr. Nicole Forsgren, and Dr. Michaela Greiler. For people who haven’t seen the paper yet, it’s been published in ACM Queue, and is titled DevEx: What Actually Drives Productivity. We’re discussing this paper on this show because the primary audience for this paper is listeners of this show, that is, engineering leaders and folks who work in developer productivity orgs. A couple months ago, as we were thinking about how to do this episode, I thought what better way to introduce this paper to listeners than to have a conversation with you about it? Just to start off, could you maybe tell listeners who you are and what you do?

Laura: Yeah, I’d be happy to. I’m so glad that we get to have this conversation, because I think there’s so much for us to talk about. I started coaching engineering leaders almost exclusively as my main business two years ago. Before that, I was a VP of engineering. I was a senior director. I’ve been in engineering leadership for a number of years, mostly in DevOps and developer productivity tools. The overlap between DevOps maturity and its morphing into developer experience has been quite an interesting journey to be a part of over the last five years. At this point, after two years of coaching and training, I have helped about 250 different companies introduce engineering metrics or get more knowledgeable about them. I’ve seen so many different kinds of problems, I’m sure similar to all the problems that you’ve seen. I’m really excited about this paper, and how it elevates and evolves the conversation around developer experience.

Abi: Well, one of the things I really appreciate about you is that you have so much in-the-trenches experience with metrics. Not only do you coach leaders, as you just shared, but you were also formerly an engineering director and dealt with a lot of these challenges yourself. I know in a guest post you did recently on Gergely’s blog, The Pragmatic Engineer, you talked about a product called GitPrime, now referred to as Pluralsight Flow, and about some challenges in your journey with tools like that. I’d love to ask you what your view is on git metrics.

Laura: Yeah. I think there are probably some people listening to this who use those metrics, and they meet all of their needs. I’m so happy that that’s the case. It wasn’t the case for me. I introduced GitPrime, aka Pluralsight Flow, to a large engineering team. I just found that it was a heavy lift when it came to getting my team to trust me and to trust the rest of the management team, in terms of what we were doing with those metrics. People feel very spied on, quite honestly. But beyond all of those sentiment issues, it didn’t actually give me insight into where the problems were.

I had this aha moment in an offsite one time with just a small portion of the team, when we were talking about cycle time, or just our ability to get stuff into prod. One of the engineers said, “I just wait so long for a code review.” I was like, “Well, I don’t really care about your opinion.” Thinking back, how could I have really thought that? But in an unflattering light, that’s probably what I was thinking. I wanted metrics to tell me where the problems were, and that’s why I was so invested in a tool like GitPrime. But what I found was that in zero cases did those git metrics, those activity metrics that you can scrape from an API, tell me something that my team wasn’t telling me already.

Once I realized that, things came into such clearer focus for me. Throughout my experience coaching all these different leaders, I’ve found that it’s pretty much the same for them as well. That data alone isn’t going to tell you anything that your team isn’t telling you already. The power is really when we can combine them: when we look at things like developer sentiment and developer satisfaction alongside the workflow metrics, we can actually set sensible and realistic targets, and track our progress to see if we’re reaching a goal or not.

Abi: Well, I echo what you said. I also have an interesting history here. Before DX, I built a company called Pull Panda, which offered these types of metrics. I had very similar experiences to what you described, where these metrics seemed really valuable on the surface and companies would buy them. But when it came down to how they were using them, I would often find that they weren’t getting that much valuable insight. The metrics were often being used in scary ways, such as performance reviews and stack ranking developers and that sort of thing. We were earlier talking about Pluralsight Flow, but there are today, by my count, at least 30 companies out there selling very similar products, and a lot of companies building their own tooling around this. I’m curious what your perspective is on that trend. Why are these metrics seemingly becoming more and more common, despite folks like us and other leaders talking about their ineffectiveness?

Laura: Yeah, this is such an interesting phenomenon in our industry. If you were to look at AWS usage 10 years ago or whatever, there was a focus of “just do everything.” Then the market swings toward, “Oh, we have to optimize.” I feel like developer productivity metrics are that same sort of trend of, “Okay. Well, now I have this big engineering organization, how do I optimize it?” Because often, these engineers are the most highly paid people in your company; if not, they’re definitely up there. It’s also historically an organization that is quite opaque and black-boxy.

Senior leaders want insight into what’s happening in these highly paid, highly competitive areas. They just want some insight. In the same way that we can look at revenue on the sales side, what’s the equivalent to look for? Well, there really is no one metric that matters. I think that’s been well established by research. So we have all these companies trying to pull together dashboards and pretty graphs and charts, with automatic data so as not to burden the team. That way they can tell a story, in some ways to defend their budget, but also to provide more transparency to senior leaders.

Abi: Appreciate your perspective, and I would agree. I think the topic of how to measure engineering, and engineering metrics, is often so contentious and polarized. Of course, a lot of folks have really negative experiences being on one side or the other of metrics being misused. However, I like how you talked about it. Because we should be empathetic to leaders and businesses who, as you mentioned, do spend tens of millions of dollars on developers. What they’re trying to do makes a lot of sense. They’re trying to find ways to optimize that spend, to understand how good they are and how they can get better. To your point, I think that unsolved need is partly what has continued to drive the increase in some of the “bad metrics” that we see, as well as the continued research and evolution of other approaches, which we’ll talk about today.

Laura: Yeah, definitely. I think engineering is often a department that is highly dependent on other departments. Maybe in a more extreme way than, for example, sales, which can only sell to qualified leads that marketing brings in. But sales and marketing tend to work closely together, and they have similar objectives when it comes to revenue. When it comes to engineering orgs, sometimes you’re working cross-functionally with teams that have very different objectives than engineering does.

Engineering is responsible for how it gets built, but often not for what gets built. I think for that reason, the metrics become such a point of contention, because it feels quite unfair. But at the same time, it’s absolutely reasonable for a CEO to want metrics on one of the biggest budgets in their company and how it’s being spent. Whether it’s wasteful or not wasteful, or whether we’re doing a great job or just a good job. It’s hard to know. It’s hard to have visibility.

Abi: Well, it’s funny you brought up the CEO specifically. I’m sure we’ll touch on this later, but I think for so many leaders out there, this journey into what I like to call the elusive quest to measure developer productivity really often starts with being asked by a CEO, “Hey. Can you give us some metrics on engineering?” Of course, that’s not an easy thing to actually produce. I think this background, learning more about you and the brief discussion we had about metrics, will help listeners as we move into the discussion about the new paper, and as we talk about other research around this topic such as DORA and SPACE. As mentioned earlier, I want to now turn things over to you to lead this discussion. I think you’ll have so many great questions, not only about the research and what it means for leaders, but also how it can be applied. With that, Laura, you are now the host.

Laura: Amazing. I have so many questions for you, Abi. The first thing is that we’ve actually been talking now for however many minutes about developer experience, but we haven’t actually addressed what it means. There’s no better person I can think of to define it than you. Can you give us an answer: what is developer experience, anyway?

Abi: Yeah, great question. When we began our research into developer experience, one of our goals was to actually define it. You started to see developer experience talked about a lot in the news and on blogs and Twitter, and people had varying definitions for what it was. Some people were actually referring to external developer experience, meaning tools vendors that provide software to developers: what is the user experience of their products? Then there was another half of the population talking about internal developer experience. Today we are talking about internal developer experience. In our research, we define developer experience as how developers think about, feel about, and value their work.

It sounds like a simple definition, but it actually took a lot of effort to arrive at. It builds on prior literature, including a lot of research from psychology on what the word experience means for individuals. If we take that definition of developer experience, what it means in layman’s terms is the actual lived experience of developers. What are the pain points, the friction points, the delightful moments that they experience in their everyday work? I think what’s also interesting is that beyond the definition of what developer experience is, developer experience is a practice. The way I like to describe that practice is that it’s a developer-centric approach to understanding and improving developer productivity.

Laura: Mm-hmm. That’s such a nice definition, to think about the positive of how you think about and value your work, and how you feel about it. Because a lot of definitions of developer experience, or the way it’s talked about, are often framed in the negative. We’re not going to tell you what great developer experience is, but I can tell you what bad developer experience is. I really like the work that came out of Stripe. There’s a really great Twitter thread by Rebecca Murphey, who explained how they rolled this out, and how they treat developer experience by letting developers register paper cuts, from the phrase death by a thousand paper cuts.

There are all these little things that you encounter that slow you down, and introduce friction and frustration into your world. I like thinking about the other side. What brings me joy as a developer? I love seeing my build finish really fast. I feel very productive. I feel like I’m unhindered. I can do the work that needs to be done, and I don’t have to context switch and do all of those things. Obviously, developer experience is important to organizations that want to maximize the effectiveness of their development teams.

You mentioned that this work on developer experience builds not just on psychology and other business research, but also on quite a big body of research that came out of what we might call the DevOps movement. That was something everyone was talking about in 2017-2018. Kubernetes was on the scene. We were talking about this shift to cloud, digital transformation. In fact, some of your co-authors on this paper have done a lot of research in that arena. Can you tell me a little bit more about how you connected with Nicole, Peggy, and Michaela to work on this?

Abi: Well, Nicole, Peggy and Michaela are all highly acclaimed researchers in their respective areas of software engineering research. I think what ultimately brought us all together is this recognition that this is a big problem for industry, a big problem for leaders. This problem of, what should we be measuring and focusing on to improve developer productivity, as mentioned in the paper, has been elusive. No prior research has really addressed that question specifically. When you look at something like DORA, and I’m sure we’ll talk about this more later, DORA wasn’t focused on developer productivity.

It was actually, as you kind of alluded to, more focused on the movement around continuous delivery and digital transformation, and on a construct called software delivery performance, which is different from developer productivity. Then later on, Nicole and Peggy and some others got together to write this paper called SPACE, which provided a new framework for how to think about developer productivity, which again was a distinct construct and problem from software delivery performance.

I’ve actually known Nicole from way back. She and I connected originally when I was working on my previous company, Pull Panda. She had just published Accelerate. I reached out to her just to bounce around ideas on this problem of engineering metrics. I was really inspired by her work. Fortune had it that she and I ended up working together at GitHub a couple years later on this specific problem. It’s kind of funny, we actually worked on incubating a git metrics product while we were at GitHub.

We have a lot of funny stories about that, but we also worked together on actually applying the DORA metrics and some of the concepts from SPACE in our work at GitHub, and on trying to bring some of that knowledge and practice to customers. We crossed paths and worked together right around the time that she started writing the SPACE framework. A couple years later, once I had left GitHub and was working on DX and this research into developer experience, she and I reconnected and said, “Hey. This is a problem we have both been talking about and trying to work on. Let’s pair up and try to tackle this together.”

Peggy and Michaela I also know through other channels. Peggy is one of the most acclaimed researchers on developer productivity. She’s published hundreds of papers on the topic, and much of the SPACE framework was really built on her prior research in the field. From my perspective, we couldn’t ask for a better group of researchers to come together to try to evolve this work, and move the industry forward in terms of how to actually apply these principles.

Laura: Yeah, absolutely. I work with so many leaders who don’t have time to spend in the theoretical space. Maybe they’ve read these papers, but really, they’re on the ground, they have problems, and they need something that’s going to help them solve those problems. Let’s talk about DORA first, as we go through a little bit of this evolution story of how we ended up at DevEx. As you said, Dr. Nicole Forsgren is probably best known for her book Accelerate. She also worked on DORA. I actually met Nicole way back before Accelerate happened. I think we met at an O’Reilly conference in Santa Clara in, I don’t know, 2014 or something. I mean, it’s a small world.

It was a very small world of women working in the developer tools space. I think she was at Puppet at that point, if I’m not mistaken. Our paths have crossed a lot. Then I was at CloudBees, and CloudBees was a big sponsor of the State of DevOps Report, so we had another opportunity to cross paths again. DORA has emerged as the canon of metrics when it comes to measuring system performance. What are some of the reasons that organizations latch onto DORA and adopt it? What kinds of questions are they trying to answer with DORA?

Abi: Yeah, it’s interesting. Taking a step back, I really think, as you’ve mentioned, leaders don’t always understand the nuanced distinctions between something like DORA, something like SPACE, and now the new framework we’ve published. When you take a step back, leaders are latching onto all of these things with the same end goal in mind, which is the one we talked about earlier. They’re really just trying to either report to stakeholders or report to themselves, to get an understanding of how their organization is doing. Is it good? Is it bad? Is it so-so? How can they get better? What are the bottlenecks? What are the constraints?

What are the insights they can pull from this data, these metrics, that can help them take action or make decisions to improve their organization? With DORA, there are a lot of interesting discussions happening, including from the researchers who currently lead DORA at Google. One of the semi-critical ways I’ve seen DORA described by some of these researchers is that they’ve called it the easy button for engineering metrics. The DORA metrics, there are four of them, the four key metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service. They’re on page 19 of Accelerate. Everyone seems to be using them. It’s sort of become this easy button for folks who are trying to solve this really hard problem.

What to measure, and how to use metrics and data to inform action. Here are these four “research-backed metrics.” It’s an easy button for organizations and leaders to adopt these metrics and say, “Okay. Now we have metrics. We’ve gone from nothing to something, and we’re off to the races.” I think that’s one of the reasons why the DORA metrics have become so popular and really proliferated across the industry. They’re simple, they make sense, they’re good metrics. You look at them and there’s nothing scary or harmful about them, such a contrast to something like lines of code or counting commits. They’re well-intentioned, there are benchmarks, there’s research behind them, and lots of folks are using them.

Laura: Yeah. I think Accelerate was also so pivotal in propelling DORA to the forefront in the hearts and minds of even non-technical leaders. In my own experience, I come from the developer tooling and infrastructure world. We’ve got The Phoenix Project, we’ve got books like that. Accelerate is right there on that very short list. If you’re working at a company like this, chances are your CEO might even have read Accelerate. I’ve found that a lot of leaders I work with are getting asked about DORA metrics by people who have never been software developers in their lives. The metrics are so accessible, because they’re such a nice, short, tidy list. There are benchmarks and there’s research. It’s like, “Well, why can’t I have this? Can I have this now? I would like to know this data.”

I think one of the limitations of DORA, and this is openly talked about in the research that followed, for example SPACE, is that DORA was never meant to be the end of the story. DORA is just insight into software delivery performance. It’s not developer experience. What I find very fascinating is SPACE… Maybe you can give a breakdown of what SPACE is, as we’ve been tossing around these acronyms. Nicole also worked on SPACE, and it very much continues to pull that thread through and build on new things. Could you tell the audience who’s listening, who might not have heard of SPACE yet, very quickly what SPACE is?

Abi: Yeah. I’m excited to dive into SPACE, but I do want to say one more thing about the DORA metrics. We talked about how they’ve become so popular. They’re this “easy button” for organizations, but more and more you’re hearing leaders talk about the shortcomings or limitations of those metrics, which SPACE kind of gets into. A very common conversation I’ve been having more and more with leaders is asking them, what insights have these DORA metrics actually given you, or how have you been able to actually use them? More and more leaders are realizing that, although the DORA metrics are helpful for getting a big-picture sense of your organization and comparing against the benchmarks, it’s actually difficult to derive much more than that. It’s very difficult to identify the specific constraints in your organization from such high-level metrics. I’m curious, actually, if you’ve seen that sort of trend or pattern with the leaders you work with as well.

Laura: Absolutely, and this is something that Gergely and I talked about as we were going through the prep for the article I wrote for The Pragmatic Engineer. He shared that when he was at Uber, they used DORA metrics as well. At the time, Elite was the highest category; that category has since been dissolved for various reasons. But if you get the gold star when it comes to DORA metrics, you basically get a pat on the back. It says, “You’re doing a good job,” but it’s not going to give you any direction in terms of what is holding you back.

Whereas if you’re just starting on that journey, DORA metrics can be very valuable. Because they tell you, “Oh, you should really look at your cycle time,” or, “Your MTTR is way out of whack, and this is what you need to focus on.” But especially for teams that have been established in the last five, even seven years at this point, DORA metrics might just not tell you what you hope. They’re just going to say, “You’re doing a good job.” You have to do something else in order to get insight at a level where you can actually have impact.

Abi: Yeah, I completely agree. It’s funny. When Nicole and I worked at GitHub, one of the things we did was deploy the DORA metrics internally. I was actually given an OKR titled Accelerate GitHub Engineering, and we tried to use the DORA metrics as the KR to drive that work. What you just described was exactly our experience. We had the DORA metrics, and they were pretty good. So the next question was naturally, “Okay, so what should we do to actually improve?” The DORA metrics didn’t answer that question. Not only that, as we went out to a lot of teams and leaders to interview them and talk to them about, “Hey, what would it take to improve your lead time?”, it became very clear that the individual context was being lost in this bubbled-up metric of lead time.

For example, say a team had a lead time of three days. Well, is that good? Is that bad? It depended on what was actually going on with that team, and what the good and bad reasons were for that lead time. One example, which Nathen Harvey, who now leads DORA at Google, also brought up in a recent conversation: for a mobile team, an iOS team that deploys by shipping through the App Store and getting review, this notion of lead time doesn’t even really apply in the same way that it does to other teams. I saw that playing out in smaller ways across the organization, which made it really difficult to do anything with the DORA metrics.

Laura: When I teach leaders about DORA metrics, I have a big disclaimer. “These metrics are great if you are a web application team. That’s what these benchmarks are mostly drawing on.” If you are a mobile team, an SRE team, a data engineering team, as you mentioned, there are so many types of engineering teams that are not web application teams. First of all, DORA metrics can make you feel really bad, because they’re unrealistic. Mobile teams, for example, are beholden to the app store. But the metrics are also just not appropriate for other types of work. For example, SREs have a lot of interruptive work. That’s sort of the whole point. They’re not working in the same cycles that a web application team is working in.

One of the failure modes I’ve seen is companies that come in and say, “Okay, DORA metrics across the board.” Because it’s an easy button, and they expect it to be an easy button, without taking the time to consider the nuances even across the different types of teams in their own organization. They end up with teams who feel unfairly judged, who aren’t getting the results they expected from instrumenting DORA metrics, and who, at the end of the day, aren’t getting insights that are actionable for them. As we talked about before, DORA does have limitations.

I think even the original authors recognized that it had limitations. One thing I find very interesting is that Nicole was also a researcher on SPACE, the paper The SPACE of Developer Productivity. So it is very much an extension of her own experience working so in-depth with the DORA metrics, and then coming into SPACE. Before we get too far into SPACE, just so that everyone isn’t googling acronyms, can you give a very quick overview of what SPACE is and what it’s intended to do?

Abi: Yeah. First of all, it’s funny how you described the lead-in to how SPACE became a thing. As we talked about earlier, I think one of the misconceptions people have is that SPACE is a better way to measure or understand developer productivity than DORA. In fact, they’re really just two separate things. DORA was focused on software delivery performance, which was really about the ability to deliver software. In contrast, SPACE was trying to tackle developer productivity head on. Thankfully, there have actually been decades of rigorous research on what developer productivity is and what drives it. SPACE was really a meta-analysis or summarization of those decades of research, in order to provide a practical framework for practitioners.

The SPACE framework, SPACE, is an acronym that stands for five different dimensions or categories of developer productivity: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow. The main point of SPACE was that developer productivity cannot be reduced down to a single number or a single metric. Developer productivity is highly complex, because development work is very multi-dimensional, varied, and diverse. I was recently having a conversation with a leader who said to me, “Hey, I’d hoped that SPACE would just provide me this canonical list of metrics that everyone should be measuring, kind of like DORA. Instead, it gave me this 5x5 or 4x4 matrix of examples. It left me feeling even more confused.”

While I understand where this leader is coming from, I think it’s important that people understand that was partly the point of SPACE: to make it explicit that developer productivity isn’t simple. You can’t just measure it with one or two simple metrics. I think SPACE has done a great job of creating that awareness, that developer productivity is more than just the number of commits or cycle time or number of lines of code. There are a lot of other aspects to it. In particular, the sentiment of developers, the satisfaction dimension in SPACE, is something that’s really important.

Laura: I often get asked this question by leaders I work with: “Okay. Should I use DORA or SPACE?”, as if they are somehow in competition with one another. The answer I give them is that SPACE is the overarching big picture. As you said, it is categories. It is not a list of metrics. I think that can be disappointing for some people, because they want another easy button. I educate them on how SPACE broadens our thinking around developer productivity, and helps make sure that you don’t have any places where you’re just really missing insight. For example, the satisfaction and wellbeing category and the communication and collaboration category are the ones that are most often overlooked.

What I tell them is that DORA is measuring a capability to ship code, and that capability is one aspect of developer productivity. We can actually put DORA within SPACE, and see where DORA metrics align to the SPACE categories, but they’re not in tension with one another. I really like your perspective that they’re actually totally different things. In my mind, I keep them so connected that one fits into the other. I wonder if you could just give me a little more of your take on that, because I think we have different experiences there.

Abi: Well, I really love your framing, and I don’t disagree with it at all. When you take a step back, what leaders are trying to do with these things is ultimately the same. So I think the advice you give to leaders is spot on. They’re looking at this as a constellation of different metrics and trying to make sense of, “Okay. Do I apply one or the other?”, or, as you suggest, are they really one and the same? The perspective I was sharing is that, although there may be overlap in the metrics themselves, the research behind DORA and the research behind SPACE were aimed at two different problems, two different lenses into organizations, both of which are important.

I think Accelerate was much more focused on DevOps capabilities, continuous delivery capabilities, and the system performance side of things. Whereas SPACE is more holistic, looking at what drives team, organizational, and individual productivity. When we talk later about this new paper and framework, we’ll actually find the same thing: the DORA metrics are encapsulated in the DevEx framework and in the example metrics that we provide in the paper.

Laura: We’re inching our way there. I think it helps to understand the historical context of how and when DevEx evolved. Because all of these things have really informed not just the research, but also practitioners and the questions they’re asking. As they go up the question ladder or the problem ladder, what they’re aiming at, what they’re trying to solve for. To that point, we talked about DORA being an easy button. SPACE is not really an easy button. It doesn’t have a list of metrics. In your experience, what are some of the reasons that organizations are drawn to SPACE, and how do they even implement it, given that it’s not very prescriptive?

Abi: Well, I first want to call out that we recently had Dr. Margaret-Anne Storey, or Peggy, on this podcast, where we talked specifically about SPACE. She had a lot of interesting observations and stories to share about the challenges organizations face in trying to apply it. I would definitely recommend listeners check that out. I think it’s funny you mention that SPACE is not an easy button, and doesn’t just provide an easy list of metrics. One of the most common anti-patterns I see is that people do just scroll down to the example metrics table in SPACE, view it as a thing they should just copy and paste into their organization, and call it a day.

Peggy, in our conversation in that past episode, specifically talked about how those metrics were examples, not recommendations. Not even suggestions, they were examples. In reality, SPACE is quite difficult to apply. You can’t just take SPACE and turn it into a dashboard easily. There are a lot of people, I think such as you, who can help organizations think about how to do that in useful ways, but it’s not as easy as buying something off the shelf, and definitely not just copying and pasting the list of example metrics from the paper.

Laura: I could not agree more. I think one of the common pitfalls is that we look at SPACE and think, “Okay. I’ll just pick one metric from each category, and call it a day.” It is a shortcut, but that’s not really the intention. In fact, when I was having a conversation with Peggy a couple months ago, I said, “I get this question a lot. Right from the source, I want to know your opinion. Is it important to have a set of metrics for every single category?” In short, the answer was, “Well, no. In the paper we suggest three different categories as kind of the minimum.”

But really, we talked about SPACE being a set of lenses to help you see things more clearly, and to make sure that you’re not missing things. It’s guidance and a set of principles, versus something that’s very prescriptive. For a group of leaders who are known to be very logical and rational, and who don’t have a lot of time, that level of fuzziness can get in the way of the value of SPACE, because it is difficult to implement. And it does take time to do, versus DORA, where you can just integrate the GitHub API or your Jira data and have some answers pretty quickly.

Abi: Yeah. As you’re alluding to, when your CEO asks you for metrics, you don’t have time to go through all the fuzzy challenges of SPACE. You just need to come up with some metrics, and so DORA becomes another easy button. When I talk with, for example, developer productivity teams or developer experience teams, they have a little bit of a different perspective on this problem: they’re really looking for actionable insights. These teams aren’t looking for metrics just to appease a stakeholder or report up to a boss. They’re looking for insights to guide action.

What I’ve found with these teams is that there really is this practical gap, where organizations are struggling to implement SPACE because it is so fuzzy. That’s one of the things that motivated us to work on this new framework. SPACE made the argument that organizations need to look beyond just activity-based measures like commits or pull requests. This new paper and framework, I think, provide a practical approach for actually doing that.

Laura: What’s so interesting, and I’ve realized this just now as we’re talking: when I am working with a leader who is interested in implementing DORA, oftentimes they’ve been asked for these DORA metrics by someone else. But the leaders who are very interested in implementing SPACE don’t have their CEOs asking them about SPACE. It’s just not as accessible, because it’s not as cut-and-dried as DORA. The leaders who are implementing SPACE are really doing it because they see it as a way to improve the quality of life for their development teams, and they really want to have a holistic view of it.

I think now we’re at a point where we can transition to talking about developer experience, and the next step of the story. In your paper we have DORA, we have SPACE, but you’re introducing three more pillars of developer experience. Could you talk about that, and then we can get into some of the details of what’s new, or what’s being presented with more evidence, in this paper that’s just come out.

Abi: Yeah. First, for some background on the origin of this new paper: we’ve talked about the challenges with existing approaches and methods for what to measure and how to measure it. Still, there’s been this overarching, elusive problem for leaders: what should we measure? What is meaningful? What should we be measuring to understand and improve the productivity of our developers, whom we spend tens of millions of dollars on? The purpose of this paper is to present a new and practical approach for doing exactly that. What we offer in this paper is, first off, a practical framework for how to think about what developer experience consists of, and what it is.

In our prior paper, we published the 25 most relevant factors that affect developer experience. We felt that was too many things to bring to practitioners, so we wanted to find a more simplified model for organizations. How do we, as an organization, actualize and operationalize this? How do we begin measuring it and improving it? That was the second part of our paper, where we provide a set of example metrics and recommendations for organizations on what they should be measuring and how, along with some real-world examples.

Laura: Can you talk us through those three pillars of developer experience?

Abi: Absolutely. The three, we call them dimensions or core categories of developer experience are flow state, feedback loops and cognitive load. These three categories emerged from our research. We found that these three dimensions really crosscut all the different factors that we’d uncovered in our prior research. With these three dimensions, what we’re recommending to organizations is, “Hey. Just focus on these three. All the rest of the factors which we’ve identified in our research interrelate or emerge within these three categories.”

Just to go into them briefly, flow state is actually something that was introduced in SPACE, if you recall. There is a category of SPACE called efficiency and flow, which touches on the cognitive state of flow, a really interesting concept I think. There’s a lot of opportunity for more research on that, specifically as it pertains to software developers. Flow state is that cognitive state of being fully energized, immersed, and fulfilled in one’s work. You lose track of time and space. I think for leaders, the important thing to take away from flow state is not only that it creates the space for optimal productivity in developers, but that there’s also this piece around motivation. How do you actually motivate developers to be energized and in a flow state in their work?

When we talk about how do you optimize your tens of millions of dollars of developer spend, I think leaders often immediately leap to, how do we make people more efficient, or how do we make them work harder? Those are both things you can try to do, but I think what’s often missing is, if you want a developer to be working at 9:00 PM on a weeknight… Which I’m not saying you should want to do that, but if you did, the way to actually achieve that would not be through cracking the whip. It would be through providing such fulfilling and energizing work that a developer is motivated and loses track of time. All of a sudden, they’re up at 9:00 PM solving a really difficult problem.

Laura: Yeah. I love that “fall in love with the problem, not the solution,” kind of thinking. I’ve been very curious about flow state as well, actually since getting into a bit of the research because of SPACE. There’s different things that you can do to try to optimize it, but really it comes down to, are you interested in doing the work and is it enjoyable to you? As you said, how do you think about it? How do you perceive it? How do you feel about it? Are those positive? Then flow state is more within reach.

Abi: Yeah, absolutely. I didn’t fully answer your original question. The other two categories are feedback loops and cognitive load. Feedback loops, I think, is most closely tied to things like the DORA metrics and the more conventional metrics you see organizations using, which are around what’s called value stream mapping or value stream optimization. Feedback loops are a lot about cutting out the wasteful time developers spend waiting for feedback from their systems and tools, as well as from people. It’s also important to call out that it’s not just about the time spent waiting for feedback, but ultimately also about the quality of the feedback received, the distinction between speed and quality, which I think we’ll talk about later.

The third category is cognitive load, which really describes the mental processing required for developers to do their work. This, again, manifests in many different ways. It could be the difficulty of going into some mucky code they need to work in. Or it might be the difficulty of figuring out how to support and deploy software in an increasingly complex environment, with tools such as Kubernetes. Or it might just mean the difficulty of looking up the answer to a technical question from a person or through documentation.

Again, these three categories encapsulate all the different factors that we found in our prior research. By focusing on these three dimensions, organizations can have aligned conversations around what they should be focusing on to uncover opportunities. Then as we’ll talk about next, how they can approach actually measuring and capturing insights about the developer experience.

Laura: One thing I wanted to ask you about, at the altitude level of these dimensions, is satisfaction and wellbeing. When people are brand new to developer experience, they often think about it as analogous to developer satisfaction. They’re thinking DevEx can be measured by an eNPS score, an Employee Net Promoter Score. It’s a measure of satisfaction: how happy someone is doing their job, and whether or not they’re going to recommend, “Work on my team,” to one of their friends. What’s interesting to me is that your three dimensions don’t directly address satisfaction in any way.

In fact, when I talk about SPACE with the leaders I coach, I often give them this kind of silly visualization. SPACE is a four-legged octopus. The head is the S, and then everything else flows down from it, where satisfaction touches everything. If you’re working on stuff that doesn’t get used by end users, your satisfaction is going to be lower. If you’re constantly being interrupted, your satisfaction is going to be lower. Does that come into play here with the way you’ve set up these three dimensions, how they interconnect with each other, and how they actually influence a developer’s sense of satisfaction in their lived experience of being a developer?

Abi: Great question. First of all, when people talk about satisfaction, it’s sort of a catch-all term for a lot of different but related ideas. For example, a lot of companies are saying, “Oh, we measure developer satisfaction. It’s one of our C-suite metrics now.” When I ask them, “What do you actually mean by developer satisfaction? What are you actually measuring? What’s your survey construct?”, I’ve found that people are measuring all different types of things. Some people mean satisfaction with tools. Other people mean job satisfaction, which is more closely aligned with something like eNPS. eNPS is funny, because a lot of researchers I’ve met, who come more from the employee experience and engagement side, have a lot of criticisms of it.

Because eNPS doesn’t actually measure satisfaction. It measures employee advocacy, how much people want to advocate for their organization as a place to work. Anyway, hopefully that helps a little bit in framing the fact that satisfaction is a bit of a catch-all term. Now, in terms of this new framework and this new paper, satisfaction definitely crosscuts all the different dimensions. I would more so describe it as developer sentiment. Developer sentiment is really the strongest signal I think you can get across any aspect of developer experience or development tools and processes. Just asking your developers is a direct source of insight into what your constraints and bottlenecks are as a leader.

Laura: Absolutely. My biggest advice is just ask people. I say, “If I could rent an airplane to do skywriting, just to summarize my advice on introducing developer metrics, it’s very simple. Just ask them, because chances are they have the answer. They live that pain or that joy every day. They know. It’s very close to them.” It’s surprising, as the logical, rational engineering leader who has a bias toward quantitative, automatically collected data, how little value we put on things like perception and sentiment. One of the things that I really like about this paper, and that I think is going to be quite… I mean, it is quite a divergence from things like DORA.

It’s quite a divergence from DORA, which is focused on metrics that are collected in a completely automated way, anything you can plug an API into. In this paper, in this collection of metrics, perceptions and workflows are peers to each other. Perceptions are human attitudes and opinions, and workflows are metrics about system behavior. How did you arrive at these two types of metrics that allow you to measure something much more comprehensively?

Abi: One analogy I like to share is, imagine you go to the doctor and you’re feeling really sick. So you have the perception that you are ill. You go to the doctor, and the doctor just takes some objective metrics, quantitative metrics. They take your blood pressure, your temperature, your heart rate. They say, “Hey. It looks like you’re all good. Nothing wrong with you. All the metrics look great.” You would be flabbergasted, right? You would say, “No, wait. Hold on. I’m telling you that I think something is wrong.” I think that little anecdote highlights the importance of focusing on the perceptions and opinions of developers. Because as humans we’re able to observe the whole system. Not just the small individual things that we’re measuring in our systems, but the full picture, the gaps between the tools, the experience of using the tools and systems, or going through the processes.

Conversely, imagine you go to the doctor and say, “I’m feeling really healthy,” and they don’t take your blood pressure or your blood glucose levels. They might miss something that’s actually wrong with your health, or an opportunity to improve your health. I think that’s similarly another problem we see. If organizations only focus on the perceptions and observations of developers, you might miss out on more nuanced opportunities for improvement, or even explanations for why someone might feel a certain way or have a certain attitude about something. So you really need both subjective and objective, or as we call them in the paper, perceptual and workflow measures, to get a complete view into your system, and to understand the developer experience and opportunities to improve it.

Laura: I would love it if we could get extremely nuts-and-bolts-y about this approach to measurement. Because I think it’s something so new that some leaders might have a hard time wrapping their heads around it. We talked briefly about code reviews being a way to explain this. Can you explain how you would measure something related to perception versus the workflow, when you’re trying to measure code reviews?

Abi: Absolutely. I think code review is a really intuitive example, because we all understand how it’s currently measured by a lot of organizations. Code review turnaround is a great example of something that is fairly measurable through our systems, and is also quite commonly measured by organizations. When you’re looking at something like code review, there’s the system side, or the workflow side, which is: how long does the typical code review take? There are various ways to measure that, often through tools like GitHub. But one of the things that I think is commonly missed is, A, is there a problem?

I’ve met with many teams where one team might take three days for code review, but they never actually get blocked because they’re working on parallel things. Whereas another team takes two hours for code review, but they’re single-track in their work, so they’re actually really frustrated that they have to wait. So you need to balance how long code reviews take with whether your developers are actually getting blocked by code reviews, and whether they’re actually frustrated by the code review process. Again, you need both.

I think having the sentiment from developers, their perception of the code review turnaround, tells you the magnitude of the pain and the bottleneck felt by developers. But on the other hand, even if developers feel perfectly happy with code review turnaround time, as a leader, you might say, “There’s still an opportunity here to optimize this, so that we can get these features out to our customers faster. Even if the developers aren’t getting blocked by them, I want these features to get to customers faster. Therefore, this is something we should optimize.” Hopefully that’s a helpful sort of concrete example of, again, the importance of both the perception and the workflow as it pertains to something like code review.
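To make the workflow half of this code review example concrete, here is a minimal sketch of how review turnaround might be pulled from GitHub’s REST API. It is only an illustration of the idea discussed above, not tooling from the paper; the owner/repo names and the GITHUB_TOKEN environment variable are placeholders, and pagination and error handling are omitted.

```python
# Sketch: median code review turnaround (PR opened -> first submitted review)
# using the GitHub REST API. Placeholder repo; pagination/errors omitted.
import os
import statistics
from datetime import datetime

import requests

API = "https://api.github.com"
OWNER, REPO = "your-org", "your-repo"  # hypothetical repository
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def parse_ts(ts: str) -> datetime:
    # GitHub timestamps look like "2023-05-01T12:34:56Z"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def first_review_delay_hours(pr: dict):
    """Hours from PR creation to its first submitted review, or None."""
    reviews = requests.get(
        f"{API}/repos/{OWNER}/{REPO}/pulls/{pr['number']}/reviews",
        headers=HEADERS,
    ).json()
    submitted = sorted(r["submitted_at"] for r in reviews if r.get("submitted_at"))
    if not submitted:
        return None  # this PR was never reviewed
    return (parse_ts(submitted[0]) - parse_ts(pr["created_at"])).total_seconds() / 3600

# The 50 most recently closed PRs only, to keep the sketch small.
prs = requests.get(
    f"{API}/repos/{OWNER}/{REPO}/pulls",
    params={"state": "closed", "per_page": 50},
    headers=HEADERS,
).json()

delays = [d for pr in prs if (d := first_review_delay_hours(pr)) is not None]
if delays:
    print(f"Median review turnaround: {statistics.median(delays):.1f} hours")
```

A number like this only answers the workflow half; per the discussion above, the perceptual half still comes from asking developers directly, for example with a survey item about how satisfied they are with code review turnaround.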

Laura: I think that’s such a great example, and also one from my own life. I’ve worked on a team where we had a 24-hour code review turnaround. Some people on the team thought that was great, and some people thought it was way too slow. It’s the same objective number, but such a difference in perception. That really did shift, as a leader, what I chose to invest time in: what was the perception of my team? Did they think it was a problem? Because it doesn’t really matter what I think, it matters what they think. I want to get into how to actually measure this stuff. Because now it seems like we’re adding automatically collected metrics on top of survey data, and there are just so many different ways to collect all of this data. Does the paper make a recommendation on how to collect this workflow and perceptual data?

Abi: We do make a specific recommendation, and that is to start with surveys. You brought up that historically, things like DORA are more often measured through system data. It’s funny. I think that’s actually an anti-pattern that has proliferated because of the bias you mentioned earlier, of engineering leaders wanting highly precise, realtime, quantitative metrics. One of the things folks don’t realize is that Nicole and the research she led around DORA actually measured the DORA metrics with surveys. In fact, Google continues to use surveys to measure those four, now five, metrics, and to produce the benchmarks every year.

When Nicole and I worked at Microsoft together, Microsoft came to us and said, “Hey. We’re trying to measure the DORA metrics, but we have all these different systems.” I mean, the scale of Microsoft engineering is hard to fathom. What do we even define as a deployment? When do we start the “timer” for measuring something like lead time? Ultimately, the advice we gave to the team at Microsoft was to use surveys. Because with surveys, you don’t necessarily have to have a standard instrumentation and analytics approach to data, if you’re just asking people to report how their systems work or what they’re seeing on the ground. Anyway, that was a bit of a tangent to your question about our recommendation.

In the paper, we recommend starting with surveys. Surveys are, first of all, the only way to capture the perceptions, the perceptual measures we’ve talked about. But they’re also a way to capture a lot of the workflow measures within an organization. Albeit not with the same level of precision or continuity you would get if you instrumented all your systems, but you can get enough to inform the types of decisions organizations are trying to make with this data. Then our recommendation is: if you need more realtime data, or higher-precision data, or if you want to crosscheck your survey data and triangulate it against system data, then invest in the often very expensive and challenging work of instrumenting systems to capture that realtime data and produce metrics that way.
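As a rough illustration of what a survey capturing both kinds of measures could look like, here is a small sketch. The three dimensions follow the paper, but the question wording and answer scales are placeholder examples of ours, not items taken from the paper or from DX.

```python
# Illustrative only: paired survey items, one perceptual and one
# self-reported workflow measure per DevEx dimension. The dimensions come
# from the paper; the wording and scales below are placeholder examples.
SURVEY = {
    "feedback loops": [
        ("perceptual", "How satisfied are you with the time it takes to get a code review? (1-5)"),
        ("workflow", "How long does a typical code review take on your team? (<1 hour / <1 day / 1-3 days / >3 days)"),
    ],
    "flow state": [
        ("perceptual", "How often are you able to work without unplanned interruptions? (1-5)"),
        ("workflow", "How many hours of uninterrupted focus time do you get on a typical day? (0-1 / 2-3 / 4+)"),
    ],
    "cognitive load": [
        ("perceptual", "How easy is it to find the technical information you need to do your work? (1-5)"),
        ("workflow", "How long does it take a new teammate to ship their first change? (<1 week / 1-4 weeks / >1 month)"),
    ],
}

for dimension, items in SURVEY.items():
    print(dimension)
    for kind, question in items:
        print(f"  [{kind}] {question}")
```

The perceptual items capture attitudes directly, while the workflow items let developers self-report system behavior, the same self-reporting approach described with the DORA quick check in the next exchange.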

Laura: Yeah. I teach a course on developer productivity metrics, developer performance metrics. I do a bit of a sneaky thing. I have the leaders take the DevOps quick check, the DORA metrics quiz where they self-report how frequently they deploy, their MTTR, and so on. Then when we get to the point of discussing tooling, and how they might practically measure those values at their own company, the conversation very quickly turns to, “Okay, what tool can I buy?” When I say, “You can do this all in a survey,” they all kind of look at me puzzled.

I say, “You all did this in our last session, and you thought it was accurate per your own answers.” The point here is not 100% accuracy. Very few companies need 100% accuracy. They need enough data to make a decision with confidence. Usually the self-reported data, I’ve found, does that just fine, with very few outliers that might need to be supplemented by automatically collected data, beyond what you can get just by doing some good searches in GitHub PRs, or looking at the metrics that Jira makes for you automatically.

Abi: Absolutely. It’s interesting, the appendix of the book Accelerate actually goes into quite a bit of depth advocating for survey-based measurement, and its advantages compared to system data. Nicole makes this point in the book, as well as in a later ACM article she published called DevOps Metrics, where she breaks down the differences between system-based metrics and survey-based metrics. Although there’s this bias to “trust” the system metrics more, oftentimes the system data is lying to us in the same ways that we fear people will lie to us in surveys. How many times do our logs not really tell us the truth about what’s going on in a system? Or how often do we run into a problem where the data isn’t normalized or cleansed in a way that actually produces accurate insights?

I recently wrote an article called Prejudice Against Surveys. I think there is this bias against surveys as a form of measurement in our industry, because of our origin as computer programmers who love realtime data from our systems. I think that’s changing. We see leading companies like Google and Microsoft relying heavily on survey-based measurement to understand their developer populations. Another thing I like to share with people is to look at other industries, like healthcare or education, or even economics, where things like inflation and GDP are actually measured through survey-based approaches. I think this prejudice against surveys, this fear or skepticism toward them, is just an education gap. It’s an evolution that I think our industry is going through, and will continue to go through.

Laura: I definitely agree with you. I want to close out this conversation by talking about how companies are applying the content of your new paper, the idea of the three dimensions plus the two kinds of measures, workflow metrics and perceptual metrics, and how this actually looks on the ground, tactically, when they’re trying to solve these problems. There are two examples in the paper, one from eBay and one from Pfizer. Now, these are both very big companies that have developer experience teams. They have big budgets, lots of developers. How have companies like eBay and Pfizer taken the principles put forth in this new paper and enacted them on the ground?

Abi: One of the exciting trends I think we’re seeing in the industry is the growth of these dedicated developer productivity or developer experience teams or organizations within these companies. eBay and Pfizer are both examples of companies that have recently stepped up their investment in dedicated leaders and functions focused on improving developer productivity. I think what’s also interesting about the eBay and Pfizer examples is that together they encapsulate the two primary approaches to improving developer experience, and why both are necessary.

eBay is a great example of an organization that has a centralized team that’s understanding the developer population, triaging the bottlenecks and issues they’re discovering, and then focusing on crosscutting, large, sweeping initiatives driven through the leaders of these different organizations and the DevEx team itself. In contrast, Pfizer is very focused on enabling local teams, individual managers, and directors to get access to the types of insights we’ve talked about, and then focusing on more local improvements they can make in their individual areas. When I talk to leaders, I always stress the importance of having both of these things happening at the same time.

One of the things that we talk about, and have found in our research around developer experience, is that there is always this combination of crosscutting issues, things like build times or deployment infrastructure. These are problems that often span the entire organization, and probably need to be centrally addressed. But at the same time, so many of the issues and challenges with developer experience are local problems. It might be a specific part of the code base that one team is working in. Or it might be the code review processes of that specific team, and how that team interacts with one other specific team. So it’s really important that improvements to developer experience be seen through these kinds of parallel tracks: what are the foundational, crosscutting things we can address, and how do we enable, educate, and empower individual teams and managers to understand their own constraints, and focus on and improve those?

Laura: Yeah. What advice would you have for those listeners out there who are not at companies like eBay and Pfizer with thousands of developers and big buckets of money, who are at an early-stage or scaling startup? When they’re trying to incorporate DevEx principles and dimensions into their work, what do you recommend they do?

Abi: I think when you look at a large organization like eBay or Pfizer, there’s this enormous ROI potential in increasing developer productivity by even 0.5%. The dollars you accrue from doing that are enormous. In a small startup, or just a smaller organization, that ROI might look a little bit different. However, I think the stakes are often even greater. If you’re a startup that’s searching for product-market fit, you cannot afford to have the few developers you can afford already wasting time, or being held back, or being unmotivated because of their environment. I think in a startup environment, it’s equally important to start getting a baseline, and to understand and be focused on the developer experience.

Albeit it’s probably not going to be a central team like at these larger organizations. It’s probably just going to be the engineering leader of that organization. One example of an organization I’ve worked with a lot that’s doing this is Vercel. They started benchmarking and understanding their developer experience quite early in their journey. I actually asked them, “Why did you start when you were only around 20 to 30 engineers?” They said getting this baseline has been so invaluable as they’ve grown, because they’ve now more than tripled since then. Having that baseline has shown them trends and changes they wouldn’t have been able to uncover without it.

Laura: Yeah. It’s so much cheaper to start tracking these metrics, and start working with them, in the very early stages of your company, versus trying to roll this out to thousands of developers after you’ve gone through scaling. It might seem silly or overkill. Those are some of the words leaders I work with use when they think, “Oh, do we really have to measure developer satisfaction? We have two developers.” Sure, maybe you’re not getting statistical significance with two developers. But I’ll tell you, it’s much easier to incorporate these into your engineering culture, and be purposeful about what you build, versus trying to add them later when you have 100 voices in the conversation who are slightly disagreeing about what matters. It also makes it really clear, as you’re scaling, who you’re trying to attract and who you’re trying to repel, like a magnet. I find it to be very useful for that.

Abi: I’ll add one more thing to that. Earlier we talked about how, if you’re trying to get the DORA metrics, it can often be overkill to invest tons of money in sophisticated real-time data analytics versus just starting with a survey. I think there’s a similar concept or principle that’s important for developer experience. You don’t need to start out with some sophisticated, rigorous survey program to begin focusing on developer experience. Really, these surveys and listening programs are just evolved or scalable versions of one-on-ones and retros, which are things nearly every engineering leader is already doing. You can start by just having conversations to get those insights into what’s holding developers back. Then, as you grow and can no longer scale those one-on-one conversations to get a clear understanding of your developer population, look at survey-based methods as a way to scale the one-on-one or retro process you already have.

Laura: Yeah. Start small, and then figure out where to go from there. I think that’s great advice. Abi, thank you so much for the conversation today. It’s been so insightful. I’m very excited about this paper, because I think it brings a lot of these themes down to a very practical level that’s much easier to implement than some of the theory of SPACE, which might be seen as just a little too fuzzy for people. I think this paper is going to be so helpful to its audience. Is there any last parting thought you want to call out for the audience to take away from this conversation?

Abi: Myself, Nicole, Peggy, and Michaela are all open to people reaching out, and we’ll try to answer questions about this paper. I also just want to really thank you for coming on the show and being the guest host today. This is really helpful for listeners, I think. I really appreciate you doing this. Thanks so much.

Laura: Wonderful. Next time I’ll wear an Abi costume, and it will be even better.

Abi: That’s awesome.