Podcast

How Zapier adopts engineering metrics and avoids pitfalls

In this interview, Mojtaba Hosseini (Director of Engineering at Zapier) talks about how to approach using metrics, pitfalls teams run into, and the common evolution teams go through as they adopt metrics.

Transcript

Abi: You’ve written a great blog article series about using metrics in engineering. One of those articles outlines what you call the five categories of metrics. We’re going to dive into this, but first I want to ask: why did you decide to write it?

Mojtaba: So at my current organization, we’re going through a journey on engineering metrics, and that’s where these blog posts come from. They draw partly on my experience at my previous organization, which went through a similar journey, and partly on what I’m learning at my current one. I’d like to share what I’m learning, both within and outside this organization.

One of the first things that came up was that engineering metrics, as a topic, is very expansive. So I found it helpful to categorize the various metrics, because they each have a different audience and a different use case. With the categories spelled out, teams can look at the five of them and decide which are most applicable, because each team is quite different. That was the genesis of that article: to help teams focus on which category of metrics is best suited to their needs.

Well, that’s really interesting and it goes into the next question I had. So in your article, you list these five categories: customer metrics, team workload metrics, team performance metrics, SLAs, and happiness and engagement. So when I was reading the article, a question that popped into my head was, “So does a team need all of these metrics?”  

Excellent question. I would say initially, definitely not. And that goes back to the genesis of that article: different teams are at different stages. They have different types of work. They have different customers. So I would say initially, no. And I could see that even in the long term, some teams may never need certain of these categories at all.

I can give examples. For a team that is developing something that goes directly to an external customer, you can see how important those product or customer metrics are. Teams that are more like infrastructure and platform do have customers, but they’re internal customers, and maybe those teams should first focus on service-level metrics. And you could have teams that are already performing well; maybe they don’t really need to measure much of their engagement. So, a long-winded way of saying, “I don’t think all categories are needed for all teams.”

That makes sense. And I know you’ve been in this situation before, but when you think about these metrics, are you thinking about them from the perspective of a single sort of autonomous local team or are these metrics that typically are rolled out org-wide? I’m curious how that then impacts the choices you’re referring to in terms of what metrics matter for actual teams.

I think there can be an org-wide rollout of metrics in the sense that there is an impetus for the organization as a whole to become data-driven in its decision-making. That’s different from rolling out the same metrics for every team, or from expecting all teams to be at the same maturity level. Because again, it depends on the size of the team and where the team is in its life cycle: are they storming? Are they norming? Maybe the team was just created recently.

So yes, at the org level, it can be a cultural change, going from not data-driven to data-driven, but each team would have the autonomy and context to know how far along the metrics journey it can be, where it is in that journey, which category to focus on, and what skills it needs, if that makes sense. So I think yes.

I know you have a pretty incredible, gripping story of rolling out metrics in your previous organization. So I’d love to go into that. Tell us a little bit about the context, what was that org like and what prompted the journey to roll out metrics or become more data-driven?

So the organization was in its rapid-growth stage. Initially, when it was smaller, decisions were made autonomously and very quickly, and everyone had good visibility into what was going on and why each decision was being made. At maybe 20 or 30 people, you can move fast; you can act on a lot of anecdote, gut feeling, or experience.

But as the org started to scale up, decision-making could no longer be done all at once by the same people. It was delegated to different departments, and teams were making their own decisions.

And yet, at the leadership level, they were seeing that, a) they weren’t getting visibility into the impact of those decisions, and b) it wasn’t good enough. The scale was no longer lending itself to anecdotal or gut-feeling decisions. Now you actually needed to act on data, and the scale was also allowing you to have data. Previously, you just didn’t have the data, so how could you make decisions based on it? So it was both an opportunity, because the scale let us collect more data, and a challenge, because with scale you now need to make decisions based on data so that they are the right decisions.

I really like that. And you call out a really useful point there: if you’re a three-person team with five customers, you’re probably just not going to have enough data to have worthwhile conversations about metrics. In your description, you mentioned several times that these metrics were being used to guide decisions or understand the impact of decisions. Can you give examples of what types of decisions you’re referring to?

I’ll give examples of both technical decisions and org-design decisions. A technical decision: in that organization, the software was being run on virtual machines. As scale became an issue, we were dealing with many VMs, each customized and treated as a pet, as it were, and the teams wanted to move to the cattle model, with a uniform fleet of VMs. A key decision that the technical teams, both development and operations, wanted to make was, “Do we move to Kubernetes?” There were very good arguments from a technical point of view for why this made sense: we could share articles on why it’s more elegant, it scales, you have infrastructure as code, and so on and so forth. But given the cost of moving from what we had to this new architecture and infrastructure, the leaders at the time were asking, “Okay, we can maybe estimate the cost, but we don’t quite know the benefit.” Yes, we could qualitatively and anecdotally say it would be better, but because the cost is quite numeric, in terms of months of engineering salary, can you put a number on the benefit?

And so this is where coming back to some of the DORA metrics, as an example, was helpful: to be able to articulate that lead time could improve, deployment frequency could improve, incident rates could improve, and so on. Now a business leader could weigh the cost of going to a new infrastructure against those numbers: “Our deployment frequency is once every two months, as an example, and on this new infrastructure we can get to something better.” That’s one example of a critical technical decision, with a large investment, that needed some data.

Even org design: in the last organization, our development and operations teams were totally separate and reported into different management chains, which is sort of early in the DevOps journey. There were advocates for merging dev and ops together. That’s an org decision, and again, there’s a cost associated with that org design and org change. So what are the metrics? What do we think will improve? Of course, we can talk about happiness and so on, but can someone put a number, or a set of numbers, around what will improve if we get our ops and dev folks working together? Because we know it will be painful initially. Can someone talk about the benefits in not just a qualitative but also a quantitative way?

That’s really helpful. Certainly, leaders have to make these difficult decisions all the time, and having actual data to at least articulate the proposed benefits of these decisions can be so important. Let’s go back to your journey. You shared the impetus: your organization was growing rapidly, and there was an opportunity to become more data-driven. So what was the first step?

I would say the first step was really an internal change for me. I was resistant to this much emphasis on data. I had not come from a background where this had been implemented before, so this was the first time I was in an organization where the business leaders were asking for data-driven decision-making. The first obstacle really was an internal one. I had to grapple with it. I had been an engineer for 15 years. I knew the perverse incentives that metrics can create. I had lived the pain of management putting in metrics and incentivizing them, and then, years down the line, paying the price of perverse incentives. So my first obstacle was coming to grips with, “Okay, can we move forward with this and still mitigate some of those problems and risks of being data-driven?”

Because there are problems. There are risks in doing it a certain way. Beyond that, the same hesitance was all around me in the organization. So it wasn’t just that I had to overcome my own resistance; there was understandable resistance across the organization: what are these metrics? How are they going to be used? This was a pattern that emerged time and time again until we got comfortable. How is leadership going to use this? Is this going to be used to punish us, to measure us? Is this part of our performance evaluation? That obstacle continued for quite a while, until we got so comfortable that we realized the data was actually helping the teams themselves, and the teams started to demand metrics. That was almost the nirvana moment we reached: individual contributors and managers started to say, “Oh, there’s a decision to be made. Hold on, we don’t have the right data. Let us go back.” We knew we had “arrived” when that happened. And it took nearly two years.

Wow. That is quite the journey. You’re hitting on something that anyone who’s been an engineering manager or engineer can relate to, which is this fear of, or hesitancy around, metrics. I know when I was an engineering manager, I went through maybe a similar experience: I was asked by non-technical leadership for metrics and was also very hesitant, struggling to figure out something that would satisfy leadership but not create a mutiny within the engineering organization. So I’m curious, in your case, was that ask coming from business stakeholders, or was it coming from another technical leader? What were you counterbalancing?

It definitely came from the business leaders. From my experience so far (and again, I’m still quite a toddler in this journey; I’m still learning, which is part of why I keep publishing blogs and doing podcasts, to share my learning and get feedback), I have seen it come from business leaders. My hypothesis is that business leaders are used to having metrics for all the other departments. Even for marketing, even for design, which is truly creative, difficult-to-measure work, they have very strong metrics and data. A marketing team cannot just say, “Give me a couple of million dollars a year and we’ll be very creative.” There are very difficult metrics that they need to live up to in order to justify that, and obviously the same goes for sales, operations, and so on.

So typically, in my experience, business leaders look across the different functions and see that almost every function has a metric or set of metrics it uses to define success and make good decisions. And when they come to engineering, they’re perplexed: why aren’t those metrics there? And the arguments they get don’t quite jibe, because it’s, “Oh, it’s very creative work. It’s knowledge work. It’s unpredictable.” And they say, “Yes, but so is marketing. So is design. And product management has very strong metrics in terms of ROI and so on.” So why is it that engineering, which one could argue is an applied science that should be data-driven and metric-driven, is still resistant? So yes, in my experience, it typically comes from business leaders, who are almost perplexed that engineering doesn’t have as many metrics as other functions.

Yeah, I can definitely relate to that. That was exactly the scenario I was in. My boss literally said, “Hey, we’re having leadership meetings. Marketing, sales, ops, all the other departments have metrics. What are yours?” And I tried to fight back, but that’s a story for another time. So you were dealing with this sort of hesitancy while being asked for metrics. What metrics did you consider? What did you try rolling out initially?

What I came to learn, and this took some time, was that there is no one metric to rule them all. One of the things I’ve learned (luckily, I made the mistakes at the last organization, and I’m trying to apply the lessons at this current one) is to adapt the metric to the team, and really start by asking the team, “What are your pain points? What are the outcomes you’re really after? And which metrics would help you?”

So instead of going to them and saying, “Cycle time is the best,” or, “DORA metrics will surely get you out of this,” first ask, “What are your problems? Tell me about your work. Where do you think your bottlenecks are? Where are your pain points?” That inevitably narrows down which metric they should use, because then they can quantify it and articulate the impact of their decisions. I haven’t been able to find even a set of metrics that I can confidently say will help any team. It’s really the other way around: talk to the team and see what their goals are and where their pain points are.

That makes sense. I have a question about that. Do you ever go talk to a team, trying to probe, and just get flat-out rejected? I mean, does a team ever just say, “I think we’re good. I don’t think we need metrics”? Have you had that happen before?

That’s the majority of cases, in my experience. I’m surprised if they welcome this conversation with open arms. Again, I’ve been an engineer myself; I understand why. I see two categories of resistance. One is where they fundamentally disagree with using data to measure engineering work. That’s the first category: they have seen it used improperly, seen it drive perverse incentives, and so on. The other category is teams that go, “Oh, we would love to, but we just don’t have time. We are so busy fighting fire after fire that we can’t do this.” And that really resonates, because if you’ve read the book The Phoenix Project, that’s exactly it: metrics and making work visible are the ways to get out of that vicious cycle of being very busy. And yet, how do you do that if you don’t have time? So those are the two categories of resistance I’ve seen from teams. But yeah, to answer your question, the vast majority of the time there’s resistance rather than a welcome with regard to using metrics.

Thanks for that breakdown. Have you seen metrics be useful for really small squads? A lot of engineering teams, depending on your organization, might be broken out in different structures, but one I’ve seen a lot is the squad model, where squads can often be as small as three people. I’m curious what sort of patterns you’ve seen with teams like that?

With a small squad, we have one example in our current organization where a team used a very particular metric extremely effectively. This was a team that had just been put together for a brand-new initiative, a brand-new product. So they were in the storming phase of coming together and figuring out what to do. And one of the things the engineering manager did quite quickly was to get a quick survey out, week after week. He called it the Ambiguity Meter. Here’s someone who doesn’t even go and find a framework. They just say, “Look, I have a pain point. The pain point is this team doesn’t know what to work on. Why don’t I ask them themselves?” Week after week, the engineering manager asked this in the survey on a very simple scale: zero is “I have no idea what I’m doing, and I don’t know where this team is going to go.”

And 10 is “I know exactly what to do and how this fits into the company strategy.” Well, initially the results were not very good. They were in the twos and threes. And this engineering manager, again, on a very small team, I believe three to four people, was able to use these numbers and talk to the product manager and say, “Look, this is not good. And I’m not anecdotally telling you this is not good. Here’s some data. Plus, if we put a better roadmap in place and so on, we should expect this to go up.” Sure enough, after a few months, we saw that trend up. So they didn’t start heavy-handed with, I don’t know, DORA metrics or anything. They hadn’t even built a service yet, so there was no deployment frequency. Here was a team that did a great job of using this “Ambiguity Meter” to drive a good conversation with product.
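To make that concrete, here is a minimal sketch of how such an “Ambiguity Meter” could be tallied, assuming scores are collected weekly on the 0–10 scale described above. The data, week labels, and trend calculation are all hypothetical, not the team’s actual implementation.

```python
from statistics import mean

# Hypothetical weekly "Ambiguity Meter" scores, one 0-10 score per team member
# (0 = "no idea what we're doing", 10 = "exactly clear how this fits strategy").
weekly_scores = {
    "2023-W01": [2, 3, 2],
    "2023-W02": [3, 3, 4],
    "2023-W03": [5, 4, 6],
    "2023-W04": [7, 6, 7],
}

def weekly_averages(scores: dict[str, list[int]]) -> dict[str, float]:
    """Average the per-member scores for each week."""
    return {week: mean(vals) for week, vals in scores.items()}

def trend(averages: dict[str, float]) -> float:
    """Crude trend: latest weekly average minus the earliest one."""
    ordered = [averages[week] for week in sorted(averages)]
    return ordered[-1] - ordered[0]

avgs = weekly_averages(weekly_scores)
for week in sorted(avgs):
    print(f"{week}: {avgs[week]:.1f}")
print(f"trend: {trend(avgs):+.1f}  (positive means ambiguity is shrinking)")
```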

Well, I love that example, it’s such a good example of just having an actual specific problem you’re trying to solve through data and measurements and building a solution just for that, instead of maybe reaching for something that’s more common or off-the-shelf. So I’m curious to go back to your story again. So you started talking to teams, you have this ask to produce metrics. What was your first ship? What metrics did you actually end up with?

For the development and operations teams, the pain point they had gotten into was that, because they were separate, the development team was releasing software that was not making it into production. So the first thing I asked was, “Okay, both sides are saying this is very painful. Can we measure our deployment frequency and lead time?” Just these two, which are two of the four DORA metrics. And when we say “measure” in this case, it was really just writing it down and putting it in a graph, because deployment frequency was something like once every six months, and lead time was even worse. Just visualizing that data moved the conversation forward, not only with the executives but between the teams, and it gave us a very clear goal: if we fixed the issue where one team was releasing code that wasn’t getting into production, we would clearly see that graph trend downward in lead time and upward in deployment frequency.
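Even “just writing it down and putting it in a graph” can start very small. Here is a hypothetical sketch of computing those two DORA metrics from a hand-maintained list of releases; the timestamps are invented, and a real pipeline would pull them from version control and CI/CD instead.

```python
from datetime import datetime, timedelta

# Invented example data: (first commit time, deployed-to-production time) per release.
releases = [
    (datetime(2023, 1, 10), datetime(2023, 7, 2)),
    (datetime(2023, 7, 5), datetime(2024, 1, 8)),
]

def deployment_frequency(releases, window_days: int) -> float:
    """Deployments per 30 days over the observed window."""
    return len(releases) / window_days * 30

def mean_lead_time(releases) -> timedelta:
    """Average time from first commit to running in production."""
    total = sum((deployed - committed for committed, deployed in releases), timedelta())
    return total / len(releases)

window_days = (releases[-1][1] - releases[0][0]).days
print(f"deployment frequency: {deployment_frequency(releases, window_days):.2f} per 30 days")
print(f"mean lead time:       {mean_lead_time(releases).days} days")
```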

That’s where we started with that team, because that’s where they had been stuck. Another team was in firefighting mode: very busy, burnout was a problem, and we had attrition issues. It was a pure operations team, answering the phone and dealing with customers and so on. So what did we do there? Measure workload. Okay, well, how many tickets do you get? And what we found was they weren’t even making that work visible. They didn’t even have tickets tracking the work, so how could they measure it? For that team, it was a completely different conversation, because they had a completely different problem.

That is really interesting. I’m curious for the kind of team that was having difficulty with delivery, and you mentioned their deployments and lead time were maybe a little bit alarming. What were the actual problems contributing to that?

The problem was, you could say, both technical and organizational; per Conway’s Law, they mirror each other. As I said, the operations team was completely separate from development, and development measured itself on the number of releases it did and the features it completed, not on whether those features made it into production. What had happened was that those virtual machines I mentioned had not been upgraded; their OS had not been upgraded. And one of these releases, say the release from six months earlier, necessitated an upgrade of the OS on those VMs. From the developer’s point of view: “Okay, well, I’ll just upgrade the OS, and my fancy new version with all the good features in there will work.”

From the operations point of view, this was a monster of an upgrade, because all their scripts would break. So they had waited for months to put the right scripts in place and update all their monitoring scripts and all their upgrade and downgrade scripts to deal with this OS upgrade, which was tied to a particular version, all the while the development team was happily iterating on a version that had not seen the light of day. So it was a technical problem tied to the OS upgrade, but also an organizational problem, because the pain the ops team felt in deploying was not felt at all by the dev team, who were incentivized simply to release.

That’s really funny, and probably not an uncommon scenario. So you wrote this article called Nothing In, Nothing Out. You mention it’s the lesser-known version of Garbage In, Garbage Out. As you’ve touched on earlier in this conversation, to measure a lot of these things you actually need good data to begin with. And especially in larger organizations, that sometimes means enforcing specific rules or processes, standardizing the way teams work. I’m curious how this went at your previous organization, and what types of challenges you might face in a much larger organization trying to do this?

What I’ve seen also resonated with a book I read, The Goal, and the Theory of Constraints, which starts by saying, “Make work visible.” This is the first step. This is Nothing In, Nothing Out. There are operations teams especially, and sometimes development teams too, where the work isn’t recorded anywhere. It’s based on someone’s Slack message, or in the old days somebody picking up the phone, or in a non-remote environment somebody dropping by their desk and asking them to do something that takes half their day away. In those situations, the work and the workload are not visible.

Therefore, there cannot be any data associated with it, so the journey cannot even begin. For those teams, the first thing I recommend is an intake process. Come up with something that says: work coming from the phone, from Slack, from JIRA, from email, from customer requests, from the service desk, whatever; you have to have a process by which you can bring it all into one place of your choosing, where it’s visible. Now you can start to get data. That’s the first step.

After the intake process, the team has some data, in the sense that all the work coming in is being captured somewhere. Typically, at this point, they go from nothing in, nothing out, which was the first stage, where they just didn’t have anything, to garbage in, garbage out. Because the work coming in hasn’t been categorized, maybe hasn’t been estimated, hasn’t been pruned, et cetera. You can visualize the data you get, but it doesn’t really tell you much. And that’s the next step: “Okay, to get better data, I need to do some work on the work coming into the team. How do I label it, categorize it, size it?” Whatever it is the team wants to measure, that then starts to yield insights into what’s happening in the team. So that’s where Nothing In, Nothing Out, leading to garbage in, garbage out, and eventually good stuff in, good stuff out, comes from.
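A minimal sketch of what such an intake record could look like, assuming each piece of work gets a source channel, a category, and a size at triage. All field names, categories, and sample items are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class IntakeItem:
    """One piece of work, however it arrived, captured in a single visible queue."""
    title: str
    source: str                        # "slack", "email", "phone", "service_desk", ...
    category: str = "uncategorized"    # "bug", "feature", "incident", ... once triaged
    size: str = "unsized"              # "S", "M", "L" once triaged
    received: datetime = field(default_factory=datetime.now)

queue = [
    IntakeItem("Prod alert: disk full", source="phone", category="incident", size="S"),
    IntakeItem("Customer asks for CSV export", source="email", category="feature", size="M"),
    IntakeItem("Can you take a look at this build?", source="slack"),  # not yet triaged
]

# Once everything flows through the queue, simple counts become possible:
print(Counter(item.source for item in queue))    # where work comes from
print(Counter(item.category for item in queue))  # still "garbage out" until triage fills this in
```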

And how do you foresee or have you experienced dealing with that challenge at a really large organization? I feel like if you have 30 to maybe 60 engineers, it’s not too difficult to say, “Hey, look, everyone, we’re going to work in this way.” If you have 500 or 1,000 engineers, teams doing all kinds of different things, maybe even using different tools, that can be really difficult. I’m curious if you have thoughts or experiences on that?

Even at a larger org, which is where I am now, this is still something teams can do autonomously, in the sense that they figure out their intake process per team. There doesn’t need to be one global intake: even though it’s an organization of 300 engineers, they’re not all working on the same intake. Each team has its own slice of work, a particular vertical slice of value to the customer, and teams are still seven to eight engineers with an engineering manager. If each team has a proper intake process, they can start to collect this. Now, there is a question of tooling, where someone could say, “Well, this is painful, because I’ve got Slack coming in, I’ve got email. Are there tools that can help me collect this data better, instead of every team needing to figure it out and choose its own adventure, as it were?”

And then you’ve got an organization of 300 to 500 engineers, each team has figured out its own tooling, and now it’s really difficult to be consistent. I think this is where tooling can have an interesting impact. If there is demand from teams to collect information (and the first part is creating that demand and that understanding), then they start to ask, “Well, there’s got to be a better way. What are other people doing at other companies? Can I get better tooling for metrics and data collection?” Then it’s a good time to have that conversation. I find some teams start a little too early with the tooling, before creating that demand and understanding why they’re doing this. But I don’t know if your question was leading up to tooling, in terms of a more cohesive way of measuring across all of the teams.

We can move into that tooling side, because I know, for example, at GitHub we had this problem. We wanted to measure lead time. We had maybe a certain percentage of the org working in this one monolith, but a big percentage of the org was off doing all kinds of other things, whether mobile apps, CLI apps, desktop apps, our on-prem enterprise product, or new products. And so it didn’t really seem feasible, at least at the time, to instrument everything to give us a consistent reading of lead time across the organization.

I did not stay at GitHub long enough after that to see whether we were eventually able to do that, but it felt like a really insurmountable challenge at the time. In fact, it felt difficult enough to even identify what all the different teams were. I remember spending weeks going to HR, going to leaders, trying to infer the information from our service catalog: what even are all the teams working on the different things we would have to instrument? So that was my experience, and I don’t know how we could have solved it. I’m curious to get your thoughts.

I think part of this comes back to: what is the goal or outcome we seek in measuring something across an entire organization? If we say we want to measure the same thing across all teams, I think there can be problems with that as a goal, because it’s not really fair to compare the lead time and deployment frequency of one team to another’s. It could depend on how much legacy work there is, how much technical debt there is, how complex the work is. Is this a very complex core part of the product with lots of weird edge cases, or not? So where I resist standardization of both tooling and measurement across an entire org is that it loses that context and nuance per team.

And I think that’s one thing that can lead to legitimate resistance from teams: “Oh, you’re going to compare my lead time with my adjacent team’s lead time? I’m working in the monolith; they’re a brand-new service, for whatever reason. It’s not fair.” And yet maybe the monolith is far more critical to the business than their service, so how are you comparing our lead times? Where I do see lots of value in standardization of tooling is when teams are trying to get going and are looking for best practices or tools that will just get them off the ground. So instead of saying, “Okay, each team, you go figure out the tooling, you go figure out how to measure your lead time,”

it’s to come to them and say, “Okay, you do want to measure your lead time. You’ve already decided that; I’m not telling you to measure it. You’re coming to me and saying, ‘I want to measure lead time. I want to measure deployment frequency.’ Guess what? You don’t need to go build it yourself. I have found a way: most teams can just plug in their GitLab auth here, put that there, connect JIRA, and boom, you have your first-level DORA metrics going.” There’s tremendous value in that, because now teams are not reinventing the wheel one at a time, completely independent of each other and not learning from each other.

I think you hit the nail on the head when you said to think twice before you take something like lead time or deployment frequency and make it an org-wide metric. In our scenario, we were trying to do exactly that. In fact, I had an OKR that said “accelerate GitHub’s engineering.” So I needed some org-wide metric that could be used as the baseline of how “quickly” we were moving. Again, I didn’t stay at GitHub long enough to see the full life cycle of how those metrics played out. But I’m really curious: if someone came to you and said, “Hey, accelerate organization X’s engineering,” and lead time and deployment frequency might not be the right things to measure org-wide, what could you measure? And it’s okay if the answer’s nothing, because I don’t think we were able to figure it out at GitHub. I’m curious to get your take.

It’s a very, very tough question, but it’s a fair question that business leaders do ask of engineering leaders, because velocity and quality are the two things that really differentiate teams and organizations from others. The faster you can learn and get feedback from the market, the faster you can win. So, like you said: how are you getting faster? The way I would look at it is maybe not in terms of absolute values but relative values. It’s not that we need to get our deployment frequency to X or our lead time to Y across the entire org. It’s more: are teams improving their deployment frequency? Are they improving their lead time? Or are teams coming back and saying, “Those aren’t even good velocity metrics for us”?

In our team context, cycle time might be better, or time to return to a customer with an answer might be better. Okay, so you’ve picked your velocity metric. Whatever the number is, the absolute value may not be very relevant. What is far more important is that it’s getting better. So my answer would be: we don’t need one metric to rule them all. What if the engineering organization showed you that its different departments and teams each have velocity metrics, and they’re improving? Then clearly, we are getting better, and the need for a single metric rolled out across all of engineering goes away, because the question is answered. Yes, we are getting faster, but you don’t need one metric for all of engineering to prove that.
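As an illustration of that roll-up of relative improvement, here is a small hypothetical sketch: each team picks its own velocity metric and baseline, and leadership looks only at the direction of change, not at absolute values. All team names, metrics, and numbers are invented.

```python
# Invented per-team data: each team chose its own velocity metric and baseline.
teams = {
    "platform": {"metric": "lead time (days)",       "baseline": 60, "current": 35},
    "mobile":   {"metric": "cycle time (days)",      "baseline": 12, "current": 9},
    "support":  {"metric": "time to answer (hours)", "baseline": 48, "current": 20},
}

for name, t in teams.items():
    # These are all time-based metrics, so lower is better: improvement is the drop.
    change_pct = (t["baseline"] - t["current"]) / t["baseline"] * 100
    print(f"{name:>8}: {t['metric']:<22} improved {change_pct:.0f}% vs. baseline")
```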

I like that. So, sort of leaving it up to the teams to figure out their own baselines and maybe their own targets that make sense to them. I’m curious about these types of metrics we’re talking about: cycle time, lead time, deployment frequency, even time to answer. I recently watched this movie called American Factory, a great documentary, and it was really interesting because it went behind the scenes into manufacturing factory work. My ears perked up when I heard them talking about metrics like cycle time and lead time. It was so interesting to observe the way these factories work and how it really is an assembly line: just step A to B to C to D. With engineering, when I’ve tried to roll out these types of metrics, and you alluded to this earlier, there’s sometimes this sort of indifference or rejection of these metrics.

And I think part of it is, for example, a common conversation I’ve had is an engineer saying, “Well, sometimes things take a little longer and sometimes they don’t.” Sometimes their cycle time is a little longer, sometimes it’s a little shorter. It just is what it is. Or, when we set up these lead time dashboards and were sort of excited about it, an engineer told me, “Why are you telling us something we already know? We have a pretty good sense of how quickly we deploy. Sometimes we do things fast, sometimes things take…” So I’m curious, how do you avoid the turning-software-development-into-an-assembly-line sort of mentality when you roll out these types of metrics that traditionally come from manufacturing? I’m curious to get your thoughts on that.

Yeah. That’s a very common, and I would say legitimate, pushback, in the sense that engineering, being creative work and knowledge work, is about creating something new, whereas manufacturing operations create the same thing over and over and become efficient at it. So there’s a fundamental difference. But it doesn’t really preclude metrics. The way I look at metrics is in the context of the scientific method. Engineering is applied science, and one of science’s core methods is the scientific method: you make a hypothesis, you measure, you learn, and then you go back and either fine-tune your hypothesis or disprove it. This is where metrics can help engineers. Of course some features, some tickets, some bugs take longer than others.

That’s known. But that would be like a scientist saying, “It’s strange, some things move at different speeds, but that’s the way the world works.” An actual scientist would say, “Yes, different things move at different speeds. Let me categorize them. Let me measure them, by weight, by air resistance, and so on. Now I’m able to learn what parameters make something go fast or slow.” I think that’s the conversation we need to have as engineers: of course there’s high variability in what we create, but can we measure and form a hypothesis about which things take a long time, which things don’t, and what the underlying mechanism is?

Looking at it from a scientific point of view: can we learn what parameters make something take a long time or not? That understanding can lead to prediction, but also to insight: “Okay, these things seem to take a long time. That might reveal an inefficiency in our tooling or our process, or maybe that’s just the way it is. In that case, let’s bake it into our predictions from now on, and not assume that when that type of feature has taken a long time the last 100 times, surely the 101st time will be faster.” No, we now have data proving it won’t be. So I think the scientific method can be a good framework for engineers, especially because engineers come from that applied-science background, and it leads to learning and insight from metrics.
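In that scientific-method spirit, here is a small hypothetical sketch of the categorize-and-measure step: group completed work items by category and compare median durations, so “that kind of work always takes long” becomes data rather than folklore. The categories and durations are invented.

```python
from collections import defaultdict
from statistics import median

# Invented data: (category, days to complete) for finished work items.
completed = [
    ("schema migration", 21), ("schema migration", 18), ("schema migration", 25),
    ("ui tweak", 2), ("ui tweak", 3),
    ("new endpoint", 6), ("new endpoint", 8),
]

by_category = defaultdict(list)
for category, days in completed:
    by_category[category].append(days)

# Slowest categories first: candidates for a hypothesis about *why* they are slow.
for category, durations in sorted(by_category.items(), key=lambda kv: -median(kv[1])):
    print(f"{category:<17} median {median(durations):>4.1f} days over {len(durations)} items")
```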

Well, I love that insight, and maybe, I’m sorry if you already have a PhD, but I was thinking maybe there’s a PhD in your future in this area. Clearly, I love the advocacy for being scientific in our approaches. Speaking of science, there are these DORA metrics that came out of the book Accelerate, with research behind them showing how they correlate to different elements of business performance. These DORA metrics have become really popular, but when I speak to engineering leaders, there are sort of mixed reviews and still a lot of confusion about how you’re supposed to use these metrics and what the right ways are.

And I know at GitHub we tried using these metrics and rolling them out to teams. We were sort of excited about them at the leadership level, even though mostly, I think, they just confirmed what we already knew; it was still exciting to get the data. But when we rolled them out to teams, there was this strong indifference, I felt. No one seemed to really learn anything from them, no one seemed to care that much about them, no one seemed excited to use them. I’m curious, what has your view or experience been with DORA metrics specifically?

With DORA metrics, I have a very similar experience, in the sense that in some situations and for some teams, the DORA metrics are at best useless, while for other teams they are incredibly helpful. I gave one example of a team where two of the four DORA metrics were immediately something they could rally around; both the dev and ops teams could look at their deployment frequency. And I know of several teams today that are measuring these metrics, and it tells them nothing. One way I look at DORA metrics: if you look at the instrument panel of an airplane, you have maybe 100 different knobs and indicators. DORA metrics are one of those panels. Is that the panel the pilot always looks at? No. They only look at it if something there is red.

Should all airplanes have that panel? No, it depends on the airplane, right? If you’re flying a 747, you might need it for a certain section. But if you’re flying a little Cessna, maybe the pilot can just look out the window and see if the wing is oscillating too much. They don’t need an instrument. So the way I look at DORA metrics is that they’re definitely a useful subset. Are they useful? Absolutely, in certain contexts. It’s the application of them as a silver bullet that I think is problematic: “Okay, I don’t need to think about metrics. I don’t need to think about how to apply this. There’s research behind it. I’m just going to take it and roll it out everywhere, without nuance, without context.” That’s one danger. The other danger follows from the first, which is the automatic creation of tools that do that, because now you’re taking even more nuance out of it.

It’s one thing for a team to decide for itself, “Okay, I’m going to define lead time as from the time it’s merged to master until it’s in full production.” It’s another for all teams to be made to measure the exact same thing. Now you’re losing even more nuance, and therefore it’s less useful for the team. So those are two things I’ve seen happen at the leadership level. “Oh, these DORA metrics are so powerful. Let’s just have them all.” That can help leaders abdicate the responsibility of finding out which metrics actually matter, because they can just point to the research. Again, it’s great research, but there’s an abdication of applying it properly to their organization. And then there’s this other movement toward, again, very useful tooling, but treating it as a silver bullet: “Don’t think anymore; this tool will solve it all for you.”

Again, coming back to the scientific method, it would be like someone selling a box that measures everything for you. And you go, “I know as a scientist that doesn’t work.” Every scientist needs tools, but they might have to adapt the sensor to the particular thing they’re trying to measure. You can’t take a voltmeter and measure everything with it. Sometimes you need an ammeter. Sometimes you just need to measure distance. So yes, I’ve seen very mixed results from the use of DORA metrics.

That’s a great explanation. It reminds me of a thought I used to joke to myself about: “Oh, your deployment frequency is once per month; you must be a low performer.” But then it turns out it’s an iOS team, and it takes three weeks to get Apple to review their app. So as you mentioned, that nuance or context is so important and can be missed. Going a little more high-level: we’re both part of the Rands engineering leadership Slack group, and it seems like every week someone’s asking, “Hey, what metrics are you using?” Sometimes I just feel like, “Oh, man, we’re just so hopelessly lost here.” What do you see as the future? What’s the answer to this? What’s the right way?

I don’t think there will ever be an answer to that question that holds in perpetuity, because, like I said, there’s context per team. I don’t think we’ll get to a point, even five or ten years from now, where a software engineering leader can ask, “What do you measure?” and the answer is, “Just do X.” What I hope will happen, and I think is happening in engineering, is building the muscle and the skill to ask which metric is appropriate in which context, and, I would say more importantly, the muscle and skill to advance with metrics. One of the things I’ve written about is that metrics can be deprecated; metrics should be deprecated.

You know that Ambiguity Meter? Now the team is at a seven or eight. They know exactly what to do. Is there value in continuing to measure that? If it’s easy, sure, but not really. Not anymore. They can actually deprecate that metric. So I’ve argued that across different teams there is no single answer of “just measure this,” and even on a timescale, the exact same team will use different metrics at one point in time than at another. The skill and the muscle to learn how to move forward with metrics, even to move off some metrics onto others: that is what we need, not particular metrics or particular tools.

I like that a lot. So we all need to sort of become better scientists, not look for the silver bullet necessarily. Well, Mojtaba, I really enjoyed this conversation and appreciate all your insight. I’m really excited to continue to follow your journey with metrics and your writing as well. Thanks for being on the show today.

Thank you so much, Abi. It’s been a pleasure.