Podcast

How teams are using productivity metrics at LinkedIn

Max Kanat-Alexander, the Tech Lead for the Developer Productivity and Insights Team at LinkedIn, shares an inside look at LinkedIn’s metrics platform and how teams across the organization use it.

Timestamps

  • (1:31) Why Max shares how his team is measuring productivity
  • (3:20) Why some teams use metrics and some don’t
  • (6:03) The types of metrics Max’s team focuses on
  • (12:59) The role of TPMs
  • (17:05) How Max would measure productivity if he weren’t at LinkedIn
  • (25:04) Surprises in how teams are using metrics at LinkedIn
  • (31:27) The tooling required to enable metrics for teams to use
  • (36:41) Qualitative versus quantitative metrics
  • (40:39) Measuring code quality at Google
  • (46:16) Whether a centralized team should own measurement

Listen to this episode on Spotify, Apple Podcasts, Pocket Casts, Overcast, or wherever you listen to podcasts.

Transcript

Abi: Max, you’re our first repeat guest. Thanks so much for coming on the show again.

Max: Yeah, you’re welcome. I’m excited.

Abi: Awesome. Well, this is timely, because I’m going to start off by talking about the recent article that your group just put out about how LinkedIn measures developer productivity and happiness. Firstly, I just wanted to ask you: LinkedIn is a little bit unique in that it is putting all this out there publicly. Would love to know the behind-the-scenes of this, how these articles are coming out, what the internal conversations are like.

Max: I can certainly speak to what’s motivating me. I think what’s motivating me is that I’ve always wanted to help software engineers on a broad scale. Every time that I have an opportunity to do that, any platform that I have to do that, I want to do that. I’ve seen that most companies struggle in this area in particular; most engineering organizations tend to struggle when they are trying to figure out how to measure things, how to visualize those measurements, and how to turn those measurements into action and change in their business. I thought, “Wow, what a great opportunity to be able to help a lot of people.” Here’s a gap, a void in the field of engineering knowledge in the world, and wouldn’t it be great to fill that void? This is a beautiful opportunity to help people, so let’s try to do that.

Abi: Well, you’re absolutely filling the void, I think, and it’s a big void, as you articulated. I’m not going to go too deep into that article, because listeners should just go read the article that’s published on the LinkedIn engineering blog, but I did want to ask you about some parts of it. Adoption is one thing that’s actually talked about in the article, and I really appreciated that there are concrete adoption numbers and testimonials shared in it. But as we’ve talked about offline, as we both know, there’s so much buzz around metrics and so much less conversation about how they’re actually being used and in what ways they’re actually providing value.

I want to ask you, and not just specific to LinkedIn, but what are the highs and lows you’re seeing with teams actually using these metrics? Maybe give us a picture of both ends of the spectrum, what teams are using it a lot and why and how and who’s not using it and why do you think that’s the case?

Max: We actually have a pretty good idea of the specific answers to your question. The people who use it the most are the people who have metrics in the system that they care about and who have a process that they use to review those metrics. Those two things are the driving factors of the success of our flagship visualization product, Developer Insights Hub. For example, let’s say we put out build time metrics, but let’s say that you are an ML engineer and you rarely build anything; you mostly spend a lot of time waiting for offline data flows to process or waiting for things to happen with your models or your feature store, and so your team doesn’t care and will never engage with our build time metrics. We could have the most beautiful build time dashboard in the world and we could try to market it all over the place and tell everybody they should use it and even try to force people to use it in some way—which we wouldn’t do—and no value will occur.

That’s really the first one. The second one is teams that have established processes for operating on data in some fashion. We see, for example, a lot of teams have these reviews that they do on a monthly or biweekly basis where a bunch of managers will get together in a room and they will discuss the state of certain metrics. Sometimes there’s a TPM who’s running the meeting and they will prepare in advance some interesting narratives or things like that. They’ll have a conversation maybe with some of the executives about, “Hey, here’s what we’re doing about this particular thing that we care about right now.”

That process, it’s interesting, it actually expands beyond the meeting. Engagement, for example, that we see, expands beyond the meeting, because… Well, there’s two reasons. One of them is that people have to prepare for the meeting, so they want to go in and they want to look at their own metrics. But two is that there’s an expansion effect that occurs from the meeting where people go into the meeting, they look at these numbers, and then they see that the tool exists or that the visualization exists. If the visualization is straightforward enough that people feel like they can engage with it, and if it has the data in it that they care about, then they will continue to engage with it outside of the meeting. At least, that’s what we hope, and so far that’s what we see in our data as well.

Abi: One question I have for you is, do you think every team should be doing this, this process, this review of the metrics, and in particular the types of metrics that are included in your platform? I think at a lot of other companies, this type of process can sometimes become mandated from the top down, where an executive says, “Hey, all teams are reviewing and focusing on these metrics.” I want to ask you, in your view, is that a good approach, and in what cases should a team not be focusing on these types of metrics?

Max: I think first of all, we have to clarify what types of metrics we’re talking about. You had a really good post not that long ago where you talked about classes of metrics, and you and I share a pretty similar view on what the classes of metrics are. One category would be business impact metrics. This is your revenue, this is your user sessions, this is whatever your impact is; at YouTube, it was video watch time, things like that. Then there’s the operational metrics, which are like, “Is my system up? Is it running? What’s the latency?” Then finally there’s what I call productivity metrics, which measure how effective and efficient my team is at producing those business and operational results.

My team focuses mostly on the productivity metrics and a little bit on the operational metrics, but in a sense, for managers. Once we have that, I think audience is also important, because when you say everybody… I don’t ever want to mandate a process that requires engineering ICs to go look at a dashboard on a regular basis; they’re just going to look at it and be like, “Why am I in this meeting?”

I say this to people sometimes and they say, “Well, ICs do get value out of dashboards,” and I say, “Yes, that’s absolutely true. ICs totally get value out of dashboards.” But in a different way: they’re tasked with a particular problem that they need to solve for the business, and they want to understand, what’s the potential impact of this change that I could make, or where’s the real problem? What’s the underlying root cause? That’s very different than if you are a manager and you want to make strategic decisions about the direction of your team, where you want to see trends. The most important thing you want to see is trends. You want to know, is that graph going up or is that graph going down over a long period of time, as a sense of: are we in general making the right decisions, or do we need to, in general, pivot?

And so, for the productivity metrics, if you’re going to mandate a process, I would only mandate a process for managers, and I would be thoughtful about at what level the data is useful. If you’re on a team that does 5 or 10 code reviews a week and you’re trying to look at code review response times, looking at that might be useful for your team, but it also could just be noisy. That graph could go up and down a lot in unpredictable ways, because the volume is very low. Usually, what I say is, “Look, you want to wait until you’ve got maybe 50 people or 100 people in an organization, and probably you want to look at the productivity metrics at that level.” You do want to be able to dive down into them, because you want to understand, “Oh hey, this team that works on this particular type of thing, they’re really struggling.”

I do need to clarify here that what you’re not looking for is “these people are slackers.” You don’t want to have metrics that say that, and that’s not the investigation you want to do. What you want to do is: these people are experiencing pain, they are blocked, and now I want to go talk to, say, that manager or that group of ICs and say, “Tell me about your pain,” so that either the engineering manager can help encourage new development practices that we could enact in order to counteract that pain, or I can go to the developer productivity tooling team, whoever that may be (even if sometimes it’s just one person), and say, “Hey, let’s make some changes.”

I think the things to consider are types of metrics, audience, and the size of the org. Then on top of that, the cadence at which you review. Let’s say that you’re in charge of a 1,000-person engineering org. It would be a mistake to do a weekly review of these metrics, because you cannot issue an instruction that will cause anything to change within a week. Instead, what will happen is you will end up ratholing on tiny details of metrics and mostly just cause people to scramble in fear and panic because you are the big boss, and no actual value will happen for developers. You want to be looking at six-week trends, quarter-long trends, quarter-over-quarter trends at that level. In order to encourage this, in iHub we show people six-week-over-six-week changes. That’s the big number that we show people. You can see that in the mocks in the article.

The graphs, when you expand them, show 24 weeks right now, and they draw lines where they compare these six weeks to the previous six weeks. That’s the primary highlight that we try to give people, because that’s the compromise between “I have a 100-person org” and “I have a 1,000-person org”; we figured it was at about six weeks. The whole reason we did that is to encourage discussions about, and thinking about, the long-term trends of things.
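To make the windowing concrete, here is a minimal sketch of a six-week-over-six-week comparison like the one Max describes. This is an illustration under assumed inputs (a map of daily metric values), not iHub’s actual implementation.

```python
from datetime import date, timedelta

def six_week_over_six_week(daily_values: dict[date, float], today: date) -> float:
    """Percent change of a metric's mean over the last six weeks versus
    the six weeks before that. `daily_values` maps each day to that day's
    metric value (e.g., median build minutes); assumes both windows hold
    at least one data point. Illustrative sketch only."""
    def window_mean(start: date, end: date) -> float:
        vals = [v for d, v in daily_values.items() if start <= d < end]
        return sum(vals) / len(vals)

    six_weeks_ago = today - timedelta(weeks=6)
    recent = window_mean(six_weeks_ago, today)
    prior = window_mean(today - timedelta(weeks=12), six_weeks_ago)
    return 100.0 * (recent - prior) / prior
```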

If you are a frontline manager of an engineering team, or you’re the direct tech lead of a team, looking at those metrics on a week-over-week basis is fine, could be great. You want to know sooner. You want to be like, “Hey, what tactically can we do right now to solve the problem we had last week?” But if you are the VP of engineering at a very large organization, as much as you might want to do things like that, because you are probably an excellent engineer and know in your mind all the things that everybody should be doing and maybe you just want to go and tell everybody, “Hey, go do the right thing,” you probably can’t do that. Instead, you need to be like, “Hey, what’s going to be our long-term strategy for this long-term problem?”

Abi: One thing I really appreciated about what you shared is the line about how there’s this risk of leaders just stressing everyone out, scrambling things, without creating any actual net value. I was just working on an article this morning about cycle time, which is, we’ve talked about it a little bit offline, but a metric that a lot of organizations sometimes turn into their KPI around engineering velocity. One of the things I’ve seen happen is a focus on optimizing cycle time so much so that at some point you’re just breaking down work into smaller pieces without actually creating any benefits. What you said really resonated.

Was curious to get your thoughts on that, but also want to ask you a question. Earlier you mentioned that TPMs often get involved with running these review processes. Offline, you mentioned to me that TPM involvement is a key predictor of success and adoption of these metrics. I want to ask you about that. What’s the takeaway from that? Does that mean EMs on their own don’t have the capacity or interest as much in this type of improvement and metric review process? Or what’s the role and benefit of the TPMs as well?

Max: I think it depends on the company that you’re at. I think it depends on the org that you’re at. I think it’s pretty universal that project managers of all types across probably all industries are trained to care about metrics. If you’re a project manager and you don’t care about metrics, it’s a little bit like you’re an engineer and you don’t know how to code, almost. Not to that extreme, but it would be very, very unusual to have a project manager who doesn’t show up and say, “Hey, where’s the numbers? Did the numbers get better? Are they getting worse? How do we know we’re succeeding? What’s your measurement of success?” That’s an activity that I expect project managers in all industries to engage in.

Knowing how to do this is not part of anybody’s training as an engineer. Many engineering managers are former engineers. They rose up from being engineers, and their training didn’t include: how do I define metrics? How do I act on metrics? If they “grow up” in a company that already does that on a regular basis and has gotten “the religion” of metrics and data and it’s all laid out for them, then they could osmose that from the environment. But otherwise, there’s very little training. There’s not a school that you go to that says, “Hey, engineering managers, here’s how you do metrics.”

Instead, what you see at some companies all the time is that a leader who does know how to do data will go to an engineering manager and say, “Hey, I want metrics.” The engineering manager, who wants to do the right thing and is an engineer and thinks, “Well, sure, we should have data,” scrambles and tries to figure out, “Well, what metrics do I define?” And either they pick out whatever they can measure or they just write the names of metrics down in a document and say, “Here’s what we’re going to measure,” and then never actually measure it.

That’s not me faulting the engineering manager. That’s what I want to be clear about. I’m telling this story and it doesn’t sound great, but it also sounds normal to me, because you’re asking people who have never been trained to do a thing to suddenly magically know how to do a thing. I think that’s a lot of the time where the TPMs do come in. Not only do they know how to have metrics, they know how to have processes and how to design processes that cause effective action in a business. If I haven’t described two of the core functions of a project manager, I’m not sure what they are. Implement processes that bring order, and cause data to be looked at and operated on so that we can understand the success of our work. That sounds, to me, pretty core to project management, technical or not.

And so, the TPMs have a desire for this to happen, and because it’s a core part of their role, they will drive it. They’re willing to drive it, they’re willing to help other people understand what to do and get other people together in a room and set up a process for everybody to cause it to happen. Now, it doesn’t always require a TPM. There are engineering managers who do this and do a great job usually because they coincidentally learned about how to operate with metrics or data through some random occurrence in their life or career.

Abi: That’s a good point about how engineers and engineering managers, oftentimes they don’t have that formal discipline or background in the project management type stuff, which as you said, revolves around setting up recurring processes, reviewing metrics. I think in that case, the benefit of bringing in the professionals, so to speak, is pretty clear. I want to ask you, LinkedIn’s put out a couple of these articles now on your approach, which looks incredible. If you were to get hired at a new company and the CTO, CEO says, “Hey, we want developer productivity metrics here,” how would you approach it if you were to do it again? What are maybe some of the key things you would repeat that you’ve done at LinkedIn, and what are maybe some things you would do differently or reconsider?

Max: Okay. I’m going to talk about this in general and then you might have to remind me at the end about things that I would repeat or reconsider. The problem that you run into with this is that every company is different. They may not be different in some of the core things they need to measure or in what the end result is, but they’re all different in the place that they’re at. It’s different depending on what size you are as a business, too, and where you’re at in your journey as a business, and then where you’re at in what I call “the data journey.”

I would describe the data journey as roughly like this: you start off with no data, only anecdotes. Then you go to the idea that we should have data, and then you go to measuring anything you can measure. Just any number that pops up in anyone’s mind, you’re going to measure that. Then you go to having processes about data. Then you go to having thoughtful metrics, where you think about what the metrics are going to be and what the impact is, where you think more about how you define metrics. And usually, at the end, you get to action: there’s some process that causes action to occur.

People are at different places on that journey. Also, different organizations value different things, and you have to recognize that. It’s very easy as a developer productivity person to go into a company and say, “I know everything. You people are all doing it wrong, and you just need to listen to me. If everybody did what I said, then everything would be fine.” I can tell you that if you do that, it will not work and you will have no impact. All that will happen is you’ll alienate your coworkers and potentially lose your job, because you are saying things that are not real to them. You are talking about rainbows and unicorns, and they are in the trenches digging.

They’re like, “Stop talking to me about rainbows and unicorns. I have to dig this trench.” One of the first things you have to do is recognize: what’s the real problem that people know exists? Let’s say you’re talking about data. I often see in engineering organizations across the world that infrastructure developers sometimes don’t have a great understanding of the requirements of their customers. They sometimes believe that because they are developers, that they know what all developers need, and thus they don’t need to go out and do extensive user research.

Now, I have enough experience creating infrastructure and developer productivity platforms to know that that is not true, that there are many developers who are doing many things that you don’t know about and cannot imagine until you go and talk to them. That will be unique to your company. They will have some crazy set of scripts that they’ve written and some wild workflow that you never imagined was occurring. You’re like, “Why won’t you adopt my tool?” And they’re like, “Well, because I got these scripts.” It blows your mind. You’re like, “Well, I designed the most beautiful thing in the whole world. Why won’t you just use my thing?” It’s very easy for me, with data, to go in and say, “Well, you all need data, because you don’t understand your users.”

Occasionally, you can, through repetition of the message, bring about a realization of this being the case. This is a hard way to go. It takes a very long time if you do this. If you just force everybody to do the thing and then repeat it over and over and over and over and cause people to look at the data over and over and over and over and over, after a period of years, some people will start to get it.

That’s a pretty rough way to go. What you have to do is you have to find out: “Hey. Okay, first of all, is this a top-down company or a bottom-up company?” If this is a top-down company, what does the boss care about? If this is a bottom-up company, what are the problems that the engineers on the ground are facing? What do they really care about? What’s the most immediate thing that they find painful? And you’ve got to figure out what the right solution to that is. Maybe it’s data, but maybe it’s not. Maybe it’s just: go fix something.

You might be going the whole time like, “My God, these data systems are terrible. I can’t figure out what I’m trying to do,” or, “I don’t even understand how to make queries,” or, “These data sources are so messy and there’s no dashboards, and I can’t do anything.” Okay. Just stow that in the back of your mind. Just stow it and wait until you get to a place where you can have a conversation with whichever group that matters—the top-down or the bottom-up in the company that you’re at—and wait until you can have that conversation where you’re like, “Hey, what do you think about data? You think we should measure something about this? Or what do you think?” And try to bring people along with it. That’s my thing.

I think that to answer the rest of your question about things that I would repeat or not do: the thing that I would repeat is doing a good job of identifying which direction matters. I think we did a good job of figuring out who the opinion leaders were and how we would help change the minds of the opinion leaders at the company, or not even change their minds, but bring them the information that they wanted to have, so that they could make useful decisions.

I think one thing that I would change is I would spend more time going to teams and asking them, “What data do you care about?” Especially product teams. I would say, “Show me what data you care about.” I might get a list of metrics where I’m like, “What?” But it would at least inform me as to: what are the things that I think will be acted on if I put them up?

Then I think one other thing that I’d like to do is… We have Insights Hub and then Insights Hub links out to a lot of detailed individual dashboards, so you can go to a dashboard that lets you dig into build time or a dashboard that lets you dig into our GitHub metrics. I would have, I think, spent a little more time focusing on the customer of the detailed dashboards, because it’s very easy as a data engineer to just put up a detailed dashboard that serves you. If you’re the engineer writing the pipeline, I think what serves you is, let’s have every filter and every dimension on one page, because I need to go and check my pipeline, I need to go and see if my pipeline is working, and I want to have a bunch of different graphs on the same page, because I don’t want to have to keep loading different pages to know everything that there is to know. That’s not just true of data engineers, but it’s also true of anybody who’s a deep expert in the data. Making a dashboard that serves a deep expert in the data is very different than serving a person who may be a deep expert in their own field, but isn’t a deep expert in your data source. I think thinking about what kind of investigations those people want to do is something that I would think about upfront when I was designing these deep dive dashboards.

Abi: This leads into another question I had for you. You touched on one thing you’d do differently next time, is do a bit more research maybe upfront about what are the things that the product teams themselves care about or want to measure. I want to ask you, you having experience now with so many different types of metrics being rolled out across the organization, what surprised you? What metrics are teams getting less value out of than you might have anticipated, because they might be popular metrics? And what types of metrics are teams getting value out of that you didn’t initially anticipate?

Max: These are things like reviewer response time, which is the time from when an author posts an update to a PR to when a person who is assigned as a reviewer actually posts any comment at all. It basically measures how long an author has to wait for a response, and we display it to the teams of the authors. My theory was that managers are going to care about the experience that their authors are having. However, multiple managers have come to us and said, “The reviewer is on another team. I cannot control this metric, and I actually would like to hide this metric from my dashboard. I actually don’t want to see it anymore.” That’s a bad indicator.

If somebody comes to you and says, “How do I turn off this metric?” you always want to dive into that and be like, “Oh, what happened? What happened?” You can have your own opinions; you can be like, “Well, you should just go talk to that other team.” That was our thought. Our thought was like, “Well, you know, you can just go and negotiate with the other team.” But sometimes that’s not realistic. Sometimes the other team is some central infrastructure team or some monorepo or something that everybody has to go through, and they’re busy. Maybe they could change their practices if they had the data for themselves, but we’re not showing them that data. We’re showing the authors the data. I have actually been considering switching that, for example, to show you the reviews that your team does, as opposed to the PRs that your team sends out and how long we have to wait for them.

But at the same time, some teams do get a lot of value out of that metric, and they drive it down. I think that’s more common when the reviews are contained within their team or within their immediate organization.
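For readers who want the metric pinned down, here is a rough sketch of how a reviewer response time like the one Max describes might be computed from PR events. The event shape and field names are assumptions for illustration, not LinkedIn’s schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ReviewEvent:
    pr_id: str
    kind: str           # "author_update" or "reviewer_comment" (assumed labels)
    timestamp: datetime

def reviewer_response_hours(events: list[ReviewEvent]) -> list[float]:
    """For each author update that got a response, measure the hours the
    author waited until the next reviewer comment on that PR."""
    waits: list[float] = []
    by_pr: dict[str, list[ReviewEvent]] = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        by_pr.setdefault(e.pr_id, []).append(e)
    for pr_events in by_pr.values():
        pending: datetime | None = None  # earliest unanswered author update
        for e in pr_events:
            if e.kind == "author_update":
                pending = pending or e.timestamp
            elif e.kind == "reviewer_comment" and pending is not None:
                waits.append((e.timestamp - pending).total_seconds() / 3600)
                pending = None
    return waits
```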

The metrics that teams tend to get the most value out of are the ones that they designed themselves or the ones that they originate themselves. I would say the metrics that teams get the most value out of are the ones that they believe in. You and I might have many opinions about how to measure reliability, for example, or what the best metrics are that drive cultural change in software engineers to have reliable software systems. But a team might not have gone through the lengthy set of philosophical discussions that you and I have had, or the vast number of papers that you have read on the subject, and they might just want to count more basic things. The fact of the matter is, that does a lot more than having no data.

In the field of reliability, there’s definitely a lot of debate about Time To Resolve metrics, as an example. What I think about Time To Resolve metrics personally is that they tend to encourage a team to be more operationally focused and figure out how to become better at handling incidents instead of better at preventing incidents. There are ways around that: you can sum up the total time to resolve instead of taking the mean. There’s all sorts of things you can do with that. But at the same time, if the choice is between no data and MTTR (median time to resolve), you’re way better off having MTTR. MTTR is infinitely better than no metrics. I might sound like I’m bashing it, but honestly, it can be super helpful. It can have a dramatic effect on the actual experience that customers have of your product. If you have a day-long outage versus an hour-long outage, that’s a huge difference.

Honestly, even if you had all the other metrics, you’d probably still have MTTR. The points that I usually bring up with people are like, “Hey, maybe let’s have additional metrics on top of that so that we can balance out our operational thoughts with our preventative thoughts.” But if the team is totally bought into MTTR and they want to count MTTR, or they count just the number of incidents and drive down the number of incidents and they act on that, then that’s way better than not having anything.
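A tiny worked example of the distinction Max draws here, with made-up numbers: a mean time to resolve is blind to prevention, while summing the total time to resolve rewards preventing incidents as well as handling them quickly.

```python
# Hypothetical incident durations, in hours, for two quarters.
q1 = [2.0, 2.0, 2.0, 2.0]  # four incidents
q2 = [2.0, 2.0]            # same handling speed, half as many incidents

mean_q1 = sum(q1) / len(q1)  # 2.0 hours
mean_q2 = sum(q2) / len(q2)  # 2.0 hours: prevention is invisible to the mean

total_q1 = sum(q1)           # 8.0 hours
total_q2 = sum(q2)           # 4.0 hours: the sum drops when incidents are prevented
```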

The way that we’ve accounted for that in iHub is we have a separation between what we call “company level metrics” and “team level metrics.” Company level metrics are visible by default in the UI for everybody, and team level metrics are not visible by default in the UI, but you can turn them on for your team. So your team can see them, and your team can onboard whatever metrics you want; it requires no review process. I’m not involved. The Insights Hub team is hardly involved. You just throw that metric in there and you can see it if you want to.

Whereas for the company-wide metrics, I’m involved. I get in there and I’m like, “Okay, we’ve got to talk about the definition, we’re going to really dig into this.” And so, that’s the balance. The balance is, I might have all these opinions. It’s just like I talked about in other areas, where I’m like, “Well, I think I know what to do,” and I have my opinions about the quality of things and all these things.

The way that we’ve balanced that out is we’ve said, “Okay, for the things that are going to affect the whole company, we’re going to go through this stringent quality assurance process on the metrics themselves, but for the things that a team themselves wants to measure, just do whatever. You measure what you want to measure.” Honestly, that drives a ton of engagement with the data. The cool thing is, if you then also drive those teams who are using the team-specific metrics to a UI that has the company-wide metrics, then they can also interact with them.

Abi: I love that. It’s a user engagement strategy, but also valuable to the teams and company, of course. I want to ask you about the tooling side of how you’ve actually enabled the custom metrics. If I’m understanding correctly, does this include bringing in new data sources, or is this just new calculations on the existing data?

Max: New data sources.

Abi: For people who work at smaller companies, can you talk about why that’s needed, the diversity of the toolchain at LinkedIn, for example? As we’ll talk about in a minute, a lot of companies look at out-of-the-box metric solutions, which usually come with maybe up to a dozen or so built-in data sources without being able to extend beyond that. Talk about why at LinkedIn you need something that goes beyond that.

Max: Compared to a lot of larger tech companies, LinkedIn actually has a fairly uniform developer productivity system for most of the company. Despite that, there’s a lot of custom in-house built pieces. Well, I mean, a lot of it is custom in-house pieces, and despite the standardization, there are still significant sections of the company that use different development systems. You have two different things. One of them is you have a bunch of stuff that we built in-house. I would be surprised if any company can get away with a totally SaaS-based developer productivity solution at this time.

I don’t think there’s any real turnkey developer productivity platform that exists today that you can just adopt wholesale and not have to glue the pieces together. There’s large chunks of it. GitHub is a large chunk of it, and there’s issue trackers. There’s Jira and Linear and stuff like that that you can use or project management systems or whatever. But you still have to tie all those pieces together, and there’s going to be something specific to the work that you do. This is, by the way, especially true if you work in AI and ML where the external tooling is very, very minimal and the stuff that you need to do on a day-to-day basis is going to be a lot of stuff that your company is going to have to build in most cases.

There are platforms that you can use, and they provide a lot of great stuff, but you’re still going to have to build a lot of the actual experience yourself. At a company like LinkedIn, those in-house built pieces could be 100 different pieces of software, conservatively, I would say. You’re like, “Well, what could all those pieces be?” Well, I mean, think about it this way. There’s a UI for seeing the CI jobs, there’s the actual thing that runs the CI jobs, there’s data about the test framework, and there’s a system that stores the results of the CI jobs. I’ve named three or four things, and that’s just for CI. Then you might have slightly different versions of those for whatever different language platforms you support.

Then what about your IDEs, and what’s all your IDE infrastructure? How does all of that work? There’s all that. Then think about the people who don’t use your centralized development system. A lot of companies have a very diverse developer ecosystem, and LinkedIn is no exception. There’s a lot of people who are developing in totally different systems.

Maybe you find individual teams that have their own platform, and then they’re storing data in their own place. Then here’s an interesting one that we’ve run into recently. Let’s say that you write in-house software, but that software is not for software engineers. What if you write an in-house CRM? What if you write an in-house meeting room management system? Now the developers of that system need to understand its impact, so suddenly you, an engineering team, need to go in and interface with all of these systems. You might need to go interface with whatever CRM you’re using externally, which, I’ll tell you, developer productivity metrics platforms have not thought about.

Nobody thinks, “Oh yeah, developers, engineering metrics… you need me to understand a CRM? Why?” That doesn’t…

Abi: That’s funny.

Max: Yeah, that doesn’t help me. But maybe some of the data is going to be there. The data can just be in an infinite variety of places. Any place that you can imagine, the data could be there.

Abi: I want to actually say I completely agree, and I’ve often recommended to people that they think twice about some of the turnkey SaaS metric solutions, because like you said, I do think unless you’re a really, really, really small team, a really small startup, you’re going to quickly outgrow the usefulness of these tools once teams adopt not even just different systems, like you said, but even just different workflows within those systems that require different types of measurements. Maybe not necessarily different data sources, but just different types of measurements.

To piggyback off this, you and I have had several conversations offline about how these quantitative metrics can’t capture the full picture of developer productivity and developer experience. I want to ask, and of course the article you guys published does touch on this, but how do you incorporate the human side? Then I want to ask you to share that experience you had at Google. I think it’s such a great story. I just did a post on it, as you saw, but would love to just hear it firsthand: that experience you went through at Google around code quality metrics.

Max: Okay. Yeah, let me talk about the difference between qualitative and quantitative metrics from an engineering practices standpoint, and how you take action on them.

Qualitative metrics are your highest coverage information. You just ask people what they think, how they feel about things, and there are better or worse ways to design surveys. People should go and listen to your podcasts with Dr. Nicole Forsgren and Nathen Harvey and anybody else who does survey design, because there’s actually quite a bit to know about it. But at the worst you just put out a survey and you’re like, “How do you feel about blah?” And people answer you. Numerous studies have shown that this is actually the most reliable and highest coverage information that you can get. The thing about quantitative metrics is they tend to be specific; they tend to be like build time. That’s the easiest one that everybody always talks about, because build time is the most well-known developer productivity quantitative metric that you can measure.

At least historically, that’s been true for decades: people have been able to measure build time. When build time is very, very bad, it can be the dominant part of your experience. I’ve talked to people who had build times that were a day and a half, and that’s a pretty bad experience. That dominates your developer productivity experience. But with build time, what I’ve discovered is that there’s approximately a rule of fours: every time you decrease it by a factor of four, that’s when you have a different experience. But if you decrease it by less than that, then it doesn’t really matter. If I go from having a 40-minute-long build to having a 10-minute-long build, that is transformative for me as a developer. But if I go from having a 40-minute-long build to having a 30-minute-long build, it probably doesn’t make any difference for me.

However, on the plus side, the quantitative metric is very actionable. It’s directly actionable. I look at that, I can dig into who’s having the slow build, what kind of machines are they using, what’s the CPUs on those machines, what memory do they have? What other processes are running on those machines? What types of builds are they doing? What language are they building? How large are their code bases? How many dependencies do they have? I can do a full engineering investigation, I can come up with a concrete answer that I can tell you is factually true, and then I can develop a concrete engineering solution that I have now a factual basis for building. That’s awesome.

But then that can also turn out to be not the issue. That’s what I’m pointing out; this is the comparison. Whereas with the qualitative metrics, you’ll rarely miss the issue, at least the biggest issue. When you send out a survey, you’ll get the big issue. People will be like, “I hate this code review tool,” and you’ll get it a zillion times. Or people will be like, “Oh my god, I spend all of my time in meetings and we’re always engaged in alignment and I can’t ever get any coding done.” You’ll hear it over and over and over. Ten or 20 different free-text comments will just tell you, the ratings will be really poor, and you’ll get it. But what you do about that can be very nebulous, because sometimes all you get is, “I hate the code review tool.” That’s it. You don’t get, “It’s slow, it’s unreliable.” There’s nothing that you can dive into and measure.

So you have to think about those two things differently.

You want to talk about the Google thing?

Abi: Yeah.

Max: I was part of a group at Google called the Code Health Intergroup. Intergroups were a thing that existed across the company where a bunch of people would get together as volunteers to tackle a horizontal issue. I think the Code Health Intergroup started as the Testing Intergroup once upon a time and then evolved into this Code Health Intergroup. At the time that I was there, it was run by Russ Rufer and Tracy Bialik, who, if you ever look in the acknowledgements of some of the seminal works on software engineering, are there, and often the acknowledgements are things like, “Wow, they totally helped me transform the book completely.” They are behind-the-scenes people who have just been driving developer productivity initiatives for decades.

They’re brilliant, they’re amazing, and we had these amazing meetings. But one thing that would happen is that almost every week somebody would show up from across the company and ask us, “How do you measure code quality?” Because that’s what we were focused on in a lot of the conversations. One of the things that you discover, and this is a total side note and could be a whole other podcast, is that as you go through your life of engineering, at first people discover software development and that they can write software. Then usually people discover testing. They’re like, “Oh, I can write tests.”

Then somewhere down the road people often discover that actually code quality is underneath almost all of those things, almost all of the problems that you have. Outside of human problems and organizational problems, which are huge and can be major issues, a lot of engineering issues ultimately stem from some factor of code health or code quality in the long term. It’s just hard to see it.

Russ and Tracy and I were all pretty experienced. I mean, I’m not comparing myself to them; they were way more experienced than I was at the time. But I had some experience. I had published my first book by this time, and people would come to us, because we were the code quality people, and they would say, “How do you measure it?” Mostly, we would tell them, “Don’t do that,” because what they wanted was a number. They wanted a number that says this is how good or bad this piece of code is. For a long time I struggled to explain to people why not to do it. I just knew, “Well, it’s not a good idea to do it.”

I think Russ and Tracy were better at explaining it at the time, and they would talk about the consequences of it. It ends up making you narrowly focused on some of the wrong things. You take measurements like cyclomatic complexity, which basically traces the number of possible paths through a function. That doesn’t actually tell you if a human being understands the piece of code or not.
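As a small, hypothetical illustration of that limitation: the two functions below have the same cyclomatic complexity (two paths each), yet one is far easier for a human to read, understand, and correctly modify.

```python
def member_price(amount: float, is_member: bool) -> float:
    """Members get a 10% discount."""
    if is_member:
        return amount * 0.9
    return amount

def p(a, m):
    # Identical behavior and the same two paths, so the same cyclomatic
    # complexity score, but the intent is opaque to a human reader.
    return a * (0.9 if m else 1)
```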

What I eventually came to, and all of this sounds very simple, but the thing that I’m about to tell you was revelatory—like a revelation—to me and every other person in the room the first time that I said it. The definition of simplicity for code is: easy to read, understand, and correctly modify.

If you think about that, that’s fundamentally a human property. It’s fundamentally a thing a human being experiences in their mind. When you get quantitative metrics for what a human being experiences in their mind, you let me know. But I-

Abi: I love that.

Max: I don’t anticipate that ever happening. I think you have to ask people what their experience was, and I think that’s the only way that you can know completely what a person’s experience was. Could you theoretically come up with things that might be helpful? I could imagine an AI-based assistant that tried to do pattern matching on things that other people had… like an AI assistant that had been trained with reinforcement learning from human feedback, where a bunch of human raters say, “Yes, this is the thing,” or, “No, this isn’t the thing that I’m looking for.”

I could imagine a system that was trained on patterns of bad code or code that people had said was confusing. You asked a bunch of experts or people who were familiar with a code base, is this particular thing confusing or not? And you might get something.

But even then the problem would be viewpoint, because first of all, who’s the code for? Is it an external API, or is it an internal piece of code? If it’s an external API, it needs to be understandable to everybody. If it’s an internal piece of code, it really only needs to be comprehensible to the people who work on it, which is very different. The kind of structure that you would give to that piece of code is very different. It’s all subjective.

Abi: Yeah, I love that. As you said, I mean, it’s funny you said it was a revelation when you first brought it up within your group, but it is. I mean, anyone who’s written code knows that the quality of the code is perceptual, an experience for a human. I think it’s funny you brought up that the only way to perhaps programmatize a metric around this would be to use AI: either have the AI ask the humans the questions, or train the AI to give you the answers, if the AI were able to perceive code in the same way that humans actually do.

I want to ask you about something. One thing I often recommend to companies when they’re getting started on this journey of metrics and understanding how to measure developer productivity is this: I bring up your team as an example, and I say that it makes sense for a central developer productivity team to be the steward of measurement, bringing these types of insights to all the product teams across the organization. This is what you’re doing at LinkedIn. I mean, this is probably a little bit of a pattern you just fell into, but is this the right way? Do you see this as something like a pattern that other organizations should be following?

Max: Sure. Let’s talk about the ideal world and the practical world. In my ideal world, every engineer would have an understanding of data and be able to make queries and create their own tables and think about this when they were designing their systems. Realistically, if you want to have good data sources, the people generating the data need to own the data sources. That’s one of the only ways that you can guarantee over time that the thing stays correct, because somebody might change the system itself and then they need to change the emitted data.

However, I would say that offline data of the sort that I’m usually looking at is often viewed by engineers as a secondary task. I can understand that. I think I would’ve viewed it that way many times in my career. It doesn’t seem like “making the computer do the thing,” which is the joy of programming. The joy of programming is making the computer do the thing you said to do, because it’s really fun. I still find that fun. I think people think of things like writing tests, writing documentation, having good metrics, and emitting clean data sources as, “Oh, I’ve got to go do that now. Oh my god, right?” If the offline data thing is somebody’s job, it can, in the practical world, lead to better results.

The danger is the same as any time you create a bifurcated engineering system. Think about companies that had separate QA teams from their software engineering teams, where the software engineers were never required to write tests. We call that the throw-it-over-the-wall system, where the software engineers write code and then they never have to think about its reliability ever again. The testers are on the hook from there on out. The problem with that model is it never leads to reliable software, but it does lead to very high quality test systems. The reverse model, the system where you just tell the software engineers that now they are solely responsible for all testing, often struggles to generate high quality test systems.

When testing becomes complicated or when there need to be central testing libraries that are shared across unit or integration tests, or when there needs to be a new integration testing framework or a data generation framework for integration tests, then those often don’t get done, because they aren’t the task of any individual software engineer. It’s not their “job”, and data systems often end up being the same way. Let’s say that there needs to be common infrastructure or there needs to be somebody who pays attention to the quality and usability of the offline data sources.

If you just say every engineer should do this individually, then that doesn’t happen. That’s most true with visualizations. It is very, very rare for an individual engineering team to develop a highly usable visualization for their own engineering metrics. If you’re a small team, three or four people, and you just need a dashboard for yourself, you don’t need it to be incredible, you really don’t. The amount of effort you should put into it is hopefully fairly low. Really, you should not put a ton of effort into a dashboard for three to four people.

But what if those three to four people maintain a system that’s depended on by the entire company, and the entire company needs that data? You see there’s a paradox for an engineering manager here. I want those engineers on that team to be responsible for their own data, but that data is not the mission of that team or the mission of any engineer on that team. What do I do?

The way that we solve this currently is we have dedicated people inside of the developer productivity organization who help make data pipelines. These people are in the team that I am the tech lead of, and they work with the data source owners, like let’s say the build team, to help build pipelines and dashboards for build metrics. The value of that is: it actually happens. And it happens in much greater volume and much faster than if you just hope that eventually the build team will get around to prioritizing it.

When I say that, by the way, our build team’s awesome, they’re fantastic. They support thousands of engineers at LinkedIn and they do great work, but they might not consider dashboards to be their primary task. That might not rise high up on their priority list. They might not dedicate an engineer for a quarter just to make a dashboard, but we can do that.

Abi: It’s not just the volume or speed of getting these types of metrics up; it’s also, as you talked about earlier, the careful design and deliberateness around the definitions, how they’re calculated, and how they’re reported and presented, where I think your team is adding so much value versus just the build team slapping together some instrumentation and dumping that in a Looker dashboard.

Max: One of the things that happens every single time you have one of these metrics is that there’s a very long phase that we call “user acceptance testing,” but a lot of what that means is hunting down anomalies and making the data look sensible. And the amount of work that’s involved in that is huge. If you just think, “I’m going to make a dashboard and I’m just going to present it to people,” you do not know what you’re in for. If you just say, “I want to put numbers in a dashboard,” fine. If you say, “I want to put accurate numbers in a dashboard that are meaningful,” you are in for a ride.

Abi: Totally agree. Well, Max, I really appreciated this conversation and the behind-the-scenes look at Insights Hub and the work you’re doing. I want to shout out my gratitude to you and the rest of your colleagues at LinkedIn for putting this stuff out there for the world to learn about. Thanks again for coming on the show for a second time. I’m sure this will not be the last. Really enjoyed this conversation.

Max: Thanks, Abi. This was great. I could easily talk about this for several more hours, so I’d be totally happy to come back.