Pitfalls of tracking developer activity metrics
Taylor Bruneaux
Analyst
The practice of tracking developer activity metrics is as common as it is controversial. Many organizations track metrics such as pull requests merged, commits, or story points in hopes of measuring and managing team performance. At the same time, there is continued outcry against the use of metrics amongst many leaders, developers, and researchers.
To capture a helpful perspective on this topic, we interviewed Bryan Finster, who previously founded Walmart’s DevOps center focused on scaling engineering excellence across the enterprise. While at Walmart, Bryan witnessed the organization adopt developer activity metrics, an approach he predicted would negatively influence the way developers work. Here, he describes why these metrics were adopted and how they impacted teams.
With that, it’s over to Bryan.
Metrics are the siren song of so many organizations. “If only we had a simple way to track how we are doing!” The problem is that metrics can be either very useful or very destructive to our goals. It all depends on what we measure and why. If we are measuring things that give us insights into waste, friction, and delivery pain that negatively impact our ability to deliver value, then they are very helpful. When the goal is holding people accountable, they are less than useless.
For several years, I’ve used delivery metrics to identify problems and help teams correct them. If you’d like to read more about those experiences, you can find my thoughts in How to Misuse and Abuse DORA Metrics. In that time, I’ve also seen misguided attempts by some to manage through metrics. One common and incredibly bad approach is measuring individual activity metrics. Let me share some real examples.
“What gets measured gets managed.”
An engineering manager I worked with wanted a way to manage team performance. The team was a mixture of experience, from barely out of college to 15–20 years in the industry. The manager, naturally, wanted to ensure everyone was pulling their weight. He also needed a way to stack rank resources for their annual review. Jira metrics dashboards were the obvious choice. By dashboarding the number of tasks each person completed during the sprint, the manager could easily discover who the top developers were and who were candidates for “trimming the bottom of the bell curve.” This lightened the manager’s workload, focused the developers on increasing output, and helped the senior engineers by eliminating the distraction of helping others to grow. Strangely, it didn’t improve the team’s ability to deliver value to the end users.
The outcome of this method of management was predictable:
- Team members would cherry-pick the easiest tasks that could be completed quickly rather than working in a value-priority sequence.
- Staff Engineers focused on the tactical work of closing issues rather than the work they should have been focused on: helping set technical direction, growing their replacements, looking for ways to improve the overall operability and maintainability of the application, and ensuring general technical excellence on the team.
- Work in progress was high because reviewing someone else’s code didn’t help you make your numbers.
- Collaborating with others on the team reflected negatively on developer activity reporting, so the best thing for each developer to do was to be an individual hero.
This wasn’t a team. This was a co-located group of people working on the same code base. Poorly. Because our goal was to help this team improve delivery, the first priority was to educate the manager on how destructive his management practices were to value delivery. “What gets measured gets managed” isn’t a recommendation. It’s a warning.
Code Harder
Our enterprise delivery platform included delivery metrics as a first-class citizen. As an organization, we were using continuous delivery as a forcing function for improving engineering and value delivery across the enterprise. We’d gamified team dashboards to encourage frequent code integration, low tech debt, fast and stable pipelines, and frequent delivery. Something we deliberately avoided in our dashboards was individual developer activity. Teams deliver solutions, not individuals. One area decided to fix this “defect” in our platform by purchasing GitPrime, now Pluralsight Flow.
GitPrime, and other similar “developer productivity measurement” tools, collect data from version control to track individual activity, such as which developer is contributing the most to which part of the code, how many commits each developer is making per week, how many comments developers make during code review, etc. When I received a demo of this tool, my response was, “Under no circumstances should this be used.” It focused on volume instead of value and would incentivize exactly the same behaviors we’d seen in the previous example. I predicted that it would inspire fear, reduce collaboration, and cause micromanagement. None of those helps the end user. My prediction was accurate. One team even reported that their director was reviewing the number of code review comments and challenging why they were not exceeding the GitPrime-recommended minimum. Their response? They added more words to the review comments. Leaving aside the fact that the best code review process will not result in written review comments at all, no customer ever received value from the volume of review comments. In fact, the opposite is true.
The end result of using GitPrime was that the affected teams focused on individual activity and not value delivery. If I am focused on how many lines of code I’m shipping every day, then I’m NOT focused on the problem we are trying to solve. Solving problems and delivering better value is the job, not typing. The goal of purchasing this tool wasn’t to help teams identify flow constraints. The goal was to hold developers accountable for pushing code.
Are tools that report individual activity totally useless? Not entirely, no. They are far more likely to be used in destructive ways, but there are some things they report that are useful if used correctly. Having a way to see hot spots in the code that indicate that only one or two people on the team modify that part of the code is useful. That shows siloing in the team that puts application stability at risk. However, the same thing can be accomplished without needing any metrics at all. If everyone is pulling the next priority from the backlog, then silos are automatically disrupted.
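For readers who do want to look for that kind of siloing, here is a minimal sketch in Python. It is an illustration, not something Bryan's teams or any particular tool used. It assumes you already have a list of (author, file path) pairs, for example parsed from version control history; the function name `find_silos`, its thresholds, and the sample data are all hypothetical. Note that it reports files, not a ranking of people.

```python
from collections import defaultdict


def find_silos(changes, max_authors=2, min_changes=3):
    """Return {path: sorted authors} for files that change often
    but are only ever touched by a small number of people.

    changes: iterable of (author, path) pairs, one per file change.
    max_authors and min_changes are illustrative thresholds.
    """
    authors_by_path = defaultdict(set)
    change_counts = defaultdict(int)
    for author, path in changes:
        authors_by_path[path].add(author)
        change_counts[path] += 1

    return {
        path: sorted(authors)
        for path, authors in authors_by_path.items()
        if len(authors) <= max_authors and change_counts[path] >= min_changes
    }


# Hypothetical sample data: one (author, path) pair per file change.
changes = [
    ("ana", "billing/invoice.py"),
    ("ana", "billing/invoice.py"),
    ("ana", "billing/invoice.py"),
    ("ben", "api/routes.py"),
    ("ana", "api/routes.py"),
    ("cam", "api/routes.py"),
    ("ben", "README.md"),
]

silos = find_silos(changes)
print(silos)
```

With this sample data, only billing/invoice.py is flagged: it changed three times and only one person touched it. The result is a prompt for a team conversation about sharing knowledge, not a score for any individual.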
Over the years, I’ve watched more and more vendors release solutions for tracking individual activity. I don’t blame them. People are throwing money at them to do that. The problem isn’t the tools. The problem is that the people buying the tools don’t understand that software development is not assembly line piece work done by overpaid typists. Software development is a team sport where novel problems are being solved. I do see the need to understand individual performance. There needs to be a way to reward high-performing people and to give low-performing people opportunities to improve or find other employment options. However, there are no metrics for this. Go and see how the team is working. Listen to the people doing the work. Be part of the value stream, not someone who manages with a dashboard.
Thanks to Bryan for sharing his experiences. Developer activity is easy to measure, so many leaders fall into the trap of tracking metrics that can have negative consequences. Bryan’s experiences at Walmart can help leaders better understand the pitfalls of using these metrics to manage teams.