At some point along an organization’s journey with measuring developer productivity, there is often a discussion on whether or not to track individual developer activity metrics such as the number of commits, lines of code, or pull requests. Interest in these metrics is typically prompted by leaders wanting objective insights into developer performance, or how much work is getting done.
Although the reasons for tracking these types of metrics make intuitive sense, the industry is fraught with polarized opinions on this practice that sometimes spill out into the public. Fortunately for leaders, there’s extensive research on this topic that can help them navigate discussions and decisions about what to (or not to) measure. This article summarizes that research.
Software engineering is difficult to measure in part because it is a highly complex task. Therefore, traditional productivity metrics that focus on output volume do not readily apply. Furthermore, when evaluating these metrics at the individual or team level, more is not necessarily better. A recent paper written by Google researchers articulates this point bluntly:
One might count the lines of code written by a developer in a certain period and calculate a simple output per unit time “productivity” measure like lines of code per minute. Pounds of coal shoveled per hour will tell you which shovelers are the best shovelers; lines of code per minute will not tell you which software developers are the best software developers.
Similarly, in their seminal paper “Defining Productivity in Software Engineering,” authors Stefan Wagner and Florian Deissenboeck state that leaders must consider factors beyond activity when measuring developer productivity:
While with production processes we can measure productivity with the number of units produced, productivity is much trickier for intellectual work. It is commonly agreed on that the nature of knowledge work fundamentally differs from manual work and, hence, factors besides the simple output/input ratio need to be taken into account.
Aside from the creative nature of the work, developer activity is also difficult to measure because it involves many different types of tasks that are not easily captured. In the paper titled “The SPACE of Developer Productivity,” the authors caution that “because of the complex and diverse activities that developers perform, their activity is not easy to measure or quantify.”
Google researcher Ciera Jaspan further elaborates on this idea in the paper, “No Single Metric Captures Productivity”:
When we create a metric, we are examining a thin slice of a developer’s overall time and output. Developers engage in a variety of other development tasks beyond just writing code, including providing guidance and reviewing code for other developers, designing systems and features, and managing releases and configuration of software systems. Developers also engage in a variety of social tasks such as mentoring or coordination that can have a significant impact on overall team or organization output.
Still, many leaders today focus solely on measuring individual activity metrics. As described by Dr. Margaret-Anne Storey, a co-author of the SPACE framework, this is something researchers are concerned about:
We only included metrics like lines of code in the SPACE framework because we found practitioners were commonly using them, and we wanted to build a bridge for those people. I often see companies focusing on metrics like the number of pull requests, which we do not recommend.
Next we’ll cover research on the potential consequences of utilizing developer activity metrics.
Rewarding developers for lines of code leads to bloated software that incurs higher maintenance costs and higher cost of change.
– Dr. Nicole Forsgren
Several research studies have investigated the consequences of tracking individual activity metrics. One key finding is that developers view these types of metrics with skepticism and fear, which may lead them to “game the system” out of self-preservation. This is further discussed in a recent paper from Google:
Developers are concerned about how any measurement could be misinterpreted, particularly by managers who do not have technical knowledge about inherent caveats any metric has. If productivity metrics directly feed into an individual’s performance grading, then they will impact how developers are compensated and whether they continue to keep their jobs—a serious consequence for getting it wrong. These high stakes further incentivize gaming the metrics, for example, by committing unnecessary code just to increase LOC ratings.
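The “committing unnecessary code just to increase LOC ratings” problem is easy to demonstrate. Below is a minimal sketch (the `clamp` functions and the crude line-counting metric are hypothetical illustrations, not taken from any of the cited studies) showing how two behaviorally identical implementations score very differently under a naive lines-of-code measure:

```python
import inspect

def clamp(value, low, high):
    """Terse implementation: one logical line of work."""
    return max(low, min(value, high))

def clamp_padded(value, low, high):
    """Behaviorally identical, but written to inflate a LOC count."""
    result = value
    if result < low:
        result = low
    if result > high:
        result = high
    return result

# Both functions produce the same results for any input...
for v in (-5, 3, 42):
    assert clamp(v, 0, 10) == clamp_padded(v, 0, 10)

# ...yet a naive "lines of source" metric rates the padded version
# as several times more "productive" than the terse one.
def loc(func):
    return len(inspect.getsource(func).splitlines())

assert loc(clamp_padded) > loc(clamp)
```

The point is not that longer code is always worse, but that the metric cannot distinguish padding from substance, which is exactly the misinterpretation risk the researchers describe.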
In another paper titled “Summarizing and Measuring Development Activity,” the authors caution that the mere presence of activity metrics can warp incentives, even when those metrics are not explicitly being used to reward or penalize developers:
Development activity may be impossible to measure, and it might even be dangerous to try to measure it since those measures can lead developers to game a system rather than work towards the goodness of the codebase.
Google researchers have also written about the morale issues that can arise from the use of individual activity metrics. These morale issues negatively impact overall productivity and can also drive attrition.
Tracking individual performance can create morale issues which can perversely bring down overall productivity. Research shows that developers do not like having metrics focused on identifying the productivity of individual engineers; this has also been our experience at Google. Developers are concerned about privacy issues and about how measurements can be misinterpreted, particularly by managers who do not have technical knowledge about caveats.
There are several common motivations for why a company may be interested in measuring the activity of their developers. However, in each of these cases, research shows that activity metrics generally do a poor job of providing valuable insights.
Identifying high and low performers. Companies often look to individual metrics to help them assess individual performance. However, as described earlier, activity metrics offer only a limited view of a developer’s work and rarely yield meaningful insight. In “No Single Metric Captures Productivity,” Google researchers state:
It is our experience that managers (and peers) frequently already know who the low performers are. In that case, metrics serve only to validate a preexisting conception for why an individual is a low performer, and so using them to identify people in the first place is not necessary and serves only to demoralize the higher-performing employees.
Helping developers grow their skills. Although many metrics vendors cite this as a use case, there are no research studies which validate the use of activity metrics for helping developers grow their skills. Rather, a recent paper by Microsoft researchers identified the top five attributes of great engineers.
For leaders who hire or coach developers, these findings present valuable insights for hiring, upskilling, and supporting developers’ growth. Activity metrics, however, do not help developers or managers assess or boost skills across these attributes.
Instead of attempting to use individual activity metrics to evaluate developer performance, consider improvements to your formal performance review processes, starting with a clear definition of role expectations and then working backwards to identify potential metrics.
Improving engineering productivity. Another common motivator for tracking individual activity metrics is to identify inefficiencies and improve engineering productivity. However, activity metrics are heavily influenced by outside forces: when they’re used in isolation, they may provide misleading signals about where productivity issues exist. This is explained by Microsoft researchers in their recent paper:
Sometimes, higher volumes of activity appear for various reasons: working longer hours may signal developers having to "brute-force" work to overcome bad systems or poor planning to meet a predefined release schedule. On the other hand, increased activity may reflect better engineering systems, providing developers with the tools they need to do their jobs effectively, or better collaboration and communication with team members in unblocking their changes and code reviews. Activity metrics alone do not reveal which of these is the case, so they should never be used in isolation.
A recent paper by Thoughtworks further recommends against using activity metrics to try to improve developer effectiveness:
Organizations look for ways to measure developer productivity. The common anti-pattern is to look at lines of code, feature output or to put too much focus on trying to spot the underperforming developers. It is better to turn the conversation around to focus on how the organization is providing an effective engineering environment. Being productive motivates developers. Without the friction, they have time to think creatively and apply themselves. If organizations do not do this, then in my experience the best engineers will leave. There is no reason for a developer to work in an ineffective environment when lots of great innovative digital companies are looking to hire strong technical talent.
The decision of whether or not to track developer activity metrics is an often-debated topic. The findings discussed in this article, which come from researchers at companies like Google and Microsoft, can help leaders better navigate the potential pitfalls of using these types of metrics.