
The engineering metrics used by top dev teams

Taylor Bruneaux

Analyst

Engineering metrics are the foundation of data-driven development organizations. Whether you’re a startup scaling your engineering team or an enterprise optimizing developer productivity, understanding which metrics to track, and how to define them, can make the difference between guesswork and strategic improvement.

The best engineering dashboards provide visibility into team velocity, product throughput, and software engineering performance metrics that drive reliability and profitability while identifying bottlenecks that impact SDLC efficiency and development velocity.

Leading technology companies like Google, Atlassian, and Etsy rely on data-driven engineering metrics rather than intuition to optimize their development processes and make critical tooling decisions. This collection defines the essential engineering KPIs and software development metrics that these and other industry leaders use to drive informed decisions about workflow improvements, organizational changes, and performance optimization.

Use this resource to establish new measurement frameworks, create metrics benchmarks for your teams, or communicate engineering impact to stakeholders. Engineering teams across various industries frequently discuss these approaches on platforms like Reddit and in technical blog posts. If you want to learn more about how these companies implement metrics, read our comprehensive guide.

Now, onto the metrics:

DX Core 4

The DX Core 4 is a unified approach to measuring developer productivity that encapsulates DORA, SPACE, and DevEx frameworks.

Developed by leading researchers, this framework includes four balanced dimensions: speed, effectiveness, quality, and business impact.

Over 300 companies across tech, finance, retail, and pharmaceutical industries have implemented the DX Core 4 metrics, achieving up to 12% increases in engineering efficiency, substantial cost savings, and 15% improvements in employee engagement. For implementation guidance, see our detailed Core 4 measurement guide.

Here’s a little more about the benchmarks that make up the Core 4:

Speed

Measures throughput and developer velocity through diffs per engineer (tracked at team level), lead time for changes, and deployment frequency.

Effectiveness

Captures how effectively developers accomplish their work using the Developer Experience Index (DXI), ease of delivery metrics, and engineering morale indicators. Teams can track these metrics using our DXI reporting tools for comprehensive insights.

Quality

Tracks system reliability through change failure rate, MTTR for failed deployments, and operational health metrics, including software security KPIs.

Business impact

Measures percentage of time spent on new capabilities versus maintenance work, initiative ROI, and revenue per engineer to understand team capacity and ownership patterns. Organizations can use allocation tracking to determine how their team distributes its engineering time.
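As a rough illustration, here is a minimal sketch of how a team might compute that allocation split and revenue per engineer. The categories and figures are hypothetical, not DX’s definitions or data.

```python
# Minimal sketch (hypothetical data): allocation split and revenue per engineer.

# Hypothetical engineering-time allocation for one quarter, in person-weeks.
time_by_category = {
    "new_capabilities": 340,
    "maintenance": 120,
    "keep_the_lights_on": 90,
}

total = sum(time_by_category.values())
new_capability_share = time_by_category["new_capabilities"] / total

# Hypothetical business figures.
annual_revenue = 50_000_000  # USD
engineer_count = 120

print(f"Time on new capabilities: {new_capability_share:.1%}")
print(f"Revenue per engineer: ${annual_revenue / engineer_count:,.0f}")
```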

We designed the DX Core 4 to deploy in weeks rather than months, leveraging readily available system metrics and self-reported data for your engineering dashboard.

For practical implementation strategies, explore our research on operationalizing developer productivity metrics, and see how teams use Core 4 reporting to track these metrics effectively.


Beyond the DX Core 4, engineering teams often track additional specialized KPIs depending on their specific needs and contexts. These supplementary metrics provide deeper visibility into workflow efficiency, backlog management, and sprint progress.

Leaders can apply engineering metrics in many ways, from tracking software engineering metrics for code quality to monitoring project metrics that reveal system bottlenecks. Teams can leverage data connectors to aggregate metrics from multiple tools and systems.

Here are detailed definitions of other important software engineering performance metrics used across the industry:

Adoption rate

Adoption rate measures how many developers actively use a product or service.

Ideally, this represents the percentage of developers using the product compared to the total number of developers the product is intended to serve. However, some teams use simpler measures, such as the total number of users who have ever used the product, or the number who have used it within a specific timeframe, such as the past month or quarter.
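For illustration, here is a minimal sketch of the percentage-based definition, assuming hypothetical usage events and a hypothetical intended-audience size.

```python
# Minimal sketch: adoption rate as active users over the intended audience.
from datetime import date

# Hypothetical data: each event is (developer_id, date_used).
usage_events = [
    ("dev-1", date(2025, 5, 1)),
    ("dev-2", date(2025, 5, 12)),
    ("dev-1", date(2025, 5, 20)),
    ("dev-3", date(2025, 3, 2)),   # falls outside the window below
]
intended_audience = 50  # developers the tool is meant to serve

window_start = date(2025, 4, 1)  # e.g., start of the current quarter
active_users = {dev for dev, used_on in usage_events if used_on >= window_start}

adoption_rate = len(active_users) / intended_audience
print(f"Quarterly adoption rate: {adoption_rate:.1%}")
```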

There are also examples of companies that measure the adoption of a process, such as Uber, which measures “Design Docs Generated per Engineer.” Engineers write design docs for non-trivial projects before they start meaningful work: the idea is to get feedback early and ultimately decrease the time taken to complete a project.

Uber’s metric tracks how frequently developers are following this practice. For insights into what constitutes high-functioning engineering teams, research shows that process adoption is a key indicator.

Availability

Availability measures the percentage of time that your infrastructure is operational and accessible within a given period. Teams use availability as a metric to report on when discussing system performance and MTBF (Mean Time Between Failures).
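A minimal sketch of the calculation, assuming a list of hypothetical outage windows for the period:

```python
# Minimal sketch: availability as the share of a period the system was up.
from datetime import datetime, timedelta

period_start = datetime(2025, 5, 1)
period_end = datetime(2025, 6, 1)

# Hypothetical outages: (start, end) pairs within the period.
outages = [
    (datetime(2025, 5, 3, 14, 0), datetime(2025, 5, 3, 14, 45)),
    (datetime(2025, 5, 20, 2, 10), datetime(2025, 5, 20, 2, 40)),
]

total = period_end - period_start
downtime = sum((end - start for start, end in outages), timedelta())
availability = 1 - downtime / total
print(f"Availability: {availability:.3%}")  # e.g. 99.832%
```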

Change failure rate

One of the four key DORA metrics, change failure rate, measures stability and pipeline quality.

The DORA team defines change failure rate as “percentage of changes to production or releases to users result in degraded service (for example, lead to service impairment or service outage) and subsequently require remediation (for example, require a hotfix, rollback, fix forward, patch).” The problematic part of this metric is defining “failure.”

For some real-world examples, Lattice measures Change Failure Rate as the number of PagerDuty incidents divided by the number of deployments. Amplitude measures it as the P0s over production deploys (the P0 count goes through PagerDuty, and the deploy count is from Spinnaker).

Another way to calculate change failure rate is to measure the percentage of deployments that are hotfixes or rollbacks, which can indicate the rework ratio in your development process.
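As a rough illustration of both calculations described above (incidents over deploys, and the hotfix/rollback ratio), here is a minimal sketch with hypothetical counts:

```python
# Minimal sketch: change failure rate as incidents divided by deployments.
incidents = 4        # e.g., PagerDuty incidents attributed to changes this month
deployments = 130    # e.g., production deploys this month

change_failure_rate = incidents / deployments
print(f"Change failure rate: {change_failure_rate:.1%}")  # 3.1%

# Alternative framing: share of deploys that are hotfixes or rollbacks.
hotfix_or_rollback_deploys = 6
print(f"Rework ratio: {hotfix_or_rollback_deploys / deployments:.1%}")
```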

CI Determinism (CID)

CI Determinism is the inverse of test flakiness: it measures the likelihood that a test suite’s result is valid rather than a flake. The benefit of using this metric over test flakiness is that CI Determinism is a number you want to see go up.

LinkedIn is the only company we spoke with that includes CI Determinism as a top-level metric. They use a system that runs CI tests at specific times every week to track whether these tests give consistent results or whether they change from one run to another. Each test gets a score based on how often it gets the same result. If a test runs 10 times and it passes 7 out of those 10 times, its Determinism Score would be 70%. A higher score is better because it means the test is more reliable.

When they aggregate the metric, they average all the Determinism Scores to get an overall Determinism Score. This way, codebases whose tests run less frequently but are still flaky have their flakiness represented in the metric just as much as codebases whose tests run frequently.
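For illustration, here is a minimal sketch of that scoring idea, not LinkedIn’s actual system. It interprets a test’s Determinism Score as the share of runs that agree with the test’s most common result, then averages those scores across tests.

```python
# Minimal sketch of a determinism score, using hypothetical rerun results.
from collections import Counter

# Hypothetical results from scheduled reruns of the same tests.
test_runs = {
    "test_checkout": ["pass"] * 7 + ["fail"] * 3,   # flaky: 70% determinism
    "test_search":   ["pass"] * 10,                 # stable: 100% determinism
}

def determinism_score(results):
    # Share of runs matching the most common outcome for this test.
    most_common_count = Counter(results).most_common(1)[0][1]
    return most_common_count / len(results)

scores = {name: determinism_score(r) for name, r in test_runs.items()}
overall = sum(scores.values()) / len(scores)

print(scores)                                  # {'test_checkout': 0.7, 'test_search': 1.0}
print(f"Overall determinism: {overall:.0%}")   # 85%
```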

Code reviewer response time

Code reviewer response time measures how long it takes for code reviewers to respond to each update from a developer during a code review. LinkedIn and Google believe that one of the most essential qualities of a code review is that reviewers respond quickly, since slow review bandwidth can create bottlenecks.

LinkedIn calculates code reviewer response time as the time, in business hours, that it takes between each request and response. A request is when a reviewer gets a notification that an author has taken some action, and now the author is blocked while waiting for a response.

“Response” is the first time after a request that a reviewer or code owner responds to the PR and sends that response to the author. LinkedIn looks explicitly at the P50 and the P90 values.
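As an illustration of the percentile calculation LinkedIn describes (not its actual pipeline), here is a minimal sketch that assumes response durations have already been converted to business hours:

```python
# Minimal sketch: P50/P90 of reviewer response times, in business hours.
import statistics

# Hypothetical response times, in business hours, for one team over a week.
response_times = [0.5, 1.2, 0.3, 4.0, 2.5, 0.8, 6.5, 1.0, 0.4, 3.2]

p50 = statistics.median(response_times)
p90 = statistics.quantiles(response_times, n=10)[-1]  # 90th percentile cut point

print(f"P50 response time: {p50:.1f} business hours")
print(f"P90 response time: {p90:.1f} business hours")
```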

Atlassian measures pull request cycle time (PR creation to merge time) as a similar metric. They measure the average time a pull request takes from ‘open’ to ‘merged’ over the last ten pull requests. The difference between Atlassian’s and LinkedIn’s metrics is that Atlassian’s looks at the whole process, whereas LinkedIn focuses on driving a specific behavior (faster response times).

Deep work

Time for deep work, or “focus time,” measures the amount of uninterrupted time developers have at work. Most teams use surveys to capture variations of this metric, as it directly impacts engineering effectiveness and morale metrics. Research on developers’ ideal and actual workdays reveals significant gaps between what developers need and what they experience:

Developer satisfaction regarding the time available for deep work is evaluated through a survey, which includes the question: “How satisfied are you with the amount of uninterrupted time you have for deep work?”

Meeting-heavy days, or the inverse, the number of days with sufficient focus time, can be tracked by asking developers: “In a typical week, how many days do you have with more than one scheduled meeting (not including standups)?” Response items should provide a scale of options, such as 0 days, 1 day, 2 days, 3 days, and 4 or more days.

Interruption frequency can be measured by asking developers: “In a typical week, how often are you interrupted from your primary task to work on something else that was unplanned or suddenly requested?” Response items should provide a scale of options, such as less than once per week, at least once per week, at least once every two days, at least once per day, and at least once every couple of hours.

Deployment frequency

Deployment frequency is another of the four key DORA metrics: it measures how often an organization successfully releases to production.

Teams may look at the frequency of successful deployments over any given period (hourly, daily, weekly, monthly, yearly).

The challenge with this metric is defining what constitutes a successful deployment to production. DORA’s research looks at successful deployments to any amount of traffic, but teams may define a successful deployment differently (for example, deploying to 50% or more traffic).
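A minimal sketch of the simplest version, counting successful production deploys per ISO week from hypothetical deployment records:

```python
# Minimal sketch: deployment frequency as successful production deploys per week.
from collections import Counter
from datetime import date

# Hypothetical records: (deploy date, succeeded?)
deploys = [
    (date(2025, 5, 5), True),
    (date(2025, 5, 6), True),
    (date(2025, 5, 6), False),   # failed deploys are excluded
    (date(2025, 5, 13), True),
    (date(2025, 5, 15), True),
]

per_week = Counter(d.isocalendar()[1] for d, ok in deploys if ok)  # ISO week number
print(dict(per_week))  # e.g. {19: 2, 20: 2}
```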

Generically measuring deployment frequency can be tricky for some organizations, especially those with complex software and release processes. As an alternative, LinkedIn shared that they’ve begun to work on a metric called “deployment freshness,” which measures how old the code running in production is. Improving deployment freshness should bring the same value to the business as improving deployment frequency.

Developer build time

Developer build time measures how long developers wait for their build tool to finish. It is a standard metric for productivity teams to focus on because it often indicates a significant opportunity to enhance developer productivity.

In many companies, developers spend a significant amount of time waiting for builds to complete, and even minor improvements to make builds faster are beneficial for overall capacity and utilization.

LinkedIn defines this as the wall-clock time from when the build tool starts a “build” to when it completes. The duration is measured and reported in seconds.

Critically, LinkedIn only counts builds invoked by human beings, which it can reasonably assume someone is waiting on (this is notable because other teams have run into issues by including build times from automated builds in their metric). LinkedIn excludes all builds run on CI infrastructure from this metric.
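For illustration, here is a minimal sketch of that filtering logic with hypothetical build records; the field names are invented, not LinkedIn’s schema.

```python
# Minimal sketch: wall-clock build time in seconds, counting only
# human-invoked local builds and excluding CI builds.

# Hypothetical build records: (duration_seconds, invoked_by, ran_on_ci)
builds = [
    (95, "human", False),
    (305, "human", False),
    (45, "bot", False),      # excluded: not human-invoked
    (600, "human", True),    # excluded: ran on CI infrastructure
]

human_local = [secs for secs, who, on_ci in builds if who == "human" and not on_ci]
avg_build_seconds = sum(human_local) / len(human_local)
print(f"Average developer build time: {avg_build_seconds:.0f}s")  # 200s
```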

Developer customer satisfaction (CSAT) and net user satisfaction (NSAT)

Developer satisfaction metrics can capture how satisfied developers are overall with their development systems, or how satisfied they are with specific tools. Satisfaction is typically captured quarterly in a developer experience survey. Teams can ask questions about developers’ overall satisfaction and experience using specific tools (CSAT).

Some developer productivity teams also measure engagement, which can be captured through a developer survey asking, “How energized are you by your work?”

This measures how excited and stimulated developers feel. It’s commonly assessed in HR engagement surveys; however, DevProd teams also focus on engagement because it indicates developer productivity. Additionally, it can be used to balance other measures that emphasize speed. Delivering software faster is beneficial, but not at the expense of developer happiness or when it leads to more bugs.

Ease of delivery

Many of the companies we spoke with measure ease of delivery, which is a qualitative assessment of how simple or challenging it is for developers to perform their work. Ease of delivery is often used as a “north star metric” for developer productivity teams, as their mission is to make it easier for developers to do their jobs and to enhance collaboration among teams.

Ease of delivery can be assessed through a quarterly survey that asks, “How easy or difficult is it for you to do work as a developer or technical contributor at [Company]?”

Experiment velocity (or learning velocity)

Experiment velocity is a unique metric from Etsy. At Etsy, experiments are a core aspect of their engineering culture to bring teams closer to the customer and learn quickly. Each team at Etsy designs and runs its experiments to assess how users will respond to new features.

The Experiment velocity metric was developed using an internal experimentation platform that monitors the progress of these experiments. Etsy focuses on how many experiments are initiated each week, how many are concluded, and how many achieve a positive hit rate.

Lead time for changes

Another one of the four key DORA metrics, lead time for changes, measures the time between a code change and the release of this change to end users. It is a measure of speed and responsiveness in the development process.

As described by the DORA program, "The lead time for changes metric requires two important pieces of data: when the commit happened, and when the deployment happened.”

“This means,” they explain, “that for every deployment, you must maintain a list of all the changes included. This is done using triggers with a SHA mapping back to the commits. With the list of changes in the deploy table, you can join back to the changes table to retrieve the timestamps, and then calculate the median lead time.”

GitLab measures lead time for changes by calculating the median time for a merge request to merge into production (from master).
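As a rough illustration of the join DORA describes, here is a minimal sketch using hypothetical in-memory “changes” and “deploys” tables instead of a real database:

```python
# Minimal sketch: median lead time for changes from commit to deployment.
import statistics
from datetime import datetime

# "changes" table: commit SHA -> commit timestamp
changes = {
    "a1b2c3": datetime(2025, 5, 1, 9, 0),
    "d4e5f6": datetime(2025, 5, 1, 15, 30),
    "0a9b8c": datetime(2025, 5, 2, 11, 0),
}

# "deploys" table: each deploy lists the SHAs it shipped and when it finished.
deploys = [
    {"finished": datetime(2025, 5, 2, 10, 0), "shas": ["a1b2c3", "d4e5f6"]},
    {"finished": datetime(2025, 5, 3, 9, 0), "shas": ["0a9b8c"]},
]

lead_times_hours = [
    (deploy["finished"] - changes[sha]).total_seconds() / 3600
    for deploy in deploys
    for sha in deploy["shas"]
]

print(f"Median lead time: {statistics.median(lead_times_hours):.1f} hours")
```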

Perceived rate of delivery

Perceived rate of delivery is a measure of how fast or slow a developer feels their team delivers software. It is a measure of speed and complements quantitative flow metrics.

Teams typically use the perceived delivery rate to understand whether development teams feel they’re delivering quickly or not.

When teams rely only on quantitative metrics, they often wonder whether what they see is good or bad. Take deployment frequency: this metric alone doesn’t tell us how difficult it is for a team to deploy code, or whether a team feels they’re shipping software quickly. The perceived rate of delivery provides that data.

Perceived productivity is another metric used for the same reasons: it provides the developers’ perspective on how often they feel productive within a given week.

Time to restore services

Time to restore service is another one of the four key DORA metrics. It measures how long it takes an organization to recover from a failure in production, also known as MTTR (mean time to repair). It’s intended to measure stability.

Teams pair this metric with change failure rate, which measures the percentage of changes that require a rollback or hotfix. Time to restore services tracks the time between when the change that required a rollback was released and when the issue was resolved.
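A minimal sketch of the calculation, taking the median duration of hypothetical production incidents:

```python
# Minimal sketch: time to restore service as the median incident duration.
import statistics
from datetime import datetime

# Hypothetical incidents: (detected, resolved)
incidents = [
    (datetime(2025, 5, 2, 10, 0), datetime(2025, 5, 2, 10, 50)),
    (datetime(2025, 5, 9, 22, 15), datetime(2025, 5, 10, 1, 15)),
    (datetime(2025, 5, 21, 13, 0), datetime(2025, 5, 21, 13, 20)),
]

durations_minutes = [(end - start).total_seconds() / 60 for start, end in incidents]
print(f"Median time to restore: {statistics.median(durations_minutes):.0f} minutes")
```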

Time to 1st and 10th PR

Time to 1st and 10th PR is a measure Peloton uses to understand the ramp-up time for new developers and assess the effectiveness of their onboarding process.

These metrics are not used to evaluate individual developers but to measure the impact of improvements to the onboarding process, which the Tech Enablement & Developer Experience team has focused on. This helps teams understand capacity for sprint planning and how quickly new team members can contribute to deadlines. Organizations seeking to optimize their developer onboarding can utilize specialized onboarding studies to benchmark and enhance these metrics.
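For illustration (this is not Peloton’s implementation), here is a minimal sketch that computes the days from a hypothetical start date to a new hire’s first and tenth merged PR:

```python
# Minimal sketch: time to 1st and 10th merged PR for a new hire.
from datetime import date

start_date = date(2025, 3, 3)
merged_pr_dates = sorted([
    date(2025, 3, 12), date(2025, 3, 14), date(2025, 3, 18), date(2025, 3, 20),
    date(2025, 3, 24), date(2025, 3, 27), date(2025, 4, 1), date(2025, 4, 3),
    date(2025, 4, 8), date(2025, 4, 10),
])

time_to_first = (merged_pr_dates[0] - start_date).days
time_to_tenth = (merged_pr_dates[9] - start_date).days if len(merged_pr_dates) >= 10 else None

print(f"Time to 1st PR: {time_to_first} days")   # 9 days
print(f"Time to 10th PR: {time_to_tenth} days")  # 38 days
```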

Weekly time loss

Weekly time loss calculates the percentage of time developers lose due to obstacles or inefficiencies in their work environment (for example, slow tools or processes, unplanned work, unclear tasks).

Like ease of delivery, this is frequently used as a “north star metric” by developer productivity teams to track the impact of their work and measure sprint velocity effectiveness.

For example, if a team introduces a change that brings weekly time loss from 23% down to 20%, that can translate to significant savings and a measurable impact on profitability for a mid-sized or larger organization.
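Here is a minimal sketch of that back-of-the-envelope math, with hypothetical headcount and cost figures:

```python
# Minimal sketch: estimated annual value of reducing weekly time loss.
engineers = 400
avg_fully_loaded_cost = 180_000  # USD per engineer per year (hypothetical)

time_loss_before = 0.23
time_loss_after = 0.20

recovered_capacity = (time_loss_before - time_loss_after) * engineers  # engineer-equivalents
annual_value = recovered_capacity * avg_fully_loaded_cost

print(f"Recovered capacity: {recovered_capacity:.0f} engineer-equivalents")
print(f"Approximate annual value: ${annual_value:,.0f}")  # $2,160,000
```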

This metric often appears as a key component in executive engineering dashboards due to its direct correlation with business outcomes and its ability to reveal hidden inefficiencies in the SDLC that impact overall project metrics and burndown tracking.

Transform metrics into action with the right approach

Having comprehensive engineering metrics is just the beginning—the real challenge lies in transforming that data into meaningful improvements.

Teams often struggle with analysis paralysis, overwhelming engineering dashboards, and difficulty connecting software engineering KPIs to decision-making. The key is understanding whether you need diagnostic metrics for strategic direction and metrics benchmarks, or improvement metrics for daily actions that drive real change in development velocity and teamwork.

Common pitfalls to avoid

Organizations that succeed with engineering metrics avoid reverting to old habits, such as adding throughput metrics to leadership scorecards without addressing workflow inefficiencies, tech debt, or delivery pipeline constraints.

They don’t overwhelm teams with hundreds of measurements or let valuable data sit unused in dashboards collecting dust. Most importantly, they avoid relying solely on intuition when making decisions about engineering effectiveness and team collaboration.

The systematic approach that works

Effectively monitoring productivity requires understanding the tradeoffs between engineering KPIs, from code complexity and downtime to responsiveness and business alignment.

Teams need visibility into their backlog management, release burndown tracking, and product throughput while focusing on product excellence and morale metrics. They must track software security KPIs, engineering morale indicators, and ownership patterns that impact collaboration across development timelines.

The DX Core 4 framework provides this systematic approach, offering diagnostic capabilities for strategic planning and improvement metrics for actionable insights into team velocity, reliability, and engineering visibility. With proven results across 300+ companies, including significant cost savings through automation, reduced repairs and rework, and improved deadline adherence, it represents a mature solution for organizations ready to move beyond data collection to data-driven transformation. Teams can explore the DX platform to see how these metrics integrate with comprehensive developer productivity measurement.

DX can help your organization implement healthy, actionable metrics that drive fundamental improvements in developer productivity, reduce churn, and enhance business alignment.

Published
May 22, 2025