Three metrics for measuring the impact of AI on code quality

Kali Watkins

Product Marketing

The hype around coding assistants emphasizes faster code delivery, but behind the headlines, leaders are often concerned about trading quality for speed. Measuring quality can be challenging, which is why we recommend tracking multiple metrics to gain a more comprehensive understanding.

In a recent report from DX’s CTO, 18 companies, including GitHub, Dropbox, Atlassian, and Adyen, shared the metrics they’re focused on to measure AI’s impact on developer productivity, including measures of quality. In this article, we’ll describe three metrics we commonly see organizations use to measure quality and explain how to view AI’s impact on these metrics in DX. The three metrics we’ll walk through are change fail percentage, pull request (PR) revert rate, and code maintainability.

How to analyze AI’s impact on quality metrics in DX

Before diving into the metrics, it’s important to know where to find them in DX. The platform offers three primary views for analyzing AI impact: group comparisons, before vs. after, and trend correlations.

1. Group comparisons: DX automatically categorizes developers into buckets of none, light, moderate, or heavy AI usage. This view allows you to compare how quality metrics change across adoption groups, making it easy to determine whether heavier adoption correlates with higher maintainability, more incidents, or rising revert rates.

2. Before vs. after: This report shows how productivity metrics change as users transition from lower to higher AI tool adoption, eliminating the bias that can arise from comparing across different groups.

3. Trend correlations: View trend correlations to understand how your organizational metrics change over time as AI adoption increases. For example, you can monitor whether PR revert rate rises with AI adoption. Note that correlation does not prove causation; many factors can affect these metrics.
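To make the idea behind trend correlations concrete, here is a minimal sketch that computes a Pearson correlation between two time series. This article does not describe how DX computes its correlations, so the choice of Pearson, the function name `pearson`, and the weekly numbers below are all illustrative assumptions rather than DX's method or real data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly values, invented for illustration only.
ai_adoption_pct = [10.0, 18.0, 25.0, 33.0, 41.0, 47.0]
pr_revert_rate_pct = [1.9, 2.1, 1.8, 2.4, 2.2, 2.6]

r = pearson(ai_adoption_pct, pr_revert_rate_pct)
print(f"correlation between AI adoption and PR revert rate: {r:.2f}")
```

A coefficient near 1 or -1 means the two series move together or in opposite directions; it says nothing about why, which is the caveat noted above.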

Three metrics for measuring quality: Change fail percentage, PR revert rate, and code maintainability

1. Change fail percentage

Change fail percentage measures the percentage of production changes that result in degraded service, impairment, or outage, based on either system or self-reported data. We offer both options because some customers prefer to view AI’s impact through a qualitative lens and validate it against the quantitative metric. This metric can signal whether AI usage is introducing instability into production. Here’s how it’s calculated in DX:

Out-of-the-box system calculation:

Change fail percentage = (Number of incidents) / (Total number of deployments)

Note: DX provides the ability to customize metrics or create new ones, so you can make sure you’re capturing a version of change fail percentage that’s most relevant to your team or organization.
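The out-of-the-box formula above can be sketched in a few lines. The function name `change_fail_percentage`, the input validation, and the example counts are assumptions for illustration; the ratio is multiplied by 100 here only to express it as a percentage.

```python
def change_fail_percentage(incidents: int, deployments: int) -> float:
    """Incidents divided by total deployments, expressed as a percentage."""
    if deployments <= 0:
        raise ValueError("deployments must be positive")
    if incidents < 0:
        raise ValueError("incidents must be non-negative")
    return 100.0 * incidents / deployments


# Hypothetical counts for one reporting period.
print(f"{change_fail_percentage(incidents=3, deployments=120):.1f}%")  # prints "2.5%"
```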

Self-reported calculation:

  1. Each response range (e.g., “1–5%”, “6–15%”) is converted to its midpoint (e.g., 3%, 10.5%).
  2. We then take the average of those midpoints across all respondents.
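The two steps above can be sketched as follows. The function names, the label-parsing rules, and the sample responses are assumptions made for this example; only the midpoint-then-average logic comes from the description above.

```python
import re


def range_midpoint(label: str) -> float:
    """Convert a range label such as "1–5%" to its midpoint (3.0)."""
    # Accept either an en dash or a plain hyphen between the two bounds.
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*[–-]\s*(\d+(?:\.\d+)?)\s*%\s*", label
    )
    if match is None:
        raise ValueError(f"unrecognized range label: {label!r}")
    low, high = (float(group) for group in match.groups())
    return (low + high) / 2


def self_reported_change_fail(responses: list[str]) -> float:
    """Average the midpoints of all respondents' selected ranges."""
    if not responses:
        raise ValueError("at least one response is required")
    midpoints = [range_midpoint(label) for label in responses]
    return sum(midpoints) / len(midpoints)


# Hypothetical survey responses: midpoints are 3, 3, and 10.5.
print(self_reported_change_fail(["1–5%", "1–5%", "6–15%"]))  # prints 5.5
```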

To view AI’s impact on change fail percentage in DX, navigate to Reports and search for AI impact, or find it in the sidebar under the GenAI dropdown. Change fail percentage is available in the group comparisons and trend correlations tabs (see the overview of these views above).

Across our customer base, AI’s impact on quality metrics like change fail percentage can be volatile—there’s no consistent pattern of whether AI improves or diminishes quality. That’s why it’s essential to capture this data within your own organization, so you can understand the impact firsthand.

2. PR revert rate

PR revert rate is a quantitative metric that is captured in real time. It’s calculated as the number of reverted pull requests divided by the total number of pull requests (excluding the revert PRs themselves). A high revert rate may signal quality issues, such as bugs, regressions, or poorly understood changes slipping through review.
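Here is a minimal sketch of that calculation. The `PullRequest` record and its `reverts` field are a hypothetical data model, not DX's schema, and ignoring reverts of reverts is a simplification chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PullRequest:
    number: int
    # Number of the PR this one reverts, or None for an ordinary PR.
    reverts: Optional[int] = None


def pr_revert_rate(prs: list[PullRequest]) -> float:
    """Reverted PRs divided by total PRs, excluding the revert PRs themselves."""
    # Denominator: every PR that is not itself a revert.
    original = {pr.number for pr in prs if pr.reverts is None}
    # Numerator: ordinary PRs that were the target of at least one revert.
    reverted = {pr.reverts for pr in prs if pr.reverts is not None} & original
    if not original:
        return 0.0
    return len(reverted) / len(original)


# Hypothetical history: PR 5 reverts PR 2, so 1 of 4 ordinary PRs was reverted.
history = [
    PullRequest(1),
    PullRequest(2),
    PullRequest(3),
    PullRequest(4),
    PullRequest(5, reverts=2),
]
print(f"{pr_revert_rate(history):.0%}")  # prints "25%"
```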

To view whether AI usage is impacting PR revert rate, go to the AI impact report.

3. Code maintainability

Code maintainability is a perception-based metric. Developers are asked to rate how easy it is to understand and modify code, and the score reflects the percentage of favorable responses. This metric comes from quarterly snapshots, providing a direct recurring pulse from developers. Tracking how it shifts over time, and how it compares to industry peers, offers valuable insight into the health of your codebase.
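As a rough sketch of how a percentage-favorable score works, the example below assumes a 1 to 5 rating scale on which 4 and 5 count as favorable. That scale and threshold are assumptions for illustration; this article does not specify how DX defines a favorable response.

```python
def favorable_percentage(ratings: list[int], favorable_min: int = 4) -> float:
    """Share of ratings at or above `favorable_min`, as a percentage."""
    if not ratings:
        raise ValueError("at least one rating is required")
    favorable = sum(1 for rating in ratings if rating >= favorable_min)
    return 100.0 * favorable / len(ratings)


# Hypothetical snapshot responses: three of five ratings are 4 or higher.
print(favorable_percentage([5, 4, 3, 2, 4]))  # prints 60.0
```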

Because code maintainability is a quarterly metric, it can be a lagging indicator of quality. If you want more frequent signals into developer sentiment about code maintainability, as well as how AI usage is impacting it, we recommend running a targeted study.

To view whether AI is impacting code maintainability, go to the AI impact report. Code maintainability is available alongside change fail percentage and PR revert rate in the group comparisons, before vs. after, and trend correlations tabs.

It’s not uncommon to see code maintainability scores decline with increased AI use. This may represent a “new normal,” as AI generates more of the code and developers spend less time writing it themselves—leaving them further removed from the codebase.

Final thoughts

As always, we recommend viewing developer productivity through multiple dimensions. When looking at quality, it’s equally important to keep an eye on measures of speed and effectiveness. For a deeper set of metrics and guidance on assessing AI’s impact, see DX’s AI Measurement Framework (or, if you’re a customer, read this guide on how to apply the AI Measurement Framework in DX).

If you’d like help setting up an AI connector or getting more out of DX’s AI impact reporting, reach out to your DX representative or request a demo.

 

Published
October 2, 2025