
AI’s impact on quality: A volatile, uneven landscape

Data from 43,000 engineers across ~100 companies shows quality outcomes ranging from big gains to serious declines.

Justin Reock

Deputy CTO

This post was originally published in Engineering Enablement, DX’s newsletter dedicated to sharing research and perspectives on developer productivity. Subscribe to be notified when we publish new issues.

Over the past few months at conferences and networking events, I’ve had the chance to talk with a lot of engineering leaders about how they’re approaching AI rollouts. A common theme from those conversations is the belief that AI is having a modest but positive impact on quality.

That belief is reinforced by studies like one that DORA published earlier this year, which suggested that a 25% increase in AI adoption correlated with a 3.4% improvement in code quality and a 1.8% reduction in complexity. Findings like these suggest that AI’s effect on quality is modest—generally neutral, or slightly positive.

Curious about this, I turned to DX’s data to see how AI tool usage is affecting different measures of quality. At first glance, the averages told the same story: slight improvements, broadly aligned with industry research. But once I drilled into the company-level results, it became clear the reality is far more complex.

At the company level, we see huge volatility. Some organizations are seeing major quality gains with AI, while others are experiencing serious setbacks. In other words, relying on industry averages for quality measures can be dangerous and may give leaders a false sense of security. Leaders need to measure what’s happening inside their organization, because the outcomes can vary dramatically from one company to the next.

Note: We see much more consistency in AI adoption and throughput. It’s quality in particular where the volatility shows up.

In this week’s newsletter, I’ll break down what we’re seeing in the data on AI’s impact on software quality.


While engineering organizations move rapidly to integrate AI into workflows, a top concern is whether teams are sacrificing quality for increased velocity. Are we moving faster while also maintaining quality standards? Or are we rapidly creating technical debt in the form of unmaintainable, defect-ridden AI code?

Though still nascent, AI is clearly changing the way organizations create software. Our data reveals that AI's impact on software quality is not a simple success story, but rather a complex landscape where outcomes depend heavily on organizational context, implementation approach, and usage patterns. It also demonstrates that industry averages alone won't help you make organizational decisions about AI's impact on quality, even though we see more consistency in metrics such as throughput and adoption rates. You must measure your own organization to understand the true quality impacts.

This report examines data from approximately 43,000 engineers across nearly 100 companies. Engineers were separated into groups based on their AI coding activity in the last 30 days. Using these groups, we looked at the relationship between AI tool usage and three aspects of quality: change failure rate, change confidence, and code maintainability.
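As a rough sketch of what this kind of cohort comparison looks like in practice (the column names, thresholds, and values below are invented for illustration and are not DX's actual schema or methodology), you could bucket engineers by recent AI activity and compare average scores across cohorts:

```python
import pandas as pd

# Hypothetical per-engineer records; column names, thresholds, and values
# are invented for illustration, not DX's actual schema or methodology.
engineers = pd.DataFrame({
    "company": ["acme", "acme", "globex", "globex", "initech", "initech"],
    "ai_days_last_30": [0, 22, 3, 12, 0, 28],       # days with AI coding activity
    "change_confidence": [64, 71, 61, 73, 66, 72],  # survey score (scale assumed)
})

def usage_level(days: int) -> str:
    """Bucket an engineer by AI coding activity over the last 30 days."""
    if days == 0:
        return "none"
    if days <= 5:
        return "light"
    if days <= 15:
        return "moderate"
    return "heavy"

engineers["usage"] = engineers["ai_days_last_30"].map(usage_level)

# Overall comparison: AI users vs. non-users across the full sample.
users = engineers.loc[engineers["usage"] != "none", "change_confidence"].mean()
non_users = engineers.loc[engineers["usage"] == "none", "change_confidence"].mean()
print(f"users minus non-users: {users - non_users:+.1f} points")

# Average score per usage level (none / light / moderate / heavy).
print(engineers.groupby("usage")["change_confidence"].mean())
```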

Quality impact is wildly uneven between organizations

Across all three of these drivers, we observed a great deal of volatility. While some teams saw clear improvement as AI became more widespread, others saw serious degradation. There can be multiple reasons for this, stemming from existing code hygiene practices, the availability of formal training on the best use of AI, and the size, complexity, and domain-specificity of the codebase.

Good or bad code quality is not derived from any single behavior of an organization, but emerges from a confluence of best practices, scanning and automation, developer comprehension and experience, customer feedback, and engineering rigor. With numerous variables contributing to quality outcomes, both cultural and technical, these results are not surprising but serve to validate the industry-wide consensus that the ROI from AI investments is not evenly distributed.

Change confidence increases slightly on average, varies across organizations

Change confidence was positively impacted on average, with a 2.6-point gain on the DevEx driver scale between users and non-users of AI across the full sample, a significant shift given the large sample size.

However, these numbers cannot be taken at face value. The average looks relatively flat because different companies are seeing vastly different results. When we visualize the data per company, a different picture emerges.

In this chart, each bar represents the impact on Change Confidence for a single company, based on our DevEx driver calculations (higher is better). Though industry-wide averages show modest, positive gains in Change Confidence, the extreme ends are, in fact, cancelling one another out. The results range by more than 40 points, with the majority of companies clustering between a 10-point loss and a 10-point gain.
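To see why the average hides so much, you can compute the user-versus-non-user gap company by company and look at the spread rather than the mean. A minimal sketch along those lines, again with invented data and column names:

```python
import pandas as pd

# Hypothetical per-engineer scores; companies anonymized, values invented.
df = pd.DataFrame({
    "company":           ["a", "a", "b", "b", "c", "c", "d", "d"],
    "uses_ai":           [True, False, True, False, True, False, True, False],
    "change_confidence": [80, 62, 55, 71, 73, 70, 66, 68],
})

# Average Change Confidence per company, split by AI users vs. non-users.
by_company = (
    df.pivot_table(index="company", columns="uses_ai",
                   values="change_confidence", aggfunc="mean")
      .rename(columns={True: "users", False: "non_users"})
)
by_company["delta"] = by_company["users"] - by_company["non_users"]

# The mean delta looks modest, but the per-company range tells the real story.
print(f"mean delta:  {by_company['delta'].mean():+.1f}")
print(f"delta range: {by_company['delta'].min():+.1f} to {by_company['delta'].max():+.1f}")
```

With these made-up numbers, the mean delta is under a point while individual companies swing by double digits, which is exactly the shape of the distribution in the chart above.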

Code maintainability varies similarly to change confidence

The Code Maintainability metric reveals a pattern similar to the one observed with Change Confidence. The industry average shows a slight 2.2-point gain:

Looking at the same per-company breakdown, we see the same level of variance, with results again spanning more than 40 points:

Code maintainability is often cited as one of the biggest risks of AI-assisted engineering, as the AI may not conform to best practices, internal coding standards, and domain semantics. Simply looking at an average can give leaders an inflated and false sense of security here, as it's clear that many companies are experiencing significant negative impacts on this metric.

Change failure rate shows an uneven distribution

Change Failure Rate, one of the most trusted quality metrics, exhibits similar volatility across companies when comparing AI users to non-users. The average difference is nearly flat, showing only a 0.11% reduction in failed releases for AI users versus non-users.

Once again, we must look at a distributed, per-company view to gain a clearer understanding. Despite the industry average leaning positively, there are plenty of teams that are generating more defects while using AI tools than when not:

Bearing in mind that the industry benchmark P50 for Change Failure Rate is 4%, these numbers are significant. With swings of nearly 2 percentage points at one end of the range and almost 3 points at the other, they can represent a relative increase or decrease of 50% or more in failed releases.
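As a quick back-of-the-envelope check on why those swings matter (using the rounded figures above and the 4% P50 benchmark), the relative change works out like this:

```python
# Relative change in Change Failure Rate versus the 4% industry P50 benchmark,
# using the rounded figures from the text above.
baseline_cfr = 0.04   # industry P50 change failure rate

shift_top = 0.02      # ~2-point shift at one end of the per-company range
shift_bottom = 0.03   # ~3-point shift at the other end

print(f"relative change at ~2 points: {shift_top / baseline_cfr:.0%}")     # 50%
print(f"relative change at ~3 points: {shift_bottom / baseline_cfr:.0%}")  # 75%
```

In other words, a two- or three-point absolute swing against a 4% baseline amounts to a 50-75% relative change in how often releases fail.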

The level of AI usage has an impact on overall scores

When we break down this data by self-reported Heavy, Moderate, and Light users of AI versus those who don't use AI, clear patterns emerge. Organizations may initially experience a dip in quality as engineers begin to adopt AI, but the data suggests quality improves as engineers become more comfortable with the technology, so leaders should allow time for that learning curve.

Analyzing Change Confidence, we observe that heavy users are slightly less confident than moderate users, though both maintain a significant lead over non-users. Light users of AI see a drop in confidence, but confidence increases with more regular usage. This indicates that AI users are more confident overall, though their confidence dips when they first begin using the technology.

A similar pattern emerges when examining the Code Maintainability metric, although in this case we observe a dip from no usage to light usage, followed by a steady increase from light to moderate usage. This again suggests that the user's experience with these tools is a significant factor in their effectiveness: the more tool experience you have, the more likely you are to produce maintainable code.

When uncertainty is high, measurement is critical

While industry averages suggest modest improvements across all quality metrics, the reality is far more nuanced, with results varying by over 40 points between companies and ranging from significant quality improvements to concerning degradations. The finding that moderate AI users often outperform both light and heavy users suggests there’s an optimal adoption curve where teams gain confidence and maintain quality standards, but excessive reliance may lead to diminishing returns or even negative outcomes.

For engineering leaders, these findings underscore that successful AI integration requires more than just tool deployment; it demands thoughtful measurement, implementation strategies, and proper training to ensure that the promise of increased velocity doesn’t come at the expense of the code quality that underpins long-term software sustainability.

Published October 1, 2025