
AI measurement framework: Complete guide for engineering leaders

How high-performing engineering organizations measure AI impact and ROI

Taylor Bruneaux

Analyst

An AI measurement framework is a structured system for tracking, interpreting, and acting on the right signals when evaluating AI-assisted engineering. It organizes AI metrics across three dimensions (utilization, impact, and cost) and connects them to developer conditions, engineering performance, and business outcomes.

Most organizations can tell you how many developers are using Copilot or another AI coding assistant. Few can tell you whether it’s making a meaningful difference. That gap is a measurement problem, and it won’t close on its own.

A measurement framework isn’t a dashboard or a fixed set of KPIs. It’s a repeatable system that:

  • Defines which signals are worth tracking and why
  • Connects those signals to developer conditions, engineering performance, and business outcomes
  • Creates a consistent process for surfacing what’s working and where to focus next

Without that structure, teams end up with fragmented data: one team tracking PR cycle time, another watching suggestion acceptance rates, a third relying on developer surveys. The picture stays incomplete. Decisions stay reactive.

Why “we’ll know it when we see it” doesn’t work for AI ROI

When organizations adopt AI coding tools, the instinct is often to wait and see. Developers start using the tool, velocity looks stable or slightly improved, and leadership assumes the investment is paying off.

Our research tells a more complicated story. Industry-wide AI adoption has reached 93%, and developers save an average of 3.9 hours per week with AI coding tools, with daily users saving 4.72 hours per week. These are real gains. But adoption rates alone don't tell you whether those gains translate into better engineering outcomes.

Data from our Q1 2026 AI-assisted engineering report shows that nearly 28% of committed code is now AI-authored. Yet AI's impact on quality remains uneven and volatile. Some organizations see clear quality improvements as AI usage increases. Others see serious degradation, with change failure rates swinging by as much as 2 percentage points, meaning some companies are shipping 50% more defects than before (on a 4% baseline, for example, a 2-point rise is a 50% relative increase).
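Percentage points and relative change are easy to conflate. The minimal sketch below uses hypothetical change failure rates to show how the same 2-point swing produces very different relative increases depending on the starting baseline.

```python
def percentage_point_change(baseline_rate: float, new_rate: float) -> float:
    """Absolute change between two rates, in percentage points."""
    return (new_rate - baseline_rate) * 100


def relative_change(baseline_rate: float, new_rate: float) -> float:
    """Change relative to the baseline rate (0.5 means 50% more)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate


# Hypothetical change failure rates: the same +2 pp swing on two baselines.
for baseline, after in [(0.04, 0.06), (0.10, 0.12)]:
    pp = percentage_point_change(baseline, after)
    rel = relative_change(baseline, after)
    print(f"{baseline:.0%} -> {after:.0%}: +{pp:.1f} pp, {rel:.0%} more failed changes")
```

On a 4% baseline the 2-point rise is a 50% increase in failed changes; on a 10% baseline it is 20%. Reporting both figures keeps cross-organization comparisons honest.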

The pattern is consistent: AI tools can improve individual developer workflows while simultaneously introducing team-level risks that traditional productivity measures don’t capture. Without a structured measurement framework, organizations optimize for the wrong signals and make expansion decisions based on partial data.

What a good AI measurement framework includes

Effectively measuring AI code assistants and agents requires focusing on three key dimensions: utilization, impact, and cost. These dimensions follow the natural lifecycle of AI adoption: teams first prioritize adoption and usage, then shift to measuring impact, and eventually focus on governance, standardization, and cost efficiency.

Utilization: how much are developers adopting and using AI tools?

Utilization metrics establish the foundation. Without understanding how extensively AI is being used, impact data is impossible to interpret.

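Key utilization metrics include:

  • AI tool usage (daily and weekly active users)
  • Share of committed code that is AI-authored
  • Tasks assigned to agents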

Utilization is necessary but not sufficient. Our research shows that even leading organizations are only reaching around 60% active usage of AI tools, and high adoption can coexist with low or negative impact if measurement stops there.
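Active usage needs an explicit definition before it can be tracked. The sketch below assumes "active" means at least one recorded AI-tool usage day in a trailing 28-day window, measured against licensed developers; both the window length and the denominator are illustrative choices, not definitions prescribed by the framework.

```python
from datetime import date, timedelta


def active_usage_rate(
    usage_days: dict[str, set[date]],
    licensed: set[str],
    window_end: date,
    window_days: int = 28,  # assumed trailing window; adjust to your own definition
) -> float:
    """Share of licensed developers with at least one AI-tool usage day in the window."""
    if not licensed:
        return 0.0
    window_start = window_end - timedelta(days=window_days - 1)
    active = {
        dev
        for dev in licensed
        if any(window_start <= day <= window_end for day in usage_days.get(dev, set()))
    }
    return len(active) / len(licensed)


# Hypothetical telemetry: five licensed developers, two active in the window.
usage = {
    "dev-a": {date(2026, 5, 1)},
    "dev-b": {date(2026, 3, 1)},  # last used outside the 28-day window
    "dev-c": {date(2026, 5, 15)},
}
licensed_devs = {"dev-a", "dev-b", "dev-c", "dev-d", "dev-e"}
print(active_usage_rate(usage, licensed_devs, window_end=date(2026, 5, 20)))  # 0.4
```

Using all engineers rather than licensed seats as the denominator gives a lower, organization-wide figure. Either choice works as long as it stays stable over time.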

Shadow AI adds another layer of complexity. Our Q1 2026 data found that developers with no telemetry from enterprise AI tools still report weekly or daily usage, time savings, and significant shares of AI-generated code. Organizations should expect developers to be using personal licenses for AI tools. Acceptable use policies are critical to avoid security or license breaches and to clarify which types of data are safe to use with which AI tools.
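One way to size shadow AI is to compare survey responses with enterprise telemetry. The sketch below assumes a hypothetical survey that maps pseudonymous developer IDs (matching those in telemetry) to a self-reported usage frequency. It returns only an aggregate share, so it serves as a signal for policy and enablement rather than a way to single out individuals.

```python
def shadow_ai_share(survey: dict[str, str], telemetry_users: set[str]) -> float:
    """Share of frequent self-reported AI users with no enterprise telemetry.

    survey maps a pseudonymous developer ID to a self-reported frequency
    such as "daily", "weekly", "monthly", or "never" (hypothetical schema).
    """
    frequent = {dev for dev, freq in survey.items() if freq in {"daily", "weekly"}}
    if not frequent:
        return 0.0
    return len(frequent - telemetry_users) / len(frequent)


survey = {"d1": "daily", "d2": "weekly", "d3": "never", "d4": "daily"}
enterprise_telemetry = {"d1"}
print(f"{shadow_ai_share(survey, enterprise_telemetry):.0%}")  # 67%
```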

Impact: how is AI affecting engineering productivity?

Impact measurement is where most organizations underinvest and where the most important signals live.

The most reliable approach combines direct and indirect metrics rather than relying on any single measure.

Direct metrics offer immediate signals to evaluate the effectiveness of specific tools:

  • AI-driven time savings (developer hours/week)
  • Developer satisfaction
  • Human-equivalent hours (HEH) of work completed by agents
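Self-reported time savings are more informative when broken down by usage frequency, the same cut that separates the 3.9-hour average from the 4.72 hours reported by daily users. A minimal sketch, assuming survey responses arrive as (frequency, hours saved per week) pairs; the data shown is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_hours_saved_by_frequency(
    responses: list[tuple[str, float]],
) -> dict[str, float]:
    """Average self-reported weekly hours saved, grouped by usage frequency."""
    groups: dict[str, list[float]] = defaultdict(list)
    for frequency, hours_saved in responses:
        groups[frequency].append(hours_saved)
    return {frequency: mean(hours) for frequency, hours in groups.items()}


responses = [("daily", 5.0), ("daily", 4.0), ("weekly", 2.0), ("weekly", 3.0)]
print(mean_hours_saved_by_frequency(responses))  # {'daily': 4.5, 'weekly': 2.5}
```

Medians are a reasonable alternative if a few very large self-reports skew the averages.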

Indirect metrics, through regression and longitudinal analysis of DX Core 4 measures, surface longer-term benefits and hidden risks:

  • PR throughput
  • Developer Experience Index
  • Change failure rate

The distinction matters: AI metrics tell you what's happening, and Core 4 metrics confirm whether it's actually driving improvement.

Our longitudinal study of AI's impact on PR throughput found increases of only 10–15% over more than a year. This underscores the need for comprehensive, continuous measurement that looks beyond coding activity alone. Faster coding doesn't automatically lead to better business results.

Cost: is AI spend and ROI optimal?

Once past tool selection and rollout, tracking cost becomes essential, not only to monitor spend but also to identify high-ROI use cases worth replicating. Our AI ROI calculator can help establish a baseline before you build out full cost tracking.

Key cost metrics:

  • AI spend (both total and per developer)
  • Net time gain per developer (dollar value of time saved minus AI spend per developer)
  • Agent hourly rate (AI spend / human-equivalent hours)
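Two unit choices make these cost metrics computable. The sketch converts time savings to dollars with a loaded hourly cost before subtracting spend, so the net gain is in a single unit, and it takes agent hourly rate as agent spend divided by human-equivalent hours, so it reads as dollars per hour. The loaded hourly cost and every input below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    ai_spend: float             # total AI spend for the period, in dollars
    developers: int             # developers covered by that spend
    hours_saved_per_dev: float  # average hours saved per developer in the period
    loaded_hourly_cost: float   # assumed fully loaded cost of one engineering hour
    agent_spend: float          # portion of ai_spend attributable to agents
    agent_heh: float            # human-equivalent hours of work completed by agents


def spend_per_developer(c: CostInputs) -> float:
    return c.ai_spend / c.developers


def net_time_gain_per_developer(c: CostInputs) -> float:
    # Dollar value of time saved, minus AI spend per developer.
    return c.hours_saved_per_dev * c.loaded_hourly_cost - spend_per_developer(c)


def agent_hourly_rate(c: CostInputs) -> float:
    # Dollars spent per human-equivalent hour delivered by agents.
    return c.agent_spend / c.agent_heh


# Hypothetical month: 200 developers, $60k of AI spend, 15 hours saved each.
month = CostInputs(
    ai_spend=60_000, developers=200, hours_saved_per_dev=15,
    loaded_hourly_cost=100, agent_spend=10_000, agent_heh=400,
)
print(spend_per_developer(month))          # 300.0
print(net_time_gain_per_developer(month))  # 1200.0
print(agent_hourly_rate(month))            # 25.0
```

Comparing agent hourly rate with the loaded cost of an engineering hour is one way to spot agent use cases worth replicating, provided the human-equivalent hours estimates are credible.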

This is also the stage where standardization and governance matter most: setting model configurations, usage guidelines, and security protocols to ensure scalable, compliant AI adoption. Without these guardrails, organizations risk inconsistent outcomes, security gaps, and missed opportunities to scale impact.

How the AI Measurement Framework connects to the DX Core 4

The AI Measurement Framework is designed to work alongside (not replace) the DX Core 4, which measures overall engineering productivity across four dimensions: speed, effectiveness, quality, and business impact.

The relationship is straightforward: AI metrics measure what's happening. Core 4 metrics confirm whether it's working.

Early data from companies using both frameworks shows that AI can provide lift across all four Core 4 dimensions. By combining broader productivity metrics with targeted AI measures, organizations can track progress and adapt their strategies as the role of AI in software development continues to evolve.

This integration matters because many organizations' biggest bottlenecks lie outside AI entirely: in the outer loop, or in human factors such as collaboration, alignment, and the ability to do deep, focused work. Our Q1 2026 data shows that meeting-heavy days and frequent interruptions still outweigh AI time savings in annualized developer cost. AI is a local optimizer. Global optimization requires fixing the human and systemic processes surrounding the code, which is precisely what developer experience measurement is designed to surface.

For a deeper look at the Core 4, see our research on measuring developer productivity with the DX Core 4.


Measuring AI agents: a different challenge

Autonomous agents introduce a measurement question that doesn’t have a settled answer yet: should agents be treated as independent contributors, or as extensions of the developers and teams that deploy them?

In our experience, the most effective approach is to treat agents as extensions of the developers and teams that oversee their work. When assessing a team’s PR throughput, include both human-authored pull requests and those authored by agents operating under that team’s direction.
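A minimal sketch of that attribution rule, assuming each pull request records its author and, for agent-authored PRs, the developer who directed the work. The `directed_by` field and the developer-to-team mapping are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PullRequest:
    author: str                        # a developer or an agent identifier
    directed_by: Optional[str] = None  # developer who directed an agent-authored PR


def team_pr_throughput(prs: list[PullRequest], team_of: dict[str, str]) -> dict[str, int]:
    """Count merged PRs per team, crediting agent PRs to the directing developer's team."""
    counts = Counter()
    for pr in prs:
        owner = pr.directed_by or pr.author
        team = team_of.get(owner)
        if team is not None:  # PRs with no owning team are left out of the counts
            counts[team] += 1
    return dict(counts)


team_of = {"ana": "payments", "ben": "search"}
prs = [
    PullRequest("ana"),
    PullRequest("agent-1", directed_by="ana"),
    PullRequest("ben"),
    PullRequest("agent-2", directed_by="ben"),
    PullRequest("agent-3", directed_by="ana"),
    PullRequest("agent-4"),  # no directing developer, so not attributed
]
print(team_pr_throughput(prs, team_of))  # {'payments': 3, 'search': 2}
```

Dropping unattributed agent PRs keeps team counts clean; reporting them as a separate total may be more useful for spotting agents running without clear ownership.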

This reflects a broader shift: every developer will increasingly operate as a lead for a team of AI agents, and the skills of the human operator will matter. Developers will increasingly be measured the way managers are measured today: based on the performance of their teams.

Agent-specific metrics include:

  • Tasks assigned to agents
  • Human-equivalent hours (HEH) of work completed by agents
  • Agent hourly rate (AI spend / HEH)

Agentic tooling is still in its early stages. As these tools mature, measurement strategies and working models must evolve in parallel.

How to roll out AI metrics

As with any measurement effort, leaders must be intentional about how metrics are introduced and communicated. Measuring developer activity (especially in the context of AI) can be a sensitive topic. The hype surrounding AI, combined with growing telemetry from AI tools, has intensified the pressure teams feel.

We strongly caution against top-down mandates or using metrics for individual performance evaluation. Metrics like code generation volume are particularly susceptible to gaming, a dynamic documented in detail in research on measuring developer activity. Encouraging behavior that optimizes for the metric rather than the outcome risks malicious compliance, which undermines team trust and renders the data meaningless.

When rolling out AI metrics, communicate clearly:

  • These metrics will not be used in individual performance evaluations
  • The purpose of measurement is to understand how AI-assisted work affects developer experience and software quality—not to micromanage output
  • Data is necessary to guide organizational investment, helping teams determine which tools and workflows deliver real value and which do not
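Reporting only team-level aggregates is one practical way to keep that commitment. The sketch below computes a per-team median and suppresses teams below a minimum size so individual values can't be inferred; the threshold of five is an assumed value, not a standard.

```python
from collections import defaultdict
from statistics import median

MIN_GROUP_SIZE = 5  # assumed privacy threshold; set it according to your own policy


def team_medians(
    value_by_dev: dict[str, float],
    team_of: dict[str, str],
    min_group_size: int = MIN_GROUP_SIZE,
) -> dict[str, float]:
    """Median metric value per team, omitting teams too small to stay anonymous."""
    by_team: dict[str, list[float]] = defaultdict(list)
    for dev, value in value_by_dev.items():
        team = team_of.get(dev)
        if team is not None:
            by_team[team].append(value)
    return {
        team: median(values)
        for team, values in by_team.items()
        if len(values) >= min_group_size
    }


# Hypothetical weekly hours saved: "platform" has 5 developers, "infra" only 2.
hours = {"p1": 2, "p2": 3, "p3": 4, "p4": 5, "p5": 6, "i1": 1, "i2": 8}
teams = {dev: ("platform" if dev.startswith("p") else "infra") for dev in hours}
print(team_medians(hours, teams))  # {'platform': 4}
```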

Setting goals based on real industry data

One of the key challenges for leaders today is reconciling the performance claims seen online with the results they see in their own organizations. Among peers, researchers, and experienced leaders, there’s a shared understanding that these numbers often don’t reflect reality.

Our Q1 2026 AI-assisted engineering report helps contextualize what’s actually happening across the industry:

  • Industry-wide AI adoption: 93% of developers using AI coding tools at least monthly
  • Average developer time savings: 3.9 hours per week (daily users: 4.72 hours/week)
  • Share of merged code that is AI-authored: 27.4%
  • PR throughput gap: daily AI users merge a median of 2.4 PRs per week, compared to 1.5 for non-users
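A small sketch for putting your own numbers next to these figures. It hard-codes the report values listed above; the organization's inputs are hypothetical, and the comparison is only meaningful if your metric definitions (for example, how AI-authored code is detected) match the report's.

```python
# Industry figures from the Q1 2026 report, as listed above.
BENCHMARKS = {
    "hours_saved_per_week": 3.9,
    "ai_authored_share": 0.274,
}
DAILY_USER_PRS, NON_USER_PRS = 2.4, 1.5  # median PRs merged per week


def benchmark_gaps(org: dict[str, float]) -> dict[str, float]:
    """Difference between the organization's value and the benchmark, per metric."""
    return {
        metric: org[metric] - benchmark
        for metric, benchmark in BENCHMARKS.items()
        if metric in org
    }


def throughput_ratio(daily_user_prs: float, non_user_prs: float) -> float:
    """How many times more PRs daily AI users merge than non-users."""
    return daily_user_prs / non_user_prs


# Hypothetical organization.
org = {"hours_saved_per_week": 3.1, "ai_authored_share": 0.22}
print(benchmark_gaps(org))
print(f"industry ratio {throughput_ratio(DAILY_USER_PRS, NON_USER_PRS):.2f}, "
      f"ours {throughput_ratio(2.0, 1.6):.2f}")
```

The PR throughput gap compares different groups of developers, so it reflects an association rather than the effect of AI alone.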

At DX, we’ve gathered over four million benchmark samples across hundreds of organizations. These industry benchmarks help leaders contextualize their performance, set realistic expectations, and ensure they’re staying competitive as AI accelerates.


AI measurement FAQ

What is the DX AI Measurement Framework?

The DX AI Measurement Framework is a research-based set of metrics for tracking utilization, impact, and ROI of AI-assisted engineering. Developed with leading companies, researchers, and AI vendors, it organizes signals across three dimensions and connects them to engineering performance and business outcomes. It is vendor-agnostic and practical. Any organization can begin using it immediately.

Why can’t we just track AI adoption rates?

Adoption rates tell you whether developers are using AI tools, not whether those tools are improving engineering performance. Our Q1 2026 data shows that high adoption can coexist with reduced code quality and increased change failure rates. Without impact and cost signals alongside utilization, adoption data is incomplete and potentially misleading.

How do AI metrics relate to DORA metrics and the DX Core 4?

AI metrics measure what’s happening: utilization, time savings, AI-authored code.

DORA and Core 4 metrics confirm whether it’s working: whether delivery is faster, quality is holding, and developers are more effective.

The two are complementary, not interchangeable. See our complete guide to DORA metrics for more.

What’s the difference between direct and indirect AI metrics?

Direct metrics (like AI-driven time savings) offer immediate signals about a specific tool’s effectiveness. Indirect metrics (like PR throughput and the Developer Experience Index) surface longer-term benefits and hidden risks through longitudinal analysis. The most reliable measurement approach combines both rather than relying on any single measure.

How should we measure AI agents differently from AI assistants?

AI assistants accelerate individual developer tasks. AI agents operate more autonomously, executing multi-step workflows such as writing and running code and interacting with external systems. Agent measurement requires signals such as human-equivalent hours completed and agent hourly rate, alongside team-level metrics for the humans overseeing them. See our research on measuring AI code assistants and agents for detail.

How do we avoid gaming in AI metrics?

Avoid using metrics like code generation volume for individual performance evaluation. Focus on team-level outcomes and communicate clearly that measurement guides investment decisions, not output evaluation. Metrics anchored in developer experience signals and connected to business outcomes are significantly harder to game than pure activity metrics.

Why AI measurement frameworks matter

Companies are now constrained less by the number of engineers they can hire than by how effectively they can augment those engineers with AI. But leverage requires knowing what's working.

The most effective engineering organizations don’t measure AI adoption and delivery performance separately. They use a unified measurement approach that connects AI tool usage to developer conditions, engineering performance, and business outcomes, and gives leaders the context to make decisions that hold up over time.

Organizations that have applied this approach are seeing substantial results. Booking.com deployed AI tools to over 3,500 engineers and, within several months, achieved a 16% increase in throughput. Intercom nearly doubled AI adoption rates and realized 41% AI-driven developer time savings. The key difference: these companies measured utilization, impact, and cost together, and then used that data to guide strategic enablement.


Go deeper: for everything we’ve published on measuring engineering performance—from the Core 4 to real-world AI measurement practices—see our developer productivity metrics guide.

Last Updated
May 20, 2026