How to measure AI's impact on developer productivity
All of DX's comprehensive resources, frameworks, and metrics for tracking AI adoption, measuring productivity impact, and optimizing ROI
Taylor Bruneaux
Analyst
AI coding assistants and autonomous agents are transforming software development. Yet most engineering leaders can’t answer basic questions about their AI investments: Which tools are delivering value? What productivity gains are we actually achieving? How do we justify continued spending?
Headlines promise “2x productivity improvements” and “30% of code written by AI,” but these claims rarely match what organizations experience on the ground. Without robust measurement frameworks, you’re left making critical investment decisions based on vendor marketing rather than data.
This guide compiles everything DX has learned about measuring AI’s impact on engineering productivity. Drawing on research with hundreds of organizations, partnerships with leading AI vendors and researchers, and proven implementations at companies like Booking.com, Intercom, and Block, we provide the complete framework for understanding AI’s real impact—from adoption metrics and ROI calculations to quality analysis and strategic recommendations.
What is AI measurement?
AI measurement tracks how AI coding assistants and autonomous agents impact developer productivity, code quality, and business outcomes. It goes beyond simple adoption metrics to understand the real value AI tools deliver across the software development lifecycle.
Effective AI measurement requires tracking three dimensions:
- Utilization: How developers adopt and use AI tools
- Impact: How AI affects productivity, quality, and developer experience
- Cost: Whether AI spending delivers positive return on investment
The DX AI measurement framework
DX developed the AI measurement framework in partnership with leading companies (GitHub, Dropbox, Atlassian, Booking.com), researchers, and AI vendors. The framework provides research-based metrics across three dimensions that mirror the typical adoption journey.
Utilization metrics
Organizations start by tracking usage with AI usage analytics:
- AI tool usage: Daily and weekly active users
- Percentage of PRs that are AI-assisted: Pull requests including AI-generated code
- Percentage of committed code that is AI-generated: AI-authored code reaching production
- Tasks assigned to autonomous agents: Work delegated to agentic AI systems
The biggest gains come when developers move from non-usage to consistent usage.
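To make these utilization metrics concrete, here is a minimal sketch in Python, assuming hypothetical event-log and PR-record shapes (the field names and data are illustrative, not from any specific tool):

```python
from datetime import date, timedelta

# Hypothetical inputs: AI-tool activity events as (developer_id, day) pairs and
# PR records with an "ai_assisted" flag (e.g. from PR tagging). Shapes are illustrative.
events = [("dev1", date(2025, 6, 2)), ("dev2", date(2025, 6, 2)), ("dev1", date(2025, 6, 5))]
prs = [{"id": 101, "ai_assisted": True}, {"id": 102, "ai_assisted": False}]
licensed_developers = 120

def weekly_active_users(events, week_start):
    """Developers with at least one AI-tool event in the week starting at week_start."""
    week_end = week_start + timedelta(days=7)
    return {dev for dev, day in events if week_start <= day < week_end}

wau = weekly_active_users(events, date(2025, 6, 2))
print(f"Weekly active usage: {len(wau) / licensed_developers:.0%}")

ai_assisted_share = sum(pr["ai_assisted"] for pr in prs) / len(prs)
print(f"AI-assisted PRs: {ai_assisted_share:.0%}")
```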
Impact metrics
After establishing adoption, teams measure impact with AI impact analysis:
- AI-driven time savings: Developer hours saved per week (industry-standard metric)
- Developer satisfaction: How developers feel about AI tools
- DX Core 4 metrics: PR throughput, perceived rate of delivery, Developer Experience Index, code maintainability, change confidence, change fail percentage
- Human-equivalent hours: Work completed by autonomous agents, expressed as the hours a human would have needed to do it
Best practice: Combine direct metrics like time savings with longitudinal analysis of productivity metrics.
Cost metrics
Organizations optimize ROI by tracking:
- AI spend: Total and per-developer cost including licenses, usage-based pricing, training, enablement
- Net time gain per developer: Time savings minus AI spend, expressed in the same unit (hours or dollars)
- Agent hourly rate: AI spend divided by the human-equivalent hours agents deliver
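A minimal sketch of the cost-side arithmetic, using hypothetical per-developer figures (all numbers and variable names are illustrative, not DX benchmarks):

```python
# Illustrative inputs (all figures hypothetical).
weekly_time_savings_hours = 3.75      # hours saved per developer per week
ai_spend_per_dev_per_week = 12.0      # licenses + usage-based pricing + enablement, in dollars
loaded_hourly_cost = 75.0             # fully loaded developer cost, dollars per hour
agent_human_equivalent_hours = 400.0  # agent output for the period, in human-equivalent hours
agent_spend = 6000.0                  # AI spend attributable to agents for the same period

# Net time gain: time savings minus the AI spend converted into hours.
net_time_gain_hours = weekly_time_savings_hours - ai_spend_per_dev_per_week / loaded_hourly_cost

# Agent hourly rate: what each human-equivalent hour of agent output costs.
agent_hourly_rate = agent_spend / agent_human_equivalent_hours

print(f"Net time gain per developer: {net_time_gain_hours:.2f} hours/week")
print(f"Agent hourly rate: ${agent_hourly_rate:.2f} per human-equivalent hour")
```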
How to collect AI measurement data
Three complementary methods provide comprehensive visibility:
Tool-based metrics
Most AI tools provide admin APIs for tracking usage, token consumption, and spending. System-level metrics from GitHub, JIRA, Linear, CI/CD tools, and incident management systems reveal shifts like changes in pull request throughput or review latency.
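As one example of what a tool-based pull might look like, here is a hedged sketch against GitHub’s Copilot organization metrics endpoint; the URL, headers, and response fields are assumptions based on GitHub’s published API and may differ by version, so verify against your vendor’s current documentation:

```python
import os
import requests

ORG = "your-org"                      # hypothetical organization name
TOKEN = os.environ["GITHUB_TOKEN"]    # token with Copilot metrics read access

resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/copilot/metrics",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {TOKEN}",
    },
    timeout=30,
)
resp.raise_for_status()

# Each element is one day of org-wide usage; exact field names may vary by API version.
for day in resp.json():
    print(day.get("date"), "active users:", day.get("total_active_users"))
```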
Periodic surveys
Quarterly surveys capture longer-term trends including developer satisfaction, perceived productivity improvements, code maintainability perceptions, and overall developer experience that system data can’t measure.
Experience sampling
Experience sampling asks targeted questions at the point of work. After submitting a PR: “Did you use AI to write this code?” After reviewing: “Was this code easier or harder to understand because it was AI-generated?”
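A minimal sketch of how experience-sampling prompts might be wired to workflow events; the event names, question wording, and sample rate are illustrative assumptions:

```python
import random

# Map workflow events to short, in-the-moment questions (illustrative wording).
SAMPLING_QUESTIONS = {
    "pr_submitted": "Did you use AI to write this code?",
    "pr_reviewed": "Was this code easier or harder to understand because it was AI-generated?",
}

def question_for(event_type, sample_rate=0.25):
    """Return a question for a fraction of events so developers aren't over-surveyed."""
    if event_type in SAMPLING_QUESTIONS and random.random() < sample_rate:
        return SAMPLING_QUESTIONS[event_type]
    return None

print(question_for("pr_submitted", sample_rate=1.0))
```

Keeping the sample rate low is a deliberate design choice: sampling a fraction of events preserves signal while avoiding survey fatigue.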
Best practice: Layer all three methods to cross-validate data and build a comprehensive picture.
The reality versus the hype
There’s a significant gap between AI performance claims in headlines and what engineering leaders observe. Bold claims like “90% of code will be written by AI” come from cherry-picked studies in narrow scenarios.
What the data actually shows
DX research across 38,880 developers at 184 companies shows:
- Even leading organizations reach only 60-70% weekly active usage
- In mature rollouts, 40-50% use AI tools daily
- Average time savings: 3 hours and 45 minutes per week
- Real productivity boost: 5-15%, not 50-100%
Usage frequency drives gains
Research with a major enterprise job platform revealed:
- Heavy users (daily): Nearly 5x more PRs per week than non-users
- Frequent users (multiple times weekly): Nearly 4x more PRs
- Infrequent users (weekly): 2.5x more PRs
Power users see the biggest productivity breakthroughs.
Same-engineer analysis eliminates bias
A major financial services company tracked engineers’ productivity against their own baseline from before AI adoption:
- Engineers using AI tools: 30% increase in PR throughput year-over-year
- Engineers not using AI: 5% increase (normal variation)
This methodology eliminates confounding variables like tenure and team changes.
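A sketch of the same-engineer comparison, assuming a table of per-engineer PR throughput for a baseline and a post-rollout period plus a per-engineer AI-adoption flag (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical input: one row per engineer per period, with PR throughput
# and a per-engineer flag for whether they adopted AI tools after rollout.
df = pd.DataFrame({
    "engineer":     ["a", "a", "b", "b"],
    "period":       ["baseline", "post", "baseline", "post"],
    "adopted_ai":   [True, True, False, False],
    "prs_per_week": [3.1, 4.0, 2.8, 2.9],
})

# Compare each engineer to their own baseline, then average by adoption status.
pivot = df.pivot_table(index=["engineer", "adopted_ai"], columns="period", values="prs_per_week")
pivot["pct_change"] = (pivot["post"] - pivot["baseline"]) / pivot["baseline"] * 100
print(pivot.groupby(level="adopted_ai")["pct_change"].mean())
```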
Key metrics for measuring AI impact
Speed metrics
TrueThroughput: Accounts for pull request complexity, providing more accurate signals than traditional PR counts.
PR cycle time: Time from PR creation to merge, showing whether AI tools accelerate or slow workflows.
Quality metrics
PR revert rate: Reverted PRs divided by total PRs, signaling quality issues when speed increases at quality’s expense.
Change failure rate: Percentage of production changes causing degraded service, outages, or rollbacks.
Code maintainability: Developer perception of how easy code is to understand and modify.
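A minimal sketch of the two ratio-based quality metrics, assuming you can label PRs as reverts and deployments as incident-causing (record shapes are illustrative):

```python
def pr_revert_rate(prs):
    """Share of merged PRs that were reverts (e.g. detected from 'Revert' titles)."""
    reverts = sum(1 for pr in prs if pr["is_revert"])
    return reverts / len(prs) if prs else 0.0

def change_failure_rate(deployments):
    """Share of production changes causing degraded service, outages, or rollbacks."""
    failures = sum(1 for d in deployments if d["caused_incident"])
    return failures / len(deployments) if deployments else 0.0

# Hypothetical records for illustration.
prs = [{"id": 1, "is_revert": False}, {"id": 2, "is_revert": True}, {"id": 3, "is_revert": False}]
deployments = [{"id": "d1", "caused_incident": False}, {"id": "d2", "caused_incident": True}]
print(f"PR revert rate: {pr_revert_rate(prs):.0%}")
print(f"Change failure rate: {change_failure_rate(deployments):.0%}")
```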
Effectiveness metrics
Developer Experience Index (DXI): Composite of 14 evidence-based drivers directly linked to financial impact. Every one-point increase saves 13 minutes per developer per week (about 10 hours annually).
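Applying the 13-minutes-per-point figure, a worked example (the team size and DXI gain are hypothetical):

```python
# Illustrative inputs.
developers = 500
dxi_gain_points = 3                     # hypothetical improvement in the DXI
minutes_saved_per_point_per_week = 13   # figure cited above

weekly_hours = developers * dxi_gain_points * minutes_saved_per_point_per_week / 60
annual_hours = weekly_hours * 52
print(f"{weekly_hours:.0f} hours/week, {annual_hours:,.0f} hours/year")
# 500 devs * 3 points * 13 min = 19,500 min/week ≈ 325 hours/week ≈ 16,900 hours/year
```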
Business impact metrics
Percentage of time spent on feature development: Tracks whether AI helps automate toil and frees developers for high-value work.
Calculating AI ROI
What teams are spending
Mid-sized tech companies typically spend $100,000 to $250,000 per year on AI tools. Large enterprises often invest $2 million+ annually:
- GitHub Copilot Business: $19/user/month ($114K/year for 500 engineers)
- OpenAI GPT-4 Turbo: $10K to $100K+ per year depending on volume
- Internal copilots and integrations: $50K to $250K+ in dev time and infrastructure
- Monitoring and governance tools: $20K to $100K per year
ROI calculation example
A product company rolled out GitHub Copilot to 80 of 120 engineers:
- Time saved: 2.4 hours/week × 80 engineers × 4 weeks = 768 hours/month
- Hourly cost: $150K/year ÷ 2,080 hours ≈ $72/hour
- Value of time saved: ≈ $55,400/month
- Tooling cost: 80 × $19 = $1,520/month
- ROI: approximately 36x return
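The same calculation as a reusable sketch; the inputs mirror the illustrative figures above, and the function name and defaults are assumptions for this example:

```python
def monthly_roi(engineers, hours_saved_per_week, annual_salary, license_per_seat,
                weeks_per_month=4, work_hours_per_year=2080):
    """Return (value of time saved, tooling cost, ROI multiple) per month."""
    hours_saved = hours_saved_per_week * engineers * weeks_per_month
    hourly_cost = annual_salary / work_hours_per_year
    value = hours_saved * hourly_cost
    cost = engineers * license_per_seat
    return value, cost, value / cost

value, cost, roi = monthly_roi(engineers=80, hours_saved_per_week=2.4,
                               annual_salary=150_000, license_per_seat=19)
print(f"value ≈ ${value:,.0f}/month, cost = ${cost:,.0f}/month, ROI ≈ {roi:.0f}x")
# ≈ $55,400/month in value against $1,520/month in tooling, roughly a 36x return
```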
Industry benchmarks
Based on 50+ enterprise implementations:
Small enterprises (50-200 developers): 150-250% ROI over 3 years, 12-18 month payback
Mid-market enterprises (200-1,000 developers): 200-400% ROI over 3 years, 8-15 month payback
Large enterprises (1,000+ developers): 300-600% ROI over 3 years, 6-12 month payback
High-performing implementations (top 20%) achieve 500%+ ROI through superior change management and comprehensive measurement.
Measuring autonomous agents
The most effective approach treats agents as extensions of developers and teams, not independent contributors. When assessing team PR throughput, include both human-authored and agent-authored PRs under that team’s direction.
Developers will increasingly operate as “leads” for teams of AI agents, measured based on team performance. Think of it like Jenkins or CI/CD tools—we measure efficiency gains in the context of teams using the tooling.
Note: Agentic tooling is still in early stages. Focus on maturity (autocomplete vs. multi-step tasks) rather than just usage.
Common pitfalls to avoid
Focusing on vanity metrics
Overemphasizing “percentage of code written by AI” without connecting to business outcomes leads to misleading conclusions. Focus on business impact metrics like time savings, developer satisfaction, and quality measures.
Why acceptance rate is flawed
Suggestion acceptance rate is a poor proxy for value: accepted code is often heavily modified or deleted before it is committed. Better alternatives include tagging AI-assisted PRs or using file system-level observability to detect AI-generated changes across IDEs and tools.
Measuring before adoption maturity
Allow 3-6 months for developers to develop effective AI workflows before drawing definitive conclusions. Early measurements should focus on adoption trends.
Expecting immediate linear correlations
Developers often reinvest time savings into higher-quality work, learning, or solving complex problems rather than just producing more output. Measure holistic impact, not just volume.
Ignoring multi-tool usage patterns
Modern developers use 2-3 different AI tools simultaneously. Measure combined productivity impact with vendor evaluation rather than evaluating tools in isolation.
Implementation roadmap
Months 1-2: set your baseline
Establish measurements before AI becomes embedded in workflows. Run developer experience surveys, track core engineering metrics (PR throughput, cycle times, deployment success rates), and document time allocation. You cannot retroactively recreate perceptual measurements once AI changes workflows.
Months 3-4: roll out and start tracking
Launch with pilot teams or opt-in users. Track weekly adoption, run short pulse surveys for time savings, share progress in company-wide meetings to create momentum.
Months 5-6: measure impact
Connect tool usage to baseline metrics. Segment users into cohorts (heavy, frequent, occasional, non-users) and compare results. Identify where AI drives value and what high-performing users do differently.
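A sketch of the cohort comparison, assuming per-developer weekly AI-usage frequency and PR throughput; the thresholds and data are hypothetical:

```python
import pandas as pd

# Hypothetical per-developer data for one month.
df = pd.DataFrame({
    "developer": ["a", "b", "c", "d"],
    "ai_days_per_week": [5, 3, 1, 0],   # average days per week with AI-tool activity
    "prs_per_week": [5.1, 4.2, 2.6, 1.3],
})

def cohort(days):
    """Bucket developers by usage frequency (thresholds are illustrative)."""
    if days >= 4:
        return "heavy"
    if days >= 2:
        return "frequent"
    if days >= 1:
        return "occasional"
    return "non-user"

df["cohort"] = df["ai_days_per_week"].map(cohort)
print(df.groupby("cohort")["prs_per_week"].mean().sort_values(ascending=False))
```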
Ongoing: optimize and expand
Share monthly reports with leadership, conduct quarterly deep dives combining metrics with engineer interviews, adapt strategy with AI workflow optimization.
How to roll out AI metrics successfully
Focus on team-level aggregation
Always aggregate at team or department level, never track individuals. This protects psychological safety, avoids perverse incentives, and builds trust.
Communicate clearly and often
Be transparent about data usage. Emphasize that metrics won’t be used for individual performance evaluations, the purpose is understanding AI’s impact on developer experience, and data guides organizational investment decisions.
Treat it as an experiment
Define baselines, collect data systematically, analyze results, iterate strategy. Balance quantitative metrics with qualitative feedback about whether tools are genuinely helpful or creating friction.
Remember the bigger picture
AI is one tool among many. Over 20% of developer time is lost due to friction and poor tooling—not all AI-solvable. Balance AI investment with investments in code quality, infrastructure, and feedback loops.
Beyond code generation: high-impact use cases
The highest-impact AI applications often aren’t about writing code:
- Stack trace analysis and debugging: Number one time-saver. AI excels at interpreting complex error messages
- Code refactoring and cleanup: Automated suggestions for quality improvements
- Test generation and documentation: Eliminates repetitive but essential tasks
- Learning new frameworks: Accelerates onboarding and cross-training
- Requirements analysis: Translates business needs into technical specifications
- Code review assistance: Identifies issues and maintains consistency
Mid-loop code generation wasn’t a top use case. The real time savings come from eliminating tedious tasks that AI handles exceptionally well.
Strategic recommendations
Capture a baseline to compare against
Treat AI adoption as a rigorous experiment. Without baselines, you’ll never know if AI improved productivity or just changed how work feels. Organizations moving fast on baseline measurement will have longitudinal impact studies. Those that wait will have anecdotes.
Lead with clear data
Data beats hype every time. Organizations positioning themselves well approach AI as they would any significant technology decision: identifying specific problems AI can solve, building the necessary capabilities, measuring impact systematically, and maintaining focus on fundamentals through AI strategic planning.
Show up with confidence
As an engineering leader, answer three questions in any meeting:
- How does your organization perform today?
- How is AI helping, or not helping?
- What are you going to do next?
It’s your responsibility to educate others about realistic AI expectations.
Real-world success stories
Booking.com: Used the framework to roll out AI to 3,500 developers, achieving 65% higher adoption and saving 150K additional hours.
Intercom: By nearly doubling adoption, achieved 41% increase in AI-driven developer time savings.
Block: Used DXI to identify 500,000 hours lost annually to friction. Data shaped investment decisions and enabled faster delivery without compromising quality.
Airbnb: Achieved 70% weekly adoption through friction reduction, pre-installing tools, and in-IDE nudges.
Essential resources
DX is the industry leader in AI measurement research and implementation. Here’s our complete collection of frameworks, case studies, and practical guidance developed in partnership with leading organizations and researchers.
Framework and implementation
- AI measurement framework white paper: Complete research-based framework
- Framework introduction: Overview and key principles
- How to implement the framework in DX: Step-by-step guide
Measuring different dimensions
- How to measure AI’s impact on your engineering team: Practical multi-layered framework
- Three metrics for measuring AI’s impact on code quality: Change failure rate, PR revert rate, maintainability
- Five metrics in DX for measuring AI impact: TrueThroughput, cycle time, revert rate, DXI, feature development time
- How top companies measure AI impact: Case studies from 18 organizations
ROI and cost analysis
- AI coding tools ROI calculator: Calculate return on investment
- How to measure AI ROI in enterprise software projects: Enterprise evaluation framework
Real-world case studies
- Booking.com drives 65% increased AI adoption with DX: How data-driven measurement accelerated AI rollout
- Booking.com uses DX to measure AI’s impact on developer productivity: Measuring AI impact across 3,500 engineers
- Workhuman increases ROI from AI assistants 21% with DX: Data-driven optimization of AI tools
- Grammarly tracks and boosts AI adoption with DX: Systematic approach to AI tool rollout
- Mercari’s data-informed AI transformation with DX: Enterprise-scale AI measurement strategy
- DroneDeploy’s playbook for evaluating and maximizing AI impact: Comprehensive evaluation methodology
Methodology and best practices
- Data collection methodologies: Tool-based metrics, surveys, experience sampling
- How to introduce AI impact metrics: Best practices for rollout
- Measuring AI in software teams: trends: Current patterns and recommendations
Expert discussion and deep dives
- Podcast: Measuring AI code assistants: Framework breakdown with Laura Tacho and Abi Noda
- Podcast: How to cut through the hype: Live from LeadDev London
- Podcast: The evolving role of devprod teams: How platform teams are adapting
- Podcast: How to measure genAI adoption: Panel with GitHub, Airbnb, Jumio leaders
- Webinar: Measuring AI code assistants and agents: Live implementation discussion
- Newsletter: Measuring engineering productivity in the AI era: Key Q&A
- How 18 companies measure AI impact: Real metrics from industry leaders
Practical guides
- Takeaways: How to measure engineering productivity in the AI era: Summary of key learnings
- Building better software faster: Cutting through hype with data
Strategic context
- Unlocking developer productivity with generative AI: Comprehensive guide
- AI-assisted development is the future: Why measurement remains critical
- Framework announcement: Initial introduction
- How can you measure AI’s impact?: Connecting Core 4 with AI measurements
DX platform features
- AI usage analytics: Track adoption and usage patterns
- AI code metrics: Monitor AI-generated code quality
- AI impact analysis: Measure ROI and benchmark outcomes
- AI workflow optimization: Drive optimal effectiveness
- AI enablement: Focused, effective rollouts
- AI strategic planning: Data-driven strategy
- Vendor evaluation: Compare tools systematically
Putting it into practice
A successful AI measurement strategy combines the AI measurement framework’s specific metrics with broader productivity measurement. This dual approach ensures you understand both how AI tools are being used and how they’re impacting overall engineering performance.
The most successful organizations don’t just measure AI adoption. They use measurement to drive continuous improvement, inform enablement strategies, and make data-driven decisions about AI investments.
Ready to measure AI’s impact on your engineering organization? Request a demo to see how DX can help you implement these metrics.