DORA metrics tools in 2026: What to measure, and what's missing
The best tools for tracking DORA metrics today — and why measurement doesn't stop there
Taylor Bruneaux
Analyst
Engineering leaders have been asking the same question for years: what should we actually measure?
DORA metrics were a meaningful answer when they arrived. They gave teams a shared language for delivery performance — deployment frequency, lead time for changes, change failure rate, and time to restore service. That clarity was valuable. It still is.
But our research shows that DORA alone is no longer sufficient. Teams with strong DORA scores still report high friction. Engineering organizations with elite deployment frequency are still spending the majority of their R&D time on maintenance rather than new capabilities. The data reveals a consistent pattern: delivery speed is only one dimension of developer productivity, and optimizing for it in isolation creates blind spots.
In 2026, a third pressure has emerged. AI is reshaping how engineering organizations deliver software — and DORA metrics, on their own, can’t tell you whether improvements in deployment frequency or lead time are driven by AI tooling, sustainable process changes, or a temporary quality trade-off that hasn’t surfaced yet.
The DX Core 4 addresses the first gap directly — placing DORA’s delivery metrics within a broader framework of Speed, Effectiveness, Quality, and Impact. The DX AI Measurement Framework addresses the second, providing research-backed metrics for tracking AI adoption, impact, and cost alongside Core 4. This article covers the best DORA metrics tools in 2026, how each fits into a more complete measurement approach, and where leaders should focus next.
What the four DORA metrics measure
The DORA metrics measure software delivery performance across four dimensions.
Deployment frequency
Deployment frequency measures how often a team releases updates to production — new features, bug fixes, security patches, or technical debt reduction. It reflects the operational cadence of a DevOps team and is a validated signal of how effectively a team has reduced batch size and removed deployment friction.
Lead time for changes
Lead time for changes measures the elapsed time from when a change is initiated — typically a pull request — to when it reaches production. It surfaces bottlenecks in code review, testing, and release processes. Short lead times indicate a healthy, high-trust delivery pipeline.
Change failure rate
Change failure rate measures the proportion of deployments that result in a production failure requiring remediation. It is a validated signal of code quality and the effectiveness of pre-production testing. Higher failure rates indicate gaps in quality gates or testing coverage.
Time to restore service
Time to restore service measures how quickly a team recovers from a production failure. It reflects incident response maturity — the availability of runbooks, on-call processes, and the team’s working knowledge of the system under failure.
Together, these four metrics provide a validated baseline for assessing the speed and stability of software delivery. The table below summarizes what each metric surfaces and why it matters.
| DORA metric | What it highlights | Business impact | Core 4 dimension |
|---|---|---|---|
| Deployment frequency | Ship cadence issues — from process bottlenecks to CI/CD tool limitations | Reduces time to market; improves system quality and customer experience | Speed |
| Lead time for changes | Process inefficiencies and resource constraints in the delivery pipeline | Reduces time to market; increases engineering throughput and developer retention | Speed |
| Change failure rate | Defects that testing should have caught before production | Increases customer satisfaction; reduces time lost to firefighting and rework | Quality |
| Time to restore service | Gaps in incident management processes, tooling, or system knowledge | Reduces customer dissatisfaction and revenue lost to downtime | Quality |
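To make these definitions concrete, here is a minimal sketch of how the four metrics can be computed from raw event data. The record shapes, field names, and seven-day window are illustrative assumptions, not any particular tool's schema:

```python
from datetime import datetime, timedelta

# Illustrative event records; the field names are assumptions, not a real
# tool's schema. "merged_at" marks when a change was initiated (PR merge),
# "deployed_at" marks when it reached production.
deployments = [
    {"merged_at": datetime(2026, 1, 5, 9, 0),  "deployed_at": datetime(2026, 1, 5, 11, 0),  "failed": False},
    {"merged_at": datetime(2026, 1, 6, 14, 0), "deployed_at": datetime(2026, 1, 6, 15, 30), "failed": True},
    {"merged_at": datetime(2026, 1, 7, 10, 0), "deployed_at": datetime(2026, 1, 7, 10, 45), "failed": False},
    {"merged_at": datetime(2026, 1, 8, 9, 0),  "deployed_at": datetime(2026, 1, 8, 9, 20),  "failed": False},
]
incidents = [
    {"started_at": datetime(2026, 1, 6, 16, 0), "resolved_at": datetime(2026, 1, 6, 16, 50)},
]
days_in_window = 7

# Deployment frequency: deployments per day over the window.
deploy_frequency = len(deployments) / days_in_window

# Lead time for changes: median elapsed time from merge to production.
lead_times = sorted(d["deployed_at"] - d["merged_at"] for d in deployments)
median_lead_time = lead_times[len(lead_times) // 2]

# Change failure rate: share of deployments that caused a production failure.
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Time to restore service: mean incident duration.
restore_times = [i["resolved_at"] - i["started_at"] for i in incidents]
mean_time_to_restore = sum(restore_times, timedelta()) / len(restore_times)
```

In practice the hard part is not the arithmetic but agreeing on what counts as a deployment and a failure, which is why consistent definitions across teams matter as much as the tool computing them.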
Who should use DORA metrics tools
DORA metrics tools are most useful for three audiences: engineering leaders who need to communicate delivery performance to the business, DevOps and platform engineering teams responsible for improving the delivery pipeline, and developer experience teams benchmarking their organization against industry standards.
Teams earlier in their DevOps maturity tend to get the most immediate value — DORA metrics surface bottlenecks that are otherwise invisible. Teams that are already high-performing use them as a stability check: a signal that speed improvements aren’t coming at the cost of reliability. In both cases, the data is only useful if the tooling feeding it is accurate, consistently defined, and connected to the right data sources.
The right DORA metrics tool depends heavily on where a team is in that journey, which is why tool selection matters as much as the metrics themselves.
Why DORA metrics are necessary but not sufficient
Tracking DORA metrics provides a rigorous baseline. Teams that don’t measure deployment frequency or lead time are flying blind on delivery performance. That baseline matters.
What the data also shows, however, is that DORA metrics do not encompass several conditions that directly influence whether developers can do their best work. They don’t capture whether developers have protected time for deep focus, whether cross-team collaboration is functioning well, how much technical debt is absorbing engineering capacity, or how developers experience their tools and workflows day to day.
Developers report that these conditions — slow feedback loops, interrupted flow, unclear ownership, fragmented tooling — are among the most significant sources of friction in their work. DORA metrics don’t surface them.
There is also a gap at the business level. A team can achieve elite DORA performance while directing most of its engineering capacity toward maintenance rather than new capabilities. That’s precisely why the DX Core 4 introduces Impact as a dedicated measurement dimension — specifically, the percentage of R&D time spent on new capabilities. This is a metric that non-technical stakeholders can immediately understand and act on.
The Core 4 was developed in collaboration with the authors of DORA, SPACE, and DevEx — including Dr. Nicole Forsgren — and has been tested with over 300 organizations. Organizations using it have seen 3–12% improvements in engineering efficiency and 14% increases in R&D time directed toward feature development. It provides a single, practical framework that supports decision-making at every level of the organization.
There is also a gap that Core 4 alone doesn’t close: understanding what role AI is playing in any of these results. Our data shows that even leading organizations are only reaching around 60% active usage of AI tools. Teams that are seeing DORA metrics improve may be benefiting from AI-assisted development — or they may be accumulating quality debt as AI-generated code increases throughput while reducing maintainability. Without AI-specific measurement, leaders can’t tell the difference.
That’s what the DX AI Measurement Framework is designed to address. It tracks three dimensions alongside Core 4: utilization (how much AI tooling is actually being used), impact (how AI is affecting developer time savings, code quality, and Core 4 metrics), and cost (whether AI spend is generating a positive return). Early data from companies using this combined approach shows that AI can provide lift across all four Core 4 dimensions — but only when adoption is measured, impact is validated, and quality signals are monitored in parallel.
Importantly, none of the DORA metrics tools reviewed below can feed data into the AI Measurement Framework. That capability is specific to DX. For teams that need only delivery baselines, the tools below are a solid starting point. For teams that need to understand whether AI is driving real improvement or masking quality risk, the measurement approach needs to go further.
Six DORA metrics tools compared for 2026
Several tools support DORA metric tracking in 2026. The options below represent a range of approaches — from CI/CD platforms and incident management tools to open-source infrastructure and purpose-built engineering intelligence platforms. For each, we’ve assessed what it measures well, where it falls short, and how well it supports a path toward Core 4-level measurement.
The table below summarizes how each DORA metrics tool compares across the criteria that matter most for selection.
| Tool | All 4 DORA metrics | Setup complexity | Core 4 coverage | AI measurement | Open source |
|---|---|---|---|---|---|
| GitLab | Partial (Ultimate tier) | Low | Speed + Quality only | No | No |
| GitHub Actions | Partial (speed only) | Low | Speed only | No | No |
| Apache DevLake | Yes | High | Speed + Quality only | No | Yes |
| PagerDuty | Partial (stability only) | Low–Medium | Quality only | No | No |
| Datadog | Yes | Medium | Speed + Quality only | No | No |
| DX | Yes | Low | All four dimensions | Yes | No |
GitLab
GitLab is a hosted Git server with extensive features for source code management, CI/CD pipeline support, project planning, monitoring, and security.
What it supports
GitLab supports DORA metrics natively via its Value Streams Dashboard. For teams already running their entire delivery lifecycle in GitLab, this provides a low-friction starting point for tracking deployment frequency and lead time for changes. No additional tooling is required if incident management also runs through GitLab.
What’s missing
GitLab’s DORA metrics implementation assumes teams use GitLab for incident management. Change failure rate and time to restore service are computed from GitLab incidents specifically — teams using PagerDuty or other external tools will find these metrics incomplete without webhook configuration. DORA metrics require an Ultimate tier subscription, which starts at $99 per user per month. There is no pathway to measuring developer experience or business impact, which means leaders working from GitLab DORA data are operating with two of the four Core 4 dimensions at most.
GitHub Actions
GitHub Actions is the CI/CD automation platform built into GitHub. It is already part of the delivery pipeline for a large share of engineering teams, making it a natural — if partial — source of DORA signal.
What it supports
GitHub Actions captures the raw data needed to approximate deployment frequency and lead time for changes. Community-maintained actions in the GitHub Marketplace can calculate both metrics directly from workflow run data, requiring no additional tooling for teams that deploy via GitHub pipelines. For teams that want a rough DORA baseline quickly and without dedicated tooling, this is the lowest-friction starting point available.
What’s missing
GitHub Actions does not track change failure rate or time to restore service — both require incident data that lives outside the CI/CD pipeline. Metric calculations based on workflow runs are approximations, and results can be skewed for teams with low deployment volumes or non-standard branching strategies. GitHub Actions provides a data source, not a DORA metrics platform. Making meaningful use of the output requires connecting it to a broader observability or analytics layer.
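For teams curious what that approximation looks like in practice, here is a minimal sketch that derives a rough lead-time signal from workflow-run records. The field names mirror the shape of GitHub's workflow-run API objects (`conclusion`, `updated_at`, `head_commit.timestamp`), but the records below are hardcoded illustrations, not live API responses:

```python
from datetime import datetime

# Hardcoded sample of workflow-run records. The field names follow the
# GitHub Actions REST API's workflow-run objects, but the values are
# illustrative rather than fetched from a repository.
runs = [
    {"conclusion": "success",
     "head_commit": {"timestamp": "2026-01-05T09:00:00Z"},
     "updated_at": "2026-01-05T09:40:00Z"},
    {"conclusion": "failure",
     "head_commit": {"timestamp": "2026-01-05T11:00:00Z"},
     "updated_at": "2026-01-05T11:10:00Z"},
    {"conclusion": "success",
     "head_commit": {"timestamp": "2026-01-06T15:00:00Z"},
     "updated_at": "2026-01-06T16:20:00Z"},
]

def parse(ts: str) -> datetime:
    # GitHub timestamps are ISO 8601 with a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Only successful runs of the deploy workflow count as deployments.
deploys = [r for r in runs if r["conclusion"] == "success"]

# Approximate lead time: head-commit timestamp to workflow completion.
lead_times = [parse(r["updated_at"]) - parse(r["head_commit"]["timestamp"])
              for r in deploys]
avg_lead_seconds = sum(t.total_seconds() for t in lead_times) / len(lead_times)
```

A real pipeline would page through `GET /repos/{owner}/{repo}/actions/runs` and filter to the deployment workflow. The skew the section mentions shows up here directly: `updated_at` is a workflow-completion proxy, not a true production timestamp.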
Apache DevLake
Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes engineering data from across a team’s toolchain. It has become the leading self-hosted option for teams that want flexible, customizable DORA infrastructure without vendor lock-in.
What it supports
DevLake provides a built-in DORA dashboard via Grafana, with data source support for GitHub, GitLab, Jira, Jenkins, Bitbucket, Azure DevOps, PagerDuty, and more. Because it is fully open, teams can define their own deployment and incident logic, write custom SQL queries, and extend dashboards to match their specific delivery model. It is the only DORA metrics tool on this list that gives teams complete ownership over how metrics are computed and stored — and it is free to use.
What’s missing
DevLake requires meaningful infrastructure investment — deploying via Docker Compose or Helm, configuring data connections, and managing ongoing maintenance. Initial setup typically takes several days of engineering time, and ongoing reliability depends on the team owning the stack. There is no out-of-the-box support for developer-reported friction or the qualitative signals that Core 4’s Effectiveness dimension depends on. It is best suited for platform engineering teams with capacity to own their own tooling.
PagerDuty
PagerDuty is an incident management platform that gives engineering teams real-time visibility into production failures and the workflows to respond to them. For DORA measurement, it is the primary data source for the two stability metrics.
What it supports
PagerDuty is the most commonly used tool for feeding change failure rate and time to restore service data into a DORA metrics stack. It integrates natively with GitLab’s Value Streams Dashboard, Datadog DORA Metrics, Apache DevLake, and other platforms via webhook, making incident data available to whichever analytics layer a team uses. The precision of PagerDuty’s incident timelines — start time, escalation events, resolution — makes it one of the most reliable sources for time to restore calculation.
What’s missing
PagerDuty is an incident management tool, not a DORA metrics platform. It does not compute or display DORA metrics on its own. Teams need to connect it to an analytics or observability layer to translate incident data into delivery performance signals. It also has no visibility into the speed metrics — deployment frequency and lead time require CI/CD data from a separate source.
Datadog
Datadog is a monitoring and observability platform that added a dedicated DORA Metrics product to its Software Delivery suite. For teams already running Datadog, it provides one of the more integrated approaches to DORA measurement available today.
What it supports
Datadog DORA Metrics automatically ingests deployment and failure data from across a team’s stack — CI/CD pipelines, APM, Git providers, and incident management tools including PagerDuty — without requiring custom instrumentation. It tracks all four DORA metrics and surfaces them through pre-built dashboards with filtering by team, service, environment, and time period. Because Datadog sits across the full observability stack, it can correlate DORA signals with system health data: for example, showing whether an increase in deployment frequency has affected error rates or service availability. DORA Metrics is included at no additional cost for existing Datadog customers.
What’s missing
Datadog is primarily an observability platform that has extended into DORA measurement. Teams not already using Datadog APM or Datadog Incident Management will find the setup more involved, and configuring accurate metric computation across a complex toolchain still requires meaningful effort. Like all delivery-focused DORA metrics tools, it does not capture developer-reported friction or the Impact dimension that connects engineering output to business value.
DX
DX is the only platform built specifically to implement the full DX Core 4 framework, integrating DORA metrics as one part of a complete view of engineering performance.
What it supports
DX captures all four DORA metrics within the Speed and Quality dimensions of Core 4. It goes further by measuring the Developer Experience Index (DXI) — a validated measure of 14 factors that influence how developers engage with their work — which quantifies the conditions developers need to deliver effectively. Its SDLC analytics layer surfaces engineering performance data across teams and business units. The Impact dimension connects engineering output to business value, tracking the percentage of R&D time directed toward new capabilities — a metric that resonates with CFOs and CEOs in ways that lead time does not.
Our research shows that each one-point improvement in the DXI saves approximately 13 minutes per developer per week. Teams in the top quartile of DXI scores show 4–5x higher performance across speed, quality, and engagement. DX makes these relationships visible and actionable.
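As a back-of-envelope illustration of that finding (the team size and DXI gain below are hypothetical inputs; only the 13-minutes-per-point figure comes from the research cited above):

```python
# Hypothetical organization: the headcount and DXI improvement are
# assumptions for illustration; 13 minutes/point/week is the cited finding.
developers = 500
dxi_point_gain = 5
minutes_saved_per_point_per_week = 13

weekly_hours_saved = (
    developers * dxi_point_gain * minutes_saved_per_point_per_week / 60
)
# Roughly 540 engineer-hours recovered per week across the organization.
```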
DX also implements the DX AI Measurement Framework alongside Core 4, giving leaders a single platform to track AI utilization (adoption and active usage across teams), impact (AI-driven time savings, developer satisfaction, and Core 4 metric trends), and cost (AI spend against net time gained). This matters because DORA metrics alone can’t tell you whether an improvement in deployment frequency reflects a genuine process improvement or a throughput increase that is quietly degrading code quality. Booking.com used this combined approach to deploy AI tools to over 3,500 engineers and achieved a 16% increase in throughput within several months.
The bottom line on tooling
Every other DORA metrics tool on this list provides raw delivery data — in some cases, very good raw delivery data. DX provides DORA as a foundation and builds the complete measurement layer on top of it: developer experience, engineering performance, AI impact, and business outcomes. It is the only tool here that answers not just “how fast are we delivering?” but “why, and what should we do about it?”
How to choose the right DORA metrics tool
The right tool depends on where a team is in its measurement maturity. Most organizations move through four stages.
Stage 1: DORA fundamentals
The goal is a reliable baseline for deployment frequency and lead time. GitLab works well for GitLab-native teams; GitHub Actions for GitHub-native teams; Apache DevLake for organizations that want self-hosted flexibility and are willing to invest the setup time. PagerDuty and Datadog are the natural complements for adding stability metrics — change failure rate and time to restore service require incident data that CI/CD tools alone can’t provide.
Stage 2: An integrated DORA stack
Most teams end up combining tools: a CI/CD source for speed metrics, an incident management tool for stability metrics, and an analytics layer to bring it together. Datadog is the strongest integrated option for teams already in its ecosystem. DevLake is the strongest open-source option for teams that want full ownership. The key question at this stage is data integrity — whether metrics are consistently defined across teams and trustworthy enough to act on.
Stage 3: The full picture
When the goal is to give leadership a complete view of engineering performance — from delivery speed to developer-reported friction to business impact — the DORA metrics tools above provide inputs, but not answers. That requires a framework and platform built for the purpose. DX and the Core 4 are where that work happens.
Stage 4: DORA in an AI-assisted organization
This is the emerging measurement challenge in 2026. As AI tools raise throughput across engineering teams, DORA metrics can improve on the surface while quality degrades beneath. Deployment frequency rises; change failure rate may follow. Lead time shortens; code maintainability may suffer. Teams deploying AI tools need to monitor Core 4 Quality metrics — specifically change failure rate and the DXI’s code maintainability signal — with extra care. Pairing DORA data with AI utilization and impact metrics is what separates organizations that are genuinely improving from those that are generating misleading numbers.
When evaluating any DORA metrics tool, also assess: ease of setup and integration, security model, scalability across teams, and whether reporting is actionable out of the box or requires significant internal investment to interpret. Whichever tool you choose, the measurement framework matters as much as the tooling: DevOps KPIs only connect engineering work to business results when they are defined within a coherent framework.
Where leaders should focus next
DORA metrics are a valuable starting point. They give teams a shared language for delivery performance and a clear signal of where the delivery pipeline is breaking down.
The more complete measurement approach — one that also captures the conditions developers need to deliver effectively and connects engineering output to business outcomes — is the DX Core 4. The SPACE framework remains useful for teams that want to define custom metrics in areas like collaboration and flow. For more context on how these frameworks connect, the Accelerate metrics that originally motivated DORA remain useful background.
For engineering leaders who want a single, practical answer to “what should we measure?” — Core 4 is the clearest the industry has produced. Start with DORA metrics tools to establish your delivery baseline. Layer in developer experience measurement to understand why performance looks the way it does. Add AI measurement to understand what’s actually driving change. That sequence is the path to measurement that improves the organization, not just the dashboard.
Frequently asked questions
What are the best DORA metrics tools in 2026?
The best DORA metrics tool depends on your team’s toolchain and measurement maturity. For GitLab-native teams, GitLab’s Value Streams Dashboard provides the lowest-friction starting point. For GitHub-native teams, GitHub Actions supplies the speed metrics, and pairing it with Datadog covers all four. For open-source flexibility, Apache DevLake is the strongest self-hosted option. For a complete view that goes beyond DORA — including developer experience, business impact, and AI measurement — DX is the only platform that implements the full DX Core 4 framework.
What is the difference between DORA metrics and the DX Core 4?
DORA metrics measure software delivery performance across four dimensions: deployment frequency, lead time for changes, change failure rate, and time to restore service. The DX Core 4 extends this by adding two further dimensions — Effectiveness (measured through the Developer Experience Index) and Impact (measured through R&D time allocation and business outcomes). DORA covers the Speed and Quality dimensions of Core 4. Core 4 provides the complete picture.
Can DORA metrics tools measure AI’s impact on engineering productivity?
No. Standard DORA metrics tools track delivery system data — CI/CD pipelines, incident management tools, and Git providers. They can show whether deployment frequency or lead time has changed, but they cannot tell you whether those changes are driven by AI tooling, process improvements, or a quality trade-off accumulating in the background. Measuring AI’s impact requires a dedicated framework. The DX AI Measurement Framework tracks utilization, impact, and cost of AI tools alongside Core 4 metrics, providing the attribution that DORA alone cannot.
How do I measure DORA metrics without a dedicated tool?
Teams can approximate deployment frequency and lead time using CI/CD pipeline data from GitHub Actions or GitLab CI, and time to restore using incident data from PagerDuty. Change failure rate requires correlating deployment events with incident data, which typically requires a lightweight analytics layer or custom scripting. Apache DevLake is the strongest open-source option for assembling all four metrics without a commercial tool. The tradeoff is setup time and ongoing maintenance.
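A minimal sketch of that correlation step, using one common approximation: attribute each incident to the most recent deployment within a fixed window before it. The timestamps and the 24-hour window are illustrative assumptions, not a standard:

```python
from datetime import datetime, timedelta
from bisect import bisect_right

# Illustrative timestamps; a real pipeline would pull deployments from
# CI/CD (e.g. GitHub Actions or GitLab CI) and incidents from PagerDuty.
deploy_times = sorted([
    datetime(2026, 1, 5, 10, 0),
    datetime(2026, 1, 6, 10, 0),
    datetime(2026, 1, 7, 10, 0),
])
incident_starts = [datetime(2026, 1, 6, 11, 30)]

# Attribution rule (an assumption, not a standard): an incident counts
# against the most recent deployment within the 24 hours before it.
WINDOW = timedelta(hours=24)
failed_deploys = set()
for inc in incident_starts:
    i = bisect_right(deploy_times, inc) - 1  # latest deploy at or before inc
    if i >= 0 and inc - deploy_times[i] <= WINDOW:
        failed_deploys.add(deploy_times[i])

change_failure_rate = len(failed_deploys) / len(deploy_times)
```

The attribution window is the judgment call: too narrow and real failures slip through, too wide and unrelated incidents inflate the rate. Whatever value a team picks should be documented and applied consistently across services.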
What is a good DORA metrics benchmark?
The DORA research program defines four performance tiers. Elite teams deploy on-demand (multiple times per day), with lead times under one hour, change failure rates below 5%, and time to restore under one hour. High performers deploy between once per day and once per week. The 2025 DORA report found that 16.7% of surveyed teams reported a change failure rate of 4% or lower. Leaders should focus on improvement over time within their own context rather than on hitting specific thresholds — the benchmarks are directional signals, not targets.
Which DORA metrics are most affected by AI-assisted development?
Our data shows deployment frequency and lead time are most likely to improve as AI tools increase individual developer throughput. Change failure rate is the metric to watch most carefully — AI-generated code can accelerate delivery while introducing quality issues that don’t surface immediately. Organizations deploying AI tools should treat change failure rate as an early warning signal, and pair DORA Quality metrics with code maintainability data from the DXI to catch quality degradation before it compounds.