
PRs won’t tell you whether AI is boosting developer productivity. This will.

AI is all the rage right now. From the executive suite to newsrooms to development teams, everyone is buzzing about how tools like GitHub Copilot are rewriting how software is delivered. You've seen the stats: tasks completed 55% faster, review times cut by an average of 19.3 hours, a 1.57x higher merge rate for AI-assisted PRs. Microsoft, Google, and other big names are publishing research that fuels this narrative.

Nearly every engineering leader I talk with is throwing budget at AI tools. It's a gold rush like we haven't seen in years. Yet amid this frenzy there's a significant blind spot: actually measuring AI's impact on developer productivity.

Leaders need this information to validate and inform their investments. However, developer productivity has always been a complex problem. Measuring the impact of AI is no different.

Some organizations are also seeing lower adoption than expected: despite all the hype, AI tools are not spreading through their teams. These leaders lack data on why developers are or aren't adopting the tools, and need ways to understand and address the gap.

The core issue behind both the adoption challenges and the difficulty of justifying ROI is the same: it's hard to get useful feedback, signals, and measurements on how AI is affecting developer productivity. At DX, we've been working with a number of organizations to solve these challenges and are seeing promising results. In this article, we share what we've learned about the approaches organizations are using, and offer guidance on combining the available methods into a holistic approach that gives organizations adopting AI tools the insights they need.

To read our full guide on how to measure AI adoption and impact, go here.

How organizations are collecting feedback and data today

Right now, companies are desperate for data on AI's impact but are coming up short. The usual metrics aren't cutting it: pull request counts and commit volumes aren't telling a compelling story (and in many cases aren't showing any change at all). Leaders are worried and confused.

Some organizations have launched survey efforts, but struggle with survey design and with collecting enough responses to produce reliable baselines. Experience sampling, the least familiar of the methods, holds a lot of promise, but putting it into practice can be challenging.

At DX, we've seen the benefits and challenges of these approaches firsthand, and find that many organizations' struggles stem from misapplying or misunderstanding each method. Telemetry metrics, experience sampling, and surveys can all provide leaders with rich, useful data; deploying each one well is the challenge.

Methods for measuring GenAI impact: telemetry, experience sampling, surveys

A better way forward

Understanding AI’s adoption and impact is tough, but not impossible. It requires a nuanced approach. Let’s break down three strategies that can be used together to get the insights you need:

  1. Telemetry metrics: These include common metrics like pull requests per developer, code review time, or cycle time. These types of metrics are great for getting a high-level gauge of how developer output and activity levels are being affected by AI. However, they can't tell you whether AI is the cause of any fluctuations, and they don't give insight into how AI tools are being used.
  2. Experience sampling: Experience sampling is a method for collecting data about individuals' behaviors, thoughts, and feelings in real time or close to the moment of experience. For example, you could send short surveys to developers as they complete tasks to learn if and how they're using AI tools and what benefits they've realized. Without a tool built for this, such as PlatformX, experience sampling can be difficult to set up. However, it's a powerful method for getting concrete data on time savings and ROI from AI tools.
  3. Surveys: Periodic developer surveys are used to measure adoption, satisfaction, and self-reported productivity. They're a powerful tool for capturing additional insights about the benefits of AI. For example, we've seen AI have direct, measurable benefits on developer fulfillment and on the ease of completing development tasks. Surveys are difficult to design and administer well, however, so organizations may see low participation rates or questionable data.

In practice, organizations can build each of these methods themselves, use piecemeal vendors, or work with a vendor like DX that automates all three.

Bringing it all together

To fully understand AI's benefits, adoption, and how it's being used, you need a mixed-methods approach. Telemetry metrics, experience sampling, and surveys all have their unique strengths.

We recommend organizations start with surveys to get a baseline before AI tools have been fully rolled out. Running these surveys regularly, about every six to twelve weeks, helps track changes in developer adoption and satisfaction.

Then, keep an eye on telemetry metrics to spot any changes or trends in developer productivity levels as AI tools are adopted. Be sure to properly clean and normalize data so that you’re getting reliable signals.
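Cleaning telemetry data can be as simple as trimming extreme outliers and comparing medians rather than means. A minimal sketch of that idea, with made-up cycle-time numbers for illustration:

```python
from statistics import median


def trimmed_median(values, lower_pct=0.05, upper_pct=0.95):
    """Median of a metric after dropping extreme outliers
    (e.g., abandoned PRs left open for months)."""
    s = sorted(values)
    lo = int(len(s) * lower_pct)
    hi = int(len(s) * upper_pct)
    return median(s[lo:hi] or s)


# Hypothetical weekly PR cycle times in hours, before and after an AI rollout.
# One stuck PR (400 h) would badly skew a plain mean.
before = [20, 22, 25, 19, 24, 400]
after = [15, 18, 16, 17, 20, 380]

print(trimmed_median(before), trimmed_median(after))  # 22 vs 17 hours
```

Using a trimmed median means a handful of long-lived outlier PRs won't mask (or fake) a real shift in the typical developer's cycle time.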

Lastly, we strongly recommend running experience sampling studies in focused, four-week intervals. These studies can yield powerful data on the dollar-value ROI of AI tools, along with close-up insights into how developers are using AI to realize their productivity gains. These learnings can be shared back with other developers and internal platform teams, helping make clear the best use cases for AI as well as gaps and opportunities.
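Once an experience-sampling study yields a time-savings estimate, the dollar-value ROI is simple arithmetic. All inputs below are made-up illustrations, not benchmarks:

```python
def annual_ai_roi(hours_saved_per_dev_per_week: float,
                  num_developers: int,
                  loaded_cost_per_hour: float,
                  annual_tool_cost: float,
                  weeks_per_year: int = 48) -> float:
    """Estimated annual dollar value of AI time savings, net of tool spend."""
    gross = (hours_saved_per_dev_per_week * weeks_per_year
             * num_developers * loaded_cost_per_hour)
    return gross - annual_tool_cost


# Hypothetical inputs: 2 h/week saved per developer, 500 developers,
# $100/h loaded cost, $230 per developer per year in tool spend.
net = annual_ai_roi(2.0, 500, 100.0, annual_tool_cost=230 * 500)
print(f"${net:,.0f}")
```

The point of the experience-sampling study is to replace the guessed `hours_saved_per_dev_per_week` with a measured figure, which is what turns this from a back-of-envelope number into a defensible ROI estimate.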


Final thoughts

AI represents a significant opportunity to boost developer productivity and job satisfaction. Effective collection of developer metrics and feedback is key to optimally rolling out and realizing the full impact of these tools.

As discussed, data can be used to better understand and drive adoption, as well as validate the financial ROI of productivity gains being captured. Insights on specific AI use cases can help with educating developers across your organization on how to best apply these tools.

The earlier organizations establish baselines and put data mechanisms in place, the better: doing so provides a longitudinal view of AI's impact on your business.

To read our full guide on how to measure AI adoption and impact, go here. To learn more about how DX can help you implement the approach described in this article, request a product walkthrough here.

Published
March 8, 2024