There’s been a ton of research on the impact of AI tools like GitHub Copilot, but most of it comes from Microsoft and the vendors selling these solutions. So, we figured it was time to take a look at some real-world data and provide an independent perspective.
Earlier this month, we analyzed data from a US-based fintech company with 1,000-2,000 engineers. We’ve been working closely with them to understand the productivity impact of their Copilot rollout—something their CFO is very eager to measure.
Here’s what we found when we dug into the numbers.
One key metric we always look at with Copilot implementations is the boost in pull requests merged. Now, let’s be clear—the number of PRs alone isn’t a perfect metric. It doesn’t tell you about effort or how complex PRs are. But it’s a widely accepted signal for productivity, which is why it was included in the DX Core 4 framework.
For this analysis, we split the developer population into two groups: active Copilot users (developers using it at least once a week) and non-active users.
Over a 90-day period, active Copilot users merged 24% more PRs per week than non-active users. A 24% lift is big, and the company was thrilled with this result.
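To make the comparison concrete, here is a minimal Python sketch of how a throughput lift like this could be computed. It is not DX's actual pipeline. The `Developer` record, its field names, the helper functions, and the toy numbers are all hypothetical. It also assumes "active" means at least one Copilot use in every observed week.

```python
from dataclasses import dataclass
from statistics import mean

WINDOW_DAYS = 90
WEEKS_IN_WINDOW = WINDOW_DAYS / 7  # ~12.9 weeks; cancels out in the lift ratio


@dataclass
class Developer:
    name: str
    weekly_copilot_uses: list[int]  # hypothetical: Copilot uses per week in the window
    prs_merged: int  # PRs merged over the whole 90-day window


def is_active(dev: Developer) -> bool:
    # Assumption: "active" = at least one Copilot use in every observed week.
    return bool(dev.weekly_copilot_uses) and all(n >= 1 for n in dev.weekly_copilot_uses)


def prs_per_week(dev: Developer) -> float:
    return dev.prs_merged / WEEKS_IN_WINDOW


def throughput_lift(devs: list[Developer]) -> float:
    """Relative difference in mean weekly PRs: active vs. non-active users."""
    active = [prs_per_week(d) for d in devs if is_active(d)]
    non_active = [prs_per_week(d) for d in devs if not is_active(d)]
    if not active or not non_active:
        raise ValueError("need at least one developer in each group")
    return mean(active) / mean(non_active) - 1.0


# Toy data, not the fintech company's numbers.
team = [
    Developer("dev_a", [2] * 13, 30),
    Developer("dev_b", [1] * 13, 32),
    Developer("dev_c", [1] * 6 + [0] * 7, 24),  # used Copilot only some weeks
    Developer("dev_d", [0] * 13, 26),
]
lift = throughput_lift(team)
print(f"Lift: {lift:.0%}")  # prints "Lift: 24%"
```

Averaging per-developer rates weights every developer equally. Dividing total PRs by total developer-weeks in each group is a reasonable alternative, and the post doesn't say which approach was used. Either way, this is a raw group difference with no controls for experience or role.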
Of course, not all PRs are created equal. Some are more complex than others. So, we also looked at average PR size, measured by lines of code changed. What we found was that active Copilot users not only merged more PRs, but their PRs were also larger on average.
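A similar sketch can compare PR size between the two groups. It assumes "lines of code changed" means additions plus deletions, which is how most diff statistics report it. The `PullRequest` record and the sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class PullRequest:
    author_active: bool  # hypothetical: was the author an active Copilot user?
    lines_added: int
    lines_deleted: int

    @property
    def lines_changed(self) -> int:
        # Assumption: "lines changed" = additions + deletions.
        return self.lines_added + self.lines_deleted


def pr_size_by_group(prs: list[PullRequest]) -> dict[str, dict[str, float]]:
    """Mean and median lines changed per PR, split by author group."""
    groups: dict[str, list[int]] = {"active": [], "non_active": []}
    for pr in prs:
        key = "active" if pr.author_active else "non_active"
        groups[key].append(pr.lines_changed)
    return {
        key: {"mean": mean(sizes), "median": median(sizes)}
        for key, sizes in groups.items()
        if sizes  # skip a group with no PRs rather than raising
    }


# Toy data, not the fintech company's numbers.
prs = [
    PullRequest(True, 120, 30),
    PullRequest(True, 80, 20),
    PullRequest(True, 400, 50),
    PullRequest(False, 60, 20),
    PullRequest(False, 90, 10),
    PullRequest(False, 40, 20),
]
print(pr_size_by_group(prs))
```

Reporting the median alongside the mean is a deliberate choice: in the toy data, the single 450-line PR pulls the active group's mean well above its median, and real PR sizes are often skewed in the same way.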
The takeaway: this fintech company is seeing a measurable increase in output from Copilot users. This has helped them feel more confident about their investment.
That said, not every organization sees a 24% lift. Some even report lower PR throughput with Copilot. We’ll dive into some reasons for this in future reports, but it underscores why you need to look at a range of metrics when evaluating the impact.
As mentioned earlier, PRs don’t tell the whole story. There are other variables at play. Maybe the developers using Copilot are just more experienced or more productive overall, in which case the gap could reflect correlation rather than causation.
That’s why this company also looked at self-reported time savings. In their latest quarterly survey, 28% of active Copilot users reported saving at least one hour per week, and 11% said they’re saving two or more hours per week. These numbers track closely with the uptick in PR output.
Of course, self-reported metrics like these have their own challenges, chiefly that it’s hard for people to estimate precisely how much time they’ve saved. One way to address this is to collect in-the-moment data points through methods such as experience sampling.
There’s been no shortage of vendor-driven research on AI tools like Copilot, but real-world examples are still hard to come by. The data we’ve seen from this fintech company shows a meaningful boost in both output and time savings. As Copilot continues to evolve, and as developers get better at using it, we expect its impact to keep growing.