Podcast

AI and productivity: A year-in-review with Microsoft, Google, and GitHub researchers

As AI adoption accelerates across the software industry, engineering leaders are increasingly focused on a harder question: how to understand whether these tools are actually improving developer experience and organizational outcomes. In this year-end episode of the Engineering Enablement podcast, host Laura Tacho is joined by Brian Houck from Microsoft, Collin Green and Ciera Jaspan from Google, and Eirini Kalliamvakou from GitHub to examine what 2025 research reveals about AI impact in engineering teams. The panel discusses why measuring AI’s effectiveness is inherently complex, why familiar metrics like lines of code continue to resurface despite their limitations, and how multidimensional frameworks such as SPACE and DORA provide a more accurate view of developer productivity. The conversation also looks ahead to 2026, exploring how AI is beginning to reshape the role of the developer, how junior engineers’ skill sets may evolve, where agentic workflows are emerging, and why some widely shared AI studies were misunderstood. Together, the panel offers a grounded perspective on moving beyond hype toward more thoughtful, evidence-based AI adoption.

Show notes

Measuring AI impact requires multiple lenses

  • There is no single metric that can capture AI’s impact. Developer productivity and experience are inherently multidimensional, requiring trade-offs to be evaluated across speed, quality, collaboration, and meaning.
  • Frameworks like SPACE and DORA help avoid metric tunnel vision. They encourage teams to examine complementary signals rather than optimizing one dimension at the expense of others.
  • Measurement must reflect systems, not tools. AI does not operate in isolation; its impact depends on organizational context, workflows, and existing engineering practices.

Why familiar metrics keep failing us

  • Lines of code remains a deeply misleading metric. AI tends to generate verbose code, making raw output a poor proxy for productivity, quality, or long-term maintainability (see the sketch after this list).
  • More code does not equal better outcomes. Excess code can increase maintenance burden, technical debt, and cognitive load over time.
  • Easy-to-measure metrics are often the most dangerous. Their simplicity makes them attractive during periods of uncertainty, even when they obscure what is actually changing.
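
To make the verbosity point concrete, here is a minimal, hypothetical sketch (not from the episode): two functions with identical behavior whose line counts differ several-fold, which is exactly the gap a raw lines-of-code metric cannot see.

```python
# Hypothetical illustration: two functions with identical behavior whose
# line counts differ several-fold. A raw lines-of-code metric rewards the
# verbose version even though nothing about the outcome improved.

def sum_of_evens_terse(numbers):
    # One line of logic.
    return sum(n for n in numbers if n % 2 == 0)

def sum_of_evens_verbose(numbers):
    # The same logic expanded the way over-verbose generated code often is.
    total = 0
    for n in numbers:
        is_even = (n % 2 == 0)
        if is_even:
            total = total + n
        else:
            continue
    return total

data = [1, 2, 3, 4, 5, 6]
assert sum_of_evens_terse(data) == sum_of_evens_verbose(data) == 12
print("Same behavior, very different line counts.")
```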

The limits of tracking AI-generated code

  • Measuring the percentage of AI-generated code oversimplifies reality. AI may write, delete, refactor, or reorganize code in ways that raw percentages fail to capture (see the sketch after this list).
  • AI-generated code does not inherently signal higher risk. In some contexts, AI output may be more consistent or higher quality than human-written code.
  • These metrics are better used as supporting signals, not goals. They can inform budgeting, experimentation, or adoption patterns but should not drive performance targets.
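
As a complement, here is a small hypothetical sketch (the commit figures are invented for illustration) of how a naive "percentage of lines written by AI" can undersell an agent whose main contribution is deleting code, the nuance raised in the episode.

```python
# Hedged, hypothetical sketch of the "% of code written by AI" pitfall:
# the naive ratio counts only additions and ignores deletions and refactors.
# The commit records below are invented for illustration only.

commits = [
    {"author": "ai",    "lines_added": 40,  "lines_deleted": 300},  # AI strips dead code
    {"author": "human", "lines_added": 120, "lines_deleted": 10},
    {"author": "ai",    "lines_added": 0,   "lines_deleted": 80},   # pure cleanup
]

def naive_ai_share(commits):
    """Share of *added* lines attributed to AI, the metric as usually computed."""
    added_ai = sum(c["lines_added"] for c in commits if c["author"] == "ai")
    added_all = sum(c["lines_added"] for c in commits)
    return added_ai / added_all if added_all else 0.0

def net_delta(commits, author):
    """Net lines contributed; can be negative when the work is mostly deletion."""
    return sum(c["lines_added"] - c["lines_deleted"] for c in commits if c["author"] == author)

print(f"Naive AI share of added lines: {naive_ai_share(commits):.0%}")  # 25%
print(f"AI net delta: {net_delta(commits, 'ai')} lines")               # -340 lines
# The naive share makes AI look like a minor contributor, even though in this
# toy history most of its value came from removing code.
```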

How AI is reshaping the role of the developer

  • Developers are shifting from implementers to orchestrators. Advanced AI users spend more time framing problems, setting context, and validating outcomes than writing raw code.
  • AI fluency is becoming a core skill. Knowing how to guide, correct, and collaborate with agents is increasingly important.
  • Adoption follows a progression. Developers tend to move from skepticism to exploration, collaboration, and eventually strategic use as expectations recalibrate.

What this means for junior engineers

  • Skill development may accelerate rather than disappear. Junior engineers may practice delegation, planning, and system-level thinking earlier by working with AI agents.
  • Technical fundamentals still matter. Understanding architecture, requirements, and failure modes remains essential for supervising AI-generated work.
  • Interpersonal skills risk being deprioritized. Managing agents is not the same as managing people, raising concerns about how collaboration skills develop over time.

AI is not just a productivity tool

  • Creativity and innovation benefit from friction. Research suggests that exposing decision points and seams can create space for new ideas rather than faster repetition.
  • Automating everything is not always desirable. Removing all toil may reduce opportunities for learning, insight, and creative problem-solving.
  • AI should augment thinking, not replace it. Tools that surface trade-offs and choices can support better outcomes than those that simply eliminate effort.

High-leverage AI use cases focus on toil

  • Developers spend only about 14% of their time writing code. Optimizing coding alone rarely leads to large productivity gains.
  • The biggest opportunities lie in removing friction. Documentation, compliance tasks, incident response, flaky tests, and knowledge discovery consistently rank as top pain points.
  • AI excels at work developers dislike but must still do. Automating dull, repetitive tasks can improve satisfaction and free time for meaningful work.

Why leadership and change management matter

  • AI adoption is a human problem before it is a technical one. Organizations that understand developer pain points deploy AI more effectively.
  • Agentic workflows amplify organizational differences. Teams with strong experimentation cultures and feedback loops move faster and with less friction.
  • Culture determines outcomes. How leaders communicate expectations, normalize experimentation, and support learning shapes whether AI adoption succeeds or stalls.

Looking ahead to 2026

  • Task parallelization is an emerging frontier. Developers are beginning to use agents to explore multiple solution paths simultaneously.
  • Collaboration with agents will redefine productivity. Teams, not just individuals, will increasingly work alongside AI systems.
  • Research must evolve with the work itself. New workflows will require new metrics, new telemetry, and new ways of understanding impact.

Lessons from the METR paper

  • Context matters more than headlines suggest. Results showing slower performance often reflected expert developers working in familiar codebases.
  • AI may help most where familiarity is lowest. New domains, unfamiliar systems, and onboarding scenarios show different outcomes.
  • Media oversimplification distorts understanding. Nuance is critical when interpreting AI research, especially as studies move into real-world environments.

Timestamps

(00:00) Intro

(02:35) Introducing the panel and the focus of the discussion

(04:43) Why measuring AI’s impact is such a hard problem

(05:30) How Microsoft approaches AI impact measurement

(06:40) How Google thinks about measuring AI impact

(07:28) GitHub’s perspective on measurement and insights from the DORA report

(10:35) Why lines of code is a misleading metric

(14:27) The limitations of measuring the percentage of code generated by AI

(18:24) GitHub’s research on how AI is shaping the identity of the developer

(21:39) How AI may change junior engineers’ skill sets

(24:42) Google’s research on using AI and creativity

(26:24) High-leverage AI use cases that improve developer experience

(32:38) Open research questions for AI and developer productivity in 2026

(35:33) How leading organizations approach change and agentic workflows

(38:02) Why the METR paper resonated and how it was misunderstood


Transcript

Laura: Welcome to another episode of the Engineering Enablement Podcast. I’m Laura Tacho. As we close out 2025, let’s take a moment to reflect on the leaps that our industry has made in the last 12 months. If we rewind to the beginning of the year, AI usage was actually pretty spotty across the industry, and a path toward consistent industry-wide adoption was just starting to come into focus.

Tools like Kiro and Claude Code hadn’t even been released yet. And now looking at the industry in December of 2025, 90% of developers are using AI tools at least once a month to get their work done. And with all of this growing adoption comes a growing focus on impact and more questions about how AI is really impacting developers, our organizations, and the software that we build.

To discuss this, I invited Brian Houck from Microsoft, Collin Green and Ciera Jaspan from Google, and Eirini Kalliamvakou from GitHub to share some of their research from 2025, share their takes on AI impact, and talk about the biggest questions that they’re trying to answer in 2026.

Welcome, everybody. My name’s Laura Tacho. I’m the CTO at DX. I am joined by a fantastic group of panelists today. What we are going to talk about is a year summary of AI research. We’re going to talk about measurements, we’ll talk about data literacy and what you can do to better understand these kinds of reports when they come out, we’ll talk about the most interesting things, things that are on the radar for 2026.

Before we get into that, I want to invite each one of my guests here to introduce themselves really briefly. Brian, I’m going to start with you just because you’re on the top of my screen.

Brian: Awesome. Thank you. Well, thanks, everyone, for having me. My name is Brian Houck. I work at Microsoft. I work in what’s called our Engineering Thrive division. And our team’s mission is to make it easy for Microsoft developers to do great work quickly, and so how do we measure and improve all of the sources of friction that get in the way of us doing our jobs?

Laura: Awesome. How about Ciera next?

Ciera: Hi, everyone. My name’s Ciera Jaspan. I’m a senior staff software engineer at Google, and I lead the engineers in the developer intelligence organization. So we are in charge of trying to understand what makes developers happy and productive and guiding the tools, framework, and process teams at Google to try to improve productivity.

Laura: Awesome. And you’re joined by your colleague, Collin, which, Collin, if you wouldn’t mind introducing yourself as well.

Collin: I’m Collin Green. I’m a senior staff UX researcher at Google. I work with Ciera in the developer intelligence organization. Ciera mentioned that she runs the engineering side of the team. Our team has a UX research side, composed mostly of behavioral and social scientists, so we study the human component of the issue that Ciera was mentioning. We try to make developers happy and productive at Google, and we use a bunch of different methods from behavioral and social sciences to help do that.

Eirini: Hi. I’m Eirini Kalliamvakou. I’m a research advisor at GitHub, which means I oversee and lead some key research initiatives. The common denominator is always trying to understand how developers are working, how they’re thinking, and channel that into our prototyping of new solutions, our evaluations, our strategic narratives. So it’s always very interesting work. Most recently it has been focused on AI and its impact, but it always has this theme of productivity and happiness for developers.

Laura: And what a year it’s been for AI and research and measurement and impact. I think nobody on this call is bored at work. That’s definitely for sure. There’s always something absolutely changing. Sometimes it feels like minutely, hourly, definitely.

To start our conversation off, I do want to talk about just this very big problem of measuring AI impact. So I think by this point, we’ve got 90% of developers or so using AI tools at least monthly, weekly, many of them daily. And developers themselves, leaders, executives are left with a big question of like, how do we actually know if this is having a positive impact or a negative impact? And so we need measurement in order to establish that.

Before we get into the really fine-grain details of what metrics to look at, I wondered if we could zoom out a little bit and talk just conceptually about the approach to measurement. What is a good way to approach measuring AI impact?

And Brian, I want to kick this off to you to start with, if you wouldn’t mind sharing your approach on this, because I think you’ve done some interesting work on the measurement problem, both for AI, but also just very broadly across developer productivity.

Brian: Yeah, absolutely. So I am a co-author of the SPACE framework, and so I have a very particular view that developer experience is incredibly nuanced, and that it requires us to look across lots of different dimensions. And so there is no one measure that will ever tell us whether AI is making a difference. We really need to look at how it is changing the speed of our throughput, how it is changing our efficiency, our ability to collaborate and work effectively with our peers. What is it doing to the sense of meaning we get from work? And so when I think about designing specific metrics, I really start with this notion that we have to look at it from multiple angles, complementary angles, to make sure that an improvement in one area isn’t being counteracted by a decrease somewhere else.

Laura: I would love to find a place where we all disagree, where we have differing opinions, and maybe that will come, but it seems like that’s also the approach that you both are taking as well at Google.

Collin: I think that’s right. And this notion of having trade-offs and understanding the trade-offs that you’re making when you introduce tools or interventions is really important. I think the other bit, Brian mentioned SPACE, and I think it’s important to focus on developer experience as a holistic thing, not just productivity. Productivity is obviously very important, and we want our developers to be productive, but we also want to pay attention to things like how is using AI de-skilling/re-skilling developers. So those like longer trajectory things, how do we build an organization that’s not only productive, but is sustainably productive and is going to grow and mature along with the tools and continue to be productive 5 and 10 years in the future. I think that’s part of the picture as well.

Laura: Eirini, what does this look like for you at GitHub? But then also you contributed to the DORA report, which is a huge cross-cutting research project that was very focused on AI. How does this theme of what to measure show up for you?

Eirini: I think there are different pieces of the whole puzzle, and we haven’t necessarily even seen the full spectrum of impact yet. So a lot of the time, I fully agree with Brian: multidimensionality is the way to go, for sure.

In the DORA report this year, there was a lot of focus on the more traditional metrics like throughput and stability and so on. And we saw throughput starting to go up, which was different from the year before. Instability is still there.

An interesting thing from the 2025 report was the capabilities in the organization that amplify AI’s benefits to team performance, to organizational performance, and so on. And that was a little bit unique, to have a model of these seven capabilities.

And a lot of the capabilities sound very familiar, like working in small batches or having strong version control practices and so on. But other capabilities, like how clear the AI policy is in an organization, are starting to emerge as more consistent levers when we’re trying to influence the impact that AI has, and then when we’re also trying to measure whether it achieved what we wanted.

I also very much think about how developers are experiencing the difference in how they work, and we’re starting to see that emerging as well.

We did a study that was based on trust, trying to understand and describe how developers are building trust in AI tools. And I think that there are a lot of interesting points there about what it is that developers look at … how they judge AI tools and whether they trust them, because in the end that predicts whether they’re going to adopt them and how much they’re going to use them as well. So a lot about, is the output good, is the tool helpful, is it predictable? Which is, I think, a big, big developing story when it comes to AI tools. These were all factors that contributed to the intent to adopt. So it’s all part of the same puzzle.

Laura: Yeah, thank you for sharing those answers. It sounds like we’ve landed in the same spot with measuring AI impact as we did with developer productivity. And my opinion is they’re not different. We have to look at multiple dimensions in order to find the ROI of AI or to find the impact. And the way that AI works is by applying it to a system. And then we have things like the SPACE framework, like Core 4, whatever other metrics framework to measure the system outcomes. And then we can make a decision about whether AI is impactful or not.

So we’re now at a point where we have a lot of adoption. We’re talking about multidimensional measurement strategies. I wonder if any of you have an example of a metric that you’ve seen become more popular recently that you find to be interesting or quite good, or even maybe the opposite, not very good, that maybe wasn’t talked about at the beginning of this year. Is there a new metric for AI impact or adoption? Brian, I see you laughing. I think you have an answer, so I’m going to call on you first.

Brian: I do. It’s like I want to pull my hair out, because I feel like all of a sudden we’ve all lost our minds and forgotten what a thoroughly incomplete metric lines of code is. Everywhere I look, people are talking about lines of code as a credible metric for the impact of AI. And the verbosity of AI lends itself to writing a lot of lines of code. That may be good, that may be bad, but in and of itself, that is a metric that I cannot believe we as an industry are letting trickle back in.

Collin: I was just watching Martin Fowler on the Pragmatic Engineer, that last episode, and he was reminding me about this notion of lines of code spent rather than lines of code produced. And I think that’s a great counter to the notion of counting lines of code as a measure of productivity. You’re building things and deploying them that have to be maintained, that will depreciate and decay, that will introduce overhead and technical debt. So you should really think carefully about whether you actually want to collect these things or whether you just want to spend exactly as many as you must.

Ciera: I didn’t even think of the lines of code, because I’ve been putting such a moratorium on it at Google. I’m like, “We’re not doing that.” But it’s not just industry; academic papers are doing it too. I was just reading a paper where they were using lines of code, and I’m looking at it going, they took a program and they translated it over with AI versus humans, and the AI version had 10 times the amount of code. Sometimes one of the programs had 20 times. And I’m like, “Really? How are we thinking that, therefore?” And they’re like, “See, look, it wrote more code, it’s better.” I’m like, “But it was the same program to start with.”

Laura: Wow.

Ciera: It was astounding to me that we’re still using this.

Laura: Why do you think that is?

Eirini: It’s the easiest thing to measure. And I think that that has always been the default behavior when it comes to measuring productivity for engineering is it’s the most visible output and it’s the easiest to measure. You don’t even have to have very complicated telemetry or sophisticated frameworks in place to capture that. So I think that’s just the curse of it is just so obvious and so easy to measure that it draws the eye.

And while we’re in this more intermediate phase, especially with AI, I definitely believe we have good, solid multidimensional frameworks already to guide us, but there’s also the admission that the work itself is changing. We are moving more towards developers actually writing less code, being focused less on raw implementation and more on writing specs, for example.

So the nature of the work is changing, making a lot of these metrics completely irrelevant, but we also don’t have the new set of metrics fully figured out yet. And I think that that gap in the middle is where everybody falls, and it’s like, “I guess I’m going to measure the thing that is easiest to reach for and it’s fastest to turn around.”

Laura: I think that’s part of just a normal human pattern in all cases. It’s like we’re in this amorphous, messy middle part where there’s a lot of unknowns, and so we just reach back to the things that we feel are extremely controllable and very concrete. And unfortunately, lines of code is that. So lines of code, we all agree, not a great metric.

I wondered if any of you had opinions or have seen interesting examples of how this plays out in a research scenario or real-life scenario, measuring percentage of code generated by AI. It’s different than lines of code, but what information might that give to an organization? What decisions could they make with that?

Ciera: It’s interesting because there is an assumption built into that, that AI equals risk. And maybe that’s true. I don’t know that we actually know for certain one way or the other. For some organizations, I can imagine that maybe the AI produces better code. If you’ve got a whole bunch of very new engineers who are still struggling, maybe AI for some organizations is actually better. I don’t know that it necessarily means risk. I think it’s an interesting number to be aware of, but I’d rather see us focus on things like, how do we manage risk overall in general, rather than just risk from AI, because we can have risk from other places as well. We can have risk from people not knowing the critical requirements. We can have various compliance risks that might just as easily be missed by humans. We can have other types of automated agents that are not AI-based that are also increasing risk or decreasing risk. And I think I’d rather see us understand what risk is in general and be able to measure those different types of risk.

Collin: I think there’s also a challenge just with the notion of the percentage of code in production that was written by AI. I mean, some lines will be modified by AI, some lines will be added by AI, some will be deleted by AI. Is it even a meaningful metric to say X percent of the lines of code in production? Maybe you have AI that’s really good at stripping out dead code, so 0% of the code in your production is AI-generated, but in fact, AI is having a very big positive impact. There’s a lot of nuance there that’s lost with a metric like that.

Brian: I think there’s also this difference between having a metric, which implies that you might have goals against it and you’re tracking progress over time, versus it being interesting supporting information. And so knowing your AI percentage does help you look at things like, well, are we starting to leverage AI in new projects, in new interesting ways? Are we growing that usage? Even doing things like helping understand budgeting and cost, because tokens aren’t free. And so looking at, are we able to generate more AI code relative to our spend? Those are all interesting things, but I wouldn’t make metrics against them necessarily.

Eirini: It’s certainly not the metric.

Laura: Yeah, like a metric, not the metric. And I think that’s something that you’ll probably hear from all five of us: all of these measurements are a number and not the number to focus on. And I think that’s where most of the damage is done, when we try to distill these very complex systems with so much nuance down to a number to put in a slide deck, because that doesn’t serve anyone. It’s incorrect information for the requesting executive or the board or whoever you’re trying to … and it really doesn’t serve the developer population either, because it doesn’t accurately reflect what’s going on.

I was at an event yesterday, and someone from … I believe Meta shared that … Well, first of all, the phrase generative AI is just as good at deleting code as it is at generating code. And I think that’s an interesting way to frame it. I don’t think that’s really talked about.

And they said they had some agents remove dead code, and they actually reduced their security incidents by 90%. And that’s a really interesting use case for generative AI, which isn’t really the use cases that are talked about often, which is about generating huge swaths of code. So I think there’s just so much surface area that’s underutilized, use cases that are untapped.

I wanted to get into some of the research that you’ve all done this year, because it’s been a great year for AI research, developer productivity research. There’s a lot of new ideas. Eirini, I want to go to you first. You had this paper about the new identity of a developer and what’s changing and what’s not changing in AI and how skills are a little bit different. I wondered if you could share just some kind of highlights from that work.

Eirini: This was a qualitative piece of work, because I feel that we need to understand some of the core parts before we start quantitatively trying to track and measure things. And we interviewed developers that are advanced users of AI. There are details in the blog about what the definition of that is.

But the idea was to start seeing those that are a little bit further down the line of AI adoption and they’re working … have gone the furthest working with AI and how they are working with it, what they’re getting out of it.

So part of what we found was that the work is very different. Developers are describing that they’re focusing not on the raw implementation, as I hinted earlier, but they’re focusing more on what happens before they use agents a lot. So what happens before they ask an agent to implement something? So how do they set it up for success? A lot of context setting, setting guardrails, going back and forth with agents until they have a good, solid plan for the implementation, and then coming back after agents have implemented a code solution to verify. So a lot of testing and review and things that we’re all very familiar with.

The other thing that was interesting was that this is something that happens as a progression. So the developers that we interviewed have been working with agents and have very sophisticated setups working with multiple agents and parallelizing tasks and so on, but they didn’t start like that; they started as being very skeptical about AI, trying out very simple features at the beginning. And through frequent use and trial and error, they go through stages that we documented, from skeptics to explorers to collaborators to strategists.

Everything changes during that progression. The expectations that they have about what accuracy they should expect, what iteration is involved, how one-shot success becomes not a thing that is expected. So all their expectations adjust, their sentiment also moves, but it’s the feeling that the work is very different. So they’re starting to see themselves not as the ones that are producing the code, but the ones that are orchestrating everything else so that the code is produced.

Definitely has implications for skills. And when we’re thinking about organizations and what skills they’re trying to teach and enable in their developers, that also gives us some hints. Some of the fundamentals are definitely equally important, but there’s a lot about AI fluency, about product understanding, about orchestration, about the different ways to collaborate with agents or with other features and other modalities, system design. So very new skills that developers will need, but also some skills they already have that continue to be very important.

Collin: It’s so funny, so much of what Eirini just said, you could swap out agent for brand new developer, and you’re talking about the role of an engineering tech lead. Ciera could probably speak to this better than I can, but a lot of what you just described is a senior staff engineer’s job to coordinate junior engineers, help them see the big picture, help things fit together.

Eirini: And another thing that was mentioned was also it’s very similar to managing others, which I guess the staff and the really senior engineers do manage the effort of others and guide the effort of others, for sure. But people were likening it to being a manager, and the idea that you don’t just give a one-liner description to someone and expect that they’re going to be very successful in delivering the outcome, it takes a lot of setting them up for success. And of course, you have to check their work afterwards as well.

Ciera: Well, I was just thinking about, I know some people have been concerned about this, because they’re afraid that because you might need those skill sets that we’re going to put ourselves in a situation where junior engineers aren’t going to learn the skills that they need in order to become a tech lead, to become a senior engineer.

There’s also the way of looking at this that maybe they are actually going to learn it earlier, because they have to now manage this AI. And I don’t know that we’re going to know right now which way that’s going to go and how to help junior engineers learn those skillsets early, learn how to use AI, and then how to transfer that towards then becoming a senior tech lead. I think it’ll be interesting to see how that goes though.

Laura: I think that’s extremely interesting and a perspective that’s not shared very often, because there’s a lot of replacement of junior engineers, the death of the junior engineer, and now we’re not going to have any seniors. But the opposite could be true that now juniors can level up much faster. We see that AI helps people onboard, but to your point, Ciera, of helping juniors delegate tasks, set clear expectations, communicate about the problem very clearly. This is something they’re going to have to do much earlier in their career, where previously, they were relying on their tech lead and their seniors to have a really complete ticket so that they could work on a scoped task. 2026 will definitely be interesting in that regard.

Collin: I also wonder which pieces of what junior engineers learn now and what senior engineers do now might be lost. So a lot of the communication skills, understanding how to work with people, those same skills are not required to work with agents. You can just bluntly delegate a task. You don’t have to describe it to a person or explain why it’s important, necessarily. They’re just like, “Oh, I do what you tell me.” So I wonder how it will change interpersonal dynamics and the way that people mature as professionals outside of their technical skills.

Brian: Again, it’s exactly why you want to measure across lots of different dimensions, so that you can capture where leveraging autonomous agentic AI may hurt the way that we collaborate and interact with our peers. Because at the end of the day, still, the number one most frequently cited workplace challenge amongst software engineers is that they are missing social interactions with their peers, and we don’t want to exacerbate that even further.

Laura: Collin, I want to go over to you next because you did some research on AI tools for creativity and not explicitly productivity, which I think continues this conversation about the non-technical skills that are important to work with AI. But I thought that paper was so interesting, because we often see AI tools, especially in the software engineering space, pitched as a pure productivity gain and not really for other purposes like creativity. Could you share a little bit about what that paper covered?

Collin: Sure. I mean, first, I want to defer to my co-authors. I was an advisor on that paper, but a lot of the heavy lifting was done by Sarah Inman and Sarah D’Angelo and the others who worked on that paper. So just credit to them where it’s due.

I think the notion that AI has to be only for productivity, it’s just narrow. So in a bunch of different disciplines, we’ve used automation or we’ve used tools to help enhance what human capabilities can do. And so I think the notion of that paper was that sometimes when you expose the seam, sometimes when you expose the decision-making, you expose the nitty-gritty of a task. That’s actually an opportunity for people to think about how to do things differently, how to challenge an assumption or how to make a different decision. And that’s space for innovation.

So I think the idea that you automate away all the decision-making, automate away all the friction, all the seams, runs contrary to that. If you really understand a problem, you understand the considerations that went into decision-making along the way, and you understand where the choices were made, you have room to do things differently. Automating all of that away doesn’t necessarily result in a better product; it just results in a faster path to the same thing. So viewing innovation or creativity as things that you have to toil a little bit to get to, I think, is the thrust of the paper. And there’s a lot of research on creativity, broadly speaking, that supports this notion.

Laura: There was a question from the audience about what the killer AI use cases are now. Creating code is not where devs spend the most time. There are meetings, floods, friction, bad planning; there are so many other things.

Brian, you had … I mean, we spoke briefly about, or maybe not, more than briefly about bad developer days and some interesting ways that Microsoft is thinking about, what are the ways that we can find the best use cases? You also had the recent paper on where AI matters, how we can support developers in daily work. Can you talk a little bit about where do we find the most high-value use cases when it comes to maybe more the productivity angle, not the creativity angle like Collin was just speaking about?

Brian: Actually, quite a bit to unpack in that question. But obviously, the breadth of what software engineers do today is massive. I think a wildly misunderstood thing about software engineering is that it is rolling up your sleeves and writing code. And while we as engineers really enjoy that, and it gives us a lot of meaning and fulfillment, only about 14% of our day is actually spent coding.

And I think with AI, we often are talking about the AI coding assistants as accelerating that coding task, something we already like doing. And while we are willing to use it for that, I think a really interesting case for AI, one that is going to make us enjoy our jobs a lot more, is how we can have it start automating away our toil. Those tasks that we don’t find meaningful, that we don’t actually think of as the things that make us software engineers, that aren’t part of our identity. Things like dealing with compliance burden, mitigating service incidents, dealing with bureaucratic paperwork.

And so you mentioned this metric, bad developer days, which actually took inspiration from an idea that was started at Google, which is for us a real-time, telemetry-based measure of developer toil. And so one of the things that I am looking at is, as we have agents that in the background can go and start pulling these work items off of the stack and addressing those tasks that we really just don’t want to do. When we wake up in the morning and we see our task backlog and it’s like, “Ugh, today’s going to suck.” I want to see, can AI agents start pulling those things off? So we don’t even know they exist anymore, because just magically in the background, we have someone who doesn’t care about doing them do them all for us.

Collin: And Ciera will tell you this too, but I have a pet peeve that people don’t look enough at the automation literature from the last 40 years when they’re thinking about AI. And in robotics and in automation, they talk about automating away dull, dirty, and dangerous jobs. And I don’t know exactly what the equivalent is for software engineering, but some near analogy to that is it should be a guiding principle for why we seek to use AI.

Brian: You actually said something really interesting there that I want to touch on: it’s also a good reminder that we don’t always even need to use AI to do this. There’s just good old-fashioned algorithmic automation. We just need to remember there are things that we don’t want humans to have to do. And whether AI is the way to solve that or just good old-fashioned automation, we should look at how we remove all of the things that prevent our developers from focusing on delivering new innovation.

Ciera: Collin, you said we don’t know, but we do, though. We have our quarterly survey where we ask people questions like, what is the biggest blocker to your productivity? And we’ve got a list, and it’s pretty consistent: every single quarter, it’s the same things that bubble up at the top: technical debt, documentation, flaky tests, and I can’t figure out how to learn this new technology.

And to Brian’s point, I completely agree that sometimes automation … we should automate the things that are easy. These four things keep bubbling up because automation hasn’t solved them yet. We’ve been doing this for decades, and we still haven’t figured it out. Those are the opportunities for AI: the ones where we know it’s painful, we know nobody likes to do this work, it’s important work, which is why we keep having to do it, and we haven’t figured out how to automate it yet. That’s a huge opportunity in all of those.

Brian: I’m actually running a survey right now where less than 1% of developers say that they never struggle finding good documentation. That is such a place that consistently has been a top pain point. I have multiple studies out this year that say documentation is the number one or number two thing that devs want AI to help with. And it’s like, yeah, it’s not a core dev task that we think of, but it is a consistent pain point and always has been.

Collin: We’re actually seeing in our quarterly survey some improvement in some of our knowledge management tasks as a result of …

Laura: Awesome.

Collin: … the launch of LLMs to help developers navigate documentation. The interesting thing that comes along with that is a little bit of a pain point around, am I sure that the LLM’s not hallucinating about that documentation or misreading it or misrepresenting it? So that usual twist that comes in when someone else is summarizing for you, but we’re seeing improvement across a number of tasks like finding the documentation that does exist, knowing whether documentation exists, et cetera.

Brian: Similarly, within Microsoft, I think one of the most successful actual real-world AI tools that we have is helping navigate all of our engineering system documentation. It’s one where you can just measure the amount of minutes saved, and it’s meaningful.

Laura: To go back to that big hairy question that I asked you, Brian, about finding the use cases. I mean, this is a great example of taking your developer productivity data, which is multidimensional and looks across your whole software delivery life cycle, so you have many different signals. Finding where the friction is for developers, and then seeing if AI is actually a good solution to that problem. That’s how we really increase velocity, reduce toil, and help developers move faster: by solving the problems that they have.

And in a lot of cases, we know what the problems are, and we can’t just grasp at AI just for the sake of a shiny new tech; we have to really stay focused on solving real developer problems, and then ask ourselves, how do we get that data? Which goes back to the very beginning about multidimensional measurement.

I want to ask a little bit more about 2026 and research themes. Eirini, what are you thinking about right now? What are you curious about? What is an unanswered question for you around developer productivity and AI?

Eirini: It’s, I think, a little bit of a twist on developer productivity, just because I’m coming out of this work that showed me that the work of a developer is changing. And I think that has two parts to it. One, I see, and this is a frontier behavior at the moment, but depending on how it’s supported by tools, it can become a lot more mainstream, which is the idea of developers parallelizing tasks and being able to use agents to leverage that as a way to experiment or try out and prototype different ideas before they commit to a particular solution.

This is a hypothesis at the moment that task parallelization is something that tools can support, that it has benefits for innovation, fast prototyping. And so that’s one thing that I want to focus on is understand that task parallelization behavior and see what its impact is for productivity, for accelerating innovation, experimentation, and so on.

The reason why I said it’s a little bit of a twist on productivity is because I suspect we probably have to design and capture different telemetry, different metrics, in order to capture that behavior and be able to operationalize it. That’s one part of it.

And the other one is, as we mentioned earlier, I think it was Brian who said people don’t want to lose the collaboration skills and the social interaction.

At GitHub, we’re working with the idea of collaborative workflows with AI agents. It’s still early days, but the idea that you have teams working with agents, not just individuals working with agents. And I think that that’s going to be a fascinating way to look at collaboration, workflows, dynamics when agents are put in developer environments or environments where developers collaborate. And that also will have implications for how do we define productivity, what is task completion when you have multiple people working with agents and they parallelize things and so on. So I feel like I find that very fascinating. That’s where I’m headed towards.

Collin: This notion of teams collaborating with an agent, I know that some teams at Google are starting to set some team norms about how they interact with their code base in the form of standard prompts so that every agent that’s deployed to their code base follows a certain set of prescripted prompts that helps standardize the way that the code is written.

Laura: What is your perception, individually and collectively, of companies adopting agentic workflows?

From my point of view, I see 2024 and 2025 as really the IDE code completion, agentic IDE era. And I think 2026 is going to see more agentic, human-out-of-the-loop workflows. This is the trend that I’m seeing. I wonder if you’re seeing that as well in your orgs and in your research.

Ciera: There’s now interest in identifying what the tough workflows are and targeting agents specifically at some of those workflows.

Brian: Going to something you hinted at, Laura, I do think what you’re really going to see is who are the organizations that have a good pulse on what their top developer pain points are? Because otherwise, you’re just throwing spaghetti at the wall and seeing what sticks. And so I think you’ll see that those organizations that have really invested in understanding their developer experience are going to more efficiently and effectively be able to go and deploy agentic AI solutions. It is going to get easier and easier to design and build and deploy these AI solutions. And unless you know what it is you’re trying to solve, then you’re not going to find a whole lot of success. And I do think you’re going to start seeing a pretty wide split between those two kinds of organizations.

Laura: Yeah, back to basics. First principles, find the real problem, and apply AI to the problem, and not just for the sake of technology. I often describe AI as a change management problem with interesting technology, not a purely technological problem. Organizations that have the best results with AI tools are the ones that have really strong experimentation, problem-solving, and a really good testing culture, all of those things. AI isn’t going to be able to change all of those things overnight.

Brian: And I do think, speaking of change management, it also comes down to being able to do culture change. As Eirini was talking about, fundamentally, the way we do our jobs is evolving. And there’s a very real human aspect to that: how do we help all of the individuals within our organizations navigate that change and feel positive about it? And I think you’ll start seeing big differences in outcomes based on that.

Laura: I wonder if one of you has a favorite research paper or something that you found extremely interesting that came out this year that you want to mention what was interesting or captivating to you.

Ciera: Okay, I’ll go there. The METR paper definitely made a splash.

Laura: It definitely did, yeah. I’m actually surprised we made it this long without bringing it up, and now maybe we could have filled the whole time with it. Why do you think it made such a big splash?

Ciera: I mean, okay, there were a couple of aspects. First, I thought it was very well done. It was one of those ones where I saw the headlines in LinkedIn posts, and I was like, “Ah.” But then I went and actually read the paper and was like, “Oh, oh, they did a good job.” This is actually [inaudible 00:39:27] the headlines were not nuanced, but the paper was more nuanced. And I love that.

I think it made a pretty big splash because I think it did resonate with a lot of people’s personal experiences, which are not always the same as what the headlines before were saying. And it’s because we were looking at different things across these different studies. Some of the studies are lab-based. A lot of the early studies, in fact, on AI are lab-based. And that’s the appropriate thing to be doing when you’ve got an early technology, you’re still trying to explore what’s going on here.

We are now starting to shift to being able to study AI in situ. What happens when a real developer is using it, not just on a real problem, but on their problem? And I think that’s where there are possibly some interesting differences. In the METR study, they were looking at developers working on their own code bases, solving the problems that they wanted to solve, which means those developers had really high familiarity. It is quite possible that for that set of people, AI was not actually helping but was hindering, because they had to interact with the prompt. And you see this in their data: developers spent more time interacting with the prompt and more time trying to understand the prompt, even though they did spend less time coding. But overall, the numbers made it so that AI was slowing them down.

But that is different from a use case where an engineer is working in a code base they’re not familiar with, either because they are a more junior engineer or because, like one of my teammates the other day, he needed to add something to another team’s code base, and he was like, “Well, I could go ask them, and I could go read all their docs, or I could go to see how AI does with this.” And he’s like, “Yeah, it mostly worked.” And that does save him time, because it’s something he’s not familiar with. So I think this is where we’re going to start seeing more of that nuance of like when AI is more useful and how we can use it appropriately.

Laura: Yeah, absolutely. I think that’s also an interesting lesson to close out on, a note about media literacy: as you mentioned, Ciera, the way that this got covered in the media and on LinkedIn was really, really simplified, almost oversimplified to the point of being incorrect. It makes a good headline though. So the takeaway from that is we’re probably going to see more headlines like that, more facts and figures about company X or company Y, especially as we start to study things out of a lab scenario and on real problems, their problems. We’re going to see a lot more numbers.

And it’s so important, for all of you listening who are obviously interested in research, that’s why you’re showing up here, to get curious about what those numbers are, where they’re coming from, who’s doing the study, what they’re trying to sell you, if anything, and how that might influence what the story is. Just read through things and find the nuance, because it definitely always is there. Nothing in this space is simple. And if it’s presented to you as simple, then it’s probably incorrect; I would go so far as to say it’s oversimplified.

Thank you all for joining this conversation. I think it was really valuable and really interesting.

I want to share that DX is having an annual conference. The inaugural one is going to be in April, April 16th at Pier 27. Brian will be there, Collin will be there, Eirini will be there as well. Ciera is going to be off gallivanting around the world at that time, so she can’t join us. But if you’re interested in this conversation on a much bigger scale and in person, where we can have a more high-bandwidth exploration and conversation, then definitely check out dxannual.com.

We’ll see you around.

Collin: Thank you so much.

Eirini: Thank you so much.

Brian: Thanks, everyone.

Laura: Bye.

Ciera: Thank you. Bye.