Erinn Schaal:
Welcome. Thanks everyone for joining.
The discussion today is going to be hosted by Jesse Adametz, who is a Senior Director of Platform Engineering at Twilio. He’s also a longtime DX customer and a very thoughtful industry voice. He’s been on our Engineering Enablement podcast before, so you might recognize him. He’s going to be guiding this conversation with Abi, AMA-style, around questions that we’re hearing from you all, and from the industry in general, about the evolution of AI’s impact. With that, I’ll pass it over to you, Jesse, to kick off this conversation.
Jesse Adametz:
Yeah, I feel I’m here to get as many questions answered, even just for myself. Things have been moving very quickly for, well, obviously, a year plus, but in the last couple of months there seems to have been this click, maybe around background agents or this conversation about ROI, so suddenly it feels like pieces fell into place very quickly, and now we’re all asking the same questions. Maybe let’s start there, Abi. It’s gone from a lot of leaders like us saying, “Cool, how do we adopt this?” But adoption seems to have moved very quickly, and now we’re all saying, “How do we finish the rollout, or enable everybody to do more of the same things faster?” And then, of course, you’ve got people seeing the adoption and saying, “How do we measure it?” What are you seeing from leaders about how they’re applying AI across the SDLC? I think we’ve moved on from code gen specifically, or autocomplete, so now it’s: where else are we seeing it in the SDLC?
Abi Noda:
First of all, I want to acknowledge that with every leader I talk to, the impression I leave with is: this is an unbelievable time to be a developer productivity or platform leader. Never before has there been this amount of change, this level of attention and focus on developer productivity. It’s a stressful and very demanding time to be a leader in this space, but also a very exciting one, with a huge abundance of opportunity to steer our organizations. When I talk to leaders, I would say, first of all, this question of what the future SDLC looks like, what lies beyond just code tools, is the question everyone is trying to figure out. When I go meet with customers, they share their pitch decks, their vision docs, their 12-to-36-month hypotheses for how they’re going to incorporate AI across the SDLC, but it’s very much in the visioning stage. Everyone is trying to figure out what this looks like 12 months from now. No one has it nailed.
A few of the themes that I pick up on: one is looking at the wider SDLC, and I use the term wider to mean spanning more of the SDLC process, so areas like review, planning, and prototyping. Those are themes that come up when I talk to leaders in terms of where they’re seeing new bottlenecks emerge and new opportunities to have a greater impact on productivity. Another theme I hear about is developer experience. Especially when I talk to DX customers, they’re looking at the data on where the bottlenecks and friction points are for developers, and asking how to leverage AI to make a dent in those areas, whether that’s codebase maintainability, deep work, or documentation, which, of course, is also critical for reliable AI.
Developer experience is another theme. And then lastly, you already touched on this, Jesse, in your intro, but background agents, which go by many names. I’ve heard everything from async engineering, agent-driven engineering, background agents, and autonomous engineering, to full agentic engineering. There are a lot of different labels for this, but generally, the idea is: how can we unleash agents outside of the human-nudging approach of working with AI, more in the background, more proactive, more autonomous? How can we give more work to agents without being bottlenecked by human workflows? Those are the themes I’m hearing about.
Jesse Adametz:
The review one is a big one for me, or for us, that I’ve seen a lot lately. To your point, we saw it come up in our DX data, for example: it turns out that generating the code wasn’t actually ever the problem. We probably had code review problems a long time ago, but now they’re being amplified. In our most recent snapshot, code review is now the bottleneck. It’s interesting to see. This is one of those hindsight-is-20/20 things, I guess, but for those who read The Phoenix Project and things like that way back when, with the analogies to assembly lines and just moving the bottleneck from one station to the other, it’s all rushing back, and I’m like, “Oh, look, that’s happening.” So yeah, it’s super interesting.
If we maybe go one step back, we talked about this shift from adoption, but I imagine there are still lots of folks catching up. It’s easy to feel left behind in the conversation even week to week, honestly. I’m sure some people are thinking: how do you ensure the company is ready for AI? Is there such a thing as ready? Is there a checklist you work through before you attempt mass enablement, or do you just let it happen and then catch up? What are you seeing there?
Abi Noda:
I think this is a really relevant question for this audience, because this is where the bulk of the work and opportunity lies for platform leaders and developer productivity leaders. It’s true, a lot of leaders I talk to will lament that their environment just does not seem well-suited or ready for AI. What they typically mean by this is a lot of the things that we associate with developer experience. They’ll say, “Hey, we don’t have standardized development environments that actually have the necessary tools for agents to produce good code. We don’t have good feedback loops, things like CI, tests, and security guardrails in place to allow agents to produce code that feels safe. We don’t have documentation about our software systems, our different repositories, and how these systems interrelate.”
When you have an agent try to work on a feature, it goes and spins up Kafka, even though you already have RabbitMQ in place. I’ve heard a lot of leaders call this AI readiness, and the irony that often comes up is that these are the same things we’ve been talking about for a long time in terms of developer experience. Those same bottlenecks are bottlenecks for AI to be reliable and effective. So this is an area I see a lot of organizations reinvesting in now, reframed as agent effectiveness or agent experience. But it’s a journey, no different from the journey many of us have been on to improve these areas for developers. Tackling them in this new world still takes time. It’s really a transformation everyone needs to undergo to get their environment into a state where AI can work reliably.
Jesse Adametz:
I imagine, tangential to that, when you’re getting ready to enable, or you’re catching up from feeling left behind, tool assessment feels like a thing that a lot of this audience has probably either gone through or is going through. I’m curious: is the important thing which tool you use, like Claude Code versus Cursor versus anything else? Is the differentiator how people are using the AI tools, or the tools they’re using? What have you seen there?
Abi Noda:
It would be incorrect to say that there aren’t material differences between the tools. It’s also true that the tools are evolving so quickly that there’s no advice I could give on this call right now that would hold true for longer than probably 10 days. Generally speaking, I don’t think success hinges on the tool you choose. The models continue to improve, and I’ve heard leaders ask, “Look, what can we do other than sit around waiting for models to improve?” I think that’s the question for all of us: what can we do to increase the leverage we get from these tools? That ties back to the things we talked about, making sure the context and the environment that AI works in is as rich and effective as possible. There are also other things we can do in terms of enabling teams and developers to learn how to use these tools effectively, and tools we can build as platform leaders to deploy them and incorporate them into the SDLC seamlessly.
There are a lot of things we can do, but no, I don’t think tool choice is the most important decision right now. All of us on this call understand where the tools are at; it’s not quite fair to call it a three-horse race, but the landscape, and the rate at which models are improving, has stabilized to a certain extent. I think it’s time to look at what we do beyond those tools.
Jesse Adametz:
Yeah, you said something there that resonated, about the rapid improvement of the models. I saw recently, and I think we’re feeling it too, that where enablement for us a few months ago might’ve looked like, “Hey, how do we roll out a Twilio-wide CLAUDE.md-type file to embed our standards and give everybody a leg up?”, we’re now seeing that the better pattern is: every time the model increments, delete the CLAUDE.md and start over, because the instructions are stale. That’s wild.
Tangential to that, in the enablement part, and talking about the speed of it, what comes up a lot is investing in education and knowledge-sharing. What I see so far is that there tend to be these folks really far on one end of the spectrum, whose monitors are not big enough to run all the terminals they have adversarially coding against each other. But then you’ve got the other folks who are like, “Ah, I prompted six months ago and it wasn’t great.” The middle ground is very sparse. So then you start talking about, “Okay, well, how can these people teach those people?” But the second we sit down and ask, “Okay, what would a curriculum look like?”, the ball has moved. Do others feel that pain, from the conversations you’re having?
Abi Noda:
Yeah, 100%. It’s difficult to ground ourselves in reality at an industry level: what is actually a realistic expectation of what we should be able to achieve today and in the next 12 months? That’s a very difficult question for leaders to answer right now. I think it’s similarly difficult to answer even just as a developer within an organization. Because probably, in your organization, Jesse, there are developers sharing clips of themselves doing crazy things with agents, truly crazy things. And then you start wanting to ask questions like, “Okay, is that workflow portable to all developers within your organization? Is it actually reliable? Was it cost-effective?” That’s actually the elephant in the room.
I really think we’re still at the point of fostering experimentation, acknowledging that we don’t know exactly what the right way, the official way, will look like. It’s more important right now to encourage that experimentation and ask good, hard questions about cost. That’s increasingly important, and we’ll talk more about it today. I don’t think a playbook can be written right now, to your point. I think it would be futile to attempt one.
Jesse Adametz:
Maybe along those lines, something I uncovered talking to other leaders and reading last year was one really tactical approach to enablement: make it someone’s job. Not necessarily a new thought, but specifically around AI enablement, like, “Hey, what if there was a team responsible for thinking about the tools, thinking about the workflows, sharing those more broadly, et cetera?” I’ll share that that’s an experiment we’re trying; we’ve recently set up a productivity team. But it’s interesting, because the question immediately comes up: “Well, how do you measure the success of the enablement team?” Any thoughts there?
Abi Noda:
Yeah, and I don’t think this question and its answer have really changed with AI. We’ve always had a bit of a challenge with how to think about the “productivity” of an enabling team, because pure product velocity and eng velocity don’t really apply. Our advice has always been to think of it more in terms of outcomes. And really, the outcome an enabling team is looking to deliver is a productivity increase for the teams it serves. The success measure is: to what degree have you lifted the metrics or eng velocity of the teams you’re enabling, rather than, what is our own product or eng velocity as a platform team?
Jesse Adametz:
Right. We’re starting to scratch at the measurement conversations I’ve definitely been having, and I know others are having. It’s an interesting shift. The way platform or developer enablement teams used to be justified was more like, “Well, we employ these folks, and that’s the cost of making people more effective and having leverage.” But now, with AI, there are actual ROI conversations, because it’s: dollar in, how many dollars out? What are your current thoughts? Do we measure ROI? How do we measure ROI? And how do you handle the correlation-versus-causation objection, “Hey, we still shouldn’t be measuring ROI, or if we do, it has to be this way”? What’s the current thinking from your perspective?
Abi Noda:
Well, I’ll say a few things. First of all, on the correlation-versus-causation conundrum: nine out of 10 organizations I meet with are, I feel, making a bit of a mistake here. Generally speaking, analyses that focus on the question of “do people who use AI more have higher code throughput?” are a little flawed, because typically the people who use AI more are the people who coded more in the first place, so that’s a problem. In most organizational settings, a longitudinal analysis is more telling. You may have seen in our newsletter that we’re about to publish a meta longitudinal analysis across a bunch of companies, and the findings are really illuminating.
Longitudinal analyses are also sometimes challenging just due to data availability; you need longitudinal data that’s clean. You also still need to account for confounds. One pretty serious confounding variable right now is the heightened pressure at a lot of organizations to increase throughput. Leaders are demanding higher throughput and higher levels of AI and code activity, so some of the increase in throughput is just Goodhart’s law at work. Longitudinal analysis is my recommendation when possible. That doesn’t mean comparing cohorts cross-sectionally, like I mentioned, is useless. We operate in businesses; we have to answer certain questions the best we can, even with imperfect data. So it’s okay to do that, but statistically, I don’t think it’s the thing to draw long-term conclusions from.
In terms of the ROI of AI in general, the more time I’ve spent with customers and with our research data, the more I think it’s helpful to think of it in two buckets. One is amplification, which, Jesse, I know you’re a big fan of, you used that term already, and that is really about how much more productive humans are thanks to AI. For this, we look at things like throughput: are engineers able to deliver more by using AI? Time savings: how much time do developers feel they’re saving in their work and in specific workflows thanks to AI or new tools that incorporate AI? Developer experience scores: to what extent are AI initiatives or AI tools improving the broader developer experience, which in turn we can convert to time savings with the developer experience index? That’s amplification. That’s one bucket.
The other bucket, which in some cases is more theoretical than actual, depending on where you are in your journey, I would call augmentation. This goes back a little to the background agents we were talking about, but augmentation is: to what extent are you actually extending your organizational engineering capacity by incorporating agents as headcount into your workforce? How much work are agents driving and delivering? And of course, with all of this, the divisor needs to be cost.
How much is an interesting question. We can look at that in terms of throughput. One unit we like to think about, and we had this as part of our AI measurement framework published last year, is human-equivalent hours. How much work are agents delivering, and how long would that have taken a human? Then you divide how much it cost, how much we spent, by those hours. The result is effectively an agent hourly rate, which I think is a really interesting metric, because it’s hopefully going to be pretty low, which tells you, “Hey, this is a place to invest.” Agents are a really efficient workforce for us; it’s a high-ROI place to invest. That’s how we think about augmentation.
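As a back-of-the-envelope illustration of that agent hourly rate, here is a minimal sketch. The function name and all numbers are hypothetical, not DX’s actual implementation:

```python
def agent_hourly_rate(human_equivalent_hours: float, agent_spend_usd: float) -> float:
    """Effective hourly cost of agent-delivered work: total spend divided
    by the hours a human would have needed for the same work."""
    if human_equivalent_hours <= 0:
        raise ValueError("human_equivalent_hours must be positive")
    return agent_spend_usd / human_equivalent_hours

# Agents deliver work a human would have taken 400 hours on,
# for $3,200 in token and tooling spend: an $8/hour "workforce."
rate = agent_hourly_rate(400, 3_200)
```

Comparing that rate against a fully loaded human hourly rate is what makes it an ROI signal: the lower it is relative to the human rate, the stronger the case for routing more of that class of work to agents.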
Again, amplification and augmentation: I think this is a good way to think about the different bits of data we’re collecting. It’s also a good way for leaders like you and others on this call to tell the story. When you go to executives and talk about what we’re doing with AI: well, to a certain extent we’re amplifying our humans, and to a certain extent we’re trying to augment our workforce with agents. I think it’s a nice mental model.
Jesse Adametz:
There are a couple of things there. The background agent thing I mentioned earlier, it feels like that’s just clicked in the last two months. Not that all of a sudden everybody’s doing it, but everybody all of a sudden understands at least a little bit more of what it could be. The way we’ve been thinking about it, and maybe this is along the lines of that staff augmentation, is that everybody’s got KTLO, everybody’s got these heaps of third-party dependencies that need to be upgraded, and this and that. There are solutions for that kind of stuff, like Dependabot or Renovate, but there’s an interesting thing we stumbled upon just recently: even if a version of a thing gets bumped, what if the signature changes?
Renovate and Dependabot aren’t solving that. So if you then tell an agent, “Hey, did you notice that package got bumped? Does anything else have to happen?”, that’s work we would obviously have pointed an engineer at historically. But what if instead you just woke up and it was done? You mentioned amplification; internally, I don’t have to sell anything, so I don’t use the word productivity, but I’ve always used the experience part. To me, it’s just a wildly good experience. You wake up and all these things just got done, and it hopefully makes people happier, so that’s super interesting.
You mentioned the cost of an agent too. Something I was thinking about recently, from talking to others as well, is maybe a difference between enterprise and startup. I’ve been looking at the cost of people burning tokens, and depending on your context, the same token burn is either a lot or a little. At public FAANG-type companies, compensation is very high, and when you look at the token cost of one of those engineers, token cost as a percentage of comp is very, very low. You’re like, “Well, why would we hire?” I don’t actually mean to take it from the headcount perspective, but their burning those tokens is nearly free compared with how much more they can do. I would imagine that’s a challenge at a startup, though, where dollar comp is a bit lower and somebody might be burning half their salary in tokens. I don’t know if you have thoughts there, but that’s a recent observation.
Abi Noda:
I think the presumption right now, the working assumption, is that every token dollar spent is a good dollar spent. I wouldn’t disagree with that, but I do think we’re going to have to refine it a little. There are a lot of cases of token spend that are probably difficult to rationalize when you really zoom in. I saw a comment earlier about better analytics on how tokens are being spent, how engineers are prompting these tools, and, of course, what outcomes we’re getting. We just published some data, with the full paper coming, showing 10% to 15% overall throughput gains longitudinally, year over year.
Let’s suppose two-thirds of that is attributable to AI, so call it 8% to 10%. Okay, what does that mean in terms of spend? What’s the ROI we want? Does that mean token spend worth 8% to 10% of an engineer’s salary? I don’t think anyone’s figured out the formula, but when you actually zoom into the numbers, this working assumption of “just burn tokens at all costs” starts to become a little murkier. I don’t have the answer. I would just say that we’re all starting to think more about this. And of course, the costs are so variable and still changing, the pricing models are changing, that it’s hard to pin down right now. I would see this as a next-12-months initiative for a lot of leaders on this call, not something they’re going to solve today, right?
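The arithmetic behind that question can be sketched in a few lines. The framing (gain times loaded cost as a rough breakeven) follows the question Abi poses; the $200,000 loaded cost is an illustrative assumption, not a figure from the study:

```python
def breakeven_token_budget(loaded_cost_usd: float, ai_attributable_gain: float) -> float:
    """Rough breakeven: if AI adds X% of an engineer's output, token spend
    up to X% of their fully loaded cost is roughly self-funding."""
    return loaded_cost_usd * ai_attributable_gain

# 15% observed throughput gain, two-thirds attributed to AI -> 10%
gain = 0.15 * (2 / 3)
budget = breakeven_token_budget(200_000, gain)  # roughly $20,000/year per engineer
```

Anything well beyond that per-engineer budget needs a stronger justification than the throughput data alone, which is the sense in which “burn tokens at all costs” gets murkier.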
Jesse Adametz:
No, and there was just a good callout in the chat too, so I’ll acknowledge that. It is definitely model-dependent. If you choose the wrong model to do a lot of the token spend, the cost isn’t palatable compared with a cheaper model, things like that.
Abi Noda:
We saw Claude Code introduce reviews, and it’s like, okay, it’s $25 a review. You need some guardrails around that.
Jesse Adametz:
Yeah, I would not bucket that in the free category.
I’m curious. I like to think the industry has matured enough that we understand lines of code is not a good measure of an engineer. Is it a bad measure of an agent? Is measuring agents, and what they do without a person, any different? Do we have thoughts there yet?
Abi Noda:
I haven’t seen an argument against this still. Obviously, disputing lines of code is what folks like me in this space have spent a lot of time on over the years. Lines of code is a noisy metric for the simple reason that a low-effort change can be many lines of code and a high-effort change can be few lines of code. It’s just noisy. I think it becomes even noisier with AI-generated code, which has a tendency toward inflated line counts; I don’t think that’s a controversial statement. For those reasons, both pre- and post-AI, we’ve always preferred metrics like PR throughput, which, again, is still imperfect, but provides a bit more of a normalized signal of change throughput: how many atomic changes are we pushing through the system? That’s a less noisy metric.
At DX, we’ve also developed true throughput, which is weighted PR throughput, and that actually does incorporate lines of code as one of the mechanisms for weighting. But again, all these metrics are imperfect, so you’re dealing with degrees of imperfection. I do think lines of code is significantly less perfect than PR throughput as a high-level signal. There’s other research too, and empirically, a lot of big tech companies have spent a lot of time trying to figure this question out and have generally landed at change throughput as their preferred signal for this type of thing.
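To make the weighted-PR-throughput idea concrete, here is a toy sketch. The log-damping choice and field names are my own illustration of the general idea; DX’s actual true throughput formula is not described in this conversation:

```python
import math

def weighted_pr_throughput(prs: list[dict]) -> float:
    """Toy weighting: count each PR, letting size contribute sub-linearly
    (log-damped) so a 1,000-line diff doesn't count 100x more than a
    10-line diff, which would reintroduce the lines-of-code noise."""
    return sum(math.log2(2 + pr["lines_changed"]) for pr in prs)

small = weighted_pr_throughput([{"lines_changed": 10}])
big = weighted_pr_throughput([{"lines_changed": 1_000}])
# big outweighs small, but only by roughly 3x, not 100x
```

The design intent is the point, not the exact damping function: size should nudge the weight, while the count of atomic changes stays the dominant signal.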
Jesse Adametz:
Yeah, it makes me think back to DXI as maybe the thing to actually optimize for; it’s calculated, but it’s clear. What you were just saying made me think back to experience. I’ve got somebody on my team, for example, and it’s funny how the art of the possible is just limitless now. He has specifically said to me before, “I want to do 20 changes a day.” That used to be, “Oh, that’s a great goal, I understand you’re being hyperbolic.” He’s not being hyperbolic anymore. He’s actually approaching something like 20 changes a day, and it’s tangible. He’s one of those folks far on the end of the spectrum, monitor not big enough for enough agents, but it’s quite interesting to see the realization that anything is actually possible now. It’s wild.
I’m curious, maybe as a next question: one thing we’re seeing, and somebody mentioned it at the start too, is the impact this has on engineers’ roles, the role of the individual contributor. One framing I’ve certainly seen, and the industry has started talking about, is that the software engineer’s role will shift more toward orchestrator. I’ve seen folks even equate it to leadership, where typically, as software engineers get higher in their career, they take on a lot more leadership and do that orchestration even outside of management. But now it’s like, “Will we be expecting level one and level two engineers to lead teams?” What are you thinking there, on orchestrating work rather than just doing work?
Abi Noda:
This is one of those examples where it’s a little difficult to get down to reality. Certainly, there are a lot of examples out there, within our organizations and on Twitter, of developers being orchestrators. To what extent is that actually how engineers should work day-to-day in most organizations? That’s murkier. To what extent are engineers well-positioned to be effective at that way of working is another question. To what extent would engineers actually enjoy working that way is a third question. A couple of thoughts on that. One, I think a lot of developers prefer being in the code to spending a lot of energy drafting requirements and specs and reviewing other people’s code. Take that for what it’s worth, but my guess would be most engineers would rather be in the code than just reviewing code and drafting requirements for agents.
To what extent are developers well-positioned to be effective at this? I would posit that on most teams it’s actually the defining and deciding of what to do that is more the bottleneck than the coding and doing part. And it’s arguably the harder and rarer skill to hone, having that good judgment. So if we took all our engineers today and gave each of them three ICs with infinite coding capacity reporting to them, what would be the outcome? Would our organizations be more productive in terms of outcomes? I would probably guess no, and this goes back to coding never being the bottleneck. It’s like the mythical man-month: just adding engineers doesn’t actually make things go faster. I don’t have a clear answer to this, but I would say it’s not as simple as “every developer’s just going to be an orchestrator, and that’s going to work well” and be accretive to our organizations.
I think we have to see. Not all engineers are going to want to be orchestrators, or be well-suited to it. What does that mean for the role? What does that mean in terms of upskilling? Maybe this is just one of the really difficult human bottlenecks to actualizing the ROI we envision out of AI. It boils down to human judgment, and that’s the bottleneck.
Jesse Adametz:
Yeah, super interesting. How are we seeing AI influence work from home or RTO policies?
Abi Noda:
Well, right now, AI is putting a lot of pressure on all companies, and as a result, leaders are, I would say, generally cutting work-from-home. It’s more about the broader power balance between the workforce and corporations. The dynamic in the employment market right now is such that there’s a shift toward leaders cutting work-from-home policies and pushing RTO. That’s what’s happening now. In terms of how that’s impacting productivity, I don’t know. There’s too much other stuff happening to find that signal.
Jesse Adametz:
You mentioned that engineers may not want to spend their time developing requirements. That’s an interesting one, and maybe I’ll add my two cents here. Theoretically, we’ve always said, “You should write a well-formed spec so you understand what you’re building.” Maybe that goes in the same bucket as “you should write good docs,” where a lot of times we cut corners as an industry. But it is true that if you give a well-formed spec to the AI, it does significantly better. Do you have any pulse on how folks are doing there? Are they starting to write better documentation? Are they doing spec-first? Or is the alternative still just vibe coding? What are folks doing?
Abi Noda:
I saw a funny comic circulating today: the best spec is actually just the code, which was kind of funny. Spec-driven buzzwords aside, I think it’s true that the AI is only as effective as the instructions given to it, so that’s table stakes. And by instructions, I mean different types: literal “do X, Y, Z” instructions are one kind, but also understanding of the broader requirements, understanding of the broader system environment it’s working in, and the specific tools and choices to make. I think this is a hard problem because it’s just human nature. We’re lazy, and this type of work is not super gratifying, because you’re not producing, you’re just setting up the foundation for producing. I wouldn’t say I’ve seen a specific playbook for doing this really well at enterprise scale. But certainly, I’m seeing a lot of platform leaders pay attention to the fact that things like documentation, having a good foundational base of context for AI, are really table stakes. And then, of course, at the individual engineer level, how to properly steer these tools is a skill and a challenge as well.
Jesse Adametz:
I’m going to pivot to a question from the Q&A that actually resonates with recent conversations I’ve had. The question is: how are companies handling the increased requests and pressure for “non-engineers” to start contributing to the codebase? I’ll share that even internally, we’ve seen an explosion of GitHub seat requests, more seats than we have engineers, and we’re scratching our heads about what’s going on. But it’s like everyone’s a developer now. And of course, if they’re writing code, we don’t want them to store it in Google Drive, so we should give them a license. But yeah, this expectation of non-engineers: is it an expectation more broadly? Is it certain companies? Do we think there’s a pattern there yet?
Abi Noda:
Well, what is a pattern is what you just described: a lot of organizations are trying to make a cultural shift where all employees, particularly in R&D, so your designers, product managers, and business analysts, are expected to contribute. There’s a blurring of the edges of roles, which I think is what we all see as possible, so there’s a push in that direction. In terms of how you manage that, it sounds like you’re navigating it right now, Jesse. They need GitHub seats, they need Claude Code licenses. What does this mean for how teams managing production software need to operate? Again, it comes back to guardrails, and does this introduce more review bottlenecks? Those are situational questions that folks need to answer.
Jesse Adametz:
I’d share that from the platform perspective, it poses new questions for sure. We give out AWS accounts for different reasons, and we have certain guardrails around what can go into production. We even assume, “Hey, every codebase is going to make its way to production,” so we have branch protections and things like that. This is a new shape, where it’s like, “Okay, that team in marketing needs an AWS account, but it shouldn’t look like that one; it should not have PCI and all this other stuff.” It’s interesting, because now platform is potentially serving an even broader audience of dissimilar personas whose only commonality is that they write code. That opens a bunch of new conversations: “Oh, we know how to create GitHub orgs.” “No, this one’s different, and what does a new paved path look like?”
We talked a little bit about the SDLC earlier. One question somebody asked: do we think sprint ceremonies in their traditional sense are being challenged? Is the two-week sprint dead, broken, dying?
Abi Noda:
I think there’s a lot of questioning of traditional process and workflow in general going on. From my point of view, in terms of what I’m seeing at companies and also at our own, at both DX and Atlassian, I don’t think planning is dead. Planning and prioritization, back to what we talked about: you can have all the engineering capacity in the world, but if you don’t direct it at the right things, you waste money and time. With one caveat: prototyping can be done a lot faster. Certainly, you can explore and spike on things faster, but when it comes down to what our focus is, I think you still need rituals and processes for determining where you’re going to spend your focus, your mental capacity, your tokens, and your people.
I won’t comment on two-week sprint cycles specifically. I think that’s more of a human question; I don’t think AI really affects the cadence at which we want to meet and plan. The question about whether we should be using story points was certainly interesting. I still don’t think estimation goes away. It’s not realistic to think of all coding tasks as being reduced to a cost of zero. Humans are in the loop; humans are steering, overseeing, and reviewing the code, so there’s still significant cost and time that goes into developing things.
I think it still makes sense, when we’re prioritizing, to understand the relative time and cost of different things as part of prioritization. Whether you do that in story points or T-shirt sizes or some other unit is an open question, as it always has been. I’ve also seen discussions of, “Well, do we incorporate token cost estimates into estimation?” I haven’t 100% understood what the value of that would be.
Token cost feels like an implementation concern. I don’t think you necessarily know upfront how many tokens you’re going to burn when you work on something, because it’s an iterative process. Those are my thoughts on that.
Erinn Schaal:
Okay. We have somehow come to the very end of our time. I want to take a quick second to say thank you to Jesse for hosting the conversation today and to Abi for taking the time to answer these questions. And of course, thank you all so much for joining us today.