David, thanks so much for coming on the show. Excited to finally have you on here.
David Betts: Yeah, thanks Abi, excited to be here.
Abi Noda: Well, a lot to cover today. We’re going to talk about how you’re leveraging sentiment data, some of your internal developer platform and portal journey. But I want to start with giving listeners a quick overview of your role within Twilio, the organization you help lead and what your focus areas are. So yeah, just give listeners a little bit of a background there.
David Betts: Sure. Well, I joined Twilio in the Segment business unit about two and a half years ago, and if you’re not familiar, Twilio bought Segment at the end of 2020. Previously I’d been a customer at a different company, so I was familiar with the organization. And I was brought in to lead what was called the deploys team at the time, and it was really like we had a collection of tooling teams. And how that’s evolved over time is now I lead the developer platform team, and that encompasses three teams within the broader infrastructure organization.
We have the release engineering team that’s focused on really that CI and CD pipeline. We have the infrastructure enablement team, which helps optimize cloud infrastructure provisioning and management, and we have the developer enablement team, which is responsible for things like what is the developer experience, and how do we get product engineers to adopt the platform more broadly. And our primary tool there is Backstage as our internal developer portal.
Abi Noda: I want to ask you a little bit more about release engineering. What does that look like day-to-day? For example, I presume you all have some sort of CI/CD workflows that have been rolled out for some time. Are you focused on building, layering on top of that? Is it just optimizing and firefighting and supporting teams, or are you migrating? What is the current tactical focus within release engineering?
David Betts: I would say we’re actively building a fairly innovative solution, specifically in the continuous deployment space. For continuous integration, it’s really about getting to a built golden image. Everything we build and run is based on Kubernetes, so once we end with a container image, we feel like the CI experience is done. We happen to leverage Buildkite there; it’s an environment that has evolved rapidly, and I feel like it’s pretty mature, so that’s really our CI platform.
But in the CD space, as we adopted Kubernetes, we quickly coalesced on a GitOps environment based on Argo CD. But what we found is that product engineering teams were coming up with their own custom solutions, spaghetti Bash in Buildkite, orchestrating deployments from their CI ecosystem. And we really drew that line and set that vision: CI ends when the image is built, validated, and pushed to your container repository; we shouldn’t be doing deployment orchestration from that environment. And so we evaluated the broad CNCF landscape, every tool that’s available for orchestrating deployments.
We know some people have built tools in the open source community around that, but they haven’t necessarily evolved with the Kubernetes ecosystem as well as we would’ve liked. And so we ended up building our own custom tooling there. We started with an application we called K2. It was purely command-line driven, and it did some abstraction on the Kubernetes manifests. We saw engineers had some trouble adopting that, but we really wanted to build in that multi-environment, multi-region replication. And so we ended up with a tool that internally we call ShipWrite to help engineers orchestrate their deployments.
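To make that concrete, here is a minimal sketch of the kind of manifest abstraction a tool like ShipWrite might provide, fanning one compact service spec out into per-environment, per-region Kubernetes manifests that a GitOps controller such as Argo CD could sync. The schema, names, and values are invented for illustration, not Twilio’s actual implementation.

```python
# Hypothetical sketch of a deployment-tool abstraction: one compact service
# spec expanded into per-environment, per-region Deployment manifests.
SERVICE_SPEC = {
    "name": "example-api",                          # invented service name
    "image": "registry.example.com/example-api",    # the CI-built golden image
    "environments": {
        "staging": {"regions": ["us-west-2"], "replicas": 2},
        "production": {"regions": ["us-west-2", "eu-west-1"], "replicas": 6},
    },
}

def render_manifests(spec, image_tag):
    """Expand the spec into one Deployment manifest per environment/region,
    ready to commit to a GitOps repo."""
    manifests = []
    for env, cfg in spec["environments"].items():
        for region in cfg["regions"]:
            manifests.append({
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "metadata": {
                    "name": f"{spec['name']}-{env}-{region}",
                    "labels": {"env": env, "region": region},
                },
                "spec": {
                    "replicas": cfg["replicas"],
                    "template": {"spec": {"containers": [{
                        "name": spec["name"],
                        "image": f"{spec['image']}:{image_tag}",
                    }]}},
                },
            })
    return manifests

for m in render_manifests(SERVICE_SPEC, image_tag="v1.4.2"):
    print(m["metadata"]["name"])  # example-api-staging-us-west-2, ...
```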
Abi Noda: One of the things you’ve done that I think is remarkable in terms of how your organization is driving all these platform developer experience products and initiatives forward is, you’re very data-driven. I know you have a lot to say on how you’re leveraging both sentiment data and traditional telemetry metrics to help you lead this organization forward. Share an overview of what that process and journey looks like today.
David Betts: Well, when I first came in, we were heavily system-based or quantitative metric-based. And really we had-
Abi Noda: The DORA metrics, if I recall.
David Betts: Yeah, along the lines of the DORA metrics, the four key metrics in DORA, and we had our systems flowing into that. And even at the broader Twilio level, there was an effort to meet on a consistent basis, review the metrics, and identify, should we be making improvements? And I think we got to the point where we had made a bunch of improvements based on those metrics.
And so that exercise started to feel like overhead; we weren’t really identifying innovations to go drive or continuous improvements to build against those metrics. And as we started to look at, well, what other system metrics could we be collecting that would drive that innovation, we really stumbled into the whole sentiment-based approach. And it really resonated with us because one of Twilio’s core values is, ask your developer.
And of course that comes from a marketing aspect, but we do embed that inside of our ecosystem. We’d previously run ad hoc developer experience surveys in Google Forms, and we didn’t have consistency of data across surveys or time periods, and we didn’t even execute them on a consistent basis. But we started to really hone in on, let’s take a very systematic, data-driven approach to measuring developer sentiment. And so we launched our first survey, and we got above a 90% response rate.
And we took a look at that data and it was a gold mine for us. It really helped us identify what the low-sentiment areas are and where people prioritize that they want to see improvement, and it opened up free-text comments and the ability to follow up with individual product engineers and ask detailed questions about how we can get better in a space. And we saw that drive us as an infrastructure platform team in picking our roadmap and building out the next 12 months of the platform.
But where we really got excitement was with the individual product engineering teams. Both team leads and product engineers would look at this data and have deep conversations within their team about why is this a pain point and what can we do within our team to improve it. We just wrapped up our latest sentiment survey and we got a 98% response rate, so it’s caught on like fire. We’ve run it for nine quarters, and there’s a lot of excitement and engagement with each survey.
Abi Noda: You’ve had a lot of success with adoption of sentiment data and surveys at Twilio, so I want to ask you a series of questions around areas I see folks struggling with sometimes. One of them is more tactical, perhaps: how do you personally think about the distinct value of sentiment data versus more traditional, telemetry-based metrics? How would you compare them?
David Betts: I would say that the sentiment data that you collect directly from developers is easier to action on, and it’s easier to understand. You can ask a developer, how easy or hard is it to set up and run your code locally? And you get a direct answer: is this a good experience or a bad experience?
To measure and evaluate that from a system level or a quantitative perspective, you have to figure out, “Well, what are the contributing metrics that we can go collect? Do we have trust in the data that we collect? And then how do we project that into the experience?”
So I think the system-level data will give you more real-time information, but it’s a lot harder to reason about and take action on, and it requires a bigger investment. The sentiment approach is just a much lower barrier to entry, and it gives you some pretty direct, actionable things to go do.
Abi Noda: And how do you think about using sentiment data and telemetry data together? That’s another question I hear asked all the time is, do we correlate them? What do we do with both? Clearly there’s opportunity there, but how have you thought about that?
David Betts: When we think about how we leverage the system-level data to drive improvements, it really starts with the sentiment. So we use that sentiment to identify areas where we do want to improve. We will often continue the conversation with engineers through follow-up studies, but what we really want to aim for is: what one, two, or three system-level metrics can we correlate with this sentiment, now that we’ve identified an improvement that we want to make?
If I go back to the local environment setup example, we might determine that jumping into a new code base is really difficult for engineers in our company. And so can we measure the time to first one or five or 10 PRs when an engineer gets into a new code base, and then use that as an interim measurement on whether we’re making progress as our real-time feedback. We may be three months out before we get our next snapshot to see improvements in sentiment, but we may be able to get a one or two week turnaround on the system level metrics.
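As a sketch of what that interim measurement might look like, assuming you can export PR-merge timestamps per engineer from your Git host (the data and field names here are invented):

```python
# Minimal sketch of the "time to first N PRs" onboarding metric: days from
# an engineer joining a codebase to merging their nth PR.
from datetime import datetime

def days_to_nth_pr(joined_at, merged_at_times, n):
    """Return days from joining to the nth merged PR, or None if fewer
    than n PRs have merged since joining."""
    after_join = sorted(t for t in merged_at_times if t >= joined_at)
    if len(after_join) < n:
        return None
    return (after_join[n - 1] - joined_at).days

joined = datetime(2024, 1, 8)
merges = [datetime(2024, 1, 19), datetime(2024, 1, 26), datetime(2024, 2, 2)]
print(days_to_nth_pr(joined, merges, n=1))  # 11 days to first PR
print(days_to_nth_pr(joined, merges, n=3))  # 25 days to third PR
```

Tracked week over week across teams, a number like this can serve as the fast feedback loop between quarterly sentiment snapshots.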
Abi Noda: And as an organization such as yours that’s been very focused on DORA, at least historically, and I think DORA is still something that you, or at least leadership, look at: how do you think the sentiment data fits in or relates to DORA specifically?
David Betts: Yeah, and I think it’s important to carve out that DORA has evolved significantly beyond the four core metrics. And of course, we look at those four core metrics as just a bellwether: is something going off the rails so badly that it’s finally impacted the DORA metrics we should care about? We often will find that prior to it getting to that level. But if you look at DORA v2, there’s a lot of measurement happening by DORA around productivity, efficiency, and developer sentiment, happiness, and satisfaction. Those are the types of sentiment metrics that we’re measuring with our quarterly survey.
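For reference, the four key DORA metrics are deployment frequency, lead time for changes, change failure rate, and time to restore service. A minimal sketch of computing two of them from a deploy-event log (the event schema here is invented):

```python
# Sketch of two of the four key DORA metrics from a hypothetical deploy log.
from datetime import datetime

deploys = [
    {"at": datetime(2024, 3, 4), "failed": False},
    {"at": datetime(2024, 3, 5), "failed": True},
    {"at": datetime(2024, 3, 6), "failed": False},
    {"at": datetime(2024, 3, 8), "failed": False},
]

window_days = (deploys[-1]["at"] - deploys[0]["at"]).days + 1
deploy_frequency = len(deploys) / window_days              # deploys per day
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)

print(f"deploy frequency: {deploy_frequency:.2f}/day")     # 0.80/day
print(f"change failure rate: {change_failure_rate:.0%}")   # 25%
```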
Abi Noda: Earlier you were talking about the overall process for how you gather sentiment data regularly, how you look at it to broadly identify where you should be focusing, and how teams are using that data. As one of the people who’s been driving this program around the recurring developer survey, can you talk more tactically: what do you actually do after each survey to help achieve everything you described?
David Betts: There’s a lot of communication around our developer experience survey. We encourage people to look at the raw data directly, and it’s immediately available the second we close our snapshot. So we’ll broadcast out to Slack that we’ve closed the snapshot and the data’s available, creating action items for individual team leads to go look at it. But it’s pretty organic at this point: teams jump in, they get an individual Slack DM that the snapshot is closed, and they click on the link and go look at it.
But more broadly, we create summaries of the data. We create a slide deck that’s aimed more at the executive or leader level, giving a high level of where we have improved and what the important drivers are that people want to see improvements in. And then we create what we call the voice of the developer email that goes out after every snapshot, providing more depth into what people are saying and what the platform team is doing about it. And then we have a very intentional section where we call out where individual teams have made improvements in specific areas, what we call drivers.
So one of those would be the production debugging workflow. Has that gotten better? Did the team set a goal to improve this from the last snapshot, and did they see a big 60% sentiment increase? And that’s to help bring visibility to the teams that are looking at this data and intentionally driving improvements from it. And I think that helps reinforce that this data is used, and that individual engineers can see that spending five to 10 minutes of my time answering this survey every quarter drives meaningful change across the organization, or at least within my own team.
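A rough sketch of the per-driver comparison that email might highlight, with invented snapshot data and an invented call-out threshold:

```python
# Per-driver sentiment change between two quarterly snapshots, flagging
# large improvements to recognize in the "voice of the developer" email.
prev = {"production debugging": 0.40, "local dev setup": 0.55, "docs": 0.62}
curr = {"production debugging": 0.64, "local dev setup": 0.57, "docs": 0.60}

for driver in prev:
    change = (curr[driver] - prev[driver]) / prev[driver] * 100
    flag = "  <-- call out in the email" if change >= 50 else ""
    print(f"{driver}: {change:+.0f}%{flag}")
# production debugging: +60%  <-- call out in the email
```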
Abi Noda: Well, I really love that piece at the end that you just touched on, the recognition of individual teams that have driven meaningful gains or improvements. Recognizing that, as you mentioned, is such a good way to encourage other teams to mimic that behavior, and to make it accessible to more teams and developers for whom it might not be immediately intuitive that they can do something with this data.
David Betts: And what we found when we first launched this survey is that both teams and leaders had a concept that this was an infrastructure platform problem to solve. So if there was a low sentiment driver, they would expect us to come up with a solution. And when you look at something like documentation, yes, the platform team might be able to solve search, but they can’t improve the quality of the documentation at an individual service level, that’s on teams to solve. And so building that engagement at the lower team level is really how you drive the improvements on so many of these drivers.
Abi Noda: What advice do you have for engaging with executives specifically? You mentioned you put together sort of an executive overview. Over the years of doing this now, what have you seen resonate?
David Betts: I think executives initially are going to want to see correlation to data that they care about. For example, we have a sentiment around allocation. Where do you spend the majority of your time? Is it building new features, responding to incidents or fixing bugs? Executives also track that data through system level or other projections that they come up with.
So being able to correlate that data, and they can see this isn’t that far off, builds confidence in that data. The other thing we typically get questions about during our executive readout is, how are we doing compared to industry averages? Can we break that down? Is this a fully remote team, a hybrid team, or an in-office team? How does our data compare across companies of similar size? Those types of metrics.
And then they want to know, one, where have we made improvements? I think everybody loves a good story about making an improvement in developer experience. But two, where should we be investing? Is this an investment we need to make at the platform level, say a new observability solution because production debugging or logging sentiment is low? Or is this a program we need to drive across all engineering teams to improve local service documentation?
Abi Noda: I’ve got to say, David, I’m genuinely impressed with your responses and how you’re approaching all these topics and questions. I’ve really appreciated the insights thus far, and just wanted to say that. Double-clicking on the previous question: you talked about the types of things that resonate with leaders and how you build credibility with self-reported or sentiment data. Can you recall specific moments?
David Betts: Most recently, surprisingly, it was when we started launching our Copilot or AI sentiment questions, both on frequency of use across the engineering community and on perception of time saved. One, AI and LLMs are hot right now, and coding assistants are top of mind for a lot of people. But the engagement there was really around, can we be more effective? Can we increase adoption? And how can we correlate this with the system data to determine if we’re saving more time in this space? There was real excitement around GitHub Copilot as our tool, and around seeing that high-frequency users of GitHub Copilot correlate with high sentiment on a variety of other drivers, not just the typical metrics you would assume relate to it generating code quickly.
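A hedged sketch of that kind of analysis, using Spearman rank correlation (via scipy) on fabricated per-respondent survey rows; a real analysis would control for team, tenure, and other confounders:

```python
# Do frequent Copilot users report higher sentiment on other drivers?
# Rank correlation on per-respondent survey data (fabricated values).
from scipy.stats import spearmanr

copilot_uses_per_week = [0, 2, 5, 10, 12, 20, 25, 30]   # self-reported
code_review_sentiment = [2, 3, 3, 4, 3, 4, 5, 5]        # 1-5 driver score

rho, p = spearmanr(copilot_uses_per_week, code_review_sentiment)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```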
Abi Noda: One of the things I’ve heard a couple of times recently is folks having trouble just getting buy-in around the idea of developer experience altogether. I made a post earlier this week about a leader I was talking to who said, “Hey, developer experience kind of sounds to me like developers just want to be pampered, like they’re soft.” Have you seen this at all up close? Have you seen that misinterpretation? And what are some things you’ve done, or advice you might have, on how to navigate scenarios like that?
David Betts: It’s not something I’ve seen directly. Our VP and our general manager are some of the first people into the developer experience data. When the snapshot closes, I get a document called Chris’s hot take on DX that same day. So we’re lucky in that we have buy-in all the way up the chain that developer experience is very important.
I think the sentiment that developer experience is like having a nap pod in your office really relates back to not seeing the value in either developer experience or more broadly a platform team. And so I think there’s a wealth of information in the industry about how do you prove the value of developer experience? And really what people want to see is are we being more efficient? Are we driving higher quality? Are we saving time?
Are we gaining FTEs based on making improvements in these specific areas? And so when we can turn around and say, “Well, yes, this is sentiment data, but people are self-reporting that they’re saving two to four hours per week by using GitHub Copilot, and that equates to four full-time engineers of time saved across our organization.” That’s the type of value: you can roll out a program, you can measure the sentiment, and you can project that into essentially dollars saved for the organization.
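The back-of-the-envelope math here is straightforward; a worked sketch with invented inputs:

```python
# Projecting self-reported time savings into FTE-equivalents. The engineer
# count is invented; 2 hours/week is the low end of the self-reported range.
engineers = 80
hours_saved_per_week = 2.0
work_week_hours = 40.0

ftes_freed = engineers * hours_saved_per_week / work_week_hours
print(f"{ftes_freed:.1f} FTE-equivalents of capacity freed")  # 4.0
```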
Abi Noda: Yeah, let me get your input on another recurring conversation I’ve been having. So in this example with Copilot, you get data; that two to four hours per week is what we’re seeing across the board with organizations, whether they’re gathering that data in partnership with us or they’ve had their own analysts do all kinds of backflips to come to that conclusion.
Now, the interesting conversation that I’ve been having is people coming to me and saying, “So our CFO has now turned around and said, okay, if this is saving us X amount of hours per week, where’s that showing up?” Does that mean we can reduce our workforce by the equivalent of whatever that percentage of FTE capacity is, or where can we get that savings? I’m curious, first of all, I don’t know if you’ve encountered that in your role, but whether you have or haven’t, how would you approach that conversation?
David Betts: From the perspective of, “Hey, we have freed up four FTEs across our organization with gen AI tooling,” I think most leaders recognize that as increased capacity for the same dollar value. And most leaders don’t want to reduce their capacity overall; they want to grow it. They would love to grow it by headcount as well. But in our current market, being able to free up engineer time to get more done with the same amount of spend is very attractive.
Abi Noda: Looping back to something we touched on a few minutes ago. You talked about how some of the tension or dialogue around buy-in into developer productivity experience is closely related to the buy-in around even just platform engineering investments.
Obviously you work at an organization that has an inherent DNA and affinity and understanding of why this type of investment is important. But nonetheless, I’m sure come every quarter, or every H1, H2 planning, you’re having discussions about what the right-sized investment is, not only in the organization as a whole, but within each of these different areas. How are you approaching that?
David Betts: Well, usually we try to cheat by just referencing Jeff Lawson’s book, Ask Your Developer, in which there’s an entire chapter devoted to how you size your infrastructure or platform organization; it should be 10% of your engineering capacity.
Abi Noda: Yeah, easy.
David Betts: So if that doesn’t work, referencing our founder who wrote the book, then we talk about, what are the initiatives that we’re trying to drive? Our sentiment survey helps drive our roadmap, and so we build a one- to two-year roadmap for our platform. It then becomes easy to talk about, here are the capabilities that we can deliver to the rest of the organization, and by when.
And when there is excitement about a specific capability, for example, we want automatic rollbacks in our deployment tooling based off of our Datadog metrics, but we won’t get it for 12 months, we can then talk about accelerating that roadmap by adding headcount. And so it’s less about “give me more headcount” in exchange for some vaguely perceived return back to the organization. It’s a specific capability that resonates with leaders, that they want, and that they can get faster with additional headcount.
Abi Noda: One other challenge I see organizations run into with surveys is the fact that you can’t run them all that often. You all run the survey quarterly. That’s probably as fast of a cadence as you can practically sustain just given the impact, the time it takes from developers as well as the lift it requires for you to actually administer and conduct the survey. So I know you guys are doing some pretty interesting things in terms of real-time feedback collection that is happening in parallel and adjacent to the quarterly surveys. Share more about how you’re leveraging that approach.
David Betts: And you’re right about the potential for survey fatigue; running it quarterly for all engineers introduces that. But the other aspect is that we may be asking someone to rate their experience with something that happened two or three months ago. They may not provision cloud infrastructure all that often, so can they really remember how painful that was, or has it faded in their memory?
So yeah, we run real-time surveys, often triggered by a Git PR merge, and since our infrastructure is managed as code using Terraform, that’s all coming through that mechanism. And it gives us the opportunity to select a percentage of that population and say, “Hey, I saw you just merged a PR with a Terraform change. Can you rate your experience? Can you tell us any challenges that you had in making a change to cloud infrastructure?”
And we can use that in between the snapshots to get a sense of, are we making improvements? Specifically, we will drive these around platform roadmap capabilities that we’re actively rolling out to try to address specific sentiment issues. And so combining that with our system level metrics gives us a more real-time perspective on, are we doing the right thing?
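A minimal sketch of what that trigger-and-sample mechanism could look like, assuming a Git-host webhook fires on PR merge; the payload shape and the survey-DM helper are hypothetical:

```python
# Hypothetical event-triggered survey: sample a fraction of merged
# Terraform PRs and DM the author an in-the-moment experience question.
import random

SAMPLE_RATE = 0.20  # survey ~20% of qualifying merges to limit fatigue

def send_survey_dm(user, question):
    """Stand-in for a Slack DM sent via your survey tool's API."""
    print(f"DM to {user}: {question}")

def on_pr_merged(payload):
    """Handle a Git-host 'PR merged' webhook (payload shape is invented)."""
    if not any(f.endswith(".tf") for f in payload["changed_files"]):
        return  # only infrastructure-as-code changes qualify
    if random.random() > SAMPLE_RATE:
        return  # skip the rest of the population
    send_survey_dm(
        user=payload["author"],
        question="You just merged a Terraform change. How was your "
                 "experience making a change to cloud infrastructure?",
    )

on_pr_merged({"author": "jsmith", "changed_files": ["modules/vpc/main.tf"]})
```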
Abi Noda: We’ve talked about a number of ways you can collect developer sentiment and feedback data, specifically the quarterly survey as opposed to this real-time, event-driven approach. What are the pros and cons of each? When should you use one approach versus the other, in your view?
David Betts: I think the event-driven, or real-time, approach is really valuable when somebody has just accomplished a specific task and you want to get in-the-moment feedback before it fades from their memory. These may be smaller annoyances. It can also be good for big annoyances, but you’re going to get those in your quarterly snapshot anyway, because they’re at the top of everybody’s mind all the time. So these are the paper cuts: small annoyances, but not really something they’re going to think about a month later.
And so in our internal developer portal, when somebody launches a new service from our template, that’s a great time to ask, “How was your experience creating the container repository associated with that?” Or, “You just merged a PR. Did you use Copilot? What was your experience, and do you think it saved you any time on this specific PR?” versus looking back over the last three months and asking how much time you think you saved thanks to Copilot. So I think you get finer-grained data in the real-time surveys.
Abi Noda: Well David, I’ve loved hearing about how you’ve leveraged or are leveraging sentiment data to drive your platform engineering organization. Thanks so much for these insights and your time today.
David Betts: Yeah, thank you Abi, great conversation.