Podcast

Supporting 100,000 engineers at IBM

Max Pugliese, formerly the Director of Developer Experience at IBM, offers a look at what it’s like to support tens of thousands of engineers. He explains why it’s important to think about the culture and processes surrounding the tooling changes a team tries to implement, how to stay close to developers, and more.

Transcript

**Abi: Thanks for coming on the show, Max. Can you start with a quick intro about yourself and what your role is at IBM?**

Max: Yeah, absolutely, and thanks for having me. I’m excited to be here. So I am responsible for developer experience at IBM. My team and I are kind of focused on providing easy to use tools and practices for product engineers to ship high quality, secure software quickly and easily, whatever that means to their individual product team and their specific business context. Obviously IBM’s a pretty large enterprise, so there’s a variety of things that we provide for the entire corporation, like source control, and then some things that are more optimized for specific businesses or business units. And so we are fortunate to kind of operationalize all that through a number of different verticals, including internal developer advocacy. That group helps to bridge the divide between product development and then the broader individuals who are building and maintaining the associated DevX platform, individual developer tools, right. So your source control, your incident response platforms, build and deployment automation, and then some things that kind of make sense for our tech landscape specifically, like app and data accelerators.

I’m curious to know more just about your personal background and how you got into this role.

I am a product developer by trade, I suppose. I did that for five or so years, and then started moving into like engineering management of, again, product development teams. And I did that at increasing leadership roles until I kind of found myself managing, let’s say, 10, 15 product development teams. And at that time, what I kind of recognized is that myself and my leadership team were spending a ton of time looking at what our different teams were doing and where they were spending their time and noticing that a lot of them were doing very similar things that, while they were very important from a developer perspective, weren’t necessarily part of the differentiated value that we needed their team to solve.

And so my go to example of this is if we had 15 teams, we had 35 deployment pipelines and that made sense for those particular teams and what their use case was, but obviously there’s this opportunity to kind of pull that out, make it a shared service, and allow all our different teams to kind of benefit from that so that they can spend more time adding the differentiated value that we needed to and we could make sure that they all got to benefit from a leading edge deployment process.

And so, as I was doing this, I kind of recognized that there was an opportunity throughout our broader business unit to do something similar because people that I was talking to, my peers and such, all had very similar kind of observations. And so we put together kind of a pitch, if you will, to bring together some of these efforts and align them and organize around them so that we could start to provide a more like robust end to end experience for our developers.

That’s a great story. And how long ago did that happen, that pitch?

It probably started like two to three years ago. It’s been a process.

And so tell me more about the history of this team then. I mean, were there other teams like this team before this team?

Sure. So, IBM’s obviously been developing software for a very long time and there’s definitely been a variety of like developer productivity teams in the various product units, and there still are today. But at a corporate level, until very recently, what we primarily did was provide a suite of third party tools for teams to adopt as it made sense to them. And so, these tools were both in the general productivity category, like video conferencing, Slack, things like this, and also the developer productivity piece. And the upside was we could consolidate the procurement process, we can provide a really cohesive and consistent adoption process for teams as they onboarded. But obviously the downside is that a disparate set of tools does not necessarily make a frictionless end to end developer tool chain. And so teams were still spending a lot of time figuring that out and maintaining those tool integrations as part of their SDLC.

And so like I was saying, three or so years ago we started this conversation, now that we’ve containerized so many workloads and consolidated the number of environments we deploy into through kind of migrations to cloud. And similarly, I think culturally there’s an increased interest in transparency of our engineering maturity, right, quote unquote. And so the opportunity for DevX to expand from a set of developer tools into a much more opinionated end to end tool chain really emerged. And we made the decision to really organize around that goal.

Yeah. That makes a lot of sense. And you mentioned to me actually, before this call, you had the same … You said, “Focusing on tools is fruitless if you don’t have good engineering culture or process.” What do you mean by that?

Yeah, so I think it’s really easy to have a conversation about tools because it’s like a least common denominator type thing, right. Every engineer, no matter where you work, needs some tools that help them focus on what they’re doing. But I think equally as important, what we’ve done at least organizationally, is we’ve spent a similar amount of time talking about what we want our engineering culture to be. What’s important, why is it important, and trying to get buy in from everyone within the organization.

And then our role within that conversation is, sure we have an opinion, but we also then need to operationalize that culture and make it really easy for everybody to participate in it so that we can kind of take that step function change together and not necessarily just show up on a random Tuesday and say, “Hey, we’re going to start breaking builds if we don’t have 95% code coverage. And look at that, now we have much more high quality software,” which obviously is not the right way to do things. And so I think we have much more of a partnership and collaborative approach: we’ve collectively said that this is important to us as a culture, so these are the changes that we’re making, what’s your feedback, and let’s kind of work on that together.

So that makes a lot of sense, and it sounds almost simple. But I know IBM, you have tens of thousands of engineers, so just putting together a quote, unquote culture is probably not easy. Can you go into more specifics about like what is that culture? Like what did you actually put together into writing?

Yeah. It’s interesting. And I mean, IBM is really big. And so, I’ll be honest, we’ve been able to make inroads with some groups better than with others, and we do have so many different kinds of software development. But I think what we started with was getting a group of engineers that feel really strongly about this together, outside of their day to day work, to start to talk about what’s important to us. And that conversation gravitated towards deployment frequency, transparency, quality, security, and those then became the outcomes that we’re trying to drive, both as an engineering organization, but also then as a team, right. We start to measure those things and start to look at them. And then I think it’s that idea of let’s come together and talk about this and figure out, from an outcome perspective and not necessarily a tools perspective, what’s important and what’s not.

So how have you begun to operationalize that? Again, I think a lot of people listening to this don’t have experience dealing with an org with tens of thousands of engineers, so how do you do that?

So there’s a couple things that really work in our favor, right. One being that we’re responsible for source code management, and, for a variety of reasons, everybody’s kind of gravitated to our offering. Which is great because it gives us kind of that lowest common denominator that everybody uses or pins off of. And from there, we kind of have a product mindset as much as possible where we go out and we talk to both individual developers and spend time with them and understand kind of their friction points. And we do a lot of surveying such that we’re able to build a backlog of capabilities where we can say, “Okay, what’s the effort versus outcome of these different things and where should we invest our time?”

And within that framework, I suppose, and by making sure that as we build, we’re super transparent about what we’re building and why we’re building it, we’ve been able to make progress down that backlog. Our starting place, just to get a little bit more specific, unsurprisingly I’m sure, is CI/CD at the end of the day. Especially from a product perspective, every product developer needs a mechanism to deploy their code into production.

And that’s been really effective for us. And we’ve got really good, I think, kind of like advocates out there in the product development space where they’ve said, “Hey, yeah, let’s co-create this with you. We’ll give you feedback or we’ll open PRs against different things that we’re seeing,” and our team’s very open to kind of accepting those PRs or these different things, so that there is a little bit more of a co-creation model than just a here’s release one, take it or leave it.

I’m really curious to dig in a little bit more into the survey practice. Again, just, it’s hard to grasp for me at the scale of tens of thousands of engineers. So I mean, who are you sending this to, are you sending it to everybody? And how often are you sending it?

So surveying, I think we do it every six months and we send it to everybody that’s flagged in our HR system as a developer or developer adjacent, right, from like a job role perspective. And it ranges from really tactical, what’s your user satisfaction with source code management, to much more cultural. Do you think sharing code with your coworkers is important, do you share code with your coworkers, which have very different responses. And so that allows us to kind of dig into some of these things with a little bit more nuance.

Gotcha. So it sounds like you kind of combine some high level sentiment towards different things, as well as just asking about what specific behaviors and practices they and their teams follow.

Yeah, absolutely. And then we can kind of like break out personas in some of this stuff. So like one of the ones that we go to a lot is what’s a new hire persona. How quickly is somebody hired and productive, and that is even really basic, like low hanging fruit. On your first day, are you able to see what repos your team merges most of its code into. Oftentimes the answer to that is no. And I think we’ve really traditionally placed the burden on the product team to have really good documentation and these different things, but from our seat in DevX, we have that information, we have the data, and I think it’s reasonable to expect us to be able to say, “Hey, you joined this organization, these are their top repos. This is where they deploy into stage and test and production.” Just so you have a starting context for some of this stuff and you don’t literally have to ask everybody everything on the first day.

With the result of the survey, what are maybe the top indicators or scores that you’re kind of analyzing? I mean, is there some kind of top line, overall happiness of developers or are you looking at things at a more narrow level? And I’m curious, what, within your group, has been sort of like the most important KPIs?

The surveying is interesting because to some extent, you also get responses about what people perceive they want sometimes, right. And so back to my product development days, if you ask somebody if they want features one, two, and three, they’re going to say yes, yes, and yes. It doesn’t necessarily mean they’ll use them. And so we were interested in tool sentiment, kind of for obvious reasons, I think. And then the cultural component’s really interesting.

But then one of the things that we’re starting to try to do much better is connect tool sentiment to outcomes and say, “Okay, so everybody likes our incident response platform. That’s great. But is it actually reducing our MTTR,” or some sort of other kind of resiliency type metric. And in some cases it is, and in some cases, jury’s still out. And so that kind of connects sentiment to business results, I suppose.

And then what are we measuring? I think that’s a really good question. So for our adopters, I think there’s externally, what is your deployment frequency? What’s your code quality? Can we actually help you understand and digest that information? Can we make those kinds of measures more transparent? But then internally, we’re doing a lot of comparing to baselines and saying, “Okay, so you joined our platform from the end to end fully integrated perspective. Are you deploying more than you were? Has your quality increased, has your lead time improved?” And so far, we’ve been really encouraged by the information that we’ve seen.

That’s awesome. I’m curious, before this call, you had kind of mentioned this thing about developer experience and how it’s getting a lot of attention, right, and what it means at IBM. I’m curious for you to just elaborate on that a bit. I mean, when did the term developer experience, for example, even arrive at IBM and what’s that journey been like?

That’s a good question. I think I kind of anecdotally mentioned that, from an industry perspective, it seems to be gaining a lot of traction, which is interesting and kind of fun. Like I said, we really started talking about it I think three or so years ago. It doesn’t mean we weren’t doing things for developers, but that’s when we started really thinking about it as an end to end kind of experience and things like that.

You know, a lot of companies kind of approach developer experience in terms of, like you mentioned, stitching together different vendor tools. And you mentioned you’re trying to create this better end to end experience. What does that look like from a project or initiative standpoint?

So I think, going back to like from a persona standpoint, our vision, for lack of a better phrase, is when you start a new project, right, so you kind of initialize that repo and, well, ideally you don’t even take the next step, you just initialize the repo and it’s all connected. But current state is you create a new repo and you connect to kind of like our pipeline through a config file, right. So you have a YAML in your repo. And from there, you start to get all the benefits of what that means without necessarily, as a product developer, having to make any changes.

A really good example is over the last couple months, we integrated a number of scanning tools into our pipelines, and all the teams that were onboarded onto it got that for free, right. They didn’t make any changes. And the other teams had to kind of take time out of their sprints and their normal cadences to add those things to their pipelines. And so when I think end to end, what do you get? That’s kind of the way that I’m thinking about it. How do we remove that burden from product development teams, while they still get like a leading edge deployment platform.

You talked earlier quite a bit about this concept of advocacy, right, and kind of operationalizing culture. What, practically speaking, are you guys trying to do at IBM in that area?

Yeah, so I think we’re super fortunate to be able to have like an internal developer advocacy team. And what they kind of practically do is a variety of things around education, community building, but also like your traditional kind of coaching and advocacy, and actually sitting with development teams, and helping them onboard to the tools, seeing where they’re getting stuck. And kind of conversely, or as well, partnering with teams that are doing things really, really well, understanding what they’re doing and figuring out how we can go back and scale that across the organization.

And so I appreciate that that’s definitely something that we’re lucky to have, but I think it’s been really great because it’s allowed us to build really robust personas out to really understand the product developer constraints, where they have pressures coming from product owners, coming from their own management teams, and their own kind of timelines and deadlines. And so by having much more robust personas, we’re able to build very specific and well targeted backlog items for us to go down, while at the same time, leaving our platform engineers a lot of time to actually build out and mature the platform. And so yeah, it’s a really fun like glue type team, I suppose, that’s really interfacing between a lot of really passionate engineers.

Yeah. That sounds like a really awesome practice of kind of embedding advocates, if you would call them that, across the organization. I’m curious, what do these engagements actually look like? How big is the advocacy org? And is this just like a dozen people randomly sprinkled across the organization, or is there like a pretty substantial ratio in terms of the teams you’re able to reach?

So right now it’s probably even less than a dozen people. It’s not huge. And I think for now, that makes sense. DevX is a scale function, right. We got to be able to scale what we’re doing. And so the smaller team, I think, is intentional in that respect.

And I think the other piece that’s worth mentioning is a lot of what we do assumes that these teams have space in their kind of business process, right. We can’t kind of swoop into some team with a deliverable in a month and say, “Whoa, whoa, whoa, we’re going to completely change all your development practices.” And so there’s always this nice trade off of some team has a little bit of bandwidth, so we’re able to deploy a coach for a period of time to kind of engage with them for maybe a month or two, understand some of what they’re doing.

But I think, taking a few steps back, by having a really informed view on some of these metrics and measures, we’re able to then intentionally engage with people who are kind of outperforming and understand why they’re outperforming and understanding whether or not that’s something that we can incorporate and scale. Or other folks that are struggling a little bit, and we can understand kind of why that is and understand if that’s something that we need to make a tooling change for, or we need to kind of engage in another kind of way.

Because I think one of the things that we’ve really noticed is engineers, they want to be the best that they can, right. And they often have a lot of constraints pressed upon them. And so some of what we get to do is kind of understand what some of those constraints are and see if we can help them navigate that and maybe free up a little bit of bandwidth so that they can pay down some technical debt or what have you.

That makes sense. And so it sounds like more early on in the history of your role, you were very focused on tooling, and with this advocacy program, it’s almost like you’re coaching, right, local teams on the constraints and local challenges. So I’m curious how your view maybe has evolved. Like when you look at this problem of developer experience as a whole, like what’s the split of responsibilities or cause, in terms of friction, between kind of global tooling type things versus local team tooling or processes or just culture?

That’s a really interesting question. I don’t know that I can give you an informed proportion that doesn’t feel like a lot of false precision. I would say that it’s probably pretty equal. One of the trends that I do think that we start to see is teams that have kind of like cultural challenges or constraints, whether that’s, again, from a business perspective or otherwise, they tend to also have like tooling type challenges as well. And so they’re definitely pretty highly correlated between one another.

That’s a great observation. And I think in our experience, we see that pretty similarly, as far as typically cultural problems are almost like a leading indicator of a bunch of other problems, right?

Yeah, absolutely.

You mentioned that you use a set of indicators to kind of identify teams that are outperforming or teams that may be struggling in order to go kind of choose who to go engage with, in terms of advocacy. Like what are the top indicators you’re looking at to make those decisions?

Yeah, it’s a good question, because obviously the fun part about engineering productivity is there’s no lack of indicators to measure. I think we’re pretty focused on what’s probably like the normal set, I would say. Deployment frequency, lead time, kind of like quality of deployment, as well as security, essentially, from a standpoint of like open source vulnerabilities or other vulnerabilities, and time to close some of those things. I don’t think that’s the set that we’ll always use in perpetuity, but for now it seems to give us a pretty good indication of how things are going, to at least like start that conversation and start to dig into what’s actually going on.

When we talk to engineering leaders, we kind of hear mixed reviews. I mean, like you mentioned, that is a pretty common set of metrics, right, the four key metrics from the book Accelerate. I’m curious, have you run into sort of limitations with them or what are the shortcomings of those metrics in your experience?

So I will say I’m a fan of the book. I think it’s super fair. I do think that business context is so important. And I think some of those measures need to balance that, right. So if you’re in a highly regulated application or have some other kind of constraint, daily deployments might not be reasonable for you. And so you start to look at, well, what’s your deployment frequency versus what your target is, and let’s have a conversation about that if you are lagging your target. And again, so maybe daily deployments aren’t reasonable, but if you are hoping to deploy once a week and you’re not able to do that, like let’s have a conversation about it. I think it’s a good start, you just can’t blindly follow it, right. You have to kind of add that layer of business context on top. That’s probably what I hear the most when I talk to engineers as well, right. Like I get it, the book’s great, that’s not the world I live in though.
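
As a rough illustration of the target-based comparison Max describes, here is a minimal Python sketch. The team names, targets, and weekly counts are hypothetical, and nothing here reflects IBM’s actual tooling; it only shows each team being measured against its own agreed target rather than a universal “deploy daily” bar.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TeamDeployments:
    name: str
    target_per_week: float     # hypothetical target agreed with the team
    weekly_deploys: list[int]  # recent weekly deployment counts, oldest first


def lagging_target(team: TeamDeployments) -> bool:
    """Return True when average weekly deployments fall below the team's own target."""
    return mean(team.weekly_deploys) < team.target_per_week


# Hypothetical data: a regulated team aiming for weekly releases,
# and a web team aiming for roughly one deployment per working day.
teams = [
    TeamDeployments("regulated-app", target_per_week=1, weekly_deploys=[1, 0, 1, 0]),
    TeamDeployments("web-frontend", target_per_week=5, weekly_deploys=[6, 5, 7, 5]),
]

# Teams worth starting a conversation with, per their own targets.
to_discuss = [t.name for t in teams if lagging_target(t)]
print(to_discuss)
```

In this sketch the regulated team is flagged even though its target is far lower than the web team’s, because it is missing the goal it set for itself.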

Yeah. I have a similar experience when I worked at GitHub, we tried to roll out these metrics, and I was also a part of a product team at the time. And of course, for example, lead time, right, like time to kind of deploy or ship, and we were shipping on-prem software. So of course our lead time was weeks, and we were kind of helpless as far as what to do with that metric, like it didn’t even really seem to apply to us.

Yeah. And I think that’s a really similar sentiment. One of the things that we talk a lot about is like we want to look at your trends, right. Is your frequency constant or is it starting to trend down? And if it’s starting to trend down, why is that? Or same thing with quality and lead time. And so I’m a big fan of trends.

Yeah. I feel like everyone in engineering, we just love data, right.

Yeah, exactly.

So maybe getting back to these four key metrics, do you have any tips for leaders who are probably having that same conversation that you’ve had to have where developers are like, “Yeah, we like the book, but these metrics don’t … It’s not really our world.” Like how have you approached that in conversations or with teams when you’re engaging with them?

Yeah, I think, so going back to … We really try to engage with an open mind and understand the context they’re working for. I think engineers kind of can collectively agree that, or I don’t think that’s a crazy statement to say most engineers would say code that’s in source control but not deployed is not necessarily doing anybody any good, right. That doesn’t seem like a surprising take. And so I think you can have an open and transparent conversation about like, “Okay, I understand that you have some constraints, but what is your ideal? What actually makes sense to you?” And then when we start to measure these things in the tool, can we compare to that as a target or a goal, as opposed to blindly assuming that everybody can deploy every day.

And that’s where we’ve had a lot of success because it forces the conversation to get beyond, “Oh, no, that just adamantly doesn’t work for us and this is a silly conversation,” into, “Okay, if you’re going to be reasonable about it, then I have to give you a thoughtful response.” And in that thoughtful response, there’s specificity, and specifics are things that you can then plan around and take action on, and that specificity is really interesting.

Yeah. That makes sense. I’m curious, do you look at any other indicators, for example, like employee satisfaction or engagement type metrics, particularly around this advocacy initiative? Like do you use those types of signals to figure out who to reach out to?

We’re starting to broaden our set of things, absolutely. I think that’s really interesting. And to an extent also how much communication there is between team members, especially in an increasingly remote world. That’s something that we can understand a little bit more on. And are teams that talk more, that develop more out in the open, so to speak, that are accepting PRs from groups that aren’t necessarily within their direct team, deploying more or less, or not necessarily just deploying more or less frequently, but are they releasing more frequently than others. And so I think, going back to kind of that community concept, one of the things that we’re really trying to understand is whether teams that are engaging more in the community have a higher throughput, for lack of a better phrase, than those that don’t.

And I’m curious, just since we were talking about metrics, when you say throughput, are you looking at a metric?

No, I was saying like net progress or something like that, like what’s the umbrella term. Are they delivering more frequently, I suppose.

Yeah. That makes sense. Well, pivoting a little bit, earlier you just mentioned, it’s kind of interesting that the term developer experience has kind of started trending up in the industry. Why do you think that is? What’s your view on that?

Yeah, I think there’s like two things that stand out to me, the first being, I don’t know if it’s like consolidation or ability to like programmatically manage, but the number of artifacts and deployment environments are definitely being consolidated, right. There’s an increasing number of product teams that are deploying what’s effectively the same thing, from an infrastructure management perspective, right. They’re deploying some sort of container. And so that commonality allows the ability to provide that as a shared service to actually be a tractable problem that can be solved.

And so when you combine that with, I think what’s kind of like an increasing specialization in engineers, right, you’re not going to hire, as an organization, an ML engineer and make them go do front end development. That’s a different skillset, that’s a different specialization. Instead, you want your ML engineer to spend as much time as possible doing ML things. That’s what’s interesting to them, it’s what’s core to your business, and so what you don’t want them to do is kind of continually build a CI/CD platform. And so as you bring those two things together, you’re going to start moving those components that can be a shared service into a shared service and then allow your engineers that ability to do what’s actually super interesting to them.

And then I think like in that same vein, there’s this inevitable maturation of like software engineering or product development that, as we get further and further, these pieces start to get commoditized, things that are commoditized can move into shared services, and then we can kind of continue to build the things that are super differentiated, right. So like a really good example is networking, right. So I was a product engineer, I did zero networking. That’s fully been a shared service for years and years now and I think we’re going to just start seeing that more and more. We’re just currently in like this product dev cycle of something similar.

That makes sense. And so earlier we were talking a lot about measurement, then now we kind of touched on developer experience and this whole concept and movement. Kind of bring those two things together, how does your group articulate the value and impact of your work to the rest of the organization?

Yeah, I think there’s two things, right. So we’re lucky in that, kind of going back to this culture statement, our organization has bought into this being important and this being valuable. And so in doing so, it means that we can show engineering productivity gains kind of in isolation and say, “Hey, if you join our platform, whatever that means to you, something is going to change for the better.” And so we spent a lot of time with teams baselining how their world was and then showing percent improvements on different aspects as it makes sense. And I think we’ve been really fortunate for those measures, right, like we were talking about like an improved lead time, to actually stand on their own because everybody appreciates their importance.

Though, taking that a step further, we’ve also had really good success showing a fairly linear relationship between some of these things and actual, more traditional business measures. And so if we’re deploying more frequently, we can show that those kind of new product features are somehow being represented then in end user sentiment scores positively and draw some sort of correlation between those two numbers. Obviously there’s a little bit of challenge in that, in that nothing’s as linear as we want it to be, but we’ve had some very good success with deployments in that respect, as well as with some of the software quality things, and shown that applications that have a higher software quality have much higher sentiment scores. They don’t have that dip when a defect goes out or the service goes down. And in doing so, we have a much more kind of consistent sentiment score from an end user perspective, and it’s been really fun to show those connections.

I’m curious, with such a large org that you’re dealing with, where do you look outward? Like are there people you follow or specific books or blogs? Like how are you staying in tune outside of IBM?

Yeah, so we talked about the Accelerate book, which I think is a really good starting place for most of this stuff. And I think I spend a decent amount of time on like dev Twitter, I suppose. There’s a couple Substacks on like engineering culture and things that I follow as well, and try and keep an understanding on the industry through those. It’s really interesting. I think the Pragmatic Engineer is a really interesting Substack that a lot of people follow, and so I follow it as well.

Yeah. We’re big fans of Gergely and his newsletter. I really enjoyed this conversation today, Max. Thanks so much for coming on the show and it’d be great to chat again sometime soon.

Absolutely, thanks so much for having me.