Podcast

Did Twitter hire Engineering Effectiveness too late?

In this episode Abi talks with Peter Seibel. Peter previously was the Director of Engineering for the Democratic National Committee, and before that led Twitter’s Engineering Effectiveness (EE) team. In this interview, Peter reflects on his experience at Twitter, sharing why it’s better to invest in EE early and his vision for how EE teams can fulfill their potential.

Transcript

Abi: Can you start off by introducing yourself and giving a quick overview of your background?

Peter: My name is Peter Seibel. Most relevant, I spent four years at Twitter. A year and a half of that I was the tech lead for the engineering effectiveness group that we’re obviously going to be talking about today. I also worked on data quality and anti-abuse at Twitter.

After Twitter I was the head of engineering at the Democratic National Committee for four years, leading up to the 2020 election.

I’ve written a couple books and now I’m potentially on the way to becoming a high school computer science teacher.

You wrote a well-known blog post about your experience at Twitter. I’d like to start by asking you a few questions about that team. The team was called the engineering effectiveness group. What was happening at the time that led to the creation of that team, and what was the team’s intended scope?

So to some extent the name was a rebranding. Twitter for a long time had a team called developer productivity or DevProd that had always been really, as far as I could tell, underfunded. It was a small team. And as I described in the essay, a lot of stuff had grown up at Twitter organically with different teams just going their own way, especially when they got into breaking apart the monolith into microservices. It was like, well, every team runs their own microservice. Every team could do whatever they want within reason, and sometimes not within reason.

And as for developer productivity, they were responsible for the build tool Pants, which was an in-house version of Blaze from Google. They kind of ran a CI system and did some other things, but they were a really small team.

So engineering effectiveness came about after many years and, in particular, in response to this project to merge the two monorepos that we had, which is obviously not what you want. So the new SVP of engineering went and hired a VP from Oracle who came in and rebuilt the team, took a bunch of other things under her control, and she recruited me into it. That’s how I got involved. So it was a very explicit attempt to put more resources into the problem.

It’s interesting that you mention it was sort of a rebrand, because the name engineering effectiveness does seem less common than the typical DevProd or DevEx team names. Can you share more about why you thought engineering effectiveness was a better rebrand?

I don’t know that the name is necessarily better; they’re basically synonymous. Developer productivity, engineering effectiveness, whatever. It was just that, at that point, DevProd had a bad brand. A ton of good engineers had worked on it, so I’m not trying to throw anybody under the bus who worked on the team in its original incarnation, but it was so under-resourced that DevProd was seen as the team that did not help you, because they couldn’t.

A lot of really good engineers went into DevProd, tried really hard, and left the company because they were so burned out from trying to push the boulder up the hill with no resources. You couldn’t recruit people to come in and work on DevProd at that time.

So we needed a new name to help people understand that it was actually new.

How do you think other people should advocate for an investment in a team like this?

I’d say to start early. In my essay, I have a model that tries to show that as your org gets bigger and bigger, even a disproportionately large share of your whole engineering organization devoted to tools can be justified.

But you should think about it even when you’re small. When is putting one person on it worthwhile? If one person starts and does a good job, they can set things up so it’s easier to add the next person at the right time.

Whereas if you wait and let everybody just roll their own stuff, or people build half-assed versions of things that everybody uses and everybody hates but nobody really wants to keep making good, then it’s really hard to unwind that.

So think about it early on, and frequently check in: are we at the point where we need to add more to this effort? And get good people on it. If you’re lucky, you get a really good senior engineer who recognizes the importance of it and is willing to do the work themselves. They’re senior both so they have good judgment and so they can bring other people along and convince people that their judgment is good, either just by virtue of “that person knows what they’re doing, so we trust them,” or, if they have to argue about it, because they can make a good case for why they should do a thing a certain way.

It sounds like the team was perpetually a little underfunded at Twitter. As you’ve looked across the industry and worked at other places, do you feel our industry as a whole is underfunding these teams?

So the place that everyone believes does this super well is Google. It’s easier to do whatever you want when you have dump trucks backing up to the office unloading money every day.

I suspect Facebook too: I gave a talk at a Facebook conference about this stuff, but they also have trucks of money showing up every day. Etsy, where I worked before Twitter, which was much smaller, had very good development and operations… They had a really mature process around how you develop software at Etsy, to the extent that every engineer was expected to ship to prod on their first day.

It was a small change, but you would go through the whole process of making a small change to the website in PHP and pushing it live, so that after your first day you’re like, “Okay, I know how to do that.” Some people come to a new company and write code, and six months later they haven’t actually shipped anything because they don’t actually know how to do all the things that are necessary, after the code compiles, to get it into the world.

I want to rewind a little bit. Can you give a quick overview of what the engineering effectiveness team mostly focused on during your time at Twitter?

The crisis was this monorepo. As I described in the early part of that essay, Twitter had grown up with two separate monorepos that had taken cross dependencies on each other. So in order to update code on one side, you might have to get code on the other side changed to use the new thing and publish an artifact, which you could then pull so your code would compile.

So it could just take a really, really long time to get changes made. And so it had kind of been decided that we were going to go all in on a monorepo.

Ironically, before I was on EE I was in a coalition of people who thought the monorepo was a terrible idea and that we should instead invest in tooling. You’re going to have to invest in tooling either way. I felt we could invest in tooling that would make the multi-repo world better, without many of the difficulties that would come with a monorepo the size of Twitter’s.

But the thing people were predicting was that the repo was going to be so big that Git just wasn’t designed for a repo of that size.

So it had been decided, actually before my time, that we could make Git work: one of the senior staff engineers convinced the VP of engineering that we could do it with some fancy hacks on Git. And so they were working on that. Then there was a bunch of other stuff we had to do to actually harmonize the two different versions of Pants, or get both repos on the same version of Pants. I think we actually did it in two steps.

One, we merged them in source control, but they were actually still separate builds. So you still had to do this multi-hop dance to make a change if dependencies crossed the two repos, even though they were now in one repo.

And then we unified the build. So it went from two repos to a monorepo, first with a multi-build and then a mono-build.

And everything broke down, because everything was at double the scale all of a sudden. Git status took forever, Git pull took… If you went on vacation and came back two weeks later and tried to do a Git pull, you might as well go on vacation for another two weeks because that’s how long it was going to take. And we had plans for how we were going to address all these things, and we were working on them all.

So that was the big crisis that caused the need for the team. We did some other things along the way, but that was our focus initially.

In your article you mentioned that an area where engineering effectiveness can help is, “coordinating with engineers to push the good ways of doing things and stamp out the bad ways. Whether it’s how we do code reviews, test our code, write design docs or anything else.” So I’m curious, how did you do that at Twitter?

So some of that was more aspirational for EE. But the thing is, the tool you pick pushes a certain way of doing things. If it’s code review, maybe the tool gets the right people to review the right things. But it doesn’t stop there; it’s also about culture. You can’t just pick a tool and deploy it. You have to pick a tool, deploy it, make sure people understand how to use it, and over-communicate.

Why did you think that it was aspirational?

So code review, design review just in general, encouraging good practices and discouraging bad practices. We weren’t at the point where we were able to really go and work with other teams. We had so many of our own fires to put out. In that sense it was aspirational.

I had also talked to people about this idea of how you get good people to work on engineering effectiveness type work. Some people love working on tools, but some people just want to build the product. Some people want to build the fundamental infrastructure of the RPC library or whatever. So we had talked about ways you could have people rotate into engineering effectiveness to bring in new insights and expertise, and also have EE people go sit with other teams, maybe even still working on EE stuff, but in the context of another team, helping solve their particular problems. That way they’d also be like, “Oh yeah, I see what people are complaining about when they say CI is slow. Actually it’s this particular thing. And then we can fix it.” As opposed to lots of teams just saying CI is slow, but we don’t really know what that means. So if we had fewer things on fire and more headcount, we would try to rotate people around and embed people in both directions.

In your post, you also wrote that EE is hard to measure. And you shared with me before the show that you have an interesting story about Git-based metrics. Can you share a little bit more about that?

There was a perception, which was probably accurate, that code review was a bottleneck for a lot of teams. The thought was that it was contributing to engineer dissatisfaction because they’d write some code and then they’d wait so long to get the mandatory code review done — it just slowed them down. So that was a reasonable observation, and the typical inference from that is that managers should be managing that. Managers should know if their team is bogging down in code review.

But then you might think, “Well, how would managers possibly know that?” How would they know their team is being slowed down in code reviews? And you might come up with the answer, “We should build a tool that makes a dashboard of how long code reviews are taking by team.” And further, the manager probably needs to know what’s going on by individual too.

That information can be good if it’s just information that lets the managers and the team leads and everyone at some level have visibility into these things. I mean, even individuals, it’s like “Oh, wow, I didn’t realize that my teammates in aggregate spent 18 hours last week waiting for me to review their code.”

That might change your behavior. You might think, “Oh wow. I didn’t quite realize what an impact it was that I wasn’t getting to those reviews.” But it’s hard to build a tool like that without freaking out the engineers that managers are going to be judging them based on some particular metric that comes out of the tool. And one concern is that people are going to game their metrics in a way that isn’t healthy… It’s the classic, “Oh, you’re just going to measure lines of code I produce per day. So I’m going to write my code in a way that produces lots of lines, not because that’s a good way to write it.”

So it’s really hard to make metrics that people won’t game. There are theories about it that say it’s impossible.

So when we were building that tool, I was pushing to make it a tool for the ICs first. If it’s useful to the ICs, it will have useful information that the manager can use to get similar insights. But if you build something that the ICs look at and think, “this is surveillance,” then it doesn’t matter how useful it is to the managers, because it’s going to demotivate the engineers and also make them start gaming the metrics. And then it’s useless.
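To make the “hours spent waiting on my review” idea concrete, here is a minimal sketch of the kind of aggregation an IC-facing view like that might do. The data model and names are hypothetical, purely for illustration, not the tool Twitter actually built:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical review events, e.g. pulled from the code review system:
# (author, reviewer, review_requested_at, first_review_at)
events = [
    ("alice", "bob", "2024-05-06T09:00", "2024-05-06T15:30"),
    ("carol", "bob", "2024-05-06T10:00", "2024-05-07T11:00"),
    ("alice", "dana", "2024-05-06T12:00", "2024-05-06T12:45"),
]

def hours_waiting_per_reviewer(events):
    """Sum, per reviewer, the hours teammates spent waiting on them."""
    waiting = defaultdict(float)
    for _author, reviewer, requested, reviewed in events:
        t0 = datetime.fromisoformat(requested)
        t1 = datetime.fromisoformat(reviewed)
        waiting[reviewer] += (t1 - t0).total_seconds() / 3600
    return dict(waiting)

print(hours_waiting_per_reviewer(events))
# {'bob': 31.5, 'dana': 0.75}
```

Shown to the reviewer themselves rather than only to their manager, a number like bob’s 31.5 hours is the “oh wow, I didn’t realize” moment without the surveillance framing.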

Were there any other attempts at measuring it while you were at Twitter?

Twitter had all these dashboards and big data; they handled something like a trillion client events a day. So the thinking was, why not have an event for every time a developer executes any of the development tool commands? Anytime they run Git, run the compiler, run the build, run CI, whatever. And then we can answer the question “Where does the time go?” How many things succeeded, how many things failed?

It seemed like that would be useful, at least in the aggregate, to focus your efforts: “Wow, the average Git pull is taking five minutes.” And then you get into slightly more surveillance-y things, like “the average Git pull takes five minutes, and the next command is executed 10 minutes after that, because the developer always goes to get coffee when they do a Git pull, because they know it takes forever.” So actually you’re losing 15 minutes of time, and you need to get that pull down to 30 seconds to keep them in flow. So we started building those metrics.
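As a rough illustration of what that kind of per-command instrumentation might look like (this is a hypothetical sketch, not Twitter’s actual pipeline; the wrapper, log path, and field names are invented):

```python
import getpass
import json
import subprocess
import sys
import time

def run_instrumented(argv, log_path="/tmp/dev-tool-events.log"):
    """Run a dev-tool command and append a timing event to a local log.

    A real system would ship the event to a metrics pipeline; a local
    JSON-lines file is just the simplest way to show the shape of the data.
    """
    start = time.time()
    result = subprocess.run(argv)
    event = {
        "user": getpass.getuser(),
        "command": argv[0],
        "args": argv[1:],
        "duration_secs": round(time.time() - start, 3),
        "exit_code": result.returncode,
        "timestamp": int(start),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")
    return result.returncode

if __name__ == "__main__":
    # e.g. `python instrument.py git pull` instead of plain `git pull`
    sys.exit(run_instrumented(sys.argv[1:]))
```

With every Git, build, and CI invocation emitting an event like this, “where does the time go?” becomes a query over the aggregated log.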

I also built something that I found very useful. I have to acknowledge that it was not universally loved, but a lot of people liked it. I claimed the name “go/rage”.

Quick context: Twitter has go links, and this is common. These are where you set up a host named “go,” and then internally you can say go/whatever, and you have something that expands the abbreviation. So I claimed the name go/rage.

So you would type go/rage and it would pop up a page with a text box asking, “What are you raging about today?” You would type some stuff and hit submit. And then it would show you a little gif of people flipping tables and say “thanks for your rage.”

All the rage entries went into a database, and there was an admin side that I and some other people in EE could see. We could also tag them, but basically that was the tool. It wasn’t public. It wasn’t a tool for publicly shaming other teams or people. It was just a way to communicate what was frustrating.
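To give a sense of how small a tool like that can be, here is a minimal go/rage-style sketch using Flask and SQLite. It is a reconstruction for illustration only; the actual internal tool, its storage, and its auth were surely different:

```python
import sqlite3
from flask import Flask, request

app = Flask(__name__)
DB = "rage.db"

def db():
    conn = sqlite3.connect(DB)
    conn.execute("""CREATE TABLE IF NOT EXISTS rage (
        id INTEGER PRIMARY KEY, email TEXT, text TEXT, tag TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return conn

FORM = """
<form method="post">
  <p>What are you raging about today?</p>
  <textarea name="text" rows="4" cols="60"></textarea><br>
  <button type="submit">Submit</button>
</form>
"""

@app.route("/", methods=["GET", "POST"])
def rage():
    if request.method == "POST":
        # In the real setup the submitter's identity would come from SSO;
        # a header with a fallback stands in for that here.
        email = request.headers.get("X-User", "someone@example.com")
        with db() as conn:
            conn.execute("INSERT INTO rage (email, text) VALUES (?, ?)",
                         (email, request.form["text"]))
        return "<p>Thanks for your rage.</p>"  # plus the table-flipping gif
    return FORM

@app.route("/admin")
def admin():
    # Visible only to the EE team in the real tool; no auth in this sketch.
    rows = db().execute(
        "SELECT created_at, email, text FROM rage ORDER BY id DESC").fetchall()
    return "<br>".join(f"{ts} {sender}: {text}" for ts, sender, text in rows)
```

The essential properties from the story are all here: submissions aren’t anonymous (the submitter’s email goes into the row), entries are only visible on the admin side, and a tag column leaves room for the categorization behind the later spark charts.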

Which partly I just built because that’s how I felt. When I was on the other side, I was like can I just tell someone that this is really unacceptable? How long this is taking or that this thing just broke again or whatever. It’s not worth filing a JIRA ticket. It’s not worth even emailing someone. I just want to throw it in the hopper and hope someone sorts it out. I, in EE, was the sorter. So I would get these rage entries and I had the email of the person who had submitted it. So I knew who it was, it wasn’t anonymous. (I told them that it wasn’t anonymous.) So sometimes I would email them back and be like, “I hear you, this is what we’re doing.” Or, “This is why this is not at the top of our to-do list right now. And I understand that it sucks, but we’re working on these three other things. And here’s why.”

I would sometimes share them with other people on the team, which was the part that not everyone loved. I think it was harder for folks who were responsible for the thing that people were complaining about; it’s hard not to feel attacked when people are saying your shit is terrible. It was easier for people who came in and could be like, “Yeah, it’s all terrible. It’s not my fault. I wasn’t here when it got terrible, I’m just trying to fix it.”

I do think that also points to a characteristic that you need in this work, because even if everything is great, I suspect people will always complain. When you’re in EE or DevProd, the whole engineering team is your customer, and engineers are super opinionated.

But it was a really useful tool for me to get a sense of the areas of pain. In the end I made some spark charts based on the tags, so you could really see the spikes: Git is going up, then we released a fix and it dropped down, and something else would spike up. It was at least some data to see the effects of what we were doing.

How did developers become aware of the tool? Did you advertise it, or were there links to the go link in other places?

Oh yeah. I definitely told people about it. For a while there I sent a “state of EE” email roughly once a month. Now, I did that partly because we had to tell people what’s up with the monorepo conversion. Is that happening? Where are we at? What’s going on? But I always plugged the go/rage tool.

It was like, “Hey, look, I have office hours that you could sign up for here. You can email me, or if you just want to rage, go rage.”

And it was funny, I didn’t always make it super obvious that it was an engineering effectiveness thing because it’s pretty generic. So I would also get rages about completely different corporate things. I would reply to them and be like, “Hey, I don’t really know anything about that. I’m sorry. Maybe you should talk to these people.”

Looking ahead 5-10 years, how do you think the role of engineering effectiveness and dev prod will change?

One thing that has changed, certainly since Etsy and Twitter, is that so much more is in the cloud now, which in some ways makes things so much easier. You actually can just spin up a bunch of compute power to do a thing. But that also means there are a billion more dimensions in which everybody can do their own thing.

If you’re a tech company that builds web things, then in general it should be easy for a developer to do certain things. That makes engineering effectiveness in the cloud world very different. The basic problem is the same: how do you make it easy for people to do the things they need to be able to do? But there are so many choices now, so it’s harder, and more necessary.