Podcast

How Shopify runs their developer happiness survey

This week we’re joined by Mark Côté, who leads the Developer Acceleration team at Shopify, to learn about their developer survey program. Mark shares what goes into designing and running the survey, what they’ve done to drive participation rates, and how they leverage their data.

Timestamps

  • (1:32) Starting the survey
  • (3:20) How the survey has evolved
  • (4:22) Three types of information gleaned from the survey
  • (7:37) Designing and running the survey
  • (12:28) Participation rates
  • (15:12) Why there’s an increase of interest in the results at Shopify
  • (17:42) What’s affecting participation rates
  • (23:03) Selecting survey questions
  • (27:01) Refining survey questions
  • (28:54) Survey length
  • (30:56) Analyzing the results
  • (33:31) How the data is stored and shared
  • (35:56) Sending targeted surveys to the right developers
  • (37:40) Using the results as a Developer Acceleration organization
  • (39:29) Confidence in the data
  • (41:27) The value of a developer survey

Listen to this episode on Spotify, Apple Podcasts, Pocket Casts, Overcast, or wherever you listen to podcasts.

Transcript

Abi: Mark, it’s so good to have you back on the show for the second time. I’m excited to chat today. Thanks for your time.

Mark: Thanks for having me back.

Abi: The focus today is on your developer happiness survey. This is something I’ve heard about, I’ve read about, and I’ve heard about from people I know who work at Shopify. I’m excited to dive in because you, of course, now lead it. So where I would love to start is the history of the survey, and I know this predates you, but I would love for you to share with listeners what really prompted this survey starting and who drove it.

Mark: Yeah, so we’ve been doing this survey since, as far as I could tell, at least 2018 and possibly sporadically before then. But 2018, 2019 is when this got a bit more disciplined and organized. So again, from my archeology, as far as I could tell, this coincided with when my department, Developer Acceleration, got formed as its own thing and got its own identity. Around then, I think, is when we hired our first director of all of Developer Acceleration. So I believe it was a partnership between him and somebody who later became a principal engineer, to start to understand what our developers needed.

Shopify made a very early and strong investment in developer experience and developer productivity. And so I think we started this survey probably earlier than a lot of companies did because we needed, I think, direction. We wanted to make sure that what we were doing was having the effects that we wanted, and it was probably also a form of marketing and PR, I assume, to help developers know: this is what we’re doing, we care about your input. So we’ve been doing it twice a year since 2019, I believe, including this year, and I’m working on the latest edition right now.

Abi: I’m sure you’ve been following the news with the new McKinsey report on how to measure productivity, and so I’d love to hear from you. For listeners who aren’t currently doing surveys, why do you need a survey to help you improve developer productivity?

Mark: It’s interesting. We don’t use this survey to measure productivity directly; we have a different set of initiatives to look at metrics and what we call engineering health as a whole. So this survey provides a lot of guidance for our Developer Acceleration group and engineering as a whole. In fact, we’ve changed over time. When we first started, we looked a lot at a combination of our tools and systems but also overall engineering satisfaction, including the health of your code bases. Do you think you spend enough time doing maintenance, tests, and cleanup and all that kind of thing? And we kind of moved away from that over time because we didn’t feel like it was super actionable. We got a bit more focused on our tools and systems and where we thought we would have the most impact by understanding our users’ satisfaction as a whole.

We pulled in some of our sibling teams, the ones who work on some of our front-end development tools, mobile tooling, and other things in that area. But there’s been a renewed interest, probably along with these engineering health metrics, in understanding general developer satisfaction with our code. We have one of the biggest Ruby monoliths in the world. It’s not all going to be perfect in there, and it’s hard to programmatically understand the health of all these things. So we need a bit of that subjective information. So we’re going to start pulling those questions back in around satisfaction with how things are going. But in terms of our Developer Acceleration group, I use a model, I love understanding things through models, of three general ways that we get information.

So one of them is our overall satisfaction level. These are our longitudinal questions, if you will, the ones we ask every time and have been asking from the beginning. They give us coarse-grained information. It’s very important. If these scores change a lot, that’s something to dig into, whether it’s positive or negative. Actually, historically, they’ve been very, very stable. So we ask, “What’s your overall satisfaction with our tooling? How does our tooling compare to companies you worked at in the past, and would you say our tooling has gotten better or worse in the last six months?” And we have very consistent answers: around 80% of people give a 4 or 5 score on both how it compares to previous companies and overall satisfaction. That’s been more or less stable for a while.

We also use this stuff to chart our progress on improvements. I think it was another episode of the Engineering Enablement podcast where somebody was talking about how the most important result of any work that you’re doing on engineering productivity is that the developers feel it. If you think you’ve improved something and the developers don’t think so, you probably haven’t solved the problem. So as an example, we spent a lot of time improving our CI, where the p75, something like that, was 45 minutes on our monolith. It was way too long.

So the team spent several months digging all the way down the stack, basically to the kernel level, to try to squeeze out more performance, and we reduced it to about 15, 20 minutes. That looked amazing on paper, and we continued to ask about pain points, and in fact, CI times dropped down the list of pain points. It wasn’t considered the top pain point for a while. It seems to always be on the list because you can’t seem to make CI fast enough, but we could correlate this answer with the improvements that we had done. So it wasn’t just outliers that we were fixing. We improved satisfaction as a whole.

Then the third reason we run this… The third main reason is to prioritize our pain points.

So ranking of pain points, we always ask that, like, “What are your top pain points?” You can answer in your own words, but we have a lot of predefined answers in there. They give us an indication of what new problem spaces may have emerged or come back up, confirmation that the problems that we’re working on or that we’ve just improved are the things that we should be working on, and then a rough idea of prioritization. Most people use one of the pre-written answers, and these things are what I would call medium granularity. They’re enough to seed user interviews to dig deeper into the problem later.

I think the free text, qualitative answers give us an interesting kind of input. They’re usually very specific and related to types of tooling that we haven’t directly asked about. They’re hard to extrapolate from because sometimes there are patterns, sometimes it’s just one-off things, but they must be particularly acute because people are taking the time to actually write something out for us, whereas most people probably want to breeze through the survey as quickly as they can and answer all these questions by point and click. So if you’ve taken the time to write it, that’s obviously bothering you, and it’s worth us looking into.

Abi: I love this three-pronged way you think about the value you get out of the survey. I want to go into design and methodology. I think one of the things people don’t realize is just how much of an investment and a lift it requires to do something like this. I’ve heard stories from folks at Shopify who were really impressed with how this is run. Could you give listeners a high-level idea of what goes into it and what kinds of people are involved? I think data scientists, your HR team, your own team, there are a lot of people involved with this. Just the big picture, what does this require from an investment standpoint?

Mark: Yeah, it is an investment, to the point that I have to make sure I don’t procrastinate on it. Because I think, “Yeah, this is not too bad.” And then I get into it, and I’m like, “Oh, okay.” Most recently, I spent several hours just reading all of our past reports to make sure that our latest edition was going to be going in the right direction. So we run this in conjunction with my team, Developer Acceleration, which is the main driver, although we have a recently hired operations manager who helps me out with a lot of these things too. And then we partner with, I think the team has changed its name a few times, but talent research and insights, people analytics, basically the HR department. They have their own team, they’re the ones who also do our pulse surveys, and they’re used to crunching people data.

So it’s a partnership between the two of us. We’re the subject matter experts, although not the only ones. As I mentioned, we pull in our front-end development team, mobile development teams, and other people to help us make sure that we’re asking a wide range of questions that make sense. We in Developer Acceleration are the ones who set the basic themes, the questions, and everything. But we get a lot of help from the data scientists on the people analytics team to help us formulate some of those questions properly. There is an art, or actually probably a science, that goes into survey design in terms of how you phrase these questions to get the best signal and reduce ambiguity so that you can trust the answers more. So they help us with some of that formulation, and then they’re the ones who actually do the mechanics of it.

They put it into the survey tool, they send out the reminders, and then they do all the number crunching at the end. So we run the survey twice a year, as I mentioned, approximately every six months. In the first survey of the year, we target about 50% of our developers. Then in the second survey, we target the rest. We used to skip anybody who had only been at Shopify six months or less, but I think that’s actually a very interesting cohort. I’m not 100% sure why we used to skip it. As long as we have the demographic information that allows us to look at them separately from people who have more tenure, it’s a very interesting thing. It’s like, “How quickly do you get used to our tools? What kind of pain points exist in the first six months that maybe don’t exist in a year’s time?”

So after the survey closes, like I said, the people analytics and talent research teams crunch all this data. They prepare a big, long doc for us that has comparisons to past results, the key points, the top-level summary, and insights based on the freeform text answers. Then we do a joint presentation. We prepare a joint slide deck. They go over their findings and their insights from a purely statistical approach. Then the second half of that presentation is where I, and sometimes other people from my team, go over what we’ve done since last time, where we were addressing previous pain points, and what we’re doing now or plan to do in the near future to address the findings in this report. Then all that goes into our internal wiki, what we call the vault. Basically, we have a single page in there that has all the past results, links to videos, and all that going back to 2019.

Abi: Clearly, there’s a lot involved, and that was an amazing overview. I want to start double-clicking into each of these parts. I think for listeners, there’s so much here to learn, starting with the twice-per-year, 50% sampling cadence and approach. How did you guys land on that? Have you tried other cadences and sampling methods? How did you arrive at twice per year, half of the organization on each survey?

Mark: Yeah, I think that was mainly just a statistical thing. Even when we were a smaller company, 50% is still a very statistically significant part of our population. We’re much bigger than we were, but even years ago when we started this, there were enough people in there, and it just doesn’t burn people out on surveys. I think we have had a bit of a fall in survey response rate, even though we haven’t changed the frequency of it, but there have been a lot of other surveys, formal and informal, that have gone out from different departments.

I wonder if that’s contributed overall to people being a little less engaged in spending yet more time answering these surveys. So I think we have to do a bit more marketing and maybe change a few things. We did toy with the idea of doing it more often with 25% of developers each time. I think the big trade-off is, as you mentioned before, the investment in preparing these things every time. We’d also have to take into consideration whether we would want to keep the survey the same for longer or change things more often, those kinds of trade-offs. But the six months felt right and gave us enough signal.

Abi: You mentioned there was recently a bit of a drop in participation rates. So I’m going to ask you, what was that most recent participation rate, and what kind of participation have you typically seen in the past?

Mark: Yeah, for a long time, I didn’t realize how well our surveys were performing until I read and heard about other participation rates. A couple of years ago, somewhere around 70% of the people we surveyed in the engineering departments would respond. So apparently, that’s pretty good. That held pretty true until, I think it was just this past year, it dropped a little bit. It dropped to about 50% in January, and it was a little lower than that, somewhere in the 40s, on the most recent one we did. Actually, I guess that was earlier this year. So it’s been dropping a little bit, and I have some theories as to why that is. At first I thought, “Okay, we’re just asking too many questions.” It’s like a CI suite. People just want to keep adding things in.

I tried to maintain a one-in-one-out kind of thing, but I think that failed. And so I thought, “Okay, maybe that seems like a reasonable answer.” But then I finally went back and counted up all the questions, which I hadn’t done before, and realized that, depending on how you answer, because we have a more complicated branching model according to whether you’re a mobile developer, a front-end developer, or a core monolith developer, the typical number of questions you answer is actually lower now than it was a couple of years ago. So I guess that’s probably not the answer there. If I had to take other guesses, we have done a lot of surveying. We always have our biannual pulse survey, which is more about general satisfaction across the whole company. We also have miscellaneous small surveys that go out, sometimes from my team, sometimes from other teams, that are very targeted, so that could be a part of it.

Another thing that I’ve noticed, and I wanted to bring this section back for multiple reasons, is that we used to ask questions that required more intimate knowledge of some of the code bases. We’d ask people, “How do you feel about the area you’re working in, the code you’re working in? If you have time to spend on tech debt, where do you typically put that time?” And we got a few more opinions about the general state of how developers are spending their time, and I’m wondering if that was more engaging, something people really enjoyed answering.

I don’t think I’ve ever done one of these surveys when I was a developer back in the day, but I can imagine that’s the part where I’d probably have a lot of opinions: the stuff that I’m working in every single day, the inner loop, if you will. All the CI/CD stuff is stuff that I have to interact with, but it’s not the fun part. So I’m curious. We’re going to bring those back, and we’re going to put them in a section fairly high up and see if that gets people motivated to answer more questions.
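
For readers curious what a role-based branching model like the one Mark mentions might look like, here is a minimal sketch in Python. The section names and question counts are hypothetical, not Shopify’s actual survey structure.

```python
# Minimal sketch of a role-based branching model for a survey. The section
# names and question counts are hypothetical, not Shopify's actual survey.

COMMON_SECTIONS = {"overall_satisfaction": 3, "pain_points": 2, "free_text": 1}

ROLE_SECTIONS = {
    "core_monolith": {"ci_cd": 5, "dev_tooling": 4, "codebase_health": 4},
    "mobile": {"mobile_tooling": 6, "ci_cd": 3},
    "front_end": {"frontend_tooling": 5, "ci_cd": 3},
}

def questions_for(role: str) -> int:
    """Count how many questions a respondent on this branch would see."""
    sections = {**COMMON_SECTIONS, **ROLE_SECTIONS.get(role, {})}
    return sum(sections.values())

for role in ROLE_SECTIONS:
    print(role, questions_for(role))
```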

Abi: Yeah, you mentioned these questions you’d removed and are now adding back, and in addition to your theory around the participation rates, you’ve talked about how there’s been a general renewed interest. Can you share more about that renewed interest in bringing back these things? And for listeners, these are things like you mentioned around code quality, or even HR-related things that maybe your team can’t really affect but that could be interesting for the broader organization and teams. So share more about what’s driving that renewed interest in these measures.

Mark: Yeah, I think unsurprisingly, as you mentioned, there’s a lot being published on developer productivity right now, and I think at Shopify, like anywhere else, some of the SLT is interested in getting a sense of what you might call engineering health and satisfaction. Some things we can measure a lot more accurately, like participation in certain systems and usage of certain systems, and we try to correlate some of those developer experience answers with hard data, as I think I mentioned before around CI times, to make sure it’s reflecting people’s actual experiences. But there are other areas that are incredibly hard to measure: how tidy is your code, or do you think your test suite is comprehensive enough? We do measurements around intermittent test failures and flaky tests, but how much does that really get in people’s way?

The only way you can know that kind of stuff is by asking people. So I think it’s just that our senior leadership team realized they weren’t really getting the whole picture of exactly what’s going on in the engineering experience, and this is one way to learn that. Then I looked at some of those questions and results and some of the regression analysis and predictors of happiness, and I found those very interesting myself. So I want to dig back into those. We also have a team that’s dedicated to Ruby developer experience in particular, and they have a vested interest in some of these things as well, to see if they should target their tools at certain areas.

Abi: I want to give a huge plus one to what you’re talking about in terms of some things can only be measured by asking people. Just recently, I had Collin Green and Ciera Jaspan from Google on the podcast, and Google recently has also been trying to measure technical debt, and they’ve been measuring it through surveys. They did this whole study where they tried over 70 different hard metrics to try to measure tech debt, and they concluded that none of them predicted tech debt at all and went back to the drawing board.

Interestingly, you echo similar views here. For listeners, I want to ask you a little bit more about participation rate because, when I talk to other leaders in your shoes, participation rate is a really big pain point, and not just the rate itself, but what to do with it. So I want to ask you, how do you and the other leaders view participation? When you see 50%, for example, is that good, is that bad? Do you get leaders or people on your team questioning the reliability of the data because of that participation rate? Or are data scientists telling you it’s representative? What’s the dialogue and thinking around participation rate?

Mark: I’d say I don’t like to see a drop in any metric, and this is arguably one of the more interesting or more important metrics. But exactly like you said, I have been told by the data folks that we have a good number of engineers at Shopify, and even if we’re down to 40% or something, that’s still very statistically significant. So we’re not worried about the accuracy. I haven’t heard anyone, senior leaders or otherwise, remark that this is a big problem or that we can’t trust the results. I think we just want more participation because, in and of itself, the participation rate is a vote on how much you think my team and leadership are listening to your concerns. So we’re actually adding a new question. Actually, I don’t think we’ve ever asked this question before. I got this from somebody at Fairwell who said that she asks, “What’s your confidence level that changes are going to result from the survey itself?” And we never really ask that.

We always go and make our presentations afterwards, but I have no idea if most developers are just thinking nothing’s going to change because of this. So I think that’s a vulnerable question to ask. I’m slightly scared to see the responses to that, but I think it’s important to stare it in the face. If it’s very low, we’ll have to dig in and interview some people and figure out why. Are we just not thinking of the right things to work on? What’s going on? Is it too slow to notice? Sometimes we have so many small, incremental changes that people think nothing really changes until one day they talk to somebody who was here five years ago and realize, “Wow, things are radically different, radically better now.” But you don’t always see it in the moment.

Abi: Off the record, I’ll definitely reach out to you to learn what you learn about that confidence feedback from developers. I think that’s a great and vulnerable approach. I love it. It’s great to hear that the participation rate you’re getting now is not an obstacle for the credibility or the interpretation of data. I recently spoke to a leader who was at Amazon in the early days of their tech survey, and she actually had a different experience where they were getting 50% participation and leaders at Amazon didn’t see the data as reliable, and so they started really going for a census. But I think the difference is that at Shopify, you have trained data scientists who understand the concept of sampling and getting representative conclusions out of sample sets of data, which is totally a legitimate way of using data. So I think that’s probably the bridge or what’s allowing you guys to have success even at the same rate that Amazon was getting in their early days. So that’s interesting.

Mark: They provide us with margins of error and other things like that. Despite my engineering degree, I never dug too much into statistics, so I take them at their word a little bit for that. But yeah, as I said, nobody seems to think this is a problem, but we still want to drive this up if we can.
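
As a rough illustration of the kind of margin-of-error figure a people-analytics team might report, here is a back-of-the-envelope sketch. The population size and response rate below are made-up numbers, not Shopify’s.

```python
# A back-of-the-envelope margin-of-error calculation for a survey sample,
# with a finite population correction. The numbers below (2,000 developers
# invited, 40% response rate) are illustrative only.
import math

def margin_of_error(n_respondents: int, population: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion p, corrected for a finite population."""
    se = math.sqrt(p * (1 - p) / n_respondents)               # standard error of the proportion
    fpc = math.sqrt((population - n_respondents) / (population - 1))  # finite population correction
    return z * se * fpc

population = 2000                          # developers invited in one round (illustrative)
respondents = int(0.40 * population)       # 40% response rate
print(f"±{margin_of_error(respondents, population):.1%}")  # roughly ±2.7% at 95% confidence
```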

Abi: Yeah. So for listeners who are getting started with surveys or have been doing this for a while, based on your experience, what you’ve seen, what are the general strategies you would recommend for driving good participation rates?

Mark: Yeah, I guess this is the test for that right now, since rates dropped a little bit. It’s really about making sure that you have a partnership with your user base, that we are here to make their lives better, and trying to make them understand that… “Make them understand” sounds a little patronizing, but to work together so that everybody understands that the most important things are being addressed and that we are putting our efforts where they should go to make everybody’s day-to-day development experience better. The more trust there is, the higher the participation rate you’ll get. That includes, I think, the most important thing that we do, which is sharing back the results. We do a talk. It’s not just a report that gets published somewhere, but I’m up on a screen for 5 or 10 minutes talking about what we did last time.

We want to do more outside of that. I also try to publish something every couple of months, like, “Here are some wins that our department has had, some changes that maybe you didn’t notice because you work in certain areas.” And again, just show the impact that we’re trying to have. Then there are, of course, the little things like the reminders that go out. You have a survey that goes out, and you can see we’ve tracked the spikes. As soon as the first reminder goes out, there’s another little spike. And the second reminder, another little spike. So it’s clear that those little nudges matter. I feel like everybody has so much to do that you have to nudge people and you have to say, “We really, really appreciate your participation in this.” But I think it really comes down to making sure that this is really a partnership. That’s not just for the survey. That is the whole point of my department. If we stopped having that trust, then we wouldn’t be anywhere near as impactful. We wouldn’t be doing our jobs properly.

Abi: Shifting topics a little bit to something we’ve already touched on, but I want to get a little deeper. What’s the process of developing the survey items? Is this your team’s job? Is it a collaboration between you and other technical teams, or is it the PhDs on the people analytics team? Take us through the process. And I know, since you’ve been doing this for a while, it’s probably kind of a historical bank that you’re working off of that was originally developed. So take listeners through the process of developing survey items.

Mark: Yeah, it’s mostly my department, Developer Acceleration as a whole, that takes the stewardship, if you will. It was our former director, when I was hired, who I think may have even started or at least took over this survey and really made it into something bigger. Then he ran that in conjunction with our people analytics and talent research teams. So we have definitely had a fork-the-document kind of model. Every iteration of the survey, we have a master question list, we make a copy of it, and then we go through the whole thing from top to bottom and try to pull in the right people. We want to associate some of our questions with the subject matter experts that are related to them. Then I tend to reach out to them and say, “Hey, we’ve got a new iteration.” And I’m always reminding them, like, “What did you get out of the last one? Were these things actionable?”

Because people will think a question is going to be super interesting, and then they get the data afterwards and they’re like, “Ah, this actually kind of matches what I thought, and I actually don’t know what to do with this.” And so that means there’s a question that we can just get rid of to make room for more questions. The people analytics team doesn’t tend to be too involved at the beginning. They will help us with the phrasing of certain questions, both for consistency and for clarity, around the wording of some of these questions to make sure that we get the strongest signal, since that’s not our specialty. So we pull in all these subject-matter experts. Most recently, I’m going to pull in a VP as well, who has a more vested interest in these things now, and he wants to go over the whole thing from top to bottom and see if his concerns are reflected in here.

As I think I mentioned before, I’m trying to take a little bit more of a global view. We got pigeonholed a little bit into just the actionable feedback for my department and maybe one other department, but I want to get a broader sense that I can feed back to other leaders, directors, and VPs, so that they can slice up the data a little bit and understand maybe what’s going on in their departments, the components that they work on, the apps that they work on, depending on how fine-grained the data we get is. But a lot of it is just getting the right people in front of the doc and always making sure that every question that we have in there is something actionable and that it makes sense to continue to ask it, and then changing it, removing it, or replacing it with something else if it looks like we’re just not going to get any signal.

Abi: Structurally speaking, is it Likert, unipolar, bipolar, 5-point, 7-point? I really wanted to say all those things, but you get my point. Are you part of those debates and discussions? What’s the design from a scientific standpoint?

Mark: Most of them are 5-point Likert, so either a star rating or very unsatisfied to highly satisfied. Whenever we publish the results, the satisfaction rates are always the 4 or 5 star responses. So if we say 80% of developers are satisfied with our tooling, we mean 80% of them have given it a 4 or 5 star rating. We do have a few multiple-choice type questions as well. So for our pain points, it will be like, pick 3 or 5 of your top pain points that get in your way, and then we’ll have a little “other” section where people can write in things as well. Then we have a lot of freeform comments as well.

Most of them are just in addition to our other questions; we don’t really have too many that are purely open-ended. I think we might have one or two now. Most of them are like, “If you have comments about the above, throw those in there.” And then our people analytics team will be able to dig in, look for patterns, and analyze this kind of stuff. Maybe they’ll use LLMs now. We didn’t really have that option before, so that could be a new way of summarizing complaints and suggestions.
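
To make the scoring convention Mark describes concrete, here is a tiny sketch of the “4 or 5 counts as satisfied” (top-two-box) calculation. The responses are invented for illustration.

```python
# Top-two-box satisfaction: a respondent counts as satisfied if they give
# a 4 or 5 on a 5-point scale. Sample responses are made up.

responses = [5, 4, 3, 4, 5, 2, 4, 5, 3, 4]  # one question's 1-5 ratings

satisfied = sum(1 for r in responses if r >= 4)
print(f"{satisfied / len(responses):.0%} satisfied")  # 70% in this toy sample
```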

Abi: Yeah, it definitely seems like a use case. I want to ask you, and you touched on this already, but when you’re trying to measure something or get a signal on something, you come up with a survey item, and it doesn’t always work well. The first time you run it, you get a bunch of data and think, “Oh, this wasn’t really what we were going for.” So how much churn, not A/B testing but iteration, do you see around specific items from survey to survey? Is there quite a bit of churn, or do you tend to get it on the first try and then stick with it so you get the trend?

Mark: Yeah, actually, that’s something I was thinking about recently when I was looking at past pain points and seeing how they changed over time. But then I was like, “Wait, we keep adding options to these pain points as we think of them. This is going to be diluting past votes.” Because maybe before they would’ve said, “CI times are the longest.” But then you throw in deploy times as a new option, and they’re like, “Actually no, that bothers me more.” And so you can’t compare them from year to year easily. So I’m trying to lower the amount of churn on those multi-select ones in particular because I think changing them too much will throw off comparisons with the past. They’re still valuable in their own right to analyze now, but we’ll lose some of the historical data. We have some questions that we basically never change.

Those are what I call longitudinal questions: overall satisfaction, comparison to previous companies you might have worked for, that sort of thing. Then we will add specific things in if there’s a new technology. So when we launched our own cloud development environment, we threw in some specific questions around that because we were very, very interested and slightly worried that people were finding it difficult to work with, which was the case for a couple of surveys. So there are a few of them that we tend to drop. I try not to tweak the questions too much because, like I said, it’s too difficult to compare with before. But we’re definitely replacing them and adding new ones in.

Abi: Yeah, makes sense. I don’t know if you have good data on this, but do you have an idea of how long it takes the average person to actually complete the survey?

Mark: That’s a good question. I think we looked at that last time. For our most recent one, we didn’t have that data available, but we had people who would run through it, and I think it’s somewhere around the 15 to 20 minute mark, and it seems like we wouldn’t want to go any longer than that. But that was another thing. I want to test these surveys ahead of time with a handful of people who weren’t involved in the process, who are unbiased, so that maybe they could give us their first impression of the survey, coming in only knowing the previous ones, and then tell us, “That felt really long.” Or, “I don’t know why you’re asking this kind of thing.” We’ve never really done any beta testing of the surveys before, or at least not as long as I’ve been here. So I think that will be useful because, speaking from experience, more than 15 minutes into a survey, and I’m like, “I’m going to get bored.” And we don’t tend to record partial answers or anything like that, so we just lose all that data.

Abi: I’ve gone down the rabbit hole with this recently, and if listeners are interested, email me and I can share this. But there’s actually quite a bit of interesting literature out there that looks at the relationship between survey length and participation rate, and it’s not as precipitous a drop-off with increasing length as you would expect. I was surprised by that research, but empirically, and just based on my own experience going through surveys, like you mentioned, the longer the survey, the more you feel like, “I’m never doing this again.” So it’s interesting to look at the research, and it’s something I would recommend to folks.

Mark: I imagine. That’s why I keep coming back to engagement, to asking questions that people really care about. It’s no fun to do a survey where you just think, “I don’t even care about any of these topics.” You’re not going to finish it if you’re like, “This isn’t relevant.” Or if you get questions that sound too similar. I’ve done consumer surveys before where I’m like, “You’ve just asked me this question six different times in slightly different ways. I don’t care anymore.” So I want to try to get bang for the buck out of every question if I can.

Abi: So you get the results twice per year. You’ve already touched on this, but take us through the process for analyzing this data. I know that’s something you lean on the people analytics team quite a bit for. You’ve mentioned demographic and regression analysis. Can you give some concrete examples of the types of regression analysis that you guys do or what you found useful?

Mark: Yeah, something that I thought was very interesting, and again, I’m not a stats guy, so I’m taking their word for it, is that there are correlations between satisfaction with particular tools, or usage of particular tools, and overall satisfaction. So we have an internal tool called Dev. It’s just our bootstrapper. It’s how you interact with repositories and do a lot of automation, everything from cloning them to running tests, and dev up is a command that people use all the time to just install all the dependencies or keep them up to date and all that. Satisfaction with dev up was correlated with high overall satisfaction, and that doesn’t surprise me because people are doing this every single day.

Similarly, comfort with production infrastructure was an indicator of overall satisfaction, and this is where our people analytics team suggested that these things are so heavily weighted that the smoother you can make any of them, the happier people are invoking these commands and the more smoothly and quickly they run, the better the overall experience of working at Shopify is going to be. So that really gives us a place to dig in if we want to improve the lives of developers.

We’ve also seen things like people with shorter tenure actually being more satisfied and more likely to think things are getting better than people with longer tenure. So I think that feeds into my earlier point: the longer you’re here, the more incremental the changes seem, and you don’t really think things are getting better as fast as you’d like them to, or you get spoiled by the tooling that we have here. Because people come here from other companies and are just like, “Everything is so easy.” You’re here for five years and all you see is, “Yeah, this could be easier, this could be easier.”

So they prepare the report that has a lot of those little insights in it, a long doc, I don’t know what it is, 6 to 10 pages or something, that we publish, and then they go over some of the results in the video afterwards so they can highlight some of the particularly interesting findings, and then anybody who wants to can dig into more of the data. We are returning to a confidential but not anonymous method where we can get all that demographic information. So we’re not going to look at your specific answers. Only our analytics team will have access to the specific individual answers. I won’t have any access to that or anything, but now we can slice these things up by tenure and by gender and by level of impact and all that kind of stuff and get interesting correlations that way.
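
As an illustration of the kind of analysis Mark attributes to the people analytics team, here is a small sketch that correlates per-tool satisfaction with overall satisfaction and slices scores by tenure. The column names and numbers are hypothetical, and this shows simple correlations rather than the full regression analysis he mentions.

```python
# Correlate per-tool satisfaction with overall satisfaction, then slice by
# tenure cohort. Column names and data are hypothetical, for illustration only.
import pandas as pd

df = pd.DataFrame({
    "overall":    [5, 4, 3, 5, 2, 4, 3, 5],   # overall satisfaction (1-5)
    "dev_up":     [5, 4, 3, 5, 2, 4, 2, 5],   # satisfaction with a bootstrap command
    "ci":         [4, 4, 2, 5, 3, 3, 3, 4],   # satisfaction with CI
    "tenure_yrs": [0.5, 1, 3, 0.4, 6, 2, 5, 0.8],
})

# Which tool scores move most with overall satisfaction?
print(df[["dev_up", "ci"]].corrwith(df["overall"]))

# Do newer hires report higher satisfaction than long-tenured developers?
df["cohort"] = pd.cut(df["tenure_yrs"], bins=[0, 1, 3, 10], labels=["<1y", "1-3y", "3y+"])
print(df.groupby("cohort", observed=True)["overall"].mean())
```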

Abi: You mentioned that the data gets put in a document, published to everyone. I wanted to ask more about where you put that data. Is that… For example, Amazon has their own tool they’ve built, Google has a similar tool where anyone can come in and query, slice, dice, explore the data. Do you guys do something similar to that, or is it more of a static report that gets shared out?

Mark: We did a tiny bit of that last time, and I feel like it was the first time, as far as I know, that we gave some people access, mostly directors, to see this kind of stuff, because it is sensitive. We don’t want people using these things for the wrong purposes, like measuring individual productivity or satisfaction or something. So in the past, it’s been more static. We just have the report. It’s all on our internal wiki-type application, and all the historical stuff, going back almost as long as we’ve been running the survey, I think to 2019, is published there. The videos and slides that we do afterwards are there too. But as our internal data gets more mature, I think that’s a wonderful idea that I want to explore as well, giving people the ability to look into those. We have a lot of data analytics tools, and we’re in the middle of switching to better ones. So I wouldn’t be surprised if I get feedback from directors and VPs that they’d love to have more of that data under their own control.

Abi: So if your reporting currently is more static in nature, I assume that individual teams don’t get team-level breakdowns, but is that something you get asked? Do managers reach out and say, “Hey, I would love… Can I get a report just for my team?” And if so, how do you respond to that?

Mark: Yeah, I’ve gotten that a little bit. What we’ve done in the past, or more recently, is divide not so much by teams but by areas. So the mobile developers and the directors of mobile development here have their own section, and they get included in helping to set those survey questions and everything, so that we can give them the data specifically around, “Here’s how mobile developers feel.” And similarly, for the people who work on our core monolith, we can give them that kind of information. Then there are a few specific things where we actually ask, “What part of the monolith do you work in? What are your favorite parts of the monolith?” So this is broad data, but we can feed that back to individual teams as well to say, “Yeah, your component is rated one of the best, or the most fun to work with. Why? What do you think is different about your component compared to some of these other components, so we can learn as a group together?”

Abi: What you just said just sparked another question, and I think listeners might have this question as well. It sounds obvious, but I don’t think it is. How do you actually know which of your developer population works in the monolith or what part? Are those questions in the survey? Or are you combining the survey data with GitHub contribution data to identify segments?

Mark: Yeah, no, we ask that very specifically. We ask how many repos you work with, and whether you work with our main one. We have a lot of them. So if you work on our core monolith, or if that’s one of your top repos, then we’ll ask you follow-up questions. There is actually some interest right now from some of our data science folks that I’ve been working with, some new people that I’ve worked with for the first time this time around. We could probably get some of this data automatically. We don’t need to ask people. It’s not so much that they don’t trust the answers, but we could probably save on certain questions. At the same time, though, we probably wouldn’t want to pull in that data at the beginning of the survey. I’m not sure offhand.

That feels a little creepy, from the respondent’s side, if we already know everything at the beginning. So we’d want to ask, “Are you a core developer?” and things like that. It gives a bit of subjectivity too. As an aside, we actually asked people about GitHub teams on a separate survey recently: “How many GitHub teams do you think you’re a part of? Don’t check.” And then we asked them, “Okay, now check and see.” And people were members of more of these teams than they thought. So there are funny little things we could do around people not even realizing things like how often they work in something unless you measure it. So there might be some interesting perception versus objective reality comparisons we could do.

Abi: Yeah, it could be interesting if the inverse were true too: people who don’t commit to a repo but say they work on it.

Mark: Yeah, a lot of people probably have strong opinions about these things that they don’t actually work in very often.

Abi: Switching gears a little bit, I want to ask about how your team interprets and uses the results. You’ve mentioned you get satisfaction scores, and you also do the problem ranking. How do you balance those two? Which do you look at more as the signal of what’s important or what’s improved? Do you look at the changes in rankings or the changes in scores? How do you think about those two things?

Mark: Yeah, the ranking is one of the major things that we use because a lot of these pain points are always there. It’s just that the magnitude of the problem changes over time. So we saw an increase in people ranking poor documentation for a little while. There was a concerted effort from my team and some other teams to shore up our documentation, our training, and various other things, and then we saw that fall down the list. Same thing with the CI times. So that’s part of our verification: if we spend a bunch of time on something, we should see it fall down the rankings. Something else will always bubble up because there’s always something that could be better.

Some of the specific questions that we dig into around scores are for systems that my teams are responsible for, and particular aspects of those. So we ask about our CI systems: how’s the understandability? How’s the speed? How does that feel? Those ones are a little more targeted. We know everybody interacts with our CI every day, so is it getting easier to use or is it getting harder to use? That’s something that we want to dig into more, whereas it might not be something that somebody thinks about as a specific pain point. They think more broadly, probably.
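
As a toy illustration of the verification Mark describes, comparing where a pain point ranks before and after an investment, here is a small sketch. The pain-point names and vote counts are invented.

```python
# Compare pain-point rankings across two survey rounds. Vote counts are invented.
from collections import Counter

jan_votes = Counter({"CI times": 120, "documentation": 95, "deploy times": 60, "flaky tests": 55})
jul_votes = Counter({"documentation": 90, "deploy times": 70, "flaky tests": 52, "CI times": 50})

def rank(votes: Counter) -> dict[str, int]:
    """Map each pain point to its 1-based rank by vote count."""
    return {name: i + 1 for i, (name, _) in enumerate(votes.most_common())}

before, after = rank(jan_votes), rank(jul_votes)
for pain in before:
    print(f"{pain}: rank {before[pain]} -> {after[pain]}")  # CI times falls from 1 to 4
```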

Abi: Makes sense. Referring again to this recent conversation with some folks at Google, they talked about how one challenge in the early days of their survey was that leaders just didn’t really trust survey data or view it highly. So I’m curious if you’ve seen that at Shopify at all, people who dismiss survey data for whatever reason, and how you’ve maybe navigated that.

Mark: Yeah, I haven’t seen that so much. In fact, we sometimes see the opposite, where people want to know things. We use GitHub Copilot a lot, so what are people’s thoughts about that? Is it meeting our expectations? And questions around pair programming. We wanted to know how much people are pairing. That’s really built into Shopify’s DNA, pair programming. We could try to get some metrics one way or another out of some of the pairing tools we use, but it’s much easier to just ask people directly, like, “How often do you pair? How’s your experience doing that?” So I think Shopify’s early investment in Developer Acceleration is a testament to the fact that some of these things are hard to objectively measure and that we have to trust some sort of sentiment. We have a concept of a trust battery at Shopify: how charged is your trust battery? We try to keep our trust battery charged with our developers, but we can only really know that by talking to them.

I think the broad surveys, where we get the more quantitative information from a wide segment, go hand in hand with our more targeted user testing and requirements gathering, where we go and talk to a smaller number of people, but in depth, about their experience using our CD tool and everything. So if you believe in that kind of user testing, and Shopify really does for our products, the surveys are just the other side of that, a complementary side. So I’ve never seen people particularly dismiss any of this stuff. In fact, as I said, I think there’s a growing interest in it now.

Abi: That’s awesome to hear, and I think I and listeners probably wonder what it is about Shopify that’s different from Google, that there’s just a cultural acceptance and embrace of survey-based data and signal, whereas at Google, it’s been a little bit more of an uphill journey. Of course, now they’re in a good spot, and leaders love it, all that sort of thing. But that’s interesting to consider. So you’d be the right person to ask this final question, then. For organizations that are thinking about maybe doing this type of developer survey, some of the leaders might not be sold, and someone’s advocating for it. What would be your pitch for doing this?

Mark: I think it’s, again, if you are investing in your developer experience and your developer productivity, if you have or are considering setting up an org to do that kind of thing, I think you just need that feedback cycle. Presumably, most of the companies out there are doing marketing and serving their user base. If we have a friendly client-customer or customer-producer relationship inside the company, we need that same kind of information, and I don’t think there’s any other way you’ll get it. A lot of developers like sharing their opinions if you ask them the right questions. Combined with your own metrics around usage and other things, it is just a really good way of getting a sense of what your developers really care about, what their blockers are, and how to build trust that you’re trying to fix these things. This kind of feedback loop, I think, is invaluable.

It doesn’t have to be a long survey to start with. It could just be a few questions like, “How’s your day-to-day experience? Are we meeting your expectations around our CI/CD?” Probably not, because, like I said, developers have very high expectations, but to what degree are we not yet meeting your expectations around CI/CD? How do you feel working in your code base? Do you wish you had more time? But the key is that you have to follow up on those things. If you’re going to ask these questions, you have to do something about it afterwards. Otherwise, people will actually lose trust instead of building trust. So as long as you’re taking these things seriously, even if you’re not getting it perfect on the first survey, feed the information that you’re learning back to the developers, show that you’re taking it seriously, and then you’ll continue to build this trust, you’ll continue to build a high-value developer experience, and people will talk about it, and then more people will want to work for you.

Abi: I love it. I hope listeners who are in this position of thinking about this can use the inspiration you’ve just provided and hopefully move things forward. But Mark, hey, I really enjoyed this conversation. I really appreciate you taking the time to chat with me again, the second time on the show. Thanks so much for the insights you’ve shared.

Mark: Thank you for having me back. I’ve enjoyed this conversation as much as the last one. It’s always great chatting.

Abi: Thank you so much for listening to this week’s episode. As always, you can find detailed show notes and other content at our website, getdx.com. If you enjoyed this episode, please subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Please also consider rating our show since this helps more listeners discover our podcast. Thanks again, and I’ll see you in the next episode.