In this week's episode, Abi is joined by industry leaders Idan Gazit from GitHub, Anna Sulkina from Airbnb, and Alix Melchy from Jumio. Together, they discuss the impact of GenAI tools on developer productivity, exploring the challenges of measuring and improving it. They delve into AI's evolving role in engineering, from overcoming friction points to real-world applications and the future of the technology. Gain insights into how AI-driven chat assistants are reshaping workflows and the vision for the future of coding.
Abi Noda: Thanks everyone for your time. We’ll jump into intros in a moment.
But just to set the agenda for today, we are diving into all things GenAI, specifically around developer productivity. We’re going to talk about use cases, adoption, strategies for how to drive adoption and implement these types of tools and initiatives within our organizations. We’ll talk about what we’re seeing as far as adoption and impact. Some of us are measuring the impact of GenAI within our developer organization, so really excited to hear about what we’re seeing on the ground. And so to kick things off, we’ll start with really brief intros. My name is Abi, I am the CEO of DX, and really excited to welcome our guests on today. We’ll quickly go around the horn. Alix, if you wouldn’t mind sharing a brief intro about yourself?
Alix Melchy: Sure. Alix Melchy. I'm a VP of AI here at Jumio; we're an online identity verification company. I drive and am responsible for all the AI components throughout our product line.
Abi Noda: Awesome. Anna, joining us from Airbnb. Share a little bit about your role and focus?
Anna Sulkina: One of the organizations I lead at Airbnb is the developer platform, with the mission to maximize developer effectiveness and productivity.
Abi Noda: Awesome. And Idan, joining us from GitHub Next, share with listeners a little bit about your focus at GitHub and what GitHub Next is?
Idan Gazit: GitHub Next is the, I guess, at other companies it would be called the innovation team. That’s such a terrible name. We’re the long bets team of GitHub. Our job is to research things that could be impactful for developers and we’re the original team that created Copilot and then all of the follow-on things. We focus on how do we make software development better, safer, faster, easier, more accessible in whatever forms. And nowadays, a lot of that seems to be about how do we take this new technology and apply it to the work that we all do.
Abi Noda: Awesome. Well, we'll kick off the discussion. Again, I want to remind listeners, please post questions in the chat. We'll do our best to weave them into the discussion as we go. Anything you're hoping to learn about, anything top of mind, or questions about specific things that are discussed, please post them in the chat. Where I want to start the discussion today: I talk to a lot of organizations who have high hopes for GenAI tools across their business, and especially for their developers. But actually getting developers to adopt these tools is often a challenge. And these organizations see it in the numbers, they hear it from their developers. I want to talk about what we're all seeing in terms of adoption, and some strategies for how we can drive it. Alix, I want to start with you. I know you've shared that at Jumio, adoption of GenAI tools has been a little bit of a mixed bag. Can you share more specifics on what you believe are the factors contributing to this, and any advice for other companies to try and avoid these pitfalls?
Alix Melchy: Sure. I think part of it is the hype and the enthusiasm that comes with GenAI. A lot of developers were perhaps a bit over-enthusiastic about what they could do, and then you hit the reality of everyday life. That was a bit of what happened. What we did then is really focus on what I would call pioneers or champions: people who are more willing to experiment, to spend a bit of time identifying the precise use cases that are better suited to the tools at hand. And that has shown a bit better success and better traction. And so the idea is to drive a bit more adoption through this model of, if you want, a community of practice, with some champions who advocate for certain use cases.
Abi Noda: In talking to colleagues of yours across the industry, any specific pitfalls you would call out, things to avoid or watch out for, for leaders who are trying to drive adoption?
Alix Melchy: First, even before starting, you do want to have a certain framework around GenAI. And so that's where we spent some time behind the scenes before we rolled out this whole process: what's our policy around that? What's the policy framework? What kind of tools do we allow? What guardrails do we put in place? What about the generated code; how do we check its compliance? All of this needs to be put in place so that developers can then experiment safely. And then in terms of adoption: do you have a strong commitment from the leadership to drive it? What investment are you putting in? Are you relying more on the approach of people spending the time experimenting and evangelizing? Are you spending the time to build a bit more infrastructure and tooling, as, for example, Anna has done at Airbnb? These aspects need to be very clear, also depending on where your organization is at.
Abi Noda: Anna, Airbnb is really at the forefront of experimenting with GenAI tools for the developer organization and has had a lot of success in driving adoption. Before diving into how you’ve been able to successfully drive adoption, could you start by sharing the actual current use cases for GenAI within the Airbnb developer organization? And we have a timely question from Russ at Plaid on that topic as well.
Anna Sulkina: That's a great question. We have three major areas where we apply AI in engineering for developer productivity. First, chat. We provide an internal AI chat interface for all Airbnb employees, and it has been a really great productivity booster for developers. For that we offer two main chat interfaces. One is vendor-specific, and it's hooked up to all sorts of internal sources such as Google Drive, Slack, email, documents, et cetera. That's really helpful when you want to figure out, what is this thing, what does it do, how does it work? It enables everyone at the company to learn quickly. The other chat is more useful specifically for coding and development cases. That chat is our own, and it's Airbnb-aware. It's specifically helpful where you want to overlay Airbnb-specific context, source code, and things like that on top of a more general understanding of engineering and practices.
And so those two have been really useful. We are also experimenting with going even further, bringing the Airbnb-aware chat into the IDE where the developer is, to reduce the context switching folks have to do to access that information. And the future we are looking into there is to really evolve that and provide a platform, so that anyone at the company can create their own domain-specific chats.
And so providing some building blocks for that. So that's one stream, the chat. Then GitHub Copilot is another fantastic tool that Airbnb has been successfully benefiting from, and I know we'll touch on that later. And then third, last but not least, is integrating AI into the developer workflow. You can imagine basically having assistants for code review or CI debugging or large-scale migrations. This is a more nascent area, but we see a lot of potential and really good experimental results.
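To make the Airbnb-aware chat pattern concrete, here is a minimal sketch of one way such a context-aware assistant could be wired up: retrieve internal documents relevant to the question, then overlay them on a general-purpose model via the system prompt. Airbnb hasn't published its implementation, so treat this purely as an illustration: the searchInternalDocs() helper, the endpoint URL, and the model name are hypothetical stand-ins; only the OpenAI-style chat completions request shape is a real, widely used convention.

```typescript
// Hedged sketch of a context-aware internal chat; not Airbnb's actual code.

interface Doc {
  title: string;
  snippet: string;
}

// Hypothetical stand-in for search over internal wikis, docs, and source code;
// a real version might query an embedding index or enterprise search API.
async function searchInternalDocs(query: string, limit: number): Promise<Doc[]> {
  return []; // wire up to your internal search here
}

async function askInternalChat(question: string): Promise<string> {
  // 1. Retrieve company-specific context relevant to the question.
  const docs = await searchInternalDocs(question, 5);
  const context = docs.map((d) => `## ${d.title}\n${d.snippet}`).join("\n\n");

  // 2. Overlay that context on a general model via an OpenAI-compatible
  //    chat completions endpoint (URL and model name are placeholders).
  const res = await fetch("https://llm.internal.example.com/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "some-chat-model",
      messages: [
        {
          role: "system",
          content:
            "Answer using the internal context below. If the context is " +
            `insufficient, say so rather than guessing.\n\n${context}`,
        },
        { role: "user", content: question },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The same skeleton extends naturally to the platform idea Anna mentions: swapping in a different retrieval source per team is what turns one chat into many domain-specific ones.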
Abi Noda: Anna, again, a lot of companies I talk to are making varying levels of investment in these types of tools, but are concerned about or facing challenges with adoption. Share what type of adoption you've been able to achieve at Airbnb, and some of the things you've done, or the environment in which you work, that have enabled you to achieve that?
Anna Sulkina: Taking the case of Copilot adoption, and speaking with interested peers, I understand that there is sometimes the desire, but it's hard to actually drive the adoption or get buy-in. We've been fortunate with fantastic leadership support here, and that's really been key to prioritizing and fast-tracking approvals to even bring in and use Copilot. And as you can imagine, there is always a wave of people being super excited about fantastic new tools.
So we had a first wave of folks, pioneers who wanted to try it immediately. And then, broadly speaking, the strategies we leveraged were broad outreach, education, and resource sharing: tech talks, sharing best practices, and things like that. And then more pointed, direct outreach to folks. That included not only reaching out directly to people over Slack or email, but also nudges within the tooling itself, within the IDE, for example. Overall that strategy has been really successful so far, and we're at around 70% adoption, meaning active developers using Copilot on a weekly basis.
Abi Noda: That’s an amazing stat and very aspirational, I think, for a lot of other organizations who are driving these rollouts. And I am going to ask you in a minute to double-click into the method you mentioned of utilizing nudging and pop-ups. I think that’s very novel and I know listeners would love to learn more about that. Before that, Idan, I want to go to you. At GitHub, you’re actually in charge of developing and testing new GenAI tools for developers. Share a little bit more about what your focus has been, the use cases you’re seeing as opportunities for GenAI? And share your process for developing and testing new GenAI tools for developers?
Idan Gazit: I think maybe the key thing to realize is that in the early days of GenAI, we all, us, everyone, conflated it with authoring code. And it is very useful at authoring code, but that's not the only thing it's useful at. When you actually look at the work of software development, take any professional working software developer and ask them, what do you spend your time on? And there's no lack of studies to support this notion. Very little of it is actually authoring the lines of code. Almost all of it is the cognitive work of, what code do I need to touch, how do I effect the changes that I want to make. All of this sense-making and orientation in the run-up to actually typing out the five lines of code. I'll have an hour of thinking followed by 30 seconds of typing.
It's very valuable to generate the code, and sometimes it's great because it lets me skip right over the thinking. I'm like, ah, how does this API work? Blah blah blah. And in that moment Copilot is just like, "Here's the line that you want." And I'm like, "Why, yes, that's exactly what I wanted. Thank you very much." But I think the future that we are all reaching for is tools to help us with the rest of the job: the sense-making, the orientation. Here's a new code base, or here's a part of the code base I haven't touched in six months and therefore it's gone, my brain deleted it. A lot of what we focus on is figuring those things out. How do we aid developers with that hard part of the job, and how do we build tools that feel AI-native instead of bolting another chat sidebar onto the side of something? Chat is great, but we can reach for more, I think. And so a lot of what we do is focused on that discovery and that mechanism, let's call it.
Abi Noda: Well, Idan, in a moment we'll go deeper into some of the challenges of understanding where the most impactful use cases and opportunities for GenAI are. But before we go there, Anna, I do want to double-click; since the first time you shared with me the nudging and pop-ups used to drive adoption at Airbnb, I've been eager to learn more, and I'm sure listeners will be as well. Share a little bit more about how this is implemented, or how you think others could implement similar strategies?
Anna Sulkina: Yep, great question. A few things here. First, we think in terms of reducing friction and context switching as much as possible for developers, so meeting them where they are. That started with pre-reserving seats for basically all active developers, so you don't have to request anything. That's number one. Then we looked into pre-installing Copilot, again reducing the number of steps and hoops you have to jump through. And then what you mentioned is specifically nudges: banners within the IDE. You open your IDE, and ideally the extension is already pre-installed, and there is a nudge for you: hey, try this thing. Of all the strategies and tactics we applied to drive adoption, that's been the most successful; it drove the biggest adoption. And again, it's all about reducing friction. You don't have to think about it. No matter how many reminders and emails and whatnot you send, you still have to do something to even try it. Here it's right there, so you can try it and then decide whether you like it or not.
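As an illustration of the nudge tactic, here is a minimal sketch of what an in-IDE nudge could look like as a small companion VS Code extension: if Copilot isn't installed, show a one-click prompt that installs it. This is an assumption-laden sketch, not Airbnb's actual implementation; the message wording and the 30-day throttle are invented, though the GitHub.copilot extension ID and the VS Code APIs used are real.

```typescript
// Hedged sketch of an in-IDE adoption nudge as a VS Code extension.
import * as vscode from "vscode";

const NUDGE_KEY = "copilotNudge.lastShown";
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

export async function activate(context: vscode.ExtensionContext) {
  const copilot = vscode.extensions.getExtension("GitHub.copilot");
  const lastShown = context.globalState.get<number>(NUDGE_KEY, 0);

  // Skip the nudge if Copilot is already installed or we nudged recently.
  if (copilot || Date.now() - lastShown < THIRTY_DAYS_MS) {
    return;
  }

  const choice = await vscode.window.showInformationMessage(
    "A Copilot seat is already reserved for you. Want to try it?",
    "Install Copilot",
    "Not now"
  );
  await context.globalState.update(NUDGE_KEY, Date.now());

  if (choice === "Install Copilot") {
    // Built-in VS Code command that installs an extension by its ID.
    await vscode.commands.executeCommand(
      "workbench.extensions.installExtension",
      "GitHub.copilot"
    );
  }
}
```

The design point is the one Anna makes: the call to action lives where the developer already is, and accepting it is a single click rather than a request form.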
Abi Noda: Thanks for sharing that, and I'm sure myself and others are eager to see more examples of that and learn more, but we'll move on for now. I want to move into impact. There's a lot of hype and research and marketing around GenAI tools for engineering right now, and one of the questions I get asked most often is, how much productivity improvement should we actually expect? What are people actually seeing? Of course that's a very tricky question, as we'll get into. But Anna, starting with you. I know you've conducted quite a bit of research on this question at Airbnb. Share how you've approached conducting this research and what some of the findings have been?
Anna Sulkina: Let's see. A couple of things worth mentioning. One, we do bi-annual developer experience surveys, and we augmented our most recent survey last year with specific questions around applying AI to developer productivity. For example, one question was, how much have AI tools increased your overall developer productivity?
Again, as we know, it's really hard to measure developer productivity in the first place, and especially to attribute improvement to AI. So this is all measuring developers' perceived improvement. In response to these questions, what we heard is that half of the respondents believe that AI has improved their productivity by more than 20%. We saw quite a big chunk of the respondents saying 20 to 30%. In more extreme cases, [inaudible 00:20:20] respondents believe that their productivity has increased by 45%.
Again, it’s self-reported, so take that with a grain of salt.
And then the vast majority of the respondents, like 95%, said that they saw at least some positive effect. Worth noting that this was not just about Copilot; this was about all AI tools, and as I mentioned, that included the Airbnb-aware chat and things like that. That's one. And then there was last year's hackathon that we ran across the company; there was a productivity stream within the hackathon, and we saw a tremendous number of prototypes built during it. Clearly the appetite was there; excitement and creativity were through the roof. It was amazing to see, but as we all know, and folks here can attest, it's really easy to spin up prototypes. It's [inaudible 00:21:24] to get them to the reliability and quality where they can truly be used in production. Those are the two things.
Abi Noda: Thanks for sharing. Those are exciting numbers, and I saw comments in the chat around the interesting question of perceived, self-reported productivity versus actual productivity, which is outside the scope of this discussion today, but actually something I think Idan will speak on soon. Before we go there, Alix, you've mentioned that you've also recently surveyed your developers on their use of Copilot. What were some of your findings?
Alix Melchy: We went more with surveys with free-text answers, and interviews, to understand a bit better the use cases, where people were struggling and where they had success. I think the first takeaway for us was, as I said, adjusting to the tooling's capabilities. That was the first learning we got, and so we readjusted a bit in terms of the best practices for leveraging the tool, to get past that frustration or disappointment. The second thing is there are some good use cases where we've seen very good adoption, for example test cases. Leveraging GenAI for that has helped a lot in our context, and so that was an area we wanted to double down on. That's a bit of how we went about the feedback from the developers. And as I said, it's very hard to quantify the impact on productivity.
And so it's more about how they felt in the experience of using the tool, and, tellingly, how often they use it. Because you get what they say, but also what they actually do in the tooling. And that's where we start seeing a bit more: okay, this kind of user group or this kind of usage is working, and these other ones are not really where we should spend more time. And perhaps that has also led us to reassess the choice of tooling. That's why we are going back to re-evaluate some of the choices here.
Abi Noda: Well, I know that'll be a cliffhanger for folks listening, but we'll leave it there. Idan, of course, your team focuses on developing and inventing new tools, and GitHub publishes a lot of data on the productivity impact of the new tools and products being developed. I know you have a lot of insight into the question of, what are some of the unique and inherent challenges of measuring the impact of these types of AI products and tools?
Idan Gazit: That's a good question. I'd say there are a couple of aspects. The first is that there's not that much prior art. If I go and Google for how do I measure my SaaS product, I will be drowning in blog posts telling me exactly how I should go about it. But when it comes to GenAI things, you can't actually test locally. Any one specific sample you take means nothing because of the inherent non-determinism of the responses. And so the only way to know if your thing is working better or worse is at scale, which means that you must effectively test in production. For example, with the original Copilot, we had a baseline metric of acceptance rate. Okay, X percent of people are accepting this suggestion. Now, let's break that chart down by language. Does the model perform better for Python, which is very, very popular and is out there on the internet a lot, which means that the model is probably better at it than at Haskell?
Haskell is less popular. But okay, maybe it's also about the latency to where inference happens. If the user is in Asia-Pacific but their inference is happening in US East, then there's an additional few hundred milliseconds of latency. Is that what's killing their experience; is it a time-bound kind of thing? Or maybe we also want to track: it's accepted, but is it retained at 30 seconds, at two minutes, at 10 minutes? There's this combinatoric explosion of charts, and then how do you actually know if it's good?
The only way is to pick which ones are your most interesting metrics to track and realize that you must effectively test on your users in production, because anything else is not really going to tell you much. How it works on your laptop is not going to be clearly better or worse; it's just going to be anecdotal sampling of what's coming out of the models. I think those are the big challenges. And then there's the big meta-challenge, which is: is it useful? Is acceptance a proxy for value? Coming up with how to define value is hard; it was always hard and it will always be hard, in this as in everything else. And I have even less guidance there. You've got to know your product and your domain.
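To ground the metrics Idan lists, here is a toy sketch of the offline analysis side: computing acceptance rate broken down by language, and a retention curve at 30-second, two-minute, and ten-minute windows. The event shape is hypothetical (this is not GitHub's telemetry schema); it only illustrates the bookkeeping.

```typescript
// Hedged sketch of suggestion-telemetry analysis; the event fields are
// invented for illustration, not taken from any real product schema.

interface SuggestionEvent {
  language: string; // e.g. "python", "haskell"
  accepted: boolean;
  // For accepted suggestions: how long the inserted code survived before
  // being deleted or rewritten, in ms (undefined if never accepted).
  survivalMs?: number;
}

const WINDOWS_MS = [30_000, 120_000, 600_000]; // 30 s, 2 min, 10 min

// Fraction of shown suggestions that were accepted, per language.
function acceptanceRateByLanguage(events: SuggestionEvent[]): Map<string, number> {
  const byLang = new Map<string, { shown: number; accepted: number }>();
  for (const e of events) {
    const s = byLang.get(e.language) ?? { shown: 0, accepted: 0 };
    s.shown += 1;
    if (e.accepted) s.accepted += 1;
    byLang.set(e.language, s);
  }
  const rates = new Map<string, number>();
  for (const [lang, s] of byLang) {
    rates.set(lang, s.accepted / s.shown);
  }
  return rates;
}

// Of the accepted suggestions, what fraction survived past each window
// ("accepted, but is it retained?").
function retentionCurve(events: SuggestionEvent[]) {
  const accepted = events.filter(
    (e) => e.accepted && e.survivalMs !== undefined
  );
  return WINDOWS_MS.map((windowMs) => ({
    windowMs,
    retained:
      accepted.length === 0
        ? 0
        : accepted.filter((e) => e.survivalMs! >= windowMs).length /
          accepted.length,
  }));
}
```

Slicing these same aggregates by inference region would capture the latency hypothesis as well; the hard part Idan points to is deciding which of the many possible slices are actually proxies for value.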
Abi Noda: Great insights, Idan. Anna, I want to go back to you. At Airbnb you've stayed very close to your developers on GenAI usage. What's your advice to other organizations in terms of where are the biggest gains that you're seeing? What are the top use cases? And then what are some of the use cases that are more challenging or may require more investment to really actualize?
Anna Sulkina: Great question. I'll start; a few points here to touch upon. First, where we've seen the biggest gain so far is around assisting developers in cases where a hundred percent correctness isn't necessary to be helpful. Chat is a great use case here, where you can follow up with questions or continue modifying, providing edits to the code. The suggestion doesn't have to be a hundred percent correct, but it will give you the ballpark, and you can build upon that and iterate. Another one is investing in making chat assistants that are aware of your internal sources, documentation, code, best practices, and that are integrated into the developer workflow.
Again, reducing the context switch and the friction there. I think those are the biggest areas where we're seeing the investment pay off, and there is much more to do there. But then I'd also add, as with anything else, being realistic about what's achievable today versus in the future. Because it's easy to get excited about the technology, and we've seen that everyone wants to do something in AI, but you quickly realize that going from prototype to production is non-trivial, and evaluating the quality of prototypes et cetera is really quite challenging.
And so there is also, I think, just generally speaking as with any product, thinking about an MVP and the specific, limited area where you want to see progress and measure success before expanding. That's another one. And generally speaking, I am skeptical, at this point at least, of non-specialists investing in large efforts like fine-tuning. There is so much progress happening in AI that you can probably ride the wave and benefit a lot from smarter integrations. I think that summarizes the state. And I guess I would also add: keep the future in mind, but be very realistic and specific about what you want to get and what problem you want to solve. Don't just apply AI for the sake of AI.
Abi Noda: Great. Great advice there, and like everyone here, I'm really excited to see what future use cases and opportunities start to actualize. That's one of the next topics I wanted to get into: where are the biggest future opportunities we're seeing? I also want to start weaving in some questions from the audience. Before that, I introduced this part of the discussion with the question I'm asked often, which is, how much productivity gain can we actually get or expect? I want to go quickly around the horn, and I'll go last, with no more than a two-to-three-sentence response from each person here on how you would answer that question if asked by an engineering leader today. Alix, let's start with you. How much productivity gain can we actually expect to get right now?
Alix Melchy: Difficult question. I would say it really depends on the developers, the engineers. If I take the use cases of MLEs, I can see very good productivity gains, just by the way you can leverage data much more efficiently, which is one of the areas where MLEs end up spending a lot of time. And so here you can talk about 25, 30%. In other engineering areas I would be more conservative and wouldn't necessarily advance a number.
Abi Noda: That’s fair. Idan, I’ll go to you next. How would you answer that question?
Idan Gazit: I don't think I would answer it with a percentage, because it's a percentage of what? Uplift on what? Not because I don't think it's possible to measure those things, but so many of the benefits are harder to quantify. Instead of spending our time laying bricks, we get to spend our time doing architecture. That's a win. How do you stick a number on that win? I don't know. And at least all of the research I've ever seen about developer productivity shies away from saying, this is definitely how productivity is measured. It very much depends on the application. I think the key wins are in the satisfaction of the developers and the lowering of cognitive load. The fact that I'm able to orient myself more quickly, and that instead of spending time working for the computer writing boilerplate, the computer works for me, which I greatly prefer as a software developer. I think that's the key win: the lower cognitive load and the ease of doing things.
Abi Noda: And lastly, Anna, you, more so than perhaps anyone here, are face-to-face with this problem of answering that question. I'm curious how you would answer it, particularly for an outside organization that's asking, hey, you've had a lot of success with the rollout at Airbnb, and it sounds like engineers are more productive. How much improvement should we expect in productivity if we do what Airbnb has done?
Anna Sulkina: And I love Alix and Idan's answers there, for sure. To start with, as you know, you have done a bunch of research and thinking around how to even measure developer productivity, and that's really hard. But putting that aside, as I mentioned from our developer experience surveys, the perception at least is that AI so far can improve developer productivity 10, 20, 30%, generally speaking. I can also see that, say in a few years, if we continue at this speed of progress in the AI space, then maybe we'll get to doubling productivity. But again, back to what Idan said, what is it doubling, really? Is it that you are putting more bricks into the wall, or that you can build more things generally?
And lastly, I'd say that we aren't even talking about AI potentially opening up software development to folks who don't have a computer science background. Maybe it becomes so you can draw things; UX designers or product managers maybe can build things without having to write code themselves. Maybe we get to a point in some future where natural language is what you use to build. Or maybe for folks who sit in a non-English-speaking country and have to figure out the requirements, it helps with translation and whatnot. Basically you remove the language barrier to understanding. And at some point it's not just about stamping out code; it's really maybe figuring out the design and architecture. Who knows what the future holds.
Abi Noda: That's great. And I'm similarly optimistic. My response to this question today is that of course it depends on the task and on the engineer, as has been brought up in this conversation. But I certainly see many existing use cases where these tools can drive 50%, 100% plus increases on particular types of repetitive tasks. But then of course there's a large part of software development and engineering time that can't be reduced or optimized yet through these tools. And so the overall productivity increase that can be achieved is largely task-dependent. Moving into one of the audience questions that I think is fantastic. This is a question from an individual who works at Apple, asking, "For an org starting with GenAI in developer productivity, what's the right first step?" Anna, I'd love to direct this question to you. If you rewind the clock to when GenAI was new on the scene at Airbnb, how did the process begin?
Anna Sulkina: I think it all started with figuring out how to integrate chat into the company's, and including developers', workflow, and connecting the chat and making it context-aware, Airbnb-context-aware. I think that's where the biggest initial effort was. And then figuring out how to integrate that, and as next steps exposing some settings and experimenting with how much more context is good or bad, how you do this integration. That was the first step. And then I don't remember the exact sequence, but at some point Copilot, the IDE code assistant, became a priority as well. And so those two were the work streams where we were moving pretty quickly and seeing amazing results.
Abi Noda: Thank you for addressing that. Going to another question from the audience, from David, asking about where it is that we really see friction for developers. I hear this a lot: tools like Copilot are very focused on the coding part of the job. But as the question states, he's having difficulty explaining how we're not just improving a local optimum, which is the coding phase of the developer workday or workflow. Idan, you touched on this earlier. How do you think about the overall developer experience and where the biggest opportunities are, or are not, for improvement?
Idan Gazit: When television was first invented, the dominant form of entertainment was the radio drama. And they didn’t know what to do with TV cameras in the beginning. And so what they did at first is they just stuck them in the room and they filmed the radio drama performers with the microphones, like acting out the radio drama. And they’d pop a balloon and that would be the gunshot where the murder happened or whatever. But they just filmed it. They took this new tech that they didn’t know what it was for, and then they bolted it onto the side of the old tech, which they were familiar with. And I think that’s where we are right now, figuring out what to do with this. Reality television was possible on the first day that the TV camera existed, it just took us time to figure out what to use this new medium for.
I think about that a lot. The original Copilot was the first mass-market GenAI thing. And then very rapidly thereafter, it was a year or two later with the release of GPT-3.5, ChatGPT started, and then GPT-4. Now we have these two: code suggestion is one application and chat is a very different application. Chat is not as good at just being in my authoring process. But it's fantastic as, I don't want to tap a human on a shoulder, and maybe the question I'm going to ask is stupid and I don't want to feel stupid, but I don't feel embarrassed in front of the robots, for now. We've got these two applications, and now we're all starting, I think, to reach for the more interesting things, which are the AI-native things, like the reality television. What do we actually want to use this for?
We recently released Copilot Workspace, which is, I'd say, our first effort at this. It's a little bit more towards that natural-language thing you were touching on. I want AI to generate a spec for me: here's how it works today and here's how I want it to work in the future. And then I want the developer to be able to steer with natural language. And then build a plan: okay, you've got the spec, how do I go from before to after? What files do I need to touch? What changes do I need to make? Again in natural language, and again giving the human that steering wheel. And only then do we actually synthesize the code. I don't think that's the only new modality, but when you ask where we are going, how we are going to reach for more, I definitely think about how we get past chat.
Chat is fantastic for some applications of this, but it's not very actionable. It's hard; I can't have a conversation with chat about architecture and then be like, go apply that to my code base. I can at best be like, yeah, take this snippet that you generated and paste it right here. But how do we start to reach for more systemic interactions with code? To the point where maybe, for some kinds of tasks, we won't ever need to see the code. Which sounds scary, but when was the last time you wrote C or assembler? We've all moved into using higher and higher-level languages. And we've all gotten comfortable with the notion of there being a tool that transforms what I say in a high-level language into machine code, so why not English to JavaScript? I don't know. Figuring that stuff out, that's the discovery right there; that's the ballgame.
Abi Noda: Well, a lot of really deep thoughts and analogies that I really enjoyed listening to. I'm sure the four of us could spend all day talking about this topic, but we are out of time today and need to wrap. Idan, Anna, Alix, thank you all so much for your time today, and thank you to the audience, everyone who tuned in. We'll be sharing the recording immediately after the conclusion of this stream. Thank you everyone for joining. Hope to see you again next time.
Idan Gazit: Thank you.
Alix Melchy: Thanks, everyone.
Anna Sulkina: It was fun. Thank you.