Laura: Welcome to another episode of Engineering Enablement. I’m Laura Tacho, your host this week. My guest today is Fabien Deshayes from Monzo Bank, a London-based bank that serves over 13 million customers and is scaling fast. Fabien oversees multiple platform engineering teams at Monzo that own everything from CI/CD for backend and mobile to mobile releases, feature flags, and of course AI tooling. Before he was at Monzo, Fabien worked at Spotify on Backstage and developer experience. Fabien, thank you so much for joining us today.
Fabien: Thank you very much. Thank you for having me. Long time listener of the show, so very pleased to be here.
Laura: Yeah, well, we’re pleased to have you. Could you start by telling the audience a little bit about Monzo for those of them that haven’t heard about Monzo before?
Fabien: Sure, so Monzo is a digital bank in the UK. We have more than 13 million customers here and we are very proud of what we’re building. Very new, and yeah, we’re coming to Europe now.
Laura: Very excited to hear that. Can you share a little bit about your role? So you’re responsible for several platform engineering teams, but what does the scope of your role look like?
Fabien: Yeah, so I look after four teams at the moment, which sit around what we call developer experience but touch many different parts. So for example, we’ve got the mobile platform team, which looks after our releases and everything to do with our apps and performance. We also have two teams dedicated to developer experience, and a team focused on AI enablement. We call it augmentation engineering.
Laura: That’s a good name for it. AI augmentation engineering.
To start our conversation, Fabien, we’re going to talk a lot about AI. We’ll talk about data-driven evaluations of AI tools and some of the downstream impact it’s having on your engineering organization and your business. To start with, Monzo is a bank. It’s a tech company that also does banking, we could say. You’re a modern bank, but a bank nonetheless. You have a lot of sensitive information, personal information. It’s a highly regulated industry. Could you share a little bit about your approach to AI and your AI strategy in a regulated industry like banking, where innovation and regulation sometimes don’t seem to go together so well, but where you’ve done it in a way that’s really working for you? Can you talk a bit about that?
Fabien: I think what’s important for us is that, as the platform group for all the engineers at Monzo, we can provide tools to everyone. And by centralizing how these tools are made available, we can make sure we have guardrails in place when it comes to PII, client data, all of that. That’s top of mind for all of us. So we’re not going to make all the tools available and tell everyone they can use them on whatever they want. We’re going to make sure that they’re only applicable to certain domains and only to certain people who’ve done some training. It doesn’t mean we’re not looking at expanding these tools, but we want to make sure that we have things well in order before expanding. And we’re also a tech company, as you said, so we also want to make sure we’re not falling behind the curve.
And we know all of our competitors are looking at these tools, and we did the same something like a year ago, where we looked at ourselves and wondered, should we just bring all those tools in or should we wait a little bit? We decided that the best approach was to get the tools in and go through a trial with some evaluation criteria, and just be very deliberate about what we are trying to achieve with these tools, where we are going, and which ones are the ones we want to keep. It’s also not about just looking at what works for others. It’s about understanding what works best for us, because we have a very specific context: as you said, a regulated industry. We also have a monorepo. We’ve got a few different languages: Go, Swift, Kotlin. So all of these things are parameters for how AI needs to work for us.
Laura: Yeah, I think that’s such a good example of how regulation and the constraints that you’re operating in have forced you to think very deliberately about every choice that you’re making with AI, which in my experience with the companies that I speak with has led in a lot of cases to better outcomes, because the AI tooling and evaluations are a lot more targeted at very specific problems. And there’s a lot more change management around how they’re being rolled out, which turns out to be an advantage in the long term, even if it feels like a disadvantage in the short term, because you hear a lot on LinkedIn or in other headlines about how fast other companies are moving. But to your point, Fabien, making sure it’s the right call for your circumstance. And every company is a little bit unique, so I think that’s a very good piece of advice for those listening. Let’s rewind two years. Let’s talk about where you started with AI. I know it was with Copilot. Can you talk us through the very early days of Monzo and AI-augmented engineering?
Fabien: So I think like many other companies, our journey with AI started with GitHub turning on Copilot for us, and suddenly this landed in our VS Code. And that was quite interesting and nice, and a lot of people started trying it, but there was not a lot of adoption. Not everybody was keen, and they also didn’t really know what to do with it. And I think from there we started being more and more curious, and I think our leadership was really keen to make sure that we were investing in bringing these tools in. And we were really empowered in the platform collective to bring these tools in, but also to do it in a very structured way. So that’s what we did.
Laura: Let’s talk about some of that structure, because data-driven evaluation of AI tools is so critical. Can you share why that was the approach that you took? Why was this structured evaluation with really clear criteria so important for your AI strategy?
Fabien: We are very data-driven at Monzo. And any decision we make, we need to make sure that it is the right one. And what better than data to make this decision? So we wanted to make sure that even before we started any trial, we had evaluation criteria in mind so that we could compare all the different tools that we were going to try. We’ve tried three tools since then, Cursor, Windsurf, and Claude Code, and we’ve applied a set of evaluation criteria to all of these. And this has been extremely helpful, because at the end of the trial we’ve got our evaluation criteria, we’ve got our values, our numbers, and we can decide whether we want to continue with it or not.
And it was very easy to convince anyone who might question it or who wanted to say, “Oh, this tool is great. I’ve used it on my personal time. We should definitely use it.” Then we look at the data and we can say, “Oh, no, look at that. There are some things that don’t match up with what you’re thinking.” So having those evaluation criteria has been extremely helpful for us. Some of the criteria we’ve looked at have been really helpful. We’ve got a list of 10 to 15 criteria around adoption, weekly active and monthly active users, and retention, which is super important for me, because I see a lot of people wanting to try the tool. They’ll be counted as part of weekly active users, but then they’re just going to drop the tool because it’s not really helpful. So I think the retention criterion was really helpful.
We’ve been looking at the number of lines of code returned by the AI and the percentage of code written by the AI. We’ve been looking at the acceptance rate of suggestions, and the satisfaction of our engineers. That has been really, really important. And we’ve looked at the different use cases each tool covers and whether it’s good at certain things rather than others: bug fixing, documentation, large migrations, refactoring, developing new features based on templates, all of these things. We wanted to make sure we had detailed feedback on all of these. We also looked at the health of the company behind the tool, and also just the cost of the tool, because some of these tools can be quite expensive if you use them a lot. So we want to make sure that there’s a good return on investment and we understand where we’re spending our money.
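[Editor’s note: to make the kind of criteria Fabien lists concrete, here is a minimal sketch in Go of how one trial’s numbers might be recorded and compared. The struct fields, metrics, and figures are illustrative assumptions for this write-up, not Monzo’s actual evaluation tooling or data.]

```go
package main

import "fmt"

// ToolEvaluation captures one trial's criteria, roughly along the lines
// described in the interview: adoption, retention, AI-written code share,
// acceptance rate, satisfaction, and cost. Field names are illustrative.
type ToolEvaluation struct {
	Tool               string
	TrialSeats         int     // engineers given access during the trial
	WeeklyActiveUsers  int
	MonthlyActiveUsers int
	AICodeShare        float64 // fraction of merged code attributed to AI
	AcceptanceRate     float64 // fraction of suggestions accepted
	Satisfaction       float64 // average survey score, e.g. 1-5
	AnnualCostPerSeat  float64 // USD
}

// Adoption is the share of trial seats that were active at all this month.
func (e ToolEvaluation) Adoption() float64 {
	if e.TrialSeats == 0 {
		return 0
	}
	return float64(e.MonthlyActiveUsers) / float64(e.TrialSeats)
}

// Retention is the share of monthly active users still active this week,
// one simple way to separate "tried it once" from "kept using it".
func (e ToolEvaluation) Retention() float64 {
	if e.MonthlyActiveUsers == 0 {
		return 0
	}
	return float64(e.WeeklyActiveUsers) / float64(e.MonthlyActiveUsers)
}

func main() {
	// Made-up numbers, only to show how a trial's record would be summarized.
	trial := ToolEvaluation{
		Tool:               "tool-x",
		TrialSeats:         400,
		WeeklyActiveUsers:  90,
		MonthlyActiveUsers: 150,
		AICodeShare:        0.20,
		AcceptanceRate:     0.30,
		Satisfaction:       3.8,
		AnnualCostPerSeat:  600,
	}
	fmt.Printf("%s: adoption %.0f%%, retention %.0f%%, satisfaction %.1f\n",
		trial.Tool, 100*trial.Adoption(), 100*trial.Retention(), trial.Satisfaction)
}
```

Keeping the same record shape across trials is what makes the comparison Fabien describes possible: the same template, the same numbers, for every tool.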
Laura: Yeah, thank you so much for sharing that with me and with the audience out there. I know that data-driven evaluations, and understanding how to even approach making a purchasing decision, is something so many leaders are in the thick of right now, really kind of struggling to find an approach. So I think your guidance will be really helpful for them. I want to get into specifics on some of these things. So first, let’s start with the tools themselves. You’re part of the platform collective. You mentioned that you’re trialing Claude Code, Cursor, and Windsurf. Who picked those tools? Was that something the platform engineering team or your platform engineering collective was responsible for? Could developers nominate tools for trials? How did you land on those being the ones that you were trialing?
Fabien: Yeah, it’s an interesting mix of both. So we’re listening to what people are suggesting, we’re looking at the market, and we’re talking to some of our friends at other companies and trying to understand where this is going. Obviously the market is moving really quickly, and I remember we started a trial and suddenly there was another tool that everybody wanted to try, but we also wanted to make sure we sequenced the trials so as not to have too much overlap, because otherwise you could bias some of the data. So it’s really a mix of what our engineers and leaders have been asking about, but also us digging more deeply to try to understand if this is something that could work for us in our context. And we also want to make sure we can cater for the widest set of use cases. So we’ve looked at tools that could help iOS engineers more specifically, for example, which would be great because they don’t always have the best tooling in their day-to-day lives, but now might not be the best time for us to invest in a tool that can only serve 10% of our engineering force.
When we look at tools like Claude Code, the great thing with it is that it can integrate with many different workflows and IDEs, and that makes it very easy to adopt and start using. So there’s also a question of which tool, and we’ve had to reject some proposals from people wanting to do a trial just because we believe it’s not the best investment of our time. We don’t have an unlimited budget or a very large team that works on bringing these tools in, but we are much more interested in making sure we can bring in tools that can help the majority of our engineers and that we can invest in. I think it’s one thing to bring a tool in; it’s another thing to make sure that the tool is useful and understands all of the context at Monzo. So we’ve invested a lot into tooling and Claude files and workflows and all of the markdown files that really guide the models.
Laura: Yeah, some organizations take the approach that a developer just gets a stipend and they can choose whatever tool they want, in which case your iOS engineers could be picking something different from your Go engineers based on whatever preferences they have, their own workflows and use cases that they need to have covered. At Monzo, though, because you are operating at scale and scaling, and you’re in a highly regulated industry, when you talk about bringing in a tool, you’re not talking about just developers being approved to use it. You’re really talking about an organizational decision to support this tool, train people, show them how to use it, and, as you said, all the .md files and all the context gathering for this tool to really integrate it into the workflow. Am I getting that right?
Fabien: Yes, that’s absolutely correct. Another example of that is how we use MCP. It would be very easy for us to just enable all of the MCP integrations and let models access our BigQuery data and our Notion data and our Jira data and whatever tool you want, but some of these tools might have some PII or there might be things that are confidential that we don’t want to share. So our approach by default is not to enable that, but we’re creating an abstraction on top of all of these, what we call knowledge draws, and we are just allowlisting the different things that we want the models to be able to access. So something that we’ve done, for example, is we’ve got a lot of reference documentation as part of our onboarding paved path, or golden path, whatever you want to call it, and we want to make sure that the models know how to access it.
By ensuring that the models can access the Monzo context, we make them 10 times better at doing our day-to-day job. And I think the first reaction of a lot of engineers who’ve tried these tools is that it’s pretty good, but they had to tweak the code to make it the Monzo way. And that’s because the tool doesn’t know about it, so it’s not going to magically pick things up. And because we are working in a monorepo, there’s quite a lot of context if you were to push the entire monorepo there. So for us, having markdown files that can guide the models and having an MCP server that can open up a RAG workflow for the models has been extremely important, and that’s been very successful as well.
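[Editor’s note: a minimal sketch, in Go, of the “deny by default, allowlist explicitly” pattern Fabien describes for MCP-style knowledge access. The source names, tool surface, and lookup function are hypothetical, not Monzo’s real integration.]

```go
package main

import (
	"errors"
	"fmt"
)

// allowedSources is the allowlist: only these knowledge sources can be
// queried on behalf of a model. Names here are invented for illustration.
var allowedSources = map[string]bool{
	"engineering-docs": true, // e.g. paved-path / golden-path reference docs
	"runbooks":         true,
	// anything holding PII or confidential data is deliberately absent
}

var errSourceNotAllowed = errors.New("knowledge source not on the allowlist")

// Query routes a model's knowledge request only if the source is allowlisted.
func Query(source, question string) (string, error) {
	if !allowedSources[source] {
		return "", fmt.Errorf("%s: %w", source, errSourceNotAllowed)
	}
	// In a real server this would call the underlying RAG or search backend.
	return fmt.Sprintf("results for %q from %s", question, source), nil
}

func main() {
	if answer, err := Query("engineering-docs", "how do we create a new service?"); err == nil {
		fmt.Println(answer)
	}
	if _, err := Query("customer-data", "show me account details"); err != nil {
		fmt.Println("blocked:", err)
	}
}
```

Anything not on the allowlist simply is not reachable by the model, regardless of which client asks, which is the guardrail Fabien is describing.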
Laura: Yeah, so this is really about transforming the way that software gets made at Monzo, not just about an individual developer having three different AI tools that they can use however they want for their own individual tasks. This is really about changing the way that your organization works, which is why data is so important and why you need the validation of the use case and of the impact, so that when you do scale it, and it’s an investment of money but also time, resources, and training, you’re really making a solid decision.
Fabien: Definitely. It’s so important to make sure that it’s not just bring the tool in and forget about it. It has to be a continuous investment, and it’s also not just about one tool; where we can, we want to work across tools. And that’s why I think MCP is really great: it can interact with all the different models and all the different tools. So that’s really great value for us.
Laura: I want to get back to some of the metrics that you mentioned before about how, tactically, you are evaluating these tools, and get into some specifics. So you mentioned a couple of things like adoption rate, percentage of code written with AI, percentage of PRs written with AI, satisfaction, company health. What are you actually using to measure these things, both during the trial period and beyond it? Are you using system data? Are you using self-reported metrics? What have you had to put together in order to get a full view of the impact of these AI tools at Monzo?
Fabien: For some of these criteria, we get the data straight from the tool providers, so things like active users, lines of code returned, sometimes acceptance rate as well. That’s the kind of thing that they like to put in front of us. Credit usage and cost as well, that’s very easy for us to gather. For the others, it’s mainly survey data, so we send a short survey to all the users. We have a list of users, and we’ve used the same template for all the different tools, again, just to make sure we are doing this the right way. Yeah, it’s been pretty good because we can then dig into some detailed areas, like I said about satisfaction.
We have one overall satisfaction score, but we also dig into the different use cases and try to ask, how good is tool X for refactoring? How good is tool X for large migrations? How did you find tool X at explaining code? So all of that helps us dig into the details. And then we are running quarterly developer experience surveys as well, and we’ve started adding some more AI-tailored questions to those. For example, one thing that was really interesting is that we still only have half of our engineers who feel confident using AI engineering tools on a day-to-day basis. So it shows us that we still have a long way to go.
Laura: So even if the adoption rate shows people are using it maybe weekly or monthly, that’s not the same as them feeling confident, feeling like it’s really helping them, trusting the output. And then from there you can make strategic decisions about investing in training, enablement, targeting other use cases, doing other things. So I imagine that data is really helpful for that continued support of the tool and helping people stay onboarded as users.
Fabien: Exactly. It really points us to where we need to invest. Do we still need to invest in upskilling people? Do we need to invest in more tools? I think at the moment we’re in a position where we know that we have good tools in, but we just need the engineers to feel more comfortable with them.
Laura: Yeah. So you mentioned doing evaluations of a couple of tools, Claude Code, Cursor, Windsurf. For an individual developer, do they have access to multiple trial tools at the same time, or do you segment who gets access to which tools, so you’re comparing different cohorts to each other? Or is there ever the experience of one developer having access to a bunch of tools and then picking their favorite?
Fabien: Yeah, that’s a really great question. What we’ve tried to do is first sequence how we bring these tools in, just because we want to avoid overloading people with new things. The pace of change in AI is already very high, and we want to avoid people feeling like they’re drowning, or even having a fear of missing out. It’s also something we’re very conscious about: just because we bring these tools in doesn’t mean everybody needs to drop what they’re doing and jump on them. We had a good learning experience when we did these trials, because for the first one we gave the tool to as many engineers as possible. And it was almost free for anyone to use, but we quickly realized that there were a lot of people who were interested but didn’t have the time to invest and couldn’t give us any good feedback. And it felt a little bit disappointing in that we spent so much time trying to enable 400, 450 engineers and only 150 would use it and then 50 would give us feedback.
So we decided to change our approach in future trials to start with a small cohort of people who have a very well-defined use case. So they might be working on a migration. They might just be very interested. We have a group called the AI Engineering Champions, who sit in different parts of Monzo looking at different parts of the product. Having a couple of engineers who are very keen on leveraging these tools, but who can also be the relay for the rest of their collective and present their learnings, was very important, so they can pass on the information. And starting with this smaller group of people, we can really push the boundaries of these tools. It made us realize that some tools might look great on the surface, but when you dig into actually using them on real projects for real use cases, there are some limitations. So I think this cohort helping us evaluate and giving us better, more qualitative feedback was really helpful for the subsequent trials that we’ve run.
Laura: How are you getting that information from those champions and then scaling it across the rest of the organization? Because I imagine that advocacy is part of the responsibility of being an AI champion.
Fabien: Yeah, it’s something we haven’t yet fully invested in, because the way we look at things, first we wanted to bring some tools in, try them, and get our hands dirty. The second phase is to invest and double down on these tools by building all the tooling around them, all the safety nets around regulation, the guardrails that we want to have. And then the last phase is that we want to start upskilling people, and we’re just getting there, but part of our strategy is to communicate very often, whether it’s Slack channels or all-hands announcements, around the things we’re doing and the success stories as well. I think it’s really motivating for people to hear that one of their colleagues tried the tool for a particular use case and it really worked.
And what I see is that there are a lot of people who are not part of the champions, but who just did it because they felt it was really productive and they wanted to share it with the rest of the organization. And I find that really great, but I also know that model will not scale to everyone. So we will definitely bring in some more structured learning, looking at prompt engineering, for example. I think that’s something really important. It’s not really yet part of our onboarding tutorials at Monzo. It’s something we’re going to look at in the next six months, for sure.
Laura: Yeah, a lot of exploration, it sounds like. A lot of experimentation happening, trying to figure out where the high leverage use cases are and then nailing it really well and then scaling it across your organization seems to be your approach.
Fabien: Definitely. And we’ve got some good data around that. When we look at AI usage and also spending, we see that we have a minority of users who are the biggest spenders. The curve starts with a high peak, the top 10 users accounting for 50% or more of the spend, and then it really decreases, and then there’s a long tail. And what we want is for that curve to become a little bit more linear. So yes, there are always going to be high spenders, people that are using it more, but we want that critical mass, the kind of 25th to 75th percentile, to be spending a lot more and just being more comfortable with these tools. And the great thing is we have that data already.
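[Editor’s note: a small sketch of the spend-concentration check Fabien describes, computing what share of total spend the top N users account for. The per-user figures below are invented purely for illustration.]

```go
package main

import (
	"fmt"
	"sort"
)

// topNShare sorts per-user spend descending and returns the fraction of
// total spend attributable to the top n users.
func topNShare(spend []float64, n int) float64 {
	sorted := append([]float64(nil), spend...)
	sort.Sort(sort.Reverse(sort.Float64Slice(sorted)))

	var total, top float64
	for i, s := range sorted {
		total += s
		if i < n {
			top += s
		}
	}
	if total == 0 {
		return 0
	}
	return top / total
}

func main() {
	// Hypothetical monthly spend per engineer, in USD.
	spend := []float64{400, 350, 300, 250, 200, 180, 150, 120, 100, 90,
		40, 35, 30, 25, 20, 15, 12, 10, 8, 5}
	fmt.Printf("top 10 users account for %.0f%% of spend\n", 100*topNShare(spend, 10))
}
```

Tracking this share over time is one way to see whether the curve is flattening in the way Fabien says they want, with the middle of the distribution spending and using more.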
Laura: Yeah, and how are you getting that data? You’re getting it directly from the tools themselves, right? Telemetry has matured a lot in the last 12 months when it comes to what you can get from the tools, and I think usage and cost data is one of the areas where there’s a lot more visibility than 12 months ago, for example.
Fabien: Yeah, definitely. And we also conducted a bunch of one-to-one interviews with our most costly users, or, as we call them, power users, because that’s much nicer. And it’s really, really interesting to know what they’re doing, because they’re very advanced on the curve of what you can do with AI in engineering. There were also some people who just said, “I was just trying to compare the different models.” So yes, there might have been a lot of spending for that time, but I think it’s not always going to be like that.
Laura: Yeah. It was an experiment with a purpose, right?
Fabien: Yeah, but that person has so much learning that we actually shared what they’ve learned with the rest of the company, and I think that’s really going to help everyone.
Laura: I want to talk a little bit about the spending and budget aspect. I mean, Monzo is a bank, so it’s probably not unusual for you all to be thinking about pounds and pence and other things. Cost control, I think, is something that is on a lot of people’s minds. Maybe they’re not actively doing anything about it, but 2026 is not going to see AI costs reduce, I think, right? We should be expecting to spend more. Can you talk about how the cost piece of it factors into your evaluation criteria? Do you have a budget in mind for AI tool spend per developer? How are you approaching bringing in all of these tools, saving room for experimentation, but also not blowing the top off the budget that you have for R&D?
Fabien: Yeah, that’s something where, when we started a year ago, we had no idea. And we looked at what would be sensible, how much we would be comfortable spending. We started with that, but very quickly we realized that number was not really helpful. I think what helped us to land on a figure was to actually put the tool in the hands of our engineers and see whether they’re satisfied with it and how much it costs. And we’ve landed on a figure of $1,000 per engineer per year at the moment.
And that’s something we’re comfortable spending, and we’re not there yet, but we’re close to that. And that’s something that has been very transparent from the beginning, up to our CTO, for example. We were discussing with them and saying, “Would you be comfortable if we spend that much?” And they said yes. And I think we had some rationale behind it. It’s not just a random number out of the air; we’re thinking that a typical user would make so many requests per day and so on. Then agents are coming. That’s yet another unknown. And similarly, I think with agents we’re going to be spending more, but we’ll take a similar approach of saying we expect to see these gains from agents, and these gains will translate to these different costs. But I think until we have real data, it’s difficult to project.
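[Editor’s note: a back-of-envelope sketch of how a per-engineer annual figure can be sanity-checked from usage assumptions. The working days, request counts, and per-request cost below are illustrative assumptions, not the actual rationale behind Monzo’s $1,000 number.]

```go
package main

import "fmt"

func main() {
	// All numbers here are illustrative assumptions.
	const (
		workingDaysPerYear = 220
		requestsPerDay     = 30   // assumed typical daily usage
		costPerRequest     = 0.15 // USD, assumed blended model cost
	)
	annual := workingDaysPerYear * requestsPerDay * costPerRequest
	fmt.Printf("estimated annual cost per engineer: $%.0f\n", annual)
}
```

With those assumptions the estimate lands just under $1,000, which is the shape of reasoning Fabien describes: start from expected usage, then check the figure against real spend once the tool is in engineers’ hands.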
Laura: And just for some benchmarking: generally what we’re seeing in the conversations we’re having with engineering leaders, as well as a poll we did during a recent webinar about budgeting, where about 200-plus folks responded to the question of how much they’re spending per engineer, we had the majority of people falling into that $500 to $1,000 range, and then tailing off as we got above $1,000. But that’s also my general guidance, sticking around that $1,000 mark per engineer per year.
A GitHub Copilot Enterprise license is about $500 per developer per year. And I think, like Monzo, a lot of the companies that I talk to, I would say most of them, are taking a multi-vendor approach, where there are a lot of different use cases and tools are still specialized. We don’t have one tool that’s going to cover every single use case. And so it’s important to keep room for experimentation, challengers to incumbent tools, kind of keeping options open, especially as this ecosystem is evolving so fast. Just so fast. It’s hard to keep track.
Fabien: Very tricky to know what’s going to happen in three months.
Laura: Absolutely. And one of the things about cost, which leads me into my next question, is return on investment. So we can look and say, “Okay, we’re comfortable spending up to $1,000 per developer per year,” but we also want to see the results coming back to the business. And so Fabien, I wanted to ask you: at Monzo, generally speaking, what are some of the results that you’re seeing from using AI right now in terms of throughput, speed, quality, those kinds of engineering performance metrics? What has been the change or the shift with AI-augmented engineering?
Fabien: We’ve collected some data very recently on that, so I’m very happy to be able to talk about it. One thing we’ve seen is that there are definitely more pull requests per engineer. Not 100% more, but something around 10% to 20% more PRs per engineer. And what was really interesting to us was that the average size of the pull requests is higher, something like 20% larger, which I think was really, really interesting. The other thing we’ve noticed is that 20% of the code that is produced is now, according to qualitative data, produced by AI, and we found that really interesting. And I think in our latest developer experience survey, we’ve also seen a resurgence of people complaining about, or mentioning, time for review being a bottleneck, because reviewing pull requests coming from AI can be much more complicated and take longer, just because there’s more code. And so we found that now we have a new bottleneck and a new problem to solve.
Laura: Yeah. So you’re having not just more PRs per engineer, but also the size of the PRs per engineer is growing. And so that’s kind of two things that lead to an increased bottleneck in code review. Are you using AI tools for code review right now?
Fabien: We’re using it, but we’re using it as opt-in. It’s interesting, because at some point we tried having it as a blanket thing: on all of our pull requests we would have an AI review. And we did a survey again after two weeks and realized that a lot of people thought it was not a great investment. It was really good at certain things, but it also created a lot of noise, so we decided to make it opt-in.
Laura: Talking about bottlenecks, we mentioned code review being a bottleneck. One of the things you mentioned as well was developers trying to get their changes into a dev environment to test, and now we have more code more frequently. This was an existing bottleneck that you had, and AI is just exacerbating the problem, making it worse, but you came up with a solution, which is these preview environments. You’re calling them tenancies. Can you talk a little bit about how that project came to be and how it’s helping developers get more valuable work done?
Fabien: Yeah, that’s been a very interesting journey that started roughly at the same time we started to adopt AI. So the bottleneck was already there, and we saw that through our developer experience surveys, and we knew it was a problem. And we spent a lot of time thinking about what would be the right solution, because at Monzo at the moment we have a dev environment and a production environment. We don’t have anything in between.
And then we could have just said, “Oh, let’s create another environment,” like an integration environment or something like that, but that would just move the bottleneck somewhere else, I feel. So we really went back to first principles and tried to understand what the engineers are trying to achieve. Basically, they want to test their code in an environment, but they also don’t want to go through the process of having to merge and wait for the deploy and do all of that. Otherwise they would just deploy straight away without reviews or anything, which would create a lot of instability in our dev environment. So what we did there was, as you said, create preview environments, which allow us to route some requests through our dev environment to a tenancy, which is per user. The user deploys their service to that tenancy, and then we route the traffic straight to that service.
And that allows us to scale really, really well, because now we have one per user and our dev environment is a lot less stretched. One of the side effects we saw was that we had a lot of flaky tests, for example, because we were running tests on an environment that was constantly changing and whose state was mutating a lot. And so we’ve seen a lot of improvements in that area already. And we’re very glad that we’re building that now and we’ve got it available, because with AI being here, and soon AI agents being able to test their changes on these environments, that’s going to be fantastic.
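[Editor’s note: a rough sketch of the preview-environment routing idea, where requests carrying a tenancy identifier are proxied to that user’s deployed service instead of the shared dev deployment. The header name, URLs, and registry here are hypothetical; this is not Monzo’s actual implementation.]

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// tenancies maps a tenancy ID to the base URL of that user's preview
// deployment. In a real setup this would come from a registry, not a map.
var tenancies = map[string]string{
	"fabien": "http://service.fabien.dev.internal:8080",
}

// sharedDev is the fallback: the normal shared dev deployment of the service.
const sharedDev = "http://service.dev.internal:8080"

// targetFor picks the upstream: the caller's tenancy if the request names
// one we know about, otherwise the shared dev environment.
func targetFor(r *http.Request) string {
	if t := r.Header.Get("X-Tenancy"); t != "" {
		if base, ok := tenancies[t]; ok {
			return base
		}
	}
	return sharedDev
}

func handler(w http.ResponseWriter, r *http.Request) {
	target, err := url.Parse(targetFor(r))
	if err != nil {
		http.Error(w, "bad upstream", http.StatusInternalServerError)
		return
	}
	// Proxy the request to either the per-user tenancy or the shared service.
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

func main() {
	log.Println("routing dev traffic on :9090")
	log.Fatal(http.ListenAndServe(":9090", http.HandlerFunc(handler)))
}
```

Because each engineer, and eventually each agent, gets its own routing target, testing a change no longer means deploying unreviewed code into the shared dev environment.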
Laura: So AI can exacerbate and really highlight some of the existing bottlenecks. As you said, you had this bottleneck even before AI came along, and thankfully you have such strong platform principles at Monzo, thinking about scaling this and addressing the bottleneck, which is also helping you accelerate AI. This is now groundwork for agents being able to create and then test and verify their own changes in these sort of ephemeral environments. And you already have the scaling built in, so that’s a huge multiplier for you all, having that really good plumbing and that platform mindset to start with. That will be very cool to see in the next 12 months, how this all unfolds.
Fabien: Yeah, exactly. It’s a very exciting future.
Laura: What I wanted to ask you as we start to wrap up here is that it’s not just developers but also other folks using AI, and we’re sort of expanding the definition, maybe not of who is a developer, but of where the work gets done and where it gets handed off. You mentioned before about product managers and designers now having more access to AI tools in order to get a bit further along in prototyping and validation before that work would be transitioned to a developer team. Can you talk a little bit about that approach? I think it’s a really great illustration of using AI not just to speed up individual coding, but to really change the way that software gets made at Monzo in general.
Fabien: Yeah, it’s been a fantastic thing to observe this change, because it came organically from our designers, our product managers, and other disciplines at Monzo, who just reached out to us saying, “Hey, I know you’re in charge of tool X or tool Y. Could you help us get on board? Because we have these great things we want to achieve.” And we’ve seen some fantastic results, where all of our designers have been able to try the tool and prototype things directly. And the feedback we got from them is fantastic, because they say the prototypes they create are much better quality and it’s much, much faster to get them out. But interestingly, and similarly to what we’ve discussed before, we need to make sure that we give the LLMs the right tools to work well at Monzo. So for our designers, we have a design system with the building blocks for what a button looks like, what the different rules in terms of spacing are, and everything. And all of that is available to the tools.
So when they create the prototypes, all of that is pretty much instantly there. And that was fantastic to see and to observe, but I think the best aspect for me as an engineer is to see that we’re blurring the line between an engineer and other disciplines, and where we create the value in these products. I think the path from an idea to a prototype to something that can be passed on to an engineer has changed widely in the past few months, and I think it’s going to speed up the way we can develop and ship features at Monzo, for sure.
Laura: Yeah, that’s exciting to hear. Two things to ask you as we close out our conversation. First, for companies that are maybe feeling stuck: they’re making these tools available and they’re noticing, like you did in the early days, that adoption’s just not what they expected, or the value or impact isn’t what they expected. What is your advice to companies who are trying to use AI as a way to really change the way work gets done and move the needle for the business, not just be a fun, enjoyable thing for engineers to do? What advice do you have for them?
Fabien: Yeah, I’d say there are multiple pieces of advice, because it’s quite a complex thing. I’d say this: don’t limit yourself to just making these tools available. There are a lot of things that come with it. You’ll need to share examples that work. You need to build the foundations to make sure people can leverage it and it is effective in their context. You need to share the prompts that you’ve used successfully. You need to share success stories, and you need to build communities as well. I think it’s very important to have relays in different parts of your organization who understand what you’re building and can relay your message when you’re saying that there’s a new tool available, or you’ve built an MCP for them, or there’s something else coming, new features, or new models that are really good at certain things.
And I think related to that, bringing a tool into your company is quite an investment. It’s not something that you just put on the shelf for people to self-serve. You’re going to have to invest in it to make sure that it works, so don’t try to bring all the tools in at once. I don’t think that’s going to work well. Make sure you have evaluation criteria, or some way to understand what’s going to work for you and what’s not. And just be happy to say no to certain tools if they don’t meet the threshold that you’ve defined. Invest where you think it’s best. And yeah, good luck, because it keeps changing. So it’s a tricky domain to be in for sure, but super exciting.
Laura: Yeah, really exciting. Things are changing really fast when it comes to AI and the tooling ecosystem. What do you think will be the biggest changes that we’ll see in the next 12 months?
Fabien: I think for me, there’s something around agents and how engineers are going to use and leverage agents in the future. We can definitely picture a world where, as an engineer, you’re actually leading a team of agents that are specialized in very different tasks. So you might have your unit test engineer, or your unit test model or tool, and your refactoring one, and your architecture one, and you might have your Monzo domain expert one, which knows all about finance and all these things. And for me, the big challenge, and where I haven’t yet seen any tool that does it really well, is the orchestration of all of that. I believe that’s definitely something that’s going to be very powerful.
These tools are going to be here and they’re going to be even better at what they’re doing, but for engineers or developers to leverage them to the maximum, I think we’ll need something in between, where you won’t have to give all of that context and say, “Oh, look at that file here, and this is the instruction for unit tests,” all of that manual linking that engineers still have to do sometimes. I think there’s an orchestration tool here that can really help with that and make it very easy for everyone.
Laura: I bet there is a platform engineering team out there working on tooling like this that will open source it or create it and make it available somehow. But I think you’re right. It’s that orchestration layer. We just want to declare what we want and get it. And I think, at heart, that’s really the role of platform engineering and developer productivity engineering, DevEx engineering: connecting the dots, centralizing the complexity so that we can abstract it away from the developer in a sensible way. That will be really interesting. Wonderful, Fabien. Thanks so much for sharing your guidance on data-driven AI tooling evaluations, how that’s looked for Monzo, some of the impact that you’re seeing already, and then some advice to engineering leaders. I think it’s been really helpful for those listening out there and thank you very much for joining us.
Fabien: Thank you, Laura. Great chat today.