Designing the AI‑native engineering organization with 1Password, Microsoft and Atlassian

Greyson Junggren:

Awesome. Well, thank you, Justin, and really excited for this panel discussion. Before we begin, I’ll just briefly introduce everyone up on stage. So Tim is corporate vice president in Microsoft’s core AI division, where he leads both the development of AI-driven developer experiences and product and focuses on Microsoft’s internal developer infrastructure, metrics, and engineering culture transformation. Nancy is the CTO at 1Password. She leads innovation across security and AI. And before that, she was general manager at AWS, also a partner at a VC firm, Felicis Ventures. And Taroon is CTO over AI and teamwork at Atlassian. So he oversees engineering over Jira, Confluence, Loom, and Rovo. He also oversees developer productivity and infrastructure at Atlassian. And before Atlassian, he led search at Microsoft Bing and Facebook search and marketplace. So to dive into the discussion, so much is changing right now. What is a change that you’ve made or are planning to make to your engineering org structure because of AI? Taroon, do you mind kicking us off?

Taroon Mandhana:

Yeah. Honestly, a lot is changing, but I think it’s very early. Very transparently, we are not planning to make any changes. I think we already have a fairly large span of control for our managers, somewhere between 12 and 18. So as of now, I really don’t see that changing. What indeed we are seeing changing is how the teams come together to work on a problem. So particularly one of the trends that we are seeing is, especially for zero to one projects, something you’re trying to build from scratch, the team sizes are going down or the squad sizes are going down. Typically, you’ll have three or four people coming and joining very tightly with a designer or a product manager to get something off the ground. And at that time, obviously getting to PMF is probably the highest thing you’re optimizing for. You’re optimizing for the learning loop. And with AI really accelerating a lot of the building time, alignment is where you’re seeing a lot of friction in those setups. So that’s where, by bringing in smaller teams, we are seeing a lot more effectiveness.

Nancy Wang:

Yeah. Maybe to just add to that real quick, I think in addition to team size, one trend that we’re noticing internally is also planning cycles. So for example, this is from my AWS days, where I don’t know how many of you have spent time at AWS, but OP1 should probably set a lot of alarm bells or just be very familiar. That’s where you start planning for the next 12 to 18 months, and you have essentially that bucket of roadmap items. Now, especially with the market and the industry pivoting so much, and we happen to be in the agent security space, which is pivoting just as quickly as agents and their deployment methods themselves, we’re now planning for a quarter ahead instead of 12 to 18 months.

Greyson Junggren:

Yeah. And similarly to this, organization change is best at helping eliminate headwinds and create tailwinds. In order to really do those kinds of optimizations, which are really long, persistent, affect careers, affect the trajectory of people, you need to have a system in a little bit more equilibrium and a little bit deeper understanding of where we’re going. I think the main approach right now is operating in smaller, smaller groups. We’re doing a whole bunch of work where V teams are forming, having very clear specific missions, operating in eight-week cycles, and just really focusing on speed of learning and speed of experimentation. Organization change, that’s a thing to sort of figure out later on once we know what the shape of the sort of new optimized system looks like.

Abi Noda:

In addition to team structures and org structures, there’s a lot of discourse around, “Well, what’s the role of an engineer in the future?” And you all are in charge of defining that in your organizations, both in terms of hiring the next generation of developers in your organizations, but also defining what does great look like. So how do you think about three to five years from now, what is the profile? What is the skills profile of a great engineer? And how do you think about that in terms of hiring and promotions? Tim, can you kick us off on that?

Greyson Junggren:

Sure. There were a bunch of sub-questions buried in there. So maybe I’ll take the first one. One, projecting out three to five years right now, if you’re attempting to do that, you’re insane. You’re hallucinating at what that future looks like, which is really hard because we’re running organizations with people with long-term careers, and so everything is trying to drive us to do that projection. So we have to think about what is durable. Like, “What are the durable patterns that we’re seeing taking shape right now in the teams that are the most effective, in the people who are rising up and able to do the most and to do more as we’re going through this change?” So it’s not necessarily about… We hear a lot of the zeitgeist around the blending of roles. I think there’s some truth to that, but the distribution of that truth is a little varied.

The thing that is absolutely clear and consistent that we see both inside the company, we see with our customers, we see in the startup community that we work closely with is this maker’s mindset. If you have the maker’s mindset, which means you’re not worried about the tools that you’re holding, the specific tool you’re using, you are objective oriented, you’re thinking about crafting and creating a thing and hopefully a business outcome that makes you money. And then you drive towards that with whatever you can. That mindset is actually the thing we’re looking for. And so more and more and more, as we’re thinking about what early in career people look like as they come in, as we think about our existing people who are long in career, we’re trying to identify those and encourage those with that maker’s mindset, like via optimizing incentives and rewards, via the way that we communicate and how we talk about the speed of experimentation, the speed of learning, and the speed of actually creating.

Nancy Wang:

Yeah. I would say for us, and maybe just drawing some experiences from my own career, I’ve actually been sort of shifting in and out of product roles as well as engineering roles. And that really stems from, I would say, when I first started out my career in tech. There was this very sort of sharp delineation between, “Well, if you’re wanting to talk to customers, you’re going to be on the product side. If you want to build, you’re going to be on the engineering side and the two sides shall not meet.” And I feel like, especially now with this mindset of, “Everyone is a builder,” right? It’s easy to spin up Claude Code. It’s so easy to bootstrap your own dev environment now. You might not even need to be super proficient at writing code in order to do that. We’re really seeing these lines kind of blur between product management and engineering.

And so specifically on startup teams like the ones at 1Password, we’re actually having more product engineers, is what I would call them. So folks who have good taste in understanding workflows and understanding UX requirements, talking to customers, but also they’re hands-on sufficiently where they can build that first prototype. Now, with that said, you’ll still need full stack engineers to turn that into production code, and especially if you’re not operating within the confines of an AI native environment where everything is greenfield. So for example, we have to operate internally with a monolithic database, and so that does require engineering to get involved once you pass the prototyping phase. But certainly for new grads, my best advice is, “Try to be both,” right? See if you can actually span the entire spectrum of the product development lifecycle all the way to the software development lifecycle.

Taroon Mandhana:

Yeah, I think I can’t agree more. The product engineering mindset is pretty much key and it’s coming to bear, especially with AI accelerating a lot of the building work. A few other things I would add on top of that is I think the folks we are seeing more successful in leveraging AI and obviously seeing better outcomes are the ones who are able to go to the next level of abstraction to some extent and being able to solve problems at that level and also being able to work across different code bases or different technologies. So I do feel three to five years from now, I think we are going to see a lot more generalists, somebody who can go from front end to middle tier to back end, is very comfortable getting into different code bases and has some level of understanding to work with them, even though they might be deep in one place.

So being able to do that is one part, I think, is very critical. The other piece I would go back and add is also some high level of agency is very useful here because as I said earlier, I think the bottleneck is really decision making alignment, deciding what to build and also responding back to what you’re seeing. So engineers who are more, in many ways, step up, are able to have the right conversations, are able to think about the product and work across craft, are the ones which we are seeing to be a lot more successful. And I do think three to five years it will become even more important.

Abi Noda:

You all touched on the blurring of roles and how that’ll evolve over the future and how that impacts how we think about the profile of engineers we want. A lot of companies I talk to are talking about what’s happening now, which is that roles outside of developers, so PM design, et cetera, are writing code and they’re asking like, “What do we do about… What’s the right process for that in the organization? What does that mean in terms of quality standards?” Taroon, maybe you can kick us off. How are you thinking about approaching that?

Taroon Mandhana:

So I think there are two sides to this. The part that is very exciting, especially around the left of code, where you are ideating, I think those discussions have become a lot more high fidelity, especially if the PMs and designers are building first level prototypes to go ideate and to spar with everybody else. So as a result of that, there’s a lot more, I would say, robust discussion and clarity in terms of what to build. And oftentimes when you’re working on documents or specs, you’re spending a lot more time versus when you’re actually playing with the prototype, you’re having real conversations. So I definitely see in a qualitative sense, some of the left of the code processes are getting way better. Now when it comes to actually shipping and production, that’s a completely different story. If I’m being honest, every day I have engineers escalating that, “This particular designer wants to merge this massive PR and here are all the 15 problems with it and what should I do about it?”

So on the shipping side, I would say we are still figuring that out. And I think it all depends. For very simple things, like we have some areas where designers are fixing fit-and-finish things, things that the engineers never optimized in their backlog, and our designers are feeling empowered to go fix those paper cuts. And I actually think that’s a great idea. So those kinds of changes, I think the teams are welcoming. Also, it depends on how robust the hardening or the processes of deployment are within the team. If the test suites are very robust, if there are the right checks and balances, the right standardized sort of enforcement and a good understanding of accountability once something breaks in production, for those teams, I think they’re a lot more comfortable in entertaining some of these check-ins or merges from non-engineering crafts. But in the teams where there’s a lot of legacy, where the right of code is not in the best shape, I think that’s where I’m seeing a lot of friction and teams are still figuring it out. So I think it’ll depend from team to team, where it goes.

Nancy Wang:

Yeah. I feel like you’ve been reading my Slack channels around the designers and engineering sort of conversations. Very much so. I would say certainly at 1Password, if we talk about, for example, I’ll use a case in point, which is our front end testing, because we’re live through all of the Chromium-based browsers, also Safari, also all of the different mobile clients, desktop clients. That’s a lot of surface area for us to write automated tests, right? And as you all probably have experience with Playwright testing, it doesn’t always work out the way you think it will, right? And so we rely a lot on our community, for example, obviously internal testing, but also our CX team. And this is an example where you have customer support associates or folks who typically, if they’re not engineers, they don’t write code, but now with, for example, all of these coding assistants or tools, they can generate PRs.

And so this is where from engineering, we’re now building the testing harnesses to be able to evaluate those PRs, do code reviews, and make sure that they meet our quality bar. But now you are essentially allowing a customer facing function who has never written code before to actually contribute to the SDLC. Yeah.

Greyson Junggren:

No, I love this and I strongly agree. Zero to one, it’s amazing because more people now are able to take an idea they have in their head and turn it into something that you can actually interact with. So like sort of the product builder, that maker’s mindset is much more accessible. There’s maybe another half to this too that I think is interesting is most of the conversation here was about how more people are participating in the product creation process. There’s a second half, right? We have this mantra inside, which is, “Use AI every day to do more,” right? The goal is to do more. You can use it to help build your products and integrate it into your products, and you can use it to improve yourself and your own systems. The thing that I think is really interesting is that the non-engineers are finding uses on this side more than they’re finding uses on this side.

We see it’s great for prototyping, but we see it through the whole life cycle here and changing workflows and changing the way that they work and optimizing the way that they gather and distill information and communication in all sorts of other forms. So the place where I’ve seen some just absolutely incredible, really exciting stuff that makes me really excited about the future is how all of these people of different disciplines look at the way that they work and are now using AI to optimize those sides. And that doesn’t go to production, except it does. It goes to production in the way the business works. It goes into your own sort of production machine. And that’s really awesome. I think we’ll see this side mature as testing infrastructure, as testing systems, as the way that we validate and build confidence in our code mature. That’ll happen in the next year or so. This side is right now and it’s real right now.

Abi Noda:

Switching gears a little bit, all of you have been in roles of rolling out AI across organization, whether that means encouraging its use, whether it means baking incentives into compensation structures or mandating its use. So I want to discuss that a little bit. So I think just being really direct, have any of you mandated AI usage? And if so, what were any learnings from that? Or what are your thoughts on that? And Taroon, maybe we can start with you. Or actually, Tim, let’s start with you.

Greyson Junggren:

Sure. No, I think mandating its use… Mandating anything’s use is driving an activity, is focused on activity, not outcome. And I think you guys… You and Brian touched on this a little bit earlier, in that nuance of the difference between activity and outcomes. Activity metrics are useful. They’re useful to understand behaviors and patterns, but they’re not the objective. Like I just also mentioned, use AI every day to do more. Now, using AI is a thing we absolutely pay attention to. We’ve been driving, we’ve been encouraging, we’ve been setting up incentives and rewards. We’ve been doing training, trying to make it easier to understand how to hold this new tool as we figure it out. But in the end, the outcome metrics are the things that actually matter. And to us, the frame we use is speed, ease, and quality. Are you making it fast and easy to build great products?

And then identifying what those objective metrics are on those top three things we could talk more about later. But that focus on the outcomes is key. Now what we absolutely do know and we do see is that for active AI users, behaviors are changing, patterns are changing, their bottlenecks are changing. They are able to do more. Thus, as a business, of course we want to encourage that because we see the tide rising. We see the bar rising in terms of what people are actually capable of doing. But managing to an adoption number is silly. Managing to outcomes is fantastic. And that’s just the way that we work. That’s the way businesses have worked historically. AI doesn’t change that. It just changes where the bar is and what we’re capable of doing.

Nancy Wang:

Yeah. I like to focus on metrics and certainly as power users of DX internally, this is… I would say we first started our AI rollout journey sometime last year. And I think as maybe the only private company on this stage, VCs get very excited. So if you’re in a role where you have to brief your VC board, and I guess I’m making fun of myself as well, my other half, they get really excited about, “Oh, this company’s doing this. This company’s having full agentic loops. They’re 10x more productive. Where are you guys on this journey?” And you’re often having to say, “Well, we have to go step by step,” because to Tim’s point, if you’re saying, “Hey, you must use it or else we’re going to fire you,” that’s not the right way to go. It’s going to incentivize adverse behaviors. You’re going to find someone maybe just jacking up token usage, building side projects, like all of the above.

And so what we’ve done internally is build a guild of champions, AI champions, having real examples like the CX example I just talked about where AI can actually make your lives better. And then also celebrating people being very, I would say, creative, writing a skills library, being able to, for example, show how fast they were able to pull up the launch date by one or two months with the use of full agentic loops.

Taroon Mandhana:

Yeah. I think it’s the same with Atlassian as well. We are not mandating AI use. At the same time, there is a lot of effort around encouraging AI use, and the way it really transpires is giving the right set of tools to the engineers. We have a central team that tries pretty hard to make sure there is very easy access. There’s a lot of focus on enablement, and then you’re leaving it up to the teams to then go figure out how to best use them. We are doing a few other things. One, I think to your point, if teams are seeing success… And oftentimes you’ll find those successes in a very bottoms-up fashion, where you are finding every day some interesting way somebody has really optimized some part of their workflow. So there, I think the part where we are doing some work is really amplifying some of those wins across the company. So that’s where.

And then the second piece is in terms of enablement, there are champions… So we have about 6,000 engineers. So everywhere where the team is of 100 to 200 engineers, there’s usually champions that have organically emerged. There’s a community of those champions where they’re sharing best practices. And also, those champions locally are trying to do a lot of enablement. In terms of in this particular code base, in this process, this is how we should perhaps go organize our code. So there’s a lot of activity and effort in terms of really enablement and access. And I think we are keeping our ears to the ground in terms of how it is changing, how people are working. And then it goes back to what we are tracking is more outcomes as opposed to activity. So it does come up… When new projects come up, I think our mindset has changed a little bit in terms of how long this will take.

So we are definitely raising the expectation of… Depending upon a type of a change, like our expectations on zero to one has really gone up, like it should not take us that long. Similarly, the timelines have shrunk for a lot of projects, but not necessarily mandating because we do think it’s a means to an end, it’s not the end.

Greyson Junggren:

Yeah. Can I add maybe one more thought on that too? I think all of us talked about the importance of understanding. We’re all pushing on these campaigns in order to help bring AI use across the business. We all are paying attention to it. I think all of us in here are paying attention to it, looking at daily active use or the depth of this use. That’s really key.

It is important still to understand the shape of activity, because one, when there’s an absence of activity, you ask yourself why. What’s going on? What do we do? How do we invest more in pioneering programs or training programs? Or what are the things that hold these people back from having the ability to pick up or the skill or the opportunity to pick up and learn how to use this new tool? So activity is a powerful signal in understanding, but it’s not the goal. The goal is outcome.

Abi Noda:

And beyond driving adoption, I know all of you are … Tim, we’ve been talking about this since last year, are you thinking about what is the SDLC of the future? What does an AI native SDLC look like? And it’s very difficult to project out far into the future, but I am curious for your thoughts on that, but also, are there specific parts of the SDLC that you feel will remain very human-led for the foreseeable future? And if so, what?

Greyson Junggren:

Yeah, sure. Okay. So let’s use a frame that we probably … There’s a lot of ways we can describe the SDLC. I’ll use like plan, create, validate, deploy, operate, a five-stage version of it. That’s pretty familiar. Create, we already see is changing. AI is already incredibly good at writing a line of code. Is it good at building complex systems? It remains to be seen.

What we know is that where we as people are tastemakers, where people with an understanding of craft matter, is plan and validate. We are seeing a major change. And again, if we look at that system right now, where time goes, where human time and energy goes, the vast majority goes into create and operate. Roughly 80% in operate, maybe 10% to 15% in create, and the other three sort of share what remains. Of course, this varies based on business or unit, but this is sort of in aggregate across industry.

The shift that we’re seeing in these frontier teams is where plan and validate consume the vast majority of the time, which is really amazing, because the other things are just decreasing in the amount of energy and time it takes in order to get there. It’s also an identity crisis for developers, because if writing code, if sitting on a keyboard and typing lines of code was the thing that brings you a sense of pride, it’s challenging, because what really you are doing is making. But the great news is this is fantastic for makers.

It’s fantastic for people who are objective oriented, who are thinking about the thing they’re trying to build and bringing in. But this massive transition and shift to plan and validate being where the vast majority of the energy is, is a thing we very much see taking shape. And I think that will continue to happen and continue to mature as sort of the shape of AI stacks and the AI ecosystem evolves and matures. And as really, they just embed and spread more broadly across existing systems, as well as these new greenfield applications that are spinning up from scratch so quickly.

Nancy Wang:

Yeah. I think just to add to that, maybe the example that we have internally, which is we have stopped actually writing full length PRDs and instead have been moving into prototypes, to your point around plan and validation mode. And the fun is really now taking this prototype instead of a PRD, especially if you’re going out to customers and getting early validation. I don’t have to talk you through or whiteboard. I can actually show you a prototype demo and you can see the different screens, you can see the user experience, and that’s been super, super powerful. That actually has also helped us write better technical specs on the engineering side once customers have validated those user requirements now into technical requirements because we don’t have to … Actually, we’ve eliminated, I would say almost half of the sort of constant back and forth between engineering, going to product and being, “Oh, how do you handle this edge case?” Or, “How do users react to X?” Because you’ve already actually done that during the plan and validate phase now.

Taroon Mandhana:

Yeah. And I think maybe just to add to that, the thing I’m most sort of … I see there’s a lot of potential and we are only starting to scratch the surface on the operate side of the house. I mean, our engineers are spending an insane amount of time responding to alerts, responding to customer issues, responding to incidents. And that’s the place where we are starting to see a lot more activity where people are leveraging AI, and that could significantly reduce the RTB or run the business cost.

So for instance, we are already seeing patterns where agents are now starting to respond to alerts and figure out whether it is a real thing, and then only wake up a human if there is something real. Then during incident management, can we actually reduce the mean time to detect and to remediate? Also in terms of PIR, I think there’s some amount of automation we are starting to see, where people will understand what happened, and then going and fixing the product often takes time. Can we automate some of these things? So I do think a lot of that aspect of SDLC is going to change and hopefully change for good as we start to bring more AI agents there.

The other piece I would also go back and say, there’s a bunch of stuff that kind of has to happen in the background. And we’ve talked about it a lot in terms of toil. How can a lot of that toil go away? We are starting to see patterns of that. For instance, accessibility bugs, right now a lot of accessibility bugs were taking a backseat at Atlassian. A lot of this is getting done much faster now and folks have automated parts of it. 50% of all vulnerabilities right now, the simple ones where you’re bumping up the library versions, things like that are being done using AI.

I think a lot of migrations between sort of test frameworks, things like that, there’s a lot of energy I think that is going in terms of, can we automate these parts? And this kind of goes back to the central dev infra teams to some extent. We are starting to see them really lean into and have some of these things run in the background to then go and give more time back to the engineer. So that’s the part I’m very excited about.

Abi Noda:

Are there any specific areas you think leaders should be really cautious about in terms of rolling out AI? Maybe specific examples of things you’ve seen in your own organizations where you thought, “I don’t know if we should be doing that.”

Taroon Mandhana:

That’s a good one. I mean, at this point, I think the mindset we have taken is of experimentation. So in a way, we are encouraging people trying various things and we are okay with some mistakes happening. That being said, I would say still we haven’t gotten to the comfort level where humans are not in the loop. I think at the end of the day, there is a human accountable for reviewing a piece of code and making sure they’re responsible when things happen. I don’t think we have gotten to that stage yet.

I also think depending upon the criticality of the component that you are working on, if it is high risk, high stakes, I think we are being a lot more careful. And then I’ll just go back to something you asked earlier, which is particularly, I think I am a little worried about … We have seen patterns of duplication happening, tech debt increasing in many ways where people are quickly producing features and at some point the maintainability of the code is suffering to some extent.

So I do think I’m seeing patterns of that. I don’t think we have a solution yet. What it has prompted is we need to go back to standardized sort of approaches and we have to go back to more, I would say, right-of-code quality checks. So definitely we are making a shift there, but I don’t think we have stopped anything from happening yet.

Greyson Junggren:

Yeah. I mean, don’t delegate validate to AI yet. We absolutely still need humans in the loop for important systems. Also, don’t delegate security to AI.

Nancy Wang:

Yes. Can I plus one to that as a security company?

Greyson Junggren:

Yeah, yeah.

Nancy Wang:

Thank you.

Greyson Junggren:

As you said that, it’s like, “Please don’t do that yet.” Use it to help you with pen testing, use it for your red teams, use it to help you battle harden your system, but for the love of God, don’t just trust it to deliver you secure products.

I think at this point, it is really about augmentation and lifting up. You gave a number of great examples of how it can be used to help us really build better systems, but at this point, the human is still in the loop. One of my passion projects and a big focus of a chunk of my team is helping us think about also the next generation, the next higher level of abstraction, where AI is able to take more of an even front seat role in helping build products and materialize entire products that are highly validated and highly trusted, but this stuff is still very much frontier. So at this point, the human is absolutely in the loop. We really still need that expert. It’s more than still, we need that expertise and we will continue to need that expertise long into the future.

Nancy Wang:

Yeah. I guess maybe just both gentlemen brought up really good points. Just how I think about the SDLC is as a pipe, right? So if you add more in the front, well, then you add more to the back. And so this kind of manifests itself with, as we get more makers using Tim’s analogies, writing PRs, well, then you need code reviews. And so this is where we’re looking at code reviews, we’re looking at reliability, maintainability. And especially as a security company, there’s some architectural things that we just will not compromise for our customers.

And so actually, one of the experiments that we’re running internally, which I’m happy to share with this group once we finish it, is working with a reinforcement learning lab to actually create a DevOps agent that is customized to our environment. We’ll see how that goes because it’s actually going to take real life data of how our engineers respond to incidents, how we triage different signals and alerts across different systems that we have, different platforms, different cloud platforms, and quickly sort of burn down that MTTR that we have internally. So that’s an example where can we actually delegate this work or at least delegate a significant portion of this work to an agent that’s been trained on our environment.

Abi Noda:

Right. So let’s talk about measurement and I’ll sort of steer this a little bit. I think firstly, it’d be interesting to hear about how each of you are sort of generally approaching measurement of AI in your organizations, but two specific areas I would want to double click on. One is, what’s the most top of mind question for you right now in terms of questions you want from your data or metrics? And two, so many folks here are in the position of reporting on progress and success and return on investment of these AI investments to their board or to their executives. How are you approaching that problem? Taroon, let’s start with you.

Taroon Mandhana:

Yeah, I think you talked about it in the opening. I don’t think we have the answers yet. I think we are spending a lot of time in understanding, and even understanding activity for that matter, like what is that adoption like? Where is AI being used? What is it doing to PR throughput, to cycle times, like PR cycle time, issue cycle time, what have you?

And we have a very weak measure of what it is doing to overall business outcomes. But at the same time, the inputs into that, we are keeping a very close eye on. We are also trying to figure out our token use, which is like a big budget item now. So understanding where it is being used, who’s using it, and more importantly, what are they using it for.

But in terms of ROI, I would say I don’t think we have figured out a formula yet. Neither are we enforcing a very strong ROI requirement yet. And we are looking at quantitative numbers, but what at least I’m paying a lot of attention to on our team is the qualitative side. So our sort of dev infra team is constantly talking to engineers to understand is it really helping them every day and in what way? So we are paying a lot of attention to those qualitative pieces coming back. And the second piece is, can they do without it? Will they be unhappy if it was taken away? So really trying to understand. I do expect over time, we hopefully get to some sort of a quantitative metric that we can look towards, but we don’t have a story yet.

Nancy Wang:

Yeah. The framework we’ve been using internally is, I would say, maybe a three step program. So one is, to Taroon’s point, your AI token bill is really now your new cloud bill. So the same way that you looked at your AWS bill and you’re like, “Huh, I guess my EC2 cluster has always been on.” Or like, “Why did you use that instance type versus this instance type?”

So same thing, why did you use this higher powered model for a simple task versus this smaller parameter model, which is cheaper or open source, for example, for some of you who can? As a security company, it’s a little difficult here. And then the second thing is going into intent, which is, well, what are you actually spending these tokens for? And so internally, and of course, if folks are interested, we’re happy to share our beta as well, we’re building a SaaS cost management tool for ourselves and hopefully for customers as well, which allows us to see actual token spend and usage by repo and therefore by project. And so that’s been really helpful to kind of draw token usage back to intent.
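As a rough illustration of tying token spend back to a repo, here is a minimal sketch. It is not the tool described on stage: the record fields (`repo`, `model`, `tokens`), the model names, and the prices are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; real prices depend on your provider contract.
PRICE_PER_1K_TOKENS = {"large-model": 0.03, "small-model": 0.002}


def spend_by_repo(usage_records):
    """Sum the estimated token cost per repository.

    Each record is a dict with the hypothetical keys "repo", "model", and "tokens".
    """
    totals = defaultdict(float)
    for record in usage_records:
        price = PRICE_PER_1K_TOKENS[record["model"]]
        totals[record["repo"]] += record["tokens"] / 1000 * price
    return dict(totals)


# Made-up usage records for illustration only.
usage = [
    {"repo": "auth-service", "model": "large-model", "tokens": 200_000},
    {"repo": "auth-service", "model": "small-model", "tokens": 500_000},
    {"repo": "docs-site", "model": "large-model", "tokens": 100_000},
]
report = spend_by_repo(usage)
```

Grouping by repo is only one possible cut; the same aggregation could key on team or project if usage records carry that field.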

And then the third thing, which I think you’ve also touched upon, Tim, is to provide guardrails and abstractions. And so, one thing that we’re also thinking about internally is, for example, we have a CI automation service. Well, then can we map token usage by build volume to that CI automation service? And if that goes out of whack, well, then you can go back to the metrics and say, “Well, we need a token increase on this particular service.”
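One way to picture mapping token usage to build volume is a tokens-per-build ratio compared against a baseline. The sketch below is an assumption-laden illustration: the function names, the 50% tolerance, and the sample numbers are hypothetical, not anything described by the panel.

```python
def tokens_per_build(total_tokens, build_count):
    """Average tokens consumed per CI build; 0.0 when there were no builds."""
    if build_count == 0:
        return 0.0
    return total_tokens / build_count


def is_out_of_whack(current_ratio, baseline_ratio, tolerance=0.5):
    """Return True when the current ratio exceeds the baseline by more than `tolerance`.

    `tolerance` is a fraction, so 0.5 means "more than 50% above baseline".
    """
    return current_ratio > baseline_ratio * (1 + tolerance)


# Made-up numbers: a baseline period and the current week for one CI service.
baseline = tokens_per_build(4_000_000, 2_000)
this_week = tokens_per_build(9_000_000, 2_500)
flagged = is_out_of_whack(this_week, baseline)
```

Normalizing by build volume separates "the service is busier" from "each build is costing more tokens", which is the distinction a budget conversation usually needs.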

That’s just one example, and your mileage may vary, but essentially having frameworks, having hypotheses about how tokens are being spent in your organization can prevent, let’s just say, surprise conversations with your CFO. And maybe this is fresh, because as a startup, I’m also serving as our AI IT procurement person. So I’m now negotiating actually packages with all of the model labs and also being the consumer of that. Maybe there’s attrition state involved. But anyway, so I’m kind of in those conversations and I would just say a tip for those of you who are negotiating those AI token bills with model providers. Sometimes you can forward project, just like with Amazon, you have an EDP of how much you might consume. And so the more forecasting accuracy you can bring to your bill, the lower you can get your per-token cost to be.

Greyson Junggren:

I think all of us are trying to avoid surprise conversations with our CFOs at this point too. I love this. You guys gave great examples and some good things that I also feel strongly about. And maybe back to your question too is, what are we really trying to move from a metric standpoint? I mentioned that speed, ease and quality framework we used before.

And actually, Brian and I are working on a paper which we’ll publish hopefully in the next month or two, really diving into this framework and how we’ve applied it. So there’ll be a lot more information shared soon. But I’ll take a second and talk about our three North Stars, the three North Star metrics. On speed, it is idea to value. We are trying to measure and understand how long in calendar time it takes to go from an idea to value delivered in a customer’s hands.

On ease, there’s a lot of ways you can look at this, but the one that like our North Star we track towards is innovation time. So how much time you have, every person has as an individual that they can focus on innovation versus run the business and corporate overhead are sort of the three primary buckets that pretty much everything fits into. And then really maximizing that innovation time. It’s like a focus time plus plus.

And then quality is product quality, how you want to measure product quality on those different dimensions. So these are the things like our eternal wish and our eternal guidance is how do we look at these high level North Star metrics and then drive change to them, find the bottlenecks, find the segment that’s the roughest or the slowest or the most complex in there, and then simplify it. Invest in simplifying either via central infrastructure that we build and invest in for the whole company, or that an individual can build for themselves or their team at any level.

Abi Noda:

And I have a follow-up for all of you and we’ll need to move quickly just for the sake of time, but all of you are in the position of having the conversations with your CEOs and the board on how much AI is moving the needle in your organizations. As we talked about, a lot of people here feel very under pressure, a mismatch of expectations versus reality. So share a little bit of advice on how to navigate those conversations.

Taroon Mandhana:

Oh, actually, I would love to get advice myself. That being said, I think it goes back to somewhere you have to connect back to business outcomes. And you also have to, in many ways, ground it back. I think you started in the morning, right? The real productivity gains we are seeing when you look at a team of 6,000 engineers is somewhere in that 10% to 15% range. Yes, there are pockets within the company where we are seeing a particular individual or a team where their throughput is like doubled or tripled, but that’s not across the entire company.

And there are just so many factors why that is the case. It’s a combination of the people, it’s a combination of the dev environment/the whole SDLC loop within that team to the type of product that they’re working on. So seeing someone vibe code something versus actually translating it back to a team of 6,000 engineers is just a very different ballgame. So one part of the conversation is really grounding them back into that data. Of course, it’s a very qualitative conversation.

The second thing after that I would say is it goes back to there’s sort of a flow to this. There is adoption, there is usage, and then there is outcomes. So having some conversation about still, even though we are figuring out the metrics, what are we looking at? Even if these are activity metrics, what are the activity metrics and what are the … I would almost say counter metrics with that. So it’s not an easy, simple conversation. Here is a number, it has gone from X to Y. It’s a combination of things right now. I do hope over time it becomes somewhat clear, because I would love this idea to product. It’s just super hard right now to measure this.

So we are talking in terms of PR throughput. We are talking in terms of number of features, whichever way we measure it, shipped by a team. Then we are talking in terms of qualitative examples and setting up somewhat grounded expectations over a period of time. And in the same conversation, we are also having the budget conversation, because it goes hand in hand, like we talked about forecasting. I think I’m on my third version of the forecast of what we forecasted in January for the whole six months.

Taroon Mandhana:

Yeah, I know. It’s like the budgets are just moving way faster than we thought, so I think this whole thing became like a thing we have to manage much like … I think your point was very valid. We literally started to think about managing it like AWS COGS cost because it requires that level of rigor and sophistication. So anyway, that’s where I’ll stop.

Nancy Wang:

Yeah. Sounds like you have a board meeting coming up. We do too. So actually, this is very fresh because we’re actually in the process of putting together our board briefing on this. What I will say is, and again, this might be more specific for a private company board or a VC board. So we have a subset of your engineering force. We have roughly 600 plus engineers.

And so we think about, for example, dollars of revenue brought in per person. For example, there are common benchmarks. So you can look on Wall Street for what those are by company per vertical. That’s one. Specifically within engineering, we’re also going to talk about, to Taroon’s point, feature velocity. And so the specific metric we’re using to tell that story is time from first commit to merge, for example, which speaks to throughput, which speaks to velocity.
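A metric like time from first commit to merge can be computed from two timestamps per pull request. The following is a minimal sketch under stated assumptions: the timestamp pairs are made up, and reporting the median (rather than the mean) is an illustrative choice, not something the panel specified.

```python
from datetime import datetime
from statistics import median


def hours_first_commit_to_merge(first_commit_at, merged_at):
    """Elapsed hours between a pull request's first commit and its merge."""
    return (merged_at - first_commit_at).total_seconds() / 3600


# Made-up (first_commit_at, merged_at) pairs for three pull requests.
pull_requests = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 15, 0)),
    (datetime(2025, 1, 7, 10, 0), datetime(2025, 1, 8, 10, 0)),
    (datetime(2025, 1, 8, 8, 0), datetime(2025, 1, 8, 20, 0)),
]
durations = [hours_first_commit_to_merge(start, end) for start, end in pull_requests]
median_hours = median(durations)
```

The median is less sensitive than the mean to a few long-lived pull requests, which tends to matter when the number is shown as a trend line.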

And then of course, for our specific VC audience, it’s really the relationship between, I like to say, tokens, product, dollar signs. And so the kind of more tightly you can narrate that relationship between, okay, this is how much we’re spending, right? This is the type of products we’re building and this is also leading to, for example, customer adoption, uptake, expansion. And I would say specifically for kind of VC audiences, kind of tying that revenue portion, that revenue attribution to usage is what they would be looking for.

Greyson Junggren:

And I see the clock just hit zero, so I’ll make this fast. We spent a lot of time talking about being able to show the value of every token used. There’s one responsibility that we have, especially as people in this room who are interested in this, is we also have to convey the importance of learning and experimentation in here.

We’re in a place, we’re in an industry, where the fastest learners will win. What that means is we are going to spend tokens on things that do not create value, but they create learning amongst the workforce and how to use the tools in new ways. We have to also make room for that. Yes, we want the work that’s being done with AI to generate value. This is why, as I talked about those metrics, these are so essential and key. But when you’re talking about interacting with boards, we have to also create headroom to learn, because you have to be fast and that is the essence. And when the snow globe gets shaken and the industry is changing and the economics of the industry are changing, you had better be a fast learner and we need to make room for that as well. Let’s not forget that.

Abi Noda:

Well, thank you so much Taroon, Nancy, Tim. We didn’t get to cover everything, so they’ll be hanging around the rest of the day, so try to grab them if you have more questions. But otherwise, thank you again, and I’ll welcome Justin back up to the stage to introduce our next session.

Nancy Wang:

Thank you.

Taroon Mandhana:

Thank you.

Abi Noda:

Good job, man.


Show notes

Rethink team structures for faster learning

The best engineers think like makers

Expect more people to participate in software creation

Drive adoption through enablement, not mandates

The AI-native SDLC shifts work toward planning and validation

Measure outcomes, costs, and learning
