
Prioritization as code: An AI-supported framework for platform engineering

In this session from DX Annual, Eleanor Millman, Senior Staff Product Manager, and Mina Tawadrous, Associate Director of Product Management at SiriusXM, share how their platform engineering organization developed a prioritization framework for platform engineering teams serving hundreds of developers across a complex cloud platform. They explain how they define and weight platform-specific impact factors, use developer data to refine priorities, and score projects more consistently. They also explore why prioritization debates often stem from conflicting, invisible, or outdated assumptions, and how SiriusXM began treating assumptions like code by documenting, versioning, and reviewing them in source control. Finally, they demonstrate how AI can surface assumptions, connect initiatives to existing knowledge, and support project scoring while keeping humans in the loop. Throughout the session, they offer a practical framework for making prioritization decisions more transparent, data-driven, and scalable.

Show notes

Building a platform engineering prioritization framework

  • Platform engineering requires different prioritization criteria. SiriusXM found that traditional product metrics did not fully capture the value of platform engineering work, leading the team to define platform-specific impact factors around development speed, reliability, security, cost, platform efficiency, user trust, and data-driven decision making.
  • A simple scoring model created a shared language for prioritization. The framework combined impact, urgency, effort, and business needs to help teams compare projects consistently and explain why certain initiatives were prioritized over others.
  • The framework evolved alongside the organization. As company priorities changed after a major platform launch, SiriusXM adjusted impact factor weights to reflect new goals around cost optimization, technical debt reduction, and data maturity.

Using developer data to guide decisions

  • Developer feedback helped shape prioritization. Rather than relying solely on intuition, the team used survey data and other developer insights to determine where additional investment would have the greatest impact.
  • Impact factor weights were revisited regularly. Quarterly reviews allowed the team to adjust priorities based on changing business objectives and improvements in areas such as reliability and security.
  • Data increased confidence in prioritization decisions. By grounding discussions in evidence, teams were able to align more effectively on where to invest their limited capacity.

Treating assumptions like code

  • Many prioritization conflicts stem from assumptions rather than priorities. Teams often disagreed because they were working from different, invisible, or outdated assumptions about users, workflows, and business needs.
  • Documenting assumptions improved organizational alignment. SiriusXM began storing assumptions in source control, making them easier to discover, review, update, and validate over time.
  • Debates became more productive when assumptions were explicit. Instead of arguing over which project mattered most, teams focused on validating the underlying beliefs that informed their decisions.

Using AI to surface organizational knowledge

  • Assumption repositories became difficult to navigate at scale. As more assumptions were documented, it became increasingly difficult for individuals to find relevant context and connections across projects.
  • AI helped uncover relationships humans might miss. By searching assumption repositories, OKRs, and prior project data, AI was able to surface relevant information that would otherwise be difficult to discover.
  • AI improved information recall rather than replacing judgment. The goal was not automated decision making but helping teams access the knowledge needed to make better decisions.

Building an AI-assisted prioritization workflow

  • AI can guide teams through the scoring process. SiriusXM built workflows that ask clarifying questions, surface assumptions, identify relevant organizational context, and generate initial project scores.
  • Human validation remains essential. Teams review assumptions, challenge recommendations, and approve updates before information is added back into the system.
  • Each prioritization cycle strengthens the knowledge base. New assumptions, decisions, and project context become available for future initiatives, making the system more valuable over time.

Keeping humans in the loop

  • The framework is designed to support conversations, not replace them. Scores help teams discuss priorities more objectively, but important decisions still require context and judgment.
  • Stakeholder disagreements often reveal useful information. When the framework produces results that feel wrong, the discussion can uncover missing assumptions, incomplete data, or opportunities to improve the model itself.
  • The framework continues to evolve. SiriusXM treats both the prioritization model and the supporting AI tools as products that require ongoing iteration, feedback, and refinement.

Timestamps

(00:00) Intro

(02:58) Building a platform engineering prioritization framework

(04:59) The seven platform engineering impact factors

(09:38) Using impact factors to score projects

(13:11) Using developer data to refine priorities

(16:33) Three ways assumptions fail

(17:40) Assumptions as code

(21:00) New problems created by assumptions as code

(22:00) Using AI to surface assumptions

(23:44) Building an AI-powered feedback loop

(25:44) Inside the AI prioritization tool

(28:18) Three steps to build your own framework

(30:02) Q&A #1: Evaluating high-cost projects

(31:30) Q&A #2: The cadence of iteration

(32:10) Q&A #3: When the framework conflicts with a stakeholder’s priorities

(35:26) Q&A #4: Using the framework for non-developers

Transcript

Brittany:

All right. Well, welcome back. Everyone made it in. Amazing. Okay. Well, something I know we talk a lot about with our customers is how to align teams on a shared vision despite competing priorities and in a lot of instances, competing perceptions of what actually needs improving. So here to talk about the tactical approach they’ve taken at SiriusXM, please welcome Eleanor Millman, who’s a senior staff product manager and Mina Tawadrous, associate director of product management. Please help me welcome them to the stage. Welcome guys. Welcome.

Eleanor Millman:

Hello. I’m Eleanor. This is Mina. I guess I have an intro again here. I led product for SiriusXM’s platform engineering org for two years, and Mina is its current product lead. We’re excited to be here today to talk you through a prioritization framework we developed as well as how data can increase confidence in what you are building and how AI can make all of this easier. You’ll be getting a how-to guide on how to prioritize developer productivity projects, especially when you have few to no product managers, which of course can be common in internal product spaces.

So I will set the stage for our story. A few years ago, SiriusXM created a 50-person platform engineering org to build a brand new AWS-based platform upon which a whole new tech stack and app rebrand would run. We were and still are serving 700 internal developers across the full SDLC. We have broad language and runtime support, so Scala, Java, Python, TypeScript, a few others. And we have deep capabilities. While we do support the typical backend service owners, we also have some more unusual use cases. Our Databricks-based data platform also runs on our cloud platform. Also, we require infrastructure as code for higher environments, and many of the developers we supported at that time were inexperienced with owning infrastructure. Some also were not that familiar with AWS, and while many knew Terraform, a number had never used AWS CDK and CloudFormation, which is what we landed on for our IaC requirement.

And lastly, we had few to no mandates other than this infrastructure-as-code requirement, which meant that we had to support a broader range of tools. So typical product problem. We were struggling with this question. What should we build and in what order? And one of the challenges of being a platform engineering team, as I suspect many of you know, is that oftentimes you have to make impactful product decisions without having the product staffing that would help you make the best decisions. The team did a great job before I joined as the first and at that time only product manager, but they were finding that as the number of potential projects grew, their methods of prioritization were not scaling.

And I want to call out that platform engineering has prioritization challenges because it’s a non-traditional product area. For most externally facing products, you build a feature, you release it to users, and you can often measure the revenue a feature generates. But platform is twice removed from revenue and end users, so impact is much more complicated than just revenue. We knew we cared about development speed, runtime reliability, security and compliance, and cost reduction, but we were unsure of how to combine them. And yet, when I looked around to see what others had done for prioritization in this space, I didn’t find anything at the time. So we improvised. We took an iterative approach to developing our prioritization framework. We started with this formula, which is very loosely based on RICE, a common prioritization formula. So we defined impact as how much value a project brings to users, basically value per user times number of users.

We made urgency a function that blows up as a project’s due date nears. We try hard to avoid putting an urgency date, but sometimes you have true hard deadlines. In our case, for example, that company-wide launch date I mentioned. And effort is defined as developer months or developer weeks, whatever you’d like, needed to deliver a project. So this basic formula prioritized high impact or urgent items that took little effort to build, basically low hanging fruit. So we ran projects through that formula and it worked okay, but we realized that impactful larger projects were being deprioritized too much. So we used this math trick to dull the effect of larger efforts.
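The slide with the actual formula is not captured in this transcript, so the following is only a minimal sketch of the shape described above: impact times urgency, divided by a dampened effort. The urgency curve (`1 + 30 / days_left`) and the square-root dampening are assumptions chosen for illustration, not the formula SiriusXM uses, and the function names are hypothetical.

```python
import math
from datetime import date
from typing import Optional


def urgency(due: Optional[date], today: date) -> float:
    """Return 1.0 when there is no deadline, and grow as the due date nears.

    The shape 1 + 30 / days_left is an assumption for illustration only.
    """
    if due is None:
        return 1.0
    days_left = max((due - today).days, 1)  # clamp so past-due items stay finite
    return 1.0 + 30.0 / days_left


def priority(impact: float, effort_months: float,
             due: Optional[date] = None, today: Optional[date] = None) -> float:
    """Impact times urgency over dampened effort.

    Taking the square root of effort is one assumed way to "dull" the
    penalty on larger projects; the talk does not say which trick was used.
    """
    if effort_months <= 0:
        raise ValueError("effort_months must be positive")
    today = today or date.today()
    return impact * urgency(due, today) / math.sqrt(effort_months)


# A large, impactful project versus a small, low-impact one (made-up numbers).
large = priority(impact=6.0, effort_months=9.0)   # plain division would give 6/9
small = priority(impact=1.0, effort_months=1.0)
print(large > small)
```

With plain division the large project would rank below the small one; with the dampened effort it ranks above it, which is the behavior the paragraph describes.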

That change made the project prioritization ranking feel more correct, but we still were missing at least one element. Sometimes a project is required because internal stakeholders demand it. For example, security says you have to set up an audit trail or you won’t be in compliance. There’s no arguing with that. So we added this internal business need, which is really just a fudge factor to take this into account. Just like urgency, we strive to use this variable as little as possible, but it was required to correctly prioritize a few projects. And so this is the formula that we use today, but let’s dive more into impact because that’s really where all the complexity is. So as I mentioned earlier, we knew we cared about these four impact factors. So development speed, how quickly developers can deliver features, bug fixes, et cetera. Runtime reliability. For us at SiriusXM, listeners are our end users because we’re a media music company.

So we defined runtime reliability as giving these listeners a great experience through low latency, low error rates, and low downtime. Security and compliance, just what you’d expect. For example, implementing security scans in the CI/CD pipelines would have impact in this category, and then cost reduction. For us, we chose to define it as reducing infrastructure costs, which is mostly our AWS and Datadog bills. As with the formula I just discussed, we tested this idea out by scoring the projects in our queue, and we realized we were missing a few impact factors.

First of all, platform engineering’s own efficiency. So of course, when we become more efficient, that means that we can deliver value faster to our users. So refactors, making our own code reviews more efficient fell into this category. Another example, our software delivery team wanted to do some work to be able to make it easier for themselves to write automated tests for our shared GitHub Actions workflows. They struggled to prioritize this project until we added this impact factor. Next was user trust. So as I mentioned earlier, we had few mandates, so we had to foster adoption of platform engineering offerings rather than require it of our users. Having our users trust us makes that easier. So any project that originates with a user feature request would get a small bump in this category. Also, anything that prevented something we own from going down would also benefit. So for example, we self-host our GitHub instance. So if that went down, we would really lose user trust.

So projects that help keep our GitHub instance up and functioning well receive a few points in this area. And lastly, we in platform engineering really wanted and want to be data-driven, but of course it takes engineering effort, significant engineering effort to be data-driven. We use this category to prioritize projects around enabling telemetry in our offerings and building out our basic metrics platform. So there we go. We defined seven impact factors. The next question was how to weight them. So pre-launch of that app rebrand and new tech stack I mentioned, we knew that stakeholders cared most about development speed and then of course delivering a high quality experience to those listeners, our end users. So we started out with these weights, 30% for development speed, 20% for runtime reliability, and then the rest equally 10% each. But post-launch, the company started focusing on cost reduction, especially as traffic was flowing through the new AWS-based platform.

The great thing about the framework is that it was easy to change the weights to reflect these changed priorities. Of course, we increased the focus on cost. You can see it went from 10% to 20%. Also, we increased the weight of our platform engineering impact factor since we had been building up a bunch of tech debt as we were so focused on getting our users to that launch date, and we wanted to work that down again. And lastly, while we were making some progress in being data-driven, we looked around and we were not anywhere near where we wanted to be, so we also bumped up being data-driven a bit. Now, if you raise some weight, you must reduce others. And there was definitely a moment when it felt very uncomfortable, because who wants to reduce any focus at all? But you got to. And so what really helped us to do this, to make those decisions, we looked at developer data.

So we’re, of course, a DX customer. And so the DX quarterly survey was very useful as we made these decisions. For example, we saw that developers reported the systems they maintained were high quality and had few incidents. So we felt comfortable reducing the focus on runtime reliability. Not that we don’t care about it, but for future projects, we didn’t feel like we had to invest in it the same way as we had before. Likewise, data showed us that we were doing actually quite well on security and compliance, so we were also able to reduce the focus on that. And now every quarter since then, we take a look at these impact factor weights after we define our platform engineering OKRs, and we tweak them as needed to get alignment.
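The rule that raising one weight means lowering another can be sketched as a small helper that moves weight between factors and checks that the total stays at 100%. The pre-launch weights below are the ones stated in the talk; the factor key names, the helper names, and the choice to take the full ten points from runtime reliability are assumptions for illustration, since the talk does not give the exact post-launch numbers.

```python
# Pre-launch weights as stated in the talk (key names are hypothetical).
PRE_LAUNCH_WEIGHTS = {
    "development_speed": 0.30,
    "runtime_reliability": 0.20,
    "security_compliance": 0.10,
    "cost_reduction": 0.10,
    "platform_efficiency": 0.10,
    "user_trust": 0.10,
    "data_driven": 0.10,
}


def check_weights(weights: dict) -> None:
    """Raise if the weights do not add up to 1.0 (within float tolerance)."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1.0")


def shift_weight(weights: dict, source: str, target: str, amount: float) -> dict:
    """Return a copy with `amount` moved from `source` to `target`."""
    if amount < 0 or weights[source] < amount:
        raise ValueError(f"cannot take {amount} from {source}")
    updated = dict(weights)
    updated[source] = round(updated[source] - amount, 4)
    updated[target] = round(updated[target] + amount, 4)
    check_weights(updated)
    return updated


# Illustrative only: cost going from 10% to 20% is from the talk; funding it
# entirely from runtime reliability is an assumption.
post_launch = shift_weight(
    PRE_LAUNCH_WEIGHTS, "runtime_reliability", "cost_reduction", 0.10
)
print(post_launch["cost_reduction"], post_launch["runtime_reliability"])
```

Moving weight through one function keeps the quarterly review honest: nothing can go up without something visibly going down.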

Okay. Finally, let’s talk about scoring specific projects. So we use relative scoring for the impact factors. As you can see in the legend at the bottom, we give a zero for no impact, one for low impact, two for medium, three high, and five if a project has massive impact in a particular impact factor category. And we estimate these impact factor values using user data where possible. We particularly like using the DX studies feature, which is basically targeted surveys. We find that a really fast and easy way to go get this data, but you can just go talk to users. You can send out your own surveys, whatever you need to do to get a little more of a sense of how a project may affect a certain user group. Just to talk through one example, let’s say project E. So you can see that it has a one for speed, so that means that it has some, but low impact on development speed.

It has a three for reliability, so it has a high impact on that end user latency, error rate, et cetera. Zero for our own efficiency, so must not affect it there. A little bit of impact, low impact on security. Five, so massive impact on cost, and two, so definitely somewhat helps with user trust, and must not have been helping us be data-driven. The very last thing that we added in this first big iteration was some lightweight documentation on the prioritization scoring of each project so that we could explain to the platform engineering org and any stakeholders why we scored a project a certain way. So we just wrote them down very quickly in blurbs, put them in a column in our project management software, and let’s talk quickly through an example.

So this was pre-launch. We had very little cost attribution and that did not feel good. So we really realized we had to have a project around AWS cost attribution. It did have an urgency date. We knew we absolutely had to be able to explain the surge in cost post-launch. So we put the date in there. We gave it a one, so low impact for development speed, because if we didn’t make this… There was some cost attribution, but not very much. And without a more granular cost attribution, we knew that engineering teams would have to spend quite a lot of time explaining cost spikes to users, or sorry, pardon me, to finance. So we felt like we could save developers a bit of time by doing better beforehand. We thought there was no impact to runtime reliability and security and compliance, so they got zeros. Cost reduction, we gave it a five, because of course, the first step in reducing cost is to be able to visualize those costs and understand where you’re spending money.

We gave it a one for platform engineering efficiency. We thought it would help us be a little bit more efficient because same thing. We’re going to have to explain our costs just as much as the engineering teams will. User trust, we said low, small impact. We felt like a modern cloud platform of course should have cost attribution. How can we look our users in the eye and ask them to trust us if we don’t offer this? So we acknowledge that we would build some user trust by building this. And lastly, being data-driven, of course, being able to visualize costs and where money’s going allows us to make data-driven decisions around cost. So there you have it. That was the first iteration, and Mina is going to talk you through some additional changes we made.
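To make the scoring arithmetic concrete, the Project E scores walked through earlier can be combined with the impact factor weights as a weighted sum. This is a sketch under assumptions: the talk does not say which weight set applied to Project E, so the pre-launch weights are used here, and the key and function names are hypothetical.

```python
# Relative scoring legend from the talk: 0 none, 1 low, 2 medium, 3 high, 5 massive.
ALLOWED_SCORES = {0, 1, 2, 3, 5}

# Pre-launch weights from the talk (assumed to apply to Project E).
WEIGHTS = {
    "development_speed": 0.30,
    "runtime_reliability": 0.20,
    "platform_efficiency": 0.10,
    "security_compliance": 0.10,
    "cost_reduction": 0.10,
    "user_trust": 0.10,
    "data_driven": 0.10,
}

# Project E as described: 1, 3, 0, 1, 5, 2, 0.
PROJECT_E = {
    "development_speed": 1,
    "runtime_reliability": 3,
    "platform_efficiency": 0,
    "security_compliance": 1,
    "cost_reduction": 5,
    "user_trust": 2,
    "data_driven": 0,
}


def weighted_impact(scores: dict, weights: dict) -> float:
    """Sum of weight * score over all impact factors."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same factors")
    for factor, score in scores.items():
        if score not in ALLOWED_SCORES:
            raise ValueError(f"{factor}: {score} is not on the 0/1/2/3/5 scale")
    return sum(weights[factor] * scores[factor] for factor in weights)


impact_e = weighted_impact(PROJECT_E, WEIGHTS)
print(round(impact_e, 2))  # 0.3 + 0.6 + 0 + 0.1 + 0.5 + 0.2 + 0 = 1.7
```

Under these weights the high reliability score contributes more than the massive cost score, which shows how much the weights, and not just the raw scores, drive the ranking.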

Mina Tawadrous:

So I came to [inaudible 00:19:15] a little later than Eleanor. So I was able to look at this model and see where we could make some improvements and where we could make some iterations. And as the teams grew, we kind of grew in size. We had multiple teams, so challenges arose that we thought we could kind of iterate on as well. So what was probably the most difficult about this model was when we had these cross-team priorities and there were conflicting assumptions between teams, right? You have something that requires capacity across two teams’ backlogs, and then they start scoring the work and you get such dramatically different scores from team A versus team B. And you really need all those teams to feel totally bought in on why we need to actually dedicate backlog capacity to this functionality. So I started thinking of this problem as a larger problem, and I learned about an interesting study that I think demonstrates this pretty well.

There’s a study in the 90s called the… Or it’s kind of credited for coming up with this cognitive bias called The Curse of Knowledge. The idea there is if you know something really well or you feel like you know something very well, you actually overestimate how much somebody else knows about that same topic. So the study itself, what they would do is they would kind of split up two groups. They’d call them tappers and listeners. Tappers were told to just tap out a familiar tune like happy birthday or something like that. And they were asked to estimate what’s the percent chance the listener group is going to get that melody right? So tappers thought, “All right, everyone knows the song. It’s in my head. I’m tapping it out. About half the listeners will probably guess the song.” It was actually closer to 2.5%. So we really overestimate our ability or somebody else’s knowledge when we feel like we have knowledge in that space.

So this was kind of what was happening in platform engineering in these scoring kind of conversations. You had a team that was like, “I know the users, I’ve talked to them for like two or three months now. I know this space.” They would score it a certain way. That other team has different exposure to that same user base. They built their own assumptions and they had a very hard time understanding how someone else couldn’t see their worldview the same. So let’s go ahead and paint a concrete picture of this. Here’s an example project. Let’s say we were scoring something around improving our deployments, right? We might have two teams working on this score and we might say, “This is such a large initiative that we’re going to need a little bit of work from you, DevEx. We’re going to need a little bit of work from our delivery team. Let’s go ahead and score it.”

DevEx might go ahead and say our dev speed impact, back to that rubric, is actually a five, super high impact. Why? Because the current deployment time is actually 20 minutes. We’ve looked at DX survey data. It’s pretty close to 20 minutes, and this is going to hit 700 of our developers today, let’s say. Then you have delivery on the other side saying, “Okay, dev speed, probably a medium impact because actually, if you look at our GitHub Actions data here, it’s closer to five minutes. And actually, if you look really deeply into this problem, it probably only affects Java developers.” So both are actually right if you think about it, but they’re coming at it from totally different assumptions. And the result here is more of like a heated debate about whether or not we should do this work, because you’re arguing and debating about priorities versus the assumptions.
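A rough sketch of why the two teams land on such different scores: using the earlier definition of impact as value per user times number of users, the gap comes entirely from the inputs each team assumes. The 20-minute, 5-minute, and 700-developer figures are from the example above; the Java developer count is a made-up placeholder, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployAssumption:
    team: str
    minutes_per_deploy: float
    affected_developers: int
    source: str

    def developer_minutes(self) -> float:
        """Value per user (minutes at stake) times number of users."""
        return self.minutes_per_deploy * self.affected_developers


devex = DeployAssumption("DevEx", 20, 700, "DX survey data")
# 150 Java developers is a hypothetical number; the talk gives no count.
delivery = DeployAssumption("Delivery", 5, 150, "GitHub Actions data")

for view in (devex, delivery):
    print(view.team, view.source, view.developer_minutes())

print(devex.developer_minutes() / delivery.developer_minutes())
```

Both teams apply the same rubric correctly; writing the inputs and their sources side by side makes it clear that the thing to resolve is the assumption, not the score.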

So going deeper into this, we’re able to kind of like break down or think of assumptions and how they fail us in kind of three ways, let’s say. So one way is kind of like when you have two assumptions in conflict, right? The example there is a pretty good one. Somebody thinks deployment time is 20 minutes, somebody thinks it’s five minutes. There are two existing assumptions that are in conflict right now. Assumptions can also be invisible. This could be a situation where, again, think of all the floating assumptions going on in your businesses today. They’re not written down or maybe what happens often is somebody builds these assumptions over time and they might leave the team and then that assumption goes out the door with them. And finally, assumptions can be stale. There are situations where you might believe you know something because you did solid research on it, you did some studies on it six months ago.

We know how quickly technology changes or user base changes, and that assumption all of a sudden becomes stale and it’s not as useful as it once was. So we kind of noticed looking at these three different assumption failures and we kind of like came up with a hypothesis, right? What if we treated assumptions the same way we treated code? And if we go back to those three assumption failure modes, you could actually see how code could kind of resolve all three of them, and this is something we’ve been experimenting with.

So conflict resolution, as an example: if you had a difference in code between two developers, there’s actually a mechanism in code to go ahead and resolve that. You submit a PR. You could go ahead and have that PR reviewed, and then you’d get to a point where the proper assumption is actually stored in code and you have a [inaudible 00:24:08] to tracking of the conversation that led us to that place. If you think about assumptions being invisible, this is, again, the reason why things are great in code is it makes this more discoverable. We store it now in a central repository, everyone has access to the repository. The assumptions are much more visible, they’re version controlled. We can know the data source, we know who was the last person to validate it. And data freshness, again, if you are bringing it in as code or bringing an assumption in as code, every assumption has a timestamp, an author, a history. So you can actually see, well, that assumption was last committed 12 months ago. We should probably revisit that.

So what does this actually look like? Here’s an example of an assumptions document. This is one iteration we had early on where you would literally, again, put your assumption in code. Again, probably from the engineering world, like ADRs, this is very similar but take a product lens to it, and why not actually store and list your assumptions in code? And we’ll talk about the power that this brings in an AI enabled organization.

So some things that we found super useful, the tricky thing here is you need this data to be flexible enough to store many different types of assumptions. So it’s hard to go ahead and format things very specifically in table format, so keeping a loose format was helpful for us. But the key areas here is what is the assumption in human-readable language? I don’t think you have to over engineer, try to think of how a computer might read this and I’ll explain why. But also, what is the data source? A very critical one, and also when was it last updated? And you can also see here, we’ve experimented with a few other things like what’s the confidence level of this assumption that we’re still going back and forth on?
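The assumptions document itself is not reproduced in this transcript, so here is a minimal sketch of a record carrying the fields named above (the assumption in plain language, its data source, when it was last updated, and an experimental confidence level), plus a staleness check. The class name, field names, example entries, and the 365-day threshold are assumptions for illustration; the threshold simply echoes the "last committed 12 months ago" example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Assumption:
    statement: str            # human-readable, not engineered for machines
    data_source: str          # where the belief came from
    last_updated: date        # when it was last validated
    confidence: str = "unknown"  # experimental field, per the talk


def stale(assumptions: List[Assumption], today: date,
          max_age_days: int = 365) -> List[Assumption]:
    """Return assumptions last updated more than `max_age_days` ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [a for a in assumptions if a.last_updated < cutoff]


# Made-up entries and dates, for illustration only.
library = [
    Assumption("Deployments take about 20 minutes.",
               "DX survey", date(2024, 3, 1), "medium"),
    Assumption("Slow deployments mostly affect Java developers.",
               "GitHub Actions data", date(2025, 4, 1)),
]

for a in stale(library, today=date(2025, 6, 1)):
    print("revisit:", a.statement, "| source:", a.data_source)
```

In a real repository the timestamp and author would more likely come from version control history than from a hand-maintained field; the explicit field is used here only to keep the sketch self-contained.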

So this was interesting, being able to like move things into a code base and deal with things through PRs, making things more visible. It was pretty cool. So we were able to codify these assumptions and it shifted the debates, which was the goal, from debating about priorities to debating about assumptions. And I think that was really helpful for us because priority debates, I’m sure many people have been in these, like what’s more important today? They’re emotional, they’re positional. It’s like my project matters more. People dig in and it becomes about team identity and political capital.

Assumption debates are very different. They are often a lot more analytical. The questions you might ask are what do we believe about our users? What data supports it? When was this data last validated? You’re not arguing anymore about which project matters. You’re arguing about what you know today about your shared reality. So when we made this shift, the temperature dropped in these arguments. People stopped defending their roadmap or like, “This is the most important thing.” Then you started actually being able to talk and engage about the assumptions instead and you built this space together of a shared reality.

So this created another problem. Every solution creates more problems, which is great. But you can imagine, you’re probably already thinking to yourself, “Okay, I’m going to store every assumption in code.” That’s a big file, and also, am I going to expect people to read this file every time they come up with a new assumption to see if one already exists? What’s actually going on here?

So it helped with the conversations, but these are the problems. Users weren’t always able to look up existing related assumptions. It’s very tricky. Unless you are trying to literally look up deployment lead time and you want to know what the existing assumption is in the organization, you do a keyword search for deployment lead time. But a lot of the assumptions that we use in product, they’re hard to correlate or hard to know that something is even related to the initiative you’re working on, so it’s harder to find the assumptions that already exist when you’re talking about reading a flat document.

So it’s been almost the whole talk and we have not used the two letter word yet. So how does AI fit in? And we started using AI to help with this process. So what we’ve started to do is we’ve been able to use AI to surface those assumptions when the connections aren’t obvious. So imagine you’re scoring an IDE plugin rollout. In this case, our AI will go ahead and read your assumptions repository, and it might surface that, “Hey, did you know that your contractor headcount, which is 20% of your employee base, doesn’t have access to that IDE?” Those are very difficult assumptions to actually tease out if you’re trying to read through a whole document, but with AI, it’s become a lot easier to just go ahead and leverage the LLM to go ahead, read through that, read through your initiative, and try to come up with some connections to that.

So now what we’re hoping for, the future of this is, as people are writing epics, product briefs, or scoring initiatives, we use AI to query the current assumptions repository, and we surface that relevant data, survey results and related past initiatives. It got us to think, what is our position here? Because as humans, we still like to make judgments. So where we see the best fit for this model, which I’ll show in a bit, is we still use AI for recall, so think of it as like the best way to recall information that is hard to access with our limited brain power, but we use humans still for judgment, especially on like, “Hey, is this actually something we believe in? Do we actually want to prioritize it?” We are still the judgment layer, but we make a lot better calls when AI is used for getting that data out of the system.

So here’s what this tool ended up looking like. So it started off in a matrix form, of a scoring matrix that we saw earlier. I’m going to show some pseudo screenshots of what this actually looks like today. So this is our feedback loop, if you think about it. So a product manager, an engineer, an architect will go ahead, enter a product idea into your favorite back and forth generative coding chatbot of choice that we cannot suggest what we use because we’re a publicly traded company. Then that same tool would go ahead, ask guidance questions, fill in the gaps about the initiative. Because it knows what it’s trying to score, it has a goal in mind. “I have to be able to collect this data to be able to make this decision easier for my human.”

It finds assumptions that already exist in the assumption library, but then we have a human in the loop over here where the human actually has to validate that assumption. You have to read through it and you have to say, “Do I actually agree with this? Is this too stale? Is there something here that’s wrong that I want to submit back to the repository for future?” If that is true, we update the assumption, it brings back a score, and then same thing, is this a score you actually believe in? Do you actually believe in the priority of this work or not? And then you cycle and the assumptions library grows and grows and gets more powerful.

We’ve expanded to a shorter version, a rapid loop where it’s literally just send the product idea, give me a score. So we right now have these two models floating out there, and we’re just checking usage on each and seeing what people prefer today.

So this is fake, but this is how our chat tool, which is one way we’ve integrated this, works. So same example here, but just now laid out. Somebody might say, “Hey, I have a skills marketplace that I want to go ahead and deploy to this IDP, or internal developer platform. I want it to be a self-service catalog where teams can discover plugins and integrations for their favorite tool.” The AI assistant comes back and says, “All right, first, I need to understand more about this project.” Because again, it has the context of how to score the work. It asks guidance questions. This has been pretty valuable to get people to really think, “Why are we building this?” And you could go ahead and just … Users would go and just give it the extra information on that initiative if it wasn’t able to pull it from a PRD or another brief-like document.

The AI system would come back and actually list out the assumptions for you that it was able to grab, and the interesting thing here is we list out the line of code that it’s coming from, right? “So okay, I found five assumptions that apply. We have, this is all fake data, so don’t worry about it, but active engineers, and it’s coming from this platform and it’s line 102.” We also go ahead and try to figure out, does this match any OKRs this quarter? So our OKRs are stored in code as well, and it will tell you what percent alignment. This contributes to a KR strongly, this one contributes to an objective strongly, and you still have to, again, validate or invalidate any of this, and any conversations you have with that tool, you actually could push back. So you could say, “All right, actually, there’s one assumption you have over there. I actually differ. It’s actually 50. I think they’re talking about plugin requests per month, and where’d you get it from?”

“I got it from a Jira board somewhere.” So then you store that back in code, and then there’s a reference point of where this assumption got generated. And the key here is this is going to go back to the repository and it’s going to require a human PR approval, because we’re still trying to enforce this idea of humans being in the loop at the most critical time.
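The recall step described here, returning each assumption together with the file and line it came from, can be sketched as a plain keyword lookup. This is a minimal illustration under stated assumptions, not the tool shown in the talk: the file names, keys, and values are made up, and `find_assumptions` and `AssumptionHit` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AssumptionHit:
    path: str  # file in the assumptions repository (hypothetical layout)
    line: int  # 1-based line number, so a human can check the source
    text: str  # the raw assumption line


def find_assumptions(files: dict[str, str], keywords: list[str]) -> list[AssumptionHit]:
    """Return every line that mentions any keyword, with its file and line number."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for path, content in files.items():
        for number, text in enumerate(content.splitlines(), start=1):
            if any(k in text.lower() for k in lowered):
                hits.append(AssumptionHit(path, number, text.strip()))
    return hits


# Made-up repository contents, for illustration only.
repo = {
    "platform/usage.yaml": "active_engineers: 400\nplugin_requests_per_month: 50\n",
    "platform/cost.yaml": "monthly_ci_spend_usd: 12000\n",
}

for hit in find_assumptions(repo, ["plugin"]):
    print(f"{hit.path}:{hit.line}  {hit.text}")
```

Because each hit carries a path and line number, a reviewer can open the exact line before accepting or correcting the assumption, and the correction itself can go through a normal pull request.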

And then finally, all right, we’ve updated our assumptions. That’s great. Now, the next product manager, the next engineer will get a lot of value from these newly generated, fresh, up-to-date assumptions, but now what’s the score? That was the idea of this topic. What’s the actual priority here? So we’ll go ahead and spit out the score for you, and again, same kind of flow. You as a human have to validate, do I agree with this score? Is there some other context that’s missing here? So we always keep humans in the loop, both at the validation of the assumptions as well as the validation of the score.

And when that gets done, we send it off to a central repository, so now there’s more context for the next scoring initiative, because it’s interesting to know, all right, well, last time we scored an initiative related to this, it got a three for dev speed. So it has that context as well to hopefully improve its ability to score work a little bit more effectively.

So that’s what we have working today, and I’m going to go ahead and give the clicker back to Eleanor to close us off.

Eleanor Millman:

Yeah. So we hope you try it out if it sounds like something interesting. Should you want to start tomorrow, you should do three things. First of all, you should define your rubric. You heard me talk about the seven impact factors that we’ve used. You can use them as is or change them based on what works for your organization. And then figure out what weights make sense, so you can look at company OKRs and developer data to align on weights with key stakeholders. You should revisit these frequently, of course, as company conditions change.
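As a rough sketch of this first step, a rubric can be a table of weights plus a function that turns per-factor ratings into one impact number. The seven factor names come from the talk; the weights, the 0 to 5 rating scale, and the `impact_score` function are assumptions for illustration. The sketch covers only the impact part of a score, not how urgency, effort, or business needs are combined with it.

```python
# Hypothetical weights (they sum to 100) for the seven impact factors.
WEIGHTS = {
    "development_speed": 25,
    "reliability": 20,
    "security": 15,
    "cost": 15,
    "platform_efficiency": 10,
    "user_trust": 10,
    "data_driven_decisions": 5,
}


def impact_score(ratings: dict) -> float:
    """Weighted average of 0-5 ratings; factors without a rating count as 0."""
    unknown = set(ratings) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown impact factors: {sorted(unknown)}")
    for factor, rating in ratings.items():
        if not 0 <= rating <= 5:
            raise ValueError(f"rating for {factor} must be between 0 and 5")
    weighted_sum = sum(WEIGHTS[f] * ratings.get(f, 0) for f in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())


# Example: a project rated 3 for development speed and 5 for cost.
print(impact_score({"development_speed": 3, "cost": 5}))
```

Keeping the weights in one table makes revisiting them a one-line change that can be reviewed like any other code change.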

Next, number two, score your projects, writing down assumptions, as Mina described. On your first run, please always try to do this with people synchronously so that you can have good conversations that surface assumptions, and then document those assumptions, writing down sources, dates, owners. And of course, do treat those assumptions as code, putting them into source control and creating an easy mechanism for users to contribute to and change them. And lastly, number three, integrate your assumptions into your existing AI workflows so that AI can help recall that information when scoring, and you can easily loop in humans to ensure that the assumptions are validated.
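One way to picture "assumptions as code" is a small record that carries its source, date, and owner, plus a check for staleness. This is a sketch under assumptions: the field names, the 180-day threshold, and the example values are hypothetical, not the schema used in the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Assumption:
    key: str           # short identifier for the assumption
    value: float       # the number people are assuming
    source: str        # where the number came from
    owner: str         # who to ask when it looks wrong
    recorded_on: date  # when it was last confirmed


def is_stale(assumption: Assumption, today: date,
             max_age: timedelta = timedelta(days=180)) -> bool:
    """True when the assumption is older than max_age and needs a human review."""
    return today - assumption.recorded_on > max_age


# Made-up example record.
plugin_requests = Assumption(
    key="plugin_requests_per_month",
    value=50,
    source="Jira board",
    owner="platform-product",  # hypothetical owner
    recorded_on=date(2025, 1, 15),
)

print(is_stale(plugin_requests, today=date(2025, 9, 1)))
```

Records like this can live in source control next to the rubric, so a change to a value, source, or owner shows up in a reviewable diff.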

So thank you so much. Please reach out to Mina and me on LinkedIn if you want to connect on this and other platform engineering and developer productivity topics, and of course, we are happy to take questions.

Brittany:

Perfect. Okay. Just a friendly reminder, you can submit questions through the app, so you’ll just go to your main page and then select session Q&A. We’re in session 4B for those of you who are trying to figure out where to leave those questions. As you both could imagine, there are many questions coming through.

Mina Tawadrous:

I know you’re …

Brittany:

I could imagine. There are many questions coming through.

Mina Tawadrous:

I know.

Brittany:

One of the first questions is what happens if you’re looking to evaluate something that would totally blow up costs? Do you still give it a rating of a 5? Do you ever think about changing your model? How do you guys approach that?

Mina Tawadrous:

You go.

Eleanor Millman:

We definitely had this conversation, like should they be able to have … Because cost reduction, of course, we give points for a project when it reduces costs, so should we do opposite prioritization if it’s going to increase costs? We had many conversations like this. In the end, we decided not to. We figured that there were other mechanisms to decide whether that project made sense. Hopefully if it was going to increase costs, folks had thought about the benefits in other ways. But there’s no doubt that’s the sort of thing that you want to talk about, and one thing I really liked about this framework is it made us have those conversations. By trying to make something more quantified, you have to debate with your coworkers what makes sense for your org.

Mina Tawadrous:

Yeah. And I think now when we have work, again, keeping a more iterative model, if your users start scoring some work and you keep seeing over and over again that you’re never going to hit moonshot items, or you want to park some capacity for some things that might blow up costs but add so much developer speed, you can iterate on the weights or even the rubric itself. So I think it’s very good to stay open to iterating on those initial weights as well.

Brittany:

How often are you all iterating on this? Do you add things to this project as new ideas come up, or are you only going through this during your planning cycles?

Mina Tawadrous:

So on the AI tool, we are iterating as we speak. Every time a user scores something, we go ahead and get immediate feedback from that user. We mentioned that we even have a parallel agent right now that’s actually running and scoring work as we go, and we’re just using that primarily for feedback on the product, not necessarily for the scores themselves. So we’re iterating on this all the time.

Brittany:

Okay. So what happens when the framework says that something that a big stakeholder wants shouldn’t be the priority? Have you had any examples where that’s happened, and how have you handled that conversation?

Eleanor Millman:

Yeah. I would say that, yes. I think many of us in this room probably have had that happen, in general. There’s a number of ways that we’ve handled it depending on the project. First of all, it always gets us to have a conversation, which I think is a really positive thing when we’re talking about products. We would look and certainly ask the stakeholder, “Why do you think this is so important?” To the point of everything Mina said about assumptions, maybe the stakeholder has certain assumptions that we don’t hold. Maybe a bit of user research can go and help us align ourselves together, so that’s one possibility.

Another possibility early on is maybe the framework itself is wrong. Maybe that stakeholder really values something that needs to show up in the framework. Hopefully you would not forget security and compliance, but should you, maybe InfoSec will show up and be like, “Oh, my gosh, you gotta be secure.” That might prompt you to realize that you’re not correctly thinking about the value you bring to your company.

And lastly, I would say that was that internal business value thing. Sometimes, hopefully rarely, you just gotta let a stakeholder demand it. Finance might require something, security might require something and you have to do it, but hopefully as an org, you can mostly focus on the value you bring to the developers you support or the users you support and minimize just random asks from stakeholders.

Mina Tawadrous:

I could add an example that I can think of. We had a really large initiative. It was like a six-month project, going to change the world type of initiative. So there’s a lot of investment behind it, a lot of belief behind it. And we launched, we were able to get something out there, and we gave user value, but not to the impact that was hypothesized, so we used the formula for the follow-up, right? “Okay, we did V1. We have all these amazing features that everyone is so excited or was so excited for in V2, and we have all this momentum. We just did this giant launch. Let’s go, go, go to build V2.”

And by using the formula, we were actually able to have the conversations with stakeholders that were like, “This is the right time to continue building because we have the momentum.” But when we plug into the formula, we look at the developer speed and when we look at other initiatives, they’re outranking V2 of that product, which is a painful thing. It’s a painful thing to say, “Let’s slow down momentum on this thing that I firmly believe in.” But when you have it down in metrics and in numbers and you’re able to explain it, the really heavy stakeholder conversations are easier to handle than they would be without those numbers.

Because again, it goes back to how difficult it is to debate priorities. It’s hard to tell a stakeholder like, “This is not the most important thing for us.” But if you start just throwing out assumptions, throwing out data and saying, “This is important, but it’s going to give us X value, and compared to this, that’ll give us Y value,” now we’re just talking about value, and we have the discussions about the assumptions instead. So I found the formula was really helpful in that example.

Brittany:

Yeah, a really creative way to be able to help pull emotion and ego out of some of these really big and hard decisions that everyone is having to make right now, so really, really cool. What about outside of the developer space? Are you leveraging this in any way to help in other roles outside of developers, or is that something that you’re also thinking through?

Mina Tawadrous:

I’ll do a quick response and I’ll tee it off to Eleanor, but similar to a lot of companies here, we are looking at the non-developer, the citizen builder, whatever you want to call them, the persona that does not code every day. I’m one of them. So we have some really exciting work in that area coming up, and we ran it through that exact rubric that you saw. Eleanor ran it through like a few weeks ago. She showed me the score and I was like, “That’s really low.” Because the rubric, right, is prioritizing developer speed, and we still believe developers are those that were like traditionally hired developers. So I think the rubric, as you see today, really opens up this question around how do you support these users, and what do you have to change about your prioritization skills to get there. So I think it’s a very interesting time, and I think, Eleanor, you’ve done some work looking into this area.

Eleanor Millman:

Yeah. And it’s just another example of it. Don’t see this as a really firm framework, like we have to prioritize this project this way because that’s what the framework says. If we see a project prioritized a certain way and it feels right, great, and we can explain why. And if it feels wrong in our gut, we go and figure it out, so exactly. I scored this project that I’m working on now to help non-developers deploy apps they make, and it was really low and it just felt wrong, and of course it’s because we are not including those users, because they’re not typical platform engineering users.

So I think the framework will probably, at least for us, only be more useful as we first push forward the conversation of are these users part of our user base. I hope the answer will be yes. And then when they are part of our user base, “Well, wait a second. What do they care about and what value can we bring to them?” Because runtime reliability, for example, these users probably won’t bring that, but they’re making proof of concepts that may help our overall company do better, and they maybe will be making internal tools to help non-developers be more productive. I don’t even know. So the point is we’re probably going to have another round of conversations of what impact factors should we consider. What are their weights? What’s the benefit to the company? So I’m excited to go through that.

Brittany:

An exciting time for sure. Thank you both so much for being here. Super interesting stuff. I know so many in this room probably have hundreds more questions for you both, so I’m sure you’ll be very busy the rest of the day. Thanks again for being here. Everyone else in the room, we will take about a five-minute break and then we’ll be back up here for our last breakout session.

Eleanor Millman:

Thank you.

Mina Tawadrous:

Thanks.