Podcast

Dylan Keyer

Show Notes

Internal developer portals: from spreadsheet to strategic platform

  • Started with a spreadsheet. CarGurus built their internal developer portal, Showroom, in 2019. The initial goal was simple: solve ownership confusion during a monolith-to-microservices transition.
  • Showroom now powers five key pillars:
    • Discoverability: Centralized source of truth for services, jobs, and ownership
    • Governance: Dynamic compliance checks to ensure “Golden Path” adoption
    • Self-serviceability: Spin up new services, subscribe to topics, and more
    • Transparency: Access logs, service data, and operational info in one place
    • Operational efficiency: Reduced cognitive load and friction across the SDLC
  • Interfaces with infra but abstracts complexity. Showroom sits on top of observability, alerting, and infra tools, providing a consistent experience even as backend systems evolve.
  • Evolved by solving real problems, not chasing trends. Rather than set out to “build an IDP,” the team invested in Showroom whenever it accelerated strategic initiatives—and saw massive returns.
  • Impact: 75-day setup cycles cut to <3 days. By leaning on automated workflows, best-practice templates, and centralized access to information, CarGurus dramatically increased service creation velocity.

Build vs. buy: why CarGurus chose to go homegrown

  • Started before Backstage existed. No off-the-shelf solution could meet their requirements when the initiative began.
  • Kept evaluating even after Backstage launched. But concluded the effort to customize it would be equal to enhancing their own tool.
  • Customization was critical. Their IDP integrates deeply with internal systems and niche tools not typically supported by commercial solutions.
  • Consistent UX across tools. Showroom provides a single pane of glass—developers don’t need to know what’s happening under the hood as tools change.

AI coding assistants: a platform-level initiative

  • Bake-off between three tools. CarGurus ran a head-to-head comparison across multiple AI assistants. Feedback showed different tools excelled in different languages and workflows.
  • Standardization isn’t the goal. The company is embracing flexibility—letting teams and engineers choose the best tool for their domain.
  • Qualitative feedback > raw telemetry. Developer sentiment and perceived time savings are more actionable than latency or code volume alone.
  • Measuring impact across six dimensions:
    • Speed: Diffs per engineer
    • Efficiency: Developer Experience Index (DXI)
    • Satisfaction: CSAT scores
    • Adoption: Tool usage tracking
    • Time savings: Self-reported
    • Burnout prevention: Monitoring job satisfaction
  • Maintainability dropped slightly. AI-generated code may feel less “owned,” even if it’s correct. A known and acceptable trade-off so far.

Driving adoption: what’s working

  • Leadership buy-in was immediate. Execs were eager to invest in efficiency. AI adoption was viewed as a strategic imperative.
  • Champions are key. Internal AI “power users” share tips via tech talks and videos to scale best practices organically.
  • Internal marketing makes a difference. Just telling developers they had access wasn’t enough. Framing AI tooling as a must-try (not optional) led to a meaningful adoption spike.
  • Focus is on education and experimentation. Goal is to help developers integrate AI into their daily flow—not just try it once and move on.
  • Expected ROI? ~15–30% gains in efficiency in early stages, with room to grow as usage matures and more advanced agent-based tools are explored.

Transcript

Abi Noda: Dylan, thanks so much for your time today and coming on the show. Excited for this conversation.

Dylan Keyer: Yeah, I’m super stoked to be here. Thanks so much.

Abi Noda: Well, we’re focused today on talking about this data lake that you’ve been working on at Twilio for… How long has it been now?

Dylan Keyer: Since the beginning of this year, but there’s a lot of historical background on past attempts that goes way past that.

Abi Noda: Yes, the ever elusive engineering data lake problem, which we’re excited to dive into today.

Where I would love to start is with why organizations get into trying to build this sort of thing; there are a lot of reasons. And at Twilio, I know that this was actually driven, from my understanding, by your CTO, who had recently joined the company. Tell the story of, again, what prompted this effort at Twilio.

Dylan Keyer: Yeah. So we’ll start with how it started, which was around…

We have a lot of engineering teams, we’re pretty big at Twilio, and they’re all well-intentioned in that they want to have reliable services. They want to make sure they have good processes and that they develop well.

And so we have a pretty strong culture around operational reviews. Teams hold them on a regular basis, at varying levels of the org structure.

The issue is that there is no central team, there’s no central body of work, no central tools to support this. So every team was coming up with their own solutions.

You have product-facing engineers writing scripts to suck data out of different tools to pull it together, to enable a dashboard so they can look at it with their director and their VP.

No central team doing that means a lot of inefficiency because every product team is doing it.

That’s how it started. It’s a very specific use case: let’s solve this problem once across all of those teams.

Abi Noda: And what kind of things… Define operational review. What types of metrics or questions? Describe more of what you mean by this.

Dylan Keyer: Yeah. That’s actually part of the problem because each team ends up having their own opinion around what’s a good thing to look at and they might miss things that are important.

But I think at a high level, teams are, and should be, looking at things like: in the past two weeks or month, for example, what have the golden signals been for our services? Availability, latency, saturation, that kind of thing. How many incidents have been declared against the services that we maintain? What does the paging volume look like, both through the lens of how shaky our services have been but also through the lens of what the on-call experience is like for our engineers? Are they getting burnt out? And then of course you should be looking at things like vulnerabilities in your software, making sure you prioritize non-feature work. So there’s lots of different metrics you can look at and pieces of data there.

That’s the high level, I think, of things that are pretty standard to look at.
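To make that kind of rollup concrete, here is a minimal, hypothetical sketch of one operational-review query: paging volume and incident counts per service over a trailing 30-day window, with off-hours pages as a rough proxy for on-call burden. The table and column names (pages, service_name, incident_id, triggered_at) are illustrative, not Twilio’s actual schema.

```sql
-- A minimal, hypothetical operational-review rollup: paging volume and
-- incident counts per service over the trailing 30 days, with off-hours
-- pages as a rough proxy for on-call burden. The table and column names
-- (pages, service_name, incident_id, triggered_at) are illustrative only.
select
    p.service_name,
    count(*)                                             as pages_30d,
    count(distinct p.incident_id)                        as incidents_30d,
    count_if(hour(p.triggered_at) not between 9 and 17)  as off_hours_pages_30d
from pages p
where p.triggered_at >= current_timestamp - interval '30' day
group by p.service_name
order by pages_30d desc;
```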

Abi Noda: And so you’re looking at this data or you were seeking this data from the perspective of being able to have conversations around how are we doing? Are we getting better? Are we getting worse? What actions do we want to take? Share more about how you guys were thinking originally about how you would operationalize this.

Dylan Keyer: I think you’re spot on.

The idea was we should be able to show this data in a single pane of glass through the lens of trends, in particular output metrics, that are maybe tricky to control but are good to look at as a litmus test of are we getting better at certain things.

But then there’s also things you want to look at where you can control the knobs and you want to be able to showcase it in a way that biases teams towards taking action.

Those were definitely the two buckets for all these different bits of data and metrics for sure.

Abi Noda: And when you were just getting started, because we’re going to talk about the work you’ve done and how this is being used and all that, but at least initially when your CTO was looking for this to exist, what did you estimate in terms of, I don’t know, cost to build, the number of people who would need to be involved in this? What was your back of the napkin sizing up of what this endeavor was going to cost?

Dylan Keyer: Yeah. Don’t really have a good answer. There were so many unknowns for us to be honest, because we first had to ask what are all the places we need to get data from. What would be our golden lists of metrics? I gave some examples before, but we really wanted to iron out what is that standard list. What is the blessed list from our CTO and from leadership around what we should look at, work our way backwards to figure out where do we need to get data from? How long do we need to retain it to be useful? Going back to the point of trends.

We went through that exercise first to figure out what are all the things to plug in and then what are all the other sources to plug in to be able to enrich it.

Yeah, that was the first exercise. I guess not to answer your question, but that was where we started.

And then what we wanted to arrive at next wasn’t even back-of-napkin math yet. We just wanted to know how we could bake off different options. Once we knew the answers to the questions I just mentioned, we just figured out: what are the high-level technical solutions that are common in industry? What are the tools out there? Let’s spin up a couple experiments, bake them off against each other, and then figure out, I guess from there, okay, now we have a sense for our data volume and retention, throughput and frequency and all this good stuff. We have a solution we think we want to take to the next level towards an MVP, and then we went to the back-of-napkin math.

Abi Noda: What did the staffing for this look like? I guess coming from your CTO, I imagine, that hopefully there was a team put into place, or was it just you?

Dylan Keyer: No, it wasn’t just me.

I would say it was best effort to start. A lot of people that were passionate about this space, but not maybe formally part of the same team, rallying around it.

We sit in a central SRE space, a central SRE function, but we didn’t have, at least at the time, a single team that would go and own this. So it came to me, and I found some other people that would be interested, and their leaders and all the way up to the CTO all got aligned on, “Yeah, this makes sense. Let’s not formalize anything around it, but let’s at least put some bodies against this thing.”

Abi Noda: Let’s talk about the architecture of the solution a little bit. I understand there’s several components of this in terms of how you’re extracting and loading data, how you’re transforming the data, and then ultimately how this data is exposed to folks who are trying to leverage it. Share a little bit about, maybe to start…

I understand you ended up leveraging an open source extraction tool set. Maybe start there and describe to listeners how this thing actually works.

Dylan Keyer: One of the principles we started off with was that we wanted to follow ELT mindset because oftentimes, the value that you get out of this data comes when you transform it, when you enrich it by joining it to other data sources. We wanted that to be more transparent. We didn’t want that to be buried in spaghetti code that sits too close to the API calls that get the data in the first place.

We started off with ELT. Yeah, and you’re right, we’re using an open source framework for how we do the E and the L. But we have a lot of proprietary systems at Twilio where we need to be able to wire into them. So we leveraged the open source toolkit for that. That does our extraction, our loading.

We use AWS heavily at Twilio. Our backbone is the standard data backbone that you’d find in an AWS environment. We load our data into S3, which solves a lot of problems for us around durability, retention. It’s cheap-

Abi Noda: Cost.

Dylan Keyer: Yeah. Going back to the back of the napkin, I guess we didn’t think about it because we didn’t conceive of any better solution at that point from a cost perspective. But then, yeah, we crawl our data with Glue, we publish it for querying in the Glue Catalog, and then yeah, it’s wired up into Athena for us to consume.

That’s the E and the L part, and I’ll pause there, we can go into the T part. That’s a bag of goodies.
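As a rough illustration of the load side described here (raw extracts landed in S3, registered in the Glue Data Catalog, queried through Athena), the DDL might look something like the following hypothetical sketch; the bucket, database, and column names are assumptions, not the real setup.

```sql
-- Hypothetical sketch of the load side: raw extracts landed in S3 as
-- partitioned Parquet and registered in the Glue Data Catalog so Athena
-- can query them. The bucket, database, and column names are illustrative.
create external table if not exists raw.pagerduty_incidents (
    incident_id   string,
    service_name  string,
    urgency       string,
    triggered_at  timestamp,
    resolved_at   timestamp
)
partitioned by (ingest_date date)
stored as parquet
location 's3://example-data-lake/raw/pagerduty/incidents/';

-- Register newly landed partitions (a Glue crawler can do this instead).
msck repair table raw.pagerduty_incidents;
```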

Abi Noda: Yeah. I would love to dive into the transformation step. What tooling do you use for that? I don’t know, what are some of the challenges you’ve encountered there?

Dylan Keyer: Taking the first part, so we use dbt Core, the open source. Huge fans of dbt. None of us knew what dbt was before January and now we’ve been baptized in fire with it. Gotten to test some of its limits and have really, really loved it.

It marries source control and DRY principles with SQL and things that are pretty common, like a common skill set to have when you’re trying to query and transform data.

Yeah, huge fans of dbt and that is what we’re using for this.
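For readers unfamiliar with dbt, a staging model in the style described is just version-controlled SQL that cleans a raw extract once so every downstream model reuses the same definition. Here is a minimal, hypothetical example; the source and column names are assumptions, not the real project.

```sql
-- models/staging/stg_pagerduty__incidents.sql
-- A minimal, hypothetical dbt staging model: clean and type the raw extract
-- once, under source control, so downstream models join against a single
-- well-defined version. Source and column names are assumptions.
with source as (

    select * from {{ source('pagerduty', 'incidents') }}

),

renamed as (

    select
        incident_id,
        lower(trim(service_name))       as service_name,
        cast(triggered_at as timestamp) as triggered_at,
        cast(resolved_at as timestamp)  as resolved_at,
        urgency
    from source

)

select * from renamed
```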

I think to answer your other question around some of the challenges, what we found is that writing the SQL is easy. dbt makes this an easy technical problem to solve. We’ve loaded a ton of data from all the places that it lives to answer the questions, now we just need to transform this data, join together, and there it is, the end of the rainbow. We finally arrived there.

But it’s not so easy because we’re running into these problems with what does our taxonomy look like? What’s the accuracy of our software catalog? How rich is our metadata? How consistent is it?

We can double click in any angle on that because that’s been the challenge.

Abi Noda: Yeah. Well, definitely planning on double clicking into that challenge as well as others.

Before we go there though, share how do you expose this data? Is this something where a traditional data team you’re receiving tickets and generating analysis and reports for leaders across the organization? Or is this more self-service and how have you achieved that?

Dylan Keyer: Definitely more the latter.

We don’t want to be a blocker at any stage of our ecosystem for other teams; that’s one of the past mistakes we’ve made. And so yeah, self-service first for everything. So from the point of, “Hey, we want to ingest data,” using the open source tool I mentioned before, or “We want to transform it,” or “We want to view it and we want to build our own dashboards around it,” everything is self-service.

Particularly when it comes to how we expose this. Again, huge fans of open source tools: we’re self-hosting Apache Superset.

And so because most of our customers or consumers if you like, are engineers, they like to get their hands dirty, they like to write SQL, they like to have that flexibility, Superset serves as a thin layer on top of Amazon Athena for us and it really simplifies our access model.

We wire up Superset behind our authentication provider, our SSO provider, everybody gets access into it. Makes that super easy for us to manage. Go run your queries against any of the data, answer the ad hoc question you have.

It also goes toe to toe pretty well with some of your traditional BI tools. Folks that want to have long-lived dashboards to run their operational reviews as an example or do other things, full suite of charts that you can put in the dashboards and you have a ton of flexibility there as well.

We’ve really liked it as our Swiss army knife to solve all the use cases we have.

Abi Noda: Let’s talk about, again, if we think back to the beginning of the conversation, we talked about what initially drove this, getting an understanding of how are we doing, how can we get better operational reviews. How have you begun to see that actualized? Is this platform now being used for an operational review process? What does that process look like? Walk us through…

This is one of the biggest questions I hear around these types of platforms is how do we actually use it? What does that look like for you at Twilio?

Dylan Keyer: We’re fortunate in the sense that our senior most leaders are bought in to running data-driven operations. Our CTO is 100% supporting us in this.

Our North Star has been for a few months now that… We’re leveraging this towards creating, I don’t want to call it the mother of all dashboards, but you hear executives often ask for my single pane of glass.

We’ve achieved at least a nascent version of that, that we’re now going to bring across all of our senior most engineering leaders across Twilio with the support of our CTO. We can filter by each of these users to look at the data in their domain, and we’re going to use this starting, I guess I’ll say very soon to be safe, in the coming weeks, I’ll say, to run the mother of all operational reviews at Twilio.

Our senior-most engineering leaders looking at a dashboard in Superset, making decisions, taking action, yeah, we’re there. We’re there at the highest levels of our engineering leadership, which is super rewarding.

We could talk more about other levels too but to me, that’s my favorite thing to talk about because it’s the most visible thing for us at least internally.

Abi Noda: What does that dashboard look like? Again, you don’t have to go into specifics if you’re not comfortable sharing, but are you looking at data at a service-by-service level, team-by-team level, or is this more organizational? What is the data you’re actually looking at?

Dylan Keyer: I mentioned before we started off with our base set of what are the golden metrics we care about, where does that data come from. We worked backwards. That was actually all towards enabling this North Star dashboard.

We look at the high level. We look at on-call health, again through the lens of which services are experiencing the most issues and have a lot of paging volume, and then which engineers… In a blameless way, let’s understand which engineers are getting burnt out.

And so yeah, we can slice this by services. We can also slice this by actual users in the org chart to figure out which pockets of our leaders need to lean in on their teams to make sure that they’re not being burnt out.

On-call health, we look at what we call our customer experience. This is through the lens of our customer-centric service-level indicators.

We did a lot of foundational work last year actually to make sure we have those. We’re now pulling them in and enriching them to look at, “Hey, we sell this suite of products, customers want to do this set of things with each of those products.” Has it been a good experience doing either of those things right through the lens of availability, latency, etc. And again, that’s slicing on a per product level actually through the lens of how our customers buy our products.

And then we look at a lot of deployment, velocity, deployment quality stuff like how fast are we shipping and again, that’s on a per-service basis, but we also can slice this by the org to look in certain pockets. How frequently are we doing this in lower environments? How frequently are we deploying, not just how fast?

That’s rounding out where we are today with how that dashboard looks right now.

Abi Noda: One of the things that was really interesting to me when we were talking about this earlier was how the adoption of this platform has expanded, or at least begun to expand, beyond this initial engineering operational review use case into a number of different areas. That’s something I’d love to learn more about from you. One of the use cases you mentioned to me was legal. We’d love for you to share how your legal team has found value in this platform.

Dylan Keyer: The way this should normally work is we have a business problem to solve. We gather requirements and we steer forward towards it.

What we found is… We have this phrase critical mass. Eventually, you have so much data coming in from so many different places that you can’t even fathom what the use cases will be until you already have the data and you go backwards into it.

The case for legal was actually around licenses across all of our GitHub repos. Again, we have M&A activity, we have internally hosted and publicly hosted GitHub.com repos. Repos everywhere.

And so it turns out our legal team really struggles to understand what are the license types being used across all of our GitHub repos. Do we have permissive licenses in use? Do we have copy left?

It all comes through the lens of intellectual property risk, but this is something that as far as I’ve heard from them since they’ve been engaged, they either could not do this or couldn’t do it without an insane amount of effort to be able to collate this all together.

This is something they just stumbled upon like, “Oh, hey, you have all that data. Let’s write a query in 30 seconds and put together a pie chart that answers this question for us.”
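A hypothetical version of that “30-second” query might look like the following, counting repositories by license type so the result can feed a Superset pie chart; the github_repositories mart and its columns are illustrative names only.

```sql
-- Hypothetical version of the "30-second" legal query: license types in use
-- across every ingested GitHub repository, ready to feed a Superset pie
-- chart. The github_repositories mart and its columns are illustrative.
select
    coalesce(license_spdx_id, 'NO LICENSE DETECTED') as license,
    count(*)                                         as repo_count
from github_repositories
where not is_archived
group by 1
order by repo_count desc;
```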

Abi Noda: That’s really interesting how, like you mentioned, typically the journey is identifying the needs across the business and building toward a solution, but how you’ve amassed this treasure trove of data and folks across the business are coming to you with questions and use cases.

Another example you mentioned to me was your security team. We would love for you to share a bit more about that.

Dylan Keyer: Our security team struggles in particular just because the vast amount of tools that we need to deploy at our scale and complexity around like, “Hey, what’s happening in the runtime? What’s happening in source? What’s happening here? What’s happening there?”

We have an amalgamation of tools that all do their job, and they all surface their insights within the confines of their own UIs.

Our security team struggles with at least two things. One, how do we get this all into a single pane of glass? Again, the single pane of glass idea. And then, how do we enrich this?

It’s one thing to know that service XYZ has vulnerability ABC. Well, who owns that service? Who needs to take action? That’s often a piece of metadata that is super difficult to shove into the data models of all these different tools. And so not only do you want to pull them together, you also want to enrich them.

Yeah, that’s basically what’s in flight now with our security team, now that they know we have these patterns of repeatedly doing this from different data sources. They’re evaluating how they can get all the data from those tools and enrich it with the metadata we have, and start to reduce some of the manual toil that exists today. We’re trying to attribute ownership and drive action.
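A sketch of that enrichment, under assumed mart names (vulnerabilities, catalog_components, catalog_teams): open scanner findings joined back to the software catalog so every vulnerability carries an owning team.

```sql
-- A sketch of the enrichment described above: open scanner findings joined
-- back to the software catalog so every vulnerability carries an owning
-- team. vulnerabilities, catalog_components, and catalog_teams are
-- hypothetical mart names.
select
    v.vulnerability_id,
    v.severity,
    v.service_name,
    t.team_name      as owning_team,
    t.slack_channel  as escalation_channel
from vulnerabilities v
left join catalog_components c
    on lower(trim(v.service_name)) = lower(trim(c.component_name))
left join catalog_teams t
    on c.owner_team_id = t.team_id
where v.status = 'open'
order by v.severity, owning_team;
```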

Abi Noda: I want to shift into talking about challenges of both operating this platform but also extracting the most value from it. You touched on this earlier today as well as in our recent conversations, but you mentioned to me that the real tricky problem with this is not extracting data from different sources, but rather being able to reliably join the data together. Can you expand on this? What’s the challenge here?

Dylan Keyer: The challenge is I’ve got data loaded that was easy a couple days in the sprint. What should be even easier is writing my left join, my inner join, this value from this source, this value here.

I’ll give you an example.

Let’s say we have a critical piece of metadata that’s like an AWS tag. It should exist on every resource. That should be my super reliable piece of information to tie this back to the service to which that resource belongs, which then allows me to figure out who owns it.

What happens when that tag isn’t used consistently? Some percentage of our resources might not be tagged. What happens if there’s nothing enforcing the case sensitivity on that tag?

Now we find ourselves doing weird things that we don’t want to have to do, things that are super brittle: fuzzy string matching, and “let’s just lowercase all of these and hope that, yeah, they’re all the same thing.”

That’s just one example, but that just speaks to the bigger problem, which is the join is easy to write. Knowing what’s going to happen when you do the join is the scary part.

And it’s the part that we’re grappling with right now, especially when you have systems that have evolved as your business has evolved and you weren’t mature enough to know that you need to enforce these guardrails and standards, all your different systems are going to have, what I’ll say, varying levels of quality in how their data models have been implemented and how their metadata is controlled, so to speak.

By far the biggest challenge.
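For illustration, the “weird, brittle” workaround might look something like this: joining AWS resources back to the catalog on a tag value, normalizing case and whitespace and hoping the values really mean the same thing. The table names (aws_resources, resource_tags, catalog_components) and the 'service' tag key are hypothetical.

```sql
-- The "weird, brittle" workaround sketched out: join AWS resources back to
-- the service catalog on a tag value, normalizing case and whitespace and
-- hoping the values really mean the same thing. aws_resources,
-- resource_tags, catalog_components, and the 'service' tag key are all
-- hypothetical names.
select
    r.resource_arn,
    c.component_name,
    c.owner_team_id
from aws_resources r
left join resource_tags rt
    on  rt.resource_arn = r.resource_arn
    and rt.tag_key = 'service'   -- should exist on every resource, but often doesn't
left join catalog_components c
    on lower(trim(rt.tag_value)) = lower(trim(c.component_name));
```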

Abi Noda: For context for listeners and clarification, a lot of the joining that I understand you try to do is data from third-party tools, like PagerDuty for example, or your ITSM platform, and joining that third-party data back to a central catalog of services, components, and teams. Is that correct?

Dylan Keyer: Yeah, that’s the bottom. That’s exactly right.

Same thing with the org chart. We enrich it with our software catalog, we also enrich it with our org chart.

Abi Noda: How do you deal with this problem? As you mentioned, this is a expected problem, well, maybe not expected until you try to solve the problem you’re working on, but a natural problem in organizations where the data, especially in third-party tools, isn’t being tracked in a standardized or maybe consistent way, or maybe not a way that’s consistent with your other data. Where do you go from there? Is it a matter of doing the best with the data you have or mandating certain standards and trying to enforce them? What are some things you’ve tried to do to overcome this challenge?

Dylan Keyer: I hate to give this answer. It depends. It depends on how bad the situation is.

I think the one thing that’s consistent across them though, what we found to be true, and maybe this will be impactful for your listeners, is: if you paper over the problem, if you try Herculean efforts with the brittle solutions I mentioned (fuzzy string matching, weird coalescing, manually maintained mapping tables), it paints a picture that your VPs and your leaders might want to see. They don’t want to see null values; they don’t want to see that 50% of the data has been dropped on the floor because the join didn’t succeed.

But if that’s the reality and you’re papering over it, you’ve lost the forcing function of like, “Hey, this is a really bad thing that we need to go fix,” through examples, like you mentioned. Figuring out do we have a standard that’s not being met or do we lack a standard that we need to conceive of?

Those are all conversations that need to be had, system by system, with some consistencies of course, but that’s the one thing that I think is the common undercurrent.

Do not paper over it. Do not offer weird brittle solutions. Show it in all its glory, garbage in, garbage out, then work your way backwards to figure out how you go and fix it at the source.
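One way to show it in all its glory rather than paper over it is to report join coverage directly, for example with a query like this hypothetical sketch, where resource_ownership stands in for the materialized result of the tag join sketched earlier.

```sql
-- Report join coverage instead of hiding it: what share of resources
-- actually resolved to a catalog entry? resource_ownership stands in for
-- the materialized result of the tag join sketched earlier.
select
    count(*)                                            as total_resources,
    count(component_name)                               as matched_resources,
    round(100.0 * count(component_name) / count(*), 1)  as match_rate_pct
from resource_ownership;
```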

Abi Noda: Let’s maybe dig into a specific example like PagerDuty. I think you mentioned you observed the challenge around the team names in PagerDuty not lining up with either team or service names in your catalog. I’m curious if and how you’ve approached that.

And a thought that comes to my mind is what if, for example, teams are using PagerDuty, perhaps they’re just organizing in PagerDuty with a different sort of team construct and definition than your catalog altogether that maybe cannot even be coalesced. Again, I’m speculating here, but how have you thought about that problem specifically?

Dylan Keyer: Caveat being you’re speculating, but without me disclosing too much, you’re not too far off.

Yeah, let’s take this one as an example.

The way we’re handling this is we make sure that leadership is aware up to the executive that owns that tool because again, this is a tool that is critical to our business operations. They need to be aware of this, they need to be aware of the limitations that this is causing from a reporting perspective.

Get their buy-in that this is a problem that they’re worth investing and solving. Check. We have that.

We then go to the people that are the technical owners of this system. They’re the ones that have the knobs to control of saying like, “Hey, we’re going to put in place a certain guardrail,” or, “Hey, we’re going to go back and nuke some of these things from orbit because they haven’t been used in so long.”

Work with them to align on, “Here are the requirements. Here’s what our leadership wants to do. We want to be able to report on things and match them up here,” and share those requirements with them. They don’t have a product manager? That’s okay. You’re going to have to help them in those conversations.

Agree on what those requirements are, and then let them define the standard as the owners of that system that needs to either be enforced better or just needs to be implemented from scratch.

That’s what we’re doing right now with PagerDuty, to be honest. It’s very much in flight. I’ll be happy to report back in hopefully the not-so-distant future with how it pans out. But yeah, that’s what we’re doing. We’re in the trenches with the team that owns PagerDuty, sharing our requirements, coming to agreement on a standard that’ll work.

Abi Noda: How do you guys deal with org data?

This is something I’ve seen is just a challenge for pretty much every larger organization I’ve ever seen. What’s typical that I’ve seen is an organization has a system such as Workday, some kind of HRIS system that has the management reporting hierarchy of the organization, but it doesn’t contain team names and it might not contain really the team structure of the organization.

Curious, how you deal with that, both from how do you model that, but then how do you do your best to keep that information up to date?

Dylan Keyer: To your point, because Workday is, or whatever HRIS system in question, ignorant of some of these other things, the way we’re solving it, and again, I wouldn’t consider this to be a conquered mountain but one that we feel pretty good about so far, is we actually are pulling this metadata in and annotating the catalog itself.

Our catalog is aware that these are our users. That’s being fed from our HRIS. Each user record has all the annotations you could imagine that you’d want. Who is that person’s manager? Well, if I have every user and I know who their manager is? Recursion. I can rebuild the org chart. Beautiful.

I also know that they’re a member of a team. We have an entity in our catalog that represents our teams, our groups. We have a strong ownership relation for all the different components of the software to say, “What team owns them?”

That is how we’re mapping all the dots together like, “Hey, I know who you are as a user, I know who your manager is. I know what teams you’re in. I know what teams own the software.” So no matter what entity exists in the source system I pull, I can pretty much map to enable the slicing and dicing and the rolling up that’s needed.
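The recursion described here could be expressed as a recursive CTE along the following lines, assuming the query engine supports WITH RECURSIVE (Athena engine v3 / Trino does) and using hypothetical users, user_id, and manager_id columns fed from the HRIS.

```sql
-- Rebuilding the org chart by recursion, as described: each user record
-- knows only its manager, so walk downward from the top. Assumes the query
-- engine supports recursive CTEs (Athena engine v3 / Trino does); users,
-- user_id, and manager_id are hypothetical columns fed from the HRIS.
with recursive org_chart (user_id, manager_id, depth, chain) as (

    -- anchor: people with no manager, i.e. the top of the org
    select user_id, manager_id, 0, cast(user_id as varchar)
    from users
    where manager_id is null

    union all

    -- recursive step: attach direct reports one level at a time
    select u.user_id, u.manager_id, o.depth + 1,
           o.chain || ' > ' || cast(u.user_id as varchar)
    from users u
    join org_chart o on u.manager_id = o.user_id

)
select * from org_chart;
```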

Abi Noda: And so I guess first… Oh, go ahead.

Dylan Keyer: No, I was just going to say, and the last thing that hopefully is obvious is we’re ingesting all that from the catalog.

We’re ingesting all of that, including the stuff that comes from the HRIS, via our catalog.

Abi Noda: How do you keep the team, groups, etc., in the catalog up to date? I know that’s a bit of a tactical question, but is there one person at the organization who’s just looking out for any type of team changes and new hires and putting them in the right groups? Or is it more of a groundswell process?

Dylan Keyer: It’s not necessarily my area of expertise. We’ve farmed that problem out to the team that maintains our software catalog.

But here’s my understanding, and I’m pretty sure this is accurate. And if it’s not accurate, I guess this would be an even better solution, which your listeners would care about anyway: route everything as closely as you can to where IT deals with memberships when people get terminated or hired.

You’re using an AD or LDAP solution. Your teams should not deviate from those; reuse them as much as you can. That’s what we’re doing at Twilio. People get hired, fired, they leave, and the membership change is there. We use those exact same team entities in the catalog, which means we reuse the same ones in the data lake as well.

Abi Noda: Shifting over to a different question I have. I think another common challenge I’ve seen around these types of data lakes, particularly for engineering, is the challenge of, and I’m not sure this is applicable, unifying data of the same category together.

For example, if an organization uses self-hosted GitHub, GitHub Cloud, and GitLab and Bitbucket, for example, there can be some challenges there when it comes to reporting.

Curious if you’ve encountered this challenge and how you’ve approached it.

Dylan Keyer: Yes, definitely encountered it.

How we’ve approached it is through maybe more of the artwork side of data engineering. This is where our self-service functionality helps us.

We often are not the ones that know how to achieve this. How do we, I don’t want to use the word coalesce, but how do we marry all these things together in a cohesive way? We often don’t know. The technical answer is easy. dbt’s the tool, I already mentioned that. How do you craft the query to do that, and how do you make sure it actually makes sense?

The way we’re handling it is a GitOps flow: you know SQL, you’re the domain expert, there’s some important stakeholder that has an important question, so you’re going to be the one that’s tasked with doing it. You’re going to be the one that owns the implementation and overlay of that business logic in your SQL, and we’ll give you the tools to enable you to do that more easily.

How we’re doing this is bringing in the right domain experts to make sure they can do it, dbt and SQL can make it easy technically, and then they bring the business logic to the table.

Abi Noda: For listeners, just to make this more concrete, I believe you were telling me you specifically do have this challenge, for example, with GitHub where you have through acquisitions and for other reasons, multiple instances, and I think you were saying executives said something, “We don’t want to have to ask the same question four times,” or something like that?

 

Dylan Keyer: Yeah, that’s exactly right.

To make it more concrete, yes, we do have multiple GitHub organizations. I actually don’t know what the number is.

But the point is N number of sources doesn’t mean I want to look at N number of dashboards, or N number of charts, or N number of tables.

This is getting a little bit into the weeds with how we’ve designed our dbt stuff. We actually follow dbt’s documented standard guidance. There’s nothing mythical about it. We load everything in a raw format. We transform into staging models. We have an intermediate layer and then a marts layer.

One thing that… There’s a principle that we’re enforcing, which I think will help hammer this home, about where the buck stops, which is our marts layer, if you like: the thing that’s getting wired into the dashboards that stakeholders are going to see.

I don’t want to have N number of mart tables just because I have N number of the same data source hosted in different places.

What the implementation detail needs to look like goes back to what I mentioned before: the domain experts. What needs to be true is that it is a single table at the end of the day. GitHub repositories.

Some metadata in there will tell me where it came from. I need to know that too, but I don’t want to look in different places. That’s how we’re enforcing that.
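A hypothetical dbt mart following that principle might simply union the per-organization staging models and record where each row came from; the model names here are illustrative, and the staging models are assumed to share the same columns.

```sql
-- models/marts/github_repositories.sql
-- Hypothetical mart enforcing the "one table, no matter how many sources"
-- principle: every GitHub organization's staging model is unioned into a
-- single mart, with a column recording where each row came from. Model
-- names are illustrative, and the staging models are assumed to share the
-- same columns.
select 'github_cloud_org_a' as source_org, * from {{ ref('stg_github_org_a__repositories') }}
union all
select 'github_cloud_org_b' as source_org, * from {{ ref('stg_github_org_b__repositories') }}
union all
select 'github_enterprise' as source_org, * from {{ ref('stg_github_enterprise__repositories') }}
```

In practice, a macro such as dbt_utils.union_relations can generate this kind of union and tolerate minor column drift between sources.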

Abi Noda: Well, if you had GitLab two, you might have to just call it repositories.

Dylan Keyer: Yeah. True. True, true, true.

But the point would be, yes, there’d be one thing. We wouldn’t have a GitLab repositories and a GitHub repositories.

Same thing with deployments. I don’t care how many different ways we deploy our software. Common definition, it lands in a singular table.

Abi Noda: Well, we’ve talked about some of the challenges associated with joining data from across different systems and combining data from sibling systems of the same family.

Another challenge you mentioned to me, which I’d love to dig into, is the difficulty of actually agreeing on not just what to measure, but universal definitions for things. Can you share more about this and maybe take us through a specific example?

Dylan Keyer: Yeah, I can.

In a world we lived in before, and we’re still working our way out of, where you have different teams pulling their own data from different places and defining their own metrics, people have different opinions.

Let’s take an example.

I mentioned before we care about how long it takes us to deploy. Well, how do we measure that?

A lot of your listeners might be familiar with DORA metrics. You have lead time to change. DORA gives you some pretty good guidance around how to measure that, but there’s still implementation detail. And guess what? People are going to differ.

Do I measure this from when the merge to main happens? Do I measure this from the commit timestamp going onto the feature branch? Some of this is objective, some of it is not.

And so what you basically need to do is get leadership buy-in. Come up with proposals, pros and cons of each, like, “Hey, what we want to do is find bottlenecks in our deployment process. With that in mind, here’s the best definition we can think of: if we implement the metric this way, when it’s red, it’s going to help us identify those bottlenecks; when it’s green, it means that we are shipping fast and there aren’t bottlenecks. But there are other considerations. Can you bless this metric?” We get leadership buy-in.

We do that on a metric by metric basis. And of course, you’re going to have to pull in subject matter experts from time to time that know these systems and know the fields and the data models, but that’s the way we’re handling it.
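One hypothetical implementation of such a blessed definition, measuring from the merge to main until the corresponding production deployment and reporting a median per service; deployments and pull_requests are assumed mart names, not the real schema.

```sql
-- One hypothetical implementation of a blessed lead-time definition:
-- measure from the merge to main until the deployment carrying that merge
-- reaches production, and report the median per service over 90 days.
-- deployments and pull_requests are assumed mart names.
select
    d.service_name,
    approx_percentile(
        date_diff('minute', pr.merged_at, d.deployed_at), 0.5
    ) as median_lead_time_minutes
from deployments d
join pull_requests pr
    on pr.merge_commit_sha = d.commit_sha
where d.environment = 'production'
  and d.deployed_at >= current_timestamp - interval '90' day
group by d.service_name;
```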

Abi Noda: Do you feel like you generally are successful at doing this, or do you feel like you arrive in a place where not everyone is happy or there are still some edge cases and exceptions? What’s been your experience?

Dylan Keyer: Again, it depends.

I think we do a pretty good job of landing at a North Star.

We have a strong culture here at Twilio of agreeing but moving on. Sorry. Disagreeing but moving on.

I think the challenge though is that we agree on a North Star. The North Star is really hard to implement and we want to do something decent in the meantime, so we might settle on a next best definition, and that’s where we start to see, well, there’s so many edge cases where that doesn’t help us. We spend more time spinning our wheels around. Is that actually a problem or is it just one of those edge cases?

I think that’s the real challenge. It’s not arriving at the North Star. It’s the inability to implement the North Star and what you settle for in the meantime.

Abi Noda: That makes sense.

Well, Dylan, this has been a really fun conversation. Thanks for taking us through your journey of building this engineering metrics data lake. I’m excited to follow the journey. It seems like you’re still early on the path but again, thanks for sharing these insights and spending time here today. Really appreciate it.

Dylan Keyer: Yeah. It’s been an absolute pleasure. Thank you so much for your time.