Implementing a developer portal at American Airlines

Abi: Karl, thanks so much for sitting down with me. Really excited to have you on the show today.

Karl: Happy to be able to share our story with the community. Thanks for having me.

Abi: So developer portals and of course Spotify Backstage are really hot topics right now. I saw Gartner just put out a report that talks about Backstage. Lots of companies and developer experience leaders out there are thinking about whether they need a developer portal, and if so, how to go about doing it. So I’m really excited to get into your journey and your story of how you’ve stood up a developer portal at American Airlines. I want to start by going all the way to the beginning. Maybe it wasn’t even called a developer portal initially in the way you saw it, but how was the idea of standing up a developer portal actually conceived at American Airlines?

Karl: Well, I had a unique journey to get here. I started off doing statistics in IT. I came into the IT field doing network engineering. And one of the things I noticed while I was in network engineering was that there had to be a better way for self-service and automation.

One of the problems we had found was our network engineers typically work in the middle of the night. So we’re not disturbing flights, and customers, and that sort of thing. We still have high availability and redundancy, but we want to minimize the impact as much as possible.

I was implementing network devices, and later I also happened to be on the network tools team. One of the things I found was that there was a big gap between the time a network engineer installed a device and the time the network monitoring team was available to validate and verify that the device was properly monitored. So the problem was getting network engineers real-time feedback about the work they had just performed.

So I spent time building out a smaller homegrown portal or platform in our networking space. While I wasn’t an application developer, I spent a lot of time scripting various items, improving network device collection scripts in our enterprise platforms. And we also had a new directive to spend 30% of our time towards automation.

So with that 30% of time, I ran with it. And I was able to communicate with various different devices. When a network engineer in the middle of the night entered their device information, maybe an IP address, a host name, something like that, we would be able to live-validate that our network tools could reach those devices, that they could log in, collect information, that sort of thing. And we continued to grow that platform.
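As an illustration of the live validation Karl describes, here is a minimal sketch in Python. It is not American Airlines’ code: the names check_device and CheckResult, and the default of TCP port 22, are assumptions made for this example. It only tests whether a TCP connection to the device can be opened, which is one small piece of the reachability, login, and collection checks mentioned above.

```python
import socket
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one reachability check (hypothetical type for this sketch)."""
    host: str
    port: int
    reachable: bool
    detail: str


def check_device(host: str, port: int = 22, timeout: float = 3.0) -> CheckResult:
    """Try to open a TCP connection to a device's management port.

    Port 22 (SSH) is an assumed default; a real tool would also attempt
    a login and a data collection step, which this sketch does not do.
    """
    try:
        # create_connection resolves the host name and connects, raising
        # OSError (including timeouts and DNS failures) if it cannot.
        with socket.create_connection((host, port), timeout=timeout):
            return CheckResult(host, port, True, "TCP connection succeeded")
    except OSError as exc:
        return CheckResult(host, port, False, f"connection failed: {exc}")
```

A portal form could call check_device with the IP address or host name the engineer just entered and show the result immediately, instead of waiting for the monitoring team to confirm the next day.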

In early 2020, the developer experience team had started at an enterprise level, and that’s the developer experience platform I’m currently involved with today, which has grown a ton since the group was started in early 2020 with code. And the original homegrown network portal that I started is now integrating into this larger developer experience platform at an enterprise level. So I really got excited about developer experience, automation, and self-service when I noticed as a network engineer that there were ways we could do things better. We could ensure that devices were properly monitored from the start, and that a device wouldn’t randomly go down in an airport in the middle of the night after implementation without us knowing.

Abi: That’s really interesting to hear about your personal background and that your interest in self-service developer portals actually began not even in application development, but rather in network engineering. So I’m curious to know as you made that shift into application engineering, how was the idea or the initiative around building self-service capabilities conceived within that group? Was there a particular set of problems that were clearly surfacing across the organization, or was this a directive from leadership down to just develop automation, as you’d alluded to? What really caused this initiative to begin as far as building a developer portal?

Karl: At the enterprise level, we started at a unique time, just as COVID was starting to close businesses and keep everyone at home. Two individuals from our coaching organization found that, because their engagements had been in person, it was difficult to continue working effectively over the internet. The coaching organization just didn’t have the right tools in place at the time. And while the coaching organization worked on establishing those tools, these two individuals started the developer experience platform and started Runway.

Our newly hired managing director saw a lot of value in consolidation and automation to reduce the work we performed manually, which would set us up for bigger and better tasks. He really wanted to focus on self-service, everything is code, and happy engineers. He believed everything was possible, which set the vision for automation. And he also brought in a new director to help steer the initial team structure, working agreements, the pace that we were working at, and the feature sets, and he brought energy to the team too.

So it really had to do a lot with the new managing director coming in, seeing how a lot of teams had been swivel chairing tickets for a long time from platform to platform and seeing an opportunity for automation through the reduction of tickets by providing the self-service capabilities.

In some cases, it would take teams months to get to the internet. And our coaching organization also saw a repetitive pattern of, “Hey, how can we set up our Java application or our Spring Boot application?” So being able to create a templatized system to get teams to the cloud sooner, in addition to these self-service capabilities, was really our goal from the beginning.

Abi: Gotcha. And just to make it really vivid for listeners, when you describe the old process, which was to stand up new services via a series of tickets, what were these tickets for? Was it to request support, hands-on support to actually get things going? Or did you have to go through a series of red tape approvals to get things going? What was that process like before?

Karl: So we have a lot of snowflakes in our organization. We like to do things in very different ways. And we haven’t communicated a whole lot on the best standards and patterns to utilize, which is where our developer experience portal is helping through the standardization of templates.

As for the red tape you mentioned, the approval process: because we did things in so many different ways, there was more scrutiny over what was implemented.

So there were those problems. But also, for a team to stand up a new virtual server, that was a ticket. To get an account created or synced from LDAP to the server with proper permissions, that was another ticket. Anytime you needed web app defense, that was a ticket. So there were a lot of tickets in the process, and a lot of those tickets had to be fulfilled in a certain order, which could take months to eventually complete and get your application online.

So we really focused on the easiest way to get to the cloud with the soonest outcome, with security baked in, and with an established set of patterns that teams no longer have to ask approval for. And that allowed ideas and experimentation to flourish a lot more easily, because now teams can just go to our developer experience portal, which we call Runway, and request the services that they need.

It’s also reduced the number of meetings, approvals, that sort of thing. In about 10 minutes, our users can have a fully working application that’s been created from a template. Whereas before, it would’ve taken them months. And we do it in a secure way using our standards. And we work with other teams such as our Kubernetes team and our DevOps team for our pipelines.

So we’re really trying to make sure that we’re not working in a silo, but we’re involving a lot of teams all over the area to contribute, provide feedback, and establish these standards as well. One of the ways we do that is through our demos and discovery sessions where teams can make sure that they have a say on what we’re implementing to build the right features for American developers.

Abi: I really appreciate the depiction of what things looked like before Runway. Because I think what you described is an experience that a lot of leaders at a lot of different companies can probably relate to. So many companies even today still have ad hoc processes for standing up and deploying new services.

Today, of course, you use Backstage for your developer portal. I’m curious, what was the initial team that got together to start on this project and what was the discovery process like? Did you immediately know you were going to build your portal on Backstage, or did you consider other approaches or off-the-shelf solutions?

Karl: So it was an interesting time. The roadmaps of establishing our developer experience group and Backstage came around the same time. We had a desire from our coaching organization to make the templating system easier.

There was a team member who had come to our leadership with an article where Spotify Backstage was described as a game changer in the developer experience space.

So we latched onto Backstage very early on. We started with Backstage alpha four. I believe they started alpha four in March of 2020. And our first code committed to our internal repository was on May 1st, 2020. So pretty quick and close timeline there.

We had started on Backstage. We did consider alternatives at one point as well, because when you’re starting with an alpha four product, it might not be too stable. There were many, many major breaking changes which caused headaches for our team. It was also very hard to upgrade initially. But Spotify was still Spotify Labs. They were still getting their feet wet with Backstage and trying to see what the right things were, trying to drive community involvement to find those right things as well.

So we started early. We had many discussions about potentially changing course. But we believed in what Spotify had as a vision for their community and for their product. And we had also tried other homegrown tools in the past that didn’t quite scale right. And we had tried vendor products as well that did not allow a collaborative experience between teams. And one thing we saw with Backstage that has really come true is that anyone from our IT organization, or really anywhere within American Airlines, can contribute back to our Backstage implementation, Runway.

So we’ve had a lot of success with the InnerSource plugin model as well, where teams all over IT can contribute back, create their own plugins, their own functionality. With some of the previous vendor-provided platforms, that tended to be a bottleneck where all the development for automation and self-service revolved around a single team. And we knew that’s not something we wanted.

So everything that Backstage promised along with the vision and with the community, it all came true, and we’re very happy at this point that we stuck with it. We needed to ask those questions to make sure as a team everyone was happy with the direction, and what we were solidifying as the future for American Airlines developer experience. And I think everyone on the team was really happy with the outcome of that.

Abi: Looking back now, because you’re someone who’s very plugged into the developer portal space, and I’m sure has a good eye on how the landscape has been changing. Do you still think Backstage is the right choice for most organizations? In what cases do you think perhaps a SaaS vendor that isn’t open source and doesn’t provide that extensibility might be an appropriate fit? And another way to ask that question, I’m curious how much of the open source nature of Backstage has been critical to achieving the goals of your roadmap.

Karl: For other companies, I think it comes down to size and complexity. We have a mix of modern and outdated technology. We have IT service management tools that might not be the most popular in the industry as well. So I think American Airlines has tried a couple of different solutions in the past that have provided some success but haven’t panned out a hundred percent the way we wanted them to.

Sometimes, the vendor solutions require more input fields. One thing we’ve based our Backstage implementation around is being able to use as many relational data sources in the background to reduce user input. There’s a lot of backend systems that provide information, provide logic to our Backstage implementation. So I think it really depends on the technology you’re planning to abstract away with Backstage, and how many different tools are in the mix as well. I think that probably covers it.

Abi: Appreciate that response, and I’m excited to get into some of the details on how you guys have extended Backstage and built integration into your other tools. Coming back to the initial build out and launch of this developer portal, I recently read an article about Backstage and how there’s all kinds of things you can do with it. But most companies start by biting off one or two common areas of the developer lifecycle that they want to address with a developer portal.

And at American Airlines, I can almost guess based on what you’ve already shared so far. But what was the V1 scope for the developer portal? Was it primarily just around the app templates, or was there more to it?

Karl: It had a lot to do with the repetitive pattern of templating out applications for teams. That was a large portion of the work that our coaching organization did. And they would typically template out applications the same way every single time. A majority of our applications are Java based, and we wanted to standardize how teams approach that. And then we also have many other languages that followed suit as well.

So there was the templating system for best practices. But additionally, we wanted to get application teams to the cloud sooner. We were after getting there sooner, self-service, consistency, and governance. So we put a lot of effort into making sure the right checks are in place and the right security is in place, so that teams don’t have to worry about that anymore. We wanted to reduce the amount of time teams spent on application and delivery, and be able to empower them to work more on their feature delivery and getting those features in the hands of our customers, even sooner than they would without Runway. So those were the two main goals.

Our CIO had also mentioned at one time that security keeps her up at night at times. And it’s understandable with all the breaches that are happening around the world and the industry. And having teams fly down the runway, our Backstage implementation, that could be concerning if we don’t have the right security and gates in place as well.

So by having a standardized system that’s deploying applications the right way each time, and having application toggles to easily implement API management security, or be able to easily implement our corporate authentication. Those are certain things that were manual tickets in the past, and teams might have to manually create their own micro gateways to protect their applications. They might have to get their own secrets to power those different security gateways as well. And we’ve accelerated the process by eliminating those and automating a lot of the things that teams used to have to care about. And now we can just provide them as check marks saying, “Hey, we’ve covered it for you. Continue on with your app development.”

Abi: The capabilities and value proposition of the portal that you’re describing are I think very powerful and clear. At the same time, I think a common challenge that leaders run into is actually, how do you roll this thing out and get people to adopt it? Would love to know how you did that at American Airlines, and also what advice you would have for leaders who are going through that challenge right now.

Karl: I think the approach that we took, a minimum viable product with weekly demos and small bits released at a time for all of our interested internal members (not only IT, but mostly IT), was our big success. Those weekly demos and small releases allowed us to quickly iterate and quickly incorporate feedback from our community.

We started out creating the application templates that would post just to our code repository, in this case GitHub. And eventually we created automation to send the produced application code to a cloud partner for hosting.

Then we automated different things out of our current service catalog that were manually swivel chair items. Things that people requested. So we heard it was really hard to get access into our cloud partners. So we targeted that. We made it easier. We went through a couple iterations based on feedback, and we’ve made that process really smooth and quick for users to use.

So making those small wins at the beginning with a minimum viable product, and starting to show the potential of the platform early on with leaders, with others at the company, those with strong voices, I think helped quite a bit. The momentum started to build, and our IT staff started getting excited for what would come every week during the demos. We had some really strong voices of support in the beginning, and we also had some really strong voices opposing what we aimed to do.

We worked with both of those groups there. Some teams saw various portals getting recreated at times, like we were recreating features in Azure. But really, it was more about abstraction, providing safe guardrails, being able to lock down our cloud partners, and be able to do things smart through a developer experience portal. But if there was more information needed, still be able to get the customer to the right area through links through our platform.

And then some teams also saw potential loss of work. Not everyone viewed what we were doing positively, because there was some duplication around what we were aiming to do, and what was provided in different areas of the company.

So we worked really hard to keep proving value every single week. We held various discovery sessions with our customers as well, so that customers could voice what they were looking to automate, and maybe how we were able to automate and provide self-service capabilities. And we also had office hours for developers who were interested in building into the platform. For example, the team behind our enterprise managed file transfer system had an interest in building a plugin into the platform. So during office hours, teams can ask questions, get help and support. And we also held hand-holding meetings with any team that might not have developed before. So we created developers from our IT staff who might have been in a supporting role, but who knew that self-service could help their group directly.

So the rollout was really trying to target the small wins and the tickets that we could reduce from swivel chairing, so that we could empower those who used to fulfill those swivel chairing tickets maybe to work on automation or things that were bigger and better for them as well.

Abi: It’s interesting to hear that there was opposition from multiple angles. And now that you’ve described some of those cases, I think it makes a lot of sense. Sounds like your group did a really good job reaching out to those folks who maybe had concerns or opposition, and partnering with them to overcome the opposition.

I would love to know, how have you seen adoption grow over time? And also, how do you today define success when it comes to adoption and impact of this project overall?

And another way to look at this is if you’re a leader listening to this who’s thinking about an initiative to build a developer portal, what is it that they should be aiming for? Does success mean that every single developer is using the developer portal for everything, or some subset of the organization is using this for certain things? How have you seen that evolve in American Airlines, and how do you define it yourself?

Karl: Originally, when we started down the path of developer experience, there was a lot of manual work going on. So being able to measure return on investment through the reduction of tickets by providing self-service capabilities to our customers is one thing that we measured very, very closely. We had a goal of eliminating as many tickets as possible that were manually fulfilled, and we would track how many of those tickets came in and how many we could reduce through automation. So when it came to getting buy-in from other leaders to provide those self-service capabilities, that was one of the things no one could really deny.

The hard part was figuring out the right automation, who would build it, who would have the time to build it, things like that. So I think the reduction of manual work is definitely a huge selling point.

There’s also a selling point I think for security and standardization. Spotify has now created paid-for plugins that they have recently announced. And one of those paid-for plugins is a scorecard feature. You can incorporate various standards into the scorecard, and various security checks as well.

So one of the big goals from a security standpoint that we have is, do we have all vulnerabilities remediated in various platforms? Are we using the recommended and standard version of Java that we’ve established across the company? And Spotify’s plugin, I believe it’s called Soundboard [Soundcheck]. Its scorecards will allow you to hook up various platforms in the background to feed information into the Backstage platform that can help with security standardization. Bubbling those issues to the top as well, that can help identify security issues that might be there across the company.

So there’s also a value from a security standpoint, abstracting the various platforms you might have across the company into one scorecard that teams can easily view. And then they can also have links to jump into different platforms if they need to get more information as well.
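To make the scorecard idea concrete, here is a small sketch in Python of how checks like the two Karl names (standard Java version, remediated vulnerabilities) could be rolled up into one view. This is not Spotify’s plugin or its API. The Check type, the scorecard function, the field names java_version and critical_vulns, and the set of approved Java versions are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Check:
    """One named pass/fail rule evaluated against a service's metadata."""
    name: str
    passes: Callable[[dict], bool]


# Hypothetical company standard; a real list would come from policy.
APPROVED_JAVA_VERSIONS = {"17", "21"}

CHECKS: List[Check] = [
    Check("approved Java version",
          lambda s: s.get("java_version") in APPROVED_JAVA_VERSIONS),
    # Missing data counts as a failure rather than a silent pass.
    Check("no open critical vulnerabilities",
          lambda s: s.get("critical_vulns") == 0),
]


def scorecard(service: dict, checks: List[Check] = CHECKS) -> Dict[str, object]:
    """Evaluate every check and summarize the result for one service."""
    results = {check.name: check.passes(service) for check in checks}
    passed = sum(1 for ok in results.values() if ok)
    return {
        "service": service["name"],
        "results": results,
        "score": f"{passed}/{len(checks)}",
    }
```

In this sketch the service metadata is a plain dictionary. In practice that data would be fed in from the backend platforms (vulnerability scanners, build tooling) that Karl describes hooking up in the background.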

We’ve also reduced the amount of time it takes teams to develop and get items out the door to our customers that are flying with us. Through the standardization of templates, teams don’t have to reinvent the wheel. So if teams at a specific organization are constantly having to re-implement things many, many times, potentially with different variations, being able to standardize that, and to know what’s in your environment and how things are created and set up, can help reduce the amount of time teams spend on very repetitive tasks.

So through the reduction of tickets, through improved security and standards alerting and reporting, and through the time saved with the standardization of templates, I think that’s very, very powerful.

In one case that we heard from an application team, we saved them over 30% of their time by not having to create and manage their infrastructure anymore. That was time that they further invested into feature delivery, which will hopefully differentiate us from the competition as well. Provide those competitive advantages in the features that we offer, that maybe our competitors can’t because it does take them that extra 30% of time.

Abi: One follow-on question I have to this is specifically around adoption. With where you’re at currently, is it a foregone conclusion that teams use Runway to bootstrap new applications and operate them? Or are there still teams doing it on their own through the ticketing process or otherwise? What to you is the end state that is appropriate for this type of endeavor?

Karl: Eventually, I think we would really like to have every developer going through the Runway platform in order to deploy their applications. I had brought up the CIO’s security concerns previously. There were goals to mandate the Runway platform to launch any application.

Our team was actually the one that pushed back and said, “We’re not quite ready. We’re not quite mature yet.” But being able to find those small wins. Again, being able to lure customers over with the capabilities and automation that they no longer have to fulfill, and slowly working vertical by vertical has been our approach.

So every quarter, we have a key performance indicator where we have to have so many customers that come over to our Runway platform, to our shared cluster system. So looking to increase the number of apps month over month, hopefully exponentially, has been our goal.

There’s a lot of applications that we deploy that might not fit standards or norms for one reason or another. So I think getting to a hundred percent of every application deployed through the Runway platform will be difficult. But that would be nice.

I think through more standardization and better templates, that one day we might be able to hit the 90% mark, maybe the 95% mark. We’ve made it easy for customers that might not fit one of our standard templates to still be able to easily deploy their containerized image and application out to our deployment environment. So we realize that not every application team’s going to fit the mold, and we really are after the 80%.

Abi: Makes sense. And I just want to say, thank you so much for sharing your journey with building Runway. And I wanted to say, by the way, that Runway is excellent branding for a developer portal at American Airlines, and makes me curious to learn what all the other companies out there are calling their developer portals internally.

But before we move on to talking about some of the custom extensions that you’ve built for Runway, I want to conclude this part of the discussion by asking you really in layman’s terms… Who should or shouldn’t invest in building a developer portal? And maybe to follow that on, what’s the right point in time for an organization to think about investing in something like this?

Karl: I think a developer experience portal, something like the Backstage framework that’s open source, it provides a lot of freedom for developers to do whatever they want. A very nice abstraction layer.

If you’re starting off from scratch, maybe Backstage isn’t the right move for you. If you have a small developer population at your organization, it might be harder to justify the investment in a developer portal. If you have few applications deployed, or new applications aren’t created very frequently, then maybe there’s not as big of a need.

For us, it was really the variety of technology we have and the variety of backend systems we have as well, and how we wanted to abstract those because of how many there are.

It really made sense for us, where people don’t wear multiple hats. They’re very specialized in what they do. And being able to self-service needs across the company isn’t as easy as jumping into a backend platform, clicking create, and you’ve got your new resource.

So there’s lots of guardrails and security. Various types of access control and permissions that we implement across various platforms. So I think that’s really where the value comes from. When you have highly specialized individuals across the company that typically need to submit many support tickets to accomplish something, there’s a lot of backend systems making things more complex that have to work together and be orchestrated together. I think this is really where you’re going to see that value.

Abi: I want to move on now to diving into some of the customization and extension work that you’ve done. And I know you’ve done a lot. And so today, I just want to hopefully discuss a couple of the most interesting stories with you. For starters, I know one extension you’d brought up with me was the Kubernetes deployment and interactive catalog features. So would love for you to first introduce what that is. And maybe in layman’s terms, describe the problem it aims to solve.

Karl: Right. So Kubernetes is a very difficult platform. It can be very challenging for teams across our organization to learn. And we really didn’t want every team to have to learn Kubernetes and the complexities of it. It can take years for someone to get that right skillset that they need, and we really wanted the application teams to just focus on their app development and feature delivery, not the infrastructure and delivery components.

So with the Kubernetes deployment features that we have, we originally had seen a plugin from Roadie. It’s a SaaS provider of Backstage. They’ve been a great partner through our contributions to the open source space. So within our journey, we knew we had to deploy applications. But we also had to show information back about the deployment status.

The Backstage platform at one time had released the Kubernetes module where you can easily view your deployed resources. But there was also the Roadie plugin, which was an Argo CD plugin. Argo CD is a GitOps tool where you define what an application should look like, the end result, instead of the instructions on how to get there. We went with that Argo CD plugin. We saw that it could display information about our applications in the Argo CD platform, but it couldn’t do a whole lot more than that. We found different ways of deploying applications to our clusters that we experimented with. But we really liked the Argo CD platform, the GitOps platform, and the features it provided.

So in the initial days, we had what we needed from the plugin, but we sought out more. So we worked with Roadie to develop a way to create applications inside Argo CD directly from Backstage, which is now open source. We also found a way to develop a backend that could search multiple Argo CD instances. At first, the initial design had some scaling problems in our organization. So we worked with Roadie to contribute back both the creation and the multi-instance capabilities for that.
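The multi-instance search Karl mentions can be pictured with a short Python sketch. This is not the code of the open source plugin. The find_app function, the lookup callable, and the example instance URLs are hypothetical, and the lookup is injected so the sketch makes no claim about how any particular Argo CD instance is actually queried.

```python
from typing import Callable, Iterable, List, Optional

# lookup(instance_url, app_name) returns the application's details,
# or None if that instance does not know the application.
# How the lookup talks to an instance is left out of this sketch.
Lookup = Callable[[str, str], Optional[dict]]


def find_app(app_name: str, instances: Iterable[str], lookup: Lookup) -> List[dict]:
    """Ask every configured instance for the app and collect the hits."""
    matches = []
    for url in instances:
        app = lookup(url, app_name)
        if app is not None:
            matches.append({"instance": url, "app": app})
    return matches


# In-memory stand-in for two instances, used only to exercise the sketch.
FAKE_INSTANCES = {
    "https://argo-east.example": {"orders": {"sync": "Synced"}},
    "https://argo-west.example": {},
}


def fake_lookup(url: str, app_name: str) -> Optional[dict]:
    return FAKE_INSTANCES[url].get(app_name)


hits = find_app("orders", FAKE_INSTANCES.keys(), fake_lookup)
```

Querying instances one after another, as here, is the simplest design. With many instances a backend would likely want to run the lookups concurrently or cache which instance owns which application, which may be the kind of scaling concern Karl alludes to.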

So with Runway, the catalog gives teams a single place to find information about a running Kubernetes app that has been deployed with the plugins I just mentioned, the ones we collaborated on with Roadie.

The catalog provides one place for recent GitOps deployment information for Argo, application container information and statistics, application URLs, the application code repository, documentation. If the team’s offering an API, you can also include the API schema as well.

So it’s a one stop shop to deploy your application, to get information. But if you need more detailed information that we haven’t provided through an abstracted view, our team members can also jump into the backend Kubernetes platform, the backend logging platform, or whatever they need.

So our idea and philosophy behind this has been teams shouldn’t have to jump into multiple platforms. They should be able to get a high level overview of their application. Be able to create, be able to deploy to another region potentially. To be able to delete their application, whatever they need to do in a single platform. And when they do need to dig down, we’ll provide the links to do so, high level documentation so the team can be successful. So around Kubernetes deployment and being able to get information, that’s the solution we’ve provided to teams.

Abi: Thanks for sharing that. I would love to know, in your view, what are the most difficult parts or what’s involved with developing a new plugin or extension on the Backstage platform? What should other companies thinking about that know before they wade in?

And I mentioned to you, I’m personally looking into this myself. So I’m really curious to know what’s all involved. I would imagine there’s a UI component side to building out an extension as well as the integration into bespoke, backend services and integrations as well. So would love to get an understanding from your perspective and what people should know about before they get started.

Karl: I think one of the big things that came up a little bit later, but I think it’s very important, is planning contributions back to the open source community. Maybe your example and use case might be a little different. But at American Airlines, we found making solutions generic enough for others to utilize and easily consume isn’t always the easiest thing. We’re typically developing something for internal use. The instructions and the development are internal. Maybe it’s a small team that owns it, and the documentation might not be up to par with where it should be.

Many companies consume open source, but the number contributing back isn’t as high. The Backstage community has been awesome, but I think being able to plan those contributions back is key.

And we’ve had a couple instances where what we’ve developed internally hasn’t been as friendly for open source consumption. But in the case of the two Argo plugins I had mentioned, we went back after we developed them internally, made them more generic, made them work better with the open source plugins that are out there. And ultimately, we were successful there. So thinking about ways to contribute back. Without companies contributing back, we wouldn’t have the Backstage framework from Spotify.

So I would definitely encourage that. As far as thinking about building your own plugin, there is a UI component. There’s the potential backend component. Whether that exists in Backstage or not, I think that’s something definitely to consider. If you’re building everything into Backstage and everything into the Backstage backend, at that point you’re kind of creating another monolithic application, right?

So we have found opportunities directly from the UI calling various backends, while still having proper authentication from each user. So that’s one approach that we’ve taken.

For the backend, one thing that’s nice is depending on your installation, if you’re using the Backstage catalog, you’re likely going to have a database of some sort. Well, the database can be used by each plugin. You can also have Redis caching, or I believe they support Memcached and a few others as well. But you get that benefit of building into the Backstage backend by not having to have your own database for a microservice or something like that.

So I think that there’s some trade-offs between building directly in the Backstage code base that a company has and being able to externalize that as well through different backends.

But one thing we also introduced internally was standalone UI plugins, for teams who might not want to follow our strict standards on code quality, and tests, and things like that. Maybe there’s better reasons. Maybe they have their own review process, hopefully.

But whatever the case is, teams can go to Runway. They can click on a template to create a new standalone plugin. And then, at installation, that’ll be brought in from our private registry.

So I think there’s many different ways for creating plugins between the frontend and backend on how it can work, and trying to figure out the right approach to incorporating those items, and finding the right balance between what do I build in the backend, and what can be its own maybe alternative service, or microservice, or something like that.

I also think that skilling up our internal community wasn’t necessarily easy. From a language perspective, there’s not much flexibility. Backstage is built on TypeScript, and then they also use the React framework. Internally, we used a lot of Java, and we had chosen Angular as a frontend UI framework. So that meant a lot of people had to relearn some skillsets in order to contribute. But the coaching engagements I previously mentioned, office hours and creating a safe place to ask questions, all those items were really beneficial to the success of skilling up the community.

And then also, I had mentioned being able to find the right approach to release. I had mentioned the minimal viable product approach, MVP. I think that was also a huge success in trying to figure out the plugin creation. So don’t try to boil the ocean. Start something really small, prove you can do it, and then build on from there.

Abi: That’s really interesting, the part where you described providing boilerplate to help teams create their own UI plugins, as you mentioned. I think that’s a really interesting approach.

This is such an interesting topic. I would love to ask you also about another extension that I know you’ve created, which has to do with automated ephemeral environments for development branches if I’m understanding correctly. Could you share more about what that solution aimed to solve, and also how you landed on your unique approach versus some of the off the shelf or vendor solutions that exist?

Karl: So before developing the code or thinking about code to accomplish this task, our platform Runway was doing ephemeral environments using GitHub Actions, before most teams realized they wanted it.

With Runway, the original idea came up because, between the developers inside the Runway platform team and the outside contributors around our company, our preview environment was constantly swapping between various developers’ code.

And that wasn’t ideal. If you wanted to be able to preview what you were working on and building on, maybe you had a five-minute window to watch it. And if you went up and grabbed a coffee, or refilled your water, or something like that, by the time you made it back to your desk, it might be onto the next developer’s pull request.

So that wasn’t an ideal thing to have happening, and we used GitHub Actions to implement that, but it wasn’t an ideal implementation. And we had problems replicating this to other teams who wanted it as well, due to branch protection on Git repos, functional accounts, and then credentials as well. We really didn’t want every team having to have functional accounts, and credentials, and punching holes in branch protection as well. So we knew how the process could work, but we didn’t know how to automate this behavior in a highly repeatable fashion. We did evaluate other solutions. And there’s solutions out there that ultimately can be quite pricey with the large user base we have of Runway.

Typically, we saw per user licensing. But with the abstraction layer of Runway, we really didn’t want those users in those tools. And we also didn’t see that the value for cost was there. So we reevaluated internally what we could do, and we found we had a couple key tools already available, like our Runway platform and our Kubernetes operator.

So using our operator, we reach out to GitHub now every so often on programmed repositories of code. Our developers register their application in our developer experience platform Runway, just like they do today. And we still get to use the GitOps platform that we chose, because of the new custom resource definition, which is a Kubernetes term. With it, we’re basically defining the fact that a customer wants to watch a repository for any new pull requests to update code that might be coming up.

So every couple minutes, our operator reaches out to GitHub on those programmed repositories. If a team member is updating code and posts a pull request, we’ll match that to an application image and deploy that application. We’ll also post messages back to the user’s GitHub pull requests as comments to notify them of the status and the URL of their newly deployed application. So each application gets a unique URL with the pull request number attached to it.
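The polling pass Karl describes can be sketched roughly as follows. This is an illustration only, not the Runway operator's code: the type names, the lookup of images by commit SHA, and the URL pattern are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real custom resource and GitHub payloads are not shown in the interview.
interface PullRequest {
  number: number;
  headSha: string; // commit the pull request currently points at
}

interface PreviewDeployment {
  prNumber: number;
  image: string;
  url: string;
  comment: string; // message to post back on the pull request
}

// Plan the preview for one pull request, or return undefined
// when no application image matches its head commit yet.
function planPreview(
  appName: string,
  baseDomain: string,
  pr: PullRequest,
  imagesBySha: Map<string, string>,
): PreviewDeployment | undefined {
  const image = imagesBySha.get(pr.headSha);
  if (image === undefined) return undefined;
  // Assumed URL scheme: the pull request number makes each URL unique.
  const url = `https://${appName}-pr-${pr.number}.${baseDomain}`;
  return {
    prNumber: pr.number,
    image,
    url,
    comment: `Preview for PR #${pr.number} deployed: ${url}`,
  };
}

// One polling pass over the open pull requests of a watched repository.
function reconcile(
  appName: string,
  baseDomain: string,
  openPrs: PullRequest[],
  imagesBySha: Map<string, string>,
): PreviewDeployment[] {
  return openPrs
    .map((pr) => planPreview(appName, baseDomain, pr, imagesBySha))
    .filter((d): d is PreviewDeployment => d !== undefined);
}
```

Keeping the planning step free of network calls means the same function can be run on every poll and tested without GitHub or a cluster.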

We also have custom retention policies in our image repository to ensure we’re not burning up resources. And teams can say, “Hey, I want a maximum of five pull requests at a time deployed based on the last update time.” So we’ve got different policies as well that teams can fine-tune to make sure we’re not having too many non-production environments available at the same time.
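A retention rule like "a maximum of five pull requests at a time, based on the last update time" could look something like the sketch below. The `PreviewEnv` shape and the `applyRetention` name are hypothetical, chosen only to illustrate the idea.

```typescript
// Hypothetical record of a deployed preview environment.
interface PreviewEnv {
  prNumber: number;
  updatedAt: string; // ISO 8601 timestamp of the pull request's last update
}

// Keep the most recently updated environments up to maxActive; mark the rest for removal.
function applyRetention(
  envs: PreviewEnv[],
  maxActive: number,
): { keep: PreviewEnv[]; remove: PreviewEnv[] } {
  // Copy before sorting so the caller's array is not mutated.
  const newestFirst = [...envs].sort(
    (a, b) => Date.parse(b.updatedAt) - Date.parse(a.updatedAt),
  );
  return {
    keep: newestFirst.slice(0, maxActive),
    remove: newestFirst.slice(maxActive),
  };
}
```

A team wanting the policy from the example would call `applyRetention(envs, 5)` on each pass and tear down whatever lands in `remove`.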

We extended this as well. The ephemeral environment is really cool for non-production. But for production, also through our Runway operator, as we call our Kubernetes operator, we implemented a way to reference a GitHub release number to release a new application.

While these two items might defeat the declarative part of GitOps, they work really well for some teams that are highly automated, have their releases posted to GitHub automatically, and never have to go and update a container tag. If an application is deployed to 100 different production clusters across the globe, those will all be updated by our GitOps platform based on the GitHub release number as well.
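As a rough illustration of release-driven updates, the sketch below works out which clusters are behind the latest GitHub release. It assumes, purely for the example, that the container tag is the same string as the release tag; the interview does not say how the two are actually mapped, and the names are hypothetical.

```typescript
// Hypothetical view of what each production cluster is currently running.
interface ClusterState {
  cluster: string;
  imageTag: string;
}

// Assumption: the container tag equals the GitHub release tag.
// Returns the clusters whose running tag differs from the latest release.
function clustersNeedingUpdate(
  latestRelease: string,
  clusters: ClusterState[],
): string[] {
  return clusters
    .filter((c) => c.imageTag !== latestRelease)
    .map((c) => c.cluster);
}
```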

So not only the ephemeral environments, which is really useful in non-prod, but the production environment as well with automated updates based on GitHub release numbers, were possible because of the operator we’ve implemented. Ultimately, the goal with the operator was to simplify the Kubernetes native manifest. But we’ve found lots of opportunities on top of that to provide automation for the ephemeral environments, automatic image updating. Being able to notify security partners, cloud providers when new ingress has come up. So that operator has been key to making it easier to come to this solution.

Abi: I really enjoyed learning about some of the extensions you’ve built and how you’ve approached them. I think that a lot of leaders out there that are just getting started with developer portals will find inspiration in them as well. For the last part of our conversation, I want to ask you about something that you’ve shared with me, which is that your developer portal Runway is developed using what is called an InnerSource model for development. Having previously worked at GitHub, I’m familiar with what InnerSource is. But could you share for listeners what InnerSource is and how you’re leveraging that for your developer portal?

Karl: Sure. InnerSource is a model of open source principles and practices that are applied to software development inside a company. So it can allow others to contribute and consume products internally.

Originally for InnerSource, it was adopted before I was working in the software development area. I was probably still in networking or maybe statistics at the time, if I go back too far. And at the time, Runway also wasn’t around.

So it started out with applications to contribute to, basic building blocks that others could benefit from, in its own marketplace. So this marketplace, this website, was born to find projects to contribute to and consume. At the time, and we still base it off this today, we used GitHub tags, and then we scanned our GitHub organization to put those items into the marketplace.

And as the Runway developer platform grew, the InnerSource marketplace joined forces with it. It is now a plugin. So that was one great example of an InnerSource plugin. And the most successful InnerSource application at American is Runway. We have many plugins contributed. Our team is 16 developers at our standard level. And we have 108 developers who have contributed code around the company.
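The tag-based scan can be pictured with a small sketch like this one. The `Repo` shape and the `innersource` marker tag are assumptions for illustration; the interview does not name the tag that is actually used or how the organization is scanned.

```typescript
// Hypothetical summary of a repository found while scanning the organization.
interface Repo {
  name: string;
  description: string;
  tags: string[];
}

interface MarketplaceEntry {
  name: string;
  description: string;
}

// Select repositories carrying the marker tag and turn them into marketplace entries.
function marketplaceEntries(repos: Repo[], markerTag: string): MarketplaceEntry[] {
  const wanted = markerTag.toLowerCase();
  return repos
    .filter((r) => r.tags.some((t) => t.toLowerCase() === wanted))
    .map((r) => ({ name: r.name, description: r.description }))
    .sort((a, b) => a.name.localeCompare(b.name));
}
```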

And that doesn’t include those who have requested features, submitted bug reports, participated in our community events. It also doesn’t count some of the standalone plugins that I previously mentioned that are incorporated into the platform either. They have their own repositories that are incorporated at build time from the published modules internally.

We have a lot of plugins that are part of the heartbeat of Runway, because of the InnerSource model. I don’t think Runway would’ve been successful without InnerSource. Previously, I had mentioned that we tried various vendor products. But holding the development of self-service and automation to a single team created a bottleneck, and we really didn’t want to be that bottleneck. So that’s where we worked so hard with the community to make them successful.

And we’ve had some amazing contributions, like our API management plugin to create and update security manifests for our API endpoints, non-sensitive data lake access for authorized team members, and plugins to interface with our VMware platforms for virtual machine functions. Various cloud provider plugins as well that we’ve created. Our asset management team has recently built plugins, and then the homegrown network platform I mentioned was also incorporated as plugins to Runway as well. So we’ve had a lot of success with our internal teams contributing through InnerSource, in open source ways, to make this platform successful.

Abi: Sounds like InnerSource has been an amazing success in American Airlines. I think that’s something other companies should be thinking about, especially in a time when, as we all know, platform teams and really all teams are constrained in terms of the current macroeconomic environment. Karl, this has been an amazing conversation. Have really enjoyed hearing the insights about your journey building a developer portal at American Airlines. Thanks again for coming on the show today.

Karl: Thanks for having me, Abi. I’ve really enjoyed talking with you.
