Most platform engineering stories start with success. A small team builds something useful, leadership notices, and suddenly that team is everyone's platform team. That's usually the moment things start to fall apart. Let me tell you about Marshmallow.
Marshmallow is a really smart dog. She built CI/CD pipelines and Terraform scripts for her own team, shipped fast and stably, and her company noticed. So they pulled a few more smart dogs together, formed a central platform team, and asked her to do the same thing but for everyone. The company expected treats out of this: faster delivery across the org, fewer outages, savings on infrastructure, happier engineers. Marshmallow was the dog supposed to bring them home.
Marshmallow overwhelmed: too many requests, Dependabot PRs everywhere, "where are the treats?!"
A year later, Marshmallow is under siege. Every team wants something slightly different, her Jira is on fire, and Dependabot keeps opening PRs faster than she can review them. Things break whenever she merges those PRs in batches. And when management asks where all those promised treats are, she can't really answer. Honestly, she ate them. Hard to blame her. If this story sounds familiar, you're not alone. We see it at Giant Swarm all the time, and not just at one company.
The CNCF Platform Engineering Maturity Model maps how platform teams grow from ad-hoc beginnings to a full product. There's one step in it almost everyone struggles with: the jump from level 2 to level 3.
The gap between Level 2 (Operational) and Level 3 (Scalable)
At level 2 you're operational. You have a dedicated team, leadership pushes everyone to use the platform, and you've got some standardised tooling. Level 3 is scalable. People actively pull the platform in, self-service works, and the team has a vision instead of a ticket queue.
That gap, in my experience, is the shift from a service team to a product team. From reacting to acting. The way through it is product thinking.
I like to cook, so here's the easy recipe. And the order matters:
1. Understand your demand.
2. Control your scope.
3. Prove your value.
That's it. Let me walk through each one, with a story.
Picture a large European enterprise. They had a great idea years back: migrate off VMs and ServiceNow tickets, into the cloud, with self-service. Self-service and governance, way before "developer platform" was a phrase anyone used. Honestly, pretty ambitious for back then.
They put a brilliant team on it. Ten months later, they were done and ready to onboard the first teams.
But while the platform team was heads-down building, the engineers got curious. They read the cloud vendor's docs, spun up their own stuff, and ended up writing 200 different Dockerfiles and pipelines. By the time the platform launched, it was way too restrictive for the audience that was supposed to use it. The team delivered on its promise. They just built it for a customer that no longer existed.
How the platform team imagined the app teams would behave vs. what the app teams actually did
You can't build a product if you don't know who it's for. And who it's for keeps changing.
The fix isn't a survey. It's getting out of the building and actually talking to your users. Yes, even though they're your colleagues. Pick one team. Work alongside them like a collaborator, not a ticket queue, until what you've built is mature enough to hand off as self-service. Then do it again with the next team.
A Team Topologies newsletter entry makes this point directly. Platform teams tend to default to "everything as a service" because that's what a mature platform eventually looks like, but the right move is to start in "collaboration" mode with one stream-aligned team and only evolve to as-a-service once the capability is mature enough to stand on its own.
Get it right for one team, then the next, and then the next. And be loud about it. Make marketing out of the teams you've already onboarded. You want the next team fighting to be in the rotation.
A mid-sized manufacturer trusted their IT experts deeply. Whatever those experts asked for, they got. So the platform ended up with contracts for AWS, Azure, GCP, on-prem, and 150 different data services. A full Pokédex of a platform.
The platform team imagined developers would be delighted by the choice. Instead, every developer became an infrastructure researcher, spending mental energy on questions like "Should I use Azure or AWS for this?" and "What's the actual difference?" Meanwhile the platform team couldn't govern any of it properly. The surface got too wide to manage.
The problem was focusing on technology instead of capabilities.
Roadmap decision tree: improve an existing step? new capability? duplicate with different tech?
A vision lets you say no. And saying no is what saves you.
The fix is to stop thinking about your platform as a list of tools and start thinking about it as a map of capabilities.
A tool is something you install. A capability is a job your platform does for someone. Take monitoring: "we run a Prometheus stack" is a tool statement, but "engineers can understand the state of their applications" is a capability statement. Or take deployment: "we have GitHub Actions" versus "engineers can ship code to production safely". Same technologies underneath, very different ways of framing what the platform is actually for. Users don't come to your platform because they want Prometheus. They come because they need to know whether their app is healthy.
Draw the customer journey through your platform from that capability lens. List the capabilities your users come to you for. For each one, write down what a user actually does inside the platform to make use of it: the concrete steps and interactions on the user's path, not the tools behind them. For every new request that comes in, ask three questions.
Does this improve a step we already have?
Is it a genuinely new capability we should add?
Or is it just a duplicate with a different technology label?
If it's the third one, send them the docs for the existing capability and have a conversation. You don't need a new tool. You need a vision that earns you the right to say no.
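If it helps to see the three questions as logic, here is a minimal sketch of that triage. Everything in it is an assumption made for illustration: the `Capability` and `Request` shapes, the `triage` function, and the example capability map are hypothetical, not something a real platform team or tool ships. It looks up the capability first, because you can only ask whether a request improves an existing step once you know which capability it belongs to.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class Decision(Enum):
    IMPROVE_EXISTING_STEP = "improve a step we already have"
    NEW_CAPABILITY = "candidate for a genuinely new capability"
    DUPLICATE = "duplicate with a different technology label"


@dataclass
class Capability:
    """A job the platform does for someone, not a tool it runs."""
    name: str
    user_steps: List[str]                          # what the user actually does
    tools: Set[str] = field(default_factory=set)   # implementation detail only


@dataclass
class Request:
    job: str                              # the need, in capability language
    tool: str                             # the technology that was asked for
    improves_step: Optional[str] = None   # existing user step it would improve


def triage(request: Request, capability_map: Dict[str, Capability]) -> Decision:
    capability = capability_map.get(request.job)
    if capability is None:
        # Nobody can do this job on the platform today.
        return Decision.NEW_CAPABILITY
    if request.improves_step in capability.user_steps:
        # The job is covered, and the request makes a known step better.
        return Decision.IMPROVE_EXISTING_STEP
    # The job is covered and no step gets better: same capability, new label.
    return Decision.DUPLICATE


# Hypothetical capability map with a single entry.
capability_map = {
    "understand the state of my application": Capability(
        name="understand the state of my application",
        user_steps=["open the service dashboard", "define an alert"],
        tools={"Prometheus"},
    ),
}

duplicate = triage(
    Request(job="understand the state of my application",
            tool="some other metrics stack"),
    capability_map,
)
improvement = triage(
    Request(job="understand the state of my application",
            tool="Prometheus", improves_step="define an alert"),
    capability_map,
)
new_capability = triage(
    Request(job="run scheduled batch jobs", tool="a workflow engine"),
    capability_map,
)

for decision in (duplicate, improvement, new_capability):
    print(decision.value)
```

The code is not the point. What it shows is that the tool name never decides the outcome. Only the job and the user's steps do.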
I've watched this story play out more often than I'd like.
A platform team did everything right. They talked to users, built a great product, got widespread adoption, and every engineer loved it. Then the post-COVID cost-cutting wave hit, management looked for places to trim, and the platform team was first on the chopping block. The platform got worse because there weren't enough people to run it.
The platform was good. The team was good. None of that mattered, because the people holding the budget couldn't see any of it. The team had sold the platform to developers, but they'd never sold it to the people with the money.
If you can't measure the value of your platform, someone else will measure only its cost.
The people deciding your budget don't speak DORA, the engineering metrics platform teams use to measure themselves. You can tell them you reduced change lead time and they'll just shrug. So flip the metrics into their language.
Change Fail Rate → Unplanned Cost: rollbacks, hotfixes, incident calls × engineering hourly rate
Change lead time? That's opportunity cost, the revenue your business can't earn yet.
Change fail rate? That's unplanned cost: the rollbacks, hotfixes, and incident calls, multiplied by your engineering hourly rate.
Developer onboarding time? That's unproductive headcount, salary paid without anything shipped.
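The three translations above can be sketched as back-of-the-envelope arithmetic. Treat this as a rough model, not a standard formula: the function names, the parameters, and every number in the example are invented for illustration, and you would replace them with figures from your own incident and finance data.

```python
def unplanned_cost(deployments: int, change_fail_rate: float,
                   hours_per_failed_change: float, hourly_rate: float) -> float:
    """Change fail rate as money: failed changes x hours spent on rollbacks,
    hotfixes, and incident calls x engineering hourly rate."""
    failed_changes = deployments * change_fail_rate
    return failed_changes * hours_per_failed_change * hourly_rate


def opportunity_cost(lead_time_days: float, revenue_per_day: float) -> float:
    """Change lead time as money: revenue a finished change cannot earn
    while it waits to reach production (assumes a flat daily revenue)."""
    return lead_time_days * revenue_per_day


def onboarding_cost(new_hires: int, onboarding_days: float,
                    daily_salary_cost: float) -> float:
    """Onboarding time as money: salary paid before anything ships."""
    return new_hires * onboarding_days * daily_salary_cost


# Made-up monthly numbers, before and after a platform improvement.
before = unplanned_cost(deployments=200, change_fail_rate=0.15,
                        hours_per_failed_change=6, hourly_rate=90)
after = unplanned_cost(deployments=200, change_fail_rate=0.05,
                       hours_per_failed_change=6, hourly_rate=90)

print(f"Unplanned cost before: {before:,.0f}")
print(f"Unplanned cost after:  {after:,.0f}")
print(f"Saved per month:       {before - after:,.0f}")
```

The useful number is usually the difference between before and after. "We cut change fail rate from 15% to 5%" gets a shrug. "We stopped burning this much every month" gets a follow-up question.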
Or, borrow a real one: when Adidas worked with Karpenter and spot instances on their Giant Swarm platform, they cut app-team infrastructure costs by 50%. That's a number every CFO understands.
Your platform isn't a cost centre. It's the reason your organization moves faster and spends less. But only if you can prove it, in their language.
Same dog, same platform, same organization. But Marshmallow now knows what her users actually need, has a vision that lets her say no, and can show leadership exactly where the treats are.
Marshmallow on vacation in the mountains
That's product thinking. And it's why she could finally take her first vacation.
Don't overthink this. Schedule a one-hour conversation with your busiest platform user. Don't bring a survey, don't bring a roadmap, don't even talk about the platform.
Just ask them what's broken. What slows them down every day. What they actually hate about working with you.
That's step one. Recipes only work if you start at the beginning.
Editor's note: If you want to watch Dominik walk through this live, the talk is here. And if your Jira looks anything like Marshmallow's, we should probably talk.