From three clusters to eighty: how scale shapes platform engineering

by The Team @ Giant Swarm on Jun 23, 2026

"Platform engineering" is one of the more elastic terms in the industry. A team running three clusters and a team running eighty both use it to describe their work, attend the same conferences, and evaluate the same tooling categories. Their day-to-day realities often have almost nothing in common.

That elasticity matters when teams pick approaches, evaluate tooling, and try to learn from peers. The same vocabulary covers very different jobs.

Two populations, not one

The cleanest place to see the difference is in the size of the fleet a team is running. In talking to platform engineers at conferences, in customer conversations, and across the wider community, we reliably hear how unevenly clusters are distributed across teams. It's not a smooth curve. It's bimodal: a large group of teams running just a handful of clusters, and a large group running fifty or more. Oliver's retrospective on the survey digs into the year-on-year picture. What that looks like in practice is two distinct populations, both calling themselves platform teams. It's worth knowing which one you're in.

What small fleets are optimizing for

What we typically see at the small end of the range, working with teams in this cohort:

The focus is on production-readiness and getting developers unblocked. The platform team often has fewer than five people, sometimes a single platform engineer wearing several other hats. The clusters tend to be similar to each other, sometimes identical. Integration questions still dominate: how to wire up observability, how to handle secrets, how to ship safely. The CNCF landscape feels just as overwhelming at this scale. The difference is the absolute size of the assembly cost, not the shape of it.

What's optimized for at this scale is time to first reliable workload. The team is building toward a platform that's usable by application developers, with as little friction as possible between "we have a need" and "the service is running in production."

What large fleets are optimizing for

What we typically see at the other end, working with teams running fifty or more clusters:

The work shifts shape. The platform team is bigger, sometimes much bigger, and the questions change. How do we make sure all of these clusters are configured the same way? How do we apply a policy change to all of them at once? How do we get a fleet-wide view of cost, security posture, and upgrade status? How do we keep the platform consistent as teams across the business spin up environments for their own purposes?

What's optimized for at this scale is consistency: repeatable golden paths, fleet-wide controllers, policy as code, and abstraction layers that hide differences between clusters. The integration decisions are largely the same as at small scale, but they're being made many times over and need to hold up across every cluster in the fleet.
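To make the consistency question concrete, here is a minimal sketch of drift detection: comparing each cluster's settings against a shared baseline. The setting names, cluster names, and the `find_drift` helper are all hypothetical, made up for illustration. A real fleet would read this data from its own configuration source rather than from an in-memory dictionary.

```python
from typing import Any

# Hypothetical baseline: the settings every cluster in the fleet should share.
BASELINE: dict[str, Any] = {
    "kubernetes_version": "1.30",
    "pod_security": "restricted",
    "audit_logging": True,
}


def find_drift(
    baseline: dict[str, Any], fleet: dict[str, dict[str, Any]]
) -> dict[str, dict[str, tuple[Any, Any]]]:
    """Return, per cluster, the settings that differ from the baseline.

    Each entry maps a setting name to an (expected, actual) pair.
    A setting missing from a cluster is reported with actual=None.
    Clusters with no drift are omitted from the result.
    """
    report: dict[str, dict[str, tuple[Any, Any]]] = {}
    for cluster, config in fleet.items():
        diffs = {
            key: (expected, config.get(key))
            for key, expected in baseline.items()
            if config.get(key) != expected
        }
        if diffs:
            report[cluster] = diffs
    return report


# Made-up fleet data for illustration only.
fleet = {
    "prod-eu-1": {"kubernetes_version": "1.30", "pod_security": "restricted", "audit_logging": True},
    "prod-us-1": {"kubernetes_version": "1.29", "pod_security": "restricted", "audit_logging": True},
    "dev-eu-1": {"kubernetes_version": "1.30", "pod_security": "baseline"},
}

drift = find_drift(BASELINE, fleet)
for cluster, diffs in drift.items():
    print(cluster, diffs)
```

At three clusters, this check can live in someone's head. At eighty, it has to be something a machine runs on every change, which is the shift in the work that this section describes.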

Scale plus environment, and what breaks at the wrong scale

There's a second variable layered onto the scale question. Multi-cloud is now common rather than exceptional. AWS, Azure, Google Cloud, and bare metal show up together in many of the environments we see, and most teams arrived at multi-cloud incrementally rather than by design.

Scale plus environment matters most in the response patterns that don't transfer well across the range.

Approaches that work at the small end have a recognizable shape: a few dashboards per cluster, manual upgrades, point-to-point integrations between tools, configuration kept in a few people's heads. They hit a wall as the fleet grows past what a small team can hold mentally. At twenty clusters, the dashboard count alone is unworkable.
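The arithmetic behind that wall is simple. As a rough sketch, assume "a few" means three dashboards per cluster (an assumption for illustration, not a measured figure); the `dashboards_to_watch` helper is likewise hypothetical:

```python
def dashboards_to_watch(clusters: int, per_cluster: int = 3) -> int:
    """Total dashboards when every cluster has its own separate set.

    per_cluster=3 is an assumed stand-in for "a few dashboards per cluster".
    """
    return clusters * per_cluster


for n in (3, 20, 80):
    print(f"{n} clusters -> {dashboards_to_watch(n)} dashboards")
```

Under that assumption, three clusters mean nine dashboards, twenty mean sixty, and eighty mean two hundred and forty. The growth is only linear, but the number of people watching usually doesn't grow with it.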

Approaches built for the large end have a different shape: fleet-wide controllers, abstraction layers, multi-tenant policy engines, dedicated platform team roles. They impose overhead that's hard to justify at three clusters. The investment doesn't earn its keep until the fleet is large enough to need it.

The friction often shows up when a team grows into the next bracket without changing the underlying approach. Small-fleet patterns scaled up break in subtle ways, usually around consistency. Large-fleet patterns scaled down feel like overkill, and the team works around them.

A modular approach as one practical path

If the realities at three clusters and at eighty are this different, the response can be too. One practical path is modular: adopt capabilities as the operating reality changes, rather than aiming for one comprehensive platform up front.

Observability is a good example. At small scale, a few dashboards and an alerting rule or two are often enough. As the fleet grows past what a few people can monitor, the question shifts to fleet-wide aggregation, predictable cost, and the ability to query across clusters. The investment in a proper observability layer earns its keep at the point where the small-scale approach stops working, not before.
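As an illustration of what "query across clusters" means, the sketch below merges per-cluster samples into one fleet-wide view by tagging each sample with its cluster. The `Sample` type, the helper functions, the metric names, and the numbers are all invented for the example. A real setup would typically rely on an observability backend rather than in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One metric reading, labeled with the cluster it came from."""
    cluster: str
    namespace: str
    metric: str
    value: float


def fleet_totals(samples: list[Sample], metric: str) -> dict[str, float]:
    """Sum one metric per cluster, across all namespaces."""
    totals: dict[str, float] = defaultdict(float)
    for s in samples:
        if s.metric == metric:
            totals[s.cluster] += s.value
    return dict(totals)


def clusters_over(samples: list[Sample], metric: str, threshold: float) -> list[str]:
    """Sorted names of clusters whose total for `metric` exceeds `threshold`."""
    return sorted(
        cluster
        for cluster, total in fleet_totals(samples, metric).items()
        if total > threshold
    )


# Made-up readings for illustration only.
samples = [
    Sample("prod-eu-1", "checkout", "cpu_cores", 4.0),
    Sample("prod-eu-1", "search", "cpu_cores", 2.0),
    Sample("prod-us-1", "checkout", "cpu_cores", 12.0),
    Sample("dev-eu-1", "sandbox", "cpu_cores", 1.5),
    Sample("dev-eu-1", "sandbox", "memory_gib", 8.0),
]

print(fleet_totals(samples, "cpu_cores"))
print(clusters_over(samples, "cpu_cores", threshold=5.0))
```

The point of the cluster label is that one question ("which clusters use more than five cores?") gets one answer for the whole fleet, instead of one dashboard visit per cluster.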

The same pattern holds across security, policy, connectivity, and developer enablement. Each capability has a scale at which it becomes worth investing in. Adopting them progressively, when the shape of the problem changes, is one way to navigate the range without overbuilding early or underbuilding late.

This isn't the only valid approach. Comprehensive platforms work for teams that know their growth shape in advance. DIY assembly works for teams with the capacity and a clear point of view. Modular adoption is a third option, useful when the trajectory isn't fully predictable yet.

Locate yourself on the range

Most teams sit somewhere on this range, often closer to one end than the other, often growing toward the other end without quite noticing.

The first useful question, before evaluating tools or approaches, is locating yourself. Which end of the cluster distribution are you closer to today? Which direction are you growing? And does your current approach fit both of those answers, or just one?

If you're working through this question on your team — which end you're at, which direction you're growing — that's something we think about a lot. We'd be glad to compare notes.