Kubernetes at the Edge: how KubeEdge brings cloud native orchestration to IoT and beyond

by The Team @ Giant Swarm on Apr 6, 2026


This post is based on a talk given by Antonia von den Driesch and Xavier Avrillier at KCD Warsaw 2025.

Edge computing has quietly become one of the most important layers in modern infrastructure. Factories, toll stations, even satellites now generate data at volumes that make shipping everything to the cloud impractical, or outright impossible. The question is no longer whether to process data at the edge; it's how to manage the thousands of devices and nodes doing that processing without losing your mind. If you already use Kubernetes, the answer might be closer than you think.

The edge isn't the cloud, and that's the problem

Edge computing places data processing close to where data is produced: in a factory, on a highway gantry, or aboard a satellite. Instead of sending every sensor reading and camera frame to a central cloud, an intermediate layer of compute handles the time-sensitive work locally.

There are four main reasons organizations push processing to the edge. Latency is the most intuitive: a self-driving car can't wait 200ms for a cloud round-trip before deciding to brake. Bandwidth is next, because when machinery generates gigabytes of raw data per minute, streaming it all upstream simply isn't feasible. Autonomy matters because many edge locations have unreliable connectivity; a remote factory or offshore rig needs to keep working even when the network drops. Finally, privacy requirements or regulations may mandate that certain data never leaves the premises at all.

These constraints appear across industries (industrial IoT, retail, smart cities, healthcare, telco, and even space) but they all share a common operational challenge: scale. Managing a handful of edge servers is reasonable. Managing thousands of resource-constrained, intermittently connected nodes across dozens of locations is a different problem entirely.

Why Kubernetes makes sense (and where it breaks down)

Kubernetes already solves the orchestration challenge for traditional IT workloads. Its declarative API lets you describe a desired state and have the system reconcile reality to match. It's portable, so the same application definitions work in the cloud or in a data center. Its reconciliation loops reduce manual operations, and it gives teams a familiar operational model they already use daily.

The catch is that vanilla Kubernetes assumes stable networking. The API server expects nodes to maintain constant contact. When a node drops off the network, Kubernetes treats it as a failure: after a default five-minute eviction timeout, it reschedules pods elsewhere. That behavior makes perfect sense in a data center, but it's exactly wrong at the edge. An edge node going offline for an hour shouldn't trigger pod evictions and split-brain scenarios. It should keep running its workloads autonomously and sync back up when connectivity returns. This is the gap that KubeEdge was built to fill.
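That five-minute window isn't hidden in controller internals; it shows up as tolerations that the admission controller adds to most pods automatically. A sketch of what those defaults look like in a pod spec (values shown are the upstream defaults):

```yaml
# Default tolerations Kubernetes injects into most pods: the pod
# tolerates an unreachable or not-ready node for 300 seconds
# (five minutes) before it is evicted and rescheduled elsewhere.
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
```

In a data center, 300 seconds is a sensible grace period. At the edge, where an hour-long outage is routine, this mechanism is precisely the behavior you need to escape.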

KubeEdge: extending Kubernetes to the edge

KubeEdge is a CNCF project originally started by Huawei in 2018 with a clear mission: extend Kubernetes to edge nodes and IoT devices. It entered the CNCF Sandbox in early 2019, moved to Incubating in late 2020, and graduated in October 2024, making it one of the more mature edge-native projects in the ecosystem.

Its core features address the three things that make edge different from cloud: it leverages the standard Kubernetes API so you don't need a separate operational model, it handles unreliable connectivity gracefully, and it provides a device management framework for representing physical IoT devices as Kubernetes resources.

The project's adoption speaks to real-world demand. China's electronic highway toll collection system is one of the most cited deployments: over 100,000 edge nodes running half a million applications, replacing manual toll plazas with overhead gantries using camera-based license plate recognition. The system processes roughly 300 million data points daily.

An even more striking example is the first cloud native edge computing satellite, launched in 2021. Used for flood prevention and disaster mitigation, the satellite uses KubeEdge to run a small onboard AI model. When it captures an image of a region, the model evaluates whether cloud cover or weather makes the photograph useless, and only transmits images worth processing. A more powerful model on the ground station handles the detailed analysis. This approach reportedly reduced disaster response time from days to hours and cut data transmission by around 90%.

How the architecture works

KubeEdge's architecture follows the same cloud-edge-device layering that defines edge computing in general, but maps each layer to specific components.

The cloud side: CloudCore

On the cloud side (which can be any Kubernetes cluster, not necessarily a public cloud provider), the CloudCore component is deployed as a Helm chart. It's the single gateway between the Kubernetes API and all edge nodes.
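As a rough sketch, installing CloudCore from the chart shipped in the KubeEdge repository looks like this. Treat the chart path, namespace, and the advertise-address value key as assumptions to check against your KubeEdge release:

```shell
# Fetch the KubeEdge repo; the cloudcore chart lives under
# manifests/charts in recent releases.
git clone https://github.com/kubeedge/kubeedge.git

# Install CloudCore into the cluster. The advertise address is the
# endpoint edge nodes will use to reach CloudHub.
helm upgrade --install cloudcore ./kubeedge/manifests/charts/cloudcore \
  --namespace kubeedge --create-namespace \
  --set "cloudCore.modules.cloudHub.advertiseAddress[0]=<cloudcore-public-ip>"
```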

CloudCore contains three main pieces. The CloudHub is a messaging broker that maintains a persistent, secure channel to each edge node over WebSocket or QUIC. The EdgeController propagates workload changes from the Kubernetes API to the edge and reports back status and events. The DeviceController reconciles device custom resources, meaning the Kubernetes representations of physical IoT hardware.

Funneling all edge communication through the CloudCore provides two important benefits. First, it reduces API server load by aggregating updates from potentially thousands of edge nodes rather than having each one hit the API server directly. Second, it provides resilience across flaky networks through incremental updates: when a disconnected node reconnects, it doesn't flood the API server with a backlog of changes all at once. KubeEdge benchmarks have demonstrated this architecture supporting up to 100,000 edge nodes in a single cluster.

The edge side: EdgeCore

On each edge machine, the EdgeCore runs as a single binary with a memory footprint of around 70 MB. The EdgeHub is the counterpart to CloudHub, maintaining the WebSocket or QUIC connection, and routing messages to the correct internal modules.
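Enrolling an edge machine is typically done with the `keadm` CLI. A sketch of the flow, assuming CloudHub is listening on its default WebSocket port (10000):

```shell
# On the cloud side: mint a join token for edge nodes.
keadm gettoken

# On the edge machine: install EdgeCore and connect it to CloudHub.
keadm join --cloudcore-ipport=<cloudcore-ip>:10000 \
  --token=<token-from-gettoken>
```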

Two modules are used for offline operation. The MetaManager caches pod state, ConfigMap contents, and other Kubernetes metadata in a local SQLite database. This means that when the node loses connectivity, it still has everything it needs to keep running. When connection is restored, it syncs the delta. The DeviceTwin module tracks the state and attributes of connected physical devices, synchronizing that information both to local applications and back to the cloud.
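You can see this cache for yourself on an edge node. Assuming a default install, the database sits at `/var/lib/kubeedge/edgecore.db`; the table name below is what recent releases use, but may differ on yours:

```shell
# List a few of the Kubernetes objects MetaManager has cached locally.
sqlite3 /var/lib/kubeedge/edgecore.db "SELECT key FROM meta LIMIT 10;"
```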

A component called edged functions as the edge equivalent of the kubelet, talking to the container runtime to manage pod lifecycles. All internal communication passes through a messaging framework called Beehive, which is specific to KubeEdge.

On the device-facing side, an EventBus integrates with an MQTT broker to connect edge modules with device mappers, while a ServiceBus provides HTTP communication with local applications and APIs.
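Any MQTT client can exercise the EventBus path. The topic layout below follows KubeEdge's documented `$hw/events/...` convention, though the exact topic is an assumption to verify against your release:

```shell
# Publish a state update for a hypothetical device named 'camera-1'
# to the local broker that EventBus subscribes to.
mosquitto_pub -h localhost -p 1883 \
  -t '$hw/events/device/camera-1/state/update' \
  -m '{"state": "online"}'
```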

The key difference from standard Kubernetes: when an edge node disconnects, pods are not rescheduled to other nodes. The EdgeCore's embedded database preserves the desired state, and the node operates autonomously until connectivity returns. This is how KubeEdge avoids the split-brain problems that would plague a regular Kubernetes setup in edge environments.

Managing physical devices as Kubernetes resources

Beyond workload orchestration, KubeEdge provides a framework for representing IoT devices as first-class Kubernetes custom resources. This is where the project moves from "Kubernetes at the edge" into true IoT territory.

The device management layer revolves around mappers, which are containers that translate between physical device protocols and KubeEdge's internal messaging. IoT devices speak a variety of protocols: Modbus, OPC UA, ONVIF, Bluetooth, Zigbee, and others. A mapper for a given protocol knows how to collect data from devices using that protocol and transmit it to KubeEdge via MQTT. Several mappers are available upstream, and a Go framework exists for building custom ones.
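The essence of a mapper is a small translation loop: poll the device over its native protocol, reshape the readings into a twin update, and publish it over MQTT. A minimal sketch of that reshaping step, in Python for brevity (the upstream mapper framework is Go, and the real message schema lives there; names and structure here are illustrative assumptions):

```python
import json

def to_twin_update(payload: dict) -> dict:
    """Translate raw register reads into a DeviceTwin-style update.

    Loosely mirrors the shape of a twin message: each property
    carries an 'actual' value that DeviceTwin reconciles against
    the 'expected' value held in the cloud.
    """
    return {
        "twin": {
            name: {"actual": {"value": str(value)}}
            for name, value in payload.items()
        }
    }

# Hypothetical raw poll of an 8-channel analog input module.
raw = {"channel-1": 412, "channel-2": 0}
update = to_twin_update(raw)
print(json.dumps(update, sort_keys=True))
```

A real mapper would wrap this in a protocol client (e.g. a Modbus library) on one side and an MQTT publish on the other; the framework handles registration and lifecycle.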

On the Kubernetes side, devices are described using two custom resource types. A DeviceModel defines the properties of a device type (for example, an 8-channel analog input module might have properties for each channel, a protocol designation, and access modes). A DeviceInstance represents an actual deployed device, referencing its model and specifying concrete details like network addresses and protocol register mappings.
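A sketch of the two resources side by side. The API version, property layout, and Modbus visitor fields below are assumptions based on the v1beta1 device API; field names vary between KubeEdge releases, so check the reference for yours:

```yaml
# The reusable type definition: what an 8-channel analog input is.
apiVersion: devices.kubeedge.io/v1beta1
kind: DeviceModel
metadata:
  name: analog-input-8ch
spec:
  protocol: modbus
  properties:
  - name: channel-1
    type: INT
    accessMode: ReadOnly
---
# One concrete deployed unit, pinned to an edge node, with the
# protocol details a mapper needs to actually reach it.
apiVersion: devices.kubeedge.io/v1beta1
kind: Device
metadata:
  name: line-3-analog-input
spec:
  deviceModelRef:
    name: analog-input-8ch
  nodeName: edge-node-1
  properties:
  - name: channel-1
    visitors:
      protocolName: modbus
      configData:
        register: HoldingRegister
        offset: 0
```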

This approach creates a digital twin of each physical device inside Kubernetes. You can inspect device state with kubectl, build controllers that react to device changes, and manage your entire device fleet through the same API you use for pods and deployments.
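Fleet inspection then works with ordinary kubectl, assuming the device CRDs are installed (the device name here is a placeholder):

```shell
# List every registered device across namespaces.
kubectl get devices -A

# Inspect one device's twin: reported vs. desired property values.
kubectl get device <device-name> -o yaml
```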

In practice, the data flow looks something like this: a device (say, an industrial camera on a production line) publishes data to an MQTT topic on a local broker. A mapper container subscribes to that topic, updates the device twin in KubeEdge, and can expose metrics for collection. The edge node handles the latency-sensitive processing locally, while syncing state back to the cloud for further analysis. Giant Swarm customers use this pattern for things like detecting defects in paper production, with camera feeds processed by edge-side AI models. If you want to see a working example of this stack end-to-end, check out the demo from our talk at KubeCon Atlanta.

Why we chose KubeEdge over K3s

If you're evaluating edge Kubernetes options, K3s will come up. It's a solid project and well-suited to low-resource devices, but its agents connect only to a K3s control plane. Supporting it properly would mean maintaining a separate Cluster API bootstrap provider, a separate control plane provider, and separate lifecycle management just to connect edge nodes to an existing cluster. That's real overhead, both for us and for customers who already run Kubernetes.

KubeEdge can be added on top of any existing Kubernetes environment regardless of the type of control plane, no forked lifecycle management. Customers don't have to change how they operate to add edge nodes. That was the deciding factor for us.

What to watch for in production

KubeEdge isn't plug-and-play in every environment. One class of problem worth knowing about involves CNI compatibility on specific edge hardware.

We use Cilium as our CNI, and on certain edge devices we've run into incompatibilities caused by kernel configuration flags that Cilium's eBPF datapath requires but the device's kernel was built without. There's no single fix: depending on the hardware and the customer's constraints, the resolution might involve an entirely separate CNI, recompiling the kernel, or in some cases recommending different hardware altogether. It's the kind of problem that only surfaces in production, on real devices, which is why close collaboration between the platform team and the people who know the hardware matters more at the edge than it does in a data center.

Where to go from here

KubeEdge is a graduated CNCF project with active development and a growing ecosystem of mappers and integrations. If your organization runs Kubernetes and has edge computing needs, whether that's factory floors, retail locations, remote infrastructure, or anything in between, it's worth evaluating.

The KubeEdge case studies page has more real-world examples beyond the highway and satellite deployments covered here. If you're thinking about how to bring Kubernetes to the edge in your environment, we'd love to talk about it.