This is the second in a series of blog posts on building container images. The series started with ‘What is the Future of Container Image Building?’, which looked at how building images has changed since Docker first launched and how some of the restrictions of using Dockerfiles can be overcome. This post focuses on Podman and Buildah, and in future posts we will examine other new approaches in this area.
Podman and Buildah are two quite recent tools that have emerged to aid with container image building. They are complementary tools, both constituents of the Open Repository for Container Tools, and stem from Red Hat’s mission to excise the Docker daemon from container workflows. Why two tools, and what does each bring to the container image building experience? Let’s start with Podman.
Podman’s purpose extends beyond the container image building objective, but it’s often discussed in conjunction with Buildah, and is considered here because it has a contribution to make to container image building.
Podman attempts to reproduce the entirety of the familiar Docker CLI without the need to run a daemon to serve and act on API requests. Instead of a client/server model, Podman implements a local fork/exec model, which in Red Hat’s eyes greatly simplifies the control and security of the container’s lifecycle.
Podman emulates the various client commands that Docker provides, and some advocates even encourage new users to alias the docker command to podman, in order to ease migration from one to the other. Amongst the suite of Docker-like commands that Podman provides is the podman build command. It’s used for building OCI-compliant container images, using a Dockerfile as its source for the various build steps. In that sense it is virtually identical to the docker build command, but without the overhead of the Docker daemon.
As you might expect, all of the familiar docker build command line arguments are available in podman build (save for the odd one that remains unimplemented, like --cache-from), with some additional arguments that are required in lieu of some features that are normally provided by the Docker daemon (e.g. registry communication). Swapping over from docker build to podman build, then, is a largely seamless experience, except for the odd quirk such as needing to specify where to find images that are referenced without a fully qualified image name.
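One way to sidestep that quirk is to write image names out in full. The helper below is a hypothetical sketch (qualify_image is not part of Podman or Docker) that expands a short name following the convention the Docker CLI applies implicitly, with Docker Hub as the default registry:

```shell
#!/bin/bash
# Expand a short image name into a fully qualified one, assuming
# Docker Hub (docker.io) is the intended default registry.
qualify_image() {
    local name="$1"
    local first="${name%%/*}"   # text before the first slash, if any

    if [[ "$name" != */* ]]; then
        # No slash at all: an "official" image such as node:10
        echo "docker.io/library/${name}"
    elif [[ "$first" == *.* || "$first" == *:* || "$first" == "localhost" ]]; then
        # First component looks like a registry host: already qualified
        echo "$name"
    else
        # user/repository form: hosted on Docker Hub
        echo "docker.io/${name}"
    fi
}

qualify_image node:10                  # prints docker.io/library/node:10
qualify_image quay.io/buildah/stable   # prints quay.io/buildah/stable
```

The result can be used in a FROM instruction or passed directly to podman, which removes the need for any registry lookup configuration for that image.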
With one big goal achieved, a daemonless build experience, Podman also provides another sought-after feature: rootless container builds. Historically, because of the Docker daemon, building container images with docker build has required root privileges, a level of access that is often considered too permissive in security-conscious organizations. In providing the ability to perform rootless builds, Podman answers this serious concern, but it’s not without its limitations.
The process of building images from Dockerfiles involves the temporary creation of containers for running commands in order to install packages, retrieve remote content, build artifacts, and so on. Creating and running containers ordinarily requires root privileges. So, how does Podman resolve this apparent contradiction? In order to circumvent the need for running builds as the root user, Podman makes use of user namespaces. Namespaces provide an isolation mechanism for Linux processes, and are a primary constituent of the container abstraction. If the set of namespaces a container is created with includes the user namespace, then the agent that invokes the container can be a non-privileged user. In other words, with user namespaces Podman can use containers to effect rootless image builds.
User namespaces provide a way of mapping a range of non-privileged user and group IDs (UIDs/GIDs) from the host’s default user namespace to a different set of UIDs/GIDs inside a new user namespace associated with a container. In this way, a non-privileged UID/GID on the host can be safely mapped to the root user (UID/GID=0) inside a container, giving the container’s process the elevated privileges it might need as part of an image build (for example, to install OS packages). But courtesy of the mapping, the container only has the same file access permissions on the host that are bestowed on the unprivileged user who issues the podman build command. This means the host’s filesystem is protected from accidental or malicious container compromises.
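As a rough illustration of the arithmetic behind such a mapping, the sketch below translates a UID as seen inside the container into the UID the host sees. It assumes the common rootless layout in which container UID 0 maps to the invoking user and container UIDs from 1 upwards map onto a subordinate range (as listed in /etc/subuid). The host_uid function and the numbers used are illustrative, not part of Podman.

```shell
#!/bin/bash
# Translate a UID inside a rootless container to the UID the host sees.
# Assumed mapping (illustrative values):
#   container 0          -> the invoking user's UID
#   container 1..count   -> sub_start .. sub_start+count-1 (from /etc/subuid)
host_uid() {
    local container_uid="$1" user_uid="$2" sub_start="$3" sub_count="$4"

    if (( container_uid == 0 )); then
        echo "$user_uid"
    elif (( container_uid >= 1 && container_uid <= sub_count )); then
        echo $(( sub_start + container_uid - 1 ))
    else
        echo "UID ${container_uid} is not mapped" >&2
        return 1
    fi
}

host_uid 0 1000 100000 65536    # container root -> prints 1000
host_uid 33 1000 100000 65536   # a service account -> prints 100032
```

Root inside the container is therefore just UID 1000 on the host, which is why the build cannot touch anything the invoking user could not already touch.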
Current Rootless Limitations
But there’s a problem. New images are often built from a base image (the FROM instruction in a Dockerfile), whose content will ordinarily be owned by the user with UID/GID=0. When a container is started as part of a container build by an unprivileged user, the container’s files are owned by UID/GID=0 on the host, whilst the container’s process will only have the file access permissions associated with the unprivileged user. This may mean the container’s process is unable to write to its filesystem, which would severely hinder container image building. In order that the files from the image have the correct ownership inside a container, the set of UID/GIDs needs to be ‘shifted’ in line with the user namespace mapping. Currently, there is no optimal means for achieving this.
When a rootless podman build is invoked and a container requires an ownership ‘shift’, the filesystem content is copied and ownership changed (chowned) to reflect the mapping. This is clearly inefficient in terms of space, and takes time, which can severely impact the length of time container builds take. One of the big ideas behind containers is that multiple containers and images can share an image’s content without duplication. Ideally, this shift should happen without duplication as part of the mount operation when a container’s filesystem is assembled from its constituent layers.
Most container runtimes use overlayfs for assembling a container’s filesystem, which doesn’t support shifting UID/GIDs on its mounts, but recently Ubuntu became the first Linux distribution to support an in-kernel mechanism (shiftfs) for overlays, which has been put to use in the Linux Containers LXD project.
For Podman, a partial remedy is at hand with the introduction of a mount option for overlayfs in Linux kernel version 4.19, which copies up only the metadata for files and directories to the read/write layer, rather than the content itself. Ultimately, however, the community is waiting on support in the mainline Linux kernel for shifting UID/GIDs in order to achieve this goal.
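For reference, the overlayfs option that provides metadata-only copy-up is metacopy. The sketch below only assembles the option string for such a mount. The directory names are placeholders, and the mount itself needs root privileges and a 4.19 or later kernel, so it is left as a comment.

```shell
#!/bin/bash
# Build the option string for an overlay mount with metadata-only copy-up.
# The three directories are hypothetical paths chosen by the caller.
overlay_opts() {
    local lower="$1" upper="$2" work="$3"
    echo "lowerdir=${lower},upperdir=${upper},workdir=${work},metacopy=on"
}

# Usage (as root):
#   mount -t overlay overlay -o "$(overlay_opts /lower /upper /work)" /merged
```

With this option, changing the ownership of a file from the lower layer records the new metadata in the upper layer without duplicating the file's data.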
Let’s move on to Buildah and explain how it relates to, and is different from, Podman.
What we haven’t mentioned thus far is that podman build uses Buildah under the covers to perform container image builds. This means that daemonless and rootless builds are also a feature of Buildah. Unlike Podman, Buildah has a container image build-specific function, and has a number of features that stretch beyond building images based on Dockerfiles.
The majority of container images out in the wild have been built using a Dockerfile as the immutable reference for the image. We’ve discussed how podman build uses Dockerfiles in order to build images, and Buildah can also build images from Dockerfiles using the buildah bud command. But Buildah was inspired by the quest for an alternative method to the ubiquitous Dockerfile for container image building. The rationale for an alternative method is that what is required is simply a ‘bundle’ that represents an OCI-compliant image, and how you get to that end goal doesn’t necessitate a Dockerfile. Buildah’s maintainers insist that the Dockerfile is a limitation that needs circumvention.
How Buildah Works
Despite the deliberate desire for independence from the Dockerfile, Buildah uses a very similar process for building container images. A docker build runs a new container to process each Dockerfile instruction, which results in the creation of new or changed content or image metadata, before the container is committed as a new image. The next instruction is processed in a new container based on the previously created image, and then committed as a new image, and so on. Buildah does the same thing, but instead of using Dockerfile instructions it executes Buildah sub-commands, and doesn’t require a ‘commit’ after the execution of each sub-command.
The build process might start with a buildah from command, which results in a working container based on the image that’s specified as an argument. This is clearly analogous to the FROM Dockerfile instruction. To execute commands within a container as part of image building (e.g. to create a new user, or build an artifact from its source), an image author can make use of buildah run, which can be interactive when required. In addition to running commands that create content for container images, Buildah also provides a means for defining metadata for images, using buildah config. This enables the specification of things like exposed ports, the default user, container entrypoint, and so on. The buildah copy and buildah add commands are directly analogous to the COPY and ADD Dockerfile instructions for getting external content into an image. Using buildah mount, it’s even possible to mount a container’s root filesystem at a suitable location on the host for subsequent manipulation with tools that are native to the host itself.
Once an image author is confident they have finished crafting their image, the buildah commit command commits the container to a new image.
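Put together, a build that populates an image purely with host tools might look like the sketch below. The function name, file paths, and image name are hypothetical. For a rootless user the function would need to run inside buildah unshare so that the mount is permitted.

```shell
#!/bin/bash
# Populate an image from the host side, without running anything inside
# the container. All names here are placeholders for illustration.
build_static_image() {
    local src_file="$1" image_name="$2"
    local ctr mnt

    ctr=$(buildah from scratch) || return 1   # empty working container
    mnt=$(buildah mount "$ctr") || return 1   # host path of its root filesystem

    # Ordinary host tools operate on the mounted filesystem.
    mkdir -p "$mnt/usr/local/bin"
    cp "$src_file" "$mnt/usr/local/bin/app"

    # Record image metadata, then unmount and commit the result.
    buildah config --entrypoint '["/usr/local/bin/app"]' "$ctr"
    buildah umount "$ctr"
    buildah commit "$ctr" "$image_name"
    buildah rm "$ctr"                         # discard the working container
}

# Usage:
#   build_static_image ./app example-static
```

Because the content is copied in from the host, the image does not need a shell or package manager of its own, which is the kind of flexibility a Dockerfile cannot easily express.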
There are some clear similarities between the combination of a Dockerfile and docker build, and Buildah. But a Dockerfile imposes the sequential execution of dependent instructions, so how does Buildah provide similar order and repeatability in a container build? In place of the daemon’s build engine, which imposes this order for a docker build, it’s suggested that container builds using Buildah should be defined programmatically, using something like Bash.
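#!/bin/bash
# Stop at the first failing step, as a Dockerfile build would
set -euo pipefail
# Create a working container from the base image (the FROM step)
id=$(buildah from --pull node:10)
buildah run "$id" -- mkdir -p /usr/src/app
buildah config --workingdir /usr/src/app "$id"
# Copy the current directory into the working directory
buildah copy "$id" "$PWD" .
buildah run --net host "$id" -- npm install
buildah config --port 1337 --entrypoint '["npm", "start"]' "$id"
# Commit the working container as a new image
buildah commit "$id" example-app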
The simple example above shows how to achieve a repeatable build using Buildah in a Bash script.
In taking the approach to image building that it does, Buildah removes any dependency on a long-running daemon process, and further, frees the image builder from the constraints of Dockerfile syntax. The images that get created by Buildah can be pushed to a container registry, and then subsequently pulled by Podman or a Docker daemon, and will work seamlessly on container runtimes that support the OCI spec.
Build Caching and Parallel Execution
If images are built using a Dockerfile and buildah bud, then image layers are cached for re-use in subsequent builds. For those coming to Buildah from a Docker environment, this is the expected behavior, which can significantly increase the speed of build execution. But if you’re expecting caching to be available when building images using Buildah commands in a script, you will be in for a surprise. Caching is not available, which means that all of the build steps need to be performed on each new build iteration, irrespective of whether any change in content or command occurred.
Additionally, Buildah executes its build steps sequentially, even if one build step is entirely independent of another. Whilst the inclusion of a parallel build step feature has been considered, it’s not currently available in Buildah, which can further extend the length of time it takes to perform complex container image builds.
Whilst there is a clear distinction between Podman and Buildah, it may seem a little confusing that there are two routes to achieve the same goal. If in doubt, podman build should be used when authoring images with Dockerfiles, and Buildah should be used if Dockerfile syntax is considered too restrictive, or a script-like approach to repeatability is preferred. It’s worth mentioning that container images and Dockerfiles are almost synonymous, so it remains to be seen whether Buildah will get enough traction beyond the Red Hat community to eventually usurp the Dockerfile.
Podman and Buildah deliver on two of the most sought-after features for container image building: daemonless and rootless builds. These tools, however, are competing in an increasingly crowded space, and whilst it’s still early days, they do lack some of the features that similar tools currently provide.