This article takes a very opinionated look at microservices and Docker in general, and at how a framework like Rails could fit in. We will also look at how Giant Swarm, an infrastructure service built for microservices and Docker, could help get it running in the cloud.
Talking about architecture nowadays will eventually lead to microservices. And from microservices it is only a stone's throw to talking about containers and Docker. While one does not imply the other, both movements favor each other and even have some things in common: They claim to ease the pain of building and operating large and complex applications.
Since there is (luckily) no such thing as apt-get install microservice, one could ask: "How can I have this new shiny thing?" Most likely the answer would be: "You cannot have it." To be more precise: "You cannot have it right now." Keeping in mind that "Architecture is the part of a system that is the hardest to change", the term "microservice" describes an architectural style rather than a silver bullet. Even in 2015 one does not simply refactor a monolith, especially not when using a framework like Rails, which embraces monolithic applications.
Docker, on the other hand, promises to "eliminate the friction between development, QA, and production environments", and maybe one can have at least this. However, running Docker in production is hard. That’s why we will look into how Giant Swarm’s infrastructure and facilities simplify the actual deployment.
Join me on this journey and let us find out where this road will lead us. To help with this we will dockerize a simple app I created to showcase Guacamole, an ODM I'm building for the NoSQL database ArangoDB. You can find the app on GitHub. If you want to give it a try you need to have Docker installed on your machine, Ruby 2 or higher, and for the actual deployment a Giant Swarm account. No need to install any database though, we will use the production Docker container locally, too.
Disclaimer: A lot of the information and inspiration for parts of this post is taken from this article on dockerizing a Rails app. There you will find useful additional details I won't cover here.
Divide and Conquer
Before we start crafting Docker containers let's first have a look at the application itself and how we could approach the dockerization. The application at hand is nothing special:
Login via OAuth2 against GitHub
Calls to an external API (GitHub)
Background jobs for long running tasks
We could put all of this into one single Docker container, but this would cause more pain than gain. For instance, we would lose the option to scale the app itself independently from the database. Following the trend of skinny containers with ideally only a single process running, let's cut the app into five containers:
An nginx instance will serve as the frontend proxy server. In our case it will also serve the static assets. In a more complex application it could additionally take care of access control or load balancing between different backend services.
Next in line is the Rails app. It will be run by a simple web server. We will use Puma for this.
The Sidekiq worker will get its own container, too. If you have more than one queue, this results in one container for each queue.
One container for Redis as the job queue.
One container for ArangoDB as the main database.
The following diagram helps to understand the planned architecture and where communication is happening:
I would argue giving Sidekiq its own container is not even in the same ballpark as microservices - there is far more to do to get there. But it gives us operational separation of the app itself and the worker processes, which is still very nice.
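The planned layout can also be written down as a small dependency graph. This is only a sketch: nginx, rails-app and redis are the component names used later in this article, while sidekiq and arangodb are names I made up here, and the exact edges (for instance that the worker talks to both Redis and ArangoDB) are my assumption. Sorting the graph topologically gives one possible start order in which every container comes up after the containers it depends on.

```ruby
require "tsort"

# Which component talks to which. The names "sidekiq" and "arangodb" are
# hypothetical, and the edges are an assumption based on the description.
DEPENDENCIES = {
  "nginx"     => ["rails-app"],
  "rails-app" => ["redis", "arangodb"],
  "sidekiq"   => ["redis", "arangodb"],
  "redis"     => [],
  "arangodb"  => []
}.freeze

# Minimal TSort adapter around a dependency hash.
class ComponentGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort calls this to enumerate all nodes.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # TSort calls this to enumerate the dependencies of one node.
  def tsort_each_child(node, &block)
    @dependencies.fetch(node).each(&block)
  end
end

# Dependencies are listed before the components that need them.
START_ORDER = ComponentGraph.new(DEPENDENCIES).tsort
puts START_ORDER.join(" -> ")
```

The graph also makes the scaling argument visible: rails-app and sidekiq share their dependencies but not each other, so either one can be scaled without touching the other.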
The Good, the Bad, and the Ugly
Now that we have identified the containers, we need to actually create them. Docker containers are built based on a Dockerfile, which describes the individual steps the final container is composed of. We said we would need five containers, and if a container is based on a Dockerfile we need five of them, right?
Almost. As luck would have it, one can share container images. And if one does not need any customization, those public images can be used as they are. You will find images for all sorts of applications, and there are of course images for most databases around.
We will use the official Redis and ArangoDB images. As we don't want to install the databases on our local machine, but use the containers locally, we can just run the following commands:
Those two commands will fetch the images from the official Docker registry, start them detached (-d) and assign them a name (--name). Both will have a volume assigned and their default ports exposed. At least for ArangoDB you should configure authentication for your production setup.
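Here is a sketch of what two such commands might look like, written as a small Ruby helper so it fits with the Rake tooling used further down. The image names (redis, arangodb), the container names, the published ports (6379 for Redis, 8529 for ArangoDB) and the placeholder password are assumptions on my side, not copied from the original setup. As far as I know both images declare their data volumes themselves, so no extra volume flag is used here.

```ruby
# Builds the argument list for `docker run`: detached (-d), with a name
# (--name), the given port published (-p) and optional environment (-e).
def docker_run_command(image:, name:, port:, env: {})
  command = ["docker", "run", "-d", "--name", name, "-p", "#{port}:#{port}"]
  env.each { |key, value| command.push("-e", "#{key}=#{value}") }
  command << image
end

# Assumed image names, container names and ports; adjust to your setup.
DATABASES = [
  { image: "redis", name: "redis", port: 6379 },
  { image: "arangodb", name: "arangodb", port: 8529,
    env: { "ARANGO_ROOT_PASSWORD" => "change-me" } } # placeholder password
].freeze

# Only print the commands here; nothing is started.
DATABASES.each do |options|
  puts docker_run_command(**options).join(" ")
end
```

To actually start a container from Ruby, the same array can be handed to system(*command), which avoids going through a shell.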
nginx Frontend Proxy
Remember, Docker containers should be treated as immutable. Changes should be applied at build time and not at run time. This requires you to create a new image whenever you change something. And changing some things requires changes to the Dockerfile. For the nginx frontend we need such a change, namely a configuration file to forward to our Rails application. Since every Docker image builds on top of an existing image, we use the official nginx image as the basis for our custom one:
A Dockerfile always starts with the FROM instruction. It tells Docker which image to use as the base. Additionally, we will just COPY the public folder and the configuration into the image. As I said before, containers should be treated as immutable, and because of this we need to create a new image whenever we change the assets. The nginx config itself is pretty straightforward:
One thing needs to be mentioned though: Where does this rails-app hostname come from? Docker provides information about linked containers (we will explain later what that is) in two ways: a bunch of environment variables and entries in the /etc/hosts file. In this case we’re gonna use the entry from the /etc/hosts file.
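To make the two mechanisms concrete, here is a sketch of how a Ruby process inside a linked container could find its backend. It assumes the Docker link convention of variables named like RAILS_APP_PORT_8080_TCP_ADDR (alias upper-cased, dashes turned into underscores, which is my understanding of the naming) and otherwise falls back to the plain hostname, which is resolved through /etc/hosts. The helper name backend_url and the example IP address are made up for this illustration.

```ruby
# Returns the URL of a linked container. Prefers the address from the
# link's environment variable and falls back to the link alias itself,
# which Docker makes resolvable through an /etc/hosts entry.
def backend_url(link_alias, port, env = ENV)
  prefix = link_alias.upcase.tr("-", "_")
  host = env.fetch("#{prefix}_PORT_#{port}_TCP_ADDR", link_alias)
  "http://#{host}:#{port}"
end

# Without link variables the hostname is used as-is.
puts backend_url("rails-app", 8080, {})
# prints "http://rails-app:8080"

# With a link variable present, its address wins.
puts backend_url("rails-app", 8080, { "RAILS_APP_PORT_8080_TCP_ADDR" => "172.17.0.5" })
# prints "http://172.17.0.5:8080"
```

For a static nginx config the hostname route is the simpler one, because nginx does not expand environment variables in its configuration on its own.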
Rails App and Sidekiq Worker
Let's add something nginx can actually forward to: our precious Rails application. There is an official Rails Dockerfile that we will not use. It installs components we don't need or don't use, but worst of all it runs bundle install without the --deployment flag. Nevertheless, we can use it as guidance for our own variation:
Without Docker one would deploy such an app with Capistrano or something similar. Steps that would be done remotely on the server, like installing gems and copying the source "to the server", will now be done when building the Docker image. With this we have a container we can start anywhere, anytime and have exactly the state we had when we originally built the underlying image.
The Dockerfile for the Sidekiq worker is nearly the same. Even better than just duplicating it would be to define a custom base image used by both the Rails app and the Sidekiq worker. I will leave this as an exercise for the reader. If you come up with something cool, feel free to contribute to the GitHub project.
Docker expects only one Dockerfile but we need three in our case. I gave each of the Dockerfiles a meaningful suffix, but using them with Docker would mean renaming all those files by hand. Not an option. If only there were a tool that could help us with this. Oh wait, we could use Rake:
Both the web and the app build need to be performed with RAILS_ENV=production because we want the assets to be generated for production and not development. The -t flag specifies the repository name of the resulting image. This is required in the next step to upload the image.
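Such Rake tasks could look roughly like the following sketch. The Dockerfile suffixes (web, app, worker), the repository prefix, the image names and the task names are assumptions of mine; adapt them to your project. Instead of renaming files, the sketch passes the Dockerfile with -f, which docker build supports in newer Docker versions. Asset precompilation is left out to keep it short.

```ruby
# Components that need a custom image; each has a Dockerfile.<component>.
# The suffixes are assumed names, not taken from the original project.
COMPONENTS = %w[web app worker].freeze

# Hypothetical repository prefix; replace it with your own registry/user.
REPOSITORY = "registry.example.com/yourname"

# Image name passed to `docker build -t` and later used for the upload.
def image_tag(component)
  "#{REPOSITORY}/gh-recommender-#{component}"
end

# Argument list for building one component from its own Dockerfile.
def docker_build_command(component)
  ["docker", "build", "-f", "Dockerfile.#{component}", "-t", image_tag(component), "."]
end

# Inside a Rakefile, Rake is loaded and the tasks below get defined.
# Outside of Rake this part is skipped and only the helpers exist.
if defined?(Rake)
  namespace :docker do
    namespace :build do
      COMPONENTS.each do |component|
        desc "Build the #{component} image"
        task component do
          sh(*docker_build_command(component))
        end
      end
    end
  end
end
```

With this in the Rakefile, a build would be started with something like RAILS_ENV=production rake docker:build:app.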
Moving to the Cloud
Up to this point we have a very elaborate local environment. This is really nice, but what we really want is this exact environment out in the wild. At least that's what was promised to us, wasn't it?
One could configure any server to run our shiny Docker-based application. However, this would leave us with all kinds of additional challenges: linking containers together, scaling individual containers, managing containers across multiple servers, and many more. Luckily, you can just use Giant Swarm, where all of this (and more) is taken care of. The first thing you need is to request an invite. After you have been invited, installed the swarm CLI, and set up your local machine, the first thing to do is to create a swarm.json file:
Here you define your entire application and how its components relate to each other. If you recall the nginx configuration from before, where we used http://rails-app:8080 as the backend address, this is where we define it. The component rails-app will be linked to the component nginx, and the route to the appropriate host will be made available under the name rails-app. The very same is true for the REDIS_URL, which relates to the redis component.
Since we don't want to put sensitive information (like the GitHub OAuth2 access tokens) into our swarm.json, we can create a dedicated file named swarmvars.json to define those:
We can refer to those variables in the swarm.json, with $github_key for instance. When running the application on the Giant Swarm infrastructure, each Docker container will be started with the appropriate --env options. To make everything accessible from the outside we need to assign a domain to at least one component. Since nginx is our entry point, we configure the domain in this component.
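On the application side this means configuration simply arrives as environment variables. Below is a minimal sketch of reading them at boot and failing early when one is missing. Only REDIS_URL appears in this article; the names GITHUB_KEY and GITHUB_SECRET, the example values and the helper load_settings are assumptions for illustration.

```ruby
# Settings the app expects in its environment. Only REDIS_URL is taken
# from the article; the GitHub variable names are hypothetical.
REQUIRED_SETTINGS = %w[REDIS_URL GITHUB_KEY GITHUB_SECRET].freeze

# Reads all required settings from the given environment and raises a
# descriptive error if any of them is missing or empty.
def load_settings(env = ENV)
  missing = REQUIRED_SETTINGS.select { |key| env[key].to_s.empty? }
  unless missing.empty?
    raise KeyError, "missing environment variables: #{missing.join(', ')}"
  end

  REQUIRED_SETTINGS.map { |key| [key.downcase.to_sym, env[key]] }.to_h
end

# Example with a plain hash standing in for the container environment.
settings = load_settings({
  "REDIS_URL"     => "redis://redis:6379",
  "GITHUB_KEY"    => "example-key",
  "GITHUB_SECRET" => "example-secret"
})
puts settings[:redis_url]
# prints "redis://redis:6379"
```

Failing at boot is a deliberate choice here: a container that stops immediately with a clear message is easier to spot than one that breaks on the first login attempt.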
Before we can finally start the application we first need to upload our custom images to the Giant Swarm registry (of course you could push them to the Docker registry too, but maybe you don’t want your app publicly available):
This will take some time depending on your internet connection. But as soon as this has finished, you can finally start all of this with one single command:
$ swarm up
Behind the scenes this will fetch all required images from the registries (the Docker and the Giant Swarm one), run each container with the appropriate options, collect all the logs from the containers, and make the app available under http://gh-recommender.gigantic.io. That's pretty neat!
If you made it this far: Congratulations and thanks for the endurance.
At this point the application will use a single container for each component. Now suppose your application attracts more users, or an unforeseen event requires more resources. Traditionally, adding servers was a cumbersome task and took at least a couple of manual steps: booting up machines, provisioning them, and adding them to the load balancer or the like. With Giant Swarm, adding another instance of the Rails application is as easy as this:
$ swarm scaleup github_recommender/gh-recommender/rails-app
While this removes a lot of the technical burden of scaling your application, it does not make your app itself magically suitable for horizontal scaling. And when it comes to the database layer, things become a whole lot more complicated. You still need to invest in making your application scalable. But at least now you can focus on this part and stop worrying about the infrastructure details.
In the end this article can only cover so much. There is so much more to discuss and learn on this topic. I hope I could give you at least a head start, though. If you want to go further, here are some suggestions for topics we have not covered but are nevertheless relevant:
I said at the beginning containers can be used for local development too. And they even should be used for this. But we didn't cover how this can be accomplished.
One big issue is debugging of containers, both locally and in production. As with many other aspects there is no silver bullet, yet, and maybe never will be. At the same time it is something one should care about early on.
Security is a big issue in the Docker world. While relying on Giant Swarm helps a bit, one has to get familiar with the security implications of using containers and Docker. And I'm not talking about security flaws per se, but about the general difference from traditional infrastructure setups like VMs or managed servers.
Additionally, I strongly suggest you build your own images and not rely on the public ones. You will end up with way too many differences. For instance, the five containers from this example rely on three different Linux distributions.
Further, we tried to run just one process per container. While this is the official recommendation from Docker, it is still controversial. Both views on the topic are valid in my opinion. You should definitely have a look into this and form your own opinion about it.
Don't start with microservices. Start with Docker. Get familiar with the tooling, paradigms around it, and how it feels working with it. How do things fail? What impact does it have on existing tooling and processes? You should tackle those first before cutting your app into small pieces and deploying those pieces all over the place. I strongly suggest a step-by-step approach on this.
I don't think everybody will or even should switch to using Docker in the near future. But it is a very interesting piece of technology, and containerization itself is most likely here to stay. It is currently stable enough to start using it even if you’re not a very early adopter. Learning the technology on non-critical applications is something one should start today rather than tomorrow. And having an infrastructure provider like Giant Swarm definitely helps with operating your containers online and at scale.