The History and Future of Hosting
• Oct 2, 2014
"I think there is a world market for maybe five computers." -- Attributed to Thomas Watson, chairman of IBM, 1943
To really understand what Giant Swarm is doing, it is important to look at the industry we are in and where it has come from. The future is often meaningless without understanding the past. I am talking about the hosting market: how we handled all those computers that were not on our desks but ran special tasks or delivered all the content out there on the Internet.
We actually started out before the Internet with mainframes: hugely expensive machines with very specialized use cases. But that is going too far back. Let's start with the Internet.
"640K ought to be enough for anybody." -- Attributed to Bill Gates, 1981, but believed to be an urban legend.
The basics of what we now know as the Internet were laid at the Advanced Research Projects Agency (ARPA) in the 60s, when the US Department of Defense awarded contracts for a packet network system. If you are interested in the details, I highly recommend Katie Hafner and Matthew Lyon's book Where Wizards Stay Up Late: The Origins of the Internet. Out of that effort, the National Science Foundation Network (NSFNET) emerged in the 1980s to connect national supercomputing centers at universities. It was decommissioned in 1995, removing the last restrictions on using the Internet to carry commercial traffic. In those early commercial days, having servers meant hosting them yourself, in your own office, and they were very expensive. It was the time of Sun servers running Solaris, with Oracle as the database. Everything was proprietary and came with big licensing fees; starting a company could mean millions in infrastructure costs.
The funniest thing I can remember was when there was a huge storm in May of ‘95, and the power grid went down for a few days. We had to go rent a power generator and take turns filling it with diesel fuel for 4 days. 24/7. We were laughing, "How many pages to the gallon today?" It was a crazy storm and it also started leaking in our building. We had meetings by candlelight with a bunch of prominent companies. They walk in; there are no lights; there are cords running everywhere leading to the generator out back; water dripping from the ceiling. We were trying to convince them, “Oh, yeah, we’re a real business,” when you say, "Hold on, I gotta go fill up the tank." So I remember that set of days pretty vividly.
-- Tim Brady (Yahoo, founded January 1994), Founders at Work
The first tidal wave of change in the hosting market was already emerging with Linux, a free operating system kernel started in 1991 by Linus Torvalds. In 1996 Linux got its mascot, Tux the penguin, and really started to gain traction. A large part of that was very likely that Red Hat and other open source companies raised huge funding rounds and went public, which gave them the credibility of businesses that would not go away. 1998 saw the production release of MySQL 3.21, a free database system, which also gained wide adoption quickly. Free software had been around since the 80s, but now software went from free as in beer to free as in speech, at scale. You could not only use software at no cost, but use it however you wanted, even commercially, look into its source code, adapt it if needed and, at best, contribute back.
This made cheap server boxes possible: commodity hardware, e.g. based on Intel chips, running open source software. It totally changed the landscape of hosting. In the mid-to-late 90s you were still hosting your own servers most of the time, but you were no longer running expensive Solaris and Oracle installations; you were running Linux and MySQL. At the same time hosting centers appeared, and one of the big ones was Exodus Communications, founded in 1994 and public since 1998. After their IPO they went on a buying and building spree, and they went up in flames in late 2001, but by then they had built huge data centers and put immense amounts of fibre in the ground. As we now know, the growth their customers were predicting never materialized: the data centers lay empty and the cables stayed dark in the ground. We can be thankful for that boom and bust, because the huge amounts of money pumped into the field gave us a great infrastructure that was put to good use in the following years. That period cemented the data center as the central way to host your systems.
It was still bare metal boxes though, whether under your control or managed by a partner. In 2001, VMware, then three years old, entered the hosting market and brought with it the idea of virtual servers. Other virtualization technologies soon followed: Xen started in 2003, and in 2007 KVM (Kernel-based Virtual Machine) was merged into the Linux kernel, giving virtualization the last push it needed for widespread adoption.
Virtualization allowed for more efficient use of the underlying hardware. It also introduced a first, thin abstraction layer: system administrators could give you a server while keeping at least partial control over the hardware itself, and could move that virtual server to another machine if needed. Servers could be placed wherever free resources were available, so far less hardware sat idle. They were still billed like normal servers, though.
Infrastructure as a Service (IaaS) is born
In 2006, Amazon changed the game with Amazon Web Services (AWS), something they could do because the old hosting companies were asleep at the wheel. Servers were big business and the incumbents were making money hand over fist, as the high-performance cars all too prominent at hosting conventions made clear.
But Amazon's move was really born from internal troubles. They learned that their developers spent up to 50% of their time thinking about the hosting infrastructure of their services. This was a huge overhead, especially since Amazon is organized into two-pizza teams: teams small enough to be fed with two large pizzas. There are hundreds of teams inside Amazon, and the general rule is that every team builds its software and APIs on the assumption that they might be opened to the public. AWS enabled them to build the hosting part into their software, using APIs to start new servers and scale without having to worry about getting new hardware. That was somebody else's business. AWS remained an internal system until Amazon decided it could be interesting for the rest of the world, too.
The biggest change was that software could now start servers. You had an API you could integrate into your software stack, you could start and stop servers as needed, and you were billed by the hour. You still had to take care of the software on those servers and how they interacted, though: you booted an operating system with the right software on it and then had to administer both.
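Hourly billing is what made starting and stopping servers from software economical. The sketch below shows the arithmetic under two stated assumptions: the price per server-hour is a made-up number, not an actual AWS rate, and any started hour is billed as a full hour (as EC2's per-hour billing worked at the time). The function names are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical on-demand price in USD per server-hour (not a real AWS rate).
HOURLY_RATE = 0.10


def billed_hours(runtime_minutes: int) -> int:
    """Hours billed for one server run, assuming a started hour counts as a full hour."""
    return math.ceil(runtime_minutes / 60)


def total_cost(runs_in_minutes: Sequence[int], rate: float = HOURLY_RATE) -> float:
    """Total cost of several independent server runs, each billed separately."""
    return sum(billed_hours(m) for m in runs_in_minutes) * rate


# A nightly batch job: start 10 servers via the API, run them for 90 minutes, stop them.
nightly_runs = [90] * 10
print(total_cost(nightly_runs))  # 10 servers x 2 billed hours = 20 server-hours
```

The point of the design is that you pay for 20 server-hours a night instead of owning 10 machines around the clock; with the older model, that capacity would have been bought up front and sat idle most of the day.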
The next step: Platform as a Service (PaaS)
In 2007/08, Heroku came along and created another abstraction layer. Billing was no longer based on servers but on processes. Taking care of the operating system and the installed software was no longer your concern. You simply pushed your Ruby code (other languages were added later) into their system, chose a database, and Heroku took care of the rest. The Platform as a Service (PaaS) space was born. This also meant that you had no control over exactly which software was installed or where it ran. The database you were using could be miles away from your web servers. If you needed a particular version of some piece of software, you were out of luck. The ease of use came at a price.
Modern Software Development
Over time, software got more and more complicated, and concepts like microservices, twelve-factor apps and polyglot persistence took hold. This meant you no longer had one monolithic system, but specialised parts using specialised software, for example for logging, user authentication, statistics, product listings and so forth. Amazon is a great example here.
It allowed teams to work independently, with the different parts of a platform woven together through APIs. It also meant that each team could use the right technology stack for its use case, building much more efficient systems. You can see this again and again with big players such as Facebook and Twitter releasing open source versions of their specialised internal systems and components. At the same time, specialised tools like RabbitMQ are gaining adoption for passing asynchronous messages between decoupled systems. The systems we build to run Internet properties are getting a lot more powerful, but also a lot more complex. You could start on a PaaS, but sooner or later you outgrow it and need more control over the underlying systems. Most of the time that means moving into the IaaS space and taking care of servers again, which is exactly what you never wanted to do.
A better abstraction layer
We needed a better abstraction layer. FreeBSD had long had jails, and the Linux camp was working on LXC. The idea of containers is that all you really want is your own small room in the house: the infrastructure of the house (for containers, the host's kernel) is shared again and again, but you have full freedom within your room. Think of Docker as a way to easily add, change and move rooms in an existing house. Compare that to virtual servers, where you stack house upon house.
You get the best of both worlds, IaaS and PaaS, at least once new startups like our own build platforms on top of containers. You get a programmatic interface to run your services, with each service packaged as a pre-built, shareable Docker container. That gives you full flexibility to build your container with exactly what you need, while adding an abstraction layer that lets other people take care of the underlying hardware. The tools are there; the platform just needs to be built.
There is one big player in the space that has used containerization for a long time already: Google. Then, in 2013, Docker came to light, built out of the ashes of dotCloud. It added a lot of interesting things to containers, especially a quick workflow for customizing them: developers can easily build their own containers, while administrators benefit from the higher density. You might be able to put dozens of virtual machines on a server, each running its own operating system, but you could fit thousands of containers on the same server, and they start almost instantly. Containers are just isolated processes that, thanks to the layered architecture, share large parts of their filesystem. So if a server can run 100 MySQL processes, running those same 100 processes in separate containers makes hardly any difference. Compared to virtual machines, containers need a lot less space, have no virtualization overhead and are fully portable.
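To make the layered architecture a bit more concrete, here is a toy model in Python. It is an illustration under simplifying assumptions, not how Docker's storage drivers are implemented: files are plain strings in dictionaries, deletions are not modelled, and the class name, image contents and paths are hypothetical. It shows why 100 containers from one image cost little extra: the read-only image layers exist once, and each container only adds a thin writable layer for its own changes (copy-on-write).

```python
from typing import Dict, List, Optional


class ContainerFS:
    """Toy model of a layered, copy-on-write container filesystem (hypothetical)."""

    def __init__(self, image_layers: List[Dict[str, str]]) -> None:
        # Read-only image layers, bottom layer first. Every container started from
        # the image holds a reference to the same list, so it is stored only once.
        self.image_layers = image_layers
        # Thin writable layer that belongs to this container alone.
        self.writable: Dict[str, str] = {}

    def read(self, path: str) -> Optional[str]:
        # The container's own changes take precedence over the image.
        if path in self.writable:
            return self.writable[path]
        # Otherwise search the image from the top layer down.
        for layer in reversed(self.image_layers):
            if path in layer:
                return layer[path]
        return None

    def write(self, path: str, data: str) -> None:
        # Copy-on-write: changes never touch the shared image layers.
        self.writable[path] = data


# A hypothetical MySQL image: a base OS layer plus a layer adding the database server.
base_layer = {"/bin/sh": "shell"}
mysql_layer = {"/usr/sbin/mysqld": "mysqld binary", "/etc/mysql/my.cnf": "port=3306"}
image = [base_layer, mysql_layer]

# 100 "MySQL containers" share one copy of the image; each adds only its own changes.
containers = [ContainerFS(image) for _ in range(100)]
containers[0].write("/etc/mysql/my.cnf", "port=3307")
```

Only `containers[0]` sees the changed configuration; the other 99 still read the image's original file, and the image itself is untouched. Real storage drivers do the same kind of lookup at the filesystem level and also handle deleted files.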
The question until recently was whether Docker could become The Containerization Standard. After raising a total of $65 million in the last 9 months, the company is here to stay, and partners are flocking into the ecosystem. Docker has now shown the world that it has the stamina to become the standard for containerization. It has not become the standard through traction alone, even though there are early signs of that, but through sheer force of will… and money.
A recent video from the team behind New Relic is a good example of Docker's power: they have seldom seen internal teams adopt a new technology so enthusiastically, and they have already gained huge time savings and simplified their setup thanks to Docker.
The advantages do not stop at developer productivity; they extend to a much more robust interface and abstraction towards the operations side of running a big platform. Operations people have always known containers, but they never received them willingly and directly from the developers themselves.
What containerization enables is a largely automated and standardized data center, abstracted away from the user, giving developers reliability and control without having to care about the underlying hardware. At the same time, this is where things remain complicated. Running one Docker container is easy; running a fleet of them across different hardware instances with full orchestration is still out of reach for many.
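Even the first step of orchestration, deciding which host runs which container, is a bin-packing problem. The sketch below is a toy first-fit-decreasing scheduler that only looks at memory. It is not how Giant Swarm or any particular orchestrator works; the host names, memory sizes and the `schedule` function are hypothetical, and real orchestration also has to handle CPU, networking, storage, failures and rescheduling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    free_mem_mb: int
    containers: List[str] = field(default_factory=list)


def schedule(wanted: Dict[str, int], hosts: List[Host]) -> Dict[str, Optional[str]]:
    """First-fit decreasing: place each container (name -> memory in MB) on the
    first host with enough free memory. Returns container -> host name, or None."""
    placement: Dict[str, Optional[str]] = {}
    # Placing the largest containers first usually packs hosts more tightly.
    for name, mem in sorted(wanted.items(), key=lambda kv: kv[1], reverse=True):
        target = next((h for h in hosts if h.free_mem_mb >= mem), None)
        if target is not None:
            target.free_mem_mb -= mem
            target.containers.append(name)
        placement[name] = target.name if target is not None else None
    return placement


hosts = [Host("host-a", 2048), Host("host-b", 1024)]
wanted = {"web-1": 512, "web-2": 512, "mysql": 1536, "redis": 256, "worker": 768}
placement = schedule(wanted, hosts)
# web-2 gets no host: there is 768 MB free in total at that point, but no single
# host has 512 MB left, which is exactly the kind of problem an orchestrator must solve.
```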
This is where Giant Swarm comes in. We want to be the full-stack startup in the Docker ecosystem, giving developers a readily usable tool to finally stop thinking about the hosting part of their apps. They should not have to think about how it runs, just know that it will. They should be able to focus on their services and apps and simply push them somewhere, with everything else taken care of, while still having full control over all aspects of their stack.
Current data centers aren't ready for this yet, but tools are being built to make them ready, and slowly but surely the underlying hardware will move towards the ideal stack for a Docker ecosystem. Projects like Flannel, Weave, Flocker and Quobyte are showing the way. Until software-defined networking and software-defined storage have prevailed in data centers and the underlying hardware has completely disappeared behind them, we will see a continuous stream of new products that simply work around the lack of APIs in data centers. We are building the software layer, integrating the right tools and building our own, to enable this future and free up the time of sysadmins and developers alike.
Now is the perfect time: we have a clear shot at building the future of hosting, the Developer Dreamland, because once again the old guard will stay complacent until it is too late.