With all the incredible hype around Docker, it’s remarkable that I managed to avoid learning it until recently. For the longest time, I didn’t have a solid grasp of what it does or why it exists. It turns out I’m not alone: many fellow developers also struggle to understand what Docker does and how it can make their lives easier.
As I was browsing Hacker News this morning, I came across a fantastic explanation in the comment section by bkanber. Here it is:
So, you build a web app and it gets popular. It needs one load balancer, 5 app servers, at least two database nodes for replication, a Redis cluster for caching and queuing, an Elasticsearch cluster for full-text search, and a cluster of job-worker servers to do async stuff like processing images, etc.
In the ancient past, like when I’m from, you’d write up a few different bash scripts to help you provision each server type. But even with those scripts, you’d still have to run around creating 20 servers and provisioning each one into one of 5 different types, etc.
Then there’s Chef/Puppet, which take your bash scripts and make them a little more maintainable. But there are still issues: a huge divide between dev and prod environments, and adding 5 new nodes ASAP is still tedious.
Now you have cloud and container orchestration. Containers are like the git repos of the server world. You build a container image for each of your apps (nginx, Redis, etc.), configure each once (like coding and committing), and then they work identically on dev and prod after you launch them (you can clone/pull onto hardware). What’s more, since a container image is pre-built, it launches on metal in a matter of seconds, not minutes. All the `apt-get install` crap was done at image build time, not container launch time.
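To make the “build time vs. launch time” point concrete, here is a minimal, illustrative Dockerfile for the Redis example above — the base image, version, and flags are my own assumptions, not something from the comment:

```dockerfile
# Illustrative sketch only — assumes the ubuntu:22.04 base image.
FROM ubuntu:22.04

# The slow part (package installs) runs ONCE, when the image is built:
RUN apt-get update \
    && apt-get install -y --no-install-recommends redis-server \
    && rm -rf /var/lib/apt/lists/*

# Launching a container from this image just starts the process —
# no installs happen at container start, which is why launch is fast.
CMD ["redis-server"]
```

You’d build it once with something like `docker build -t my-redis .`, and then `docker run -d my-redis` starts an identical container on a dev laptop or a prod host in seconds.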
Things are a lot easier now, but you still have a problem. You’re scaling to 30, maybe 50 different servers running 6 or 7 different services. More and more you want to treat your hardware as a generic compute cloud, but you can’t escape the fact that, even with Docker, your servers have identities and personalities. You still need to sit and think about which of your 50 servers to launch a container on, make sure it’s under the correct load balancer, etc.
That’s where Kubernetes steps in; it’s a level of abstraction higher than Docker, and it works at the cluster level. You define services around your Docker containers and let Kubernetes initialize the hardware and abstract it away into a giant compute cloud. Then all you have to do is tell Kubernetes to scale a certain service up or down, and it automatically figures out which servers to take the action on and modifies that service’s load balancer accordingly.
At the scale of “a few servers”, Kubernetes doesn’t help much. At the scale of dozens or hundreds, it definitely does. “Orchestration” isn’t just a buzzword; it’s the correct term here: all those containers, services, and pieces of hardware DO need to be wrangled. In the past that was a full-time sysadmin job; now it’s just a Kubernetes or Fleet config file.
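As a sketch of what that config file might look like, here is a hypothetical Kubernetes Deployment and Service for the web app described above — the names, image, and ports are all illustrative assumptions:

```yaml
# Illustrative only: a hypothetical "webapp" service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 5                 # Kubernetes decides which nodes run these
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: example/webapp:1.0   # hypothetical image name
        ports:
        - containerPort: 8080
---
# The Service is the stable, load-balanced entry point for the pods.
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
  - port: 80
    targetPort: 8080
```

Scaling from 5 to 10 app servers is then one command, e.g. `kubectl scale deployment/webapp --replicas=10`; Kubernetes picks the machines and keeps the load balancer in sync.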
Disclosure: I’m currently writing a book on Docker. Disclaimer: I have not had my coffee yet.
Edit: Since someone asked, I’m writing a book called “Complete Docker” which will be published by Apress. I don’t know the exact pub date that Apress will launch it on, but I expect it’ll be available in October.
If you’re curious, here is the entire thread.
I’ve been a long-time fan of the Hacker News comment section. I just discovered that there’s a collection of the “best comments.” No idea who selects these, but it’s a great read!