I am using Kubernetes instead of Docker Swarm to orchestrate Docker containers, so the points mentioned in this article do not apply to me. My cluster is small - I have <100 machines at peak - but so far Kubernetes feels ready for prime time.
There are parts of Docker that are relatively stable, have many other companies involved, and have been around for a while. There are also "got VC money, gotta monetise" parts that damage the reputation of the stable ones.
> My cluster is small - I have <100 machines at peak
Let me ask: WTF are people doing such that <100 machines counts as a "small cluster"? I ran a Top 100 website (as measured by Quantcast) with 23 machines, and that included the whole kit and caboodle -- Dev, Staging and Production environments. And quite a few of those were just for HA purposes, not because we needed that much capacity. StackExchange also runs on about two dozen servers. Yes, yes, Google and Facebook run datacenters, but there's a power-law distribution here, and the iron needs fall very, very fast as you move from, say, the Top 1 website to the Top 30.
The number of machines you need to run a service is not really a linear function of your traffic. If you have a mostly static website that can be heavily cached/cdn'd, you can easily scale to thousands of requests a second with a small server footprint. I expect that's true of many of the top 100 sites as measured by visitors (like Quantcast does).
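To make that concrete, here's a back-of-envelope sketch; the per-server throughput figures are illustrative assumptions, not measurements of any real site:

```python
import math

def servers_needed(peak_rps, rps_per_server, headroom=2.0):
    """Servers required at peak, with a safety factor for HA and traffic spikes."""
    return math.ceil(peak_rps * headroom / rps_per_server)

# A mostly static, heavily cached/CDN'd site: one box can serve thousands of rps.
static_site = servers_needed(peak_rps=5000, rps_per_server=5000)

# CPU-intensive dynamic work: perhaps only ~50 rps per box.
dynamic_site = servers_needed(peak_rps=5000, rps_per_server=50)

print(static_site, dynamic_site)  # 2 200
```

Same traffic, two orders of magnitude difference in footprint - which is why "number of machines" says little about a site's popularity.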
But if you need to store a lot of data, look up data with very low latency, or do CPU-intensive work for every request, you will end up with a lot more servers. (The other thing to consider is that SaaS companies can easily see more traffic than even the largest websites, because they tend to aggregate traffic from many sites; Quantcast, for example, where I used to work, got hundreds of thousands of requests per second to its measurement endpoint.)
Not everyone can afford to scale vertically. Those ~24 servers of StackExchange's together cost more than the average 100-machine 'small' cluster. At DigitalOcean the performance sweet spot probably lies at the $20/mo machine, so that's $2000/mo for a 100-machine cluster. You think StackExchange pays less than that for its hardware?
Also, some sites are simply larger than StackExchange, and you never heard of them. There's a huge spectrum between StackExchange and Google.
There's some truth in that. The site I mentioned was, in 2010, running DB servers with 144GB of RAM -- unlike SE we rented servers, since the colo numbers didn't look good.
I have a processing pipeline that does crawling, indexing and some analytics on the crawled data - I am building a vertical search engine catering to some specific kinds of companies.
Usually my cluster is very small unless a pipeline run is in progress. At this very moment I am not doing any processing, so I only have 3 machines up.
- If you don't use Google Container Engine (hosted Kubernetes, also known as GKE), Kubernetes has a reputation for being hard to set up. I am running on GKE, so I can't comment much on that.
- It's hard to see how all of the orchestration/docker/etc. pieces play together when entering the space. I expect many people hear "docker" and just want to try "docker", not being aware that alternatives exist for some parts, that some parts are more reliable than others, etc. E.g. the article we are discussing seems to be doing this.
Kubernetes seems very unwieldy and hard to wrap my head around. Swarm seems straightforward and easy to figure out how to configure.
Let me give it a try and see if I can find a good Kubernetes tutorial that does what I want. (I haven't tried this experiment in a couple of months, since before Nomad and Swarm got my interest.)
-----
Ok, I'm back. I went to kubernetes.io. Their "give it a try" flow has me creating a Google account and getting set up on Google Container Engine. Due to standard-issue Google account hassles, I quickly got mired in quicksand having nothing to do with Kubernetes.
I have no interest in Google Container Engine. Let me set it up using Vagrant, or my own VPSes, or as a local demo - whatever, I'll set up VirtualBox VMs.
I found the Minikube getting-started guide to be a good experience about a month ago. If you made Minikube the default for "give it a try", that would instantly gain k8s more credibility in my eyes: it calls attention to the excellent tooling around k8s, diminishes the perception of GKE as the only first-class k8s environment (ahem, lock-in!), and promotes the notion of an economical and fast k8s development environment.
It isn't directly linked from the front page, but they have a getting started section [0] that covers non-GKE options. I run k8s on AWS using the Kops tool.
Kubernetes definitely requires more learning before getting started than Swarm, but that's mostly because it has different, more powerful primitives and more features built in.
What are standard Google account hassles? The k8s getting-started guide is one of the single best cloud setup guides I've ever read. By the end you've deployed your app, and you can easily leverage the concepts listed there to deploy more complex apps.
K8s might seem more unwieldy than swarm, but from that feature set you can expect things to work the way they are explained.
Swarm on the other hand has made my entire team question whether 1.12 is even worth upgrading to.
Completely, 100% disagree that the getting-started guide is easy. Let's go through this by the numbers. First off, specifically comparing the guides: the Docker 1.12 stuff works ANYWHERE you have root access. When I go here, I'm totally confused:
Ok, let's see: not only do I immediately get diverted to another page, but it feels like not every OS+cloud combination is represented. I guess the CLOSEST thing to working is Ubuntu+AWS. Click.
YAY! Juju, a new technology I need to learn. Hey, guess what: this only works with Ubuntu. Closes browser. I spent weeks trying to map this out in my head. I can't understand why Kub doesn't just "install" like Docker does.
Ok, back to the deployment guide. Let's see, there's a GIANT TABLE OF LINKS based on cloud+OS+whatever. So I think it's a massive understatement to say that Kub is merely more unwieldy than Swarm 1.12.
The docs are more of an encyclopedia: a lot of facts, not a lot of editorializing. Unless you enjoy learning from encyclopedias, I would highly recommend talking to other members of the community on the kubernetes slack (kubernetes-novice, kubernetes-user or for AWS sig-aws). That way you can quickly find what has worked well for people and what hasn't, and hopefully save yourself a ton of time.
I would love for our docs to be better - good people are working on it, though documentation is always hard. In the meantime, the community is a wonderful resource!
You can now stand up a swarm with only a handful of commands, with TLS for the Docker API. Eventually, stable host-to-host container networking will have encryption out of the box too. The number of steps for a production Docker setup keeps shrinking; Kubernetes seems to be focused on this aspect now, so it might catch up. This all ignores the IaaS solutions, and, for the reasons stated in the article, Swarm doesn't see those benefits with 1.12, because major features that are supposedly stable do not work yet.
I used it because I didn't understand what I needed and wanted to stay in the Docker ecosystem to try to limit tooling issues. Kubernetes sounded like more than I needed given what Swarm offered. Turned out alright because I was able to learn a lot, but Docker tools aren't as great as they make them sound (nothing is production ready and devs are too constrained to improve anything). I've moved up to kube and feel like I would have saved a month if I'd gone with it in the first place.
I've used K8S before, and those things are smooth once installation happens. We have a project that is looking at using Docker Swarm in a two-node setup. The idea is that this will give us a baseline, and since it's looking more and more like we'll move to GKE, it sets us up for that later move. I thought Docker Swarm would be mature enough to handle two nodes for a short time (maybe half a year) until we make the leap to GKE. But based on the reports coming in from the wild, that's looking less and less likely.
We hired a contractor to do this part. My team is so resource-constrained that I don't have time for it, so we farmed it out. But now I'm thinking the risk of project failure is much higher than I thought, made worse by the fact that the contractor is also showing signs of poor communication skills. (I would rather be updated on things going wrong than have someone try to be the hero or cowboy and figure it all out.)
I went through a few iterations for logging, but I have now settled on the built-in GKE logging. Stdout logs from my containers are picked up by Kubernetes and forwarded to Stackdriver. Since it's just stdout, I don't create too much lock-in. I use the Stackdriver dashboard for investigating recent logs and the BigQuery exporter for complex analysis. My stdout logs are JSON, so I can export extra metadata without relying on regexps for analysis - I use https://pypi.python.org/pypi/python-json-logger.
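As a sketch of what that looks like, here is a minimal stdlib stand-in for what python-json-logger does (the field names are my own choice for illustration, not a spec):

```python
import json
import logging
import sys

# Attribute names every LogRecord has; anything else came in via `extra=`.
STANDARD = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message"}

class JsonFormatter(logging.Formatter):
    """Minimal stand-in for python-json-logger: one JSON object per line,
    so Stackdriver/BigQuery can index fields without regexps."""
    def format(self, record):
        payload = {"severity": record.levelname, "message": record.getMessage()}
        # Anything passed via `extra=` becomes a top-level JSON field.
        payload.update({k: v for k, v in record.__dict__.items() if k not in STANDARD})
        return json.dumps(payload)

logger = logging.getLogger("crawler")
handler = logging.StreamHandler(sys.stdout)  # stdout is what the kubelet captures
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits one JSON line with severity, message, pages and duration_s fields.
logger.info("crawl finished", extra={"pages": 1412, "duration_s": 88.2})
```

Because each record is a single JSON line on stdout, the downstream pipeline (fluentd, Stackdriver, BigQuery export) gets structured fields for free.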
Just as a hint in case you're not aware of this: If you log errors in the format[1] expected by Stackdriver Error Reporting, your errors will automatically be picked up and grouped in that service as well.
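For reference, an error event has roughly this shape - a `serviceContext` identifying the service plus a `message` that carries the full stack trace. Treat the field names here as my reading of the docs linked above and double-check them there:

```python
import json
import traceback

def format_error_event(exc, service="crawler", version="1.0"):
    """Sketch of a Stackdriver Error Reporting payload: `message` must
    contain the stack trace; `serviceContext` identifies the service.
    (Illustrative only - verify field names against the official docs.)"""
    return json.dumps({
        "serviceContext": {"service": service, "version": version},
        "message": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    })

try:
    1 / 0
except ZeroDivisionError as e:
    print(format_error_event(e))  # one JSON line, ready for Error Reporting pickup
```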
Since we are not on GKE, I want to be able to use k8s to forward logs to my host machine's journald, and that's broken. I think Google is doing a lot of handholding to get it working with Stackdriver.
This is the blocker for me. I can't switch to GKE because I use AWS PostgreSQL. But I want to use k8s :(
The K8S project I was involved in last year used AWS PostgreSQL too. At that time, figuring out how to have persistent data was too much. Further, the AWS EBS driver for K8S storage wasn't there, and PetSets had not come out yet. With PetSets out, I think figuring out how to run a datastore on K8S or GKE will be easier. (I had mentioned to my CTO that I didn't know how to do persistent storage on GKE; he pointed out the gcloud datastore; I told him that was not what I meant ;-)
The owner of the company ran out of money before I could add logging, but my plan was to get it out to something like papertrail.
I was on AWS. I sidestepped the issue by using AWS RDS (postgresql).
I had tried to get the nascent EBS stuff working, but when I realized that I'd have to get a script to check if an EBS volume was formatted with a filesystem before mounting it in K8S, I stopped. This might have been improved by now.
I probably wrote that support (or at least maintain it), and you shouldn't ever have needed to add a script to format your disk: you declare the filesystem you want and it comes up formatted. If you didn't open an issue before please do so and I'll make double-sure it is now fixed (or point me to the issue if you already opened it!)
On the logging front, kube-up comes up with automatic logging via fluentd to an ElasticSearch cluster hosted in k8s itself. You can relatively easily replace that ES cluster with an AWS ES cluster (using a proxy to do the AWS authentication), or you can reconfigure fluentd to point at AWS ES. Or you can pretty easily set up something yourself using daemonsets if you'd rather use something like Splunk, but I don't know if anyone has shared a config for this!
A big shortcoming of the current fluentd/ES setup is that it also predates PetSets, and so it still doesn't use persistent storage in kube-up. I'm trying to fix this in time for 1.4 though!
If you don't know about it, the sig-aws channel on the kubernetes slack is where the AWS folk tend to hang out and work through these snafus together - come join us :-)
@justinsb - based on the bug link I posted above, what do you think is the direction that k8s logging is going to take?
From what you wrote, it seems that lots of people consider logging in k8s a solved issue. I'm wondering why there is a detailed spec for all the journald stuff, etc.
From my perspective, it would be amazing if k8s could manage and aggregate logs on the host machine. It's also a way of reducing the complexity of getting started: people starting with 1-2 node setups begin with local logs before tackling the complexity of fluentd, etc.
I'm not particularly familiar with that github issue. A lot of people in k8s are building some amazing things, but that doesn't mean that the base functionality isn't there today.
If you want logs to go into ElasticSearch, k8s does that today - you just write to stdout / stderr and it works. I don't love the way multi-line logs are not combined (the stack trace problem), but it works fine, and that's more an ElasticSearch/fluentd issue really. You'll likely want to replace the default ES configuration with either one backed by a PersistentVolume or an AWS ES cluster.
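A quick illustration of that stack trace problem: a plain-text formatter turns one logical error into many physical lines, which a line-oriented shipper such as fluentd tailing stdout sees as separate events unless it is configured to combine them.

```python
import io
import logging

buf = io.StringIO()
logger = logging.getLogger("stacktrace-demo")
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.propagate = False

try:
    {}["missing"]
except KeyError:
    # One logical event, but the traceback is emitted as multiple lines.
    logger.exception("lookup failed")

lines = buf.getvalue().splitlines()
print(len(lines))  # several physical lines for a single error
```

Logging the whole record as one JSON line (as discussed above) sidesteps this, since the traceback becomes an escaped string inside a single event.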
Could it be more efficient and more flexible? Very much so! Maybe in the future you'll be able to log to journald, or more probably be able to log to local files. I can't see a world in which you _won't_ be able to log to stdout/stderr. Maybe those streams are redirected to a local file in the "logs" area, but it should still just work.
If anything I'd say this issue has suffered from being too general, though some very specific plans are coming out of it. If writing to stdout/stderr and having it go to ElasticSearch via fluentd doesn't meet your requirements today, then you should open a more specific issue I think - it'll likely help the "big picture" issue along!