Yes, the part "Some open hard problems/Configuration" was particularly interesting to me. Over time we evolved our tooling around k8s into a versioned (e)yaml + jq + rc.tpl.json => rc.json solution, with our own implementations of rolling updates (to make them dependent on readiness checks) and node evacuation - all in bash. While nifty, a wrong indent in a YAML file can still spoil the party and require debugging at the container level.
It had not occurred to me that a robust, typesafe approach to dealing with k8s configuration objects could be a good idea.
Mine isn't typesafe. But now that I'm thinking about it, that would be an interesting approach too. In Matsuri, debug options are available to show the manifest or the kubectl calls. I specifically don't use YAML or JSON to define the template; instead, it is generated programmatically from Ruby directly. This lets me use Ruby class inheritance and module mixins to manage everything. So no indent problems, though I sometimes run into specs that don't validate against Kubernetes.
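Roughly the idea, as a generic sketch (this is not Matsuri's actual API - class and method names here are made up for illustration):

```ruby
require 'json'

# Manifests are plain Ruby objects, so inheritance and mixins replace
# copy-pasted YAML blocks. A base class defines the shape once:
class Pod
  def name;  'base';      end
  def image; 'nginx:1.9'; end
  def env;   {};          end

  def manifest
    {
      'apiVersion' => 'v1',
      'kind'       => 'Pod',
      'metadata'   => { 'name' => name },
      'spec'       => {
        'containers' => [{
          'name'  => name,
          'image' => image,
          'env'   => env.map { |k, v| { 'name' => k, 'value' => v } }
        }]
      }
    }
  end
end

# A variant only overrides what differs - there is no indentation to
# get wrong, and shared structure lives in exactly one place.
class StagingPod < Pod
  def name; 'app-staging'; end
  def env;  { 'RAILS_ENV' => 'staging' }; end
end

puts JSON.pretty_generate(StagingPod.new.manifest)
```

The generated hash can then be fed straight to kubectl (or validated first) - although, as noted, nothing here stops you from emitting a spec Kubernetes rejects.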
The rolling update in Matsuri still uses kubectl rolling-update under the covers. What it adds is introspection to find the current revision numbers and the current image tag (if you do not provide them).
Kubernetes is integrated with a lot of IaaS providers to get you a block storage volume to persist your data on. Once you request a persistent volume for a container, Kubernetes provisions the volume and attaches it to the node where the container is scheduled. It is then mounted into the container (and formatted if empty). When the container is killed and restarted on another node, the volume moves with it to that node.
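In manifest terms that looks roughly like this - a claim plus a pod referencing it (names and sizes here are made up, and it assumes your cluster can actually provision and bind volumes for the claim):

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mysql-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.6
    volumeMounts:
    - name: data
      mountPath: /var/lib/mysql
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mysql-data
```

If the pod gets rescheduled, the claim (and therefore the underlying volume) follows it.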
Now when you want clustering of things like mysql/mongo/elasticsearch/rabbitmq/etc it's a bit more complex, b/c they bring their own sharding/clustering concepts, which you have to implement on top of Kubernetes. So you won't be able to simply scale mysql up via "kubectl scale rc --replicas=5"; you will have to implement a specific clustering solution, with five unique mysql pods, each with its own volume. For mysql there is "vitess", which is an attempt to build such an abstraction on top of Kubernetes.
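To make "five unique mysql pods with their own volumes" concrete, here is a sketch (in Ruby, since that's how I generate manifests anyway; names are made up, and the actual replication setup - server IDs, master/slave wiring - still has to be layered on top):

```ruby
require 'json'

# Instead of one rc scaled to 5, generate N distinct pods, each bound
# to its own persistent volume claim.
def mysql_pods(n)
  (0...n).map do |i|
    {
      'kind'       => 'Pod',
      'apiVersion' => 'v1',
      'metadata'   => { 'name' => "mysql-#{i}" },
      'spec'       => {
        'containers' => [{ 'name' => 'mysql', 'image' => 'mysql:5.6' }],
        'volumes'    => [{
          'name' => 'data',
          'persistentVolumeClaim' => { 'claimName' => "mysql-data-#{i}" }
        }]
      }
    }
  end
end

puts mysql_pods(5).map { |p| p['metadata']['name'] }.join(' ')
```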
> Two things which I never understood about using environment variables are how do you version control the changes and how do you manage these variables when you have more than just a handful of them?
We're doing this: the env vars are stored as a stage/container/key hierarchy in version-controlled eyaml files (YAML with encryption at the value level, nice for git diffs). At deployment the eyaml gets decrypted by ops or Jenkins and converted into a container env map (in our case a Kubernetes replication controller).
Additionally we tag deployed containers with the config's git hash to have reproducible deployments, which is actually pretty useful. (Again we leverage Kubernetes labels, but this principle could be applied to other orchestration tech, I guess.)
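A sketch of both steps (not our actual code - the hierarchy, helper names, and the "config-hash" label key are all made up for illustration): the decrypted eyaml becomes a Ruby hash, which is flattened into the container's env list, and the config's git hash is stamped on as a label.

```ruby
require 'json'

# stage/container/key hierarchy -> env list for one container spec
def container_env(config, stage, container)
  config.fetch(stage).fetch(container).map do |key, value|
    { 'name' => key, 'value' => value.to_s }
  end
end

# stamp the config's git hash onto the manifest as a label
def labelled_rc(rc, config_hash)
  rc['metadata'] ||= {}
  rc['metadata']['labels'] ||= {}
  rc['metadata']['labels']['config-hash'] = config_hash
  rc
end

config = {                       # decrypted eyaml, as a Ruby hash
  'staging' => {
    'web' => { 'DB_HOST' => 'db.staging.internal', 'POOL_SIZE' => 5 }
  }
}

env = container_env(config, 'staging', 'web')
rc  = labelled_rc({ 'kind' => 'ReplicationController' }, 'deadbeef')
puts JSON.pretty_generate('env' => env, 'labels' => rc['metadata']['labels'])
```

Given the label, "which config produced this deployment" is a single lookup against git.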
To a degree, I guess. On AWS there are a lot of services which would qualify as PaaS offerings. Even on Rackspace you get databases (and HA groups), load balancers (w/ SSL termination), queues (albeit the obscure OpenStack one) and Elasticsearch for logging (the latter is pretty heftily priced IMO, though).
So we're able to run quite a lot of stateless 12factor-like services on Rackspace (using kubernetes), RabbitMQ being the exception we have to manage ourselves (which sucks).
I guess those expectations are part of the problem; a talk at the recent OpenStack conf highlighted this: OpenStack was initially created for what these days is called "cloud-native" workloads, where every VM was considered ephemeral. Companies and users then try to mould it into a cheaper VMware and are frustrated by how bad it is at that.
Resizing and stopping VMs, while admittedly rather trivial tasks, point to this usage of OpenStack. If I mourned the loss of every individual VM on OpenStack (or public clouds, for that matter), I would turn gray soon.
We have a very nasty issue in Kubernetes with its userspace proxy leaking handles when a misbehaving workload doesn't close connections properly (e.g. Java InputStreams). Could this be related?
There is an open issue in which we came to more or less the same conclusion as the article (not a bug, but a feature of the TCP/IP protocol).
I am a bit puzzled why other people are not constantly bitten by this, though.
Official API support would be nice. We need to query repos for specific tags. We're using the unofficial API currently, but having our CD pipeline rely on that does not feel right.
Yeah, the HA information was buried/non-existent in the docs until recently, I guess. We also ran into issues when replacing the master, b/c the kubelets were fixed to an IP and the nodes' configuration was basically immutable (provisioned by cloud-config).
What worked for us: accessing the master by DNS and putting etcd on a persistent volume. This way we're able to replace the master within a DNS record's TTL. As the API server is not a hard requirement for the running workload, this is HA enough for us.
Specifically, some canonical instructions on how to harden a cluster would be helpful. Many getting-started guides have nodes talk to the API server over plain HTTP, so even deployed containers can do the same.
It took me a while to find a proper kubeconfig example for kubelet and kube-proxy token auth (the one I eventually found was buried in some GitHub issue, I think).
Also, I found no information on what to put in the authorization JSONL file for the kubelet (the given example is wrong, since the kubelet needs write access to report node status to the API) and kube-proxy. Peeking into the code helped, but I guess this information could be helpful for admins.
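For what it's worth, here is roughly what we ended up with - going from memory and from reading the code, so double-check against your Kubernetes version, since the ABAC policy format has changed between releases. It's one JSON object per line, and an unset field matches anything:

```json
{"user": "kubelet",    "resource": "nodes",     "readonly": false}
{"user": "kubelet",    "resource": "events",    "readonly": false}
{"user": "kubelet",    "resource": "pods",      "readonly": true}
{"user": "kubelet",    "resource": "services",  "readonly": true}
{"user": "kubelet",    "resource": "endpoints", "readonly": true}
{"user": "kube-proxy", "resource": "services",  "readonly": true}
{"user": "kube-proxy", "resource": "endpoints", "readonly": true}
```

The key point being the first line: the kubelet needs write access to nodes (and events), which a read-only policy - like the one in the example we found - silently breaks.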
Corekube is a somewhat complex template (and not currently supported by Kubernetes). Just attempt to roll your own; it's actually not that hard. You can use the official CoreOS "getting started" cloud-config files as a starting point. etcd & flannel are required; the rest is just wiring up a set of binaries.
Adding nodes is pretty straightforward: you just create a server with the proper cloud-config and it should auto-register with the master. A serious setup, however, involves using security groups/private networks and load balancers; also, Cinder is not supported as a volume backend yet.