
At scale?

Transparent HA means that I can fail over services to other regions without having to get the programmers to think about it. Most of the busywork at scale is managing state or, more correctly, recovering state from broken machines.

If I can make something else do that reliably, rather than engineer it myself, that's a win.

So much of the work of standing up a cluster (be it k8s or something else) is getting to the point where you can arbitrarily kill a datastore and have it self-heal.
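That "kill it and it self-heals" property usually comes down to a reconcile loop: observe desired vs. actual replicas, replace whatever's missing. A minimal sketch, where the `Cluster` class is a hypothetical in-memory stand-in, not any real scheduler's API:

```python
# Sketch of the reconcile loop a self-healing datastore sits behind.
# Cluster is a hypothetical stand-in; a real one talks to a scheduler.

class Cluster:
    def __init__(self, desired: int):
        self.desired = desired
        self.replicas = set(range(desired))
        self._next_id = desired

    def kill(self, replica_id: int) -> None:
        self.replicas.discard(replica_id)   # arbitrary failure injection

    def provision(self) -> None:
        # In a real system this is where the hard part lives: the new
        # replica has to recover state from surviving peers or a snapshot.
        self.replicas.add(self._next_id)
        self._next_id += 1

def reconcile(cluster: Cluster) -> int:
    """One pass: replace missing replicas; returns how many were added."""
    missing = cluster.desired - len(cluster.replicas)
    for _ in range(max(0, missing)):
        cluster.provision()
    return max(0, missing)

c = Cluster(desired=3)
c.kill(0); c.kill(2)           # "arbitrarily kill a datastore"
added = reconcile(c)
print(added, len(c.replicas))  # prints "2 3"
```

The loop itself is trivial; the engineering effort goes into making `provision()` restore state correctly, which is exactly the busywork the comment above is about.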

If you're talking about S3 partial updates, it's about cost and/or performance. If you're dealing with megabyte chunks, and you want to flip a few bytes across hundreds of thousands of them, that's going to eat into transfer costs.

Sure, you could chunk the files up even smaller, but then you hit access latency (S3 ain't that fast).
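The tradeoff is easy to put numbers on. S3 has no partial write, so editing any bytes means re-uploading the whole object; smaller chunks cut the transfer bill but the per-request cost becomes the floor. A back-of-envelope sketch, with illustrative prices (assumptions, not current AWS rates):

```python
# Back-of-envelope cost model for flipping a few bytes in S3 objects.
# Each edit re-reads and re-writes one whole chunk, since S3 has no
# partial write. All prices below are assumed for illustration.

PUT_COST_PER_1K = 0.005    # USD per 1,000 PUT requests (assumed)
GET_COST_PER_1K = 0.0004   # USD per 1,000 GET requests (assumed)
TRANSFER_PER_GB = 0.09     # USD per GB transferred (assumed)

def cost_of_updates(updates: int, chunk_mb: float) -> float:
    """Cost of `updates` few-byte edits, each rewriting one whole chunk."""
    transfer_gb = updates * chunk_mb / 1024
    request_cost = updates / 1000 * (PUT_COST_PER_1K + GET_COST_PER_1K)
    return request_cost + transfer_gb * TRANSFER_PER_GB

# A million small edits against 1 MB chunks vs 64 KB chunks:
print(f"1 MB chunks:  ${cost_of_updates(1_000_000, 1.0):,.2f}")     # transfer-dominated
print(f"64 KB chunks: ${cost_of_updates(1_000_000, 0.0625):,.2f}")  # request-dominated
```

With these numbers the megabyte chunks are dominated by transfer and the small chunks by request count, and the small chunks also pay the per-request latency on every read.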



I was referring to the notion that "failings" in the hypervisor layer like "hot swap, shared block storage, host movements, online backup/clone, live recovery, Highly available host failover" are a problem. At scale, I don't want my application to rely on any of that magic.

Reliability is always your problem, not something to be punted to another layer of the stack that lets you pretend stuff doesn't go wrong.


> Reliability is always your problem

yup, which is why relying on devs to engineer it is a pain in the arse. Having online migration is such a useful tool for avoiding accidental overloads when doing maintenance; it's also a great tool to have when testing config changes.

Currently I work at a place that has its own container spec and scheduler. This makes sense because we have literally millions of machines to manage, but that's an edge case.

For something like a global newspaper (where I used to work) it would be massive overkill. We spent far too long making K8s act like a mainframe, when we could have bought one 20 times over and still had change left for a good party every week. Or just used hosted databases and liberal caching.


Oh sure -- for piddly enterprise nonsense, having some VM yeeting magic to HA a thing that's not HA is .... yeah, I guess. Ideally in combination with tested backups for when the HA magic corrupts instead of protects, but such is life.

But that's not "at scale", that's just some Great Plains accounting app that's been dragged from one pickle jar to another.


A more canonical example:

In 2016 we had a 36k cluster. There was something like 2 PB of fast online storage, 48 PB of nearline, and two massive tape libraries for backup/interchange.

The cluster was ephemeral and could be reprovisioned automatically by netboot. However, the DNS/DHCP and auth servers were on the critical path, so we put them on a VMware cluster to keep them as close to 100% uptime as possible. Yes, they were replicated, but they were also running on separate HA clusters with mirrored storage. This meant that if we lost both of them, we could be running directly from a snapshot within a few minutes, or, if it was a catastrofuck, reload the config from git.

Now, we could have made our own DNS/DHCP servers, and/or Kerberos/LDAP/Active Directory, but that would have cost money and wasn't worth the time. Plus the risk of running your own with a small crew (fewer than 10 infra people) was way too high.

VMware gave you almost mainframe levels of uptime, if you did it right.




