Network partitions really do happen! They are often short, but if you can't recover from them, then you shouldn't call yourself a distributed system.
I am shocked at how fragile etcd is in this respect. I was hoping Docker Swarm was better, but I'm not surprised (alas) to find out that it has the same problem.
I'm about ready to build my own solution, because I know a way to do it that will be really robust in the face of partitions. It doesn't use Raft, and you probably shouldn't be using Raft either; I've seen lots of complaints about ZooKeeper too. I've done this before in other contexts, so I know how to make it work. But so have others, so why do people who don't know how to make it work keep reinventing the wheel?