This was a roughly six-month project for a single engineer working on it about 75% of their time, with help from other folks along the way for code reviews and the like. The first three months were research, planning, and implementation; the latter three were a very careful rollout and migration from the old system to the new one, followed by decommissioning the old system.
Thanks very much for the reply. Very useful. I think these kinds of details really help people in other organizations who might want to undertake similar projects.
Do queries to github.net stay internal or do you also sync github.net zones to Route53/Dynect ... just in case?
We have a similar setup with unbound and nsd (no need for powerdns for us). Even then it took a while to get right, because JVM apps in particular love to hang for no apparent reason on DNS lookups. You also need to specify -Dnetworkaddress.cache.ttl= and friends, since the JVM doesn't respect DNS TTLs by default.
Running unbound on every single machine has saved us a lot of downtime.
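For what it's worth, the JVM caching behavior mentioned above can also be adjusted programmatically. A minimal sketch, assuming the standard `networkaddress.cache.ttl` security properties; the TTL values here are arbitrary example numbers, not a recommendation:

```java
import java.security.Security;

// Sketch: configure the JVM's DNS cache so lookups are re-resolved
// periodically instead of being cached indefinitely. These properties
// must be set before the first hostname lookup in the process; the TTL
// values below are example numbers.
public class DnsCacheConfig {
    public static void configure() {
        // Cache successful lookups for 30 seconds
        Security.setProperty("networkaddress.cache.ttl", "30");
        // Cache failed lookups for only 5 seconds
        Security.setProperty("networkaddress.cache.negative.ttl", "5");
    }
}
```

With a local caching resolver like unbound on every host, a short positive TTL like this keeps lookups cheap while still picking up record changes.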
Nearly all of our internal zones stay internal and are not synced to an external provider. In a few cases we need to look up internal zones from outside our network, and those zones live both internally and externally.
We use the mysql backend and HTTP API; aside from a few small nits it has worked very well for our purposes thus far. Note that our authorities never see production traffic other than AXFRs from our "edge" hosts, so I can't say how well it works for other use cases.
What's the reason you chose MySQL over the bind backend when you're using the API anyway? I have to make a similar decision soon and I'm not really sure yet; any insight would be appreciated.
Full access (read and write) to the PowerDNS HTTP API requires one of their generic SQL backends (see https://docs.powerdns.com/md/httpapi/README/), such as MySQL. The bind backend only supports reading from the API; changes to zones would need to be made on the file system and/or using pdns_control. Beyond that, having all our records queryable via SQL has been nice for debugging and researching our own DNS records, types, and so on. Lastly, backends like the MySQL one allow for things like auto-generated serials and comments attached to the DNS data.
This is really cool work. In a previous lifetime I worked with a team that implemented an ECMP hashing scheme using a set of IPs kept alive by VRRP, so I have a bit of familiarity with the space and a few questions.
The article says the L4 layer uses ECMP with consistent/rendezvous hashing. Is this vendor-implemented, or implemented by you using OpenFlow or something similar? How does graceful removal at the director layer work? I know you would have to start directing incoming SYNs to another group, but how do you differentiate non-SYN packets from connections that started on the draining group vs. ones that started on the new group?
If you are using L4 fields in the hash, how do you handle ICMP? This approach could break PMTU discovery, because an ICMP Fragmentation Needed packet sent in response to a message destined for one of your DSR boxes might hash to a different box, unless special considerations have been made.
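For readers unfamiliar with the rendezvous (highest-random-weight) hashing mentioned in the question, here's a minimal sketch of the idea. The backend names, flow-key format, and hash mixing are illustrative assumptions, not GLB's actual implementation:

```java
import java.util.List;

// Rendezvous hashing sketch: each (backend, flow) pair gets a score and
// the flow is sent to the highest-scoring backend. Removing a backend
// only remaps the flows that were on it; every other flow keeps its
// backend, which is the property that matters for long-lived connections.
public class RendezvousHash {

    // Hypothetical score function: combine backend id and flow key, then
    // scramble with a splitmix64-style finalizer to spread the values.
    static long score(String backend, String flowKey) {
        long h = (backend + "|" + flowKey).hashCode();
        h *= 0x9E3779B97F4A7C15L;
        h ^= h >>> 32;
        return h;
    }

    // Pick the backend with the highest score for this flow.
    static String pick(List<String> backends, String flowKey) {
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String b : backends) {
            long s = score(b, flowKey);
            if (best == null || s > bestScore) {
                bestScore = s;
                best = b;
            }
        }
        return best;
    }
}
```

Note this only covers the steady-state mapping; gracefully draining a director (the SYN vs. non-SYN question above) still needs extra machinery beyond the hash itself.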
CARP and similar systems require an active/passive configuration, which we did not want since it needs at least twice as many hosts, half of which are not doing any work. We had similar issues with our former Git storage system based on DRBD (http://githubengineering.com/introducing-dgit/).
pfsync, LVS, and the like use multicast to share connection state, which we also wanted to avoid.
Joe from GitHub here. Frankly, there is a lot we want to talk about and release, and it was simply too much for one post. We'd like to give it a proper treatment, and a single very long post won't do that. It also lets us get folks interested in the project and gives us time to prepare our code for release, which is a surprisingly big job.
Personally, I would have preferred you waited until you could release all the documents at once. I admit I was interested, but I've seen too many people and organizations start a conversation but never finish it or show the goods. It's misleading and unfair to dangle a solution when all you really have is a problem.
It smelled suspicious, but its release generated a bunch of noise on HN anyway. And they never followed up with subsequent parts, which suggests to me that they never found a solution in the first place.
I'm not suggesting that GitHub is blowing smoke -- if you truly have a solution, that's great! But there's no harm in gathering documentation and source code and cleaning it up and waiting until it's good and ready to go. Otherwise, I frankly mistrust the motives and abilities of those involved. Call me cynical if you must.
To paraphrase from another industry, "sell no wine before its time." There's a lot of wisdom there that is equally applicable to products in our industry too.
Joe from GitHub here. We'll talk about it in later posts, but GLB is based on a number of open source projects including haproxy, iptables, FoU and pf_ring.
Many existing open source solutions are optimized for short-lived HTTP requests and don't address the long-running connection issue (like a large git clone). We wanted something better for our use case.
I'm currently working with GitHub Support on dealing with zip downloads of a 5GB repo failing after 2-3 minutes, with curl error "transfer closed with outstanding read data remaining".
Sure about the long running connection issue being solved? :-)
IIRC the Boundary app cookbook diverges pretty significantly from the Opscode application cookbook due to its reliance on roles, environments, and the like. Additionally, our cookbook has a far narrower (I think in a good way) focus. Both have their place, just probably not in the same cookbook.
He seems to assume a lot in this post. For instance, his 200-node vs. 4-node comparison assumes that you have 200 nodes because the poor performance of your DBMS requires that many. If that's the case, great, use VoltDB. If not, it's perfectly reasonable to expect that 200 nodes will see more network partitions than 4 nodes, which is why one would pick an AP system in the first place.
I'm guessing in this case downtime does not refer to systems being in a down state but rather the sysadmins have time to work on whatever interests them.
logicalstack's comment still makes sense if you interpret downtime to mean what you've just explained. Here are some ways that good systems guys fill their time in between deployments and triaging problems:
1. Reading security bulletins and proactively trying to determine if systems are affected and what to do about it.
2. Reading about upcoming hardware and software so that they can plan the best platforms to deploy applications to.
3. Auditing applications for various problems.
4. Doing routine testing of the validity of backups and how easily they might be restored.
I upvoted logicalstack back to 0 because he may only have been implying that if your systems guys are reading Hacker News all day and commenting on threads about TechCrunch articles, they might have better places to spend their reading time. Believe it or not, there really are systems guys who have their hands full doing real work and don't just sit back in their chairs surfing the net and playing video games.
One missing item is instrumentation and metrics. Understanding and debugging a complex application is made much easier by having an abundance of easily collectable metrics that describe the running or cumulative state of the system.
Totally. As important as logging is to profiling and debugging a system, collecting metrics is invaluable for keeping a system running. Being alerted to potential (or actual) problems can allow an admin to respond to them effectively, rather than trying to patch things together after it's hit the fan.
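As an illustration of how cheap "easily collectable metrics" can be in-process, here's a minimal sketch of a counter registry; the class name, API, and metric names are made up for the example:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Minimal in-process metrics sketch: named monotonic counters that a
// periodic scraper, stats endpoint, or log line can export. Thread-safe
// and cheap to bump on hot paths thanks to LongAdder.
public class Metrics {
    private static final ConcurrentHashMap<String, LongAdder> counters =
            new ConcurrentHashMap<>();

    // Bump a counter, creating it on first use.
    public static void increment(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    // Read the current cumulative value (0 if never incremented).
    public static long value(String name) {
        LongAdder a = counters.get(name);
        return a == null ? 0L : a.sum();
    }

    // Point-in-time view of all counters, e.g. for a stats endpoint.
    public static Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((k, v) -> out.put(k, v.sum()));
        return out;
    }
}
```

Sprinkling calls like `Metrics.increment("dns.query.timeout")` at interesting points gives you the cumulative state described above with near-zero overhead, and the snapshot is what your alerting can watch.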