Hacker News

https://www.sfgate.com/news/article/google-electrical-incide...

Three people in critical condition after Google data center 'electrical incident' in the Council Bluffs, Iowa datacenter (us-central region, I believe)



An arc blast isn’t anything to play with! It’s an electrically charged fireball that is roughly 4 times hotter than the surface of the sun at the point of the arc. Pretty much instant third-degree burns, while simultaneously getting zapped and hit with an explosive shockwave.

https://youtu.be/-iClXrd50Z8 https://youtu.be/PO6see7_ODY


The physics of those things is scary.

If I recall correctly, the PPE for expected arc flash exposure is reflective, because so much of the energy imparted to a victim is straight-up electromagnetic radiation. You get lightbulbed to death.


Yeah, an arc flash suit is rated in cal/cm². Above a certain level the potential energy from a high-voltage piece of equipment is so great that they don’t make a suit for it, because the pressure from the shockwave alone is enough to kill you. Pretty much any utility transformer has this much potential energy, but they’re designed to make it near impossible to cross phases.
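To give a sense of scale for those cal/cm² ratings, here is a minimal sketch of an incident-energy estimate using the Lee method, a conservative open-air formula from IEEE 1584 (real arc-flash studies use the full empirically derived model, and all equipment numbers below are hypothetical, not from the incident in the article):

```python
def lee_incident_energy(kv, bolted_fault_ka, t_seconds, distance_mm):
    """Estimate incident energy in cal/cm^2 at a working distance,
    using the Lee method: E = 2.142e6 * V * I_bf * (t / D^2) in J/cm^2."""
    e_joules_per_cm2 = 2.142e6 * kv * bolted_fault_ka * (t_seconds / distance_mm**2)
    return e_joules_per_cm2 / 4.184  # joules -> calories

# Hypothetical medium-voltage switchgear: 13.8 kV, 20 kA bolted fault,
# 0.2 s clearing time, 455 mm (18 in) working distance.
energy = lee_incident_energy(13.8, 20.0, 0.2, 455)
print(f"{energy:.0f} cal/cm^2")
```

With these illustrative numbers the estimate comes out well above the ~40 cal/cm² rating of a typical heavy arc flash suit, which is the point the comment above is making: past a certain level no suit helps.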


Seems unrelated. This happened ~10 hours ago.


10 hours ago we had a flood of network failures in us-central1 and saw no GCP status changes. We blindly attempted to mitigate in various ways (freezing HPAs because we thought we were making excessive calls to external infra and getting throttled) and it resolved itself eventually. Maybe we were at fault the entire time, but not seeing this issue surfaced on the GCP dashboard is infuriating.


At AWS, if the status changes then someone somewhere gets fired, so a lot of the time incidents happen without being recorded on the status board. Maybe it’s the same issue with GCP, or maybe concern for their injured peers made everyone forget to update the status. I really hope it's the latter.


> At AWS, if the status changes then someone somewhere gets fired, so a lot of time incidents happen without being recorded on the status board.

This can't be true, can it? What's the reason to lie, when the lie would be so incredibly obvious?


It's not true. The real answer is that execs don't want to pay the costs of SLO violations. If the checkmark stays green, who's to say whether the service was down?


It used to be that one of the best things about working at Google was the "blameless postmortem". As long as you were able to learn from an incident and weren't attempting to look at private data, you could write up a postmortem document and actually use it as part of a promotion packet. Google would lose a key part of its soul if it were to change that.


AWS has a process like that as well, it’s called a COE. GP is either misinformed or making stuff up.


Could be related if eng tried to route traffic away from Council Bluffs after the accident (shut down data center for safety inspection/repair?) and failed.


not at all relevant to this outage


Do you have anything to add about the root cause?


Very much doubt I am allowed to. We are racing to recover before EU reaches peak traffic.



