It's funny, I hold the exact opposite opinion, but from the same example: In the course of my programming career, I've had at least 3 different instances where I crashed stuff in production because I was computing an average and forgot to handle the case of the empty list. Everything would have been just fine if dividing by zero yielded zero.
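To make the failure mode concrete: a sketch of the defensive version (the name `safe_mean` and the zero default are illustrative, not from any particular codebase):

```python
def safe_mean(values):
    """Arithmetic mean that returns 0.0 for an empty sequence.

    Returning 0.0 mirrors the "dividing by zero yields zero"
    behavior wished for above; whether zero is the right sentinel
    depends on what the caller does with the average.
    """
    values = list(values)  # materialize in case we got a generator
    if not values:
        return 0.0  # the case that crashed production three times
    return sum(values) / len(values)
```

So `safe_mean([])` gives `0.0` instead of raising `ZeroDivisionError`, while `safe_mean([1, 2, 3])` still gives `2.0`.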
What was the problem with crashing? Surely you had Kubernetes/GCP/ECS restart your container, or if you're using a BEAM-based language, it would have just restarted.
> Everything would have been just fine if dividing by zero yielded zero
Perhaps you weren't making business decisions based on the reported average, just logging it for metrics or something, in which case I can see how a crash/restart would merely be annoying.
I imagine the problem was that it crashed the whole process, so the processing of other, completely fine data that was happening in parallel was aborted as well. Did that lead to that data being dropped on the floor? Who knows, but probably yes.
And process restarts are not instantaneous, and that's before even counting the time to bring the application back into a "stable stream processing" state, which includes re-establishing streaming connections with other upstream and downstream services.
I've learned my lesson since, but still.