This is why your CI system should record statistics on which tests failed and when.
A test with a pattern of failing intermittently over its entire lifetime is most likely a flaky test. Disabling the test modifies a bunch of metrics, like skipped tests and code coverage (woe be unto you if some idiot set it up so you can't commit code that lowers test coverage, instead of just warning you).
There are much bigger and more likely sources of regression in the code than a new concurrency bug in code covered by a test that already has a concurrency bug in it, like someone modifying a test to match the regression they just created.
If there is a concurrency or randomness bug in the untested code, that will probably surface when trying to write a robust test to replace the old one. If not by the author, then by the person the author asks for help when they can't figure it out.
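To make the "record statistics" part concrete, here is a rough sketch of how a CI system might flag tests that fail intermittently over their lifetime. It assumes the CI stores each run as a `(test_name, passed)` pair in chronological order; the function names and the `min_flips` threshold are made up for illustration and are not from any particular CI product. The idea is that a real regression flips from pass to fail once and stays there, while a flaky test keeps flipping back and forth.

```python
from collections import defaultdict


def count_flips(history):
    """Count how often consecutive results differ (pass->fail or fail->pass)."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


def likely_flaky(runs, min_flips=3):
    """Return the sorted names of tests whose result flipped at least min_flips times.

    runs: iterable of (test_name, passed) pairs in chronological order.
    min_flips: hypothetical threshold; tune it against your own history.
    """
    by_test = defaultdict(list)
    for name, passed in runs:
        by_test[name].append(passed)
    return sorted(
        name for name, history in by_test.items()
        if count_flips(history) >= min_flips
    )


runs = [
    ("test_a", True), ("test_a", False), ("test_a", True), ("test_a", False),
    ("test_b", True), ("test_b", True), ("test_b", False), ("test_b", False),
]
print(likely_flaky(runs))
```

Here `test_a` flips three times and gets flagged, while `test_b` flips once (it looks like an ordinary regression) and does not.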
> A test with a pattern of failing intermittently over its entire lifetime is most likely a flaky test.
I'm not GP, but I think they meant something more radical.
In CI, the first time a test is seen to fail and then succeed on the same git commit, that's it. One time is enough to classify it as "flaky". No more lifetime, no more patterns of interesting behavior. The test is removed and thrown into the backlog.
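A minimal sketch of that rule, assuming CI results arrive as `(commit_sha, test_name, passed)` tuples in the order the runs happened. The function name and tuple layout are hypothetical, not any real CI API. A test is flagged the first time a pass follows a failure on the same commit.

```python
def flaky_on_same_commit(results):
    """Return the sorted names of tests that failed and later passed on one commit.

    results: iterable of (commit_sha, test_name, passed) in run order.
    """
    failed = set()  # (commit, test) pairs that have failed at least once
    flaky = set()
    for commit, test, passed in results:
        key = (commit, test)
        if not passed:
            failed.add(key)
        elif key in failed:
            # Same code, different outcome: one occurrence is enough.
            flaky.add(test)
    return sorted(flaky)


results = [
    ("abc123", "test_login", False),
    ("abc123", "test_login", True),   # retry passes on the same commit
    ("abc123", "test_search", True),
    ("def456", "test_search", False), # fails on a different commit: not flagged
]
print(flaky_on_same_commit(results))
```

Only `test_login` is flagged here; `test_search` changed outcome across commits, which could just as well be a real regression.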
I might paint with a broader brush. If your commit breaks master, we revert the whole change and you try again.
When the old stuff breaks, there's a bunch of other decisions built on top of it and there is no clear path to unwind the changes. Between HEAD and HEAD-4? Yank it.
If the problem is in your testing infrastructure or test code, that policy will start disabling random tests until either you're left with nothing or you realise that radical moves are rarely practical (with a side of distrust from any team you send the work to).