In something like described, it sounds like the design part is where they’re at. If they were optimizing a working algorithm (eg selecting the best memory ordering for some operation), then I’d say they’re in the implementation phase.
What’s not clear to me is if TLA+ can properly model the memory model. If I recall correctly, it doesn’t. You’re just testing your logic. That does still leave for some heavy lifting to be done.
It's kinda made to handle investigating and preventing such rare race conditions.