Nope, std::memory_order_acquire is perfectly fine here. Just because an atomic operation involves a store doesn't necessarily mean you always need/want to synchronize with that store. In this case, we only need to make sure that subsequent code is not reordered before the atomic load, hence the aquire semantics.
I spent a whole lot of time staring at this, and I was eventually able to convince myself that you're correct, there isn't any way for this to actually go wrong.
In the case where you're storing a different value than was already there, you just took the lock and even in relaxed ordering that is in fact an atomic operation, nobody else has the lock. In every other case who cares if you release, you didn't change anything anyway. The unlock() will release, and so anybody who subsequently acquires the lock will see the changes made under the lock.
Even after convincing myself this analysis is correct, I'm still scared that it's wrong, anyway. After all my previous analysis was wrong :/
It's not actually clear to me right now what part of the memory model would prevent the release store of an earlier unlock from being moved past the acquire load of a later lock.
Except that you wouldn't move it past a single acquire load, you'd potentially move it past an infinite number of acquire loads, and maybe then you run into formal language about forward progress?
There are similarly fun questions around hoisting an acquire load out of an otherwise empty loop...
> what part of the memory model would prevent the release store of an earlier unlock from being moved past the acquire load of a later lock.
Are you talking about different locks? In that case, yes, reordering can happen.
If it's about the same lock, then this can't happen because it would change the semantics of the code (you can't aquire the lock without releasing it first).
That reordering would be bad even with different locks because it can introduce deadlocks, as somebody has pointed out.
There must be some formal semantics reason which forbids it, and I think the reason boils down to the lock not being a single acquire load but instead a loop of acquire loads.
Unlock followed by lock is a store that is followed by a potentially infinite loop. If you move the store past the loop, and the loop happens to be or becomes infinite, then the effect of the store will never become visible, and that is an incorrect program transform because it'd be as-if you simply erased the store.
So you can probably move the store past any finite number of loads, but not past a potentially infinite loop.
Yes this re-ordering can cause a deadlock. But it's not an allowed optimization to change a non-deadlocking program into a potentially deadlocking program, so this reordering can not happen. (http://eel.is/c++draft/intro.multithread#intro.progress-7)
You can still get this deadlock if you actually write the code like that: lock1.lock(); lock2.unlock();
"aquire" means that the compiler can't move any reads/writes up before the load. "release" means that it can't move any reads/writes down after the store.
The compiler can move a release store of variable A past an aquire load of variable B because it violates neither constraints.
The opposite is not true: it can't move an aquire load of variable A past a release store of variable B because it violates both contraints.
For a lock, aquire/release semantics are sufficient because its only job is to protect some shared piece of data, it doesn't care about other locks.
> reordering would be bad even with different locks because it can introduce deadlocks, as somebody has pointed out.
Do you have an example of such a deadlock? Maybe it's best to work with some actual code.
That's fine, but swap the middle pair of unlock followed by lock on both threads and you have a standard deadlock based on different lock nestings.
And to reiterate from my previous comment, I do think that transform is or should be forbidden. That's not for synchronization reasons per se, but because the transform can change the set of externally visible side effects of the code. This argument requires that a store with release semantics is considered to be an externally visible side effect. That makes intuitive sense to me, though I don't remember what e.g. the C++ standard actually says about that.
Although pure aquire-release semantics would allow such reordering, it also didn't feel quite right to me, either. Fortunately, someone asked the exact same question on Stack Overflow and got a good answer: https://stackoverflow.com/a/61300584
Looks like you've been on the right track with regard to possible infinite loops.
Isn't this case discussed at the end of the article?
> C++ committee member Tony Van Eerd commented on reddit that with std::memory_order_acquire and two locks a and b; calls to a.unlock(); b.lock() and lock() for different locks could be reordered to b.lock(); a.unlock(); and introduce a potential deadlock. I don’t think that’s true. Reading section 6.9.2.1 Data races (paragraph 9) of the C++ standard] no loads or stores can bed moved into or out from between a load-acquire and store-release pair.
It's true that the compiler can't move reads/write out from between a load-acquire and store-release pair, but in my understanding it can certainly move reads/writes into the pair. Or can someone point me to the exact section of the standard where this is prohibited?
I think my stack overflow link above gives a more satisfying explanation on why the example can't deadlock.
Yes, it's the same example, but the answer appears to be subtly different. Acquire and release are only "one-way barriers". Consider the link in the sibling comment by spacechild1.
The title "Corrected memory order in some cases" is a bit misleading because the original code was correct. The commit just relaxes some unnecessarily strict memory orderings (to improve performance on platforms with weak memory models).