> You should never have a bunch of threads constantly spamming the same mutex. I...

forrestthewoods · on Oct 2, 2024

If you’re trying to mutate a dictionary many times from many threads you’re going to have a bad time. The fix isn’t a faster mutex, it’s don’t do that.

cogman10 · on Oct 2, 2024

Depends on the dictionary implementation. There's a number of thread safe dictionaries in the wild with varying degrees of parallelism performance. Pretty much all of them benefit from faster mutexes.

For example, some thread safe dictionaries will segment their underlying key/value pairs which allows them to have concurrent reads and writes for a given segment which significantly improves performance.

forrestthewoods · on Oct 3, 2024

> Pretty much all of them benefit from faster mutexes.

Faster doesn't mean fast. You really, really want to not use a thread-safe dictionary.

kccqzy · on Oct 2, 2024

> would be slower than just having everything wait to write to a shared location

Nope.

cogman10 · on Oct 2, 2024

Yup.

Message passing has allocation pressure and cache consistency pressure not present in using a shared message location. Especially as the amount of memory in question goes up, the benefit of a shared location increases in terms of the performance impact.

Sure, for something silly like writing to an int, there is negative benefit in a shared location, but when you start talking about a dictionary with 1 million entries, the shared location becomes much more of a benefit vs all the copying and allocating you'd have to do if you tried to do the same thing with message passing.

For some datastructures, it's the optimal way to move data around. For example, LMAX disruptor is about the fastest way to pass messages because of the shared memory and well tuned locks.

kccqzy · on Oct 2, 2024

You are talking about writes to a data structure such as a list or a dictionary from multiple threads. Nobody uses advanced message passing techniques for that. A list is basically the poster child of why you avoid mutexes: each thread writes to its own version of a sublist, with no use of mutexes, and then at the end of the processing every thread's lists are merged together. Merging a list takes O(1) time by manipulating a few pointers. You avoid mutexes and there's zero drawback. I don't know why you are talking about copying or allocating: a list requires no copying to be merged, and in this case all the allocations still happen over multiple threads so there's no change in allocation pressure. If your allocator is bad, there could be internal mutexes inside the allocator, but that's besides the point.

With dictionaries, you may have to do a bit of resizing and rehashing at the end. But in my benchmarks, it is still worthwhile.

cogman10 · on Oct 3, 2024

> Merging a list takes O(1) time

Depends on the list implementation. I assume we are talking C++ lists? in that case yeah, I can see how you'd instead do a `splice` of sublists.

I was thinking more in terms of something like a `vector` in which case adding the sublists in requires copying values from one vector to the source vector. That (especially if done wrong) can involve allocating potentially more than once to drop the results in.

But, for the record, there are lock free algorithms that don't require a mutex to add values into a list. You basically CAS the tail pointer with your new value.

That being said, I concede that for a list a mutex is probably the wrong choice. It's a better choice in Java which has constraints that prevent (easily) splicing together 2 lists.

For the dictionary, implementation is key. A good segmented dictionary is going to be hard to beat.