This seems to assume that all information needed to determine if two keys are equal is incorporated into the resulting hash value.
It's premised on the idea that you can dynamically change the hash function such that eventually you will find one with fewer than some fixed number of items sharing the same hash value.
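Concretely, I read the scheme as something like this (a rough sketch of my reading, not the actual proposal; the bound and the function name are made up):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

const MAX_BUCKET: usize = 8; // hypothetical bound on items sharing a hash/bucket

// Keep drawing new random hash functions and rebuilding until no bucket
// exceeds MAX_BUCKET entries. Nothing bounds how many attempts this takes.
fn find_acceptable_hasher<K: Hash>(keys: &[K], buckets: usize) -> RandomState {
    loop {
        let state = RandomState::new(); // a fresh, randomly seeded hash function
        let mut counts = vec![0usize; buckets];
        for key in keys {
            let mut h = state.build_hasher();
            key.hash(&mut h);
            counts[(h.finish() as usize) % buckets] += 1;
        }
        if counts.iter().all(|&c| c <= MAX_BUCKET) {
            return state; // found a function with few enough collisions
        }
        // Otherwise: pick another function and try again -- potentially forever
        // if the hashed data can't distinguish the keys under any function.
    }
}
```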
There are two problems:
1) It completely undermines the "worst-case constant lookup", because there is no upper bound on the number of times you might have to change the hash function and rebuild the table before you find one with sufficiently few collisions.
2) It is a much stronger requirement on the keys than other hash tables have. With other hash tables I can simply omit "difficult to hash" data from my hash function (see the sketch below), on the basis that enough other data is hashed to avoid a significant performance penalty. With this implementation, if the omitted data is the only thing distinguishing many keys, the entire hash table will simply fail.
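For instance, with a conventional table nothing stops me from writing a Hash impl like this (a hypothetical type, purely to illustrate skipping a field):

```rust
use std::hash::{Hash, Hasher};

struct Record {
    id: u64,
    name: String,
    blob: Vec<u8>, // large / "difficult to hash" payload
}

impl Hash for Record {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only the cheap fields; the blob is deliberately skipped.
        // With chaining or probing, the worst case is a few extra equality
        // checks when blobs differ but (id, name) collide.
        self.id.hash(state);
        self.name.hash(state);
    }
}

impl PartialEq for Record {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id && self.name == other.name && self.blob == other.blob
    }
}
impl Eq for Record {}
```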
This additional failure mode is also a security concern: if an untrusted user can affect what data is added to the table, they could cause a severe DoS, where the table either enters an infinite loop trying to find a non-conflicting hash function or fails in an unexpected way (most people don't expect their hash tables to randomly fail). Even if it is somehow designed such that a suitable hash function can always be found, the attacker could spend time up front finding values that force many candidate functions to be tried before a workable one is reached, causing numerous rebuilds of the table.
> It's premised on the idea that you can dynamically change the hash function such that eventually you will find one with fewer than some fixed number of items sharing the same hash value.
Why would you think there are any fixed numbers? The OP is about a stdlib hash table (for Rust). Comparable hash table implementations all resize the table when necessary to decrease the load factor. Performance with respect to load factor still matters, of course, and less resizing is better, but there are no fixed numbers of anything.
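I.e., the behaviour in question is just the usual load-factor-driven growth, something like this (a sketch with made-up numbers, not the actual stdlib code):

```rust
// Sketch of the usual growth policy; the threshold is a tuning knob,
// not a hard cap on items per hash value.
struct Table<V> {
    slots: Vec<Option<(u64, V)>>,
    len: usize,
}

impl<V> Table<V> {
    fn maybe_grow(&mut self) {
        let load = self.len as f64 / self.slots.len() as f64;
        if load > 0.875 {
            self.grow(self.slots.len() * 2);
        }
    }

    fn grow(&mut self, _new_capacity: usize) {
        // rehash all entries into the larger table (omitted)
    }
}
```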
> It completely undermines the "worst-case constant lookup", because there is no upper bound on the number of times you might have to change the hash function and rebuild the table before you find one with sufficiently few collisions.
Why would you need to change the hash function or rebuild anything during lookup? It's lookup. Nobody is changing anything.
> It is a much stronger requirement on the keys than other hash tables have.
What hash table doesn't require keys to be hashable?
Those are really questions and issues of implementation rather than of theory.
1. Tabulation hashing provides guarantees against chosen key attacks.
2. In addition, bucketized cuckoo hashing provides guarantees on the probability of resize (low compared to linear probing), and multiple resizes (very low compared to linear probing).
3. Further, most hash table implementations that I know of have layers of defense, including limiting the number of sequential resizes for a given key.
Putting everything together, if anything, I would be more concerned with the failure mode for linear probing (scanning the whole table before resizing) than for Cuckoo hashing (scanning two fixed-size buckets before resizing).
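For a concrete picture of points 1 and 2, here is a rough sketch (hypothetical types and sizes, not any particular implementation) of what a bucketized cuckoo lookup with simple tabulation hashing looks like:

```rust
// Hypothetical sketch: simple tabulation hashing + bucketized cuckoo lookup.
// Names, sizes, and layout are made up for illustration.

const BUCKET_SLOTS: usize = 4; // each bucket holds a fixed number of entries

struct TabulationHash {
    // one table of 256 random words per key byte
    tables: [[u64; 256]; 8],
}

impl TabulationHash {
    fn hash(&self, key: u64) -> u64 {
        let mut h = 0u64;
        for (i, table) in self.tables.iter().enumerate() {
            let byte = ((key >> (8 * i)) & 0xFF) as usize;
            h ^= table[byte]; // XOR together one random word per key byte
        }
        h
    }
}

struct CuckooMap<V> {
    h1: TabulationHash,
    h2: TabulationHash,
    buckets: Vec<[Option<(u64, V)>; BUCKET_SLOTS]>,
}

impl<V> CuckooMap<V> {
    // Lookup touches at most two fixed-size buckets, so it is O(1) worst case;
    // no rehashing or rebuilding ever happens on the lookup path.
    fn get(&self, key: u64) -> Option<&V> {
        let n = self.buckets.len() as u64;
        let b1 = (self.h1.hash(key) % n) as usize;
        let b2 = (self.h2.hash(key) % n) as usize;
        for &b in &[b1, b2] {
            for slot in &self.buckets[b] {
                if let Some((k, v)) = slot {
                    if *k == key {
                        return Some(v);
                    }
                }
            }
        }
        None
    }
}
```

The lookup path is a fixed amount of work over at most two buckets; choosing new hash functions or resizing only ever happens on insertion, and per point 3 real implementations cap how often that can happen.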