This seems to assume that all information needed to determine if two keys are equal is incorporated into the resulting hash value.
It's premised on the idea that you can dynamically change the hash function such that eventually you will find one with fewer than some fixed number of items sharing the same hash value.
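Concretely, I read the scheme as something like this (a rough sketch of my reading, not the actual proposal; the bound and the function name are made up):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

const MAX_BUCKET: usize = 8; // hypothetical bound on items sharing a hash/bucket

// Keep drawing new random hash functions and rebuilding until no bucket
// exceeds MAX_BUCKET entries. Nothing bounds how many attempts this takes.
fn find_acceptable_hasher<K: Hash>(keys: &[K], buckets: usize) -> RandomState {
    loop {
        let state = RandomState::new(); // a fresh, randomly seeded hash function
        let mut counts = vec![0usize; buckets];
        for key in keys {
            let mut h = state.build_hasher();
            key.hash(&mut h);
            counts[(h.finish() as usize) % buckets] += 1;
        }
        if counts.iter().all(|&c| c <= MAX_BUCKET) {
            return state; // found a function with few enough collisions
        }
        // Otherwise: pick another function and try again -- potentially forever
        // if the hashed data can't distinguish the keys under any function.
    }
}
```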
There are two problems:
1) It completely undermines the "worst-case constant lookup", because there is no upper bound on the number of times you might have to change the hash function and rebuild the table before you find one with sufficiently few collisions.
2) It is a much stronger requirement on the keys than other hash tables have. With other hash tables I can simply omit "difficult to hash" data from my hash function (see the sketch below), on the basis that enough other data is hashed to avoid a significant performance penalty. With this implementation, if the omitted data is the only thing distinguishing many keys, the entire hash table will simply fail.
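For instance, with a conventional table nothing stops me from writing a Hash impl like this (a hypothetical type, purely to illustrate skipping a field):

```rust
use std::hash::{Hash, Hasher};

struct Record {
    id: u64,
    name: String,
    blob: Vec<u8>, // large / "difficult to hash" payload
}

impl Hash for Record {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only the cheap fields; the blob is deliberately skipped.
        // With chaining or probing, the worst case is a few extra equality
        // checks when blobs differ but (id, name) collide.
        self.id.hash(state);
        self.name.hash(state);
    }
}

impl PartialEq for Record {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id && self.name == other.name && self.blob == other.blob
    }
}
impl Eq for Record {}
```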
This additional failure mode is also a security concern: if an untrusted user can affect what data is added to the table, they could cause a severe DoS, where the table either enters an infinite loop trying to find a non-conflicting hash function or fails in an unexpected way (most people don't expect their hash tables to randomly fail). Even if it is somehow designed such that a suitable hash function can always be found, the attacker could spend time up front finding values that force many candidate functions to be tried before a workable one is reached, causing numerous rebuilds of the table.
> It's premised on the idea that you can dynamically change the hash function such that eventually you will find one with fewer than some fixed number of items sharing the same hash value.
Why would you think there are any fixed numbers? The OP is about a stdlib hash table (for Rust). Comparable hash table implementations all resize the table when necessary to decrease the load factor. Performance with respect to load factor still matters, of course, and less resizing is better, but there are no fixed numbers of anything.
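I.e., the behaviour in question is just the usual load-factor-driven growth, something like this (a sketch with made-up numbers, not the actual stdlib code):

```rust
// Sketch of the usual growth policy; the threshold is a tuning knob,
// not a hard cap on items per hash value.
struct Table<V> {
    slots: Vec<Option<(u64, V)>>,
    len: usize,
}

impl<V> Table<V> {
    fn maybe_grow(&mut self) {
        let load = self.len as f64 / self.slots.len() as f64;
        if load > 0.875 {
            self.grow(self.slots.len() * 2);
        }
    }

    fn grow(&mut self, _new_capacity: usize) {
        // rehash all entries into the larger table (omitted)
    }
}
```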
> It completely undermines the "worst-case constant lookup", because there is no upper bound on the number of times you might have to change the hash function and rebuild the table before you find one with sufficiently few collisions.
Why would you need to change the hash function or rebuild anything during lookup? It's lookup. Nobody is changing anything.
> It is a much stronger requirement on the keys than other hash tables have.
What hash table doesn't require keys to be hashable?
Those are really questions and issues of implementation rather than of theory.
1. Tabulation hashing provides guarantees against chosen key attacks.
2. In addition, bucketized cuckoo hashing provides guarantees on the probability of resize (low compared to linear probing), and multiple resizes (very low compared to linear probing).
3. Further, most hash table implementations that I know of have layers of defense, including limiting the number of sequential resizes for a given key.
Putting everything together, if anything, I would be more concerned with the failure mode for linear probing (scanning the whole table before resizing) than for Cuckoo hashing (scanning two fixed-size buckets before resizing).
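For a concrete picture of points 1 and 2, here is a rough sketch (hypothetical types and sizes, not any particular implementation) of what a bucketized cuckoo lookup with simple tabulation hashing looks like:

```rust
// Hypothetical sketch: simple tabulation hashing + bucketized cuckoo lookup.
// Names, sizes, and layout are made up for illustration.

const BUCKET_SLOTS: usize = 4; // each bucket holds a fixed number of entries

struct TabulationHash {
    // one table of 256 random words per key byte
    tables: [[u64; 256]; 8],
}

impl TabulationHash {
    fn hash(&self, key: u64) -> u64 {
        let mut h = 0u64;
        for (i, table) in self.tables.iter().enumerate() {
            let byte = ((key >> (8 * i)) & 0xFF) as usize;
            h ^= table[byte]; // XOR together one random word per key byte
        }
        h
    }
}

struct CuckooMap<V> {
    h1: TabulationHash,
    h2: TabulationHash,
    buckets: Vec<[Option<(u64, V)>; BUCKET_SLOTS]>,
}

impl<V> CuckooMap<V> {
    // Lookup touches at most two fixed-size buckets, so it is O(1) worst case;
    // no rehashing or rebuilding ever happens on the lookup path.
    fn get(&self, key: u64) -> Option<&V> {
        let n = self.buckets.len() as u64;
        let b1 = (self.h1.hash(key) % n) as usize;
        let b2 = (self.h2.hash(key) % n) as usize;
        for &b in &[b1, b2] {
            for slot in &self.buckets[b] {
                if let Some((k, v)) = slot {
                    if *k == key {
                        return Some(v);
                    }
                }
            }
        }
        None
    }
}
```

The lookup path is a fixed amount of work over at most two buckets; choosing new hash functions or resizing only ever happens on insertion, and per point 3 real implementations cap how often that can happen.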