- DO NOT change long hashing. Not worth the effort or differences from Java.
- DO change the vector hashcode algorithm either by changing the multiplier constant, rehashing the items, or both. (Currently I lean towards only rehashing the items.)
- DO change the set hashcode algorithm to rehash the items before summing.
- DO change the map hashcode algorithm to to remove the symmetric inclusion of keys and values. My preferred change would be to treat K/V as a 2-element vector and re-use whatever algorithm we use for that.
- DO NOT use xorShift32 for rehash - the reliance on xor can easily cause collisions when summing hashcodes in sets.
- DO use Murmur3 as rehash algorithm - it's fast, extremely well-tested, commonly used, has great avalanche and bit dispersion properties. Note, this is not for hashing the entire collection, but for rehashing individual element hash codes.
- DO use the identical hash algorithm for hash() and hasheq() in collections.
- Question from Andy Fingerhut: Is the recommendation to use the identical hash for hashCode() and hasheq()? hashCode() seems like we have our hands tied because it must be the same as for Java's collections. If you meant use the same for hash() and hasheq(), I am curious how hash() is used in the code at all?
- CONSIDER rehashing keys in PersistentHashMap.hash() for storage and retrieval - this would yield better hash keys for all users of persistent hash maps and sets, not just for numbers, but also for keywords, strings, symbols, and whatever else you put in them. This could yield shallower trees and better average performance on hash lookup for large trees. Proving this would likely require instrumentation inside PHM and/or developing some more intensive performance studies.