Hi, Mark! On Feb 19, MARK CALLAGHAN wrote:
we realized that the hash function used in my_hash_sort_bin is lousy for this input: test.sbest1, test.sbtest2, ..., test.sbtest10. The problem is made worse when a small number of hash buckets is used because the hash function output doesn't do the right thing for the least significant bits so that all 10 inputs map to the same hash bucket. More details are at http://bugs.mysql.com/bug.php?id=66473
The InnoDB hash function is much better. Details for that and a test program are in the bug report. Does anyone remember why this hash function was chosen?
strings/ctype-bin.c doesn't have any comments explaining why this hash function was selected. This is another peeve for me. Critical code like this should be explained if we expect anyone new to begin working on this code.
This is mysql hash function from the dawn of time. Before charset code it existed in the mysys/hash.c and I found it unchanged in as early as mysql-3.20.13 (it's 1997). Regards, Sergei