[Maria-developers] my_hash_sort_bin
Hello MariaDB, While working with Dmitry Lenev to understand why MySQL 5.6 was slower than 5.1 on one benchmark ( https://www.facebook.com/notes/mysql-at-facebook/my-mysql-is-faster-than-you...) we realized that the hash function used in my_hash_sort_bin is lousy for this input: test.sbest1, test.sbtest2, ..., test.sbtest10. The problem is made worse when a small number of hash buckets is used because the hash function output doesn't do the right thing for the least significant bits so that all 10 inputs map to the same hash bucket. More details are at http://bugs.mysql.com/bug.php?id=66473 The InnoDB hash function is much better. Details for that and a test program are in the bug report. Does anyone remember why this hash function was chosen? strings/ctype-bin.c doesn't have any comments explaining why this hash function was selected. This is another peeve for me. Critical code like this should be explained if we expect anyone new to begin working on this code. I haven't done a formal test of the hash function, and http://burtleburtle.net/bob/hash/hashfaq.html would be a good reference for that. But I think it is very likely that my assertion that this hash function is lousy holds in general, rather than is limited to this input. And given the large number of machines that run MySQL, MariaDB and Percona Server (and the FB patch) around the world I suspect that we should figure this out. -- Mark Callaghan mdcallag@gmail.com
Hi, Mark! On Feb 19, MARK CALLAGHAN wrote:
we realized that the hash function used in my_hash_sort_bin is lousy for this input: test.sbest1, test.sbtest2, ..., test.sbtest10. The problem is made worse when a small number of hash buckets is used because the hash function output doesn't do the right thing for the least significant bits so that all 10 inputs map to the same hash bucket. More details are at http://bugs.mysql.com/bug.php?id=66473
The InnoDB hash function is much better. Details for that and a test program are in the bug report. Does anyone remember why this hash function was chosen?
strings/ctype-bin.c doesn't have any comments explaining why this hash function was selected. This is another peeve for me. Critical code like this should be explained if we expect anyone new to begin working on this code.
This is mysql hash function from the dawn of time. Before charset code it existed in the mysys/hash.c and I found it unchanged in as early as mysql-3.20.13 (it's 1997). Regards, Sergei
participants (2)
-
MARK CALLAGHAN
-
Sergei Golubchik