I'm developing a project which uses soundex matching (alongside other, more sophisticated options). This is custom software with a MariaDB back end. MariaDB's built-in soundex produces codes of apparently arbitrary length; the usual length is 4 characters. I've come to the conclusion that the best way forward is to mimic MariaDB's soundex in my code. I cannot find a description of the algorithm MariaDB uses, though: the only source code I can find is in mf_soundex.c, and that is explicitly limited to 4 characters. But I have installed triggers which generate codes such as A4152636125262. Where is the soundex function found? Anyone know?
Hi!
On Thu, Apr 25, 2024 at 4:12 AM lucyfrost--- via developers wrote:
I cannot find a description of the algorithm Maria uses though: the only source code I can find is in mf_soundex.c, and that is explicitly limited to 4 characters. But I have installed triggers which generate codes such as A4152636125262
Where is the soundex function found? Anyone know?
Can you show an example of how you got the longer codes?

MariaDB has two soundex implementations: the one in mf_soundex.c and the other in item_strfunc.cc (search for Item_func_soundex::val_str()). The latter can generate more than 4 characters, which is probably a bug, as it was not intended to work that way.

Regards,
Monty
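For readers who want to mimic the longer codes in their own code, here is a minimal Python sketch of a Soundex variant that pads to 4 characters but never truncates. It is not a port of Item_func_soundex::val_str(); the function name `soundex_unbounded` is hypothetical, and the handling of repeated codes across vowels, of letters outside A-Z, and of empty input are assumptions that would need checking against the output of the server's SOUNDEX().

```python
# Conventional Soundex digits for A..Z; '0' marks letters that emit no digit
# (the vowels plus H, W and Y).
_CODES = "01230120022455012623010202"


def _code(ch: str) -> str:
    """Return the Soundex digit for one alphabetic character."""
    up = ch.upper()
    if len(up) == 1 and "A" <= up <= "Z":
        return _CODES[ord(up) - ord("A")]
    # Assumption: letters outside A-Z behave like vowels (no digit).
    return "0"


def soundex_unbounded(text: str) -> str:
    """Soundex-style code with no upper length limit (hypothetical helper)."""
    # Non-alphabetic characters are ignored entirely.
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return ""  # Assumption: empty input gives an empty code.

    out = [letters[0].upper()]   # The first letter is kept as-is.
    last = _code(letters[0])     # Its digit still suppresses an immediate repeat.
    for ch in letters[1:]:
        code = _code(ch)
        # Assumption: vowels do not reset the duplicate check, so a digit is
        # emitted only when it differs from the last digit emitted.
        if code != "0" and code != last:
            out.append(code)
            last = code

    # Pad short codes to 4 characters, but never cut long ones.
    return "".join(out).ljust(4, "0")


print(soundex_unbounded("Hello"))          # H400
print(soundex_unbounded("Knowledgebase"))  # K543212
```

Taking the first four characters of the result (`soundex_unbounded(name)[:4]`) gives a fixed-length key if the longer codes are not wanted.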
The simplest example is something like

    UPDATE a_table SET column1=SOUNDEX(column2);

This gives codes from 4 to (as far as I have found) 8 characters.

Apr 25, 2024, 19:12 by michael.widenius@gmail.com:
Can you show an example of how you got longer codes. MariaDB has two soundex implementations, the one in mf_soundex.c and the other in Item_strfunc.cc, search for Item_func_soundex::val_str(). The later soundex function can generate more characters than 4 (which is probably a bug as it was not intended to work that way)
Regards, Monty
Hi, lucyfrost,

On Apr 25, lucyfrost via developers wrote:
Where is the soundex function found? Anyone know?
https://github.com/MariaDB/server/blob/11.5/sql/item_strfunc.cc#L2953-L3094

Regards,
Sergei
Chief Architect, MariaDB Server
and security@mariadb.org
Thanks for that. I now have two choices: either use MariaDB to handle soundex via triggers (and possibly stored procedures) and write a parallel implementation for the front end, or do all the soundex work in the front end. Given my suspicion that soundex was added to the spec without much thought or real need, I shall go back to the client and find out what they really want soundex for.

Lucy

Apr 26, 2024, 00:22 by serg@mariadb.org:
https://github.com/MariaDB/server/blob/11.5/sql/item_strfunc.cc#L2953-L3094
Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
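If the parallel front-end implementation mentioned earlier in the thread is chosen, one way to guard against the two implementations drifting apart is to compare them on real data. The sketch below assumes pairs of (text, code) have been exported from the database, for example with a query along the lines of SELECT column2, SOUNDEX(column2) FROM a_table, and that `encode` is whatever front-end function is under test. Both `find_mismatches` and `encode` are hypothetical names used for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    rows: Iterable[Tuple[str, str]],
    encode: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (text, db_code, local_code) for every row where the codes differ."""
    mismatches = []
    for text, db_code in rows:
        local_code = encode(text)
        if local_code != db_code:
            mismatches.append((text, db_code, local_code))
    return mismatches


# Toy demonstration with a stand-in encoder; a real check would pass the
# front-end soundex function and rows exported from the database.
sample_rows = [("ab", "AB"), ("cd", "XX")]
print(find_mismatches(sample_rows, str.upper))  # [('cd', 'XX', 'CD')]
```

An empty result on a representative sample is weak evidence that the two implementations agree; any mismatch points at an input worth inspecting by hand.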
Participants (3): lucyfrost@tutanota.com, Michael Widenius, Sergei Golubchik