Re: [Maria-developers] e3f45b2f9ea: MDEV-10267 Add ngram fulltext parser plugin
Hi, Rinat! On Nov 02, Rinat Ibragimov wrote:
I can't decide. From my point of view, the current approach is fine. Please pick a variant, and I'll try to implement that.
No, I cannot guess which approach will produce more relevant searches. Implement something, and then we can test what works better.
Variable-length n-grams approach is too innovative, and hard to reason about. I've never heard about such an approach, and it doesn't look good to me. So I'll stick with a simple slicer.
If you mean the variant that splits "n-grams approach" into "n-gr", "gra", "ram", "ams", "ms a", "s ap", "app", ... then it's just "n letters in every chunk", which is very easy to explain. But ok, let's start simple and benchmark.
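To make the "n letters in every chunk" variant concrete, here is a minimal sketch that reproduces the split quoted above. It assumes a single-byte character set and treats anything isalnum() accepts as a "letter"; the names slice_n_letters, chunk_cb and collect_chunk are hypothetical and not part of any MariaDB API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one chunk (not NUL-terminated). */
typedef void (*chunk_cb)(const char *chunk, size_t len, void *arg);

/*
 * Emit every chunk that starts on a letter and contains exactly n letters.
 * Non-letters inside a chunk are kept but not counted.
 * Returns the number of chunks emitted.
 */
static size_t slice_n_letters(const char *text, size_t len, size_t n,
                              chunk_cb cb, void *arg)
{
    size_t emitted = 0;
    assert(n > 0);
    for (size_t start = 0; start < len; start++) {
        if (!isalnum((unsigned char)text[start]))
            continue;                 /* chunks never start on a non-letter */
        size_t letters = 0, end = start;
        while (end < len && letters < n) {
            if (isalnum((unsigned char)text[end]))
                letters++;
            end++;
        }
        if (letters < n)
            break;                    /* fewer than n letters remain */
        cb(text + start, end - start, arg);
        emitted++;
    }
    return emitted;
}

/* Example collector: appends "chunk|" to a caller-provided buffer,
 * which must be large enough for all chunks. */
static void collect_chunk(const char *chunk, size_t len, void *arg)
{
    char *out = arg;
    strncat(out, chunk, len);
    strcat(out, "|");
}
```

Calling slice_n_letters("n-grams approach", 16, 3, collect_chunk, buf) with an empty buffer yields chunks beginning "n-gr", "gra", "ram", "ams", "ms a", "s ap", "app", matching the example in the message.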
Of course, it can. Note that fts_get_word() doesn't generate n-grams either: it gets the whole word, and the n-gram plugin later splits it into n-grams. Similarly, param->mysql_parse() will extract words for you and you'll split them into n-grams.
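The "split each extracted word into n-grams" step could look roughly like the sketch below. It assumes one byte per character and skips words shorter than n; both are assumptions for illustration, not necessarily what the plugin does. The names word_to_ngrams, ngram_cb and collect_ngram are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one n-gram (not NUL-terminated). */
typedef void (*ngram_cb)(const char *ngram, size_t len, void *arg);

/*
 * Split a single word into overlapping n-grams of exactly n bytes.
 * Assumption: words shorter than n produce no n-grams.
 * Returns the number of n-grams emitted.
 */
static size_t word_to_ngrams(const char *word, size_t len, size_t n,
                             ngram_cb cb, void *arg)
{
    if (n == 0 || len < n)
        return 0;
    size_t count = len - n + 1;
    for (size_t i = 0; i < count; i++)
        cb(word + i, n, arg);
    return count;
}

/* Example collector: appends "ngram|" to a caller-provided buffer,
 * which must be large enough for all n-grams. */
static void collect_ngram(const char *ngram, size_t len, void *arg)
{
    char *out = arg;
    strncat(out, ngram, len);
    strcat(out, "|");
}
```

For the word "ngram" with n = 2 this emits "ng", "gr", "ra", "am". A real plugin would have to step by characters of the column's character set rather than by bytes.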
Changed to use param->mysql_parse().
It turns out that in Aria, MyISAM, and InnoDB, param->mysql_parse() does call back param->mysql_add_word(). Is that part of the plugin API?
Yes, it is. E.g. slide 12 from my old presentation: http://conferences.oreillynet.com/presentations/mysql06/golubchik_sergei.pdf shows that there are three points where a plugin can add functionality.

* It can extract the text and then call param->mysql_parse(). This allows parsing, say, gzipped texts or EXIF comments in images.
* It can replace param->mysql_parse(), to use different rules for splitting the text into words. This is what the n-gram plugin normally does.
* It can replace param->mysql_add_word() to post-process every word after the built-in parser did the splitting. For example, a stemming or soundex plugin can do that.
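As an illustration of the third hook point, the sketch below swaps in its own mysql_add_word(), lets the built-in parser split the text, and forwards n-grams to the original callback. It does not use the real header: struct mock_param is a cut-down stand-in for MYSQL_FTPARSER_PARAM (the real mysql_add_word() also takes a boolean-info argument), mock_parse() and mock_add_word() only imitate the server side, and the extra n argument of ngram_parse() is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for MYSQL_FTPARSER_PARAM (assumption, not the real struct). */
struct mock_param {
    int (*mysql_parse)(struct mock_param *, const char *doc, int doc_len);
    int (*mysql_add_word)(struct mock_param *, const char *word, int word_len);
    void *mysql_ftparam;    /* server-private; here: an output buffer */
    void *ftparser_state;   /* plugin-private */
};

/* Imitation of the server's add_word: appends "word " to the buffer. */
static int mock_add_word(struct mock_param *p, const char *word, int len)
{
    char *out = p->mysql_ftparam;   /* caller must size this buffer */
    strncat(out, word, (size_t)len);
    strcat(out, " ");
    return 0;
}

/* Imitation of the built-in parser: splits on non-alphanumerics and
 * reports each word through whatever mysql_add_word is currently set. */
static int mock_parse(struct mock_param *p, const char *doc, int doc_len)
{
    int i = 0;
    while (i < doc_len) {
        while (i < doc_len && !isalnum((unsigned char)doc[i])) i++;
        int start = i;
        while (i < doc_len && isalnum((unsigned char)doc[i])) i++;
        if (i > start) {
            int rc = p->mysql_add_word(p, doc + start, i - start);
            if (rc) return rc;
        }
    }
    return 0;
}

/* Plugin-private state: the original callback and the n-gram size. */
struct ngram_state {
    int (*orig_add_word)(struct mock_param *, const char *, int);
    int n;
};

/* Replacement hook: splits each word into n-grams (bytes, not characters). */
static int ngram_add_word(struct mock_param *p, const char *word, int len)
{
    struct ngram_state *st = p->ftparser_state;
    for (int i = 0; i + st->n <= len; i++) {
        int rc = st->orig_add_word(p, word + i, st->n);
        if (rc) return rc;
    }
    return 0;
}

/* Plugin entry point: install the hook, run the built-in parser, restore. */
static int ngram_parse(struct mock_param *p, const char *doc, int doc_len, int n)
{
    struct ngram_state st = { p->mysql_add_word, n };
    p->ftparser_state = &st;
    p->mysql_add_word = ngram_add_word;
    int rc = p->mysql_parse(p, doc, doc_len);
    p->mysql_add_word = st.orig_add_word;
    p->ftparser_state = NULL;
    return rc;
}
```

With n = 2, parsing "full text" through this mock leaves "fu ul ll te ex xt " in the buffer. Restoring the original callback before returning keeps the param usable by the caller afterwards.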
The comments in include/mysql/plugin_ftparser.h do not mention that at all. That's why I initially thought that param->mysql_parse() would parse the string like the default parser does, without any way to interact with the process.
I've edited the comment to mention this possibility.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
Participants (1): Sergei Golubchik