Re: 56fa1da9c67: MDEV-30048 Prefix keys for CHAR work differently for MyISAM vs InnoDB
Hi, Alexander, The actual code change (replacing strnncollsp with strnncollsp_nchars) is ok. But I didn't like the functions you've created. The *_ft_* family is quite confusing, I would not expect to see fulltext-specific functions my_compare.h. I would expect to see there functions which are generic and named by what they're doing, not by where they should be used. ha_compare_char_fixed and ha_compare_char_varying are better. Names are clearer and it's kind of understandable what they do. I suggest you remove *_ft_* functions. fulltext code should, I guess, just use charset_info->coll->strnncoll() everywhere. Not strnncollsp(), because words normally don't have trailing spaces, and if some custom pluggable parser returns words with trailing spaces, they're likely a part of the word and should be compared too. Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org On Oct 17, Alexander Barkov wrote:
revision-id: 56fa1da9c67 (mariadb-10.4.28-90-g56fa1da9c67) parent(s): ed2adc8c6f9 author: Alexander Barkov committer: Alexander Barkov timestamp: 2023-04-07 14:54:17 +0400 message:
MDEV-30048 Prefix keys for CHAR work differently for MyISAM vs InnoDB
Also fixes: MDEV-30050 Inconsistent results of DISTINCT with NOPAD
Problem:
Key segments for CHAR columns where compared using strnncollsp() for engines MyISAM and Aria.
This did not work correct in case if the engine applyied trailing space compression.
Fix:
Replacing ha_compare_text() with a number of new functions:
- ha_compare_ft_text_full() - ha_compare_ft_text_prefix() - ha_compare_ft_text() - ha_compare_char_varying() - ha_compare_char_fixed()
The code branch corresponding to comparison of CHAR column keys (HA_KEYTYPE_TEXT segment type) now uses ha_compare_char_fixed() which calls strnncollsp_nchars().
For the rest of the code: - comparison of VARCHAR/TEXT column keys (HA_KEYTYPE_VARTEXT1, HA_KEYTYPE_VARTEXT2 segments types) - comparison in the fulltext code this patch does not change the behaviour.
Hello Sergei, Thanks for the review. On 10/18/23 12:29 AM, Sergei Golubchik wrote:
Hi, Alexander,
The actual code change (replacing strnncollsp with strnncollsp_nchars) is ok.
But I didn't like the functions you've created. The *_ft_* family is quite confusing, I would not expect to see fulltext-specific functions my_compare.h. I would expect to see there functions which are generic and named by what they're doing, not by where they should be used.
ha_compare_char_fixed and ha_compare_char_varying are better. Names are clearer and it's kind of understandable what they do.
I suggest you remove *_ft_* functions. fulltext code should, I guess, just use charset_info->coll->strnncoll() everywhere. Not strnncollsp(), because words normally don't have trailing spaces, and if some custom pluggable parser returns words with trailing spaces, they're likely a part of the word and should be compared too.
I removed *_ft_* functions. Please find a new patch here: https://github.com/MariaDB/server/commit/62078f55b832402a2e2c6982349c02fc4c1... As agreed on slack, this patch does not change strnncollsp() to strnncoll(), that will be done separately when needed. Thanks.
Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
On Oct 17, Alexander Barkov wrote:
revision-id: 56fa1da9c67 (mariadb-10.4.28-90-g56fa1da9c67) parent(s): ed2adc8c6f9 author: Alexander Barkov committer: Alexander Barkov timestamp: 2023-04-07 14:54:17 +0400 message:
MDEV-30048 Prefix keys for CHAR work differently for MyISAM vs InnoDB
Also fixes: MDEV-30050 Inconsistent results of DISTINCT with NOPAD
Problem:
Key segments for CHAR columns where compared using strnncollsp() for engines MyISAM and Aria.
This did not work correct in case if the engine applyied trailing space compression.
Fix:
Replacing ha_compare_text() with a number of new functions:
- ha_compare_ft_text_full() - ha_compare_ft_text_prefix() - ha_compare_ft_text() - ha_compare_char_varying() - ha_compare_char_fixed()
The code branch corresponding to comparison of CHAR column keys (HA_KEYTYPE_TEXT segment type) now uses ha_compare_char_fixed() which calls strnncollsp_nchars().
For the rest of the code: - comparison of VARCHAR/TEXT column keys (HA_KEYTYPE_VARTEXT1, HA_KEYTYPE_VARTEXT2 segments types) - comparison in the fulltext code this patch does not change the behaviour.
Hi, Alexander, On Oct 17, Sergei Golubchik wrote:
Hi, Alexander,
The actual code change (replacing strnncollsp with strnncollsp_nchars) is ok.
But I didn't like the functions you've created. The *_ft_* family is quite confusing, I would not expect to see fulltext-specific functions my_compare.h. I would expect to see there functions which are generic and named by what they're doing, not by where they should be used.
ha_compare_char_fixed and ha_compare_char_varying are better. Names are clearer and it's kind of understandable what they do.
I've looked at your patch with renamed functions: ha_compare_ft_text_full -> ha_compare_word ha_compare_ft_text_prefix -> ha_compare_word_prefix ha_compare_ft_text -> ha_compare_word_or_prefix That looked better, more clear. I think this would be ok to push. May be with a comment, like /* the following functions are comparing one word of text - a common operation in the full-text search */ Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
Hello Sergei, On 10/21/23 5:56 PM, Sergei Golubchik wrote:
Hi, Alexander,
On Oct 17, Sergei Golubchik wrote:
Hi, Alexander,
The actual code change (replacing strnncollsp with strnncollsp_nchars) is ok.
But I didn't like the functions you've created. The *_ft_* family is quite confusing, I would not expect to see fulltext-specific functions my_compare.h. I would expect to see there functions which are generic and named by what they're doing, not by where they should be used.
ha_compare_char_fixed and ha_compare_char_varying are better. Names are clearer and it's kind of understandable what they do.
I've looked at your patch with renamed functions:
ha_compare_ft_text_full -> ha_compare_word ha_compare_ft_text_prefix -> ha_compare_word_prefix ha_compare_ft_text -> ha_compare_word_or_prefix
That looked better, more clear. I think this would be ok to push. May be with a comment, like
/* the following functions are comparing one word of text - a common operation in the full-text search */
Please find a new version here: https://github.com/MariaDB/server/commit/99ffd7e3934cb1dd7b6c0ad553a5d544527... Thanks.
Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
participants (2)
-
Alexander Barkov
-
Sergei Golubchik