Hello Sergei, Thanks for the review. A new patch is here: https://github.com/MariaDB/server/commit/0629711db43ec489a360d8f689b72fac66a... See also comments below: On 9/11/21 5:46 PM, Sergei Golubchik wrote:
Hi, Alexander!
please add more benchmark results to the MDEV, all the variants that you've tried and speedups (or slowdowns) for each.
otherwise, basically, no comments, see below. the only real concern is to know exactly when it helps and when it doesn't.
I added benchmark results to the MDEV. <cut>
--- /dev/null +++ b/strings/ctype-ascii.h @@ -0,0 +1,189 @@ +#ifndef CTYPE_ASCII_H_INCLUDED +#define CTYPE_ASCII_H_INCLUDED
The convention is CTYPE_ASCII_INCLUDED (every file has .h, so it's omitted from the include guard)
Fixed. <cut>
+/* + The following macro returns the bit 0x20 set to: + - 1 for input bytes in the ranges [60..7F] or [E0..FF] + - 0 otherwise + Bytes in the ranges [40..7F] and [C0..FF] have the bit 0x40 set. + Bytes in the ranges [60..7F] smd [E0..FF] have the bit 0x20 set.
s/smd/and/
Fixed. <cut>
+static inline int +my_strcoll_ascii_toupper_4bytes(const uchar *a, const uchar *b) +{ + ulonglong abn= (((ulonglong) mi_uint4korr(a)) << 32) | mi_uint4korr(b); + abn= my_ascii_to_upper_magic_uint64(abn); + if ((uint) (abn >> 32) == (uint32) abn)
s/uint/uint32/ ?
Fixed. <cut>
+ /* + TODO: + Try to get advantage of SIMD instructions by massive comparison + (16 bytes at a time) of characters against (x>='a' && x<='z') using: + - either explicit intrinsics
I thought your SIMD version was slower, wasn't it?
On quick tests SIMD appeared to be 1.5 times faster on pure ASCII (comparing to the current patch version). I decided not to include SIMD at this point yet. Enabling SIMD needs some further experimenting: 1. How to enable SIMD properly: - during compile time through cmake only - or additionally during run time depending on whether the processor really supports the SIMD commands used. 2. It should be more efficient to handle both ASCII and multi-byte characters by SIMD operations, instead of just catching ASCII substrings. This at least should be possible for _bin collations. Not sure about _ci. <cut>
+#if defined(STRCOLL_MB7_BIN) +#define MY_STRCOLL_MB7_4BYTES(a,b) my_strcoll_mb7_bin_4bytes((a),(b)) +#if defined(__x86_64__)
as Wlad has already said, what about not gcc-on-intel 64bit platforms?
Fixed to: #if SIZEOF_VOIDP == 8 .. #endif Thanks to Wlad for the suggestion.