Hi, Alexander! On Jun 23, Alexander Barkov wrote:
+#define MY_FUNCTION_NAME(x) my_ ## x ## _big5_chinese_ci +#define WEIGHT_MB1(x) (sort_order_big5[(uchar) (x)]) +#define WEIGHT_MB2(x,y) (big5code(x, y)) +#include "ctype-strcoll.ic" + + +#undef MY_FUNCTION_NAME +#undef WEIGHT_MB1 +#undef WEIGHT_MB2
alternatively you could undef them at the end of ctype-strcoll.ic it'll also be a protection against including this file twice (although it cannot be done now anyway)
Thanks for the idea. This can certainly be done for WEIGHT_MB1 and WEIGHT_MB2.
What do you suggest to do with MY_FUNCTION_NAME? It's now used in ctype-strcoll.ic in collation name context, and in ctype-mb.ic in character name context.
So the structure of a file is about like this:
#define MY_FUNCTION_NAME(x) my_ ## x ## _sjis #define ... character set specific definitions #include "ctype-mb.ic" ... #undef MY_FUNCTION_NAME #define MY_FUNCTION_NAME(x) my_ ## x ## _sjis_japanese_ci #define ... ci collation specific definitions #include "ctype-strcoll.ic" ... #undef MY_FUNCTION_NAME #define MY_FUNCTION_NAME(x) my_ ## x ## _sjis_bin #define ... bin collation specific definitions #include "ctype-strcoll.ic" ...
Perhaps I should undef MY_FUNCTION_NAME in both ctype-mb.ic and ctype-strncoll.ic
Yes, I would've done that, I guess.
Another approach would be to have two things:
MY_CHARSET_FUNCTION_NAME MY_COLLATION_FUNCTION_NAME
What will be easier to read?
I don't think so. Unless you need both MY_CHARSET_FUNCTION_NAME and MY_COLLATION_FUNCTION_NAME in one include file. But you apparently don't. By the way, could you please rename sql/sql_plugin_services.h to sql/sql_plugin_services.ic and sql/sys_vars.h to sql/sys_vars.ic? As you're introducing a new convention, let's follow it consistently.
+#define MY_FUNCTION_NAME(x) my_ ## x ## _big5_bin +#define WEIGHT_MB1(x) ((uchar) (x)) +#define WEIGHT_MB2(x,y) (big5code(x, y)) +#include "ctype-strcoll.ic" + diff --git a/strings/ctype-strcoll.ic b/strings/ctype-strcoll.ic new file mode 100644 index 0000000..7217f99 --- /dev/null +++ b/strings/ctype-strcoll.ic @@ -0,0 +1,208 @@ +/* + Copyright (c) 2015, MariaDB Foundation + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; version 2 of the License. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +*/ + + +#ifndef MY_FUNCTION_NAME +#error MY_FUNCTION_NAME is not defined +#endif + + +/* + The weight for automatically padded spaces when comparing strings with + the PAD SPACE property. + Should normally be equal to the weight of a regular space.
it is not a space for sjis and cp932? what does that mean? padding space is not the same as a regular space? why?
It is the same.
- For sjis and cp932 wight for a single byte character is just defined in a more complex way:
#define WEIGHT_MB1(x) (256 * (int) (uchar) (x)) #define WEIGHT_PAD_SPACE (256 * (int) ' ')
So regular and pad spaces have the same weight again: 0x2000
In that case it might've been easier to read (less questions asked) if you'd define it unconditionally as #define WEIGHT_PAD_SPACE WEIGHT_MB1(' ')
+/* + Weight of an illegal byte. + Must be greater than weight of any normal character. + Two bad bytes are compared binary. +*/ +#ifndef WEIGHT_ILSEQ +#define WEIGHT_ILSEQ(x) (0xFF00 + (x))
why #ifndef? it creates an impression that the caller (includer :) can redefine it. better define WEIGHT_ILSEQ here unconditionally and undefine it at the end
You got a correct impression.
The next step is to implement Asian MB3 character sets, eucjpms and ujis. WEIGHT_ILSEQ will be a 24-bit value for them, so they will redefine it, most likely like this:
#define WEIGHT_ILSEQ(x) (0xFFFF00 + (x))
I don't particularly like that. Same as with WEIGHT_PAD_SPACE - one thinks that WEIGHT_ILSEQ depends on the charset. While in fact it only depends on the max_char_length. I cannot immediately suggest a better definition though (constant and easy to understand), so it's ok to keep it like you've done. Regards, Sergei