Hi, Alexander, Few comments/questions below. Meanwhile I'm reviewing bb-10.9-bar-uca14 On May 26, Alexander Barkov wrote:
By the way, perhaps some of these statements should display short collation names:
SHOW CREATE TABLE t1; SHOW CREATE DATABASE db1; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS; SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES; SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
Can we discuss this?
Short names, I guess. First two - for readability, the last three - so that one could join with INFORMATION_SCHEMA.COLLATIONS table.
diff --git a/include/m_ctype.h b/include/m_ctype.h index 4c6628b72b3..706764ead2a 100644 --- a/include/m_ctype.h +++ b/include/m_ctype.h @@ -240,6 +242,46 @@ typedef enum enum_repertoire_t } my_repertoire_t;
+/* ID compatibility */ +typedef enum enum_collation_id_type +{ + MY_COLLATION_ID_TYPE_PRECISE= 0, + MY_COLLATION_ID_TYPE_COMPAT_100800= 1 +} my_collation_id_type_t; + +/* Collation name display modes */ +typedef enum enum_collation_name_mode +{ + MY_COLLATION_NAME_MODE_FULL= 0, + MY_COLLATION_NAME_MODE_CONTEXT= 1 +} my_collation_name_mode_t; + +/* Level flags */ +#define MY_CS_LEVEL_BIT_PRIMARY 0x00 +#define MY_CS_LEVEL_BIT_SECONDARY 0x01 +#define MY_CS_LEVEL_BIT_TERTIARY 0x02 +#define MY_CS_LEVEL_BIT_QUATERNARY 0x03 + +#define MY_CS_COLL_LEVELS_S1 (1<<MY_CS_LEVEL_BIT_PRIMARY) + +#define MY_CS_COLL_LEVELS_AI_CS (1<<MY_CS_LEVEL_BIT_PRIMARY)| \ + (1<<MY_CS_LEVEL_BIT_TERTIARY) + +#define MY_CS_COLL_LEVELS_S2 (1<<MY_CS_LEVEL_BIT_PRIMARY)| \ + (1<<MY_CS_LEVEL_BIT_SECONDARY) + +#define MY_CS_COLL_LEVELS_S3 (1<<MY_CS_LEVEL_BIT_PRIMARY)| \ + (1<<MY_CS_LEVEL_BIT_SECONDARY) | \ + (1<<MY_CS_LEVEL_BIT_TERTIARY)
AI_CS and S3 don't seem to be used yet
Right, there are no old _AI_CS and _AS_CS (aka S3) collations.
New _AI_CS and _AS_CS collations definitions are initialized by this function:
my_uca1400_collation_definition_init(MY_CHARSET_LOADER *loader, struct charset_info_st *dst, uint id)
Level flags are calculated by this function from "id".
So there are no hard-coded definitions with MY_CS_COLL_LEVELS_AI_CS and MY_CS_COLL_LEVELS_S3 either.
Should I remove these definitions?
as you like, but if you want to keep them - add a comment. otherwise anyone can remove them when they'll see those defines are unused
+ +#define MY_CS_COLL_LEVELS_S4 (1<<MY_CS_LEVEL_BIT_PRIMARY)| \ + (1<<MY_CS_LEVEL_BIT_SECONDARY) | \ + (1<<MY_CS_LEVEL_BIT_TERTIARY) | \ + (1<<MY_CS_LEVEL_BIT_QUATERNARY) + /* Flags for strxfrm */ #define MY_STRXFRM_LEVEL1 0x00000001 /* for primary weights */ #define MY_STRXFRM_LEVEL2 0x00000002 /* for secondary weights */ diff --git a/strings/ctype-simple.c b/strings/ctype-simple.c index b579f0af203..d09dfba86ed 100644 --- a/strings/ctype-simple.c +++ b/strings/ctype-simple.c @@ -1940,13 +1941,26 @@ my_bool my_propagate_complex(CHARSET_INFO *cs __attribute__((unused)), ...
why do you still use the old concept of "strength"? Why not to use bitmap consistently everywhere?
The collation definition file Index.xml is based on the LDML syntax.
It uses tags like this:
<settings strength="2"/>
This function is needed to handle these LDML tags.
Btw, to define user-defined _AI_CS collations we'll need to add an LDML extension eventually.
Hmm. Are you saying that LDML cannot describe a UCA 14.0.0 collation? Or L1+L3 without L2 isn't standard?
+} + diff --git a/strings/ctype-uca1400data.h b/strings/ctype-uca1400data.h --- /dev/null +++ b/strings/ctype-uca1400data.h @@ -0,0 +1,44151 @@ +/* + Generated from allkeys.txt version '14.0.0' +*/
if it's generated, do you need to check it in? perhaps it should be generated during the build? you've checked in allkeys1400.txt anyway.
Right, we can consider it.
Btw, I've checked it all versions:
$ ls mysql-test/std_data/unicode/
allkeys1400.txt allkeys400.txt allkeys520.txt
So we can generate sources for all three UCA versions from these files.
But I suggest we do it separately. Should I create an MDEV for this?
Why would you want to add to the history a file that is generated? It'll be there forever. You already have uca-dump.c in the tree, running it during the build is a few lines in CMakeLists.txt, basically copy-paste, because we already do it for gen_lex_hash and comp_err. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org