Hi, Alexander, I'm sorry it took a while. I'm still thinking about the whole thing, it's a rather big change for a really fringe functionality. But I failed to come up with something better so far. Code-wise the patch is mostly fine. See few small comments below, and one slightly larger comment re. replication. On Mar 07, Alexander Barkov wrote:
revision-id: 940d028521f (mariadb-10.11.1-4-g940d028521f) parent(s): 71dedd0ebcd author: Alexander Barkov committer: Alexander Barkov timestamp: 2022-12-14 20:00:22 +0400 message:
MDEV-30164 System variable for default collations
This patch adds a way to override default collations (or "character set collations") for desired character sets.
diff --git a/sql/lex_charset.h b/sql/lex_charset.h --- a/sql/lex_charset.h +++ b/sql/lex_charset.h @@ -544,6 +699,20 @@ struct Lex_exact_charset_extended_collation_attrs_st { return m_ci; } + CHARSET_INFO *charset_info(const Charset_collation_map_st &map) const + { + switch (m_type) + { + case Lex_exact_charset_extended_collation_attrs_st::TYPE_CHARACTER_SET: + return map.get_collation_for_charset(m_ci); + case TYPE_EMPTY:
Lex_exact_charset_extended_collation_attrs_st::TYPE_EMPTY (or all case labels without a struct name)
+ case Lex_exact_charset_extended_collation_attrs_st::TYPE_CHARACTER_SET_COLLATE_EXACT: + case Lex_exact_charset_extended_collation_attrs_st::TYPE_COLLATE_CONTEXTUALLY_TYPED:
COLLATE DEFAULT is TYPE_COLLATE_CONTEXTUALLY_TYPED. shouldn't it use the map too?
+ case Lex_exact_charset_extended_collation_attrs_st::TYPE_COLLATE_EXACT: + break; + } + return m_ci; + } Type type() const { return m_type; diff --git a/sql/log_event.h b/sql/log_event.h --- a/sql/log_event.h +++ b/sql/log_event.h @@ -2147,6 +2153,8 @@ class Query_log_event: public Log_event bool sql_mode_inited; bool charset_inited;
+ LEX_CSTRING character_set_collations_str;
better LEX_CUSTRING, don't you think? It's not even a _str.
+ uint32 flags2; sql_mode_t sql_mode; ulong auto_increment_increment, auto_increment_offset; diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -1194,6 +1194,14 @@ bool Query_log_event::write() int2store(start+2, auto_increment_offset); start+= 4; } + + if (thd && thd->variables.character_set_collations.count()) + { + *start++= Q_COLLATIONS_SESSION; + size_t len= thd->variables.character_set_collations.to_binary((char*)start); + start+= len; + }
Perhaps, detect if it's needed? A cheap way of doing it would be to extend your Elem_st with a query_id. And every time you find_elem_by_charset, you set this elem's query_id to thd->query_id. And here you write only elements with the current query id. If any. Another approach would be to have a bitmap, like uchar used_default_coll_mapping; and in find_elem_by_charset() you set the bit, like used_default_coll_mapping |= 1 << i; and then, again, print affected collations, if any. Most often used_default_coll_mapping will likely be zero one more question. In, say, 10.10->11.1 replication master and slave will have different default collations, but thd->variables.character_set_collations will not reflect that. How do you plan to solve it?
+ if (charset_inited) { *start++= Q_CHARSET_CODE;
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org