9 Jul
2023
9 Jul
'23
2:56 p.m.
Hi, Alexander, On Jul 07, Alexander Barkov wrote: > On 6/24/23 12:20 AM, Sergei Golubchik wrote: > > > > > diff --git a/sql/lex_charset.cc b/sql/lex_charset.cc > > > --- a/sql/lex_charset.cc > > > +++ b/sql/lex_charset.cc > > > @@ -447,6 +451,17 @@ Lex_exact_charset_opt_extended_collate::find_default_collation() const > > > > > > +CHARSET_INFO * > > > +Lex_exact_charset_opt_extended_collate:: > > > + find_mapped_default_collation(Charset_collation_map_st::Used *used, > > > + const Charset_collation_map_st &map) const > > > +{ > > > + CHARSET_INFO *cs= find_compiled_default_collation(); > > > + if (!cs) > > > + return nullptr; > > > + return map.get_collation_for_charset(used, cs); > > > > This seems a bit redundant. find_compiled_default_collation() will > > do get_charset_by_csname(m_ci, MY_CS_PRIMARY) and only then you'll > > be able to search, because you map default compiled collation id to > > use specified collation id. ... > And by the way, the situation when find_compiled_default_collation() > is really called is very rare itself: > > CREATE TABLE t1 (a CHAR(10) COLLATE DEFAULT) > CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; Indeed, this is quite uncommon. Ok. > > but conceptually you need to map a charset to a collation, not a > > collation to a collation. If you'd map, say, cs->cs_name.str (just > > the pointer, not the string value) or, say, cs->cset, then you'd be > > mapping a charset to a collation. and here you'd be able to do just > > > > return map.get_collation_for_charset(used, m_ci) > > > > with find_compiled_default_collation() being done inside > > get_collation_for_charset() and only when Elem_st wasn't found > > cs->cset is a pointer to MY_CHARSET_HANDLER. It cannot be used here, > because the same handler is shared between multiple character sets. Oops. Sorry. I thought for some reason that cs->cset describes the charset. But now I see that all charset methods get CHARSET_INFO as an argument and use tables from there. Indeed, cs->cset can be the same for many charsets (and it is). > cs->cs_name.str could work. However, this may not be reliable > with user-defined collations. What do you mean? > Taking into account that the situation when redundant > lookup is done is very rare, I propose not to change anything here now. > Instead, eventually we need to fix the collation library itself, > to avoid lookup inside get_charset_by_csname(m_ci, MY_CS_PRIMARY). I mean, logically there should be a structure that defines a character set. And a structure that defines a collation. Former doesn't exists, the latter is confusingly called CHARSET_INFO. Eventually there should be a clear distinction in the code just like you've recently made a clear distinction in the SQL. For now, perhaps, cs->cs_name.str could be such a "charset identifier". > > > diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc > > > --- a/sql/sql_parse.cc > > > +++ b/sql/sql_parse.cc > > > @@ -6049,6 +6049,27 @@ mysql_execute_command(THD *thd, bool is_called_from_prepared_stmt) > > > } > > > thd->reset_kill_query(); > > > } > > > + > > > + /* > > > + If a non-default collation (in @@character_set_collations) > > > + was used during the statement, the mysqlbinlog output for > > > + the current statement will contain a sequence like this: > > > + > > > + SET character_set_collations='utf8mb3=utf8mb3_bin'; > > > + INSERT INTO t1 VALUES (_utf8mb3'test'); > > > + COMMIT; > > > + > > > + The statment (INSERT in this example) is already in binlog at this point, and the > > > + and the "SET character_set_collations" is written inside a > > > + Q_CHARACTER_SET_COLLATIONS chunk in its log entry header. > > > + The flag CHARACTER_SET_COLLATIONS_USED is not needed any more. > > > + > > > + Let's suppress the flag to avoid a Q_CHARACTER_SET_COLLATIONS chunk > > > + inside the COMMIT log entry header - it would be useless and would > > > + only waste space in the binary log. > > > + */ > > > + thd->used&= ~THD::CHARACTER_SET_COLLATIONS_USED; > > > > Wouldn't that logic apply to other THD::xxx_USED flags? > > May be you should reset thd->used=0 here? > > Sounds reasonable. I tried thd->used=0. > > rpl.rpl_hrtime failed with multiple chunks like this: > > -SET TIMESTAMP=1293832861.123456/*!*/; > +SET TIMESTAMP=1293832861/*!*/; > COMMIT > > Do you know why binlog always puts "SET TIMESTAMP" before "COMMIT"? because a timestamp is a always present for any event, it's the first 4 bytes of the common event header, mandatory for all event types. In fact, event type itself is only in the 5th byte, after the timestamp. But the fractional part if printed only for Query_log_event and only if the Q_HRNOW field in the Query_log_event was present. I'd think that thd->used can be done not even before COMMIT, where you put it, but directly after MYSQL_BIN_LOG::write. 'used' flags affect the next binlog event, so it's reasonable to reset them after the event is written. > This change would not be comfortable for me. > I'd prefer if "SET TIMESTAMP" was not printed at all when it's not needed, > instead of removing only the fractional part. > > Also these tests: > binlog.binlog_mysqlbinlog_row > binlog.binlog_mysqlbinlog_row_myisam > failed with one chunk: > > SET TIMESTAMP=1000000000/*!*/; > -SET @@session.time_zone='SYSTEM'/*!*/; > COMMIT Yes, same. > What do you suggest? > > I'd solve it in a separate task. It seems it needs further > investigations, not related to this MDEV. I'd suggest to reset thd->used after the event was written. This should keep flags from leaking into the next event. Your thd->used&= ~THD::CHARACTER_SET_COLLATIONS_USED won't be needed. > > > diff --git a/sql/sql_prepare.cc b/sql/sql_prepare.cc > > > --- a/sql/sql_prepare.cc > > > +++ b/sql/sql_prepare.cc > > > @@ -3538,6 +3544,13 @@ static void mysql_stmt_execute_common(THD *thd, > > > DBUG_VOID_RETURN; > > > } > > > > > > + if (stmt->prepare_time_charset_collation_map_version() != > > > + thd->variables.character_set_collations.version()) > > > + { > > > + if (stmt->reprepare()) > > > + DBUG_VOID_RETURN; > > > + } > > > > why do you reprepare here and not where reprepare() is normally > > happens, in the Prepared_statement::execute_loop()? > > Can you please suggest a better place for this code? I mean, Prepared_statement already knows how to reprepare, it's already implemented and is done in Prepared_statement::execute_loop(). Why do you need to add another reprepare in a different place? > > > @@ -5220,6 +5252,13 @@ bool Prepared_statement::execute(String *expanded_query, bool open_cursor) > > > MYSQL_QUERY_EXEC_START(thd->query(), thd->thread_id, thd->get_db(), > > > &thd->security_ctx->priv_user[0], > > > (char *) thd->security_ctx->host_or_ip, 1); > > > + /* > > > + If PREPARE used @@character_set_collations, > > > + then we need to make sure binary log writes > > > + the map in the event header. > > > + */ > > > > why is it different from other THD::xx_USED flags? > > CHARACTER_SET_COLLATIONS_USED in most cases is set during parsing only, > and is not set on later stages afterwards. > > Other XXX_USED flags are set at later stages, either during fix_fields(), > or during evaluation. We don't need to care about copying them > from m_prepare_time_thd_used_flags, > because the flags will be set during EXECUTE (and in many cases they are > not even set during PREPARE). Still, you could've done thd->used|= m_prepare_time_thd_used_flags; couldn't you? It wouldn't beg the question of how CHARACTER_SET_COLLATIONS_USED is different. And in the future if there will be more parsing-time flags, it'll still work > > > diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy > > > --- a/sql/sql_yacc.yy > > > +++ b/sql/sql_yacc.yy > > > @@ -9686,6 +9713,9 @@ column_default_non_parenthesized_expr: > > > } > > > | CONVERT_SYM '(' expr USING charset_name ')' > > > { > > > + Character_set_collations_used used(thd); > > > + $5= thd->variables.character_set_collations. > > > + get_collation_for_charset(&used, $5); > > > > this is rather awkward pattern that you use everywhere, with > > > > Character_set_collations_used used(thd); > > > > why not to pass thd as an argument instead? > > charset_collations.h and lex_charset.h use > Charset_collation_map_st::Used. They neither know what its descendant > Character_set_collations_used is, nor know what THD is. > > I would like not to make charset_collations.h and lex_charset.h depend > on the entire sql_class.h. > > As a solution, we can move the part of THD where the data type used_t > and the enum with XXX_USED flags are defined into a separate class in > a separate header: yes, that sounds reasonable > > > diff --git a/sql/simple_tokenizer.h b/sql/simple_tokenizer.h > > > --- /dev/null > > > +++ b/sql/simple_tokenizer.h > <cut> > > > +class Simple_tokenizer > > > +{ > > > + const char *m_ptr; > > > + const char *m_end; > > > +public: > > > + Simple_tokenizer(const char *str, size_t length) > > > + :m_ptr(str), m_end(str + length) > > > + { } ... > - This charset-unaware (ASCII-only) tokenizer can be reused for a > number of system variables (sql_mode, optimizer_switch, > log_slow_filter, myisam_recover_options, > slave_transaction_retry_errors). could you add an MDEV for that please? perhaps even consider doing it after this one. > > > diff --git a/sql/charset_collations.h b/sql/charset_collations.h > > > --- /dev/null > > > +++ b/sql/charset_collations.h > <cut> > > > + > > > + struct Elem_st > > > + { > > > + protected: > > > + CHARSET_INFO *m_charset; > > > + CHARSET_INFO *m_collation; > > > > What is the difference between charset and collation? > > I was under impression that any CHARSET_INFO describes a collation, > > there is no special structure just for a charset. Am I wrong? > > You're not wrong. > > Just conceptually we map from a charset to a collation here. Hence the > names. > > I can rename to: > > CHARSET_INFO *m_from_charset; > CHARSET_INFO *m_to_collation; > > to make it more clear. Thanks. Much better. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org