10 Jul '24
Hi!
I assume everyone on this mailing list has a GitHub account. Could you
please +1 these PRs that add a mention of MariaDB in the documentation
of various projects that currently only mention MySQL?
Open each link below and press the thumbs up icon in the lower left
corner of the PR description:
https://github.com/bookshelf/bookshelf/pull/2129
https://github.com/ponyorm/pony/pull/708
https://github.com/metabase/metabase/issues/40325
https://github.com/SeaQL/seaql.github.io/pull/116
I am asking because there are a lot of database tools and
libraries out there that work with both MySQL and MariaDB but
only mention MySQL. This is a pity: users of those tools
might switch from MariaDB to MySQL simply out of
confusion or fear that the tool supports only MySQL, which is almost
never true.
One of the rare cases that does need code changes is the JavaScript/Node.js
Drizzle ORM. There is already a PR for that, which you could
+1 as well:
https://github.com/drizzle-team/drizzle-orm/pull/1692
The Rust Diesel ORM has a similar situation going on:
https://github.com/diesel-rs/diesel/issues/1882
https://github.com/diesel-rs/diesel/pull/3964
However, most projects have no MySQL-specific code at all and work with
MariaDB as-is; they just needed their docs updated, as was done in
these cases:
https://github.com/mysql-net/MySqlConnector/pull/1460
https://github.com/rails/rails/pull/51330
https://github.com/prisma/docs/pull/5706
https://github.com/gregstoll/gallery2/pull/168
https://github.com/coleifer/peewee/pull/2858
MariaDB 11.4 build consumes too much disk space - solved by not shipping the embedded server in Debian anymore
by Otto Kekäläinen 10 Jul '24
Hi all!
While preparing 11.4.2 for upload to Debian[1] I noticed that builders
fail due to lack of disk space[2]. Below are some listings of build
artifacts sorted by size. I solved this for now by no longer shipping[3]
the embedded server in Debian; I don't think it had many users anyway.
Leaving this here for visibility/comments though.
[1] https://salsa.debian.org/mariadb-team/mariadb-server/-/merge_requests/88
[2] https://salsa.debian.org/otto/mariadb-server/-/jobs/5793989
[3] https://salsa.debian.org/mariadb-team/mariadb-server/-/merge_requests/88/di…
± du -shc * | sort -hr
12G total
7,8G builddir
3,7G debian
234M mysql-test
95M storage
30M sql
17M strings
5,1M plugin
± find debian/tmp/ -type f -exec du -b {} \; | sort -n | tail -n 25
8938376 debian/tmp/usr/bin/myisampack
8956744 debian/tmp/usr/bin/mariadb-test
9441288 debian/tmp/usr/bin/myisamchk
9661400 debian/tmp/usr/bin/mariadb-client-test
9725048 debian/tmp/usr/bin/mariadb-binlog
9894072 debian/tmp/usr/lib/mysql/plugin/ha_spider.so
10277896 debian/tmp/usr/lib/mysql/plugin/ha_connect.so
10622344 debian/tmp/usr/bin/aria_s3_copy
11875712 debian/tmp/usr/bin/aria_dump_log
12048792 debian/tmp/usr/bin/aria_ftdump
12079744 debian/tmp/usr/bin/aria_pack
12814096 debian/tmp/usr/bin/aria_read_log
12943168 debian/tmp/usr/bin/aria_chk
25659064 debian/tmp/usr/lib/mysql/plugin/ha_mroonga.so
138399064 debian/tmp/usr/bin/sst_dump
143275720 debian/tmp/usr/lib/mysql/plugin/ha_rocksdb.so
145093496 debian/tmp/usr/bin/mariadb-ldb
214442664 debian/tmp/usr/bin/test-connect-t
214822248 debian/tmp/usr/bin/mariadb-embedded
214849432 debian/tmp/usr/lib/x86_64-linux-gnu/libmariadbd.so.19
215011504 debian/tmp/usr/bin/mariadb-test-embedded
216148632 debian/tmp/usr/bin/mariadb-client-test-embedded
248523904 debian/tmp/usr/sbin/mariadbd
256329048 debian/tmp/usr/bin/mariadb-backup
622299760 debian/tmp/usr/lib/x86_64-linux-gnu/libmariadbd.a
± find builddir -type f -exec du -b {} \; | sort -n | tail -n 25
18278408 builddir/storage/rocksdb/librocksdb_tools.a
25227146 builddir/sql/libwsrep.a
25298746 builddir/sql/libwsrep_provider.a
25614608 builddir/storage/mroonga/vendor/groonga/lib/libgroonga.a
25659064 builddir/storage/mroonga/ha_mroonga.so
92889078 builddir/storage/perfschema/libperfschema_embedded.a
97604058 builddir/storage/perfschema/libperfschema.a
125637396 builddir/storage/innobase/libinnobase_embedded.a
131904864 builddir/storage/innobase/libinnobase.a
138399064 builddir/storage/rocksdb/sst_dump
143275720 builddir/storage/rocksdb/ha_rocksdb.so
145093496 builddir/storage/rocksdb/mariadb-ldb
189426944 builddir/unittest/sql/my_json_writer-t
189648448 builddir/unittest/sql/explain_filename-t
214442664 builddir/unittest/embedded/test-connect-t
214822248 builddir/libmysqld/examples/mariadb-embedded
214849432 builddir/libmysqld/libmariadbd.so.19
215011504 builddir/libmysqld/examples/mariadb-test-embedded
216148632 builddir/libmysqld/examples/mariadb-client-test-embedded
248523904 builddir/sql/mariadbd
256329048 builddir/extra/mariabackup/mariadb-backup
337460022 builddir/libmysqld/libsql_embedded.a
380429390 builddir/sql/libsql.a
511486028 builddir/storage/rocksdb/librocksdblib.a
622299760 builddir/libmysqld/libmariadbd.a
Re: [MariaDB commits] [PATCH] MDEV-34504 PURGE BINARY LOGS not working anymore
by Kristian Nielsen 09 Jul '24
Hi Monty,
Here is my review of the patch for MDEV-34504. Looks good, just a bunch of
small suggestions:
> From: Monty <monty(a)mariadb.org>
>
> PURGE BINARY LOGS did not always purge binary logs. This commit fixes
> some of the issues and adds notifications if a binary log cannot be
> purged.
> diff --git a/mysql-test/suite/binlog/r/binlog_index.result b/mysql-test/suite/binlog/r/binlog_index.result
> index 9dfda71f9a7..2d2363a7fec 100644
> --- a/mysql-test/suite/binlog/r/binlog_index.result
> +++ b/mysql-test/suite/binlog/r/binlog_index.result
> @@ -30,6 +30,7 @@ flush logs;
> flush logs;
> *** must be a warning master-bin.000001 was not found ***
> Warnings:
> +Note 1375 Binary log 'master-bin.000004' is not purged because it is the current active binlog
Wording: "it is the active binlog" or "it is the currently active binlog".
> +Warnings:
> +Note 1375 Binary log 'master-bin.000004' is not purged because it is in use by an active XID transaction
A slightly better explanation is: "because it may be needed for crash
recovery". This more directly explains to the user why the file cannot be
removed yet (and what happens if the user removes the file anyway through
the file system).
It's true that this is currently related mostly to XA transactions, but the
details are more complex. It's not really "active" transactions (the
transactions are already committed at this point, it's the InnoDB redo log
that may not yet be synced to disk). And there may be other reasons in the
future (I think maybe binlog rotation already can be a reason, too).
> --- a/mysql-test/suite/binlog/t/binlog_flush_binlogs_delete_domain.test
> +++ b/mysql-test/suite/binlog/t/binlog_flush_binlogs_delete_domain.test
> @@ -91,6 +91,8 @@ while ($domain_cnt)
> FLUSH BINARY LOGS;
> --let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
> --eval PURGE BINARY LOGS TO '$purge_to_binlog'
> +--replace_column 2 #
> +SHOW BINARY LOGS;
Better use --source include/show_binary_logs.inc here (for consistency with
other tests, it does the same thing).
> diff --git a/sql/log.cc b/sql/log.cc
> index c27c4f3353b..1384fb0b3e7 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -4791,8 +4791,8 @@ int MYSQL_BIN_LOG::purge_first_log(Relay_log_info* rli, bool included)
>
> @@ -4902,7 +4903,6 @@ int MYSQL_BIN_LOG::purge_logs(const char *to_log,
> log_info.log_file_name);
> goto err;
> }
> -
> if (find_next_log(&log_info, 0) || exit_loop)
> break;
> }
Accidental deletion of empty line.
> @@ -5428,18 +5435,23 @@ int MYSQL_BIN_LOG::real_purge_logs_by_size(ulonglong binlog_pos)
> -MYSQL_BIN_LOG::can_purge_log(const char *log_file_name_arg)
> +MYSQL_BIN_LOG::can_purge_log(THD *thd, const char *log_file_name_arg,
> + bool interactive)
> {
> - THD *thd= current_thd; // May be NULL at startup
> @@ -5464,8 +5479,7 @@ MYSQL_BIN_LOG::can_purge_log(const char *log_file_name_arg)
> purge_sending_new_binlog_file= sending_new_binlog_file;
> }
> if ((res= log_in_use(log_file_name_arg,
> - (is_relay_log ||
> - (thd && thd->lex->sql_command == SQLCOM_PURGE)) ?
> + (is_relay_log || interactive) ?
> 0 : slave_connections_needed_for_purge)))
It's great that you avoid using current_thd here. In fact, with this change,
thd is no longer used at all in can_purge_log(), so you don't need to pass
it in as an argument any longer. (I'm guessing this is a left-over from an
earlier version of the patch, before you introduced the `interactive`
argument, which I also think is very good).
So please remove the `thd` argument of can_purge_log() ...
> @@ -5338,6 +5339,7 @@ int MYSQL_BIN_LOG::real_purge_logs_by_size(ulonglong binlog_pos)
> MY_STAT stat_area;
> char to_log[FN_REFLEN];
> ulonglong found_space= 0;
> + THD *thd= current_thd;
... and then you can also remove this call of current_thd.
> @@ -5473,9 +5487,39 @@ MYSQL_BIN_LOG::can_purge_log(const char *log_file_name_arg)
> waiting_for_slave_to_change_binlog= 1;
> strmake(purge_binlog_name, log_file_name_arg,
> sizeof(purge_binlog_name)-1);
> + if (res == 1)
> + reason= "it is in use by a slave thread";
> + else
> + reason= "less than 'slave_connections_needed_for_purge' slaves has "
> + "processed it";
> + goto error;
Grammar: s/has/have/:
"less than 'slave_connections_needed_for_purge' slaves have processed it"
> @@ -5847,7 +5850,28 @@ int mysqld_main(int argc, char **argv)
> #ifdef WITH_WSREP
> wsrep_set_wsrep_on(nullptr);
> if (WSREP_ON && wsrep_check_opts()) unireg_abort(1);
> -#endif
> +
> + if (!opt_bootstrap && WSREP_PROVIDER_EXISTS && WSREP_ON)
> + {
> + if (global_system_variables.binlog_format != BINLOG_FORMAT_ROW)
> + {
> + sql_print_warning("Binlog_format changed to \"ROW\" because of Galera");
> + global_system_variables.binlog_format= BINLOG_FORMAT_ROW;
> + mark_binlog_format_used(binlog_format_used);
> + }
This change seems unrelated to MDEV-34504. Not critical I think, but for the
future it's best to put such changes in a separate commit. (The change itself
is good, I think.)
> @@ -1303,13 +1312,19 @@ Sys_slave_connections_needed_for_purge(
> "slave_connections_needed_for_purge",
> "Minimum number of connected slaves required for automatic binary "
> "log purge with max_binlog_total_size, binlog_expire_logs_seconds "
> - "or binlog_expire_logs_days.",
> + "or binlog_expire_logs_days. Default is 0 when Galera is enabled and 1 "
> + "otherwise.",
Since Galera is #ifdef, I guess this should be something like:
"or binlog_expire_logs_days. "
#ifdef WSREP
"Default is 0 when Galera is enabled and 1 otherwise.",
#else
"Defaults to 1.",
#endif
> diff --git a/sql/sys_vars.h b/sql/sys_vars.h
> new file mode 100644
> index 00000000000..8f4eac38cd0
> --- /dev/null
> +++ b/sql/sys_vars.h
> @@ -0,0 +1,17 @@
> +/* Copyright (c) 2024, MariaDB Corporation.
> +
> + This program is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; version 2 of the License.
Hm. It's not copyrighted by MariaDB Corporation solely, there are other
copyright holders.
If you want to have a copyright line before the GPL header, you could maybe
say "Copyright (c) 2024, MariaDB Corporation and others", or "Parts of this
file Copyright (c) 2024, MariaDB Corporation". Or just omit it.
- Kristian.
08 Jul '24
Here is my review of the MDEV-33487 / PR#3087 patch.
First, some general remarks.
This patch addresses a specific bottleneck when binlogging huge transactions
(eg. multi-gigabyte table updates logged in ROW mode).
The root cause is a fundamental limitation in the current binlog format that
requires all transactions to be binlogged as a single event group, using
consecutive bytes in a single binlog file. Thus, while writing the large
transaction to the binlog during commit, all other transactions will
have to wait before they can commit.
Eventually, we should move towards an enhanced binlog format that doesn't
have this limitation, so that transactions can be partially binlogged during
execution, avoiding this bottleneck. There are a number of other problems
caused by this limitation that are not addressed by this patch. Changing the
binlog format is a complex task not easily done, so this patch should be
seen as a temporary solution to one specific bottleneck, until if/when this
underlying limitation can hopefully be removed.
Below is my detailed review of the code in the patch. This patch touches core
code of the binlog group commit mechanism, which is complex code with a
number of tricky issues to consider, and very important to keep correct and
maintainable. I have tried to give concrete suggestions on ways to write the
code, as well as some necessary extra tests. I think if MariaDB PLC wants
this patch to be merged, they need to assign a core developer to take
responsibility for it to ensure that any problems that turn up later will be
handled in a reasonable time frame (eg. prioritise time to work with the
author to debug and get fixes reviewed and merged).
> From: poempeng <poempeng(a)tencent.com>
>
> During the commit stage of a large transaction, it may cost too much
> time to write binlog cache and block all subsequent transactions for
> a long time. One possible solution of this issue is to write the binlog
> of the large transaction to a temporary file and then rename it to the
> next new binlog file.
It's not really the cost of writing (which is similar with and without the
patch); the problem is doing costly I/O under the global mutex LOCK_log that
blocks other commits. I suggest making that clear from the commit message,
for example:
Very large transactions (eg. gigabytes) can stall other transactions for a
long time because the data is written while holding LOCK_log, which blocks
other commits from binlogging. One possible solution to this issue is to
write a large transaction to its own binlog file outside of holding
LOCK_log, and then forcing a binlog rotate after obtaining LOCK_log,
simply renaming the new binlog file in place.
> diff --git a/sql/log.cc b/sql/log.cc
> index 460cefea47b..9de1c6118cd 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -74,6 +74,9 @@
> #include <utility> // pair
> #endif
>
> +#include <atomic>
> +#include <chrono>
These should not be needed after changes mentioned below.
> @@ -3727,7 +3730,10 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
> enum cache_type io_cache_type_arg,
> ulong max_size_arg,
> bool null_created_arg,
> - bool need_mutex)
> + bool need_mutex,
> + const char *file_to_rename,
> + my_off_t file_size,
> + group_commit_entry *entry)
> + if (file_to_rename)
> + {
> + if (write_gtid_and_skip_event(file_size, entry))
> + goto err;
> + }
> +
I don't think you need to write these here, inside MYSQL_BIN_LOG::open().
Instead you can do it in the caller after open() returns, right? Then you
can avoid the extra arguments file_size and entry, and the conditional here.
> @@ -6826,7 +6850,6 @@ MYSQL_BIN_LOG::write_gtid_event(THD *thd, bool standalone,
> thd->variables.server_id= global_system_variables.server_id;
> }
> #endif
> -
Avoid unrelated changes in the diff like this one (to simplify later merges
and history search).
> @@ -7849,11 +7872,18 @@ int Event_log::write_cache_raw(THD *thd, IO_CACHE *cache)
> - mysql_mutex_assert_owner(&LOCK_log);
> +
> + IO_CACHE *out_file= f;
> + if (likely(f == nullptr))
> + {
> + mysql_mutex_assert_owner(&LOCK_log);
> + out_file= get_log_file();
> + }
Don't add another conditional here. Instead just pass in the IO_CACHE to use
as a parameter in each call site, eg. make the `f` parameter mandatory, not
optional.
> @@ -8581,7 +8612,23 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
> bool
> MYSQL_BIN_LOG::write_transaction_to_binlog_events(group_commit_entry *entry)
> {
> - int is_leader= queue_for_group_commit(entry);
> + int is_leader;
> + if (unlikely(entry->cache_mngr->trx_cache.get_byte_position() >=
> + non_blocking_binlog_threshold) ||
> + DBUG_IF("non_blocking_binlog_ignore_cache_size"))
> + {
> + if (!can_use_non_blocking_binlog(entry) ||
> + write_non_blocking_binlog(entry))
> + goto original_commit_path;
> + /* thread using non-blocking binlog is treated as a single group */
> + is_leader= 1;
> + entry->non_blocking_log= true;
> + entry->next= nullptr;
> + goto group_commit_leader;
> + }
> +original_commit_path:
> +
> + is_leader= queue_for_group_commit(entry);
Ok, so when we force a binlog rotation due to binlog_non_blocking_threshold,
we do this by itself, not as part of a group commit. I think that is
sensible.
The way this gets integrated in the existing code is not very nice, with all
these extra goto's and conditionals on entry->non_blocking_log. Can we
instead make the decision on binlog_non_blocking_threshold in the function
MYSQL_BIN_LOG::write_transaction_to_binlog()? And then call into
MYSQL_BIN_LOG::trx_group_commit_leader() directly, skipping the group commit
code with `is_leader` in write_transaction_to_binlog_events(). That should
greatly simplify the logic.
Probably the trx_group_commit_leader() needs to be refactored a bit, split
in two, where the first part is dealing with grabbing the group commit
queue, while only the second part is called in the
binlog_non_blocking_threshold case, to do the actual binlogging.
You might also consider using a separate function for the binlogging in the
binlog_non_blocking_threshold case, extracting pieces from
trx_group_commit_leader() into smaller shared functions; it depends on what
gives the cleanest and simplest code, and is best decided as part of the
refactoring of trx_group_commit_leader().
Another related thing is this: how does the code ensure that commit
ordering (on the slave) is preserved? This is the wait_for_prior_commit()
mechanism, and is needed to ensure on the slave that the GTIDs are binlogged
in the right order. This is normally handled by queue_for_group_commit(),
which is skipped in the binlog_non_blocking_threshold case. It was not clear
to me from the code how this gets handled. This probably also needs a test
case to test that binlog order will be correct on the slave when multiple
transactions commit in parallel (eg. optimistic parallel replication), and
several of them use the binlog_non_blocking_threshold case.
> @@ -12459,6 +12519,7 @@ mysql_bin_log_commit_pos(THD *thd, ulonglong *out_pos, const char **out_file)
> }
> #endif /* INNODB_COMPATIBILITY_HOOKS */
>
> +mysql_rwlock_t binlog_checksum_rwlock;
>
> static void
> binlog_checksum_update(MYSQL_THD thd, struct st_mysql_sys_var *var,
> @@ -12468,6 +12529,7 @@ binlog_checksum_update(MYSQL_THD thd, struct st_mysql_sys_var *var,
> bool check_purge= false;
> ulong UNINIT_VAR(prev_binlog_id);
>
> + mysql_rwlock_wrlock(&binlog_checksum_rwlock);
> mysql_mutex_lock(mysql_bin_log.get_log_lock());
> if(mysql_bin_log.is_open())
> {
Ok, so you introduce a new lock to protect changing binlog_checksum_options.
This is otherwise protected by LOCK_log, but we don't want to be holding
that while writing the big transaction, obviously.
But did you check all places where binlog_checksum_options is accessed? Your
patch doesn't use the new binlog_checksum_rwlock anywhere except in
binlog_checksum_update(). There might be some places that currently use
LOCK_log to protect the access, and could instead use the new
binlog_checksum_rwlock (reducing contention on LOCK_log). Or did you check
already, and there was no such opportunity?
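For illustration, a read site could then look something like this (a
hypothetical call site, not code from the patch):

mysql_rwlock_rdlock(&binlog_checksum_rwlock);
/* Snapshot the option under the rdlock, so a concurrent
   binlog_checksum_update() (which takes the wrlock) cannot race us. */
enum_binlog_checksum_alg alg=
    (enum_binlog_checksum_alg) binlog_checksum_options;
mysql_rwlock_unlock(&binlog_checksum_rwlock);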
> +bool MYSQL_BIN_LOG::can_use_non_blocking_binlog(group_commit_entry *entry)
> +{
> + DBUG_ASSERT(entry->cache_mngr->trx_cache.get_byte_position() >=
> + non_blocking_binlog_threshold ||
> + DBUG_IF("non_blocking_binlog_ignore_cache_size"));
> + THD *thd= entry->thd;
> + if (unlikely(!is_open()) || encrypt_binlog ||
> + !entry->cache_mngr->stmt_cache.empty() ||
> + entry->cache_mngr->trx_cache.has_incident() || thd->slave_thread ||
> + thd->wait_for_commit_ptr)
Oh. So you disable this on the slave. Is that really necessary?
Ok, on the slave, the following transactions are in many cases
blocked from committing before the large one anyway, so I see that this could be
acceptable. It does seem a very special-purpose use case though. So if this
is kept disabled for slave threads, it needs to be very clearly documented,
and there should be a comment here as well in the code about why this is
done.
> +std::string MYSQL_BIN_LOG::generate_random_file_name()
"generate_tmp_binlog_file_name" ?
> + if (unlikely(!binlog_dir_inited.load(std::memory_order_acquire)))
> + {
> + char dev[FN_REFLEN];
> + size_t dev_length;
> + mysql_mutex_lock(&LOCK_log);
> + if (!binlog_dir_inited.load(std::memory_order_relaxed) && name != nullptr)
This is not necessary. Just initialize what you need when the binlog is
initialized, just like other binlog initialization, avoiding the check for
binlog_dir_inited. See for example init_server_components() in sql/mysqld.cc
> + std::string temp_file_name;
Normally we don't use std::string in the server source, as it makes it
harder to control the memory allocation and/or to avoid dynamic memory
allocation. In this case we would normally use a fixed-size buffer of length
FN_REFLEN, or the class String from sql/sql_string.h with some fixed
stack-allocated initial buffer for the common case of not too long name, to
avoid a dynamic memory allocation.
> + temp_file_name.reserve(FN_REFLEN);
> + auto now_in_sys= std::chrono::system_clock::now().time_since_epoch();
> + auto now_in_ms=
> + std::chrono::duration_cast<std::chrono::milliseconds>(now_in_sys)
> + .count();
You don't need to add this millisecond timestamp, do you? The file name will
already be unique from temp_bin_counter?
> + auto count= temp_bin_counter.fetch_add(1);
This by default uses std::memory_order_seq_cst, which is overly restrictive.
You can just use class Atomic_counter from include/my_counter.h to get a
simple atomic counter with std::memory_order_relaxed semantics.
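Something along these lines, sketched here with plain std::atomic
(Atomic_counter wraps the same pattern):

#include <atomic>
#include <cstdint>

/* Relaxed ordering suffices: the counter only has to hand out unique
   values, it does not order any other memory accesses. */
static std::atomic<uint64_t> temp_bin_counter{0};

static uint64_t next_temp_bin_id()
{
  return temp_bin_counter.fetch_add(1, std::memory_order_relaxed);
}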
> + temp_file_name.append(binlog_dir);
> + temp_file_name.append("_temp_bin_");
> + temp_file_name.append(std::to_string(now_in_ms));
> + temp_file_name.push_back('_');
> + temp_file_name.append(std::to_string(count));
> +template <typename F= std::function<void()>> class Scoped_guard
> +{
> +public:
> + Scoped_guard(F f);
> + ~Scoped_guard();
> +
> +private:
> + F func;
> +};
Hm. Yes, this is nice, but it is not good if each new feature implements its
own specific class like this.
So either this should go somewhere shared where it can be used in the future
by other code (did you check that there isn't already something similar that
can be used, in the MariaDB source code or in the C++ standard library?),
preferably as a separate commit.
Or alternatively stay with the existing old-fashioned style for error
handling used elsewhere in sql/log.cc.
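For reference, a minimal shared version need not be big; a sketch,
assuming the server builds as C++17:

#include <utility>

/* Minimal generic scope guard: runs the stored callable when the
   enclosing scope is left, whichever way it is left. */
template <typename F>
class Scope_exit
{
public:
  explicit Scope_exit(F f) : func(std::move(f)) {}
  ~Scope_exit() { func(); }
  Scope_exit(const Scope_exit&)= delete;
  Scope_exit& operator=(const Scope_exit&)= delete;
private:
  F func;
};

/* Usage (C++17 deduces F from the lambda):
   Scope_exit cleanup([&]{ my_free(skip_event_buf); });  */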
> + size_t skip_event_len= non_blocking_binlog_reserved_size - skip_event_start;
> + skip_event_buf=
> + (uchar *) my_malloc(PSI_INSTRUMENT_ME, skip_event_len, MYF(MY_ZEROFILL));
> + generate_skip_event(skip_event_buf, skip_event_len, skip_event_start,
> + entry->thd);
> + if (my_b_write(&log_file, skip_event_buf, skip_event_len) != 0)
> + return true;
Do we really need to generate this skip event? It seems quite hacky to have
a random dummy event in the middle of the binlog.
The size you need to reserve should be possible to pre-calculate:
- Format_description_log_event
- Binlog_checkpoint_event
- Gtid_list_log_event
- Gtid_log_event
The only issue should be the Gtid_list_log_event. This can grow in size, but
only if a new GTID domain id or server id gets added during the writing of
the temporary file. This is very unlikely to happen, and the code can check
for if this happens and in this case fall back to writing the data to the
binlog file normally under LOCK_log.
This way, we can avoid this strange skip event.
> + if (temp_file_name.size() >= FN_REFLEN)
> + {
> + sql_print_warning("The name of temp file '%s' is too long!",
> + temp_file_name.c_str());
> + return true;
This will go in the server log for the user/DBA to wonder about. So add a
bit of context to the message so that it is clear that it is about the
temporary file used for binlog_non_blocking_threshold.
> + thd->push_internal_handler(&dummy_error_handler);
Hm. Why do you need to do this? If this is really needed, then it needs a
good comment explaining why.
But I suspect you might not need this at all. Is it to avoid assertions when
an error is handled on the file operations? If you don't want
mysql_file_open() and so to set an error with my_error() (because you want
to handle the error yourself somehow), then just don't pass MY_WME as a flag
to the function.
> + if (unlikely(generate_new_name(new_name, name, 0)))
> + abort();
> + new_name_ptr= new_name;
> + /*
> + We log the whole file name for log file as the user may decide
> + to change base names at some point.
> + */
> + Rotate_log_event r(new_name + dirname_length(new_name), 0,
> + LOG_EVENT_OFFSET, 0);
> + if (unlikely(write_event(&r, alg)))
> + abort();
> + if (unlikely(flush_io_cache(&log_file) != 0))
> + abort();
No, don't crash the server like this.
Errors during writing binlog are indeed very tricky. But it shouldn't crash
the server. In the worst case, we may be left with a corrupt binlog, but the
error should be handled and reported to the user/DBA, similar to how it is
done in other parts of the code.
How does this code work with respect to the binlog checkpoint mechanism?
The binlog checkpoint mechanism is a way to ensure that sufficient binlog
files are kept to ensure correct crash recovery, in an asynchronous way
that does not require syncing the InnoDB redo log at binlog rotation. It
uses the struct xid_count_per_binlog to keep track of which binlogs are
active, and uses this to write binlog checkpoint events as appropriate.
There are some tricky concurrency issues around this mechanism, so it is
important to consider carefully how this will be handled in different cases
by the code. For example, if the binlog checkpoint gets delayed and spans
multiple binlog files including one or more generated by this patch; or what
happens if RESET MASTER is run in parallel, etc.
I don't see any changes related to xid_count_per_binlog in the patch. Is
this because this is all handled correctly by MYSQL_BIN_LOG::open()? I would
like to see a comment somewhere explaining how this is working.
Also I think there needs to be some test cases testing this, eg. delayed
binlog checkpointing when several binlog_non_blocking_threshold binlog
rotations are running in parallel.
> diff --git a/sql/log.h b/sql/log.h
> index fc5209d1922..af2b56da802 100644
> --- a/sql/log.h
> +++ b/sql/log.h
> @@ -19,6 +19,7 @@
> +private:
> + // reserved size for format_description_log_event, gtid_list_log_event ...
> + static const my_off_t non_blocking_binlog_reserved_size= 10 * 1024 * 1024;
Oh, 10 *megabyte* default for the dummy event? That's surely excessive? But
hopefully we can avoid the dummy event completely as suggested above.
> +static Sys_var_ulonglong Sys_var_non_blocking_binlog_threshold(
> + "non_blocking_binlog_threshold",
> + "If the binlog size of a transaction exceeds this value, we write it to a "
> + "new temporary file and rename it to the next binlog file.",
> + GLOBAL_VAR(non_blocking_binlog_threshold), CMD_LINE(OPT_ARG),
> + VALID_RANGE(256 * 1024 * 1024, ULONGLONG_MAX), DEFAULT(ULONGLONG_MAX),
> + BLOCK_SIZE(1));
Maybe name it so that it starts with "binlog_", eg. binlog_non_blocking_threshold?
I think the VALID_RANGE should be extended, there is no need to forbid users
to set it low for testing purposes etc. (even though I agree normally a
large value would be strongly recommended).
Suggestion for better description:
If the binlog size in bytes of a transaction exceeds this value, force a
binlog rotation and write the transaction to its own binlog file. This can
reduce stalls of other commits when binlogging very large transactions, at
the cost of creating extra binlog files smaller than the configured
--max-binlog-size.
- Kristian.
https://jira.mariadb.org/browse/MDEV-34228
I am trying my hand at this issue, which is a new feature introduced in
SQL:2023, and have been unable to determine whether the underscore should
only be part of queries, or whether it should also be part of the query's
output.
Also, I have been trying to fiddle around with the parser in sql_lex.cc.
Any pointers or guidance on how to approach this?
Rohan Gupta
I followed
https://mariadb.org/get-involved/getting-started-for-developers/get-code-bu…
and am able to run the server and mariadb client.
Right now I am trying to debug the project after building and running it,
but haven't been successful.
I am trying the issue https://jira.mariadb.org/browse/MDEV-34228
SQL:2023 introduces the use of underscores in numeric literals. Any
pointers on how to approach this will be appreciated. I have been trying
to fiddle around with the numericliteral.h file.
Thank You
Hi!
Here are a couple of PRs that already have one approval but have been
pending a second approval for some weeks. Could any devs here
spare some review time and add their approval so the PRs can be
merged? (Or shout out if you object to them being merged.)
https://github.com/MariaDB/server/pull/3301
https://github.com/MariaDB/server/pull/3261
https://github.com/MariaDB/server/pull/3245
https://github.com/MariaDB/server/pull/3244
https://github.com/MariaDB/server/pull/3225
https://github.com/MariaDB/server/pull/2961
Hi,
It's a known issue that strings to and from UDF functions are a pain
(https://dev.mysql.com/blog-archive/a-tale-of-udfs-with-character-sets/,
amongst others).
Most recently I ran into an issue trying to create a UDF to assist with
IPv6 manipulations (in its current, not-working-as-expected form at
https://github.com/jkroonza/uls-mysql/blob/master/src/uls_inet6.c).
The two UDF functions there are intended to find the "network" (first)
and "broadcast" (last) address of a range, with the intention to add
functions to perform other manipulations (like adding some offset to an
IPv6 address, or even adding two IPv6 addresses together, eg, to find the
"next" network for 2c0f:f720::/40 you can add 0:0:100:: to 2c0f:f720::).
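(The addition itself is just a byte-wise add with carry over the 16-byte
address; a sketch, assuming network byte order, ie most significant byte
first:)

#include <stdint.h>

/* Eg 2c0f:f720:: + 0:0:100:: gives 2c0f:f720:100::, the next /40. */
static void inet6_add(const uint8_t a[16], const uint8_t b[16],
                      uint8_t out[16])
{
  unsigned carry= 0;
  for (int i= 15; i >= 0; i--)   /* least significant byte is last */
  {
    unsigned sum= a[i] + b[i] + carry;
    out[i]= (uint8_t) (sum & 0xff);
    carry= sum >> 8;
  }
}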
The referenced code works great when you get values from an inet6 column
to pass to the function (the value gets transformed into its textual
presentation before being passed to the UDF), eg:
MariaDB [test]> select uls_inet6_network_address('::ffff:192.168.1.123',
120);
+--------------------------------------------------------+
| uls_inet6_network_address('::ffff:192.168.1.123', 120) |
+--------------------------------------------------------+
| ::ffff:192.168.1.0 |
+--------------------------------------------------------+
1 row in set (0.001 sec)
However, given this:
MariaDB [test]> create table t(t inet6 not null, primary key(t));
Query OK, 0 rows affected (0.024 sec)
MariaDB [test]> insert into t values('::ffff:192.168.1.15');
Query OK, 1 row affected (0.014 sec)
MariaDB [test]> select * from t where
uls_inet6_network_address('::ffff:192.168.1.123', 120) < t;
Empty set, 1 warning (0.009 sec)
MariaDB [test]> show warnings;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1292 | Incorrect inet6 value: '::ffff:192.168.1.0' |
+---------+------+---------------------------------------------+
1 row in set (0.000 sec)
I started to dig, to find this:
MariaDB [test]> select
charset(uls_inet6_network_address('::ffff:192.168.1.123', 120));
+-----------------------------------------------------------------+
| charset(uls_inet6_network_address('::ffff:192.168.1.123', 120)) |
+-----------------------------------------------------------------+
| binary |
+-----------------------------------------------------------------+
1 row in set (0.000 sec)
Then I played a bit and found that if I rig it so that the returned
string has 16 characters, I don't get an error; eg:
MariaDB [test]> select * from t where
uls_inet6_network_address('::ff:f:ffff:ffff', 120) > t;
+---------------------+
| t |
+---------------------+
| ::ffff:192.168.1.15 |
+---------------------+
1 row in set (0.001 sec)
Based on this I'm inferring that:
1. Contextually, the difference between the BINARY() and CHAR() types is
the character set (ie, binary is implemented as a kind of character set
for strings, or perhaps strings are binaries with a specific character set).
2. The inet6 column is simply BINARY(16), with a transform applied to
convert strings to and from the BINARY() format; I think the technical
term in the code is "fixed binary storage".
3. When string (character) data gets sent *to* an inet6 column, a
"character set conversion" is performed; as is the case with UDF return
values, it's already BINARY, so no conversion is actually performed, and
since the BINARY length is now *not* 16 bytes, we get the above warning,
and thus comparisons are as if with a NULL value, and always false.
I've been contemplating possible "solutions" to the problem without
breaking backwards compatibility, and it's tricky. The below aims at a
simple way of specifying the "character set" of a string return value
in more detail, and possibly even at looking at the character sets of
parameters.
MySQL has started down a path (as per the tale above) whereby it's
using some other mechanism to give indications about character sets. I
don't like the particular mechanism, but it could work; it looks like it
only handles the character set though, in which case for my example above
I could just force both input and output to "latin1" so that MySQL/MariaDB
is forced to convert between that and INET6 again. I think we can do
better. I'm not convinced their change permits backwards
compatibility - ie, any UDF that uses those functions will require them
and cannot opt to operate in a degraded mode. Not sure that's even a
desirable thing to have though ... so perhaps copying what they're doing
here is the way to go.
My ideas:
Option 1. Simply permit specifying a more precise string type during
function creation.
For example, instead of:
MariaDB [uls]> CREATE FUNCTION uls_inet6_last_address RETURNS STRING
SONAME 'uls_inet6.so';
Do this:
MariaDB [uls]> CREATE FUNCTION uls_inet6_last_address RETURNS INET6
SONAME 'uls_inet6.so';
And this is permitted because INET6 is a specialization of BINARY, which
is really what MySQL passes to/from UDF functions anyway.
The downside is that this will only work for return values. It would work
for my use case, since from INET6 => UDF the decoder is called anyway, so I
do get the text presentation on function entry, and now I just need to
return the binary form of INET6. If sent back to the client, it gets
converted by the server to its textual presentation.
This eliminates at least *one* of the two pointless conversions to/from
binary format.
Option 2. A mechanism to pass the character set/binary type upon entry
and exit.
What I'm thinking here is a non-backwards compatible change,
potentially. Or use the *extension pointer somehow. I'm still having
some trouble navigating the codebase to figure out exactly how the void
*extension pointers in UDF_ARGS and UDF_INIT structs are used. But I'm
thinking they can be (ab)used to somehow pass an additional structure,
which may be how option 3 is achieved, perhaps?
To be more precise, the current UDF_ARGS looks like:
typedef struct UDF_ARGS {
  unsigned int arg_count;           /**< Number of arguments */
  enum Item_result *arg_type;       /**< Pointer to item_results */
  char **args;                      /**< Pointer to argument */
  unsigned long *lengths;           /**< Length of string arguments */
  char *maybe_null;                 /**< Set to 1 for all maybe_null args */
  char **attributes;                /**< Pointer to attribute name */
  unsigned long *attribute_lengths; /**< Length of attribute arguments */
  void *extension;
} UDF_ARGS;
I can't find any source files that reference both the UDF_ARGS type and
this extension field. As such I'm hoping I can safely assume
that in current implementations this will always be NULL.
If so it becomes possible to turn this into a pointer-sized version
field, or struct size field, ie something like:
union {
void *extension;
size_t argsize; /* one alternative */
long api_version; /* another alternative */
};
For argsize, this way it becomes possible to check that this field is >=
the sizeof(UDF_ARGS) against which the UDF was compiled. This *feels*
flaky, but should be fine. If a user wants more fine-grained backwards
compatibility, they can check that all fields (beyond those listed) are
positioned inside the structure.
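A sketch of that check from the UDF's side, assuming the proposed argsize
member (hypothetical, not in any current header):

/* Compiled against headers where the unused extension pointer has become
   argsize. An old server leaves the field 0 (the former NULL pointer), so
   the UDF can detect that and degrade gracefully. */
static bool udf_args_compatible(const UDF_ARGS *args)
{
  return args->argsize >= sizeof(UDF_ARGS);
}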
Another option is simply to have extension be a pointer to some
UDF_ARGS_EXPANDED struct, which is really a pointer back to the UDF_ARGS
itself, looking something like:
typedef struct UDF2_ARGS {
  ... everything from above, except void *extension becomes:
  struct UDF2_ARGS *extension;   /* this points to self */
  size_t argsize;                /* or long api_version */
  bool udf2_supported;           /* set to false by the server; if the UDF
                                    supports this, the UDF sets it to true */
  const char **string_arg_types; /* in _init this will be passed as the raw
                                    type, eg, INET6 if the input is an INET6
                                    field; can be set in _init to have the
                                    server cast the type, ie, perform some
                                    conversion */
  ... additional fields;
} UDF2_ARGS;
So in my example, if udf2_supported comes back as false after _init,
then current behaviour is maintained: INET6 will be converted to
string, and return strings are assumed to be BINARY. If, however,
udf2_supported is true, then my code could set
string_arg_types[0] = "inet6"; - which the engine knows is a BINARY(16)
type - and use the INET6 conversion code to convert to and from BINARY(16)
as needed.
Option 3. Some other more formal UDF API versioning/api
definition/scheme. So call this UDFv2, and make this extensible so that
features can be "negotiated", or that the UDF itself can even "fail to
load" in case where it requires support from the running server that's
not provided.
The idea here would be that a symbol like "mariadb_udf_init" is
exported, passing it some structure that defines the capabilities of the
server it's running on, the module can then decline to load, or proceed,
but limit itself based on functionality available kind of thing.
I'm guessing option 2 is also a way of achieving this, and may be
sufficient. I do however like the idea of having a {func}_load and
{func}_unload call (some of our PBKDF functions would also benefit from
this, in that we could load the openssl algorithms once at startup instead
of every time in _init). There may be other complications though, but
still, this could be one suggestion for v2.
I'm happy to write a more formal specification based on this informal
sketch, and I'm inclined to attempt option 2 above. I just
honestly have no idea what's all REALLY involved, and where to find the
various bits and pieces of code. So I guess the initial steps would be:
1. Add a _load function which is invoked at *load* time, passing server
and version information. Not sure what else could be useful; the intention
is to initialise the UDF for use as needed. UDFs that want to remain
backwards compatible would need to track whether this was indeed called
or not, and if not, act accordingly, or if they can't be backwards
compatible without this, error out in _init. Suggestions as to parameters
to pass would be great; I suggest we use a struct again with a size field
as its first field to ensure that we can add extra fields at a later
stage, safely. Eg:
typedef struct {
  size_t load_size;
  const char *server;
  struct {
    uint16_t major, minor, patch; /* 10, 6, 17 - for example what
                                     I've got running currently */
    const char *extra;            /* "MariaDB-log", the stuff after the
                                     dash from SELECT version(); */
  } version;
} UDF_LOADINFO;
And so we will call "bool (*func_load)(const UDF_LOADINFO*)" iff it exists
(a server-side lookup sketch follows this list).
2. Add an _unload function "void (*func_unload)()", the purpose of which
is to clean up after _load.
Note: It's possible to use __attribute__((constructor)) and destructor
functions, but my experience is that they're messy, and if you have
multiple functions in a UDF you can't know which ones you're being
loaded for.
3. Expand for UDF2_ARGS as explained in option 2 above, as well as
UDF2_INIT.
4. Ability to flag functions as *deterministic* (same return for same
arguments) rather than merely const or not const. This way SQL can
optimize calls in the same manner as for stored functions. I'm not
going to dig into that, but happy to add the flag.
5. Document and flag this as UDFv2 as and when it hits formal release.
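As promised above, a hypothetical sketch of the server-side _load lookup;
it mirrors how the existing _init/_deinit symbols are resolved after
dlopen(), but none of this is existing server code:

#include <dlfcn.h>

typedef bool (*udf_load_fn)(const UDF_LOADINFO *);

/* Resolve "<function_name>_load" from the already-dlopen()ed module and
   call it. A missing symbol just means an older UDF: keep old behaviour. */
static bool call_udf_load(void *dl_handle, const char *load_symbol,
                          const UDF_LOADINFO *info)
{
  udf_load_fn func_load= (udf_load_fn) dlsym(dl_handle, load_symbol);
  if (!func_load)
    return true;
  return func_load(info);   /* false: the module declines to load */
}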
Kind regards,
Jaco
Re: 98ebe0a3afc: MDEV-19123 Change default charset from latin1 to utf8mb4
by Sergei Golubchik 17 Jun '24
Hi, Alexander,
No comments about actual code changes, only about tests.
Please, see below
On Jun 12, Alexander Barkov wrote:
> revision-id: 98ebe0a3afc (mariadb-11.5.1-12-g98ebe0a3afc)
> parent(s): 186a30de58b
> author: Alexander Barkov
> committer: Alexander Barkov
> timestamp: 2024-06-11 14:17:11 +0400
> message:
>
> MDEV-19123 Change default charset from latin1 to utf8mb4
>
> Changing the default server character set from latin1 to utf8mb4.
> diff --git a/mysql-test/main/column_compression.test b/mysql-test/main/column_compression.test
> --- a/mysql-test/main/column_compression.test
> +++ b/mysql-test/main/column_compression.test
> @@ -9,20 +9,20 @@ let $typec= BLOB COMPRESSED;
> let $typeu= BLOB;
> --source column_compression.inc
>
> -let $typec= TEXT COMPRESSED;
> -let $typeu= TEXT;
> +let $typec= TEXT COMPRESSED CHARACTER SET latin1;
> +let $typeu= TEXT CHARACTER SET latin1;
why?
> --source column_compression.inc
>
> let $typec= VARBINARY(10000) COMPRESSED;
> let $typeu= VARBINARY(10000);
> --source column_compression.inc
>
> -let $typec= VARCHAR(10000) COMPRESSED;
> -let $typeu= VARCHAR(10000);
> +let $typec= VARCHAR(10000) COMPRESSED CHARACTER SET latin1;
> +let $typeu= VARCHAR(10000) CHARACTER SET latin1;
> --source column_compression.inc
>
> let $typec= TEXT COMPRESSED CHARSET ucs2;
> -let $typeu= TEXT;
> +let $typeu= TEXT CHARACTER SET latin1;
> --source column_compression.inc
>
> SET column_compression_zlib_wrap=DEFAULT;
> diff --git a/mysql-test/main/create.result b/mysql-test/main/create.result
> --- a/mysql-test/main/create.result
> +++ b/mysql-test/main/create.result
> @@ -506,9 +506,9 @@ FROM t1;
> SHOW CREATE TABLE t2;
> Table Create Table
> t2 CREATE TABLE `t2` (
> - `ifnull(c_tinytext, CAST('yet another binary data' AS BINARY))` tinyblob DEFAULT NULL,
> - `ifnull(c_text, CAST('yet another binary data' AS BINARY))` blob DEFAULT NULL,
> - `ifnull(c_mediumtext, CAST('yet another binary data' AS BINARY))` mediumblob DEFAULT NULL,
> + `ifnull(c_tinytext, CAST('yet another binary data' AS BINARY))` blob DEFAULT NULL,
> + `ifnull(c_text, CAST('yet another binary data' AS BINARY))` mediumblob DEFAULT NULL,
> + `ifnull(c_mediumtext, CAST('yet another binary data' AS BINARY))` longblob DEFAULT NULL,
looks wrong
UPD: there were more changes like this below, I didn't comment on them
> `ifnull(c_longtext, CAST('yet another binary data' AS BINARY))` longblob DEFAULT NULL
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_uca1400_ai_ci
> DROP TABLE t2;
> diff --git a/mysql-test/main/ctype_utf8_def_upgrade.result b/mysql-test/main/ctype_utf8_def_upgrade.result
> --- a/mysql-test/main/ctype_utf8_def_upgrade.result
> +++ b/mysql-test/main/ctype_utf8_def_upgrade.result
> @@ -61,23 +61,23 @@ t1 CREATE TABLE `t1` (
> PRIMARY KEY (`Host`,`Db`)
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_uca1400_ai_ci COMMENT='Host privileges; Merged with database privileges'
> DROP TABLE t1;
> -SET @@character_set_database=DEFAULT;
> +SET @@collation_database=DEFAULT;
why?
> # Now do the same, but doing 'ALTER DATABASE' to create the db.opt file,
> # instead of setting variables directly.
> # Emulate a pre-4.1 database without db.opt
> SHOW CREATE DATABASE db1;
> Database Create Database
> -db1 CREATE DATABASE `db1` /*!40100 DEFAULT CHARACTER SET utf8mb3 COLLATE utf8mb3_uca1400_ai_ci */
> +db1 CREATE DATABASE `db1` /*!40100 DEFAULT CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci */
> USE db1;
> -SELECT @@character_set_database, 'taken from defaults' AS comment;
> -@@character_set_database comment
> -utf8mb3 taken from defaults
> +SELECT @@collation_database, 'taken from defaults' AS comment;
> +@@collation_database comment
> +utf8mb3_general_ci taken from defaults
> USE test;
> ALTER DATABASE db1 DEFAULT CHARACTER SET latin1;
> USE db1;
> -SELECT @@character_set_database, 'taken from db.opt' AS comment;
> -@@character_set_database comment
> -latin1 taken from db.opt
> +SELECT @@collation_database, 'taken from db.opt' AS comment;
> +@@collation_database comment
> +latin1_swedish_ci taken from db.opt
> SELECT COUNT(*) FROM t1;
> ERROR HY000: Got error 190 "Incompatible key or row definition between the MariaDB .frm file and the information in the storage engine. You can try REPAIR TABLE ... USE_FRM possibly followed by ALTER TABLE ... FORCE or dump and restore the table to fix this" from storage engine MyISAM
> REPAIR TABLE t1 USE_FRM;
> diff --git a/mysql-test/main/endspace.test b/mysql-test/main/endspace.test
> --- a/mysql-test/main/endspace.test
> +++ b/mysql-test/main/endspace.test
> @@ -13,7 +13,7 @@ drop table if exists t1;
> # Test MyISAM tables.
> #
>
> -create table t1 (text1 varchar(32) not NULL, KEY key1 (text1));
> +create table t1 (text1 varchar(32) not NULL, KEY key1 (text1)) charset=latin1;
why is that?
> insert into t1 values ('teststring'), ('nothing'), ('teststring\t');
> check table t1;
> select * from t1 ignore key (key1) where text1='teststring' or
> diff --git a/mysql-test/main/func_compress.test b/mysql-test/main/func_compress.test
> --- a/mysql-test/main/func_compress.test
> +++ b/mysql-test/main/func_compress.test
> @@ -1,5 +1,9 @@
> -- source include/have_compress.inc
> -- source include/have_normal_zlib.inc
> +
> +--source include/test_db_charset_latin1.inc
same as in column_compression.test ?
> +
> +
> #
> # Test for compress and uncompress functions:
> #
> diff --git a/mysql-test/main/long_unique.result b/mysql-test/main/long_unique.result
> --- a/mysql-test/main/long_unique.result
> +++ b/mysql-test/main/long_unique.result
> @@ -41,7 +41,7 @@ Ignored NO
>
> MyISAM file: DATADIR/test/t1
> Record format: Packed
> -Character set: latin1_swedish_ci (8)
> +Character set: ? (0)
huh?
> Data records: 10 Deleted blocks: 0
> Recordlength: 12
>
> diff --git a/mysql-test/main/mysqlbinlog_row_compressed.result b/mysql-test/main/mysqlbinlog_row_compressed.result
> --- a/mysql-test/main/mysqlbinlog_row_compressed.result
> +++ b/mysql-test/main/mysqlbinlog_row_compressed.result
> @@ -40,9 +40,10 @@ SET @@session.sql_mode=#/*!*/;
> SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
> /*!\C latin1 *//*!*/;
> SET @@session.character_set_client=latin1,@@session.collation_connection=8,@@session.collation_server=#/*!*/;
> +SET @@session.character_set_collations='utf8mb3=utf8mb3_uca1400_ai_ci,ucs2=ucs2_uca1400_ai_ci,utf8mb4=utf8mb4_uca1400_ai_ci,utf16=utf16_uca1400_ai_ci,utf32=utf32_uca1400_ai_ci'/*!*/;
why did this appear?
> SET @@session.lc_time_names=0/*!*/;
> SET @@session.collation_database=DEFAULT/*!*/;
> -CREATE TABLE t1 (pk INT PRIMARY KEY, f1 INT, f2 INT, f3 TINYINT, f4 MEDIUMINT, f5 BIGINT, f6 INT, f7 INT, f8 char(1))
> +CREATE TABLE t1 (pk INT PRIMARY KEY, f1 INT, f2 INT, f3 TINYINT, f4 MEDIUMINT, f5 BIGINT, f6 INT, f7 INT, f8 char(1)) CHARSET=latin1
> /*!*/;
> # at <pos>
> #<date> server id 1 end_log_pos # CRC32 XXX GTID 0-1-2 ddl thread_id=TID
> diff --git a/mysql-test/main/sp.result b/mysql-test/main/sp.result
> --- a/mysql-test/main/sp.result
> +++ b/mysql-test/main/sp.result
> @@ -6800,6 +6800,8 @@ DROP FUNCTION IF EXISTS f2;
> #
>
> CREATE FUNCTION f1() RETURNS VARCHAR(65525) RETURN 'Hello';
> +Warnings:
> +Note 1246 Converting column '' from VARCHAR to TEXT
this could use latin1, I suppose
>
> CREATE FUNCTION f2() RETURNS TINYINT RETURN 1;
>
> diff --git a/mysql-test/suite/binlog_encryption/rpl_special_charset.opt b/mysql-test/suite/binlog_encryption/rpl_special_charset.opt
> --- a/mysql-test/suite/binlog_encryption/rpl_special_charset.opt
> +++ b/mysql-test/suite/binlog_encryption/rpl_special_charset.opt
> @@ -1 +1 @@
> ---character-set-server=utf16
> +--character-set-server=utf16 --collation-server=utf16_general_ci
why?
> diff --git a/mysql-test/suite/federated/assisted_discovery.test b/mysql-test/suite/federated/assisted_discovery.test
> --- a/mysql-test/suite/federated/assisted_discovery.test
> +++ b/mysql-test/suite/federated/assisted_discovery.test
> @@ -38,7 +38,7 @@ create table t1 (
> d varchar(4096) not null,
> primary key (a),
> key (b,c,d(255))
> -);
> +) CHARSET=latin1;
why?
> show create table t1;
>
> connection master;
> diff --git a/mysql-test/suite/funcs_1/r/is_columns_innodb.result b/mysql-test/suite/funcs_1/r/is_columns_innodb.result
> --- a/mysql-test/suite/funcs_1/r/is_columns_innodb.result
> +++ b/mysql-test/suite/funcs_1/r/is_columns_innodb.result
> @@ -739,6 +740,9 @@ WHERE table_schema LIKE 'test%'
> AND CHARACTER_OCTET_LENGTH / CHARACTER_MAXIMUM_LENGTH <> 1
> ORDER BY CHARACTER_SET_NAME, COLLATION_NAME, COL_CML;
> COL_CML DATA_TYPE CHARACTER_SET_NAME COLLATION_NAME
> +4.0000 char utf8mb4 utf8mb4_uca1400_ai_ci
> +4.0000 enum utf8mb4 utf8mb4_uca1400_ai_ci
> +4.0000 set utf8mb4 utf8mb4_uca1400_ai_ci
I suppose this changed the intention of the test
> Warnings:
> Warning 1365 Division by 0
> Warning 1365 Division by 0
> diff --git a/mysql-test/suite/gcol/r/innodb_virtual_index.result b/mysql-test/suite/gcol/r/innodb_virtual_index.result
> --- a/mysql-test/suite/gcol/r/innodb_virtual_index.result
> +++ b/mysql-test/suite/gcol/r/innodb_virtual_index.result
> @@ -114,7 +114,7 @@ KEY `vbidxcol` (`vbidxcol`),
> KEY `a_2` (`a`,`vbidxcol`),
> KEY `vbidxcol_2` (`vbidxcol`),
> FULLTEXT KEY `ftsic` (`c`,`b`)
> -) ENGINE=InnoDB;
> +) ENGINE=InnoDB CHARSET=latin1;
why latin1 everywhere in this file?
> Warnings:
> Note 1831 Duplicate index `vbidxcol_2`. This is deprecated and will be disallowed in a future release
> ALTER TABLE ibstd_08 ADD COLUMN nc07006 BIGINT AUTO_INCREMENT NOT NULL , ADD KEY auto_nc07006(nc07006);
> diff --git a/mysql-test/suite/innodb/r/online_table_rebuild.result b/mysql-test/suite/innodb/r/online_table_rebuild.result
> --- a/mysql-test/suite/innodb/r/online_table_rebuild.result
> +++ b/mysql-test/suite/innodb/r/online_table_rebuild.result
> @@ -10,7 +10,7 @@ INSERT INTO t1 VALUES(2, repeat('b', 100), repeat('a', 100));
> COMMIT;
> SET DEBUG_SYNC="now SIGNAL dml_commit";
> connection default;
> -ERROR 23000: Duplicate entry 'bbbbbbbbbb' for key 'f2'
> +ERROR 23000: Duplicate entry 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' for key 'f2'
why?
> connection default;
> SET DEBUG_SYNC="inplace_after_index_build SIGNAL dml_start WAIT_FOR dml_commit";
> ALTER TABLE t1 ADD PRIMARY KEY(f1);
> diff --git a/mysql-test/suite/innodb/t/alter_primary_key.test b/mysql-test/suite/innodb/t/alter_primary_key.test
> --- a/mysql-test/suite/innodb/t/alter_primary_key.test
> +++ b/mysql-test/suite/innodb/t/alter_primary_key.test
> @@ -1,6 +1,9 @@
> --source innodb_default_row_format.inc
> --source include/have_debug.inc
> --source include/have_debug_sync.inc
> +--disable_query_log
> +--source include/test_db_charset_latin1.inc
> +--enable_query_log
you don't disable query log in other tests, why here?
>
> --echo #
> --echo # MDEV-23244 ALTER TABLE…ADD PRIMARY KEY fails to flag
> diff --git a/mysql-test/suite/perfschema/t/short_option_1-master.opt b/mysql-test/suite/perfschema/t/short_option_1-master.opt
> --- a/mysql-test/suite/perfschema/t/short_option_1-master.opt
> +++ b/mysql-test/suite/perfschema/t/short_option_1-master.opt
> @@ -1 +1 @@
> --a -Cutf8 -W1
> +-a -Cutf8 -W1 --collation-server=utf8_general_ci
was it necessary?
> diff --git a/mysql-test/suite/rpl/include/rpl_charset.inc b/mysql-test/suite/rpl/include/rpl_charset.inc
> --- a/mysql-test/suite/rpl/include/rpl_charset.inc
> +++ b/mysql-test/suite/rpl/include/rpl_charset.inc
> @@ -145,4 +145,11 @@ sync_slave_with_master;
> --echo #
>
>
> +connection master;
> +SET GLOBAL collation_server=utf8mb4_uca1400_ai_ci;
> +SET SESSION collation_server=utf8mb4_uca1400_ai_ci;
> +connection slave;
> +SET GLOBAL collation_server=utf8mb4_uca1400_ai_ci;
> +SET SESSION collation_server=utf8mb4_uca1400_ai_ci;
why?
> +
> --source include/rpl_end.inc
> diff --git a/mysql-test/suite/sys_vars/inc/secure_timestamp_func.inc b/mysql-test/suite/sys_vars/inc/secure_timestamp_func.inc
> --- a/mysql-test/suite/sys_vars/inc/secure_timestamp_func.inc
> +++ b/mysql-test/suite/sys_vars/inc/secure_timestamp_func.inc
> @@ -58,7 +58,7 @@ set time_zone='+00:00';
> set timestamp=1234567890.101112;
> select @@timestamp, now(6);
>
> -create table t1 (b varchar(20), a timestamp(6) default current_timestamp(6));
> +create table t1 (b varchar(20), a timestamp(6) default current_timestamp(6)) charset=latin1;
why? charset shouldn't affect secure_timestamp functionality
(same in all other secure_timestamp tests)
> insert t1 (b) values ('replicated');
> sync_slave_with_master;
> create trigger t1rbr before insert on t1 for each row set new.a=now(6);
> diff --git a/storage/connect/mysql-test/connect/r/bson.result b/storage/connect/mysql-test/connect/r/bson.result
> --- a/storage/connect/mysql-test/connect/r/bson.result
> +++ b/storage/connect/mysql-test/connect/r/bson.result
> @@ -321,7 +321,7 @@ WHO CHAR(12),
> WEEK INT(2) JPATH='$.WEEK[2].NUMBER',
> WHAT CHAR(32) JPATH='$.WEEK[2].EXPENSE[*].WHAT',
> AMOUNT DOUBLE(8,2) JPATH='$.WEEK[2].EXPENSE[*].AMOUNT')
> -ENGINE=CONNECT TABLE_TYPE=BSON FILE_NAME='expense.json';
> +ENGINE=CONNECT CHARSET=latin1 TABLE_TYPE=BSON FILE_NAME='expense.json';
is it necessary? are there any connect/bson tests for utf8mb4?
> SELECT * FROM t4;
> WHO WEEK WHAT AMOUNT
> Joe 5 Beer 14.00
> diff --git a/storage/connect/mysql-test/connect/r/endian.result b/storage/connect/mysql-test/connect/r/endian.result
> --- a/storage/connect/mysql-test/connect/r/endian.result
> +++ b/storage/connect/mysql-test/connect/r/endian.result
> @@ -10,7 +10,7 @@ birth DATE NOT NULL FIELD_FORMAT='L',
> id CHAR(5) NOT NULL FIELD_FORMAT='L2',
> salary DOUBLE(9,2) NOT NULL DEFAULT 0.00 FIELD_FORMAT='LF',
> dept INT(4) NOT NULL FIELD_FORMAT='L2'
> -) ENGINE=CONNECT TABLE_TYPE=BIN BLOCK_SIZE=5 FILE_NAME='Testbal.dat';
> +) ENGINE=CONNECT CHARSET=latin1 TABLE_TYPE=BIN BLOCK_SIZE=5 FILE_NAME='Testbal.dat';
or any connect utf8mb4 tests whatsoever?
> SELECT * FROM t1;
> fig name birth id salary dept
> 5500 ARCHIBALD 1980-01-25 3789 4380.50 318
> diff --git a/storage/rocksdb/mysql-test/rocksdb/r/checksum_table_live.result b/storage/rocksdb/mysql-test/rocksdb/r/checksum_table_live.result
> --- a/storage/rocksdb/mysql-test/rocksdb/r/checksum_table_live.result
> +++ b/storage/rocksdb/mysql-test/rocksdb/r/checksum_table_live.result
> @@ -1,7 +1,7 @@
> DROP TABLE IF EXISTS t1,t2;
> -CREATE TABLE t1 (a INT PRIMARY KEY, b CHAR(8)) ENGINE=rocksdb CHECKSUM=1;
> +CREATE TABLE t1 (a INT PRIMARY KEY, b CHAR(8)) ENGINE=rocksdb CHECKSUM=1 CHARSET=latin1;
again. was it necessary? are there any rocksdb tests with utf8mb4?
> INSERT INTO t1 (a,b) VALUES (1,'a'),(2,'b');
> -CREATE TABLE t2 (a INT PRIMARY KEY, b CHAR(8)) ENGINE=rocksdb CHECKSUM=1;
> +CREATE TABLE t2 (a INT PRIMARY KEY, b CHAR(8)) ENGINE=rocksdb CHECKSUM=1 CHARSET=latin1;
> CHECKSUM TABLE t1;
> Table Checksum
> test.t1 4259194219
Regards,
Sergei
Chief Architect, MariaDB Server
and security@mariadb.org
12 Jun '24
Review of MDEV-30073 Wrong result on 2nd execution of PS for query
with NOT EXISTS
By the way, nice to see that this patch closes so many different MDEVs!
Please add a list of all the MDEVs closed by this patch to the commit message!
MDEV-23828: query with EXISTS over subquery using mergeable derived
table / view executed in PS/SP for the second time
MDEV-30396: 2nd execution of query with EXISTS subquery over view / DT
MDEV-31175: 2nd execution of SELECT with mergeable view / DT
and aggregate function
MDEV-33081: 2nd execution of query with forced materialization
of a mergeable view because of too many base tables
MDEV-31269: 2nd execution of query with HAVING containing IN predicand
over subquery with EXISTS in WHERE clause
MDEV-31281: 2nd execution of query with IN subquery whose left operand
is an aggregate function
MDEV-31178: 2nd execution of query with conversion from
IN predicate to IN subquery applied
> index 446154e517b..d474a39e228 100644
> --- a/mysql-test/main/func_group.test
> +++ b/mysql-test/main/func_group.test
> @@ -1268,12 +1268,14 @@ set @@optimizer_switch='materialization=on,in_to_exists=off,semijoin=off';
> --echo # 1) Test that subquery materialization is setup for query with
> --echo # premature optimize() exit due to "Impossible WHERE"
> --echo #
> +--disable_ps2_protocol
> SELECT MIN(t2.pk)
> FROM t2 JOIN t1 ON t1.pk=t2.pk
> WHERE 'j'
> HAVING ('m') IN (
> SELECT v
> FROM t2);
> +--enable_ps2_protocol
Why this change?
Does this produce wrong results after your patch?
At least I cannot see anything in the query that would disallow double
execution.
If double execution does not work anymore for this one, please add a
note about this in
the commit message or in the test case.
> diff --git a/mysql-test/main/mysqltest_ps.test b/mysql-test/main/mysqltest_ps.test
> index c5a332f691f..853ce8bd38e 100644
> --- a/mysql-test/main/mysqltest_ps.test
> +++ b/mysql-test/main/mysqltest_ps.test
> @@ -32,3 +32,4 @@ select 1 + "2 a";
> create table t (a int primary key, b blob default '');
> select a, (2*a) AS a from t group by a;
> drop table t;
> +
>
Remove the change to the above. No need to touch this test case in this commit
> diff --git a/mysql-test/main/order_by.test b/mysql-test/main/order_by.test
> index c1e0c318283..7ca9a8f8def 100644
> --- a/mysql-test/main/order_by.test
> +++ b/mysql-test/main/order_by.test
> @@ -90,7 +90,9 @@ drop table t1;
> create table t1 (i int);
> insert into t1 values(1),(2),(1),(2),(1),(2),(3);
> select distinct i from t1;
> +--disable_ps2_protocol
> select distinct i from t1 order by rand(5);
> +--enable_ps2_protocol
>
Why disable ps2_protocol here?
> diff --git a/mysql-test/main/type_date.test b/mysql-test/main/type_date.test
> index 3566e427d50..5f17a893d53 100644
> --- a/mysql-test/main/type_date.test
> +++ b/mysql-test/main/type_date.test
> @@ -601,7 +601,7 @@ SHOW CREATE TABLE t2;
> DROP TABLE t2;
> DROP VIEW v1;
> DROP TABLE t1;
> -
> +# --enable_ps2_protocol
The above looks wrong. Probably leftover from some testing.
Please revert the change to this file
> diff --git a/mysql-test/main/type_newdecimal.test b/mysql-test/main/type_newdecimal.test
> index 366199e8aa8..7223b7b44f0 100644
> --- a/mysql-test/main/type_newdecimal.test
> +++ b/mysql-test/main/type_newdecimal.test
> @@ -1929,7 +1929,9 @@ drop table t1;
> SELECT AVG(DISTINCT 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001);
> --enable_view_protocol
> CREATE TABLE t1 AS SELECT NULL AS v1;
> +--disable_ps2_protocol
> SELECT 1 FROM t1 GROUP BY v1 ORDER BY AVG ( from_unixtime ( '' ) ) ;
> +--enable_ps2_protocol
> DROP TABLE t1;
Why the change?
> diff --git a/mysql-test/main/win.test b/mysql-test/main/win.test
> index cba6b0fd7af..00c79425fc0 100644
> --- a/mysql-test/main/win.test
> +++ b/mysql-test/main/win.test
> @@ -2770,6 +2770,7 @@ drop table t1;
> CREATE TABLE t1 ( a char(25), b text);
> INSERT INTO t1 VALUES ('foo','bar');
>
> +--disable_ps2_protocol
> SELECT
> SUM(b) OVER (PARTITION BY a),
> ROW_NUMBER() OVER (PARTITION BY b)
> @@ -2777,6 +2778,7 @@ FROM t1
> GROUP BY
> LEFT((SYSDATE()), 'foo')
> WITH ROLLUP;
> +--enable_ps2_protocol
> drop table t1;
Why ?
> diff --git a/mysql-test/suite/federated/federatedx_create_handlers.test b/mysql-test/suite/federated/federatedx_create_handlers.test
> index 8e504590282..b2884397333 100644
> --- a/mysql-test/suite/federated/federatedx_create_handlers.test
> +++ b/mysql-test/suite/federated/federatedx_create_handlers.test
> @@ -94,6 +94,7 @@ DEFAULT CHARSET=latin1;
> INSERT INTO federated.t3 VALUES
> ('yyy'), ('www'), ('yyy'), ('xxx'), ('www'), ('yyy'), ('www');
>
> +--sorted_result
> SELECT *
> FROM federated.t3, (SELECT * FROM federated.t1 WHERE id > 3) t
> WHERE federated.t3.name=t.name;
>
I assume this has nothing to do with the patch, but just a random
failure you fixed?
If that is the case, please add a comment about this file change in
the commit message or
move this change to another commit.
> --- a/sql/item.cc
> +++ b/sql/item.cc
> @@ -3027,7 +3027,11 @@ Item* Item_ref::build_clone(THD *thd)
> unlikely(!(copy->ref= (Item**) alloc_root(thd->mem_root,
> sizeof(Item*)))) ||
> unlikely(!(*copy->ref= (* ref)->build_clone(thd))))
> + {
> + if (copy)
> + copy->ref= 0;
> return 0;
> + }
> return copy;
> }
Ok code, but I would have preferred this slightly simpler approach:
Item* Item_ref::build_clone(THD *thd)
{
  Item_ref *copy= (Item_ref *) get_copy(thd);
  if (likely(copy))
  {
    if (unlikely(!(copy->ref= (Item**) alloc_root(thd->mem_root,
                                                  sizeof(Item*)))) ||
        unlikely(!(*copy->ref= (* ref)->build_clone(thd))))
    {
      copy->ref= 0;
      return 0;
    }
  }
  return copy;
}
> @@ -5823,22 +5827,48 @@ Item_field::fix_outer_field(THD *thd, Field **from_field, Item **reference)
> select->group_list.elements &&
> (place == SELECT_LIST || place == IN_HAVING))
> {
> - Item_outer_ref *rf;
> /*
> - If an outer field is resolved in a grouping select then it
> - is replaced for an Item_outer_ref object. Otherwise an
> - Item_field object is used.
> - The new Item_outer_ref object is saved in the inner_refs_list of
> - the outer select. Here it is only created. It can be fixed only
> - after the original field has been fixed and this is done in the
> - fix_inner_refs() function.
> - */
> - ;
> - if (!(rf= new (thd->mem_root) Item_outer_ref(thd, context, this)))
> - return -1;
> - thd->change_item_tree(reference, rf);
> - select->inner_refs_list.push_back(rf, thd->mem_root);
> - rf->in_sum_func= thd->lex->in_sum_func;
> + This Item_field represents a reference to a column in some
> + outer select. Wrap this Item_field item into an Item_outer_ref
> + object if we are at the first execution of the query or at
> + processing the prepare command for it. Make this wrapping
> + permanent for the first query execution so that it could be used
> + for next executions of the query.
> + Add the wrapper item to the list st_select_lex::inner_refs_list
> + of the select against which this Item_field has been resolved.
> + */
> + bool is_first_execution= thd->is_first_query_execution();
> + if (is_first_execution ||
> + thd->stmt_arena->is_stmt_prepare())
The above is the same as:

((stmt_arena->is_conventional() ||
  stmt_arena->state != STMT_EXECUTED) &&
 !stmt_arena->is_stmt_prepare()) ||
stmt_arena->is_stmt_prepare()
=>
((stmt_arena->state == STMT_CONVENTIONAL_EXECUTION ||
  stmt_arena->state != STMT_EXECUTED) &&
 stmt_arena->state != STMT_INITIALIZED) ||
stmt_arena->state == STMT_INITIALIZED
=>
(stmt_arena->state != Query_arena::STMT_EXECUTED &&
 stmt_arena->state != Query_arena::STMT_INITIALIZED) ||
stmt_arena->state == Query_arena::STMT_INITIALIZED
=>
stmt_arena->state != Query_arena::STMT_EXECUTED
Is it not better to use the above as it is shorter and easier to
understand as there are less hidden abstractions?
Is STMT_EXECUTED safe to use?
We don't set state to STMT_EXECUTED in case of an error during execute.
If the error is temporary (depending on data, or out of space during
execution) then state will be left in STMT_INITIALIZED. Don't we have a
problem in this case for the next execution?
In 11.5 we can trivially simulate an error in almost any query by
setting the tmp disk quota to a very low value.
Could be fixed by adding a new state STMT_OPTIMIZED that tells us that
all optimization rewrites have been done; then we could check for
STMT_OPTIMIZED || STMT_EXECUTED, or force a re-prepare if the previous
execution returned an error (maybe we can set state to STMT_ERROR in
this case)?
Another option is to do a new prepare in the case the last query ended
with an error.
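To make the STMT_OPTIMIZED idea concrete, a rough sketch; both the new
state and the helper name is_rewrites_done() are made up here, neither
exists in the current code:

inline bool is_rewrites_done() const
{
  /*
    Hypothetical: STMT_OPTIMIZED would be set once all permanent
    rewrites are done; a failed execution would set STMT_ERROR and
    force a re-prepare on the next execution.
  */
  return state == STMT_OPTIMIZED || state == STMT_EXECUTED;
}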
> + /*
> + If an outer field is resolved in a grouping select then it
> + is replaced for an Item_outer_ref object. Otherwise an
> + Item_field object is used.
> + The new Item_outer_ref object is saved in the inner_refs_list of
> + the outer select. Here it is only created. It can be fixed only
> + after the original field has been fixed and this is done in the
> + fix_inner_refs() function.
> + */
> + if (!(rf= new (thd->mem_root) Item_outer_ref(thd, context, this)))
> + rc= -1;
> + select->inner_refs_list.push_back(rf, thd->mem_root);
> + if (is_first_execution)
> + *reference= rf;
> + else
> + thd->change_item_tree(reference, rf);
> + if (arena)
> + thd->restore_active_arena(arena, &backup);
> + if (rc)
> + return rc;
> + rf->in_sum_func= thd->lex->in_sum_func;
> + }
> }
I don't like that we use 'rf' when we know it is 0. We can fix that
with the following code change that also removes the 'rc' variable
(shorter):
{
  Query_arena *arena= 0, backup;
  Item_outer_ref *rf;
  bool is_first_execution;
  if ((is_first_execution= thd->is_first_query_execution()) &&
      !thd->is_conventional())
    arena= thd->activate_stmt_arena_if_needed(&backup);
  /*
    If an outer field is resolved in a grouping select then it
    is replaced for an Item_outer_ref object. Otherwise an
    Item_field object is used.
    The new Item_outer_ref object is saved in the inner_refs_list of
    the outer select. Here it is only created. It can be fixed only
    after the original field has been fixed and this is done in the
    fix_inner_refs() function.
  */
  if ((rf= new (thd->mem_root) Item_outer_ref(thd, context, this)))
  {
    select->inner_refs_list.push_back(rf, thd->mem_root);
    if (is_first_execution)
      *reference= rf;
    else
      thd->change_item_tree(reference, rf);
  }
  if (arena)
    thd->restore_active_arena(arena, &backup);
  if (!rf) // Memory allocation failed
    return -1;
  rf->in_sum_func= thd->lex->in_sum_func;
}
Now the code has exactly the same number of jump instructions as your
code, but it is shorter, never uses 'rf' while it is known to be 0, and
does not need 'rc'.
Note that compared to the original code, we are creating an arena for the
states:
STMT_INITIALIZED, STMT_INITIALIZED_FOR_SP= 1, STMT_PREPARED= 2, STMT_ERROR= -1
This is probably ok, just wanted to ensure you are aware of this.
Would it be enough/safer to just test for STMT_PREPARED to see if an
arena should be used? (Just wondering)
> /*
> A reference is resolved to a nest level that's outer or the same as
> @@ -5938,7 +5968,7 @@ Item_field::fix_outer_field(THD *thd, Field **from_field, Item **reference)
> else if (ref != not_found_item)
> {
> Item *save;
> - Item_ref *rf;
> + Item_ref *rf= NULL;;
remove extra ;
> /* Should have been checked in resolve_ref_in_select_and_group(). */
> DBUG_ASSERT(*ref && (*ref)->is_fixed());
> @@ -5950,35 +5980,64 @@ Item_field::fix_outer_field(THD *thd, Field **from_field, Item **reference)
<cut>
> + This Item_field represents a reference to a column in some outer select.
> + Wrap this Item_field item into an object of the Item_ref class or a class
> + derived from Item_ref we are at the first execution of the query or at
> + processing the prepare command for it. Make this wrapping permanent for
> + the first query execution so that it could be used for next executions
> + of the query. The type of the wrapper depends on the place where this
> + item field occurs or on type of the query. If it is in the the having clause
> + we use a wrapper of the Item_ref type. Otherwise we use a wrapper either
> + of Item_direct_ref type or of Item_outer_ref. The latter is used for
> + grouping queries. Such a wrapper is always added to the list inner_refs_list
> + for the select against which the outer reference has been resolved.
> */
> + bool is_first_execution= thd->is_first_query_execution();
> + if (is_first_execution ||
> + thd->stmt_arena->is_stmt_prepare())
> + {
Replace with

if (thd->stmt_arena->state != Query_arena::STMT_EXECUTED)
{
  bool is_first_execution= thd->is_first_query_execution();
> + int rc= 0;
> + Query_arena *arena, backup;
> + arena= 0;
> + if (is_first_execution)
> + arena= thd->activate_stmt_arena_if_needed(&backup);
> + rf= (place == IN_HAVING ?
> + new (thd->mem_root)
> + Item_ref(thd, context, ref, table_name,
> + field_name, alias_name_used) :
> + (!select->group_list.elements ?
> + new (thd->mem_root)
> + Item_direct_ref(thd, context, ref, table_name,
> + field_name, alias_name_used) :
> + new (thd->mem_root)
> + Item_outer_ref(thd, context, ref, table_name,
> + field_name, alias_name_used)));
> + *ref= save;
> + if (!rf)
> + rc= -1;
> + if (!rc && place != IN_HAVING && select->group_list.elements)
> + {
> + outer_context->select_lex->inner_refs_list.push_back(
> + (Item_outer_ref*)rf, thd->mem_root);
> + ((Item_outer_ref*)rf)->in_sum_func= thd->lex->in_sum_func;
> + }
> + if (is_first_execution)
> + *reference= rf;
> + else
> + thd->change_item_tree(reference, rf);
> + if (arena)
> + thd->restore_active_arena(arena, &backup);
> + if (rc)
> + return rc;
Please do the same change as in the previous case, by using an 'if'
instead of setting rc.
> We can not "move" aggregate function in the place where
> @@ -6003,22 +6062,46 @@ Item_field::fix_outer_field(THD *thd, Field **from_field, Item **reference)
> this, (Item_ident*)*reference, false);
> if (last_checked_context->select_lex->having_fix_field)
> {
> + This Item_field represents a reference to a column in having clause
> + of some outer select. Wrap this Item_field item into an Item_ref object
> + if we are at the first execution of the query or at processing the
> + prepare command for it. Make this wrapping permanent for the first query
> + execution so that it could be used for next executions of the query.
> */
> - DBUG_ASSERT(!rf->fixed); // Assured by Item_ref()
> - if (rf->fix_fields(thd, reference) || rf->check_cols(1))
> - return -1;
> + Item_ref *rf;
> + bool is_first_execution= thd->is_first_query_execution();
> + if (is_first_execution ||
> + thd->stmt_arena->is_stmt_prepare())
Fix test, as in other cases.
> + {
> + int rc= 0;
> + Query_arena *arena, backup;
> + arena= 0;
> + if (is_first_execution)
> + arena= thd->activate_stmt_arena_if_needed(&backup);
> + rf= new (thd->mem_root) Item_ref(thd, context,
> + (*from_field)->table->s->db,
> + Lex_cstring_strlen((*from_field)->
> + table->alias.c_ptr()),
> + field_name);
> + if (!rf)
> + rc= -1;
> + if (is_first_execution)
> + *reference= rf;
> + else
> + thd->change_item_tree(reference, rf);
> + if (arena)
> + thd->restore_active_arena(arena, &backup);
> + if (rc)
> + return rc;
> + /*
> + rf is Item_ref => never substitute other items (in this case)
> + during fix_fields() => we can use rf after fix_fields()
> + */
> + DBUG_ASSERT(!rf->fixed); // Assured by Item_ref()
> + if (rf->fix_fields(thd, reference) || rf->check_cols(1))
> + return -1;
> + }
Same here.
Change

if (!rf)
  rc= -1;
...
if (arena)
  thd->restore_active_arena(arena, &backup);

to

if (rf)
{
  ....
}
if (arena)
  thd->restore_active_arena(arena, &backup);
> @@ -8321,6 +8404,9 @@ bool Item_ref::fix_fields(THD *thd, Item **reference)
> goto error;
> }
> + if ((*ref)->fix_fields_if_needed(thd, reference))
> + goto error;
> +
> set_properties();
Add a comment about when this is needed (i.e., when reference <> 0 and
it is not fixed).
I tried to come up with a simple comment but failed :(
I assume this is at least true for the prepare phase of a prepared statement.
>
> if ((*ref)->check_cols(1))
> @@ -9282,8 +9368,21 @@ bool Item_direct_view_ref::fix_fields(THD *thd, Item **reference)
> bitmap_set_bit(fld->table->read_set, fld->field_index);
> }
> }
> - else if ((*ref)->fix_fields_if_needed(thd, ref))
> - return TRUE;
> + else
> + {
> + if (thd->is_noninitial_query_execution())
replace
(thd->is_noninitial_query_execution())
with
thd->is_stmt_executed()
See the later comments at the definition of is_noninitial_query_execution().
> + {
> + bool rc= false;
> + bool save_wrapper= thd->lex->current_select->no_wrap_view_item;
> + thd->lex->current_select->no_wrap_view_item= TRUE;
> + rc= (*ref)->fix_fields_if_needed(thd, ref);
> + thd->lex->current_select->no_wrap_view_item= save_wrapper;
> + if (rc)
> + return TRUE;
> + }
> + else if ((*ref)->fix_fields_if_needed(thd, ref))
> + return TRUE;
> + }
Do we need no_wrap_view_item anymore (we probably do, but it is not obvious)?
The original code that introduced no_wrap_view_item used it to test
things in sql_base.cc
but that code does not exist anymore.
The only usage we have of it now is in item.cc:
if (!thd->lex->current_select->no_wrap_view_item &&
    thd->lex->in_sum_func &&
    thd->lex->in_sum_func->nest_level ==
      select->nest_level)
  set_if_bigger(thd->lex->in_sum_func->max_arg_level,
                select->nest_level);
There is no code comment saying what the above code is supposed to do.
Could you PLEASE add a comment about the purpose of the above code,
especially why the nest_level is not applicable to no_wrap_view_items?
I have verified that we need
!thd->lex->current_select->no_wrap_view_item here. We
get crashes in main.derived_view if we remove the test.
> diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc
> index f8959cfd2bf..cc9c81608ca 100644
> --- a/sql/item_cmpfunc.cc
> +++ b/sql/item_cmpfunc.cc
> @@ -4588,7 +4588,9 @@ bool Item_func_in::value_list_convert_const_to_int(THD *thd)
> */
> if (arg[0]->type() != Item::NULL_ITEM &&
> !convert_const_to_int(thd, field_item, &arg[0]))
> - all_converted= false;
> + all_converted= false;
> + else
> + block_transform_into_subq= true;
> }
if (all_converted)
m_comparator.set_handler(&type_handler_slonglong);
The above looks strange.
You are setting block_transform_into_subq to true in the case of
IN (1,1)
but not setting it for ("abb",NULL);
I assume the idea is that if convert_const_to_int() changes
anything, then we block conversion of the IN to subqueries, as our code
cannot convert back to use the original code in this case?
(as the translation to use subqueries is permanent).
Here is a test case that sets block_transform_into_subq to true:
create table t1 (a bigint);
insert into t1 select seq from seq_1_to_100;
select benchmark(1,1);
select * from t1 where a in ('1','2','3');
drop table t1;
Is there not a better way to do this?
- Make all translations of strings to integers permanent in the
statement arena
- Keep the derived table around for the next execution (which would be a
nice speedup).
If nothing can be done to improve things, we should at least move
setting block_transform_into_subq to the end of the function and fix it so
that we only block the transformation for prepared statements:
> if (all_converted)
> m_comparator.set_handler(&type_handler_slonglong);
->
if (all_converted)
{
  m_comparator.set_handler(&type_handler_slonglong);
  if (thd->is_first_query_execution())
    block_transform_into_subq= 1;
}
----
> diff --git a/sql/item_cmpfunc.h b/sql/item_cmpfunc.h
> index 465625f69bf..1404fa30d6b 100644
> --- a/sql/item_cmpfunc.h
> +++ b/sql/item_cmpfunc.h
> @@ -2401,6 +2401,7 @@ class Item_func_in :public Item_func_opt_neg,
> {
> return check_argument_types_like_args0();
> }
> + bool block_transform_into_subq;
Please add a comment about the purpose of this variable, when it
should be used and how it is used.
As far as I can see, this is set in the case where an IN() requires an
item conversion, in which case we have to block it for prepared
statements as we cannot roll it back.
> diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc
> index ff84e6b7bf9..4dcd1f3ae9d 100644
> --- a/sql/item_subselect.cc
> +++ b/sql/item_subselect.cc
> @@ -104,6 +104,16 @@ void Item_subselect::init(st_select_lex *select_lex,
> unit->item= this;
> engine->change_result(this, result, FALSE);
> }
> + else if ((unit->item->substype() == ALL_SUBS ||
> + unit->item->substype() == ANY_SUBS) &&
> + (((Item_in_subselect*) unit->item)->
> + test_set_strategy(SUBS_MAXMIN_INJECTED) ||
> + ((Item_in_subselect*) unit->item)->
> + test_set_strategy(SUBS_MAXMIN_ENGINE)))
> + {
> + unit->item= this;
> + engine->change_result(this, result, FALSE);
> + }
> else
> {
> /*
Better to combine the test with the previous test, as the code inside
the {} is identical?
Note that I don't understand the above code 100%, so adding a comment
about the purpose of this code would be great.
Also change the function 'test_set_strategy()' to
'test_chosen_strategy(SUBS_MAXMIN_INJECTED | SUBS_MAXMIN_ENGINE)'.
This will make the code shorter and easier to read.
Later in the review there is a new definition for test_chosen_strategy().
This is very similar to test_strategy(), but also ensures that the
strategy is chosen.
> @@ -200,15 +211,6 @@ void Item_in_subselect::cleanup()
>
> void Item_allany_subselect::cleanup()
> {
> - /*
> - The MAX/MIN transformation through injection is reverted through the
> - change_item_tree() mechanism. Revert the select_lex object of the
> - query to its initial state.
> - */
> - for (SELECT_LEX *sl= unit->first_select();
> - sl; sl= sl->next_select())
> - if (test_set_strategy(SUBS_MAXMIN_INJECTED))
> - sl->with_sum_func= false;
> Item_in_subselect::cleanup();
> }
I assume the above is now removed because MAX/MIN optimization changes are now
permanent in Item_allany_subselect::transform_into_max_min(JOIN *join).
Please add an explanation for the above change to the commit message
and also add a note that thanks to this change we don't need to
clean up for the MAX/MIN optimization.
> @@ -497,6 +505,9 @@ void Item_subselect::recalc_used_tables(st_select_lex *new_parent,
> Ref_to_outside *upper;
> DBUG_ENTER("recalc_used_tables");
>
> + if (!unit->thd->is_first_query_execution())
> + DBUG_VOID_RETURN;
With prepared statements, the optimizer plan can change for each query.
How come we don't need to recalc_used_tables() just because we have
done one execution before?
> @@ -2143,11 +2155,11 @@ bool Item_allany_subselect::transform_into_max_min(JOIN *join)
> }
> if (upper_item)
> upper_item->set_sum_test(item);
> - thd->change_item_tree(&select_lex->ref_pointer_array[0], item);
> + select_lex->ref_pointer_array[0]= item;
> {
> List_iterator<Item> it(select_lex->item_list);
> it++;
> - thd->change_item_tree(it.ref(), item);
> + it.replace(item);
> }
Looks like we are now doing the MAX/MIN transformation permanently.
> +uint st_select_lex_unit::ub_eq_for_exists2_in()
> +{
> + if (derived && derived->is_materialized_derived())
> + return 0;
> + st_select_lex *sl= first_select();
> + if (sl->next_select())
> + return 0;
> + uint n= sl->select_n_eq;
> + for (st_select_lex_unit *unit= sl->first_inner_unit();
> + unit;
> + unit= unit->next_unit())
> + {
> + n+= unit->ub_eq_for_exists2_in();
> + }
> + return n;
> +}
Please use a clearer function name. 'ub' is not clear.
Please also add a function comment to explain the purpose of this function,
as I do not understand the purpose of it.
The function is used here:

bool Item_exists_subselect::fix_fields(THD *thd, Item **ref)
{
  DBUG_ENTER("Item_exists_subselect::fix_fields");
  st_select_lex *sel= unit->first_select();
  sel->select_n_reserved= unit->ub_eq_for_exists2_in();

But its purpose, and what it is trying to calculate, is not clear.
For example, if you have a query:
SELECT (a=1) + (b=2) + (c=3)
select_n_eq will be 3.
In case of
SELECT (a = 1 or b = 1) or (a=2 or b=4)
select_n_eq will be 4
I don't see how either value can be of any use for the optimizer.
Why not instead calculate the number of = operations in the top-level
Item_cond_and? You could have a counter in Item_cond_and of
how many '=' comparisons there are inside it.
This would give you an exact number of = that may be needed, compared
to the current code; see the sketch below.
Note that find_inner_outer_equalities() is only appending items from the
top-level AND, or alternatively 0 or 1 elements for other items.
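For illustration, the counting idea above could look something like this
(a sketch only; count_top_level_eq() is a made-up name, and a real patch
would rather cache the counter inside Item_cond_and):

static uint count_top_level_eq(Item *cond)
{
  if (cond->type() == Item::COND_ITEM &&
      ((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
  {
    /* Count '=' comparisons that are direct arguments of the AND */
    uint n= 0;
    List_iterator_fast<Item> it(*((Item_cond*) cond)->argument_list());
    Item *item;
    while ((item= it++))
      n+= (item->type() == Item::FUNC_ITEM &&
           ((Item_func*) item)->functype() == Item_func::EQ_FUNC);
    return n;
  }
  /* A lone 'a = b' counts as one, anything else as zero */
  return (cond->type() == Item::FUNC_ITEM &&
          ((Item_func*) cond)->functype() == Item_func::EQ_FUNC);
}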
> +bool Item_singlerow_subselect::test_set_strategy(uchar strategy_arg)
> +{
> + DBUG_ASSERT(strategy_arg == SUBS_MAXMIN_INJECTED ||
> + strategy_arg == SUBS_MAXMIN_ENGINE);
> + return ((strategy & SUBS_STRATEGY_CHOSEN) &&
> + (strategy & ~SUBS_STRATEGY_CHOSEN) == strategy_arg);
> +}
Change to the following so that we can check all bits at once, like
in test_strategy().
bool Item_singlerow_subselect::test_chosen_strategy(uint strategy_arg)
{
  DBUG_ASSERT(strategy_arg &&
              !(strategy_arg & ~(SUBS_MAXMIN_INJECTED | SUBS_MAXMIN_ENGINE)));
  return ((strategy & SUBS_STRATEGY_CHOSEN) &&
          (strategy & strategy_arg));
}
> diff --git a/sql/item_subselect.h b/sql/item_subselect.h
> index d3572e1abe7..9e318e74bab 100644
> --- a/sql/item_subselect.h
> +++ b/sql/item_subselect.h
> @@ -302,6 +302,7 @@ class Item_singlerow_subselect :public Item_subselect
> {
> protected:
> Item_cache *value, **row;
> + uchar strategy;
Change to uint16 to ensure that we can add more bits later without having
to change a lot of code (thanks to alignment, the storage required
does not change if we use uint16)
> diff --git a/sql/item_sum.cc b/sql/item_sum.cc
> index 4cf403c1618..e1a32185ad1 100644
> --- a/sql/item_sum.cc
> +++ b/sql/item_sum.cc
> @@ -93,10 +93,15 @@ bool Item_sum::init_sum_func_check(THD *thd)
> thd->lex->in_sum_func= this;
> nest_level= thd->lex->current_select->nest_level;
> ref_by= 0;
> + if (!thd->is_noninitial_query_execution() ||
> + (thd->active_stmt_arena_to_use()->state ==
> + Query_arena::STMT_SP_QUERY_ARGUMENTS))
> + {
> aggr_level= -1;
> aggr_sel= NULL;
> max_arg_level= -1;
> max_sum_func_level= -1;
> + }
Please fix the indentation.
Please add a comment for the above test. Here is a suggestion:
/*
  Item_sum objects are persistent and do not have to be updated
  for pre-executed prepared statements (stmt_arena->state == STMT_EXECUTED)
  except when setting variables for stored procedures
  (state == Query_arena::STMT_SP_QUERY_ARGUMENTS)
*/
> outer_fields.empty();
> return FALSE;
> }
> @@ -409,7 +414,7 @@ bool Item_sum::register_sum_func(THD *thd, Item **ref)
> sl= sl->master_unit()->outer_select() )
> sl->master_unit()->item->get_with_sum_func_cache()->set_with_sum_func();
> }
> - if (aggr_sel)
> + if (aggr_sel && aggr_sel != thd->lex->current_select )
> thd->lex->current_select->mark_as_dependent(thd, aggr_sel, NULL);
Remove the extra space before && and before ).
Please add a comment about under which circumstances this is true:
aggr_sel != thd->lex->current_select
I would like to understand why this was not needed before.
> +++ b/sql/opt_subselect.cc
> @@ -962,8 +962,13 @@ bool JOIN::transform_max_min_subquery()
> if (!subselect || (subselect->substype() != Item_subselect::ALL_SUBS &&
> subselect->substype() != Item_subselect::ANY_SUBS))
> DBUG_RETURN(0);
> - DBUG_RETURN(((Item_allany_subselect *) subselect)->
> - transform_into_max_min(this));
> + bool rc= false;
> + Query_arena *arena, backup;
> + arena= thd->activate_stmt_arena_if_needed(&backup);
> + rc= (((Item_allany_subselect *) subselect)->transform_into_max_min(this));
> + if (arena)
> + thd->restore_active_arena(arena, &backup);
> + DBUG_RETURN(rc);
> }
Please update the comment for this function to make it clear that the changes
to use min/max are now permanent.
>
> @@ -1889,7 +1894,7 @@ static bool convert_subq_to_sj(JOIN *parent_join, Item_in_subselect *subq_pred)
> with thd->change_item_tree
> */
> Item_func_eq *item_eq=
> - new (thd->mem_root) Item_func_eq(thd, left_exp_orig,
> + new (thd->mem_root) Item_func_eq(thd, left_exp,
> subq_lex->ref_pointer_array[0]);
> if (!item_eq)
> goto restore_tl_and_exit;
I assume this was a bug in the old code?
I checked the history and this line has changed back and forth between
left_exp_orig and left_exp over time.
The documentation for left_expr_orig is:
/*
Important for PS/SP: left_expr_orig is the item that left_expr originally
pointed at. That item is allocated on the statement arena, while
left_expr could later be changed to something on the execution arena.
*/
Item *left_expr_orig;
As an experiment I added left_exp_orig back and ran the tests that you
had changed in your commit with valgrind. All tests passed.
Do you have any test that shows that left_exp is the right value here, or do
you have any other proof why left_exp is right?
> @@ -6535,6 +6540,10 @@ bool JOIN::choose_subquery_plan(table_map join_tables)
> if (is_in_subquery())
> {
> in_subs= unit->item->get_IN_subquery();
> + if (in_subs->test_set_strategy(SUBS_MAXMIN_INJECTED))
> + return false;
> + if (in_subs->test_set_strategy(SUBS_MAXMIN_ENGINE))
> + return false;
> if (in_subs->create_in_to_exists_cond(this))
> return true;
You can replace the above two changed lines with:
if (in_subs->test_chosen_strategy(SUBS_MAXMIN_INJECTED | SUBS_MAXMIN_ENGINE))
> diff --git a/sql/sql_array.h b/sql/sql_array.h
> index b6de1b18d78..b9fdef0623c 100644
> --- a/sql/sql_array.h
> +++ b/sql/sql_array.h
> @@ -39,7 +39,10 @@ template <typename Element_type> class Bounds_checked_array
>
> Bounds_checked_array(Element_type *el, size_t size_arg)
> : m_array(el), m_size(size_arg)
> - {}
> + {
> + if (!m_size)
> + m_array= NULL;
> + }
Strange that one would call it with el != 0 and m_size == 0.
Sounds like a bug in the caller. I would change the above to be a DBUG_ASSERT
instead.
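I.e. something like this (sketch):

Bounds_checked_array(Element_type *el, size_t size_arg)
  : m_array(el), m_size(size_arg)
{
  /* Passing a non-null array together with size 0 is a caller bug */
  DBUG_ASSERT(m_array == NULL || m_size != 0);
}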
> void reset() { m_array= NULL; m_size= 0; }
>
> diff --git a/sql/sql_base.cc b/sql/sql_base.cc
> index f1fd0053a82..a6cdd218c44 100644
> --- a/sql/sql_base.cc
> +++ b/sql/sql_base.cc
> @@ -5783,10 +5783,7 @@ find_field_in_view(THD *thd, TABLE_LIST *table_list,
> {
> if (!my_strcasecmp(system_charset_info, field_it.name()->str, name))
> {
> - // in PS use own arena or data will be freed after prepare
> - if (register_tree_change &&
> - thd->stmt_arena->is_stmt_prepare_or_first_stmt_execute())
> - arena= thd->activate_stmt_arena_if_needed(&backup);
> + arena= thd->activate_stmt_arena_if_needed(&backup);
Why this change?
Will it not use the statement arena for a lot of cases where it did not
before, like for state == STMT_EXECUTED?
- This was discussed on slack, but we did not have time to reach a conclusion.
> @@ -5805,11 +5802,23 @@ find_field_in_view(THD *thd, TABLE_LIST *table_list,
> the replacing item.
> */
> if (*ref && !(*ref)->is_autogenerated_name())
> + {
> + arena=0;
> + if (thd->is_first_query_execution())
> + arena= thd->activate_stmt_arena_if_needed(&backup);
> item->set_name(thd, (*ref)->name);
> + if (arena)
> + thd->restore_active_arena(arena, &backup);
Item::set_name() never allocates any memory, so we don't need the arena here.
(I checked all code on the sql level.)
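So the block can simply stay as it was (sketch):

if (*ref && !(*ref)->is_autogenerated_name())
  item->set_name(thd, (*ref)->name);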
> + }
> + if (item != *ref)
> + {
> + if (register_tree_change &&
> + !thd->is_first_query_execution())
The above is the same as:

register_tree_change &&
!(stmt_arena->state != STMT_INITIALIZED &&
  stmt_arena->state != Query_arena::STMT_EXECUTED)
=>
register_tree_change &&
(stmt_arena->state == STMT_INITIALIZED ||
 stmt_arena->state == STMT_EXECUTED)

Is the above correct and the intention of this code?
> + thd->change_item_tree(ref, item);
> + else
> + *ref= item;
> + }
You can remove the else before *ref. It is faster to update *ref than handle
the jump.
I am still a bit concerned that we set *ref=item when
register_tree_change != 0, as our code comments say that if
register_tree_change is set, then *ref will be removed before the next
execution and any attempt to access it can result in a core dump.
> @@ -7639,10 +7648,14 @@ bool setup_fields(THD *thd, Ref_ptr_array ref_pointer_array,
> TODO: remove it when (if) we made one list for allfields and
> ref_pointer_array
> */
> - if (!ref_pointer_array.is_null())
> + if (thd->is_first_query_execution() ||
> + thd->stmt_arena->is_stmt_prepare())
The new test is the same as (see earlier examples):
thd->stmt_arena->state != Query_arena::STMT_EXECUTED
Please replace the test with the above, or replace it with is_stmt_executed().
<cut>
> /*
> @@ -7682,17 +7695,6 @@ bool setup_fields(THD *thd, Ref_ptr_array ref_pointer_array,
> ref[0]= item;
> ref.pop_front();
> }
> - /*
> - split_sum_func() must be called for Window Function items, see
> - Item_window_func::split_sum_func.
> - */
> - if (sum_func_list &&
> - ((item->with_sum_func() && item->type() != Item::SUM_FUNC_ITEM) ||
> - item->with_window_func))
> - {
> - item->split_sum_func(thd, ref_pointer_array, *sum_func_list,
> - SPLIT_SUM_SELECT);
> - }
I assume part of this was moved from setup_fields() to JOIN::prepare().
Any particular reason why this move was required?
Just curious. Still, it would be good to explain this in the commit message.
(One reason is that it will allow us to remove sum_func_list from the
arguments to setup_fields)
I don't see any big problem with the code as it is JOIN::prepare() that is
calling both setup_fields() and split_sum_func().
It will be a bit slower as we have to traverse the field_list twice, but
that is not a high cost.
Note that if we go with the above code change, you should also remove
sum_func_list from the arguments as the above code was the only place it
was used.
Also, please add back the comment to the moved code:
/*
split_sum_func() must be called for Window Function items, see
Item_window_func::split_sum_func.
*/
> diff --git a/sql/sql_class.cc b/sql/sql_class.cc
> index 7913f1a40d2..19f1ec9728e 100644
> --- a/sql/sql_class.cc
> +++ b/sql/sql_class.cc
> @@ -3959,12 +3962,16 @@ void Query_arena::free_items()
> Item *next;
> DBUG_ENTER("Query_arena::free_items");
> /* This works because items are allocated on THD::mem_root */
> + for (next= free_list; next; next= next->next)
> + {
> + next->cleanup();
> + }
Why not call cleanup() in the following loop instead of looping over
the elements twice (as we did before)?
Are there cases when we cannot do cleanup as part of delete_self()?
If yes, please add a comment in the code why we have to traverse the list
twice, as otherwise it is very likely some developer will try to optimize
the first loop away while looking at the code.
If this is the case, is this not a problem also for free_items()?
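For reference, the single-pass variant would just be the pre-patch loop
(a sketch; it assumes, as the question above implies, that delete_self()
already does a cleanup() and that cleanup() of one item never touches a
later item in free_list):

for (; free_list; free_list= next)
{
  next= free_list->next;
  free_list->delete_self(); /* calls cleanup() and then deletes the item */
}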
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 7e5d9ac96e3..2f05b2dbd03 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -2569,6 +2569,20 @@ class THD: public THD_count, /* this must be first */
> return static_cast<PSI_thread*>(my_atomic_loadptr((void*volatile*)&m_psi));
> }
>
> + inline bool is_first_query_execution()
> + {
> + return (stmt_arena->is_conventional() ||
> + stmt_arena->state != Query_arena::STMT_EXECUTED) &&
> + !stmt_arena->is_stmt_prepare();
(stmt_arena->is_conventional() ||
 stmt_arena->state != Query_arena::STMT_EXECUTED) &&
!stmt_arena->is_stmt_prepare()
=>
(stmt_arena->state == STMT_CONVENTIONAL_EXECUTION ||
 stmt_arena->state != STMT_EXECUTED) &&
stmt_arena->state != STMT_INITIALIZED
=>
stmt_arena->state != STMT_INITIALIZED &&
stmt_arena->state != STMT_EXECUTED

Simpler and easier to understand.
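I.e. the whole function would reduce to (sketch):

inline bool is_first_query_execution()
{
  return stmt_arena->state != Query_arena::STMT_INITIALIZED &&
         stmt_arena->state != Query_arena::STMT_EXECUTED;
}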
> + }
> +
> + inline bool is_noninitial_query_execution()
> + {
> + return !stmt_arena->is_conventional() &&
> + stmt_arena->state == Query_arena::STMT_EXECUTED &&
> + !stmt_arena->is_stmt_prepare() ;
Please add a comment to explain this function and when to use it.
(The name of is_first_query_execution() is clear, but this one is not.)
Anyway:

!stmt_arena->is_conventional() &&
stmt_arena->state == Query_arena::STMT_EXECUTED &&
!stmt_arena->is_stmt_prepare()
=>
stmt_arena->state != STMT_CONVENTIONAL_EXECUTION &&
stmt_arena->state == STMT_EXECUTED &&
stmt_arena->state != STMT_INITIALIZED
=>
stmt_arena->state == STMT_EXECUTED
Better to rename it to is_stmt_executed() and move it to where we
define is_stmt_prepare() in class THD and class Query_arena.
> diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc
> index b669925a263..d7b7e98a6e4 100644
> --- a/sql/sql_lex.cc
> +++ b/sql/sql_lex.cc
> @@ -3601,6 +3607,8 @@ ulong st_select_lex::get_table_join_options()
>
> uint st_select_lex::get_cardinality_of_ref_ptrs_slice(uint order_group_num_arg)
> {
> + if (card_of_ref_ptrs_slice)
> + return card_of_ref_ptrs_slice;
I assume this is just a cache and the code should work also without it?
If yes, please add a comment about that.
> if (!((options & SELECT_DISTINCT) && !group_list.elements))
> hidden_bit_fields= 0;
>
> @@ -3620,20 +3628,23 @@ uint st_select_lex::get_cardinality_of_ref_ptrs_slice(uint order_group_num_arg)
> order_group_num * 2 +
> hidden_bit_fields +
> fields_in_window_functions;
> + card_of_ref_ptrs_slice= n;
> return n;
> }
Does caching card_of_ref_ptrs_slice really help?
(The function is quite trivial.)
We call the function in one place, setup_ref_array().
It is called once per query (for selects, in JOIN::prepare()).
I think having the variable and testing, updating and clearing it for
each normal query will cost us more than not having it.
This assumes we have more normal queries than we have prepared
statements.
I would suggest we remove the variable.
> -bool st_select_lex::setup_ref_array(THD *thd, uint order_group_num)
> +static
> +bool setup_ref_ptrs_array(THD *thd, Ref_ptr_array *ref_ptrs, uint n_elems)
> {
> + if (!ref_ptrs->is_null())
> + {
> + if (ref_ptrs->size() >= n_elems)
> + return false;
> + }
> Item **array= static_cast<Item**>(
> thd->active_stmt_arena_to_use()->alloc(sizeof(Item*) * n_elems));
> if (likely(array != NULL))
> + *ref_ptrs= Ref_ptr_array(array, n_elems);
> return array == NULL;
> }
I assume the change is that the new code will allocate a new array
if the old array was not big enough.
That is good. However, it would be good to know when this can happen.
For ref_pointer_array this should be impossible, as we cache the value
and it cannot change between two executions.
Is this reallocation needed for the save_ref_ptrs array?
Please add a comment to explain in which case the old array was not
big enough.
It would also be good to have the code for setup_ref_ptrs_array(),
setup_ref_array(), save_ref_ptrs_after_persistent_rewrites() and
save_ref_ptrs_if_needed() close to each other, as these are very similar
and very related.
> +
> +bool st_select_lex::setup_ref_array(THD *thd, uint order_group_num)
> +{
> + /*
> + We have to create array in prepared statement memory if it is a
> + prepared statement
> + */
> + uint slice_card= !thd->is_noninitial_query_execution() ?
> + get_cardinality_of_ref_ptrs_slice(order_group_num) :
> + card_of_ref_ptrs_slice;
> + uint n_elems= slice_card * 5;
> + DBUG_ASSERT(n_elems % 5 == 0);
You can remove the DBUG_ASSERT. It is obvious that this is always
correct (for this code).
> + return setup_ref_ptrs_array(thd, &ref_pointer_array, n_elems);
> +}
With the cache, you don't have to test for
!thd->is_noninitial_query_execution(). This should be good enough:

uint slice_card= get_cardinality_of_ref_ptrs_slice(order_group_num);

With the cache, it will return at once after the first call.
As the cache is updated on the first run, the answer should always be the
same as in your original code.
> +bool st_select_lex::save_ref_ptrs_after_persistent_rewrites(THD *thd)
> +{
> + uint n_elems= card_of_ref_ptrs_slice;
> + if (setup_ref_ptrs_array(thd, &save_ref_ptrs, n_elems))
> + return true;
> + if (!ref_pointer_array.is_null())
> + join->copy_ref_ptr_array(save_ref_ptrs, join->ref_ptr_array_slice(0));
> + return false;
> +}
Please add to the function start:

/* Ensure that setup_ref_array() has been called */
DBUG_ASSERT(card_of_ref_ptrs_slice);

It looks like all calls to setup_ref_ptrs_array() are using the same
value for all future calls. If this is the case, why did you change
setup_ref_ptrs_array() to check if the number of elements has grown?
If this is not needed, remove the check and add an assert()
to ensure that the number of elements does not grow over time.
> +bool st_select_lex::save_ref_ptrs_if_needed(THD *thd)
> +{
> + if (thd->is_first_query_execution())
> + {
> + if (save_ref_ptrs_after_persistent_rewrites(thd))
> + return true;
> + }
> + return false;
> +}
The above if is true in case of

stmt_arena->state != STMT_INITIALIZED &&
stmt_arena->state != STMT_EXECUTED

We call save_ref_ptrs_if_needed() in JOIN::prepare(),
subselect_single_select_engine::prepare(), mysql_select() and
st_select_lex_unit::prepare_join().
I assume this is ok.
However, I did find some inconsistency in how this function is called.
JOIN::prepare() calls it on errors before returning (for most cases but not
all).
subselect_single_select_engine::prepare() & mysql_select()
call it if join->prepare() returns an error, but not otherwise.
st_select_lex_unit::prepare_join() always calls it after prepare.
I would like to know when we MUST call save_ref_ptrs_if_needed().
Do we have to call it in case JOIN::prepare() returns 0?
It would be easier if things were done this way:
- JOIN::prepare() would always call it in case of error, if there is any
  need to call it.
- We would not have to call it in
  subselect_single_select_engine::prepare() or mysql_select()
- We would remove the call in st_select_lex_unit::prepare_join(), or
  alternatively call it only if join->prepare() returns 0.
(Depending on what is correct here)
> void st_select_lex_unit::print(String *str, enum_query_type query_type)
> {
> if (with_clause)
> @@ -4958,7 +5007,7 @@ bool st_select_lex::optimize_unflattened_subqueries(bool const_only)
> }
> if ((res= inner_join->optimize()))
> return TRUE;
> - if (!inner_join->cleaned)
> + if (inner_join->thd->is_first_query_execution())
> sl->update_used_tables();
> sl->update_correlated_cache();
> is_correlated_unit|= sl->is_correlated;
Fix indentation.
Why is cleaned not good enough here?
cleaned means that the join structure cannot be used anymore.
Why not do:
if (!inner_join->cleaned && inner_join->thd->is_first_query_execution())
Should be safer.
> diff --git a/sql/sql_lex.h b/sql/sql_lex.h
> index 3ab50d4aaa8..59029057a23 100644
> --- a/sql/sql_lex.h
> +++ b/sql/sql_lex.h
> @@ -1045,6 +1045,8 @@ class st_select_lex_unit: public st_select_lex_node {
>
> bool can_be_merged();
>
> + uint ub_eq_for_exists2_in();
> +
> friend class st_select_lex;
> };
>
> @@ -1176,6 +1178,7 @@ class st_select_lex: public st_select_lex_node
> uint curr_tvc_name;
> /* true <=> select has been created a TVC wrapper */
> bool is_tvc_wrapper;
> + uint fields_added_by_fix_inner_refs;
You can remove the above variable as it is not used anywhere.
It is set at:
sql_select.cc:1499: select_lex->fields_added_by_fix_inner_refs=
sql_lex.cc:3119: fields_added_by_fix_inner_refs= 0;
but not used anywhere.
> /*
> Needed to correctly generate 'PRIMARY' or 'SIMPLE' for select_type column
> of EXPLAIN
> @@ -1201,6 +1204,8 @@ class st_select_lex: public st_select_lex_node
>
> /// Array of pointers to top elements of all_fields list
> Ref_ptr_array ref_pointer_array;
> + Ref_ptr_array save_ref_ptrs;
> + uint card_of_ref_ptrs_slice;
Please add uint variables together with other uint variables.
Otherwise we may lose 4 bytes for each variable because of alignment.
>
> /*
> number of items in select_list and HAVING clause used to get number
> @@ -1220,6 +1225,7 @@ class st_select_lex: public st_select_lex_node
> uint order_group_num;
> /* reserved for exists 2 in */
> uint select_n_reserved;
> + uint select_n_eq;
Please add a description for the above variable.
(The name does not tell me what it is used for.)
> /*
> it counts the number of bit fields in the SELECT list. These are used when DISTINCT is
> converted to a GROUP BY involving BIT fields.
> @@ -1295,6 +1301,8 @@ class st_select_lex: public st_select_lex_node
> case of an error during prepare the PS is not created.
> */
> uint8 changed_elements; // see TOUCHED_SEL_*
> + uint8 save_uncacheable;
> + uint8 save_master_uncacheable;
The above two variables can be removed as these are only assigned to
but never used for anything.
> /* TODO: add foloowing first_* to bitmap above */
> bool first_natural_join_processing;
> bool first_cond_optimization;
> diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
> index 5db53b6f2b8..83569207a0c 100644
> --- a/sql/sql_parse.cc
> +++ b/sql/sql_parse.cc
> @@ -4956,11 +4956,24 @@ mysql_execute_command(THD *thd)
> if ((res= multi_delete_precheck(thd, all_tables)))
> break;
>
> + if (thd->stmt_arena->is_stmt_prepare())
> + {
> + if (add_item_to_list(thd, new (thd->mem_root) Item_null(thd)))
> + goto error;
> + }
> + else if (thd->is_first_query_execution())
> + {
The above tests for

stmt_arena->state != STMT_INITIALIZED &&
stmt_arena->state != STMT_EXECUTED

while .. != STMT_EXECUTED would be enough, as STMT_INITIALIZED is already
tested for in the if above.
No big deal, just an observation.
> + if (select_lex->item_list.elements != 0)
> + select_lex->item_list.empty();
> + bool rc= false;
You don't have to set it to false here.
> + Query_arena *arena= 0, backup;
> + arena= thd->activate_stmt_arena_if_needed(&backup);
> + rc= add_item_to_list(thd, new (thd->stmt_arena->mem_root) Item_null(thd));
> + if (arena)
> + thd->restore_active_arena(arena, &backup);
> + if (rc)
> + goto error;
> + }
> diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> index a7b84bbfe3b..27eb466b47c 100644
> --- a/sql/sql_select.cc
> +++ b/sql/sql_select.cc
> @@ -551,6 +551,19 @@ fix_inner_refs(THD *thd, List<Item> &all_fields, SELECT_LEX *select,
> the found_in_group_by field of the references from the list.
> */
> List_iterator_fast <Item_outer_ref> ref_it(select->inner_refs_list);
> +
> + if (thd->is_noninitial_query_execution())
> + {
> + while ((ref= ref_it++))
> + {
> + if (ref->found_in_select_list)
> + continue;
> + int el= all_fields.elements;
> + all_fields.push_front(ref_pointer_array[el], thd->mem_root);
> + }
> + return false;
> + }
Please move the comment before this loop to after the loop, as it refers
to the next loop.
Also please add a short comment about what the purpose of this loop is.
Will this not cause the ref_pointer_array to grow for each
prepared statement call (after the first one)?
Looking at the other parts of your patch, I thought that the size and content
of ref_pointer_array[] will be fixed after the first call.
(or saved in save_ref_ptrs).
Adding something to ref_pointer_array in case of STMT_EXECUTED sounds
like it is going against the above principles.
> + bool is_first_execution= thd->is_first_query_execution();
> + if (is_first_execution ||
> + thd->stmt_arena->is_stmt_prepare())
The above can be replaced with

if (thd->stmt_arena->state != STMT_EXECUTED)

It is a bit confusing that we have defined:

inline bool is_stmt_execute() const
{ return state == STMT_PREPARED || state == STMT_EXECUTED; }

as it would have been logical to have

inline bool is_stmt_execute() const
{ return state == STMT_EXECUTED; }

Adding a function is_stmt_executed() would easily be confused with
is_stmt_execute().
Maybe we should rename is_stmt_execute() to is_stmt_prepared_or_executed()
and add is_stmt_executed()?
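I.e. a sketch of the suggested split:

inline bool is_stmt_prepared_or_executed() const
{ return state == STMT_PREPARED || state == STMT_EXECUTED; }

inline bool is_stmt_executed() const
{ return state == STMT_EXECUTED; }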
> @@ -1336,18 +1367,13 @@ JOIN::prepare(TABLE_LIST *tables_init, COND *conds_init, uint og_num,
> DBUG_RETURN(-1);
> }
>
> - if (thd->lex->current_select->first_cond_optimization)
> - {
> - if ( conds && ! thd->lex->current_select->merged_into)
> - select_lex->select_n_reserved= conds->exists2in_reserved_items();
> - else
> - select_lex->select_n_reserved= 0;
> - }
I assume this is now calculated in Item_exists_subselect::fix_fields()
and not needed here anymore?
> @@ -1463,6 +1489,38 @@ JOIN::prepare(TABLE_LIST *tables_init, COND *conds_init, uint og_num,
> if (res)
> DBUG_RETURN(res);
>
> + if (thd->is_first_query_execution() ||
> + thd->lex->is_ps_or_view_context_analysis())
> + {
> + if (fix_inner_refs(thd, all_fields, select_lex, ref_ptrs))
> + DBUG_RETURN(-1);
Move the above outside the if, as the code is identical also for the else
part.
> + if (thd->is_first_query_execution() &&
> + all_fields.elements > select_lex->item_list.elements)
> + select_lex->fields_added_by_fix_inner_refs=
> + select_lex->inner_refs_list.elements;
You can remove the above 4 lines as fields_added_by_fix_inner_refs is not
used anywhere.
> + }
> + else
> + {
> + if (fix_inner_refs(thd, all_fields, select_lex, ref_ptrs))
> + DBUG_RETURN(-1);
> + select_lex->uncacheable= select_lex->save_uncacheable;
> + select_lex->master_unit()->uncacheable= select_lex->save_master_uncacheable;
you can remove the above as neither variable is used.
> + }
> +
This means you can replace the whole query block above with:

if (fix_inner_refs(thd, all_fields, select_lex, ref_ptrs))
  DBUG_RETURN(-1);
> @@ -1624,6 +1681,7 @@ JOIN::prepare(TABLE_LIST *tables_init, COND *conds_init, uint og_num,
> err:
> delete procedure; /* purecov: inspected */
> procedure= 0;
> + select_lex->save_ref_ptrs_if_needed(thd);
> DBUG_RETURN(-1); /* purecov: inspected */
> }
ok. Please check if there is any other case in JOIN::prepare() where we
return <> 0 and where we should call save_ref_ptrs_if_needed(thd).
By the way, I am not sure why we would call save_ref_ptrs_if_needed() at
all in case of errors. The state will in this case not be set to
STMT_EXECUTED and thus the state should not be used by the next query.
Please add a comment why we need the result in case of an error.
> // Update used tables after all handling derived table procedures
> - select_lex->update_used_tables();
> + if (select_lex->first_cond_optimization)
> + select_lex->update_used_tables();
Why first_cond_optimization() instead of state != STMT_EXECUTED ?
> - if (transform_max_min_subquery())
> + if (select_lex->first_cond_optimization && transform_max_min_subquery())
> DBUG_RETURN(1); /* purecov: inspected */
>
> if (select_lex->first_cond_optimization)
Please combine the above two ifs, e.g. as sketched below.
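I.e. (sketch):

if (select_lex->first_cond_optimization)
{
  if (transform_max_min_subquery())
    DBUG_RETURN(1); /* purecov: inspected */
  /* ... rest of the existing first_cond_optimization block ... */
}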
> @@ -2042,6 +2100,11 @@ JOIN::optimize_inner()
<cut>
> + select_lex->save_uncacheable= select_lex->uncacheable;
> + select_lex->save_master_uncacheable= select_lex->master_unit()->uncacheable;
Remove as not used.
> @@ -4832,6 +4895,7 @@ mysql_select(THD *thd, TABLE_LIST *tables, List<Item> &fields, COND *conds,
> if ((err= join->prepare(tables, conds, og_num, order, false, group,
> having, proc_param, select_lex, unit)))
> {
> + select_lex->save_ref_ptrs_if_needed(thd);
Can probably be removed. See previous comment
> @@ -4858,6 +4922,7 @@ mysql_select(THD *thd, TABLE_LIST *tables, List<Item> &fields, COND *conds,
> if ((err= join->prepare(tables, conds, og_num, order, false, group, having,
> proc_param, select_lex, unit)))
> {
> + select_lex->save_ref_ptrs_if_needed(thd);
> goto err;
Can probably be removed. See previous comment
> @@ -25172,14 +25253,33 @@ find_order_in_list(THD *thd, Ref_ptr_array ref_pointer_array,
<cut>
> - from_field= find_field_in_tables(thd, (Item_ident*) order_item, tables,
> - NULL, &view_ref, IGNORE_ERRORS, FALSE,
> - FALSE);
> + if (!thd->is_noninitial_query_execution())
> + {
> + from_field= find_field_in_tables(thd, (Item_ident*) order_item, tables,
> + NULL, &view_ref, IGNORE_ERRORS, FALSE,
> + FALSE);
> + order->view_ref= view_ref;
> + order->from_field= from_field;
> + }
> + else
> + {
> + if (order->view_ref)
> + {
Fix indentation
> + view_ref= order->view_ref;
> + from_field= order->from_field;
> + }
> + else
> + {
Fix indentation
> + from_field= find_field_in_tables(thd, (Item_ident*) order_item,
> + tables, NULL, &view_ref,
> + IGNORE_ERRORS, FALSE, FALSE);
> + }
> + }
> if (!from_field)
> from_field= (Field*) not_found_field;
> }
> @@ -25232,6 +25332,12 @@ find_order_in_list(THD *thd, Ref_ptr_array ref_pointer_array,
> if (found_item != not_found_item)
> {
> order->item= &ref_pointer_array[all_fields.elements-1-counter];
> + if (!thd->is_first_query_execution() &&
> + !thd->lex->is_ps_or_view_context_analysis())
> + {
> + if ((*order->item)->fix_fields_if_needed_for_order_by(thd, order->item))
> + return TRUE;
> + }
The above code is new. Why do we need this when we did not need it before?
Please add a comment for the block above.
> order->in_field_list= 0;
> return FALSE;
> }
> diff --git a/sql/sql_union.cc b/sql/sql_union.cc
> index c3c4198439a..5fb31ccbe46 100644
> --- a/sql/sql_union.cc
> +++ b/sql/sql_union.cc
> @@ -1114,6 +1114,8 @@ bool st_select_lex_unit::prepare_join(THD *thd_arg, SELECT_LEX *sl,
> thd_arg->lex->proc_list.first),
> sl, this);
>
> + sl->save_ref_ptrs_if_needed(thd);
You may be able to remove the above, look at other comments.
> @@ -2784,6 +2786,8 @@ bool st_select_lex::cleanup()
>
> if (join)
> {
> + if (join->thd->stmt_arena->is_stmt_prepare())
> + inner_refs_list.empty();
> List_iterator<TABLE_LIST> ti(leaf_tables);
> TABLE_LIST *tbl;
> while ((tbl= ti++))
> @@ -2813,7 +2817,6 @@ bool st_select_lex::cleanup()
> continue;
> error= (bool) ((uint) error | (uint) lex_unit->cleanup());
> }
> - inner_refs_list.empty();
> exclude_from_table_unique_test= FALSE;
> hidden_bit_fields= 0;
> DBUG_RETURN(error);
The above patch looks a bit strange.
We are deleting the join in this function. Doesn't the inner_refs_list
depend on the join struct in any way?
One effect of the above is that we are resetting inner_refs_list
only later, when executing the next query, in
st_select_lex::init_select(), for anything else than STMT_INITIALIZED.
I tried reverting the fix and I did get errors.
I assume this is because we now only update inner_refs_list once and
permanently, during the first execution?
It would be good to add some kind of comment in the commit message
about this change.
Why only is_stmt_prepare and not also STMT_CONVENTIONAL_EXECUTION?
Regards,
Monty