[Maria-developers] Documenting is needed: MDEV-16991 Rounding vs truncation for TIME, DATETIME, TIMESTAMP
by Alexander Barkov 26 Nov '18
Hi Kenneth, all,
I've pushed MDEV-16991.
Can you please document this new feature?
Thanks!
Hi Alexey,
Thanks for your review.
On 11/26/2018 12:29 AM, Alexey Botchkov wrote:
> Ok, fine with leaving sql_base_types.h and the MYSQL_TIME_STATUS as it is.
>
> The last comment then:
> Item_func_trt_id::val_int()
> Item_interval_DDhhmmssff_typecast::val_str()
> there you get the 'current_thd' which is considered to be slow.
> Code looks like
> THD *thd= current_thd;
> Datetime::Options opt(TIME_CONV_NONE, thd);
> if (args[0]->get_date(thd, &commit_ts, opt))
> I think it makes sense to add the Item_func_trt_id::m_opt and fill it in
> ::fix_fields().
Unfortunately this does not help much in exactly these two places.
We need to pass thd to get_date() anyway.
Methods like val_int(), val_str(), etc, should be fixed eventually to
get an extra THD* parameter.
But I checked the entire patch again and fixed a few other places
where some current_thd calls were redundant.
Thanks!
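A minimal sketch of the pattern under discussion, with stand-in types (these are not the real server structures; in the server the fix is literally caching `THD *thd= current_thd;` once and passing `thd` down to get_date() and friends):

```cpp
#include <cassert>

// Stand-ins for the real server types. In MariaDB, current_thd is a
// thread-local lookup that is considered slow when done repeatedly.
struct THD { int query_id; };
static THD g_thd{42};
static int g_lookups= 0;

static THD *current_thd_stub()      // models the costly current_thd lookup
{
  ++g_lookups;
  return &g_thd;
}

// Anti-pattern: performing the current_thd lookup for every use.
static int val_int_repeated_lookup()
{
  return current_thd_stub()->query_id + current_thd_stub()->query_id;
}

// Preferred: fetch current_thd once, reuse the local pointer,
// as in "THD *thd= current_thd;" followed by get_date(thd, ...).
static int val_int_cached_lookup()
{
  THD *thd= current_thd_stub();     // single lookup per call
  return thd->query_id + thd->query_id;
}
```

Eliminating the remaining lookup entirely would require val_int()/val_str() to receive THD* as a parameter, which is the larger refactoring mentioned above.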
>
> Ok with me to push if you fix it like that.
>
> One extra idea though. We can get rid of current_thd
> in set_date_length() too -
> add the THD * parameter there and call it in ::fix_fields() instead of
> ::fix_length_and_dec().
> Since it doesn't seem critical, I'm fine if you leave it as it is :)
Alas, fix_length_and_dec() does not get THD* as an argument.
Only fix_fields() does.
fix_length_and_dec() should also be fixed to get THD*.
Btw, perhaps 10.4 is a good time for this.
>
> Best regards.
> HF
>
>
>
> On Sun, Nov 25, 2018 at 9:42 AM Alexander Barkov <bar@mariadb.com> wrote:
>
> Hi Alexey,
>
>
> On 11/25/2018 09:05 AM, Alexander Barkov wrote:
> >>
> >> What do you think if we do this:
> >> - add a THD* member to the MYSQL_TIME_STATUS structure,
> >> so we can get rid of the THD * parameter to many functions?
> >> - send one MYSQL_TIME_STATUS* to the add_nanoseconds()
> >> instead of three {thd, &status->warnings, and
> status->nanoseconds} ?
> >
> > Changing all functions that accept both &warn and nsec to
> > accept a pointer to MYSQL_TIME_STATUS instead, like this:
> >
> > < xxxx(thd, &status->warnings, status->nanoseconds);
> >> xxxx(thd, &status);
> >
> > is a very good idea.
> >
> >
> > I'd avoid mixing "thd" into MYSQL_TIME_STATUS though.
> > It's a pure C structure.
> >
> > Does it sound OK?
> >
>
> I tried this, but this does not save anything.
> See the attached patch.
>
> git diff --stat
> sql/sql_time.cc | 6 +++---
> sql/sql_type.cc | 2 +-
> sql/sql_type.h | 8 ++++++++
> 3 files changed, 12 insertions(+), 4 deletions(-)
>
>
> In all other call sites, "*warn" and "nsec" come from different
> sources (not from the same MYSQL_TIME_STATUS).
>
>
> Btw, MYSQL_TIME_STATUS is actually an interface to low level
> pure C conversion functions like str_to_xxx() and number_to_xxx().
> Let's not propagate it all around the code.
>
> Can we leave the relevant code as is?
>
> Thanks.
>
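For readers following along, the proposed call-site simplification can be sketched with stand-in types. MYSQL_TIME_STATUS here is a simplified stand-in, and the carry/warning rule is illustrative only, not the server's actual rounding logic:

```cpp
#include <cassert>

// Simplified stand-in for the pure-C MYSQL_TIME_STATUS structure.
struct MYSQL_TIME_STATUS
{
  int warnings;
  unsigned long nanoseconds;
};

// Old-style interface: warnings and nanoseconds passed as two separate
// arguments, even when both come from the same status object.
static void add_nanoseconds_old(int *warn, unsigned long *nsec,
                                unsigned long add)
{
  *nsec+= add;
  if (*nsec > 999999999UL)          // illustrative carry rule only
  {
    ++*warn;
    *nsec-= 1000000000UL;
  }
}

// Proposed-style interface: one MYSQL_TIME_STATUS* instead of two
// pointers, so a call site reads xxxx(thd, &status).
static void add_nanoseconds_new(MYSQL_TIME_STATUS *status, unsigned long add)
{
  add_nanoseconds_old(&status->warnings, &status->nanoseconds, add);
}
```

As the thread concludes, this only pays off where both fields really do come from the same MYSQL_TIME_STATUS; at other call sites they originate from different places, so the refactoring saves little.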
Re: [Maria-developers] [Commits] 8bfb140d5dc: Move deletion of old GTID rows to slave background thread
by Kristian Nielsen 25 Nov '18
Hi Andrei!
Here is the patch to move the cleanup of old GTID rows into the slave
background thread. It is mostly identical to the patch I wrote a month
ago, but now I have added/updated test cases and run it through buildbot.
Want to review it? The patch should be complete now. The patch is also
available in git here:
https://github.com/knielsen/server/tree/gtid_delete_in_background_thread
This patch could go into 10.4.
With the patch, a request is sent to the background thread to delete old
rows every @@gtid_cleanup_batch_size replicated transactions. Some
considerations on the default of this value:
- It should be reasonably large so that the background thread only needs to
wake up occasionally, and a number of row deletions can share the cost of
the part that is not inherently per-row.
- A large enough value can group a number of row deletions in a single
transaction which is probably beneficial. There is also a fixed overhead
per existing gtid_domain_id to scan the GTID state, so the value should
not be much smaller than the number of domains in use.
- It should not be too large either, since that could cause the
mysql.gtid_slave_pos table to grow unnecessarily. Ideally the table
should fit in one InnoDB data page.
I settled on a default of 64. Probably a wide range will work fine in >99%
of cases, eg. 10 or 100 should be mostly the same. 64 should be large enough
that the per-batch overhead becomes small relative to the per-row, even if
hundreds of domains are used. And small enough that the table is likely to
be able to fit in a single InnoDB 16k page, even considering that deletion
is asynchronous and more rows can accumulate temporarily due to parallel
replication etc.
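The batching trigger described above can be sketched roughly as follows (stand-in types; in the patch the counter lives in rpl_slave_state under LOCK_slave_state and the wakeup is slave_background_gtid_pending_delete_request()):

```cpp
#include <cassert>

// Rough sketch of the batch trigger: count replicated transactions and
// signal the background thread only every batch_size-th one (default 64).
struct SlaveStateSketch
{
  unsigned batch_size;
  unsigned pending_gtid_count;
  unsigned background_requests;     // stands in for the wakeup call

  void record_replicated_transaction()
  {
    if (++pending_gtid_count >= batch_size)
    {
      pending_gtid_count= 0;
      ++background_requests;        // would wake the background thread
    }
  }
};
```

So the per-transaction cost in the replication threads is just an increment and a compare; the actual row deletions are amortized over each batch in the background thread.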
I think this is a nice improvement. Moving the deletion logic away from the
main replication SQL/worker threads simplifies the logic, eg. for when a
transaction fails after a deletion has already been done. And it also has
the potential to be more efficient; the contention on global mutex
LOCK_slave_state is reduced, processing is moved out of the main replication
threads, and batching row deletions could potentially be cheaper than doing
them one-by-one.
- Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
> revision-id: 8bfb140d5dc247c183787b8a0a1799cf375845bd (mariadb-10.3.10-25-g8bfb140d5dc)
> parent(s): 74387028a06c557f36a0fd1bbde347f1551c8fb7
> author: Kristian Nielsen
> committer: Kristian Nielsen
> timestamp: 2018-11-25 19:38:33 +0100
> message:
>
> Move deletion of old GTID rows to slave background thread
>
> This patch changes how old rows in mysql.gtid_slave_pos* tables are deleted.
> Instead of doing it as part of every replicated transaction in
> record_gtid(), it is done periodically (every @@gtid_cleanup_batch_size
> transaction) in the slave background thread.
>
> This removes the deletion step from the replication process in SQL or worker
> threads, which could speed up replication with many small transactions. It
> also decreases contention on the global mutex LOCK_slave_state. And it
> simplifies the logic, eg. when a replicated transaction fails after having
> deleted old rows.
>
> With this patch, the deletion of old GTID rows happens asynchronously and
> slightly non-deterministically. Thus the number of old rows in
> mysql.gtid_slave_pos can temporarily exceed @@gtid_cleanup_batch_size. But
> all old rows will be deleted eventually after sufficiently many new GTIDs
> have been replicated.
>
> ---
> mysql-test/main/mysqld--help.result | 10 +
> mysql-test/suite/rpl/r/rpl_gtid_mdev4484.result | 40 +-
> mysql-test/suite/rpl/r/rpl_gtid_stop_start.result | 8 +-
> .../suite/rpl/r/rpl_parallel_optimistic.result | 14 +-
> mysql-test/suite/rpl/t/rpl_gtid_mdev4484.test | 68 +++-
> .../suite/rpl/t/rpl_parallel_optimistic.test | 42 ++-
> .../sys_vars/r/sysvars_server_notembedded.result | 14 +
> sql/log_event.cc | 6 +-
> sql/mysqld.cc | 1 +
> sql/mysqld.h | 1 +
> sql/rpl_gtid.cc | 413 +++++++++++++--------
> sql/rpl_gtid.h | 12 +-
> sql/rpl_rli.cc | 87 +----
> sql/rpl_rli.h | 11 -
> sql/slave.cc | 35 +-
> sql/slave.h | 1 +
> sql/sys_vars.cc | 13 +
> .../mysql-test/rocksdb_rpl/r/mdev12179.result | 18 +
> .../mysql-test/rocksdb_rpl/t/mdev12179.test | 85 +++++
> .../mysql-test/tokudb_rpl/r/mdev12179.result | 18 +
> .../tokudb/mysql-test/tokudb_rpl/t/mdev12179.test | 85 +++++
> 21 files changed, 675 insertions(+), 307 deletions(-)
>
> diff --git a/mysql-test/main/mysqld--help.result b/mysql-test/main/mysqld--help.result
> index 5a7153f32d3..4f801ec5275 100644
> --- a/mysql-test/main/mysqld--help.result
> +++ b/mysql-test/main/mysqld--help.result
> @@ -294,6 +294,15 @@ The following specify which files/extra groups are read (specified before remain
> --group-concat-max-len=#
> The maximum length of the result of function
> GROUP_CONCAT()
> + --gtid-cleanup-batch-size=#
> + Normally does not need tuning. How many old rows must
> + accumulate in the mysql.gtid_slave_pos table before a
> + background job will be run to delete them. Can be
> + increased to reduce number of commits if using many
> + different engines with --gtid_pos_auto_engines, or to
> + reduce CPU overhead if using a huge number of different
> + gtid_domain_ids. Can be decreased to reduce number of old
> + rows in the table.
> --gtid-domain-id=# Used with global transaction ID to identify logically
> independent replication streams. When events can
> propagate through multiple parallel paths (for example
> @@ -1425,6 +1434,7 @@ gdb FALSE
> general-log FALSE
> getopt-prefix-matching FALSE
> group-concat-max-len 1048576
> +gtid-cleanup-batch-size 64
> gtid-domain-id 0
> gtid-ignore-duplicates FALSE
> gtid-pos-auto-engines
> diff --git a/mysql-test/suite/rpl/r/rpl_gtid_mdev4484.result b/mysql-test/suite/rpl/r/rpl_gtid_mdev4484.result
> index aaeb0c8f119..55d2831dcf4 100644
> --- a/mysql-test/suite/rpl/r/rpl_gtid_mdev4484.result
> +++ b/mysql-test/suite/rpl/r/rpl_gtid_mdev4484.result
> @@ -16,36 +16,32 @@ INSERT INTO t1 VALUES (1);
> connection slave;
> connection slave;
> include/stop_slave.inc
> +SET @old_gtid_cleanup_batch_size= @@GLOBAL.gtid_cleanup_batch_size;
> +SET GLOBAL gtid_cleanup_batch_size= 2;
> SET @old_dbug= @@GLOBAL.debug_dbug;
> SET GLOBAL debug_dbug="+d,gtid_slave_pos_simulate_failed_delete";
> SET sql_log_bin= 0;
> -CALL mtr.add_suppression("Can't find file");
> +CALL mtr.add_suppression("<DEBUG> Error deleting old GTID row");
> SET sql_log_bin= 1;
> include/start_slave.inc
> connection master;
> -INSERT INTO t1 VALUES (2);
> -connection slave;
> -include/wait_for_slave_sql_error.inc [errno=1942]
> -STOP SLAVE IO_THREAD;
> -SELECT domain_id, server_id, seq_no FROM mysql.gtid_slave_pos
> -ORDER BY domain_id, sub_id DESC LIMIT 1;
> -domain_id server_id seq_no
> -0 1 3
> +connection slave;
> +SELECT COUNT(*), MAX(seq_no) INTO @pre_count, @pre_max_seq_no
> +FROM mysql.gtid_slave_pos;
> +SELECT IF(@pre_count >= 20, "OK", CONCAT("Error: too few rows seen while errors injected: ", @pre_count));
> +IF(@pre_count >= 20, "OK", CONCAT("Error: too few rows seen while errors injected: ", @pre_count))
> +OK
> SET GLOBAL debug_dbug= @old_dbug;
> -include/start_slave.inc
> connection master;
> -INSERT INTO t1 VALUES (3);
> -connection slave;
> -connection slave;
> -SELECT domain_id, server_id, seq_no FROM mysql.gtid_slave_pos
> -ORDER BY domain_id, sub_id DESC LIMIT 1;
> -domain_id server_id seq_no
> -0 1 4
> -SELECT * FROM t1 ORDER BY i;
> -i
> -1
> -2
> -3
> +connection slave;
> +connection slave;
> +SELECT IF(COUNT(*) >= 1, "OK", CONCAT("Error: too few rows seen after errors no longer injected: ", COUNT(*)))
> +FROM mysql.gtid_slave_pos
> +WHERE seq_no <= @pre_max_seq_no;
> +IF(COUNT(*) >= 1, "OK", CONCAT("Error: too few rows seen after errors no longer injected: ", COUNT(*)))
> +OK
> connection master;
> DROP TABLE t1;
> +connection slave;
> +SET GLOBAL gtid_cleanup_batch_size= @old_gtid_cleanup_batch_size;
> include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result b/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
> index ff845794c22..b27ffed9f94 100644
> --- a/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
> +++ b/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
> @@ -171,7 +171,7 @@ include/start_slave.inc
> *** MDEV-4692: mysql.gtid_slave_pos accumulates values for a domain ***
> SELECT domain_id, COUNT(*) FROM mysql.gtid_slave_pos GROUP BY domain_id;
> domain_id COUNT(*)
> -0 2
> +0 3
> 1 2
> connection server_1;
> INSERT INTO t1 VALUES (11);
> @@ -179,7 +179,7 @@ connection server_2;
> FLUSH NO_WRITE_TO_BINLOG TABLES;
> SELECT domain_id, COUNT(*) FROM mysql.gtid_slave_pos GROUP BY domain_id;
> domain_id COUNT(*)
> -0 2
> +0 4
> 1 2
> include/start_slave.inc
> connection server_1;
> @@ -189,8 +189,8 @@ connection server_2;
> FLUSH NO_WRITE_TO_BINLOG TABLES;
> SELECT domain_id, COUNT(*) FROM mysql.gtid_slave_pos GROUP BY domain_id;
> domain_id COUNT(*)
> -0 2
> -1 2
> +0 3
> +1 1
> *** MDEV-4650: show variables; ERROR 1946 (HY000): Failed to load replication slave GTID position ***
> connection server_2;
> SET sql_log_bin=0;
> diff --git a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
> index ca202a66b0e..83343e52cab 100644
> --- a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
> +++ b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
> @@ -12,6 +12,8 @@ SET GLOBAL slave_parallel_threads=10;
> CHANGE MASTER TO master_use_gtid=slave_pos;
> SET @old_parallel_mode=@@GLOBAL.slave_parallel_mode;
> SET GLOBAL slave_parallel_mode='optimistic';
> +SET @old_gtid_cleanup_batch_size= @@GLOBAL.gtid_cleanup_batch_size;
> +SET GLOBAL gtid_cleanup_batch_size= 1000000;
> connection server_1;
> INSERT INTO t1 VALUES(1,1);
> BEGIN;
> @@ -131,6 +133,11 @@ c
> 204
> 205
> 206
> +SELECT IF(COUNT(*) >= 30, "OK", CONCAT("Error: too few old rows found: ", COUNT(*)))
> +FROM mysql.gtid_slave_pos;
> +IF(COUNT(*) >= 30, "OK", CONCAT("Error: too few old rows found: ", COUNT(*)))
> +OK
> +SET GLOBAL gtid_cleanup_batch_size=1;
> *** Test @@skip_parallel_replication. ***
> connection server_2;
> include/stop_slave.inc
> @@ -651,9 +658,10 @@ DROP TABLE t1, t2, t3;
> include/save_master_gtid.inc
> connection server_2;
> include/sync_with_master_gtid.inc
> -Check that no more than the expected last four GTIDs are in mysql.gtid_slave_pos
> -select count(4) <= 4 from mysql.gtid_slave_pos order by domain_id, sub_id;
> -count(4) <= 4
> +SELECT COUNT(*) <= 5*@@GLOBAL.gtid_cleanup_batch_size
> +FROM mysql.gtid_slave_pos;
> +COUNT(*) <= 5*@@GLOBAL.gtid_cleanup_batch_size
> 1
> +SET GLOBAL gtid_cleanup_batch_size= @old_gtid_cleanup_batch_size;
> connection server_1;
> include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_gtid_mdev4484.test b/mysql-test/suite/rpl/t/rpl_gtid_mdev4484.test
> index e1f5696f5a1..a28bff3d27a 100644
> --- a/mysql-test/suite/rpl/t/rpl_gtid_mdev4484.test
> +++ b/mysql-test/suite/rpl/t/rpl_gtid_mdev4484.test
> @@ -28,37 +28,79 @@ INSERT INTO t1 VALUES (1);
> # Inject an artificial error deleting entries, and check that the error handling code works.
> --connection slave
> --source include/stop_slave.inc
> +SET @old_gtid_cleanup_batch_size= @@GLOBAL.gtid_cleanup_batch_size;
> +SET GLOBAL gtid_cleanup_batch_size= 2;
> SET @old_dbug= @@GLOBAL.debug_dbug;
> SET GLOBAL debug_dbug="+d,gtid_slave_pos_simulate_failed_delete";
> SET sql_log_bin= 0;
> -CALL mtr.add_suppression("Can't find file");
> +CALL mtr.add_suppression("<DEBUG> Error deleting old GTID row");
> SET sql_log_bin= 1;
> --source include/start_slave.inc
>
> --connection master
> -INSERT INTO t1 VALUES (2);
> +--disable_query_log
> +let $i = 20;
> +while ($i) {
> + eval INSERT INTO t1 VALUES ($i+10);
> + dec $i;
> +}
> +--enable_query_log
> +--save_master_pos
>
> --connection slave
> ---let $slave_sql_errno= 1942
> ---source include/wait_for_slave_sql_error.inc
> -STOP SLAVE IO_THREAD;
> -SELECT domain_id, server_id, seq_no FROM mysql.gtid_slave_pos
> - ORDER BY domain_id, sub_id DESC LIMIT 1;
> +--sync_with_master
> +
> +# Now wait for the slave background thread to try to delete old rows and
> +# hit the error injection.
> +--let _TEST_MYSQLD_ERROR_LOG=$MYSQLTEST_VARDIR/log/mysqld.2.err
> +--perl
> + open F, '<', $ENV{'_TEST_MYSQLD_ERROR_LOG'} or die;
> + outer: while (1) {
> + inner: while (<F>) {
> + last outer if /<DEBUG> Error deleting old GTID row/;
> + }
> + # Easy way to do sub-second sleep without extra modules.
> + select(undef, undef, undef, 0.1);
> + }
> +EOF
> +
> +# Since we injected error in the cleanup code, the rows should remain in
> +# mysql.gtid_slave_pos. Check that we have at least 20 (more robust against
> +# non-deterministic cleanup and future changes than checking for exact number).
> +SELECT COUNT(*), MAX(seq_no) INTO @pre_count, @pre_max_seq_no
> + FROM mysql.gtid_slave_pos;
> +SELECT IF(@pre_count >= 20, "OK", CONCAT("Error: too few rows seen while errors injected: ", @pre_count));
> SET GLOBAL debug_dbug= @old_dbug;
> ---source include/start_slave.inc
>
> --connection master
> -INSERT INTO t1 VALUES (3);
> +--disable_query_log
> +let $i = 20;
> +while ($i) {
> + eval INSERT INTO t1 VALUES ($i+40);
> + dec $i;
> +}
> +--enable_query_log
> --sync_slave_with_master
>
> --connection slave
> -SELECT domain_id, server_id, seq_no FROM mysql.gtid_slave_pos
> - ORDER BY domain_id, sub_id DESC LIMIT 1;
> -SELECT * FROM t1 ORDER BY i;
> -
> +# Now check that 1) rows are being deleted again after removing error
> +# injection, and 2) old rows are left that failed their delete while errors
> +# where injected (again compensating for non-deterministic deletion).
> +# Deletion is async and slightly non-deterministic, so we wait for at
> +# least 10 of the 20 new rows to be deleted.
> +let $wait_condition=
> + SELECT COUNT(*) <= 20-10
> + FROM mysql.gtid_slave_pos
> + WHERE seq_no > @pre_max_seq_no;
> +--source include/wait_condition.inc
> +SELECT IF(COUNT(*) >= 1, "OK", CONCAT("Error: too few rows seen after errors no longer injected: ", COUNT(*)))
> + FROM mysql.gtid_slave_pos
> + WHERE seq_no <= @pre_max_seq_no;
>
> # Clean up
> --connection master
> DROP TABLE t1;
> +--connection slave
> +SET GLOBAL gtid_cleanup_batch_size= @old_gtid_cleanup_batch_size;
>
> --source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
> index e08472d5f51..0060cf4416c 100644
> --- a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
> +++ b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
> @@ -21,6 +21,10 @@ SET GLOBAL slave_parallel_threads=10;
> CHANGE MASTER TO master_use_gtid=slave_pos;
> SET @old_parallel_mode=@@GLOBAL.slave_parallel_mode;
> SET GLOBAL slave_parallel_mode='optimistic';
> +# Run the first part of the test with high batch size and see that
> +# old rows remain in the table.
> +SET @old_gtid_cleanup_batch_size= @@GLOBAL.gtid_cleanup_batch_size;
> +SET GLOBAL gtid_cleanup_batch_size= 1000000;
>
>
> --connection server_1
> @@ -108,7 +112,12 @@ SELECT * FROM t3 ORDER BY c;
> SELECT * FROM t1 ORDER BY a;
> SELECT * FROM t2 ORDER BY a;
> SELECT * FROM t3 ORDER BY c;
> -#SHOW STATUS LIKE 'Slave_retried_transactions';
> +# Check that we have a bunch of old rows left-over - they were not deleted
> +# due to high @@gtid_cleanup_batch_size. Then set a low
> +# @@gtid_cleanup_batch_size so we can test that rows start being deleted.
> +SELECT IF(COUNT(*) >= 30, "OK", CONCAT("Error: too few old rows found: ", COUNT(*)))
> + FROM mysql.gtid_slave_pos;
> +SET GLOBAL gtid_cleanup_batch_size=1;
>
>
> --echo *** Test @@skip_parallel_replication. ***
> @@ -557,25 +566,18 @@ DROP TABLE t1, t2, t3;
>
> --connection server_2
> --source include/sync_with_master_gtid.inc
> -# Check for left-over rows in table mysql.gtid_slave_pos (MDEV-12147).
> -#
> -# There was a bug when a transaction got a conflict and was rolled back. It
> -# might have also handled deletion of some old rows, and these deletions would
> -# then also be rolled back. And since the deletes were never re-tried, old no
> -# longer needed rows would accumulate in the table without limit.
> -#
> -# The earlier part of this test file have plenty of transactions being rolled
> -# back. But the last DROP TABLE statement runs on its own and should never
> -# conflict, thus at this point the mysql.gtid_slave_pos table should be clean.
> -#
> -# To support @@gtid_pos_auto_engines, when a row is inserted in the table, it
> -# is associated with the engine of the table at insertion time, and it will
> -# only be deleted during record_gtid from a table of the same engine. Since we
> -# alter the table from MyISAM to InnoDB at the start of this test, we should
> -# end up with 4 rows: two left-over from when the table was MyISAM, and two
> -# left-over from the InnoDB part.
> ---echo Check that no more than the expected last four GTIDs are in mysql.gtid_slave_pos
> -select count(4) <= 4 from mysql.gtid_slave_pos order by domain_id, sub_id;
> +# Check that old rows are deleted from mysql.gtid_slave_pos.
> +# Deletion is asynchronous, so use wait_condition.inc.
> +# Also, there is a small amount of non-determinism in the deletion of old
> +# rows, so it is not guaranteed that there can never be more than
> +# @@gtid_cleanup_batch_size rows in the table; so allow a bit of slack
> +# here.
> +let $wait_condition=
> + SELECT COUNT(*) <= 5*@@GLOBAL.gtid_cleanup_batch_size
> + FROM mysql.gtid_slave_pos;
> +--source include/wait_condition.inc
> +eval $wait_condition;
> +SET GLOBAL gtid_cleanup_batch_size= @old_gtid_cleanup_batch_size;
>
> --connection server_1
> --source include/rpl_end.inc
> diff --git a/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result b/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
> index e8e4d671eb9..5c5ca8b66b2 100644
> --- a/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
> +++ b/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
> @@ -1202,6 +1202,20 @@ NUMERIC_BLOCK_SIZE NULL
> ENUM_VALUE_LIST NULL
> READ_ONLY NO
> COMMAND_LINE_ARGUMENT NULL
> +VARIABLE_NAME GTID_CLEANUP_BATCH_SIZE
> +SESSION_VALUE NULL
> +GLOBAL_VALUE 64
> +GLOBAL_VALUE_ORIGIN COMPILE-TIME
> +DEFAULT_VALUE 64
> +VARIABLE_SCOPE GLOBAL
> +VARIABLE_TYPE INT UNSIGNED
> +VARIABLE_COMMENT Normally does not need tuning. How many old rows must accumulate in the mysql.gtid_slave_pos table before a background job will be run to delete them. Can be increased to reduce number of commits if using many different engines with --gtid_pos_auto_engines, or to reduce CPU overhead if using a huge number of different gtid_domain_ids. Can be decreased to reduce number of old rows in the table.
> +NUMERIC_MIN_VALUE 0
> +NUMERIC_MAX_VALUE 2147483647
> +NUMERIC_BLOCK_SIZE 1
> +ENUM_VALUE_LIST NULL
> +READ_ONLY NO
> +COMMAND_LINE_ARGUMENT REQUIRED
> VARIABLE_NAME GTID_CURRENT_POS
> SESSION_VALUE NULL
> GLOBAL_VALUE
> diff --git a/sql/log_event.cc b/sql/log_event.cc
> index 8813d20578e..e10480fb015 100644
> --- a/sql/log_event.cc
> +++ b/sql/log_event.cc
> @@ -5565,7 +5565,7 @@ int Query_log_event::do_apply_event(rpl_group_info *rgi,
> gtid= rgi->current_gtid;
> if (unlikely(rpl_global_gtid_slave_state->record_gtid(thd, &gtid,
> sub_id,
> - rgi, false,
> + true, false,
> &hton)))
> {
> int errcode= thd->get_stmt_da()->sql_errno();
> @@ -8362,7 +8362,7 @@ Gtid_list_log_event::do_apply_event(rpl_group_info *rgi)
> {
> if ((ret= rpl_global_gtid_slave_state->record_gtid(thd, &list[i],
> sub_id_list[i],
> - NULL, false, &hton)))
> + false, false, &hton)))
> return ret;
> rpl_global_gtid_slave_state->update_state_hash(sub_id_list[i], &list[i],
> hton, NULL);
> @@ -8899,7 +8899,7 @@ int Xid_log_event::do_apply_event(rpl_group_info *rgi)
> rgi->gtid_pending= false;
>
> gtid= rgi->current_gtid;
> - err= rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id, rgi,
> + err= rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id, true,
> false, &hton);
> if (unlikely(err))
> {
> diff --git a/sql/mysqld.cc b/sql/mysqld.cc
> index afef4a5f52c..07bdd66f74c 100644
> --- a/sql/mysqld.cc
> +++ b/sql/mysqld.cc
> @@ -580,6 +580,7 @@ ulong opt_binlog_commit_wait_count= 0;
> ulong opt_binlog_commit_wait_usec= 0;
> ulong opt_slave_parallel_max_queued= 131072;
> my_bool opt_gtid_ignore_duplicates= FALSE;
> +uint opt_gtid_cleanup_batch_size= 64;
>
> const double log_10[] = {
> 1e000, 1e001, 1e002, 1e003, 1e004, 1e005, 1e006, 1e007, 1e008, 1e009,
> diff --git a/sql/mysqld.h b/sql/mysqld.h
> index d5cabd790b2..261748372f9 100644
> --- a/sql/mysqld.h
> +++ b/sql/mysqld.h
> @@ -258,6 +258,7 @@ extern ulong opt_slave_parallel_mode;
> extern ulong opt_binlog_commit_wait_count;
> extern ulong opt_binlog_commit_wait_usec;
> extern my_bool opt_gtid_ignore_duplicates;
> +extern uint opt_gtid_cleanup_batch_size;
> extern ulong back_log;
> extern ulong executed_events;
> extern char language[FN_REFLEN];
> diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
> index fabd09adaa7..196c2fe3d16 100644
> --- a/sql/rpl_gtid.cc
> +++ b/sql/rpl_gtid.cc
> @@ -79,7 +79,7 @@ rpl_slave_state::record_and_update_gtid(THD *thd, rpl_group_info *rgi)
> rgi->gtid_pending= false;
> if (rgi->gtid_ignore_duplicate_state!=rpl_group_info::GTID_DUPLICATE_IGNORE)
> {
> - if (record_gtid(thd, &rgi->current_gtid, sub_id, NULL, false, &hton))
> + if (record_gtid(thd, &rgi->current_gtid, sub_id, false, false, &hton))
> DBUG_RETURN(1);
> update_state_hash(sub_id, &rgi->current_gtid, hton, rgi);
> }
> @@ -244,7 +244,7 @@ rpl_slave_state_free_element(void *arg)
>
>
> rpl_slave_state::rpl_slave_state()
> - : last_sub_id(0), gtid_pos_tables(0), loaded(false)
> + : pending_gtid_count(0), last_sub_id(0), gtid_pos_tables(0), loaded(false)
> {
> mysql_mutex_init(key_LOCK_slave_state, &LOCK_slave_state,
> MY_MUTEX_INIT_SLOW);
> @@ -331,14 +331,11 @@ rpl_slave_state::update(uint32 domain_id, uint32 server_id, uint64 sub_id,
> }
> }
> rgi->gtid_ignore_duplicate_state= rpl_group_info::GTID_DUPLICATE_NULL;
> -
> -#ifdef HAVE_REPLICATION
> - rgi->pending_gtid_deletes_clear();
> -#endif
> }
>
> if (!(list_elem= (list_element *)my_malloc(sizeof(*list_elem), MYF(MY_WME))))
> return 1;
> + list_elem->domain_id= domain_id;
> list_elem->server_id= server_id;
> list_elem->sub_id= sub_id;
> list_elem->seq_no= seq_no;
> @@ -348,6 +345,15 @@ rpl_slave_state::update(uint32 domain_id, uint32 server_id, uint64 sub_id,
> if (last_sub_id < sub_id)
> last_sub_id= sub_id;
>
> +#ifdef HAVE_REPLICATION
> + ++pending_gtid_count;
> + if (pending_gtid_count >= opt_gtid_cleanup_batch_size)
> + {
> + pending_gtid_count = 0;
> + slave_background_gtid_pending_delete_request();
> + }
> +#endif
> +
> return 0;
> }
>
> @@ -382,20 +388,22 @@ rpl_slave_state::get_element(uint32 domain_id)
>
>
> int
> -rpl_slave_state::put_back_list(uint32 domain_id, list_element *list)
> +rpl_slave_state::put_back_list(list_element *list)
> {
> - element *e;
> + element *e= NULL;
> int err= 0;
>
> mysql_mutex_lock(&LOCK_slave_state);
> - if (!(e= (element *)my_hash_search(&hash, (const uchar *)&domain_id, 0)))
> - {
> - err= 1;
> - goto end;
> - }
> while (list)
> {
> list_element *next= list->next;
> +
> + if ((!e || e->domain_id != list->domain_id) &&
> + !(e= (element *)my_hash_search(&hash, (const uchar *)&list->domain_id, 0)))
> + {
> + err= 1;
> + goto end;
> + }
> e->add(list);
> list= next;
> }
> @@ -572,12 +580,12 @@ rpl_slave_state::select_gtid_pos_table(THD *thd, LEX_CSTRING *out_tablename)
> /*
> Write a gtid to the replication slave state table.
>
> + Do it as part of the transaction, to get slave crash safety, or as a separate
> + transaction if !in_transaction (eg. MyISAM or DDL).
> +
> gtid The global transaction id for this event group.
> sub_id Value allocated within the sub_id when the event group was
> read (sub_id must be consistent with commit order in master binlog).
> - rgi rpl_group_info context, if we are recording the gtid transactionally
> - as part of replicating a transactional event. NULL if called from
> - outside of a replicated transaction.
>
> Note that caller must later ensure that the new gtid and sub_id is inserted
> into the appropriate HASH element with rpl_slave_state.add(), so that it can
> @@ -585,16 +593,13 @@ rpl_slave_state::select_gtid_pos_table(THD *thd, LEX_CSTRING *out_tablename)
> */
> int
> rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
> - rpl_group_info *rgi, bool in_statement,
> + bool in_transaction, bool in_statement,
> void **out_hton)
> {
> TABLE_LIST tlist;
> int err= 0, not_sql_thread;
> bool table_opened= false;
> TABLE *table;
> - list_element *delete_list= 0, *next, *cur, **next_ptr_ptr, **best_ptr_ptr;
> - uint64 best_sub_id;
> - element *elem;
> ulonglong thd_saved_option= thd->variables.option_bits;
> Query_tables_list lex_backup;
> wait_for_commit* suspended_wfc;
> @@ -684,7 +689,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
> thd->wsrep_ignore_table= true;
> #endif
>
> - if (!rgi)
> + if (!in_transaction)
> {
> DBUG_PRINT("info", ("resetting OPTION_BEGIN"));
> thd->variables.option_bits&=
> @@ -716,168 +721,280 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
> my_error(ER_OUT_OF_RESOURCES, MYF(0));
> goto end;
> }
> +end:
>
> - mysql_mutex_lock(&LOCK_slave_state);
> - if ((elem= get_element(gtid->domain_id)) == NULL)
> +#ifdef WITH_WSREP
> + thd->wsrep_ignore_table= false;
> +#endif
> +
> + if (table_opened)
> {
> - mysql_mutex_unlock(&LOCK_slave_state);
> - my_error(ER_OUT_OF_RESOURCES, MYF(0));
> - err= 1;
> - goto end;
> + if (err || (err= ha_commit_trans(thd, FALSE)))
> + ha_rollback_trans(thd, FALSE);
> +
> + close_thread_tables(thd);
> + if (in_transaction)
> + thd->mdl_context.release_statement_locks();
> + else
> + thd->mdl_context.release_transactional_locks();
> }
> + thd->lex->restore_backup_query_tables_list(&lex_backup);
> + thd->variables.option_bits= thd_saved_option;
> + thd->resume_subsequent_commits(suspended_wfc);
> + DBUG_EXECUTE_IF("inject_record_gtid_serverid_100_sleep",
> + {
> + if (gtid->server_id == 100)
> + my_sleep(500000);
> + });
> + DBUG_RETURN(err);
> +}
>
> - /* Now pull out all GTIDs that were recorded in this engine. */
> - delete_list = NULL;
> - next_ptr_ptr= &elem->list;
> - cur= elem->list;
> - best_sub_id= 0;
> - best_ptr_ptr= NULL;
> - while (cur)
> +
> +/*
> + Return a list of all old GTIDs in any mysql.gtid_slave_pos* table that are
> + no longer needed and can be deleted from the table.
> +
> + Within each domain, we need to keep around the latest GTID (the one with the
> + highest sub_id), but any others in that domain can be deleted.
> +*/
> +rpl_slave_state::list_element *
> +rpl_slave_state::gtid_grab_pending_delete_list()
> +{
> + uint32 i;
> + list_element *full_list;
> +
> + mysql_mutex_lock(&LOCK_slave_state);
> + full_list= NULL;
> + for (i= 0; i < hash.records; ++i)
> {
> - list_element *next= cur->next;
> - if (cur->hton == hton)
> - {
> - /* Belongs to same engine, so move it to the delete list. */
> - cur->next= delete_list;
> - delete_list= cur;
> - if (cur->sub_id > best_sub_id)
> + element *elem= (element *)my_hash_element(&hash, i);
> + list_element *elist= elem->list;
> + list_element *last_elem, **best_ptr_ptr, *cur, *next;
> + uint64 best_sub_id;
> +
> + if (!elist)
> + continue; /* Nothing here */
> +
> + /* Delete any old stuff, but keep around the most recent one. */
> + cur= elist;
> + best_sub_id= cur->sub_id;
> + best_ptr_ptr= &elist;
> + last_elem= cur;
> + while ((next= cur->next)) {
> + last_elem= next;
> + if (next->sub_id > best_sub_id)
> {
> - best_sub_id= cur->sub_id;
> - best_ptr_ptr= &delete_list;
> - }
> - else if (best_ptr_ptr == &delete_list)
> + best_sub_id= next->sub_id;
> best_ptr_ptr= &cur->next;
> - }
> - else
> - {
> - /* Another engine, leave it in the list. */
> - if (cur->sub_id > best_sub_id)
> - {
> - best_sub_id= cur->sub_id;
> - /* Current best is not on the delete list. */
> - best_ptr_ptr= NULL;
> }
> - *next_ptr_ptr= cur;
> - next_ptr_ptr= &cur->next;
> + cur= next;
> }
> - cur= next;
> - }
> - *next_ptr_ptr= NULL;
> - /*
> - If the highest sub_id element is on the delete list, put it back on the
> - original list, to preserve the highest sub_id element in the table for
> - GTID position recovery.
> - */
> - if (best_ptr_ptr)
> - {
> + /*
> + Append the new elements to the full list. Note the order is important;
> + we do it here so that we do not break the list if best_sub_id is the
> + last of the new elements.
> + */
> + last_elem->next= full_list;
> + /*
> + Delete the highest sub_id element from the old list, and put it back as
> + the single-element new list.
> + */
> cur= *best_ptr_ptr;
> *best_ptr_ptr= cur->next;
> - cur->next= elem->list;
> + cur->next= NULL;
> elem->list= cur;
> +
> + /*
> + Collect the full list so far here. Note that elist may have moved if we
> + deleted the first element, so order is again important.
> + */
> + full_list= elist;
> }
> mysql_mutex_unlock(&LOCK_slave_state);
>
> - if (!delete_list)
> - goto end;
> + return full_list;
> +}
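The per-domain pruning rule implemented above can be sketched on its own. The following is a simplified model, not the server code: it uses plain sub_id values instead of list_element nodes, and the function name is illustrative. Within one domain, every recorded sub_id except the highest is eligible for deletion, because only the highest is needed to recover the GTID position.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative model of the pruning done by gtid_grab_pending_delete_list():
// return every sub_id except one copy of the maximum, preserving input order.
std::vector<uint64_t> prune_keep_highest(const std::vector<uint64_t> &sub_ids)
{
  std::vector<uint64_t> to_delete;
  if (sub_ids.empty())
    return to_delete;
  uint64_t best= sub_ids[0];
  for (uint64_t s : sub_ids)            // first pass: find the highest sub_id
    if (s > best)
      best= s;
  bool kept_one= false;                 // keep exactly one copy of the maximum
  for (uint64_t s : sub_ids)            // second pass: collect the rest
  {
    if (s == best && !kept_one)
      kept_one= true;
    else
      to_delete.push_back(s);
  }
  return to_delete;
}
```

The server-side version makes the same selection in a single pass while re-linking the list, but the invariant is identical: exactly one element per domain, the one with the highest sub_id, survives in elem->list.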
> +
>
> - /* Now delete any already committed GTIDs. */
> - bitmap_set_bit(table->read_set, table->field[0]->field_index);
> - bitmap_set_bit(table->read_set, table->field[1]->field_index);
> +/* Find the mysql.gtid_slave_posXXX table associated with a given hton. */
> +LEX_CSTRING *
> +rpl_slave_state::select_gtid_pos_table(void *hton)
> +{
> + struct gtid_pos_table *table_entry;
>
> - if ((err= table->file->ha_index_init(0, 0)))
> + /*
> + See comments on rpl_slave_state::gtid_pos_tables for rules around proper
> + access to the list.
> + */
> + table_entry= (struct gtid_pos_table *)
> + my_atomic_loadptr_explicit(&gtid_pos_tables, MY_MEMORY_ORDER_ACQUIRE);
> +
> + while (table_entry)
> {
> - table->file->print_error(err, MYF(0));
> - goto end;
> + if (table_entry->table_hton == hton)
> + {
> + if (likely(table_entry->state == GTID_POS_AVAILABLE))
> + return &table_entry->table_name;
> + }
> + table_entry= table_entry->next;
> }
> - cur = delete_list;
> - while (cur)
> - {
> - uchar key_buffer[4+8];
>
> - DBUG_EXECUTE_IF("gtid_slave_pos_simulate_failed_delete",
> - { err= ENOENT;
> - table->file->print_error(err, MYF(0));
> - /* `break' does not work inside DBUG_EXECUTE_IF */
> - goto dbug_break; });
> + table_entry= (struct gtid_pos_table *)
> + my_atomic_loadptr_explicit(&default_gtid_pos_table, MY_MEMORY_ORDER_ACQUIRE);
> + return &table_entry->table_name;
> +}
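The acquire load above pairs with a release store made when new gtid_pos_table entries are prepended to the list. A minimal standalone sketch of that publication pattern follows; the struct and names are illustrative, not the actual server structures, and the writer is assumed to be serialized by its own lock as in the server:

```cpp
#include <atomic>
#include <cassert>

// Lock-free reader over a prepend-only list: the writer fully initializes a
// node before the release store, so readers walking from an acquire load of
// the head can never observe a half-built entry.
struct PosTable
{
  const char *name;
  const void *hton;   // stand-in for the storage-engine handlerton pointer
  PosTable *next;
};

static std::atomic<PosTable *> table_list{nullptr};

void publish(PosTable *t)                       // writer side (serialized externally)
{
  t->next= table_list.load(std::memory_order_relaxed);
  table_list.store(t, std::memory_order_release);
}

const char *lookup(const void *hton)            // reader side, no mutex needed
{
  for (PosTable *t= table_list.load(std::memory_order_acquire); t; t= t->next)
    if (t->hton == hton)
      return t->name;
  return nullptr;                               // caller falls back to the default table
}

bool demo()
{
  static PosTable innodb{"gtid_slave_pos_innodb", (const void *) 0x1, nullptr};
  static PosTable rocksdb{"gtid_slave_pos_rocksdb", (const void *) 0x2, nullptr};
  publish(&innodb);
  publish(&rocksdb);
  return lookup((const void *) 0x2) != nullptr &&
         lookup((const void *) 0x3) == nullptr;
}
```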
>
> - next= cur->next;
>
> - table->field[1]->store(cur->sub_id, true);
> - /* domain_id is already set in table->record[0] from write_row() above. */
> - key_copy(key_buffer, table->record[0], &table->key_info[0], 0, false);
> - if (table->file->ha_index_read_map(table->record[1], key_buffer,
> - HA_WHOLE_KEY, HA_READ_KEY_EXACT))
> - /* We cannot find the row, assume it is already deleted. */
> - ;
> - else if ((err= table->file->ha_delete_row(table->record[1])))
> - table->file->print_error(err, MYF(0));
> - /*
> - In case of error, we still discard the element from the list. We do
> - not want to endlessly error on the same element in case of table
> - corruption or such.
> - */
> - cur= next;
> - if (err)
> - break;
> - }
> -IF_DBUG(dbug_break:, )
> - table->file->ha_index_end();
> +void
> +rpl_slave_state::gtid_delete_pending(THD *thd,
> + rpl_slave_state::list_element **list_ptr)
> +{
> + int err= 0;
> + ulonglong thd_saved_option;
>
> -end:
> + if (unlikely(!loaded))
> + return;
>
> #ifdef WITH_WSREP
> - thd->wsrep_ignore_table= false;
> + /*
> + Updates in slave state table should not be appended to galera transaction
> + writeset.
> + */
> + thd->wsrep_ignore_table= true;
> #endif
>
> - if (table_opened)
> + thd_saved_option= thd->variables.option_bits;
> + thd->variables.option_bits&=
> + ~(ulonglong)(OPTION_NOT_AUTOCOMMIT |OPTION_BEGIN |OPTION_BIN_LOG |
> + OPTION_GTID_BEGIN);
> +
> + while (*list_ptr)
> {
> - if (err || (err= ha_commit_trans(thd, FALSE)))
> - {
> - /*
> - If error, we need to put any remaining delete_list back into the HASH
> - so we can do another delete attempt later.
> - */
> - if (delete_list)
> - {
> - put_back_list(gtid->domain_id, delete_list);
> - delete_list = 0;
> - }
> + LEX_CSTRING *gtid_pos_table_name, *tmp_table_name;
> + Query_tables_list lex_backup;
> + TABLE_LIST tlist;
> + TABLE *table;
> + handler::Table_flags direct_pos;
> + list_element *cur, **cur_ptr_ptr;
> + bool table_opened= false;
> + void *hton= (*list_ptr)->hton;
>
> - ha_rollback_trans(thd, FALSE);
> + thd->reset_for_next_command();
> +
> + /*
> + Only the SQL thread can call select_gtid_pos_table() without a mutex.
> + Other threads need to use a mutex and take into account that the
> + result may change during execution, so we have to make a copy.
> + */
> + mysql_mutex_lock(&LOCK_slave_state);
> + tmp_table_name= select_gtid_pos_table(hton);
> + gtid_pos_table_name= thd->make_clex_string(tmp_table_name->str,
> + tmp_table_name->length);
> + mysql_mutex_unlock(&LOCK_slave_state);
> + if (!gtid_pos_table_name)
> + {
> + /* Out of memory - we can try again later. */
> + break;
> }
> - close_thread_tables(thd);
> - if (rgi)
> +
> + thd->lex->reset_n_backup_query_tables_list(&lex_backup);
> + tlist.init_one_table(&MYSQL_SCHEMA_NAME, gtid_pos_table_name, NULL, TL_WRITE);
> + if ((err= open_and_lock_tables(thd, &tlist, FALSE, 0)))
> + goto end;
> + table_opened= true;
> + table= tlist.table;
> +
> + if ((err= gtid_check_rpl_slave_state_table(table)))
> + goto end;
> +
> + direct_pos= table->file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_POSITION;
> + bitmap_set_all(table->write_set);
> + table->rpl_write_set= table->write_set;
> +
> + /* Now delete any already committed GTIDs. */
> + bitmap_set_bit(table->read_set, table->field[0]->field_index);
> + bitmap_set_bit(table->read_set, table->field[1]->field_index);
> +
> + if (!direct_pos && (err= table->file->ha_index_init(0, 0)))
> {
> - thd->mdl_context.release_statement_locks();
> - /*
> - Save the list of old gtid entries we deleted. If this transaction
> - fails later for some reason and is rolled back, the deletion of those
> - entries will be rolled back as well, and we will need to put them back
> - on the to-be-deleted list so we can re-do the deletion. Otherwise
> - redundant rows in mysql.gtid_slave_pos may accumulate if transactions
> - are rolled back and retried after record_gtid().
> - */
> -#ifdef HAVE_REPLICATION
> - rgi->pending_gtid_deletes_save(gtid->domain_id, delete_list);
> -#endif
> + table->file->print_error(err, MYF(0));
> + goto end;
> }
> - else
> +
> + cur = *list_ptr;
> + cur_ptr_ptr = list_ptr;
> + do
> {
> - thd->mdl_context.release_transactional_locks();
> -#ifdef HAVE_REPLICATION
> - rpl_group_info::pending_gtid_deletes_free(delete_list);
> -#endif
> + uchar key_buffer[4+8];
> + list_element *next= cur->next;
> +
> + if (cur->hton == hton)
> + {
> + int res;
> +
> + table->field[0]->store((ulonglong)cur->domain_id, true);
> + table->field[1]->store(cur->sub_id, true);
> + if (direct_pos)
> + {
> + res= table->file->ha_rnd_pos_by_record(table->record[0]);
> + }
> + else
> + {
> + key_copy(key_buffer, table->record[0], &table->key_info[0], 0, false);
> + res= table->file->ha_index_read_map(table->record[0], key_buffer,
> + HA_WHOLE_KEY, HA_READ_KEY_EXACT);
> + }
> + DBUG_EXECUTE_IF("gtid_slave_pos_simulate_failed_delete",
> + { res= 1;
> + err= ENOENT;
> + sql_print_error("<DEBUG> Error deleting old GTID row");
> + });
> + if (res)
> + /* We cannot find the row, assume it is already deleted. */
> + ;
> + else if ((err= table->file->ha_delete_row(table->record[0])))
> + {
> + sql_print_error("Error deleting old GTID row: %s",
> + thd->get_stmt_da()->message());
> + /*
> + In case of error, we still discard the element from the list. We do
> + not want to endlessly error on the same element in case of table
> + corruption or such.
> + */
> + }
> + *cur_ptr_ptr= next;
> + my_free(cur);
> + }
> + else
> + {
> + /* Leave this one in the list until we get to the table for its hton. */
> + cur_ptr_ptr= &cur->next;
> + }
> + cur= next;
> + if (err)
> + break;
> + } while (cur);
> +end:
> + if (table_opened)
> + {
> + if (!direct_pos)
> + table->file->ha_index_end();
> +
> + if (err || (err= ha_commit_trans(thd, FALSE)))
> + ha_rollback_trans(thd, FALSE);
> }
> + close_thread_tables(thd);
> + thd->mdl_context.release_transactional_locks();
> + thd->lex->restore_backup_query_tables_list(&lex_backup);
> +
> + if (err)
> + break;
> }
> - thd->lex->restore_backup_query_tables_list(&lex_backup);
> thd->variables.option_bits= thd_saved_option;
> - thd->resume_subsequent_commits(suspended_wfc);
> - DBUG_EXECUTE_IF("inject_record_gtid_serverid_100_sleep",
> - {
> - if (gtid->server_id == 100)
> - my_sleep(500000);
> - });
> - DBUG_RETURN(err);
> +
> +#ifdef WITH_WSREP
> + thd->wsrep_ignore_table= false;
> +#endif
> }
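The cur_ptr_ptr walk in the delete loop above is the classic pointer-to-pointer unlink idiom: keeping a pointer to the predecessor's next field means matching nodes can be unlinked without special-casing the list head. A self-contained sketch (the Node struct and integer hton are hypothetical stand-ins, not the server types):

```cpp
#include <cassert>
#include <cstdlib>

struct Node { int hton; Node *next; };

// Unlink and free every node whose hton matches; return how many were freed.
int remove_matching(Node **list_ptr, int hton)
{
  int removed= 0;
  Node **cur_ptr_ptr= list_ptr;
  while (Node *cur= *cur_ptr_ptr)
  {
    if (cur->hton == hton)
    {
      *cur_ptr_ptr= cur->next;          // unlink: predecessor's link skips cur
      free(cur);
      ++removed;
    }
    else
      cur_ptr_ptr= &cur->next;          // keep node, advance the link pointer
  }
  return removed;
}

bool demo_remove()
{
  Node *head= nullptr;
  for (int i= 0; i < 6; i++)            // build list with htons 1,0,1,0,1,0
  {
    Node *n= (Node *) malloc(sizeof(Node));
    n->hton= i % 2;
    n->next= head;
    head= n;
  }
  int removed= remove_matching(&head, 1);
  int left= 0;
  for (Node *p= head; p; p= p->next)
    left++;
  return removed == 3 && left == 3 && head->hton == 0;
}
```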
>
>
> @@ -1251,7 +1368,7 @@ rpl_slave_state::load(THD *thd, const char *state_from_master, size_t len,
>
> if (gtid_parser_helper(&state_from_master, end, &gtid) ||
> !(sub_id= next_sub_id(gtid.domain_id)) ||
> - record_gtid(thd, &gtid, sub_id, NULL, in_statement, &hton) ||
> + record_gtid(thd, &gtid, sub_id, false, in_statement, &hton) ||
> update(gtid.domain_id, gtid.server_id, sub_id, gtid.seq_no, hton, NULL))
> return 1;
> if (state_from_master == end)
> diff --git a/sql/rpl_gtid.h b/sql/rpl_gtid.h
> index 0fc92d5e33c..60d822f7b0d 100644
> --- a/sql/rpl_gtid.h
> +++ b/sql/rpl_gtid.h
> @@ -118,8 +118,9 @@ struct rpl_slave_state
> {
> struct list_element *next;
> uint64 sub_id;
> - uint64 seq_no;
> + uint32 domain_id;
> uint32 server_id;
> + uint64 seq_no;
> /*
> hton of mysql.gtid_slave_pos* table used to record this GTID.
> Can be NULL if the gtid table failed to load (eg. missing
> @@ -191,6 +192,8 @@ struct rpl_slave_state
>
> /* Mapping from domain_id to its element. */
> HASH hash;
> + /* GTIDs added since last purge of old mysql.gtid_slave_pos rows. */
> + uint32 pending_gtid_count;
> /* Mutex protecting access to the state. */
> mysql_mutex_t LOCK_slave_state;
> /* Auxiliary buffer to sort gtid list. */
> @@ -233,7 +236,10 @@ struct rpl_slave_state
> int truncate_state_table(THD *thd);
> void select_gtid_pos_table(THD *thd, LEX_CSTRING *out_tablename);
> int record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
> - rpl_group_info *rgi, bool in_statement, void **out_hton);
> + bool in_transaction, bool in_statement, void **out_hton);
> + list_element *gtid_grab_pending_delete_list();
> + LEX_CSTRING *select_gtid_pos_table(void *hton);
> + void gtid_delete_pending(THD *thd, rpl_slave_state::list_element **list_ptr);
> uint64 next_sub_id(uint32 domain_id);
> int iterate(int (*cb)(rpl_gtid *, void *), void *data,
> rpl_gtid *extra_gtids, uint32 num_extra,
> @@ -245,7 +251,7 @@ struct rpl_slave_state
> bool is_empty();
>
> element *get_element(uint32 domain_id);
> - int put_back_list(uint32 domain_id, list_element *list);
> + int put_back_list(list_element *list);
>
> void update_state_hash(uint64 sub_id, rpl_gtid *gtid, void *hton,
> rpl_group_info *rgi);
> diff --git a/sql/rpl_rli.cc b/sql/rpl_rli.cc
> index b275ad884bd..2d91620c898 100644
> --- a/sql/rpl_rli.cc
> +++ b/sql/rpl_rli.cc
> @@ -1820,6 +1820,7 @@ rpl_load_gtid_slave_state(THD *thd)
> int err= 0;
> uint32 i;
> load_gtid_state_cb_data cb_data;
> + rpl_slave_state::list_element *old_gtids_list;
> DBUG_ENTER("rpl_load_gtid_slave_state");
>
> mysql_mutex_lock(&rpl_global_gtid_slave_state->LOCK_slave_state);
> @@ -1905,6 +1906,13 @@ rpl_load_gtid_slave_state(THD *thd)
> rpl_global_gtid_slave_state->loaded= true;
> mysql_mutex_unlock(&rpl_global_gtid_slave_state->LOCK_slave_state);
>
> + /* Clear out no longer needed elements now. */
> + old_gtids_list=
> + rpl_global_gtid_slave_state->gtid_grab_pending_delete_list();
> + rpl_global_gtid_slave_state->gtid_delete_pending(thd, &old_gtids_list);
> + if (old_gtids_list)
> + rpl_global_gtid_slave_state->put_back_list(old_gtids_list);
> +
> end:
> if (array_inited)
> delete_dynamic(&array);
> @@ -2086,7 +2094,6 @@ rpl_group_info::reinit(Relay_log_info *rli)
> long_find_row_note_printed= false;
> did_mark_start_commit= false;
> gtid_ev_flags2= 0;
> - pending_gtid_delete_list= NULL;
> last_master_timestamp = 0;
> gtid_ignore_duplicate_state= GTID_DUPLICATE_NULL;
> speculation= SPECULATE_NO;
> @@ -2217,12 +2224,6 @@ void rpl_group_info::cleanup_context(THD *thd, bool error)
> erroneously update the GTID position.
> */
> gtid_pending= false;
> -
> - /*
> - Rollback will have undone any deletions of old rows we might have made
> - in mysql.gtid_slave_pos. Put those rows back on the list to be deleted.
> - */
> - pending_gtid_deletes_put_back();
> }
> m_table_map.clear_tables();
> slave_close_thread_tables(thd);
> @@ -2448,78 +2449,6 @@ rpl_group_info::unmark_start_commit()
> }
>
>
> -/*
> - When record_gtid() has deleted any old rows from the table
> - mysql.gtid_slave_pos as part of a replicated transaction, save the list of
> - rows deleted here.
> -
> - If later the transaction fails (eg. optimistic parallel replication), the
> - deletes will be undone when the transaction is rolled back. Then we can
> - put back the list of rows into the rpl_global_gtid_slave_state, so that
> - we can re-do the deletes and avoid accumulating old rows in the table.
> -*/
> -void
> -rpl_group_info::pending_gtid_deletes_save(uint32 domain_id,
> - rpl_slave_state::list_element *list)
> -{
> - /*
> - We should never get to a state where we try to save a new pending list of
> - gtid deletes while we still have an old one. But make sure we handle it
> - anyway just in case, so we avoid leaving stray entries in the
> - mysql.gtid_slave_pos table.
> - */
> - DBUG_ASSERT(!pending_gtid_delete_list);
> - if (unlikely(pending_gtid_delete_list))
> - pending_gtid_deletes_put_back();
> -
> - pending_gtid_delete_list= list;
> - pending_gtid_delete_list_domain= domain_id;
> -}
> -
> -
> -/*
> - Take the list recorded by pending_gtid_deletes_save() and put it back into
> - rpl_global_gtid_slave_state. This is needed if deletion of the rows was
> - rolled back due to transaction failure.
> -*/
> -void
> -rpl_group_info::pending_gtid_deletes_put_back()
> -{
> - if (pending_gtid_delete_list)
> - {
> - rpl_global_gtid_slave_state->put_back_list(pending_gtid_delete_list_domain,
> - pending_gtid_delete_list);
> - pending_gtid_delete_list= NULL;
> - }
> -}
> -
> -
> -/*
> - Free the list recorded by pending_gtid_deletes_save(). Done when the deletes
> - in the list have been permanently committed.
> -*/
> -void
> -rpl_group_info::pending_gtid_deletes_clear()
> -{
> - pending_gtid_deletes_free(pending_gtid_delete_list);
> - pending_gtid_delete_list= NULL;
> -}
> -
> -
> -void
> -rpl_group_info::pending_gtid_deletes_free(rpl_slave_state::list_element *list)
> -{
> - rpl_slave_state::list_element *next;
> -
> - while (list)
> - {
> - next= list->next;
> - my_free(list);
> - list= next;
> - }
> -}
> -
> -
> rpl_sql_thread_info::rpl_sql_thread_info(Rpl_filter *filter)
> : rpl_filter(filter)
> {
> diff --git a/sql/rpl_rli.h b/sql/rpl_rli.h
> index d9f0e0e5d3b..b8b153c34be 100644
> --- a/sql/rpl_rli.h
> +++ b/sql/rpl_rli.h
> @@ -757,11 +757,6 @@ struct rpl_group_info
> /* Needs room for "Gtid D-S-N\x00". */
> char gtid_info_buf[5+10+1+10+1+20+1];
>
> - /* List of not yet committed deletions in mysql.gtid_slave_pos. */
> - rpl_slave_state::list_element *pending_gtid_delete_list;
> - /* Domain associated with pending_gtid_delete_list. */
> - uint32 pending_gtid_delete_list_domain;
> -
> /*
> The timestamp, from the master, of the commit event.
> Used to do delayed update of rli->last_master_timestamp, for getting
> @@ -903,12 +898,6 @@ struct rpl_group_info
> char *gtid_info();
> void unmark_start_commit();
>
> - static void pending_gtid_deletes_free(rpl_slave_state::list_element *list);
> - void pending_gtid_deletes_save(uint32 domain_id,
> - rpl_slave_state::list_element *list);
> - void pending_gtid_deletes_put_back();
> - void pending_gtid_deletes_clear();
> -
> longlong get_row_stmt_start_timestamp()
> {
> return row_stmt_start_timestamp;
> diff --git a/sql/slave.cc b/sql/slave.cc
> index bb1300d36e6..f8499513dd6 100644
> --- a/sql/slave.cc
> +++ b/sql/slave.cc
> @@ -465,6 +465,8 @@ static struct slave_background_gtid_pos_create_t {
> void *hton;
> } *slave_background_gtid_pos_create_list;
>
> +static volatile bool slave_background_gtid_pending_delete_flag;
> +
>
> pthread_handler_t
> handle_slave_background(void *arg __attribute__((unused)))
> @@ -499,6 +501,7 @@ handle_slave_background(void *arg __attribute__((unused)))
> {
> slave_background_kill_t *kill_list;
> slave_background_gtid_pos_create_t *create_list;
> + bool pending_deletes;
>
> thd->ENTER_COND(&COND_slave_background, &LOCK_slave_background,
> &stage_slave_background_wait_request,
> @@ -508,13 +511,15 @@ handle_slave_background(void *arg __attribute__((unused)))
> stop= abort_loop || thd->killed || slave_background_thread_stop;
> kill_list= slave_background_kill_list;
> create_list= slave_background_gtid_pos_create_list;
> - if (stop || kill_list || create_list)
> + pending_deletes= slave_background_gtid_pending_delete_flag;
> + if (stop || kill_list || create_list || pending_deletes)
> break;
> mysql_cond_wait(&COND_slave_background, &LOCK_slave_background);
> }
>
> slave_background_kill_list= NULL;
> slave_background_gtid_pos_create_list= NULL;
> + slave_background_gtid_pending_delete_flag= false;
> thd->EXIT_COND(&old_stage);
>
> while (kill_list)
> @@ -541,6 +546,17 @@ handle_slave_background(void *arg __attribute__((unused)))
> create_list= next;
> }
>
> + if (pending_deletes)
> + {
> + rpl_slave_state::list_element *list;
> +
> + slave_background_gtid_pending_delete_flag= false;
> + list= rpl_global_gtid_slave_state->gtid_grab_pending_delete_list();
> + rpl_global_gtid_slave_state->gtid_delete_pending(thd, &list);
> + if (list)
> + rpl_global_gtid_slave_state->put_back_list(list);
> + }
> +
> mysql_mutex_lock(&LOCK_slave_background);
> } while (!stop);
>
> @@ -615,6 +631,23 @@ slave_background_gtid_pos_create_request(
>
>
> /*
> + Request the slave background thread to delete no longer used rows from the
> + mysql.gtid_slave_pos* tables.
> +
> + This is called from time-critical rpl_slave_state::update(), so we avoid
> + taking any locks here. This means we may race with the background thread
> + and occasionally lose a signal. This is not a problem; any pending rows to
> + be deleted will just be deleted a bit later as part of the next batch.
> +*/
> +void
> +slave_background_gtid_pending_delete_request(void)
> +{
> + slave_background_gtid_pending_delete_flag= true;
> + mysql_cond_signal(&COND_slave_background);
> +}
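The lossy-flag handshake this comment describes can be modelled in a few lines. This is a sketch under the stated assumptions, not the server code: the hot path only stores a flag (no mutex), and the background thread atomically takes the flag before grabbing the whole pending list, so a lost wakeup merely defers the rows to the next batch.

```cpp
#include <atomic>
#include <cassert>

static std::atomic<bool> pending_delete_flag{false};

void request_pending_delete()           // hot path, e.g. rpl_slave_state::update()
{
  pending_delete_flag.store(true, std::memory_order_relaxed);
  // the real code follows this with mysql_cond_signal(&COND_slave_background)
}

bool background_take_request()          // background thread, before batch work
{
  // exchange() both reads and clears the flag, so a request raced in after
  // the read cannot be silently dropped: it stays set for the next check.
  return pending_delete_flag.exchange(false, std::memory_order_relaxed);
}
```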
> +
> +
> +/*
> Start the slave background thread.
>
> This thread is currently used for two purposes:
> diff --git a/sql/slave.h b/sql/slave.h
> index 649d55b45b9..12d569b0333 100644
> --- a/sql/slave.h
> +++ b/sql/slave.h
> @@ -276,6 +276,7 @@ bool net_request_file(NET* net, const char* fname);
> void slave_background_kill_request(THD *to_kill);
> void slave_background_gtid_pos_create_request
> (rpl_slave_state::gtid_pos_table *table_entry);
> +void slave_background_gtid_pending_delete_request(void);
>
> extern bool volatile abort_loop;
> extern Master_info *active_mi; /* active_mi for multi-master */
> diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
> index 6d4c135683a..9348f4e5c98 100644
> --- a/sql/sys_vars.cc
> +++ b/sql/sys_vars.cc
> @@ -1942,6 +1942,19 @@ Sys_var_last_gtid::session_value_ptr(THD *thd, const LEX_CSTRING *base)
> }
>
>
> +static Sys_var_uint Sys_gtid_cleanup_batch_size(
> + "gtid_cleanup_batch_size",
> + "Normally does not need tuning. How many old rows must accumulate in "
> + "the mysql.gtid_slave_pos table before a background job will be run to "
> + "delete them. Can be increased to reduce number of commits if "
> + "using many different engines with --gtid_pos_auto_engines, or to "
> + "reduce CPU overhead if using a huge number of different "
> + "gtid_domain_ids. Can be decreased to reduce number of old rows in the "
> + "table.",
> + GLOBAL_VAR(opt_gtid_cleanup_batch_size), CMD_LINE(REQUIRED_ARG),
> + VALID_RANGE(0,2147483647), DEFAULT(64), BLOCK_SIZE(1));
> +
> +
> static bool
> check_slave_parallel_threads(sys_var *self, THD *thd, set_var *var)
> {
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/r/mdev12179.result b/storage/rocksdb/mysql-test/rocksdb_rpl/r/mdev12179.result
> index 9c20fea97ae..a1e501f78f4 100644
> --- a/storage/rocksdb/mysql-test/rocksdb_rpl/r/mdev12179.result
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/r/mdev12179.result
> @@ -2,6 +2,7 @@ include/master-slave.inc
> [connection master]
> connection server_2;
> include/stop_slave.inc
> +SET GLOBAL gtid_cleanup_batch_size = 999999999;
> CHANGE MASTER TO master_use_gtid=slave_pos;
> SET sql_log_bin=0;
> CREATE TABLE mysql.gtid_slave_pos_innodb LIKE mysql.gtid_slave_pos;
> @@ -41,6 +42,8 @@ a
> 1
> SELECT * FROM mysql.gtid_slave_pos ORDER BY sub_id;
> domain_id sub_id server_id seq_no
> +0 1 1 1
> +0 2 1 2
> 0 3 1 3
> 0 4 1 4
> SELECT * FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> @@ -121,6 +124,21 @@ Transactions_multi_engine 6
> DELETE FROM t1 WHERE a >= 100;
> DELETE FROM t2 WHERE a >= 100;
> DELETE FROM t3 WHERE a >= 100;
> +connection server_1;
> +include/save_master_gtid.inc
> +connection server_2;
> +include/sync_with_master_gtid.inc
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos;
> +COUNT(*)>=10
> +1
> +SELECT COUNT(*)>=10 FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> +UNION ALL SELECT * FROM mysql.gtid_slave_pos_innodb_redundant) inner_select;
> +COUNT(*)>=10
> +1
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos_rocksdb;
> +COUNT(*)>=10
> +1
> +SET GLOBAL gtid_cleanup_batch_size = 3;
> connection server_2;
> include/stop_slave.inc
> SET sql_log_bin=0;
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/mdev12179.test b/storage/rocksdb/mysql-test/rocksdb_rpl/t/mdev12179.test
> index e0d16e7f242..631d9ca533f 100644
> --- a/storage/rocksdb/mysql-test/rocksdb_rpl/t/mdev12179.test
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/mdev12179.test
> @@ -4,6 +4,12 @@
>
> --connection server_2
> --source include/stop_slave.inc
> +
> +# Set GTID cleanup limit high enough that cleanup will not run and we
> +# can rely on consistent table output in .result.
> +--let $old_gtid_cleanup_batch_size=`SELECT @@GLOBAL.gtid_cleanup_batch_size`
> +SET GLOBAL gtid_cleanup_batch_size = 999999999;
> +
> CHANGE MASTER TO master_use_gtid=slave_pos;
> SET sql_log_bin=0;
> CREATE TABLE mysql.gtid_slave_pos_innodb LIKE mysql.gtid_slave_pos;
> @@ -89,6 +95,82 @@ DELETE FROM t2 WHERE a >= 100;
> DELETE FROM t3 WHERE a >= 100;
>
>
> +# Create a bunch more GTIDs in mysql.gtid_slave_pos* tables to test with.
> +--connection server_1
> +--disable_query_log
> +let $i=10;
> +while ($i) {
> + eval INSERT INTO t1 VALUES (300+$i);
> + eval INSERT INTO t2 VALUES (300+$i);
> + eval INSERT INTO t3 VALUES (300+$i);
> + dec $i;
> +}
> +--enable_query_log
> +--source include/save_master_gtid.inc
> +
> +--connection server_2
> +--source include/sync_with_master_gtid.inc
> +
> +# Check that we have many rows in mysql.gtid_slave_pos now (since
> +# @@gtid_cleanup_batch_size was set to a huge value). No need to check
> +# for an exact number, since that will require changing .result if
> +# anything changes prior to this point, and we just need to know that
> +# we still have some data in the tables to make the following
> +# test effective.
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos;
> +SELECT COUNT(*)>=10 FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> + UNION ALL SELECT * FROM mysql.gtid_slave_pos_innodb_redundant) inner_select;
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos_rocksdb;
> +
> +# Check that old GTID rows will be deleted when batch delete size is
> +# set reasonably. Old row deletion is not 100% deterministic (by design), so
> +# we must wait for it to occur, but it should occur eventually.
> +SET GLOBAL gtid_cleanup_batch_size = 3;
> +let $i=40;
> +--disable_query_log
> +--let $keep_include_silent=1
> +while ($i) {
> + let N=`SELECT 1+($i MOD 3)`;
> + --connection server_1
> + eval UPDATE t$N SET a=a+1 WHERE a=(SELECT MAX(a) FROM t$N);
> + --source include/save_master_gtid.inc
> + --connection server_2
> + --source include/sync_with_master_gtid.inc
> + let $j=50;
> + while ($j) {
> + let $is_done=`SELECT SUM(a)=1 FROM (
> + SELECT COUNT(*) AS a FROM mysql.gtid_slave_pos
> + UNION ALL
> + SELECT COUNT(*) AS a FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> + UNION ALL SELECT * FROM mysql.gtid_slave_pos_innodb_redundant) inner_select
> + UNION ALL
> + SELECT COUNT(*) AS a FROM mysql.gtid_slave_pos_rocksdb) outer_select`;
> + if ($is_done) {
> + let $j=0;
> + }
> + if (!$is_done) {
> + real_sleep 0.1;
> + dec $j;
> + }
> + }
> + dec $i;
> + if ($is_done) {
> + let $i=0;
> + }
> +}
> +--enable_query_log
> +--let $keep_include_silent=0
> +if (!$is_done) {
> + --echo Timed out waiting for mysql.gtid_slave_pos* tables to be cleaned up
> +}
> +
> +--disable_query_log
> +DELETE FROM t1 WHERE a >= 100;
> +DELETE FROM t2 WHERE a >= 100;
> +DELETE FROM t3 WHERE a >= 100;
> +--enable_query_log
> +
> +
> # Test status variables Rpl_transactions_multi_engine and Transactions_gtid_foreign_engine.
> # Have mysql.gtid_slave_pos* for myisam and innodb but not rocksdb.
> --connection server_2
> @@ -223,6 +305,9 @@ SHOW STATUS LIKE "%transactions%engine";
> SET sql_log_bin=0;
> DROP TABLE mysql.gtid_slave_pos_innodb;
> SET sql_log_bin=1;
> +--disable_query_log
> +eval SET GLOBAL gtid_cleanup_batch_size = $old_gtid_cleanup_batch_size;
> +--enable_query_log
>
> --connection server_1
> DROP TABLE t1;
> diff --git a/storage/tokudb/mysql-test/tokudb_rpl/r/mdev12179.result b/storage/tokudb/mysql-test/tokudb_rpl/r/mdev12179.result
> index d4532eec4e2..d79e7e59aa4 100644
> --- a/storage/tokudb/mysql-test/tokudb_rpl/r/mdev12179.result
> +++ b/storage/tokudb/mysql-test/tokudb_rpl/r/mdev12179.result
> @@ -2,6 +2,7 @@ include/master-slave.inc
> [connection master]
> connection server_2;
> include/stop_slave.inc
> +SET GLOBAL gtid_cleanup_batch_size = 999999999;
> CHANGE MASTER TO master_use_gtid=slave_pos;
> SET sql_log_bin=0;
> CREATE TABLE mysql.gtid_slave_pos_innodb LIKE mysql.gtid_slave_pos;
> @@ -41,6 +42,8 @@ a
> 1
> SELECT * FROM mysql.gtid_slave_pos ORDER BY sub_id;
> domain_id sub_id server_id seq_no
> +0 1 1 1
> +0 2 1 2
> 0 3 1 3
> 0 4 1 4
> SELECT * FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> @@ -121,6 +124,21 @@ Transactions_multi_engine 6
> DELETE FROM t1 WHERE a >= 100;
> DELETE FROM t2 WHERE a >= 100;
> DELETE FROM t3 WHERE a >= 100;
> +connection server_1;
> +include/save_master_gtid.inc
> +connection server_2;
> +include/sync_with_master_gtid.inc
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos;
> +COUNT(*)>=10
> +1
> +SELECT COUNT(*)>=10 FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> +UNION ALL SELECT * FROM mysql.gtid_slave_pos_innodb_redundant) inner_select;
> +COUNT(*)>=10
> +1
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos_tokudb;
> +COUNT(*)>=10
> +1
> +SET GLOBAL gtid_cleanup_batch_size = 3;
> connection server_2;
> include/stop_slave.inc
> SET sql_log_bin=0;
> diff --git a/storage/tokudb/mysql-test/tokudb_rpl/t/mdev12179.test b/storage/tokudb/mysql-test/tokudb_rpl/t/mdev12179.test
> index ceb119cd0dc..1d19a25889e 100644
> --- a/storage/tokudb/mysql-test/tokudb_rpl/t/mdev12179.test
> +++ b/storage/tokudb/mysql-test/tokudb_rpl/t/mdev12179.test
> @@ -4,6 +4,12 @@
>
> --connection server_2
> --source include/stop_slave.inc
> +
> +# Set GTID cleanup limit high enough that cleanup will not run and we
> +# can rely on consistent table output in .result.
> +--let $old_gtid_cleanup_batch_size=`SELECT @@GLOBAL.gtid_cleanup_batch_size`
> +SET GLOBAL gtid_cleanup_batch_size = 999999999;
> +
> CHANGE MASTER TO master_use_gtid=slave_pos;
> SET sql_log_bin=0;
> CREATE TABLE mysql.gtid_slave_pos_innodb LIKE mysql.gtid_slave_pos;
> @@ -89,6 +95,82 @@ DELETE FROM t2 WHERE a >= 100;
> DELETE FROM t3 WHERE a >= 100;
>
>
> +# Create a bunch more GTIDs in mysql.gtid_slave_pos* tables to test with.
> +--connection server_1
> +--disable_query_log
> +let $i=10;
> +while ($i) {
> + eval INSERT INTO t1 VALUES (300+$i);
> + eval INSERT INTO t2 VALUES (300+$i);
> + eval INSERT INTO t3 VALUES (300+$i);
> + dec $i;
> +}
> +--enable_query_log
> +--source include/save_master_gtid.inc
> +
> +--connection server_2
> +--source include/sync_with_master_gtid.inc
> +
> +# Check that we have many rows in mysql.gtid_slave_pos now (since
> +# @@gtid_cleanup_batch_size was set to a huge value). No need to check
> +# for an exact number, since that will require changing .result if
> +# anything changes prior to this point, and we just need to know that
> +# we still have some data in the tables to make the following
> +# test effective.
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos;
> +SELECT COUNT(*)>=10 FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> + UNION ALL SELECT * FROM mysql.gtid_slave_pos_innodb_redundant) inner_select;
> +SELECT COUNT(*)>=10 FROM mysql.gtid_slave_pos_tokudb;
> +
> +# Check that old GTID rows will be deleted when batch delete size is
> +# set reasonably. Old row deletion is not 100% deterministic (by design), so
> +# we must wait for it to occur, but it should occur eventually.
> +SET GLOBAL gtid_cleanup_batch_size = 3;
> +let $i=40;
> +--disable_query_log
> +--let $keep_include_silent=1
> +while ($i) {
> + let N=`SELECT 1+($i MOD 3)`;
> + --connection server_1
> + eval UPDATE t$N SET a=a+1 WHERE a=(SELECT MAX(a) FROM t$N);
> + --source include/save_master_gtid.inc
> + --connection server_2
> + --source include/sync_with_master_gtid.inc
> + let $j=50;
> + while ($j) {
> + let $is_done=`SELECT SUM(a)=1 FROM (
> + SELECT COUNT(*) AS a FROM mysql.gtid_slave_pos
> + UNION ALL
> + SELECT COUNT(*) AS a FROM ( SELECT * FROM mysql.gtid_slave_pos_innodb
> + UNION ALL SELECT * FROM mysql.gtid_slave_pos_innodb_redundant) inner_select
> + UNION ALL
> + SELECT COUNT(*) AS a FROM mysql.gtid_slave_pos_tokudb) outer_select`;
> + if ($is_done) {
> + let $j=0;
> + }
> + if (!$is_done) {
> + real_sleep 0.1;
> + dec $j;
> + }
> + }
> + dec $i;
> + if ($is_done) {
> + let $i=0;
> + }
> +}
> +--enable_query_log
> +--let $keep_include_silent=0
> +if (!$is_done) {
> + --echo Timed out waiting for mysql.gtid_slave_pos* tables to be cleaned up
> +}
> +
> +--disable_query_log
> +DELETE FROM t1 WHERE a >= 100;
> +DELETE FROM t2 WHERE a >= 100;
> +DELETE FROM t3 WHERE a >= 100;
> +--enable_query_log
> +
> +
> # Test status variables Rpl_transactions_multi_engine and Transactions_gtid_foreign_engine.
> # Have mysql.gtid_slave_pos* for myisam and innodb but not tokudb.
> --connection server_2
> @@ -223,6 +305,9 @@ SHOW STATUS LIKE "%transactions%engine";
> SET sql_log_bin=0;
> DROP TABLE mysql.gtid_slave_pos_innodb;
> SET sql_log_bin=1;
> +--disable_query_log
> +eval SET GLOBAL gtid_cleanup_batch_size = $old_gtid_cleanup_batch_size;
> +--enable_query_log
>
> --connection server_1
> DROP TABLE t1;
Re: [Maria-developers] [Commits] 1ed2d8b98ad: MDEV-17032: Estimates are higher for partitions of a table with @@use_stat_tables= PREFERABLY
by Sergey Petrunia 25 Nov '18
Hi Varun,
On Fri, Nov 16, 2018 at 07:41:07PM +0530, Varun wrote:
> revision-id: 1ed2d8b98ade099fe23b7d5c00d23364388e15aa (mariadb-10.0.36-80-g1ed2d8b98ad)
> parent(s): a84d87fde8c0bc325c8e00f06ea02bcd84a75d55
> author: Varun Gupta
> committer: Varun Gupta
> timestamp: 2018-11-16 19:40:47 +0530
> message:
>
> MDEV-17032: Estimates are higher for partitions of a table with @@use_stat_tables= PREFERABLY
>
> The problem here is EITS statistics does not calculate statistics for the partitions of the table.
> So a temporary solution would be to not read EITS statistics for partitioned tables.
>
> Also disabling reading of EITS for columns that participate in the partition list of a table.
>
> ---
> mysql-test/r/partition.result | 88 +++++++++++++++++++++++++++++++++++++++++++
> mysql-test/t/partition.test | 58 ++++++++++++++++++++++++++++
> sql/opt_range.cc | 12 +++++-
> sql/partition_info.cc | 26 +++++++++++++
> sql/partition_info.h | 1 +
> sql/sql_statistics.cc | 16 ++++++++
> 6 files changed, 199 insertions(+), 2 deletions(-)
>
> diff --git a/mysql-test/r/partition.result b/mysql-test/r/partition.result
> index c6669176b3d..aedf9f89f0e 100644
> --- a/mysql-test/r/partition.result
> +++ b/mysql-test/r/partition.result
> @@ -2645,3 +2645,91 @@ Warnings:
> Note 1517 Duplicate partition name p2
> DEALLOCATE PREPARE stmt;
> DROP TABLE t1;
> +#
> +# MDEV-17032: Estimates are higher for partitions of a table with @@use_stat_tables= PREFERABLY
> +#
> +create table t0(a int);
> +insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
> +create table t1(a int);
> +insert into t1 select A.a + B.a* 10 + C.a * 100 from t0 A, t0 B, t0 C;
> +create table t2 (
> +part_key int,
> +a int,
> +b int
> +) partition by list(part_key) (
> +partition p0 values in (0),
> +partition p1 values in (1),
> +partition p2 values in (2),
> +partition p3 values in (3),
> +partition p4 values in (4)
> +);
> +insert into t2
> +select mod(a,5), a/100, a from t1;
> +set @save_use_stat_tables= @@use_stat_tables;
> +set @save_optimizer_use_condition_selectivity=@@optimizer_use_condition_selectivity;
> +#
> +# Tests using stats provided by the storage engine
> +#
> +explain extended select * from t2 where part_key=1;
> +id select_type table type possible_keys key key_len ref rows filtered Extra
> +1 SIMPLE t2 ALL NULL NULL NULL NULL 200 100.00 Using where
> +Warnings:
> +Note 1003 select `test`.`t2`.`part_key` AS `part_key`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`part_key` = 1)
> +explain partitions select * from t2 where part_key=1;
> +id select_type table partitions type possible_keys key key_len ref rows Extra
> +1 SIMPLE t2 p1 ALL NULL NULL NULL NULL 200 Using where
> +explain extended select * from t2 where part_key in (1,2);
> +id select_type table type possible_keys key key_len ref rows filtered Extra
> +1 SIMPLE t2 ALL NULL NULL NULL NULL 400 100.00 Using where
> +Warnings:
> +Note 1003 select `test`.`t2`.`part_key` AS `part_key`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`part_key` in (1,2))
> +explain partitions select * from t2 where part_key=1;
> +id select_type table partitions type possible_keys key key_len ref rows Extra
> +1 SIMPLE t2 p1 ALL NULL NULL NULL NULL 200 Using where
> +explain extended select * from t2 where part_key in (1,2);
> +id select_type table type possible_keys key key_len ref rows filtered Extra
> +1 SIMPLE t2 ALL NULL NULL NULL NULL 400 100.00 Using where
> +Warnings:
> +Note 1003 select `test`.`t2`.`part_key` AS `part_key`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`part_key` in (1,2))
> +explain partitions select * from t2 where part_key in (1,2);
> +id select_type table partitions type possible_keys key key_len ref rows Extra
> +1 SIMPLE t2 p1,p2 ALL NULL NULL NULL NULL 400 Using where
> +set @save_histogram_size=@@histogram_size;
> +set @@histogram_size=100;
> +set @@use_stat_tables= PREFERABLY;
> +set @@optimizer_use_condition_selectivity=4;
> +analyze table t2;
> +Table Op Msg_type Msg_text
> +test.t2 analyze status Engine-independent statistics collected
> +test.t2 analyze status OK
> +#
> +# Tests using EITS
> +#
> +explain extended select * from t2 where part_key=1;
> +id select_type table type possible_keys key key_len ref rows filtered Extra
> +1 SIMPLE t2 ALL NULL NULL NULL NULL 200 100.00 Using where
> +Warnings:
> +Note 1003 select `test`.`t2`.`part_key` AS `part_key`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`part_key` = 1)
> +explain partitions select * from t2 where part_key=1;
> +id select_type table partitions type possible_keys key key_len ref rows Extra
> +1 SIMPLE t2 p1 ALL NULL NULL NULL NULL 200 Using where
> +explain extended select * from t2 where part_key in (1,2);
> +id select_type table type possible_keys key key_len ref rows filtered Extra
> +1 SIMPLE t2 ALL NULL NULL NULL NULL 400 100.00 Using where
> +Warnings:
> +Note 1003 select `test`.`t2`.`part_key` AS `part_key`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`part_key` in (1,2))
> +explain partitions select * from t2 where part_key=1;
> +id select_type table partitions type possible_keys key key_len ref rows Extra
> +1 SIMPLE t2 p1 ALL NULL NULL NULL NULL 200 Using where
> +explain extended select * from t2 where part_key in (1,2);
> +id select_type table type possible_keys key key_len ref rows filtered Extra
> +1 SIMPLE t2 ALL NULL NULL NULL NULL 400 100.00 Using where
> +Warnings:
> +Note 1003 select `test`.`t2`.`part_key` AS `part_key`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`part_key` in (1,2))
> +explain partitions select * from t2 where part_key in (1,2);
> +id select_type table partitions type possible_keys key key_len ref rows Extra
> +1 SIMPLE t2 p1,p2 ALL NULL NULL NULL NULL 400 Using where
> +set @@use_stat_tables= @save_use_stat_tables;
> +set @@optimizer_use_condition_selectivity= @save_optimizer_use_condition_selectivity;
> +set @@histogram_size= @save_histogram_size;
> +drop table t0,t1,t2;
> diff --git a/mysql-test/t/partition.test b/mysql-test/t/partition.test
> index 1c8cd0375d6..00c6f1ce77c 100644
> --- a/mysql-test/t/partition.test
> +++ b/mysql-test/t/partition.test
> @@ -2897,3 +2897,61 @@ EXECUTE stmt;
> DEALLOCATE PREPARE stmt;
> DROP TABLE t1;
>
> +--echo #
> +--echo # MDEV-17032: Estimates are higher for partitions of a table with @@use_stat_tables= PREFERABLY
> +--echo #
> +
> +create table t0(a int);
> +insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
> +
> +create table t1(a int);
> +insert into t1 select A.a + B.a* 10 + C.a * 100 from t0 A, t0 B, t0 C;
> +
> +
> +create table t2 (
> + part_key int,
> + a int,
> + b int
> +) partition by list(part_key) (
> + partition p0 values in (0),
> + partition p1 values in (1),
> + partition p2 values in (2),
> + partition p3 values in (3),
> + partition p4 values in (4)
> +);
> +insert into t2
> +select mod(a,5), a/100, a from t1;
> +
> +set @save_use_stat_tables= @@use_stat_tables;
> +set @save_optimizer_use_condition_selectivity=@@optimizer_use_condition_selectivity;
> +--echo #
> +--echo # Tests using stats provided by the storage engine
> +--echo #
> +explain extended select * from t2 where part_key=1;
> +explain partitions select * from t2 where part_key=1;
> +explain extended select * from t2 where part_key in (1,2);
> +explain partitions select * from t2 where part_key=1;
^^ Isn't this query the same as the one two lines above?
> +explain extended select * from t2 where part_key in (1,2);
> +explain partitions select * from t2 where part_key in (1,2);
> +
> +set @save_histogram_size=@@histogram_size;
> +set @@histogram_size=100;
> +set @@use_stat_tables= PREFERABLY;
> +set @@optimizer_use_condition_selectivity=4;
> +analyze table t2;
> +--echo #
> +--echo # Tests using EITS
> +--echo #
> +# filtered should be 100
Please change to use '--echo #' so that the above is visible in the .result
file as well.
> +explain extended select * from t2 where part_key=1;
> +explain partitions select * from t2 where part_key=1;
> +# filtered should be 100
> +explain extended select * from t2 where part_key in (1,2);
> +explain partitions select * from t2 where part_key=1;
> +# filtered should be 100
> +explain extended select * from t2 where part_key in (1,2);
> +explain partitions select * from t2 where part_key in (1,2);
Please add a test that shows that condition selectivity is taken into account
for non-partitioned columns.
> +set @@use_stat_tables= @save_use_stat_tables;
> +set @@optimizer_use_condition_selectivity= @save_optimizer_use_condition_selectivity;
> +set @@histogram_size= @save_histogram_size;
> +drop table t0,t1,t2;
> diff --git a/sql/opt_range.cc b/sql/opt_range.cc
> index 3bcaa72e32f..59aa2f3f280 100644
> --- a/sql/opt_range.cc
> +++ b/sql/opt_range.cc
> @@ -3322,6 +3322,10 @@ bool create_key_parts_for_pseudo_indexes(RANGE_OPT_PARAM *param,
> {
> Field **field_ptr;
> TABLE *table= param->table;
> + partition_info *part_info= NULL;
> + #ifdef WITH_PARTITION_STORAGE_ENGINE
> + part_info= table->part_info;
> + #endif
> uint parts= 0;
>
> for (field_ptr= table->field; *field_ptr; field_ptr++)
> @@ -3329,7 +3333,9 @@ bool create_key_parts_for_pseudo_indexes(RANGE_OPT_PARAM *param,
> Column_statistics* col_stats= (*field_ptr)->read_stats;
> if (bitmap_is_set(used_fields, (*field_ptr)->field_index)
> && col_stats && !col_stats->no_stat_values_provided()
> - && !((*field_ptr)->type() == MYSQL_TYPE_GEOMETRY))
> + && !((*field_ptr)->type() == MYSQL_TYPE_GEOMETRY)
> + && (!part_info ||
> + part_info->field_in_partition_expr(*field_ptr)))
> parts++;
> }
>
> @@ -3350,7 +3356,9 @@ bool create_key_parts_for_pseudo_indexes(RANGE_OPT_PARAM *param,
> if (bitmap_is_set(used_fields, (*field_ptr)->field_index))
> {
> Field *field= *field_ptr;
> - if (field->type() == MYSQL_TYPE_GEOMETRY)
> + if (field->type() == MYSQL_TYPE_GEOMETRY
> + || (!part_info ||
> + part_info->field_in_partition_expr(field)))
> continue;
This condition is wrong.
(field->type() == MYSQL_TYPE_GEOMETRY
|| (!part_info ||
part_info->field_in_partition_expr(field)))
is the same as
(field->type() == MYSQL_TYPE_GEOMETRY ||
!part_info || ...
which means if there is no partitioning, we won't count the selectivity.
Indeed, if I apply this patch, all EITS tests start to fail, selectivity is no
longer counted:
--- /home/psergey/dev-git/10.1/mysql-test/r/selectivity_innodb.result 2018-11-16 14:06:27.231202907 +0300
+++ /home/psergey/dev-git/10.1/mysql-test/r/selectivity_innodb.reject 2018-11-25 01:03:14.661647893 +0300
@@ -28,13 +28,13 @@
explain extended
select * from t1 where a is null;
id select_type table type possible_keys key key_len ref rows filtered Extra
-1 SIMPLE t1 ALL NULL NULL NULL NULL 10 40.00 Using where
+1 SIMPLE t1 ALL NULL NULL NULL NULL 10 100.00 Using where
Warnings:
Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` where isnull(`test`.`t1`.`a`)
explain extended
select * from t1 where a is not null;
id select_type table type possible_keys key key_len ref rows filtered Extra
-1 SIMPLE t1 ALL NULL NULL NULL NULL 10 60.00 Using where
+1 SIMPLE t1 ALL NULL NULL NULL NULL 10 100.00 Using where
Warnings:
See also the comment below about the return value of field_in_partition_expr.
>
> uint16 store_length;
> diff --git a/sql/partition_info.cc b/sql/partition_info.cc
> index 52bda560c1c..0111fc1451d 100644
> --- a/sql/partition_info.cc
> +++ b/sql/partition_info.cc
> @@ -3164,6 +3164,32 @@ void partition_info::print_debug(const char *str, uint *value)
> DBUG_PRINT("info", ("parser: %s", str));
> DBUG_VOID_RETURN;
> }
> +
> +/*
> +
> + Disabling reading EITS statistics for columns involved in the
> + partition list of a table.
> + We assume the selecticivity for such columns would be handled
> + during partition pruning.
> +
> +*/
> +
This comment here doesn't make any sense anymore. Please move it to
create_key_parts_for_pseudo_indexes.
Also, the name and return value of the function is confusing now. If
one calls the function for the partitioning column:
field_in_partition_expr(partition_column) == false.
Please change the return value to be true in this case, and false otherwise.
> +bool partition_info::field_in_partition_expr(Field *field) const
> +{
> + uint i;
> + for (i= 0; i < num_part_fields; i++)
> + {
> + if (field->eq(part_field_array[i]))
> + return FALSE;
> + }
> + for (i= 0; i < num_subpart_fields; i++)
> + {
> + if (field->eq(subpart_field_array[i]))
> + return FALSE;
> + }
> + return TRUE;
> +}
> +
> #else /* WITH_PARTITION_STORAGE_ENGINE */
> /*
> For builds without partitioning we need to define these functions
> diff --git a/sql/partition_info.h b/sql/partition_info.h
> index f250c5496bf..10b8954ace7 100644
> --- a/sql/partition_info.h
> +++ b/sql/partition_info.h
> @@ -384,6 +384,7 @@ class partition_info : public Sql_alloc
> bool is_full_part_expr_in_fields(List<Item> &fields);
> public:
> bool has_unique_name(partition_element *element);
> + bool field_in_partition_expr(Field *field) const;
> };
>
> uint32 get_next_partition_id_range(struct st_partition_iter* part_iter);
> diff --git a/sql/sql_statistics.cc b/sql/sql_statistics.cc
> index cb75a5c2176..02dd4970c99 100644
> --- a/sql/sql_statistics.cc
> +++ b/sql/sql_statistics.cc
> @@ -3589,6 +3589,22 @@ void set_statistics_for_table(THD *thd, TABLE *table)
> (use_stat_table_mode <= COMPLEMENTARY ||
> !table->stats_is_read || read_stats->cardinality_is_null) ?
> table->file->stats.records : read_stats->cardinality;
> +
> + /*
> + For partitioned table, EITS statistics is based on data from all partitions.
> +
> + On the other hand, Partition Pruning figures which partitions will be
> + accessed and then computes the estimate of rows in used_partitions.
> +
> + Use the estimate from Partition Pruning as it is typically more precise.
> + Ideally, EITS should provide per-partition statistics but this is not
> + implemented currently.
> + */
Good.
> + #ifdef WITH_PARTITION_STORAGE_ENGINE
> + if (table->part_info)
> + table->used_stat_records= table->file->stats.records;
> + #endif
> +
> KEY *key_info, *key_info_end;
> for (key_info= table->key_info, key_info_end= key_info+table->s->keys;
> key_info < key_info_end; key_info++)
--
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Hi Alexey,
On 11/25/2018 03:00 AM, Alexey Botchkov wrote:
> Hi, Alexander.
>
> First question about the sql/sql_basic_types.h.
> To me it looks overcomplicated. Do we really need classes over
> the simple enums?
> I attached my version of this file here. In my opinion it's shorter itself,
> and makes other code look nicer.
This is intentional, for strict data type control.
Simple enums have some problems. For example, you can pass
enums to a function that expects an integer data type.
I want any mistakes like this to cause a compilation error.
I would avoid using the traditional enums.
Can you please leave it as is?
I just found that C++11 introduced enum classes.
They don't convert to int implicitly, and don't
export their values to the enclosing namespace.
https://www.codeguru.com/cpp/cpp/article.php/c19083/C-2011-Stronglytyped-En…
We have enabled C++11 in 10.4
So now this could be changed to:
enum class date_conv_mode_t
{
CONV_NONE= 0U,
FUZZY_DATES= 1U,
TIME_ONLY= 4U,
INTERVAL_hhmmssff= 8U,
INTERVAL_DAY= 16U,
RANGE0_LAST= INTERVAL_DAY,
NO_ZERO_IN_DATE= (1UL << 23), // MODE_NO_ZERO_IN_DATE
NO_ZERO_DATE= (1UL << 24), // MODE_NO_ZERO_DATE
INVALID_DATES= (1UL << 25) // MODE_INVALID_DATES
};
We could try this.
>
> Other than that
>
> lines like this can be shortened
> -TIME_FUZZY_DATES (date_mode_t::value_t::FUZZY_DATES),
> +TIME_FUZZY_DATES (date_mode_t::FUZZY_DATES),
Thanks. Fixed.
>
> Three variables all defined differently. Looks funny :)
> const date_conv_mode_t
> ...
> static const date_conv_mode_t
> ...
> static time_round_mode_t
> Why is that?
Thanks for noticing this. I've changed them all to "static const".
>
>
> What do you think if we do this:
> - add THD* memver to the MYSQL_TIME_STATUS structure,
> so we can get rid of the THD * parameter to many functions?
> - send one MYSQL_TIME_STATUS* to the add_nanoseconds()
> instead of three {thd, &status->warnings, and status->nanoseconds} ?
Changing all functions that accept both &warn and nsec to
accept a pointer to MYSQL_TIME_STATUS instead, like this:
< xxxx(thd, &status->warnings, and status->nanoseconds);
> xxxx(thd, &status);
is a very good idea.
I'd avoid mixing "thd" into MYSQL_TIME_STATUS though.
It's a pure C structure.
Does it sound OK?
>
> You do like this:
> Datetime::Options opt(TIME_CONV_NONE, thd);
> Datetime dt(thd, item->arguments()[0], opt);
>
> Why not just
> Datetime dt(thd, item->arguments()[0], Datetime::Options
> opt(TIME_CONV_NONE, thd))?
> I'm not sure about this very case, but often code like yours is
> translated with an extra
> constructor call.
When the line fits the entire call, I do like this:
a.
Datetime dt(thd, item, Datetime::Options opt(mode, thd));
When the line does not fit, I tried both ways:
b.
Datetime dt(thd, item->arguments()[0],
Datetime::Options opt(TIME_CONV_NONE, thd));
and
c.
Datetime::Options opt(TIME_CONV_NONE, thd);
Datetime dt(thd, item->arguments()[0], opt);
and "c" looked easier to read for me than "b".
But this causes a constructor call indeed.
Although, in case of date_mode_t this is very cheap
(as cheap as copying of a 4 byte m_mode value), it's still
a good idea to avoid this. I'll change like you suggest.
Moreover, it's very likely that date_mode_t can change to something
more complex in the future. So an extra constructor call can get more
expensive.
>
> Best regards.
> HF
>
Re: [Maria-developers] [Commits] 68b7e231cb3: MDEV-17255: New optimizer defaults and ANALYZE TABLE
by Sergey Petrunia 24 Nov '18
Hi Varun,
On Fri, Nov 16, 2018 at 11:48:59PM +0530, Varun wrote:
> revision-id: 68b7e231cb3a1cdfbd968c2cbf72687b29b6d2da (mariadb-10.3.6-209-g68b7e231cb3)
> parent(s): b9a9055793ab8b9a50200e2f55602463627dedd3
Does the above say the patch is going into 10.3?
The MDEV has fixVersion=10.4 and the parent revision is also in 10.4.
So I assume the target version is 10.4?
> author: Varun Gupta
> committer: Varun Gupta
> timestamp: 2018-11-16 23:48:11 +0530
> message:
>
> MDEV-17255: New optimizer defaults and ANALYZE TABLE
>
> Added another value for the use_stat_tables system variable
Please mention in the commit comment:
- the name of the new value
- the fact that the new value is now the default.
> ---
> sql/opt_range.cc | 2 +-
> sql/sql_admin.cc | 3 ++-
> sql/sql_statistics.h | 1 +
> sql/sys_vars.cc | 4 ++--
> 4 files changed, 6 insertions(+), 4 deletions(-)
Please run the testsuite. I did the 'main' testsuite and:
- main.mysqld--help, main.stat_tables_disabled - these need updates
- a few other tests need to be updated, as opening any table will now cause
the server to attempt to read EITS stats for it.
Please also add test coverage for this particular MDEV. This should be a
testcase that shows that
- with PREFERABLY, ANALYZE will collect EITS stats, SELECT will use them.
- with PREFERABLY_FOR_READS :
-- ANALYZE will not collect EITS stats
-- ANALYZE .. PERSISTENT FOR ... will update the EITS stats.
-- SELECT will use them.
.
>
> diff --git a/sql/opt_range.cc b/sql/opt_range.cc
> index d6db365a8a2..f501a5ae085 100644
> --- a/sql/opt_range.cc
> +++ b/sql/opt_range.cc
> @@ -3158,7 +3158,7 @@ bool calculate_cond_selectivity_for_table(THD *thd, TABLE *table, Item **cond)
>
> if (thd->variables.optimizer_use_condition_selectivity > 2 &&
> !bitmap_is_clear_all(used_fields) &&
> - thd->variables.use_stat_tables > 0)
> + get_use_stat_tables_mode(thd) > NEVER)
> {
> PARAM param;
> MEM_ROOT alloc;
> diff --git a/sql/sql_admin.cc b/sql/sql_admin.cc
> index d0d959de8f9..b5c732737b7 100644
> --- a/sql/sql_admin.cc
> +++ b/sql/sql_admin.cc
> @@ -767,7 +767,8 @@ static bool mysql_admin_table(THD* thd, TABLE_LIST* tables,
> }
> collect_eis=
> (table->table->s->table_category == TABLE_CATEGORY_USER &&
> - (get_use_stat_tables_mode(thd) > NEVER ||
> + ((get_use_stat_tables_mode(thd) > NEVER &&
> + get_use_stat_tables_mode(thd) < PREFERABLY_FOR_READS) ||
> lex->with_persistent_for_clause));
>
>
> diff --git a/sql/sql_statistics.h b/sql/sql_statistics.h
> index 39cddf95188..a942c05be09 100644
> --- a/sql/sql_statistics.h
> +++ b/sql/sql_statistics.h
> @@ -22,6 +22,7 @@ enum enum_use_stat_tables_mode
> NEVER,
> COMPLEMENTARY,
> PREFERABLY,
Please add a comment describing the new value.
> + PREFERABLY_FOR_READS,
> } Use_stat_tables_mode;
>
> typedef
> diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
> index 92c7d329bb9..3652d728fc1 100644
> --- a/sql/sys_vars.cc
> +++ b/sql/sys_vars.cc
> @@ -5841,12 +5841,12 @@ static Sys_var_ulong Sys_progress_report_time(
> VALID_RANGE(0, UINT_MAX), DEFAULT(5), BLOCK_SIZE(1));
>
> const char *use_stat_tables_modes[] =
> - {"NEVER", "COMPLEMENTARY", "PREFERABLY", 0};
> + {"NEVER", "COMPLEMENTARY", "PREFERABLY", "PREFERABLY_FOR_READS", 0};
> static Sys_var_enum Sys_optimizer_use_stat_tables(
> "use_stat_tables",
> "Specifies how to use system statistics tables",
> SESSION_VAR(use_stat_tables), CMD_LINE(REQUIRED_ARG),
> - use_stat_tables_modes, DEFAULT(0));
> + use_stat_tables_modes, DEFAULT(3));
>
> static Sys_var_ulong Sys_histogram_size(
> "histogram_size",
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Hi, Alexey!
Thanks!
Few minor comments, see below
On Nov 23, Alexey Botchkov wrote:
> revision-id: afedc29f773362f342756e9cdd9bbd9ed0f1abe2 (versioning-1.0.3-68-gafedc29f773)
> parent(s): 4ec8598c1dcb63499bce998142b8e5b8b09b2d30
> author: Alexey Botchkov <holyfoot(a)askmonty.org>
> committer: Alexey Botchkov <holyfoot(a)askmonty.org>
> timestamp: 2018-06-26 14:10:46 +0400
> message:
>
> MDEV-14024 PCRE2.
>
> Related changes in the server code.
>
> diff --git a/cmake/pcre.cmake b/cmake/pcre.cmake
> index 4c113929866..264f93a60a2 100644
> --- a/cmake/pcre.cmake
> +++ b/cmake/pcre.cmake
> @@ -5,24 +5,17 @@ SET(WITH_PCRE "auto" CACHE STRING
>
> MACRO (CHECK_PCRE)
> IF(WITH_PCRE STREQUAL "system" OR WITH_PCRE STREQUAL "auto")
> - CHECK_LIBRARY_EXISTS(pcre pcre_stack_guard "" HAVE_PCRE_STACK_GUARD)
> + CHECK_LIBRARY_EXISTS(pcre2-8 pcre2_match "" HAVE_PCRE2)
> - IF(NOT CMAKE_CROSSCOMPILING)
> - SET(CMAKE_REQUIRED_LIBRARIES "pcre")
> - CHECK_C_SOURCE_RUNS("
> - #include <pcre.h>
> - int main() {
> - return -pcre_exec(NULL, NULL, NULL, -999, -999, 0, NULL, 0) < 256;
> - }" PCRE_STACK_SIZE_OK)
> - SET(CMAKE_REQUIRED_LIBRARIES)
> - ENDIF()
> ENDIF()
> - IF(NOT HAVE_PCRE_STACK_GUARD OR NOT PCRE_STACK_SIZE_OK OR
> - WITH_PCRE STREQUAL "bundled")
> + IF(NOT HAVE_PCRE2 OR WITH_PCRE STREQUAL "bundled")
> IF (WITH_PCRE STREQUAL "system")
> - MESSAGE(FATAL_ERROR "system pcre is not found or unusable")
> + MESSAGE(FATAL_ERROR "system pcre2-8 library is not found or unusable")
> ENDIF()
> - SET(PCRE_INCLUDES ${CMAKE_BINARY_DIR}/pcre ${CMAKE_SOURCE_DIR}/pcre)
> - ADD_SUBDIRECTORY(pcre)
> + SET(PCRE_INCLUDES ${CMAKE_BINARY_DIR}/pcre2 ${CMAKE_SOURCE_DIR}/pcre2
> + ${CMAKE_BINARY_DIR}/pcre2/src ${CMAKE_SOURCE_DIR}/pcre2/src)
That's a bit strange. I'd expect only .../pcre2/src paths, not .../pcre2.
you need pcre2.h and pcre2posix.h, they're both in src/ right?
> + SET(PCRE_BUILD_TESTS OFF CACHE BOOL "Disable tests.")
> + SET(PCRE2_BUILD_PCRE2GREP OFF CACHE BOOL "Disable pcre2grep")
> + ADD_SUBDIRECTORY(pcre2)
> ENDIF()
> ENDMACRO()
>
> diff --git a/extra/mariabackup/CMakeLists.txt b/extra/mariabackup/CMakeLists.txt
> index 7df5fa17903..a9d7153b893 100644
> --- a/extra/mariabackup/CMakeLists.txt
> +++ b/extra/mariabackup/CMakeLists.txt
> @@ -37,7 +37,7 @@ INCLUDE_DIRECTORIES(
> )
>
> IF(NOT HAVE_SYSTEM_REGEX)
> - INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR}/pcre)
> + INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR}/pcre2 ${CMAKE_SOURCE_DIR}/pcre2/src)
Hmm, not INCLUDE_DIRECTORIES(${PCRE_INCLUDES}) ?
pcre2.h is in the ${CMAKE_BINARY_DIR}, it's generated.
> ENDIF()
>
> IF(WITH_WSREP)
> diff --git a/mysql-test/main/func_regexp_pcre.test b/mysql-test/main/func_regexp_pcre.test
> index 21600390bb2..30969b3e9ae 100644
> --- a/mysql-test/main/func_regexp_pcre.test
> +++ b/mysql-test/main/func_regexp_pcre.test
> @@ -382,6 +382,7 @@ SELECT 'AB' RLIKE 'A B';
> SELECT 'AB' RLIKE 'A# this is a comment\nB';
> SET default_regex_flags=DEFAULT;
>
> +--error ER_REGEXP_ERROR
> SELECT 'Aq' RLIKE 'A\\q';
you forgot to update func_regexp_pcre.result
> SET default_regex_flags='EXTRA';
> --error ER_REGEXP_ERROR
> diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc
> index 76f4788c1cf..450a9c54176 100644
> --- a/sql/item_cmpfunc.cc
> +++ b/sql/item_cmpfunc.cc
> @@ -5485,124 +5486,44 @@ bool Regexp_processor_pcre::compile(Item *item, bool send_error)
> */
> void Regexp_processor_pcre::pcre_exec_warn(int rc) const
> {
> - char buf[64];
> - const char *errmsg= NULL;
> + PCRE2_UCHAR8 buf[128];
> THD *thd= current_thd;
>
> - /*
> - Make a descriptive message only for those pcre_exec() error codes
> - that can actually happen in MariaDB.
> - */
> - switch (rc)
> + int errlen= pcre2_get_error_message(rc, buf, sizeof(buf));
> + if (errlen <= 0)
> {
> - case PCRE_ERROR_NULL:
> - errmsg= "pcre_exec: null argument passed";
> - break;
> - case PCRE_ERROR_BADOPTION:
> - errmsg= "pcre_exec: bad option";
> - break;
> - case PCRE_ERROR_BADMAGIC:
> - errmsg= "pcre_exec: bad magic - not a compiled regex";
> - break;
> - case PCRE_ERROR_UNKNOWN_OPCODE:
> - errmsg= "pcre_exec: error in compiled regex";
> - break;
> - case PCRE_ERROR_NOMEMORY:
> - errmsg= "pcre_exec: Out of memory";
> - break;
> - case PCRE_ERROR_NOSUBSTRING:
> - errmsg= "pcre_exec: no substring";
> - break;
> - case PCRE_ERROR_MATCHLIMIT:
> - errmsg= "pcre_exec: match limit exceeded";
> - break;
> - case PCRE_ERROR_CALLOUT:
> - errmsg= "pcre_exec: callout error";
> - break;
> - case PCRE_ERROR_BADUTF8:
> - errmsg= "pcre_exec: Invalid utf8 byte sequence in the subject string";
> - break;
> - case PCRE_ERROR_BADUTF8_OFFSET:
> - errmsg= "pcre_exec: Started at invalid location within utf8 byte sequence";
> - break;
> - case PCRE_ERROR_PARTIAL:
> - errmsg= "pcre_exec: partial match";
> - break;
> - case PCRE_ERROR_INTERNAL:
> - errmsg= "pcre_exec: internal error";
> - break;
> - case PCRE_ERROR_BADCOUNT:
> - errmsg= "pcre_exec: ovesize is negative";
> - break;
> - case PCRE_ERROR_RECURSIONLIMIT:
> - my_snprintf(buf, sizeof(buf), "pcre_exec: recursion limit of %ld exceeded",
> - m_pcre_extra.match_limit_recursion);
> - errmsg= buf;
> - break;
> - case PCRE_ERROR_BADNEWLINE:
> - errmsg= "pcre_exec: bad newline options";
> - break;
> - case PCRE_ERROR_BADOFFSET:
> - errmsg= "pcre_exec: start offset negative or greater than string length";
> - break;
> - case PCRE_ERROR_SHORTUTF8:
> - errmsg= "pcre_exec: ended in middle of utf8 sequence";
> - break;
> - case PCRE_ERROR_JIT_STACKLIMIT:
> - errmsg= "pcre_exec: insufficient stack memory for JIT compile";
> - break;
> - case PCRE_ERROR_RECURSELOOP:
> - errmsg= "pcre_exec: Recursion loop detected";
> - break;
> - case PCRE_ERROR_BADMODE:
> - errmsg= "pcre_exec: compiled pattern passed to wrong bit library function";
> - break;
> - case PCRE_ERROR_BADENDIANNESS:
> - errmsg= "pcre_exec: compiled pattern passed to wrong endianness processor";
> - break;
> - case PCRE_ERROR_JIT_BADOPTION:
> - errmsg= "pcre_exec: bad jit option";
> - break;
> - case PCRE_ERROR_BADLENGTH:
> - errmsg= "pcre_exec: negative length";
> - break;
> - default:
> - /*
> - As other error codes should normally not happen,
> - we just report the error code without textual description
> - of the code.
> - */
> - my_snprintf(buf, sizeof(buf), "pcre_exec: Internal error (%d)", rc);
> - errmsg= buf;
> + my_snprintf((char *)buf, sizeof(buf), "pcre_exec: Internal error (%d)", rc);
> }
> push_warning_printf(thd, Sql_condition::WARN_LEVEL_WARN,
> - ER_REGEXP_ERROR, ER_THD(thd, ER_REGEXP_ERROR), errmsg);
> + ER_REGEXP_ERROR, ER_THD(thd, ER_REGEXP_ERROR), buf);
> }
>
>
> /**
> Call pcre_exec() and send a warning if pcre_exec() returned with an error.
> */
> -int Regexp_processor_pcre::pcre_exec_with_warn(const pcre *code,
> - const pcre_extra *extra,
> +int Regexp_processor_pcre::pcre_exec_with_warn(const pcre2_code *code,
> + pcre2_match_data *data,
> const char *subject,
> int length, int startoffset,
> - int options, int *ovector,
> - int ovecsize)
> + uint options)
> {
> - int rc= pcre_exec(code, extra, subject, length,
> - startoffset, options, ovector, ovecsize);
> + int rc= pcre2_match(code, (PCRE2_SPTR8) subject, (PCRE2_SIZE) length,
> + (PCRE2_SIZE) startoffset, options, data, NULL);
> DBUG_EXECUTE_IF("pcre_exec_error_123", rc= -123;);
> - if (unlikely(rc < PCRE_ERROR_NOMATCH))
> - pcre_exec_warn(rc);
> + if (unlikely(rc < PCRE2_ERROR_NOMATCH))
> + m_SubStrVec= NULL;
> + else
> + m_SubStrVec= pcre2_get_ovector_pointer(data);
> +
The function name is still pcre_exec_with_warn, the comment still says
"Call pcre_exec() and send a warning". But neither seems to be
happening here.
where do you do warnings now?
> return rc;
> }
>
>
> bool Regexp_processor_pcre::exec(const char *str, size_t length, size_t offset)
> {
> - m_pcre_exec_rc= pcre_exec_with_warn(m_pcre, &m_pcre_extra, str, (int)length, (int)offset, 0,
> - m_SubStrVec, array_elements(m_SubStrVec));
> + m_pcre_exec_rc= pcre_exec_with_warn(m_pcre, m_pcre_match_data,
> + str, (int)length, (int)offset, 0);
> return false;
> }
>
> diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
> index ccb7d3b29ed..e8abae3487d 100644
> --- a/sql/sys_vars.cc
> +++ b/sql/sys_vars.cc
> @@ -5692,19 +5692,21 @@ static const char *default_regex_flags_names[]=
> "DOTALL", // (?s) . matches anything including NL
> "DUPNAMES", // (?J) Allow duplicate names for subpatterns
> "EXTENDED", // (?x) Ignore white space and # comments
> - "EXTRA", // (?X) extra features (e.g. error on unknown escape character)
> + "EXTENDED_MORE",//(?xx) Ignore white space and # comments inside cheracter
perhaps, update all flag descriptions from man pcre2syntax? Like
"EXTENDED", // (?x) extended: ignore white space except in classes
"EXTENDED_MORE",//(?xx) as (?x) but also ignore space and tab in classes
also, I don't quite like EXTENDED_MORE name. But I don't have a good
suggestion. May be just re-purpose EXTENDED to do (?xx) without
introducing a new flag?
> + "EXTRA", // means nothing since PCRE2
may be a warning when it's used?
> "MULTILINE", // (?m) ^ and $ match newlines within data
> "UNGREEDY", // (?U) Invert greediness of quantifiers
> 0
> };
> static const int default_regex_flags_to_pcre[]=
> {
> - PCRE_DOTALL,
> - PCRE_DUPNAMES,
> - PCRE_EXTENDED,
> - PCRE_EXTRA,
> - PCRE_MULTILINE,
> - PCRE_UNGREEDY,
> + PCRE2_DOTALL,
> + PCRE2_DUPNAMES,
> + PCRE2_EXTENDED,
> + PCRE2_EXTENDED_MORE,
> + 0, /* EXTRA flag not available since PCRE2 */
> + PCRE2_MULTILINE,
> + PCRE2_UNGREEDY,
> 0
> };
> int default_regex_flags_pcre(const THD *thd)
> diff --git a/storage/tokudb/PerconaFT/cmake_modules/TokuFeatureDetection.cmake b/storage/tokudb/PerconaFT/cmake_modules/TokuFeatureDetection.cmake
> index 2f04a33558a..51257febc0d 100644
> --- a/storage/tokudb/PerconaFT/cmake_modules/TokuFeatureDetection.cmake
> +++ b/storage/tokudb/PerconaFT/cmake_modules/TokuFeatureDetection.cmake
> @@ -1,6 +1,8 @@
> ## feature detection
> find_package(Threads)
> -find_package(ZLIB REQUIRED)
> +IF(WITH_EXTERNAL_ZLIB)
> + find_package(ZLIB REQUIRED)
> +ENDIF()
Please push this in a separate commit.
>
> option(USE_VALGRIND "Build to run safely under valgrind (often slower)." ON)
> if(USE_VALGRIND)
>
Regards,
Sergei
Chief Architect MariaDB
and security(a)mariadb.org
Re: [Maria-developers] 49b63dcc048: MDEV-15073: Generic UDAF parser code in server for windows functions
by Sergei Golubchik 23 Nov '18
Hi, Oleksandr!
On Nov 21, Oleksandr Byelkin wrote:
> revision-id: 49b63dcc0482bcb0fc7f642b0c19c8e24740ab27 (mariadb-10.4.0-20-g49b63dcc048)
> parent(s): b5ac863f1494920b5e7035c9dfa0ebfdaa50a15d
> author: Oleksandr Byelkin <sanja(a)mariadb.com>
> committer: Oleksandr Byelkin <sanja(a)mariadb.com>
> timestamp: 2018-11-20 10:58:34 +0100
> message:
>
> MDEV-15073: Generic UDAF parser code in server for windows functions
>
> Added support for usual aggregate UDFs (UDAF)
> Added remove() call support for more efficient window function processing
> Added an example of an aggregate UDF with efficient window function support
Looks good, see a couple of minor comments below.
I wouldn't extend the UDF API, really. It's ancient; it should be removed
and replaced with a new plugin type, not extended.
But if you still want to do it, make sure it's documented:
https://mariadb.com/kb/en/user-defined-functions/
> diff --git a/mysql-test/main/udf.test b/mysql-test/main/udf.test
> index c3a25c6bcce..e72f8b219af 100644
> --- a/mysql-test/main/udf.test
> +++ b/mysql-test/main/udf.test
> @@ -528,3 +528,62 @@ DROP FUNCTION METAPHON;
> #INSERT INTO t1 (a) VALUES ('Hello');
> #SELECT * FROM t1;
> DROP TABLE t1;
> +
> +--echo
> +--echo MDEV-15073: Generic UDAF parser code in server for windows functions
> +--echo
Use --echo #
so that your echo commands stand out in the result file.
Currently they look like output from some query, which is confusing.
The test itself is good.
> diff --git a/sql/item_sum.cc b/sql/item_sum.cc
> index 0e52b2988a3..d19645b5020 100644
> --- a/sql/item_sum.cc
> +++ b/sql/item_sum.cc
> @@ -3235,6 +3235,25 @@ bool Item_udf_sum::add()
> DBUG_RETURN(0);
> }
>
> +
> +bool Item_udf_sum::supports_removal()
> +{
> + DBUG_ENTER("Item_udf_sum::supports_removal");
> + DBUG_PRINT("info", ("support: %d", udf.supports_removal()));
> + DBUG_RETURN(udf.supports_removal());
> +}
> +
> +
> +void Item_udf_sum::remove()
> +{
> + my_bool tmp_null_value;
> + DBUG_ENTER("Item_udf_sum::remove");
> + udf.remove(&tmp_null_value);
> + null_value= tmp_null_value;
> + DBUG_VOID_RETURN;
> +}
> +
> +
> void Item_udf_sum::cleanup()
> {
> /*
> diff --git a/sql/item_sum.h b/sql/item_sum.h
> index 1a21c257221..663dea1ed7b 100644
> --- a/sql/item_sum.h
> +++ b/sql/item_sum.h
> @@ -574,7 +574,7 @@ class Item_sum :public Item_func_or_sum
> virtual bool add()= 0;
> virtual bool setup(THD *thd) { return false; }
>
> - virtual bool supports_removal() const { return false; }
> + virtual bool supports_removal() { return false; }
why?
> virtual void remove() { DBUG_ASSERT(0); }
Regards,
Sergei
Chief Architect MariaDB
and security(a)mariadb.org
The MariaDB Developer Unconference is taking place in New York 23-24
Feb, just before OpenWorks:
https://mariadb.org/2019-developers-unconference-new-york/
Please sign up on the Meetup page:
https://www.meetup.com/MariaDB-Developers/events/256385430/ - security
is very tight at the building, so everyone needs to be signed up in
advance to be given access. There's a cap on the attendance, so don't
delay too long.
If you want to suggest a session, the spreadsheet is open for proposals.
[Maria-developers] 7c088b624f4: MDEV-14500 filesort to support engines with slow rnd_pos
by Sergei Golubchik 19 Nov '18
revision-id: 7c088b624f4f50e1e9f11891c371eba912228a2d (mariadb-10.4.0-14-g7c088b624f4)
parent(s): da8a589a982c45ed161b9a25d61282a6bc1fa037
author: Sergei Golubchik <serg(a)mariadb.org>
committer: Sergei Golubchik <serg(a)mariadb.org>
timestamp: 2018-11-17 14:13:37 +0100
message:
MDEV-14500 filesort to support engines with slow rnd_pos
If the engine wants to avoid rnd_pos() - force a temporary table
before a filesort. But don't do it if addon_fields are used.
---
mysql-test/suite/archive/rnd_pos.result | 56 +++++++++++++++++++++++++++++++++
mysql-test/suite/archive/rnd_pos.test | 27 ++++++++++++++++
sql/filesort.cc | 55 ++++++++++++++++++--------------
sql/filesort.h | 3 ++
sql/handler.h | 7 +++++
sql/sql_select.cc | 15 +++++++++
storage/archive/ha_archive.h | 2 +-
storage/csv/ha_tina.h | 2 +-
8 files changed, 142 insertions(+), 25 deletions(-)
diff --git a/mysql-test/suite/archive/rnd_pos.result b/mysql-test/suite/archive/rnd_pos.result
new file mode 100644
index 00000000000..b6b6748d53f
--- /dev/null
+++ b/mysql-test/suite/archive/rnd_pos.result
@@ -0,0 +1,56 @@
+create table t1(c1 int not null, c2 double not null, c3 char(255) not null) engine=archive;
+insert t1 select seq, seq+0.7, concat('row with c1 = ', seq) from seq_1_to_10;
+explain partitions select c1,c3 from t1 order by c2;
+id select_type table partitions type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 NULL ALL NULL NULL NULL NULL 10 Using filesort
+set max_length_for_sort_data = 4;
+explain partitions select c1,c3 from t1 order by c2;
+id select_type table partitions type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 NULL ALL NULL NULL NULL NULL 10 Using temporary; Using filesort
+flush status;
+select c1,c3 from t1 order by c2;
+c1 c3
+1 row with c1 = 1
+2 row with c1 = 2
+3 row with c1 = 3
+4 row with c1 = 4
+5 row with c1 = 5
+6 row with c1 = 6
+7 row with c1 = 7
+8 row with c1 = 8
+9 row with c1 = 9
+10 row with c1 = 10
+set max_length_for_sort_data = default;
+show status where variable_name like '%tmp%' and value != 0;
+Variable_name Value
+Created_tmp_tables 1
+Handler_tmp_write 10
+Rows_tmp_read 20
+alter table t1 partition by hash (c1) partitions 3;
+explain partitions select c1,c3 from t1 order by c2;
+id select_type table partitions type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 p0,p1,p2 ALL NULL NULL NULL NULL 10 Using filesort
+set max_length_for_sort_data = 4;
+explain partitions select c1,c3 from t1 order by c2;
+id select_type table partitions type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 p0,p1,p2 ALL NULL NULL NULL NULL 10 Using temporary; Using filesort
+flush status;
+select c1,c3 from t1 order by c2;
+c1 c3
+1 row with c1 = 1
+2 row with c1 = 2
+3 row with c1 = 3
+4 row with c1 = 4
+5 row with c1 = 5
+6 row with c1 = 6
+7 row with c1 = 7
+8 row with c1 = 8
+9 row with c1 = 9
+10 row with c1 = 10
+set max_length_for_sort_data = default;
+show status where variable_name like '%tmp%' and value != 0;
+Variable_name Value
+Created_tmp_tables 1
+Handler_tmp_write 10
+Rows_tmp_read 20
+drop table t1;
diff --git a/mysql-test/suite/archive/rnd_pos.test b/mysql-test/suite/archive/rnd_pos.test
new file mode 100644
index 00000000000..7a1d78ea003
--- /dev/null
+++ b/mysql-test/suite/archive/rnd_pos.test
@@ -0,0 +1,27 @@
+#
+# MDEV-14500 Support engines without rnd_pos
+#
+source include/have_archive.inc;
+source include/have_sequence.inc;
+source include/have_partition.inc;
+
+create table t1(c1 int not null, c2 double not null, c3 char(255) not null) engine=archive;
+insert t1 select seq, seq+0.7, concat('row with c1 = ', seq) from seq_1_to_10;
+explain partitions select c1,c3 from t1 order by c2;
+set max_length_for_sort_data = 4;
+explain partitions select c1,c3 from t1 order by c2;
+flush status;
+select c1,c3 from t1 order by c2;
+set max_length_for_sort_data = default;
+show status where variable_name like '%tmp%' and value != 0;
+
+alter table t1 partition by hash (c1) partitions 3;
+explain partitions select c1,c3 from t1 order by c2;
+set max_length_for_sort_data = 4;
+explain partitions select c1,c3 from t1 order by c2;
+flush status;
+select c1,c3 from t1 order by c2;
+set max_length_for_sort_data = default;
+show status where variable_name like '%tmp%' and value != 0;
+
+drop table t1;
diff --git a/sql/filesort.cc b/sql/filesort.cc
index e682f3389da..cbe79967647 100644
--- a/sql/filesort.cc
+++ b/sql/filesort.cc
@@ -1957,6 +1957,30 @@ sortlength(THD *thd, SORT_FIELD *sortorder, uint s_length,
return length;
}
+bool filesort_use_addons(TABLE *table, uint sortlength,
+ uint *length, uint *fields, uint *null_fields)
+{
+ Field **pfield, *field;
+ *length= *fields= *null_fields= 0;
+
+ for (pfield= table->field; (field= *pfield) ; pfield++)
+ {
+ if (!bitmap_is_set(table->read_set, field->field_index))
+ continue;
+ if (field->flags & BLOB_FLAG)
+ return false;
+ (*length)+= field->max_packed_col_length(field->pack_length());
+ if (field->maybe_null())
+ (*null_fields)++;
+ (*fields)++;
+ }
+ if (!*fields)
+ return false;
+ (*length)+= (*null_fields+7)/8;
+
+ return *length + sortlength <
+ table->in_use->variables.max_length_for_sort_data;
+}
/**
Get descriptors of fields appended to sorted fields and
@@ -1991,11 +2015,8 @@ get_addon_fields(TABLE *table, uint sortlength, LEX_STRING *addon_buf)
Field **pfield;
Field *field;
SORT_ADDON_FIELD *addonf;
- uint length= 0;
- uint fields= 0;
- uint null_fields= 0;
+ uint length, fields, null_fields;
MY_BITMAP *read_set= table->read_set;
- ulong max_sort_len= table->in_use->variables.max_length_for_sort_data;
DBUG_ENTER("get_addon_fields");
/*
@@ -2011,26 +2032,14 @@ get_addon_fields(TABLE *table, uint sortlength, LEX_STRING *addon_buf)
addon_buf->str= 0;
addon_buf->length= 0;
- for (pfield= table->field; (field= *pfield) ; pfield++)
- {
- if (!bitmap_is_set(read_set, field->field_index))
- continue;
- if (field->flags & BLOB_FLAG)
- DBUG_RETURN(0);
- length+= field->max_packed_col_length(field->pack_length());
- if (field->maybe_null())
- null_fields++;
- fields++;
- }
- if (!fields)
- DBUG_RETURN(0);
- length+= (null_fields+7)/8;
+ // see remove_const() for HA_SLOW_RND_POS explanation
+ if (table->file->ha_table_flags() & HA_SLOW_RND_POS)
+ sortlength= 0;
- if (length+sortlength > max_sort_len ||
- !my_multi_malloc(MYF(MY_WME | MY_THREAD_SPECIFIC),
- &addonf, sizeof(SORT_ADDON_FIELD) * (fields+1),
- &addon_buf->str, length,
- NullS))
+ if (!filesort_use_addons(table, sortlength, &length, &fields, &null_fields) ||
+ !my_multi_malloc(MYF(MY_WME | MY_THREAD_SPECIFIC), &addonf,
+ sizeof(SORT_ADDON_FIELD) * (fields+1),
+ &addon_buf->str, length, NullS))
DBUG_RETURN(0);
diff --git a/sql/filesort.h b/sql/filesort.h
index bd1d81f91ef..359f44a3907 100644
--- a/sql/filesort.h
+++ b/sql/filesort.h
@@ -161,6 +161,9 @@ SORT_INFO *filesort(THD *thd, TABLE *table, Filesort *filesort,
Filesort_tracker* tracker, JOIN *join=NULL,
table_map first_table_bit=0);
+bool filesort_use_addons(TABLE *table, uint sortlength,
+ uint *length, uint *fields, uint *null_fields);
+
void change_double_for_sort(double nr,uchar *to);
#endif /* FILESORT_INCLUDED */
diff --git a/sql/handler.h b/sql/handler.h
index b98244fe47e..ea514634b86 100644
--- a/sql/handler.h
+++ b/sql/handler.h
@@ -308,6 +308,13 @@ enum enum_alter_inplace_result {
#define HA_SLOW_CMP_REF (1ULL << 54)
#define HA_CMP_REF_IS_EXPENSIVE HA_SLOW_CMP_REF
+/**
+ Some engines are unable to provide an efficient implementation for rnd_pos().
> + The server will try to avoid it, if possible.
+
+ TODO better to do it with cost estimates, not with an explicit flag
+*/
+#define HA_SLOW_RND_POS (1ULL << 55)
/* bits in index_flags(index_number) for what you can do with index */
#define HA_READ_NEXT 1 /* TODO really use this flag */
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 3f5def0838e..c16521068c6 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -13181,6 +13181,21 @@ remove_const(JOIN *join,ORDER *first_order, COND *cond,
JOIN_TAB *head= join->join_tab + join->const_tables;
*simple_order= head->on_expr_ref[0] == NULL;
+ if (*simple_order && head->table->file->ha_table_flags() & HA_SLOW_RND_POS)
+ {
+ uint u1, u2, u3;
+ /*
+ normally the condition is (see filesort_use_addons())
+
> + length + sortlength < max_length_for_sort_data
+
+ but for HA_SLOW_RND_POS tables we relax it a bit, as the alternative
+ is to use a temporary table, which is rather expensive.
+
+ TODO proper cost estimations
+ */
+ *simple_order= filesort_use_addons(head->table, 0, &u1, &u2, &u3);
+ }
}
else
{
diff --git a/storage/archive/ha_archive.h b/storage/archive/ha_archive.h
index 56ff566db8c..1f25fba4eed 100644
--- a/storage/archive/ha_archive.h
+++ b/storage/archive/ha_archive.h
@@ -108,7 +108,7 @@ class ha_archive: public handler
return (HA_NO_TRANSACTIONS | HA_REC_NOT_IN_SEQ | HA_CAN_BIT_FIELD |
HA_BINLOG_ROW_CAPABLE | HA_BINLOG_STMT_CAPABLE |
HA_STATS_RECORDS_IS_EXACT | HA_CAN_EXPORT |
- HA_HAS_RECORDS | HA_CAN_REPAIR |
+ HA_HAS_RECORDS | HA_CAN_REPAIR | HA_SLOW_RND_POS |
HA_FILE_BASED | HA_CAN_INSERT_DELAYED | HA_CAN_GEOMETRY);
}
ulong index_flags(uint idx, uint part, bool all_parts) const
diff --git a/storage/csv/ha_tina.h b/storage/csv/ha_tina.h
index c75a64faa52..5b389d984d6 100644
--- a/storage/csv/ha_tina.h
+++ b/storage/csv/ha_tina.h
@@ -107,7 +107,7 @@ class ha_tina: public handler
{
return (HA_NO_TRANSACTIONS | HA_REC_NOT_IN_SEQ | HA_NO_AUTO_INCREMENT |
HA_BINLOG_ROW_CAPABLE | HA_BINLOG_STMT_CAPABLE | HA_CAN_EXPORT |
- HA_CAN_REPAIR);
+ HA_CAN_REPAIR | HA_SLOW_RND_POS);
}
ulong index_flags(uint idx, uint part, bool all_parts) const
{