- developers - lists.mariadb.org

Moving forward from default-mysql-* packages in Debian
by Otto Kekäläinen 19 Jul '24

Hi!

tl;dr: Anyone here interested in trying to push various open source projects
in Debian and the upstreams to use mariadb-server directly instead of via the
default-mysql-server dependency?

## Situation

Back in 2016 Debian introduced the virtual packages:
- default-mysql-server
- default-mysql-client
- default-libmysqlclient-dev

In Debian these have since pointed to MariaDB, and in Ubuntu they point to
MySQL. Not much has happened since. There are only a handful of packages in
Debian that have declared an explicit dependency on MariaDB directly. Most
packages continue to depend on default-mysql-* instead of MariaDB directly.

As MariaDB and MySQL diverge, those packages will eventually end up depending
on MySQL only unless they transition slowly to depend on MariaDB directly. It
didn't happen organically, so it would need a push.

Anyone here interested in taking up such a task? It requires talking to all
the upstreams to officially support/adopt MariaDB going forward.

## Background

- https://lists.debian.org/debian-devel-announce/2016/09/msg00000.html
- https://lists.debian.org/debian-devel/2016/11/msg00832.html
- https://wiki.debian.org/Teams/MySQL/default-mysql-server
- https://tracker.debian.org/pkg/mysql-defaults

## Package lists

# apt rdepends default-mysql-client
default-mysql-client
Reverse Depends:
 |Depends: audiolink
 |Depends: zoneminder
 |Depends: wordpress
  Recommends: tango-db
 |Depends: sqitch
 |Recommends: rsyslog-mysql
 |Depends: roundcube-mysql
 |Depends: rt5-db-mysql
 |Depends: rt4-db-mysql
 |Depends: redmine-mysql
 |Recommends: prelude-manager
 |Depends: postfixadmin
 |PreDepends: openstack-compute-node
  Depends: opendnssec-enforcer-mysql
 |Depends: ocsinventory-reports
 |Recommends: oar-server-mysql
 |Recommends: mysql-workbench
 |Suggests: mysql-sandbox
  Suggests: munin-plugins-core
  Suggests: motion
  Depends: libreoffice-canzeley-client
  Depends: lcmaps-plugins-jobrep-admin
 |Depends: kamailio-mysql-modules
 |Recommends: icinga2-ido-mysql
  Suggests: education-main-server
 |Depends: dbconfig-mysql
 |Suggests: cedar-backup3
 |Depends: bacula-director-mysql
 |Depends: automysqlbackup

# apt rdepends default-mysql-server
default-mysql-server
Reverse Depends:
 |Depends: python3-testing.mysqld (>= 1.0.2)
 |Depends: zoph
 |Recommends: zoneminder
 |Recommends: zabbix-server-mysql
 |Recommends: zabbix-proxy-mysql
 |Suggests: wordpress
  Suggests: trojan
 |Recommends: sympa
 |Recommends: spip
 |Suggests: sogo
 |Suggests: rsyslog-mysql
  Suggests: roundcube-mysql
 |Suggests: rt5-db-mysql
 |Suggests: rt4-db-mysql
 |Suggests: redmine-mysql
 |Suggests: adminer
 |Suggests: python3-mysqldb
 |Suggests: pwman3
 |Suggests: prometheus-mysqld-exporter
  Recommends: postfix-gld
 |Suggests: phpmyadmin
  Suggests: pdns-backend-mysql
  Suggests: orthanc-mysql
 |Depends: openstack-cloud-services
 |Depends: openstack-cluster-installer
 |Suggests: ocsinventory-server
 |Suggests: oar-server-mysql
 |Recommends: mysqltuner
 |Suggests: mysql-sandbox
 |Recommends: mediawiki
 |Suggests: mailman3
 |Suggests: mailman3-web
  Suggests: libreoffice-canzeley-client
 |Suggests: libreoffice-sdbc-mysql
  Suggests: inspircd
  Suggests: icingadb
 |Suggests: icinga2-ido-mysql
  Suggests: goval-dictionary
  Suggests: golang-github-ctdk-goiardi-dev
  Suggests: goiardi
  Suggests: digitemp
 |Depends: diaspora-common
 |Suggests: dbconfig-mysql
  Suggests: collectd-core
 |Recommends: cacti
 |Recommends: bacula-director-mysql
 |Recommends: audiolink

# apt rdepends default-libmysqlclient-dev
default-libmysqlclient-dev
Reverse Depends:
  Depends: libboinc-app-dev
  Depends: libvtk9-dev
  Depends: urweb
  Depends: librust-mysqlclient-sys-dev
  Suggests: qtbase5-gles-dev
  Suggests: qtbase5-dev
 |Depends: postgresql-16-mysql-fdw
  Depends: libpoco-dev
  Recommends: newlisp
 |Suggests: mysql-sandbox
  Depends: libmysql-ocaml-dev
  Depends: libmysql++-dev
  Depends: libmailutils-dev
  Depends: kannel-dev
  Depends: libhoel-dev
  Suggests: libglpk40
  Depends: libgdal-dev
  Suggests: fp-units-db-3.2.2
  Suggests: fp-units-win-db-3.2.2
  Depends: cl-sql-mysql
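
To get a feel for how much work such a push would be, the listings above can be tallied by relation type. The following is only a rough sketch: `tally_rdepends` is a hypothetical helper, and it assumes the output keeps the `Relation: package` shape shown above, with an optional leading `|` that is simply stripped.

```python
from collections import Counter


def tally_rdepends(output: str) -> Counter:
    """Count reverse dependencies by relation type in `apt rdepends` output.

    Hypothetical helper: assumes each entry looks like 'Depends: pkg',
    optionally prefixed with '|', as in the listings above.
    """
    counts = Counter()
    for line in output.splitlines():
        entry = line.strip().lstrip("|")
        relation, sep, package = entry.partition(":")
        # Skip the package-name header and the "Reverse Depends:" heading,
        # neither of which names a package after a colon.
        if not sep or not package.strip():
            continue
        counts[relation.strip()] += 1
    return counts
```

Feeding it the saved output of one of the commands above gives a quick count of hard `Depends` relations versus the softer `Recommends` and `Suggests` ones.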

Re: [MariaDB commits] [PATCH] MDEV-15393 part 2. Fixes to 'gtid_slave_pos duplicate key errors after mysqldump restore' MDEV-34615 mysqldump wipes off pre-existing gtid slave state
by Kristian Nielsen 18 Jul '24

Andrei Elkin <andrei.elkin(a)mariadb.com> writes:

>> The most common case is the provisioning of a slave with a complete dump
>> from a master. In this case, it is required that the GTID position is set
>> correctly across all domains. Merging the existing and new GTID position as
>> attempted in this patch will break the provisioning of the slave.
>
> 'Will break' but merely for the technical (your pp1,2 to the concat)
> reasons. I don't see what else can be a problem of MDEV-34615 part.

No. What I mean is, consider some slave server that has an existing
gtid_slave_pos='20-2-1000' say, domain_id=20. And then this server is for some
reason re-provisioned (ie. all data overwritten) with a new mysqldump, say
from a master with GTID pos '10-1-2000'. Now the correct GTID position
matching the new dump is

  SET GLOBAL gtid_slave_pos='10-1-2000';

If you make the MDEV-34615 idea default, then the position for the new slave
will be multi-domain '20-2-1000,10-1-2000', which is not correct and could
cause problems later.

Basically, only the user knows if the old @@gtid_slave_pos is meaningful and
should be kept, or if it should be overwritten; mysqldump cannot reliably
determine this. And changing the default will end up breaking things in some
cases that are working now.

The merging of GTID positions can be useful in the case where we are setting
up a multi-source slave C by importing dumps from two masters A and B, say. In
this case the first import needs to overwrite any old gtid_slave_pos, and the
second import should merge. Thus the need for the user to specify.

I hope this explains the problem.

 - Kristian.
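
The difference between overwriting and merging can be made concrete with a small model. This is a sketch only, not how mysqldump or the server implements anything: `parse_gtid_pos`, `format_gtid_pos` and `restore_position` are hypothetical names, the `merge` flag stands in for whatever option the user would have to pass, and letting the dumped GTID win when both positions share a domain is an assumption of the sketch.

```python
def parse_gtid_pos(pos: str) -> dict:
    """Parse 'domain-server-seq[,...]' into {domain_id: (server_id, seq_no)}."""
    state = {}
    for gtid in filter(None, (part.strip() for part in pos.split(","))):
        domain, server, seq = (int(field) for field in gtid.split("-"))
        state[domain] = (server, seq)
    return state


def format_gtid_pos(state: dict) -> str:
    """Render the state as a position string, ordered by domain id."""
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(state.items()))


def restore_position(old: str, dumped: str, merge: bool) -> str:
    """Position after restoring a dump; merge=False overwrites the old state."""
    if not merge:
        return format_gtid_pos(parse_gtid_pos(dumped))
    state = parse_gtid_pos(old)
    # Assumption of this sketch: for a shared domain the dumped GTID wins.
    state.update(parse_gtid_pos(dumped))
    return format_gtid_pos(state)


# Re-provisioning: the old domain 20 must not survive the restore.
print(restore_position("20-2-1000", "10-1-2000", merge=False))
# Multi-source slave C: overwrite with A's dump, then merge B's dump.
after_a = restore_position("20-2-1000", "1-1-500", merge=False)
print(restore_position(after_a, "2-2-700", merge=True))
```

The model has no way to infer the right value of `merge` from the two positions alone, which is the point made above: only the user knows whether the old position is still meaningful.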

Re: [MariaDB commits] [PATCH] MDEV-15393 part 2. Fixes to 'gtid_slave_pos duplicate key errors after mysqldump restore' MDEV-34615 mysqldump wipes off pre-existing gtid slave state
by Kristian Nielsen 17 Jul '24

> commit 7a50135f3f908380e2dfb45e3f95c9fb10c01d62
> Author: Andrei <andrei.elkin(a)mariadb.com>
> Date:   Mon Jul 15 17:50:37 2024 +0300
>
>     MDEV-15393 part 2. Fixes to 'gtid_slave_pos duplicate key errors after mysqldump restore'
>     MDEV-34615 mysqldump wipes off pre-existing gtid slave state

Hi Andrei, here is my review of the patch, with one particularly important
issue (not breaking existing use cases and backwards compatibility), as well
as some general issues/comments.

>     When mysqldump run on mysql system database it generates inserts sql commands
>     into mysql.gtid_slave_pos. After running the backup script those

Here and below, I suggest better wording for text that is hard to understand
or uses wrong/unusual grammar.

"When mysqldump is run to dump the `mysql` system database, it generates
INSERT statements into the table mysql.gtid_slave_pos"

>     inserts did not produce an expected gtid state on slave.

"did not produce the expected gtid state"

>     In particular the maximum of mysql.gtid_slave_pos.sub_id did not make
>     into
>       rpl_global_gtid_slave_state.last_sub_id
>     a memory object that the table persists.

I think you mean here "an in-memory object that is supposed to match the
current state of the table".

>     And that was regardless of whether --gtid option in or out.

"whether the --gtid option was specified by the user or not".

>     The fixes ensure the insert block of the dump script is followed
>     with a "summing-up" SET @global.gtid_slave_pos assignment.
>     To address MDEV-34615 the assignment makes sure a possibly
>     pre-existing gtid state (e.g reflecting another domain) remains as a
>     sub-state upon the dump installation.

This change is not appropriate, as it breaks backwards compatibility and
breaks a common use case. Certainly not in a GA release such as 10.5!

The most common case is the provisioning of a slave with a complete dump
from a master. In this case, it is required that the GTID position is set
correctly across all domains. Merging the existing and new GTID position as
attempted in this patch will break the provisioning of the slave.

The existing supported way of not overwriting the slave's old
@@GLOBAL.gtid_slave_pos is to use --master-data=2 --gtid. This will output
the GTID position in the dump commented out, and the user can themselves
(manually or through a script of their own) merge the positions as required
for their setup and application.

If you think it is important to have a facility in mysqldump to do merging
automatically, then you should implement this as a new option. Such a new
option would need to go to the next development version (11.6 or 11.7 I
suppose?). It could be --master-data = 5 or 6 for example (idea being bit 2
means "merge old and new position"), or maybe some other way that doesn't
require "magic" constants.

I am commenting below on some problems with the implementation of the
merging of old and new position. But note that these comments apply to a
different patch that implements this as a new option. These changes must not
be made to the existing default behaviour.

> diff --git a/client/mysqldump.c b/client/mysqldump.c
> index 1c9d92d6e09..0be890e2528 100644
> --- a/client/mysqldump.c
> +++ b/client/mysqldump.c
> @@ -5985,8 +5985,13 @@ static int dump_selected_tables(char *db, char **table_names, int tables)
>  } /* dump_selected_tables */
>
>
> +const char fmt_gtid_pos[] = "%sSET GLOBAL gtid_slave_pos=concat('%s', "
> +                            "if(length(@@global.gtid_slave_pos) = 0, '', ','), "
> +                            "@@global.gtid_slave_pos);\n";

Two problems with this:

1. This will result in a syntax error if the new position is empty and the
old position is not (eg. SET GLOBAL gtid_slave_pos=',0-1-2').

2. Unless I'm missing something obvious, this doesn't actually work to merge
two different GTIDs with the same domain! It produces an error
ER_DUPLICATE_GTID_DOMAIN. It seems there is a lack of coverage in the
associated test case, if the actual merging of domains isn't even tested?

  SELECT @@GLOBAL.gtid_slave_pos;
  @@GLOBAL.gtid_slave_pos
  0-1-2
  SET GLOBAL gtid_slave_pos= CONCAT('0-1-10', ',', @@GLOBAL.gtid_slave_pos);
  rpl.tmp2 'mix'                           [ fail ]
  mysqltest: At line 12: query 'SET GLOBAL gtid_slave_pos= CONCAT('0-1-10', ',', @@GLOBAL.gtid_slave_pos)' failed: ER_DUPLICATE_GTID_DOMAIN (1943): GTID 0-1-2 and 0-1-10 conflict (duplicate domain id 0)

> @@ -6054,8 +6059,11 @@ static int do_show_master_status(MYSQL *mysql_con, int consistent_binlog_pos,
>      fprintf(md_result_file,
>              "%sCHANGE MASTER TO MASTER_USE_GTID=slave_pos;\n",
>              comment_prefix);
> -    fprintf(md_result_file,
> -            "%sSET GLOBAL gtid_slave_pos='%s';\n",
> +    /*
> +      Defer print of SET gtid_slave_pos until after its placeholder
> +      table is guaranteed to have been dumped.
> +    */
> +    sprintf(set_gtid_pos, fmt_gtid_pos,

set_gtid_pos is a fixed size buffer, if I understand the patch correctly.
This seems to mean that you have a buffer overflow here if the GTID position
has too many domains?

Note that it is not appropriate to simply use snprintf() instead, as this
will merely truncate the output and give silent data corruption (or in this
case probably a strange syntax error). Instead at least a proper error
should be given (better to support arbitrary GTID position of course, but
that might be a separate task if this is already an existing limitation of
mysqldump, I did not check).

> @@ -6181,10 +6189,8 @@ static int do_show_slave_status(MYSQL *mysql_con, int use_gtid,
>        mysql_free_result(slave);
>        return 1;
>      }
> -    print_comment(md_result_file, 0,
> -                  "-- GTID position to start replication:\n");
> -    fprintf(md_result_file, "%sSET GLOBAL gtid_slave_pos='%s';\n",
> -            gtid_comment_prefix, gtid_pos);
> +    /* similarly to do_show_master_status */
> +    sprintf(set_gtid_pos, fmt_gtid_pos, gtid_comment_prefix, gtid_pos);

I assume this is also a potential buffer overflow, didn't check closely.

> @@ -6913,6 +6932,11 @@ int main(int argc, char **argv)
>    int exit_code;
>    int consistent_binlog_pos= 0;
>    int have_mariadb_gtid= 0;
> +  /*
> +    to hold SET @@global.gtid_slave_pos which is deferred to print
> +    until the function epilogue.
> +  */
> +  char set_gtid_pos[3 + sizeof(fmt_gtid_pos) + MAX_GTID_LENGTH]= {0};

 - Kristian.
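
The two problems with the CONCAT-based format string can be reproduced without a server by mimicking the string expression. This is a sketch under stated assumptions: `concat_expression` imitates only the SQL string arithmetic of `fmt_gtid_pos`, and `find_problems` is a hypothetical checker that models just the two failures described in this review, not the server's full validation of a GTID list.

```python
def concat_expression(new_pos: str, old_pos: str) -> str:
    """Mimic concat('<new>', if(length(old) = 0, '', ','), old) from the patch."""
    separator = "" if len(old_pos) == 0 else ","
    return new_pos + separator + old_pos


def find_problems(pos: str) -> list:
    """Report empty list elements and repeated domain ids in a position string."""
    if pos == "":
        return []
    problems = []
    seen_domains = set()
    for gtid in pos.split(","):
        if gtid == "":
            problems.append("empty element")
            continue
        domain = gtid.split("-")[0]
        if domain in seen_domains:
            problems.append(f"duplicate domain id {domain}")
        seen_domains.add(domain)
    return problems


# Problem 1: empty new position, non-empty old position.
print(repr(concat_expression("", "0-1-2")), find_problems(concat_expression("", "0-1-2")))
# Problem 2: old and new position share domain 0.
print(find_problems(concat_expression("0-1-10", "0-1-2")))
```

Only the case where the old and new positions cover disjoint domains, or the old position is empty, comes out clean in this model.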

'_' in numeric literals [MDEV-33228]
by Rohan Gupta 16 Jul '24

I was going through this issue, https://jira.mariadb.org/browse/MDEV-34228,
and noticed that during parsing we check whether '_' is present in a token;
if it is, the token is parsed as an identifier.

My question: to follow the SQL:2023 standard for '_' in numeric literals, how
would one distinguish numeric literals from identifiers? The MariaDB
documentation says that an identifier cannot start with a number unless it is
in backticks, `{identifier}`.

Rohan Gupta
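
One way to frame the question is as a token classification rule. The sketch below is not how the MariaDB lexer works; `classify` and `numeric_value` are hypothetical helpers, it covers integer literals only, and it assumes the rule that an underscore is allowed only between two digits.

```python
import re

# Assumed rule: digit groups separated by single underscores, e.g. 1_000_000.
NUMERIC_WITH_UNDERSCORES = re.compile(r"[0-9]+(?:_[0-9]+)*")


def classify(token: str) -> str:
    """Classify a bare token as 'number' or 'identifier' under the assumed rule."""
    if len(token) >= 2 and token.startswith("`") and token.endswith("`"):
        return "identifier"  # backtick-quoted tokens are always identifiers
    if NUMERIC_WITH_UNDERSCORES.fullmatch(token):
        return "number"
    return "identifier"


def numeric_value(token: str) -> int:
    """Integer value of a token that classify() reported as a number."""
    return int(token.replace("_", ""))


for token in ("1_000", "`1_000`", "_1000", "1__000", "1000_", "col_1"):
    print(token, classify(token))
```

Resolving a token such as `1_000` as a number is a design choice of the sketch; any token that is accepted as an unquoted identifier today and also matches the pattern would change meaning, which is the compatibility question raised above.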

Re: d78376da69d: MDEV-34504 PURGE BINARY LOGS not working anymore
by Sergei Golubchik 10 Jul '24

Hi, Monty,

On Jul 09, Michael Widenius wrote:
> revision-id: d78376da69d (mariadb-11.4.2-11-gd78376da69d)
> parent(s): 4ffeca643d5
> author: Michael Widenius
> committer: Michael Widenius
> timestamp: 2024-07-09 11:01:29 +0300
> message:
>
> MDEV-34504 PURGE BINARY LOGS not working anymore
>
> PURGE BINARY LOGS did not always purge binary logs. This commit fixes
> some of the issues and adds notifications if a binary log cannot be
> purged.
>
> User visible changes:
> - 'PURGE BINARY LOG TO log_name' and 'PURGE BINARY LOGS BEFORE date'
>   worked differently. 'TO' ignored 'slave_connections_needed_for_purge'
>   while 'BEFORE' did not. Now both versions ignores the
>   'slave_connections_needed_for_purge variable'.
> - 'PURGE BINARY LOG..' commands now returns 'note' if a binary log cannot
>   be deleted like
>   Note 1375 Binary log 'master-bin.000004' is not purged because it is
>             the current active binlog
> - Automatic binary log purges, based on date or size, will write a
>   note to the error log if a binary log matching the size or date
>   cannot yet be deleted.
> - If 'slave_connections_needed_for_purge' is set from a config or
>   command line, it is set to 0 if Galera is enabled and 1 otherwise
>   (old default). This ensures that automatic binary log purge works
>   with Galera as before the addition of
>   'slave_connections_needed_for_purge'.
>   If the variable is changed to 0, a warning will be printed to the error
>   log.
>
> Code changes:
> - Added THD argument to several purge_logs related functions that needed
>   THD.
> - Added 'interactive' options to purge_logs functions. This allowed
>   me to remove testing of sql_command == SQLCOM_PURGE.

it's not really _interactive_ but ok

> - Changed purge_logs_before_date() to first check if log is applicable
>   before calling can_purge_logs(). This ensures we do not get a
>   notification for logs that does not match the remove criteria.
> - MYSQL_BIN_LOG::can_purge_log() will write notifications to the user
>   or error log if a log cannot yet be removed.
> - log_in_use() will return reason why a binary log cannot be removed.
> - Moved checking of binlog_format for Galera to be after Galera is
>   initialized (The old check never worked). If Galera is enabled
>   we now change the binlog_format to ROW, with a warning, instead of
>   aborting the server. If this change happens, the binlog_format variable
>   will be marked with AUTO or FORCED, for information_schema.system_variables,
>   and a warning will be printed to the error log.

What's "FORCE"? "AUTO" means that the value was automatically changed by the
server, "FORCE" means the same, it looks redundant. And it's not the "source
of the value", better to revert that part.

> - Print also a warning if FLASHBACK changes the binlog_format to ROW.
>   Before this was done silently.
>
> diff --git a/sql/mysqld.cc b/sql/mysqld.cc
> index 2b12001de9e..604eaadcc92 100644
> --- a/sql/mysqld.cc
> +++ b/sql/mysqld.cc
> @@ -5847,7 +5850,28 @@ int mysqld_main(int argc, char **argv)
>  #ifdef WITH_WSREP
>    wsrep_set_wsrep_on(nullptr);
>    if (WSREP_ON && wsrep_check_opts()) unireg_abort(1);
> -#endif
> +
> +  if (!opt_bootstrap && WSREP_PROVIDER_EXISTS && WSREP_ON)
> +  {
> +    if (global_system_variables.binlog_format != BINLOG_FORMAT_ROW)
> +    {
> +      sql_print_warning("Binlog_format changed to \"ROW\" because of Galera");

should be sql_print_information()

> +      global_system_variables.binlog_format= BINLOG_FORMAT_ROW;
> +      mark_binlog_format_used(binlog_format_used);
> +    }
> +    binlog_format_used= 1;
> +    if (!slave_connections_needed_for_purge_option_used)
> +    {
> +      slave_connections_needed_for_purge=
> +        internal_slave_connections_needed_for_purge= 0;
> +      mark_slave_connections_needed_for_purge_as_auto();

you don't need that, you can use

  SYSVAR_AUTOSIZE(internal_slave_connections_needed_for_purge, 0);

> +      sql_print_information(
> +        "slave_connections_needed_for_purge changed to 0 because "
> +        "of Galera. Change it to 1 or higher if this Galera node "
> +        "is also Master in a normal replication setup");
> +    }
> +  }
> +#endif /* WITH_WSREP */
>
>  #ifdef _WIN32
>    /*
> @@ -8219,7 +8243,9 @@ mysqld_get_one_option(const struct my_option *opt, const char *argument,
>        ((enum_slave_parallel_mode)opt_slave_parallel_mode);
>      break;
>    }
> -
> +  case (int) OPT_SLAVE_CONNECTIONS_NEEDED_FOR_PURGE:
> +    slave_connections_needed_for_purge_option_used= 1;

you don't need that, you can use

  if (!IS_SYSVAR_AUTOSIZE(&internal_slave_connections_needed_for_purge))

> +    break;
>    case (int)OPT_BINLOG_IGNORE_DB:
>    {
>      binlog_filter->add_ignore_db(argument);
> @@ -8714,18 +8740,14 @@ static int get_options(int *argc_ptr, char ***argv_ptr)
>      opt_bin_log= opt_bin_log_used= 1;
>
>      /* Force format to row */
> +    if (global_system_variables.binlog_format != BINLOG_FORMAT_ROW)
> +    {
> +      sql_print_warning("Binlog_format changed to \"ROW\" because of "
> +                        "flashback");

sql_print_information

> +      global_system_variables.binlog_format= BINLOG_FORMAT_ROW;
> +      mark_binlog_format_used(binlog_format_used);
> +    }
>      binlog_format_used= 1;
> -    global_system_variables.binlog_format= BINLOG_FORMAT_ROW;
> -  }
> -
> -  if (!opt_bootstrap && WSREP_PROVIDER_EXISTS && WSREP_ON &&
> -      global_system_variables.binlog_format != BINLOG_FORMAT_ROW)
> -  {
> -
> -    WSREP_ERROR ("Only binlog_format = 'ROW' is currently supported. "
> -                 "Configured value: '%s'. Please adjust your configuration.",
> -                 binlog_format_names[global_system_variables.binlog_format]);
> -    return 1;
>    }

Regards,
Sergei
Chief Architect, MariaDB Server
and security(a)mariadb.org

GitHub account owners - please +1 these MariaDB mention PRs
by Otto Kekäläinen 10 Jul '24

Hi!

I assume everyone on this mailing list has a GitHub account. Could you please
+1 these PRs that add a mention of MariaDB in the documentation of various
projects that currently only mention MySQL?

Open each link below and press the thumbs up icon in the lower left corner of
the PR description:
https://github.com/bookshelf/bookshelf/pull/2129
https://github.com/ponyorm/pony/pull/708
https://github.com/metabase/metabase/issues/40325
https://github.com/SeaQL/seaql.github.io/pull/116

I am asking this because there are a lot of database tools and libraries out
there that work with both MySQL and MariaDB, but they only mention MySQL,
which is a pity, because the users of those tools might actually switch from
MariaDB to MySQL simply out of confusion/fear that the tool supports only
MySQL, which is almost never true.

One of those rare cases is Javascript/Node.js Drizzle ORM which needs changes
to support MariaDB. There is already a PR for that, you could +1 it as well:
https://github.com/drizzle-team/drizzle-orm/pull/1692

The Rust Diesel ORM has a similar situation going on:
https://github.com/diesel-rs/diesel/issues/1882
https://github.com/diesel-rs/diesel/pull/3964

However most projects have absolutely no MySQL-specific things and work with
MariaDB, and just needed the docs updated, like was done in these cases:
https://github.com/mysql-net/MySqlConnector/pull/1460
https://github.com/rails/rails/pull/51330
https://github.com/prisma/docs/pull/5706
https://github.com/gregstoll/gallery2/pull/168
https://github.com/coleifer/peewee/pull/2858

MariaDB 11.4 build consumers too much disk space - solved by not shipping the embedded server in Debian anymore
by Otto Kekäläinen 09 Jul '24

Hi all!

While preparing 11.4.2 for upload to Debian[1] I noticed that builders fail
on lack of disk space[2]. Below are some listings of build artifacts sorted
by size.

I solved this for now by no longer shipping[3] the embedded server in Debian;
I don't think it had many users anyway. Leaving this here for
visibility/comments though.

[1] https://salsa.debian.org/mariadb-team/mariadb-server/-/merge_requests/88
[2] https://salsa.debian.org/otto/mariadb-server/-/jobs/5793989
[3] https://salsa.debian.org/mariadb-team/mariadb-server/-/merge_requests/88/di…

± du -shc * | sort -hr
12G   total
7,8G  builddir
3,7G  debian
234M  mysql-test
95M   storage
30M   sql
17M   strings
5,1M  plugin

± find debian/tmp/ -type f -exec du -b {} \; | sort -n | tail -n 25
8938376    debian/tmp/usr/bin/myisampack
8956744    debian/tmp/usr/bin/mariadb-test
9441288    debian/tmp/usr/bin/myisamchk
9661400    debian/tmp/usr/bin/mariadb-client-test
9725048    debian/tmp/usr/bin/mariadb-binlog
9894072    debian/tmp/usr/lib/mysql/plugin/ha_spider.so
10277896   debian/tmp/usr/lib/mysql/plugin/ha_connect.so
10622344   debian/tmp/usr/bin/aria_s3_copy
11875712   debian/tmp/usr/bin/aria_dump_log
12048792   debian/tmp/usr/bin/aria_ftdump
12079744   debian/tmp/usr/bin/aria_pack
12814096   debian/tmp/usr/bin/aria_read_log
12943168   debian/tmp/usr/bin/aria_chk
25659064   debian/tmp/usr/lib/mysql/plugin/ha_mroonga.so
138399064  debian/tmp/usr/bin/sst_dump
143275720  debian/tmp/usr/lib/mysql/plugin/ha_rocksdb.so
145093496  debian/tmp/usr/bin/mariadb-ldb
214442664  debian/tmp/usr/bin/test-connect-t
214822248  debian/tmp/usr/bin/mariadb-embedded
214849432  debian/tmp/usr/lib/x86_64-linux-gnu/libmariadbd.so.19
215011504  debian/tmp/usr/bin/mariadb-test-embedded
216148632  debian/tmp/usr/bin/mariadb-client-test-embedded
248523904  debian/tmp/usr/sbin/mariadbd
256329048  debian/tmp/usr/bin/mariadb-backup
622299760  debian/tmp/usr/lib/x86_64-linux-gnu/libmariadbd.a

± find builddir -type f -exec du -b {} \; | sort -n | tail -n 25
18278408   builddir/storage/rocksdb/librocksdb_tools.a
25227146   builddir/sql/libwsrep.a
25298746   builddir/sql/libwsrep_provider.a
25614608   builddir/storage/mroonga/vendor/groonga/lib/libgroonga.a
25659064   builddir/storage/mroonga/ha_mroonga.so
92889078   builddir/storage/perfschema/libperfschema_embedded.a
97604058   builddir/storage/perfschema/libperfschema.a
125637396  builddir/storage/innobase/libinnobase_embedded.a
131904864  builddir/storage/innobase/libinnobase.a
138399064  builddir/storage/rocksdb/sst_dump
143275720  builddir/storage/rocksdb/ha_rocksdb.so
145093496  builddir/storage/rocksdb/mariadb-ldb
189426944  builddir/unittest/sql/my_json_writer-t
189648448  builddir/unittest/sql/explain_filename-t
214442664  builddir/unittest/embedded/test-connect-t
214822248  builddir/libmysqld/examples/mariadb-embedded
214849432  builddir/libmysqld/libmariadbd.so.19
215011504  builddir/libmysqld/examples/mariadb-test-embedded
216148632  builddir/libmysqld/examples/mariadb-client-test-embedded
248523904  builddir/sql/mariadbd
256329048  builddir/extra/mariabackup/mariadb-backup
337460022  builddir/libmysqld/libsql_embedded.a
380429390  builddir/sql/libsql.a
511486028  builddir/storage/rocksdb/librocksdblib.a
622299760  builddir/libmysqld/libmariadbd.a
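
For anyone who wants the same "largest files" listing from a script rather than the `find | du | sort | tail` pipeline, here is a rough equivalent. It is a sketch: `largest_files` is a hypothetical helper, it reports apparent file sizes in bytes, and it skips symbolic links so that only regular files are counted.

```python
import os


def largest_files(root: str, count: int = 25) -> list:
    """Return (size_in_bytes, path) pairs for the largest files, smallest first."""
    if count <= 0:
        return []
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files, not symlinks to them.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append((os.path.getsize(path), path))
    sizes.sort()
    return sizes[-count:]


if __name__ == "__main__":
    for size, path in largest_files("builddir"):
        print(size, path)
```

Run from the top of the source tree after a build, this prints the 25 largest files under `builddir`, with the biggest last.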

Re: [MariaDB commits] [PATCH] MDEV-34504 PURGE BINARY LOGS not working anymore
by Kristian Nielsen 09 Jul '24

Hi Monty,

Here is my review of the patch for MDEV-34504. Looks good, just a bunch of
small suggestions:

> From: Monty <monty(a)mariadb.org>
>
> PURGE BINARY LOGS did not always purge binary logs. This commit fixes
> some of the issues and adds notifications if a binary log cannot be
> purged.

> diff --git a/mysql-test/suite/binlog/r/binlog_index.result b/mysql-test/suite/binlog/r/binlog_index.result
> index 9dfda71f9a7..2d2363a7fec 100644
> --- a/mysql-test/suite/binlog/r/binlog_index.result
> +++ b/mysql-test/suite/binlog/r/binlog_index.result
> @@ -30,6 +30,7 @@ flush logs;
>  flush logs;
>  *** must be a warning master-bin.000001 was not found ***
>  Warnings:
> +Note 1375 Binary log 'master-bin.000004' is not purged because it is the current active binlog

Wording: "it is the active binlog" or "it is the currently active binlog".

> +Warnings:
> +Note 1375 Binary log 'master-bin.000004' is not purged because it is in use by an active XID transaction

A slightly better explanation is: "because it may be needed for crash
recovery". This more directly explains to the user why the file cannot be
removed yet (and what happens if the user removes the file anyway through
the file system).

It's true that this is currently related mostly to xa transactions, but the
details are more complex. It's not really "active" transactions (the
transactions are already committed at this point, it's the InnoDB redo log
that may not yet be synced to disk). And there may be other reasons in the
future (I think maybe binlog rotation already can be a reason, too).

> --- a/mysql-test/suite/binlog/t/binlog_flush_binlogs_delete_domain.test
> +++ b/mysql-test/suite/binlog/t/binlog_flush_binlogs_delete_domain.test
> @@ -91,6 +91,8 @@ while ($domain_cnt)
>  FLUSH BINARY LOGS;
>  --let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
>  --eval PURGE BINARY LOGS TO '$purge_to_binlog'
> +--replace_column 2 #
> +SHOW BINARY LOGS;

Better use --source include/show_binary_logs.inc here (for consistency with
other tests, it does the same thing).

> diff --git a/sql/log.cc b/sql/log.cc
> index c27c4f3353b..1384fb0b3e7 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -4791,8 +4791,8 @@ int MYSQL_BIN_LOG::purge_first_log(Relay_log_info* rli, bool included)
>
> @@ -4902,7 +4903,6 @@ int MYSQL_BIN_LOG::purge_logs(const char *to_log,
>                        log_info.log_file_name);
>        goto err;
>      }
> -
>      if (find_next_log(&log_info, 0) || exit_loop)
>        break;
>    }

Accidental deletion of empty line.

> @@ -5428,18 +5435,23 @@ int MYSQL_BIN_LOG::real_purge_logs_by_size(ulonglong binlog_pos)
> -MYSQL_BIN_LOG::can_purge_log(const char *log_file_name_arg)
> +MYSQL_BIN_LOG::can_purge_log(THD *thd, const char *log_file_name_arg,
> +                             bool interactive)
>  {
> -  THD *thd= current_thd; // May be NULL at startup

> @@ -5464,8 +5479,7 @@ MYSQL_BIN_LOG::can_purge_log(const char *log_file_name_arg)
>      purge_sending_new_binlog_file= sending_new_binlog_file;
>    }
>    if ((res= log_in_use(log_file_name_arg,
> -                       (is_relay_log ||
> -                        (thd && thd->lex->sql_command == SQLCOM_PURGE)) ?
> +                       (is_relay_log || interactive) ?
>                         0 : slave_connections_needed_for_purge)))

It's great that you avoid using current_thd here. In fact, with this change,
thd is no longer used at all in can_purge_log(), so you don't need to pass
it in as an argument any longer. (I'm guessing this is a left-over from an
earlier version of the patch, before you introduced the `interactive`
argument, which I also think is very good).

So please remove the `thd` argument of can_purge_log() ...

> @@ -5338,6 +5339,7 @@ int MYSQL_BIN_LOG::real_purge_logs_by_size(ulonglong binlog_pos)
>    MY_STAT stat_area;
>    char to_log[FN_REFLEN];
>    ulonglong found_space= 0;
> +  THD *thd= current_thd;

... and then you can also remove this call of current_thd.

> @@ -5473,9 +5487,39 @@ MYSQL_BIN_LOG::can_purge_log(const char *log_file_name_arg)
>      waiting_for_slave_to_change_binlog= 1;
>      strmake(purge_binlog_name, log_file_name_arg,
>              sizeof(purge_binlog_name)-1);
> +    if (res == 1)
> +      reason= "it is in use by a slave thread";
> +    else
> +      reason= "less than 'slave_connections_needed_for_purge' slaves has "
> +              "processed it";
> +    goto error;

Grammar: s/has/have/:
"less than 'slave_connections_needed_for_purge' slaves have processed it"

> @@ -5847,7 +5850,28 @@ int mysqld_main(int argc, char **argv)
>  #ifdef WITH_WSREP
>    wsrep_set_wsrep_on(nullptr);
>    if (WSREP_ON && wsrep_check_opts()) unireg_abort(1);
> -#endif
> +
> +  if (!opt_bootstrap && WSREP_PROVIDER_EXISTS && WSREP_ON)
> +  {
> +    if (global_system_variables.binlog_format != BINLOG_FORMAT_ROW)
> +    {
> +      sql_print_warning("Binlog_format changed to \"ROW\" because of Galera");
> +      global_system_variables.binlog_format= BINLOG_FORMAT_ROW;
> +      mark_binlog_format_used(binlog_format_used);
> +    }

This change seems unrelated to MDEV-34504. Not critical I think, but for the
future it's best to put such changes in a separate commit. (The change
itself is good, I think.)

> @@ -1303,13 +1312,19 @@ Sys_slave_connections_needed_for_purge(
>         "slave_connections_needed_for_purge",
>         "Minimum number of connected slaves required for automatic binary "
>         "log purge with max_binlog_total_size, binlog_expire_logs_seconds "
> -       "or binlog_expire_logs_days.",
> +       "or binlog_expire_logs_days. Default is 0 when Galera is enabled and 1 "
> +       "otherwise.",

Since Galera is #ifdef, I guess this should be something like:

       "or binlog_expire_logs_days. "
  #ifdef WSREP
       "Default is 0 when Galera is enabled and 1 otherwise.",
  #else
       "Defaults to 1.",
  #endif

> diff --git a/sql/sys_vars.h b/sql/sys_vars.h
> new file mode 100644
> index 00000000000..8f4eac38cd0
> --- /dev/null
> +++ b/sql/sys_vars.h
> @@ -0,0 +1,17 @@
> +/* Copyright (c) 2024, MariaDB Corporation.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; version 2 of the License.

Hm. It's not copyrighted by MariaDB Corporation solely, there are other
copyright holders. If you want to have a copyright line before the GPL
header, you could maybe say "Copyright (c) 2024, MariaDB Corporation and
others", or "Parts of this file Copyright (c) 2024, MariaDB Corporation". Or
just omit it.

 - Kristian.

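The effect of the `interactive` argument on the purge decision can be summarised in a toy model. This is a simplified sketch, not the logic of `MYSQL_BIN_LOG::can_purge_log()` or `log_in_use()`: both function names below are hypothetical, the order of the checks is an assumption, and the set of files in use and the count of connected slaves are passed in directly instead of being discovered from running threads.

```python
def required_slave_connections(is_relay_log, interactive, needed_for_purge):
    """Mirror the ternary quoted above: 0 for relay logs and interactive purges."""
    return 0 if (is_relay_log or interactive) else needed_for_purge


def purge_blocker(log_name, active_log, logs_in_use, connected_slaves, required):
    """Return the reason a log cannot be purged yet, or None if it can be."""
    if log_name == active_log:
        return "it is the currently active binlog"
    if log_name in logs_in_use:
        return "it is in use by a slave thread"
    if connected_slaves < required:
        return ("less than 'slave_connections_needed_for_purge' "
                "slaves have processed it")
    return None


# Automatic purge with no slaves connected and the variable set to 1:
required = required_slave_connections(False, False, 1)
print(purge_blocker("master-bin.000002", "master-bin.000004", set(), 0, required))
# The same file, purged with an explicit PURGE BINARY LOGS statement:
required = required_slave_connections(False, True, 1)
print(purge_blocker("master-bin.000002", "master-bin.000004", set(), 0, required))
```

In this model, as in the patch description, an explicit PURGE statement ignores `slave_connections_needed_for_purge`, while the active binlog and files still being read are protected either way.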
Re: [MariaDB commits] [PATCH] MDEV-33487 Feature non-blocking binlog
by Kristian Nielsen 08 Jul '24

Here is my review of the MDEV-33487 / PR#3087 patch. First, some general remarks. This patch addresses a specific bottleneck when binlogging huge transactions (eg. multi-gigabyte table updates logged in ROW mode). The root cause is a fundamental limitation in the current binlog format that requires all transactions to be binlogged as a single event group, using consecutive bytes in a single binlog file. Thus, while writing the large transaction to the binlog during commit, all other transaction commits will have to wait before they can commit. Eventually, we should move towards an enhanced binlog format that doesn't have this limitation, so that transactions can be partially binlogged during execution, avoiding this bottlenect. There are a number of other problems caused by this limitation that are not addressed by this patch. Changing the binlog format is a complex task not easily done, so this patch should be seen as a temporary solution to one specific bottleneck, until if/when this underlying limitation can hopefully be removed. Below my detailed review of the code in the patch. This patch touches core code of the binlog group commit mechanism, which is complex code with a number of tricky issues to consider, and very important to keep correct and maintainable. I have tried to give concrete suggestions on ways to write the code, as well as some necessary extra tests. I think if MariaDB PLC wants this patch to be merged, they need to assign a core developer to take responsibility for it to ensure that any problems that turn up later will be handled in a reasonable time frame (eg. prioritise time to work with the author to debug and get fixes reviewed and merged). > From: poempeng <poempeng(a)tencent.com> > > During the commit stage of a large transaction, it may cost too much > time to write binlog cache and block all subsequent transactions for > a long time. 
> One possible solution of this issue is to write the binlog
> of the large transaction to a temporary file and then rename it to the
> next new binlog file.

It's not really the cost of writing (which is similar with and without the patch), the problem is doing costly I/O under the global mutex LOCK_log that blocks other commits. I suggest making that clear in the commit message, for example:

  Very large transactions (eg. gigabytes) can stall other transactions for a long time because the data is written while holding LOCK_log, which blocks other commits from binlogging. One possible solution to this issue is to write a large transaction to its own binlog file outside of holding LOCK_log, and then force a binlog rotate after obtaining LOCK_log, simply renaming the new binlog file in place.

> diff --git a/sql/log.cc b/sql/log.cc
> index 460cefea47b..9de1c6118cd 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -74,6 +74,9 @@
>  #include <utility> // pair
>  #endif
>
> +#include <atomic>
> +#include <chrono>

These should not be needed after changes mentioned below.

> @@ -3727,7 +3730,10 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
>                           enum cache_type io_cache_type_arg,
>                           ulong max_size_arg,
>                           bool null_created_arg,
> -                         bool need_mutex)
> +                         bool need_mutex,
> +                         const char *file_to_rename,
> +                         my_off_t file_size,
> +                         group_commit_entry *entry)

> +  if (file_to_rename)
> +  {
> +    if (write_gtid_and_skip_event(file_size, entry))
> +      goto err;
> +  }
> +

I don't think you need to write these here, inside MYSQL_BIN_LOG::open(). Instead you can do it in the caller after open() returns, right? Then you can avoid the extra arguments file_size and entry, and the conditional here.

> @@ -6826,7 +6850,6 @@ MYSQL_BIN_LOG::write_gtid_event(THD *thd, bool standalone,
>      thd->variables.server_id= global_system_variables.server_id;
>    }
>  #endif
> -

Avoid unrelated changes in the diff like this one (to simplify later merges and history search).

> @@ -7849,11 +7872,18 @@ int Event_log::write_cache_raw(THD *thd, IO_CACHE *cache)
> -  mysql_mutex_assert_owner(&LOCK_log);
> +
> +  IO_CACHE *out_file= f;
> +  if (likely(f == nullptr))
> +  {
> +    mysql_mutex_assert_owner(&LOCK_log);
> +    out_file= get_log_file();
> +  }

Don't add another conditional here. Instead just pass in the IO_CACHE to use as a parameter in each call site, eg. make the `f` parameter mandatory, not optional.

> @@ -8581,7 +8612,23 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
>  bool
>  MYSQL_BIN_LOG::write_transaction_to_binlog_events(group_commit_entry *entry)
>  {
> -  int is_leader= queue_for_group_commit(entry);
> +  int is_leader;
> +  if (unlikely(entry->cache_mngr->trx_cache.get_byte_position() >=
> +               non_blocking_binlog_threshold) ||
> +      DBUG_IF("non_blocking_binlog_ignore_cache_size"))
> +  {
> +    if (!can_use_non_blocking_binlog(entry) ||
> +        write_non_blocking_binlog(entry))
> +      goto original_commit_path;
> +    /* thread using non-blocking binlog is treated as a single group */
> +    is_leader= 1;
> +    entry->non_blocking_log= true;
> +    entry->next= nullptr;
> +    goto group_commit_leader;
> +  }
> +original_commit_path:
> +
> +  is_leader= queue_for_group_commit(entry);

Ok, so when we force a binlog rotation due to binlog_non_blocking_threshold, we do this by itself, not as part of a group commit. I think that is sensible.

The way this gets integrated in the existing code is not very nice, with all these extra goto's and conditionals on entry->non_blocking_log. Can we instead make the decision on binlog_non_blocking_threshold in the function MYSQL_BIN_LOG::write_transaction_to_binlog()? And then call into MYSQL_BIN_LOG::trx_group_commit_leader() directly, skipping the group commit code with `is_leader` in write_transaction_to_binlog_events(). That should greatly simplify the logic.

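As a rough illustration of the restructuring suggested above, the decision could be taken once, up front, with the existing group commit path left untouched as the fallback. This is only a sketch under stated assumptions: `CommitEntry`, `CommitPath`, `write_transaction` and the `write_own_file` callback are hypothetical stand-ins, not the server's real types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the server's real types.
struct CommitEntry {
  std::uint64_t trx_cache_bytes;  // size of the transaction's binlog cache
  bool eligible;                  // outcome of a can_use_non_blocking_binlog()-style check
};

enum class CommitPath { OwnBinlogFile, GroupCommit };

// Decide the path before any group commit queueing happens, so no goto or
// per-entry flag is needed later. write_own_file returns true on error,
// following the error convention of the quoted patch.
CommitPath write_transaction(const CommitEntry &entry, std::uint64_t threshold,
                             bool (*write_own_file)(const CommitEntry &))
{
  assert(write_own_file != nullptr);
  if (entry.trx_cache_bytes >= threshold && entry.eligible &&
      !write_own_file(entry))
    return CommitPath::OwnBinlogFile;  // binlogged alone, queue is skipped
  return CommitPath::GroupCommit;      // unchanged path, also the fallback
}
```

A failed write to the separate file falls through to the normal group commit path, matching the fallback behaviour of the `goto original_commit_path` in the quoted hunk.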
Probably the trx_group_commit_leader() needs to be refactored a bit, split in two, where the first part is dealing with grabbing the group commit queue, while only the second part is called in the binlog_non_blocking_threshold case, to do the actual binlogging.

You might also consider using a separate function for the binlogging in the binlog_non_blocking_threshold case, extracting pieces from trx_group_commit_leader() into smaller shared functions; it depends on what gives the cleanest and simplest code, and is best decided as part of the refactoring of trx_group_commit_leader().

Another thing related to this: How does the code ensure that commit ordering (on the slave) is preserved? This is the wait_for_prior_commit() mechanism, and is needed to ensure on the slave that the GTIDs are binlogged in the right order. This is normally handled by queue_for_group_commit(), which is skipped in the binlog_non_blocking_threshold case. It was not clear to me from the code how this gets handled.

This probably also needs a test case to test that binlog order will be correct on the slave when multiple transactions commit in parallel (eg. optimistic parallel replication), and several of them use the binlog_non_blocking_threshold case.

> @@ -12459,6 +12519,7 @@ mysql_bin_log_commit_pos(THD *thd, ulonglong *out_pos, const char **out_file)
>  }
>  #endif /* INNODB_COMPATIBILITY_HOOKS */
>
> +mysql_rwlock_t binlog_checksum_rwlock;
>
>  static void
>  binlog_checksum_update(MYSQL_THD thd, struct st_mysql_sys_var *var,
> @@ -12468,6 +12529,7 @@ binlog_checksum_update(MYSQL_THD thd, struct st_mysql_sys_var *var,
>    bool check_purge= false;
>    ulong UNINIT_VAR(prev_binlog_id);
>
> +  mysql_rwlock_wrlock(&binlog_checksum_rwlock);
>    mysql_mutex_lock(mysql_bin_log.get_log_lock());
>    if(mysql_bin_log.is_open())
>    {

Ok, so you introduce a new lock to protect changing binlog_checksum_options.
This is otherwise protected by LOCK_log, but we don't want to be holding that while writing the big transaction, obviously.

But did you check all places where binlog_checksum_options is accessed? Your patch doesn't use the new binlog_checksum_rwlock anywhere except in binlog_checksum_update(). There might be some places that currently use LOCK_log to protect the access, and could instead use the new binlog_checksum_rwlock (reducing contention on LOCK_log). Or did you check already, and there was no such opportunity?

> +bool MYSQL_BIN_LOG::can_use_non_blocking_binlog(group_commit_entry *entry)
> +{
> +  DBUG_ASSERT(entry->cache_mngr->trx_cache.get_byte_position() >=
> +                  non_blocking_binlog_threshold ||
> +              DBUG_IF("non_blocking_binlog_ignore_cache_size"));
> +  THD *thd= entry->thd;
> +  if (unlikely(!is_open()) || encrypt_binlog ||
> +      !entry->cache_mngr->stmt_cache.empty() ||
> +      entry->cache_mngr->trx_cache.has_incident() || thd->slave_thread ||
> +      thd->wait_for_commit_ptr)

Oh. So you disable this on the slave. Is that really necessary?

Ok, on the slave, in many cases the following transactions in any case are blocked from committing before the large one, so I see that this could be acceptable. It does seem a very special-purpose use case though. So if this is kept disabled for slave threads, it needs to be very clearly documented, and there should be a comment here as well in the code about why this is done.

> +std::string MYSQL_BIN_LOG::generate_random_file_name()

"generate_tmp_binlog_file_name" ?

> +  if (unlikely(!binlog_dir_inited.load(std::memory_order_acquire)))
> +  {
> +    char dev[FN_REFLEN];
> +    size_t dev_length;
> +    mysql_mutex_lock(&LOCK_log);
> +    if (!binlog_dir_inited.load(std::memory_order_relaxed) && name != nullptr)

This is not necessary. Just initialize what you need when the binlog is initialized, just like other binlog initialization, avoiding the check for binlog_dir_inited.
See for example init_server_components() in sql/mysqld.cc

> +  std::string temp_file_name;

Normally we don't use std::string in the server source, as it makes it harder to control the memory allocation and/or to avoid dynamic memory allocation. In this case we would normally use a fixed-size buffer of length FN_REFLEN, or the class String from sql/sql_string.h with some fixed stack-allocated initial buffer for the common case of not too long name, to avoid a dynamic memory allocation.

> +  temp_file_name.reserve(FN_REFLEN);
> +  auto now_in_sys= std::chrono::system_clock::now().time_since_epoch();
> +  auto now_in_ms=
> +      std::chrono::duration_cast<std::chrono::milliseconds>(now_in_sys)
> +          .count();

You don't need to add this millisecond timestamp, do you? The file name will already be unique from temp_bin_counter?

> +  auto count= temp_bin_counter.fetch_add(1);

This by default uses std::memory_order_seq_cst, which is overly restrictive. You can just use class Atomic_counter from include/my_counter.h to get a simple atomic counter with std::memory_order_relaxed semantics.

> +  temp_file_name.append(binlog_dir);
> +  temp_file_name.append("_temp_bin_");
> +  temp_file_name.append(std::to_string(now_in_ms));
> +  temp_file_name.push_back('_');
> +  temp_file_name.append(std::to_string(count));

> +template <typename F= std::function<void()>> class Scoped_guard
> +{
> +public:
> +  Scoped_guard(F f);
> +  ~Scoped_guard();
> +
> +private:
> +  F func;
> +};

Hm. Yes, this is nice, but it is not good if each new feature implements its own specific class like this. So either this should go somewhere shared where it can be used in the future by other code (did you check that there isn't already something similar that can be used, in the MariaDB source code or in the C++ standard library?), preferably as a separate commit. Or alternatively stay with the existing old-fashioned style for error handling used elsewhere in sql/log.cc.

> +  size_t skip_event_len= non_blocking_binlog_reserved_size - skip_event_start;
> +  skip_event_buf=
> +      (uchar *) my_malloc(PSI_INSTRUMENT_ME, skip_event_len, MYF(MY_ZEROFILL));
> +  generate_skip_event(skip_event_buf, skip_event_len, skip_event_start,
> +                      entry->thd);
> +  if (my_b_write(&log_file, skip_event_buf, skip_event_len) != 0)
> +    return true;

Do we really need to generate this skip event? It seems quite hacky to have a random dummy event in the middle of the binlog. The size you need to reserve should be possible to pre-calculate:

 - Format_description_log_event
 - Binlog_checkpoint_event
 - Gtid_list_log_event
 - Gtid_log_event

The only issue should be the Gtid_list_log_event. This can grow in size, but only if a new GTID domain id or server id gets added during the writing of the temporary file. This is very unlikely to happen, and the code can check whether this happens and in this case fall back to writing the data to the binlog file normally under LOCK_log. This way, we can avoid this strange skip event.

> +  if (temp_file_name.size() >= FN_REFLEN)
> +  {
> +    sql_print_warning("The name of temp file '%s' is too long!",
> +                      temp_file_name.c_str());
> +    return true;

This will go in the server log for the user/DBA to wonder about. So add a bit of context to the message so that it is clear that it is about the temporary file used for binlog_non_blocking_threshold.

> +  thd->push_internal_handler(&dummy_error_handler);

Hm. Why do you need to do this? If this is really needed, then it needs a good comment explaining why.

But I suspect you might not need this at all. Is it to avoid assertions when an error is handled on the file operations? If you don't want mysql_file_open() and so on to set an error with my_error() (because you want to handle the error yourself somehow), then just don't pass MY_WME as a flag to the function.

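The idea of pre-calculating the reserved header size, raised earlier in this review, could look roughly like the following sketch. All of it is an assumption for illustration: the `HeaderSizes` fields and the per-entry model of the GTID list are hypothetical, and real event sizes depend on the event encodings and checksum settings, which are not derived here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical event sizes in bytes; not taken from the patch or the server.
struct HeaderSizes {
  std::size_t format_description;
  std::size_t binlog_checkpoint;
  std::size_t gtid_list_fixed;      // part of the GTID list independent of entry count
  std::size_t gtid_list_per_entry;  // bytes per entry in the GTID list
  std::size_t gtid;
};

// Bytes to reserve at the start of the temporary file, computed before the
// transaction data is written outside LOCK_log.
std::size_t reserved_header_size(const HeaderSizes &s,
                                 std::size_t gtid_list_entries)
{
  return s.format_description + s.binlog_checkpoint + s.gtid_list_fixed +
         s.gtid_list_per_entry * gtid_list_entries + s.gtid;
}

enum class CommitPath { RenameTempFile, NormalUnderLock };

// Re-check once LOCK_log is held. Without a filler event the header must
// match the reserved space exactly, so any change in the GTID list size
// forces the fallback to normal binlogging.
CommitPath choose_commit_path(const HeaderSizes &s, std::size_t reserved,
                              std::size_t entries_now)
{
  assert(reserved > 0);
  return reserved_header_size(s, entries_now) == reserved
             ? CommitPath::RenameTempFile
             : CommitPath::NormalUnderLock;
}
```

The fallback costs one stall under LOCK_log in the rare case that the GTID list changed while the temporary file was being written, in exchange for never emitting a dummy event.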
> +  if (unlikely(generate_new_name(new_name, name, 0)))
> +    abort();
> +  new_name_ptr= new_name;
> +  /*
> +    We log the whole file name for log file as the user may decide
> +    to change base names at some point.
> +  */
> +  Rotate_log_event r(new_name + dirname_length(new_name), 0,
> +                     LOG_EVENT_OFFSET, 0);
> +  if (unlikely(write_event(&r, alg)))
> +    abort();
> +  if (unlikely(flush_io_cache(&log_file) != 0))
> +    abort();

No, don't crash the server like this.

Errors during writing binlog are indeed very tricky. But it shouldn't crash the server. In the worst case, we may be left with a corrupt binlog, but the error should be handled and reported to the user/DBA, similar to how it is done in other parts of the code.

How does this code work with respect to the binlog checkpoint mechanism?

The binlog checkpoint mechanism is a way to ensure that sufficient binlog files are kept to ensure correct crash recovery, in an asynchronous way that does not require syncing the InnoDB redo log at binlog rotation. It uses the struct xid_count_per_binlog to keep track of which binlogs are active, and uses this to write binlog checkpoint events as appropriate.

There are some tricky concurrency issues around this mechanism, so it is important to consider carefully how this will be handled in different cases by the code. For example, if the binlog checkpoint gets delayed and spans multiple binlog files including one or more generated by this patch; or what happens if RESET MASTER is run in parallel, etc.

I don't see any changes related to xid_count_per_binlog in the patch. Is this because this is all handled correctly by MYSQL_BIN_LOG::open()? I would like to see a comment somewhere explaining how this is working. Also I think there needs to be some test cases testing this, eg. delayed binlog checkpointing when several binlog_non_blocking_threshold binlog rotations are running in parallel.

> diff --git a/sql/log.h b/sql/log.h
> index fc5209d1922..af2b56da802 100644
> --- a/sql/log.h
> +++ b/sql/log.h
> @@ -19,6 +19,7 @@

> +private:
> +  // reserved size for format_description_log_event, gtid_list_log_event ...
> +  static const my_off_t non_blocking_binlog_reserved_size= 10 * 1024 * 1024;

Oh, 10 *megabyte* default for the dummy event? That's surely excessive? But hopefully we can avoid the dummy event completely as suggested above.

> +static Sys_var_ulonglong Sys_var_non_blocking_binlog_threshold(
> +    "non_blocking_binlog_threshold",
> +    "If the binlog size of a transaction exceeds this value, we write it to a "
> +    "new temporary file and rename it to the next binlog file.",
> +    GLOBAL_VAR(non_blocking_binlog_threshold), CMD_LINE(OPT_ARG),
> +    VALID_RANGE(256 * 1024 * 1024, ULONGLONG_MAX), DEFAULT(ULONGLONG_MAX),
> +    BLOCK_SIZE(1));

Maybe name it so that it starts with "binlog_", eg. binlog_non_blocking_threshold?

I think the VALID_RANGE should be extended, there is no need to forbid users from setting it low for testing purposes etc. (even though I agree normally a large value would be strongly recommended).

Suggestion for better description:

  If the binlog size in bytes of a transaction exceeds this value, force a binlog rotation and write the transaction to its own binlog file. This can reduce stalls of other commits when binlogging very large transactions, at the cost of creating extra binlog files smaller than the configured --max-binlog-size.

 - Kristian.


Question regarding underscore in Numeric Literals
by Rohan Gupta 08 Jul '24


https://jira.mariadb.org/browse/MDEV-34228

I am trying my hand at this issue, which is a new feature introduced in SQL:2023, and have been unable to determine whether the underscore should only be accepted in queries, or whether it should also appear in the output of the user's queries. I have also been experimenting with the parser in sql_lex.cc. Any pointers or guidance on how to approach this?

Rohan Gupta
