Hi!
Thanks to everyone who has worked on polishing the CI tests and integrations.
Looking at e.g. the 10.11 branch[1] I see that commit ccb7a1e[2] has a
green checkmark next to it and all 15 CI jobs passed.
I just wanted to check: is everyone developing the MariaDB Server
committed to getting the CI to be consistently green?
This means that the GitHub rules need to be a bit stricter, not
allowing any failing test jobs at all, and developers and managers
need to agree to stop adding new commits on any release branch if it
can't be done with a fully passing CI run.
Thanks to Daniel's review of the latest failures we know these are
currently recurring:
* MDEV-25614 Galera test failure on GCF-354 included in MDEV-33073
always green buildbot
* MDEV-33785 aarch64 macos encryption.create_or_replace
* MDEV-33601 galera_3nodes.galera_safe_to_bootstrap test failing
* MDEV-33786 galera_3nodes.galera_vote_rejoin_mysqldump mysql_shutdown
failed at line 85:
I see two approaches to get to consistently green CI:
1) Stop all development and focus on just fixing these, don't continue
until CI is fully green, and once it is fully green make the GitHub
branch protection settings one notch stricter to not allow any new
commits unless the CI is fully green so it never regresses again.
2) Disable these tests and make the rules in GitHub branch protection
one notch stricter right away, and not allow any new commits unless
the CI is fully green ensuring no new recurring failures are
introduced.
- Otto
[1] https://github.com/MariaDB/server/commits/10.11
[2] https://github.com/MariaDB/server/commit/ccb7a1e9a15e6a47aba97f9bdbfab2e4bf…
Hi!
I assume everyone on this mailing list has a GitHub account. Could you
please +1 these PRs that add a mention of MariaDB in the documentation
of various projects that currently only mention MySQL?
Open each link below and press the thumbs up icon in the lower left
corner of the PR description:
https://github.com/bookshelf/bookshelf/pull/2129
https://github.com/ponyorm/pony/pull/708
https://github.com/metabase/metabase/issues/40325
https://github.com/SeaQL/seaql.github.io/pull/116
I am asking this because there are a lot of database tools and
libraries out there that work with both MySQL and MariaDB but only
mention MySQL. That is a pity, because users of those tools might
switch from MariaDB to MySQL simply out of confusion or fear that the
tool supports only MySQL, which is almost never true.
One of the rare cases that does need changes to support MariaDB is the
JavaScript/Node.js Drizzle ORM. There is already a PR for that; you
could +1 it as well:
https://github.com/drizzle-team/drizzle-orm/pull/1692
The Rust Diesel ORM has a similar situation going on:
https://github.com/diesel-rs/diesel/issues/1882
https://github.com/diesel-rs/diesel/pull/3964
However, most projects have no MySQL-specific code at all and work
with MariaDB as-is; they just needed the docs updated, as was done in
these cases:
https://github.com/mysql-net/MySqlConnector/pull/1460
https://github.com/rails/rails/pull/51330
https://github.com/prisma/docs/pull/5706
https://github.com/gregstoll/gallery2/pull/168
https://github.com/coleifer/peewee/pull/2858
Hi!
Thanks for the suggestions. We'll take a look on our side at adding topics
and improving the repo metadata.
On Tue, Mar 19, 2024 at 8:55 AM Ian Gilfillan <ian(a)mariadb.org> wrote:
> Hi!
>
> Thanks for the suggestions. We'll take a look on our side at adding topics
> and improving the repo metadata.
>
> On Tue, Mar 19, 2024 at 5:20 AM Otto Kekäläinen <otto(a)kekalainen.net>
> wrote:
>
>> Hi!
>>
>> I was browsing the GitHub topic 'mariadb' at
>> https://github.com/topics/mariadb and to my surprise MariaDB itself
>> isn't in that list at all. This seems to be due to
>> https://github.com/MariaDB/server not having any topics defined at
>> all.
>>
>> I recommend you add in the GitHub settings of
>> https://github.com/MariaDB/server at least the topic 'mariadb', and
>> while at it, also add link to mariadb.org.
>>
>> More ideas on how to improve the look and features of the GitHub page
>> can be found by browsing projects like https://github.com/pulsar-edit
>> and https://github.com/prisma which have all their GitHub account
>> metadata polished.
>>
>> - Otto
>>
>
Kristian Nielsen via commits <commits(a)lists.mariadb.org> writes:
> From: Brandon Nesterenko <brandon.nesterenko(a)mariadb.com>
>
> This patch extends the command `SHOW REPLICA HOSTS` with three
> columns:
Hi Brandon, please find my review of MDEV-21322 below.
> This patch extends the command `SHOW REPLICA HOSTS` with three
> columns:
>
> 1) Gtid_State_Sent. This represents that latest GTIDs sent to the
> replica in each domain. It will always be populated, regardless of
> the semi-sync status (i.e. asynchronous connections will still
> update this column with the latest GTID state sent to the replica).
>
> 2) Gtid_State_Ack. For semi-synchronous connections (only), this
> column represents the last GTID in each domain that the replica has
> acknowledged.
Use "Pos" instead of "State", eg. Gtid_Pos_Sent and Gtid_Pos_Ack.
To be consistent with eg. gtid_binlog_pos, which has last GTID per
domain_id; vs. gtid_binlog_state, which has GTID per every (domain_id,
server_id) combination.
> @@ -2272,6 +2292,30 @@ send_event_to_slave(binlog_send_info *info, Log_event_type event_type,
> return "Failed to run hook 'after_send_event'";
> }
>
> + if (info->thd->slave_info)
> + {
> + strncpy(info->thd->slave_info->gtid_state_sent.log_file,
> + info->log_file_name + info->dirlen,
> + strlen(info->log_file_name) - info->dirlen);
> + info->thd->slave_info->gtid_state_sent.log_pos= pos;
> + }
Isn't there missing synchronisation here against concurrent access from
another thread?
I guess we need LOCK_thd_data around the strncpy().
Maybe we can avoid taking LOCK_thd_data except in the uncommon case where
the file name changes. It would be best to avoid a mutex lock/unlock for
_every_ event sent to a slave.
The position doesn't need locking I think, you can use an atomic
std::memory_order_relaxed mostly for documentation if you like.
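Untested, and just a sketch using the field names from your patch (with
strmake() instead of strncpy() so the terminating zero is guaranteed), but
roughly what I have in mind:

  if (info->thd->slave_info)
  {
    Slave_info *si= info->thd->slave_info;
    const char *name= info->log_file_name + info->dirlen;
    /* Take LOCK_thd_data only when the file name changes (log rotation) */
    if (strcmp(si->gtid_state_sent.log_file, name))
    {
      mysql_mutex_lock(&info->thd->LOCK_thd_data);
      strmake(si->gtid_state_sent.log_file, name,
              sizeof(si->gtid_state_sent.log_file) - 1);
      mysql_mutex_unlock(&info->thd->LOCK_thd_data);
    }
    /* The position itself can be a plain (or relaxed atomic) store */
    si->gtid_state_sent.log_pos= pos;
  }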
> @@ -125,8 +114,11 @@ int THD::register_slave(uchar *packet, size_t packet_length)
> if (check_access(this, PRIV_COM_REGISTER_SLAVE, any_db.str, NULL,NULL,0,0))
> return 1;
> if (!(si= (Slave_info*)my_malloc(key_memory_SLAVE_INFO, sizeof(Slave_info),
> - MYF(MY_WME))))
> + MYF(MY_WME|MY_ZEROFILL))))
> return 1;
> + memset(si->gtid_state_sent.log_file, '\0', FN_REFLEN);
> + memset(si->gtid_state_ack.log_file, '\0', FN_REFLEN);
I think it's good to add the MY_ZEROFILL here, but then the two memset() are
redundant and can be removed.
> @@ -1235,6 +1251,14 @@ int Repl_semi_sync_master::update_sync_header(THD* thd, unsigned char *packet,
> *need_sync= sync;
>
> l_end:
> + if (is_on())
> + {
> + thd->slave_info->sync_status=
> + sync ? thd->slave_info->sync_status=
> + Slave_info::SYNC_STATE_SEMI_SYNC_ACTIVE
> + : thd->slave_info->sync_status=
> + Slave_info::SYNC_STATE_SEMI_SYNC_STALE;
> + }
There's something weird here, double assignment of
thd->slave_info->sync_status, probably you just meant:
thd->slave_info->sync_status= sync ?
Slave_info::SYNC_STATE_SEMI_SYNC_ACTIVE : Slave_info::SYNC_STATE_SEMI_SYNC_STALE;
More importantly, update_sync_header() is called just _before_ writing the
event on the socket to the connected slave. Is that really the right place
to do so? Shouldn't the status be updated only _after_ the event has been
sent?
Or does it not matter, because the actual TCP sending is asynchronous
anyway? In that case, should add a comment explaining why updating the
status here is ok.
> --- a/mysql-test/suite/rpl/r/rpl_mixed_ddl_dml.result
> +++ b/mysql-test/suite/rpl/r/rpl_mixed_ddl_dml.result
> @@ -11,8 +11,8 @@ n
> 2002
> connection master;
> show slave hosts;
> -Server_id Host Port Master_id
> -2 127.0.0.1 9999 1
> +Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack Sync_Status
> +2 127.0.0.1 9999 1 0-1-2 Asynchronous
Just a remark that this output could be non-deterministic unless the test
guarantees that the state will be stable at this point. So please check that
the test will produce stable output; or if you already did so, then great!
> +--echo # state, and Gtid_State_Sent/Ack should start, and only Gtid_State_Sent
Typo, s/should start/should start empty/ ?
> +--connection server_2
> +--source include/stop_slave.inc
> +set global rpl_semi_sync_slave_enabled = 1;
> +--source include/start_slave.inc
> +
> +--connection server_1
> +let $status_var= Rpl_semi_sync_master_clients;
> +let $status_var_value= 1;
> +source include/wait_for_status_var.inc;
> +
> +--replace_result $SLAVE_MYPORT SLAVE_PORT $SERVER_MYPORT_3 SLAVE2_PORT $DEFAULT_MASTER_PORT DEFAULT_PORT
> +SHOW REPLICA HOSTS;
Hm, won't this output be non-deterministic?
From a quick look in the code, Rpl_semi_sync_master_clients is incremented
from the start of mysql_binlog_send(), but the state is only set from
update_sync_header() when the first event is sent to the slave. So won't
there be a (small) window where the state has not yet switched to ACTIVE
when this SHOW SLAVE HOSTS runs?
So better wait for the state to become ACTIVE instead. Or for the expected
value in Gtid_Pos_Sent, where that is non-empty.
Also, better use SHOW SLAVE HOSTS here, for consistency with rest of test
suite.
> +--let $nextval= `SELECT max(a)+1 from t1`
> +--eval insert into t1 values ($nextval)
Is there a reason here to use separate --let and --eval? And not just:
INSERT INTO t1 SELECT max(a)+1 FROM t1;
> +--echo # Ensuring master gtid_binlog_pos matches Gtid_State_Ack
> +--let $gtid_ack= query_get_value(show slave hosts, Gtid_State_Ack, 1)
> +if (`SELECT strcmp("$master_gtid","$gtid_ack") != 0`)
> +{
> + --echo # Master gtid_binlog_pos: $master_gtid
> + --echo # Gtid_State_Ack: $gtid_ack
> + --die Master's gtid_binlog_pos should match Gtid_State_Ack, but doesn't
Again, sync_with_master_gtid.inc on the slave will not guarantee that the
master saw the ack yet, even though it will be a rare race when it doesn't
happen.
So you'd need to wait for the expected value here with wait_condition.inc
or similar instead, I think.
Probably the same in subsequent parts of the test case.
> +--echo #
> +--echo # 21322.5: When connecting a new slave (server_id 3) which initially has
> +--echo # semi-sync disabled, SHOW SLAVE HOSTS on the master should show its
> +--echo # Sync_Status as asynchronous (while server_id 2 is still semi-sync
> +--echo # active).
Sorry, it's a very long and complex test case, I didn't have the time today
to carefully read and check every detail in it after this point. Please
carefully check all the cases for possible races like described above.
Replication tests are notorious for having non-deterministic failures (as
I'm sure you've experienced first-hand), and it's important to try to avoid
that as much as possible.
A general remark on the test, the output of Gtid_Pos_Sent and Gtid_Pos_Ack
are full GTID positions (multiple domain_id). But there is no test of having
more than one domain in the output. I think you need to extend the test to
test at least one case where there are multiple domains. And also need to
test when the GTID state in the binlog has multiple server_id (though maybe
the test case already does this, I didn't check myself).
- Kristian.
Re: eba212af92d: MDEV-9101 Limit size of created disk temporary files and tables
by Sergei Golubchik 08 May '24
Hi, Monty,
On Apr 18, Michael Widenius wrote:
> commit eba212af92d
> Author: Michael Widenius <monty(a)mariadb.org>
> Date: Thu Mar 14 17:59:00 2024 +0100
>
> diff --git a/sql/json_table.cc b/sql/json_table.cc
> index 6713e796eb7..da5ab0c450a 100644
> --- a/sql/json_table.cc
> +++ b/sql/json_table.cc
> @@ -734,7 +734,8 @@ bool Create_json_table::finalize(THD *thd, TABLE *table,
>
> table->db_stat= HA_OPEN_KEYFILE;
> if (unlikely(table->file->ha_open(table, table->s->path.str, O_RDWR,
> - HA_OPEN_TMP_TABLE | HA_OPEN_INTERNAL_TABLE)))
> + HA_OPEN_TMP_TABLE | HA_OPEN_INTERNAL_TABLE |
> + HA_OPEN_SIZE_TRACKING)))
shouldn't HA_OPEN_INTERNAL_TABLE always imply HA_OPEN_SIZE_TRACKING
automatically?
> DBUG_RETURN(true);
>
> table->set_created();
> diff --git a/storage/maria/ma_statrec.c b/storage/maria/ma_statrec.c
> index d8a8b0a05d7..ff0c6c7a617 100644
> --- a/storage/maria/ma_statrec.c
> +++ b/storage/maria/ma_statrec.c
> @@ -45,6 +45,13 @@ my_bool _ma_write_static_record(MARIA_HA *info, const uchar *record)
> my_errno=HA_ERR_RECORD_FILE_FULL;
> return(2);
> }
> + if (info->s->tracked &&
are there temporary files that you don't want to track?
are there non-temporary files that you want to track?
> + _ma_update_tmp_file_size(&info->s->track_data,
> + MY_ALIGN(info->s->state.state.data_file_length+
> + info->s->base.pack_reclength,
> + MARIA_TRACK_INCREMENT_SIZE)))
> + return(2);
> +
> if (info->opt_flag & WRITE_CACHE_USED)
> { /* Cash in use */
> if (my_b_write(&info->rec_cache, record,
> diff --git a/storage/maria/ma_page.c b/storage/maria/ma_page.c
> index 5881456a69a..0877c966d1c 100644
> --- a/storage/maria/ma_page.c
> +++ b/storage/maria/ma_page.c
> @@ -417,6 +417,13 @@ my_off_t _ma_new(register MARIA_HA *info, int level,
> DBUG_RETURN(HA_OFFSET_ERROR);
> }
> share->state.state.key_file_length+= block_size;
> + if (info->s->tracked &&
> + _ma_update_tmp_file_size(&share->track_index,
> + share->state.state.key_file_length))
shouldn't you do it before share->state.state.key_file_length
is increased? Currently it seems that you're failing
the operation, but the file size will stay increased.
The same pattern appears everywhere else.
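That is, check the quota against the prospective size first and only then
grow it, something like (untested):

  if (info->s->tracked &&
      _ma_update_tmp_file_size(&share->track_index,
                               share->state.state.key_file_length +
                               block_size))
  {
    mysql_mutex_unlock(&share->intern_lock);
    DBUG_RETURN(HA_OFFSET_ERROR);    /* key_file_length left unchanged */
  }
  share->state.state.key_file_length+= block_size;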
> + {
> + mysql_mutex_unlock(&share->intern_lock);
> + DBUG_RETURN(HA_OFFSET_ERROR);
> + }
> /* Following is for not transactional tables */
> info->state->key_file_length= share->state.state.key_file_length;
> mysql_mutex_unlock(&share->intern_lock);
> diff --git a/storage/maria/ma_delete_all.c b/storage/maria/ma_delete_all.c
> index f355d0da3e8..ed2087d3091 100644
> --- a/storage/maria/ma_delete_all.c
> +++ b/storage/maria/ma_delete_all.c
> @@ -132,9 +132,16 @@ int maria_delete_all_rows(MARIA_HA *info)
> if (_ma_flush_table_files(info, MARIA_FLUSH_DATA|MARIA_FLUSH_INDEX,
> FLUSH_IGNORE_CHANGED, FLUSH_IGNORE_CHANGED) ||
> mysql_file_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
> - mysql_file_chsize(share->kfile.file, share->base.keystart, 0, MYF(MY_WME)))
> + mysql_file_chsize(share->kfile.file, share->base.keystart, 0,
> + MYF(MY_WME)))
should be > 0 here, shouldn't it? Both for kfile and dfile.
> goto err;
>
> + if (info->s->tracked)
> + {
> + _ma_update_tmp_file_size(&info->s->track_data, 0);
> + _ma_update_tmp_file_size(&info->s->track_index, share->base.keystart);
but this shouldn't update the size if mysql_file_chsize returned -1
> + }
> +
> if (_ma_initialize_data_file(share, info->dfile.file))
> goto err;
>
> diff --git a/sql/uniques.cc b/sql/uniques.cc
> index 36725e80a6b..44d93bb0f88 100644
> --- a/sql/uniques.cc
> +++ b/sql/uniques.cc
> @@ -720,7 +720,7 @@ bool Unique::merge(TABLE *table, uchar *buff, size_t buff_size,
> /* Open cached file for table records if it isn't open */
> if (! my_b_inited(outfile) &&
> open_cached_file(outfile, mysql_tmpdir, TEMP_PREFIX, DISK_CHUNK_SIZE,
> - MYF(MY_WME)))
> + MYF(MY_WME | MY_TRACK | MY_TRACK_WITH_LIMIT)))
as far as I can see, open_cached_file is only used for temporary files.
it should apply MY_TRACK | MY_TRACK_WITH_LIMIT internally, it'll
be less error-prone
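that is, inside open_cached_file() itself (a sketch of the same place you
touch in mf_cache.c below):

  if (!init_io_cache(cache, -1, cache_size, WRITE_CACHE, 0L, 0,
                     MYF(cache_myflags | MY_NABP |
                         MY_TRACK | MY_TRACK_WITH_LIMIT)))
    DBUG_RETURN(0);

then no caller has to remember to pass the flags.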
> return 1;
>
> bzero((char*) &sort_param,sizeof(sort_param));
> diff --git a/storage/maria/ma_write.c b/storage/maria/ma_write.c
> index 9a6859bfd4d..7404db84067 100644
> --- a/storage/maria/ma_write.c
> +++ b/storage/maria/ma_write.c
> @@ -386,6 +387,11 @@ int maria_write(MARIA_HA *info, const uchar *record)
> }
> }
> }
> + else if (my_errno == HA_ERR_LOCAL_TMP_SPACE_FULL ||
> + my_errno == HA_ERR_GLOBAL_TMP_SPACE_FULL)
> + {
> + filepos= HA_OFFSET_ERROR; /* Avoid write_record_abort() */
why avoid? HA_ERR_LOCAL_TMP_SPACE_FULL is like EDQUOT or ENOSPC.
why should it be masked as HA_OFFSET_ERROR, and not be a fatal error?
> + }
> else
> fatal_error= 1;
>
> diff --git a/storage/maria/ma_close.c b/storage/maria/ma_close.c
> index 7441e29a97b..47935d61b88 100644
> --- a/storage/maria/ma_close.c
> +++ b/storage/maria/ma_close.c
> @@ -14,7 +14,7 @@
> along with this program; if not, write to the Free Software
> Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA */
>
> -/* close a isam-database */
> +/* close an Aria table */
at last :) it's what, 27 years old?
> /*
> TODO:
> We need to have a separate mutex on the closed file to allow other threads
> diff --git a/sql/sql_window.cc b/sql/sql_window.cc
> index e0c7cef1a9e..4959c7833a1 100644
> --- a/sql/sql_window.cc
> +++ b/sql/sql_window.cc
> @@ -2922,15 +2922,16 @@ bool compute_window_func(THD *thd,
> if (unlikely(thd->is_error() || thd->is_killed()))
> break;
>
> -
> /* Return to current row after notifying cursors for each window
> function. */
> - tbl->file->ha_rnd_pos(tbl->record[0], rowid_buf);
> + if (tbl->file->ha_rnd_pos(tbl->record[0], rowid_buf))
> + return true;
how can it fail?
> }
>
> /* We now have computed values for each window function. They can now
> be saved in the current row. */
> - save_window_function_values(window_functions, tbl, rowid_buf);
> + if (save_window_function_values(window_functions, tbl, rowid_buf))
> + return true;
>
> rownum++;
> }
> diff --git a/mysql-test/main/status.test b/mysql-test/main/status.test
> index acf29f35b9e..b6be046a9ac 100644
> --- a/mysql-test/main/status.test
> +++ b/mysql-test/main/status.test
> @@ -244,9 +244,9 @@ let $rnd_next = `show global status like 'handler_read_rnd_next'`;
> let $tmp_table = `show global status like 'Created_tmp_tables'`;
> show status like 'com_show_status';
> show status like 'hand%write%';
> -show status like '%tmp%';
> +show status like '%_tmp%';
Doesn't look like a very meaningful change. Did you mean '%\_tmp%' ?
> show status like 'hand%write%';
> -show status like '%tmp%';
> +show status like '%_tmp%';
> show status like 'com_show_status';
> let $rnd_next2 = `show global status like 'handler_read_rnd_next'`;
> let $tmp_table2 = `show global status like 'Created_tmp_tables'`;
> diff --git a/mysys/mf_cache.c b/mysys/mf_cache.c
> index 2fec59f4e70..712187f6181 100644
> --- a/mysys/mf_cache.c
> +++ b/mysys/mf_cache.c
> @@ -43,7 +43,7 @@ my_bool open_cached_file(IO_CACHE *cache, const char* dir, const char *prefix,
> cache->file_name=0;
> cache->buffer=0; /* Mark that not open */
> if (!init_io_cache(cache, -1, cache_size, WRITE_CACHE, 0L, 0,
> - MYF(cache_myflags | MY_NABP)))
> + MYF(cache_myflags | MY_NABP | MY_TRACK)))
Why not MY_TRACK_WITH_LIMIT ?
> {
> DBUG_RETURN(0);
> }
> diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> index 6f2797b6b39..324f7679e8f 100644
> --- a/sql/sql_select.cc
> +++ b/sql/sql_select.cc
> @@ -22275,7 +22275,8 @@ bool open_tmp_table(TABLE *table)
> int error;
> if (unlikely((error= table->file->ha_open(table, table->s->path.str, O_RDWR,
> HA_OPEN_TMP_TABLE |
> - HA_OPEN_INTERNAL_TABLE))))
> + HA_OPEN_INTERNAL_TABLE |
> + HA_OPEN_SIZE_TRACKING))))
shouldn't HA_OPEN_TMP_TABLE or HA_OPEN_INTERNAL_TABLE imply
HA_OPEN_SIZE_TRACKING?
> {
> table->file->print_error(error, MYF(0)); /* purecov: inspected */
> table->db_stat= 0;
> diff --git a/sql/handler.cc b/sql/handler.cc
> index fec473d97bb..a05b7476724 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -443,9 +444,12 @@ int ha_init_errors(void)
> /* Zerofill it to avoid uninitialized gaps. */
remove the comment, if you no longer zerofill
> if (! (handler_errmsgs= (const char**) my_malloc(key_memory_handler_errmsgs,
> HA_ERR_ERRORS * sizeof(char*),
> - MYF(MY_WME | MY_ZEROFILL))))
> + MYF(MY_WME))))
> return 1;
>
> + /* Copy default handler error messages */
> + memcpy(handler_errmsgs, handler_error_messages, HA_ERR_ERRORS * sizeof(char*));
> +
> /* Set the dedicated error messages. */
> SETMSG(HA_ERR_KEY_NOT_FOUND, ER_DEFAULT(ER_KEY_NOT_FOUND));
> SETMSG(HA_ERR_FOUND_DUPP_KEY, ER_DEFAULT(ER_DUP_KEY));
> diff --git a/storage/maria/maria_def.h b/storage/maria/maria_def.h
> index 705562eb795..b256466f1f0 100644
> --- a/storage/maria/maria_def.h
> +++ b/storage/maria/maria_def.h
> @@ -778,6 +785,14 @@ typedef struct st_maria_share
> myf write_flag;
> enum data_file_type data_file_type;
> enum pagecache_page_type page_type; /* value depending transactional */
> +
> + /*
> + tracked will cause lost bytes (not aligned) but this is ok as it is always
> + used with tmp_file_tracking if set
I don't understand your explanation why "this is ok"
> + */
> + my_bool tracked; /* Tracked table (always internal) */
> + struct tmp_file_tracking track_data,track_index;
> +
> /**
> if Checkpoint looking at table; protected by close_lock or THR_LOCK_maria
> */
> diff --git a/include/my_sys.h b/include/my_sys.h
> index af509c7df45..0df12963ab9 100644
> --- a/include/my_sys.h
> +++ b/include/my_sys.h
> @@ -178,6 +180,17 @@ uchar *my_large_malloc(size_t *size, myf my_flags);
> void my_large_free(void *ptr, size_t size);
> void my_large_page_truncate(size_t *size);
>
> +/* Tracking tmp file usage */
> +
> +struct tmp_file_tracking
> +{
> + ulonglong last_position;
I had to read how "last_position" is used to realize that it's
in fact the previous file_size value. Why not call it previous_file_size?
Or recorded_file_size? Or old_file_size? It's not a "position".
> + ulonglong file_size;
> +};
> +
> +typedef int (*TMPFILE_SIZE_CB)(struct tmp_file_tracking *track, int no_error);
> +extern TMPFILE_SIZE_CB update_tmp_file_size;
> +
> #ifdef _WIN32
> extern BOOL my_obtain_privilege(LPCSTR lpPrivilege);
> #endif
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index c1efb49f6da..42eee9913ec 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -1070,12 +1074,22 @@ typedef struct system_status_var
> */
>
> #define last_system_status_var questions
> -#define last_cleared_system_status_var local_memory_used
> +
> +/* Parameters to set_status_var_init() */
> +
> +#define STATUS_OFFSET(A) offsetof(STATUS_VAR,A)
> +/* Clear as part of flush */
> +#define clear_up_to_tmp_space_used STATUS_OFFSET(tmp_space_used)
> +/* Clear as part of startup */
> +#define clear_up_to_memory_used STATUS_OFFSET(local_memory_used)
> +/* Full initialization. Note that global_memory_used is updated early! */
> +#define clear_up_to_global_memory_used STATUS_OFFSET(global_memory_used)
> +#define last_restored_status_var clear_up_to_tmp_space_used
Those are bad names, and the fact that you need to rename these defines
and change all the code that uses them kind of shows they're not good.
When a new variable is added, clear_up_to_tmp_space_used may have to
become clear_up_to_new_variable and everything will need to be
changed again.
Better use names that don't depend on individual variables.
last_restored_status_var and last_cleared_system_status_var are good.
Try to split the variables logically. First there are additive
variables - the global value is the sum of all local values. They're
cleared, they're added up to the global, etc.
Then there are non-additive variables, not cleared, not added - like
local_memory_used and tmp_space_used, for example. And global_memory_used
is special because it's initialized early, right?
So maybe defines based on these properties would be easier to understand,
because now I'm totally at a loss when to use clear_up_to_tmp_space_used,
when clear_up_to_memory_used, when clear_up_to_global_memory_used,
and when last_restored_status_var.
> +
>
> /** Number of contiguous global status variables */
> -constexpr int COUNT_GLOBAL_STATUS_VARS= int(offsetof(STATUS_VAR,
> - last_system_status_var) /
> - sizeof(ulong)) + 1;
> +constexpr int COUNT_GLOBAL_STATUS_VARS=
> + int(STATUS_OFFSET(last_system_status_var) /sizeof(ulong)) + 1;
>
> /*
> Global status variables
> diff --git a/sql/filesort.cc b/sql/filesort.cc
> index abcd4127a0b..df4508b635a 100644
> --- a/sql/filesort.cc
> +++ b/sql/filesort.cc
> @@ -1082,6 +1078,11 @@ static ha_rows find_all_keys(THD *thd, Sort_param *param, SQL_SELECT *select,
> DBUG_RETURN(num_records);
>
> err:
> + if (!quick_select)
> + {
> + (void) file->extra(HA_EXTRA_NO_CACHE);
> + file->ha_rnd_end();
> + }
why?
> sort_form->column_bitmaps_set(save_read_set, save_write_set);
> DBUG_RETURN(HA_POS_ERROR);
> } /* find_all_keys */
> diff --git a/sql/mysqld.cc b/sql/mysqld.cc
> index dc4a955aa79..4727b7b5413 100644
> --- a/sql/mysqld.cc
> +++ b/sql/mysqld.cc
> @@ -1947,6 +1950,10 @@ static void mysqld_exit(int exit_code)
> if (exit_code == 0 || opt_endinfo)
> SAFEMALLOC_REPORT_MEMORY(0);
> }
> + if (global_tmp_space_used)
> + fprintf(stderr, "Warning: tmp_space not freed: %llu\n",
> + (ulonglong) global_tmp_space_used);
This is confusing. Almost all (if not all) temp space is automatically
freed by the OS when the process exits, so this warning will only
scare the user about lost space when in fact no space is lost;
it's only that some space was not accounted for inside MariaDB.
I suggest either making this warning debug-only or rephrasing it like
"Note: %llu tmp_space was lost and will be freed automatically when the
process exits"
> +
> DBUG_LEAVE;
> #ifdef _WIN32
> my_report_svc_status(SERVICE_STOPPED, exit_code, 0);
> @@ -7571,6 +7636,7 @@ SHOW_VAR status_vars[]= {
> {"Tc_log_page_size", (char*) &tc_log_page_size, SHOW_LONG_NOFLUSH},
> {"Tc_log_page_waits", (char*) &tc_log_page_waits, SHOW_LONG},
> #endif
> + {"tmp_space_used", (char*) offsetof(STATUS_VAR, tmp_space_used), SHOW_LONGLONG_STATUS},
first letter - upper case
> #ifdef HAVE_POOL_OF_THREADS
> {"Threadpool_idle_threads", (char *) &show_threadpool_idle_threads, SHOW_SIMPLE_FUNC},
> {"Threadpool_threads", (char *) &show_threadpool_threads, SHOW_SIMPLE_FUNC},
> diff --git a/sql/log.cc b/sql/log.cc
> index 460cefea47b..a9bc3758dd5 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -2388,6 +2384,8 @@ bool Event_log::check_write_error(THD *thd)
> case ER_TRANS_CACHE_FULL:
> case ER_STMT_CACHE_FULL:
> case ER_ERROR_ON_WRITE:
> + case EE_LOCAL_TMP_SPACE_FULL:
> + case EE_GLOBAL_TMP_SPACE_FULL:
not a good idea to mix EE_ OS-like errors and ER_ SQL layer errors
> case ER_BINLOG_LOGGING_IMPOSSIBLE:
> checked= TRUE;
> break;
> @@ -6650,7 +6651,8 @@ Event_log::flush_and_set_pending_rows_event(THD *thd, Rows_log_event* event,
> {
> set_write_error(thd, is_transactional);
> if (check_cache_error(thd, cache_data) &&
> - stmt_has_updated_non_trans_table(thd))
> + (stmt_has_updated_non_trans_table(thd) ||
> + !is_transactional))
why?
> cache_data->set_incident();
> delete pending;
> cache_data->set_pending(NULL);
> @@ -7868,8 +7877,16 @@ int Event_log::write_cache(THD *thd, binlog_cache_data *cache_data)
> DBUG_RETURN(res ? ER_ERROR_ON_WRITE : 0);
> }
>
> - if (reinit_io_cache(cache, READ_CACHE, 0, 0, 0))
> + /*
> + Allow flush of transaction logs to temporary go over the tmp space limit
> + as we do not want the commit to fail
> + */
> + cache->myflags&= ~MY_TRACK_WITH_LIMIT;
> + res= reinit_io_cache(cache, READ_CACHE, 0, 0, 0);
> + cache->myflags|= MY_TRACK_WITH_LIMIT;
you also remove MY_TRACK_WITH_LIMIT inside reinit_io_cache(),
so this is redundant. Or it's redundant inside reinit_io_cache(),
either way, no point in doing it twice
> + if (res)
> DBUG_RETURN(ER_ERROR_ON_WRITE);
> +
> /* Amount of remaining bytes in the IO_CACHE read buffer. */
> size_t log_file_pos;
> uchar header_buf[LOG_EVENT_HEADER_LEN];
> diff --git a/mysys/mf_iocache.c b/mysys/mf_iocache.c
> index 4ee1331bdb3..6214057f9e3 100644
> --- a/mysys/mf_iocache.c
> +++ b/mysys/mf_iocache.c
> @@ -73,6 +74,50 @@ int (*_my_b_encr_read)(IO_CACHE *info,uchar *Buffer,size_t Count)= 0;
> int (*_my_b_encr_write)(IO_CACHE *info,const uchar *Buffer,size_t Count)= 0;
>
>
> +static inline my_bool tmp_file_track(IO_CACHE *info, ulonglong file_size)
> +{
> + if ((info->myflags & (MY_TRACK | MY_TRACK_WITH_LIMIT)) &&
> + update_tmp_file_size)
> + {
> + if (info->tracking.file_size < file_size)
> + {
> + int error;
> + info->tracking.file_size= file_size;
> + if ((error= update_tmp_file_size(&info->tracking,
> + !(info->myflags &
> + MY_TRACK_WITH_LIMIT))))
> + {
> + if (info->myflags & MY_WME)
> + my_error(my_errno, MYF(0));
> + info->error= -1;
> + return 1;
> + }
> + }
> + }
> + return 0;
> +}
> +
> +my_bool io_cache_tmp_file_track(IO_CACHE *info, ulonglong file_size)
> +{
> + return tmp_file_track(info, file_size);
> +}
> +
> +
> +void end_tracking_io_cache(IO_CACHE *info)
what does "end tracking IO_CACHE" mean?
I read it as "end tracking of this IO_CACHE, it is no longer
tracked after this function call" - but that's not what it is doing.
So how do you read this function name?
> +{
> + if ((info->myflags & (MY_TRACK | MY_TRACK_WITH_LIMIT)) &&
> + info->tracking.file_size)
> + {
> + info->tracking.file_size= 0;
> + update_tmp_file_size(&info->tracking, 1);
> + }
> +}
> +
> +void truncate_io_cache(IO_CACHE *info)
> +{
> + if (my_chsize(info->file, 0, 0, MYF(MY_WME)) <= 0)
> + end_tracking_io_cache(info);
not quite. If my_chsize() returns -1, you should not do
info->tracking.file_size= 0, because file size wasn't actually changed.
> +}
>
> static void
> init_functions(IO_CACHE* info)
Regards,
Sergei
Chief Architect, MariaDB Server
and security(a)mariadb.org
Re: 0d6dc52f43d: MDEV-30046 Refactor write_record and fix idempotent replication
by Sergei Golubchik 04 May '24
Hi, Nikita,
On Apr 29, Nikita Malyavin wrote:
> revision-id: 0d6dc52f43d (mariadb-10.5.24-175-g0d6dc52f43d)
> parent(s): 652e9ee405e
> author: Nikita Malyavin
> committer: Nikita Malyavin
> timestamp: 2024-04-27 21:08:21 +0200
> message:
>
> MDEV-30046 Refactor write_record and fix idempotent replication
>
> Idempotent write_row works same as REPLACE: if there is a duplicating
> record in the table, then it will be deleted and re-inserted, with the
> same update optimization.
>
> The code in Rows:log_event::write_row was basically copy-pasted from
> write_record.
>
> What's done:
> REPLACE operation was unified across replication and sql. It is now
> represented as a Write_record class, that holds the whole state, and allows
> re-using some resources in between the row writes.
>
> Replace, IODKU and single insert implementations are split across different
> methods, resulting in much cleaner code.
>
> The entry point is preserved as a single Write_record::write_record() call.
> The implementation to call is chosen on the constructor stage.
>
> This allowed several optimizations to be done:
> 1. The table key list is not iterated for every row. We find last unique key in
> the order of checking once and preserve it across the rows. See last_uniq_key().
> 2. ib_handler::referenced_by_foreign_key acquires a global lock. This call was
> done per row as well. Now all the table config that allows optimized replace is
> folded into a single boolean field can_optimize. All the fields to check are
> even stored in a single register on a 64-bit platform.
> 3. DUP_REPLACE and DUP_UPDATE cases now have one less level of indirection
> 4. modified_non_trans_tables is checked and set only when it's really needed.
> 5. Obsolete bitmap manipulations are removed.
>
> Also:
> * Unify replace initialization step across implementations:
> add prepare_for_replace and finalize_replace
> * alloca is removed in favor of mem_root allocation. This memory is reused
> across the rows.
> * An rpl-related callback is added to the replace branch, meaning that an extra
> check is made per row replace even for the common case. It can be avoided with
> templates if considered a problem.
>
> diff --git a/mysql-test/main/long_unique_bugs_replication.test b/mysql-test/main/long_unique_bugs_replication.test
> index 9c44d13e6a5..f2225cde101 100644
> --- a/mysql-test/main/long_unique_bugs_replication.test
> +++ b/mysql-test/main/long_unique_bugs_replication.test
> +
> +--echo #
> +--echo # End of 10.4 tests
> +--echo #
...
> --echo #
> --echo # End of 10.4 tests
> --echo #
maybe 10.5?
> diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
> index 38e8b062f18..b5810eeaf7b 100644
> --- a/sql/log_event_server.cc
> +++ b/sql/log_event_server.cc
> @@ -5705,10 +5708,10 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi)
> &m_cols_ai : &m_cols);
> bitmap_intersect(table->write_set, after_image);
>
> - this->slave_exec_mode= slave_exec_mode_options; // fix the mode
> -
> + COPY_INFO copy_info;
> + Write_record write_record;
> // Do event specific preparations
> - error= do_before_row_operations(rli);
> + error= do_before_row_operations(rgi, &copy_info, &write_record);
You create write_record on the stack and save it inside this->m_write_record
1. why not have it permanently as a part of Rows_log_event?
2. if you'll keep it on the stack, please, reset m_write_record=0
before returning from ::do_apply_event(). And maybe even do
m_write_record= &write_record here, not in ::do_before_row_operations()
it'll be a bit easier to see how a local variable is stored and then freed,
and that it doesn't actually leave the scope.
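For 2., roughly (untested, just to make the stack ownership visible in one
place):

  COPY_INFO copy_info;
  Write_record write_record;
  m_write_record= &write_record;   // set here, next to the local variable
  error= do_before_row_operations(rgi, &copy_info, &write_record);
  ...
  m_write_record= NULL;            // reset before do_apply_event() returns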
>
> /*
> Bug#56662 Assertion failed: next_insert_id == 0, file handler.cc
> @@ -7132,6 +7148,40 @@ Write_rows_log_event::do_before_row_operations(const Slave_reporting_capability
> m_table->mark_auto_increment_column(true);
> }
>
> + if (slave_exec_mode == SLAVE_EXEC_MODE_IDEMPOTENT &&
> + (m_table->file->ha_table_flags() & HA_DUPLICATE_POS ||
> + m_table->s->long_unique_table))
> + error= m_table->file->ha_rnd_init_with_error(0);
I'd put this inside if (slave_exec_mode == SLAVE_EXEC_MODE_IDEMPOTENT)
about ~40 lines above the place where you put it now.
> +
> + if (!error)
> + {
> + bzero(copy_info, sizeof *copy_info);
> + copy_info->handle_duplicates=
> + slave_exec_mode == SLAVE_EXEC_MODE_IDEMPOTENT ?
> + DUP_REPLACE : DUP_ERROR;
simply copy_info->handle_duplicates= thd->lex->duplicates;
> + copy_info->table_list= m_table->pos_in_table_list;
> +
> + int (*callback)(void *, void*)= NULL;
> + if (!get_flags(COMPLETE_ROWS_F))
> + {
> + /*
> + If row is incomplete we will use the record found to fill
> + missing columns.
> + */
> + callback= [](void *e, void* r)->int {
> + auto rgi= static_cast<rpl_group_info*>(r);
> + auto event= static_cast<Write_rows_log_event*>(e);
> + return event->incomplete_record_callback(rgi);
> + };
> + }
> + new (write_record) Write_record(thd, m_table, copy_info,
> + m_vers_from_plain &&
> + m_table->versioned(VERS_TIMESTAMP),
> + m_table->triggers && do_invoke_trigger(),
> + NULL, callback, this, rgi);
please, use write_record->init(...) here. And in all similar places below,
I won't comment on them separately.
> + m_write_record= write_record;
> + }
> +
> return error;
> }
>
> @@ -7173,7 +7223,15 @@ Write_rows_log_event::do_after_row_operations(const Slave_reporting_capability *
> {
> m_table->file->print_error(local_error, MYF(0));
> }
> - return error? error : local_error;
> + int rnd_error= 0;
> + if (m_table->file->inited)
> + {
> + DBUG_ASSERT(slave_exec_mode == SLAVE_EXEC_MODE_IDEMPOTENT);
> + DBUG_ASSERT(m_table->file->ha_table_flags() & HA_DUPLICATE_POS ||
> + m_table->s->long_unique_table);
DBUG_ASSERT(m_table->file->inited == handler::RND);
> + rnd_error= m_table->file->ha_rnd_end();
> + }
> + return error? error : local_error ? local_error : rnd_error;
> }
>
> bool Rows_log_event::process_triggers(trg_event_type event,
> diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc
> index ff71361493a..e410efda121 100644
> --- a/sql/sql_insert.cc
> +++ b/sql/sql_insert.cc
> @@ -677,6 +678,30 @@ Field **TABLE::field_to_fill()
> return triggers && triggers->nullable_fields() ? triggers->nullable_fields() : field;
> }
>
> +int prepare_for_replace(TABLE *table, enum_duplicates handle_duplicates,
> + bool ignore)
> +{
> + bool create_lookup_handler= handle_duplicates != DUP_ERROR;
why do you initialize create_lookup_handler here
and then immediately repeat it two lines below? :)
less confusing would be
bool create_lookup_handler= handle_duplicates != DUP_ERROR || ignore;
if (create_lookup_handler)
{
> + if (ignore || handle_duplicates != DUP_ERROR)
> + {
> + create_lookup_handler= true;
> + table->file->extra(HA_EXTRA_IGNORE_DUP_KEY);
> + if (table->file->ha_table_flags() & HA_DUPLICATE_POS ||
> + table->s->long_unique_table)
> + {
> + if (table->file->ha_rnd_init_with_error(false))
> + return 1;
> + }
> + }
> + return table->file->prepare_for_insert(create_lookup_handler);
> +}
> +
> +int finalize_replace(TABLE *table, enum_duplicates handle_duplicates,
> + bool ignore)
> +{
DBUG_ASSERT(table->file->inited != handler::INDEX);
> + return table->file->inited ? table->file->ha_rnd_end() : 0;
> +}
> +
>
> /**
> INSERT statement implementation
> @@ -1728,31 +1745,74 @@ int mysql_prepare_insert(THD *thd, TABLE_LIST *table_list,
> }
>
>
> /* Check if there is more uniq keys after field */
this comment isn't correct anymore, is it?
on the other hand, the function is small and the name is clear,
I think you can simply delete the comment, if you'd like.
> +static ushort get_last_unique_key(TABLE *table, KEY *key_info)
> +{
> + for (uint key= table->s->keys; key; --key)
> + if (key_info[key-1].flags & HA_NOSAME)
> + return key-1;
> + return MAX_KEY;
> +}
>
> +/**
> + An update optimization can be done in REPLACE only when certain criterial met.
> + In particular, if the "error key" was the last one to check. If not, there can
> + be other keys that can fail the uniqueness check after we will delete the row.
> +
> + The problem arises with the order of the key checking. It is divided in two
> + parts:
> + 1. Higher-level handler::ha_write_row goes first. It checks for long unique
> + and period overlaps (the latter is not supported by REPLACE yet).
better "(when the latter will be supported by REPLACE)"
this way the comment will be less wrong and misleading when we finally
implement this support but forget to update the comment :)
> + 2. Engine usual unique index check.
> + However, the order of the keys in TABLE_SHARE::key_info is:
> + * first, engine unique keys
> + * then long unique, which is not seen by engine
> + WITHOUT OVERLAPS key has no particular order
> + see sort_keys in sql_table.cc
> +
> + So we need to wrap this order.
> + @return last unique key to check for duplication, or MAX_KEY if none.
> + */
> -static int last_uniq_key(TABLE *table, const KEY *key, uint keynr)
> +ushort Write_record::get_last_unique_key() const
> {
> + if (info->handle_duplicates != DUP_REPLACE)
> + return MAX_KEY; // Not used.
> + if (!table->s->key_info)
> + return MAX_KEY;
> /*
> When an underlying storage engine informs that the unique key
> conflicts are not reported in the ascending order by setting
> the HA_DUPLICATE_KEY_NOT_IN_ORDER flag, we cannot rely on this
> information to determine the last key conflict.
>
> The information about the last key conflict will be used to
> do a replace of the new row on the conflicting row, rather
> than doing a delete (of old row) + insert (of new row).
>
> Hence check for this flag and disable replacing the last row
> - by returning 0 always. Returning 0 will result in doing
> + by returning MAX_KEY always. Returning MAX_KEY will result in doing
> a delete + insert always.
> */
> if (table->file->ha_table_flags() & HA_DUPLICATE_KEY_NOT_IN_ORDER)
> - return 0;
> + return MAX_KEY;
>
> - while (++keynr < table->s->keys)
> - if (key[keynr].flags & HA_NOSAME)
> - return 0;
> - return 1;
> + /*
> + Note, that TABLE_SHARE and TABLE see long uniques differently:
> + - TABLE_SHARE sees as HA_KEY_ALG_LONG_HASH and HA_NOSAME
> + - TABLE sees as usual non-unique indexes
> + */
> + return
> + table->key_info[0].flags & HA_NOSAME ?
> + /*
> + We have an in-engine unique, which should be checked last.
> + Go through the TABLE list, which knows nothing about long uniques.
> + */
> + ::get_last_unique_key(table, table->key_info) :
> + /*
> + We can only have long uniques.
> + Go through the TABLE_SHARE list, where long uniques can be found.
> + */
> + ::get_last_unique_key(table, table->s->key_info);
> }
>
>
> @@ -1786,449 +1846,511 @@ int vers_insert_history_row(TABLE *table)
> }
>
> /*
> - Write a record to table with optional deleting of conflicting records,
> - invoke proper triggers if needed.
> -
> - SYNOPSIS
> - write_record()
> - thd - thread context
> - table - table to which record should be written
> - info - COPY_INFO structure describing handling of duplicates
> - and which is used for counting number of records inserted
> - and deleted.
> - sink - result sink for the RETURNING clause
> -
> - NOTE
> - Once this record will be written to table after insert trigger will
> - be invoked. If instead of inserting new record we will update old one
> - then both on update triggers will work instead. Similarly both on
> - delete triggers will be invoked if we will delete conflicting records.
> -
> - Sets thd->transaction.stmt.modified_non_trans_table to TRUE if table which
> - is updated didn't have transactions.
> -
> - RETURN VALUE
> - 0 - success
> - non-0 - error
> -
> -int write_record(THD *thd, TABLE *table, COPY_INFO *info, select_result *sink)
> -{
> - int error, trg_error= 0;
> - char *key=0;
> - MY_BITMAP *save_read_set, *save_write_set;
> - ulonglong prev_insert_id= table->file->next_insert_id;
> - ulonglong insert_id_for_cur_row= 0;
> - ulonglong prev_insert_id_for_cur_row= 0;
> - DBUG_ENTER("write_record");
> + Retrieve the old (duplicated) row into record[1] to be able to
> + either update or delete the offending record. We either:
> +
> + - use ha_rnd_pos() with a row-id (available as dup_ref) to the
> + offending row, if that is possible (MyISAM and Blackhole, also long unique
> + and application-time periods), or else
> +
> + - use ha_index_read_idx_map() with the key that is duplicated, to
> + retrieve the offending row.
> */
> +int Write_record::locate_dup_record()
> +{
> + handler *h= table->file;
> + int error= 0;
> + if (h->ha_table_flags() & HA_DUPLICATE_POS || h->lookup_errkey != (uint)-1)
> + {
> + DBUG_PRINT("info", ("Locating offending record using rnd_pos()"));
> +
> + error= h->ha_rnd_pos(table->record[1], h->dup_ref);
> + if (unlikely(error))
> + {
> + DBUG_PRINT("info", ("rnd_pos() returns error %d",error));
> + h->print_error(error, MYF(0));
> + }
> + }
> + else
> + {
> + DBUG_PRINT("info",
> + ("Locating offending record using ha_index_read_idx_map"));
>
> + if (h->lookup_handler)
> + h= h->lookup_handler;
Why?
> +
> + error= h->extra(HA_EXTRA_FLUSH_CACHE);
> + if (unlikely(error))
> + {
> + DBUG_PRINT("info",("Error when setting HA_EXTRA_FLUSH_CACHE"));
> + return my_errno;
> + }
>
> - info->records++;
> - save_read_set= table->read_set;
> - save_write_set= table->write_set;
> + if (unlikely(key == NULL))
> + {
> + key= static_cast<uchar*>(thd->alloc(table->s->max_unique_length));
> + if (key == NULL)
> + {
> + DBUG_PRINT("info",("Can't allocate key buffer"));
> + return ENOMEM;
> + }
> + }
>
> - DBUG_EXECUTE_IF("rpl_write_record_small_sleep_gtid_100_200",
> + key_copy(key, table->record[0], table->key_info + key_nr, 0);
> +
> + error= h->ha_index_read_idx_map(table->record[1], key_nr, key,
> + HA_WHOLE_KEY, HA_READ_KEY_EXACT);
why did you revert commit 5e345281e3599c79 here?
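i.e. keep the keypart_map as in the code you are replacing:

  key_part_map keypart_map=
    (1 << table->key_info[key_nr].user_defined_key_parts) - 1;
  error= h->ha_index_read_idx_map(table->record[1], key_nr, key,
                                  keypart_map, HA_READ_KEY_EXACT);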
> + if (unlikely(error))
> {
> - if (thd->rgi_slave && (thd->rgi_slave->current_gtid.seq_no == 100 ||
> - thd->rgi_slave->current_gtid.seq_no == 200))
> - my_sleep(20000);
> - });
> - if (info->handle_duplicates == DUP_REPLACE ||
> - info->handle_duplicates == DUP_UPDATE)
> + DBUG_PRINT("info", ("index_read_idx() returns %d", error));
> + h->print_error(error, MYF(0));
> + }
> + }
> +
> + if (unlikely(error))
> + return error;
> +
> + if (table->vfield)
> + {
> + /*
> + We have not yet called update_virtual_fields()
> + in handler methods for the just read row in record[1].
> + */
> + table->move_fields(table->field, table->record[1], table->record[0]);
> + error= table->update_virtual_fields(table->file, VCOL_UPDATE_FOR_REPLACE);
> + table->move_fields(table->field, table->record[0], table->record[1]);
> + }
> + return error;
> +}
> +
> +/**
> + REPLACE is defined as either INSERT or DELETE + INSERT. If
> + possible, we can replace it with an UPDATE, but that will not
> + work on InnoDB if FOREIGN KEY checks are necessary.
> +
> + Suppose that we got the duplicate key to be a key that is not
> + the last unique key for the table and we perform an update:
> + then there might be another key for which the unique check will
> + fail, so we're better off just deleting the row and trying to insert again.
> +
> + Additionally we don't use ha_update_row in following cases:
> + * when triggers should be invoked, as it may spoil the intermedite NEW_ROW
> + representation
> + * when we have referenced keys, as there can be different ON DELETE/ON UPDATE
> + actions
> +*/
> +int Write_record::replace_row(ha_rows *inserted, ha_rows *deleted)
> +{
> + int error;
> + while (unlikely(error=table->file->ha_write_row(table->record[0])))
> {
> - while (unlikely(error=table->file->ha_write_row(table->record[0])))
> + error= prepare_handle_duplicate(error);
> + if (unlikely(error))
> + return restore_on_error();
> +
> + if (incomplete_records_cb && (error= incomplete_records_cb(arg1, arg2)))
This callback is strange. I'd rather inherit Write_record_rpl
from Write_record and put rpl-specific logic there.
Also you could avoid a switch on INSERT/REPLACE/ODKU if you'd use
inheritance instead. But the switch is less strange than the callback :)
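Just to illustrate (hypothetical names, not a complete patch -
handle_incomplete_record() would be a virtual hook in Write_record that
does nothing by default):

  class Write_record_rpl: public Write_record
  {
    Write_rows_log_event *ev;
    rpl_group_info *rgi;
    /* called by replace_row() where it now invokes incomplete_records_cb */
    int handle_incomplete_record() override
    { return ev->incomplete_record_callback(rgi); }
    ...
  };

The common path then pays at most a virtual call, and the
INSERT/REPLACE/ODKU switch could go away the same way.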
> + return restore_on_error();
> +
> + if (can_optimize && key_nr == last_unique_key)
> {
> - uint key_nr;
> - /*
> - If we do more than one iteration of this loop, from the second one the
> - row will have an explicit value in the autoinc field, which was set at
> - the first call of handler::update_auto_increment(). So we must save
> - the autogenerated value to avoid thd->insert_id_for_cur_row to become
> - 0.
> - */
> - if (table->file->insert_id_for_cur_row > 0)
> - insert_id_for_cur_row= table->file->insert_id_for_cur_row;
> - else
> - table->file->insert_id_for_cur_row= insert_id_for_cur_row;
> - bool is_duplicate_key_error;
> - if (table->file->is_fatal_error(error, HA_CHECK_ALL))
> - goto err;
> - is_duplicate_key_error=
> - table->file->is_fatal_error(error, HA_CHECK_ALL & ~HA_CHECK_DUP);
> - if (!is_duplicate_key_error)
> - {
> - /*
> - We come here when we had an ignorable error which is not a duplicate
> - key error. In this we ignore error if ignore flag is set, otherwise
> - report error as usual. We will not do any duplicate key processing.
> - */
> - if (info->ignore)
> - {
> - table->file->print_error(error, MYF(ME_WARNING));
> - goto after_trg_or_ignored_err; /* Ignoring a not fatal error */
> - }
> - goto err;
> - }
> - if (unlikely((int) (key_nr = table->file->get_dup_key(error)) < 0))
> - {
> - error= HA_ERR_FOUND_DUPP_KEY; /* Database can't find key */
> - goto err;
> - }
> - DEBUG_SYNC(thd, "write_row_replace");
> + DBUG_PRINT("info", ("Updating row using ha_update_row()"));
> + if (table->versioned(VERS_TRX_ID))
> + table->vers_start_field()->store(0, false);
>
> - /* Read all columns for the row we are going to replace */
> - table->use_all_columns();
> - /*
> - Don't allow REPLACE to replace a row when a auto_increment column
> - was used. This ensures that we don't get a problem when the
> - whole range of the key has been used.
> - */
> - if (info->handle_duplicates == DUP_REPLACE && table->next_number_field &&
> - key_nr == table->s->next_number_index && insert_id_for_cur_row > 0)
> - goto err;
> - if (table->file->has_dup_ref())
> + error= table->file->ha_update_row(table->record[1], table->record[0]);
> +
> + if (likely(!error))
> + ++*deleted;
> + else if (error != HA_ERR_RECORD_IS_THE_SAME)
> + return on_ha_error(error);
> +
> + if (versioned)
> {
> - DBUG_ASSERT(table->file->inited == handler::RND);
> - if (table->file->ha_rnd_pos(table->record[1],table->file->dup_ref))
> - goto err;
> + store_record(table, record[2]);
> + error= vers_insert_history_row(table);
> + restore_record(table, record[2]);
> + if (unlikely(error))
> + return on_ha_error(error);
> }
> +
> + break;
> + }
> + else
> + {
> + DBUG_PRINT("info", ("Deleting offending row and trying to write"
> + " new one again"));
> +
> + auto *trg = table->triggers;
> + if (use_triggers && trg->process_triggers(table->in_use, TRG_EVENT_DELETE,
> + TRG_ACTION_BEFORE, TRUE))
> + return restore_on_error();
> + if (!versioned)
> + error = table->file->ha_delete_row(table->record[1]);
> else
> {
> - if (table->file->extra(HA_EXTRA_FLUSH_CACHE)) /* Not needed with NISAM */
> - {
> - error=my_errno;
> - goto err;
> - }
> -
> - if (!key)
> - {
> - if (!(key=(char*) my_safe_alloca(table->s->max_unique_length)))
> - {
> - error=ENOMEM;
> - goto err;
> - }
> - }
> - key_copy((uchar*) key,table->record[0],table->key_info+key_nr,0);
> - key_part_map keypart_map= (1 << table->key_info[key_nr].user_defined_key_parts) - 1;
> - if ((error= (table->file->ha_index_read_idx_map(table->record[1],
> - key_nr, (uchar*) key,
> - keypart_map,
> - HA_READ_KEY_EXACT))))
> - goto err;
> + store_record(table, record[2]);
> + restore_record(table, record[1]);
> + table->vers_update_end();
> + error = table->file->ha_update_row(table->record[1], table->record[0]);
> + restore_record(table, record[2]);
> }
> - if (table->vfield)
> +
> + if (unlikely(error))
> + return on_ha_error(error);
> + ++*deleted;
> +
> + if (use_triggers)
> {
> - /*
> - We have not yet called update_virtual_fields(VOL_UPDATE_FOR_READ)
> - in handler methods for the just read row in record[1].
> - */
> - table->move_fields(table->field, table->record[1], table->record[0]);
> - int verr = table->update_virtual_fields(table->file, VCOL_UPDATE_FOR_REPLACE);
> - table->move_fields(table->field, table->record[0], table->record[1]);
> - if (verr)
> - goto err;
> + error= trg->process_triggers(table->in_use, TRG_EVENT_DELETE,
> + TRG_ACTION_AFTER, TRUE);
> + if (unlikely(error))
> + return restore_on_error();
> }
> - if (info->handle_duplicates == DUP_UPDATE)
> - {
> - int res= 0;
> - /*
> - We don't check for other UNIQUE keys - the first row
> - that matches, is updated. If update causes a conflict again,
> - an error is returned
> - */
> - DBUG_ASSERT(table->insert_values != NULL);
> - store_record(table,insert_values);
> - restore_record(table,record[1]);
> - table->reset_default_fields();
> + }
> + }
>
> - /*
> - in INSERT ... ON DUPLICATE KEY UPDATE the set of modified fields can
> - change per row. Thus, we have to do reset_default_fields() per row.
> - Twice (before insert and before update).
> - */
> - DBUG_ASSERT(info->update_fields->elements ==
> - info->update_values->elements);
> - if (fill_record_n_invoke_before_triggers(thd, table,
> - *info->update_fields,
> - *info->update_values,
> - info->ignore,
> - TRG_EVENT_UPDATE))
> - goto before_trg_err;
> -
> - bool different_records= (!records_are_comparable(table) ||
> - compare_record(table));
> - /*
> - Default fields must be updated before checking view updateability.
> - This branch of INSERT is executed only when a UNIQUE key was violated
> - with the ON DUPLICATE KEY UPDATE option. In this case the INSERT
> - operation is transformed to an UPDATE, and the default fields must
> - be updated as if this is an UPDATE.
> - */
> - if (different_records && table->default_field)
> - table->evaluate_update_default_function();
> -
> - /* CHECK OPTION for VIEW ... ON DUPLICATE KEY UPDATE ... */
> - res= info->table_list->view_check_option(table->in_use, info->ignore);
> - if (res == VIEW_CHECK_SKIP)
> - goto after_trg_or_ignored_err;
> - if (res == VIEW_CHECK_ERROR)
> - goto before_trg_err;
> -
> - table->file->restore_auto_increment(prev_insert_id);
> - info->touched++;
> - if (different_records)
> - {
> - if (unlikely(error=table->file->ha_update_row(table->record[1],
> - table->record[0])) &&
> - error != HA_ERR_RECORD_IS_THE_SAME)
> - {
> - if (info->ignore &&
> - !table->file->is_fatal_error(error, HA_CHECK_ALL))
> - {
> - if (!(thd->variables.old_behavior &
> - OLD_MODE_NO_DUP_KEY_WARNINGS_WITH_IGNORE))
> - table->file->print_error(error, MYF(ME_WARNING));
> - goto after_trg_or_ignored_err;
> - }
> - goto err;
> - }
> + /*
> + If more than one iteration of the above loop is done, from
> + the second one the row being inserted will have an explicit
> + value in the autoinc field, which was set at the first call of
> + handler::update_auto_increment(). This value is saved to avoid
> + thd->insert_id_for_cur_row becoming 0. Use this saved autoinc
> + value.
> + */
> + if (table->file->insert_id_for_cur_row == 0)
> + table->file->insert_id_for_cur_row = insert_id_for_cur_row;
>
> - if (error != HA_ERR_RECORD_IS_THE_SAME)
> - {
> - info->updated++;
> - if (table->versioned() &&
> - table->vers_check_update(*info->update_fields))
> - {
> - if (table->versioned(VERS_TIMESTAMP))
> - {
> - store_record(table, record[2]);
> - if ((error= vers_insert_history_row(table)))
> - {
> - info->last_errno= error;
> - table->file->print_error(error, MYF(0));
> - trg_error= 1;
> - restore_record(table, record[2]);
> - goto after_trg_or_ignored_err;
> - }
> - restore_record(table, record[2]);
> - }
> - info->copied++;
> - }
> - }
> - else
> - error= 0;
> - /*
> - If ON DUP KEY UPDATE updates a row instead of inserting
> - one, it's like a regular UPDATE statement: it should not
> - affect the value of a next SELECT LAST_INSERT_ID() or
> - mysql_insert_id(). Except if LAST_INSERT_ID(#) was in the
> - INSERT query, which is handled separately by
> - THD::arg_of_last_insert_id_function.
> - */
> - prev_insert_id_for_cur_row= table->file->insert_id_for_cur_row;
> - insert_id_for_cur_row= table->file->insert_id_for_cur_row= 0;
> - trg_error= (table->triggers &&
> - table->triggers->process_triggers(thd, TRG_EVENT_UPDATE,
> - TRG_ACTION_AFTER, TRUE));
> - info->copied++;
> - }
> + return after_insert(inserted);
> +}
>
> - /*
> - Only update next_insert_id if the AUTO_INCREMENT value was explicitly
> - updated, so we don't update next_insert_id with the value from the
> - row being updated. Otherwise reset next_insert_id to what it was
> - before the duplicate key error, since that value is unused.
> - */
> - if (table->next_number_field_updated)
> - {
> - DBUG_ASSERT(table->next_number_field != NULL);
>
> - table->file->adjust_next_insert_id_after_explicit_value(table->next_number_field->val_int());
> - }
> - else if (prev_insert_id_for_cur_row)
> - {
> - table->file->restore_auto_increment(prev_insert_id_for_cur_row);
> - }
> - goto ok;
> +int Write_record::insert_on_duplicate_update(ha_rows *inserted,
> + ha_rows *updated)
> +{
> + int res= table->file->ha_write_row(table->record[0]);
> + if (likely(!res))
> + return after_insert(inserted);
> +
> + res=prepare_handle_duplicate(res);
> + if (unlikely(res))
> + return restore_on_error();
> +
> + ulonglong prev_insert_id_for_cur_row= 0;
> + /*
> + We don't check for other UNIQUE keys - the first row
> + that matches, is updated. If update causes a conflict again,
> + an error is returned
> + */
> + DBUG_ASSERT(table->insert_values != NULL);
> + store_record(table,insert_values);
> + restore_record(table,record[1]);
> + table->reset_default_fields();
> +
> + /*
> + in INSERT ... ON DUPLICATE KEY UPDATE the set of modified fields can
> + change per row. Thus, we have to do reset_default_fields() per row.
> + Twice (before insert and before update).
> + */
> + DBUG_ASSERT(info->update_fields->elements ==
> + info->update_values->elements);
> + if (fill_record_n_invoke_before_triggers(thd, table,
> + *info->update_fields,
> + *info->update_values,
> + info->ignore,
> + TRG_EVENT_UPDATE))
> + return restore_on_error();
> +
> + bool different_records= (!records_are_comparable(table) ||
> + compare_record(table));
> + /*
> + Default fields must be updated before checking view updateability.
> + This branch of INSERT is executed only when a UNIQUE key was violated
> + with the ON DUPLICATE KEY UPDATE option. In this case the INSERT
> + operation is transformed to an UPDATE, and the default fields must
> + be updated as if this is an UPDATE.
> + */
> + if (different_records && table->default_field)
> + table->evaluate_update_default_function();
> +
> + /* CHECK OPTION for VIEW ... ON DUPLICATE KEY UPDATE ... */
> + res= info->table_list->view_check_option(table->in_use, info->ignore);
> + if (unlikely(res))
> + {
> + if (res == VIEW_CHECK_ERROR)
> + return restore_on_error();
> + DBUG_ASSERT(res == VIEW_CHECK_SKIP);
> + return 0;
> + }
> +
> + table->file->restore_auto_increment(prev_insert_id);
> + info->touched++;
> + if (different_records)
> + {
> + int error= table->file->ha_update_row(table->record[1], table->record[0]);
> +
> + if (unlikely(error) && error != HA_ERR_RECORD_IS_THE_SAME)
> + {
> + if (info->ignore && !table->file->is_fatal_error(error, HA_CHECK_ALL))
why not is_fatal_error(error)?
> + {
> + if (!(thd->variables.old_behavior &
> + OLD_MODE_NO_DUP_KEY_WARNINGS_WITH_IGNORE))
> + table->file->print_error(error, MYF(ME_WARNING));
> + return 0;
> }
> - else /* DUP_REPLACE */
> + return on_ha_error(error);
> + }
> +
> + if (error != HA_ERR_RECORD_IS_THE_SAME)
> + {
> + ++*updated;
> + if (table->versioned() &&
> + table->vers_check_update(*info->update_fields))
> {
> - /*
> - The manual defines the REPLACE semantics that it is either
> - an INSERT or DELETE(s) + INSERT; FOREIGN KEY checks in
> - InnoDB do not function in the defined way if we allow MySQL
> - to convert the latter operation internally to an UPDATE.
> - We also should not perform this conversion if we have
> - timestamp field with ON UPDATE which is different from DEFAULT.
> - Another case when conversion should not be performed is when
> - we have ON DELETE trigger on table so user may notice that
> - we cheat here. Note that it is ok to do such conversion for
> - tables which have ON UPDATE but have no ON DELETE triggers,
> - we just should not expose this fact to users by invoking
> - ON UPDATE triggers.
> -
> - Note, TABLE_SHARE and TABLE see long uniques differently:
> - - TABLE_SHARE sees as HA_KEY_ALG_LONG_HASH and HA_NOSAME
> - - TABLE sees as usual non-unique indexes
> - */
> - bool is_long_unique= table->s->key_info &&
> - table->s->key_info[key_nr].algorithm ==
> - HA_KEY_ALG_LONG_HASH;
> - if ((is_long_unique ?
> - /*
> - We have a long unique. Test that there are no in-engine
> - uniques and the current long unique is the last long unique.
> - */
> - !(table->key_info[0].flags & HA_NOSAME) &&
> - last_uniq_key(table, table->s->key_info, key_nr) :
> - /*
> - We have a normal key - not a long unique.
> - Test is the current normal key is unique and
> - it is the last normal unique.
> - */
> - last_uniq_key(table, table->key_info, key_nr)) &&
> - !table->file->referenced_by_foreign_key() &&
> - (!table->triggers || !table->triggers->has_delete_triggers()))
> - {
> - if (table->versioned(VERS_TRX_ID))
> - {
> - bitmap_set_bit(table->write_set, table->vers_start_field()->field_index);
> - table->file->column_bitmaps_signal();
> - table->vers_start_field()->store(0, false);
> - }
> - if (unlikely(error= table->file->ha_update_row(table->record[1],
> - table->record[0])) &&
> - error != HA_ERR_RECORD_IS_THE_SAME)
> - goto err;
> - if (likely(!error))
> - {
> - info->deleted++;
> - if (!table->file->has_transactions())
> - thd->transaction->stmt.modified_non_trans_table= TRUE;
> - if (table->versioned(VERS_TIMESTAMP))
> - {
> - store_record(table, record[2]);
> - error= vers_insert_history_row(table);
> - restore_record(table, record[2]);
> - if (unlikely(error))
> - goto err;
> - }
> - }
> - else
> - error= 0; // error was HA_ERR_RECORD_IS_THE_SAME
> - /*
> - Since we pretend that we have done insert we should call
> - its after triggers.
> - */
> - goto after_trg_n_copied_inc;
> - }
> - else
> + if (versioned)
> {
> - if (table->triggers &&
> - table->triggers->process_triggers(thd, TRG_EVENT_DELETE,
> - TRG_ACTION_BEFORE, TRUE))
> - goto before_trg_err;
> -
> - if (!table->versioned(VERS_TIMESTAMP))
> - error= table->file->ha_delete_row(table->record[1]);
> - else
> - {
> - store_record(table, record[2]);
> - restore_record(table, record[1]);
> - table->vers_update_end();
> - error= table->file->ha_update_row(table->record[1],
> - table->record[0]);
> - restore_record(table, record[2]);
> - }
> + store_record(table, record[2]);
> + error= vers_insert_history_row(table);
> + restore_record(table, record[2]);
> if (unlikely(error))
> - goto err;
> - if (!table->versioned(VERS_TIMESTAMP))
> - info->deleted++;
> - else
> - info->updated++;
> - if (!table->file->has_transactions_and_rollback())
> - thd->transaction->stmt.modified_non_trans_table= TRUE;
> - if (table->triggers &&
> - table->triggers->process_triggers(thd, TRG_EVENT_DELETE,
> - TRG_ACTION_AFTER, TRUE))
> - {
> - trg_error= 1;
> - goto after_trg_or_ignored_err;
> - }
> - /* Let us attempt do write_row() once more */
> + return on_ha_error(error);
> }
> + ++*inserted;
> }
> }
> -
> - /*
> - If more than one iteration of the above while loop is done, from
> - the second one the row being inserted will have an explicit
> - value in the autoinc field, which was set at the first call of
> - handler::update_auto_increment(). This value is saved to avoid
> - thd->insert_id_for_cur_row becoming 0. Use this saved autoinc
> - value.
> - */
> - if (table->file->insert_id_for_cur_row == 0)
> - table->file->insert_id_for_cur_row= insert_id_for_cur_row;
> -
> /*
> - Restore column maps if they where replaced during an duplicate key
> - problem.
> + If ON DUP KEY UPDATE updates a row instead of inserting
> + one, it's like a regular UPDATE statement: it should not
> + affect the value of a next SELECT LAST_INSERT_ID() or
> + mysql_insert_id(). Except if LAST_INSERT_ID(#) was in the
> + INSERT query, which is handled separately by
> + THD::arg_of_last_insert_id_function.
> */
> - if (table->read_set != save_read_set ||
> - table->write_set != save_write_set)
> - table->column_bitmaps_set(save_read_set, save_write_set);
> + prev_insert_id_for_cur_row= table->file->insert_id_for_cur_row;
> + insert_id_for_cur_row= table->file->insert_id_for_cur_row= 0;
> +
> + ++*inserted; // Conforms the older behavior;
> +
> + if (use_triggers
> + && table->triggers->process_triggers(thd, TRG_EVENT_UPDATE,
> + TRG_ACTION_AFTER, TRUE))
> + return restore_on_error();
> }
> - else if (unlikely((error=table->file->ha_write_row(table->record[0]))))
> +
> + /*
> + Only update next_insert_id if the AUTO_INCREMENT value was explicitly
> + updated, so we don't update next_insert_id with the value from the
> + row being updated. Otherwise reset next_insert_id to what it was
> + before the duplicate key error, since that value is unused.
> + */
> + if (table->next_number_field_updated)
> + {
> + DBUG_ASSERT(table->next_number_field != NULL);
> +
> + table->file->adjust_next_insert_id_after_explicit_value(
> + table->next_number_field->val_int());
> + }
> + else if (prev_insert_id_for_cur_row)
> + {
> + table->file->restore_auto_increment(prev_insert_id_for_cur_row);
> + }
> +
> + return send_data();
> +}
> +
> +bool Write_record::is_fatal_error(int error)
> +{
> + return error == HA_ERR_LOCK_DEADLOCK ||
> + error == HA_ERR_LOCK_WAIT_TIMEOUT ||
These errors were only checked in the rpl code, not anywhere in sql_insert.cc.
I don't think it's wrong to check for them everywhere consistently,
but does it change the behavior? Can you create a test case for INSERT,
before your changes, that reaches this is_fatal_error() check with
HA_ERR_LOCK_DEADLOCK or HA_ERR_LOCK_WAIT_TIMEOUT? What will the
behavior be before and after your change? (A condensed before/after
comparison follows the end of this quoted function, below.)
> + table->file->is_fatal_error(error, HA_CHECK_ALL);
> +}
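For reference, here is a condensed paraphrase of the old check versus what the
new wrapper amounts to once inlined; this is not actual patch text, only a way
to make the question concrete:

    /* before: plain INSERT path in sql_insert.cc */
    if (!info->ignore || table->file->is_fatal_error(error, HA_CHECK_ALL))
      goto err;                               /* the error is reported */

    /* after: Write_record::is_fatal_error() additionally treats
       HA_ERR_LOCK_DEADLOCK and HA_ERR_LOCK_WAIT_TIMEOUT as fatal, so
       INSERT IGNORE can no longer downgrade them to a warning
       (assuming it ever could, which is what the test case should show) */
    if (!info->ignore ||
        error == HA_ERR_LOCK_DEADLOCK ||
        error == HA_ERR_LOCK_WAIT_TIMEOUT ||
        table->file->is_fatal_error(error, HA_CHECK_ALL))
      return on_ha_error(error);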
> +
> +int Write_record::single_insert(ha_rows *inserted)
> +{
> + DBUG_EXECUTE_IF("rpl_write_record_small_sleep_gtid_100_200",
> + {
> + uint64 seq= thd->rgi_slave ? thd->rgi_slave->current_gtid.seq_no : 0;
> + if (seq == 100 || seq == 200)
> + my_sleep(20000);
> + });
> +
> + int error= table->file->ha_write_row(table->record[0]);
> + if (unlikely(error))
> {
> DEBUG_SYNC(thd, "write_row_noreplace");
> - if (!info->ignore ||
> - table->file->is_fatal_error(error, HA_CHECK_ALL))
> - goto err;
> + if (!info->ignore || is_fatal_error(error))
> + return on_ha_error(error);
> if (!(thd->variables.old_behavior &
> OLD_MODE_NO_DUP_KEY_WARNINGS_WITH_IGNORE))
> table->file->print_error(error, MYF(ME_WARNING));
> table->file->restore_auto_increment(prev_insert_id);
> - goto after_trg_or_ignored_err;
> + return 0;
> }
> + return after_insert(inserted);
> +}
>
> -after_trg_n_copied_inc:
> - info->copied++;
> - thd->record_first_successful_insert_id_in_cur_stmt(table->file->insert_id_for_cur_row);
> - trg_error= (table->triggers &&
> - table->triggers->process_triggers(thd, TRG_EVENT_INSERT,
> - TRG_ACTION_AFTER, TRUE));
> +/**
> + Write a record to table with optional deleting of conflicting records,
> + invoke proper triggers if needed.
> +
> + @note
> + Once this record will be written to table after insert trigger will
> + be invoked. If instead of inserting new record we will update old one
> + then both on update triggers will work instead. Similarly both on
> + delete triggers will be invoked if we will delete conflicting records.
>
> -ok:
> + Sets thd->transaction.stmt.modified_non_trans_table to TRUE if table which
> + is updated didn't have transactions.
> +
> + @return
> + 0 - success
> + non-0 - error
> +*/
> +int Write_record::write_record()
> +{
> + ha_rows inserted= 0, updated= 0;
> + prev_insert_id= table->file->next_insert_id;
> + info->records++;
> + ignored_error= false;
> + int res= 0;
> + switch(info->handle_duplicates)
> + {
> + case DUP_ERROR:
> + res= single_insert(&inserted);
> + break;
> + case DUP_UPDATE:
> + res= insert_on_duplicate_update(&inserted, &updated);
> + info->updated+= updated;
> + break;
> + case DUP_REPLACE:
> + res= replace_row(&inserted, &updated);
> + if (versioned)
> + info->updated+= updated;
> + else
> + info->deleted+= updated;
> + }
> +
> + info->copied+= inserted;
> + if (inserted || updated)
> + notify_non_trans_table_modified();
> + return res;
> +}
> +
> +
> +int Write_record::prepare_handle_duplicate(int error)
> +{
> /*
> - We send the row after writing it to the table so that the
> - correct values are sent to the client. Otherwise it won't show
> - autoinc values (generated inside the handler::ha_write()) and
> - values updated in ON DUPLICATE KEY UPDATE.
> + If we do more than one iteration of the replace loop, from the second one the
> + row will have an explicit value in the autoinc field, which was set at
> + the first call of handler::update_auto_increment(). So we must save
> + the autogenerated value to avoid thd->insert_id_for_cur_row to become
> + 0.
> */
> - if (sink && sink->send_data(thd->lex->returning()->item_list) < 0)
> - trg_error= 1;
> + if (table->file->insert_id_for_cur_row > 0)
> + insert_id_for_cur_row= table->file->insert_id_for_cur_row;
> + else
> + table->file->insert_id_for_cur_row= insert_id_for_cur_row;
> +
> + if (is_fatal_error(error))
> + return on_ha_error(error);
>
> -after_trg_or_ignored_err:
> - if (key)
> - my_safe_afree(key,table->s->max_unique_length);
> + bool is_duplicate_key_error=
> + table->file->is_fatal_error(error, HA_CHECK_ALL & ~HA_CHECK_DUP);
> + if (!is_duplicate_key_error)
> + {
> + /*
> + We come here when we had an ignorable error which is not a duplicate
> + key error. In this we ignore error if ignore flag is set, otherwise
> + report error as usual. We will not do any duplicate key processing.
> + */
> + if (info->ignore)
> + {
> + table->file->print_error(error, MYF(ME_WARNING));
> + ignored_error= true;
> + return 1; /* Ignoring a not fatal error */
> + }
> + return on_ha_error(error);
> + }
> +
> + key_nr = table->file->get_dup_key(error);
> +
> + if (unlikely(key_nr == std::numeric_limits<decltype(key_nr)>::max()))
Well, the handler returns (uint)-1, so please compare with that instead of
inventing yet another way of achieving the same thing.
And why did you decide to use ushort for the key number?
We pretty much never use ushort; the key number is a uint everywhere.
I agree that ushort is big enough, but let's rather be consistent.
(The suggested form is sketched just below the quoted line.)
> + return on_ha_error(HA_ERR_FOUND_DUPP_KEY); /* Database can't find key */
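Roughly like this, assuming key_nr stays a plain uint (whether as a local or
as the member):

    uint key_nr= table->file->get_dup_key(error);
    if (unlikely(key_nr == (uint) -1))
      return on_ha_error(HA_ERR_FOUND_DUPP_KEY); /* Database can't find key */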
> +
> + DEBUG_SYNC(thd, "write_row_replace");
> +
> + /* Read all columns for the row we are going to replace */
> + table->use_all_columns();
> + /*
> + Don't allow REPLACE to replace a row when an auto_increment column
> + was used. This ensures that we don't get a problem when the
> + whole range of the key has been used.
> + */
> + if (info->handle_duplicates == DUP_REPLACE && table->next_number_field &&
> + key_nr == table->s->next_number_index && insert_id_for_cur_row > 0)
> + return on_ha_error(error);
> +
> + error= locate_dup_record();
> + if (error)
> + return on_ha_error(error);
> +
> + return 0;
> +}
> +
> +inline
> +void Write_record::notify_non_trans_table_modified()
> +{
> if (!table->file->has_transactions_and_rollback())
> thd->transaction->stmt.modified_non_trans_table= TRUE;
ok, as you like.
I'm personally not a great fan of one-liners that are used only once.
> - DBUG_RETURN(trg_error);
> +}
>
> -err:
> +inline
> +int Write_record::on_ha_error(int error)
> +{
> info->last_errno= error;
> table->file->print_error(error,MYF(0));
> -
> -before_trg_err:
> + DBUG_PRINT("info", ("Returning error %d", error));
> + return restore_on_error();
> +}
> +
> +inline
> +int Write_record::restore_on_error()
> +{
> table->file->restore_auto_increment(prev_insert_id);
> - if (key)
> - my_safe_afree(key, table->s->max_unique_length);
> - table->column_bitmaps_set(save_read_set, save_write_set);
> - DBUG_RETURN(1);
> + return ignored_error ? 0 : 1;
> +}
> +
> +inline
> +int Write_record::after_insert(ha_rows *inserted)
> +{
> + ++*inserted;
> + thd->record_first_successful_insert_id_in_cur_stmt(table->file->insert_id_for_cur_row);
> + return after_ins_trg() || send_data();
> +}
> +
> +inline
> +int Write_record::after_ins_trg()
> +{
> + int res= 0;
> + if (use_triggers)
> + res= table->triggers->process_triggers(thd, TRG_EVENT_INSERT,
> + TRG_ACTION_AFTER, TRUE);
> + return res;
> +}
> +
> +inline
> +int Write_record::send_data()
> +{
> + /*
> + We send the row after writing it to the table so that the
> + correct values are sent to the client. Otherwise it won't show
> + autoinc values (generated inside the handler::ha_write()) and
> + values updated in ON DUPLICATE KEY UPDATE.
> + */
> + return sink ? sink->send_data(thd->lex->returning()->item_list) < 0 : 0;
> }
>
>
> +
> /******************************************************************************
> Check that there aren't any null_fields
> ******************************************************************************/
> @@ -4816,18 +4934,8 @@ select_create::prepare(List<Item> &_values, SELECT_LEX_UNIT *u)
>
> restore_record(table,s->default_values); // Get empty record
> thd->cuted_fields=0;
> - bool create_lookup_handler= info.handle_duplicates != DUP_ERROR;
> - if (info.ignore || info.handle_duplicates != DUP_ERROR)
> - {
> - create_lookup_handler= true;
> - table->file->extra(HA_EXTRA_IGNORE_DUP_KEY);
> - if (table->file->ha_table_flags() & HA_DUPLICATE_POS)
> - {
> - if (table->file->ha_rnd_init_with_error(0))
> - DBUG_RETURN(1);
> - }
> - }
> - table->file->prepare_for_insert(create_lookup_handler);
> + if (prepare_for_replace(table, info.handle_duplicates, info.ignore))
ok.
Strange that select_create::prepare doesn't call select_insert::prepare.
> + DBUG_RETURN(1);
> if (info.handle_duplicates == DUP_REPLACE &&
> (!table->triggers || !table->triggers->has_delete_triggers()))
> table->file->extra(HA_EXTRA_WRITE_CAN_REPLACE);
> diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
> index 5db53b6f2b8..7ce4850f534 100644
> --- a/sql/sql_parse.cc
> +++ b/sql/sql_parse.cc
> @@ -4814,7 +4815,8 @@ mysql_execute_command(THD *thd)
> &lex->value_list,
> lex->duplicates,
> lex->ignore,
> - result)))
> + result,
> + &write)))
Why do you need Write_record here on the stack?
Wouldn't it be more logical to have it in select_insert?
(A rough sketch of that placement follows the quoted hunk.)
> {
> if (lex->analyze_stmt)
> ((select_result_interceptor*)sel_result)->disable_my_ok_calls();
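To illustrate the suggested placement, something along these lines; this is a
hypothetical sketch only, the exact shape is up to you:

    class select_insert :public select_result_interceptor
    {
      /* ... existing members ... */
      Write_record write;   // owned by the result object, so
                            // mysql_execute_command() does not have to keep
                            // a Write_record on its stack and pass a pointer
      /* ... */
    };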
Regards,
Sergei
Chief Architect, MariaDB Server
and security@mariadb.org