[Maria-discuss] DELETE query hanging on Statistics (10.6.12)
With this table:

CREATE TABLE t1 (
  label varchar(100) NOT NULL,
  dt datetime NOT NULL,
  success tinyint(1) NOT NULL,
  PRIMARY KEY (label,dt),
  KEY dt (dt),
  KEY success (success,dt)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC

I have a DELETE query that has been stuck in Statistics for nearly an hour now:

DELETE MyDB.t1 FROM MyDB.t1
WHERE dt <= '2023-02-21 08:06:46' AND dt >= '2023-02-21 08:03:48'

This statement runs several times per hour to trim the table (and has for years). It has run fine against my 10.6.12 DB (this is a downstream replica DB) for a couple of weeks now (since I upgraded the replica), but suddenly today it produced the hang. Any ideas why? The table has nearly 15 million rows, but the rows are more or less evenly spaced across datetimes.

Also, I find that I cannot kill the query using KILL 12345; the KILL statement just comes back immediately without reporting an error and the statement continues. Is there no way to kill this statement without crashing the DB? Again, this is a replica statement that is executing. I had planned to stop the query, execute an ANALYZE TABLE t1 and then try it again, but I don't seem to be able to kill the query.

Thank you!

Dan
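For reference, a hang like this is visible from any other client session: the stuck statement keeps accumulating TIME while sitting in the Statistics state. A minimal check (a standard information_schema query; the LIMIT and the 80-character truncation are only for readability):

    SELECT ID, COMMAND, TIME, STATE, LEFT(INFO, 80) AS query_start
    FROM information_schema.PROCESSLIST
    ORDER BY TIME DESC
    LIMIT 10;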
Hi, Dan,

On Mar 03, mariadb@Biblestuph.com wrote:
With this table:
CREATE TABLE t1 ( label varchar(100) NOT NULL, dt datetime NOT NULL, success tinyint(1) NOT NULL, PRIMARY KEY (label,dt), KEY dt (dt), KEY success (success,dt) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
I have a DELETE query that has been stuck in Statistics for nearly an hour now:
DELETE MyDB.t1 FROM MyDB.t1 WHERE dt <= '2023-02-21 08:06:46' AND dt >= '2023-02-21 08:03:48'
This statement runs several times per hour to trim the table (and has for years). It has run fine against my 10.6.12 DB (this is a downstream replica DB) for a couple of weeks now (since I upgraded the replica), but suddenly today it produced the hang. Any ideas why? The table has nearly 15 million rows, but the rows are more or less evenly spaced across datetimes.
Is there any chance you could run https://poormansprofiler.org/ (or one of the alternatives) when this happens again?
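For readers unfamiliar with it, the "poor man's profiler" referenced above is essentially a gdb one-liner that samples every thread's stack and aggregates identical stacks. A sketch along the lines of the script published at poormansprofiler.org (assumes gdb is installed, the process is named mysqld, and you have permission to attach to it; adjust the pid lookup and sample count to taste):

    #!/bin/bash
    nsamples=1
    sleeptime=0
    pid=$(pidof mysqld)

    for x in $(seq 1 $nsamples); do
        # dump a backtrace of every thread in the running server
        gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $pid
        sleep $sleeptime
    done | \
    awk '
      BEGIN { s = "" }
      /^Thread/ { print s; s = "" }    # new thread: emit the collapsed stack so far
      /^\#/ { if (s != "") { s = s "," $4 } else { s = $4 } }   # append the frame function name
      END { print s }' | \
    sort | uniq -c | sort -rn -k 1,1   # count identical stacks, most frequent first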
Also I find that I cannot kill the query using KILL 12345; the KILL statement just comes back immediately without reporting an error and the statement continues. Is there no way to kill this statement without crashing the DB? Again, this is a replica statement that is executing. I had planned to stop the query, execute an ANALYZE TABLE t1 and then try it again, but I don't seem to be able to kill the query.
KILL always returns immediately. Killing in MariaDB is cooperative: it simply tells the thread "abort execution at your earliest convenience", and it's up to the executing thread to decide when to abort. KILL doesn't wait for it.

Regards,
Sergei
VP of MariaDB Server Engineering and security@mariadb.org
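To illustrate the cooperative behaviour Sergei describes (the thread id 12345 is the one from Dan's report; the 'Killed' indication is the usual behaviour, not something specific to this case):

    -- KILL only sets a flag on the target thread and returns at once
    KILL 12345;

    -- from any session, the statement can still be seen running until the
    -- thread checks the flag; the Command column typically shows 'Killed'
    SHOW PROCESSLIST;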
On 3/6/2023 3:58 AM, Sergei Golubchik wrote:
Hi, Dan,
On Mar 03, mariadb@Biblestuph.com wrote:
With this table:
CREATE TABLE t1 ( label varchar(100) NOT NULL, dt datetime NOT NULL, success tinyint(1) NOT NULL, PRIMARY KEY (label,dt), KEY dt (dt), KEY success (success,dt) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
I have a DELETE query that has been stuck in Statistics for nearly an hour now:
DELETE MyDB.t1 FROM MyDB.t1 WHERE dt <= '2023-02-21 08:06:46' AND dt >= '2023-02-21 08:03:48'
This statement runs several times per hour to trim the table (and has for years). It has run fine against my 10.6.12 DB (this is a downstream replica DB) for a couple of weeks now (since I upgraded the replica), but suddenly today it produced the hang. Any ideas why? The table has nearly 15 million rows, but the rows are more or less evenly spaced across datetimes.
Is there any chance you could run https://poormansprofiler.org/ (or one of the alternatives) when this happens again?
Thanks for your reply. Later that same day I ended up having to force the server to die (kill -9); the thread never did let go and never left Statistics, and since it was a replica thread it was preventing the replica from getting updated. I know the server was getting some CPU while the thread was hung, but that may have only been the slave IO thread (which AFAICT was still running fine). The server itself was responsive, so I was able to FLUSH all the tables (at least all but the one it was working on) before I pulled the plug.

After I killed the server I restarted with skip-slave and immediately ANALYZEd the table in question, as well as all the other tables, before starting the slave again. I've not seen the issue since (and honestly hope I don't), but if it happens again I will attempt to use the profiling technique you mentioned and report back here.
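A rough sketch of that recovery sequence, assuming "skip-slave" refers to the skip_slave_start server option and the default replication connection; the exact table list is whatever applies on the replica:

    -- my.cnf while recovering, so replication does not resume on startup:
    --   [mysqld]
    --   skip_slave_start

    -- after restarting the server:
    ANALYZE TABLE MyDB.t1;    -- the table the stuck DELETE was targeting
    -- ... ANALYZE TABLE for the remaining tables as described above ...
    START SLAVE;              -- resume replication once done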
Also I find that I cannot kill the query using KILL 12345; the KILL statement just comes back immediately without reporting an error and the statement continues. Is there no way to kill this statement without crashing the DB? Again, this is a replica statement that is executing. I had planned to stop the query, execute an ANALYZE TABLE t1 and then try it again, but I don't seem to be able to kill the query.
KILL always returns immediately. Killing in MariaDB is cooperative, it simply tells the thread "abort execution at your earliest convenience" and it's up to the executing thread to decide when to abort. KILL doesn't wait for it.
Yah, that's kind of what I figured. Is there a recommended (better) course of action other than kill -9 if this happens again?
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
On 3/18/2023 1:28 PM, mariadb@biblestuph.com wrote:
On 3/6/2023 3:58 AM, Sergei Golubchik wrote:
Hi, Dan,
On Mar 03, mariadb@Biblestuph.com wrote:
With this table:
CREATE TABLE t1 ( label varchar(100) NOT NULL, dt datetime NOT NULL, success tinyint(1) NOT NULL, PRIMARY KEY (label,dt), KEY dt (dt), KEY success (success,dt) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
I have a DELETE query that has been stuck in Statistics for nearly an hour now:
DELETE MyDB.t1 FROM MyDB.t1 WHERE dt <= '2023-02-21 08:06:46' AND dt >= '2023-02-21 08:03:48'
This statement runs several times per hour to trim the table (and has for years). It has run fine against my 10.6.12 DB (this is a downstream replica DB) for a couple of weeks now (since I upgraded the replica), but suddenly today it produced the hang. Any ideas why? The table has nearly 15 million rows, but the rows are more or less evenly spaced across datetimes.
Is there any chance you could run https://poormansprofiler.org/ (or one of the alternatives) when this happens again?
Thanks for your reply. Later that same day I ended up having to force the server to die (kill -9); the thread never did let go and never left statistics; and since it was a replica thread it was preventing the replica from getting updated. I know that the server was getting some CPU while the thread was hung, but that may have only been the slave IO thread (which AFAICT was still running fine). And the server itself was responsive, so I was able to FLUSH all the tables (at least all but the one that it was working on) before I pulled the plug.
After I killed the server I restarted with skip-slave and immediately ANALYZEd the table in question, as well as all the other tables, before starting the slave again. I've not seen the issue since (and honestly hope I don't) but if so I will attempt to use the profiling technique you mentioned and report back here.
It just happened again today. A different replica this time, but the exact same statement. Here's the "Poor Man's Profile." Hope it is helpful:

     26 pthread_cond_timedwait@@GLIBC_2.3.2,Thread_cache::park(),do_handle_one_connection(CONNECT*,,handle_one_connection,pfs_spawn_thread,start_thread,clone
      2 pthread_cond_timedwait@@GLIBC_2.3.2,MYSQL_BIN_LOG::wait_for_update_binlog_end_pos(THD*,,mysql_binlog_send(THD*,,dispatch_command(enum_server_command,,do_command(THD*,,do_handle_one_connection(CONNECT*,,handle_one_connection,pfs_spawn_thread,start_thread,clone
      1 syscall,tpool::aio_linux::getevent_thread_routine(tpool::aio_linux*),??,start_thread,clone
      1 syscall,ssux_lock_impl<true>::wr_wait(unsigned,buf_page_get_low(page_id_t,,buf_page_get_gen(page_id_t,,btr_block_get(dict_index_t,btr_can_merge_with_page(btr_cur_t*,,btr_compress(btr_cur_t*,,btr_cur_compress_if_useful(btr_cur_t*,,btr_cur_pessimistic_delete(dberr_t*,,btr_cur_node_ptr_delete(btr_cur_t*,,btr_compress(btr_cur_t*,,btr_cur_compress_if_useful(btr_cur_t*,,btr_cur_pessimistic_delete(dberr_t*,,row_purge_remove_sec_if_poss_tree(purge_node_t*,,row_purge_record_func(purge_node_t*,,row_purge_step(que_thr_t*),que_run_threads(que_thr_t*),trx_purge(unsigned,purge_coordinator_callback(void*),tpool::task_group::execute(tpool::task*),tpool::thread_pool_generic::worker_main(tpool::worker_data*),??,start_thread,clone
      1 syscall,srw_mutex_impl<true>::wait_and_lock(),buf_page_get_low(page_id_t,,buf_page_get_gen(page_id_t,,btr_block_get(dict_index_t,btr_estimate_n_rows_in_range(dict_index_t*,,ha_innobase::records_in_range(unsigned,handler::multi_range_read_info_const(unsigned,DsMrr_impl::dsmrr_info_const(unsigned,check_quick_select(PARAM*,,SQL_SELECT::test_quick_select(THD*,,JOIN::optimize_inner(),JOIN::optimize(),mysql_select(THD*,,mysql_execute_command(THD*,,mysql_parse(THD*,,Query_log_event::do_apply_event(rpl_group_info*,,apply_event_and_update_pos_apply(Log_event*,,rpt_handle_event(rpl_parallel_thread::queued_event*,,handle_rpl_parallel_thread,pfs_spawn_thread,start_thread,clone
      1 sigwaitinfo,signal_hand,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,rpl_parallel_entry::choose_thread(rpl_group_info*,,rpl_parallel::do_event(rpl_group_info*,,handle_slave_sql,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,MYSQL_BIN_LOG::queue_for_group_commit(MYSQL_BIN_LOG::group_commit_entry*),MYSQL_BIN_LOG::write_transaction_to_binlog_events(MYSQL_BIN_LOG::group_commit_entry*),MYSQL_BIN_LOG::write_transaction_to_binlog(THD*,,binlog_flush_cache(THD*,,MYSQL_BIN_LOG::log_and_order(THD*,,ha_commit_trans(THD*,,trans_commit(THD*),Xid_log_event::do_commit(),Xid_apply_log_event::do_apply_event(rpl_group_info*),apply_event_and_update_pos_apply(Log_event*,,rpt_handle_event(rpl_parallel_thread::queued_event*,,handle_rpl_parallel_thread,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,handle_manager,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,buf_flush_page_cleaner(),??,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,binlog_background_thread,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,tpool::thread_pool_generic::wait_for_tasks(std::unique_lock<std::mutex>&,,tpool::thread_pool_generic::get_task(tpool::worker_data*,,tpool::thread_pool_generic::worker_main(tpool::worker_data*),??,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,timer_handler,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,my_service_thread_sleep,ma_checkpoint_background,start_thread,clone
      1 poll,vio_io_wait,vio_socket_io_wait,vio_ssl_read,my_real_read(st_net*,,my_net_read_packet_reallen,cli_safe_read_reallen,handle_slave_io,pfs_spawn_thread,start_thread,clone
      1 poll,vio_io_wait,vio_socket_io_wait,vio_read,my_real_read(st_net*,,my_net_read_packet_reallen,my_net_read_packet,do_command(THD*,,do_handle_one_connection(CONNECT*,,handle_one_connection,pfs_spawn_thread,start_thread,clone
      1 poll,handle_connections_sockets(),mysqld_main(int,,__libc_start_main,_start
      1
Also I find that I cannot kill the query using KILL 12345; the KILL statement just comes back immediately without reporting an error and the statement continues. Is there no way to kill this statement without crashing the DB? Again, this is a replica statement that is executing. I had planned to stop the query, execute an ANALYZE TABLE t1 and then try it again, but I don't seem to be able to kill the query.
KILL always returns immediately. Killing in MariaDB is cooperative, it simply tells the thread "abort execution at your earliest convenience" and it's up to the executing thread to decide when to abort. KILL doesn't wait for it.
Yah, that's kind of what I figured. Is there a recommended (better) course of action other than kill -9 if this happens again?
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Dan,

I see one thread that is doing a re-entrant call to btr_cur_pessimistic_delete() on a secondary index tree when purging the history of a committed transaction (such as a DELETE or an UPDATE of an indexed column). It matches the hang https://jira.mariadb.org/browse/MDEV-29835, which was actually introduced in MySQL 5.7 already and has been present in MariaDB Server starting with 10.2.2.

The thread right below that is executing btr_estimate_n_rows_in_range(), which was improved in https://jira.mariadb.org/browse/MDEV-21136 in MariaDB 10.6.9. After that change, we started to see many more InnoDB hangs. For the 10.6.12 release, I tried to fix MDEV-29835. Because I ran out of time, I fixed only part of it, in https://jira.mariadb.org/browse/MDEV-30400. I have the feeling that this partial fix made the hangs in the remaining cases much more likely.

Yasufumi Kinoshita introduced in MySQL 5.7 a latch mode that sits between exclusive (X) and shared (S), called SX by him, and called U (Update) by me in https://jira.mariadb.org/browse/MDEV-24142. At most one X or SX lock can be granted on an object at a time. While X locks conflict with S locks, the SX lock allows any number of S locks to be acquired concurrently.

The problem is lock order inversion, because Yasufumi's implementation violates his own design constraints (https://dev.mysql.com/worklog/task/?id=6326, High Level Architecture). I helped formulate those rules back then, but I was not otherwise involved with the design, implementation or review of the change. Unfortunately, one section heading, "(2) When holding index->lock SX-latch:", is missing.

The purge thread that is doing the re-entrant call to btr_cur_pessimistic_delete() is holding an index tree SX-latch and some leaf page latches. As part of the page merge, it has to access some non-leaf pages on which it did not acquire latches upfront. According to the design rules, this is the wrong order of acquiring latches. The btr_estimate_n_rows_in_range() thread is holding an index S-latch and following the correct order for that case.

Without having the output of "thread apply all backtrace full", I cannot say for sure that this is a case of MDEV-29835, but I think that it is extremely likely. Based on other cases that I have analyzed, I expect that btr_cur_pessimistic_delete() is holding a page latch that btr_estimate_n_rows_in_range() is waiting for, and is itself waiting for a higher-level page latch that btr_estimate_n_rows_in_range() is holding.

The simple fix would be to never use the index SX-lock mode and always escalate to exclusive locking. We actually tried that years back in https://jira.mariadb.org/browse/MDEV-14637, but it would have caused a significant performance regression. The upcoming quarterly releases (within a month or so) include a fix of MDEV-29835 that only escalates to exclusively locking the index tree when it is really needed. In debug builds, we have assertions that would fire if index page latches are being acquired in the wrong order while not holding an exclusive index latch. This fix was tested both for correctness (lack of debug assertions) and performance.

This is not the only bug that is related to SX-locks; https://jira.mariadb.org/browse/MDEV-29883 is another example.

Some users are successfully using a development snapshot that includes the fix of MDEV-29835. In https://jira.mariadb.org/browse/MDEV-30481 you can find one example.

With best regards,

--
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc
Thanks so much for the detailed response. I won't pretend to understand the details of all of it, but I'm taking away a couple of big things:

1. It's a mutual deadlock that cannot resolve itself.
2. It's (potentially, if this really is a match) fixed in the upcoming 10.6.13.

Am I understanding that much correctly?

In the meantime, is there any potential workaround that can be applied to existing DBs, i.e., some configuration setting or parameter to either remove or at least reduce the potential for the issue? As a guess, yesterday I tried turning off parallel replication on the two systems it has crashed so far (see the sketch after the quoted message below), but as the crash is not regular (I'm not able to repeat it on a development system) I don't know if it helped at all.

Also, I'm concerned about my upstream DBs. So far it has only crashed replicas, and only with that one query. But do I read you right that this problem is not necessarily replication-related, and that it could happen on the primary as well?

The biggest problem I have right now when it happens is that I can't find any way to break out of it; i.e., I cannot kill the thread in question and instead end up having to kill the server, which of course is a huge pain point. Is there anything else I can do that is not so drastic?

Thanks again,

Dan Ragle

On 4/14/2023 1:53 AM, Marko Mäkelä wrote:
Hi Dan,
I see one thread that is doing a re-entrant call to btr_cur_pessimistic_delete() on a secondary index tree when purging the history of a committed transaction (such as a DELETE or an UPDATE of an indexed column). It matches the hang https://jira.mariadb.org/browse/MDEV-29835 that was actually introduced in MySQL 5.7 already and has been present in MariaDB Server starting with 10.2.2.
The thread right below that is executing btr_estimate_n_rows_in_range(), which was improved in https://jira.mariadb.org/browse/MDEV-21136 in MariaDB 10.6.9. After that change, we started to see many more InnoDB hangs. For the 10.6.12 release, I tried to fix MDEV-29835. Because I ran out of time, I fixed only part of it, in https://jira.mariadb.org/browse/MDEV-30400. I have the feeling that this partial fix made the hangs in the remaining cases much more likely.
Yasufumi Kinoshita introduced in MySQL 5.7 a latch mode that sits between exclusive (X) and shared (S), called SX by him, and called U (Update) by me in https://jira.mariadb.org/browse/MDEV-24142. At most one X or SX lock can be granted on an object at a time. While X locks conflict with S locks, the SX lock allows any number of S locks to be acquired concurrently.
The problem is lock order inversion because Yasufumi’s implementation violates his own design constraints https://dev.mysql.com/worklog/task/?id=6326 High Level Architecture. I helped formulate those rules back then, but I was not otherwise involved with the design, implementation or review of the change. Unfortunately, one section heading "(2) When holding index->lock SX-latch:" is missing.
The purge thread that is doing the re-entrant call to btr_cur_pessimistic_delete() is holding an index tree SX-latch and some leaf page latches. As part of the page merge, it has to access some non-leaf pages on which it did not acquire latches upfront. According to the design rules, this is the wrong order of acquiring latches. The btr_estimate_n_rows_in_range() thread is holding an index S-latch and following the correct order for that case.
Without having the output of "thread apply all backtrace full", I cannot say for sure that this is a case of MDEV-29835, but I think that it is extremely likely. Based on other cases that I have analyzed, I expect that the btr_cur_pessimistic_delete() is holding a page latch that btr_estimate_n_rows_in_range() is waiting for, and it is waiting for a higher-level page latch that btr_estimate_n_rows_in_range() is holding.
The simple fix would be to never use the index SX-lock mode and always escalate to exclusive locking. We actually tried that years back in https://jira.mariadb.org/browse/MDEV-14637, but it would have caused a significant performance regression. The upcoming quarterly releases (within a month or so) include a fix of MDEV-29835 that only escalates to exclusively locking the index tree when it is really needed. In debug builds, we have assertions that would fire if index page latches are being acquired in the wrong order while not holding an exclusive index latch. This fix was tested both for correctness (lack of debug assertions) and performance.
This is not the only bug that is related to SX-locks. https://jira.mariadb.org/browse/MDEV-29883 is another example.
Some users are successfully using a development snapshot that includes the fix of MDEV-29835. In https://jira.mariadb.org/browse/MDEV-30481 you can find one example.
With best regards,
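For reference, the "turning off parallel replication" workaround Dan mentions in his reply corresponds to setting slave_parallel_threads to 0. It can only be changed while replication is stopped, and nothing in this thread establishes that it actually avoids the latch conflict Marko describes; a sketch:

    STOP SLAVE;
    SET GLOBAL slave_parallel_threads = 0;   -- 0 disables parallel apply
    START SLAVE;
    -- to keep the setting across restarts, also put
    --   slave_parallel_threads = 0
    -- under [mysqld] in the server configuration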