Hi Sergei, Andrei,

The commit message had me confused at first, because there are mechanisms in parallel replication to ensure that things stay alive as long as needed, and because if description_event_for_exec pointed to invalid (freed) memory, that would indicate a deeper problem than can be solved by just copying a single value out.

So I looked into it a bit (thanks Andrei for doing a testcase BTW!). And the real problem is not use-after-free; it is that rli->relay_log.description_event_for_exec is owned by the SQL driver thread and updated without any thread synchronisation, so it is wrong to access it from a worker thread at all.

So I checked, and found that there is actually another case of the same problem not fixed in the patch, in Execute_load_log_event::do_apply_event():

    int Execute_load_log_event::do_apply_event(rpl_group_info *rgi)
      ...
      if (!(lev= (Load_log_event*)
            Log_event::read_log_event(&file,
                                      rli->relay_log.description_event_for_exec,
                                      opt_slave_sql_verify_checksum)) ||

It looks to me that this will have the same problem: when run in the worker thread, the value of description_event_for_exec will be for whatever point in the log the SQL thread is currently reading, not the one corresponding to the LOAD_EVENT. So Load_log_event will need to take a copy of the description_event_for_exec, just like Query_log_event needed the options_written_to_binlog value. (A rough standalone sketch of this copy-out pattern is at the end of this mail.)

An alternative approach to the fix could be to use the inuse_relaylog. Parallel replication keeps a list of these (one for each relay log having workers active), and keeps each element alive until all workers are done with that relay log. Currently inuse_relaylog mostly holds the name of the relay log file, used for transaction retry in optimistic parallel replication. But it could be extended with a copy of description_event_for_exec, which would then be available to do_apply_event() in rgi->relaylog (also sketched below). That is perhaps the "correct" fix. But as long as it's only copying a 32-bit value to the query event (plus a similar fix for LOAD_EVENT), maybe the fix in the patch is fine as well.

Some other smaller comments:

The commit message is misleading. The problem is that worker threads are accessing an FD owned by and updated by the SQL driver thread. This FD can be from any random binlog following the current query of the worker thread, as Sergei pointed out. The FD being freed while the worker thread is using its copy is just one symptom; more commonly the worker will use a too-new FD, which is also what triggers the error in Andrei's testcase.

And it would be good to put a comment in the code on description_event_for_exec saying that it must _not_ be accessed from worker threads (suggested wording at the end of this mail). The name is somewhat misleading ("exec" suggests exactly worker-thread operation); maybe it should be renamed to something like "description_event_for_sql_thread", and description_event_for_queue could be called "description_event_for_io_thread", or something.

BTW, the testcase only fails sporadically (without the fix), because it depends on whether the SQL thread has had time to read ahead to a new FD by the time the CREATE TABLE t1 runs in the worker thread. There is a wait in the testcase which seems to be intended to ensure this condition:

    # Wait for the last event recieve before to unlock t2
    --let $wait_condition= SELECT COUNT(*) > 0 FROM information_schema.processlist WHERE state = "Waiting for worker threads to be idle"

But apparently it's ineffective; I had to run the test ~50 times for the error to trigger. Just mentioning it, maybe it's worth spending time on.
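To make the copy-out idea concrete, here is a minimal standalone model of the problem and of the fix. None of the names are from the actual source (FormatDescriptor stands in for the FD event, options_written for options_written_to_binlog); the point is only that the driver thread copies the value into the work item _before_ the handoff, so the queue's own locking provides the synchronisation:

    #include <cstdint>
    #include <mutex>
    #include <queue>

    struct FormatDescriptor {        // stands in for the FD event
      uint32_t options_written;
    };

    struct WorkItem {
      uint32_t options_written;      // copy taken by the driver thread
    };

    struct Driver {
      FormatDescriptor *current_fd;  // owned and updated by the driver
                                     // thread with no synchronisation;
                                     // the bug is workers dereferencing
                                     // this pointer directly
      std::mutex queue_mutex;
      std::queue<WorkItem> work_queue;

      void enqueue_event()
      {
        WorkItem item;
        // Copy while still in the driver thread; after this, workers
        // never need to touch current_fd at all.
        item.options_written= current_fd->options_written;
        std::lock_guard<std::mutex> lock(queue_mutex);
        work_queue.push(item);       // the locked handoff publishes the copy
      }
    };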
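And a similar standalone model of the inuse_relaylog alternative. Again the names and layout are made up (InuseRelaylog mirrors the real inuse_relaylog only loosely); the idea is just that the record is refcounted per relay log and could carry an owned FD copy that workers reach through their group info (rgi->relaylog in the real code):

    #include <atomic>
    #include <cstdint>

    struct FormatDescriptor { uint32_t options_written; };

    struct InuseRelaylog {
      char name[512];                // relay log file name (used today for
                                     // optimistic-mode transaction retry)
      FormatDescriptor fd_copy;      // proposed addition: the FD that is
                                     // valid for this relay log
      std::atomic<int> ref_count{0};
    };

    // The last worker done with this relay log frees the record, so
    // fd_copy stays valid exactly as long as any worker can still need it.
    void release(InuseRelaylog *irl)
    {
      if (irl->ref_count.fetch_sub(1) == 1)
        delete irl;
    }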
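Finally, roughly the kind of comment I have in mind for description_event_for_exec (the wording is mine, and the declaration is written from memory rather than copied from the source):

    /*
      Format description event for the point in the relay log that the
      SQL driver thread is currently *reading*.  It is owned and updated
      by the SQL driver thread without any synchronisation, so it must
      NOT be accessed from parallel worker threads, which may be
      executing events from an earlier point (or an earlier relay log).
    */
    Format_description_log_event *description_event_for_exec;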
Hope this helps,

 - Kristian.

Sergei Golubchik <serg@mariadb.org> writes:
> Hi, Andrei,
>
> I still don't understand this.
>
> 1. Why is rgi->options_to_bin_log only set for GTID_EVENT? Is there a
>    guarantee that there will always be a GTID_EVENT before a QUERY_EVENT?
>    What if gtids aren't enabled?
>
> 2. How do you guarantee that all query events for a previous value of
>    options_written_to_bin_log have been applied before updating it in
>    the rgi?
>
> 3. And if you do guarantee it, why is it atomic? No query events =
>    nobody reads options_written_to_bin_log concurrently.
>
> And why not do a simple patch, like the one I've attached? (Note, it's
> an intentionally minimal patch to show the fix; practically I'd
> accompany it with a small cleanup patch.)
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and security@mariadb.org