Nirbhay,
I'm going to have a number of patches/suggestions from chasing
this. Hopefully I'll have them in a consumable fashion soon.
Is it preferred to send them all to the list?
This is a rough summary of what we've found so far:
1.) MDEV-6924: either:
fix because CTAS uses THD::STMT_QUERY_TYPE
alternatively: Query_log_event::Query_log_event()
flips the setting of "direct" when the binlog format is not row /
picks an inappropriate setting of use_cache
1a.) Revert the patch for MDEV-7673, as it apparently can cause a crash
with WSREP: FSM: no such a transition REPLICATING -> REPLICATING
2.) select_insert::send_eof() will call my_ok() when called from
select_create::send_eof() even if abort_result_set() is going to
be called. Rectify this for the CTAS case.
3.) The wsrep_applier thread tends to spin, trying to apply the same
transaction multiple times until the cluster fails, even though the
selected victim thread is slowly trying to abort.
a.) increase timeout if a victim has been selected
b.) don't downcall from wsrep_abort_thd if victim is already
aborting
4.) select_create::send_eof() sets exit_done before checking whether
Galera is going to call abort_result_set(), which can leave unexpected
tables present and cause cluster failure as a result.
5.) handle_select() resets thd->killed() even when the thread was a
victim thread, causing a crash.
6.) cherry-picking upstream commit cc3d09bc8d5a78abc064d289045b20363aab9d28
(I believe you're already aware of this one seeing as how your
name is on it)
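
To illustrate the idea behind 3a/3b, here is a minimal standalone sketch.
It is not the server code: Victim, ConflictState, request_abort() and
retry_wait() are hypothetical names made up for this example, and the 10x
wait multiplier is an arbitrary assumption. It only shows the shape of
"abort a victim at most once, and back off longer once a victim exists".

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a victim thread's conflict state (not the real THD).
enum class ConflictState { NoConflict, MustAbort, Aborting };

struct Victim {
    std::atomic<ConflictState> state{ConflictState::NoConflict};
    int abort_calls = 0;  // how many times the expensive abort path actually ran
};

// 3b: only the first caller starts the abort; later callers see that the
// victim is already aborting and return without calling down again.
bool request_abort(Victim& v) {
    ConflictState expected = ConflictState::NoConflict;
    if (!v.state.compare_exchange_strong(expected, ConflictState::MustAbort))
        return false;  // already selected as victim / already aborting
    ++v.abort_calls;   // placeholder for the real abort work
    v.state.store(ConflictState::Aborting);
    return true;
}

// 3a: once a victim has been selected, wait longer before retrying
// instead of spinning. The factor of 10 is an assumption for illustration.
std::chrono::milliseconds retry_wait(const Victim& v,
                                     std::chrono::milliseconds base) {
    assert(base.count() >= 0);
    return v.state.load() == ConflictState::NoConflict ? base : base * 10;
}
```

With this shape, an applier that retries repeatedly only triggers one
abort on the victim and then backs off while the victim finishes rolling
back.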
Thanks,
Andy
--
Andrew W. Elble
aweits@discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912