Re: [Maria-developers] [Merge] lp:~paul-mccullagh/maria/maria-pbxt-rc3 into lp:maria
Hi, Paul! On Dec 01, Paul McCullagh wrote:
This patch changes MariaDB code in one case:
File handler.cc, Line 1591
DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog
This line has been commented out, because the assertions fails now that PBXT supports XA.
Although this assertion implies that not everything will work, I have not found any problem with the stability of MySQL due to the addition of PBXT XA.
Eh, no, not exactly. This assertion was to make sure that the ifdef'ed code only runs when InnoDB is the only XA-capable engine. As it's no longer the case, you need to remove the whole ifdef block, not only the assert.
PBXT XA can be disabled by setting pbxt_support_xa=0, it is enabled by default.
Regards / Mit vielen Grüßen, Sergei -- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
Hi Sergei, Thanks for the heads-up on that! The code in question is as follows: #ifndef WILL_BE_DELETED_LATER /* for now, only InnoDB supports 2pc. It means we can always safely rollback all pending transactions, without risking inconsistent data */ DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK info.dry_run=FALSE; #endif What would be the affect of removing these lines? And what do you think we would need to test when we take out the lines? Best regards, Paul On Dec 1, 2009, at 1:14 PM, Sergei Golubchik wrote:
Hi, Paul!
On Dec 01, Paul McCullagh wrote:
This patch changes MariaDB code in one case:
File handler.cc, Line 1591
DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog
This line has been commented out, because the assertions fails now that PBXT supports XA.
Although this assertion implies that not everything will work, I have not found any problem with the stability of MySQL due to the addition of PBXT XA.
Eh, no, not exactly. This assertion was to make sure that the ifdef'ed code only runs when InnoDB is the only XA-capable engine.
As it's no longer the case, you need to remove the whole ifdef block, not only the assert.
PBXT XA can be disabled by setting pbxt_support_xa=0, it is enabled by default.
Regards / Mit vielen Grüßen, Sergei
-- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim- Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
-- Paul McCullagh PrimeBase Technologies www.primebase.org www.blobstreaming.org pbxt.blogspot.com
Hi, Paul! On Dec 01, Paul McCullagh wrote:
Thanks for the heads-up on that!
The code in question is as follows:
#ifndef WILL_BE_DELETED_LATER /* for now, only InnoDB supports 2pc. It means we can always safely rollback all pending transactions, without risking inconsistent data */ DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK info.dry_run=FALSE; #endif
What would be the affect of removing these lines?
I'd better start from "what is the effect of these lines" :) 2PC (with a binlog) works as follows: prepare in all affected storage engines binlog write commit in all affected storage engines XA recovery (with a binlog) works as follows: ask storage engines about the prepared but not committed transactions ask engines to commit every such a transaction that is in a binlog and ask engines to rollback other prepared transactions Now, if a binlog doesn't indicate a crash (last binlog file was closed properly, showing a normal shutdown procedure) or, for example, there're no binlog files at all, the server cannot perform the recovery - it doesn't know what transactions should be committed and what transactions should be rolled back. A transaction may be prepared in one engine and already committed (or rolled back) in another. In this case the server requests the user to make the decision, the user has to request an explicit commit or rollback with --tc-heuristic-recover command-line switch. But if there's only one XA-capable storage engine, it's always safe to rollback all prepared transactions, there can be no other engine that has them committed. That's what the ifdef-ed code does - if InnoDB is the only XA-capable storage engine on recovery without binlog it forces a rollback of all not committed transactions, preserving the pre-XA InnoDB behavior.
And what do you think we would need to test when we take out the lines?
Start a transaction, do some table writes. Crash the server during the 2pc, for example: SET @@debug="d,crash_commit_after_prepare"; COMMIT; delete all binlog files (or only the last one), try to start mysqld - it should abort with an error. The error message tells what to do - I hope you'll find it clear enough (but please complain, if not).
On Dec 1, 2009, at 1:14 PM, Sergei Golubchik wrote:
On Dec 01, Paul McCullagh wrote:
This patch changes MariaDB code in one case:
File handler.cc, Line 1591
DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog
This line has been commented out, because the assertions fails now that PBXT supports XA.
Although this assertion implies that not everything will work, I have not found any problem with the stability of MySQL due to the addition of PBXT XA.
Eh, no, not exactly. This assertion was to make sure that the ifdef'ed code only runs when InnoDB is the only XA-capable engine.
As it's no longer the case, you need to remove the whole ifdef block, not only the assert.
PBXT XA can be disabled by setting pbxt_support_xa=0, it is enabled by default.
Regards / Mit vielen Grüßen, Sergei
-- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
-- Paul McCullagh PrimeBase Technologies www.primebase.org www.blobstreaming.org pbxt.blogspot.com
Regards / Mit vielen Grüßen, Sergei -- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
Hi Sergei, Thanks for the great explanation. On Dec 1, 2009, at 2:50 PM, Sergei Golubchik wrote:
Hi, Paul!
On Dec 01, Paul McCullagh wrote:
Thanks for the heads-up on that!
The code in question is as follows:
#ifndef WILL_BE_DELETED_LATER /* for now, only InnoDB supports 2pc. It means we can always safely rollback all pending transactions, without risking inconsistent data */ DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK info.dry_run=FALSE; #endif
What would be the affect of removing these lines?
I'd better start from "what is the effect of these lines" :)
2PC (with a binlog) works as follows:
prepare in all affected storage engines binlog write commit in all affected storage engines
XA recovery (with a binlog) works as follows:
ask storage engines about the prepared but not committed transactions ask engines to commit every such a transaction that is in a binlog and ask engines to rollback other prepared transactions
Now, if a binlog doesn't indicate a crash (last binlog file was closed properly, showing a normal shutdown procedure) or, for example, there're no binlog files at all, the server cannot perform the recovery - it doesn't know what transactions should be committed and what transactions should be rolled back. A transaction may be prepared in one engine and already committed (or rolled back) in another. In this case the server requests the user to make the decision, the user has to request an explicit commit or rollback with --tc-heuristic-recover command-line switch.
So if I understand this correctly, tc-heuristic-recover handles a situation that should actually never occur.
But if there's only one XA-capable storage engine, it's always safe to rollback all prepared transactions, there can be no other engine that has them committed. That's what the ifdef-ed code does - if InnoDB is the only XA-capable storage engine on recovery without binlog it forces a rollback of all not committed transactions, preserving the pre-XA InnoDB behavior.
So, in fact, we could change this code: #ifndef WILL_BE_DELETED_LATER /* for now, only InnoDB supports 2pc. It means we can always safely rollback all pending transactions, without risking inconsistent data */ DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK info.dry_run=FALSE; #endif to: if (total_ha_2pc == (ulong) opt_bin_log+1) { tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK info.dry_run=FALSE; }
And what do you think we would need to test when we take out the lines?
Start a transaction, do some table writes. Crash the server during the 2pc, for example:
SET @@debug="d,crash_commit_after_prepare"; COMMIT;
delete all binlog files (or only the last one), try to start mysqld - it should abort with an error.
OK, thanks. I will give this a try.
The error message tells what to do - I hope you'll find it clear enough (but please complain, if not).
Will do! Best regards, Paul
On Dec 1, 2009, at 1:14 PM, Sergei Golubchik wrote:
On Dec 01, Paul McCullagh wrote:
This patch changes MariaDB code in one case:
File handler.cc, Line 1591
DBUG_ASSERT(total_ha_2pc == (ulong) opt_bin_log+1); // only InnoDB and binlog
This line has been commented out, because the assertions fails now that PBXT supports XA.
Although this assertion implies that not everything will work, I have not found any problem with the stability of MySQL due to the addition of PBXT XA.
Eh, no, not exactly. This assertion was to make sure that the ifdef'ed code only runs when InnoDB is the only XA-capable engine.
As it's no longer the case, you need to remove the whole ifdef block, not only the assert.
PBXT XA can be disabled by setting pbxt_support_xa=0, it is enabled by default.
Regards / Mit vielen Grüßen, Sergei
-- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim- Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
-- Paul McCullagh PrimeBase Technologies www.primebase.org www.blobstreaming.org pbxt.blogspot.com
Regards / Mit vielen Grüßen, Sergei
-- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim- Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
-- Paul McCullagh PrimeBase Technologies www.primebase.org www.blobstreaming.org pbxt.blogspot.com
Hi, Paul! On Dec 02, Paul McCullagh wrote:
Now, if a binlog doesn't indicate a crash (last binlog file was closed properly, showing a normal shutdown procedure) or, for example, there're no binlog files at all, the server cannot perform the recovery - it doesn't know what transactions should be committed and what transactions should be rolled back. A transaction may be prepared in one engine and already committed (or rolled back) in another. In this case the server requests the user to make the decision, the user has to request an explicit commit or rollback with --tc-heuristic-recover command-line switch.
So if I understand this correctly, tc-heuristic-recover handles a situation that should actually never occur.
Yes, that's mainly to account for a user mistake - deleting binlog after a crash or copying the live datadir from master to a slave without binlog. Or something similar.
But if there's only one XA-capable storage engine, it's always safe to rollback all prepared transactions, there can be no other engine that has them committed. That's what the ifdef-ed code does - if InnoDB is the only XA-capable storage engine on recovery without binlog it forces a rollback of all not committed transactions, preserving the pre-XA InnoDB behavior.
So, in fact, we could change this code:
to:
if (total_ha_2pc == (ulong) opt_bin_log+1) { tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK info.dry_run=FALSE; }
I intentionally did it with ifdef - the idea was that if there could *possibly* be another XA-engine, we need to go the safe way. Otherwise one could, for example, restart the server with disabled pbxt and unknowingly enable auto-rollback mode, causing inconsistent data. Regards / Mit vielen Grüßen, Sergei -- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
Serg, thanks a lot for your help and comments! Paul, for now I have removed the whole #ifdef block following Serg's suggestion, and pushed the result for full Buildbot testing; if everything looks ok I will merge into main lp:maria. It is great that you are following up on this, just let me know if you find a better approach and we'll change it. - Kristian.
Hi Kristian, On Dec 2, 2009, at 1:07 PM, Kristian Nielsen wrote:
Serg, thanks a lot for your help and comments!
Paul, for now I have removed the whole #ifdef block following Serg's suggestion, and pushed the result for full Buildbot testing; if everything looks ok I will merge into main lp:maria.
Thanks, this would be great.
It is great that you are following up on this, just let me know if you find a better approach and we'll change it.
As I mentioned in a previous e-mail to Sergei, his comments have lead me to discover that 2-phase commit is not used with PBXT and the binlog (although 1.0.09 supports 2-phase commit). I am investigating the problem, and will push a patch as soon as I have fixed the problem. In the meantime, I believe it is still OK to merge 1.0.09 after testing, as you say. Best regards, Paul -- Paul McCullagh PrimeBase Technologies www.primebase.org www.blobstreaming.org pbxt.blogspot.com
Hi Sergei, I have tested this, and it all works as you specified, but now I have found a problem with PBXT. The problem stems from code in PBXT (in functions ha_pbxt::write_row, ha_pbxt::update_row, ha_pbxt::delete_row) which I have pasted below. The code is a hack! According to the comment I had a problem with calling: trans_register_ha(pb_mysql_thd, FALSE, pbxt_hton), to register the start of a transaction statement. When I did this unconditionally at the start of a statement, ha_commit_trans() was not always called (and MySQL became confused, thinking a transaction is still running). On reading the MySQL code (long time ago) I noticed that ha_commit_trans() was only always called when an update was performed. So I delayed the trans_register_ha(pb_mysql_thd, FALSE, pbxt_hton) call until write_row(), etc was called. The problem now is that MySQL now fails to recognize PBXT transactions as XA (see ha_write_row below), because mark_trx_read_write() is called before write_row() (i.e. before trans_register_ha() call). I noticed that InnoDB and NDB call trans_register_ha() unconditionally at the start of a statement. Could it be that the problem that lead to my "hack" has been fixed? Best regards, Paul PBXT code in ha_pbxt::write_row, etc. ------------------------------------------------------------ /* GOTCHA: I have a huge problem with the transaction statement. * It is not ALWAYS committed (I mean ha_commit_trans() is * not always called - for example in SELECT). * * If I call trans_register_ha() but ha_commit_trans() is not called * then MySQL thinks a transaction is still running (while * I have committed the auto-transaction in ha_pbxt::external_lock()). * * This causes all kinds of problems, like transactions * are killed when they should not be. * * To prevent this, I only inform MySQL that a transaction * has beens started when an update is performed. I have determined that * ha_commit_trans() is only guarenteed to be called if an update is done. */ if (!pb_open_tab->ot_thread->st_stat_trans) { trans_register_ha(pb_mysql_thd, FALSE, pbxt_hton); XT_PRINT0(pb_open_tab->ot_thread, "ha_pbxt::write_row trans_register_ha all=FALSE\n"); pb_open_tab->ot_thread->st_stat_trans = TRUE; } ------------------------------- int handler::ha_write_row(uchar *buf) { int error; Log_func *log_func= Write_rows_log_event::binlog_row_logging_function; DBUG_ENTER("handler::ha_write_row"); mark_trx_read_write(); if (unlikely(error= write_row(buf))) DBUG_RETURN(error); if (unlikely(error= binlog_log_row(table, 0, buf, log_func))) DBUG_RETURN(error); /* purecov: inspected */ DBUG_RETURN(0); } On Dec 2, 2009, at 12:26 PM, Sergei Golubchik wrote:
Hi, Paul!
On Dec 02, Paul McCullagh wrote:
Now, if a binlog doesn't indicate a crash (last binlog file was closed properly, showing a normal shutdown procedure) or, for example, there're no binlog files at all, the server cannot perform the recovery - it doesn't know what transactions should be committed and what transactions should be rolled back. A transaction may be prepared in one engine and already committed (or rolled back) in another. In this case the server requests the user to make the decision, the user has to request an explicit commit or rollback with --tc-heuristic-recover command-line switch.
So if I understand this correctly, tc-heuristic-recover handles a situation that should actually never occur.
Yes, that's mainly to account for a user mistake - deleting binlog after a crash or copying the live datadir from master to a slave without binlog. Or something similar.
But if there's only one XA-capable storage engine, it's always safe to rollback all prepared transactions, there can be no other engine that has them committed. That's what the ifdef-ed code does - if InnoDB is the only XA-capable storage engine on recovery without binlog it forces a rollback of all not committed transactions, preserving the pre-XA InnoDB behavior.
So, in fact, we could change this code:
to:
if (total_ha_2pc == (ulong) opt_bin_log+1) { tc_heuristic_recover= TC_HEURISTIC_RECOVER_ROLLBACK; // forcing ROLLBACK info.dry_run=FALSE; }
I intentionally did it with ifdef - the idea was that if there could *possibly* be another XA-engine, we need to go the safe way. Otherwise one could, for example, restart the server with disabled pbxt and unknowingly enable auto-rollback mode, causing inconsistent data.
Regards / Mit vielen Grüßen, Sergei
-- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim- Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
-- Paul McCullagh PrimeBase Technologies www.primebase.org www.blobstreaming.org pbxt.blogspot.com
participants (3)
-
Kristian Nielsen
-
Paul McCullagh
-
Sergei Golubchik