[Maria-developers] crash in TC_LOG_MMAP::log_one_transaction in maria-10.0.0
Hello,

I am having problems with the shiny new mariadb-10.0.0. I have a simple test case that replaces a row in innodb and tokudb and then commits. Eventually, mysqld hits this assert:

mysqld: /home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7547: int TC_LOG_MMAP::log_one_transaction(my_xid): Assertion `p->ptr < p->end' failed.

The test works great when the binlog is enabled. The test crashes when the binlog is OFF. So, maybe the problem is isolated to the TC_LOG_MMAP logic. I noticed that the TC_LOG_MMAP::unlog function is quite different from the mariadb-5.5 version.

What is the best way to get this fixed? Would it help if I made the test case available?

Thanks
Rich Prohaska

Here is the stack trace:

Program received signal SIGABRT, Aborted.
0x0000003006a32885 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003006a32885 in raise () from /lib64/libc.so.6
#1 0x0000003006a34065 in abort () from /lib64/libc.so.6
#2 0x0000003006a2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000003006a2bac0 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000008862bf in TC_LOG_MMAP::log_one_transaction (this=0x1541f00 <tc_log_mmap>, xid=142018) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7547
#5 0x0000000000885442 in TC_LOG_MMAP::log_and_order (this=0x1541f00 <tc_log_mmap>, thd=0x27f0000, xid=142018, all=true, need_prepare_ordered=false, need_commit_ordered=true) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7180
#6 0x00000000007b819f in ha_commit_trans (thd=0x27f0000, all=true) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/handler.cc:1329
#7 0x0000000000710307 in trans_commit (thd=0x27f0000) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/transaction.cc:213
#8 0x000000000060a2df in mysql_execute_command (thd=0x27f0000) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:4084
#9 0x000000000060f1a7 in mysql_parse (thd=0x27f0000, rawbuf=0x7fff6c0050b8 "commit", length=6, parser_state=0x7fffe4112660) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:6056
#10 0x00000000006029a3 in dispatch_command (command=COM_QUERY, thd=0x27f0000, packet=0x27fa341 "commit", packet_length=6) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:1216
#11 0x0000000000601bdb in do_command (thd=0x27f0000) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:945
#12 0x00000000006ff6d1 in do_handle_one_connection (thd_arg=0x27f0000) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_connect.cc:1254
#13 0x00000000006ff179 in handle_one_connection (arg=0x27f0000) at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_connect.cc:1168
#14 0x0000003006e077f1 in start_thread () from /lib64/libpthread.so.0
#15 0x0000003006ae592d in clone () from /lib64/libc.so.6
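To make the failing check easier to picture, here is a minimal Python sketch of what an assertion like `p->ptr < p->end` guards. It is a toy model under stated assumptions, not the server's code: the `Page` class, its `free` counter, and the way the counter is pushed out of sync are all hypothetical, and the sketch does not claim to show the actual cause of the crash reported above.

```python
class Page:
    """Toy model of one fixed-size page of XID slots (0 means 'empty')."""

    def __init__(self, size):
        self.slots = [0] * size  # stands in for the mmap'ed array of XIDs
        self.ptr = 0             # index of the next slot to try
        self.free = size         # bookkeeping: slots we believe are empty

    def log_one_transaction(self, xid):
        # Skip over occupied slots until an empty one is found.
        while self.ptr < len(self.slots) and self.slots[self.ptr] != 0:
            self.ptr += 1
        # Counterpart of the failing check: a page chosen for writing
        # must still have an empty slot at or after ptr.
        if not self.ptr < len(self.slots):
            raise AssertionError("p->ptr < p->end")
        self.slots[self.ptr] = xid
        self.free -= 1
        return self.ptr  # the caller keeps this as a cookie for unlog()

    def unlog(self, cookie):
        # Release the slot once the transaction is committed everywhere.
        self.slots[cookie] = 0
        self.free += 1
        self.ptr = min(self.ptr, cookie)


page = Page(2)
a = page.log_one_transaction(101)
b = page.log_one_transaction(102)

page.free += 1  # hypothetical: the counter drifts out of sync with the slots
try:
    if page.free > 0:  # the caller trusts the counter...
        page.log_one_transaction(103)
    overflowed = False
except AssertionError:
    overflowed = True  # ...but no slot is actually empty
```

The point of the model is only that the check fires when the bookkeeping says a page has room while every slot is in fact taken, which is the kind of inconsistency an ordering problem during commit could produce.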
Hi, Rich!

On Nov 15, Rich Prohaska wrote:
> Hello, I am having problems with the shiny new mariadb-10.0.0.
> I have a simple test case that replaces a row in innodb and tokudb and then commits. Eventually, mysqld hits this assert:
> mysqld: /home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7547: int TC_LOG_MMAP::log_one_transaction(my_xid): Assertion `p->ptr < p->end' failed.
> The test works great when the binlog is enabled. The test crashes when the binlog is OFF. So, maybe the problem is isolated to the TC_LOG_MMAP logic. I noticed that the TC_LOG_MMAP::unlog function is quite different from the mariadb-5.5 version.

Yes, that's because of MDEV-232 (Remove one fsync() inside engine's commit() method).

> What is the best way to get this fixed? Would it help if I made the test case available?

Yes, please, it would help a lot!

Regards,
Sergei
Hello Sergei,

I made a 10.0.0 branch on launchpad with tokudb in it. The following can be used to hit the assert in TC_LOG_MMAP::log_one_transaction. If you have a chance, please take a look and let us know what we need to change (in the maria 10 code base or in our storage engine) to get this fixed.

Thanks
Rich Prohaska

Here is what I did to reproduce the problem:

mkdir m10 m10-build
cd m10-build

# get my maria10 branch and the 6.5.0 fractal tree SDK
bzr branch lp:~prohaska7/maria/2pc-crash-with-maria10-and-tokudb
wget https://s3.amazonaws.com/tokutek-mysql-6.5.0/tokufractaltreeindex-6.5.0-4816...
wget https://s3.amazonaws.com/tokutek-mysql-6.5.0/tokufractaltreeindex-6.5.0-4816...
md5sum --check *.md5
tar xzf tokufractaltreeindex-6.5.0-48167-linux-x86_64.tar.gz

# setup env to point to the fractal tree includes and libs
export TOKUFRACTALTREE=$PWD/tokufractaltreeindex-6.5.0-48167-linux-x86_64
export TOKUFRACTALTREE_LIBNAME=tokufractaltreeindex-6.5.0-48167_static
export TOKUPORTABILITY_LIBNAME=tokuportability-6.5.0-48167_static
export TOKUDB_VERSION=test10
export MYSQL_BUILD_PREFIX=$HOME/m10

# build and install
cd 2pc-crash-with-maria10-and-tokudb
./BUILD/compile-pentium-debug-max
make install

# run mysql_install_db
# start mysqld
# install plugin tokudb soname 'ha_tokudb.so';
# shutdown and reboot mysqld (so as to avoid a nasty crash in
# start mysqld without binlog so that TC_LOG_MMAP is being used
# create tokudb and innodb test tables
# the following python program is attached.
# ./logxid.deadlock.py --docreate=1
# run some simple multi-engine txns until mysqld crashes. takes a couple of minutes on my machine.
# ./logxid.deadlock.py --k=1
# should hit this assert eventually:
# mysqld: /home/prohaska/m10-build/2pc-crash-with-maria10-and-tokudb/sql/log.cc:7547: int TC_LOG_MMAP::log_one_transaction(my_xid): Assertion `p->ptr < p->end' failed.
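The attached logxid.deadlock.py is not reproduced in this thread. As a rough idea of what a client of this kind might do, here is a minimal Python sketch of "replace a row in innodb and tokudb, then commit" in a loop. Everything named here is an assumption made for illustration: the table names `t_innodb` and `t_tokudb`, the columns, and the `key` parameter (which may or may not correspond to the script's `--k` option). A small recording stand-in replaces the real connection so the sketch runs without a server.

```python
def replace_in_both_engines(conn, key, value):
    """Run one transaction that touches an InnoDB and a TokuDB table.

    `conn` is any DB-API 2.0 style connection; the table and column
    names are hypothetical.
    """
    cur = conn.cursor()
    cur.execute("REPLACE INTO t_innodb (id, x) VALUES (%s, %s)", (key, value))
    cur.execute("REPLACE INTO t_tokudb (id, x) VALUES (%s, %s)", (key, value))
    # One commit covering two transactional engines: with the binlog off,
    # the thread above says TC_LOG_MMAP is what the server uses here.
    conn.commit()


def run_client(conn, key, iterations):
    """Repeat the two-engine transaction `iterations` times."""
    for i in range(iterations):
        replace_in_both_engines(conn, key, i)


class RecordingConnection:
    """Stand-in connection that only records what it is asked to do."""

    def __init__(self):
        self.log = []

    def cursor(self):
        return self

    def execute(self, sql, params=()):
        self.log.append((sql, tuple(params)))

    def commit(self):
        self.log.append(("COMMIT", ()))


conn = RecordingConnection()
run_client(conn, key=1, iterations=3)
```

With a real MySQL client library that follows DB-API 2.0 and the `%s` parameter style, the same two functions could be pointed at a live connection instead of `RecordingConnection`.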
Rich Prohaska <prohaska7@gmail.com> writes:
> I made a 10.0.0 branch on launchpad with tokudb in it. The following can be used to hit the assert in TC_LOG_MMAP::log_one_transaction. If

Thanks a lot for your effort. I'll look into fixing this ASAP. See MDEV-3861 for tracking this issue:

https://mariadb.atlassian.net/browse/MDEV-3861

- Kristian.
Rich Prohaska <prohaska7@gmail.com> writes:
> I made a 10.0.0 branch on launchpad with tokudb in it. The following can be used to hit the assert in TC_LOG_MMAP::log_one_transaction. If

With the test case you supplied, it was easy to repeat the problem, thanks a lot!

The attached patch fixes the issue for me, please try it and let me know if it solves the problem for you also.

This bug (two bugs actually) is in all versions of MariaDB. Perhaps it is only possible to trigger it in 10.0, or at least easier to trigger, as it depends on the order in which things happen during commit.

Thanks for finding and reporting this. As you have learned the hard way, this (multi-engine XA transactions with no binlog) is one of the not-so-well tested parts of the server, but if you keep the bug reports coming we will do our best to fix them ASAP.

- Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
> The attached patch fixes the issue for me, please try it and let me know if it solves the problem for you also.

Sorry, the patch is broken, I didn't test carefully enough. I'll look at this further Monday and get back to you with more, please don't waste time on this broken patch...

- Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
> Sorry, the patch is broken, I didn't test carefully enough.

In fact, I found *another* bug, making it 3 in total... The first two of these also exist in older MariaDB/MySQL, but may be hard/impossible to trigger without the third bug, which is something I introduced in 10.0.

Try this patch, I verified more carefully now that it fixes the problem and allows your test to run without crashing.

Thanks again for taking the time to report this. Let us know if you find more issues, and I will look into fixing them as well.

- Kristian.
Hello Kristian,

Thanks for the patch. It fixes the problem that I described. Now onto the next problem. Try running:

logxid.deadlock.py --k=1 &
logxid.deadlock.py --k=2 &
logxid.deadlock.py --k=3 &

When one runs multiple XA clients with a debug build of my branch, we crash at log.cc line 7567. An unlocked mutex is being unlocked. I discussed this problem with Sergei and we think that the problem is caused a few lines above. I updated my branch (lp:~prohaska7/maria/2pc-crash-with-maria10-and-tokudb) with a proposed bug fix.

Rich Prohaska
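The code around log.cc line 7567 and the proposed fix are not shown in this thread, so the following is only a generic Python sketch of the class of error described ("an unlocked mutex is being unlocked"), where the real mistake sits a few lines before the unlock that gets reported. The functions `buggy_wait` and `fixed_wait` are hypothetical and do not mirror the server's code.

```python
import threading

lock = threading.Lock()


def buggy_wait(must_wait):
    lock.acquire()
    if must_wait:
        # An inner branch gives up the lock (for example around a wait)...
        lock.release()
    # ...and the common exit path releases it again without re-acquiring.
    lock.release()


def fixed_wait(must_wait):
    lock.acquire()
    if must_wait:
        lock.release()
        # The waiting would happen here; take the lock back afterwards.
        lock.acquire()
    lock.release()


try:
    buggy_wait(True)
    double_unlock_detected = False
except RuntimeError:
    # Python's Lock refuses to release a lock that is not held.
    double_unlock_detected = True

fixed_wait(True)
```

As in the report, the error surfaces at the final unlock, while the line that actually needs changing is the earlier branch that left the lock released.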
Rich Prohaska <prohaska7@gmail.com> writes:
> When one runs multiple XA clients with a debug build of my branch, we crash at log.cc line 7567. An unlocked mutex is being unlocked. I discussed this problem with Sergei and we think that the problem is caused a few lines above. I updated my branch (lp:~prohaska7/maria/2pc-crash-with-maria10-and-tokudb) with a proposed bug fix.

Yes, your patch looks correct, thanks! I'll include it along with my other patches.

Do your tests work with your latest patch?

- Kristian.
The test ran for a few hours without error. Thanks for quickly fixing the problems.

How can we get tokudb into the mariadb regression environment?
participants (3)
- Kristian Nielsen
- Rich Prohaska
- Sergei Golubchik