Hi Kentoku,

I just reviewed one of your revisions, specifically
bzr diff -c3829 lp:~kentokushiba/maria/10.0.4-spider-3.0/

I believe things are a bit more complex: 2PC protocol doesn't seem to permit
cohorts to fail during commit phase:
http://en.wikipedia.org/wiki/Two-phase_commit_protocol#Commit_phase

<quot>
If the coordinator received an agreement message from all cohorts during the
commit-request phase:
1. The coordinator sends a commit message to all the cohorts.
2. Each cohort completes the operation, and releases all the locks and
   resources held during the transaction.
3. Each cohort sends an acknowledgment to the coordinator.
4. The coordinator completes the transaction when all acknowledgments have
   been received.
</quot>

I read the above as: the only problem coordinator may experience is missing
acknowledgement. What shall coordinator do if some cohorts acknowledged commit,
but some did not? Probably spider should detect it earlier?

Sergei, what's your opinion?

Regards,
Sergey

On Mon, Sep 30, 2013 at 05:45:13AM +0900, kentoku wrote:
Hi Sergey,
Thank you for your information. I could reproduce. I tried to fix it and pushed it.
Thanks, Kentoku
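
The commit-phase question raised at the top of this thread (what should the coordinator do if some cohorts acknowledged the commit but others did not?) can be illustrated with a minimal C++ sketch. It is not Spider or MariaDB code: `Ack`, `CommitPhaseResult` and `run_commit_phase` are hypothetical names invented for this illustration. It only assumes the textbook 2PC rule that, once the commit decision has been made, a missing acknowledgment leaves the transaction incomplete and is never turned into a rollback.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical outcome of sending COMMIT to one cohort.
enum class Ack { Received, Missing };

// Hypothetical coordinator-side summary of the commit phase.
struct CommitPhaseResult {
  bool completed;                    // true only if every cohort acknowledged
  std::vector<std::size_t> unacked;  // indexes of cohorts still to be retried
};

// Commit phase after all cohorts agreed in the commit-request phase.
// The decision is already COMMIT, so a missing acknowledgment does not
// change the decision; the cohort is only remembered for a later retry.
CommitPhaseResult run_commit_phase(const std::vector<Ack>& acks) {
  CommitPhaseResult result{true, {}};
  for (std::size_t i = 0; i < acks.size(); ++i) {
    if (acks[i] == Ack::Missing) {
      result.completed = false;
      result.unacked.push_back(i);
    }
  }
  return result;
}
```

In this sketch the coordinator would keep the transaction record until `unacked` is empty, which roughly corresponds to finishing the work later via XA RECOVER and XA COMMIT on the cohort that did not answer.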
2013/9/27 Sergey Vojtovich <svoj@mariadb.org>
Hi Kentoku,
BUILD/compile-amd64-debug-max
cd mysql-test
cat > t/AAA.test
--source include/have_innodb.inc
install soname 'ha_spider.so';

--connection default
eval CREATE TABLE t1 (a INT) ENGINE=InnoDB;

--connect (con1,localhost,root,,)
XA START 'xa1';
INSERT INTO t1 (a) VALUES (1),(2);
XA END 'xa1';
XA PREPARE 'xa1';

--connection default
--enable_reconnect
--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
restart
EOF
--shutdown_server 0
--source include/wait_until_disconnected.inc
--source include/wait_until_connected_again.inc
XA RECOVER;
XA COMMIT 'xa1';
--End of file
./mtr AAA
Regards, Sergey
On Fri, Sep 27, 2013 at 11:21:01PM +0900, kentoku wrote:
Hi Sergey,

> > > The above should have been fixed back in the beginning of 2012. Which
> > > MariaDB revision are you testing with?
> > It's same as lp:~kentokushiba/maria/10.0.4-spider-3.0.
> That's very strange. How did you build it and what command line options did
> you use (including those that are listed in my.cnf)?
O.K. I send you later.

> > By the way, I found a problem point in Spider. Currently it is fixed and
> > pushed. And I can't reproduce assertion failure. Can you reproduce this
> > failure yet?
> I just tested 10.0.4-spider-3.0 with rev.3827, it still fails.
Could you please tell me about build options and command line options you used?

Thanks, Kentoku
2013/9/27 Sergey Vojtovich <svoj@mariadb.org>
Hi Kentoku,
On Wed, Sep 25, 2013 at 02:53:17AM +0900, kentoku wrote:
Hi Sergey,

> > The above should have been fixed back in the beginning of 2012. Which
> > MariaDB revision are you testing with?
> It's same as lp:~kentokushiba/maria/10.0.4-spider-3.0.
That's very strange. How did you build it and what command line options did
you use (including those that are listed in my.cnf)?

> By the way, I found a problem point in Spider. Currently it is fixed and
> pushed. And I can't reproduce assertion failure. Can you reproduce this
> failure yet?
I just tested 10.0.4-spider-3.0 with rev.3827, it still fails.
Regards, Sergey
Thanks, Kentoku
2013/9/24 Sergey Vojtovich <svoj@mariadb.org>
Hi Kentoku,
On Thu, Sep 19, 2013 at 11:47:33PM +0900, kentoku wrote:
> Hi Sergey,
>
> > I'm afraid fixing rnd_end() callers in the server may stall for a long
> > time.
> O.K. No need to fix it if it is not easy. This request is not high priority
> request.
>
> > Is it acceptable for spider to use bulk updates and deletes API instead
> > (see handler.h: start_bulk_update/start_bulk_delete).
> Spider use it already. This API is not used if target table has after
> trigger. I understand why this API is not used in this case, but it
> sometimes causes performance problem. So, I thought it is better to prepare
> other choice for something wrong. Anyway, I will disable bulk
> updating/deleting feature without using API.
I see. Hope it is acceptable.

> > Nope, it looks like a bug in thread pool. MDEV-4739 has different trace,
> > how did you get this one? Just executed given test?
> I got this trace when I try to reproduce MDEV-4739. I did as the followings.
>
> 1. start mysqld
> 2. log in mysqld
> 3. mysql> CREATE TABLE t1 (a INT) ENGINE=InnoDB;
> 4. mysql> XA START 'xa1';
> 5. mysql> INSERT INTO t1 (a) VALUES (1),(2);
> 6. mysql> XA END 'xa1';
> 7. mysql> XA PREPARE 'xa1';
> 8. kill -9 mysqld_safe and mysqld from another terminal
> 9. start mysqld on gdb
>
> At that time, InnoDB and Spider were enabled and log-bin was disabled. So
> probably "total_ha_2pc > 1" was true, "opt_bin_log" was false.
> Does it help you?
We couldn't reproduce it yet. :(

Looking through the code I noticed that call to thd_wait_begin() looks as following:

static void scheduler_wait_sync_begin(void)
{
  thd_wait_begin(NULL, THD_WAIT_SYNC);
}

Note that thd is always NULL. And it must be NULL at this point, because we're booting. But according to your trace thd is not NULL.

#0 0x00000000005eabf6 in thd_wait_begin (thd=0x29da060, wait_type=10)
   at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
#1 0x000000000072a114 in scheduler_wait_sync_begin ()
   at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
...

The above should have been fixed back in the beginning of 2012. Which MariaDB revision are you testing with?
Thanks, Sergey
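
The observation above (the wrapper always passes NULL, yet the trace shows a non-NULL thd) can be illustrated with a small mock. This is only a sketch under an assumption that is not confirmed anywhere in this thread: that the callee substitutes a thread-local "current" pointer when it is given NULL. All names (`MockThd`, `MockScheduler`, `mock_current_thd`, `mock_thd_wait_begin`, `mock_scheduler_wait_sync_begin`) are hypothetical stand-ins, not the MariaDB definitions.

```cpp
// Hypothetical stand-ins; these are NOT the MariaDB types.
struct MockScheduler { int wait_calls = 0; };
struct MockThd { MockScheduler* scheduler = nullptr; };

// Assumed thread-local "current connection" pointer (an assumption of this sketch).
thread_local MockThd* mock_current_thd = nullptr;

// Returns the pointer the callee ends up using, or nullptr if the call is a no-op.
MockThd* mock_thd_wait_begin(MockThd* thd) {
  if (thd == nullptr) {
    thd = mock_current_thd;   // assumed fallback when the caller passes NULL
    if (thd == nullptr) {
      return nullptr;         // e.g. during startup: nothing to notify
    }
  }
  if (thd->scheduler != nullptr) {
    ++thd->scheduler->wait_calls;  // stands in for the scheduler callback
  }
  return thd;
}

// Mirrors the shape of the wrapper quoted above: the argument is always NULL.
MockThd* mock_scheduler_wait_sync_begin() {
  return mock_thd_wait_begin(nullptr);
}
```

Under that assumption, a non-NULL thd inside the callee would mean the fallback pointer was set at that moment; whether that is what happened in the reported crash is not established here.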
>
> Thanks,
> Kentoku
>
>
> 2013/9/19 Sergey Vojtovich <svoj@mariadb.org>
>
> > Hi Kentoku,
> >
> > I'm adding MariaDB developers to CC.
> >
> > On Thu, Sep 19, 2013 at 01:19:13AM +0900, kentoku wrote:
> > > Hi Sergey,
> > >
> > > > But what kind of errors are possible in your case? Other storage
> > > > engines doesn't seem to suffer from this API violation.
> > >
> > > Spider support bulk updating and deleting for avoiding network roundtrip
> > > between data node. Some times, last bulk updating is executed in
> > > rnd_end() function. So rnd_end() has possibility getting errors from
> > > data node.
> > I'm afraid fixing rnd_end() callers in the server may stall for a long
> > time.
> > Is it acceptable for spider to use bulk updates and deletes API instead
> > (see handler.h: start_bulk_update/start_bulk_delete).
> >
> > > By the way, about MDEV-4739. I get the following stack trace.
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x00000000005eabf6 in thd_wait_begin (thd=0x29da060, wait_type=10)
> > >     at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
> > > 4277 MYSQL_CALLBACK(thd->scheduler, thd_wait_begin, (thd, wait_type));
> > > (gdb) print thd
> > > $1 = (THD *) 0x29da060
> > > (gdb) bt
> > > #0 0x00000000005eabf6 in thd_wait_begin (thd=0x29da060, wait_type=10)
> > >     at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
> > > #1 0x000000000072a114 in scheduler_wait_sync_begin ()
> > >     at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
> > > #2 0x0000000000d6dc20 in my_sync (fd=23, my_flags=0)
> > >     at /ssd1/mariadb-10.0.4/mysys/my_sync.c:76
> > > #3 0x0000000000d6b54f in my_msync (fd=23, addr=0x7ffff7ff4000,
> > >     len=4096, flags=4)
> > >     at /ssd1/mariadb-10.0.4/mysys/my_mmap.c:27
> > > #4 0x00000000008bea03 in TC_LOG_MMAP::open (this=0x16e6a00,
> > >     opt_name=0xe19c87 "tc.log")
> > >     at /ssd1/mariadb-10.0.4/sql/log.cc:7735
> > > #5 0x00000000005751cb in init_server_components ()
> > >     at /ssd1/mariadb-10.0.4/sql/mysqld.cc:4797
> > > #6 0x0000000000575a07 in mysqld_main (argc=30, argv=0x1f204d0)
> > >     at /ssd1/mariadb-10.0.4/sql/mysqld.cc:5208
> > > #7 0x000000000056d884 in main (argc=11, argv=0x7fffffffe3a8)
> > >     at /ssd1/mariadb-10.0.4/sql/main.cc:25
> > > (gdb) print thd->scheduler
> > > $2 = (scheduler_functions *) 0x8f8f8f8f8f8f8f8f
> > > (gdb) print thd_wait_begin
> > > $3 = {void (THD *, int)} 0x5eaba4 <thd_wait_begin(THD*, int)>
> > > (gdb) print wait_type
> > > $4 = 10
> > >
> > > It is looks that "thd->scheduler" is not initialized. What do you think?
> > > Must storage engine set it?
> > Nope, it looks like a bug in thread pool. MDEV-4739 has different trace,
> > how did you get this one? Just executed given test?
> >
> > Regards,
> > Sergey
> >
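
The rnd_end() concern discussed in this thread (the last bulk update being sent at the end of the scan, so that the end-of-scan call itself can fail) can be sketched as follows. This is a self-contained mock, not Spider code: `MockDataNode`, `MockBulkUpdater`, `add` and `end_scan` are hypothetical names, and the rule that a batch fails when it contains the row "bad" is purely an assumption made so the example has something to report.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical remote data node: one apply() call is one network round trip.
struct MockDataNode {
  int round_trips = 0;
  bool apply(const std::vector<std::string>& batch) {
    ++round_trips;
    for (const std::string& row : batch) {
      if (row == "bad") return false;  // assumed failure condition for the demo
    }
    return true;
  }
};

// Buffers row changes and sends them in batches to save round trips.
class MockBulkUpdater {
 public:
  MockBulkUpdater(MockDataNode& node, std::size_t batch_size)
      : node_(node), batch_size_(batch_size) {}

  // Queue one row; a full buffer is flushed immediately.
  bool add(std::string row) {
    pending_.push_back(std::move(row));
    return pending_.size() >= batch_size_ ? flush() : true;
  }

  // Plays the role of rnd_end() here: the final partial batch is sent now,
  // so this is the only place its error can surface.
  bool end_scan() { return flush(); }

 private:
  bool flush() {
    if (pending_.empty()) return true;
    const bool ok = node_.apply(pending_);
    pending_.clear();
    return ok;
  }

  MockDataNode& node_;
  std::size_t batch_size_;
  std::vector<std::string> pending_;
};
```

With a batch size of 2, adding "a", "b" and "bad" causes one successful flush, and the failure for "bad" only appears when `end_scan()` is called. A caller that ignores that return value would never see the error, which is the situation described above for rnd_end() callers.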