Hi Kentoku,

On Thu, Sep 19, 2013 at 11:47:33PM +0900, kentoku wrote:
Hi Sergey,
I'm afraid fixing rnd_end() callers in the server may stall for a long time. O.K. No need to fix it if it is not easy. This is not a high-priority request.
Is it acceptable for Spider to use the bulk update and delete API instead (see handler.h: start_bulk_update/start_bulk_delete)? Spider uses it already. This API is not used if the target table has an AFTER trigger. I understand why this API is not used in this case, but it sometimes causes performance problems. So I thought it would be better to have another option in case something goes wrong. Anyway, I will disable the bulk updating/deleting feature that works without the API. I see. Hope it is acceptable.
Nope, it looks like a bug in the thread pool. MDEV-4739 has a different trace; how did you get this one? Did you just execute the given test? I got this trace when I tried to reproduce MDEV-4739. I did the following:
1. start mysqld
2. log in mysqld
3. mysql> CREATE TABLE t1 (a INT) ENGINE=InnoDB;
4. mysql> XA START 'xa1';
5. mysql> INSERT INTO t1 (a) VALUES (1),(2);
6. mysql> XA END 'xa1';
7. mysql> XA PREPARE 'xa1';
8. kill -9 mysqld_safe and mysqld from another terminal
9. start mysqld on gdb
At that time, InnoDB and Spider were enabled and log-bin was disabled. So probably "total_ha_2pc > 1" was true, "opt_bin_log" was false. Does it help you?
We couldn't reproduce it yet. :( Looking through the code, I noticed that the call to thd_wait_begin() looks as follows:

static void scheduler_wait_sync_begin(void) {
  thd_wait_begin(NULL, THD_WAIT_SYNC);
}

Note that thd is always NULL. And it must be NULL at this point, because we're booting. But according to your trace thd is not NULL.

#0 0x00000000005eabf6 in thd_wait_begin (thd=0x29da060, wait_type=10) at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
#1 0x000000000072a114 in scheduler_wait_sync_begin () at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
...

The above should have been fixed back in the beginning of 2012. Which MariaDB revision are you testing with?

Thanks,
Sergey
Thanks, Kentoku
2013/9/19 Sergey Vojtovich <svoj@mariadb.org>
Hi Kentoku,
I'm adding MariaDB developers to CC.
On Thu, Sep 19, 2013 at 01:19:13AM +0900, kentoku wrote:
Hi Sergey,
But what kind of errors are possible in your case? Other storage engines don't seem to suffer from this API violation.
Spider supports bulk updating and deleting to avoid network round trips to the data nodes. Sometimes the last bulk update is executed in the rnd_end() function, so rnd_end() can get errors from a data node. I'm afraid fixing rnd_end() callers in the server may stall for a long time. Is it acceptable for Spider to use the bulk update and delete API instead (see handler.h: start_bulk_update/start_bulk_delete)?
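To illustrate why a deferred flush makes rnd_end() able to fail, here is a minimal C++ sketch. It is not Spider's code and not the real handler API: ToyBulkHandler, SendBatch and the batch size are hypothetical names introduced only for this example, and only the method names update_row() and rnd_end() echo the server's. The point is that the last partial batch is sent from rnd_end(), so a caller that ignores its return value silently loses the data node's error.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a remote data node: takes a batch of statements
// and returns 0 on success or a non-zero error code.
using SendBatch = std::function<int(const std::vector<std::string>&)>;

// Toy model of an engine that defers updates to save network round trips.
// Assumption: this only mimics the shape of the problem, not handler.h.
class ToyBulkHandler {
public:
  ToyBulkHandler(SendBatch send, std::size_t batch_size)
      : send_(std::move(send)), batch_size_(batch_size) {}

  // Queue one update; contact the data node only when the batch is full.
  int update_row(const std::string& stmt) {
    pending_.push_back(stmt);
    return pending_.size() >= batch_size_ ? flush() : 0;
  }

  // End of scan: the last partial batch is sent here, so this call can fail.
  int rnd_end() { return flush(); }

private:
  // Send whatever is queued and report the data node's result.
  int flush() {
    if (pending_.empty()) return 0;
    int err = send_(pending_);
    pending_.clear();
    return err;
  }

  SendBatch send_;
  std::size_t batch_size_;
  std::vector<std::string> pending_;
};
```

With a batch size of 2 and three queued updates, the third row is only sent from rnd_end(), which is where its error would surface.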
By the way, about MDEV-4739. I get the following stack trace.

Program received signal SIGSEGV, Segmentation fault.
0x00000000005eabf6 in thd_wait_begin (thd=0x29da060, wait_type=10) at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
4277 MYSQL_CALLBACK(thd->scheduler, thd_wait_begin, (thd, wait_type));
(gdb) print thd
$1 = (THD *) 0x29da060
(gdb) bt
#0 0x00000000005eabf6 in thd_wait_begin (thd=0x29da060, wait_type=10) at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
#1 0x000000000072a114 in scheduler_wait_sync_begin () at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
#2 0x0000000000d6dc20 in my_sync (fd=23, my_flags=0) at /ssd1/mariadb-10.0.4/mysys/my_sync.c:76
#3 0x0000000000d6b54f in my_msync (fd=23, addr=0x7ffff7ff4000, len=4096, flags=4) at /ssd1/mariadb-10.0.4/mysys/my_mmap.c:27
#4 0x00000000008bea03 in TC_LOG_MMAP::open (this=0x16e6a00, opt_name=0xe19c87 "tc.log") at /ssd1/mariadb-10.0.4/sql/log.cc:7735
#5 0x00000000005751cb in init_server_components () at /ssd1/mariadb-10.0.4/sql/mysqld.cc:4797
#6 0x0000000000575a07 in mysqld_main (argc=30, argv=0x1f204d0) at /ssd1/mariadb-10.0.4/sql/mysqld.cc:5208
#7 0x000000000056d884 in main (argc=11, argv=0x7fffffffe3a8) at /ssd1/mariadb-10.0.4/sql/main.cc:25
(gdb) print thd->scheduler
$2 = (scheduler_functions *) 0x8f8f8f8f8f8f8f8f
(gdb) print thd_wait_begin
$3 = {void (THD *, int)} 0x5eaba4 <thd_wait_begin(THD*, int)>
(gdb) print wait_type
$4 = 10
It looks like "thd->scheduler" is not initialized. What do you think? Must the storage engine set it? Nope, it looks like a bug in the thread pool. MDEV-4739 has a different trace; how did you get this one? Did you just execute the given test?
Regards, Sergey