Re: [Maria-developers] Questions re MDEV-4736 and MDEV-4739 (was Re: Spider's installation sql file)

4 Oct 2013

Hi Kentoku,

I just reviewed one of your revisions, specifically
bzr diff -c3829 lp:~kentokushiba/maria/10.0.4-spider-3.0/

I believe things are a bit more complex: the 2PC protocol doesn't seem to
permit cohorts to fail during the commit phase:
http://en.wikipedia.org/wiki/Two-phase_commit_protocol#Commit_phase

<quot>
If the coordinator received an agreement message from all cohorts during the
commit-request phase:
  1. The coordinator sends a commit message to all the cohorts.
  2. Each cohort completes the operation, and releases all the locks and
     resources held during the transaction.
  3. Each cohort sends an acknowledgment to the coordinator.
  4. The coordinator completes the transaction when all acknowledgments have
     been received.
</quot>

I read the above as: the only problem the coordinator may experience is a
missing acknowledgement. What should the coordinator do if some cohorts
acknowledged the commit but others did not? Perhaps Spider should detect the
failure earlier?
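
To make the question concrete, here is a minimal C++ sketch of the commit
phase quoted above. It is not Spider or MariaDB code: Cohort, CommitOutcome
and run_commit_phase are hypothetical names, and a cohort is modelled as a
callback that returns true when its acknowledgment arrives.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a cohort: commit() returns true when the cohort
// acknowledges the commit message, false when no acknowledgment arrives.
struct Cohort {
  std::string name;
  std::function<bool()> commit;
};

// What the coordinator knows after sending the commit message to everyone.
struct CommitOutcome {
  std::vector<std::string> acknowledged;  // cohorts that confirmed the commit
  std::vector<std::string> missing;       // cohorts with no acknowledgment
  bool complete() const { return missing.empty(); }
};

// Commit phase only: every cohort is assumed to have agreed during the
// commit-request phase, so the decision to commit is already final.
// The coordinator can only record which acknowledgments are missing;
// it cannot undo the cohorts that have already committed.
inline CommitOutcome run_commit_phase(const std::vector<Cohort>& cohorts) {
  CommitOutcome outcome;
  for (const Cohort& c : cohorts) {
    if (c.commit())
      outcome.acknowledged.push_back(c.name);
    else
      outcome.missing.push_back(c.name);
  }
  return outcome;
}
```

If one cohort acknowledges and another does not, complete() is false and the
transaction stays open on the coordinator side. In this model the only way
forward is to retry the commit on the cohorts listed in "missing", which is
why an error at this stage is hard to report back as a plain failure.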

Sergei, what's your opinion?

Regards,
Sergey

On Mon, Sep 30, 2013 at 05:45:13AM +0900, kentoku wrote:
...
Hi Sergey,
Thank you for the information. I could reproduce it. I tried to fix it and
pushed the fix.
Thanks,
Kentoku
2013/9/27 Sergey Vojtovich <svoj@mariadb.org>
...
Hi Kentoku,
BUILD/compile-amd64-debug-max
cd mysql-test
cat > t/AAA.test <<'END_OF_TEST'
--source include/have_innodb.inc
install soname 'ha_spider.so';
--connection default
eval CREATE TABLE t1 (a INT) ENGINE=InnoDB;
--connect (con1,localhost,root,,)
XA START 'xa1';
INSERT INTO t1 (a) VALUES (1),(2);
XA END 'xa1';
XA PREPARE 'xa1';
--connection default
--enable_reconnect
--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
restart
EOF
--shutdown_server 0
--source include/wait_until_disconnected.inc
--source include/wait_until_connected_again.inc
XA RECOVER;
XA COMMIT 'xa1';
END_OF_TEST
./mtr AAA
Regards,
Sergey
On Fri, Sep 27, 2013 at 11:21:01PM +0900, kentoku wrote:
Hi Sergey,

> > > The above should have been fixed back in the beginning of 2012. Which
> > > MariaDB revision are you testing with?
> > It's same as lp:~kentokushiba/maria/10.0.4-spider-3.0.
> That's very strange. How did you build it and what command line options did
> you use (including those that are listed in my.cnf)?
O.K. I send you later.

> > By the way, I found a problem point in Spider. Currently it is fixed and
> > pushed. And I can't reproduce assertion failure. Can you reproduce this
> > failure yet?
> I just tested 10.0.4-spider-3.0 with rev.3827, it still fails.
Could you please tell me about build options and command line options that
you used?

Thanks,
Kentoku

2013/9/27 Sergey Vojtovich <svoj@mariadb.org>
Hi Kentoku,

On Wed, Sep 25, 2013 at 02:53:17AM +0900, kentoku wrote:
> Hi Sergey,
>
> > The above should have been fixed back in the beginning of 2012. Which
> > MariaDB revision are you testing with?
> It's same as lp:~kentokushiba/maria/10.0.4-spider-3.0.
That's very strange. How did you build it and what command line options did
you use (including those that are listed in my.cnf)?

> By the way, I found a problem point in Spider. Currently it is fixed and
> pushed. And I can't reproduce assertion failure. Can you reproduce this
> failure yet?
I just tested 10.0.4-spider-3.0 with rev.3827, it still fails.

Regards,
Sergey

Thanks,
Kentoku
2013/9/24 Sergey Vojtovich <svoj@mariadb.org>
...
Hi Kentoku,
On Thu, Sep 19, 2013 at 11:47:33PM +0900, kentoku wrote:
> Hi Sergey,
>
> > I'm afraid fixing rnd_end() callers in the server may stall for a long
> > time.
> O.K. No need to fix it if it is not easy. This request is not a high
> priority request.
>
> > Is it acceptable for Spider to use the bulk updates and deletes API
> > instead (see handler.h: start_bulk_update/start_bulk_delete)?
> Spider uses it already. This API is not used if the target table has an
> AFTER trigger. I understand why the API is not used in this case, but it
> sometimes causes performance problems. So I thought it would be better to
> prepare an alternative for such cases. Anyway, I will disable the bulk
> updating/deleting that does not go through this API.
I see. Hope it is acceptable.
>
> > Nope, it looks like a bug in the thread pool. MDEV-4739 has a different
> > trace; how did you get this one? Did you just execute the given test?
> I got this trace when I tried to reproduce MDEV-4739. I did the following.
>
> 1. start mysqld
> 2. log in mysqld
> 3. mysql> CREATE TABLE t1 (a INT) ENGINE=InnoDB;
> 4. mysql> XA START 'xa1';
> 5. mysql> INSERT INTO t1 (a) VALUES (1),(2);
> 6. mysql> XA END 'xa1';
> 7. mysql> XA PREPARE 'xa1';
> 8. kill -9 mysqld_safe and mysqld from another terminal
> 9. start mysqld on gdb
>
> At that time, InnoDB and Spider were enabled and log-bin was disabled. So
> probably "total_ha_2pc > 1" was true and "opt_bin_log" was false.
> Does it help you?
We couldn't reproduce it yet. :(
Looking through the code I noticed that the call to thd_wait_begin() looks
as follows:

static void scheduler_wait_sync_begin(void) {
  thd_wait_begin(NULL, THD_WAIT_SYNC);
}

Note that thd is always NULL. And it must be NULL at this point, because
we're booting. But according to your trace thd is not NULL:
#0  0x00000000005eabf6 in thd_wait_begin (
    thd=0x29da060, wait_type=10)
    at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
#1  0x000000000072a114 in scheduler_wait_sync_begin ()
    at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
...
The above should have been fixed back in the beginning of 2012. Which MariaDB
revision are you testing with?
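
For reference, the NULL case described above can be sketched in isolation.
This is not the server's code: Thd, Scheduler, current_thd_stub, record_wait
and wait_begin are hypothetical stand-ins, and the fallback to a thread-local
"current" connection object when NULL is passed is an assumption made for
illustration only.

```cpp
// Hypothetical stand-ins for the server types, for illustration only.
struct Thd;

struct Scheduler {
  void (*wait_begin)(Thd*, int) = nullptr;  // optional wait hook
};

struct Thd {
  Scheduler* scheduler = nullptr;
  int last_wait_type = -1;  // set by record_wait() below
};

// Assumed fallback: a thread-local "current" Thd, NULL while booting.
inline Thd*& current_thd_stub() {
  static thread_local Thd* current = nullptr;
  return current;
}

// Example hook that just remembers which wait type was reported.
inline void record_wait(Thd* thd, int wait_type) {
  thd->last_wait_type = wait_type;
}

// Returns true if a scheduler hook was invoked.
inline bool wait_begin(Thd* thd, int wait_type) {
  if (!thd) {
    thd = current_thd_stub();  // caller passed NULL: try the fallback
    if (!thd)
      return false;            // booting, no connection: nothing to do
  }
  // Both checks assume the pointers are either valid or NULL. A stale or
  // uninitialized scheduler pointer would pass them and crash on the call.
  if (thd->scheduler && thd->scheduler->wait_begin) {
    thd->scheduler->wait_begin(thd, wait_type);
    return true;
  }
  return false;
}
```

Under these assumptions a NULL argument during startup is harmless, so a
non-NULL thd in frame #0 would have to come from the fallback rather than
from scheduler_wait_sync_begin() itself.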
Thanks,
Sergey
>
> Thanks,
> Kentoku
>
>
>
> 2013/9/19 Sergey Vojtovich <svoj@mariadb.org>
>
> > Hi Kentoku,
> >
> > I'm adding MariaDB developers to CC.
> >
> > On Thu, Sep 19, 2013 at 01:19:13AM +0900, kentoku wrote:
> > > Hi Sergey,
> > >
> > > > But what kind of errors are possible in your case? Other storage
> > > > engines don't seem to suffer from this API violation.
> > >
> > > Spider supports bulk updating and deleting to avoid network round trips
> > > between data nodes. Sometimes the last bulk update is executed in the
> > > rnd_end() function. So rnd_end() may get errors from a data node.
> > I'm afraid fixing rnd_end() callers in the server may stall for a long
> > time.
> > Is it acceptable for Spider to use the bulk updates and deletes API
> > instead (see handler.h: start_bulk_update/start_bulk_delete)?
> >
> > > By the way, about MDEV-4739. I get the following stack trace.
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x00000000005eabf6 in thd_wait_begin (thd=0x29da060,
> > >     wait_type=10)
> > >     at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
> > > 4277      MYSQL_CALLBACK(thd->scheduler, thd_wait_begin, (thd, wait_type));
> > > (gdb) print thd
> > > $1 = (THD *) 0x29da060
> > > (gdb) bt
> > > #0  0x00000000005eabf6 in thd_wait_begin (
> > >     thd=0x29da060, wait_type=10)
> > >     at /ssd1/mariadb-10.0.4/sql/sql_class.cc:4277
> > > #1  0x000000000072a114 in scheduler_wait_sync_begin ()
> > >     at /ssd1/mariadb-10.0.4/sql/scheduler.cc:59
> > > #2  0x0000000000d6dc20 in my_sync (fd=23, my_flags=0)
> > >     at /ssd1/mariadb-10.0.4/mysys/my_sync.c:76
> > > #3  0x0000000000d6b54f in my_msync (fd=23,
> > >     addr=0x7ffff7ff4000, len=4096, flags=4)
> > >     at /ssd1/mariadb-10.0.4/mysys/my_mmap.c:27
> > > #4  0x00000000008bea03 in TC_LOG_MMAP::open (
> > >     this=0x16e6a00, opt_name=0xe19c87 "tc.log")
> > >     at /ssd1/mariadb-10.0.4/sql/log.cc:7735
> > > #5  0x00000000005751cb in init_server_components ()
> > >     at /ssd1/mariadb-10.0.4/sql/mysqld.cc:4797
> > > #6  0x0000000000575a07 in mysqld_main (argc=30,
> > >     argv=0x1f204d0)
> > >     at /ssd1/mariadb-10.0.4/sql/mysqld.cc:5208
> > > #7  0x000000000056d884 in main (argc=11,
> > >     argv=0x7fffffffe3a8)
> > >     at /ssd1/mariadb-10.0.4/sql/main.cc:25
> > > (gdb) print thd->scheduler
> > > $2 = (scheduler_functions *) 0x8f8f8f8f8f8f8f8f
> > > (gdb) print thd_wait_begin
> > > $3 = {void (THD *,
> > >     int)} 0x5eaba4 <thd_wait_begin(THD*, int)>
> > > (gdb) print wait_type
> > > $4 = 10
> > >
> > > It looks like "thd->scheduler" is not initialized. What do you think?
> > > Must the storage engine set it?
> > Nope, it looks like a bug in the thread pool. MDEV-4739 has a different
> > trace; how did you get this one? Did you just execute the given test?
> >
> > Regards,
> > Sergey
> >