Here is the commit with the test: https://github.com/jocel1/server/commit/e1e1716ec2af981d29239e9e075734080a2a... (I have not updated the result file), and a small modification to output SHOW SLAVE STATUS in case of a sync failure: https://github.com/jocel1/server/commit/e1261396af0282738e8034885949bcc6a6f5...

Jocelyn

On 08/08/2016 at 01:50, jocelyn fournier wrote:
Hi Kristian,
Just FYI, I can confirm the "Lock wait timeout exceeded; try restarting transaction" behaviour you described.
I duplicated and modified rpl_parallel_optimistic.test and ran it as storage/tokudb/mysql-test/tokudb_rpl/t/rpl_parallel_optimistic.test:
./mtr --suite=tokudb_rpl
Logging: ./mtr  --suite=tokudb_rpl
vardir: /home/joce/mariadb-10.1.16/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/joce/mariadb-10.1.16/mysql-test/var'...
Checking supported features...
MariaDB Version 10.1.16-MariaDB-debug
 - SSL connections supported
 - binaries are debug compiled
Using suites: tokudb_rpl
Collecting tests...
Installing system database...
==============================================================================
TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
worker[1] mysql-test-run: WARNING: running this script as _root_ will cause some tests to be skipped
tokudb_rpl.rpl_parallel_optimistic 'innodb_plugin,mix' [ fail ]
        Test ended at 2016-08-08 01:26:34
CURRENT_TEST: tokudb_rpl.rpl_parallel_optimistic
mysqltest: In included file "./include/sync_with_master_gtid.inc":
included from /home/joce/mariadb-10.1.16/storage/tokudb/mysql-test/tokudb_rpl/t/rpl_parallel_optimistic.test at line 59:
At line 50: Failed to sync with master
The result from queries just before the failure was:
< snip >
DELETE FROM t1 WHERE a=2;
INSERT INTO t1 VALUES (2,5);
DELETE FROM t1 WHERE a=3;
INSERT INTO t1 VALUES(3,2);
DELETE FROM t1 WHERE a=1;
INSERT INTO t1 VALUES(1,2);
DELETE FROM t1 WHERE a=3;
INSERT INTO t1 VALUES(3,3);
DELETE FROM t1 WHERE a=2;
INSERT INTO t1 VALUES (2,6);
include/save_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a	b
1	2
2	6
3	3
include/start_slave.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-20', 120), current slave GTID position is: 0-1-3.
Slave state : Waiting for master to send event 127.0.0.1 root 16000 1 master-bin.000001 3468 slave-relay-bin.000002 796 master-bin.000001 Yes No 1205 Lock wait timeout exceeded; try restarting transaction 0 772 3790 None 0 No No 0 1205 Lock wait timeout exceeded; try restarting transaction 1 Slave_Pos 0-1-20 optimistic
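For context, the sync step that times out above essentially boils down to MASTER_GTID_WAIT(). This is only a rough sketch of what save_master_gtid.inc / sync_with_master_gtid.inc do, not their literal contents:

  -- on the master: record the current binlog GTID position
  SELECT @@gtid_binlog_pos;                -- '0-1-20' in the run above
  -- on the slave: wait (up to 120s here) for that position to be applied
  SELECT MASTER_GTID_WAIT('0-1-20', 120);  -- returns -1 on timeout, which is what
                                           -- turns into "Failed to sync with master"
  SHOW SLAVE STATUS;                       -- dumped as "Slave state" above;
                                           -- Last_SQL_Errno is 1205 in this run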
I've no explanation so far for the DUPLICATE KEY error I've seen.
Jocelyn
On 15/07/2016 at 17:09, Kristian Nielsen wrote:
jocelyn fournier <jocelyn.fournier@gmail.com> writes:
> Thanks for the quick answer! I wonder if it would be possible to
> automatically disable optimistic parallel replication for an engine
> if it does not implement it?

That would probably be good - though it would be better to just implement the necessary API; it's a very small change (basically, TokuDB just needs to inform the upper layer of any lock waits that take place inside the engine).
However, looking more at your description, you got a "key not found" error. Not implementing thd_report_wait_for() could lead to deadlocks, but it shouldn't cause a key-not-found error. In fact, in optimistic mode all errors are treated as "deadlock" errors: the query is rolled back and run again, this time not in parallel.
So I'm wondering if there is something else going on. If transactions T1 and T2 run in parallel, it's possible that they have a row conflict. But if T2 deleted a row expected by T1, I would expect T1 to wait on a row lock held by T2, not get a duplicate key error. And if T1 has not yet inserted a row expected by T2, then T2 would be rolled back and retried after T1 has committed. The first case can cause a deadlock, but neither case seems able to cause a duplicate-key error.
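To make the deadlock case concrete, here is a hypothetical interleaving, using two of the transactions from the test output above (T1 before T2 in binlog order):

  -- T1 (binlog order 1):
  BEGIN; DELETE FROM t1 WHERE a=2; INSERT INTO t1 VALUES (2,5); COMMIT;
  -- T2 (binlog order 2):
  BEGIN; DELETE FROM t1 WHERE a=2; INSERT INTO t1 VALUES (2,6); COMMIT;

If the slave applies them in parallel and T2 happens to take the row lock on a=2 first, then T1 blocks on that lock while T2 in turn must wait for T1 to commit (commit order is preserved). With InnoDB, thd_report_wait_for() lets the server see this cycle and kill/retry T2; without it, the slave just sits there until the 1205 lock wait timeout seen above.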
Maybe TokuDB is doing something special with locks around replication, or something else is going wrong. I guess TokuDB just hasn't been tested much with parallel replication.
Does it work ok when running in conservative parallel mode?
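(For reference, switching the slave back to conservative mode on 10.1 would look roughly like this; slave_parallel_mode defaults to conservative and can only be changed while the slave SQL thread is stopped:)

  STOP SLAVE SQL_THREAD;
  SET GLOBAL slave_parallel_mode = 'conservative';  -- only run transactions in parallel that
                                                    -- were group-committed together on the master
  START SLAVE SQL_THREAD;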
- Kristian.