Hi guys,

I have been working on this branch https://code.launchpad.net/~knielsen/maria/dingqi-parallel-replication for some days and found the bugs listed below. Maybe this will be helpful to you.

1. When a table filter is enabled on the slave, this bug can crash the server.

How to reproduce:

On the slave, set replicate-wild-ignore-table = test.t5 in the config file.

On the master, run these statements:

CREATE TABLE test.t3 (a INT AUTO_INCREMENT PRIMARY KEY, b DECIMAL(20,20), c INT);
SET INSERT_ID=1;
SET @c=2;
SET @@rand_seed1=10000000, @@rand_seed2=1000000;
INSERT INTO t3 VALUES (NULL, RAND(), @c);

Code that causes this bug:

In execute_single_transaction():

      case RAND_EVENT:
        need_remove_from_trans= true;

        if(!rli->is_deferred_event(ev))
          delete ev;
        break;

Reason:

The Rand event object is deleted in execute_single_transaction(), but its pointer is still used later in slave_execute_deferred_events().
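
To make the ownership problem concrete, here is a minimal sketch of one way to avoid the double use: the deferred list owns whatever is queued in it, so the executor can never free a deferred event. This is a toy model and not the branch's code; ToyEvent, ToyDeferredEvents and execute_event are names I made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a replication event (not the server's Log_event).
struct ToyEvent {
  int type;
  explicit ToyEvent(int t) : type(t) {}
};

// Hypothetical deferred-event list that owns whatever is queued in it.
class ToyDeferredEvents {
public:
  void add(std::unique_ptr<ToyEvent> ev) {
    assert(ev != nullptr);
    events_.push_back(std::move(ev));
  }
  std::size_t size() const { return events_.size(); }

  // "Executes" and frees every deferred event; returns how many there were.
  std::size_t execute_all() {
    std::size_t n = events_.size();
    events_.clear();  // the events are destroyed here, exactly once
    return n;
  }

private:
  std::vector<std::unique_ptr<ToyEvent>> events_;
};

// The executor either hands the event over or lets it die at scope exit,
// never both.
void execute_event(std::unique_ptr<ToyEvent> ev, bool deferred,
                   ToyDeferredEvents &queue) {
  if (deferred) {
    queue.add(std::move(ev));  // ownership moves to the queue
    return;
  }
  // Not deferred: ev is destroyed when this function returns.
}
```

With single ownership, the only place a deferred event can be freed is the code that finally executes the deferred list.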

2. The SQL thread can read and apply some log events more than once.

How to reproduce:

This one is a little hard to reproduce. If you set max_relay_log_size=100M and keep the SQL thread close behind the IO thread, the bug may appear.

Code that causes this bug:

In reopen_relay_log():

rli->event_relay_log_pos= max(rli->event_relay_log_pos, BIN_LOG_HEADER_SIZE);

my_b_seek(cur_log,rli->event_relay_log_pos);

Reason:

When the SQL thread is reading a hot log that the IO thread has just closed, it has to reopen the log and set the read offset to rli->event_relay_log_pos. Because many threads apply log events, another thread may update rli->event_relay_log_pos in the meantime, so rli->event_relay_log_pos can be less than rli->future_event_relay_log_pos. The SQL thread then seeks back to a position it has already read past and reads those events again.
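
The effect can be shown with a small toy model, assuming the reader keeps its own "already read up to here" offset and that the applied position lags behind it. ToyRelayReader, simulate and applied_pos are hypothetical names; the vector index stands in for a file offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy reader over a relay log; the vector index plays the role of an offset.
struct ToyRelayReader {
  std::vector<int> log;         // event ids in log order
  std::size_t read_pos = 0;     // next offset to read
  std::vector<int> dispatched;  // events handed to the appliers so far

  void read_events(std::size_t n) {
    for (std::size_t i = 0; i < n && read_pos < log.size(); ++i)
      dispatched.push_back(log[read_pos++]);
  }

  // Reopen the log and seek to the given offset.
  void reopen_at(std::size_t pos) {
    assert(pos <= log.size());
    read_pos = pos;
  }
};

// Read three events, reopen the log, then read the rest.
// applied_pos is a hypothetical offset that lags behind the reader.
std::vector<int> simulate(bool seek_to_applied_pos, std::size_t applied_pos) {
  ToyRelayReader r;
  r.log = {1, 2, 3, 4, 5};
  r.read_events(3);
  std::size_t already_read = r.read_pos;  // where the reader really is
  r.reopen_at(seek_to_applied_pos ? applied_pos : already_read);
  r.read_events(r.log.size());
  return r.dispatched;
}
```

In this model simulate(true, 1) dispatches events 2 and 3 twice, while simulate(false, 1) dispatches every event once. Whether the real fix should seek to rli->future_event_relay_log_pos or use some other saved offset is for the branch author to decide.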

3. The SQL thread does not report any error in the output of "show slave status", and replication does not stop, when the slave inserts a duplicate record into a table with a primary key.

How to reproduce:

Just change master_log_pos so that the slave reads duplicate records from the master.

Code that causes this bug:

In execute_single_transaction():

retry_transaction:
ev= trans->event_list_head;

... ...

if (ret && rli->trans_retries < slave_trans_retries)
{ ...

goto retry_transaction;
}

Reason:

As I said in another email: Rows_log_event::do_apply_event() is executed twice, but it returns a different result the second time because m_curr_row==m_rows_end.
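
Here is a toy model of how a row cursor that survives the retry can hide the error. It is only a sketch under my own assumptions: ToyRowsEvent, curr_row and apply_with_retry are hypothetical stand-ins for Rows_log_event, m_curr_row and the retry loop, and the table is just a set of primary keys.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy rows event whose cursor persists between apply() calls.
struct ToyRowsEvent {
  std::vector<int> keys;     // primary keys to insert
  std::size_t curr_row = 0;  // stand-in for m_curr_row

  // Returns 0 on success, 1 on a duplicate key.
  int apply(std::set<int> &table) {
    while (curr_row < keys.size()) {
      int key = keys[curr_row++];
      if (!table.insert(key).second)
        return 1;  // duplicate key
    }
    return 0;  // also reached when the cursor is already at the end
  }
};

// Apply once, then retry up to max_retries times while it keeps failing.
int apply_with_retry(ToyRowsEvent &ev, std::set<int> &table, int max_retries,
                     bool reset_cursor) {
  assert(max_retries >= 0);
  int ret = ev.apply(table);
  for (int i = 0; ret != 0 && i < max_retries; ++i) {
    if (reset_cursor)
      ev.curr_row = 0;  // start again from the first row
    ret = ev.apply(table);
  }
  return ret;
}
```

In this model, with a table that already contains key 1 and an event that inserts key 1, the retry without a cursor reset returns 0, so the duplicate-key error is lost. With the reset it still returns 1.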

4. Operations such as "show slave status" and "stop slave" can be blocked for a long time.

How to reproduce:

Just run "show slave status" again and again.

Code that causes this bug:

In queue_event():

case FORMAT_DESCRIPTION_EVENT:

...

wait_for_all_dml_done(&mi->rli, true);

and in process_io_rotate()

wait_for_all_dml_done(&mi->rli, true);

Reason:

The IO thread can wait in wait_for_all_dml_done() while holding rpl_mi->data_lock, so operations like "show slave status" block waiting for rpl_mi->data_lock.
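
One common way out is to drop the lock around the long wait and take it again afterwards. The sketch below shows only that pattern with std::mutex; ToyMasterInfo and wait_unlocked are hypothetical names, and I have not checked whether data_lock can safely be released at these two call sites in the branch.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-in for the master info object and its data lock.
struct ToyMasterInfo {
  std::mutex data_lock;
};

// Run a potentially long wait with the lock released, then re-acquire it.
// The caller must re-check any state it read before the wait.
template <typename WaitFn>
void wait_unlocked(std::unique_lock<std::mutex> &lk, WaitFn wait_fn) {
  assert(lk.owns_lock());
  lk.unlock();  // other users of the lock can get in during the wait
  wait_fn();    // stand-in for wait_for_all_dml_done()
  lk.lock();
}
```

While wait_fn runs, a command such as "show slave status" would be able to take the lock instead of queuing behind the IO thread.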

5. "START SLAVE UNTIL" makes replication stop in the wrong place.

How to reproduce:

Suppose the log events in the relay log look like this:

BEGIN; ------->pos1

LOG_EVENT1;

LOG_EVENT2;

COMMIT; ------>pos2

BEGIN; ------>pos3

LOG_EVENT3; --->stop_pos

LOG_EVENT4;

COMMIT; ------>pos4

If we run START SLAVE UNTIL relay_log_pos=stop_pos, replication should stop at pos4, but it stops at pos2.
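
The expected behaviour can be written down as a small function: once the UNTIL position is reached, keep going to the COMMIT that ends the current transaction. This is a toy model with made-up offsets; ToyLogEvent and until_stop_position are hypothetical names and not part of the server.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy relay-log entry: an offset and whether the event ends a transaction.
struct ToyLogEvent {
  std::size_t pos;
  bool is_commit;
};

// Returns the offset of the COMMIT that closes the transaction containing
// until_pos, or 0 if the log ends before such a COMMIT is seen.
std::size_t until_stop_position(const std::vector<ToyLogEvent> &log,
                                std::size_t until_pos) {
  assert(until_pos > 0);
  bool reached = false;
  for (const ToyLogEvent &ev : log) {
    if (ev.pos >= until_pos)
      reached = true;  // UNTIL position reached, finish this transaction
    if (reached && ev.is_commit)
      return ev.pos;
  }
  return 0;
}
```

With the two transactions above laid out at offsets 10 to 40 and 50 to 80, and stop_pos at 60, the function returns 80, which corresponds to pos4.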

6. log_event->thd is wrong.

Suppose a log_event was read in thread_1, so log_event->thd==thread_1, but this log_event may be dispatched to another thread (say thread_2). The log_event is then applied in thread_2 while log_event->thd is still thread_1. This problem can make applying the log event fail in MySQL, but in MariaDB it seems OK.
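
A minimal sketch of the idea, assuming the dispatcher is the right place to rebind the pointer: ToyThd, ToyThdEvent, dispatch_to and apply_in are hypothetical names, and apply_in only models an apply step that depends on the event pointing at the thread that runs it.

```cpp
#include <cassert>

// Toy stand-ins for THD and Log_event.
struct ToyThd {
  int id;
};

struct ToyThdEvent {
  ToyThd *thd;  // thread context the event believes it belongs to
};

// Hand the event to a worker and rebind its thread pointer.
void dispatch_to(ToyThdEvent &ev, ToyThd &worker) {
  ev.thd = &worker;
}

// Models an apply step that only works when the event points at the
// thread that is actually running it.
bool apply_in(const ToyThdEvent &ev, const ToyThd &worker) {
  assert(ev.thd != nullptr);
  return ev.thd == &worker;
}
```

In this model an event read in one thread fails apply_in() for a different worker until dispatch_to() has been called for that worker.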

2013-07-19

nanyi607rao