Hi Kristian,
 
In parallel replication, if a transaction fails during execution, replication should stop immediately (setting retry aside), and the transactions
behind the failed one should be skipped or rolled back. You did try to do this; there is code like:
 
        if (unlikely(entry->stop_on_error_sub_id <= rgi->wait_commit_sub_id))
          skip_event_group= true;
 
This code can tell later transactions to skip, but it cannot tell them to roll back. If a transaction started committing before an earlier transaction failed (for example, on a lock wait timeout for an unknown reason), the committing transaction will not be affected by stop_on_error_sub_id.
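
To make the timing problem concrete, here is a minimal standalone sketch of the check. It is not the server code: EntryModel, should_skip and check_is_missed_when_failure_comes_late are hypothetical names, and the "no error yet" initial value is an assumption. It only shows that the comparison is evaluated once, so a failure recorded afterwards is never seen by a transaction that already passed it.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the per-domain entry; only one field is modelled.
struct EntryModel {
  // Assumption: "no error yet" is represented by the largest possible sub_id.
  std::uint64_t stop_on_error_sub_id =
      std::numeric_limits<std::uint64_t>::max();
};

// Same comparison as in the quoted check.
bool should_skip(const EntryModel &entry, std::uint64_t wait_commit_sub_id) {
  return entry.stop_on_error_sub_id <= wait_commit_sub_id;
}

// T2 waits for T1 (sub_id 5). T2 evaluates the check before T1 fails,
// so it is not skipped, although the same check made later would skip it.
bool check_is_missed_when_failure_comes_late() {
  EntryModel entry;
  const std::uint64_t t1_sub_id = 5;

  bool skipped_at_start = should_skip(entry, t1_sub_id);  // T1 has not failed yet

  entry.stop_on_error_sub_id = t1_sub_id;                 // T1 fails now

  bool would_skip_now = should_skip(entry, t1_sub_id);    // too late for T2

  return !skipped_at_start && would_skip_now;
}
```
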
 
So the failed transaction should wake up the later committing transactions and tell them to roll back. Unfortunately, it does not. The code is like:
      if (!rgi->is_error && !skip_event_group)
        err= rpt_handle_event(events, rpt);
      else
        err= thd->wait_for_prior_commit();    
      ... ...            
      finish_event_group(thd, err, event_gtid_sub_id, entry, rgi);
 
If the failed transaction did not fail at its end event, the value of err comes from wait_for_prior_commit(). err will be 0 if the prior transaction succeeded, so the failed transaction then tells later transactions in finish_event_group that it is OK to commit.
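
The following standalone sketch models the loop above to show how the error is lost. It is a simplification under stated assumptions, not the real worker code: run_group is a hypothetical function, each event is reduced to its return code, and I assume the error flag (rgi->is_error in the real code) is set when an event fails. The keep_own_error switch shows one possible adjustment, given only as an illustration and not as a proposed patch.

```cpp
#include <vector>

// Returns the err value that would be passed to finish_event_group.
// event_results:       return code of each event in the group (0 = success).
// prior_commit_result: models the return of thd->wait_for_prior_commit().
// keep_own_error:      hypothetical adjustment, never report 0 once the
//                      group itself has failed.
int run_group(const std::vector<int> &event_results,
              int prior_commit_result,
              bool keep_own_error) {
  bool is_error = false;  // models rgi->is_error (assumed set on failure)
  int err = 0;

  for (int result : event_results) {
    if (!is_error) {
      err = result;               // models rpt_handle_event()
      if (err != 0)
        is_error = true;
    } else {
      err = prior_commit_result;  // overwrites the group's own error
      if (keep_own_error && err == 0)
        err = 1;                  // keep signalling failure
    }
  }
  return err;
}
```

With event results {0, 1, 0} and a successful prior commit, run_group returns 0 when keep_own_error is false, which is the case described above. A failure at the last event, {0, 0, 1}, is still reported, because no later iteration overwrites err.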
2014-03-19

nanyi607rao