Re: [Maria-developers] [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?
Hello Kristian,
I am working on a second variant of the kill query design that will only kill the pending lock request, if any, for the thd being killed. The previous design had a problem when killing the query that triggered the wait-for call.
On Aug 15, 2016 7:26 PM, "Rich Prohaska" wrote:
Hello Kristian, I have a prototype of the TokuFT code that will cause ALL lock waiters to call their killed callback here: https://github.com/prohaska7/tokuft/tree/kill_lockers
On Mon, Aug 15, 2016 at 11:51 AM, Rich Prohaska wrote:
Hello Kristian, See attached snapshot of slave threads and tokudb locks. Thread 16 is waiting for a tokudb lock held by thread 15, which is waiting for a tokudb lock held by thread 14. Thread 14 is waiting for a prior transaction to complete, presumably either thread 15 or 16. So we have a deadlock that tokudb cannot detect, because the ordering constraint is not available to tokudb. I assume that the optimistic scheduler killed thread 16, but since tokudb does not implement the kill_query function, the deadlock is only resolved when the tokudb lock timer pops.
On Mon, Aug 15, 2016 at 8:16 AM, Rich Prohaska
wrote: Hello Kristian, The simplest kill_query implementation for tokudb would just signal all of the pending lock requests' condition variables. This would cause the killed callback to be called. A performance refinement, if necessary, would allow thread A (executing the kill_query function) to identify and signal a condition variable for a blocked thread B.
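A minimal sketch of that simplest approach, using illustrative names rather than the actual TokuFT identifiers:

    // Sketch only: broadcast every pending lock request's condition
    // variable so each waiter wakes up and re-runs its killed callback.
    // All names here are illustrative, not the real TokuFT code.
    #include <pthread.h>
    #include <vector>

    struct lock_request {
        pthread_cond_t wait_cond;   // the blocked thread sleeps on this
        // ... key range, txn id, etc.
    };

    static pthread_mutex_t lock_requests_mutex = PTHREAD_MUTEX_INITIALIZER;
    static std::vector<lock_request *> pending_lock_requests;

    // Body of a hypothetical tokudb kill_query handlerton function.
    void tokudb_kill_query_all_waiters(void) {
        pthread_mutex_lock(&lock_requests_mutex);
        for (lock_request *lr : pending_lock_requests)
            pthread_cond_broadcast(&lr->wait_cond);  // waiter rechecks its killed callback
        pthread_mutex_unlock(&lock_requests_mutex);
    }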
On Mon, Aug 15, 2016 at 5:42 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Rich Prohaska writes: tokudb lock timeouts are resolving the replication stall; unfortunately, the tokudb lock timeout is 4 seconds, so the throughput is almost zero.
Yes. Sorry for not making it clear that my proof-of-concept patch was incomplete...
I suspect that the poor slave replication performance for optimistic replication occurs because TokuDB does not implement the kill_query handlerton function. kill_handlerton gets called to resolve lock waits.
Possibly, but I'm not sure it's that important. The kill will be effective as soon as the wait is over.
No, you're absolutely right, after testing (and thinking) some more, I realise that indeed the kill_query functionality is important.
A possible scenario is, given transactions T1, T2, and T3 in that order:
T3 acquires a lock on row R3, T2 similarly acquires R2. Now T3 tries to acquire R2, but has to wait for T2 to release it. Later T1 tries to acquire R3, also has to wait.
At this point, we kill T3, since it is holding a lock (R3) needed by an earlier transaction T1. However, T3 will not notice the kill until its own wait (on R2 held by T2) times out. T2 cannot release the lock because it is waiting for T1 to commit first. So we have a deadlock :-/
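Spelled out as a wait-for graph, the cycle is:

    T1 --(row R3)--> T3 --(row R2)--> T2 --(commit ordering)--> T1

The commit-ordering edge exists only in the replication layer, so the storage engine on its own cannot detect the cycle.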
With InnoDB, the kill causes T3 to wake up immediately and roll back, so that T1 can proceed without much delay.
Ok, so something more is needed here. I see there is a killed_callback() which seems to check for the kill, so I'm hoping that can be used with a suitable wakeup of the offending lock_request (or all requests, perhaps). But as I'm completely new to TokuDB, I still need some more time to read the code and try to understand how everything fits together...
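For illustration only, such a wakeup could plug into a wait loop shaped roughly like this (hypothetical names; the real TokuFT lock_request code differs):

    // Sketch: the waiter sleeps in short slices and re-checks the killed
    // callback each time it wakes, so a kill_query that broadcasts the
    // condition variable is noticed immediately instead of only after
    // the full lock timeout.
    #include <pthread.h>
    #include <time.h>

    struct lock_request {
        pthread_mutex_t mutex;
        pthread_cond_t  wait_cond;
        bool granted;
    };

    typedef bool (*killed_cb_t)(void *extra);  // true if the thd was killed

    // Returns 0 if the lock was granted, nonzero if the wait was killed.
    int lock_request_wait(lock_request *lr, killed_cb_t killed, void *extra) {
        int r = 0;
        pthread_mutex_lock(&lr->mutex);
        while (!lr->granted) {
            if (killed != NULL && killed(extra)) { r = 1; break; }
            struct timespec ts;
            clock_gettime(CLOCK_REALTIME, &ts);
            ts.tv_sec += 1;                      // recheck at least once a second
            pthread_cond_timedwait(&lr->wait_cond, &lr->mutex, &ts);
        }
        pthread_mutex_unlock(&lr->mutex);
        return r;
    }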
TokuFT implements pessimistic locking and two-phase locking algorithms. This wiki describes locking and concurrency in a little more detail: https://github.com/percona/tokudb-engine/wiki/Transactions-and-Concurrency.
Thanks, this was quite helpful.
Yes, I think they are false positives since the thd_report_wait_for API is called but it does NOT call the THD::awake function.
Ah. Then it's probably normal, caused by the group-commit optimisation. In conservative mode, if two transactions T1 and T2 did not group commit on the master, then they cannot be started in parallel on the slave. But T2 can start as soon as T1 has reached COMMIT. Thus, if T2 happens to conflict with T1, there is a small window where T2 may need to wait on T1 until T1 has completed its commit.
Thanks,
- Kristian.
Hello Kristian,
I have tokudb kill query working with parallel replication. I needed to attach the mysql thd pointer to each lock request and use the thd as a key into the pending lock requests to find the one being used by a particular thd. Once found, this lock request is immediately completed with a lock timeout result. I also needed to rearrange the lock wait-for code so that the callback is called while not holding any tokudb locks.
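Roughly, the shape of that design, with illustrative names (the actual patch is in Rich's tree, not reproduced here):

    // Sketch: register each pending lock request under its THD so that
    // kill_query can find and complete exactly one request with a lock
    // timeout result. Names, the map, and the error code are illustrative;
    // assume pending_mutex also guards each request's completion state.
    #include <pthread.h>
    #include <unordered_map>

    struct THD;                          // opaque server thread descriptor

    enum { LOCK_WAIT_TIMEOUT = -1 };     // placeholder for the engine's timeout code

    struct lock_request {
        pthread_cond_t wait_cond;
        int  result;                     // 0 = granted, LOCK_WAIT_TIMEOUT = killed
        bool complete;
    };

    static pthread_mutex_t pending_mutex = PTHREAD_MUTEX_INITIALIZER;
    static std::unordered_map<THD *, lock_request *> pending_by_thd;

    // Called from a hypothetical tokudb kill_query handlerton function.
    void tokudb_kill_waiting_request(THD *thd) {
        pthread_mutex_lock(&pending_mutex);
        auto it = pending_by_thd.find(thd);
        if (it != pending_by_thd.end()) {
            lock_request *lr = it->second;
            lr->result   = LOCK_WAIT_TIMEOUT;        // complete with lock timeout
            lr->complete = true;
            pthread_cond_broadcast(&lr->wait_cond);  // wake only this waiter
        }
        pthread_mutex_unlock(&pending_mutex);
    }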
This solves 1 of 10 parallel replication stalls with tokudb. The remainder require the tokudb lock timer to pop.
The problem is that the parallel replicator attempts to kill the query for a mysql slave thread that is waiting for a prior txn to complete. This does not have any effect on tokudb. The workaround for now is to use an extremely small tokudb lock timeout so that the stalls are short.
Rich Prohaska writes:
I have tokudb kill query working with parallel replication. I needed to attach the mysql thd pointer to each lock request and use the thd as a key into the pending lock requests to find the one being used by a particular thd. Once found, this lock request is immediately completed with a lock timeout result. I also needed to rearrange the lock wait-for code so that the callback is called while not holding any tokudb locks.
Ok, sounds great!
This solves 1 of 10 parallel replication stalls with tokudb. The remainder require the tokudb lock timer to pop.
The problem is that the parallel replicator attempts to kill the query for a mysql slave thread that is waiting for a prior txn to complete. This does not have any effect on tokudb. The workaround for now is to use an extremely small tokudb lock timeout so that the stalls are short.
Hm, that sounds like a problem with the parallel replication code. It should be normal that a thread needs to be killed in "waiting for prior txn", and it should work. I will look into this. Is your latest kill query code on github?
Thanks,
- Kristian.
participants (2)
- Kristian Nielsen
- Rich Prohaska