Re: [Maria-developers] [Commits] Rev 4369: MDEV-7148 - Recurring: InnoDB: Failing assertion: !lock->recursive in lp:maria/5.5
Sergey Vojtovich <svoj@mariadb.org> writes:
revno: 4369 revision-id: svoj@mariadb.org-20141120094823-fozdsrm1kzv46kxi parent: jplindst@mariadb.org-20141119182734-cwbaed0ka90ocj5e committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Thu 2014-11-20 13:48:23 +0400 message: MDEV-7148 - Recurring: InnoDB: Failing assertion: !lock->recursive
On PPC64 high-loaded server may crash due to assertion failure in InnoDB rwlocks code.
This happened because load order between "recursive" and "writer_thread" wasn't properly enforced.
Looks ok. I failed to understand why the patch would fix the assertion mentioned in the bug report. But the patch itself looks correct. - Kristian.
Sergey Vojtovich <svoj@mariadb.org> writes:
revno: 4369 revision-id: svoj@mariadb.org-20141120094823-fozdsrm1kzv46kxi parent: jplindst@mariadb.org-20141119182734-cwbaed0ka90ocj5e committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Thu 2014-11-20 13:48:23 +0400 message: MDEV-7148 - Recurring: InnoDB: Failing assertion: !lock->recursive
On PPC64 high-loaded server may crash due to assertion failure in InnoDB rwlocks code.
This happened because load order between "recursive" and "writer_thread" wasn't properly enforced.
Looks ok.
I failed to understand why the patch would fix the assertion mentioned in the bug report. But the patch itself looks correct. That's a bit tricky... I'll explain just one of the simplest side effects of
Hi Kristian, On Mon, Nov 24, 2014 at 03:27:21PM +0100, Kristian Nielsen wrote: this bug, which is half way to this assertion failure. Please let me know if you want me to track it down to the assertion failure. Initial ------- lock->lock_word= X_LOCK_DECR // means unlocked lock->recursive= FALSE lock->writer_thread= 0 thr1 acquires exclusive lock ---------------------------- thr1: lock->lock_word-= X_LOCK_DECR @rw_lock_x_lock_low()/rw_lock_lock_word_decr() thr1: lock->writer_thread= thr1 @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr1: sync @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr1: lock->recursive= TRUE @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr1 releases exclusive lock ---------------------------- thr1: if (lock->lock_word == 0) lock->recursive = FALSE @rw_lock_x_unlock_func()/rw_lock_x_prepare_unlock() thr1: lock->lock_word+= X_LOCK_DECR Note: we only reset recursive, but writer_thread still equals to thr1. thr2 and thr1 contest for the lock ---------------------------------- thr2: lock->lock_word-= X_LOCK_DECR @rw_lock_x_lock_low()/rw_lock_lock_word_decr() thr1: sees that X-lock is acquired, it now needs to check if it is relock or failed lock @rw_lock_x_lock_low() thr1: loads lock->writer_thread, which is still thr1 @rw_lock_x_lock_low() thr2: lock->writer_thread= thr1 @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr2: sync @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr2: lock->recursive= TRUE @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr1: loads lock->recursive, which is TRUE @rw_lock_x_lock_low() thr1: if (lock->recursive && lock->writer_thread == thr1) lock->lock_word -= X_LOCK_DECR @rw_lock_x_lock_low() At this point we have 2 threads holding exclusive lock! Regards, Sergey
i think writing unit tests for these low-level primitives would be worth while. especially now in light of recent problems found...and the #define-madness obfuscation /Jonas On Wed, Nov 26, 2014 at 9:32 AM, Sergey Vojtovich <svoj@mariadb.org> wrote:
Hi Kristian,
Sergey Vojtovich <svoj@mariadb.org> writes:
revno: 4369 revision-id: svoj@mariadb.org-20141120094823-fozdsrm1kzv46kxi parent: jplindst@mariadb.org-20141119182734-cwbaed0ka90ocj5e committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Thu 2014-11-20 13:48:23 +0400 message: MDEV-7148 - Recurring: InnoDB: Failing assertion: !lock->recursive
On PPC64 high-loaded server may crash due to assertion failure in InnoDB rwlocks code.
This happened because load order between "recursive" and "writer_thread" wasn't properly enforced.
Looks ok.
I failed to understand why the patch would fix the assertion mentioned in the bug report. But the patch itself looks correct. That's a bit tricky... I'll explain just one of the simplest side effects of
On Mon, Nov 24, 2014 at 03:27:21PM +0100, Kristian Nielsen wrote: this bug, which is half way to this assertion failure. Please let me know if you want me to track it down to the assertion failure.
Initial ------- lock->lock_word= X_LOCK_DECR // means unlocked lock->recursive= FALSE lock->writer_thread= 0
thr1 acquires exclusive lock ---------------------------- thr1: lock->lock_word-= X_LOCK_DECR @rw_lock_x_lock_low()/rw_lock_lock_word_decr() thr1: lock->writer_thread= thr1 @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr1: sync @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr1: lock->recursive= TRUE @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag()
thr1 releases exclusive lock ---------------------------- thr1: if (lock->lock_word == 0) lock->recursive = FALSE @rw_lock_x_unlock_func()/rw_lock_x_prepare_unlock() thr1: lock->lock_word+= X_LOCK_DECR
Note: we only reset recursive, but writer_thread still equals to thr1.
thr2 and thr1 contest for the lock ---------------------------------- thr2: lock->lock_word-= X_LOCK_DECR @rw_lock_x_lock_low()/rw_lock_lock_word_decr() thr1: sees that X-lock is acquired, it now needs to check if it is relock or failed lock @rw_lock_x_lock_low() thr1: loads lock->writer_thread, which is still thr1 @rw_lock_x_lock_low() thr2: lock->writer_thread= thr1 @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr2: sync @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr2: lock->recursive= TRUE @rw_lock_x_lock_low()/rw_lock_set_writer_id_and_recursion_flag() thr1: loads lock->recursive, which is TRUE @rw_lock_x_lock_low() thr1: if (lock->recursive && lock->writer_thread == thr1) lock->lock_word -= X_LOCK_DECR @rw_lock_x_lock_low()
At this point we have 2 threads holding exclusive lock!
Regards, Sergey
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Sergey Vojtovich <svoj@mariadb.org> writes:
I failed to understand why the patch would fix the assertion mentioned in the bug report. But the patch itself looks correct. That's a bit tricky... I'll explain just one of the simplest side effects of this bug, which is half way to this assertion failure. Please let me know if you want me to track it down to the assertion failure.
Thanks, no there is no need. I agree that the patch is correct and fixes a real issue. In fact, as I understand your explanation, the real problem is that the rwlocks behave incorrectly. The assertion is just how it was discovered, but it's just one symptom of a real problem. That is how I understood the patch as well. (It is often a good idea in commit messages to explain the actual problem, rather than the symptom that was used to discover the bug. Eg. I often see commit messages like "Fix a Valgrind warning in Buildbot", instead of a proper explanation about what was actually wrong and what the user-visible effect was, if any. You did explain in the commit message that the problem was incorrect load order, no need to change it I think.) - Kristian.
participants (3)
-
Jonas Oreland
-
Kristian Nielsen
-
Sergey Vojtovich