Laurynas,

We cannot recover from a torn page using only the redo log. But wouldn't the undo log record enough information for recovery in the case of a torn page? The undo log should have the old values of the affected rows. So shouldn't it be enough to recover a torn page using information from the undo log?

Xiaofei

On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis <laurynas.biveinis@gmail.com> wrote:

Xiaofei -
We can indeed detect the torn page write without the doublewrite
buffer (and WebScaleSQL has a patch utilising this observation). But
we need not only to detect, but to recover the page as well. And
without the doublewrite, if we discard the page, we have nothing: a
half-old half-new page on the disk and the redo log records for that
page are not enough to recover it.
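
To illustrate the difference between detecting and recovering, here is a minimal Python sketch. It assumes a toy page layout (the same 8-byte LSN stamped at the top and the bottom of a 16 KB page), which is a simplification and not InnoDB's real on-disk format. The names make_page, is_torn, tear and recover are hypothetical helpers for this example, not InnoDB functions.

```python
import struct
from typing import Optional

PAGE_SIZE = 16384  # toy page size; the layout below is an assumption, not InnoDB's format


def make_page(lsn: int, payload: bytes) -> bytes:
    """Build a toy page: 8-byte LSN at the top, payload, same 8-byte LSN at the bottom."""
    stamp = struct.pack(">Q", lsn)
    body = payload.ljust(PAGE_SIZE - 16, b"\0")
    return stamp + body + stamp


def is_torn(page: bytes) -> bool:
    """A page is considered torn when the top and bottom LSN stamps differ."""
    return page[:8] != page[-8:]


def tear(old: bytes, new: bytes, cut: int) -> bytes:
    """Simulate a crash mid-write: the first `cut` bytes are new, the rest are old."""
    return new[:cut] + old[cut:]


def recover(on_disk: bytes, doublewrite_copy: Optional[bytes]) -> Optional[bytes]:
    """Return a consistent page image, or None if the page is torn and no copy exists."""
    if not is_torn(on_disk):
        return on_disk
    if doublewrite_copy is not None and not is_torn(doublewrite_copy):
        return doublewrite_copy  # restore the full page from the doublewrite copy
    return None  # detected, but there is nothing to restore from


old = make_page(100, b"old rows")
new = make_page(200, b"new rows")
torn = tear(old, new, 4096)

print(is_torn(torn))                 # the mismatch is detected
print(recover(torn, new) == new)     # with a doublewrite copy, the page is restored
print(recover(torn, None) is None)   # without one, detection alone does not help
```

In this sketch the torn page is always detected, but only the call that is given an intact doublewrite copy gets a usable page back.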
--
2015-05-09 8:44 GMT+03:00 Xiaofei Du <xiaofei.du008@gmail.com>:
> Justin,
>
> I think the fsync I was concerned about and the torn page problem are two
> different things. But now I have a question about double write buffer. If we
> can detect a torn page by checking the top and bottom of a page, why would
> we still need double write buffer? If the page is consistent, then we use
> it, otherwise, we just discard it. Maybe this is a naive question. But
> please let me know. Thanks.
>
> Xiaofei
>
> On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenlion@gmail.com> wrote:
>>
>> Hi,
>>
>> The log does not have whole pages. Pages must not be torn for the
>> recovery process to work. A fsync is required when a page is written to
>> disk. During recovery all changes since the last checkpoint are replayed,
>> then transactions that do not have a commit marker are rolled back. This is
>> called roll forward/roll back recovery.
>>
>> --Justin
>>
>> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du008@gmail.com>
>> wrote:
>>>
>>> Justin,
>>>
>>> I was wondering whether fsync is needed after each write. The
>>> operations are already in the log. So recovery can always be done from the
>>> log. The difference is that during recovery, we need to go back further in
>>> the log and it will take longer. But in that way, I guess it would be hard
>>> to coordinate with the kernel flush thread.
>>>
>>> Xiaofei
>>>
>>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenlion@gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> InnoDB recovery cannot handle torn pages. An fsync is required to
>>>> ensure that the page is fully written to disk. This is also why the
>>>> doublewrite buffer is used. Before pages are written down to disk, they are
>>>> first written sequentially into the doublewrite buffer. This buffer is
>>>> synced, then async page writing can proceed. If the database crashes, the
>>>> pages in flight will be rewritten by the doublewrite buffer. The detection
>>>> mechanism for torn pages comes from an LSN, which is written into the top
>>>> and the bottom of the page. If the LSN at the top and bottom do not match,
>>>> the page is torn.
>>>>
>>>> Regards,
>>>>
>>>> --Justin
>>>>
>>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du008@gmail.com>
>>>> wrote:
>>>>>
>>>>> Laurynas,
>>>>>
>>>>> This is exactly what I was looking for. I went through these functions
>>>>> before. I disabled double write buffer, so I didn't pay attention to code
>>>>> under buf_dblwr... The reason I asked this question is because I didn't know
>>>>> how the recovery process works, so I was wondering if it's necessary to
>>>>> fsync after each write. It's a performance concern. Anyway, thank you very
>>>>> much!
>>>>>
>>>>> Jan -- Thank you for your answer too!
>>>>>
>>>>> Xiaofei
>>>>>
>>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
>>>>> <laurynas.biveinis@gmail.com> wrote:
>>>>>>
>>>>>> Xiaofei -
>>>>>>
>>>>>> fsync is performed for all the flush types (LRU, flush, single page)
>>>>>> if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC). The
>>>>>> apparent difference in sync and async is not because of the sync
>>>>>> difference itself, but because of the flush type difference. The
>>>>>> single page flush flushes one page, and requests a fsync for its file.
>>>>>> Other flushes flush in batches, don't have to fsync for each written
>>>>>> page individually but rather sync once at the end. Then doublewrite
>>>>>> complicates this further. If it is disabled, fsync will happen in
>>>>>> buf_dblwr_sync_datafiles called from buf_dblwr_flush_buffered_writes
>>>>>> called from buf_flush_common called at the end of either LRU or flush
>>>>>> list flush. If doublewrite is enabled, fsync will happen in
>>>>>> buf_dblwr_update called from buf_flush_write_complete.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du008@gmail.com>:
>>>>>> > Hi Laurynas,
>>>>>> >
>>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>>>>>> > <laurynas.biveinis@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Xiaofei -
>>>>>> >>
>>>>>> >> > Does InnoDB maintain a dirty
>>>>>> >> > page table?
>>>>>> >>
>>>>>> >> You must be referring to the buffer pool flush_list.
>>>>>> >
>>>>>> >
>>>>>> > You are right. The flush_list can be used for recovery and
>>>>>> > checkpoint.
>>>>>> >
>>>>>> >>
>>>>>> >>
>>>>>> >> > Is fsync called to guarantee the page to be on persistent
>>>>>> >> > storage so that the dirty page table can be updated? If this is
>>>>>> >> > the
>>>>>> >> > case,
>>>>>> >> > when is the dirty page table updated for asynchronous IOs?
>>>>>> >>
>>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>>>>>> >> called from buf_page_io_complete in buf0buf.cc.
>>>>>> >
>>>>>> >
>>>>>> > You are right that this is the place it updates the dirty page
>>>>>> > information.
>>>>>> > But I still don't understand why the fsync is needed for synchronous
>>>>>> > IOs,
>>>>>> > but not for the AIOs. Jan Lindstrom said fsync is also called for
>>>>>> > other AIO
>>>>>> > operations. But I could only find it true for one of many AIO operations.
>>>>>> > Or maybe
>>>>>> > I am missing something still?
>>>>>> >
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Laurynas
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Laurynas
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>> Post to : maria-discuss@lists.launchpad.net
>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>
>>>>
>>>
>>
>
Laurynas