Re: [Maria-developers] pre-allocating binlog to speed up sync_binlog=1
Hi, Arjen!
[moving to the public list]
On Dec 14, Arjen Lentz wrote:
Hi all
Can we adopt/implement http://forge.mysql.com/worklog/task.php?id=4925 in MariaDB? The benchmark info is in the item, and looks quite interesting. The author tested it using a separate tool to preallocate the binlog, but it should be straightforward to implement this inside the server, which would also be much easier to manage, without extra fuss for a DBA.
You mean - before it gets into 5.6 and we merge it the normal way? Or you mean - with a different implementation (w/o mysqlbinlogalloc)?
Regards, Sergei
P.S. These are the changesets:
final version: http://lists.mysql.com/commits/113309 http://lists.mysql.com/commits/113306 http://lists.mysql.com/commits/113307
review comments: http://lists.mysql.com/commits/116965 http://lists.mysql.com/commits/121478
Sergei Golubchik <serg@askmonty.org> writes:
On Dec 14, Arjen Lentz wrote:
Can we adopt/implement http://forge.mysql.com/worklog/task.php?id=4925 in MariaDB?
P.S. These are the changesets:
final version: http://lists.mysql.com/commits/113309 http://lists.mysql.com/commits/113306 http://lists.mysql.com/commits/113307
review comments: http://lists.mysql.com/commits/116965 http://lists.mysql.com/commits/121478
The implementation of this makes me very uneasy. The problem is that I see nothing that properly handles partial writes into the binlog, at least from a quick read-through. Neither in the worklog nor in the patch. Just the fact that this is not clearly described up-front in the worklog is very worrying!
The worklog says this:
"For replication threads, when reading the latest binary log, getting actual size information is needed to check EOF [...] If binlog size is not set, 4KB is read so bogus data is read if actual binlog size is smaller than 4KB. This makes slave i/o thread terminated)"
But there is no guarantee that "bogus data" will be detected as such. We don't even have a checksum on events. So basically, after a crash the last binlog event may be corrupt, with no sure way to detect this corruption. In other words, we lose crash recovery. Which is the whole point of setting sync_binlog=1 in the first place.
[I would love to learn that I am wrong, as this is a very nice feature. But the whole reason fsync() is slow when appending to files is handling the difficult issue of partial writes, so I would be really curious how the patch manages to handle this properly.]
- Kristian.
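
To illustrate the concern about undetectable "bogus data", here is a minimal sketch of how a per-event checksum could let a reader tell valid events from a torn write or from the zero-filled tail of a preallocated file. The framing (a length plus CRC32 header), the names `frame_event` and `read_valid_events`, and the zero-filled padding are all assumptions made for this illustration; this is not the binlog event format and not what the patch does.

```python
import struct
import zlib

# Hypothetical framing: 4-byte payload length + 4-byte CRC32, little-endian.
HEADER = struct.Struct("<II")


def frame_event(payload: bytes) -> bytes:
    """Prefix a payload with its length and the CRC32 of the payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_events(buf: bytes) -> list:
    """Return the payloads of all leading events whose checksum verifies.

    Scanning stops at the first zero-length header (assumed to be the
    zero-filled, not yet written part of a preallocated file), at an event
    that would run past the end of the buffer, or at a checksum mismatch
    (for example a torn write after a crash).
    """
    events = []
    pos = 0
    while pos + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        end = start + length
        if length == 0 or end > len(buf):
            break
        payload = buf[start:end]
        if zlib.crc32(payload) != crc:
            break
        events.append(payload)
        pos = end
    return events
```

A checksum only makes an undetected corrupt event very unlikely, not impossible, and it does not by itself address stale data left over in a reused file.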
Hi Serg, Kristian, all
On 14/12/2010, at 4:08 PM, Sergei Golubchik wrote:
On Dec 14, Arjen Lentz wrote:
Can we adopt/implement http://forge.mysql.com/worklog/task.php?id=4925 in MariaDB? The benchmark info is in the item, and looks quite interesting. The author tested it using a separate tool to preallocate the binlog, but it should be straightforward to implement this inside the server, which would also be much easier to manage, without extra fuss for a DBA.
You mean - before it gets into 5.6 and we merge it the normal way? Or you mean - with a different implementation (w/o mysqlbinlogalloc) ?
Both ;-)
sync_binlog=1 is a serious problem; note my earlier bug report on the number of fsyncs a commit requires. While fixing that is not trivial, pre-allocating apparently makes a big difference. So I'd see that as high priority, even just looking at a number of Open Query clients.
Secondly, I think the implementation with the external mysqlbinlogalloc tool is not nice. I appreciate that writing a full 1GB file takes a long time, but that's not how it needs to work. InnoDB can extend its tablespace a configurable chunk at a time (8M by default, iirc). The binlog_preallocate variable should, similar to the InnoDB method, define the size of the next preallocation.
I also don't see the point of =0 ignoring a preallocated file. Since mysqld always starts a new binlog on startup, it does not have to scan an existing file at that stage. For replication, surely it can detect what's going on without having a setting.
By having the pre-allocation size configurable, users can choose an amount that works best in their particular environment: with the right filesystem it could be huge; otherwise, more modest. Either way, an advantage can be gained.
Cheers, Arjen.
-- Arjen Lentz, Exec.Director @ Open Query (http://openquery.com) Remote expertise & maintenance for MySQL/MariaDB server environments. Follow us at http://openquery.com/blog/ & http://twitter.com/openquery
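
The chunk-at-a-time preallocation suggested above could look roughly like the following sketch. The class name `ChunkedLog`, its `chunk_size` parameter (standing in for the proposed binlog_preallocate setting), and the choice to preallocate by writing zeros are assumptions for illustration only; a real server would more likely use a filesystem call such as posix_fallocate() and would also need a way to recognise the end of valid data.

```python
import os


class ChunkedLog:
    """Append-only log that grows its file one preallocated chunk at a time.

    Hypothetical illustration: chunk_size plays the role of a configurable
    preallocation size, and zero-filling stands in for real preallocation.
    """

    def __init__(self, path, chunk_size=8 * 1024 * 1024):
        self.chunk_size = chunk_size
        self.file = open(path, "w+b")
        self.used = 0        # bytes of real log data written so far
        self.allocated = 0   # bytes of file space already reserved

    def _extend(self):
        # Reserve the next chunk by writing zeros past the current end.
        self.file.seek(self.allocated)
        self.file.write(b"\0" * self.chunk_size)
        self.file.flush()
        os.fsync(self.file.fileno())
        self.allocated += self.chunk_size

    def append(self, data):
        # Extend first, so the write below never changes the file size.
        while self.used + len(data) > self.allocated:
            self._extend()
        self.file.seek(self.used)
        self.file.write(data)
        self.file.flush()
        os.fsync(self.file.fileno())  # the sync_binlog=1 style sync per write
        self.used += len(data)

    def close(self):
        self.file.close()
```

The idea behind the design is that most appends overwrite space that already exists, so the per-commit sync does not also have to persist a file-size change; how much that helps depends on the filesystem.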
It would be nice if pre-allocated binlog files were possible, but performance results from ext-3 overstate the benefit to be had. If you care about performance with the binlog enabled then don't use ext-3: http://www.facebook.com/note.php?note_id=194501560932
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Mark Callaghan mdcallag@gmail.com
Hi Mark, all
On 21/12/2010, at 12:51 PM, MARK CALLAGHAN wrote:
It would be nice if pre-allocated binlog files were possible, but performance results from ext-3 overstate the benefit to be had. If you care about performance with the binlog enabled then don't use ext-3: http://www.facebook.com/note.php?note_id=194501560932
Not surprising; however, in many environments deployments get stuck with ext3, so I do think there's a benefit in the real world. Cheers, Arjen.
-- Arjen Lentz, Exec.Director @ Open Query (http://openquery.com) Remote expertise & maintenance for MySQL/MariaDB server environments. Follow us at http://openquery.com/blog/ & http://twitter.com/openquery
On Mon, Dec 20, 2010 at 9:17 PM, Arjen Lentz <arjen@openquery.com> wrote:
Hi Mark, all
On 21/12/2010, at 12:51 PM, MARK CALLAGHAN wrote:
It would be nice if pre-allocated binlog files were possible, but performance results from ext-3 overstate the benefit to be had. If you care about performance with the binlog enabled then don't use ext-3: http://www.facebook.com/note.php?note_id=194501560932
Not surprising; however, in many environments deployments get stuck with ext3, so I do think there's a benefit in the real world.
Arjen,
The question isn't whether this has a real world benefit. Lots of changes do. The question is whether this is worth the effort to fix given the many other problems that could be fixed and who is going to fund that fix. This is a big change to replication internals.
-- Mark Callaghan mdcallag@gmail.com
Hi Mark
On 23/12/2010, at 6:19 AM, MARK CALLAGHAN wrote:
On Mon, Dec 20, 2010 at 9:17 PM, Arjen Lentz <arjen@openquery.com> wrote:
Hi Mark, all
On 21/12/2010, at 12:51 PM, MARK CALLAGHAN wrote:
It would be nice if pre-allocated binlog files were possible, but performance results from ext-3 overstate the benefit to be had. If you care about performance with the binlog enabled then don't use ext-3: http://www.facebook.com/note.php?note_id=194501560932
Not surprising; however, in many environments deployments get stuck with ext3, so I do think there's a benefit in the real world.
Arjen,
The question isn't whether this has a real world benefit. Lots of changes do. The question is whether this is worth the effort to fix given the many other problems that could be fixed and who is going to fund that fix. This is a big change to replication internals.
Right, well, that's good to know, about it being a big change... The patch in any case looked overly complicated in terms of approach, so I figured perhaps a simpler way would be ok. Clients were interested, so funding might be ok. But indeed, lots of things could be even more useful. We'll see.
Cheers, Arjen.
-- Arjen Lentz, Exec.Director @ Open Query (http://openquery.com) Remote expertise & maintenance for MySQL/MariaDB server environments. Follow us at http://openquery.com/blog/ & http://twitter.com/openquery
participants (4)
- Arjen Lentz
- Kristian Nielsen
- MARK CALLAGHAN
- Sergei Golubchik