Re: [Maria-developers] 14c2f90ad08: Idempotent INSERT events for system versioning
Hi, Sergei, Thank you for observations! This task is in progress. While doing it I found out that RBR doesn't replicate timestamp fractions. That is: it gets seconds from event timestamp marker and adds fractions from system time. This causes bugs in System Versioning. More on it in Bug 6 here: https://github.com/tempesta-tech/mariadb/issues/578#issuecomment-470533050 I propose to fix it by adding timestamp fractions field to Log_event header after FLAGS_OFFSET. To support backwards compatibility it is needed to increment fdle->binlog_version and probably add new flag LOG_EVENT_HAS_SEC_PART_F (for support sending Log_event without fractions). Do you agree? On Thu, Jan 31, 2019 at 1:10 PM Sergei Golubchik <serg@mariadb.com> wrote:
Hi, Aleksey!
On Jan 31, Aleksey Midenkov wrote:
revision-id: 14c2f90ad08 (versioning-1.0.7-1-g14c2f90ad08) parent(s): a8efe7ab1f2 author: Aleksey Midenkov <midenok@gmail.com> committer: Aleksey Midenkov <midenok@gmail.com> timestamp: 2018-11-17 16:30:10 +0300 message:
Idempotent INSERT events for system versioning
Case 1: Rows_log_event::write_row() always overwrite historical row.
Related to MDEV-16370.
I realized that the whole RBR logging of system versioned tables is wrong.
For an update you log update_row event and write_row event separately and execute them separately. It allows trivially to manipulate the history. See a test case attached.
To fix this, the slave should ignore all events that modify the history. And generate them locally. Say, an update on the master generates update_row and write_row events. The slave executes update_row event which updates a current row and this also should generate the historical row. Then the slave ignores write_row event.
Of course, as an optimization, the master should not generate historical write_row events, but as my test case shows, they can be forged, so the slave should ignore them anyway, if they happen to come.
I've created a new https://jira.mariadb.org/browse/MDEV-18432 for that, let's not have it tied to MDEV-16370.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
-- All the best, Aleksey Midenkov @midenok
Aleksey, hello.
Hi, Sergei,
Thank you for observations! This task is in progress. While doing it I found out that RBR doesn't replicate timestamp fractions. That is: it gets seconds from event timestamp marker and adds fractions from system time. This causes bugs in System Versioning. More on it in Bug 6 here: https://github.com/tempesta-tech/mariadb/issues/578#issuecomment-470533050
I propose to fix it by adding timestamp fractions field to Log_event header after FLAGS_OFFSET. To support backwards compatibility it is needed to increment fdle->binlog_version and probably add new flag LOG_EVENT_HAS_SEC_PART_F (for support sending Log_event without fractions). Do you agree?
Adding a new flag to Log_event could indeed force the binlog_version increment. However as you're concerned with Rows_log_event the new field could be added to this class and its header. You can find how to do that in Rows_log_event::Rows_log_event. Hope this is helpful. Andrei
On Thu, Jan 31, 2019 at 1:10 PM Sergei Golubchik <serg@mariadb.com> wrote:
Hi, Aleksey!
On Jan 31, Aleksey Midenkov wrote: > revision-id: 14c2f90ad08 (versioning-1.0.7-1-g14c2f90ad08) > parent(s): a8efe7ab1f2 > author: Aleksey Midenkov <midenok@gmail.com> > committer: Aleksey Midenkov <midenok@gmail.com> > timestamp: 2018-11-17 16:30:10 +0300 > message: > > Idempotent INSERT events for system versioning > > Case 1: Rows_log_event::write_row() always overwrite historical row. > > Related to MDEV-16370.
I realized that the whole RBR logging of system versioned tables is wrong.
For an update you log update_row event and write_row event separately and execute them separately. It allows trivially to manipulate the history. See a test case attached.
To fix this, the slave should ignore all events that modify the history. And generate them locally. Say, an update on the master generates update_row and write_row events. The slave executes update_row event which updates a current row and this also should generate the historical row. Then the slave ignores write_row event.
Of course, as an optimization, the master should not generate historical write_row events, but as my test case shows, they can be forged, so the slave should ignore them anyway, if they happen to come.
I've created a new https://jira.mariadb.org/browse/MDEV-18432 for that, let's not have it tied to MDEV-16370.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
-- All the best,
Aleksey Midenkov @midenok
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hi, Aleksey! On Mar 12, Aleksey Midenkov wrote:
Hi, Sergei,
Thank you for observations! This task is in progress. While doing it I found out that RBR doesn't replicate timestamp fractions. That is: it gets seconds from event timestamp marker and adds fractions from system time. This causes bugs in System Versioning. More on it in Bug 6 here: https://github.com/tempesta-tech/mariadb/issues/578#issuecomment-470533050
I propose to fix it by adding timestamp fractions field to Log_event header after FLAGS_OFFSET. To support backwards compatibility it is needed to increment fdle->binlog_version and probably add new flag LOG_EVENT_HAS_SEC_PART_F (for support sending Log_event without fractions). Do you agree?
Query_log_event() can already store microseconds, see Q_HRNOW flag. So better to add them to Rows_log_event, if needed. But I don't quite understand why it's needed. What you see - microseconds from the system time and seconds from the binlog event - is a workaround for replicating non-versioned master to versioned slave, where binlog events don't have microseconds. In your case versioned-to-versioned RBR, row events should have microsecond-exact timestamps already, you can get them from there. Only Delete_rows_log_event might not have the actual timestamp(6), but this can be fixed on the master, I suspect. Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Andrei, thanks for explanation! Agree with you and Sergei on that topic. Sergei, On Tue, Mar 12, 2019 at 9:04 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Aleksey!
On Mar 12, Aleksey Midenkov wrote:
Hi, Sergei,
Thank you for observations! This task is in progress. While doing it I found out that RBR doesn't replicate timestamp fractions. That is: it gets seconds from event timestamp marker and adds fractions from system time. This causes bugs in System Versioning. More on it in Bug 6 here:
https://github.com/tempesta-tech/mariadb/issues/578#issuecomment-470533050
I propose to fix it by adding timestamp fractions field to Log_event header after FLAGS_OFFSET. To support backwards compatibility it is needed to increment fdle->binlog_version and probably add new flag LOG_EVENT_HAS_SEC_PART_F (for support sending Log_event without fractions). Do you agree?
Query_log_event() can already store microseconds, see Q_HRNOW flag. So better to add them to Rows_log_event, if needed.
But I don't quite understand why it's needed.
In scope of MDEV-16370 slave adds history record. Master sends UPDATE of row (row_start = X.Y; row_end = MAX). Query time is X.(Y + n). Slave adds history record and gets timestamp: X from master; F from system time fractions. If F < Y this makes bad history record (row_start = X.Y; row_end = X.F).
What you see - microseconds from the system time and seconds from the binlog event - is a workaround for replicating non-versioned master to versioned slave, where binlog events don't have microseconds.
In your case versioned-to-versioned RBR, row events should have microsecond-exact timestamps already, you can get them from there.
Only Delete_rows_log_event might not have the actual timestamp(6), but this can be fixed on the master, I suspect.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
-- All the best, Aleksey Midenkov @midenok
Hi, Aleksey! On Mar 13, Aleksey Midenkov wrote:
Query_log_event() can already store microseconds, see Q_HRNOW flag. So better to add them to Rows_log_event, if needed.
But I don't quite understand why it's needed.
In scope of MDEV-16370 slave adds history record. Master sends UPDATE of row (row_start = X.Y; row_end = MAX). Query time is X.(Y + n).
Slave adds history record and gets timestamp: X from master; F from system time fractions. If F < Y this makes bad history record (row_start = X.Y; row_end = X.F).
That's exactly what I wrote below. You can make F=Y, because the event already contains Y, you won't need to extend it.
What you see - microseconds from the system time and seconds from the binlog event - is a workaround for replicating non-versioned master to versioned slave, where binlog events don't have microseconds.
In your case versioned-to-versioned RBR, row events should have microsecond-exact timestamps already, you can get them from there.
Only Delete_rows_log_event might not have the actual timestamp(6), but this can be fixed on the master, I suspect.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Sergei, On Wed, Mar 13, 2019 at 9:01 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Aleksey!
On Mar 13, Aleksey Midenkov wrote:
Query_log_event() can already store microseconds, see Q_HRNOW flag. So better to add them to Rows_log_event, if needed.
But I don't quite understand why it's needed.
In scope of MDEV-16370 slave adds history record. Master sends UPDATE of row (row_start = X.Y; row_end = MAX). Query time is X.(Y + n).
Slave adds history record and gets timestamp: X from master; F from system time fractions. If F < Y this makes bad history record (row_start = X.Y; row_end = X.F).
That's exactly what I wrote below. You can make F=Y, because the event already contains Y, you won't need to extend it.
I don't like it for a couple of reasons. It's a hackish way that is hard to remember and hard to figure out for unenlightened people. It leads to history mismatch between master and slave.
What you see - microseconds from the system time and seconds from the binlog event - is a workaround for replicating non-versioned master to versioned slave, where binlog events don't have microseconds.
In your case versioned-to-versioned RBR, row events should have microsecond-exact timestamps already, you can get them from there.
Only Delete_rows_log_event might not have the actual timestamp(6), but this can be fixed on the master, I suspect.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
-- All the best, Aleksey Midenkov @midenok
Hi, Aleksey! On Mar 13, Aleksey Midenkov wrote:
On Wed, Mar 13, 2019 at 9:01 PM Sergei Golubchik <serg@mariadb.org> wrote:
On Mar 13, Aleksey Midenkov wrote:
Query_log_event() can already store microseconds, see Q_HRNOW flag. So better to add them to Rows_log_event, if needed.
But I don't quite understand why it's needed.
In scope of MDEV-16370 slave adds history record. Master sends UPDATE of row (row_start = X.Y; row_end = MAX). Query time is X.(Y + n).
Slave adds history record and gets timestamp: X from master; F from system time fractions. If F < Y this makes bad history record (row_start = X.Y; row_end = X.F).
That's exactly what I wrote below. You can make F=Y, because the event already contains Y, you won't need to extend it.
I don't like it for a couple of reasons. It's a hackish way that is hard to remember and hard to figure out for unenlightened people.
What do you mean by that?
It leads to history mismatch between master and slave.
How it can cause a history mismatch? Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Sergey, On Wed, Mar 13, 2019 at 9:48 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Aleksey!
On Mar 13, Aleksey Midenkov wrote:
On Wed, Mar 13, 2019 at 9:01 PM Sergei Golubchik <serg@mariadb.org> wrote:
On Mar 13, Aleksey Midenkov wrote:
Query_log_event() can already store microseconds, see Q_HRNOW flag. So better to add them to Rows_log_event, if needed.
But I don't quite understand why it's needed.
In scope of MDEV-16370 slave adds history record. Master sends UPDATE of row (row_start = X.Y; row_end = MAX). Query time is X.(Y + n).
Slave adds history record and gets timestamp: X from master; F from system time fractions. If F < Y this makes bad history record (row_start = X.Y; row_end = X.F).
That's exactly what I wrote below. You can make F=Y, because the event already contains Y, you won't need to extend it.
I don't like it for a couple of reasons. It's a hackish way that is hard to remember and hard to figure out for unenlightened people.
What do you mean by that?
It's higher overhead costs in terms of support.
It leads to history mismatch between master and slave.
How it can cause a history mismatch?
Obviously fractions will differ. On master it will be: (row_start = X.Y; row_end = X.(Y + n)); On slave it will be: (row_start = X.Y; row_end = X.(Y + 1)); In case Y = 999 999, it will be: (row_start = X.Y; row_end = (X + 1).0);
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
-- All the best, Aleksey Midenkov @midenok
participants (3)
-
Aleksey Midenkov
-
andrei.elkin@pp.inet.fi
-
Sergei Golubchik