developers
Threads by month
- ----- 2025 -----
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 6826 discussions
[Maria-developers] Progress (by Knielsen): New replication APIs (107)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: New replication APIs
CREATION DATE..: Mon, 15 Mar 2010, 13:55
SUPERVISOR.....: Knielsen
IMPLEMENTOR....: Knielsen
COPIES TO......: Sergei
CATEGORY.......: Server-Sprint
TASK ID........: 107 (http://askmonty.org/worklog/?tid=107)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 69
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:36)=-=-
Research and design thoughts.
Worked 19 hours and estimate 0 hours remain (original estimate increased by 19 hours).
-=-=(Knielsen - Mon, 07 Jun 2010, 12:11)=-=-
High Level Description modified.
--- /tmp/wklog.107.old.31097 2010-06-07 12:11:57.000000000 +0000
+++ /tmp/wklog.107.new.31097 2010-06-07 12:11:57.000000000 +0000
@@ -7,3 +7,6 @@
https://lists.launchpad.net/maria-developers/msg01998.html
+Wiki page for the project:
+
+ http://askmonty.org/wiki/ReplicationProject
-=-=(Knielsen - Mon, 29 Mar 2010, 07:33)=-=-
Research and design discussions: Galera, 2pc/XA, group commit, multi-engine transactions.
Worked 14 hours and estimate 0 hours remain (original estimate increased by 14 hours).
-=-=(Knielsen - Wed, 24 Mar 2010, 10:39)=-=-
Design discussions
Worked 11 hours and estimate 0 hours remain (original estimate increased by 11 hours).
-=-=(Knielsen - Mon, 15 Mar 2010, 14:28)=-=-
Research into the problem, and discussions on phone/mailing list
Worked 25 hours and estimate 0 hours remain (original estimate increased by 25 hours).
-=-=(Guest - Mon, 15 Mar 2010, 14:18)=-=-
High-Level Specification modified.
--- /tmp/wklog.107.old.9086 2010-03-15 14:18:18.000000000 +0000
+++ /tmp/wklog.107.new.9086 2010-03-15 14:18:18.000000000 +0000
@@ -1 +1,43 @@
+Current ideas/status after discussions on the mailing list:
+
+ - Implement a set of plugin APIs and use them to move all of the existing
+ MySQL replication into a (set of) plugins.
+
+ - Design the APIs so that they can support full MySQL replication, but also
+ so that they do not hardcode assumptions about how this replication
+ implementation is done, and so that they will be suitable for other types of
+ replication (Tungsten, Galera, parallel replication, ...).
+
+ - APIs need to include the concept of a global transaction ID. Need to
+ determine the extent to which the semantics of such ID will be defined
+ by the API, and to which extend it will be defined by the plugin
+ implementations.
+
+ - APIs should properly support reliable crash-recovery with decent
+ performance (eg. not require multiple mandatory fsync()s per commit, and
+ not make group commit impossible).
+
+ - Would be nice if the API provided facilities for implementing good
+ consistency checking support (mainly checking master tables against slave
+ tables is hard here I think, but also applying wrong binlog data and
+ individual event checksums).
+
+
+Steps to make this more concrete:
+
+ - Investigate the current MySQL replication, and list all of the places where
+ a plugin implementation will need to connect/hook into the MySQL server.
+ * handler::{write,update,delete}_row()
+ * Statement execution
+ * Transaction start/commit
+ * Table open
+ * Query safe/not/safe for statement based replication
+ * Statement-based logging details (user variables, random seed, etc.)
+ * ...
+
+ - Use this list to make an initial sketch of the set of APIs we need.
+
+ - Use the list to determine the feasibility of this project and the level of
+ detail in the API needed to support a full replication implementation as a
+ plugin.
-=-=(Serg - Mon, 15 Mar 2010, 14:13)=-=-
Observers changed: Sergei
DESCRIPTION:
This is a top-level task for the project of designing a new set of replication
APIs for MariaDB.
This task is for the initial discussion of what to do and where to focus.
The project is started in this email thread:
https://lists.launchpad.net/maria-developers/msg01998.html
Wiki page for the project:
http://askmonty.org/wiki/ReplicationProject
HIGH-LEVEL SPECIFICATION:
Current ideas/status after discussions on the mailing list:
- Implement a set of plugin APIs and use them to move all of the existing
MySQL replication into a (set of) plugins.
- Design the APIs so that they can support full MySQL replication, but also
so that they do not hardcode assumptions about how this replication
implementation is done, and so that they will be suitable for other types of
replication (Tungsten, Galera, parallel replication, ...).
- APIs need to include the concept of a global transaction ID. Need to
determine the extent to which the semantics of such ID will be defined
by the API, and to which extend it will be defined by the plugin
implementations.
- APIs should properly support reliable crash-recovery with decent
performance (eg. not require multiple mandatory fsync()s per commit, and
not make group commit impossible).
- Would be nice if the API provided facilities for implementing good
consistency checking support (mainly checking master tables against slave
tables is hard here I think, but also applying wrong binlog data and
individual event checksums).
Steps to make this more concrete:
- Investigate the current MySQL replication, and list all of the places where
a plugin implementation will need to connect/hook into the MySQL server.
* handler::{write,update,delete}_row()
* Statement execution
* Transaction start/commit
* Table open
* Query safe/not/safe for statement based replication
* Statement-based logging details (user variables, random seed, etc.)
* ...
- Use this list to make an initial sketch of the set of APIs we need.
- Use the list to determine the feasibility of this project and the level of
detail in the API needed to support a full replication implementation as a
plugin.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): New replication APIs (107)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: New replication APIs
CREATION DATE..: Mon, 15 Mar 2010, 13:55
SUPERVISOR.....: Knielsen
IMPLEMENTOR....: Knielsen
COPIES TO......: Sergei
CATEGORY.......: Server-Sprint
TASK ID........: 107 (http://askmonty.org/worklog/?tid=107)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 69
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:36)=-=-
Research and design thoughts.
Worked 19 hours and estimate 0 hours remain (original estimate increased by 19 hours).
-=-=(Knielsen - Mon, 07 Jun 2010, 12:11)=-=-
High Level Description modified.
--- /tmp/wklog.107.old.31097 2010-06-07 12:11:57.000000000 +0000
+++ /tmp/wklog.107.new.31097 2010-06-07 12:11:57.000000000 +0000
@@ -7,3 +7,6 @@
https://lists.launchpad.net/maria-developers/msg01998.html
+Wiki page for the project:
+
+ http://askmonty.org/wiki/ReplicationProject
-=-=(Knielsen - Mon, 29 Mar 2010, 07:33)=-=-
Research and design discussions: Galera, 2pc/XA, group commit, multi-engine transactions.
Worked 14 hours and estimate 0 hours remain (original estimate increased by 14 hours).
-=-=(Knielsen - Wed, 24 Mar 2010, 10:39)=-=-
Design discussions
Worked 11 hours and estimate 0 hours remain (original estimate increased by 11 hours).
-=-=(Knielsen - Mon, 15 Mar 2010, 14:28)=-=-
Research into the problem, and discussions on phone/mailing list
Worked 25 hours and estimate 0 hours remain (original estimate increased by 25 hours).
-=-=(Guest - Mon, 15 Mar 2010, 14:18)=-=-
High-Level Specification modified.
--- /tmp/wklog.107.old.9086 2010-03-15 14:18:18.000000000 +0000
+++ /tmp/wklog.107.new.9086 2010-03-15 14:18:18.000000000 +0000
@@ -1 +1,43 @@
+Current ideas/status after discussions on the mailing list:
+
+ - Implement a set of plugin APIs and use them to move all of the existing
+ MySQL replication into a (set of) plugins.
+
+ - Design the APIs so that they can support full MySQL replication, but also
+ so that they do not hardcode assumptions about how this replication
+ implementation is done, and so that they will be suitable for other types of
+ replication (Tungsten, Galera, parallel replication, ...).
+
+ - APIs need to include the concept of a global transaction ID. Need to
+ determine the extent to which the semantics of such ID will be defined
+ by the API, and to which extend it will be defined by the plugin
+ implementations.
+
+ - APIs should properly support reliable crash-recovery with decent
+ performance (eg. not require multiple mandatory fsync()s per commit, and
+ not make group commit impossible).
+
+ - Would be nice if the API provided facilities for implementing good
+ consistency checking support (mainly checking master tables against slave
+ tables is hard here I think, but also applying wrong binlog data and
+ individual event checksums).
+
+
+Steps to make this more concrete:
+
+ - Investigate the current MySQL replication, and list all of the places where
+ a plugin implementation will need to connect/hook into the MySQL server.
+ * handler::{write,update,delete}_row()
+ * Statement execution
+ * Transaction start/commit
+ * Table open
+ * Query safe/not/safe for statement based replication
+ * Statement-based logging details (user variables, random seed, etc.)
+ * ...
+
+ - Use this list to make an initial sketch of the set of APIs we need.
+
+ - Use the list to determine the feasibility of this project and the level of
+ detail in the API needed to support a full replication implementation as a
+ plugin.
-=-=(Serg - Mon, 15 Mar 2010, 14:13)=-=-
Observers changed: Sergei
DESCRIPTION:
This is a top-level task for the project of designing a new set of replication
APIs for MariaDB.
This task is for the initial discussion of what to do and where to focus.
The project is started in this email thread:
https://lists.launchpad.net/maria-developers/msg01998.html
Wiki page for the project:
http://askmonty.org/wiki/ReplicationProject
HIGH-LEVEL SPECIFICATION:
Current ideas/status after discussions on the mailing list:
- Implement a set of plugin APIs and use them to move all of the existing
MySQL replication into a (set of) plugins.
- Design the APIs so that they can support full MySQL replication, but also
so that they do not hardcode assumptions about how this replication
implementation is done, and so that they will be suitable for other types of
replication (Tungsten, Galera, parallel replication, ...).
- APIs need to include the concept of a global transaction ID. Need to
determine the extent to which the semantics of such ID will be defined
by the API, and to which extend it will be defined by the plugin
implementations.
- APIs should properly support reliable crash-recovery with decent
performance (eg. not require multiple mandatory fsync()s per commit, and
not make group commit impossible).
- Would be nice if the API provided facilities for implementing good
consistency checking support (mainly checking master tables against slave
tables is hard here I think, but also applying wrong binlog data and
individual event checksums).
Steps to make this more concrete:
- Investigate the current MySQL replication, and list all of the places where
a plugin implementation will need to connect/hook into the MySQL server.
* handler::{write,update,delete}_row()
* Statement execution
* Transaction start/commit
* Table open
* Query safe/not/safe for statement based replication
* Statement-based logging details (user variables, random seed, etc.)
* ...
- Use this list to make an initial sketch of the set of APIs we need.
- Use the list to determine the feasibility of this project and the level of
detail in the API needed to support a full replication implementation as a
plugin.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): Replication API for stacked event generators (120)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Replication API for stacked event generators
CREATION DATE..: Mon, 07 Jun 2010, 13:13
SUPERVISOR.....: Knielsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 120 (http://askmonty.org/worklog/?tid=120)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 2
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:35)=-=-
Research and design thoughts.
Worked 2 hours and estimate 0 hours remain (original estimate increased by 2 hours).
DESCRIPTION:
A part of the replication project, MWL#107.
Events are produced by event Generators. Examples are
- Generation of statement-based replication events
- Generation of row-based events
- Generation of PBXT engine-level replication events
and maybe reading of events from relay log on slave may also be an example of
generating events.
Event generators can be stacked, and a generator may defer event generation to
the next one down the stack. For example, the row-level replication event
generator may defer DLL to the statement-level replication event generator.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): Replication API for stacked event generators (120)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Replication API for stacked event generators
CREATION DATE..: Mon, 07 Jun 2010, 13:13
SUPERVISOR.....: Knielsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 120 (http://askmonty.org/worklog/?tid=120)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 2
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:35)=-=-
Research and design thoughts.
Worked 2 hours and estimate 0 hours remain (original estimate increased by 2 hours).
DESCRIPTION:
A part of the replication project, MWL#107.
Events are produced by event Generators. Examples are
- Generation of statement-based replication events
- Generation of row-based events
- Generation of PBXT engine-level replication events
and maybe reading of events from relay log on slave may also be an example of
generating events.
Event generators can be stacked, and a generator may defer event generation to
the next one down the stack. For example, the row-level replication event
generator may defer DLL to the statement-level replication event generator.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 42
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:32)=-=-
Final review.
Assist with some problems applying the patch.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
-=-=(Guest - Thu, 17 Jun 2010, 00:38)=-=-
Dependency deleted: 39 no longer depends on 47
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
------------------------------------------------------------
-=-=(View All Progress Notes, 35 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 42
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:32)=-=-
Final review.
Assist with some problems applying the patch.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
-=-=(Guest - Thu, 17 Jun 2010, 00:38)=-=-
Dependency deleted: 39 no longer depends on 47
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
------------------------------------------------------------
-=-=(View All Progress Notes, 35 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 42
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:32)=-=-
Final review.
Assist with some problems applying the patch.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
-=-=(Guest - Thu, 17 Jun 2010, 00:38)=-=-
Dependency deleted: 39 no longer depends on 47
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
------------------------------------------------------------
-=-=(View All Progress Notes, 35 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 21 Jun '10
by worklog-noreply@askmonty.org 21 Jun '10
21 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 42
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 21 Jun 2010, 08:32)=-=-
Final review.
Assist with some problems applying the patch.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
-=-=(Guest - Thu, 17 Jun 2010, 00:38)=-=-
Dependency deleted: 39 no longer depends on 47
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
------------------------------------------------------------
-=-=(View All Progress Notes, 35 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
19 Jun '10
Sergei (and everyone else),
The Release Notes and Changelog pages for the MariaDB 5.2.1-beta release
are ready:
http://askmonty.org/wiki/Manual:MariaDB_5.2.1_Release_Notes
http://askmonty.org/wiki/Manual:MariaDB_5.2.1_Changelog
Please let me know if the Release Notes should mention anything else
or if there is anything on that page which should be changed. The
Changelog should have the full list of commits from the 5.2.0-beta up
through the commit with the 5.2.1-beta tag.
The download page for this release is also ready to go, but I haven't
activated it yet. I will activate it (i.e. link to it from the download
page, and other wiki pages) once the mirrors have been seeded (later
tonight or tomorrow).
Thanks.
--
Daniel Bartholomew
Monty Program - http://askmonty.org
1
1
[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 18 Jun '10
by worklog-noreply@askmonty.org 18 Jun '10
18 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-5.3
STATUS.........: Complete
PRIORITY.......: 60
WORKED HOURS...: 49
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 18 Jun 2010, 15:20)=-=-
Version updated.
--- /tmp/wklog.36.old.11335 2010-06-18 15:20:26.000000000 +0000
+++ /tmp/wklog.36.new.11335 2010-06-18 15:20:26.000000000 +0000
@@ -1 +1 @@
-Server-9.x
+Server-5.3
-=-=(Guest - Thu, 17 Jun 2010, 00:39)=-=-
Dependency deleted: 39 no longer depends on 36
-=-=(Guest - Sat, 07 Nov 2009, 22:43)=-=-
Category updated.
--- /tmp/wklog.36.old.9112 2009-11-07 22:43:50.000000000 +0200
+++ /tmp/wklog.36.new.9112 2009-11-07 22:43:50.000000000 +0200
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
-=-=(Guest - Sat, 07 Nov 2009, 22:43)=-=-
Status updated.
--- /tmp/wklog.36.old.9112 2009-11-07 22:43:50.000000000 +0200
+++ /tmp/wklog.36.new.9112 2009-11-07 22:43:50.000000000 +0200
@@ -1 +1 @@
-Un-Assigned
+Complete
-=-=(Bothorsen - Tue, 03 Nov 2009, 13:49)=-=-
More cleanup work done by Alexi, Bo and Sergey.
Worked 4 hours and estimate 0 hours remain (original estimate increased by 4 hours).
-=-=(Bothorsen - Tue, 03 Nov 2009, 13:49)=-=-
Sergey and Bo has been working on getting the patch ready, and Alexi has fixed some issues with the
patch.
Worked 15 hours and estimate 0 hours remain (original estimate increased by 15 hours).
-=-=(Bothorsen - Tue, 03 Nov 2009, 13:47)=-=-
Alexi has implemented a patch for this item.
Worked 30 hours and estimate 0 hours remain (original estimate increased by 30 hours).
-=-=(Guest - Tue, 15 Sep 2009, 18:04)=-=-
Low Level Design modified.
--- /tmp/wklog.36.old.19322 2009-09-15 18:04:49.000000000 +0300
+++ /tmp/wklog.36.new.19322 2009-09-15 18:04:49.000000000 +0300
@@ -191,7 +191,7 @@
- In process_event() function add switch case for Load_log_event and
add print_use_stmt() invocations where needed (according to the
- events lis above), e.g.:
+ events list above), e.g.:
Exit_status process_event(
PRINT_EVENT_INFO *print_event_info,
-=-=(Guest - Tue, 15 Sep 2009, 15:53)=-=-
Low Level Design modified.
--- /tmp/wklog.36.old.13421 2009-09-15 15:53:31.000000000 +0300
+++ /tmp/wklog.36.new.13421 2009-09-15 15:53:31.000000000 +0300
@@ -150,10 +150,17 @@
following events (see process_event() function):
- Query_log_event
-- Execute_load_query_log_event
-- Create_file_log_event
-
-TODO. Needed to check this list requires carefully !!!
+- Load_log_event
+- Execute_load_query_log_event [ :public Query_log_event ]
+- Create_file_log_event [ :public Load_log_event ]
+
+TODO. Needed to check this list carefully (not sure for Create_file_log_event)
+ Notes.
+ - In replication, only Query_log_event and Load_log_event uses
+ rpl_filter->get_rewrite_db();
+ - In mysqlbinlog (process_event), Execute_load_query_log_event
+ and Create_file_log_event are processed in separate switch
+ cases. And Load_log_event is processed in the default switch case.
Conditions for emiting use-statement:
- LOG_EVENT_SUPPRESS_USE_F is OFF for the event
@@ -182,8 +189,9 @@
*/
}
-- In process_event() function add print_use_stmt() invocations where
- needed (according to the events lis above), e.g.:
+- In process_event() function add switch case for Load_log_event and
+ add print_use_stmt() invocations where needed (according to the
+ events lis above), e.g.:
Exit_status process_event(
PRINT_EVENT_INFO *print_event_info,
@@ -207,6 +215,11 @@
}
break;
...
+ case LOAD_EVENT:
+ print_use_stmt((Load_log_event*)ev, print_event_info);
+ break;
+ default:
+ ...
}
...
}
-=-=(Guest - Tue, 15 Sep 2009, 12:12)=-=-
Low Level Design modified.
--- /tmp/wklog.36.old.3961 2009-09-15 12:12:26.000000000 +0300
+++ /tmp/wklog.36.new.3961 2009-09-15 12:12:26.000000000 +0300
@@ -144,6 +144,8 @@
3. Supporting rewrite-db for SBR events
---------------------------------------
+Limited to emiting USE <db_to> instead of USE <db_from>.
+
USE statements can be emited by mysqlbinlog as a result of processing the
following events (see process_event() function):
------------------------------------------------------------
-=-=(View All Progress Notes, 20 total)=-=-
http://askmonty.org/worklog/index.pl?tid=36&nolimit=1
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on binlog producer.
If one is using statement-based replication, he can achieve this by grepping
out "USE dbname" statements out of the output of mysqlbinlog(*). With
row-based replication this is no longer possible, as database name is encoded
within the the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow to change
the names of used databases in both RBR and SBR events.
(*) this implies that all statements refer to tables in the current database,
doesn't catch updates made inside stored functions and so forth, but still
works for a practially-important subset of cases.
HIGH-LEVEL SPECIFICATION:
Context
-------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
the option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
statement refers to tables in current database, so that changing the current
database will make the statement to work on a table in a different database).
See also MySQL BUG#42941. Note this bug is fixed in MySQL 5.1.37, which is not
merged into MariaDB at the time of writing, but planned to be merged before
release.
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as replication slave would process --replicate-rewrite-db option.
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a databasename. It is not possible to have RBR
event stream that won't mention which database the events are for. When I
tried to use debugger and specify empty database name, attempt to apply the
binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret empty database name in RBR event (i.e. in a
Table_map_log_event) as "use current database". Binlog slave thread
probably should not allow such events as it doesn't have a natural current
database.
- Add a mysqlbinlog --strip-db option that would
= not produce any "USE dbname" statements
= change databasename for all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(this will have the usual limitations that we assume that all statements in
the binlog refer to the current database).
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize special form of comments
/* !database-alias(oldname,newname) */
and save the mapping somewhere
- Put the hooks in table open and name resolution code to use the saved
mapping.
Once we've done the above, it will be easy to perform a complete,
no-compromise or restrictions database name change in binary log.
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
parse the statement).
LOW-LEVEL DESIGN:
Content
-------
1. Adding rewrite-db option
2. Supporting rewrite-db option for RBR events
3. Supporting rewrite-db option for SBR events
(Limited to affecting only USE statements)
4. Current status
1. Adding rewrite-db option
---------------------------
1.1. Syntax:
--rewrite-db='db_from->db_to'
1.2. Add 'OPT_REWRITE_DB' to 'options_client' (in client_priv.h).
1.3. In mysqlbinlog.cc:
- Add { "rewrite-db", OPT_REWRITE_DB, ...} record to my_long_options:
- Add Rpl_filter object to mysqlbinlog.cc
Rpl_filter* binlog_filter;
- Add corresponding switch case to get_one_option():
case OPT_REWRITE_DB:
<extract db-from and db-to strings>
binlog_filter->add_db_rewrite(db_from, db_to);
break;
.
Note. To make Rpl_filter usable in a MYSQL_CLIENT context, few small
additional changes are required:
- In sql_list.cc/h, Sql_alloc::new(size_t) and Sql_alloc::new[](size_t)
uses sql_alloc() which is THD dependent. These are to be modified
as follows:
#ifdef MYSQL_CLIENT
extern MEM_ROOT sql_list_client_mem_root; // defined in sql_list.cc
#endif
class Sql_alloc
{ ...
static void *operator new(size_t size) throw ()
{
#ifndef MYSQL_CLIENT
return sql_alloc(size);
#else
return alloc_root(&sql_list_client_mem_root, size);
#endif
}
static void *operator new[](size_t size) throw ()
{
#ifndef MYSQL_CLIENT
return sql_alloc(size);
#else
return alloc_root(&sql_list_client_mem_root, size);
#endif
}
...
}
- In rpl_filter.cc:
Rpl_filter::Rpl_filter() :
...
{
#ifdef MYSQL_CLIENT
init_alloc_root(&sql_list_client_mem_root, ...);
#endif
...
}
Rpl_filter::~Rpl_filter()
{ ...
#ifdef MYSQL_CLIENT
free_root(&sql_list_client_mem_root, ...);
#endif
}
2. Supporting rewrite-db for RBR events
---------------------------------------
In binlog, each row operation event is preceded by Table map event(s) which maps
table id(s) to database and table names. So, it's enough to support rewriting
database name in a Table map.
2.1. Add rewrite_db() member to Table_map_log_event:
int Table_map_log_event::rewrite_db(
const char* new_db,
size_t new_db_len,
const Format_description_log_event* desc)
{
/* 1. In temp_buf member (possibly reallocating it) rewrite
event length, db length, and db parts
2. Change m_dblen and m_dbnam members
*/
}
Comment. This function assumes that temp_buf member contains Table map
binlog representaion (temp_buf is used for creating corresponding
BINLOG statement).
2.2. In mysqlbinlog modify corresponding switch case in the
process_event() function:
Exit_status process_event(
PRINT_EVENT_INFO *print_event_info,
Log_event *ev, ...)
{
...
switch (ev_type) {
...
case TABLE_MAP_EVENT:
{
Table_map_log_event *map= ((Table_map_log_event *)ev);
if (shall_skip_database(map->get_db_name()))
{ ...
}
// WL36
size_t new_len= 0;
const char* new_db= binlog_filter->get_rewrite_db(
map->get_db_name(), &new_len);
if (new_len && map->rewrite_db(new_db, new_len,
glob_description_event))
{ error("Could not rewrite database name");
goto err;
}
}
case WRITE_ROWS_EVENT:
case DELETE_ROWS_EVENT:
case UPDATE_ROWS_EVENT:
...
}
...
}
Comment. Rpl_filter::get_rewrite_db(db_from, &len): if filter contains
a (db_from, db_to) pair, this function returns pointer to db_to and
sets len = db_to length; otherwise, it returns db_from and does not
change len value.
3. Supporting rewrite-db for SBR events
---------------------------------------
Limited to emiting USE <db_to> instead of USE <db_from>.
USE statements can be emited by mysqlbinlog as a result of processing the
following events (see process_event() function):
- Query_log_event
- Load_log_event
- Execute_load_query_log_event [ :public Query_log_event ]
- Create_file_log_event [ :public Load_log_event ]
TODO. Needed to check this list carefully (not sure for Create_file_log_event)
Notes.
- In replication, only Query_log_event and Load_log_event uses
rpl_filter->get_rewrite_db();
- In mysqlbinlog (process_event), Execute_load_query_log_event
and Create_file_log_event are processed in separate switch
cases. And Load_log_event is processed in the default switch case.
Conditions for emiting use-statement:
- LOG_EVENT_SUPPRESS_USE_F is OFF for the event
(e.g. it is ON for 'create database' statement)
- event's db name differs from db_name in PRINT_EVENT_INFO
(PRINT_EVENT_INFO keeps db name of the last issued USE statement;
initially, this db name is empty).
3.1. In mysqlbinlog.cc
- Add the following function:
void print_use_stmt(Log_event* event, PRINT_EVENT_INFO* pinfo)
{
if (event->flags & LOG_EVENT_SUPPRESS_USE_F)
return;
/*
- For events listed above get db_from = event->db;
- If db_from is the same as pinfo->db then return;
- If there is rewrite-db rule db_from->db_to,
set db = db_to. Else set db = db_from;
- Print "use <db>" to mysqlbinlog output
- Set pinfo->db = db_from
(this suppresses emiting use-statements by corresponding
log_event's print-function)
*/
}
- In process_event() function add switch case for Load_log_event and
add print_use_stmt() invocations where needed (according to the
events list above), e.g.:
Exit_status process_event(
PRINT_EVENT_INFO *print_event_info,
Log_event *ev, ...)
{
...
switch (ev_type) {
case QUERY_EVENT:
if (shall_skip_database(((Query_log_event*)ev)->db))
goto end;
if (opt_base64_output_mode == BASE64_OUTPUT_ALWAYS)
{
// Possibly in case of rewite-db rule for ev->db
// a warning should be emited here (see note below)
... write_event_header_and_base64(ev, ...) ...
}
else
{
print_use_stmt((Query_log_event*)ev, print_event_info);
ev->print(result_file, print_event_info);
}
break;
...
case LOAD_EVENT:
print_use_stmt((Load_log_event*)ev, print_event_info);
break;
default:
...
}
...
}
Note. write_event_header_and_base64() does not print use-statement. It
produces BINLOG statement using ev->temp_buf content (i.e. the binary
log representation of the event). We don't rewrite temp_buf here with
db_to name (as we do it for Table map event) - this implies the
limitation 3 mentioned above.
Question: Is supporting of rewite_db + --base64-output really needed
currently?
4. Current status
-----------------
The outlined design (implemented for mysql-5.1.37) is tested for
simple test-cases.
TODO. 1. Check list of events which can emit use-statement.
2. Supporting of rewite_db + --base64-output ?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0