developers
Threads by month
- ----- 2025 -----
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 3 participants
- 6829 discussions
![](https://secure.gravatar.com/avatar/b38aed4ff9d39949c1acd04e43a0c640.jpg?s=120&d=mm&r=g)
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 07 Jun '10
by worklog-noreply@askmonty.org 07 Jun '10
07 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 41
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
-=-=(Knielsen - Mon, 29 Mar 2010, 10:59)=-=-
Status updated.
--- /tmp/wklog.47.old.27790 2010-03-29 10:59:53.000000000 +0000
+++ /tmp/wklog.47.new.27790 2010-03-29 10:59:53.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Alexi - Thu, 18 Feb 2010, 19:29)=-=-
Worked 20 hours (alexi)
Worked 20 hours and estimate 15 hours remain (original estimate unchanged).
------------------------------------------------------------
-=-=(View All Progress Notes, 33 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
![](https://secure.gravatar.com/avatar/b38aed4ff9d39949c1acd04e43a0c640.jpg?s=120&d=mm&r=g)
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 07 Jun '10
by worklog-noreply@askmonty.org 07 Jun '10
07 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 41
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
-=-=(Knielsen - Mon, 29 Mar 2010, 10:59)=-=-
Status updated.
--- /tmp/wklog.47.old.27790 2010-03-29 10:59:53.000000000 +0000
+++ /tmp/wklog.47.new.27790 2010-03-29 10:59:53.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Alexi - Thu, 18 Feb 2010, 19:29)=-=-
Worked 20 hours (alexi)
Worked 20 hours and estimate 15 hours remain (original estimate unchanged).
------------------------------------------------------------
-=-=(View All Progress Notes, 33 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
![](https://secure.gravatar.com/avatar/b38aed4ff9d39949c1acd04e43a0c640.jpg?s=120&d=mm&r=g)
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 07 Jun '10
by worklog-noreply@askmonty.org 07 Jun '10
07 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 41
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
-=-=(Knielsen - Mon, 29 Mar 2010, 10:59)=-=-
Status updated.
--- /tmp/wklog.47.old.27790 2010-03-29 10:59:53.000000000 +0000
+++ /tmp/wklog.47.new.27790 2010-03-29 10:59:53.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Alexi - Thu, 18 Feb 2010, 19:29)=-=-
Worked 20 hours (alexi)
Worked 20 hours and estimate 15 hours remain (original estimate unchanged).
------------------------------------------------------------
-=-=(View All Progress Notes, 33 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
![](https://secure.gravatar.com/avatar/b38aed4ff9d39949c1acd04e43a0c640.jpg?s=120&d=mm&r=g)
[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 07 Jun '10
by worklog-noreply@askmonty.org 07 Jun '10
07 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 41
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Mon, 07 Jun 2010, 07:13)=-=-
Help debug some test failures seen in Buildbot.
Worked 6 hours and estimate 0 hours remain (original estimate increased by 6 hours).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Help Alexi debug+fix some test problems in the patch.
Worked 4 hours and estimate 0 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
-=-=(Knielsen - Mon, 29 Mar 2010, 10:59)=-=-
Status updated.
--- /tmp/wklog.47.old.27790 2010-03-29 10:59:53.000000000 +0000
+++ /tmp/wklog.47.new.27790 2010-03-29 10:59:53.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Alexi - Thu, 18 Feb 2010, 19:29)=-=-
Worked 20 hours (alexi)
Worked 20 hours and estimate 15 hours remain (original estimate unchanged).
------------------------------------------------------------
-=-=(View All Progress Notes, 33 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In binary log, Annotate_rows event follows the (possible) 'BEGIN' Query event
and precedes the first of Table map events which accompany the corresponding
rows events. (See example in the "mysqlbinlog output" section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. Session values allows to annotate only some selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events recieved from the master
in its own binary log (sensible only in pair with log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not to make
the slave to reproduce this events when its binlog-annotate-rows-events
global value is ON? Well, because, for example, we may want to configure
the slave which should reproduce Annotate_rows events but has global
binlog-annotate-rows-events = OFF meaning this to be the default value for
the client threads (see also "How slave treats replicate-annotate-rows-events
option" in LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log does contain them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1
exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM
t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1`
mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2`
mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17
flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event numbers conflict with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
INCIDENT_EVENT= 26,
// New MySQL event numbers are to be added here
MYSQL_EVENTS_END,
MARIA_EVENTS_BEGIN= 51,
// New Maria event numbers start from here
ANNOTATE_ROWS_EVENT= 51,
ENUM_END_EVENT
};
together with the corresponding extension of 'post_header_len' array in the
Format description event. (This extension does not affect the compatibility
of the binary log). Here is how Format description event looks like with
this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
m_query_txt = thd->query();
m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is allocated
in Log_event object from which all Log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
Format_description_log_event *desc)
{
m_query_len = event_len - desc->common_header_len;
m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member
which calls virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonym in replication terminology).
In our case, data header is empty and data body is just the query:
bool write_data_body(IO_CACHE *file)
{
return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
bool with_annotate= thd->variables.binlog_annotate_rows_events;
...
for (uint i= 0; i < ... <number of tables> ...; ++i)
{ ...
thd->binlog_write_table_map(table, ..., with_annotate);
with_annotate= 0; // write Annotate_event not more than once
...
}
...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
Table_map_log_event the_event(...);
...
if (with_annotate)
{
Annotate_rows_log_event anno(this);
mysql_bin_log.write(&anno);
}
mysql_bin_log.write(&the_event);
...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_IO);
...
}
pthread_handler_t handle_slave_sql(void *arg)
{
THD *thd= new THD;
...
init_slave_thread(thd, SLAVE_THD_SQL);
...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
thd->variables.binlog_annotate_rows_events=
opt_replicate_annotate_rows_events;
...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for master to send Annotate_rows events to this slave. The slave
(or mysqlbinlog in remote case), before requesting binlog dump via the
COM_BINLOG_DUMP command, informs the master whether it should send these
events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
thd->binlog_dump_flags_ext= packet[0];
my_ok(thd);
break;
Note. We add this new command and don't use COM_BINLOG_DUMP to avoid possible
conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
flags= uint2korr(packet + 4);
...
mysql_binlog_send(thd, ..., flags);
...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
Log_event::read_log_event(&log, packet, ...);
...
if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
{
my_net_write(net, packet->ptr(), packet->length());
}
...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each recieved event by "applying" it, i.e. by
calling the Log_event::apply_event() function which in turn calls
the virtual do_apply_event() member specific for each type of the
event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev = next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
ev->apply_event(...);
...
}
int Log_event::apply_event(...)
{
return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means to set current
thd query to that of the described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly current values of thd->query and
thd->query_length should be saved before calling set_query() and to be
restored on the Annotate_rows_log_event object deletion.
Is it really needed ?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediatedly (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
...
void set_annotate_event(Annotate_rows_log_event*);
Annotate_rows_log_event* get_annotate_event();
void free_annotate_event();
...
private:
Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted when all corresponding
Rows events will be processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
Log_event *ev= next_event(rli);
...
apply_event_and_update_pos(ev, ...);
if (rli->get_annotate_event() && is_last_rows_event(ev))
rli->free_annotate_event();
else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
rli->set_annotate_event((Annotate_rows_log_event*) ev);
else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
delete ev;
...
}
where
bool is_last_rows_event(Log_event* ev)
{
Log_event_type type= ev->get_type_code();
if (IS_ROWS_EVENT_TYPE(type))
{
Rows_log_event* rows= (Rows_log_event*)ev;
return rows->get_flags(Rows_log_event::STMT_END_F);
}
return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
(type) == UPDATE_ROWS_EVENT || \
(type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers not to have possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
![](https://secure.gravatar.com/avatar/b38aed4ff9d39949c1acd04e43a0c640.jpg?s=120&d=mm&r=g)
[Maria-developers] Progress (by Knielsen): Add Sphinx storage engine to MariaDB (42)
by worklog-noreply@askmonty.org 07 Jun '10
by worklog-noreply@askmonty.org 07 Jun '10
07 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add Sphinx storage engine to MariaDB
CREATION DATE..: Mon, 10 Aug 2009, 23:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 42 (http://askmonty.org/worklog/?tid=42)
VERSION........: Server-5.2
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 9
ESTIMATE.......: 7 (hours remain)
ORIG. ESTIMATE.: 16
PROGRESS NOTES:
-=-=(Knielsen - Mon, 07 Jun 2010, 07:12)=-=-
Help Andrew with the integration.
Worked 2 hours and estimate 7 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Wrote patch that allows to test SphinxSE in mysql-test-run, using external Sphinx daemon.
Worked 7 hours and estimate 9 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 28 May 2010, 07:49)=-=-
High-Level Specification modified.
--- /tmp/wklog.42.old.4369 2010-05-28 07:49:13.000000000 +0000
+++ /tmp/wklog.42.new.4369 2010-05-28 07:49:13.000000000 +0000
@@ -49,6 +49,10 @@
might be possible to pre-generate the necessary data/index files and store
them in the source tree.
+I pushed a proof-of-concept patch for this here:
+
+ lp:~knielsen/maria/5.2-sphinxse
+
Here is a sample test case using this:
--source include/have_sphinx.inc
-=-=(Knielsen - Fri, 28 May 2010, 06:31)=-=-
High-Level Specification modified.
--- /tmp/wklog.42.old.32746 2010-05-28 06:31:24.000000000 +0000
+++ /tmp/wklog.42.new.32746 2010-05-28 06:31:24.000000000 +0000
@@ -1 +1,63 @@
+Code
+----
+
+Andrew Aksyonoff from Sphinx is helping to integrate the SphinxSE plugin into
+the MariaDB tree.
+
+It is a plugin, so it can be added to the tree just by including the
+sub-directory storage/sphinx/.
+
+The Sphinx plugin is already of some maturity, having been used with MySQL for
+some time.
+
+
+Testing
+-------
+
+To get testing in the mysql-test-run framework, some extensions are needed.
+
+To use the Sphinx storage engine, the external Sphinx search daemon needs to
+be running with some data directory containing indexed data. It also needs to
+be allocated a port.
+
+This is the indended approach:
+
+1. Testing will use an external Sphinx setup installed on the machine. Sphinx
+binaries will be searched in typical locations (eg. /usr/bin, /usr/local/bin),
+or can be specified explicitly in the environment with SPHINXSEARCH_INDEXER
+and SPHINXSEARCH_SEARCHD for the two required binaries. If the external Sphinx
+binaries can not be found, then Sphinx tests will be disabled (using some
+--source include/have_sphinx.inc in the test cases).
+
+2. The mysql-test-run framework will install Sphinx search data and start/stop
+the Sphinx search daemon for the test cases, similarly how it is done for the
+other servers mysqld, ndbd, etc. We will run the Sphinx search daemon with
+options --console, --config, and --pidfile.
+
+3. The mysql-test-run framework will generate a Sphinx config file from a
+template in mysql-test/suite/sphinx/my.cnf. This config file will allocate
+ports and data directories appropriate for avoiding conflicts between multiple
+simultaneous mysql-test-run executions. The Sphinx config file is sufficiently
+similar to MySQL my.cnf that we can use the existing framework for generating
+config file, with just a slightly modified variant of the code writing the
+file to disk.
+
+4. The mysql-test-run framework will pre-load the mysql database with tables
+and data for Sphinx to index. It will then run the `indexer` program to
+generate the indexes, and then start the `searchd` daemon. These three steps
+must be done in order, as each step depends on the previous. ALTERNATIVE: it
+might be possible to pre-generate the necessary data/index files and store
+them in the source tree.
+
+Here is a sample test case using this:
+
+--source include/have_sphinx.inc
+--source include/have_sphinxse.inc
+
+--replace_result $SPHINXSEARCH_PORT SPHINXSEARCH_PORT
+eval create table ts ( id int unsigned not null, w int not null, q varchar(255)
+not null, index(q) ) engine=sphinx
+connection="sphinx://127.0.0.1:$SPHINXSEARCH_PORT/*";
+select * from ts where q='test';
+drop table ts;
-=-=(Knielsen - Fri, 28 May 2010, 06:07)=-=-
Version updated.
--- /tmp/wklog.42.old.32184 2010-05-28 06:07:00.000000000 +0000
+++ /tmp/wklog.42.new.32184 2010-05-28 06:07:00.000000000 +0000
@@ -1 +1 @@
-9.x
+Server-5.2
-=-=(Knielsen - Fri, 28 May 2010, 06:06)=-=-
Category updated.
--- /tmp/wklog.42.old.32171 2010-05-28 06:06:23.000000000 +0000
+++ /tmp/wklog.42.new.32171 2010-05-28 06:06:23.000000000 +0000
@@ -1 +1 @@
-Maria-BackLog
+Server-Sprint
-=-=(Knielsen - Fri, 28 May 2010, 06:06)=-=-
Version updated.
--- /tmp/wklog.42.old.32171 2010-05-28 06:06:23.000000000 +0000
+++ /tmp/wklog.42.new.32171 2010-05-28 06:06:23.000000000 +0000
@@ -1 +1 @@
-Maria-2.0
+9.x
-=-=(Knielsen - Fri, 28 May 2010, 06:06)=-=-
Status updated.
--- /tmp/wklog.42.old.32171 2010-05-28 06:06:23.000000000 +0000
+++ /tmp/wklog.42.new.32171 2010-05-28 06:06:23.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Guest - Tue, 15 Sep 2009, 02:25)=-=-
no
Reported zero hours worked. Estimate unchanged.
-=-=(Guest - Tue, 15 Sep 2009, 02:24)=-=-
Version updated.
--- /tmp/wklog.42.old.13241 2009-09-15 02:24:07.000000000 +0300
+++ /tmp/wklog.42.new.13241 2009-09-15 02:24:07.000000000 +0300
@@ -1 +1 @@
-Connector/.NET-5.1
+Maria-2.0
DESCRIPTION:
Add the Sphinx storage engine to the MariaDB tree
HIGH-LEVEL SPECIFICATION:
Code
----
Andrew Aksyonoff from Sphinx is helping to integrate the SphinxSE plugin into
the MariaDB tree.
It is a plugin, so it can be added to the tree just by including the
sub-directory storage/sphinx/.
The Sphinx plugin is already of some maturity, having been used with MySQL for
some time.
Testing
-------
To get testing in the mysql-test-run framework, some extensions are needed.
To use the Sphinx storage engine, the external Sphinx search daemon needs to
be running with some data directory containing indexed data. It also needs to
be allocated a port.
This is the indended approach:
1. Testing will use an external Sphinx setup installed on the machine. Sphinx
binaries will be searched in typical locations (eg. /usr/bin, /usr/local/bin),
or can be specified explicitly in the environment with SPHINXSEARCH_INDEXER
and SPHINXSEARCH_SEARCHD for the two required binaries. If the external Sphinx
binaries can not be found, then Sphinx tests will be disabled (using some
--source include/have_sphinx.inc in the test cases).
2. The mysql-test-run framework will install Sphinx search data and start/stop
the Sphinx search daemon for the test cases, similarly how it is done for the
other servers mysqld, ndbd, etc. We will run the Sphinx search daemon with
options --console, --config, and --pidfile.
3. The mysql-test-run framework will generate a Sphinx config file from a
template in mysql-test/suite/sphinx/my.cnf. This config file will allocate
ports and data directories appropriate for avoiding conflicts between multiple
simultaneous mysql-test-run executions. The Sphinx config file is sufficiently
similar to MySQL my.cnf that we can use the existing framework for generating
config file, with just a slightly modified variant of the code writing the
file to disk.
4. The mysql-test-run framework will pre-load the mysql database with tables
and data for Sphinx to index. It will then run the `indexer` program to
generate the indexes, and then start the `searchd` daemon. These three steps
must be done in order, as each step depends on the previous. ALTERNATIVE: it
might be possible to pre-generate the necessary data/index files and store
them in the source tree.
I pushed a proof-of-concept patch for this here:
lp:~knielsen/maria/5.2-sphinxse
Here is a sample test case using this:
--source include/have_sphinx.inc
--source include/have_sphinxse.inc
--replace_result $SPHINXSEARCH_PORT SPHINXSEARCH_PORT
eval create table ts ( id int unsigned not null, w int not null, q varchar(255)
not null, index(q) ) engine=sphinx
connection="sphinx://127.0.0.1:$SPHINXSEARCH_PORT/*";
select * from ts where q='test';
drop table ts;
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
![](https://secure.gravatar.com/avatar/b38aed4ff9d39949c1acd04e43a0c640.jpg?s=120&d=mm&r=g)
[Maria-developers] Progress (by Knielsen): Add Sphinx storage engine to MariaDB (42)
by worklog-noreply@askmonty.org 07 Jun '10
by worklog-noreply@askmonty.org 07 Jun '10
07 Jun '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add Sphinx storage engine to MariaDB
CREATION DATE..: Mon, 10 Aug 2009, 23:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 42 (http://askmonty.org/worklog/?tid=42)
VERSION........: Server-5.2
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 9
ESTIMATE.......: 7 (hours remain)
ORIG. ESTIMATE.: 16
PROGRESS NOTES:
-=-=(Knielsen - Mon, 07 Jun 2010, 07:12)=-=-
Help Andrew with the integration.
Worked 2 hours and estimate 7 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 31 May 2010, 06:49)=-=-
Wrote patch that allows to test SphinxSE in mysql-test-run, using external Sphinx daemon.
Worked 7 hours and estimate 9 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 28 May 2010, 07:49)=-=-
High-Level Specification modified.
--- /tmp/wklog.42.old.4369 2010-05-28 07:49:13.000000000 +0000
+++ /tmp/wklog.42.new.4369 2010-05-28 07:49:13.000000000 +0000
@@ -49,6 +49,10 @@
might be possible to pre-generate the necessary data/index files and store
them in the source tree.
+I pushed a proof-of-concept patch for this here:
+
+ lp:~knielsen/maria/5.2-sphinxse
+
Here is a sample test case using this:
--source include/have_sphinx.inc
-=-=(Knielsen - Fri, 28 May 2010, 06:31)=-=-
High-Level Specification modified.
--- /tmp/wklog.42.old.32746 2010-05-28 06:31:24.000000000 +0000
+++ /tmp/wklog.42.new.32746 2010-05-28 06:31:24.000000000 +0000
@@ -1 +1,63 @@
+Code
+----
+
+Andrew Aksyonoff from Sphinx is helping to integrate the SphinxSE plugin into
+the MariaDB tree.
+
+It is a plugin, so it can be added to the tree just by including the
+sub-directory storage/sphinx/.
+
+The Sphinx plugin is already of some maturity, having been used with MySQL for
+some time.
+
+
+Testing
+-------
+
+To get testing in the mysql-test-run framework, some extensions are needed.
+
+To use the Sphinx storage engine, the external Sphinx search daemon needs to
+be running with some data directory containing indexed data. It also needs to
+be allocated a port.
+
+This is the indended approach:
+
+1. Testing will use an external Sphinx setup installed on the machine. Sphinx
+binaries will be searched in typical locations (eg. /usr/bin, /usr/local/bin),
+or can be specified explicitly in the environment with SPHINXSEARCH_INDEXER
+and SPHINXSEARCH_SEARCHD for the two required binaries. If the external Sphinx
+binaries can not be found, then Sphinx tests will be disabled (using some
+--source include/have_sphinx.inc in the test cases).
+
+2. The mysql-test-run framework will install Sphinx search data and start/stop
+the Sphinx search daemon for the test cases, similarly how it is done for the
+other servers mysqld, ndbd, etc. We will run the Sphinx search daemon with
+options --console, --config, and --pidfile.
+
+3. The mysql-test-run framework will generate a Sphinx config file from a
+template in mysql-test/suite/sphinx/my.cnf. This config file will allocate
+ports and data directories appropriate for avoiding conflicts between multiple
+simultaneous mysql-test-run executions. The Sphinx config file is sufficiently
+similar to MySQL my.cnf that we can use the existing framework for generating
+config file, with just a slightly modified variant of the code writing the
+file to disk.
+
+4. The mysql-test-run framework will pre-load the mysql database with tables
+and data for Sphinx to index. It will then run the `indexer` program to
+generate the indexes, and then start the `searchd` daemon. These three steps
+must be done in order, as each step depends on the previous. ALTERNATIVE: it
+might be possible to pre-generate the necessary data/index files and store
+them in the source tree.
+
+Here is a sample test case using this:
+
+--source include/have_sphinx.inc
+--source include/have_sphinxse.inc
+
+--replace_result $SPHINXSEARCH_PORT SPHINXSEARCH_PORT
+eval create table ts ( id int unsigned not null, w int not null, q varchar(255)
+not null, index(q) ) engine=sphinx
+connection="sphinx://127.0.0.1:$SPHINXSEARCH_PORT/*";
+select * from ts where q='test';
+drop table ts;
-=-=(Knielsen - Fri, 28 May 2010, 06:07)=-=-
Version updated.
--- /tmp/wklog.42.old.32184 2010-05-28 06:07:00.000000000 +0000
+++ /tmp/wklog.42.new.32184 2010-05-28 06:07:00.000000000 +0000
@@ -1 +1 @@
-9.x
+Server-5.2
-=-=(Knielsen - Fri, 28 May 2010, 06:06)=-=-
Category updated.
--- /tmp/wklog.42.old.32171 2010-05-28 06:06:23.000000000 +0000
+++ /tmp/wklog.42.new.32171 2010-05-28 06:06:23.000000000 +0000
@@ -1 +1 @@
-Maria-BackLog
+Server-Sprint
-=-=(Knielsen - Fri, 28 May 2010, 06:06)=-=-
Version updated.
--- /tmp/wklog.42.old.32171 2010-05-28 06:06:23.000000000 +0000
+++ /tmp/wklog.42.new.32171 2010-05-28 06:06:23.000000000 +0000
@@ -1 +1 @@
-Maria-2.0
+9.x
-=-=(Knielsen - Fri, 28 May 2010, 06:06)=-=-
Status updated.
--- /tmp/wklog.42.old.32171 2010-05-28 06:06:23.000000000 +0000
+++ /tmp/wklog.42.new.32171 2010-05-28 06:06:23.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Guest - Tue, 15 Sep 2009, 02:25)=-=-
no
Reported zero hours worked. Estimate unchanged.
-=-=(Guest - Tue, 15 Sep 2009, 02:24)=-=-
Version updated.
--- /tmp/wklog.42.old.13241 2009-09-15 02:24:07.000000000 +0300
+++ /tmp/wklog.42.new.13241 2009-09-15 02:24:07.000000000 +0300
@@ -1 +1 @@
-Connector/.NET-5.1
+Maria-2.0
DESCRIPTION:
Add the Sphinx storage engine to the MariaDB tree
HIGH-LEVEL SPECIFICATION:
Code
----
Andrew Aksyonoff from Sphinx is helping to integrate the SphinxSE plugin into
the MariaDB tree.
It is a plugin, so it can be added to the tree just by including the
sub-directory storage/sphinx/.
The Sphinx plugin is already of some maturity, having been used with MySQL for
some time.
Testing
-------
To get testing in the mysql-test-run framework, some extensions are needed.
To use the Sphinx storage engine, the external Sphinx search daemon needs to
be running with some data directory containing indexed data. It also needs to
be allocated a port.
This is the indended approach:
1. Testing will use an external Sphinx setup installed on the machine. Sphinx
binaries will be searched in typical locations (eg. /usr/bin, /usr/local/bin),
or can be specified explicitly in the environment with SPHINXSEARCH_INDEXER
and SPHINXSEARCH_SEARCHD for the two required binaries. If the external Sphinx
binaries can not be found, then Sphinx tests will be disabled (using some
--source include/have_sphinx.inc in the test cases).
2. The mysql-test-run framework will install Sphinx search data and start/stop
the Sphinx search daemon for the test cases, similarly how it is done for the
other servers mysqld, ndbd, etc. We will run the Sphinx search daemon with
options --console, --config, and --pidfile.
3. The mysql-test-run framework will generate a Sphinx config file from a
template in mysql-test/suite/sphinx/my.cnf. This config file will allocate
ports and data directories appropriate for avoiding conflicts between multiple
simultaneous mysql-test-run executions. The Sphinx config file is sufficiently
similar to MySQL my.cnf that we can use the existing framework for generating
config file, with just a slightly modified variant of the code writing the
file to disk.
4. The mysql-test-run framework will pre-load the mysql database with tables
and data for Sphinx to index. It will then run the `indexer` program to
generate the indexes, and then start the `searchd` daemon. These three steps
must be done in order, as each step depends on the previous. ALTERNATIVE: it
might be possible to pre-generate the necessary data/index files and store
them in the source tree.
I pushed a proof-of-concept patch for this here:
lp:~knielsen/maria/5.2-sphinxse
Here is a sample test case using this:
--source include/have_sphinx.inc
--source include/have_sphinxse.inc
--replace_result $SPHINXSEARCH_PORT SPHINXSEARCH_PORT
eval create table ts ( id int unsigned not null, w int not null, q varchar(255)
not null, index(q) ) engine=sphinx
connection="sphinx://127.0.0.1:$SPHINXSEARCH_PORT/*";
select * from ts where q='test';
drop table ts;
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
![](https://secure.gravatar.com/avatar/ac32c9d693e0b70eec04d9ed68b9b4d8.jpg?s=120&d=mm&r=g)
[Maria-developers] Rev 2795: Bugfixes in file:///home/bell/maria/bzr/work-maria-5.3-scache/
by sanja@askmonty.org 05 Jun '10
by sanja@askmonty.org 05 Jun '10
05 Jun '10
At file:///home/bell/maria/bzr/work-maria-5.3-scache/
------------------------------------------------------------
revno: 2795
revision-id: sanja(a)askmonty.org-20100605195727-7rrc5k75lr0a4o9z
parent: sanja(a)askmonty.org-20100527182744-1tu96cgyiaodzs32
committer: sanja(a)askmonty.org
branch nick: work-maria-5.3-scache
timestamp: Sat 2010-06-05 22:57:27 +0300
message:
Bugfixes
=== modified file 'mysql-test/r/myisam_mrr.result'
--- a/mysql-test/r/myisam_mrr.result 2010-03-11 21:43:31 +0000
+++ b/mysql-test/r/myisam_mrr.result 2010-06-05 19:57:27 +0000
@@ -394,7 +394,7 @@
# - engine_condition_pushdown does not affect ICP
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on
create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t1 (a int, b int, key(a));
=== modified file 'mysql-test/r/subquery_cache.result'
--- a/mysql-test/r/subquery_cache.result 2010-05-27 17:41:38 +0000
+++ b/mysql-test/r/subquery_cache.result 2010-06-05 19:57:27 +0000
@@ -588,4 +588,28 @@
Subquery_cache_hit 0
Subquery_cache_miss 4
drop table t1;
+#test of sql_big_tables switch and outer table reference in subquery with grouping
+set option sql_big_tables=1;
+CREATE TABLE t1 (a INT PRIMARY KEY, b INT);
+INSERT INTO t1 VALUES (1,1),(2,1),(3,2),(4,2),(5,3),(6,3);
+SELECT (SELECT t1_outer.a FROM t1 AS t1_inner GROUP BY b LIMIT 1) FROM t1 AS t1_outer;
+(SELECT t1_outer.a FROM t1 AS t1_inner GROUP BY b LIMIT 1)
+1
+2
+3
+4
+5
+6
+drop table t1;
+set option sql_big_tables=0;
+#test of function reference to outer query
+set local group_concat_max_len=400;
+create table t2 (a int, b int);
+insert into t2 values (1,1), (2,2);
+select b x, (select group_concat(x) from t2) from t2;
+x (select group_concat(x) from t2)
+1 1,1
+2 2,2
+drop table t2;
+set local group_concat_max_len=default;
set optimizer_switch='subquery_cache=default';
=== modified file 'mysql-test/t/subquery_cache.test'
--- a/mysql-test/t/subquery_cache.test 2010-05-27 17:41:38 +0000
+++ b/mysql-test/t/subquery_cache.test 2010-06-05 19:57:27 +0000
@@ -201,4 +201,20 @@
show status like "subquery_cache%";
drop table t1;
+--echo #test of sql_big_tables switch and outer table reference in subquery with grouping
+set option sql_big_tables=1;
+CREATE TABLE t1 (a INT PRIMARY KEY, b INT);
+INSERT INTO t1 VALUES (1,1),(2,1),(3,2),(4,2),(5,3),(6,3);
+SELECT (SELECT t1_outer.a FROM t1 AS t1_inner GROUP BY b LIMIT 1) FROM t1 AS t1_outer;
+drop table t1;
+set option sql_big_tables=0;
+
+--echo #test of function reference to outer query
+set local group_concat_max_len=400;
+create table t2 (a int, b int);
+insert into t2 values (1,1), (2,2);
+select b x, (select group_concat(x) from t2) from t2;
+drop table t2;
+set local group_concat_max_len=default;
+
set optimizer_switch='subquery_cache=default';
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2010-05-27 17:41:38 +0000
+++ b/sql/item.cc 2010-06-05 19:57:27 +0000
@@ -5110,6 +5110,19 @@
}
+/**
+ Saves one Fields of an Item of in other Field
+
+ @param from Field to copy value from
+ @param null_value reference on item null_value to set it if it is needed
+ @param to Field to cope value to
+ @param no_conversions how to deal with NULL value (see
+ set_field_to_null_with_conversions())
+
+ @retval FALSE OK
+ @retval TRUE Error
+*/
+
static int save_field_in_field(Field *from, my_bool *null_value,
Field *to, bool no_conversions)
{
@@ -5139,6 +5152,10 @@
int Item_field::save_in_field(Field *to, bool no_conversions)
{
+ /* if it is external field */
+ if (unlikely(depended_from))
+ return save_field_in_field(field, &null_value, to, no_conversions);
+
return save_field_in_field(result_field, &null_value, to, no_conversions);
}
@@ -6346,7 +6363,7 @@
int Item_ref::save_in_field(Field *to, bool no_conversions)
{
int res;
- if (result_field)
+ if (result_field && !depended_from)
return save_field_in_field(result_field, &null_value, to, no_conversions);
res= (*ref)->save_in_field(to, no_conversions);
null_value= (*ref)->null_value;
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2010-05-25 18:29:14 +0000
+++ b/sql/item_subselect.cc 2010-06-05 19:57:27 +0000
@@ -1,4 +1,4 @@
-/* Copyrigh (C) 2000 MySQL AB
+/* Copyright (C) 2000 MySQL AB
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -818,6 +818,12 @@
exec();
}
+/**
+ Checks subquery cache for value
+
+ @retval NULL nothing found
+ @retval reference on item representing value found in the cache
+*/
Item *Item_subselect::check_cache()
{
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2010-05-24 17:29:56 +0000
+++ b/sql/item_subselect.h 2010-06-05 19:57:27 +0000
@@ -95,7 +95,10 @@
st_select_lex *parent_select;
/**
- List of items subquery depends on (externally resolved);
+ List of references on items subquery depends on (externally resolved);
+
+ @note We can't store direct links on Items because it could be
+ substituted with other item (for example for grouping).
*/
List<Item*> depends_on;
=== modified file 'sql/sql_subquery_cache.cc'
--- a/sql/sql_subquery_cache.cc 2010-05-27 18:27:44 +0000
+++ b/sql/sql_subquery_cache.cc 2010-06-05 19:57:27 +0000
@@ -96,6 +96,10 @@
/**
Creates equalities expression.
+ @note For some type of fields index lookup do not return failure but set
+ pointer on the next record. To check exact match we use expression like:
+ field1=value1 and field2=value2 ...
+
@retval FALSE OK
@retval TRUE Error
*/
@@ -111,6 +115,7 @@
for (uint i= 1 /* skip result filed */; (ref= li++); i++)
{
Field *fld= cache_table->field[i];
+ /* Only some field types should be checked after lookup */
if (fld->type() == MYSQL_TYPE_VARCHAR ||
fld->type() == MYSQL_TYPE_TINY_BLOB ||
fld->type() == MYSQL_TYPE_MEDIUM_BLOB ||
@@ -140,11 +145,22 @@
}
+/**
+ Enumerates all fields in field number order.
+
+ @param arg reference on current field number
+
+ @return field number
+*/
+
static uint field_enumerator(uchar *arg)
{
return ((uint*)arg)[0]++;
}
+/**
+ Initializes temporary table and index for this cache
+*/
void Subquery_cache_tmptable::init()
{
@@ -182,8 +198,10 @@
if (!(cache_table= create_tmp_table(table_thd, &cache_table_param,
items, (ORDER*) NULL,
FALSE, FALSE,
- (table_thd->options |
- TMP_TABLE_ALL_COLUMNS),
+ ((table_thd->options |
+ TMP_TABLE_ALL_COLUMNS) &
+ ~(OPTION_BIG_TABLES |
+ TMP_TABLE_FORCE_MYISAM)),
HA_POS_ERROR,
(char *)"subquery-cache-table")))
{
@@ -191,14 +209,16 @@
DBUG_VOID_RETURN;
}
- if (cache_table->s->blob_fields)
+ if (cache_table->s->db_type() != heap_hton)
{
- DBUG_PRINT("error", ("we do not need blobs"));
+ DBUG_PRINT("error", ("we need only heap table"));
goto error;
}
+ /* first field in the table is result value, so we skip it */
li_items++;
field_counter=1;
+
if (cache_table->alloc_keys(1) ||
(cache_table->add_tmp_key(0, items.elements - 1,
&field_enumerator,
@@ -224,6 +244,7 @@
DBUG_PRINT("error", ("Creating Item_field failed"));
goto error;
}
+
if (make_equalities())
{
DBUG_PRINT("error", ("Creating equalities failed"));
@@ -247,11 +268,26 @@
}
+/**
+ Checks if current key present in the cache and returns value if it is true
+
+ @param value assigned Item with value from the cache if key
+ is found
+ @return result of the key lookup
+*/
+
Subquery_cache::result Subquery_cache_tmptable::check_value(Item **value)
{
int res;
DBUG_ENTER("Subquery_cache_tmptable::check_value");
+ /*
+ We delay cache initialization to get item references which should be
+ used at the moment of query execution. I.e. we store reference on item
+ reference at the moment of class creation but for table creation and
+ index supply structures (join_tab) we need real Items which used at the
+ moment of execution so we can resolve reference only at this point.
+ */
if (!inited)
init();
@@ -275,6 +311,15 @@
}
+/**
+ Puts given value in the cache
+
+ @param value Value to put in the cache
+
+ @retval FALSE OK
+ @retval TRUE Error
+*/
+
my_bool Subquery_cache_tmptable::put_value(Item *value)
{
int error;
@@ -313,9 +358,3 @@
cache_table= NULL;
DBUG_RETURN(TRUE);
}
-
-
-void Subquery_cache_tmptable::cleanup()
-{
- cache_table->file->ha_delete_all_rows();
-}
=== modified file 'sql/sql_subquery_cache.h'
--- a/sql/sql_subquery_cache.h 2010-05-25 10:45:36 +0000
+++ b/sql/sql_subquery_cache.h 2010-06-05 19:57:27 +0000
@@ -23,10 +23,6 @@
Puts value into this cache (key should be taken from cache owner)
*/
virtual my_bool put_value(Item *value)= 0;
- /**
- Cleans up and reset cache before reusing
- */
- virtual void cleanup()= 0;
};
struct st_table_ref;
@@ -45,10 +41,9 @@
virtual ~Subquery_cache_tmptable();
virtual result check_value(Item **value);
virtual my_bool put_value(Item *value);
- virtual void cleanup();
+
+private:
void init();
-
-private:
bool make_equalities();
/* tmp table parameters */
=== modified file 'sql/table.cc'
--- a/sql/table.cc 2010-05-27 17:41:38 +0000
+++ b/sql/table.cc 2010-06-05 19:57:27 +0000
@@ -5187,10 +5187,16 @@
key_part_info->store_length= key_part_info->length;
if ((*reg_field)->real_maybe_null())
+ {
key_part_info->store_length+= HA_KEY_NULL_LENGTH;
+ keyinfo->key_length+= HA_KEY_NULL_LENGTH;
+ }
if ((*reg_field)->type() == MYSQL_TYPE_BLOB ||
(*reg_field)->real_type() == MYSQL_TYPE_VARCHAR)
+ {
key_part_info->store_length+= HA_KEY_BLOB_LENGTH;
+ keyinfo->key_length+= HA_KEY_BLOB_LENGTH; // ???
+ }
key_part_info->type= (uint8) (*reg_field)->key_type();
key_part_info->key_type =
1
0
>From dispatch_command() in sql_parse.cc net_end_statement() is called after
ha_autocommit_or_rollback() but before close_thread_tables(). What can go
wrong in the call to close_thread_tables() after the response to the client?
Commit or rollback was done before a response was sent to the client.
/* If commit fails, we should be able to reset the OK status. */
thd->main_da.can_overwrite_status= TRUE;
ha_autocommit_or_rollback(thd, thd->is_error());
thd->main_da.can_overwrite_status= FALSE;
thd->transaction.stmt.reset();
net_end_statement(thd);
query_cache_end_of_result(thd);
thd->proc_info= "closing tables";
/* Free tables */
close_thread_tables(thd);
--
Mark Callaghan
mdcallag(a)gmail.com
2
1
![](https://secure.gravatar.com/avatar/3c2329333dd5898f2e72818af773243d.jpg?s=120&d=mm&r=g)
Re: [Maria-developers] [Bug 314570] Re: update is not changing internal auto increment value
by Sergei Golubchik 05 Jun '10
by Sergei Golubchik 05 Jun '10
05 Jun '10
Hi, Michael!
On Jun 04, Michael Widenius wrote:
>
> hi!
>
> >>>>> "Sergei" == Sergei <sergii(a)pisem.net> writes:
>
> Sergei> ** Changed in: maria
> Sergei> Importance: Undecided => Low
>
> Sergei> --
> Sergei> update is not changing internal auto increment value
> Sergei> https://bugs.launchpad.net/bugs/314570
>
> Why low ?
>
> Looks like a serious issue that we should get Percona to fix at once!
Because Heikki said it's not a bug, but intentional InnoDB behavior.
I'm not sure we should fix it at all. Heikki is certainly not fixing
it.
Regards,
Sergei
2
1
![](https://secure.gravatar.com/avatar/4ae3ae2d37cd61bdf2ec02642a5100a7.jpg?s=120&d=mm&r=g)
04 Jun '10
All,
MariaDB 5.2.1 is getting closer to being released so I've started
filling out the Release Notes and Changelog pages:
http://askmonty.org/wiki/Manual:MariaDB_5.2.1_Release_Notes
http://askmonty.org/wiki/Manual:MariaDB_5.2.1_Changelog
On the documentation TODO list for this release is a page on the OQGraph
storage engine for the manual. Any volunteers? :) (I'll get to it next
week, but if someone wants to put something up right away I wouldn't
object.)
Thanks.
--
Daniel Bartholomew
Monty Program - http://askmonty.org
1
0