[Maria-developers] Progress (by Knielsen): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply@askmonty.org 25 May '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Knielsen, Serg
CATEGORY.......: Server-Sprint
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 31
ESTIMATE.......: 4 (hours remain)
ORIG. ESTIMATE.: 35
PROGRESS NOTES:
-=-=(Knielsen - Tue, 25 May 2010, 08:29)=-=-
Help debug strange problem in mysqlbinlog.test.
Worked 1 hour and estimate 4 hours remain (original estimate unchanged).
-=-=(Knielsen - Mon, 17 May 2010, 08:45)=-=-
Merge with latest trunk and run Buildbot tests.
Worked 1 hour and estimate 5 hours remain (original estimate unchanged).
-=-=(Knielsen - Wed, 05 May 2010, 13:53)=-=-
Review of fixes to first review done. No new issues found.
Worked 2 hours and estimate 6 hours remain (original estimate unchanged).
-=-=(Knielsen - Fri, 23 Apr 2010, 12:51)=-=-
Status updated.
--- /tmp/wklog.47.old.28747 2010-04-23 12:51:36.000000000 +0000
+++ /tmp/wklog.47.new.28747 2010-04-23 12:51:36.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Knielsen - Tue, 06 Apr 2010, 15:26)=-=-
Code review (mailed to maria-developers@).
Worked 7 hours and estimate 8 hours remain (original estimate unchanged).
-=-=(Knielsen - Tue, 06 Apr 2010, 15:25)=-=-
Status updated.
--- /tmp/wklog.47.old.12734 2010-04-06 15:25:54.000000000 +0000
+++ /tmp/wklog.47.new.12734 2010-04-06 15:25:54.000000000 +0000
@@ -1 +1 @@
-Code-Review
+In-Progress
-=-=(Knielsen - Mon, 29 Mar 2010, 10:59)=-=-
Status updated.
--- /tmp/wklog.47.old.27790 2010-03-29 10:59:53.000000000 +0000
+++ /tmp/wklog.47.new.27790 2010-03-29 10:59:53.000000000 +0000
@@ -1 +1 @@
-In-Progress
+Code-Review
-=-=(Alexi - Thu, 18 Feb 2010, 19:29)=-=-
Worked 20 hours (alexi)
Worked 20 hours and estimate 15 hours remain (original estimate unchanged).
-=-=(Serg - Fri, 05 Feb 2010, 14:04)=-=-
Observers changed: Knielsen,Serg
-=-=(Guest - Fri, 05 Feb 2010, 13:40)=-=-
Category updated.
--- /tmp/wklog.47.old.9197 2010-02-05 13:40:36.000000000 +0200
+++ /tmp/wklog.47.new.9197 2010-02-05 13:40:36.000000000 +0200
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
------------------------------------------------------------
-=-=(View All Progress Notes, 31 total)=-=-
http://askmonty.org/worklog/index.pl?tid=47&nolimit=1
DESCRIPTION:
Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events
This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in the application things went
wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.
The cost is that the binlog can be approximately 2x in size
(especially inserts of big blobs would be a bit painful), so this should
be an optional feature.
HIGH-LEVEL SPECIFICATION:
Content
~~~~~~~
1. Annotate_rows_log_event
2. Server option: --binlog-annotate-rows-events
3. Server option: --replicate-annotate-rows-events
4. mysqlbinlog option: --print-annotate-rows-events
5. mysqlbinlog output
1. Annotate_rows_log_event [ ANNOTATE_ROWS_EVENT ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Describes the query which caused the corresponding rows events. Has an empty
post-header and contains the query text in its data part. Example:
************************
ANNOTATE_ROWS_EVENT
************************
00000220 | B6 A0 2C 4B | time_when = 1261215926
00000224 | 33 | event_type = 51
00000225 | 64 00 00 00 | server_id = 100
00000229 | 36 00 00 00 | event_len = 54
0000022D | 56 02 00 00 | log_pos = 00000256
00000231 | 00 00 | flags = <none>
------------------------
00000233 | 49 4E 53 45 | query = "INSERT INTO t1 VALUES (1), (2), (3)"
00000237 | 52 54 20 49 |
0000023B | 4E 54 4F 20 |
0000023F | 74 31 20 56 |
00000243 | 41 4C 55 45 |
00000247 | 53 20 28 31 |
0000024B | 29 2C 20 28 |
0000024F | 32 29 2C 20 |
00000253 | 28 33 29 |
************************
In the binary log, the Annotate_rows event follows the (possible) 'BEGIN'
Query event and precedes the first of the Table map events which accompany
the corresponding rows events. (See the example in the "mysqlbinlog output"
section below.)
2. Server option: --binlog-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the master to write Annotate_rows events to the binary log.
* Variable Name: binlog_annotate_rows_events
* Scope: Global & Session
* Access Type: Dynamic
* Data Type: bool
* Default Value: OFF
NOTE. The session value allows annotating only selected statements:
...
SET SESSION binlog_annotate_rows_events=ON;
... statements to be annotated ...
SET SESSION binlog_annotate_rows_events=OFF;
... statements not to be annotated ...
3. Server option: --replicate-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tells the slave to reproduce Annotate_rows events received from the master
in its own binary log (sensible only in combination with the
log-slave-updates option).
* Variable Name: replicate_annotate_rows_events
* Scope: Global
* Access Type: Read only
* Data Type: bool
* Default Value: OFF
NOTE. Why do we additionally need this 'replicate' option? Why not make
the slave reproduce these events whenever its global
binlog-annotate-rows-events value is ON? Because, for example, we may want
to configure a slave that reproduces Annotate_rows events while keeping the
global binlog-annotate-rows-events = OFF as the default value for the
client threads (see also "How slave treats replicate-annotate-rows-events
option" in the LLD part).
4. mysqlbinlog option: --print-annotate-rows-events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this option, mysqlbinlog prints the content of Annotate_rows events (if
the binary log contains them). Without this option (i.e. by default),
mysqlbinlog skips Annotate_rows events.
5. mysqlbinlog output
~~~~~~~~~~~~~~~~~~~~~
With --print-annotate-rows-events, mysqlbinlog outputs Annotate_rows events
in a form like this:
...
# at 1646
#091219 12:45:26 server id 100 end_log_pos 1714 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1261215926/*!*/;
BEGIN
/*!*/;
# at 1714
# at 1812
# at 1853
# at 1894
# at 1938
#091219 12:45:26 server id 100 end_log_pos 1812 Query: `DELETE t1, t2 FROM t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.a=t2.a AND t2.a=t3.a`
#091219 12:45:26 server id 100 end_log_pos 1853 Table_map: `test`.`t1` mapped to number 16
#091219 12:45:26 server id 100 end_log_pos 1894 Table_map: `test`.`t2` mapped to number 17
#091219 12:45:26 server id 100 end_log_pos 1938 Delete_rows: table id 16
#091219 12:45:26 server id 100 end_log_pos 1982 Delete_rows: table id 17 flags: STMT_END_F
...
LOW-LEVEL DESIGN:
Content
~~~~~~~
1. Annotate_rows event number
2. Outline of Annotate_rows event behavior
3. How Master writes Annotate_rows events to the binary log
4. How slave treats replicate-annotate-rows-events option
5. How slave IO thread requests Annotate_rows events
6. How master executes the request
7. How slave SQL thread processes Annotate_rows events
8. General remarks
1. Annotate_rows event number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid possible event-number conflicts with MySQL/Sun, we leave a gap
between the last MySQL event number and the Annotate_rows event number:
enum Log_event_type
{ ...
  INCIDENT_EVENT= 26,
  // New MySQL event numbers are to be added here
  MYSQL_EVENTS_END,
  MARIA_EVENTS_BEGIN= 51,
  // New Maria event numbers start from here
  ANNOTATE_ROWS_EVENT= 51,
  ENUM_END_EVENT
};
together with the corresponding extension of the 'post_header_len' array in
the Format description event. (This extension does not affect the
compatibility of the binary log.) Here is how the Format description event
looks with this extension:
************************
FORMAT_DESCRIPTION_EVENT
************************
00000004 | A1 A0 2C 4B | time_when = 1261215905
00000008 | 0F | event_type = 15
00000009 | 64 00 00 00 | server_id = 100
0000000D | 7F 00 00 00 | event_len = 127
00000011 | 83 00 00 00 | log_pos = 00000083
00000015 | 01 00 | flags = LOG_EVENT_BINLOG_IN_USE_F
------------------------
00000017 | 04 00 | binlog_ver = 4
00000019 | 35 2E 32 2E | server_ver = 5.2.0-MariaDB-alpha-debug-log
..... ...
0000004B | A1 A0 2C 4B | time_created = 1261215905
0000004F | 13 | common_header_len = 19
------------------------
post_header_len
------------------------
00000050 | 38 | 56 - START_EVENT_V3 [1]
..... ...
00000069 | 02 | 2 - INCIDENT_EVENT [26]
0000006A | 00 | 0 - RESERVED [27]
..... ...
00000081 | 00 | 0 - RESERVED [50]
00000082 | 00 | 0 - ANNOTATE_ROWS_EVENT [51]
************************
2. Outline of Annotate_rows event behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Annotate_rows_log_event object has two private members describing the
corresponding query:
char *m_query_txt;
uint m_query_len;
When the object is created for writing to a binary log, this query is taken
from 'thd' (for short, below we omit the 'Annotate_rows_log_event::' prefix
as well as other implementation details):
Annotate_rows_log_event(THD *thd)
{
  m_query_txt = thd->query();
  m_query_len = thd->query_length();
}
When the object is read from a binary log, the query is taken from the buffer
containing the binary log representation of the event (this buffer is
allocated in the Log_event object from which all log events are derived):
Annotate_rows_log_event(char *buf, uint event_len,
                        Format_description_log_event *desc)
{
  m_query_len = event_len - desc->common_header_len;
  m_query_txt = buf + desc->common_header_len;
}
The events are written to the binary log by the Log_event::write() member,
which calls the virtual write_data_header() and write_data_body() members
("data header" and "post header" are synonyms in replication terminology).
In our case, the data header is empty and the data body is just the query:
bool write_data_body(IO_CACHE *file)
{
  return my_b_safe_write(file, (uchar*) m_query_txt, m_query_len);
}
Printing the event is just printing the query:
void Annotate_rows_log_event::print(FILE *file, PRINT_EVENT_INFO *pinfo)
{
  my_b_printf(&pinfo->head_cache, "\tQuery: `%s`\n", m_query_txt);
}
3. How Master writes Annotate_rows events to the binary log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The event is written to the binary log just before the group of Table_map
events which precede the corresponding Rows events (one query may generate
several Table map events in the binary log, but the corresponding
Annotate_rows event must be written only once, before the first Table map
event; hence the boolean variable 'with_annotate' below):
int write_locked_table_maps(THD *thd)
{ ...
  bool with_annotate= thd->variables.binlog_annotate_rows_events;
  ...
  for (uint i= 0; i < ... <number of tables> ...; ++i)
  { ...
    thd->binlog_write_table_map(table, ..., with_annotate);
    with_annotate= 0; // write the Annotate_event no more than once
    ...
  }
  ...
}
int THD::binlog_write_table_map(TABLE *table, ..., bool with_annotate)
{ ...
  Table_map_log_event the_event(...);
  ...
  if (with_annotate)
  {
    Annotate_rows_log_event anno(this);
    mysql_bin_log.write(&anno);
  }
  mysql_bin_log.write(&the_event);
  ...
}
4. How slave treats replicate-annotate-rows-events option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replicate-annotate-rows-events option is treated just as the session
value of the binlog_annotate_rows_events variable for the slave IO and
SQL threads. This setting is done during initialization of these threads:
pthread_handler_t handle_slave_io(void *arg)
{
  THD *thd= new THD;
  ...
  init_slave_thread(thd, SLAVE_THD_IO);
  ...
}
pthread_handler_t handle_slave_sql(void *arg)
{
  THD *thd= new THD;
  ...
  init_slave_thread(thd, SLAVE_THD_SQL);
  ...
}
int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
{ ...
  thd->variables.binlog_annotate_rows_events=
    opt_replicate_annotate_rows_events;
  ...
}
5. How slave IO thread requests Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the replicate-annotate-rows-events option is not set on a slave, there
is no need for the master to send Annotate_rows events to that slave. The
slave (or mysqlbinlog in the remote case), before requesting a binlog dump
via the COM_BINLOG_DUMP command, informs the master whether it should send
these events by executing the newly added COM_BINLOG_DUMP_OPTIONS_EXT server
command:
case COM_BINLOG_DUMP_OPTIONS_EXT:
  thd->binlog_dump_flags_ext= packet[0];
  my_ok(thd);
  break;
Note. We add this new command, rather than reusing COM_BINLOG_DUMP, to avoid
possible conflicts with MySQL/Sun.
6. How master executes the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
case COM_BINLOG_DUMP:
{ ...
  flags= uint2korr(packet + 4);
  ...
  mysql_binlog_send(thd, ..., flags);
  ...
}
void mysql_binlog_send(THD* thd, ..., ushort flags)
{ ...
  Log_event::read_log_event(&log, packet, ...);
  ...
  if ((*packet)[EVENT_TYPE_OFFSET + 1] != ANNOTATE_ROWS_EVENT ||
      flags & BINLOG_SEND_ANNOTATE_ROWS_EVENT)
  {
    my_net_write(net, packet->ptr(), packet->length());
  }
  ...
}
7. How slave SQL thread processes Annotate_rows events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The slave processes each received event by "applying" it, i.e. by
calling the Log_event::apply_event() function, which in turn calls
the virtual do_apply_event() member specific to each type of event.
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
  Log_event *ev = next_event(rli);
  ...
  apply_event_and_update_pos(ev, ...);
  if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
    delete ev;
  ...
}
int apply_event_and_update_pos(Log_event *ev, ...)
{ ...
  ev->apply_event(...);
  ...
}
int Log_event::apply_event(...)
{
  return do_apply_event(...);
}
What does it mean to "apply" an Annotate_rows event? It means setting the
current thd query to the one described by the event, i.e. to the query which
caused the subsequent Rows events (see "How Master writes Annotate_rows
events to the binary log" to follow what happens further when the subsequent
Rows events are applied):
int Annotate_rows_log_event::do_apply_event(...)
{
  thd->set_query(m_query_txt, m_query_len);
}
NOTE. I am not sure, but possibly the current values of thd->query and
thd->query_length should be saved before calling set_query() and restored
when the Annotate_rows_log_event object is deleted.
Is this really needed?
After calling this do_apply_event() function we may not delete the
Annotate_rows_log_event object immediately (see exec_relay_log_event()
above) because thd->query now points to the string inside this object.
We may keep the pointer to this object in the Relay_log_info:
class Relay_log_info
{
public:
  ...
  void set_annotate_event(Annotate_rows_log_event*);
  Annotate_rows_log_event* get_annotate_event();
  void free_annotate_event();
  ...
private:
  Annotate_rows_log_event* m_annotate_event;
};
The saved Annotate_rows object should be deleted once all the corresponding
Rows events have been processed:
int exec_relay_log_event(THD* thd, Relay_log_info* rli)
{ ...
  Log_event *ev= next_event(rli);
  ...
  apply_event_and_update_pos(ev, ...);
  if (rli->get_annotate_event() && is_last_rows_event(ev))
    rli->free_annotate_event();
  else if (ev->get_type_code() == ANNOTATE_ROWS_EVENT)
    rli->set_annotate_event((Annotate_rows_log_event*) ev);
  else if (ev->get_type_code() != FORMAT_DESCRIPTION_EVENT)
    delete ev;
  ...
}
where
bool is_last_rows_event(Log_event* ev)
{
  Log_event_type type= ev->get_type_code();
  if (IS_ROWS_EVENT_TYPE(type))
  {
    Rows_log_event* rows= (Rows_log_event*)ev;
    return rows->get_flags(Rows_log_event::STMT_END_F);
  }
  return 0;
}
#define IS_ROWS_EVENT_TYPE(type) ((type) == WRITE_ROWS_EVENT || \
                                  (type) == UPDATE_ROWS_EVENT || \
                                  (type) == DELETE_ROWS_EVENT)
8. General remarks
~~~~~~~~~~~~~~~~~~
Kristian noticed that introducing a new log event type should be coordinated
somehow with MySQL/Sun:
Kristian: The numeric code for this event must be assigned carefully.
It should be coordinated with MySQL/Sun, otherwise we can get into a
situation where MySQL uses the same numeric code for one event that
MariaDB uses for ANNOTATE_ROWS_EVENT, which would make merging the two
impossible.
Alex: I reserved about 20 numbers to avoid possible conflicts
with MySQL.
Kristian: Still, I think it would be appropriate to send a polite email
to internals(a)lists.mysql.com about this and suggesting to reserve the
event number.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Knielsen): Efficient group commit for binary log (116)
by worklog-noreply@askmonty.org 25 May '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Efficient group commit for binary log
CREATION DATE..: Mon, 26 Apr 2010, 13:28
SUPERVISOR.....: Knielsen
IMPLEMENTOR....:
COPIES TO......: Serg
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 116 (http://askmonty.org/worklog/?tid=116)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 49
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Tue, 25 May 2010, 08:28)=-=-
More thoughts on and changes to the architecture. Got to something now that I am satisfied with and
that seems able to handle all issues.
Implement new prepare_ordered and commit_ordered handler methods and the logic in ha_commit_trans().
Implement TC_LOG::group_log_xid() method and logic in ha_commit_trans().
Implement XtraDB part, using commit_ordered() rather than prepare_commit_mutex.
Fix test suite failures.
Proof-of-concept patch series complete now.
Do initial benchmark, getting good results. With 64 threads, see 26x improvement in queries-per-sec.
Next step: write up the architecture description.
Worked 21 hours and estimate 0 hours remain (original estimate increased by 21 hours).
-=-=(Knielsen - Wed, 12 May 2010, 06:41)=-=-
Started work on a Quilt patch series, refactoring the binlog code to prepare for implementing the
group commit, and working on the design of group commit in parallel.
Found and fixed several problems in error handling when writing to binlog.
Removed redundant table map version locking.
Split binlog writing into two parts in preparation for group commit. When ready to write to the
binlog, threads enter a queue, and the first thread in the queue handles the binlog writing for
everyone. When it obtains the LOCK_log, it first loops over all threads, executing the first part of
binlog writing (the write(2) syscall essentially). It then runs the second part (fsync(2)
essentially) only once, and then wakes up the remaining threads in the queue.
Still to be done:
Finish the proof-of-concept group commit patch, by 1) implementing the prepare_fast() and
commit_fast() callbacks in handler.cc, 2) moving the binlog thread enqueue from log_xid() to
binlog_prepare_fast(), and 3) moving the fast part of the InnoDB commit to innobase_commit_fast(),
removing the prepare_commit_mutex().
Write up the final design in this worklog.
Evaluate the design to see if we can do better/different.
Think about possible next steps, such as releasing innodb row locks early (in
innobase_prepare_fast), and doing crash recovery by replaying transactions from the binlog (removing
the need for engine durability and 2 of 3 fsync() in commit).
Worked 28 hours and estimate 0 hours remain (original estimate increased by 28 hours).
-=-=(Serg - Mon, 26 Apr 2010, 14:10)=-=-
Observers changed: Serg
DESCRIPTION:
Currently, in order to ensure that the server can recover after a crash to a
state in which storage engines and binary log are consistent with each other,
it is necessary to use XA with durable commits for both storage engines
(innodb_flush_log_at_trx_commit=1) and binary log (sync_binlog=1).
This is _very_ expensive, since the server needs to do three fsync() operations
for every commit, as there is no working group commit when the binary log is
enabled.
The idea is to
- Implement/fix group commit to work properly with the binary log enabled.
- (Optionally) avoid the need to fsync() in the engine, and instead rely on
replaying any lost transactions from the binary log against the engine
during crash recovery.
For background see these articles:
http://kristiannielsen.livejournal.com/12254.html
http://kristiannielsen.livejournal.com/12408.html
http://kristiannielsen.livejournal.com/12553.html
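The two-phase scheme described in the 12 May progress note above (threads
enter a queue; the first thread takes LOCK_log, performs the write(2) part
for everyone, runs fsync(2) once, then wakes the rest) can be sketched as
follows. This is only a minimal illustration of the idea, not the actual
server code; CommitEntry, LOCK_queue, and write_binlog_data() are
hypothetical names introduced for the sketch:

#include <pthread.h>
#include <unistd.h>

// Hypothetical per-transaction queue entry; the caller initializes 'done'
// to false and 'cond' with pthread_cond_init() before calling group_commit().
struct CommitEntry
{
  CommitEntry *next;
  bool done;             // set by the group leader once this commit is durable
  pthread_cond_t cond;   // signalled by the leader
  // ... per-transaction binlog data would live here ...
};

static CommitEntry *commit_queue= 0;  // transactions waiting to be binlogged
static pthread_mutex_t LOCK_queue= PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t LOCK_log=   PTHREAD_MUTEX_INITIALIZER;

static void write_binlog_data(CommitEntry *e)
{
  (void) e;  // write(2) of this transaction's binlog data; elided in this sketch
}

void group_commit(CommitEntry *e, int binlog_fd)
{
  // Enqueue; the thread that finds the queue empty becomes the group leader.
  pthread_mutex_lock(&LOCK_queue);
  bool leader= (commit_queue == 0);
  e->next= commit_queue;
  commit_queue= e;
  if (!leader)
  {
    while (!e->done)                        // follower: sleep until durable
      pthread_cond_wait(&e->cond, &LOCK_queue);
    pthread_mutex_unlock(&LOCK_queue);
    return;
  }
  pthread_mutex_unlock(&LOCK_queue);

  pthread_mutex_lock(&LOCK_log);
  // Steal the whole queue: everything queued so far commits as one group.
  pthread_mutex_lock(&LOCK_queue);
  CommitEntry *group= commit_queue;
  commit_queue= 0;
  pthread_mutex_unlock(&LOCK_queue);

  for (CommitEntry *p= group; p; p= p->next)
    write_binlog_data(p);                   // first part: one write() per trx
  fsync(binlog_fd);                         // second part: one fsync() per group
  pthread_mutex_unlock(&LOCK_log);

  // Wake the followers; their transactions are now durable in the binlog.
  pthread_mutex_lock(&LOCK_queue);
  for (CommitEntry *p= group; p; p= p->next)
  {
    p->done= true;
    pthread_cond_signal(&p->cond);
  }
  pthread_mutex_unlock(&LOCK_queue);
}

The gain comes from amortizing the fsync(): one fsync() makes every
transaction in the group durable at once, consistent with the 26x
queries-per-sec improvement at 64 threads reported in the progress notes.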
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Push conditions down into non-mergeable VIEWs when possible (119)
by worklog-noreply@askmonty.org 25 May '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Push conditions down into non-mergeable VIEWs when possible
CREATION DATE..: Mon, 24 May 2010, 20:52
SUPERVISOR.....: Igor
IMPLEMENTOR....:
COPIES TO......: Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 119 (http://askmonty.org/worklog/?tid=119)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 24 May 2010, 20:59)=-=-
High-Level Specification modified.
--- /tmp/wklog.119.old.25116 2010-05-24 20:59:40.000000000 +0000
+++ /tmp/wklog.119.new.25116 2010-05-24 20:59:40.000000000 +0000
@@ -1 +1,113 @@
+<contents>
+HLS
+1. Problems to be addressed in this WL
+2. Pushdown of conditions into non-mergeable VIEWs
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+2.2 What condition can be pushed
+3. Pushdown from HAVING into WHERE
+4. When to do the pushdown
+5. Other things to take care of
+
+
+</contents>
+
+1. Problems to be addressed in this WL
+======================================
+The problem actually consists of two parts:
+1. Condition on VIEW columns are not pushed down into VIEWs.
+2. Even if conditions were pushed, they would have been put into VIEW's
+HAVING clause, which would not give enough of speedup. In order to get a
+real speedup, the optimizer must be able to move relevant part of HAVING
+into WHERE (and then use it for further optimizations) in order to provide
+the desired speedup. Note that HAVING->WHERE condition move is orthogonal
+to VIEW processing.
+
+2. Pushdown of conditions into non-mergeable VIEWs
+==================================================
+We can push a condition into non-mergeable VIEW when VIEW's top-most operation
+is selection (i.e., filtering). This is true, for example, when the VIEW is
+defined as
+
+ SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]
+
+and not true when the VIEW is defined as
+
+ SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n
+
+Generalizing the above, we arrive at the following rule:
+
+ For non-mergeable VIEWs,
+ - pushdown must not be done if VIEW uses ORDER BY .. LIMIT
+ - when pushdown is done, the pushed condition should be added to the WHERE
+ clause.
+
+Note: In scope of this WL, we will not hande VIEWs that have UNION [ALL] as
+top operation.
+
+(TODO: what about SELECT DISTINCT?)
+(TODO: pushdown down into IN subqueries?)
+
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+----------------------------------------------
+Although it is not possible to push a condition below the ORDER BY ... LIMIT
+operation, there is still some benefit from checking the condition early as
+that would allow to avoid writing non-matching rows into temporary table.
+
+We could do that if we introduced a post-ORDERBY selection operation. That
+operation would also allow to support ORDER BY ... LIMIT inside subqueries
+(we don't currently support those because default subquery strategy,
+IN->EXISTS rewrite, also needs to push a condition into subquery).
+
+2.2 What condition can be pushed
+--------------------------------
+Assuming simplify_joins() operation has done normalization:
+* If the VIEW is in top-level join list, or inside a semi-join that's in
+ top-level join list, we can push parts of WHERE condition.
+* If the VIEW is inside an outer join, we can push it's ON expression.
+
+We can reuse make_cond_for_index()/make_remainder_cond() code to extract part
+of condition that can be pushed, and the remainder, respectively.
+
+Pushability criteria for an atomic (i.e. not AND/OR) condition is that
+
+ the condition only uses VIEW's fields.
+
+(TODO: what about fields of const tables? Do we have const tables already
+retrived by the time VIEW is materialized? If yes, we could push down
+expressions that refer to const tables, too)
+
+3. Pushdown from HAVING into WHERE
+==================================
+The idea is:
+
+ Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
+ can be put into WHERE.
+
+(TODO: do we need to handle case of grouping over expressions?)
+
+(TODO: when moving expression for HAVING to WHERE, do we need
+to do something with it? Replace all Item_ref objects with items that
+they refer to?
+ - In case of referring to expression, do we get
+ Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))?
+)
+
+4. When to do the pushdown
+==========================
+In order to do pushdown, we must have prepare phase finished
+for both parent (so that we can make sense of its WHERE condition) and
+child (so that we know what it has in its select list).
+
+We can do pushdown before we've done join optimization (i.e. choose_plan()
+call) of the parent.
+
+We must do pushdown before we've done JOIN::optimize() of the child
+(in particular, it must be done before we do update_ref_and_keys() and
+range analysis in the child).
+
+
+5. Other things to take care of
+===============================
+* Pushing down fulltext predicates (it seems one needs to "register" a
+ fulltext predicate when it is moved from one select from another? Ask Serg)
DESCRIPTION:
There are complaints (see the links below) about cases where a non-mergeable
VIEW (non-mergeable because it has a GROUP BY), queried with restrictions on
the grouped column, performs poorly because the VIEW processing code ignores
the restriction.
This WL is about addressing this issue.
links to complaints:
http://code.openark.org/blog/mysql/views-better-performance-with-condition-…
http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-perform…
The target version is MariaDB 5.3+, because it has late optimization/execution
for FROM-subqueries/non-mergeable VIEWs, which makes it much more feasible to
inject something into the VIEW before it is optimized/executed.
HIGH-LEVEL SPECIFICATION:
<contents>
HLS
1. Problems to be addressed in this WL
2. Pushdown of conditions into non-mergeable VIEWs
2.1 A note about VIEWs with ORDER BY ... LIMIT
2.2 What condition can be pushed
3. Pushdown from HAVING into WHERE
4. When to do the pushdown
5. Other things to take care of
</contents>
1. Problems to be addressed in this WL
======================================
The problem actually consists of two parts:
1. Conditions on VIEW columns are not pushed down into VIEWs.
2. Even if conditions were pushed, they would have been put into the VIEW's
HAVING clause, which would not give enough of a speedup. To get a real
speedup, the optimizer must be able to move the relevant part of HAVING
into WHERE (and then use it for further optimizations). Note that the
HAVING->WHERE condition move is orthogonal to VIEW processing.
2. Pushdown of conditions into non-mergeable VIEWs
==================================================
We can push a condition into a non-mergeable VIEW when the VIEW's top-most
operation is selection (i.e., filtering). This is true, for example, when the
VIEW is defined as
SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]
and not true when the VIEW is defined as
SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n
Generalizing the above, we arrive at the following rule:
For non-mergeable VIEWs,
- pushdown must not be done if VIEW uses ORDER BY .. LIMIT
- when pushdown is done, the pushed condition should be added to the WHERE
clause.
Note: In the scope of this WL, we will not handle VIEWs that have UNION [ALL]
as the top operation.
(TODO: what about SELECT DISTINCT?)
(TODO: pushdown into IN subqueries?)
2.1 A note about VIEWs with ORDER BY ... LIMIT
----------------------------------------------
Although it is not possible to push a condition below the ORDER BY ... LIMIT
operation, there is still some benefit in checking the condition early, as
that would allow us to avoid writing non-matching rows into the temporary
table.
We could do that if we introduced a post-ORDER BY selection operation. That
operation would also allow supporting ORDER BY ... LIMIT inside subqueries
(we don't currently support those because the default subquery strategy, the
IN->EXISTS rewrite, also needs to push a condition into the subquery).
2.2 What condition can be pushed
--------------------------------
Assuming simplify_joins() operation has done normalization:
* If the VIEW is in the top-level join list, or inside a semi-join that's in
the top-level join list, we can push parts of the WHERE condition.
* If the VIEW is inside an outer join, we can push its ON expression.
We can reuse the make_cond_for_index()/make_remainder_cond() code to extract
the part of the condition that can be pushed, and the remainder,
respectively; see the sketch below.
The pushability criterion for an atomic (i.e. not AND/OR) condition is that
the condition only uses the VIEW's fields.
(TODO: what about fields of const tables? Do we have const tables already
retrieved by the time the VIEW is materialized? If yes, we could push down
expressions that refer to const tables, too.)
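To make the split concrete, here is a minimal sketch of extracting the
pushable part and the remainder from an AND-condition under the criterion
above. The Cond tree is a simplified, hypothetical stand-in for the server's
Item tree (the real implementation would adapt
make_cond_for_index()/make_remainder_cond(), as noted above); OR conditions
are treated as atomic:

#include <vector>

// Hypothetical, simplified condition tree standing in for Item/Item_cond_and.
struct Cond
{
  bool is_and;                    // true: AND of children; false: atomic
  std::vector<Cond*> children;    // only used when is_and
  bool uses_only_view_fields;     // precomputed for atomic conditions
};

// Helper: collapse a rebuilt AND node with zero or one children.
static Cond *collapse(Cond *node)
{
  if (node->children.empty()) { delete node; return 0; }
  if (node->children.size() == 1)
  {
    Cond *only= node->children[0];
    delete node;
    return only;
  }
  return node;
}

// Part of 'cond' that references only the VIEW's fields: this part can be
// added to the VIEW's WHERE clause.
Cond *make_pushable_cond(Cond *cond)
{
  if (!cond->is_and)
    return cond->uses_only_view_fields ? cond : 0;
  Cond *res= new Cond{true, {}, false};
  for (Cond *child : cond->children)
    if (Cond *p= make_pushable_cond(child))
      res->children.push_back(p);
  return collapse(res);
}

// The remainder, which must stay in the parent's WHERE clause.
Cond *make_remainder_cond(Cond *cond)
{
  if (!cond->is_and)
    return cond->uses_only_view_fields ? 0 : cond;
  Cond *res= new Cond{true, {}, false};
  for (Cond *child : cond->children)
    if (Cond *r= make_remainder_cond(child))
      res->children.push_back(r);
  return collapse(res);
}

Treating OR as atomic is the safe default here: a disjunction is pushable
only as a whole, when every branch uses only the VIEW's fields.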
3. Pushdown from HAVING into WHERE
==================================
The idea is:
Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
can be put into WHERE.
(TODO: do we need to handle case of grouping over expressions?)
(TODO: when moving expression for HAVING to WHERE, do we need
to do something with it? Replace all Item_ref objects with items that
they refer to?
- In case of referring to expression, do we get
Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))?
)
4. When to do the pushdown
==========================
In order to do the pushdown, we must have the prepare phase finished for both
the parent (so that we can make sense of its WHERE condition) and the child
(so that we know what it has in its select list).
We can do the pushdown before we've done the join optimization (i.e. the
choose_plan() call) of the parent.
We must do the pushdown before we've done JOIN::optimize() of the child
(in particular, it must be done before we do update_ref_and_keys() and
range analysis in the child).
5. Other things to take care of
===============================
* Pushing down fulltext predicates (it seems one needs to "register" a
fulltext predicate when it is moved from one select to another? Ask Serg)
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Push conditions down into non-mergeable VIEWs when possible (119)
by worklog-noreply@askmonty.org 24 May '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Push conditions down into non-mergeable VIEWs when possible
CREATION DATE..: Mon, 24 May 2010, 20:52
SUPERVISOR.....: Igor
IMPLEMENTOR....:
COPIES TO......: Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 119 (http://askmonty.org/worklog/?tid=119)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 24 May 2010, 20:59)=-=-
High-Level Specification modified.
--- /tmp/wklog.119.old.25116 2010-05-24 20:59:40.000000000 +0000
+++ /tmp/wklog.119.new.25116 2010-05-24 20:59:40.000000000 +0000
@@ -1 +1,113 @@
+<contents>
+HLS
+1. Problems to be addressed in this WL
+2. Pushdown of conditions into non-mergeable VIEWs
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+2.2 What condition can be pushed
+3. Pushdown from HAVING into WHERE
+4. When to do the pushdown
+5. Other things to take care of
+
+
+</contents>
+
+1. Problems to be addressed in this WL
+======================================
+The problem actually consists of two parts:
+1. Condition on VIEW columns are not pushed down into VIEWs.
+2. Even if conditions were pushed, they would have been put into VIEW's
+HAVING clause, which would not give enough of speedup. In order to get a
+real speedup, the optimizer must be able to move relevant part of HAVING
+into WHERE (and then use it for further optimizations) in order to provide
+the desired speedup. Note that HAVING->WHERE condition move is orthogonal
+to VIEW processing.
+
+2. Pushdown of conditions into non-mergeable VIEWs
+==================================================
+We can push a condition into non-mergeable VIEW when VIEW's top-most operation
+is selection (i.e., filtering). This is true, for example, when the VIEW is
+defined as
+
+ SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]
+
+and not true when the VIEW is defined as
+
+ SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n
+
+Generalizing the above, we arrive at the following rule:
+
+ For non-mergeable VIEWs,
+ - pushdown must not be done if VIEW uses ORDER BY .. LIMIT
+ - when pushdown is done, the pushed condition should be added to the WHERE
+ clause.
+
+Note: In scope of this WL, we will not hande VIEWs that have UNION [ALL] as
+top operation.
+
+(TODO: what about SELECT DISTINCT?)
+(TODO: pushdown down into IN subqueries?)
+
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+----------------------------------------------
+Although it is not possible to push a condition below the ORDER BY ... LIMIT
+operation, there is still some benefit from checking the condition early as
+that would allow to avoid writing non-matching rows into temporary table.
+
+We could do that if we introduced a post-ORDERBY selection operation. That
+operation would also allow to support ORDER BY ... LIMIT inside subqueries
+(we don't currently support those because default subquery strategy,
+IN->EXISTS rewrite, also needs to push a condition into subquery).
+
+2.2 What condition can be pushed
+--------------------------------
+Assuming simplify_joins() operation has done normalization:
+* If the VIEW is in top-level join list, or inside a semi-join that's in
+ top-level join list, we can push parts of WHERE condition.
+* If the VIEW is inside an outer join, we can push it's ON expression.
+
+We can reuse make_cond_for_index()/make_remainder_cond() code to extract part
+of condition that can be pushed, and the remainder, respectively.
+
+Pushability criteria for an atomic (i.e. not AND/OR) condition is that
+
+ the condition only uses VIEW's fields.
+
+(TODO: what about fields of const tables? Do we have const tables already
+retrived by the time VIEW is materialized? If yes, we could push down
+expressions that refer to const tables, too)
+
+3. Pushdown from HAVING into WHERE
+==================================
+The idea is:
+
+ Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
+ can be put into WHERE.
+
+(TODO: do we need to handle case of grouping over expressions?)
+
+(TODO: when moving expression for HAVING to WHERE, do we need
+to do something with it? Replace all Item_ref objects with items that
+they refer to?
+ - In case of referring to expression, do we get
+ Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))?
+)
+
+4. When to do the pushdown
+==========================
+In order to do pushdown, we must have prepare phase finished
+for both parent (so that we can make sense of its WHERE condition) and
+child (so that we know what it has in its select list).
+
+We can do pushdown before we've done join optimization (i.e. choose_plan()
+call) of the parent.
+
+We must do pushdown before we've done JOIN::optimize() of the child
+(in particular, it must be done before we do update_ref_and_keys() and
+range analysis in the child).
+
+
+5. Other things to take care of
+===============================
+* Pushing down fulltext predicates (it seems one needs to "register" a
+ fulltext predicate when it is moved from one select from another? Ask Serg)
DESCRIPTION:
There are complaints (see the links below) about cases where a non-mergeable
VIEW (non-mergeable because it has a GROUP BY), queried with restrictions on
the grouped column, performs poorly because the VIEW processing code ignores
the restriction.
This WL is about addressing this issue.
links to complaints:
http://code.openark.org/blog/mysql/views-better-performance-with-condition-…
http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-perform…
The target version is MariaDB 5.3+, because it has late optimization/execution
for FROM-subqueries/non-mergeable VIEWs, which makes it much more feasible to
inject something into the VIEW before it is optimized/executed.
HIGH-LEVEL SPECIFICATION:
<contents>
HLS
1. Problems to be addressed in this WL
2. Pushdown of conditions into non-mergeable VIEWs
2.1 A note about VIEWs with ORDER BY ... LIMIT
2.2 What condition can be pushed
3. Pushdown from HAVING into WHERE
4. When to do the pushdown
5. Other things to take care of
</contents>
1. Problems to be addressed in this WL
======================================
The problem actually consists of two parts:
1. Conditions on VIEW columns are not pushed down into VIEWs.
2. Even if conditions were pushed, they would have been put into the VIEW's
HAVING clause, which would not give enough of a speedup. To get a real
speedup, the optimizer must be able to move the relevant part of HAVING
into WHERE (and then use it for further optimizations). Note that the
HAVING->WHERE condition move is orthogonal to VIEW processing.
2. Pushdown of conditions into non-mergeable VIEWs
==================================================
We can push a condition into a non-mergeable VIEW when the VIEW's top-most
operation is selection (i.e., filtering). This is true, for example, when the
VIEW is defined as
SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]
and not true when the VIEW is defined as
SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n
Generalizing the above, we arrive at the following rule:
For non-mergeable VIEWs,
- pushdown must not be done if VIEW uses ORDER BY .. LIMIT
- when pushdown is done, the pushed condition should be added to the WHERE
clause.
Note: In the scope of this WL, we will not handle VIEWs that have UNION [ALL]
as the top operation.
(TODO: what about SELECT DISTINCT?)
(TODO: pushdown into IN subqueries?)
2.1 A note about VIEWs with ORDER BY ... LIMIT
----------------------------------------------
Although it is not possible to push a condition below the ORDER BY ... LIMIT
operation, there is still some benefit in checking the condition early, as
that would allow us to avoid writing non-matching rows into the temporary
table.
We could do that if we introduced a post-ORDER BY selection operation. That
operation would also allow supporting ORDER BY ... LIMIT inside subqueries
(we don't currently support those because the default subquery strategy, the
IN->EXISTS rewrite, also needs to push a condition into the subquery).
2.2 What condition can be pushed
--------------------------------
Assuming simplify_joins() operation has done normalization:
* If the VIEW is in the top-level join list, or inside a semi-join that's in
the top-level join list, we can push parts of the WHERE condition.
* If the VIEW is inside an outer join, we can push its ON expression.
We can reuse the make_cond_for_index()/make_remainder_cond() code to extract
the part of the condition that can be pushed, and the remainder,
respectively.
The pushability criterion for an atomic (i.e. not AND/OR) condition is that
the condition only uses the VIEW's fields.
(TODO: what about fields of const tables? Do we have const tables already
retrieved by the time the VIEW is materialized? If yes, we could push down
expressions that refer to const tables, too.)
3. Pushdown from HAVING into WHERE
==================================
The idea is:
Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
can be put into WHERE.
(TODO: do we need to handle case of grouping over expressions?)
(TODO: when moving expression for HAVING to WHERE, do we need
to do something with it? Replace all Item_ref objects with items that
they refer to?
- In case of referring to expression, do we get
Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))?
)
4. When to do the pushdown
==========================
In order to do the pushdown, we must have the prepare phase finished for both
the parent (so that we can make sense of its WHERE condition) and the child
(so that we know what it has in its select list).
We can do the pushdown before we've done the join optimization (i.e. the
choose_plan() call) of the parent.
We must do the pushdown before we've done JOIN::optimize() of the child
(in particular, it must be done before we do update_ref_and_keys() and
range analysis in the child).
5. Other things to take care of
===============================
* Pushing down fulltext predicates (it seems one needs to "register" a
fulltext predicate when it is moved from one select to another? Ask Serg)
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
![](https://secure.gravatar.com/avatar/b38aed4ff9d39949c1acd04e43a0c640.jpg?s=120&d=mm&r=g)
[Maria-developers] Updated (by Psergey): Push conditions down into non-mergeable VIEWs when possible (119)
by worklog-noreply@askmonty.org 24 May '10
by worklog-noreply@askmonty.org 24 May '10
24 May '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Push conditions down into non-mergeable VIEWs when possible
CREATION DATE..: Mon, 24 May 2010, 20:52
SUPERVISOR.....: Igor
IMPLEMENTOR....:
COPIES TO......: Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 119 (http://askmonty.org/worklog/?tid=119)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 24 May 2010, 20:59)=-=-
High-Level Specification modified.
--- /tmp/wklog.119.old.25116 2010-05-24 20:59:40.000000000 +0000
+++ /tmp/wklog.119.new.25116 2010-05-24 20:59:40.000000000 +0000
@@ -1 +1,113 @@
+<contents>
+HLS
+1. Problems to be addressed in this WL
+2. Pushdown of conditions into non-mergeable VIEWs
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+2.2 What condition can be pushed
+3. Pushdown from HAVING into WHERE
+4. When to do the pushdown
+5. Other things to take care of
+
+
+</contents>
+
+1. Problems to be addressed in this WL
+======================================
+The problem actually consists of two parts:
+1. Condition on VIEW columns are not pushed down into VIEWs.
+2. Even if conditions were pushed, they would have been put into VIEW's
+HAVING clause, which would not give enough of speedup. In order to get a
+real speedup, the optimizer must be able to move relevant part of HAVING
+into WHERE (and then use it for further optimizations) in order to provide
+the desired speedup. Note that HAVING->WHERE condition move is orthogonal
+to VIEW processing.
+
+2. Pushdown of conditions into non-mergeable VIEWs
+==================================================
+We can push a condition into non-mergeable VIEW when VIEW's top-most operation
+is selection (i.e., filtering). This is true, for example, when the VIEW is
+defined as
+
+ SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]
+
+and not true when the VIEW is defined as
+
+ SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n
+
+Generalizing the above, we arrive at the following rule:
+
+ For non-mergeable VIEWs,
+ - pushdown must not be done if VIEW uses ORDER BY .. LIMIT
+ - when pushdown is done, the pushed condition should be added to the WHERE
+ clause.
+
+Note: In scope of this WL, we will not hande VIEWs that have UNION [ALL] as
+top operation.
+
+(TODO: what about SELECT DISTINCT?)
+(TODO: pushdown down into IN subqueries?)
+
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+----------------------------------------------
+Although it is not possible to push a condition below the ORDER BY ... LIMIT
+operation, there is still some benefit from checking the condition early as
+that would allow to avoid writing non-matching rows into temporary table.
+
+We could do that if we introduced a post-ORDERBY selection operation. That
+operation would also allow to support ORDER BY ... LIMIT inside subqueries
+(we don't currently support those because default subquery strategy,
+IN->EXISTS rewrite, also needs to push a condition into subquery).
+
+2.2 What condition can be pushed
+--------------------------------
+Assuming simplify_joins() operation has done normalization:
+* If the VIEW is in top-level join list, or inside a semi-join that's in
+ top-level join list, we can push parts of WHERE condition.
+* If the VIEW is inside an outer join, we can push it's ON expression.
+
+We can reuse make_cond_for_index()/make_remainder_cond() code to extract part
+of condition that can be pushed, and the remainder, respectively.
+
+Pushability criteria for an atomic (i.e. not AND/OR) condition is that
+
+ the condition only uses VIEW's fields.
+
+(TODO: what about fields of const tables? Do we have const tables already
+retrived by the time VIEW is materialized? If yes, we could push down
+expressions that refer to const tables, too)
+
+3. Pushdown from HAVING into WHERE
+==================================
+The idea is:
+
+ Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
+ can be put into WHERE.
+
+(TODO: do we need to handle case of grouping over expressions?)
+
+(TODO: when moving expression for HAVING to WHERE, do we need
+to do something with it? Replace all Item_ref objects with items that
+they refer to?
+ - In case of referring to expression, do we get
+ Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))?
+)
+
+4. When to do the pushdown
+==========================
+In order to do pushdown, we must have prepare phase finished
+for both parent (so that we can make sense of its WHERE condition) and
+child (so that we know what it has in its select list).
+
+We can do pushdown before we've done join optimization (i.e. choose_plan()
+call) of the parent.
+
+We must do pushdown before we've done JOIN::optimize() of the child
+(in particular, it must be done before we do update_ref_and_keys() and
+range analysis in the child).
+
+
+5. Other things to take care of
+===============================
+* Pushing down fulltext predicates (it seems one needs to "register" a
+ fulltext predicate when it is moved from one select from another? Ask Serg)
DESCRIPTION:
There are complaints (see links below) about cases with non-mergeable
VIEW (because it has a GROUP BY), a query that has restrictions on
the grouped column, and poor performance that is caused by VIEW
processing code ignoring the restriction.
This WL is about addressing this issue.
links to complaints:
http://code.openark.org/blog/mysql/views-better-performance-with-condition-…
http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-perform…
The target version is MariaDB 5.3+, because it has late optimization/execution
for FROM-subqueries/non mergeable VIEWs, which makes it much more feasible to
inject something into VIEW before it is optimized/executed.
HIGH-LEVEL SPECIFICATION:
<contents>
HLS
1. Problems to be addressed in this WL
2. Pushdown of conditions into non-mergeable VIEWs
2.1 A note about VIEWs with ORDER BY ... LIMIT
2.2 What condition can be pushed
3. Pushdown from HAVING into WHERE
4. When to do the pushdown
5. Other things to take care of
</contents>
1. Problems to be addressed in this WL
======================================
The problem actually consists of two parts:
1. Condition on VIEW columns are not pushed down into VIEWs.
2. Even if conditions were pushed, they would have been put into VIEW's
HAVING clause, which would not give enough of speedup. In order to get a
real speedup, the optimizer must be able to move relevant part of HAVING
into WHERE (and then use it for further optimizations) in order to provide
the desired speedup. Note that HAVING->WHERE condition move is orthogonal
to VIEW processing.
2. Pushdown of conditions into non-mergeable VIEWs
==================================================
We can push a condition into non-mergeable VIEW when VIEW's top-most operation
is selection (i.e., filtering). This is true, for example, when the VIEW is
defined as
SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]
and not true when the VIEW is defined as
SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n
Generalizing the above, we arrive at the following rule:
For non-mergeable VIEWs,
- pushdown must not be done if VIEW uses ORDER BY .. LIMIT
- when pushdown is done, the pushed condition should be added to the WHERE
clause.
Note: In the scope of this WL, we will not handle VIEWs that have UNION [ALL]
as the top operation.
(TODO: what about SELECT DISTINCT?)
(TODO: push down into IN subqueries?)
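
As an illustration of the rule above (names hypothetical), with

  CREATE VIEW v AS
    SELECT a, b FROM t1 WHERE b > 0;

the query "SELECT * FROM v WHERE a = 5" can be evaluated as if the VIEW
had been defined with "WHERE b > 0 AND a = 5", while no such rewrite is
attempted for a VIEW whose definition ends with ORDER BY ... LIMIT.
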
2.1 A note about VIEWs with ORDER BY ... LIMIT
----------------------------------------------
Although it is not possible to push a condition below the ORDER BY ... LIMIT
operation, there is still some benefit in checking the condition early, as
that would let us avoid writing non-matching rows into the temporary table.
We could do that if we introduced a post-ORDERBY selection operation. That
operation would also make it possible to support ORDER BY ... LIMIT inside
subqueries (we don't currently support those because the default subquery
strategy, the IN->EXISTS rewrite, also needs to push a condition into the
subquery).
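
A sketch of why pushing below ORDER BY ... LIMIT is not equivalent
(names hypothetical):

  CREATE VIEW v AS SELECT a FROM t1 ORDER BY a LIMIT 3;

  SELECT * FROM v WHERE a > 10;

The correct answer is "the three smallest values of a, keeping only those
greater than 10"; pushing "a > 10" below the LIMIT would instead return
the three smallest values among those greater than 10. A post-ORDERBY
selection would apply "a > 10" to rows leaving the ORDER BY ... LIMIT
step, which is equivalent and avoids storing non-matching rows in the
VIEW's materialization.
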
2.2 What condition can be pushed
--------------------------------
Assuming simplify_joins() operation has done normalization:
* If the VIEW is in top-level join list, or inside a semi-join that's in
top-level join list, we can push parts of WHERE condition.
* If the VIEW is inside an outer join, we can push its ON expression.
We can reuse the make_cond_for_index()/make_remainder_cond() code to extract
the part of the condition that can be pushed, and the remainder, respectively.
The pushability criterion for an atomic (i.e. not AND/OR) condition is that
the condition only uses the VIEW's fields.
(TODO: what about fields of const tables? Do we have const tables already
retrieved by the time VIEW is materialized? If yes, we could push down
expressions that refer to const tables, too)
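
For example (names hypothetical), if v is a non-mergeable VIEW in the
top-level join list:

  SELECT * FROM v, t2 WHERE v.a = 5 AND v.b = t2.b AND t2.c > 0;

the conjunct "v.a = 5" uses only v's fields and can be pushed, while the
remainder "v.b = t2.b AND t2.c > 0" stays in the parent's WHERE clause.
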
3. Pushdown from HAVING into WHERE
==================================
The idea is:
Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
can be put into WHERE.
(TODO: do we need to handle the case of grouping over expressions?)
(TODO: when moving an expression from HAVING to WHERE, do we need
to do something with it? Replace all Item_ref objects with items that
they refer to?
- In case of referring to expression, do we get
Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))?
)
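
A sketch of the move (names hypothetical):

  SELECT a, MAX(b) FROM t1 GROUP BY a HAVING a = 5 AND MAX(b) > 10;

Here "a = 5" refers only to a GROUP BY column and can be moved, giving

  SELECT a, MAX(b) FROM t1 WHERE a = 5 GROUP BY a HAVING MAX(b) > 10;

while "MAX(b) > 10" must remain in HAVING.
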
4. When to do the pushdown
==========================
In order to do the pushdown, the prepare phase must be finished for both the
parent (so that we can make sense of its WHERE condition) and the child (so
that we know what it has in its select list).
We can do pushdown before we've done join optimization (i.e. choose_plan()
call) of the parent.
We must do pushdown before we've done JOIN::optimize() of the child
(in particular, it must be done before we do update_ref_and_keys() and
range analysis in the child).
5. Other things to take care of
===============================
* Pushing down fulltext predicates (it seems one needs to "register" a
fulltext predicate when it is moved from one select to another? Ask Serg)
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Push conditions down into non-mergeable VIEWs when possible (119)
by worklog-noreply@askmonty.org 24 May '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Push conditions down into non-mergeable VIEWs when possible
CREATION DATE..: Mon, 24 May 2010, 20:52
SUPERVISOR.....: Igor
IMPLEMENTOR....:
COPIES TO......: Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 119 (http://askmonty.org/worklog/?tid=119)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
There are complaints (see links below) about cases with a non-mergeable
VIEW (because it has a GROUP BY), a query that has restrictions on
the grouped column, and poor performance that is caused by the VIEW
processing code ignoring the restriction.
This WL is about addressing this issue.
links to complaints:
http://code.openark.org/blog/mysql/views-better-performance-with-condition-…
http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-perform…
The target version is MariaDB 5.3+, because it has late optimization/execution
for FROM-subqueries/non-mergeable VIEWs, which makes it much more feasible to
inject something into VIEW before it is optimized/executed.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Rev 2789: Subquery cache (as is) in file:///home/bell/maria/bzr/work-maria-5.3-scache/
by sanja@askmonty.org 24 May '10
At file:///home/bell/maria/bzr/work-maria-5.3-scache/
------------------------------------------------------------
revno: 2789
revision-id: sanja(a)askmonty.org-20100524172956-7b14x01aodizr3sq
parent: sergii(a)pisem.net-20100510134608-oyi2vznyghgcrt0x
committer: sanja(a)askmonty.org
branch nick: work-maria-5.3-scache
timestamp: Mon 2010-05-24 20:29:56 +0300
message:
Subquery cache (as is)
=== modified file 'libmysqld/Makefile.am'
--- a/libmysqld/Makefile.am 2010-03-20 12:01:47 +0000
+++ b/libmysqld/Makefile.am 2010-05-24 17:29:56 +0000
@@ -80,7 +80,8 @@
sql_tablespace.cc \
rpl_injector.cc my_user.c partition_info.cc \
sql_servers.cc event_parse_data.cc opt_table_elimination.cc \
- multi_range_read.cc opt_index_cond_pushdown.cc
+ multi_range_read.cc opt_index_cond_pushdown.cc \
+ sql_subquery_cache.cc
libmysqld_int_a_SOURCES= $(libmysqld_sources)
nodist_libmysqld_int_a_SOURCES= $(libmysqlsources) $(sqlsources)
=== modified file 'sql/CMakeLists.txt'
--- a/sql/CMakeLists.txt 2010-03-20 12:01:47 +0000
+++ b/sql/CMakeLists.txt 2010-05-24 17:29:56 +0000
@@ -78,7 +78,7 @@
rpl_rli.cc rpl_mi.cc sql_servers.cc
sql_connect.cc scheduler.cc
sql_profile.cc event_parse_data.cc opt_table_elimination.cc
- ds_mrr.cc
+ ds_mrr.cc sql_subquery_cache.cc
${PROJECT_SOURCE_DIR}/sql/sql_yacc.cc
${PROJECT_SOURCE_DIR}/sql/sql_yacc.h
${PROJECT_SOURCE_DIR}/include/mysqld_error.h
=== modified file 'sql/Makefile.am'
--- a/sql/Makefile.am 2010-03-20 12:01:47 +0000
+++ b/sql/Makefile.am 2010-05-24 17:29:56 +0000
@@ -80,7 +80,7 @@
event_data_objects.h event_scheduler.h \
sql_partition.h partition_info.h partition_element.h \
contributors.h sql_servers.h \
- multi_range_read.h
+ multi_range_read.h sql_subquery_cache.h
mysqld_SOURCES = sql_lex.cc sql_handler.cc sql_partition.cc \
item.cc item_sum.cc item_buff.cc item_func.cc \
@@ -130,7 +130,7 @@
sql_servers.cc event_parse_data.cc \
opt_table_elimination.cc \
multi_range_read.cc \
- opt_index_cond_pushdown.cc
+ opt_index_cond_pushdown.cc sql_subquery_cache.cc
nodist_mysqld_SOURCES = mini_client_errors.c pack.c client.c my_time.c my_user.c
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2010-03-20 12:01:47 +0000
+++ b/sql/item.cc 2010-05-24 17:29:56 +0000
@@ -2273,6 +2273,13 @@
str->append(str_value);
}
+void Item_bool_cache::print(String *str, enum_query_type query_type)
+{
+ if (null_value)
+ str->append("NULL", 4);
+ else
+ Item_int::print(str, query_type);
+}
Item_uint::Item_uint(const char *str_arg, uint length):
Item_int(str_arg, length)
@@ -3646,12 +3653,17 @@
resolved_item->db_name : "");
const char *table_name= (resolved_item->table_name ?
resolved_item->table_name : "");
+ DBUG_ENTER("mark_as_dependent");
+ DBUG_PRINT("enter", ("Field '%s.%s.%s in select %d resolved in %d",
+ db_name, table_name,
+ resolved_item->field_name, current->select_number,
+ last->select_number));
/* store pointer on SELECT_LEX from which item is dependent */
if (mark_item)
mark_item->depended_from= last;
if (current->mark_as_dependent(thd, last, /** resolved_item psergey-thu
**/mark_item))
- return TRUE;
+ DBUG_RETURN(TRUE);
if (thd->lex->describe & DESCRIBE_EXTENDED)
{
push_warning_printf(thd, MYSQL_ERROR::WARN_LEVEL_NOTE,
@@ -3661,7 +3673,7 @@
resolved_item->field_name,
current->select_number, last->select_number);
}
- return FALSE;
+ DBUG_RETURN(FALSE);
}
@@ -3698,6 +3710,7 @@
resolving)
*/
SELECT_LEX *previous_select= current_sel;
+
for (; previous_select->outer_select() != last_select;
previous_select= previous_select->outer_select())
{
@@ -3726,6 +3739,7 @@
mark_as_dependent(thd, last_select, current_sel, resolved_item,
dependent);
}
+ return;
}
@@ -4098,6 +4112,9 @@
((ref_type == REF_ITEM ||
ref_type == FIELD_ITEM) ?
(Item_ident*) (*reference) : 0));
+ context->select_lex->
+ register_dependency_item(last_checked_context->select_lex,
+ reference);
return 0;
}
}
@@ -4113,7 +4130,9 @@
((ref_type == REF_ITEM || ref_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
0));
-
+ context->select_lex->
+ register_dependency_item(last_checked_context->select_lex,
+ reference);
/*
A reference to a view field had been found and we
substituted it instead of this Item (find_field_in_tables
@@ -4215,6 +4234,10 @@
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex, rf,
rf);
+ context->select_lex->
+ register_dependency_item(last_checked_context->select_lex,
+ reference);
+
return 0;
}
else
@@ -4222,6 +4245,9 @@
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex,
this, (Item_ident*)*reference);
+ context->select_lex->
+ register_dependency_item(last_checked_context->select_lex,
+ reference);
if (last_checked_context->select_lex->having_fix_field)
{
Item_ref *rf;
@@ -5973,6 +5999,9 @@
refer_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
0));
+ context->select_lex->
+ register_dependency_item(last_checked_context->select_lex,
+ reference);
/*
view reference found, we substituted it instead of this
Item, so can quit
@@ -6023,6 +6052,9 @@
thd->change_item_tree(reference, fld);
mark_as_dependent(thd, last_checked_context->select_lex,
thd->lex->current_select, fld, fld);
+ context->select_lex->
+ register_dependency_item(last_checked_context->select_lex,
+ reference);
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
@@ -6046,6 +6078,9 @@
DBUG_ASSERT(*ref && (*ref)->fixed);
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex, this, this);
+ context->select_lex->
+ register_dependency_item(last_checked_context->select_lex,
+ ref);
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
=== modified file 'sql/item.h'
--- a/sql/item.h 2010-03-20 12:01:47 +0000
+++ b/sql/item.h 2010-05-24 17:29:56 +0000
@@ -1143,6 +1143,11 @@
{ return Field::GEOM_GEOMETRY; };
String *check_well_formed_result(String *str, bool send_error= 0);
bool eq_by_collation(Item *item, bool binary_cmp, CHARSET_INFO *cs);
+
+ /**
+ Used to get reference on real item (not Item_ref)
+ */
+ virtual Item **unref(Item **my_ref) { return my_ref; };
};
@@ -1922,8 +1927,31 @@
virtual void print(String *str, enum_query_type query_type);
Item_num *neg ();
uint decimal_precision() const { return max_length; }
- bool check_partition_func_processor(uchar *bool_arg) { return FALSE;}
- bool check_vcol_func_processor(uchar *arg) { return FALSE;}
+};
+
+
+/**
+  Item representing TRUE/FALSE/NULL for subquery values
+*/
+
+class Item_bool_cache: public Item_int
+{
+public:
+ Item_bool_cache(): Item_int(0, 1)
+ {
+ unsigned_flag= maybe_null= null_value= TRUE;
+    name= (char *)"bool cache";
+ }
+ Item_bool_cache(my_bool val, my_bool null): Item_int(val, 1)
+ {
+ unsigned_flag= maybe_null= TRUE;
+ null_value= null;
+    name= (char *)"bool cache";
+ }
+ Item *clone_item() { return new Item_bool_cache(value, null_value); }
+ uint decimal_precision() const { return 1; }
+ virtual void print(String *str, enum_query_type query_type);
+ void set(my_bool val, my_bool null) {value= test(val); null_value= null;}
};
@@ -2479,6 +2507,11 @@
{
return trace_unsupported_by_check_vcol_func_processor("ref");
}
+
+ /**
+ Used to get reference on real item (not Item_ref)
+ */
+ virtual Item **unref(Item **my_ref) {return (*ref)->unref(ref); };
};
@@ -3146,7 +3179,8 @@
example(0), used_table_map(0), cached_field(0), cached_field_type(MYSQL_TYPE_STRING),
value_cached(0)
{
- fixed= 1;
+ fixed= 1;
+ maybe_null= 1;
null_value= 1;
}
Item_cache(enum_field_types field_type_arg):
@@ -3154,6 +3188,7 @@
value_cached(0)
{
fixed= 1;
+ maybe_null= 1;
null_value= 1;
}
=== modified file 'sql/item_cmpfunc.cc'
--- a/sql/item_cmpfunc.cc 2010-03-20 12:01:47 +0000
+++ b/sql/item_cmpfunc.cc 2010-05-24 17:29:56 +0000
@@ -1736,6 +1736,13 @@
used_tables_cache|= args[1]->used_tables();
not_null_tables_cache|= args[1]->not_null_tables();
const_item_cache&= args[1]->const_item();
+ DBUG_ASSERT(scache == NULL);
+ if (args[0]->cols() ==1 &&
+ thd->variables.optimizer_switch & OPTIMIZER_SWITCH_SUBQUERY_CACHE)
+ {
+ sub->depends_on.push_front((Item**)&cache);
+ scache= new Subquery_cache_tmptable(thd, sub->depends_on, &result);
+ }
fixed= 1;
return FALSE;
}
@@ -1744,10 +1751,26 @@
longlong Item_in_optimizer::val_int()
{
bool tmp;
+ DBUG_ENTER("Item_in_optimizer::val_int");
+
DBUG_ASSERT(fixed == 1);
cache->store(args[0]);
cache->cache_value();
-
+
+ /* check if result is in the cache */
+ if (scache)
+ {
+ Subquery_cache_tmptable::result res;
+ Item *cached_value;
+ res= scache->check_value(&cached_value);
+ if (res == Subquery_cache_tmptable::HIT)
+ {
+ tmp= cached_value->val_int();
+ null_value= cached_value->null_value;
+ DBUG_RETURN(tmp);
+ }
+ }
+
if (cache->null_value)
{
/*
@@ -1818,11 +1841,18 @@
for (uint i= 0; i < ncols; i++)
item_subs->set_cond_guard_var(i, TRUE);
}
- return 0;
+ DBUG_RETURN(0);
}
tmp= args[1]->val_bool_result();
null_value= args[1]->null_value;
- return tmp;
+
+ /* put result in the cache */
+ if (scache)
+ {
+ result.set(tmp, null_value);
+ scache->put_value(&result);
+ }
+ DBUG_RETURN(tmp);
}
@@ -1839,6 +1869,11 @@
Item_bool_func::cleanup();
if (!save_cache)
cache= 0;
+ if (scache)
+ {
+ delete scache;
+ scache= 0;
+ }
DBUG_VOID_RETURN;
}
=== modified file 'sql/item_cmpfunc.h'
--- a/sql/item_cmpfunc.h 2010-03-20 12:01:47 +0000
+++ b/sql/item_cmpfunc.h 2010-05-24 17:29:56 +0000
@@ -215,6 +215,7 @@
class Item_cache;
+class Subquery_cache;
#define UNKNOWN ((my_bool)-1)
@@ -237,6 +238,10 @@
{
protected:
Item_cache *cache;
+ /* Subquery cache */
+ Subquery_cache *scache;
+ /* result representation for the subquery cache */
+ Item_bool_cache result;
bool save_cache;
/*
Stores the value of "NULL IN (SELECT ...)" for uncorrelated subqueries:
@@ -247,7 +252,7 @@
my_bool result_for_null_param;
public:
Item_in_optimizer(Item *a, Item_in_subselect *b):
- Item_bool_func(a, my_reinterpret_cast(Item *)(b)), cache(0),
+ Item_bool_func(a, my_reinterpret_cast(Item *)(b)), cache(0), scache(NULL),
save_cache(0), result_for_null_param(UNKNOWN)
{}
bool fix_fields(THD *, Item **);
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2010-03-29 14:04:35 +0000
+++ b/sql/item_subselect.cc 2010-05-24 17:29:56 +0000
@@ -1,4 +1,4 @@
-/* Copyright (C) 2000 MySQL AB
+/* Copyright (C) 2000 MySQL AB
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -34,11 +34,10 @@
Item_subselect::Item_subselect():
Item_result_field(), value_assigned(0), thd(0), substitution(0),
- engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0),
- const_item_cache(1),
- inside_first_fix_fields(0), done_first_fix_fields(FALSE),
- eliminated(FALSE),
- engine_changed(0), changed(0), is_correlated(FALSE)
+ engine(0), old_engine(0), scache(0), used_tables_cache(0),
+ have_to_be_excluded(0), const_item_cache(1), inside_first_fix_fields(0),
+ done_first_fix_fields(FALSE), eliminated(FALSE), engine_changed(0),
+ changed(0), is_correlated(FALSE)
{
with_subselect= 1;
reset();
@@ -116,6 +115,12 @@
}
if (engine)
engine->cleanup();
+ depends_on.empty();
+ if (scache)
+ {
+ delete scache;
+ scache= 0;
+ }
reset();
value_assigned= 0;
DBUG_VOID_RETURN;
@@ -148,6 +153,8 @@
Item_subselect::~Item_subselect()
{
delete engine;
+ if (scache)
+ delete scache;
}
Item_subselect::trans_res
@@ -746,9 +753,19 @@
void Item_singlerow_subselect::fix_length_and_dec()
{
+ DBUG_ENTER("Item_singlerow_subselect::fix_length_and_dec");
if ((max_columns= engine->cols()) == 1)
{
+ DBUG_PRINT("info", ("one, elements: %u flag %u",
+ (uint)depends_on.elements,
+ (uint)test(thd->variables.optimizer_switch & OPTIMIZER_SWITCH_SUBQUERY_CACHE)));
engine->fix_length_and_dec(row= &value);
+ if (depends_on.elements && optimizer_flag(thd, OPTIMIZER_SWITCH_SUBQUERY_CACHE))
+ {
+ DBUG_ASSERT(scache == NULL);
+ scache= new Subquery_cache_tmptable(thd, depends_on, value);
+ DBUG_PRINT("info", ("cache: 0x%lx", (ulong) scache));
+ }
}
else
{
@@ -765,6 +782,7 @@
*/
if (engine->no_tables())
maybe_null= engine->may_be_null();
+ DBUG_VOID_RETURN;
}
uint Item_singlerow_subselect::cols()
@@ -797,77 +815,200 @@
exec();
}
+
+Item *Item_subselect::check_cache()
+{
+ DBUG_ENTER("Item_subselect::check_cache");
+ if (scache)
+ {
+ Subquery_cache_tmptable::result res;
+ Item *cached_value;
+ res= scache->check_value(&cached_value);
+ if (res == Subquery_cache_tmptable::HIT)
+ DBUG_RETURN(cached_value);
+ }
+ DBUG_RETURN(NULL);
+}
+
double Item_singlerow_subselect::val_real()
{
+ Item *cached_value;
+ bool err;
+ DBUG_ENTER("Item_singlerow_subselect::val_real");
DBUG_ASSERT(fixed == 1);
- if (!exec() && !value->null_value)
+
+ if ((cached_value = check_cache()))
+ {
+ double res= cached_value->val_real();
+ if ((null_value= cached_value->null_value))
+ {
+ reset();
+ DBUG_RETURN(0);
+ }
+ else
+ DBUG_RETURN(res);
+ }
+
+ if (!(err= exec()) && !value->null_value)
{
null_value= 0;
- return value->val_real();
+ if (scache)
+ scache->put_value(value);
+ DBUG_RETURN(value->val_real());
}
else
{
reset();
- return 0;
+ DBUG_PRINT("info", ("error: %u", (uint)err));
+ if (scache && !err)
+ scache->put_value(&const_null_value);
+ DBUG_RETURN(0);
}
}
longlong Item_singlerow_subselect::val_int()
{
+ Item *cached_value;
+ bool err;
+ DBUG_ENTER("Item_singlerow_subselect::val_int");
DBUG_ASSERT(fixed == 1);
- if (!exec() && !value->null_value)
+
+ if ((cached_value = check_cache()))
+ {
+ longlong res= cached_value->val_int();
+ if ((null_value= cached_value->null_value))
+ {
+ reset();
+ DBUG_RETURN(0);
+ }
+ else
+ DBUG_RETURN(res);
+ }
+
+ if (!(err= exec()) && !value->null_value)
{
null_value= 0;
- return value->val_int();
+ if (scache)
+ scache->put_value(value);
+ DBUG_RETURN(value->val_int());
}
else
{
reset();
- return 0;
+ DBUG_PRINT("info", ("error: %u", (uint)err));
+ if (scache && !err)
+ scache->put_value(&const_null_value);
+ DBUG_RETURN(0);
}
}
String *Item_singlerow_subselect::val_str(String *str)
{
- if (!exec() && !value->null_value)
+ Item *cached_value;
+ bool err;
+ DBUG_ENTER("Item_singlerow_subselect::val_str");
+ DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ String *res= cached_value->val_str(str);
+ if ((null_value= cached_value->null_value))
+ {
+ reset();
+ DBUG_RETURN(0);
+ }
+ else
+ DBUG_RETURN(res);
+ }
+
+ if (!(err= exec()) && !value->null_value)
{
null_value= 0;
- return value->val_str(str);
+ if (scache)
+ scache->put_value(value);
+ DBUG_RETURN(value->val_str(str));
}
else
{
reset();
- return 0;
+ DBUG_PRINT("info", ("error: %u", (uint)err));
+ if (scache && !err)
+ scache->put_value(&const_null_value);
+ DBUG_RETURN(0);
}
}
my_decimal *Item_singlerow_subselect::val_decimal(my_decimal *decimal_value)
{
- if (!exec() && !value->null_value)
+ Item *cached_value;
+ bool err;
+ DBUG_ENTER("Item_singlerow_subselect::val_decimal");
+ DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ my_decimal *res= cached_value->val_decimal(decimal_value);
+ if ((null_value= cached_value->null_value))
+ {
+ reset();
+ DBUG_RETURN(0);
+ }
+ else
+ DBUG_RETURN(res);
+ }
+
+ if (!(err= exec()) && !value->null_value)
{
null_value= 0;
- return value->val_decimal(decimal_value);
+ if (scache)
+ scache->put_value(value);
+ DBUG_RETURN(value->val_decimal(decimal_value));
}
else
{
reset();
- return 0;
+ DBUG_PRINT("info", ("error: %u", (uint)err));
+ if (scache && !err)
+ scache->put_value(&const_null_value);
+ DBUG_RETURN(0);
}
}
bool Item_singlerow_subselect::val_bool()
{
- if (!exec() && !value->null_value)
+ Item *cached_value;
+ bool err;
+ DBUG_ENTER("Item_singlerow_subselect::val_bool");
+ DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ bool res= cached_value->val_bool();
+ if ((null_value= cached_value->null_value))
+ {
+ reset();
+ DBUG_RETURN(0);
+ }
+ else
+ DBUG_RETURN(res);
+ }
+
+ if (!(err= exec()) && !value->null_value)
{
null_value= 0;
- return value->val_bool();
+ if (scache)
+ scache->put_value(value);
+ DBUG_RETURN(value->val_bool());
}
else
{
reset();
- return 0;
+ DBUG_PRINT("info", ("error: %u", (uint)err));
+ if (scache && !err)
+ scache->put_value(&const_null_value);
+ DBUG_RETURN(0);
}
}
@@ -952,33 +1093,77 @@
void Item_exists_subselect::fix_length_and_dec()
{
+ DBUG_ENTER("Item_exists_subselect::fix_length_and_dec");
decimals= 0;
max_length= 1;
max_columns= engine->cols();
/* We need only 1 row to determine existence */
unit->global_parameters->select_limit= new Item_int((int32) 1);
+ if (substype() == EXISTS_SUBS && depends_on.elements &&
+ optimizer_flag(thd, OPTIMIZER_SWITCH_SUBQUERY_CACHE))
+ {
+ DBUG_ASSERT(scache == NULL);
+ scache= new Subquery_cache_tmptable(thd, depends_on, &result);
+ DBUG_PRINT("info", ("cache: 0x%lx", (ulong) scache));
+ }
+ DBUG_VOID_RETURN;
}
double Item_exists_subselect::val_real()
{
+ Item *cached_value;
+ DBUG_ENTER("Item_exists_subselect::val_int");
DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ double res= cached_value->val_real();
+ DBUG_ASSERT(!cached_value->null_value);
+ DBUG_RETURN(res);
+ }
+
if (exec())
{
reset();
- return 0;
- }
- return (double) value;
+ DBUG_RETURN(0);
+ }
+
+ if (scache)
+ {
+ result.set(value, FALSE);
+ scache->put_value(&result);
+ }
+
+ DBUG_RETURN((double) value);
}
longlong Item_exists_subselect::val_int()
{
+ Item *cached_value;
+ DBUG_ENTER("Item_exists_subselect::val_real");
+ DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ longlong res= cached_value->val_int();
+ DBUG_ASSERT(!cached_value->null_value);
+ DBUG_RETURN(res);
+ }
+
DBUG_ASSERT(fixed == 1);
if (exec())
{
reset();
- return 0;
- }
- return value;
+ DBUG_RETURN(0);
+ }
+
+ if (scache)
+ {
+ result.set(value, FALSE);
+ scache->put_value(&result);
+ }
+
+ DBUG_RETURN(value);
}
@@ -997,11 +1182,31 @@
String *Item_exists_subselect::val_str(String *str)
{
+ Item *cached_value;
+ DBUG_ENTER("Item_exists_subselect::val_str");
DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ String *res= cached_value->val_str(str);
+ DBUG_ASSERT(!cached_value->null_value);
+ DBUG_RETURN(res);
+ }
+
if (exec())
+ {
reset();
+ DBUG_RETURN(NULL);
+ }
+
+ if (scache)
+ {
+ result.set(value, FALSE);
+ scache->put_value(&result);
+ }
+
str->set((ulonglong)value,&my_charset_bin);
- return str;
+ DBUG_RETURN(str);
}
@@ -1020,23 +1225,60 @@
my_decimal *Item_exists_subselect::val_decimal(my_decimal *decimal_value)
{
+ Item *cached_value;
+ DBUG_ENTER("Item_exists_subselect::val_decvimal");
DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ my_decimal *res= cached_value->val_decimal(decimal_value);
+ DBUG_ASSERT(!cached_value->null_value);
+ DBUG_RETURN(res);
+ }
+
if (exec())
+ {
reset();
+ DBUG_RETURN(0);
+ }
+
+ if (scache)
+ {
+ result.set(value, FALSE);
+ scache->put_value(&result);
+ }
+
int2my_decimal(E_DEC_FATAL_ERROR, value, 0, decimal_value);
- return decimal_value;
+ DBUG_RETURN(decimal_value);
}
bool Item_exists_subselect::val_bool()
{
+ Item *cached_value;
+ DBUG_ENTER("Item_exists_subselect::val_real");
DBUG_ASSERT(fixed == 1);
+
+ if ((cached_value = check_cache()))
+ {
+ my_bool res= cached_value->val_bool();
+ DBUG_ASSERT(!cached_value->null_value);
+ DBUG_RETURN(res);
+ }
+
if (exec())
{
reset();
- return 0;
- }
- return value != 0;
+ DBUG_RETURN(0);
+ }
+
+ if (scache)
+ {
+ result.set(value, FALSE);
+ scache->put_value(&result);
+ }
+
+ DBUG_RETURN(value != 0);
}
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2010-03-29 14:04:35 +0000
+++ b/sql/item_subselect.h 2010-05-24 17:29:56 +0000
@@ -27,6 +27,7 @@
class subselect_hash_sj_engine;
class Item_bool_func2;
class Cached_item;
+class Subquery_cache;
/* base class for subselects */
@@ -57,6 +58,10 @@
subselect_engine *engine;
/* old engine if engine was changed */
subselect_engine *old_engine;
+ /* subquery cache */
+ Subquery_cache *scache;
+  /* null constant for caching */
+ Item_null const_null_value;
/* cache of used external tables */
table_map used_tables_cache;
/* allowed number of columns (1 for single value subqueries) */
@@ -67,7 +72,7 @@
bool have_to_be_excluded;
/* cache of constant state */
bool const_item_cache;
-
+
bool inside_first_fix_fields;
bool done_first_fix_fields;
public:
@@ -88,13 +93,18 @@
*/
List<Ref_to_outside> upper_refs;
st_select_lex *parent_select;
-
- /*
+
+ /**
+    List of items the subquery depends on (externally resolved)
+ */
+ List<Item*> depends_on;
+
+ /*
TRUE<=>Table Elimination has made it redundant to evaluate this select
(and so it is not part of QEP, etc)
- */
+ */
bool eliminated;
-
+
/* changed engine indicator */
bool engine_changed;
/* subquery is transformed */
@@ -178,6 +188,8 @@
return trace_unsupported_by_check_vcol_func_processor("subselect");
}
+ Item *check_cache();
+
/**
Get the SELECT_LEX structure associated with this Item.
@return the SELECT_LEX structure associated with this Item
@@ -202,6 +214,7 @@
{
protected:
Item_cache *value, **row;
+
public:
Item_singlerow_subselect(st_select_lex *select_lex);
Item_singlerow_subselect() :Item_subselect(), value(0), row (0) {}
@@ -268,6 +281,8 @@
{
protected:
bool value; /* value of this item (boolean: exists/not-exists) */
+ /* result representation for the subquery cache */
+ Item_bool_cache result;
public:
Item_exists_subselect(st_select_lex *select_lex);
=== modified file 'sql/mysql_priv.h'
--- a/sql/mysql_priv.h 2010-03-20 12:01:47 +0000
+++ b/sql/mysql_priv.h 2010-05-24 17:29:56 +0000
@@ -568,12 +568,13 @@
#define OPTIMIZER_SWITCH_SEMIJOIN 256
#define OPTIMIZER_SWITCH_PARTIAL_MATCH_ROWID_MERGE 512
#define OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN 1024
+#define OPTIMIZER_SWITCH_SUBQUERY_CACHE (1<<11)
#ifdef DBUG_OFF
-# define OPTIMIZER_SWITCH_LAST 2048
+# define OPTIMIZER_SWITCH_LAST (1<<12)
#else
-# define OPTIMIZER_SWITCH_TABLE_ELIMINATION 2048
-# define OPTIMIZER_SWITCH_LAST 4096
+# define OPTIMIZER_SWITCH_TABLE_ELIMINATION (1<<12)
+# define OPTIMIZER_SWITCH_LAST (1<<13)
#endif
#ifdef DBUG_OFF
@@ -588,7 +589,8 @@
OPTIMIZER_SWITCH_MATERIALIZATION | \
OPTIMIZER_SWITCH_SEMIJOIN | \
OPTIMIZER_SWITCH_PARTIAL_MATCH_ROWID_MERGE|\
- OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN)
+ OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN|\
+ OPTIMIZER_SWITCH_SUBQUERY_CACHE)
#else
# define OPTIMIZER_SWITCH_DEFAULT (OPTIMIZER_SWITCH_INDEX_MERGE | \
OPTIMIZER_SWITCH_INDEX_MERGE_UNION | \
@@ -601,7 +603,8 @@
OPTIMIZER_SWITCH_MATERIALIZATION | \
OPTIMIZER_SWITCH_SEMIJOIN | \
OPTIMIZER_SWITCH_PARTIAL_MATCH_ROWID_MERGE|\
- OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN)
+ OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN|\
+ OPTIMIZER_SWITCH_SUBQUERY_CACHE)
#endif
/*
@@ -936,6 +939,7 @@
#ifdef MYSQL_SERVER
#include "sql_servers.h"
#include "opt_range.h"
+#include "sql_subquery_cache.h"
#ifdef HAVE_QUERY_CACHE
struct Query_cache_query_flags
@@ -1269,6 +1273,10 @@
Item *having, ORDER *proc_param, ulonglong select_type,
select_result *result, SELECT_LEX_UNIT *unit,
SELECT_LEX *select_lex);
+
+struct st_join_table *create_index_lookup_join_tab(TABLE *table);
+int join_read_key2(THD *thd, struct st_join_table *tab, TABLE *table,
+ struct st_table_ref *table_ref);
void free_underlaid_joins(THD *thd, SELECT_LEX *select);
bool mysql_explain_union(THD *thd, SELECT_LEX_UNIT *unit,
select_result *result);
@@ -1288,6 +1296,7 @@
bool table_cant_handle_bit_fields,
bool make_copy_field,
uint convert_blob_length);
+bool open_tmp_table(TABLE *table);
void sp_prepare_create_field(THD *thd, Create_field *sql_field);
int prepare_create_field(Create_field *sql_field,
uint *blob_columns,
=== modified file 'sql/mysqld.cc'
--- a/sql/mysqld.cc 2010-03-20 12:01:47 +0000
+++ b/sql/mysqld.cc 2010-05-24 17:29:56 +0000
@@ -305,6 +305,7 @@
"firstmatch","loosescan","materialization", "semijoin",
"partial_match_rowid_merge",
"partial_match_table_scan",
+ "subquery_cache",
#ifndef DBUG_OFF
"table_elimination",
#endif
@@ -325,6 +326,7 @@
sizeof("semijoin") - 1,
sizeof("partial_match_rowid_merge") - 1,
sizeof("partial_match_table_scan") - 1,
+ sizeof("subquery_cache") - 1,
#ifndef DBUG_OFF
sizeof("table_elimination") - 1,
#endif
@@ -404,8 +406,9 @@
static const char *optimizer_switch_str="index_merge=on,index_merge_union=on,"
"index_merge_sort_union=on,"
"index_merge_intersection=on,"
- "index_condition_pushdown=on"
-#ifndef DBUG_OFF
+ "index_condition_pushdown=on,"
+ "subquery_cache=on"
+#ifndef DBUG_OFF
",table_elimination=on";
#else
;
@@ -5872,7 +5875,9 @@
OPT_RECORD_RND_BUFFER, OPT_DIV_PRECINCREMENT, OPT_RELAY_LOG_SPACE_LIMIT,
OPT_RELAY_LOG_PURGE,
OPT_SLAVE_NET_TIMEOUT, OPT_SLAVE_COMPRESSED_PROTOCOL, OPT_SLOW_LAUNCH_TIME,
- OPT_SLAVE_TRANS_RETRIES, OPT_READONLY, OPT_ROWID_MERGE_BUFF_SIZE,
+ OPT_SLAVE_TRANS_RETRIES,
+ OPT_SUBQUERY_CACHE,
+ OPT_READONLY, OPT_ROWID_MERGE_BUFF_SIZE,
OPT_DEBUGGING, OPT_DEBUG_FLUSH,
OPT_SORT_BUFFER, OPT_TABLE_OPEN_CACHE, OPT_TABLE_DEF_CACHE,
OPT_THREAD_CONCURRENCY, OPT_THREAD_CACHE_SIZE,
@@ -7164,7 +7169,7 @@
{"optimizer_switch", OPT_OPTIMIZER_SWITCH,
"optimizer_switch=option=val[,option=val...], where option={index_merge, "
"index_merge_union, index_merge_sort_union, index_merge_intersection, "
- "index_condition_pushdown"
+ "index_condition_pushdown, subquery_cache"
#ifndef DBUG_OFF
", table_elimination"
#endif
@@ -7868,6 +7873,8 @@
{"Ssl_version", (char*) &show_ssl_get_version, SHOW_FUNC},
#endif /* HAVE_OPENSSL */
{"Syncs", (char*) &my_sync_count, SHOW_LONG_NOFLUSH},
+ {"Subquery_cache_hit", (char*) &subquery_cache_hit, SHOW_LONG},
+ {"Subquery_cache_miss", (char*) &subquery_cache_miss, SHOW_LONG},
{"Table_locks_immediate", (char*) &locks_immediate, SHOW_LONG},
{"Table_locks_waited", (char*) &locks_waited, SHOW_LONG},
#ifdef HAVE_MMAP
@@ -8006,6 +8013,7 @@
abort_loop= select_thread_in_use= signal_thread_in_use= 0;
ready_to_exit= shutdown_in_progress= grant_option= 0;
aborted_threads= aborted_connects= 0;
+ subquery_cache_miss= subquery_cache_hit= 0;
delayed_insert_threads= delayed_insert_writes= delayed_rows_in_use= 0;
delayed_insert_errors= thread_created= 0;
specialflag= 0;
=== modified file 'sql/sql_base.cc'
--- a/sql/sql_base.cc 2010-03-20 12:01:47 +0000
+++ b/sql/sql_base.cc 2010-05-24 17:29:56 +0000
@@ -8062,6 +8062,10 @@
if (*conds)
{
thd->where="where clause";
+ DBUG_EXECUTE("where",
+ print_where(*conds,
+ "WHERE in setup_conds",
+ QT_ORDINARY););
if ((!(*conds)->fixed && (*conds)->fix_fields(thd, conds)) ||
(*conds)->check_cols(1))
goto err_no_arena;
=== modified file 'sql/sql_class.cc'
--- a/sql/sql_class.cc 2010-03-20 12:01:47 +0000
+++ b/sql/sql_class.cc 2010-05-24 17:29:56 +0000
@@ -3020,6 +3020,7 @@
table_charset= 0;
precomputed_group_by= 0;
bit_fields_as_long= 0;
+ skip_create_table= 0;
DBUG_VOID_RETURN;
}
=== modified file 'sql/sql_class.h'
--- a/sql/sql_class.h 2010-03-20 12:01:47 +0000
+++ b/sql/sql_class.h 2010-05-24 17:29:56 +0000
@@ -2786,12 +2786,17 @@
that MEMORY tables cannot index BIT columns.
*/
bool bit_fields_as_long;
+ /*
+ Whether to create or postpone actual creation of this temporary table.
+ TRUE <=> create_tmp_table will create only the TABLE structure.
+ */
+ bool skip_create_table;
TMP_TABLE_PARAM()
:copy_field(0), group_parts(0),
group_length(0), group_null_parts(0), convert_blob_length(0),
schema_table(0), precomputed_group_by(0), force_copy_fields(0),
- bit_fields_as_long(0)
+ bit_fields_as_long(0), skip_create_table(0)
{}
~TMP_TABLE_PARAM()
{
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2010-03-20 12:01:47 +0000
+++ b/sql/sql_lex.cc 2010-05-24 17:29:56 +0000
@@ -1829,6 +1829,53 @@
}
+/**
+  Registers a reference to an item on which the subqueries depend
+
+ @param last pointer to last st_select_lex struct, before
+ which all st_select_lex have to be marked as
+ dependent
+  @param dependency reference to the item on which all these
+                    subqueries depend
+
+*/
+
+void st_select_lex::register_dependency_item(st_select_lex *last,
+ Item **dependency)
+{
+ SELECT_LEX *s= this;
+ DBUG_ENTER("st_select_lex::register_dependency_item");
+ DBUG_ASSERT(this != last);
+ DBUG_ASSERT(*dependency);
+ dependency= (*dependency)->unref(dependency);
+ do
+ {
+ /* check duplicates */
+ List_iterator_fast<Item*> li(s->master_unit()->item->depends_on);
+ Item **dep;
+ while ((dep= li++))
+ {
+ if ((*dep)->eq(*dependency, FALSE))
+ {
+ DBUG_PRINT("info", ("dependency %s already present",
+ ((*dependency)->name ?
+ (*dependency)->name :
+ "<no name>")));
+ DBUG_VOID_RETURN;
+ }
+ }
+
+ s->master_unit()->item->depends_on.push_back(dependency);
+ DBUG_PRINT("info", ("depends_on: Select: %d added: %s",
+ s->select_number,
+ ((*dependency)->name ?
+ (*dependency)->name :
+ "<no name>")));
+ } while ((s= s->outer_select()) != last && s != 0);
+ DBUG_VOID_RETURN;
+}
+
+
/*
st_select_lex_node::mark_as_dependent mark all st_select_lex struct from
this to 'last' as dependent
@@ -1843,7 +1890,7 @@
bool st_select_lex::mark_as_dependent(THD *thd, st_select_lex *last, Item *dependency)
{
-
+ DBUG_ENTER("st_select_lex::mark_as_dependent");
DBUG_ASSERT(this != last);
/*
@@ -1872,11 +1919,11 @@
Item_subselect *subquery_expr= s->master_unit()->item;
if (subquery_expr && subquery_expr->mark_as_dependent(thd, last,
dependency))
- return TRUE;
+ DBUG_RETURN(TRUE);
} while ((s= s->outer_select()) != last && s != 0);
is_correlated= TRUE;
this->master_unit()->item->is_correlated= TRUE;
- return FALSE;
+ DBUG_RETURN(FALSE);
}
bool st_select_lex_node::set_braces(bool value) { return 1; }
=== modified file 'sql/sql_lex.h'
--- a/sql/sql_lex.h 2010-03-20 12:01:47 +0000
+++ b/sql/sql_lex.h 2010-05-24 17:29:56 +0000
@@ -748,6 +748,7 @@
}
bool mark_as_dependent(THD *thd, st_select_lex *last, Item *dependency);
+ void register_dependency_item(st_select_lex *last, Item **dependency);
bool set_braces(bool value);
bool inc_in_sum_expr();
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2010-05-10 13:46:08 +0000
+++ b/sql/sql_select.cc 2010-05-24 17:29:56 +0000
@@ -151,7 +151,6 @@
static int join_read_system(JOIN_TAB *tab);
static int join_read_const(JOIN_TAB *tab);
static int join_read_key(JOIN_TAB *tab);
-static int join_read_key2(JOIN_TAB *tab, TABLE *table, TABLE_REF *table_ref);
static void join_read_key_unlock_row(st_join_table *tab);
static int join_read_always_key(JOIN_TAB *tab);
static int join_read_last_key(JOIN_TAB *tab);
@@ -5209,7 +5208,7 @@
'join->best_positions' contains a complete optimal extension of the
current partial QEP.
*/
- DBUG_EXECUTE("opt", print_plan(join, join->tables,
+ DBUG_EXECUTE("opt", print_plan(join, n_tables,
record_count, read_time, read_time,
"optimal"););
DBUG_RETURN(FALSE);
@@ -7625,6 +7624,40 @@
/**
+  Creates and fills a JOIN_TAB for an index lookup in a temporary table
+
+  @param table The table to look up in
+
+ @return JOIN_TAB object or NULL in case of error
+*/
+
+JOIN_TAB *create_index_lookup_join_tab(TABLE *table)
+{
+ JOIN_TAB *tab;
+ DBUG_ENTER("create_index_lookup_join_tab");
+
+ if (!((tab= new JOIN_TAB)))
+ DBUG_RETURN(NULL);
+ tab->read_record.table= table;
+ tab->read_record.file=table->file;
+ /*tab->read_record.unlock_row= rr_unlock_row;*/
+ tab->next_select=0;
+ tab->sorted= 1;
+
+ table->status= STATUS_NO_RECORD;
+ tab->read_first_record= join_read_key;
+ /*tab->read_record.unlock_row= join_read_key_unlock_row;*/
+ tab->read_record.read_record= join_no_more_records;
+ if (table->covering_keys.is_set(tab->ref.key) &&
+ !table->no_keyread)
+ {
+ table->key_read=1;
+ table->file->extra(HA_EXTRA_KEYREAD);
+ }
+ DBUG_RETURN(tab);
+}
+
+/**
Give error if we some tables are done with a full join.
This is used by multi_table_update and multi_table_delete when running
@@ -10778,6 +10811,7 @@
case Item::REF_ITEM:
case Item::NULL_ITEM:
case Item::VARBIN_ITEM:
+ case Item::CACHE_ITEM:
if (make_copy_field)
{
DBUG_ASSERT(((Item_result_field*)item)->result_field);
@@ -11552,7 +11586,8 @@
¶m->recinfo, select_options))
goto err;
}
- if (open_tmp_table(table))
+ DBUG_PRINT("info", ("skip_create_table: %d", (int)param->skip_create_table));
+ if (!param->skip_create_table && open_tmp_table(table))
goto err;
thd->mem_root= mem_root_save;
@@ -11700,16 +11735,17 @@
bool open_tmp_table(TABLE *table)
{
int error;
+ DBUG_ENTER("open_tmp_table");
if ((error= table->file->ha_open(table, table->s->table_name.str, O_RDWR,
HA_OPEN_TMP_TABLE |
HA_OPEN_INTERNAL_TABLE)))
{
table->file->print_error(error,MYF(0)); /* purecov: inspected */
table->db_stat=0;
- return(1);
+ DBUG_RETURN(1);
}
(void) table->file->extra(HA_EXTRA_QUICK); /* Faster */
- return(0);
+ DBUG_RETURN(0);
}
@@ -12540,7 +12576,8 @@
else
{
/* Do index lookup in the materialized table */
- if ((res= join_read_key2(join_tab, sjm->table, sjm->tab_ref)) == 1)
+ if ((res= join_read_key2(join_tab->join->thd, join_tab,
+ sjm->table, sjm->tab_ref)) == 1)
DBUG_RETURN(NESTED_LOOP_ERROR); /* purecov: inspected */
if (res || !sjm->in_equality->val_int())
DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
@@ -13323,61 +13360,61 @@
static int
join_read_key(JOIN_TAB *tab)
{
- return join_read_key2(tab, tab->table, &tab->ref);
+ return join_read_key2(tab->join->thd, tab, tab->table, &tab->ref);
}
-/*
+/*
eq_ref access handler but generalized a bit to support TABLE and TABLE_REF
not from the join_tab. See join_read_key for detailed synopsis.
*/
-static int
-join_read_key2(JOIN_TAB *tab, TABLE *table, TABLE_REF *table_ref)
+int join_read_key2(THD *thd, JOIN_TAB *tab, TABLE *table, TABLE_REF *table_ref)
{
int error;
+ DBUG_ENTER("join_read_key2");
if (!table->file->inited)
{
table->file->ha_index_init(table_ref->key, tab->sorted);
}
/* TODO: Why don't we do "Late NULLs Filtering" here? */
- if (cmp_buffer_with_ref(tab->join->thd, table, table_ref) ||
+ if (cmp_buffer_with_ref(thd, table, table_ref) ||
(table->status & (STATUS_GARBAGE | STATUS_NO_PARENT | STATUS_NULL_ROW)))
{
if (table_ref->key_err)
{
table->status=STATUS_NOT_FOUND;
- return -1;
+ DBUG_RETURN(-1);
}
/*
Moving away from the current record. Unlock the row
in the handler if it did not match the partial WHERE.
*/
- if (tab->ref.has_record && tab->ref.use_count == 0)
+ if (table_ref->has_record && table_ref->use_count == 0)
{
tab->read_record.file->unlock_row();
- tab->ref.has_record= FALSE;
+ table_ref->has_record= FALSE;
}
error=table->file->ha_index_read_map(table->record[0],
table_ref->key_buff,
make_prev_keypart_map(table_ref->key_parts),
HA_READ_KEY_EXACT);
if (error && error != HA_ERR_KEY_NOT_FOUND && error != HA_ERR_END_OF_FILE)
- return report_error(table, error);
+ DBUG_RETURN(report_error(table, error));
if (! error)
{
- tab->ref.has_record= TRUE;
- tab->ref.use_count= 1;
+ table_ref->has_record= TRUE;
+ table_ref->use_count= 1;
}
}
else if (table->status == 0)
{
- DBUG_ASSERT(tab->ref.has_record);
- tab->ref.use_count++;
+ DBUG_ASSERT(table_ref->has_record);
+ table_ref->use_count++;
}
table->null_row=0;
- return table->status ? -1 : 0;
+ DBUG_RETURN(table->status ? -1 : 0);
}
=== modified file 'sql/table.cc'
--- a/sql/table.cc 2010-03-20 12:01:47 +0000
+++ b/sql/table.cc 2010-05-24 17:29:56 +0000
@@ -20,6 +20,7 @@
#include "sql_trigger.h"
#include <m_ctype.h>
#include "my_md5.h"
+#include "my_bit.h"
/* INFORMATION_SCHEMA name */
LEX_STRING INFORMATION_SCHEMA_NAME= {C_STRING_WITH_LEN("information_schema")};
@@ -5096,6 +5097,115 @@
file->column_bitmaps_signal();
}
+
+/**
+ @brief
+ Allocate space for keys
+
+ @param key_count number of keys to allocate.
+
+ @details
+    Allocate enough space to fit 'key_count' keys for this table.
+
+ @return FALSE space was successfully allocated.
+  @return TRUE an error occurred.
+*/
+
+bool TABLE::alloc_keys(uint key_count)
+{
+ DBUG_ASSERT(!s->keys);
+ key_info= s->key_info= (KEY*) my_malloc(sizeof(KEY)*key_count, MYF(0));
+ max_keys= key_count;
+ return !(key_info);
+}
+
+
+/**
+ @brief Adds one key to a temporary table.
+
+ @param key_parts bitmap of fields that take a part in the key.
+ @param key_name name of the key
+
+ @details
+  Creates a key for this table from the fields that correspond to the bits
+  set to 1 in the 'key_parts' bitmap. The name 'key_name' is given to the
+  newly created key.
+
+ @return <0 an error occur.
+ @return >=0 number of newly added key.
+*/
+
+int TABLE::add_tmp_key(ulonglong key_parts, const char *key_name)
+{
+ DBUG_ASSERT(s->keys< max_keys);
+
+ KEY* keyinfo;
+ Field **reg_field;
+ uint i;
+ bool key_start= TRUE;
+ uint key_part_count= my_count_bits(key_parts);
+ KEY_PART_INFO* key_part_info=
+ (KEY_PART_INFO*) my_malloc(sizeof(KEY_PART_INFO)* key_part_count, MYF(0));
+ if (!key_part_info)
+ return -1;
+ keyinfo= key_info + s->keys;
+ keyinfo->key_part=key_part_info;
+ keyinfo->usable_key_parts=keyinfo->key_parts= key_part_count;
+ keyinfo->key_length=0;
+ keyinfo->algorithm= HA_KEY_ALG_UNDEF;
+ keyinfo->name= (char *)key_name;
+ keyinfo->flags= HA_GENERATED_KEY;
+ keyinfo->rec_per_key= (ulong*)my_malloc(sizeof(ulong)*key_part_count, MYF(0));
+ if (!keyinfo->rec_per_key)
+ return -1;
+ bzero(keyinfo->rec_per_key, sizeof(ulong)*key_part_count);
+ for (i= 0, reg_field=field ;
+ *reg_field;
+ i++, reg_field++)
+ {
+ if (!(key_parts & (1 << i)))
+ continue;
+ if (key_start)
+ (*reg_field)->key_start.set_bit(s->keys);
+ key_start= FALSE;
+ (*reg_field)->part_of_key.set_bit(s->keys);
+ (*reg_field)->flags|= PART_KEY_FLAG;
+ key_part_info->null_bit= (*reg_field)->null_bit;
+ key_part_info->null_offset= (uint) ((*reg_field)->null_ptr -
+ (uchar*) record[0]);
+ key_part_info->field= *reg_field;
+ key_part_info->offset= (*reg_field)->offset(record[0]);
+ key_part_info->length= (uint16) (*reg_field)->pack_length();
+ keyinfo->key_length+= key_part_info->length;
+ /* TODO:
+ The below method of computing the key format length of the
+ key part is a copy/paste from opt_range.cc, and table.cc.
+ This should be factored out, e.g. as a method of Field.
+ In addition it is not clear if any of the Field::*_length
+ methods is supposed to compute the same length. If so, it
+ might be reused.
+ */
+ key_part_info->store_length= key_part_info->length;
+
+ if ((*reg_field)->real_maybe_null())
+ key_part_info->store_length+= HA_KEY_NULL_LENGTH;
+ if ((*reg_field)->type() == MYSQL_TYPE_BLOB ||
+ (*reg_field)->real_type() == MYSQL_TYPE_VARCHAR)
+ key_part_info->store_length+= HA_KEY_BLOB_LENGTH;
+
+ key_part_info->type= (uint8) (*reg_field)->key_type();
+ key_part_info->key_type =
+ ((ha_base_keytype) key_part_info->type == HA_KEYTYPE_TEXT ||
+ (ha_base_keytype) key_part_info->type == HA_KEYTYPE_VARTEXT1 ||
+ (ha_base_keytype) key_part_info->type == HA_KEYTYPE_VARTEXT2) ?
+ 0 : FIELDFLAG_BINARY;
+ key_part_info++;
+ }
+ set_if_bigger(s->max_key_length, keyinfo->key_length);
+ return ++s->keys - 1;
+}
+
+
/**
@brief Check if this is part of a MERGE table with attached children.
=== modified file 'sql/table.h'
--- a/sql/table.h 2010-03-20 12:01:47 +0000
+++ b/sql/table.h 2010-05-24 17:29:56 +0000
@@ -781,6 +781,7 @@
uint temp_pool_slot; /* Used by intern temp tables */
uint status; /* What's in record[0] */
uint db_stat; /* mode of file as in handler.h */
+ uint max_keys; /* Size of allocated key_info array. */
/* number of select if it is derived table */
uint derived_select_number;
int current_lock; /* Type of lock on table */
@@ -914,6 +915,8 @@
inline bool needs_reopen_or_name_lock()
{ return s->version != refresh_version; }
bool is_children_attached(void);
+ bool alloc_keys(uint key_count);
+ int add_tmp_key(ulonglong key_parts, const char *key_name);
};
enum enum_schema_table_state
=== modified file 'storage/maria/ha_maria.cc'
--- a/storage/maria/ha_maria.cc 2010-03-20 12:01:47 +0000
+++ b/storage/maria/ha_maria.cc 2010-05-24 17:29:56 +0000
@@ -995,6 +995,8 @@
{
MARIA_HA *tmp= file;
file= 0;
+ if (!tmp)
+ return 0;
return maria_close(tmp);
}