developers
Threads by month
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
February 2010
- 24 participants
- 291 discussions
[Maria-developers] Updated (by Monty): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 11 Feb '10
by worklog-noreply@askmonty.org 11 Feb '10
11 Feb '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Monty - Thu, 11 Feb 2010, 19:59)=-=-
High Level Description modified.
--- /tmp/wklog.34.old.17915 2010-02-11 17:59:17.000000000 +0000
+++ /tmp/wklog.34.new.17915 2010-02-11 17:59:17.000000000 +0000
@@ -1,5 +1,21 @@
-Add support for Google Protocol Buffers (further GPB). It should be possible
-to have columns that store GPB-encoded data, as well as use SQL constructs to
+Add support for dynamic columns:
+
+- A column that can hold information from many columns
+- One can instantly add or remove column data
+
+This is a useful feature for any store type of application, where you want to
+store different type of information for different kind of items.
+
+For example, for shoes you want to store: material, size, colour, maker
+For a computer you want to store ram, hard disk size etc...
+
+In a normal 'relational' system you would need to a table for each type.
+With dynamic columns you have all common items as fixed fields (like
+product_code, manufacturer, price) and the rest stored in a dynamic column.
+
+The proposed idea is to store the dynamic information in a blob in
+Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
+
Any support for indexing GPB data is outside of scope of this WL entry.
-=-=(Knielsen - Fri, 22 Jan 2010, 11:38)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.29965 2010-01-22 11:38:57.000000000 +0200
+++ /tmp/wklog.34.new.29965 2010-01-22 11:38:57.000000000 +0200
@@ -2,3 +2,12 @@
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
+* We should have both server-side support and client-side support (client side
+ means functions in libmysqlclient so that user can select the full BLOB and
+ extract fields in the application).
+
+* Add some kind of header to the GPB blob to support versioning and future
+ extensibility.
+
+* Add complete syntax description (update, add, drop, exists, ...).
+
-=-=(Psergey - Tue, 21 Jul 2009, 21:13)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.6462 2009-07-21 21:13:13.000000000 +0300
+++ /tmp/wklog.34.new.6462 2009-07-21 21:13:13.000000000 +0300
@@ -1 +1,4 @@
+* GPB tarball contains a protocol definition for .proto file structure itself
+ and a parser for text form of .proto file which then exposes the parsed
+ file via standard GPB message navigation API.
-=-=(Psergey - Tue, 21 Jul 2009, 21:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.6399 2009-07-21 21:12:23.000000000 +0300
+++ /tmp/wklog.34.new.6399 2009-07-21 21:12:23.000000000 +0300
@@ -1 +1,78 @@
+<contents>
+1. GPB Encoding overview
+2. GPB in an SQL database
+2.1 Informing server about GPB field names and types
+2.2 Addressing GPB fields
+2.2.1 Option1: SQL Function
+2.2.2 Option2: SQL columns
+</contents>
+
+
+1. GPB Encoding overview
+========================
+
+GBB is a compact encoding for structured and typed data. A unit of GPB data
+(it is called message) is only partially self-describing: it's possible to
+iterate over its parts, but, quoting the spec
+
+http://code.google.com/apis/protocolbuffers/docs/encoding.html:
+ " the name and declared type for each field can only be determined on the
+ decoding end by referencing the message type's definition (i.e. the .proto
+ file). "
+
+2. GPB in an SQL database
+=========================
+
+It is possible to store GPB data in MariaDB today - one can declare a binary
+blob column and use it to store GPB messages. Storing and retrieving entire
+messages will be the only available operations, though, as the server has no
+idea about the GPB format.
+It is apparent that ability to peek inside GPB data from SQL layer would be of
+great advantage: one would be able to
+- select only certain fields or parts of GPB messages
+- filter records based on the values of GPB fields
+- etc
+performing such operations at SQL layer will allow to reduce client<->server
+traffic right away, and will open path to getting the best possible
+performance.
+
+2.1 Informing server about GPB field names and types
+----------------------------------------------------
+User-friendly/meaningful access to GPB fields requires knowledge of GPB field
+names and types, which are not available from GPB message itself (see "GPB
+encoding overview" section).
+
+So the first issue to be addressed is to get the server to know the definition
+of stored messages. We intend to assume that all records have GPB messages
+that conform to a certain single definition, which gives one definition per
+GPB field.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify either the
+definition itself or path to .proto file with the definition.
+
+2.2 Addressing GPB fields
+-------------------------
+We'll need to provide a way to access GPB fields. This can be complicated as
+structures that are encoded in GPB message can be nested and recursive.
+
+2.2.1 Option1: SQL Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Introduce an SQL function GPB_FIELD(path) which will return contents of the
+field.
+- Return type of the function will be determined from GPB message definition.
+- For path, we can use XPath selector (a subset of XPath) syntax.
+
+(TODO ^ the above needs to be specified in more detail. is the selector as
+simple as filesystem path or we allow quantifiers (with predicates?)?)
+
+2.2.2 Option2: SQL columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make GPB columns to be accessible as SQL columns.
+This approach has problems:
+- It might be hard to implement code-wise
+ - (TODO will Virtual columns patch help??)
+- It is not clear how to access fields from nested structures. Should we allow
+ quoted names like `foo/bar[2]/baz' ?
+
DESCRIPTION:
Add support for dynamic columns:
- A column that can hold information from many columns
- One can instantly add or remove column data
This is a useful feature for any store type of application, where you want to
store different type of information for different kind of items.
For example, for shoes you want to store: material, size, colour, maker
For a computer you want to store ram, hard disk size etc...
In a normal 'relational' system you would need to a table for each type.
With dynamic columns you have all common items as fixed fields (like
product_code, manufacturer, price) and the rest stored in a dynamic column.
The proposed idea is to store the dynamic information in a blob in
Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
Any support for indexing GPB data is outside of scope of this WL entry.
HIGH-LEVEL SPECIFICATION:
<contents>
1. GPB Encoding overview
2. GPB in an SQL database
2.1 Informing server about GPB field names and types
2.2 Addressing GPB fields
2.2.1 Option1: SQL Function
2.2.2 Option2: SQL columns
</contents>
1. GPB Encoding overview
========================
GBB is a compact encoding for structured and typed data. A unit of GPB data
(it is called message) is only partially self-describing: it's possible to
iterate over its parts, but, quoting the spec
http://code.google.com/apis/protocolbuffers/docs/encoding.html:
" the name and declared type for each field can only be determined on the
decoding end by referencing the message type's definition (i.e. the .proto
file). "
2. GPB in an SQL database
=========================
It is possible to store GPB data in MariaDB today - one can declare a binary
blob column and use it to store GPB messages. Storing and retrieving entire
messages will be the only available operations, though, as the server has no
idea about the GPB format.
It is apparent that ability to peek inside GPB data from SQL layer would be of
great advantage: one would be able to
- select only certain fields or parts of GPB messages
- filter records based on the values of GPB fields
- etc
performing such operations at SQL layer will allow to reduce client<->server
traffic right away, and will open path to getting the best possible
performance.
2.1 Informing server about GPB field names and types
----------------------------------------------------
User-friendly/meaningful access to GPB fields requires knowledge of GPB field
names and types, which are not available from GPB message itself (see "GPB
encoding overview" section).
So the first issue to be addressed is to get the server to know the definition
of stored messages. We intend to assume that all records have GPB messages
that conform to a certain single definition, which gives one definition per
GPB field.
DecisionToMake: How to pass the server the GPB definition?
First idea: add a CREATE TABLE parameter which will specify either the
definition itself or path to .proto file with the definition.
2.2 Addressing GPB fields
-------------------------
We'll need to provide a way to access GPB fields. This can be complicated as
structures that are encoded in GPB message can be nested and recursive.
2.2.1 Option1: SQL Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduce an SQL function GPB_FIELD(path) which will return contents of the
field.
- Return type of the function will be determined from GPB message definition.
- For path, we can use XPath selector (a subset of XPath) syntax.
(TODO ^ the above needs to be specified in more detail. is the selector as
simple as filesystem path or we allow quantifiers (with predicates?)?)
2.2.2 Option2: SQL columns
~~~~~~~~~~~~~~~~~~~~~~~~~~
Make GPB columns to be accessible as SQL columns.
This approach has problems:
- It might be hard to implement code-wise
- (TODO will Virtual columns patch help??)
- It is not clear how to access fields from nested structures. Should we allow
quoted names like `foo/bar[2]/baz' ?
LOW-LEVEL DESIGN:
* GPB tarball contains a protocol definition for .proto file structure itself
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
* We should have both server-side support and client-side support (client side
means functions in libmysqlclient so that user can select the full BLOB and
extract fields in the application).
* Add some kind of header to the GPB blob to support versioning and future
extensibility.
* Add complete syntax description (update, add, drop, exists, ...).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
Re: [Maria-developers] Rev 2740: Group commit for maria storage engine. in file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
by Oleksandr Byelkin 11 Feb '10
by Oleksandr Byelkin 11 Feb '10
11 Feb '10
Hi!
10 февр. 2010, в 21:38, Sergei Golubchik написал(а):
[skip]
>>> Why use my_atomic_store32 ?
>>
>> As I understood idea of atomic operation it is guaranted that we will
>> read consistent value (not one byte from one value and other one from
>> other). Yes I remember your statement that on modern 32bit system
>> you
>> always get it consistent, then why we made atomic operations at all?
>
> Because my_atomic_store32() also adds a full memory barrier to the
> atomic store operation. That is, if you do
>
> my_atomic_store32(&a, 1);
> my_atomic_store32(&b, 2);
>
> and then in another thread
>
> if (my_atomic_load32(&b) == 2)
> {
> ...
> here you can be sure that a==1, because a=1 was executed before
> b=2. And neither compiler nor the cpu swapped two assignments.
> }
>
In other words it is real current value of the variable in all
threads. It looks like what I need.
2
1
[Maria-developers] Rev 2758: Subquery optimizations: backport in file:///home/psergey/dev/maria-5.3-subqueries-r3/
by Sergey Petrunya 11 Feb '10
by Sergey Petrunya 11 Feb '10
11 Feb '10
At file:///home/psergey/dev/maria-5.3-subqueries-r3/
------------------------------------------------------------
revno: 2758
revision-id: psergey(a)askmonty.org-20100211120315-o1hpcxl5lkbrbl25
parent: psergey(a)askmonty.org-20100209203217-al1k9h50zrlphy5d
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.3-subqueries-r3
timestamp: Thu 2010-02-11 15:03:15 +0300
message:
Subquery optimizations: backport
- Fix valgrind failure: do initialize Item::is_expensive_cache.
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2010-02-08 13:10:19 +0000
+++ b/sql/item.cc 2010-02-11 12:03:15 +0000
@@ -373,8 +373,8 @@
Item::Item():
- rsize(0), name(0), orig_name(0), name_length(0), fixed(0),
- is_autogenerated_name(TRUE),
+ is_expensive_cache(-1), rsize(0), name(0), orig_name(0), name_length(0),
+ fixed(0), is_autogenerated_name(TRUE),
collation(&my_charset_bin, DERIVATION_COERCIBLE)
{
marker= 0;
@@ -410,6 +410,7 @@
tables.
*/
Item::Item(THD *thd, Item *item):
+ is_expensive_cache(-1),
rsize(0),
str_value(item->str_value),
name(item->name),
=== modified file 'sql/item.h'
--- a/sql/item.h 2010-01-28 13:48:33 +0000
+++ b/sql/item.h 2010-02-11 12:03:15 +0000
@@ -513,6 +513,9 @@
enum traverse_order { POSTFIX, PREFIX };
+ /* Cache of the result of is_expensive(). */
+ int8 is_expensive_cache;
+
/* Reuse size, only used by SP local variable assignment, otherwize 0 */
uint rsize;
@@ -878,9 +881,6 @@
static CHARSET_INFO *default_charset();
virtual CHARSET_INFO *compare_collation() { return NULL; }
- /* Cache of the result of is_expensive(). */
- int8 is_expensive_cache;
-
virtual bool walk(Item_processor processor, bool walk_subquery, uchar *arg)
{
return (this->*processor)(arg);
1
0
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2815: Added option --temporary-tables to test speed of temporary tables
by noreply@launchpad.net 10 Feb '10
by noreply@launchpad.net 10 Feb '10
10 Feb '10
------------------------------------------------------------
revno: 2815
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: maria-5.1
timestamp: Wed 2010-02-10 23:26:06 +0200
message:
Added option --temporary-tables to test speed of temporary tables
added:
mysql-test/suite/parts/t/partition_repair_myisam-master.opt
modified:
sql-bench/bench-init.pl.sh
sql-bench/server-cfg.sh
sql-bench/test-connect.sh
sql-bench/test-create.sh
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
1
0
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2814: When one does a drop table, the indexes are not flushed to disk before drop anymore (with MyISAM/...
by noreply@launchpad.net 10 Feb '10
by noreply@launchpad.net 10 Feb '10
10 Feb '10
------------------------------------------------------------
revno: 2814
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: maria-5.1
timestamp: Wed 2010-02-10 21:06:24 +0200
message:
When one does a drop table, the indexes are not flushed to disk before drop anymore (with MyISAM/Maria)
myisam-recover options changed from OFF to 'DEFAULT' to get less change of data loss when using MyISAM.
(The disadvantage is that changed MyISAM tables will be checked at access time; Use --myisam-recover=OFF for old behavior)
Don't call extra(HA_EXTRA_FORCE_REOPEN) in ALTER TABLE if table is locked as this will mark table as crashed!
Added assert to detect if we accidently would use MyISAM versioning in MySQL
modified:
include/my_base.h
mysql-test/mysql-test-run.pl
mysql-test/r/sp-destruct.result
mysql-test/r/variables.result
mysql-test/r/view.result
mysql-test/suite/maria/t/maria-recovery2-master.opt
mysql-test/t/sp-destruct.test
mysql-test/t/view.test
sql/lock.cc
sql/mysql_priv.h
sql/mysqld.cc
sql/sql_base.cc
sql/sql_delete.cc
sql/sql_table.cc
sql/table.cc
sql/table.h
storage/maria/ha_maria.cc
storage/maria/ma_blockrec.c
storage/maria/ma_close.c
storage/maria/ma_extra.c
storage/maria/ma_locking.c
storage/maria/ma_recovery.c
storage/maria/maria_def.h
storage/myisam/mi_close.c
storage/myisam/mi_extra.c
storage/myisam/mi_open.c
storage/myisam/myisamdef.h
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
1
0
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (monty:2815)
by Michael Widenius 10 Feb '10
by Michael Widenius 10 Feb '10
10 Feb '10
#At lp:maria based on revid:monty@askmonty.org-20100210190624-38ucdn8y98k1v1zd
2815 Michael Widenius 2010-02-10
Added option --temporary-tables to test speed of temporary tables
added:
mysql-test/suite/parts/t/partition_repair_myisam-master.opt
modified:
sql-bench/bench-init.pl.sh
sql-bench/server-cfg.sh
sql-bench/test-connect.sh
sql-bench/test-create.sh
per-file messages:
mysql-test/suite/parts/t/partition_repair_myisam-master.opt
Added missing file from last push
sql-bench/bench-init.pl.sh
Added options:
--temporary-tables to test speed of temporary tables
sql-bench/server-cfg.sh
Added limit for number of temporary tables one can create
sql-bench/test-connect.sh
Skip test that doesn't work with temporary tables.
sql-bench/test-create.sh
Added limit for number of temporary tables one can create
=== added file 'mysql-test/suite/parts/t/partition_repair_myisam-master.opt'
--- a/mysql-test/suite/parts/t/partition_repair_myisam-master.opt 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/parts/t/partition_repair_myisam-master.opt 2010-02-10 21:26:06 +0000
@@ -0,0 +1 @@
+--myisam-recover=off
=== modified file 'sql-bench/bench-init.pl.sh'
--- a/sql-bench/bench-init.pl.sh 2010-02-09 17:17:04 +0000
+++ b/sql-bench/bench-init.pl.sh 2010-02-10 21:26:06 +0000
@@ -39,7 +39,7 @@ require "$pwd/server-cfg" || die "Can't
$|=1; # Output data immediately
-$opt_skip_test=$opt_skip_create=$opt_skip_delete=$opt_verbose=$opt_fast_insert=$opt_lock_tables=$opt_debug=$opt_skip_delete=$opt_fast=$opt_force=$opt_log=$opt_use_old_results=$opt_help=$opt_odbc=$opt_small_test=$opt_small_tables=$opt_samll_key_tables=$opt_stage=$opt_old_headers=$opt_die_on_errors=$opt_tcpip=$opt_random=$opt_only_missing_tests=0;
+$opt_skip_test=$opt_skip_create=$opt_skip_delete=$opt_verbose=$opt_fast_insert=$opt_lock_tables=$opt_debug=$opt_skip_delete=$opt_fast=$opt_force=$opt_log=$opt_use_old_results=$opt_help=$opt_odbc=$opt_small_test=$opt_small_tables=$opt_samll_key_tables=$opt_stage=$opt_old_headers=$opt_die_on_errors=$opt_tcpip=$opt_random=$opt_only_missing_tests=$opt_temporary_tables=0;
$opt_cmp=$opt_user=$opt_password=$opt_connect_options=$opt_connect_command= "";
$opt_server="mysql"; $opt_dir="output";
$opt_host="localhost";$opt_database="test";
@@ -59,7 +59,7 @@ $log_prog_args=join(" ", skip_arguments(
"use-old-results","skip-test",
"optimization","hw",
"machine", "dir", "suffix", "log"));
-GetOptions("skip-test=s","comments=s","cmp=s","server=s","user=s","host=s","database=s","password=s","loop-count=i","row-count=i","skip-create","skip-delete","verbose","fast-insert","lock-tables","debug","fast","force","field-count=i","regions=i","groups=i","time-limit=i","log","use-old-results","machine=s","dir=s","suffix=s","help","odbc","small-test","small-tables","small-key-tables","stage=i","threads=i","random","old-headers","die-on-errors","create-options=s","hires","tcpip","silent","optimization=s","hw=s","socket=s","connect-options=s","connect-command=s","only-missing-tests") || usage();
+GetOptions("skip-test=s","comments=s","cmp=s","server=s","user=s","host=s","database=s","password=s","loop-count=i","row-count=i","skip-create","skip-delete","verbose","fast-insert","lock-tables","debug","fast","force","field-count=i","regions=i","groups=i","time-limit=i","log","use-old-results","machine=s","dir=s","suffix=s","help","odbc","small-test","small-tables","small-key-tables","stage=i","threads=i","random","old-headers","die-on-errors","create-options=s","hires","tcpip","silent","optimization=s","hw=s","socket=s","connect-options=s","connect-command=s","only-missing-tests","temporary-tables") || usage();
usage() if ($opt_help);
$server=get_server($opt_server,$opt_host,$opt_database,$opt_odbc,
@@ -454,6 +454,9 @@ All benchmarks takes the following optio
create all MySQL tables as InnoDB tables use:
--create-options=ENGINE=InnoDB
+--temporary-tables
+ Use temporary tables for all tests.
+
--database (Default $opt_database)
In which database the test tables are created.
=== modified file 'sql-bench/server-cfg.sh'
--- a/sql-bench/server-cfg.sh 2010-02-09 17:17:04 +0000
+++ b/sql-bench/server-cfg.sh 2010-02-10 21:26:06 +0000
@@ -159,6 +159,7 @@ sub new
$limits{'max_index'} = 16; # Max number of keys
$limits{'max_index_parts'} = 16; # Max segments/key
$limits{'max_tables'} = (($machine || '') =~ "^win") ? 5000 : 65000;
+ $limits{'max_temporary_tables'}= 400;
$limits{'max_text_size'} = 1000000; # Good enough for tests
$limits{'multi_drop'} = 1; # Drop table can take many tables
$limits{'order_by_position'} = 1; # Can use 'ORDER BY 1'
@@ -189,6 +190,7 @@ sub new
$self->{'transactions'} = 1; # Transactions enabled
$limits{'max_columns'} = 90; # Max number of columns in table
$limits{'max_tables'} = 32; # No comments
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
}
if (defined($main::opt_create_options) &&
$main::opt_create_options =~ /engine=bdb/i)
@@ -200,6 +202,7 @@ sub new
{
$limits{'working_blobs'} = 0; # Blobs not implemented yet
$limits{'max_tables'} = 500;
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$self->{'transactions'} = 1; # Transactions enabled
}
@@ -270,7 +273,14 @@ sub create
my($self,$table_name,$fields,$index,$options) = @_;
my($query,@queries);
- $query="create table $table_name (";
+ if ($main::opt_temporary_tables)
+ {
+ $query="create temporary table $table_name (";
+ }
+ else
+ {
+ $query="create table $table_name (";
+ }
foreach $field (@$fields)
{
# $field =~ s/ decimal/ double(10,2)/i;
@@ -393,6 +403,7 @@ sub new
$limits{'max_conditions'} = 74;
$limits{'max_columns'} = 75;
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 32000;
$limits{'query_size'} = 65535;
$limits{'max_index'} = 5;
@@ -622,7 +633,9 @@ sub new
$limits{'max_conditions'} = 9999; # This makes Pg real slow
$limits{'max_index'} = 64; # Big enough
$limits{'max_index_parts'} = 16;
- $limits{'max_tables'} = 5000; # 10000 crashes pg 7.0.2
+ $limits{'max_tables'} = 65000;
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 65000; # Good enough for test
$limits{'multi_drop'} = 1;
$limits{'order_by_position'} = 1;
@@ -873,6 +886,8 @@ sub new
$limits{'max_conditions'} = 9999; # Probably big enough
$limits{'max_columns'} = 2000; # From crash-me
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 65492; # According to tests
$limits{'query_size'} = 65535; # Probably a limit
$limits{'max_index'} = 64; # Probably big enough
@@ -1104,6 +1119,7 @@ sub new
# above this value .... but can handle 2419 columns
# maybe something for crash-me ... but how to check ???
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 4095; # max returned ....
$limits{'query_size'} = 65535; # Not a limit, big enough
$limits{'max_index'} = 64; # Big enough
@@ -1374,6 +1390,8 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit)
$limits{'max_columns'} = 254; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 2000; # Limit for blob test-connect
$limits{'query_size'} = 65525; # Max size with default buffers.
$limits{'max_index'} = 16; # Max number of keys
@@ -1647,6 +1665,8 @@ sub new
$limits{'max_column_name'} = 18; # max table and column name
$limits{'max_columns'} = 994; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_index'} = 64; # Max number of keys
$limits{'max_index_parts'} = 15; # Max segments/key
$limits{'max_text_size'} = 65535; # Max size with default buffers. ??
@@ -1835,6 +1855,8 @@ sub new
$limits{'max_conditions'} = 97; # We get 'Query is too complex'
$limits{'max_columns'} = 255; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 255; # Max size with default buffers.
$limits{'query_size'} = 65535; # Not a limit, big enough
$limits{'max_index'} = 32; # Max number of keys
@@ -2020,6 +2042,8 @@ sub new
$limits{'max_conditions'} = 1030; # We get 'Query is too complex'
$limits{'max_columns'} = 250; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 9830; # Max size with default buffers.
$limits{'query_size'} = 9830; # Max size with default buffers.
$limits{'max_index'} = 64; # Max number of keys
@@ -2216,6 +2240,8 @@ sub new
$limits{'max_conditions'} = 1030; # We get 'Query is too complex'
$limits{'max_columns'} = 250; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 9830; # Max size with default buffers.
$limits{'query_size'} = 9830; # Max size with default buffers.
$limits{'max_index'} = 64; # Max number of keys
@@ -2448,6 +2474,8 @@ sub new
$limits{'max_conditions'} = 50; # (Actually not a limit)
$limits{'max_columns'} = 254; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 2000; # Limit for blob test-connect
$limits{'query_size'} = 65525; # Max size with default buffers.
$limits{'max_index'} = 16; # Max number of keys
@@ -2652,6 +2680,8 @@ sub new
$limits{'max_conditions'} = 418; # We get 'Query is too complex'
$limits{'max_columns'} = 500; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 254; # Max size with default buffers.
$limits{'query_size'} = 254; # Max size with default buffers.
$limits{'max_index'} = 48; # Max number of keys
@@ -2830,6 +2860,7 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit)
$limits{'max_columns'} = 252; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 15000; # Max size with default buffers.
$limits{'query_size'} = 1000000; # Max size with default buffers.
$limits{'max_index'} = 32; # Max number of keys
@@ -3032,6 +3063,7 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit)
$limits{'max_columns'} = 252; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 15000; # Max size with default buffers.
$limits{'query_size'} = 1000000; # Max size with default buffers.
$limits{'max_index'} = 65000; # Max number of keys
@@ -3228,6 +3260,7 @@ sub new
# The following should be 8192, but is smaller because Frontbase crashes..
$limits{'max_columns'} = 150; # Max number of columns in table
$limits{'max_tables'} = 5000; # 10000 crashed FrontBase
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 65000; # Max size with default buffers.
$limits{'query_size'} = 8000000; # Max size with default buffers.
$limits{'max_index'} = 38; # Max number of keys
@@ -3440,6 +3473,7 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit) *
$limits{'max_columns'} = 1023; # Max number of columns in table *
$limits{'max_tables'} = 65000; # Should be big enough * unlimited actually
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 15000; # Max size with default buffers.
$limits{'query_size'} = 64*1024; # Max size with default buffers. *64 kb by default. May be set by system variable
$limits{'max_index'} = 510; # Max number of keys *
=== modified file 'sql-bench/test-connect.sh'
--- a/sql-bench/test-connect.sh 2009-05-29 13:40:55 +0000
+++ b/sql-bench/test-connect.sh 2010-02-10 21:26:06 +0000
@@ -161,41 +161,48 @@ if ($opt_fast && defined($server->{vacuu
{
$server->vacuum(0,\$dbh);
}
-$dbh->disconnect;
+if (!$main::opt_temporary_tables)
+{
+ $dbh->disconnect;
+}
#
# First test connect/select/disconnect
#
-print "Testing connect/select 1 row from table/disconnect\n";
+if (!$main::opt_temporary_tables)
+{
+ print "Testing connect/select 1 row from table/disconnect\n";
-$loop_time=new Benchmark;
-$errors=0;
+ $loop_time=new Benchmark;
+ $errors=0;
-for ($i=0 ; $i < $small_loop_count ; $i++)
-{
- for ($j=0; $j < $max_test ; $j++)
+ for ($i=0 ; $i < $small_loop_count ; $i++)
{
- last if ($dbh = DBI->connect($server->{'data_source'}, $opt_user, $opt_password));
- $errors++;
- }
- die $DBI::errstr if ($j == $max_test);
+ for ($j=0; $j < $max_test ; $j++)
+ {
+ last if ($dbh = DBI->connect($server->{'data_source'}, $opt_user, $opt_password));
+ $errors++;
+ }
+ die $DBI::errstr if ($j == $max_test);
- $sth = $dbh->do("select a,i,s,$i from bench1") # Select * from table with 1 record
+ $sth = $dbh->do("select a,i,s,$i from bench1") # Select * from table with 1 record
or die $DBI::errstr;
- $dbh->disconnect;
-}
+ $dbh->disconnect;
+ }
-$end_time=new Benchmark;
-print "Warning: $errors connections didn't work without a time delay\n" if ($errors);
-print "Time to connect+select_1_row ($small_loop_count): " .
+ $end_time=new Benchmark;
+ print "Warning: $errors connections didn't work without a time delay\n" if ($errors);
+ print "Time to connect+select_1_row ($small_loop_count): " .
timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+ $dbh = $server->connect();
+}
+
#
# The same test, but without connect/disconnect
#
print "Testing select 1 row from table\n";
-$dbh = $server->connect();
$loop_time=new Benchmark;
for ($i=0 ; $i < $opt_loop_count ; $i++)
=== modified file 'sql-bench/test-create.sh'
--- a/sql-bench/test-create.sh 2009-05-29 13:40:55 +0000
+++ b/sql-bench/test-create.sh 2010-02-10 21:26:06 +0000
@@ -47,7 +47,15 @@ if ($opt_small_test)
$create_loop_count/=1000;
}
-$max_tables=min($limits->{'max_tables'},$opt_loop_count);
+if ($opt_temporary_tables)
+{
+ $max_tables=min($limits->{'max_tables'},$opt_loop_count);
+}
+else
+{
+ $max_tables=min($limits->{'max_tables'},$opt_loop_count);
+ $max_tables=400;
+}
if ($opt_small_test)
{
@@ -71,7 +79,7 @@ $dbh = $server->connect();
if ($opt_force) # If tables used in this test exist, drop 'em
{
print "Okay..Let's make sure that our tables don't exist yet.\n\n";
- for ($i=1 ; $i <= $max_tables ; $i++)
+ for ($i=1 ; $i <= max($max_tables, $create_loop_count) ; $i++)
{
$dbh->do("drop table bench_$i" . $server->{'drop_attr'});
}
@@ -245,7 +253,7 @@ for ($i=2 ; $i <= $keys ; $i++)
}
$loop_time=new Benchmark;
-for ($i=1 ; $i <= $opt_loop_count ; $i++)
+for ($i=1 ; $i <= $create_loop_count ; $i++)
{
do_many($dbh,$server->create("bench_$i", \@fields, \@keys));
$dbh->do("drop table bench_$i" . $server->{'drop_attr'}) or die $DBI::errstr;
1
0
[Maria-developers] Rev 2740: Group commit for maria storage engine. in file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
by sanja@askmonty.org 10 Feb '10
by sanja@askmonty.org 10 Feb '10
10 Feb '10
At file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
------------------------------------------------------------
revno: 2740
revision-id: sanja(a)askmonty.org-20100210205026-8l8veoi8dbon5cwl
parent: knielsen(a)knielsen-hq.org-20100201190519-b9uktnn90rwwiile
committer: sanja(a)askmonty.org
branch nick: work-maria-5.2-groupcommit
timestamp: Wed 2010-02-10 22:50:26 +0200
message:
Group commit for maria storage engine.
=== added file 'mysql-test/suite/maria/r/group_commit.result'
--- a/mysql-test/suite/maria/r/group_commit.result 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/r/group_commit.result 2010-02-10 20:50:26 +0000
@@ -0,0 +1,17 @@
+drop table if exists t1;
+create table t1 (a int);
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'mysql-test/suite/maria/r/maria3.result'
--- a/mysql-test/suite/maria/r/maria3.result 2009-09-18 01:04:43 +0000
+++ b/mysql-test/suite/maria/r/maria3.result 2010-02-10 20:50:26 +0000
@@ -306,6 +306,8 @@
maria_block_size 8192
maria_checkpoint_interval 30
maria_force_start_after_recovery_failures 0
+maria_group_commit none
+maria_group_commit_interval 0
maria_log_file_size 4294959104
maria_log_purge_type immediate
maria_max_sort_file_size 9223372036853727232
@@ -328,6 +330,7 @@
Maria_pagecache_reads #
Maria_pagecache_write_requests #
Maria_pagecache_writes #
+Maria_transaction_log_syncs #
create table t1 (b char(0));
insert into t1 values(NULL),("");
select length(b) from t1;
=== added file 'mysql-test/suite/maria/t/group_commit.test'
--- a/mysql-test/suite/maria/t/group_commit.test 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/t/group_commit.test 2010-02-10 20:50:26 +0000
@@ -0,0 +1,71 @@
+# Test different ways of syncing (mostly syntax)
+
+--disable_warnings
+drop table if exists t1;
+--enable_warnings
+
+create table t1 (a int);
+
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'storage/maria/ha_maria.cc'
--- a/storage/maria/ha_maria.cc 2009-12-03 11:34:11 +0000
+++ b/storage/maria/ha_maria.cc 2010-02-10 20:50:26 +0000
@@ -102,22 +102,40 @@
array_elements(maria_translog_purge_type_names) - 1, "",
maria_translog_purge_type_names, NULL
};
+
+/* transactional log directory sync */
const char *maria_sync_log_dir_names[]=
{
"NEVER", "NEWFILE", "ALWAYS", NullS
};
-
TYPELIB maria_sync_log_dir_typelib=
{
array_elements(maria_sync_log_dir_names) - 1, "",
maria_sync_log_dir_names, NULL
};
+/* transactional log group commit */
+const char *maria_group_commit_names[]=
+{
+ "none", "hard", "soft", NullS
+};
+TYPELIB maria_group_commit_typelib=
+{
+ array_elements(maria_group_commit_names) - 1, "",
+ maria_group_commit_names, NULL
+};
+
/** Interval between background checkpoints in seconds */
static ulong checkpoint_interval;
static void update_checkpoint_interval(MYSQL_THD thd,
struct st_mysql_sys_var *var,
void *var_ptr, const void *save);
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
/** After that many consecutive recovery failures, remove logs */
static ulong force_start_after_recovery_failures;
static void update_log_file_size(MYSQL_THD thd,
@@ -164,6 +182,24 @@
NULL, update_log_file_size, TRANSLOG_FILE_SIZE,
TRANSLOG_MIN_FILE_SIZE, 0xffffffffL, TRANSLOG_PAGE_SIZE);
+static MYSQL_SYSVAR_ENUM(group_commit, maria_group_commit,
+ PLUGIN_VAR_RQCMDARG,
+ "Specifies maria group commit mode. "
+ "Possible values are \"none\" (no group commit), "
+ "\"hard\" (with waiting to actual commit), "
+ "\"soft\" (no wait for commit (DANGEROUS!!!))",
+ NULL, update_maria_group_commit,
+ TRANSLOG_GCOMMIT_NONE, &maria_group_commit_typelib);
+
+static MYSQL_SYSVAR_ULONG(group_commit_interval, maria_group_commit_interval,
+ PLUGIN_VAR_RQCMDARG,
+ "Interval between commite in microseconds (1/1000000c)."
+ " 0 stands for no waiting"
+ " for other threads to come and do a commit in \"hard\" mode and no"
+ " sync()/commit at all in \"soft\" mode. Option has only an effect"
+ " if maria_group_commit is used",
+ NULL, update_maria_group_commit_interval, 0, 0, UINT_MAX, 1);
+
static MYSQL_SYSVAR_ENUM(log_purge_type, log_purge_type,
PLUGIN_VAR_RQCMDARG,
"Specifies how maria transactional log will be purged. "
@@ -3275,6 +3311,8 @@
MYSQL_SYSVAR(block_size),
MYSQL_SYSVAR(checkpoint_interval),
MYSQL_SYSVAR(force_start_after_recovery_failures),
+ MYSQL_SYSVAR(group_commit),
+ MYSQL_SYSVAR(group_commit_interval),
MYSQL_SYSVAR(page_checksum),
MYSQL_SYSVAR(log_dir_path),
MYSQL_SYSVAR(log_file_size),
@@ -3306,6 +3344,92 @@
}
/**
+ @brief Updates group commit mode
+*/
+
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong value= (ulong)*((long *)var_ptr);
+ DBUG_ENTER("update_maria_group_commit");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu rate %lu",
+ value, (ulong)(*(long *)save),
+ maria_group_commit_interval));
+ /* old value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(FALSE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(FALSE);
+ if (maria_group_commit_interval)
+ translog_soft_sync_end();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ value= *(ulong *)var_ptr= (ulong)(*(long *)save);
+ translog_sync();
+ /* new value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(TRUE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(TRUE);
+ /* variable change made under global lock so we can just read it */
+ if (maria_group_commit_interval)
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Updates group commit interval
+*/
+
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong new_value= (ulong)*((long *)save);
+ ulong *value_ptr= (ulong*) var_ptr;
+ DBUG_ENTER("update_maria_group_commit_interval");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu group commit %lu",
+ *value_ptr, new_value, maria_group_commit));
+
+ /* variable change made under global lock so we can just read it */
+ switch (maria_group_commit) {
+ case TRANSLOG_GCOMMIT_NONE:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ if (*value_ptr)
+ translog_soft_sync_end();
+ translog_set_group_commit_interval(new_value);
+ if ((*value_ptr= new_value))
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
@brief Updates the transaction log file limit.
*/
@@ -3327,6 +3451,7 @@
{"Maria_pagecache_reads", (char*) &maria_pagecache_var.global_cache_read, SHOW_LONGLONG},
{"Maria_pagecache_write_requests", (char*) &maria_pagecache_var.global_cache_w_requests, SHOW_LONGLONG},
{"Maria_pagecache_writes", (char*) &maria_pagecache_var.global_cache_write, SHOW_LONGLONG},
+ {"Maria_transaction_log_syncs", (char*) &translog_syncs, SHOW_LONGLONG},
{NullS, NullS, SHOW_LONG}
};
=== modified file 'storage/maria/ma_init.c'
--- a/storage/maria/ma_init.c 2008-10-09 20:03:54 +0000
+++ b/storage/maria/ma_init.c 2010-02-10 20:50:26 +0000
@@ -82,6 +82,11 @@
maria_inited= maria_multi_threaded= FALSE;
ft_free_stopwords();
ma_checkpoint_end();
+ if (translog_status == TRANSLOG_OK)
+ {
+ translog_soft_sync_end();
+ translog_sync();
+ }
if ((trid= trnman_get_max_trid()) > max_trid_in_control_file)
{
/*
=== modified file 'storage/maria/ma_loghandler.c'
--- a/storage/maria/ma_loghandler.c 2010-01-06 21:27:53 +0000
+++ b/storage/maria/ma_loghandler.c 2010-02-10 20:50:26 +0000
@@ -18,6 +18,7 @@
#include "ma_blockrec.h" /* for some constants and in-write hooks */
#include "ma_key_recover.h" /* For some in-write hooks */
#include "ma_checkpoint.h"
+#include "ma_servicethread.h"
/*
On Windows, neither my_open() nor my_sync() work for directories.
@@ -47,6 +48,15 @@
#include <m_ctype.h>
#endif
+/** @brief protects checkpoint_in_progress */
+static pthread_mutex_t LOCK_soft_sync;
+/** @brief for killing the background checkpoint thread */
+static pthread_cond_t COND_soft_sync;
+/** @brief control structure for checkpoint background thread */
+static MA_SERVICE_THREAD_CONTROL soft_sync_control=
+ {THREAD_DEAD, FALSE, &LOCK_soft_sync, &COND_soft_sync};
+
+
/* transaction log file descriptor */
typedef struct st_translog_file
{
@@ -124,10 +134,24 @@
/* Previous buffer offset to detect it flush finish */
TRANSLOG_ADDRESS prev_buffer_offset;
/*
+ If the buffer was forced to close it save value of its horizon
+ otherwise LSN_IMPOSSIBLE
+ */
+ TRANSLOG_ADDRESS pre_force_close_horizon;
+ /*
How much is written (or will be written when copy_to_buffer_in_progress
become 0) to this buffer
*/
translog_size_t size;
+ /*
+ When moving from one log buffer to another, we write the last of the
+ previous buffer to file and then move to start using the new log
+ buffer. In the case of a part filed last page, this page is not moved
+ to the start of the new buffer but instead we set the 'skip_data'
+ variable to tell us how much data at the beginning of the buffer is not
+ relevant.
+ */
+ uint skipped_data;
/* File handler for this buffer */
TRANSLOG_FILE *file;
/* Threads which are waiting for buffer filling/freeing */
@@ -304,6 +328,7 @@
*/
pthread_mutex_t log_flush_lock;
pthread_cond_t log_flush_cond;
+ pthread_cond_t new_goal_cond;
/* Protects changing of headers of finished files (max_lsn) */
pthread_mutex_t file_header_lock;
@@ -344,13 +369,39 @@
ulong log_purge_type= TRANSLOG_PURGE_IMMIDIATE;
ulong log_file_size= TRANSLOG_FILE_SIZE;
+/* sync() of log files directory mode */
ulong sync_log_dir= TRANSLOG_SYNC_DIR_NEWFILE;
+ulong maria_group_commit= TRANSLOG_GCOMMIT_NONE;
+ulong maria_group_commit_interval= 0;
/* Marker for end of log */
static uchar end_of_log= 0;
#define END_OF_LOG &end_of_log
+/**
+ Switch for "soft" sync (no real sync() but periodical sync by service
+ thread)
+*/
+static volatile my_bool soft_sync= FALSE;
+/**
+ Switch for "hard" group commit mode
+*/
+static volatile my_bool hard_group_commit= FALSE;
+/**
+ File numbers interval which have to be sync()
+*/
+static uint32 soft_sync_min= 0;
+static uint32 soft_sync_max= 0;
+static uint32 soft_need_sync= 1;
+/**
+ stores interval in microseconds
+*/
+static uint32 group_commit_wait= 0;
enum enum_translog_status translog_status= TRANSLOG_UNINITED;
+ulonglong translog_syncs= 0; /* Number of sync()s */
+
+/* time of last flush */
+static ulonglong flush_start= 0;
/* chunk types */
#define TRANSLOG_CHUNK_LSN 0x00 /* 0 chunk refer as LSN (head or tail */
@@ -980,12 +1031,17 @@
static TRANSLOG_FILE *get_current_logfile()
{
TRANSLOG_FILE *file;
+ DBUG_ENTER("get_current_logfile");
rw_rdlock(&log_descriptor.open_files_lock);
+ DBUG_PRINT("info", ("max_file: %lu min_file: %lu open_files: %lu",
+ (ulong) log_descriptor.max_file,
+ (ulong) log_descriptor.min_file,
+ (ulong) log_descriptor.open_files.elements));
DBUG_ASSERT(log_descriptor.max_file - log_descriptor.min_file + 1 ==
log_descriptor.open_files.elements);
file= *dynamic_element(&log_descriptor.open_files, 0, TRANSLOG_FILE **);
rw_unlock(&log_descriptor.open_files_lock);
- return (file);
+ DBUG_RETURN(file);
}
uchar NEAR maria_trans_file_magic[]=
@@ -1069,6 +1125,7 @@
static my_bool translog_max_lsn_to_header(File file, LSN lsn)
{
uchar lsn_buff[LSN_STORE_SIZE];
+ my_bool rc;
DBUG_ENTER("translog_max_lsn_to_header");
DBUG_PRINT("enter", ("File descriptor: %ld "
"lsn: (%lu,0x%lx)",
@@ -1077,11 +1134,17 @@
lsn_store(lsn_buff, lsn);
- DBUG_RETURN(my_pwrite(file, lsn_buff,
- LSN_STORE_SIZE,
- (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
- log_write_flags) != 0 ||
- my_sync(file, MYF(MY_WME)) != 0);
+ rc= (my_pwrite(file, lsn_buff,
+ LSN_STORE_SIZE,
+ (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
+ log_write_flags) != 0 ||
+ my_sync(file, MYF(MY_WME)) != 0);
+ /*
+ We should not increase counter in case of error above, but it is so
+ unlikely that we can ignore this case
+ */
+ translog_syncs++;
+ DBUG_RETURN(rc);
}
@@ -1423,7 +1486,9 @@
static my_bool translog_buffer_init(struct st_translog_buffer *buffer, int num)
{
DBUG_ENTER("translog_buffer_init");
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn=
+ LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
@@ -1435,6 +1500,7 @@
memset(buffer->buffer, TRANSLOG_FILLER, TRANSLOG_WRITE_BUFFER);
/* Buffer size */
buffer->size= 0;
+ buffer->skipped_data= 0;
/* cond of thread which is waiting for buffer filling */
if (pthread_cond_init(&buffer->waiting_filling_buffer, 0))
DBUG_RETURN(1);
@@ -1489,7 +1555,10 @@
TODO: sync only we have changed the log
*/
if (!file->is_sync)
+ {
rc= my_sync(file->handler.file, MYF(MY_WME));
+ translog_syncs++;
+ }
rc|= my_close(file->handler.file, MYF(MY_WME));
my_free(file, MYF(0));
return test(rc);
@@ -2044,7 +2113,8 @@
(ulong) LSN_OFFSET(log_descriptor.horizon),
(ulong) LSN_OFFSET(log_descriptor.horizon)));
DBUG_ASSERT(buffer_no == buffer->buffer_no);
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
buffer->offset= log_descriptor.horizon;
@@ -2052,6 +2122,7 @@
buffer->file= get_current_logfile();
buffer->overlay= 0;
buffer->size= 0;
+ buffer->skipped_data= 0;
translog_cursor_init(cursor, buffer, buffer_no);
DBUG_PRINT("info", ("file: #%ld (%d) init cursor #%u: 0x%lx "
"chaser: %d Size: %lu (%lu)",
@@ -2523,6 +2594,7 @@
TRANSLOG_ADDRESS offset= buffer->offset;
TRANSLOG_FILE *file= buffer->file;
uint8 ver= buffer->ver;
+ uint skipped_data;
DBUG_ENTER("translog_buffer_flush");
DBUG_PRINT("enter",
("Buffer: #%u 0x%lx file: %d offset: (%lu,0x%lx) size: %lu",
@@ -2557,6 +2629,8 @@
disk
*/
file= buffer->file;
+ skipped_data= buffer->skipped_data;
+ DBUG_ASSERT(skipped_data < TRANSLOG_PAGE_SIZE);
for (i= 0, pg= LSN_OFFSET(buffer->offset) / TRANSLOG_PAGE_SIZE;
i < buffer->size;
i+= TRANSLOG_PAGE_SIZE, pg++)
@@ -2573,13 +2647,16 @@
DBUG_ASSERT(i + TRANSLOG_PAGE_SIZE <= buffer->size);
if (translog_status != TRANSLOG_OK && translog_status != TRANSLOG_SHUTDOWN)
DBUG_RETURN(1);
- if (pagecache_inject(log_descriptor.pagecache,
+ if (pagecache_write_part(log_descriptor.pagecache,
&file->handler, pg, 3,
buffer->buffer + i,
PAGECACHE_PLAIN_PAGE,
PAGECACHE_LOCK_LEFT_UNLOCKED,
- PAGECACHE_PIN_LEFT_UNPINNED, 0,
- LSN_IMPOSSIBLE))
+ PAGECACHE_PIN_LEFT_UNPINNED,
+ PAGECACHE_WRITE_DONE, 0,
+ LSN_IMPOSSIBLE,
+ skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data))
{
DBUG_PRINT("error",
("Can't write page (%lu,0x%lx) to pagecache, error: %d",
@@ -2589,10 +2666,12 @@
translog_stop_writing();
DBUG_RETURN(1);
}
+ skipped_data= 0;
}
file->is_sync= 0;
- if (my_pwrite(file->handler.file, buffer->buffer,
- buffer->size, LSN_OFFSET(buffer->offset),
+ if (my_pwrite(file->handler.file, buffer->buffer + buffer->skipped_data,
+ buffer->size - buffer->skipped_data,
+ LSN_OFFSET(buffer->offset) + buffer->skipped_data,
log_write_flags))
{
DBUG_PRINT("error", ("Can't write buffer (%lu,0x%lx) size %lu "
@@ -2985,6 +3064,7 @@
uchar *from, *table= NULL;
int is_last_unfinished_page;
uint last_protected_sector= 0;
+ uint skipped_data= curr_buffer->skipped_data;
TRANSLOG_FILE file_copy;
uint8 ver= curr_buffer->ver;
translog_wait_for_writers(curr_buffer);
@@ -2997,7 +3077,38 @@
}
DBUG_ASSERT(LSN_FILE_NO(addr) == LSN_FILE_NO(curr_buffer->offset));
from= curr_buffer->buffer + (addr - curr_buffer->offset);
- memcpy(buffer, from, TRANSLOG_PAGE_SIZE);
+ if (skipped_data && addr == curr_buffer->offset)
+ {
+ /*
+ We read page part of which is not present in buffer,
+ so we should read absent part from file (page cache actually)
+ */
+ file= get_logfile_by_number(file_no);
+ DBUG_ASSERT(file != NULL);
+ /*
+ it's ok to not lock the page because:
+ - The log handler has it's own page cache.
+ - There is only one thread that can access the log
+ cache at a time
+ */
+ if (!(buffer= pagecache_read(log_descriptor.pagecache,
+ &file->handler,
+ LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE,
+ 3, buffer,
+ PAGECACHE_PLAIN_PAGE,
+ PAGECACHE_LOCK_LEFT_UNLOCKED,
+ NULL)))
+ DBUG_RETURN(NULL);
+ }
+ else
+ skipped_data= 0; /* Read after skipped in buffer data */
+ /*
+ Now we have correct data in buffer up to 'skipped_data'. The
+ following memcpy() will move the data from the internal buffer
+ that was not yet on disk.
+ */
+ memcpy(buffer + skipped_data, from + skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data);
/*
We can use copy then in translog_page_validator() because it
do not put it permanently somewhere.
@@ -3291,6 +3402,7 @@
uint32 next_page_offset, page_rest;
uint32 i;
File fd;
+ int rc;
TRANSLOG_VALIDATOR_DATA data;
char path[FN_REFLEN];
uchar page_buff[TRANSLOG_PAGE_SIZE];
@@ -3316,14 +3428,19 @@
TRANSLOG_PAGE_SIZE);
page_rest= next_page_offset - LSN_OFFSET(addr);
memset(page_buff, TRANSLOG_FILLER, page_rest);
- if ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
- ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
- (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
- log_write_flags)) ||
- my_sync(fd, MYF(MY_WME))) |
- my_close(fd, MYF(MY_WME))) ||
- (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD))))
+ rc= ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
+ ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
+ (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
+ log_write_flags)) ||
+ my_sync(fd, MYF(MY_WME)))));
+ translog_syncs++;
+ rc|= (fd > 0 && my_close(fd, MYF(MY_WME)));
+ if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS)
+ {
+ rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ translog_syncs++;
+ }
+ if (rc)
DBUG_RETURN(1);
/* fix the horizon */
@@ -3511,6 +3628,7 @@
pthread_mutex_init(&log_descriptor.dirty_buffer_mask_lock,
MY_MUTEX_INIT_FAST) ||
pthread_cond_init(&log_descriptor.log_flush_cond, 0) ||
+ pthread_cond_init(&log_descriptor.new_goal_cond, 0) ||
my_rwlock_init(&log_descriptor.open_files_lock,
NULL) ||
my_init_dynamic_array(&log_descriptor.open_files,
@@ -3912,7 +4030,6 @@
log_descriptor.flushed= log_descriptor.horizon;
log_descriptor.in_buffers_only= log_descriptor.bc.buffer->offset;
log_descriptor.max_lsn= LSN_IMPOSSIBLE; /* set to 0 */
- log_descriptor.previous_flush_horizon= log_descriptor.horizon;
/*
Now 'flushed' is set to 'horizon' value, but 'horizon' is (potentially)
address of the next LSN and we want indicate that all LSNs that are
@@ -3995,6 +4112,10 @@
It is beginning of the log => there is no LSNs in the log =>
There is no harm in leaving it "as-is".
*/
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.
+ previous_flush_horizon)));
DBUG_RETURN(0);
}
file_no--;
@@ -4070,6 +4191,9 @@
translog_free_record_header(&rec);
}
}
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.previous_flush_horizon)));
DBUG_RETURN(0);
err:
ma_message_no_user(0, "log initialization failed");
@@ -4157,6 +4281,7 @@
pthread_mutex_destroy(&log_descriptor.log_flush_lock);
pthread_mutex_destroy(&log_descriptor.dirty_buffer_mask_lock);
pthread_cond_destroy(&log_descriptor.log_flush_cond);
+ pthread_cond_destroy(&log_descriptor.new_goal_cond);
rwlock_destroy(&log_descriptor.open_files_lock);
delete_dynamic(&log_descriptor.open_files);
delete_dynamic(&log_descriptor.unfinished_files);
@@ -6885,11 +7010,11 @@
{
translog_size_t res;
DBUG_ENTER("translog_read_record_header_from_buffer");
- DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
DBUG_PRINT("info", ("page byte: 0x%x offset: %u",
(uint) page[page_offset], (uint) page_offset));
+ DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
buff->type= (page[page_offset] & TRANSLOG_REC_TYPE);
buff->short_trid= uint2korr(page + page_offset + 1);
DBUG_PRINT("info", ("Type %u, Short TrID %u, LSN (%lu,0x%lx)",
@@ -7356,27 +7481,27 @@
"Buffer addr: (%lu,0x%lx) "
"Page addr: (%lu,0x%lx) "
"size: %lu (%lu) Pg: %u left: %u in progress %u",
- (uint) log_descriptor.bc.buffer_no,
- (ulong) log_descriptor.bc.buffer,
- LSN_IN_PARTS(log_descriptor.bc.buffer->offset),
+ (uint) old_buffer_no,
+ (ulong) old_buffer,
+ LSN_IN_PARTS(old_buffer->offset),
(ulong) LSN_FILE_NO(log_descriptor.horizon),
(ulong) (LSN_OFFSET(log_descriptor.horizon) -
log_descriptor.bc.current_page_fill),
- (ulong) log_descriptor.bc.buffer->size,
+ (ulong) old_buffer->size,
(ulong) (log_descriptor.bc.ptr -log_descriptor.bc.
buffer->buffer),
(uint) log_descriptor.bc.current_page_fill,
(uint) left,
- (uint) log_descriptor.bc.buffer->
+ (uint) old_buffer->
copy_to_buffer_in_progress));
translog_lock_assert_owner();
LINT_INIT(current_page_fill);
- new_buff_beginning= log_descriptor.bc.buffer->offset;
- new_buff_beginning+= log_descriptor.bc.buffer->size; /* increase offset */
+ new_buff_beginning= old_buffer->offset;
+ new_buff_beginning+= old_buffer->size; /* increase offset */
DBUG_ASSERT(log_descriptor.bc.ptr !=NULL);
DBUG_ASSERT(LSN_FILE_NO(log_descriptor.horizon) ==
- LSN_FILE_NO(log_descriptor.bc.buffer->offset));
+ LSN_FILE_NO(old_buffer->offset));
translog_check_cursor(&log_descriptor.bc);
DBUG_ASSERT(left < TRANSLOG_PAGE_SIZE);
if (left)
@@ -7387,18 +7512,20 @@
*/
DBUG_PRINT("info", ("left: %u", (uint) left));
+ old_buffer->pre_force_close_horizon=
+ old_buffer->offset + old_buffer->size;
/* decrease offset */
new_buff_beginning-= log_descriptor.bc.current_page_fill;
current_page_fill= log_descriptor.bc.current_page_fill;
memset(log_descriptor.bc.ptr, TRANSLOG_FILLER, left);
- log_descriptor.bc.buffer->size+= left;
+ old_buffer->size+= left;
DBUG_PRINT("info", ("Finish Page buffer #%u: 0x%lx "
"Size: %lu",
- (uint) log_descriptor.bc.buffer->buffer_no,
- (ulong) log_descriptor.bc.buffer,
- (ulong) log_descriptor.bc.buffer->size));
- DBUG_ASSERT(log_descriptor.bc.buffer->buffer_no ==
+ (uint) old_buffer->buffer_no,
+ (ulong) old_buffer,
+ (ulong) old_buffer->size));
+ DBUG_ASSERT(old_buffer->buffer_no ==
log_descriptor.bc.buffer_no);
}
else
@@ -7509,11 +7636,21 @@
if (left)
{
- /*
- TODO: do not copy beginning of the page if we have no CRC or sector
- checks on
- */
- memcpy(new_buffer->buffer, data, current_page_fill);
+ if (log_descriptor.flags &
+ (TRANSLOG_PAGE_CRC | TRANSLOG_SECTOR_PROTECTION))
+ memcpy(new_buffer->buffer, data, current_page_fill);
+ else
+ {
+ /*
+ This page header does not change if we add more data to the page so
+ we can not copy it and will not overwrite later
+ */
+ new_buffer->skipped_data= current_page_fill;
+#ifndef DBUG_OFF
+ memset(new_buffer->buffer, 0xa5, current_page_fill);
+#endif
+ DBUG_ASSERT(new_buffer->skipped_data < TRANSLOG_PAGE_SIZE);
+ }
}
old_buffer->next_buffer_offset= new_buffer->offset;
translog_buffer_lock(new_buffer);
@@ -7561,6 +7698,7 @@
{
log_descriptor.next_pass_max_lsn= lsn;
log_descriptor.max_lsn_requester= pthread_self();
+ pthread_cond_broadcast(&log_descriptor.new_goal_cond);
}
while (flush_no == log_descriptor.flush_no)
{
@@ -7572,66 +7710,78 @@
/**
- @brief Flush the log up to given LSN (included)
-
- @param lsn log record serial number up to which (inclusive)
- the log has to be flushed
-
- @return Operation status
+ @brief sync() range of files (inclusive) and directory (by request)
+
+ @param min min internal file number to flush
+ @param max max internal file number to flush
+ @param sync_dir need sync directory
+
+ return Operation status
@retval 0 OK
@retval 1 Error
-
-*/
-
-my_bool translog_flush(TRANSLOG_ADDRESS lsn)
-{
- LSN sent_to_disk= LSN_IMPOSSIBLE;
- TRANSLOG_ADDRESS flush_horizon;
- uint fn, i;
+*/
+
+static my_bool translog_sync_files(uint32 min, uint32 max,
+ my_bool sync_dir)
+{
+ uint fn;
+ my_bool rc= 0;
+ ulonglong flush_interval;
+ DBUG_ENTER("translog_sync_files");
+ DBUG_PRINT("info", ("min: %lu max: %lu sync dir: %d",
+ (ulong) min, (ulong) max, (int) sync_dir));
+ DBUG_ASSERT(min <= max);
+
+ flush_interval= group_commit_wait;
+ if (flush_interval)
+ flush_start= my_micro_time();
+ for (fn= min; fn <= max; fn++)
+ {
+ TRANSLOG_FILE *file= get_logfile_by_number(fn);
+ DBUG_ASSERT(file != NULL);
+ if (!file->is_sync)
+ {
+ if (my_sync(file->handler.file, MYF(MY_WME)))
+ {
+ rc= 1;
+ translog_stop_writing();
+ DBUG_RETURN(rc);
+ }
+ translog_syncs++;
+ file->is_sync= 1;
+ }
+ }
+
+ if (sync_dir)
+ {
+ if (!(rc= sync_dir(log_descriptor.directory_fd,
+ MYF(MY_WME | MY_IGNORE_BADFD))))
+ translog_syncs++;
+ }
+
+ DBUG_RETURN(rc);
+}
+
+
+/*
+ @brief Flushes buffers with LSNs in them less or equal address <lsn>
+
+ @param lsn address up to which all LSNs should be flushed,
+ can be reset to real last LSN address
+ @parem sent_to_disk returns 'sent to disk' position
+ @param flush_horizon returns horizon of the flush
+
+ @note About terminology see comment to translog_flush().
+*/
+
+void translog_flush_buffers(TRANSLOG_ADDRESS *lsn,
+ TRANSLOG_ADDRESS *sent_to_disk,
+ TRANSLOG_ADDRESS *flush_horizon)
+{
dirty_buffer_mask_t dirty_buffer_mask;
+ uint i;
uint8 last_buffer_no, start_buffer_no;
- my_bool rc= 0;
- DBUG_ENTER("translog_flush");
- DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
- LINT_INIT(sent_to_disk);
-
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
- DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.flushed)));
- if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
- {
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- if (log_descriptor.flush_in_progress)
- {
- translog_flush_set_new_goal_and_wait(lsn);
- if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
- {
- /* fix lsn if it was horizon */
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
- lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
- translog_flush_wait_for_end(lsn);
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
- }
- log_descriptor.flush_in_progress= 1;
- flush_horizon= log_descriptor.previous_flush_horizon;
- DBUG_PRINT("info", ("flush_in_progress is set"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
-
- translog_lock();
- if (log_descriptor.is_everything_flushed)
- {
- DBUG_PRINT("info", ("everything is flushed"));
- rc= (translog_status == TRANSLOG_READONLY);
- translog_unlock();
- goto out;
- }
+ DBUG_ENTER("translog_flush_buffers");
/*
We will recheck information when will lock buffers one by
@@ -7656,15 +7806,15 @@
/*
if LSN up to which we have to flush bigger then maximum LSN of previous
buffer and at least one LSN was saved in the current buffer (last_lsn !=
- LSN_IMPOSSIBLE) then we better finish the current buffer.
+ LSN_IMPOSSIBLE) then we have to close the current buffer.
*/
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
+ if (cmp_translog_addr(*lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
log_descriptor.bc.buffer->last_lsn != LSN_IMPOSSIBLE)
{
struct st_translog_buffer *buffer= log_descriptor.bc.buffer;
- lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
+ *lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
DBUG_PRINT("info", ("LSN to flush fixed to last lsn: (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
+ LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
last_buffer_no= log_descriptor.bc.buffer_no;
log_descriptor.is_everything_flushed= 1;
translog_force_current_buffer_to_finish();
@@ -7676,8 +7826,10 @@
TRANSLOG_BUFFERS_NO);
translog_unlock();
}
- sent_to_disk= translog_get_sent_to_disk();
- if (cmp_translog_addr(lsn, sent_to_disk) > 0)
+
+ /* flush buffers */
+ *sent_to_disk= translog_get_sent_to_disk();
+ if (cmp_translog_addr(*lsn, *sent_to_disk) > 0)
{
DBUG_PRINT("info", ("Start buffer #: %u last buffer #: %u",
@@ -7697,53 +7849,238 @@
LSN_IN_PARTS(buffer->last_lsn),
(buffer->file ?
"dirty" : "closed")));
- if (buffer->prev_last_lsn <= lsn &&
+ if (buffer->prev_last_lsn <= *lsn &&
buffer->file != NULL)
{
- DBUG_ASSERT(flush_horizon <= buffer->offset + buffer->size);
- flush_horizon= buffer->offset + buffer->size;
+ DBUG_ASSERT(*flush_horizon <= buffer->offset + buffer->size);
+ *flush_horizon= (buffer->pre_force_close_horizon != LSN_IMPOSSIBLE ?
+ buffer->pre_force_close_horizon :
+ buffer->offset + buffer->size);
+ /* pre_force_close_horizon is reset during new buffer start */
+ DBUG_PRINT("info", ("flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(*flush_horizon)));
+ DBUG_ASSERT(*flush_horizon <= log_descriptor.horizon);
+
translog_buffer_flush(buffer);
}
translog_buffer_unlock(buffer);
i= (i + 1) % TRANSLOG_BUFFERS_NO;
} while (i != last_buffer_no);
- sent_to_disk= translog_get_sent_to_disk();
- }
-
- /* sync files from previous flush till current one */
- for (fn= LSN_FILE_NO(log_descriptor.flushed); fn <= LSN_FILE_NO(lsn); fn++)
- {
- TRANSLOG_FILE *file= get_logfile_by_number(fn);
- DBUG_ASSERT(file != NULL);
- if (!file->is_sync)
- {
- if (my_sync(file->handler.file, MYF(MY_WME)))
+ *sent_to_disk= translog_get_sent_to_disk();
+ }
+
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Flush the log up to given LSN (included)
+
+ @param lsn log record serial number up to which (inclusive)
+ the log has to be flushed
+
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
+
+ @note
+
+ - Non group commit logic: Commits made in passes. Thread which started
+ flush first is performing actual flush, other threads sets new goal (LSN)
+ of the next pass (if it is maximum) and waits for the pass end or just
+ wait for the pass end.
+
+ - If hard group commit enabled and rate set to zero:
+ The first thread sends all changed buffers to disk. This is repeated
+ as long as there are new LSNs added. The process can not loop
+ forever because we have limited number of threads and they will wait
+ for the data to be synced.
+ Pseudo code:
+
+ do
+ send changed buffers to disk
+ while new_goal
+ sync
+
+ - If hard group commit switched ON and less than rate microseconds has
+ passed from last sync, then after buffers have been sent to disk
+ wait until rate microseconds has passed since last sync, do sync and return.
+ This ensures that if we call sync infrequently we don't do any waits.
+
+ - If soft group commit enabled everything works as with 'non group commit'
+ but the thread doesn't do any real sync(). If rate is not zero the
+ sync() will be performed by a service thread with the given rate
+ when needed (new LSN appears).
+
+ @note Terminology:
+ 'sent to disk' means written to disk but not sync()ed,
+ 'flushed' mean sent to disk and synced().
+*/
+
+my_bool translog_flush(TRANSLOG_ADDRESS lsn)
+{
+ struct timespec abstime;
+ ulonglong flush_interval;
+ ulonglong time_spent;
+ LSN sent_to_disk= LSN_IMPOSSIBLE;
+ TRANSLOG_ADDRESS flush_horizon;
+ my_bool rc= 0;
+ my_bool hgroup_commit_at_start;
+ DBUG_ENTER("translog_flush");
+ DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
+ LINT_INIT(sent_to_disk);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.flushed)));
+ if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
+ {
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ if (log_descriptor.flush_in_progress)
+ {
+ translog_lock();
+ /* fix lsn if it was horizon */
+ if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
+ lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
+ translog_unlock();
+ translog_flush_set_new_goal_and_wait(lsn);
+ if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
+ {
+ /*
+ translog_flush_wait_for_end() release log_flush_lock while is
+ waiting then acquire it again
+ */
+ translog_flush_wait_for_end(lsn);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ }
+ log_descriptor.flush_in_progress= 1;
+ flush_horizon= log_descriptor.previous_flush_horizon;
+ DBUG_PRINT("info", ("flush_in_progress is set, flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(flush_horizon)));
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ hgroup_commit_at_start= hard_group_commit;
+ if (hgroup_commit_at_start)
+ flush_interval= group_commit_wait;
+
+ translog_lock();
+ if (log_descriptor.is_everything_flushed)
+ {
+ DBUG_PRINT("info", ("everything is flushed"));
+ translog_unlock();
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+
+ for (;;)
+ {
+ /* Following function flushes buffers and makes translog_unlock() */
+ translog_flush_buffers(&lsn, &sent_to_disk, &flush_horizon);
+
+ if (!hgroup_commit_at_start)
+ break; /* flush pass is ended */
+
+retest:
+ /*
+ We do not check time here because pthread_mutex_lock rarely takes
+ a lot of time so we can sacrifice a bit precision to performance
+ (taking into account that my_micro_time() might be expensive call).
+ */
+ if (flush_interval == 0)
+ break; /* flush pass is ended */
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ if (log_descriptor.next_pass_max_lsn == LSN_IMPOSSIBLE)
+ {
+ if (flush_interval == 0 ||
+ (time_spent= (my_micro_time() - flush_start)) >= flush_interval)
{
- rc= 1;
- translog_stop_writing();
- sent_to_disk= LSN_IMPOSSIBLE;
- goto out;
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ break;
}
- file->is_sync= 1;
- }
- }
-
- if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- (LSN_FILE_NO(log_descriptor.previous_flush_horizon) !=
- LSN_FILE_NO(flush_horizon) ||
- ((LSN_OFFSET(log_descriptor.previous_flush_horizon) - 1) /
- TRANSLOG_PAGE_SIZE) !=
- ((LSN_OFFSET(flush_horizon) - 1) / TRANSLOG_PAGE_SIZE)))
- rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ DBUG_PRINT("info", ("flush waits: %llu interval: %llu spent: %llu",
+ flush_interval - time_spent,
+ flush_interval, time_spent));
+ /* wait time or next goal */
+ set_timespec_nsec(abstime, flush_interval - time_spent);
+ pthread_cond_timedwait(&log_descriptor.new_goal_cond,
+ &log_descriptor.log_flush_lock,
+ &abstime);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("retest conditions"));
+ goto retest;
+ }
+
+ /* take next goal */
+ lsn= log_descriptor.next_pass_max_lsn;
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ /* prevent other thread from continue */
+ log_descriptor.max_lsn_requester= pthread_self();
+ DBUG_PRINT("info", ("flush took next goal: (%lu,0x%lx)",
+ LSN_IN_PARTS(lsn)));
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ /* next flush pass */
+ DBUG_PRINT("info", ("next flush pass"));
+ translog_lock();
+ }
+
+ /*
+ sync() files from previous flush till current one
+ */
+ if (!soft_sync || hgroup_commit_at_start)
+ {
+ if ((rc=
+ translog_sync_files(LSN_FILE_NO(log_descriptor.flushed),
+ LSN_FILE_NO(lsn),
+ sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
+ (LSN_FILE_NO(log_descriptor.
+ previous_flush_horizon) !=
+ LSN_FILE_NO(flush_horizon) ||
+ (LSN_OFFSET(log_descriptor.
+ previous_flush_horizon) /
+ TRANSLOG_PAGE_SIZE) !=
+ (LSN_OFFSET(flush_horizon) /
+ TRANSLOG_PAGE_SIZE)))))
+ {
+ sent_to_disk= LSN_IMPOSSIBLE;
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+ /* keep values for soft sync() and forced sync() actual */
+ {
+ uint32 fileno= LSN_FILE_NO(lsn);
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_min, fileno);
+ my_atomic_store32(&soft_sync_max, fileno);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+ }
+ else
+ {
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_max, LSN_FILE_NO(lsn));
+ my_atomic_store32(&soft_need_sync, 1);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+
+ DBUG_ASSERT(flush_horizon <= log_descriptor.horizon);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
log_descriptor.previous_flush_horizon= flush_horizon;
out:
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
if (sent_to_disk != LSN_IMPOSSIBLE)
log_descriptor.flushed= sent_to_disk;
log_descriptor.flush_in_progress= 0;
log_descriptor.flush_no++;
DBUG_PRINT("info", ("flush_in_progress is dropped"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);\
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
pthread_cond_broadcast(&log_descriptor.log_flush_cond);
DBUG_RETURN(rc);
}
@@ -8113,6 +8450,8 @@
my_bool translog_purge(TRANSLOG_ADDRESS low)
{
uint32 last_need_file= LSN_FILE_NO(low);
+ uint32 min_unsync;
+ int soft;
TRANSLOG_ADDRESS horizon= translog_get_horizon();
int rc= 0;
DBUG_ENTER("translog_purge");
@@ -8120,12 +8459,26 @@
DBUG_ASSERT(translog_status == TRANSLOG_OK ||
translog_status == TRANSLOG_READONLY);
+ soft= soft_sync;
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ min_unsync= my_atomic_load32(&soft_sync_min);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ DBUG_PRINT("info", ("min_unsync: %lu", (ulong) min_unsync));
+ if (soft && min_unsync < last_need_file)
+ {
+ last_need_file= min_unsync;
+ DBUG_PRINT("info", ("last_need_file set to %lu", (ulong)last_need_file));
+ }
+
pthread_mutex_lock(&log_descriptor.purger_lock);
+ DBUG_PRINT("info", ("last_lsn_checked file: %lu:",
+ (ulong) log_descriptor.last_lsn_checked));
if (LSN_FILE_NO(log_descriptor.last_lsn_checked) < last_need_file)
{
uint32 i;
uint32 min_file= translog_first_file(horizon, 1);
DBUG_ASSERT(min_file != 0); /* log is already started */
+ DBUG_PRINT("info", ("min_file: %lu:",(ulong) min_file));
for(i= min_file; i < last_need_file && rc == 0; i++)
{
LSN lsn= translog_get_file_max_lsn_stored(i);
@@ -8356,6 +8709,159 @@
}
+
+/**
+ Sets soft sync mode
+
+ @param mode TRUE if we need switch soft sync on else off
+*/
+
+void translog_soft_sync(my_bool mode)
+{
+ soft_sync= mode;
+}
+
+
+/**
+ Sets hard group commit
+
+ @param mode TRUE if we need switch hard group commit on else off
+*/
+
+void translog_hard_group_commit(my_bool mode)
+{
+ hard_group_commit= mode;
+}
+
+
+/**
+ @brief forced log sync (used when we are switching modes)
+*/
+
+void translog_sync()
+{
+ uint32 max= get_current_logfile()->number;
+ uint32 min;
+ DBUG_ENTER("ma_translog_sync");
+
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+ if (!min)
+ min= max;
+
+ translog_sync_files(min, max, sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS);
+
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief set rate for group commit
+
+ @param interval interval to set.
+
+ @note We use this function with additional variable because have to
+ restart service thread with new value which we can't make inside changing
+ variable routine (update_maria_group_commit_interval)
+*/
+
+void translog_set_group_commit_interval(uint32 interval)
+{
+ DBUG_ENTER("translog_set_group_commit_interval");
+ group_commit_wait= interval;
+ DBUG_PRINT("info", ("wait: %llu",
+ (ulonglong)group_commit_wait));
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief syncing service thread
+*/
+
+static pthread_handler_t
+ma_soft_sync_background( void *arg __attribute__((unused)))
+{
+
+ my_thread_init();
+ {
+ DBUG_ENTER("ma_soft_sync_background");
+ for(;;)
+ {
+ ulonglong prev_loop= my_micro_time();
+ ulonglong time, sleep;
+ uint32 min, max, sync_request;
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ sync_request= my_atomic_load32(&soft_need_sync);
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_store32(&soft_need_sync, 0);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ sleep= group_commit_wait;
+ if (sync_request)
+ translog_sync_files(min, max, FALSE);
+ time= my_micro_time() - prev_loop;
+ if (time > sleep)
+ sleep= 0;
+ else
+ sleep-= time;
+ if (my_service_thread_sleep(&soft_sync_control, sleep))
+ break;
+ }
+ my_service_thread_signal_end(&soft_sync_control);
+ my_thread_end();
+ DBUG_RETURN(0);
+ }
+}
+
+
+/**
+ @brief Starts syncing thread
+*/
+
+int translog_soft_sync_start(void)
+{
+ pthread_t th;
+ int res= 0;
+ uint32 min, max;
+ DBUG_ENTER("translog_soft_sync_start");
+
+ /* check and init variables */
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ if (!max)
+ my_atomic_store32(&soft_sync_max, (max= get_current_logfile()->number));
+ if (!min)
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_store32(&soft_need_sync, 1);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ if (!(res= ma_service_thread_control_init(&soft_sync_control)))
+ if (!(res= pthread_create(&th, NULL, ma_soft_sync_background, NULL)))
+ soft_sync_control.status= THREAD_RUNNING;
+ DBUG_RETURN(res);
+}
+
+
+/**
+ @brief Stops syncing thread
+*/
+
+void translog_soft_sync_end(void)
+{
+ DBUG_ENTER("translog_soft_sync_end");
+ if (soft_sync_control.inited)
+ {
+ ma_service_thread_control_end(&soft_sync_control);
+ }
+ DBUG_VOID_RETURN;
+}
+
+
#ifdef MARIA_DUMP_LOG
#include <my_getopt.h>
extern void translog_example_table_init();
=== modified file 'storage/maria/ma_loghandler.h'
--- a/storage/maria/ma_loghandler.h 2009-01-15 22:25:53 +0000
+++ b/storage/maria/ma_loghandler.h 2010-02-10 20:50:26 +0000
@@ -342,6 +342,14 @@
TRANSLOG_SHUTDOWN /* going to shutdown the loghandler */
};
extern enum enum_translog_status translog_status;
+extern ulonglong translog_syncs; /* Number of sync()s */
+
+void translog_soft_sync(my_bool mode);
+void translog_hard_group_commit(my_bool mode);
+int translog_soft_sync_start(void);
+void translog_soft_sync_end(void);
+void translog_sync();
+void translog_set_group_commit_interval(uint32 interval);
/*
all the rest added because of recovery; should we make
@@ -441,6 +449,14 @@
typedef enum
{
+ TRANSLOG_GCOMMIT_NONE,
+ TRANSLOG_GCOMMIT_HARD,
+ TRANSLOG_GCOMMIT_SOFT
+} enum_maria_group_commit;
+extern ulong maria_group_commit;
+extern ulong maria_group_commit_interval;
+typedef enum
+{
TRANSLOG_PURGE_IMMIDIATE,
TRANSLOG_PURGE_EXTERNAL,
TRANSLOG_PURGE_ONDEMAND
1
0
[Maria-developers] Rev 2740: Group commit for maria storage engine. in file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
by sanja@askmonty.org 10 Feb '10
by sanja@askmonty.org 10 Feb '10
10 Feb '10
At file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
------------------------------------------------------------
revno: 2740
revision-id: sanja(a)askmonty.org-20100209083259-ekki5zw4hbaeqpwh
parent: knielsen(a)knielsen-hq.org-20100201190519-b9uktnn90rwwiile
committer: sanja(a)askmonty.org
branch nick: work-maria-5.2-groupcommit
timestamp: Tue 2010-02-09 10:32:59 +0200
message:
Group commit for maria storage engine.
=== added file 'mysql-test/suite/maria/r/group_commit.result'
--- a/mysql-test/suite/maria/r/group_commit.result 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/r/group_commit.result 2010-02-09 08:32:59 +0000
@@ -0,0 +1,17 @@
+drop table if exists t1;
+create table t1 (a int);
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'mysql-test/suite/maria/r/maria3.result'
--- a/mysql-test/suite/maria/r/maria3.result 2009-09-18 01:04:43 +0000
+++ b/mysql-test/suite/maria/r/maria3.result 2010-02-09 08:32:59 +0000
@@ -306,6 +306,8 @@
maria_block_size 8192
maria_checkpoint_interval 30
maria_force_start_after_recovery_failures 0
+maria_group_commit none
+maria_group_commit_interval 0
maria_log_file_size 4294959104
maria_log_purge_type immediate
maria_max_sort_file_size 9223372036853727232
@@ -328,6 +330,7 @@
Maria_pagecache_reads #
Maria_pagecache_write_requests #
Maria_pagecache_writes #
+Maria_transaction_log_syncs #
create table t1 (b char(0));
insert into t1 values(NULL),("");
select length(b) from t1;
=== added file 'mysql-test/suite/maria/t/group_commit.test'
--- a/mysql-test/suite/maria/t/group_commit.test 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/t/group_commit.test 2010-02-09 08:32:59 +0000
@@ -0,0 +1,71 @@
+# Test different ways of syncing (mostly syntax)
+
+--disable_warnings
+drop table if exists t1;
+--enable_warnings
+
+create table t1 (a int);
+
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'storage/maria/ha_maria.cc'
--- a/storage/maria/ha_maria.cc 2009-12-03 11:34:11 +0000
+++ b/storage/maria/ha_maria.cc 2010-02-09 08:32:59 +0000
@@ -102,22 +102,40 @@
array_elements(maria_translog_purge_type_names) - 1, "",
maria_translog_purge_type_names, NULL
};
+
+/* transactional log directory sync */
const char *maria_sync_log_dir_names[]=
{
"NEVER", "NEWFILE", "ALWAYS", NullS
};
-
TYPELIB maria_sync_log_dir_typelib=
{
array_elements(maria_sync_log_dir_names) - 1, "",
maria_sync_log_dir_names, NULL
};
+/* transactional log group commit */
+const char *maria_group_commit_names[]=
+{
+ "none", "hard", "soft", NullS
+};
+TYPELIB maria_group_commit_typelib=
+{
+ array_elements(maria_group_commit_names) - 1, "",
+ maria_group_commit_names, NULL
+};
+
/** Interval between background checkpoints in seconds */
static ulong checkpoint_interval;
static void update_checkpoint_interval(MYSQL_THD thd,
struct st_mysql_sys_var *var,
void *var_ptr, const void *save);
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
/** After that many consecutive recovery failures, remove logs */
static ulong force_start_after_recovery_failures;
static void update_log_file_size(MYSQL_THD thd,
@@ -164,6 +182,24 @@
NULL, update_log_file_size, TRANSLOG_FILE_SIZE,
TRANSLOG_MIN_FILE_SIZE, 0xffffffffL, TRANSLOG_PAGE_SIZE);
+static MYSQL_SYSVAR_ENUM(group_commit, maria_group_commit,
+ PLUGIN_VAR_RQCMDARG,
+ "Specifies maria group commit mode. "
+ "Possible values are \"none\" (no group commit), "
+ "\"hard\" (with waiting to actual commit), "
+ "\"soft\" (no wait for commit (DANGEROUS!!!))",
+ NULL, update_maria_group_commit,
+ TRANSLOG_GCOMMIT_NONE, &maria_group_commit_typelib);
+
+static MYSQL_SYSVAR_ULONG(group_commit_interval, maria_group_commit_interval,
+ PLUGIN_VAR_RQCMDARG,
+ "Interval between commite in microseconds (1/1000000c)."
+ " 0 stands for no waiting"
+ "for other threads to come and do a commit in \"hard\" mode and no"
+ " sync()/commit at all in \"soft\" mode. Option has only an effect"
+ "if maria_group_commit is used",
+ NULL, update_maria_group_commit_interval, 0, 0, UINT_MAX, 1);
+
static MYSQL_SYSVAR_ENUM(log_purge_type, log_purge_type,
PLUGIN_VAR_RQCMDARG,
"Specifies how maria transactional log will be purged. "
@@ -3275,6 +3311,8 @@
MYSQL_SYSVAR(block_size),
MYSQL_SYSVAR(checkpoint_interval),
MYSQL_SYSVAR(force_start_after_recovery_failures),
+ MYSQL_SYSVAR(group_commit),
+ MYSQL_SYSVAR(group_commit_interval),
MYSQL_SYSVAR(page_checksum),
MYSQL_SYSVAR(log_dir_path),
MYSQL_SYSVAR(log_file_size),
@@ -3306,6 +3344,92 @@
}
/**
+ @brief Updates group commit mode
+*/
+
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong value= (ulong)*((long *)var_ptr);
+ DBUG_ENTER("update_maria_group_commit");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu rate %lu",
+ value, (ulong)(*(long *)save),
+ maria_group_commit_interval));
+ /* old value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(FALSE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(FALSE);
+ if (maria_group_commit_interval)
+ translog_soft_sync_end();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ value= *(ulong *)var_ptr= (ulong)(*(long *)save);
+ translog_sync();
+ /* new value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(TRUE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(TRUE);
+ /* variable change made under global lock so we can just read it */
+ if (maria_group_commit_interval)
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Updates group commit interval
+*/
+
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong new_value= (ulong)*((long *)save);
+ ulong *value_ptr= (ulong*) var_ptr;
+ DBUG_ENTER("update_maria_group_commit_interval");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu group commit %lu",
+ *value_ptr, new_value, maria_group_commit));
+
+ /* variable change made under global lock so we can just read it */
+ switch (maria_group_commit) {
+ case TRANSLOG_GCOMMIT_NONE:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ if (*value_ptr)
+ translog_soft_sync_end();
+ translog_set_group_commit_interval(new_value);
+ if ((*value_ptr= new_value))
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
@brief Updates the transaction log file limit.
*/
@@ -3327,6 +3451,7 @@
{"Maria_pagecache_reads", (char*) &maria_pagecache_var.global_cache_read, SHOW_LONGLONG},
{"Maria_pagecache_write_requests", (char*) &maria_pagecache_var.global_cache_w_requests, SHOW_LONGLONG},
{"Maria_pagecache_writes", (char*) &maria_pagecache_var.global_cache_write, SHOW_LONGLONG},
+ {"Maria_transaction_log_syncs", (char*) &translog_syncs, SHOW_LONGLONG},
{NullS, NullS, SHOW_LONG}
};
=== modified file 'storage/maria/ma_init.c'
--- a/storage/maria/ma_init.c 2008-10-09 20:03:54 +0000
+++ b/storage/maria/ma_init.c 2010-02-09 08:32:59 +0000
@@ -82,6 +82,11 @@
maria_inited= maria_multi_threaded= FALSE;
ft_free_stopwords();
ma_checkpoint_end();
+ if (translog_status == TRANSLOG_OK)
+ {
+ translog_soft_sync_end();
+ translog_sync();
+ }
if ((trid= trnman_get_max_trid()) > max_trid_in_control_file)
{
/*
=== modified file 'storage/maria/ma_loghandler.c'
--- a/storage/maria/ma_loghandler.c 2010-01-06 21:27:53 +0000
+++ b/storage/maria/ma_loghandler.c 2010-02-09 08:32:59 +0000
@@ -18,6 +18,7 @@
#include "ma_blockrec.h" /* for some constants and in-write hooks */
#include "ma_key_recover.h" /* For some in-write hooks */
#include "ma_checkpoint.h"
+#include "ma_servicethread.h"
/*
On Windows, neither my_open() nor my_sync() work for directories.
@@ -47,6 +48,15 @@
#include <m_ctype.h>
#endif
+/** @brief protects checkpoint_in_progress */
+static pthread_mutex_t LOCK_soft_sync;
+/** @brief for killing the background checkpoint thread */
+static pthread_cond_t COND_soft_sync;
+/** @brief control structure for checkpoint background thread */
+static MA_SERVICE_THREAD_CONTROL soft_sync_control=
+ {THREAD_DEAD, FALSE, &LOCK_soft_sync, &COND_soft_sync};
+
+
/* transaction log file descriptor */
typedef struct st_translog_file
{
@@ -124,10 +134,20 @@
/* Previous buffer offset to detect it flush finish */
TRANSLOG_ADDRESS prev_buffer_offset;
/*
+ If the buffer was forced to close it save value of its horizon
+ otherwise LSN_IMPOSSIBLE
+ */
+ TRANSLOG_ADDRESS pre_force_close_horizon;
+ /*
How much is written (or will be written when copy_to_buffer_in_progress
become 0) to this buffer
*/
translog_size_t size;
+ /*
+ How much data was skipped during moving page from previous buffer
+ to this one (it is optimisation of forcing buffer to finish
+ */
+ uint skipped_data;
/* File handler for this buffer */
TRANSLOG_FILE *file;
/* Threads which are waiting for buffer filling/freeing */
@@ -304,6 +324,7 @@
*/
pthread_mutex_t log_flush_lock;
pthread_cond_t log_flush_cond;
+ pthread_cond_t new_goal_cond;
/* Protects changing of headers of finished files (max_lsn) */
pthread_mutex_t file_header_lock;
@@ -344,13 +365,38 @@
ulong log_purge_type= TRANSLOG_PURGE_IMMIDIATE;
ulong log_file_size= TRANSLOG_FILE_SIZE;
+/* sync() of log files directory mode */
ulong sync_log_dir= TRANSLOG_SYNC_DIR_NEWFILE;
+ulong maria_group_commit= TRANSLOG_GCOMMIT_NONE;
+ulong maria_group_commit_interval= 0;
/* Marker for end of log */
static uchar end_of_log= 0;
#define END_OF_LOG &end_of_log
+/**
+ Switch for "soft" sync (no real sync() but periodical sync by service
+ thread)
+*/
+static volatile my_bool soft_sync= FALSE;
+/**
+ Switch for "hard" group commit mode
+*/
+static volatile my_bool hard_group_commit= FALSE;
+/**
+ File numbers interval which have to be sync()
+*/
+static uint32 soft_sync_min= 0;
+static uint32 soft_sync_max= 0;
+/**
+ stores interval in microseconds
+*/
+static uint32 group_commit_wait= 0;
enum enum_translog_status translog_status= TRANSLOG_UNINITED;
+ulonglong translog_syncs= 0; /* Number of sync()s */
+
+/* time of last flush */
+static ulonglong flush_start= 0;
/* chunk types */
#define TRANSLOG_CHUNK_LSN 0x00 /* 0 chunk refer as LSN (head or tail */
@@ -980,12 +1026,17 @@
static TRANSLOG_FILE *get_current_logfile()
{
TRANSLOG_FILE *file;
+ DBUG_ENTER("get_current_logfile");
rw_rdlock(&log_descriptor.open_files_lock);
+ DBUG_PRINT("info", ("max_file: %lu min_file: %lu open_files: %lu",
+ (ulong) log_descriptor.max_file,
+ (ulong) log_descriptor.min_file,
+ (ulong) log_descriptor.open_files.elements));
DBUG_ASSERT(log_descriptor.max_file - log_descriptor.min_file + 1 ==
log_descriptor.open_files.elements);
file= *dynamic_element(&log_descriptor.open_files, 0, TRANSLOG_FILE **);
rw_unlock(&log_descriptor.open_files_lock);
- return (file);
+ DBUG_RETURN(file);
}
uchar NEAR maria_trans_file_magic[]=
@@ -1069,6 +1120,7 @@
static my_bool translog_max_lsn_to_header(File file, LSN lsn)
{
uchar lsn_buff[LSN_STORE_SIZE];
+ my_bool rc;
DBUG_ENTER("translog_max_lsn_to_header");
DBUG_PRINT("enter", ("File descriptor: %ld "
"lsn: (%lu,0x%lx)",
@@ -1077,11 +1129,13 @@
lsn_store(lsn_buff, lsn);
- DBUG_RETURN(my_pwrite(file, lsn_buff,
- LSN_STORE_SIZE,
- (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
- log_write_flags) != 0 ||
- my_sync(file, MYF(MY_WME)) != 0);
+ if (!(rc= (my_pwrite(file, lsn_buff,
+ LSN_STORE_SIZE,
+ (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
+ log_write_flags) != 0 ||
+ my_sync(file, MYF(MY_WME)) != 0)))
+ translog_syncs++;
+ DBUG_RETURN(rc);
}
@@ -1423,7 +1477,9 @@
static my_bool translog_buffer_init(struct st_translog_buffer *buffer, int num)
{
DBUG_ENTER("translog_buffer_init");
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn=
+ LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
@@ -1435,6 +1491,7 @@
memset(buffer->buffer, TRANSLOG_FILLER, TRANSLOG_WRITE_BUFFER);
/* Buffer size */
buffer->size= 0;
+ buffer->skipped_data= 0;
/* cond of thread which is waiting for buffer filling */
if (pthread_cond_init(&buffer->waiting_filling_buffer, 0))
DBUG_RETURN(1);
@@ -1489,7 +1546,10 @@
TODO: sync only we have changed the log
*/
if (!file->is_sync)
+ {
rc= my_sync(file->handler.file, MYF(MY_WME));
+ translog_syncs++;
+ }
rc|= my_close(file->handler.file, MYF(MY_WME));
my_free(file, MYF(0));
return test(rc);
@@ -2044,7 +2104,8 @@
(ulong) LSN_OFFSET(log_descriptor.horizon),
(ulong) LSN_OFFSET(log_descriptor.horizon)));
DBUG_ASSERT(buffer_no == buffer->buffer_no);
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
buffer->offset= log_descriptor.horizon;
@@ -2052,6 +2113,7 @@
buffer->file= get_current_logfile();
buffer->overlay= 0;
buffer->size= 0;
+ buffer->skipped_data= 0;
translog_cursor_init(cursor, buffer, buffer_no);
DBUG_PRINT("info", ("file: #%ld (%d) init cursor #%u: 0x%lx "
"chaser: %d Size: %lu (%lu)",
@@ -2523,6 +2585,7 @@
TRANSLOG_ADDRESS offset= buffer->offset;
TRANSLOG_FILE *file= buffer->file;
uint8 ver= buffer->ver;
+ uint skipped_data;
DBUG_ENTER("translog_buffer_flush");
DBUG_PRINT("enter",
("Buffer: #%u 0x%lx file: %d offset: (%lu,0x%lx) size: %lu",
@@ -2557,6 +2620,8 @@
disk
*/
file= buffer->file;
+ skipped_data= buffer->skipped_data;
+ DBUG_ASSERT(skipped_data < TRANSLOG_PAGE_SIZE);
for (i= 0, pg= LSN_OFFSET(buffer->offset) / TRANSLOG_PAGE_SIZE;
i < buffer->size;
i+= TRANSLOG_PAGE_SIZE, pg++)
@@ -2573,13 +2638,16 @@
DBUG_ASSERT(i + TRANSLOG_PAGE_SIZE <= buffer->size);
if (translog_status != TRANSLOG_OK && translog_status != TRANSLOG_SHUTDOWN)
DBUG_RETURN(1);
- if (pagecache_inject(log_descriptor.pagecache,
+ if (pagecache_write_part(log_descriptor.pagecache,
&file->handler, pg, 3,
buffer->buffer + i,
PAGECACHE_PLAIN_PAGE,
PAGECACHE_LOCK_LEFT_UNLOCKED,
- PAGECACHE_PIN_LEFT_UNPINNED, 0,
- LSN_IMPOSSIBLE))
+ PAGECACHE_PIN_LEFT_UNPINNED,
+ PAGECACHE_WRITE_DONE, 0,
+ LSN_IMPOSSIBLE,
+ skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data))
{
DBUG_PRINT("error",
("Can't write page (%lu,0x%lx) to pagecache, error: %d",
@@ -2589,10 +2657,12 @@
translog_stop_writing();
DBUG_RETURN(1);
}
+ skipped_data= 0;
}
file->is_sync= 0;
- if (my_pwrite(file->handler.file, buffer->buffer,
- buffer->size, LSN_OFFSET(buffer->offset),
+ if (my_pwrite(file->handler.file, buffer->buffer + buffer->skipped_data,
+ buffer->size - buffer->skipped_data,
+ LSN_OFFSET(buffer->offset) + buffer->skipped_data,
log_write_flags))
{
DBUG_PRINT("error", ("Can't write buffer (%lu,0x%lx) size %lu "
@@ -2985,6 +3055,7 @@
uchar *from, *table= NULL;
int is_last_unfinished_page;
uint last_protected_sector= 0;
+ uint skipped_data= curr_buffer->skipped_data;
TRANSLOG_FILE file_copy;
uint8 ver= curr_buffer->ver;
translog_wait_for_writers(curr_buffer);
@@ -2997,7 +3068,25 @@
}
DBUG_ASSERT(LSN_FILE_NO(addr) == LSN_FILE_NO(curr_buffer->offset));
from= curr_buffer->buffer + (addr - curr_buffer->offset);
- memcpy(buffer, from, TRANSLOG_PAGE_SIZE);
+ if (skipped_data > (addr - curr_buffer->offset))
+ {
+ /*
+ We read page part of which is not present in buffer,
+ so we should read absent part from file (page cache actually)
+ */
+ file= get_logfile_by_number(file_no);
+ DBUG_ASSERT(file != NULL);
+ buffer= pagecache_read(log_descriptor.pagecache, &file->handler,
+ LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE,
+ 3, buffer,
+ PAGECACHE_PLAIN_PAGE,
+ PAGECACHE_LOCK_LEFT_UNLOCKED,
+ NULL);
+ }
+ else
+ skipped_data= 0; /* Read after skipped in buffer data */
+ memcpy(buffer + skipped_data, from + skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data);
/*
We can use copy then in translog_page_validator() because it
do not put it permanently somewhere.
@@ -3291,6 +3380,7 @@
uint32 next_page_offset, page_rest;
uint32 i;
File fd;
+ int rc;
TRANSLOG_VALIDATOR_DATA data;
char path[FN_REFLEN];
uchar page_buff[TRANSLOG_PAGE_SIZE];
@@ -3316,14 +3406,19 @@
TRANSLOG_PAGE_SIZE);
page_rest= next_page_offset - LSN_OFFSET(addr);
memset(page_buff, TRANSLOG_FILLER, page_rest);
- if ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
- ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
- (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
- log_write_flags)) ||
- my_sync(fd, MYF(MY_WME))) |
- my_close(fd, MYF(MY_WME))) ||
- (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD))))
+ rc= ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
+ ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
+ (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
+ log_write_flags)) ||
+ my_sync(fd, MYF(MY_WME)))));
+ translog_syncs++;
+ rc|= (fd > 0 && my_close(fd, MYF(MY_WME)));
+ if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS)
+ {
+ rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ translog_syncs++;
+ }
+ if (rc)
DBUG_RETURN(1);
/* fix the horizon */
@@ -3511,6 +3606,7 @@
pthread_mutex_init(&log_descriptor.dirty_buffer_mask_lock,
MY_MUTEX_INIT_FAST) ||
pthread_cond_init(&log_descriptor.log_flush_cond, 0) ||
+ pthread_cond_init(&log_descriptor.new_goal_cond, 0) ||
my_rwlock_init(&log_descriptor.open_files_lock,
NULL) ||
my_init_dynamic_array(&log_descriptor.open_files,
@@ -3912,7 +4008,6 @@
log_descriptor.flushed= log_descriptor.horizon;
log_descriptor.in_buffers_only= log_descriptor.bc.buffer->offset;
log_descriptor.max_lsn= LSN_IMPOSSIBLE; /* set to 0 */
- log_descriptor.previous_flush_horizon= log_descriptor.horizon;
/*
Now 'flushed' is set to 'horizon' value, but 'horizon' is (potentially)
address of the next LSN and we want indicate that all LSNs that are
@@ -3995,6 +4090,10 @@
It is beginning of the log => there is no LSNs in the log =>
There is no harm in leaving it "as-is".
*/
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.
+ previous_flush_horizon)));
DBUG_RETURN(0);
}
file_no--;
@@ -4070,6 +4169,9 @@
translog_free_record_header(&rec);
}
}
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.previous_flush_horizon)));
DBUG_RETURN(0);
err:
ma_message_no_user(0, "log initialization failed");
@@ -4157,6 +4259,7 @@
pthread_mutex_destroy(&log_descriptor.log_flush_lock);
pthread_mutex_destroy(&log_descriptor.dirty_buffer_mask_lock);
pthread_cond_destroy(&log_descriptor.log_flush_cond);
+ pthread_cond_destroy(&log_descriptor.new_goal_cond);
rwlock_destroy(&log_descriptor.open_files_lock);
delete_dynamic(&log_descriptor.open_files);
delete_dynamic(&log_descriptor.unfinished_files);
@@ -6885,11 +6988,11 @@
{
translog_size_t res;
DBUG_ENTER("translog_read_record_header_from_buffer");
- DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
DBUG_PRINT("info", ("page byte: 0x%x offset: %u",
(uint) page[page_offset], (uint) page_offset));
+ DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
buff->type= (page[page_offset] & TRANSLOG_REC_TYPE);
buff->short_trid= uint2korr(page + page_offset + 1);
DBUG_PRINT("info", ("Type %u, Short TrID %u, LSN (%lu,0x%lx)",
@@ -7356,27 +7459,27 @@
"Buffer addr: (%lu,0x%lx) "
"Page addr: (%lu,0x%lx) "
"size: %lu (%lu) Pg: %u left: %u in progress %u",
- (uint) log_descriptor.bc.buffer_no,
- (ulong) log_descriptor.bc.buffer,
- LSN_IN_PARTS(log_descriptor.bc.buffer->offset),
+ (uint) old_buffer_no,
+ (ulong) old_buffer,
+ LSN_IN_PARTS(old_buffer->offset),
(ulong) LSN_FILE_NO(log_descriptor.horizon),
(ulong) (LSN_OFFSET(log_descriptor.horizon) -
log_descriptor.bc.current_page_fill),
- (ulong) log_descriptor.bc.buffer->size,
+ (ulong) old_buffer->size,
(ulong) (log_descriptor.bc.ptr -log_descriptor.bc.
buffer->buffer),
(uint) log_descriptor.bc.current_page_fill,
(uint) left,
- (uint) log_descriptor.bc.buffer->
+ (uint) old_buffer->
copy_to_buffer_in_progress));
translog_lock_assert_owner();
LINT_INIT(current_page_fill);
- new_buff_beginning= log_descriptor.bc.buffer->offset;
- new_buff_beginning+= log_descriptor.bc.buffer->size; /* increase offset */
+ new_buff_beginning= old_buffer->offset;
+ new_buff_beginning+= old_buffer->size; /* increase offset */
DBUG_ASSERT(log_descriptor.bc.ptr !=NULL);
DBUG_ASSERT(LSN_FILE_NO(log_descriptor.horizon) ==
- LSN_FILE_NO(log_descriptor.bc.buffer->offset));
+ LSN_FILE_NO(old_buffer->offset));
translog_check_cursor(&log_descriptor.bc);
DBUG_ASSERT(left < TRANSLOG_PAGE_SIZE);
if (left)
@@ -7387,18 +7490,20 @@
*/
DBUG_PRINT("info", ("left: %u", (uint) left));
+ old_buffer->pre_force_close_horizon=
+ old_buffer->offset + old_buffer->size;
/* decrease offset */
new_buff_beginning-= log_descriptor.bc.current_page_fill;
current_page_fill= log_descriptor.bc.current_page_fill;
memset(log_descriptor.bc.ptr, TRANSLOG_FILLER, left);
- log_descriptor.bc.buffer->size+= left;
+ old_buffer->size+= left;
DBUG_PRINT("info", ("Finish Page buffer #%u: 0x%lx "
"Size: %lu",
- (uint) log_descriptor.bc.buffer->buffer_no,
- (ulong) log_descriptor.bc.buffer,
- (ulong) log_descriptor.bc.buffer->size));
- DBUG_ASSERT(log_descriptor.bc.buffer->buffer_no ==
+ (uint) old_buffer->buffer_no,
+ (ulong) old_buffer,
+ (ulong) old_buffer->size));
+ DBUG_ASSERT(old_buffer->buffer_no ==
log_descriptor.bc.buffer_no);
}
else
@@ -7509,11 +7614,21 @@
if (left)
{
- /*
- TODO: do not copy beginning of the page if we have no CRC or sector
- checks on
- */
- memcpy(new_buffer->buffer, data, current_page_fill);
+ if (log_descriptor.flags &
+ (TRANSLOG_PAGE_CRC | TRANSLOG_SECTOR_PROTECTION))
+ memcpy(new_buffer->buffer, data, current_page_fill);
+ else
+ {
+ /*
+ This page header does not change if we add more data to the page so
+ we can not copy it and will not overwrite later
+ */
+ new_buffer->skipped_data= current_page_fill;
+#ifndef DBUG_OFF
+ memset(new_buffer->buffer, 0xa5, current_page_fill);
+#endif
+ DBUG_ASSERT(new_buffer->skipped_data < TRANSLOG_PAGE_SIZE);
+ }
}
old_buffer->next_buffer_offset= new_buffer->offset;
translog_buffer_lock(new_buffer);
@@ -7561,6 +7676,7 @@
{
log_descriptor.next_pass_max_lsn= lsn;
log_descriptor.max_lsn_requester= pthread_self();
+ pthread_cond_broadcast(&log_descriptor.new_goal_cond);
}
while (flush_no == log_descriptor.flush_no)
{
@@ -7572,66 +7688,78 @@
/**
- @brief Flush the log up to given LSN (included)
-
- @param lsn log record serial number up to which (inclusive)
- the log has to be flushed
-
- @return Operation status
+ @brief sync() range of files (inclusive) and directory (by request)
+
+ @param min min internal file number to flush
+ @param max max internal file number to flush
+ @param sync_dir need sync directory
+
+ return Operation status
@retval 0 OK
@retval 1 Error
-
-*/
-
-my_bool translog_flush(TRANSLOG_ADDRESS lsn)
-{
- LSN sent_to_disk= LSN_IMPOSSIBLE;
- TRANSLOG_ADDRESS flush_horizon;
- uint fn, i;
+*/
+
+static my_bool translog_sync_files(uint32 min, uint32 max,
+ my_bool sync_dir)
+{
+ uint fn;
+ my_bool rc= 0;
+ ulonglong flush_interval;
+ DBUG_ENTER("translog_sync_files");
+ DBUG_PRINT("info", ("min: %lu max: %lu sync dir: %d",
+ (ulong) min, (ulong) max, (int) sync_dir));
+ DBUG_ASSERT(min <= max);
+
+ flush_interval= group_commit_wait;
+ if (flush_interval)
+ flush_start= my_micro_time();
+ for (fn= min; fn <= max; fn++)
+ {
+ TRANSLOG_FILE *file= get_logfile_by_number(fn);
+ DBUG_ASSERT(file != NULL);
+ if (!file->is_sync)
+ {
+ if (my_sync(file->handler.file, MYF(MY_WME)))
+ {
+ rc= 1;
+ translog_stop_writing();
+ DBUG_RETURN(rc);
+ }
+ translog_syncs++;
+ file->is_sync= 1;
+ }
+ }
+
+ if (sync_dir)
+ {
+ if (!(rc= sync_dir(log_descriptor.directory_fd,
+ MYF(MY_WME | MY_IGNORE_BADFD))))
+ translog_syncs++;
+ }
+
+ DBUG_RETURN(rc);
+}
+
+
+/*
+ @brief Flushes buffers with LSNs in them less or equal address <lsn>
+
+ @param lsn address up to which all LSNs should be flushed,
+ can be reset to real last LSN address
+ @parem sent_to_disk returns 'sent to disk' position
+ @param flush_horizon returns horizon of the flush
+
+ @note About terminology see comment to translog_flush().
+*/
+
+void translog_flush_buffers(TRANSLOG_ADDRESS *lsn,
+ TRANSLOG_ADDRESS *sent_to_disk,
+ TRANSLOG_ADDRESS *flush_horizon)
+{
dirty_buffer_mask_t dirty_buffer_mask;
+ uint i;
uint8 last_buffer_no, start_buffer_no;
- my_bool rc= 0;
- DBUG_ENTER("translog_flush");
- DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
- LINT_INIT(sent_to_disk);
-
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
- DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.flushed)));
- if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
- {
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- if (log_descriptor.flush_in_progress)
- {
- translog_flush_set_new_goal_and_wait(lsn);
- if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
- {
- /* fix lsn if it was horizon */
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
- lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
- translog_flush_wait_for_end(lsn);
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
- }
- log_descriptor.flush_in_progress= 1;
- flush_horizon= log_descriptor.previous_flush_horizon;
- DBUG_PRINT("info", ("flush_in_progress is set"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
-
- translog_lock();
- if (log_descriptor.is_everything_flushed)
- {
- DBUG_PRINT("info", ("everything is flushed"));
- rc= (translog_status == TRANSLOG_READONLY);
- translog_unlock();
- goto out;
- }
+ DBUG_ENTER("translog_flush_buffers");
/*
We will recheck information when will lock buffers one by
@@ -7656,15 +7784,15 @@
/*
if LSN up to which we have to flush bigger then maximum LSN of previous
buffer and at least one LSN was saved in the current buffer (last_lsn !=
- LSN_IMPOSSIBLE) then we better finish the current buffer.
+ LSN_IMPOSSIBLE) then we have to close the current buffer.
*/
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
+ if (cmp_translog_addr(*lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
log_descriptor.bc.buffer->last_lsn != LSN_IMPOSSIBLE)
{
struct st_translog_buffer *buffer= log_descriptor.bc.buffer;
- lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
+ *lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
DBUG_PRINT("info", ("LSN to flush fixed to last lsn: (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
+ LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
last_buffer_no= log_descriptor.bc.buffer_no;
log_descriptor.is_everything_flushed= 1;
translog_force_current_buffer_to_finish();
@@ -7676,8 +7804,10 @@
TRANSLOG_BUFFERS_NO);
translog_unlock();
}
- sent_to_disk= translog_get_sent_to_disk();
- if (cmp_translog_addr(lsn, sent_to_disk) > 0)
+
+ /* flush buffers */
+ *sent_to_disk= translog_get_sent_to_disk();
+ if (cmp_translog_addr(*lsn, *sent_to_disk) > 0)
{
DBUG_PRINT("info", ("Start buffer #: %u last buffer #: %u",
@@ -7697,53 +7827,237 @@
LSN_IN_PARTS(buffer->last_lsn),
(buffer->file ?
"dirty" : "closed")));
- if (buffer->prev_last_lsn <= lsn &&
+ if (buffer->prev_last_lsn <= *lsn &&
buffer->file != NULL)
{
- DBUG_ASSERT(flush_horizon <= buffer->offset + buffer->size);
- flush_horizon= buffer->offset + buffer->size;
+ DBUG_ASSERT(*flush_horizon <= buffer->offset + buffer->size);
+ *flush_horizon= (buffer->pre_force_close_horizon != LSN_IMPOSSIBLE ?
+ buffer->pre_force_close_horizon :
+ buffer->offset + buffer->size);
+ /* pre_force_close_horizon is reset during new buffer start */
+ DBUG_PRINT("info", ("flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(*flush_horizon)));
+ DBUG_ASSERT(*flush_horizon <= log_descriptor.horizon);
+
translog_buffer_flush(buffer);
}
translog_buffer_unlock(buffer);
i= (i + 1) % TRANSLOG_BUFFERS_NO;
} while (i != last_buffer_no);
- sent_to_disk= translog_get_sent_to_disk();
- }
-
- /* sync files from previous flush till current one */
- for (fn= LSN_FILE_NO(log_descriptor.flushed); fn <= LSN_FILE_NO(lsn); fn++)
- {
- TRANSLOG_FILE *file= get_logfile_by_number(fn);
- DBUG_ASSERT(file != NULL);
- if (!file->is_sync)
- {
- if (my_sync(file->handler.file, MYF(MY_WME)))
+ *sent_to_disk= translog_get_sent_to_disk();
+ }
+
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Flush the log up to given LSN (included)
+
+ @param lsn log record serial number up to which (inclusive)
+ the log has to be flushed
+
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
+
+ @note
+
+ - Non group commit logic: Commits made in passes. Thread which started
+ flush first is performing actual flush, other threads sets new goal (LSN)
+ of the next pass (if it is maximum) and waits for the pass end or just
+ wait for the pass end.
+
+ - If hard group commit enabled and rate set to zero:
+ The first thread sends all changed buffers to disk. This is repeated
+ as long as there are new LSNs added. The process can not loop
+ forever because we have limited number of threads and they will wait
+ for the data to be synced.
+ Pseudo code:
+
+ do
+ send changed buffers to disk
+ while new_goal
+ sync
+
+ - If hard group commit switched ON and less than rate microseconds has
+ passed from last sync, then after buffers have been sent to disk
+ wait until rate microseconds has passed since last sync, do sync and return.
+ This ensures that if we call sync infrequently we don't do any waits.
+
+ - If soft group commit enabled everything works as with 'non group commit'
+ but the thread doesn't do any real sync(). If rate is not zero the
+ sync() will be performed by a service thread with the given rate
+ when needed (new LSN appears).
+
+ @note Terminology:
+ 'sent to disk' means written to disk but not sync()ed,
+ 'flushed' mean sent to disk and synced().
+*/
+
+my_bool translog_flush(TRANSLOG_ADDRESS lsn)
+{
+ struct timespec abstime;
+ ulonglong flush_interval;
+ ulonglong time_spent;
+ LSN sent_to_disk= LSN_IMPOSSIBLE;
+ TRANSLOG_ADDRESS flush_horizon;
+ my_bool rc= 0;
+ my_bool hgroup_commit_at_start;
+ DBUG_ENTER("translog_flush");
+ DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
+ LINT_INIT(sent_to_disk);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.flushed)));
+ if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
+
+
+ {
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ if (log_descriptor.flush_in_progress)
+ {
+ translog_lock();
+ /* fix lsn if it was horizon */
+ if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
+ lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
+ translog_unlock();
+ translog_flush_set_new_goal_and_wait(lsn);
+ if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
+ {
+ /*
+ translog_flush_wait_for_end() release log_flush_lock while is
+ waiting then acquire it again
+ */
+ translog_flush_wait_for_end(lsn);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ }
+ log_descriptor.flush_in_progress= 1;
+ flush_horizon= log_descriptor.previous_flush_horizon;
+ DBUG_PRINT("info", ("flush_in_progress is set, flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(flush_horizon)));
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ hgroup_commit_at_start= hard_group_commit;
+ if (hgroup_commit_at_start)
+ flush_interval= group_commit_wait;
+
+ translog_lock();
+ if (log_descriptor.is_everything_flushed)
+ {
+ DBUG_PRINT("info", ("everything is flushed"));
+ translog_unlock();
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+
+ for (;;)
+ {
+ /* Following function flushes buffers and makes translog_unlock() */
+ translog_flush_buffers(&lsn, &sent_to_disk, &flush_horizon);
+
+ if (!hgroup_commit_at_start)
+ break; /* flush pass is ended */
+
+retest:
+ if (flush_interval != 0 &&
+ (my_micro_time() - flush_start) >= flush_interval)
+ break; /* flush pass is ended */
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ if (log_descriptor.next_pass_max_lsn != LSN_IMPOSSIBLE)
+ {
+ /* take next goal */
+ lsn= log_descriptor.next_pass_max_lsn;
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ /* prevent other thread from continue */
+ log_descriptor.max_lsn_requester= pthread_self();
+ DBUG_PRINT("info", ("flush took next goal: (%lu,0x%lx)",
+ LSN_IN_PARTS(lsn)));
+ }
+ else
+ {
+ if (flush_interval == 0 ||
+ (time_spent= (my_micro_time() - flush_start)) >= flush_interval)
{
- rc= 1;
- translog_stop_writing();
- sent_to_disk= LSN_IMPOSSIBLE;
- goto out;
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ break;
}
- file->is_sync= 1;
- }
- }
-
- if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- (LSN_FILE_NO(log_descriptor.previous_flush_horizon) !=
- LSN_FILE_NO(flush_horizon) ||
- ((LSN_OFFSET(log_descriptor.previous_flush_horizon) - 1) /
- TRANSLOG_PAGE_SIZE) !=
- ((LSN_OFFSET(flush_horizon) - 1) / TRANSLOG_PAGE_SIZE)))
- rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ DBUG_PRINT("info", ("flush waits: %llu interval: %llu spent: %llu",
+ flush_interval - time_spent,
+ flush_interval, time_spent));
+ /* wait time or next goal */
+ set_timespec_nsec(abstime, flush_interval - time_spent);
+ pthread_cond_timedwait(&log_descriptor.new_goal_cond,
+ &log_descriptor.log_flush_lock,
+ &abstime);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("retest conditions"));
+ goto retest;
+ }
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ /* next flush pass */
+ DBUG_PRINT("info", ("next flush pass"));
+ translog_lock();
+ }
+
+ /*
+ sync() files from previous flush till current one
+ */
+ if (!soft_sync || hgroup_commit_at_start)
+ {
+ if ((rc=
+ translog_sync_files(LSN_FILE_NO(log_descriptor.flushed),
+ LSN_FILE_NO(lsn),
+ sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
+ (LSN_FILE_NO(log_descriptor.
+ previous_flush_horizon) !=
+ LSN_FILE_NO(flush_horizon) ||
+ (LSN_OFFSET(log_descriptor.
+ previous_flush_horizon) /
+ TRANSLOG_PAGE_SIZE) !=
+ (LSN_OFFSET(flush_horizon) /
+ TRANSLOG_PAGE_SIZE)))))
+ {
+ sent_to_disk= LSN_IMPOSSIBLE;
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+ /* keep values for soft sync() and forced sync() actual */
+ {
+ uint32 fileno= LSN_FILE_NO(lsn);
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_min, fileno);
+ my_atomic_store32(&soft_sync_max, fileno);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+ }
+ else
+ {
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_max, LSN_FILE_NO(lsn));
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+
+ DBUG_ASSERT(flush_horizon <= log_descriptor.horizon);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
log_descriptor.previous_flush_horizon= flush_horizon;
out:
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
if (sent_to_disk != LSN_IMPOSSIBLE)
log_descriptor.flushed= sent_to_disk;
log_descriptor.flush_in_progress= 0;
log_descriptor.flush_no++;
DBUG_PRINT("info", ("flush_in_progress is dropped"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);\
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
pthread_cond_broadcast(&log_descriptor.log_flush_cond);
DBUG_RETURN(rc);
}
@@ -8113,6 +8427,8 @@
my_bool translog_purge(TRANSLOG_ADDRESS low)
{
uint32 last_need_file= LSN_FILE_NO(low);
+ uint32 min_unsync;
+ int soft;
TRANSLOG_ADDRESS horizon= translog_get_horizon();
int rc= 0;
DBUG_ENTER("translog_purge");
@@ -8120,12 +8436,23 @@
DBUG_ASSERT(translog_status == TRANSLOG_OK ||
translog_status == TRANSLOG_READONLY);
+ soft= soft_sync;
+ DBUG_PRINT("info", ("min_unsync: %lu", (ulong) min_unsync));
+ if (soft && min_unsync < last_need_file)
+ {
+ last_need_file= min_unsync;
+ DBUG_PRINT("info", ("last_need_file set to %lu", (ulong)last_need_file));
+ }
+
pthread_mutex_lock(&log_descriptor.purger_lock);
+ DBUG_PRINT("info", ("last_lsn_checked file: %lu:",
+ (ulong) log_descriptor.last_lsn_checked));
if (LSN_FILE_NO(log_descriptor.last_lsn_checked) < last_need_file)
{
uint32 i;
uint32 min_file= translog_first_file(horizon, 1);
DBUG_ASSERT(min_file != 0); /* log is already started */
+ DBUG_PRINT("info", ("min_file: %lu:",(ulong) min_file));
for(i= min_file; i < last_need_file && rc == 0; i++)
{
LSN lsn= translog_get_file_max_lsn_stored(i);
@@ -8356,6 +8683,155 @@
}
+
+/**
+ Sets soft sync mode
+
+ @param mode TRUE if we need switch soft sync on else off
+*/
+
+void translog_soft_sync(my_bool mode)
+{
+ soft_sync= mode;
+}
+
+
+/**
+ Sets hard group commit
+
+ @param mode TRUE if we need switch hard group commit on else off
+*/
+
+void translog_hard_group_commit(my_bool mode)
+{
+ hard_group_commit= mode;
+}
+
+
+/**
+ @brief forced log sync (used when we are switching modes)
+*/
+
+void translog_sync()
+{
+ uint32 max= get_current_logfile()->number;
+ uint32 min;
+ DBUG_ENTER("ma_translog_sync");
+
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+ if (!min)
+ min= max;
+
+ translog_sync_files(min, max, sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS);
+
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief set rate for group commit
+
+ @param interval interval to set.
+
+ @note We use this function with additional variable because have to
+ restart service thread with new value which we can't make inside changing
+ variable routine (update_maria_group_commit_interval)
+*/
+
+void translog_set_group_commit_interval(uint32 interval)
+{
+ DBUG_ENTER("translog_set_group_commit_interval");
+ group_commit_wait= interval;
+ DBUG_PRINT("info", ("wait: %llu",
+ (ulonglong)group_commit_wait));
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief syncing service thread
+*/
+
+static pthread_handler_t
+ma_soft_sync_background( void *arg __attribute__((unused)))
+{
+
+ my_thread_init();
+ {
+ DBUG_ENTER("ma_soft_sync_background");
+ for(;;)
+ {
+ ulonglong prev_loop= my_micro_time();
+ ulonglong time, sleep;
+ uint32 min, max;
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ sleep= group_commit_wait;
+ translog_sync_files(min, max, FALSE);
+ time= my_micro_time() - prev_loop;
+ if (time > sleep)
+ sleep= 0;
+ else
+ sleep-= time;
+ if (my_service_thread_sleep(&soft_sync_control, sleep))
+ break;
+ }
+ my_service_thread_signal_end(&soft_sync_control);
+ my_thread_end();
+ DBUG_RETURN(0);
+ }
+}
+
+
+/**
+ @brief Starts syncing thread
+*/
+
+int translog_soft_sync_start(void)
+{
+ pthread_t th;
+ int res= 0;
+ uint32 min, max;
+ DBUG_ENTER("translog_soft_sync_start");
+
+ /* check and init variables */
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ if (!max)
+ my_atomic_store32(&soft_sync_max, (max= get_current_logfile()->number));
+ if (!min)
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ if (!(res= ma_service_thread_control_init(&soft_sync_control)))
+ if (!(res= pthread_create(&th, NULL, ma_soft_sync_background, NULL)))
+ soft_sync_control.status= THREAD_RUNNING;
+ DBUG_RETURN(res);
+}
+
+
+/**
+ @brief Stops syncing thread
+*/
+
+void translog_soft_sync_end(void)
+{
+ DBUG_ENTER("translog_soft_sync_end");
+ if (soft_sync_control.inited)
+ {
+ ma_service_thread_control_end(&soft_sync_control);
+ }
+ DBUG_VOID_RETURN;
+}
+
+
#ifdef MARIA_DUMP_LOG
#include <my_getopt.h>
extern void translog_example_table_init();
=== modified file 'storage/maria/ma_loghandler.h'
--- a/storage/maria/ma_loghandler.h 2009-01-15 22:25:53 +0000
+++ b/storage/maria/ma_loghandler.h 2010-02-09 08:32:59 +0000
@@ -342,6 +342,14 @@
TRANSLOG_SHUTDOWN /* going to shutdown the loghandler */
};
extern enum enum_translog_status translog_status;
+extern ulonglong translog_syncs; /* Number of sync()s */
+
+void translog_soft_sync(my_bool mode);
+void translog_hard_group_commit(my_bool mode);
+int translog_soft_sync_start(void);
+void translog_soft_sync_end(void);
+void translog_sync();
+void translog_set_group_commit_interval(uint32 interval);
/*
all the rest added because of recovery; should we make
@@ -441,6 +449,14 @@
typedef enum
{
+ TRANSLOG_GCOMMIT_NONE,
+ TRANSLOG_GCOMMIT_HARD,
+ TRANSLOG_GCOMMIT_SOFT
+} enum_maria_group_commit;
+extern ulong maria_group_commit;
+extern ulong maria_group_commit_interval;
+typedef enum
+{
TRANSLOG_PURGE_IMMIDIATE,
TRANSLOG_PURGE_EXTERNAL,
TRANSLOG_PURGE_ONDEMAND
3
2
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (monty:2814)
by Michael Widenius 10 Feb '10
by Michael Widenius 10 Feb '10
10 Feb '10
#At lp:maria based on revid:monty@askmonty.org-20100209171704-h7stfhbh94k54tbf
2814 Michael Widenius 2010-02-10
When one does a drop table, the indexes are not flushed to disk before drop anymore (with MyISAM/Maria)
myisam-recover options changed from OFF to 'DEFAULT' to get less change of data loss when using MyISAM.
(The disadvantage is that changed MyISAM tables will be checked at access time; Use --myisam-recover=OFF for old behavior)
Don't call extra(HA_EXTRA_FORCE_REOPEN) in ALTER TABLE if table is locked as this will mark table as crashed!
Added assert to detect if we accidently would use MyISAM versioning in MySQL
modified:
include/my_base.h
mysql-test/mysql-test-run.pl
mysql-test/r/sp-destruct.result
mysql-test/r/variables.result
mysql-test/r/view.result
mysql-test/suite/maria/t/maria-recovery2-master.opt
mysql-test/t/sp-destruct.test
mysql-test/t/view.test
sql/lock.cc
sql/mysql_priv.h
sql/mysqld.cc
sql/sql_base.cc
sql/sql_delete.cc
sql/sql_table.cc
sql/table.cc
sql/table.h
storage/maria/ha_maria.cc
storage/maria/ma_blockrec.c
storage/maria/ma_close.c
storage/maria/ma_extra.c
storage/maria/ma_locking.c
storage/maria/ma_recovery.c
storage/maria/maria_def.h
storage/myisam/mi_close.c
storage/myisam/mi_extra.c
storage/myisam/mi_open.c
storage/myisam/myisamdef.h
per-file messages:
include/my_base.h
Mark NOT_USED as USED, as we now use this as a flag to not call extra()
mysql-test/mysql-test-run.pl
Don't write all options when there is something wrong with the arguments
mysql-test/r/sp-destruct.result
Add missing flush of mysql.proc (as the test copied live tables)
mysql-test/r/variables.result
myisam-recover options changed to 'default'
mysql-test/r/view.result
Don't show create time in result
mysql-test/suite/maria/t/maria-recovery2-master.opt
Don't run test with myisam-recover (as this produces extra warnings during simulated death)
mysql-test/t/sp-destruct.test
Add missing flush of mysql.proc (as the test copied live tables)
mysql-test/t/view.test
Don't show create time in result
sql/lock.cc
Added marker if table was deleted to argument list
sql/mysql_priv.h
Added marker if table was deleted to argument list
sql/mysqld.cc
myisam-recover options changed from OFF to 'DEFAULT' to get less change of data loss when using MyISAM
Allow one to specify OFF as argument to myisam-recover (was default before but one couldn't specify it)
sql/sql_base.cc
Mark if table is going to be deleted
sql/sql_delete.cc
Mark if table is going to be deleted
sql/sql_table.cc
Mark if table is going to be deleted
Don't call extra(HA_EXTRA_FORCE_REOPEN) in ALTER TABLE if table is locked as this will mark table as crashed!
sql/table.cc
Signal to handler if table is getting deleted as part of getting droped from table cache.
sql/table.h
Added marker if table is going to be deleted.
storage/maria/ha_maria.cc
Don't search for transaction handler if file is not transactional or outside of transaction
(Fixed possible core dump)
storage/maria/ma_blockrec.c
Don't write changed information if table is going to be deleted.
storage/maria/ma_close.c
Don't write changed information if table is going to be deleted.
storage/maria/ma_extra.c
Mark tables that are deleted as crased, to ensure good behavior on restart if we suddenly crash.
storage/maria/ma_locking.c
Cleanup
storage/maria/ma_recovery.c
We need trnman to be inited during redo phase (to be able to open tables checked with maria_chk)
storage/maria/maria_def.h
Added marker if table is going to be deleted.
storage/myisam/mi_close.c
Don't write changed information if table is going to be deleted.
storage/myisam/mi_extra.c
Mark tables that are deleted as crased, to ensure good behavior on restart if we suddenly crash.
storage/myisam/mi_open.c
Added assert to detect if we accidently would use MyISAM versioning in MySQL
storage/myisam/myisamdef.h
Added marker if table is going to be deleted.
=== modified file 'include/my_base.h'
--- a/include/my_base.h 2009-09-07 20:50:10 +0000
+++ b/include/my_base.h 2010-02-10 19:06:24 +0000
@@ -111,7 +111,7 @@ enum ha_storage_media {
enum ha_extra_function {
HA_EXTRA_NORMAL=0, /* Optimize for space (def) */
HA_EXTRA_QUICK=1, /* Optimize for speed */
- HA_EXTRA_NOT_USED=2,
+ HA_EXTRA_NOT_USED=2, /* Should be ignored by handler */
HA_EXTRA_CACHE=3, /* Cache record in HA_rrnd() */
HA_EXTRA_NO_CACHE=4, /* End caching of records (def) */
HA_EXTRA_NO_READCHECK=5, /* No readcheck on update */
=== modified file 'mysql-test/mysql-test-run.pl'
--- a/mysql-test/mysql-test-run.pl 2010-01-29 10:42:31 +0000
+++ b/mysql-test/mysql-test-run.pl 2010-02-10 19:06:24 +0000
@@ -5542,6 +5542,8 @@ sub usage ($) {
if ( $message )
{
print STDERR "$message\n";
+ print STDERR "For full list of options, use $0 --help\n";
+ exit;
}
print <<HERE;
=== modified file 'mysql-test/r/sp-destruct.result'
--- a/mysql-test/r/sp-destruct.result 2009-11-21 11:18:21 +0000
+++ b/mysql-test/r/sp-destruct.result 2010-02-10 19:06:24 +0000
@@ -1,4 +1,5 @@
call mtr.add_suppression("Column count of mysql.proc is wrong. Expected 20, found 19. The table is probably corrupted");
+flush table mysql.proc;
use test;
drop procedure if exists bug14233;
drop function if exists bug14233;
=== modified file 'mysql-test/r/variables.result'
--- a/mysql-test/r/variables.result 2010-01-11 13:15:28 +0000
+++ b/mysql-test/r/variables.result 2010-02-10 19:06:24 +0000
@@ -1261,12 +1261,12 @@ ERROR HY000: Variable 'lower_case_table_
#
SHOW VARIABLES like 'myisam_recover_options';
Variable_name Value
-myisam_recover_options OFF
+myisam_recover_options DEFAULT
SELECT @@session.myisam_recover_options;
ERROR HY000: Variable 'myisam_recover_options' is a GLOBAL variable
SELECT @@global.myisam_recover_options;
@@global.myisam_recover_options
-OFF
+DEFAULT
SET @@session.myisam_recover_options= 'x';
ERROR HY000: Variable 'myisam_recover_options' is a read only variable
SET @@global.myisam_recover_options= 'x';
=== modified file 'mysql-test/r/view.result'
--- a/mysql-test/r/view.result 2009-10-15 21:38:29 +0000
+++ b/mysql-test/r/view.result 2010-02-10 19:06:24 +0000
@@ -155,13 +155,13 @@ v5 VIEW
v6 VIEW
show table status;
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
-t1 MyISAM 10 Fixed 5 9 45 # 1024 0 NULL # # NULL latin1_swedish_ci NULL
-v1 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # NULL NULL NULL NULL VIEW
-v2 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # NULL NULL NULL NULL VIEW
-v3 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # NULL NULL NULL NULL VIEW
-v4 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # NULL NULL NULL NULL VIEW
-v5 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # NULL NULL NULL NULL VIEW
-v6 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # NULL NULL NULL NULL VIEW
+t1 MyISAM 10 Fixed 5 9 45 # 1024 0 NULL # # # latin1_swedish_ci NULL
+v1 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # # NULL NULL NULL VIEW
+v2 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # # NULL NULL NULL VIEW
+v3 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # # NULL NULL NULL VIEW
+v4 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # # NULL NULL NULL VIEW
+v5 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # # NULL NULL NULL VIEW
+v6 NULL NULL NULL NULL NULL NULL # NULL NULL NULL # # # NULL NULL NULL VIEW
drop view v1,v2,v3,v4,v5,v6;
create view v1 (c,d,e,f) as select a,b,
a in (select a+2 from t1), a = all (select a from t1) from t1;
=== modified file 'mysql-test/suite/maria/t/maria-recovery2-master.opt'
--- a/mysql-test/suite/maria/t/maria-recovery2-master.opt 2009-01-15 14:29:14 +0000
+++ b/mysql-test/suite/maria/t/maria-recovery2-master.opt 2010-02-10 19:06:24 +0000
@@ -1 +1 @@
---skip-stack-trace --skip-core-file --loose-maria-log-dir-path=$MYSQLTEST_VARDIR/tmp
+--skip-stack-trace --skip-core-file --loose-maria-log-dir-path=$MYSQLTEST_VARDIR/tmp --myisam-recover=
=== modified file 'mysql-test/t/sp-destruct.test'
--- a/mysql-test/t/sp-destruct.test 2009-11-21 11:18:21 +0000
+++ b/mysql-test/t/sp-destruct.test 2010-02-10 19:06:24 +0000
@@ -17,6 +17,7 @@ call mtr.add_suppression("Column count o
# Backup proc table
let $MYSQLD_DATADIR= `select @@datadir`;
+flush table mysql.proc;
--copy_file $MYSQLD_DATADIR/mysql/proc.frm $MYSQLTEST_VARDIR/tmp/proc.frm
--copy_file $MYSQLD_DATADIR/mysql/proc.MYD $MYSQLTEST_VARDIR/tmp/proc.MYD
--copy_file $MYSQLD_DATADIR/mysql/proc.MYI $MYSQLTEST_VARDIR/tmp/proc.MYI
=== modified file 'mysql-test/t/view.test'
--- a/mysql-test/t/view.test 2009-10-15 21:38:29 +0000
+++ b/mysql-test/t/view.test 2010-02-10 19:06:24 +0000
@@ -87,7 +87,7 @@ explain extended select c from v6;
# show table/table status test
show tables;
show full tables;
---replace_column 8 # 12 # 13 #
+--replace_column 8 # 12 # 13 # 14 #
show table status;
drop view v1,v2,v3,v4,v5,v6;
=== modified file 'sql/lock.cc'
--- a/sql/lock.cc 2009-10-15 21:38:29 +0000
+++ b/sql/lock.cc 2010-02-10 19:06:24 +0000
@@ -1049,10 +1049,14 @@ int lock_table_name(THD *thd, TABLE_LIST
DBUG_RETURN(-1);
table_list->table=table;
+ table->s->deleting= table_list->deleting;
/* Return 1 if table is in use */
DBUG_RETURN(test(remove_table_from_cache(thd, db, table_list->table_name,
- check_in_use ? RTFC_NO_FLAG : RTFC_WAIT_OTHER_THREAD_FLAG)));
+ (check_in_use ?
+ RTFC_NO_FLAG :
+ RTFC_WAIT_OTHER_THREAD_FLAG),
+ table_list->deleting)));
}
=== modified file 'sql/mysql_priv.h'
--- a/sql/mysql_priv.h 2009-12-03 11:19:05 +0000
+++ b/sql/mysql_priv.h 2010-02-10 19:06:24 +0000
@@ -1636,7 +1636,7 @@ uint prep_alter_part_table(THD *thd, TAB
#define RTFC_WAIT_OTHER_THREAD_FLAG 0x0002
#define RTFC_CHECK_KILLED_FLAG 0x0004
bool remove_table_from_cache(THD *thd, const char *db, const char *table,
- uint flags);
+ uint flags, my_bool deleting);
#define NORMAL_PART_NAME 0
#define TEMP_PART_NAME 1
=== modified file 'sql/mysqld.cc'
--- a/sql/mysqld.cc 2010-01-29 18:42:22 +0000
+++ b/sql/mysqld.cc 2010-02-10 19:06:24 +0000
@@ -7962,7 +7962,13 @@ static int mysql_init_variables(void)
refresh_version= 1L; /* Increments on each reload */
global_query_id= thread_id= 1L;
strmov(server_version, MYSQL_SERVER_VERSION);
- myisam_recover_options_str= sql_mode_str= "OFF";
+ sql_mode_str= "";
+
+ /* By default, auto-repair MyISAM tables after crash */
+ myisam_recover_options_str= "DEFAULT";
+ myisam_recover_options= HA_RECOVER_DEFAULT;
+ ha_open_options|= HA_OPEN_ABORT_IF_CRASHED;
+
myisam_stats_method_str= "nulls_unequal";
my_bind_addr = htonl(INADDR_ANY);
threads.empty();
@@ -8616,26 +8622,31 @@ mysqld_get_one_option(int optid,
#endif
case OPT_MYISAM_RECOVER:
{
- if (!argument)
- {
- myisam_recover_options= HA_RECOVER_DEFAULT;
- myisam_recover_options_str= myisam_recover_typelib.type_names[0];
- }
- else if (!argument[0])
+ if (argument && (!argument[0] ||
+ my_strcasecmp(system_charset_info, argument, "OFF") == 0))
{
myisam_recover_options= HA_RECOVER_NONE;
myisam_recover_options_str= "OFF";
+ ha_open_options&= ~HA_OPEN_ABORT_IF_CRASHED;
}
else
{
- myisam_recover_options_str=argument;
- myisam_recover_options=
- find_bit_type_or_exit(argument, &myisam_recover_typelib, opt->name,
- &error);
- if (error)
- return 1;
+ if (!argument)
+ {
+ myisam_recover_options= HA_RECOVER_DEFAULT;
+ myisam_recover_options_str= myisam_recover_typelib.type_names[0];
+ }
+ else
+ {
+ myisam_recover_options_str=argument;
+ myisam_recover_options=
+ find_bit_type_or_exit(argument, &myisam_recover_typelib, opt->name,
+ &error);
+ if (error)
+ return 1;
+ }
+ ha_open_options|=HA_OPEN_ABORT_IF_CRASHED;
}
- ha_open_options|=HA_OPEN_ABORT_IF_CRASHED;
break;
}
case OPT_CONCURRENT_INSERT:
=== modified file 'sql/sql_base.cc'
--- a/sql/sql_base.cc 2010-01-15 15:27:55 +0000
+++ b/sql/sql_base.cc 2010-02-10 19:06:24 +0000
@@ -930,7 +930,7 @@ bool close_cached_tables(THD *thd, TABLE
for (TABLE_LIST *table= tables; table; table= table->next_local)
{
if (remove_table_from_cache(thd, table->db, table->table_name,
- RTFC_OWNED_BY_THD_FLAG))
+ RTFC_OWNED_BY_THD_FLAG, table->deleting))
found=1;
}
if (!found)
@@ -8404,6 +8404,11 @@ void remove_db_from_cache(const char *db
if (!strcmp(table->s->db.str, db))
{
table->s->version= 0L; /* Free when thread is ready */
+ /*
+ This functions only called from DROP DATABASE code, so we are going
+ to drop all tables so we mark them as deleting
+ */
+ table->s->deleting= TRUE;
if (!table->in_use)
relink_unused(table);
}
@@ -8446,7 +8451,7 @@ void flush_tables()
*/
bool remove_table_from_cache(THD *thd, const char *db, const char *table_name,
- uint flags)
+ uint flags, my_bool deleting)
{
char key[MAX_DBKEY_LENGTH];
uint key_length;
@@ -8540,7 +8545,10 @@ bool remove_table_from_cache(THD *thd, c
}
}
while (unused_tables && !unused_tables->s->version)
+ {
+ unused_tables->s->deleting= deleting;
VOID(hash_delete(&open_cache,(uchar*) unused_tables));
+ }
DBUG_PRINT("info", ("Removing table from table_def_cache"));
/* Remove table from table definition cache if it's not in use */
@@ -8734,7 +8742,8 @@ int abort_and_upgrade_lock(ALTER_PARTITI
/* If MERGE child, forward lock handling to parent. */
mysql_lock_abort(lpt->thd, lpt->table->parent ? lpt->table->parent :
lpt->table, TRUE);
- VOID(remove_table_from_cache(lpt->thd, lpt->db, lpt->table_name, flags));
+ VOID(remove_table_from_cache(lpt->thd, lpt->db, lpt->table_name, flags,
+ FALSE));
VOID(pthread_mutex_unlock(&LOCK_open));
DBUG_RETURN(0);
}
@@ -8759,7 +8768,7 @@ void close_open_tables_and_downgrade(ALT
{
VOID(pthread_mutex_lock(&LOCK_open));
remove_table_from_cache(lpt->thd, lpt->db, lpt->table_name,
- RTFC_WAIT_OTHER_THREAD_FLAG);
+ RTFC_WAIT_OTHER_THREAD_FLAG, FALSE);
VOID(pthread_mutex_unlock(&LOCK_open));
/* If MERGE child, forward lock handling to parent. */
mysql_lock_downgrade_write(lpt->thd, lpt->table->parent ? lpt->table->parent :
=== modified file 'sql/sql_delete.cc'
--- a/sql/sql_delete.cc 2010-01-15 15:27:55 +0000
+++ b/sql/sql_delete.cc 2010-02-10 19:06:24 +0000
@@ -1088,6 +1088,7 @@ bool mysql_truncate(THD *thd, TABLE_LIST
HA_CREATE_INFO create_info;
char path[FN_REFLEN + 1];
TABLE *table;
+ TABLE_LIST *tbl;
bool error;
uint path_length;
bool is_temporary_table= false;
@@ -1108,6 +1109,9 @@ bool mysql_truncate(THD *thd, TABLE_LIST
if (!ha_check_storage_engine_flag(table_type, HTON_CAN_RECREATE))
goto trunc_by_del;
+ for (tbl= table_list; tbl; tbl= tbl->next_local)
+ tbl->deleting= TRUE; /* to trigger HA_PREPARE_FOR_DROP */
+
table->file->info(HA_STATUS_AUTO | HA_STATUS_NO_LOCK);
create_info.options|= HA_LEX_CREATE_TMP_TABLE;
=== modified file 'sql/sql_table.cc'
--- a/sql/sql_table.cc 2010-01-15 15:27:55 +0000
+++ b/sql/sql_table.cc 2010-02-10 19:06:24 +0000
@@ -1880,6 +1880,7 @@ int mysql_rm_table_part2(THD *thd, TABLE
{
TABLE_SHARE *share;
table->db_type= NULL;
+
if ((share= get_cached_table_share(table->db, table->table_name)))
table->db_type= share->db_type();
@@ -1974,9 +1975,10 @@ int mysql_rm_table_part2(THD *thd, TABLE
{
TABLE *locked_table;
abort_locked_tables(thd, db, table->table_name);
+ table->deleting= TRUE;
remove_table_from_cache(thd, db, table->table_name,
RTFC_WAIT_OTHER_THREAD_FLAG |
- RTFC_CHECK_KILLED_FLAG);
+ RTFC_CHECK_KILLED_FLAG, FALSE);
/*
If the table was used in lock tables, remember it so that
unlock_table_names can free it
@@ -4213,9 +4215,10 @@ void wait_while_table_is_used(THD *thd,T
/* Wait until all there are no other threads that has this table open */
remove_table_from_cache(thd, table->s->db.str,
table->s->table_name.str,
- RTFC_WAIT_OTHER_THREAD_FLAG);
+ RTFC_WAIT_OTHER_THREAD_FLAG, FALSE);
/* extra() call must come only after all instances above are closed */
- VOID(table->file->extra(function));
+ if (function != HA_EXTRA_NOT_USED)
+ VOID(table->file->extra(function));
DBUG_VOID_RETURN;
}
@@ -4717,7 +4720,7 @@ static bool mysql_admin_table(THD* thd,
remove_table_from_cache(thd, table->table->s->db.str,
table->table->s->table_name.str,
RTFC_WAIT_OTHER_THREAD_FLAG |
- RTFC_CHECK_KILLED_FLAG);
+ RTFC_CHECK_KILLED_FLAG, FALSE);
thd->exit_cond(old_message);
DBUG_EXECUTE_IF("wait_in_mysql_admin_table", wait_for_kill_signal(thd););
if (thd->killed)
@@ -4975,7 +4978,8 @@ send_result_message:
{
pthread_mutex_lock(&LOCK_open);
remove_table_from_cache(thd, table->table->s->db.str,
- table->table->s->table_name.str, RTFC_NO_FLAG);
+ table->table->s->table_name.str,
+ RTFC_NO_FLAG, FALSE);
pthread_mutex_unlock(&LOCK_open);
}
/* May be something modified consequently we have to invalidate cache */
@@ -6738,7 +6742,9 @@ view_err:
from concurrent DDL statements.
*/
VOID(pthread_mutex_lock(&LOCK_open));
- wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN);
+ wait_while_table_is_used(thd, table,
+ thd->locked_tables ? HA_EXTRA_NOT_USED :
+ HA_EXTRA_FORCE_REOPEN);
VOID(pthread_mutex_unlock(&LOCK_open));
DBUG_EXECUTE_IF("sleep_alter_enable_indexes", my_sleep(6000000););
error= table->file->ha_enable_indexes(HA_KEY_SWITCH_NONUNIQ_SAVE);
@@ -6746,7 +6752,9 @@ view_err:
break;
case DISABLE:
VOID(pthread_mutex_lock(&LOCK_open));
- wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN);
+ wait_while_table_is_used(thd, table,
+ thd->locked_tables ? HA_EXTRA_NOT_USED :
+ HA_EXTRA_FORCE_REOPEN);
VOID(pthread_mutex_unlock(&LOCK_open));
error=table->file->ha_disable_indexes(HA_KEY_SWITCH_NONUNIQ_SAVE);
/* COND_refresh will be signaled in close_thread_tables() */
@@ -7192,7 +7200,9 @@ view_err:
else
{
VOID(pthread_mutex_lock(&LOCK_open));
- wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN);
+ wait_while_table_is_used(thd, table,
+ thd->locked_tables ? HA_EXTRA_NOT_USED :
+ HA_EXTRA_FORCE_REOPEN);
VOID(pthread_mutex_unlock(&LOCK_open));
thd_proc_info(thd, "manage keys");
alter_table_manage_keys(table, table->file->indexes_are_disabled(),
=== modified file 'sql/table.cc'
--- a/sql/table.cc 2010-01-15 15:27:55 +0000
+++ b/sql/table.cc 2010-02-10 19:06:24 +0000
@@ -1977,7 +1977,11 @@ int closefrm(register TABLE *table, bool
DBUG_PRINT("enter", ("table: 0x%lx", (long) table));
if (table->db_stat)
+ {
+ if (table->s->deleting)
+ table->file->extra(HA_EXTRA_PREPARE_FOR_DROP);
error=table->file->close();
+ }
my_free((char*) table->alias, MYF(MY_ALLOW_ZERO_PTR));
table->alias= 0;
if (table->field)
=== modified file 'sql/table.h'
--- a/sql/table.h 2010-01-15 15:27:55 +0000
+++ b/sql/table.h 2010-02-10 19:06:24 +0000
@@ -431,6 +431,7 @@ typedef struct st_table_share
bool is_view;
bool name_lock, replace_with_name_lock;
bool waiting_on_cond; /* Protection against free */
+ bool deleting; /* going to delete this table */
ulong table_map_id; /* for row-based replication */
ulonglong table_map_version;
@@ -1379,7 +1380,7 @@ struct TABLE_LIST
*/
bool create;
bool internal_tmp_table;
-
+ bool deleting; /* going to delete this table */
/* View creation context. */
=== modified file 'storage/maria/ha_maria.cc'
--- a/storage/maria/ha_maria.cc 2009-12-03 11:34:11 +0000
+++ b/storage/maria/ha_maria.cc 2010-02-10 19:06:24 +0000
@@ -2255,9 +2255,12 @@ int ha_maria::extra(enum ha_extra_functi
extern_lock(F_UNLOCK) (which resets file->trn) followed by maria_close()
without calling commit/rollback in between. If file->trn is not set
we can't remove file->share from the transaction list in the extra() call.
+
+ table->in_use is not set in the case this is a done as part of closefrm()
+ as part of drop table.
*/
- if (!file->trn &&
+ if (file->s->now_transactional && !file->trn && table->in_use &&
(operation == HA_EXTRA_PREPARE_FOR_DROP ||
operation == HA_EXTRA_PREPARE_FOR_RENAME))
{
=== modified file 'storage/maria/ma_blockrec.c'
--- a/storage/maria/ma_blockrec.c 2010-01-28 11:35:10 +0000
+++ b/storage/maria/ma_blockrec.c 2010-02-10 19:06:24 +0000
@@ -430,8 +430,9 @@ my_bool _ma_once_end_block_record(MARIA_
if (share->bitmap.file.file >= 0)
{
if (flush_pagecache_blocks(share->pagecache, &share->bitmap.file,
- share->temporary ? FLUSH_IGNORE_CHANGED :
- FLUSH_RELEASE))
+ ((share->temporary || share->deleting) ?
+ FLUSH_IGNORE_CHANGED :
+ FLUSH_RELEASE)))
res= 1;
/*
File must be synced as it is going out of the maria_open_list and so
=== modified file 'storage/maria/ma_close.c'
--- a/storage/maria/ma_close.c 2010-01-29 18:42:22 +0000
+++ b/storage/maria/ma_close.c 2010-02-10 19:06:24 +0000
@@ -79,7 +79,7 @@ int maria_close(register MARIA_HA *info)
if ((*share->once_end)(share))
error= my_errno;
if (flush_pagecache_blocks(share->pagecache, &share->kfile,
- (share->temporary ?
+ ((share->temporary || share->deleting) ?
FLUSH_IGNORE_CHANGED :
FLUSH_RELEASE)))
error= my_errno;
=== modified file 'storage/maria/ma_extra.c'
--- a/storage/maria/ma_extra.c 2009-10-06 06:13:56 +0000
+++ b/storage/maria/ma_extra.c 2010-02-10 19:06:24 +0000
@@ -305,6 +305,12 @@ int maria_extra(MARIA_HA *info, enum ha_
pthread_mutex_unlock(&THR_LOCK_maria);
break;
case HA_EXTRA_PREPARE_FOR_DROP:
+ /* Signals about intent to delete this table */
+ share->deleting= TRUE;
+ share->global_changed= FALSE; /* force writing changed flag */
+ /* To force repair if reopened */
+ _ma_mark_file_changed(info);
+ /* Fall trough */
case HA_EXTRA_PREPARE_FOR_RENAME:
{
my_bool do_flush= test(function != HA_EXTRA_PREPARE_FOR_DROP);
=== modified file 'storage/maria/ma_locking.c'
--- a/storage/maria/ma_locking.c 2009-10-06 06:13:56 +0000
+++ b/storage/maria/ma_locking.c 2010-02-10 19:06:24 +0000
@@ -387,6 +387,9 @@ int _ma_test_if_changed(register MARIA_H
open_count is not maintained on disk for temporary tables.
*/
+#define _MA_ALREADY_MARKED_FILE_CHANGED \
+ ((share->state.changed & STATE_CHANGED) && share->global_changed)
+
int _ma_mark_file_changed(MARIA_HA *info)
{
uchar buff[3];
@@ -394,8 +397,6 @@ int _ma_mark_file_changed(MARIA_HA *info
int error= 1;
DBUG_ENTER("_ma_mark_file_changed");
-#define _MA_ALREADY_MARKED_FILE_CHANGED \
- ((share->state.changed & STATE_CHANGED) && share->global_changed)
if (_MA_ALREADY_MARKED_FILE_CHANGED)
DBUG_RETURN(0);
pthread_mutex_lock(&share->intern_lock); /* recheck under mutex */
=== modified file 'storage/maria/ma_recovery.c'
--- a/storage/maria/ma_recovery.c 2009-10-26 11:35:42 +0000
+++ b/storage/maria/ma_recovery.c 2010-02-10 19:06:24 +0000
@@ -312,11 +312,14 @@ int maria_apply_log(LSN from_lsn, enum m
now= my_getsystime();
in_redo_phase= TRUE;
+ trnman_init(max_trid_in_control_file);
if (run_redo_phase(from_lsn, apply))
{
ma_message_no_user(0, "Redo phase failed");
+ trnman_destroy();
goto err;
}
+ trnman_destroy();
if ((uncommitted_trans=
end_of_redo_phase(should_run_undo_phase)) == (uint)-1)
=== modified file 'storage/maria/maria_def.h'
--- a/storage/maria/maria_def.h 2009-11-29 23:08:56 +0000
+++ b/storage/maria/maria_def.h 2010-02-10 19:06:24 +0000
@@ -390,6 +390,7 @@ typedef struct st_maria_share
my_bool now_transactional;
my_bool have_versioning;
my_bool key_del_used; /* != 0 if key_del is locked */
+ my_bool deleting; /* we are going to delete this table */
#ifdef THREAD
THR_LOCK lock;
void (*lock_restore_status)(void *);
=== modified file 'storage/myisam/mi_close.c'
--- a/storage/myisam/mi_close.c 2009-09-07 20:50:10 +0000
+++ b/storage/myisam/mi_close.c 2010-02-10 19:06:24 +0000
@@ -64,8 +64,9 @@ int mi_close(register MI_INFO *info)
if (share->kfile >= 0) abort(););
if (share->kfile >= 0 &&
flush_key_blocks(share->key_cache, share->kfile,
- share->temporary ? FLUSH_IGNORE_CHANGED :
- FLUSH_RELEASE))
+ ((share->temporary || share->deleting) ?
+ FLUSH_IGNORE_CHANGED :
+ FLUSH_RELEASE)))
error=my_errno;
if (share->kfile >= 0)
{
=== modified file 'storage/myisam/mi_extra.c'
--- a/storage/myisam/mi_extra.c 2009-10-06 06:13:56 +0000
+++ b/storage/myisam/mi_extra.c 2010-02-10 19:06:24 +0000
@@ -256,8 +256,13 @@ int mi_extra(MI_INFO *info, enum ha_extr
share->last_version= 0L; /* Impossible version */
pthread_mutex_unlock(&THR_LOCK_myisam);
break;
- case HA_EXTRA_PREPARE_FOR_RENAME:
case HA_EXTRA_PREPARE_FOR_DROP:
+ /* Signals about intent to delete this table */
+ share->deleting= TRUE;
+ share->global_changed= FALSE; /* force writing changed flag */
+ _mi_mark_file_changed(info);
+ /* Fall trough */
+ case HA_EXTRA_PREPARE_FOR_RENAME:
pthread_mutex_lock(&THR_LOCK_myisam);
share->last_version= 0L; /* Impossible version */
pthread_mutex_lock(&share->intern_lock);
=== modified file 'storage/myisam/mi_open.c'
--- a/storage/myisam/mi_open.c 2009-12-03 11:19:05 +0000
+++ b/storage/myisam/mi_open.c 2010-02-10 19:06:24 +0000
@@ -58,6 +58,8 @@ MI_INFO *test_if_reopen(char *filename)
{
MI_INFO *info=(MI_INFO*) pos->data;
MYISAM_SHARE *share=info->s;
+ DBUG_ASSERT(strcmp(share->unique_file_name,filename) ||
+ share->last_version);
if (!strcmp(share->unique_file_name,filename) && share->last_version)
return info;
}
=== modified file 'storage/myisam/myisamdef.h'
--- a/storage/myisam/myisamdef.h 2009-12-03 11:34:11 +0000
+++ b/storage/myisam/myisamdef.h 2010-02-10 19:06:24 +0000
@@ -221,6 +221,7 @@ typedef struct st_mi_isam_share
my_bool changed, /* If changed since lock */
global_changed, /* If changed since open */
not_flushed, temporary, delay_key_write, concurrent_insert;
+ my_bool deleting; /* we are going to delete this table */
#ifdef THREAD
THR_LOCK lock;
pthread_mutex_t intern_lock; /* Locking for use with _locking */
1
0
[Maria-developers] Rev 2757: Subquery optimizations backport: Update test results (checked) in file:///home/psergey/dev/maria-5.3-subqueries-r3/
by Sergey Petrunya 09 Feb '10
by Sergey Petrunya 09 Feb '10
09 Feb '10
At file:///home/psergey/dev/maria-5.3-subqueries-r3/
------------------------------------------------------------
revno: 2757
revision-id: psergey(a)askmonty.org-20100209203217-al1k9h50zrlphy5d
parent: psergey(a)askmonty.org-20100208133030-e4zjy15b7o14ud8c
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.3-subqueries-r3
timestamp: Tue 2010-02-09 23:32:17 +0300
message:
Subquery optimizations backport: Update test results (checked)
=== modified file 'mysql-test/r/join_cache.result'
--- a/mysql-test/r/join_cache.result 2009-12-21 02:26:15 +0000
+++ b/mysql-test/r/join_cache.result 2010-02-09 20:32:17 +0000
@@ -1028,8 +1028,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
@@ -1343,8 +1343,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
@@ -1658,8 +1658,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
@@ -1973,8 +1973,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
@@ -2292,8 +2292,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
@@ -2514,8 +2514,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
@@ -2736,8 +2736,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
@@ -2958,8 +2958,8 @@
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY City ALL Population NULL NULL NULL 4079 Using where
-2 DEPENDENT SUBQUERY Country unique_subquery PRIMARY,Name PRIMARY 3 func 1 Using where
+1 PRIMARY Country range PRIMARY,Name Name 52 NULL 10 Using index condition; Using MRR
+1 PRIMARY City ref Population,Country Country 3 world.Country.Code 18 Using where; Using join buffer
SELECT Name FROM City
WHERE City.Country IN (SELECT Code FROM Country WHERE Country.Name LIKE 'L%') AND
City.Population > 100000;
=== modified file 'mysql-test/r/type_datetime.result'
--- a/mysql-test/r/type_datetime.result 2009-02-13 18:07:03 +0000
+++ b/mysql-test/r/type_datetime.result 2010-02-09 20:32:17 +0000
@@ -514,10 +514,9 @@
where id in (select id from t1 as x1 where (t1.cur_date is null));
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
-2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
Warnings:
Note 1276 Field or reference 'test.t1.cur_date' of SELECT #2 was resolved in SELECT #1
-Note 1003 select '1' AS `id`,'2007-04-25 18:30:22' AS `cur_date` from `test`.`t1` where <in_optimizer>('1',<exists>(select 1 AS `Not_used` from `test`.`t1` `x1` where 0))
+Note 1003 select '1' AS `id`,'2007-04-25 18:30:22' AS `cur_date` from `test`.`t1` `x1` join `test`.`t1` where (('2007-04-25 18:30:22' = 0))
select * from t1
where id in (select id from t1 as x1 where (t1.cur_date is null));
id cur_date
@@ -526,10 +525,9 @@
where id in (select id from t2 as x1 where (t2.cur_date is null));
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
-2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
Warnings:
Note 1276 Field or reference 'test.t2.cur_date' of SELECT #2 was resolved in SELECT #1
-Note 1003 select '1' AS `id`,'2007-04-25' AS `cur_date` from `test`.`t2` where <in_optimizer>('1',<exists>(select 1 AS `Not_used` from `test`.`t2` `x1` where 0))
+Note 1003 select '1' AS `id`,'2007-04-25' AS `cur_date` from `test`.`t2` `x1` join `test`.`t2` where (('2007-04-25' = 0))
select * from t2
where id in (select id from t2 as x1 where (t2.cur_date is null));
id cur_date
@@ -540,10 +538,10 @@
where id in (select id from t1 as x1 where (t1.cur_date is null));
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 2 100.00 Using where
-2 DEPENDENT SUBQUERY x1 ALL NULL NULL NULL NULL 2 100.00 Using where
+1 PRIMARY x1 ALL NULL NULL NULL NULL 2 100.00 Using where; FirstMatch(t1)
Warnings:
Note 1276 Field or reference 'test.t1.cur_date' of SELECT #2 was resolved in SELECT #1
-Note 1003 select `test`.`t1`.`id` AS `id`,`test`.`t1`.`cur_date` AS `cur_date` from `test`.`t1` where <in_optimizer>(`test`.`t1`.`id`,<exists>(select 1 AS `Not_used` from `test`.`t1` `x1` where ((`test`.`t1`.`cur_date` = 0) and (<cache>(`test`.`t1`.`id`) = `test`.`x1`.`id`))))
+Note 1003 select `test`.`t1`.`id` AS `id`,`test`.`t1`.`cur_date` AS `cur_date` from `test`.`t1` semi join (`test`.`t1` `x1`) where ((`test`.`x1`.`id` = `test`.`t1`.`id`) and (`test`.`t1`.`cur_date` = 0))
select * from t1
where id in (select id from t1 as x1 where (t1.cur_date is null));
id cur_date
@@ -552,10 +550,10 @@
where id in (select id from t2 as x1 where (t2.cur_date is null));
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t2 ALL NULL NULL NULL NULL 2 100.00 Using where
-2 DEPENDENT SUBQUERY x1 ALL NULL NULL NULL NULL 2 100.00 Using where
+1 PRIMARY x1 ALL NULL NULL NULL NULL 2 100.00 Using where; FirstMatch(t2)
Warnings:
Note 1276 Field or reference 'test.t2.cur_date' of SELECT #2 was resolved in SELECT #1
-Note 1003 select `test`.`t2`.`id` AS `id`,`test`.`t2`.`cur_date` AS `cur_date` from `test`.`t2` where <in_optimizer>(`test`.`t2`.`id`,<exists>(select 1 AS `Not_used` from `test`.`t2` `x1` where ((`test`.`t2`.`cur_date` = 0) and (<cache>(`test`.`t2`.`id`) = `test`.`x1`.`id`))))
+Note 1003 select `test`.`t2`.`id` AS `id`,`test`.`t2`.`cur_date` AS `cur_date` from `test`.`t2` semi join (`test`.`t2` `x1`) where ((`test`.`x1`.`id` = `test`.`t2`.`id`) and (`test`.`t2`.`cur_date` = 0))
select * from t2
where id in (select id from t2 as x1 where (t2.cur_date is null));
id cur_date
1
0