developers
Threads by month
- ----- 2025 -----
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 1 participants
- 6832 discussions

[Maria-developers] Updated (by Monty): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 11 Feb '10
by worklog-noreply@askmonty.org 11 Feb '10
11 Feb '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Monty - Thu, 11 Feb 2010, 19:59)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.17962 2010-02-11 19:59:45.000000000 +0200
+++ /tmp/wklog.34.new.17962 2010-02-11 19:59:45.000000000 +0200
@@ -1,13 +1,8 @@
-
-<contents>
1. GPB Encoding overview
2. GPB in an SQL database
-2.1 Informing server about GPB field names and types
-2.2 Addressing GPB fields
-2.2.1 Option1: SQL Function
-2.2.2 Option2: SQL columns
-</contents>
-
+3. Encoding to use for dynamic columns
+4. How to store and access data in a protocol buffer from SQL
+5. Extensions for the future
1. GPB Encoding overview
========================
@@ -37,42 +32,50 @@
traffic right away, and will open path to getting the best possible
performance.
-2.1 Informing server about GPB field names and types
-----------------------------------------------------
-User-friendly/meaningful access to GPB fields requires knowledge of GPB field
-names and types, which are not available from GPB message itself (see "GPB
-encoding overview" section).
-
-So the first issue to be addressed is to get the server to know the definition
-of stored messages. We intend to assume that all records have GPB messages
-that conform to a certain single definition, which gives one definition per
-GPB field.
+3. Encoding to use for dynamic columns
+======================================
-DecisionToMake: How to pass the server the GPB definition?
-First idea: add a CREATE TABLE parameter which will specify either the
-definition itself or path to .proto file with the definition.
+The data should be coded into the proto buffer in the following format:
+
+<field_number><value_type><value>[<field_number><value_type><value>...]
+
+Where field_number is a number between 0-65536 that identifes the field
+<value_type> is a enum of type 'Item_result'
+<value> is the value coded in proto format.
+
+In other words, we should have no nested or complex structure.
+
+4. How to store and access data in a protocol buffer from SQL
+============================================================
+
+User-friendly/meaningful access to GPB fields requires knowledge of
+GPB field names and types, which are not available from GPB message
+itself (see "GPB encoding overview" section).
+
+To make things easy for the user, we will at first stage provide SQL
+functions to manipulate a string that is actually in proto format.
-2.2 Addressing GPB fields
--------------------------
-We'll need to provide a way to access GPB fields. This can be complicated as
-structures that are encoded in GPB message can be nested and recursive.
-
-2.2.1 Option1: SQL Function
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Introduce an SQL function GPB_FIELD(path) which will return contents of the
-field.
-- Return type of the function will be determined from GPB message definition.
-- For path, we can use XPath selector (a subset of XPath) syntax.
-
-(TODO ^ the above needs to be specified in more detail. is the selector as
-simple as filesystem path or we allow quantifiers (with predicates?)?)
-
-2.2.2 Option2: SQL columns
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-Make GPB columns to be accessible as SQL columns.
-This approach has problems:
-- It might be hard to implement code-wise
- - (TODO will Virtual columns patch help??)
-- It is not clear how to access fields from nested structures. Should we allow
- quoted names like `foo/bar[2]/baz' ?
+The functions we should provde are:
+proto_get(gpb, field_number, type)
+
+This return the field tagged with 'field_number' from the 'gpb' buffer.
+
+Example: proto_get(blob, 1, varchar) -> Returns field number 1 as varchar
+
+proto_put(gpb, field_number, value)
+
+This returns a new gbp buffer with the new value appended.
+
+Example: proto_put(proto_put(blob, 1, 1), 2, "hello")
+
+5. Extension for future
+=======================
+
+In the future we may want to access data based on name and get MariaDB to
+automaticly know the correct type. To do this we need to be able to
+store a definition for the content of the proto buffer somewhere.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify the
+definition itself.
-=-=(Monty - Thu, 11 Feb 2010, 19:59)=-=-
High Level Description modified.
--- /tmp/wklog.34.old.17915 2010-02-11 17:59:17.000000000 +0000
+++ /tmp/wklog.34.new.17915 2010-02-11 17:59:17.000000000 +0000
@@ -1,5 +1,21 @@
-Add support for Google Protocol Buffers (further GPB). It should be possible
-to have columns that store GPB-encoded data, as well as use SQL constructs to
+Add support for dynamic columns:
+
+- A column that can hold information from many columns
+- One can instantly add or remove column data
+
+This is a useful feature for any store type of application, where you want to
+store different type of information for different kind of items.
+
+For example, for shoes you want to store: material, size, colour, maker
+For a computer you want to store ram, hard disk size etc...
+
+In a normal 'relational' system you would need to a table for each type.
+With dynamic columns you have all common items as fixed fields (like
+product_code, manufacturer, price) and the rest stored in a dynamic column.
+
+The proposed idea is to store the dynamic information in a blob in
+Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
+
Any support for indexing GPB data is outside of scope of this WL entry.
-=-=(Knielsen - Fri, 22 Jan 2010, 11:38)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.29965 2010-01-22 11:38:57.000000000 +0200
+++ /tmp/wklog.34.new.29965 2010-01-22 11:38:57.000000000 +0200
@@ -2,3 +2,12 @@
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
+* We should have both server-side support and client-side support (client side
+ means functions in libmysqlclient so that user can select the full BLOB and
+ extract fields in the application).
+
+* Add some kind of header to the GPB blob to support versioning and future
+ extensibility.
+
+* Add complete syntax description (update, add, drop, exists, ...).
+
-=-=(Psergey - Tue, 21 Jul 2009, 21:13)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.6462 2009-07-21 21:13:13.000000000 +0300
+++ /tmp/wklog.34.new.6462 2009-07-21 21:13:13.000000000 +0300
@@ -1 +1,4 @@
+* GPB tarball contains a protocol definition for .proto file structure itself
+ and a parser for text form of .proto file which then exposes the parsed
+ file via standard GPB message navigation API.
-=-=(Psergey - Tue, 21 Jul 2009, 21:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.6399 2009-07-21 21:12:23.000000000 +0300
+++ /tmp/wklog.34.new.6399 2009-07-21 21:12:23.000000000 +0300
@@ -1 +1,78 @@
+<contents>
+1. GPB Encoding overview
+2. GPB in an SQL database
+2.1 Informing server about GPB field names and types
+2.2 Addressing GPB fields
+2.2.1 Option1: SQL Function
+2.2.2 Option2: SQL columns
+</contents>
+
+
+1. GPB Encoding overview
+========================
+
+GBB is a compact encoding for structured and typed data. A unit of GPB data
+(it is called message) is only partially self-describing: it's possible to
+iterate over its parts, but, quoting the spec
+
+http://code.google.com/apis/protocolbuffers/docs/encoding.html:
+ " the name and declared type for each field can only be determined on the
+ decoding end by referencing the message type's definition (i.e. the .proto
+ file). "
+
+2. GPB in an SQL database
+=========================
+
+It is possible to store GPB data in MariaDB today - one can declare a binary
+blob column and use it to store GPB messages. Storing and retrieving entire
+messages will be the only available operations, though, as the server has no
+idea about the GPB format.
+It is apparent that ability to peek inside GPB data from SQL layer would be of
+great advantage: one would be able to
+- select only certain fields or parts of GPB messages
+- filter records based on the values of GPB fields
+- etc
+performing such operations at SQL layer will allow to reduce client<->server
+traffic right away, and will open path to getting the best possible
+performance.
+
+2.1 Informing server about GPB field names and types
+----------------------------------------------------
+User-friendly/meaningful access to GPB fields requires knowledge of GPB field
+names and types, which are not available from GPB message itself (see "GPB
+encoding overview" section).
+
+So the first issue to be addressed is to get the server to know the definition
+of stored messages. We intend to assume that all records have GPB messages
+that conform to a certain single definition, which gives one definition per
+GPB field.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify either the
+definition itself or path to .proto file with the definition.
+
+2.2 Addressing GPB fields
+-------------------------
+We'll need to provide a way to access GPB fields. This can be complicated as
+structures that are encoded in GPB message can be nested and recursive.
+
+2.2.1 Option1: SQL Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Introduce an SQL function GPB_FIELD(path) which will return contents of the
+field.
+- Return type of the function will be determined from GPB message definition.
+- For path, we can use XPath selector (a subset of XPath) syntax.
+
+(TODO ^ the above needs to be specified in more detail. is the selector as
+simple as filesystem path or we allow quantifiers (with predicates?)?)
+
+2.2.2 Option2: SQL columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make GPB columns to be accessible as SQL columns.
+This approach has problems:
+- It might be hard to implement code-wise
+ - (TODO will Virtual columns patch help??)
+- It is not clear how to access fields from nested structures. Should we allow
+ quoted names like `foo/bar[2]/baz' ?
+
DESCRIPTION:
Add support for dynamic columns:
- A column that can hold information from many columns
- One can instantly add or remove column data
This is a useful feature for any store type of application, where you want to
store different type of information for different kind of items.
For example, for shoes you want to store: material, size, colour, maker
For a computer you want to store ram, hard disk size etc...
In a normal 'relational' system you would need to a table for each type.
With dynamic columns you have all common items as fixed fields (like
product_code, manufacturer, price) and the rest stored in a dynamic column.
The proposed idea is to store the dynamic information in a blob in
Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
Any support for indexing GPB data is outside of scope of this WL entry.
HIGH-LEVEL SPECIFICATION:
1. GPB Encoding overview
2. GPB in an SQL database
3. Encoding to use for dynamic columns
4. How to store and access data in a protocol buffer from SQL
5. Extensions for the future
1. GPB Encoding overview
========================
GBB is a compact encoding for structured and typed data. A unit of GPB data
(it is called message) is only partially self-describing: it's possible to
iterate over its parts, but, quoting the spec
http://code.google.com/apis/protocolbuffers/docs/encoding.html:
" the name and declared type for each field can only be determined on the
decoding end by referencing the message type's definition (i.e. the .proto
file). "
2. GPB in an SQL database
=========================
It is possible to store GPB data in MariaDB today - one can declare a binary
blob column and use it to store GPB messages. Storing and retrieving entire
messages will be the only available operations, though, as the server has no
idea about the GPB format.
It is apparent that ability to peek inside GPB data from SQL layer would be of
great advantage: one would be able to
- select only certain fields or parts of GPB messages
- filter records based on the values of GPB fields
- etc
performing such operations at SQL layer will allow to reduce client<->server
traffic right away, and will open path to getting the best possible
performance.
3. Encoding to use for dynamic columns
======================================
The data should be coded into the proto buffer in the following format:
<field_number><value_type><value>[<field_number><value_type><value>...]
Where field_number is a number between 0-65536 that identifes the field
<value_type> is a enum of type 'Item_result'
<value> is the value coded in proto format.
In other words, we should have no nested or complex structure.
4. How to store and access data in a protocol buffer from SQL
============================================================
User-friendly/meaningful access to GPB fields requires knowledge of
GPB field names and types, which are not available from GPB message
itself (see "GPB encoding overview" section).
To make things easy for the user, we will at first stage provide SQL
functions to manipulate a string that is actually in proto format.
The functions we should provde are:
proto_get(gpb, field_number, type)
This return the field tagged with 'field_number' from the 'gpb' buffer.
Example: proto_get(blob, 1, varchar) -> Returns field number 1 as varchar
proto_put(gpb, field_number, value)
This returns a new gbp buffer with the new value appended.
Example: proto_put(proto_put(blob, 1, 1), 2, "hello")
5. Extension for future
=======================
In the future we may want to access data based on name and get MariaDB to
automaticly know the correct type. To do this we need to be able to
store a definition for the content of the proto buffer somewhere.
DecisionToMake: How to pass the server the GPB definition?
First idea: add a CREATE TABLE parameter which will specify the
definition itself.
LOW-LEVEL DESIGN:
* GPB tarball contains a protocol definition for .proto file structure itself
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
* We should have both server-side support and client-side support (client side
means functions in libmysqlclient so that user can select the full BLOB and
extract fields in the application).
* Add some kind of header to the GPB blob to support versioning and future
extensibility.
* Add complete syntax description (update, add, drop, exists, ...).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Updated (by Monty): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 11 Feb '10
by worklog-noreply@askmonty.org 11 Feb '10
11 Feb '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Monty - Thu, 11 Feb 2010, 19:59)=-=-
High Level Description modified.
--- /tmp/wklog.34.old.17915 2010-02-11 17:59:17.000000000 +0000
+++ /tmp/wklog.34.new.17915 2010-02-11 17:59:17.000000000 +0000
@@ -1,5 +1,21 @@
-Add support for Google Protocol Buffers (further GPB). It should be possible
-to have columns that store GPB-encoded data, as well as use SQL constructs to
+Add support for dynamic columns:
+
+- A column that can hold information from many columns
+- One can instantly add or remove column data
+
+This is a useful feature for any store type of application, where you want to
+store different type of information for different kind of items.
+
+For example, for shoes you want to store: material, size, colour, maker
+For a computer you want to store ram, hard disk size etc...
+
+In a normal 'relational' system you would need to a table for each type.
+With dynamic columns you have all common items as fixed fields (like
+product_code, manufacturer, price) and the rest stored in a dynamic column.
+
+The proposed idea is to store the dynamic information in a blob in
+Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
+
Any support for indexing GPB data is outside of scope of this WL entry.
-=-=(Knielsen - Fri, 22 Jan 2010, 11:38)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.29965 2010-01-22 11:38:57.000000000 +0200
+++ /tmp/wklog.34.new.29965 2010-01-22 11:38:57.000000000 +0200
@@ -2,3 +2,12 @@
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
+* We should have both server-side support and client-side support (client side
+ means functions in libmysqlclient so that user can select the full BLOB and
+ extract fields in the application).
+
+* Add some kind of header to the GPB blob to support versioning and future
+ extensibility.
+
+* Add complete syntax description (update, add, drop, exists, ...).
+
-=-=(Psergey - Tue, 21 Jul 2009, 21:13)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.6462 2009-07-21 21:13:13.000000000 +0300
+++ /tmp/wklog.34.new.6462 2009-07-21 21:13:13.000000000 +0300
@@ -1 +1,4 @@
+* GPB tarball contains a protocol definition for .proto file structure itself
+ and a parser for text form of .proto file which then exposes the parsed
+ file via standard GPB message navigation API.
-=-=(Psergey - Tue, 21 Jul 2009, 21:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.6399 2009-07-21 21:12:23.000000000 +0300
+++ /tmp/wklog.34.new.6399 2009-07-21 21:12:23.000000000 +0300
@@ -1 +1,78 @@
+<contents>
+1. GPB Encoding overview
+2. GPB in an SQL database
+2.1 Informing server about GPB field names and types
+2.2 Addressing GPB fields
+2.2.1 Option1: SQL Function
+2.2.2 Option2: SQL columns
+</contents>
+
+
+1. GPB Encoding overview
+========================
+
+GBB is a compact encoding for structured and typed data. A unit of GPB data
+(it is called message) is only partially self-describing: it's possible to
+iterate over its parts, but, quoting the spec
+
+http://code.google.com/apis/protocolbuffers/docs/encoding.html:
+ " the name and declared type for each field can only be determined on the
+ decoding end by referencing the message type's definition (i.e. the .proto
+ file). "
+
+2. GPB in an SQL database
+=========================
+
+It is possible to store GPB data in MariaDB today - one can declare a binary
+blob column and use it to store GPB messages. Storing and retrieving entire
+messages will be the only available operations, though, as the server has no
+idea about the GPB format.
+It is apparent that ability to peek inside GPB data from SQL layer would be of
+great advantage: one would be able to
+- select only certain fields or parts of GPB messages
+- filter records based on the values of GPB fields
+- etc
+performing such operations at SQL layer will allow to reduce client<->server
+traffic right away, and will open path to getting the best possible
+performance.
+
+2.1 Informing server about GPB field names and types
+----------------------------------------------------
+User-friendly/meaningful access to GPB fields requires knowledge of GPB field
+names and types, which are not available from GPB message itself (see "GPB
+encoding overview" section).
+
+So the first issue to be addressed is to get the server to know the definition
+of stored messages. We intend to assume that all records have GPB messages
+that conform to a certain single definition, which gives one definition per
+GPB field.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify either the
+definition itself or path to .proto file with the definition.
+
+2.2 Addressing GPB fields
+-------------------------
+We'll need to provide a way to access GPB fields. This can be complicated as
+structures that are encoded in GPB message can be nested and recursive.
+
+2.2.1 Option1: SQL Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Introduce an SQL function GPB_FIELD(path) which will return contents of the
+field.
+- Return type of the function will be determined from GPB message definition.
+- For path, we can use XPath selector (a subset of XPath) syntax.
+
+(TODO ^ the above needs to be specified in more detail. is the selector as
+simple as filesystem path or we allow quantifiers (with predicates?)?)
+
+2.2.2 Option2: SQL columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make GPB columns to be accessible as SQL columns.
+This approach has problems:
+- It might be hard to implement code-wise
+ - (TODO will Virtual columns patch help??)
+- It is not clear how to access fields from nested structures. Should we allow
+ quoted names like `foo/bar[2]/baz' ?
+
DESCRIPTION:
Add support for dynamic columns:
- A column that can hold information from many columns
- One can instantly add or remove column data
This is a useful feature for any store type of application, where you want to
store different type of information for different kind of items.
For example, for shoes you want to store: material, size, colour, maker
For a computer you want to store ram, hard disk size etc...
In a normal 'relational' system you would need to a table for each type.
With dynamic columns you have all common items as fixed fields (like
product_code, manufacturer, price) and the rest stored in a dynamic column.
The proposed idea is to store the dynamic information in a blob in
Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
Any support for indexing GPB data is outside of scope of this WL entry.
HIGH-LEVEL SPECIFICATION:
<contents>
1. GPB Encoding overview
2. GPB in an SQL database
2.1 Informing server about GPB field names and types
2.2 Addressing GPB fields
2.2.1 Option1: SQL Function
2.2.2 Option2: SQL columns
</contents>
1. GPB Encoding overview
========================
GBB is a compact encoding for structured and typed data. A unit of GPB data
(it is called message) is only partially self-describing: it's possible to
iterate over its parts, but, quoting the spec
http://code.google.com/apis/protocolbuffers/docs/encoding.html:
" the name and declared type for each field can only be determined on the
decoding end by referencing the message type's definition (i.e. the .proto
file). "
2. GPB in an SQL database
=========================
It is possible to store GPB data in MariaDB today - one can declare a binary
blob column and use it to store GPB messages. Storing and retrieving entire
messages will be the only available operations, though, as the server has no
idea about the GPB format.
It is apparent that ability to peek inside GPB data from SQL layer would be of
great advantage: one would be able to
- select only certain fields or parts of GPB messages
- filter records based on the values of GPB fields
- etc
performing such operations at SQL layer will allow to reduce client<->server
traffic right away, and will open path to getting the best possible
performance.
2.1 Informing server about GPB field names and types
----------------------------------------------------
User-friendly/meaningful access to GPB fields requires knowledge of GPB field
names and types, which are not available from GPB message itself (see "GPB
encoding overview" section).
So the first issue to be addressed is to get the server to know the definition
of stored messages. We intend to assume that all records have GPB messages
that conform to a certain single definition, which gives one definition per
GPB field.
DecisionToMake: How to pass the server the GPB definition?
First idea: add a CREATE TABLE parameter which will specify either the
definition itself or path to .proto file with the definition.
2.2 Addressing GPB fields
-------------------------
We'll need to provide a way to access GPB fields. This can be complicated as
structures that are encoded in GPB message can be nested and recursive.
2.2.1 Option1: SQL Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduce an SQL function GPB_FIELD(path) which will return contents of the
field.
- Return type of the function will be determined from GPB message definition.
- For path, we can use XPath selector (a subset of XPath) syntax.
(TODO ^ the above needs to be specified in more detail. is the selector as
simple as filesystem path or we allow quantifiers (with predicates?)?)
2.2.2 Option2: SQL columns
~~~~~~~~~~~~~~~~~~~~~~~~~~
Make GPB columns to be accessible as SQL columns.
This approach has problems:
- It might be hard to implement code-wise
- (TODO will Virtual columns patch help??)
- It is not clear how to access fields from nested structures. Should we allow
quoted names like `foo/bar[2]/baz' ?
LOW-LEVEL DESIGN:
* GPB tarball contains a protocol definition for .proto file structure itself
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
* We should have both server-side support and client-side support (client side
means functions in libmysqlclient so that user can select the full BLOB and
extract fields in the application).
* Add some kind of header to the GPB blob to support versioning and future
extensibility.
* Add complete syntax description (update, add, drop, exists, ...).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Updated (by Monty): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 11 Feb '10
by worklog-noreply@askmonty.org 11 Feb '10
11 Feb '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Monty - Thu, 11 Feb 2010, 19:59)=-=-
High Level Description modified.
--- /tmp/wklog.34.old.17915 2010-02-11 17:59:17.000000000 +0000
+++ /tmp/wklog.34.new.17915 2010-02-11 17:59:17.000000000 +0000
@@ -1,5 +1,21 @@
-Add support for Google Protocol Buffers (further GPB). It should be possible
-to have columns that store GPB-encoded data, as well as use SQL constructs to
+Add support for dynamic columns:
+
+- A column that can hold information from many columns
+- One can instantly add or remove column data
+
+This is a useful feature for any store type of application, where you want to
+store different type of information for different kind of items.
+
+For example, for shoes you want to store: material, size, colour, maker
+For a computer you want to store ram, hard disk size etc...
+
+In a normal 'relational' system you would need to a table for each type.
+With dynamic columns you have all common items as fixed fields (like
+product_code, manufacturer, price) and the rest stored in a dynamic column.
+
+The proposed idea is to store the dynamic information in a blob in
+Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
+
Any support for indexing GPB data is outside of scope of this WL entry.
-=-=(Knielsen - Fri, 22 Jan 2010, 11:38)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.29965 2010-01-22 11:38:57.000000000 +0200
+++ /tmp/wklog.34.new.29965 2010-01-22 11:38:57.000000000 +0200
@@ -2,3 +2,12 @@
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
+* We should have both server-side support and client-side support (client side
+ means functions in libmysqlclient so that user can select the full BLOB and
+ extract fields in the application).
+
+* Add some kind of header to the GPB blob to support versioning and future
+ extensibility.
+
+* Add complete syntax description (update, add, drop, exists, ...).
+
-=-=(Psergey - Tue, 21 Jul 2009, 21:13)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.6462 2009-07-21 21:13:13.000000000 +0300
+++ /tmp/wklog.34.new.6462 2009-07-21 21:13:13.000000000 +0300
@@ -1 +1,4 @@
+* GPB tarball contains a protocol definition for .proto file structure itself
+ and a parser for text form of .proto file which then exposes the parsed
+ file via standard GPB message navigation API.
-=-=(Psergey - Tue, 21 Jul 2009, 21:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.6399 2009-07-21 21:12:23.000000000 +0300
+++ /tmp/wklog.34.new.6399 2009-07-21 21:12:23.000000000 +0300
@@ -1 +1,78 @@
+<contents>
+1. GPB Encoding overview
+2. GPB in an SQL database
+2.1 Informing server about GPB field names and types
+2.2 Addressing GPB fields
+2.2.1 Option1: SQL Function
+2.2.2 Option2: SQL columns
+</contents>
+
+
+1. GPB Encoding overview
+========================
+
+GBB is a compact encoding for structured and typed data. A unit of GPB data
+(it is called message) is only partially self-describing: it's possible to
+iterate over its parts, but, quoting the spec
+
+http://code.google.com/apis/protocolbuffers/docs/encoding.html:
+ " the name and declared type for each field can only be determined on the
+ decoding end by referencing the message type's definition (i.e. the .proto
+ file). "
+
+2. GPB in an SQL database
+=========================
+
+It is possible to store GPB data in MariaDB today - one can declare a binary
+blob column and use it to store GPB messages. Storing and retrieving entire
+messages will be the only available operations, though, as the server has no
+idea about the GPB format.
+It is apparent that ability to peek inside GPB data from SQL layer would be of
+great advantage: one would be able to
+- select only certain fields or parts of GPB messages
+- filter records based on the values of GPB fields
+- etc
+performing such operations at SQL layer will allow to reduce client<->server
+traffic right away, and will open path to getting the best possible
+performance.
+
+2.1 Informing server about GPB field names and types
+----------------------------------------------------
+User-friendly/meaningful access to GPB fields requires knowledge of GPB field
+names and types, which are not available from GPB message itself (see "GPB
+encoding overview" section).
+
+So the first issue to be addressed is to get the server to know the definition
+of stored messages. We intend to assume that all records have GPB messages
+that conform to a certain single definition, which gives one definition per
+GPB field.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify either the
+definition itself or path to .proto file with the definition.
+
+2.2 Addressing GPB fields
+-------------------------
+We'll need to provide a way to access GPB fields. This can be complicated as
+structures that are encoded in GPB message can be nested and recursive.
+
+2.2.1 Option1: SQL Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Introduce an SQL function GPB_FIELD(path) which will return contents of the
+field.
+- Return type of the function will be determined from GPB message definition.
+- For path, we can use XPath selector (a subset of XPath) syntax.
+
+(TODO ^ the above needs to be specified in more detail. is the selector as
+simple as filesystem path or we allow quantifiers (with predicates?)?)
+
+2.2.2 Option2: SQL columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make GPB columns to be accessible as SQL columns.
+This approach has problems:
+- It might be hard to implement code-wise
+ - (TODO will Virtual columns patch help??)
+- It is not clear how to access fields from nested structures. Should we allow
+ quoted names like `foo/bar[2]/baz' ?
+
DESCRIPTION:
Add support for dynamic columns:
- A column that can hold information from many columns
- One can instantly add or remove column data
This is a useful feature for any store type of application, where you want to
store different type of information for different kind of items.
For example, for shoes you want to store: material, size, colour, maker
For a computer you want to store ram, hard disk size etc...
In a normal 'relational' system you would need to a table for each type.
With dynamic columns you have all common items as fixed fields (like
product_code, manufacturer, price) and the rest stored in a dynamic column.
The proposed idea is to store the dynamic information in a blob in
Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
Any support for indexing GPB data is outside of scope of this WL entry.
HIGH-LEVEL SPECIFICATION:
<contents>
1. GPB Encoding overview
2. GPB in an SQL database
2.1 Informing server about GPB field names and types
2.2 Addressing GPB fields
2.2.1 Option1: SQL Function
2.2.2 Option2: SQL columns
</contents>
1. GPB Encoding overview
========================
GBB is a compact encoding for structured and typed data. A unit of GPB data
(it is called message) is only partially self-describing: it's possible to
iterate over its parts, but, quoting the spec
http://code.google.com/apis/protocolbuffers/docs/encoding.html:
" the name and declared type for each field can only be determined on the
decoding end by referencing the message type's definition (i.e. the .proto
file). "
2. GPB in an SQL database
=========================
It is possible to store GPB data in MariaDB today - one can declare a binary
blob column and use it to store GPB messages. Storing and retrieving entire
messages will be the only available operations, though, as the server has no
idea about the GPB format.
It is apparent that ability to peek inside GPB data from SQL layer would be of
great advantage: one would be able to
- select only certain fields or parts of GPB messages
- filter records based on the values of GPB fields
- etc
performing such operations at SQL layer will allow to reduce client<->server
traffic right away, and will open path to getting the best possible
performance.
2.1 Informing server about GPB field names and types
----------------------------------------------------
User-friendly/meaningful access to GPB fields requires knowledge of GPB field
names and types, which are not available from GPB message itself (see "GPB
encoding overview" section).
So the first issue to be addressed is to get the server to know the definition
of stored messages. We intend to assume that all records have GPB messages
that conform to a certain single definition, which gives one definition per
GPB field.
DecisionToMake: How to pass the server the GPB definition?
First idea: add a CREATE TABLE parameter which will specify either the
definition itself or path to .proto file with the definition.
2.2 Addressing GPB fields
-------------------------
We'll need to provide a way to access GPB fields. This can be complicated as
structures that are encoded in GPB message can be nested and recursive.
2.2.1 Option1: SQL Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduce an SQL function GPB_FIELD(path) which will return contents of the
field.
- Return type of the function will be determined from GPB message definition.
- For path, we can use XPath selector (a subset of XPath) syntax.
(TODO ^ the above needs to be specified in more detail. is the selector as
simple as filesystem path or we allow quantifiers (with predicates?)?)
2.2.2 Option2: SQL columns
~~~~~~~~~~~~~~~~~~~~~~~~~~
Make GPB columns to be accessible as SQL columns.
This approach has problems:
- It might be hard to implement code-wise
- (TODO will Virtual columns patch help??)
- It is not clear how to access fields from nested structures. Should we allow
quoted names like `foo/bar[2]/baz' ?
LOW-LEVEL DESIGN:
* GPB tarball contains a protocol definition for .proto file structure itself
and a parser for text form of .proto file which then exposes the parsed
file via standard GPB message navigation API.
* We should have both server-side support and client-side support (client side
means functions in libmysqlclient so that user can select the full BLOB and
extract fields in the application).
* Add some kind of header to the GPB blob to support versioning and future
extensibility.
* Add complete syntax description (update, add, drop, exists, ...).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

Re: [Maria-developers] Rev 2740: Group commit for maria storage engine. in file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
by Oleksandr Byelkin 11 Feb '10
by Oleksandr Byelkin 11 Feb '10
11 Feb '10
Hi!
10 февр. 2010, в 21:38, Sergei Golubchik написал(а):
[skip]
>>> Why use my_atomic_store32 ?
>>
>> As I understood idea of atomic operation it is guaranted that we will
>> read consistent value (not one byte from one value and other one from
>> other). Yes I remember your statement that on modern 32bit system
>> you
>> always get it consistent, then why we made atomic operations at all?
>
> Because my_atomic_store32() also adds a full memory barrier to the
> atomic store operation. That is, if you do
>
> my_atomic_store32(&a, 1);
> my_atomic_store32(&b, 2);
>
> and then in another thread
>
> if (my_atomic_load32(&b) == 2)
> {
> ...
> here you can be sure that a==1, because a=1 was executed before
> b=2. And neither compiler nor the cpu swapped two assignments.
> }
>
In other words it is real current value of the variable in all
threads. It looks like what I need.
2
1

[Maria-developers] Rev 2758: Subquery optimizations: backport in file:///home/psergey/dev/maria-5.3-subqueries-r3/
by Sergey Petrunya 11 Feb '10
by Sergey Petrunya 11 Feb '10
11 Feb '10
At file:///home/psergey/dev/maria-5.3-subqueries-r3/
------------------------------------------------------------
revno: 2758
revision-id: psergey(a)askmonty.org-20100211120315-o1hpcxl5lkbrbl25
parent: psergey(a)askmonty.org-20100209203217-al1k9h50zrlphy5d
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.3-subqueries-r3
timestamp: Thu 2010-02-11 15:03:15 +0300
message:
Subquery optimizations: backport
- Fix valgrind failure: do initialize Item::is_expensive_cache.
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2010-02-08 13:10:19 +0000
+++ b/sql/item.cc 2010-02-11 12:03:15 +0000
@@ -373,8 +373,8 @@
Item::Item():
- rsize(0), name(0), orig_name(0), name_length(0), fixed(0),
- is_autogenerated_name(TRUE),
+ is_expensive_cache(-1), rsize(0), name(0), orig_name(0), name_length(0),
+ fixed(0), is_autogenerated_name(TRUE),
collation(&my_charset_bin, DERIVATION_COERCIBLE)
{
marker= 0;
@@ -410,6 +410,7 @@
tables.
*/
Item::Item(THD *thd, Item *item):
+ is_expensive_cache(-1),
rsize(0),
str_value(item->str_value),
name(item->name),
=== modified file 'sql/item.h'
--- a/sql/item.h 2010-01-28 13:48:33 +0000
+++ b/sql/item.h 2010-02-11 12:03:15 +0000
@@ -513,6 +513,9 @@
enum traverse_order { POSTFIX, PREFIX };
+ /* Cache of the result of is_expensive(). */
+ int8 is_expensive_cache;
+
/* Reuse size, only used by SP local variable assignment, otherwize 0 */
uint rsize;
@@ -878,9 +881,6 @@
static CHARSET_INFO *default_charset();
virtual CHARSET_INFO *compare_collation() { return NULL; }
- /* Cache of the result of is_expensive(). */
- int8 is_expensive_cache;
-
virtual bool walk(Item_processor processor, bool walk_subquery, uchar *arg)
{
return (this->*processor)(arg);
1
0

[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2815: Added option --temporary-tables to test speed of temporary tables
by noreply@launchpad.net 10 Feb '10
by noreply@launchpad.net 10 Feb '10
10 Feb '10
------------------------------------------------------------
revno: 2815
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: maria-5.1
timestamp: Wed 2010-02-10 23:26:06 +0200
message:
Added option --temporary-tables to test speed of temporary tables
added:
mysql-test/suite/parts/t/partition_repair_myisam-master.opt
modified:
sql-bench/bench-init.pl.sh
sql-bench/server-cfg.sh
sql-bench/test-connect.sh
sql-bench/test-create.sh
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
1
0

[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2814: When one does a drop table, the indexes are not flushed to disk before drop anymore (with MyISAM/...
by noreply@launchpad.net 10 Feb '10
by noreply@launchpad.net 10 Feb '10
10 Feb '10
------------------------------------------------------------
revno: 2814
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: maria-5.1
timestamp: Wed 2010-02-10 21:06:24 +0200
message:
When one does a drop table, the indexes are not flushed to disk before drop anymore (with MyISAM/Maria)
myisam-recover options changed from OFF to 'DEFAULT' to get less change of data loss when using MyISAM.
(The disadvantage is that changed MyISAM tables will be checked at access time; Use --myisam-recover=OFF for old behavior)
Don't call extra(HA_EXTRA_FORCE_REOPEN) in ALTER TABLE if table is locked as this will mark table as crashed!
Added assert to detect if we accidently would use MyISAM versioning in MySQL
modified:
include/my_base.h
mysql-test/mysql-test-run.pl
mysql-test/r/sp-destruct.result
mysql-test/r/variables.result
mysql-test/r/view.result
mysql-test/suite/maria/t/maria-recovery2-master.opt
mysql-test/t/sp-destruct.test
mysql-test/t/view.test
sql/lock.cc
sql/mysql_priv.h
sql/mysqld.cc
sql/sql_base.cc
sql/sql_delete.cc
sql/sql_table.cc
sql/table.cc
sql/table.h
storage/maria/ha_maria.cc
storage/maria/ma_blockrec.c
storage/maria/ma_close.c
storage/maria/ma_extra.c
storage/maria/ma_locking.c
storage/maria/ma_recovery.c
storage/maria/maria_def.h
storage/myisam/mi_close.c
storage/myisam/mi_extra.c
storage/myisam/mi_open.c
storage/myisam/myisamdef.h
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
1
0

[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (monty:2815)
by Michael Widenius 10 Feb '10
by Michael Widenius 10 Feb '10
10 Feb '10
#At lp:maria based on revid:monty@askmonty.org-20100210190624-38ucdn8y98k1v1zd
2815 Michael Widenius 2010-02-10
Added option --temporary-tables to test speed of temporary tables
added:
mysql-test/suite/parts/t/partition_repair_myisam-master.opt
modified:
sql-bench/bench-init.pl.sh
sql-bench/server-cfg.sh
sql-bench/test-connect.sh
sql-bench/test-create.sh
per-file messages:
mysql-test/suite/parts/t/partition_repair_myisam-master.opt
Added missing file from last push
sql-bench/bench-init.pl.sh
Added options:
--temporary-tables to test speed of temporary tables
sql-bench/server-cfg.sh
Added limit for number of temporary tables one can create
sql-bench/test-connect.sh
Skip test that doesn't work with temporary tables.
sql-bench/test-create.sh
Added limit for number of temporary tables one can create
=== added file 'mysql-test/suite/parts/t/partition_repair_myisam-master.opt'
--- a/mysql-test/suite/parts/t/partition_repair_myisam-master.opt 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/parts/t/partition_repair_myisam-master.opt 2010-02-10 21:26:06 +0000
@@ -0,0 +1 @@
+--myisam-recover=off
=== modified file 'sql-bench/bench-init.pl.sh'
--- a/sql-bench/bench-init.pl.sh 2010-02-09 17:17:04 +0000
+++ b/sql-bench/bench-init.pl.sh 2010-02-10 21:26:06 +0000
@@ -39,7 +39,7 @@ require "$pwd/server-cfg" || die "Can't
$|=1; # Output data immediately
-$opt_skip_test=$opt_skip_create=$opt_skip_delete=$opt_verbose=$opt_fast_insert=$opt_lock_tables=$opt_debug=$opt_skip_delete=$opt_fast=$opt_force=$opt_log=$opt_use_old_results=$opt_help=$opt_odbc=$opt_small_test=$opt_small_tables=$opt_samll_key_tables=$opt_stage=$opt_old_headers=$opt_die_on_errors=$opt_tcpip=$opt_random=$opt_only_missing_tests=0;
+$opt_skip_test=$opt_skip_create=$opt_skip_delete=$opt_verbose=$opt_fast_insert=$opt_lock_tables=$opt_debug=$opt_skip_delete=$opt_fast=$opt_force=$opt_log=$opt_use_old_results=$opt_help=$opt_odbc=$opt_small_test=$opt_small_tables=$opt_samll_key_tables=$opt_stage=$opt_old_headers=$opt_die_on_errors=$opt_tcpip=$opt_random=$opt_only_missing_tests=$opt_temporary_tables=0;
$opt_cmp=$opt_user=$opt_password=$opt_connect_options=$opt_connect_command= "";
$opt_server="mysql"; $opt_dir="output";
$opt_host="localhost";$opt_database="test";
@@ -59,7 +59,7 @@ $log_prog_args=join(" ", skip_arguments(
"use-old-results","skip-test",
"optimization","hw",
"machine", "dir", "suffix", "log"));
-GetOptions("skip-test=s","comments=s","cmp=s","server=s","user=s","host=s","database=s","password=s","loop-count=i","row-count=i","skip-create","skip-delete","verbose","fast-insert","lock-tables","debug","fast","force","field-count=i","regions=i","groups=i","time-limit=i","log","use-old-results","machine=s","dir=s","suffix=s","help","odbc","small-test","small-tables","small-key-tables","stage=i","threads=i","random","old-headers","die-on-errors","create-options=s","hires","tcpip","silent","optimization=s","hw=s","socket=s","connect-options=s","connect-command=s","only-missing-tests") || usage();
+GetOptions("skip-test=s","comments=s","cmp=s","server=s","user=s","host=s","database=s","password=s","loop-count=i","row-count=i","skip-create","skip-delete","verbose","fast-insert","lock-tables","debug","fast","force","field-count=i","regions=i","groups=i","time-limit=i","log","use-old-results","machine=s","dir=s","suffix=s","help","odbc","small-test","small-tables","small-key-tables","stage=i","threads=i","random","old-headers","die-on-errors","create-options=s","hires","tcpip","silent","optimization=s","hw=s","socket=s","connect-options=s","connect-command=s","only-missing-tests","temporary-tables") || usage();
usage() if ($opt_help);
$server=get_server($opt_server,$opt_host,$opt_database,$opt_odbc,
@@ -454,6 +454,9 @@ All benchmarks takes the following optio
create all MySQL tables as InnoDB tables use:
--create-options=ENGINE=InnoDB
+--temporary-tables
+ Use temporary tables for all tests.
+
--database (Default $opt_database)
In which database the test tables are created.
=== modified file 'sql-bench/server-cfg.sh'
--- a/sql-bench/server-cfg.sh 2010-02-09 17:17:04 +0000
+++ b/sql-bench/server-cfg.sh 2010-02-10 21:26:06 +0000
@@ -159,6 +159,7 @@ sub new
$limits{'max_index'} = 16; # Max number of keys
$limits{'max_index_parts'} = 16; # Max segments/key
$limits{'max_tables'} = (($machine || '') =~ "^win") ? 5000 : 65000;
+ $limits{'max_temporary_tables'}= 400;
$limits{'max_text_size'} = 1000000; # Good enough for tests
$limits{'multi_drop'} = 1; # Drop table can take many tables
$limits{'order_by_position'} = 1; # Can use 'ORDER BY 1'
@@ -189,6 +190,7 @@ sub new
$self->{'transactions'} = 1; # Transactions enabled
$limits{'max_columns'} = 90; # Max number of columns in table
$limits{'max_tables'} = 32; # No comments
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
}
if (defined($main::opt_create_options) &&
$main::opt_create_options =~ /engine=bdb/i)
@@ -200,6 +202,7 @@ sub new
{
$limits{'working_blobs'} = 0; # Blobs not implemented yet
$limits{'max_tables'} = 500;
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$self->{'transactions'} = 1; # Transactions enabled
}
@@ -270,7 +273,14 @@ sub create
my($self,$table_name,$fields,$index,$options) = @_;
my($query,@queries);
- $query="create table $table_name (";
+ if ($main::opt_temporary_tables)
+ {
+ $query="create temporary table $table_name (";
+ }
+ else
+ {
+ $query="create table $table_name (";
+ }
foreach $field (@$fields)
{
# $field =~ s/ decimal/ double(10,2)/i;
@@ -393,6 +403,7 @@ sub new
$limits{'max_conditions'} = 74;
$limits{'max_columns'} = 75;
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 32000;
$limits{'query_size'} = 65535;
$limits{'max_index'} = 5;
@@ -622,7 +633,9 @@ sub new
$limits{'max_conditions'} = 9999; # This makes Pg real slow
$limits{'max_index'} = 64; # Big enough
$limits{'max_index_parts'} = 16;
- $limits{'max_tables'} = 5000; # 10000 crashes pg 7.0.2
+ $limits{'max_tables'} = 65000;
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 65000; # Good enough for test
$limits{'multi_drop'} = 1;
$limits{'order_by_position'} = 1;
@@ -873,6 +886,8 @@ sub new
$limits{'max_conditions'} = 9999; # Probably big enough
$limits{'max_columns'} = 2000; # From crash-me
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 65492; # According to tests
$limits{'query_size'} = 65535; # Probably a limit
$limits{'max_index'} = 64; # Probably big enough
@@ -1104,6 +1119,7 @@ sub new
# above this value .... but can handle 2419 columns
# maybe something for crash-me ... but how to check ???
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 4095; # max returned ....
$limits{'query_size'} = 65535; # Not a limit, big enough
$limits{'max_index'} = 64; # Big enough
@@ -1374,6 +1390,8 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit)
$limits{'max_columns'} = 254; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 2000; # Limit for blob test-connect
$limits{'query_size'} = 65525; # Max size with default buffers.
$limits{'max_index'} = 16; # Max number of keys
@@ -1647,6 +1665,8 @@ sub new
$limits{'max_column_name'} = 18; # max table and column name
$limits{'max_columns'} = 994; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_index'} = 64; # Max number of keys
$limits{'max_index_parts'} = 15; # Max segments/key
$limits{'max_text_size'} = 65535; # Max size with default buffers. ??
@@ -1835,6 +1855,8 @@ sub new
$limits{'max_conditions'} = 97; # We get 'Query is too complex'
$limits{'max_columns'} = 255; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 255; # Max size with default buffers.
$limits{'query_size'} = 65535; # Not a limit, big enough
$limits{'max_index'} = 32; # Max number of keys
@@ -2020,6 +2042,8 @@ sub new
$limits{'max_conditions'} = 1030; # We get 'Query is too complex'
$limits{'max_columns'} = 250; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 9830; # Max size with default buffers.
$limits{'query_size'} = 9830; # Max size with default buffers.
$limits{'max_index'} = 64; # Max number of keys
@@ -2216,6 +2240,8 @@ sub new
$limits{'max_conditions'} = 1030; # We get 'Query is too complex'
$limits{'max_columns'} = 250; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 9830; # Max size with default buffers.
$limits{'query_size'} = 9830; # Max size with default buffers.
$limits{'max_index'} = 64; # Max number of keys
@@ -2448,6 +2474,8 @@ sub new
$limits{'max_conditions'} = 50; # (Actually not a limit)
$limits{'max_columns'} = 254; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 2000; # Limit for blob test-connect
$limits{'query_size'} = 65525; # Max size with default buffers.
$limits{'max_index'} = 16; # Max number of keys
@@ -2652,6 +2680,8 @@ sub new
$limits{'max_conditions'} = 418; # We get 'Query is too complex'
$limits{'max_columns'} = 500; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
+
$limits{'max_text_size'} = 254; # Max size with default buffers.
$limits{'query_size'} = 254; # Max size with default buffers.
$limits{'max_index'} = 48; # Max number of keys
@@ -2830,6 +2860,7 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit)
$limits{'max_columns'} = 252; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 15000; # Max size with default buffers.
$limits{'query_size'} = 1000000; # Max size with default buffers.
$limits{'max_index'} = 32; # Max number of keys
@@ -3032,6 +3063,7 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit)
$limits{'max_columns'} = 252; # Max number of columns in table
$limits{'max_tables'} = 65000; # Should be big enough
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 15000; # Max size with default buffers.
$limits{'query_size'} = 1000000; # Max size with default buffers.
$limits{'max_index'} = 65000; # Max number of keys
@@ -3228,6 +3260,7 @@ sub new
# The following should be 8192, but is smaller because Frontbase crashes..
$limits{'max_columns'} = 150; # Max number of columns in table
$limits{'max_tables'} = 5000; # 10000 crashed FrontBase
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 65000; # Max size with default buffers.
$limits{'query_size'} = 8000000; # Max size with default buffers.
$limits{'max_index'} = 38; # Max number of keys
@@ -3440,6 +3473,7 @@ sub new
$limits{'max_conditions'} = 9999; # (Actually not a limit) *
$limits{'max_columns'} = 1023; # Max number of columns in table *
$limits{'max_tables'} = 65000; # Should be big enough * unlimited actually
+ $limits{'max_temporary_tables'}= $limits{"max_tables"};
$limits{'max_text_size'} = 15000; # Max size with default buffers.
$limits{'query_size'} = 64*1024; # Max size with default buffers. *64 kb by default. May be set by system variable
$limits{'max_index'} = 510; # Max number of keys *
=== modified file 'sql-bench/test-connect.sh'
--- a/sql-bench/test-connect.sh 2009-05-29 13:40:55 +0000
+++ b/sql-bench/test-connect.sh 2010-02-10 21:26:06 +0000
@@ -161,41 +161,48 @@ if ($opt_fast && defined($server->{vacuu
{
$server->vacuum(0,\$dbh);
}
-$dbh->disconnect;
+if (!$main::opt_temporary_tables)
+{
+ $dbh->disconnect;
+}
#
# First test connect/select/disconnect
#
-print "Testing connect/select 1 row from table/disconnect\n";
+if (!$main::opt_temporary_tables)
+{
+ print "Testing connect/select 1 row from table/disconnect\n";
-$loop_time=new Benchmark;
-$errors=0;
+ $loop_time=new Benchmark;
+ $errors=0;
-for ($i=0 ; $i < $small_loop_count ; $i++)
-{
- for ($j=0; $j < $max_test ; $j++)
+ for ($i=0 ; $i < $small_loop_count ; $i++)
{
- last if ($dbh = DBI->connect($server->{'data_source'}, $opt_user, $opt_password));
- $errors++;
- }
- die $DBI::errstr if ($j == $max_test);
+ for ($j=0; $j < $max_test ; $j++)
+ {
+ last if ($dbh = DBI->connect($server->{'data_source'}, $opt_user, $opt_password));
+ $errors++;
+ }
+ die $DBI::errstr if ($j == $max_test);
- $sth = $dbh->do("select a,i,s,$i from bench1") # Select * from table with 1 record
+ $sth = $dbh->do("select a,i,s,$i from bench1") # Select * from table with 1 record
or die $DBI::errstr;
- $dbh->disconnect;
-}
+ $dbh->disconnect;
+ }
-$end_time=new Benchmark;
-print "Warning: $errors connections didn't work without a time delay\n" if ($errors);
-print "Time to connect+select_1_row ($small_loop_count): " .
+ $end_time=new Benchmark;
+ print "Warning: $errors connections didn't work without a time delay\n" if ($errors);
+ print "Time to connect+select_1_row ($small_loop_count): " .
timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+ $dbh = $server->connect();
+}
+
#
# The same test, but without connect/disconnect
#
print "Testing select 1 row from table\n";
-$dbh = $server->connect();
$loop_time=new Benchmark;
for ($i=0 ; $i < $opt_loop_count ; $i++)
=== modified file 'sql-bench/test-create.sh'
--- a/sql-bench/test-create.sh 2009-05-29 13:40:55 +0000
+++ b/sql-bench/test-create.sh 2010-02-10 21:26:06 +0000
@@ -47,7 +47,15 @@ if ($opt_small_test)
$create_loop_count/=1000;
}
-$max_tables=min($limits->{'max_tables'},$opt_loop_count);
+if ($opt_temporary_tables)
+{
+ $max_tables=min($limits->{'max_tables'},$opt_loop_count);
+}
+else
+{
+ $max_tables=min($limits->{'max_tables'},$opt_loop_count);
+ $max_tables=400;
+}
if ($opt_small_test)
{
@@ -71,7 +79,7 @@ $dbh = $server->connect();
if ($opt_force) # If tables used in this test exist, drop 'em
{
print "Okay..Let's make sure that our tables don't exist yet.\n\n";
- for ($i=1 ; $i <= $max_tables ; $i++)
+ for ($i=1 ; $i <= max($max_tables, $create_loop_count) ; $i++)
{
$dbh->do("drop table bench_$i" . $server->{'drop_attr'});
}
@@ -245,7 +253,7 @@ for ($i=2 ; $i <= $keys ; $i++)
}
$loop_time=new Benchmark;
-for ($i=1 ; $i <= $opt_loop_count ; $i++)
+for ($i=1 ; $i <= $create_loop_count ; $i++)
{
do_many($dbh,$server->create("bench_$i", \@fields, \@keys));
$dbh->do("drop table bench_$i" . $server->{'drop_attr'}) or die $DBI::errstr;
1
0

[Maria-developers] Rev 2740: Group commit for maria storage engine. in file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
by sanja@askmonty.org 10 Feb '10
by sanja@askmonty.org 10 Feb '10
10 Feb '10
At file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
------------------------------------------------------------
revno: 2740
revision-id: sanja(a)askmonty.org-20100210205026-8l8veoi8dbon5cwl
parent: knielsen(a)knielsen-hq.org-20100201190519-b9uktnn90rwwiile
committer: sanja(a)askmonty.org
branch nick: work-maria-5.2-groupcommit
timestamp: Wed 2010-02-10 22:50:26 +0200
message:
Group commit for maria storage engine.
=== added file 'mysql-test/suite/maria/r/group_commit.result'
--- a/mysql-test/suite/maria/r/group_commit.result 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/r/group_commit.result 2010-02-10 20:50:26 +0000
@@ -0,0 +1,17 @@
+drop table if exists t1;
+create table t1 (a int);
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'mysql-test/suite/maria/r/maria3.result'
--- a/mysql-test/suite/maria/r/maria3.result 2009-09-18 01:04:43 +0000
+++ b/mysql-test/suite/maria/r/maria3.result 2010-02-10 20:50:26 +0000
@@ -306,6 +306,8 @@
maria_block_size 8192
maria_checkpoint_interval 30
maria_force_start_after_recovery_failures 0
+maria_group_commit none
+maria_group_commit_interval 0
maria_log_file_size 4294959104
maria_log_purge_type immediate
maria_max_sort_file_size 9223372036853727232
@@ -328,6 +330,7 @@
Maria_pagecache_reads #
Maria_pagecache_write_requests #
Maria_pagecache_writes #
+Maria_transaction_log_syncs #
create table t1 (b char(0));
insert into t1 values(NULL),("");
select length(b) from t1;
=== added file 'mysql-test/suite/maria/t/group_commit.test'
--- a/mysql-test/suite/maria/t/group_commit.test 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/t/group_commit.test 2010-02-10 20:50:26 +0000
@@ -0,0 +1,71 @@
+# Test different ways of syncing (mostly syntax)
+
+--disable_warnings
+drop table if exists t1;
+--enable_warnings
+
+create table t1 (a int);
+
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'storage/maria/ha_maria.cc'
--- a/storage/maria/ha_maria.cc 2009-12-03 11:34:11 +0000
+++ b/storage/maria/ha_maria.cc 2010-02-10 20:50:26 +0000
@@ -102,22 +102,40 @@
array_elements(maria_translog_purge_type_names) - 1, "",
maria_translog_purge_type_names, NULL
};
+
+/* transactional log directory sync */
const char *maria_sync_log_dir_names[]=
{
"NEVER", "NEWFILE", "ALWAYS", NullS
};
-
TYPELIB maria_sync_log_dir_typelib=
{
array_elements(maria_sync_log_dir_names) - 1, "",
maria_sync_log_dir_names, NULL
};
+/* transactional log group commit */
+const char *maria_group_commit_names[]=
+{
+ "none", "hard", "soft", NullS
+};
+TYPELIB maria_group_commit_typelib=
+{
+ array_elements(maria_group_commit_names) - 1, "",
+ maria_group_commit_names, NULL
+};
+
/** Interval between background checkpoints in seconds */
static ulong checkpoint_interval;
static void update_checkpoint_interval(MYSQL_THD thd,
struct st_mysql_sys_var *var,
void *var_ptr, const void *save);
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
/** After that many consecutive recovery failures, remove logs */
static ulong force_start_after_recovery_failures;
static void update_log_file_size(MYSQL_THD thd,
@@ -164,6 +182,24 @@
NULL, update_log_file_size, TRANSLOG_FILE_SIZE,
TRANSLOG_MIN_FILE_SIZE, 0xffffffffL, TRANSLOG_PAGE_SIZE);
+static MYSQL_SYSVAR_ENUM(group_commit, maria_group_commit,
+ PLUGIN_VAR_RQCMDARG,
+ "Specifies maria group commit mode. "
+ "Possible values are \"none\" (no group commit), "
+ "\"hard\" (with waiting to actual commit), "
+ "\"soft\" (no wait for commit (DANGEROUS!!!))",
+ NULL, update_maria_group_commit,
+ TRANSLOG_GCOMMIT_NONE, &maria_group_commit_typelib);
+
+static MYSQL_SYSVAR_ULONG(group_commit_interval, maria_group_commit_interval,
+ PLUGIN_VAR_RQCMDARG,
+ "Interval between commite in microseconds (1/1000000c)."
+ " 0 stands for no waiting"
+ " for other threads to come and do a commit in \"hard\" mode and no"
+ " sync()/commit at all in \"soft\" mode. Option has only an effect"
+ " if maria_group_commit is used",
+ NULL, update_maria_group_commit_interval, 0, 0, UINT_MAX, 1);
+
static MYSQL_SYSVAR_ENUM(log_purge_type, log_purge_type,
PLUGIN_VAR_RQCMDARG,
"Specifies how maria transactional log will be purged. "
@@ -3275,6 +3311,8 @@
MYSQL_SYSVAR(block_size),
MYSQL_SYSVAR(checkpoint_interval),
MYSQL_SYSVAR(force_start_after_recovery_failures),
+ MYSQL_SYSVAR(group_commit),
+ MYSQL_SYSVAR(group_commit_interval),
MYSQL_SYSVAR(page_checksum),
MYSQL_SYSVAR(log_dir_path),
MYSQL_SYSVAR(log_file_size),
@@ -3306,6 +3344,92 @@
}
/**
+ @brief Updates group commit mode
+*/
+
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong value= (ulong)*((long *)var_ptr);
+ DBUG_ENTER("update_maria_group_commit");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu rate %lu",
+ value, (ulong)(*(long *)save),
+ maria_group_commit_interval));
+ /* old value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(FALSE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(FALSE);
+ if (maria_group_commit_interval)
+ translog_soft_sync_end();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ value= *(ulong *)var_ptr= (ulong)(*(long *)save);
+ translog_sync();
+ /* new value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(TRUE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(TRUE);
+ /* variable change made under global lock so we can just read it */
+ if (maria_group_commit_interval)
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Updates group commit interval
+*/
+
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong new_value= (ulong)*((long *)save);
+ ulong *value_ptr= (ulong*) var_ptr;
+ DBUG_ENTER("update_maria_group_commit_interval");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu group commit %lu",
+ *value_ptr, new_value, maria_group_commit));
+
+ /* variable change made under global lock so we can just read it */
+ switch (maria_group_commit) {
+ case TRANSLOG_GCOMMIT_NONE:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ if (*value_ptr)
+ translog_soft_sync_end();
+ translog_set_group_commit_interval(new_value);
+ if ((*value_ptr= new_value))
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
@brief Updates the transaction log file limit.
*/
@@ -3327,6 +3451,7 @@
{"Maria_pagecache_reads", (char*) &maria_pagecache_var.global_cache_read, SHOW_LONGLONG},
{"Maria_pagecache_write_requests", (char*) &maria_pagecache_var.global_cache_w_requests, SHOW_LONGLONG},
{"Maria_pagecache_writes", (char*) &maria_pagecache_var.global_cache_write, SHOW_LONGLONG},
+ {"Maria_transaction_log_syncs", (char*) &translog_syncs, SHOW_LONGLONG},
{NullS, NullS, SHOW_LONG}
};
=== modified file 'storage/maria/ma_init.c'
--- a/storage/maria/ma_init.c 2008-10-09 20:03:54 +0000
+++ b/storage/maria/ma_init.c 2010-02-10 20:50:26 +0000
@@ -82,6 +82,11 @@
maria_inited= maria_multi_threaded= FALSE;
ft_free_stopwords();
ma_checkpoint_end();
+ if (translog_status == TRANSLOG_OK)
+ {
+ translog_soft_sync_end();
+ translog_sync();
+ }
if ((trid= trnman_get_max_trid()) > max_trid_in_control_file)
{
/*
=== modified file 'storage/maria/ma_loghandler.c'
--- a/storage/maria/ma_loghandler.c 2010-01-06 21:27:53 +0000
+++ b/storage/maria/ma_loghandler.c 2010-02-10 20:50:26 +0000
@@ -18,6 +18,7 @@
#include "ma_blockrec.h" /* for some constants and in-write hooks */
#include "ma_key_recover.h" /* For some in-write hooks */
#include "ma_checkpoint.h"
+#include "ma_servicethread.h"
/*
On Windows, neither my_open() nor my_sync() work for directories.
@@ -47,6 +48,15 @@
#include <m_ctype.h>
#endif
+/** @brief protects checkpoint_in_progress */
+static pthread_mutex_t LOCK_soft_sync;
+/** @brief for killing the background checkpoint thread */
+static pthread_cond_t COND_soft_sync;
+/** @brief control structure for checkpoint background thread */
+static MA_SERVICE_THREAD_CONTROL soft_sync_control=
+ {THREAD_DEAD, FALSE, &LOCK_soft_sync, &COND_soft_sync};
+
+
/* transaction log file descriptor */
typedef struct st_translog_file
{
@@ -124,10 +134,24 @@
/* Previous buffer offset to detect it flush finish */
TRANSLOG_ADDRESS prev_buffer_offset;
/*
+ If the buffer was forced to close it save value of its horizon
+ otherwise LSN_IMPOSSIBLE
+ */
+ TRANSLOG_ADDRESS pre_force_close_horizon;
+ /*
How much is written (or will be written when copy_to_buffer_in_progress
become 0) to this buffer
*/
translog_size_t size;
+ /*
+ When moving from one log buffer to another, we write the last of the
+ previous buffer to file and then move to start using the new log
+ buffer. In the case of a part filed last page, this page is not moved
+ to the start of the new buffer but instead we set the 'skip_data'
+ variable to tell us how much data at the beginning of the buffer is not
+ relevant.
+ */
+ uint skipped_data;
/* File handler for this buffer */
TRANSLOG_FILE *file;
/* Threads which are waiting for buffer filling/freeing */
@@ -304,6 +328,7 @@
*/
pthread_mutex_t log_flush_lock;
pthread_cond_t log_flush_cond;
+ pthread_cond_t new_goal_cond;
/* Protects changing of headers of finished files (max_lsn) */
pthread_mutex_t file_header_lock;
@@ -344,13 +369,39 @@
ulong log_purge_type= TRANSLOG_PURGE_IMMIDIATE;
ulong log_file_size= TRANSLOG_FILE_SIZE;
+/* sync() of log files directory mode */
ulong sync_log_dir= TRANSLOG_SYNC_DIR_NEWFILE;
+ulong maria_group_commit= TRANSLOG_GCOMMIT_NONE;
+ulong maria_group_commit_interval= 0;
/* Marker for end of log */
static uchar end_of_log= 0;
#define END_OF_LOG &end_of_log
+/**
+ Switch for "soft" sync (no real sync() but periodical sync by service
+ thread)
+*/
+static volatile my_bool soft_sync= FALSE;
+/**
+ Switch for "hard" group commit mode
+*/
+static volatile my_bool hard_group_commit= FALSE;
+/**
+ File numbers interval which have to be sync()
+*/
+static uint32 soft_sync_min= 0;
+static uint32 soft_sync_max= 0;
+static uint32 soft_need_sync= 1;
+/**
+ stores interval in microseconds
+*/
+static uint32 group_commit_wait= 0;
enum enum_translog_status translog_status= TRANSLOG_UNINITED;
+ulonglong translog_syncs= 0; /* Number of sync()s */
+
+/* time of last flush */
+static ulonglong flush_start= 0;
/* chunk types */
#define TRANSLOG_CHUNK_LSN 0x00 /* 0 chunk refer as LSN (head or tail */
@@ -980,12 +1031,17 @@
static TRANSLOG_FILE *get_current_logfile()
{
TRANSLOG_FILE *file;
+ DBUG_ENTER("get_current_logfile");
rw_rdlock(&log_descriptor.open_files_lock);
+ DBUG_PRINT("info", ("max_file: %lu min_file: %lu open_files: %lu",
+ (ulong) log_descriptor.max_file,
+ (ulong) log_descriptor.min_file,
+ (ulong) log_descriptor.open_files.elements));
DBUG_ASSERT(log_descriptor.max_file - log_descriptor.min_file + 1 ==
log_descriptor.open_files.elements);
file= *dynamic_element(&log_descriptor.open_files, 0, TRANSLOG_FILE **);
rw_unlock(&log_descriptor.open_files_lock);
- return (file);
+ DBUG_RETURN(file);
}
uchar NEAR maria_trans_file_magic[]=
@@ -1069,6 +1125,7 @@
static my_bool translog_max_lsn_to_header(File file, LSN lsn)
{
uchar lsn_buff[LSN_STORE_SIZE];
+ my_bool rc;
DBUG_ENTER("translog_max_lsn_to_header");
DBUG_PRINT("enter", ("File descriptor: %ld "
"lsn: (%lu,0x%lx)",
@@ -1077,11 +1134,17 @@
lsn_store(lsn_buff, lsn);
- DBUG_RETURN(my_pwrite(file, lsn_buff,
- LSN_STORE_SIZE,
- (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
- log_write_flags) != 0 ||
- my_sync(file, MYF(MY_WME)) != 0);
+ rc= (my_pwrite(file, lsn_buff,
+ LSN_STORE_SIZE,
+ (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
+ log_write_flags) != 0 ||
+ my_sync(file, MYF(MY_WME)) != 0);
+ /*
+ We should not increase counter in case of error above, but it is so
+ unlikely that we can ignore this case
+ */
+ translog_syncs++;
+ DBUG_RETURN(rc);
}
@@ -1423,7 +1486,9 @@
static my_bool translog_buffer_init(struct st_translog_buffer *buffer, int num)
{
DBUG_ENTER("translog_buffer_init");
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn=
+ LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
@@ -1435,6 +1500,7 @@
memset(buffer->buffer, TRANSLOG_FILLER, TRANSLOG_WRITE_BUFFER);
/* Buffer size */
buffer->size= 0;
+ buffer->skipped_data= 0;
/* cond of thread which is waiting for buffer filling */
if (pthread_cond_init(&buffer->waiting_filling_buffer, 0))
DBUG_RETURN(1);
@@ -1489,7 +1555,10 @@
TODO: sync only we have changed the log
*/
if (!file->is_sync)
+ {
rc= my_sync(file->handler.file, MYF(MY_WME));
+ translog_syncs++;
+ }
rc|= my_close(file->handler.file, MYF(MY_WME));
my_free(file, MYF(0));
return test(rc);
@@ -2044,7 +2113,8 @@
(ulong) LSN_OFFSET(log_descriptor.horizon),
(ulong) LSN_OFFSET(log_descriptor.horizon)));
DBUG_ASSERT(buffer_no == buffer->buffer_no);
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
buffer->offset= log_descriptor.horizon;
@@ -2052,6 +2122,7 @@
buffer->file= get_current_logfile();
buffer->overlay= 0;
buffer->size= 0;
+ buffer->skipped_data= 0;
translog_cursor_init(cursor, buffer, buffer_no);
DBUG_PRINT("info", ("file: #%ld (%d) init cursor #%u: 0x%lx "
"chaser: %d Size: %lu (%lu)",
@@ -2523,6 +2594,7 @@
TRANSLOG_ADDRESS offset= buffer->offset;
TRANSLOG_FILE *file= buffer->file;
uint8 ver= buffer->ver;
+ uint skipped_data;
DBUG_ENTER("translog_buffer_flush");
DBUG_PRINT("enter",
("Buffer: #%u 0x%lx file: %d offset: (%lu,0x%lx) size: %lu",
@@ -2557,6 +2629,8 @@
disk
*/
file= buffer->file;
+ skipped_data= buffer->skipped_data;
+ DBUG_ASSERT(skipped_data < TRANSLOG_PAGE_SIZE);
for (i= 0, pg= LSN_OFFSET(buffer->offset) / TRANSLOG_PAGE_SIZE;
i < buffer->size;
i+= TRANSLOG_PAGE_SIZE, pg++)
@@ -2573,13 +2647,16 @@
DBUG_ASSERT(i + TRANSLOG_PAGE_SIZE <= buffer->size);
if (translog_status != TRANSLOG_OK && translog_status != TRANSLOG_SHUTDOWN)
DBUG_RETURN(1);
- if (pagecache_inject(log_descriptor.pagecache,
+ if (pagecache_write_part(log_descriptor.pagecache,
&file->handler, pg, 3,
buffer->buffer + i,
PAGECACHE_PLAIN_PAGE,
PAGECACHE_LOCK_LEFT_UNLOCKED,
- PAGECACHE_PIN_LEFT_UNPINNED, 0,
- LSN_IMPOSSIBLE))
+ PAGECACHE_PIN_LEFT_UNPINNED,
+ PAGECACHE_WRITE_DONE, 0,
+ LSN_IMPOSSIBLE,
+ skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data))
{
DBUG_PRINT("error",
("Can't write page (%lu,0x%lx) to pagecache, error: %d",
@@ -2589,10 +2666,12 @@
translog_stop_writing();
DBUG_RETURN(1);
}
+ skipped_data= 0;
}
file->is_sync= 0;
- if (my_pwrite(file->handler.file, buffer->buffer,
- buffer->size, LSN_OFFSET(buffer->offset),
+ if (my_pwrite(file->handler.file, buffer->buffer + buffer->skipped_data,
+ buffer->size - buffer->skipped_data,
+ LSN_OFFSET(buffer->offset) + buffer->skipped_data,
log_write_flags))
{
DBUG_PRINT("error", ("Can't write buffer (%lu,0x%lx) size %lu "
@@ -2985,6 +3064,7 @@
uchar *from, *table= NULL;
int is_last_unfinished_page;
uint last_protected_sector= 0;
+ uint skipped_data= curr_buffer->skipped_data;
TRANSLOG_FILE file_copy;
uint8 ver= curr_buffer->ver;
translog_wait_for_writers(curr_buffer);
@@ -2997,7 +3077,38 @@
}
DBUG_ASSERT(LSN_FILE_NO(addr) == LSN_FILE_NO(curr_buffer->offset));
from= curr_buffer->buffer + (addr - curr_buffer->offset);
- memcpy(buffer, from, TRANSLOG_PAGE_SIZE);
+ if (skipped_data && addr == curr_buffer->offset)
+ {
+ /*
+ We read page part of which is not present in buffer,
+ so we should read absent part from file (page cache actually)
+ */
+ file= get_logfile_by_number(file_no);
+ DBUG_ASSERT(file != NULL);
+ /*
+ it's ok to not lock the page because:
+ - The log handler has it's own page cache.
+ - There is only one thread that can access the log
+ cache at a time
+ */
+ if (!(buffer= pagecache_read(log_descriptor.pagecache,
+ &file->handler,
+ LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE,
+ 3, buffer,
+ PAGECACHE_PLAIN_PAGE,
+ PAGECACHE_LOCK_LEFT_UNLOCKED,
+ NULL)))
+ DBUG_RETURN(NULL);
+ }
+ else
+ skipped_data= 0; /* Read after skipped in buffer data */
+ /*
+ Now we have correct data in buffer up to 'skipped_data'. The
+ following memcpy() will move the data from the internal buffer
+ that was not yet on disk.
+ */
+ memcpy(buffer + skipped_data, from + skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data);
/*
We can use copy then in translog_page_validator() because it
do not put it permanently somewhere.
@@ -3291,6 +3402,7 @@
uint32 next_page_offset, page_rest;
uint32 i;
File fd;
+ int rc;
TRANSLOG_VALIDATOR_DATA data;
char path[FN_REFLEN];
uchar page_buff[TRANSLOG_PAGE_SIZE];
@@ -3316,14 +3428,19 @@
TRANSLOG_PAGE_SIZE);
page_rest= next_page_offset - LSN_OFFSET(addr);
memset(page_buff, TRANSLOG_FILLER, page_rest);
- if ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
- ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
- (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
- log_write_flags)) ||
- my_sync(fd, MYF(MY_WME))) |
- my_close(fd, MYF(MY_WME))) ||
- (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD))))
+ rc= ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
+ ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
+ (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
+ log_write_flags)) ||
+ my_sync(fd, MYF(MY_WME)))));
+ translog_syncs++;
+ rc|= (fd > 0 && my_close(fd, MYF(MY_WME)));
+ if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS)
+ {
+ rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ translog_syncs++;
+ }
+ if (rc)
DBUG_RETURN(1);
/* fix the horizon */
@@ -3511,6 +3628,7 @@
pthread_mutex_init(&log_descriptor.dirty_buffer_mask_lock,
MY_MUTEX_INIT_FAST) ||
pthread_cond_init(&log_descriptor.log_flush_cond, 0) ||
+ pthread_cond_init(&log_descriptor.new_goal_cond, 0) ||
my_rwlock_init(&log_descriptor.open_files_lock,
NULL) ||
my_init_dynamic_array(&log_descriptor.open_files,
@@ -3912,7 +4030,6 @@
log_descriptor.flushed= log_descriptor.horizon;
log_descriptor.in_buffers_only= log_descriptor.bc.buffer->offset;
log_descriptor.max_lsn= LSN_IMPOSSIBLE; /* set to 0 */
- log_descriptor.previous_flush_horizon= log_descriptor.horizon;
/*
Now 'flushed' is set to 'horizon' value, but 'horizon' is (potentially)
address of the next LSN and we want indicate that all LSNs that are
@@ -3995,6 +4112,10 @@
It is beginning of the log => there is no LSNs in the log =>
There is no harm in leaving it "as-is".
*/
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.
+ previous_flush_horizon)));
DBUG_RETURN(0);
}
file_no--;
@@ -4070,6 +4191,9 @@
translog_free_record_header(&rec);
}
}
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.previous_flush_horizon)));
DBUG_RETURN(0);
err:
ma_message_no_user(0, "log initialization failed");
@@ -4157,6 +4281,7 @@
pthread_mutex_destroy(&log_descriptor.log_flush_lock);
pthread_mutex_destroy(&log_descriptor.dirty_buffer_mask_lock);
pthread_cond_destroy(&log_descriptor.log_flush_cond);
+ pthread_cond_destroy(&log_descriptor.new_goal_cond);
rwlock_destroy(&log_descriptor.open_files_lock);
delete_dynamic(&log_descriptor.open_files);
delete_dynamic(&log_descriptor.unfinished_files);
@@ -6885,11 +7010,11 @@
{
translog_size_t res;
DBUG_ENTER("translog_read_record_header_from_buffer");
- DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
DBUG_PRINT("info", ("page byte: 0x%x offset: %u",
(uint) page[page_offset], (uint) page_offset));
+ DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
buff->type= (page[page_offset] & TRANSLOG_REC_TYPE);
buff->short_trid= uint2korr(page + page_offset + 1);
DBUG_PRINT("info", ("Type %u, Short TrID %u, LSN (%lu,0x%lx)",
@@ -7356,27 +7481,27 @@
"Buffer addr: (%lu,0x%lx) "
"Page addr: (%lu,0x%lx) "
"size: %lu (%lu) Pg: %u left: %u in progress %u",
- (uint) log_descriptor.bc.buffer_no,
- (ulong) log_descriptor.bc.buffer,
- LSN_IN_PARTS(log_descriptor.bc.buffer->offset),
+ (uint) old_buffer_no,
+ (ulong) old_buffer,
+ LSN_IN_PARTS(old_buffer->offset),
(ulong) LSN_FILE_NO(log_descriptor.horizon),
(ulong) (LSN_OFFSET(log_descriptor.horizon) -
log_descriptor.bc.current_page_fill),
- (ulong) log_descriptor.bc.buffer->size,
+ (ulong) old_buffer->size,
(ulong) (log_descriptor.bc.ptr -log_descriptor.bc.
buffer->buffer),
(uint) log_descriptor.bc.current_page_fill,
(uint) left,
- (uint) log_descriptor.bc.buffer->
+ (uint) old_buffer->
copy_to_buffer_in_progress));
translog_lock_assert_owner();
LINT_INIT(current_page_fill);
- new_buff_beginning= log_descriptor.bc.buffer->offset;
- new_buff_beginning+= log_descriptor.bc.buffer->size; /* increase offset */
+ new_buff_beginning= old_buffer->offset;
+ new_buff_beginning+= old_buffer->size; /* increase offset */
DBUG_ASSERT(log_descriptor.bc.ptr !=NULL);
DBUG_ASSERT(LSN_FILE_NO(log_descriptor.horizon) ==
- LSN_FILE_NO(log_descriptor.bc.buffer->offset));
+ LSN_FILE_NO(old_buffer->offset));
translog_check_cursor(&log_descriptor.bc);
DBUG_ASSERT(left < TRANSLOG_PAGE_SIZE);
if (left)
@@ -7387,18 +7512,20 @@
*/
DBUG_PRINT("info", ("left: %u", (uint) left));
+ old_buffer->pre_force_close_horizon=
+ old_buffer->offset + old_buffer->size;
/* decrease offset */
new_buff_beginning-= log_descriptor.bc.current_page_fill;
current_page_fill= log_descriptor.bc.current_page_fill;
memset(log_descriptor.bc.ptr, TRANSLOG_FILLER, left);
- log_descriptor.bc.buffer->size+= left;
+ old_buffer->size+= left;
DBUG_PRINT("info", ("Finish Page buffer #%u: 0x%lx "
"Size: %lu",
- (uint) log_descriptor.bc.buffer->buffer_no,
- (ulong) log_descriptor.bc.buffer,
- (ulong) log_descriptor.bc.buffer->size));
- DBUG_ASSERT(log_descriptor.bc.buffer->buffer_no ==
+ (uint) old_buffer->buffer_no,
+ (ulong) old_buffer,
+ (ulong) old_buffer->size));
+ DBUG_ASSERT(old_buffer->buffer_no ==
log_descriptor.bc.buffer_no);
}
else
@@ -7509,11 +7636,21 @@
if (left)
{
- /*
- TODO: do not copy beginning of the page if we have no CRC or sector
- checks on
- */
- memcpy(new_buffer->buffer, data, current_page_fill);
+ if (log_descriptor.flags &
+ (TRANSLOG_PAGE_CRC | TRANSLOG_SECTOR_PROTECTION))
+ memcpy(new_buffer->buffer, data, current_page_fill);
+ else
+ {
+ /*
+ This page header does not change if we add more data to the page so
+ we can not copy it and will not overwrite later
+ */
+ new_buffer->skipped_data= current_page_fill;
+#ifndef DBUG_OFF
+ memset(new_buffer->buffer, 0xa5, current_page_fill);
+#endif
+ DBUG_ASSERT(new_buffer->skipped_data < TRANSLOG_PAGE_SIZE);
+ }
}
old_buffer->next_buffer_offset= new_buffer->offset;
translog_buffer_lock(new_buffer);
@@ -7561,6 +7698,7 @@
{
log_descriptor.next_pass_max_lsn= lsn;
log_descriptor.max_lsn_requester= pthread_self();
+ pthread_cond_broadcast(&log_descriptor.new_goal_cond);
}
while (flush_no == log_descriptor.flush_no)
{
@@ -7572,66 +7710,78 @@
/**
- @brief Flush the log up to given LSN (included)
-
- @param lsn log record serial number up to which (inclusive)
- the log has to be flushed
-
- @return Operation status
+ @brief sync() range of files (inclusive) and directory (by request)
+
+ @param min min internal file number to flush
+ @param max max internal file number to flush
+ @param sync_dir need sync directory
+
+ return Operation status
@retval 0 OK
@retval 1 Error
-
-*/
-
-my_bool translog_flush(TRANSLOG_ADDRESS lsn)
-{
- LSN sent_to_disk= LSN_IMPOSSIBLE;
- TRANSLOG_ADDRESS flush_horizon;
- uint fn, i;
+*/
+
+static my_bool translog_sync_files(uint32 min, uint32 max,
+ my_bool sync_dir)
+{
+ uint fn;
+ my_bool rc= 0;
+ ulonglong flush_interval;
+ DBUG_ENTER("translog_sync_files");
+ DBUG_PRINT("info", ("min: %lu max: %lu sync dir: %d",
+ (ulong) min, (ulong) max, (int) sync_dir));
+ DBUG_ASSERT(min <= max);
+
+ flush_interval= group_commit_wait;
+ if (flush_interval)
+ flush_start= my_micro_time();
+ for (fn= min; fn <= max; fn++)
+ {
+ TRANSLOG_FILE *file= get_logfile_by_number(fn);
+ DBUG_ASSERT(file != NULL);
+ if (!file->is_sync)
+ {
+ if (my_sync(file->handler.file, MYF(MY_WME)))
+ {
+ rc= 1;
+ translog_stop_writing();
+ DBUG_RETURN(rc);
+ }
+ translog_syncs++;
+ file->is_sync= 1;
+ }
+ }
+
+ if (sync_dir)
+ {
+ if (!(rc= sync_dir(log_descriptor.directory_fd,
+ MYF(MY_WME | MY_IGNORE_BADFD))))
+ translog_syncs++;
+ }
+
+ DBUG_RETURN(rc);
+}
+
+
+/*
+ @brief Flushes buffers with LSNs in them less or equal address <lsn>
+
+ @param lsn address up to which all LSNs should be flushed,
+ can be reset to real last LSN address
+ @parem sent_to_disk returns 'sent to disk' position
+ @param flush_horizon returns horizon of the flush
+
+ @note About terminology see comment to translog_flush().
+*/
+
+void translog_flush_buffers(TRANSLOG_ADDRESS *lsn,
+ TRANSLOG_ADDRESS *sent_to_disk,
+ TRANSLOG_ADDRESS *flush_horizon)
+{
dirty_buffer_mask_t dirty_buffer_mask;
+ uint i;
uint8 last_buffer_no, start_buffer_no;
- my_bool rc= 0;
- DBUG_ENTER("translog_flush");
- DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
- LINT_INIT(sent_to_disk);
-
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
- DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.flushed)));
- if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
- {
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- if (log_descriptor.flush_in_progress)
- {
- translog_flush_set_new_goal_and_wait(lsn);
- if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
- {
- /* fix lsn if it was horizon */
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
- lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
- translog_flush_wait_for_end(lsn);
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
- }
- log_descriptor.flush_in_progress= 1;
- flush_horizon= log_descriptor.previous_flush_horizon;
- DBUG_PRINT("info", ("flush_in_progress is set"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
-
- translog_lock();
- if (log_descriptor.is_everything_flushed)
- {
- DBUG_PRINT("info", ("everything is flushed"));
- rc= (translog_status == TRANSLOG_READONLY);
- translog_unlock();
- goto out;
- }
+ DBUG_ENTER("translog_flush_buffers");
/*
We will recheck information when will lock buffers one by
@@ -7656,15 +7806,15 @@
/*
if LSN up to which we have to flush bigger then maximum LSN of previous
buffer and at least one LSN was saved in the current buffer (last_lsn !=
- LSN_IMPOSSIBLE) then we better finish the current buffer.
+ LSN_IMPOSSIBLE) then we have to close the current buffer.
*/
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
+ if (cmp_translog_addr(*lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
log_descriptor.bc.buffer->last_lsn != LSN_IMPOSSIBLE)
{
struct st_translog_buffer *buffer= log_descriptor.bc.buffer;
- lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
+ *lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
DBUG_PRINT("info", ("LSN to flush fixed to last lsn: (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
+ LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
last_buffer_no= log_descriptor.bc.buffer_no;
log_descriptor.is_everything_flushed= 1;
translog_force_current_buffer_to_finish();
@@ -7676,8 +7826,10 @@
TRANSLOG_BUFFERS_NO);
translog_unlock();
}
- sent_to_disk= translog_get_sent_to_disk();
- if (cmp_translog_addr(lsn, sent_to_disk) > 0)
+
+ /* flush buffers */
+ *sent_to_disk= translog_get_sent_to_disk();
+ if (cmp_translog_addr(*lsn, *sent_to_disk) > 0)
{
DBUG_PRINT("info", ("Start buffer #: %u last buffer #: %u",
@@ -7697,53 +7849,238 @@
LSN_IN_PARTS(buffer->last_lsn),
(buffer->file ?
"dirty" : "closed")));
- if (buffer->prev_last_lsn <= lsn &&
+ if (buffer->prev_last_lsn <= *lsn &&
buffer->file != NULL)
{
- DBUG_ASSERT(flush_horizon <= buffer->offset + buffer->size);
- flush_horizon= buffer->offset + buffer->size;
+ DBUG_ASSERT(*flush_horizon <= buffer->offset + buffer->size);
+ *flush_horizon= (buffer->pre_force_close_horizon != LSN_IMPOSSIBLE ?
+ buffer->pre_force_close_horizon :
+ buffer->offset + buffer->size);
+ /* pre_force_close_horizon is reset during new buffer start */
+ DBUG_PRINT("info", ("flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(*flush_horizon)));
+ DBUG_ASSERT(*flush_horizon <= log_descriptor.horizon);
+
translog_buffer_flush(buffer);
}
translog_buffer_unlock(buffer);
i= (i + 1) % TRANSLOG_BUFFERS_NO;
} while (i != last_buffer_no);
- sent_to_disk= translog_get_sent_to_disk();
- }
-
- /* sync files from previous flush till current one */
- for (fn= LSN_FILE_NO(log_descriptor.flushed); fn <= LSN_FILE_NO(lsn); fn++)
- {
- TRANSLOG_FILE *file= get_logfile_by_number(fn);
- DBUG_ASSERT(file != NULL);
- if (!file->is_sync)
- {
- if (my_sync(file->handler.file, MYF(MY_WME)))
+ *sent_to_disk= translog_get_sent_to_disk();
+ }
+
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Flush the log up to given LSN (included)
+
+ @param lsn log record serial number up to which (inclusive)
+ the log has to be flushed
+
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
+
+ @note
+
+ - Non group commit logic: Commits made in passes. Thread which started
+ flush first is performing actual flush, other threads sets new goal (LSN)
+ of the next pass (if it is maximum) and waits for the pass end or just
+ wait for the pass end.
+
+ - If hard group commit enabled and rate set to zero:
+ The first thread sends all changed buffers to disk. This is repeated
+ as long as there are new LSNs added. The process can not loop
+ forever because we have limited number of threads and they will wait
+ for the data to be synced.
+ Pseudo code:
+
+ do
+ send changed buffers to disk
+ while new_goal
+ sync
+
+ - If hard group commit switched ON and less than rate microseconds has
+ passed from last sync, then after buffers have been sent to disk
+ wait until rate microseconds has passed since last sync, do sync and return.
+ This ensures that if we call sync infrequently we don't do any waits.
+
+ - If soft group commit enabled everything works as with 'non group commit'
+ but the thread doesn't do any real sync(). If rate is not zero the
+ sync() will be performed by a service thread with the given rate
+ when needed (new LSN appears).
+
+ @note Terminology:
+ 'sent to disk' means written to disk but not sync()ed,
+ 'flushed' mean sent to disk and synced().
+*/
+
+my_bool translog_flush(TRANSLOG_ADDRESS lsn)
+{
+ struct timespec abstime;
+ ulonglong flush_interval;
+ ulonglong time_spent;
+ LSN sent_to_disk= LSN_IMPOSSIBLE;
+ TRANSLOG_ADDRESS flush_horizon;
+ my_bool rc= 0;
+ my_bool hgroup_commit_at_start;
+ DBUG_ENTER("translog_flush");
+ DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
+ LINT_INIT(sent_to_disk);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.flushed)));
+ if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
+ {
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ if (log_descriptor.flush_in_progress)
+ {
+ translog_lock();
+ /* fix lsn if it was horizon */
+ if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
+ lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
+ translog_unlock();
+ translog_flush_set_new_goal_and_wait(lsn);
+ if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
+ {
+ /*
+ translog_flush_wait_for_end() release log_flush_lock while is
+ waiting then acquire it again
+ */
+ translog_flush_wait_for_end(lsn);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ }
+ log_descriptor.flush_in_progress= 1;
+ flush_horizon= log_descriptor.previous_flush_horizon;
+ DBUG_PRINT("info", ("flush_in_progress is set, flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(flush_horizon)));
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ hgroup_commit_at_start= hard_group_commit;
+ if (hgroup_commit_at_start)
+ flush_interval= group_commit_wait;
+
+ translog_lock();
+ if (log_descriptor.is_everything_flushed)
+ {
+ DBUG_PRINT("info", ("everything is flushed"));
+ translog_unlock();
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+
+ for (;;)
+ {
+ /* Following function flushes buffers and makes translog_unlock() */
+ translog_flush_buffers(&lsn, &sent_to_disk, &flush_horizon);
+
+ if (!hgroup_commit_at_start)
+ break; /* flush pass is ended */
+
+retest:
+ /*
+ We do not check time here because pthread_mutex_lock rarely takes
+ a lot of time so we can sacrifice a bit precision to performance
+ (taking into account that my_micro_time() might be expensive call).
+ */
+ if (flush_interval == 0)
+ break; /* flush pass is ended */
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ if (log_descriptor.next_pass_max_lsn == LSN_IMPOSSIBLE)
+ {
+ if (flush_interval == 0 ||
+ (time_spent= (my_micro_time() - flush_start)) >= flush_interval)
{
- rc= 1;
- translog_stop_writing();
- sent_to_disk= LSN_IMPOSSIBLE;
- goto out;
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ break;
}
- file->is_sync= 1;
- }
- }
-
- if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- (LSN_FILE_NO(log_descriptor.previous_flush_horizon) !=
- LSN_FILE_NO(flush_horizon) ||
- ((LSN_OFFSET(log_descriptor.previous_flush_horizon) - 1) /
- TRANSLOG_PAGE_SIZE) !=
- ((LSN_OFFSET(flush_horizon) - 1) / TRANSLOG_PAGE_SIZE)))
- rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ DBUG_PRINT("info", ("flush waits: %llu interval: %llu spent: %llu",
+ flush_interval - time_spent,
+ flush_interval, time_spent));
+ /* wait time or next goal */
+ set_timespec_nsec(abstime, flush_interval - time_spent);
+ pthread_cond_timedwait(&log_descriptor.new_goal_cond,
+ &log_descriptor.log_flush_lock,
+ &abstime);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("retest conditions"));
+ goto retest;
+ }
+
+ /* take next goal */
+ lsn= log_descriptor.next_pass_max_lsn;
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ /* prevent other thread from continue */
+ log_descriptor.max_lsn_requester= pthread_self();
+ DBUG_PRINT("info", ("flush took next goal: (%lu,0x%lx)",
+ LSN_IN_PARTS(lsn)));
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ /* next flush pass */
+ DBUG_PRINT("info", ("next flush pass"));
+ translog_lock();
+ }
+
+ /*
+ sync() files from previous flush till current one
+ */
+ if (!soft_sync || hgroup_commit_at_start)
+ {
+ if ((rc=
+ translog_sync_files(LSN_FILE_NO(log_descriptor.flushed),
+ LSN_FILE_NO(lsn),
+ sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
+ (LSN_FILE_NO(log_descriptor.
+ previous_flush_horizon) !=
+ LSN_FILE_NO(flush_horizon) ||
+ (LSN_OFFSET(log_descriptor.
+ previous_flush_horizon) /
+ TRANSLOG_PAGE_SIZE) !=
+ (LSN_OFFSET(flush_horizon) /
+ TRANSLOG_PAGE_SIZE)))))
+ {
+ sent_to_disk= LSN_IMPOSSIBLE;
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+ /* keep values for soft sync() and forced sync() actual */
+ {
+ uint32 fileno= LSN_FILE_NO(lsn);
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_min, fileno);
+ my_atomic_store32(&soft_sync_max, fileno);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+ }
+ else
+ {
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_max, LSN_FILE_NO(lsn));
+ my_atomic_store32(&soft_need_sync, 1);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+
+ DBUG_ASSERT(flush_horizon <= log_descriptor.horizon);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
log_descriptor.previous_flush_horizon= flush_horizon;
out:
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
if (sent_to_disk != LSN_IMPOSSIBLE)
log_descriptor.flushed= sent_to_disk;
log_descriptor.flush_in_progress= 0;
log_descriptor.flush_no++;
DBUG_PRINT("info", ("flush_in_progress is dropped"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);\
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
pthread_cond_broadcast(&log_descriptor.log_flush_cond);
DBUG_RETURN(rc);
}
@@ -8113,6 +8450,8 @@
my_bool translog_purge(TRANSLOG_ADDRESS low)
{
uint32 last_need_file= LSN_FILE_NO(low);
+ uint32 min_unsync;
+ int soft;
TRANSLOG_ADDRESS horizon= translog_get_horizon();
int rc= 0;
DBUG_ENTER("translog_purge");
@@ -8120,12 +8459,26 @@
DBUG_ASSERT(translog_status == TRANSLOG_OK ||
translog_status == TRANSLOG_READONLY);
+ soft= soft_sync;
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ min_unsync= my_atomic_load32(&soft_sync_min);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ DBUG_PRINT("info", ("min_unsync: %lu", (ulong) min_unsync));
+ if (soft && min_unsync < last_need_file)
+ {
+ last_need_file= min_unsync;
+ DBUG_PRINT("info", ("last_need_file set to %lu", (ulong)last_need_file));
+ }
+
pthread_mutex_lock(&log_descriptor.purger_lock);
+ DBUG_PRINT("info", ("last_lsn_checked file: %lu:",
+ (ulong) log_descriptor.last_lsn_checked));
if (LSN_FILE_NO(log_descriptor.last_lsn_checked) < last_need_file)
{
uint32 i;
uint32 min_file= translog_first_file(horizon, 1);
DBUG_ASSERT(min_file != 0); /* log is already started */
+ DBUG_PRINT("info", ("min_file: %lu:",(ulong) min_file));
for(i= min_file; i < last_need_file && rc == 0; i++)
{
LSN lsn= translog_get_file_max_lsn_stored(i);
@@ -8356,6 +8709,159 @@
}
+
+/**
+ Sets soft sync mode
+
+ @param mode TRUE if we need switch soft sync on else off
+*/
+
+void translog_soft_sync(my_bool mode)
+{
+ soft_sync= mode;
+}
+
+
+/**
+ Sets hard group commit
+
+ @param mode TRUE if we need switch hard group commit on else off
+*/
+
+void translog_hard_group_commit(my_bool mode)
+{
+ hard_group_commit= mode;
+}
+
+
+/**
+ @brief forced log sync (used when we are switching modes)
+*/
+
+void translog_sync()
+{
+ uint32 max= get_current_logfile()->number;
+ uint32 min;
+ DBUG_ENTER("ma_translog_sync");
+
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+ if (!min)
+ min= max;
+
+ translog_sync_files(min, max, sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS);
+
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief set rate for group commit
+
+ @param interval interval to set.
+
+ @note We use this function with additional variable because have to
+ restart service thread with new value which we can't make inside changing
+ variable routine (update_maria_group_commit_interval)
+*/
+
+void translog_set_group_commit_interval(uint32 interval)
+{
+ DBUG_ENTER("translog_set_group_commit_interval");
+ group_commit_wait= interval;
+ DBUG_PRINT("info", ("wait: %llu",
+ (ulonglong)group_commit_wait));
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief syncing service thread
+*/
+
+static pthread_handler_t
+ma_soft_sync_background( void *arg __attribute__((unused)))
+{
+
+ my_thread_init();
+ {
+ DBUG_ENTER("ma_soft_sync_background");
+ for(;;)
+ {
+ ulonglong prev_loop= my_micro_time();
+ ulonglong time, sleep;
+ uint32 min, max, sync_request;
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ sync_request= my_atomic_load32(&soft_need_sync);
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_store32(&soft_need_sync, 0);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ sleep= group_commit_wait;
+ if (sync_request)
+ translog_sync_files(min, max, FALSE);
+ time= my_micro_time() - prev_loop;
+ if (time > sleep)
+ sleep= 0;
+ else
+ sleep-= time;
+ if (my_service_thread_sleep(&soft_sync_control, sleep))
+ break;
+ }
+ my_service_thread_signal_end(&soft_sync_control);
+ my_thread_end();
+ DBUG_RETURN(0);
+ }
+}
+
+
+/**
+ @brief Starts syncing thread
+*/
+
+int translog_soft_sync_start(void)
+{
+ pthread_t th;
+ int res= 0;
+ uint32 min, max;
+ DBUG_ENTER("translog_soft_sync_start");
+
+ /* check and init variables */
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ if (!max)
+ my_atomic_store32(&soft_sync_max, (max= get_current_logfile()->number));
+ if (!min)
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_store32(&soft_need_sync, 1);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ if (!(res= ma_service_thread_control_init(&soft_sync_control)))
+ if (!(res= pthread_create(&th, NULL, ma_soft_sync_background, NULL)))
+ soft_sync_control.status= THREAD_RUNNING;
+ DBUG_RETURN(res);
+}
+
+
+/**
+ @brief Stops syncing thread
+*/
+
+void translog_soft_sync_end(void)
+{
+ DBUG_ENTER("translog_soft_sync_end");
+ if (soft_sync_control.inited)
+ {
+ ma_service_thread_control_end(&soft_sync_control);
+ }
+ DBUG_VOID_RETURN;
+}
+
+
#ifdef MARIA_DUMP_LOG
#include <my_getopt.h>
extern void translog_example_table_init();
=== modified file 'storage/maria/ma_loghandler.h'
--- a/storage/maria/ma_loghandler.h 2009-01-15 22:25:53 +0000
+++ b/storage/maria/ma_loghandler.h 2010-02-10 20:50:26 +0000
@@ -342,6 +342,14 @@
TRANSLOG_SHUTDOWN /* going to shutdown the loghandler */
};
extern enum enum_translog_status translog_status;
+extern ulonglong translog_syncs; /* Number of sync()s */
+
+void translog_soft_sync(my_bool mode);
+void translog_hard_group_commit(my_bool mode);
+int translog_soft_sync_start(void);
+void translog_soft_sync_end(void);
+void translog_sync();
+void translog_set_group_commit_interval(uint32 interval);
/*
all the rest added because of recovery; should we make
@@ -441,6 +449,14 @@
typedef enum
{
+ TRANSLOG_GCOMMIT_NONE,
+ TRANSLOG_GCOMMIT_HARD,
+ TRANSLOG_GCOMMIT_SOFT
+} enum_maria_group_commit;
+extern ulong maria_group_commit;
+extern ulong maria_group_commit_interval;
+typedef enum
+{
TRANSLOG_PURGE_IMMIDIATE,
TRANSLOG_PURGE_EXTERNAL,
TRANSLOG_PURGE_ONDEMAND
1
0

[Maria-developers] Rev 2740: Group commit for maria storage engine. in file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
by sanja@askmonty.org 10 Feb '10
by sanja@askmonty.org 10 Feb '10
10 Feb '10
At file:///Users/bell/maria/bzr/work-maria-5.2-groupcommit/
------------------------------------------------------------
revno: 2740
revision-id: sanja(a)askmonty.org-20100209083259-ekki5zw4hbaeqpwh
parent: knielsen(a)knielsen-hq.org-20100201190519-b9uktnn90rwwiile
committer: sanja(a)askmonty.org
branch nick: work-maria-5.2-groupcommit
timestamp: Tue 2010-02-09 10:32:59 +0200
message:
Group commit for maria storage engine.
=== added file 'mysql-test/suite/maria/r/group_commit.result'
--- a/mysql-test/suite/maria/r/group_commit.result 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/r/group_commit.result 2010-02-09 08:32:59 +0000
@@ -0,0 +1,17 @@
+drop table if exists t1;
+create table t1 (a int);
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'mysql-test/suite/maria/r/maria3.result'
--- a/mysql-test/suite/maria/r/maria3.result 2009-09-18 01:04:43 +0000
+++ b/mysql-test/suite/maria/r/maria3.result 2010-02-09 08:32:59 +0000
@@ -306,6 +306,8 @@
maria_block_size 8192
maria_checkpoint_interval 30
maria_force_start_after_recovery_failures 0
+maria_group_commit none
+maria_group_commit_interval 0
maria_log_file_size 4294959104
maria_log_purge_type immediate
maria_max_sort_file_size 9223372036853727232
@@ -328,6 +330,7 @@
Maria_pagecache_reads #
Maria_pagecache_write_requests #
Maria_pagecache_writes #
+Maria_transaction_log_syncs #
create table t1 (b char(0));
insert into t1 values(NULL),("");
select length(b) from t1;
=== added file 'mysql-test/suite/maria/t/group_commit.test'
--- a/mysql-test/suite/maria/t/group_commit.test 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/maria/t/group_commit.test 2010-02-09 08:32:59 +0000
@@ -0,0 +1,71 @@
+# Test different ways of syncing (mostly syntax)
+
+--disable_warnings
+drop table if exists t1;
+--enable_warnings
+
+create table t1 (a int);
+
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="HARD";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 0;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="SOFT";
+SET GLOBAL maria_group_commit_interval= 100;
+--disable_query_log
+let $num = 5000;
+while ($num)
+{
+ insert into t1 values (1);
+ dec $num;
+}
+--enable_query_log
+SET GLOBAL maria_group_commit="NONE";
+SET GLOBAL maria_group_commit_interval= 0;
+drop table t1;
=== modified file 'storage/maria/ha_maria.cc'
--- a/storage/maria/ha_maria.cc 2009-12-03 11:34:11 +0000
+++ b/storage/maria/ha_maria.cc 2010-02-09 08:32:59 +0000
@@ -102,22 +102,40 @@
array_elements(maria_translog_purge_type_names) - 1, "",
maria_translog_purge_type_names, NULL
};
+
+/* transactional log directory sync */
const char *maria_sync_log_dir_names[]=
{
"NEVER", "NEWFILE", "ALWAYS", NullS
};
-
TYPELIB maria_sync_log_dir_typelib=
{
array_elements(maria_sync_log_dir_names) - 1, "",
maria_sync_log_dir_names, NULL
};
+/* transactional log group commit */
+const char *maria_group_commit_names[]=
+{
+ "none", "hard", "soft", NullS
+};
+TYPELIB maria_group_commit_typelib=
+{
+ array_elements(maria_group_commit_names) - 1, "",
+ maria_group_commit_names, NULL
+};
+
/** Interval between background checkpoints in seconds */
static ulong checkpoint_interval;
static void update_checkpoint_interval(MYSQL_THD thd,
struct st_mysql_sys_var *var,
void *var_ptr, const void *save);
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save);
/** After that many consecutive recovery failures, remove logs */
static ulong force_start_after_recovery_failures;
static void update_log_file_size(MYSQL_THD thd,
@@ -164,6 +182,24 @@
NULL, update_log_file_size, TRANSLOG_FILE_SIZE,
TRANSLOG_MIN_FILE_SIZE, 0xffffffffL, TRANSLOG_PAGE_SIZE);
+static MYSQL_SYSVAR_ENUM(group_commit, maria_group_commit,
+ PLUGIN_VAR_RQCMDARG,
+ "Specifies maria group commit mode. "
+ "Possible values are \"none\" (no group commit), "
+ "\"hard\" (with waiting to actual commit), "
+ "\"soft\" (no wait for commit (DANGEROUS!!!))",
+ NULL, update_maria_group_commit,
+ TRANSLOG_GCOMMIT_NONE, &maria_group_commit_typelib);
+
+static MYSQL_SYSVAR_ULONG(group_commit_interval, maria_group_commit_interval,
+ PLUGIN_VAR_RQCMDARG,
+ "Interval between commite in microseconds (1/1000000c)."
+ " 0 stands for no waiting"
+ "for other threads to come and do a commit in \"hard\" mode and no"
+ " sync()/commit at all in \"soft\" mode. Option has only an effect"
+ "if maria_group_commit is used",
+ NULL, update_maria_group_commit_interval, 0, 0, UINT_MAX, 1);
+
static MYSQL_SYSVAR_ENUM(log_purge_type, log_purge_type,
PLUGIN_VAR_RQCMDARG,
"Specifies how maria transactional log will be purged. "
@@ -3275,6 +3311,8 @@
MYSQL_SYSVAR(block_size),
MYSQL_SYSVAR(checkpoint_interval),
MYSQL_SYSVAR(force_start_after_recovery_failures),
+ MYSQL_SYSVAR(group_commit),
+ MYSQL_SYSVAR(group_commit_interval),
MYSQL_SYSVAR(page_checksum),
MYSQL_SYSVAR(log_dir_path),
MYSQL_SYSVAR(log_file_size),
@@ -3306,6 +3344,92 @@
}
/**
+ @brief Updates group commit mode
+*/
+
+static void update_maria_group_commit(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong value= (ulong)*((long *)var_ptr);
+ DBUG_ENTER("update_maria_group_commit");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu rate %lu",
+ value, (ulong)(*(long *)save),
+ maria_group_commit_interval));
+ /* old value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(FALSE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(FALSE);
+ if (maria_group_commit_interval)
+ translog_soft_sync_end();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ value= *(ulong *)var_ptr= (ulong)(*(long *)save);
+ translog_sync();
+ /* new value */
+ switch (value) {
+ case TRANSLOG_GCOMMIT_NONE:
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ translog_hard_group_commit(TRUE);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ translog_soft_sync(TRUE);
+ /* variable change made under global lock so we can just read it */
+ if (maria_group_commit_interval)
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Updates group commit interval
+*/
+
+static void update_maria_group_commit_interval(MYSQL_THD thd,
+ struct st_mysql_sys_var *var,
+ void *var_ptr, const void *save)
+{
+ ulong new_value= (ulong)*((long *)save);
+ ulong *value_ptr= (ulong*) var_ptr;
+ DBUG_ENTER("update_maria_group_commit_interval");
+ DBUG_PRINT("enter", ("old value: %lu new value %lu group commit %lu",
+ *value_ptr, new_value, maria_group_commit));
+
+ /* variable change made under global lock so we can just read it */
+ switch (maria_group_commit) {
+ case TRANSLOG_GCOMMIT_NONE:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_HARD:
+ *value_ptr= new_value;
+ translog_set_group_commit_interval(new_value);
+ break;
+ case TRANSLOG_GCOMMIT_SOFT:
+ if (*value_ptr)
+ translog_soft_sync_end();
+ translog_set_group_commit_interval(new_value);
+ if ((*value_ptr= new_value))
+ translog_soft_sync_start();
+ break;
+ default:
+ DBUG_ASSERT(0); /* impossible */
+ }
+ DBUG_VOID_RETURN;
+}
+
+/**
@brief Updates the transaction log file limit.
*/
@@ -3327,6 +3451,7 @@
{"Maria_pagecache_reads", (char*) &maria_pagecache_var.global_cache_read, SHOW_LONGLONG},
{"Maria_pagecache_write_requests", (char*) &maria_pagecache_var.global_cache_w_requests, SHOW_LONGLONG},
{"Maria_pagecache_writes", (char*) &maria_pagecache_var.global_cache_write, SHOW_LONGLONG},
+ {"Maria_transaction_log_syncs", (char*) &translog_syncs, SHOW_LONGLONG},
{NullS, NullS, SHOW_LONG}
};
=== modified file 'storage/maria/ma_init.c'
--- a/storage/maria/ma_init.c 2008-10-09 20:03:54 +0000
+++ b/storage/maria/ma_init.c 2010-02-09 08:32:59 +0000
@@ -82,6 +82,11 @@
maria_inited= maria_multi_threaded= FALSE;
ft_free_stopwords();
ma_checkpoint_end();
+ if (translog_status == TRANSLOG_OK)
+ {
+ translog_soft_sync_end();
+ translog_sync();
+ }
if ((trid= trnman_get_max_trid()) > max_trid_in_control_file)
{
/*
=== modified file 'storage/maria/ma_loghandler.c'
--- a/storage/maria/ma_loghandler.c 2010-01-06 21:27:53 +0000
+++ b/storage/maria/ma_loghandler.c 2010-02-09 08:32:59 +0000
@@ -18,6 +18,7 @@
#include "ma_blockrec.h" /* for some constants and in-write hooks */
#include "ma_key_recover.h" /* For some in-write hooks */
#include "ma_checkpoint.h"
+#include "ma_servicethread.h"
/*
On Windows, neither my_open() nor my_sync() work for directories.
@@ -47,6 +48,15 @@
#include <m_ctype.h>
#endif
+/** @brief protects checkpoint_in_progress */
+static pthread_mutex_t LOCK_soft_sync;
+/** @brief for killing the background checkpoint thread */
+static pthread_cond_t COND_soft_sync;
+/** @brief control structure for checkpoint background thread */
+static MA_SERVICE_THREAD_CONTROL soft_sync_control=
+ {THREAD_DEAD, FALSE, &LOCK_soft_sync, &COND_soft_sync};
+
+
/* transaction log file descriptor */
typedef struct st_translog_file
{
@@ -124,10 +134,20 @@
/* Previous buffer offset to detect it flush finish */
TRANSLOG_ADDRESS prev_buffer_offset;
/*
+ If the buffer was forced to close it save value of its horizon
+ otherwise LSN_IMPOSSIBLE
+ */
+ TRANSLOG_ADDRESS pre_force_close_horizon;
+ /*
How much is written (or will be written when copy_to_buffer_in_progress
become 0) to this buffer
*/
translog_size_t size;
+ /*
+ How much data was skipped during moving page from previous buffer
+ to this one (it is optimisation of forcing buffer to finish
+ */
+ uint skipped_data;
/* File handler for this buffer */
TRANSLOG_FILE *file;
/* Threads which are waiting for buffer filling/freeing */
@@ -304,6 +324,7 @@
*/
pthread_mutex_t log_flush_lock;
pthread_cond_t log_flush_cond;
+ pthread_cond_t new_goal_cond;
/* Protects changing of headers of finished files (max_lsn) */
pthread_mutex_t file_header_lock;
@@ -344,13 +365,38 @@
ulong log_purge_type= TRANSLOG_PURGE_IMMIDIATE;
ulong log_file_size= TRANSLOG_FILE_SIZE;
+/* sync() of log files directory mode */
ulong sync_log_dir= TRANSLOG_SYNC_DIR_NEWFILE;
+ulong maria_group_commit= TRANSLOG_GCOMMIT_NONE;
+ulong maria_group_commit_interval= 0;
/* Marker for end of log */
static uchar end_of_log= 0;
#define END_OF_LOG &end_of_log
+/**
+ Switch for "soft" sync (no real sync() but periodical sync by service
+ thread)
+*/
+static volatile my_bool soft_sync= FALSE;
+/**
+ Switch for "hard" group commit mode
+*/
+static volatile my_bool hard_group_commit= FALSE;
+/**
+ File numbers interval which have to be sync()
+*/
+static uint32 soft_sync_min= 0;
+static uint32 soft_sync_max= 0;
+/**
+ stores interval in microseconds
+*/
+static uint32 group_commit_wait= 0;
enum enum_translog_status translog_status= TRANSLOG_UNINITED;
+ulonglong translog_syncs= 0; /* Number of sync()s */
+
+/* time of last flush */
+static ulonglong flush_start= 0;
/* chunk types */
#define TRANSLOG_CHUNK_LSN 0x00 /* 0 chunk refer as LSN (head or tail */
@@ -980,12 +1026,17 @@
static TRANSLOG_FILE *get_current_logfile()
{
TRANSLOG_FILE *file;
+ DBUG_ENTER("get_current_logfile");
rw_rdlock(&log_descriptor.open_files_lock);
+ DBUG_PRINT("info", ("max_file: %lu min_file: %lu open_files: %lu",
+ (ulong) log_descriptor.max_file,
+ (ulong) log_descriptor.min_file,
+ (ulong) log_descriptor.open_files.elements));
DBUG_ASSERT(log_descriptor.max_file - log_descriptor.min_file + 1 ==
log_descriptor.open_files.elements);
file= *dynamic_element(&log_descriptor.open_files, 0, TRANSLOG_FILE **);
rw_unlock(&log_descriptor.open_files_lock);
- return (file);
+ DBUG_RETURN(file);
}
uchar NEAR maria_trans_file_magic[]=
@@ -1069,6 +1120,7 @@
static my_bool translog_max_lsn_to_header(File file, LSN lsn)
{
uchar lsn_buff[LSN_STORE_SIZE];
+ my_bool rc;
DBUG_ENTER("translog_max_lsn_to_header");
DBUG_PRINT("enter", ("File descriptor: %ld "
"lsn: (%lu,0x%lx)",
@@ -1077,11 +1129,13 @@
lsn_store(lsn_buff, lsn);
- DBUG_RETURN(my_pwrite(file, lsn_buff,
- LSN_STORE_SIZE,
- (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
- log_write_flags) != 0 ||
- my_sync(file, MYF(MY_WME)) != 0);
+ if (!(rc= (my_pwrite(file, lsn_buff,
+ LSN_STORE_SIZE,
+ (LOG_HEADER_DATA_SIZE - LSN_STORE_SIZE),
+ log_write_flags) != 0 ||
+ my_sync(file, MYF(MY_WME)) != 0)))
+ translog_syncs++;
+ DBUG_RETURN(rc);
}
@@ -1423,7 +1477,9 @@
static my_bool translog_buffer_init(struct st_translog_buffer *buffer, int num)
{
DBUG_ENTER("translog_buffer_init");
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn=
+ LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
@@ -1435,6 +1491,7 @@
memset(buffer->buffer, TRANSLOG_FILLER, TRANSLOG_WRITE_BUFFER);
/* Buffer size */
buffer->size= 0;
+ buffer->skipped_data= 0;
/* cond of thread which is waiting for buffer filling */
if (pthread_cond_init(&buffer->waiting_filling_buffer, 0))
DBUG_RETURN(1);
@@ -1489,7 +1546,10 @@
TODO: sync only we have changed the log
*/
if (!file->is_sync)
+ {
rc= my_sync(file->handler.file, MYF(MY_WME));
+ translog_syncs++;
+ }
rc|= my_close(file->handler.file, MYF(MY_WME));
my_free(file, MYF(0));
return test(rc);
@@ -2044,7 +2104,8 @@
(ulong) LSN_OFFSET(log_descriptor.horizon),
(ulong) LSN_OFFSET(log_descriptor.horizon)));
DBUG_ASSERT(buffer_no == buffer->buffer_no);
- buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
+ buffer->pre_force_close_horizon=
+ buffer->prev_last_lsn= buffer->last_lsn= LSN_IMPOSSIBLE;
DBUG_PRINT("info", ("last_lsn and prev_last_lsn set to 0 buffer: 0x%lx",
(ulong) buffer));
buffer->offset= log_descriptor.horizon;
@@ -2052,6 +2113,7 @@
buffer->file= get_current_logfile();
buffer->overlay= 0;
buffer->size= 0;
+ buffer->skipped_data= 0;
translog_cursor_init(cursor, buffer, buffer_no);
DBUG_PRINT("info", ("file: #%ld (%d) init cursor #%u: 0x%lx "
"chaser: %d Size: %lu (%lu)",
@@ -2523,6 +2585,7 @@
TRANSLOG_ADDRESS offset= buffer->offset;
TRANSLOG_FILE *file= buffer->file;
uint8 ver= buffer->ver;
+ uint skipped_data;
DBUG_ENTER("translog_buffer_flush");
DBUG_PRINT("enter",
("Buffer: #%u 0x%lx file: %d offset: (%lu,0x%lx) size: %lu",
@@ -2557,6 +2620,8 @@
disk
*/
file= buffer->file;
+ skipped_data= buffer->skipped_data;
+ DBUG_ASSERT(skipped_data < TRANSLOG_PAGE_SIZE);
for (i= 0, pg= LSN_OFFSET(buffer->offset) / TRANSLOG_PAGE_SIZE;
i < buffer->size;
i+= TRANSLOG_PAGE_SIZE, pg++)
@@ -2573,13 +2638,16 @@
DBUG_ASSERT(i + TRANSLOG_PAGE_SIZE <= buffer->size);
if (translog_status != TRANSLOG_OK && translog_status != TRANSLOG_SHUTDOWN)
DBUG_RETURN(1);
- if (pagecache_inject(log_descriptor.pagecache,
+ if (pagecache_write_part(log_descriptor.pagecache,
&file->handler, pg, 3,
buffer->buffer + i,
PAGECACHE_PLAIN_PAGE,
PAGECACHE_LOCK_LEFT_UNLOCKED,
- PAGECACHE_PIN_LEFT_UNPINNED, 0,
- LSN_IMPOSSIBLE))
+ PAGECACHE_PIN_LEFT_UNPINNED,
+ PAGECACHE_WRITE_DONE, 0,
+ LSN_IMPOSSIBLE,
+ skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data))
{
DBUG_PRINT("error",
("Can't write page (%lu,0x%lx) to pagecache, error: %d",
@@ -2589,10 +2657,12 @@
translog_stop_writing();
DBUG_RETURN(1);
}
+ skipped_data= 0;
}
file->is_sync= 0;
- if (my_pwrite(file->handler.file, buffer->buffer,
- buffer->size, LSN_OFFSET(buffer->offset),
+ if (my_pwrite(file->handler.file, buffer->buffer + buffer->skipped_data,
+ buffer->size - buffer->skipped_data,
+ LSN_OFFSET(buffer->offset) + buffer->skipped_data,
log_write_flags))
{
DBUG_PRINT("error", ("Can't write buffer (%lu,0x%lx) size %lu "
@@ -2985,6 +3055,7 @@
uchar *from, *table= NULL;
int is_last_unfinished_page;
uint last_protected_sector= 0;
+ uint skipped_data= curr_buffer->skipped_data;
TRANSLOG_FILE file_copy;
uint8 ver= curr_buffer->ver;
translog_wait_for_writers(curr_buffer);
@@ -2997,7 +3068,25 @@
}
DBUG_ASSERT(LSN_FILE_NO(addr) == LSN_FILE_NO(curr_buffer->offset));
from= curr_buffer->buffer + (addr - curr_buffer->offset);
- memcpy(buffer, from, TRANSLOG_PAGE_SIZE);
+ if (skipped_data > (addr - curr_buffer->offset))
+ {
+ /*
+ We read page part of which is not present in buffer,
+ so we should read absent part from file (page cache actually)
+ */
+ file= get_logfile_by_number(file_no);
+ DBUG_ASSERT(file != NULL);
+ buffer= pagecache_read(log_descriptor.pagecache, &file->handler,
+ LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE,
+ 3, buffer,
+ PAGECACHE_PLAIN_PAGE,
+ PAGECACHE_LOCK_LEFT_UNLOCKED,
+ NULL);
+ }
+ else
+ skipped_data= 0; /* Read after skipped in buffer data */
+ memcpy(buffer + skipped_data, from + skipped_data,
+ TRANSLOG_PAGE_SIZE - skipped_data);
/*
We can use copy then in translog_page_validator() because it
do not put it permanently somewhere.
@@ -3291,6 +3380,7 @@
uint32 next_page_offset, page_rest;
uint32 i;
File fd;
+ int rc;
TRANSLOG_VALIDATOR_DATA data;
char path[FN_REFLEN];
uchar page_buff[TRANSLOG_PAGE_SIZE];
@@ -3316,14 +3406,19 @@
TRANSLOG_PAGE_SIZE);
page_rest= next_page_offset - LSN_OFFSET(addr);
memset(page_buff, TRANSLOG_FILLER, page_rest);
- if ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
- ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
- (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
- log_write_flags)) ||
- my_sync(fd, MYF(MY_WME))) |
- my_close(fd, MYF(MY_WME))) ||
- (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD))))
+ rc= ((fd= open_logfile_by_number_no_cache(LSN_FILE_NO(addr))) < 0 ||
+ ((my_chsize(fd, next_page_offset, TRANSLOG_FILLER, MYF(MY_WME)) ||
+ (page_rest && my_pwrite(fd, page_buff, page_rest, LSN_OFFSET(addr),
+ log_write_flags)) ||
+ my_sync(fd, MYF(MY_WME)))));
+ translog_syncs++;
+ rc|= (fd > 0 && my_close(fd, MYF(MY_WME)));
+ if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS)
+ {
+ rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ translog_syncs++;
+ }
+ if (rc)
DBUG_RETURN(1);
/* fix the horizon */
@@ -3511,6 +3606,7 @@
pthread_mutex_init(&log_descriptor.dirty_buffer_mask_lock,
MY_MUTEX_INIT_FAST) ||
pthread_cond_init(&log_descriptor.log_flush_cond, 0) ||
+ pthread_cond_init(&log_descriptor.new_goal_cond, 0) ||
my_rwlock_init(&log_descriptor.open_files_lock,
NULL) ||
my_init_dynamic_array(&log_descriptor.open_files,
@@ -3912,7 +4008,6 @@
log_descriptor.flushed= log_descriptor.horizon;
log_descriptor.in_buffers_only= log_descriptor.bc.buffer->offset;
log_descriptor.max_lsn= LSN_IMPOSSIBLE; /* set to 0 */
- log_descriptor.previous_flush_horizon= log_descriptor.horizon;
/*
Now 'flushed' is set to 'horizon' value, but 'horizon' is (potentially)
address of the next LSN and we want indicate that all LSNs that are
@@ -3995,6 +4090,10 @@
It is beginning of the log => there is no LSNs in the log =>
There is no harm in leaving it "as-is".
*/
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.
+ previous_flush_horizon)));
DBUG_RETURN(0);
}
file_no--;
@@ -4070,6 +4169,9 @@
translog_free_record_header(&rec);
}
}
+ log_descriptor.previous_flush_horizon= log_descriptor.horizon;
+ DBUG_PRINT("info", ("previous_flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.previous_flush_horizon)));
DBUG_RETURN(0);
err:
ma_message_no_user(0, "log initialization failed");
@@ -4157,6 +4259,7 @@
pthread_mutex_destroy(&log_descriptor.log_flush_lock);
pthread_mutex_destroy(&log_descriptor.dirty_buffer_mask_lock);
pthread_cond_destroy(&log_descriptor.log_flush_cond);
+ pthread_cond_destroy(&log_descriptor.new_goal_cond);
rwlock_destroy(&log_descriptor.open_files_lock);
delete_dynamic(&log_descriptor.open_files);
delete_dynamic(&log_descriptor.unfinished_files);
@@ -6885,11 +6988,11 @@
{
translog_size_t res;
DBUG_ENTER("translog_read_record_header_from_buffer");
- DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
DBUG_PRINT("info", ("page byte: 0x%x offset: %u",
(uint) page[page_offset], (uint) page_offset));
+ DBUG_ASSERT(translog_is_LSN_chunk(page[page_offset]));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
buff->type= (page[page_offset] & TRANSLOG_REC_TYPE);
buff->short_trid= uint2korr(page + page_offset + 1);
DBUG_PRINT("info", ("Type %u, Short TrID %u, LSN (%lu,0x%lx)",
@@ -7356,27 +7459,27 @@
"Buffer addr: (%lu,0x%lx) "
"Page addr: (%lu,0x%lx) "
"size: %lu (%lu) Pg: %u left: %u in progress %u",
- (uint) log_descriptor.bc.buffer_no,
- (ulong) log_descriptor.bc.buffer,
- LSN_IN_PARTS(log_descriptor.bc.buffer->offset),
+ (uint) old_buffer_no,
+ (ulong) old_buffer,
+ LSN_IN_PARTS(old_buffer->offset),
(ulong) LSN_FILE_NO(log_descriptor.horizon),
(ulong) (LSN_OFFSET(log_descriptor.horizon) -
log_descriptor.bc.current_page_fill),
- (ulong) log_descriptor.bc.buffer->size,
+ (ulong) old_buffer->size,
(ulong) (log_descriptor.bc.ptr -log_descriptor.bc.
buffer->buffer),
(uint) log_descriptor.bc.current_page_fill,
(uint) left,
- (uint) log_descriptor.bc.buffer->
+ (uint) old_buffer->
copy_to_buffer_in_progress));
translog_lock_assert_owner();
LINT_INIT(current_page_fill);
- new_buff_beginning= log_descriptor.bc.buffer->offset;
- new_buff_beginning+= log_descriptor.bc.buffer->size; /* increase offset */
+ new_buff_beginning= old_buffer->offset;
+ new_buff_beginning+= old_buffer->size; /* increase offset */
DBUG_ASSERT(log_descriptor.bc.ptr !=NULL);
DBUG_ASSERT(LSN_FILE_NO(log_descriptor.horizon) ==
- LSN_FILE_NO(log_descriptor.bc.buffer->offset));
+ LSN_FILE_NO(old_buffer->offset));
translog_check_cursor(&log_descriptor.bc);
DBUG_ASSERT(left < TRANSLOG_PAGE_SIZE);
if (left)
@@ -7387,18 +7490,20 @@
*/
DBUG_PRINT("info", ("left: %u", (uint) left));
+ old_buffer->pre_force_close_horizon=
+ old_buffer->offset + old_buffer->size;
/* decrease offset */
new_buff_beginning-= log_descriptor.bc.current_page_fill;
current_page_fill= log_descriptor.bc.current_page_fill;
memset(log_descriptor.bc.ptr, TRANSLOG_FILLER, left);
- log_descriptor.bc.buffer->size+= left;
+ old_buffer->size+= left;
DBUG_PRINT("info", ("Finish Page buffer #%u: 0x%lx "
"Size: %lu",
- (uint) log_descriptor.bc.buffer->buffer_no,
- (ulong) log_descriptor.bc.buffer,
- (ulong) log_descriptor.bc.buffer->size));
- DBUG_ASSERT(log_descriptor.bc.buffer->buffer_no ==
+ (uint) old_buffer->buffer_no,
+ (ulong) old_buffer,
+ (ulong) old_buffer->size));
+ DBUG_ASSERT(old_buffer->buffer_no ==
log_descriptor.bc.buffer_no);
}
else
@@ -7509,11 +7614,21 @@
if (left)
{
- /*
- TODO: do not copy beginning of the page if we have no CRC or sector
- checks on
- */
- memcpy(new_buffer->buffer, data, current_page_fill);
+ if (log_descriptor.flags &
+ (TRANSLOG_PAGE_CRC | TRANSLOG_SECTOR_PROTECTION))
+ memcpy(new_buffer->buffer, data, current_page_fill);
+ else
+ {
+ /*
+ This page header does not change if we add more data to the page so
+ we can not copy it and will not overwrite later
+ */
+ new_buffer->skipped_data= current_page_fill;
+#ifndef DBUG_OFF
+ memset(new_buffer->buffer, 0xa5, current_page_fill);
+#endif
+ DBUG_ASSERT(new_buffer->skipped_data < TRANSLOG_PAGE_SIZE);
+ }
}
old_buffer->next_buffer_offset= new_buffer->offset;
translog_buffer_lock(new_buffer);
@@ -7561,6 +7676,7 @@
{
log_descriptor.next_pass_max_lsn= lsn;
log_descriptor.max_lsn_requester= pthread_self();
+ pthread_cond_broadcast(&log_descriptor.new_goal_cond);
}
while (flush_no == log_descriptor.flush_no)
{
@@ -7572,66 +7688,78 @@
/**
- @brief Flush the log up to given LSN (included)
-
- @param lsn log record serial number up to which (inclusive)
- the log has to be flushed
-
- @return Operation status
+ @brief sync() range of files (inclusive) and directory (by request)
+
+ @param min min internal file number to flush
+ @param max max internal file number to flush
+ @param sync_dir need sync directory
+
+ return Operation status
@retval 0 OK
@retval 1 Error
-
-*/
-
-my_bool translog_flush(TRANSLOG_ADDRESS lsn)
-{
- LSN sent_to_disk= LSN_IMPOSSIBLE;
- TRANSLOG_ADDRESS flush_horizon;
- uint fn, i;
+*/
+
+static my_bool translog_sync_files(uint32 min, uint32 max,
+ my_bool sync_dir)
+{
+ uint fn;
+ my_bool rc= 0;
+ ulonglong flush_interval;
+ DBUG_ENTER("translog_sync_files");
+ DBUG_PRINT("info", ("min: %lu max: %lu sync dir: %d",
+ (ulong) min, (ulong) max, (int) sync_dir));
+ DBUG_ASSERT(min <= max);
+
+ flush_interval= group_commit_wait;
+ if (flush_interval)
+ flush_start= my_micro_time();
+ for (fn= min; fn <= max; fn++)
+ {
+ TRANSLOG_FILE *file= get_logfile_by_number(fn);
+ DBUG_ASSERT(file != NULL);
+ if (!file->is_sync)
+ {
+ if (my_sync(file->handler.file, MYF(MY_WME)))
+ {
+ rc= 1;
+ translog_stop_writing();
+ DBUG_RETURN(rc);
+ }
+ translog_syncs++;
+ file->is_sync= 1;
+ }
+ }
+
+ if (sync_dir)
+ {
+ if (!(rc= sync_dir(log_descriptor.directory_fd,
+ MYF(MY_WME | MY_IGNORE_BADFD))))
+ translog_syncs++;
+ }
+
+ DBUG_RETURN(rc);
+}
+
+
+/*
+ @brief Flushes buffers with LSNs in them less or equal address <lsn>
+
+ @param lsn address up to which all LSNs should be flushed,
+ can be reset to real last LSN address
+ @parem sent_to_disk returns 'sent to disk' position
+ @param flush_horizon returns horizon of the flush
+
+ @note About terminology see comment to translog_flush().
+*/
+
+void translog_flush_buffers(TRANSLOG_ADDRESS *lsn,
+ TRANSLOG_ADDRESS *sent_to_disk,
+ TRANSLOG_ADDRESS *flush_horizon)
+{
dirty_buffer_mask_t dirty_buffer_mask;
+ uint i;
uint8 last_buffer_no, start_buffer_no;
- my_bool rc= 0;
- DBUG_ENTER("translog_flush");
- DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
- DBUG_ASSERT(translog_status == TRANSLOG_OK ||
- translog_status == TRANSLOG_READONLY);
- LINT_INIT(sent_to_disk);
-
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
- DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.flushed)));
- if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
- {
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- if (log_descriptor.flush_in_progress)
- {
- translog_flush_set_new_goal_and_wait(lsn);
- if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
- {
- /* fix lsn if it was horizon */
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
- lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
- translog_flush_wait_for_end(lsn);
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
- DBUG_RETURN(0);
- }
- log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
- }
- log_descriptor.flush_in_progress= 1;
- flush_horizon= log_descriptor.previous_flush_horizon;
- DBUG_PRINT("info", ("flush_in_progress is set"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);
-
- translog_lock();
- if (log_descriptor.is_everything_flushed)
- {
- DBUG_PRINT("info", ("everything is flushed"));
- rc= (translog_status == TRANSLOG_READONLY);
- translog_unlock();
- goto out;
- }
+ DBUG_ENTER("translog_flush_buffers");
/*
We will recheck information when will lock buffers one by
@@ -7656,15 +7784,15 @@
/*
if LSN up to which we have to flush bigger then maximum LSN of previous
buffer and at least one LSN was saved in the current buffer (last_lsn !=
- LSN_IMPOSSIBLE) then we better finish the current buffer.
+ LSN_IMPOSSIBLE) then we have to close the current buffer.
*/
- if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
+ if (cmp_translog_addr(*lsn, log_descriptor.bc.buffer->prev_last_lsn) > 0 &&
log_descriptor.bc.buffer->last_lsn != LSN_IMPOSSIBLE)
{
struct st_translog_buffer *buffer= log_descriptor.bc.buffer;
- lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
+ *lsn= log_descriptor.bc.buffer->last_lsn; /* fix lsn if it was horizon */
DBUG_PRINT("info", ("LSN to flush fixed to last lsn: (%lu,0x%lx)",
- LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
+ LSN_IN_PARTS(log_descriptor.bc.buffer->last_lsn)));
last_buffer_no= log_descriptor.bc.buffer_no;
log_descriptor.is_everything_flushed= 1;
translog_force_current_buffer_to_finish();
@@ -7676,8 +7804,10 @@
TRANSLOG_BUFFERS_NO);
translog_unlock();
}
- sent_to_disk= translog_get_sent_to_disk();
- if (cmp_translog_addr(lsn, sent_to_disk) > 0)
+
+ /* flush buffers */
+ *sent_to_disk= translog_get_sent_to_disk();
+ if (cmp_translog_addr(*lsn, *sent_to_disk) > 0)
{
DBUG_PRINT("info", ("Start buffer #: %u last buffer #: %u",
@@ -7697,53 +7827,237 @@
LSN_IN_PARTS(buffer->last_lsn),
(buffer->file ?
"dirty" : "closed")));
- if (buffer->prev_last_lsn <= lsn &&
+ if (buffer->prev_last_lsn <= *lsn &&
buffer->file != NULL)
{
- DBUG_ASSERT(flush_horizon <= buffer->offset + buffer->size);
- flush_horizon= buffer->offset + buffer->size;
+ DBUG_ASSERT(*flush_horizon <= buffer->offset + buffer->size);
+ *flush_horizon= (buffer->pre_force_close_horizon != LSN_IMPOSSIBLE ?
+ buffer->pre_force_close_horizon :
+ buffer->offset + buffer->size);
+ /* pre_force_close_horizon is reset during new buffer start */
+ DBUG_PRINT("info", ("flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(*flush_horizon)));
+ DBUG_ASSERT(*flush_horizon <= log_descriptor.horizon);
+
translog_buffer_flush(buffer);
}
translog_buffer_unlock(buffer);
i= (i + 1) % TRANSLOG_BUFFERS_NO;
} while (i != last_buffer_no);
- sent_to_disk= translog_get_sent_to_disk();
- }
-
- /* sync files from previous flush till current one */
- for (fn= LSN_FILE_NO(log_descriptor.flushed); fn <= LSN_FILE_NO(lsn); fn++)
- {
- TRANSLOG_FILE *file= get_logfile_by_number(fn);
- DBUG_ASSERT(file != NULL);
- if (!file->is_sync)
- {
- if (my_sync(file->handler.file, MYF(MY_WME)))
+ *sent_to_disk= translog_get_sent_to_disk();
+ }
+
+ DBUG_VOID_RETURN;
+}
+
+/**
+ @brief Flush the log up to given LSN (included)
+
+ @param lsn log record serial number up to which (inclusive)
+ the log has to be flushed
+
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
+
+ @note
+
+ - Non group commit logic: Commits made in passes. Thread which started
+ flush first is performing actual flush, other threads sets new goal (LSN)
+ of the next pass (if it is maximum) and waits for the pass end or just
+ wait for the pass end.
+
+ - If hard group commit enabled and rate set to zero:
+ The first thread sends all changed buffers to disk. This is repeated
+ as long as there are new LSNs added. The process can not loop
+ forever because we have limited number of threads and they will wait
+ for the data to be synced.
+ Pseudo code:
+
+ do
+ send changed buffers to disk
+ while new_goal
+ sync
+
+ - If hard group commit switched ON and less than rate microseconds has
+ passed from last sync, then after buffers have been sent to disk
+ wait until rate microseconds has passed since last sync, do sync and return.
+ This ensures that if we call sync infrequently we don't do any waits.
+
+ - If soft group commit enabled everything works as with 'non group commit'
+ but the thread doesn't do any real sync(). If rate is not zero the
+ sync() will be performed by a service thread with the given rate
+ when needed (new LSN appears).
+
+ @note Terminology:
+ 'sent to disk' means written to disk but not sync()ed,
+ 'flushed' mean sent to disk and synced().
+*/
+
+my_bool translog_flush(TRANSLOG_ADDRESS lsn)
+{
+ struct timespec abstime;
+ ulonglong flush_interval;
+ ulonglong time_spent;
+ LSN sent_to_disk= LSN_IMPOSSIBLE;
+ TRANSLOG_ADDRESS flush_horizon;
+ my_bool rc= 0;
+ my_bool hgroup_commit_at_start;
+ DBUG_ENTER("translog_flush");
+ DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
+ DBUG_ASSERT(translog_status == TRANSLOG_OK ||
+ translog_status == TRANSLOG_READONLY);
+ LINT_INIT(sent_to_disk);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("Everything is flushed up to (%lu,0x%lx)",
+ LSN_IN_PARTS(log_descriptor.flushed)));
+ if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
+
+
+ {
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ if (log_descriptor.flush_in_progress)
+ {
+ translog_lock();
+ /* fix lsn if it was horizon */
+ if (cmp_translog_addr(lsn, log_descriptor.bc.buffer->last_lsn) > 0)
+ lsn= BUFFER_MAX_LSN(log_descriptor.bc.buffer);
+ translog_unlock();
+ translog_flush_set_new_goal_and_wait(lsn);
+ if (!pthread_equal(log_descriptor.max_lsn_requester, pthread_self()))
+ {
+ /*
+ translog_flush_wait_for_end() release log_flush_lock while is
+ waiting then acquire it again
+ */
+ translog_flush_wait_for_end(lsn);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_RETURN(0);
+ }
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ }
+ log_descriptor.flush_in_progress= 1;
+ flush_horizon= log_descriptor.previous_flush_horizon;
+ DBUG_PRINT("info", ("flush_in_progress is set, flush_horizon: (%lu,0x%lx)",
+ LSN_IN_PARTS(flush_horizon)));
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ hgroup_commit_at_start= hard_group_commit;
+ if (hgroup_commit_at_start)
+ flush_interval= group_commit_wait;
+
+ translog_lock();
+ if (log_descriptor.is_everything_flushed)
+ {
+ DBUG_PRINT("info", ("everything is flushed"));
+ translog_unlock();
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+
+ for (;;)
+ {
+ /* Following function flushes buffers and makes translog_unlock() */
+ translog_flush_buffers(&lsn, &sent_to_disk, &flush_horizon);
+
+ if (!hgroup_commit_at_start)
+ break; /* flush pass is ended */
+
+retest:
+ if (flush_interval != 0 &&
+ (my_micro_time() - flush_start) >= flush_interval)
+ break; /* flush pass is ended */
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ if (log_descriptor.next_pass_max_lsn != LSN_IMPOSSIBLE)
+ {
+ /* take next goal */
+ lsn= log_descriptor.next_pass_max_lsn;
+ log_descriptor.next_pass_max_lsn= LSN_IMPOSSIBLE;
+ /* prevent other thread from continue */
+ log_descriptor.max_lsn_requester= pthread_self();
+ DBUG_PRINT("info", ("flush took next goal: (%lu,0x%lx)",
+ LSN_IN_PARTS(lsn)));
+ }
+ else
+ {
+ if (flush_interval == 0 ||
+ (time_spent= (my_micro_time() - flush_start)) >= flush_interval)
{
- rc= 1;
- translog_stop_writing();
- sent_to_disk= LSN_IMPOSSIBLE;
- goto out;
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ break;
}
- file->is_sync= 1;
- }
- }
-
- if (sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
- (LSN_FILE_NO(log_descriptor.previous_flush_horizon) !=
- LSN_FILE_NO(flush_horizon) ||
- ((LSN_OFFSET(log_descriptor.previous_flush_horizon) - 1) /
- TRANSLOG_PAGE_SIZE) !=
- ((LSN_OFFSET(flush_horizon) - 1) / TRANSLOG_PAGE_SIZE)))
- rc|= sync_dir(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
+ DBUG_PRINT("info", ("flush waits: %llu interval: %llu spent: %llu",
+ flush_interval - time_spent,
+ flush_interval, time_spent));
+ /* wait time or next goal */
+ set_timespec_nsec(abstime, flush_interval - time_spent);
+ pthread_cond_timedwait(&log_descriptor.new_goal_cond,
+ &log_descriptor.log_flush_lock,
+ &abstime);
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+ DBUG_PRINT("info", ("retest conditions"));
+ goto retest;
+ }
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
+
+ /* next flush pass */
+ DBUG_PRINT("info", ("next flush pass"));
+ translog_lock();
+ }
+
+ /*
+ sync() files from previous flush till current one
+ */
+ if (!soft_sync || hgroup_commit_at_start)
+ {
+ if ((rc=
+ translog_sync_files(LSN_FILE_NO(log_descriptor.flushed),
+ LSN_FILE_NO(lsn),
+ sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS &&
+ (LSN_FILE_NO(log_descriptor.
+ previous_flush_horizon) !=
+ LSN_FILE_NO(flush_horizon) ||
+ (LSN_OFFSET(log_descriptor.
+ previous_flush_horizon) /
+ TRANSLOG_PAGE_SIZE) !=
+ (LSN_OFFSET(flush_horizon) /
+ TRANSLOG_PAGE_SIZE)))))
+ {
+ sent_to_disk= LSN_IMPOSSIBLE;
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
+ goto out;
+ }
+ /* keep values for soft sync() and forced sync() actual */
+ {
+ uint32 fileno= LSN_FILE_NO(lsn);
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_min, fileno);
+ my_atomic_store32(&soft_sync_max, fileno);
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+ }
+ else
+ {
+ my_atomic_rwlock_wrlock(&soft_sync_rwl);
+ my_atomic_store32(&soft_sync_max, LSN_FILE_NO(lsn));
+ my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ }
+
+ DBUG_ASSERT(flush_horizon <= log_descriptor.horizon);
+
+ pthread_mutex_lock(&log_descriptor.log_flush_lock);
log_descriptor.previous_flush_horizon= flush_horizon;
out:
- pthread_mutex_lock(&log_descriptor.log_flush_lock);
if (sent_to_disk != LSN_IMPOSSIBLE)
log_descriptor.flushed= sent_to_disk;
log_descriptor.flush_in_progress= 0;
log_descriptor.flush_no++;
DBUG_PRINT("info", ("flush_in_progress is dropped"));
- pthread_mutex_unlock(&log_descriptor.log_flush_lock);\
+ pthread_mutex_unlock(&log_descriptor.log_flush_lock);
pthread_cond_broadcast(&log_descriptor.log_flush_cond);
DBUG_RETURN(rc);
}
@@ -8113,6 +8427,8 @@
my_bool translog_purge(TRANSLOG_ADDRESS low)
{
uint32 last_need_file= LSN_FILE_NO(low);
+ uint32 min_unsync;
+ int soft;
TRANSLOG_ADDRESS horizon= translog_get_horizon();
int rc= 0;
DBUG_ENTER("translog_purge");
@@ -8120,12 +8436,23 @@
DBUG_ASSERT(translog_status == TRANSLOG_OK ||
translog_status == TRANSLOG_READONLY);
+ soft= soft_sync;
+ DBUG_PRINT("info", ("min_unsync: %lu", (ulong) min_unsync));
+ if (soft && min_unsync < last_need_file)
+ {
+ last_need_file= min_unsync;
+ DBUG_PRINT("info", ("last_need_file set to %lu", (ulong)last_need_file));
+ }
+
pthread_mutex_lock(&log_descriptor.purger_lock);
+ DBUG_PRINT("info", ("last_lsn_checked file: %lu:",
+ (ulong) log_descriptor.last_lsn_checked));
if (LSN_FILE_NO(log_descriptor.last_lsn_checked) < last_need_file)
{
uint32 i;
uint32 min_file= translog_first_file(horizon, 1);
DBUG_ASSERT(min_file != 0); /* log is already started */
+ DBUG_PRINT("info", ("min_file: %lu:",(ulong) min_file));
for(i= min_file; i < last_need_file && rc == 0; i++)
{
LSN lsn= translog_get_file_max_lsn_stored(i);
@@ -8356,6 +8683,155 @@
}
+
+/**
+ Sets soft sync mode
+
+ @param mode TRUE if we need switch soft sync on else off
+*/
+
+void translog_soft_sync(my_bool mode)
+{
+ soft_sync= mode;
+}
+
+
+/**
+ Sets hard group commit
+
+ @param mode TRUE if we need switch hard group commit on else off
+*/
+
+void translog_hard_group_commit(my_bool mode)
+{
+ hard_group_commit= mode;
+}
+
+
+/**
+ @brief forced log sync (used when we are switching modes)
+*/
+
+void translog_sync()
+{
+ uint32 max= get_current_logfile()->number;
+ uint32 min;
+ DBUG_ENTER("ma_translog_sync");
+
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+ if (!min)
+ min= max;
+
+ translog_sync_files(min, max, sync_log_dir >= TRANSLOG_SYNC_DIR_ALWAYS);
+
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief set rate for group commit
+
+ @param interval interval to set.
+
+ @note We use this function with additional variable because have to
+ restart service thread with new value which we can't make inside changing
+ variable routine (update_maria_group_commit_interval)
+*/
+
+void translog_set_group_commit_interval(uint32 interval)
+{
+ DBUG_ENTER("translog_set_group_commit_interval");
+ group_commit_wait= interval;
+ DBUG_PRINT("info", ("wait: %llu",
+ (ulonglong)group_commit_wait));
+ DBUG_VOID_RETURN;
+}
+
+
+/**
+ @brief syncing service thread
+*/
+
+static pthread_handler_t
+ma_soft_sync_background( void *arg __attribute__((unused)))
+{
+
+ my_thread_init();
+ {
+ DBUG_ENTER("ma_soft_sync_background");
+ for(;;)
+ {
+ ulonglong prev_loop= my_micro_time();
+ ulonglong time, sleep;
+ uint32 min, max;
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ sleep= group_commit_wait;
+ translog_sync_files(min, max, FALSE);
+ time= my_micro_time() - prev_loop;
+ if (time > sleep)
+ sleep= 0;
+ else
+ sleep-= time;
+ if (my_service_thread_sleep(&soft_sync_control, sleep))
+ break;
+ }
+ my_service_thread_signal_end(&soft_sync_control);
+ my_thread_end();
+ DBUG_RETURN(0);
+ }
+}
+
+
+/**
+ @brief Starts syncing thread
+*/
+
+int translog_soft_sync_start(void)
+{
+ pthread_t th;
+ int res= 0;
+ uint32 min, max;
+ DBUG_ENTER("translog_soft_sync_start");
+
+ /* check and init variables */
+ my_atomic_rwlock_rdlock(&soft_sync_rwl);
+ min= my_atomic_load32(&soft_sync_min);
+ max= my_atomic_load32(&soft_sync_max);
+ if (!max)
+ my_atomic_store32(&soft_sync_max, (max= get_current_logfile()->number));
+ if (!min)
+ my_atomic_store32(&soft_sync_min, max);
+ my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+
+ if (!(res= ma_service_thread_control_init(&soft_sync_control)))
+ if (!(res= pthread_create(&th, NULL, ma_soft_sync_background, NULL)))
+ soft_sync_control.status= THREAD_RUNNING;
+ DBUG_RETURN(res);
+}
+
+
+/**
+ @brief Stops syncing thread
+*/
+
+void translog_soft_sync_end(void)
+{
+ DBUG_ENTER("translog_soft_sync_end");
+ if (soft_sync_control.inited)
+ {
+ ma_service_thread_control_end(&soft_sync_control);
+ }
+ DBUG_VOID_RETURN;
+}
+
+
#ifdef MARIA_DUMP_LOG
#include <my_getopt.h>
extern void translog_example_table_init();
=== modified file 'storage/maria/ma_loghandler.h'
--- a/storage/maria/ma_loghandler.h 2009-01-15 22:25:53 +0000
+++ b/storage/maria/ma_loghandler.h 2010-02-09 08:32:59 +0000
@@ -342,6 +342,14 @@
TRANSLOG_SHUTDOWN /* going to shutdown the loghandler */
};
extern enum enum_translog_status translog_status;
+extern ulonglong translog_syncs; /* Number of sync()s */
+
+void translog_soft_sync(my_bool mode);
+void translog_hard_group_commit(my_bool mode);
+int translog_soft_sync_start(void);
+void translog_soft_sync_end(void);
+void translog_sync();
+void translog_set_group_commit_interval(uint32 interval);
/*
all the rest added because of recovery; should we make
@@ -441,6 +449,14 @@
typedef enum
{
+ TRANSLOG_GCOMMIT_NONE,
+ TRANSLOG_GCOMMIT_HARD,
+ TRANSLOG_GCOMMIT_SOFT
+} enum_maria_group_commit;
+extern ulong maria_group_commit;
+extern ulong maria_group_commit_interval;
+typedef enum
+{
TRANSLOG_PURGE_IMMIDIATE,
TRANSLOG_PURGE_EXTERNAL,
TRANSLOG_PURGE_ONDEMAND
3
2