- developers - lists.mariadb.org

Re: [Maria-developers] Fwd: [Commits] Rev 3616: Fix for bug in file:///home/tsk/mprog/src/5.3-md641/
by Igor Babaev 28 Jan '13

28 Jan '13

Timour, It's not clear for me: 1. why group by queries with GROUP BY col1, col2, where col1,col2 are defined over the same table, are considered as possibly good for for loose scan optimization. 2. how it comes that unique index is considered as a candidate for a loose scan (see your test case). (The second issue, of course, are unrelated to bug mdev-641). Besides, didn't you notice that you removes duplicates in group by under any circumstances. How it will affect the following query: select a, sum(c) from t1 group by a, a with rollup; ? In general, removing duplicates from GROUP BY and ORDER BY lists is a good optimization, but why it should be done in connection with this bug? Regards, Igor. On 01/24/2013 06:38 AM, Timour Katchaounov wrote: > Igor, > > Could you please review the following patch (we discussed it at the team > call): > > ------------------------------------------------------------ > revno: 3616 > revision-id: timour(a)askmonty.org-20130124143526-bpaiyushble1ieek > parent: timour(a)askmonty.org-20130117140805-4kyoq7azx4v2irhq > fixes bug: https://mariadb.atlassian.net/browse/MDEV-614 > committer: timour(a)askmonty.org > branch nick: 5.3-md641 > timestamp: Thu 2013-01-24 16:35:26 +0200 > message: > Fix for bug > MDEV-641 LP:1002108 - Wrong result (or crash) from a query with > duplicated field in the group list and a limit clause > Bug#11761078: 53534: INCORRECT 'SELECT SQL_BIG_RESULT...' > WITH GROUP BY ON DUPLICATED FIELDS > > Analysis > Range analysis discoveres that the query can be executed via loose > index scan for GROUP BY. > Later, GROUP BY analysis fails to confirm that the GROUP operation can > be computed via an > index because there is no logic to handle duplicate field references > in the GROUP clause. > As a result the optimizer the optimizer produces an inconsistent plan. > It constructs a > temporary table, but on the other hand the group fields are not set to > point there. > > Solution: > Make order by analysis be in sync with loose scan analysis for group > by. Instead of > changing the logic of how indexes are analyzed for order by, the patch > takes the same > approach as MySQL - normalize the query by removing duplicate column > references. The > difference from MySQL is that we remove duplicates earlier, by reusing > remove_const() > which already removed unneeded ORDER elements. > >

1 0

[Maria-developers] MDEV-3875 Wrong result on a DISTINCT query ... and GROUP BY
by Sergei Golubchik 27 Jan '13

27 Jan '13

Hi, Igor, See my patch below. The only questionable part - in remove_dup_with_compare(). Before it was memcpy'ing only part of the record, skipping const fields. Now it memcpy's the complete record. I don't think it matters much performance-wise. An alternative would be to do two memcpy's - for the NULL bitmap and for the fields (because const fields are located between the NULL bitmap that we need, and the other non-const fields). Another alternative would be to have const fields at the end of the record, but it'd be a much larger and more risky change. ------------------------------------------------------------ revno: 3620 fixes bug: https://mariadb.atlassian.net/browse/MDEV-3875 committer: Sergei Golubchik <sergii(a)pisem.net> branch nick: 5.3 timestamp: Sat 2013-01-26 22:33:18 +0100 message: MDEV-3875 Wrong result (missing row) on a DISTINCT query with the same subquery in the SELECT list and GROUP BY fix remove_dup_with_hash_index() and remove_dup_with_compare() to take NULLs into account modified: mysql-test/r/distinct.result mysql-test/t/distinct.test sql/field.cc sql/field.h sql/filesort.cc sql/sql_select.cc diff: === modified file 'mysql-test/r/distinct.result' --- mysql-test/r/distinct.result 2011-12-15 22:26:59 +0000 +++ mysql-test/r/distinct.result 2013-01-26 21:33:18 +0000 @@ -847,3 +847,35 @@ time(f1) 00:00:00.000200 00:00:00.000300 drop table t1; +create table t1(i int, g int); +insert into t1 values (null, 1), (0, 2); +select distinct i from t1 group by g; +i +NULL +0 +drop table t1; +create table t1(i int, g blob); +insert into t1 values (null, 1), (0, 2); +select distinct i from t1 group by g; +i +NULL +0 +drop table t1; +create table t1 (a int) engine=myisam; +insert into t1 values (0),(7); +create table t2 (b int) engine=myisam; +insert into t2 values (7),(0),(3); +create algorithm=temptable view v as +select distinct (select max(a) from t1 where alias.b = a) as field1 from t2 as alias group by field1; +select * from v; +field1 +NULL +0 +7 +select distinct (select max(a) from t1 where alias.b = a) as field1 from t2 as alias group by field1; +field1 +NULL +0 +7 +drop view v; +drop table t1, t2; === modified file 'mysql-test/t/distinct.test' --- mysql-test/t/distinct.test 2011-12-15 22:26:59 +0000 +++ mysql-test/t/distinct.test 2013-01-26 21:33:18 +0000 @@ -658,3 +658,27 @@ select time(f1) from t1 ; select distinct time(f1) from t1 ; drop table t1; +# +# MDEV-3875 Wrong result (missing row) on a DISTINCT query with the same subquery in the SELECT list and GROUP BY +# MySQL Bug#66896 Distinct not distinguishing 0 from NULL when GROUP BY is used +# +create table t1(i int, g int); # remove_dup_with_hash_index +insert into t1 values (null, 1), (0, 2); +select distinct i from t1 group by g; +drop table t1; + +create table t1(i int, g blob); # remove_dup_with_compare +insert into t1 values (null, 1), (0, 2); +select distinct i from t1 group by g; +drop table t1; + +create table t1 (a int) engine=myisam; +insert into t1 values (0),(7); +create table t2 (b int) engine=myisam; +insert into t2 values (7),(0),(3); +create algorithm=temptable view v as +select distinct (select max(a) from t1 where alias.b = a) as field1 from t2 as alias group by field1; +select * from v; +select distinct (select max(a) from t1 where alias.b = a) as field1 from t2 as alias group by field1; +drop view v; +drop table t1, t2; === modified file 'sql/field.cc' --- sql/field.cc 2012-08-29 15:55:59 +0000 +++ sql/field.cc 2013-01-26 21:33:18 +0000 @@ -1062,6 +1062,21 @@ bool Field::type_can_have_key_part(enum } +void Field::make_sort_key(uchar *buff,uint length) +{ + if (maybe_null()) + { + if (is_null()) + { + bzero(buff, length + 1); + return; + } + *buff++= 1; + } + sort_string(buff, length); +} + + /** Numeric fields base class constructor. */ === modified file 'sql/field.h' --- sql/field.h 2012-06-23 22:00:05 +0000 +++ sql/field.h 2013-01-26 21:33:18 +0000 @@ -386,6 +386,7 @@ class Field return bytes; } + void make_sort_key(uchar *buff, uint length); virtual void make_field(Send_field *); virtual void sort_string(uchar *buff,uint length)=0; virtual bool optimize_range(uint idx, uint part); === modified file 'sql/filesort.cc' --- sql/filesort.cc 2012-11-09 08:11:20 +0000 +++ sql/filesort.cc 2013-01-26 21:33:18 +0000 @@ -783,21 +783,9 @@ static void make_sortkey(register SORTPA bool maybe_null=0; if ((field=sort_field->field)) { // Field - if (field->maybe_null()) - { - if (field->is_null()) - { - if (sort_field->reverse) - bfill(to,sort_field->length+1,(char) 255); - else - bzero((char*) to,sort_field->length+1); - to+= sort_field->length+1; - continue; - } - else - *to++=1; - } - field->sort_string(to, sort_field->length); + field->make_sort_key(to, sort_field->length); + if ((maybe_null = field->maybe_null())) + to++; } else { // Item @@ -949,8 +937,11 @@ static void make_sortkey(register SORTPA } if (sort_field->reverse) { /* Revers key */ - if (maybe_null) - to[-1]= ~to[-1]; + if (maybe_null && (to[-1]= !to[-1])) + { + to+= sort_field->length; // don't waste the time reversing all 0's + continue; + } length=sort_field->length; while (length--) { === modified file 'sql/sql_select.cc' --- sql/sql_select.cc 2013-01-16 13:11:13 +0000 +++ sql/sql_select.cc 2013-01-26 21:33:18 +0000 @@ -204,7 +204,7 @@ static int create_sort_index(THD *thd, J static int remove_duplicates(JOIN *join,TABLE *entry,List<Item> &fields, Item *having); static int remove_dup_with_compare(THD *thd, TABLE *entry, Field **field, - ulong offset,Item *having); + Item *having); static int remove_dup_with_hash_index(THD *thd,TABLE *table, uint field_count, Field **first_field, ulong key_length,Item *having); @@ -18891,19 +18891,24 @@ static bool fix_having(JOIN *join, Item #endif -/***************************************************************************** - Remove duplicates from tmp table - This should be recoded to add a unique index to the table and remove - duplicates - Table is a locked single thread table - fields is the number of fields to check (from the end) -*****************************************************************************/ +/** + Compare fields from table->record[0] and table->record[1], + possibly skipping few first fields. + @param table + @param ptr field to start the comparison from, + somewhere in the table->field[] array + + @retval 1 different + @retval 0 identical +*/ static bool compare_record(TABLE *table, Field **ptr) { for (; *ptr ; ptr++) { - if ((*ptr)->cmp_offset(table->s->rec_buff_length)) + Field *f= *ptr; + if (f->is_null() != f->is_null(table->s->rec_buff_length) || + (!f->is_null() && f->cmp_offset(table->s->rec_buff_length))) return 1; } return 0; @@ -18931,15 +18936,15 @@ static void free_blobs(Field **ptr) static int -remove_duplicates(JOIN *join, TABLE *entry,List<Item> &fields, Item *having) +remove_duplicates(JOIN *join, TABLE *table, List<Item> &fields, Item *having) { int error; - ulong reclength,offset; + ulong keylength= 0; uint field_count; THD *thd= join->thd; DBUG_ENTER("remove_duplicates"); - entry->reginfo.lock_type=TL_WRITE; + table->reginfo.lock_type=TL_WRITE; /* Calculate how many saved fields there is in list */ field_count=0; @@ -18956,24 +18961,21 @@ remove_duplicates(JOIN *join, TABLE *ent join->unit->select_limit_cnt= 1; // Only send first row DBUG_RETURN(0); } - Field **first_field=entry->field+entry->s->fields - field_count; - offset= (field_count ? - entry->field[entry->s->fields - field_count]-> - offset(entry->record[0]) : 0); - reclength=entry->s->reclength-offset; - - free_io_cache(entry); // Safety - entry->file->info(HA_STATUS_VARIABLE); - if (entry->s->db_type() == heap_hton || - (!entry->s->blob_fields && - ((ALIGN_SIZE(reclength) + HASH_OVERHEAD) * entry->file->stats.records < + + Field **first_field=table->field+table->s->fields - field_count; + for (Field **ptr=first_field; *ptr; ptr++) + keylength+= (*ptr)->sort_length() + (*ptr)->maybe_null(); + + free_io_cache(table); // Safety + table->file->info(HA_STATUS_VARIABLE); + if (table->s->db_type() == heap_hton || + (!table->s->blob_fields && + ((ALIGN_SIZE(keylength) + HASH_OVERHEAD) * table->file->stats.records < thd->variables.sortbuff_size))) - error=remove_dup_with_hash_index(join->thd, entry, - field_count, first_field, - reclength, having); + error=remove_dup_with_hash_index(join->thd, table, field_count, first_field, + keylength, having); else - error=remove_dup_with_compare(join->thd, entry, first_field, offset, - having); + error=remove_dup_with_compare(join->thd, table, first_field, having); free_blobs(first_field); DBUG_RETURN(error); @@ -18981,18 +18983,13 @@ remove_duplicates(JOIN *join, TABLE *ent static int remove_dup_with_compare(THD *thd, TABLE *table, Field **first_field, - ulong offset, Item *having) + Item *having) { handler *file=table->file; - char *org_record,*new_record; - uchar *record; + uchar *record=table->record[0]; int error; - ulong reclength= table->s->reclength-offset; DBUG_ENTER("remove_dup_with_compare"); - org_record=(char*) (record=table->record[0])+offset; - new_record=(char*) table->record[1]+offset; - if (file->ha_rnd_init_with_error(1)) DBUG_RETURN(1); @@ -19029,7 +19026,7 @@ static int remove_dup_with_compare(THD * error=0; goto err; } - memcpy(new_record,org_record,reclength); + store_record(table,record[1]); /* Read through rest of file and mark duplicated rows deleted */ bool found=0; @@ -19088,8 +19085,9 @@ static int remove_dup_with_hash_index(TH int error; handler *file= table->file; ulong extra_length= ALIGN_SIZE(key_length)-key_length; - uint *field_lengths,*field_length; + uint *field_lengths, *field_length; HASH hash; + Field **ptr; DBUG_ENTER("remove_dup_with_hash_index"); if (!my_multi_malloc(MYF(MY_WME), @@ -19101,21 +19099,8 @@ static int remove_dup_with_hash_index(TH NullS)) DBUG_RETURN(1); - { - Field **ptr; - ulong total_length= 0; - for (ptr= first_field, field_length=field_lengths ; *ptr ; ptr++) - { - uint length= (*ptr)->sort_length(); - (*field_length++)= length; - total_length+= length; - } - DBUG_PRINT("info",("field_count: %u key_length: %lu total_length: %lu", - field_count, key_length, total_length)); - DBUG_ASSERT(total_length <= key_length); - key_length= total_length; - extra_length= ALIGN_SIZE(key_length)-key_length; - } + for (ptr= first_field, field_length=field_lengths ; *ptr ; ptr++) + (*field_length++)= (*ptr)->sort_length(); if (hash_init(&hash, &my_charset_bin, (uint) file->stats.records, 0, key_length, (hash_get_key) 0, 0, 0)) @@ -19155,13 +19140,13 @@ static int remove_dup_with_hash_index(TH /* copy fields to key buffer */ org_key_pos= key_pos; field_length=field_lengths; - for (Field **ptr= first_field ; *ptr ; ptr++) + for (ptr= first_field ; *ptr ; ptr++) { - (*ptr)->sort_string(key_pos,*field_length); - key_pos+= *field_length++; + (*ptr)->make_sort_key(key_pos, *field_length); + key_pos+= (*ptr)->maybe_null() + *field_length++; } /* Check if it exists before */ - if (hash_search(&hash, org_key_pos, key_length)) + if (my_hash_search(&hash, org_key_pos, key_length)) { /* Duplicated found ; Remove the row */ if ((error= file->ha_delete_row(record))) Regards, Sergei

1 0

[Maria-developers] MariaDB RPMs
by Rich Prohaska 24 Jan '13

24 Jan '13

Suppose that we have a RHEL5 machine running and we want to install MariaDB on it. Unfortunately, it conflicts with files installed by the mysql package. # rpm -i MariaDB-5.5.28a-centos5-x86_64-common.rpm warning: MariaDB-5.5.28a-centos5-x86_64-common.rpm: Header V3 DSA signature: NOKEY, key ID 1bb943db file /etc/my.cnf from install of MariaDB-common-5.5.28a-1.x86_64 conflicts with file from package mysql-5.0.77-4.el5_6.6.x86_64 file /usr/share/mysql/charsets/Index.xml from install of MariaDB-common-5.5.28a-1.x86_64 conflicts with file from package mysql-5.0.77-4.el5_6.6.x86_64 file /usr/share/mysql/charsets/cp1250.xml from install of MariaDB-common-5.5.28a-1.x86_64 conflicts with file from package mysql-5.0.77-4.el5_6.6.x86_64 file /usr/share/mysql/charsets/cp1251.xml from install of MariaDB-common-5.5.28a-1.x86_64 conflicts with file from package mysql-5.0.77-4.el5_6.6.x86_64 Unfortunately, a lot of other packages require the mysql package. What is a good approach to solving this problem? Thanks

2 1

[Maria-developers] A new Ubuntu UPDATE event problem smashed MariaDB dead
by Peter W Bowey 23 Jan '13

23 Jan '13

Subject: A new Ubuntu UPDATE event problem smashed MariaDB 5.5.28a dead Hi MariaDB Team, Ubuntu released a update 23 hours ago that 'smashed out' omega8.cc's careful work to support MariaDB 5.5.28-mariadb-a1~precise. Ubuntu's new version is '5.5.29-0ubuntu012.10.1' dated - 2013-01-22 15:07:08 UTC. This does not work yet with MariaDB & Ubuntu 12.04 See Link here: https://launchpad.net/ubuntu/quantal/+source/mysql-5.5/5.5.29-0ubuntu0.12..… Now, I have been 'battling for hours' to work around this Ubuntu pain, while we wait on the good MariaDB team to re-syncs it's own updates with the Ubuntu 12.04 updates. Here is the 'crude *forced* style' I worked with to MariaDB back to life - after the Ubuntu *forced* update and dependency killed the database [MariaDB]: -------------------------------------------------------------------------------- server:~/aegir-boa-installer# aptitude install mariadb-server-5.5 mariadb-common The following NEW packages will be installed: libdbd-mysql-perl{a} libmariadbclient18{ab} libmysqlclient18{a} mariadb-client-5.5{a} mariadb-client-core-5.5{a} mariadb-server-5.5 mariadb-server-core-5.5{a} 0 packages upgraded, 7 newly installed, 0 to remove and 0 not upgraded. Need to get 31.8 MB of archives. After unpacking 107 MB will be used. The following packages have unmet dependencies: mysql-common : Breaks: mysql-client-5.1 which is a virtual package. Breaks: mysql-client-core-5.1 which is a virtual package. Breaks: mysql-server-core-5.1 which is a virtual package. libmariadbclient18 : Depends: libmysqlclient18 (= 5.5.28-mariadb-a1~precise) but 5.5.29-0ubuntu0.12.04.1 is to be installed. The following actions will resolve these dependencies: Keep the following packages at their current version: 1) libmariadbclient18 [Not Installed] 2) mariadb-client-5.5 [Not Installed] 3) mariadb-client-core-5.5 [Not Installed] 4) mariadb-server-5.5 [Not Installed] 5) mariadb-server-core-5.5 [Not Installed] Accept this solution? [Y/n/q/?] n The following actions will resolve these dependencies: Install the following packages: 1) libmysqlclient18 [5.5.28-mariadb-a1~precise (<NULL>)] Downgrade the following packages: 2) mysql-common [5.5.29-0ubuntu0.12.04.1 (now, precise-security, precise-updates) -> 5.5.28-mariadb-a1~precise (<NULL>)] Accept this solution? [Y/n/q/?] The following packages will be DOWNGRADED: mysql-common The following NEW packages will be installed: libdbd-mysql-perl{a} libmariadbclient18{a} libmysqlclient18{a} mariadb-client-5.5{a} mariadb-client-core-5.5{a} mariadb-server-5.5 mariadb-server-core-5.5{a} 0 packages upgraded, 7 newly installed, 1 downgraded, 0 to remove and 0 not upgraded. Need to get 30.9 MB of archives. After unpacking 104 MB will be used. Do you want to continue? [Y/n/?] y Get: 1 http://archive.ubuntu.com/ubuntu/ precise/main libdbd-mysql-perl i386 4.020-1build2 [104 kB] Get: 2 http://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu/ precise/main mysql-common all 5.5.28-mariadb-a1~precise [8,780 B] Get: 3 http://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu/ precise/main libmariadbclient18 i386 5.5.28-mariadb-a1~precise [817 kB] Get: 4 http://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu/ precise/main libmysqlclient18 i386 5.5.28-mariadb-a1~precise [2,920 B] Get: 5 http://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu/ precise/main mariadb-client-core-5.5 i386 5.5.28-mariadb-a1~precise [1,797 kB] Get: 6 http://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu/ precise/main mariadb-client-5.5 i386 5.5.28-mariadb-a1~precise [5,097 kB] Get: 7 http://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu/ precise/main mariadb-server-core-5.5 i386 5.5.28-mariadb-a1~precise [5,113 kB] Get: 8 http://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu/ precise/main mariadb-server-5.5 i386 5.5.28-mariadb-a1~precise [18.0 MB] Fetched 30.9 MB in 30s (1,020 kB/s) Preconfiguring packages ... dpkg: warning: downgrading mysql-common from 5.5.29-0ubuntu0.12.04.1 to 5.528-mariadb-a1~precise. (Reading database ... 61332 files and directories currently installed.) Preparing to replace mysql-common 5.5.29-0ubuntu0.12.04.1 (using ..../mysql-common_5.5.28-mariadb-a1~precise_all.deb) ... Unpacking replacement mysql-common ... Selecting previously unselected package libmariadbclient18. Unpacking libmariadbclient18 (from ..../libmariadbclient18_5.5.28-mariadb-a1~precise_i386.deb) ... Selecting previously unselected package libmysqlclient18. Unpacking libmysqlclient18 (from ..../libmysqlclient18_5.5.28-mariadb-a1~precise_i386.deb) ... Selecting previously unselected package libdbd-mysql-perl. Unpacking libdbd-mysql-perl (from ..../libdbd-mysql-perl_4.020-1build2_i386deb) ... Selecting previously unselected package mariadb-client-core-5.5. Unpacking mariadb-client-core-5.5 (from ..../mariadb-client-core-5.5_5.5.28-mariadb-a1~precise_i386.deb) ... Selecting previously unselected package mariadb-client-5.5. Unpacking mariadb-client-5.5 (from ..../mariadb-client-5.5_5.5.28-mariadb-a1~precise_i386.deb) ... Selecting previously unselected package mariadb-server-core-5.5. Unpacking mariadb-server-core-5.5 (from ..../mariadb-server-core-5.5_5.5.28-mariadb-a1~precise_i386.deb) ... Selecting previously unselected package mariadb-server-5.5. Unpacking mariadb-server-5.5 (from ..../mariadb-server-5.5_5.5.28-mariadb-a1~precise_i386.deb) ... * Stopping MariaDB database server mysqld [ OK ] Processing triggers for man-db ... Processing triggers for ureadahead ... Setting up mysql-common (5.5.28-mariadb-a1~precise) ... Configuration file `/etc/mysql/my.cnf' ==> Modified (by you or by a script) since installation. ==> Package distributor has shipped an updated version. What would you like to do about it ? Your options are: Y or I : install the package maintainer's version N or O : keep your currently-installed version D : show the differences between the versions Z : start a shell to examine the situation The default action is to keep your current version. *** my.cnf (Y/I/N/O/D/Z) [default=N] ? N Setting up libmysqlclient18 (5.5.28-mariadb-a1~precise) ... Setting up libdbd-mysql-perl (4.020-1build2) ... Setting up libmariadbclient18 (5.5.28-mariadb-a1~precise) ... Setting up mariadb-client-core-5.5 (5.5.28-mariadb-a1~precise) ... Setting up mariadb-client-5.5 (5.5.28-mariadb-a1~precise) ... Setting up mariadb-server-core-5.5 (5.5.28-mariadb-a1~precise) ... Setting up mariadb-server-5.5 (5.5.28-mariadb-a1~precise) ... * Stopping MariaDB database server mysqld [ OK ] * Starting MariaDB database server mysqld [ OK ] * Checking for corrupt, not cleanly closed and upgrade needing tables. Processing triggers for libc-bin ... ldconfig deferred processing now taking place -------------------------------------------------------------------------------- Ubuntu will drag in the new update dependants - > mysql-common [5.5.29-0ubuntu0.12.04.1 (now, precise-security, precise-updates) instead of 5.5.28-mariadb-a1~precise (<NULL>)] Peter Peter Bowey Web Design & Developer Peter Bowey Computer Solutions 69 Sutherland Ave, Victor Harbor, South Australia, Australia Tel.: 08 8552 8630 Fax: 08 8552 9185 Email: support(a)pbcomp.com.au Web: http://www.pbcomp.com.au/ Making Windows, Macintosh & Linux work for you! http://www.pbcomp.com.au/ This email and any attachment is intended only for the exclusive and confidential use of the addressee(s). If you are not the intended recipient, any use, interference with, disclosure or copying of this material is unauthorised and prohibited. If you have received this message in error, please notify the sender by return email immediately and delete the message from your computer without making any copies and publishing the contents therein. Consider the environment before printing this e-mail.

2 1

Re: [Maria-developers] Rev 3177: Fixed bug mdev-4063 (bug #56927)
by Sergei Golubchik 21 Jan '13

21 Jan '13

Hi, Igor! On Jan 21, Igor Babaev wrote: > revno: 3177 > revision-id: igor(a)askmonty.org-20130121084657-hvcval7hykojxpmg > parent: sergii(a)pisem.net-20130109225151-z9e7gh8z5nl38dxl > committer: Igor Babaev <igor(a)askmonty.org> > branch nick: maria-5.1-bug56927 > timestamp: Mon 2013-01-21 00:46:57 -0800 > message: > Fixed bug mdev-4063 (bug #56927). > This bug could result in returning 0 for the expressions of the form > <aggregate_function>(distinct field) when the system variable > max_heap_table_size was set to a small enough number. > It happened because the method Unique::walk() did not support > the case when more than one pass was needed to merge the trees > of distinct values saved in an external file. Looks ok. Just one thought. You use merge_many_buff() if the memory is not enough for the number_of_buffpeks*size. But merge_many_buff() only reduces the number of buffpeks down to 15. What if max_heap_table_size < 15*size ? I'd suggest to either 1) issue an error or 2) merge completely, like in Unique::get(). Actually, 2) should be easy, you just pass the last argument to your new Unique::merge() instead of TRUE, like in your patch, change to merge( ..., max_in_memory_size > MERGEBUFF2*size); Regards, Sergei

1 0

Re: [Maria-developers] Bug in trx_group_commit_leader()
by Kristian Nielsen 21 Jan '13

21 Jan '13

Michael Widenius <monty(a)askmonty.org> writes: > I noticed this compiler error on one of the build machines for 10.0-base: > > /home/buildbot/buildbot/build/mariadb-10.0.0/sql/log.cc:6564:40: > warning: ‘last_in_queue’ may be used uninitialized in this function > [-Wuninitialized] Thanks. I've pushed the fix to 10.0-base. > Looking at the code, this looks like a real potential bug. Yes (though likely dead code, as there's a DBUG_ASSERT before the if () asserting that it is always true). - Kristian.

1 0

[Maria-developers] review of MDEV-4011 per thread memory counting and usage
by Sergei Golubchik 18 Jan '13

18 Jan '13

Hi. Here it is: > === modified file 'client/completion_hash.cc' > --- client/completion_hash.cc 2011-06-30 15:46:53 +0000 > +++ client/completion_hash.cc 2013-01-15 21:00:33 +0000 > @@ -49,7 +49,7 @@ int completion_hash_init(HashTable *ht, > ht->initialized = 0; > return FAILURE; > } > - init_alloc_root(&ht->mem_root, 8192, 0); > + init_alloc_root(&ht->mem_root, 8192, 0, 0); very often you write 0 or MY_THREAD_SPECIFIC instead of MYF(0) and MYF(MY_THREAD_SPECIFIC) > ht->pHashFunction = hashpjw; > ht->nTableSize = nSize; > ht->initialized = 1; > === modified file 'include/my_sys.h' > --- include/my_sys.h 2012-12-23 21:37:11 +0000 > +++ include/my_sys.h 2013-01-15 21:00:33 +0000 > @@ -86,6 +86,8 @@ typedef struct my_aio_result { > #define MY_SYNC 4096 /* my_copy(): sync dst file */ > #define MY_SYNC_DIR 32768 /* my_create/delete/rename: sync directory */ > #define MY_SYNC_FILESIZE 65536 /* my_sync(): safe sync when file is extended */ > +#define MY_THREAD_SPECIFIC 0x10000 /* my_malloc(): thread specific */ > +#define MY_THREAD_MOVE 0x20000 /* realloc(); Memory can move */ may be, better to remove this, as it's unused? it's easier to add it later (just a couple of lines of code), than to keep the dead code around. > #define MY_CHECK_ERROR 1 /* Params to my_end; Check open-close */ > #define MY_GIVE_INFO 2 /* Give time info about process*/ > @@ -148,6 +150,42 @@ typedef struct my_aio_result { > /* Extra length needed for filename if one calls my_create_backup_name */ > #define MY_BACKUP_NAME_EXTRA_LENGTH 17 > > +/* If we have our own safemalloc (for debugging) */ > +#if defined(SAFEMALLOC) > +#define MALLOC_SIZE_AND_FLAG(p,b) sf_malloc_usable_size(p,b) > +#define MALLOC_PREFIX_SIZE 0 > +#define MALLOC_STORE_SIZE(a,b,c,d) > +#define MALLOC_FIX_POINTER_FOR_FREE(a) a > +#define SAFEMALLOC_REPORT_MEMORY(X) sf_report_leaked_memory(X) > +void sf_report_leaked_memory(my_thread_id id); > +extern my_thread_id (*sf_malloc_dbug_id)(void); > +#else > +/* > + * We use double as prefix size as this guarantees the correct > + * alignment on all platforms and will optimize things for > + * memcpy(), memcmp() etc. > + */ > +#define MALLOC_PREFIX_SIZE (sizeof(double)) > +#define MALLOC_SIZE(p) (*(size_t*) ((char*)(p) - MALLOC_PREFIX_SIZE)) > +#define MALLOC_STORE_SIZE(p, type_of_p, size, flag) \ > +{\ > + *(size_t*) p= (size) | (flag); \ > + (p)= (type_of_p) (((char*) (p)) + MALLOC_PREFIX_SIZE); \ > +} > +static inline size_t malloc_size_and_flag(void *p, myf *flags) > +{ > + size_t size= MALLOC_SIZE(p); > + *flags= (size & 1); > + return size & ~ (ulonglong) 1; > +} > +#define MALLOC_SIZE_AND_FLAG(p,b) malloc_size_and_flag(p, b); > +#define MALLOC_FIX_POINTER_FOR_FREE(p) (((char*) (p)) - MALLOC_PREFIX_SIZE) > +#define SAFEMALLOC_REPORT_MEMORY(X) do {} while(0) > +#endif /* SAFEMALLOC */ > + > +typedef void (*MALLOC_SIZE_CB) (long long size, myf my_flags); > +extern void set_malloc_size_cb(MALLOC_SIZE_CB func); I don't think you need many of these defines here. my_sys.h is the interface header. MALLOC_STORE_SIZE and MALLOC_FIX_POINTER_FOR_FREE (for example, I didn't check the others) are only used in my_malloc.c - they should not be used elsewhere, and thus, I think, they should be defined in my_malloc.c And, frankly speaking, I wouldn't bother doing two different ways of storing the size and the flags. my_malloc can always allocate one sizeof(double) more and store its data there. Who cares that in the debug builds safemalloc duplicates that... > /* defines when allocating data */ > extern void *my_malloc(size_t Size,myf MyFlags); > extern void *my_multi_malloc(myf MyFlags, ...); > @@ -323,6 +361,7 @@ typedef struct st_dynamic_array > uint elements,max_element; > uint alloc_increment; > uint size_of_element; > + myf malloc_flags; hm. you've preserved the MEM_ROOT structure for compatibiity reasons, but you've changed DYNAMIC_ARRAY. Why? I'd expect you either to change everything or to change nothing. Besides, what's the point of preserving MEM_ROOT, if you change prototypes of functions that work with it? > } DYNAMIC_ARRAY; > > typedef struct st_my_tmpdir > @@ -768,16 +807,13 @@ extern my_bool real_open_cached_file(IO_ > extern void close_cached_file(IO_CACHE *cache); > File create_temp_file(char *to, const char *dir, const char *pfx, > int mode, myf MyFlags); > -#define my_init_dynamic_array(A,B,C,D) init_dynamic_array2(A,B,NULL,C,D) > -#define my_init_dynamic_array_ci(A,B,C,D) init_dynamic_array2(A,B,NULL,C,D) > -#define my_init_dynamic_array2(A,B,C,D,E) init_dynamic_array2(A,B,C,D,E) > -#define my_init_dynamic_array2_ci(A,B,C,D,E) init_dynamic_array2(A,B,C,D,E) > +#define my_init_dynamic_array(A,B,C,D,E) init_dynamic_array2(A,B,NULL,C,D,E) > +#define my_init_dynamic_array_ci(A,B,C,D,E) init_dynamic_array2(A,B,NULL,C,D,E) > +#define my_init_dynamic_array2(A,B,C,D,E,F) init_dynamic_array2(A,B,C,D,E,F) > +#define my_init_dynamic_array2_ci(A,B,C,D,E,F) init_dynamic_array2(A,B,C,D,E,F) as you're changing function prototypes anyway, you can remove the _ci functions and all other "preserve the ABI" garbage, > extern my_bool init_dynamic_array2(DYNAMIC_ARRAY *array, uint element_size, > void *init_buffer, uint init_alloc, > - uint alloc_increment); > -/* init_dynamic_array() function is deprecated */ > -extern my_bool init_dynamic_array(DYNAMIC_ARRAY *array, uint element_size, > - uint init_alloc, uint alloc_increment); > + uint alloc_increment, myf my_flags); > extern my_bool insert_dynamic(DYNAMIC_ARRAY *array, const uchar * element); > extern uchar *alloc_dynamic(DYNAMIC_ARRAY *array); > extern uchar *pop_dynamic(DYNAMIC_ARRAY*); > @@ -829,7 +865,8 @@ extern void my_free_lock(void *ptr); > #define ALLOC_ROOT_MIN_BLOCK_SIZE (MALLOC_OVERHEAD + sizeof(USED_MEM) + 8) > #define clear_alloc_root(A) do { (A)->free= (A)->used= (A)->pre_alloc= 0; (A)->min_malloc=0;} while(0) > extern void init_alloc_root(MEM_ROOT *mem_root, size_t block_size, > - size_t pre_alloc_size); > + size_t pre_alloc_size, myf my_flags); > +extern void alloc_root_set_min_malloc(MEM_ROOT *mem_root, size_t min_malloc); if you'd store the flag in the lowest bit of the block_size or if you wouldn't try to keep the old MEM_ROOT structure, this function wouldn't be needed > extern void *alloc_root(MEM_ROOT *mem_root, size_t Size); > extern void *multi_alloc_root(MEM_ROOT *mem_root, ...); > extern void free_root(MEM_ROOT *root, myf MyFLAGS); > === modified file 'include/my_tree.h' > --- include/my_tree.h 2011-11-03 18:17:05 +0000 > +++ include/my_tree.h 2013-01-15 21:00:33 +0000 > @@ -68,13 +68,15 @@ typedef struct st_tree { > MEM_ROOT mem_root; > my_bool with_delete; > tree_element_free free; > + myf malloc_flags; > uint flag; > } TREE; > > /* Functions on whole tree */ > void init_tree(TREE *tree, size_t default_alloc_size, size_t memory_limit, > int size, qsort_cmp2 compare, my_bool with_delete, > - tree_element_free free_element, void *custom_arg); > + tree_element_free free_element, void *custom_arg, > + myf my_flags_for_malloc); I'd just call it my_flags. it doesn't matter what tree is doing with them. and there may be other flags too. E.g. you could've removed 'with_delete' and added MY_TREE_WITH_DELETE > void delete_tree(TREE*); > void reset_tree(TREE*); > > === modified file 'include/mysql.h' > --- include/mysql.h 2012-11-20 14:24:39 +0000 > +++ include/mysql.h 2013-01-15 21:00:33 +0000 > @@ -167,7 +167,7 @@ enum mysql_option > MYSQL_OPT_GUESS_CONNECTION, MYSQL_SET_CLIENT_IP, MYSQL_SECURE_AUTH, > MYSQL_REPORT_DATA_TRUNCATION, MYSQL_OPT_RECONNECT, > MYSQL_OPT_SSL_VERIFY_SERVER_CERT, MYSQL_PLUGIN_DIR, MYSQL_DEFAULT_AUTH, > - MYSQL_PROGRESS_CALLBACK, > + MYSQL_PROGRESS_CALLBACK, MYSQL_THREAD_SPECIFIC_MALLOC, I don't think you need this on the client side. (the only use case is federated, and I don't think you need to extend the client API specifically for it, federated can set whatever it needs directliy). > /* MariaDB options */ > MYSQL_OPT_NONBLOCK=6000 > }; > === modified file 'mysys/array.c' > --- mysys/array.c 2012-01-13 14:50:02 +0000 > +++ mysys/array.c 2013-01-15 21:00:33 +0000 > @@ -62,17 +64,19 @@ my_bool init_dynamic_array2(DYNAMIC_ARRA > should not throw an error > */ > if (init_alloc && > - !(array->buffer= (uchar*) my_malloc(element_size*init_alloc, MYF(0)))) > + !(array->buffer= (uchar*) my_malloc(element_size*init_alloc, > + MYF(my_flags)))) > array->max_element=0; > DBUG_RETURN(FALSE); > } > > my_bool init_dynamic_array(DYNAMIC_ARRAY *array, uint element_size, > - uint init_alloc, uint alloc_increment) > + uint init_alloc, uint alloc_increment, > + myf my_flags) > { > /* placeholder to preserve ABI */ > return my_init_dynamic_array_ci(array, element_size, init_alloc, You can remove this, as you're changing the ABI anyway > - alloc_increment); > + alloc_increment, my_flags); > } > /* > Insert element at the end of array. Allocate memory if needed. > === modified file 'mysys/my_alloc.c' > --- mysys/my_alloc.c 2012-01-13 14:50:02 +0000 > +++ mysys/my_alloc.c 2013-01-15 21:00:33 +0000 > @@ -41,16 +48,20 @@ > Altough error can happen during execution of this function if > pre_alloc_size is non-0 it won't be reported. Instead it will be > reported as error in first alloc_root() on this memory root. > + > + We don't want to change the api for MEMROOT. > + Becasue of this, we store in min_malloc of my_thread_specific is set. > */ > > void init_alloc_root(MEM_ROOT *mem_root, size_t block_size, > - size_t pre_alloc_size __attribute__((unused))) > + size_t pre_alloc_size __attribute__((unused)), > + myf my_flags) > { > DBUG_ENTER("init_alloc_root"); > DBUG_PRINT("enter",("root: 0x%lx", (long) mem_root)); > > mem_root->free= mem_root->used= mem_root->pre_alloc= 0; > - mem_root->min_malloc= 32; > + mem_root->min_malloc= 32 | (test(my_flags & MY_THREAD_SPECIFIC) << 16); it'd be safer to put it in the lowest bit. min_malloc or - particularly - block size, are always even numbers, the lowest bit is always 0. or to add a new field to the MEM_ROOT, as compatibility is already broken anyway. > mem_root->block_size= block_size - ALLOC_ROOT_MIN_BLOCK_SIZE; > mem_root->error_handler= 0; > mem_root->block_num= 4; /* We shift this with >>2 */ > === modified file 'mysys/my_malloc.c' > --- mysys/my_malloc.c 2012-01-13 14:50:02 +0000 > +++ mysys/my_malloc.c 2013-01-15 21:00:33 +0000 > @@ -18,6 +18,31 @@ > #include "mysys_err.h" > #include <m_string.h> > > +MALLOC_SIZE_CB malloc_size_cb_func= NULL; > + > +/** > + Inform appliction that memory usage has changed application > + > + @param size Size of memory segment allocated or freed > + @param flag 1 if thread specific (allocated by MY_THREAD_SPECIFIC), > + 0 if system specific. > + > + The type os size is long long, to be able to handle negative numbers to > + decrement the memory usage > +*/ > + > +static void update_malloc_size(long long size, myf my_flags) > +{ > + if (malloc_size_cb_func) > + malloc_size_cb_func(size, my_flags); > +} > + > +void set_malloc_size_cb(MALLOC_SIZE_CB func) > +{ > + malloc_size_cb_func= func; > +} > + > + > /** > Allocate a sized block of memory. > > === modified file 'mysys/mysys_priv.h' > --- mysys/mysys_priv.h 2011-12-12 21:58:24 +0000 > +++ mysys/mysys_priv.h 2013-01-15 21:00:33 +0000 > @@ -70,12 +70,13 @@ extern PSI_file_key key_file_charset, ke > #endif /* HAVE_PSI_INTERFACE */ > > #ifdef SAFEMALLOC > -void *sf_malloc(size_t size); > -void *sf_realloc(void *ptr, size_t size); > +void *sf_malloc(size_t size, myf my_flags); > +void *sf_realloc(void *ptr, size_t size, myf my_flags); > void sf_free(void *ptr); > +size_t sf_malloc_usable_size(void *ptr, myf *my_flags); this is used nowhere. dead code again. > #else > -#define sf_malloc(X) malloc(X) > -#define sf_realloc(X,Y) realloc(X,Y) > +#define sf_malloc(X,Y) malloc(X) > +#define sf_realloc(X,Y,Z) realloc(X,Y) > #define sf_free(X) free(X) > #endif > > === modified file 'mysys/safemalloc.c' > --- mysys/safemalloc.c 2012-04-18 01:29:26 +0000 > +++ mysys/safemalloc.c 2013-01-15 21:00:33 +0000 > @@ -79,11 +81,21 @@ static int bad_ptr(const char *where, vo > static void free_memory(void *ptr); > static void sf_terminate(); > > +/* Setup default call to get a thread id for the memory */ > + > +my_thread_id default_sf_malloc_dbug_id(void) > +{ > + return my_thread_dbug_id(); > +} > + > +my_thread_id (*sf_malloc_dbug_id)(void)= default_sf_malloc_dbug_id; Why do you need different functions for getting the thread id? > + > + > /** > allocates memory > */ > > -void *sf_malloc(size_t size) > +void *sf_malloc(size_t size, myf my_flags) > { > struct st_irem *irem; > uchar *data; > === modified file 'sql/event_scheduler.cc' > --- sql/event_scheduler.cc 2012-09-27 23:06:56 +0000 > +++ sql/event_scheduler.cc 2013-01-15 21:00:33 +0000 > @@ -182,12 +180,15 @@ deinit_event_thread(THD *thd) > void > pre_init_event_thread(THD* thd) > { > + THD *orig_thd= current_thd; > DBUG_ENTER("pre_init_event_thread"); > + > + my_pthread_setspecific_ptr(THR_THD, thd); I'd rather see set_current_thd(thd) here: #define set_current_thd(X) my_pthread_setspecific_ptr(THR_THD, (X)) > thd->client_capabilities= 0; > thd->security_ctx->master_access= 0; > thd->security_ctx->db_access= 0; > thd->security_ctx->host_or_ip= (char*)my_localhost; > - my_net_init(&thd->net, NULL); > + my_net_init(&thd->net, NULL, MY_THREAD_SPECIFIC); > thd->security_ctx->set_user((char*)"event_scheduler"); > thd->net.read_timeout= slave_net_timeout; > thd->variables.option_bits|= OPTION_AUTO_IS_NULL; > === modified file 'sql/handler.cc' > --- sql/handler.cc 2012-12-17 00:49:19 +0000 > +++ sql/handler.cc 2013-01-15 21:00:33 +0000 > @@ -5395,6 +5395,7 @@ fl_log_iterator_buffer_init(struct handl > iterator->buffer= buff; > iterator->next= &fl_log_iterator_next; > iterator->destroy= &fl_log_iterator_destroy; > + my_dir_end(dirp); Why did we never see memory leaks because of that? > return HA_ITERATOR_OK; > } > > === modified file 'sql/mysqld.cc' > --- sql/mysqld.cc 2013-01-09 03:34:33 +0000 > +++ sql/mysqld.cc 2013-01-15 21:00:33 +0000 > @@ -2606,9 +2607,10 @@ bool one_thread_per_connection_end(THD * > > /* It's safe to broadcast outside a lock (COND... is not deleted here) */ > DBUG_PRINT("signal", ("Broadcasting COND_thread_count")); > + mysql_cond_broadcast(&COND_thread_count); > + > DBUG_LEAVE; // Must match DBUG_ENTER() > my_thread_end(); > - mysql_cond_broadcast(&COND_thread_count); why? > > pthread_exit(0); > return 0; // Avoid compiler warnings > @@ -7024,6 +7096,7 @@ SHOW_VAR status_vars[]= { > {"Key", (char*) &show_default_keycache, SHOW_FUNC}, > {"Last_query_cost", (char*) offsetof(STATUS_VAR, last_query_cost), SHOW_DOUBLE_STATUS}, > {"Max_used_connections", (char*) &max_used_connections, SHOW_LONG}, > + {"Memory_used", (char*) &total_memory_used, SHOW_LONGLONG}, May be this should rather be: SHOW SESSION STATUS -> thd->memory_used SHOW GLOBAL STATUS -> total_memory_used (which means that total_memory_used/memory_used should be in the status_var struct) > {"Not_flushed_delayed_rows", (char*) &delayed_rows_in_use, SHOW_LONG_NOFLUSH}, > {"Open_files", (char*) &my_file_opened, SHOW_LONG_NOFLUSH}, > {"Open_streams", (char*) &my_stream_opened, SHOW_LONG_NOFLUSH}, > === modified file 'sql/net_serv.cc' > --- sql/net_serv.cc 2012-02-28 17:53:05 +0000 > +++ sql/net_serv.cc 2013-01-15 21:00:33 +0000 > @@ -139,6 +140,7 @@ my_bool my_net_init(NET *net, Vio* vio) > net->net_skip_rest_factor= 0; > net->last_errno=0; > net->unused= 0; > + net->thread_specific_malloc= test(my_flags & MY_THREAD_SPECIFIC); why do you need that in NET ? > > if (vio != 0) /* If real connection */ > { > === modified file 'sql/sql_analyse.h' > --- sql/sql_analyse.h 2011-11-03 18:17:05 +0000 > +++ sql/sql_analyse.h 2013-01-15 21:00:33 +0000 > @@ -121,7 +121,8 @@ class field_str :public field_info > must_be_blob(0), was_zero_fill(0), > was_maybe_zerofill(0), can_be_still_num(1) > { init_tree(&tree, 0, 0, sizeof(String), (qsort_cmp2) sortcmp2, > - 0, (tree_element_free) free_string, NULL); }; > + 0, (tree_element_free) free_string, NULL, > + MY_THREAD_SPECIFIC); }; Now, when I'm thinking of it, I got the idea that it's better to have MY_THREAD_SPECIFIC as a default, and add MY_GLOBAL_ALLOC instead to mark global allocations explicitly. It's more future-proof. When one copies init_dynamic_array() from somewhere and zeros out unknown or unneeded parameters, the memory will be per-thread by default. And if that's wrong, the test suite will find it, you have an assert for this. To mark memory global as an explicit action, one needs to think about it. Currently, the memory is global by default, and if one doesn't think,# copy-pastes, and leaves new allocation global, it will not be noticed, the memory will simply be unaccounted for. > > void add(); > void get_opt_type(String*, ha_rows); > === modified file 'sql/sql_error.cc' > --- sql/sql_error.cc 2012-01-13 14:50:02 +0000 > +++ sql/sql_error.cc 2013-01-15 21:00:33 +0000 > @@ -457,25 +457,37 @@ Diagnostics_area::disable_status() > m_status= DA_DISABLED; > } > > -Warning_info::Warning_info(ulonglong warn_id_arg, bool allow_unlimited_warnings) > +Warning_info::Warning_info(ulonglong warn_id_arg, > + bool allow_unlimited_warnings, bool initialize) I'd rather add a special constructor for Warning_info that THD would use. Because there's only one place where you need initialize=false, and all other "normal" uses, should not suffer (be changed) because of that. > :m_statement_warn_count(0), > m_current_row_for_warning(1), > m_warn_id(warn_id_arg), > m_allow_unlimited_warnings(allow_unlimited_warnings), > m_read_only(FALSE) > { > - /* Initialize sub structures */ > - init_sql_alloc(&m_warn_root, WARN_ALLOC_BLOCK_SIZE, WARN_ALLOC_PREALLOC_SIZE); > m_warn_list.empty(); > bzero((char*) m_warn_count, sizeof(m_warn_count)); > + if (initialize) > + init(); > } > Regards, Sergei

2 2

Re: [Maria-developers] LP:1024058 - mysqld XA crash in replication slave
by Kristian Nielsen 18 Jan '13

18 Jan '13

"Sergei Golubchik (JIRA)" <jira(a)mariadb.atlassian.net> writes: > [ https://mariadb.atlassian.net/browse/MDEV-633?page=com.atlassian.jira.plugi… ] [Taking this to maria-developers@, which is better for discussions like this] > The problem here is very simple to explain. The server can use either > mmap-based transaction coordinator for 2PC or a binary log. 2PC always uses > binary log, if binary logging is enabled. But even if it is enabled > globally, it is usually disabled in the replication slave thread unless > --log-slave-updates is specified. > > One would probably get the same crash without replication, if one disables > binary log manually with SET SQL_LOG_BIN=0; > > Possible fixes: > * auto-enable binary log for 2PC transactions (bad, binlog will contain > unwanted DDL and DML events). > * abort 2PC transactions if binary log is locally disabled for this thread > (worse, too easy to break the replication) > * write Xid events to binlog even if binary log is locally disabled (best?) > > The last approach seems to be preferable. But in the future if we'll start > recovering transactions from the binary log (doing only one sync per 2PC > transaction), we'll have this problem again, because then we'll need the > actual changes to be logged, not just the Xid. > > [~knielsen] - opinion? First, I do not think SQL_LOG_BIN=0 or binlog enabled without --log-slave-updates are common or very useful. And multi-engine transactions are problematic in several ways. But we should not crash, of course. I would find it acceptable to simply call each engine's commit method separately, without 2PC, especially in stable 5.5 and below. Note that we now have @@skip_replication, which can be used to log a transaction to binlog without replicating it, similar to @@sql_log_bin but maintaining 2PC and recovery. If we really want, we could fix it 10.0 so that 2PC works even for binlog disabled. As you say, probably writing the Xid to the binlog without any associated DML events would be a reasonable way to do this. I would suggest logging a separate new event to not confuse things with XID_LOG_EVENT, which implies a commit of a prior transactions (there are facilities in 10.0 to rewrite such event to a dummy event so we will not break old slaves). With respect to a future recovering transactions from the binary log, it is actually in any case very hard to recover a multi-engine transaction in this way. The problem is when one engine is missing the transaction but the other engine has it, and the event is logged with statement-based replication. But we could simply not do this optimisation for multi-engine transactions, which are really not very commonly used (for multi-engine transactions we would keep the fsync() inside prepare()). Then things should work, for the normal case as well as for binlog locally disabled. So in summary, my immediate opinion would be: - In 5.5, commit each engine individually without 2PC when binlog is locally disabled. - In 10.0, optionally binlog just a new XID_RECOVERY_LOG_EVENT and use it in recovery if binlog is locally disabled. Hope this helps, - Kristian.

1 0

Re: [Maria-developers] Structure of Binary Log
by Kristian Nielsen 16 Jan '13

16 Jan '13

Nuno Rebocho <rebocho.nmg(a)portugalmail.pt> writes: > The attractive hypothesis, develop a data recovery tool, never left my > mind. I've been trying to learn together as a "Maria developers" team > can get information about the structure of the log file from MySql to > navigate this and obtain information needed for recovery. I am not aware of any definitive manual for the MySQL/MariaDB binary log, however there are various resources out there. For example, I wrote two articles that give an overview of the events in the log: http://kristiannielsen.livejournal.com/13382.html http://kristiannielsen.livejournal.com/13253.html My articles on group commit on the same blog would give you lots on information on how the binary log is written. Google can probably find other articles by other authors that might be useful. Otherwise, the source code in the definite source of information. The binary log code is in sql/log.cc. The code for creating and parsing events is in sql/log_event.cc. I noticed you are now on the maria-developers@ mailing list, this will be a good place to ask concrete questions about the binary log. Good luck with your project! - Kristian.

1 0

Re: [Maria-developers] 答复: in-order commit
by Kristian Nielsen 16 Jan '13

16 Jan '13

丁奇 <dingqi.lxb(a)taobao.com> writes: > Hi, Kristian > Ok. I have got the information from JIRA. > > I find you control the commit order inside the user thread. > > Will it be easier to let Trans_worker thread hold this logic? Yes, I think you are right. Of course, the user thread is the one that knows the ordering, but the logic for waiting needs to be in the Trans_worker thread. In fact this is a bug in my first patch: Transaction T3 could wait for the THD of worker thread 1 which has both T1 and T2 queued; then it will wake up too early, when T1 commits rather than when T2 does. I will try to implement the new idea today. > After they have done the execution of one transaction, "register the transaction and wait" if there are transactions from other workers should be commited ahead. > After commit in one worker, wake up another worker, the worker who wait for the next "head of commitee" should be woken up. Right, I'll need to look into this a bit deeper. Actually, in my patch the actual wait and wakeup happens inside ha_commit_trans(), and there is a reason for this. Because eventually I want to do it inside tc_log->log_and_order(), which is called from ha_commit_trans(). Here is how a commit happens: InnoDB prepare step fsync() InnoDB redo log (*A) TC_LOG_BINLOG::log_and_order Write transaction to binlog fsync() binlog (*B) InnoDB commit_ordered() (*C) Write commit record to InnoDB redo log InnoDB commit step The steps (*A) and (*B) are slow, typically around 1-10 milliseconds depending on disk system. So we need many threads to commit in parallel and reach points (*A) and (*B) at the same time, so we only need to do the fsync() once for many threads. This is group commit. Thus for in-order parallel replication, we must not do the wait for the previous commit before the (*B) step. Because if we do, then it becomes impossible for two transactions to be at point (*B) at the same time, and group commit is impossible. On the other hand, point (*C) is where the commit order is determined. So if we do the wait after point (*C), then we cannot enforce that T1 commits before T2. So therefore, the wait must happen exactly around point (B) and (C), inside TC_LOG_BINLOG::log_and_order(). That is why I invented all the register_wait_for_prior_commit() and so on: so that log_and_order() has somewhere to look for exactly who is waiting for who. Then if T2 is waiting for T1 to commit, we can do steps (*B) and (*C) for both of them together, achiving both group commit and in-order parallel replication. Anyway, I just wanted to mention this, I know it will be difficult to understand fully from just this description. This is something that I have been planning to have for years, but I still need to show some real code that actually works. If I manage that, hopefully things will be clearer. (If not - then I need to think again ;-) Thanks, - Kristian.

1 2