9 Jun
2011
9 Jun
'11
5:14 p.m.
Hi! >>>>> "Sergei" == Sergei Golubchik <serg@askmonty.org> writes: Sergei> Hi, erkan! Sergei> On May 30, erkan yanar wrote: >> >> Moin, >> I did some tmp-table tests. As it is told MariaDB is faster on >> tmp-tables because of the Aria-Engine. Sergei> I've tried that, but without mysqlslap. Your table structure, few times more rows, Sergei> mysql-5.1, mariadb-5.2 with aria tmp tables, mariadb-5.2 with myisam tmp Sergei> tables. I didn't try mysql-5.5. I used a query like Sergei> SELECT id, tea from t1 group by left(id,1) order by null; Sergei> where "group by left(id,1)" makes sure the number of groups is small Sergei> (only ten), so the result set is small and the query doesn't spend most Sergei> of its time sending thousands of result rows. And "order by null" Sergei> removes filesort from the query plan. Sergei> I've got almost identical numbers (which is a bit scary, considering Sergei> different setup, and different queries). Sergei> MariaDB with myisam tmp tables performed similar to MySQL (1.2-1.5 sec) Sergei> MariaDB with aria tables was slower (5.1-5.7 sec). Sergei> I remember that few months ago Monty fixed a slowdown caused by Aria Sergei> temp tables. May be this fix didn't get into 5.2? Or you've found a Sergei> different problem? I'll let Monty comment on it - he was already Sergei> debugging this problem once. I have now analysed this in detail. I used sergis test table: CREATE TABLE `t1` ( `id` int(11) DEFAULT NULL, `tea` text ) ENGINE=MyISAM DEFAULT CHARSET=latin1; I repeated then the following query until I got 2097152 rows. insert t1 select rand()*2e8, repeat(rand(), rand()*64) from t1; Then I issued the following queries. SELECT id, tea from t1 group by left(id,1) order by null; SELECT id, count(*), tea from t1 group by left(id,1) order by null The results I got was (for both queries): Aria(8K) row_format=page: 8.56 sec & 6.06 Aria(8K) row_format=dynamic: 3.19 & 13.32 Aria(1K) row_format=page: 6.40 & 4.88 Aria(1K) row_format=dynamic: 2.02 & 13.32 Aria(2K) row_format=page: 6.40 & 4.88 Aria(2K) row_format=dynamic: 2.26 & 13.51 MyISAM: 2.17, 12.89 The number after Aria is the block size (with mysqld --aria-block-size=). The 'dynamic' test was done by patching mysqld to use the dynamic page format instead of 'page' which is default. Dynamic is what MYISAM is using; If the page size would be the same then Aria and MyISAM should perform identically. >From the above you can see - That page size affects the query time notable in this case. This is because there is more data to move back/from the page cache. The bigger page cache however allows longer keys and fewer index levels so for bigger data sets the different should be smaller. It's possible to in the future optimize Aria to not copy pages from the page cache also for index writes and then this difference should disappear. - Aria is 2x faster than MyISAM when you add 'count(*)' to the query. (More about this later). I also tried the change the query to use left(id,2) to get more groups. Then the result was: Aria(8K) row_format=page: 10.21 & 7.96 Aria(8K) row_format=dynamic: 4.85 & 15.47 MyISAM: 4.04 & 15.14 Here one can see that compared to dynamic, the better code in Aria negates the page cache a bit for dynamic. row_format=page is still slower for the simpler query. I then tried by using group on an integer: SELECT id, tea from t1 group by id % 100 order by null; SELECT id, count(*), tea from t1 group by id % 100 order by null; Aria(8K) row_format=page: 9.83 & 10.23 Aria(8K) row_format=dynamic: 7.19 & 17.42 MyISAM: 6.28 & 17.00 Results is still similar. I examined the code and then found out why Aria was slower for the simpler query. MariaDB has an optimization that when you don't have any summary functions in the query it will just write to the group temp table and let duplicate key handling remove the duplicates. With the 'page' format however this is not a good idea as this is optimized for the case when there is few duplicates and it starts by writing the row, then the keys. What happens is that as almost all rows are duplicates we first insert the row, then notice the duplication and then delete the row. Fortunately it's not common that you have group by queries without a sum function and that's why we have not noticed this before. I have now fixed this in 5.2 so that if there is no summary function we are using the dynamic row format for aria. (It's not trivial to fix the way page format works by first writing the row and this is not something I would recommended to done in a stable release). After this (one line change) the results are: First queries LEFT(id,1) : 3.08 & 6.24 Second LEFT(id,2)): 4.87 & 8.20 Third (id % 10): 7.10 & 10.38 We are still a little bit slower than MyISAM for the simpler, not common queries, (which can be fixed by running with a block size of 2K if this would be critical for anyone) However for queries with sum functions we are about 50 % faster than MyISAM. Erkan, thanks for finding this issue! Regards, Monty