[Maria-developers] an observation
Hi, by playing with the code I think I found something interesting. My environment: MariaDB 10.0.10, MyISAM-engine I played with a table-scan, no index is defined on this table. When I execute a SQL-statement that forces the server to do a second table- scan on a table this 2nd table-scan will be slow. The reason for this behaviour is the usage of a buffer: during the 1st scan this buffer is filled, used and filled again until the whole table is processed. At the end of the 1st scan it contains the last bytes of the file. When a 2nd scan is started the reading of the table starts from the beginning of the file but the buffer and all associated variables are not reset: the buffer still contains the bytes from the end of the file, the request cannot be fulfilled by the buffer so the request has to be handled by reading the bytes directly from the file using the read()-function of the Std-library. This takes much more time then simply copying the bytes from the internal buffer. My idea is: somewhere in the code this situation must be detected and the buffer (and all associated variables) reset to initial values. rein it_io_cache() looks like the right candidate for this. I assume the engine Maria will show the same effect (I didn't check other engines). Some questions: is my observation correct? Do you think this needs to be fixed? Or is there no further development on this part of the code? Thanks Regards AugustQ PS: you will find more details on this topic here: http://augustq.blogs pot.com/2017/01/subselect-execution-3.html
hum... did you checked only linux version, or windows too? could you provide a SQL example? 2017-01-29 14:30 GMT-02:00 AugustQ <augustq@gmx.net>:
Hi,
by playing with the code I think I found something interesting.
My environment: MariaDB 10.0.10, MyISAM-engine
I played with a table-scan, no index is defined on this table. When I execute a SQL-statement that forces the server to do a second table-scan on a table this 2nd table-scan will be slow.
The reason for this behaviour is the usage of a buffer: during the 1st scan this buffer is filled, used and filled again until the whole table is processed. At the end of the 1st scan it contains the last bytes of the file. When a 2nd scan is started the reading of the table starts from the beginning of the file but the buffer and all associated variables are not reset: the buffer still contains the bytes from the end of the file, the request cannot be fulfilled by the buffer so the request has to be handled by reading the bytes directly from the file using the read()-function of the Std-library. This takes much more time then simply copying the bytes from the internal buffer.
My idea is: somewhere in the code this situation must be detected and the buffer (and all associated variables) reset to initial values. reinit_io_cache() looks like the right candidate for this.
I assume the engine Maria will show the same effect (I didn't check other engines).
Some questions: is my observation correct? Do you think this needs to be fixed? Or is there no further development on this part of the code?
Thanks
Regards AugustQ
PS: you will find more details on this topic here: http://augustq.blogspot.com/2017/01/subselect-execution-3.html
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Roberto Spadim SPAEmpresarial - Software ERP Eng. Automação e Controle
Hi Robert, #1: OS I've only tested on Linux. But I was using the source-code of MariaDB so I assume the effect will happen on Windows too. #2: SQL-statement this is the statement I've used for my test: MariaDB [TestOpt]> select SQL_NO_CACHE B.PZN, B.ArtikelText from TestBig B where exists ( select 1 from TestSmall A where A.Hersteller = '00020' and A.PZN = B.PZN and Id = 1) and B.Hersteller = '36367'; It's a silly statement but it demonstrates the effect. You will find more details in my text: http://augustq.blogspot.com/2017 /01/subselect-execution-3.html #3: optimizer_switch here it is (polished it a bit): MariaDB [(none)]> select @@optimizer_switch; +-----------------------------------------------+ | @@optimizer_switch | +-----------------------------------------------+ | index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on,materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=off, table_elimination=on, extended_keys=on, exists_to_in=off | +-----------------------------------------------+ 1 row in set (0.00 sec) MariaDB [(none)]> I hope I did not introduce any errors. AugustQ Am Sonntag, den 29.01.2017, 15:22 -0200 schrieb Roberto Spadim:
hum... did you checked only linux version, or windows too? could you provide a SQL example?
2017-01-29 14:30 GMT-02:00 AugustQ <augustq@gmx.net>:
Hi,
by playing with the code I think I found something interesting.
My environment: MariaDB 10.0.10, MyISAM-engine
I played with a table-scan, no index is defined on this table. When I execute a SQL-statement that forces the server to do a second table-scan on a table this 2nd table-scan will be slow.
The reason for this behaviour is the usage of a buffer: during the 1st scan this buffer is filled, used and filled again until the whole table is processed. At the end of the 1st scan it contains the last bytes of the file. When a 2nd scan is started the reading of the table starts from the beginning of the file but the buffer and all associated variables are not reset: the buffer still contains the bytes from the end of the file, the request cannot be fulfilled by the buffer so the request has to be handled by reading the bytes directly from the file using the read()- function of the Std-library. This takes much more time then simply copying the bytes from the internal buffer.
My idea is: somewhere in the code this situation must be detected and the buffer (and all associated variables) reset to initial values. reinit_io_cache() looks like the right candidate for this.
I assume the engine Maria will show the same effect (I didn't check other engines).
Some questions: is my observation correct? Do you think this needs to be fixed? Or is there no further development on this part of the code?
Thanks
Regards AugustQ
PS: you will find more details on this topic here: http://augustq.blogspot.com/2017/01/subselect-execution-3.html
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hi, AugustQ! On Jan 29, AugustQ wrote:
Hi,
by playing with the code I think I found something interesting.
My environment: MariaDB 10.0.10, MyISAM-engine
I played with a table-scan, no index is defined on this table. When I execute a SQL-statement that forces the server to do a second table- scan on a table this 2nd table-scan will be slow.
The reason for this behaviour is the usage of a buffer: during the 1st scan this buffer is filled, used and filled again until the whole table is processed. At the end of the 1st scan it contains the last bytes of the file. When a 2nd scan is started the reading of the table starts from the beginning of the file but the buffer and all associated variables are not reset: the buffer still contains the bytes from the end of the file, the request cannot be fulfilled by the buffer so the request has to be handled by reading the bytes directly from the file using the read()-function of the Std-library. This takes much more time then simply copying the bytes from the internal buffer.
Right... MyISAM does not know how you're going to access the table. It might be a second full table scan. Or may be you'll just want to read the end of the table?
My idea is: somewhere in the code this situation must be detected and the buffer (and all associated variables) reset to initial values. reinit_io_cache() looks like the right candidate for this.
How would that help? You'll get faster execution if MyISAM would preload first pages of the table. But it doesn't know you're going to do a full table scan, so why would it preload it? Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei, thnkas for your reply. In my tests I found that the first table-scan takes about 12 seconds, the second table-scan takes about 25 seconds (and all folowing) scans too). When I call the function reinit_io_cache() all scans take about 12 seonds. But my implementation was quick and dirty.... I found the reason for this behaviour in the function _mi_read_cache() where the first if()-statement was false in the case of the first scan and true for all other scans.My conclusion was that the IO_CACHE-struct ure had to be reset when the second scan begins. And this seems to work. Regards AugustQ Am Montag, den 30.01.2017, 12:27 +0100 schrieb Sergei Golubchik:
Hi, AugustQ!
On Jan 29, AugustQ wrote:
Hi,
by playing with the code I think I found something interesting.
My environment: MariaDB 10.0.10, MyISAM-engine
I played with a table-scan, no index is defined on this table. When I execute a SQL-statement that forces the server to do a second table- scan on a table this 2nd table-scan will be slow.
The reason for this behaviour is the usage of a buffer: during the 1st scan this buffer is filled, used and filled again until the whole table is processed. At the end of the 1st scan it contains the last bytes of the file. When a 2nd scan is started the reading of the table starts from the beginning of the file but the buffer and all associated variables are not reset: the buffer still contains the bytes from the end of the file, the request cannot be fulfilled by the buffer so the request has to be handled by reading the bytes directly from the file using the read()-function of the Std-library. This takes much more time then simply copying the bytes from the internal buffer.
Right... MyISAM does not know how you're going to access the table. It might be a second full table scan. Or may be you'll just want to read the end of the table?
My idea is: somewhere in the code this situation must be detected and the buffer (and all associated variables) reset to initial values. reinit_io_cache() looks like the right candidate for this.
How would that help? You'll get faster execution if MyISAM would preload first pages of the table. But it doesn't know you're going to do a full table scan, so why would it preload it?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei, coming back to my observation. With the text-file I already sent you in another thread here you can also try to reproduce the effect I described here. Here is how: - modify the INSERT INTO TestBig-statement so that it will include much more records into the table - execute the statement and check the code I mentioned Please keep in mind that the statement here is a different one (not the same statement as in the other thread). And please keep in mind that the effect only happens when the data-file for the table TestBig has a size >128K. You can simply double the lines with the INSERT-statement (and double again as the current statement creates a file of approx. 880 bytes in size only). You can also execute the statement again and again as there are no constraints on the table. hope that helps. AugustQ Am Montag, den 30.01.2017, 12:27 +0100 schrieb Sergei Golubchik:
Hi, AugustQ!
On Jan 29, AugustQ wrote:
Hi,
by playing with the code I think I found something interesting.
My environment: MariaDB 10.0.10, MyISAM-engine
I played with a table-scan, no index is defined on this table. When I execute a SQL-statement that forces the server to do a second table- scan on a table this 2nd table-scan will be slow.
The reason for this behaviour is the usage of a buffer: during the 1st scan this buffer is filled, used and filled again until the whole table is processed. At the end of the 1st scan it contains the last bytes of the file. When a 2nd scan is started the reading of the table starts from the beginning of the file but the buffer and all associated variables are not reset: the buffer still contains the bytes from the end of the file, the request cannot be fulfilled by the buffer so the request has to be handled by reading the bytes directly from the file using the read()-function of the Std-library. This takes much more time then simply copying the bytes from the internal buffer.
Right... MyISAM does not know how you're going to access the table. It might be a second full table scan. Or may be you'll just want to read the end of the table?
My idea is: somewhere in the code this situation must be detected and the buffer (and all associated variables) reset to initial values. reinit_io_cache() looks like the right candidate for this.
How would that help? You'll get faster execution if MyISAM would preload first pages of the table. But it doesn't know you're going to do a full table scan, so why would it preload it?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei, another tip: instead of increasing the size of the data-file to >128K you can also decrease the size of the buffer. I did something similar for a test and (if I remember it correctly) it's done in sys/sysvars.cc in the var. Sys_read_buff_size. I've set the entry for DEFAULT to 4K and played with this. In this case you will only need a data-file of size >4K. Hope this helps. AugustQ Am Dienstag, den 21.02.2017, 16:47 +0100 schrieb AugustQ:
Hi Sergei,
coming back to my observation.
With the text-file I already sent you in another thread here you can also try to reproduce the effect I described here.
Here is how: - modify the INSERT INTO TestBig-statement so that it will include much more records into the table
- execute the statement and check the code I mentioned
Please keep in mind that the statement here is a different one (not the same statement as in the other thread).
And please keep in mind that the effect only happens when the data- file for the table TestBig has a size >128K. You can simply double the lines with the INSERT-statement (and double again as the current statement creates a file of approx. 880 bytes in size only). You can also execute the statement again and again as there are no constraints on the table.
hope that helps. AugustQ
Am Montag, den 30.01.2017, 12:27 +0100 schrieb Sergei Golubchik:
Hi, AugustQ!
On Jan 29, AugustQ wrote:
Hi,
by playing with the code I think I found something interesting.
My environment: MariaDB 10.0.10, MyISAM-engine
I played with a table-scan, no index is defined on this table. When I execute a SQL-statement that forces the server to do a second table- scan on a table this 2nd table-scan will be slow.
The reason for this behaviour is the usage of a buffer: during the 1st scan this buffer is filled, used and filled again until the whole table is processed. At the end of the 1st scan it contains the last bytes of the file. When a 2nd scan is started the reading of the table starts from the beginning of the file but the buffer and all associated variables are not reset: the buffer still contains the bytes from the end of the file, the request cannot be fulfilled by the buffer so the request has to be handled by reading the bytes directly from the file using the read()-function of the Std-library. This takes much more time then simply copying the bytes from the internal buffer.
Right... MyISAM does not know how you're going to access the table. It might be a second full table scan. Or may be you'll just want to read the end of the table?
My idea is: somewhere in the code this situation must be detected and the buffer (and all associated variables) reset to initial values. reinit_io_cache() looks like the right candidate for this.
How would that help? You'll get faster execution if MyISAM would preload first pages of the table. But it doesn't know you're going to do a full table scan, so why would it preload it?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
participants (3)
-
AugustQ
-
Roberto Spadim
-
Sergei Golubchik