[Maria-discuss] Implementing APPROXIMATE_COUNT(*)
Hi, I have realised that whenever I am doing any query and want to count number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>; On large tables (>10e6 rows), when the count is >100e3, it takes over a second... even when all the indexes are set up, etc... and with large counts, I have not been able to go below ~<1 s. So I would like to implement APPROXIMATE_COUNT(*) function, which checks, if, for example every 10th, 100th or 1000th row is present in my final output... and based on this statistics, return the approximate number of rows that given query produces. I am using MyISAM engine and would like to do the implementation on it, where the total number of rows is automatically stored. Obviously, it would be useful for anybody dealing with these huge aggregate websites, where users do not really need to know if their keyword search returned 1234567890 results or just 1.2e9 results... Please, can you advise me how to start this project? I.e. whether it's easy to do in SQL or whether I would have to start looking somewhere in the source code of MariaDB? I have plenty of time, programming/SQL experience, but no experience with maria/mysql source code. Tomek
Hi, Tomek! On Jan 05, Tomek R wrote:
Hi, I have realised that whenever I am doing any query and want to count number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>; On large tables (>10e6 rows), when the count is >100e3, it takes over a second... even when all the indexes are set up, etc... and with large counts, I have not been able to go below ~<1 s. So I would like to implement APPROXIMATE_COUNT(*) function, which checks, if, for example every 10th, 100th or 1000th row is present in my final output... and based on this statistics, return the approximate number of rows that given query produces. I am using MyISAM engine and would like to do the implementation on it, where the total number of rows is automatically stored. Obviously, it would be useful for anybody dealing with these huge aggregate websites, where users do not really need to know if their keyword search returned 1234567890 results or just 1.2e9 results... Please, can you advise me how to start this project? I.e. whether it's easy to do in SQL or whether I would have to start looking somewhere in the source code of MariaDB? I have plenty of time, programming/SQL experience, but no experience with maria/mysql source code.
See the new server variable in MariaDB-10.0 - optimizer_use_condition_selectivity - and in particular, when happens when it's set to 5. The manual documents it as 5 Additionally use selectivity of certain non-range predicates calculated on record sample. It means that to estimate the selectivity of the WHERE condition for a specific table, exactly, SELECT ... FROM atable WHERE <complicated condition>, optimizer reads few first rows of the table (as set in optimizer_selectivity_sampling_limit) and checks how many of them pass the WHERE condition. This is very close to what you want, but currently it only works for LIKE predicates. If you'd extend this to work - optionally - for all WHERE conditions, you'll be able to see your "approximate count" value in the EXPLAIN output. Regards, Sergei
Hi, Another option is using an external fastbit index, which will give you more than count(*) and work excellently over complex ranges. https://sdm.lbl.gov/fastbit/ On Wed, Feb 19, 2014 at 5:25 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Tomek!
On Jan 05, Tomek R wrote:
Hi, I have realised that whenever I am doing any query and want to count number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>; On large tables (>10e6 rows), when the count is >100e3, it takes over a second... even when all the indexes are set up, etc... and with large counts, I have not been able to go below ~<1 s. So I would like to implement APPROXIMATE_COUNT(*) function, which checks, if, for example every 10th, 100th or 1000th row is present in my final output... and based on this statistics, return the approximate number of rows that given query produces. I am using MyISAM engine and would like to do the implementation on it, where the total number of rows is automatically stored. Obviously, it would be useful for anybody dealing with these huge aggregate websites, where users do not really need to know if their keyword search returned 1234567890 results or just 1.2e9 results... Please, can you advise me how to start this project? I.e. whether it's easy to do in SQL or whether I would have to start looking somewhere in the source code of MariaDB? I have plenty of time, programming/SQL experience, but no experience with maria/mysql source code.
See the new server variable in MariaDB-10.0 - optimizer_use_condition_selectivity - and in particular, when happens when it's set to 5. The manual documents it as
5 Additionally use selectivity of certain non-range predicates calculated on record sample.
It means that to estimate the selectivity of the WHERE condition for a specific table, exactly, SELECT ... FROM atable WHERE <complicated condition>, optimizer reads few first rows of the table (as set in optimizer_selectivity_sampling_limit) and checks how many of them pass the WHERE condition.
This is very close to what you want, but currently it only works for LIKE predicates.
If you'd extend this to work - optionally - for all WHERE conditions, you'll be able to see your "approximate count" value in the EXPLAIN output.
Regards, Sergei
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
participants (3)
-
Justin Swanhart
-
Sergei Golubchik
-
Tomek R