Serg, All, On 05/19/2012 10:51 AM, Sergei Golubchik wrote:
Hi, Igor!
On May 19, Igor Babaev wrote:
On 05/19/2012 04:22 AM, Sergei Golubchik wrote:
On May 10, Igor Babaev wrote: Alternatively, it could have an explicit "PERSISTENT" keyword:
ANALYZE TABLE t1 PERSISTENT; -- meaning all columns and indexes ANALYZE TABLE t1 PERSISTENT FOR COLUMNS (c,e,d) INDEXES (idx2, idx4);
We usually don't use key words in plural. So maybe you meant: ANALYZE TABLE t1 PERSISTENT FOR COLUMN (c,e,d) INDEX (idx2, idx4); ?
I've checked before suggesting this syntax. Both COLUMNS and INDEXES already exist as tokens and used somewhere in the grammar.
For "INDEXES" you can use the existing keys_or_index rule:
keys_or_index: KEYS {} | INDEX_SYM {} | INDEXES {} ;
it's typically used in places like that. There is no corresponding column_or_columns, but feel free to create it, if you'd like.
Ok, Basically I'm positive about your suggestions. There are some details I'm doubting about, but I'll dwell on them later. Let's talk about the semantics of the extended ANALYZE first. We need to agree upon it because now I plan moving from a 'proof of concept' for the feature to a 'real' implementation. For this I have to give you more context. The engine persistent independent statistics is stored in 3 tables of the system mysql database: table_stat, column_stat, index_stat. The particular contents of these tables is not important for the further discussion here, it's enough to know that: - table_stat contains some statistics on the table in general - column_stat contains statistics on table columns - index_stat contains statistics on index prefixes. The global server option --stat-tables says now whether the statistics tables have to be taken into account (without this option the server ignores them altogether) There is a system variable (global/session) that controls how the statistics table are used by queries/statements. The variable is called optimizer_use_stat_tables. There are 4 possible values for this variable: - 'never' - 'complementary' - 'preferably' - 'exclusively'. The default value for the variable now is 'never'. If we start the server with --stat-tables disabled the only meaningful value for the variable is 'never'. Other values are ignored Q1. Should we issue a warning with the attempt to reset this value when --stat-tables is disabled? Let me first tell you how the setting of this variable affects regular queries (SELECT, UPDATE etc). If the variable is set to: 1. case 'never': then no statistics from stat tables is used 2. case 'complementary' then the statistical value for a particular attribute is used from persistent tables ONLY when there is no corresponding statistics from other sources 3. case 'preferably' then the statistical value for a particular attribute is used from other sources ONLY when there is no corresponding statistics from persistent tables 4. case 'exclusively' the only statistics from persistent tables is used, if there is none on some attributes no statistics is used The variable setting is also affects how ANALYZE works. The current syntax of the ANALYZE command is ANALYZE TABLE t1, ...; An currently if @@optimizer_use_stat_tables is set to 1. 'never' then it works as before (e.g. in 5.5) 2. 'complementary' then it works as before + persistent statistics for table_stat and column_stat is collected 3. 'preferably' then it works as before + persistent statistics for table_stat, column_stat and index_stat is collected 4. 'exclusively' only persistent statistics for table_stat, column_stat and index_stat is collected Q2. Is this semantics of ANALYZE is natural? Collection of persistent statistics with ANALYZE is now engine independent. So it's not a cheap and fast process. In many cases it is enough to collect/update statistics only some columns or even for some indexes. That's why I suggested to extend the syntax of the analyze. Let's choose something that is close to what SergeiG suggested. There are some questions here. Q3. If I need statistics for all columns and some indexes of table t1 and everything on table t2. can I use: ANALYZE TABLE t1 COLUMN[S] ALL INDEX[ES] (idx1,...), t2; ? Q4. Do we really need the keyword PERSISTENT here? If so what is its semantics? Q5. Isn't 'FOR' in the syntax of ANALYZE confusing in the sense that it commands to collect statistics for column_stat/index_stat, but not for table_stat? Maybe it's better to omit it? I do not plan to extend the syntax of ANALYZE to collect/update statistics only on some attributes of the columns (like collect statistics on min/max values, but not on the number of distinct values). Although I admit that collecting statistics on the number of distinct values is more expensive I don'tthink such an extension of ANALYZE is badly needed now. I plan to add a possibility to collect statistics from random sample records. I plan to control the process of sampling only through system variables. How does all above comply with persistent statistics of InnoDB from mysql-5.6? Almost perfectly. They store their persistent statistics in mysql.innodb_table_stat and mysql.innodb_index_stat. So we do not conflict here. The collection and the usage of their persistent statistics is going through their own system variables. Again - no conflicts. I will appreciate any comments on this post and, of course, I would like to have some answers for the above questions from SergeiG. Regards, Igor. then any particular satistics from
Regards, Sergei