
Just to clarify: we are looking at two options currently. Please find their description below. == Current mysql.column_stats == Currently, MariaDB defines mysql.column_stats as: CREATE TABLE column_stats ( db_name varchar(64) COLLATE utf8mb3_bin NOT NULL, table_name varchar(64) COLLATE utf8mb3_bin NOT NULL, column_name varchar(64) COLLATE utf8mb3_bin NOT NULL, min_value varbinary(255) DEFAULT NULL, max_value varbinary(255) DEFAULT NULL, nulls_ratio decimal(12,4) DEFAULT NULL, avg_length decimal(12,4) DEFAULT NULL, avg_frequency decimal(12,4) DEFAULT NULL, hist_size tinyint(3) unsigned DEFAULT NULL, hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB') COLLATE utf8mb3_bin DEFAULT NULL, histogram varbinary(255) DEFAULT NULL, PRIMARY KEY (db_name,table_name,column_name) ); It seems there is some redundancy here: the histogram column is of variable length, but there is also hist_size column which also stores length of the histogram (?). == Variant #1: Add JSON support to the current scheme * Keep all the same fields, but add another histogram type: JSON (more on the * name choice in the next email) * Change the definition of the 'histogram' column so it is not limited to 255 bytes. CREATE TABLE column_stats ( ... -- the same columns as before ... hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB', 'JSON') COLLATE utf8mb3_bin DEFAULT NULL, histogram blob DEFAULT NULL, PRIMARY KEY (db_name,table_name,column_name) ); Advantages: * The definition is close to the current definition * The data from the older version can be kept as-is: no migration or conversion is necessary * The format is backward-compatible: old server binary can read the records stored in the new format (we'll still need to verify this) Disadvantages: - depending on value of hist_type, histogram column holds either JSON or binary data, which is not a good data modeling practice. - JSON histogram holds real bucket bounds. The column's min_value and max_value will be stored twice: First, in min_value and max_value columns, and then in the first/last bound in the histogram. On the other hand, separate min_value and max_value columns allow storing min/max statistics without collecting the histogram. == Variant #2: Change to JSON-only == New mysql.column_stats will look like this: CREATE TABLE column_stats ( db_name varchar(64) COLLATE utf8mb3_bin NOT NULL, table_name varchar(64) COLLATE utf8mb3_bin NOT NULL, column_name varchar(64) COLLATE utf8mb3_bin NOT NULL, -- perhaps some columns like avg_length, avg_frequency histogram JSON, -- All histograms are in JSON PRIMARY KEY (db_name,table_name,column_name) ); The histogram will be always stored in JSON. Old histograms will be converted to the new format. Advantages - straightforward data modeling. - no "baggage" (Do we need old-format histograms, long term?) Disadvantages - This is not compatible with the old data format. - If we chose to support legacy histograms, we'll need to convert them into JSON (This should be fairly easy to with DECODE_HISTOGRAM function is useful for that) Any thoughts on this anyone? BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net