Re: d20e4e10353: MDEV-36536 Add option to not collect statistics for long char/varchars

Hi, Michael, On Apr 22, Michael Widenius wrote:
revision-id: d20e4e10353 (mariadb-10.6.21-79-gd20e4e10353) parent(s): 4f0a0dbd104 author: Michael Widenius committer: Michael Widenius timestamp: 2025-04-22 10:53:08 +0300 message:
MDEV-36536 Add option to not collect statistics for long char/varchars
This is needed to make it easy for users to automatically ignore long char and varchars when using ANALYZE TABLE PERSISTENT. These fields can cause problems as they will consume 'CHARACTERS * MAX_CHARACTER_LENGTH * 2 * number_of_rows' space on disk during analyze, which can easily be much bigger than the analyzed table.
This commit adds a new user variable, analyze_max_length, default value 4G. Any field that is bigger than this in bytes, will be ignored by ANALYZE TABLE PERSISTENT unless it is specified in FOR COLUMNS().
While doing this patch, I noticed that we do not skip GEOMETRY columns from ANALYZE TABLE, like we do with BLOB. This should be fixed when merging to the 'main' branch. At the same time we should add a resonable default value for analyze_max_length, probably 1024, like we have for max_sort_length.
diff --git a/mysql-test/main/mysqld--help.test b/mysql-test/main/mysqld--help.test index 971983fd66c..44449cd3ad5 100644 --- a/mysql-test/main/mysqld--help.test +++ b/mysql-test/main/mysqld--help.test @@ -20,14 +20,14 @@ exec $MYSQLD_BOOTSTRAP_CMD --symbolic-links=0 --log-bin=foo --lower-case-table-n
perl; # Variables which we don't want to display in the result file since - # their paths may vary: + # their paths may vary or they may use a default of 4294967295 :
why? what's wrong with 4294967295?
@skipvars=qw/basedir open-files-limit general-log-file log plugin-dir plugin-maturity log-slow-queries pid-file slow-query-log-file log-basename datadir slave-load-tmpdir tmpdir socket thread-pool-size large-files-support lower-case-file-system system-time-zone collation-server character-set-server log-tc-size table-cache table-open-cache table-open-cache-instances max-connections - server-uid tls-version version.*/; + server-uid tls-version version.* analyze-max-length/;
# Plugins which may or may not be there: @plugins=qw/innodb archive blackhole federated partition s3 diff --git a/sql/sql_class.h b/sql/sql_class.h index 69b021cd41d..aac433181a4 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -731,6 +731,7 @@ typedef struct system_variables ha_rows select_limit; ha_rows max_join_size; ha_rows expensive_subquery_limit; + ulong analyze_max_length;
don't use ulong, please. uint or ulonglong.
ulong auto_increment_increment, auto_increment_offset; #ifdef WITH_WSREP /* diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc index 0d4a8e4c6fe..4ae14a58feb 100644 --- a/sql/sys_vars.cc +++ b/sql/sys_vars.cc @@ -445,6 +445,20 @@ static Sys_var_double Sys_analyze_sample_percentage( CMD_LINE(REQUIRED_ARG), VALID_RANGE(0, 100), DEFAULT(100));
+/* + The max length have to be UINT_MAX32 to not remove GEOMETRY fields + from analyze. +*/ + +static Sys_var_ulong Sys_analyze_max_length( + "analyze_max_length", + "Fields that require more storage than analyze_max_length and are not "
better "Fields which length in bytes more than this are skipped by ANALYZE TABLE PERSISTENT unless explicitly listed in the FOR COLUMNS () clause" "require more storage" is not very clear
+ "listed in ANALYZE ... FOR COLUMNS () will automatically be skipped by " + "ANALYZE TABLE PERSISTENT", + SESSION_VAR(analyze_max_length), + CMD_LINE(REQUIRED_ARG), VALID_RANGE(32, UINT_MAX32), + DEFAULT(UINT_MAX32), BLOCK_SIZE(1)); + static Sys_var_ulong Sys_auto_increment_increment( "auto_increment_increment", "Auto-increment columns are incremented by this",
Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
participants (1)
-
Sergei Golubchik