[Maria-discuss] Calculating Maximum Bytes Per Record
*Hello, * * * *I have recently been doing some capacity planning exercises for a new database we are developing in MariaDB. One of the exercises I completed was to create a database with all tables loaded with maximum width records at 100,000, 500,000, and 1,000,000 records. I noticed that when I did this it seems that the larger the tables grow, the less bytes per record are required. I am curious what causes this to happen? These are all InnoDB tables saved to their own tablespaces.* * Max Bytes Per Record Record Count Record Count Record Count 100,000 500,000 1,000,000 acct_activation 241 176 164 acct_deactivated 419 336 327 disk_usage 136 75 63 keywords 713 646 633 pods 77,511 56,698 53,126 pods_joined 178 101 96 pod_activity 461 369 357 pod_files 78,224 57,220 53,615 sessions 168 92 80 users 1,426 1,351 1,338 usr_profile 78,392 57,343 53,730 * *I also created a query (using Information_Schema.Columns) to calculate maximum possible bytes per record / table (SQL attached below). I ran the query and compared the estimates to the actual values produced above. The query is close, but I would like to make it a little more accurate if possible. * *TABLE_SCHEMA* *TABLE_NAME* *PK_BYTES_TOT* *FIELD_BYTES_TOT* * IX_FIELD_BYTES_TOT* *IX_FIELD_COUNT* *IX_OVERHEAD_BYTES* *IX_BYTES_TOT* * TABLE_BYTES_TOT* pods acct_activation 0 73 33 1 29 62 *189* pods acct_deactivated 8 269 0 0 0 0 *388* pods devices 1 273 0 0 0 0 *384* pods disk_usage 8 32 0 0 0 0 *56* pods keywords 0 264 264 2 58 322 *820* pods pods 8 65,821 16 1 29 45 *92,224* pods pods_joined 0 16 16 2 58 74 *126* pods pod_activity 0 275 9 1 29 38 *438* pods pod_files 8 66,376 56 2 58 114 *93,097* pods sessions 0 10 10 2 58 68 *109* pods users 8 1,112 41 1 29 70 *1,666* pods usr_profile 8 66,628 0 0 0 0 *93,290* *Does anyone have any suggestions for my calculations in the code below to make it more accurate?* * * *Basic Assumptions * * * - Calculated field widths are defined by data type in the CASE logic below. - Primary key bytes = The sum of all defined PK calculated field widths - Index Bytes = Primary key bytes + Index calculated field width + 29 bytes overhead (for each index defined) - 1 byte is added to each field that is Nullable. - All bytes defined above are multiplied by 1.4 for BTREE overhead.* * * * *Current SQL to calculate MAX record bytes by table:* * * SELECT B.TABLE_SCHEMA , B.TABLE_NAME , SUM(PK_BYTES) AS PK_BYTES_TOT , SUM(FIELD_BYTE_SPACE) AS FIELD_BYTES_TOT , SUM(IX_BYTES) AS IX_FIELD_BYTES_TOT , SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) AS IX_FIELD_COUNT , SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) * 29 AS IX_OVERHEAD_BYTES , (SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) * 29) + SUM(IX_BYTES) AS IX_BYTES_TOT , (SUM(PK_BYTES) + SUM(FIELD_BYTE_SPACE) + (SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) * 29) + SUM(IX_BYTES)) * 1.4 AS TABLE_BYTES_TOT FROM ( SELECT A.* , CASE WHEN COLUMN_KEY = 'PRI'THEN FIELD_BYTE_SPACE ELSE 0 END AS PK_BYTES , CASE WHEN A.COLUMN_KEY <> 'PRI' AND A.COLUMN_KEY <> '' THEN (PK_BYTE_SPACE + FIELD_BYTE_SPACE) ELSE 0 END AS IX_BYTES FROM ( SELECT PK_SP.TABLE_SCHEMA , PK_SP.TABLE_NAME , PK_SP.COLUMN_NAME , DATA_TYPE , CHARACTER_MAXIMUM_LENGTH , NUMERIC_PRECISION , IS_NULLABLE , COLUMN_KEY , CHARACTER_SET_NAME , CHARACTER_OCTET_LENGTH , (CASE -- CHARACTER FIELDS WHEN DATA_TYPE = 'varchar' THEN CHARACTER_MAXIMUM_LENGTH + 1 WHEN DATA_TYPE = 'char' THEN CHARACTER_MAXIMUM_LENGTH WHEN DATA_TYPE = 'tinyblob' OR DATA_TYPE = 'tinytext' THEN CHARACTER_MAXIMUM_LENGTH + 1 WHEN DATA_TYPE = 'blob' OR DATA_TYPE = 'text' THEN CHARACTER_MAXIMUM_LENGTH + 2 WHEN DATA_TYPE = 'mediumblob' OR DATA_TYPE = 'mediumtext' THEN CHARACTER_MAXIMUM_LENGTH + 3 WHEN DATA_TYPE = 'largeblob' OR DATA_TYPE = 'largetext' THEN CHARACTER_MAXIMUM_LENGTH + 4 -- NUMERIC FIELDS WHEN DATA_TYPE = 'tinyint' THEN 1 WHEN DATA_TYPE = 'smallint' THEN 2 WHEN DATA_TYPE = 'mediumint' THEN 3 WHEN DATA_TYPE = 'int' OR DATA_TYPE = 'integer' THEN 4 WHEN DATA_TYPE = 'bigint' THEN 8 WHEN DATA_TYPE = 'float' AND (NUMERIC_PRECISION <= 24 OR NUMERIC_PRECISION IS NULL) THEN 4 WHEN DATA_TYPE = 'float' AND NUMERIC_PRECISION > 24 THEN 8 WHEN DATA_TYPE = 'bit' THEN (NUMERIC_PRECISION + 7) / 8 WHEN DATA_TYPE = 'double' OR DATA_TYPE = 'numeric' THEN (FLOOR(NUMERIC_PRECISION/9)*4) + ROUND((NUMERIC_PRECISION- FLOOR(NUMERIC_PRECISION/9)*9)*.5,0) -- DATETIME FIELDS WHEN DATA_TYPE = 'date' OR DATA_TYPE = 'time' THEN 3 WHEN DATA_TYPE = 'datetime' THEN 8 WHEN DATA_TYPE = 'timestamp' THEN 4 WHEN DATA_TYPE = 'year' THEN 1 ELSE 999999999999999 END) + (CASE WHEN IS_NULLABLE = 'YES' THEN 1 ELSE 0 END) AS FIELD_BYTE_SPACE , CASE WHEN PK_BYTE_SPACE IS NULL THEN 0 ELSE PK_BYTE_SPACE END AS PK_BYTE_SPACE FROM information_schema.columns AS PK_SP LEFT OUTER JOIN (SELECT TABLE_SCHEMA , TABLE_NAME , SUM((CASE -- CHARACTER FIELDS WHEN DATA_TYPE = 'varchar' THEN CHARACTER_MAXIMUM_LENGTH + 1 WHEN DATA_TYPE = 'char' THEN CHARACTER_MAXIMUM_LENGTH WHEN DATA_TYPE = 'tinyblob' OR DATA_TYPE = 'tinytext' THEN CHARACTER_MAXIMUM_LENGTH + 1 WHEN DATA_TYPE = 'blob' OR DATA_TYPE = 'text' THEN CHARACTER_MAXIMUM_LENGTH + 2 WHEN DATA_TYPE = 'mediumblob' OR DATA_TYPE = 'mediumtext' THEN CHARACTER_MAXIMUM_LENGTH + 3 WHEN DATA_TYPE = 'largeblob' OR DATA_TYPE = 'largetext' THEN CHARACTER_MAXIMUM_LENGTH + 4 -- NUMERIC FIELDS WHEN DATA_TYPE = 'tinyint' THEN 1 WHEN DATA_TYPE = 'smallint' THEN 2 WHEN DATA_TYPE = 'mediumint' THEN 3 WHEN DATA_TYPE = 'int' OR DATA_TYPE = 'integer' THEN 4 WHEN DATA_TYPE = 'bigint' THEN 8 WHEN DATA_TYPE = 'float' AND (NUMERIC_PRECISION <= 24 OR NUMERIC_PRECISION IS NULL) THEN 4 WHEN DATA_TYPE = 'float' AND NUMERIC_PRECISION > 24 THEN 8 WHEN DATA_TYPE = 'bit' THEN (NUMERIC_PRECISION + 7) / 8 WHEN DATA_TYPE = 'double' OR DATA_TYPE = 'numeric' THEN (FLOOR(NUMERIC_PRECISION/9)*4) + ROUND((NUMERIC_PRECISION- FLOOR(NUMERIC_PRECISION/9)*9)*.5,0) -- DATETIME FIELDS WHEN DATA_TYPE = 'date' OR DATA_TYPE = 'time' THEN 3 WHEN DATA_TYPE = 'datetime' THEN 8 WHEN DATA_TYPE = 'timestamp' THEN 4 WHEN DATA_TYPE = 'year' THEN 1 ELSE 999999999999999 END) + (CASE WHEN IS_NULLABLE = 'YES' THEN 1 ELSE 0 END)) AS PK_BYTE_SPACE FROM information_schema.columns COL_SP WHERE COLUMN_KEY = 'PRI' AND TABLE_SCHEMA = 'PODS' GROUP BY TABLE_SCHEMA , TABLE_NAME) AS IX_SP ON PK_SP.TABLE_SCHEMA = IX_SP.TABLE_SCHEMA AND PK_SP.TABLE_NAME = IX_SP.TABLE_NAME WHERE PK_SP.TABLE_SCHEMA = 'PODS') AS A ) AS B GROUP BY B.TABLE_SCHEMA , B.TABLE_NAME * *
participants (1)
-
Jake Drew