[Maria-discuss] MariaDB-10.2.11 Long Time To Start
Hi, I'm setting up a replication slave based off of MySQL 5.6. There are a large number of tables (400K). The MySQL 5.6 master takes around 10 minutes to start up, where the new MariaDB 10.2 slave takes close to 30 minutes. What is making the startup take soo long? In the below mysql log, you can see in the startup sequence between "Highest supported file format is Barracuda." and "InnoDB: 128 out of 128 rollback segments are active." the startup process takes the majority of its time: 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Uses event mutexes 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Compressed tables use zlib 1.2.8 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Using Linux native AIO 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Number of pools: 1 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Using SSE2 crc32 instructions 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Initializing buffer pool, total size = 54G, instances = 8, chunk size = 128M 2018-01-07 9:38:18 139980432365760 [Note] InnoDB: Completed initialization of buffer pool 2018-01-07 9:38:18 139918113195776 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-01-07 9:38:18 139980432365760 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: Waiting for purge to start 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 5.7.20 started; log sequence number 1394303471760 2018-01-07 10:11:48 139919461500672 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-01-07 10:11:48 139980432365760 [Note] Server socket created on IP: '::'. 2018-01-07 10:11:49 139980432365760 [Note] Reading of all Master_info entries succeded 2018-01-07 10:11:49 139980432365760 [Note] Added new Master_info '' to hash table 2018-01-07 10:11:49 139980432365760 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.11-MariaDB-10.2.11+maria~xenial-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 mariadb.org binary distribution Any ideas why? I've included my my.conf file below. Thanks, Mike # Generated by Percona Configuration Wizard (http://tools.percona.com/) version REL5-20120208 # Configuration name usaddxprdsqlpm3 generated formcaplan@labnet.net at 2017-12-22 13:49:58 [mysql] # CLIENT # port = 3306 socket = /var/lib/mysql/mysql.sock [mysqladmin] # CLIENT # port = 3306 socket = /var/lib/mysql/mysql.sock [mysqld] # GENERAL # user = mysql default-storage-engine = InnoDB socket = /var/lib/mysql/mysql.sock pid-file = /var/lib/mysql/mysql.pid character-set-server = utf8 collation_server = utf8_general_ci performance_schema = 0 # MyISAM # key-buffer-size = 32M myisam-recover = FORCE,BACKUP # SAFETY # max-allowed-packet = 20M max-connect-errors = 1000000 skip-name-resolve sysdate-is-now = 1 innodb = FORCE # DATA STORAGE # datadir = /var/lib/mysql/ # BINARY LOGGING # log-bin = /var/lib/mysql/mysql-bin binlog_format = MIXED server-id = 3 expire-logs-days = 7 sync-binlog = 1 # REPLICATION # read-only = 1 skip-slave-start = 1 log-slave-updates = 1 relay-log = /var/lib/mysql/relay-bin slave-net-timeout = 60 sync-master-info = 1 sync-relay-log = 1 sync-relay-log-info = 1 # CACHES AND LIMITS # tmp-table-size = 32M max-heap-table-size = 32M query-cache-type = 0 query-cache-size = 0 max-connections = 500 thread-cache-size = 50 open-files-limit = 65535 table-definition-cache = 4096 table-open-cache = 10240 # INNODB # innodb-flush-method = O_DIRECT innodb-log-files-in-group = 2 innodb-log-file-size = 512M innodb-flush-log-at-trx-commit = 1 innodb-file-per-table = 1 innodb-buffer-pool-size = 54G # LOGGING # log-error = /var/lib/mysql/mysql-error.log log-queries-not-using-indexes = 0 slow-query-log = 0 slow-query-log-file = /var/lib/mysql/mysql-slow.log [mariadb] plugin-load-add=file_key_management file_key_management_filename=/var/lib/mysql/keys.enc file_key_management_filekey=FILE:/var/lib/mysql/mysql.key file_key_management_encryption_algorithm=AES_CTR encrypt_binlog=0
Actually, Was just reviewing the startup time for MySQL 5.6. I'm at 5 minutes there, not 10. Both these physical machines are running the same hardware, with the exception of the new slave having 64GB RAM (32GB on the master). Master is running Ubuntu Precise, slave Ubuntu Xenial. Anyone else running 10.2 with a large number of tables? Thanks, Mike On 2018-01-08 09:06 AM, Michael Caplan wrote:
Hi,
I'm setting up a replication slave based off of MySQL 5.6. There are a large number of tables (400K). The MySQL 5.6 master takes around 10 minutes to start up, where the new MariaDB 10.2 slave takes close to 30 minutes. What is making the startup take soo long?
In the below mysql log, you can see in the startup sequence between "Highest supported file format is Barracuda." and "InnoDB: 128 out of 128 rollback segments are active." the startup process takes the majority of its time:
2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Uses event mutexes 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Compressed tables use zlib 1.2.8 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Using Linux native AIO 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Number of pools: 1 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Using SSE2 crc32 instructions 2018-01-07 9:38:15 139980432365760 [Note] InnoDB: Initializing buffer pool, total size = 54G, instances = 8, chunk size = 128M 2018-01-07 9:38:18 139980432365760 [Note] InnoDB: Completed initialization of buffer pool 2018-01-07 9:38:18 139918113195776 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-01-07 9:38:18 139980432365760 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: Waiting for purge to start 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 5.7.20 started; log sequence number 1394303471760 2018-01-07 10:11:48 139919461500672 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-01-07 10:11:48 139980432365760 [Note] Server socket created on IP: '::'. 2018-01-07 10:11:49 139980432365760 [Note] Reading of all Master_info entries succeded 2018-01-07 10:11:49 139980432365760 [Note] Added new Master_info '' to hash table 2018-01-07 10:11:49 139980432365760 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.11-MariaDB-10.2.11+maria~xenial-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 mariadb.org binary distribution
Any ideas why? I've included my my.conf file below.
Thanks,
Mike
# Generated by Percona Configuration Wizard (http://tools.percona.com/) version REL5-20120208 # Configuration name usaddxprdsqlpm3 generated formcaplan@labnet.net at 2017-12-22 13:49:58
[mysql]
# CLIENT # port = 3306 socket = /var/lib/mysql/mysql.sock
[mysqladmin]
# CLIENT # port = 3306 socket = /var/lib/mysql/mysql.sock
[mysqld]
# GENERAL # user = mysql default-storage-engine = InnoDB socket = /var/lib/mysql/mysql.sock pid-file = /var/lib/mysql/mysql.pid character-set-server = utf8 collation_server = utf8_general_ci performance_schema = 0
# MyISAM # key-buffer-size = 32M myisam-recover = FORCE,BACKUP
# SAFETY # max-allowed-packet = 20M max-connect-errors = 1000000 skip-name-resolve sysdate-is-now = 1 innodb = FORCE
# DATA STORAGE # datadir = /var/lib/mysql/
# BINARY LOGGING # log-bin = /var/lib/mysql/mysql-bin binlog_format = MIXED server-id = 3 expire-logs-days = 7 sync-binlog = 1
# REPLICATION # read-only = 1 skip-slave-start = 1 log-slave-updates = 1 relay-log = /var/lib/mysql/relay-bin slave-net-timeout = 60 sync-master-info = 1 sync-relay-log = 1 sync-relay-log-info = 1
# CACHES AND LIMITS # tmp-table-size = 32M max-heap-table-size = 32M query-cache-type = 0 query-cache-size = 0 max-connections = 500 thread-cache-size = 50 open-files-limit = 65535 table-definition-cache = 4096 table-open-cache = 10240
# INNODB # innodb-flush-method = O_DIRECT innodb-log-files-in-group = 2 innodb-log-file-size = 512M innodb-flush-log-at-trx-commit = 1 innodb-file-per-table = 1 innodb-buffer-pool-size = 54G
# LOGGING # log-error = /var/lib/mysql/mysql-error.log log-queries-not-using-indexes = 0 slow-query-log = 0 slow-query-log-file = /var/lib/mysql/mysql-slow.log
[mariadb] plugin-load-add=file_key_management file_key_management_filename=/var/lib/mysql/keys.enc file_key_management_filekey=FILE:/var/lib/mysql/mysql.key file_key_management_encryption_algorithm=AES_CTR encrypt_binlog=0
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi Michael,
2018-01-07 9:38:18 139980432365760 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 128 out of 128 rollback segments are active.
This is really unacceptable behaviour. It looks like you have encountered https://jira.mariadb.org/browse/MDEV-9843 where this exactly same behaviour has been documented. Can you reproduce the delay on every startup? Stack traces (from http://poormansprofiler.org/ or "perf top -g" or similar) during this startup delay would be very useful. With best regards, Marko
Hi Marko, I can reproduce this reliably with every start up. Poor man's profiler output: 84 ../sysdeps/unix/syscall-template.S: No such file or directory. 10 ??,??,??,??,??,??,start_thread,clone 5 at,??,??,start_thread,clone 1 pread64,??,??,??,??,??,??,??,??,??,??,??,ha_initialize_handlerton(st_plugin_int*),??,plugin_init(int*,,??,mysqld_main(int,,__libc_start_main,_start 1 at,??,start_thread,clone 1 Anything else I can get you? Best, Mike On 2018-01-08 09:52 AM, Marko Mäkelä wrote:
Hi Michael,
2018-01-07 9:38:18 139980432365760 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 128 out of 128 rollback segments are active.
This is really unacceptable behaviour. It looks like you have encountered https://jira.mariadb.org/browse/MDEV-9843 where this exactly same behaviour has been documented.
Can you reproduce the delay on every startup? Stack traces (from http://poormansprofiler.org/ or "perf top -g" or similar) during this startup delay would be very useful.
With best regards,
Marko
Hi Marko, Kinda outside of my element here, but attached are a few screenshots from perf top -g Thanks, Mike On 2018-01-08 10:34 AM, Michael Caplan wrote:
Hi Marko,
I can reproduce this reliably with every start up.
Poor man's profiler output:
84 ../sysdeps/unix/syscall-template.S: No such file or directory. 10 ??,??,??,??,??,??,start_thread,clone 5 at,??,??,start_thread,clone 1 pread64,??,??,??,??,??,??,??,??,??,??,??,ha_initialize_handlerton(st_plugin_int*),??,plugin_init(int*,,??,mysqld_main(int,,__libc_start_main,_start 1 at,??,start_thread,clone 1
Anything else I can get you?
Best,
Mike
On 2018-01-08 09:52 AM, Marko Mäkelä wrote:
Hi Michael,
2018-01-07 9:38:18 139980432365760 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 128 out of 128 rollback segments are active.
This is really unacceptable behaviour. It looks like you have encountered https://jira.mariadb.org/browse/MDEV-9843 where this exactly same behaviour has been documented.
Can you reproduce the delay on every startup? Stack traces (from http://poormansprofiler.org/ or "perf top -g" or similar) during this startup delay would be very useful.
With best regards,
Marko
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi Michael, OK, all the interesting functions (InnoDB functions) are displayed as ??. In the "perf top" it would have helped to drill into the mysqld executable, and only show its top functions. Apparently a lot of pages are being read; that is all that I can tell from the screenshots. Can you try the following: 1. Find the proces ID, e.g., ps ax|grep mysqld 2. gdb /usr/sbin/mysqld set log file /tmp/gdb.txt set log on attach [pid] thr a a bt detach quit With this we should get detailed stack traces. I am interested in the thread whose stack trace includes plugin_init and mysqld_main. Could you please try again? Marko
Here you go! Thanks, Mike On 2018-01-08 12:38 PM, Marko Mäkelä wrote:
Hi Michael,
OK, all the interesting functions (InnoDB functions) are displayed as ??.
In the "perf top" it would have helped to drill into the mysqld executable, and only show its top functions. Apparently a lot of pages are being read; that is all that I can tell from the screenshots.
Can you try the following:
1. Find the proces ID, e.g., ps ax|grep mysqld 2. gdb /usr/sbin/mysqld set log file /tmp/gdb.txt set log on attach [pid] thr a a bt detach quit
With this we should get detailed stack traces. I am interested in the thread whose stack trace includes plugin_init and mysqld_main.
Could you please try again?
Marko
Michael, Sorry, I now realize that you must likely have used a stripped executable, and therefore the symbols for the InnoDB functions are missing (displayed as ?? in the stack trace output from gdb). As far as I understand, gdb should be able to resolve the symbols if you install the separate debug symbol package: sudo apt install mariadb-server-core-10.2-dbgsym I tried to find the installation package corresponding to '10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to resolve the stack trace manually. As a developer, normally I work with code that I build myself from the source code, and therefore I am a bit bad at giving advice that applies to the packaged code. I downloaded and extracted the files from the following packages: mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... The server executable /usr/sbin/mysqld is included in the mysql-server-core package. Unfortunately I was unable to match that executable with the addresses from your gdb output. I did the following: file mysqld mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped gdb mysqld symbol-file ../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3b9160bed.debug followed by a number of gdb commands. I expected the least significant 12 bits of the addresses to match in the disassembly of ha_initialize_handlerton() and plugin_init(), but that was not the case. Maybe I did something wrong. So, can you please install the debug symbol package and try again? Best regards, Marko
Hi Marko, Thanks for your patience with me. I think I have better material for you to look at now. Best, Mike On 2018-01-09 03:17 AM, Marko Mäkelä wrote:
Michael,
Sorry, I now realize that you must likely have used a stripped executable, and therefore the symbols for the InnoDB functions are missing (displayed as ?? in the stack trace output from gdb). As far as I understand, gdb should be able to resolve the symbols if you install the separate debug symbol package:
sudo apt install mariadb-server-core-10.2-dbgsym
I tried to find the installation package corresponding to '10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to resolve the stack trace manually. As a developer, normally I work with code that I build myself from the source code, and therefore I am a bit bad at giving advice that applies to the packaged code.
I downloaded and extracted the files from the following packages: mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main...
The server executable /usr/sbin/mysqld is included in the mysql-server-core package. Unfortunately I was unable to match that executable with the addresses from your gdb output. I did the following:
file mysqld mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped
gdb mysqld symbol-file ../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3b9160bed.debug
followed by a number of gdb commands. I expected the least significant 12 bits of the addresses to match in the disassembly of ha_initialize_handlerton() and plugin_init(), but that was not the case. Maybe I did something wrong.
So, can you please install the debug symbol package and try again?
Best regards,
Marko
Hi Michael,
Thanks! In the stack trace, I see fsp_flags_try_adjust(), a function that I
had to introduce to fix a file format compatibility problem that had been
introduced in MariaDB 10.1.0:
MDEV-11623 https://jira.mariadb.org/browse/MDEV-11623, commit ab1e6fefd
https://github.com/MariaDB/server/commit/ab1e6fefd869242d962cb91a006f37bb9ad...
in MariaDB 10.1.21, 10.2.4. The format compatibility problem mostly affects
a non-default innodb_page_size.
However, the "root cause" for this slowdown is the validate=true in the
following code in the function innobase_start_or_create_for_mysql():
bool validate = recv_needed_recovery
&& srv_force_recovery == 0;
dict_check_tablespaces_and_store_max_id(validate);
Normally, we should have validate=false, because no crash recovery should
be needed. If the server was not shut down cleanly, then there should be a
startup message "Starting crash recovery", and recv_needed_recovery should
hold.
The variable srv_force_recovery is tied to the parameter
innodb_force_recovery, which is 0 by default.
To me, this looks like it could be 'working as designed'. But, you did not
share the server error log from the very start. Was there a crash recovery
message? If not, then it is an interesting bug.
If you copied the data files from a file system snapshot from a running
server, then it is very likely that crash recovery was needed. You could
work around this by setting --innodb-force-recovery=1 when starting up. But
beware that this will cause InnoDB to ignore certain errors.
Maybe a better idea would be to invoke "mariabackup --prepare" and
"mariabackup --copy-back" in order to apply the redo log before starting up
the server on the copied files. And while you are at it, you could also use
"mariabackup --backup" and skip the file system snapshot.
Best regards,
Marko
On Tue, Jan 9, 2018 at 4:32 PM, Michael Caplan
Hi Marko,
Thanks for your patience with me. I think I have better material for you to look at now. Best,
Mike
On 2018-01-09 03:17 AM, Marko Mäkelä wrote:
Michael,
Sorry, I now realize that you must likely have used a stripped executable, and therefore the symbols for the InnoDB functions are missing (displayed as ?? in the stack trace output from gdb). As far as I understand, gdb should be able to resolve the symbols if you install the separate debug symbol package:
sudo apt install mariadb-server-core-10.2-dbgsym
I tried to find the installation package corresponding to '10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to resolve the stack trace manually. As a developer, normally I work with code that I build myself from the source code, and therefore I am a bit bad at giving advice that applies to the packaged code.
I downloaded and extracted the files from the following packages: mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main...
The server executable /usr/sbin/mysqld is included in the mysql-server-core package. Unfortunately I was unable to match that executable with the addresses from your gdb output. I did the following:
file mysqld mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped
gdb mysqld symbol-file ../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3 b9160bed.debug
followed by a number of gdb commands. I expected the least significant 12 bits of the addresses to match in the disassembly of ha_initialize_handlerton() and plugin_init(), but that was not the case. Maybe I did something wrong.
So, can you please install the debug symbol package and try again?
Best regards,
Marko
-- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation DON’T MISS M|18 MariaDB User Conference February 26 - 27, 2018 New York City https://m18.mariadb.com/
Hi Marko, Not sure how I missed your reply. Sorry for the delayed response. To give you a complete picture, this fresh install of MariaDB was populated with data copied from MySQL 5.6. The data was copied via rsync (https://www.stephenrlang.com/2016/08/setting-up-mysql-master-slave-replicati...), mysql_upgrade, and here we are. Prior to the upgrade to 10.2, I did need to go through a crash recover back in Sept. 2017, but it seemed to recover just fine. Here is a snippet from back then: 2017-09-17 09:23:59 16441 [Note] InnoDB: Database was not shutdown normally! 2017-09-17 09:23:59 16441 [Note] InnoDB: Starting crash recovery. 2017-09-17 09:23:59 16441 [Note] InnoDB: Reading tablespace information from the .ibd files... 2017-09-17 10:21:30 16441 [Note] InnoDB: Restoring possible half-written data pages 2017-09-17 10:21:30 16441 [Note] InnoDB: from the doublewrite buffer... InnoDB: Doing recovery: scanned up to log sequence number 1183264246272 InnoDB: Doing recovery: scanned up to log sequence number 1183266067498 2017-09-17 10:21:32 16441 [Note] InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: Last MySQL binlog file position 0 133845121, file name mysql-bin.003599 2017-09-17 10:22:22 16441 [Note] InnoDB: 128 rollback segment(s) are active. 2017-09-17 10:22:22 16441 [Note] InnoDB: Waiting for purge to start 2017-09-17 10:22:23 16441 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.16-64.2 started; log sequence number 1183266067498 2017-09-17 10:22:23 16441 [Note] Recovering after a crash using /var/lib/mysql/mysql-bin 2017-09-17 10:22:24 16441 [Note] Starting crash recovery... 2017-09-17 10:22:24 16441 [Note] Crash recovery finished. I've restarted a few times since then without any known issue seen in the log. Fast forward to the migration to 10.2. Following the rsync process, upon initial startup of MariaDB, it did report a crash recovery. (see attached log). The recovery seemed to run just fine, and no issues reported in the log following mysql_upgrade. I've restarted numerous times since then, and the log file isn't reporting anything funky that my newb eyes can see. For where I am right now, would the recommended resolution would be to restart once with --innodb-force-recovery=1 ? Thanks again for for your thorough analysis and easy to understand explanation. Best, Mike On 2018-01-10 05:15 PM, Marko Mäkelä wrote:
Hi Michael,
Thanks! In the stack trace, I see fsp_flags_try_adjust(), a function that I had to introduce to fix a file format compatibility problem that had been introduced in MariaDB 10.1.0: MDEV-11623 https://jira.mariadb.org/browse/MDEV-11623, commit ab1e6fefd https://github.com/MariaDB/server/commit/ab1e6fefd869242d962cb91a006f37bb9ad... in MariaDB 10.1.21, 10.2.4. The format compatibility problem mostly affects a non-default innodb_page_size.
However, the "root cause" for this slowdown is the validate=true in the following code in the function innobase_start_or_create_for_mysql():
bool validate = recv_needed_recovery && srv_force_recovery == 0;
dict_check_tablespaces_and_store_max_id(validate);
Normally, we should have validate=false, because no crash recovery should be needed. If the server was not shut down cleanly, then there should be a startup message "Starting crash recovery", and recv_needed_recovery should hold. The variable srv_force_recovery is tied to the parameter innodb_force_recovery, which is 0 by default.
To me, this looks like it could be 'working as designed'. But, you did not share the server error log from the very start. Was there a crash recovery message? If not, then it is an interesting bug.
If you copied the data files from a file system snapshot from a running server, then it is very likely that crash recovery was needed. You could work around this by setting --innodb-force-recovery=1 when starting up. But beware that this will cause InnoDB to ignore certain errors. Maybe a better idea would be to invoke "mariabackup --prepare" and "mariabackup --copy-back" in order to apply the redo log before starting up the server on the copied files. And while you are at it, you could also use "mariabackup --backup" and skip the file system snapshot.
Best regards,
Marko
On Tue, Jan 9, 2018 at 4:32 PM, Michael Caplan
mailto:michael@eggplant.ws> wrote: Hi Marko,
Thanks for your patience with me. I think I have better material for you to look at now.
Best,
Mike
On 2018-01-09 03:17 AM, Marko Mäkelä wrote:
Michael,
Sorry, I now realize that you must likely have used a stripped executable, and therefore the symbols for the InnoDB functions are missing (displayed as ?? in the stack trace output from gdb). As far as I understand, gdb should be able to resolve the symbols if you install the separate debug symbol package:
sudo apt install mariadb-server-core-10.2-dbgsym
I tried to find the installation package corresponding to '10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to resolve the stack trace manually. As a developer, normally I work with code that I build myself from the source code, and therefore I am a bit bad at giving advice that applies to the packaged code.
I downloaded and extracted the files from the following packages: mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main...
The server executable /usr/sbin/mysqld is included in the mysql-server-core package. Unfortunately I was unable to match that executable with the addresses from your gdb output. I did the following:
file mysqld mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped
gdb mysqld symbol-file ../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3b9160bed.debug
followed by a number of gdb commands. I expected the least significant 12 bits of the addresses to match in the disassembly of ha_initialize_handlerton() and plugin_init(), but that was not the case. Maybe I did something wrong.
So, can you please install the debug symbol package and try again?
Best regards,
Marko
-- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation
DON’T MISS
M|18
MariaDB User Conference
February 26 - 27, 2018
New York City
Hi Marko, I've been reading more into --innodb-force-recovery and trying to understand the state of my new slaves. If I understand what you are saying, the current databases are in an unrecovered state and requires crash recovery. If that is the case, should my log file indicate this the case when I restart. To be sure, when I shutdown, it looks like it is shutting down cleanly: 2018-01-07 19:51:13 140398761576192 [Note] Error reading relay log event: slave SQL thread was killed 2018-01-07 19:51:13 140398761576192 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.003923' at position 662261965 2018-01-07 19:51:13 140398761879296 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.003923', position 662261965 2018-01-07 19:51:37 140398761576192 [Note] /usr/sbin/mysqld (unknown): Normal shutdown 2018-01-07 19:51:37 140398761576192 [Note] Event Scheduler: Purging the queue. 0 events 2018-01-07 19:51:37 140338008692480 [Note] InnoDB: FTS optimize thread exiting. 2018-01-07 19:51:38 140398761576192 [Note] InnoDB: Starting shutdown... 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 0, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 1, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 2, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 3, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 4, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 5, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 6, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 7, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Buffer pool(s) dump completed at 180107 19:51:38 2018-01-07 19:52:10 140398761576192 [Note] InnoDB: Shutdown completed; log sequence number 1395535052281 2018-01-07 19:52:10 140398761576192 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2018-01-07 19:52:10 140398761576192 [Note] /usr/sbin/mysqld: Shutdown complete Startup also seems to run clean too (except for the long time): 2018-01-07 19:52:12 140563014502592 [Note] Using unique option prefix 'myisam-recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead. 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Uses event mutexes 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Compressed tables use zlib 1.2.8 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Using Linux native AIO 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Number of pools: 1 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Using SSE2 crc32 instructions 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Initializing buffer pool, total size = 54G, instances = 8, chunk size = 128M 2018-01-07 19:52:14 140563014502592 [Note] InnoDB: Completed initialization of buffer pool 2018-01-07 19:52:14 140500678067968 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-01-07 19:52:14 140563014502592 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: Waiting for purge to start 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: 5.7.20 started; log sequence number 1395535052281 2018-01-07 20:20:49 140502034765568 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-01-07 20:20:49 140563014502592 [Note] Plugin 'FEEDBACK' is disabled. 2018-01-07 20:20:49 140563014502592 [Note] Server socket created on IP: '::'. 2018-01-07 20:20:49 140563014502592 [Note] Reading of all Master_info entries succeded 2018-01-07 20:20:49 140563014502592 [Note] Added new Master_info '' to hash table 2018-01-07 20:20:49 140563014502592 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.11-MariaDB-10.2.11+maria~xenial-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 mariadb.org binary distribution I guess I'm at a loss with regards to understanding the state of my current database slave. It is not reporting that crash recovery is required, yet you seem to be suggesting that under the hood it is going through motions to perform a crash recovery? Any recommendations for next steps? Thanks, Mike On 2018-01-12 08:42 PM, Michael Caplan wrote:
Hi Marko,
Not sure how I missed your reply. Sorry for the delayed response.
To give you a complete picture, this fresh install of MariaDB was populated with data copied from MySQL 5.6. The data was copied via rsync (https://www.stephenrlang.com/2016/08/setting-up-mysql-master-slave-replicati...), mysql_upgrade, and here we are. Prior to the upgrade to 10.2, I did need to go through a crash recover back in Sept. 2017, but it seemed to recover just fine. Here is a snippet from back then:
2017-09-17 09:23:59 16441 [Note] InnoDB: Database was not shutdown normally! 2017-09-17 09:23:59 16441 [Note] InnoDB: Starting crash recovery. 2017-09-17 09:23:59 16441 [Note] InnoDB: Reading tablespace information from the .ibd files... 2017-09-17 10:21:30 16441 [Note] InnoDB: Restoring possible half-written data pages 2017-09-17 10:21:30 16441 [Note] InnoDB: from the doublewrite buffer... InnoDB: Doing recovery: scanned up to log sequence number 1183264246272 InnoDB: Doing recovery: scanned up to log sequence number 1183266067498 2017-09-17 10:21:32 16441 [Note] InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: Last MySQL binlog file position 0 133845121, file name mysql-bin.003599 2017-09-17 10:22:22 16441 [Note] InnoDB: 128 rollback segment(s) are active. 2017-09-17 10:22:22 16441 [Note] InnoDB: Waiting for purge to start 2017-09-17 10:22:23 16441 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.16-64.2 started; log sequence number 1183266067498 2017-09-17 10:22:23 16441 [Note] Recovering after a crash using /var/lib/mysql/mysql-bin 2017-09-17 10:22:24 16441 [Note] Starting crash recovery... 2017-09-17 10:22:24 16441 [Note] Crash recovery finished.
I've restarted a few times since then without any known issue seen in the log.
Fast forward to the migration to 10.2. Following the rsync process, upon initial startup of MariaDB, it did report a crash recovery. (see attached log). The recovery seemed to run just fine, and no issues reported in the log following mysql_upgrade. I've restarted numerous times since then, and the log file isn't reporting anything funky that my newb eyes can see.
For where I am right now, would the recommended resolution would be to restart once with --innodb-force-recovery=1 ?
Thanks again for for your thorough analysis and easy to understand explanation.
Best,
Mike
On 2018-01-10 05:15 PM, Marko Mäkelä wrote:
Hi Michael,
Thanks! In the stack trace, I see fsp_flags_try_adjust(), a function that I had to introduce to fix a file format compatibility problem that had been introduced in MariaDB 10.1.0: MDEV-11623 https://jira.mariadb.org/browse/MDEV-11623, commit ab1e6fefd https://github.com/MariaDB/server/commit/ab1e6fefd869242d962cb91a006f37bb9ad... in MariaDB 10.1.21, 10.2.4. The format compatibility problem mostly affects a non-default innodb_page_size.
However, the "root cause" for this slowdown is the validate=true in the following code in the function innobase_start_or_create_for_mysql():
bool validate = recv_needed_recovery && srv_force_recovery == 0;
dict_check_tablespaces_and_store_max_id(validate);
Normally, we should have validate=false, because no crash recovery should be needed. If the server was not shut down cleanly, then there should be a startup message "Starting crash recovery", and recv_needed_recovery should hold. The variable srv_force_recovery is tied to the parameter innodb_force_recovery, which is 0 by default.
To me, this looks like it could be 'working as designed'. But, you did not share the server error log from the very start. Was there a crash recovery message? If not, then it is an interesting bug.
If you copied the data files from a file system snapshot from a running server, then it is very likely that crash recovery was needed. You could work around this by setting --innodb-force-recovery=1 when starting up. But beware that this will cause InnoDB to ignore certain errors. Maybe a better idea would be to invoke "mariabackup --prepare" and "mariabackup --copy-back" in order to apply the redo log before starting up the server on the copied files. And while you are at it, you could also use "mariabackup --backup" and skip the file system snapshot.
Best regards,
Marko
On Tue, Jan 9, 2018 at 4:32 PM, Michael Caplan
mailto:michael@eggplant.ws> wrote: Hi Marko,
Thanks for your patience with me. I think I have better material for you to look at now.
Best,
Mike
On 2018-01-09 03:17 AM, Marko Mäkelä wrote:
Michael,
Sorry, I now realize that you must likely have used a stripped executable, and therefore the symbols for the InnoDB functions are missing (displayed as ?? in the stack trace output from gdb). As far as I understand, gdb should be able to resolve the symbols if you install the separate debug symbol package:
sudo apt install mariadb-server-core-10.2-dbgsym
I tried to find the installation package corresponding to '10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to resolve the stack trace manually. As a developer, normally I work with code that I build myself from the source code, and therefore I am a bit bad at giving advice that applies to the packaged code.
I downloaded and extracted the files from the following packages: mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main...
The server executable /usr/sbin/mysqld is included in the mysql-server-core package. Unfortunately I was unable to match that executable with the addresses from your gdb output. I did the following:
file mysqld mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped
gdb mysqld symbol-file ../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3b9160bed.debug
followed by a number of gdb commands. I expected the least significant 12 bits of the addresses to match in the disassembly of ha_initialize_handlerton() and plugin_init(), but that was not the case. Maybe I did something wrong.
So, can you please install the debug symbol package and try again?
Best regards,
Marko
-- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation
DON’T MISS
M|18
MariaDB User Conference
February 26 - 27, 2018
New York City
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi Marko, For what it is worth (and I am unsure if I understand the issue you described), I ran mysqlcheck --extended on all database tables. It ran clean. Why do you think my database is in a crashed state and therefore needing recovery? Thanks, Mike On 2018-01-16 10:44 AM, Michael Caplan wrote:
Hi Marko,
I've been reading more into --innodb-force-recovery and trying to understand the state of my new slaves.
If I understand what you are saying, the current databases are in an unrecovered state and requires crash recovery. If that is the case, should my log file indicate this the case when I restart.
To be sure, when I shutdown, it looks like it is shutting down cleanly:
2018-01-07 19:51:13 140398761576192 [Note] Error reading relay log event: slave SQL thread was killed 2018-01-07 19:51:13 140398761576192 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.003923' at position 662261965 2018-01-07 19:51:13 140398761879296 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.003923', position 662261965 2018-01-07 19:51:37 140398761576192 [Note] /usr/sbin/mysqld (unknown): Normal shutdown 2018-01-07 19:51:37 140398761576192 [Note] Event Scheduler: Purging the queue. 0 events 2018-01-07 19:51:37 140338008692480 [Note] InnoDB: FTS optimize thread exiting. 2018-01-07 19:51:38 140398761576192 [Note] InnoDB: Starting shutdown... 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 0, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 1, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 2, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 3, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 4, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 5, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 6, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Instance 7, restricted to 110592 pages due to innodb_buf_pool_dump_pct=25 2018-01-07 19:51:38 140337896998656 [Note] InnoDB: Buffer pool(s) dump completed at 180107 19:51:38 2018-01-07 19:52:10 140398761576192 [Note] InnoDB: Shutdown completed; log sequence number 1395535052281 2018-01-07 19:52:10 140398761576192 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2018-01-07 19:52:10 140398761576192 [Note] /usr/sbin/mysqld: Shutdown complete
Startup also seems to run clean too (except for the long time):
2018-01-07 19:52:12 140563014502592 [Note] Using unique option prefix 'myisam-recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead. 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Uses event mutexes 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Compressed tables use zlib 1.2.8 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Using Linux native AIO 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Number of pools: 1 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Using SSE2 crc32 instructions 2018-01-07 19:52:12 140563014502592 [Note] InnoDB: Initializing buffer pool, total size = 54G, instances = 8, chunk size = 128M 2018-01-07 19:52:14 140563014502592 [Note] InnoDB: Completed initialization of buffer pool 2018-01-07 19:52:14 140500678067968 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-01-07 19:52:14 140563014502592 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: Waiting for purge to start 2018-01-07 20:20:49 140563014502592 [Note] InnoDB: 5.7.20 started; log sequence number 1395535052281 2018-01-07 20:20:49 140502034765568 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-01-07 20:20:49 140563014502592 [Note] Plugin 'FEEDBACK' is disabled. 2018-01-07 20:20:49 140563014502592 [Note] Server socket created on IP: '::'. 2018-01-07 20:20:49 140563014502592 [Note] Reading of all Master_info entries succeded 2018-01-07 20:20:49 140563014502592 [Note] Added new Master_info '' to hash table 2018-01-07 20:20:49 140563014502592 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.11-MariaDB-10.2.11+maria~xenial-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 mariadb.org binary distribution
I guess I'm at a loss with regards to understanding the state of my current database slave. It is not reporting that crash recovery is required, yet you seem to be suggesting that under the hood it is going through motions to perform a crash recovery?
Any recommendations for next steps?
Thanks,
Mike
On 2018-01-12 08:42 PM, Michael Caplan wrote:
Hi Marko,
Not sure how I missed your reply. Sorry for the delayed response.
To give you a complete picture, this fresh install of MariaDB was populated with data copied from MySQL 5.6. The data was copied via rsync (https://www.stephenrlang.com/2016/08/setting-up-mysql-master-slave-replicati...), mysql_upgrade, and here we are. Prior to the upgrade to 10.2, I did need to go through a crash recover back in Sept. 2017, but it seemed to recover just fine. Here is a snippet from back then:
2017-09-17 09:23:59 16441 [Note] InnoDB: Database was not shutdown normally! 2017-09-17 09:23:59 16441 [Note] InnoDB: Starting crash recovery. 2017-09-17 09:23:59 16441 [Note] InnoDB: Reading tablespace information from the .ibd files... 2017-09-17 10:21:30 16441 [Note] InnoDB: Restoring possible half-written data pages 2017-09-17 10:21:30 16441 [Note] InnoDB: from the doublewrite buffer... InnoDB: Doing recovery: scanned up to log sequence number 1183264246272 InnoDB: Doing recovery: scanned up to log sequence number 1183266067498 2017-09-17 10:21:32 16441 [Note] InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: Last MySQL binlog file position 0 133845121, file name mysql-bin.003599 2017-09-17 10:22:22 16441 [Note] InnoDB: 128 rollback segment(s) are active. 2017-09-17 10:22:22 16441 [Note] InnoDB: Waiting for purge to start 2017-09-17 10:22:23 16441 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.16-64.2 started; log sequence number 1183266067498 2017-09-17 10:22:23 16441 [Note] Recovering after a crash using /var/lib/mysql/mysql-bin 2017-09-17 10:22:24 16441 [Note] Starting crash recovery... 2017-09-17 10:22:24 16441 [Note] Crash recovery finished.
I've restarted a few times since then without any known issue seen in the log.
Fast forward to the migration to 10.2. Following the rsync process, upon initial startup of MariaDB, it did report a crash recovery. (see attached log). The recovery seemed to run just fine, and no issues reported in the log following mysql_upgrade. I've restarted numerous times since then, and the log file isn't reporting anything funky that my newb eyes can see.
For where I am right now, would the recommended resolution would be to restart once with --innodb-force-recovery=1 ?
Thanks again for for your thorough analysis and easy to understand explanation.
Best,
Mike
On 2018-01-10 05:15 PM, Marko Mäkelä wrote:
Hi Michael,
Thanks! In the stack trace, I see fsp_flags_try_adjust(), a function that I had to introduce to fix a file format compatibility problem that had been introduced in MariaDB 10.1.0: MDEV-11623 https://jira.mariadb.org/browse/MDEV-11623, commit ab1e6fefd https://github.com/MariaDB/server/commit/ab1e6fefd869242d962cb91a006f37bb9ad... in MariaDB 10.1.21, 10.2.4. The format compatibility problem mostly affects a non-default innodb_page_size.
However, the "root cause" for this slowdown is the validate=true in the following code in the function innobase_start_or_create_for_mysql():
bool validate = recv_needed_recovery && srv_force_recovery == 0;
dict_check_tablespaces_and_store_max_id(validate);
Normally, we should have validate=false, because no crash recovery should be needed. If the server was not shut down cleanly, then there should be a startup message "Starting crash recovery", and recv_needed_recovery should hold. The variable srv_force_recovery is tied to the parameter innodb_force_recovery, which is 0 by default.
To me, this looks like it could be 'working as designed'. But, you did not share the server error log from the very start. Was there a crash recovery message? If not, then it is an interesting bug.
If you copied the data files from a file system snapshot from a running server, then it is very likely that crash recovery was needed. You could work around this by setting --innodb-force-recovery=1 when starting up. But beware that this will cause InnoDB to ignore certain errors. Maybe a better idea would be to invoke "mariabackup --prepare" and "mariabackup --copy-back" in order to apply the redo log before starting up the server on the copied files. And while you are at it, you could also use "mariabackup --backup" and skip the file system snapshot.
Best regards,
Marko
On Tue, Jan 9, 2018 at 4:32 PM, Michael Caplan
mailto:michael@eggplant.ws> wrote: Hi Marko,
Thanks for your patience with me. I think I have better material for you to look at now.
Best,
Mike
On 2018-01-09 03:17 AM, Marko Mäkelä wrote:
Michael,
Sorry, I now realize that you must likely have used a stripped executable, and therefore the symbols for the InnoDB functions are missing (displayed as ?? in the stack trace output from gdb). As far as I understand, gdb should be able to resolve the symbols if you install the separate debug symbol package:
sudo apt install mariadb-server-core-10.2-dbgsym
I tried to find the installation package corresponding to '10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to resolve the stack trace manually. As a developer, normally I work with code that I build myself from the source code, and therefore I am a bit bad at giving advice that applies to the packaged code.
I downloaded and extracted the files from the following packages: mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main... mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main...
The server executable /usr/sbin/mysqld is included in the mysql-server-core package. Unfortunately I was unable to match that executable with the addresses from your gdb output. I did the following:
file mysqld mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped
gdb mysqld symbol-file ../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3b9160bed.debug
followed by a number of gdb commands. I expected the least significant 12 bits of the addresses to match in the disassembly of ha_initialize_handlerton() and plugin_init(), but that was not the case. Maybe I did something wrong.
So, can you please install the debug symbol package and try again?
Best regards,
Marko
-- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation
DON’T MISS
M|18
MariaDB User Conference
February 26 - 27, 2018
New York City
_______________________________________________ Mailing list:https://launchpad.net/~maria-discuss Post to :maria-discuss@lists.launchpad.net Unsubscribe :https://launchpad.net/~maria-discuss More help :https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi Mike, Sorry, I was busy with other development tasks. In the error log that you attached, the first startup upgrades the redo log from the pre-MySQL 5.7.9/MariaDB 10.2.2 format:
2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Upgrading redo log: 2*536870912 bytes; LSN=1395657325222 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Upgrading redo log: 2*536870912 bytes; LSN=1395657325222 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Starting to delete and rewrite log files. 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Setting log file ./ib_logfile101 size to 536870912 bytes 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Setting log file ./ib_logfile1 size to 536870912 bytes 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: New log files created, LSN=1395657325222 2018-01-08 6:01:49 140390495905984 [Note] InnoDB: 128 out of 128 rollback segments are active.
This should not count as a crash recovery, because we would refuse startup unless the redo log was logically empty (crash-upgrade). So, something is wrong already here. There is one more strange thing from that startup:
2018-01-08 6:01:49 140390495905984 [Note] InnoDB: Waiting for purge to start 2018-01-08 6:01:49 140390495905984 [Note] InnoDB: 5.7.20 started; log sequence number 0
Why is it reporting 0 instead of 1395657325222 or more? We are missing an assignment srv_start_lsn = log_sys->lsn when upgrading the redo log. Then, there is an apparently clean shutdown and subsequent startup with no sign of any crash recovery taking place. But still we have the following delay:
2018-01-08 8:23:03 140364886096064 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-08 8:45:14 140364886096064 [Note] InnoDB: 128 out of 128 rollback segments are active.
There are only two assignments recv_needed_recovery=true. Together with the assignments, we should output one of the following messages:
[Note] InnoDB: Starting crash recovery from checkpoint LSN=_ [Note] InnoDB: The log sequence number _ in the system tablespace does not match the log sequence number _ in the ib_logfiles!
#11 0x0000558a37dd1c24 in fil_ibd_open (validate=<optimized out>, validate@entry=true, fix_dict=<optimized out>, purpose=purpose@entry=FIL_TYPE_TABLESPACE, id=1233316, flags=<optimized out>, space_name=<optimized out>, path_in=0x558a8d79af60 "./ddx_lab_801002/pickups.ibd") at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/fil/fil0fil.cc:4211 #12 0x0000558a37db15c9 in dict_check_sys_tables (validate=true) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/dict/dict0load.cc:1482 #13 dict_check_tablespaces_and_store_max_id (validate=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/dict/dict0load.cc:1537
Actually, we do not know what the value of the parameter was in stack
But, I do not see either message in the log excerpt. Oh, now I realize that I was misled by the following in the gdb trace that you previously attached: frame 13. It might very well have been passed as 'false'. In dict_check_sys_tables() I see this code which is not present in MySQL 5.7: /* Check that the .ibd file exists. */ validate = true; /* Encryption */ dberr_t err = fil_ibd_open( validate, Thie above assignment 'validate = true' is the smoking gun! It was added in MariaDB 10.2.2 https://github.com/MariaDB/server/commit/2e814d4702d71a04388386a9f591d14a359..., apparently by mistake. I just checked the differences in the function dict_check_sys_tables() between MariaDB 10.2 and MySQL 5.7, and this assignment is the only one that does not make any sense whatsoever. I will shortly push a fix. Thank you for your patience (and sorry for the pun)! Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation DON’T MISS M|18 MariaDB User Conference February 26 - 27, 2018 New York City https://m18.mariadb.com/
Marko, Thank you again for such a clear explanation of your reasoning! Should you be able to pass along a defect ID that I can follow, I'll follow that and wait for the next release to test things out. Best, Mike On 2018-01-16 01:48 PM, Marko Mäkelä wrote:
Hi Mike,
Sorry, I was busy with other development tasks.
In the error log that you attached, the first startup upgrades the redo log from the pre-MySQL 5.7.9/MariaDB 10.2.2 format:
2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Upgrading redo log: 2*536870912 bytes; LSN=1395657325222 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Upgrading redo log: 2*536870912 bytes; LSN=1395657325222 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Starting to delete and rewrite log files. 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Setting log file ./ib_logfile101 size to 536870912 bytes 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Setting log file ./ib_logfile1 size to 536870912 bytes 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0 2018-01-08 5:39:50 140390495905984 [Note] InnoDB: New log files created, LSN=1395657325222 2018-01-08 6:01:49 140390495905984 [Note] InnoDB: 128 out of 128 rollback segments are active.
This should not count as a crash recovery, because we would refuse startup unless the redo log was logically empty (crash-upgrade). So, something is wrong already here. There is one more strange thing from that startup:
2018-01-08 6:01:49 140390495905984 [Note] InnoDB: Waiting for purge to start 2018-01-08 6:01:49 140390495905984 [Note] InnoDB: 5.7.20 started; log sequence number 0
Why is it reporting 0 instead of 1395657325222 or more? We are missing an assignment srv_start_lsn = log_sys->lsn when upgrading the redo log.
Then, there is an apparently clean shutdown and subsequent startup with no sign of any crash recovery taking place. But still we have the following delay:
2018-01-08 8:23:03 140364886096064 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-08 8:45:14 140364886096064 [Note] InnoDB: 128 out of 128 rollback segments are active.
There are only two assignments recv_needed_recovery=true. Together with the assignments, we should output one of the following messages:
[Note] InnoDB: Starting crash recovery from checkpoint LSN=_ [Note] InnoDB: The log sequence number _ in the system tablespace does not match the log sequence number _ in the ib_logfiles!
But, I do not see either message in the log excerpt.
Oh, now I realize that I was misled by the following in the gdb trace that you previously attached:
#11 0x0000558a37dd1c24 in fil_ibd_open (validate=<optimized out>, validate@entry=true, fix_dict=<optimized out>, purpose=purpose@entry=FIL_TYPE_TABLESPACE, id=1233316, flags=<optimized out>, space_name=<optimized out>, path_in=0x558a8d79af60 "./ddx_lab_801002/pickups.ibd") at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/fil/fil0fil.cc:4211 #12 0x0000558a37db15c9 in dict_check_sys_tables (validate=true) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/dict/dict0load.cc:1482 #13 dict_check_tablespaces_and_store_max_id (validate=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/dict/dict0load.cc:1537
Actually, we do not know what the value of the parameter was in stack frame 13. It might very well have been passed as 'false'. In dict_check_sys_tables() I see this code which is not present in MySQL 5.7:
/* Check that the .ibd file exists. */ validate = true; /* Encryption */
dberr_t err = fil_ibd_open( validate,
Thie above assignment 'validate = true' is the smoking gun! It was added in MariaDB 10.2.2 https://github.com/MariaDB/server/commit/2e814d4702d71a04388386a9f591d14a359..., apparently by mistake. I just checked the differences in the function dict_check_sys_tables() between MariaDB 10.2 and MySQL 5.7, and this assignment is the only one that does not make any sense whatsoever.
I will shortly push a fix.
Thank you for your patience (and sorry for the pun)!
Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation
DON’T MISS
M|18
MariaDB User Conference
February 26 - 27, 2018
New York City
Hi Mike, I will use MDEV-13869 MariaDB slow start https://jira.mariadb.org/browse/MDEV-13869 for tracking this fix. The fix is not as trivial as I thought; I got some of my corruption-injection tests failing. Furthermore, I think that in MariaDB 10.2+, we could remove the extended checks also when crash recovery is needed, to avoid unnecessarily long downtime. That one is the already mentioned MDEV-9843 https://jira.mariadb.org/browse/MDEV-9843 InnoDB hangs on startup… https://jira.mariadb.org/browse/MDEV-9843. Related to that, there is also MDEV-14481 Execute InnoDB crash recovery in the background https://jira.mariadb.org/browse/MDEV-14481 which we could have in MariaDB 10.4 or maybe even earlier. Best regards, Marko
Hi Marko, Hope it is okay to resurrect this thread. Restart times where great with 10.2.14, until I recently enabled FTS on thousands of tables. Shutdown and startup both take over an hour. This, coupled with MariaDB eating up far greater sums of allocated memory in my.cnf, I need to restart every few few days or risk hitting the OOM Killer. Any ideas? Thanks Mike Shutdown example pulled from my log (16 minutes): 2019-03-07 20:20:28 140185257170688 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown 2019-03-07 20:20:28 140185257170688 [Note] Event Scheduler: Purging the queue. 0 events 2019-03-07 20:20:35 140185550460672 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. 2019-03-07 20:20:44 140185550460672 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. 2019-03-07 20:20:48 140185550460672 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. .... (hundreds more) 2019-03-07 20:30:24 140185550460672 [Note] InnoDB: FTS optimize thread exiting. 2019-03-07 20:31:02 140185257170688 [Note] InnoDB: Starting shutdown... 2019-03-07 20:31:02 140185299834624 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2019-03-07 20:31:02 140185299834624 [Note] InnoDB: Instance 0, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:03 140185299834624 [Note] InnoDB: Instance 1, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:04 140185299834624 [Note] InnoDB: Instance 2, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:05 140185299834624 [Note] InnoDB: Instance 3, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:06 140185299834624 [Note] InnoDB: Instance 4, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:07 140185299834624 [Note] InnoDB: Instance 5, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:09 140185299834624 [Note] InnoDB: Instance 6, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:10 140185299834624 [Note] InnoDB: Instance 7, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:12 140185299834624 [Note] InnoDB: Buffer pool(s) dump completed at 190307 20:31:12 2019-03-07 20:36:36 140185257170688 [Note] InnoDB: Shutdown completed; log sequence number 3123967291460 2019-03-07 20:36:36 140185257170688 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2019-03-07 20:36:43 140185257170688 [Note] /usr/sbin/mysqld: Shutdown complete Startup (close to 1 hour): 2019-03-07 20:38:36 140573848283328 [Note] Using unique option prefix 'myisam-recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead. 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Uses event mutexes 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Compressed tables use zlib 1.2.8 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Using Linux native AIO 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Number of pools: 1 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Using SSE2 crc32 instructions 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Initializing buffer pool, total size = 46G, instances = 8, chunk size = 128M 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Setting NUMA memory policy to MPOL_INTERLEAVE 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Setting NUMA memory policy to MPOL_DEFAULT 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Completed initialization of buffer pool 2019-03-07 20:38:38 140520675911424 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Highest supported file format is Barracuda. 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: 128 out of 128 rollback segments are active. 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: Creating shared tablespace for temporary tables 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2019-03-07 21:31:02 140573848283328 [Note] InnoDB: Waiting for purge to start 2019-03-07 21:31:02 140573848283328 [Note] InnoDB: 5.7.21 started; log sequence number 3123967291460 2019-03-07 21:31:02 140521824470784 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2019-03-07 21:31:02 140573848283328 [Note] Plugin 'FEEDBACK' is disabled. 2019-03-07 21:31:02 140573848283328 [Note] Server socket created on IP: '::'. 2019-03-07 21:31:02 140573848283328 [Warning] 'user' entry 'root@usaddxprdsqlpm1' ignored in --skip-name-resolve mode. 2019-03-07 21:31:02 140573848283328 [Warning] 'proxies_priv' entry '@% root@usaddxprdsqlpm1' ignored in --skip-name-resolve mode. 2019-03-07 21:31:02 140573848283328 [Note] Reading of all Master_info entries succeded 2019-03-07 21:31:02 140573848283328 [Note] Added new Master_info '' to hash table 2019-03-07 21:31:02 140573848283328 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.14-MariaDB-10.2.14+maria~xenial-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution On 2018-01-16 2:35 p.m., Marko Mäkelä wrote:
Hi Mike,
I will use MDEV-13869 MariaDB slow start https://jira.mariadb.org/browse/MDEV-13869 for tracking this fix. The fix is not as trivial as I thought; I got some of my corruption-injection tests failing. Furthermore, I think that in MariaDB 10.2+, we could remove the extended checks also when crash recovery is needed, to avoid unnecessarily long downtime. That one is the already mentioned MDEV-9843 https://jira.mariadb.org/browse/MDEV-9843InnoDB hangs on startup… https://jira.mariadb.org/browse/MDEV-9843. Related to that, there is also MDEV-14481 Execute InnoDB crash recovery in the background https://jira.mariadb.org/browse/MDEV-14481 which we could have in MariaDB 10.4 or maybe even earlier.
Best regards,
Marko
Not user what is currently going on. Restarting our slave. Last start process was around an hour. This time around I'm at close to 2 hours and still waiting. According to gbd, looks it is stuck in storage/innobase/fts/fts0fts.cc:6870 fts_check_and_drop_orphaned_tables. Not sure if I hit some infinite loop.... Oh boy! On 2019-03-08 10:44 p.m., Michael Caplan wrote:
Hi Marko,
Hope it is okay to resurrect this thread. Restart times where great with 10.2.14, until I recently enabled FTS on thousands of tables. Shutdown and startup both take over an hour. This, coupled with MariaDB eating up far greater sums of allocated memory in my.cnf, I need to restart every few few days or risk hitting the OOM Killer.
Any ideas?
Thanks
Mike
Shutdown example pulled from my log (16 minutes):
2019-03-07 20:20:28 140185257170688 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown 2019-03-07 20:20:28 140185257170688 [Note] Event Scheduler: Purging the queue. 0 events 2019-03-07 20:20:35 140185550460672 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. 2019-03-07 20:20:44 140185550460672 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. 2019-03-07 20:20:48 140185550460672 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. .... (hundreds more) 2019-03-07 20:30:24 140185550460672 [Note] InnoDB: FTS optimize thread exiting. 2019-03-07 20:31:02 140185257170688 [Note] InnoDB: Starting shutdown... 2019-03-07 20:31:02 140185299834624 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2019-03-07 20:31:02 140185299834624 [Note] InnoDB: Instance 0, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:03 140185299834624 [Note] InnoDB: Instance 1, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:04 140185299834624 [Note] InnoDB: Instance 2, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:05 140185299834624 [Note] InnoDB: Instance 3, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:06 140185299834624 [Note] InnoDB: Instance 4, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:07 140185299834624 [Note] InnoDB: Instance 5, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:09 140185299834624 [Note] InnoDB: Instance 6, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:10 140185299834624 [Note] InnoDB: Instance 7, restricted to 73728 pages due to innodb_buf_pool_dump_pct=25 2019-03-07 20:31:12 140185299834624 [Note] InnoDB: Buffer pool(s) dump completed at 190307 20:31:12 2019-03-07 20:36:36 140185257170688 [Note] InnoDB: Shutdown completed; log sequence number 3123967291460 2019-03-07 20:36:36 140185257170688 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2019-03-07 20:36:43 140185257170688 [Note] /usr/sbin/mysqld: Shutdown complete
Startup (close to 1 hour):
2019-03-07 20:38:36 140573848283328 [Note] Using unique option prefix 'myisam-recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead. 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Uses event mutexes 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Compressed tables use zlib 1.2.8 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Using Linux native AIO 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Number of pools: 1 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Using SSE2 crc32 instructions 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Initializing buffer pool, total size = 46G, instances = 8, chunk size = 128M 2019-03-07 20:38:37 140573848283328 [Note] InnoDB: Setting NUMA memory policy to MPOL_INTERLEAVE 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Setting NUMA memory policy to MPOL_DEFAULT 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Completed initialization of buffer pool 2019-03-07 20:38:38 140520675911424 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Highest supported file format is Barracuda. 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: 128 out of 128 rollback segments are active. 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: Creating shared tablespace for temporary tables 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2019-03-07 21:31:02 140573848283328 [Note] InnoDB: Waiting for purge to start 2019-03-07 21:31:02 140573848283328 [Note] InnoDB: 5.7.21 started; log sequence number 3123967291460 2019-03-07 21:31:02 140521824470784 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2019-03-07 21:31:02 140573848283328 [Note] Plugin 'FEEDBACK' is disabled. 2019-03-07 21:31:02 140573848283328 [Note] Server socket created on IP: '::'. 2019-03-07 21:31:02 140573848283328 [Warning] 'user' entry 'root@usaddxprdsqlpm1' ignored in --skip-name-resolve mode. 2019-03-07 21:31:02 140573848283328 [Warning] 'proxies_priv' entry '@% root@usaddxprdsqlpm1' ignored in --skip-name-resolve mode. 2019-03-07 21:31:02 140573848283328 [Note] Reading of all Master_info entries succeded 2019-03-07 21:31:02 140573848283328 [Note] Added new Master_info '' to hash table 2019-03-07 21:31:02 140573848283328 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.14-MariaDB-10.2.14+maria~xenial-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
On 2018-01-16 2:35 p.m., Marko Mäkelä wrote:
Hi Mike,
I will use MDEV-13869 MariaDB slow start https://jira.mariadb.org/browse/MDEV-13869 for tracking this fix. The fix is not as trivial as I thought; I got some of my corruption-injection tests failing. Furthermore, I think that in MariaDB 10.2+, we could remove the extended checks also when crash recovery is needed, to avoid unnecessarily long downtime. That one is the already mentioned MDEV-9843 https://jira.mariadb.org/browse/MDEV-9843InnoDB hangs on startup… https://jira.mariadb.org/browse/MDEV-9843. Related to that, there is also MDEV-14481 Execute InnoDB crash recovery in the background https://jira.mariadb.org/browse/MDEV-14481 which we could have in MariaDB 10.4 or maybe even earlier.
Best regards,
Marko
On Thu, 14 Mar 2019 00:08:08 -0300
Michael Caplan
Not user what is currently going on. Restarting our slave. Last start process was around an hour. This time around I'm at close to 2 hours and still waiting. According to gbd, looks it is stuck in storage/innobase/fts/fts0fts.cc:6870 fts_check_and_drop_orphaned_tables. Not sure if I hit some infinite loop....
Oh boy!
Can you show: thread apply all bt full After a little longer do the same capture. Maybe some of the local variables will be enough so show if there's progress or just odd values that could cause infinite loops. https://mariadb.com/kb/en/library/how-to-produce-a-full-stack-trace-for-mysq...
Hi Michael,
Hope it is okay to resurrect this thread. Restart times where great with 10.2.14, until I recently enabled FTS on thousands of tables. Shutdown and startup both take over an hour. This, coupled with MariaDB eating up far greater sums of allocated memory in my.cnf, I need to restart every few few days or risk hitting the OOM Killer.
Any ideas?
I am sorry, but there have been several problems with InnoDB fulltext indexes. My personal opinion is that the fulltext search in InnoDB was badly designed and is lacking essential features. These observations still apply to it: https://mariadb.com/resources/blog/initial-impressions-innodb-fulltext I hope that some day, MariaDB will gain a redesigned fulltext search that works with any storage engine. Some synchronization problems or hangs have been fixed since 10.2.14, but I am not convinced that the code is free of race conditions or hangs. Like Daniel Black pointed out, stack traces of all threads would be very helpful. MDEV-16557 (10.2.17) might help on shutdown, because it would allow some fulltext search operations to be interrupted inside InnoDB. (All currently running queries are supposed to be aborted on shutdown.)
2019-03-07 20:20:35 140185550460672 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. This basically reports a bug in InnoDB fulltext indexing. We could find this one by running a server (in a testing environment) inside gdb, and setting a breakpoint to where this message is reported.
Because of this bug, I suppose that your fulltext searches can miss some results that should have matched the search pattern. (Please file this bug separately with some more details, preferably with a repeatable test case.) While searching our bug database for the above message, I noticed that you had filed MDEV-18867 for these slow start/shutdown issues. I would suggest to continue the diagnosis there.
Startup (close to 1 hour): [snip] 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Highest supported file format is Barracuda. 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: 128 out of 128 rollback segments are active.
According to the gdb-startup.txt that you attached to MDEV-18867 the problem is fts_check_and_drop_orphaned_tables(), which we can remove once MDEV-18518 implements atomic multi-table transactions in InnoDB, possibly in MariaDB 10.5. When it comes to the slow shutdown, I would recommend to try with the newest MariaDB 10.2 release, and report details if the problem remains with that. One more source of "slow start" remains in the case when some redo log was applied in recovery (which is not the case in the log that you posted). In this case, InnoDB is reading every data file, to validate that all the .ibd files have correct names and tablespace IDs. I believe that this check is redundant and can be removed, now that MariaDB 10.2.19+ (and 10.3 or 10.4) have crash-safe RENAME operations inside InnoDB. I have filed that as https://jira.mariadb.org/browse/MDEV-18733 MariaDB slow start after crash recovery With best regards, Marko
participants (3)
-
Daniel Black
-
Marko Mäkelä
-
Michael Caplan