I want to resurrect this, because I have found what I see to be the broken bit of code/configuration. If I leave the following portion commented out, everything starts up:

-- Starts --
/etc/mysql/conf.conf.d/100_sqlerr.log

[mysqld]
#plugin_load_add = sql_errlog
sql_error_log=ON
sql_error_log_rotate=1
sql_error_log_size_limit=100000000

If I remove the comment from "plugin_load_add", the server never starts. I will produce a reproducible case, but can someone explain why the server would not start if I added a plugin that was already added? This journey sucked, but I learned a lot, even though I've been using MySQL since 3.X. Is there different syntax for 10.4? Should I just add it to the startup args and then add the variables after it starts? This worked in 10.3.

Thanks all!
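For reference, a minimal sketch of the two usual ways to load the plugin in 10.4 (the config syntax itself is unchanged from 10.3; note that once a plugin has been installed with INSTALL SONAME it is registered in mysql.plugin and loads automatically on every start, so listing it in plugin_load_add as well is typically unnecessary; the /root/db.cnf auth file is the one from the backup script further down):

-- sketch --
# option 1: load from the config file (same stanza as above)
# [mysqld]
# plugin_load_add = sql_errlog

# option 2: install once at runtime; it persists across restarts
sudo mysql --defaults-extra-file=/root/db.cnf -u root -e "INSTALL SONAME 'sql_errlog';"
sudo mysql --defaults-extra-file=/root/db.cnf -u root -e "SHOW PLUGINS;"
-- end sketch --

On Mon, Apr 27, 2020 at 3:32 PM Jeff Dyke <jeff.dyke@gmail.com> wrote: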
That second note is horribly misleading (note the bold):

-- orig --
- I was able to take a full physical backup that runs every morning from the VPC 1 (prod cluster) directly from S3 and restore it to the server where I can not restore backups of any of the prod nodes. After this successfully started, I ran mysqlcheck --all-databases (online) against this newly restored database and there was no corruption.

-- should be --
- I was able to take the full physical backup *from a slave whose master is a prod cluster node,* which runs every morning, from S3 to the server where I can not restore backups of any of the prod nodes. After this successfully started, I ran mysqlcheck --all-databases (online) against this newly restored database and there was no corruption.
Sorry and Thanks
On Mon, Apr 27, 2020 at 2:38 PM Jeff Dyke <jeff.dyke@gmail.com> wrote:
Hello again, people. I'm still at it, but have taken some time for higher-priority projects and now have some follow-up data; not sure if I have any questions, we'll see.
In the staging VPC, referred to above/below as VPC 2, I can restore any combination of databases, right down to selecting only mysql and performance_schema. In the prod VPC (VPC 1), I can not restore any combination of databases. So, as was thought before, I may have corruption in one of the tables, but....
- I have checked all tables using innochecksum (offline) and aria_chk (offline), and there are no indications of corruption (a rough sketch of these checks follows this list).
- I was able to take a full physical backup that runs every morning from the VPC 1 (prod cluster) directly from S3 and restore it to the server where I can not restore backups of any of the prod nodes. After this successfully started, I ran mysqlcheck --all-databases (online) against this newly restored database and there was no corruption.
- If I take the production node that I'm using for backup testing out of the cluster, run mysqlcheck --all-databases, and put it back in the cluster, no corrupt tables are reported.
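For anyone who wants to reproduce those checks, a minimal sketch, assuming the default Debian/Ubuntu datadir and the same /root/db.cnf auth file used in the backup script quoted below (the offline tools need the server stopped):

-- check sketch --
sudo service mariadb stop
# offline: verify InnoDB page checksums, file by file
sudo find /var/lib/mysql -name '*.ibd' -exec innochecksum {} \;
# offline: check the Aria tables (the mysql.* system tables are Aria in 10.4)
sudo sh -c 'aria_chk --check /var/lib/mysql/mysql/*.MAI'
sudo service mariadb start
# online: logical check of every table
sudo mysqlcheck --defaults-extra-file=/root/db.cnf -u root --all-databases
-- end sketch --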
I would prefer not to build an entirely new cluster, though I realize I could have done so by now, because I would really like to understand why this happens. With the backups and snapshots I can take from the slave, along with the 3 cluster nodes, my data is safe and being written to daily.
All of these databases were upgraded from 10.3 to 10.4 (in the last two months) according to the instructions on mariadb.com. I ran staging first, let it run for a couple of weeks, and then upgraded production; both are at the exact same release: Ver 10.4.12-MariaDB-1:10.4.12+maria~bionic-log for debian-linux-gnu on x86_64 (mariadb.org binary distribution).
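For anyone following along, the rolling upgrade boils down to something like the following per node (a sketch only; the package name mariadb-server-10.4 assumes the MariaDB apt repository on Ubuntu bionic, which the version string above suggests, and /root/db.cnf is the auth file from the script below):

-- upgrade sketch (one node at a time) --
sudo service mariadb stop
sudo apt-get update
sudo apt-get install mariadb-server-10.4
sudo service mariadb start
# refresh the system tables after the binaries change
sudo mysql_upgrade --defaults-extra-file=/root/db.cnf -u root
-- end sketch --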
I've considered compiling a debug-enabled version of MariaDB, but it seems that would only be useful if it were running on the Galera nodes, and I'm not too keen on doing that.
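If it ever comes to that, a debug build is straightforward; a minimal sketch, assuming the mariadb-10.4.12 tag on the GitHub mirror and that the usual build dependencies are already installed:

-- debug build sketch --
git clone --branch mariadb-10.4.12 https://github.com/MariaDB/server.git
cd server
cmake . -DCMAKE_BUILD_TYPE=Debug
make -j"$(nproc)"
-- end sketch --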
Any and all thoughts are welcome. I didn't want to repeat any information, so new readers will have to read from the start of the thread, and I do indeed appreciate it.
Jeff
On Sat, Apr 18, 2020 at 1:22 PM Jeff Dyke <jeff.dyke@gmail.com> wrote:
Hi Karl, someone else just saw that as well; it was indeed a typo in the email. To answer your questions: the plugins are the same, sql_error_log and socket_plugin (which is no longer needed in 10.4 and was removed after the upgrade). Also, I can restore a new node from a staging backup in either VPC.
Databases are a good question; no, I have not. Perhaps I should try a complete set of combinations, which may be a better use of time than bringing the cluster up to 5 nodes and then seeing if I can take a physical backup of one of the new nodes.
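A minimal sketch of what that per-database pass might look like, reusing the auth file and grep filter from the backup script quoted below (mariabackup's --databases option limits the run to the named schemas; the restore-${DB} target dirs are made up for illustration, and the wsrep_desync/wsrep_on toggling from the full script is omitted for brevity):

-- per-database backup sketch --
AUTH='--defaults-extra-file=/root/db.cnf -u root'
for DB in $(sudo mysql ${AUTH} -N -e "SHOW DATABASES" | grep -v information_schema); do
    sudo mariabackup ${AUTH} --backup --databases="${DB}" \
        --target-dir="/home/jeff/restore-${DB}" 2>&1 | grep -vE "Copying|done|log scanned"
done
-- end sketch --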
On Sat, Apr 18, 2020 at 1:11 PM Karl Erik Levik <karl.levik@gmail.com> wrote:
Well, what I mean is, you're missing the "q" in "mysql", so I was thinking perhaps that command doesn't actually do what you think it does?
If that's not the issue, then I guess you have to look again at your prod nodes and figure out what, if anything, is special/unusual about them. Do you have any special plugins enabled? Any special storage engines used that you don't have on staging? (Wouldn't think so since only InnoDB is properly supported by Galera ...) Do you have any crud in your datadir that might create problems for mariabackup?
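(For reference, a quick sketch of commands that would answer those questions, assuming the same /root/db.cnf auth file used in the backup script further down:)

-- inspection sketch --
AUTH='--defaults-extra-file=/root/db.cnf -u root'
# which plugins are loaded, and which storage engines tables actually use
sudo mysql ${AUTH} -e "SHOW PLUGINS"
sudo mysql ${AUTH} -e "SELECT engine, COUNT(*) FROM information_schema.tables GROUP BY engine"
# anything unexpected sitting in the datadir?
sudo ls -la /var/lib/mysql
-- end sketch --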
Have you tried doing a partial backup/restore instead, i.e. only some of the databases?
Cheers, Karl
On Sat, 18 Apr 2020 at 17:25, Jeff Dyke <jeff.dyke@gmail.com> wrote:
Thanks Karl, that is just there so everything under /var/lib/mysql is deleted but not the directory itself, as mariabackup needs the datadir clean (a slightly more thorough variant is sketched below). And I'm using the same "script" on both VPCs.
Appreciate the read!
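A minimal sketch of that cleanup step; find also catches dot-files, which the shell glob in "rm -rf /var/lib/mysql/*" would skip, while still leaving the directory itself in place:

-- cleanup sketch --
sudo service mariadb stop
# remove everything under the datadir, but not the datadir itself
sudo find /var/lib/mysql -mindepth 1 -delete
-- end sketch --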
On Sat, Apr 18, 2020 at 6:01 AM Karl Erik Levik <karl.levik@gmail.com> wrote:
Hi Jeff,
Probably not helpful, but I dunno if this is a typo just in your email or also in the script?
> sudo rm -rf /var/lib/mysl/*
Cheers, Karl
On Thu, 16 Apr 2020 at 00:47, Jeff Dyke <jeff.dyke@gmail.com> wrote:
> This has been biting me for a bit, and I have tried a litany of things.
> Here is the issue:
>
> VPC 1: (prod)
> MariaDB 10.4.12 - Galera in a 3 node cluster, cluster functioning perfectly.
>
> VPC 2: (staging)
> MariaDB 10.4.12 - Galera in a 3 node cluster, also running smoothly.
>
> -- backup script (clipped for only the essentials) --
> set -e
> DIR="/home/jeff/restore"
> rm -rf $DIR
> mkdir $DIR
> AUTH='--defaults-extra-file=/root/db.cnf -u root'
> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = ON;"
> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = OFF;"
> sudo mariabackup ${AUTH} --backup --binlog-info=ON --target-dir="${DIR}" 2>&1 | grep -vE "Copying|done|log scanned"
> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = ON;"
> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = OFF;"
> -- end script --
>
> -- restore process --
> sudo service mariadb stop (or kill -9 PID only when the server won't start)
> sudo rm -rf /var/lib/mysl/*
> sudo mariabackup --prepare --target-dir=restore
> sudo mariabackup --copy-back --target-dir=restore
> sudo chown -R mysql:mysql /var/lib/mysql/*
> sudo service mariadb start
> -- end restore --
>
> I can run this script and backup process on any of the *staging* nodes and restore them to a freshly built server within that VPC. I can also take the backup from *staging* and load it into a newly built machine on the *prod* VPC.
>
> What I can not do is run the script on any of the prod members and restore it to a newly built server, regardless of location. (Really, the locations don't matter at all.)
>
> The following is the log from a failing restore from a prod db backup and restore:
>
> 2020-04-15 22:11:10 0 [Note] InnoDB: Using Linux native AIO
> 2020-04-15 22:11:10 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
> 2020-04-15 22:11:10 0 [Note] InnoDB: Uses event mutexes
> 2020-04-15 22:11:10 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
> 2020-04-15 22:11:10 0 [Note] InnoDB: Number of pools: 1
> 2020-04-15 22:11:10 0 [Note] InnoDB: Using SSE2 crc32 instructions
> 2020-04-15 22:11:10 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
> 2020-04-15 22:11:10 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 8, chunk size = 128M
> 2020-04-15 22:11:10 0 [Note] InnoDB: Completed initialization of buffer pool
> 2020-04-15 22:11:10 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 50331648 bytes
> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 50331648 bytes
> 2020-04-15 22:11:10 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
> 2020-04-15 22:11:10 0 [Note] InnoDB: New log files created, LSN=42006460184
> 2020-04-15 22:11:10 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
> 2020-04-15 22:11:10 0 [Note] InnoDB: Creating shared tablespace for temporary tables
> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
> 2020-04-15 22:11:10 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
> 2020-04-15 22:11:10 0 [Note] InnoDB: Waiting for purge to start
> 2020-04-15 22:11:10 0 [Note] InnoDB: 10.4.12 started; log sequence number 42006460428; transaction id 23858673
> 2020-04-15 22:11:10 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
> 2020-04-15 22:11:10 0 [Note] Plugin 'FEEDBACK' is disabled.
> 2020-04-15 22:11:11 0 [Note] InnoDB: Buffer pool(s) load completed at 200415 22:11:11
>
> -- hangs here for about an hour before timing out --
>
> Logfile from a successful start:
>
> 2020-04-15 23:13:15 0 [Note] InnoDB: Using Linux native AIO
> 2020-04-15 23:13:15 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
> 2020-04-15 23:13:15 0 [Note] InnoDB: Uses event mutexes
> 2020-04-15 23:13:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
> 2020-04-15 23:13:15 0 [Note] InnoDB: Number of pools: 1
> 2020-04-15 23:13:15 0 [Note] InnoDB: Using SSE2 crc32 instructions
> 2020-04-15 23:13:15 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
> 2020-04-15 23:13:15 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 8, chunk size = 128M
> 2020-04-15 23:13:16 0 [Note] InnoDB: Completed initialization of buffer pool
> 2020-04-15 23:13:16 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 50331648 bytes
> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 50331648 bytes
> 2020-04-15 23:13:16 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
> 2020-04-15 23:13:16 0 [Note] InnoDB: New log files created, LSN=43323668550
> 2020-04-15 23:13:16 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
> 2020-04-15 23:13:16 0 [Note] InnoDB: Creating shared tablespace for temporary tables
> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
> 2020-04-15 23:13:16 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
> 2020-04-15 23:13:16 0 [Note] InnoDB: Waiting for purge to start
> 2020-04-15 23:13:16 0 [Note] InnoDB: 10.4.12 started; log sequence number 43323669004; transaction id 22422468
> 2020-04-15 23:13:16 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
> 2020-04-15 23:13:16 0 [Note] Plugin 'FEEDBACK' is disabled.
> 2020-04-15 23:13:16 0 [Note] Server socket created on IP: '0.0.0.0'.
> 2020-04-15 23:13:16 0 [Note] Reading of all Master_info entries succeeded
> 2020-04-15 23:13:16 0 [Note] Added new Master_info '' to hash table
> 2020-04-15 23:13:16 0 [Note] /usr/sbin/mysqld: ready for connections.
> Version: '10.4.12-MariaDB-1:10.4.12+maria~bionic' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
> 2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed at 200415 23:13:22
>
> ------------------------
> The main differences between the two are that in the successful start:
> - the server socket is logged
> - master_info is logged
> - and it binds to the port/socket
>
> Before that, they *both* end with:
> 2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed at 200415 23:13:22
>
> -- Other Notes --
> - Config files are the same except for IPs, both on the newly built servers and the cluster, as they are managed through Salt.
> I ran a diff -r against prod vs staging and only saw
>
> There has to be some corruption in the mariabackup output that is still reported as OK, since I can put a staging backup on a prod server, but I can not put a prod backup on a staging server.
>
> If you've gotten this far, thank you. If you have any ideas on what I could try, I'd be grateful. aria_chk and all other table checks return correctly. I'm not sure what makes a server write out the master_info, so possibly that is a pointer.
>
> I'm happy to answer any questions; I've tried way too many things to post here, as I didn't want to write until I had enough of the dumb stuff figured out. I could rebuild the prod cluster but am really looking to understand this. I don't really want to build my own with debug info on because why would they be different between the two VPCs.