From ubuntu@rkw.io Wed Jul 20 14:38:25 2016
From: Mark Wadham
To: discuss@lists.mariadb.org
Subject: [Maria-discuss] MariaDB 10.1.14 failure to initiate SST after RSU schema upgrade
Date: Wed, 20 Jul 2016 15:38:25 +0100
Message-ID:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8170018330045337942=="

--===============8170018330045337942==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Hi,

We have a repeatable failure to initiate IST with MariaDB 10.1.14 after
performing a schema upgrade on a single node in RSU mode. The error
condition is when there is a delete query in the format:

delete from <table> where id >= <n>

on the non-RSU cluster nodes while the node is disconnected from the
cluster. On rejoining the node determines that it is in sync with the
other cluster nodes and no IST is performed, despite the rows that were
deleted in the cluster. If we then delete the rows manually from the
joining node, mysqld immediately crashes on the other nodes because they
can't execute the new write transaction.

The process we followed is:

1. Set up a 3-node cluster, nodes 0,1,2
2. Enable RSU on node 0:

SET GLOBAL wsrep_OSU_method='RSU';

3. Isolate node 0 from the cluster:

SET GLOBAL wsrep_cluster_address="gcomm://";

4. Perform a backward-compatible schema change, since this is the point
of this process. In our test we added a single column to a table with a
default value of null.

Additionally we deleted some rows from a table on nodes 1 and 2, with:

delete from <table> where id >= 100;

which affected around 20 rows.

5. Rejoin the node to the cluster:

SET GLOBAL wsrep_cluster_address="<gcomm string from config file>";

At this point the node immediately rejoins without doing IST and
believes it is in sync, yet the rows are deleted on nodes 1 and 2 but
not node 0.

Interestingly if the delete query is:

delete from <table> where id = <n>;

there is no problem. Also we have not had any issue with syncing INSERT
and UPDATE statements. A combination of INSERT, UPDATE and DELETE where
id >= resulted in the insert/update statements being synced but the
deletes not synced. It is as if the quorum somehow doesn't recognise
delete where id >= as an event.

Our next test cases are:

1. Switching node 0 back to TOI mode before rejoining the cluster,
although I can't really see how this would make a difference.

2. Upgrading to MariaDB 10.1.16 which was released a couple of days ago.

3. Testing whether regular IST is affected, i.e. IST that should occur
normally without switching to RSU mode or dropping a node out of the
cluster.

This seems like a pretty basic failure and I'm concerned that it may
also affect regular IST, i.e. a node falling behind the cluster for
normal reasons without any involvement of RSU mode, which would
effectively make the whole system useless if it could randomly drop
delete statements.

If anyone can shed any light on why this may be happening we would be
very grateful!

Thanks,
Mark

--===============8170018330045337942==--

From ubuntu@rkw.io Wed Jul 20 15:34:35 2016
From: Mark Wadham
To: discuss@lists.mariadb.org
Subject: Re: [Maria-discuss] MariaDB 10.1.14 failure to initiate SST after RSU schema upgrade
Date: Wed, 20 Jul 2016 16:34:34 +0100
Message-ID:
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2370926307088483366=="

--===============2370926307088483366==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

> 1. Switching node 0 back to TOI mode before rejoining the cluster,
> although I can't really see how this would make a difference.

Made no difference.

> 2. Upgrading to MariaDB 10.1.16 which was released a couple of days
> ago.

Also made no difference, the issue is still reproducible.

> 3. Testing whether regular IST is affected, i.e. IST that should occur
> normally without switching to RSU mode or dropping a node out of the
> cluster.

Will reply again when we've tested this.

--===============2370926307088483366==--

From ubuntu@rkw.io Wed Jul 20 16:13:52 2016
From: Mark Wadham
To: discuss@lists.mariadb.org
Subject: Re: [Maria-discuss] MariaDB 10.1.14 failure to initiate SST after RSU schema upgrade
Date: Wed, 20 Jul 2016 17:13:51 +0100
Message-ID: <1F0DF66C-305E-47A7-B560-06F202F6542D@rkw.io>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1098904269633866601=="

--===============1098904269633866601==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

On 20 Jul 2016, at 16:34, Mark Wadham wrote:

>> 3. Testing whether regular IST is affected, i.e. IST that should occur
>> normally without switching to RSU mode or dropping a node out of the
>> cluster.
>
> Will reply again when we've tested this.

Haven't tested this yet, but we just tested the process without making a
schema change and this resulted in the delete statements being synced
correctly via IST when the node rejoins, so it seems like making the
schema change triggers the bug.

I have raised a jira issue for this:

https://jira.mariadb.org/browse/MDEV-10406

Since IST is not disturbed when we don't make a schema change it seems
likely that regular IST will work correctly, but we will test it anyway.
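
For reference, the reproduction steps from the first message could be written out roughly as the SQL below. This is only a sketch: the table name `t`, the column name `extra_col` and its INT type are placeholders that do not appear in the report, and the gcomm string has to be taken from the node's own config file.

```sql
-- Node 0: switch to RSU and isolate the node from the cluster (steps 2 and 3).
SET GLOBAL wsrep_OSU_method='RSU';
SET GLOBAL wsrep_cluster_address="gcomm://";

-- Node 0: backward-compatible schema change (step 4).
-- `t` and `extra_col` are placeholder names, not taken from the report.
ALTER TABLE t ADD COLUMN extra_col INT NULL DEFAULT NULL;

-- Node 1 or 2, while node 0 is disconnected: the range delete that goes missing.
DELETE FROM t WHERE id >= 100;

-- Node 0: rejoin the cluster (step 5), substituting the real gcomm string.
SET GLOBAL wsrep_cluster_address="<gcomm string from config file>";

-- Every node: compare replication state and the affected rows afterwards.
SHOW STATUS LIKE 'wsrep_local_state_comment';
SHOW STATUS LIKE 'wsrep_last_committed';
SELECT COUNT(*) FROM t WHERE id >= 100;
```

If the behaviour described above is reproduced, the final SELECT would return a non-zero count on node 0 only, while all three nodes report themselves as synced.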
--===============1098904269633866601==--

From nirbhay@mariadb.com Wed Jul 20 21:31:23 2016
From: Nirbhay Choubey
To: discuss@lists.mariadb.org
Subject: Re: [Maria-discuss] MariaDB 10.1.14 failure to initiate SST after RSU schema upgrade
Date: Wed, 20 Jul 2016 17:31:22 -0400
Message-ID:
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1872175155460153389=="

--===============1872175155460153389==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Hi Mark,

On Wed, Jul 20, 2016 at 10:38 AM, Mark Wadham wrote:

> Hi,
>
> We have a repeatable failure to initiate IST with MariaDB 10.1.14 after
> performing a schema upgrade on a single node in RSU mode. The error
> condition is when there is a delete query in the format:
>
> delete from <table> where id >= <n>
>
> on the non-RSU cluster nodes while the node is disconnected from the
> cluster. On rejoining the node determines that it is in sync with the
> other cluster nodes and no IST is performed, despite the rows that were
> deleted in the cluster. If we then delete the rows manually from the
> joining node, mysqld immediately crashes on the other nodes because they
> can't execute the new write transaction.
>
> The process we followed is:
>
> 1. Set up a 3-node cluster, nodes 0,1,2
> 2. Enable RSU on node 0:
>
> SET GLOBAL wsrep_OSU_method='RSU';
>
> 3. Isolate node 0 from the cluster:
>
> SET GLOBAL wsrep_cluster_address="gcomm://";
>
> 4. Perform a backward-compatible schema change, since this is the point of
> this process. In our test we added a single column to a table with a
> default value of null.
>

As discussed on IRC #mariadb, you do not really need to take the node
off the cluster (3). Just set wsrep_osu_method's session value to RSU
and perform the schema change. With RSU mode enabled, the node
automatically desyncs itself from the cluster before executing any DDL,
and thus other nodes in the cluster are not impacted.

Best,
Nirbhay

>
> Additionally we deleted some rows from a table on nodes 1 and 2, with:
>
> delete from <table> where id >= 100;
>
> which affected around 20 rows.
>
> 5. Rejoin the node to the cluster:
>
> SET GLOBAL wsrep_cluster_address="<gcomm string from config file>";
>
> At this point the node immediately rejoins without doing IST and believes
> it is in sync, yet the rows are deleted on nodes 1 and 2 but not node 0.
>
> Interestingly if the delete query is:
>
> delete from <table> where id = <n>;
>
> there is no problem. Also we have not had any issue with syncing INSERT
> and UPDATE statements. A combination of INSERT, UPDATE and DELETE where
> id >= resulted in the insert/update statements being synced but the
> deletes not synced. It is as if the quorum somehow doesn't recognise
> delete where id >= as an event.
>
> Our next test cases are:
>
> 1. Switching node 0 back to TOI mode before rejoining the cluster,
> although I can't really see how this would make a difference.
>
> 2. Upgrading to MariaDB 10.1.16 which was released a couple of days ago.
>
> 3. Testing whether regular IST is affected, i.e. IST that should occur
> normally without switching to RSU mode or dropping a node out of the
> cluster.
>
>
> This seems like a pretty basic failure and I'm concerned that it may also
> affect regular IST, i.e. a node falling behind the cluster for normal
> reasons without any involvement of RSU mode, which would effectively make
> the whole system useless if it could randomly drop delete statements.
>
> If anyone can shed any light on why this may be happening we would be very
> grateful!
>
> Thanks,
> Mark
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss(a)lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>

--===============1872175155460153389==--

From ubuntu@rkw.io Thu Jul 21 08:06:01 2016
From: Mark Wadham
To: discuss@lists.mariadb.org
Subject: Re: [Maria-discuss] MariaDB 10.1.14 failure to initiate SST after RSU schema upgrade
Date: Thu, 21 Jul 2016 09:05:56 +0100
Message-ID: <90C83306-8050-48FE-BE25-70161830881F@rkw.io>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0048247588352839598=="

--===============0048247588352839598==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Hi Nirbhay,

> As discussed on IRC #mariadb, you do not really need to take the node
> off cluster (3). Just set wsrep_osu_method's session value to RSU and
> perform the schema change. With RSU mode enabled, the node
> automatically desyncs itself from the cluster before executing any
> DDL, and thus other nodes in the cluster are not impacted.

I don't think this was the case during our testing, but I'll test it
again anyway. If I understand RSU mode correctly all it does is ensure
that DDL statements aren't replicated to the other nodes. AFAIK it won't
stop the write events from the other nodes from being synced to the node
in RSU mode, meaning that if we have writes to the table being altered
we could run into problems. We have some very large tables and one of
them we tested took around 7h 45min to add a column.
Mark

--===============0048247588352839598==--

From ubuntu@rkw.io Thu Jul 21 08:39:15 2016
From: Mark Wadham
To: discuss@lists.mariadb.org
Subject: Re: [Maria-discuss] MariaDB 10.1.14 failure to initiate SST after RSU schema upgrade
Date: Thu, 21 Jul 2016 09:39:10 +0100
Message-ID: <48F2AD4A-DFA7-4C60-A45B-97B515A9949A@rkw.io>
In-Reply-To: <90C83306-8050-48FE-BE25-70161830881F@rkw.io>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3317372899869915137=="

--===============3317372899869915137==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

On 21 Jul 2016, at 9:05, Mark Wadham wrote:

> I don't think this was the case during our testing, but I'll test it
> again anyway. If I understand RSU mode correctly all it does is
> ensure that DDL statements aren't replicated to the other nodes.
> AFAIK it won't stop the write events from the other nodes from being
> synced to the node in RSU mode, meaning that if we have writes to the
> table being altered we could run into problems. We have some very
> large tables and one of them we tested took around 7h 45min to add a
> column.

Confirmed this, on node 0:

SET GLOBAL wsrep_OSU_method='RSU';
show status like 'wsrep%';

- still shows "Synced"

If we then alter table on node 0 and try to insert to the same table on
node 1 we get a deadlock error.

Mark

--===============3317372899869915137==--

From ubuntu@rkw.io Thu Jul 21 08:49:37 2016
From: Mark Wadham
To: discuss@lists.mariadb.org
Subject: Re: [Maria-discuss] MariaDB 10.1.14 failure to initiate SST after RSU schema upgrade
Date: Thu, 21 Jul 2016 09:49:32 +0100
Message-ID: <4AEBF45D-66AD-4224-9421-C948E93E0D7A@rkw.io>
In-Reply-To: <48F2AD4A-DFA7-4C60-A45B-97B515A9949A@rkw.io>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============5241399571183052342=="

--===============5241399571183052342==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

On 21 Jul 2016, at 9:39, Mark Wadham wrote:

> Confirmed this

Oh wait, it seems that if we do:

SET wsrep_OSU_method='RSU';

(with GLOBAL omitted) then it sort of works. The node doesn't desync
immediately but as soon as we execute some DDL it does desync, and then
immediately re-syncs as soon as the alter statement is completed.

I'll do some more testing as I'm not sure if we can execute multiple DDL
statements (or more likely a sequence of migrations for one of our
webapps) and be sure that it will stay in RSU mode until they've all
been executed.

Mark

--===============5241399571183052342==--
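
The open question in the last message, whether a session stays in RSU mode across several DDL statements, could be checked with something like the sketch below. It assumes a single client connection on node 0; the table name `t` and the column names `extra_col_a` and `extra_col_b` are placeholders that are not part of the thread.

```sql
-- Session scope only: other connections on this node are unaffected.
SET SESSION wsrep_OSU_method='RSU';

-- First DDL statement; per the observation above the node desyncs for its
-- duration and re-syncs when it completes.
-- `t`, `extra_col_a` and `extra_col_b` are placeholder names.
ALTER TABLE t ADD COLUMN extra_col_a INT NULL DEFAULT NULL;

-- Check the node state and the session setting between statements.
SHOW STATUS LIKE 'wsrep_local_state_comment';
SELECT @@SESSION.wsrep_OSU_method;

-- Second DDL statement in the same session.
ALTER TABLE t ADD COLUMN extra_col_b INT NULL DEFAULT NULL;

-- Switch the session back to the default method when the migration is done.
SET SESSION wsrep_OSU_method='TOI';
```

Because the setting is per session, a migration tool that opens a new connection for each statement would need to issue the SET SESSION on every connection; whether a given webapp's migration runner does so is something to verify separately.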