[Maria-discuss] Single-threaded DDL limitation still present in Galera?
Hi,
On this page https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/, a user in 2017 pointed out the following:
3 years, 3 months ago Björn Schneider https://mariadb.com/kb/user/id/4916 Schema changes of large tables in Galera https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/#comment_...
You should be warned that every DDL statement executed on a Galera cluster will by default BLOCK the complete cluster (not only the table, or even just the database the table resides in)! This is the default "Total Order Isolation" (TOI) mode.
The DDL statements can't be killed: once issued, a statement runs until it completes or an error occurs.
Issuing, e.g., a column-altering DDL statement on a large table will take the complete cluster out of commission until every node has completed the migration. Only some operations (e.g. changing DEFAULT values) are always short and won't interfere with the cluster's operations.
Long-running, cluster-blocking DDL is of course a no-go on a production system. There are several ways around this: Percona, for example, provides a tool to migrate a table "online" (pt-online-schema-change), or you can use "Rolling Schema Upgrade" (RSU) for data-compatible changes. More about that in the Galera documentation.
In my opinion, this behaviour should definitely be added as an "observation" to this page; it's definitely not something you'd expect coming from a "normal" MariaDB/MySQL system.
We're considering moving to a cluster environment from a one-master, three-slave configuration that has proven inflexible, but such a severe limitation seems to make it unusable in production.
Is this still the case in 2020? What is the reasoning behind such an architectural decision, if so?
Thank you.
Sincerely,
Artem
--
Founder, Android Police http://www.androidpolice.com, APK Mirror http://www.apkmirror.com/, Illogical Robot LLC
beerpla.net | @ArtemR http://twitter.com/ArtemR
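As a rough illustration of the RSU approach mentioned above, the sketch below shows how a single node might apply a schema change locally instead of through TOI. It assumes MariaDB's `wsrep_OSU_method` session variable; the table, column and index names are hypothetical placeholders. Because RSU changes are not replicated, the same statements would have to be run on each node in turn, and the change has to stay compatible with the old schema while other nodes still use it (adding a secondary index is a typical example).
```sql
-- Hypothetical example: example_table / example_col / idx_example_col are placeholders.
-- Run on ONE node at a time, then repeat on the next node.
-- With RSU, this node desyncs from the cluster and applies the DDL locally;
-- the ALTER is NOT replicated to the other nodes.
SET SESSION wsrep_OSU_method = 'RSU';
ALTER TABLE example_table ADD INDEX idx_example_col (example_col);
-- Switch back so later DDL in this session is replicated in total order again.
SET SESSION wsrep_OSU_method = 'TOI';
```
The trade-off, as far as we understand it, is that the node falls behind while the ALTER runs and has to catch up afterwards, and keeping the schemas consistent across nodes becomes the operator's job rather than the cluster's.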
Hi,
I haven't received a reply to this one. Is someone on the team with
knowledge of the cluster able to comment?
Thanks.
Sincerely,
Artem
On Tue, Apr 21, 2020 at 4:15 PM Artem Russakovskii wrote:
[...]
Hi Artem,
As far as I know, based on reading release notes and on my experience as a
MariaDB Galera user: yes, this is still the case. That said, more and more
types of ALTER TABLE statements can now be done "online", i.e. they run with
ALGORITHM=INPLACE, NOCOPY or INSTANT, so the table often doesn't have to be
copied or rebuilt, and they therefore no longer necessarily result in the kind
of long-running, cluster-blocking operations we had in the past.
Still, it would have been nice if there were a way to avoid blocking the
cluster with DDL in the cases where ALTER TABLE still can't run "online".
Regards,
Karl
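To check that a change really runs "online" rather than silently falling back to a full table copy, MariaDB (10.3 and later, as far as we know) lets you request the algorithm explicitly: if InnoDB cannot perform the change that way, the statement is rejected with an error instead of taking the expensive path. The sketch below uses hypothetical table and column names and assumes an InnoDB table. Under Galera's default TOI mode the statement is still replicated in total order, but a metadata-only change like this should finish quickly.
```sql
-- Hypothetical names: example_table / example_flag are placeholders.
-- INSTANT = metadata-only change. If InnoDB can't do this instantly,
-- the ALTER fails with an error rather than rebuilding the table.
ALTER TABLE example_table
  ADD COLUMN example_flag TINYINT NOT NULL DEFAULT 0,
  ALGORITHM=INSTANT;
```
Note that INPLACE and NOCOPY only avoid the table copy (NOCOPY also avoids rebuilding the clustered index); operations such as building a secondary index still scan the whole table, so on a large table they can take a long time and, under TOI, would still stall the cluster for that duration.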
On Wed, 12 Aug 2020 at 21:11, Artem Russakovskii wrote:
[...]
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Thanks, Karl. It's pretty shocking that such a severe limitation isn't
disclosed and is only pointed out in the comments, and that this was the
chosen cluster design in the first place. I don't understand how anyone
could use this in production.
Sincerely,
Artem
On Wed, Aug 12, 2020 at 1:44 PM Karl Levik wrote:
[...]
participants (2)
- Artem Russakovskii
- Karl Levik