[Maria-discuss] Galera replication - optimistic locking problems
Hi. I hope this is a reasonable place to ask about Galera replication problems - if not, please point me elsewhere and I'll happily ask others :) I have four Debian 10 "Buster" servers installed with MariaDB 10.1.47 and Galera 25.3.19-2. I have other servers which write to these machines (there's no single "master", since I don't want any particular client to depend on any single master being available), and also read from them (in case it matters, those machines are running Asterisk, and writing Call Event Logs and Call Detail Records into MariaDB via ODBC). My problem is the optimistic locking used by Galera when replicating between nodes. It results in Asterisk log file entries such as: WARNING[20089]: res_odbc.c:460 in ast_odbc_print_errors: SQL Execute returned an error: 40001: [ma-3.0.6][10.1.47-MariaDB-0+deb9u1]Deadlock found when trying to get lock; try restarting transaction Now, that's all very well for an application which can respond to such a warning (personally I'd call it an error, when a write did not take place, but that's a question for Asterisk, not MariaDB) and perform the retry, but Asterisk does not do this. Therefore I'd like to find some way to: 1. tell Galera to use pessimistic locking for replication if possible (I can accept the performance penalty) 2. tell MariaDB to automatically retry the write when the error occurs (although I can't think of any way that could be done, since I can't create a transaction of any sort - the operation is entirely determined for me by Asterisk) 3. find a High-Availability (which basically means no single point of failure) front-end for the whole cluster of 4 MariaDB servers so that the problem does not occur. Questions: 1. How do other people deal with this problem? 2. Are any of my potential solutions above actually feasible? (If so, how?) 3. Does anyone have any alternative ideas about how to connect an application which doesn't understand retrying database writes (Asterisk) with a database which doesn't guarantee to write the data you give it (MariaDB + Galera)? Thanks, Antony. -- Schrödinger's rule of data integrity: the condition of any backup is unknown until a restore is attempted. Please reply to the list; please *don't* CC me.
I don't have the answer to your exact question off the top of my head, but deadlocks like this are not specific to Galera - they can happen in standalone node, even if the window of opportunity for deadlocks is much greater with Galera. The point I'm getting to is that if the application can't handle deadlocks, it is broken and arguably unfit for purpose - because deadlocks in a relational database that serves concurrent connections cannot be guaranteed to never occur. On Tue, 15 Dec 2020, 21:53 Antony Stone, < Antony.Stone@mariadb.open.source.it> wrote:
Hi.
I hope this is a reasonable place to ask about Galera replication problems - if not, please point me elsewhere and I'll happily ask others :)
I have four Debian 10 "Buster" servers installed with MariaDB 10.1.47 and Galera 25.3.19-2.
I have other servers which write to these machines (there's no single "master", since I don't want any particular client to depend on any single master being available), and also read from them (in case it matters, those machines are running Asterisk, and writing Call Event Logs and Call Detail Records into MariaDB via ODBC).
My problem is the optimistic locking used by Galera when replicating between nodes. It results in Asterisk log file entries such as:
WARNING[20089]: res_odbc.c:460 in ast_odbc_print_errors: SQL Execute returned an error: 40001: [ma-3.0.6][10.1.47-MariaDB-0+deb9u1]Deadlock found when trying to get lock; try restarting transaction
Now, that's all very well for an application which can respond to such a warning (personally I'd call it an error, when a write did not take place, but that's a question for Asterisk, not MariaDB) and perform the retry, but Asterisk does not do this.
Therefore I'd like to find some way to:
1. tell Galera to use pessimistic locking for replication if possible (I can accept the performance penalty)
2. tell MariaDB to automatically retry the write when the error occurs (although I can't think of any way that could be done, since I can't create a transaction of any sort - the operation is entirely determined for me by Asterisk)
3. find a High-Availability (which basically means no single point of failure) front-end for the whole cluster of 4 MariaDB servers so that the problem does not occur.
Questions:
1. How do other people deal with this problem?
2. Are any of my potential solutions above actually feasible? (If so, how?)
3. Does anyone have any alternative ideas about how to connect an application which doesn't understand retrying database writes (Asterisk) with a database which doesn't guarantee to write the data you give it (MariaDB + Galera)?
Thanks,
Antony.
-- Schrödinger's rule of data integrity: the condition of any backup is unknown until a restore is attempted.
Please reply to the list; please *don't* CC me.
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi, With Galera, you really want to write to a single node to avoid conflicts. Since Galera doesn't do any partitioning of the data, the IO will still happen on all nodes which leaves convenience as the only reason to write to multiple nodes. A common method of avoiding deadlocks with Galera is to put a proxy in front of it that understands the cluster. One of these is MariaDB MaxScale (a team which I'm a part of) which has advanced support for Galera clusters. The Galera monitor in MaxScale uses the wsrep_local_index variable to pick a single node in the cluster that all MaxScales write to. This eliminates the possibility of conflicts due to the distributed nature of Galera but still allows you to write to any of the MaxScale instances. Another feature of MaxScale that could help with applications that don't know to retry transactions is the transaction_replay <https://mariadb.com/kb/en/mariadb-maxscale-25-readwritesplit/#transaction_replay> feature of the readwritesplit router. This feature has a mode <https://mariadb.com/kb/en/mariadb-maxscale-25-readwritesplit/#transaction_replay_retry_on_deadlock> where it can retry an active transaction if it ends up in a deadlock. This allows transparent retrying of the transaction while still keeping all the consistency guarantees. There are of course some limitations to what can be successfully retried which means some edge cases might not be solved by this. Using a proxy does have its downside, increased latency and additional maintenance burden of the added servers being the most obvious ones. One way you can avoid this is to place the proxy on the application server and have it behave as a sort of a connector. Markus On 12/15/20 11:53 PM, Antony Stone wrote:
Therefore I'd like to find some way to:
1. tell Galera to use pessimistic locking for replication if possible (I can accept the performance penalty)
2. tell MariaDB to automatically retry the write when the error occurs (although I can't think of any way that could be done, since I can't create a transaction of any sort - the operation is entirely determined for me by Asterisk)
3. find a High-Availability (which basically means no single point of failure) front-end for the whole cluster of 4 MariaDB servers so that the problem does not occur.
Questions:
1. How do other people deal with this problem?
2. Are any of my potential solutions above actually feasible? (If so, how?)
3. Does anyone have any alternative ideas about how to connect an application which doesn't understand retrying database writes (Asterisk) with a database which doesn't guarantee to write the data you give it (MariaDB + Galera)?
-- Markus Mäkelä, Senior Software Engineer MariaDB Corporation t: +358 40 7740484
On Wednesday 16 December 2020 at 02:35:33, Markus Mäkelä wrote:
A common method of avoiding deadlocks with Galera is to put a proxy in front of it that understands the cluster. One of these is MariaDB MaxScale (a team which I'm a part of) which has advanced support for Galera clusters.
Thanks for the info and pointer to a potential solution. I started by looking at how to install it and landed at https://mariadb.com/kb/en/mariadb-package-repository-setup-and-usage/ That tells me "MariaDB MaxScale is licensed under the Business Source License 1.1, so it is not entirely free to use for organizations who do not have a subscription with MariaDB Corporation. If you would like more information, see the information at MariaDB Business Source License (BSL): Frequently Asked Questions." So, I'm now trying to understand the licensing limitations of MaxScale - since it's not simply GPL like MariaDB is :( Looking at https://mariadb.com/bsl-faq-mariadb/ it talks about usage "below the specified use limitation". Then referring to https://mariadb.com/projects-using-bsl-11/ I see: "Additional Use Grant: You may use the Licensed Work when your application uses the Licensed Work with a total of less than three server instances for any purpose." So, is this "additional use grant" the same as "usage below the specified use limitation"? Also, does "less than three" mean "one or two" or does it include "three"? Finally, assuming we want to use MaxScale in a production environment with not "less than three" instances, where can I find the price list telling us what this would cost? Thanks, Antony. -- "In fact I wanted to be John Cleese and it took me some time to realise that the job was already taken." - Douglas Adams Please reply to the list; please *don't* CC me.
Hi,
So, is this "additional use grant" the same as "usage below the specified use limitation"?
Also, does "less than three" mean "one or two" or does it include "three"?
Less than three means either one or two servers. You can find the MaxScale license here: https://github.com/mariadb-corporation/MaxScale/blob/2.5/LICENSE25.TXT
Finally, assuming we want to use MaxScale in a production environment with not "less than three" instances, where can I find the price list telling us what this would cost?
I'd recommend looking this up on the MariaDB website; I unfortunately only know about the technical side of MaxScale. If it's not listed there, you can most likely get it by either sending an email to one of the addresses on the site or asking it in the live chat. Markus -- Markus Mäkelä, Senior Software Engineer MariaDB Corporation t: +358 40 7740484
participants (3)
-
Antony Stone
-
Gordan Bobic
-
Markus Mäkelä