Re: MDEV-36290: ALTER TABLE with multi-master can cause data loss (PR #3967)

Brandon Nesterenko <notifications@github.com> writes:
MDEV-36290: ALTER TABLE with multi-master can cause data loss
Hi Brandon, Here is my initial review of the MDEV-36290 draft patch. Overall, it is clear that this is not something that's appropriate for 10.6. It implements a new way to apply row events, based on column names (if available) rather than column number. It is something for next dev version, 12.x.
commit de27bfbafc22601d53ddc0567401fcb26d28a91c Author: Brandon Nesterenko <brandon.nesterenko@mariadb.com> Date: Thu Apr 3 12:37:09 2025 -0600
MDEV-36290: ALTER TABLE with multi-master can cause data loss
Please do not use the term "data loss" in the description (I know it is not you who introduced the term ;-). The issue here is conflicting queries run in parallel against multiple masters in a multi-master setup. In general, the support of multi-master in asynchronous replication requires the user to ensure that no conflicting queries are run, or to ensure that such conflicts will not cause undesirable behaviour. The behaviour addressed here is just one of many ways that conflicting ALTER TABLE could cause different data on slave and master. Using the term "data loss" can cause misunderstanding about what the problem is and what the change does. Of course, every case that is handled more correctly is in principle better. In this case, we introduce the facility to configure (--binlog_row_metadata=FULL) replication so that it is possible to replicate between tables on master and slave with different order of the columns. Not sure how useful such an option is in practice, seems to be very few users that will really need this, and such need has to be weighed against the increase in complexity in the code and potential for bugs. With that said, the approach in the patch generally looks reasonable. A few initial detail comments below, as you mentioned this is still a draft patch. As you also mentioned to me earlier, the logic in the patch is not trivial. I did not yet try to understand in detail all parts of the logic and the different corner cases.
+ unsigned char *get_optional_metadata_str() + { + return m_optional_metadata; + } + + unsigned int get_optional_metadata_len() + { + return m_optional_metadata_len; + } +
Generally I do not like such do-nothing accessor functions. They just add abstraction and reduce grepability without doing anything useful. That's a style preference of course.
+ /* + Maps column index from master to slave. This is determined using the field + names (provied by optional metadata when the master is configured with + binlog_row_metadata=FULL). + */ + std::map<uint, uint> master_to_slave_index_map; + + /* + When using field names to map from master->slave columns, this keeps track + of column indices on the master which aren't present on the slave. + + It is used to skip columns when checking type-conversions and unpacking + row data. + */ + std::set<uint> master_unmatched_cols;
Is using std::map/std::set going to be expensive here, this is operations that will happen on every row-based event (when --binlog-row-metadata=FULL)? The std:: functions can imply a lot of memory allocations that are not very visible, and we usually try to avoid them in performance-relevant server code. master_to_slave_index_map is just a lookup table of numbers 0..#columns-1, right? So it doesn't need to be an associative array, a normal array is sufficient, and doesn't need to be re-allocated each event. And master_unmatched_cols is again just a set of numbers 0..#columns-1, so a bitmap could be used. - Kristian.
participants (1)
-
Kristian Nielsen