Kristian Nielsen <knielsen@knielsen-hq.org> writes:
> I will try to get some initial docs written up by the end of the week.

Giuseppe, here (attached) is my first stab at documentation for MariaDB GTID. Thanks for prodding me to get this done, even if it was a very gentle prodding :-)

Daniel, do you think you can get these formatted properly and included in some appropriate place in the Knowledgebase? And feel free to fix any problems you find in the text on the way, of course. I will then make sure to keep this updated as the code progresses.

This is just a first stab, I am sure there is much that is incomplete and I will work to extend it as needed (comments welcome if anything is found missing).

 - Kristian.

A. MariaDB global transaction ID

From version 10.0, MariaDB supports global transaction IDs for replication.

MariaDB replication in general works as follows: On a master server, all updates to the database (DML and DDL) are written into the binary log as binlog events. A slave server connects to the master and reads the binlog events, then applies the events locally to replicate the same changes as done on the master. A server can be both a master and a slave at the same time, and it is thus possible for binlog events to be replicated through multiple levels of servers.

A slave server keeps track of the position in the master's binlog of the last event applied on the slave. This allows the slave server to re-connect and resume from where it left off after replication has been temporarily stopped. It also allows the slave to disconnect from the master server and connect to a different server to resume replication from a new master, as long as the new master has the proper binlog events, and the new master connection starts replicating at the appropriate point in the binlogs.

Global transaction ID introduces a new event attached to each event group in the binlog. (An event group is a collection of events that are always applied as a unit. They are best thought of as a "transaction", though they also include non-transactional DML statements, as well as DDL.) As an event group is replicated from master server to slave server, the global transaction ID is preserved. Since the ID is globally unique across the entire group of servers, this makes it easy to uniquely identify the same binlog events on different servers that replicate each other (this was not easily possible before MariaDB 10.0).

Using global transaction ID provides two main benefits:

1. It is easy to change a slave server to connect to and replicate from a different master server. The slave remembers the global transaction ID of the last event group applied from the old master.
This makes it easy to know where to resume replication on the new master, since the global transaction IDs are known throughout the entire replication hierarchy. This is not the case when using old-style replication; in this case the slave knows only the specific file name and offset of the old master server of the last event applied. There is no simple way to guess from this the correct file name and offset on a new master.

2. The state of the slave is recorded in a crash-safe way. The slave keeps track of its current position (the global transaction ID of the last transaction applied) in a system table mysql.rpl_slave_state. If this table is using a transactional storage engine (such as InnoDB, which is the default), then updates to the state are done in the same transaction as the updates to the data. This makes the state crash-safe; if the slave server crashes, crash recovery on restart will make sure that the recorded replication position matches what changes were actually replicated. This is not the case for old-style replication, where the state is recorded in a file relay-log.info, which is updated independently of the actual data changes and can easily get out of sync if the slave server crashes. (This works for DML to transactional tables; non-transactional tables and DDL in general are not crash-safe in MariaDB.)

Because of these two benefits, it is generally recommended to use global transaction ID for any replication setups based on MariaDB 10.0 or later. However, old-style replication continues to work as always, so there is no pressing need to change existing setups. Global transaction ID integrates smoothly with old-style replication, and the two can be used freely together in the same replication hierarchy.

There is no special configuration needed of the server to start using global transaction ID.
However, it must be explicitly set for a slave server with the appropriate CHANGE MASTER option; by default old-style replication is used by a replication slave, to maintain backwards compatibility.

B. The concept of global transaction ID

A global transaction ID, or GTID for short, consists of three numbers separated with dashes '-'. For example:

    0-1-10

- The first number 0 is the domain ID, which is specific for global transaction ID (more on this below). It is a 32-bit unsigned integer.
- The second number is the server ID, the same as is also used in old-style replication. It is a 32-bit unsigned integer.
- The third number is the sequence number. This is a 64-bit unsigned integer that is monotonically increasing for each new event group logged into the binlog.

The server ID is set to the server ID of the server where the event group is first logged into the binlog. The sequence number is increased on a server for every event group logged. Since server IDs must be unique for every server, this makes the (server_id, sequence_number) pair, and hence the whole GTID, globally unique.

B.1. The domain ID

When events are replicated from a master server to a slave server, the events are always logged into the slave's binlog in the same order that they were read from the master's binlog. Thus, if there is only ever a single master server receiving (non-replication) updates at a time, then the binlog order will be identical on every server in the replication hierarchy.

This consistent binlog order is used by the slave to keep track of its current position in the replication. Basically, the slave remembers the GTID of the last event group replicated from the master. When reconnecting to a master, whether the same one or a new one, it sends this GTID position to the master, and the master starts sending events from the first event after the corresponding event group.
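
As an illustration of the three-part GTID format described in section B, here is a minimal Python sketch that splits a GTID string into its components and checks the integer ranges stated above. The names Gtid and parse_gtid are hypothetical helpers invented for this example; they are not part of MariaDB or of any client library.

```python
from typing import NamedTuple

UINT32_MAX = 2**32 - 1  # domain ID and server ID are 32-bit unsigned
UINT64_MAX = 2**64 - 1  # sequence number is 64-bit unsigned


class Gtid(NamedTuple):
    """Hypothetical container for one GTID (illustration only)."""
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Split 'domain-server-seq' into three range-checked integers."""
    parts = text.strip().split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a GTID: {text!r}")
    domain_id, server_id, seq_no = (int(p) for p in parts)
    if domain_id > UINT32_MAX or server_id > UINT32_MAX:
        raise ValueError(f"domain ID or server ID out of range: {text!r}")
    if seq_no > UINT64_MAX:
        raise ValueError(f"sequence number out of range: {text!r}")
    return Gtid(domain_id, server_id, seq_no)


# The example GTID from the text.
print(parse_gtid("0-1-10"))
```

For the example GTID 0-1-10 this yields domain ID 0, server ID 1 and sequence number 10.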

However, if user updates are done independently on multiple servers at the same time, then in general it is not possible for binlog order to be identical across all servers. This can happen when using multi-source replication, with multi-master ring topologies, or just if manual updates are done on a slave that is replicating from an active master. If the binlog order is different on the new master from the order on the old master, then it is not sufficient for the slave to keep track of a single GTID to completely record the current state.

The domain ID, the first component of the GTID, is used to handle this.

In general, the binlog is not a single ordered stream. Rather, it consists of a number of different streams, each one identified by its own domain ID. Within each stream, GTIDs always have the same order in every server binlog. However, different streams can be interleaved in different ways on different servers.

A slave server then keeps track of its replication position by recording the last GTID applied within each replication stream. When connecting to a new master, the slave can start replication from a different point in the binlog for each domain ID.

For more details on using multi-master setups and multiple domain IDs, see section "F. Using global transaction with multi-source replication and other multi-master setups".

Simple replication setups only have a single master being updated by the application at any one time. In such setups, there is only a single replication stream needed. Then the domain ID can be ignored, and left as the default of 0 on all servers.

C. Using global transaction ID

In MariaDB 10.0, global transaction ID is enabled automatically. Each event group logged to the binlog receives a GTID event, as can be seen with mysqlbinlog or SHOW BINLOG EVENTS.
The slave automatically keeps track of the GTID of the last applied event group, as can be seen from the gtid_pos variable:

    SELECT @@GLOBAL.gtid_pos
    0-1-1

When a slave connects to a master, it can use either global transaction ID or old-style filename/offset to decide where in the master binlogs to start replicating from. To use global transaction ID, use the master_use_gtid option of CHANGE MASTER:

    CHANGE MASTER TO master_use_gtid = 1, master_host = 'my_master', ...

When the slave is then later started with START SLAVE, it will send the value of @@GLOBAL.gtid_pos to the master and start replication from the corresponding point in the master binlogs.

Even when a slave is configured to connect with the old-style binlog filename and offset (CHANGE MASTER TO master_log_file=..., master_log_pos=...), it will still keep track of the current GTID position in @@GLOBAL.gtid_pos. This means that an existing slave previously configured and running can be changed to connect with GTID (to the same or a new master) simply with:

    CHANGE MASTER TO master_use_gtid = 1

The slave remembers that master_use_gtid=1 was specified and will use it also for subsequent connects, until it is explicitly changed by specifying master_log_file/pos=... or master_use_gtid=0. The current value can be seen as the field Using_Gtid of SHOW SLAVE STATUS:

    SHOW SLAVE STATUS
    Using_Gtid: 1

The slave server internally uses the table mysql.rpl_slave_state to store the GTID position (and so preserve the value of @@GLOBAL.gtid_pos across server restarts). After upgrading a server to 10.0, it is necessary to run mysql_upgrade (as always) to get the table created.

In order to be crash-safe, this table must use a transactional storage engine such as InnoDB. When MariaDB is first installed (or upgraded to 10.0), the table is created using the default storage engine - which itself defaults to InnoDB.
If there is a need to change the storage engine for this table (to make it transactional on a system configured with MyISAM as the default storage engine, for example), use ALTER TABLE:

    ALTER TABLE mysql.rpl_slave_state ENGINE = InnoDB

The table mysql.rpl_slave_state should not be modified in any other way. In particular, do not try to update the rows in the table to change the slave's idea of the current GTID position; instead use:

    SET GLOBAL gtid_pos = '0-1-1'

The actual slave GTID position, and thus the value of @@GLOBAL.gtid_pos, is the result of a combination of the contents of mysql.rpl_slave_state and the contents of the slave binlog (if any, e.g. if --log-slave-updates is enabled). This makes it possible to use the same CHANGE MASTER TO ... MASTER_USE_GTID=1 command to connect a server as slave to a new master, regardless of whether the server was acting as a slave or a master before.

D. Setting up a new slave server with global transaction ID

Setting up a new replication slave server with global transaction ID is not much different from setting up an old-style slave. The basic steps are:

1. Set up the new server and load it with the initial data.
2. Start the slave replicating from the appropriate point in the master's binlog.

D.1. Start with empty server, replicate all binary logs

The simplest way for testing purposes is probably to set up a new, empty slave server and replicate all of the master's binlogs from the start (this is usually not feasible in a realistic production setup, as the initial binlog files will probably have been purged or take too long to apply).

The slave server is installed in the normal way. By default, the GTID position for a newly installed server is empty, which makes the slave replicate from the start of the master's binlogs. But if the slave was used for other purposes before, the initial position can be explicitly set to empty first:

    SET GLOBAL gtid_pos = "";

Next, point the slave to the master with CHANGE MASTER.
Specify master_host etc. as usual. But instead of specifying master_log_file and master_log_pos manually, use master_use_gtid=1 to have GTID do it automatically:

    CHANGE MASTER TO master_host="127.0.0.1", master_port=3310, master_user="root", master_use_gtid=1;
    START SLAVE;

D.2. Setting up a new slave from a backup

The normal way to set up a new replication slave is to restore a backup from an existing server (whether master or slave) as the new server, then point it to start replication from the appropriate position in the master's binlog.

It is important that the position at which replication is started corresponds exactly to the state of the data at the point in time that the backup was taken. Otherwise, the slave can end up with different data than the master because of transactions missing or duplicated.

Two common ways to take a backup are XtraBackup and mysqldump. Both of these can provide the current binlog position of the backup in a non-blocking way. Of course, if there are no writes to the server being backed up during the backup process, then a simple SHOW MASTER STATUS will give the correct position.

Once the current binlog position for the backup has been obtained, in the form of a binlog file name and offset, the corresponding GTID position can be obtained from BINLOG_GTID_POS() on the server that was backed up:

    SELECT BINLOG_GTID_POS("master-bin.000001", 600);

The new slave can now be started replicating by setting the correct @@gtid_pos, issuing CHANGE MASTER to point to the master server, and starting the slave threads:

    SET GLOBAL gtid_pos = "0-1-2";
    CHANGE MASTER TO master_host="127.0.0.1", master_port=3310, master_user="root", master_use_gtid=1;
    START SLAVE;

This method is particularly useful when setting up a new slave from a backup of the master. Remember to ensure that the value of server_id for the new server is different from that of any other server (this is set in my.cnf).
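
For scripted provisioning, the three statements above can be generated from the GTID position returned by BINLOG_GTID_POS(). The following Python sketch only builds the statement strings; it does not connect to any server. The function name new_slave_statements and its parameters are hypothetical, invented for this illustration, and the validation rules are a conservative assumption rather than anything the server requires.

```python
import re

# One or more "domain-server-seq" triples separated by commas.
_GTID_POS_RE = re.compile(r"[0-9]+-[0-9]+-[0-9]+(,[0-9]+-[0-9]+-[0-9]+)*")


def new_slave_statements(gtid_pos, master_host, master_port, master_user):
    """Return the SQL statements that start a restored backup as a GTID slave.

    Hypothetical helper: it mirrors the SET GLOBAL / CHANGE MASTER /
    START SLAVE sequence from the text and nothing more.
    """
    if not _GTID_POS_RE.fullmatch(gtid_pos):
        raise ValueError(f"unexpected GTID position: {gtid_pos!r}")
    for value in (master_host, master_user):
        # Refuse quote and backslash characters instead of trying to escape.
        if any(ch in value for ch in "'\"\\"):
            raise ValueError(f"unsafe character in {value!r}")
    port = int(master_port)
    return [
        f"SET GLOBAL gtid_pos = '{gtid_pos}'",
        f"CHANGE MASTER TO master_host='{master_host}', master_port={port}, "
        f"master_user='{master_user}', master_use_gtid=1",
        "START SLAVE",
    ]


# Values taken from the example in the text.
for statement in new_slave_statements("0-1-2", "127.0.0.1", 3310, "root"):
    print(statement + ";")
```

The statements would then be run on the new slave with whatever client is already in use; the sketch deliberately leaves that part out.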

If the backup was taken of an existing slave server, then it already has the correct GTID position stored in the table mysql.rpl_slave_state (provided that the backup includes this table and is consistent with changes to other tables, of course). In this case, there is no need to explicitly look up the GTID position on the old server and set it on the new slave - it will be already correctly loaded from mysql.rpl_slave_state. This however does not work if the backup was taken of the master - because then the current GTID position is contained in the binlog, not in mysql.rpl_slave_state.

D.3. Switching an existing old-style slave to use GTID

If there is already an existing slave running using old-style binlog filename/offset position, then this can be changed to use GTID directly. This can be useful for upgrades for example, or where there are already tools to set up new slaves using old-style binlog positions.

When a slave connects to a master using old-style binlog positions, and the master supports GTID (i.e. is MariaDB 10.0.2 or later), then the slave automatically downloads the GTID position at connect and updates it during replication. Thus, once a slave has connected to the GTID-aware master at least once, it can be switched to using GTID without any other actions needed:

    STOP SLAVE;
    CHANGE MASTER TO master_host="127.0.0.1", master_port=3310, master_user="root", master_use_gtid=1;
    START SLAVE;

(A later version will probably add a way to set up the slave so that it will connect with old-style binlog file/offset the first time, and automatically switch to using GTID on subsequent connects.)

E. Changing a slave to replicate from a different master

Once replication is running with GTID (master_use_gtid=1), the slave can be pointed to a new master simply by specifying in CHANGE MASTER the new master_host (and if required master_port, master_user, and master_password):

    STOP SLAVE;
    CHANGE MASTER TO master_host='127.0.0.1', master_port=3312;
    START SLAVE;

The slave has a record of the GTID of the last applied transaction from the old master, and since GTIDs are identical across all servers in a replication hierarchy, the slave will just continue from the appropriate point in the new master's binlog.

It is important to understand how this change of masters works. The binlog is an ordered stream of events (or multiple streams, one per replication domain, see section "F. Using global transaction with multi-source replication and other multi-master setups"). Events within the stream are always applied in the same order on every slave that replicates it. The MariaDB GTID relies on this ordering, so that it is sufficient to remember just a single point within the stream. Since event order is the same on every server, switching to the point of the same GTID in the binlog of another server will give the same result.

This translates into some responsibility for the user. The MariaDB GTID replication is fully asynchronous, and fully flexible in how it can be configured. This makes it possible to use it in ways where the assumption that binlog sequence is the same on all servers is violated. In such cases, when changing master, GTID will still attempt to continue at the point of the current GTID in the new binlog.

The most common way that binlog sequence gets different between servers is when the user/DBA does updates directly on a slave server (and these updates are written into the slave's binlog). This results in events in the slave's binlog that are not present on the master or any other slaves.
This can be avoided by setting the session variable sql_log_bin to false while doing such updates, so they do not go into the binlog.

It is normally best to avoid any differences in binlogs between servers. That being said, MariaDB replication is designed for maximum flexibility, and there can be valid reasons for introducing such differences from time to time. In this case, it just needs to be understood that the GTID position is a single point in each binlog stream (one per replication domain), and how this affects the user's particular setup.

Differences can also occur when two masters are active at the same time in a replication hierarchy. This happens when using a multi-master ring. But it can also occur in a simple master-slave setup, during switch to a new master, if changes on the old master are not allowed to fully replicate to all slave servers before switching master. Normally, to switch master, first writes to the old master should be stopped, then one should wait for all changes to be replicated to the new master, and only then should writes begin on the new master. Deliberately using multiple active masters is also supported; this is described in the next section.

F. Using global transaction with multi-source replication and other multi-master setups

MariaDB global transaction ID supports having multiple masters active at the same time. Typically this happens with either multi-source replication or multi-master ring setups.

In such setups, each active master must be configured with its own distinct replication domain ID, gtid_domain_id. The binlog will then in effect consist of multiple independent streams, one per active master. Within one replication domain, binlog order is always the same on every server. But two different streams can be interleaved differently in different server binlogs.

The GTID position of a given slave is then not a single GTID.
Rather, it becomes the GTID of the last event group applied for each value of domain ID, in effect the position reached in each binlog stream. When the slave connects to a master, it can continue from one stream in a different binlog position than another stream. Since order within one stream is consistent across all servers, this is sufficient to always be able to continue replication at the correct point in any new master server(s).

Domain IDs are assigned by the DBA, according to the needs of the application. The default value of @@GLOBAL.gtid_domain_id is 0. This is appropriate for most replication setups, where only a single master is active at a time. The MariaDB server will never by itself introduce new domain_id values into the binlog.

When using multi-source replication, where a single slave connects to multiple masters at the same time, each such master should be configured with its own distinct domain ID.

Similarly, in a multi-master ring topology, where all masters in the ring are updated by the application concurrently (with some mechanism to avoid conflicts), a distinct domain ID should be configured for each server. (In a multi-master ring where the application is careful to only do updates on one master at a time, a single domain ID is sufficient.)

Normally, a slave server should not receive direct updates (as this creates binlog differences compared to the master). Thus it does not matter what value of gtid_domain_id is set on a slave, though it may make sense to make it the same as the master (if not using multi-master) to make it easy to promote the slave as a new master. Of course, if a slave is itself an active master, as in a multi-master ring topology, the domain ID should be set according to the server's role as active master.

Note that domain ID and server ID are distinct concepts. It is possible to use a different domain ID on each server, but this is normally not desirable.
It makes the current GTID position (@@global.gtid_pos) more complicated to understand and work with, and loses the concept of a single ordered binlog stream across all servers. It is recommended only to configure as many domain IDs as there are master servers actively being updated by the application at the same time.

It is not an error in itself to configure domain IDs incorrectly (for example, not configuring them at all). This will be typical, for instance, in an upgrade scenario where a multi-master ring using 5.5 is upgraded to 10.0. The ring will continue to work as before even though everything is configured to use the default domain ID 0. It is even possible to use GTID for replication between the servers. However, care must be taken when switching a slave to a different master. If the binlog order between the old and the new master differs, then a single GTID position to start replication from in the new master's binlog may not be sufficient.

G. New syntax for global transaction ID

G.1. CHANGE MASTER

CHANGE MASTER has a new option, master_use_gtid=[0|1]. When enabled (set to 1), the slave will connect to the master using the GTID position. When disabled, the old-style binlog filename/offset position is used to decide where to start replicating when connecting.

The value of master_use_gtid is saved across server restarts (in master.info). The current value can be seen as the field Using_Gtid in the output of SHOW SLAVE STATUS.

G.2. BINLOG_GTID_POS()

The BINLOG_GTID_POS() function takes as input an old-style binlog position in the form of a file name and a file offset. It looks up the position in the current binlog, and returns a string representation of the corresponding GTID position. If the position is not found in the current binlog, NULL is returned.

H. New system variables for global transaction ID

H.1. gtid_pos

This variable is the current GTID position of a slave server. It can be set by the user to change the current replication position.
This requires all slave threads to be stopped first.

Note that the position is shared among all slave connections when using multi-source replication. To set the position for two masters, one using replication domain 1 and another using replication domain 2, set a GTID for both domains, for example:

    SET GLOBAL gtid_pos = "1-10-100,2-20-500";

The variable value is updated whenever an event group is replicated on a slave, and whenever something is logged to the binlog on the master.

Note that the value of the variable is the result of whatever event happened last, either slave replication or master binlogging, per replication domain. It is an error to set it to something that conflicts with what is in the binlog. This means that to completely reset a slave server (RESET SLAVE and delete all tables), it is also necessary to RESET MASTER before @@GLOBAL.gtid_pos can be cleared (if binlogging is enabled on the slave). This is in any case necessary to avoid incorrect binlog on the slave.

    Name: gtid_pos
    Type: String
    Scope: global
    Privileged: yes
    Dynamic: yes

H.2. gtid_domain_id

This variable is used to decide which replication domain new GTIDs are logged in for a master server. See section "F. Using global transaction with multi-source replication and other multi-master setups" for details.

This variable can also be set on the session level. This is used by mysqlbinlog to preserve the domain ID of GTID events.

    Name: gtid_domain_id
    Type: 32-bit unsigned integer
    Scope: global and session
    Privileged: yes
    Dynamic: yes

H.3. server_id

server_id can be set on the session level to change which server_id value is logged in binlog events (both GTID and other events). This is used by mysqlbinlog to preserve the server ID of GTID events.

    Name: server_id
    Type: 32-bit unsigned integer
    Scope: global and session
    Privileged: yes
    Dynamic: yes

H.4. gtid_seq_no

gtid_seq_no can be set on the session level to change which sequence number is logged in the following GTID event.
This is used by mysqlbinlog to preserve the sequence number of GTID events.

    Name: gtid_seq_no
    Type: 64-bit unsigned integer
    Scope: session only
    Privileged: yes
    Dynamic: yes
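
To make the multi-domain gtid_pos value from section H.1 more concrete, the following Python sketch models a slave position as one "last applied GTID" per domain ID, as described in section F. It is a toy model written for this document, not the server's implementation; the function names are hypothetical, and sorting by domain ID when formatting is only an assumption made to keep the output deterministic.

```python
def parse_gtid_pos(text):
    """Parse 'd-s-n,d-s-n,...' into {domain_id: (server_id, seq_no)}."""
    position = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # an empty string means an empty position
        domain_id, server_id, seq_no = (int(p) for p in item.split("-"))
        if domain_id in position:
            raise ValueError(f"domain {domain_id} listed twice in {text!r}")
        position[domain_id] = (server_id, seq_no)
    return position


def apply_event_group(position, domain_id, server_id, seq_no):
    """Record a newly applied event group: only its own domain's entry moves."""
    position[domain_id] = (server_id, seq_no)


def format_gtid_pos(position):
    """Render the position in the comma-separated form used by gtid_pos."""
    return ",".join(
        f"{domain_id}-{server_id}-{seq_no}"
        for domain_id, (server_id, seq_no) in sorted(position.items())
    )


# Start from the two-domain example in section H.1.
pos = parse_gtid_pos("1-10-100,2-20-500")
# Applying an event group from domain 1 leaves domain 2 untouched.
apply_event_group(pos, 1, 10, 101)
print(format_gtid_pos(pos))
```

After the event group 1-10-101 is applied, the modelled position is 1-10-101,2-20-500: the stream for domain 1 has advanced while the stream for domain 2 stays where it was.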