Kristian Nielsen <knielsen@knielsen-hq.org> writes:

I will try to get some initial docs written up by the end of the week.

Giuseppe, here (attached) is my first stab at documentation for MariaDB

GTID. Thanks for prodding me to get this done, even if it was a very gentle

prodding :-)

Daniel, do you think you can get these formatted properly and included in some

appropriate place in the Knowledgebase? And feel free to fix any problems you

find in the text on the way, of course.

I will then make sure to keep this updated as the code progresses.

This is just a first stab, I am sure there is much that is incomplete and I

will work to extend it as needed (comments welcome if anything is found

missing).

- Kristian.

A. MariaDB global transaction ID

From version 10.0, MariaDB supports global transaction IDs for replication.

MariaDB replication in general works as follows: On a master server, all

updates to the database (DML and DDL) are written into the binary log as

binlog events. A slave server connects to the master and reads the binlog

events, then applies the events locally to replicate the same changes as done

on the master. A server can be both a master and a slave at the same time, and

it is thus possible for binlog events to replicated through multiple levels of

servers.

A slave server keeps track of the position in the master's binlog of the last

event applied on the slave. This allows the slave server to re-connect and

resume from where it left off after replication has been temporarily

stopped. It also allows to disconnect from the master server and connect to a

different server to resume replication from a new master, as long as the new

master has the proper binlog events, and the new master connection starts

replicationg at the appropriate point in the binlogs.

Global transaction ID introduces a new event attached to each event group in

the binlog. (An event group is a collection of events that are always applied

as a unit. They are best thought of as a "transaction", though they also

include non-transactional DML statements, as well as DDL). As an event group

is replicated from master server to slave server, the global transaction ID is

preserved. Since the ID is globally unique across the entire group of servers,

this makes it easy to uniquely identify the same binlog events on different

servers that replicate each other (this was not easily possible before MariaDB

10.0).

Using global transaction ID provides two main benefits:

1. Easy to change a slave server to connect to and replicate from a different

master server.

The slave remembers the global transaction ID of the last event group

applied from the old master. This makes it easy to know where to resume

replication on the new master, since the global transaction IDs are know

throughout the entire replication hierarchy. This is not the case when

using old-style replication; in this case the slave knows only the

specific file name and offset of the old master server of the last event

applied. There is no simple way to guess from this the correct file name

and offset on a new master.

2. The state of the slave is recorded in a crash-safe way.

The slave keeps track of its current position (the global transaction ID

of the last transaction applied) in a system table mysql.rpl_slave_state.

If this table is using a transactional storage engine (such as InnoDB,

which is the default), then updates to the state is done in the same

transaction as the updates to the data. This makes the state crash-safe;

if the slave server crashes, crash recovery on restart will make sure that

the recorded replication position matches what changes were actually

replicated. This is not the case for old-style replication, where the

state is recorded in a file relay-log.info, which is updated independently

of the actual data changes and can easily get out of sync if the slave

server crashes. (This works for DML to transactional tables;

non-transactional tables and DDL in general are not crash-safe in

MariaDB.)

Because of these two benefits, it is generally recommended to use global

transaction ID for any replication setups based on MariaDB 10.0 or later.

However, old-style replication continues to work as always, so there is no

pressing need to change existing setups. Global transaction ID integrates

smoothly with old-style replication, and the two can be used freely together

in the same replication hierarchy. There is no special configuration needed of

the server to start using global transaction ID. However, it must be

explicitly set for a slave server with the appropriate CHANGE MASTER option;

by default old-style replication is used by a replication slave, to maintain

backwards compatibility.

B. The concept of global transaction ID

A global transaction ID, or GTID for short, consists of three numbers

separated with dashes '-'. For example:

0-1-10

- The first number 0 is the domain ID, which is specific for global

transaction ID (more on this below). It is a 32-bit unsigned integer.

- The second number is the server ID, the same as is also used in old-style

replication. It is a 32-bit unsigned integer.

- The third number is the sequence number. This is a 64-bit unsigned integer

that is monotonically increasing for each new event group logged into the

binlog.

The server ID is set to the server ID of the server where the event group is

first logged into the binlog. The sequence number is increased on a server for

every event group logged. Since server IDs must be unique for every server,

this makes the (server_id, sequence_number) pair, and hence the whole GTID,

globally unique.

B.1. The domain ID

When events are replicated from a master server to a slave server, the events

are always logged into the slave's binlog in the same order that they were

read from the master's binlog. Thus, if there is only ever a single master

server receiving (non-replication) updates at a time, then the binlog order

will be idential on every server in the replication hierarchy.

This consistent binlog order is used by the slave to keep track of its current

position in the replication. Basically, the slave remembers the GTID of the

last event group replicated from the master. When reconnecting to a master,

whether the same one or a new one, it sends this GTID position to the master,

and the master starts sending events from the first event after the

corresponding event group.

However, if user updates are done independently on multiple servers at the

same time, then in general it is not possible for binlog order to be identical

across all servers. This can happen when using multi-source replication, with

multi-master ring topologies, or just if manual updates are done on a slave

that is replicating from active master. If the binlog order is different on

the new master from the order on the old master, then it is not sufficient for

the slave to keep track of a single GTID to completely record the current

state.

The domain ID, the first component of the GTID, is used to handle this.

In general, the binlog is not a single ordered stream. Rather, it consists of

a number of different streams, each one identified by its own domain

ID. Within each stream, GTIDs always have the same order in every server

binlog. However, different streams can be interleaved in different ways on

different servers.

A slave server then keeps track of its replication position by recording the

last GTID applied within each replication stream. When connecting to a new

master, the slave can start replication from a different point in the binlog

for each domain ID.

For more details on using multi-master setups and multiple domain IDs, see

section "F. Using global transaction with multi-source replication and other

multi-master setups".

Simple replication setups only have a single master being updated by the

application at any one time. In such setups, there is only a single

replication stream needed. Then domain ID can be ignored, and left as the

default of 0 on all servers.

C. Using global transaction ID

In MariaDB 10.0, global transaction ID is enabled automaticall. Each event

group logged to the binlog receives a GTID event, as can be seen with

mysqlbinlog or SHOW BINLOG EVENTS.

The slave automatically keeps track of the GTID of the last applied event

group, as can be seen from the gtid_pos variable:

SELECT @@GLOBAL.gtid_pos

0-1-1

When a slave connects to a master, it can use either global transaction ID or

old-style filename/offset to decide where in the master binlogs to start

replicating from. To use global transaction ID, use the master_use_gtid option

of CHANGE MASTER:

CHANGE MASTER TO master_use_gtid = 1, master_host = 'my_master', ...

When the slave is then later started with START SLAVE, it will send the value

of @@GLOBAL.gtid_pos to the master and start replication from the

corresponding point in the master binlogs.

Even when a slave is configured to connect with the old-style binlog filename

and offset (CHANGE MASTER TO master_log_file=..., master_log_pos=...), it will

still keep track of the current GTID position in @@GLOBAL.gtid_pos. This means

that an existing slave previously configured and running can be changed to

connect with GTID (to the same or a new master) simply with:

CHANGE MASTER TO master_use_gtid = 1

The slave remembers that master_use_gtid=1 was specified and will use it also

for subsequent connects, until it is explicitly changed by specifying

master_log_file/pos=... or master_use_gtid=0. The current value can be seen as

the field Using_Gtid of SHOW SLAVE STATUS:

SHOW SLAVE STATUS

Using_Gtid: 1

The slave server internally uses the table mysql.rpl_slave_state to store the

GTID position (and so preserve the value of @@GLOBAL.gtid_pos across server

restarts). After upgrading a server to 10.0, it is necessary to run

mysql_upgrade_db (as always) to get the table created.

In order to be crash-safe, this table must use a transactional storage engine

such as InnoDB. When MariaDB is first installed (or upgraded to 10.0), the

table is created using the default storage engine - which itself defaults to

InnoDB. If there is a need to change the storage engine for this table (to

make it transactional on a system configured with MyISAM as the default

storage engine, for example), use ALTER TABLE:

ALTER TABLE mysql.rpl_slave_state ENGINE = InnoDB

The table mysql.rpl_slave_state should not be modified in any other way. In

particular, do not try to update the rows in the table to change the slave's

idea of the current GTID position; instead use

SET GLOBAL gtid_pos = '0-1-1'

The actual slave GTID position, and thus the value of @@GLOBAL.gtid_pos, is

the result of a combination of the contents of the mysql.rpl_slave_state and

the contents of the slave binlog (if any, eg. if --log-slave-updates is

enabled). This allows to use the same CHANGE MASTER TO ... MASTER_USE_GTID=1

command to connect a server as slave to a new master, regardless of whether

the server was acting as a slave or a master before.

D. Setting up a new slave server with global transaction ID

Setting up a new replication slave server with global transaction ID is not

much different from setting up an old-style slave. The basic steps are:

1. Setup the new server and load it with the initial data.

2. Start the slave replicating from the appropriate point in the master's

binlog.

D.1. Start with empty server, replicate all binary logs

The simplest way for testing purposes is probably to setup a new, empty slave

server and replicate all of the master's binlogs from the start (this is

usually not feasible in a realistic production setup, as the initial binlog

files will probably have been purged or take too long to apply).

The slave server is installed in the normal way. By default, the GTID position

for a newly installed server is empty, which makes the slave replicate from

the start of the master's binlogs. But if the slave was used for other

purposes before, the initial position can be explicitly set to empty first:

SET GLOBAL gtid_pos = "";

Next, point the slave to the master with CHANGE MASTER. Specify master_host

etc. as usual. But instead of specifying master_log_file and master_log_pos

manually, use master_use_gtid=1 to have GTID do it automatically:

CHANGE MASTER TO master_host="127.0.0.1", master_port=3310,

master_user="root", master_use_gtid=1;

START SLAVE;

D.2. Setting up a new slave from a backup

The normal way to set up a new replication slave is to restore a backup from

an existing server (whether master or slave) as the new server, then point it

to start replication from the appropriate position in the master's binlog.

It is important that the position at which replication is started corresponds

exactly to the state of the data at the point in time that the backup was

taken. Otherwise, the slave can end up with different data than the master

because of transactions missing or duplicated.

Two common ways to take a backup are XtraBackup and mysqldump. Both of these

can provide the current binlog position of the backup in a non-blocking

way. Of course, if there are no writes to the server being backed up during

the backup process, then a simple SHOW MASTER STATUS will give the correct

position.

Once the current binlog position for the backup has been obtained, in the form

of a binlog file name and offset, the corresponding GTID position can be

obtained from BINLOG_GTID_POS() on the server that was backed up:

SELECT BINLOG_GTID_POS("master-bin.000001", 600);

The new slave can now be started replicating by setting the correct

@@gtid_pos, issuing CHANGE MASTER to point to the master server, and starting

the slave threads:

SET GLOBAL gtid_pos = "0-1-2";

CHANGE MASTER TO master_host="127.0.0.1", master_port=3310,

master_user="root", master_use_gtid=1;

START SLAVE;

This method is particularly useful when setting up a new slave from a backup

of the master. Remember to ensure that the value of server_id for the new

server is different from that of any other server (this is set in my.cnf).

If the backup was taken of an existing slave server, then it already has the

correct GTID position stored in the table mysql.rpl_slave_state (provided that

the backup includes this table and is consistent with changes to other tables,

of course). In this case, there is no need to explicitly look up the GTID

position on the old server and set it on the new slave - it will be already

correctly loaded from mysql.rpl_slave_state. This however does not work if the

backup was taken of the master - because then the current GTID position is

contained in the binlog, not in mysql.rpl_slave_state.

D.3. Switching an existing old-style slave to use GTID.

If there is already an existing slave running using old-style binlog

filename/offset position, then this can be changed to use GTID directly. This

can be useful for upgrades for example, or where there are already tools to

setup new slaves using old-style binlog positions.

When a slave connects to a master using old-style binlog positions, and the

master supports GTID (ie. is MariaDB 10.0.2 or bigger), then the slave

automatically downloads the GTID position at connect and updates it during

replication. Thus, once a slave has connected to the GTID-aware master at

least once, it can be switched to using GTID without any other actions needed;

STOP SLAVE;

CHANGE MASTER TO master_host="127.0.0.1", master_port=3310,

master_user="root", master_use_gtid=1;

START SLAVE;

(A later version will probably add a way to setup the slave so that it will

connect with old-style binlog file/offset the first time, and automatically

switch to using GTID on subsequent connects.)

E. Changing a slave to replicate from a different master

Once replication is running with GTID (master_use_gtid=1), the slave can be

pointed to a new master simply by specifying in CHANGE MASTER the new

master_host (and if required master_port, master_user, and master_password):

STOP SLAVE;

CHANGE MASTER TO master_host='127.0.0.1', master_port=3312;

START SLAVE;

The slave has a record of the GTID of the last applied transaction from the

old master, and since GTIDs are identical across all servers in a replication

hierarchy, the slave will just continue from the appropriate point in the new

master's binlog.

It is important to understand how this change of masters work. The binlog is

an ordered stream of events (or multiple streams, one per replication domain,

see section "F. Using global transaction with multi-source replication and

other multi-master setups"). Events within the stream are always applied in

the same order on every slave that replicates it. The MariaDB GTID relies on

this ordering, so that it is sufficient to remember just a single point within

the stream. Since event order is the same on every server, switching to the

point of the same GTID in the binlog of another server will give the same

result.

This translates into some responsibility for the user. The MariaDB GTID

replication is fully asynchronous, and fully flexible in how it can be

configured. This makes it possible to use it in ways where the assumption that

binlog sequence is the same on all servers is violated. In such cases, when

changing master, GTID will still attempt to continue at the point of current

GTID in the new binlog.

The most common way that binlog sequence gets different between servers is

when the user/DBA does updates directly on a slave server (and these updates

are written into the slaves binlog). This results in events in the slaves

binlog that are not present on the master or any other slaves. This can be

avoided by setting the session variable sql_log_bin false while doing such

updates, so they do not go into the binlog.

It is normally best to avoid any differences in binlogs between servers. That

being said, MariaDB replication is designed for maximum flexibility, and there

can be valid reasons for introducing such differences from time to time. It

this case, it just needs to be understood that the GTID position is a single

point in each binlog stream (one per replication domain), and how this affects

the users particular setup.

Differences can also occur when two masters are active at the same time in a

replication hierarchy. This happens when using a multi-master ring. But it can

also occur in a simple master-slave setup, during switch to a new master, if

changes on the old master is not allowed to fully replicate to all slave

servers before switching master. Normally, to switch master, first writes to

the old master should be stopped, then one should wait for all changes to be

replicated to the new master, and only then should writes begin on the new

master. Deliberately using multiple active masters is also supported, this is

described in the next section.

F. Using global transaction with multi-source replication and other

multi-master setups

MariaDB global transaction ID supports having multiple masters active at the

same time. Typically this happens with either multi-source replication or

multi-master ring setups.

In such setups, each active master must be configured with its own distinct

replication domain ID, gtid_domain_id. The binlog will then in effect consists

of multiple independent streams, one per active master. Within one replication

domain, binlog order is always the same on every server. But two different

streams can be interleaved differently in different server binlogs.

The GTID position of a given slave is then not a single GTID. Rather, it

becomes the GTID of the last event group applied for each value of domain ID,

in effect the position reached in each binlog stream. When the slave connects

to a master, it can continue from one stream in a different binlog position

than another stream. Since order within one stream is consistent across all

servers, this is sufficient to always be able to continue replicationat the

correct point in any new master server(s).

Domain IDs are assigned by the DBA, according to the need of the application.

The default value of @@GLOBAL.gtid_domain_id is 0. This is appropriate for

most replication setups, where only a single master is active at a time. The

MariaDB server will never by itself introduce new domain_id values into the

binlog.

When using multi-source replication, where a single slave connects to multiple

masters at the same time, each such master should be configured with its own

distict domain ID.

Similarly, in a multi-master ring topology, where all master in the ring are

updated by the application concurrently (with some mechanism to avoid

conflicts), a distict domain ID should be configured for each server (In a

multi-master ring where the application is careful to only do updates on one

master at a time, a single domain ID is sufficient).

Normally, a slave server should not receive direct updates (as this creates

binlog differences compared to the master). Thus it does not matter what value

of gtid_domain_id is set on a slave, though it may make sense to make it the

same as the master (if not using multi-master) to make it easy to promote the

slave as a new master. Of course, if a slave is itself an active master, as in

a multi-master ring topology, the domain ID should be set according to the

server's role as active master.

Note that domain ID and server ID are distinct concepts. It is possible to use

a different domain ID on each server, but this is normally not desirable. It

makes the current GTID position (@@global.gtid_pos) more complicated to

understand and work with, and looses the concept of a single ordered binlog

stream across all servers. It is recommended only to configure as many domain

IDs as there are master servers actively being updated by the application at

the same time.

It is not an error in itself to configure domain IDs incorrectly (for example,

not configuring them at all). For example, this will be typical in an upgrade

scenario where a multi-master ring using 5.5 is upgraded to 10.0. The ring

will continue to work as before even though everything is configured to use

the default domain ID 0. It is even possible to use GTID for replication

between the servers. However, case must be taken when switching a slave to a

different master. If the binlog order between the old and the new master

differs, then a single GTID position to start replication from in the new

master's binlog may not be sufficient.

G. New syntax for global transaction ID

G.1. CHANGE MASTER

CHANGE MASTER has a new option, master_use_gtid=[0|1]. When enabled (set to

1), the slave will connect to the master using the GTID position. When

disabled, the old-style binlog filename/offset position is used to decide

where to start replicating when connecting.

The value of master_use_gtid is saved across server restarts (in

master.info). The current value can be seen as the field Using_Gtid in the

output of SHOW SLAVE STATUS.

G.2. BINLOG_GTID_POS().

The BINLOG_GTID_POS() function takes as input an old-style binlog position in

the form of a file name and a file offset. It looks up the position in the

current binlog, and returns a string representation of the corresponding GTID

position. If the position is not found in the current binlog, NULL is

returned.

H. New system variables for global transaction ID

H.1. gtid_pos

This variable is the current GTID position of a slave server.

It can be set by the user to change the current replication position. This

requires all slave threads to be stopped first. Note that the position is

shared among all slave connections when using multi-source replication. To set

position for two masters, one using replication domain 1 and another

replication domain 2, set a GTID for both domains, for example:

SET GLOBAL gtid_pos = "1-10-100,2-20-500";

The variable value is updated whenever an event group is replicated on a

slave, and whenever something is logged to the binlog on the master.

Note that the value of the variable is the result of whatever event happened

last, either slave replication or master binlogging, per replication

domain. It is an error to set it to something that conflicts with what is in

the binlog. This means that to completely reset a slave server (RESET SLAVE

and delete all tables), it is also necessary to RESET MASTER before

@@GLOBAL.gtid_pos can be cleared (if binlogging is enabled on the slave). This

is in any case necessary to avoid incorrect binlog on the slave.

Name: gtid_pos

Type: String

Scope: global

Privileged: yes

Dynamic: yes

H.2. gtid_domain_id

This variable is used to decide which replication domain new GTIDs are logged

in for a master server. See section "F. Using global transaction with

multi-source replication and other multi-master setups" for details.

This variable can also be set on the session level. This is used by

mysqlbinlog to preserve the domain ID of GTID events.

Name: gtid_domain_id

Type: 32-bit unsigned integer

Scope: global and session

Privileged: yes

Dynamic: yes

H.3. server_id

Server_id can be set on the session level to change which server_id value is

logged in binlog events (both GTID and other events). This is used by

mysqlbinlog to preserve the server ID of GTID events.

Name: server_id

Type: 32-bit unsigned integer

Scope: global and session

Privileged: yes

Dynamic: yes

H.3. gtid_seq_no

gtid_seq_no can be set on the session level to change which sequence number is

logged in the following GTID event. This is used by mysqlbinlog to preserve

the sequence number of GTID events.

Name: gtid_seq_no

Type: 64-bit unsigned integer

Scope: session only

Privileged: yes

Dynamic: yes