Hi Kristian,
Thanks. I will have a look as soon as I have some time. 
I need just a piece of information: is this syntax applicable to the current binaries, or do I need to recompile after a specific revision in the code?

Cheers

Giuseppe



On Monday, May 6, 2013 at 15:55 , Kristian Nielsen wrote:

Kristian Nielsen <knielsen@knielsen-hq.org> writes:

I will try to get some initial docs written up by the end of the week.

Giuseppe, here (attached) is my first stab at documentation for MariaDB
GTID. Thanks for prodding me to get this done, even if it was a very gentle
prodding :-)

Daniel, do you think you can get these formatted properly and included in some
appropriate place in the Knowledgebase? And feel free to fix any problems you
find in the text on the way, of course.

I will then make sure to keep this updated as the code progresses.

This is just a first stab, I am sure there is much that is incomplete and I
will work to extend it as needed (comments welcome if anything is found
missing).

- Kristian.
A. MariaDB global transaction ID

From version 10.0, MariaDB supports global transaction IDs for replication.

MariaDB replication in general works as follows: On a master server, all
updates to the database (DML and DDL) are written into the binary log as
binlog events. A slave server connects to the master and reads the binlog
events, then applies the events locally to replicate the same changes as done
on the master. A server can be both a master and a slave at the same time, and
it is thus possible for binlog events to replicated through multiple levels of
servers.

A slave server keeps track of the position in the master's binlog of the last
event applied on the slave. This allows the slave server to re-connect and
resume from where it left off after replication has been temporarily
stopped. It also allows to disconnect from the master server and connect to a
different server to resume replication from a new master, as long as the new
master has the proper binlog events, and the new master connection starts
replicationg at the appropriate point in the binlogs.

Global transaction ID introduces a new event attached to each event group in
the binlog. (An event group is a collection of events that are always applied
as a unit. They are best thought of as a "transaction", though they also
include non-transactional DML statements, as well as DDL). As an event group
is replicated from master server to slave server, the global transaction ID is
preserved. Since the ID is globally unique across the entire group of servers,
this makes it easy to uniquely identify the same binlog events on different
servers that replicate each other (this was not easily possible before MariaDB
10.0).

Using global transaction ID provides two main benefits:

1. Easy to change a slave server to connect to and replicate from a different
master server.

The slave remembers the global transaction ID of the last event group
applied from the old master. This makes it easy to know where to resume
replication on the new master, since the global transaction IDs are know
throughout the entire replication hierarchy. This is not the case when
using old-style replication; in this case the slave knows only the
specific file name and offset of the old master server of the last event
applied. There is no simple way to guess from this the correct file name
and offset on a new master.

2. The state of the slave is recorded in a crash-safe way.

The slave keeps track of its current position (the global transaction ID
of the last transaction applied) in a system table mysql.rpl_slave_state.
If this table is using a transactional storage engine (such as InnoDB,
which is the default), then updates to the state is done in the same
transaction as the updates to the data. This makes the state crash-safe;
if the slave server crashes, crash recovery on restart will make sure that
the recorded replication position matches what changes were actually
replicated. This is not the case for old-style replication, where the
state is recorded in a file relay-log.info, which is updated independently
of the actual data changes and can easily get out of sync if the slave
server crashes. (This works for DML to transactional tables;
non-transactional tables and DDL in general are not crash-safe in
MariaDB.)

Because of these two benefits, it is generally recommended to use global
transaction ID for any replication setups based on MariaDB 10.0 or later.
However, old-style replication continues to work as always, so there is no
pressing need to change existing setups. Global transaction ID integrates
smoothly with old-style replication, and the two can be used freely together
in the same replication hierarchy. There is no special configuration needed of
the server to start using global transaction ID. However, it must be
explicitly set for a slave server with the appropriate CHANGE MASTER option;
by default old-style replication is used by a replication slave, to maintain
backwards compatibility.


B. The concept of global transaction ID

A global transaction ID, or GTID for short, consists of three numbers
separated with dashes '-'. For example:

0-1-10

- The first number 0 is the domain ID, which is specific for global
transaction ID (more on this below). It is a 32-bit unsigned integer.

- The second number is the server ID, the same as is also used in old-style
replication. It is a 32-bit unsigned integer.

- The third number is the sequence number. This is a 64-bit unsigned integer
that is monotonically increasing for each new event group logged into the
binlog.

The server ID is set to the server ID of the server where the event group is
first logged into the binlog. The sequence number is increased on a server for
every event group logged. Since server IDs must be unique for every server,
this makes the (server_id, sequence_number) pair, and hence the whole GTID,
globally unique.


B.1. The domain ID

When events are replicated from a master server to a slave server, the events
are always logged into the slave's binlog in the same order that they were
read from the master's binlog. Thus, if there is only ever a single master
server receiving (non-replication) updates at a time, then the binlog order
will be idential on every server in the replication hierarchy.

This consistent binlog order is used by the slave to keep track of its current
position in the replication. Basically, the slave remembers the GTID of the
last event group replicated from the master. When reconnecting to a master,
whether the same one or a new one, it sends this GTID position to the master,
and the master starts sending events from the first event after the
corresponding event group.

However, if user updates are done independently on multiple servers at the
same time, then in general it is not possible for binlog order to be identical
across all servers. This can happen when using multi-source replication, with
multi-master ring topologies, or just if manual updates are done on a slave
that is replicating from active master. If the binlog order is different on
the new master from the order on the old master, then it is not sufficient for
the slave to keep track of a single GTID to completely record the current
state.

The domain ID, the first component of the GTID, is used to handle this.

In general, the binlog is not a single ordered stream. Rather, it consists of
a number of different streams, each one identified by its own domain
ID. Within each stream, GTIDs always have the same order in every server
binlog. However, different streams can be interleaved in different ways on
different servers.

A slave server then keeps track of its replication position by recording the
last GTID applied within each replication stream. When connecting to a new
master, the slave can start replication from a different point in the binlog
for each domain ID.

For more details on using multi-master setups and multiple domain IDs, see
section "F. Using global transaction with multi-source replication and other
multi-master setups".

Simple replication setups only have a single master being updated by the
application at any one time. In such setups, there is only a single
replication stream needed. Then domain ID can be ignored, and left as the
default of 0 on all servers.


C. Using global transaction ID

In MariaDB 10.0, global transaction ID is enabled automaticall. Each event
group logged to the binlog receives a GTID event, as can be seen with
mysqlbinlog or SHOW BINLOG EVENTS.

The slave automatically keeps track of the GTID of the last applied event
group, as can be seen from the gtid_pos variable:

SELECT @@GLOBAL.gtid_pos
0-1-1

When a slave connects to a master, it can use either global transaction ID or
old-style filename/offset to decide where in the master binlogs to start
replicating from. To use global transaction ID, use the master_use_gtid option
of CHANGE MASTER:

CHANGE MASTER TO master_use_gtid = 1, master_host = 'my_master', ...

When the slave is then later started with START SLAVE, it will send the value
of @@GLOBAL.gtid_pos to the master and start replication from the
corresponding point in the master binlogs.

Even when a slave is configured to connect with the old-style binlog filename
and offset (CHANGE MASTER TO master_log_file=..., master_log_pos=...), it will
still keep track of the current GTID position in @@GLOBAL.gtid_pos. This means
that an existing slave previously configured and running can be changed to
connect with GTID (to the same or a new master) simply with:

CHANGE MASTER TO master_use_gtid = 1

The slave remembers that master_use_gtid=1 was specified and will use it also
for subsequent connects, until it is explicitly changed by specifying
master_log_file/pos=... or master_use_gtid=0. The current value can be seen as
the field Using_Gtid of SHOW SLAVE STATUS:

SHOW SLAVE STATUS
Using_Gtid: 1

The slave server internally uses the table mysql.rpl_slave_state to store the
GTID position (and so preserve the value of @@GLOBAL.gtid_pos across server
restarts). After upgrading a server to 10.0, it is necessary to run
mysql_upgrade_db (as always) to get the table created.

In order to be crash-safe, this table must use a transactional storage engine
such as InnoDB. When MariaDB is first installed (or upgraded to 10.0), the
table is created using the default storage engine - which itself defaults to
InnoDB. If there is a need to change the storage engine for this table (to
make it transactional on a system configured with MyISAM as the default
storage engine, for example), use ALTER TABLE:

ALTER TABLE mysql.rpl_slave_state ENGINE = InnoDB

The table mysql.rpl_slave_state should not be modified in any other way. In
particular, do not try to update the rows in the table to change the slave's
idea of the current GTID position; instead use

SET GLOBAL gtid_pos = '0-1-1'

The actual slave GTID position, and thus the value of @@GLOBAL.gtid_pos, is
the result of a combination of the contents of the mysql.rpl_slave_state and
the contents of the slave binlog (if any, eg. if --log-slave-updates is
enabled). This allows to use the same CHANGE MASTER TO ... MASTER_USE_GTID=1
command to connect a server as slave to a new master, regardless of whether
the server was acting as a slave or a master before.


D. Setting up a new slave server with global transaction ID

Setting up a new replication slave server with global transaction ID is not
much different from setting up an old-style slave. The basic steps are:

1. Setup the new server and load it with the initial data.

2. Start the slave replicating from the appropriate point in the master's
binlog.


D.1. Start with empty server, replicate all binary logs

The simplest way for testing purposes is probably to setup a new, empty slave
server and replicate all of the master's binlogs from the start (this is
usually not feasible in a realistic production setup, as the initial binlog
files will probably have been purged or take too long to apply).

The slave server is installed in the normal way. By default, the GTID position
for a newly installed server is empty, which makes the slave replicate from
the start of the master's binlogs. But if the slave was used for other
purposes before, the initial position can be explicitly set to empty first:

SET GLOBAL gtid_pos = "";

Next, point the slave to the master with CHANGE MASTER. Specify master_host
etc. as usual. But instead of specifying master_log_file and master_log_pos
manually, use master_use_gtid=1 to have GTID do it automatically:

CHANGE MASTER TO master_host="127.0.0.1", master_port=3310,
master_user="root", master_use_gtid=1;
START SLAVE;


D.2. Setting up a new slave from a backup

The normal way to set up a new replication slave is to restore a backup from
an existing server (whether master or slave) as the new server, then point it
to start replication from the appropriate position in the master's binlog.

It is important that the position at which replication is started corresponds
exactly to the state of the data at the point in time that the backup was
taken. Otherwise, the slave can end up with different data than the master
because of transactions missing or duplicated.

Two common ways to take a backup are XtraBackup and mysqldump. Both of these
can provide the current binlog position of the backup in a non-blocking
way. Of course, if there are no writes to the server being backed up during
the backup process, then a simple SHOW MASTER STATUS will give the correct
position.

Once the current binlog position for the backup has been obtained, in the form
of a binlog file name and offset, the corresponding GTID position can be
obtained from BINLOG_GTID_POS() on the server that was backed up:

SELECT BINLOG_GTID_POS("master-bin.000001", 600);

The new slave can now be started replicating by setting the correct
@@gtid_pos, issuing CHANGE MASTER to point to the master server, and starting
the slave threads:

SET GLOBAL gtid_pos = "0-1-2";
CHANGE MASTER TO master_host="127.0.0.1", master_port=3310,
master_user="root", master_use_gtid=1;
START SLAVE;

This method is particularly useful when setting up a new slave from a backup
of the master. Remember to ensure that the value of server_id for the new
server is different from that of any other server (this is set in my.cnf).

If the backup was taken of an existing slave server, then it already has the
correct GTID position stored in the table mysql.rpl_slave_state (provided that
the backup includes this table and is consistent with changes to other tables,
of course). In this case, there is no need to explicitly look up the GTID
position on the old server and set it on the new slave - it will be already
correctly loaded from mysql.rpl_slave_state. This however does not work if the
backup was taken of the master - because then the current GTID position is
contained in the binlog, not in mysql.rpl_slave_state.


D.3. Switching an existing old-style slave to use GTID.

If there is already an existing slave running using old-style binlog
filename/offset position, then this can be changed to use GTID directly. This
can be useful for upgrades for example, or where there are already tools to
setup new slaves using old-style binlog positions.

When a slave connects to a master using old-style binlog positions, and the
master supports GTID (ie. is MariaDB 10.0.2 or bigger), then the slave
automatically downloads the GTID position at connect and updates it during
replication. Thus, once a slave has connected to the GTID-aware master at
least once, it can be switched to using GTID without any other actions needed;

STOP SLAVE;
CHANGE MASTER TO master_host="127.0.0.1", master_port=3310,
master_user="root", master_use_gtid=1;
START SLAVE;

(A later version will probably add a way to setup the slave so that it will
connect with old-style binlog file/offset the first time, and automatically
switch to using GTID on subsequent connects.)


E. Changing a slave to replicate from a different master

Once replication is running with GTID (master_use_gtid=1), the slave can be
pointed to a new master simply by specifying in CHANGE MASTER the new
master_host (and if required master_port, master_user, and master_password):

STOP SLAVE;
CHANGE MASTER TO master_host='127.0.0.1', master_port=3312;
START SLAVE;

The slave has a record of the GTID of the last applied transaction from the
old master, and since GTIDs are identical across all servers in a replication
hierarchy, the slave will just continue from the appropriate point in the new
master's binlog.

It is important to understand how this change of masters work. The binlog is
an ordered stream of events (or multiple streams, one per replication domain,
see section "F. Using global transaction with multi-source replication and
other multi-master setups"). Events within the stream are always applied in
the same order on every slave that replicates it. The MariaDB GTID relies on
this ordering, so that it is sufficient to remember just a single point within
the stream. Since event order is the same on every server, switching to the
point of the same GTID in the binlog of another server will give the same
result.

This translates into some responsibility for the user. The MariaDB GTID
replication is fully asynchronous, and fully flexible in how it can be
configured. This makes it possible to use it in ways where the assumption that
binlog sequence is the same on all servers is violated. In such cases, when
changing master, GTID will still attempt to continue at the point of current
GTID in the new binlog.

The most common way that binlog sequence gets different between servers is
when the user/DBA does updates directly on a slave server (and these updates
are written into the slaves binlog). This results in events in the slaves
binlog that are not present on the master or any other slaves. This can be
avoided by setting the session variable sql_log_bin false while doing such
updates, so they do not go into the binlog.

It is normally best to avoid any differences in binlogs between servers. That
being said, MariaDB replication is designed for maximum flexibility, and there
can be valid reasons for introducing such differences from time to time. It
this case, it just needs to be understood that the GTID position is a single
point in each binlog stream (one per replication domain), and how this affects
the users particular setup.

Differences can also occur when two masters are active at the same time in a
replication hierarchy. This happens when using a multi-master ring. But it can
also occur in a simple master-slave setup, during switch to a new master, if
changes on the old master is not allowed to fully replicate to all slave
servers before switching master. Normally, to switch master, first writes to
the old master should be stopped, then one should wait for all changes to be
replicated to the new master, and only then should writes begin on the new
master. Deliberately using multiple active masters is also supported, this is
described in the next section.


F. Using global transaction with multi-source replication and other
multi-master setups

MariaDB global transaction ID supports having multiple masters active at the
same time. Typically this happens with either multi-source replication or
multi-master ring setups.

In such setups, each active master must be configured with its own distinct
replication domain ID, gtid_domain_id. The binlog will then in effect consists
of multiple independent streams, one per active master. Within one replication
domain, binlog order is always the same on every server. But two different
streams can be interleaved differently in different server binlogs.

The GTID position of a given slave is then not a single GTID. Rather, it
becomes the GTID of the last event group applied for each value of domain ID,
in effect the position reached in each binlog stream. When the slave connects
to a master, it can continue from one stream in a different binlog position
than another stream. Since order within one stream is consistent across all
servers, this is sufficient to always be able to continue replicationat the
correct point in any new master server(s).

Domain IDs are assigned by the DBA, according to the need of the application.
The default value of @@GLOBAL.gtid_domain_id is 0. This is appropriate for
most replication setups, where only a single master is active at a time. The
MariaDB server will never by itself introduce new domain_id values into the
binlog.

When using multi-source replication, where a single slave connects to multiple
masters at the same time, each such master should be configured with its own
distict domain ID.

Similarly, in a multi-master ring topology, where all master in the ring are
updated by the application concurrently (with some mechanism to avoid
conflicts), a distict domain ID should be configured for each server (In a
multi-master ring where the application is careful to only do updates on one
master at a time, a single domain ID is sufficient).

Normally, a slave server should not receive direct updates (as this creates
binlog differences compared to the master). Thus it does not matter what value
of gtid_domain_id is set on a slave, though it may make sense to make it the
same as the master (if not using multi-master) to make it easy to promote the
slave as a new master. Of course, if a slave is itself an active master, as in
a multi-master ring topology, the domain ID should be set according to the
server's role as active master.

Note that domain ID and server ID are distinct concepts. It is possible to use
a different domain ID on each server, but this is normally not desirable. It
makes the current GTID position (@@global.gtid_pos) more complicated to
understand and work with, and looses the concept of a single ordered binlog
stream across all servers. It is recommended only to configure as many domain
IDs as there are master servers actively being updated by the application at
the same time.

It is not an error in itself to configure domain IDs incorrectly (for example,
not configuring them at all). For example, this will be typical in an upgrade
scenario where a multi-master ring using 5.5 is upgraded to 10.0. The ring
will continue to work as before even though everything is configured to use
the default domain ID 0. It is even possible to use GTID for replication
between the servers. However, case must be taken when switching a slave to a
different master. If the binlog order between the old and the new master
differs, then a single GTID position to start replication from in the new
master's binlog may not be sufficient.


G. New syntax for global transaction ID

G.1. CHANGE MASTER

CHANGE MASTER has a new option, master_use_gtid=[0|1]. When enabled (set to
1), the slave will connect to the master using the GTID position. When
disabled, the old-style binlog filename/offset position is used to decide
where to start replicating when connecting.

The value of master_use_gtid is saved across server restarts (in
master.info). The current value can be seen as the field Using_Gtid in the
output of SHOW SLAVE STATUS.


G.2. BINLOG_GTID_POS().

The BINLOG_GTID_POS() function takes as input an old-style binlog position in
the form of a file name and a file offset. It looks up the position in the
current binlog, and returns a string representation of the corresponding GTID
position. If the position is not found in the current binlog, NULL is
returned.


H. New system variables for global transaction ID

H.1. gtid_pos

This variable is the current GTID position of a slave server.

It can be set by the user to change the current replication position. This
requires all slave threads to be stopped first. Note that the position is
shared among all slave connections when using multi-source replication. To set
position for two masters, one using replication domain 1 and another
replication domain 2, set a GTID for both domains, for example:

SET GLOBAL gtid_pos = "1-10-100,2-20-500";

The variable value is updated whenever an event group is replicated on a
slave, and whenever something is logged to the binlog on the master.

Note that the value of the variable is the result of whatever event happened
last, either slave replication or master binlogging, per replication
domain. It is an error to set it to something that conflicts with what is in
the binlog. This means that to completely reset a slave server (RESET SLAVE
and delete all tables), it is also necessary to RESET MASTER before
@@GLOBAL.gtid_pos can be cleared (if binlogging is enabled on the slave). This
is in any case necessary to avoid incorrect binlog on the slave.

Name: gtid_pos
Type: String
Scope: global
Privileged: yes
Dynamic: yes


H.2. gtid_domain_id

This variable is used to decide which replication domain new GTIDs are logged
in for a master server. See section "F. Using global transaction with
multi-source replication and other multi-master setups" for details.

This variable can also be set on the session level. This is used by
mysqlbinlog to preserve the domain ID of GTID events.

Name: gtid_domain_id
Type: 32-bit unsigned integer
Scope: global and session
Privileged: yes
Dynamic: yes


H.3. server_id

Server_id can be set on the session level to change which server_id value is
logged in binlog events (both GTID and other events). This is used by
mysqlbinlog to preserve the server ID of GTID events.

Name: server_id
Type: 32-bit unsigned integer
Scope: global and session
Privileged: yes
Dynamic: yes


H.3. gtid_seq_no

gtid_seq_no can be set on the session level to change which sequence number is
logged in the following GTID event. This is used by mysqlbinlog to preserve
the sequence number of GTID events.

Name: gtid_seq_no
Type: 64-bit unsigned integer
Scope: session only
Privileged: yes
Dynamic: yes