Re: [Maria-developers] prospective GSOC 2017 student [MDEV-7502]

19 Mar 2017

Hi, ibrar!

On Mar 19, ibrar arshad wrote:
...
Hi,
My name is Ibrar Arshad and I am interested in working on the task of
automatic slave provisioning (ticket: MDEV-7502
<https://jira.mariadb.org/browse/MDEV-7502>) during GSOC 2017. I have read
the summary on the ticket, have achieved a fair understanding of the
problem, and am working towards ironing out the implementation details.
The use case, as I understand it, is that we want the slave to
automatically replicate the data from the master once it is pointed to the master
Yes.
...
and we want to do it in such a manner that the binlog events from the
current master position and the old data chunks are relayed to
the slave in parallel.
Not necessarily. There could be other approaches too.

Maybe even bulk-loading the data would be faster than sending data in
chunks and applying events in parallel. Or maybe not.
...
I have a few questions related to the proposal:
1. After reading a few pages on replication, my understanding is
   that after "CHANGE MASTER TO" and "START SLAVE", the master starts
   sending binlog events from its current position to the slave, which
   starts applying them. The usual replication approach is to get the
   current binlog position on the master, back up all the data up to
   this position on the master and restore it on the slave, point the
   slave to this position (or GTID) via "CHANGE MASTER TO", and run
   "START SLAVE" to start replicating binlog events from the master.
   But for MDEV-7502, we want the normal events and old data chunks to
   be transmitted in parallel.
The main thing we want for MDEV-7502 is to avoid the step of "back up all
the data... restore on the slave".
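For comparison, this is roughly how the classic procedure from question 1 ends on the slave once the backup has been restored. It is a minimal sketch, assuming MariaDB GTID replication and a backup that is consistent with the position read on the master (for example via "SELECT @@GLOBAL.gtid_binlog_pos"). The function name, its parameters and the host/user values are hypothetical; the generated statements use standard MariaDB syntax. MDEV-7502 is about removing the backup/restore step that has to happen before these statements.

```python
from typing import List


def classic_provisioning_sql(master_host: str, master_port: int,
                             repl_user: str, repl_password: str,
                             gtid_pos: str) -> List[str]:
    """Hypothetical helper: slave-side statements for the classic procedure.

    Assumes the master's data was already backed up at GTID position
    `gtid_pos` and restored on this slave, and that replication on the
    slave is not running yet. Values are interpolated without escaping,
    so only use trusted input.
    """
    return [
        # Tell the slave which GTID position the restored data corresponds to.
        f"SET GLOBAL gtid_slave_pos = '{gtid_pos}'",
        # Point the slave at the master and make it resume from gtid_slave_pos.
        ("CHANGE MASTER TO "
         f"MASTER_HOST = '{master_host}', MASTER_PORT = {master_port}, "
         f"MASTER_USER = '{repl_user}', MASTER_PASSWORD = '{repl_password}', "
         "MASTER_USE_GTID = slave_pos"),
        # From here on, only binlog events after gtid_pos are replicated.
        "START SLAVE",
    ]


if __name__ == "__main__":
    for stmt in classic_provisioning_sql("master.example", 3306,
                                         "repl", "secret", "0-1-100"):
        print(stmt + ";")
```

The order matters: gtid_slave_pos must describe exactly the state of the restored data, otherwise the slave either skips or re-applies events.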
...
   The ticket summary mentions using separate domain_ids to send the
   new and old data in parallel. Is there currently a way to do so?
   How can domain_id be used here? Can we currently point the slave
   to two different binlog positions on a single master and expect
   the master to send events from both positions? Or will this
   require some sort of new process/thread implementation on the
   master?
No, this won't. I didn't actually try to connect twice from a slave to
the same master, but I suspect it'll either work or can be fixed to work
rather easily.
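To make the domain_id idea concrete: a MariaDB GTID is written as domain_id-server_id-seq_no (e.g. "0-1-100"), and a position such as @@gtid_slave_pos lists one GTID per replication domain, comma-separated. The sketch below parses such positions and splits a mixed event stream into per-domain streams, keeping the order inside each domain. The assignment of domain 0 to live changes and domain 1 to old data chunks is purely a hypothetical illustration, as are all names in the code; it does not describe how MDEV-7502 will be implemented.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    # A GTID is "domain_id-server_id-seq_no", e.g. "0-1-100".
    domain, server, seq = (int(part) for part in text.strip().split("-"))
    return Gtid(domain, server, seq)


def parse_gtid_pos(pos: str) -> Dict[int, Gtid]:
    # A position lists one GTID per domain, e.g. "0-1-100,1-1-50".
    result: Dict[int, Gtid] = {}
    for item in pos.split(","):
        if item.strip():
            gtid = parse_gtid(item)
            result[gtid.domain_id] = gtid
    return result


def split_by_domain(events: List[Tuple[str, str]]) -> Dict[int, List[str]]:
    # Group (gtid, payload) pairs into per-domain streams. Order is kept
    # within a domain; streams of different domains are independent of
    # each other, which is what would let a slave apply them in parallel.
    streams: Dict[int, List[str]] = defaultdict(list)
    for gtid_text, payload in events:
        streams[parse_gtid(gtid_text).domain_id].append(payload)
    return dict(streams)


if __name__ == "__main__":
    # Hypothetical: domain 0 = live changes, domain 1 = old data chunks.
    events = [("0-1-101", "update row 7"), ("1-1-51", "chunk 1"),
              ("0-1-102", "delete row 9"), ("1-1-52", "chunk 2")]
    print(parse_gtid_pos("0-1-100,1-1-50"))
    print(split_by_domain(events))
```

Because each domain keeps its own position, a slave connection (or two connections, as suggested above) could track "where the old data has got to" separately from "where the live changes have got to".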
...
2. There are at least two other approaches mentioned in the
   ticket's comments section. It doesn't seem that a single
   approach has been finalized. This project doesn't seem to have a
   mentor yet to provide guidance, so which approach should an
   applicant pursue?
Yes, the project suggests a few different approaches. You can discuss them
in your proposal and suggest the one you think is best.
There will be a mentor, don't worry. It just wasn't formally assigned
yet.
...
I would like to discuss the project approaches and implementation
in more detail before submitting a proposal, so could somebody please
answer my questions and suggest pointers to project-specific
material that I can go through to get a deeper understanding?
Thanks.
Hmm...

For example, I've mentioned above that it's not clear whether sending
all data first and bulk-loading it will be faster or slower than
interleaving data and RBR binlog events.

You can test it. Get a big table dump (not huge, but something that
takes a noticeable amount of time to load). Then get a bunch of
single-row updates/deletes.
Then try:
1) load the dump, then do the updates;
2) do the updates in parallel with loading the dump.
Just take care to enable at least the primary key, and make sure
that both approaches give you the same table content at the end.
That's a simple test, no coding involved, but it'll give some
idea of which approach is faster on the slave side.
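If you do want to script the comparison, the harness below shows its shape: approach 1 loads the whole dump and then replays the single-row updates; approach 2 loads the dump chunk by chunk and, after each chunk, replays the pending updates whose rows are already present; finally it checks that both tables end up identical. As an assumption made only to keep the example self-contained, it uses Python's built-in sqlite3 in-memory database as a stand-in for the slave, so its timings say nothing about MariaDB; for the real test the same steps would run against a MariaDB server. All function names are hypothetical, and the "parallel" approach here is sequential interleaving on one connection.

```python
import sqlite3
import time
from typing import List, Tuple

Row = Tuple[int, int]  # (id, val); id is the primary key


def new_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Primary key enabled, so each single-row update is a key lookup.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
    return conn


def apply_update(conn: sqlite3.Connection, upd: Row) -> None:
    conn.execute("UPDATE t SET val = ? WHERE id = ?", (upd[1], upd[0]))


def load_then_update(dump: List[Row], updates: List[Row]) -> sqlite3.Connection:
    # Approach 1: load the whole dump, then replay all updates in order.
    conn = new_table()
    conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", dump)
    for upd in updates:
        apply_update(conn, upd)
    conn.commit()
    return conn


def interleave(dump: List[Row], updates: List[Row], chunk_size: int) -> sqlite3.Connection:
    # Approach 2: load the dump (sorted by id) chunk by chunk; after each
    # chunk, replay pending updates whose rows are loaded, keeping their order.
    conn = new_table()
    pending = list(updates)
    for start in range(0, len(dump), chunk_size):
        chunk = dump[start:start + chunk_size]
        conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", chunk)
        loaded_up_to = chunk[-1][0]
        still_pending = []
        for upd in pending:
            if upd[0] <= loaded_up_to:
                apply_update(conn, upd)
            else:
                still_pending.append(upd)
        pending = still_pending
    for upd in pending:  # ids not in the dump: no-ops, same as in approach 1
        apply_update(conn, upd)
    conn.commit()
    return conn


def content(conn: sqlite3.Connection) -> List[Row]:
    return conn.execute("SELECT id, val FROM t ORDER BY id").fetchall()


def compare(dump: List[Row], updates: List[Row], chunk_size: int = 1000):
    # Returns (seconds for approach 1, seconds for approach 2, same content?).
    t0 = time.perf_counter()
    a = load_then_update(dump, updates)
    t1 = time.perf_counter()
    b = interleave(dump, updates, chunk_size)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, content(a) == content(b)


if __name__ == "__main__":
    dump = [(i, i) for i in range(100_000)]
    updates = [((i * 7919) % 100_000, -i) for i in range(20_000)]
    print(compare(dump, updates))
```

The content check at the end is the important part: an update is only applied once its row exists, otherwise approach 2 would silently lose it and the timings would not be comparable.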

Regards,
Sergei
Chief Architect MariaDB
and security@mariadb.org


Sergei Golubchik