Hi Jan,

Thanks for the review.

On Tue, May 3, 2016 at 3:06 AM, Jan Lindström <jan.lindstrom@mariadb.com> wrote:
Hi Nirbhay,

I'm not expert on this code area

I am also hoping to get some remarks from Seppo/Teemu on this.
 
but some questions, comments below. 

Addressed : http://lists.askmonty.org/pipermail/commits/2016-May/009343.html
See my comments inline.

 

On Tue, May 3, 2016 at 1:26 AM, Nirbhay Choubey <nirbhay@mariadb.com> wrote:
revision-id: a6fb98fd74ab6f14fab0c1ef4630b010ff28880f (mariadb-10.1.13-6-ga6fb98f)
parent(s): f8cdb9e7e8fb692f8eb3b9bb4792cd60653e0eb8
author: Nirbhay Choubey
committer: Nirbhay Choubey
timestamp: 2016-05-02 18:26:58 -0400
message:

MDEV-6368: assertion xid_seqno > trx_sys_cur_xid_seqno

- Validate the specified wsrep_start_position value by also
checking the return status of wsrep->sst_received. This also
ensures that changes in wsrep_start_position is not allowed
when the node is not in JOINING state.
- Do not allow decrease in seqno within same UUID.
- The initial checkpoint in SEs should be [0...:-1].


diff --git a/sql/wsrep_sst.cc b/sql/wsrep_sst.cc
index 4f6cc51..44fdd03 100644
--- a/sql/wsrep_sst.cc
+++ b/sql/wsrep_sst.cc
@@ -264,42 +264,91 @@ void wsrep_sst_complete (const wsrep_uuid_t* sst_uuid,
   mysql_mutex_unlock (&LOCK_wsrep_sst);
 }

-void wsrep_sst_received (wsrep_t*            const wsrep,
+/*
+  If wsrep provider is loaded, inform that the new state snapshot
+  has been received. Also update the local checkpoint.
+
+  @param wsrep     [IN]               wsrep handle
+  @param uuid      [IN]               Initial state UUID
+  @param seqno     [IN]               Initial state sequence number
+  @param state     [IN]               Always NULL, also ignored by wsrep provider (?)
+  @param state_len [IN]               Always 0, also ignored by wsrep provider (?)
+  @param implicit  [IN]               Whether invoked implicitly due to SST
+                                      (true) or explicitly because if change
+                                      in wsrep_start_position by user (false).
+  @return false                       Success
+          true                        Error
+
+*/
+bool wsrep_sst_received (wsrep_t*            const wsrep,
                          const wsrep_uuid_t&       uuid,
                          wsrep_seqno_t       const seqno,
                          const void*         const state,
-                         size_t              const state_len)

Why not const size_t state_len ? Use of const here is confusing.

Updated.
 
 
+                         size_t              const state_len,
+                         bool                const implicit)
 {
-    wsrep_get_SE_checkpoint(local_uuid, local_seqno);
+  /*
+    To keep track of whether the local uuid:seqno should be updated. Also, note
+    that local state (uuid:seqno) is updated/checkpointed only after we get an
+    OK from wsrep provider. By doing so, the values remain consistent across
+    the server & wsrep provider.
+  */
+  bool do_update= false;
+
+  // Get the locally stored uuid:seqno.
+  wsrep_get_SE_checkpoint(local_uuid, local_seqno);

What if this fails, no error handling ?

Done.
 
 

-    if (memcmp(&local_uuid, &uuid, sizeof(wsrep_uuid_t)) ||
-        local_seqno < seqno || seqno < 0)
+  if (memcmp(&local_uuid, &uuid, sizeof(wsrep_uuid_t)) ||
+      local_seqno < seqno)
+  {
+    do_update= true;
+  }
+  else if (local_seqno > seqno)
+  {
+    WSREP_WARN("SST position can't be set in past. Requested: %lld, Current: "
+               " %lld.", (long long)seqno, (long long)local_seqno);
+    /*
+      If we are here because of SET command, simply return true (error) instead of
+      aborting.
+    */
+    if (implicit)
     {
-        wsrep_set_SE_checkpoint(uuid, seqno);
-        local_uuid = uuid;
-        local_seqno = seqno;
+      WSREP_WARN("Can't continue.");
+      unireg_abort(1);
     }
-    else if (local_seqno > seqno)
+    else
     {
-        WSREP_WARN("SST postion is in the past: %lld, current: %lld. "
-                   "Can't continue.",
-                   (long long)seqno, (long long)local_seqno);
-        unireg_abort(1);
+      return true;
     }
+  }

 #ifdef GTID_SUPPORT
-    wsrep_init_sidno(uuid);
+  wsrep_init_sidno(uuid);
 #endif /* GTID_SUPPORT */

-    if (wsrep)
-    {
-        int const rcode(seqno < 0 ? seqno : 0);
-        wsrep_gtid_t const state_id = {
-            uuid, (rcode ? WSREP_SEQNO_UNDEFINED : seqno)
-        };
+  if (wsrep)
+  {
+    int const rcode(seqno < 0 ? seqno : 0);

When seqno can be < 0 ?

While I cannot think of a use case for anything below -1, but
-1 is the seqno assigned during initialization.

 
+    wsrep_gtid_t const state_id= {uuid,
+      (rcode ? WSREP_SEQNO_UNDEFINED : seqno)};
+
+    wsrep_status_t ret= wsrep->sst_received(wsrep, &state_id, state,
+                                            state_len, rcode);

-        wsrep->sst_received(wsrep, &state_id, state, state_len, rcode);
+    if (ret != WSREP_OK)
+    {
+      return true;
     }
+  }
+
+  // Now is the good time to update the local state and checkpoint.
+  if (do_update) {
+    wsrep_set_SE_checkpoint(uuid, seqno);

Again, I would add error handling.

Done.
 
 
+    local_uuid= uuid;
+    local_seqno= seqno;
+  }
+
+  return false;
 }

 // Let applier threads to continue
@@ -308,7 +357,7 @@ void wsrep_sst_continue ()
   if (sst_needed)
   {
     WSREP_INFO("Signalling provider to continue.");
-    wsrep_sst_received (wsrep, local_uuid, local_seqno, NULL, 0);
+    wsrep_sst_received (wsrep, local_uuid, local_seqno, NULL, 0, true);

You should check the return code.

Done.
 
 
   }
 }

diff --git a/sql/wsrep_var.cc b/sql/wsrep_var.cc
index 9c01e54..bd94190 100644


 bool wsrep_start_position_update (sys_var *self, THD* thd, enum_var_type type)
 {
-  WSREP_INFO ("wsrep_start_position var submitted: '%s'",
-              wsrep_start_position);
-  // since this value passed wsrep_start_position_check, don't check anything
-  // here
-  wsrep_set_local_position (wsrep_start_position, true);
-  return 0;
+  // Print a confirmation that wsrep_start_position has been updated.
+  WSREP_INFO ("wsrep_start_position set to '%s'", wsrep_start_position);
+  return false;

Hmm, normally I thought that system variable set would go like if (variable_check()) variable_update()
but now update does not do nothing ?

variable_update() (or variable's on_update()), is actually called after the actual
update of the system variable. So the sequence is roughly like:

on_check() -> global_update()/session_update() -> on_update().

See sql_set_variables() and sys_var::update().

So, since wsrep_set_local_position() can fail, I have moved it under on_check, such that
the actual variable is updated only after setting of local position is successful. And thus,
on_update essentially becomes a no-op.

 
 } 

 void wsrep_start_position_init (const char* val)
@@ -164,7 +194,7 @@ void wsrep_start_position_init (const char* val)
     return;
   }

-  wsrep_set_local_position (val, false);
+  wsrep_set_local_position (val, strlen(val), false);

Add error handling.

Done.
 
 
 }

 static bool refresh_provider_options()
diff --git a/storage/innobase/trx/trx0sys.cc b/storage/innobase/trx/trx0sys.cc
index 76c8613..1625e8d 100644
--- a/storage/innobase/trx/trx0sys.cc
+++ b/storage/innobase/trx/trx0sys.cc
@@ -349,10 +349,10 @@ void read_wsrep_xid_uuid(const XID* xid, unsigned char* buf)
             unsigned char xid_uuid[16];
             long long xid_seqno = read_wsrep_xid_seqno(xid);
             read_wsrep_xid_uuid(xid, xid_uuid);
-            if (!memcmp(xid_uuid, trx_sys_cur_xid_uuid, 8))
+            if (!memcmp(xid_uuid, trx_sys_cur_xid_uuid, 16) &&
+                (xid_seqno > -1))

Does this mean that you ignore uuid if it is exactly the same, or is there error handling on else?

If UUID remains same, we only have to update the seqno.

Best,
Nirbhay

 
R: Jan