Re: [Maria-developers] 5365db70cb8: Improve update handler (long unique keys on blobs)

16 Mar 2020

Hi!

<cut>
...
I mean, you have tests, both with PERIOD, but one in the main suite, the
other in the versioning suite. While they both should be in the period
suite. Here, your test from the main/long_unique.test:
...
+CREATE OR REPLACE TABLE t1 (a INT, b BLOB, s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(b)) ENGINE=MyISAM PARTITION BY HASH(a) PARTITIONS 2;
+INSERT INTO t1 VALUES (1,'foo','2022-01-01', '2025-01-01');
+--error ER_DUP_ENTRY
+DELETE FROM t1 FOR PORTION OF app FROM '2023-01-01' TO '2024-01-01';
+DROP TABLE t1;
Your test from the suite/versioning/t/long_unique.test:
+CREATE TABLE t1 (f VARCHAR(4096), s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(f)) ENGINE=MyISAM;
+INSERT INTO t1 VALUES ('foo', '2023-08-30', '2025-07-09'),('bar', '2021-01-01', '2021-12-31');
+DELETE FROM t1 FOR PORTION OF app FROM '2023-08-29' TO '2025-07-01';
+DROP TABLE t1;
The above are actual different, indendent tests not related to each other.
But I agree that they are both related to period.

I will move the PERIOD test from
main/long_unique.test
and
suite/versioning/t/long_unique.test
to
suite/period/t/long_unique.test

Sorry, I got confused about the suggestion with moving whole test files.
...
...
...
...

diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
index 73191907aac..c7e61864366 100644
--- a/sql/ha_partition.cc
+++ b/sql/ha_partition.cc
@@ -4340,6 +4339,10 @@ int ha_partition::write_row(const uchar * buf)
   }
   m_last_part= part_id;
   DBUG_PRINT("info", ("Insert in partition %u", part_id));
+  if (table->s->long_unique_table &&
+      m_file[part_id]->update_handler == m_file[part_id] && inited == RND)
+    m_file[part_id]->prepare_for_insert(0);
This looks quite incomprehensible. If the table has a long unique key
and the update_handler is the handler itself you _prepare for insert_ ???
This is basically the original code we had in handler::write_row.
We have to have this in this place as we don't want to call
handler->prepare_for_insert() for all partitions, only those
that where really used.
I mean not the condition, but the "prepare_for_insert" method name that
you've introduced, it didn't exist in the original code.
If the name would've been, say, "prepare_update_handler" or
"prepare_long_unique_handler" then the code would've looked fine and I
wouldn't have asked.
...
...
if it's an insert it should've been prepared earlier, like, normally for
an insert, and it don't have to do anything with long uniques.
See above. The above is to avoid creating new handlers when we don't need
them!  If you have 100 partitions, you really don't want to create 100 cloned
table handlers.
Of course. I just mean the method name. If it is "preparing for insert"
as the name suggests, one could've expected it to be done earlier. I'm
saying, the name should not suggest it's a preparation for an insert,
call it, for example, "prepare_update_handler"
See my comments. It's not only intended for update handler, so I prefer
to keep the current name as that will be more future proof.

Also, it should only be called when one does insert's, not because
there exists an update handler.  It's perfectly safe to call before
any insert usage. The current code just optimized when it's called.
...
...
...
What you actually do here (as I've found out looking firther in the patch)
you prepare update_handler. That is "prepare_for_insert" is VERY confusing
here, it sounds like it prepares for a normal insert and should be
called at the beginning of mysql_insert(). And it's nothing like that.
We are calling it as start of mysql_insert(), however we have an if around
it as a small optimization.  We can remove the if you want, things will
work as before if we d.
...
...
...
It would remove half of the questions if you'd call it
prepare_update_handler() or, better, prepare_auxiliary_handler() (because
it'll be used not only for long uniques) or something like that.
As one ONLY have to call it for insert and it's safe to call if one has
update handler or not, I don't think any of the above names are any better.
...
...
The idea with prepare_for_insert is that we can add special handler code
that we want to do once in case of insert. We didn't have that before and
I can see many use of that.
In any case, I will add a comment to the above partition code to clarify
what we are doing:
"We have to call prepare_for_insert() if we have an update handler
in the underlying table (to clone the handler). This is because for
INSERT's prepare_for_insert() is only called for the main table,
not for all partitions. This is to reduce the huge overhead of cloning
a possible not needed handler if there are many partitions."
I don't think this is needed. The confusing part was not the condition,
but the method name. If you rename the method, everything else will be
fine.
I think it will be worse with a rename. I had originally plans to add more
things to prepare_for_insert(), but this is for a future commit.
...
...
...
Still, some questions are left:
1. why partitioning needs special code for that?
The comment explains it all
Not quite. The long unique is only done for the main table, not for
partitions. Individual partitions should never have update_handler
initialized in the first place.
It must have as we call ha_write_row() with the partition handler and
this must do with the unique handler if the row exists or not.
...
...
...
2. when a handler itself is its update_handler AND inited==RND?
   the handler itself can be the update_handler on INSERT, but then inited is NONE
   Is it UPDATE with system versioning?
No, this is needed for normal updates, just like in the old code.
For normal updates this->update_handler != this.
For inserts inited!=RND.
I'm asking when update_handler==this && inited==RND.
This is the code from the original code before any of my changes.
I assumed you had already asked that as part of the orignal review.

In case of insert with DUP_ERROR we call ha_rnd_init_with_error().
I thought it was possible also with update, but that could be mainly
when versioning is involved.
...
...
...
...
start_part_bulk_insert(thd, part_id);
tmp_disable_binlog(thd); /* Do not replicate the low-level changes. */
diff --git a/sql/handler.cc b/sql/handler.cc
index 0700b1571e5..1e3f987b4e5 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -2648,6 +2649,39 @@ handler *handler::clone(const char *name, MEM_ROOT *mem_root)
   return NULL;
 }
+/**
+  @brief clone of current handler.
you generally don't have to write @brief in these cases. Just
I usually don't write @brief.  Strange, I must have copied the code from
somewhere.
Ok, the @brief came from the original code that I copied from table.cc
and I didn't remove it.
I have now removed it. Someone else should fix all the other places
where @brief is used wrongly!
It's not really wrong, just redundant. So, I'd say, does not justify
global removal from everywhere, but better not to do it in the new code.
Not a big deal if you use it, though.
I don't like it. I only got @brief becaues I copied two functions with
function comments between files.  When doing that, I prefer to do as
little changes as possible to make it easier for people to find
the new functions.
...
...
<cut>
...
...
+bool handler::clone_handler_for_update()
+{
+  handler *tmp;
+  DBUG_ASSERT(table->s->long_unique_table);
+
+  if (update_handler != this)
+    return 0;                                   // Already done
+  if (!(tmp= clone(table->s->normalized_path.str, table->in_use->mem_root)))
+    return 1;
+  update_handler= tmp;
+  update_handler->ha_external_lock(table->in_use, F_RDLCK);
Looks strange. Why F_RDLCK if it's an *update* handler?
This definitely needs a comment, why you're using read lock for updates.
Because the update handler is ONLY used to check if there is an existing
row. We never use it for doing changes.
Note that this is old code. Didn't you catch this in your original review
before it was pushed ?
Okay. It doesn't matter because it's renamed in the Nikita's patch
anyway. (but only a few days ago, after I wrote that review).
I hope that Nikita will base his new work on my code. It would be better that
he wait until I have pushed before doing new work.
...
...
...
...
@@ -6481,6 +6515,8 @@ int handler::ha_reset()
   DBUG_ASSERT(inited == NONE);
   /* reset the bitmaps to point to defaults */
   table->default_column_bitmaps();
+  if (update_handler != this)
+    delete_update_handler();
you already have an if() inside the delete_update_handler().
Either remove it from here or from there.
We don't check the handler in all calls.
I just added the check to ha_reset() as this is a very common call
and wanted to avoid an extra call.
Keeping current code.
it's not virtual. And in the same file. And small. The compiler will
inline it, you don't need to bother.
The compiler will also remove it. I think it's better to make things
explicite for anything that is in commonly used functions.
...
...
...
...
+++ b/sql/sql_delete.cc
@@ -752,6 +752,10 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds,
           && !table->versioned()
           && table->file->has_transactions();
+  if (table->versioned(VERS_TIMESTAMP) ||
+      (table_list->has_period() && !portion_of_time_through_update))
+    table->file->prepare_for_insert(1);
Ok, now I think that your idea of the default update_handler=this
is rather fragile. If the default is NULL, than we can put in many places
DBUG_ASSERT(update_handler != NULL);
to make sure it was initialized when needed. Now we cannot do it.
The current is faster as we can always rely on the value.
The code will work in most cases even if prepare_for_insert()
is not set.
For an assert, on can always do (trough an inline handler function)
DBUG_ASSERT(update_handler != table->file)
Exactly same thing.
Not at all, usually update_handler==table->file for inserts. It does not
mean update_handler is not initialized. NULL was not initialized before
your patch, and now one cannot detect whether update_handler was
initialized or not.
Initialized means that it changes value. In old code it was only
updated if there was an update handler, so asserts would not work 'as
such'. Also it was only set after the first insert, so it was hard
to have any asserts over the place.

It still really trivial to add an assert that checks if we should need
a clone and if the update handler was set.  It's actually easier to do
now as the update handler will be set BEFORE the ha_write_row call.
...
...
...
And there are many places in the code and non-trivial conditions when
the update_handler has to be initialized and these places and conditions
will change in the future. An assert would be helpful here.
Nothing stops us from having an assert with current code.
The only different from my code is that we don't get a crash
if we forget an assert and wrongly call update_handler.
see above. An assert is now impossible. If we wrongly call
update_handler it might work, but more often than not it'll produce
incorrect table content. So, no crash, but a hard to detect incorrect
data corruption.
If ha_reset() is called, then things will work.  If it's not called,
most things will crash.  This means that it's not likely that this
will go wrong without having it detected.
...
...
...
...
+
   THD_STAGE_INFO(thd, stage_updating);
   while (likely(!(error=info.read_record())) && likely(!thd->killed) &&
          likely(!thd->is_error()))
--- a/storage/myisam/ha_myisam.cc
+++ b/storage/myisam/ha_myisam.cc
@@ -1740,7 +1740,7 @@ void ha_myisam::start_bulk_insert(ha_rows rows, uint flags)
     enable_indexes() failed, thus it's essential that indexes are
     disabled ONLY for an empty table.
   */
-  if (file->state->records == 0 && can_enable_indexes &&
+  if (file->state->records == 0 && can_enable_indexes && !has_long_unique() &&
       (!rows || rows >= MI_MIN_ROWS_TO_DISABLE_INDEXES))
why do you disable bulk inserts for long uniques?
Because they are incompatible.
Didn't you read my excellent commit message?
" Disable write cache in MyISAM and Aria when using update_handler as
if cache is used, the row will not be inserted until end of statement
and update_handler would not find conflicting rows"
Does that explain things?
No, sorry, it doesn't. I did read your comment before writing the question.
Here you didn't disable _write cache_. You disabled switching off
indexes. Your comment doesn't apply to that, because no matter whether
indexes are disabled or not, rows are still inserted at once.
Sorry, you right the comment is wrong. I orignally thought we have to
switch of the row cache, but we didn't need to as we do flush the cache
for any key read.

I update the comment to the following:

- Disable write cache in MyISAM and Aria when using update_handler as
  if cache is used, the long unique index will not be updated on insert
  because the index is not marked with HA_NOSAME

A better version for future updates to long unique would be to mark
the long unique index with a special flag (HA_EXTERNAL_NOSAME) so
that the engine knows that this can't be disabled.

Regards,
Monty

    

Re: [Maria-developers] 5365db70cb8: Improve update handler (long unique keys on blobs)

Michael Widenius