Re: [Maria-developers] 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()

3 Jun 2021

Hi, Aleksey!

On Jan 26, Aleksey Midenkov wrote:
...
On Tue, Dec 22, 2020 at 7:58 PM Sergei Golubchik <serg@mariadb.org> wrote:
...
On Nov 03, Aleksey Midenkov wrote:
And please, pretty please, don't use non-constant references as
function arguments.
Why? With pointers we can pass NULL and we don't need NULL there, do we?
Because this is the policy. If you convince Monty that non-constant
references are fine - feel free to use them.
...
...
...
On Mon, Nov 2, 2020 at 11:33 PM Sergei Golubchik <serg@mariadb.org> wrote:
...
...

diff --git a/sql/field.cc b/sql/field.cc
index bdaaecc20269..0fd40c979d2c 100644
--- a/sql/field.cc
+++ b/sql/field.cc
@@ -8310,6 +8310,7 @@ int Field_blob::store(const char *from,uint length,CHARSET_INFO *cs)
   copy_length= copier.well_formed_copy(field_charset,
                                        (char*) value.ptr(), new_length,
                                        cs, from, length);
+  value.length(copy_length);
good! Could this be a distinct bug with its own test case?
Do we need to spend time on that bureaucracy? There are real problems to
solve... Tickets drain time from the management people and generally from
anyone who sees them.
I didn't mean opening an MDEV. Only writing a test case and extracting
it into a separate commit.
If this is a genuine bug fix and you don't create a test case - how can
you be sure the fix will stay? It may disappear in a merge or a rebase,
manually or automatically. Or in a refactoring that "looks fine, all
tests pass". If you can create a test case - please, do.
It is tested by federated.federated_partition test case.
Do you mean, that if I remove this line, federated.federated_partition
test will fail? If yes - good, this is the test I meant, no need to
create another one.
...
Why do we have to extract this side-fix to a separate commit?
Because later when someone looks at the code it helps to be able to
understand what a commit is actually fixing.
...
...
...
...
...
diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
--- a/sql/ha_partition.cc
+++ b/sql/ha_partition.cc
@@ -5125,14 +5128,14 @@ bool ha_partition::init_record_priority_queue()
   uint alloc_len;
   uint used_parts= bitmap_bits_set(&m_part_info->read_partitions);
   /* Allocate record buffer for each used partition. */
-  m_priority_queue_rec_len= m_rec_length + PARTITION_BYTES_IN_POS;
+  m_priority_queue_rec_len= m_rec_length + ORDERED_REC_OFFSET;
   if (!m_using_extended_keys)
       m_priority_queue_rec_len += m_file[0]->ref_length;
   alloc_len= used_parts * m_priority_queue_rec_len;
   /* Allocate a key for temporary use when setting up the scan. */
   alloc_len+= table_share->max_key_length;
-  if (!(m_ordered_rec_buffer= (uchar*)my_malloc(alloc_len, MYF(MY_WME))))
+  if (!(m_ordered_rec_buffer= (uchar*) alloc_root(&m_ordered_root, alloc_len)))
I don't see why you need a memroot here. memroot is needed when
you allocate later at difeerent points in time some initially
unpredictable amount of memory that has the same lifetime and
needs to be freed all at once.
One should keep in mind this is an artificial usage of the prio queue.
In diversified real-world usage the share of such kind SELECT is say
0.1%. But even if it is 1% we divide our stats by 100. So, do you
really care about 0.0015%?
A right tool for a job. If you need to allocate few buffers at the same
time - use multi_alloc, if you do many allocations over time and want to
fee them at once - use memroot.
...
...
...
...
...
@@ -6178,7 +6203,11 @@ int ha_partition::handle_ordered_index_scan(uchar *buf, bool reverse_order)
       */
       error= file->read_range_first(m_start_key.key? &m_start_key: NULL,
                                     end_range, eq_range, TRUE);
-      memcpy(rec_buf_ptr, table->record[0], m_rec_length);
+      if (!error)
+      {
+        memcpy(rec_buf_ptr, table->record[0], m_rec_length);
+      }
did you have a bug because of that? something didn't work?
No, I just saw worthless memcpy and avoided it.
You're right that memcopy is unnecessary in case of an error.
But most of the time there is no error. You made an exceptional code path
faster by making a common code path slower. I think it's generally
better to do the opposite, making the common, no-error code path
faster, fewer conditionals, in particular. At the cost of making the error
path slower, if needed.
I'm usually aware of such things in my decisions. The slowdown is
close to 0, it is not even 0.0001%, it is much lower. OTOH memcpy()
slowdown is much higher and this is not a fully exceptional code path:
HA_ERR_KEY_NOT_FOUND continues the loop.
This is a wrong code pattern, you should optimize for the normal code
path, not for errors. Errors are exceptions, they're much more rare than
non-errors.
...
...
...
...
...
@@ -6310,6 +6349,43 @@ int ha_partition::handle_ordered_index_scan_key_not_found()
+      if (cached)
+      {
+        cached->swap(s.blob);
+        s.set_read_value= set_read_value;
is it indeed possible here for a value to be either in `value` or in
`read_value` ?
When happens what?
Yes, it uses `value` for plain fields and `read_value` for virtual fields.
Oh, I see.
This read_value was introduced in ea1b25046c81db8fdf to solve the
case of UPDATE, where a row is moved with store_record(table,
record[1]) and blob pointers aren't updated. This is very similar to
what you're fixing, perhaps read_value should go away and both bugs
should have a one unified solution that allows to "park" a record
somewhere out of record[0] and restore later, all with proper blob
handling?
Do you think this is necessary now? I believe there are a lot of
*much* more important tasks.
Uhm, yes? This would be a notable code simplification, and it won't take
long, a low-hanging fruit.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org