On Wed, Mar 31, 2021 at 9:45 PM Michael Widenius <michael.widenius@gmail.com> wrote:
Cut
Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
The proble was that hen using String::alloc() to allocate a string,
"hen" ? after a minute of staring I realized that you probably meant "when".
(also, "problem")
the String ensures that there is space for an extra NULL byte in the buffer and if not, reallocates the string. This is a problem with the String::set_int() that calls alloc(21), which forces an extra malloc/free to happen.
- We do not anymore re-allocate String if alloc() is called with the Allocated_length. This reduces number of malloc() allocations, especially one big re-allocation in Protocol::send_result_Set_metadata() for almost every query that produced a result to the connnected client. - Avoid extra mallocs when using LONGLONG_BUFFER_SIZE This can now be done as alloc() doesn't increase buffers if new length is not bigger than old one. - c_ptr() is redesigned to be safer (but a bit longer) than before. - Remove wrong usage of c_ptr_quick() c_ptr_quick() where used in many cases to get the pointer to the used
s/where/was/
buffer, even when it didn't need to be \0 terminated. In this case ptr() is a better substitute. Another problem with c_ptr_quick() is that it didn't guarantee that the string would be \0 terminated. - item_val_str(), an API function not used currently by the server, now always returns a null terminated string (before it didn't always do that). - Ensure that all String allocations uses STRING_PSI_MEMORY_KEY. The old mixed usage of performance keys caused assert's when String buffers where shrunk. - Binary_string::shrink() is simplifed
Will fix the typos. Thanks!
diff --git a/sql/item.cc b/sql/item.cc index f6f3e2720fa..a8a43c6266a 100644 --- a/sql/item.cc +++ b/sql/item.cc @@ -4703,7 +4703,7 @@ Item *Item_param::value_clone_item(THD *thd) return 0; // Should create Item_decimal. See MDEV-11361. case STRING_RESULT: return new (mem_root) Item_string(thd, name, - Lex_cstring(value.m_string.c_ptr_quick(), + Lex_cstring(value.m_string.ptr(),
Hmm, you said that LEX_CSTRING::str should always be \0-terminated. If that's the case, then c_ptr() would be correct here, not ptr()
value.m_string.length()), value.m_string.charset(), collation.derivation,
Yes, LEX_CSTRING *should* always be \0 terminated, but as we sometimes build these up I do not want to depend on that for generic usage. For usage to printf() than we should trust that the developer knows what he is doing.
However Item_string() will only use the exact length and never call printf, so here this is safe.
diff --git a/sql/partition_info.cc b/sql/partition_info.cc <cut>
- DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s", + DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s", part_def->part_id, part_def->is_subpart, - part_name)); + length, part_name));
These two changes contradict each other. Either part_name must be \0-terminated, and then you don't need to specify a length in printf. Or it is not \0-terminated and you must specify a length.
part_name is not null terminated and that is why I had to add .* and length. What is the contradiction ?
diff --git a/sql/sql_string.cc b/sql/sql_string.cc index f2a0f55aec8..95a57017c53 100644 --- a/sql/sql_string.cc +++ b/sql/sql_string.cc @@ -118,9 +121,14 @@ bool Binary_string::realloc_raw(size_t alloc_length) return FALSE; }
+ bool String::set_int(longlong num, bool unsigned_flag, CHARSET_INFO *cs) { - uint l=20*cs->mbmaxlen+1; + /* + This allocates a few bytes extra in the unlikely case that cs->mb_maxlen + > 1, but we can live with that
dunno, it seems that utf8 is the *likely* case and it's getting more and more likely with the time,
Not for integers, as we often use binary strings for these.
@@ -1254,21 +1262,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs) // Shrink the buffer, but only if it is allocated on the heap. void Binary_string::shrink(size_t arg_length) { - if (!is_alloced()) - return; - if (ALIGN_SIZE(arg_length + 1) < Alloced_length) + if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length) + { + char *new_ptr; + /* my_realloc() can't fail as new buffer is less than the original one */ + if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
you don't need an if() because, as you wrote yourself "my_realloc() can't fail" here.
Yes, this should be true, but one never knows with allocation libraries. There is implementation of realloc that makes a copy and deletes the old one. However I am not sure what happens if there is no place to make a copy. Better safe than sorry.
{
diff --git a/sql/sql_string.h b/sql/sql_string.h index b3eca118b63..ba0cff68fb4 100644 --- a/sql/sql_string.h +++ b/sql/sql_string.h @@ -600,25 +600,34 @@ class Binary_string: public Static_binary_string
inline char *c_ptr() { - DBUG_ASSERT(!alloced || !Ptr || !Alloced_length || - (Alloced_length >= (str_length + 1))); - - if (!Ptr || Ptr[str_length]) // Should be safe - (void) realloc(str_length); + if (unlikely(!Ptr)) + return (char*) ""; + /* + Here we assume that any buffer used to initalize String has + an end \0 or have at least an accessable character at end. + This is to handle the case of String("Hello",5) efficently. + */ + if (unlikely(!alloced && !Ptr[str_length])) + return Ptr;
No, this is not good. Note the difference between
String a("Hello", 5)
and
char hello[5]; String a(buf, 5);
Your assumption should only apply to the first case, not to the second. In the first case alloced=Alloced_length=0, in the second case only alloced=0 and Alloced_length=5. So in the if() above you need to look at Alloced_length:
if (!Alloced_length && !Ptr[str_length]) return Ptr;
Good point.> > + if (str_length < Alloced_length)
+ { + Ptr[str_length]=0; + return Ptr; + } + (void) realloc(str_length+1); /* This will add end \0 */ return Ptr; } + /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */ inline char *c_ptr_quick() { - if (Ptr && str_length < Alloced_length) - Ptr[str_length]=0; - return Ptr; + return c_ptr_safe();
1. why not to remove it now? 2. it's strange that you write "use c_ptr() instead" but you actually use c_ptr_safe() instead.
What I meant is that the developer should instead use c_ptr(). I will make the message more clear. I decided to call c_ptr_safe() here as this is the most safe version to use. In practice the new c_ptr() should be safe for 99% of our code.
} inline char *c_ptr_safe() { if (Ptr && str_length < Alloced_length) Ptr[str_length]=0; else - (void) realloc(str_length); + (void) realloc(str_length + 1); return Ptr; }
could you write a comment, explaining when one should use c_ptr() and when - c_ptr_safe() ?
Yes, try to. Will not be easy as most cases is now handled by c_ptr().
Regards, Monty