Re: [Maria-developers] f46382ba93c: Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
Hi, Michael! On Mar 27, Michael Widenius wrote:
revision-id: f46382ba93c (mariadb-10.5.2-515-gf46382ba93c) parent(s): c157c285db9 author: Michael Widenius <michael.widenius@gmail.com> committer: Michael Widenius <michael.widenius@gmail.com> timestamp: 2021-03-24 14:31:53 +0200 message:
Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
The proble was that hen using String::alloc() to allocate a string,
"hen" ? after a minute of staring I realized that you probably meant "when". (also, "problem")
the String ensures that there is space for an extra NULL byte in the buffer and if not, reallocates the string. This is a problem with the String::set_int() that calls alloc(21), which forces an extra malloc/free to happen.
- We do not anymore re-allocate String if alloc() is called with the Allocated_length. This reduces number of malloc() allocations, especially one big re-allocation in Protocol::send_result_Set_metadata() for almost every query that produced a result to the connnected client. - Avoid extra mallocs when using LONGLONG_BUFFER_SIZE This can now be done as alloc() doesn't increase buffers if new length is not bigger than old one. - c_ptr() is redesigned to be safer (but a bit longer) than before. - Remove wrong usage of c_ptr_quick() c_ptr_quick() where used in many cases to get the pointer to the used
s/where/was/
buffer, even when it didn't need to be \0 terminated. In this case ptr() is a better substitute. Another problem with c_ptr_quick() is that it didn't guarantee that the string would be \0 terminated. - item_val_str(), an API function not used currently by the server, now always returns a null terminated string (before it didn't always do that). - Ensure that all String allocations uses STRING_PSI_MEMORY_KEY. The old mixed usage of performance keys caused assert's when String buffers where shrunk. - Binary_string::shrink() is simplifed
diff --git a/sql/item.cc b/sql/item.cc index f6f3e2720fa..a8a43c6266a 100644 --- a/sql/item.cc +++ b/sql/item.cc @@ -4703,7 +4703,7 @@ Item *Item_param::value_clone_item(THD *thd) return 0; // Should create Item_decimal. See MDEV-11361. case STRING_RESULT: return new (mem_root) Item_string(thd, name, - Lex_cstring(value.m_string.c_ptr_quick(), + Lex_cstring(value.m_string.ptr(),
Hmm, you said that LEX_CSTRING::str should always be \0-terminated. If that's the case, then c_ptr() would be correct here, not ptr()
value.m_string.length()), value.m_string.charset(), collation.derivation, diff --git a/sql/partition_info.cc b/sql/partition_info.cc index 871411cf6c4..b3e1bceec31 100644 --- a/sql/partition_info.cc +++ b/sql/partition_info.cc @@ -126,7 +126,7 @@ partition_info *partition_info::get_clone(THD *thd) /** Mark named [sub]partition to be used/locked.
- @param part_name Partition name to match. + @param part_name Partition name to match. Must be \0 terminated! @param length Partition name length.
@return Success if partition found @@ -172,9 +172,9 @@ bool partition_info::add_named_partition(const char *part_name, size_t length) else bitmap_set_bit(&read_partitions, part_def->part_id); } - DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s", + DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s", part_def->part_id, part_def->is_subpart, - part_name)); + length, part_name));
These two changes contradict each other. Either part_name must be \0-terminated, and then you don't need to specify a length in printf. Or it is not \0-terminated and you must specify a length.
DBUG_RETURN(false); }
diff --git a/sql/sql_string.cc b/sql/sql_string.cc index f2a0f55aec8..95a57017c53 100644 --- a/sql/sql_string.cc +++ b/sql/sql_string.cc @@ -118,9 +121,14 @@ bool Binary_string::realloc_raw(size_t alloc_length) return FALSE; }
+ bool String::set_int(longlong num, bool unsigned_flag, CHARSET_INFO *cs) { - uint l=20*cs->mbmaxlen+1; + /* + This allocates a few bytes extra in the unlikely case that cs->mb_maxlen + > 1, but we can live with that
dunno, it seems that utf8 is the *likely* case and it's getting more and more likely with the time,
+ */ + uint l= LONGLONG_BUFFER_SIZE * cs->mbmaxlen; int base= unsigned_flag ? 10 : -10;
if (alloc(l)) @@ -1254,21 +1262,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs) // Shrink the buffer, but only if it is allocated on the heap. void Binary_string::shrink(size_t arg_length) { - if (!is_alloced()) - return; - if (ALIGN_SIZE(arg_length + 1) < Alloced_length) + if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length) + { + char *new_ptr; + /* my_realloc() can't fail as new buffer is less than the original one */ + if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
you don't need an if() because, as you wrote yourself "my_realloc() can't fail" here.
+ MYF(thread_specific ? + MY_THREAD_SPECIFIC : 0)))) { diff --git a/sql/sql_string.h b/sql/sql_string.h index b3eca118b63..ba0cff68fb4 100644 --- a/sql/sql_string.h +++ b/sql/sql_string.h @@ -600,25 +600,34 @@ class Binary_string: public Static_binary_string
inline char *c_ptr() { - DBUG_ASSERT(!alloced || !Ptr || !Alloced_length || - (Alloced_length >= (str_length + 1))); - - if (!Ptr || Ptr[str_length]) // Should be safe - (void) realloc(str_length); + if (unlikely(!Ptr)) + return (char*) ""; + /* + Here we assume that any buffer used to initalize String has + an end \0 or have at least an accessable character at end. + This is to handle the case of String("Hello",5) efficently. + */ + if (unlikely(!alloced && !Ptr[str_length])) + return Ptr;
No, this is not good. Note the difference between String a("Hello", 5) and char hello[5]; String a(buf, 5); Your assumption should only apply to the first case, not to the second. In the first case alloced=Alloced_length=0, in the second case only alloced=0 and Alloced_length=5. So in the if() above you need to look at Alloced_length: if (!Alloced_length && !Ptr[str_length]) return Ptr;
+ if (str_length < Alloced_length) + { + Ptr[str_length]=0; + return Ptr; + } + (void) realloc(str_length+1); /* This will add end \0 */ return Ptr; } + /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */ inline char *c_ptr_quick() { - if (Ptr && str_length < Alloced_length) - Ptr[str_length]=0; - return Ptr; + return c_ptr_safe();
1. why not to remove it now? 2. it's strange that you write "use c_ptr() instead" but you actually use c_ptr_safe() instead.
} inline char *c_ptr_safe() { if (Ptr && str_length < Alloced_length) Ptr[str_length]=0; else - (void) realloc(str_length); + (void) realloc(str_length + 1); return Ptr; }
could you write a comment, explaining when one should use c_ptr() and when - c_ptr_safe() ? Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Cut
Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
The proble was that hen using String::alloc() to allocate a string,
"hen" ? after a minute of staring I realized that you probably meant "when".
(also, "problem")
the String ensures that there is space for an extra NULL byte in the buffer and if not, reallocates the string. This is a problem with the String::set_int() that calls alloc(21), which forces an extra malloc/free to happen.
- We do not anymore re-allocate String if alloc() is called with the Allocated_length. This reduces number of malloc() allocations, especially one big re-allocation in Protocol::send_result_Set_metadata() for almost every query that produced a result to the connnected client. - Avoid extra mallocs when using LONGLONG_BUFFER_SIZE This can now be done as alloc() doesn't increase buffers if new length is not bigger than old one. - c_ptr() is redesigned to be safer (but a bit longer) than before. - Remove wrong usage of c_ptr_quick() c_ptr_quick() where used in many cases to get the pointer to the used
s/where/was/
buffer, even when it didn't need to be \0 terminated. In this case ptr() is a better substitute. Another problem with c_ptr_quick() is that it didn't guarantee that the string would be \0 terminated. - item_val_str(), an API function not used currently by the server, now always returns a null terminated string (before it didn't always do that). - Ensure that all String allocations uses STRING_PSI_MEMORY_KEY. The old mixed usage of performance keys caused assert's when String buffers where shrunk. - Binary_string::shrink() is simplifed
Will fix the typos. Thanks!
diff --git a/sql/item.cc b/sql/item.cc index f6f3e2720fa..a8a43c6266a 100644 --- a/sql/item.cc +++ b/sql/item.cc @@ -4703,7 +4703,7 @@ Item *Item_param::value_clone_item(THD *thd) return 0; // Should create Item_decimal. See MDEV-11361. case STRING_RESULT: return new (mem_root) Item_string(thd, name, - Lex_cstring(value.m_string.c_ptr_quick(), + Lex_cstring(value.m_string.ptr(),
Hmm, you said that LEX_CSTRING::str should always be \0-terminated. If that's the case, then c_ptr() would be correct here, not ptr()
value.m_string.length()), value.m_string.charset(), collation.derivation,
Yes, LEX_CSTRING *should* always be \0 terminated, but as we sometimes build these up I do not want to depend on that for generic usage. For usage to printf() than we should trust that the developer knows what he is doing. However Item_string() will only use the exact length and never call printf, so here this is safe.
diff --git a/sql/partition_info.cc b/sql/partition_info.cc <cut>
- DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s", + DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s", part_def->part_id, part_def->is_subpart, - part_name)); + length, part_name));
These two changes contradict each other. Either part_name must be \0-terminated, and then you don't need to specify a length in printf. Or it is not \0-terminated and you must specify a length.
part_name is not null terminated and that is why I had to add .* and length. What is the contradiction ?
diff --git a/sql/sql_string.cc b/sql/sql_string.cc index f2a0f55aec8..95a57017c53 100644 --- a/sql/sql_string.cc +++ b/sql/sql_string.cc @@ -118,9 +121,14 @@ bool Binary_string::realloc_raw(size_t alloc_length) return FALSE; }
+ bool String::set_int(longlong num, bool unsigned_flag, CHARSET_INFO *cs) { - uint l=20*cs->mbmaxlen+1; + /* + This allocates a few bytes extra in the unlikely case that cs->mb_maxlen + > 1, but we can live with that
dunno, it seems that utf8 is the *likely* case and it's getting more and more likely with the time,
Not for integers, as we often use binary strings for these.
@@ -1254,21 +1262,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs) // Shrink the buffer, but only if it is allocated on the heap. void Binary_string::shrink(size_t arg_length) { - if (!is_alloced()) - return; - if (ALIGN_SIZE(arg_length + 1) < Alloced_length) + if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length) + { + char *new_ptr; + /* my_realloc() can't fail as new buffer is less than the original one */ + if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
you don't need an if() because, as you wrote yourself "my_realloc() can't fail" here.
Yes, this should be true, but one never knows with allocation libraries. There is implementation of realloc that makes a copy and deletes the old one. However I am not sure what happens if there is no place to make a copy. Better safe than sorry.
{
diff --git a/sql/sql_string.h b/sql/sql_string.h index b3eca118b63..ba0cff68fb4 100644 --- a/sql/sql_string.h +++ b/sql/sql_string.h @@ -600,25 +600,34 @@ class Binary_string: public Static_binary_string
inline char *c_ptr() { - DBUG_ASSERT(!alloced || !Ptr || !Alloced_length || - (Alloced_length >= (str_length + 1))); - - if (!Ptr || Ptr[str_length]) // Should be safe - (void) realloc(str_length); + if (unlikely(!Ptr)) + return (char*) ""; + /* + Here we assume that any buffer used to initalize String has + an end \0 or have at least an accessable character at end. + This is to handle the case of String("Hello",5) efficently. + */ + if (unlikely(!alloced && !Ptr[str_length])) + return Ptr;
No, this is not good. Note the difference between
String a("Hello", 5)
and
char hello[5]; String a(buf, 5);
Your assumption should only apply to the first case, not to the second. In the first case alloced=Alloced_length=0, in the second case only alloced=0 and Alloced_length=5. So in the if() above you need to look at Alloced_length:
if (!Alloced_length && !Ptr[str_length]) return Ptr;
Good point.> > + if (str_length < Alloced_length)
+ { + Ptr[str_length]=0; + return Ptr; + } + (void) realloc(str_length+1); /* This will add end \0 */ return Ptr; } + /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */ inline char *c_ptr_quick() { - if (Ptr && str_length < Alloced_length) - Ptr[str_length]=0; - return Ptr; + return c_ptr_safe();
1. why not to remove it now? 2. it's strange that you write "use c_ptr() instead" but you actually use c_ptr_safe() instead.
What I meant is that the developer should instead use c_ptr(). I will make the message more clear. I decided to call c_ptr_safe() here as this is the most safe version to use. In practice the new c_ptr() should be safe for 99% of our code.
} inline char *c_ptr_safe() { if (Ptr && str_length < Alloced_length) Ptr[str_length]=0; else - (void) realloc(str_length); + (void) realloc(str_length + 1); return Ptr; }
could you write a comment, explaining when one should use c_ptr() and when - c_ptr_safe() ?
Yes, try to. Will not be easy as most cases is now handled by c_ptr(). Regards, Monty
On Wed, Mar 31, 2021 at 9:45 PM Michael Widenius <michael.widenius@gmail.com> wrote:
Cut
Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
The proble was that hen using String::alloc() to allocate a string,
"hen" ? after a minute of staring I realized that you probably meant "when".
(also, "problem")
the String ensures that there is space for an extra NULL byte in the buffer and if not, reallocates the string. This is a problem with the String::set_int() that calls alloc(21), which forces an extra malloc/free to happen.
- We do not anymore re-allocate String if alloc() is called with the Allocated_length. This reduces number of malloc() allocations, especially one big re-allocation in Protocol::send_result_Set_metadata() for almost every query that produced a result to the connnected client. - Avoid extra mallocs when using LONGLONG_BUFFER_SIZE This can now be done as alloc() doesn't increase buffers if new length is not bigger than old one. - c_ptr() is redesigned to be safer (but a bit longer) than before. - Remove wrong usage of c_ptr_quick() c_ptr_quick() where used in many cases to get the pointer to the used
s/where/was/
buffer, even when it didn't need to be \0 terminated. In this case ptr() is a better substitute. Another problem with c_ptr_quick() is that it didn't guarantee that the string would be \0 terminated. - item_val_str(), an API function not used currently by the server, now always returns a null terminated string (before it didn't always do that). - Ensure that all String allocations uses STRING_PSI_MEMORY_KEY. The old mixed usage of performance keys caused assert's when String buffers where shrunk. - Binary_string::shrink() is simplifed
Will fix the typos. Thanks!
diff --git a/sql/item.cc b/sql/item.cc index f6f3e2720fa..a8a43c6266a 100644 --- a/sql/item.cc +++ b/sql/item.cc @@ -4703,7 +4703,7 @@ Item *Item_param::value_clone_item(THD *thd) return 0; // Should create Item_decimal. See MDEV-11361. case STRING_RESULT: return new (mem_root) Item_string(thd, name, - Lex_cstring(value.m_string.c_ptr_quick(), + Lex_cstring(value.m_string.ptr(),
Hmm, you said that LEX_CSTRING::str should always be \0-terminated. If that's the case, then c_ptr() would be correct here, not ptr()
value.m_string.length()), value.m_string.charset(), collation.derivation,
Yes, LEX_CSTRING *should* always be \0 terminated, but as we sometimes build these up I do not want to depend on that for generic usage. For usage to printf() than we should trust that the developer knows what he is doing.
However Item_string() will only use the exact length and never call printf, so here this is safe.
diff --git a/sql/partition_info.cc b/sql/partition_info.cc <cut>
- DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s", + DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s", part_def->part_id, part_def->is_subpart, - part_name)); + length, part_name));
These two changes contradict each other. Either part_name must be \0-terminated, and then you don't need to specify a length in printf. Or it is not \0-terminated and you must specify a length.
part_name is not null terminated and that is why I had to add .* and length. What is the contradiction ?
diff --git a/sql/sql_string.cc b/sql/sql_string.cc index f2a0f55aec8..95a57017c53 100644 --- a/sql/sql_string.cc +++ b/sql/sql_string.cc @@ -118,9 +121,14 @@ bool Binary_string::realloc_raw(size_t alloc_length) return FALSE; }
+ bool String::set_int(longlong num, bool unsigned_flag, CHARSET_INFO *cs) { - uint l=20*cs->mbmaxlen+1; + /* + This allocates a few bytes extra in the unlikely case that cs->mb_maxlen + > 1, but we can live with that
dunno, it seems that utf8 is the *likely* case and it's getting more and more likely with the time,
Not for integers, as we often use binary strings for these.
@@ -1254,21 +1262,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs) // Shrink the buffer, but only if it is allocated on the heap. void Binary_string::shrink(size_t arg_length) { - if (!is_alloced()) - return; - if (ALIGN_SIZE(arg_length + 1) < Alloced_length) + if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length) + { + char *new_ptr; + /* my_realloc() can't fail as new buffer is less than the original one */ + if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
you don't need an if() because, as you wrote yourself "my_realloc() can't fail" here.
Yes, this should be true, but one never knows with allocation libraries. There is implementation of realloc that makes a copy and deletes the old one. However I am not sure what happens if there is no place to make a copy. Better safe than sorry.
{
diff --git a/sql/sql_string.h b/sql/sql_string.h index b3eca118b63..ba0cff68fb4 100644 --- a/sql/sql_string.h +++ b/sql/sql_string.h @@ -600,25 +600,34 @@ class Binary_string: public Static_binary_string
inline char *c_ptr() { - DBUG_ASSERT(!alloced || !Ptr || !Alloced_length || - (Alloced_length >= (str_length + 1))); - - if (!Ptr || Ptr[str_length]) // Should be safe - (void) realloc(str_length); + if (unlikely(!Ptr)) + return (char*) ""; + /* + Here we assume that any buffer used to initalize String has + an end \0 or have at least an accessable character at end. + This is to handle the case of String("Hello",5) efficently. + */ + if (unlikely(!alloced && !Ptr[str_length])) + return Ptr;
No, this is not good. Note the difference between
String a("Hello", 5)
and
char hello[5]; String a(buf, 5);
Your assumption should only apply to the first case, not to the second. In the first case alloced=Alloced_length=0, in the second case only alloced=0 and Alloced_length=5. So in the if() above you need to look at Alloced_length:
if (!Alloced_length && !Ptr[str_length]) return Ptr;
Good point.> > + if (str_length < Alloced_length)
+ { + Ptr[str_length]=0; + return Ptr; + } + (void) realloc(str_length+1); /* This will add end \0 */ return Ptr; } + /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */ inline char *c_ptr_quick() { - if (Ptr && str_length < Alloced_length) - Ptr[str_length]=0; - return Ptr; + return c_ptr_safe();
1. why not to remove it now? 2. it's strange that you write "use c_ptr() instead" but you actually use c_ptr_safe() instead.
What I meant is that the developer should instead use c_ptr(). I will make the message more clear. I decided to call c_ptr_safe() here as this is the most safe version to use. In practice the new c_ptr() should be safe for 99% of our code.
} inline char *c_ptr_safe() { if (Ptr && str_length < Alloced_length) Ptr[str_length]=0; else - (void) realloc(str_length); + (void) realloc(str_length + 1); return Ptr; }
could you write a comment, explaining when one should use c_ptr() and when - c_ptr_safe() ?
Yes, try to. Will not be easy as most cases is now handled by c_ptr().
Regards, Monty
Hi! Sorry, this was because gmail send it before I wanted to. (I do not like that ctrl+return sends email as this causes problems when one uses this a lot in slack!) <cut>
--- a/sql/sql_string.h +++ b/sql/sql_string.h @@ -600,25 +600,34 @@ class Binary_string: public Static_binary_string
inline char *c_ptr() { - DBUG_ASSERT(!alloced || !Ptr || !Alloced_length || - (Alloced_length >= (str_length + 1))); - - if (!Ptr || Ptr[str_length]) // Should be safe - (void) realloc(str_length); + if (unlikely(!Ptr)) + return (char*) ""; + /* + Here we assume that any buffer used to initalize String has + an end \0 or have at least an accessable character at end. + This is to handle the case of String("Hello",5) efficently. + */ + if (unlikely(!alloced && !Ptr[str_length])) + return Ptr;
No, this is not good. Note the difference between
String a("Hello", 5)
and
char hello[5]; String a(buf, 5);
Your assumption should only apply to the first case, not to the second. In the first case alloced=Alloced_length=0, in the second case only alloced=0 and Alloced_length=5. So in the if() above you need to look at Alloced_length:
if (!Alloced_length && !Ptr[str_length]) return Ptr;
Unfortunately this does not work good with our code. We have a lot of code that does things like this: String *new_str= new (thd->mem_root) String((const char*) name.str, name.length, system_charset_info) Where string is 0 terminated. With your proposed change, all of these will require a full realloc when using c_ptr(). Regards, Monty
Hi, Monty! On Mar 31, Michael Widenius wrote:
--- a/sql/sql_string.h +++ b/sql/sql_string.h @@ -600,25 +600,34 @@ class Binary_string: public Static_binary_string
inline char *c_ptr() { - DBUG_ASSERT(!alloced || !Ptr || !Alloced_length || - (Alloced_length >= (str_length + 1))); - - if (!Ptr || Ptr[str_length]) // Should be safe - (void) realloc(str_length); + if (unlikely(!Ptr)) + return (char*) ""; + /* + Here we assume that any buffer used to initalize String has + an end \0 or have at least an accessable character at end. + This is to handle the case of String("Hello",5) efficently. + */ + if (unlikely(!alloced && !Ptr[str_length])) + return Ptr;
No, this is not good. Note the difference between
String a("Hello", 5)
and
char hello[5]; String a(buf, 5);
Your assumption should only apply to the first case, not to the second. In the first case alloced=Alloced_length=0, in the second case only alloced=0 and Alloced_length=5. So in the if() above you need to look at Alloced_length:
if (!Alloced_length && !Ptr[str_length]) return Ptr;
Unfortunately this does not work good with our code.
We have a lot of code that does things like this: String *new_str= new (thd->mem_root) String((const char*) name.str, name.length, system_charset_info) Where string is 0 terminated. With your proposed change, all of these will require a full realloc when using c_ptr().
Hmm, I don't understand. The difference between String a("Hello", 5); and char hello[5]; String a(buf, 5); is that in the first case the argument is const char*. And in that case Alloced_length=0 and in that case you can expect the string to be \0 terminated. In your example that we do "a lot", the first argument is const char*, so String will have Alloced_length==0 and my suggestion will work exactly as planned, it'll use a \0 terminated fast path. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi! On Thu, Apr 1, 2021 at 12:08 AM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Monty!
<cut>
Hmm, I don't understand. The difference between
String a("Hello", 5);
and
char hello[5]; String a(buf, 5);
is that in the first case the argument is const char*. And in that case Alloced_length=0 and in that case you can expect the string to be \0 terminated.
There is no guarantee at all that when one has allocated length == 0 that the string is 0 terminated. I tried your suggestion and it did not work, we got a lot of calls to malloc() that never gets freed. Here is an example: diff --git a/sql/sql_string.h b/sql/sql_string.h index aa773ba396f..80c672c0137 100644 --- a/sql/sql_string.h +++ b/sql/sql_string.h @@ -609,7 +609,7 @@ class Binary_string: public Sql_alloc This is to handle the case of String("Hello",5) and String("hello",5) efficently. */ - if (unlikely(!alloced && !Ptr[str_length])) + if (unlikely(!Alloced_length && !Ptr[str_length])) return Ptr; if (str_length < Alloced_length) mtr main.grant main.grant [ pass ] 943 ***Warnings generated in error logs during shutdown after running tests: main.grant Warning: Memory not freed: 256 I traced it to this code, that is also in 10.5: String(const char *str, size_t len, CHARSET_INFO *cs) :Charset(cs), Binary_string((char *) str, len) { } Removing the const caused Alloced_length to be set, which caused the extra allocations. After fixing this I was able to run the full test suite without issues. Will try again with valgrind today.
In your example that we do "a lot", the first argument is const char*, so String will have Alloced_length==0 and my suggestion will work exactly as planned, it'll use a \0 terminated fast path.
Yes, we have a lot, but we have also calls with not const that we have to consider. The current assumptions when using String and wanting a \0 terminated string are: - If you let String allocate the memory, c_ptr() will always be safe and fast. - If you use your own buffer that you fill in and you assign it to a String and you expect to use it for functions that expects \0 terminated strings, you should add the extra \0 yourself if you want c_ptr() to be fast and safe. If not, then use c_ptr_safe(). The thing to consider is which option is better: to use 'alloced' or Alloced_length. alloced is a weaker test and will cover more cases and will this cause fewer memory allocations. Alloced_length is probably a bit safer. In practice it looks like with current code both works (I have in the past tested alloced with valgrind and there has been no issues. Regards, Monty
Hi, Michael! On Mar 31, Michael Widenius wrote:
Yes, LEX_CSTRING *should* always be \0 terminated, but as we sometimes build these up I do not want to depend on that for generic usage. For usage to printf() than we should trust that the developer knows what he is doing.
However Item_string() will only use the exact length and never call printf, so here this is safe.
Okay. So, in effect, LEX_CSTRING is *not necessarily* \0 terminated. Fine.
diff --git a/sql/partition_info.cc b/sql/partition_info.cc <cut>
- DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s", + DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s", part_def->part_id, part_def->is_subpart, - part_name)); + length, part_name));
These two changes contradict each other. Either part_name must be \0-terminated, and then you don't need to specify a length in printf. Or it is not \0-terminated and you must specify a length.
part_name is not null terminated and that is why I had to add .* and length. What is the contradiction ?
You've cut too much from my email. "contradiction" referred to the change just above this one, it was:
- @param part_name Partition name to match. + @param part_name Partition name to match. Must be \0 terminated!
so, please, decide what it is. Either the comment is wrong or the code is redundant.
@@ -1254,21 +1262,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs) // Shrink the buffer, but only if it is allocated on the heap. void Binary_string::shrink(size_t arg_length) { - if (!is_alloced()) - return; - if (ALIGN_SIZE(arg_length + 1) < Alloced_length) + if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length) + { + char *new_ptr; + /* my_realloc() can't fail as new buffer is less than the original one */ + if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
you don't need an if() because, as you wrote yourself "my_realloc() can't fail" here.
Yes, this should be true, but one never knows with allocation libraries. There is implementation of realloc that makes a copy and deletes the old one. However I am not sure what happens if there is no place to make a copy. Better safe than sorry.
No. I've fixed this quite a while ago. my_realloc() can *never* return NULL when reducing a buffer size. System realloc(), indeed, can, I've researched that topic when making the change. But my_realloc() has a protection against it and it never returns NULL in this case. We have few places in the code that rely on it.
+ { + Ptr[str_length]=0; + return Ptr; + } + (void) realloc(str_length+1); /* This will add end \0 */ return Ptr; } + /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */ inline char *c_ptr_quick() { - if (Ptr && str_length < Alloced_length) - Ptr[str_length]=0; - return Ptr; + return c_ptr_safe();
1. why not to remove it now? 2. it's strange that you write "use c_ptr() instead" but you actually use c_ptr_safe() instead.
What I meant is that the developer should instead use c_ptr(). I will make the message more clear. I decided to call c_ptr_safe() here as this is the most safe version to use. In practice the new c_ptr() should be safe for 99% of our code.
Let's remove c_ptr_quick() now. Why keep it? Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi! On Thu, Apr 1, 2021 at 12:03 AM Sergei Golubchik <serg@mariadb.org> wrote: <cut>
- DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s", + DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s", part_def->part_id, part_def->is_subpart, - part_name)); + length, part_name));
These two changes contradict each other. Either part_name must be \0-terminated, and then you don't need to specify a length in printf. Or it is not \0-terminated and you must specify a length.
part_name is not null terminated and that is why I had to add .* and length. What is the contradiction ?
You've cut too much from my email. "contradiction" referred to the change just above this one, it was:
- @param part_name Partition name to match. + @param part_name Partition name to match. Must be \0 terminated!
so, please, decide what it is. Either the comment is wrong or the code is redundant.
Neither. The comment is right. part_name must at this point in time be null terminated, but may not have to be that in the future. The printf string for DBUG_PRINT is correct. It uses the length, which avoids a strlen() inside sprintf() and is also future proof. The only reason that part_name needs to be null terminated is because of this code: if (!part_def) { my_error(ER_UNKNOWN_PARTITION, MYF(0), part_name, table->alias.c_ptr()); DBUG_RETURN(true); } Whenever we fix the error message to use %.*s, then we can lift the restrictions that part_name needs to be null terminated. Regards, Monty
@@ -1254,21 +1262,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs) // Shrink the buffer, but only if it is allocated on the heap. void Binary_string::shrink(size_t arg_length) { - if (!is_alloced()) - return; - if (ALIGN_SIZE(arg_length + 1) < Alloced_length) + if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length) + { + char *new_ptr; + /* my_realloc() can't fail as new buffer is less than the original one */ + if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
you don't need an if() because, as you wrote yourself "my_realloc() can't fail" here.
Yes, this should be true, but one never knows with allocation libraries. There is implementation of realloc that makes a copy and deletes the old one. However I am not sure what happens if there is no place to make a copy. Better safe than sorry.
No. I've fixed this quite a while ago. my_realloc() can *never* return NULL when reducing a buffer size. System realloc(), indeed, can, I've researched that topic when making the change. But my_realloc() has a protection against it and it never returns NULL in this case. We have few places in the code that rely on it.
We also have code that relies on that my_malloc() never fails. Note what this commit did was just an cleanup of the original code, not trying to do any big changes of logic, except for c_ptr(). Anyway, I can remove the if here to keep you happy.
+ { + Ptr[str_length]=0; + return Ptr; + } + (void) realloc(str_length+1); /* This will add end \0 */ return Ptr; } + /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */ inline char *c_ptr_quick() { - if (Ptr && str_length < Alloced_length) - Ptr[str_length]=0; - return Ptr; + return c_ptr_safe();
1. why not to remove it now? 2. it's strange that you write "use c_ptr() instead" but you actually use c_ptr_safe() instead.
What I meant is that the developer should instead use c_ptr(). I will make the message more clear. I decided to call c_ptr_safe() here as this is the most safe version to use. In practice the new c_ptr() should be safe for 99% of our code.
Let's remove c_ptr_quick() now. Why keep it?
Not enough time to do it now. I rather work on fixing review comments. Regards, Monty
participants (2)
-
Michael Widenius
-
Sergei Golubchik