Re: 217a7a7ceff: online alter: rework savepoints
Hi, Nikita, On Sep 29, Nikita Malyavin wrote:
revision-id: 217a7a7ceff (mariadb-11.2.1-10-g217a7a7ceff) parent(s): ad6a305d84d author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2023-09-23 11:46:36 +0400 message:
online alter: rework savepoints
Use standard handlerton functions for savepoint add/rollback.
To identify the savepoint, the pointer passed is used.
Every table that has online alter in progress maintains a list of savepoints independently.
Also this removes setting a value to a global variable savepoint_alloc_size without any protection, which was a race condition bug.
--- a/sql/online_alter.cc +++ b/sql/online_alter.cc @@ -6,7 +6,19 @@
static handlerton *online_alter_hton;
-using Online_alter_cache_list= ilist<online_alter_cache_data>; +using sv_id_t= void *;
typedef, please
+ +struct Online_alter_cache_list: ilist<online_alter_cache_data> +{ + sv_id_t savepoint_id= 0; +};
@@ -281,13 +308,11 @@ static int online_alter_log_init(void *p) { online_alter_hton= (handlerton *)p; online_alter_hton->db_type= DB_TYPE_ONLINE_ALTER; - online_alter_hton->savepoint_offset= sizeof(my_off_t); + online_alter_hton->savepoint_offset= 1;
you can even use 0, I guess. Otherwise it'll still be 8, as the server will need to align allocation, I suppose. But I suggest to try to swap the implementation. Make online_alter_hton->savepoint_offset=sizeof(void*); and in savepoint_set create a list or an array of all offsets and store it in the savepoint. Something like this: struct { uint savepoints; my_off_t log_prev_pos[1]; } oasv; int online_alter_savepoint_set(handlerton *hton, THD *thd, oasv **sv) { uint count= 0, i=0; for (auto &cache: cache_list) count++; DBUG_ASSERT(count); *sv= malloc(sizeof(oasv) + sizeof(my_off_t)*(count - 1)); (*sv)->savepoints= count; for (auto &cache: cache_list) (*sv)->log_prev_pos[i]= cache.get_byte_position(); } and so on. You just need to make sure you always use push_back and not push_front for adding caches to the cache_list. instead of a structure with a counter you can use an array with MY_OFF_T_UNDEF (not 0!) at the end.
online_alter_hton->close_connection= online_alter_close_connection;
- online_alter_hton->savepoint_set= // Done by online_alter_savepoint_set - [](handlerton *, THD *, void *){ return 0; }; - online_alter_hton->savepoint_rollback= // Done by online_alter_savepoint_rollback - [](handlerton *, THD *, void *){ return 0; }; + online_alter_hton->savepoint_set= online_alter_savepoint_set; + online_alter_hton->savepoint_rollback= online_alter_savepoint_rollback; online_alter_hton->savepoint_rollback_can_release_mdl= [](handlerton *hton, THD *thd){ return true; };
Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
Hello Sergei! On Fri, 29 Sept 2023 at 23:14, Sergei Golubchik <serg@mariadb.org> wrote:
But I suggest to try to swap the implementation. Make
online_alter_hton->savepoint_offset=sizeof(void*);
and in savepoint_set create a list or an array of all offsets and store it in the savepoint. Something like this:
struct { uint savepoints; my_off_t log_prev_pos[1]; } oasv;
int online_alter_savepoint_set(handlerton *hton, THD *thd, oasv **sv) { uint count= 0, i=0; for (auto &cache: cache_list) count++; DBUG_ASSERT(count);
*sv= malloc(sizeof(oasv) + sizeof(my_off_t)*(count - 1)); (*sv)->savepoints= count;
for (auto &cache: cache_list) (*sv)->log_prev_pos[i]= cache.get_byte_position(); }
and so on. You just need to make sure you always use push_back and not push_front for adding caches to the cache_list.
instead of a structure with a counter you can use an array with MY_OFF_T_UNDEF (not 0!) at the end.
Just checked, I can really make it zero yes! Nice idea of exposing a flexible array member to reduce a number of allocations. I'll remark also, that my implementation can also be alloc-optimized. Overall I don't like the savepoint_offset design. Even Innodb only uses it to store a local counter, which I also had in an early implementation (but then found that a pointer itself can identify it). Only binlog is lucky to use it on purpose. Also if a plugin is uninstalled, the offset memory will never be reused. And if a dynamic size is required for a savepoint, like we have, it will not be automatically deallocated. So in your case, we'd also need to deallocate all the skipped savepoints somehow. Probably, they should be linked in a list, but then it's not worth it to transponse the implementation. -- Yours truly, Nikita Malyavin
Hi, Nikita, On Oct 09, Nikita Malyavin wrote:
Overall I don't like the savepoint_offset design. Even Innodb only uses it to store a local counter, which I also had in an early implementation (but then found that a pointer itself can identify it). Only binlog is lucky to use it on purpose. Also if a plugin is uninstalled, the offset memory will never be reused. And if a dynamic size is required for a savepoint, like we have, it will not be automatically deallocated.
the idea was that a server does the mapping of savepoint name to savepoint data, and the storage engine doesn't need to duplicate that with is own hash. it's not only about avoiding memory allocations, although it's a nice bonus, when it's possible. Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
participants (2)
-
Nikita Malyavin
-
Sergei Golubchik