[Maria-developers] custom storage backend rnd_next() implementation and caching questions
Hi Everyone,

I am developing a custom MariaDB storage backend which is intended to expose filesystem-related information via MariaDB tables. For example, one could say:

    CREATE TEMPORARY TABLE IF NOT EXISTS `/var/log` ENGINE=fsview;

which would create a table (using the assisted discovery method) with the following schema:

    MariaDB [test]> describe `/var`;
    +-----------+--------------+------+-----+---------+-------+
    | Field     | Type         | Null | Key | Default | Extra |
    +-----------+--------------+------+-----+---------+-------+
    | file_name | varchar(255) | YES  |     | NULL    |       |
    | contents  | longblob     | YES  |     | NULL    |       |
    +-----------+--------------+------+-----+---------+-------+

However, I have some implementation issues:

1. I am having trouble implementing rnd_next() in the backend:

    int ha_fsview::rnd_next(uchar *buf)
    {
      DBUG_ENTER(__PRETTY_FUNCTION__);
      if (!dirp)
      {
        DBUG_RETURN(HA_ERR_INTERNAL_ERROR);
      }
      if (struct dirent* entry = readdir(dirp))
      {
        uchar *pos = buf;
        *pos = 0;
        ++pos;

        // file name (1bytes size + str)
        *pos = strlen(entry->d_name);
        ++pos;
        size_t bytes_to_copy = strlen(entry->d_name);
        memcpy(pos, entry->d_name, bytes_to_copy);
        pos+=bytes_to_copy;
        DBUG_PRINT("info", ("fsview filename: %s", entry->d_name));

        // content (4 bytes size + content ptr)
        static const std::string fake_content = "hello world";
        int s = fake_content.size();
        memcpy(pos, &s, sizeof(s));
        pos+=sizeof(s);
        const char* c = fake_content.c_str();
        memcpy(pos, &c, sizeof(c));
        pos+=sizeof(c);
        DBUG_RETURN(0);
      }
      else
      {
        DBUG_RETURN(HA_ERR_END_OF_FILE);
      }
    }

My problem is that the file names (first field) are OK, but I cannot see the expected content "hello world" in the second field. I assume that I am doing something wrong, but I cannot find the reason. Could somebody help me, please?

2. If I issue:

    SELECT * FROM `/var`;

twice, then the server does not initiate a table scan on my backend again (most likely the results from the first run are stored in some cache). So if the filesystem contents change between the two statements, the change is not reflected in the result set.

Of course, if I do a:

    SELECT SQL_NO_CACHE * FROM `/var`;

then everything is fine. However, I would not like to put the burden on the user to specify the option to not cache the results.

My question is whether it is possible to specify in the storage backend that the results from its tables should not be cached.

Thanks for the help in advance

--
Andras Szabo
Hi, Andras!

On Jan 30, Andras Szabo wrote:
> Hi Everyone,
>
> I am developing a custom mariadb storage backend which is intended to give filesystem related information via mariadb tables. For example one could tell: CREATE TEMPORARY TABLE IF NOT EXISTS `/var/log` ENGINE=fsview; which would create a table (using the assisted discovery method) with the following schema:

Cool!

> 1. I am having trouble implementing rnd_next() in the backend:
>
> int ha_fsview::rnd_next(uchar *buf) { [skip] }
>
> My problem is that filenames (first field) is ok, but I cannot really see the expected content "hello world" in the second field. I am assuming that I am doing something wrong, but I cannot really find the reason. Could somebody help me, please?

This looks correct to me. Maybe std::string::c_str() isn't returning a pointer to your static string (but to some temporary copy that gets invalidated on return)? I don't know how std::string is supposed to work in this case.

Anyway, it might have been easier to use Field methods than to assemble the row directly. Like

    table->field[0]->store(entry->d_name, strlen(entry->d_name), system_charset_info);
    table->field[1]->store(STRING_WITH_LEN("hello world"), system_charset_info);

> 2. If I am issuing:
>
> SELECT * FROM `/var`;
>
> twice, then the server is not initiating a table scan again on my backend (most likely results from the first run are stored in some cache). So if filesystem contents are changed between the two statements, then it won't be reflected in the resultset.
>
> Of course if I do a:
>
> SELECT SQL_NO_CACHE * FROM `/var`;
>
> then everything is fine, however I would not like to put the burden on the user to specify the option to not cache the results.
>
> My question is whether it's possible to specify in the storage backend that the results from the tables should not be cached.

Yes. See handler::register_query_cache_table().

Regards,
Sergei
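On question 1, one possible explanation for the missing blob, offered as an assumption rather than something confirmed in this thread: as far as I know, the server's row buffer gives every column a fixed-size slot, so a VARCHAR(255) column in a single-byte character set occupies its length byte plus 255 data bytes even when the name is shorter. The blob slot would then start after the whole VARCHAR slot, not right after the copied name. The standalone sketch below models that layout; the constants and the pack_row/unpack_* helpers are illustrative names, not server API, and the layout itself (one null-flag byte, 1-byte VARCHAR length prefix, 4-byte blob length plus pointer) is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumed layout (illustrative, not taken from the server headers):
// [null flags: 1 byte][varchar: 1 length byte + 255 data bytes]
// [blob: 4 length bytes + pointer to the data]
constexpr std::size_t NULL_BYTES = 1;
constexpr std::size_t VARCHAR_MAX = 255;
constexpr std::size_t VARCHAR_SLOT = 1 + VARCHAR_MAX;
constexpr std::size_t BLOB_OFFSET = NULL_BYTES + VARCHAR_SLOT;
constexpr std::size_t ROW_SIZE = BLOB_OFFSET + 4 + sizeof(const char *);

// Pack one row. The name is truncated to VARCHAR_MAX bytes. The blob data is
// referenced, not copied, so it must outlive the returned row buffer.
inline std::vector<unsigned char> pack_row(const std::string &name,
                                           const char *blob,
                                           std::uint32_t blob_len) {
  std::vector<unsigned char> row(ROW_SIZE, 0);  // null flags = 0: no NULLs
  const std::size_t n = name.size() < VARCHAR_MAX ? name.size() : VARCHAR_MAX;
  row[NULL_BYTES] = static_cast<unsigned char>(n);
  std::memcpy(&row[NULL_BYTES + 1], name.data(), n);
  // The blob slot sits at a fixed offset, regardless of how long the name is.
  // The length is stored in native byte order in this model.
  std::memcpy(&row[BLOB_OFFSET], &blob_len, 4);
  std::memcpy(&row[BLOB_OFFSET + 4], &blob, sizeof(blob));
  return row;
}

// Read the file name back from its fixed slot.
inline std::string unpack_name(const std::vector<unsigned char> &row) {
  return std::string(reinterpret_cast<const char *>(&row[NULL_BYTES + 1]),
                     row[NULL_BYTES]);
}

// Read the blob back through the stored length and pointer.
inline std::string unpack_blob(const std::vector<unsigned char> &row) {
  std::uint32_t len;
  const char *ptr;
  std::memcpy(&len, &row[BLOB_OFFSET], 4);
  std::memcpy(&ptr, &row[BLOB_OFFSET + 4], sizeof(ptr));
  return std::string(ptr, len);
}
```

If this assumption holds, advancing the write position by the actual name length (as in the hand-packed rnd_next()) would leave the blob length and pointer inside the VARCHAR slot, which would match the symptom of correct file names and empty contents. Using the Field methods avoids having to know the offsets at all.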
Hi Sergei,

Thanks for the quick reply.

1. I have tried your suggestion (which is a cool simplification to my bit-messing... ;). Unfortunately the code is crashing now:

MariaDB's crash report shows the following stack trace:

    stack_bottom = 0x9e5de314 thread_stack 0x30000
    mysys/stacktrace.c:246(my_print_stacktrace)[0x89912c5]
    sql/signal_handler.cc:153(handle_fatal_signal)[0x841593a]
    [0xb76f5c58]
    [0xb76f5c7c]
    /lib/i386-linux-gnu/libc.so.6(gsignal+0x47)[0xb70e6607]
    /lib/i386-linux-gnu/libc.so.6(abort+0x143)[0xb70e7d83]
    /lib/i386-linux-gnu/libc.so.6(+0x27757)[0xb70df757]
    /lib/i386-linux-gnu/libc.so.6(+0x27807)[0xb70df807]
    sql/field.cc:6876(Field_varstring::store(char const*, unsigned int, charset_info_st const*))[0x840032c]
    /usr/local/mysql/lib/plugin/ha_fsview.so(_ZN9ha_fsview8rnd_nextEPh+0xec)[0xb70adb00]
    sql/handler.cc:2553(handler::ha_rnd_next(unsigned char*))[0x841b003]
    sql/records.cc:465(rr_sequential(READ_RECORD*))[0x852cf50]
    sql/sql_select.cc:18532(join_init_read_record(st_join_table*))[0x82adff6]
    sql/sql_select.cc:17639(sub_select(JOIN*, st_join_table*, bool))[0x82ac256]
    sql/sql_select.cc:17304(do_select)[0x82abb38]
    sql/sql_select.cc:3079(JOIN::exec_inner())[0x82895c3]
    sql/sql_select.cc:2369(JOIN::exec())[0x8286b3a]
    sql/sql_select.cc:3307(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x8289e70]
    sql/sql_select.cc:372(handle_select(THD*, LEX*, select_result*, unsigned long))[0x8280acc]
    sql/sql_parse.cc:5269(execute_sqlcom_select)[0x8258632]
    sql/sql_parse.cc:2552(mysql_execute_command(THD*))[0x8250ee9]
    sql/sql_parse.cc:6415(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x825aa1f]
    sql/sql_parse.cc:1307(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x824e3db]
    sql/sql_parse.cc:1004(do_command(THD*))[0x824d7fd]
    sql/sql_connect.cc:1379(do_handle_one_connection(THD*))[0x8358b06]
    sql/sql_connect.cc:1293(handle_one_connection)[0x835887c]
    /lib/i386-linux-gnu/libpthread.so.0(+0x6f16)[0xb75fff16]
    /lib/i386-linux-gnu/libc.so.6(clone+0x5e)[0xb71a3a3e]

    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (0x9891f840): is an invalid pointer
    Connection ID (thread ID): 2
    Status: NOT_KILLED

    Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

Apparently Field_varstring::store is somehow unhappy. I am using the 10.0.14 sources, where the function is:

    int Field_varstring::store(const char *from,uint length,CHARSET_INFO *cs)
    {
      ASSERT_COLUMN_MARKED_FOR_WRITE_OR_COMPUTED;
      uint copy_length;
      const char *well_formed_error_pos;
      const char *cannot_convert_error_pos;
      const char *from_end_pos;

      copy_length= well_formed_copy_nchars(field_charset,
                                           (char*) ptr + length_bytes,
                                           field_length,
                                           cs, from, length,
                                           field_length / field_charset->mbmaxlen,
                                           &well_formed_error_pos,
                                           &cannot_convert_error_pos,
                                           &from_end_pos);

      if (length_bytes == 1)
        *ptr= (uchar) copy_length;
      else
        int2store(ptr, copy_length);

      if (check_string_copy_error(this, well_formed_error_pos,
                                  cannot_convert_error_pos, from + length, cs))
        return 2;

      return report_if_important_data(from_end_pos, from + length, TRUE);
    }

(More specifically, the problematic line seems to be the well_formed_copy_nchars() invocation in this function.)

I am a bit puzzled about how this works. I mean, I haven't passed the buf pointer value from rnd_next() to the table struct. How can the Field class work out where it should put the data?

Thanks again for the quick help
Andras

--
Andras Szabo

On 30 Jan 2015 at 17:19:21, Sergei Golubchik (serg@mariadb.org) wrote:
[skip]
Hi, Andras!

On Jan 30, Andras Szabo wrote:
> Hi Sergei,
>
> Thanks for the quick reply.
>
> 1. I have tried your suggestion (which is a cool simplification to my bit-messing... ;). Unfortunately the code is crashing now:
>
> mariadb's crash report shows the following stack trace:
>
> stack_bottom = 0x9e5de314 thread_stack 0x30000
> mysys/stacktrace.c:246(my_print_stacktrace)[0x89912c5]
> sql/signal_handler.cc:153(handle_fatal_signal)[0x841593a]
> [0xb76f5c58]
> [0xb76f5c7c]
> /lib/i386-linux-gnu/libc.so.6(gsignal+0x47)[0xb70e6607]
> /lib/i386-linux-gnu/libc.so.6(abort+0x143)[0xb70e7d83]
> /lib/i386-linux-gnu/libc.so.6(+0x27757)[0xb70df757]
> /lib/i386-linux-gnu/libc.so.6(+0x27807)[0xb70df807]
> sql/field.cc:6876(Field_varstring::store(char const*, unsigned int, charset_info_st const*))[0x840032c]
> /usr/local/mysql/lib/plugin/ha_fsview.so(_ZN9ha_fsview8rnd_nextEPh+0xec)[0xb70adb00]

You didn't say why it's failing; that was in the log before the stack trace. I suspect it was signal 6, abort (judging from abort in the stack trace). Then it must be that ASSERT_COLUMN_MARKED_FOR_WRITE_OR_COMPUTED, and I suppose you're using a debug build of the server.

A fix would be to start your method with

    my_bitmap_map *old_map = dbug_tmp_use_all_columns(table, table->write_set);

and to finish it with

    dbug_tmp_restore_column_map(table->write_set, old_map);

A longer explanation is here: http://mariadb.atlassian.net/browse/MDEV-6381

Regards,
Sergei
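The save, set and restore pattern behind dbug_tmp_use_all_columns() and dbug_tmp_restore_column_map() can be modelled outside the server. The sketch below is a standalone model, not server code: ModelTable, store_checked, WriteAllColumnsGuard and fill_row are made-up names that imitate a write_set bitmap, the debug-build assertion in store(), and the temporary "mark everything writable" step.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a table with a per-column write bitmap.
struct ModelTable {
  std::vector<bool> write_set;
  std::vector<std::string> values;
  explicit ModelTable(std::size_t n) : write_set(n, false), values(n) {}
};

// Imitates the debug-build check: storing into an unmarked column is an error.
inline void store_checked(ModelTable &t, std::size_t col, const std::string &v) {
  if (!t.write_set.at(col)) throw std::logic_error("column not marked for write");
  t.values[col] = v;
}

// Saves the bitmap, marks every column writable, and restores the saved
// bitmap when the guard goes out of scope (including on early return).
class WriteAllColumnsGuard {
 public:
  explicit WriteAllColumnsGuard(ModelTable &t) : table_(t), saved_(t.write_set) {
    table_.write_set.assign(saved_.size(), true);
  }
  ~WriteAllColumnsGuard() { table_.write_set = saved_; }
  WriteAllColumnsGuard(const WriteAllColumnsGuard &) = delete;
  WriteAllColumnsGuard &operator=(const WriteAllColumnsGuard &) = delete;

 private:
  ModelTable &table_;
  std::vector<bool> saved_;
};

// Plays the role of rnd_next(): fills both columns under the guard.
inline void fill_row(ModelTable &t, const std::string &name,
                     const std::string &content) {
  WriteAllColumnsGuard guard(t);
  store_checked(t, 0, name);
  store_checked(t, 1, content);
}
```

Wrapping the pair in a scope guard is a design choice of this sketch, not something the thread prescribes; its benefit is that every exit path of the method, such as the HA_ERR_END_OF_FILE branch, restores the map.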
Hi Sergei,

Your assumptions were right (and sorry for leaving out important info), and your proposal fixed the issue. Now I can see the contents properly.

Again, thanks for your help

--
Andras Szabo

On 30 Jan 2015 at 19:05:34, Sergei Golubchik (serg@mariadb.org) wrote:
[skip]
Hi Sergei,

I am bringing up this topic again with a performance-related question.

Since the plugin returns blob fields which can be large, for performance reasons we have chosen to use mmap to obtain the blob values. However, it seems that Field_blob::store(...) uses a malloc/memcpy to store the field value. In my understanding this is not required for blob fields (the data is represented as a pointer plus a size). Is there a method by which I could bypass the copying?

If I can bypass the copying, then of course I need to keep the mmap active until MySQL has no further need for the data. Is it right that rnd_end() is the earliest point where I can release the mmap?

Thanks
Andras

--
Andras Szabo

On 30 Jan 2015 at 19:05:34, Sergei Golubchik (serg@mariadb.org) wrote:
[skip]
Hi, Andras!

On Feb 03, Andras Szabo wrote:
> Since the plugin is returning blob fields which can be large, for performance considerations we've chosen to use mmap to obtain the blob values. However it seems that the Field_blob::store(...) is using a malloc/memcopy to store the field value. In my understanding this is not required for the blob fields (data is represented as a ptr + size). Is there a method where I could bypass the copying?

I don't think so. It copies the blob to transfer ownership: the buffer where the caller put the blob data can be overwritten or freed by the caller. That cannot happen in your case, but in general Field_blob::store doesn't know that.

> If I can bypass the copying of course I need to keep the mmap active until mysql has no need for the data. Is it right that rnd_end() is the earliest point where I can release the mmap?

Would you really want to keep all your blobs mmapped until the end of the table scan?

I know there were attempts to do something similar, but I haven't heard of any successful ones :( This is something that must be done in the server, and it isn't done yet.

But here is what you can do: pay attention to table->read_set and don't read blobs at all unless the server explicitly asks you to.

Regards,
Sergei
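A minimal sketch of the read_set idea, written as a standalone model rather than server code: the `requested` vector stands in for table->read_set, and `load_contents` is a hypothetical expensive loader (the mmap or file read). In a real handler the check would presumably be something like bitmap_is_set(table->read_set, field->field_index); that mapping is my assumption, so verify it against the server sources you build with.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Column positions in the model table (file_name, contents).
enum Column : std::size_t { FILE_NAME = 0, CONTENTS = 1 };

struct Row {
  std::string file_name;
  bool has_contents = false;  // false when the contents column was not requested
  std::string contents;
};

// requested[i] models "column i is in the read set for this query".
// The loader is only invoked when the contents column is actually requested.
inline Row build_row(
    const std::string &name, const std::vector<bool> &requested,
    const std::function<std::string(const std::string &)> &load_contents) {
  Row row;
  row.file_name = name;
  if (requested.at(CONTENTS)) {
    row.contents = load_contents(name);
    row.has_contents = true;
  }
  return row;
}
```

With this shape, a query that selects only file_name never touches the file contents, so no mapping has to be created or kept alive for it.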
Hi!

On 30.01.15 17:19, Sergei Golubchik wrote:
[skip]
>> 2. If i am issuing:
>>
>> SELECT * FROM `/var`;
>>
>> twice, then the server is not initiating a table scan again on my backend (most likely results from the first run are stored in some cache). So if filesystem contents are changed between the two statements, then it won't be reflected in the resultset.
>>
>> Of course if i do a:
>>
>> SELECT SQL_NO_CACHE * FROM `/var`;
>>
>> then everything is fine, however i would not like to put the burden on the user to specify the option to not cache the results.
>>
>> My question is whether it's possible to specify in the storage backend that the results from the tables should not be cached.
>
> Yes. See handler::register_query_cache_table().

If you do not want any results from this engine to be cached, use table_cache_type() returning HA_CACHE_TBL_NOCACHE.

[skip]
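A standalone model of that override, not server code: ModelHandler stands in for the handler base class, ModelFsview for the engine, and the two constants are local stand-ins for the server's HA_CACHE_TBL_* values (their numeric values here are assumed). Check handler.h in your source tree for the real definitions and the exact signature of table_cache_type().

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the server's HA_CACHE_TBL_* constants (values assumed).
constexpr std::uint8_t MODEL_CACHE_TBL_NONTRANSACT = 0;
constexpr std::uint8_t MODEL_CACHE_TBL_NOCACHE = 1;

// Hypothetical base class: by default, results may be cached.
struct ModelHandler {
  virtual ~ModelHandler() = default;
  virtual std::uint8_t table_cache_type() const {
    return MODEL_CACHE_TBL_NONTRANSACT;
  }
};

// The directory can change at any time, so this engine opts out of caching.
struct ModelFsview : ModelHandler {
  std::uint8_t table_cache_type() const override {
    return MODEL_CACHE_TBL_NOCACHE;
  }
};

// Models the server-side decision: cache only if the engine does not opt out.
inline bool may_cache_result(const ModelHandler &h) {
  return h.table_cache_type() != MODEL_CACHE_TBL_NOCACHE;
}
```

Compared with register_query_cache_table(), which is consulted per table and per query, a fixed table_cache_type() is the simpler choice when no result from the engine should ever be cached.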
participants (3)
- Andras Szabo
- Oleksandr Byelkin
- Sergei Golubchik