Re: [Maria-developers] InnoDB blob for primary key
Hello Sir Weekly Report for second week of gsoc 1. Implemented unique in the case of null 2. Implemented unique(A,b,c....) Currently working on tests , field hiding Regards sachin On Tue, May 31, 2016 at 10:49 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On May 31, Sachin Setia wrote:
Hello Sir Weekly Report For Gsoc 1. Implemented hash function in parser for one column 2. Implemenetd hash function for unlimited columns in parser 3. Implemented unique checking in case of hash collusion(it retrieves the row and compare data) but this only woks for unique blob column
Currently Working on unique checking in case of unique(a,b,c)
Thanks. This is ok.
Next time, please, send it to maria-developers list, not to me only :)
Do you have questions? Any code you want me to look at?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Sachin! On Jun 06, Sachin Setia wrote:
Hello Sir Weekly Report for second week of gsoc 1. Implemented unique in the case of null 2. Implemented unique(A,b,c....)
Currently working on tests , field hiding
Good, thanks! I'd suggest to postpone field hiding until bb-10.2-default branch is merged into 10.2. Start with tests, it's very important to have them. See mysql-test/t/*.test and the manual is here: https://mariadb.com/kb/en/mariadb/mysql-test/ Don't forget direct tests of your HASH function. Then, may be, look at UPDATE. And here's a code review up to the commit b69e141
diff --git a/sql/handler.cc b/sql/handler.cc index fb2921f..e68e37b 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ -5863,16 +5863,86 @@ int handler::ha_reset() DBUG_RETURN(reset()); }
+/** @breif
typo: brief
+ Compare two records + Need a better place for this function + @returns true if equal else false*/ +my_bool rec_hash_cmp(TABLE *tbl ,Field * hash_field)
in C++ you can use 'bool'. 'my_bool' is only for C and, please, pay attention to where you put spaces :)
+{ + Item * t_item; + t_item=hash_field->vcol_info->expr_item->next;
No, please don't rely on the 'next' pointer. It used to link all items into a list, so that they could be destroyed all at once. It's not how you traverse expression arguments. Use, for example, Item **arg,**arg_end; for (arg=args, arg_end=args+arg_count; arg != arg_end ; arg++) { Item t_item= *arg; or for (uint i= 0; i < arg_count; i++) { Item *t_item= args[i]; See these and other examples in item_func.cc
+ int diff = tbl->record[1]-tbl->record[0];
record[1] - record[0] is always table->s->rec_buff_length But there's one problem with that - you cannot use record[1], see a comment below.
+ Field * t_field; + int first_length,second_length; + while(t_item) + { + for(int i=0;i<tbl->s->fields;i++) + { + t_field=tbl->field[i]; + if(!my_strcasecmp(system_charset_info, + t_item->name,t_field->field_name)) + break; + }
your t_item is an Item_field, so your t_field will be t_item->field, no need to iterate anything.
+ /* First check for nulls */ + if(t_field->is_real_null()||t_field->is_real_null(diff)) + { + return false; + }
This shouldn't be necessary, see below, you shouldn't call this function when a field value is NULL.
+ /* + length of field is not equal to pack length when it is + subclass of field_blob and field _varsting + */ + first_length = t_field->data_length(); + second_length = t_field->data_length(diff); + if(first_length!=second_length) + return false; + if(t_field->cmp_max(t_field->ptr,t_field->ptr+diff,first_length)) + return false; + t_item=t_item->next; + } + return true; +} int handler::ha_write_row(uchar *buf) { - int error; + int error,map; Log_func *log_func= Write_rows_log_event::binlog_row_logging_function; + Field **field_iter; DBUG_ASSERT(table_share->tmp_table != NO_TMP_TABLE || m_lock_type == F_WRLCK); DBUG_ENTER("handler::ha_write_row"); DEBUG_SYNC_C("ha_write_row_start"); + /* First need to whether inserted record is unique or not */
+ /* One More Thing if i implement hidden field then detection can be easy */ + field_iter=table->field; + for(uint i=0;i<table->s->fields;i++)
You can iterate only virtual fields, there's a separate list for them.
+ { + if(strncmp((*field_iter)->field_name,"DB_ROW_HASH_",12)==0) + { + table->file->ha_index_init(0,0);
this might be a problem, as it changes the current index. I think it's ok for INSERT, but may be not ok for UPDATE (imagine, UPDATE ... WHERE indexed_column=5, the server does ha_index_init for indexed_column, and then, in a loop, ha_index_next/ha_update_row. So you cannot swicth to another index in the middle of UPDATE loop) So, use ha_index_read_idx_map(), it doesn't change the current index. by the way, how could it be ha_index_init(0,0) ??? you need to specify your unique key number, not 0.
+ /* We need to add the null bit */ + /* If the column can be NULL, then in the first byte we put 1 if the + field value is NULL, 0 otherwise. */ + uchar * ptr = (uchar *)my_malloc(sizeof(char ) *1+8, MYF(MY_WME));
1. where do you free it? 2. doing malloc/free for _every row_ is too expensive, mallocs generally are not very good for concurrency. You can simply use uchar ptr[9] can you not?
+ *ptr=0;
why *ptr=0 ? You need to check whether the column value is actually NULL, because now you set the key to "not null" unconditionally. Besides, because NULL value can never cause HA_ERR_FOUND_DUPP_KEY, you don't need to search if the column is NULL, you can skip this whole check.
+ ptr++; + memcpy(ptr,(*field_iter)->ptr,8); + ptr--;
No need to go low-level, you can do: Why wouldn't you use key_copy() function? Like uchar buf[9]; DBUG_ASSERT(key->length == sizeof(buf)); key_copy(ptr, buf, key, 0, false);
+ map= table->file->ha_index_read_map(table->record[1],ptr,HA_WHOLE_KEY,HA_READ_KEY_EXACT);
1. why 'map'? what does it mean? 2. Now it's getting a bit tricky. in UPDATE you cannot use record[1] - it's the row before the update. I'm afraid you'll need to allocate another table->s->rec_buff_length buffer for this, I don't see any other choice. Of course, you can do it on demand, on the table's memroot, preserve between calls, and so on.
+ if(!map) + { + if(rec_hash_cmp(table,*field_iter)) + { + table->file->ha_index_end(); + DBUG_RETURN(HA_ERR_FOUND_DUPP_KEY); + } + + } + table->file->ha_index_end(); + } + field_iter++; + } MYSQL_INSERT_ROW_START(table_share->db.str, table_share->table_name.str); mark_trx_read_write(); increment_statistics(&SSV::ha_write_count); diff --git a/sql/item_func.cc b/sql/item_func.cc index e5c1f4c..fb23495 100644 --- a/sql/item_func.cc +++ b/sql/item_func.cc @@ -1916,6 +1916,41 @@ void Item_func_int_div::fix_length_and_dec() }
+longlong Item_func_hash::val_int() +{ + unsigned_flag =true;
the other way around, no space before, one space after :)
+ ulong nr1= 1,temp=1, nr2= 0; + CHARSET_INFO *cs; + for(int i=0;i<arg_count;i++) + { + String * str = args[i]->val_str(); + if(args[i]->null_value) + { + null_value=1;
don't forget to set null_value=0 if no arguments are NULL
+ return 0; + } + nr2=args[i]->max_length;
Hmm, I don't know. This makes your hash depend on metadata, not only on data. I don't think it's a good idea.
+ cs=str->charset(); + char l[4]; + int4store(l,str->length()); + cs->coll->hash_sort(cs, (uchar *)l,sizeof(str->length()), &temp, &nr2);
not sizeof(str->length()), but sizeof(l).
+ nr1^=temp; + cs->coll->hash_sort(cs, (uchar *)str->ptr(),str->length(), &temp, &nr2);
Why did you change the hashing function? &nr1 should be the 4th argument of hash_sort(), no xors afterwards. If you change it, you need to prove mathematically that your version is better. That's why I personally prefer to not to mess with it :)
+ nr1^=temp; + } + return (longlong)nr1; +} + + +void Item_func_hash::fix_length_and_dec() +{ + maybe_null= 1; + null_value= 0;
no need to set null_value here, the value is calculated in val_int, not in fix_length_and_dec. fix_length_and_dec is called once, at the very beginning of a query, val_int is called for every row. And you need to set null_value=0 for every row.
+ decimals= 0; + max_length= 8; +} + + longlong Item_func_mod::int_op() { DBUG_ASSERT(fixed == 1); diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc index 9b4238b..2c90f3c 100644 --- a/sql/sql_insert.cc +++ b/sql/sql_insert.cc @@ -724,6 +723,20 @@ bool mysql_insert(THD *thd,TABLE_LIST *table_list, DBUG_RETURN(TRUE); }
Please mark places you're going to fix later, e.g. // TODO when we'll have hidden fields, this will be removed
+ context= &thd->lex->select_lex.context; + Field ** f=table_list->table->field; + List<Item> * i_list = (List<Item> *)values_list.first_node()->info; + for(uint i=0;i<table_list->table->s->fields;i++) + { + if(strncmp((*f)->field_name,"DB_ROW_HASH_",12)==0) + { + i_list->push_back(new (thd->mem_root) + Item_default_value(thd,context),thd->mem_root); + } + f++; + } + List_iterator_fast<List_item> its(values_list); + lock_type= table_list->lock_type;
THD_STAGE_INFO(thd, stage_init); diff --git a/sql/sql_table.cc b/sql/sql_table.cc index dfce503..ac2988f 100644 --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -3012,8 +3012,8 @@ int prepare_create_field(Create_field *sql_field, (sql_field->decimals << FIELDFLAG_DEC_SHIFT)); break; } - if (!(sql_field->flags & NOT_NULL_FLAG) || - (sql_field->vcol_info)) /* Make virtual columns allow NULL values */ + if (!(sql_field->flags & NOT_NULL_FLAG) ) + /* (sql_field->vcol_info)) Make virtual columns allow NULL values */
don't forget to test (see above about mysql-test) that your change does not affect normal generated columns and they are still always nullable. to test that, create a table with a persistent generated column with an expression that can return NULL.
sql_field->pack_flag|= FIELDFLAG_MAYBE_NULL; if (sql_field->flags & NO_DEFAULT_VALUE_FLAG) sql_field->pack_flag|= FIELDFLAG_NO_DEFAULT; @@ -3223,6 +3223,80 @@ mysql_prepare_create_table(THD *thd, HA_CREATE_INFO *create_info, bool tmp_table= create_table_mode == C_ALTER_TABLE; DBUG_ENTER("mysql_prepare_create_table");
+ /* + scan the the whole alter list + and add one field if length of blob is zero + */
you need a TODO marker here (TODO change to work for all too-long unique keys)
+ + List_iterator<Key> key_iter(alter_info->key_list); + Key *key_iter_key; + Key_part_spec *temp_colms; + char num = '1'; diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy index 24dc3c4..0135d9c 100644 --- a/sql/sql_yacc.yy +++ b/sql/sql_yacc.yy @@ -9543,6 +9543,15 @@ function_call_keyword: if ($$ == NULL) MYSQL_YYABORT; } + /*Currently hash for just 1 field + * */
your comment is already obsolete :)
+ |HASH_SYM '(' expr_list ')' + { + $$= new (thd->mem_root)Item_func_hash(thd,*$3); + if($$==NULL) + MYSQL_YYABORT; + } + | INSERT '(' expr ',' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_insert(thd, $3, $5, $7, $9);
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hello Sergei, Sir actually I added flags for unique_hash and index_hash for KEY these are #define HA_INDEX_HASH 4 #define HA_UNIQUE_HASH 8 but ./mtr inoodb test are failing bacause KEY flag is 40 which is HA_BINARY_PACK_KEY + HA_UNIQUE_HASH but HA_UNIQUE_HASH should not be there . Is 4 and 8 are already used ,or should i go for another approach. Sir please review the code. Regards sachin On Mon, Jun 6, 2016 at 5:00 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Jun 06, Sachin Setia wrote:
Hello Sir Weekly Report for second week of gsoc 1. Implemented unique in the case of null 2. Implemented unique(A,b,c....)
Currently working on tests , field hiding
Good, thanks!
I'd suggest to postpone field hiding until bb-10.2-default branch is merged into 10.2.
Start with tests, it's very important to have them. See mysql-test/t/*.test and the manual is here: https://mariadb.com/kb/en/mariadb/mysql-test/ Don't forget direct tests of your HASH function.
Then, may be, look at UPDATE.
And here's a code review up to the commit b69e141
diff --git a/sql/handler.cc b/sql/handler.cc index fb2921f..e68e37b 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ -5863,16 +5863,86 @@ int handler::ha_reset() DBUG_RETURN(reset()); }
+/** @breif
typo: brief
+ Compare two records + Need a better place for this function + @returns true if equal else false*/ +my_bool rec_hash_cmp(TABLE *tbl ,Field * hash_field)
in C++ you can use 'bool'. 'my_bool' is only for C
and, please, pay attention to where you put spaces :)
+{ + Item * t_item; + t_item=hash_field->vcol_info->expr_item->next;
No, please don't rely on the 'next' pointer. It used to link all items into a list, so that they could be destroyed all at once. It's not how you traverse expression arguments.
Use, for example,
Item **arg,**arg_end; for (arg=args, arg_end=args+arg_count; arg != arg_end ; arg++) { Item t_item= *arg;
or for (uint i= 0; i < arg_count; i++) { Item *t_item= args[i];
See these and other examples in item_func.cc
+ int diff = tbl->record[1]-tbl->record[0];
record[1] - record[0] is always table->s->rec_buff_length But there's one problem with that - you cannot use record[1], see a comment below.
+ Field * t_field; + int first_length,second_length; + while(t_item) + { + for(int i=0;i<tbl->s->fields;i++) + { + t_field=tbl->field[i]; + if(!my_strcasecmp(system_charset_info, + t_item->name,t_field->field_name)) + break; + }
your t_item is an Item_field, so your t_field will be t_item->field, no need to iterate anything.
+ /* First check for nulls */ + if(t_field->is_real_null()||t_field->is_real_null(diff)) + { + return false; + }
This shouldn't be necessary, see below, you shouldn't call this function when a field value is NULL.
+ /* + length of field is not equal to pack length when it is + subclass of field_blob and field _varsting + */ + first_length = t_field->data_length(); + second_length = t_field->data_length(diff); + if(first_length!=second_length) + return false; + if(t_field->cmp_max(t_field->ptr,t_field->ptr+diff,first_length)) + return false; + t_item=t_item->next; + } + return true; +} int handler::ha_write_row(uchar *buf) { - int error; + int error,map; Log_func *log_func= Write_rows_log_event::binlog_row_logging_function; + Field **field_iter; DBUG_ASSERT(table_share->tmp_table != NO_TMP_TABLE || m_lock_type == F_WRLCK); DBUG_ENTER("handler::ha_write_row"); DEBUG_SYNC_C("ha_write_row_start"); + /* First need to whether inserted record is unique or not */
+ /* One More Thing if i implement hidden field then detection can be easy */ + field_iter=table->field; + for(uint i=0;i<table->s->fields;i++)
You can iterate only virtual fields, there's a separate list for them.
+ { + if(strncmp((*field_iter)->field_name,"DB_ROW_HASH_",12)==0) + { + table->file->ha_index_init(0,0);
this might be a problem, as it changes the current index. I think it's ok for INSERT, but may be not ok for UPDATE (imagine, UPDATE ... WHERE indexed_column=5, the server does ha_index_init for indexed_column, and then, in a loop, ha_index_next/ha_update_row. So you cannot swicth to another index in the middle of UPDATE loop)
So, use ha_index_read_idx_map(), it doesn't change the current index.
by the way, how could it be ha_index_init(0,0) ??? you need to specify your unique key number, not 0.
+ /* We need to add the null bit */ + /* If the column can be NULL, then in the first byte we put 1 if the + field value is NULL, 0 otherwise. */ + uchar * ptr = (uchar *)my_malloc(sizeof(char ) *1+8, MYF(MY_WME));
1. where do you free it? 2. doing malloc/free for _every row_ is too expensive, mallocs generally are not very good for concurrency. You can simply use
uchar ptr[9]
can you not?
+ *ptr=0;
why *ptr=0 ? You need to check whether the column value is actually NULL, because now you set the key to "not null" unconditionally. Besides, because NULL value can never cause HA_ERR_FOUND_DUPP_KEY, you don't need to search if the column is NULL, you can skip this whole check.
+ ptr++; + memcpy(ptr,(*field_iter)->ptr,8); + ptr--;
No need to go low-level, you can do: Why wouldn't you use key_copy() function? Like
uchar buf[9]; DBUG_ASSERT(key->length == sizeof(buf)); key_copy(ptr, buf, key, 0, false);
+ map= table->file->ha_index_read_map(table->record[1],ptr,HA_WHOLE_KEY,HA_READ_KEY_EXACT);
1. why 'map'? what does it mean? 2. Now it's getting a bit tricky. in UPDATE you cannot use record[1] - it's the row before the update. I'm afraid you'll need to allocate another table->s->rec_buff_length buffer for this, I don't see any other choice. Of course, you can do it on demand, on the table's memroot, preserve between calls, and so on.
+ if(!map) + { + if(rec_hash_cmp(table,*field_iter)) + { + table->file->ha_index_end(); + DBUG_RETURN(HA_ERR_FOUND_DUPP_KEY); + } + + } + table->file->ha_index_end(); + } + field_iter++; + } MYSQL_INSERT_ROW_START(table_share->db.str, table_share->table_name.str); mark_trx_read_write(); increment_statistics(&SSV::ha_write_count); diff --git a/sql/item_func.cc b/sql/item_func.cc index e5c1f4c..fb23495 100644 --- a/sql/item_func.cc +++ b/sql/item_func.cc @@ -1916,6 +1916,41 @@ void Item_func_int_div::fix_length_and_dec() }
+longlong Item_func_hash::val_int() +{ + unsigned_flag =true;
the other way around, no space before, one space after :)
+ ulong nr1= 1,temp=1, nr2= 0; + CHARSET_INFO *cs; + for(int i=0;i<arg_count;i++) + { + String * str = args[i]->val_str(); + if(args[i]->null_value) + { + null_value=1;
don't forget to set null_value=0 if no arguments are NULL
+ return 0; + } + nr2=args[i]->max_length;
Hmm, I don't know. This makes your hash depend on metadata, not only on data. I don't think it's a good idea.
+ cs=str->charset(); + char l[4]; + int4store(l,str->length()); + cs->coll->hash_sort(cs, (uchar *)l,sizeof(str->length()), &temp, &nr2);
not sizeof(str->length()), but sizeof(l).
+ nr1^=temp; + cs->coll->hash_sort(cs, (uchar *)str->ptr(),str->length(), &temp, &nr2);
Why did you change the hashing function? &nr1 should be the 4th argument of hash_sort(), no xors afterwards.
If you change it, you need to prove mathematically that your version is better. That's why I personally prefer to not to mess with it :)
+ nr1^=temp; + } + return (longlong)nr1; +} + + +void Item_func_hash::fix_length_and_dec() +{ + maybe_null= 1; + null_value= 0;
no need to set null_value here, the value is calculated in val_int, not in fix_length_and_dec. fix_length_and_dec is called once, at the very beginning of a query, val_int is called for every row. And you need to set null_value=0 for every row.
+ decimals= 0; + max_length= 8; +} + + longlong Item_func_mod::int_op() { DBUG_ASSERT(fixed == 1); diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc index 9b4238b..2c90f3c 100644 --- a/sql/sql_insert.cc +++ b/sql/sql_insert.cc @@ -724,6 +723,20 @@ bool mysql_insert(THD *thd,TABLE_LIST *table_list, DBUG_RETURN(TRUE); }
Please mark places you're going to fix later, e.g. // TODO when we'll have hidden fields, this will be removed
+ context= &thd->lex->select_lex.context; + Field ** f=table_list->table->field; + List<Item> * i_list = (List<Item> *)values_list.first_node()->info; + for(uint i=0;i<table_list->table->s->fields;i++) + { + if(strncmp((*f)->field_name,"DB_ROW_HASH_",12)==0) + { + i_list->push_back(new (thd->mem_root) + Item_default_value(thd,context),thd->mem_root); + } + f++; + } + List_iterator_fast<List_item> its(values_list); + lock_type= table_list->lock_type;
THD_STAGE_INFO(thd, stage_init); diff --git a/sql/sql_table.cc b/sql/sql_table.cc index dfce503..ac2988f 100644 --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -3012,8 +3012,8 @@ int prepare_create_field(Create_field *sql_field, (sql_field->decimals << FIELDFLAG_DEC_SHIFT)); break; } - if (!(sql_field->flags & NOT_NULL_FLAG) || - (sql_field->vcol_info)) /* Make virtual columns allow NULL values */ + if (!(sql_field->flags & NOT_NULL_FLAG) ) + /* (sql_field->vcol_info)) Make virtual columns allow NULL values */
don't forget to test (see above about mysql-test) that your change does not affect normal generated columns and they are still always nullable. to test that, create a table with a persistent generated column with an expression that can return NULL.
sql_field->pack_flag|= FIELDFLAG_MAYBE_NULL; if (sql_field->flags & NO_DEFAULT_VALUE_FLAG) sql_field->pack_flag|= FIELDFLAG_NO_DEFAULT; @@ -3223,6 +3223,80 @@ mysql_prepare_create_table(THD *thd, HA_CREATE_INFO *create_info, bool tmp_table= create_table_mode == C_ALTER_TABLE; DBUG_ENTER("mysql_prepare_create_table");
+ /* + scan the the whole alter list + and add one field if length of blob is zero + */
you need a TODO marker here (TODO change to work for all too-long unique keys)
+ + List_iterator<Key> key_iter(alter_info->key_list); + Key *key_iter_key; + Key_part_spec *temp_colms; + char num = '1'; diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy index 24dc3c4..0135d9c 100644 --- a/sql/sql_yacc.yy +++ b/sql/sql_yacc.yy @@ -9543,6 +9543,15 @@ function_call_keyword: if ($$ == NULL) MYSQL_YYABORT; } + /*Currently hash for just 1 field + * */
your comment is already obsolete :)
+ |HASH_SYM '(' expr_list ')' + { + $$= new (thd->mem_root)Item_func_hash(thd,*$3); + if($$==NULL) + MYSQL_YYABORT; + } + | INSERT '(' expr ',' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_insert(thd, $3, $5, $7, $9);
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Sachin! On Jul 01, Sachin Setia wrote:
Hello Sergei, Sir actually I added flags for unique_hash and index_hash for KEY these are
#define HA_INDEX_HASH 4
#define HA_UNIQUE_HASH 8
but ./mtr inoodb test are failing bacause KEY flag is 40 which is HA_BINARY_PACK_KEY + HA_UNIQUE_HASH but HA_UNIQUE_HASH should not be there . Is 4 and 8 are already used ,or should i go for another approach.
Yes, they're already used. Below in the file you'll find: /* Automatic bits in key-flag */ #define HA_SPACE_PACK_USED 4 /* Test for if SPACE_PACK used */ #define HA_VAR_LENGTH_KEY 8 I'm reviewing your tree now Regards, Sergei Chief Architect MariaDB and security@mariadb.org
participants (2)
-
Sachin Setia
-
Sergei Golubchik