Re: [Maria-developers] open_binary_frm() decomposed an methods of TABLE_SHARE
Hi, Sanja! On Oct 04, sanja@askmonty.org wrote:
------------------------------------------------------------ revno: 3202 revision-id: sanja@askmonty.org-20111004111459-1i44vpkdm1j3lisl parent: psergey@askmonty.org-20110929130743-yj0zl4hdvoydnmda committer: sanja@askmonty.org branch nick: work-maria-5.3-frm2 timestamp: Tue 2011-10-04 14:14:59 +0300 message: open_binary_frm() decomposed an methods of TABLE_SHARE to allow engine to create TABLE_SHARE.
utype and geometry_type moved out of Field.
Thanks! Here're my thoughts. The summary is: 1. I'd like to have it simpler. It's going to be an external API for engines to use, after all. 2. Don't try to support all optimizations that open_binary_frm() uses, and old frm formats - keep that out of the API, open_binary_frm() can modify TABLE_SHARE directly to achieve the necessary effect. 3. Try to create a usage example - it will show the unnecessary complexity See all the comments below:
=== modified file 'sql/handler.h' --- a/sql/handler.h 2011-07-14 13:44:37 +0000 +++ b/sql/handler.h 2011-10-04 11:14:59 +0000 @@ -298,6 +298,31 @@ enum legacy_db_type DB_TYPE_FIRST_DYNAMIC=42, DB_TYPE_DEFAULT=127 // Must be last }; + +/* + We use three additional unireg types for TIMESTAMP to overcome limitation + of current binary format of .frm file. We'd like to be able to support + NOW() as default and on update value for such fields but unable to hold + this info anywhere except unireg_check field. This issue will be resolved + in more clean way with transition to new text based .frm format. + See also comment for Field_timestamp::Field_timestamp(). +*/ +enum utype +{ + UT_NONE, UT_DATE, UT_SHIELD, UT_NOEMPTY, UT_CASEUP, UT_PNR, + UT_BGNR, UT_PGNR, UT_YES, UT_NO, UT_REL, + UT_CHECK, UT_EMPTY, UT_UNKNOWN_FIELD, UT_CASEDN, + UT_NEXT_NUMBER, UT_INTERVAL_FIELD, + UT_BIT_FIELD, UT_TIMESTAMP_OLD_FIELD, UT_CAPITALIZE, + UT_BLOB_FIELD, UT_TIMESTAMP_DN_FIELD, UT_TIMESTAMP_UN_FIELD, + UT_TIMESTAMP_DNUN_FIELD +};
Why wouldn't you remove all unused values from utype enum?
+enum geometry_type +{ + GEOM_GEOMETRY = 0, GEOM_POINT = 1, GEOM_LINESTRING = 2, GEOM_POLYGON = 3, + GEOM_MULTIPOINT = 4, GEOM_MULTILINESTRING = 5, GEOM_MULTIPOLYGON = 6, + GEOM_GEOMETRYCOLLECTION = 7 +}; /* Better name for DB_TYPE_UNKNOWN. Should be used for engines that do not have a hard-coded type value here. === modified file 'sql/table.cc' --- a/sql/table.cc 2011-09-02 12:10:10 +0000 +++ b/sql/table.cc 2011-10-04 11:14:59 +0000 @@ -675,6 +684,797 @@ err_not_open: }
+/** + Set some general table parameters (should be called first after the + TABLE SHARE initialization) + + @param frm_ver Version of the frm (FRM_VER_TRUE_VARCHAR)
no need to pass it as a parameter, set frm version always to the latest one. if needed - overwrite it later from open_binary_frm(). don't add support for historical features into the API.
+ @param db_create_opt Flags of CREATE options + @param avg_row_len Average row length + @param transact CREATE option TRANSACTION + @param page_chcksum CREATE option PAGE_CHECK_SUM + @param rtype CREATE option ROW_TYPE (used by aria only) + @param charset default charset for the table. + @param null_fld_first Should be TRUE for modern frm
same as frm_version
+ @param mi_rows CREATE option MIN_ROWS + @param ma_rows CREATE option MAX_ROWS + @param reclen Record buffer length + @param extra_rec_buf_len Extra record buffer length (???) + @param key_blck_size CREATE option KEY_BLOCK_SIZE + @param new_fld_pck_flg Should be 2 for modern frm
same as frm_version
+ @param error Error code assigned here in case of error + + @retval FALSE OK + @retval TRUE Error (error assigned) +*/ + +bool TABLE_SHARE::set_init_parameters(uchar frm_ver, ulong mysql_ver, + uint db_create_opt, + ulong avg_row_len, + enum ha_choice transact, + enum ha_choice page_chcksum, + enum row_type rtype, + CHARSET_INFO *charset, + bool null_fld_first, + ha_rows mi_rows, + ha_rows ma_rows, + ulong reclen, + uint extra_rec_buf_len, + uint key_blck_size, + uint new_fld_pck_flg, + int &error)
use a pointer, not a reference. Or, better, just return the error code
+{ + frm_version= frm_ver; + mysql_version= mysql_ver; + db_options_in_use= db_create_options= db_create_opt; + if (db_create_options & HA_OPTION_LONG_BLOB_PTR) + blob_ptr_size= portable_sizeof_char_ptr; + avg_row_length= avg_row_len; + transactional= transact; + page_checksum= page_chcksum; + row_type= rtype; + table_charset= charset; + null_field_first= null_fld_first; + db_record_offset= 1; + min_rows= mi_rows; + max_rows= ma_rows; + stored_rec_length= reclength= reclen; + key_block_size= key_blck_size; + new_field_pack_flag= new_fld_pck_flg; + rec_buff_length= ALIGN_SIZE(reclength + 1 + extra_rec_buf_len); + if (!(default_values= (uchar *) alloc_root(&mem_root, + rec_buff_length))) + { + error= 4; + return TRUE; + } + /* Default values */ + system= 0; + crypted= 0; + + return FALSE; +} + + +/** + Allocate space for indices and its parts + + @param nkeys Number of keys + @param nkey_parts Number of key parts of all keys + @param names_len Sum of lengths of all indices names (string ends + are not counted) + @param error Error code assigned here in case of error\
return the error code instead
+ + @retval FALSE OK + @retval TRUE Error (error assigned) +*/ + +bool TABLE_SHARE::allocate_indices_space(uint nkeys, uint nkey_parts, + uint names_len, int &error) +{ + uint n_length= (nkeys * sizeof(KEY) + + nkey_parts * sizeof(KEY_PART_INFO)); + + DBUG_ASSERT(key_info == NULL); // prevent reallocation + DBUG_ASSERT(nkeys <= nkey_parts); + + keys= nkeys; + key_parts= nkey_parts; + keys_for_keyread.init(0); + keys_in_use.init(keys); + + + if (!(rec_per_key= (ulong *) alloc_root(&mem_root, + sizeof(ulong) * key_parts + + n_length + + (names_len + nkeys + 1)))) + { + error= 4; + return TRUE; /* purecov: inspected */ + } + key= key_info= + my_reinterpret_cast(KEY*) (rec_per_key + key_parts); + bzero((char*) key_info, n_length); + first_key_part= key_part= + my_reinterpret_cast(KEY_PART_INFO*) (key_info + keys); + first_keyname= keyname= + my_reinterpret_cast(char *) (first_key_part + key_parts); + is_keys_allocated= TRUE; + if (key_parts == 0) + { + DBUG_ASSERT(keys == 0); + is_keynames_ready= is_keyparts_ready= TRUE; + if (is_fieldnames_ready) + parse_names(); + if (is_fields_ready && + connect_indexes_and_fields(error)) + return TRUE; + } + if (is_fields_allocated && final_fields_keys_allocation()) + { + error= 4; + return TRUE; + } + return FALSE; +} + + +/** + Add index header + + @param name Name of the index (could be NULL if it is + already copied by add_index_names_as_onestring()) + @param name_len Length of the index name (used only if name passed) + @param flags Key flags + @param length Length of the key (in its buffer) + @param parts Number of parts in the index + @param alg Index algorithm + @param blk_size Index block size + + @retval pointer on the allocated index descriptor +*/ + +KEY *TABLE_SHARE::add_index(char *name, uint name_len, + uint flags, uint length, uint parts, + enum ha_key_alg alg, uint blk_size) +{ + DBUG_ASSERT(key < (key_info + keys)); + DBUG_ASSERT(key->key_length == 0); // prevent redefinition + DBUG_ASSERT(is_keys_allocated); + key->flags= flags; + key->key_length= length;
isn't key_length equal to the sum of keypart lengths?
+ key->key_parts= parts; + key->algorithm= alg; + key->block_size= blk_size; + if (name) + { + /* + TODO: should be check bounds of the allocated string? + + Check for illegal separating characters + */ + DBUG_ASSERT(memchr(name, '\0', name_len) == NULL); + DBUG_ASSERT(memchr(name, NAMES_SEP_CHAR, name_len) == NULL); + + if (key == key_info) + *(keyname++)= NAMES_SEP_CHAR;
why is that?
+ + memcpy(keyname, name, name_len); + keyname+= name_len; + *(keyname++)= NAMES_SEP_CHAR;
please, don't do that. Don't ask the caller for the total length of all key names, and don't allocate them in one alloc_root. just set the pointer here: key->name= name. the caller can allocate them in one array, or independently, or whatever - don't bother enforcing anything specific here.
+ + if (key == key_info + (keys - 1)) + { + is_keynames_ready= TRUE; + if (is_fieldnames_ready) + parse_names(); + } + + /* names should be all or none */ + DBUG_ASSERT(key == key_info || (key - 1)->name != NULL); + } + else + { + /* names should be all or none */ + DBUG_ASSERT(key == key_info || (key - 1)->name == NULL); + } + return key++; +} + + +/** + Add part description to the index + + @param keyinfo The index where to add part + @param fieldnr Number of the field used in the index + @param offset Offset of the part in the key biffer
do you need that? The keypart starts where the previous one ends, doesn't it?
+ @param keytype Key type + @param length Length of the part + @param flags Flags of the part + @param error Error code assigned here in case of error
return the error instead
+ + @retval FALSE OK + @retval TRUE Error (error assigned) +*/ + +bool TABLE_SHARE::add_index_part(KEY *keyinfo, uint16 fieldnr, + uint offset, uint16 keytype, + uint16 length, uint16 flags, + int &error) +{ + if (!keyinfo->key_part) + { + // first key part for the key + keyinfo->key_part= key_part; + keyinfo->rec_per_key= rec_per_key;
this can be done as key[i].rec_per_key=key[i-1].rec_per_key+key[i-1].key_parts not in add_index_part() but later, in something like finalize_share() - function that is called at the end. where you fix all typelibs and stuff.
+ } + *rec_per_key++= 0; + key_part->fieldnr= fieldnr; + key_part->offset= offset; + key_part->key_type= keytype; + key_part->length= length; + key_part->key_part_flag= flags; + key_part->store_length= key_part->length; // will be fixed + key_part++; + + if (key_part == first_key_part + key_parts) + { + is_keyparts_ready= TRUE; + if (is_fields_ready && + connect_indexes_and_fields(error)) + return TRUE; + }
you don't need any is_fields_ready, etc move all that code to finalize_share() and let the caller to call it at the end, when all fields and keys are parsed. No need to try to detect this moment automatically with various cursors.
+ return FALSE; +} + +/** + Allocate space for field information + + @param fields_count Number of the fields + @param null_flds Number of the fields which could be NULL + @param enum_set_count Number of fields which are ENUM or SET + @param enum_set_values_count Sum of number of values of all fields which + are ENUM or SET + @param field_names_length Sum length of all fields names (string ends are not + counted)
don't ask for this (see above)
+ @param enum_set_names_length Sum length of all values of all fields which + are ENUM or SET (string ends are not counted) + @param com_length Sum ength of the all field comments
nor for this, nor enum_set_count, nor enum_set_values_count, nor enum_set_names_length
+ @param handler_file handler - used to check fields/indices features + @param tbl_com_length Length of the table comment (if present) or 0 + @param tbl_com reference on the table comment (if present)
move table comment related parameters somewhere out of "allocate_fields_space" method.
+ + @retval FALSE OK + @retval TRUE Error (error assigned) +*/ + +bool TABLE_SHARE::allocate_fields_space(uint fields_count, + uint null_flds, + uint enum_set_count, + uint enum_set_values_count, + uint field_names_length, + uint enum_set_names_length, + uint com_length, + handler *handler_file, + size_t tbl_com_length, + char *tbl_com) +{ + stored_fields= fields= fields_count; + vfields= 0; + tmp_handler_file= handler_file; + null_fields= null_flds; + enum_set_values= enum_set_values_count; + enums_sets= enum_set_count; + field_names_length+= fields_count + 1; //add space for separators + comments_length= com_length; + /* add space for separators between names and intervals */ + enum_set_names_length+= enum_set_values_count + enum_set_count * 2; + if (!(field_ptr= field = (Field **) + alloc_root(&mem_root, + (uint) ((fields_count + 1) * sizeof(Field*) + + enum_set_count * sizeof(TYPELIB) + + field_names_length + enum_set_names_length + + comments_length)))) + return TRUE; /* purecov: inspected */ + + field_ptr[fields]= 0; // End marker + curr_interval= intervals= (TYPELIB*) (field + fields + 1); + names= (char *) (intervals + enum_set_count); + curr_interval_names= names + field_names_length; + first_comment= comment_pos= + (char *) (names + field_names_length + enum_set_names_length); + if (!enum_set_count) + intervals= 0; // For better debugging + is_fields_allocated= TRUE; + if (is_keys_allocated && final_fields_keys_allocation()) + return TRUE; + comment.str= strmake_root(&mem_root, tbl_com, + (comment.length= tbl_com_length)); + /* init counters */ + null_bits_are_used= (null_fields != 0); + if (null_field_first) + { + null_flags= null_pos= (uchar*) default_values; + null_bit_pos= (db_create_options & HA_OPTION_PACK_RECORD) ? 0 : 1; + /* + null_bytes below is only correct under the condition that + there are no bit fields. Correct values is set below after the + table struct is initialized + */ + null_bytes= (null_fields + null_bit_pos + 7) / 8; + } +#ifndef WE_WANT_TO_SUPPORT_VERY_OLD_FRM_FILES + else + { + null_bytes= (null_fields + 7) / 8; + null_flags= null_pos= (uchar*) (default_values + reclength - null_bytes); + null_bit_pos= 0; + } +#endif
we don't want to support anything but the latest frm version here. all compatibility hacks belong to open_binary_frm directly.
+ use_hash= fields >= MAX_FIELDS_BEFORE_HASH; + if (use_hash) + use_hash= !hash_init(&name_hash, + system_charset_info, + fields, 0, 0, + (hash_get_key) get_field_name, 0, 0); + + return FALSE; +} + +bool TABLE_SHARE::final_fields_keys_allocation()
see above - move this and parse_names and the likes into one function, that the caller will call at the end. don't try to detect when "the end" is.
+{ + DBUG_ASSERT(is_fields_allocated && is_keys_allocated); + if (!(interval_array= + (const char **) alloc_root(&mem_root, + (uint) ((fields + enum_set_values + + keys + 3) * sizeof(char *))))) + return TRUE; + curr_interval_array= interval_array + fields + 1; + return FALSE; +} + +/** + Internal function triggeren when all names + (indices, fields, enum and set) are allocated and copied + + + @retval FALSE OK + @retval TRUE Error +*/ + +bool TABLE_SHARE::parse_names() +{ + TYPELIB *interval; + DBUG_ASSERT(is_keys_allocated && is_fields_allocated && + is_keynames_ready); + fix_type_pointers(&interval_array, &fieldnames, 1, &names); + if (is_fieldnames_ready) + fix_type_pointers(&interval_array, intervals, enums_sets, &names); + else + { + /* Intervals were passed to the add_field without preparation */ + names= curr_interval_names; + interval_array= curr_interval_array; + is_fieldnames_ready= TRUE; + } + if ((keyname= first_keyname)) + fix_type_pointers(&interval_array, &keynames, 1, &keyname); + if (fieldnames.count != fields) + return TRUE; + + /* Set ENUM and SET lengths */ + for (interval= intervals; + interval < intervals + enums_sets; + interval++) + { + uint count= (uint) (interval->count + 1) * sizeof(uint); + if (!(interval->type_lengths= (uint *) alloc_root(&mem_root, count))) + return TRUE; + for (count= 0; count < interval->count; count++) + { + char *val= (char*) interval->type_names[count]; + interval->type_lengths[count]= strlen(val); + } + interval->type_lengths[count]= 0; + } + + return FALSE; +} + +/** + Add field description to the TABLE_SHARE + + @param name Field name + @param pos Field position in the record buffer
isn't it equal the end of the previous field? (with special treatment for BIT)
+ @param field_length Length of the field + @param pack_flag Flags of the field
what's that? I mean, I understand that it's some set of cryptic bits, but how would an engine use that? You know what, I think we're missing a usage example. could you add it to the example engine (ha_example) support for discovery? Something like that - a table with a fixed name, say, "example_discovered_table" is created by discovery() on the example engine. That is, a fixed table name, keep it as simple as possible. Fixed table structure. Few fields, say, an int, primary key, and enum (Y,N) - or whatever. because you haven't yet done the step 2 in our plan (extracting that part of ALTER TABLE that converts TABLE_SHARE into Create_field), this example discovery won't create frm files - it'll be completely in memory. which is fine, for now.
+ @param field_type Field type + @param charset Caracter set infoemation of the field + @param geom_type Geometry data type (for geometry fields) + @param auto_increment Is the field auto increment + @param enum_set Definition of ENUM or SET (for appropriate type)
a complete typelib? probably it's an overkill. better to pass an array of names, and you can build a typelib internally.
+ @param comm_pos Reference on the field commentary + @param comment_length Length of the field commentary + @param error Error code assigned here in case of error
as always, you can return the error code
+ + @retval FALSE OK + @retval TRUE Error (error assigned) +*/ + +Field **TABLE_SHARE::add_field(const char *name, + uchar *pos, + uint32 field_length, + uint pack_flag, + enum_field_types field_type, + CHARSET_INFO *charset, + geometry_type geom_type, + bool auto_increment, + TYPELIB* enum_set, + char *comm_pos, + uint comment_length, + int &error) +{ + DBUG_ASSERT(is_fields_allocated); + utype unireg_type= get_unireg_type(auto_increment, + !f_no_default(pack_flag), + FALSE, + FALSE, f_maybe_null(pack_flag), + field_type); + + return add_field_int(name, pos, field_length, + pack_flag, field_type, + charset, geom_type, unireg_type, + enum_set, comm_pos, comment_length, + NULL, TRUE, error); +} + + +/** + Add field description to the TABLE_SHARE (internal) + + @param name Field name + @param pos Field position in the record buffer + @param field_length Length of the field + @param pack_flag Flags of the field + @param field_type Field type + @param charset Caracter set infoemation of the field + @param geom_type Geometry data type (for geometry fields) + @param unireg_type Unireg type (or check flags) + @param auto_increment Is the field auto increment + @param enum_set Definition of ENUM or SET (for appropriate type) + @param comm_pos Reference on the field commentary + @param comment_length Length of the field commentary + @param vcol_info Virtual column information + @param fld_stored_in_db TRUE for normal fields and persistent virtual fields + @param error Error code assigned here in case of error + + @retval FALSE OK + @retval TRUE Error (error assigned) +*/ + +Field **TABLE_SHARE::add_field_int(const char *name, + uchar *pos, + uint32 field_length, + uint pack_flag, + enum_field_types field_type, + CHARSET_INFO *charset, + geometry_type geom_type, + utype unireg_type, + TYPELIB* enum_set, + char *comm_pos, + uint comment_length, + Virtual_column_info *vcol_info, + bool fld_stored_in_db, + int &error) +{ + Field* reg_field; + DBUG_ASSERT(is_fields_allocated); + if (!is_fieldnames_ready) + { + if (enum_set) + { + copy_typelib(curr_interval, + &curr_interval_array, enum_set, + &curr_interval_names, NAMES_SEP_CHAR);
See? it's better to get an array of names from the caller and build the typelib here - as you won't use the typelib that caller provided anyway
+ enum_set= curr_interval++; + } + { + uchar *tmp= (uchar*) strmov((char*) *curr_field_names, name); + *tmp++= (uchar)NAMES_SEP_CHAR; + *tmp= 0; + name= curr_field_names; + curr_field_names= (char*)tmp; + }
just like with other names - let the caller handle the allocations, you simply use the pointer that the caller provides.
+ } + reg_field = *field_ptr= + make_field(this, pos, + (uint32) field_length, + null_pos, null_bit_pos, + pack_flag, + field_type, + charset, + geom_type, + (utype) MTYP_TYPENR(unireg_type),
add the cast to MTYP_TYPENR definition and remove it here (and elsewhere)
+ enum_set, + name); + + if (!reg_field) // Not supported field type + { + error= 4; + return NULL; + } + reg_field->field_index= (field_ptr - field); + reg_field->vcol_info= vcol_info; + reg_field->stored_in_db= fld_stored_in_db; + DBUG_ASSERT(reg_field->field_index < fields); + if (comm_pos && comment_length) + { + memcpy(comment_pos, comm_pos, comment_length);
same here and everywhere. I'll stop repeating "just use the pointer, let the caller handle the allocation" alternatively (here and everywhere) do alloc_root() - that is the caller gives you a pointer, and you call alloc_root() anc copy the data. Yes, it means that every string will be allocated in its own chunk. And in the _int version of the function you use the pointers. I mean something like that: TABLE_SHARE::add_field(char *name, ...) { ... return add_field_int(strdup_root(name), ... ); so, if one uses add_field, he does not need to think about memory allocations, and does not need to calculate the sum of lengths, and stuff like that. while open_binary_frm() will use add_field_int() can will pass the pointers directly into the big allocated chunk, as always. The above applies to all strings, comments, enum/set, everything.
+ DBUG_ASSERT(!is_comments_ready); + } + else + DBUG_ASSERT(is_comments_ready || comment_length == 0); + if (comment_length) + { + reg_field->comment.str= comment_pos; + reg_field->comment.length= comment_length; + comment_pos+= comment_length;
right, that's what I mean by "use the pointer directly, let the caller to handle the allocations".
+ } + else + { + reg_field->comment.str= (char *)""; + reg_field->comment.length= 0;
very minor stilistic details: in these cases I prefer to use reg_field->comment= empty_lex_string; it was introduced for exactly this purpose.
+ } + if (f_no_default(pack_flag)) + reg_field->flags|= NO_DEFAULT_VALUE_FLAG; + + if (reg_field->unireg_check == UT_NEXT_NUMBER) + found_next_number_field= field_ptr; + if (timestamp_field == reg_field) + timestamp_field_offset= reg_field->field_index; + + if (use_hash) + { + if (my_hash_insert(&name_hash, + (uchar*) field_ptr)) + { + /* + Set return code 8 here to indicate that an error has + occurred but that the error message already has been + sent (OOM). + */ + error= 8; + return NULL; + } + } + + /* move null bit/byte pointers */ + if (field_type == MYSQL_TYPE_BIT && !f_bit_as_char(pack_flag)) + { + null_bits_are_used= 1; + if ((null_bit_pos+= field_length & 7) > 7) + { + null_pos++; + null_bit_pos-= 8; + } + } + if (!(reg_field->flags & NOT_NULL_FLAG)) + { + if (!(null_bit_pos= (null_bit_pos + 1) & 7)) + null_pos++; + } + + if (reg_field->field_index == fields - 1) + { + if (!is_fieldnames_ready) + + is_fields_ready= TRUE; + if (is_keyparts_ready && + connect_indexes_and_fields(error)) + return NULL; + } + return field_ptr++; +} + +/** + Internal function which connect fields and indices (its parts) triggered + when fields and index parts are ready + + @param error Error code assigned here in case of error + + @retval FALSE OK + @retval TRUE Error (error assigned) +*/ + +bool TABLE_SHARE::connect_indexes_and_fields(int &error) +{ + KEY *keyinfo; + uint i; + /* Fix key->name and key_part->field */ + if (key_parts) + { + KEY_PART_INFO *key_part= first_key_part; + uint prm_key=(uint) (find_type((char*) primary_key_name, + &keynames, 3) - 1); + longlong ha_option= tmp_handler_file->ha_table_flags(); + keyinfo= key_info; + + for (uint key=0 ; key < keys ; key++, keyinfo++) + { + uint usable_parts= 0; + keyinfo->name=(char*) keynames.type_names[key]; + keyinfo->name_length= strlen(keyinfo->name); + keyinfo->cache_name= + (uchar*) alloc_root(&mem_root, + table_cache_key.length+ + keyinfo->name_length + 1); + if (keyinfo->cache_name) // If not out of memory + { + uchar *pos= keyinfo->cache_name; + memcpy(pos, table_cache_key.str, table_cache_key.length); + memcpy(pos + table_cache_key.length, keyinfo->name, + keyinfo->name_length+1); + } + + /* Fix fulltext keys for old .frm files */ + if (key_info[key].flags & HA_FULLTEXT) + key_info[key].algorithm= HA_KEY_ALG_FULLTEXT; + + if (prm_key >= MAX_KEY && (keyinfo->flags & HA_NOSAME)) + { + /* + If the UNIQUE key doesn't have NULL columns and is not a part key + declare this as a primary key. + */ + prm_key= key; + for (i= 0 ; i < keyinfo->key_parts; i++) + { + uint fieldnr= key_part[i].fieldnr; + if (!fieldnr || + field[fieldnr-1]->null_ptr || + field[fieldnr-1]->key_length() != + key_part[i].length) + { + prm_key=MAX_KEY; // Can't be used + break; + } + } + } + + for (i=0 ; i < keyinfo->key_parts ; key_part++,i++) + { + Field *fld; + if (new_field_pack_flag <= 1) + key_part->fieldnr= (uint16) find_field(field, + default_values, + (uint) key_part->offset, + (uint) key_part->length); + if (!key_part->fieldnr) + { + error= 4; // Wrong file + return TRUE; + } + fld= key_part->field= field[key_part->fieldnr-1]; + key_part->type= fld->key_type(); + if (fld->null_ptr) + { + key_part->null_offset=(uint) ((uchar*) fld->null_ptr - + default_values); + key_part->null_bit= fld->null_bit; + key_part->store_length+=HA_KEY_NULL_LENGTH; + keyinfo->flags|=HA_NULL_PART_KEY; + keyinfo->key_length+= HA_KEY_NULL_LENGTH; + } + if (fld->type() == MYSQL_TYPE_BLOB || + fld->real_type() == MYSQL_TYPE_VARCHAR || + fld->type() == MYSQL_TYPE_GEOMETRY) + { + if (fld->type() == MYSQL_TYPE_BLOB || + fld->type() == MYSQL_TYPE_GEOMETRY) + key_part->key_part_flag|= HA_BLOB_PART; + else + key_part->key_part_flag|= HA_VAR_LENGTH_PART; + key_part->store_length+=HA_KEY_BLOB_LENGTH; + keyinfo->key_length+= HA_KEY_BLOB_LENGTH; + } + if (fld->type() == MYSQL_TYPE_BIT) + key_part->key_part_flag|= HA_BIT_PART; + + if (i == 0 && key != prm_key) + fld->flags |= (((keyinfo->flags & HA_NOSAME) && + (keyinfo->key_parts == 1)) ? + UNIQUE_KEY_FLAG : MULTIPLE_KEY_FLAG); + if (i == 0) + fld->key_start.set_bit(key); + if (fld->key_length() == key_part->length && + !(fld->flags & BLOB_FLAG)) + { + if (tmp_handler_file->index_flags(key, i, 0) & HA_KEYREAD_ONLY) + { + keys_for_keyread.set_bit(key); + fld->part_of_key.set_bit(key); + fld->part_of_key_not_clustered.set_bit(key); + } + if (tmp_handler_file->index_flags(key, i, 1) & HA_READ_ORDER) + fld->part_of_sortkey.set_bit(key); + } + if (!(key_part->key_part_flag & HA_REVERSE_SORT) && + usable_parts == i) + usable_parts++; // For FILESORT + fld->flags|= PART_KEY_FLAG; + if (key == prm_key) + { + fld->flags|= PRI_KEY_FLAG; + /* + If this field is part of the primary key and all keys contains + the primary key, then we can use any key to find this column + */ + if (ha_option & HA_PRIMARY_KEY_IN_READ_INDEX) + { + if (fld->key_length() == key_part->length && + !(fld->flags & BLOB_FLAG)) + fld->part_of_key= keys_in_use; + if (fld->part_of_sortkey.is_set(key)) + fld->part_of_sortkey= keys_in_use; + } + } + if (fld->key_length() != key_part->length) + { + key_part->key_part_flag|= HA_PART_KEY_SEG; + } + if (fld->real_maybe_null()) + key_part->key_part_flag|= HA_NULL_PART; + /* + Sometimes we can compare key parts for equality with memcmp. + But not always. + */ + if (!(key_part->key_part_flag & (HA_BLOB_PART | HA_VAR_LENGTH_PART | + HA_BIT_PART)) && + key_part->type != HA_KEYTYPE_FLOAT && + key_part->type == HA_KEYTYPE_DOUBLE) + key_part->key_part_flag|= HA_CAN_MEMCMP; + } + keyinfo->usable_key_parts= usable_parts; // Filesort + + set_if_bigger(max_key_length,keyinfo->key_length+ + keyinfo->key_parts); + total_key_length+= keyinfo->key_length; + /* + MERGE tables do not have unique indexes. But every key could be + an unique index on the underlying MyISAM table. (Bug #10400) + */ + if ((keyinfo->flags & HA_NOSAME) || + (ha_option & HA_ANY_INDEX_MAY_BE_UNIQUE)) + set_if_bigger(max_unique_length,keyinfo->key_length); + } + if (prm_key < MAX_KEY && + (keys_in_use.is_set(prm_key))) + { + primary_key= prm_key; + /* + If we are using an integer as the primary key then allow the user to + refer to it as '_rowid' + */ + if (key_info[prm_key].key_parts == 1) + { + Field *fld= key_info[prm_key].key_part[0].field; + if (fld && fld->result_type() == INT_RESULT) + { + /* note that fieldnr here (and rowid_field_offset) starts from 1 */ + rowid_field_offset= (key_info[prm_key].key_part[0]. + fieldnr); + } + } + } + else + primary_key = MAX_KEY; // we do not have a primary key + } + else + primary_key= MAX_KEY; + + return FALSE; +} + /* Read data from a binary .frm file from MySQL 3.23 - 5.0 into TABLE_SHARE */ @@ -747,38 +1544,53 @@ static int open_binary_frm(THD *thd, TAB legacy_db_type < DB_TYPE_FIRST_DYNAMIC) share->db_plugin= ha_lock_engine(NULL, ha_checktype(thd, legacy_db_type, 0, 0)); - share->db_create_options= db_create_options= uint2korr(head+30); - share->db_options_in_use= share->db_create_options; - share->mysql_version= uint4korr(head+51); - share->null_field_first= 0; - if (!head[32]) // New frm file in 3.23 - { - share->avg_row_length= uint4korr(head+34); - share->transactional= (ha_choice) (head[39] & 3); - share->page_checksum= (ha_choice) ((head[39] >> 2) & 3); - share->row_type= (row_type) head[40]; - share->table_charset= get_charset((uint) head[38],MYF(0)); - share->null_field_first= 1; - } - if (!share->table_charset) - { - /* unknown charset in head[38] or pre-3.23 frm */ - if (use_mb(default_charset_info)) - { - /* Warn that we may be changing the size of character columns */ - sql_print_warning("'%s' had no or invalid character set, " - "and default character set is multi-byte, " - "so character column sizes may have changed", - share->path.str); - } - share->table_charset= default_charset_info; + + + { + enum ha_choice transact= HA_CHOICE_UNDEF; + enum ha_choice page_chcksum= HA_CHOICE_UNDEF; + enum row_type rtype= ROW_TYPE_DEFAULT; + CHARSET_INFO *charset= NULL; + bool null_fld_first= 0; + ulong avg_row_len= 0; + if (!head[32]) // New frm file in 3.23 + { + avg_row_len= uint4korr(head+34); + transact= (ha_choice) (head[39] & 3); + page_chcksum= (ha_choice) ((head[39] >> 2) & 3); + rtype= (row_type) head[40]; + charset= get_charset((uint) head[38],MYF(0)); + null_fld_first= 1; + } + if (!charset) + { + /* unknown charset in head[38] or pre-3.23 frm */ + if (use_mb(default_charset_info)) + { + /* Warn that we may be changing the size of character columns */ + sql_print_warning("'%s' had no or invalid character set, " + "and default character set is multi-byte, " + "so character column sizes may have changed", + share->path.str); + } + charset= default_charset_info; + } + if (share-> + set_init_parameters(frm_ver, + uint4korr(head + 51), /*MySQL Version of the frm*/ + uint2korr(head + 30), /*Create options (old)*/ + avg_row_len, transact, page_chcksum, + rtype, charset, null_fld_first, + uint4korr(head+22), /*MIN_ROWS*/ + uint4korr(head+18), /*MAX_ROWS*/ + uint2korr(head+16), /*record length*/ + uint2korr(head+59), /*extra_rec_buf_length*/ + uint2korr(head+62), /*key block size*/ + new_field_pack_flag, + error))
I'd say you can remove this method completely - it's just a set of assignments, the caller can do them directly. better try to provide reasonable defaults (in share->init()), so that the caller would not need to assign most of those.
+ goto free_and_err; } - share->db_record_offset= 1; - if (db_create_options & HA_OPTION_LONG_BLOB_PTR) - share->blob_ptr_size= portable_sizeof_char_ptr; error=4; - share->max_rows= uint4korr(head+18); - share->min_rows= uint4korr(head+22);
/* Read keyinformation */ key_info_length= (uint) uint2korr(head+28); @@ -787,84 +1599,85 @@ static int open_binary_frm(THD *thd, TAB goto err; /* purecov: inspected */ if (disk_buff[0] & 0x80) { - share->keys= keys= (disk_buff[1] << 7) | (disk_buff[0] & 0x7f); - share->key_parts= key_parts= uint2korr(disk_buff+2); + keys= (disk_buff[1] << 7) | (disk_buff[0] & 0x7f); + key_parts= uint2korr(disk_buff+2); } else { - share->keys= keys= disk_buff[0]; - share->key_parts= key_parts= disk_buff[1]; + keys= disk_buff[0]; + key_parts= disk_buff[1]; } - share->keys_for_keyread.init(0); - share->keys_in_use.init(keys); - - n_length=keys*sizeof(KEY)+key_parts*sizeof(KEY_PART_INFO); - if (!(keyinfo = (KEY*) alloc_root(&share->mem_root, - n_length + uint2korr(disk_buff+4)))) - goto err; /* purecov: inspected */ - bzero((char*) keyinfo,n_length); - share->key_info= keyinfo; - key_part= my_reinterpret_cast(KEY_PART_INFO*) (keyinfo+keys); - strpos=disk_buff+6; - - if (!(rec_per_key= (ulong*) alloc_root(&share->mem_root, - sizeof(ulong)*key_parts))) + if (share->allocate_indices_space(keys, key_parts, + uint2korr(disk_buff + 4) - keys - 1, + error)) goto err; + strpos=disk_buff+6;
- for (i=0 ; i < keys ; i++, keyinfo++) + for (i=0 ; i < keys ; i++) { if (new_frm_ver >= 3) { - keyinfo->flags= (uint) uint2korr(strpos) ^ HA_NOSAME; - keyinfo->key_length= (uint) uint2korr(strpos+2); - keyinfo->key_parts= (uint) strpos[4]; - keyinfo->algorithm= (enum ha_key_alg) strpos[5]; - keyinfo->block_size= uint2korr(strpos+6); + keyinfo= share->add_index(NULL, 0, + (uint) uint2korr(strpos) ^ HA_NOSAME, + (uint) uint2korr(strpos + 2), + (uint) strpos[4], (enum ha_key_alg) strpos[5], + uint2korr(strpos + 6));
make block_size an optional parameter, or let the caller set it directly
strpos+=8; } else { - keyinfo->flags= ((uint) strpos[0]) ^ HA_NOSAME; - keyinfo->key_length= (uint) uint2korr(strpos+1); - keyinfo->key_parts= (uint) strpos[3]; - keyinfo->algorithm= HA_KEY_ALG_UNDEF; + keyinfo= share->add_index(NULL, 0, + ((uint) strpos[0]) ^ HA_NOSAME, + (uint) uint2korr(strpos + 1), + (uint) strpos[3], + HA_KEY_ALG_UNDEF, 0); strpos+=4; }
- keyinfo->key_part= key_part; - keyinfo->rec_per_key= rec_per_key; - for (j=keyinfo->key_parts ; j-- ; key_part++) + for (j=keyinfo->key_parts ; j-- ; ) { - *rec_per_key++=0; - key_part->fieldnr= (uint16) (uint2korr(strpos) & FIELD_NR_MASK); - key_part->offset= (uint) uint2korr(strpos+2)-1; - key_part->key_type= (uint) uint2korr(strpos+5); - // key_part->field= (Field*) 0; // Will be fixed later if (new_frm_ver >= 1) { - key_part->key_part_flag= *(strpos+4); - key_part->length= (uint) uint2korr(strpos+7); + if (share->add_index_part(keyinfo, + (uint16) + (uint2korr(strpos) & FIELD_NR_MASK), + (uint) uint2korr(strpos + 2) - 1, + (uint) uint2korr(strpos + 5), + (uint) uint2korr(strpos + 7), + *(strpos + 4), + error)) + { + DBUG_ASSERT(0); // it is not possible if keys defined first + } strpos+=9; } else { - key_part->length= *(strpos+4); - key_part->key_part_flag=0; - if (key_part->length > 128) + uint16 length= *(strpos+4); + uint16 flags= 0; + if (length > 128) { - key_part->length&=127; /* purecov: inspected */ - key_part->key_part_flag=HA_REVERSE_SORT; /* purecov: inspected */ + length&= 127; /* purecov: inspected */ + flags= HA_REVERSE_SORT; /* purecov: inspected */ } + if (share->add_index_part(keyinfo, + (uint16) + (uint2korr(strpos) & FIELD_NR_MASK), + (uint) uint2korr(strpos+2)-1, + (uint) uint2korr(strpos+5), + length, flags, error)) + { + DBUG_ASSERT(0); // it is not possible if keys defined first + } strpos+=7; } - key_part->store_length=key_part->length; } } - keynames=(char*) key_part; - strpos+= (strmov(keynames, (char *) strpos) - keynames)+1;
- share->reclength = uint2korr((head+16)); - share->stored_rec_length= share->reclength; + if (share->add_index_names_as_onestring(strpos, i)) + goto err;
No, please remove this method, let the caller do strmov as before. it's a weird thing that open_binary_frm is doing, there's no need to create an api for that, just keep it the old way. To repeat: we want to have a reasonable and sane API for storage engines to use, not an API that allows to do everything that frm format was historically forcing us to do in openfrm().
+ strpos+= i; + if (*(head+26) == 1) share->system= 1; /* one-record-database */ #ifdef HAVE_CRYPTED_FRM @@ -1034,17 +1847,10 @@ static int open_binary_frm(THD *thd, TAB } DBUG_ASSERT(next_chunk <= buff_end); } - share->key_block_size= uint2korr(head+62);
error=4; - extra_rec_buf_length= uint2korr(head+59); - rec_buff_length= ALIGN_SIZE(share->reclength + 1 + extra_rec_buf_length); - share->rec_buff_length= rec_buff_length; - if (!(record= (uchar *) alloc_root(&share->mem_root, - rec_buff_length))) - goto free_and_err; /* purecov: inspected */ - share->default_values= record; - if (my_pread(file, record, (size_t) share->reclength, + /* TODO: make copy function for handler */
what does this comment mean?
+ if (my_pread(file, share->default_values, (size_t) share->reclength, record_offset, MYF(MY_NABP))) goto free_and_err; /* purecov: inspected */
@@ -2785,6 +3271,27 @@ fix_type_pointers(const char ***array, T } /* fix_type_pointers */
+static void +copy_typelib(TYPELIB *type_copy, + const char ***array, TYPELIB *point_to_type, + char **names, char chr) +{ + *((*names)++)= chr; // full compatibility, just to be safe + type_copy->name= 0; + type_copy->type_names= *array; + type_copy->count= point_to_type->count; + for (uint i= 0; i < point_to_type->count; i++) + { + *((*array)++) = *names; + uchar *tmp= (uchar*) strmov((char*) *names, type_copy->type_names[i]); + *tmp++= (uchar)0; + *names= (char *)tmp; + } + *((*array)++)= NullS; /* End of type */ + *((*names)++)= 0; +} + + TYPELIB *typelib(MEM_ROOT *mem_root, List<String> &strings) { TYPELIB *result= (TYPELIB*) alloc_root(mem_root, sizeof(TYPELIB));
I don't understand how you handle our new table/field/index options, those that can be defined per storage engine
=== modified file 'sql/table.h' --- a/sql/table.h 2011-08-17 05:48:35 +0000 +++ b/sql/table.h 2011-10-04 11:14:59 +0000 @@ -484,6 +485,39 @@ typedef struct st_table_share void *ha_data; void (*ha_data_destroy)(void *); /* An optional destructor for ha_data. */
+ /* + Variables of building the TABLE_SHARE. + + TODO: it might have sense to make separate structure for this variables + and deallocate it after building + */
Agree! It would be better to remove all this added functionality (all new methods and variables) from the TABLE_SHARE and create a separate class TABLE_SHARE_builder (or whatever), and have it allocated on the stack, like int open_binary_frm( ... ) { TABLE_SHARE_builder builder(share); ... builder->add_field(...); ... ... etc ... } this looks like a cleaner approach - the builder and all its memory is destroyed when the function returns. And the TABLE_SHARE itself does not have the "building" methods - it always represents a complete and frozen table structure.
+ KEY_PART_INFO *first_key_part; // pointer on parts array + KEY_PART_INFO *key_part; // current position during key parts adding + KEY *key; // current position during keys adding + ulong *rec_per_key; + const char **interval_array; + char *first_keyname; // pointer on the first index name string + char *keyname; // pointer on the current index name string during adding + char *names; // array of pointers to all names used in the definition + char *curr_interval_names; + TYPELIB *curr_interval; + const char **curr_interval_array; + char *first_comment, *comment_pos; + char *curr_field_names; + uchar *null_flags, *null_pos; //pointers to the NULL bits during field adding + /* Needed for fixing indices according to handler abilities and features */ + handler *tmp_handler_file; + Field **field_ptr; + uint enum_set_values, enums_sets, comments_length; + uint new_field_pack_flag; + uchar null_bit_pos; + bool use_hash; /* use hash for field names */ + bool null_bits_are_used; + /* Flags of the building progress (also used by triggers) */ + bool is_keys_allocated, is_fields_allocated, + is_keynames_ready, is_fieldnames_ready, is_keyparts_ready, + is_comments_ready, is_fields_ready; + /* end of declaration of variables for building TABLE_SHARE*/
/* Set share's table cache key and update its db and table name appropriately. @@ -652,6 +686,96 @@ typedef struct st_table_share return (tmp_table == SYSTEM_TMP_TABLE || is_view) ? 0 : table_map_id; }
+ /* Table_share building functions */ + void init_metadata() + { + is_keys_allocated= is_fields_allocated= + is_keynames_ready= is_fieldnames_ready= is_comments_ready= + is_keyparts_ready= is_fields_ready= FALSE; + } + bool set_init_parameters(uchar frm_ver, ulong mysql_ver, uint db_create_opt, + ulong avg_row_len, enum ha_choice transact, + enum ha_choice page_chcksum, enum row_type rtype, + CHARSET_INFO *charset, bool null_fld_first, + ha_rows mi_rows, ha_rows ma_rows, + ulong reclen, uint extra_rec_buf_len, + uint key_blck_size, uint new_fld_pck_flg, + int &error); + bool allocate_indices_space(uint keys, uint parts, uint names_len, + int &error); + KEY *add_index(char *name, uint name_len, + uint flags, uint length, uint parts, + enum ha_key_alg alg, uint blk_size); + bool add_index_part(KEY *keyinfo, uint16 fieldnr, + uint offset, uint16 keytype, + uint16 length, uint16 flags, int &error); + inline bool add_index_names_as_onestring(uchar *pos, uint &len) + { + DBUG_ASSERT(is_keys_allocated); + DBUG_ASSERT(keyname == first_keyname); // add_key did not used names + len= (strmov(first_keyname, (char *) pos) - first_keyname) + 1; + is_keynames_ready= TRUE; + if (is_fieldnames_ready && parse_names()) + return TRUE; + return FALSE; + } + inline bool add_field_enum_set_names_as_onestring(uchar *pos, uint len) + { + DBUG_ASSERT(is_fields_allocated); + memcpy((char*) names, pos, len); + is_fieldnames_ready= TRUE; + if (is_keynames_ready && parse_names()) + return TRUE; + return FALSE; + } + inline void add_comments_as_one_string(const uchar *pos) + { + DBUG_ASSERT(is_fields_allocated); + memcpy(first_comment, pos, comments_length); + is_comments_ready= TRUE; + }
See above, I don't think that these *_onestring functions are necessary. if you use the pointers are provided by the caller, than open_binary_frm can parse these "one strings" as it wants and pass pointers down. Thus keeping this trickery out of the api.
+ + bool allocate_fields_space(uint fields_count, + uint null_flds, + uint enum_set_count, + uint enum_set_values_count, + uint field_names_length, + uint enum_set_names_length, + uint com_length, + handler *handler_file, + size_t tbl_com_length= 0, + char *tbl_com= NULL); + Field **add_field(const char *name, + uchar *pos, + uint32 field_length, + uint pack_flag, + enum_field_types field_type, + CHARSET_INFO *charset, + geometry_type geom_type, + bool auto_increment, + TYPELIB* enum_set, + char *comm_pos, + uint comment_length, + int &error); + Field **add_field_int(const char *name, + uchar *pos, + uint32 field_length, + uint pack_flag, + enum_field_types field_type, + CHARSET_INFO *charset, + geometry_type geom_type, + utype unireg_type, + TYPELIB* enum_set, + char *comment_pos, + uint comment_length, + Virtual_column_info *vcol_info, + bool fld_stored_in_db, + int &error); +private: + bool final_fields_keys_allocation(); + bool parse_names(); + bool connect_indexes_and_fields(int &error); + } TABLE_SHARE;
Regards, Sergei
participants (1)
-
Sergei Golubchik