Re: [Maria-developers] InnoDB blob for primary key
Hi Sergei! Actually I was going through the mysql source code for unique long constraints in file sql_tmp_table.cc in function create_tmp_table they make a new field and a new key(hash_key) and pass this table obejct to storage engine.They actually refer this field as a hash field On the time of insert they call bool check_unique_constraint(TABLE *table) function which first calculate the hash and store it in field then they see for duplicate hash and retrive ha_index_next_same if records are not same then record We can do the same thing in mariadb by adding one more field and key in mysql_prepare_create_table in this we check for blob with unlimited length or varchar for length greater then internal storage engine by doing this in mysql_prepare_create_table there will be no issues of frm file inconsistance. In case of insert first we will fill the hash field in fill_record function of sql_base.cc by first calculating the hash. Then we will retrive the index map using ha_index_read_map if returened value is zero then we will comapare two records and if they match then we will through error I am not sure where to place this code either in fill_record or later Or i can simple just fill hash in field in fill_record and then check for duplicates later on. Current I am not sure how to hide columns from user.Sir, can you suggest me where to look But there is one problem we can make unique key by this approch but not primary key because primary key is clustered and hashes can collide so i think we can't use hash field as primary key. To overcome this problem I have one idea instead of storing just hash we can make hash field length 10 bytes and in last two bytes we can store short int which tells how much time hash is repeated this can make hash unique in case of collusion. And also we are not doing more computation because we already retrive all records with same hashes. What do you think of this idea?. And there is one more problem how to make it foreign key. Will send you a prototype code tomorrow. Regards sachin On Fri, Apr 15, 2016 at 12:23 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Apr 13, Sachin Setia wrote:
Hello Sergei Sorry I did not see your mail. Actually i was thinking something like this before implementing the prototype but if i am more closer to innodb the more performance i will i get. I will definitively think about it.
Great!
Could you please tell me (mailing list, that is) what you think before next Monday (before April 18h, that is)?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei! As I already told you i was building prototype.It is some what completed apart from one thing comparing of two field values. the difficulty is how to get data length of field from table->record[1]. I will try to solve it. One more thing actually i got how mysql hide field. For example condsider three fields hash,data,data. mysql field pointer point at second field not at hash field and hash field ptr is stored in table->hash_field can we do something similar to store hash fields(if we make array of hashes in case of more than one unique).But will adding member variable cause problem? what do you think? Regards sachin On Fri, Apr 15, 2016 at 6:37 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hi Sergei!
Actually I was going through the mysql source code for unique long constraints in file sql_tmp_table.cc in function create_tmp_table they make a new field and a new key(hash_key) and pass this table obejct to storage engine.They actually refer this field as a hash field On the time of insert they call bool check_unique_constraint(TABLE *table) function which first calculate the hash and store it in field then they see for duplicate hash and retrive ha_index_next_same if records are not same then record
We can do the same thing in mariadb by adding one more field and key in mysql_prepare_create_table in this we check for blob with unlimited length or varchar for length greater then internal storage engine by doing this in mysql_prepare_create_table there will be no issues of frm file inconsistance.
In case of insert first we will fill the hash field in fill_record function of sql_base.cc by first calculating the hash. Then we will retrive the index map using ha_index_read_map if returened value is zero then we will comapare two records and if they match then we will through error I am not sure where to place this code either in fill_record or later Or i can simple just fill hash in field in fill_record and then check for duplicates later on.
Current I am not sure how to hide columns from user.Sir, can you suggest me where to look
But there is one problem we can make unique key by this approch but not primary key because primary key is clustered and hashes can collide so i think we can't use hash field as primary key. To overcome this problem I have one idea instead of storing just hash we can make hash field length 10 bytes and in last two bytes we can store short int which tells how much time hash is repeated this can make hash unique in case of collusion. And also we are not doing more computation because we already retrive all records with same hashes. What do you think of this idea?. And there is one more problem how to make it foreign key.
Will send you a prototype code tomorrow. Regards sachin
On Fri, Apr 15, 2016 at 12:23 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Apr 13, Sachin Setia wrote:
Hello Sergei Sorry I did not see your mail. Actually i was thinking something like this before implementing the prototype but if i am more closer to innodb the more performance i will i get. I will definitively think about it.
Great!
Could you please tell me (mailing list, that is) what you think before next Monday (before April 18h, that is)?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hello Sergei! Hi Actually I was thinking about how to implement blob as a foreign key.Foreign has to be unique which we can already implement. To make it foreign key we can either store unique hash or the whole blob column. But I am not sure much people want to copy so long blob data in reference table. Second option would be use blob hash as a reference key. But user can not directly us hash as a reference key because that is hidden. What I was thinking of a clear to programmer way of using blob hash. Suppose user can directly create blob hash column ,use that column as a primary key or foreign key.Like create table t1(abc blob , blob hash(abc))//this will just create blob hash column create table t1(abc blob,unique(blob hash(abc))) // this will create unique blob hash column and similar for primary key and foreign key user can enter hash value if they have some good algorithm or if they do not give any value we will automatically create and store hash. What do you think? sir. Regards sachin On Sat, Apr 16, 2016 at 7:37 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hi Sergei! As I already told you i was building prototype.It is some what completed apart from one thing comparing of two field values. the difficulty is how to get data length of field from table->record[1]. I will try to solve it. One more thing actually i got how mysql hide field. For example condsider three fields hash,data,data. mysql field pointer point at second field not at hash field and hash field ptr is stored in table->hash_field can we do something similar to store hash fields(if we make array of hashes in case of more than one unique).But will adding member variable cause problem? what do you think? Regards sachin
On Fri, Apr 15, 2016 at 6:37 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hi Sergei!
Actually I was going through the mysql source code for unique long constraints in file sql_tmp_table.cc in function create_tmp_table they make a new field and a new key(hash_key) and pass this table obejct to storage engine.They actually refer this field as a hash field On the time of insert they call bool check_unique_constraint(TABLE *table) function which first calculate the hash and store it in field then they see for duplicate hash and retrive ha_index_next_same if records are not same then record
We can do the same thing in mariadb by adding one more field and key in mysql_prepare_create_table in this we check for blob with unlimited length or varchar for length greater then internal storage engine by doing this in mysql_prepare_create_table there will be no issues of frm file inconsistance.
In case of insert first we will fill the hash field in fill_record function of sql_base.cc by first calculating the hash. Then we will retrive the index map using ha_index_read_map if returened value is zero then we will comapare two records and if they match then we will through error I am not sure where to place this code either in fill_record or later Or i can simple just fill hash in field in fill_record and then check for duplicates later on.
Current I am not sure how to hide columns from user.Sir, can you suggest me where to look
But there is one problem we can make unique key by this approch but not primary key because primary key is clustered and hashes can collide so i think we can't use hash field as primary key. To overcome this problem I have one idea instead of storing just hash we can make hash field length 10 bytes and in last two bytes we can store short int which tells how much time hash is repeated this can make hash unique in case of collusion. And also we are not doing more computation because we already retrive all records with same hashes. What do you think of this idea?. And there is one more problem how to make it foreign key.
Will send you a prototype code tomorrow. Regards sachin
On Fri, Apr 15, 2016 at 12:23 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Apr 13, Sachin Setia wrote:
Hello Sergei Sorry I did not see your mail. Actually i was thinking something like this before implementing the prototype but if i am more closer to innodb the more performance i will i get. I will definitively think about it.
Great!
Could you please tell me (mailing list, that is) what you think before next Monday (before April 18h, that is)?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Sachin! On Apr 18, Sachin Setia wrote:
Hi Actually I was thinking about how to implement blob as a foreign key.Foreign has to be unique which we can already implement. To make it foreign key we can either store unique hash or the whole blob column. But I am not sure much people want to copy so long blob data in reference table.
Agree :)
Second option would be use blob hash as a reference key. But user can not directly us hash as a reference key because that is hidden. What I was thinking of a clear to programmer way of using blob hash. Suppose user can directly create blob hash column ,use that column as a primary key or foreign key.Like create table t1(abc blob , blob hash(abc))//this will just create blob hash column create table t1(abc blob,unique(blob hash(abc))) // this will create unique blob hash column and similar for primary key and foreign key user can enter hash value if they have some good algorithm or if they do not give any value we will automatically create and store hash. What do you think? sir.
Mixed feelings. First, I wrote in an earlier email about using virtual columns for that. In this line of thoughts, a user-specified hash function is absolutely possible and logical. On the other hand, I don't see why anyone would need that - a hash is not guaranteed to be unique, no matter what algorithm one uses. And, as I wrote in an earlier email, I don't think that primary/foreign keys on blobs is a feature worth spending time on. Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Sachin! On Apr 16, Sachin Setia wrote:
Hi Sergei! As I already told you i was building prototype.It is some what completed apart from one thing comparing of two field values. the difficulty is how to get data length of field from table->record[1]. I will try to solve it.
Do you mean, a length of the blob value? Try field->get_length()
One more thing actually i got how mysql hide field. For example condsider three fields hash,data,data. mysql field pointer point at second field not at hash field and hash field ptr is stored in table->hash_field can we do something similar to store hash fields(if we make array of hashes in case of more than one unique).But will adding member variable cause problem? what do you think?
Well, yes, that would work. But I see few other features where field hiding might be useful, and in these cases this simple trick with moving table->field pointer will not work, unfortunately. So we might need something more complex. But, again, this is not a most important part of this project, so don't start from it. In fact, I'm rather interested to know about the adaptive hashing that you've mentioned. Can you tell me more about it, please?
diff --git a/sql/sql_base.cc b/sql/sql_base.cc index ac2162b..291a3e2 100644 --- a/sql/sql_base.cc +++ b/sql/sql_base.cc @@ -8956,6 +8964,31 @@ fill_record(THD *thd, TABLE *table, Field **ptr, List<Item> &values, goto err; field->set_explicit_default(value); } + table->file->ha_index_init(0,0); + res = table->file->ha_index_read_map(table->record[1],hash_field->ptr,HA_WHOLE_KEY,HA_READ_KEY_EXACT); + while(!res){ + //compare the record if not sure how to compare it so just assume it works + diff = table->record[1]-table->record[0]; + src_length = blob_field->data_length(); + //dest_length = blob_field->data_length(table->record[1]); // i dont know how to get the length from record 1 + // so i am enable to do this + // then we can comapare records using + //field->cmp_max + //this is mysql code + /* if (!(table->distinct ? + table_rec_cmp(table) : + group_rec_cmp(table->group, table->record[0], table->record[1]))) + return false; // skip it + res= table->file->ha_index_next_same(table->record[1], + table->hash_field->ptr, + sizeof(hash)); */ + //fetch the next record + res= table->file->ha_index_next_same(table->record[1], + hash_field->ptr, + 8); + + }
not quite, you should check unique constraints in fill_record(). This should happen when the row is actually inserted or updated. A good place for this check, I'd say, is handler::ha_write_row() and handler::ha_update_row(). Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Sachin! On Apr 15, Sachin Setia wrote:
Hi Sergei!
Actually I was going through the mysql source code for unique long constraints in file sql_tmp_table.cc in function create_tmp_table they make a new field and a new key(hash_key) and pass this table obejct to storage engine.They actually refer this field as a hash field On the time of insert they call bool check_unique_constraint(TABLE *table) function which first calculate the hash and store it in field then they see for duplicate hash and retrive ha_index_next_same if records are not same then record
Right. Very good!
We can do the same thing in mariadb by adding one more field and key in mysql_prepare_create_table in this we check for blob with unlimited length or varchar for length greater then internal storage engine by doing this in mysql_prepare_create_table there will be no issues of frm file inconsistance.
In case of insert first we will fill the hash field in fill_record function of sql_base.cc by first calculating the hash. Then we will retrive the index map using ha_index_read_map if returened value is zero then we will comapare two records and if they match then we will through error I am not sure where to place this code either in fill_record or later Or i can simple just fill hash in field in fill_record and then check for duplicates later on.
MariaDB supports "virtual columns", see https://mariadb.com/kb/en/mariadb/virtual-computed-columns/ So, it might be enough to mark this hash column as virtual, and will be automatically calculated when necessary (in fill_record, etc).
Current I am not sure how to hide columns from user.Sir, can you suggest me where to look
This would need a new flag per field, like "hidden". And in all commands that use a list of columns (SHOW, INFORMATION_SCHEMA tables, SELECT *, INSERT table VALUE (...), etc) - this column should be skipped. But don't worry, I don't think this will be a problem for you. You can, of course, start from not hiding this column and implement this hiddden flag later.
But there is one problem we can make unique key by this approch but not primary key because primary key is clustered and hashes can collide so i think we can't use hash field as primary key.
Of course.
To overcome this problem I have one idea instead of storing just hash we can make hash field length 10 bytes and in last two bytes we can store short int which tells how much time hash is repeated this can make hash unique in case of collusion. And also we are not doing more computation because we already retrive all records with same hashes. What do you think of this idea?. And there is one more problem how to make it foreign key.
I would say, there is no need to bother. Let's just say that UNIQUE for blobs is only a constraint, can not be used as a primary key, can not be used as a foreign key. This is a reasonable limitation, I think :) Regards, Sergei Chief Architect MariaDB and security@mariadb.org
participants (2)
-
Sachin Setia
-
Sergei Golubchik