[Maria-developers] InnoDB blob for primary key
Hello Developers, Hi this is sachin.Actually i was currently experimenting with with blob uniques in innodb there is three main problems 1.Unique blob 2.Blob Forigen key 3.Blob primary key 1. For blob unique we can simply store hash in unclustered index 2. Blob Forigen key i am currently working on it 3. Blob primary key :- for this i thought we create a 4 byte column which stores the hash of blob primary key.Internally this column will work as primary key and key for clustered index. I already successufully tested this here is my output MariaDB [sachin]> create table t4 (abc blob primary key); Query OK, 0 rows affected (0.10 sec) MariaDB [sachin]> insert into t4 values('sachin'); Query OK, 1 row affected (0.01 sec) MariaDB [sachin]> insert into t4 values('sachin'); ERROR 1062 (23000): Duplicate entry 'sachin' for key 'PRIMARY' MariaDB [sachin]> insert into t4 values('sachin setiya'); Query OK, 1 row affected (0.00 sec) MariaDB [sachin]> insert into t4 values('sachin setiya'); ERROR 1062 (23000): Duplicate entry 'sachin setiya' for key 'PRIMARY' MariaDB [sachin]> desc t4; +-------+------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------+------+-----+---------+-------+ | abc | blob | NO | PRI | NULL | | +-------+------+------+-----+---------+-------+ 1 row in set (0.01 sec) MariaDB [sachin]> select * from t4; +---------------+ | abc | +---------------+ | sachin | | sachin setiya | +---------------+ 2 rows in set (0.01 sec) @Sergei hi! Actually i invested arround 2 months in mariadb So for me now it does not matter either i got selected in gsoc i want to do work in innodb blob unique from today.Please sir allow me to do this I am including the patch file and t4.ibd and t4.frm file Regards sachin
Hi, Sachin, naturally you may continue, this is open source, please read https://mariadb.com/kb/en/mariadb/community-contributing-to-the-mariadb-proj... From InnoDB point of view there is issue if you add a new system generated column to the row. Firstly, existing tables will not have that column and secondly it will effect e.g maximum row size and external tools like backup. R: Jan On Mon, Apr 11, 2016 at 12:33 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hello Developers, Hi this is sachin.Actually i was currently experimenting with with blob uniques in innodb there is three main problems 1.Unique blob 2.Blob Forigen key 3.Blob primary key
1. For blob unique we can simply store hash in unclustered index 2. Blob Forigen key i am currently working on it 3. Blob primary key :- for this i thought we create a 4 byte column which stores the hash of blob primary key.Internally this column will work as primary key and key for clustered index. I already successufully tested this here is my output
MariaDB [sachin]> create table t4 (abc blob primary key); Query OK, 0 rows affected (0.10 sec)
MariaDB [sachin]> insert into t4 values('sachin'); Query OK, 1 row affected (0.01 sec)
MariaDB [sachin]> insert into t4 values('sachin'); ERROR 1062 (23000): Duplicate entry 'sachin' for key 'PRIMARY' MariaDB [sachin]> insert into t4 values('sachin setiya'); Query OK, 1 row affected (0.00 sec)
MariaDB [sachin]> insert into t4 values('sachin setiya'); ERROR 1062 (23000): Duplicate entry 'sachin setiya' for key 'PRIMARY'
MariaDB [sachin]> desc t4; +-------+------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------+------+-----+---------+-------+ | abc | blob | NO | PRI | NULL | | +-------+------+------+-----+---------+-------+ 1 row in set (0.01 sec)
MariaDB [sachin]> select * from t4; +---------------+ | abc | +---------------+ | sachin | | sachin setiya | +---------------+ 2 rows in set (0.01 sec)
@Sergei hi! Actually i invested arround 2 months in mariadb So for me now it does not matter either i got selected in gsoc i want to do work in innodb blob unique from today.Please sir allow me to do this
I am including the patch file and t4.ibd and t4.frm file Regards sachin
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hello Jan! Actually I was going through the problems raised by you 1>Firstly, existing tables will not have that column Yes Sir, you are right that existing table woudn't have DB_BLOB_HASH column, but this is fine because they indexed there blob column then they must have provided blob length, and if so then length of key_part would not be zero and so need to make changes in patch to detect this. And also by doing I can also eliminate another problem what if existing table have a coloumn named 'DB_BLOB_HASH' 2>secondly it will effect e.g maximum row size For this i can increase reclength by 4 in pack_header function(but i am not sure whether this will work or not) 3>and external tools like backup Here, I am assuming that by backup you mean mysqldump actually i tried this by create a table with blob coloum (length was given ) and than changing mariadb binary(to changed code) and .sql file of backup was working fine But I think there are more problems with this this prototype 1>Existing table with coloumn named 'DB_BLOB_HASH' .We can solve this as written above but some code in patch compares first coloumn name with 'DB_BLOB_HASH' so this will create problem Sir, can we add a flag in dict_table_t->flag so that we can reliable check for blob hash coloumn 2>This is very bad problem , what happen if if data is different and crc hash of data is same.And this will occur when we have medium amount of data like 100 thousand records.To avoid this we can do like 2.1> use crc 64 reduces probability 2.2> use two different algorithm for hash and treat these two algoriths hash for uniquesness.Make collusion probability very very less But the main problem is what would happen if collusion occurs to solve this i have two ideas 2.3> like 2.2 but instead of using two algorithm use one and add column for number of times this hash is repeated collusion frequency will be zero i think this will work fine but need to test it 2.4> whole prototype is crap start again with some different method. Can you please suggest some differnt way of doing this Regards sachin On Tue, Apr 12, 2016 at 4:52 PM, Jan Lindström <jan.lindstrom@mariadb.com> wrote:
Hi,
Sachin, naturally you may continue, this is open source, please read https://mariadb.com/kb/en/mariadb/community-contributing-to-the-mariadb-proj... From InnoDB point of view there is issue if you add a new system generated column to the row. Firstly, existing tables will not have that column and secondly it will effect e.g maximum row size and external tools like backup.
R: Jan
On Mon, Apr 11, 2016 at 12:33 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hello Developers, Hi this is sachin.Actually i was currently experimenting with with blob uniques in innodb there is three main problems 1.Unique blob 2.Blob Forigen key 3.Blob primary key
1. For blob unique we can simply store hash in unclustered index 2. Blob Forigen key i am currently working on it 3. Blob primary key :- for this i thought we create a 4 byte column which stores the hash of blob primary key.Internally this column will work as primary key and key for clustered index. I already successufully tested this here is my output
MariaDB [sachin]> create table t4 (abc blob primary key); Query OK, 0 rows affected (0.10 sec)
MariaDB [sachin]> insert into t4 values('sachin'); Query OK, 1 row affected (0.01 sec)
MariaDB [sachin]> insert into t4 values('sachin'); ERROR 1062 (23000): Duplicate entry 'sachin' for key 'PRIMARY' MariaDB [sachin]> insert into t4 values('sachin setiya'); Query OK, 1 row affected (0.00 sec)
MariaDB [sachin]> insert into t4 values('sachin setiya'); ERROR 1062 (23000): Duplicate entry 'sachin setiya' for key 'PRIMARY'
MariaDB [sachin]> desc t4; +-------+------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------+------+-----+---------+-------+ | abc | blob | NO | PRI | NULL | | +-------+------+------+-----+---------+-------+ 1 row in set (0.01 sec)
MariaDB [sachin]> select * from t4; +---------------+ | abc | +---------------+ | sachin | | sachin setiya | +---------------+ 2 rows in set (0.01 sec)
@Sergei hi! Actually i invested arround 2 months in mariadb So for me now it does not matter either i got selected in gsoc i want to do work in innodb blob unique from today.Please sir allow me to do this
I am including the patch file and t4.ibd and t4.frm file Regards sachin
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hello Sergei Sorry I did not see your mail. Actually i was thinking something like this before implementing the prototype but if i am more closer to innodb the more performance i will i get. I will definitively think about it. Regards sachin On Wed, Apr 13, 2016 at 9:17 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hello Jan!
Actually I was going through the problems raised by you
1>Firstly, existing tables will not have that column
Yes Sir, you are right that existing table woudn't have DB_BLOB_HASH column, but this is fine because they indexed there blob column then they must have provided blob length, and if so then length of key_part would not be zero and so need to make changes in patch to detect this. And also by doing I can also eliminate another problem what if existing table have a coloumn named 'DB_BLOB_HASH'
2>secondly it will effect e.g maximum row size For this i can increase reclength by 4 in pack_header function(but i am not sure whether this will work or not)
3>and external tools like backup Here, I am assuming that by backup you mean mysqldump actually i tried this by create a table with blob coloum (length was given ) and than changing mariadb binary(to changed code) and .sql file of backup was working fine
But I think there are more problems with this this prototype 1>Existing table with coloumn named 'DB_BLOB_HASH' .We can solve this as written above but some code in patch compares first coloumn name with 'DB_BLOB_HASH' so this will create problem Sir, can we add a flag in dict_table_t->flag so that we can reliable check for blob hash coloumn
2>This is very bad problem , what happen if if data is different and crc hash of data is same.And this will occur when we have medium amount of data like 100 thousand records.To avoid this we can do like
2.1> use crc 64 reduces probability 2.2> use two different algorithm for hash and treat these two algoriths hash for uniquesness.Make collusion probability very very less
But the main problem is what would happen if collusion occurs to solve this i have two ideas 2.3> like 2.2 but instead of using two algorithm use one and add column for number of times this hash is repeated collusion frequency will be zero i think this will work fine but need to test it 2.4> whole prototype is crap start again with some different method. Can you please suggest some differnt way of doing this Regards sachin
On Tue, Apr 12, 2016 at 4:52 PM, Jan Lindström <jan.lindstrom@mariadb.com> wrote:
Hi,
Sachin, naturally you may continue, this is open source, please read https://mariadb.com/kb/en/mariadb/community-contributing-to-the-mariadb-proj... From InnoDB point of view there is issue if you add a new system generated column to the row. Firstly, existing tables will not have that column and secondly it will effect e.g maximum row size and external tools like backup.
R: Jan
On Mon, Apr 11, 2016 at 12:33 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hello Developers, Hi this is sachin.Actually i was currently experimenting with with blob uniques in innodb there is three main problems 1.Unique blob 2.Blob Forigen key 3.Blob primary key
1. For blob unique we can simply store hash in unclustered index 2. Blob Forigen key i am currently working on it 3. Blob primary key :- for this i thought we create a 4 byte column which stores the hash of blob primary key.Internally this column will work as primary key and key for clustered index. I already successufully tested this here is my output
MariaDB [sachin]> create table t4 (abc blob primary key); Query OK, 0 rows affected (0.10 sec)
MariaDB [sachin]> insert into t4 values('sachin'); Query OK, 1 row affected (0.01 sec)
MariaDB [sachin]> insert into t4 values('sachin'); ERROR 1062 (23000): Duplicate entry 'sachin' for key 'PRIMARY' MariaDB [sachin]> insert into t4 values('sachin setiya'); Query OK, 1 row affected (0.00 sec)
MariaDB [sachin]> insert into t4 values('sachin setiya'); ERROR 1062 (23000): Duplicate entry 'sachin setiya' for key 'PRIMARY'
MariaDB [sachin]> desc t4; +-------+------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------+------+-----+---------+-------+ | abc | blob | NO | PRI | NULL | | +-------+------+------+-----+---------+-------+ 1 row in set (0.01 sec)
MariaDB [sachin]> select * from t4; +---------------+ | abc | +---------------+ | sachin | | sachin setiya | +---------------+ 2 rows in set (0.01 sec)
@Sergei hi! Actually i invested arround 2 months in mariadb So for me now it does not matter either i got selected in gsoc i want to do work in innodb blob unique from today.Please sir allow me to do this
I am including the patch file and t4.ibd and t4.frm file Regards sachin
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
participants (2)
-
Jan Lindström
-
Sachin Setia