Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hello Sergei Actually I was prototyping for blob and varchar for aria and myisam storage engine. My prototype worked for complex definations like craete table(abc int primary key, blob_col blob unique, varchar_col varchar(1000) unique) engine=myisam; Solved the issue of frm file incosistance. As you suggested for doing it for innodb i am current working on it.Innodb does not natively support hash based index. when we run select distinct column from tbl; it use create_internal_tmp_table() which uses maria storage engine for creating tmp table. But query like this works MariaDB [sachin]> create table iu2(abc blob unique); Query OK, 0 rows affected (0.04 sec) MariaDB [sachin]> insert into iu2 values(1); Query OK, 1 row affected (0.03 sec) MariaDB [sachin]> insert into iu2 values(1); ERROR 1062 (23000): Duplicate entry '1' for key 'abc' this query does not use hash but it simply compares values Will write a proposal shortly. On Tue, Mar 22, 2016 at 4:20 PM, Sachin Setia <sachinsetia1001@gmail.com> wrote:
Hello Sergi Actually I was prototyping for blob and varchar for aria and myisam storage engine. My prototype worked for complex definations like craete table(abc int primary key, blob_col blob unique, varchar_col varchar(1000) unique) engine=myisam; Solved the issue of frm file incosistance.
As you suggested for doing it for innodb i am current working on it.Innodb does not natively support hash based index. when we run select distinct column from tbl; it use create_internal_tmp_table() which uses maria storage engine for creating tmp table. But query like this works MariaDB [sachin]> create table iu2(abc blob unique); Query OK, 0 rows affected (0.04 sec)
MariaDB [sachin]> insert into iu2 values(1); Query OK, 1 row affected (0.03 sec)
MariaDB [sachin]> insert into iu2 values(1); ERROR 1062 (23000): Duplicate entry '1' for key 'abc' this query does not use hash but it simply compares values Will write a proposal shortly.
Regards sachin
On Sat, Mar 19, 2016 at 1:52 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Mar 18, Sachin Setia wrote:
ERROR 1030 (HY000): Got error 190 "Incompatible key or row definition between the MariaDB .frm file and the information in the storage engine. You have to dump an" from storage engine MyISAM
We are getting this becuase in mi_create for each unique_def it creates
keydef and writes it.And this creates two problem 1. frm keydef algorithm is different from saved kefdef algorithm(always zero) for the time I have solved this problem .
2.Mismatch between keydef's keysegs the reason for this is when mi_create creates keyseg for unique_def it did not keeps the orignal uniquedef's keyseg parameters in mind like language start length which creates
one problem
in check_definition function in ha_myisam.cc.I am currently working on it Once again sorry for this foolish mistake. Regars sachin
No problem, everyone makes mistakes :)
It's a prototype, after all. It's much more important that you understand how the code works and why it doesn't work.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Sachin! On Mar 22, Sachin Setia wrote:
Hello Sergei Actually I was prototyping for blob and varchar for aria and myisam storage engine. My prototype worked for complex definations like craete table(abc int primary key, blob_col blob unique, varchar_col varchar(1000) unique) engine=myisam; Solved the issue of frm file incosistance.
As you suggested for doing it for innodb i am current working on it.Innodb
*I* did not suggest that. But you're welcome to try, of course. If you think that just MyISAM is too simple for a three month project. (Aria doesn't count it's a couple of days after you done MyISAM).
does not natively support hash based index. when we run select distinct column from tbl; it use create_internal_tmp_table() which uses maria storage engine for creating tmp table. But query like this works MariaDB [sachin]> create table iu2(abc blob unique); Query OK, 0 rows affected (0.04 sec)
MariaDB [sachin]> insert into iu2 values(1); Query OK, 1 row affected (0.03 sec)
MariaDB [sachin]> insert into iu2 values(1); ERROR 1062 (23000): Duplicate entry '1' for key 'abc' this query does not use hash but it simply compares values
Interesting.
Will write a proposal shortly.
Okay. Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hello Sergei Today I made some progress related to project. MyISAM/ARIA Got clear understanding of how to implement unique index for query like create table tbl(col1 int primary key , col2 blob ,col3 blob , unique(col2,col3)) InnoDB Reading about it.Actually Sir, I want to do this project whether I will select in gsoc or not(because InnoDB is amazing). Proposal Still Writing Actually sir i have one doubt in table2myisam function definition recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF), ^^^^^ ^ ^ why we allocating these many number of recinfo because we only require share->fields + 1 . One more doubt in optimizing "select distinct coloumn_name(here it is a blob coloumn) from table" query. In mi write which take one record and write it we check for unique constraint. It takes O(n^2) time. I was thinking if we can optimize this by first fetching the whole table record and calculating hash for each record.Instead of comparing one hash with all other we can sort the hashes and ignore the duplicate (we can make an array of 0 and 1 and if it 1 that means record is not duplicate and for 0 it is duplicte) .buy doing this we can reduce the time complexity to O(nlog(n)).This will work fast if we have enough buffer_storage in case of low buffer memory this will turn to tradeoff between cpu and i/o requests because in order to sort keys in low ram we need to use m way merge sort which ultimately result in more I/O because we have to send back records to hard disk which we can not store in ram and then once again fetch unique record for storing in tmp table.But we can get performance if records fit in ram .For caching the records we can do it over here sql/sql_select.cc 18313 error= info->read_record(info); Regards sachin On Wed, Mar 23, 2016 at 12:06 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Sachin!
On Mar 22, Sachin Setia wrote:
Hello Sergei Actually I was prototyping for blob and varchar for aria and myisam storage engine. My prototype worked for complex definations like craete table(abc int primary key, blob_col blob unique, varchar_col varchar(1000) unique) engine=myisam; Solved the issue of frm file incosistance.
As you suggested for doing it for innodb i am current working on it.Innodb
*I* did not suggest that. But you're welcome to try, of course. If you think that just MyISAM is too simple for a three month project. (Aria doesn't count it's a couple of days after you done MyISAM).
does not natively support hash based index. when we run select distinct column from tbl; it use create_internal_tmp_table() which uses maria storage engine for creating tmp table. But query like this works MariaDB [sachin]> create table iu2(abc blob unique); Query OK, 0 rows affected (0.04 sec)
MariaDB [sachin]> insert into iu2 values(1); Query OK, 1 row affected (0.03 sec)
MariaDB [sachin]> insert into iu2 values(1); ERROR 1062 (23000): Duplicate entry '1' for key 'abc' this query does not use hash but it simply compares values
Interesting.
Will write a proposal shortly.
Okay.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Sachin! On Mar 23, Sachin Setia wrote:
Hello Sergei Today I made some progress related to project. MyISAM/ARIA Got clear understanding of how to implement unique index for query like create table tbl(col1 int primary key , col2 blob ,col3 blob , unique(col2,col3)) InnoDB Reading about it.Actually Sir, I want to do this project whether I will select in gsoc or not(because InnoDB is amazing).
:)
Proposal Still Writing
don't miss the deadline :)
Actually sir i have one doubt in table2myisam function definition
recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF), ^^^^^ ^ ^ why we allocating these many number of recinfo because we only require share->fields + 1 .
good question :) I don't know. this line apparently goes back at least to 2001, there is no revision history beyond that date. It could be that it allocates twice as much columns as necessary nowadays, even if it made sense many years ago.
One more doubt in optimizing "select distinct coloumn_name(here it is a blob coloumn) from table" query. In mi write which take one record and write it we check for unique constraint. It takes O(n^2) time. I
This isnt O(n^2), because hashes are stored in the index, in a b-tree. So, it's O(n*log(n)).
was thinking if we can optimize this by first fetching the whole table record and calculating hash for each record.Instead of comparing one hash with all other we can sort the hashes and ignore the duplicate (we can make an array of 0 and 1 and if it 1 that means record is not duplicate and for 0 it is duplicte). by doing this we can reduce the time complexity to O(nlog(n)).
As you see, we already have O(n*log(n)). But if we put these hashes into a hash table in memory (instead of just sorting them), the cost will go down to O(n). Sounds interesting :) Regards, Sergei Chief Architect MariaDB and security@mariadb.org
participants (2)
-
Sachin Setia
-
Sergei Golubchik