[Maria-developers] GSoC 2016:Unique indexes for blobs

newer
Re: [Maria-developers] InnoDB blob...

older
[Maria-developers] Fwd: Do we have...

Shubham Barai

3 Mar 2016 3 Mar '16

2:32 p.m.

Hello everyone, I am Shubham Barai, a 3rd-year undergraduate student in computer science at Maharashtra Institute of technology, Pune, India. I am interested in the project "Unique indexes for blobs".I have read the description of the project given on the link https://jira.mariadb.org/browse/MDEV-371. MyISAM and innoDB do allow users to create unique indexes on blobs with some limitations on prefix length. So end user can create an index on blobs using "create table" or "create index "statement. example:CREATE TABLE some_table (blob_column BLOB, UNIQUE (blob_column(700))); For MyISAM tables, the prefix limit is 1000 bytes and for innoDB limit is 767 bytes. After creating an index, I ran some queries like "explain select distinct blob_column from some_table" and found out that server executes this query with the help of the temporary table but for other data types like int, the server executes the query with the help of the index. I want to know what exactly is the task for this project.Any help would be greatly appreciated Thank You, Shubham.

Attachments:

attachment.html (text/html — 4.5 KB)

Show replies by date

Sergei Golubchik

4 Mar 4 Mar

11:22 p.m.

Hi, Shubham! On Mar 03, Shubham Barai wrote:

...

I am interested in the project "Unique indexes for blobs".I have read the description of the project given on the link https://jira.mariadb.org/browse/MDEV-371.

Great!

...

I want to know what exactly is the task for this project.Any help would be greatly appreciated

See storage/myisam/mi_create.c. You see that a myisam table can have keys (defined in MI_KEYDEF structure) and so-called "uniques" (defined by MI_UNIQUEDEF). Keys have length limitation, keys can be unique (HA_NOSAME flag) or not unique. "uniques" have no length limitation. When one creates a unique key from SQL: CREATE TABLE ... (... UNIQUE KEY (...) ... ) this will always create a key (MI_KEYDEF) in a MyISAM table. That's why unique constraints in MyISAM have a limited length. This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF "keys" for long unique constraints. Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Shubham Barai

16 Mar 16 Mar

4:48 a.m.

Hello, Sergie! I tried to explore the source code from mi_create.c,ha_myisam.cc,sql/sql_table.cc,include/myisam.h,.and some other files. The main task is to create MI_UNIQUEDEF "uniques" for long unique constraints. We have to consider all the cases where we need to create MI_UNIQUEDEF instead of MI_KEYDEF. It will include queries like create table table1 (blob_column blob,unique(blob_column) ); create table table1 (a int,blob_column blob,unique(a,blob_column) ); create table table1 (a int,blob_column1 blob,blob_column2 blob,unique(blob_column1(300),blob_column2) ); (key with multiple blob columns and one of the blob column specified with prefix length). I think we have to create MI_UNIQUEDEF if any one of the columns in a key is a blob field without prefix length. In sql/sql_table, mysql_prepare_create_table is the function which prepares the table and key structures for table creation in mi_create. It generates an error if any one of the blob fields in a key is specified without length. Currently, this task is limited to MyISAM, so if any other storage engine is selected, we have to generate the same error in mysql_prepare_create_table. In storage/myisam/ha_myisam.cc, table2myisam is a function which allocates and initializes myisam key and column definitions.The current function prototype of table2myisam is table2myisam(TABLE *table_arg, MI_KEYDEF **keydef_out, MI_COLUMNDEF **recinfo_out, uint records_out) We have to change it to table2myisam(TABLE *table_arg,MI_KEYDEF **keydef_out,MI_UNIQUEDEF ** uniquedef_out,MI_COLUMNDEF **recinfo_out,uint records_out) table2myisam initializes all the key definitions from table_arg->keyinfo. So we can set a new flag (say uniquedef) in a keyinfo struct in mysql_prepare_create_table if any one of the key_part consists of blob field without prefix length. Later we can check the flag in table2myisam to see if we want to create MI_KEYDEF or MI_UNIQUEDEF. Thanks, Shubham. On 5 March 2016 at 03:52, Sergei Golubchik <serg@mariadb.org> wrote:

...

Hi, Shubham!

On Mar 03, Shubham Barai wrote:

...
I am interested in the project "Unique indexes for blobs".I have read the description of the project given on the link https://jira.mariadb.org/browse/MDEV-371.

Great!

...
I want to know what exactly is the task for this project.Any help would be greatly appreciated

See storage/myisam/mi_create.c. You see that a myisam table can have keys (defined in MI_KEYDEF structure) and so-called "uniques" (defined by MI_UNIQUEDEF).

Keys have length limitation, keys can be unique (HA_NOSAME flag) or not unique.

"uniques" have no length limitation.

When one creates a unique key from SQL:

CREATE TABLE ... (... UNIQUE KEY (...) ... )

this will always create a key (MI_KEYDEF) in a MyISAM table. That's why unique constraints in MyISAM have a limited length.

This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF "keys" for long unique constraints.

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Sachin Setia

5:22 a.m.

Hello everyone @Shubham As I said i am just prototyping This is not the final code It is just poc to see whether if I send some unique key will mi_create work fine or not as it is working fine in create_internal_tmp_table on following query "select distinct(a2) from tbl" where schema of tbl will be a1 int primary key ,a2 blob On Wed, Mar 16, 2016 at 9:18 AM, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...

Hello, Sergie! I tried to explore the source code from mi_create.c,ha_myisam.cc,sql/sql_table.cc,include/myisam.h,.and some other files. The main task is to create MI_UNIQUEDEF "uniques" for long unique constraints. We have to consider all the cases where we need to create MI_UNIQUEDEF instead of MI_KEYDEF.

It will include queries like create table table1 (blob_column blob,unique(blob_column) ); create table table1 (a int,blob_column blob,unique(a,blob_column) ); create table table1 (a int,blob_column1 blob,blob_column2 blob,unique(blob_column1(300),blob_column2) ); (key with multiple blob columns and one of the blob column specified with prefix length).

I think we have to create MI_UNIQUEDEF if any one of the columns in a key is a blob field without prefix length.

In sql/sql_table, mysql_prepare_create_table is the function which prepares the table and key structures for table creation in mi_create. It generates an error if any one of the blob fields in a key is specified without length. Currently, this task is limited to MyISAM, so if any other storage engine is selected, we have to generate the same error in mysql_prepare_create_table.

In storage/myisam/ha_myisam.cc, table2myisam is a function which allocates and initializes myisam key and column definitions.The current function prototype of table2myisam is table2myisam(TABLE *table_arg, MI_KEYDEF **keydef_out, MI_COLUMNDEF **recinfo_out, uint records_out) We have to change it to table2myisam(TABLE *table_arg,MI_KEYDEF **keydef_out,MI_UNIQUEDEF ** uniquedef_out,MI_COLUMNDEF **recinfo_out,uint records_out)

table2myisam initializes all the key definitions from table_arg->keyinfo. So we can set a new flag (say uniquedef) in a keyinfo struct in mysql_prepare_create_table if any one of the key_part consists of blob field without prefix length. Later we can check the flag in table2myisam to see if we want to create MI_KEYDEF or MI_UNIQUEDEF.

Thanks, Shubham.

On 5 March 2016 at 03:52, Sergei Golubchik <serg@mariadb.org> wrote:

...
Hi, Shubham!

On Mar 03, Shubham Barai wrote:

...
I am interested in the project "Unique indexes for blobs".I have read

the

...
description of the project given on the link https://jira.mariadb.org/browse/MDEV-371.

Great!

...
I want to know what exactly is the task for this project.Any help would be greatly appreciated

See storage/myisam/mi_create.c. You see that a myisam table can have keys (defined in MI_KEYDEF structure) and so-called "uniques" (defined by MI_UNIQUEDEF).

Keys have length limitation, keys can be unique (HA_NOSAME flag) or not unique.

"uniques" have no length limitation.

When one creates a unique key from SQL:

CREATE TABLE ... (... UNIQUE KEY (...) ... )

this will always create a key (MI_KEYDEF) in a MyISAM table. That's why unique constraints in MyISAM have a limited length.

This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF "keys" for long unique constraints.

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

Sergei Golubchik

1:25 p.m.

Hi, Shubham! On Mar 16, Shubham Barai wrote:

...

Hello, Sergie! I tried to explore the source code from mi_create.c,ha_myisam.cc,sql/sql_table.cc,include/myisam.h,.and some other files. The main task is to create MI_UNIQUEDEF "uniques" for long unique constraints. We have to consider all the cases where we need to create MI_UNIQUEDEF instead of MI_KEYDEF.

It will include queries like create table table1 (blob_column blob,unique(blob_column) ); create table table1 (a int,blob_column blob,unique(a,blob_column) ); create table table1 (a int,blob_column1 blob,blob_column2 blob,unique(blob_column1(300),blob_column2) ); (key with multiple blob columns and one of the blob column specified with prefix length).

I think we have to create MI_UNIQUEDEF if any one of the columns in a key is a blob field without prefix length.

Yes. And also for any other UNIQUE constraint that is too long for a normal index. For example, many long VARCHAR columns. But, of course, for a prototype one can start just with blobs.

...

In sql/sql_table, mysql_prepare_create_table is the function which prepares the table and key structures for table creation in mi_create. It generates an error if any one of the blob fields in a key is specified without length. Currently, this task is limited to MyISAM, so if any other storage engine is selected, we have to generate the same error in mysql_prepare_create_table.

There's a generic check whether the key is too long. It can be used here, it doesn't depend on the storage engine.

...

In storage/myisam/ha_myisam.cc, table2myisam is a function which allocates and initializes myisam key and column definitions.The current function prototype of table2myisam is table2myisam(TABLE *table_arg, MI_KEYDEF **keydef_out, MI_COLUMNDEF **recinfo_out, uint records_out) We have to change it to table2myisam(TABLE *table_arg,MI_KEYDEF **keydef_out,MI_UNIQUEDEF ** uniquedef_out,MI_COLUMNDEF **recinfo_out,uint records_out)

Right.

...

table2myisam initializes all the key definitions from table_arg->keyinfo. So we can set a new flag (say uniquedef) in a keyinfo struct in mysql_prepare_create_table if any one of the key_part consists of blob field without prefix length.

There's a field 'algorithm' already. Because MI_UNIQUEDEF in MyISAM is, basically, an index of hashed column values, it is kind of a hash index. So you can use algorithm=HA_KEY_ALG_HASH to mark such columns. And the user will be able to create these indexes explicitly with create table table1 (blob_column blob,unique(blob_column) using hash);

...

Later we can check the flag in table2myisam to see if we want to create MI_KEYDEF or MI_UNIQUEDEF.

Thanks, Shubham.

Very good! Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Shubham Barai

19 Mar 19 Mar

9:52 a.m.

Hello Sergei, As you suggested, I tried to develop a prototype just for blobs. So far it is working fine but I am still verifying it for different cases. I will provide you the link to my Github repository shortly. Thanks, Shubham On 16 March 2016 at 17:55, Sergei Golubchik <serg@mariadb.org> wrote:

...

Hi, Shubham!

On Mar 16, Shubham Barai wrote:

...
Hello, Sergie! I tried to explore the source code from mi_create.c,ha_myisam.cc,sql/sql_table.cc,include/myisam.h,.and some other files. The main task is to create MI_UNIQUEDEF "uniques" for long unique constraints. We have to consider all the cases where we need to create MI_UNIQUEDEF instead of MI_KEYDEF.

It will include queries like create table table1 (blob_column blob,unique(blob_column) ); create table table1 (a int,blob_column blob,unique(a,blob_column) ); create table table1 (a int,blob_column1 blob,blob_column2 blob,unique(blob_column1(300),blob_column2) ); (key with multiple blob columns and one of the blob column specified with prefix length).

I think we have to create MI_UNIQUEDEF if any one of the columns in a key is a blob field without prefix length.

Yes. And also for any other UNIQUE constraint that is too long for a normal index. For example, many long VARCHAR columns.

But, of course, for a prototype one can start just with blobs.

...
In sql/sql_table, mysql_prepare_create_table is the function which prepares the table and key structures for table creation in mi_create. It generates an error if any one of the blob fields in a key is specified without length. Currently, this task is limited to MyISAM, so if any other storage engine is selected, we have to generate the same error in mysql_prepare_create_table.

There's a generic check whether the key is too long. It can be used here, it doesn't depend on the storage engine.

...
In storage/myisam/ha_myisam.cc, table2myisam is a function which allocates and initializes myisam key and column definitions.The current function prototype of table2myisam is table2myisam(TABLE *table_arg, MI_KEYDEF **keydef_out, MI_COLUMNDEF **recinfo_out, uint records_out) We have to change it to table2myisam(TABLE *table_arg,MI_KEYDEF **keydef_out,MI_UNIQUEDEF ** uniquedef_out,MI_COLUMNDEF **recinfo_out,uint records_out)

Right.

...
table2myisam initializes all the key definitions from table_arg->keyinfo. So we can set a new flag (say uniquedef) in a keyinfo struct in mysql_prepare_create_table if any one of the key_part consists of blob field without prefix length.

There's a field 'algorithm' already. Because MI_UNIQUEDEF in MyISAM is, basically, an index of hashed column values, it is kind of a hash index. So you can use algorithm=HA_KEY_ALG_HASH to mark such columns. And the user will be able to create these indexes explicitly with

create table table1 (blob_column blob,unique(blob_column) using hash);

...
Later we can check the flag in table2myisam to see if we want to create MI_KEYDEF or MI_UNIQUEDEF.

Thanks, Shubham.

Very good!

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Shubham Barai

20 Mar 20 Mar

6:40 a.m.

Hi Sergei, Sorry! I made a mistake. Actually, when I changed my storage engine to MyISAM and executed create table statement(with unique blobs and no prefix length),mysqld crashed. After reconnecting to the server, the storage engine got changed to default (InnoDB).I didn't know about that and I executed all queries on InnoDB. I thought my patch was working for MyISAM but there are some issues.I am working on it but it might require some time to get it working. Since application deadline for GSoC is 25 March, I will try to submit my proposal as early as possible. Thanks, Shubham On 19 March 2016 at 14:22, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...

Hello Sergei,

As you suggested, I tried to develop a prototype just for blobs. So far it is working fine but I am still verifying it for different cases. I will provide you the link to my Github repository shortly.

Thanks, Shubham

On 16 March 2016 at 17:55, Sergei Golubchik <serg@mariadb.org> wrote:

...
Hi, Shubham!

On Mar 16, Shubham Barai wrote:

...
Hello, Sergie! I tried to explore the source code from mi_create.c,ha_myisam.cc,sql/sql_table.cc,include/myisam.h,.and some other files. The main task is to create MI_UNIQUEDEF "uniques" for long unique constraints. We have to consider all the cases where we need to create MI_UNIQUEDEF instead of MI_KEYDEF.

It will include queries like create table table1 (blob_column blob,unique(blob_column) ); create table table1 (a int,blob_column blob,unique(a,blob_column) ); create table table1 (a int,blob_column1 blob,blob_column2 blob,unique(blob_column1(300),blob_column2) ); (key with multiple blob columns and one of the blob column specified with prefix length).

I think we have to create MI_UNIQUEDEF if any one of the columns in a key is a blob field without prefix length.

Yes. And also for any other UNIQUE constraint that is too long for a normal index. For example, many long VARCHAR columns.

But, of course, for a prototype one can start just with blobs.

...
In sql/sql_table, mysql_prepare_create_table is the function which prepares the table and key structures for table creation in mi_create. It generates an error if any one of the blob fields in a key is specified without length. Currently, this task is limited to MyISAM, so if any other storage engine is selected, we have to generate the same error in mysql_prepare_create_table.

There's a generic check whether the key is too long. It can be used here, it doesn't depend on the storage engine.

...
In storage/myisam/ha_myisam.cc, table2myisam is a function which allocates and initializes myisam key and column definitions.The current function prototype of table2myisam is table2myisam(TABLE *table_arg, MI_KEYDEF **keydef_out, MI_COLUMNDEF **recinfo_out, uint records_out) We have to change it to table2myisam(TABLE *table_arg,MI_KEYDEF **keydef_out,MI_UNIQUEDEF ** uniquedef_out,MI_COLUMNDEF **recinfo_out,uint records_out)

Right.

...
table2myisam initializes all the key definitions from table_arg->keyinfo. So we can set a new flag (say uniquedef) in a keyinfo struct in mysql_prepare_create_table if any one of the key_part consists of blob field without prefix length.

There's a field 'algorithm' already. Because MI_UNIQUEDEF in MyISAM is, basically, an index of hashed column values, it is kind of a hash index. So you can use algorithm=HA_KEY_ALG_HASH to mark such columns. And the user will be able to create these indexes explicitly with

create table table1 (blob_column blob,unique(blob_column) using hash);

...
Later we can check the flag in table2myisam to see if we want to create MI_KEYDEF or MI_UNIQUEDEF.

Thanks, Shubham.

Very good!

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Shubham Barai

3:43 p.m.

Hi Sergei, As I told you, I was working on prototype just for blobs. I was having some issue with key definition between MariaDB .frm file and information in the MyISAM storage engine. I have solved that problem temporarily. Here is my output MariaDB [database1]> set storage_engine=myisam; Query OK, 0 rows affected (0.00 sec) MariaDB [database1]> create database mydb; Query OK, 1 row affected (0.00 sec) MariaDB [database1]> use mydb; Database changed MariaDB [mydb]> create table table1(a int,b1 blob,b2 longblob,unique(a,b1,b2(5)) ); Query OK, 0 rows affected (0.09 sec) MariaDB [mydb]> insert into table1 values(4,3333333,55555555); Query OK, 1 row affected (0.00 sec) MariaDB [mydb]> insert into table1 values(4,3333333,55555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> insert into table1 values(4,3333333,55555555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> insert into table1 values(4,333333453,55555555); Query OK, 1 row affected (0.00 sec) 3 rows in set (0.00 sec) MariaDB [mydb]> insert into table1 values(5,333333453,55555555); Query OK, 1 row affected (0.00 sec) MariaDB [mydb]> insert into table1 values(5,333333453,55555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> show indexes from table1; +--------+------------+----------+--------------+-------------+-----------+----- --------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Card inality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +--------+------------+----------+--------------+-------------+-----------+----- --------+----------+--------+------+------------+---------+---------------+ | table1 | 0 | a | 1 | a | A | NULL | NULL | NULL | YES | BTREE | | | | table1 | 0 | a | 2 | b1 | A | NULL | NULL | NULL | YES | BTREE | | | | table1 | 0 | a | 3 | b2 | A | 0 | 5 | NULL | YES | BTREE | | | +--------+------------+----------+--------------+-------------+-----------+----- --------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.00 sec) Any suggestions would be helpful. Thanks, Shubham On 20 March 2016 at 11:10, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...

Hi Sergei, Sorry! I made a mistake. Actually, when I changed my storage engine to MyISAM and executed create table statement(with unique blobs and no prefix length),mysqld crashed. After reconnecting to the server, the storage engine got changed to default (InnoDB).I didn't know about that and I executed all queries on InnoDB. I thought my patch was working for MyISAM but there are some issues.I am working on it but it might require some time to get it working. Since application deadline for GSoC is 25 March, I will try to submit my proposal as early as possible.

Thanks, Shubham

On 19 March 2016 at 14:22, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...
Hello Sergei,

As you suggested, I tried to develop a prototype just for blobs. So far it is working fine but I am still verifying it for different cases. I will provide you the link to my Github repository shortly.

Thanks, Shubham

On 16 March 2016 at 17:55, Sergei Golubchik <serg@mariadb.org> wrote:

...
Hi, Shubham!

On Mar 16, Shubham Barai wrote:

...
Hello, Sergie! I tried to explore the source code from mi_create.c,ha_myisam.cc,sql/sql_table.cc,include/myisam.h,.and some other files. The main task is to create MI_UNIQUEDEF "uniques" for long unique constraints. We have to consider all the cases where we need to create MI_UNIQUEDEF instead of MI_KEYDEF.

It will include queries like create table table1 (blob_column blob,unique(blob_column) ); create table table1 (a int,blob_column blob,unique(a,blob_column) ); create table table1 (a int,blob_column1 blob,blob_column2 blob,unique(blob_column1(300),blob_column2) ); (key with multiple blob columns and one of the blob column specified with prefix length).

I think we have to create MI_UNIQUEDEF if any one of the columns in a key is a blob field without prefix length.

Yes. And also for any other UNIQUE constraint that is too long for a normal index. For example, many long VARCHAR columns.

But, of course, for a prototype one can start just with blobs.

...
In sql/sql_table, mysql_prepare_create_table is the function which prepares the table and key structures for table creation in mi_create. It generates an error if any one of the blob fields in a key is specified without length. Currently, this task is limited to MyISAM, so if any other storage engine is selected, we have to generate the same error in mysql_prepare_create_table.

There's a generic check whether the key is too long. It can be used here, it doesn't depend on the storage engine.

...
In storage/myisam/ha_myisam.cc, table2myisam is a function which allocates and initializes myisam key and column definitions.The current function prototype of table2myisam is table2myisam(TABLE *table_arg, MI_KEYDEF **keydef_out, MI_COLUMNDEF **recinfo_out, uint records_out) We have to change it to table2myisam(TABLE *table_arg,MI_KEYDEF **keydef_out,MI_UNIQUEDEF ** uniquedef_out,MI_COLUMNDEF **recinfo_out,uint records_out)

Right.

...
table2myisam initializes all the key definitions from table_arg->keyinfo. So we can set a new flag (say uniquedef) in a keyinfo struct in mysql_prepare_create_table if any one of the key_part consists of blob field without prefix length.

There's a field 'algorithm' already. Because MI_UNIQUEDEF in MyISAM is, basically, an index of hashed column values, it is kind of a hash index. So you can use algorithm=HA_KEY_ALG_HASH to mark such columns. And the user will be able to create these indexes explicitly with

create table table1 (blob_column blob,unique(blob_column) using hash);

...
Later we can check the flag in table2myisam to see if we want to create MI_KEYDEF or MI_UNIQUEDEF.

Thanks, Shubham.

Very good!

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Sergei Golubchik

4:28 p.m.

Hi, Shubham! On Mar 20, Shubham Barai wrote:

...

Hi Sergei,

As I told you, I was working on prototype just for blobs. I was having some issue with key definition between MariaDB .frm file and information in the MyISAM storage engine. I have solved that problem temporarily. Here is my output

MariaDB [database1]> set storage_engine=myisam; Query OK, 0 rows affected (0.00 sec) MariaDB [mydb]> create table table1(a int,b1 blob,b2 longblob,unique(a,b1,b2(5))); Query OK, 0 rows affected (0.09 sec) MariaDB [mydb]> insert into table1 values(4,3333333,55555555); Query OK, 1 row affected (0.00 sec) MariaDB [mydb]> insert into table1 values(4,3333333,55555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> insert into table1 values(4,3333333,55555555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> insert into table1 values(4,333333453,55555555); Query OK, 1 row affected (0.00 sec) 3 rows in set (0.00 sec) MariaDB [mydb]> insert into table1 values(5,333333453,55555555); Query OK, 1 row affected (0.00 sec) MariaDB [mydb]> insert into table1 values(5,333333453,55555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> show indexes from table1; +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+ | table1 | 0 | a | 1 | a | A | NULL | NULL | NULL | YES | BTREE | | table1 | 0 | a | 2 | b1 | A | NULL | NULL | NULL | YES | BTREE | | table1 | 0 | a | 3 | b2 | A | 0 | 5 | NULL | YES | BTREE | +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+ 3 rows in set (0.00 sec)

Any suggestions would be helpful.

Frankly speaking, I'd suggest you not to miss a deadline and submit it a proposal now :) You apparently have more than enough information already to understand the project and to create a good proposal. As for the code - I don't really understand what you're asking about. Are you trying to say that something in the output above is wrong? Is SHOW INDEXES incorrect? Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Shubham Barai

21 Mar 21 Mar

7:13 a.m.

Hello Sergei, I have created a draft proposal and submitted to summerofcode website. I haven't written the project timeline yet. Actually, the project looks small for three months. I have already developed a prototype for blobs and it is working but there are a lot of things to be taken care of, such as compatibility issue of key definitions between .frm file and information stored in MyISAM storage engine. I would appreciate if you could go through it and leave comments so that I can make any necessary corrections. I am currently looking into InnoDB codebase to see if it is possible for me to extend this feature to InnoDB storage engine. As InnoDB doesn't support this feature internally, it would require more time than MyISAM. Any suggestions regarding this would be helpful. Thanks, Shubham On 20 March 2016 at 20:58, Sergei Golubchik <serg@mariadb.org> wrote:

...

Hi, Shubham!

On Mar 20, Shubham Barai wrote:

...
Hi Sergei,

As I told you, I was working on prototype just for blobs. I was having some issue with key definition between MariaDB .frm file and information in the MyISAM storage engine. I have solved that problem temporarily. Here is my output

MariaDB [database1]> set storage_engine=myisam; Query OK, 0 rows affected (0.00 sec) MariaDB [mydb]> create table table1(a int,b1 blob,b2 longblob,unique(a,b1,b2(5))); Query OK, 0 rows affected (0.09 sec) MariaDB [mydb]> insert into table1 values(4,3333333,55555555); Query OK, 1 row affected (0.00 sec) MariaDB [mydb]> insert into table1 values(4,3333333,55555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> insert into table1 values(4,3333333,55555555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> insert into table1 values(4,333333453,55555555); Query OK, 1 row affected (0.00 sec) 3 rows in set (0.00 sec) MariaDB [mydb]> insert into table1 values(5,333333453,55555555); Query OK, 1 row affected (0.00 sec) MariaDB [mydb]> insert into table1 values(5,333333453,55555); ERROR 1169 (23000): Can't write, because of unique constraint, to table 'table1' MariaDB [mydb]> show indexes from table1;

+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+

...
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |

+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+

...
| table1 | 0 | a | 1 | a | A | NULL | NULL | NULL | YES | BTREE | | table1 | 0 | a | 2 | b1 | A | NULL | NULL | NULL | YES | BTREE | | table1 | 0 | a | 3 | b2 | A | 0 | 5 | NULL | YES | BTREE |

+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+

...
3 rows in set (0.00 sec)

Any suggestions would be helpful.

Frankly speaking, I'd suggest you not to miss a deadline and submit it a proposal now :) You apparently have more than enough information already to understand the project and to create a good proposal.

As for the code - I don't really understand what you're asking about. Are you trying to say that something in the output above is wrong? Is SHOW INDEXES incorrect?

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Sergei Golubchik

12:07 p.m.

Hi, Shubham! On Mar 21, Shubham Barai wrote:

...

I am currently looking into InnoDB codebase to see if it is possible for me to extend this feature to InnoDB storage engine. As InnoDB doesn't support this feature internally, it would require more time than MyISAM. Any suggestions regarding this would be helpful.

Heh, that's very good (and ambitious) :) Do you already know how MyISAM supports arbitrary long UNIQUE constraints internally? It stores only the hash of the value (of the blob, for example) in the non-unique index, and on INSERT it checks if there are identical hashes in the index. If there are (hash collision) it will retrieve the rows and compare actual blob values. It seems that InnoDB should use the same approach as MyISAM. It'll need some care for a case when two concurrent transactions insert conflicting rows - as transaction changes are not visible until commit, you won't see the conflict until it's too late. But gap locks [1] should be able to prevent that. Regards, Sergei Chief Architect MariaDB and security@mariadb.org [1] https://dev.mysql.com/doc/refman/5.7/en/innodb-record-level-locks.html

Shubham Barai

22 Mar 22 Mar

4:28 p.m.

Hello Sergei, I understood most of the internals of long unique constraints in MyISAM. I am still going through the code in InnoDB. I will soon reply to you. Thanks, Shubham On 21 March 2016 at 16:37, Sergei Golubchik <serg@mariadb.org> wrote:

...

Hi, Shubham!

On Mar 21, Shubham Barai wrote:

...
I am currently looking into InnoDB codebase to see if it is possible for me to extend this feature to InnoDB storage engine. As InnoDB doesn't support this feature internally, it would require more time than MyISAM. Any suggestions regarding this would be helpful.

Heh, that's very good (and ambitious) :)

Do you already know how MyISAM supports arbitrary long UNIQUE constraints internally? It stores only the hash of the value (of the blob, for example) in the non-unique index, and on INSERT it checks if there are identical hashes in the index. If there are (hash collision) it will retrieve the rows and compare actual blob values.

It seems that InnoDB should use the same approach as MyISAM. It'll need some care for a case when two concurrent transactions insert conflicting rows - as transaction changes are not visible until commit, you won't see the conflict until it's too late. But gap locks [1] should be able to prevent that.

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-record-level-locks.html

Shubham Barai

23 Mar 23 Mar

5:38 p.m.

Hello Sergei, I have gone through some of the source files in InnoDB. / handler/ha_innodb.cc /row/row0mysql.cc /row/row0ins.cc /dict/dict0mem.cc etc. In MyISAM, MI_KEYDEF and MI_UNIQUEDEF structure have most of the variables same except MI_KEYDEF uses some extra variables like keylength, minlength, maxlength . As we are calculating a hash of the column values for long UNIQUE constraints ,MI_UNIQUEDEF doesn't need them. mi_write(inserts a row in MyISAM table) and mi_update(updates a row) call mi_unique_check to verify all unique constraints before inserting or updating any row. In InnoDB , dict_index_t is the data structure used for an index. I think we can use the dict_index_t structure for our hash based index . It has a type-variable which represents the type of an index (DICT_CLUSTERED, DICT_UNIQUE,DICT_UNIVERSAL etc) For our new index, we can define a new type (let's say DICT_NON_UNIQUE_HASH). Insertion of a row in InnoDB has a function call graph like this ha_innobase::write_row(uchar* record) [image: Inline images 1] row_insert_for_mysql(byte* mysql_rec, row_prebuilt_t* prebuilt) [image: Inline images 1] row_ins_step(que_thr_t* thr) [image: Inline images 5] row_ins(ins_node_t* node,que_thr_t* thr) [image: Inline images 7] row_ins_index_entry_step(ins_node_t* node,que_thr_t* thr) [image: Inline images 7] row_ins_index_entry_set_vals(dict_index_t* then row_ins_index_entry ( dict_index_t* index, dtuple_t* entry, que_thr_t* thr) index,dtuple_t* entry,const dtuple_t* row) [image: Inline images 1] row_ins_clust_index_entry/ row_ins_sec_index_entry [image: Inline images 3] row_ins_sec_index_entry_low [image: Inline images 4] row_ins_scan_sec_index_for_duplicate 1. row_ins(inserts a row to a table) calls row_ins_index_entry_step for each index defined on a table. 2. row_ins_index_entry_step first calls row_ins_index_entry_set_vals and then row_ins_index_entry . row_ins_index_entry_set_vals is a function which sets the values of the dtuple fields in entry from the appropriate columns in row. entry is the index tuple which we are going to insert into the index. 3. In our hash based index ,instead of storing the actual key value in the index, we have to store an only hash of the column values in a key . So for our new DICT_NON_UNIQUE_HASH index ,we can define a new function row_ins_index_entry_set_vals_hash which will calculate hash of all the columns values in a key and create an index entry to insert into index 4. row_ins_sec_index_entry_low is the function which tries to insert an entry into the secondary index. If the type of the index is DICT_UNIQUE, it calls row_ins_scan_sec_index_for_duplicate. row_ins_scan_sec_index_for_duplicate scans a unique non-clustered index to check unique constraints. In our case (if the type is DICT_NON_UNIQUE_HASH), we can define a new function row_ins_scan_sec_index_for_duplicate_hash which will scan the index and if there are identical hashes in the index,it will retrieve the rows and compare actual values. This is my initial approach for implementing long unique constraints in InnoDB. Do you think this approach will work? I would really appreciate your suggestions. Thanks, Shubham On 22 March 2016 at 20:58, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...

Hello Sergei,

I understood most of the internals of long unique constraints in MyISAM. I am still going through the code in InnoDB. I will soon reply to you.

Thanks, Shubham

On 21 March 2016 at 16:37, Sergei Golubchik <serg@mariadb.org> wrote:

...
Hi, Shubham!

On Mar 21, Shubham Barai wrote:

...
I am currently looking into InnoDB codebase to see if it is possible for me to extend this feature to InnoDB storage engine. As InnoDB doesn't support this feature internally, it would require more time than MyISAM. Any suggestions regarding this would be helpful.

Heh, that's very good (and ambitious) :)

Do you already know how MyISAM supports arbitrary long UNIQUE constraints internally? It stores only the hash of the value (of the blob, for example) in the non-unique index, and on INSERT it checks if there are identical hashes in the index. If there are (hash collision) it will retrieve the rows and compare actual blob values.

It seems that InnoDB should use the same approach as MyISAM. It'll need some care for a case when two concurrent transactions insert conflicting rows - as transaction changes are not visible until commit, you won't see the conflict until it's too late. But gap locks [1] should be able to prevent that.

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-record-level-locks.html

Jan Lindström

7:30 p.m.

Hi, All below is correct and naturally you may re-implement duplicate search on different index type. I think there is more code that needs change as this new index type would contain a column (hash value of the blob field) that is not on stored on base table, right ? Indexed columns are also stored inside a InnoDB data dictionary persistently, so that part would also need change (dict/dict0*.cc files). Actually, you could store blob has also to table, it could make things easier. Secondly, remember that unique keys can be used inside a InnoDB as foreign keys, this is again a design question, do you allow blobs to be foreign keys or not. Finally, unique key with NOT NULL can be used as primary key i.e. clustered key on InnoDB, using blobs hash on that might be out of reach on this timetable. R: Jan Lindström Principal Engineer InnoDB On Wed, Mar 23, 2016 at 6:38 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...

Hello Sergei,

I have gone through some of the source files in InnoDB.

/ handler/ha_innodb.cc /row/row0mysql.cc /row/row0ins.cc /dict/dict0mem.cc etc.

In MyISAM, MI_KEYDEF and MI_UNIQUEDEF structure have most of the variables same except MI_KEYDEF uses some extra variables like keylength, minlength, maxlength . As we are calculating a hash of the column values for long UNIQUE constraints ,MI_UNIQUEDEF doesn't need them.

mi_write(inserts a row in MyISAM table) and mi_update(updates a row) call mi_unique_check to verify all unique constraints before inserting or updating any row.

In InnoDB , dict_index_t is the data structure used for an index. I think we can use the dict_index_t structure for our hash based index . It has a type-variable which represents the type of an index (DICT_CLUSTERED, DICT_UNIQUE,DICT_UNIVERSAL etc)

For our new index, we can define a new type (let's say DICT_NON_UNIQUE_HASH).

Insertion of a row in InnoDB has a function call graph like this

ha_innobase::write_row(uchar* record)

[image: Inline images 1]

row_insert_for_mysql(byte* mysql_rec, row_prebuilt_t* prebuilt)

[image: Inline images 1]

row_ins_step(que_thr_t* thr)

[image: Inline images 5]

row_ins(ins_node_t* node,que_thr_t* thr)

[image: Inline images 7]

row_ins_index_entry_step(ins_node_t* node,que_thr_t* thr)

[image: Inline images 7]

row_ins_index_entry_set_vals(dict_index_t* then row_ins_index_entry ( dict_index_t* index, dtuple_t* entry, que_thr_t* thr) index,dtuple_t* entry,const dtuple_t* row)

[image: Inline images 1]

row_ins_clust_index_entry/

row_ins_sec_index_entry

[image: Inline images 3]

row_ins_sec_index_entry_low

[image: Inline images 4]

row_ins_scan_sec_index_for_duplicate

1. row_ins(inserts a row to a table) calls row_ins_index_entry_step for each index defined on a table.

2. row_ins_index_entry_step first calls row_ins_index_entry_set_vals and then row_ins_index_entry .

row_ins_index_entry_set_vals is a function which sets the values of the dtuple fields in entry from the appropriate columns in row.

entry is the index tuple which we are going to insert into the index.

3. In our hash based index ,instead of storing the actual key value in the index, we have to store an only hash of the column values in a key .

So for our new DICT_NON_UNIQUE_HASH index ,we can define a new function

row_ins_index_entry_set_vals_hash which will calculate hash of all the columns values in a key and create an index entry to insert into index

4. row_ins_sec_index_entry_low is the function which tries to insert an entry into the secondary index. If the type of the index is DICT_UNIQUE,

it calls row_ins_scan_sec_index_for_duplicate. row_ins_scan_sec_index_for_duplicate scans a unique non-clustered index

to check unique constraints.

In our case (if the type is DICT_NON_UNIQUE_HASH), we can define a new function row_ins_scan_sec_index_for_duplicate_hash which will scan the index and if there are identical hashes in the index,it will retrieve the rows and compare actual values.

This is my initial approach for implementing long unique constraints in InnoDB.

Do you think this approach will work?

I would really appreciate your suggestions.

Thanks,

Shubham

On 22 March 2016 at 20:58, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...
Hello Sergei,

I understood most of the internals of long unique constraints in MyISAM. I am still going through the code in InnoDB. I will soon reply to you.

Thanks, Shubham

On 21 March 2016 at 16:37, Sergei Golubchik <serg@mariadb.org> wrote:

...
Hi, Shubham!

On Mar 21, Shubham Barai wrote:

...
I am currently looking into InnoDB codebase to see if it is possible for me to extend this feature to InnoDB storage engine. As InnoDB doesn't support this feature internally, it would require more time than MyISAM. Any suggestions regarding this would be helpful.

Heh, that's very good (and ambitious) :)

Do you already know how MyISAM supports arbitrary long UNIQUE constraints internally? It stores only the hash of the value (of the blob, for example) in the non-unique index, and on INSERT it checks if there are identical hashes in the index. If there are (hash collision) it will retrieve the rows and compare actual blob values.

It seems that InnoDB should use the same approach as MyISAM. It'll need some care for a case when two concurrent transactions insert conflicting rows - as transaction changes are not visible until commit, you won't see the conflict until it's too late. But gap locks [1] should be able to prevent that.

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-record-level-locks.html

_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

Sergei Golubchik

9:17 p.m.

Hi, Jan! On Mar 23, Jan Lindström wrote:

...

Hi,

All below is correct and naturally you may re-implement duplicate search on different index type. I think there is more code that needs change as this new index type would contain a column (hash value of the blob field) that is not on stored on base table, right ? Indexed columns are also stored inside a InnoDB data dictionary persistently, so that part would also need change (dict/dict0*.cc files). Actually, you could store blob has also to table, it could make things easier. Secondly, remember that unique keys can be used inside a InnoDB as foreign keys, this is again a design question, do you allow blobs to be foreign keys or not. Finally, unique key with NOT NULL can be used as primary key i.e. clustered key on InnoDB, using blobs hash on that might be out of reach on this timetable.

R: Jan Lindström Principal Engineer InnoDB

Agree, blobs (even UNIQUE NOT NULL) cannot be used as primary keys. For simplicity I would not support foreign keys either, this can be added later. Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Shubham Barai

24 Mar 24 Mar

4:51 p.m.

Hello Jan, Thanks for the kind response and for giving the details of implementation. I am going to explore InnoDB data dictionary before the official coding period starts. Thanks, Shubham On 24 March 2016 at 01:47, Sergei Golubchik <serg@mariadb.org> wrote:

...

Hi, Jan!

...
Hi,

All below is correct and naturally you may re-implement duplicate search on different index type. I think there is more code that needs change as

On Mar 23, Jan Lindström wrote: this

...
new index type would contain a column (hash value of the blob field) that is not on stored on base table, right ? Indexed columns are also stored inside a InnoDB data dictionary persistently, so that part would also need change (dict/dict0*.cc files). Actually, you could store blob has also to table, it could make things easier. Secondly, remember that unique keys can be used inside a InnoDB as foreign keys, this is again a design question, do you allow blobs to be foreign keys or not. Finally, unique key with NOT NULL can be used as primary key i.e. clustered key on InnoDB, using blobs hash on that might be out of reach on this timetable.

R: Jan Lindström Principal Engineer InnoDB

Agree, blobs (even UNIQUE NOT NULL) cannot be used as primary keys. For simplicity I would not support foreign keys either, this can be added later.

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Shubham Barai

4:53 p.m.

Hello Sergei, After discussing with Jan and you , I have included the proposal for InnoDB and project timeline in my draft proposal. It would be great if you can provide final feedback so that I will be able to submit my final proposal today. Thanks, Shubham On 24 March 2016 at 21:21, Shubham Barai <shubhambaraiss@gmail.com> wrote:

...

Hello Jan,

Thanks for the kind response and for giving the details of implementation. I am going to explore InnoDB data dictionary before the official coding period starts.

Thanks, Shubham

On 24 March 2016 at 01:47, Sergei Golubchik <serg@mariadb.org> wrote:

...
Hi, Jan!

...
Hi,

All below is correct and naturally you may re-implement duplicate search on different index type. I think there is more code that needs change as

...
new index type would contain a column (hash value of the blob field)

On Mar 23, Jan Lindström wrote: this that

...
is not on stored on base table, right ? Indexed columns are also stored inside a InnoDB data dictionary persistently, so that part would also need change (dict/dict0*.cc files). Actually, you could store blob has also to table, it could make things easier. Secondly, remember that unique keys can be used inside a InnoDB as foreign keys, this is again a design question, do you allow blobs to be foreign keys or not. Finally, unique key with NOT NULL can be used as primary key i.e. clustered key on InnoDB, using blobs hash on that might be out of reach on this timetable.

R: Jan Lindström Principal Engineer InnoDB

Agree, blobs (even UNIQUE NOT NULL) cannot be used as primary keys. For simplicity I would not support foreign keys either, this can be added later.

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Sergei Golubchik

14 Apr 14 Apr

3:46 p.m.

Hi, Shubham! What do you think about a higher-level implementation? Not in InnoDB or in MyISAM at all, but in the sql/ directory, on the sql layer? Like, on CREATE TABLE you automatically add a new BIGINT column, but hide it from the user (not show it in SHOW COLUMNS and in SELECT *). Create an index for it (also hidden). Put the blob's hash into this column. And on collisions retrieve both rows and compare blobs. Just like MyISAM is doing, but not on the storage engine level, but in the engine-independent fashion, something that works for all engines. This might be even easier than doing it inside InnoDB. Want to look at this approach? MySQL 5.7 has something similar (long unique constraints on the sql layer), but they've done it only for internal temporary tables (like MyISAM's MI_UNIQUE is only used for internal temporary tables), and it's not directly available for the user. Still, it might give you some hinst about how to implement this feature on the sql layer. What do you think? Regards, Sergei Chief Architect MariaDB and security@mariadb.org On Mar 24, Shubham Barai wrote:

...

Hello Sergei,

After discussing with Jan and you , I have included the proposal for InnoDB and project timeline in my draft proposal. It would be great if you can provide final feedback so that I will be able to submit my final proposal today.

Sergei Golubchik

23 Apr 23 Apr

10:47 a.m.

New subject: [Maria-developers] GSoC 2016: Unique indexes for blobs

Hi, Shubham! While there's a community bonding period now, let's get prepared for coding, shall we? You can start from creating a proper git repository with your prototype code. Here's how you can do it: * read https://help.github.com/categories/bootcamp/ * fork https://github.com/MariaDB/server * clone it locally * create a new branch for your prototype. Like this: git branch prototype and check it out: git checkout prototype you could create a branch and checkout in one command too with git checkout -b prototype * Now replace ha_myisam.cc with your modified copy. Do it for all files you've changed. * commit (git commit or git citool) and push Don't forget, the best place for a quick question is irc. we're very responsive on the #maria channel on Freenode. It's community bonding period, so go ahead and bond away :) Regards, Sergei Chief Architect MariaDB and security@mariadb.org

3366

Age (days ago)

3417

Last active (days ago)

List overview

18 comments

4 participants

participants (4)

Jan Lindström
Sachin Setia
Sergei Golubchik
Shubham Barai

[Maria-developers] GSoC 2016:Unique indexes for blobs

tags

participants (4)