Hello Sergei
Today I made some progress on the project.
MyISAM/Aria
I now have a clear understanding of how to implement a unique index for a query like:
create table tbl(col1 int primary key, col2 blob, col3 blob, unique(col2, col3));
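To check my understanding, here is a small standalone sketch in C of the idea as I understand it (this is not MyISAM code). A unique key over two long blobs is enforced by storing a hash of (col2, col3). When a new row's hash matches an existing one, the real values are compared, because different blobs can share a hash. The names blob_t, hash_unique_cols, rows_equal and toy_insert are hypothetical. FNV-1a is only a stand-in for whatever hash MyISAM/Aria actually use, and the linear scan over stored hashes stands in for the index lookup a real engine would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a blob column value. */
typedef struct {
  const unsigned char *data;
  size_t len;
} blob_t;

/* Mix bytes into a 64-bit FNV-1a state (a stand-in for the engine's hash). */
static uint64_t fnv1a_update(uint64_t h, const void *buf, size_t len) {
  const unsigned char *p = buf;
  for (size_t i = 0; i < len; i++) {
    h ^= p[i];
    h *= 1099511628211ULL; /* FNV 64-bit prime */
  }
  return h;
}

/* Hash every column of the unique key. Each column's length goes in before
   its bytes, so ("ab","c") and ("a","bc") do not feed the same bytes. */
uint64_t hash_unique_cols(const blob_t *cols, size_t ncols) {
  uint64_t h = 14695981039346656037ULL; /* FNV 64-bit offset basis */
  for (size_t i = 0; i < ncols; i++) {
    uint64_t len = cols[i].len;
    h = fnv1a_update(h, &len, sizeof len);
    h = fnv1a_update(h, cols[i].data, cols[i].len);
  }
  return h;
}

/* Real comparison, needed because different values can share a hash. */
int rows_equal(const blob_t *a, const blob_t *b, size_t ncols) {
  for (size_t i = 0; i < ncols; i++) {
    if (a[i].len != b[i].len) return 0;
    if (a[i].len > 0 && memcmp(a[i].data, b[i].data, a[i].len) != 0) return 0;
  }
  return 1;
}

#define NCOLS 2 /* (col2, col3) */
#define MAX_ROWS 16

/* Toy table: stored rows (pointers only, no copies of the bytes) plus
   the hash kept for each row. */
typedef struct {
  blob_t rows[MAX_ROWS][NCOLS];
  uint64_t hashes[MAX_ROWS];
  size_t n;
} toy_table;

/* Returns 1 if the row was inserted, 0 if (col2, col3) already exists.
   The linear scan over hashes stands in for an index lookup on the hash. */
int toy_insert(toy_table *t, const blob_t *row) {
  uint64_t h = hash_unique_cols(row, NCOLS);
  for (size_t i = 0; i < t->n; i++)
    if (t->hashes[i] == h && rows_equal(t->rows[i], row, NCOLS))
      return 0;
  assert(t->n < MAX_ROWS);
  memcpy(t->rows[t->n], row, NCOLS * sizeof *row);
  t->hashes[t->n++] = h;
  return 1;
}
```

For example, inserting ("ab", "c") and then ("a", "bc") succeeds both times (the length prefix also means the two rows feed different bytes into the hash), while inserting ("ab", "c") a second time returns 0.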
InnoDB
I am reading about it. Actually, sir, I want to work on this project whether or not I am selected for
GSoC (because InnoDB is amazing).
Proposal
Still writing it.
Sir, I have one doubt about the table2myisam function definition:
recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),
Why are we allocating this many recinfo entries (share->fields * 2 + 2), when we only seem to require share->fields + 1?
One more doubt, about optimizing the query "select distinct column_name from table" when column_name is a blob column.
mi_write takes one record, writes it, and checks the unique constraint for it; this takes O(n^2) time. I was thinking
we could optimize this by first fetching all of the table's records and calculating a hash for each one. Instead of
comparing each hash with all the others, we can sort the hashes and skip the duplicates (we can keep an array of 0s and
1s, where 1 means the record is not a duplicate and 0 means it is). By doing this we can reduce the time complexity to
O(n log n). Because two different blobs can have the same hash, records with equal hashes would still need their actual
values compared before one is marked as a duplicate.

This will work fast if we have enough buffer storage. With little buffer memory it turns into a trade-off between CPU
and I/O requests: to sort the keys in limited RAM we need an m-way merge sort, which results in more I/O, because records
that cannot be held in RAM have to be written back to disk and then read again to store the unique records in the tmp
table. But we gain performance if the records fit in RAM. For caching the records, we could do it here:
sql/sql_select.cc, line 18313:
error= info->read_record(info);
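Here is a rough C sketch of the sort-based idea, under simplifying assumptions: the records are already in memory (so it only covers the case where everything fits in RAM, not the m-way merge case), they are NUL-terminated strings rather than (pointer, length) blobs, and FNV-1a stands in for the real hash. mark_unique, hash_entry and hash_str are hypothetical names, not existing server functions. Entries are sorted by hash, then by value, then by original position, so equal records end up next to each other with the earliest one first, and keep[] becomes the 0/1 array described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One entry per fetched record: its hash, its value and its original position. */
typedef struct {
  uint64_t hash;
  const char *val;
  size_t idx;
} hash_entry;

/* 64-bit FNV-1a over a NUL-terminated string (stand-in for the real hash). */
static uint64_t hash_str(const char *s) {
  uint64_t h = 14695981039346656037ULL;
  for (; *s; s++) {
    h ^= (unsigned char)*s;
    h *= 1099511628211ULL;
  }
  return h;
}

/* Order by hash, then by actual value (only reached on equal hashes),
   then by original position so the first occurrence sorts first. */
static int cmp_entry(const void *a, const void *b) {
  const hash_entry *x = a, *y = b;
  if (x->hash != y->hash) return x->hash < y->hash ? -1 : 1;
  int c = strcmp(x->val, y->val);
  if (c != 0) return c;
  return (x->idx > y->idx) - (x->idx < y->idx);
}

/* Sets keep[i] = 1 if record i is the first occurrence of its value and
   keep[i] = 0 if it duplicates an earlier record; O(n log n) comparisons.
   Returns 0 on success, -1 if memory allocation fails. */
int mark_unique(const char *const *recs, size_t n, unsigned char *keep) {
  if (n == 0) return 0;
  assert(recs != NULL && keep != NULL);
  hash_entry *e = malloc(n * sizeof *e);
  if (e == NULL) return -1;

  for (size_t i = 0; i < n; i++) {
    e[i].hash = hash_str(recs[i]);
    e[i].val = recs[i];
    e[i].idx = i;
  }
  qsort(e, n, sizeof *e, cmp_entry);

  /* After sorting, equal records are adjacent; only the first of each run is kept. */
  for (size_t i = 0; i < n; i++) {
    int is_dup = i > 0 && e[i].hash == e[i - 1].hash &&
                 strcmp(e[i].val, e[i - 1].val) == 0;
    keep[e[i].idx] = is_dup ? 0 : 1;
  }
  free(e);
  return 0;
}
```

For the records {"blue", "red", "blue", "green", "red"} this sets keep to {1, 1, 0, 1, 0}. The values are only compared when two hashes are equal, so a hash collision costs an extra strcmp but never makes two different records look like duplicates.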
Regards
Sachin