Hi, Sachin! First. I believe you'll need to do your final evaluation soon, and it will need to have a link to the code. Did you check google guidelines about it? Is everything clear there? Do you need help publishing your work in a format that google requires? They don't accept delays for any reasons, so even if your code is not 100% complete and ready, you'd better still publish it and submit the evaluation, because otherwise google will fail you and that'd be too sad. If you'd like you can publish the google-way only the unique-constraint part without further optimizer work. Or at least please mention that you'd completed the original project and went working on extensions. I mean, it's better than saying "the code is not 100% complete" :) On Aug 18, Sachin Setia wrote:
Hello Sergei! I am stuck at one problem consider table t1(a blob , b blob , unique(a,b)); although select * from t1 where a= 12 and b= 23 works but consider the case like select * from t1 where ( a= 12 or a=45 ) and (b= 23 or b=45 ) does not works and also update and delete using long index does not work simple query like delete/ update table t1 where a=1 and b=3; does not works
"does not work" means "return wrong results"? Or "does not use the index but results are correct" ?
The reason is that because these query uses test_quick_select function which does not recognize hash index basically in get_mm_parts it always return false because it compare db_row_hash_1 to a and b i solved this problem but now the problems 1. first how to assure we have both column a, and b is in where currently i do this as crawl through tree->key[i]->next_key_part untill next_key_part is null this works for simple case like a= 1 and b=2 but i am not sure how will i do this in conditions like ((a = 1 or a =34) and (b=34 or b=33)) or (b=34) ^^^^^^^^^^^^^^^^^^^^^^^ in this condition before or past it is okay to use hash but for b=34 we should not use hash index but do not know how to do
Range optimizer can do it. With normal indexes, for example, it needs to convert that query into four "ranges" (I use quotes, because these are degenerate ranges of one value only, but they're still ranges internally): (1,33), (1,34), (34,44), (34, 34) For every range the optimizer will call handler::records_in_range() to do its cost estimations. If you'd run a test, like create table t1 (a int, b int, index(a,b), c int); insert t1 values (1,2,3),(3,4,5),(5,6,7); explain select * from t1 force index (a) where (a=1 or a=3) and (b=2 or b=4); and put a breakpoint on ha_myisam::records_in_range(), you'll be able to find where the SEL_ARG tree is converted to ranges. check_quick_select() sets RANGE_SEQ_IF seq_if = {NULL, sel_arg_range_seq_init, sel_arg_range_seq_next, 0, 0}; and later in handler::multi_range_read_info_const() these two functions (seq_if.init() and seq_if.next()) are used to iterate the tree and create ranges. The main work is done in sel_arg_range_seq_next(). But please try not to copy-paste it. [ Side note: could you please try to use more punctuation in your emails? :) It's a bit difficult to understand what you mean when sentences get long. ]
2. when SEL_TREE is evaluated ?
I believe I've answered that above :) handler::multi_range_read_info_const() converts the tree into two (min and max) key images.
3. where should i evaluate hash i can do that but main problem is that i need to preserve original data like a=1 and b=3 etc because hash does not guaranty same.
Hmm... You can compute the hash as above, where the key image is created. Original data... SQL layer does not always trust the engine, in some cases it evaluates the WHERE expression after getting the row from the engine. I think (not sure) that if you keep the a=1 and b=3 part in the COND then the upper layer will evaluate it normally, like if (!cond || cond->val_int() != 0) so you just need to make sure that 1. a=1 and b=3 is not removed from COND 2. when cond->val_int() is 0, upper layer does *not* treat it as reaching the end of the index range and does *not* stop the loop of index_next() calls. (for a normal index, if you search for a=5 and do index_next, as soon as you found a!=5 you can stop searching).
4 there is one more problem in test_quick_select
there is one code
if ((range_trp= get_key_scans_params(¶m, tree, FALSE, TRUE, best_read_time)))
in the case of hash key before reaching this function we must have to calculate hash after this it will work but i guess upto this point our tree is not evaluated , i do not know
Sorry, I didn't quite understand that :(
please review branch https://github.com/SachinSetiya/server/tree/unique_index_where_up_de and tell me what should i do and also review the orignal branch unique_index_sachin i added more test cases and reinsert also works
I won't be able to do that until next week. I've already started reviewing a big diff with your commits, if I'll recreate it now with more commits, I'll need to start over. I'd better finish reviewing what I have now (up to commit 03e29c6) and then look at newer changes.
On Sun, Aug 14, 2016 at 1:46 PM, Sergei Golubchik <serg@mariadb.org> wrote:
+ cs= str->charset(); + uchar l[4]; + int4store(l,str->length()); + cs->coll->hash_sort(cs,l,sizeof(l), &nr1, &nr2);
You use
cs->coll->hash_sort(cs,l,sizeof(l), &nr1, &nr2);
to sort the length, the byte value of str->length(). This is binary data, you should have
cs= my_charset_binary;
i do not find any variable name my_charset_binary but i used
my_charset_utf8_bin is it ok ?
No, utf8 charset uses a concept of "chracter" and "letter", it still compares characters and expects the string to be a valid utf8 string. You have just four bytes, they can contain anything (invalid utf8 too). And must be compated as binary data. The correct name is my_charset_bin, sorry. You can grep for it, it's used everywhere. Regards, Sergei Chief Architect MariaDB and security@mariadb.org