Thanks so much for looking into this! 

Last weekend I started playing with the code, but it looks like I may need to upgrade to an SSD to get a reasonable build time.

For Zone B I was imagining there would be a way to try increasingly large jumps, perhaps aided by an index cursor that remembered some information it saw as it walked through the B*-tree and page directory.

I'm happy to hear an optimization to skip over Zone A looks fairly easy. 

As for real data, do full-text indexes return rowid-ordered rows? If so, queries for a long-tail word (e.g. "hobbit") AND a frequent value (e.g. soft_deleted=0) should benefit significantly from skipping over Zone A. Maybe we could use a Wikipedia dump? 

Thank you, 
David




On Wed, 2 Oct 2019, 8:51 AM jocelyn fournier, <jocelyn.fournier@gmail.com> wrote:
Hi Sergey!

Actually fetching min & max value for each keys (key1=1 and key2=2 in your case) should not cost much?
Then you can compute the overlap zone (Zone B in your graph) and directly perform a jump on the beginning of Zone B and stop at the end of Zone B instead of EOF, using the index with the less amount of matching rows.

BR,
  Jocelyn


> Le 1 oct. 2019 à 21:31, Sergey Petrunia <sergey@mariadb.com> a écrit :
>
> Hello,
>
> More thoughts on this:
>
> Let's consider a query
>
> select * from t1 where key1=1 and key2=2
>
> let's assume it uses index itnersection.
>
> Check the attached picture. It shows the table rowids at the center,
> key1 on the left, key2 on the right.
>
> There are three zones:
>
> - Zone A, where rowids matching 'key1=1' have no overlap with any rowids for 'key2=2'
>
> - Zone B, where rowids from two scans overlap.
>
> - Zone C, similar to zone A but at the end of the scan. Here, rowids matching
> key2=2 have no overlap.
>
> The current "merge-ordered-streams" algorithm takes care of "Zone C".
> the code will return EOF as soon as one of the merged streams has produced EOF.
>
> Now, let's look at Zone B.  Here, the code would scan both indexes forward
> and search for records with key=1 that have matches with key=2.
>
> The search forward is this loop in QUICK_ROR_INTERSECT_SELECT::get_next:
>
>    do
>    {
>      DBUG_EXECUTE_IF("innodb_quick_report_deadlock",
>                      DBUG_SET("+d,innodb_report_deadlock"););
>      if (unlikely((error= quick->get_next())))
>      {
>        /* On certain errors like deadlock, trx might be rolled back.*/
>        if (!thd->transaction_rollback_request)
>          quick_with_last_rowid->file->unlock_row();
>        DBUG_RETURN(error);
>      }
>      quick->file->position(quick->record);
>      cmp= head->file->cmp_ref(quick->file->ref, last_rowid);
>      if (cmp < 0)
>      {
>        /* This row is being skipped.  Release lock on it. */
>        quick->file->unlock_row();
>      }
>    } while (cmp < 0);
>
>
> the quick->get_next() call (after several levels of indirection) do this
> storage engine API call:
>
>  handler->index_next_same();
>
> And the question is whether it would be faster if this call was made instead:
>
>  handler->index_read_map(key_value, rowid_of_match_candidate);
>
> The answer is that it's faster to do a jump when we will jump over many
> records. When we jump over a few, it is actually slower than making several
> index_next_same() calls.  The first reason for this is that doing seeks will
> break InnoDB's prefetch buffer optimization. There might be other reasons.
>
> I don't have any idea how one could predict the jump size.
>
> As for Zone A, we could jump it over with one jump. If we first read the rowid
> from "key2=2", then we could already use it when making a lookup with key1=1.
>
> But what if we first did an index lookup on key1=1? Then we will miss the
> chance to jump to the first rowid for that one sees for key2=2...
>
> If we assume that the first jump is more important than others, we could just
> look at the first record in each of the merged indexes and then feed them all
> back the maximum rowid we saw as the starting point. This would let us jump
> over zone A.
>
> I think, implementing "jump over zone A" is easier than inventing a way to do
> jumps while in zone B. (And this is actually what your idea was).
>
>
> A different question - are there any example datasets or queries we
> could try this on? One can of course construct an artificial dataset and
> queries but it would be much nicer to try this on a real dataset and a real
> query.
>
> BR
> -- Sergei P.
>
> On Fri, Sep 20, 2019 at 12:31:10PM +0300, Sergey Petrunia wrote:
>> Hello David,
>>
>> On Mon, Sep 16, 2019 at 09:07:09PM +1200, David Sickmiller wrote:
>>> I've been using MySQL/MariaDB for two decades but have more recently been
>>> working with Elasticsearch.  I knew to expect an inverted index to speed up
>>> querying full text fields, but I've been surprised (and a bit annoyed) at
>>> how fast ES can query structured data.  (In my case, I'm largely looking
>>> for intersections of a number of varchar fields with lowish cardinality,
>>> e.g. WHERE country = 'US' AND client = 'Microsoft' AND status =
>>> 'Completed'.)
>>>
>>> Elasticsearch seems to have several things going on, but I think a core
>>> aspect, to use RDMS terminology, is that each column is indexed, and index
>>> unions/intersections are used if the WHERE clause references multiple
>>> columns.
>>>
>>> I've heard that MySQL/MariaDB has the ability to merge indexes, but I've
>>> rarely observed it in person.  Googling for it yields a bunch of
>>> StackOverflow posts complaining how slow it is, with responses agreeing and
>>> explaining how to disable it.
>>>
>>> If I'm reading the MySQL/MariaDB code correctly, it looks like MariaDB will
>>> intersect indexes by looping through each index, reading the rowids of all
>>> matching keys and then, at the end (or once the buffer is full), checking
>>> whether each rowid is present in each index.
>>
>> There are two algorithms:
>>
>> 1. index_merge/intersection
>>
>> This is implemented in QUICK_ROR_INTERSECT_SELECT.
>> It is applicable when rowids from each source we are merging come ordered by
>> the rowid value.
>>
>> This requirement is met when the scan over a single-point range. If the table
>> has an index defined as
>>
>> INDEX i1(col1, ..., colN)
>>
>> then index tuples that compare as equal are ordered by their rowid. That is,
>> if one does an index scan over
>>
>> (col1, ... colN)=(const1, ..., constN)
>>
>> they will get the records i`n rowid order.
>>
>> QUICK_ROR_INTERSECT select will run the scans on all merged streams
>> simultaneously and do ordered-stream-merge on them.
>>
>> 2. index_merge/sort-interesect
>> ( https://mariadb.com/kb/en/library/index_merge-sort_intersection/)
>>
>> This is implemented in QUICK_INDEX_INTERSECT_SELECT.
>>
>> The algorithm doesn't assume that the input streams are ordered.
>>
>> It scans all inputs and puts the rowids into a "Unique" object (think RB tree
>> which overflows to disk). After the scans are finished, it can check which
>> rowids were produced in all of the inputs, and those are in the intersection.
>>
>>> I wonder if there's an opportunity to speed this up.  If we first read in
>>> the rowids from one index (ideally the one with the fewest matches), we
>>> could tell the storage engine that, when reading the next index, it should
>>> skip over rowids lower than the next candidate intersection.
>>
>> This is a good idea, neither QUICK_ROR_INTERSECT_SELECT, nor
>> QUICK_INDEX_INTERSECT_SELECT do this.
>>
>>> In the best
>>> case scenario, I think this could enable InnoDB to use its page directory
>>> to skip past some of the keys, improving the performance from O(n) to O(log
>>> n).
>>
>> Agree.
>> If the scans being merged produce data with non-overlapping rowid ranges,
>> then things could be sped up. I'm just wondering how often this is the case
>> in practice. Do you have any thoughts this?
>>
>>> That said, this is all new to me.  Maybe there's an obvious reason this
>>> wouldn't make much of an improvement, or maybe I've overlooked that it's
>>> already been done.  However, if it looks promising to you folk, and it's
>>> something you'd consider merging, I'd be willing to attempt writing a PR
>>> for it.
>>
>> BR
>> Sergei
>> --
>> Sergei Petrunia, Software Developer
>> MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
>>
>>
>
> BR
> Sergei
> --
> Sergei Petrunia, Software Developer
> MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
>
>
> <index-merge1.jpg>