[Maria-developers] MySQL Optimizer Trace Questions

20 Feb 2014

Hello All,

I work for TokuTek as a Storage Engine developer and I have been using
the relatively new optimizer trace functionality.  I have found the
results very helpful when combined with analyzing the source code, but
there are a few questions I want to ask.  Here is some output from the
trace for a recent SELECT query on an InnoDB table.

At the end of the trace, it shows the index that I presume was chosen
by the optimizer:

...
"considered_execution_plans":
  [
    {
      "plan_prefix": [],
      "table": "`reports` `a`",
      "best_access_path":
        {
            "considered_access_paths": [
               {
                 "access_type": "ref",
                 "index": "reports_group_id",
                 "rows": 1.55e6,
                 "cost": 411057,
                 "chosen": true
               },
               ...
             ]
         },
       "cost_for_plan": 411057,
       "rows_for_plan": 1.55e6,
       "chosen": true
     }
  ]
},
...
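
For anyone who wants to pull the same fields out of a trace
programmatically, here is a minimal Python sketch.  It assumes the
fragment has the shape shown above, with the elided parts ("...")
simply left out; the function name chosen_access_paths is mine and is
not part of MySQL or the trace format.

```python
import json

# Fragment shaped like the excerpt above; elided parts ("...") are omitted.
TRACE_FRAGMENT = """
{
  "considered_execution_plans": [
    {
      "plan_prefix": [],
      "table": "`reports` `a`",
      "best_access_path": {
        "considered_access_paths": [
          {
            "access_type": "ref",
            "index": "reports_group_id",
            "rows": 1.55e6,
            "cost": 411057,
            "chosen": true
          }
        ]
      },
      "cost_for_plan": 411057,
      "rows_for_plan": 1.55e6,
      "chosen": true
    }
  ]
}
"""


def chosen_access_paths(fragment):
    """Return (table, access_type, index, rows, cost) for each chosen path.

    Hypothetical helper: it only understands the structure shown above.
    """
    result = []
    for plan in fragment["considered_execution_plans"]:
        for path in plan["best_access_path"]["considered_access_paths"]:
            if path.get("chosen"):
                result.append((plan["table"], path["access_type"],
                               path.get("index"), path["rows"], path["cost"]))
    return result


chosen = chosen_access_paths(json.loads(TRACE_FRAGMENT))
for entry in chosen:
    print(entry)
```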

Here are the cost results for some rejected alternative indexes
(note: the row count for two of them is similar to that of the chosen
index):

...
"analyzing_range_alternatives": {
      "range_scan_alternatives"
...
          "rows": 2377102,
          "cost": 2.81e6,
...
          "rows": 1554164,
          "cost": 1.84e6,
...
          "rows": 2346664,
          "cost": 2.78e6,
...
          "rows": 1554164,
          "cost": 1.84e6,
...

Here is the cost analysis for a table scan, from the beginning of the
trace (still not as good as the above chosen index, but close on
cost):

...
"table": "`reports` `a`",
"range_analysis":
 {
   "table_scan":
    {
      "rows": 3108329,
      "cost": 655076
    },
...

The table scan analysis appears to be dominated by the large row
count, and the table scan was probably rejected in favor of the chosen
index for that very reason.  That leads to my questions:

1. How is the row count more than merely informational to the
optimizer?  I had the impression that the row count did not contribute
to the optimizer's decision as directly as the estimated cost, or that
the row count was already factored into the final cost measurement.
2. Related to that question, what exactly is cost?  Is there a unit
of measurement (e.g. time, disk seeks, etc.)?
3. Is cost more important than row count, or equally important?
4. What is the genesis of the cost variable displayed in the trace?
I assume it is just a guess on the respective storage engine's part of
how expensive (in terms of disk access, processing, time, etc.)
getting the rows will be for the query.
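
To make the comparison behind these questions concrete, here is a
small Python sketch of the arithmetic on the figures quoted above.  The
rows and cost values are copied from the trace excerpts; the labels
and the cost-per-row ratio are my own illustration and say nothing
about how the optimizer actually derives or weighs cost.

```python
# (label, rows, cost) copied from the trace excerpts above.
# The labels are mine; they do not appear in the trace.
estimates = [
    ("ref on reports_group_id (chosen)", 1.55e6, 411057),
    ("range alternative 1", 2377102, 2.81e6),
    ("range alternative 2", 1554164, 1.84e6),
    ("range alternative 3", 2346664, 2.78e6),
    ("range alternative 4", 1554164, 1.84e6),
    ("table scan", 3108329, 655076),
]


def cost_per_row(rows, cost):
    """Estimated cost divided by estimated rows (illustrative ratio only)."""
    return cost / rows


ratios = {label: cost_per_row(rows, cost) for label, rows, cost in estimates}

# The plan with the lowest absolute cost among the quoted figures.
cheapest = min(estimates, key=lambda e: e[2])[0]

for label, rows, cost in estimates:
    print(f"{label:34s} rows={rows:>10.0f} cost={cost:>10.0f} "
          f"cost/row={ratios[label]:.3f}")
print("lowest cost:", cheapest)
```

On these numbers the chosen ref access has the lowest absolute cost,
the table scan has the lowest cost per row but the highest row count,
and the range alternatives cost more than four times as much per row
as the ref access even where the row counts are nearly the same.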

Thanks in advance,
Christian

Christian Rober