Hello everyone,
Here are this week's highlights:
  1. I am keeping the script customizable, so that simple or complex runs can be set up as needed. I am also documenting the options.
  2. I have added logic that factors test file changes into the mix. Finding first-failures has improved significantly, which is useful.
  3. I am now working on taking correlation between test failures into account (see the sketch after this list). Adding this should be quite quick, and I should be testing it by the end of the week.
  4. I feel quite good about the results so far. I have been testing with the first 5,000 test runs (2,000 for training, 3,000 for simulation). Once I finish implementing the correlation factor, I will run tests on a broader range (~20,000 or ~50,000 test runs). If the results look good, we could review them, make any changes or suggestions, and, if everyone agrees, discuss implementation details while I finish the last touches on the simulation script. How does that sound?
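
To make (2) and (3) concrete, here is a rough sketch of how both
factors could enter a per-test relevance score. All names, weights, and
the scoring formula are illustrative placeholders, not the actual
script:

```python
# Illustrative sketch only -- not the project's real code.
from collections import defaultdict

class RelevanceModel:
    def __init__(self, change_weight=0.5, corr_weight=0.3):
        self.run_count = defaultdict(int)   # times each test was run
        self.fail_count = defaultdict(int)  # times each test failed
        self.co_fail = defaultdict(int)     # co-failure counts per test pair
        self.change_weight = change_weight
        self.corr_weight = corr_weight

    def record_run(self, executed, failed):
        """Update the statistics with one historical test run."""
        for t in executed:
            self.run_count[t] += 1
        for t in failed:
            self.fail_count[t] += 1
        for a in failed:
            for b in failed:
                if a != b:
                    self.co_fail[(a, b)] += 1

    def score(self, test, changed_tests, recent_failures):
        """Relevance of `test` for the next run: its plain failure
        rate, boosted if its own file was changed, plus a term for
        tests that historically fail together with recent failures."""
        base = self.fail_count[test] / (self.run_count[test] or 1)
        changed = self.change_weight if test in changed_tests else 0.0
        corr = sum(self.co_fail[(f, test)] / (self.fail_count[f] or 1)
                   for f in recent_failures)
        return base + changed + self.corr_weight * corr
```

In the simulation, record_run() would first be fed the training runs,
and each simulated run would then be ordered by score() to see how
early the real failures land in the queue.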
Regards, and best wishes to everyone.

Pablo


On Tue, Jun 3, 2014 at 7:42 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Pablo!

To add to my last reply:

On Jun 03, Sergei Golubchik wrote:
> Well, it's your project, you can keep any measure you want.
> But please mark clearly (in comments or whatever) what factors affect
> results and what don't.
>
> It would be very useful to be able to see the simplest possible model
> that still delivers reasonably good results, even if we decide to use
> something more complicated in the end.
...
> Same as above, basically. I'd prefer to use not the model that simply
> "looks realistic", but the one that makes the best predictions.
>
> You can use whatever criteria you prefer, but if taking changed tests
> into account does not improve the results, I'd like that to be clearly
> documented or visible in the code.

Alternatively, you can deliver (when this GSoC project ends) two
versions of the script: one with anything you want in it, and a second
one that is as simple as possible.

For example, the only really important metric is the "recall as a
function of total testing time". We want to reach as high a recall as
possible in the shortest possible testing time, right? But according to
this criterion one needs to take into account individual test execution
times (it's better to run 5 fast tests than 1 slow test) and individual
builder speed factors (it's better to run 10 tests on a fast builder
than 5 tests on a slow builder). And in my tests it turned out that
these complications don't improve the results much. So, while they make
perfect sense and make the model more realistic, the simple model can
get by without them and use the "recall vs. number of tests" metric. A
rough sketch of both metrics follows.
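
For concreteness, here is a rough Python sketch of what the two metrics
boil down to. The names and signatures are illustrative only, not taken
from the actual script:

```python
# Illustrative sketch only -- not the project's real code.
# "Recall vs. total testing time": the fraction of the failures we
# catch before a time budget runs out, given tests in the order the
# model would run them.

def recall_at_time_budget(ordered_tests, durations, failing_tests,
                          budget, builder_speed=1.0):
    """ordered_tests -- tests in model-priority order
    durations      -- dict: test name -> typical execution time (seconds)
    failing_tests  -- set of tests that actually fail in this run
    budget         -- total testing time allowed (seconds)
    builder_speed  -- > 1.0 means a faster builder (tests finish sooner)
    """
    elapsed, caught = 0.0, 0
    for test in ordered_tests:
        cost = durations.get(test, 0.0) / builder_speed
        if elapsed + cost > budget:
            break
        elapsed += cost
        if test in failing_tests:
            caught += 1
    return caught / len(failing_tests) if failing_tests else 1.0

# The simpler metric drops all timing data: recall after the first N
# tests, i.e. "recall vs. number of tests".
def recall_at_n(ordered_tests, failing_tests, n):
    caught = sum(1 for t in ordered_tests[:n] if t in failing_tests)
    return caught / len(failing_tests) if failing_tests else 1.0
```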

Regards,
Sergei