Hello Elena and all,
I have pushed the fixed code. It contains a lot of changes because I went through all of the code to make sure that it made sense. The commit is here; although it changes a lot, the main line where failures are caught or missed is this one.
  1. Information about edits to test result files helps improve recall, if only marginally.
  2. Information about the time since a test's last run barely improves recall at all (see Weaknesses, item 2).
A couple of concepts that I want to define before going on:
  • First failures. These are failures caused by new bugs. They do not occur close in time as part of a chain of failures. They occur as a consequence of a transaction that introduces a bug, but they might occur soon or long after that transaction (usually soon rather than long). They might be correlated with how often a test fails (core or basic tests that fail often might be especially good at exposing bugs), but many of them are not (for example, tests of a specific feature that rarely fail, except when that feature is modified).
  • Strict simulation mode. This is the mode where, if a test is not part of the running set, its failure is not considered.
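The two definitions above can be made concrete with a small sketch. This is not the Kokiri simulator: the input format (one set of failed test names per test run), the `window` threshold used to separate first failures from chains, the "rank by past failure count" selection, and the reading of strict mode as "a failure of a test that was not run is never fed back into the statistics" are all assumptions made for illustration, and `first_failures` and `simulate` are hypothetical names.

```python
from collections import Counter


def first_failures(failed_per_run, window=5):
    """Return (run_index, test) pairs where `test` fails but did not fail in
    any of the previous `window` runs. This is an assumed heuristic for
    "not part of a chain of failures"."""
    last_failure = {}
    result = []
    for i, failed_tests in enumerate(failed_per_run):
        for test in sorted(failed_tests):
            prev = last_failure.get(test)
            if prev is None or i - prev > window:
                result.append((i, test))
            # Chained failures still update the timestamp, so a long chain
            # yields only one first failure.
            last_failure[test] = i
    return result


def simulate(runs, budget, strict=True):
    """Replay test runs, each time running the `budget` tests with the most
    recorded past failures. `runs` is a list of (all_tests, failed_tests)
    pairs. Returns recall = caught failures / all failures."""
    failure_counts = Counter()
    caught = missed = 0
    for all_tests, failed_tests in runs:
        # Rank by past failure count; ties are broken by name for determinism.
        ranked = sorted(all_tests, key=lambda t: (-failure_counts[t], t))
        running_set = set(ranked[:budget])
        for test in failed_tests:
            if test in running_set:
                caught += 1
                failure_counts[test] += 1
            else:
                # An uncaught failure still counts against recall.
                missed += 1
                # Assumed meaning of strict mode: a test that was not run is
                # never observed failing, so its statistics are not updated.
                if not strict:
                    failure_counts[test] += 1
    total = caught + missed
    return caught / total if total else 0.0
```

With three runs where only test `c` fails and a budget of one test, the strict simulation never picks `c` and never learns about its failures, so recall stays at 0.0; the non-strict variant records the first miss, runs `c` from the second run on, and reaches a recall of 2/3.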
Weaknesses:
Some ideas:
I am currently running tests to get the adjusted results. I will graph them and send them out in a couple of hours.
Regards

Pablo


On Fri, Jun 13, 2014 at 12:40 AM, Elena Stepanova <elenst@montyprogram.com> wrote:
Hi Pablo,

Thanks for the update.


On 12.06.2014 19:13, Pablo Estrada wrote:
Hello Sergei, Elena and all,
Today, while working on the script, I found and fixed an issue:

There is some faulty code in my script that is in charge of collecting
the statistics about whether a test failure was caught or not (here
<https://github.com/pabloem/Kokiri/blob/master/basic_simulator.py#L393>). While
looking into fixing it, I noticed another *problem*: the *recall
numbers* that I had collected previously were *too high*.

The actual recall numbers, once we consider the test failures that are *not
caught*, are disappointingly lower. I won't show you results yet, since I
want to make sure that the code has been fixed and that I have accurate tests
first.
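Why leaving out uncaught failures inflates recall follows from its definition, recall = caught failures / all failures. In this hypothetical sketch (the function names and the `(run_id, test)` input format are assumptions, not the code in `basic_simulator.py`), the second function drops failures of tests that were not in the running set before dividing, so its denominator holds only caught failures and it reports 1.0 whenever anything was caught.

```python
def recall(failures, running_sets):
    """Correct recall: `failures` is a list of (run_id, test) pairs and
    `running_sets` maps run_id -> set of tests that were run."""
    caught = sum(1 for run_id, test in failures
                 if test in running_sets.get(run_id, set()))
    return caught / len(failures) if failures else 0.0


def recall_only_observed(failures, running_sets):
    """Flawed variant: failures of tests that were not run are never
    collected, so every counted failure is, by construction, a caught one."""
    observed = [(run_id, test) for run_id, test in failures
                if test in running_sets.get(run_id, set())]
    return len(observed) / len(observed) if observed else 0.0
```

For four failures of which two fall inside the running sets, `recall` gives 0.5 while `recall_only_observed` gives 1.0.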

This is all for now. The strategy that I was using is a lot less effective
than it seemed initially. I will send out a more detailed report with
results, my opinion on the weak points of the strategy, and ideas,
including a roadmap to try to improve results.

Regards. All feedback is welcome.

Please push your fixed code that triggered the new results, even if you are not ready to share the results themselves yet. It will be easier to discuss then.

Regards,
Elena


Pablo



_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp