Hi, Pablo!

On May 25, Pablo Estrada wrote:
> On Thu, May 22, 2014 at 5:39 PM, Sergei Golubchik <serg@mariadb.org> wrote:
>> I don't think you should introduce artificial limitations that make the recall worse just because they "look realistic".
>> You can make it actually realistic instead of merely looking realistic: simply pretend that your code is already running on buildbot and limiting the number of tests to run. So, if a test didn't run, you don't have any failure information about it.
>> And then you only need to do what improves recall, nothing else :)
>> (of course, to calculate the recall you need to use all failures, even for tests that you didn't run)
> Yes, my code *already works this way*. It doesn't consider failure information from tests that were not supposed to run. The graphs that I sent are from scripts that ran this way.
Good. I hoped that would be the case (but I haven't checked your scripts on GitHub yet, sorry).
> Of course, the recall is just the number of spotted failures out of the 100% of known failures :)
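Exactly. To make sure we mean the same thing, here is a minimal sketch of the bookkeeping I have in mind (all names are made up: select_tests() stands for whatever selection strategy you use, and test_runs for your history as (run_id, failed_tests) pairs):

    def simulate(test_runs, select_tests, limit):
        # Replay history as if already running on buildbot with a limited
        # budget: only failures of tests that were chosen to run are fed
        # back to the strategy, but recall counts all known failures.
        spotted = 0          # failures caught because the test was run
        total = 0            # all known failures, whether the test ran or not
        known_failures = {}  # the only feedback the strategy may see

        for run_id, failed_tests in test_runs:
            chosen = select_tests(known_failures, limit)
            for test in failed_tests:
                total += 1
                if test in chosen:
                    spotted += 1
                    # failure info exists only for tests that actually ran
                    known_failures.setdefault(test, []).append(run_id)

        return spotted / total  # recall = spotted failures / 100% of known failures

The point is that known_failures - the only thing the strategy sees - grows only from tests that were actually run, while recall is computed over every failure in the history.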
> Anyway, with all this, I will get to work on adapting the simulation a little bit:
> - Time since last run will also affect the relevancy of a test.
> - I will try to use the list of changed files from commits to make sure new tests start running right away.
> Any other comments are welcome.
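About those two adjustments: something along these lines is what I would picture (just a sketch; the decay shape and the constants 0.1 and 50.0 are arbitrary placeholders, and changed_files is assumed to come from the commit under test):

    import math

    def relevancy(failure_score, runs_since_last, test_name, changed_files):
        # start from whatever failure-history metric is already in use
        score = failure_score
        # a test that hasn't run for a while slowly regains relevancy,
        # so it cannot be starved out forever by the budget
        score += 0.1 * (1 - math.exp(-runs_since_last / 50.0))
        # a test touched (or newly added) by the commit should run right away
        if any(test_name in path for path in changed_files):
            score += 1.0
        return score

Matching test names against changed file paths is only a crude heuristic, of course, but it should at least catch brand-new .test files immediately.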
Getting back to your "potential fallacy" - that you start taking tests into account only when they fail for the first time: I agree, in real life we cannot do that. Instead, we would start from a complete list of tests that is known in advance. And you don't have that list, unfortunately.

One option would be to create a complete list of all tests that have ever failed (and perhaps remove tests that were added in some revision present in the history), and use that as a "starting set" of tests. Alternatively, we can generate a list of all tests currently present in the 10.0 tree - everything that you have in the history tables should be a subset of that.

Regards,
Sergei
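P.S. If it helps, enumerating all tests currently in the tree is cheap. A sketch, assuming a checked-out 10.0 tree with the standard layout (mysql-test/t/*.test for the main suite, mysql-test/suite/<suite>/t/*.test for the rest):

    import glob
    import os

    def all_tests(tree_root):
        # every mtr test currently in the tree, as "suite.testname"
        tests = set()
        # main suite lives in mysql-test/t/*.test
        for path in glob.glob(os.path.join(tree_root, 'mysql-test', 't', '*.test')):
            tests.add('main.' + os.path.basename(path)[:-len('.test')])
        # other suites live in mysql-test/suite/<suite>/t/*.test
        for path in glob.glob(os.path.join(tree_root, 'mysql-test', 'suite', '*', 't', '*.test')):
            suite = os.path.basename(os.path.dirname(os.path.dirname(path)))
            tests.add(suite + '.' + os.path.basename(path)[:-len('.test')])
        return tests

Everything in your history tables should then be a subset of what this returns.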