Hello Elena,
To wrap up the recent round of experiments, here is the information available so far:
I have ported the basic code for the 'original' strategy into the core-wrapper architecture, and uploaded it to the 'master' branch.
Now both strategies can be tested under equivalent conditions.
Branch 'master': the original strategy, using exponential decay. Performance improved slightly after I added randomization of the tail of the queue (see the sketch below).
Branch 'core-wrapper_architecture': the 'new' strategy, which uses co-occurrence between file changes and test failures to calculate relevance.
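In case it is useful, here is a rough Python sketch of how I think about the two relevance updates. It is only an illustration, not the actual project code; the constants (DECAY, RANDOM_TAIL_FRACTION) and all the function names are placeholders I made up for the example.

    import random
    from collections import defaultdict

    DECAY = 0.9                 # assumed decay factor for the 'original' strategy
    RANDOM_TAIL_FRACTION = 0.1  # assumed share of the queue tail that gets shuffled

    def update_relevance_decay(relevance, ran_tests, failed_tests):
        # 'Original' strategy: decay the relevance of the tests that ran,
        # then boost the relevance of the ones that failed.
        for test in ran_tests:
            relevance[test] *= DECAY
        for test in failed_tests:
            relevance[test] += 1.0

    def prioritize(relevance, candidate_tests):
        # Order candidates by relevance, then shuffle the low-relevance tail
        # so rarely-failing tests still get an occasional chance to run early.
        queue = sorted(candidate_tests, key=lambda t: relevance[t], reverse=True)
        cut = int(len(queue) * (1 - RANDOM_TAIL_FRACTION))
        head, tail = queue[:cut], queue[cut:]
        random.shuffle(tail)
        return head + tail

    def update_cooccurrence(counts, changed_files, failed_tests):
        # 'New' strategy: count how often each (file, test) pair appears
        # together in a change that produced a failure.
        for changed_file in changed_files:
            for test in failed_tests:
                counts[(changed_file, test)] += 1

    def cooccurrence_relevance(counts, changed_files, test):
        # Relevance of a test for an incoming change is the sum of its
        # co-occurrence counts with the files touched by that change.
        return sum(counts[(changed_file, test)] for changed_file in changed_files)

    # Typical containers for the scores above:
    relevance = defaultdict(float)  # test name -> relevance score
    counts = defaultdict(int)       # (file, test) -> co-failure count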
I think they are both reasonably useful strategies. My theory as to why the 'original' strategy performs better with the input_test lists is that we now know which tests ran, so only the relevance of the tests that actually ran is affected (whereas previously, every test had its relevance reduced). The tests were run with 3000 rounds of training and 7000 rounds of prediction.
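To illustrate that point, the difference boils down to something like the following (again just a sketch, with names I made up, not the project code):

    DECAY = 0.9  # assumed decay factor, as in the sketch above

    def decay_all(relevance):
        # Old behaviour: without knowing which tests ran, every known test
        # had its relevance reduced on every round.
        for test in relevance:
            relevance[test] *= DECAY

    def decay_only_ran(relevance, ran_tests):
        # With the input_test lists: only the tests that actually ran decay,
        # so skipped tests keep their relevance until they are exercised again.
        for test in ran_tests:
            relevance[test] *= DECAY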
I think the most reasonable option now would be to gather data over a longer period, just to be sure that the performance of the 'original' strategy holds up in the long term. We already discussed that it would be desirable for buildbot to keep track of which tests were run, or were considered for running (since buildbot already parses the MTR output, the changes should be fairly quick to make, but I understand that, as it is a production system, the changes and their design require extreme care).
Finally, I fixed the chart comparing the results; sorry about the confusion yesterday.
Let me know what you think, and how you'd like to proceed now. : )
Regards
Pablo