Hello Sergei, Elena and all,

Today, while working on the script, I found and fixed an issue: there was some faulty code in my script, in the part in charge of collecting the statistics about whether a test failure was caught or not (here <https://github.com/pabloem/Kokiri/blob/master/basic_simulator.py#L393>). While looking into fixing it, I saw another *problem*: the *recall numbers* that I had collected previously were *too high*. The actual recall numbers, once we count the test failures that are *not caught*, are disappointingly lower. I won't show you results yet, since I first want to make sure that the code has been fixed and that I have accurate tests.

The strategy that I was using is a lot less effective than it seemed initially. I will send out a more detailed report with results, my opinion on the weak points of the strategy, and ideas, including a roadmap to try to improve results.

This is all for now. All feedback is welcome.

Regards,
Pablo