Hi Elena,
I just ran the tests comparing both strategies.
To my surprise, the results from the 'original' strategy are considerably higher than those from the 'new' strategy. The difference could stem from several causes, but I believe it is the following:
Using the lists of run tests means a test's relevance decreases only when it is both selected to run and actually executed. High-relevance tests that were left out of the running set don't run, so they keep their relevance and can still hit their failures later on, rather than decaying away.
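To make the mechanism concrete, here is a minimal sketch of that decay rule. All names here (update_relevance, DECAY, the dict-based scores) are my own illustration, not our actual implementation:

```python
DECAY = 0.9  # assumed multiplicative decay applied per executed run


def update_relevance(relevance, selected, executed):
    """Decay a test's relevance only if it was both selected and executed.

    relevance: dict mapping test name -> relevance score
    selected:  set of test names chosen for the running set
    executed:  set of test names that actually ran
    """
    return {
        name: score * DECAY if (name in selected and name in executed) else score
        for name, score in relevance.items()
    }


relevance = {"t1": 0.8, "t2": 0.8}
updated = update_relevance(relevance, selected={"t1"}, executed={"t1"})
# t1 decays, while t2 keeps its full relevance and remains a strong
# candidate to catch a failure in a later run
```

The point is the conjunction: a test outside the running set is untouched, so its relevance is preserved for future selection.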
I will have charts in a few hours, and I will review the code more deeply to make sure the results are accurate. For now, I can tell you that with a running set at 50% of the full size, the 'original' strategy, with no randomization, time factor, or edit factor, achieved a recall of 0.90 in the tests I ran.
Regards,
Pablo