Hi Sergei,
On 16.06.2014 13:46, Sergei Golubchik wrote:
And if, let's say, we decide that N=100 (or N=10%) is the best cutoff value, and then find out that by not filling the queue completely we lose even 1% in recall, we might want to stay with the full queue. What is the time difference between running 50 tests and 100 tests? Almost nothing, especially compared to what we spend on preparing the tests.
I don't think this project will determine what the "best" value is. It can only find the "best set of model parameters" for recall as a function of cutoff (here the "best set" means that for any other set of parameters, recall - in the target recall range - will be less for any given value of cutoff).
In other words, this project will deliver a function recall(cutoff). And then we can decide what cutoff we want and how many failures we can afford to miss. For example, 80% recall can be achieved in 5% of the time, 90% recall in 10% of the time, 95% recall in 50% of the time, etc.
Of course... The example above with the numbers was merely to show that, since our primary goal is to make the cutoff small enough, an attempt to artificially reduce it further by not using the whole queue won't necessarily pay off. In terms of execution time, it makes an essential difference whether we run 100 vs 2000 tests, or even 1000 vs 2000 tests; it's hardly critical whether we run 50 or 100 tests. But to be sure, we need to see whether filling/not-filling the queue makes such an obvious difference that only one approach should always be used, or whether the difference is marginal and we should make it a part of the "set of model parameters" and choose its value later as a part of our voluntary decision (and maybe adjust it later). Currently we only have the "non-filling" strategy in place, so we can't see the difference, and it would be a pity to miss this chance to improve the recall at almost no cost. /E
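To put rough numbers on the execution-time argument above, here is a small sketch. It assumes (this is not stated in the thread) that 2000 is the size of the full test run and that every test takes about the same time; the names FULL_RUN and time_saved are made up for illustration.

```python
# Assumption: the full run is 2000 tests and all tests take equal time.
FULL_RUN = 2000

def time_saved(n_fewer, n_more, full_run=FULL_RUN):
    """Fraction of a full run saved by running n_fewer tests instead of n_more."""
    return (n_more - n_fewer) / full_run

# The three comparisons mentioned in the paragraph above.
for n_fewer, n_more in [(100, 2000), (1000, 2000), (50, 100)]:
    saved = time_saved(n_fewer, n_more)
    print(f"{n_fewer} vs {n_more} tests: saves {saved:.1%} of a full run")
```

Under these assumptions, going from 2000 to 100 tests saves most of a full run, while going from 100 to 50 saves only a few percent of it, which is why a small loss in recall may not be worth it.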
And then you can say, "no, we cannot miss more than 5% of failures, so we'll have to live with 50% speedup only". But no experiments will tell you how many failures are acceptable for us to miss.
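The decision step described here can be sketched as a lookup on the recall(cutoff) curve. This is only an illustration: the three points are the example figures quoted in this thread, not measured results, and the names RECALL_CURVE, smallest_cutoff_for and time_saved_for are hypothetical.

```python
# Example (cutoff, recall) points from the thread, as fractions.
# These are illustrative figures, not experimental data.
RECALL_CURVE = [
    (0.05, 0.80),
    (0.10, 0.90),
    (0.50, 0.95),
]

def smallest_cutoff_for(target_recall, curve=RECALL_CURVE):
    """Return the smallest cutoff whose recall meets target_recall, or None."""
    for cutoff, recall in sorted(curve):
        if recall >= target_recall:
            return cutoff
    return None  # the curve never reaches the requested recall

def time_saved_for(target_recall, curve=RECALL_CURVE):
    """Fraction of run time saved at the smallest acceptable cutoff."""
    cutoff = smallest_cutoff_for(target_recall, curve)
    return None if cutoff is None else 1.0 - cutoff

# "We cannot miss more than 5% of failures" means a target recall of 0.95.
print(smallest_cutoff_for(0.95), time_saved_for(0.95))
```

The experiments can supply the curve; the target recall passed in is the part that stays a decision for us to make.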
Regards, Sergei