Hi, Elena!

On Jun 16, Elena Stepanova wrote:
> It is still the same model. The core of the model was to make recall a function of cutoff, right? So
Yes, kind of. Ideally, it should be recall as a function of the total testing time (taking into account the number of tests to run, the relative speeds of the builders, and individual test durations). For example, it's better to run two tests on a fast builder than one test on a slow builder, and better to run three short tests than one long test, because that's what we care about: getting the highest possible recall as fast as possible. But, indeed, using the number of tests instead of time makes the model simpler and only marginally affects the results (if at all). That's what I've seen in my experiments. So, yes, recall as a function of cutoff it is.
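A minimal Python sketch of the two ways of measuring recall discussed here, assuming each historical test run is available as the model's ranking plus the set of tests that actually failed. The names `recall_at_cutoff`, `recall_at_time_budget` and `smallest_cutoff`, and the data layout, are hypothetical and not taken from the project's code. Builder speed is left out; it could be folded in by scaling the per-test durations for each builder.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical record for one test run: the model's ranking (highest
# priority first) and the set of tests that actually failed in that run.
Run = Tuple[Sequence[str], Set[str]]


def recall_at_cutoff(runs: List[Run], cutoff: int) -> float:
    """Fraction of all observed failures caught when only the first
    `cutoff` tests of each ranking are run (cutoff = number of tests)."""
    caught = total = 0
    for ranking, failed in runs:
        selected = set(ranking[:cutoff])
        caught += len(failed & selected)
        total += len(failed)
    return caught / total if total else 1.0


def recall_at_time_budget(runs: List[Run], durations: Dict[str, float],
                          budget: float) -> float:
    """Time-based variant: take tests in ranked order and stop at the
    first one that would push the total duration past `budget`."""
    caught = total = 0
    for ranking, failed in runs:
        spent = 0.0
        selected: Set[str] = set()
        for test in ranking:
            if spent + durations[test] > budget:
                break
            spent += durations[test]
            selected.add(test)
        caught += len(failed & selected)
        total += len(failed)
    return caught / total if total else 1.0


def smallest_cutoff(curve: Dict[int, float], target: float) -> Optional[int]:
    """Smallest cutoff whose recall reaches `target`, or None if none does."""
    reaching = [n for n, r in curve.items() if r >= target]
    return min(reaching) if reaching else None


if __name__ == "__main__":
    runs = [
        (["t1", "t2", "t3", "t4"], {"t2"}),
        (["t3", "t1", "t4", "t2"], {"t4", "t2"}),
    ]
    curve = {n: recall_at_cutoff(runs, n) for n in range(1, 5)}
    print(curve)
    print(smallest_cutoff(curve, 0.6))  # 3
```

Reading the curve backwards (the smallest cutoff that reaches a required recall) is how a cutoff would be picked once recall(cutoff) is known.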
> let's try it first: let's make it a real cutoff and see the results. Not filling the queue completely (or, in some cases, leaving it empty) is an optimization over the initial model, which improves the execution time (marginally) but affects recall (even if only marginally). It can be considered, but the results should be compared to the basic ones.
> And if, let's say, we decide that N=100 (or N=10%) is the best cutoff value, and then find out that by not filling the queue completely we lose even 1% in recall, we might want to stay with the full queue. What is the time difference between running 50 tests and 100 tests? Almost nothing, especially compared to what we spend on preparing the tests.
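To compare a real cutoff with a queue that is not always filled on the same historical data, a minimal Python sketch could evaluate each run both ways. It assumes the model produces a per-test score; `min_score` is a hypothetical threshold standing in for whatever rule decides not to fill the queue, and all names here are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record for one test run: the model's score for every test,
# plus the set of tests that actually failed in that run.
ScoredRun = Tuple[Dict[str, float], Set[str]]


def build_queue(scores: Dict[str, float], cutoff: int,
                min_score: float = float("-inf")) -> List[str]:
    """Take the `cutoff` highest-scored tests. With the default min_score
    the queue is always filled (a real cutoff); a higher min_score drops
    low-scored tests, so the queue may be shorter or even empty."""
    ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
    return [t for t in ranked[:cutoff] if scores[t] >= min_score]


def evaluate(runs: List[ScoredRun], cutoff: int,
             min_score: float = float("-inf")) -> Tuple[float, float]:
    """Return (recall, average queue length) over all runs."""
    caught = total = queued = 0
    for scores, failed in runs:
        queue = build_queue(scores, cutoff, min_score)
        queued += len(queue)
        caught += len(failed & set(queue))
        total += len(failed)
    recall = caught / total if total else 1.0
    avg_len = queued / len(runs) if runs else 0.0
    return recall, avg_len


if __name__ == "__main__":
    runs = [
        ({"a": 0.9, "b": 0.4, "c": 0.1}, {"b"}),
        ({"a": 0.2, "b": 0.1, "c": 0.05}, {"a"}),
    ]
    print("full queue:   ", evaluate(runs, cutoff=2))
    print("partial queue:", evaluate(runs, cutoff=2, min_score=0.3))
```

The second number is the cost side of the trade-off: if the partial queue saves only a few tests per run while losing recall, the full queue is the better choice.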
I don't think this project will determine what the "best" value is. It can only find the "best set of model parameters" for recall as a function of cutoff (here the "best set" means that for any other set of parameters, recall, within the target recall range, will be lower for any given value of cutoff). In other words, this project will deliver a function recall(cutoff). Then we can decide what cutoff we want and how many failures we can afford to miss. For example, 80% recall could be achieved in 5% of the time, 90% recall in 10% of the time, 95% recall in 50% of the time, etc. And then you can say, "no, we cannot miss more than 5% of failures, so we'll have to live with only a 50% speedup". But no experiment will tell you how many failures are acceptable for us to miss.

Regards,
Sergei