On Thu, May 22, 2014 at 4:12 PM, Elena Stepanova
<elenst@montyprogram.com> wrote:
I suggest staying with the terminology, for clarity.
You are right. I'll stick to MTR terminology.
But even on an ideal data set the mixed approach should still be the most efficient, so it should be fine to use it even if we someday fix all the broken tests and collect reliable data.
Yes, I agree. Keeping the Mixed (Branch/Platform) approach.
2. Include a new measure that increases relevancy: Time since last run.
The relevancy index should have a component that makes a test more relevant the longer it goes without running.
I agree with the idea, but have doubts about the criteria.
I think you should measure not the time, but the number of test runs that happened since the last time the test was run (it would be even better if we could count the number of revisions, but that's probably not easy).
The reason is that some branches are very active, while others can be extremely slow. So, with the same time-based coefficient, the relevancy of a test could spike between two consecutive test runs just because they happened a month apart, but would change too slowly on a branch that has a dozen commits a day.
Yes. I agree with you on this. This is what I had in mind, but I couldn't express it properly in my email :)
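For concreteness, this is roughly the shape of the component I have in mind; a minimal sketch in Python, where the weight and cap are placeholder values rather than anything we have agreed on:

    # Hypothetical sketch, not project code: a relevancy boost based on how
    # many test runs have happened on this branch/platform since the test was
    # last executed, instead of elapsed wall-clock time.
    def staleness_boost(runs_since_last_execution, weight=0.01, cap=1.0):
        """Grow the boost linearly with the number of missed runs, up to a cap."""
        return min(weight * runs_since_last_execution, cap)

    def relevancy(base_score, runs_since_last_execution):
        # base_score is whatever failure-history metric we already compute;
        # the staleness term only nudges long-unrun tests back toward the queue.
        return base_score + staleness_boost(runs_since_last_execution)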
3. Also include correlation. I still don't have a great idea of how correlation will be considered, but it's something like this:
1. The data contains the list of test_runs in which each test_suite has failed. If two test suites have failed together a certain percentage of times (>30%?), then when test A fails, the relevancy index of test B also goes up... and when test A runs without failing, the relevancy index of test B goes down too.
We'll need to see how it goes.
In real life, correlation of this kind does exist, but I'd say that related failures much more often happen due to environmental problems, so the presumed correlation will be spurious.
Good point. Let's see how the numbers play out, but I think you are right that this would end up severely biased by test blowups and environment-related failures.
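For the record, in case we still want to experiment with it later, here is a rough sketch of the co-failure bookkeeping I was imagining; the names and the threshold are only illustrative:

    # Hypothetical sketch: for every test run we record which tests failed
    # together; a pair's correlation is estimated as the fraction of either
    # test's failures in which the other one also failed.
    from collections import Counter
    from itertools import combinations

    fail_counts = Counter()    # test -> number of runs in which it failed
    cofail_counts = Counter()  # (test_a, test_b) -> number of joint failures

    def record_test_run(failed_tests):
        for t in failed_tests:
            fail_counts[t] += 1
        for a, b in combinations(sorted(failed_tests), 2):
            cofail_counts[(a, b)] += 1

    def correlated(test_a, test_b, threshold=0.3):
        pair = tuple(sorted((test_a, test_b)))
        joint = cofail_counts[pair]
        denom = min(fail_counts[test_a], fail_counts[test_b]) or 1
        return joint / denom >= threshold

One possible guard against the environmental bias you describe would be to skip runs in which an unusually large number of tests failed at once, since those are most likely blowups rather than real correlation.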
I think in any case we'll have to rely on the fact that your script will choose tests not from the whole universe of tests, but from an initial list that MTR produces for this particular test run. That is, it will go something like this:
- test run is started in buildbot;
- MTR collects test cases to run, according to the startup parameters, as it always does;
- the list is passed to your script;
- the script filters it according to the algorithm that you developed, keeps only a small portion of the initial list, and passes it back to MTR;
- MTR runs the requested tests.
That is, you do exclusion of tests rather than inclusion.
This will solve two problems:
- first test run: when a new test is added, only MTR knows about it, buildbot doesn't; so, when MTR passes you a test that you know nothing about (and assuming that we do have a list of all executed tests in buildbot), you'll know it's a new test and will act accordingly;
- abandoned tests: MTR just won't pass them to your script, so it won't take them into account.
Great. This is good to know, and it gives me a more precise idea of how the project would fit into MariaDB development.
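To make sure I understand the flow, here is a minimal sketch of what the filtering step could look like from the script's side; the function name, the dict of scores and the 20% cut are assumptions for illustration, not an existing MTR interface:

    # Hypothetical sketch of the exclusion step: MTR hands the script its full
    # candidate list for this run, the script keeps only the most relevant
    # portion, and the reduced list goes back to MTR for execution.
    def filter_tests(candidate_tests, relevancy_scores, fraction=0.2):
        """Keep the top `fraction` of MTR's candidate list, ranked by relevancy.

        Tests the script has never seen before (new tests) get maximum
        relevancy, so they are always included in their first run.
        """
        ranked = sorted(candidate_tests,
                        key=lambda t: relevancy_scores.get(t, float('inf')),
                        reverse=True)
        keep = max(1, int(len(ranked) * fraction))
        return ranked[:keep]

So the hand-off would effectively be something like selected = filter_tests(mtr_candidates, relevancy_scores), and MTR runs only what comes back; abandoned tests never enter the picture because MTR never passes them in.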