Hi, Pablo! On Apr 22, Pablo Estrada wrote:
To begin with, I am focusing on the *Buildbot documentation*, to *(1) understand how the data is organized* inside it, and to *(2) figure out how to add the extra functionality* into the BuildBot.
As for (1) - I think the tables where test failures are recorded are our own addition, not part of the buildbot. If that's true, then the buildbot documentation won't help much. I'll attach a chart of table relationships that I drew for myself earlier, when I was looking into this. As for (2) - I'd say the most important part is to figure out how to select the subset of tests to run, not how to integrate it into buildbot. Think about it this way: if you don't complete everything you planned, but you know how to reduce the number of tests, we can integrate it into buildbot. But if you completely figure out the integration part and don't know what tests to run, the whole project will be useless - a failure. Of course, we all hope that you'll complete everything you wanted :) but the scientific/experimental part is way more important than the purely technical integration task.
I want to get started with the data analysis, so it would be great if you could help me with the following things:
1. Sergei, you said you did a bit of research into this. Do you have your *previous results*, or *any papers* you have read that you could suggest to me?
No papers, sorry. Previous results - yes. I'm thinking about what would be less confusing to show :) I have a bzr repository with the test data (text files, basically database dumps) and a pretty hairy perl script that processes them and outputs result files. Then a gnuplot script that reads these result files and shows graphs. There were 9 script runs, that is 9 result files and 9 graphs. I'll attach them too.

The model I used was:

1. For every new revision buildbot runs all tests on all platforms.

2. But they are sorted specially. No matter in what order the tests are executed, the result - the set of failed tests - doesn't change. So my script was emulating that, changing the order in which tests were run, with the goal of having failures happen as early as possible. That's how to interpret the graphs. For example, on v5 you can see that more than 90% of all test failures happen within the first 20K tests.

3. There was no strictly defined "training set of data", but for every new test run I used the data from all previous test runs. Still, for the first ~1500 test runs I didn't do any predictions (reordering), waiting until I had enough statistics.

The next step for me would've been to emulate *running fewer tests* - that is, pretend that buildbot only runs these 20K tests (or whatever the cutoff value is) and only use these failures in later predictions. I didn't do that.

Regards,
Sergei
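
P.S. In case it helps to make the model concrete, here is a rough sketch in Python of the reorder-then-cut-off emulation described above. It is not the actual perl script - all the names (failure_counts, prioritize, CUTOFF, WARMUP_RUNS, emulate) are made up for illustration, and the cutoff part is the "next step" I never got to, not something the existing scripts do:

    from collections import defaultdict

    CUTOFF = 20000        # pretend buildbot only runs the first 20K tests
    WARMUP_RUNS = 1500    # collect statistics before starting to reorder

    def prioritize(all_tests, failure_counts):
        # Put tests that failed most often in the past at the front.
        return sorted(all_tests, key=lambda t: failure_counts[t], reverse=True)

    def emulate(runs):
        # runs: one dict per revision, mapping test name -> failed (True/False)
        failure_counts = defaultdict(int)
        for i, run in enumerate(runs):
            tests = list(run)
            if i >= WARMUP_RUNS:
                tests = prioritize(tests, failure_counts)
            executed = tests[:CUTOFF]      # the only tests we pretend to run
            caught = sum(1 for t in executed if run[t])
            total = sum(run.values())
            # Only failures we actually "saw" feed later predictions.
            for t in executed:
                if run[t]:
                    failure_counts[t] += 1
            yield caught, total

Plotting caught/total for every run would give roughly the same kind of graph as the gnuplot output, only with the cutoff applied, so one could see how many failures a reduced test set would still catch.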