as for (2) - I'd say the most important part is to figure out how to
select the subset of tests to run, not how to integrate it in buildbot.
I have a bzr repository with the test data (text files, basically
database dumps) and a pretty hairy perl script that processes them and
outputs result files. Then a gnuplot script that reads these result
files and shows graphs.
There were 9 script runs, that is 9 result files and 9 graphs. I'll attach them too.
The model I used was:
1. For every new revision buildbot runs all tests on all platforms
2. But they are sorted specially. The order in which tests are
executed doesn't change the result - the set of failed tests. So,
my script emulated different orderings of the tests, with the goal
of having failures happen as early as possible.
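The reordering idea in step 2 can be sketched like this (a minimal illustration, not the actual Perl script; the test names and the failure-history format are made up):

```python
from collections import Counter

def prioritize(tests, past_failures):
    # Run tests with the most recorded past failures first; ties keep
    # their original order because Python's sort is stable.
    fail_counts = Counter(past_failures)
    return sorted(tests, key=lambda t: fail_counts[t], reverse=True)

# Hypothetical example: each history entry is a test that failed in a past run.
tests = ["t_parser", "t_io", "t_net", "t_core"]
history = ["t_net", "t_net", "t_core"]
print(prioritize(tests, history))  # ['t_net', 't_core', 't_parser', 't_io']
```

Since the set of failures is order-independent, any such reordering is safe; a good one just surfaces the failures sooner.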
That's how to interpret the graphs. For example, on v5 you can see that
more than 90% of all test failures happen within the first 20K tests.