Hi Elena,

It's hard to make suggestions without seeing what you currently have. Please let me know when you have pushed the code.

I just finished cleaning up the code with the new implementation, but the strategy is exactly the same. It is the strategy that I have been looking for advice on.

In any case, I just uploaded the new code: https://github.com/pabloem/Kokiri/tree/core-wrapper_architecture
But the strategy of using file correlation is still the same.

Could you please explain what you mean by logging into buildbot (and by more precise data collection via it)? How exactly are you planning to work with buildbot interactively? In the part that concerns our task, buildbot picks up a push, gets it compiled and runs MTR with certain predefined parameters. There isn't really much room for interaction. Possibly I have misunderstood your question entirely, so please elaborate on it.

What I can do (it also concerns your previous comment about the non-continuous data) is upload a fresh data dump for you; hopefully it will have [almost] all matching logs, so you'll get a consistent chunk of test runs to experiment with.

I mean adding some code that logs extra information, such as which tests were run in each test_run. This would be the main thing.
I understand that the logfiles that you sent me contain this information, but storing them is not scalable, and even with a fresh dump, I'm not sure there would be a continuous set of data. I wrote a small script that checks which test runs in the database dump have a matching logfile, and the matches are quite scattered.
Towards the end there are more matches, but they are still scattered, and there is no long stretch of consecutive matching test runs:

https://raw.githubusercontent.com/pabloem/random/master/matches.txt

If you look close to the end of that file, there is still a gap of 20 consecutive skipped test runs:
148484: - kvm-bintar-centos5-x86_1066-log-test-stdio
Skip 20
148485: - winx64-packages_3203-log-test-stdio
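
For reference, the gap analysis is roughly along these lines. This is a simplified sketch rather than the actual script: the function names are made up for illustration, and it assumes the test run ids from the dump are already in order and that we have a set of the ids for which a logfile exists.

```python
from itertools import groupby


def summarize_matches(run_ids, ids_with_logs):
    """Collapse the ordered test runs into (has_log, count) stretches."""
    flags = (run_id in ids_with_logs for run_id in run_ids)
    return [(has_log, sum(1 for _ in group)) for has_log, group in groupby(flags)]


def longest_matched_stretch(run_ids, ids_with_logs):
    """Length of the longest run of consecutive test runs that have a logfile."""
    stretches = [n for has_log, n in summarize_matches(run_ids, ids_with_logs) if has_log]
    return max(stretches, default=0)


if __name__ == "__main__":
    # Toy data: six test runs, logfiles available for three of them.
    print(summarize_matches([1, 2, 3, 4, 5, 6], {1, 2, 5}))
    print(longest_matched_stretch([1, 2, 3, 4, 5, 6], {1, 2, 5}))
```

A "Skip 20" entry in matches.txt corresponds to a `(False, 20)` stretch here.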
So what I had suggested was to log more data about each test run: mainly which tests ran, but also as much other information as possible.
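
To make the idea concrete, here is a minimal sketch of the kind of record I have in mind: one JSON line per test run, listing the tests that ran. Everything in it is an assumption for illustration (the file layout, the field names, the function names); it does not use any buildbot or MTR API, and where exactly it would be hooked in is the open question.

```python
import json


def record_test_run(log_path, run_id, tests_run):
    """Append one JSON line describing a single test run (hypothetical format)."""
    entry = {"test_run_id": run_id, "tests": sorted(tests_run)}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def load_test_runs(log_path):
    """Read the log back into a dict: test_run_id -> list of tests that ran."""
    with open(log_path, encoding="utf-8") as fh:
        return {e["test_run_id"]: e["tests"] for e in map(json.loads, fh)}
```

A record like this is much smaller than a full logfile, so it could be kept for every test run, and the simulation would then know exactly which tests ran in each one.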

For now, yes, if you'd be so kind, please upload a fresh dump of the database : )

Regards
Pablo