Re: [Maria-developers] [GSoC] Optimize mysql-test-runs - Results of new strategy

23 Jul 2014

Hi Pablo,

On 23.07.2014 15:51, Pablo Estrada wrote:
...
Hi Elena,
It's hard to make suggestions without seeing what you currently have,
...
please let me know when you have pushed the code.
I just finished cleaning up the code with the new implementation, but in
any case, the strategy is exactly the same. I have been looking for advice
on the strategy.
In any case, I just uploaded the new code:
https://github.com/pabloem/Kokiri/tree/core-wrapper_architecture
But the strategy of using file correlation is still the same.
Thanks. I hoped you would have results of the experiments involving 
incoming lists of tests, as I think it's an important factor which might 
affect the results (and hence the strategy); but I'll look at what we 
have now.
...
Could you please explain what you mean by logging into buildbot (and by
more precise data collection via it)? How exactly are you planning to work
with buildbot interactively? In the part that concerns our task, buildbot
picks up a push, gets it compiled and runs MTR with certain predefined
parameters. There isn't really much room for interaction. Possibly I
totally misunderstand your question, so please elaborate on it.
What I can do (it also concerns your previous comment about the
non-continuous data) is upload a fresh data dump for you; hopefully it will
have [almost] all matching logs, so you'll get a consistent chunk of test
runs to experiment with.
I mean adding some code that logs extra information, such as which tests
were run in each test_run. This would be the main thing.
I understand that the logfiles that you sent me contain this information,
but storing them is not scalable, and even with a fresh dump, I'm not sure
there would be a continuous set of data. I made a small script that
analyzes the matches of the files with the dump from the database, and
their matching is quite random.
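
A minimal sketch of the first step such a matching script needs, not the actual script. It assumes that a log file name has the shape <builder>_<build number>-log-<step>-stdio, which is inferred from names like kvm-deb-lucid-x86_3313-log-test_4-stdio; parse_log_name is a hypothetical helper, and how the parsed parts map to a test run id in the dump is left open.

```python
import re

# Assumed file-name shape: <builder>_<build number>-log-<step>-stdio
LOG_NAME = re.compile(r"^(?P<builder>.+)_(?P<build>\d+)-log-(?P<step>.+)-stdio$")

def parse_log_name(name):
    """Return (builder, build number, step), or None if the name does not fit."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return m.group("builder"), int(m.group("build")), m.group("step")

print(parse_log_name("kvm-deb-lucid-x86_3313-log-test_4-stdio"))
```

Names that do not fit the assumed shape return None, so they can be counted separately instead of being silently matched to the wrong run.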
I will see what we can do about getting reliable lists one way or 
another; certainly the log files are a temporary solution, but it would be 
nice to use them for experiments and see the results anyway, because 
modifying the MTR/buildbot tandem and especially collecting the new data of 
considerable volume will take time.
...
Towards the end there is more matching, but it is still quite random, and
the matching never stays consistent for long:
https://raw.githubusercontent.com/pabloem/random/master/matches.txt
If you look close to the end, there is a continuous run of 20
skipped test runs:
...
148484: - kvm-bintar-centos5-x86_1066-log-test-stdio
Skip 20
148485: - winx64-packages_3203-log-test-stdio
If I interpret your list correctly, you mean that logs for test runs 
with id between 148464 and 148483 (inclusive) are missing.
It's a bit strange.
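
For reference, a gap report of this kind can be sketched as follows. This is an illustration under assumptions, not the script that produced matches.txt: find_gaps is a hypothetical name, and 148463 is a made-up neighbouring id used only to frame the gap.

```python
def find_gaps(matched_ids):
    """Yield (first_missing, last_missing, count) for each hole in the ids."""
    ids = sorted(set(matched_ids))
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            yield prev + 1, cur - 1, cur - prev - 1

# 148463 is a made-up neighbour; 148484 and 148485 come from the listing.
for first, last, count in find_gaps([148463, 148484, 148485]):
    print(f"Skip {count}: {first}..{last}")
# prints: Skip 20: 148464..148483
```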

I see logs for the following runs:

148466 - winx64-packages_3170-log-test-stdio
148467 - win32-packages_3172-log-test-stdio
148470 - win-rqg-se_309-log-test-stdio
148471 - kvm-deb-lucid-x86_3313-log-test_4-stdio
148472 - win32-packages_3173-log-test-stdio
148473 - kvm-deb-debian6-amd64_2705-log-test_4-stdio
148474 - winx64-packages_3171-log-test-stdio
148476 - win-rqg-se_310-log-test-stdio
148478 - kvm-deb-debian6-x86_2850-log-test_4-stdio
148481 - win-rqg-se_311-log-test-stdio
148482 - kvm-bintar-centos5-amd64_359-log-test-stdio
148483 - kvm-deb-precise-amd64_2709-log-test_4-stdio

This is not to say that parsing logs is the best way to do things, but 
apparently something went wrong either with my archiving or with your 
matching. If you don't have these files, please let me know.

Now, regarding the misses.

148464, 148475, 148480 are bld-dan-release. For this builder we indeed 
don't seem to have logs; and the tests are not reliable there, so it 
should be all right to ignore failures from it.
148465 - that's a miss, something went wrong while storing logs.
148468, 148469, 148477, 148479 - these are real misses, we don't have 
these logs
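
The list of misses can be cross-checked with a plain set difference. This is a sketch only; it takes the kvm-deb-debian6-x86_2850 entry in the list of found logs to be run 148478.

```python
# Run ids for which logs were found, as listed in this message
# (the kvm-deb-debian6-x86_2850 entry is taken to be run 148478).
found = {148466, 148467, 148470, 148471, 148472, 148473, 148474,
         148476, 148478, 148481, 148482, 148483}

expected = set(range(148464, 148484))  # 148464..148483 inclusive
missing = sorted(expected - found)
print(missing)
```

The result is the same eight ids as above: three from bld-dan-release, one storing failure, and four real misses.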

Most of these misses should not happen for newer test runs. For example, 
logs for labrador only start from June, while our database dump was from April.
...
So, what I had suggested was to log more data about each test run: mainly
which tests ran, but as much other information as possible.
For now, yes, if you'd be so kind, please upload a fresh dump of the
database : )
I've uploaded the fresh dump. Same location, file name 
buildbot-20140722.dump.gz.

Regards,
Elena
...
Regards
Pablo


Elena Stepanova