1. Structure
2. Cleanup
If you call several simulations from run_basic_simulations.py, only the very first one uses the correct data and produces real results. All subsequent ones work on data already modified by the previous runs, so their results are meaningless.
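To illustrate what I mean by proper cleanup (the names here are hypothetical, not taken from your code): each simulation should get its own pristine copy of the input data instead of whatever the previous run left behind.

    import copy

    def run_all_simulations(original_data, simulations):
        # 'original_data' and 'simulations' are placeholders; the point is
        # only that every run starts from the same untouched state.
        results = []
        for simulation in simulations:
            # Deep-copy (or otherwise reset) the data so that modifications
            # made by one simulation never leak into the next one.
            data = copy.deepcopy(original_data)
            results.append(simulation(data))
        return results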
3. Modes
These flaws should be easy to fix by doing proper cleanup before each simulation. But there are also other fragments of code where, for example, the logic for 'standard' mode is expected to always run and is relied upon, even if the requested mode is different.
In fact, you build all queues every time. That would be an understandable trade-off if it saved time on the simulations, but you re-run them separately anyway and only return the requested queue.
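A sketch of what I'd expect instead, with illustrative names only: dispatch on the requested mode and build just that one queue, so nothing depends on the 'standard' logic having run as a side effect.

    def build_queue(mode, tests, metrics):
        # All names here are illustrative; the point is that each mode has
        # its own builder and only the requested queue is ever built.
        def standard_queue():
            # Plain priority order by metric.
            return sorted(tests, key=lambda t: metrics.get(t, 0), reverse=True)

        def platform_queue():
            # Placeholder: a real builder would weight the metric by
            # per-platform failure statistics before sorting.
            return sorted(tests, key=lambda t: metrics.get(t, 0), reverse=True)

        builders = {'standard': standard_queue, 'platform': platform_queue}
        if mode not in builders:
            raise ValueError('unknown mode: %r' % mode)
        return builders[mode]()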
4. Failed tests vs executed tests
- If we say we can afford to run 500 tests, we'd rather run 500 than 20, even if some of them have never failed before. This will also help us break the bubble, especially if we randomize the "tail" (tests with minimal priority that we add to fill the queue). If some of them fail, they'll get a proper metric and migrate to the meaningful part of the queue.
To populate the queue, you don't really need to know which tests have ever been run; you only need to know which ones MTR *wants* to run, as if the running set were unlimited. If we assume that MTR passes that list to you and you iterate through it, you can use your metrics for tests that failed or were edited before, and a default minimal metric for all other tests. Then, if the tests with calculated metrics are not enough to fill the queue, you randomly choose from the rest. It won't completely solve the problem of tests that never failed and were never edited, but at least it will make it less critical.
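Here is a rough sketch of that population logic. The function and variable names are mine, and I'm assuming MTR can hand you the full list of tests it wants to run:

    import random

    DEFAULT_METRIC = 0.0  # minimal priority for tests that never failed and were never edited

    def populate_queue(wanted_tests, metrics, queue_size):
        # Tests with a known metric (failed or edited before) come first,
        # in priority order.
        known = [t for t in wanted_tests
                 if metrics.get(t, DEFAULT_METRIC) > DEFAULT_METRIC]
        known.sort(key=lambda t: metrics[t], reverse=True)
        queue = known[:queue_size]

        # Fill the rest of the queue with a random sample of the remaining
        # tests, so that tests without any history still get a chance to
        # fail and earn a real metric.
        if len(queue) < queue_size:
            queued = set(queue)
            tail_pool = [t for t in wanted_tests if t not in queued]
            tail_size = min(queue_size - len(queue), len(tail_pool))
            queue.extend(random.sample(tail_pool, tail_size))
        return queue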
6. Full / non-full simulation mode
I couldn't understand what the *non*-full simulation mode is for, can you explain this?
7. Matching logic (get_test_file_change_history)
The logic where you try to match result file names to test names is not quite correct. Here are some of the problem cases:
There can also be subsuites. Consider the example:
./mysql-test/suite/engines/iuds/r/delete_decimal.result
The result file can live not only in the /r dir, but also in the /t dir, together with the test file. It's not cool, but it happens; see for example mysql-test/suite/mtr/t/
Here are some other possible patterns for engine/plugin suites:
./storage/tokudb/mysql-test/suite/tokudb/r/rows-32m-1.result
./storage/innodb_plugin/mysql-test/innodb.result
Also, in release builds they can be in the mysql-test/plugin folder:
mysql-test/plugin/example/mtr/t
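To show what I mean, here is a hedged sketch covering the patterns above (the regular expressions and the way I derive the suite name are my assumptions, not an exhaustive rule set):

    import re

    _RESULT_PATTERNS = [
        # mysql-test/suite/<suite>[/<subsuite>]/{r,t}/<test>.result
        # (also catches storage/<engine>/mysql-test/suite/... layouts)
        re.compile(r'mysql-test/suite/(?P<suite>.+)/(?:r|t)/(?P<test>[^/]+)\.result$'),
        # mysql-test/plugin/<plugin>/<suite>/{r,t}/<test>.result  (release builds)
        re.compile(r'mysql-test/plugin/[^/]+/(?P<suite>[^/]+)/(?:r|t)/(?P<test>[^/]+)\.result$'),
        # storage/<plugin>/mysql-test/<test>.result  (no suite directory at all)
        re.compile(r'storage/(?P<suite>[^/]+)/mysql-test/(?P<test>[^/]+)\.result$'),
    ]

    def result_file_to_test(path):
        # Return (suite, test_name) for a recognized result file path, or None.
        for pattern in _RESULT_PATTERNS:
            match = pattern.search(path)
            if match:
                return match.group('suite'), match.group('test')
        return None

For example, './mysql-test/suite/engines/iuds/r/delete_decimal.result' would give ('engines/iuds', 'delete_decimal'), and './storage/tokudb/mysql-test/suite/tokudb/r/rows-32m-1.result' would give ('tokudb', 'rows-32m-1').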
Be aware that the logic where you compare branch names doesn't currently work as expected. Your list of "fail branches" consists of clean names only, e.g. "10.0", while row[BRANCH] can be like "lp:~maria-captains/maria/10.0". I'm not sure yet why it is sometimes stored this way, but it is.
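A possible fix, assuming (and this is only my guess from the example above) that the short name is always the last path component of the long form:

    def normalize_branch(branch):
        # 'lp:~maria-captains/maria/10.0' -> '10.0'; a clean name like
        # '10.0' is returned unchanged.
        return branch.rsplit('/', 1)[-1]

    # Then compare normalize_branch(row[BRANCH]) against the clean names
    # in the list of fail branches.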
Let's get back to it (both of them) after the logic with dependent simulations is fixed; then we'll review it and see why it doesn't work, if it still doesn't. Right now any effect that file edits might have is rather coincidental, and possibly the other one is broken as well.
- Humans are probably a lot better at predicting first failures than the current strategy.
This is true; unfortunately, it's a full-time job, and we can't afford to dedicate a human resource to it.