Hi Pablo, Thanks for the update. Some comments inline. On 02.06.2014 18:44, Pablo Estrada wrote:
Hello everyone, Here's a small report on the news that I have so far:
1. I had a slow couple of weeks because of a quick holiday that I took. I will make up for that.
2. I added the metric that considers the number of test_runs since a test_case last ran. I graphed it, and it does not affect the results much at all. I still think this is useful for uncovering hidden bugs that might lurk in the code for a long time, but testing this condition is difficult with our data. I would like to keep this measure, especially since it doesn't seem to affect results negatively. Opinions? (A small illustrative sketch follows below.)
3. I liked Sergei's idea of using changes to the test files to calculate the relevancy index. If a test has been changed recently, its relevancy index should be high. This is also more realistic, and it uses information that is easy for us to figure out.
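A minimal sketch of what this "runs since last execution" factor could look like; the function names, weight and cap are purely illustrative and not taken from the actual simulation code:

    # Illustrative sketch only: an additive "staleness" bonus that grows with
    # the number of test runs elapsed since a test case was last executed.
    # Names, weight and cap are hypothetical.

    def staleness_boost(runs_since_last_execution, weight=0.01, cap=1.0):
        """Bonus that grows the longer a test case has not been executed."""
        return min(weight * runs_since_last_execution, cap)

    def relevancy(base_score, runs_since_last_execution):
        # base_score would come from the existing failure-history metric
        return base_score + staleness_boost(runs_since_last_execution)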
Right, change should be taken into account, but test files cannot be completely relied upon. The problem is that many MTR tests have a more complicated structure than just a test/result pair. They call various other files from inside; sometimes a test file is not much more than setting a few variables and then calling some common logic. Thus, the test file itself never changes, while the logic might. I think it makes more sense to use *result* files. A result file will almost always reflect a change to the logic, so it should be more accurate, although not perfect either. If a change was made to fix the test itself, e.g. to get rid of a sporadic timeout and such, it's possible that the result file stays the same while a test or an included file changes.
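A rough sketch of how result-file recency could feed into the relevancy index, assuming the usual suite layout (mysql-test/r for the main suite, mysql-test/suite/<suite>/r for the rest); paths and the scoring formula are illustrative only:

    import os
    import time

    # Sketch under assumptions: main-suite result files live in mysql-test/r/,
    # other suites in mysql-test/suite/<suite>/r/ (verify against the tree you
    # use). The half-life scoring is purely illustrative.

    def result_file_path(tree_root, suite, test_name):
        if suite == "main":
            return os.path.join(tree_root, "mysql-test", "r",
                                test_name + ".result")
        return os.path.join(tree_root, "mysql-test", "suite", suite, "r",
                            test_name + ".result")

    def recency_score(path, half_life_days=30.0):
        """Higher score for result files modified more recently."""
        if not os.path.exists(path):
            return 0.0
        age_days = (time.time() - os.path.getmtime(path)) / 86400.0
        return 0.5 ** (age_days / half_life_days)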
- I am trying to match the change_files table (or the mysql-test directory) against the test_failure table. I was confused about the names of tests and test suites, but I am making progress on it. Once I am able to match at least 90% of the test_names in test_failure with the filenames in the change_files table, I will incorporate this data into the code and see how it works out.
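For reference, a rough sketch of the kind of name normalization this matching needs; the path layout and naming conventions are assumptions to verify against the actual data:

    import re

    # Sketch only: turn a changed-file path from the change_files table into
    # the "suite.testname" form used in test_failure. Path layout is an
    # assumption; adjust for nested suites and storage-engine trees.

    def path_to_test_name(path):
        m = re.match(r".*mysql-test/suite/([^/]+)/t/([^/]+)\.test$", path)
        if m:
            return "%s.%s" % (m.group(1), m.group(2))
        m = re.match(r".*mysql-test/t/([^/]+)\.test$", path)
        if m:
            return "main.%s" % m.group(1)
        return None  # not a test file

    # e.g. path_to_test_name("mysql-test/suite/rpl/t/rpl_alter.test")
    #      -> "rpl.rpl_alter"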
I recommend reading these pages: https://mariadb.com/kb/en/mysql-test-overview/ https://mariadb.com/kb/en/mysql-test-auxiliary-files/ They might help to resolve some confusion. And of course there is a good "official" MTR manual, which you've probably seen: http://dev.mysql.com/doc/mysqltest/2.0/en/index.html
- *Question*: Looking at the change_files table, there are files that have been *ADDED* several times. Why would this be? Maybe when a new branch is created, all files are ADDED to it? Any ideas? : ) (If no one knows, I'll figure it out, but maybe you do know ; ))
My guess is that the most common reason is multiple merges. A file is added to, let's say, 5.1; then the change, along with others, is merged into 5.2, and hence the file is added again; then into 5.3, etc. Besides, there might be numerous custom development trees registered on buildbot into which the change is also merged, so the file is added there again. I'm sure there are more reasons.
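A quick way to check this hypothesis would be something along these lines; the (filename, action, branch) row layout is an assumption about the change_files data, so adjust it to the real column names:

    from collections import Counter

    # For every file ADDED more than once, count how many distinct branches or
    # trees it was added to. If most multiply-ADDED files show several distinct
    # branches, merges are the likely explanation.

    def added_branch_counts(rows):
        per_file = {}
        for filename, action, branch in rows:
            if action == "ADDED":
                per_file.setdefault(filename, set()).add(branch)
        return Counter({f: len(b) for f, b in per_file.items() if len(b) > 1})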
4. I uploaded inline comments for my code last week; let me know if they are clear enough. You can start with run_basic_simulations.py, where the most important functions are called, and afterwards dive into basic_simulator.py, where the simulation is actually done. The repository is a bit messy, I admit. I'll clean it up in the following commits.
Thanks for the hints. I started looking at your code, but haven't made much progress yet. I will follow the path you recommended. Regards, /E
This is all I have to report for now. Any advice on the way I'm proceeding is welcome : ) Have a nice week, everyone.
Pablo
On Sun, May 25, 2014 at 11:43 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Pablo!
On May 25, Pablo Estrada wrote:
On Thu, May 22, 2014 at 5:39 PM, Sergei Golubchik <serg@mariadb.org> wrote:
I don't think you should introduce artificial limitations that make the recall worse, because they "look realistic".
You can make it actually realistic instead of just looking realistic: simply pretend that your code is already running on buildbot and limits the number of tests to run. So, if a test didn't run, you don't have any failure information about it.
And then you only need to do what improves recall, nothing else :)
(of course, to calculate the recall you need to use all failures, even for tests that you didn't run)
Yes, my code *already works this way*. It doesn't consider failure information from tests that were not supposed to run. The graphs that I sent are from scripts that ran like this.
Good. I hoped that would be the case (but I haven't checked your scripts on github yet, sorry).
Of course, the recall is just the number of spotted failures out of the 100% of known failures : )
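In code form that is roughly the following, with hypothetical sets of failure identifiers:

    # Recall as described above: failures caught by the tests that were
    # selected to run, divided by all known failures in the history
    # (including failures of tests that were not run).

    def recall(caught_failures, all_known_failures):
        if not all_known_failures:
            return 1.0
        return len(caught_failures & all_known_failures) / float(len(all_known_failures))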
Anyway, with all this, I will get to work on adapting the simulation a little bit:
- Time since last run will also affect the relevancy of a test
- I will try to use the list of changed files from commits to make sure new tests start running right away
Any other comments are welcome.
Getting back to your "potential fallacy" about how you start taking tests into account only when they fail for the first time...
I agree, in real life we cannot do that. Instead, we start from a complete list of tests that is known in advance. And you don't have it, unfortunately.
An option would be to create a complete list of all tests that have ever failed (and perhaps remove tests that were added in some revision present in the history), and use that as a "starting set" of tests.
Alternatively, we can generate a list of all tests currently present in the 10.0 tree - everything that you have in the history tables should be a subset of that.
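A minimal sketch of how such a list could be generated from a tree checkout; the suite layout is an assumption noted in the comments:

    import os

    # Enumerate all *.test files in a checked-out tree to build the starting
    # set. Assumes mysql-test/t for the main suite and mysql-test/suite/<suite>/t
    # for the rest; nested suites may need extra handling.

    def all_tests(tree_root):
        tests = set()
        base = os.path.join(tree_root, "mysql-test")
        for dirpath, _dirs, files in os.walk(base):
            if os.path.basename(dirpath) != "t":
                continue
            parent = os.path.dirname(dirpath)
            suite = "main" if parent == base else os.path.basename(parent)
            for f in files:
                if f.endswith(".test"):
                    tests.add("%s.%s" % (suite, f[:-len(".test")]))
        return tests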
Regards, Sergei