Hi, Zhongyi Hu!
On Mar 14, Zhongyi Hu wrote:
Dear Sergei Golubchik,
I am a postgraduate student at the Institute of Software, Chinese Academy of Sciences, and my name is Zhongyi Hu. I major in computer science and my research field is data stream mining. Because I already have enough papers and work for graduation, I want to do something interesting, meaningful and valuable in my remaining time as a student.
I see. That's very nice :)
I have participated in two database projects: one was a main-memory database and the other was a database cluster. I gained some experience in database system design and implementation from them. Although I am just a beginner in this area, I really like it and hope to make it my career. I often use MySQL in research and work, but I am not very familiar with MariaDB. I am tremendously optimistic about its future because of all of you.
Well, let's come to the point. I am interested in the task "statistically optimize mysql-test runs by running less tests". I chose this task because I have written a few tools for automated testing. I know that performance is very important when there is a large amount of data or many cases to test.
This task won't make you familiar with database system design or implementation. For this task it doesn't matter whether the tests are database tests, unit tests, or something completely different. As far as this task is concerned, they're abstract units of work that can be executed in arbitrary order and can "succeed" or "fail", and the goal is to execute as few of these "tests" as possible while detecting as many "failures" as possible.
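To put that goal in concrete terms, here is a minimal Python sketch of how a chosen subset of tests could be scored. The function name and the two metrics are illustrative assumptions, not something taken from MDEV-5776 or from buildbot.

```python
def evaluate_selection(all_tests, selected, failed):
    """Return (fraction of tests executed, fraction of failures detected)."""
    all_tests, selected, failed = set(all_tests), set(selected), set(failed)
    # Cost: how much of the full suite was actually run.
    fraction_run = len(selected & all_tests) / len(all_tests)
    # Benefit: how many of the real failures the subset would have caught.
    # With no failures at all nothing can be missed, so treat that as 1.0.
    recall = len(selected & failed) / len(failed) if failed else 1.0
    return fraction_run, recall
```

A good selection strategy keeps the first number low and the second one high.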
I read MDEV-5776 and I think the main job is as follows.
When the code is changed, mysql-test is used to run the required tests. We need to combine the information about the changes and the scenarios to predict the probability of failure for each test and to find the relationships between the tests, and then decide what to test and which test cases to use. The purpose is to make testing more efficient. All of this should be done by an algorithm and a program.
Yes. But it's also useful to take into account the historical data - which tests failed before and where. In my experiments historical data were the most important (I got good results purely from statistical analysis of historical data), and the information about which files were changed didn't improve the results much. But perhaps I was doing it wrong?
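As a rough illustration of a purely historical approach, here is a Python sketch that ranks tests by their past failure rate and keeps only the top few. It is a sketch under stated assumptions, not the analysis used in the experiments mentioned above: the function name, the (test name, failed) input format and the add-one smoothing are all hypothetical choices.

```python
from collections import Counter

def rank_by_failure_history(history, budget):
    """Pick the `budget` tests with the highest smoothed historical failure rate.

    `history` is an iterable of (test_name, failed) pairs, one per past run.
    """
    runs, failures = Counter(), Counter()
    for test_name, failed in history:
        runs[test_name] += 1
        if failed:
            failures[test_name] += 1

    def score(name):
        # Add-one (Laplace) smoothing, so a test that was run only once
        # is not scored as "always fails" or "never fails".
        return (failures[name] + 1) / (runs[name] + 2)

    # Highest score first; ties are broken by name to keep the order stable.
    ranked = sorted(runs, key=lambda name: (-score(name), name))
    return ranked[:budget]
```

Information about changed files could later be added as an extra term in `score`, which makes it easy to check whether it improves the results at all.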
In addition, I think the job is in some ways like data stream mining: a lot of data needs to be analyzed statistically, and the hidden patterns change over time.
Yes, exactly.
Finally, I have two basic questions. 1) What exactly are the builder and the combination? I thought they refer to the compiler and the runtime environment.
Kind of, yes. See this reply of mine: https://lists.launchpad.net/maria-developers/msg06972.html it contains links to our buildbot (the tool that automatically builds and tests MariaDB on different platforms - "builders"). There you will see what builders are, what combinations are, and so on.
2) What does the "individual tests within a big test file" mean?
Most tests use the "mysqltest" tool. It is conceptually very simple: execute a set of commands, record the output, and compare it with the correct pre-recorded output. A test file contains SQL statements (and sometimes mysqltest directives). Technically, one can have many logical tests in one test file.
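Conceptually, that record-and-compare loop could be sketched in Python like this. This is not how mysqltest is implemented: `execute` is a hypothetical callable standing in for sending a statement to the server, and the recorded format (each statement followed by its output) is a simplification.

```python
def run_recorded_test(statements, execute, expected_output):
    """Run statements in order, record the output, compare with the expected text."""
    recorded_lines = []
    for statement in statements:
        recorded_lines.append(statement)           # echo the statement itself
        recorded_lines.append(execute(statement))  # record what it returned
    recorded = "\n".join(recorded_lines)
    # The test "succeeds" only if the recording matches the stored result exactly.
    return recorded == expected_output
```

Since the whole file is compared as one recording, several logical tests living in the same file succeed or fail together.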
Maybe I am completely wrong, but I still look forward to your reply. I hope to have the opportunity to learn from you in work and discussion.
If you want to participate in Google Summer of Code, don't forget to submit a proposal before the deadline: http://www.google-melange.com/gsoc/events/google/gsoc2014
Regards,
Sergei