Re: [Maria-developers] For Google Summer of Code 2014, Interested in the task of "statistically optimize mysql-test runs by running less tests"

13 Mar 2014

Hi, Zhongyi Hu!

On Mar 14, Zhongyi Hu wrote:
...
> Dear Sergei Golubchik,
> I am a postgraduate student at the Institute of Software, Chinese Academy
> of Sciences, and my name is Zhongyi Hu.
> I major in computer science and my research field is data stream
> mining. Because I already have enough papers and work for graduation, I
> want to do something interesting, meaningful and valuable in the rest
> of my time as a student.
I see. That's very nice :)
...
> I have participated in two database projects: one on a main-memory
> database and the other on a database cluster. They gave me some
> experience in database system design and implementation. Although I am
> just a beginner in this area, I really like it and hope to make it
> my career. I often use MySQL in research and work, but I am not very
> familiar with MariaDB. I am tremendously optimistic about its
> future because of all of you.
> Well, let's come to the point. I am interested in the task of
> "statistically optimize mysql-test runs by running less tests". I
> chose this task because I have written a few tools for automatic testing.
> I know performance is very important when there is a large amount
> of data or many cases to test.
This task won't make you familiar with database system design or
implementation. For this task it doesn't matter whether the tests are
database tests, unit tests, or something completely different. As far as
this task is concerned, they're abstract units of work that can be
executed in arbitrary order and can either "succeed" or "fail", and the
goal is to execute as few of these "tests" as possible while detecting
as many "failures" as possible.
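To make that goal concrete, here is a minimal sketch. It assumes every test already has some estimated failure likelihood; producing those estimates is the actual research problem and is not shown. It picks the top-k tests and scores the choice against one past run on two measures. Recall is the share of that run's failures the subset would have caught. Cost is the share of the suite it executes. The function names and the test names in the usage note are hypothetical illustrations, not part of mysql-test.

```python
from typing import Dict, List, Set, Tuple


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k tests with the highest estimated failure likelihood."""
    # Sort by descending score; ties are broken by name so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


def evaluate(selected: List[str], all_tests: Set[str], failed: Set[str]) -> Tuple[float, float]:
    """Return (recall, cost) of a selection for one historical run.

    recall: fraction of the run's failing tests the selection would have caught.
    cost:   fraction of the full test suite the selection executes.
    """
    chosen = set(selected) & all_tests
    # A run with no failures is trivially "fully caught".
    recall = len(chosen & failed) / len(failed) if failed else 1.0
    cost = len(chosen) / len(all_tests) if all_tests else 0.0
    return recall, cost
```

For example, with made-up scores `{"main.select": 0.9, "innodb.lock": 0.7, "main.join": 0.4, "main.alias": 0.05}`, `select_top_k(scores, 2)` returns `["main.select", "innodb.lock"]`. If that run actually failed `innodb.lock` and `main.join`, the selection has recall 0.5 at cost 0.5. Tuning a policy then means pushing recall up while holding cost down across many past runs.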
...
> I read MDEV-5776, and I think the main job is as follows.
> When the code is changed, mysql-test is used to run the requisite
> tests. We need to combine information about the changes and the
> scenarios to predict the probability of failure for each test and to
> find the relationships between the tests.
> Then decide what to test and which test cases should be used. The
> purpose is to make testing more efficient. All of this should
> be done by an algorithm and a program.
Yes. But it's also useful to take into account the historical data:
which tests failed before, and where.

In my experiments historical data were the most important (I've got good
results purely from statistical analysis of historical data), and the
information about which files were changed didn't improve the results
much. But perhaps I was doing it wrong?
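As a rough illustration of a purely historical approach, the sketch below estimates each test's failure rate as failures divided by executions over past runs. It adds add-one (Laplace) smoothing. This is an assumed, deliberately simple estimator, not the method used in the experiments mentioned above. The representation of a run as a pair of sets is also hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation of one historical run:
# (tests that were executed, tests that failed).
Run = Tuple[Set[str], Set[str]]


def historical_failure_rates(runs: Iterable[Run]) -> Dict[str, float]:
    """Estimate each test's failure rate as (failures + 1) / (executions + 2).

    Add-one smoothing keeps rarely run tests from getting an extreme
    0.0 or 1.0 estimate from only a handful of observations.
    """
    executed: Counter = Counter()
    failed: Counter = Counter()
    for ran, failures in runs:
        executed.update(ran)
        # Only count failures of tests that were actually executed in this run.
        failed.update(failures & ran)
    return {test: (failed[test] + 1) / (executed[test] + 2) for test in executed}
```

Sorting tests by these rates gives a baseline ranking. Per-test, per-builder statistics or recency weighting can then be layered on top and compared against it.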
...
> In addition, I think the job is in some ways like data stream
> mining: a lot of data needs to be analyzed statistically, and the
> hidden patterns change over time.
Yes, exactly.
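One common stream-mining way to let old patterns fade is exponential decay. The sketch below assumes that technique; it is a swapped-in illustration, not something the thread prescribes. After each run, every score is multiplied by a `decay` factor in (0, 1) and each failing test gets +1. A failure seen n runs ago therefore contributes `decay**n`. The class name and the default factor are hypothetical.

```python
from typing import Dict, Set


class DecayingFailureScores:
    """Online failure scores over a stream of test runs (exponential decay)."""

    def __init__(self, decay: float = 0.9) -> None:
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be in (0, 1)")
        self.decay = decay
        self.scores: Dict[str, float] = {}

    def update(self, failed: Set[str]) -> None:
        """Process one finished run, given the set of tests that failed in it."""
        # Age every existing score first, so older evidence weighs less.
        for test in self.scores:
            self.scores[test] *= self.decay
        # Then credit this run's failures with full weight.
        for test in failed:
            self.scores[test] = self.scores.get(test, 0.0) + 1.0

    def score(self, test: str) -> float:
        """Current score; tests never seen failing score 0.0."""
        return self.scores.get(test, 0.0)
```

Each update costs time proportional to the number of tests with a nonzero score, so the model never has to re-read the full history. The decay factor trades stability against how quickly the scores follow changing patterns.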
...
> Finally, I have two basic questions.
> 1) What exactly are the builder and the combination?
> I thought they referred to the compiler and the runtime environment.
Kind of, yes. See my earlier reply:
https://lists.launchpad.net/maria-developers/msg06972.html

It contains links to our buildbot (the tool that automatically builds
and tests MariaDB on different platforms, called "builders").

There you will see what builders are, what combinations are, and so on.
...
> 2) What does "individual tests within a big test file" mean?
Most tests use the "mysqltest" tool. It is conceptually very simple:
execute a set of commands, record the output, and compare it with the
correct pre-recorded output.

A test file contains SQL statements (and sometimes mysqltest
directives). Technically, one can have many logical tests in one test
file.
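The record-and-compare idea can be sketched in a few lines. This is not mysqltest itself, only a model of the concept. The hypothetical `execute` callable stands in for whatever actually runs a statement (for example a SQL client) and returns its text output. Python's `difflib.unified_diff` shows where the recorded output departs from the expected result.

```python
import difflib
from typing import Callable, List, Tuple


def run_recorded_test(
    commands: List[str],
    execute: Callable[[str], str],
    expected: str,
) -> Tuple[bool, str]:
    """Run commands, record their output, and compare it with the expected result.

    Returns (passed, diff); the diff is empty when the test passes.
    """
    # Record the concatenated output of every command, in order.
    recorded = "".join(execute(cmd) for cmd in commands)
    diff = "".join(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            recorded.splitlines(keepends=True),
            fromfile="expected",
            tofile="recorded",
        )
    )
    return recorded == expected, diff
```

In this sketch the pass/fail verdict covers the whole command list. If one file holds several logical tests, a single mismatch marks the entire file as failed.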
...
> Maybe I am completely wrong, but I still look forward to your reply.
> I hope to have the opportunity to learn from you through work and discussion.
If you want to participate in Google Summer of Code, don't forget to
submit a proposal before the deadline:
http://www.google-melange.com/gsoc/events/google/gsoc2014

Regards,
Sergei