[Maria-developers] For Google Summer of Code 2014, Interested in the task of "statistically optimize mysql-test runs by running less tests"


胡仲义

13 Mar 2014

6:32 p.m.

Dear Sergei Golubchik,

I am a postgraduate student at the Institute of Software, Chinese Academy of Sciences, and my name is Zhongyi Hu. I major in computer science, and my research field is data stream mining. Because I already have enough papers and work for graduation, I want to do something interesting, meaningful and valuable in my remaining time as a student.

I have participated in two database projects: one on a main-memory database and the other on a database cluster. They gave me some experience in database system design and implementation. Although I am just a beginner in this area, I really like it and hope to make it my career. I often use MySQL in research and work, but I am not very familiar with MariaDB. I am tremendously optimistic about its future because of all of you.

Well, let's come to the point. I am interested in the task of "statistically optimize mysql-test runs by running less tests". I chose this task because I have written a few tools for automated testing. I know that performance is very important when there is a large amount of data or many cases to test.

I read MDEV-5776, and I think the main job is as follows.

When the code is changed, mysql-test is used to run the required tests. We need to combine information about the changes and the test scenarios to predict the probability of failure for each test and to find the relationships between tests. Then we decide what to test and which test cases to use. The purpose is to make testing more efficient. All of this should be done by an algorithm implemented in a program.

In addition, I think the job is in some ways like data stream mining: a lot of data needs to be analyzed statistically, and the hidden patterns change over time.

Finally, I have two basic questions. 1) What exactly are the builder and the combination? I thought they referred to the compiler and the runtime environment. 2) What does "individual tests within a big test file" mean?

Maybe I am completely wrong, but I still look forward to your reply. I hope to have the opportunity to learn from you through work and discussion.

Best regards
Zhongyi Hu
20140314


Roberto Spadim

13 Mar

6:50 p.m.

Hi Zhongyi Hu! You're in the right place :) MariaDB development is very, very nice, much better than MySQL :) The people here are very helpful. Good job =)

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

--
Roberto Spadim
SPAEmpresarial
Automation and Control Engineering

Sergei Golubchik

7:30 p.m.

Hi, Zhongyi Hu!

On Mar 14, Zhongyi Hu wrote:

> Dear Sergei Golubchik,
>
> I am a postgraduate student at the Institute of Software, Chinese Academy of Sciences, and my name is Zhongyi Hu. I major in computer science, and my research field is data stream mining. Because I already have enough papers and work for graduation, I want to do something interesting, meaningful and valuable in my remaining time as a student.

I see. That's very nice :)

> I have participated in two database projects: one on a main-memory database and the other on a database cluster. They gave me some experience in database system design and implementation. Although I am just a beginner in this area, I really like it and hope to make it my career. I often use MySQL in research and work, but I am not very familiar with MariaDB. I am tremendously optimistic about its future because of all of you.
>
> Well, let's come to the point. I am interested in the task of "statistically optimize mysql-test runs by running less tests". I chose this task because I have written a few tools for automated testing. I know that performance is very important when there is a large amount of data or many cases to test.

This task won't make you familiar with database system design or implementation. For this task it doesn't matter whether the tests are database tests, unit tests, or something completely different. As far as this task is concerned, they're abstract units of work that can be executed in arbitrary order and that can "succeed" or "fail", and the goal is to execute as few of these "tests" as possible while detecting as many "failures" as possible.
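Under this abstraction, a selection strategy can be scored by two numbers: what fraction of the tests it ran, and what fraction of the actual failures it caught. A minimal Python sketch, assuming a run is represented by the set of all test names and the set of names that failed in a full run; `evaluate_selection` and the example test names are illustrative only, not part of mysql-test-run or buildbot:

```python
def evaluate_selection(all_tests, selected, failed):
    """Return (fraction of tests run, fraction of failures detected).

    all_tests: every test that could have been run
    selected:  the subset the strategy chose to run
    failed:    the tests that failed in a full run
    """
    all_tests, selected, failed = set(all_tests), set(selected), set(failed)
    if not selected <= all_tests:
        raise ValueError("selected contains unknown tests")
    cost = len(selected) / len(all_tests) if all_tests else 0.0
    # A failure counts as detected only if the failing test was actually run.
    detected = selected & failed
    recall = len(detected) / len(failed) if failed else 1.0
    return cost, recall


tests = ["main.select", "main.join", "innodb.alter", "rpl.gtid"]
cost, recall = evaluate_selection(tests, ["main.select", "rpl.gtid"], ["rpl.gtid", "innodb.alter"])
print(cost, recall)  # 0.5 0.5
```

A strategy is better when it lowers the first number without lowering the second much; running everything trivially gives (1.0, 1.0).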

> I read MDEV-5776, and I think the main job is as follows.
>
> When the code is changed, mysql-test is used to run the required tests. We need to combine information about the changes and the test scenarios to predict the probability of failure for each test and to find the relationships between tests. Then we decide what to test and which test cases to use. The purpose is to make testing more efficient. All of this should be done by an algorithm implemented in a program.

Yes. But it's also useful to take the historical data into account: which tests failed before and where. In my experiments, historical data was the most important input (I got good results purely from statistical analysis of historical data), and the information about which files were changed didn't improve the results much. But perhaps I was doing it wrong?
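Purely historical selection can be illustrated by ranking tests by how often they failed in earlier runs and running only the top k. This is a minimal sketch of that idea under the assumption that each past run is recorded as the set of failed test names; it is not the method used in the experiments mentioned above, and `rank_by_history` is a hypothetical name:

```python
from collections import Counter


def rank_by_history(history, k):
    """Pick the k tests that failed most often in past runs.

    history: list of past runs, each given as the set of test names that failed
    k:       how many tests we can afford to run
    """
    failures = Counter()
    for failed_in_run in history:
        failures.update(failed_in_run)
    # Sort by failure count (descending), then by name so ties are deterministic.
    ranked = sorted(failures, key=lambda t: (-failures[t], t))
    return ranked[:k]


history = [
    {"main.select", "rpl.gtid"},
    {"rpl.gtid"},
    {"innodb.alter", "rpl.gtid"},
    {"main.select"},
]
print(rank_by_history(history, 2))  # ['rpl.gtid', 'main.select']
```

Note that a test that has never failed is never selected by this rule, so a practical strategy would still need some way to run such tests occasionally.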

> In addition, I think the job is in some ways like data stream mining: a lot of data needs to be analyzed statistically, and the hidden patterns change over time.

Yes, exactly.
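One common way to let per-test estimates follow patterns that change over time is an exponentially weighted moving average of each test's pass/fail outcomes. A sketch, assuming every run reports pass or fail per test; the class name and the `decay` parameter are illustrative, and the right `decay` value would itself have to be tuned on historical data:

```python
class DecayedFailureRate:
    """Online failure-rate estimate per test, giving more weight to recent runs.

    After each run: score = decay * score + (1 - decay) * outcome,
    where outcome is 1.0 if the test failed and 0.0 if it passed.
    """

    def __init__(self, decay=0.9):
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be in (0, 1)")
        self.decay = decay
        self.scores = {}

    def update(self, test, failed):
        old = self.scores.get(test, 0.0)
        outcome = 1.0 if failed else 0.0
        self.scores[test] = self.decay * old + (1.0 - self.decay) * outcome

    def score(self, test):
        # Tests with no recorded runs get 0.0.
        return self.scores.get(test, 0.0)


# Both histories contain two failures in four runs, but in a different order.
old, recent = DecayedFailureRate(decay=0.5), DecayedFailureRate(decay=0.5)
for failed in [True, True, False, False]:
    old.update("rpl.gtid", failed)
for failed in [False, False, True, True]:
    recent.update("rpl.gtid", failed)
print(old.score("rpl.gtid"), recent.score("rpl.gtid"))  # 0.1875 0.75
```

A plain failure count would rate both histories the same; the decayed score ranks the recently failing test much higher.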

> Finally, I have two basic questions. 1) What exactly are the builder and the combination? I thought they referred to the compiler and the runtime environment.

Kind of, yes. See my reply at https://lists.launchpad.net/maria-developers/msg06972.html, which contains links to our buildbot (the tool that automatically builds and tests MariaDB on different platforms, the "builders"). There you will see what builders are, what combinations are, and so on.

> 2) What does "individual tests within a big test file" mean?

Most tests use the "mysqltest" tool. It is conceptually very simple: execute a set of commands, record the output, and compare it with the correct, pre-recorded output. A test file contains SQL statements (and sometimes mysqltest directives). Technically, one test file can contain many logical tests.
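mysqltest itself is a separate tool with its own `.test` and `.result` files; the Python below only mimics the record-and-compare idea with a hypothetical toy executor that evaluates arithmetic instead of SQL. It illustrates why comparing output per logical test, rather than once per file, tells you which of several tests in one file broke. All names here are illustrative:

```python
def run_and_compare(logical_tests, execute, expected):
    """Run each logical test, record its output, and compare it with the expected output.

    logical_tests: dict mapping a logical test name to its list of commands
    execute:       function that runs one command and returns its output as a string
    expected:      dict mapping the same names to the pre-recorded output
    Returns the names of the logical tests whose output differs.
    """
    mismatched = []
    for name, commands in logical_tests.items():
        recorded = "\n".join(execute(cmd) for cmd in commands)
        if recorded != expected.get(name):
            mismatched.append(name)
    return mismatched


def toy_execute(cmd):
    """Hypothetical executor: handles commands like '2 + 3' or '4 * 4'."""
    a, op, b = cmd.split()
    return str(int(a) + int(b)) if op == "+" else str(int(a) * int(b))


# One "test file" holding two logical tests.
test_file = {
    "addition": ["1 + 1", "2 + 3"],
    "multiplication": ["2 * 3", "4 * 4"],
}
# The pre-recorded output for "4 * 4" is deliberately wrong here.
expected = {"addition": "2\n5", "multiplication": "6\n15"}
print(run_and_compare(test_file, toy_execute, expected))  # ['multiplication']
```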

> Maybe I am completely wrong, but I still look forward to your reply. I hope to have the opportunity to learn from you through work and discussion.

If you want to participate in Google Summer of Code, don't forget to submit a proposal before the deadline: http://www.google-melange.com/gsoc/events/google/gsoc2014

Regards,
Sergei

胡仲义

16 Mar

6:38 p.m.

Hi, Sergei Golubchik!

I am afraid I don't understand the following items very well. Could you explain them to me?

- average over different combinations or builders
- or don't average and treat triplets (test,combination,builder) as individual "tests"

Regards,
Zhongyi Hu


Sergei Golubchik

6:58 p.m.

Hi, Zhongyi Hu!

On Mar 17, Zhongyi Hu wrote:

> Hi, Sergei Golubchik!
>
> I am afraid I don't understand the following items very well. Could you explain them to me?
>
> - average over different combinations or builders
> - or don't average and treat triplets (test,combination,builder) as individual "tests"

Individual test files (say, select.test) run on different builders (say, win64, fedora20-i686, etc.). Test files fail with different probabilities, and these probabilities may be different for different builders.

One can calculate a "test failure probability" as a function of the probabilities for this test file to fail on different builders. Alternatively, one can treat "select.test run on win64" and "select.test run on fedora20-i686" as two different tests.

Regards,
Sergei
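The two options can be sketched as follows, assuming the history is available as (test, builder, failed) records; adding the combination as a third element of the key would give the triplets mentioned in the question. The function names are hypothetical, and the mean is only one possible choice of combining function:

```python
from collections import defaultdict


def failure_probabilities(history):
    """Estimate P(fail) for each (test, builder) pair from past results.

    history: list of (test, builder, failed) tuples, one per past execution
    """
    runs = defaultdict(int)
    fails = defaultdict(int)
    for test, builder, failed in history:
        runs[(test, builder)] += 1
        fails[(test, builder)] += int(failed)
    return {key: fails[key] / runs[key] for key in runs}


def per_test_probability(pair_probs, test):
    """Option 1: combine per-builder probabilities into one number per test file.

    Uses the mean across builders; other functions (e.g. the maximum) are
    equally possible choices.
    """
    probs = [p for (t, _), p in pair_probs.items() if t == test]
    return sum(probs) / len(probs) if probs else 0.0


history = [
    ("select", "win64", True),
    ("select", "win64", False),
    ("select", "fedora20-i686", False),
    ("select", "fedora20-i686", False),
]
pairs = failure_probabilities(history)
# Option 2: treat every (test, builder) pair as its own "test".
print(pairs[("select", "win64")], pairs[("select", "fedora20-i686")])  # 0.5 0.0
print(per_test_probability(pairs, "select"))  # 0.25
```

Option 2 keeps the information that the failures happen only on win64, which averaging hides; the price is more "tests" with fewer observations each.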

胡仲义

17 Mar

8:49 a.m.

Hi, Sergei Golubchik!

Thanks for your explanations! I have submitted the first version of my proposal. I would really appreciate it if you could give me some comments and suggestions to improve it.

Regards,
Zhongyi Hu

