[Maria-developers] [GSoC] Accepted student ready to work : )

Pablo Estrada

22 Apr 2014

6:14 a.m.

Hello Sergei, Elena and everyone,

I'm Pablo, and I was selected as part of Google Summer of Code 2014 to work on the project "*Statistically optimize mysql-test runs by running less tests*". First of all, thank you very much for having selected me for the project. I am very excited to start reading, reviewing data and coding up.

I see that Elena will be my mentor. Nice to meet you Elena : )

To begin with, I am focusing on the *Buildbot documentation*, to *(1)understand how the data is organized* inside it, and to *(2)*figure out *how to add the extra functionality* into the BuildBot.

I want to get started with the data analysis, so it would be great if you could help me with the following things:

1. Sergei, you said you did a bit of research into this. Do you have your *previous results*, or *any papers* that you have read that you could suggest to me?

2. Can I get *access to the buildbot database* to look at the data? If not, can I get the E/R model, or any scheme of *how the data looks like?*

   - I understand that maybe the data is not just a single database. I am starting by *reading the buildbot documentation*, to understand how the stuff is organized in it.

3. *Any other information* that you consider useful for me to know? Any advice?

And that's all I have for now. I'm looking forward to getting started!

Regards
Pablo

Elena Stepanova

22 Apr

10:54 a.m.

Hi Pablo,

On 4/22/2014 10:14 AM, Pablo Estrada wrote:

...

Hello Sergei, Elena and everyone, I'm Pablo, and I was selected as part of Google Summer of Code 2014 to work on the project "*Statistically optimize mysql-test runs by running less tests*". First of all, thank you very much for having selected me for the project. I am very excited to start reading, reviewing data and coding up.

I see that Elena will be my mentor. Nice to meet you Elena : )

Nice to meet you too and welcome!

...

To begin with, I am focusing on the *Buildbot documentation*, to *(1)understand how the data is organized* inside it, and to *(2)*figure out *how to add the extra functionality* into the BuildBot.

I want to get started with the data analysis, so it would be great if you could help me with the following things:

1. Sergei, you said you did a bit of research into this. Do you have your *previous results*, or *any papers* that you have read that you could suggest to me?

I assume you have already seen the JIRA task https://mariadb.atlassian.net/browse/MDEV-5776, which gives an overall picture of what Sergei had in mind. There was also an email thread which contained some information, particularly this email https://lists.launchpad.net/maria-developers/msg06972.html. I suppose if Sergei has something else to add, he will.

...

2. Can I get *access to the buildbot database* to look at the data? If not, can I get the E/R model, or any scheme of *how the data looks like?*

I will send you the current data dump from the buildbot database (or rather a link where you can download it from).

...

* I understand that maybe the data is not just a single database. I am starting by *reading the buildbot documentation*, to understand how the stuff is organized in it.

Please note that our current buildbot master is running version 0.8.5, in case it makes a difference; I suppose it might.

...

3. *Any other information* that you consider useful for me to know? Any advice?

As Sergei probably mentioned before, and as you will see from the buildbot data, it doesn't contain bzr revision IDs, only revision numbers. It doesn't matter much as long as you only need to ensure uniqueness, as you can always generate fake ones, but it will cause a problem when you need to map the results to the real bzr change sets, because unlike revIDs, revnos are not necessarily persistent. In many cases they can be restored from bzr logs, but it will require additional effort. We can get back to it if/when you find out that you really need them but don't know how to get them.

Regards,
Elena

...

And that's all I have for now. I'm looking forward to getting started!

Regards Pablo

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

Sergei Golubchik

12:23 p.m.

Hi, Pablo!

On Apr 22, Pablo Estrada wrote:

...

To begin with, I am focusing on the *Buildbot documentation*, to *(1)understand how the data is organized* inside it, and to *(2)*figure out *how to add the extra functionality* into the BuildBot.

as for (1) - I think the tables where test failures are recorded are our own addition, not part of buildbot. If that's true, then the buildbot documentation won't help much. I'll attach a chart of table relationships that I drew for myself earlier, when I was looking into this.

as for (2) - I'd say the most important part is to figure out how to select the subset of tests to run, not how to integrate it in buildbot. Think about it this way: if you don't complete everything you planned, but you know how to reduce the number of tests, we can integrate it in buildbot. But if you completely figure out the integration part and don't know what tests to run, the whole project will be useless, it's a failure.

Of course, we all hope that you'll complete everything you wanted :) but the scientific/experimental part is way more important than the purely technical integration task.

...

I want to get started with the data analysis, so it would be great if you could help me with the following things:

1. Sergei, you said you did a bit of research into this. Do you have your *previous results*, or *any papers* that you have read that you could suggest to me?

No papers, sorry. Previous results - yes. I'm thinking about what would be least confusing to show :)

I have a bzr repository with the test data (text files, basically database dumps) and a pretty hairy perl script that processes them and outputs result files. Then a gnuplot script that reads these result files and shows graphs.

There were 9 script runs, that is 9 result files and 9 graphs. I'll attach them too.

The model I used was:

1. For every new revision buildbot runs all tests on all platforms.

2. But they are sorted specially. No matter in what order tests are executed, this doesn't change the result - the set of failed tests. So, my script was emulating that, changing the order in which tests were run, with the goal of having failures happen as early as possible.

   That's how to interpret the graphs. For example, on v5 you can see that more than 90% of all test failures happen within the first 20K tests.

3. There was no strictly defined "training set of data", but for every new test run I used the data for all previous test runs. Still, for the first ~1500 test runs I didn't do any predictions (reordering), waiting until I got enough statistics.

The next step for me would've been to emulate *running less tests* - that is, pretend that buildbot only runs these 20K tests (or whatever the cutoff value is) and only use these failures in later predictions. I didn't do that.

Regards,
Sergei
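A minimal Python sketch of the model described in this message, including the "next step" that was not done. It is not Sergei's perl script: the ranking rule (past failure count per test), the function name `simulate`, its parameters, and the sample data are all assumptions made for illustration. With `cutoff=None` every test is executed and only the order changes; with a cutoff, only the first `cutoff` tests in the predicted order are executed, and only the failures actually observed feed later predictions.

```python
from collections import Counter


def simulate(runs, warmup=2, cutoff=None):
    """Replay test runs in chronological order.

    runs   -- list of (tests, failed) pairs: `tests` is the list of test
              names in their default order, `failed` the set that failed.
    warmup -- number of initial runs used only to collect statistics.
    cutoff -- if given, only the first `cutoff` tests of the predicted
              order are "executed"; failures beyond it go unnoticed.

    Returns one (failures_caught, failures_total) pair per predicted run.
    """
    history = Counter()  # test name -> failures observed so far
    results = []
    for i, (tests, failed) in enumerate(runs):
        if i < warmup:
            # Warm-up: no reordering, just gather statistics.
            history.update(failed)
            continue
        # Assumed ranking rule: most frequently failing tests first.
        # sorted() is stable, so ties keep the default order.
        ordered = sorted(tests, key=lambda t: -history[t])
        executed = ordered if cutoff is None else ordered[:cutoff]
        caught = failed.intersection(executed)
        results.append((len(caught), len(failed)))
        # Only failures that were actually observed are learned from.
        history.update(caught)
    return results


# Hypothetical data: four runs over the same four tests.
runs = [
    (["a", "b", "c", "d"], {"c"}),
    (["a", "b", "c", "d"], {"c", "d"}),
    (["a", "b", "c", "d"], {"c"}),
    (["a", "b", "c", "d"], {"a", "d"}),
]

print(simulate(runs, warmup=2))            # [(1, 1), (2, 2)]
print(simulate(runs, warmup=2, cutoff=2))  # [(1, 1), (1, 2)]
```

In the second call the failure of test `a` in the last run is missed because `a` has no failure history and ranks below the cutoff. That trade-off between tests skipped and failures missed is what the experiment would need to measure.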

Kristian Nielsen

12:35 p.m.

Sergei Golubchik <serg@mariadb.org> writes:

...

as for (1) - I think the tables where test failures are recorded are our own addition, not part of buildbot. If that's true, then the buildbot documentation won't help much.

It is our own addition, but it has been included in upstream Buildbot, including documentation: http://docs.buildbot.net/0.8.5/manual/cfg-buildsteps.html#mtr-mysql-test-run

- Kristian.

Pablo Estrada

23 Apr

8:07 a.m.

Hello Sergei,

...

as for (2) - I'd say the most important part is to figure out how to select the subset of tests to run, not how to integrate it in buildbot.

Definitely! I was just reading up on buildbot to pass the time until I got access to the data! : )

...

I have a bzr repository with the test data (text files, basically database dumps) and a pretty hairy perl script that processes them and outputs result files. Then a gnuplot script that reads these result files and shows graphs.

There were 9 script runs, that is 9 result files and 9 graphs.

...

I'll attach them too.

The model I used was:

1. For every new revision buildbot runs all tests on all platforms

2. But they are sorted specially. No matter in what order tests are executed, this doesn't change the result - the set of failed tests. So, my script was emulating that, changing the order in which tests were run. With the goal to have failures happen as early as possible.

That's how to interpret the graphs. For example, on v5 you can see that more than 90% of all test failures happen within the first 20K tests

Yup, I understand what you mean here... I can grasp the concepts, but I am still having some trouble understanding some of the terms that you use in the graphs. I can see that recall is related to the percentage of failures that have been encountered, and I guess that cutoff has to do with how many files you analyze before starting reordering... Also, I can see by your comments a bit of your thought process, but I have a few questions.

I also have some questions regarding what you did with some of the data, to get some ideas on how to do it myself. Also regarding how the buildbot organizes builds and how they correspond to code changes and to test runs.

If it's not too inconvenient, do you think we could set up a Google Hangout or a Skype call on Monday to go over a few questions quickly?

If you are busy, we can do it through email or on IRC. I can also take a dive into the bzr repo, and look at what you did, as well as read up all the info regarding the data; but it would be real helpful if you could lend me a hand : )

Thank you very much.

Best
Pablo
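The graph definitions are not confirmed anywhere in this thread. One plausible reading, going by Sergei's description ("more than 90% of all test failures happen within the first 20K tests", "whatever the cutoff value is"), is that the cutoff is the number of tests executed in the reordered sequence, and recall is the fraction of all failures that fall within that cutoff. The sketch below computes it under that assumption; the function name and the sample positions are made up for illustration.

```python
def recall_at_cutoff(failure_positions, cutoff):
    """Fraction of failures found within the first `cutoff` tests.

    failure_positions -- 1-based positions, in the reordered sequence,
                         at which the failing tests were executed.
    """
    if not failure_positions:
        raise ValueError("recall is undefined when there are no failures")
    caught = sum(1 for pos in failure_positions if pos <= cutoff)
    return caught / len(failure_positions)


# Hypothetical positions of five failures in the reordered sequence.
positions = [3, 10, 250, 18000, 45000]
print(recall_at_cutoff(positions, 20000))  # 0.8
```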

Elena Stepanova

8:26 a.m.

Hi Pablo,

On 4/23/2014 12:07 PM, Pablo Estrada wrote:

...

Hello Sergei,

<cut>

Yup, I understand what you mean here... I can grasp the concepts, but I am still having some trouble understanding some of the terms that you use in the graphs. I can see that recall is related to the percentage of failures that have been encountered, and I guess that cutoff has to do with how many files you analyze before starting reordering... Also, I can see by your comments a bit of your thought process, but I have a few questions.

I also have some questions regarding what you did with some of the data, to get some ideas on how to do it myself. Also regarding how the buildbot organizes builds and how they correspond to code changes and to test runs.

The buildbot dump that I sent to you will give you an idea of how buildbot organizes builds etc. There is also a buildbot master configuration file in LP (http://bazaar.launchpad.net/~maria-captains/mariadb-tools/trunk/view/head:/b...) which might be useful.

...

If it's not too inconvenient, do you think we could set up a Google Hangout or a Skype call on Monday to go over a few questions quickly?

If you are busy, we can do it through email or on IRC. I can also take a dive into the bzr repo, and look at what you did, as well as read up all the info regarding the data; but it would be real helpful if you could lend me a hand : )

I suggest doing it in a different order. Please do take a dive into the bzr repo, look at what Sergei did, and read all the info. I hope that after you have done that, there won't be so many questions left, so a voice session won't be necessary, and an email or two will cover the rest. It will be more efficient in the end, it will help us keep track of the information and ideas, and besides, Sergei is currently on vacation anyway.

Regards,
Elena

...

Thank you very much. Best Pablo
