September 2017
- 12 participants
- 25 discussions
In ha_innobase::info_low() there is the following dirty hack:
/*
The MySQL optimizer seems to assume in a left join that n_rows
is an accurate estimate if it is zero. Of course, it is not,
since we do not have any locks on the rows yet at this phase.
Since SHOW TABLE STATUS seems to call this function with the
HA_STATUS_TIME flag set, while the left join optimizer does not
set that flag, we add one to a zero value if the flag is not
set. That way SHOW TABLE STATUS will show the best estimate,
while the optimizer never sees the table empty. */
if (n_rows == 0 && !(flag & HA_STATUS_TIME)) {
        n_rows++;
}
It is very old (from 5.0 or earlier) and bug-prone, because
ha_innobase::open() calls:
info(HA_STATUS_NO_LOCK | HA_STATUS_VARIABLE | HA_STATUS_CONST);
so every opened empty table will appear non-empty! I don't know what the
problem with the join optimizer is, but having the storage engine handle it
does not seem like the right thing to do. Moreover, relying on
HA_STATUS_TIME for this is definitely wrong. We could make the join
optimizer ignore the "0 rows" case for all storage engines.
Is the "1 row" case a big win anyway? Or we could add a new flag HA_JOIN_STAT
and use it in make_join_statistics().
Re: [Maria-developers] A problem with implementing Group Commit with Binlog with MyRocks
by Sergey Petrunia 11 Sep '17
Hi Kristian,
On Mon, Sep 11, 2017 at 01:02:31PM +0200, Kristian Nielsen wrote:
> Sergey Petrunia <sergey(a)mariadb.com> writes:
>
> > This is about https://jira.mariadb.org/browse/MDEV-11934. I've encountered
> > an interesting issue here, so I thought I would consult on both MyRocks and
> > MariaDB lists.
>
> Is your current code available somewhere?
>
No, it wasn't. Just pushed the current code into bb-10.2-mdev11934 branch:
https://github.com/MariaDB/server/tree/bb-10.2-mdev11934
All changes are in the last commit.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
The dates for the Developers Unconference in Shenzhen, China, are coming
up fast.
This is our first official event in Asia, and it will be held from 13
November to 17 November.
13 November - new contributor day
14-15 November - traditional developers unconference
16-17 November - patch days
As always, it's free to attend, and open to anyone interested in MariaDB
development.
Details are at
https://mariadb.org/2017-2-developers-unconference-and-related-events-shenz…
and sign up on the Meetup page:
https://www.meetup.com/MariaDB-Developers/events/241873269/
Hello,
I am not aware of what information regarding this project has been
exchanged between my mentor Daniel Black and others including serg, so I am
sharing this here for everyone to see. My last mail contains the
conversations I had with my mentors.
Some quotes from the final evaluation -
"Your questions in the last week have been exceptionally keen even though
these seem to be the same questions from 1-2 months ago"
Remind me when you (Daniel Black) answered them.
"Sumit, overall we are happy with the quality of work. Unfortunately the
quantity of work showed a lack of putting in hours and limited test cases
fell short of the project plan and organisational expectations"
What limited test cases? I wrote all the test cases you (Daniel Black) and
Jan Lindstrom asked me to write. Which test case did I miss? I even ran a
benchmarking test with all possible values and sent you the results.
However, testing the actual performance difference is something that can
only be done on a real NUMA machine. Sorry, I don't have one. And have you
ever even tested my code on a real NUMA machine? I too deserve to know if
my work has made any real difference on the machine it is supposed to work
on. And I have asked you to run it on a real machine more than once over
the past months.
You are happy with the quality of the work. Good. You are unhappy with the
quantity. How do you expect me to increase the quantity when I have already
completed the tasks I promised in my GSoC proposal, except for those that
you removed from the list? What project plan are you referring to?
My GSoC proposal -
https://docs.google.com/document/d/1UbOQysgOOzCM7z5FPC7gmuCpyxPmB7RGg7y9Ogq…
Tasks spreadsheet -
https://docs.google.com/spreadsheets/d/1nE-qFXhwwhF0hpOX3wUlczPjlPZTK_UzBu_…
The tasks can be categorised as follows.
[1] The tasks mentioned in my GSoC proposal which were not removed or
changed later. - I completed all these.
[2] The tasks mentioned in my GSoC proposal which were crossed out (under
consideration to be dropped / unnecessary) or were implemented
differently. - I either did not complete these, for obvious reasons, or
implemented some of them differently.
[3] The tasks you added to the spreadsheet in the last month, originally
under the title 'Out of Time'. - I completed some of these too.
Then, when these were done, you moved a task (the queue-per-node task) from
the 'Out of Time' group to the third-month group. We both agreed it was an
important part of the completion of the project, and I tried my best to
implement it. I spent days trying to figure it out, asked you for help,
didn't get any, tried again and yet failed, because I am no expert on either
the InnoDB or the SQL side of MariaDB.
Which organisational expectations did you ask me to fulfil? When?
I don't remember ever being told about any organisational expectations.
Perhaps these were your personal expectations of having complete NUMA
support in MariaDB by the end of GSoC. If so, then I am really sorry I
wasn't able to fulfil those expectations, because I got stuck at an
important part of the project which was also a seemingly impossible task to
do, and *YOU FAILED TO HELP ME* with it even though I repeatedly asked
(begged, actually) you for help. You just kept reminding me that doing the
task was my job.
That's some real good mentoring right there.
Another quote from the final evaluation -
"Passing to get your continued work isn't honoring the rules of the project
or being fair to Google"
Yeah, how about showing some good mentorship by helping me to complete the
project while we had the time to do it within Google's timelines instead.
Oh! Wait a sec... you never went through the code... until 3-4 days ago. I
was determined to complete it if you had helped me. I have been mentioning
it in all mails since the mail I wrote to you and Jan on:
22nd Aug - "Last but not least, if you think it can still be done and have
an idea, I will be more than willing to attempt it. After all, GSoC has only
got me started to contribute to open-source and this evaluation won't be
the end of it."
25th Aug - "I am sorry I am unable to create an implementation. It's not
like I don't want to. I just can't completely understand how it has been
implemented currently. I have tried to follow the code before, and I did
again the whole day today, but I can't come up with an idea at all."
25th Aug - "Like I said, I am not unwilling to complete this task. But I
can't even think of where to start. Since, the two of you have greater
experience with InnoDB, maybe you should give it a shot. You will
definitely understand the present structure better than I do. And if you
can come up with even a verbal solution that sounds like it could work, I
will implement it asap."
25th Aug - "I spent all day trying to think of any way to implement a
queue-per-node thing, and I confess I failed to come up with one. So,
instead of spending more time on that, I will work on migrating these
commits to a new branch today. And I also urge you to give this task a try
yourselves."
25th Aug - " I haven't been able to implement the queue per node structure
for background threads, and I am afraid I won't be able to do it without
your help. It is beyond me. I gave it a few attempts and I have failed.
It's not the implementation part but the 'coming up with a way to implement
it' part that I haven't been able to figure out yet. I need you to know
that I am trying my best here."
25th Aug - "But anyway I kindly request you two to try it out yourselves
now. You don't have to do the work. I just urge you to take some time out
this weekend and go through the code. Let me know how you think it may be
possible to implement it. I will do it. I have 4 days and 2 hrs before the
deadline to submit the evaluation ends. If you can come up with an
implementation plan within the next 2 or 2.5 days, I assure you I will code
the implementation within a day of that and probably commit it before
submitting the final evaluation as well. I am willing to take this risk."
On Aug 28th you wrote a mail and asked me "Knowing what plans you had and
how they failed would have been good to know." So, on the same day I wrote
you a mail with a detailed explanation of the attempts I had made in trying
to complete the task, asking you again to try it yourself and help me if
you could come up with an idea. You never did.
31st Aug http://marialog.archivist.info/2017-08-31.txt
Me - "did you have an attempt at the task ? come up with any ideas yet ?"
dragonheart - been a bit busy with my own work so far. I know I'm going to
have to look at a lot of stuff to finish the final GSoC evaluation
4th Sept http://marialog.archivist.info/2017-09-04.txt
Me - "You must have seen my mails. Did you have an attempt at the queue per
node thing ? Any ideas ?"
dragonheart - well, implementing it was your task. but no, haven't found
the hours to look though the code again.
As per our conversation on IRC, Daniel Black clearly hadn't even gone
through the code by 4th Sept or attempted to solve the problem. How did he
hope to complete the final evaluation? Even if he went through the code a
day before submitting his evaluation, isn't this too late for a mentor to
go through his/her student's code?
Here is a list of some questions I would like MariaDB staff, especially
serg, to ponder.
[1] Was the task doable at all?
[A] Not unless you are an expert on both the SQL side and the InnoDB side,
expert enough to re-write a major part of the interface between them. Jan
himself acknowledged he wouldn't be able to help me with this, because he
has no experience with the interface. I really appreciate that.
[2] Even if the task was doable by a team of SQL and InnoDB experts, was it
doable by an individual student working on MariaDB for the past four
months?
[A] Nope.
[3] Why didn't Daniel Black help me with the task, even when I clearly
mentioned weeks ago that I would need help with it?
[A] ???
[4] Did my mentor Daniel Black even try to help me with this particular
task? Ever?
[A] Nope.
[5] Did Daniel Black evaluate me on the basis of the number of commits or
the tasks accomplished by those commits?
[A] Based on the evaluation, he has clearly been unhappy with the decreased
number of commits, not the content, as all the tasks mentioned in the
original proposal have been achieved, along with a few more.
[6] Can Daniel Black himself complete this task, and thus the project?
[A] I bet not.
No doubt the project is incomplete, because it involves doing a task that
is nearly impossible to do, at least for me, especially without any help.
But does GSoC mean that the student necessarily has to complete the entire
project? I have gone through the code submission guidelines plenty of
times. They clearly state that we must say which tasks were done and
which tasks remain. Obviously not all projects are complete; some students
only complete the project partially, but mention everything and continue
working on it. I did the same, and mentioned in my evaluation that almost
all tasks were complete, instead of all, and I have always been willing to
complete this project. But man, I can't do something impossible, especially
when the people I am supposed to ask for help themselves fail to help
their students.
Thanks,
Sumit
Hello Serg,
I am sorry to bug you with this mail, but I am not sure what discussions
you and my mentors had before I was failed in GSoC. I completed all the
tasks I had mentioned in my proposal, but the project is incomplete. No
doubt about that, but it is so because I got stuck with a major task
(added in the last month) of the project, for which I received no help from
my mentors. These are the conversations I had with my mentors recently
regarding the same task.
Thanks,
Sumit
---------- Forwarded message ----------
From: Sumit Lakra <sumitlakradev(a)gmail.com>
Date: Tue, Aug 22, 2017 at 1:04 AM
Subject: GSoC Final Evaluation
To: Daniel Black <daniel.black(a)au1.ibm.com>, Jan Lindström <
jan.lindstrom(a)mariadb.com>
Hello,
I went through, in detail, the code that the InnoDB background threads use
to pick tasks and execute them. This is how they seem to work.
The reader/writer threads are passed an integer which acts as a segment.
They then call fil_aio_wait(segment), which calls os_aio_handler(segment,
&node, &message, &type). The control then goes to
os_aio_simulated_handler(segment, m1, m2, request), where the code gets
more complicated with AIO arrays and slots. It gets harder to understand
how they choose their tasks. It is definitely not a simple queue structure
from which they pick their tasks. Also, which buffer pool the task is
related to can only be figured out quite late, based on the value of m2,
which stores the address of a bpage. A simple queue could easily have been
replaced with multiple queues, i.e. a queue per NUMA node like we had once
discussed on IRC. Lastly, all these procedures are common for the log
threads as well.
Another thing: you mentioned more than once that you wanted the reader
threads to look for a bpage in their local nodes before looking it up in
other nodes, but they use the bpage structure itself, like I mentioned.
Obviously neither of us had a proper understanding of how InnoDB worked in
these aspects when we started the project. These threads seem to operate
mostly on bpages rather than buf_pools, which makes NUMA mapping even
harder (mapping buf_pools to NUMA nodes would comparatively be easier), but
it is definitely more efficient than buf_pools and hence shouldn't be
changed.
Then again, there were cases when the tasks assigned to the background
threads were done by other server threads as well, especially in the case
of flushing, but I successfully restricted them to their nodes. If we were
to create queues per node, which queues would these other threads work on?
In other words, the way in which InnoDB was initially written and expanded
has made it significantly difficult to adapt for explicit NUMA support. It
wouldn't just require restructuring and changing the way these threads use
parameters, pick tasks etc., but may also require re-ordering the order in
which they are executed. For example, the background threads are created
before the user threads, and trying to use the NUMA node from the user THD
later on would mean more system calls and probably thread migrations.
When you added the task to the spreadsheet, you were right to anticipate
that it could require a large cleanup of the InnoDB structure, but I am
beginning to think it will be way more complicated. Also, you once
mentioned that most architectures have 2 to 4 NUMA nodes. No doubt, if we
could implement support for the NUMA architecture such that InnoDB would
make the best use of it, there would be a performance difference, but it
would still be quite negligible on a very fast computer, and I really don't
think making such big changes to InnoDB would be worth the effort and risk.
Trying to bring a major change to working and verified code (written by
someone else) is more error-prone than adding some modular code and
functions to support a new feature. I hope you agree.
Last but not least, if you think it can still be done and have an idea, I
will be more than willing to attempt it. After all, GSoC has only got me
started to contribute to open-source and this evaluation won't be the end
of it. And since there's no other database software out there which
supports NUMA (let's ignore numa-interleave) and since I was the first to
start working on this, I will be really proud to see it through to the end.
Some evaluation stuff:
I see only one text field in the form for a link to the work done, and it
takes a single URL. Reserving this for the GitHub link, where do I add a
link to the spreadsheet or documentation?
Assuming you have already verified the work done so far, I will be
creating a pull request shortly, so I can note in the evaluation whether
it was merged or not.
Thanking You,
Sumit
---------- Forwarded message ----------
From: Daniel Black <daniel.black(a)au1.ibm.com>
Date: Thu, Aug 24, 2017 at 12:11 PM
Subject: Re: GSoC Final Evaluation
To: Sumit Lakra <sumitlakradev(a)gmail.com>, Jan Lindström <
jan.lindstrom(a)mariadb.com>
On 22/08/17 05:34, Sumit Lakra wrote:
> Hello,
>
> I went through the code that the InnoDB background threads use to pick
> tasks and execute them, in detail. This is how they seem to work.
>
> The reader/writer threads are passed an integer which acts as a segment.
> They then call fil_aio_wait(segment) which calls os_aio_handler(segment,
> &node, &message, &type). The control then goes to the
> os_aio_simulated_handler(segment, m1, m2, request) where the code gets
> more complicated with AIO arrays and slots. It gets harder to understand
> how they choose their tasks. It is definitely not a simple queue
> structure from which they pick their tasks. Also, which buffer pool the
> task is related to can only be figured out quite later.. based on the
> value of m2, which stores the address of a bpage.
> A simple queue could
> have been easily replaced with multiple queues, i.e. a queue per
> numa-node like we had once discussed on IRC.
Yes, you know where requests are created and where the job message ends
up, so you should be able to create an implementation. Even if it doesn't
preserve the existing behaviour, it should be enough to test it.
> Lastly, all these
> procedures are common for log threads as well.
relevance?
> Another thing, you mentioned more than once you wanted the reader
SQL threads? Or is this the same thing?
> threads to look for a bpage in their local nodes before looking them up
> in other nodes, but they use the bpage structure itself like I
> mentioned
> Obviously neither of us had a proper understanding of how
> InnoDB worked in these aspects when we started the project.
No project knows all the details.
> These
> threads seem to operate on bpages mostly rather than buf_pools, which
> makes numa mapping even harder
You could add a node_id to the class buf_page_t?
> (buf_pools to numa nodes would
> comparatively be easier), but is definitely more efficient than
> buf_pools and hence shouldn't be changed.
> Then again there were cases when the tasks assigned to the background
> threads were done by other server threads as well, especially in case of
> flushing, but I successfully restricted them to their nodes. If we were
> to create queues per node, which queues would these other threads work on.
Yes, which is why this issue was raised with you months ago.
> In other words the way in which InnoDB was initially written and
> expanded has made it significantly difficult to be adapted for explicit
> NUMA support.
The split of buffer pool instances was in recognition that there are hot
locks caused by a large number of threads.
It just needed more thought than implementing a few thread bindings. I'm
sorry you thought it was that easy.
> It wouldn't just require restructuring and changing the
> way these threads use parameters, pick tasks etc,
You've changed very few parameters so far.
https://github.com/MariaDB/server/compare/10.2...theGodlessLakra:numa_one
> but may also require
> re-ordering the order in which they are executed.
I can't think of one example of this.
> For example, the
> background threads are created before the user threads, and trying to
> use the NUMA node from user THD later on would mean more system calls
> and probably thread migrations.
Huh, what? "may..require..(bad design)".
Which is why a mapping of THD to known background threads per node was
desired, to avoid syscall thread migrations.
> When you added the task in the spreadsheet, you were right to anticipate
> that it could require large cleanup of the InnoDB structure, but I am
> beginning to think it will be way more complicated.
I mentioned it on 2017-07-10
(http://marialog.archivist.info/2017-07-10.txt) too, and you said you'd
give it a try.
> Also, as you once
> mentioned that most architectures have 2 to 4 NUMA nodes. No doubt, if
> we could implement a support for NUMA architectuure such that InnoDB
> would make the best use out of it, there would be a performance
> difference but it would still be quite negligible in a very fast
> computer,
Cross-node cache coherency takes significant clock cycles compared to
local access. What you'd end up with is a system that has idle time that
can't be used.
By finishing this task, this could be measured. Local/remote access
performance can be measured (perhaps even with fake_numa).
https://joemario.github.io/blog/2016/09/01/c2c-blog/
> and I really don't think making such big changes to InnoDB
> would be worth the effort and risk.
There is no gain without risk. Small changes with careful review, running
full tests between them, is the way large changes are made low-risk.
> Trying to bring a major change to a
> working and verified code (written by someone else) is more error-prone
> than adding some modular code and functions to support a new feature. I
> hope you agree.
When developing, you are always developing for current and future use. As
such, architectural changes need to be commensurate with the functional
changes. Jan and I have placed emphasis on writing test cases. These
ensure that changes don't break existing behaviour.
> Last but not least, if you think it can still be done and have an idea, I
> will be more than willing to attempt it. After all, GSoC has only got me
> started to contribute to open-source and this evaluation won't be the
> end of it.
> And since there's no other database software out there which
> supports NUMA (let's ignore numa-interleave) and since I was the first
> to start working on this
There is; I found
https://github.com/aerospike/aerospike-server/blob/master/cf/src/hardware.c
the other day. It wouldn't have helped much, as it has a different
architecture, and getting on top of one was hard enough.
>, I will be really proud to see it through to
> the end.
This week looks like you've done one commit that is largely a revert
that Jan asked for. You have slowed down considerably.
> Some Evaluation stuff :
> I see only one text field in the form for a link to the work done, and
> it takes a single url. Reserving this for the github link, where do I
> add a link to the spreadsheet or documentation ?
>
> Assuming you have already verified the work done so far, I will be
> creating a pull request shortly, so I can add whether it was merged or
> not, in the evaluation.
Looks like the fake_numa stuff is incomplete. Pinning of background
threads looks odd, as they aren't big allocators of memory and aren't
associated with SQL threads/buffer pool instance nodes.
I have looked through it. As a pull request, 60 commits with 26(?) of them
fixing prior commits will look messy for anyone trying to look at the
changes. If you are up for trying to clean it up, please give it a go.
Doing it in a copied repository is a good idea, as you've seen before it
can get a bit hard to keep track of.
> Thanking You,
> Sumit
>
---------- Forwarded message ----------
From: Sumit Lakra <sumitlakradev(a)gmail.com>
Date: Fri, Aug 25, 2017 at 3:03 AM
Subject: Re: GSoC Final Evaluation
To: Daniel Black <daniel.black(a)au1.ibm.com>
Cc: Jan Lindström <jan.lindstrom(a)mariadb.com>
On Thu, Aug 24, 2017 at 12:11 PM, Daniel Black <daniel.black(a)au1.ibm.com>
wrote:
>
>
> On 22/08/17 05:34, Sumit Lakra wrote:
> > Hello,
> >
> > I went through the code that the InnoDB background threads use to pick
> > tasks and execute them, in detail. This is how they seem to work.
> >
> > The reader/writer threads are passed an integer which acts as a segment.
> > They then call fil_aio_wait(segment) which calls os_aio_handler(segment,
> > &node, &message, &type). The control then goes to the
> > os_aio_simulated_handler(segment, m1, m2, request) where the code gets
> > more complicated with AIO arrays and slots. It gets harder to understand
> > how they choose their tasks. It is definitely not a simple queue
> > structure from which they pick their tasks. Also, which buffer pool the
> > task is related to can only be figured out quite later.. based on the
> > value of m2, which stores the address of a bpage.
>
> > A simple queue could
> > have been easily replaced with multiple queues, i.e. a queue per
> > numa-node like we had once discussed on IRC.
>
> Yes, you know where requests are created and where the job message ends
> up so you should be able to create an implementation. Even if it doesn't
> preserve the existing behaviour, it should be enough to test it.
>
I am sorry I am unable to create an implementation. It's not like I don't
want to. I just can't completely understand how it has been implemented
currently. I have tried to follow the code before, and I did again the
whole day today, but I can't come up with an idea at all.
>
> > Lastly, all these
> > procedures are common for log threads as well.
>
> relevance?
The reader/writer/log threads all start by executing fil_aio_wait(segment)
in srv0start.cc.
>
> > Another thing, you mentioned more than once you wanted the reader
>
> SQL threads? Or is this the same thing?
>
> > threads to look for a bpage in their local nodes before looking them up
> > in other nodes, but they use the bpage structure itself like I
> > mentioned
> > Obviously neither of us had a proper understanding of how
> > InnoDB worked in these aspects when we started the project.
>
> No project knows all the details.
>
> > These
> > threads seem to operate on bpages mostly rather than buf_pools, which
> > makes numa mapping even harder
>
> You could add a node_id to the class buf_page_t?
>
Okay, but how do I use it? Where? In which functions?
>
> > (buf_pools to numa nodes would
> > comparatively be easier), but is definitely more efficient than
> > buf_pools and hence shouldn't be changed.
>
>
> > Then again there were cases when the tasks assigned to the background
> > threads were done by other server threads as well, especially in case of
> > flushing, but I successfully restricted them to their nodes. If we were
> > to create queues per node, which queues would these other threads work
> on.
>
> Yes, which is why this issue was raised with you months ago.
>
> > In other words the way in which InnoDB was initially written and
> > expanded has made it significantly difficult to be adapted for explicit
> > NUMA support.
>
> The split of buffer pool instances was in recognition that there are
> hot locks caused by a large number of threads.
>
> It just needed more thought than implementing a few thread bindings. I'm
> sorry you thought it was that easy.
>
The split of the buffer pool is not problematic for the NUMA
implementation. It's the task queues for the background threads that are
wrecking my brain.
>
> > It wouldn't just require restructuring and changing the
> > way these threads use parameters, pick tasks etc,
>
>
> You've changed very few parameters so far.
>
> https://github.com/MariaDB/server/compare/10.2...theGodlessLakra:numa_one
>
> > but may also require
> > re-ordering the order in which they are executed.
>
> I can't think of one example of this.
> > For example, the
> > background threads are created before the user threads, and trying to
> > use the NUMA node from user THD later on would mean more system calls
> > and probably thread migrations.
>
> Huh, what? "may..require..(bad design)".
>
One way (a really bad design) would be to create background threads when a
request is made by an SQL thread, and then bind each one to a node
depending on the request, i.e. the buffer pool instance to which the
associated bpage belongs, which would mean more system calls and
migrations. Obviously a very bad design.
>
> Which is why a mapping of THD to known background threads per node was
> desired, to avoid syscall thread migrations.
>
The SQL threads were expected to be bound to NUMA nodes, which I did, but I
can't think of how to map them to background threads. Can you think of a
way to do this? Let me know how you think this can be done. I will
implement it asap.
> > When you added the task in the spreadsheet, you were right to anticipate
> > that it could require large cleanup of the InnoDB structure, but I am
> > beginning to think it will be way more complicated.
>
> I mentioned it on 2017-07-10
> (http://marialog.archivist.info/2017-07-10.txt) too, and you said you'd
> give it a try.
>
> > Also, as you once
> > mentioned that most architectures have 2 to 4 NUMA nodes. No doubt, if
> > we could implement a support for NUMA architectuure such that InnoDB
> > would make the best use out of it, there would be a performance
> > difference but it would still be quite negligible in a very fast
> > computer,
>
> Cross node cache-coherency takes significant clock cycles compared to
> local access. What you'd end up with is a system that has idle time that
> can't be used.
>
> By finishing this task this could be measured. local/remote perf access
> can be measured (perhaps even with fake_numa).
> https://joemario.github.io/blog/2016/09/01/c2c-blog/
>
> > and I really don't think making such big changes to InnoDB
> > would be worth the effort and risk.
>
> There is no gain without risk. Small changes with careful review,
> running full tests between them, is the way large changes are made low
> risk.
>
Agreed, but I honestly can't make head or tail of the changes that will be
required to replace the present task queue with a queue-per-node
structure :(
>
> > Trying to bring a major change to a
> > working and verified code (written by someone else) is more error-prone
> > than adding some modular code and functions to support a new feature. I
> > hope you agree.
>
> When developing you are always developing for current and future use. As
> such, architectural changes need to be commensurate with the functional
> changes. Jan and I have placed emphasis on writing test cases. These
> ensure that changes are verified.
>
Like I said, I am not unwilling to complete this task. But I can't even
think of where to start. Since the two of you have greater experience with
InnoDB, maybe you should give it a shot. You will definitely understand the
present structure better than I do. And if you can come up with even a
verbal solution that sounds like it could work, I will implement it asap.
>
> > Last but not least, if you think it can still be done and have an idea, I
> > will be more than willing to attempt it. After all, GSoC has only got me
> > started to contribute to open-source and this evaluation won't be the
> > end of it.
>
>
> > And since there's no other database software out there which
> > supports NUMA (let's ignore numa-interleave) and since I was the first
> > to start working on this
>
> There is. I found
> https://github.com/aerospike/aerospike-server/blob/master/cf/src/hardware.c
> the other day. It wouldn't have helped much as it has a different
> architecture, and getting on top of one was hard enough.
>
> >, I will be really proud to see it through to
> > the end.
>
> This week looks like you've done one commit that is largely a revert
> that Jan asked for. You have slowed down considerably.
>
You have a good reason to be upset with me and I am not complaining. But
although it doesn't look like much work on GitHub, since there are no
changes, I spent a good deal of time brainstorming the implementation of
the first task from the spreadsheet under the "should be done" heading,
unfortunately without success.
>
> > Some evaluation stuff:
> > I see only one text field in the form for a link to the work done, and
> > it takes a single URL. Reserving this for the GitHub link, where do I
> > add a link to the spreadsheet or documentation?
> >
> > Assuming you have already verified the work done so far, I will be
> > creating a pull request shortly, so I can add whether it was merged or
> > not, in the evaluation.
>
> Looks like the fake_numa stuff is incomplete. Pinning of background
> threads looks odd as they aren't big allocators of memory and aren't
> associated with SQL threads/buffer pool instance nodes.
>
Fixed the fake_numa test which was failing in non-debug builds. Most of the
memory allocation is done during buffer pool creation, and these
allocations have been managed properly while creating buffer pools per NUMA
node. What other memory allocations do you want managed, and how?
(Non-pinned threads should probably be allowed to allocate memory from any
node.)
>
> I have looked through it. As a pull request, 60 commits with 26(?) of them
> fixing prior commits will look messy for anyone trying to look at the
> changes. If you are up for trying to clean it up, please give it a go.
> Doing it in a copied repository is a good idea, as you've seen before it
> can get a bit hard to keep track of.
>
I spent all day trying to think of a way to implement a queue-per-node
structure, and I confess I failed to come up with one. So, instead of
spending more time on that, I will work on migrating these commits to a new
branch today. I also urge you to give this task a try yourselves.
Thanks,
Sumit
---------- Forwarded message ----------
From: Sumit Lakra <sumitlakradev(a)gmail.com>
Date: Fri, Aug 25, 2017 at 6:52 PM
Subject: Re: GSoC Final Evaluation
To: Daniel Black <daniel.black(a)au1.ibm.com>
Cc: Jan Lindström <jan.lindstrom(a)mariadb.com>
Hello,
I am currently working on copying the changes from numa_one branch to a
different branch in a more orderly fashion. I haven't been able to
implement the queue per node structure for background threads, and I am
afraid I won't be able to do it without your help. It is beyond me. I gave
it a few attempts and I have failed. It's not the implementation part but
the 'coming up with a way to implement it' part that I haven't been able to
figure out yet. I need you to know that I am trying my best here.
Back in http://marialog.archivist.info/2017-07-10.txt, you also said
"warning the whole io read/write thread segments and scheduling is really
messy and could do with a significant rework", and you also mentioned that
you and Jan would have an attempt at it yourselves if I am unable to do it
and move on. Well, I regret moving on to SQL threads back then without
informing you that I wasn't successful. A 'queue-per-numa-node' is easier
said than done. But anyway, I kindly request you two to try it out
yourselves now. You don't have to do the work. I just urge you to take some
time out this weekend and go through the code. Let me know how you think it
may be possible to implement it. I will do it.
I have 4 days and 2 hours before the deadline to submit the evaluation
ends. If you can come up with an implementation plan within the next 2 or
2.5 days, I assure you I will code the implementation within a day of that
and probably commit it before submitting the final evaluation as well. I am
willing to take this risk.
However, if you are unable to come up with an implementation plan, we don't
really have to hurry and stress ourselves. We can work together (I can't do
this on my own) on this particular task later on and come up with
something, assuming I am not failed in the final evaluation.
Happy Weekend,
Sumit
---------- Forwarded message ----------
From: Daniel Black <daniel.black(a)au1.ibm.com>
Date: Mon, Aug 28, 2017 at 7:56 AM
Subject: Re: GSoC Final Evaluation
To: Sumit Lakra <sumitlakradev(a)gmail.com>
Cc: Jan Lindström <jan.lindstrom(a)mariadb.com>
On 25/08/17 23:22, Sumit Lakra wrote:
> Hello,
>
> I am currently working on copying the changes from numa_one branch to a
> different branch in a more orderly fashion.
This is looking good:
https://github.com/theGodlessLakra/server/tree/NUMA_Support
Don't be afraid to be very descriptive in the commit message especially
about the problem being solved and what functionality it
implements/changes. Don't re-say what the code is doing, just cover its
meaning at a high level.
> I haven't been able to
> implement the queue per node structure for background threads, and I am
> afraid I won't be able to do it without your help. It is beyond me.
> I
> gave it a few attempts and I have failed. It's not the implementation
> part but the 'coming up with a way to implement it' part that I haven't
> been able to figure out yet. I need you to know that I am trying my best
> here.
It would have been good to know what plans you had and how they failed.
> Back in http://marialog.archivist.info/2017-07-10.txt,
> you also said "warning the whole io read/write thread segments and
> scheduling is really messy and could do with a significant rework",
To do it properly yes.
> and
> you also mentioned that you and Jan would have an attempt at it
> yourselves, if I am unable to do it and I move on. Well, I regret moving
> on to SQL threads back then without informing you that I wasn't
> successful.
I probably should have harassed you a bit too. Acknowledging failures is
an important part of learning, even if you just acknowledge them to
yourself.
How did SQL threads end up?
> A 'queue-per-numa-node' is easier said than done. But
> anyway I kindly request you two to try it out yourselves now. You don't
> have to do the work. I just urge you to take some time out this weekend
> and go through the code. Let me know how you think it may be possible to
> implement it. I will do it.
Having only just read this, it is a bit late notice. Understanding the
interoperation between the parts is also a fundamental part of coming up
with a plan to implement it.
> I have 4 days and 2 hours before the deadline to submit the evaluation ends.
> If you can come up with an implementation plan within the next 2 or 2.5
> days, I assure you I will code the implementation within a day of that
> and probably commit it before submitting the final evaluation as well. I
> am willing to take this risk.
I thought your submission of a URL left it open to still committing on
it. Please don't risk missing it. Making sure the existing
code/comments/documentation is as clean as possible is better than
tackling new work now.
> However, if you are unable to come up with an implementation plan, we
> don't really have to hurry and stress ourselves. We can work together (I
> can't do this on my own) on this particular task later on and come up
> with something, assuming I am not failed in the final evaluation.
>
> Happy Weekend,
> Sumit
---------- Forwarded message ----------
From: Sumit Lakra <sumitlakradev(a)gmail.com>
Date: Mon, Aug 28, 2017 at 9:50 PM
Subject: Re: GSoC Final Evaluation
To: Daniel Black <daniel.black(a)au1.ibm.com>
Cc: Jan Lindström <jan.lindstrom(a)mariadb.com>
Hello,
The following is a detailed explanation of what I did, how and why, while
trying to come up with a solution to the task queue problem. However,
before you go through this I would still request you to have an attempt at
it yourselves, so your perspectives won't be affected by my ideas and you
may come up with something different.
I had once tried this before in
https://github.com/MariaDB/server/commit/4b8436d3f7c573668a0d37b54dab2d05e8e102be.
I even pointed this commit out once on IRC (I can't access logs of this
time). Anyway, I didn't put much focus on this as I was onto some other
task.
Before trying to replace the present queue structure with a queue-per-node
one, I first tried to let all the threads access this queue as before, but
only pick tasks dealing with their local nodes. Neither this queue nor
these tasks seem to be easily distinguishable in code. The threads start by
running fil_aio_wait(segment). This function calls os_aio_handler(segment,
&node, &message, &type), which calls os_aio_simulated_handler(segment, m1,
m2, request). That in turn calls AIO::get_array_and_local_segment(&array,
global_segment), creates a SimulatedAIOHandler handler(array, segment), and
then calls handler.check_pending(global_segment, event),
handler.init(n_slots), and slot = handler.check_completed(&n_reserved).
Now, in order to skip a task meant for another thread, a thread must pick
the task and check whether it relates to its own node or not. If it does,
the thread executes it; otherwise it should leave it in the queue. This is
where the problem occurs. To figure out whether a task is concerned with
the same node, the thread must know which buffer pool instance the task is
associated with, and this can only be determined after it picks up the
bpage it has to deal with (the bpage already has this information by then).
The bpage a thread has to deal with can only be found via buf_page_t* bpage
= static_cast<buf_page_t*>(*m2). Also, the task is not picked up at any
single place, but in parts across different functions, and all these
function calls make some kind of change to the segment/slot/array they are
picking the task from. So, when a thread finally gets a task and finds it
belongs to a different node, then in order to safely return the task to the
queue it must also undo the changes made by all these procedures so far.
This is the difficult part.
Compare this with pc_flush_slot(), where I successfully implemented this
method of skipping tasks not related to the local pool.
In the above commit you will find I have a few lines like 'ib::info() <<
thread_name << " accessing buf_page on node : " << node_of_pool << " in
fil_aio_wait()";'. If you compile and run this code you can see in real
time which threads are accessing which pools; I have given proper names to
them. Also, I tried to skip tasks that were not of the same node by
returning from os_aio_simulated_handler(), which partly worked, but on
trying to return from fil_aio_wait() there seems to be either an infinite
loop or a deadlock somewhere, as the server stops midway (it doesn't abort)
while starting up. Hence I have commented out those lines, on lines 5463
and 5464 of fil0fil.cc, in the above commit.
Another thing is the role of segments here. Are these segments related to
buffer pool instances somehow? All these threads, including the log
threads, start their work by executing fil_aio_wait(segment), but segment
here seems to be just an incrementing integer. At one point, assuming these
segments could contain tasks related to different pools/nodes, I added a
loop such that both the read and write threads loop around
fil_aio_wait(segment), calling it with all valid segments. It didn't work
out.
I attempted these when I pushed that commit to the temp branch weeks ago.
And even though I pointed it out to you, I think I should have given this
task much more importance. Over the past few days, when I gave this task
another try, I couldn't come up with any idea either to replace this queue
with a queue-per-node structure or even to complete this
skip-if-different-node approach.