I went through, in detail, the code that the InnoDB background threads use to pick and execute tasks. This is how they seem to work.
The reader/writer threads are each passed an integer that acts as a segment. They call fil_aio_wait(segment), which calls os_aio_handler(segment, &node, &message, &type). Control then passes to os_aio_simulated_handler(segment, m1, m2, request), where the code gets more complicated, with AIO arrays and slots, and it becomes harder to follow how the threads choose their tasks. It is definitely not a simple queue structure from which they pick their tasks. Also, which buffer pool a task relates to can only be figured out much later, from the value of m2, which stores the address of a bpage. A simple queue could easily have been replaced with multiple queues, i.e. one queue per NUMA node, like we once discussed on IRC. Lastly, all of these procedures are shared with the log threads as well.
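Just to make the idea concrete, here is a rough sketch of what I mean by a queue per NUMA node. It is only a sketch: io_task_t and numa_io_queues are made-up names, and the real code goes through the AIO arrays and slots instead, but this is the structure I had in mind:

    /* Hypothetical sketch, not InnoDB code: one mutex-protected FIFO per
       NUMA node, so an i/o thread bound to a node only ever pops requests
       for pages that live on that node. */
    #include <array>
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>
    #include <queue>

    struct io_task_t {          /* placeholder for one i/o request        */
        void*  node;            /* what m1 carries: the fil_node_t        */
        void*  bpage;           /* what m2 carries: the buf_page_t        */
    };

    constexpr std::size_t MAX_NUMA_NODES = 4;

    class numa_io_queues {
    public:
        void push(std::size_t numa_node, io_task_t task) {
            auto& q = queues_[numa_node % MAX_NUMA_NODES];
            {
                std::lock_guard<std::mutex> lock(q.mutex);
                q.tasks.push(task);
            }
            q.cond.notify_one();
        }

        io_task_t pop(std::size_t numa_node) {
            auto& q = queues_[numa_node % MAX_NUMA_NODES];
            std::unique_lock<std::mutex> lock(q.mutex);
            q.cond.wait(lock, [&q] { return !q.tasks.empty(); });
            io_task_t task = q.tasks.front();
            q.tasks.pop();
            return task;
        }

    private:
        struct per_node_queue {
            std::mutex              mutex;
            std::condition_variable cond;
            std::queue<io_task_t>   tasks;
        };
        std::array<per_node_queue, MAX_NUMA_NODES> queues_;
    };

The open question is still who pushes into these queues and which queue the non-background threads would use (more on that below).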
Another thing: you mentioned more than once that you wanted the reader threads to look for a bpage on their local nodes before looking it up on other nodes, but as I said, they operate on the bpage structure itself. Obviously neither of us had a proper understanding of how InnoDB works in these respects when we started the project. These threads mostly operate on bpages rather than on buf_pools, which makes NUMA mapping even harder (mapping buf_pools to NUMA nodes would be comparatively easy), but it is definitely more efficient than going through buf_pools and so shouldn't be changed.
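To show why I say the buf_pool-to-node direction is the easier one, here is a minimal sketch (hypothetical names, not InnoDB code; page_stub_t just stands in for buf_page_t): a pool instance maps to a node with a one-time table built at startup, while a bpage can only be mapped indirectly through the pool that owns it:

    #include <cstddef>
    #include <vector>

    struct page_stub_t {
        std::size_t buf_pool_index;   /* which pool instance owns the page */
    };

    class pool_node_map {
    public:
        pool_node_map(std::size_t n_pools, std::size_t n_nodes)
            : node_of_pool_(n_pools) {
            /* fill once at startup, e.g. round-robin the pool instances
               across the nodes -- the "easy" direction */
            for (std::size_t i = 0; i < n_pools; ++i) {
                node_of_pool_[i] = i % n_nodes;
            }
        }

        /* direct: pool instance -> node */
        std::size_t node_of_pool(std::size_t pool) const {
            return node_of_pool_[pool];
        }

        /* indirect: bpage -> owning pool -> node; since the i/o threads
           only see the bpage (via m2), this extra hop is unavoidable */
        std::size_t node_of_bpage(const page_stub_t& bpage) const {
            return node_of_pool_[bpage.buf_pool_index];
        }

    private:
        std::vector<std::size_t> node_of_pool_;
    };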
Then again, there were cases where the tasks assigned to the background threads were also performed by other server threads, especially in the case of flushing, but I managed to restrict those to their own nodes. If we were to create queues per node, which queues would these other threads work on?
In other words, the way InnoDB was originally written and then extended makes it significantly difficult to adapt for explicit NUMA support. It wouldn't just require restructuring and changing the way these threads use parameters, pick tasks, etc., but may also require reordering the sequence in which they are started. For example, the background threads are created before the user threads, and trying to use the NUMA node of a user THD later on would mean more system calls and probably thread migrations.
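To give an idea of the extra system calls I mean, this is roughly what a user THD would have to do at runtime just to find out which node it is on and bind to it. A sketch only: sched_getcpu(), numa_node_of_cpu() and numa_run_on_node() are the real glibc/libnuma calls (link with -lnuma), everything around them is made up:

    #include <numa.h>     /* numa_available, numa_node_of_cpu, numa_run_on_node */
    #include <sched.h>    /* sched_getcpu */
    #include <cstdio>

    static int current_numa_node() {
        if (numa_available() < 0) {
            return -1;                 /* kernel/libnuma has no NUMA support */
        }
        int cpu = sched_getcpu();      /* extra syscall per lookup           */
        if (cpu < 0) {
            return -1;
        }
        return numa_node_of_cpu(cpu);  /* cpu -> node lookup via libnuma     */
    }

    int main() {
        int node = current_numa_node();
        std::printf("running on NUMA node %d\n", node);
        if (node >= 0) {
            /* binding after the thread has already run somewhere else is
               what can cause the migration I mentioned */
            numa_run_on_node(node);
        }
        return 0;
    }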
When you added the task to the spreadsheet, you were right to anticipate that it could require a large cleanup of the InnoDB structure, but I am beginning to think it will be far more complicated than that. Also, as you once mentioned, most systems have 2 to 4 NUMA nodes. No doubt, if we implemented NUMA support such that InnoDB made the best use of it, there would be a performance difference, but it would still be quite small on a fast machine, and I really don't think making such big changes to InnoDB would be worth the effort and risk. Making a major change to working, verified code (written by someone else) is more error-prone than adding modular code and functions to support a new feature. I hope you agree.
Last but not least, if you think it can still be done and have an idea, I will be more than willing to attempt it. After all, GSoC has only got me started on contributing to open source, and this evaluation won't be the end of it. And since there's no other database software out there that supports NUMA (let's ignore numa-interleave), and since I was the first to start working on this, I would be really proud to see it through to the end.
Some evaluation stuff:
I see only one text field in the form for a link to the work done, and it takes a single URL. If I reserve this for the GitHub link, where do I add a link to the spreadsheet or the documentation?
Assuming you have already verified the work done so far, I will create a pull request shortly, so that I can state in the evaluation whether or not it was merged.