MARK CALLAGHAN <mdcallag@gmail.com> writes:
> Why does this need to save/restore thread contexts (setcontext, etc)? I
> think that if the library is that hard to change then it should be fixed
> or a simpler solution should be attempted.
In general, when implementing non-blocking semantics like this, there are
two main approaches:

1. Write the code in event-driven style. This means using some kind of
state machine or message passing (or both) rather than normal nested
calls. Whenever code needs to do I/O, or call another function that might
do I/O, the current state is manually saved to some struct and control
returns to the caller. When the I/O completes, control must continue in
the next state of the state machine, or in the handler of the completion
message.

2. Use co-routines (also called fibers or light-weight threads or
whatever). The code is written in normal style with nested calls. For
operations that need to do I/O, a co-routine is spawned to run the
operation; when waiting for I/O the co-routine is suspended, and resumed
again when the I/O completes. (A rough sketch of the mechanism is appended
at the end of this mail.)

The main reason I chose (2) is similar to what RethinkDB describe here:

  http://blog.rethinkdb.com/improving-a-large-c-project-with-coroutines
  http://blog.rethinkdb.com/making-coroutines-fast

Basically, (1) is nice for writing quick IRC bots and the like, but as the
complexity of the problem grows, the event-driven code becomes very hard
to maintain and extend. For adding non-blocking operation to an existing
library like libmysql, (2) has the added advantage that the change can be
much less intrusive, as we avoid re-writing large parts of the existing
code into event-driven style.

As also discussed in the RethinkDB blogs, one needs to be aware of
performance, as some co-routine implementations (e.g. POSIX ucontext) are
inefficient. The use of co-routines in non-blocking libmysqlclient is
particularly simple, so I added simple optimised implementations for i386
and x86_64, and benchmarked them.

A micro-benchmark on x86_64 shows that the cost of spawning a co-routine
(compared to a direct function call) is about 12 cycles (64-bit Intel
Sandy Bridge CPU). That is quite low - about the same cost as a pipeline
stall (e.g. a single mispredicted branch). I think one would be hard
pressed to achieve such low overhead from the state machine / message
passing needed for event-driven style (1).

I also benchmarked real library usage: fetch 100,000,000 rows using
mysql_use_result() + mysql_fetch_row() (the fetch loop is sketched at the
end of this mail). Rows have a single constant integer value, pulled from
a self join of MEMORY tables:

  SELECT 1 FROM t1 a CROSS JOIN t1 b CROSS JOIN t1 c CROSS JOIN ...

The connection uses a localhost unix socket to minimise network overhead.
This is about the worst-case scenario, with a co-routine spawn for every
row fetched. On my machine, this fetches around 10,000,000 rows per
second. Even in this worst-case scenario, it is impossible to measure any
difference between the normal blocking code and the non-blocking code
using co-routines, as the server is the bottleneck.

In short, option (2) is not "needed", but I think it is the best and
simplest choice.

Hope this helps,

 - Kristian.
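
P.S. For illustration only, here is a rough sketch of the spawn / suspend /
resume mechanism, written with plain POSIX ucontext (the actual code uses
the optimised i386/x86_64 implementations mentioned above, and the names
here are just made up for the example, not the real libmysqlclient ones):

    /* Illustrative sketch only - not the actual libmysqlclient code. */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t caller_ctx;     /* context of the application caller */
    static ucontext_t co_ctx;         /* context of the spawned co-routine */
    static char co_stack[64*1024];    /* stack for the co-routine          */

    /* The co-routine runs the (possibly blocking) operation. */
    static void
    co_routine_body(void)
    {
      printf("co-routine: starting operation\n");
      /* Would block on I/O here; instead, suspend back to the caller. */
      swapcontext(&co_ctx, &caller_ctx);
      /* Resumed by the caller once the socket is ready again. */
      printf("co-routine: I/O ready, operation completed\n");
    }

    int
    main(void)
    {
      /* Spawn: set up the co-routine on its own stack and switch to it. */
      getcontext(&co_ctx);
      co_ctx.uc_stack.ss_sp= co_stack;
      co_ctx.uc_stack.ss_size= sizeof(co_stack);
      co_ctx.uc_link= &caller_ctx;              /* return here on exit */
      makecontext(&co_ctx, co_routine_body, 0);
      swapcontext(&caller_ctx, &co_ctx);

      /* Back in the caller: the application would now poll()/select() on
         the socket, then resume the co-routine when I/O is possible. */
      printf("caller: co-routine suspended, waiting for I/O\n");
      swapcontext(&caller_ctx, &co_ctx);

      printf("caller: co-routine finished\n");
      return 0;
    }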
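
And the fetch loop in the 100,000,000-row benchmark looks roughly like
this (connection setup and error handling omitted; the table and query are
as described above):

    /* Rough sketch of the benchmark fetch loop (error handling omitted). */
    #include <mysql.h>

    static unsigned long long
    fetch_all_rows(MYSQL *mysql, const char *query)
    {
      MYSQL_RES *res;
      MYSQL_ROW row;
      unsigned long long count= 0;

      mysql_query(mysql, query);
      /* mysql_use_result() streams rows from the server one at a time, so
         every mysql_fetch_row() may need to wait for I/O - this is where
         the non-blocking client spawns a co-routine for each row. */
      res= mysql_use_result(mysql);
      while ((row= mysql_fetch_row(res)))
        count++;
      mysql_free_result(res);
      return count;
    }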