Hi! <cut>
The my_pwrite() code now looks like:
.... #if defined(_WIN32) readbytes= my_win_pread(Filedes, Buffer, Count, offset); #else ...
And there is ew my_winfile.c file.
Is that your code? <cut>
Jeremiah> Yeah, this is definitely 99% my code. Someone spent the time to clean Jeremiah> it, normalize it, and add a few fixes and polish, but it's essentially Jeremiah> the same as what I submitted. Unfortunately, there's no one line Jeremiah> attribution I was promised for contributing the code. :) But at least Jeremiah> it's in there, that's a great start. Sorry about the attribution; I will ensure that when we pull/implement this to MariaDB we will add it! <cut>
Could you take a look a the MySQL-6.0 mysys/windows code and see if you think if it's now good enough.
If not, do you think we should take the 6.0 code as a base for our tree or should we start with your code?
Jeremiah> The MySQL 6.0 code for Windows file handling is a good starting point, Jeremiah> but you can see the primary "hack" of my code is the my_open_osfhandle Jeremiah> function, which is a table specifically for emulating the C style Jeremiah> integer file handles. If I recall correctly (it has been years) the Jeremiah> only reasons I had to implement this were: Jeremiah> As you said, the my_file_info structure, which uses the file descriptor Jeremiah> itself as an index. Jeremiah> my_file.c Jeremiah> if ((uint) fd < my_file_limit && my_file_info[fd].type != UNOPEN) Jeremiah> There you can see the opacity of the file descriptor is violated by Jeremiah> using it as a bounds check for the array, although that's obviously Jeremiah> follows using it as an index into the array. When I made the patch my Jeremiah> understanding was that I needed to keep as many of the changes as Jeremiah> possible within Windows ifdefs, and that changing the structure that Jeremiah> contains all of the file handles significantly would be too much change Jeremiah> to non-Windows code. Yes, we should to strive to keep most of the code 'similar' to ensure easy pulls between MySQL versions. Jeremiah> There are obviously a lot of ways to resolve this. If I were writing it Jeremiah> from scratch, I wouldn't pass the file descriptor around to different Jeremiah> functions, I would pass a pointer to the my_file_info structure around Jeremiah> instead, and all of the mysys file handling code would take a Jeremiah> my_file_info structure pointer as the file desc parameter. There's no Jeremiah> need to index into an array if you already have the pointer to the Jeremiah> struct in question. The code would become portable, and the Jeremiah> serialization on file open/close (fairly minor) would disappear. Agree that a much better way would be to move away from int's and have a structure instead. It should however not be that hard to do this properly as most of the MySQL code is using the type 'File' for the file handle. My old idea was to create a new library, my_file, and copy all the file handling and file name functions from this to my_file and then change this to use a pointer to a structure instead of a File object. The File objects should also contains the file name in parts and one should be able to manipulate the file name without having the file open. The changes in the code would mainly be: - Create the new library. - Add a separate call to populate the File object with the name parts - Change all calls to my_open / my_read to my_file_open, my_file_read etc... (Can be done trivially with one replace command) - Change error handling to instead of checking if my_file_open returns -1 see if it returns MY_INVALID_FILE - Over time, change all file and file name handling functions to use the new file handling functions. You can find the original description of this task here: http://forge.mysql.com/worklog/task.php?id=314 I have included a halfdone version if this code in this email. basharteg> If things are basharteg> being designed in a way that isn't entirely UNIX-centric, it's easy to make basharteg> MySQL for Windows fly. If we get to the point where we're willing to look basharteg> at async socket I/O and thread queues / I/O Completion Ports for work basharteg> dispatch, we can do even more.
I don't know much about that, so I would be really thankful for any help I can get in this area.
Jeremiah> It's going to take a fair amount of design work to figure out how to Jeremiah> adapt MySQL to this kind of model. It could be implemented on all Jeremiah> platforms with epoll on Linux, kqueue on FreeBSD, and I/O Completion Jeremiah> Ports on Windows. Event notification I/O is the key to serious Jeremiah> scalability, but it requires significant design changes, especially when Jeremiah> you're coming from a 1:1 threading model. I have always been worried about the fact that if you have other threads doing IO, you will always get a thread switch + some atomic operations / mutex when doing any read or write operations. The 1:1 model makes handling io operations much easier and if things are buffered then this can be superior compared to having a pool for data. Also, if all the threads are mixing cpu bound and io operations, this gives you somewhat balanced load that works reasonable good in many cases. I understand that pools are superior in some context; I am just not yet sure when it's best to use a pool and when to use 1:1 threads. <cut>
In MySQL 6.0 we you can choose at startup if you want to have one thread per connection or a pool of threads to handle all connections. I wrote the patch in such a way that it's < 10 min of works to pull this code back to 5.1 (or MariaDB); Will do this next week.
http://dev.mysql.com/doc/refman/6.0/en/connection-threads.html
Jeremiah> This change makes significant progress towards moving to event Jeremiah> notification I/O. That means you broke the state information up between Jeremiah> thread and connection and eliminated the bad thread == connection Jeremiah> assumptions. Yes. The patch is now in lp:maria basharteg> So in terms of MySQL, it definitely runs with a 1:1 threading model using basharteg> blocking I/O. While this isn't uncommon for UNIX applications, you'll basharteg> notice the "hot" UNIX apps that are more interested in the C10K ( basharteg> http://www.kegel.com/c10k.html) target for their scalability are starting to basharteg> move to async I/O implementations. This has produced some good output like basharteg> lighttpd (http://www.lighttpd.net/).
I haven't yet made up my mind what's the best route for MariaDB in this case; I am waiting for a benchmark or nice code that will make up my mind for me...
The thread goes both ways. When MySQL was created, it was more common for people to use using async instead of an 1:1 thread model. As far as people have told me, after MySQL and as the thread libraries got better a lot of programs changed to us a 1:1 thread model. Don't think it has yet been proved what is the best model/mix long term. A lot depends on the worklog and what the threads are doing. For example in the MySQL case: - If all your queries are short and there are very little conflicts a pool-of-threads version is much better than a 1:1 model. - If you have a big mix long running queries and short running queries, a 1:1 thread application is superior to simple pole based one. Jeremiah> Yeah, benchmarks are tough too, because serial benchmarks will tend to Jeremiah> show the overhead involved in asynchronous I/O operations. You have to Jeremiah> decide that you're developing for the multi-core world and accept the Jeremiah> minor overhead in exchange for the significant scalability enhancements, Jeremiah> and design your benchmarks to implement concurrent loads. See above. What is best is mostly depending one what the threads are supposed to do. Multi-core is not the most critical part here. <cut>
Note that the thread pool is still 'beta' quality, in the sense it still have some notable week spots:
- If all threads are working or hang (in lock tables) then no new connections can be established, one can't examine what is going on and one can't kill offending threads (as one can't send in new queries)
The easyest fix would be to have another port where one could connect with the 1:1 model and check what's going on.
I added this to the MariaDB thread-pool implementation that is now pushed. Jeremiah> Having an administrative connection is something MySQL has needed for a Jeremiah> very long time anyway. However, as someone who uses thread pools Jeremiah> extensively on a daily basis, you can handle the cases of thread pool Jeremiah> exhaustion in several ways, including safety timer events, timeouts on Jeremiah> lock waits, event based notification on lock acquisition (may be Jeremiah> difficult). This is not easy when you don't want to abort any of the running queries. As the lock between threads can bappen very low, for example inside a storage engine for which you don't have any control, none of the above solutions are easy or even possible to implement. Jeremiah> But as someone who runs a fairly large MySQL system for a Jeremiah> telephony company, I can tell you that the 1:1 threading model doesn't Jeremiah> save you from thread exhaustion any more than having a thread pool Jeremiah> would. When we get threads stuck waiting for locks our servers can Jeremiah> easily become thread exhausted with no way to kill the threads. As long as you limit max_connections under the number of threads your system can handle, you should always be able to connect as a SUPER user and fix things. Jeremiah> Besides, in a good design, you shouldn't have just one thread pool. The Jeremiah> thread pool that handles I/O events (like accepting new connections) Jeremiah> should be different from the worker thread pool that executes SQL Jeremiah> statements. In the current thread pool implementation, the threads are waiting in the pool for any request on connection port; In this case the connection/login handling and reading commands from the user are both io events. Jeremiah> In the Windows async I/O world, we consider connections to be relatively Jeremiah> inexpensive. I don't set connection limits in my applications. Each Jeremiah> connection is a little bit of non-paged pool kernel memory, plus the Jeremiah> little struct I use to keep the connection state information, plus one Jeremiah> async I/O read pending (among thousands). You also need to keep the connection state, which in MySQL means the THD object. This is a semi-expensive object (10 K) Jeremiah> But yes, one of the basics of our design is having at least two thread Jeremiah> pools, one for socket I/O (at least so far as new connections, posting Jeremiah> reads, and receiving queries) and one for SQL statements. The socket Jeremiah> I/O thread pool should not be something that deadlocks, since whatever Jeremiah> locking is used, it would be far more deterministic than the locking Jeremiah> involved in executing SQL statements. In the current implementation, we have only one pool and in MySQL:s case it may be enough. The reason for this is that when the thread gets a signal that there is data to be read, it will have all data it needs in the buffer (as the client commands are mostly very short and if they are not short, the client is working full time to send it as they are always send whole). This means that there is in practice no deadlocks when it comes to reading data and you win the context switch from reading data and then giving it to a worker thread. Jeremiah> Another thing to throw out there since I'm already being terribly Jeremiah> verbose is, each query should have a timeout value, and when that Jeremiah> timeout value is exceeded, the query should terminate. One of the worst Jeremiah> parts about working in MySQL is the idea of a stuck query, especially Jeremiah> when you need a separate connection to kill it. In systems like Jeremiah> Microsoft SQL, the default query timeout is 2 minutes, and if you need Jeremiah> to exceed that or you don't want a timeout, you can change it as part of Jeremiah> the connection settings. Runaway queries running indefinitely doesn't Jeremiah> make any practical sense. This agains depends on the application. Most applications need to do long statistical queries from time to time and you don't want to kill these! Same thing with any queries that does critical updates or moves things around (like ALTER TABLE, REPAIR etc). In other words; I agree we need a timeout mechanism. I am however not sure what should be the best default value for this. That said, Guilhem did once implement a timeout for queries but this was unfortunately never pushed becasue it changed the code too much at the time when the patch was made and we where close to GA. We should dig out this patch and fix it for the current code and add it. Jeremiah> Also, when we get async I/O in place, you Jeremiah> could eventually add an extension to the protocol that allows the client Jeremiah> to cancel the query in progress. Since the incoming I/Os are handled by Jeremiah> I/O threads rather than the SQL threads, you can still receive commands Jeremiah> while the query is in progress. So even as you're executing a query for Jeremiah> someone, you have an async read on the socket waiting for the next Jeremiah> command. If the next command is received while the query is still in Jeremiah> progress, you check to see if it's a cancel command, or simply a Jeremiah> pipelined SQL query. If it's a cancel command, you flag your SQL thread Jeremiah> to stop when it can, and if it's a pipelined SQL query they're probably Jeremiah> violating the MySQL protocol anyway since they need to wait for a Jeremiah> response to the previous query, but if we allowed pipelined queries some Jeremiah> day in the future, they've lost their right to cancel this query and you Jeremiah> simply queue the SQL query until the current query is finished. In the Jeremiah> far future, multiple queries on the same connection are fine, and are Jeremiah> given query identifiers and all handled through async I/O, and all Jeremiah> queries are cancellable. Now you can achieve the same thing by just sending the kill one another connection (could for example be used through the extra-port). Drizzle has done some work in the direction of sending many queries on the same connection. We should look at this when their implementation is ready. I am still a bit sceptical of having two thread pools; My major worry is that it will for normal users slow down things instead of giving a speed increase (because of the extra context switch and mutex/atomic operations to shuffle things around) We need to find someone that can write a prototype and show the benefits trough some agreed to benchmark... basharteg> We need to find the points in there where I/Os are taking place that can be basharteg> adapted to an async model. Basically, the disk reads that are happening to basharteg> satisfy the query, the network reads to get the query, the network writes to basharteg> send the results, and the network waits for the next query. All of those basharteg> things need to happen without a thread being associated with this basharteg> connection/query. We need a proof of concept and benchmark of this...
For mysqld this isn't trivial as we can't easily change the code for the engines.
What I don't like with the async model is that for reads on disk it doesn't speed up things as much and it may even slow down thing as you now have two threads communication and the original is just waiting for the other to take the message, execute it, and tell the other that it's now ready.
Jeremiah> Well, you can avoid async I/O for disk I/O if you think it's going to be Jeremiah> a performance negative. On single or few drive SATA systems, you're Jeremiah> right. On high end I/O subsystems, async I/O can definitely have a Jeremiah> throughput benefit, especially when you have an advanced storage Jeremiah> subsystem driver that combines operations. As we get to Solid State Jeremiah> Disks in the future, it's going to be the same thing as with the Jeremiah> multi-core CPUs we're adapting to. If you're not doing multiple things Jeremiah> at once, your performance is going to be weak. For now though, you can Jeremiah> leave out async disk I/O, but async socket I/O is the only way to get up Jeremiah> to tens of thousands of connections and beyond. You shouldn't be afraid Jeremiah> of the overhead of using multiple threads and multiple thread pools. If Jeremiah> done properly, the penalty to serial operations is minimal, and the Jeremiah> performance gains are significant (in addition to the robustness Jeremiah> enhancements previously discussed). The pool-of-threads code we have now already gives us 100,000 of connections. But I am still afread of splitting the current thread pool to 2 parts... <cut>
What would be the advantages of using Apache portable runtime? Should this be instead of mysys ?
The benefits of using mysys are: - Well working and integrated with all MySQL parts - Well integrated with dbug statements - Provides a lot of extra features, like hiding windows thread implementation.
Jeremiah> I can understand moving to apr from mysys might not be feasible, I just Jeremiah> bring it up because they've abstracted async and event driven I/O and a Jeremiah> lot of other platform differences very well. It works great on Jeremiah> Windows, FreeBSD, and Linux. Probably not something we can go for Jeremiah> right now, but they may have some useful designs and hints. I am happy to take any good ideas from their libraires and add them to ours and even move some of our code in their direction (if there is a clear gain for this). Regards, Monty