[Maria-discuss] Why isn't SO_SNDTIMEO used?
I describe how read/write is done for sockets in mysqld. http://www.facebook.com/note.php?note_id=122655300932 Can someone explain the history of this code? -- Mark Callaghan mdcallag@gmail.com
On Fri, Aug 14, 2009 at 10:28:35AM -0700, MARK CALLAGHAN wrote:
> I describe how read/write is done for sockets in mysqld. http://www.facebook.com/note.php?note_id=122655300932
> Can someone explain the history of this code?
IIRC the socket code is mostly rather (read: VERY) old. It came from some book or discussion about what was best practice at the time, plus some modifications over the years to fix things. So you're probably right, and it probably should use them. -- Stewart Smith
Hi!
"Stewart" == Stewart Smith
writes:
Stewart> On Fri, Aug 14, 2009 at 10:28:35AM -0700, MARK CALLAGHAN wrote:
>> I describe how read/write is done for sockets in mysqld. http://www.facebook.com/note.php?note_id=122655300932
>> Can someone explain the history of this code?
A comment about the article:

The common path for process_alarm() is described wrongly:

- By default on Linux, USE_ONE_SIGNAL_HAND is set, which means that no pthread_sigmask() is done for process_alarm().

We also don't loop over all alarms when we get an alarm; we only loop over the alarms that have timed out (it's a priority queue), so we find these very fast.

The alarm code is written with the assumption that there are very few alarms, in which case the LOCK_alarm mutex is mostly held for a very short time and process_alarm() is very seldom called. There should also be very little re-scheduling of alarms.

While looking at the code, I did however find one hot spot when there are a lot of threads, which could explain the contention for the mutex that you are talking about:

Removing an alarm is done with a full loop over all alarms.

It should not be too hard to use the priority queue to quickly find the alarm without having to go through more than a fraction of the alarms. By doing this, you could get a major speedup in the case of many threads, as the LOCK_alarm mutex would be held for a much shorter time.

Mark, when you see the contention for the LOCK_alarm mutex, how many threads have you been running?

As a separate note, I agree it would make perfect sense to use SO_SNDTIMEO on platforms that support it, except for one little problem:

We also use the thr_alarm() functionality when one uses 'kill connection-id' in MySQL. I don't know of any easy way to gracefully wake up a thread that is sleeping on SO_SNDTIMEO. Do you?

Regards,
Monty
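One possible way to avoid the full scan when removing an alarm, sketched in C under assumptions that are not taken from the MySQL source: keep the alarms in a binary min-heap ordered by expiry time and let every alarm remember its own slot in the heap array, so removal costs O(log n). This is a different technique from searching a fraction of the existing queue as suggested above, and the names alarm_t, alarm_queue_t, aq_insert and aq_remove are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define AQ_MAX 1024              /* hypothetical fixed capacity for this sketch */

typedef struct alarm {
    time_t expire;               /* absolute expiry time */
    size_t slot;                 /* current index of this alarm in the heap */
} alarm_t;

typedef struct alarm_queue {
    alarm_t *heap[AQ_MAX];       /* min-heap ordered by expire */
    size_t   count;
} alarm_queue_t;

/* Store an alarm in a slot and record that slot in the alarm itself. */
static void aq_place(alarm_queue_t *q, size_t i, alarm_t *a)
{
    q->heap[i] = a;
    a->slot = i;
}

static void aq_sift_up(alarm_queue_t *q, size_t i)
{
    alarm_t *a = q->heap[i];
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (q->heap[parent]->expire <= a->expire)
            break;
        aq_place(q, i, q->heap[parent]);
        i = parent;
    }
    aq_place(q, i, a);
}

static void aq_sift_down(alarm_queue_t *q, size_t i)
{
    alarm_t *a = q->heap[i];
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= q->count)
            break;
        if (child + 1 < q->count &&
            q->heap[child + 1]->expire < q->heap[child]->expire)
            child++;                     /* pick the earlier of the two children */
        if (a->expire <= q->heap[child]->expire)
            break;
        aq_place(q, i, q->heap[child]);
        i = child;
    }
    aq_place(q, i, a);
}

/* Returns 0 on success, -1 if the queue is full. */
int aq_insert(alarm_queue_t *q, alarm_t *a)
{
    if (q->count == AQ_MAX)
        return -1;
    aq_place(q, q->count, a);
    q->count++;
    aq_sift_up(q, a->slot);
    return 0;
}

/* Remove an alarm via its recorded slot: O(log n), no scan over all alarms. */
void aq_remove(alarm_queue_t *q, alarm_t *a)
{
    size_t i = a->slot;
    alarm_t *moved;

    assert(i < q->count && q->heap[i] == a);
    q->count--;
    if (i == q->count)
        return;                          /* it was the last element */
    moved = q->heap[q->count];
    aq_place(q, i, moved);               /* fill the hole with the last element */
    aq_sift_up(q, moved->slot);          /* at most one of these two moves it */
    aq_sift_down(q, moved->slot);
}
```

The earliest alarm is always q->heap[0], so the timed-out alarms can still be found quickly; the cost is one extra size_t per alarm, and the critical section for removal shrinks from a scan to a few swaps.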
On Wed, Mar 10, 2010 at 12:29 PM, Michael Widenius
> Hi!
> "Stewart" == Stewart Smith writes:
> Stewart> On Fri, Aug 14, 2009 at 10:28:35AM -0700, MARK CALLAGHAN wrote:
>>> I describe how read/write is done for sockets in mysqld. http://www.facebook.com/note.php?note_id=122655300932
>>> Can someone explain the history of this code?
> A comment about the article:
> The common path for process_alarm() is described wrongly:
> - By default on Linux, USE_ONE_SIGNAL_HAND is set, which means that no pthread_sigmask() is done for process_alarm().
> We also don't loop over all alarms when we get an alarm; we only loop over the alarms that have timed out (it's a priority queue), so we find these very fast.
> The alarm code is written with the assumption that there are very few alarms, in which case the LOCK_alarm mutex is mostly held for a very short time and process_alarm() is very seldom called. There should also be very little re-scheduling of alarms.
> While looking at the code, I did however find one hot spot when there are a lot of threads, which could explain the contention for the mutex that you are talking about:
> Removing an alarm is done with a full loop over all alarms.
> It should not be too hard to use the priority queue to quickly find the alarm without having to go through more than a fraction of the alarms. By doing this, you could get a major speedup in the case of many threads, as the LOCK_alarm mutex would be held for a much shorter time.
> Mark, when you see the contention for the LOCK_alarm mutex, how many threads have you been running?
I have to revisit this. I will reply again when I have data to share.
> As a separate note, I agree it would make perfect sense to use SO_SNDTIMEO on platforms that support it, except for one little problem:
> We also use the thr_alarm() functionality when one uses 'kill connection-id' in MySQL. I don't know of any easy way to gracefully wake up a thread that is sleeping on SO_SNDTIMEO. Do you?
No, although I haven't looked and I don't do much network systems programming. -- Mark Callaghan mdcallag@gmail.com
On Wed, Mar 10, 2010 at 12:29 PM, Michael Widenius wrote:
> We also use the thr_alarm() functionality when one uses 'kill connection-id' in MySQL. I don't know of any easy way to gracefully wake up a thread that is sleeping on SO_SNDTIMEO. Do you?
Well, I checked the code, and it seems to wake up the thread using pthread_kill(thread, signal) for the 'kill connection-id' command. This should work fine also when using SO_SNDTIMEO for timeouts on the socket. Just send the signal to the thread blocking on the socket with SO_SNDTIMEO, and the blocking socket call will return with EAGAIN or similar. - Kristian.
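For reference, a minimal C sketch of how per-socket timeouts are set with setsockopt(). The helper names set_io_timeout and recv_timed_out are hypothetical and not from the MySQL source. As far as the Linux man pages describe it, a blocking call that hits the timeout fails with EAGAIN or EWOULDBLOCK, while a call interrupted by a signal handler fails with EINTR, so the two cases can be told apart.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: apply the same timeout to blocking sends and receives. */
int set_io_timeout(int fd, long millis)
{
    struct timeval tv;

    assert(millis >= 0);
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: returns 1 if recv() gave up because of the timeout,
 * 0 if it returned data or end-of-file, and -1 on any other error
 * (including EINTR after a signal).
 */
int recv_timed_out(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n >= 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;
    return -1;
}
```

With this in place no alarm thread or alarm queue is needed for the timeout itself; what remains open is how 'kill connection-id' wakes a thread that is blocked in such a call.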
Hi!
"Kristian" == Kristian Nielsen
writes:
>> On Wed, Mar 10, 2010 at 12:29 PM, Michael Widenius wrote:
>>> We also use the thr_alarm() functionality when one uses 'kill connection-id' in MySQL. I don't know of any easy way to gracefully wake up a thread that is sleeping on SO_SNDTIMEO. Do you?

Kristian> Well, I checked the code, and it seems to wake up the thread using
Kristian> pthread_kill(thread, signal) for the 'kill connection-id' command. This should
Kristian> work fine also when using SO_SNDTIMEO for timeouts on the socket.

Kristian> Just send the signal to the thread blocking on the socket with SO_SNDTIMEO,
Kristian> and the blocking socket call will return with EAGAIN or similar.

It's not that easy.

The problem is the following:

The alarm code now makes sure that we don't send the signal if we are not waiting for it; it may not be safe for the thread to receive the kill signal at any point in time (for example in thread engine code, which we don't want to interrupt).

The alarm code makes sure that the signal is never missed. For example, if we would send the signal just before we enter read with SO_SNDTIMEO, the thread would miss the signal and the 'kill command' would not have any effect.

To solve this, we would need to add the following mechanism:

- Add a flag to THD that signals if we are in a read() call on a connection. This flag should be modified under a mutex to ensure that the 'kill thread-id' code knows if it should send a signal or not.
- The kill code should send multiple kill commands to the thread, until the 'read()' flag changes state to 'not in read'.

Possible to do, but still a little bit of work and testing.

Regards,
Monty
Michael Widenius
> "Kristian" == Kristian Nielsen writes:
> Kristian> Well, I checked the code, and it seems to wake up the thread using
> Kristian> pthread_kill(thread, signal) for the 'kill connection-id' command. This should
> Kristian> work fine also when using SO_SNDTIMEO for timeouts on the socket.
> Kristian> Just send the signal to the thread blocking on the socket with SO_SNDTIMEO,
> Kristian> and the blocking socket call will return with EAGAIN or similar.
> It's not that easy.
> The problem is the following:
> The alarm code now makes sure that we don't send the signal if we are not waiting for it; it may not be safe for the thread to receive the kill signal at any point in time (for example in thread engine code, which we don't want to interrupt).
I do not see any problem with sending the signal at any time. Of course, there should be an appropriate handler set up (so we do not kill ourselves), but any interruptible system call (like socket read()/write()) should in any case be coded in a way that is safe for EAGAIN interruption. But maybe I did not understand what particular problem you had in mind; I am not sure what you mean by "thread engine code" or why we do not want to interrupt it.
> The alarm code makes sure that the signal is never missed. For example, if we would send the signal just before we enter read with SO_SNDTIMEO, the thread would miss the signal and the 'kill command' would not have any effect.
Yes, you are right, this would be prone to races with a missed signal. One option might be to call shutdown(2) on the socket, and then send the signal. But this only works for killing the connection, not for just killing a query. So I am not sure if this is a good idea.
> To solve this, we would need to add the following mechanism:
> - Add a flag to THD that signals if we are in a read() call on a connection. This flag should be modified under a mutex to ensure that the 'kill thread-id' code knows if it should send a signal or not.
I did not understand why it is important not to send a signal if we are not in read(). (Protecting with a mutex seems a bit of a problem, as I think there is no way to atomically unlock the mutex and initiate the read() call?)
> - The kill code should send multiple kill commands to the thread, until the 'read()' flag changes state to 'not in read'.
If this is acceptable (looping, sending kill and waiting a bit for the thread to respond), then the race can be solved easily enough this way.
> Possible to do, but still a little bit of work and test.
Yes. - Kristian.
Hi!
"Kristian" == Kristian Nielsen
writes:
<cut>
>> The problem is the following:
>> The alarm code now makes sure that we don't send the signal if we are not waiting for it; it may not be safe for the thread to receive the kill signal at any point in time (for example in thread engine code, which we don't want to interrupt).

Kristian> I do not see any problems sending the signal at any time. Of course, there
Kristian> should be an appropriate handler set up (so we do not kill ourselves), but any
Kristian> interruptible system call (like socket read()/write()) should in any case be
Kristian> coded in a way that is safe for EAGAIN interruption. But maybe I did not
Kristian> understand what particular problem you had in mind, not sure what you mean by
Kristian> "thread engine code" and why we do not want to interrupt it.

It depends on how good all the other libraries in use are. For example, assume that we send a signal while a storage engine is doing a read on a file. There is a notable chance that the storage engine will not retry the read when it is interrupted, especially if it uses some library to do its reads and writes. This is because in normal cases one never gets a signal during read/write in MySQL.
>> The alarm code makes sure that the signal is never missed. For example, if we would send the signal just before we enter read with SO_SNDTIMEO, the thread would miss the signal and the 'kill command' would not have any effect.

Kristian> Yes, you are right, this would be prone to races with missed signal.
Kristian> One option might be to call shutdown(2) on the socket, and then send the
Kristian> signal. But this only works for killing the connection, not for just killing a
Kristian> query. So not sure if this is a good idea.

Yes, we can't use shutdown(), as we also want to be able to just kill queries. The other problem is that if we do a shutdown(), we can't tell the client that we did a 'graceful kill' and that it didn't hit a bug.
>> To solve this, we would need to add the following mechanism:
>> - Add a flag to THD that signals if we are in a read() call on a connection. This flag should be modified under a mutex to ensure that the 'kill thread-id' code knows if it should send a signal or not.

Kristian> I did not understand why it is important not to send a signal if we are not in
Kristian> read().
Kristian> (Protecting with a mutex seems a bit of a problem, as I think there is no way
Kristian> to atomically unlock the mutex and initiate the read() call?)

The above is needed to ensure that we really get a signal during read and we don't miss it.

Pseudo code:

Thread 1:

  get_mutex();
  thd->in_read= 1;
  release_mutex();
  if (!thd->killed)
    read();
  get_mutex();
  thd->in_read= 0;
  release_mutex();

The mutex would of course be a local mutex, so there is never a conflict from this, except if someone wants to send a kill signal.

When sending a kill in thread 2:

  do
  {
    get_mutex();
    in_read= thd->in_read;
    thd->killed= 1;
    release_mutex();
    if (!in_read)
      break;
    send_kill();
    sleep(1);
  } while (1);

As you see, we don't need to hold the mutex over the read. We do however need the mutex to ensure that we don't miss the kill signal whatever happens. In the above code, thread 1 may miss one kill signal, but this is ok, as thread 2 will retry until it succeeds in breaking the read.

Without a mutex, there is a chance that thread 2 will not detect that thread 1 is about to do a read and will just set the killed flag, while thread 1 may not see the killed flag and will instead block in the read.
>> - The kill code should send multiple kill commands to the thread, until the 'read()' flag changes state to 'not in read'.

Kristian> If this is acceptable (looping, sending kill and waiting a bit for the thread
Kristian> to respond), then the race can be solved easily enough this way.

Yes, but you need a mutex to make this foolproof.

Regards,
Monty
participants (4)
- Kristian Nielsen
- MARK CALLAGHAN
- Michael Widenius
- Stewart Smith