Dave C <dave.zap@gmail.com> writes:
Originally posted to : http://stackoverflow.com/questions/37909652/mariadb-non-blocking-with-epoll Edited for context.
I have a single-threaded server written in C that accepts TCP/UDP connections based on EPOLL and supports plugins for the multitude of protocol layers we need to support. That bit is fine.
We use MariaDB and the MariaDB C connector, which supports non-blocking functions in its API as described here.
Cool, always glad to see users of this API.
Otherwise I fetch the file descriptor that seems to be immediate and register it with EPOLL and bail back to the main EPOLL loop waiting for events.
    s = mysql_get_socket(mysql);
    if (s > 0) {
        brt_socket_set_fds(endpoint, s);
        struct epoll_event event;
        event.data.fd = s;
        event.events = EPOLLRDHUP | EPOLLIN | EPOLLET | EPOLLOUT;
        s = epoll_ctl(efd, EPOLL_CTL_ADD, s, &event);
So, in principle you need only register the events that mysql_real_connect_start() requests in its return value. For mysql_real_connect_start(), this would typically be MYSQL_WAIT_WRITE, corresponding to EPOLLOUT (EPOLLOUT signals that an asynchronous socket connect has completed).

If it is easier for you to always register for all events, I am not sure it will be a problem. Normally I suppose only the events that the API requests can trigger. Though if you somehow got into a situation where an EPOLLIN is pending but the non-blocking API is requesting only MYSQL_WAIT_WRITE, you could end up in a busy-loop, calling the API while the EPOLLIN is ignored. But I doubt this is related to your problem at hand; I just wanted to mention it as something worth considering.

Note also that if MYSQL_WAIT_TIMEOUT is included in the return value from mysql_real_connect_start(), your code is expected to set up a timeout handler that calls back into the API if the time period returned by mysql_get_timeout_value() elapses. Without this, timeout values from the MySQL API will not work (though other things should be fine).
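To make the mapping concrete, here is a minimal sketch of turning the status returned by a _start() or _cont() call into an epoll event mask and a timeout. The helper names wait_status_to_epoll() and wait_status_to_timeout_ms() are made up for this illustration. The MYSQL_WAIT_* values are repeated only so the snippet compiles on its own; they are believed to match mysql.h, but real code should include that header instead. The sketch also assumes that mysql_get_timeout_value() reports seconds.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Assumption: these mirror the values in MariaDB's mysql.h. They are
   repeated here only so the sketch builds without the client headers. */
#ifndef MYSQL_WAIT_READ
#define MYSQL_WAIT_READ    1
#define MYSQL_WAIT_WRITE   2
#define MYSQL_WAIT_EXCEPT  4
#define MYSQL_WAIT_TIMEOUT 8
#endif

/* Hypothetical helper: translate the wait status requested by the
   non-blocking API into the epoll events to register for. */
uint32_t wait_status_to_epoll(int status)
{
    uint32_t events = 0;

    if (status & MYSQL_WAIT_READ)
        events |= EPOLLIN;
    if (status & MYSQL_WAIT_WRITE)
        events |= EPOLLOUT;
    if (status & MYSQL_WAIT_EXCEPT)
        events |= EPOLLPRI;
    return events;
}

/* Hypothetical helper: timeout in milliseconds to arm for this wait, or
   -1 when the API did not request a timeout. timeout_seconds is assumed
   to be the value reported by mysql_get_timeout_value(). */
int wait_status_to_timeout_ms(int status, unsigned int timeout_seconds)
{
    if (!(status & MYSQL_WAIT_TIMEOUT))
        return -1;
    return (int)(timeout_seconds * 1000u);
}
```

With something like this, the registration would use event.events = wait_status_to_epoll(status) rather than a fixed mask, and the main loop would arm a timer from wait_status_to_timeout_ms(status, mysql_get_timeout_value(mysql)).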
        if (s == -1) {
            syslog(LOG_ERR, "brd_db : epoll error.");
            // handle error.
        }...
So then some time later I do get the EPOLLOUT indicating the socket has been opened.
And I dutifully call mysql_real_connect_cont() but at this stage it is still returning a non-zero value, indicating I must wait longer?
Yes. At this stage, data needs to be sent back and forth between server and client to handle login and such. So the return value should probably be MYSQL_WAIT_READ, indicating that the API is waiting for a response from the server.
But then that is the last EPOLL event I get, except for the EPOLLRDHUP when I guess the MariaDB hangs up after 10 seconds.
Hm. I noticed you added a post to the Stack Overflow question saying that you solved your problem by passing the return value from mysql_real_connect_start() as the third parameter to mysql_real_connect_cont()? But in fact this is not correct. What you need to pass in is a value indicating the events that _actually_ occurred.

So mysql_real_connect_start() probably returns MYSQL_WAIT_WRITE | MYSQL_WAIT_TIMEOUT. If you get EPOLLOUT, you should then pass MYSQL_WAIT_WRITE as the third parameter to mysql_real_connect_cont(). Otherwise, if the timeout triggers, you would pass MYSQL_WAIT_TIMEOUT. The idea is that the API returns the events it wants to be notified about, and you then pass back the subset of those events that actually occurred.

If you pass MYSQL_WAIT_TIMEOUT to mysql_real_connect_cont(), it will actually fail the connection with a timeout. And looking at the current code, all other values happen to be ignored at present. So I am puzzled that this would solve your problem, though the timeout might mean your code no longer hangs (but also does not connect successfully)?
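As a rough illustration of "pass back the subset that actually occurred", the conversion from what epoll reported to the third parameter could look like the sketch below. epoll_to_wait_status() is a hypothetical helper; the MYSQL_WAIT_* values are repeated only so the snippet compiles without mysql.h and are believed to match it. How to treat EPOLLERR, EPOLLHUP and EPOLLRDHUP is left out on purpose, since that is a separate design decision.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Assumption: these mirror the values in MariaDB's mysql.h. */
#ifndef MYSQL_WAIT_READ
#define MYSQL_WAIT_READ    1
#define MYSQL_WAIT_WRITE   2
#define MYSQL_WAIT_EXCEPT  4
#define MYSQL_WAIT_TIMEOUT 8
#endif

/* Hypothetical helper: build the value for the third parameter of
   mysql_real_connect_cont().
   events    - the events field that epoll_wait() filled in (0 if none)
   requested - the status the previous _start()/_cont() call returned
   timed_out - non-zero if the timeout for this wait has elapsed */
int epoll_to_wait_status(uint32_t events, int requested, int timed_out)
{
    int occurred = 0;

    if (events & EPOLLIN)
        occurred |= MYSQL_WAIT_READ;
    if (events & EPOLLOUT)
        occurred |= MYSQL_WAIT_WRITE;
    if (events & EPOLLPRI)
        occurred |= MYSQL_WAIT_EXCEPT;
    if (timed_out)
        occurred |= MYSQL_WAIT_TIMEOUT;

    /* Report only events that the API actually asked to be told about. */
    return occurred & requested;
}
```

The continuation call would then read roughly: status = mysql_real_connect_cont(&ret, mysql, epoll_to_wait_status(ev.events, status, 0)); where ret, ev and status are the caller's own variables.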
Can anyone help me understand if this idea is even workable?
It should definitely be workable.

One thing I notice is that you are using the EPOLLET flag, which selects edge-triggered epoll. Edge-triggered can be tricky to use, and it is easy to introduce a bug that will cause an event to be lost. But I do not quite see anything wrong - the non-blocking API shouldn't return non-zero unless it is actually waiting for something (e.g. recv() returns EAGAIN or EINTR). But there might be a bug, maybe the API has not been well tested in edge-triggered mode? Or an event gets lost between _start() and _cont() somehow?

Can you try to run your program under strace? Here is what strace gives for the example program client/async_example.c:

    socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
    fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
    fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
    connect(3, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.1.7")}, 16) = -1 EINPROGRESS (Operation now in progress)
    poll([{fd=3, events=POLLOUT}], 1, 4294967295) = 1 ([{fd=3, revents=POLLOUT}])
    getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
    setsockopt(3, SOL_IP, IP_TOS, [8], 4) = 0
    setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
    recvfrom(3, 0x55a0bf5ff1d0, 16384, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
    poll([{fd=3, events=POLLIN}], 1, 4294967295) = 1 ([{fd=3, revents=POLLIN}])
    recvfrom(3, "S\0\0\0\n5.5.49-0+deb7u1\0\220\0\0\0009^OR+~]"..., 16384, MSG_DONTWAIT, NULL, NULL) = 87
    stat("/usr/local/mysql/share/charsets/Index.xml", 0x55a0bf5f8cd0) = -1 ENOENT (No such file or directory)
    futex(0x55a0be039740, FUTEX_WAKE_PRIVATE, 2147483647) = 0
    sendto(3, "V\0\0\1\5\242>\0\0\0\0@\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 90, MSG_DONTWAIT, NULL, 0) = 90
    recvfrom(3, 0x55a0bf5ff1d0, 16384, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
    poll([{fd=3, events=POLLIN}], 1, 4294967295) = 1 ([{fd=3, revents=POLLIN}])
    recvfrom(3, "\7\0\0\2\0\0\0\2\0\0\0", 16384, MSG_DONTWAIT, NULL, NULL) = 11

Here we see how the connect() call first returns EINPROGRESS, and then a POLLOUT event arrives when the connection completes. After that, recvfrom() fails with EAGAIN twice, each time followed by a POLLIN event when the expected reply arrives from the server.

If you compare this with an strace from when your program hangs, maybe it will show what the problem is.

(Now I'm wondering whether there might actually be a bug in the case of EINTR. It seems the API will return in this case, while there might still be data pending. So the API should actually retry the recv on EINTR, rather than return with MYSQL_WAIT_READ, for edge-triggered to work? But I don't suppose you're getting EINTR in your application, and if you do, the strace should show it.)

I would also try to run a tcpdump in parallel with your tests, just to make sure that packets are being sent and received on the network, and that your problem is not being caused by something outside your application (firewall or whatever) - just in case...

Hope this helps, and if not do ask again with more info.

 - Kristian.