[Maria-developers] Missing locking around THD::set_db() ?
Hi, I was looking at a Valgrind warning in Buildbot (appended below). What happens seems to be this: Thread 1 is running SHOW PROCESSLIST, it grabs the pointer THD::db to the current database of thread 2. Thread 2 then does THD::set_db(), freeing the old THD::db pointer and allocating a new one with the new data. Thread 1 then resumes, doing strdup() of the _old_, now invalid, THD::db pointer, which reads garbage data (or could even segfault if we get really unlucky). This seems like a genuine bug. I see absolutely no locking protecting against this race :-( Any suggestions for how to deal with this? The only locking I see in mysqld_list_processes() is holding LOCK_thread_count, but taking that for every THD::set_db() sounds quite bad for scalability (for example, set_db() is run by every query replicated by a slave thread). Should we use LOCK_thd_data also for THD::db ? Or maybe we could have a fixed buffer in THD for the db? Seems we would need 193 bytes. Then SHOW PROCESSLIST could sometimes show a random mix of old and new db name if we race, but at least we would not read invalid memory or crash due to mummap() ... Any better ideas? - Kristian. ----------------------------------------------------------------------- Original valgrind warning, for reference: rpl.rpl_stop_slave 'mix,xtradb' w5 [ fail ] Found warnings/errors in server log file! Test ended at 2013-04-22 22:05:12 line ==9897== Thread 20: ==9897== Invalid read of size 1 ==9897== at 0x4C2826B: memcpy (mc_replace_strmem.c:878) ==9897== by 0xBAF9E8: strmake_root (my_alloc.c:424) ==9897== by 0x63C74E: mysqld_list_processes(THD*, char const*, bool) (sql_class.h:771) ==9897== by 0x5C8574: mysql_execute_command(THD*) (sql_parse.cc:3149) ==9897== by 0x5CC844: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5759) ==9897== by 0x5CDDC9: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1068) ==9897== by 0x5CE4BD: do_command(THD*) (sql_parse.cc:794) ==9897== by 0x6A4E23: do_handle_one_connection(THD*) (sql_connect.cc:1266) ==9897== by 0x6A4F23: handle_one_connection (sql_connect.cc:1181) ==9897== by 0x8F0027: pfs_spawn_thread (pfs.cc:1015) ==9897== by 0x4E3506F: start_thread (in /lib64/libpthread-2.9.so) ==9897== by 0x62F713C: clone (in /lib64/libc-2.9.so) ==9897== Address 0x1a06a973 is 3 bytes inside a block of size 5 free'd ==9897== at 0x4C257D8: free (vg_replace_malloc.c:446) ==9897== by 0xBB96BB: my_free (my_malloc.c:115) ==9897== by 0x838C86: Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned int) (sql_class.h:2846) ==9897== by 0x53904C: apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*) (log_event.h:1230) ==9897== by 0x5446DD: exec_relay_log_event(THD*, Relay_log_info*) (slave.cc:2766) ==9897== by 0x545BD9: handle_slave_sql (slave.cc:3627) ==9897== by 0x8F0027: pfs_spawn_thread (pfs.cc:1015) ==9897== by 0x4E3506F: start_thread (in /lib64/libpthread-2.9.so) ==9897== by 0x62F713C: clone (in /lib64/libc-2.9.so)
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
What happens seems to be this:
Thread 1 is running SHOW PROCESSLIST, it grabs the pointer THD::db to the current database of thread 2.
Thread 2 then does THD::set_db(), freeing the old THD::db pointer and allocating a new one with the new data.
Thread 1 then resumes, doing strdup() of the _old_, now invalid, THD::db pointer, which reads garbage data (or could even segfault if we get really unlucky).
I filed MDEV-4422 for this. - Kristian.
Hi, Kristian! On Apr 23, Kristian Nielsen wrote:
I was looking at a Valgrind warning in Buildbot (appended below).
Thread 1 is running SHOW PROCESSLIST, it grabs the pointer THD::db to the current database of thread 2. Thread 2 then does THD::set_db(), freeing the old THD::db pointer and allocating a new one with the new data. Thread 1 then resumes, doing strdup() of the _old_, now invalid, THD::db pointer, which reads garbage data (or could even segfault if we get really unlucky).
This seems like a genuine bug. I see absolutely no locking protecting against this race :-(
Any suggestions for how to deal with this?
I'd say, let's use a fixed buffer in 5.1-5.5. In 10.0 a better way would be to use Sergey Petrunia's APC subsystem for that. Regards, Sergei
The new client connector library for C crashes a multi-threaded Windows application. The crash occurs in libmysql.c which uses my_gethostbyname_r, which on Windows, resolves to gethostbyname, which is not thread safe. gethostbyname was deprecated ten years ago in favor of getaddrinfo which is thread safe. The following code fixes the problem. This fix allows you to remove the file my_gethostbyname.c which was just a cross platform hack that is no longer needed. The following code, which shows both old and new code, is not tested under Linux. Cheers, and all the best with Maria, Rob #ifdef USE_OLD_GETHOSTBYNAME int tmp_errno; struct hostent tmp_hostent,*hp; char buff2[GETHOSTBYNAME_BUFF_SIZE]; hp = my_gethostbyname_r(host,&tmp_hostent,buff2,sizeof(buff2), &tmp_errno); if (!hp) { my_set_error(mysql, CR_UNKNOWN_HOST, SQLSTATE_UNKNOWN, ER(CR_UNKNOWN_HOST), host, tmp_errno); my_gethostbyname_r_free(); goto error; } memcpy(&sock_addr.sin_addr,hp->h_addr, (size_t) hp->h_length); my_gethostbyname_r_free(); #else USE_NEW_GETADDRINFO char portbuf[16]; struct addrinfo hints, *resinfo; struct sockaddr_in *ipv; memset (&hints, 0, sizeof(hints)); hints.ai_family = AF_INET; // ipv4 only if (getaddrinfo (host, itoa (port, portbuf, 10), &hints, &resinfo) != 0){ goto error;} ipv = (struct sockaddr_in *)resinfo->ai_addr; memcpy (&sock_addr.sin_addr, &(ipv->sin_addr), 4); freeaddrinfo (resinfo); #endif
-----Original Message----- From: Maria-developers [mailto:maria-developers- bounces+wlad=montyprogram.com@lists.launchpad.net] On Behalf Of Rob Sent: Dienstag, 7. Mai 2013 06:13 To: maria-developers@lists.launchpad.net Subject: [Maria-developers] Client connector library crash
Hi Rob,
The new client connector library for C crashes a multi-threaded Windows application. The crash occurs in libmysql.c which uses my_gethostbyname_r, which on Windows, resolves to gethostbyname, which is not thread safe. gethostbyname was deprecated ten years ago in favor of getaddrinfo which is thread safe.
The following code fixes the problem. This fix allows you to remove the file my_gethostbyname.c which was just a cross platform hack that is no longer needed. The following code, which shows both old and new code, is not tested under Linux.
While I agree that getaddrinfo() should be used always now (alone because it is IPv6-compatible) , it does not change the fact that gethostbyname() is thread safe on Windows, and have always been. It is using thread local storage for the return parameter, not a global variable. MSDN documentation for this function http://msdn.microsoft.com/en-us/library/windows/desktop/ms738524(v=vs.85).as px states "The memory for the hostent structure returned by the gethostbyname function is allocated internally by the Winsock DLL from thread local storage. Only a single hostent structure is allocated and used, no matter how many times the gethostbyaddr or gethostbyname functions are called on the thread. The returned hostent structure must be copied to an application buffer if additional calls are to be made to the gethostbyname or gethostbyaddr functions on the same thread. Otherwise, the return value will be overwritten by subsequent gethostbyname or gethostbyaddr calls on the same thread. The internal memory allocated for the returned hostent structure is released by the Winsock DLL when the thread exits. " So there must be something that would cause the crash for you. Bug reports and patches like this I think best should be submitted to our JIRA : https://mariadb.atlassian.net/browse/CONC Thank you, Wlad
participants (4)
-
Kristian Nielsen
-
Rob
-
Sergei Golubchik
-
Vladislav Vaintroub