[Maria-discuss] connection handling is buggy (HY000/2002): Resource temporarily unavailable)

newer
Re: [Maria-discuss] Pacemaker and...

Reindl Harald

16 Jul 2017 16 Jul '17

4:55 a.m.

i started to play around with "thread_handling = pool-of-threads" with 10.2.7 and removed at that time the @mysqli_real_connect() error supression of my database-layer which also has a usleep() and retry-loop in case connection failed on so completly burried the issue PHP Warning: mysqli_real_connect() [<a href='http://at.php.net/manual/de/function.mysqli-real-connect.php'>function.mysqli-real-connect.php</a>]: (HY000/2002): Resource temporarily unavailable you should not see such messages when run a "ab -c 200 -n 500000 -k http://corecms/show_content.php?sid=2" with "max_connections = 300" _____________________________________ thread_handling = one-thread-per-connection [root@srv-rhsoft:~]$ cat php_error.log | wc -l 52596 thread_handling = pool-of-threads thread_pool_idle_timeout = 900 [root@srv-rhsoft:~]$ cat php_error.log | wc -l 39282 thread_handling = pool-of-threads thread_pool_oversubscribe = 10 thread_pool_idle_timeout = 900 thread_pool_priority = high [root@srv-rhsoft:~]$ cat php_error.log | wc -l 24849 since my database-layer makes a usleep(100000) before each retry and the retry-lop still has error-supression that means the cms waits 10% of all requests at least 0.1 seconds for the mariadb server which means the 4300 Requests/Second could be much higher if every connection suceeds at the first try (at least the thread pool seems to work slightly better then without) _____________________________________ what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark) +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+ after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+ frankly i would expect at least that this numbers are going up while the load is running to at least 200 or not going up that high at all but don't refuse conncetions which is IMHO the point of a pool _____________________________________ the core-cms itself makes exactly two queries per request over 3 MyISAM tables, one cache-table with a pirmary key and the second one is a simple join on two tables with only 2 records, so not really something one should call real load select SQL_NO_CACHE * from cms1_cache where `hash`='fullmenu_1_2_0de0_0' select SQL_NO_CACHE * from `cms1_sub` join `cms1_haupt` on sid=2 and hid=shid mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM it takes some time until the errors start to appear in the log, likely after httpd (mod_prefork) had forked enough worker to introduce real concurrency to the database

Show replies by date

Reindl Harald

16 Jul 16 Jul

5:33 a.m.

simple reproducer script below, obviously it needs the '-k' (keep-alive) flag, otherwise not enough contention on the database server ab -c 200 -n 500000 -k http://corecms/connect-bench.php [root@srv-rhsoft:~]$ cat php_error.log | wc -l 312326 <?php declare(strict_types=1); require __DIR__ . '/php/serverconf.inc.php'; $conn = mysqli_init(); mysqli_options($conn, MYSQLI_OPT_INT_AND_FLOAT_NATIVE, true); if(mysqli_real_connect($conn, 'localhost', $sql_user, $sql_pwd, $sql_db, 3600, '', 0) === true) { echo 'OK'; } else { echo 'FAILED'; } ?> [harry@srv-rhsoft:~]$ php -v PHP 7.1.8-dev (cli) (built: Jul 13 2017 17:26:17) ( NTS ) [harry@srv-rhsoft:~]$ rpm -q mariadb mariadb-10.2.7-5.fc25.20170714.rh.x86_64 Am 16.07.2017 um 06:55 schrieb Reindl Harald:

...

i started to play around with "thread_handling = pool-of-threads" with 10.2.7 and removed at that time the @mysqli_real_connect() error supression of my database-layer which also has a usleep() and retry-loop in case connection failed on so completly burried the issue

PHP Warning: mysqli_real_connect() [<a href='http://at.php.net/manual/de/function.mysqli-real-connect.php'>function.mysqli-real-connect.php</a>]: (HY000/2002): Resource temporarily unavailable

you should not see such messages when run a "ab -c 200 -n 500000 -k http://corecms/show_content.php?sid=2" with "max_connections = 300" _____________________________________

thread_handling = one-thread-per-connection [root@srv-rhsoft:~]$ cat php_error.log | wc -l 52596

thread_handling = pool-of-threads thread_pool_idle_timeout = 900 [root@srv-rhsoft:~]$ cat php_error.log | wc -l 39282

thread_handling = pool-of-threads thread_pool_oversubscribe = 10 thread_pool_idle_timeout = 900 thread_pool_priority = high [root@srv-rhsoft:~]$ cat php_error.log | wc -l 24849

since my database-layer makes a usleep(100000) before each retry and the retry-lop still has error-supression that means the cms waits 10% of all requests at least 0.1 seconds for the mariadb server which means the 4300 Requests/Second could be much higher if every connection suceeds at the first try (at least the thread pool seems to work slightly better then without) _____________________________________

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+

after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

frankly i would expect at least that this numbers are going up while the load is running to at least 200 or not going up that high at all but don't refuse conncetions which is IMHO the point of a pool _____________________________________

the core-cms itself makes exactly two queries per request over 3 MyISAM tables, one cache-table with a pirmary key and the second one is a simple join on two tables with only 2 records, so not really something one should call real load

select SQL_NO_CACHE * from cms1_cache where `hash`='fullmenu_1_2_0de0_0' select SQL_NO_CACHE * from `cms1_sub` join `cms1_haupt` on sid=2 and hid=shid

mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

it takes some time until the errors start to appear in the log, likely after httpd (mod_prefork) had forked enough worker to introduce real concurrency to the database

Reindl Harald

17 Jul 17 Jul

12:19 p.m.

i can reprduce that also on the 12-core production sevrer, in that case only 250 connections are failing and also here only with Keep-Alive requests to the webserver, otherwise not enough load i guess "futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)" is the culprit but why needs that to be exposed to the client as connection error instead just let him wait and try internally again? fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 95 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(95, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) futex(0x7f65e917eae0, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7f656bf20ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f656bf20ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1 futex(0x55ce6e382888, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}]) fcntl(88, F_GETFL) = 0x2 (flags O_RDWR) fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 104 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(104, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65be075ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f65be075ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1 futex(0x55ce6e382948, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}]) fcntl(88, F_GETFL) = 0x2 (flags O_RDWR) fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 101 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(101, F_SETFD, FD_CLOEXEC) = 0 Am 16.07.2017 um 07:33 schrieb Reindl Harald:

...

simple reproducer script below, obviously it needs the '-k' (keep-alive) flag, otherwise not enough contention on the database server

ab -c 200 -n 500000 -k http://corecms/connect-bench.php

[root@srv-rhsoft:~]$ cat php_error.log | wc -l 312326

<?php declare(strict_types=1); require __DIR__ . '/php/serverconf.inc.php'; $conn = mysqli_init(); mysqli_options($conn, MYSQLI_OPT_INT_AND_FLOAT_NATIVE, true); if(mysqli_real_connect($conn, 'localhost', $sql_user, $sql_pwd, $sql_db, 3600, '', 0) === true) { echo 'OK'; } else { echo 'FAILED'; } ?>

[harry@srv-rhsoft:~]$ php -v PHP 7.1.8-dev (cli) (built: Jul 13 2017 17:26:17) ( NTS )

[harry@srv-rhsoft:~]$ rpm -q mariadb mariadb-10.2.7-5.fc25.20170714.rh.x86_64

Am 16.07.2017 um 06:55 schrieb Reindl Harald:

...
i started to play around with "thread_handling = pool-of-threads" with 10.2.7 and removed at that time the @mysqli_real_connect() error supression of my database-layer which also has a usleep() and retry-loop in case connection failed on so completly burried the issue

PHP Warning: mysqli_real_connect() [<a href='http://at.php.net/manual/de/function.mysqli-real-connect.php'>function.mysqli-real-connect.php</a>]: (HY000/2002): Resource temporarily unavailable

you should not see such messages when run a "ab -c 200 -n 500000 -k http://corecms/show_content.php?sid=2" with "max_connections = 300" _____________________________________

thread_handling = one-thread-per-connection [root@srv-rhsoft:~]$ cat php_error.log | wc -l 52596

thread_handling = pool-of-threads thread_pool_idle_timeout = 900 [root@srv-rhsoft:~]$ cat php_error.log | wc -l 39282

thread_handling = pool-of-threads thread_pool_oversubscribe = 10 thread_pool_idle_timeout = 900 thread_pool_priority = high [root@srv-rhsoft:~]$ cat php_error.log | wc -l 24849

since my database-layer makes a usleep(100000) before each retry and the retry-lop still has error-supression that means the cms waits 10% of all requests at least 0.1 seconds for the mariadb server which means the 4300 Requests/Second could be much higher if every connection suceeds at the first try (at least the thread pool seems to work slightly better then without) _____________________________________

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+

after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

frankly i would expect at least that this numbers are going up while the load is running to at least 200 or not going up that high at all but don't refuse conncetions which is IMHO the point of a pool _____________________________________

the core-cms itself makes exactly two queries per request over 3 MyISAM tables, one cache-table with a pirmary key and the second one is a simple join on two tables with only 2 records, so not really something one should call real load

select SQL_NO_CACHE * from cms1_cache where `hash`='fullmenu_1_2_0de0_0' select SQL_NO_CACHE * from `cms1_sub` join `cms1_haupt` on sid=2 and hid=shid

mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

it takes some time until the errors start to appear in the log, likely after httpd (mod_prefork) had forked enough worker to introduce real concurrency to the database

Reinis Rozitis

12:39 p.m.

New subject: [Maria-discuss] connection handling is buggy (HY000/2002): Resource temporarily unavailable)

...

...
...
mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

Could it be you are hitting this https://github.com/systemd/systemd/issues/3211 (also https://news.ycombinator.com/item?id=11675129 ) - besides the open files limit some systemd versions (before it was bumped to a relative value) are screwed with the default 512 tasks limit. At least I got this after some update on OpenSuse Leap 42.2 and since the systemd service file for Mariadb doesn't specify Tasks everything went south immediately. So from now on I'm setting DefaultTasksMax=infinity in the global systemd config. tt

Reindl Harald

12:48 p.m.

Am 17.07.2017 um 14:39 schrieb Reinis Rozitis:

...

...
...
...
mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

Could it be you are hitting this https://github.com/systemd/systemd/issues/3211 (also https://news.ycombinator.com/item?id=11675129 ) - besides the open files limit some systemd versions (before it was bumped to a relative value) are screwed with the default 512 tasks limit.

for sure not, i read systemd news and adjust my services LimitNOFILE=infinity LimitMEMLOCK=infinity OOMScoreAdjust=-1000 TasksMax=2048 [root@srv-rhsoft:~]$ systemctl status mysqld ? mysqld.service - MariaDB Database Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled) Active: active (running) since Mo 2017-07-17 14:04:39 CEST; 43min ago Main PID: 24897 (mysqld) Status: "Taking your SQL requests now..." Tasks: 238 (limit: 2048)

Reindl Harald

1:03 p.m.

looks more like similar to https://www.percona.com/blog/2009/11/20/rare-evil-mysql-bug/ which is really old but who knows Am 17.07.2017 um 14:48 schrieb Reindl Harald:

...

Am 17.07.2017 um 14:39 schrieb Reinis Rozitis:

...
...
...
...
mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

Could it be you are hitting this https://github.com/systemd/systemd/issues/3211 (also https://news.ycombinator.com/item?id=11675129 ) - besides the open files limit some systemd versions (before it was bumped to a relative value) are screwed with the default 512 tasks limit.

for sure not, i read systemd news and adjust my services

LimitNOFILE=infinity LimitMEMLOCK=infinity OOMScoreAdjust=-1000 TasksMax=2048

[root@srv-rhsoft:~]$ systemctl status mysqld ? mysqld.service - MariaDB Database Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled) Active: active (running) since Mo 2017-07-17 14:04:39 CEST; 43min ago Main PID: 24897 (mysqld) Status: "Taking your SQL requests now..." Tasks: 238 (limit: 2048)

Vladislav Vaintroub

1:02 p.m.

...

i can reprduce that also on the 12-core production sevrer, in that case only 250 connections are failing and also here only with Keep-Alive requests to the webserver, otherwise not enough load

i guess "futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)" is the culprit but why needs that to be exposed to the client as connection error instead just let him wait and try internally again? You are getting HY000/2002, the range starting with 2000 (ending iirc

...

fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 95 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(95, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) futex(0x7f65e917eae0, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7f656bf20ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f656bf20ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1 futex(0x55ce6e382888, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}]) fcntl(88, F_GETFL) = 0x2 (flags O_RDWR) fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 104 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(104, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65be075ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f65be075ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1 futex(0x55ce6e382948, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}]) fcntl(88, F_GETFL) = 0x2 (flags O_RDWR) fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 101 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(101, F_SETFD, FD_CLOEXEC) = 0

Am 16.07.2017 um 07:33 schrieb Reindl Harald:

...
simple reproducer script below, obviously it needs the '-k' (keep-alive) flag, otherwise not enough contention on the database server

ab -c 200 -n 500000 -k http://corecms/connect-bench.php

[root@srv-rhsoft:~]$ cat php_error.log | wc -l 312326

<?php declare(strict_types=1); require __DIR__ . '/php/serverconf.inc.php'; $conn = mysqli_init(); mysqli_options($conn, MYSQLI_OPT_INT_AND_FLOAT_NATIVE, true); if(mysqli_real_connect($conn, 'localhost', $sql_user, $sql_pwd, $sql_db, 3600, '', 0) === true) { echo 'OK'; } else { echo 'FAILED'; } ?>

[harry@srv-rhsoft:~]$ php -v PHP 7.1.8-dev (cli) (built: Jul 13 2017 17:26:17) ( NTS )

[harry@srv-rhsoft:~]$ rpm -q mariadb mariadb-10.2.7-5.fc25.20170714.rh.x86_64

Am 16.07.2017 um 06:55 schrieb Reindl Harald:

...
i started to play around with "thread_handling = pool-of-threads" with 10.2.7 and removed at that time the @mysqli_real_connect() error supression of my database-layer which also has a usleep() and retry-loop in case connection failed on so completly burried the issue

PHP Warning: mysqli_real_connect() [<a href='http://at.php.net/manual/de/function.mysqli-real-connect.php'>function.mysqli-real-connect.php</a>]: (HY000/2002): Resource temporarily unavailable

you should not see such messages when run a "ab -c 200 -n 500000 -k http://corecms/show_content.php?sid=2" with "max_connections = 300" _____________________________________

thread_handling = one-thread-per-connection [root@srv-rhsoft:~]$ cat php_error.log | wc -l 52596

thread_handling = pool-of-threads thread_pool_idle_timeout = 900 [root@srv-rhsoft:~]$ cat php_error.log | wc -l 39282

thread_handling = pool-of-threads thread_pool_oversubscribe = 10 thread_pool_idle_timeout = 900 thread_pool_priority = high [root@srv-rhsoft:~]$ cat php_error.log | wc -l 24849

since my database-layer makes a usleep(100000) before each retry and the retry-lop still has error-supression that means the cms waits 10% of all requests at least 0.1 seconds for the mariadb server which means the 4300 Requests/Second could be much higher if every connection suceeds at the first try (at least the thread pool seems to work slightly better then without) _____________________________________

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+

after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

frankly i would expect at least that this numbers are going up while the load is running to at least 200 or not going up that high at all but don't refuse conncetions which is IMHO the point of a pool The thread count increases in presence of contention. The benchmarks are usually written in such a way that all threads shutdown at the same time at the end. This runs into internal locks (LOCK_thread_count, if I remember correctly). What happens in case of the serious contention is that threadpool notices contention, starts getting worried about all threads getting blocked and possibly deadlocked, and not having enough spare threads to handle new requests( should they come). Thus threadpool will increase

On 17.07.2017 14:19, Reindl Harald wrote: 3000) is the error originating on client. If I was to guess it is a client-side connection timeout. Threadpool throttles thread creation, which means following : if all threads are busy, threadpool won't create a thread to handle connection request immediately. The queuing time can be long enough to exceed client or server side connection timeout. One way to handle that is to increase connection timeout, hopefully PHP allows that. Another through : with your setting "thread_pool_priority = high", all requests have the same priority, either connecting or handling the queries. If you used default setting, connection request would be higher priority, which means generally it is handled faster compared to normal requests. the number if threads. If you wait long enough after your benchmark (long enough is longer that thread_pool_idle_timeout seconds), the excess threads will disappear.

...

...
...
_____________________________________

the core-cms itself makes exactly two queries per request over 3 MyISAM tables, one cache-table with a pirmary key and the second one is a simple join on two tables with only 2 records, so not really something one should call real load

select SQL_NO_CACHE * from cms1_cache where `hash`='fullmenu_1_2_0de0_0' select SQL_NO_CACHE * from `cms1_sub` join `cms1_haupt` on sid=2 and hid=shid

mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

it takes some time until the errors start to appear in the log, likely after httpd (mod_prefork) had forked enough worker to introduce real concurrency to the database

_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp

Reindl Harald

1:07 p.m.

Am 17.07.2017 um 15:02 schrieb Vladislav Vaintroub:

...

...
i can reprduce that also on the 12-core production sevrer, in that case only 250 connections are failing and also here only with Keep-Alive requests to the webserver, otherwise not enough load

i guess "futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)" is the culprit but why needs that to be exposed to the client as connection error instead just let him wait and try internally again? You are getting HY000/2002, the range starting with 2000 (ending iirc

On 17.07.2017 14:19, Reindl Harald wrote: 3000) is the error originating on client. If I was to guess it is a client-side connection timeout. Threadpool throttles thread creation, which means following : if all threads are busy, threadpool won't create a thread to handle connection request immediately. The queuing time can be long enough to exceed client or server side connection timeout. One way to handle that is to increase connection timeout, hopefully PHP allows that

please look at the thread start i can reproduce this with pool / thread-per-connection and "thread_pool_priority" set as well as keep it to defaults

...

Another through : with your setting "thread_pool_priority = high", all requests have the same priority, either connecting or handling the queries. If you used default setting, connection request would be higher priority, which means generally it is handled faster compared to normal requests.

see above, also another piece of my intial posting - why in the world does "Threadpool_threads" not reach the highest value under load but after the benchmark is finished what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark) +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+ after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

...

...
fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 95 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(95, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) futex(0x7f65e917eae0, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7f656bf20ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f656bf20ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1 futex(0x55ce6e382888, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}]) fcntl(88, F_GETFL) = 0x2 (flags O_RDWR) fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 104 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(104, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65be075ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f65be075ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1 futex(0x55ce6e382948, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}]) fcntl(88, F_GETFL) = 0x2 (flags O_RDWR) fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 101 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(101, F_SETFD, FD_CLOEXEC) = 0

...
simple reproducer script below, obviously it needs the '-k' (keep-alive) flag, otherwise not enough contention on the database server

ab -c 200 -n 500000 -k http://corecms/connect-bench.php

[root@srv-rhsoft:~]$ cat php_error.log | wc -l 312326

<?php declare(strict_types=1); require __DIR__ . '/php/serverconf.inc.php'; $conn = mysqli_init(); mysqli_options($conn, MYSQLI_OPT_INT_AND_FLOAT_NATIVE, true); if(mysqli_real_connect($conn, 'localhost', $sql_user, $sql_pwd, $sql_db, 3600, '', 0) === true) { echo 'OK'; } else { echo 'FAILED'; } ?>

[harry@srv-rhsoft:~]$ php -v PHP 7.1.8-dev (cli) (built: Jul 13 2017 17:26:17) ( NTS )

[harry@srv-rhsoft:~]$ rpm -q mariadb mariadb-10.2.7-5.fc25.20170714.rh.x86_64

Am 16.07.2017 um 06:55 schrieb Reindl Harald:

...
i started to play around with "thread_handling = pool-of-threads" with 10.2.7 and removed at that time the @mysqli_real_connect() error supression of my database-layer which also has a usleep() and retry-loop in case connection failed on so completly burried the issue

PHP Warning: mysqli_real_connect() [<a href='http://at.php.net/manual/de/function.mysqli-real-connect.php'>function.mysqli-real-connect.php</a>]: (HY000/2002): Resource temporarily unavailable

you should not see such messages when run a "ab -c 200 -n 500000 -k http://corecms/show_content.php?sid=2" with "max_connections = 300" _____________________________________

thread_handling = one-thread-per-connection [root@srv-rhsoft:~]$ cat php_error.log | wc -l 52596

thread_handling = pool-of-threads thread_pool_idle_timeout = 900 [root@srv-rhsoft:~]$ cat php_error.log | wc -l 39282

thread_handling = pool-of-threads thread_pool_oversubscribe = 10 thread_pool_idle_timeout = 900 thread_pool_priority = high [root@srv-rhsoft:~]$ cat php_error.log | wc -l 24849

since my database-layer makes a usleep(100000) before each retry and the retry-lop still has error-supression that means the cms waits 10% of all requests at least 0.1 seconds for the mariadb server which means the 4300 Requests/Second could be much higher if every connection suceeds at the first try (at least the thread pool seems to work slightly better then without) _____________________________________

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+

after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

frankly i would expect at least that this numbers are going up while the load is running to at least 200 or not going up that high at all but don't refuse conncetions which is IMHO the point of a pool The thread count increases in presence of contention. The benchmarks are usually written in such a way that all threads shutdown at the same time at the end. This runs into internal locks (LOCK_thread_count, if I remember correctly). What happens in case of the serious contention is that threadpool notices contention, starts getting worried about all threads getting blocked and possibly deadlocked, and not having enough spare threads to handle new requests( should they come). Thus threadpool will increase

Am 16.07.2017 um 07:33 schrieb Reindl Harald: the number if threads.

If you wait long enough after your benchmark (long enough is longer that thread_pool_idle_timeout seconds), the excess threads will disappear.

...
...
...
_____________________________________

the core-cms itself makes exactly two queries per request over 3 MyISAM tables, one cache-table with a pirmary key and the second one is a simple join on two tables with only 2 records, so not really something one should call real load

select SQL_NO_CACHE * from cms1_cache where `hash`='fullmenu_1_2_0de0_0' select SQL_NO_CACHE * from `cms1_sub` join `cms1_haupt` on sid=2 and hid=shid

mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

it takes some time until the errors start to appear in the log, likely after httpd (mod_prefork) had forked enough worker to introduce real concurrency to the database

Vladislav Vaintroub

1:37 p.m.

On 17.07.2017 15:07, Reindl Harald wrote:

...

Am 17.07.2017 um 15:02 schrieb Vladislav Vaintroub:

<skip>

...

see above, also another piece of my intial posting - why in the world does "Threadpool_threads" not reach the highest value under load but after the benchmark is finished

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+

after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

The benchmarks are usually written in such a way, that all connections will disconnect *at the same time* at the end of benchmark. The disconnect code runs through couple of into global locks (LOCK_thread_count, if I remember correctly, maybe some others), i.e if several connections disconnect at the same time, there will be contention. What happens in case of the contention is that threadpool notices it , starts getting worried about all threads getting blocked and possibly deadlocked, and about not having enough spare threads to handle new requests. Thus threadpool will *increase* the number of threads in presence of contention. If you wait long enough after the benchmark (long enough is longer that thread_pool_idle_timeout seconds), the excess threads will disappear.

Reindl Harald

1:47 p.m.

Am 17.07.2017 um 15:37 schrieb Vladislav Vaintroub:

...

On 17.07.2017 15:07, Reindl Harald wrote:

...
see above, also another piece of my intial posting - why in the world does "Threadpool_threads" not reach the highest value under load but after the benchmark is finished

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+

after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

The benchmarks are usually written in such a way, that all connections will disconnect *at the same time* at the end of benchmark.

The disconnect code runs through couple of into global locks (LOCK_thread_count, if I remember correctly, maybe some others), i.e if several connections disconnect at the same time, there will be contention.

What happens in case of the contention is that threadpool notices it , starts getting worried about all threads getting blocked and possibly deadlocked, and about not having enough spare threads to handle new requests. Thus threadpool will *increase* the number of threads in presence of contention.

If you wait long enough after the benchmark (long enough is longer that thread_pool_idle_timeout seconds), the excess threads will disappear

it's clear that they disappear after thread_pool_idle_timeout but on a benchmarks running for 120 seconds with 250 connections i expect that thread number also reache while the benchmark is actevily running

Vladislav Vaintroub

3:11 p.m.

On 17.07.2017 15:47, Reindl Harald wrote:

...

Am 17.07.2017 um 15:37 schrieb Vladislav Vaintroub:

...
On 17.07.2017 15:07, Reindl Harald wrote:

...
see above, also another piece of my intial posting - why in the world does "Threadpool_threads" not reach the highest value under load but after the benchmark is finished

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 181 | | Threadpool_threads | 189 | +-------------------------+-------+

after the benchmark has finished and the machine is idle: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Threadpool_idle_threads | 207 | | Threadpool_threads | 208 | +-------------------------+-------+

The benchmarks are usually written in such a way, that all connections will disconnect *at the same time* at the end of benchmark.

The disconnect code runs through couple of into global locks (LOCK_thread_count, if I remember correctly, maybe some others), i.e if several connections disconnect at the same time, there will be contention.

What happens in case of the contention is that threadpool notices it , starts getting worried about all threads getting blocked and possibly deadlocked, and about not having enough spare threads to handle new requests. Thus threadpool will *increase* the number of threads in presence of contention.

If you wait long enough after the benchmark (long enough is longer that thread_pool_idle_timeout seconds), the excess threads will disappear

it's clear that they disappear after thread_pool_idle_timeout but on a benchmarks running for 120 seconds with 250 connections i expect that thread number also reache while the benchmark is actevily running

Our thread pool tries to keep the number of active (non-waiting) threads the same as thread_pool_size. You can calculate "active count" as (Threadpool_threads - Threadpool_idle_threads) in the "show status" output. If all queries run lightning fast / never block, active count would be thread_pool_size, and Threadpool_idle_threads will be 0. Yet, there are different situations where overall number of threads grows, due to contention and/or long running queries. Often, also active count grows somewhat, but we have a mechanism to handle this "oversubscription", so it does not run out of control. It is well possible that bumping active threads will increase the throughput and decrease the query time in your benchmark. If you like to experiment, the easiest way to do it is to increase thread_pool_size. You may, dependent on the load, also get more active threads by decreasing thread_pool_stall_limit and/or increasing thread_pool_oversubscribe, but I would not recommend this, as it is not really reliably way to achieve more active threads. As explained before, the situation with "benchmark end", where a lot of threads are created, is probably due to lock contention in concurrent disconnects.

...

_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp

Reindl Harald

3:36 p.m.

Am 17.07.2017 um 17:11 schrieb Vladislav Vaintroub:

...

Our thread pool tries to keep the number of active (non-waiting) threads the same as thread_pool_size. You can calculate "active count" as (Threadpool_threads - Threadpool_idle_threads) in the "show status" output.

If all queries run lightning fast / never block, active count would be thread_pool_size, and Threadpool_idle_threads will be 0.

Yet, there are different situations where overall number of threads grows, due to contention and/or long running queries. Often, also active count grows somewhat, but we have a mechanism to handle this "oversubscription", so it does not run out of control.

It is well possible that bumping active threads will increase the throughput and decrease the query time in your benchmark. If you like to experiment, the easiest way to do it is to increase thread_pool_size. You may, dependent on the load, also get more active threads by decreasing thread_pool_stall_limit and/or increasing thread_pool_oversubscribe, but I would not recommend this, as it is not really reliably way to achieve more active threads.

"thread_pool_oversubscribe" did not help to get the "Resource temporarily unavailable" solved

...

As explained before, the situation with "benchmark end", where a lot of threads are created, is probably due to lock contention in concurrent disconnects

since only the connections between "ab" and httpd with mod_prefork are keep-alive you have the same amount of disconnects through the whole benchmark otherwise with persistent connections which have their own drawback and are not a option the connection handling on the mariadb side won#t be a problem at all

Reindl Harald

23 Jul 23 Jul

4:08 p.m.

since i am still searching for how to get the "connection handling is buggy (HY000/2002): Resource temporarily unavailable)" contention on the unix socket solved i found the "back_log" param recently https://jira.mariadb.org/browse/MDEV-13375 * it's completly ignored * it's on 110 and not at "max_connections" as the docs state DUNNO if that could improve the situation but it really makes me sad to see the "Resource temporarily unavailable" errors while a concurrency of 250 is really a joke on a 3.50 Ghz 4-core with 32 GB RAM when that also happens with just connect and do nothing else i also opened https://bugs.php.net/bug.php?id=74971 recently because that really limits scalability while the application still responds within 0.003 seconds to additional requests in the browser which also makes clear that timeouts should not happen at all Am 17.07.2017 um 17:36 schrieb Reindl Harald:

...

Am 17.07.2017 um 17:11 schrieb Vladislav Vaintroub:

...
Our thread pool tries to keep the number of active (non-waiting) threads the same as thread_pool_size. You can calculate "active count" as (Threadpool_threads - Threadpool_idle_threads) in the "show status" output.

If all queries run lightning fast / never block, active count would be thread_pool_size, and Threadpool_idle_threads will be 0.

Yet, there are different situations where overall number of threads grows, due to contention and/or long running queries. Often, also active count grows somewhat, but we have a mechanism to handle this "oversubscription", so it does not run out of control.

It is well possible that bumping active threads will increase the throughput and decrease the query time in your benchmark. If you like to experiment, the easiest way to do it is to increase thread_pool_size. You may, dependent on the load, also get more active threads by decreasing thread_pool_stall_limit and/or increasing thread_pool_oversubscribe, but I would not recommend this, as it is not really reliably way to achieve more active threads.

"thread_pool_oversubscribe" did not help to get the "Resource temporarily unavailable" solved

...
As explained before, the situation with "benchmark end", where a lot of threads are created, is probably due to lock contention in concurrent disconnects

since only the connections between "ab" and httpd with mod_prefork are keep-alive you have the same amount of disconnects through the whole benchmark otherwise with persistent connections which have their own drawback and are not a option the connection handling on the mariadb side won#t be a problem at all

Reindl Harald

17 Jul 17 Jul

1:14 p.m.

Am 17.07.2017 um 15:02 schrieb Vladislav Vaintroub:

...

You are getting HY000/2002, the range starting with 2000 (ending iirc 3000) is the error originating on client. If I was to guess it is a client-side connection timeout

are you aware that the strace output below with "EAGAIN (Resource temporarily unavailable)" is from running strace on the myslqd PID? on a machine which spits out 4000 cms pages per second while responding to additional clients via browser within 0.005 seconds there souldb e not any timeout at all under a 250 concurrency

...

...
fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 95 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(95, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)

Vladislav Vaintroub

1:22 p.m.

On 17.07.2017 15:14, Reindl Harald wrote:

...

Am 17.07.2017 um 15:02 schrieb Vladislav Vaintroub:

...
You are getting HY000/2002, the range starting with 2000 (ending iirc 3000) is the error originating on client. If I was to guess it is a client-side connection timeout

are you aware that the strace output below with "EAGAIN (Resource temporarily unavailable)" is from running strace on the myslqd PID?

And what does it prove? With server doing non-blocking socket reads, EAGAIN is expected , and handled by switching to poll(). Since long time (if memory serves me right 10.0/MySQL 5.6), this is how net_read_timeout/net_write_timeout are handled on Unixes,on the server side.

...

on a machine which spits out 4000 cms pages per second while responding to additional clients via browser within 0.005 seconds there souldb e not any timeout at all under a 250 concurrency

...
...
fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 95 fcntl(88, F_SETFL, O_RDWR) = 0 fcntl(95, F_SETFD, FD_CLOEXEC) = 0 futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)

_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp

Reindl Harald

19 Aug 19 Aug

10:02 p.m.

Am 17.07.2017 um 15:22 schrieb Vladislav Vaintroub:

...

On 17.07.2017 15:14, Reindl Harald wrote:

...
Am 17.07.2017 um 15:02 schrieb Vladislav Vaintroub:

...
You are getting HY000/2002, the range starting with 2000 (ending iirc 3000) is the error originating on client. If I was to guess it is a client-side connection timeout

are you aware that the strace output below with "EAGAIN (Resource temporarily unavailable)" is from running strace on the myslqd PID?

And what does it prove? With server doing non-blocking socket reads, EAGAIN is expected , and handled by switching to poll(). Since long time (if memory serves me right 10.0/MySQL 5.6), this is how net_read_timeout/net_write_timeout are handled on Unixes,on the server side

after 10.2.8 fixed https://jira.mariadb.org/browse/MDEV-13375 it proves that all the talking about timeouts and issue on the client side was nonsense as i said from the very begin [harry@srv-rhsoft:~]$ ab -c 150 -n 1000000 -k http://corecms Concurrency Level: 150 Time taken for tests: 214.193 seconds Complete requests: 1000000 Failed requests: 0 Keep-Alive requests: 1000000 Total transferred: 4188197222 bytes HTML transferred: 3777430423 bytes Requests per second: 4668.69 [#/sec] (mean) Time per request: 32.129 [ms] (mean) Time per request: 0.214 [ms] (mean, across all concurrent requests) Transfer rate: 19095.09 [Kbytes/sec] received so longer any "HY000/2002): Resource temporarily unavailable"

2874

Age (days ago)

2908

Last active (days ago)

List overview

15 comments

3 participants

participants (3)

Reindl Harald
Reinis Rozitis
Vladislav Vaintroub

[Maria-discuss] connection handling is buggy (HY000/2002): Resource temporarily unavailable)

tags

participants (3)