Re: 95f2a07d323: MDEV-33582 Add more warnings to be able to better diagnose network issues
Hi, Michael, On Mar 04, Michael Widenius wrote:
revision-id: 95f2a07d323 (mariadb-10.6.17-30-g95f2a07d323) parent(s): 33dcf8155b4 author: Michael Widenius committer: Michael Widenius timestamp: 2024-03-04 14:52:03 +0200 message:
MDEV-33582 Add more warnings to be able to better diagnose network issues
Warnings are added to net_server.cc when global_system_variables.log_warnings >= 4.
When the above condition holds then: - All communication errors from net_serv.cc is also written to the general log.
"to the error log"
- In case of a of not being able to read or write a packet, a more detailed
"of a of" ?
error is given.
Other things: - vio_close() is now first marking the socket closed before closing it. The reason for this is to ensure that the connection that gets a read error can check if the reason was that the socket was closed. - Added option to NET to avoid calling my_error() when there is no data to be read on socket. This is only used in semi sync master to speed up handling zero size packages that seams to happen from time to time.
"seems" I don't understand why it happens and whether it's normal or not, and if normal, why it's normal only for semisync and not normal otherwise.
- Add a new state to vio to be able to detect if vio is acive, shutdown or closed. - Testing of the new warnings is done in rpl_get_lock.test - Suppress some of the new warnings in mtr to allow one to run some of the tests with -mysqld=--log-warnings=4. All test in the 'rpl' suite can now be run with this option. - Ensure that global.log_warnings are restored at test end in a way that allows one to use mtr --mysqld=--log-warnings=4.
diff --git a/mysql-test/mariadb-test-run.pl b/mysql-test/mariadb-test-run.pl index 4b3d323760d..eabfe1171cc 100755 --- a/mysql-test/mariadb-test-run.pl +++ b/mysql-test/mariadb-test-run.pl @@ -4486,6 +4486,11 @@ sub extract_warning_lines ($$) { qr/Slave I\/0: Master command COM_BINLOG_DUMP failed/, qr/Error reading packet/, qr/Lost connection to MariaDB server at 'reading initial communication packet'/, + qr/Could not read packet:.* state: [2-3] /, + qr/Could not read packet:.* errno: 104 /, + qr/Could not write packet:.* errno: 32 /, + qr/Could not write packet:.* errno: 104 /, + qr/Connection was killed/,
why are these errors "normal" and can be ignored?
qr/Failed on request_dump/, qr/Slave: Can't drop database.* database doesn't exist/, qr/Slave: Operation DROP USER failed for 'create_rout_db'/, diff --git a/mysql-test/suite/rpl/include/rpl_extra_col_master.test b/mysql-test/suite/rpl/include/rpl_extra_col_master.test index 02645a577f5..512118f91cd 100644 --- a/mysql-test/suite/rpl/include/rpl_extra_col_master.test +++ b/mysql-test/suite/rpl/include/rpl_extra_col_master.test @@ -59,6 +59,10 @@ #VARCHAR(M) #
+--disable_query_log +call mtr.add_suppression("Could not read packet:.* errno: 11"); +--enable_query_log
error 11? "Resource temporarily unavailable"? What does it mean in this context?
+ --let $_saved_conn= $CURRENT_CONNECTION
let $binformat = `SHOW VARIABLES LIKE '%binlog_format%'`; diff --git a/sql/semisync_master_ack_receiver.cc b/sql/semisync_master_ack_receiver.cc index 9320c7fc3e6..1bb462ef934 100644 --- a/sql/semisync_master_ack_receiver.cc +++ b/sql/semisync_master_ack_receiver.cc @@ -256,6 +256,12 @@ void Ack_receiver::run() thd->security_ctx->skip_grants(); thd->set_command(COM_DAEMON); init_net(&net, net_buff, REPLY_MESSAGE_MAX_LENGTH); + /* + Don't generate error message in case of no data on socket. Will speed + up my_net_read() slightly in case of NET_READ_INTERRUPTED. + */ + net.no_errors_for_no_data= 1;
do you mean, it's for performance reasons?
+
Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
Hi! On Tue, Mar 5, 2024 at 1:47 AM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Michael,
<cut>
to be read on socket. This is only used in semi sync master to speed up handling zero size packages that seams to happen from time to time.
"seems"
I don't understand why it happens and whether it's normal or not, and if normal, why it's normal only for semisync and not normal otherwise.
I changed the code to assume that when recv() returns 0 and errno = 0 this is actually a communication error. This fixed the issues and I could remove the flag. <cut>
+++ b/mysql-test/mariadb-test-run.pl @@ -4486,6 +4486,11 @@ sub extract_warning_lines ($$) { qr/Slave I\/0: Master command COM_BINLOG_DUMP failed/, qr/Error reading packet/, qr/Lost connection to MariaDB server at 'reading initial communication packet'/, + qr/Could not read packet:.* state: [2-3] /, + qr/Could not read packet:.* errno: 104 /, + qr/Could not write packet:.* errno: 32 /, + qr/Could not write packet:.* errno: 104 /, + qr/Connection was killed/,
why are these errors "normal" and can be ignored?
These are only printed as part of 'extended logging' when using --log-warnings=4 and used to produce extra information (not errors) why things happens. Many of the errors are also depending on timing so it is better to suppress them. We also do not have any other option than suppress these, as they are not part of any result set. <cut>
--- a/mysql-test/suite/rpl/include/rpl_extra_col_master.test +++ b/mysql-test/suite/rpl/include/rpl_extra_col_master.test @@ -59,6 +59,10 @@ #VARCHAR(M) #
+--disable_query_log +call mtr.add_suppression("Could not read packet:.* errno: 11"); +--enable_query_log
error 11? "Resource temporarily unavailable"? What does it mean in this context?
mtr --mysqld=--log-warnings=4 rpl_extra_col_master_innodb Produces 3 lines on the master and 12 times on the slave of type: var/log/mysqld.2.err:2024-03-05 20:10:34 6 [ERROR] mariadbd: Could not read packet: fd: 68 state: 2 remain: 4 errno: 11 vio_errno: 1158 length: 0 This comes from killing connections and disconnecting slaves in the test Probably that the connection as
+ --let $_saved_conn= $CURRENT_CONNECTION
let $binformat = `SHOW VARIABLES LIKE '%binlog_format%'`; diff --git a/sql/semisync_master_ack_receiver.cc b/sql/semisync_master_ack_receiver.cc index 9320c7fc3e6..1bb462ef934 100644 --- a/sql/semisync_master_ack_receiver.cc +++ b/sql/semisync_master_ack_receiver.cc @@ -256,6 +256,12 @@ void Ack_receiver::run() thd->security_ctx->skip_grants(); thd->set_command(COM_DAEMON); init_net(&net, net_buff, REPLY_MESSAGE_MAX_LENGTH); + /* + Don't generate error message in case of no data on socket. Will speed + up my_net_read() slightly in case of NET_READ_INTERRUPTED. + */ + net.no_errors_for_no_data= 1;
do you mean, it's for performance reasons?
Yes, that was the original intent. However, I have now removed this as I was able to find and fix the source of the problem. Thanks for the review! Regards, Monty
participants (2)
-
Michael Widenius
-
Sergei Golubchik