[Maria-discuss] Performance tuning sought for MariaDB
I am running MariaDB 10.0.34 on a Slackware 14.2 system. I have a C application that interacts with MariaDB in the following way:

1. Read data from a FIFO.
2. Insert the data into a table in a MariaDB database, if absent, or modify it according to certain specific criteria otherwise.

This works as expected. The problem that I have is that data are being written to the FIFO at a fast rate. In order to be able to keep up, at any given time my application reads the data available at the FIFO, and spawns a thread to process the chunk of data just read. It is in this thread that all the database interaction takes place. In order to deal with this, I have the following entries in my /etc/my.cnf file:

    # this is read by the standalone daemon and embedded servers
    [server]

    # this is only for the mysqld standalone daemon
    [mysqld]
    # thread_handling=pool-of-threads
    # log=/var/log/mysqld.log

    # this is only for embedded server
    [embedded]

    # This group is only read by MariaDB servers, not by MySQL.
    # If you use the same .cnf file for MySQL and MariaDB,
    # you can put MariaDB-only options here
    [mariadb]
    # log=/var/log/mysqld.log
    general_log_file = /var/lib/mysql/mysql.log
    # general_log = 1
    # transaction-isolation = READ-COMMITTED
    # key_buffer = 1280M            # 128MB for every 1GB of RAM
    # sort_buffer_size = 1M         # 1MB for every 1GB of RAM
    # read_buffer_size = 1M         # 1MB for every 1GB of RAM
    # read_rnd_buffer_size = 1M     # 1MB for every 1GB of RAM
    # thread_concurrency = 24       # Based on the number of CPUs, so make it CPU*2
    # thread-handling=pool-of-threads
    # innodb_flush_log_at_trx_commit != 1
    # open_files_limit = 50000
    thread-handling=pool-of-threads
    max_connections = 1000
    table_open_cache = 800
    query_cache_type = 0
    innodb_buffer_pool_size = 512M
    innodb_buffer_pool_instances = 10
    innodb_adaptive_hash_index_partitions = 20
    innodb_lock_wait_timeout = 5000

With this, my application can keep up with the FIFO writer, but - depending on the circumstances - my database can't.
As I am writing this, there are over 1300 threads connected to my database; any command that I issue at the mysql CLI takes over one minute to return. I am keeping track of how long each thread takes to complete, and that is of the order of hundreds of seconds - sometimes thousands. Each thread is itself simple, in that it just issues a couple of simple MariaDB commands. Currently my table consists of 1.6 million entries, and growing - on this basis, I expect that things will only get worse. Each entry, however, will never require more than a couple of hundred bytes of storage. The operations that can be undertaken on entries are insertion, deletion and modification, the latter being straightforward - e.g. incrementing a counter or replacing a short string.

My system has 24 GB of RAM and 12 cores. Occasionally all the cores are fully busy with MariaDB activity, but most of the time barely one or two are.

I am a newbie when it comes to interacting with MariaDB - please, bear with me. I know I must use a single database and a single table. I also know - because of the nature of the data that are being written to the FIFO - that the probability of two different threads operating on the same entry in the table at the same time is negligible - i.e. for all practical purposes, that will not happen.

What I need is advice on how to configure my instance of MariaDB to perform optimally in the scenario above. In particular, I would like it to make better use of all the cores available - in essence, to parallelize the database operations as much as possible. Feedback from the experts will be much appreciated.
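The insert-if-absent-else-modify step described above can be expressed as a single atomic upsert statement, which avoids a separate check plus write per entry. A minimal sketch of the pattern, using Python's stdlib sqlite3 purely as a stand-in for the C client (the MariaDB form would be INSERT ... ON DUPLICATE KEY UPDATE; the table and column names here are made up for illustration):

```python
import sqlite3

# In-memory stand-in for the MariaDB table; schema guessed from the description.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (s TEXT PRIMARY KEY, t TEXT, a TEXT, d INTEGER)")

def upsert(s, t, a):
    # Insert the entry if absent, otherwise modify it (here: increment a counter).
    # MariaDB equivalent: INSERT ... ON DUPLICATE KEY UPDATE d = d + 1
    conn.execute(
        "INSERT INTO entries (s, t, a, d) VALUES (?, ?, ?, 1) "
        "ON CONFLICT(s) DO UPDATE SET d = d + 1",
        (s, t, a),
    )

upsert("some-key", "t0", "a0")
upsert("some-key", "t0", "a0")  # second call takes the update path
print(conn.execute("SELECT d FROM entries WHERE s = 'some-key'").fetchone())  # → (2,)
```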
Hello,

> thread-handling=pool-of-threads
> max_connections = 1000
> table_open_cache = 800
> query_cache_type = 0
> innodb_buffer_pool_size = 512M
> innodb_buffer_pool_instances = 10
> innodb_adaptive_hash_index_partitions = 20
> innodb_lock_wait_timeout = 5000
No need to use buffer pool instances with only 512M of buffer. You said you have 24GB of RAM - why not increase the buffer size? How big is your table on disk right now? If you want the best performance it must be held entirely in the buffer.
> With this, my application can keep up with the FIFO writer, but - depending on the circumstances - my database can't. As I am writing this, there are over 1300 threads connected to my database; any command that I issue at the mysql CLI takes over one minute to return. I am keeping track of how long each thread takes to complete, and that is of the order of hundreds of seconds - sometimes thousands. Each thread is itself simple, in that it just issues a couple of simple MariaDB commands.
> Currently my table consists of 1.6 million entries, and growing - on this basis, I expect that things will only get worse. Each entry, however, will never require more than a couple of hundred bytes of storage. The operations that can be undertaken on entries are insertion, deletion and modification, the latter being straightforward - e.g. incrementing a counter or replacing a short string.
You are not considering many factors:

a) Limit the number of concurrent threads - with 1300 threads you are creating concurrency races and locking issues. Try limiting your threads to a small multiple of the number of cores.

b) You're writing to a disk system. The number of CPU cores won't matter if you saturate the disk. You say nothing about the disk - whether it's an SSD, HDD, etc. Note that HDDs are severely handicapped when it comes to concurrent I/O operations.

c) Given the factor above, you could maybe try relaxing commit-to-disk if integrity of the data is not of utmost importance, for example by adding "innodb_flush_log_at_trx_commit = 2" to your config.

> My system has 24 GB of RAM and 12 cores. Occasionally all the cores are fully busy with MariaDB activity, but most of the time barely one or two are.

d) CPU in a DB system will mostly be used for queries (index scans) and some for reindexing, but you're only doing writes, so unless you have dozens of indexes it won't be your bottleneck.

e) Table design - is your table design efficient? How many rows per second are you writing? Maybe you could also benefit from hash partitioning or clever data types.
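Point a) can be sketched as a fixed pool of workers draining a bounded queue, instead of spawning one thread per chunk read from the FIFO. Everything below is illustrative (the `chunk.upper()` call stands in for the real per-chunk MariaDB work); the bounded queue also gives the FIFO reader natural backpressure:

```python
import queue
import threading

NUM_WORKERS = 12                   # a small multiple of the core count, per point a)
tasks = queue.Queue(maxsize=1000)  # bounded: the producer blocks instead of piling up work
results = []
results_lock = threading.Lock()

def worker():
    while True:
        chunk = tasks.get()
        if chunk is None:          # sentinel: no more work for this worker
            return
        processed = chunk.upper()  # stand-in for the per-chunk database interaction
        with results_lock:
            results.append(processed)

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for chunk in ["alpha", "beta", "gamma"]:  # stand-in for reads from the FIFO
    tasks.put(chunk)
for _ in workers:
    tasks.put(None)                # one shutdown sentinel per worker
for w in workers:
    w.join()

print(sorted(results))             # → ['ALPHA', 'BETA', 'GAMMA']
```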
> I am a newbie when it comes to interacting with MariaDB - please, bear with me. I know I must use a single database and a single table. I also know - because of the nature of the data that are being written to the FIFO - that the probability for two different threads to be operating on the same entry in the table at the same time is negligible - i.e. for all practical purposes, that will not happen. What I need is advice on how to configure my instance of MariaDB to perform optimally in the scenario above. In particular, I would like for it to make better use of all the cores available - in essence, to parallelize the database operations as much as possible.
f) Congratulations, you have managed to contradict yourself in two sentences. If your workload isn't parallel by design, you will not make use of the available cores. Also, your workload sounds I/O-bound to me - there's a strong chance the disk is the bottleneck.

g) "I know I must use a single database and a single table." How do you know this if you are a newbie? No offense meant, but most of the time a monolithic design is not the best way to leverage performance.

-GL
Thanks for your feedback. Please see my comments interspersed below.

On Mon, Oct 7, 2019 at 10:38 AM Guillaume Lefranc <guillaume@adishatz.net> wrote:
> Hello,
>
>> thread-handling=pool-of-threads
>> max_connections = 1000
>> table_open_cache = 800
>> query_cache_type = 0
>> innodb_buffer_pool_size = 512M
>> innodb_buffer_pool_instances = 10
>> innodb_adaptive_hash_index_partitions = 20
>> innodb_lock_wait_timeout = 5000
>
> No need to use buffer pool instances with only 512M of buffer. You said you have 24GB of RAM - why not increase the buffer size? How big is your table on disk right now? If you want the best performance it must be held in the buffer.
It does not seem to be that big - about 101 MB. I used the following command:

    SELECT table_name AS `Table`,
           round(((data_length + index_length) / 1024 / 1024), 2) `Size (MB)`
    FROM information_schema.TABLES
    WHERE table_schema = "<my-db-name>";

I had to look it up.
>> With this, my application can keep up with the FIFO writer, but - depending on the circumstances - my database can't. As I am writing this, there are over 1300 threads connected to my database; any command that I issue at the mysql CLI takes over one minute to return. I am keeping track of how long each thread takes to complete, and that is of the order of hundreds of seconds - sometimes thousands. Each thread is itself simple, in that it just issues a couple of simple MariaDB commands. Currently my table consists of 1.6 million entries, and growing - on this basis, I expect that things will only get worse. Each entry, however, will never require more than a couple of hundred bytes of storage. The operations that can be undertaken on entries are insertion, deletion and modification, the latter being straightforward - e.g. incrementing a counter or replacing a short string.
> You are not considering many factors:
>
> a) Limit the number of concurrent threads - with 1300 threads you are creating concurrency races and locking issues. Try limiting your threads to a small multiple of the number of cores.
I will. Notice, though, that most of the time most of the cores are idle anyway.
> b) You're writing to a disk system. The number of CPU cores won't matter if you saturate the disk. You say nothing about the disk - whether it's an SSD, HDD, etc. Note that HDDs are severely handicapped when it comes to concurrent I/O operations.
It is an HDD. I am sure it will be a factor in making things slower.
> c) Given the factor above, you could maybe try relaxing commit-to-disk if integrity of the data is not of utmost importance, for example by adding "innodb_flush_log_at_trx_commit = 2" to your config.
Thanks - I'll try that.
>> My system has 24 GB of RAM and 12 cores. Occasionally all the cores are fully busy with MariaDB activity, but most of the time barely one or two are.
> d) CPU in a DB system will mostly be used for queries (index scans) and some for reindexing, but you're only doing writes, so unless you have dozens of indexes it won't be your bottleneck.
>
> e) Table design - is your table design efficient? How many rows per second are you writing? Maybe you could also benefit from hash partitioning or clever data types.
>> I am a newbie when it comes to interacting with MariaDB - please, bear with me. I know I must use a single database and a single table. I also know - because of the nature of the data that are being written to the FIFO - that the probability for two different threads to be operating on the same entry in the table at the same time is negligible - i.e. for all practical purposes, that will not happen. What I need is advice on how to configure my instance of MariaDB to perform optimally in the scenario above. In particular, I would like for it to make better use of all the cores available - in essence, to parallelize the database operations as much as possible.
> f) Congratulations, you have managed to contradict yourself in two sentences. If your workload isn't parallel by design, you will not make use of the available cores. Also, your workload sounds I/O-bound to me - there's a strong chance the disk is the bottleneck.
>
> g) "I know I must use a single database and a single table." How do you know this if you are a newbie? No offense meant, but most of the time a monolithic design is not the best way to leverage performance.

The data that I have consist of a string S, two string attributes T and A, and an integer attribute D. String S is what I have to use as a key in order to insert/delete/modify entries. Can this advantageously be stored across several tables? The S strings have nothing in common, beyond the attributes that I mentioned. As for the parallelism, the essence is what I pointed out - namely, that no two threads will work on the same entry (as accessed by S) at the same time. They can of course be adding new entries, or deleting existing ones, at the same time, with the proviso that the insertion operations will always be for different values of S, and analogously for the deletion operations. I am indeed open to suggestions about a better design.

> -GL
Opening many connections will cause issues, especially if each thread is running a sub-optimal query. Try to decrease the number of connections; one way can be using message queues.

Tuning the database I/O is important: ensure you are using a separate mount point for /var/lib/mysql and mount it with noatime. Resize innodb_buffer_pool_size to ensure that it can hold your workload. Subsequently, perform some tests with innodb_flush_method = "O_DIRECT".
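Put together as a my.cnf fragment, the I/O suggestions from this thread might look like the following. These values are illustrative starting points to benchmark against, not tuned recommendations:

```
[mariadb]
innodb_buffer_pool_size = 8G        # table plus indexes fit in RAM with room to grow
innodb_flush_log_at_trx_commit = 2  # relax per-commit fsync (can lose ~1s of writes on a crash)
innodb_flush_method = O_DIRECT      # bypass the OS page cache for data files
```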
From your description, data S is the key, which in turn is composed of T and A. Can the data be stripped by the application and T and A stored separately? Maybe use persistent virtual columns and index those instead.

Recall that in InnoDB the primary key is a clustered index; thus the table is written to disk sorted by column S. Inserts and updates may require moving rows around, slowing the SQL statements (high insert/update concurrency will worsen the situation). If column S is the primary key and is large, all other indexes will carry a copy of the primary key, causing the indexes to be large and increasing the load on the system I/O.
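Concretely, the clustered-index point suggests a short surrogate primary key, with S demoted to a secondary unique index, so that new rows append in id order instead of being inserted into the middle of the clustered index. A hypothetical sketch of the DDL (column names and sizes are guesses from the thread, not the actual schema):

```sql
CREATE TABLE entries (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    s  VARCHAR(191) NOT NULL,   -- the key string read from the FIFO
    t  VARCHAR(64),             -- attribute derived from S, A and a timestamp
    a  VARCHAR(64),             -- attribute read from the FIFO
    d  INT NOT NULL DEFAULT 0,  -- the counter that gets incremented
    PRIMARY KEY (id),           -- short clustered key: new rows append in id order
    UNIQUE KEY uk_s (s)         -- lookups and upserts by S still have an index
) ENGINE=InnoDB;
```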
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp
On Mon, Oct 7, 2019 at 12:12 PM Kenneth Penza <kpenza@gmail.com> wrote:
> Opening many connections will cause issues, especially if each thread is running a sub-optimal query. Try to decrease the number of connections; one way can be using message queues.
>
> Tuning the database I/O is important: ensure you are using a separate mount point for /var/lib/mysql and mount it with noatime. Resize innodb_buffer_pool_size to ensure that it can hold your workload. Subsequently, perform some tests with innodb_flush_method = "O_DIRECT".
>
> From your description, data S is the key, which in turn is composed of T and A. Can the data be stripped by the application and T and A stored separately? Maybe use persistent virtual columns and index those instead.
I don't think I understand. It is not that S is composed of T and A; what I am reading from the FIFO is two items: S and A. T is derived from S and A, together with a timestamp.

> Recall that in InnoDB the primary key is a clustered index; thus the table is written to disk sorted by column S. Inserts and updates may require moving rows around, slowing the SQL statements (high insert/update concurrency will worsen the situation). If column S is the primary key and is large, all other indexes will carry a copy of the primary key, causing the indexes to be large and increasing the load on the system I/O.
There is something to what you are saying, for I have a high insert/update concurrency. I will definitely try to understand the details of what you are suggesting.
Might be worth having a look at TokuDB: https://mariadb.com/kb/en/library/tokudb/

From: Maria-discuss <maria-discuss-bounces+rhys.campbell=swisscom.com@lists.launchpad.net> On Behalf Of JCA
Sent: 07 October 2019 21:03
To: Kenneth Penza <kpenza@gmail.com>
Cc: Mailing-List mariadb <maria-discuss@lists.launchpad.net>
Subject: Re: [Maria-discuss] Performance tuning sought for MariaDB
I doubt that TokuDB will help with 100 MB of data. It has been designed with big datasets in mind and generally underperforms with small data. The question is rather how fast you can read from and write to the table. At this point even an in-memory database could be considered, and maybe MariaDB isn't even the right choice for that (Redis?)

On Wed, Oct 9, 2019 at 09:15, <Rhys.Campbell@swisscom.com> wrote:
Might be worth having a look at TokuDB https://mariadb.com/kb/en/library/tokudb/
*From:* Maria-discuss <maria-discuss-bounces+rhys.campbell=swisscom.com@lists.launchpad.net> *On Behalf Of* JCA
*Sent:* 07 October 2019 21:03
*To:* Kenneth Penza <kpenza@gmail.com>
*Cc:* Mailing-List mariadb <maria-discuss@lists.launchpad.net>
*Subject:* Re: [Maria-discuss] Performance tuning sought for MariaDB
On Mon, Oct 7, 2019 at 12:12 PM Kenneth Penza <kpenza@gmail.com> wrote:
Opening many connections will cause issues, especially if each thread is running a sub-optimal query. Try to decrease the number of connections; one way can be to use message queues.
Tuning the database I/O is important, ensure you are using a separate mount point for /var/lib/mysql and mount it with noatime. Resize the innodb_buffer_pool_size to ensure that it can hold your workload. Subsequently, perform some tests with innodb_flush_method = "O_DIRECT".
From your description, data S is the key which in turn is composed of T and A. Can the data be stripped by the application and T and A stored separately? Maybe use persistent virtual columns and index those instead.
I don't think I understand. It is not that S is composed of T and A; what I am reading from the FIFO is two items: S and A. T is derived from S and A, together with a timestamp.
Recall that in InnoDB the primary key is a clustered index, so the table is written to disk sorted by column S. Inserts and updates may require moving rows around, slowing the SQL statements (high insert/update concurrency will worsen the situation). If column S is the primary key and is large, every secondary index will carry a copy of the primary key, causing the indexes to be large and increasing the load on the system I/O.
There is something to what you are saying, for I have a high insert/update concurrency. I will definitely try to understand the details of what you are suggesting.
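One common way to act on this advice is a surrogate primary key, so the clustered index stays short and inserts keyed by S don't scatter across the table. This is only a sketch under assumptions - the table name `t` and all column names are illustrative, not taken from the thread:

```sql
CREATE TABLE t (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- short, monotonic clustered key
  s       VARCHAR(191) NOT NULL,                 -- the lookup string S
  t_attr  VARCHAR(64),                           -- attribute T
  a_attr  VARCHAR(64),                           -- attribute A
  d       INT,                                   -- attribute D
  counter INT NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  UNIQUE KEY uk_s (s)                            -- lookups/upserts by S still use an index
) ENGINE=InnoDB;
```

With this layout the secondary index on `s` carries only the 4-byte `id` rather than the full string, and new rows append to the end of the clustered index instead of landing at random positions.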
On Mon, Oct 7, 2019 at 7:13 PM JCA <1.41421@gmail.com> wrote:
Thanks for your feedback. Please see my comments interspersed below.
On Mon, Oct 7, 2019 at 10:38 AM Guillaume Lefranc <guillaume@adishatz.net> wrote:
Hello,
thread-handling=pool-of-threads
max_connections = 1000
table_open_cache = 800
query_cache_type = 0
innodb_buffer_pool_size = 512M
innodb_buffer_pool_instances = 10
innodb_adaptive_hash_index_partitions = 20
innodb_lock_wait_timeout = 5000
No need to use buffer pool instances with only 512M of buffer. You said you have 24 GB of RAM - why not increase the buffer size? How big is your table on disk right now? For the best performance it must be held entirely in the buffer pool.
It does not seem to be that big - about 101 MB. I used the following command:
SELECT table_name AS `Table`,
       round(((data_length + index_length) / 1024 / 1024), 2) AS `Size (MB)`
FROM information_schema.TABLES
WHERE table_schema = "<my-db-name>";
I had to look it up.
With this, my application can keep up with the FIFO writer, but - depending on the circumstances - my database can't. As I am writing this, there are over 1300 threads connected to my database; any command that I issue at the mysql CLI takes over one minute to return. I am keeping track of how long each thread takes to complete, and that is of the order of hundreds of seconds - sometimes thousands. Each thread is itself simple, in that it just issues a couple of simple MariaDB commands. Currently my table consists of 1.6 million entries, and growing - on this basis, I expect that things will only get worse. Each entry, however, will never require more than a couple of hundred bytes of storage. The operations that can be undertaken on entries are insertion, deletion and modification, the latter being straightforward - like e.g. incrementing a counter or replacing a short string.
You are not considering many factors which is
a) limiting the number of concurrent threads - with 1300 threads you are creating concurrency races and locking issues. Try limiting your threads to a small multiple of the number of cores
I will. Notice, though, that most of the time most of the cores are idle anyway.
b), you're writing to a disk system. The number of CPU cores won't matter if you saturate the disk. You say nothing about the disk, if it's SSD, HDD etc. Note that HDD's are severely handicapped when it comes to concurrent IO operations.
It is an HDD. I am sure it will be a factor in making things slower.
c) given the factor above you could maybe try relaxing commit to disk if integrity of the data is not of utmost importance, for example by adding "innodb_flush_log_at_trx_commit = 2" to your config.
Thanks - I'll try that.
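For reference, the suggested setting would go in the server section of /etc/my.cnf - a sketch, with the usual caveat: with value 2 the redo log is written to the OS at each commit but fsynced only about once per second, so roughly the last second of committed transactions can be lost on an OS crash or power failure:

```ini
[mariadb]
innodb_flush_log_at_trx_commit = 2
```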
My system has 24 GB of RAM and 12 cores. Occasionally all the cores are fully busy with MariaDB activity, but most of the time barely one or two are.
d) CPU in a DB system will mostly be used for queries (index scans) and some for reindexing, but you're only doing writes, so unless you have dozens of indexes the CPU is unlikely to be your bottleneck.
e) Table design - is your table design efficient? How many rows per second are you writing? Maybe you could also benefit from hash partitioning or clever data types.
I am a newbie when it comes to interacting with MariaDB - please, bear with me. I know I must use a single database and a single table. I also know - because of the nature of the data that are being written to the FIFO - that the probability for two different threads to be operating on the same entry in the table at the same time is negligible - i.e. for all practical purposes, that will not happen.
What I need is advice on how to configure my instance of MariaDB to perform optimally in the scenario above. In particular, I would like for it to make better use of all the cores available - in essence, to parallelize the database operations as much as possible.
f) Congratulations, you have managed to contradict yourself in two sentences.
If your workload isn't parallel by design you will not make use of the available cores. Also, your workload sounds IO-bound to me - there's a strong chance the disk is the bottleneck.
g) "I know I must use a single database and a single table." How do you know this if you are a newbie? No offense meant, but most of the time a monolithic design is not the best way to leverage performance...
The data that I have consist of a string S, two string attributes T and A, and an integer attribute D. String S is what I have to use as a key in order to insert/delete/modify entries. Can this data be advantageously stored across several tables? The S strings have nothing in common beyond the attributes that I mentioned. As for the parallelism, the essence is what I pointed out - namely, that no two threads will work on the same entry (as accessed by S) at the same time. They can of course be adding new entries, or deleting existing ones, at the same time, with the proviso that the insertion operations will always be for different values of S, and analogously for the deletion operations. I am indeed open to suggestions about a better design.
-GL
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Given zero optimization in the original post, innodb_flush_log_at_trx_commit = 2 is almost the right fix, orders of magnitude faster on a hard disk. Add simple batching to this, and it is possible to gain another order of magnitude, or two. Write into several tables instead of one, and that will improve the throughput by another 5-10% or so 😊 I doubt there is a need for anything fancy, TokuDB, Redis, what not.
On 10/9/2019 3:15 AM, Rhys.Campbell@swisscom.com wrote:
Might be worth having a look at TokuDB https://mariadb.com/kb/en/library/tokudb/
With TokuDB deprecated upstream and scheduled to be removed in a future MariaDB version, I would not recommend it to anyone at this point.

Brian
On Mon, Oct 7, 2019 at 11:06 AM Vladislav Vaintroub <vvaintroub@gmail.com> wrote:
You do not share many details about how exactly your application interacts with the server.
Do you work with large batches, i.e. do you generate big (say 1 MB) multi-valued statements like
INSERT INTO t(a,b) VALUES(a1,b1),(a2,b2)......,(aN, bN)
ON DUPLICATE KEY UPDATE counter=counter+1
It will be mostly individual insertions, amounting to a small amount of data every time.
The mass-deletion is straightforward
DELETE FROM t WHERE id in (id1,.......idN)
This I do every so often, but far less frequently than insertions and modifications.
Make sure your multi-valued inserts/deletes do not exceed the value of “max_allowed_packet” session variable.
And work preferably with a single connection or small amount of connections.
Thanks. I do the latter, actually -- if I have several insertions/modifications to do, I strive to submit them in a single connection. I do have a separate connection per thread, though. Do you think it might be worthwhile using a single connection, or a small connection pool, throughout?
If this sounds complicated, you can combine multiple updates in large transactions instead, though this could be slightly less efficient, since there is more interaction between the application and DB.
That, in fact, has been my observation.
Rather than parallelizing single updates, it is usually better to combine updates in large-ish transactions. Most of the update-related work will happen in background anyway, at least for innodb.
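The batched-transaction pattern sketched as SQL (table and column names illustrative, not from the thread): many small changes share one commit, and therefore one redo-log flush:

```sql
START TRANSACTION;
INSERT INTO t(s,a) VALUES ('k1','x') ON DUPLICATE KEY UPDATE counter=counter+1;
INSERT INTO t(s,a) VALUES ('k2','y') ON DUPLICATE KEY UPDATE counter=counter+1;
DELETE FROM t WHERE s = 'k3';
COMMIT;  -- one durable write for the whole batch
```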
There is some info in the documentation that mentions multi-value inserts https://mariadb.com/kb/en/library/how-to-quickly-insert-data-into-mariadb/
Thanks; I'll check that out.
*From:* JCA <1.41421@gmail.com>
*Sent:* Monday, 7 October 2019 18:17
*To:* maria-discuss@lists.launchpad.net
*Subject:* [Maria-discuss] Performance tuning sought for MariaDB
I am running MariaDB 10.0.34 on a Slackware 14.2 system. I have a C application that interacts with MariaDB in the following way:
1. Read data from a FIFO.
2. Insert the data into a table in a MariaDB database, if absent, or modify it according to certain specific criteria otherwise.
This works as expected.
The problem that I have is that data are being written to the FIFO at a fast rate. In order to be able to keep up, at any given time my application reads the data available at the FIFO, and spawns a thread to process the chunk of data just read. It is in this thread that all the database interaction takes place. In order to deal with this, I have the following entries in my /etc/my.cnf file:
# this is read by the standalone daemon and embedded servers
[server]

# this is only for the mysqld standalone daemon
[mysqld]
# thread_handling=pool-of-threads
# log=/var/log/mysqld.log

# this is only for embedded server
[embedded]

# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
# log=/var/log/mysqld.log
general_log_file = /var/lib/mysql/mysql.log
# general_log = 1

# transaction-isolation = READ-COMMITTED
# key_buffer = 1280M            # 128MB for every 1GB of RAM
# sort_buffer_size = 1M         # 1MB for every 1GB of RAM
# read_buffer_size = 1M         # 1MB for every 1GB of RAM
# read_rnd_buffer_size = 1M     # 1MB for every 1GB of RAM
# thread_concurrency = 24       # Based on the number of CPUs, so make it CPU*2
# thread-handling=pool-of-threads
# innodb_flush_log_at_trx_commit != 1
# open_files_limit = 50000

thread-handling=pool-of-threads
max_connections = 1000
table_open_cache = 800
query_cache_type = 0
innodb_buffer_pool_size = 512M
innodb_buffer_pool_instances = 10
innodb_adaptive_hash_index_partitions = 20
innodb_lock_wait_timeout = 5000
With this, my application can keep up with the FIFO writer, but - depending on the circumstances - my database can't. As I am writing this, there are over 1300 threads connected to my database; any command that I issue at the mysql CLI takes over one minute to return. I am keeping track of how long each thread takes to complete, and that is of the order of hundreds of seconds - sometimes thousands. Each thread is itself simple, in that it just issues a couple of simple MariaDB commands. Currently my table consists of 1.6 million entries, and growing - on this basis, I expect that things will only get worse. Each entry, however, will never require more than a couple of hundred bytes of storage. The operations that can be undertaken on entries are insertion, deletion and modification, the latter being straightforward - like e.g. incrementing a counter or replacing a short string.
My system has 24 GB of RAM and 12 cores. Occasionally all the cores are fully busy with MariaDB activity, but most of the time barely one or two are.
I am a newbie when it comes to interacting with MariaDB - please, bear with me. I know I must use a single database and a single table. I also know - because of the nature of the data that are being written to the FIFO - that the probability for two different threads to be operating on the same entry in the table at the same time is negligible - i.e. for all practical purposes, that will not happen.
What I need is advice on how to configure my instance of MariaDB to perform optimally in the scenario above. In particular, I would like for it to make better use of all the cores available - in essence, to parallelize the database operations as much as possible.
Feedback from the experts will be much appreciated.
The problem with many single-statement parallel updates/inserts is that each statement creates an individual implicit transaction, and this transaction will get stuck on commit, waiting for its changes to be persistent, i.e. written into the InnoDB redo log (by default also followed by an fsync). There is a group commit feature in InnoDB which aims to fix that, but I'm not too familiar with it, and not sure how efficient it is. innodb_flush_log_at_trx_commit=2, mentioned by Guillaume, can do wonders, because it omits the slowest part of commit, the fsync. On the other hand, large multi-valued updates, or transactions, will do a single write on commit for many updates, instead of multiple tiny writes. That's the whole trick which, for example, makes mysqldump-made logical backups fast to restore, without doing any work in parallel on the client side. (Alright, mysqldump has a couple more tricks, such as disabling indexes, but multi-valued updates are essential.) You can experiment with the number of connections, and check whether increasing them brings much more throughput, but my guess is that more than, say, number_of_cores * 2 would not give an improvement (once you start combining the updates). Definitely not thousands of connections. Perhaps a single client connection will suffice.
participants (6)
- Brian Evans
- Guillaume Lefranc
- JCA
- Kenneth Penza
- Rhys.Campbell@swisscom.com
- Vladislav Vaintroub