Hi, Pierre! Okay, there were quite a few replies already here. I'll chime in, but only to confirm that there's no easy solution, unfortunately :( On Feb 10, Pierre wrote:
Hello,
Mariadb is getting slower and slower when I'm inserting a massive amout of data. I'm trying to insert 166 507 066 rows (12go) using load data infile '' into an empty table. I splitted my file in 13 parts of the same size because it was too long to insert in one shot. When I inserted more than 100M rows, it starts to be extremely slow.
Here are the time for each load data infile :
LOAD DATA INFILE '/tmp/hash/xaa' : 2min 49sec LOAD DATA INFILE '/tmp/hash/xab' : 2min 49sec LOAD DATA INFILE '/tmp/hash/xac' : 3min 48sec LOAD DATA INFILE '/tmp/hash/xad' : 3min 48sec LOAD DATA INFILE '/tmp/hash/xae' : 3min 49sec LOAD DATA INFILE '/tmp/hash/xaf' : 5min 59sec LOAD DATA INFILE '/tmp/hash/xag' : 10min 50sec LOAD DATA INFILE '/tmp/hash/xah' : 20min 7sec LOAD DATA INFILE '/tmp/hash/xai' : 47min 7sec LOAD DATA INFILE '/tmp/hash/xaj' : 4 hours 1 min 9.34 sec ( ouch !) LOAD DATA INFILE '/tmp/hash/xak' : still running...
Can someone explain me what is happening internaly and why it is so slow ?
Yes. That's because indexes are implemented as B-trees, and B-tree has a cost of insertion of O(log N). As you insert more data, insertions become slower. Furthermore, B-tree organizes data in pages. MariaDB tries to keep all pages in memory, but as soon as index becomes larger than keycache size, every key insertion means at least one disk read and one write (assuming your key values are distributed normally and randomly).
Do you think if I just create an index (not a unique index) it will be faster ?
Not really. Inserting values into a B-tree one-by-one will always be slow when your B-tree becomes big enough. But you don't need to insert values into a B-tree one by one. MyISAM has a much faster (like, two orders of magnitude) way of building b-trees by creating a tree all at once. This is used when you insert data into the empty table. Or when you add an index to the existing table. Or when you enable indexes. That is: 1. LOAD DATA INFILE 'everything, not in chunks' (a convenient way would be to load from a pipe, and cat all your chunks into it). 2. ALTER TABLE ... ADD KEY 3. ALTER TABLE DISABLE KEYS. insert the data. ALTER TABLE ENABLE KEYS But unfortunately, all this only works for non-unique keys, when MariaDB doesn't need to expect a unique constraint error and won't need to delete rows when creating indexes. For unique indexes values will be inserted into a b-tree one by one. Now, there's another trick you can use. Create a table. Then disable all indexes - not with ALTER TABLE ENABLE KEYS (which only disables non-unique indexes) - but from a command-line with myisamchk: $ myisamchk --keys-used=0 /path/to/table.MYI then insert your data. And REPAIR TABLE or REPAIR TABLE ... QUICK. This will rebuild all indexes, the fast way. The second variant (with QUICK) will not copy your (presumably, huge) data - it will only rebuild indexes. But it cannot remove duplicates either. And also it's worth noting that MariaDB-10.0 rebuilds both UNIQUE and non-UNIQUE indexes the fast way when you do ALTER TABLE ... ADD KEY. Regards, Sergei