[Maria-discuss] How does InnoDB delete rows?
Hi everyone. I have some questions regarding InnoDB deletion process. As far as I know, when we issue a DELETE command to a row stored in a InnoDB table, the row contents are stored in the UNDO Log, and the contents of the row are only deleted when the last transaction after the DELETE transaction has finished. But what actually happens to the row that was deleted? Is any disk space ever released to the OS, or are the contents simply erased? Defragmenting a table is only done through OPTIMIZE TABLE, or are there any automatic threads responsible for rearranging the rows? And if the are no such threads, how do we maintain tables which suffer from many deletions? I hope I made myself clear. Thanks in advance, Arlindo Neto.
It's actually a little different: when you issue DELETE, InnoDB marks the rows as deleted and that's it. The space occupied by the deleted rows will be reused later whenever new rows are inserted or when other rows grow in size. There are no threads doing compaction, and InnoDB won't release disk space to OS even if you delete all rows from the table (the space will be released though if you execute TRUNCATE TABLE). So if you deleted a lot of rows from the table and expect that there will be no similar-sized additions to the table any time soon, then you'll need to execute OPTIMIZE TABLE to release the disk space occupied by the table. If your table sees roughly the same amount of deletes and inserts (or it sees more inserts than deletes), then don't bother, InnoDB will eventually reuse the space occupied by the deleted rows. Pavel On Fri, Jul 27, 2018 at 12:07 PM Arlindo Neto <neto.acs@gmail.com> wrote:
Hi everyone.
I have some questions regarding InnoDB deletion process.
As far as I know, when we issue a DELETE command to a row stored in a InnoDB table, the row contents are stored in the UNDO Log, and the contents of the row are only deleted when the last transaction after the DELETE transaction has finished. But what actually happens to the row that was deleted? Is any disk space ever released to the OS, or are the contents simply erased? Defragmenting a table is only done through OPTIMIZE TABLE, or are there any automatic threads responsible for rearranging the rows? And if the are no such threads, how do we maintain tables which suffer from many deletions?
I hope I made myself clear. Thanks in advance,
Arlindo Neto.
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Nice. Thanks Pavel. A few more questions though: Can InnoDB reuse this space by adding new rows with AUTO_INCREMENT primary key when our table use clustered indexes? As new rows are added, wouldn't the table structure organize the data based on the primary key, preventing the disk space containing the old row from being written again? Also, when talking about large tables, could this "table bloat" impact performance in, for instance, sequential scans? Thanks again, Arlindo Neto. Em sex, 27 de jul de 2018 às 18:35, Pavel Ivanov <pivanof@google.com> escreveu:
It's actually a little different: when you issue DELETE, InnoDB marks the rows as deleted and that's it. The space occupied by the deleted rows will be reused later whenever new rows are inserted or when other rows grow in size. There are no threads doing compaction, and InnoDB won't release disk space to OS even if you delete all rows from the table (the space will be released though if you execute TRUNCATE TABLE). So if you deleted a lot of rows from the table and expect that there will be no similar-sized additions to the table any time soon, then you'll need to execute OPTIMIZE TABLE to release the disk space occupied by the table. If your table sees roughly the same amount of deletes and inserts (or it sees more inserts than deletes), then don't bother, InnoDB will eventually reuse the space occupied by the deleted rows.
Pavel
On Fri, Jul 27, 2018 at 12:07 PM Arlindo Neto <neto.acs@gmail.com> wrote:
Hi everyone.
I have some questions regarding InnoDB deletion process.
As far as I know, when we issue a DELETE command to a row stored in a
InnoDB table, the row contents are stored in the UNDO Log, and the contents of the row are only deleted when the last transaction after the DELETE transaction has finished. But what actually happens to the row that was deleted? Is any disk space ever released to the OS, or are the contents simply erased? Defragmenting a table is only done through OPTIMIZE TABLE, or are there any automatic threads responsible for rearranging the rows? And if the are no such threads, how do we maintain tables which suffer from many deletions?
I hope I made myself clear. Thanks in advance,
Arlindo Neto.
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
If your table uses auto-increment field as primary key, then possibility to reuse the space depends on how many rows you delete. When a page in the clustered index contains too few rows, InnoDB will merge it with an adjacent page, and then this space will become available for reuse by a new page in the tail of the index. Yes, "table bloat" will reduce performance of reads, because InnoDB buffer pool is based on pages, so the fewer active rows in the pages, the fewer active data fits into the buffer pool and InnoDB needs to read from disk more often. But at the same time if you do a lot of inserts/updates in the middle of the index, then lack of "table bloat" will reduce performance of those operations, because InnoDB will have to split pages to do them. On Fri, Jul 27, 2018 at 4:01 PM Arlindo Neto <neto.acs@gmail.com> wrote:
Nice. Thanks Pavel.
A few more questions though: Can InnoDB reuse this space by adding new rows with AUTO_INCREMENT primary key when our table use clustered indexes? As new rows are added, wouldn't the table structure organize the data based on the primary key, preventing the disk space containing the old row from being written again? Also, when talking about large tables, could this "table bloat" impact performance in, for instance, sequential scans?
Thanks again, Arlindo Neto.
Em sex, 27 de jul de 2018 às 18:35, Pavel Ivanov <pivanof@google.com> escreveu:
It's actually a little different: when you issue DELETE, InnoDB marks the rows as deleted and that's it. The space occupied by the deleted rows will be reused later whenever new rows are inserted or when other rows grow in size. There are no threads doing compaction, and InnoDB won't release disk space to OS even if you delete all rows from the table (the space will be released though if you execute TRUNCATE TABLE). So if you deleted a lot of rows from the table and expect that there will be no similar-sized additions to the table any time soon, then you'll need to execute OPTIMIZE TABLE to release the disk space occupied by the table. If your table sees roughly the same amount of deletes and inserts (or it sees more inserts than deletes), then don't bother, InnoDB will eventually reuse the space occupied by the deleted rows.
Pavel
On Fri, Jul 27, 2018 at 12:07 PM Arlindo Neto <neto.acs@gmail.com> wrote:
Hi everyone.
I have some questions regarding InnoDB deletion process.
As far as I know, when we issue a DELETE command to a row stored in a InnoDB table, the row contents are stored in the UNDO Log, and the contents of the row are only deleted when the last transaction after the DELETE transaction has finished. But what actually happens to the row that was deleted? Is any disk space ever released to the OS, or are the contents simply erased? Defragmenting a table is only done through OPTIMIZE TABLE, or are there any automatic threads responsible for rearranging the rows? And if the are no such threads, how do we maintain tables which suffer from many deletions?
I hope I made myself clear. Thanks in advance,
Arlindo Neto.
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
On 28.07.2018 01:01, Arlindo Neto wrote:
Nice. Thanks Pavel.
A few more questions though: Can InnoDB reuse this space by adding new rows with AUTO_INCREMENT primary key when our table use clustered indexes? As new rows are added, wouldn't the table structure organize the data based on the primary key, preventing the disk space containing the old row from being written again?
yes, when adding new rows in primary key order only a row marked as free can't be reused for that, only a full page marked as free can. With only a single row deleted, a page will just continue to exist with that gap in it, unless doing an OPTIMIZE TABLE or a copying ALTER operation on it. If "enough" rows on a page get deleted, making its "fill factor" drop below a specific low water mark, InnoDB will attempt to re-balance the clustered primary index by merging it with other pages.
Also, when talking about large tables, could this "table bloat" impact performance in, for instance, sequential scans?
Depends on whether the table fits into the buffer pool, or is too large for that so that operations on the full table will become IO bound. As long as pages are cached in-memory in the buffer pool, there will be some impact due to cache misses, but that would usually not be that bad. With pages from disk the effect will be more obvious, as rows marked as freed still need to be read from disk and will consume precious IO bandwidth. And if the clustered primary index gets rebalanced a lot its pages will not necessarily be in linear order anymore, so that even a full table scan can cause quite a bit of random IO. That can be an issue on rotating disks for sure, but luckily not so much on SSDs anymore. -- Hartmut Holzgraefe - Principal Support Engineer (EMEA) MariaDB Corporation - http://www.mariadb.com/
Arlindo Keep in mind that when looking at the real work done physically on a DB, all changes and handling are page-based. Not only that row is modified, but the adjacent rows in the very same page are rewritten too. There are some caching and smarter handlings, but its up to the database management system to even write a whole page just because of one row, if a transaction is definied to do so. About DELETE. There is a standard behavior that new developers (and some old ones) doesnt know. Rows are not physically deleted (just as the majority of the filesystems available today). They are logically marked as free. Check out the link below https://dev.mysql.com/doc/internals/en/innodb-page-structure.html. So, the "better performing and storage wise procedure" is to delete all the rows in a given page. This will mark the page as free. Some answers are done in the link below for what happens when specific cases happen.Beyond that, unfortunately you may need to go to C code reading for some stuff, but the guy who originally wrote InnoDB answered them https://www.percona.com/blog/2007/10/26/heikki-tuuri-innodb-answers-part-i/ I didnt find further info here https://mariadb.com/kb/en/library/xtradb-and-innodb/, but i suppose its the best place to add this info And lastly, go look over other RDBMs way of dealing with this stuff. You sure learn the tradeoffs that involves them https://hackernoon.com/showdown-mysql-8-vs-postgresql-10-3fe23be5c19e 2018-07-27 16:06 GMT-03:00 Arlindo Neto <neto.acs@gmail.com>:
Hi everyone.
I have some questions regarding InnoDB deletion process.
As far as I know, when we issue a DELETE command to a row stored in a InnoDB table, the row contents are stored in the UNDO Log, and the contents of the row are only deleted when the last transaction after the DELETE transaction has finished. But what actually happens to the row that was deleted? Is any disk space ever released to the OS, or are the contents simply erased? Defragmenting a table is only done through OPTIMIZE TABLE, or are there any automatic threads responsible for rearranging the rows? And if the are no such threads, how do we maintain tables which suffer from many deletions?
I hope I made myself clear. Thanks in advance,
Arlindo Neto.
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
On 31.07.2018 20:01, Alexandre hadjinlian guerra wrote:
So, the "better performing and storage wise procedure" is to delete all the rows in a given page. This will mark the page as free.
but that only moves the question up by one level: now the full page will be marked as free, and be ready to be reused, but it will not be returned to the file system either. -- Hartmut Holzgraefe - Principal Support Engineer (EMEA) MariaDB Corporation - http://www.mariadb.com/
Hi Arlindo, 2018-07-27 22:06 GMT+03:00 Arlindo Neto <neto.acs@gmail.com>:
Hi everyone.
I have some questions regarding InnoDB deletion process.
As far as I know, when we issue a DELETE command to a row stored in a InnoDB table, the row contents are stored in the UNDO Log, and the contents of the row are only deleted when the last transaction after the DELETE transaction has finished.
That is true. History will be preserved in undo logs and index trees, so that any active read views associated with transactions that started before the transaction (which made changes) committed, will have a consistent read view, as if the DELETE did not take place. So, DELETE is a special form of UPDATE, which will only update the delete-mark flag in the index records. The oldest active transaction read view is also called the "purge view". Anything older than that may be purged, but purge might also be lagging a little bit behind. The only case when records can actually be deleted outside purge is transaction rollback. A "fresh insert" can be rolled back by immediately deleting the index records. But, an insert may also be performed by updating a delete-marked purgeable record. In that case, the rollback will have to do a purge-like check: delete the record if it is not visible in any purge view. (This is a bit tricky for secondary indexes, because secondary index records do not contain a per-record transaction identifier.) I explained some of this in the "Deep Dive" talk at M18: https://vimeo.com/258533150 https://docs.google.com/presentation/d/1bP3yh57B58yfGDd-34TZ5MeiacIdaseaTLpd...
But what actually happens to the row that was deleted? Is any> disk space ever released to the OS, or are the contents simply erased?
InnoDB keeps a PAGE_FREE stack of purged records. An insert would allocate space from the top of that stack unless deleted record was smaller than the one that is about to be inserted. It would fall back to allocating new space from the "page heap". If the page would get too full, and if the record would fit after removing the garbage from the page, then there would be a btr_page_reorganize() operation that would rebuild the page, followed by an insert. If the insert does not fit, then the page would be split. Page merges will typically be done lazily. MySQL 5.7 (and MariaDB 10.2) introduced the table comment keyword MERGE_THRESHOLD for controlling it.
Defragmenting a table is only done through OPTIMIZE TABLE, or are there any automatic threads responsible for rearranging the rows?
In MariaDB 10.1 and later, there is an option. After innodb_defragment=ON, OPTIMIZE TABLE would not rebuild the table, but instead invoke a defragmentation operation. This would not shrink the data file, however. By default, OPTIMIZE TABLE does rebuild the table. It is supported as an online operation, but the log of concurrent DML operations would increase the storage requirements
And if the are no such threads, how do we maintain tables which suffer from many deletions?
Generally, it could be good to avoid mass deletions with InnoDB when possible. Alternatives could be to partition the table in a suitable way if possible, and then use DROP PARTITION for the mass deletions. Or, you could copy the "surviving" records to a new table and then drop the old one. I hope that this helps. Best regards, Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation
participants (5)
-
Alexandre hadjinlian guerra
-
Arlindo Neto
-
Hartmut Holzgraefe
-
Marko Mäkelä
-
Pavel Ivanov