Re: [Maria-developers] 0cd5377e9ff: MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
Hi, Aleksey! On Oct 20, Aleksey Midenkov wrote:
revision-id: 0cd5377e9ff (mariadb-10.2.31-357-g0cd5377e9ff)
parent(s): dc716da4571
author: Aleksey Midenkov
committer: Aleksey Midenkov
timestamp: 2020-08-17 00:52:35 +0300
message: MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
update_virtual_fields() may reallocate Field_blob::value buffer due to different reasons (f.ex. different types, see Field_str::memcpy_field_possible()). But the buffer address may be already in m_ordered_rec_buffer which is then used in ordered index scan queue.
The patch updates blob virtual fields when the ordered index scan queue is sorted as well as the top record is restored from the queue.
New VCOL_UPDATE_BLOBS mode is introduced to update only blobs. No swap is needed, as it was already done earlier where required.
Related to ea1b2504
No, I think we need to go deeper here. The bug is not only not caused by system versioning, it is also not caused by virtual columns.

The problem is that partitioning copies record[0] out and then back to save and restore values. This does not work for blobs, because record[0] may store a pointer to memory that Field_blob owns and can free or reallocate as needed. Normally this is not a problem, because typically after row reads the engine owns the memory, not Field_blob. When a virtual column is set to a calculated value, Field_blob owns the memory and we get this bug. But there are engines that use field->store() to set values; in this case Field_blob will own the memory too.

Here's a test case with FederatedX:
=============================================================
source include/have_partition.inc;
source include/have_sequence.inc;
source have_federatedx.inc;
source include/federated.inc;
connection slave;
use federated;
create table t1_1 (x int, b text, key(x));
create table t1_2 (x int, b text, key(x));
connection master;
eval create table t1 (x int, b text, key(x)) engine=federated
partition by range columns (x) (
partition p1 values less than (40) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_1',
partition pn values less than (maxvalue) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_2'
);
insert t1 select seq, seq from seq_1_to_35;
insert t1 select seq, repeat(seq, 100) from seq_50_to_90;
flush tables;
select x, b from t1 where x > 30 and x < 60 order by x;
select x, b from t1 where x > 30 and x < 60 order by b;
drop table t1;
connection slave;
drop table t1_1, t1_2;
source include/federated_cleanup.inc;
=============================================================

The first select doesn't trigger the ASAN error, but it produces incorrect results (I'm showing it here so that you can create a non-ASAN test that can be run on all builders). The second select gets the same ASAN error as your test below.
diff --git a/mysql-test/suite/vcol/r/partition.result b/mysql-test/suite/vcol/r/partition.result
index bd1353fa145..74c3c3394dc 100644
--- a/mysql-test/suite/vcol/r/partition.result
+++ b/mysql-test/suite/vcol/r/partition.result
@@ -28,3 +28,26 @@ set statement sql_mode= '' for update t1 set i= 1, v= 2;
 Warnings:
 Warning 1906 The value specified for generated column 'v' in table 't1' has been ignored
 drop table t1;
+#
+# MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
+#
+# Cover queue_fix() in ha_partition::handle_ordered_index_scan()
+create table t1 (
+x int auto_increment primary key,
+b text, v mediumtext as (b) virtual,
+index (v(10))
+) partition by range columns (x) (
+partition p1 values less than (3),
+partition p2 values less than (6),
+partition pn values less than (maxvalue));
+insert into t1 (b) values ('q'), ('a'), ('b');
+update t1 set b= 'bar' where v > 'a';
+drop table t1;
this test above doesn't cause ASAN failures for me without your fix. the test below fails all right.
+# Cover return_top_record() in ha_partition::handle_ordered_index_scan()
+create table t1 (x int primary key, b tinytext, v text as (b) virtual)
+partition by range columns (x) (
+partition p1 values less than (4),
+partition pn values less than (maxvalue));
+insert into t1 (x, b) values (1, ''), (2, ''), (3, 'a'), (4, 'b');
+update t1 set b= 'bar' where x > 0 order by v limit 2;
+drop table t1;
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
Hi Sergei!
On Wed, Oct 21, 2020 at 4:19 PM Sergei Golubchik wrote:
Hi, Aleksey!
On Oct 20, Aleksey Midenkov wrote:
revision-id: 0cd5377e9ff (mariadb-10.2.31-357-g0cd5377e9ff)
parent(s): dc716da4571
author: Aleksey Midenkov
committer: Aleksey Midenkov
timestamp: 2020-08-17 00:52:35 +0300
message: MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
update_virtual_fields() may reallocate Field_blob::value buffer due to different reasons (f.ex. different types, see Field_str::memcpy_field_possible()). But the buffer address may be already in m_ordered_rec_buffer which is then used in ordered index scan queue.
The patch updates blob virtual fields when the ordered index scan queue is sorted as well as the top record is restored from the queue.
New VCOL_UPDATE_BLOBS mode is introduced to update only blobs. No swap is needed, as it was already done earlier where required.
Related to ea1b2504
No, I think we need to go deeper here. The bug is not only not caused by system versioning, it is also not caused by virtual columns.
The problem is that partitioning copies record[0] out and then back to save and restore values. This does not work for blobs, because record[0] may store the pointer to the memory that Field_blob owns and can free or reallocate as needed.
Normally it's not the problem because typically after row reads the engine owns the memory, not Field_blob. When a virtual column is set to a calculated value - then Field_blob owns the memory and we have this bug.
Yes, you're right... Actually, the engine owning blobs is not good either, because the engine does not guarantee to keep the previous record's blobs.

My solution is to make a special mem_root (m_ordered_root) and copy the blobs into m_ordered_root when the record is stored into m_ordered_rec_buffer. That m_ordered_root is freed when m_ordered_rec_buffer is destroyed (m_ordered_rec_buffer can be allocated on m_ordered_root too). When the record is restored from m_ordered_rec_buffer, the table fields are made to own the blobs: we can destroy m_ordered_root at any time and the table->record[0] pointers will still be valid.
But there are engines that use field->store() to set values - in this case Field_blob will own the memory too. Here's a test case with FederatedX:
=============================================================
source include/have_partition.inc;
source include/have_sequence.inc;
source have_federatedx.inc;
source include/federated.inc;
connection slave;
use federated;
create table t1_1 (x int, b text, key(x));
create table t1_2 (x int, b text, key(x));
connection master;
eval create table t1 (x int, b text, key(x)) engine=federated
partition by range columns (x) (
partition p1 values less than (40) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_1',
partition pn values less than (maxvalue) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_2'
);
insert t1 select seq, seq from seq_1_to_35;
insert t1 select seq, repeat(seq, 100) from seq_50_to_90;
flush tables;
select x, b from t1 where x > 30 and x < 60 order by x;
select x, b from t1 where x > 30 and x < 60 order by b;
drop table t1;
connection slave;
drop table t1_1, t1_2;
source include/federated_cleanup.inc;
=============================================================
first select doesn't trigger ASAN error, but it produces incorrect results (I'm showing it here, so that you could create a non-ASAN test that could be run on all builders).
It turned out that this is the very same MDEV-17573 issue: multiple partitions mean multiple handlers, which interfere with the single federatedx_io_mysql::current pointer.
the second select gets the same ASAN error as your test below.
Slightly modified this test case and added to the patch.
diff --git a/mysql-test/suite/vcol/r/partition.result b/mysql-test/suite/vcol/r/partition.result
index bd1353fa145..74c3c3394dc 100644
--- a/mysql-test/suite/vcol/r/partition.result
+++ b/mysql-test/suite/vcol/r/partition.result
@@ -28,3 +28,26 @@ set statement sql_mode= '' for update t1 set i= 1, v= 2;
 Warnings:
 Warning 1906 The value specified for generated column 'v' in table 't1' has been ignored
 drop table t1;
+#
+# MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
+#
+# Cover queue_fix() in ha_partition::handle_ordered_index_scan()
+create table t1 (
+x int auto_increment primary key,
+b text, v mediumtext as (b) virtual,
+index (v(10))
+) partition by range columns (x) (
+partition p1 values less than (3),
+partition p2 values less than (6),
+partition pn values less than (maxvalue));
+insert into t1 (b) values ('q'), ('a'), ('b');
+update t1 set b= 'bar' where v > 'a';
+drop table t1;
this test above doesn't cause ASAN failures for me without your fix. the test below fails all right.
+# Cover return_top_record() in ha_partition::handle_ordered_index_scan()
+create table t1 (x int primary key, b tinytext, v text as (b) virtual)
+partition by range columns (x) (
+partition p1 values less than (4),
+partition pn values less than (maxvalue));
+insert into t1 (x, b) values (1, ''), (2, ''), (3, 'a'), (4, 'b');
+update t1 set b= 'bar' where x > 0 order by v limit 2;
+drop table t1;
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
--
All the best,
Aleksey Midenkov
@midenok
Hi, Aleksey! On Oct 25, Aleksey Midenkov wrote:
On Wed, Oct 21, 2020 at 4:19 PM Sergei Golubchik wrote:
On Oct 20, Aleksey Midenkov wrote:
revision-id: 0cd5377e9ff (mariadb-10.2.31-357-g0cd5377e9ff)
parent(s): dc716da4571
author: Aleksey Midenkov
committer: Aleksey Midenkov
timestamp: 2020-08-17 00:52:35 +0300
message: MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
No, I think we need to go deeper here. The bug is not only not caused by system versioning, it is also not caused by virtual columns.
The problem is that partitioning copies record[0] out and then back to save and restore values. This does not work for blobs, because record[0] may store the pointer to the memory that Field_blob owns and can free or reallocate as needed.
Normally it's not the problem because typically after row reads the engine owns the memory, not Field_blob. When a virtual column is set to a calculated value - then Field_blob owns the memory and we have this bug.
Yes, you're right... Actually, the engine owning blobs is not good either, because the engine does not guarantee to keep the previous record's blobs.
I think engine owning blobs is safe. Because they belong to different handlers, and every handler is pretty much guaranteed to keep the blob for the last read row until the next row is read.
My solution is to make a special mem_root (m_ordered_root) and copy the blobs into m_ordered_root when the record is stored into m_ordered_rec_buffer. That m_ordered_root is freed when m_ordered_rec_buffer is destroyed (m_ordered_rec_buffer can be allocated on m_ordered_root too). When the record is restored from m_ordered_rec_buffer, the table fields are made to own the blobs: we can destroy m_ordered_root at any time and the table->record[0] pointers will still be valid.
Just from your description it doesn't look like a good idea: a MEM_ROOT can only be freed as a whole, so you'll keep allocating memory for blobs there and it will keep growing, remembering all the blobs you ever had in the table. But I'll look at the patch; perhaps I misunderstood what you're doing.

If I'm right, though, then I'd suggest looking only at the case when Field_blob owns the memory. And avoid copying blobs: they are big, and moving them back and forth is going to take time. A fast solution would be to take ownership of the blob memory from the Field_blob, and then give it back, using either String::swap or a pair of String::release and String::reset; this won't need to allocate or memcpy anything.

For example, every queue element has a `String backup`. It's empty, not alloced, no buffer, nothing. When a Field_blob is put into the queue, you backup.swap(Field_blob::value). When it's popped, you swap again. This should work even if the memory is owned by the engine and Field_blob::value is empty. Of course, there may be more than one blob, so it should really be an array of backup strings.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
Sergei,
On Sat, Oct 31, 2020 at 12:54 PM Sergei Golubchik wrote:
Hi, Aleksey!
On Oct 25, Aleksey Midenkov wrote:
On Wed, Oct 21, 2020 at 4:19 PM Sergei Golubchik wrote:
On Oct 20, Aleksey Midenkov wrote:
revision-id: 0cd5377e9ff (mariadb-10.2.31-357-g0cd5377e9ff)
parent(s): dc716da4571
author: Aleksey Midenkov
committer: Aleksey Midenkov
timestamp: 2020-08-17 00:52:35 +0300
message: MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
No, I think we need to go deeper here. The bug is not only not caused by system versioning, it is also not caused by virtual columns.
The problem is that partitioning copies record[0] out and then back to save and restore values. This does not work for blobs, because record[0] may store the pointer to the memory that Field_blob owns and can free or reallocate as needed.
Normally it's not the problem because typically after row reads the engine owns the memory, not Field_blob. When a virtual column is set to a calculated value - then Field_blob owns the memory and we have this bug.
Yes, you're right... Actually, the engine owning blobs is not good either, because the engine does not guarantee to keep the previous record's blobs.
I think engine owning blobs is safe. Because they belong to different handlers, and every handler is pretty much guaranteed to keep the blob for the last read row until the next row is read.
Agreed.
My solution is to make a special mem_root (m_ordered_root) and copy the blobs into m_ordered_root when the record is stored into m_ordered_rec_buffer. That m_ordered_root is freed when m_ordered_rec_buffer is destroyed (m_ordered_rec_buffer can be allocated on m_ordered_root too). When the record is restored from m_ordered_rec_buffer, the table fields are made to own the blobs: we can destroy m_ordered_root at any time and the table->record[0] pointers will still be valid.
Just from your description it doesn't look like a good idea, MEM_ROOT can be only freed as a whole, so you'll keep allocating memory for blobs there and it'll keep growing, remembering all blobs you ever had in the table.
Right. I figured that out after I wrote the reply and made an additional patch with the array of mem_roots.
But I'll see the patch, perhaps I misunderstood what you're doing.
If I'm right though, then I'd suggest to look only at the case when Field_blob owns the memory. And to avoid copying blobs - they are big, moving them back and forth is going to take time. A fast solution could be to take the ownership of the blob memory from the Field_blob. And then give it back. Using either String::swap or a pair of String::release and String::reset - this won't need to allocate or memcpy anything.
This complicates things a little and makes the patch a bit less universal: we assert that only table->field can own and free the blob buffer; otherwise it is safe. But OK, I made one more patch for your consideration: https://github.com/MariaDB/server/commits/2f9bbf392072361175c889cbf826f11c98...
For example, every queue element has a `String backup`. It's empty, not alloced. no buffer, nothing. When a Field_blob is put into a queue, you backup::swap(Field_blob::value). When it's popped you swap again. This should work even if the memory is owned by the engine and the Field_blob::value is empty. Of course, there may be more than one blob, so it should really be an array of backup strings.
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
--
All the best,
Aleksey Midenkov
@midenok