[Maria-developers] Review for: MDEV-17399 Add support for JSON_TABLE, part #3
Hi Alexey, Please find the next batch of input below. (I haven't finished the review but I'm sending it now so you can start addressing it)
commit 68ca18518337443090bdeddf6d612129b6843c21 Author: Alexey Botchkov <holyfoot@askmonty.org> Date: Fri Apr 17 14:42:40 2020 +0400
MDEV-17399 Add support for JSON_TABLE.
Syntax for JSON_TABLE added. The ha_json_table handler added. Executing the JSON_TABLE we create the temporary table of the ha_json_table, add dependencies of other tables and sometimes conditions to WHERE.
...
+public: + ha_json_table(TABLE_SHARE *share_arg, Table_function_json_table *jt): + handler(&table_function_hton.m_hton, share_arg), m_jt(jt) + { + mark_trx_read_write_done= 1;
This definitely needs a comment.
+ ~ha_json_table() {} + handler *clone(const char *name, MEM_ROOT *mem_root) { return NULL; } + const char *index_type(uint inx) { return "HASH"; }
Why is ha_json_table still trying to pretend it supports indexes? Is this a left-over from the old way of interfacing with the optimizer that can now be removed?
+int Table_function_json_table::setup(THD *thd, TABLE_LIST *sql_table, + COND **cond) +{ + if (m_json->fix_fields(thd, &m_json)) + return TRUE; + + m_dep_tables= m_json->used_tables();
Why does this function have a 'cond' argument? Please remove it if it's not necessary. == Duplicate column names are allowed == select * from JSON_TABLE('{"color": "black", "price": 1000}', "$" COLUMNS (rowSeq FOR ORDINALITY, color VARCHAR(100) PATH '$.color', color VARCHAR(100) PATH '$.price') ) as T; +--------+-------+-------+ | rowSeq | color | color | +--------+-------+-------+ | 0 | black | 1000 | +--------+-------+-------+ Is it really okay that duplicate column names are allowed? == RIGHT JOIN is allowed == create table t1 (item_name varchar(32), item_props varchar(1024)); insert into t1 values ('Laptop', '{"color": "black", "price": 1000}'); insert into t1 values ('Jeans', '{"color": "blue", "price": 50}'); This query should not be allowed, because JSON_TABLE has a dependency on an inner side of an outer join: select * from t1 right join json_table(t1.item_props, '$' columns(color varchar(100) path '$.color', f2 for ordinality) ) as T on 1; === NATURAL JOIN crashes == create table t10 (color varchar(100), f2 int); select * from t10 natural join json_table(t1.item_props, '$' columns(color varchar(100) path '$.color', f2 for ordinality) ) as JT ; Thread 31 "mysqld" received signal SIGSEGV, Segmentation fault. 0x0000555555d8ac12 in Select_limit_counters::get_select_limit (this=0xa5a5a5a5a5a5ad2d) at /home/psergey/dev-git/10.5-json/sql/sql_limit.h:67 (gdb) wher #0 0x0000555555d8ac12 in Select_limit_counters::get_select_limit (this=0xa5a5a5a5a5a5ad2d) at /home/psergey/dev-git/10.5-json/sql/sql_limit.h:67 #1 0x0000555555e2a4a0 in JOIN::optimize_inner (this=0x7fffac018520) at /home/psergey/dev-git/10.5-json/sql/sql_select.cc:1805 #2 0x0000555555e29b12 in JOIN::optimize (this=0x7fffac018520) at /home/psergey/dev-git/10.5-json/sql/sql_select.cc:1612 BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Hi Alexey, More review input, based on the latest code: == It doesn't compile == I get the following compiler error and indeed looks the compiler complains about a real problem there: /home/psergey/dev-git2/10.5-hf-review/sql/item_jsonfunc.cc:340:15: error: ‘code’ may be used uninitialized in this function [-Werror=maybe-uninitialized] my_error(code, MYF(0), JSON_DEPTH_LIMIT, n_param, fname, position); ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ == Rowid support == ha_json_table has these functions: int rnd_pos(uchar * buf, uchar *pos) { return 1; } void position(const uchar *record) {} That is, saving the rowid does nothing, and attempting to read a row by rowid returns an error. It is mandatory for a storage engine to produce distinct rowids for different rows. One user of this property is Duplicate Weedout optimization. Here's a testcase: https://gist.github.com/spetrunia/a905d51731c58f5439bd9f70c64cdc43 I think it is also mandatory to implement rnd_pos() and be able to read the row by its rowid. One known user of this feature is "filesort over blob columns", but I've experienced another error when trying to construct a testcase (see the next section). How do we implement position and rnd_pos(). === Solution #1 === Let the rowid contain the FOR ORDINALITY number for each Json_nested_path. This number will be different for each row and also it will identify the row. I assume that implementing rnd_pos() will be hard, though. We no longer have pointer to values we've returned for row number N. Should we walk the document again from the start? This seems inefficient. Summary: complex and/or inefficient === Solution #2 === when we have produced the output row, let's write it into a temporary table (a HEAP or Aria table that is), and return something that identifies the row we've written. Ideally that should be the rowid of the row we just wrote, but AFAIU the storage engine API doesn't allow one to easily get this. We'll need to either implement such call, or roll our own rowids in form of a hidden auto-increment column. Summary: also there's some work to do, but at least this has an advantage of being potentially useful to other table functions. == Text columns not supported? == I've accidentally discovered that JSON_TABLE cannot have text columns (this is not obvious from the code). select * from json_table('[{"color": "blue", "price": 50}, {"color": "red", "price": 100}]', '$[*]' columns( color varchar(100) path '$.color', price text path '$.price' ) ) as T; Any idea why? (MySQL supports them btw) == Multiple nested paths produce -1 as FOR ORDINALITY column == set @js= ' [ {"color": "blue", "sizes": [1,2,3,4], "prices" : [10,20]}, {"color": "red", "sizes": [10,11,12,13,14], "prices" : [100,200,300]} ]'; select * from json_table(@js, '$[*]' columns( color varchar(4) path '$.color', seq0 for ordinality, nested path '$.sizes[*]' columns (seq1 for ordinality, size int path '$'), nested path '$.prices[*]' columns (seq2 for ordinality, price int path '$') ) ) as T; +-------+------+------+------+------------+-------+ | color | seq0 | seq1 | size | seq2 | price | +-------+------+------+------+------------+-------+ | blue | 1 | 1 | 1 | 2147483647 | NULL | | blue | 1 | 2 | 2 | 2147483647 | NULL | | blue | 1 | 3 | 3 | 2147483647 | NULL | | blue | 1 | 4 | 4 | 2147483647 | NULL | | blue | 1 | 4 | NULL | 1 | 10 | | blue | 1 | 4 | NULL | 2 | 20 | | red | 2 | 1 | 10 | 2 | NULL | | red | 2 | 2 | 11 | 2 | NULL | | red | 2 | 3 | 12 | 2 | NULL | | red | 2 | 4 | 13 | 2 | NULL | | red | 2 | 5 | 14 | 2 | NULL | | red | 2 | 5 | NULL | 1 | 100 | | red | 2 | 5 | NULL | 2 | 200 | | red | 2 | 5 | NULL | 3 | 300 | +-------+------+------+------+------------+-------+ 14 rows in set (0.000 sec) BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
participants (1)
-
Sergey Petrunia