Hi Alexey, Please find my review input below. There is one big issue and a number of smaller ones.
commit 654fdfee33e3eafe3b7f25d7e213717c22ea1e18 Author: Alexey Botchkov <holyfoot@askmonty.org> Date: Mon Mar 30 01:00:28 2020 +0400
MDEV-17399 Add support for JSON_TABLE.
Syntax for JSON_TABLE added. The ha_json_table handler added. Executing the JSON_TABLE we create the temporary table of the ha_json_table, add dependencies of other tables and sometimes conditions to WHERE.
== The big issue == I think some design choices of this patch are questionable: The temporary table has a unique key. What is it for? The key is defined over the field holding the JSON text. What if the JSON text gets bigger than MAX_KEY_LEN? Then, there is an Item_func_eq(json_text, temp_table_field), which is set to be "always equal" with set_always_equal(). This looks like a hack. EXPLAIN shows that JSON_TABLE uses a kind of ref access which I think is counter-intuitive. What is the goal of all this? The JSON_TABLE table is "laterally dependent" on the tables that are referred from its arguments. For the optimizer, this means: 1. The JSON_TABLE table must be put after its dependencies (this is similar to outer joins) 2. The JSON_TABLE can only be read when there are "current records" available for each of the dependencies (an example when they are not available: join buffering. There are many records in the buffer, which one we should produce matches for? For outer joins, join buffering code has a complex system to track this). The above "kludges" make an attempt to trick the optimizer into meeting requrirements #1 and #2 but I think this is approach is hack-ish and has holes. It would be much better if the optimizer was explicitly aware that a table is "laterally dependent" on other tables. "table dependencies" are already there, so the property #1 is already taken care of. We still need to take care of #2 which means disable join buffering in such cases. I think this should be easy to do. == Small bits from trying out the patch == === FOR ORDINALITY doesn't seem to work === select * from json_table('[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]', '$[*]' COLUMNS(a for ordinality) ) as tt; +------+ | a | +------+ | | | | +------+ 2 rows in set (0.001 sec) === NULL ON ERROR doesn't seem to work === select * from json_table('[{"a": 1, "b": [11,111]}, {"a": "bbbb", "b": [22,222]}]', '$[*]' COLUMNS( a DECIMAL(6,3) PATH '$.a' NULL ON ERROR)) as tt; +-------+ | a | +-------+ | 1.000 | | 0.000 | +-------+ I would expect the second row to have NULL, not 0. === Lack of test coverage === Other parts, like NESTED PATH, also have no test coverage at all. I think this must be addressed before this is done. === No error on Invalid Json input === If I pass invalid JSON, I get no warning or error or anything: MariaDB [test]> select * from json_table('[{"a": 1, invalid-json', '$[*]' COLUMNS( a INT PATH '$.a')) as tt; +------+ | a | +------+ | 1 | +------+ 1 row in set (0.001 sec) === Recursive ON-rule in the grammar === The json_opt_on_empty_or_error rule in sql_yacc.yy is recursive and causes the following to be accepted: select * from json_table( '[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]', '$[*]' COLUMNS(a INT PATH '$.a' NULL ON EMPTY NULL ON EMPTY NULL ON EMPTY) ) as tt; Is this intentional? === No error on invalid LATERAL dependency === create table t1 (item_name varchar(32), item_props varchar(1024)); insert into t1 values ('Laptop', '{"color": "black", "price": 1000}'); insert into t1 values ('Jeans', '{"color": "blue", "price": 50}'); MariaDB [test]> select * from t1 right join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1; Empty set (0.000 sec) The above query cannot be executed as left join execution requires that T is computed before 1, but t is dependent on t1. We dont get an error for this though. MySQL produces this: ERROR 3668 (HY000): INNER or LEFT JOIN must be used for LATERAL references made by 'T' with left join, there's a demo of how the trickery with the optimizer was successful (and I think one can construct other examples of this): MariaDB [test]> select * from t1 left join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1; +-----------+-----------------------------------+-------+ | item_name | item_props | color | +-----------+-----------------------------------+-------+ | Laptop | {"color": "black", "price": 1000} | blue | | Jeans | {"color": "blue", "price": 50} | blue | +-----------+-----------------------------------+-------+ 2 rows in set (0.002 sec) MariaDB [test]> explain select * from t1 left join json_table(t1.item_props,'$' columns( color varchar(100) path '$.color')) as T on 1; +------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 2 | | | 1 | SIMPLE | T | ALL | NULL | NULL | NULL | NULL | 40 | Using where; Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+ 2 rows in set (0.001 sec) === AS is required with table alias === sql_yacc.yy has this:
+table_function: + JSON_TABLE_SYM '(' expr ',' TEXT_STRING_sys + { ... + } + json_table_columns_clause ')' AS ident_table_alias
Is the 'AS' really required? MySQL-8 doesn't seem to require it. BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog