Hi Alexey, On Fri, May 15, 2020 at 04:26:02PM +0400, Alexey Botchkov wrote:
See the next iteration here https://github.com/MariaDB/server/commit/692cb566096d61b240ec26e84fc7d3c7d13...
So the ha_json_table;:position() was implemented. The size of the reference depends on the nested path depth - each level adds some 9 bytes to the initial 5. That can be decreased but i decided to keep it simple initially and i doubt we're going to have really deep nesting in realistic scenarios.
This is ok. I'm not sure if the server has a limit on the rowid size. Perhaps there is, and in that case we just need to limit the depth we allow.
And i'd like to ask you for some testing queries here. Or just the model how to produce queries that will be using the ::position() extensively.
I've provided a testcase for the position() call in my previous email. https://gist.github.com/spetrunia/a905d51731c58f5439bd9f70c64cdc43 As for rnd_pos(), one case that I'm aware of is when the query does a filesort, and the row being sorted either includes a blob, or has the total length exceeding certain limit. I've tried constructing an example with blobs, and it fails with an assertion before the execution reaches rnd_pos() calls: select * from json_table('[{"color": "blue", "price": 50}, {"color": "red", "price": 100}, {"color": "rojo", "price": 10.0}, {"color": "blanco", "price": 11.0}]', '$[*]' columns( color varchar(100) path '$.color', price text path '$.price', seq for ordinality ) ) as T order by color desc; fails an assertion: mysqld: /home/psergey/dev-git/10.5-json/sql/field.cc:8309: virtual int Field_blob::store(const char*, size_t, CHARSET_INFO*): Assertion `marked_for_write_or_computed()' failed. Once that is fixed, this should use position() and rnd_pos() calls. This will likely expose more issues with rnd_pos(), see my comments to the Json_table_nested_path::set_position below. == A problem with VIEWs == mysql> select * -> from -> json_table( -> '[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]', -> '$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ; +-------+ | color | +-------+ | red | | black | +-------+ 2 rows in set (6.34 sec) Good so far. Now, let's try creating a VIEW from this: create view v1 as select * from json_table( '[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]', '$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ; select * from v1; ERROR 4041 (HY000): Unexpected end of JSON path in argument 2 to function 'JSON_TABLE' Examining the .frm file, I see something that looks like garbage data: query=select `T_A`.`color` AS `color` from JSON_TABLE(\'[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\', \'\'\0<D8>7^A<9C><FF>^?\0\0\0\0\0\0\0\0\0\0\0\0^A<A5><A5><A5><A5><A5>^H<D0>kWUU\0\0[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\0$[*]\'\' COLUMNS (`color` PATH \'^A<9C><FF>^?\0\0\0\0\0\0^A\0\0\0\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5>\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5> <A5><A5><A5><A5><A5><A5> D^A<9C><FF>^?\0\0H:^A<9C><FF>^?\0\0<A5><A5><A5><A5><A5><A5><A5><A5>100\0<8F><8F><8F><8F>$.color\')) T_A where `T_A`.`color` <> \'azul\' EXPLAIN EXTENDED ...; SHOW WARNINGS; - also print something odd. == EXPLAIN [FORMAT=JSON] == I think EXPLAIN output should provide an indication that a table function is used. MySQL does it like so: <TODO> == Unneeded recursive rules in grammar == I've already complained about this: the grammar has recursive ON EMPTY, ON ERROR rules, which cause the following to be accepted (note the two ON EMPTY clauses) : select * from json_table('[{"color": "blue", "price": 50},{"color": "red"}]', '$[*]' columns( price varchar(255) path '$.price' default 'xyz' on empty default 'abc' on empty ) ) as T;
commit 692cb566096d61b240ec26e84fc7d3c7d13f024c Author: Alexey Botchkov <holyfoot@askmonty.org> Date: Fri May 15 15:25:28 2020 +0400
MDEV-17399 Add support for JSON_TABLE.
Syntax for JSON_TABLE added. The ha_json_table handler added. Executing the JSON_TABLE we create the temporary table of the ha_json_table, add dependencies of other tables and sometimes conditions to WHERE.
and sometimes conditions to WHERE.
I think this is not true anymore? ...
diff --git a/include/my_base.h b/include/my_base.h index 7efa5eb9673..89ef3e8e7c1 100644 --- a/include/my_base.h +++ b/include/my_base.h @@ -523,6 +523,13 @@ enum ha_base_keytype { #define HA_ERR_TABLESPACE_MISSING 194 /* Missing Tablespace */ #define HA_ERR_SEQUENCE_INVALID_DATA 195 #define HA_ERR_SEQUENCE_RUN_OUT 196 + +/* + Share the error code to not increment the HA_ERR_LAST for now, + as it disturbs some storage engine's tests. + Probably should be fixed later. +*/ +#define HA_ERR_INVALID_JSON HA_ERR_TABLE_IN_FK_CHECK
We definitely can't have this in the final patch. We need to either: A. Use an engine-specific error code. Check out MyRocks and ha_rocksdb::get_error_message() for an example of how this is done B. Indeed introduce a generic "Invalid JSON" error code. I'm hesitant about B, let's discuss this with other developers. ...
diff --git a/sql/table_function.cc b/sql/table_function.cc new file mode 100644 index 00000000000..71f2378ce7d --- /dev/null +++ b/sql/table_function.cc ... + +class ha_json_table: public handler +{
Please add comments about these
+protected: + Table_function_json_table *m_jt; + String m_tmps; + String *m_js; + uchar *m_cur_pos; +public: + ha_json_table(TABLE_SHARE *share_arg, Table_function_json_table *jt): + handler(&table_function_hton.m_hton, share_arg), m_jt(jt) + { + /* + set the mark_trx_read_write_done to avoid the + handler::mark_trx_read_write_internal() call. + It relies on &ha_thd()->ha_data[ht->slot].ha_info[0] to be set. + But we don't set the ha_data for the ha_json_table, and + that call makes no sence for ha_json_table. + */ + mark_trx_read_write_done= 1; + ref_length= (jt->m_depth+1) * (4 + 1) + + jt->m_depth * (sizeof(Json_table_nested_path *)); + } + ~ha_json_table() {} + handler *clone(const char *name, MEM_ROOT *mem_root) { return NULL; } + const char *index_type(uint inx) { return "NONE"; } + /* Rows also use a fixed-size format */ + enum row_type get_row_type() const { return ROW_TYPE_FIXED; } + ulonglong table_flags() const + { + return (HA_FAST_KEY_READ | /*HA_NO_BLOBS |*/ HA_NULL_IN_KEY | + HA_CAN_SQL_HANDLER | + HA_REC_NOT_IN_SEQ | HA_NO_TRANSACTIONS | + HA_HAS_RECORDS | HA_CAN_HASH_KEYS); + } + ulong index_flags(uint inx, uint part, bool all_parts) const + { + return HA_ONLY_WHOLE_INDEX | HA_KEY_SCAN_NOT_ROR; + } + uint max_supported_keys() const { return 1; } + uint max_supported_key_part_length() const { return MAX_KEY_LENGTH; } + double scan_time() { return 1000000.0; } + double read_time(uint index, uint ranges, ha_rows rows) { return 0.0; }
The above two functions are never called. Please * add a comment saying that. * add a DBUG_ASSERT() into them to back the point made by the comment :-)
+ int open(const char *name, int mode, uint test_if_locked); + int close(void) { return 0; } + int rnd_init(bool scan); + int rnd_next(uchar *buf); + int rnd_pos(uchar * buf, uchar *pos); + void position(const uchar *record); + int can_continue_handler_scan() { return 1; } + int info(uint); + int extra(enum ha_extra_function operation); + THR_LOCK_DATA **store_lock(THD *thd, THR_LOCK_DATA **to, + enum thr_lock_type lock_type) + { return NULL; } + int create(const char *name, TABLE *form, HA_CREATE_INFO *create_info) + { return 1; } +private: + void update_key_stats(); +}; + ...
+void Json_table_nested_path::set_position(const char *j_start, const uchar *pos) +{
* This needs a comment. * I don't see where the value of m_ordinality_counter is restored? * The same about m_null - its value is not restored either?
+ if (m_nested) + { + memcpy(&m_cur_nested, pos, sizeof(m_cur_nested)); + pos+= sizeof(m_cur_nested); + } + + m_engine.s.c_str= (uchar *) j_start + sint4korr(pos); + m_engine.state= (int) pos[4]; + if (m_cur_nested) + m_cur_nested->set_position(j_start, pos); +} + ...
+int ha_json_table::info(uint) +{ + /* We don't want 0 or 1 in stats.records. */ + stats.records= 4; + return 0;
Does this value matter? As far as I understand it doesn't, as the optimizer will use the estimates obtained from Table_function_json_table::get_estimates. Please add a comment about this.
+} + ...
Please document this function. What's last_column? Why does print() method get it and return it?
+int Json_table_nested_path::print(THD *thd, TABLE_LIST *sql_table, String *str, + List_iterator_fast<Json_table_column> &it, + Json_table_column **last_column) +{ + Json_table_nested_path *c_path= this; ...
diff --git a/sql/table_function.h b/sql/table_function.h new file mode 100644 index 00000000000..49d650cb7b2 --- /dev/null +++ b/sql/table_function.h @@ -0,0 +1,168 @@ ... +#include <json_lib.h> + +/* + The Json_table_nested_path represents the 'current nesting' level + for a set of JSON_TABLE columns. + Each column (Json_table_column instance) is linked with corresponding + 'nested path' object and gets it's piece of JSON to parse during the computation + phase. + The root 'nested_path' is always present as a part of Table_function_json_table, + then other 'nested_paths' can be created and linked into a tree structure when new + 'NESTED PATH' is met. The nested 'nested_paths' are linked with 'm_nested', the same-level + 'nested_paths' are linked with 'm_next_nested'. + So for instance + JSON_TABLE( '...', '$[*]' + COLUMNS( a INT PATH '$.a' , + NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$', + NESTED PATH '$.c[*]' COLUMNS(x INT PATH '$')), + NESTED PATH '$.n[*]' COLUMNS (z INT PAHT '$')) + results in 4 'nested_path' created: + root nested_b nested_c nested_n + m_path '$[*]' '$.b[*]' '$.c[*]' '$.n[*] + m_nested &nested_b &nested_c NULL NULL + n_next_nested NULL &nested_n NULL NULL + +and 4 columns created: + a b x z + m_nest &root &nested_b &nested_c &nested_n +*/ + + +class Json_table_column; + +class Json_table_nested_path : public Sql_alloc +{ +public: + bool m_null; + json_path_t m_path; + json_engine_t m_engine; + json_path_t m_cur_path; + + /* Counts the rows produced. Value is set to the FOR ORDINALITY coluns */ + longlong m_ordinality_counter; + + Json_table_nested_path *m_parent; + Json_table_nested_path *m_nested, *m_next_nested; + Json_table_nested_path **m_nested_hook; + Json_table_nested_path *m_cur_nested;
Please add documentation about the above members. I see the diagram above, but I think text descriptions are also needed for each member. What's m_nested_hook?
+ Json_table_nested_path(Json_table_nested_path *parent_nest): + m_parent(parent_nest), m_nested(0), m_next_nested(0), + m_nested_hook(&m_nested) {} + int set_path(THD *thd, const LEX_CSTRING &path); + void scan_start(CHARSET_INFO *i_cs, const uchar *str, const uchar *end); + int scan_next(); + int print(THD *thd, TABLE_LIST *sql_table, String *str, + List_iterator_fast<Json_table_column> &it, + Json_table_column **last_column); + void get_current_position(const char *j_start, uchar *pos) const; + void set_position(const char *j_start, const uchar *pos); +}; ...
+class Table_function_json_table : public Sql_alloc
Please document the class and the data members.
+{ +public: + Item *m_json; + Json_table_nested_path m_nested_path; + List<Json_table_column> m_columns; + table_map m_dep_tables; + uint m_depth, m_cur_depth; + + Table_function_json_table(Item *json): m_json(json), m_nested_path(0), + m_depth(0), m_cur_depth(0) {} + + /* + Used in sql_yacc.yy. + Represents the current NESTED PATH level being parsed. + */ + Json_table_nested_path *m_sql_nest; + void add_nested(Json_table_nested_path *np); + void leave_nested();
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog