[Maria-developers] JSON_TABLE: on default values
Hi Alexey, At the moment MariaDB requires that the values in DEFAULT clauses are quoted. Example: select * from json_table( '{"intval": 1000}', '$' columns( col1 int path '$.intval_' default '100' on empty ) ) as T; here, "100" must be quoted, otherwise one gets a parse error. However, the quoted value is interpreted as an SQL literal. This looks puzzling. MySQL-8 also requires that the default value is quoted, but they have a (very odd) reason for it: they interpret the default value as JSON: https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/json-table-functions.html says: DEFAULT json_string ON EMPTY: the provided json_string is parsed as JSON, as long as it is valid, and stored instead of the missing value. Column type rules also apply to the default value. I am not sure why MySQL chose to do this. Looking into the SQL Standard, one can see: <JSON table regular column definition> ::= <column name> <data type> [ PATH <JSON table column path specification> ] [ <JSON table column empty behavior> ON EMPTY ] [ <JSON table column error behavior> ON ERROR ] <JSON table column empty behavior> ::= ERROR | NULL | DEFAULT <value expression> ... This doesn't say whether the <value expression> should be interepreted as JSON or just as a value. But one can find this passage: <quote> Without Feature T826, “General value expression in ON ERROR or ON EMPTY clauses”, the <value expression> contained in <JSON table column empty behavior> or <JSON table column error behavior> contained in a <JSON table regular column definition> JTRCD shall be a <literal> that can be cast to the data type specified by the <data type> contained in JTRCD without raising an exception condition according to the General Rules of Subclause 6.13, “<cast specification>”. </quote> The important part is: ... shall be a <literal> that can be cast to the data type specified ... which means it is not JSON. It is just a literal, and literal can be a string literal (in quotes, 'string') or an integer literal (without quotes) or other kind of literal. Btw, Oracle Database allows non-string literals in the default clause: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=9af7e43ede77ee285e1a65f1f419d3bd What are your thoughts on this? Is MariaDB's behavior intentional? Should we follow the standard and allow all kinds of literals? What was the reason for the limitation that default values are quoted? BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Hi, Sergey! I meditated about it for some time. I remember i was thinking on that part before and did that so for some reason. Though either i was wrong or didn't finish what i planned. This time i'd say we should allow numeric constants there too. Here's the patch i'd push to fix this: https://github.com/MariaDB/server/commit/9c518e4cc9b0569cae2daa5a4024e209293... Best regards. HF On Wed, May 26, 2021 at 8:01 PM Sergey Petrunia <sergey@mariadb.com> wrote:
Hi Alexey,
At the moment MariaDB requires that the values in DEFAULT clauses are quoted. Example:
select * from json_table( '{"intval": 1000}', '$' columns( col1 int path '$.intval_' default '100' on empty ) ) as T;
here, "100" must be quoted, otherwise one gets a parse error. However, the quoted value is interpreted as an SQL literal. This looks puzzling.
MySQL-8 also requires that the default value is quoted, but they have a (very odd) reason for it: they interpret the default value as JSON:
https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/json-table-functions.html says:
DEFAULT json_string ON EMPTY: the provided json_string is parsed as JSON, as long as it is valid, and stored instead of the missing value. Column type rules also apply to the default value.
I am not sure why MySQL chose to do this. Looking into the SQL Standard, one can see:
<JSON table regular column definition> ::= <column name> <data type> [ PATH <JSON table column path specification> ] [ <JSON table column empty behavior> ON EMPTY ] [ <JSON table column error behavior> ON ERROR ]
<JSON table column empty behavior> ::= ERROR | NULL | DEFAULT <value expression>
... This doesn't say whether the <value expression> should be interepreted as JSON or just as a value. But one can find this passage:
<quote> Without Feature T826, “General value expression in ON ERROR or ON EMPTY clauses”, the <value expression> contained in <JSON table column empty behavior> or <JSON table column error behavior> contained in a <JSON table regular column definition> JTRCD shall be a <literal> that can be cast to the data type specified by the <data type> contained in JTRCD without raising an exception condition according to the General Rules of Subclause 6.13, “<cast specification>”. </quote>
The important part is:
... shall be a <literal> that can be cast to the data type specified ...
which means it is not JSON. It is just a literal, and literal can be a string literal (in quotes, 'string') or an integer literal (without quotes) or other kind of literal.
Btw, Oracle Database allows non-string literals in the default clause:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=9af7e43ede77ee285e1a65f1f419d3bd
What are your thoughts on this? Is MariaDB's behavior intentional? Should we follow the standard and allow all kinds of literals? What was the reason for the limitation that default values are quoted?
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Hi Alexey, Ok I've filed https://jira.mariadb.org/browse/MDEV-25822 for this. Let me review the patch. On Sun, May 30, 2021 at 06:44:19AM +0400, Alexey Botchkov wrote:
Hi, Sergey!
I meditated about it for some time. I remember i was thinking on that part before and did that so for some reason. Though either i was wrong or didn't finish what i planned. This time i'd say we should allow numeric constants there too. Here's the patch i'd push to fix this: https://github.com/MariaDB/server/commit/9c518e4cc9b0569cae2daa5a4024e209293...
Best regards. HF
On Wed, May 26, 2021 at 8:01 PM Sergey Petrunia <sergey@mariadb.com> wrote:
Hi Alexey,
At the moment MariaDB requires that the values in DEFAULT clauses are quoted. Example:
select * from json_table( '{"intval": 1000}', '$' columns( col1 int path '$.intval_' default '100' on empty ) ) as T;
here, "100" must be quoted, otherwise one gets a parse error. However, the quoted value is interpreted as an SQL literal. This looks puzzling.
MySQL-8 also requires that the default value is quoted, but they have a (very odd) reason for it: they interpret the default value as JSON:
https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/json-table-functions.html says:
DEFAULT json_string ON EMPTY: the provided json_string is parsed as JSON, as long as it is valid, and stored instead of the missing value. Column type rules also apply to the default value.
I am not sure why MySQL chose to do this. Looking into the SQL Standard, one can see:
<JSON table regular column definition> ::= <column name> <data type> [ PATH <JSON table column path specification> ] [ <JSON table column empty behavior> ON EMPTY ] [ <JSON table column error behavior> ON ERROR ]
<JSON table column empty behavior> ::= ERROR | NULL | DEFAULT <value expression>
... This doesn't say whether the <value expression> should be interepreted as JSON or just as a value. But one can find this passage:
<quote> Without Feature T826, “General value expression in ON ERROR or ON EMPTY clauses”, the <value expression> contained in <JSON table column empty behavior> or <JSON table column error behavior> contained in a <JSON table regular column definition> JTRCD shall be a <literal> that can be cast to the data type specified by the <data type> contained in JTRCD without raising an exception condition according to the General Rules of Subclause 6.13, “<cast specification>”. </quote>
The important part is:
... shall be a <literal> that can be cast to the data type specified ...
which means it is not JSON. It is just a literal, and literal can be a string literal (in quotes, 'string') or an integer literal (without quotes) or other kind of literal.
Btw, Oracle Database allows non-string literals in the default clause:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=9af7e43ede77ee285e1a65f1f419d3bd
What are your thoughts on this? Is MariaDB's behavior intentional? Should we follow the standard and allow all kinds of literals? What was the reason for the limitation that default values are quoted?
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
-- BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Hi Alexey,
On Sun, May 30, 2021 at 06:44:19AM +0400, Alexey Botchkov wrote:
Hi, Sergey!
I meditated about it for some time. I remember i was thinking on that part before and did that so for some reason. Though either i was wrong or didn't finish what i planned. This time i'd say we should allow numeric constants there too. Here's the patch i'd push to fix this: https://github.com/MariaDB/server/commit/9c518e4cc9b0569cae2daa5a4024e209293...
In the next patch, please refer to MDEV-25822 in the commit comment. select * from json_table('{"a": "b"}', '$' columns(col1 varchar(32) path '$.fooo' default 123456 on empty) ) as T; +--------+ | col1 | +--------+ | 123456 | +--------+ Ok. "default 0.5" also works "default -0.5" DOESN'T - it produces a parse error. "default 8446744073709551615" - works, bigint is accepted "default 18446744073709551615" - DOESN'T, bigint unsigned is not accepted Other kinds of literals like "DATE '20201-01-01'" are not accepted either. The cause of this is the grammar, which allows to accept only certain kinds of literals:
diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy index 37cdfc20030ab..192a4879b751d 100644 --- a/sql/sql_yacc.yy +++ b/sql/sql_yacc.yy @@ -1324,6 +1324,7 @@ End SQL_MODE_ORACLE_SPECIFIC */ TEXT_STRING NCHAR_STRING json_text_literal + json_text_literal_or_num
%type <lex_str_ptr> opt_table_alias_clause @@ -11550,6 +11551,26 @@ json_text_literal: } ;
+json_text_literal_or_num: + json_text_literal + | NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + | LONG_NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + | DECIMAL_NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + | FLOAT_NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + ; + join_table_list: derived_table_list { MYSQL_YYABORT_UNLESS($$=$1); } ;
When I see this, I wonder if it was possible to make use of a general 'literal' rule instead? That rule differs from the current symbols: it produces Item expressions. Is this a problem? (I don't think it should be). The only concern about allowing more kinds of literals is NULL literals: JSON TABLE has special provision for emitting NULL: NULL ON {EMPTY|ERROR} which makes supporting DEFAULT NULL ON {EMPTY|ERROR} redundant. I don't see any problem if we support this, though. BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
When I see this, I wonder if it was possible to make use of a general 'literal' rule instead? Well the 'literal' rule doesn't accept the -0.5 :) But basically agree.
DEFAULT NULL ON {EMPTY|ERROR} redundant. I don't see any problem if we support this, though. Let it be so.
Here is the new patch. https://github.com/MariaDB/server/commit/8dae7ee02f98e71e9352d73d1da235fd412... Best regards. HF On Tue, Jun 8, 2021 at 6:50 PM Sergey Petrunia <sergey@mariadb.com> wrote:
Hi Alexey,
Hi, Sergey!
I meditated about it for some time. I remember i was thinking on that
On Sun, May 30, 2021 at 06:44:19AM +0400, Alexey Botchkov wrote: part
before and did that so for some reason. Though either i was wrong or didn't finish what i planned. This time i'd say we should allow numeric constants there too. Here's the patch i'd push to fix this:
https://github.com/MariaDB/server/commit/9c518e4cc9b0569cae2daa5a4024e209293...
In the next patch, please refer to MDEV-25822 in the commit comment.
select * from json_table('{"a": "b"}', '$' columns(col1 varchar(32) path '$.fooo' default 123456 on empty) ) as T; +--------+ | col1 | +--------+ | 123456 | +--------+
Ok.
"default 0.5" also works "default -0.5" DOESN'T - it produces a parse error. "default 8446744073709551615" - works, bigint is accepted "default 18446744073709551615" - DOESN'T, bigint unsigned is not accepted
Other kinds of literals like "DATE '20201-01-01'" are not accepted either.
The cause of this is the grammar, which allows to accept only certain kinds of literals:
diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy index 37cdfc20030ab..192a4879b751d 100644 --- a/sql/sql_yacc.yy +++ b/sql/sql_yacc.yy @@ -1324,6 +1324,7 @@ End SQL_MODE_ORACLE_SPECIFIC */ TEXT_STRING NCHAR_STRING json_text_literal + json_text_literal_or_num
%type <lex_str_ptr> opt_table_alias_clause @@ -11550,6 +11551,26 @@ json_text_literal: } ;
+json_text_literal_or_num: + json_text_literal + | NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + | LONG_NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + | DECIMAL_NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + | FLOAT_NUM + { + Lex->json_table->m_text_literal_cs= NULL; + } + ; + join_table_list: derived_table_list { MYSQL_YYABORT_UNLESS($$=$1); } ;
When I see this, I wonder if it was possible to make use of a general 'literal' rule instead? That rule differs from the current symbols: it produces Item expressions. Is this a problem? (I don't think it should be).
The only concern about allowing more kinds of literals is NULL literals: JSON TABLE has special provision for emitting NULL:
NULL ON {EMPTY|ERROR}
which makes supporting
DEFAULT NULL ON {EMPTY|ERROR}
redundant. I don't see any problem if we support this, though.
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
participants (2)
-
Alexey Botchkov
-
Sergey Petrunia