[Maria-developers] Please review MDEV-17832 Protocol: extensions for Pluggable types and JSON, GEOMETRY
Hi, Sergei, Georg, Please review a fixed version of the patch for MDEV-17832. There are two files attached: - mdev-17821.v18.diff (server changes) - mdev-17821-cli.v06.diff (libmariadb changes) Comparing to the previous version, this version: 1. Adds a new structure MA_FIELD_EXTENSION 2. Moves extended data type information from MYSQL_FIELD to MYSQL_FIELD::extension in the client-server implementation. Note, in case of embedded server, the extended metadata is stored directly to MYSQL_FIELD. 3. Adds a new API function mariadb_field_metadata_attr(), to extract metadata from MYSQL_FIELD. 4. Changes the way how the metadata is packed on the wire from "easily human readable" to "easily parse-able", which: - makes the things faster - allows to transfer arbitrary binary data in the future, if needed. Every metadata chunk is now encoded as: a. chunk type (1 byte) b. chunk data length (1 byte) c. chunk data (according to #b) For now, two chunk types are implemented: - data type name (used for GEOMETRY sub-types, and for INET6) - format name (for JSON) Thanks!
Hi Bar, 2 things I'd like to change: +typedef struct st_mariadb_metadata_string +{ + const char *str; + size_t length; +} MARIADB_FIELD_METADATA_STRING; We already have MYSQL_LEX_STRING for dynamic columns and MARIADB_STRING for replication API. I think we should just move MARIADB_STRING from mariadb_rpl.h to mysql.h and use it instead. + + +MARIADB_FIELD_METADATA_STRING + mariadb_field_metadata_attr(const MYSQL_FIELD *field, + enum mariadb_field_metadata_attr_t type); I think we should have a more general function, so that we don't need to add a new function when we need to extend MYSQL_FIELD (but it can be used also for retrieving values from other members of MYSQL_FIELD): enum mariadb_field_info_type { FIELD_INFO_DATA_TYPE_NAME, FIELD_INFO_DATA_TYPE_FORMAT /* Later I can add FIELD_INFO_CATALOG, FIELD_INFO_TABLE, .. */ } my_bool STDCALL mariadb_get_field_info(const MYSQL_FIELD *field, enum mariadb_field_info_type, void *arg, ..) We have a similiar function in mariadb_lib.c: mariadb_get_infov(): Also the function needs to be added to the MARIADB_SYMBOLS in libmariadb/CMakeLists.txt Otherwise it looks ok! /Georg On Wed, Feb 26, 2020 at 3:40 PM Alexander Barkov <bar@mariadb.com> wrote:
Hi, Sergei, Georg,
Please review a fixed version of the patch for MDEV-17832.
There are two files attached: - mdev-17821.v18.diff (server changes) - mdev-17821-cli.v06.diff (libmariadb changes)
Comparing to the previous version, this version:
1. Adds a new structure MA_FIELD_EXTENSION
2. Moves extended data type information from MYSQL_FIELD to MYSQL_FIELD::extension in the client-server implementation.
Note, in case of embedded server, the extended metadata is stored directly to MYSQL_FIELD.
3. Adds a new API function mariadb_field_metadata_attr(), to extract metadata from MYSQL_FIELD.
4. Changes the way how the metadata is packed on the wire from "easily human readable" to "easily parse-able", which: - makes the things faster - allows to transfer arbitrary binary data in the future, if needed.
Every metadata chunk is now encoded as:
a. chunk type (1 byte) b. chunk data length (1 byte) c. chunk data (according to #b)
For now, two chunk types are implemented: - data type name (used for GEOMETRY sub-types, and for INET6) - format name (for JSON)
Thanks!
-- Georg Richter, Senior Software Engineer MariaDB Corporation Ab
Hi Georg, Thanks for reviewing! See comments inline: On 2/26/20 7:58 PM, Georg Richter wrote:
Hi Bar,
2 things I'd like to change:
+typedef struct st_mariadb_metadata_string +{ + const char *str; + size_t length; +} MARIADB_FIELD_METADATA_STRING;
We already have MYSQL_LEX_STRING for dynamic columns and MARIADB_STRING for replication API. I think we should just move MARIADB_STRING from mariadb_rpl.h to mysql.h and use it instead.
These structures do not work. They use a "char *" pointer: struct st_mysql_lex_string { char *str; size_t length; }; typedef struct st_mysql_lex_string MYSQL_LEX_STRING; typedef struct st_mysql_lex_string LEX_STRING; typedef struct { char *str; size_t length; } MARIADB_STRING; While I need a "const char *" pointer. Perhaps we could add a new structure MARIADB_CSTRING as follows: typedef struct { const char *str; size_t length; } MARIADB_CSTRING; Btw, possibly in many cases we'll be able to change MARIADB_STRING / MYSQL_LEX_STRING to MARIADB_CSTRING.
+ + +MARIADB_FIELD_METADATA_STRING + mariadb_field_metadata_attr(const MYSQL_FIELD *field, + enum mariadb_field_metadata_attr_t type);
I think we should have a more general function, so that we don't need to add a new function when> We have a similiar function in mariadb_lib.c: mariadb_get_infov():
we need to extend MYSQL_FIELD (but it can be used also for retrieving values from other members of MYSQL_FIELD):
enum mariadb_field_info_type { FIELD_INFO_DATA_TYPE_NAME, FIELD_INFO_DATA_TYPE_FORMAT /* Later I can add FIELD_INFO_CATALOG, FIELD_INFO_TABLE, .. */ }
my_bool STDCALL mariadb_get_field_info(const MYSQL_FIELD *field, enum mariadb_field_info_type, void *arg, ..)
We have a similiar function in mariadb_lib.c: mariadb_get_infov():
Sounds fine. I have one concern though: wouldn't it be confusing that mariadb_get_field_info() return strings as "char*", while mariadb_get_field_info() will return strings as MARIADB_CSTRING?
Also the function needs to be added to the MARIADB_SYMBOLS in libmariadb/CMakeLists.txt
Thanks. Will do.
Otherwise it looks ok!
/Georg
On Wed, Feb 26, 2020 at 3:40 PM Alexander Barkov <bar@mariadb.com <mailto:bar@mariadb.com>> wrote:
Hi, Sergei, Georg,
Please review a fixed version of the patch for MDEV-17832.
There are two files attached: - mdev-17821.v18.diff (server changes) - mdev-17821-cli.v06.diff (libmariadb changes)
Comparing to the previous version, this version:
1. Adds a new structure MA_FIELD_EXTENSION
2. Moves extended data type information from MYSQL_FIELD to MYSQL_FIELD::extension in the client-server implementation.
Note, in case of embedded server, the extended metadata is stored directly to MYSQL_FIELD.
3. Adds a new API function mariadb_field_metadata_attr(), to extract metadata from MYSQL_FIELD.
4. Changes the way how the metadata is packed on the wire from "easily human readable" to "easily parse-able", which: - makes the things faster - allows to transfer arbitrary binary data in the future, if needed.
Every metadata chunk is now encoded as:
a. chunk type (1 byte) b. chunk data length (1 byte) c. chunk data (according to #b)
For now, two chunk types are implemented: - data type name (used for GEOMETRY sub-types, and for INET6) - format name (for JSON)
Thanks!
-- Georg Richter, Senior Software Engineer MariaDB Corporation Ab
Hi, Alexander! On Feb 26, Alexander Barkov wrote:
Hi, Sergei, Georg,
Please review a fixed version of the patch for MDEV-17832.
There are two files attached: - mdev-17821.v18.diff (server changes)
Here you are:
diff --git a/include/mysql.h b/include/mysql.h index ec49ca0482a..7ba7503ca4e 100644 --- a/include/mysql.h +++ b/include/mysql.h @@ -411,6 +413,19 @@ MYSQL_FIELD * STDCALL mysql_fetch_fields(MYSQL_RES *res); MYSQL_ROW_OFFSET STDCALL mysql_row_tell(MYSQL_RES *res); MYSQL_FIELD_OFFSET STDCALL mysql_field_tell(MYSQL_RES *res);
+ +typedef struct st_mariadb_metadata_string +{ + const char *str; + size_t length; +} MARIADB_FIELD_METADATA_STRING;
not LEX_CSTRING? it's inside the server. and plugins can use MYSQL_CONST_LEX_STRING, if needed. clients don't see this header, so why a new type?
+ + +MARIADB_FIELD_METADATA_STRING + mariadb_field_metadata_attr(const MYSQL_FIELD *field, + enum mariadb_field_metadata_attr_t type); + + unsigned int STDCALL mysql_field_count(MYSQL *mysql); my_ulonglong STDCALL mysql_affected_rows(MYSQL *mysql); my_ulonglong STDCALL mysql_insert_id(MYSQL *mysql); diff --git a/client/client_metadata.h b/client/client_metadata.h new file mode 100644 index 00000000000..ee0185e5049 --- /dev/null +++ b/client/client_metadata.h @@ -0,0 +1,58 @@ +#ifndef SQL_METADATA_INCLUDED +#define SQL_METADATA_INCLUDED +/* + Copyright (c) 2020, MariaDB Corporation. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; version 2 of the License. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA */ + +/* This file is originally from the mysql distribution. Coded by monty */
what do you mean by that? I didn't find it anywhere copy-paste from sql_string.h ?
+ +#ifdef USE_PRAGMA_INTERFACE +#pragma interface /* gcc class implementation */ +#endif
same?
+ +#include "sql_string.h" + + +class Client_field_metadata +{ + MARIADB_FIELD_METADATA_STRING m_type; + MARIADB_FIELD_METADATA_STRING m_format; +public: + Client_field_metadata(MYSQL_FIELD *field) + :m_type(mariadb_field_metadata_attr(field, + MARIADB_FIELD_METADATA_DATA_TYPE_NAME)), + m_format(mariadb_field_metadata_attr(field, + MARIADB_FIELD_METADATA_FORMAT_NAME)) + { } + void append_to(Binary_string *to) const + { + uint orig_to_length= to->length(); + if (m_type.length) + { + to->append(C_STRING_WITH_LEN("type=")); + to->append(m_type.str, m_type.length); + } + if (m_format.length) + { + if (to->length() != orig_to_length) + to->append(" ", 1); + to->append(C_STRING_WITH_LEN("format=")); + to->append(m_format.str, m_format.length); + } + } +}; + + +#endif // SQL_METADATA_INCLUDED diff --git a/libmysqld/libmysql.c b/libmysqld/libmysql.c index 2be94882303..8e3647f3c24 100644 --- a/libmysqld/libmysql.c +++ b/libmysqld/libmysql.c @@ -732,6 +732,35 @@ mysql_fetch_field(MYSQL_RES *result) }
+/************************************************************************** +** Return mysql field metadata +**************************************************************************/ +MARIADB_FIELD_METADATA_STRING +mariadb_field_metadata_attr(const MYSQL_FIELD *field, + enum mariadb_field_metadata_attr_t type) +{ + static const MARIADB_FIELD_METADATA_STRING null_str= {NULL, 0}; + const char *ptr= field->type_info; + const char *end= field->type_info + field->type_info_length; + for ( ; ptr < end ; ) + { + MARIADB_FIELD_METADATA_STRING res; + size_t itype; + if ((uchar) *ptr > 127) + return null_str; + itype= (size_t) (*ptr++); + res.length= (size_t) (*ptr++); + res.str= ptr; + if (res.str + res.length > end) + return null_str; + if (itype == (size_t) type) + return res; + ptr+= res.length; + } + return null_str;
this means when you need both the type and the format, like in the Client_field_metadata constructor, you call mariadb_field_metadata_attr_t twice, for N metadata fields you need to scan the string N times. why not to do it in one scan, filling in an array MARIADB_FIELD_METADATA_STRING metadata[MARIADB_FIELD_METADATA_MAX];
+} + + /************************************************************************** Move to a specific row and column **************************************************************************/ diff --git a/mysql-test/main/gis.result b/mysql-test/main/gis.result index 936924ffe87..ebdb4ddfd24 100644 --- a/mysql-test/main/gis.result +++ b/mysql-test/main/gis.result @@ -5095,5 +5095,51 @@ ERROR HY000: Operator does not exists: 'CAST(expr AS multilinestring)' SELECT CONVERT(1, MULTIPOLYGON); ERROR HY000: Operator does not exists: 'CAST(expr AS multipolygon)' # +# MDEV-17832 Protocol: extensions for Pluggable types and JSON, GEOMETRY +# +SET NAMES utf8; +CREATE TABLE t1 ( +p POINT, +ls LINESTRING, +pl POLYGON, +mp MULTIPOINT, +mls MULTILINESTRING, +mpl MULTIPOLYGON, +gc GEOMETRYCOLLECTION, +g GEOMETRY +) CHARACTER SET utf8; +SELECT * FROM t1; +Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr +def test t1 t1 p p 255 (type=point) 4294967295 0 Y 144 0 63 +def test t1 t1 ls ls 255 (type=linestring) 4294967295 0 Y 144 0 63
A bit strange to have length=4294967295 for point and linestring, don't you think?
+def test t1 t1 pl pl 255 (type=polygon) 4294967295 0 Y 144 0 63 +def test t1 t1 mp mp 255 (type=multipoint) 4294967295 0 Y 144 0 63 +def test t1 t1 mls mls 255 (type=multilinestring) 4294967295 0 Y 144 0 63 +def test t1 t1 mpl mpl 255 (type=multipolygon) 4294967295 0 Y 144 0 63 +def test t1 t1 gc gc 255 (type=geometrycollection) 4294967295 0 Y 144 0 63 +def test t1 t1 g g 255 4294967295 0 Y 144 0 63 +p ls pl mp mls mpl gc g +SELECT +COALESCE(p) AS p, +COALESCE(ls) AS ls, +COALESCE(pl) AS pl, +COALESCE(mp) AS mp, +COALESCE(mls) AS mls, +COALESCE(mpl) AS mpl, +COALESCE(gc) AS gc, +COALESCE(g) AS g +FROM t1; +Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr +def p 255 (type=point) 4294967295 0 Y 128 0 63 +def ls 255 (type=linestring) 4294967295 0 Y 128 0 63 +def pl 255 (type=polygon) 4294967295 0 Y 128 0 63 +def mp 255 (type=multipoint) 4294967295 0 Y 128 0 63 +def mls 255 (type=multilinestring) 4294967295 0 Y 128 0 63 +def mpl 255 (type=multipolygon) 4294967295 0 Y 128 0 63 +def gc 255 (type=geometrycollection) 4294967295 0 Y 128 0 63 +def g 255 4294967295 0 Y 128 0 63 +p ls pl mp mls mpl gc g +DROP TABLE t1;
please test type aggregation too, like COALESCE(p,ls), COALESCE(pl, mpl), etc
+# # End of 10.5 tests # diff --git a/mysql-test/main/gis.test b/mysql-test/main/gis.test index 48f2803b27d..22bd1553bc2 100644 --- a/mysql-test/main/gis.test +++ b/mysql-test/main/gis.test @@ -3180,6 +3180,38 @@ SELECT CONVERT(1, MULTILINESTRING); SELECT CONVERT(1, MULTIPOLYGON);
+--echo # +--echo # MDEV-17832 Protocol: extensions for Pluggable types and JSON, GEOMETRY +--echo # + +SET NAMES utf8; +CREATE TABLE t1 ( + p POINT, + ls LINESTRING, + pl POLYGON, + mp MULTIPOINT, + mls MULTILINESTRING, + mpl MULTIPOLYGON, + gc GEOMETRYCOLLECTION, + g GEOMETRY +) CHARACTER SET utf8; +--disable_ps_protocol
why?
+--enable_metadata +SELECT * FROM t1; +SELECT + COALESCE(p) AS p, + COALESCE(ls) AS ls, + COALESCE(pl) AS pl, + COALESCE(mp) AS mp, + COALESCE(mls) AS mls, + COALESCE(mpl) AS mpl, + COALESCE(gc) AS gc, + COALESCE(g) AS g +FROM t1; +--disable_metadata +--enable_ps_protocol +DROP TABLE t1; + --echo # --echo # End of 10.5 tests --echo # diff --git a/mysql-test/main/mysql-metadata.test b/mysql-test/main/mysql-metadata.test new file mode 100644 index 00000000000..bab44496f78 --- /dev/null +++ b/mysql-test/main/mysql-metadata.test @@ -0,0 +1,22 @@ +-- source include/have_working_dns.inc +-- source include/not_embedded.inc + +--echo # +--echo # MDEV-17832 Protocol: extensions for Pluggable types and JSON, GEOMETRY +--echo # + +SET NAMES utf8; +CREATE TABLE t1 ( + js0 JSON, + js1 TEXT CHECK (JSON_VALID(js1)), + js2 TEXT CHECK (LENGTH(js2) > 0 AND JSON_VALID(js2)), + js3 TEXT CHECK (LENGTH(js2) > 0 OR JSON_VALID(js2)) +) CHARACTER SET utf8; + +--replace_regex /0 rows in set [(].*[)]/0 rows in set (TIME)/
Why [(] ? because backslash escapes don't work in replace_regex?
+--exec $MYSQL -vvv --column-type-info --database=test -e "SELECT * FROM t1;" + +--replace_regex /0 rows in set [(].*[)]/0 rows in set (TIME)/ +--exec $MYSQL -vvv --column-type-info --database=test -e "SELECT JSON_COMPACT(js0) FROM t1;" + +DROP TABLE t1; diff --git a/plugin/type_inet/sql_type_inet.cc b/plugin/type_inet/sql_type_inet.cc index 6803bdba434..79f85cd1f19 100644 --- a/plugin/type_inet/sql_type_inet.cc +++ b/plugin/type_inet/sql_type_inet.cc @@ -705,6 +705,11 @@ class Field_inet6: public Field str.set_ascii(name.ptr(), name.length()); }
+ bool extended_type_info_append_to(Extended_type_info *to) const override + { + return type_handler_inet6.extended_type_info_append_to(to); + } +
I don't like that it needs to be done _per plugin_ there should be a default implementation that just appends the type name. A plugin will only need to overwrite it if it wants to do something more complex. Actually, I'd even not let the plugin to overwrite it. Just append the type name for _every_ pluggable type. Unconditionally. We can revise it if there will be a convincing use case, no need to overengineer it now.
bool validate_value_in_record(THD *thd, const uchar *record) const override { return false; diff --git a/sql/field.cc b/sql/field.cc index 1ce49b0bdfa..8bab37a567c 100644 --- a/sql/field.cc +++ b/sql/field.cc @@ -2039,6 +2039,19 @@ void Field::make_send_field(Send_field *field) field->set_handler(type_handler()); field->flags=table->maybe_null ? (flags & ~NOT_NULL_FLAG) : flags; field->decimals= 0; + extended_type_info_append_to(&field->extended_type_info); + if (check_constraint) + { + /* + Append the format that is implicitly implied by the CHECK CONSTRAINT. + For example: + CREATE TABLE t1 (js longtext DEFAULT NULL CHECK (json_valid(a))); + SELECT j FROM t1; + will add "cf=json" to the extended type info metadata for t1.js.
format=json ?
+ */ + check_constraint->expr-> + format_by_check_constraint_append_to(&field->extended_type_info); + }
in the base Field class? Not in, say, Field_longstr?
}
diff --git a/sql/item.h b/sql/item.h index 6a9d401b101..d0ff49275b4 100644 --- a/sql/item.h +++ b/sql/item.h @@ -1814,6 +1814,14 @@ class Item: public Value_source, /* This is to handle printing of default values */ virtual bool need_parentheses_in_default() { return false; } virtual void save_in_result_field(bool no_conversions) {} + /* + Data type format implied by the CHECK CONSTRAINT, + to be sent to the client in the result set metadata. + */ + virtual bool format_by_check_constraint_append_to(Extended_type_info *) const
This is rather strange order of words, it doesn't read well. May be better append_format_by_check_constraint() ?
+ { + return false; + } /* set value of aggregate function in case of no rows for grouping were found */ diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc index 6df2b5dbd3a..949e389698a 100644 --- a/sql/item_cmpfunc.cc +++ b/sql/item_cmpfunc.cc @@ -6481,6 +6481,20 @@ Item *Item_cond_and::neg_transformer(THD *thd) /* NOT(a AND b AND ...) -> */ }
+bool +Item_cond_and::format_by_check_constraint_append_to(Extended_type_info *to) const +{ + List_iterator_fast<Item> li(const_cast<List<Item>&>(list));
Why the cast?
+ Item *item; + while ((item= li++)) + { + if (item->format_by_check_constraint_append_to(to)) + return true; + } + return false; +} + + Item *Item_cond_or::neg_transformer(THD *thd) /* NOT(a OR b OR ...) -> */ /* NOT a AND NOT b AND ... */ { diff --git a/sql/sql_type.h b/sql/sql_type.h index ce87c8e9d93..899f189adeb 100644 --- a/sql/sql_type.h +++ b/sql/sql_type.h @@ -137,6 +138,40 @@ enum column_definition_type_t };
+class Extended_type_info: public Binary_string +{ +public: + void init() { length(0); } + bool append_chunk(mariadb_field_metadata_attr_t type, + const LEX_CSTRING &value) + { + /* + If we eventually support many metadata chunk types and long metadata + values, we'll need to encode type and length using net_store_length() + and do corresponding changes to the unpacking code in libmariadb. + For now let's just assert that type and length fit into one byte. + */ + DBUG_ASSERT(net_length_size((ulonglong) type) == 1); + DBUG_ASSERT(net_length_size((ulonglong) value.length) == 1);
why do you need these casts?
+ size_t nbytes= 1/*type*/ + 1/*length*/ + value.length; + if (reserve(nbytes)) + return true; + qs_append((char) (uchar) type); + qs_append((char) (uchar) value.length); + qs_append(&value); + return false; + } + bool append_format(const LEX_CSTRING &str) + { + return append_chunk(MARIADB_FIELD_METADATA_FORMAT_NAME, str); + } + bool append_type(const LEX_CSTRING &str) + { + return append_chunk(MARIADB_FIELD_METADATA_DATA_TYPE_NAME, str); + } +}; + + class Data_type_statistics { public: @@ -3426,6 +3461,11 @@ class Type_handler return field_type(); } virtual protocol_send_type_t protocol_send_type() const= 0; + virtual bool Item_extended_type_info_append_to(Extended_type_info *to,
Item_append_extended_type_info ?
+ const Item *item) const + { + return false; + } virtual Item_result result_type() const= 0; virtual Item_result cmp_type() const= 0; virtual enum_dynamic_column_type
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Sergei, I addressed most of your suggestions. Except these ones: On 3/4/20 4:25 PM, Sergei Golubchik wrote:
--- a/plugin/type_inet/sql_type_inet.cc +++ b/plugin/type_inet/sql_type_inet.cc @@ -705,6 +705,11 @@ class Field_inet6: public Field str.set_ascii(name.ptr(), name.length()); }
+ bool extended_type_info_append_to(Extended_type_info *to) const override + { + return type_handler_inet6.extended_type_info_append_to(to); + } +
I don't like that it needs to be done _per plugin_
We should add Field_fixed_binary and move a number of Field_inet6 methods there, including this method. I propose I do it separately.
diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc index 6df2b5dbd3a..949e389698a 100644 --- a/sql/item_cmpfunc.cc +++ b/sql/item_cmpfunc.cc @@ -6481,6 +6481,20 @@ Item *Item_cond_and::neg_transformer(THD *thd) /* NOT(a AND b AND ...) -> */ }
+bool +Item_cond_and::format_by_check_constraint_append_to(Extended_type_info *to) const +{ + List_iterator_fast<Item> li(const_cast<List<Item>&>(list));
Why the cast?
Same here. It's a separate problem. It needs reorganization in List_iterator_fast. I don't like the idea of adding a new List_iterator_fast constructor to with a cast inside. Also, please discuss with Georg the exact API of the new library functions. Thanks!
participants (3)
-
Alexander Barkov
-
Georg Richter
-
Sergei Golubchik