[Maria-developers] Type_handler and INET6
Hello Sergei, I'm sending a new version for Type_handler and INET6 for your review. There are three patches attached. 1. inet6-v65-prepare.diff.gz Includes preparatory changes, to avoid different execution paths for different data types. 2. inet6-v65-incremental.diff.gz Actually adds INET6, together with val_raw family methods, last minute changes that did not get into #1. 3. inet6-v65-full.diff.gz is a full patch that includes the previous two. Details on the patches as well as my answers to your previous review suggestions follow below. Thanks. The first patch makes preparatory changes, to avoid different execution paths and adds some helper classes needed for the API. - Adds a class Record_addr, to simplify API (pass just one value instead of 3 separate values for ptr, null_ptr and null_bit) and share some code moving initialization (e.g. for a temporary table purposes) to Record_addr constructors. Derives Field from it. - Adds a class Type_ext_attributes. This collects extra members of Item_type_holder. Currently it covers only extra attributes for GEOMETRY, DECIMAL, ENUM/SET. We'll add a generic "void *extra" pointer here, when user-defined types need even more extra attributes. This class will also be needed inside Item_hybrid_func, which currently have a bug (do not preserve geometry type and srid of the arguments). I'll fix it later. See MDEV-9405. - Adds a class Create_attr, to pass Create_field related attributes to handlers. - Adds a class Type_handler_register, to look up handlers by type code and names. - Continues unification of the code that creates temporary fields for various Items, see ::create_tmp_field() in sql_select.cc. This task is not fully finished. Some more chages are still needed. The intent is to get rid of ::create_tmp_field() and switch to use methods in Item instead. - Adds a data type handler pointer to Arg_comparator. Previously the was only "Item_result m_compare_type". - Splits Item_func_case::fix_length_and_dec() into three separate steps: fix_field_type(), fix_return_arguments(), fix_comparison_arguments(), to incorporate the type handler methods easier. - Splits Item_func_between::val_int() into virtual methods in Type_handler_xxx. - Adds virtual methods to Type_handler, to avoid different execution paths for non-traditional types: calc_pack_length(), check_column_definition(), prepare_column_definition(), make_table_field(), Item_save_in_field(), make_cache_item(), calc_pack_length(), calc_display_length(), Item_type_holder_join_attributes(), Item_func_hex_val_str_ascii(), Item_func_hybrid_field_type_val_xxx(), Item_func_between_fix_length_and_dec(), Item_func_between_val_int(), Item_sum_hybrid_fix_length_and_dec(), - Adds Type_handler_numeric, to share some code in real, int decimal handlers. - Moves field_type_for_temporal_comparison() from Item to Type_handler. - Makes some other changes in data type aggregation in various pieces of the code. This will need some more changes though. - Removes Field::result_merge_type(). Uses Type_handler::cmp_type() and Type_handler::result_type() instead. - Simplifies ::make_field() in field.cc, using Type_handler::make_table_field(). - Removes the global Arg_comparator::comparator_matrix[6][2] and uses methods in Type_handler instead. The second patch actually adds INET6 and: - Adds val_raw family methods into Items - Adds some pieces into the type aggregation code in a few places - Fixes sql_yacc.yy to handle non-traditional types, adds Cast_type and fixes Lex_cast_type_st. - Fixes Type_handler_register to search non-traditional types by name. - Disables equal expression propagation for non-traditional types (the code is not ready yet for this, e.g. Item_equal needs some changes) - Adds the INET6 handler with tests Also, I answer your questions and suggestions that you made during the first review. Please see below.
=== added file 'mysql-test/r/type_inet6.result' --- mysql-test/r/type_inet6.result 1970-01-01 00:00:00 +0000 +++ mysql-test/r/type_inet6.result 2014-07-01 11:10:33 +0000 +::ff +::ffff +::0.255.255.255 +::255.255.255.255 +::ff:ffff:ffff +::ffff:255.255.255.255
Hmm. A bit unexpected. Do you have a link to some document that specifies how ipv6 numbers should be parsed and printed?
https://en.wikipedia.org/wiki/IPv6_address says: :ffff:0:0/96 — This prefix is designated as an IPv4-mapped IPv6 address. Note, I also checked this in PostgreSQL and it works in the same way: bar=> select inet'::ffff:ffff:ffff'; inet ------------------------ ::ffff:255.255.255.255
+SELECT * FROM t1 WHERE a=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF; +a +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +Warnings: +Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'
Too many warnings. I'd rather use Item_cache or something similar to have only one actual type conversion and one warning
=== modified file 'sql/field.h' --- sql/field.h 2014-06-11 08:08:08 +0000 +++ sql/field.h 2014-07-01 10:38:12 +0000 @@ -366,6 +437,10 @@ class Field virtual ~Field() {} /* Store functions returns 1 on overflow and -1 on fatal error */ virtual int store(const char *to, uint length,CHARSET_INFO *cs)=0; + virtual int store_raw(const String *raw, enum_field_types type) + { + return store(raw->ptr(), raw->length(), raw->charset()); + }
How do you guarantee that the 'raw' is in the correct internal raw
It now creates Item_cache. So there's only one warning. format?
Or is it only used after int_to_raw, str_to_raw, etc?
raw representation can only be created by a method of the corresponding Type_handler_xxx, or come from "ptr" of the corresponding Field_xxx. Bad raw format means that there is a bug inside Type_handler_xxx. Btw, In the latest patch I changed it to: int store_raw_native(const char *str, uint length); Conversion (when needed) is done outside of Field.
+class Field_type_joiner
It's rather "aggregator", not "joiner". But in fact, it's not even that, it's a pair of { type, derivation }. May be it should be Field_type_with_derivation ? :) And a method Field_type_with_derivation::aggregate(...)
It's now Type_handler_hybrid_field_type, which I earlier added in 10.1. There is no "derivation" in this version, as we discussed in Luxembough. Sigh, I'd really really prefer if we had it though. I think when field=expr is compared according to the field data type is clearer than they're compared accoring to the expr datatype. Also, comparing according to field will use the optimizer by default, without having to cast "expr" to the field data type.
+class Type_handler +{ +public: + Type_handler() {} + ~Type_handler() {} + virtual const char *field_type_name() const = 0; + virtual uint field_type_name_length() const = 0;
may be, remove field_ prefix, or even a field_type_ prefix? the method is still unambiguous, but much shorter and readable. and, may be, make it non-virtual, like
... private: LEX_STRING type_name; public: Type_handler(const char *name_arg) { type_name.str= name_arg; type_name.length= strlen(name_arg); } const char *name() const { return type_name.str; } const char *name_length() const { return type_name.length; } ...
and your new type will do in the constructor:
Type_handler_inet6(): Type_handler("INET6") { }
less boilerplate code for the new types, and you know how I'm trying to avoid boilerplate code in the APIs :)
I'm using the very needed new class Name for it. It now looks simpler, I believe.
+ virtual enum_field_types field_type() const = 0;
That doesn't work for loadable types. I think that we eventually need to do with types what we've done with storage engines. There was db_type enum, but now the engine is almost
always
identified by the pointer to the handlerton. So, for types a pointer to the Type_handler might be a unique type identifier in the server. Replacing enum_field_types where possible.
Sure, in the end we should get a code that will identify handlers by a pointer rather than field_type(). This will need a few additional steps, which I just did not have a chance to do yet: - Fix all existing Items to return type_handler() instead of field_type() as a type identifier. This will be easier after we add virtual inheritance (or templates) soon, on top of my Type_handler changes. - Get rid of all field_type() calls: split the remaining pieces of the code that use field_type() into more virtual methods in Type_handler. - Fix FRM to store new field types as a textual name rather than a numeric ID. Can we do these steps a separately? Note, it can be done after adding INET6 and UUID, before adding dynamic loadability.
+ virtual enum_field_types fallback_field_type() const = 0;
I don't like the idea of fallback_field_type. You have Field_type_joiner for new types, but you fallback to Field::field_type_merge() for old types. I'd rather not distinguish between "old" and "new" types. Just like we have with storage engines or authentication - there are no "native" and "plugin" storage engines (or authentication), *all* storage engines are plugins. So there's no code like
if (is_plugin(engine_type)) ... else ...
so, I prefer uniform code with no special checks for type handlers. Implementations differ, but the interface is the same for all types.
I nodified the code, so there is no a separate fallback_field_type() any more. But there is still cmp_type(), which is uses in very rare cases, to exchange data to/from other data types: INET6->something, something->INET6. Also, there are still some different execution paths inside the methods that aggregate data types. We'll need to discuss it. I could not come up with a simple single method for aggregation. Also, currently field type aggregation is done in different ways in various pieces of the code (hybrid functions, UNION, comparison). We should try to unify these ways.
+ virtual Item_result cast_to_int_type() const = 0;
Why is this abstract? It could have a reasonable default implementation reducing the amount of boilerplate code in the Type_handler's
+ virtual int cmp(Item *a, Item *b) const = 0; + virtual int cmp(String *a, String *b) const = 0;
These too
Not sure that exactly cast_to_int_type() can have a reasonable default. For INET6 and UUID it's DECIMAL_RESULT. For compressed URLs (MDEV-4630) and external ENUM (MDEV-4913) it's more likely to be STRING_RESULT. Anyway, I agree that we should add some reasonable defaults when possible. Perhaps we could add two abstract classes with some defaults: Type_handler | +-- Type_handler_traditional (derive all existing types from this) | +-- Type_handler_binary_fixed_length_storage (derive INET6 and UUID from this) But I'd propose not to hurry with that. We have time before we add loadabale data type plugins. Can we add UUID first? It will show how much INET6 and UUID can share and thus how much can be moved to Type_handler_binary_fixed_length_storage.
+ virtual int save_in_field(Item_hex_hybrid *item, Field *field) const = 0;
Why do you need that virtual? It could be always val_raw/store_raw
I guess this question is not relevant any more, because we now have handlers for the traditional data types, and they need to treat hex hybrids differenty (as string for Field_str and Field_temporal, and as a longlong for the other datatypes). Note, I modified this piece slighly. It's now a virtual method in Field rather than in Type_handler: int Field::store_hex_hybrid(const char *str, uint length);
+ virtual int set_cmp_func(Arg_comparator *cmp, Item *a, Item *b) const = 0;
this can be removed, see below (under Type_handler_inet6::set_cmp_func)
I could not get rid of it. Moreover, the traditional types now use this method. For example: bool Type_handler_int_result::set_comparator_func(Arg_comparator *cmp) const { return cmp->set_cmp_func_int(); } So now I have the same execution path for all data types here.
+ virtual bool varbinary_to_raw(const String *from, String *to) const = 0;
1. This most certainly needs a comment. It took me quite a while to understand what this method is for and why str_to_raw isn't used instead. A comment should say that it's used when a field binary representation is specified in a query directly was 0xHHHHHHHH...
2. Still, I'm not sure this method is needed. It means a different behavior for
where ipv6 = 0xHHHHHH...
and
create t1 (b binary(16)); insert t1 values (0xHHHHH....); ... where ipv6 = a ...
Note, it's the same with what we do for numbers: CREATE OR REPLACE TABLE t1 (a INT, b BLOB); INSERT INTO t1 VALUES (0x31,0x31); SELECT COUNT(*) FROM t1 WHERE a=0x31; -- returns a row SELECT * FROM t1 WHERE a=b; -- returns no rows See below.
an alternative would be to treat any binary string as raw, directly in str_to_raw. But a drawback would be that
where ipv6 = "ffff::1"
and
where ipv6 = convert("ffff::1" to binary)
will work differently. And it's not particularly good either.
I like the idea to treat all binary strings as raw with INET6, and then with UUID. I think this is not really a drawback. Remember, when we discussed "is [VAR]BINARY a data type or a character set" around the year of 2005, we fortunately decided that they are data types, so we introduced data types [VAR]BINARY as a replacement to "CHAR CHARACTER SET binary". So [VAR]CHAR abd [VAR]BINARY are different data types! It should be OK that INET6 will communicate with two ***different*** datatypes [VAR]CHAR vs [VAR]BINARY differently. If we agree on that, I'll change the code accordingly. We preliminary agreed on IRC to make explicit and implicit conversion between a data type and BINARY use methods in Type_handler. (but I have not added them yet).
So, it's a tradeoff, and the question is - what is "less gotcha" ? I don't know. I have a slight preference towards removing
varbinary_to_raw()
just to make the API smaller. And thinking about adding it if there would be user complains.
Another thought. I'm starting to doubt that we should support direct raw values in the SQL at all. It might work for ipv6
where ipv6 = 0xHHHHHH
but it certainly won't work for, say, integers:
where int = 0xHHHHH
^^^ this is not raw, but a normal hexadecimal number. So, unless you introduce a special unambiguous syntax (like RAW(0xHHHHH)), I think that raw values should not be supported in SQL.
This syntax looks useful for me: WHERE inet6_field=0xFFFF; Btw, in case of integer numbers, it's exactly the low level raw format in Field. This is how integer numbers are stored, isn't it? I didn't understand why it won't work for numbers. Note, comparison to a hex hybrid works fine for INET6 in my patch and does not break the existing data types. Numeric comparion goes through Item_hex_hybrid::val_int(). String comparison goes through Item_hex_hybrid::val_str(). INET6 comparison goes through Item_hex_hybrid::val_raw(). This is Arg_comparator::func who decides which method to use. In the future, we could probably unify this: implement val_raw() for the traditional data types, and use val_raw() for comparison instead of the usual methods like val_int(). It's more tricky what should happen in case of VARCHAR-alike or BLOB-alike data types (suppose we'll decide the standard CLOB type in the future). Should val_raw() include the length bytes? I think no, as it will assume duplication of "length", once in String::length(), and one more time in the beginning of String::ptr(). So I was thinking of "raw" as something very close to how the data is stored in Field, but it's not necessary a 100% match. Perhaps we just need a different name than "raw"... I agree we don't need the field-style low level raw format (e.g. with length bytes for a VARCHAR-alike data type).
+ virtual Item *make_cast(MEM_ROOT *mem_root, Item *arg) const = 0;
Hmm. This implies an Item_typecast_xxx for every type. I would be easier to have one Item_typecast_handled, that stores a pointer to Type_handler. Would be much easier than implementing new Item_typecast_xxx for every type.
Some data types could probably share the same Item_typecast_xxx, but I don't think it's generally doable with a single Item for all data types. There are a few reasons why I prefer a separate method in handler to create a CAST Item. - Different data types can have different set of extra attributes, which the corresponding Item_typecast_xxx should store in. It's easier to store attributes in typed structures rather than in a generic memory pointed by "void *". It will make our life easier when we start debugging in gdb :) - I plan to simplify sql_yacc.yy so the loadable datatypes and the traditional datatypes share the same CAST and field_type grammar. It will be needed soon even for the traditional data types. - A method in handler will help to add the missing CAST types easier, e.g. FLOAT, SMALLINT, etc. It looks natural that every Field_xxx has a corresponding Item_typecast_xxx.
+ int store_raw(const String *str, enum_field_types type_arg) + { + if (type_arg != type()) + { + set_null(); + return 1; // TODO: go through str/int/real/dec/date? + } + DBUG_ASSERT(str->length() == binary_length()); + memcpy(ptr, str->ptr(), binary_length()); + return 0; + }
val_raw and store_raw should not be virtual, they are always identical for all types. Just don't forget the validity check.
<offtopic> Btw, it now looks like this: int store_raw_native(const char *str, uint length) { DBUG_ASSERT(length == binary_length()); memcpy(ptr, str, binary_length()); return 0; } I moved type conversion to the upper level. </offtopic> Right, the actual "store" is often a simple memcpy() for fixed length data types like INET6. But it's not generally the case. - It can be different for non-fixed length data types. It's again about the fact that the value of val_raw() and the byte array stored in Field::ptr() can have slighly differrent formats. Variable length types will pass the values using a normal String format, without additional length bytes coded inside the array pointed by String::ptr(). - Also, in case we have a shared method to store, we'll have to have additional virtual methods for validity check, to call inside. There's no sense for two virtual calls. - Also, in some cases it will be possible to use val_raw() instead of val_int(), val_decimal(), val_str(), val_real(), get_date(), and thus share almost duplicate code that's now implemented for different Item_result types. I already mentioned Arg_comparator. But there could also be only one universal Item_cache, etc.
+ int store_time_dec(MYSQL_TIME *ltime, uint dec) + { + my_decimal tmp; + return store_decimal(date2my_decimal(ltime, &tmp)); + }
Numerous val_xxx and store_xxx could be replaced by xxx_to_raw and raw_to_xxx + val_raw and store_raw. You shouldn't introduce both val_xxx and raw_to_xxx. Same for store_xxx and xxx_to_raw
Right. I don't expose raw_to_xxx publically any more. They are now internal protected Type_handler_inet6 routines.
+ + uint size_of() const { return sizeof(*this); } + }; + + + class Arg_comparator_inet6: public Arg_comparator + { + public: + int compare_inet6() + { + if ((*a)->val_raw(&value1, cmp_field_type()) || + (*b)->val_raw(&value2, cmp_field_type())) + goto null; + if (set_null) + owner->null_value= 0; + DBUG_ASSERT(value1.length() ==
Type_handler_inet6::binary_length());
+ DBUG_ASSERT(value2.length() == Type_handler_inet6::binary_length()); + return memcmp(value1.ptr(), value2.ptr(), Type_handler_inet6::binary_length()); + null: + if (set_null) + owner->null_value= 1; + return -1; + } + int compare_e_inet6() + { + bool a_is_null= (*a)->val_raw(&value1, cmp_field_type()); + bool b_is_null= (*b)->val_raw(&value2, cmp_field_type()); + if (a_is_null || b_is_null) + return MY_TEST(a_is_null == b_is_null); + DBUG_ASSERT(value1.length() == Type_handler_inet6::binary_length()); + DBUG_ASSERT(value2.length() == Type_handler_inet6::binary_length()); + return MY_TEST(memcmp(value1.ptr(), value2.ptr(), Type_handler_inet6::binary_length()) == 0); + }
we should try to get rid of one of the functions, see below ... ... ...
+ int set_cmp_func(Arg_comparator *cmp, Item *a, Item *b) const + { + cmp->set_func(cmp->is_owner_equal_func() ? + (arg_cmp_func) &Arg_comparator_inet6::compare_e_inet6 : + (arg_cmp_func) &Arg_comparator_inet6::compare_inet6, + field_type());
Again, boilerplate code, better:
cmp->set_func(Arg_comparator_inet6::compare_e_inet6, Arg_comparator_inet6::compare_inet6, field_type());
no "cmp->is_owner_equal_func() ?" - this can be done inside set_func() Also, I wonder, whether every type really needs two functions, for equal func and not. Perhaps having only one - compare_e_inet6() should be sufficient and checks for NULL can be done in the common wrapper? And, even more, set_cmp_func() is not necessary at all. Type_handler should only have compare_type() and compare_e_type() virtual methods (or only one compare_e_type()) and the implementation of set_cmp_func() becomes identical for all types, and thus doesn't have to be virtual anymore. One method less to implement.
I agree. We should try to do that. Let's try to decompose the functions of the existing data types and the INET6 related functions in a separate patch.
+ return 0; + } + void sortlength(struct st_sort_field *order, const Item *item)
const
+ { + order->result_type= item->cmp_type(); + order->length= Type_handler_inet6::binary_length(); + order->suffix_length= 0; + order->need_strxnfrm= 0; + } + bool make_sort_key(struct st_sort_field *order, uchar *to) const + { + Item *item= order->item; + String buf((char *) to, order->length, &my_charset_bin); + /* + make_sortkey() uses str_result(), val_result(), val_int_result(), + val_decimal_result() for the standard result types. + QQ: should we introduce val_raw_result() and use here instead of + val_raw() ?
Or, perhaps, we should fix make_sortkey() to use val_str, val_real, etc. I failed to understant why *result() methos are used there.
So did I. But as I had to implement the val_raw_result() methods anyway for other purposes, it was easy to use val_raw_result() here, for symmetry with the other types, and in the latest patch I still use _result_ here. It can be fixed later for all data types to use the usual val_xxx() instead.
I've even replaced them with non-*result versions and not a single test failed. Including the test case that was added when val* methods were replaced with val*result methods.
+ +Type_handler_register::Type_handler_register() +{ + memset(m_handler, 0, sizeof(m_handler)); + m_min_type= 256; + m_max_type= 0; + m_min_length= 256; + m_max_length= 0; + add(&Type_handler_inet6::s_singleton);
Eh? I thought it's added in the Type_handler_inet6 constructor.
I think in the end there should be two separate steps actually: - a constructor for Type_handler_xxx - a method to add it into the registry So the plugin loading routine will first create a handler from a dynamic library, and then put it into the registry. Now I just implemented it by a special "class Type_handler_inet6_singleton".
Hi, Alexander! On Feb 12, Alexander Barkov wrote:
Hello Sergei,
I'm sending a new version for Type_handler and INET6 for your review.
There are three patches attached.
1. inet6-v65-prepare.diff.gz Includes preparatory changes, to avoid different execution paths for different data types.
2. inet6-v65-incremental.diff.gz Actually adds INET6, together with val_raw family methods, last minute changes that did not get into #1.
3. inet6-v65-full.diff.gz is a full patch that includes the previous two.
Thanks! What branch are you using for this? Is it on github? Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei, On 02/26/2016 11:19 PM, Sergei Golubchik wrote:
Hi, Alexander!
On Feb 12, Alexander Barkov wrote:
Hello Sergei,
I'm sending a new version for Type_handler and INET6 for your review.
There are three patches attached.
1. inet6-v65-prepare.diff.gz Includes preparatory changes, to avoid different execution paths for different data types.
2. inet6-v65-incremental.diff.gz Actually adds INET6, together with val_raw family methods, last minute changes that did not get into #1.
3. inet6-v65-full.diff.gz is a full patch that includes the previous two.
Thanks!
What branch are you using for this? Is it on github?
Oops. I did not add a branch and made everything locally. I guess you'd like to apply it, and it does not apply to the latest 10.2. Would you like me to push it, say to bb-10.2-bar ?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Alexander! On Feb 12, Alexander Barkov wrote:
Hello Sergei,
I'm sending a new version for Type_handler and INET6 for your review.
There are three patches attached.
1. inet6-v65-prepare.diff.gz Includes preparatory changes, to avoid different execution paths for different data types.
2. inet6-v65-incremental.diff.gz Actually adds INET6, together with val_raw family methods, last minute changes that did not get into #1.
3. inet6-v65-full.diff.gz is a full patch that includes the previous two.
A question. Is the code in item_inetfunc.cc the *only* code that's needed to add a new type "INET6" ? Everything else is just the framework, not inet6 specific, right? Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hello Sergei, On 02/27/2016 12:53 AM, Sergei Golubchik wrote:
Hi, Alexander!
On Feb 12, Alexander Barkov wrote:
Hello Sergei,
I'm sending a new version for Type_handler and INET6 for your review.
There are three patches attached.
1. inet6-v65-prepare.diff.gz Includes preparatory changes, to avoid different execution paths for different data types.
2. inet6-v65-incremental.diff.gz Actually adds INET6, together with val_raw family methods, last minute changes that did not get into #1.
3. inet6-v65-full.diff.gz is a full patch that includes the previous two.
A question. Is the code in item_inetfunc.cc the *only* code that's needed to add a new type "INET6" ? Everything else is just the framework, not inet6 specific, right?
Yes, that's true.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
participants (2)
-
Alexander Barkov
-
Sergei Golubchik