[Maria-developers] INET6, data type plugins
Hello all, I have mostly implemented the server side part of the fixed length 16-byte INET6 data type (only some clean-ups left). MDEV-274 The data type for IPv6/IPv4 addresses in MariaDB https://mariadb.atlassian.net/browse/MDEV-274 I made it in a very new style, which is ready to data type plugins, as described in: MDEV-4912 Add a plugin to field types (column types) https://mariadb.atlassian.net/browse/MDEV-4912 It is done by introducing a new abstract super-class Type_handler with a number of virtual public methods (around 50 public methods at this point). To implement a new data type one just needs to implement an instantiable class derived from Type_handler and register the new data type in a special global registry. Everything start to work automatically, including (but not limited to): - correct comparison - hybrid type functions like COALESCE, LEAST - aggregate functions MAX() and MIN() - CAST to the new type - ORDER BY, both for field and for items - conversion to the standard data types - indexing and index search The definition of the instantiable class that actually implements INET6 looks like this: class Type_handler_inet6: public Type_handler { private: class Field_inet6: public Field { ... }; class Arg_comparator_inet6: public Arg_comparator { ... }; class Item_typecast_inet6: public Item_func { ... }; public: // various access methods }; It looks very nice for me so far :) I have some questions about the remaining things: 1. In my current patch the new data type INET6 is not loaded dynamically from a shared library. It's implemented inside mysqld. Is it Okey? Or would we prefer to make it as a plugin from the beginning? (in this case there's some more work left: the dynamic loader code, client-server protocol, PS protocol). 2. In case of pluggable data types we'll need some changes in the client-server protocol, to transfer data type names (and probably some more metadata) to the client side in some textual form. Any ideas on that? 3. PS protocol obviously needs two ways to bind data: in text format and in binary format. For example, in case of INET6 it would be nice to be able to bind parameters and fetch values using: - text representation "ffff::ffff" - and binary representation 0xFFFF000000000000000000000000FFFF. I can see some ways to do this: a. Every pluggable data type uses its own type code in MYSQL_BIND: char ip6[16]= "\xff\xff...\xff"; MYSQL_BIND bind; bind.buffer_type= 128; // This is INET6 bind.buffer= ip6; bind.length= sizeof(ip6); but the code (128) may vary between servers, depending on how the plugin is installed. b. Or we could introduce a singe new type code MYSQL_TYPE_RAW: char ip6[16]= "\xff\xff...\xff"; MYSQL_BIND bind; bind.buffer_type= MYSQL_TYPE_RAW; bind.buffer= ip6; bind.length= sizeof(ip6); The server should be able to treat a single type code with different data types without any problems, because it knows the parameter context. But this will not allow to mix values of different types though, for example insert an INET4 value into an INET6 column (or vice verse) won't be possible. c. Or we could support both "native" type codes and MYSQL_TYPE_RAW. 4. Do we need the INET4 data type (fixed length, 4 bytes)? 5. Do we need a PostgreSQL-alike INET data type which is able to store both INET4 and INET6 values (I guess this one should be variable length: 4-bytes for INET4 and 16 bytes for INET6 values. Any feedback would be appreciated. The current patch (attached) will be polished. but the general idea should already be clear from the current version. Thanks.
On Tue, Jul 01, 2014 at 05:07:36PM +0400, Alexander Barkov wrote:
2. In case of pluggable data types we'll need some changes in the client-server protocol, to transfer data type names (and probably some more metadata) to the client side in some textual form. Any ideas on that?
I'm wondering, will those on older (or Oracle's) client libraries be able to access the new datatype then? I think, we should provide that. BR Sergei -- Sergei Petrunia, Software Developer MariaDB | Skype: sergefp | Blog: http://s.petrunia.net/blog
On Tue, Jul 01, 2014 at 05:07:36PM +0400, Alexander Barkov wrote:
Hello all,
I have mostly implemented the server side part of the fixed length 16-byte INET6 data type (only some clean-ups left). MDEV-274 The data type for IPv6/IPv4 addresses in MariaDB https://mariadb.atlassian.net/browse/MDEV-274
If we're going to start a new API, it's better to have at least some documentation. Ability to provide clear documentation is correlated with the feature having good design. Could you add some comments explaining what these classes do: - class Field_type_joiner - class Type_handler - class Type_handler_register ? - How do I get the IPv6 address in binary form? HEX(inet6_field) seems to return hex of the string form. When I try to compile, I get a lot of warnings like this: sql/field.h:3090:7: warning: ‘class Type_handler’ has virtual functions and accessible non-virtual destructor.. BR Sergei -- Sergei Petrunia, Software Developer MariaDB | Skype: sergefp | Blog: http://s.petrunia.net/blog
Hi, Alexander! On Jul 01, Alexander Barkov wrote:
I have mostly implemented the server side part of the fixed length 16-byte INET6 data type (only some clean-ups left). MDEV-274 The data type for IPv6/IPv4 addresses in MariaDB https://mariadb.atlassian.net/browse/MDEV-274
Please, find my review below:
I have some questions about the remaining things:
1. In my current patch the new data type INET6 is not loaded dynamically from a shared library. It's implemented inside mysqld. Is it Okey? Or would we prefer to make it as a plugin from the beginning? (in this case there's some more work left: the dynamic loader code, client-server protocol, PS protocol).
May be later, depending on how much time we'll have.
2. In case of pluggable data types we'll need some changes in the client-server protocol, to transfer data type names (and probably some more metadata) to the client side in some textual form. Any ideas on that?
Yes, a function to convert type names to type codes. Like uint mysql_get_typecode(MYSQL * mysql, const char *typename); It'll work by querying I_S.TYPES or something. And is supposed to be used like uint MYSQL_TYPE_IPV6= mysql_get_typecode(mysql, "IPV6"); if (MYSQL_TYPE_IPV6 == 0) die("IPV6 type is not supported"); after that one can use MYSQL_TYPE_IPV6 as any other MYSQL_TYPE_xxx value.
3. PS protocol obviously needs two ways to bind data: in text format and in binary format. For example, in case of INET6 it would be nice to be able to bind parameters and fetch values using: - text representation "ffff::ffff" - and binary representation 0xFFFF000000000000000000000000FFFF.
I can see some ways to do this: a. Every pluggable data type uses its own type code in MYSQL_BIND:
char ip6[16]= "\xff\xff...\xff"; MYSQL_BIND bind; bind.buffer_type= 128; // This is INET6 bind.buffer= ip6; bind.length= sizeof(ip6);
but the code (128) may vary between servers, depending on how the plugin is installed.
Yes, but the suggestion above would solve this. Still I_S.TYPES should show the correct value of sizeof(ipv6), I mean, it should contain type metadata such as whether it's fixed-size or variable size, and if fixed - how many bytes it takes.
4. Do we need the INET4 data type (fixed length, 4 bytes)?
I believe users will ask for it. But may be not in this task?
5. Do we need a PostgreSQL-alike INET data type which is able to store both INET4 and INET6 values (I guess this one should be variable length: 4-bytes for INET4 and 16 bytes for INET6 values.
Same.
Any feedback would be appreciated.
The current patch (attached) will be polished. but the general idea should already be clear from the current version.
=== modified file 'include/mysql_com.h' --- include/mysql_com.h 2014-05-13 09:53:30 +0000 +++ include/mysql_com.h 2014-06-27 15:41:28 +0000 @@ -443,6 +443,11 @@ enum enum_field_types { MYSQL_TYPE_DECIM
};
+ +#define MYSQL_TYPE_HANDLED_MIN_ID 128 +#define MYSQL_TYPE_HANDLED_MAX_ID 191 +
is this the range of values for new types?
+ /* For backward compatibility */ #define CLIENT_MULTI_QUERIES CLIENT_MULTI_STATEMENTS #define FIELD_TYPE_DECIMAL MYSQL_TYPE_DECIMAL
=== added file 'mysql-test/r/type_inet6.result' --- mysql-test/r/type_inet6.result 1970-01-01 00:00:00 +0000 +++ mysql-test/r/type_inet6.result 2014-07-01 11:10:33 +0000 @@ -0,0 +1,1420 @@ +CREATE TABLE t1 ( +a INET6 DEFAULT 'ffff::0001', +b INET6 DEFAULT 'ffff::01', +c INET6 DEFAULT 'fe80::192.168.1.1' +); +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `a` inet6 DEFAULT 'ffff::1', + `b` inet6 DEFAULT 'ffff::1', + `c` inet6 DEFAULT 'fe80::c0a8:101' +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +INSERT INTO t1 VALUES(); +SELECT * FROM t1; +a b c +ffff::1 ffff::1 fe80::c0a8:101 +DROP TABLE t1; +CREATE TABLE t1 (a INET6 NOT NULL AUTO_INCREMENT); +ERROR 42000: Incorrect column specifier for column 'a' +CREATE TABLE t1 (a INET6 DEFAULT ''); +ERROR 42000: Invalid default value for 'a' +CREATE TABLE t1 (a INET6); +INSERT INTO t1 VALUES ('bad'); +Warnings: +Warning 1366 Incorrect inet6 value: 'bad' for column 'a' at row 1 +SELECT * FROM t1; +a +NULL +DROP TABLE t1; +CREATE TABLE t1 (a INET6 NOT NULL); +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `a` inet6 NOT NULL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +INSERT INTO t1 VALUES ('bad'); +Warnings: +Warning 1366 Incorrect inet6 value: 'bad' for column 'a' at row 1 +SELECT * FROM t1; +a +:: +SELECT * FROM t1 WHERE a='bad'; +a +:: +Warnings: +Warning 1105 Cast to INET6 converted 'bad' to '::' +DROP TABLE t1; +CREATE TABLE t1 (a INET6); +INSERT INTO t1 VALUES (0xFF); +INSERT INTO t1 VALUES (0xFFFF); +INSERT INTO t1 VALUES (0xFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF); +INSERT INTO t1 VALUES (0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF); +Warnings: +Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff' +SELECT * FROM t1; +a +::ff +::ffff +::0.255.255.255 +::255.255.255.255 +::ff:ffff:ffff +::ffff:255.255.255.255
Hmm. A bit unexpected. Do you have a link to some document that specifies how ipv6 numbers should be parsed and printed?
+::ff:ffff:ffff:ffff +::ffff:ffff:ffff:ffff +::ff:ffff:ffff:ffff:ffff +::ffff:ffff:ffff:ffff:ffff +::ff:ffff:ffff:ffff:ffff:ffff +::ffff:ffff:ffff:ffff:ffff:ffff +::ff:ffff:ffff:ffff:ffff:ffff:ffff +::ffff:ffff:ffff:ffff:ffff:ffff:ffff +ff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +SELECT * FROM t1 WHERE a=0xFFFF; +a +::ffff +SELECT * FROM t1 WHERE a=0xFFFFFFFFFFFFFFFF; +a +::ffff:ffff:ffff:ffff +SELECT * FROM t1 WHERE a=0xFFFFFFFFFFFFFFFFFFFFFFFF; +a +::ffff:ffff:ffff:ffff:ffff:ffff +SELECT * FROM t1 WHERE a=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF; +a +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +SELECT * FROM t1 WHERE a=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF; +a +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff +Warnings: +Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'
Too many warnings. I'd rather use Item_cache or something similar to have only one actual type conversion and one warning
+Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff' +Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff' +Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff' +Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff' +Warning 1105 Cast to INET6 converted '\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff' ... === modified file 'sql/field.h' --- sql/field.h 2014-06-11 08:08:08 +0000 +++ sql/field.h 2014-07-01 10:38:12 +0000 @@ -59,6 +60,15 @@ enum Derivation DERIVATION_EXPLICIT= 0 };
+ +/* Data type derivation */ +enum Type_derivation +{ + TYPE_DERIVATION_COERCIBLE= 1, // Non-typified literals + TYPE_DERIVATION_IMPLICIT= 0 // Fields, functions, typified literals +};
Interesting. Does it affect the existing behavior? Or you only use it for new types? I think both alternatives are far from ideal: the former means incompatibilities, the latter - that new types behave differently from old ones. Still, given the choice, I'd prefer consistency over compatibility.
+ + #define STORAGE_TYPE_MASK 7 #define COLUMN_FORMAT_MASK 7 #define COLUMN_FORMAT_SHIFT 3 @@ -307,6 +317,67 @@ class Field uint32 flags; uint16 field_index; // field number in fields array uchar null_bit; // Bit used to test null bit + + class Create_param
A comment would be nice here, explaining what this class is for. ... or rename it to Make_field_param.
+ { + TABLE_SHARE *m_share; + uchar *m_ptr; + uint32 m_field_length; + uchar *m_null_pos; + uchar m_null_bit; + uint m_pack_flag; + CHARSET_INFO *m_field_charset; + Field::geometry_type m_geom_type; + Field::utype m_unireg_check; + TYPELIB *m_interval; + public: + Create_param(TABLE_SHARE *share, + uchar *ptr, + uint32 field_length, + uchar *null_pos, + uchar null_bit, + uint pack_flag, + CHARSET_INFO *cs, + Field::geometry_type geom_type, + Field::utype unireg_check, + TYPELIB *interval)
don't use tabs in the new code, prefer spaces
+ { + m_share= share; + m_ptr= ptr; + m_field_length= field_length; + m_null_pos= null_pos; + m_null_bit= null_bit; + m_pack_flag= pack_flag; + m_field_charset= cs; + m_geom_type= geom_type; + m_unireg_check= unireg_check; + m_interval= interval; + } + Create_param(uchar *ptr, uchar *null_pos, + uchar null_bit, utype unireg_check) + { + m_share= NULL; + m_ptr= ptr; + m_field_length= 0; + m_null_pos= null_pos; + m_null_bit= null_bit; + m_pack_flag= 0; + m_field_charset= &my_charset_bin; + m_geom_type= GEOM_GEOMETRY; + m_unireg_check= unireg_check; + m_interval= NULL; + } + TABLE_SHARE *share() const { return m_share; } + uchar *ptr() const { return m_ptr; } + uint32 field_length() const { return m_field_length; } + uchar *null_pos() const { return m_null_pos; } + uchar null_bit() const { return m_null_bit; } + uint pack_flag() const { return m_pack_flag; } + CHARSET_INFO *charset() const { return m_field_charset; } + Field::geometry_type geom_type() const { return m_geom_type; } + Field::utype unireg_check() const { return m_unireg_check; } + TYPELIB *interval() const { return m_interval; } + }; /** If true, this field was created in create_tmp_field_from_item from a NULL value. This means that the type of the field is just a guess, and the type @@ -366,6 +437,10 @@ class Field virtual ~Field() {} /* Store functions returns 1 on overflow and -1 on fatal error */ virtual int store(const char *to, uint length,CHARSET_INFO *cs)=0; + virtual int store_raw(const String *raw, enum_field_types type) + { + return store(raw->ptr(), raw->length(), raw->charset()); + }
How do you guarantee that the 'raw' is in the correct internal raw format? Or is it only used after int_to_raw, str_to_raw, etc?
virtual int store(double nr)=0; virtual int store(longlong nr, bool unsigned_val)=0; virtual int store_decimal(const my_decimal *d)=0; @@ -1511,6 +1588,8 @@ class Field_double :public Field_real {
class Field_null :public Field_str { static uchar null[1]; +private: + void push_warning();
unused?
public: Field_null(uchar *ptr_arg, uint32 len_arg, enum utype unireg_check_arg, const char *field_name_arg, @@ -2953,6 +3037,145 @@ class Copy_field :public Sql_alloc { };
+ +class Field_type_joiner
It's rather "aggregator", not "joiner". But in fact, it's not even that, it's a pair of { type, derivation }. May be it should be Field_type_with_derivation ? :) And a method Field_type_with_derivation::aggregate(...)
+{ +private: + enum_field_types m_type; + Type_derivation m_derivation; + void reset() + { + m_type= MYSQL_TYPE_NULL; + m_derivation= TYPE_DERIVATION_COERCIBLE; + } +public: + Field_type_joiner() { reset(); } + Field_type_joiner(const Item *item) { set(item); } + Field_type_joiner(enum_field_types type, Type_derivation derivation) + { + set(type, derivation); + } + Field_type_joiner(const Item *a, const Item *b) + { + set(a); + join(b); + } + Field_type_joiner(const Item **item, uint nitems) + { + set(item, nitems); + } + void set(enum_field_types type, Type_derivation derivation) + { + m_type= type; + m_derivation= derivation; + } + void set(const Field_type_joiner &other) + { + m_type= other.m_type; + m_derivation= other.m_derivation; + } + void set(const Item *item); + void set(const Item **item, uint nitems); + void join(const Field_type_joiner &other); + void join(const Item *item) { join(Field_type_joiner(item)); } + enum_field_types type() const { return m_type; } + Type_derivation derivation() const { return m_derivation; } +}; + + +class Item_hex_hybrid; +class Arg_comparator; +class Item_func; + +class Type_handler +{ +public: + Type_handler() {} + ~Type_handler() {} + virtual const char *field_type_name() const = 0; + virtual uint field_type_name_length() const = 0;
may be, remove field_ prefix, or even a field_type_ prefix? the method is still unambiguous, but much shorter and readable. and, may be, make it non-virtual, like ... private: LEX_STRING type_name; public: Type_handler(const char *name_arg) { type_name.str= name_arg; type_name.length= strlen(name_arg); } const char *name() const { return type_name.str; } const char *name_length() const { return type_name.length; } ... and your new type will do in the constructor: Type_handler_inet6(): Type_handler("INET6") { } less boilerplate code for the new types, and you know how I'm trying to avoid boilerplate code in the APIs :)
+ virtual enum_field_types field_type() const = 0;
That doesn't work for loadable types. I think that we eventually need to do with types what we've done with storage engines. There was db_type enum, but now the engine is almost always identified by the pointer to the handlerton. So, for types a pointer to the Type_handler might be a unique type identifier in the server. Replacing enum_field_types where possible.
+ virtual enum_field_types fallback_field_type() const = 0;
I don't like the idea of fallback_field_type. You have Field_type_joiner for new types, but you fallback to Field::field_type_merge() for old types. I'd rather not distinguish between "old" and "new" types. Just like we have with storage engines or authentication - there are no "native" and "plugin" storage engines (or authentication), *all* storage engines are plugins. So there's no code like if (is_plugin(engine_type)) ... else ... so, I prefer uniform code with no special checks for type handlers. Implementations differ, but the interface is the same for all types.
+ virtual Item_result fallback_cmp_type() const = 0; // TODO: get rid of this
Right, same here.
+ virtual Item_result cast_to_int_type() const = 0;
Why is this abstract? It could have a reasonable default implementation reducing the amount of boilerplate code in the Type_handler's
+ virtual int cmp(Item *a, Item *b) const = 0; + virtual int cmp(String *a, String *b) const = 0;
These too
+ + virtual uint decimal_precision() const = 0; + virtual uint temporal_scale() const = 0;
temporal_scale can be 0 by default
+ virtual int save_in_field(Item_hex_hybrid *item, Field *field) const = 0;
Why do you need that virtual? It could be always val_raw/store_raw
+ virtual int set_cmp_func(Arg_comparator *cmp, Item *a, Item *b) const = 0;
this can be removed, see below (under Type_handler_inet6::set_cmp_func)
+ virtual void sortlength(struct st_sort_field *field, const Item *item) const = 0;
better, set_sort_field()
+ virtual bool make_sort_key(struct st_sort_field *field, uchar *to) const = 0; + + virtual bool val_raw(Item *item, String *to) const = 0; + virtual bool raw_to_str(const String *raw, String *to) const = 0; + virtual bool raw_to_real(const String *raw, double *to) const = 0; + virtual bool raw_to_int(const String *raw, longlong *to) const = 0; + virtual bool raw_to_decimal(const String *raw, my_decimal *to) const = 0; + virtual bool raw_to_date(const String *raw, + MYSQL_TIME *to, ulonglong fuzzydate) const = 0; + + virtual bool str_to_raw(const String *str, String *raw) const = 0; + virtual bool varbinary_to_raw(const String *from, String *to) const = 0;
1. This most certainly needs a comment. It took me quite a while to understand what this method is for and why str_to_raw isn't used instead. A comment should say that it's used when a field binary representation is specified in a query directly was 0xHHHHHHHH... 2. Still, I'm not sure this method is needed. It means a different behavior for where ipv6 = 0xHHHHHH... and create t1 (b binary(16)); insert t1 values (0xHHHHH....); ... where ipv6 = a ... an alternative would be to treat any binary string as raw, directly in str_to_raw. But a drawback would be that where ipv6 = "ffff::1" and where ipv6 = convert("ffff::1" to binary) will work differently. And it's not particularly good either. So, it's a tradeoff, and the question is - what is "less gotcha" ? I don't know. I have a slight preference towards removing varbinary_to_raw() just to make the API smaller. And thinking about adding it if there would be user complains. Another thought. I'm starting to doubt that we should support direct raw values in the SQL at all. It might work for ipv6 where ipv6 = 0xHHHHHH but it certainly won't work for, say, integers: where int = 0xHHHHH ^^^ this is not raw, but a normal hexadecimal number. So, unless you introduce a special unambiguous syntax (like RAW(0xHHHHH)), I think that raw values should not be supported in SQL.
+ virtual bool decimal_to_raw(const my_decimal *num, String *raw) const = 0; + virtual bool int_to_raw(longlong num, bool unsigned_val, String *raw) const = 0; + virtual bool date_to_raw(const MYSQL_TIME *date, String *raw) const = 0; + virtual bool real_to_raw(double num, String *raw) const = 0; + + // TODO: should these stay non-virtual?
Sure
+ virtual String *val_str_from_val_raw(Item *item, String *buf) const = 0; + virtual double val_real_from_val_raw(Item *item) const = 0; + virtual longlong val_int_from_val_raw(Item *item) const = 0; + virtual my_decimal *val_decimal_from_val_raw(Item *item, my_decimal *buf) const = 0; + virtual bool get_date_from_val_raw(Item *item, + MYSQL_TIME *ltime, + ulonglong fuzzydate) const = 0; + virtual bool val_raw_from_val_int(Item *item, String *to) const = 0; + virtual bool val_raw_from_val_str(Item *item, String *to) const = 0; + virtual bool val_raw_from_val_decimal(Item *item, String *to) const = 0; + virtual bool val_raw_from_val_real(Item *item, String *to) const = 0; + virtual bool val_raw_from_get_date(Item *item, String *to) const = 0; + + virtual bool Item_func_fix_length_and_dec(Item_func *item) const = 0;
What's that for?
+ virtual Field* make_field(const Field::Create_param *param, const char *name) const = 0; + virtual bool init_field(Create_field *field) const = 0; + virtual uint calc_pack_length(uint length) const = 0; + virtual Item *make_cast(MEM_ROOT *mem_root, Item *arg) const = 0;
Hmm. This implies an Item_typecast_xxx for every type. I would be easier to have one Item_typecast_handled, that stores a pointer to Type_handler. Would be much easier than implementing new Item_typecast_xxx for every type.
+ virtual bool Item_hex_hybrid_val_raw(Item_hex_hybrid *item, String *to) const = 0;
same comment as about varbinary_to_raw
+}; + + +class Type_handler_register +{ + const Type_handler *m_handler[256]; + uint m_min_type; + uint m_max_type; + uint m_min_length; + uint m_max_length; +public: + Type_handler_register(); + const Type_handler *handler(enum_field_types type) const + { + return m_handler[type]; + } + const Type_handler *handler(const char *name, uint length) const; + bool add(const Type_handler *handler) + { + if (m_handler[handler->field_type()]) + return true; // TODO: add a warning? + m_handler[handler->field_type()]= handler; + set_if_smaller(m_min_type, handler->field_type()); + set_if_bigger(m_max_type, handler->field_type()); + set_if_smaller(m_min_length, handler->field_type_name_length()); + set_if_bigger(m_max_length, handler->field_type_name_length());
I'd expect a hash here. There can be many types here, especially if we're going to treat all types uniformly, adding all existing types to the registry.
+ return false; + } +}; +extern Type_handler_register Type_handlers; + + + Field *make_field(TABLE_SHARE *share, uchar *ptr, uint32 field_length, uchar *null_pos, uchar null_bit, uint pack_flag, enum_field_types field_type,
=== modified file 'sql/field.cc' --- sql/field.cc 2014-06-11 08:08:08 +0000 +++ sql/field.cc 2014-07-01 10:54:07 +0000 @@ -956,6 +957,58 @@ enum_field_types Field::field_type_merge [field_type2index(b)]; }
+void Field_type_joiner::set(const Item *item) +{ + if (item->result_type() == ROW_RESULT) + { + reset(); + return; + } + m_type= item->field_type(); + m_derivation= item->type_derivation(); +} + + +void Field_type_joiner::join(const Field_type_joiner &other) +{ + enum_field_types a= m_type; + enum_field_types b= other.type(); + const Type_handler *handler1= Type_handlers.handler(a); + const Type_handler *handler2= Type_handlers.handler(b);
just as I said above, you can get rid of constant searches in the registry, if the main type identifier will be a pointer to its Type_handlers, not its enum_field_types value.
+ if (handler1 || handler2)
and, also, as I said above, I'd really like to have no special cases, I mean, every type should have a handler.
+ { + if (handler1 == handler2) + { + m_derivation= MY_MIN(m_derivation, other.derivation()); + return; + } + if (m_derivation < other.derivation()) + return; + if (m_derivation > other.derivation()) + { + set(other); + return; + } + // Different types with equal type derivation + // TODO: INET6 should be treated as DECIMAL in some cases + if (handler1) + a= handler1->fallback_field_type(); + if (handler2) + b= handler2->fallback_field_type(); + } + m_derivation= MY_MIN(m_derivation, other.derivation()); + m_type= Field::field_type_merge(a, b); +} + + +void Field_type_joiner::set(const Item **item, uint nitems) +{ + DBUG_ASSERT(nitems > 0); + set(item[0]); + for (uint i= 1; i < nitems; i++) + join(item[i]); +} +
static Item_result field_types_result_type [FIELDTYPE_NUM]= { @@ -1444,6 +1499,48 @@ String *Field::val_int_as_str(String *va }
+bool Field::val_raw(String *to, enum_field_types type) +{ + const Type_handler *handler= Type_handlers.handler(type); + if (!handler) + return true; + switch (cmp_type()) + { + case INT_RESULT: + return handler->int_to_raw(val_int(), (flags & UNSIGNED_FLAG), to);
I don't understand that. If there's a handler, you can just retrieve field's value directly as raw, why do you convert it to int (val_int()) and then back to raw?
+ case REAL_RESULT: + return handler->real_to_raw(val_real(), to); + case DECIMAL_RESULT: + { + my_decimal *dec, buf; + if (!(dec= val_decimal(&buf))) + return true; + return handler->decimal_to_raw(dec, to); + } + case STRING_RESULT: + { + StringBuffer<64> buf; + String *tmp; + if (!(tmp= val_str(&buf))) + return true; + return handler->str_to_raw(tmp, to); + } + case TIME_RESULT: + { + MYSQL_TIME ltime; + if (get_date(<ime, 0)) + return true; + return handler->date_to_raw(<ime, to); + } + case ROW_RESULT: + case IMPOSSIBLE_RESULT: + DBUG_ASSERT(0); + break; + } + return true; +} + + /// This is used as a table name when the table structure is not set up Field::Field(uchar *ptr_arg,uint32 length_arg,uchar *null_ptr_arg, uchar null_bit_arg, @@ -10139,3 +10255,841 @@ void Field::set_explicit_default(Item *v return; set_has_explicit_value(); } + + +/****************************************************************************/ + + +#define MYSQL_TYPE_INET6_ID MYSQL_TYPE_HANDLED_MIN_ID + + +class Type_handler_inet6: public Type_handler +{ + static uint char_length() { return 46; } //TODO: 8 * 4 + 7 ? + static uint binary_length() { return 16; } + static void truncated_fraction_warning(const ErrConv *err) + { + // Fractional digits were truncated, send a note. + // TODO: a better warning + push_warning_printf(current_thd, Sql_condition::WARN_LEVEL_NOTE, + ER_UNKNOWN_ERROR, + "Cast to INET6 truncated fractional digits from '%s'", + err->ptr()); + } + static void truncated_value_warning(const ErrConv *err, const char *value) + { + // TODO: better warning + push_warning_printf(current_thd, Sql_condition::WARN_LEVEL_WARN, + ER_UNKNOWN_ERROR, + "Cast to INET6 converted '%s' to '%s'", + err->ptr(), value); + } + static void truncated_value_warning(const ErrConv *err, bool to_max) + { + truncated_value_warning(err, + !to_max ? + "::" : "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"); + } + static void set_min_value(char *to) + { + memset(to, 0x00, binary_length()); + } + static void set_max_value(char *to) + { + memset(to, 0xFF, binary_length()); + } + static void set_min_value_with_warn(char *to, const ErrConv *err) + { + set_min_value(to); + truncated_value_warning(err, false); + } + static void set_max_value_with_warn(char *to, const ErrConv *err) + { + set_max_value(to); + truncated_value_warning(err, true); + } + static bool set_truncated_value_with_warn(String *to, + const ErrConv *err, bool to_max) + { + if (to->alloc(Type_handler_inet6::binary_length())) + return true; + truncated_value_warning(err, to_max); + if (to_max) + set_max_value((char *) to->ptr()); + else + set_min_value((char *) to->ptr()); + to->length(Type_handler_inet6::binary_length()); + return false; + } + static void int_to_raw(ulonglong tmp, char *to) + { + memset(to, 0, 8); + mi_int8store(to + 8, tmp); + } + static bool int_to_raw(ulonglong tmp, String *to) + { + if (to->alloc(Type_handler_inet6::binary_length())) + return true; + int_to_raw(tmp, (char *) to->ptr()); + to->length(Type_handler_inet6::binary_length()); + return false; + } + + + class Field_inet6: public Field { + private: + void store_warning(const ErrConv *str, + Sql_condition::enum_warning_level level) + { + // TODO: better warning + push_warning_printf(get_thd(), level, + ER_TRUNCATED_WRONG_VALUE_FOR_FIELD, + ER(ER_TRUNCATED_WRONG_VALUE_FOR_FIELD), + "inet6", str->ptr(), field_name, + (ulong) table->in_use->get_stmt_da()-> + current_row_for_warning()); + } + int set_min_value_with_warn(const ErrConv *str) + { + store_warning(str, Sql_condition::WARN_LEVEL_WARN); + Type_handler_inet6::set_min_value((char*) ptr); + return 1; + } + int set_max_value_with_warn(const ErrConv *str) + { + store_warning(str, Sql_condition::WARN_LEVEL_WARN); + Type_handler_inet6::set_max_value((char*) ptr); + return 1; + } + public: + Field_inet6(const Field::Create_param *param, const char *field_name_arg) + :Field(param->ptr(), Type_handler_inet6::char_length(), + param->null_pos(), param->null_bit(), + param->unireg_check(), field_name_arg) + { + flags|= BINARY_FLAG | UNSIGNED_FLAG; + } + Item_result result_type () const { return STRING_RESULT; } + uint32 max_display_length() { return field_length; } + bool str_needs_quotes() { return TRUE; } + enum Derivation derivation(void) const { return DERIVATION_NUMERIC; } + uint repertoire(void) const { return MY_REPERTOIRE_NUMERIC; } + CHARSET_INFO *charset(void) const { return &my_charset_numeric; } + const CHARSET_INFO *sort_charset(void) const { return &my_charset_numeric; } + bool binary() const { return false; } + enum Item_result cmp_type () const { return STRING_RESULT; } + enum Item_result cast_to_int_type() const { return DECIMAL_RESULT; } + enum_field_types type() const + { return (enum_field_types) MYSQL_TYPE_INET6_ID;} + enum ha_base_keytype key_type() const { return HA_KEYTYPE_BINARY; } + + uint is_equal(Create_field *new_field) + { + return new_field->sql_type == real_type(); + } + bool eq_def(Field *field) + { + return (Field::eq_def(field) && decimals() == field->decimals()); + } + double pos_in_interval(Field *min, Field *max) + { + return pos_in_interval_val_real(min, max); + } + int cmp(const uchar *a, const uchar *b) + { return memcmp(a,b,pack_length()); } + + void sort_string(uchar *to, uint length) + { + DBUG_ASSERT(length == pack_length()); + memcpy(to, ptr, length); + } + uint32 pack_length() const { return Type_handler_inet6::binary_length(); } + void sql_type(String &str) const + { + str.set_ascii(STRING_WITH_LEN("inet6")); + } + // bool send_binary(Protocol *protocol); // TODO: change from val_str + // + // + // uchar *pack(uchar* to, const uchar *from, + // uint max_length __attribute__((unused))) + // { + // return pack_int32(to, from); + // } + // const uchar *unpack(uchar* to, const uchar *from, const uchar *from_end, + // uint param_data __attribute__((unused))) + // { + // return unpack_int32(to, from, from_end); + // } + + + String *val_str(String *val_buffer, + String *val_ptr __attribute__((unused))) + { + return ipv6_to_str((const char *) ptr, val_buffer) ? NULL : val_buffer; + } + + my_decimal *val_decimal(my_decimal *to) + { + ASSERT_COLUMN_MARKED_FOR_READ; + return Type_handler_inet6::raw_to_decimal((const char *) ptr, to); + } + + longlong val_int() + { + ASSERT_COLUMN_MARKED_FOR_READ; + return Type_handler_inet6::raw_to_int((const char *)ptr); + } + + double val_real() + { + ASSERT_COLUMN_MARKED_FOR_READ; + return Type_handler_inet6::raw_to_real((const char *) ptr); + } + + bool get_date(MYSQL_TIME *ltime, ulonglong fuzzydate) + { + ASSERT_COLUMN_MARKED_FOR_READ; + return Type_handler_inet6::raw_to_date((const char *)ptr, ltime, fuzzydate); + } + + bool val_raw(String *to, enum_field_types type) + { + to->set((const char *) ptr, pack_length(), &my_charset_bin); + return false; + } + + + int store_raw(const String *str, enum_field_types type_arg) + { + if (type_arg != type()) + { + set_null(); + return 1; // TODO: go through str/int/real/dec/date? + } + DBUG_ASSERT(str->length() == binary_length()); + memcpy(ptr, str->ptr(), binary_length()); + return 0; + }
val_raw and store_raw should not be virtual, they are always identical for all types. Just don't forget the validity check.
+ + int store(const char *str, uint length, CHARSET_INFO *cs) + { + if (!str_to_ipv6(str, length, (char*) ptr)) + { + set_null(); + ErrConvString err(str, length, cs); + return set_min_value_with_warn(&err); + } + return 0; + } + + int store_decimal(const my_decimal *num) + { + ASSERT_COLUMN_MARKED_FOR_WRITE_OR_COMPUTED; + longlong tmp; + my_decimal dec; + /* Store the high 8 bytes */ + my_decimal_div(E_DEC_FATAL_ERROR, &dec, num, + &Type_handler_inet6::s_singleton.my_2p64, 0); + my_decimal_round(E_DEC_FATAL_ERROR, &dec, 0, true, &dec); + if (my_decimal2int(0, &dec, true, &tmp)) + { + ErrConvDecimal err(num); + return dec.sign() ? set_min_value_with_warn(&err) : + set_max_value_with_warn(&err); + } + mi_int8store(ptr, tmp); + /* Store the low 8 bytes */ + my_decimal_mod(E_DEC_FATAL_ERROR, &dec, num, + &Type_handler_inet6::s_singleton.my_2p64); + my_decimal2int(E_DEC_FATAL_ERROR, &dec, true, &tmp); + mi_int8store(ptr + 8, tmp); + if (decimal_actual_fraction(num) > 0) + { + ErrConvDecimal err(num); + truncated_fraction_warning(&err); + return 2; + } + return 0; + } + + int store(longlong nr, bool unsigned_flag) + { + ASSERT_COLUMN_MARKED_FOR_WRITE_OR_COMPUTED; + if (nr < 0 && !unsigned_flag) + { + ErrConvInteger err(nr); + return set_min_value_with_warn(&err); + } + int_to_raw(nr, (char *) ptr); + return 0; + } + + int store(double nr) + { + ASSERT_COLUMN_MARKED_FOR_WRITE_OR_COMPUTED; + if (nr < 0) + { + ErrConvDouble err(nr); + return set_min_value_with_warn(&err); + } + my_decimal dec; + if (double2my_decimal(E_DEC_FATAL_ERROR & ~E_DEC_OVERFLOW, nr, &dec)) + { + ErrConvDouble err(nr); + return set_max_value_with_warn(&err); + } + return store_decimal(&dec); + } + + int store_time_dec(MYSQL_TIME *ltime, uint dec) + { + my_decimal tmp; + return store_decimal(date2my_decimal(ltime, &tmp)); + }
Numerous val_xxx and store_xxx could be replaced by xxx_to_raw and raw_to_xxx + val_raw and store_raw. You shouldn't introduce both val_xxx and raw_to_xxx. Same for store_xxx and xxx_to_raw
+ + uint size_of() const { return sizeof(*this); } + }; + + + class Arg_comparator_inet6: public Arg_comparator + { + public: + int compare_inet6() + { + if ((*a)->val_raw(&value1, cmp_field_type()) || + (*b)->val_raw(&value2, cmp_field_type())) + goto null; + if (set_null) + owner->null_value= 0; + DBUG_ASSERT(value1.length() == Type_handler_inet6::binary_length()); + DBUG_ASSERT(value2.length() == Type_handler_inet6::binary_length()); + return memcmp(value1.ptr(), value2.ptr(), Type_handler_inet6::binary_length()); + null: + if (set_null) + owner->null_value= 1; + return -1; + } + int compare_e_inet6() + { + bool a_is_null= (*a)->val_raw(&value1, cmp_field_type()); + bool b_is_null= (*b)->val_raw(&value2, cmp_field_type()); + if (a_is_null || b_is_null) + return MY_TEST(a_is_null == b_is_null); + DBUG_ASSERT(value1.length() == Type_handler_inet6::binary_length()); + DBUG_ASSERT(value2.length() == Type_handler_inet6::binary_length()); + return MY_TEST(memcmp(value1.ptr(), value2.ptr(), Type_handler_inet6::binary_length()) == 0); + }
we should try to get rid of one of the functions, see below
+ }; + + class Item_typecast_inet6: public Item_func + { + public: + Item_typecast_inet6(Item *a) :Item_func(a) {} + enum Functype functype() const { return HANDLED_TYPECAST_FUNC; } + enum_field_types field_type() const + { return (enum_field_types) MYSQL_TYPE_INET6_ID; } + bool eq(const Item *item, bool binary_cmp) const + { + if (this == item) + return true; + if (item->type() != FUNC_ITEM || + functype() != ((Item_func*)item)->functype()) + return false; + if (field_type() != ((Item_func*) item)->field_type()) + return false; + Item_typecast_inet6 *cast= (Item_typecast_inet6*) item; + if (!args[0]->eq(cast->args[0], binary_cmp)) + return 0; + return true; + } + const char *func_name() const { return "cast_as_inet6"; } + void print(String *str, enum_query_type query_type) + { + str->append(STRING_WITH_LEN("cast(")); + args[0]->print(str, query_type); + str->append(STRING_WITH_LEN(" as inet6")); + str->append(')'); + } + void fix_length_and_dec() + { + max_length= Type_handler_inet6::char_length(); + collation.set(&my_charset_numeric, DERIVATION_NUMERIC); + } + String *val_str(String *to) + { + return Type_handler_inet6::s_singleton.val_str_from_val_raw(args[0], to); + } + longlong val_int() + { + return Type_handler_inet6::s_singleton.val_int_from_val_raw(args[0]); + } + double val_real() + { + return Type_handler_inet6::s_singleton.val_real_from_val_raw(args[0]); + } + my_decimal *val_decimal(my_decimal *to) + { + return Type_handler_inet6::s_singleton.val_decimal_from_val_raw(args[0], to); + } + bool get_date(MYSQL_TIME *ltime, ulonglong fuzzydate) + { + return Type_handler_inet6::s_singleton.get_date_from_val_raw(args[0], + ltime, + fuzzydate); + } + enum Item_result result_type () const { return STRING_RESULT; } + }; + + /* 2 power 64, in my_decimal format */ + class my_decimal_2p64: public my_decimal + { + public: + my_decimal_2p64() + { + my_decimal tmp; // 2 power 32 + int2my_decimal(E_DEC_FATAL_ERROR, 0x100000000ULL, true, &tmp); + my_decimal_mul(E_DEC_FATAL_ERROR, this, &tmp, &tmp); + } + }; + +public: + my_decimal_2p64 my_2p64; + static Type_handler_inet6 s_singleton; + Type_handler_inet6() + { + Type_handlers.add(this); + } + ~Type_handler_inet6() {} + const char *field_type_name() const { return "INET6"; } + uint field_type_name_length() const { return 5; } + enum_field_types field_type() const + { return (enum_field_types) MYSQL_TYPE_INET6_ID; } + enum_field_types fallback_field_type() const + { + // TODO: perhaps it should be DECIMAL + return MYSQL_TYPE_VARCHAR; + } + Item_result fallback_cmp_type() const // TODO: get rid of this + { + return STRING_RESULT; + } + Item_result cast_to_int_type() const { return DECIMAL_RESULT; } + + static double raw_to_real(const char *raw) + { + return (double) mi_uint8korr(raw) * (double) 0x100000000ULL + + (double) mi_uint8korr(raw + 8); + } + static my_decimal *raw_to_decimal(const char *raw, my_decimal *to) + { + my_decimal a, b; + int2my_decimal(E_DEC_FATAL_ERROR, mi_uint8korr(raw), true, &a); + my_decimal_mul(E_DEC_FATAL_ERROR, &b, &a, + &Type_handler_inet6::s_singleton.my_2p64); + int2my_decimal(E_DEC_FATAL_ERROR, mi_uint8korr(raw + 8), true, &a); + my_decimal_add(E_DEC_FATAL_ERROR, to, &b, &a); + return to; + } + static longlong raw_to_int(const char *raw) + { + longlong i; + my_decimal tmp; + my_decimal2int(E_DEC_FATAL_ERROR, raw_to_decimal(raw, &tmp), + true, &i); + return i; + } + static bool raw_to_date(const char *raw, + MYSQL_TIME *ltime, ulonglong fuzzydate) + { + my_decimal tmp; + return decimal_to_datetime_with_warn(raw_to_decimal(raw, &tmp), + ltime, fuzzydate, NULL); + } + + bool raw_to_real(const String *raw, double *to) const + { + DBUG_ASSERT(raw->length() == Type_handler_inet6::binary_length()); + *to= raw_to_real(raw->ptr()); + return false; + } + bool raw_to_int(const String *raw, longlong *to) const + { + DBUG_ASSERT(raw->length() == Type_handler_inet6::binary_length()); + *to= raw_to_int(raw->ptr()); + return false; + } + bool raw_to_decimal(const String *raw, my_decimal *to) const + { + DBUG_ASSERT(raw->length() == Type_handler_inet6::binary_length()); + raw_to_decimal(raw->ptr(), to); + return false; + } + bool raw_to_date(const String *raw, + MYSQL_TIME *ltime, ulonglong fuzzydate) const + { + DBUG_ASSERT(raw->length() == Type_handler_inet6::binary_length()); + return raw_to_date(raw->ptr(), ltime, fuzzydate); + } + bool raw_to_str(const String *raw, String *to) const + { + DBUG_ASSERT(raw->length() == Type_handler_inet6::binary_length()); + return ipv6_to_str(raw->ptr(), to); + } + + int cmp(String *a, String *b) const + { + DBUG_ASSERT(a->length() == Type_handler_inet6::binary_length()); + DBUG_ASSERT(a->length() == Type_handler_inet6::binary_length()); + return memcmp(a->ptr(), b->ptr(), Type_handler_inet6::binary_length()); + } + int cmp(Item *a, Item *b) const + { + StringBuffer<64> value1, value2; + if (!a->val_raw(&value1, field_type()) && + !b->val_raw(&value2, field_type())) + { + return cmp(&value1, &value2); + } + return -1; + } + String *val_str_from_val_raw(Item *item, String *buf) const + { + StringBuffer<64> tmp; + if (item->val_raw(&tmp, field_type())) + return 0; + // Fatal; TODO: check maybe_null?? + return ((item->null_value= raw_to_str(&tmp, buf))) ? 0 : buf; + } + double val_real_from_val_raw(Item *item) const + { + StringBuffer<64> tmp; + if (item->val_raw(&tmp, field_type())) + return 0; + double res; + return (item->null_value= raw_to_real(&tmp, &res)) ? 0 : res; + } + longlong val_int_from_val_raw(Item *item) const + { + StringBuffer<64> tmp; + if (item->val_raw(&tmp, field_type())) + return 0; + longlong res; + return (item->null_value= raw_to_int(&tmp, &res)) ? 0 : res; + } + my_decimal *val_decimal_from_val_raw(Item *item, my_decimal *buf) const + { + StringBuffer<64> tmp; + if (item->val_raw(&tmp, field_type())) + return 0; + return (item->null_value= raw_to_decimal(&tmp, buf)) ? 0 : buf; + } + bool get_date_from_val_raw(Item *item, + MYSQL_TIME *ltime, + ulonglong fuzzydate) const + { + StringBuffer<64> tmp; + if (item->val_raw(&tmp, field_type())) + return 0; + return item->null_value= raw_to_date(&tmp, ltime, fuzzydate); + } + uint decimal_precision() const { return 39; } + uint decimal_scale() const { return 0; } + uint temporal_scale() const { return 0; } + int set_cmp_func(Arg_comparator *cmp, Item *a, Item *b) const + { + cmp->set_func(cmp->is_owner_equal_func() ? + (arg_cmp_func) &Arg_comparator_inet6::compare_e_inet6 : + (arg_cmp_func) &Arg_comparator_inet6::compare_inet6, + field_type());
Again, boilerplate code, better: cmp->set_func(Arg_comparator_inet6::compare_e_inet6, Arg_comparator_inet6::compare_inet6, field_type()); no "cmp->is_owner_equal_func() ?" - this can be done inside set_func() Also, I wonder, whether every type really needs two functions, for equal func and not. Perhaps having only one - compare_e_inet6() should be sufficient and checks for NULL can be done in the common wrapper? And, even more, set_cmp_func() is not necessary at all. Type_handler should only have compare_type() and compare_e_type() virtual methods (or only one compare_e_type()) and the implementation of set_cmp_func() becomes identical for all types, and thus doesn't have to be virtual anymore. One method less to implement.
+ return 0; + } + void sortlength(struct st_sort_field *order, const Item *item) const + { + order->result_type= item->cmp_type(); + order->length= Type_handler_inet6::binary_length(); + order->suffix_length= 0; + order->need_strxnfrm= 0; + } + bool make_sort_key(struct st_sort_field *order, uchar *to) const + { + Item *item= order->item; + String buf((char *) to, order->length, &my_charset_bin); + /* + make_sortkey() uses str_result(), val_result(), val_int_result(), + val_decimal_result() for the standard result types. + QQ: should we introduce val_raw_result() and use here instead of + val_raw() ?
Or, perhaps, we should fix make_sortkey() to use val_str, val_real, etc. I failed to understant why *result() methos are used there. I've even replaced them with non-*result versions and not a single test failed. Including the test case that was added when val* methods were replaced with val*result methods. Also, I don't like the separate execution path for "old" fields, but I've already said that many times
+ */ + if (item->val_raw(&buf, field_type())) + { + memset(to, 0, order->length); + if (!item->maybe_null) + { + DBUG_ASSERT(0); + DBUG_PRINT("warning", + ("Got null on something that shouldn't be null")); + memset(to, 0, order->length); // Avoid crash + } + return true; + } + else + { + DBUG_ASSERT(buf.length() == order->length); + //TODO: DBUG_ASSERT(buf.ptr() == (const char *) to); + memmove(to, buf.ptr(), order->length); + } + return false; + } + + bool str_to_raw(const String *from, String *to) const + { + if (to->alloc(Type_handler_inet6::binary_length())) + return true; + if (!str_to_ipv6(from->ptr(), from->length(), (char*) to->ptr())) + { + ErrConvString err(from); + set_min_value_with_warn((char *) to->ptr(), &err); + } + to->length(Type_handler_inet6::binary_length()); + to->set_charset(&my_charset_bin); + return false; + } + + bool varbinary_to_raw(const String *from, String *to) const + { + if (to->alloc(Type_handler_inet6::binary_length())) + return true; + to->length(Type_handler_inet6::binary_length()); + int diff= from->length() - Type_handler_inet6::binary_length(); + if (!diff) + { + memcpy((char *) to->ptr(), from->ptr(), from->length()); + } + else if (diff > 0) // The value is longer than INET6 binary size + { + memcpy((char *) to->ptr(), from->ptr() + diff, from->length() - diff); + StringBuffer<64> buf; + raw_to_str(to, &buf); + ErrConvString err(from); + truncated_value_warning(&err, buf.c_ptr()); + } + else // The value is shorter + { + memset((char *) to->ptr(), 0x00, -diff); + memcpy((char *) to->ptr() - diff, from->ptr(), from->length()); + } + return false; + } + + bool int_to_raw(longlong tmp, bool unsigned_val, String *to) const + { + if (tmp < 0 && !unsigned_val) + { + ErrConvInteger err(tmp); + return set_truncated_value_with_warn(to, &err, false); + } + return int_to_raw((ulonglong) tmp, to); + } + + bool real_to_raw(double nr, String *to) const + { + my_decimal dec; + if (nr < 0 || + double2my_decimal(E_DEC_FATAL_ERROR & ~E_DEC_OVERFLOW, nr, &dec)) + { + ErrConvDouble err(nr); + return set_truncated_value_with_warn(to, &err, nr > 0); + } + return decimal_to_raw(&dec, to); + } + + bool decimal_to_raw(const my_decimal *num, String *to) const + { + longlong tmp; + my_decimal dec; + my_decimal_div(E_DEC_FATAL_ERROR, &dec, num, &my_2p64, 0); + my_decimal_round(E_DEC_FATAL_ERROR, &dec, 0, true, &dec); + if (my_decimal2int(0, &dec, true, &tmp)) + { + ErrConvDecimal err(num); + return set_truncated_value_with_warn(to, &err, !dec.sign()); + } + if (to->alloc(Type_handler_inet6::binary_length())) + return true; + // TODO: check overflow + mi_int8store((char*) to->ptr(), tmp); + /* Store the low 8 bytes */ + my_decimal_mod(E_DEC_FATAL_ERROR, &dec, num, &my_2p64); + if (!my_decimal2int(E_DEC_FATAL_ERROR, &dec, true, &tmp) && + decimal_actual_fraction(num) > 0) + { + ErrConvDecimal err(num); + truncated_fraction_warning(&err); + } + mi_int8store((char *) to->ptr() + 8, tmp); + to->length(Type_handler_inet6::binary_length()); + return false; + } + + bool date_to_raw(const MYSQL_TIME *ltime, String *to) const + { + ulonglong tmp= TIME_to_ulonglong(ltime); + if (ltime->neg && (tmp || ltime->second_part)) + { + ErrConvTime err(ltime); + return set_truncated_value_with_warn(to, &err, false); + } + if (ltime->second_part > 0) + { + ErrConvTime err(ltime); + truncated_fraction_warning(&err); + } + return int_to_raw(tmp, to); + } + + bool val_raw_from_val_str(Item *item, String *to) const + { + StringBuffer<64> buf; + String *tmp; + if (!(tmp= item->val_str(&buf))) + return item->null_value= true; + return item->null_value= str_to_raw(tmp, to); + } + + bool val_raw_from_val_decimal(Item *item, String *to) const + { + my_decimal buf, *num; + if (!(num= item->val_decimal(&buf))) + return item->null_value= true; + return item->null_value= decimal_to_raw(num, to); + } + + bool val_raw_from_val_int(Item *item, String *to) const + { + longlong tmp= item->val_int(); + if (item->null_value) + return true; + return item->null_value= int_to_raw(tmp, item->unsigned_flag, to); + } + + bool val_raw_from_val_real(Item *item, String *to) const + { + double tmp= item->val_real(); + if (item->null_value) + return true; + return item->null_value= real_to_raw(tmp, to); + } + + bool val_raw_from_get_date(Item *item, String *to) const + { + MYSQL_TIME ltime; + if (item->get_date(<ime, item->field_type() == MYSQL_TYPE_TIME ? + TIME_TIME_ONLY : 0)) + return true; + return item->null_value= date_to_raw(<ime, to); + } + + bool val_raw(Item *item, String *to) const + { + switch (item->cmp_type()) + { + case STRING_RESULT: + return val_raw_from_val_str(item, to); + case INT_RESULT: + return val_raw_from_val_int(item, to); + case REAL_RESULT: + return val_raw_from_val_real(item, to); + case TIME_RESULT: + return val_raw_from_get_date(item, to); + case DECIMAL_RESULT: + return val_raw_from_val_decimal(item, to); + case ROW_RESULT: + case IMPOSSIBLE_RESULT: + DBUG_ASSERT(0); + break; + } + return item->null_value= true; + } + + int save_in_field(Item_hex_hybrid *item, Field *field) const + { + StringBuffer<64> buf; + if (item->val_raw(&buf, field->type())) + { + field->set_null(); + return 1; + } + field->set_notnull(); + return field->store_raw(&buf, field->type()); + } + bool Item_hex_hybrid_val_raw(Item_hex_hybrid *item, String *to) const + { + String *tmp= item->val_str(NULL); + DBUG_ASSERT(tmp); + return item->null_value= varbinary_to_raw(tmp, to); + } + + bool Item_func_fix_length_and_dec(Item_func *item) const + { + return false; + } + + Field* make_field(const Field::Create_param *param, const char *name) const + { + return new Field_inet6(param, name); + } + + Item *make_cast(MEM_ROOT *mem_root, Item *arg) const + { + return new (mem_root) Item_typecast_inet6(arg); + } + + bool init_field(Create_field *field) const + { + field->length= Type_handler_inet6::char_length(); + field->char_length= field->length; + field->pack_length= Type_handler_inet6::binary_length(); + return false; + } + + uint calc_pack_length(uint length) const + { + return Type_handler_inet6::binary_length(); + } +}; + + +Type_handler_inet6 Type_handler_inet6::s_singleton; + + +const Type_handler * +Type_handler_register::handler(const char *name, uint length) const +{ + if (length < m_min_length || length > m_max_length) + return NULL; + for (uint i= m_min_type; i <= m_max_type; i++) + { + const Type_handler *handler= m_handler[i]; + if (handler->field_type_name_length() != length) + continue; + if (!my_strnncoll(&my_charset_latin1, + (const uchar *) handler->field_type_name(), length, + (const uchar *) name, length)) + return handler; + } + return NULL; +} + + +Type_handler_register::Type_handler_register() +{ + memset(m_handler, 0, sizeof(m_handler)); + m_min_type= 256; + m_max_type= 0; + m_min_length= 256; + m_max_length= 0; + add(&Type_handler_inet6::s_singleton);
Eh? I thought it's added in the Type_handler_inet6 constructor.
+} + +Type_handler_register Type_handlers;
=== modified file 'sql/item.cc' --- sql/item.cc 2014-06-11 08:09:24 +0000 +++ sql/item.cc 2014-07-01 08:37:34 +0000 @@ -764,6 +777,7 @@ Item_result Item::cmp_type() const case MYSQL_TYPE_DATETIME2: case MYSQL_TYPE_NEWDATE: return TIME_RESULT; + break;
Eh? break after return? :)
}; DBUG_ASSERT(0); return IMPOSSIBLE_RESULT; === modified file 'sql/item_strfunc.cc' --- sql/item_strfunc.cc 2014-06-10 18:20:33 +0000 +++ sql/item_strfunc.cc 2014-06-27 15:48:18 +0000 @@ -4303,6 +4303,7 @@ bool Item_func_dyncol_create::prepare_ar DYNAMIC_COLUMN_TYPE type= defs[i].type; if (type == DYN_COL_NULL) // auto detect { + DBUG_ASSERT(!Type_handlers.handler(args[valpos]->field_type()));
mark this TODO, please
/* We don't have a default here to ensure we get a warning if one adds a new not handled MYSQL_TYPE_...
=== modified file 'sql/rpl_utility.cc' --- sql/rpl_utility.cc 2014-05-13 09:53:30 +0000 +++ sql/rpl_utility.cc 2014-06-27 15:48:45 +0000 @@ -675,6 +675,9 @@ can_convert_field_to(Field *field, */
DBUG_PRINT("debug", ("Base types are different, checking conversion")); + if (Type_handlers.handler(source_type)) + DBUG_RETURN(false);
add a TODO comment here, please
+ switch (source_type) // Source type (on master) { case MYSQL_TYPE_DECIMAL:
Regards, Sergei
Hi Sergei, Thanks for review. Sorry for delay with this. I've been busy with the other tasks. I won't reply to every your comment in this letter. I skip all comments that I agree with, and those that I need some time to think. Only some critical and unclear follow below: On 08/10/2014 03:22 PM, Sergei Golubchik wrote: <skip>
=== modified file 'sql/field.h' --- sql/field.h 2014-06-11 08:08:08 +0000 +++ sql/field.h 2014-07-01 10:38:12 +0000 @@ -59,6 +60,15 @@ enum Derivation DERIVATION_EXPLICIT= 0 };
+ +/* Data type derivation */ +enum Type_derivation +{ + TYPE_DERIVATION_COERCIBLE= 1, // Non-typified literals + TYPE_DERIVATION_IMPLICIT= 0 // Fields, functions, typified literals +};
Interesting. Does it affect the existing behavior? Or you only use it for new types? I think both alternatives are far from ideal: the former means incompatibilities, the latter - that new types behave differently from old ones. Still, given the choice, I'd prefer consistency over compatibility.
Currently my patch implements this only for the new data types. But I've been thinking about the idea of type derivation for a long time for the standard types as well. I find that taking into account type derivation during type aggregation makes behavior clearer in most cases. Also, this approach forces using indexes in more cases, when you do: WHERE field = literal There should be three levels actually: enum Type_derivation { TYPE_DERIVATION_COERCIBLE= 2, // Non-typified literals TYPE_DERIVATION_IMPLICIT= 1, // Fields, functions, typified literals TYPE_DERIVATION_EXPLICIT= 0, // Explicit CAST(expr AS type) }; So when you do: WHERE field=CAST(literal AS force_type) comparison is done according to "force_type", no matter what the type of "field" is. Some examples: 1. WHERE char_field=10 will compare as string. Currently it compares as real. 2. WHERE int_field='10' will compare as int. Currently it compares as real. 3. WHERE int_field='string' will return "impossible condition", as the literal does not cast to INT safely. Currently it compares as real, and just wastes time, because the result set is empty anyway. 4. WHERE int_field=10.1 will return "impossible condition", as the literal does not cast to INT safely. Currently it compares as real (and wastes time again). 5. WHERE time_field='00:00:00' will compare as TIME. Currently it does the same. Nothing change. 6. WHERE string_field=TIME'00:00:00' Open question. Some variants are possible: a. typified literals are weaker than fields (as it is still a literal on the right side) b. typified literals are stronger than fields (as mentioning the type is a sort of CAST) c. typified literals have the same type strength with fields, use the current rules that we have for the type aggregation now. If we go (a) or (b), then a separate type derivation level is needed. Currently it compares as TIME (in all cases when a TIME argument is given). If we agree on this proposal with the idea of type derivation, then it should be a separate task. <skip>
I don't like the idea of fallback_field_type. You have Field_type_joiner for new types, but you fallback to Field::field_type_merge() for old types.
Not only. We need something to do when two different non-standard types are being aggregated. My idea was: Data types will go in packages. INET6 and INET4 will be in the same package and will know about each other, and will know how to aggregate to each other (INET6 is a super-type for INET4). But we also need to do something when we have two absolutely different data types, which do not know each other. 1. We could return an error in such cases, which would be more SQL standard strict behavior. 2. Or, we could use fallback_field_type(), which would be a more MariaDB behavior (and more consistent with the standard data types). So what would you prefer? N1, or N2 (have fallback_field_type(), but limit its use only for the cases when two data types that do not know each other are being aggregated)?
I'd rather not distinguish between "old" and "new" types. Just like we have with storage engines or authentication - there are no "native" and "plugin" storage engines (or authentication), *all* storage engines are plugins. So there's no code like
if (is_plugin(engine_type)) ... else ...
so, I prefer uniform code with no special checks for type handlers. Implementations differ, but the interface is the same for all types.
<skip>
+ virtual bool varbinary_to_raw(const String *from, String *to) const = 0;
1. This most certainly needs a comment. It took me quite a while to understand what this method is for and why str_to_raw isn't used instead. A comment should say that it's used when a field binary representation is specified in a query directly was 0xHHHHHHHH...
2. Still, I'm not sure this method is needed. It means a different behavior for
where ipv6 = 0xHHHHHH...
and
create t1 (b binary(16)); insert t1 values (0xHHHHH....); ... where ipv6 = a ...
Right. Currently it also works differently if you mix numbers and strings: WHERE string_field=0xHHHHHH and CREATE TABLE t1 (int_column); INSERT INTO t1 VALUES (0xHHHHHH); SELECT .. WHERE string_field=int_column;
an alternative would be to treat any binary string as raw, directly in str_to_raw. But a drawback would be that
where ipv6 = "ffff::1"
and
where ipv6 = convert("ffff::1" to binary)
will work differently. And it's not particularly good either.
So, it's a tradeoff, and the question is - what is "less gotcha" ? I don't know. I have a slight preference towards removing varbinary_to_raw() just to make the API smaller. And thinking about adding it if there would be user complains.
Another thought. I'm starting to doubt that we should support direct raw values in the SQL at all. It might work for ipv6
where ipv6 = 0xHHHHHH
but it certainly won't work for, say, integers:
where int = 0xHHHHH
^^^ this is not raw, but a normal hexadecimal number. So, unless you introduce a special unambiguous syntax (like RAW(0xHHHHH)), I think that raw values should not be supported in SQL.
So we have three alternatives: 1. Add a method, like I did in the patch, just rename it properly. Each type will define its behavior when processing a hex literal. 2. Add RAW(). Return error for: WHERE new_type=0xHHHHHH Preserve the existing behavior for: WHERE old_type=0XHHHHHH 3. Treat all binary strings as RAW for the new data types. Preserve the current behavior for the standard data types. I find that N1 is the best. The price is not high: a method in API, with a clear purpose (providing that we rename and comment it properly), and the reward is a good flexibility. <skip>
+ bool make_sort_key(struct st_sort_field *order, uchar *to) const + { + Item *item= order->item; + String buf((char *) to, order->length, &my_charset_bin); + /* + make_sortkey() uses str_result(), val_result(), val_int_result(), + val_decimal_result() for the standard result types. + QQ: should we introduce val_raw_result() and use here instead of + val_raw() ?
Or, perhaps, we should fix make_sortkey() to use val_str, val_real, etc. I failed to understant why *result() methos are used there. I've even replaced them with non-*result versions and not a single test failed. Including the test case that was added when val* methods were replaced with val*result methods.
I don't know. I copied this code from filesort.cc. It uses val_xxx_result() methods. Are you saying that filesort.cc can be fixed to use val_xxx() instead of val_xxx_result()? If so, then it should be just fixed in a separate patch in 10.1 before my patch.
+Type_handler_register::Type_handler_register() +{ + memset(m_handler, 0, sizeof(m_handler)); + m_min_type= 256; + m_max_type= 0; + m_min_length= 256; + m_max_length= 0; + add(&Type_handler_inet6::s_singleton);
Eh? I thought it's added in the Type_handler_inet6 constructor.
Well, I could not understand how to guarantee the proper order call of the constructors for Type_handler_register and Type_handler_inet6. They there called in the opposite order and it crashed. Perhaps it should not be even done in constructors at all. The code somewhere in mysqld.cc could add Type_handler_inet6 and Type_hander_xxx for the all standard types: Type_handlers.add(&Type_handler_int); Type_handlers.add(&Type_handler_tinyint); ... all standard types ... Type_handlers.add(&Type_handler_inet6); Later Type_handlers.add() will also be used when a type handler is loaded from a dynamic library. Does it sound fine?
=== modified file 'sql/item_strfunc.cc' --- sql/item_strfunc.cc 2014-06-10 18:20:33 +0000 +++ sql/item_strfunc.cc 2014-06-27 15:48:18 +0000 @@ -4303,6 +4303,7 @@ bool Item_func_dyncol_create::prepare_ar DYNAMIC_COLUMN_TYPE type= defs[i].type; if (type == DYN_COL_NULL) // auto detect { + DBUG_ASSERT(!Type_handlers.handler(args[valpos]->field_type()));
mark this TODO, please
Can you clarify please?
/* We don't have a default here to ensure we get a warning if one adds a new not handled MYSQL_TYPE_...
=== modified file 'sql/rpl_utility.cc' --- sql/rpl_utility.cc 2014-05-13 09:53:30 +0000 +++ sql/rpl_utility.cc 2014-06-27 15:48:45 +0000 @@ -675,6 +675,9 @@ can_convert_field_to(Field *field, */
DBUG_PRINT("debug", ("Base types are different, checking conversion")); + if (Type_handlers.handler(source_type)) + DBUG_RETURN(false);
add a TODO comment here, please
Can you clarify please?
+ switch (source_type) // Source type (on master) { case MYSQL_TYPE_DECIMAL:
Regards, Sergei
Thanks.
participants (3)
-
Alexander Barkov
-
Sergei Golubchik
-
Sergey Petrunia