[Maria-discuss] Restated: JSON cannot represent binary data
I think we're getting lost in the detail. Beyond dispute are two facts: 1. BINARY is a valid datatype in dynamic columns. 2. JSON cannot represent binary data. It follows that: COLUMN_JSON() cannot (correctly) represent all dynamic column values. So I ask the question is: What *should* COLUMN_JSON() do when a dynamic column contains BINARY values? When I file a bug report, I like to give an expected result. I don't know what it is in this case. Every other decent JSON encoder errors on such input. Is that what Maria should do?
Hi, Tom! On Jan 31, Tom Worster wrote:
I think we're getting lost in the detail. Beyond dispute are two facts:
1. BINARY is a valid datatype in dynamic columns. 2. JSON cannot represent binary data.
It follows that:
COLUMN_JSON() cannot (correctly) represent all dynamic column values.
So I ask the question is:
What *should* COLUMN_JSON() do when a dynamic column contains BINARY values?
When I file a bug report, I like to give an expected result. I don't know what it is in this case. Every other decent JSON encoder errors on such input. Is that what Maria should do?
I suppose MariaDB could either return a NULL (meaning "failed", "invalid input", "not possible to convert to JSON") or, may be, base64 this binary data in the output. But base64 looks like an arbitrary choice, and it, technically, returns something that is *not* the original binary data. Regards, Sergei
Hi Sergei, base64 makes sense only if I store a base64 encoded value, which is what my ORM extension does (prefixed with 'data:application/octet-stream;base64,') for non-utf8 strings. So I say leave that to me. A dynamic column cannot be NULL, so using a JSON null (a different kind of null) to express "dynamic column exists but cannot be represented as requested" should work. The ORM would then have the names and positions in the structure of all the BINARY dynamic columns. With that it can send a SELECT with one or more COLUMN_GET(dyncol_blob, "name" AS BINARY) expressions. I could live with that. Tom On 2/1/15, 3:45 PM, "Sergei Golubchik" <serg@mariadb.org> wrote:
Hi, Tom!
On Jan 31, Tom Worster wrote:
I think we're getting lost in the detail. Beyond dispute are two facts:
1. BINARY is a valid datatype in dynamic columns. 2. JSON cannot represent binary data.
It follows that:
COLUMN_JSON() cannot (correctly) represent all dynamic column values.
So I ask the question is:
What *should* COLUMN_JSON() do when a dynamic column contains BINARY values?
When I file a bug report, I like to give an expected result. I don't know what it is in this case. Every other decent JSON encoder errors on such input. Is that what Maria should do?
I suppose MariaDB could either return a NULL (meaning "failed", "invalid input", "not possible to convert to JSON") or, may be, base64 this binary data in the output.
But base64 looks like an arbitrary choice, and it, technically, returns something that is *not* the original binary data.
Regards, Sergei
Hi, Tom! On Feb 01, Tom Worster wrote:
Hi Sergei,
base64 makes sense only if I store a base64 encoded value, which is what my ORM extension does (prefixed with 'data:application/octet-stream;base64,') for non-utf8 strings. So I say leave that to me.
Right...
A dynamic column cannot be NULL, so using a JSON null (a different kind of null) to express "dynamic column exists but cannot be represented as requested" should work. The ORM would then have the names and positions in the structure of all the BINARY dynamic columns. With that it can send a SELECT with one or more COLUMN_GET(dyncol_blob, "name" AS BINARY) expressions. I could live with that.
Hmm. May be a dynamic column cannot be NULL now, but this is not a conceptual limitation, there is no logical reason why it coldn't be. So I'd rather keep JSON null for that. What I mean was that the whole COLUMN_GET should fail and return NULL, just as any function does when it gets invalid input: MariaDB [test]> select uncompress("foobar"), 1/0, sqrt(-2); +----------------------+------+----------+ | uncompress("foobar") | 1/0 | sqrt(-2) | +----------------------+------+----------+ | NULL | NULL | NULL | +----------------------+------+----------+ (it can also throw a warning, like uncompress does). Regards, Sergei
What *should* COLUMN_JSON() do when a dynamic column contains BINARY values?
I suppose MariaDB could either return a NULL (meaning "failed", "invalid input", "not possible to convert to JSON") or, may be, base64 this binary data in the output.
But base64 looks like an arbitrary choice, and it, technically, returns something that is *not* the original binary data.
On 03.02.15 13:13, Sergei Golubchik wrote: [skip]
A dynamic column cannot be NULL, so using a JSON null (a different kind of null) to express "dynamic column exists but cannot be represented as requested" should work. The ORM would then have the names and positions in the structure of all the BINARY dynamic columns. With that it can send a SELECT with one or more COLUMN_GET(dyncol_blob, "name" AS BINARY) expressions. I could live with that. Hmm. May be a dynamic column cannot be NULL now, but this is not a conceptual limitation, there is no logical reason why it coldn't be. So I'd rather keep JSON null for that. What I mean was that the whole COLUMN_GET should fail and return NULL, just as any function does when it gets invalid input:
MariaDB [test]> select uncompress("foobar"), 1/0, sqrt(-2); +----------------------+------+----------+ | uncompress("foobar") | 1/0 | sqrt(-2) | +----------------------+------+----------+ | NULL | NULL | NULL | +----------------------+------+----------+
(it can also throw a warning, like uncompress does).
Logic under dynamic column NULL is following - do not store NULL data. So actually any unknown field is NULL and dynamic column function accept NULL as value and even return NULL. The problem is that for JSON the field will not be listed then. [skip]
Hi Sergei, On 2/3/15, 7:13 AM, "Sergei Golubchik" <serg@mariadb.org> wrote:
A dynamic column cannot be NULL, so using a JSON null (a different kind of null) to express "dynamic column exists but cannot be represented as requested" should work. The ORM would then have the names and positions in the structure of all the BINARY dynamic columns. With that it can send a SELECT with one or more COLUMN_GET(dyncol_blob, "name" AS BINARY) expressions. I could live with that.
Hmm. May be a dynamic column cannot be NULL now, but this is not a conceptual limitation, there is no logical reason why it coldn't be.
There may be no *technical* reason why it couldn't be but I see no *logical* reason why it would be. I can understand how tables (be they schema, temp or select) need NULL to say "it's not here" but a dynamic column existing and having a "value" of NULL is a non-sequitur. Writing a NULL to a dynamic column currently deletes it and I think that makes perfect sense. (I think we see know how the word "column" in "dynamic columns" can be misleading.)
So I'd rather keep JSON null for that. What I mean was that the whole COLUMN_GET should fail and return NULL, just as any function does when it gets invalid input:
MariaDB [test]> select uncompress("foobar"), 1/0, sqrt(-2); +----------------------+------+----------+ | uncompress("foobar") | 1/0 | sqrt(-2) | +----------------------+------+----------+ | NULL | NULL | NULL | +----------------------+------+----------+
(it can also throw a warning, like uncompress does).
I see what you mean but it's not a very useful answer. Much more useful to get the names, datatypes and structure plus all the values that aren't non-unicode strings. Tom
NULL and missing are two different concepts. Null means unknown, not missing. The data structure should have an ability to store NULL to be compatible with RDBMS semantics. While a dynamic column is similar to a NoSQL document store it is not one, and should support SQL semantics. IMHO --Justin Sent from my iPhone
On Feb 3, 2015, at 7:58 AM, Tom Worster <fsb@thefsb.org> wrote:
Hi Sergei,
On 2/3/15, 7:13 AM, "Sergei Golubchik" <serg@mariadb.org> wrote:
A dynamic column cannot be NULL, so using a JSON null (a different kind of null) to express "dynamic column exists but cannot be represented as requested" should work. The ORM would then have the names and positions in the structure of all the BINARY dynamic columns. With that it can send a SELECT with one or more COLUMN_GET(dyncol_blob, "name" AS BINARY) expressions. I could live with that.
Hmm. May be a dynamic column cannot be NULL now, but this is not a conceptual limitation, there is no logical reason why it coldn't be.
There may be no *technical* reason why it couldn't be but I see no *logical* reason why it would be. I can understand how tables (be they schema, temp or select) need NULL to say "it's not here" but a dynamic column existing and having a "value" of NULL is a non-sequitur. Writing a NULL to a dynamic column currently deletes it and I think that makes perfect sense. (I think we see know how the word "column" in "dynamic columns" can be misleading.)
So I'd rather keep JSON null for that. What I mean was that the whole COLUMN_GET should fail and return NULL, just as any function does when it gets invalid input:
MariaDB [test]> select uncompress("foobar"), 1/0, sqrt(-2); +----------------------+------+----------+ | uncompress("foobar") | 1/0 | sqrt(-2) | +----------------------+------+----------+ | NULL | NULL | NULL | +----------------------+------+----------+
(it can also throw a warning, like uncompress does).
I see what you mean but it's not a very useful answer. Much more useful to get the names, datatypes and structure plus all the values that aren't non-unicode strings.
Tom
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
On 2/3/15, 10:24 AM, "Justin Swanhart" <greenlion@gmail.com> wrote:
NULL and missing are two different concepts. Null means unknown, not missing. The data structure should have an ability to store NULL to be compatible with RDBMS semantics. While a dynamic column is similar to a NoSQL document store it is not one, and should support SQL semantics.
IMHO
--Justin
Hi Justin, My understanding has always been basically what Wikipedia says:
Null is a special marker used in Structured Query Language (SQL) to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information".
What does Null represent if not the above? Tom
Wikipedia's explanation sounds like a way to say nothing, while still pretending they are saying something. An information does not exist "in the database"? This is obvious for anyone! I don't know a single person stupid enough to think that NULL is a valid name or telephone number :) The open question is: is the datum *unknown* or *missing* "in the reality"? Or, if you prefer: does it mean that I don't know your telephone number, or does it mean that you don't have a telephone? I totally agree with Justin's view. Any other interpratation does not justify the special threatment of NULL pseudo-values in SQL. Regards Federico -------------------------------------------- Mar 3/2/15, Tom Worster <fsb@thefsb.org> ha scritto: Oggetto: Re: [Maria-discuss] Restated: JSON cannot represent binary data A: "Justin Swanhart" <greenlion@gmail.com> Cc: "maria-discuss email list" <maria-discuss@lists.launchpad.net> Data: Martedì 3 febbraio 2015, 17:29 On 2/3/15, 10:24 AM, "Justin Swanhart" <greenlion@gmail.com> wrote:
NULL and missing are two different concepts. Null means unknown, not missing. The data structure should have an ability to store NULL to be compatible with RDBMS semantics. While a dynamic column is similar to a NoSQL document store it is not one, and should support SQL semantics.
IMHO
--Justin
Hi Justin, My understanding has always been basically what Wikipedia says:
Null is a special marker used in Structured Query Language (SQL) to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information".
What does Null represent if not the above? Tom _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
back to the question What *should* COLUMN_JSON() do when a dynamic column contains BINARY values? today what should be expect? return "wrong" (incompatible with json specs) json value? return null+warning? return "wrong"+warning? i think it's a 'sql_mode' that should be set to strict or not
Why a warning? NULL has a meaning that cannot be represented in other ways. A missing death_date means that you John is alive. death_date=NULL means that we don't know when John has died. Representing an information in the only correct way should not cause a warning. Redards Federico -------------------------------------------- Mar 3/2/15, Roberto Spadim <roberto@spadim.com.br> ha scritto: Oggetto: Re: [Maria-discuss] Restated: JSON cannot represent binary data A: "Federico Razzoli" <federico_raz@yahoo.it> Cc: "Justin Swanhart" <greenlion@gmail.com>, "Tom Worster" <fsb@thefsb.org>, "maria-discuss email list" <maria-discuss@lists.launchpad.net> Data: Martedì 3 febbraio 2015, 20:45 back to the question What *should* COLUMN_JSON() do when a dynamic column contains BINARY values? today what should be expect? return "wrong" (incompatible with json specs) json value? return null+warning? return "wrong"+warning? i think it's a 'sql_mode' that should be set to strict or not
i will give an example with uncompress: SELECT UNCOMPRESS('abc') return-> NULL SHOW WARNINGS "Level" "Code" "Message" "Warning" "1259" "ZLIB: Dados de entrada está corrupto" return NULL+warning but that's not COLUMN_JSON UNCOMPRESS receive a wrong input COLUMN_JSON return a "wrong" output with a binary data at dynamic column, json output "should not" be expected by some decoders, but maybe it work with others decoders the point is... what should we return? NULL, 'wrong' output, or NULL+warning, or wrong output + warning? check json specs: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf "JSON does not support cyclic graphs, at least not directly. JSON is not indicated for applications requiring binary data." but i don't see why many parts at google / forum / etc, tell that it *don't* suport binary, if it's a javascript representation (not 100%), we should consider that node.js can't handle binary data? being practical... what decoders should be used to test COLUMN_JSON? maybe we need just a bit of documentation at COLUMN_JSON, instead of changing function source code, now i see that's not 100% clear
the warning is not for null, the warning is for column_json return not strict to json specs
Hi, I suggest creating an MDEV spec for COLUMN_BSON() to return BSON data which can handle binary (MongoDB uses it to store binary data). In the same MDEV, COLUMN_JSON should be modified to omit binary columns and return a warning to use COLUMN_BSON for a complete representation of the data. Does that sound reasonable? --Justin On Tue, Feb 3, 2015 at 4:43 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
the warning is not for null, the warning is for column_json return not strict to json specs
Hi guys I will comment Justin Swanhart mail, that's reasonable
Hi, I suggest creating an MDEV spec for COLUMN_BSON() to return BSON data which can handle binary (MongoDB uses it to store binary data).
In the same MDEV, COLUMN_JSON should be modified to omit binary columns and return a warning to use COLUMN_BSON for a complete representation of
Nice, solve a binary representation problem, maybe we will need BSON_TO_COLUMN and JSON_TO_COLUMN too the data. I think we should extend the COLUMN_JSON function instead of COLUMN_JSON( dyncol ) "we" could have: COLUMN_JSON( dyncol , binary_output ) at binary_output we explain how binary values will be returned: 'base64' -> return as base64 'raw' -> do what COLUMN_JSON do today 'buffer' -> like node.js encoding binary as array [12,34,52,42,123, etc ...] 'null' -> return NULL any binary_output 'conversion' should report a warning that we have 'binary casting' with column_json function or maybe, a warning to use column_bson function
BSON works for me so long as Maria is happy to pass BSON texts as strings. When I was reading the BSON spec I just had to laugh. Douglas Crockford often talk about a bug he made in JSON reference library: * http://youtu.be/bo36MrBfTk4?t=38m2s And BSON start's its specification with:
document ::= int32 e_list "\x00"
Just wonderful!
From: Justin Swanhart <greenlion@gmail.com> Date: Tuesday, February 3, 2015 at 7:31 PM To: Roberto Spadim <roberto@spadim.com.br> Cc: Federico Razzoli <federico_raz@yahoo.it>, Tom Worster <fsb@thefsb.org>, maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: Re: [Maria-discuss] Restated: JSON cannot represent binary data
Hi,
I suggest creating an MDEV spec for COLUMN_BSON() to return BSON data which can handle binary (MongoDB uses it to store binary data). In the same MDEV, COLUMN_JSON should be modified to omit binary columns and return a warning to use COLUMN_BSON for a complete representation of the data.
Does that sound reasonable?
--Justin
On Tue, Feb 3, 2015 at 4:43 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
the warning is not for null, the warning is for column_json return not strict to json specs
Hi, Justin! On Feb 03, Justin Swanhart wrote:
Hi,
I suggest creating an MDEV spec for COLUMN_BSON() to return BSON data which can handle binary (MongoDB uses it to store binary data). In the same MDEV, COLUMN_JSON should be modified to omit binary columns and return a warning to use COLUMN_BSON for a complete representation of the data.
Does that sound reasonable?
Yes, very. Good idea, thanks. Regards, Sergei
Hi, Tom! On Feb 03, Tom Worster wrote:
Hi Sergei,
On 2/3/15, 7:13 AM, "Sergei Golubchik" <serg@mariadb.org> wrote:
A dynamic column cannot be NULL, so using a JSON null (a different kind of null) to express "dynamic column exists but cannot be represented as requested" should work. The ORM would then have the names and positions in the structure of all the BINARY dynamic columns. With that it can send a SELECT with one or more COLUMN_GET(dyncol_blob, "name" AS BINARY) expressions. I could live with that.
Hmm. May be a dynamic column cannot be NULL now, but this is not a conceptual limitation, there is no logical reason why it coldn't be.
There may be no *technical* reason why it couldn't be but I see no *logical* reason why it would be. I can understand how tables (be they schema, temp or select) need NULL to say "it's not here" but a dynamic column existing and having a "value" of NULL is a non-sequitur. Writing a NULL to a dynamic column currently deletes it and I think that makes perfect sense. (I think we see know how the word "column" in "dynamic columns" can be misleading.) ...
Okay. I think that as our dyncols don't support NULLs and JSON does, that might be a reasonable solution.
I see what you mean but it's not a very useful answer. Much more useful to get the names, datatypes and structure plus all the values that aren't non-unicode strings.
Okay. Would you like to report a bug for that or should I? Regards, Sergei
participants (6)
-
Federico Razzoli
-
Justin Swanhart
-
Oleksandr Byelkin
-
Roberto Spadim
-
Sergei Golubchik
-
Tom Worster