Re: [Maria-discuss] JSON and binary strings
I could have been more direct. What should be the expected behavior in the following? SELECT COLUMN_JSON(COLUMN_CREATE('name', UNHEX('C1C2C3C4C5C6'))); From: Tom Worster <fsb@thefsb.org> Date: Friday, January 30, 2015 at 9:05 PM To: maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: [Maria-discuss] JSON and binary strings BINARY is one of the datatypes you can use with dynamic columns. If you try saving binary string and then retrieve it with COLUMN_JSON() then you get an invalid JSON document back. All the JSON decoders I tested reject such documents. It doesn't make sense to me that COLUMN_JSON() would return a document that's invalid according to the JSON spec and can't be decoded. But I don't know what it should do. It seems that BINARY in dynamic columns means that COLUMN_JSON() is in an impossible position. [On a related note, if you save a valid utf8 string that includes control characters (e.g. "I went down to the river,\nI set down on the bank.") then COLUMN_JSON() fails to escape them. While similar, this I think is a bug.] Tom _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
I get the result (what I think is expected) from the query "SELECT COLUMN_JSON(COLUMN_CREATE('name', UNHEX('C1C2C3C4C5C6'))):" *{"name":"ÁÂÃÄÅÆ"} * .. in SQLyog (GUI client always using UTF8 as client character set) on Windows. In command line client shipped with MariaDB 10.0 for Windows, I get garbage (both after SET NAMES latin1 and SET NAMES utf8 - but MariaDB does not (yet) have a fully unicode-enabled commandline client for Windows like MySQL 5.6 and 5.7 have). Are you sure that your client environment will handle all those special/accented characters at the same time? I think you bumped into an encoding issue in the client of some kind. What is your client environment and OS, and what does the query return here? -- Peter -- Webyog On Sat, Jan 31, 2015 at 3:59 PM, Tom Worster <fsb@thefsb.org> wrote:
I could have been more direct. What should be the expected behavior in the following?
SELECT COLUMN_JSON(COLUMN_CREATE('name', UNHEX('C1C2C3C4C5C6')));
From: Tom Worster <fsb@thefsb.org> Date: Friday, January 30, 2015 at 9:05 PM To: maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: [Maria-discuss] JSON and binary strings
BINARY is one of the datatypes you can use with dynamic columns. If you try saving binary string and then retrieve it with COLUMN_JSON() then you get an invalid JSON document back. All the JSON decoders I tested reject such documents.
It doesn't make sense to me that COLUMN_JSON() would return a document that's invalid according to the JSON spec and can't be decoded. But I don't know what it should do. It seems that BINARY in dynamic columns means that COLUMN_JSON() is in an impossible position.
[On a related note, if you save a valid utf8 string that includes control characters (e.g. "I went down to the river,\nI set down on the bank.") then COLUMN_JSON() fails to escape them. While similar, this I think is a bug.]
Tom _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
one "problem" about JSON, it only accept Unicode, be carefull when using latin1 + utf8 + json http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf "2 Conformance Conforming JSON text is a sequence of Unicode code points that strictly conforms to the JSON grammar"
Hi Roberto, Never mind latin1. What about BINARY, one of the datatypes explicitly supported by dynamic columns? If you put a BINARY string into a dynamic column, what should COLUMN_JSON do? All other JSON encoders I know error on data that cannot be encoded as JSON. But Maria instead returns non-JSON. Tom From: Roberto Spadim <roberto@spadim.com.br> Date: Saturday, January 31, 2015 at 3:48 PM To: Peter Laursen <peter_laursen@webyog.com> Cc: Tom Worster <fsb@thefsb.org>, maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: Re: [Maria-discuss] JSON and binary strings one "problem" about JSON, it only accept Unicode, be carefull when using latin1 + utf8 + json http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf "2 Conformance Conforming JSON text is a sequence of Unicode code points that strictly conforms to the JSON grammar"
Hi Peter, Yes, that is the actual returned value. But
{"name":"ÁÂÃÄÅÆ"}
is not valid JSON (see 2.5 in rfc4627). And I don't think invalid JSON *should* be expected from a function called COLUMN_JSON. But I don't know what Maria should do when it is asked to return a value in a format that cannot express that value. Tom From: Peter Laursen <peter_laursen@webyog.com> Date: Saturday, January 31, 2015 at 3:39 PM To: Tom Worster <fsb@thefsb.org> Cc: maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: Re: [Maria-discuss] JSON and binary strings I get the result (what I think is expected) from the query "SELECT COLUMN_JSON(COLUMN_CREATE('name', UNHEX('C1C2C3C4C5C6'))):" {"name":"ÁÂÃÄÅÆ"} .. in SQLyog (GUI client always using UTF8 as client character set) on Windows. In command line client shipped with MariaDB 10.0 for Windows, I get garbage (both after SET NAMES latin1 and SET NAMES utf8 - but MariaDB does not (yet) have a fully unicode-enabled commandline client for Windows like MySQL 5.6 and 5.7 have). Are you sure that your client environment will handle all those special/accented characters at the same time? I think you bumped into an encoding issue in the client of some kind. What is your client environment and OS, and what does the query return here? -- Peter -- Webyog On Sat, Jan 31, 2015 at 3:59 PM, Tom Worster <fsb@thefsb.org> wrote:
I could have been more direct. What should be the expected behavior in the following?
SELECT COLUMN_JSON(COLUMN_CREATE('name', UNHEX('C1C2C3C4C5C6')));
From: Tom Worster <fsb@thefsb.org> Date: Friday, January 30, 2015 at 9:05 PM To: maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: [Maria-discuss] JSON and binary strings
BINARY is one of the datatypes you can use with dynamic columns. If you try saving binary string and then retrieve it with COLUMN_JSON() then you get an invalid JSON document back. All the JSON decoders I tested reject such documents.
It doesn't make sense to me that COLUMN_JSON() would return a document that's invalid according to the JSON spec and can't be decoded. But I don't know what it should do. It seems that BINARY in dynamic columns means that COLUMN_JSON() is in an impossible position.
[On a related note, if you save a valid utf8 string that includes control characters (e.g. "I went down to the river,\nI set down on the bank.") then COLUMN_JSON() fails to escape them. While similar, this I think is a bug.]
Tom _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
from jslint: http://www.jslint.com/ return: JSON: good.
JSLint is not a JSON decoder. Try Firefox or Chrome or PHP or Python instead. The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F). From: Roberto Spadim <roberto@spadim.com.br> Date: Saturday, January 31, 2015 at 3:56 PM To: Tom Worster <fsb@thefsb.org> Cc: Peter Laursen <peter_laursen@webyog.com>, maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: Re: [Maria-discuss] JSON and binary strings from jslint: http://www.jslint.com/ return: JSON: good.
well jslint use javascript :) that's why we have json, it's a decoder i think the decoder is have some problem with unicode characters... at mariadb i see something like this: select COLUMN_JSON( COLUMN_CREATE('name', UNHEX('C1C2C3C4C5C6')) <- a valid dynamic column value ) <- a valid json value what 0xC1C2C3C4C5C6 mean in utf-8? could you execute you program, and echo byte-byte the value? something like (php) $php=mysql_result(...) for($i=0;$i<strlen($php);$i++) echo "$i = ".dechex(ord(substr($php,$i,1)))."\n"; and check if you have something like C1 C2 C3 C4 C5 C6
Actually, the characters were probably converted into valid utf8 when you pasted them into the JSLint web form. From: Roberto Spadim <roberto@spadim.com.br> Date: Saturday, January 31, 2015 at 3:56 PM To: Tom Worster <fsb@thefsb.org> Cc: Peter Laursen <peter_laursen@webyog.com>, maria-discuss email list <maria-discuss@lists.launchpad.net> Subject: Re: [Maria-discuss] JSON and binary strings from jslint: http://www.jslint.com/ return: JSON: good.
yeap, check what's the result() from database, byte-byte, probably you got some conversion at mysql client or at programming language at json docs i didn't found a specific encode to c1c2c3c4c5c6.... at http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf --- 9 String A string is a sequence of Unicode code points wrapped with quotation marks (U+0022). All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F. There are two-character escape sequence representations of some characters. \" represents the quotation mark character (U+0022). \\ represents the reverse solidus character (U+005C). \/ represents the solidus character (U+002F). \b represents the backspace character (U+0008). \f represents the form feed character (U+000C). \n represents the line feed character (U+000A). \r represents the carriage return character (U+000D). \t represents the character tabulation character (U+0009). So, for example, a string containing only a single reverse solidus character may be represented as "\\". Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point. Hexadecimal digits can be digits (U+0030 through U+0039) or the hexadecimal letters A through F in uppercase (U+0041 through U+0046) or lowercase (U+0061 through U+0066). So, for example, a string containing only a single reverse solidus character may be represented as "\u005C". The following four cases all produce the same result: "\u002F" "\u002f" "\/" "/" To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelvecharacter sequence, encoding the UTF-16 surrogate pair. So for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
at mariadb i think we don't have binary problems, check: create table a9 SELECT COLUMN_JSON(COLUMN_CREATE('name', UNHEX('C1C2C3C4C5C6'))) as b show create table a9: CREATE TABLE `a9` ( `b` longblob <- binary ) ENGINE=MyISAM DEFAULT CHARSET=utf8 <- charset ok
i'm using: Variable_name=Value character_set_client=utf8mb4 character_set_connection=utf8mb4 character_set_database=latin1 character_set_filesystem=latin1 character_set_results=utf8mb4 character_set_server=latin1 character_set_system=utf8
participants (3)
-
Peter Laursen
-
Roberto Spadim
-
Tom Worster