[Maria-discuss] JSON in MariaDB 10.2.7 and MySQL
-- MariaDB 10.2.7 CREATE TABLE `jsontst`( `id` INT NOT NULL AUTO_INCREMENT, `str` JSON, PRIMARY KEY (`id`) ) ENGINE=Aria CHARSET=latin1; SHOW CREATE TABLE `jsontst`; /* returns CREATE TABLE `jsontst2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `str` text DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=Aria DEFAULT CHARSET=latin1 PAGE_CHECKSUM=1 */ readings: https://jira.mariadb.org/browse/MDEV-9144 https://bugs.mysql.com/bug.php?id=86875 As you see except for JSON in MariaDB just being a synonym of TEXT there are two more significant differences 1) length: in MySQL JSON has the lenght of a LONGTEXT - in MariaDB it is a (plain) TEXT 2) encoding: in MySQL the utf8mb4 character set is enforced for JSON data - in MariaDB you may specify any character set. I think in particular 2nd point could cause problems (unicode garbling) when loading a dump with JSON data from MySQL. Basically because latin1 is still default character set in MariaDB. Are you aware of the 2 differences? I think the major reason for introducing JSON in MariaDB is compability concerns with MySQL. But the 2 points here are IMO incompabilites that could have been avoided?
Additionally as regards 1): I am not a programmer myself and don't know if it makes sense to support JSON string > TEXT (64K) at all. But there is at least a theoretical chance of truncation when moving data from MySQL to MariaDB. as regards 2) I think it makes perfectly sense to enforce utf8mb4 for JSON becuase of *emojis,* that are mapped outside the BMP in Unicode and thus not even supproted with utf8. Emojis are getting increasingly popular with web applications and can be expected to replace smileys based on mapping of ASCII character sequences (such as "." + "-" + ")"). -- Peter On Thu, Jul 13, 2017 at 10:50 AM, Peter Laursen <peter_laursen@webyog.com> wrote:
-- MariaDB 10.2.7 CREATE TABLE `jsontst`( `id` INT NOT NULL AUTO_INCREMENT, `str` JSON, PRIMARY KEY (`id`) ) ENGINE=Aria CHARSET=latin1; SHOW CREATE TABLE `jsontst`; /* returns CREATE TABLE `jsontst2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `str` text DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=Aria DEFAULT CHARSET=latin1 PAGE_CHECKSUM=1 */
readings: https://jira.mariadb.org/browse/MDEV-9144 https://bugs.mysql.com/bug.php?id=86875
As you see except for JSON in MariaDB just being a synonym of TEXT there are two more significant differences 1) length: in MySQL JSON has the lenght of a LONGTEXT - in MariaDB it is a (plain) TEXT 2) encoding: in MySQL the utf8mb4 character set is enforced for JSON data - in MariaDB you may specify any character set.
I think in particular 2nd point could cause problems (unicode garbling) when loading a dump with JSON data from MySQL. Basically because latin1 is still default character set in MariaDB.
Are you aware of the 2 differences? I think the major reason for introducing JSON in MariaDB is compability concerns with MySQL. But the 2 points here are IMO incompabilites that could have been avoided?
Hi, Peter! On Jul 13, Peter Laursen wrote:
-- MariaDB 10.2.7 CREATE TABLE `jsontst`( `id` INT NOT NULL AUTO_INCREMENT, `str` JSON, PRIMARY KEY (`id`) ) ENGINE=Aria CHARSET=latin1; SHOW CREATE TABLE `jsontst`; /* returns CREATE TABLE `jsontst2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `str` text DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=Aria DEFAULT CHARSET=latin1 PAGE_CHECKSUM=1 */
readings: https://jira.mariadb.org/browse/MDEV-9144 https://bugs.mysql.com/bug.php?id=86875
As you see except for JSON in MariaDB just being a synonym of TEXT there are two more significant differences 1) length: in MySQL JSON has the lenght of a LONGTEXT - in MariaDB it is a (plain) TEXT 2) encoding: in MySQL the utf8mb4 character set is enforced for JSON data - in MariaDB you may specify any character set.
I think in particular 2nd point could cause problems (unicode garbling) when loading a dump with JSON data from MySQL. Basically because latin1 is still default character set in MariaDB.
Are you aware of the 2 differences? I think the major reason for introducing JSON in MariaDB is compability concerns with MySQL. But the 2 points here are IMO incompabilites that could have been avoided?
JSON functions were added for SQL Standard and for MySQL compatibility. But JSON alias for TEXT - it was only for MySQL compatibility, no other reasons. So you're right, these issues should be fixed. 1. LONGTEXT - no questions here, we'll fix it. 2. charset. If one would write col1 JSON CHARACTER SET latin1 That'll be an error, right? 3. I could make JSON to imply validity checks too. col1 JSON being equal to col1 LONGTEXT CHARACTER SET utf8mb4 CHECK(JSON_VALID(col1)) is it what users would expect for compatibility reasons? Regards, Sergei Chief Architect MariaDB and security@mariadb.org
I personally think like this col1 JSON being equal to col1 LONGTEXT CHARACTER SET utf8mb4 CHECK(JSON_VALID(col1)) .. would be nice. -- Peter On Thu, Jul 13, 2017 at 12:03 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Peter!
On Jul 13, Peter Laursen wrote:
-- MariaDB 10.2.7 CREATE TABLE `jsontst`( `id` INT NOT NULL AUTO_INCREMENT, `str` JSON, PRIMARY KEY (`id`) ) ENGINE=Aria CHARSET=latin1; SHOW CREATE TABLE `jsontst`; /* returns CREATE TABLE `jsontst2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `str` text DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=Aria DEFAULT CHARSET=latin1 PAGE_CHECKSUM=1 */
readings: https://jira.mariadb.org/browse/MDEV-9144 https://bugs.mysql.com/bug.php?id=86875
As you see except for JSON in MariaDB just being a synonym of TEXT there are two more significant differences 1) length: in MySQL JSON has the lenght of a LONGTEXT - in MariaDB it is a (plain) TEXT 2) encoding: in MySQL the utf8mb4 character set is enforced for JSON data - in MariaDB you may specify any character set.
I think in particular 2nd point could cause problems (unicode garbling) when loading a dump with JSON data from MySQL. Basically because latin1 is still default character set in MariaDB.
Are you aware of the 2 differences? I think the major reason for introducing JSON in MariaDB is compability concerns with MySQL. But the 2 points here are IMO incompabilites that could have been avoided?
JSON functions were added for SQL Standard and for MySQL compatibility.
But JSON alias for TEXT - it was only for MySQL compatibility, no other reasons. So you're right, these issues should be fixed.
1. LONGTEXT - no questions here, we'll fix it.
2. charset. If one would write
col1 JSON CHARACTER SET latin1
That'll be an error, right?
3. I could make JSON to imply validity checks too.
col1 JSON
being equal to
col1 LONGTEXT CHARACTER SET utf8mb4 CHECK(JSON_VALID(col1))
is it what users would expect for compatibility reasons?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Peter! Okay. I've created https://jira.mariadb.org/browse/MDEV-13313 But I'd really like other users to comment on this too. Is it a generally useful thing, is it what you would expect? On Jul 13, Peter Laursen wrote:
I personally think like this
col1 JSON
being equal to
col1 LONGTEXT CHARACTER SET utf8mb4 CHECK(JSON_VALID(col1))
.. would be nice.
-- Peter
On Thu, Jul 13, 2017 at 12:03 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Peter!
On Jul 13, Peter Laursen wrote:
-- MariaDB 10.2.7 CREATE TABLE `jsontst`( `id` INT NOT NULL AUTO_INCREMENT, `str` JSON, PRIMARY KEY (`id`) ) ENGINE=Aria CHARSET=latin1; SHOW CREATE TABLE `jsontst`; /* returns CREATE TABLE `jsontst2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `str` text DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=Aria DEFAULT CHARSET=latin1 PAGE_CHECKSUM=1 */
readings: https://jira.mariadb.org/browse/MDEV-9144 https://bugs.mysql.com/bug.php?id=86875
As you see except for JSON in MariaDB just being a synonym of TEXT there are two more significant differences 1) length: in MySQL JSON has the lenght of a LONGTEXT - in MariaDB it is a (plain) TEXT 2) encoding: in MySQL the utf8mb4 character set is enforced for JSON data - in MariaDB you may specify any character set.
I think in particular 2nd point could cause problems (unicode garbling) when loading a dump with JSON data from MySQL. Basically because latin1 is still default character set in MariaDB.
Are you aware of the 2 differences? I think the major reason for introducing JSON in MariaDB is compability concerns with MySQL. But the 2 points here are IMO incompabilites that could have been avoided?
JSON functions were added for SQL Standard and for MySQL compatibility.
But JSON alias for TEXT - it was only for MySQL compatibility, no other reasons. So you're right, these issues should be fixed.
1. LONGTEXT - no questions here, we'll fix it.
2. charset. If one would write
col1 JSON CHARACTER SET latin1
That'll be an error, right?
3. I could make JSON to imply validity checks too.
col1 JSON
being equal to
col1 LONGTEXT CHARACTER SET utf8mb4 CHECK(JSON_VALID(col1))
is it what users would expect for compatibility reasons?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
participants (2)
-
Peter Laursen
-
Sergei Golubchik