[Maria-discuss] execute failed: Incorrect string value: '\xD6sterl...' with mariadb and perl DBD
Hi, I'm trying to use perl-DBD to write a buffer of text that contains an email with umlauts and other non-ASCII characters to a joomla database and having a problem. DBD::mysql::st execute failed: Incorrect string value: '\xD6sterl...' for column `lsv5webstage`.`xuxgc_content`.`fulltext` at row 1 at /home/alerts/scripts_linstage/AdvisoryTest.pm line 373. I'm not familiar enough with how encoding works to fully understand what the problem is. This is a fedora29 system with mariadb-10.3.12 and joomla-3.9. Apparently the '\xD6' is an O with an umlaut in "Sebastian �sterlund". I read something about utf8 not being able to handle 4-char, but I don't fully understand. I found the following reference online which talks about changing the encoding type from utf8 to utf8mb4, but the tables all appear to already be using that encoding: > SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+--------------------+ | Variable_name | Value | +--------------------------+--------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | | collation_connection | utf8mb4_unicode_ci | | collation_database | utf8mb4_unicode_ci | | collation_server | utf8mb4_unicode_ci | +--------------------------+--------------------+ execute failed: Incorrect string value: '\xD6sterl...' with mariadb and perl DBD a novice perl programmer I'm not sure it's helpful, but this is the insert statement I'm using in my perl code: my $sql = <<EOF; INSERT INTO xuxgc_content (title, alias, introtext, `fulltext`, state, catid, created, created_by, created_by_alias, modified, modified_by, checked_out, checked_out_time, publish_up, publish_down, images, urls, attribs, version, ordering, metakey, metadesc, metadata, access, hits, language) VALUES ($title, "$title_alias", $introText, $fullText, $state, $catid, $created, $created_by, $created_by_alias, $modified, $modified_by, $checked_out, $checked_out_time, $publish_up, $publish_down, $images, $urls, $attribs, $version, $ordering, $metakey, $metadesc, $metadata, $access, $hits, $language); EOF my $sth = $dbh->prepare($sql); $sth->execute(); db_disconnect($dbh); I don't recall having this problem in the past, and this script has been in use for quite a while. What am I doing wrong?
It looks like your input, with “Ö” as a single byte \xd6, is not UTF-8. Any Unicode code point above 127 requires at least 2 bytes in UTF-8. So if MySQL is expecting UTF-8, it may be choking on what it sees as invalid input. -F
On Jun 19, 2019, at 1:56 PM, Dave Wreski <dwreski@guardiandigital.com> wrote:
Hi,
I'm trying to use perl-DBD to write a buffer of text that contains an email with umlauts and other non-ASCII characters to a joomla database and having a problem.
DBD::mysql::st execute failed: Incorrect string value: '\xD6sterl...' for column `lsv5webstage`.`xuxgc_content`.`fulltext` at row 1 at /home/alerts/scripts_linstage/AdvisoryTest.pm line 373.
I'm not familiar enough with how encoding works to fully understand what the problem is. This is a fedora29 system with mariadb-10.3.12 and joomla-3.9.
Apparently the '\xD6' is an O with an umlaut in "Sebastian �sterlund". I read something about utf8 not being able to handle 4-char, but I don't fully understand.
I found the following reference online which talks about changing the encoding type from utf8 to utf8mb4, but the tables all appear to already be using that encoding:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+--------------------+ | Variable_name | Value | +--------------------------+--------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | | collation_connection | utf8mb4_unicode_ci | | collation_database | utf8mb4_unicode_ci | | collation_server | utf8mb4_unicode_ci | +--------------------------+--------------------+
execute failed: Incorrect string value: '\xD6sterl...' with mariadb and perl DBD a novice perl programmer I'm not sure it's helpful, but this is the insert statement I'm using in my perl code:
my $sql = <<EOF; INSERT INTO xuxgc_content (title, alias, introtext, `fulltext`, state, catid, created, created_by, created_by_alias, modified, modified_by, checked_out, checked_out_time, publish_up, publish_down, images, urls, attribs, version, ordering, metakey, metadesc, metadata, access, hits, language) VALUES ($title, "$title_alias", $introText, $fullText, $state, $catid, $created, $created_by, $created_by_alias, $modified, $modified_by, $checked_out, $checked_out_time, $publish_up, $publish_down, $images, $urls, $attribs, $version, $ordering, $metakey, $metadesc, $metadata, $access, $hits, $language); EOF
my $sth = $dbh->prepare($sql); $sth->execute(); db_disconnect($dbh);
I don't recall having this problem in the past, and this script has been in use for quite a while.
What am I doing wrong?
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi Felipe, On 6/19/19 1:59 PM, Felipe Gasper wrote:
It looks like your input, with “Ö” as a single byte \xd6, is not UTF-8. Any Unicode code point above 127 requires at least 2 bytes in UTF-8. So if MySQL is expecting UTF-8, it may be choking on what it sees as invalid input.
It's from an email with the following: Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Do you have any recommendations for fixing it? I can't imagine this is the first time it's been encountered. I thought about escaping or encoding it, but I don't think joomla would know to decode it. I also found this reference, which points to making changes to the tables and columns, but it's like seven years old: https://mathiasbynens.be/notes/mysql-utf8mb4 Is this still relevant? If so, is there a way to automate the changes? There are dozens of tables... Thanks, Dave
-F
On Jun 19, 2019, at 1:56 PM, Dave Wreski <dwreski@guardiandigital.com> wrote:
Hi,
I'm trying to use perl-DBD to write a buffer of text that contains an email with umlauts and other non-ASCII characters to a joomla database and having a problem.
DBD::mysql::st execute failed: Incorrect string value: '\xD6sterl...' for column `lsv5webstage`.`xuxgc_content`.`fulltext` at row 1 at /home/alerts/scripts_linstage/AdvisoryTest.pm line 373.
I'm not familiar enough with how encoding works to fully understand what the problem is. This is a fedora29 system with mariadb-10.3.12 and joomla-3.9.
Apparently the '\xD6' is an O with an umlaut in "Sebastian �sterlund". I read something about utf8 not being able to handle 4-char, but I don't fully understand.
I found the following reference online which talks about changing the encoding type from utf8 to utf8mb4, but the tables all appear to already be using that encoding:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+--------------------+ | Variable_name | Value | +--------------------------+--------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | |
collation_connection | utf8mb4_unicode_ci | | collation_database | utf8mb4_unicode_ci | | collation_server | utf8mb4_unicode_ci | +--------------------------+--------------------+
execute failed: Incorrect string value: '\xD6sterl...' with mariadb and perl DBD a novice perl programmer I'm not sure it's helpful, but this is the insert statement I'm using in my perl code:
my $sql = <<EOF; INSERT INTO xuxgc_content (title, alias, introtext, `fulltext`, state, catid, created, created_by, created_by_alias, modified, modified_by, checked_out, checked_out_time, publish_up, publish_down, images, urls, attribs, version, ordering, metakey, metadesc, metadata, access, hits, language) VALUES ($title, "$title_alias", $introText, $fullText, $state, $catid, $created, $created_by, $created_by_alias, $modified, $modified_by, $checked_out, $checked_out_time, $publish_up, $publish_down, $images, $urls, $attribs, $version, $ordering, $metakey, $metadesc, $metadata, $access, $hits, $language); EOF
my $sth = $dbh->prepare($sql); $sth->execute(); db_disconnect($dbh);
I don't recall having this problem in the past, and this script has been in use for quite a while.
What am I doing wrong?
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi Dave, You can convert iso8859-1 to UTF8 using the iconv utility. Then feed the output into MariaDB. -FG
On Jun 19, 2019, at 2:21 PM, Dave Wreski <dwreski@guardiandigital.com> wrote:
Hi Felipe,
On 6/19/19 1:59 PM, Felipe Gasper wrote: It looks like your input, with “Ö” as a single byte \xd6, is not UTF-8. Any Unicode code point above 127 requires at least 2 bytes in UTF-8. So if MySQL is expecting UTF-8, it may be choking on what it sees as invalid input.
It's from an email with the following:
Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
Do you have any recommendations for fixing it? I can't imagine this is the first time it's been encountered. I thought about escaping or encoding it, but I don't think joomla would know to decode it.
I also found this reference, which points to making changes to the tables and columns, but it's like seven years old:
https://mathiasbynens.be/notes/mysql-utf8mb4
Is this still relevant? If so, is there a way to automate the changes? There are dozens of tables...
Thanks, Dave
-F
On Jun 19, 2019, at 1:56 PM, Dave Wreski <dwreski@guardiandigital.com> wrote: Hi, I'm trying to use perl-DBD to write a buffer of text that contains an email with umlauts and other non-ASCII characters to a joomla database and having a problem. DBD::mysql::st execute failed: Incorrect string value: '\xD6sterl...' for column `lsv5webstage`.`xuxgc_content`.`fulltext` at row 1 at /home/alerts/scripts_linstage/AdvisoryTest.pm line 373. I'm not familiar enough with how encoding works to fully understand what the problem is. This is a fedora29 system with mariadb-10.3.12 and joomla-3.9. Apparently the '\xD6' is an O with an umlaut in "Sebastian �sterlund". I read something about utf8 not being able to handle 4-char, but I don't fully understand. I found the following reference online which talks about changing the encoding type from utf8 to utf8mb4, but the tables all appear to already be using that encoding:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+--------------------+ | Variable_name | Value | +--------------------------+--------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | |
collation_connection | utf8mb4_unicode_ci | | collation_database | utf8mb4_unicode_ci | | collation_server | utf8mb4_unicode_ci | +--------------------------+--------------------+ execute failed: Incorrect string value: '\xD6sterl...' with mariadb and perl DBD a novice perl programmer I'm not sure it's helpful, but this is the insert statement I'm using in my perl code: my $sql = <<EOF; INSERT INTO xuxgc_content (title, alias, introtext, `fulltext`, state, catid, created, created_by, created_by_alias, modified, modified_by, checked_out, checked_out_time, publish_up, publish_down, images, urls, attribs, version, ordering, metakey, metadesc, metadata, access, hits, language) VALUES ($title, "$title_alias", $introText, $fullText, $state, $catid, $created, $created_by, $created_by_alias, $modified, $modified_by, $checked_out, $checked_out_time, $publish_up, $publish_down, $images, $urls, $attribs, $version, $ordering, $metakey, $metadesc, $metadata, $access, $hits, $language); EOF my $sth = $dbh->prepare($sql); $sth->execute(); db_disconnect($dbh); I don't recall having this problem in the past, and this script has been in use for quite a while. What am I doing wrong? _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hey Dave, I had some problems dealing with wrong coded UTF-8 chars. You have to either deal with them before sending them to MariaDB, trying to detect if the string is encoded correctly or you can change the field type to blob, which will accept this string as is and not thrown a invalid string error in MariaDB. []s. On Wed, Jun 19, 2019 at 2:56 PM Dave Wreski <dwreski@guardiandigital.com> wrote:
Hi,
I'm trying to use perl-DBD to write a buffer of text that contains an email with umlauts and other non-ASCII characters to a joomla database and having a problem.
DBD::mysql::st execute failed: Incorrect string value: '\xD6sterl...' for column `lsv5webstage`.`xuxgc_content`.`fulltext` at row 1 at /home/alerts/scripts_linstage/AdvisoryTest.pm line 373.
I'm not familiar enough with how encoding works to fully understand what the problem is. This is a fedora29 system with mariadb-10.3.12 and joomla-3.9.
Apparently the '\xD6' is an O with an umlaut in "Sebastian �sterlund". I read something about utf8 not being able to handle 4-char, but I don't fully understand.
I found the following reference online which talks about changing the encoding type from utf8 to utf8mb4, but the tables all appear to already be using that encoding:
> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+--------------------+ | Variable_name | Value | +--------------------------+--------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | | collation_connection | utf8mb4_unicode_ci | | collation_database | utf8mb4_unicode_ci | | collation_server | utf8mb4_unicode_ci | +--------------------------+--------------------+
execute failed: Incorrect string value: '\xD6sterl...' with mariadb and perl DBD a novice perl programmer I'm not sure it's helpful, but this is the insert statement I'm using in my perl code:
my $sql = <<EOF; INSERT INTO xuxgc_content (title, alias, introtext, `fulltext`, state, catid, created, created_by, created_by_alias, modified, modified_by, checked_out, checked_out_time, publish_up, publish_down, images, urls, attribs, version, ordering, metakey, metadesc, metadata, access, hits, language) VALUES ($title, "$title_alias", $introText, $fullText, $state, $catid, $created, $created_by, $created_by_alias, $modified, $modified_by, $checked_out, $checked_out_time, $publish_up, $publish_down, $images, $urls, $attribs, $version, $ordering, $metakey, $metadesc, $metadata, $access, $hits, $language); EOF
my $sth = $dbh->prepare($sql); $sth->execute(); db_disconnect($dbh);
I don't recall having this problem in the past, and this script has been in use for quite a while.
What am I doing wrong?
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
participants (3)
-
Alessandro Ren
-
Dave Wreski
-
Felipe Gasper