On Wed, Jul 5, 2017 at 3:54 PM, David Karr <davidmichaelkarr@gmail.com> wrote:
I've inherited a small webapp that is using MariaDB for persistence. Some of the forms have textarea fields for extended text to be entered.
Someone reported an issue saving a form with some text that they had pasted from an email. The message started with this:
------------------
Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Incorrect string value: '\xC2\x95\x09Onb...' for column 'ssimpact' at row 1
----------------
I found where "Onb" is in the text, and right before it is a "bullet" character, so this appears to be a character encoding conversion issue. I tried pasting the same text after it had been forwarded to me, and it didn't fail. I'm pretty sure it didn't fail because forwarding it filtered the text down to valid characters. The person who reported the problem said that when she simply resubmitted it, it didn't fail either. That might also point to a "cleansing" step that left only legal characters in the submitted text.
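For what it's worth, the bytes in the error message can be checked directly. The sketch below (Java, since the app goes through the MariaDB JDBC driver) decodes `\xC2\x95\x09` as UTF-8 and shows that the character before the tab is U+0095, a C1 control character, rather than the real bullet U+2022. Byte 0x95 is the bullet in windows-1252, so this looks like windows-1252 text that was decoded as ISO-8859-1 somewhere along the way. The class and method names are made up for illustration, and `repairC1` is only one possible clean-up strategy, assuming that any stray C1 characters really are mis-decoded windows-1252 and that the JVM provides the windows-1252 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PastedTextCheck {
    // The bytes shown in the error message just before "Onb": C2 95 09.
    static final byte[] REPORTED = {(byte) 0xC2, (byte) 0x95, 0x09};

    /** Decodes the reported bytes as UTF-8. */
    static String decodeReported() {
        return new String(REPORTED, StandardCharsets.UTF_8);
    }

    /**
     * Replaces each C1 control character (U+0080 to U+009F) with the character
     * that windows-1252 assigns to the same byte value, where one exists.
     * Assumption: such characters came from mis-decoded windows-1252 text.
     */
    static String repairC1(String input) {
        Charset cp1252 = Charset.forName("windows-1252");
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                String mapped = new String(new byte[] {(byte) c}, cp1252);
                // Bytes that windows-1252 leaves unassigned decode to U+FFFD;
                // in that case keep the original character untouched.
                out.append(mapped.equals("\uFFFD") ? String.valueOf(c) : mapped);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decodeReported();
        System.out.printf("first char: U+%04X%n", (int) decoded.charAt(0));
        String repaired = repairC1(decoded + "Onb");
        System.out.printf("after repair: U+%04X%n", (int) repaired.charAt(0));
    }
}
```

If that diagnosis is right, it would also be consistent with the resubmission succeeding: a second paste may have carried a real U+2022 bullet instead of U+0095. Running a repair like this on the server before the insert is a workaround, not a substitute for a consistent charset on the column and the connection.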
What are some reasonable strategies for getting this to work a little better?
Self-replying to add some more information. I see from the output of "SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;" that for my database, DEFAULT_CHARACTER_SET_NAME is "latin1" and DEFAULT_COLLATION_NAME is "latin1_swedish_ci". When I created the database, I just did "create database <name>;". I'm guessing that I should have added "CHARACTER SET = 'utf8'" at that point (MariaDB names the charset "utf8", not "utf-8").
Now that the database exists and has data in it, if I do an "alter table" on the tables that can hold this data, will that properly convert the existing data and allow the insertion of those "special" characters like bullets?
From https://mariadb.com/kb/en/mariadb/setting-character-sets-and-collations/ , I would guess I would do something like this:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
-----------
I'm not certain about that collation name, but I noticed that the "information_schema" database has the "utf8" charset and the "utf8_general_ci" collation name.
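If several tables need the same conversion, the statements could be generated rather than typed by hand. The following is a minimal sketch, not something taken from the MariaDB documentation: the `ConvertStatements` class, its method names, and the table names in the usage example are all hypothetical, and it only builds SQL strings; running them (for example through a JDBC `Statement`) is left out. It assumes the charset and collation names passed in are valid MariaDB names such as `utf8` and `utf8_general_ci`.

```java
import java.util.ArrayList;
import java.util.List;

final class ConvertStatements {
    /** Builds one ALTER TABLE ... CONVERT TO CHARACTER SET statement. */
    static String convertTable(String table, String charset, String collation) {
        // Backtick-quote the identifier and double any embedded backtick.
        String quoted = "`" + table.replace("`", "``") + "`";
        return "ALTER TABLE " + quoted
                + " CONVERT TO CHARACTER SET " + charset
                + " COLLATE " + collation + ";";
    }

    /** Builds the conversion statement for every table in the list, in order. */
    static List<String> convertAll(List<String> tables, String charset, String collation) {
        List<String> statements = new ArrayList<>();
        for (String table : tables) {
            statements.add(convertTable(table, charset, collation));
        }
        return statements;
    }

    public static void main(String[] args) {
        // Hypothetical table names, for illustration only.
        List<String> tables = List.of("forms", "form_comments");
        for (String sql : convertAll(tables, "utf8", "utf8_general_ci")) {
            System.out.println(sql);
        }
    }
}
```

Taking a backup first and trying the generated statements on a copy of the database seems prudent, since the conversion rewrites the stored column data.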