MySQL & Python & UTF-8
1 min readJun 30, 2020
I recently ran into the following error when inserting a non-ASCII char into the database:
UnicodeEncodeError: ‘charmap’ codec can’t encode character ‘\u718a’ in position 5: character maps to <undefined>
After some Googling I was told to change the DB connection URI to:
mysql://user:passwd@localhost/db?charset=utf8
which only yielded
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (2019, “Can’t initialize character set utf8 (path: /usr/share/mysql/charsets/)”)
This could be caused by incorrect database charset, as shown in:
mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = “db”;
+ — — — — — — — — — — — — — — +
| default_character_set_name |
+ — — — — — — — — — — — — — — +
| latin1 |
+ — — — — — — — — — — — — — — +
Below is my fix:
First, create the database with utf8mb4(which supports emoji as well):
CREATE DATABASE db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
and double-check that’s the case:
mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = “db”;
+ — — — — — — — — — — — — — — +
| default_character_set_name |
+ — — — — — — — — — — — — — — +
| utf8mb4 |
+ — — — — — — — — — — — — — — +
Then in the create table statements, be explicit about the charset:
…`title` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,…
In the end, attach the charset parameter when connecting:
mysql://user:passwd@localhost/db?charset=utf8mb4
Now things should just work.