MySQL & Python & UTF-8

Yandong Liu
1 min readJun 30, 2020

I recently ran into the following error when inserting a non-ASCII char into the database:

UnicodeEncodeError: ‘charmap’ codec can’t encode character ‘\u718a’ in position 5: character maps to <undefined>

After some Googling I was told to change the DB connection URI to:

mysql://user:passwd@localhost/db?charset=utf8

which only yielded

sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (2019, “Can’t initialize character set utf8 (path: /usr/share/mysql/charsets/)”)

This could be caused by incorrect database charset, as shown in:

mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = “db”;
+ — — — — — — — — — — — — — — +
| default_character_set_name |
+ — — — — — — — — — — — — — — +
| latin1 |
+ — — — — — — — — — — — — — — +

Below is my fix:

First, create the database with utf8mb4(which supports emoji as well):

CREATE DATABASE db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

and double-check that’s the case:

mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = “db”;
+ — — — — — — — — — — — — — — +
| default_character_set_name |
+ — — — — — — — — — — — — — — +
| utf8mb4 |
+ — — — — — — — — — — — — — — +

Then in the create table statements, be explicit about the charset:

`title` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,

In the end, attach the charset parameter when connecting:

mysql://user:passwd@localhost/db?charset=utf8mb4

Now things should just work.

--

--