Converting a MySQL database from latin1 to utf8

A detailed tutorial on how to migrate an existing latin1 encoded MySQL database to utf8 (UTF-8) encoding.

Introduction

When you create a new database on MySQL, the default behaviour is to create a database supporting the latin1 character set. This is fine for most use cases, however if your application needs to support natural languages that do not use the Latin alphabet (Greek, Japanese, Arabic etc.), then you will need to convert your database to use UTF-81 instead.

In this tutorial, I will show you have to convert an existing database and tables from latin1 to the utf8 character set. Note that I am using MySQL 5.5.34.

Creating a test database

Firstly we are going to create a test database for testing the migration process. Note that if you are going to migrate a real database, you should run this procedure against an offline backup, not against a live production system.

Lets begin by creating a new database with a simple table:

mysql> CREATE DATABASE testmigration;
mysql> USE testmigration;
mysql> CREATE TABLE test (sometext VARCHAR(50));

The new database and table will have the latin1 character set by default on a stock MySQL installation, lets confirm this:

mysql> SHOW FULL COLUMNS FROM test;
+----------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| FIELD    | TYPE        | Collation         | NULL | KEY | DEFAULT | Extra | Privileges                      | Comment |
+----------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| sometext | VARCHAR(50) | latin1_swedish_ci | YES  |     | NULL    |       | SELECT,INSERT,UPDATE,REFERENCES |         |
+----------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
1 ROW IN SET (0.00 sec)
 
mysql> SELECT default_character_set_name FROM information_schema.schemata WHERE schema_name = "testmigration";
+----------------------------+
| default_character_set_name |
+----------------------------+
| latin1                     |
+----------------------------+
1 ROW IN SET (0.00 sec)

If I try to store some non-Latin characters in the test.sometext field, it will become garbled.

Doing the migration to UTF-8

To begin with, we will alter the default character set on the new database to be utf8, which will ensure that any new tables in this database will use this by default:

mysql> ALTER DATABASE testmigration CHARACTER SET utf8 COLLATE utf8_general_ci;
 
Query OK, 1 ROW affected (0.00 sec)
 
SELECT default_character_set_name FROM information_schema.schemata WHERE schema_name = "testmigration";
 
+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8                       |
+----------------------------+
1 ROW IN SET (0.00 sec)

That takes care of new tables, but for existing tables we have to do something a little more complex. Drop back to your bash terminal, and run the following command:

$ mysqldump --add-drop-table -u root --password='password' testmigration | sed -e 's/CHARSET\=latin1/CHARSET\=utf8\ COLLATE\=utf8_general_ci/g' | iconv -f latin1 -t utf8 | mysql -u root --password='password' testmigration

The command uses the mysqldump command to dump the database to standard out, then sed is used to replace latin1 with utf8 in the dump, iconv is used to convert the dump from latin1 character encoding to utf8, and finally the mysql command is used to restore the resulting backup to the database server.

Testing

To confirm that this worked, log back in again and inspect the table as before:

mysql> USE testmigration;
mysql> SHOW FULL COLUMNS FROM test;
+----------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| FIELD    | TYPE        | Collation       | NULL | KEY | DEFAULT | Extra | Privileges                      | Comment |
+----------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| sometext | VARCHAR(50) | utf8_general_ci | YES  |     | NULL    |       | SELECT,INSERT,UPDATE,REFERENCES |         |
+----------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
1 ROW IN SET (0.00 sec)
 
mysql> SHOW CREATE TABLE test;
+-------+--------------------------------------------------------------------------------------------------+
| TABLE | CREATE TABLE                                                                                     |
+-------+--------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `sometext` VARCHAR(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------+
1 ROW IN SET (0.00 sec)</code></pre>

You should now be able to safely insert in non-Latin characters from your clients.

References

John Collins

I have been writing about web technology and software development since 2001. I am the developer of the Alpha Framework for PHP, and the five.today personal productivity app. I love open source, technology, and economics.