A few months back I ran into an issue with migrating Unicode characters from MSSQL to MySQL (okay… so maybe it was more than a year ago and I never really got around to writing anything, stop judging). Through much research and frustration, I found that MSSQL and MySQL do not support any of the same Unicode character sets. Thus it is nearly impossible to migrate any character that is not covered under the extended ASCII standard. Also go figure I couldn’t find any documentation on anyone running into this particular issue.

MSSQL has a fairly limited amount of character sets that it can export; you can export using the -C or -w flag while using bcp. By specifying the -C you will be able to export any of the character sets found in the Code Page Architecture (http://msdn.microsoft.com/en-us/library/ms186356.aspx). When you specify the -w flag MSSQL will export all data as UTF-16LE (http://en.wikipedia.org/wiki/UTF-16/UCS-2). Unfortunately MySQL’s native support for UTF-16 is limited to UTF-16BE. This inconsistency in the supported UTF-16 encoding by the two database engines renders a direct migration impossible for unicode characters.

After searching high and low for a boxed solution I came to realization that either I suck at googling (a definite possibility) or nothing existed. So like any true professional, I decided to roll my own half-assed solution using bcp, libiconv (http://gnuwin32.sourceforge.net/packages/libiconv.htm), and MySQL’s LOAD DATA INFILE.

Walk Through

First you will need to convert your table schema in MSSQL to a MySQL table. This should be pretty straight forward, but you will need to have a completely identical tables in both systems before you start the moving process.

Then we start out by dumping the information from MSSQL. During this dumping process I ended up having to use marker characters for special positions. I would love to say this is because there was some real reason behind it, but nope, I was just too inept to get it working any other way.

$ bcp "select '**BOL**', u.* from absynth.dbo.user u;" queryout "C:tmpuser.csv.dump" -w -CRAW -t"**EOT**" -r"***EOL***" -S localhost -U "absynth" -P "peanutbutter"

Then we will run the iconv conversion from MSSQL’s Unicode format to MySQL’s Unicode format.

$ iconv -f UTF-16LE -t UTF-8  "C:tmpuser.csv"

After that we will load the information into MySQL using the markers we created during the dumping process in MSSQL.

$ mysql -h localhost -u absynth --password=cookies --execute="ALTER TABLE user DISABLE KEYS; LOAD DATA LOCAL INFILE 'C:tmpuser.csv.dump' INTO TABLE user CHARACTER SET utf8 FIELDS TERMINATED BY '**EOT**' ENCLOSED BY '' ESCAPED BY '' LINES STARTING BY '**BOL****EOT**' TERMINATED BY '***EOL***'; ALTER TABLE user ENABLE KEYS;" --database=absynth

Finally here is a completed script for the conversion process and some extra commands to clean up our temp data.

$ bcp "select '**BOL**', u.* from absynth.dbo.user u;" queryout "C:tmpuser.csv.dump" -w -CRAW -t"**EOT**" -r"***EOL***" -S localhost -U "absynth" -P "peanutbutter"
 $ iconv -f UTF-16LE -t UTF-8  "C:tmpuser.csv" 
$ del "C:tmpuser.csv.dump" 
$ mysql -h localhost -u absynth --password=cookies --execute="ALTER TABLE user DISABLE KEYS; LOAD DATA LOCAL INFILE 'C:tmpuser.csv.dump' INTO TABLE user CHARACTER SET utf8 FIELDS TERMINATED BY '**EOT**' ENCLOSED BY '' ESCAPED BY '' LINES STARTING BY '**BOL****EOT**' TERMINATED BY '***EOL***'; ALTER TABLE user ENABLE KEYS;" --database=absynth 
$ del "C:tmpuser.csv"

2 Comments

  1. Thanks very much for your article. It does have some small errors. Here is a script I use (save as a .bat file) based off of yours. Things in [brackets] are just placeholders and should be changed to reflect actual dbname, tables, usernames, and passwords:

    bcp “select ‘**BOL**’, a.* from [mssqlDatabaseName].[mssqlTableName] a;” queryout “C:tmp[mysqlTableName].csv.dump” -w -CRAW -t”**EOT**” -r”***EOL***” -S localhost -U “[mssqlUsername]” -P “[mssqlPass]”

    iconv -f UTF-16LE -t UTF-8 “C:tmp[mysqlTableName].csv”

    del “C:tmp[mysqlTableName].csv.dump”

    mysql -h localhost -u [mysqlUsername] –password=[mysqlPass] –database=[mysqlDatabaseName] –execute=”ALTER TABLE [mysqlTableName] DISABLE KEYS; LOAD DATA LOCAL INFILE ‘C:\tmp\[mysqlTableName].csv’ INTO TABLE [mysqlTableName] CHARACTER SET utf8 FIELDS TERMINATED BY ‘**EOT**’ ENCLOSED BY ” ESCAPED BY ” LINES STARTING BY ‘**BOL****EOT**’ TERMINATED BY ‘***EOL***’; ALTER TABLE [mysqlTableName] ENABLE KEYS;”

    del “C:tmp[mysqlTableName].csv”

    Like

  2. Thanks Absynth,

    After scouring the Internet for a few hours trying to solve my MSSQL->MySql issues with unicode, your instructions worked perfectly.

    David

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.