Sunday, January 17, 2010

Using Unicode UTF8 for Webpages (PHP & MySQL)

Hi there,

Another adventure of mine: making a webpage support Unicode characters.

As you guys already know, I developed my own CMS and commercialized it as Digital Paper. A while ago a client asked me whether it supports Chinese characters. At the time I had no answer, as I had never had the chance to test it with Chinese characters. But the day finally came when another client of mine gave me the opportunity to build him a website with Chinese character support.

And my adventure of tuning my pages started like this:

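Basically, I found that when I submit a form containing Unicode characters (Chinese characters), PHP receives them as numeric character references (NCRs), so 中 arrives as &#20013;, and it works great! The exact NCRs get saved into the database. (Browsers typically do this when the page is not served as UTF-8 and the page encoding cannot represent the character.)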

But that wasn't ideal, because my CMS also uses AJAX, which lets values be saved without submitting the form. The AJAX code grabs the value from the text box and sends it straight to the database. In this case, it did not work.

At first, I suspected the AJAX. But then I found that AJAX has full Unicode support, and I tested that as well. Then I tried to write a function to convert the data to NCRs before saving it to the database. I failed to write one, and failed to find a good existing function that did what I wanted. After more googling, I found advice that NCRs are only a temporary solution and not a complete one, because they convert each character individually rather than preparing the whole website for Unicode.
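For anyone who still wants the NCR conversion I was looking for, PHP's mbstring extension has mb_encode_numericentity() and mb_decode_numericentity(). The following is only a minimal sketch, assuming mbstring is installed and the input is UTF-8; the helper names to_ncr() and from_ncr() are made up for illustration and are not part of my CMS.

```php
<?php
// Hypothetical helpers (names are illustrative). Requires ext-mbstring.

// Convert every non-ASCII character of a UTF-8 string to a decimal NCR.
function to_ncr(string $text): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

// Convert decimal NCRs back to UTF-8 characters.
function from_ncr(string $text): string
{
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_decode_numericentity($text, $map, 'UTF-8');
}

// This file itself must be saved as UTF-8 for the literal below to work.
echo to_ncr('abc 中文'), "\n"; // prints: abc &#20013;&#25991;
```

ASCII text passes through untouched, so only the characters outside the ASCII range get rewritten.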

I finally decided to follow the classic way that I know, which is making the whole website use the UTF-8 character set. Only 3 steps, and my work was done. Seriously.

Step 1:
Add the appropriate meta tag and make sure every page loads it (this is for the browser to understand and decode UTF-8)
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Reference: http://webcollab.sourceforge.net/unicode.html

Step 2:
Add the appropriate character set to php.ini (this makes PHP declare UTF-8 in the Content-Type header it sends with each page)
default_charset = "utf-8"
Reference: http://htmlpurifier.org/docs/enduser-utf8.html
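If php.ini cannot be edited (shared hosting, for example), the same setting can usually be applied at runtime. This is a rough sketch, assuming the host allows ini_set() for default_charset; it has to run before the page sends any output.

```php
<?php
// Runtime alternative to editing php.ini (assumes ini_set() is not disabled).
ini_set('default_charset', 'UTF-8');

// Optional: state the charset explicitly in the HTTP header as well.
// Only possible while no output has been sent yet.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Quick check of the value PHP is now using.
echo ini_get('default_charset'), "\n";
```

Putting this in a file that every page includes keeps the setting in one place.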

Step 3:
Change each MySQL table to the appropriate encoding by executing a statement like this for every table (this is for the database server to understand and accept UTF-8)
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
Reference: http://wolfram.kriesing.de/blog/index.php/2007/convert-mysql-db-to-utf8
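Running that statement by hand for every table gets tedious, so here is a small sketch that builds the statements for a list of tables. The function name build_convert_statements() and the table names are placeholders, not taken from my CMS; the output is meant to be reviewed and then pasted into the MySQL client.

```php
<?php
// Hypothetical helper: build one ALTER TABLE statement per table name.
function build_convert_statements(
    array $tables,
    string $charset = 'utf8',
    string $collation = 'utf8_general_ci'
): array {
    $statements = [];
    foreach ($tables as $table) {
        // Table names cannot be bound as query parameters,
        // so accept only simple identifiers.
        if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
            throw new InvalidArgumentException("Unexpected table name: $table");
        }
        $statements[] = "ALTER TABLE `$table` CONVERT TO CHARACTER SET $charset COLLATE $collation;";
    }
    return $statements;
}

// Placeholder table names for illustration only.
foreach (build_convert_statements(['tbl_pages', 'tbl_users']) as $sql) {
    echo $sql, "\n";
}
```

Depending on the server defaults, the connection between PHP and MySQL may also need to be told to use UTF-8, for example with mysqli's set_charset('utf8') right after connecting. Note that MySQL's utf8 stores at most 3 bytes per character; newer MySQL versions also offer utf8mb4, which covers the rarer characters outside that range.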

Finally, enjoy the show!
(the website works with Chinese characters)


Have fun guys!
 

1 comment:

Unknown said...

This information is fantastic. I have been searching for a multilingual database setup and could not find decent steps to implement it. I think it will work for English as well as Chinese/Japanese data.
Thanks
Anand Barhate