Tierlieb Chinese sorting and filtering on a table with charset utf-8
Mar 07, 2011; 18:30
Hello there!
I'm trying to get some software ready to work with Chinese/Hanzi characters/graphemes/sinograms. My tables, using MySQL 5.x, use UTF-8 for all text data. Changing that for a local Chinese installation is not an option.
Storing data works quite nicely. That's what I like about UTF-8.
But sorting does not work, and, related to that, neither do comparisons. I'm getting the hopefully silly notion that there is no collation for CHARSET=UTF-8 that could sort according to the rules of GB2312 (which, if it were true, would probably be an interesting story to explain why).
This leads me to fear that I might have to use a Pinyin translation library (like pinyin4j, which only translates character by character and is not bijective), store the result in a separate column and sort on that. This would only give approximate results: without a huge dictionary backing it, the lack of a bijective mapping between Hanzi and Pinyin means information is lost.
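To make the "not bijective" concern concrete, here is a minimal sketch in Python (not my actual stack). The READINGS table is a tiny hand-written, hypothetical stand-in for what a Pinyin library would supply, with tones dropped; the function names are made up for illustration. It only shows that several characters can collapse to the same Pinyin string and that one character can have several readings, so a Pinyin column alone cannot fully order or distinguish them.

```python
from collections import defaultdict

# Hypothetical, hand-written table of toneless readings (illustration only).
READINGS = {
    "妈": ["ma"],
    "马": ["ma"],
    "李": ["li"],
    "力": ["li"],
    "行": ["xing", "hang"],  # one character, two common readings
}

def homophones(readings):
    """Return the Pinyin strings that more than one character maps to."""
    by_pinyin = defaultdict(list)
    for char, sounds in readings.items():
        for sound in sounds:
            by_pinyin[sound].append(char)
    return {sound: chars for sound, chars in by_pinyin.items() if len(chars) > 1}

def polyphones(readings):
    """Return the characters that have more than one reading."""
    return [char for char, sounds in readings.items() if len(sounds) > 1]

if __name__ == "__main__":
    print(homophones(READINGS))
    print(polyphones(READINGS))
```

Characters that share a key sort in arbitrary relative order, and a character-by-character translation has to guess which reading of a polyphone is meant.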
Or I could create a separate column with GB2312 as its encoding and sort by it using the collation I want (most likely gb2312_chinese_ci). The latter would provide better sorting than the former, but would require me to do the same thing for other countries, too (I assume there would be people interested in having big5, and that is still the same language...).
Or I could do it one level above the database layer - which is possible, but I assume any code I could write would be less sophisticated (and thereby slower) than what the database can do.
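As a rough illustration of the application-layer option, here is a minimal sketch in Python (again not my actual stack) that sorts strings by their GB2312 byte encoding instead of by Unicode code point. It relies on the assumption that the commonly used (level 1) GB2312 characters are laid out in Pinyin order; rarer characters are not, and this is plain byte order, not a reimplementation of gb2312_chinese_ci. The function name gb2312_sort_key and the sample data are hypothetical.

```python
def gb2312_sort_key(text):
    """Sort key based on the GB2312 byte encoding of the string.

    Strings that cannot be encoded in GB2312 are placed after all
    encodable ones and ordered by code point among themselves.
    """
    try:
        return (0, text.encode("gb2312"), "")
    except UnicodeEncodeError:
        return (1, b"", text)

names = ["张", "李", "王", "陈", "安"]

# Default ordering: by Unicode code point, unrelated to pronunciation.
by_code_point = sorted(names)

# GB2312 byte ordering: follows Pinyin for these common characters.
by_gb2312 = sorted(names, key=gb2312_sort_key)

if __name__ == "__main__":
    print(by_code_point)
    print(by_gb2312)
```

For the sample names the second list comes out in Pinyin order (an, chen, li, wang, zhang), while the first does not. The obvious cost is that all rows have to be fetched before sorting, so paging and filtering in the database no longer help.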
Previous research: I've seen the threads "Chinese order by with utf8" and "Chinese and MySQL / UTF8 and versions" which are a bit older than I'd like and also don't seem conclusive.
So what are my options here? I'm not married to the idea of using GB2312 - I would be willing to use any other Chinese collation, as long as it is better than the default one (which sorts by number value).
Thanks in advance, Tierlieb