Just a quick update on what I have been working on with the AZIndex plugin. I decided it was finally time to do something that I have long been putting off: adding national language support to the plugin. That doesn’t mean I am translating all the English text into other languages (sorry!), but I am fixing the problems with sorting indexes that contain non-English characters, and with displaying non-English characters in the alphabetical links and headings.

But, wow, little did I realize how complex national language support in PHP is. Native Unicode support is only planned for PHP 6.0, so I have to rely on the older PHP APIs, and only those which are likely to be installed on a WordPress server (i.e., not many!). Not only that, but I discovered that WordPress uses UTF-8 (a multi-byte encoding in which a character can be anywhere from one to four bytes long). That is fine in itself, but PHP’s collation (sorting) function on Windows doesn’t work with UTF-8, so on Windows systems you have to convert every index item into the local codepage before the index can be sorted. Yuck!
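
To make the multi-byte point concrete, here is a small sketch in Python (AZIndex itself is PHP, so this only illustrates the idea and is not the plugin's code). It shows how many bytes a few characters occupy in UTF-8, why comparing raw bytes puts accented entries in the wrong place, and what a conversion to a single-byte codepage looks like. The choice of `cp1252` as the "local codepage" is an assumption made for the example.

```python
# Illustration only: not AZIndex code.

# Byte length of a few characters once encoded as UTF-8.
samples = ["E", "Ê", "€", "\U0001D11E"]  # the last one is a musical G clef
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# Sorting on the raw UTF-8 bytes compares byte values, not alphabetical
# rank, so the accented entry ends up after every unaccented one.
items = ["Eve", "Êtes", "Elle"]
byte_sorted = sorted(items, key=lambda s: s.encode("utf-8"))

# Converting to a single-byte codepage (cp1252 is assumed here) gives
# exactly one byte per character for Western European text.
utf8_size = len("Êtes".encode("utf-8"))
cp1252_size = len("Êtes".encode("cp1252"))
```

Here `byte_sorted` comes out as “Elle”, “Eve”, “Êtes”, which is not the order a French reader would expect.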

Then there is the thorny problem of character equivalence. If an index item begins with an accented character (e.g. “Êtes” or “Übersicht”), where does it go in the index? In some languages, like French, an accented character belongs in the same group as its unaccented base character (e.g. “Êtes” goes between “Elle” and “Eve” under “E”), but in other languages some accented characters are grouped separately.

None of this works in AZIndex at the moment. Even if entries like “Elle”, “Êtes”, and “Eve” are sorted in the correct order (which they will be once I make the right changes), the items are put under three separate headings (“E”, “Ê”, and “E” again) when they all need to be under “E”.
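
The heading problem is easy to reproduce. The following Python sketch (an illustration of the behaviour, not the plugin's PHP code) takes a list that is already in the correct order and starts a new heading whenever the first character changes, which is an assumed but typical way of building an A-to-Z index.

```python
from itertools import groupby

# Entries already sorted the way a French reader would expect.
sorted_items = ["Elle", "Êtes", "Eve"]

# Start a new heading each time the first character changes.
headings = [
    (first, list(group))
    for first, group in groupby(sorted_items, key=lambda item: item[0])
]

for first, group in headings:
    print(first, group)
```

Because “Ê” is a different character from “E”, the middle entry splits the run and “E” appears twice.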

The only way I have found to do this reliably on all platforms is to hardcode the mappings from the accented characters to their base characters. I’ve done this by reverse-engineering the UTF-8 collation tables for MySQL (please tell me if you know of a better way!), so that I can fold the accented entries into the correct alphabetical grouping.
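
As a rough sketch of what such a hardcoded mapping does, here is the idea in Python (again, only an illustration; the real plugin is PHP). The `FOLD` table and the `heading_for` and `build_index` helpers are hypothetical names invented for this example, and the table is a tiny hand-written subset rather than anything derived from the MySQL collation tables.

```python
from itertools import groupby

# Hypothetical, deliberately tiny mapping of accented capitals to the
# base letter they should be filed under.
FOLD = {
    "À": "A", "Â": "A", "Ä": "A",
    "Ç": "C",
    "È": "E", "É": "E", "Ê": "E", "Ë": "E",
    "Ö": "O",
    "Ü": "U",
}

def heading_for(item):
    """Return the index heading for an entry, folding accents if known."""
    first = item[0].upper()
    return FOLD.get(first, first)

def build_index(sorted_items):
    """Group already-sorted entries under their folded headings."""
    return [(h, list(g)) for h, g in groupby(sorted_items, key=heading_for)]

index = build_index(["Elle", "Êtes", "Eve", "Übersicht", "Ulm"])
```

With the folding in place, “Êtes” stays between “Elle” and “Eve” under a single “E” heading, and “Übersicht” is filed under “U”.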

Finally, since some non-English characters can appear in different places in the index depending on which language is being used, I have decided to add a new option to the index settings to allow you to pick which language rules to use when folding the accented index entries into the index.  Hopefully, if the results in the default language are not to your liking, you will find a language setting that does work.
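
One way to picture the language option is as a choice between several folding tables. The Python sketch below is only an assumption about how such a setting could work, not the plugin's actual design; `LANGUAGE_FOLDS` and `heading_for` are hypothetical names, and the tables list just a few letters. It relies on two well-known conventions: German dictionary ordering files Ä, Ö, and Ü with A, O, and U, while Swedish treats Å, Ä, and Ö as letters in their own right.

```python
# Hypothetical per-language folding tables (small subsets for illustration).
LANGUAGE_FOLDS = {
    "french": {"À": "A", "Â": "A", "Ç": "C", "É": "E", "È": "E", "Ê": "E"},
    "german": {"Ä": "A", "Ö": "O", "Ü": "U"},
    # Swedish files Å, Ä and Ö under their own headings, so nothing is folded.
    "swedish": {},
}

def heading_for(item, language):
    """Return the heading for an entry under the chosen language rules."""
    first = item[0].upper()
    return LANGUAGE_FOLDS.get(language, {}).get(first, first)

german_heading = heading_for("Ärger", "german")
swedish_heading = heading_for("Älg", "swedish")
```

The same first letter ends up under “A” with the German rules and under its own “Ä” heading with the Swedish ones.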

Anyway, I will likely be putting out a new version of AZIndex with all this stuff in it sometime soon.  Hopefully those of you who are using AZIndex with non-English web sites will see an improvement.  In the meantime, please let me know if you have any suggestions about anything here or that I may have missed.