tsearch2-utf8-czech - Czech UTF-8 support for Tsearch2 in PostgreSQL 8.2+

Here is a quick and dirty manual for Czech UTF-8 full-text search support in PostgreSQL (8.2+). It will quite possibly work with other languages as well. The important thing here is the UTF-8 support, as opposed to the typical Latin2 (ISO-8859-2) setup, which I was never fond of.

First, set up the environment on your server machine, that is, set its locale. Run the command locale -a to get a list of all supported locales. Here’s what my Mac box says:


iMac:/usr/local/pgsql/bin postgres$ locale -a
..
..
cs_CZ
cs_CZ.ISO8859-2
cs_CZ.UTF-8
..
..

sk_SK.UTF-8
sl_SI
sl_SI.ISO8859-2
sl_SI.UTF-8
sr_YU
sr_YU.ISO8859-2
sr_YU.ISO8859-5
sr_YU.UTF-8
sv_SE
sv_SE.ISO8859-1
sv_SE.ISO8859-15
sv_SE.UTF-8
tr_TR
tr_TR.ISO8859-9
tr_TR.UTF-8
uk_UA
uk_UA.ISO8859-5
uk_UA.KOI8-U
uk_UA.UTF-8
..
..
C
POSIX
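
Rather than scanning the whole list by eye, you can filter it. This is only a minimal sketch: filter_utf8_locale is a hypothetical helper name, not part of any package. The match is case-insensitive and allows a missing hyphen, because Linux systems with glibc usually print the name as cs_CZ.utf8 rather than cs_CZ.UTF-8.

```shell
# Hypothetical helper: read a `locale -a` style list on stdin and print
# only the UTF-8 variant of the locale given as $1 (e.g. cs_CZ).
filter_utf8_locale() {
  grep -i -E "^$1\.utf-?8$"
}

# Check this machine; print a note on stderr if nothing matches.
locale -a | filter_utf8_locale cs_CZ || echo "no cs_CZ UTF-8 locale found" >&2
```

If the command prints nothing but the note, the locale has to be installed or generated first; how to do that depends on the operating system.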

You should find the UTF-8 locale for your language there; I was looking for cs_CZ.UTF-8. Once you have confirmed it is available, make it your primary locale. Run locale to see your current settings:


iMac:/usr/local/pgsql/bin postgres$ locale
LANG="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_CTYPE="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_ALL="cs_CZ.UTF-8"

To change it (if it differs from what you need), I added the following to my ~/.bash_login file (this may vary on your system; consult Google for how to set up the default locale on your machine).


iMac:/usr/local/pgsql/bin postgres$ vim ~/.bash_login

export LC_CTYPE=cs_CZ.UTF-8
export LANG=cs_CZ.UTF-8
export LANGUAGE=cs_CZ.UTF-8

Now you need to initialize the PostgreSQL data directory with the appropriate locale. In my case:


# initialize the data directory with the Czech UTF-8 locale
/usr/local/pgsql/bin/initdb --locale=cs_CZ.UTF-8 /usr/local/pgsql/data
# then, once the server is running, create a UTF-8 database:
createdb -E UTF8 test
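
Before going further it is worth confirming that the new database really uses UTF-8. The sketch below assumes that psql is on your PATH and that the server is running; db_encoding is a hypothetical helper name introduced only for this example.

```shell
# Hypothetical helper: print the server-side encoding of the database named in $1.
# -A (unaligned) and -t (tuples only) strip the table decoration from psql's output.
db_encoding() {
  psql -At -d "$1" -c "SHOW server_encoding;"
}

# Example (needs a running server and the "test" database created above):
#   db_encoding test
```

For a database created with -E UTF8 this should report UTF8. If it reports something else, recreate the database before loading tsearch2.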

You are almost done. Now install tsearch2 support into your database by running the tsearch2.sql script.

The final step is to download my tsearch2-utf8-czech package and follow the instructions: basically, you edit the install.sql script and run it on your database, then run test.sql to test tsearch2 support.
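
These last two steps boil down to feeding SQL scripts to psql in the right order. Here is a rough sketch of how that could look. It assumes the scripts sit in the current directory under the names mentioned in this post and that the database is called test; run_sql_scripts is a hypothetical helper, not something shipped in the package.

```shell
# Hypothetical helper: run SQL scripts against a database in the given order
# and stop at the first missing file or failing script.
run_sql_scripts() {
  rss_db="$1"
  shift
  for rss_script in "$@"; do
    if [ ! -f "$rss_script" ]; then
      echo "missing script: $rss_script" >&2
      return 1
    fi
    echo "running $rss_script on $rss_db"
    # ON_ERROR_STOP makes psql abort the script on the first SQL error.
    psql -v ON_ERROR_STOP=1 -d "$rss_db" -f "$rss_script" || return 1
  done
}

# Example (edit install.sql first, as the package instructions say):
#   run_sql_scripts test tsearch2.sql install.sql test.sql
```

Stopping at the first error is a deliberate choice here: install.sql depends on the objects created by tsearch2.sql, so carrying on after a failure would only produce a cascade of misleading errors.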

Download the tsearch2-utf8-czech package (it includes the tsearch2.sql script and the ispell dictionaries). All examples of use and configuration are included in the package.

Links: Tsearch2 page, Tsearch2 information in Czech.

