<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AprilChild &#187; Garuda</title>
	<atom:link href="http://www.april-child.com/blog/category/garuda/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.april-child.com/blog</link>
	<description>Insignificant Tagline</description>
	<lastBuildDate>Tue, 13 Apr 2010 18:42:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>tsearch2-utf8-czech &#8211; Czech UTF-8 support for Tsearch2 PostgreSQL 8.2+</title>
		<link>http://www.april-child.com/blog/2007/06/25/tsearch2-utf8-czech-czech-utf-8-support-for-tsearch2-postgresql-82/</link>
		<comments>http://www.april-child.com/blog/2007/06/25/tsearch2-utf8-czech-czech-utf-8-support-for-tsearch2-postgresql-82/#comments</comments>
		<pubDate>Mon, 25 Jun 2007 11:26:22 +0000</pubDate>
		<dc:creator>p</dc:creator>
				<category><![CDATA[Garuda]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://www.april-child.com/blog/2007/06/25/tsearch2-utf8-czech-czech-utf-8-support-for-tsearch2-postgresql-82/</guid>
		<description><![CDATA[
Here is a quick-and-dirty guide to Czech UTF-8 full-text search support in PostgreSQL (8.2+). It will quite possibly work with other languages as well. The important thing here is the UTF-8 support, as opposed to the typical Latin2 (ISO-8859-2) setup, which I was never fond of.


First, set up the environment on your server machine, that is, set [...]]]></description>
			<content:encoded><![CDATA[<p>
Here is a quick-and-dirty guide to Czech UTF-8 full-text search support in PostgreSQL (8.2+). It will quite possibly work with other languages as well. The important thing here is the <strong>UTF-8 support</strong>, as opposed to the typical Latin2 (ISO-8859-2) setup, which I was never fond of.
</p>
<p>
First, set up the environment on your server machine, that is, set its locale. Run <strong>locale -a</strong> to list all supported locales. Here&#8217;s what my Mac box says:</p>
<p><code><br />
iMac:/usr/local/pgsql/bin postgres$ locale -a<br />
..<br />
..<br />
cs_CZ<br />
cs_CZ.ISO8859-2<br />
cs_CZ.UTF-8<br />
..<br />
..<br />
sk_SK.UTF-8<br />
sl_SI<br />
sl_SI.ISO8859-2<br />
sl_SI.UTF-8<br />
sr_YU<br />
sr_YU.ISO8859-2<br />
sr_YU.ISO8859-5<br />
sr_YU.UTF-8<br />
sv_SE<br />
sv_SE.ISO8859-1<br />
sv_SE.ISO8859-15<br />
sv_SE.UTF-8<br />
tr_TR<br />
tr_TR.ISO8859-9<br />
tr_TR.UTF-8<br />
uk_UA<br />
uk_UA.ISO8859-5<br />
uk_UA.KOI8-U<br />
uk_UA.UTF-8<br />
..<br />
..<br />
C<br />
POSIX<br />
</code></p>
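<p>
Instead of scanning the list by eye, the lookup can be scripted. The following is a minimal sketch, not part of the original setup: the helper names and the WANTED_LOCALE variable are made up for illustration. It compares names case-insensitively and treats "UTF-8" and "utf8" as the same suffix, because some systems (Linux, for example) list the locale as cs_CZ.utf8.
</p>

```shell
#!/bin/sh
# Normalize a locale name so that "cs_CZ.UTF-8" and "cs_CZ.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/'
}

# Succeed if the wanted locale appears in the list read from standard input.
locale_in_list() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r name; do
        [ "$(normalize_locale "$name")" = "$wanted" ] && return 0
    done
    return 1
}

# WANTED_LOCALE is a hypothetical variable; it defaults to the locale used in this post.
WANTED_LOCALE=${WANTED_LOCALE:-cs_CZ.UTF-8}
if locale -a 2>/dev/null | locale_in_list "$WANTED_LOCALE"; then
    echo "$WANTED_LOCALE is available"
else
    echo "$WANTED_LOCALE is not available"
fi
```

<p>
Run it as is to look for cs_CZ.UTF-8, or set WANTED_LOCALE to look for a different locale.
</p>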
<p>
You should find your UTF-8 locale there; I was looking for cs_CZ.UTF-8. Once you have confirmed it is available, make it your primary locale. Run <strong>locale</strong> to see your current settings.
</p>
<p><code><br />
iMac:/usr/local/pgsql/bin postgres$ locale<br />
LANG="cs_CZ.UTF-8"<br />
LC_COLLATE="cs_CZ.UTF-8"<br />
LC_CTYPE="cs_CZ.UTF-8"<br />
LC_MESSAGES="cs_CZ.UTF-8"<br />
LC_MONETARY="cs_CZ.UTF-8"<br />
LC_NUMERIC="cs_CZ.UTF-8"<br />
LC_TIME="cs_CZ.UTF-8"<br />
LC_ALL="cs_CZ.UTF-8"<br />
</code></p>
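<p>
If you only care whether the active character set is UTF-8, the standard <strong>locale charmap</strong> query gives a shorter answer than the full listing. The sketch below assumes nothing beyond that command; the helper name is made up, and it accepts both spellings of the codeset name because they vary between systems.
</p>

```shell
#!/bin/sh
# Succeed when the given codeset name denotes UTF-8 (the spelling varies by OS).
is_utf8_codeset() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        utf-8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# "locale charmap" prints the codeset of the current locale.
codeset=$(locale charmap 2>/dev/null || echo unknown)
if is_utf8_codeset "$codeset"; then
    echo "current codeset is UTF-8"
else
    echo "current codeset is $codeset, not UTF-8"
fi
```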
<p>To change it (if it differs from what you need), I added the following to my ~/.bash_login file (this may vary on your system; consult Google for how to set up the default locale on your machine).
</p>
<p><code><br />
iMac:/usr/local/pgsql/bin postgres$ vim ~/.bash_login<br />
<br />
export LC_CTYPE=cs_CZ.UTF-8<br />
export LANG=cs_CZ.UTF-8<br />
export LANGUAGE=cs_CZ.UTF-8<br />
</code></p>
<p>Now you need to initialize the PostgreSQL data directory with the appropriate locale. In my case:
</p>
<p><code><br />
# initialize the data directory<br />
/usr/local/pgsql/bin/initdb --locale=cs_CZ.UTF-8 /usr/local/pgsql/data<br />
# then create a database with:<br />
createdb -E UTF8 test<br />
</code></p>
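<p>
Once the server is running, it is worth confirming that the cluster and the new database really ended up with the intended settings. The following is a rough sketch, assuming psql is on your PATH and can connect to the <strong>test</strong> database created above; the helper names are made up for illustration.
</p>

```shell
#!/bin/sh
# Print one server setting, or "unavailable" if psql cannot connect.
# $1 is the setting name, $2 the database (defaults to "test").
show_setting() {
    psql -At -d "${2:-test}" -c "SHOW $1" 2>/dev/null || echo unavailable
}

# Compare a reported value with the expected one and print a verdict.
check_setting() {
    name=$1
    expected=$2
    actual=$3
    if [ "$actual" = "$expected" ]; then
        echo "ok: $name = $actual"
    else
        echo "mismatch: $name = $actual (expected $expected)"
        return 1
    fi
}

if command -v psql >/dev/null 2>&1; then
    check_setting server_encoding UTF8 "$(show_setting server_encoding)" || true
    check_setting lc_ctype cs_CZ.UTF-8 "$(show_setting lc_ctype)" || true
    check_setting lc_collate cs_CZ.UTF-8 "$(show_setting lc_collate)" || true
else
    echo "psql not found on PATH"
fi
```

<p>
A mismatch in lc_ctype or lc_collate usually means the data directory was initialized with a different locale and initdb has to be run again.
</p>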
<p>You are almost done. Now install tsearch2 support into your database by running the <strong>tsearch2.sql</strong> script.
</p>
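<p>
Running the script could look roughly like the sketch below. The script path and the database name are assumptions, not values from this post: /usr/local/pgsql/share/contrib/tsearch2.sql is where a source install with the contrib modules typically places the file, so adjust TSEARCH2_SQL and TARGET_DB (both hypothetical variables) to your setup.
</p>

```shell
#!/bin/sh
# Assumed values, not taken from the post: adjust both to your setup.
TSEARCH2_SQL=${TSEARCH2_SQL:-/usr/local/pgsql/share/contrib/tsearch2.sql}
TARGET_DB=${TARGET_DB:-test}

# Load an SQL script into a database, stopping at the first error.
load_sql_script() {
    script=$1
    db=$2
    if [ ! -r "$script" ]; then
        echo "cannot read $script" >&2
        return 1
    fi
    psql -v ON_ERROR_STOP=1 -d "$db" -f "$script"
}

if command -v psql >/dev/null 2>&1 && [ -r "$TSEARCH2_SQL" ]; then
    load_sql_script "$TSEARCH2_SQL" "$TARGET_DB" || echo "loading failed" >&2
else
    echo "psql or $TSEARCH2_SQL not found; nothing loaded"
fi
```

<p>
Setting ON_ERROR_STOP makes psql abort at the first failing statement instead of continuing through the rest of the script.
</p>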
<p>The final step is to download my tsearch2-utf8-czech package and follow its instructions (basically, you edit the install.sql script and run it on your database, then run test.sql to test tsearch2 support).
</p>
<p>
	Download the <a href="/blog-data/tsearch2-utf8-czech.tgz">tsearch2-utf8-czech</a> package (includes the tsearch2.sql script and ispell dictionaries). <em>All usage and configuration examples are included in the package</em>.
</p>
<p>
Links: <a href="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">Tsearch2 page</a>, <a href="http://www.pgsql.cz/index.php/TSearch2">Tsearch2 information in Czech</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.april-child.com/blog/2007/06/25/tsearch2-utf8-czech-czech-utf-8-support-for-tsearch2-postgresql-82/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
