SuttaCentral

Pāli spell checker and hyphenator

Dear all,

I’m not sure to what extent this topic has been discussed earlier, so I apologize if I’m reopening a question that has already been resolved.

I’m trying to make a Pāli spell checker and hyphenator for Hunspell. I contacted LibreOffice about this, and someone there replied, inviting me to collect Pāli words. I began by collecting words from the first part of the Dīgha Nikāya, from which he/she made a first “draft” of a Pāli spell checker. That draft also works in PagePlus X9 and Affinity Publisher. I then collected more words from tipitaka.org: all the words from the whole Tipiṭaka, Aṭṭhakathā, Ṭīkā and so on. I tried to contact the person from LibreOffice again in order to make a complete Pāli spell checker, but he/she didn’t answer. Anyway, I managed to make a complete “spell checker” just by renaming my .txt file to a .dic file. This badly made spell checker somehow works in PagePlus X9, but it doesn’t work at all in Affinity Publisher. (I have to use an existing language, as X9 doesn’t recognize Pāli (pi).) The discussion on Ask LibreOffice is here: Is it possible to create a Pāli dictionary for Libreoffice? - Ask LibreOffice
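For anyone attempting the conversion: a plain rename is probably not enough, because a Hunspell .dic file is expected to begin with a line giving the (approximate) number of entries, and it needs a companion .aff file that at least declares the encoding. Below is a minimal sketch in Python of how a one-word-per-line list could be turned into such a pair. The function name `build_hunspell_files` and the `pi` file stem are assumptions made for illustration only, and the sketch writes no affix rules, so every inflected form must already be in the word list.

```python
from pathlib import Path
import unicodedata


def build_hunspell_files(wordlist: Path, out_dir: Path, lang: str = "pi") -> tuple[Path, Path]:
    """Write <lang>.dic and <lang>.aff from a UTF-8, one-word-per-line text file.

    Hypothetical helper for illustration; not part of Hunspell itself.
    """
    words = set()
    for line in wordlist.read_text(encoding="utf-8").splitlines():
        # Normalize to NFC so that precomposed and decomposed spellings
        # of the same diacritic collapse into a single entry.
        word = unicodedata.normalize("NFC", line.strip())
        if word:
            words.add(word)
    ordered = sorted(words)

    out_dir.mkdir(parents=True, exist_ok=True)
    dic_path = out_dir / f"{lang}.dic"
    aff_path = out_dir / f"{lang}.aff"

    # First line of a .dic file: entry count. Then one word per line.
    dic_path.write_text("\n".join([str(len(ordered)), *ordered]) + "\n", encoding="utf-8")

    # Minimal .aff file: only the encoding declaration, no affix rules.
    aff_path.write_text("SET UTF-8\n", encoding="utf-8")
    return dic_path, aff_path
```

It could be called as `build_hunspell_files(Path("pali_words.txt"), Path("out"))`, where `pali_words.txt` is a placeholder name for the collected list. Whether a given application accepts the resulting files still depends on how it handles the language code.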

Some months later I came across a Pāli hyphenator that Cittānurakkho bhikkhu made for LaTeX. He was kind enough to share it with me, and I again contacted the person at LibreOffice in order to create both a complete Pāli spell checker and a hyphenator. Again, no response.

I tried Cittānurakkho bhikkhu’s hyphenator in Affinity Publisher. Somehow it seems to work as you can see from the image below.

Is there anyone with the skills to make a Hunspell spell checker starting from a .txt word list? I have two versions, Pāli ṃ and Pāli ṁ. As for the hyphenator, Cittānurakkho’s seems to work. It might just need editing if someone prefers another kind of hyphenation. For example, I’m aware that some people prefer to split a word such as buddha not as bud-dha but as bu-ddha. Any comments on this last issue would also be appreciated.
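On the two versions of the list: since ṃ and ṁ differ only in which niggahīta character is used, one list could in principle be generated from the other instead of being maintained separately. A small sketch, assuming the lists are UTF-8 text; the function name `convert_niggahita` is made up for this example.

```python
import unicodedata

# U+1E43 = m with dot below, U+1E41 = m with dot above;
# U+1E42 and U+1E40 are the corresponding capitals.
TO_DOT_ABOVE = str.maketrans({"\u1e43": "\u1e41", "\u1e42": "\u1e40"})
TO_DOT_BELOW = str.maketrans({"\u1e41": "\u1e43", "\u1e40": "\u1e42"})


def convert_niggahita(text: str, style: str = "above") -> str:
    """Return text with every niggahīta written in the requested style.

    style="above" gives the dot-above form, anything else the dot-below form.
    """
    # NFC first, so "m" + combining dot is handled like the precomposed letter.
    text = unicodedata.normalize("NFC", text)
    table = TO_DOT_ABOVE if style == "above" else TO_DOT_BELOW
    return text.translate(table)
```

Applied to the whole word list before building the dictionary, this would give both variants from a single source file.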

Thank you.

Hi Antonio!

I hope you get some help; it would be great to have a proper hyphenator for Hunspell.

As you note, the example does indeed appear to hyphenate correctly.

One option that sometimes works is to define the language as Sanskrit. Have you tried that? There may already be what you need there.

No, the former is correct. The syllables are correctly broken at bud/dha, and the hyphenator should reflect that, not what people may or may not want.
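To make the bud/dha point concrete, here is a rough sketch of that syllable rule in Python. It rests on simplifying assumptions that are not taken from Cittānurakkho bhikkhu’s patterns: an aspirate such as dh counts as one consonant, a single consonant between vowels goes with the following vowel, and a cluster is split after its first consonant. The digraph list may be incomplete, and the names `tokenize` and `syllabify` are invented for the example.

```python
import unicodedata

# Normalizing the literals guards against decomposed characters in this file.
VOWELS = set(unicodedata.normalize("NFC", "aāiīuūeo"))
# Stops that form a single aspirated consonant when followed by "h".
ASPIRATED_STOPS = set(unicodedata.normalize("NFC", "kgcjṭḍtdpb"))


def tokenize(word):
    """Split a word into letters, keeping aspirates such as 'dh' together."""
    word = unicodedata.normalize("NFC", word)
    units, i = [], 0
    while i < len(word):
        if word[i].lower() in ASPIRATED_STOPS and word[i + 1 : i + 2].lower() == "h":
            units.append(word[i : i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def syllabify(word):
    """Return the list of syllables under the assumptions stated above."""
    units = tokenize(word)
    vowel_positions = [k for k, u in enumerate(units) if u.lower() in VOWELS]
    breaks = []
    for prev, nxt in zip(vowel_positions, vowel_positions[1:]):
        consonants_between = nxt - prev - 1
        if consonants_between <= 1:
            # Hiatus or single consonant: break right after the vowel.
            breaks.append(prev + 1)
        else:
            # Cluster: the first consonant closes the preceding syllable.
            breaks.append(prev + 2)
    pieces, start = [], 0
    for b in breaks:
        pieces.append("".join(units[start:b]))
        start = b
    pieces.append("".join(units[start:]))
    return pieces
```

With these rules, `"-".join(syllabify("buddha"))` gives `bud-dha`, because the cluster d + dh is split after the first d. A real hyphenator would also need minimum fragment lengths and handling of compounds, which this sketch ignores.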

Dear Bhante Sujato,

Thank you for your reply. I have some texts that contain both Sanskrit and Pāli words. Affinity Publisher recognizes Sanskrit but not Pāli, so I have had to use another language in order to use the Pāli spell checker and hyphenator. I have put my .dic files in the folder for fi_FI (I don’t use Finnish) so that I can use them. If I try to make a pi (= Pāli) folder, both Affinity and PagePlus display a message like “unknown local settings”. Maybe I should raise this with the developers.

Anyway, I hope someone can make a .dic file using my Pāli .txt word list(s).
