I’m not sure to what extent this topic has been discussed earlier, so I apologize if I’m opening a discussion that has already been resolved.
I’m trying to make a Pāli spell checker and hyphenator for Hunspell. I contacted LibreOffice about this. Someone answered and invited me to collect Pāli words. I began by collecting words from the first part of the Dīgha Nikāya, from which he/she made the first “draft” of a Pāli spell checker. That draft also works in Page Plus X9 and Affinity Publisher. So I collected more words from tipitaka.org: all words from the whole Tipiṭaka, Aṭṭhakathā, Ṭīkā and so on. I tried to contact the person from LibreOffice again in order to make a complete Pāli spell checker, but he/she didn’t answer. Anyway, I managed to make a complete “spell checker” just by renaming my .txt file to a .dic file. This badly made spell checker somehow works in Page Plus X9, but it doesn’t work at all in Affinity Publisher. (I have to use an existing language, as X9 doesn’t recognize Pāli (pi).) The discussion in LibreOffice is here: Is it possible to create a Pāli dictionary for Libreoffice? - Ask LibreOffice
Some months later I came across a Pāli hyphenator which Cittānurakkho Bhikkhu made for LaTeX. Cittānurakkho Bhikkhu was kind enough to share his hyphenator with me, and I again contacted the person at LibreOffice in order to create both a complete Pāli spell checker and a hyphenator. Again, no response.
I tried Cittānurakkho bhikkhu’s hyphenator in Affinity Publisher. Somehow it seems to work as you can see from the image below.
Is there anyone who has the skills to make a Hunspell spell checker starting from a .txt word list? I have two versions, Pāli ṃ and Pāli ṁ. Regarding the hyphenator, it seems that Cittānurakkho’s one works. Maybe it would just need editing if someone prefers another kind of hyphenation. For example, I’m aware of people who do not want to split e.g. buddha as bud-dha, but rather as bu-ddha. Any comments about this last issue would also be appreciated.
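On the two word-list versions (ṃ and ṁ): only one list really needs to be maintained, since the other can be derived mechanically. Here is a minimal Python sketch of that idea. The function names are made up for illustration, and it assumes the only difference between the two versions is the niggahīta character (m with dot below vs. m with dot above).

```python
import unicodedata

# U+1E43 = m with dot below, U+1E41 = m with dot above (plus their capitals).
DOT_BELOW_TO_ABOVE = str.maketrans({"\u1e43": "\u1e41", "\u1e42": "\u1e40"})
DOT_ABOVE_TO_BELOW = str.maketrans({"\u1e41": "\u1e43", "\u1e40": "\u1e42"})


def to_dot_above(text):
    """Return text with every dot-below m replaced by dot-above m."""
    # NFC first, so that "m" + combining dot is turned into the single
    # precomposed character before the replacement table is applied.
    return unicodedata.normalize("NFC", text).translate(DOT_BELOW_TO_ABOVE)


def to_dot_below(text):
    """Return text with every dot-above m replaced by dot-below m."""
    return unicodedata.normalize("NFC", text).translate(DOT_ABOVE_TO_BELOW)
```

Running every line of one list through `to_dot_above` (or `to_dot_below`) should give the other version, so corrections only have to be made in one place.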
Thank you for your reply. I have some texts that contain both Sanskrit and Pāli words. Affinity Publisher recognizes Sanskrit but not Pāli, so I have had to use another language in order to be able to use the Pāli spell checker and hyphenator. I have put my .dic files in the folder for fi_FI (I don’t use Finnish) so that I can use them. If I try to make a pi (= Pāli) folder, both Affinity and Page Plus display a message like “unknown local settings”. Maybe I should talk to the developers about it.
Anyway, I hope someone can make a .dic file using my Pāli .txt word list(s).
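For what it’s worth, a plain word list is already close to a valid Hunspell dictionary. As far as I understand the format, a .dic file is the word list with an approximate word count on the first line, and it needs a companion .aff file that at least declares the encoding. Simply renaming the .txt file gives neither, which may be why the renamed file behaves unreliably. Below is a rough Python sketch, not a finished tool. The function name and the `pi` default are my own choices for illustration, and it produces no affix rules at all, so every inflected form has to be in the list.

```python
import unicodedata
from pathlib import Path


def build_hunspell_files(words, out_dir, lang_code="pi"):
    """Write <lang_code>.dic and <lang_code>.aff from an iterable of words."""
    cleaned = set()
    for raw in words:
        # NFC keeps letters such as ā or ṭ as single precomposed characters.
        word = unicodedata.normalize("NFC", raw.strip())
        # Hunspell reads "/" as the start of affix flags, so skip such entries.
        if word and "/" not in word:
            cleaned.add(word)
    entries = sorted(cleaned)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    dic_path = out / f"{lang_code}.dic"
    aff_path = out / f"{lang_code}.aff"

    # First line of a .dic file is the (approximate) number of entries.
    dic_path.write_text(
        f"{len(entries)}\n" + "\n".join(entries) + "\n", encoding="utf-8"
    )
    # Minimal affix file: only the encoding declaration, no affix rules.
    aff_path.write_text("SET UTF-8\n", encoding="utf-8")
    return dic_path, aff_path
```

To use it, read the word list with something like `Path("pali_words.txt").read_text(encoding="utf-8").splitlines()` (the file name is only a placeholder) and pass the result in. Since Affinity and Page Plus reject a `pi` folder, passing `lang_code="fi_FI"` would write the files under the name of the language being borrowed.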
Unfortunately, I have no idea how a hyphenator works. All I have done regarding it was to include in my sources the hyphenator made by Cittānurakkho Bhikkhu for LaTeX. Then I sent it to Gabix, the person at LibreOffice who helped me with the spell checker. I don’t know whether Gabix had to edit Cittānurakkho’s hyphenator or used it as it was in the Hunspell version. Maybe you could send Cittānurakkho an e-mail.
It would be great to add more words, as the words in the current spell checker are only the actual word forms found in the canon. As far as I remember, I didn’t include words from the Aṭṭhakathayo or Ṭīkayo. So words in stem (thematic) form, such as dhamma or buddha, are not included. You will find there only the actual forms used in the canon collection, such as buddho, buddhā, buddhāna, dhammā, dhammo and so on. Maybe you could join our discussion here
I think Gabix could help us in enlarging our spell checker.
The TeX hyphenation algorithm is very sophisticated. One of Donald Knuth’s students wrote a PhD thesis on it…
See, for example:
[LaTeX is a very comprehensive set of TeX macros: LaTeX - Wikipedia. Hardly anyone uses “Plain TeX”, because if you do you have to write quite a lot of layout code, and it’s a lot easier to just use the LaTeX packages. However, the really clever stuff that gives us beautiful typesetting is in the underlying TeX code. Knuth, a Stanford computer scientist, took a break from writing books about algorithms in the 70s to write the tools he needed to typeset the books properly.]
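To give a feel for how pattern-based hyphenation of this kind works, here is a small Python sketch of the core idea. Patterns are letter strings with digits between the letters, every pattern that matches somewhere in the word contributes its digits, the highest digit wins at each position, and a break is allowed wherever the winning digit is odd. The two patterns used in the usage note (`d1d` and `1d2dh`) are toy examples I made up for the bud-dha vs. bu-ddha question. They are not taken from Cittānurakkho Bhikkhu’s pattern file, and the sketch assumes lowercase, precomposed (NFC) input.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'd1dh' into ('ddh', [0, 1, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # digit sits in the gap before letter `pos`
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return `word` with '-' at every break the patterns allow."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)  # scores[j] = gap before padded[j]

    # Try every substring; keep the highest value seen for each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                for offset, value in enumerate(values):
                    idx = start + offset
                    scores[idx] = max(scores[idx], value)

    # Odd score = break allowed, even score = break forbidden.
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)
```

With only the general toy pattern, `hyphenate("buddha", ["d1d"])` gives `bud-dha`. Adding the more specific toy pattern, `hyphenate("buddha", ["d1d", "1d2dh"])`, gives `bu-ddha`, because the 2 outranks the 1 between the two d’s. So switching between the two hyphenation styles would in principle be a matter of editing patterns, not of changing the algorithm.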
Here, the question is how Gabix integrated the LaTeX hyphenator into the Hunspell version. Maybe this is quite evident to experts, but for people like me who have very little know-how in this matter, it is puzzling.
A hyphenation algorithm would need a proper sandhi breaker, which does not really exist yet. We hope to make a proper sandhi breakup, but based on dictionaries, in the next year or so, breaking up the 950k words. First we start with the DPR algorithm and then manually correct from there. The file I pointed you at is the actual rough draft. We use it to give us the dictionary lookups.
Do you know Cittānurakkho Bhikkhu? He wrote to me that he wanted to develop hyphenation patterns for Pāḷi compound words, and that for this he thought he would need a very clean, error-free Pāḷi word list. Maybe it would be a good thing to join efforts in making this. If you agree, I would ask Cittānurakkho Bhikkhu whether he would like to be involved in this task.
I’m just looking at using Affinity Publisher (V2) and was wondering if there is a working Pali spell checker and hyphenator for it? Is this project alive @Antonio-Costanzo ? If so, where can I learn more and download the current necessary files?