A better Pali Dictionary

sujato · May 29, 2016, 9:56pm

Thanks for letting us know. It’s so easy for things to get lost in digital editions!

Russell · September 14, 2016, 4:45am

Dear Bhante @sujato,

I’ve been meaning to let you know that I have completed the initial compiling of the EBT Pāli words from CDOP II (Ga-Na). Work done on the spreadsheet:

-verified all EBT Pāli words, including adding missing words and terms found in CDOP II to the spreadsheet you uploaded.

I am now going through the spreadsheet a second time to finalize it. I need to ensure it is accurate to minimize further rework. I had some issues with LibreOffice during these past months and had to recover work at least three times (the latest being earlier this evening). I estimate that at least 30% was lost because of these issues. Work being done this second round:

-review for accuracy
-add back what was lost during the issues
-alphabetize it in order according to the Pāli order of letters
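
For the alphabetizing step, a spreadsheet's default sort will not follow the Pāli order of letters. The following is a minimal sketch of a sort key, assuming romanized Pāli with diacritics and the conventional order in which aspirates such as "kh" count as single letters. The names `PALI_ORDER` and `pali_sort_key` are illustrative, and the placement of ṃ and ḷ is an assumption, since dictionaries differ on it.

```python
import unicodedata

# Conventional romanized Pāli order (an assumption; some dictionaries
# place ṃ or ḷ differently).
PALI_ORDER = [
    "a", "ā", "i", "ī", "u", "ū", "e", "o", "ṃ",
    "k", "kh", "g", "gh", "ṅ",
    "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ",
    "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m",
    "y", "r", "l", "v", "s", "h", "ḷ",
]
RANK = {unicodedata.normalize("NFC", letter): i for i, letter in enumerate(PALI_ORDER)}


def pali_sort_key(word):
    """Turn a word into a list of letter ranks, treating 'kh', 'gh', etc. as one letter."""
    text = unicodedata.normalize("NFC", word.lower())
    text = text.replace("\u1e41", "\u1e43")  # treat ṁ as ṃ
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in RANK:              # two-character letters (kh, gh, ṭh, ...)
            key.append(RANK[pair])
            i += 2
        elif text[i] in RANK:
            key.append(RANK[text[i]])
            i += 1
        else:                         # unknown characters sort after all letters
            key.append(len(PALI_ORDER))
            i += 1
    return key


words = ["nara", "ghara", "gāma", "ñāṇa", "gacchati", "cakka", "khanti"]
print(sorted(words, key=pali_sort_key))
```

A column of such keys could be exported alongside the words, or the sorting could be done outside the spreadsheet altogether.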

Happy vassa!

with respect, reverence, gratitude, and in mettā,
russ

sujato · September 14, 2016, 7:18am

This is absolutely awesome. I have been vacillating between wanting to ask you how it’s going and not wanting to hassle you about it!

That’s not good! I hope you can recover it. I haven’t used LibreOffice much for years, but I would have hoped it was more stable than that.

I look forward to the final product!

You may have noticed we’ve recently made some upgrades to the PTS dictionary; nothing that will affect you, I think, but it works significantly better now.

Russell · September 15, 2016, 5:16am

Dear Bhante @sujato,

Thank you! Even with the estimated loss (I projected it higher so that I don’t get complacent and do the checks properly), we’ll be okay. I just did some proofreading this evening and sure enough, some of the new entries I thought I had made before were gone. But it’s just a matter of putting them back manually. A good lesson in patience LOL.

LibreOffice is not bad, it’s just that for some reason, it doesn’t like having the file open for so long. Within 30 minutes of working on it, it would often freeze on me just when I’m doing entries. And I just don’t know why I had to recover three times. I keep my hard drive clean and I have a lot of space, and memory isn’t a problem either. But it is what it is! Good enough!

And I’m just thankful I don’t have to start from scratch hee hee! I don’t mind it, but it would be suffering to see all the work gone LOL

with respect, reverence, gratitude, and in mettā,
russ

sujato · September 15, 2016, 7:47am

But that’s really bad; a mature spreadsheet program should be very stable.

Meanwhile, our friend @Dheerayupa has offered to help out. Perhaps you could have a chat about what has been done, what needs doing, and how she might help?

sujato · September 18, 2016, 10:51pm

A post was split to a new topic: Wildcard word search of entire Pali canon

llt · September 19, 2016, 5:28am

There are different versions of LibreOffice available for download, including versions meant to be more stable rather than “bleeding edge.” If LibreOffice is freezing or crashing every 30 minutes, then it’s not even functioning on a basic level. In that case, it should probably be removed and installed again from a download on the LibreOffice website.

https://www.libreoffice.org/download/libreoffice-still/

trevor · September 19, 2016, 7:00am

This is all of considerable interest to me as I’ve been making sporadic attempts at mastering, if that’s not too ambitious an aspiration, Pali for some years now, and began with Prof Buddhadatta’s New Pali Course, with help from a teacher during my time in Darwin. But most recently I’ve gone back to Lily de Silva’s Pali Primer and James Gair’s New Course in Reading Pali, as well as dipping into A.K. Warder. But getting into the canonical texts as quickly as possible through Gair, Johansson and SuttaCentral, and above all hearing the texts as you read them with this guidance, seems vital.

Linda · September 19, 2016, 9:07pm

@trevor
If you haven’t seen this thread, it might be of interest to you.

Also this.

Dheerayupa · September 21, 2016, 8:26am

Dear Russ,

If you think I can be of any help, please email me.

Mega metta,

Dheerayupa

Russell · September 22, 2016, 5:13am

Dear Bhante @sujato,

Thank you for the heads up! I’ll get in touch with Dheera!

with respect, reverence and gratitude,
russ

break

Dear @Dheerayupa,

Sawat dī khrap! Kor tot for the late reply! Hope you’re having a great vassa! Of course your help is welcome! I’ll e-mail you the details. Looking forward to working with you on a project again. Khop kun mak māk!!!

with gratitude and in mettā,
russ

Russell · October 16, 2016, 6:21am

Dear Bhante @sujato,

With much pleasure, please be informed that identifying and compiling the EBT Ga-Na terms has been completed. I have sent you the file through gmail and have cc’d our dear friend @Dheerayupa. Thank you very much once again for giving me the opportunity to assist.

Cittalamkaram cittaparikkharattham danam deti.

May all beings be released.

with gratitude, reverence, and in mettā,
russ

Senryu · June 9, 2017, 3:31pm

Perhaps both early and later forms could be included in one dictionary, but differentiated. Digitally that could be done for example by colour coding, having earlier definitions in one colour (black for example) and later definitions of the same words, or of words which only occur later, in another colour (blue for example). There could also be an option for digital searches of only old definitions (i.e. EBT definitions), to filter out the later ones.

The advantage would be that it caters to more people (the more the better in terms of the cost and manpower of making it and maintaining it over time), and it would save someone studying both early and later material from needing two separate dictionaries. It would also keep people from remaining ignorant of the changes in the definitions, and so help them understand the difference between the earlier and later use of the terms: why things are understood differently in later texts and by other groups (such as normal Theravadins, who may hold those later meanings as standard), and, importantly, why many people (again, such as normal Theravadins) will understand the earlier texts differently because they hold different definitions of the words.

That sounds like a whole lot of benefit. Keeping them as separate dictionaries, by contrast, may increase the tendency of fans of EBTs to remain ignorant of and detached from people indoctrinated in later material, and thus hinder the potential for mutual understanding and communication.

Just thinking aloud on the topic.

sujato · June 9, 2017, 11:12pm

For practical purposes Cone’s approach is, I think, the right one. She focusses on covering the EBTs, but includes a lot of coverage of later texts, without attempting to be comprehensive. The main problem is that it’s not a properly semantic digital text, so until someone does this work it will remain limited in usefulness.

tuvok · June 10, 2017, 12:31am

I have some vague plans to write Pali dictionary software. If you have any need for such a thing, I could probably include some additional requirements.
If there is any need for that, I can share what I was thinking about.

sujato · June 10, 2017, 12:33am

We’d be very interested in anything like that. Please share your ideas! Feel free to get as nerdy as you like.

tuvok · June 10, 2017, 12:10pm

I wanted software that would help in the translation process. It would need three features that I imagined would be useful:

1. Repetition discovery (both exact word-for-word matches and less exact ones, e.g. sentences with one or two words different or in a different form).
2. Syntax highlighting (different colours for different word forms, e.g. cases, gender etc.)
3. Dictionary
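
As a rough illustration of the repetition-discovery feature, here is a minimal sketch using Python's standard `difflib` to compare sentences word by word. The function names, the threshold and the sample sentences are assumptions made for the example; it only catches exact and near-exact word sequences, and would need stemming or morphological analysis to catch words "in a different form".

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity ratio between two sentences (1.0 = identical word sequences)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def find_repetitions(sentences, threshold=0.75):
    """Return (i, j, ratio) for every pair of sentences at or above the threshold."""
    matches = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            ratio = similarity(sentences[i], sentences[j])
            if ratio >= threshold:
                matches.append((i, j, ratio))
    return matches


# Sample sentences chosen only to illustrate a near-repetition.
sentences = [
    "evaṃ me sutaṃ ekaṃ samayaṃ bhagavā sāvatthiyaṃ viharati",
    "evaṃ me sutaṃ ekaṃ samayaṃ bhagavā rājagahe viharati",
    "atha kho bhagavā bhikkhū āmantesi",
]
for i, j, ratio in find_repetitions(sentences):
    print(i, j, round(ratio, 2))
```

Comparing every pair is quadratic in the number of sentences, so a whole-corpus version would likely need an index (for example on word n-grams) to narrow the candidates first.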

The Dictionary part could be a separate service, which would make it useful outside of translating software, for example as a backend dictionary for SC.

The (very) high level architecture would look something like:
[database] <=> [server doing db queries] <=> [jsonrpc interface]

Data would be organized in three categories:

1. Word definition in target language
2. Word meta-data (noun / verb, case, mode, voice etc.)
3. Pali words

[3] would be in a one-to-many relationship with [2]. [3] together with [2] would be related to [1], again one-to-many (or perhaps many-to-many, as the same definition could probably fit two different words?)
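
One way to picture these three categories is as three tables. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table and column names are invented for illustration, and it models the simpler one-to-many case rather than many-to-many.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pali_word (              -- [3] Pali words
    id    INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL UNIQUE
);
CREATE TABLE word_meta (              -- [2] meta-data: one word, many analyses
    id             INTEGER PRIMARY KEY,
    word_id        INTEGER NOT NULL REFERENCES pali_word(id),
    part_of_speech TEXT NOT NULL,
    grammar        TEXT               -- case, gender, voice etc. (free text here)
);
CREATE TABLE definition (             -- [1] definitions in the target language
    id       INTEGER PRIMARY KEY,
    meta_id  INTEGER NOT NULL REFERENCES word_meta(id),
    language TEXT NOT NULL,
    gloss    TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative rows only; the grammar value is a placeholder.
conn.execute("INSERT INTO pali_word (id, lemma) VALUES (1, 'dīpa')")
conn.execute(
    "INSERT INTO word_meta (id, word_id, part_of_speech, grammar) "
    "VALUES (1, 1, 'noun', 'masculine')"
)
conn.executemany(
    "INSERT INTO definition (meta_id, language, gloss) VALUES (?, ?, ?)",
    [(1, "en", "lamp"), (1, "en", "island")],
)

rows = conn.execute(
    """
    SELECT w.lemma, m.part_of_speech, d.gloss
    FROM pali_word w
    JOIN word_meta m ON m.word_id = w.id
    JOIN definition d ON d.meta_id = m.id
    WHERE w.lemma = ?
    ORDER BY d.id
    """,
    ("dīpa",),
).fetchall()
print(rows)
```

If the many-to-many variant turned out to be needed, the `meta_id` column would be replaced by a separate link table between `word_meta` and `definition`.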

The database would have the sutta texts imported and addressed, not only with paragraph numbers as is currently done: it would need more detailed addressing, with every sentence numbered (as is done with the Bible), and also each word’s position within the sentence.

Having that will enable two kinds of word definitions and two kinds of queries to the server:

1. Weak or light definition / query: the definition is linked to a word (in its exact form), but not to an explicitly addressed word (no context). This would return all definitions matching the word (e.g. dīpa: 1. lamp, 2. island).
2. Strong or hard definition / query: the definition is linked to a word in an exact place (addressed by sutta / paragraph / sentence / position in sentence). This would of course require manual linking of word and definition by someone. It would return only the right definition for that context.
This second point will require much more work and will probably never be 100% complete, but it will allow generation of word definitions as in M. Cone’s dictionaries:
[word]: [word type and form], [definition], [list of context sentences]
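
The difference between the two kinds of query can be sketched with plain dictionaries. Everything here is hypothetical: the `Address` fields follow the sutta / paragraph / sentence / position scheme described above, and the addressed occurrence is made up rather than taken from a real text.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """Position of one word occurrence: sutta, paragraph, sentence, word index."""
    sutta: str
    paragraph: int
    sentence: int
    position: int


# Light definitions: every known sense of a word form, with no context.
LIGHT = {"dīpa": ["lamp", "island"]}

# Strong definitions: a sense index chosen by hand for one addressed occurrence.
# The address below is invented for illustration.
STRONG = {Address("example-sutta", 1, 2, 3): ("dīpa", 1)}


def light_lookup(word):
    """Return all senses recorded for the word form."""
    return LIGHT.get(word, [])


def strong_lookup(address):
    """Return the single linked sense, or None if nobody has linked this occurrence yet."""
    linked = STRONG.get(address)
    if linked is None:
        return None
    word, sense_index = linked
    return LIGHT[word][sense_index]


print(light_lookup("dīpa"))
print(strong_lookup(Address("example-sutta", 1, 2, 3)))
```

A client could fall back to the light lookup whenever the strong one returns `None`, which fits the expectation that manual linking will never be complete.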

frankk · June 10, 2017, 3:43pm

SN 8.4 offers a couple of interesting examples of nimitta in a samadhi context that seem to work better as “sign” rather than “cause”.

(bodhi)

“It is through an inversion of perception
That your mind is engulfed by fire.
Turn away from the sign of beauty
Provocative of sensual lust.

…

“Develop meditation on the signless,
And discard the tendency to conceit.
Then, by breaking through conceit,
You will be one who fares at peace.”

edit: addition

How do you translate nimitta in MN 20, Bhante? That word gets used frequently in there, and if it’s “cause”, it would read as pretty awkward or redundant.
(bodhi)

(i) “Here, bhikkhus, when a bhikkhu is giving attention to some sign, and owing to that sign there arise in him evil unwholesome thoughts connected with desire, with hate, and with delusion, then he should give attention to some other sign connected with what is wholesome. When he gives attention to some other sign connected with what is wholesome, then any evil unwholesome thoughts connected with desire, with hate, and with delusion are abandoned in him and subside. With the abandoning of them his mind becomes steadied internally, quieted, brought to singleness, and concentrated. Just as a skilled carpenter or his apprentice might knock out, remove, and extract a coarse peg by means of a fine one, so too…when a bhikkhu gives attention to some other sign connected with what is wholesome…his mind becomes steadied internally, quieted, brought to singleness, and concentrated.

sujato · June 10, 2017, 11:05pm

Thanks so much, that is fascinating.

I’m not sure if you’ve been following this, but our upcoming generation of SC (tagged SC next) relies on translations that are segmented and numbered on a sentence level or smaller. That means we can match segment to segment in source and target; it also means we can match one translation with another, or attach notes, etc. that will work across languages.

On the whole, we are moving towards use of standoff properties rather than markup. The position of words or even glyphs is identified by glyph count within the segment. We can use this for things such as variant readings, but it would equally work for a dictionary lookup.
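
To make the standoff idea concrete, here is a minimal sketch of an annotation kept apart from the text and located by offsets within a segment. It is not SuttaCentral's actual format: the class, field names and segment ID are invented, and it assumes offsets count Unicode code points of NFC-normalized text, which may differ from the glyph count used in practice.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffProperty:
    """An annotation stored apart from the text, located by offsets into one segment."""
    segment_id: str
    start: int      # offset of the first character covered
    end: int        # offset just past the last character covered
    kind: str       # e.g. "variant" or "dictionary"
    value: str


# The segment text itself carries no markup at all.
segments = {"seg-1": unicodedata.normalize("NFC", "evaṃ me sutaṃ")}
properties = [
    StandoffProperty("seg-1", 8, 13, "dictionary", "heard"),
]


def annotated_spans(segment_id):
    """Yield (covered text, kind, value) for each property attached to the segment."""
    text = segments[segment_id]
    for prop in properties:
        if prop.segment_id == segment_id:
            yield text[prop.start:prop.end], prop.kind, prop.value


for span in annotated_spans("seg-1"):
    print(span)
```

Because the offsets live outside the text, a variant reading and a dictionary link can cover overlapping ranges without any nesting problems.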

In terms of fuzzy matching, our CAT engine Pootle does this pretty well out of the box. Since the texts will be hosted on Git, we can also leverage the awesome power of git diffs if need be.

That would be great, but not easy to do for the whole corpus. It’s one of those things that you could do readily enough for a small passage; or else get 80% over the whole corpus; but to get anywhere near 100% over the whole corpus would be challenging indeed.

Again, just so you know, we are developing what we call the “New Concise Pali English Dictionary”, which essentially consists of the old Concise Dictionary, corrected and expanded in line with Cone’s DoP. The first of the three volumes is live on the site, and the second is in its final stages of preparation. When the third volume of Cone’s dictionary is published, we will complete this. This will be a far more accurate and comprehensive concise dictionary, which would be ideal for this purpose.

sujato · June 10, 2017, 11:08pm

The general meaning of nimitta in all these contexts is the same: an aspect or part of experience that, when focused on, tends to promote the growth of similar or related qualities. It thus straddles the sense of “sign” and “basis”. While the meaning is clear, the translation is not, and I am not sure what renderings I will settle on.