A better Pali Dictionary



Secondly, it would be useful to rate dictionaries by quality and use the highest-quality source.

Since Margaret Cone’s dictionary is of higher quality, its entries would usually override those of earlier dictionaries. So it makes sense to work first on the entries not covered by Margaret Cone’s dictionary.

CPED is of low quality, so it would be a last choice.
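A minimal sketch of the “highest-quality source wins” rule, in Python. The ranking follows this thread (Cone’s DOP first, CPED last); placing PED in the middle, the source labels, and the placeholder definitions are assumptions for illustration only.

```python
# Hypothetical quality ranking: lower number = preferred source.
# DOP first and CPED last follow the thread; PED in the middle is an assumption.
SOURCE_RANK = {"DOP": 0, "PED": 1, "CPED": 2}


def best_entry(entries):
    """Pick (source, definition) from the best-ranked source that has an entry.

    `entries` maps source name -> definition for a single headword.
    Unknown sources rank after every known one.
    """
    if not entries:
        return None
    source = min(entries, key=lambda s: SOURCE_RANK.get(s, len(SOURCE_RANK)))
    return source, entries[source]


def merge(dictionaries):
    """Merge {source: {headword: definition}} into {headword: (source, definition)}."""
    by_word = {}
    for source, entries in dictionaries.items():
        for word, definition in entries.items():
            by_word.setdefault(word, {})[source] = definition
    return {word: best_entry(defs) for word, defs in by_word.items()}


# Placeholder definitions, not real dictionary text.
sources = {
    "CPED": {"abhaya": "<CPED text>", "pabbata": "<CPED text>"},
    "DOP": {"abhaya": "<DOP text>"},
}
for word, (source, text) in merge(sources).items():
    print(word, source, text)
```

Because a lower-ranked source is used only when every higher one lacks the headword, words outside the DOP’s published range still get an entry from CPED (or PED).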

Thirdly, an electronic dictionary has an advantage: it can be extensible and interactive. Scholars could add extended glosses and whole discussions “under the hood” of the articles. Last year, members of the Pali Study group searched for a software platform that would help preserve the results of their work on Pali terms. It would be great to build such a bridge between scholars and the public.

People won’t have to wait another hundred years for the dictionary to be updated.


Fourthly, the format needs to take into account the modern state of metadata. I don’t know much about this subject, but IMHO, the Resource Description Framework (RDF) would take us in the right direction.
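To make the RDF suggestion concrete, here is a sketch that serializes one glossary entry as N-Triples using the W3C SKOS vocabulary (`skos:Concept`, `skos:prefLabel`, `skos:definition`). The choice of SKOS, the `https://example.org/pali/` namespace, and `pi` as the language tag for Pali are assumptions, not an existing SC schema. The gloss for nimitta is the one suggested later in the thread.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/pali/"  # hypothetical namespace, not an SC URI


def literal(text, lang):
    """An N-Triples language-tagged string literal with the required escapes."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def entry_to_ntriples(headword, definition):
    """The entry as a skos:Concept with a Pali label and an English definition."""
    subject = f"<{BASE}{quote(headword)}>"  # percent-encodes diacritics such as ā
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
        (subject, f"<{SKOS}prefLabel>", literal(headword, "pi")),
        (subject, f"<{SKOS}definition>", literal(definition, "en")),
    ]
    return [" ".join(triple) + " ." for triple in triples]


for line in entry_to_ntriples("nimitta", "sign, mark, precursor, hint, cause, omen"):
    print(line)
```

Because every line is a complete subject-predicate-object statement, entries from different contributors can simply be concatenated and loaded by any RDF-aware tool.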


This is not difficult; we can simply sort the words. But I don’t think it would be useful. Most of the frequent words are already well covered by the CPED, and it is the infrequent ones that need attention.

More subtly, remember the distinction I made previously between a “word” and a “token”. If we sort words alphabetically, similar tokens end up near each other, and these will often correspond to words. For example, karosi and karoti will sort near each other. This is helpful, because it lets us recognize and organize sets of tokens into words. However, if we sort by frequency, karoti will be more frequent than karosi and they will be separated. There will be no meaningful relation between tokens that represent the same “word”. This will, I think, make the organizational task much more difficult.
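A sketch of the two orderings in Python, on a toy token list (not real corpus counts). Plain `sorted()` uses Unicode code-point order, which puts ā after z, so the sketch adds a hypothetical `pali_key` that follows the Pali alphabet and treats aspirates such as kh and dh as single letters.

```python
import unicodedata
from collections import Counter

# Pali alphabetical order; aspirated consonants count as single letters.
PALI_ORDER = [
    "a", "ā", "i", "ī", "u", "ū", "e", "o",
    "k", "kh", "g", "gh", "ṅ", "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ", "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m", "y", "r", "l", "v", "s", "h", "ṃ",
]
RANK = {unicodedata.normalize("NFC", ch): i for i, ch in enumerate(PALI_ORDER)}
RANK["ṁ"] = RANK["ṃ"]      # alternative spelling of the niggahīta
UNKNOWN = len(PALI_ORDER)  # characters outside the alphabet sort last


def pali_key(word):
    """Sort key: a list of alphabet positions, reading aspirates as one letter."""
    word = unicodedata.normalize("NFC", word.lower())
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:  # digraph such as "kh" or "dh"
            key.append(RANK[pair])
            i += 2
        else:
            key.append(RANK.get(word[i], UNKNOWN))
            i += 1
    return key


def alphabetical(tokens):
    """Distinct tokens in Pali alphabetical order."""
    return sorted(set(tokens), key=pali_key)


def by_frequency(tokens):
    """Distinct tokens, most frequent first (ties keep first-seen order)."""
    return [tok for tok, _ in Counter(tokens).most_common()]


tokens = ["karoti", "karoti", "karoti", "dhamma", "dhamma",
          "bhikkhu", "bhikkhu", "karosi"]
print(alphabetical(tokens))  # karoti and karosi are adjacent
print(by_frequency(tokens))  # karosi ends up last, apart from karoti
```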


Agreed. This is why I suggested we begin by correcting and expanding the CPED by comparing with the DOP.

Sorry, but I don’t understand this.

This is a very good point, and should be at the foundation of our efforts. What I am thinking is that we can build a “skeleton” dictionary, with the aim of creating simple, accurate, comprehensive entries for all words in the Pali canon. This can then be progressively enriched and extended over time. @blake and I have already done some work in this area. The translation software that we use allows for entry of terminology. With careful design, such an approach can, I think, evolve a very useful dictionary over time.

Did you end up with a solution?

If I might add, the issue is not finding software, but using a well-defined form of structured data. With a consistent, predictable form of structured data, you can easily transform it into other formats, and various kinds of software can assist in doing various kinds of things.
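As an illustration, here is a minimal Python sketch of one possible entry structure. The field names are hypothetical; the sample data is the CPED entry for nimitta quoted in this thread. Once the data has a fixed shape, JSON for software and a tab-separated line for a simple lookup table are each a few lines of code, and the JSON round-trips back to the same entries.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GlossaryEntry:
    # Hypothetical schema for a "skeleton" glossary entry.
    headword: str
    pos: str                                               # part of speech, e.g. "nt."
    glosses: list[str] = field(default_factory=list)       # short English equivalents
    sources: list[str] = field(default_factory=list)       # where the entry came from


def to_json(entries):
    """Serialize to JSON, keeping Pali diacritics as UTF-8 characters."""
    return json.dumps([asdict(e) for e in entries], ensure_ascii=False, indent=2)


def from_json(text):
    return [GlossaryEntry(**item) for item in json.loads(text)]


def to_tsv(entries):
    """One line per entry: headword, part of speech, glosses joined by '; '."""
    return "\n".join(
        f"{e.headword}\t{e.pos}\t{'; '.join(e.glosses)}" for e in entries
    )


entries = [GlossaryEntry("nimitta", "nt.", ["sign", "omen", "portent", "cause"], ["CPED"])]
assert from_json(to_json(entries)) == entries  # lossless round trip
print(to_tsv(entries))
```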


This is a good idea, yes. Currently SC doesn’t supply RDF metadata, but it would be a good addition.


It makes sense to work first on the P-H range of the Pali alphabet, which is not yet covered by the published volumes of Margaret Cone’s dictionary.

Not at all. For example, we discussed the term “Buddha”, but the results of this discussion remained within our closed group.

Even such widely used terms remain underexplored, with established translations that are used out of habit.

That’s why I’m interested in exploring the key Pali terms - they are often mistranslated or misunderstood.

CPED, being too simplistic, is sometimes outright misleading.

Sometimes an article on the narrow contextual usage of a term, such as Akira Hirakawa’s “The Relationship between Paṭiccasamuppāda and Dhātu”, helps one understand what the sutta is about.

I would like such contextual explorations to be added to the body of knowledge.


[quote=“sujato, post:24, topic:2445”]
If I might add, the issue is not finding software, but using a well-defined form of structured data. With a consistent, predictable form of structured data, you can easily transform it into other formats, and various kinds of software can assist in doing various kinds of things.[/quote]

Yes, indeed. I wonder how to balance the ease of adding new entries, as in Wiktionary, with the transferability of the body of knowledge.

What would be its advantage over other dictionaries?

I’m thinking along the lines of the Digital Dictionary of Buddhism, where the scholarly community would gradually extend the dictionary.


Such frequency lists make it easy to see which terms are used in the early Buddhist texts and which occur only in later literature.
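A rough Python sketch of that comparison: tokenize two corpora, count tokens, and list those that never occur in the early corpus, most frequent first. The tokenizer is a simple assumption (NFC-normalize, lowercase, keep runs of letters), and with no stemming each inflected form is counted separately. Any example tokens are placeholders, not real data.

```python
import re
import unicodedata
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # runs of Unicode letters


def tokenize(text):
    """NFC-normalize (so ṃ is one character), lowercase, and split on non-letters."""
    text = unicodedata.normalize("NFC", text).lower()
    return WORD.findall(text)


def only_in_later(early_tokens, later_tokens):
    """Tokens found in the later corpus but never in the early one, most frequent first."""
    early = set(early_tokens)
    later_counts = Counter(later_tokens)
    missing = [tok for tok in later_counts if tok not in early]
    return sorted(missing, key=lambda tok: later_counts[tok], reverse=True)


print(tokenize("Evaṃ me sutaṃ."))
```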

Yes, indeed, to make the frequency lists perfect, one would need a Pali stemmer.
AFAIK, David Alfter has not yet made the stemmer.
So we are left with the raw frequency lists, which are also useful.
Knowing the several hundred most frequent words makes most of a text comprehensible.
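This claim can be checked against any frequency list as token coverage: the share of running tokens accounted for by the n most frequent word types. A minimal Python sketch follows; token coverage is only a proxy for comprehension, and without a stemmer each inflected form counts as a separate type.

```python
from collections import Counter


def coverage(tokens, n):
    """Share of running tokens covered by the n most frequent word types."""
    if not tokens:
        return 0.0
    top = Counter(tokens).most_common(n)
    return sum(count for _, count in top) / len(tokens)


def words_needed(tokens, target):
    """Smallest n such that the top-n types cover at least `target` (0..1) of tokens."""
    total = len(tokens)
    if total == 0:
        return 0
    counts = Counter(tokens)
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return n
    return len(counts)  # target above 1.0: every type is needed
```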

CPED sometimes creates the illusion of understanding the term, with articles like:
nimitta : [nt.] sign; omen; portent; cause.
which tragically misses the meaning of the term in meditative practice.
So, IMO, it is most frequent words that require extended treatment.


Good question… I have extensive experience in database development and project management.

My experience with the Suttas is that I started reading them a little over a year ago and have found them to be … what can I say? This is it. This is what I have been looking for since I was 8 years old. So I have a passion and a thirst for knowledge and understanding. I am looking into M.A. and Ph.D. programs, and for like-minded people in my area (Northern Arizona).

I also have attention to detail and good comprehension, and, according to the tests, intelligence.

I’ve been reading about right speech, so it feels a little off to be talking up my skills. I have many faults and shortcomings. I suppose the first is that I am hesitant to list them all at the moment. I have some sort of ADHD and/or PTSD that limits my ability to memorize things. So part of my personal reason for working on this sort of project is to make the texts more accessible and convenient: being able to see as much as is needed on one screen, without relying on memory.

I also have time. I’ve been very fortunate to be able to make a living without having to spend time commuting, and am able to make enough $/hour that I don’t have to work 40 hours/week.

So, if we could agree on a course, I could and would commit to the project, with gladness and dedication.


Oh, thanks, okay, now I get it. I will respond further down.


Okay, so we are talking about two quite different things here. So we need to clarify that!

My goal, and this is something that is only becoming clear as the discussion proceeds, was to create a dictionary for basic Pali terminology. The primary use of this would be for word lookup, and thus it would extend, and hopefully complete, the range of words correctly identified by our Pali lookup tool. Let’s call it a Glossary rather than a Dictionary, if you like.

What you’re interested in, and if I’m not mistaken Elissa too, is more of a dictionary of Buddhist terms: perhaps something like Payutto’s Dictionary of Numerical Dhammas, but not just numerical. There are a number of such works.

And no doubt others. However, none of them, so far as I know, deal specifically with early Buddhism.

This is also a great project, and would fill another need that I have felt for SC. Let me first discuss a little how I envisage something like this being used—or at least, one application—and then consider the project itself.

One of the things we have done with the texts on SC is to remove the footnotes. I have discussed this at length elsewhere, so I won’t go into it here. But one gap this leaves us is that we end up with texts that liberally use technical terms and ideas that will be unfamiliar to readers. Someone reading a sutta and coming across the term “aggregate” is unlikely to know what this means, unless they have some background in Buddhism already.

Now, footnotes are one way to deal with this, but not a very good one, especially in a digital medium. Why? Because they explain the term once, and we need the information to be contextual. People aren’t going to read the suttas sequentially, and we shouldn’t structure our information as if they will.

So, what to do? Well, I think that in a web environment we can use several means to approach this. One of those is this very discourse site, where we can discuss things, post essays and so on. But this doesn’t give us the fine-grained ability to explain specific words in a text. For this, I envisage two things.

  1. A system of site-wide annotations, where people can write notes on specific passages, and
  2. A terminological dictionary, such as the one we are considering, which will define doctrinally significant terms in a meaningful and useful way, to be applied site-wide.

So if you wanted help with terminology, you would turn on the terminological dictionary, and the explanations would appear as popups for the terms wherever they occur on the site. The annotations would be similar, except that they apply to specific passages rather than general terms.
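A minimal sketch of the popup mechanism, assuming the glossary is a plain mapping from English term to explanation and that a browser tooltip (the HTML `title` attribute) stands in for a real popup. The `term` CSS class and the `annotate` function are hypothetical. Matching is case-insensitive on whole words, longest term first, so a multi-word term wins over its parts; plurals and other inflections are not handled.

```python
import html
import re


def annotate(text, glossary):
    """Return HTML with glossary terms wrapped in tooltip spans.

    `glossary` maps an English term to its explanation.
    """
    lookup = {term.lower(): note for term, note in glossary.items()}
    if not lookup:
        return html.escape(text)
    terms = sorted(lookup, key=len, reverse=True)  # longest first
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        word = match.group(0)
        note = lookup.get(word.lower(), "")
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f'<span class="term" title="{html.escape(note)}">{html.escape(word)}</span>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```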

Of course, the terminological dictionary could also be used just as well on its own, or in other ways, maybe even printed.

What is the relation between this doctrinal dictionary and the simple glossary that I was envisaging?

Well, there doesn’t have to be a relation. Perhaps they are two separate projects. Or perhaps, we start by making a simple glossary, then enriching it with further information. I think both approaches could work. The latter approach would be conceptually more satisfying; but then, Worse Is Better!


Well, just what I have been saying: clear, comprehensive, accurate.


Are we talking at cross purposes here? The list of terms that I made, and on which I was proposing we base the glossary, is just those that are found in the EBTs.

It would be possible to map out the kind of evolution you’re talking about, but you’d need to use the much larger Pali corpus at the VRI site. It would be a great thing to do. But it would be really, really hard. You’d need to accurately stem the Pali words from all periods, and not only that, but to break up compounds as well. This is why I was proposing we work only on the vocabulary actually used in the EBTs, as it is a reasonably concise task. Only 80,000 tokens!

Interesting, I wasn’t aware of this. I’ll read it carefully. In fact SC has a stemmer in JavaScript, but it works on quite simple principles. You can get maybe 90% accuracy, but beyond that it’s hard. My thinking is that, again, by restricting the corpus to the EBTs, we can avoid the hard computational problems. Use the computer to do what it can, then correct and fill in the blanks by hand. (BTW, Google does the same thing. Part of its secret is that it employs thousands of people around the world to google stuff and submit corrections …)
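The “computer first, then correct by hand” workflow can be sketched as a naive suffix stripper plus a table of manual overrides. This is not SC’s JavaScript stemmer; the suffix list (a few present-tense verb endings) and the minimum stem length are assumptions chosen only to show the idea. Note how the rule for -ma wrongly cuts dhamma to dham, which is exactly the kind of error the override table is there to catch.

```python
# Hypothetical suffix list: a few Pali present-tense verb endings only.
SUFFIXES = sorted(["ti", "si", "mi", "nti", "tha", "ma"], key=len, reverse=True)
MIN_STEM = 3  # never cut a token below this length (an arbitrary choice)


def naive_stem(token):
    """Strip the longest listed suffix, if what remains is long enough."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            return token[: -len(suffix)]
    return token


def stem(token, overrides):
    """Manual corrections in `overrides` take precedence over the rule."""
    token = token.lower()
    return overrides.get(token, naive_stem(token))


def group_by_stem(tokens, overrides):
    """Collect tokens under their (possibly hand-corrected) stems."""
    groups = {}
    for tok in tokens:
        groups.setdefault(stem(tok, overrides), set()).add(tok)
    return groups


overrides = {"dhamma": "dhamma"}  # hand correction: "-ma" here is not a verb ending
print(group_by_stem(["karoti", "karosi", "karonti", "dhamma"], overrides))
```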

Well, in this case I would disagree. I think this is fine, although I’d probably say “sign, mark, precursor, hint, cause, omen”.

The meaning of nimitta as “bright light seen in meditation” doesn’t occur in the suttas. In meditative contexts nimitta usually means “cause”, perhaps “precursor”, or even “aspect”. Lights in meditation are called simply “light” (obhāsa, pabhassara, pariyodāta, etc.) This is why, when assembling a dictionary of early Buddhist terms, we need to be diligent about rejecting later meanings.


Very good!

Not at all! I asked, and it’s important to know.

Well, that sounds fantastic.


Duh, I am in fact well aware of this project! While in Germany, I actually went to the University of Trier and discussed it with the developer. Fun fact: this started as a side-project to another University project dealing with the connections between Buddhism and ancient Egypt! So cool. But he didn’t know if there were any results from that.

However, the Pali project itself was a bit of a disaster. The core to it was a detailed annotated version of the PTS dictionary, without which the program is pretty useless. But after a lot of the basic programming work was done, one of the team members, a core IT guy, just disappeared one day, and with him, the keys to the relevant data. It just all vanished, and extensive searches couldn’t find it. So basically it just all collapsed, and now the developer—who doesn’t have any special interest in Buddhism, it was just a class project—is doing other things.


And it’d be nice if it were multilingual, or at least user-expandable through translation into languages other than English.


Sure. This is a later meaning, which emerged through the semantic shift of concretization.
Over time, subtle abstract concepts were reduced to more concrete and easily comprehensible ones. In the case of nimitta, Stephen Hodge writes:

“I understand “nimitta” to be roughly equivalent to basic sense, perceptual data or just percepts, such as colours, shapes, sounds and so forth. Perceptual data derived from the external world are mediated by consciousness (vijñāna / viññāṇa) and apprehended by saṃjñā / saññā. In other words, I believe that “nimitta” are mental phenomena rather than external things per se, if that is what you mean here by “objects”. External objects in themselves are neither pleasurable or otherwise – is not that element introduced by the person perceiving and labelling the bare object? Though, of course, from the viewpoint of the untrained person, it is the external itself which seems to be pleasurable etc, so ultimately your translation is not wrong in that sense. I normally translate “nimitta” as “perceptual form” – I would prefer “perceptual image” but I use that for “ākāra”. The popular translation of “nimitta” as “sign” seems laughably crude to me in the context of Buddhist accounts of perceptual processes.”

“Nimittas are created inside the individual by saṃjñā / saññā. Thus, Buddhaghosa defines saññā as “”, which corresponds exactly to the understanding of other Indian Buddhist schools. A nimitta is a result of synthesized raw sense data, combined with vedanā, and, usually, also involves a labelling process – which is why saṃjñā / saññā also means “name” etc. Indeed, saṃjñā / saññā can describe, according to the context, either the process and the product. Hence, the Chinese version of the Anguttara text in question does not actually translate nimitta as such but instead has the standard equivalent for saṃjñā / saññā.”

Fortunately, some of the modern research has trickled down to Margaret Cone’s dictionary, where we read:

nimitta 3. (ii) an internal appearance or total awareness; a mental impression (appearing as an early stage of jhana, a sign of progress); A IV 418,24 …

What I would like to see is a system like the Digital Dictionary of Buddhism, where all the scholarly research on terms would be collected, with relevant Pali glosses.


I wonder whether you would find the table from Kogen Mizuno’s dictionary useful for look-up hints on grammar:


You may find the Pali Tools project useful:


Ta, yes that looks very useful. @blake, you should check it out!