We’ve had some discussion about fixing the many, many problems with the current digital version of the Rhys Davids/Stede PTS dictionary. @waiyin has kindly offered to help, and @Simon also. But it was not something that I had looked at very closely.
For the past few days, Blake has been upgrading our translation app, so I have had some spare time and decided to take a look at the PTS dictionary and see what it would entail.
Here are the results. The archive includes:
- HTML files of a small sample portion that is reasonably finished, and the full, HIGHLY UNFINISHED version.
- PDFs created from the samples, so you can see what I’m aiming at.
pali_ped.zip (3.5 MB)
What I’ve done:
- Expanded most abbreviations
- Adapted the references to the SC style
- Corrected and checked
- Marked up the various parts, such as grammatical terms, main definitions, etymology, references etc.
- Used a modern dictionary style, especially by removing a lot of punctuation
- Structured the entries as lists
As far as it goes, I’m happy with the result. It’s certainly a lot clearer and easier to read. But I’ve reached a point where I’m not really wanting to proceed further, so I thought I’d share it with you all and see if you have any feedback or suggestions.
The sample covers 96 entries. In the whole dictionary, there are 16,528 entries. On a rough estimate, it’d take about six months’ work to finish. And that’s not something I can contemplate right now.
In fact, many of the changes are not that difficult. Most of the references, terms, and so on have been done with regular expressions. And with much of the extra markup it is simply a matter of going through the text and adding it one by one.
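To give a rough idea of what the regex-driven part looks like, here is a minimal Python sketch. It is not the actual script used for the sample: the abbreviation table, the `DN 2` / `SN 12.15` reference shape, and the `ref` class name are all assumptions made up for illustration.

```python
import re

# Hypothetical abbreviation table; the real dictionary uses many more.
ABBREVIATIONS = {
    "adj.": "adjective",
    "nt.": "neuter",
    "cp.": "compare",
}

# Match a listed abbreviation only when it is not the tail of a longer word,
# so that "went." is not mistaken for "nt.".
ABBR_RE = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)

# Assumed reference shape: a collection code, a space, and a number,
# optionally followed by a dot and a second number.
REF_RE = re.compile(r"\b(DN|MN|SN|AN) (\d+(?:\.\d+)?)\b")


def expand_abbreviations(text):
    """Replace each known abbreviation with its expansion."""
    return ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


def mark_references(text):
    """Wrap each recognised reference in an element with a (hypothetical) class."""
    return REF_RE.sub(r'<a class="ref">\1 \2</a>', text)


if __name__ == "__main__":
    print(mark_references(expand_abbreviations("adj. cp. DN 2 and SN 12.15")))
```

Patterns like these catch the regular cases; anything they miss or mangle still has to be checked by eye.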
The real killer is the entry structure. It’s simply a nightmare to figure out how the entries are meant to be read. There is, most of the time, a structure, but it is no easy matter to work it out. Does this reference refer to the Pali phrase before it, or to the one after it, or to neither? Sometimes!
In the original text, these vague structures are, perhaps, not such a problem, as most of the time you just want to know the meaning of the word. But in order to mark it up properly, you have to figure out exactly how each element is related. There’s no way of automating this, so each entry has to be considered on its own. Even now, I am by no means confident that I have it correct, even in the small sample.
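To make the point concrete, here is a small Python sketch of the kind of explicit structure the markup forces on you. The class names, fields, and HTML output are hypothetical, not the format used in the sample files, and the entry text is a placeholder. The thing to notice is that every reference has to be attached to exactly one phrase, which is precisely the decision the printed text often leaves open.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Phrase:
    """A Pali phrase with an optional gloss and the references that belong to it."""
    pali: str
    gloss: str = ""
    references: List[str] = field(default_factory=list)


@dataclass
class Sense:
    """One meaning of a headword, with its illustrative phrases."""
    definition: str
    phrases: List[Phrase] = field(default_factory=list)


def render_sense(sense):
    """Render a sense as a list item containing a nested list of phrases."""
    items = []
    for p in sense.phrases:
        refs = "".join(f' <span class="ref">{escape(r)}</span>' for r in p.references)
        gloss = f" {escape(p.gloss)}" if p.gloss else ""
        items.append(f"<li><i>{escape(p.pali)}</i>{gloss}{refs}</li>")
    inner = f"<ul>{''.join(items)}</ul>" if items else ""
    return f"<li>{escape(sense.definition)}{inner}</li>"


if __name__ == "__main__":
    # Placeholder content, not a real dictionary entry.
    sense = Sense("example meaning", [Phrase("pali phrase", "gloss", ["DN 2"])])
    print(render_sense(sense))
```

Once an entry is in a shape like this, producing lists for the screen or a compact layout for print is mechanical. Getting it into this shape is the slow, manual part.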
If it could be done for the whole text, I have no doubt it would make it far more usable and adaptable. As an example of that, I have included two versions of the sample, one of which is adapted for printing. With just a few CSS changes it makes a fairly good, printable version, taking up roughly the same space as the original print version, but far more legible. It’s not perfect, but it gives an idea what can be done when you have a well marked-up text.
The problem is, is it worth it? The landscape of Pali dictionaries is littered with the bones of failure: the webpage for the now-abandoned Critical Pali Dictionary lists the numerous obituaries of the scholars who died while trying to complete it. No, really!
So we have:
- The Rhys Davids/Stede version, which is digitized, but poorly, and is out of date. But it is fairly complete as far as the canonical texts go.
- Buddhadatta’s Concise Pali Dictionary, which is not greatly reliable or complete, and relies a lot on later Pali.
- The Critical Pali Dictionary, available online, but only covering up to kh, and not at all user-friendly.
- Margaret Cone’s A Dictionary of Pali, which so far covers about half the language, based on the texts published by the PTS; a date of 2030 has been mentioned for completion. This is not available digitally, and the print edition is not user-friendly.
None of these are really satisfactory, and there’s no real sign of improvement in the near future, so far as I know.
Why, I am wondering, have we failed to produce a decent dictionary for Pali?
I think the failure has to do with a lack of clarity. We treat Pali as one language. But the Pali texts span 2500 years! You wouldn’t expect a dictionary of modern English to include this:
Hwæt! We Gardena in geardagum,
þeodcyninga, þrym gefrunon,
hu ða æþelingas ellen fremedon.
Oft Scyld Scefing sceaþena þreatum,
It’s from Beowulf, which is Old English. It is dated to somewhere between the 8th and 11th centuries, so the Pali canon is more than twice as old!
True, Pali has not evolved as fast as English, but still, there is no linguistic reason to insist that every stratum of Pali belongs in the same dictionary. Later texts have a huge amount of extra vocabulary and usages, so why not reflect this in the dictionaries? This is similar to what A.K. Warder successfully did with his Introduction to Pali; he based the grammar and vocabulary on just the Dīgha Nikāya.
What I am suggesting is that we need a Dictionary of Early Pali. Perhaps this would be a dictionary of the Pali Canon, or perhaps just the EBTs. Not only would this reduce the scope of the project greatly, it would define a language that is roughly contemporary, and which includes the texts that are of most interest for most people.
It would not need to be a detailed academic dictionary; we can leave that for Margaret Cone’s project. It would be something more like the Concise Dictionary, but covering all the vocabulary of the early texts, with meanings and context as used in those texts only.
With the convenience of text search, we no longer need to list so many references in a dictionary. Only when they indicate specific use cases are they needed. Etymology is unnecessary, as is scholarly discussion. Just words and meanings, essential grammar, and nice clean markup.
Anyway, as you know I have no time, nor do I have the inclination for this kind of work. But I wonder if there is a way to get it done. Perhaps a crowdsourcing venture of some sort could be implemented; but it is a specialized kind of work, so I am not sure how that would go. So I will just leave this here, and see what you think.