On the usefulness or otherwise of the commentaries for establishing canonical texts

There is an idea proposed by some that the Pali commentaries have a crucial role in establishing our earliest readings of the Pali texts. The argument is that all our manuscripts are much later than the commentaries. In cases, therefore, where a commentarial reading is unambiguous, it provides our earliest independent witness for the text.

Now, clearly there is something to this. The commentaries are sometimes invaluable in establishing both the reading and the meaning of the Pali text. But we can sometimes get the impression that we can somehow reconstitute “the” text that lay before the commentators, which would then become the authoritative source.

This idea is dubious on many grounds, not least of which is that the commentaries cannot possibly have been based on a single unified Pali source. The commentaries stem from the unifying and editing work of Buddhaghosa, based on multiple earlier independent commentaries. All those earlier works must have been based on different manuscripts. Even a single commentary was probably based on multiple manuscripts, referred to at different times, accumulated over generations, and consulted by different scholars. So the commentarial tradition must rely on a deeply unknowable complex of earlier canonical manuscripts. At best we can speak of “a” reading attested in the commentaries, but cannot assume that this was the only one before them, even if it is the only one they mention.

But leaving aside this problem, the notion that the Pali text can be established based on the commentaries does not withstand even a cursory review of what the commentaries actually contain.

  1. The commentaries only comment on a few words. How much of the text, exactly, is commented on? Well, let’s do a very rough count, based on the sutta I happen to be translating right now, DN 18 Janavasabha. The text has close to 3000 words. The commentary comments on about 126 words. That’s about 4%. The meaningful percentage is somewhat higher than that, for many words are repeated. Still, it’s only a small fraction of the text.
  2. Most words don’t have variants. Only a small percentage of words do. In our Mahasangiti text, DN 18 has 37 words with variants. Obviously, collating more manuscripts will increase this; equally obviously, most of that increase will be trivialities. So the proportion of words with variants is maybe 2%.
  3. The overlap of words commented on and words with variants is tiny. Given the above percentages, it should come as no surprise that in our sample text, there are hardly any words with variants that are actually commented on. One of them is the name of the town, Nādika. Then there is the series of obscure terms on psychic powers, which I discussed in my previous essay. This is the only case in this sutta where the commentary is of any use in establishing the text; which leads us to the next issues.
  4. Text and commentary were handed down together, and often reflect each other. Frequently we find that variants in the commentary reflect the spelling of words in that tradition; a Burmese commentary will spell words as they are found in the Burmese tradition, and so on. This doesn’t mean that none of the commentaries can be used, but it does reduce their usefulness further.
  5. The manner of comment often does not meaningfully clarify the reading. It is not enough to simply have a comment on a word with variants; the comment must be of a sort that is actually fit for purpose. The aim of the commentators was to explain the meaning of the text, not to establish the readings. To take the case of the relevant variants in DN 18, the commentary has iddhipahutāyāti iddhipahonakatāya, which rules out the variant bahulīkata. That variant is, however, already excluded by lectio difficilior, so the commentary simply serves to confirm this. However, it also rules out my proposed emendation bahudha. So then we are left to consider whether the text before the commentary had already suffered change, or whether the emendation is incorrect. The Chinese parallels don’t offer any help on this point, so the commentary remains our earliest witness. So in this case it may well be useful. For the other variant, iddhivisevitāya for iddhivisivitāya, this again runs into lectio difficilior, and seems to me to be a case of the text being normalized to agree with the commentary. Which leads us to the next point:
  6. The commentary normalizes problematic passages. Since the commentators were engaged in sect-building, they were invested in creating a single, authoritative doctrinal explanation for the texts. To their credit, they almost always avoided making changes to the source texts, even in cases where those texts obviously disagree with later Theravadin doctrines. Nevertheless, in cases where the original reading is unclear or problematic, they tend to impose a normalized dogmatic meaning, reading it in terms of their established doctrines. Given that it is precisely in such cases that an independent witness is required, this substantially reduces their usefulness. Using an approach known as redaction criticism, we are justified in suspecting that, where the text of such a passage conforms to the commentaries, it is an apology for Theravada orthodoxy.
  7. Other methods are often more useful. Often questions of variants can be resolved by other means; by comparing manuscripts, or simply through grammar, sense, spelling, and so on. Where we have canonical parallels, Sanskrit, Chinese, and Tibetan texts can often be helpful in establishing the reading. While all these methods obviously have their own problems and limitations, they do have the advantage that they cover more of the text, and offer the possibility of uncovering corruptions that had crept into the Pali even before the commentaries. Where they are useful they further diminish the usefulness of the commentaries.
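Point 3 can be checked with a quick back-of-the-envelope calculation. The sketch below uses the rough DN 18 figures from points 1 and 2. It assumes, purely as a simplification, that glossed words and words with variants are distributed independently of each other. Under that assumption, the expected overlap is the product of the two proportions multiplied by the length of the text.

```python
# Rough figures for DN 18 Janavasabha, as given in points 1 and 2.
TOTAL_WORDS = 3000    # "close to 3000 words"
COMMENTED = 126       # words glossed in the commentary
WITH_VARIANTS = 37    # words with variants in the Mahasangiti text


def expected_overlap(total: int, commented: int, variants: int) -> float:
    """Expected number of words that are both glossed and have variants,
    ASSUMING the two properties are independent (a simplification)."""
    p_commented = commented / total
    p_variant = variants / total
    return total * p_commented * p_variant


if __name__ == "__main__":
    print(f"glossed:  {COMMENTED / TOTAL_WORDS:.1%}")
    print(f"variants: {WITH_VARIANTS / TOTAL_WORDS:.1%}")
    print(f"expected overlap: {expected_overlap(TOTAL_WORDS, COMMENTED, WITH_VARIANTS):.1f} words")
    # The same calculation with the guessed 2% variant rate (60 of 3000 words).
    print(f"expected overlap at 2%: {expected_overlap(TOTAL_WORDS, COMMENTED, 60):.1f} words")
```

With 126 glossed words and 37 words with variants, this gives an expected overlap of about 1.6 words. Raising the variant rate to the guessed 2% gives about 2.5 words. Either way, the result is consistent with there being hardly any such words in this sutta.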

None of this is to say that the commentaries are useless. Of course they have an important place in Pali studies. But it is simply a fantasy to imagine that in any meaningful sense we can establish the Pali canon on firm grounds through the commentaries. In our sample text, the commentary helps establish maybe one word out of 3000. In establishing the Pali text, commentaries are occasionally helpful as a reference, but that’s all.


Dear Bhante, is everything you said about the commentaries the reason they have not been translated into modern languages?

No, I don’t think this has anything to do with why they haven’t been translated. The reason for that is, I think, simply that they are not all that interesting except to specialists, and there are very few people with the ability and interest to translate them.


[quote="sujato, post:1, topic:5820"]
The commentaries only comment on a few words. How much of the text, exactly, is commented on? Well, let’s do a very rough count, based on the sutta I happen to be translating right now, DN 18 Janavasabha. The text has close to 3000 words. The commentary comments on about 126 words. That’s about 4%. The meaningful percentage is somewhat higher than that, for many words are repeated. Still, it’s only a small fraction of the text.[/quote]

I think this seriously underestimates the proportion.

With the Janavasabha being the 18th sutta in the DN, it’s not really a very good choice for this sort of calculation. Buddhaghosa doesn’t like to repeat himself and so when he has defined a term once in the course of a commentary he won’t usually define any subsequent occurrences of it in later suttas except where it’s being used in a different sense. And so to get an accurate picture it would be better to look at the first sutta in each nikāya.

In the case of the Brahmajāla, when we eliminate all the duplicates we’re left with 1353 words. The commentary defines 710 of them. So that’s already 52%. But actually it would be considerably more than this, for many of the words are merely the same lexeme occurring in different cases:


While others are the same noun followed by different numerals:


And yet others are conjunctions, personal and demonstrative pronouns, etc. that seldom need defining. Eliminate all these and I believe the figure might well rise to 70% or more.


Good points, thanks, I should change my OP to reflect this. But first I’d like to check the numbers. May I ask, how did you get the figure of 1353 unique words in DN1?


I used BBEdit, though I’m sure there are other text editors that will do the job.

  1. Copy the CSCD version into a text file.
  2. Use BBEdit to replace every space with a line break. Now you have one word per line.
  3. Remove all the punctuation marks, parentheses, etc.
  4. Select “process duplicate lines” and then “remove duplicates leaving just one”.
  5. Remove all double line breaks.
  6. The number of lines remaining is equal to the number of unique words.
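For anyone who would rather script this than click through an editor, here is a rough Python equivalent of the steps above. It is only a sketch, not the method used for the figures in this thread. It assumes the CSCD text has already been saved as a plain UTF-8 file (the file name in the usage line is hypothetical), treats any run of non-whitespace as a "word", and strips every character that is not a letter or digit. It therefore inherits the same blind spots as the editor method: forms like sutanti and sutaṃ are counted as different words.

```python
import re
import sys
import unicodedata
from pathlib import Path


def unique_words(text: str) -> set[str]:
    """Rough equivalent of the BBEdit steps: one token per whitespace-separated
    chunk, punctuation stripped, empty results dropped, duplicates removed."""
    # Normalize so that letters like "ṃ" are single code points; otherwise a
    # decomposed combining dot below would be stripped along with punctuation.
    text = unicodedata.normalize("NFC", text)
    words = set()
    for token in text.split():             # step 2: one word per "line"
        word = re.sub(r"\W", "", token)    # step 3: drop punctuation, brackets, etc.
        if word:                           # step 5: skip empty results
            words.add(word)                # step 4: the set removes duplicates
    return words


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python unique_words.py dn1.txt   (hypothetical file name)
    source = Path(sys.argv[1]).read_text(encoding="utf-8")
    print(len(unique_words(source)))       # step 6: the count
```

This version is case-sensitive, so Evaṃ and evaṃ are counted separately. Calling `text.lower()` first would merge them.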

Cool, thanks. Turns out you can do the same thing in Sublime Text, so now I’ve learned something! I’ll check it a bit further, then update my post.

Unfortunately, bhante, I’ve now found that the method I was using is flawed. The problem is that the opening tag that is used for indicating a definiendum in the CSCD Atthakathā files is used for other things too. So it seems that to arrive at a (more or less) accurate count one would need to count the closing tags that are followed by ti or nti.

I discovered the flaw when I was applying my earlier method to the Khp. and found that there were 700 unique words of which 935 had commentarial glosses. :confounded:
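The refined count can be sketched as a regular expression, purely as an illustration. It assumes, hypothetically, that the definiendum markup closes with `</hi>`; check the actual files and change `CLOSING_TAG` if they use something else. It also counts occurrences of the closing tag immediately followed by ti or nti, not distinct words. The sample string is invented.

```python
import re

# ASSUMPTION: the element marking a definiendum closes with "</hi>".
# This is a placeholder; adjust it to the markup actually used in the files.
CLOSING_TAG = "</hi>"


def count_definienda(xml_text: str, closing_tag: str = CLOSING_TAG) -> int:
    """Count closing tags immediately followed by 'ti' or 'nti'."""
    # (?!\w) stops "ti" from matching the start of a longer word.
    pattern = re.compile(re.escape(closing_tag) + r"n?ti(?!\w)")
    return len(pattern.findall(xml_text))


# Invented sample: three glosses and one bold heading that is not a gloss.
sample = (
    '<p><hi rend="bold">evaṃ me sutan</hi>ti, '
    '<hi rend="bold">bhagavā</hi>ti. '
    '<hi rend="bold">Section heading</hi> '
    '<hi rend="bold">suta</hi>nti</p>'
)
print(count_definienda(sample))  # 3: the heading is not followed by ti/nti
```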


Ahh yes, NLP is hard! Let me go back to this and try again, bearing this in mind.

These are the figures that I get by applying the search method I mentioned in my last post to the whole Tipiṭaka, minus most of the KN. For the KN I’ve omitted all the books in verse, except the Jataka, because too many of the commentaries’ glosses are for words in the commentarial stories rather than the canonical verses.

The first figure is the number of ‘unique’ words in the mūlapāḷi and the second the number of definienda in the atthakathā. For the former I first removed all the variant readings, since the atthakathā will only be commenting on one of them. Bearing in mind the great number of repetitions of a single word in different numbers and grammatical cases, I suspect the first figure would be more accurate if it were reduced by about a third; for now I’ve left it as it is.

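Vinaya Pitaka: 31563 unique words, 6809 defined (22%)

Digha Nikaya: 17374 unique words, 6347 defined (37%)
Majjhima Nikaya: 21662 unique words, 8594 defined (40%)
Samyutta Nikaya: 23058 unique words, 7447 defined (32%)
Anguttara Nikaya: 25481 unique words, 9663 defined (38%)

Khuddaka Nikaya
Udana: 4287 unique words, 2359 defined (55%)
Itivuttaka: 3330 unique words, 1743 defined (52%)
Jataka: 25947 unique words, 12781 defined (49%)
Niddesa: 18380 unique words, 6906 defined (38%)
Patisambhidamagga: 8498 unique words, 4489 defined (53%)

Abhidhamma Pitaka
Dhammasangani: 2574 unique words, 784 defined (30%)
Vibhanga: 5850 unique words, 1722 defined (29%)
Dhatukatha: 826 unique words, 133 defined (16%)
Puggalapannatti: 2524 unique words, 673 defined (27%)
Kathavatthu: 6900 unique words, 741 defined (11%)
Yamaka: 1279 unique words, 232 defined (18%)
Patthana: 4692 unique words, 491 defined (10%)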
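The percentages can be recomputed from the raw counts. This is also a convenient way to try out the suggestion above of reducing the unique-word figure by about a third. The sketch below uses counts copied from the table. The one-third reduction is only the rough estimate mentioned above, not a measured value.

```python
# (unique words in the mūlapāḷi, definienda in the aṭṭhakathā), copied from the table.
COUNTS = {
    "Vinaya Pitaka": (31563, 6809),
    "Digha Nikaya": (17374, 6347),
    "Majjhima Nikaya": (21662, 8594),
    "Samyutta Nikaya": (23058, 7447),
    "Anguttara Nikaya": (25481, 9663),
    "Udana": (4287, 2359),
    "Itivuttaka": (3330, 1743),
    "Jataka": (25947, 12781),
    "Niddesa": (18380, 6906),
    "Patisambhidamagga": (8498, 4489),
    "Dhammasangani": (2574, 784),
    "Vibhanga": (5850, 1722),
    "Dhatukatha": (826, 133),
    "Puggalapannatti": (2524, 673),
    "Kathavatthu": (6900, 741),
    "Yamaka": (1279, 232),
    "Patthana": (4692, 491),
}


def coverage(unique: int, defined: int, reduction: float = 0.0) -> float:
    """Percentage of unique words that receive a gloss. `reduction` shrinks the
    unique-word count first, e.g. 1/3 for the rough adjustment suggested above."""
    return 100 * defined / (unique * (1 - reduction))


if __name__ == "__main__":
    for book, (unique, defined) in COUNTS.items():
        raw = coverage(unique, defined)
        adjusted = coverage(unique, defined, reduction=1 / 3)
        print(f"{book:<18} {raw:5.1f}%   reduced by a third: {adjusted:5.1f}%")
```

For the Digha Nikaya, for instance, the raw 37% becomes roughly 55% once the unique-word count is cut by a third.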


Wow, that is fantastic, thanks so much. Much more scientific than my first post!

Interesting that the Vinaya is about half that of the Suttas; might this indicate that the Samantapasadika was preceded by the sutta commentaries?

Possibly, though it might also be due simply to the nature of Vinaya diction. For example, the Vinaya Piṭaka has 24 unique words beginning with itthi-, 39 beginning with purisa-, 43 with adhikaraṇa-, and 140 with bhikkhu-. But there are only a small number of words in each of these sets that would actually need defining.
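Counting such word families is easy once a list of unique words exists. The sketch below assumes a list like the one produced by the one-word-per-line method described earlier in the thread. The sample words are only an illustration, not Vinaya data.

```python
from collections.abc import Iterable


def count_by_prefix(words: Iterable[str], prefixes: Iterable[str]) -> dict[str, int]:
    """Number of distinct words that start with each prefix."""
    unique = set(words)
    return {p: sum(1 for w in unique if w.startswith(p)) for p in prefixes}


sample_words = ["bhikkhu", "bhikkhunī", "bhikkhūnaṃ", "bhikkhu", "purisassa", "itthiyā"]
print(count_by_prefix(sample_words, ["itthi", "purisa", "bhikkhu"]))
# {'itthi': 1, 'purisa': 1, 'bhikkhu': 2}
```

A plain prefix match works letter by letter. As a result, a form such as bhikkhūnaṃ, with long ū, is not counted under bhikkhu-.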


The quotation particle is “ti”, or rather “'ti”, short for “iti”.

“nti” is not a quotation particle.
The “n” is the last consonant of the preceding word.
Final ṃ is changed to “n” by word sandhi under the influence of the following particle 'ti.

The same argument applies to expressions like “sahassampi”. This expression is not a single word but two words, “sahassaṃ” and “api”; “pi”, or rather “'pi”, is a variant of “api”.
The final m in “sahassampi” replaces ṃ. The change is due to word sandhi.
Such combinations are often regarded as a single word. This is an error based on the fact that, in scripts derived from Indian writing systems, two consonants following each other are written as a single conjunct letter. It is merely orthographic. Word sandhi does not join two words into a compound; they remain separate words.

Well, yes, sure, we understand that. But a text editor doesn’t!

What we’re talking about here is finding ways to identify unique words. That is based not on an understanding of Pali linguistic structure, but on sequences of glyphs, which is what a text editor or other program works with. This makes working with Pali quite a challenge. A text editor cannot tell the difference between, say, -nti as a verb ending and -nti marking a closing quote.
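The point is easy to demonstrate. Below is a deliberately naive, hypothetical rule that treats every final -nti as a sandhi form of -ṃ followed by the quotation particle ti. It undoes the sandhi correctly in sutanti. However, it also splits gacchanti ("they go"), where -nti is simply the verb ending.

```python
def naive_split(word: str) -> list[str]:
    """Hypothetical rule: read any final "-nti" as "-ṃ" + quotation particle "ti".
    Deliberately naive, to show why string rules alone are not enough."""
    if word.endswith("nti") and len(word) > 3:
        return [word[:-3] + "ṃ", "ti"]
    return [word]


for word in ["sutanti", "gacchanti", "bhagavā"]:
    print(word, "->", naive_split(word))
# Prints:
# sutanti -> ['sutaṃ', 'ti']
# gacchanti -> ['gacchaṃ', 'ti']
# bhagavā -> ['bhagavā']
```

The first result is right and the second is wrong, yet both come from the same string pattern. Telling them apart needs knowledge of Pali morphology, or markup context like the closing tags in the commentary files, rather than string matching alone.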

Oh, I thought a text editor was a human being… welcome to the age of robots (or “bots”).

Once upon a time, I’m sure it was. Like computers, which originally were humans—mostly women—who computed things. Now, a text editor is a program for editing plain text. Ahh, how we are fallen!


I hope to reach Nibbana before all humans turn into “bots”.

I just came across the term “text editor” for the first time and made a similar assumption.

Surely copy editors have yet to be assimilated into the computer?


I checked what Prof. von Hinüber wrote about these five texts (in A Handbook of Pali Literature, 1996, Walter de Gruyter, Berlin…, §208-220 and §226-244), but only briefly. He points out that …

  1. only the commentaries on the “Four Great Nikayas” and the Visuddhimagga can safely be regarded as works written by Buddhaghosa.

  2. Samantapaasaadika has a Chinese “version”