Stratification of the Suttas

Does anyone know of a work that continues in the same direction as Govind Chandra Pande’s Studies in the Origins of Buddhism with regard to the stratification of the suttas?
Thank you!


I am contemplating creating files from his classification of DN suttas (early/late) and then applying text mining methods (with gensim, for example) to see what comes up. But I am in a monastery with very limited access to the internet. Maybe one day.

It is not easy to master Latent Dirichlet Allocation and other such advanced statistical tools, so I have resorted to a very simple word count.

Here is what I have done:

  1. I created 3 files.
    a) One file contains suttas from DN and MN classified by Pande as early (DN 2, DN 13, MN 17, 24, 26, 29, 61, 63, 71, 108, 144). This file contains 267k characters.
    b) Another file contains suttas from DN classified by Pande as late (DN 17, 18, 22, 24-30, 32-34). This file contains 454k characters.
    c) A third file contains suttas from MN classified by Pande as late (MN 12, 28, 33, 35, 41, 43, 44, 50 etc.). This file contains 518k characters.
  2. I did a word count and then compared the frequency of words in the three files.
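
The two steps above can be sketched roughly as follows. This is only a minimal illustration, not the script actually used: the function names (`word_counts`, `per_10k`, `compare`) and the toy strings are assumptions made for the example. Because the three files differ in size (267k, 454k and 518k characters), the sketch compares occurrences per 10,000 words rather than raw counts.

```python
import re
import unicodedata
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens; NFC keeps letters such as ṃ or ā as single characters."""
    normalized = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches runs of Unicode letters only (no digits, no underscore).
    return Counter(re.findall(r"[^\W\d_]+", normalized))


def per_10k(counts):
    """Turn raw counts into occurrences per 10,000 words."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: 10_000 * n / total for word, n in counts.items()}


def compare(early_text, late_text):
    """Return (word, early_rate, late_rate) rows, sorted from 'more early' to 'more late'."""
    early = per_10k(word_counts(early_text))
    late = per_10k(word_counts(late_text))
    rows = [(w, early.get(w, 0.0), late.get(w, 0.0)) for w in set(early) | set(late)]
    rows.sort(key=lambda row: row[2] - row[1])
    return rows


# Toy strings standing in for the real files (hypothetical data, not actual sutta text).
early_sample = "cittaṃ pajānāti cittaṃ viharati"
late_sample = "rājā ahosi rājā deva"
for word, early_rate, late_rate in compare(early_sample, late_sample):
    print(f"{word:10s} early={early_rate:8.1f} late={late_rate:8.1f}")
```

With real data, the two sample strings would be replaced by the contents of the early and late files. Note that Pali inflection means cittaṃ, cittassa and citte are counted as different words unless the text is stemmed first.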

I noticed that there are sometimes striking differences. Often a difference between the early DN-MN suttas and the late MN suttas is even larger between the early DN-MN suttas and the late DN suttas.

This is still a work in progress, but here are some early findings:

Late suttas use the words cittaṃ, pajānāti, viharati, natthi and kammaṃ much less often. This means that, beyond purely stylistic differences, they talk less about the mind, about understanding things and about kamma.

On the other hand, late suttas use the words ahosi, rājā, deva, ānanda and ahesuṃ much more often. Again, beyond stylistic differences, this means they talk a lot more about kings, devas and Ānanda than early suttas do (not much of a surprise so far, but it confirms what would have been expected).

I will continue dissecting the results and will report later on any interesting findings I come across. In the meantime, I welcome comments and criticism on the methodology.


It seems the stratification of the texts is about the formation of strata in the EBTs. If so, it may be connected with the issue of angas (classifications) in EBTs. The following article by Choong Mun-keat may be useful:

“Ācāriya Buddhaghosa and Master Yinshun 印順 on the Three-aṅga Structure of Early Buddhist Texts”, Research on the Saṃyukta-āgama (Dharma Drum Institute of Liberal Arts, Research Series 8; edited by Dhammadinnā), Taiwan: Dharma Drum Corporation, August 2020, pp. 883-932.


Or this site:

Unfortunately, I don’t have easy access to this document. But while looking for it, I came across this one:


It seems Bhikkhu Analayo believes that the Pali and Chinese early Buddhist texts (such as the five Nikayas and four Agamas) originated and were finalised all at once at the first Saṅgha council, complete in form (structure) and content, although he has identified some late and differing components of the texts.


Bhikkhu Analayo completely ignores the relevant findings of Ven. Yinshun on SA/SN (i.e. the synthesis of the three aṅgas) and the reading of the Ceylonese/Burmese version in MN 122:
“na kho Ānanda arahati sāvako satthāraṃ anubandhituṃ yadidaṃ suttaṃ geyyaṃ veyyākaraṇaṃ tassa hetu” (“It is not right, Ānanda, that a disciple should seek the Teacher’s company for this reason, namely sutta, geyya, veyyākaraṇa.”).

This Pali version’s reading is clearly supported by the Chinese version in the Madhyama-āgama, MA 191 at T I 739c4–5:
“佛言。阿難。不其正經.歌詠.記說故。信弟子隨世尊行奉事至命盡也” (“The Buddha said: Ānanda, it is not for this reason, namely sūtra, geya, vyākaraṇa, that a disciple follows the World-Honoured One with respect until the end of life.”).

Only the first three aṅgas (sūtra, geya, vyākaraṇa) are mentioned in the Mahāsuññatā-sutta, MN 122 at MN III 115,17 and its Chinese counterpart, the Dakong jing 大空經, MA 191 at T I 739c4. This suggests the possibility that only the three aṅgas existed in the period of Early (or pre-sectarian) Buddhism.

Accordingly, Bhikkhu Analayo is apparently unable to present a clear and precise argument or analysis regarding why only the first three aṅgas are mentioned in MN 122 and its Chinese counterpart, MA 191.

Venerable Anālayo has a forthcoming article (completed in June and now in the hands of publishers) responding to Choong’s concerns that Western scholarship on early Buddhism has ignored Yinshun’s proposal that the three aṅgas served as an early ordering principle of the Buddhist scriptures. In the article, he examines the five premises that Yinshun’s hypothesis rests on.


Very good. Hopefully he does not just repeat or reprint the same five points shown in the following paper, pp. 983-997. If so, he again completely ignores the relevant findings of Ven. Yinshun on SA/SN (i.e. the synthesis of the three aṅgas):

Travagnin, Stefania and Anālayo, Bhikkhu. 2020. “Assessing the Field of Āgama Studies in Twentieth-century China: With a Focus on Master Yinshun’s 印順 Three-aṅga Theory”. Research on the Saṃyukta-āgama (Dharma Drum Institute of Liberal Arts, Research Series 8), edited by Dhammadinnā, 933-1007. Taiwan: Dharma Drum Corporation.

This is really interesting as an initial trial. May I make a few points?

Generally speaking, Pande’s work is a mixed bag: he was a knowledgeable and careful scholar, but his work is limited in time and place, and in some cases makes theoretical assumptions that are, in my view, not well founded. Sorry, I can’t remember details, this is just my memory of reading him many years ago. Anyway, point being, his findings should not be accepted uncritically—but you knew that!

Methodologically, the crucial thing is that your analysis must be an independent test of Pande’s work, and must not explicitly or implicitly rely on shared assumptions. For example, if Pande has identified legendary narratives involving kings as late, and then in those texts we find an increased incidence of ahosi, rājā, deva, ānanda, ahesuṃ, this merely confirms that they are legendary narratives involving kings, and doesn’t tell us anything about the date.

Are you familiar with Ayya @vimala’s BuddhaNexus project? This does a lot of the analytical crunching, but it needs informed analysis to make sense of the data.


If I recall correctly, he seemed to think it was unlikely that the Buddha taught the 4 Noble Truths. But I appreciated that bias, because it means he wasn’t blinded by positive beliefs, which is sometimes a limitation for some scholars who are deeply invested in Buddhism and can have a hard time taking a step back to question their own beliefs.

Yes, that would be proper research :grinning: I am just playing around and trying to see if anything comes up. Ideally, I would like to come up with a list of words and expressions that indicate either earliness or lateness. Some are easy to figure out: for example, any passage mentioning the number 84 is very likely a late addition to the Canon. But this is certainly difficult ground to make progress on, given the sheer number of unknowns. Still, I think people need to be aware that what is written in some suttas is sometimes to be taken with a grain of salt, or even thrown in the trash (like when it says that Mount Sineru is a million kilometers high and the oceans are 5,000 km deep, or something along these lines).

I wasn’t. This seems quite interesting, thank you for the link, Bhante. There’s still a lot to be achieved in comparative studies too, including in assessing the earliness or lateness of a sutta or passage.


Venerable @Vimala also tried an analysis on the age of texts, via the length of words:

All these are just initial steps to check out the terrain, before definitive conclusions can be drawn. But we have to start somewhere!


It’s a good point. Sometimes you learn the most from the people you disagree with.

Also a valid activity!


This is funny, I tried that too (I just divided the number of characters by the number of words in each file), but I couldn’t see a striking difference (the scope was much narrower than in Ven. Vimala’s study, though). Also, I found the average word length (AWL) to be around 10, but I didn’t try to remove all the headers, and there can be a lot of them in DN.
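
For reference, a minimal sketch of an AWL calculation is given below. It is an assumption-laden illustration, not the procedure used by anyone in this thread: the function name `average_word_length` is hypothetical, and the sketch counts only letters inside word tokens. Dividing the total number of characters in a file by the number of words, as described above, also counts spaces, punctuation and digits, so it gives a somewhat higher figure than this letter-only version.

```python
import re
import unicodedata


def average_word_length(text):
    """Mean number of letters per word token; returns 0.0 for empty input."""
    # NFC keeps letters with diacritics (ṃ, ā, ñ ...) as single characters.
    normalized = unicodedata.normalize("NFC", text)
    # Runs of Unicode letters only; digits and punctuation are skipped.
    words = re.findall(r"[^\W\d_]+", normalized)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


# Toy input standing in for a real file (hypothetical, for illustration only).
sample = "evaṃ me sutaṃ"
print(round(average_word_length(sample), 2))
```

Headers and sutta numbers would still need to be stripped from the files beforehand for the figure to reflect the text itself.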

I will have a closer look at Ven. Vimala’s work.


What I found is that the AWL can be used as an indicator, but no more than an indicator, of the relative age of entire collections. It does not work well on individual suttas because other factors are also involved; it needs a considerably large sample of text to be more reliable.

In addition to the PDF and charts that Sabbamitta mentioned above, here is some of the raw output data if you want to have a look at it yourself:


Regarding DN/DA (長阿含), Ven. Yin Shun states it was developed and expanded from the Geya (祇夜) anga portion of SA/SN. The following is the quotation, in Chapter 10, Section 4, from the book The Formation of Early Buddhist Texts by Ven. Yin Shun:

第四節 結說 (Section 4: Concluding Remarks)











In this statement, Ven. Yin Shun suggests:

Even if the original form of SA/SN is considered the most ancient/earliest, that does not mean one understands that SA/SN is a synthesis of the three parts/aṅgas (sūtra, geya, vyākaraṇa). Without knowing the characteristics of the three aṅgas and their connection with the formation of the other three Agamas/Nikayas (MA/MN, DA/DN, EA/AN), it is impossible to understand how the four Agamas/Nikayas were formed on the basis of SA/SN.