How many words are in each of the first 4 Nikayas?

I am writing up some of my quantitative arguments, focusing on the first 4 Nikayas. Can someone tell me how many words of Pali each of the four Nikayas contains, not counting implied expansions, just what is actually there? Or is there a way to easily load a file and do a word count? I think some zip files or PDFs were shared here a while back.

I think the total number of words would depend on how one treats compounds.

How many words is this?

Sokaparidevadukkhadomanassupāyāsa

And I think some early written editions didn’t have spaces at all.

It’s not especially important. I guess what I want to know is how many words there are in, for example, the PTS editions.

I want to compare how frequently terms occur in, say, MN and DN, so as long as the spaces are applied fairly consistently, that will work for my purposes.
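A minimal sketch of that kind of comparison in Python, assuming each collection has already been loaded as a plain-text string (the function names are hypothetical). It splits on whitespace, so a compound counts as one word, and scales each count to occurrences per 10,000 words so that collections of different sizes can be compared directly. Matching on the start of each word is a crude way of catching inflected forms; it is an assumption for illustration, not a lemmatizer.

```python
import unicodedata

# Leading punctuation to ignore before matching (an assumed, non-exhaustive set).
LEADING_PUNCT = "‘“'\"(["


def normalize(text: str) -> str:
    # NFC so that "ā" typed as a + combining macron matches precomposed "ā".
    return unicodedata.normalize("NFC", text).lower()


def per_10k(stem: str, text: str) -> float:
    """Words starting with `stem`, per 10,000 whitespace-separated words."""
    words = normalize(text).split()
    if not words:
        return 0.0
    stem = normalize(stem)
    hits = sum(1 for w in words if w.lstrip(LEADING_PUNCT).startswith(stem))
    return hits / len(words) * 10_000


# Hypothetical usage: mn_text and dn_text would be the full texts read from files.
# print(per_10k("jhān", mn_text), per_10k("jhān", dn_text))
```

Normalizing to a rate rather than comparing raw counts matters because DN is considerably shorter than MN, so raw counts alone would mostly reflect size.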

So far I have tried pasting the GRETIL HTML files into Word and running word counts, but it crashes quite a bit and I haven’t gotten very far.

Any suggestions will be very much appreciated.

Do you consider the above compound 1 or 5 words?

I’ve never really understood why some groups of words are compounded and others not.
Consider:

ḍaṁsamakasavātātapasarīsapasamphassānaṁ

From MN 2. Why is this ‘one word’?

I don’t know; as I say, it’s not relevant to my question. If someone knows the number of letters, excluding spaces, in each of the four Nikayas, that would also serve my purposes.

I am looking to establish the relative sizes of the printed Nikayas, either by “words” if there is a spacing system, or just by letters, so I can work out the ratio of occurrences of particular strings of letters to the totals.
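Counting letters rather than words sidesteps the compound question entirely. A minimal Python sketch, assuming the text is available as a string (function names are hypothetical): it counts alphabetic characters only, so spaces, digits and punctuation are ignored, and it expresses a string’s non-overlapping occurrences per million letters.

```python
import unicodedata


def letter_count(text: str) -> int:
    """Count alphabetic characters, ignoring spaces, digits and punctuation."""
    # Combining marks (if the text is decomposed) are not isalpha(), so a letter
    # like "ā" counts once whether it is stored precomposed or decomposed.
    return sum(1 for ch in text if ch.isalpha())


def per_million_letters(string: str, text: str) -> float:
    """Non-overlapping occurrences of `string` per million letters of `text`."""
    # Normalize both sides so precomposed and decomposed diacritics compare equal.
    text = unicodedata.normalize("NFC", text).lower()
    string = unicodedata.normalize("NFC", string).lower()
    total = letter_count(text)
    if total == 0:
        return 0.0
    return text.count(string) / total * 1_000_000
```

Note that `str.count` searches the text with its spaces intact, so a string will not falsely match across a word boundary; if spaces were stripped first, it could.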

Any suggestions much appreciated.

Also, how would you address inflections of case?
Are, for instance, ariyasāvako and ariyasāvakassa two distinct words?

I’m not addressing inflections of case. I am not trying to work out how many distinct words there are, like a vocabulary; I just mean the total number of words (or letters) in each of the four Nikayas.

It’s for a quantitative comparison; it has nothing to do with compounds, cases or anything like that.

I am pasting the PTS Pali text from GRETIL into Word to do a word count (or letter count), but the files are too large and my computer crashes.

Does anyone have any ideas?

If you download the Suttapitaka from the Sixth Council website, tipitaka.org, you will have, after unzipping, separate PDF files for the major divisions of each Nikaya.

I suppose you could then paste these pieces (or parts of them) into a word processor that can count all the letters.

But you would probably have to strip out the section titles and headers, as well as all the variant readings (syā, ka, etc., shown in blue).
I’m not sure how you would do that.


It might be worthwhile to look at the number of suttas that contain a given keyword or phrase. Counting occurrences can be a little misleading at times. I do this quite a bit with the Chinese Agamas, and I’ve noticed that one passage might have the word “aggregate” 20 times and another only once, yet really they have the same significance.

Example: Dhyana (jhana) and equivalents occur in ~31% (69/222) of Madhyama Agama sutras, whereas the five skandhas are only mentioned in 7% (16/222) of them.
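The difference between the two measures can be sketched in Python, assuming each sutta has been loaded as a separate string keyed by a sutta ID and that the texts use a consistent Unicode encoding (the dictionary layout and function names are hypothetical). With the figures above, 69/222 ≈ 31.1% and 16/222 ≈ 7.2%.

```python
def sutta_share(term: str, suttas: dict[str, str]) -> tuple[int, int, float]:
    """How many suttas mention `term` at least once, out of how many, as a percentage."""
    term = term.lower()
    containing = sum(1 for text in suttas.values() if term in text.lower())
    total = len(suttas)
    return containing, total, (100 * containing / total if total else 0.0)


def raw_count(term: str, suttas: dict[str, str]) -> int:
    """Total occurrences of `term` across all suttas (the measure that can mislead)."""
    term = term.lower()
    return sum(text.lower().count(term) for text in suttas.values())


# Toy data: one sutta repeats "khandha" three times, another mentions it once.
toy = {"s1": "khandha khandha khandha", "s2": "jhāna", "s3": "khandha"}
print(raw_count("khandha", toy))    # 4 occurrences in total
print(sutta_share("khandha", toy))  # but only 2 of 3 suttas
```

The repeated mentions in one sutta inflate the raw count but not the sutta share, which is the effect described above.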


Word isn’t really robust enough for this sort of thing. Better to use something like Sublime Text or BBEdit.
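If even a text editor struggles, a short script avoids loading a whole file into memory at once. A minimal Python sketch, assuming UTF-8 files saved locally (for example the GRETIL files); the function name and file paths are placeholders. The tag stripping is a crude regular expression that only handles tags opening and closing on the same line, and section titles, headers and variant readings would still be counted, so treat the totals as approximate.

```python
import re
import sys
import unicodedata
from pathlib import Path

# Crude HTML tag matcher: only catches tags that open and close on one line.
TAG = re.compile(r"<[^>]+>")


def count_file(path: Path) -> tuple[int, int]:
    """Return (whitespace-separated words, alphabetic letters) for one file."""
    words = letters = 0
    # Read line by line so large files never sit in memory all at once.
    with path.open(encoding="utf-8", errors="replace") as f:
        for line in f:
            line = unicodedata.normalize("NFC", TAG.sub(" ", line))
            words += len(line.split())
            letters += sum(ch.isalpha() for ch in line)
    return words, letters


if __name__ == "__main__":
    for name in sys.argv[1:]:
        w, l = count_file(Path(name))
        print(f"{name}\t{w} words\t{l} letters")
```

It would be run as something like `python count_words.py mn.htm dn.htm`, where the script and file names are whatever you choose.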

Of possible interest, an old thread on number-crunching:
