On the 7.4% (or what to make of Walser's "anatta" data)

Anatta appears in 7.4% of the pāli suttas. OK, now for the background and more specific data. If you’re not particularly into data analysis, this will be of little interest. If you are, then please feel free to comment.

J. Walser’s 2018 paper has been sitting on my local drive for a while. I need to file it away on my Google drive, but I felt it needed a little summarizing first.

When Did Buddhism Become Anti-Brahmanical_ The Case of the Missing Soul.pdf (6.0 MB)

The data is hard to follow in places, and there seem to be a handful of typos in the figures. Setting those aside, his data-driven conclusions appear sound given his methodology. (I’m not speaking to his other conclusions, on which I have no opinion.)

As Bhante has noted elsewhere, counting things where the suttas are concerned is problematic. That said, let’s assume Walser’s methodology is good enough.

  1. What is the total number of pāli suttas under investigation?

5,126 total suttas
(DN, MN, SN, AN, Dhammapada, Suttanipāta, and Udāna)

This includes peyyālas or “potential discourses” which are not actually written out. (These are presumed verbatim repetitions of text, which are marked with an ellipsis: …) All counts below assume occurrences in the peyyālas.

  1. How often do the following three terms appear, with their various declensions or grammatical forms?

378 = anatta
368 = Noble Eightfold Path (N8P)
340 = jhāna

That is, anatta appears in 7.4% of the pāli suttas. Put another way, anatta does not appear in 92.6% of the suttas.

(The Digital Pali Dictionary shows 305 occurrences in the same population set; I assume it does not take into account the peyyālas.)

Comparatively, anatta shows up just about as frequently as N8P and a little more often than jhāna.
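
As a quick check on the arithmetic, here is a minimal sketch that recomputes each term's share of the 5,126 suttas. It uses only the three counts quoted above; the variable and function names are my own, and it takes Walser's counts at face value rather than verifying them.

```python
# Counts as reported in the summary above (peyyālas included).
TOTAL_SUTTAS = 5126
term_counts = {"anatta": 378, "N8P": 368, "jhāna": 340}


def share(count, total=TOTAL_SUTTAS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {term: share(n) for term, n in term_counts.items()}

for term, pct in shares.items():
    # The complement is the share of suttas in which the term does not appear.
    print(f"{term}: in {pct}% of suttas, absent from {round(100 - pct, 1)}%")
```

The three shares land within a percentage point of each other, which is all the "just about as frequently" remark is claiming.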

  1. Does the Buddha appear to discriminate among audiences when it comes to his teaching? Yes.

Walser defines four main documented audiences; I’ve added the number of suttas where they show up per his data (either as interlocutors or at least being present in the sutta). One sutta = at least one anatta occurrence.

200 suttas = Brahmin laity (householders)
115 suttas = Brahmin paribbājaka or wanderers
394 suttas = Buddhist Brahmin monastics (i.e., born into Brahmanical caste)
309 suttas = Non-Brahmins (all monastics?)

Walser states that some monastics were “coded” as Brahmins and that’s the third category. My “all monastics?” parenthetical refers to my own lack of knowledge; I didn’t find Walser clarifying that so I assume Yes.

There is overlap in some suttas where Buddhist Brahmin monastics are present with non-Brahmins. Walser applies a filter to drill down on the data and eliminate double-counting.

The paper does not make it possible to show exact correlations across all four audiences. I show below whatever I could find.

Also, the numbers will not add up to the total figures above; only certain terms are picked out, per Walser.

  1. Doctrines taught to Brahmin laity (householders) (200 suttas)

25 suttas = jhāna (12.3%)
5 suttas = N8P (2.5%)
6 suttas = nothingness as highest achievement (3%)
0 suttas = anatta

  1. Doctrines taught to Brahmin paribbājaka or wanderers (115 suttas)

60 suttas = anatta

(59 variations of the same discourse are taught to Vacchagotta.)

  1. Doctrines taught to Buddhist Brahmin monastics (394 then apply filter)

58 suttas = jhāna (14.7%)
45 suttas = N8P (11.4%)
29 suttas = anatta (7.4%)

In all, 10 Buddhist Brahmin monastics by name were taught anatta. Of the 29 sutta occurrences (where the doctrine is taught), 19 are in SN. Nine of those SN occurrences are with Rādha.

In all, 15 Buddhist Brahmin monastics by name were not taught anatta.

  1. Doctrines taught to Non-Brahmins (all monastics?) (309 then apply filter)

55 suttas = anatta (18.7%)
18 suttas = jhāna (6.4%)
16 suttas = N8P (5.7%)

In all, 18 Non-Brahmins by name were taught anatta.
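
Since the paper's figures are hard to follow in places, a small consistency check may help. The sketch below recomputes each percentage from the sutta count and the stated audience total, and also works out what denominator each stated percentage would imply. It assumes the unfiltered totals (200, 394, 309) are the denominators, which may well not hold for the filtered groups, so a mismatch may only mean the filter changed the base, not that a figure is wrong. All names are illustrative.

```python
# (audience, stated total, [(doctrine, suttas, stated %), ...]) as quoted above.
reported = [
    ("Brahmin laity", 200,
     [("jhāna", 25, 12.3), ("N8P", 5, 2.5), ("nothingness", 6, 3.0)]),
    ("Brahmin monastics", 394,
     [("jhāna", 58, 14.7), ("N8P", 45, 11.4), ("anatta", 29, 7.4)]),
    ("Non-Brahmins", 309,
     [("anatta", 55, 18.7), ("jhāna", 18, 6.4), ("N8P", 16, 5.7)]),
]


def check(rows, tolerance=0.15):
    """Compare each stated percentage with count / stated total."""
    results = []
    for audience, total, doctrines in rows:
        for doctrine, count, stated in doctrines:
            recomputed = round(100 * count / total, 1)
            # Denominator that would reproduce the stated percentage.
            implied_total = round(100 * count / stated)
            agrees = abs(recomputed - stated) <= tolerance
            results.append(
                (audience, doctrine, stated, recomputed, implied_total, agrees)
            )
    return results


results = check(reported)
for audience, doctrine, stated, recomputed, implied_total, agrees in results:
    flag = "ok" if agrees else "differs"
    print(f"{audience:18} {doctrine:12} stated {stated:5}% "
          f"recomputed {recomputed:5}% implied total {implied_total:4} {flag}")
```

Run against the figures as quoted here, the Brahmin monastic rows agree with 394, while the laity jhāna row and the three non-Brahmin rows do not agree with 200 and 309. That fits either a filtered denominator or one of the typos mentioned earlier; the sketch cannot tell which.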

  1. How often do all various types of concentration doctrines appear?

Finally, if we group together all the different types of concentration from the jhānas to the formless meditations to the cessation of perception and feeling (and thus avoid double counting), we come up with a total of 116 (29.4%) discourses containing a discussion of absorption delivered to or by a Brahmin monastic while only thirty-two (11%) are delivered to or by a non-Brahmin monastic.

  1. Walser restates that, in all, 378 suttas reference anatta whereas 340 suttas reference samādhi.

  2. In summary, per Walser the suttas including Brahmins by name skew toward samādhi because that’s what they already understood.

Please comment on any data I may have stated incorrectly.

:pray:t2: :smiling_face_with_three_hearts: :thinking:

I’ve reflected on this summary data again.

  1. At least some of the Brahmin non-laity were familiar with micchāsamādhi – the Buddha exposed them to sammāsamādhi.

Regarding Walser’s proposal that the Buddhist Brahmin monastics were exposed to teaching about samādhi more frequently than anatta:

For certain, the methodology he uses is imperfect. Are there other data analyses comparable to his? Until I see those, I’ll assume that his imperfect methodology is the best we have at the moment.

Following on that, I agree with his conclusion that the Brahmin non-laity were already familiar with a samādhi practice but that it was micchāsamādhi. It was not samādhi within the Buddha’s liberation program.

Everything I’ve read to date in current literature suggests there was knowledge of a samādhi practice but the evidence is sufficiently scant that we don’t know anything more than that.

I’m pretty sure I’ve seen Bhante Sujato’s notations about this but don’t know where they are (i.e., in which sutta translations).

That’s from the recent thread on listening to music. Still, Bhante defines “wrong samadhi” and presumably that’s the samādhi some of the Brahmins were familiar with – or at least some of them.

  1. From the data, teaching the anatta doctrine to Brahmin householders was a non-starter.

I have to think that this has some relevance for today. It can feel obtuse or even weird when I hear some lay practitioners talk about it. For example, someone at a retreat writing “not self” as the name on their water cup rather than just writing their name :thinking:. Like, you have a name!

At times, it helps me to recognize when I’m clinging to or grasping at an over-personalized sense of self and a legacy. Maybe some awareness of the aggregates helping that along. But beyond that, nah. Nothing there for me.

One obvious objection would be that jhāna is included in the N8P in the last step, and that anatta is included in the wisdom and insight results of the N8P, from abandoning sakkāyadiṭṭhi of the stream winner to fully overcoming the conceit “I am” for the arahant.

I’m a little bit confused as to what your point is.

When I first started learning about the suttas I was almost obsessed by the idea of non-self. To me it was literally the most profound idea I had ever come across in my life thus far :slight_smile: so YMMV as they say.

I have to think that’s why Walser counted the # of occurrences of samādhi separately from that of jhāna. My assumption is that they do not always occur in the same sutta.

I would say that for most lay practitioners – at least the ones I’ve met over the years – this is too subtle to recognize as a profound idea, for different reasons. I hardly ever ventured into teaching this in meditation groups.

That said, it’s wonderful that it clicked for you.

I think one of the (quite pragmatic) reasons to shy away from teaching anatta to Brahmin householders is that it would alienate an audience who grew up with strict atman views. Though I suppose in the Abrahamic world, we kind of grow up with the same “soul” idea.

I think the atta-anatta controversy is one of the biggest question marks in Buddhism. I certainly think the suttas describe the aggregates as bound up with nothing permanent or unchanging, and that’s a rather easy perspective to see or deduce for oneself.

Still, Dhammapada also has a famous Atta section. :smiley:

It can be like the worst of both worlds at times: you carry the burden and responsibility of being a person, without the crutch of thinking that person is held by anything permanent.

I don’t see how else it can be, though. But then again, as an edgy teenager my nickname was “Soulless” so I might be biased. :sweat_smile:

I realize that since I don’t believe in the kind of corpus analysis you are doing I should probably refrain from commenting at all. But I feel obligated to point out that one can both believe that the texts we have are the words of the Buddha and believe that this kind of frequency analysis is deeply flawed.

Which is precisely why I think it would be insightful, personally. :slight_smile:

Alas, the more the merrier :grinning:

I’m interested to understand how a frequency analysis like this one is at odds with believing that these texts are the Buddha’s teachings. (Or any frequency analysis; it’s just that I don’t know of any others except for one of our colleagues who posts occasionally on this forum.)

Walser is honest about the sample size being pretty small (thus introducing statistical invalidity) but he does his best to attest to the data’s viability. He does this with the broad word “skew”.

There is some deviation of the Buddha teaching terms or doctrines across different audiences as presented in the canon. That is binary data.

And I assume people curated the written texts, as they were produced, in a way that they thought made sense relative to when those various texts became available and to whom. I don’t see a reasonable path to discounting this curation process.

The main point was summarizing Walser’s paper. But I am a fan of data analysis when it comes to unwrapping to whom various teachings are presented. This is, for me, a means to appreciate the brilliance of the Buddha’s teaching: it wasn’t just the content but how he delivered it.

The drunk may be looking for the keys only where he can see. But they have to start somewhere.

I guess I’ll have to read the article, but small samples for statistical inference make the results more uncertain, and it makes it difficult to determine whether something is signal or noise.

But also, a core part of statistical inference is the idea of randomly sampling from a population, so that we have reason to believe the values we calculate based on the sample are not too different from what we would find if we could measure the entire population.

It is not clear to me how this would work applied to the suttas. It would mean something like, imagine the Buddha kept on teaching for a really long time, what would be the true proportion of the time he would speak about anatta, and can we infer that from the suttas? Why would that be meaningful?
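
To put a rough number on the "signal or noise" worry, here is a sketch of a Wilson score interval for a proportion, applied to two of the anatta figures quoted earlier in the thread (29 of 394 and 55 of 309). It leans on exactly the assumption questioned above, namely that the suttas behave like independent random draws from some larger population, so the intervals show how wide the uncertainty is even in the best case rather than anything about the Buddha's actual teaching habits. The function name and the choice of a 95% level are mine.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts and unfiltered totals as quoted in the opening post.
for label, k, n in [("Brahmin monastics", 29, 394), ("Non-Brahmins", 55, 309)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%}, interval {low:.1%} to {high:.1%}")
```

Whether an interval like this means anything depends entirely on whether the sampling picture applies to the suttas at all, which is the open question.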

My feeling is that sometimes using numbers and frequencies makes something seem more scientific or valid, because it is a signifier for “real science”.

Anyway, I should probably read the paper :slight_smile:

Would love to get your take on it.

I tried to read a little bit but the flavor is too ‘opinion piece’/polemical for me. It has a distinct flavor of conspiratorialism that, AFAIK, is particularly prominent among US Buddhists.

E.g.:

I mean, does stuff like this really get through peer review in Buddhist studies? Sheesh! :sweat_smile:

I am so excited to see some quantitative analysis here but these numbers are wrong.

Jhana occurs, without peyyālas and without counting anything outside DN, MN, SN, and AN, at least 436 times, and it is indicated by “pe” at least a dozen (almost certainly more) times in DN alone.

I will read this thread and the article and have more to say, but I would also point out that anatta as a term does not always refer to anatta the teaching, the way jhana always does.

Also, anatta only occurs 162 times in the 4 principal collections, so Walser is getting a lot out of the “pe” and KN here.

This is even more wildly wrong than the initial claim.

Samadhi is mentioned 1216 times in the 4 principal prose collections alone, again without a “pe” count. It dwarfs the aforementioned 162 times anatta is mentioned; there’s really no comparison.

Well, there is a basic problem with this kind of analysis. It is apparent when comparing parallels that there was an artificial increase in the number of discrete suttas in AN and SN. It’s quite obvious when comparing AN with EA, because EA has very little of the multiplication of sutras that AN does. There was a paper analyzing the evidence of this by Kuan and Bucknell: The Structure and Formation of the Aṅguttara Nikāya and the Ekottarika Āgama @ The Open Buddhist University

For SN, it’s more difficult to discern a similar process because the only full parallel that exists is from the Sarvastivada tradition, and they were engaging in the same expansionary process with their scriptures.

So, the result is that one would be overcounting occurrences of terms assuming that sets of 32 or 60 suttas were originally a much smaller number in some cases.
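
To illustrate the overcounting, here is a toy sketch comparing a raw per-sutta count with a count that collapses multiplied suttas back to one shared template. The data is invented: the labels, the template names, and the set of 60 are hypothetical stand-ins, not taken from any edition, and deciding which suttas really share a template is the hard scholarly part that this skips.

```python
# Hypothetical toy data: (sutta label, template it repeats, mentions the term?).
suttas = [(f"rep-{i}", "template-A", True) for i in range(1, 61)]
suttas += [("long-1", "long-1", True), ("long-2", "long-2", False)]


def raw_count(records):
    """Count every sutta that mentions the term, repetitions included."""
    return sum(1 for _, _, has_term in records if has_term)


def collapsed_count(records):
    """Count distinct templates that mention the term, so a set counts once."""
    return len({template for _, template, has_term in records if has_term})


print("raw:", raw_count(suttas))
print("collapsed:", collapsed_count(suttas))
```

With this invented input the raw count is 61 and the collapsed count is 2, so a single multiplied set can dominate a frequency table.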

The key thing I have realised with SN is that it is in 3 parts.
The sagathavagga obviously, but there is a clear difference in language and terminology between the nidanavagga and SN22-SN56.

The portion SN22-SN56 is more like the later KN works and the Abhidhamma Vibhaṅga, and less like the nidanavagga, MN, and DN.

Really, the problem with Walser’s analysis is the hypothesis.

The suttas are not divided doctrinally by audience; they are divided doctrinally by chronology.

Jhana comes first because it is part of the Patipada at DN2, which is replicated in all ten of the shared vagga of the Silakkhandhavagga in DN and DA (and DS).

The rest of D forms after the Silakkhandhavagga and has new and different teaching tropes.

M is later in general, as evidenced by the growth of DO in it compared to the Silakkhandhavagga (6 DO) and the rest of DN (10 DO).

Then the first half of SN, and the original core of AN, then the later KN, the Abhidhamma, the Vinaya, the commentaries, etc.

I have plenty of proof of this but am struggling to adequately communicate it.

The other issue here is counting SN and AN suttas as somehow “equal” to DN and MN suttas. They are not.

Comparing a 50KB text with a 1KB text and claiming that, since one says jhana and one says anatta, we have a weight of 1 vs 1 is absurd.

Then claiming that, because the 1KB sutta repeats ten times with almost no variation other than known permutations of pre-existing tropes, we have 10 anatta suttas to one jhana sutta is equally absurd.

The massive narrative text is MORE IMPORTANT than the dozens of tiny hundred word texts that repeat themselves.

The tiny texts are appendages to the long ones, preserving variations, permutations and combinations; they rely on the longer and earlier (in the corpus and in time) narrative texts.
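
The weighting point can be made concrete with a small sketch using the example sizes from this post: one 50 KB text that mentions jhana and ten near-identical 1 KB texts that mention anatta. The corpus is hypothetical, and weighting by size is only one possible alternative to counting each sutta as 1, not a claim about the right weighting.

```python
# Hypothetical corpus built from the example sizes in this post.
corpus = [{"term": "jhana", "size_kb": 50}]
corpus += [{"term": "anatta", "size_kb": 1} for _ in range(10)]


def weight_per_sutta(texts):
    """Every text counts as 1, whatever its length."""
    totals = {}
    for text in texts:
        totals[text["term"]] = totals.get(text["term"], 0) + 1
    return totals


def weight_per_size(texts):
    """Every text counts in proportion to its size in KB."""
    totals = {}
    for text in texts:
        totals[text["term"]] = totals.get(text["term"], 0) + text["size_kb"]
    return totals


print("per sutta:", weight_per_sutta(corpus))
print("per KB:   ", weight_per_size(corpus))
```

Counted per sutta, anatta leads 10 to 1; weighted by size, jhana leads 50 to 10. The ranking flips on the weighting choice alone.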

Regarding word frequency and important points, the US Constitution, the fundamental document for the basis of the government, has around 4500 words.

Essential and fundamental governmental and social/cultural principles for the US are freedom and equality, (at least they’re supposed to be). Everyone in the US is aware of this, like it’s in the water.

And yet

The word “equality” does not appear at all.
The word “equal” occurs only 5 times.
The word “freedom” is entirely absent.

This doesn’t disprove the frequency-of-words hypothesis, but it shows that how often a word appears doesn’t necessarily track how essential the principle is.
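
For anyone who wants to repeat this kind of count on a text of their own, here is a minimal whole-word counter. The sample string is invented for illustration and is not the Constitution; the point is only that "equal" and "equality" are tallied as separate words, so a concept can be present while its abstract noun scores zero.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase whole words; 'equal' and 'equality' are separate keys."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Invented sample text, used only to show the mechanics.
sample = "Equal protection. Equal votes. Nothing here names the abstract noun."
counts = word_counts(sample)

for word in ("equal", "equality", "freedom"):
    print(word, counts[word])
```

A `Counter` returns 0 for words it never saw, so absent terms can be looked up without special handling.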

Indeed. It’s very silly, as there are so many sources of “skew” and “bias” in these texts: which discourses were passed down, which were duplicated, etc. Not to mention the basic fact that a term’s frequency has very little to do with, e.g., its importance as a concept.

I guess I’m just grateful to the past generations of Buddhists who have preserved enough of the Buddha’s words that people can even think of doing “statistics” on them! Can you imagine even trying to do this kind of analysis on Jesus’ words? All dozens of them :face_with_hand_over_mouth:

Indeed.

Unless one is of the belief that the Nikayas are the sum total of a precise transcript of one single person, what can these numbers tell us?

That wasn’t what I meant. I’m saying that it is possible to believe that what we have are the authentic words/teachings of the Buddha but to not believe that this kind of statistical analysis is valid. Of course to believe in the statistical analysis you have to believe that the texts are authentic in some way if you are going to assume to find any meaning in the numbers.

But from within the texts themselves I don’t see any indication that the Buddha taught that we can attribute any importance to the number of times he used a word. On its face, I think it would sound absurd for him to say, “Monks, count all the times I use a word to see how important I think a topic is.”

Moreover, we can only do this counting for the texts preserved at the various councils. While we do see identical/near identical teachings given in different contexts (audience and location) I don’t think that we can draw any conclusions that they kept a record of every single time the Buddha said something. I mean, no one thinks that, do they? So basically I think that the sample size is too small, but also that it may not statistically represent the number of times he talked about something.

So while I believe the texts we are looking at were spoken by the Buddha, I don’t think they reflect accurately every single word he uttered. If we did have an actual record of every single word he spoke, then maybe we could come to some conclusion about importance based on how often he said a word. I still think that would not give meaningful results. But what we have to work with is so far from what we would need, it’s not even worth discussing.

But even if we are trying to gauge the sentiment of the redactors, I still don’t think that they were paying attention to word frequency. My personal faith is that they included the texts they felt were important. But that they wouldn’t try and include duplicates solely for the purpose of boosting the stats.

That’s just not true if you believe, like me, that there is nothing to be found in statistical analysis. In fact that’s the whole point of the joke. He gains absolutely nothing from looking for the keys under the street lamp. Other than perhaps practice in crawling around on the ground. But at the end of the day he is still looking in the wrong place.

My point is that counting words gives us no meaningful information. It gives us data, but not information. Any conclusions we draw from the numbers say more about us than they do about the mind of the Buddha or the redactors.

That said, we can probably derive some meaning from a word never appearing in the texts. But the relative frequency of any given word, I don’t think so.

Sorry for derailing your thread, but you seemed open to an expansion of the topic.

The four principal Nikāyas/Āgamas did not originate all at once from the first Saṅgha council.

The Pali language of the texts is not identical with Magadhi.

The language spoken by the Buddha was not Magadhi or Pali.

So, the numbers from a frequency analysis of the texts cannot tell us what the essential teachings of the Buddha are.
