A Weight Function for Parallels

Could a function be made for statistically “weighting” the presence of parallels? Should a parallel in the Chinese Āgamas weigh more than a Gāndhārī fragment, or vice versa?

Just as an example of what I mean:
SN-1 — (4 parallels) — 85%
SN-2 — (0 parallels) — 0%
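For instance, a naive mapping from parallel count to a percentage might look like the sketch below. The saturating curve and the rate constant `k` are arbitrary assumptions on my part, just tuned so the numbers roughly match the example above; nothing here is derived from the actual corpus.

```python
import math

def parallel_confidence(n_parallels: int, k: float = 0.47) -> float:
    """Map a raw parallel count to a 0-1 confidence score.

    Uses a saturating curve, 1 - exp(-k * n): zero parallels gives 0,
    and each extra parallel adds less than the one before, so the score
    never quite reaches 1. The rate k is a tunable assumption, not a
    value derived from any corpus data.
    """
    return 1.0 - math.exp(-k * n_parallels)

# Hypothetical counts, echoing the SN-1 / SN-2 example above:
for sutta, n in [("SN-1", 4), ("SN-2", 0)]:
    print(f"{sutta}: {n} parallels -> {parallel_confidence(n):.0%}")
```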

Maybe this isn’t even a worthwhile pursuit?


I’m not entirely sure what you mean: can you give some more detail?

Perhaps the idea is to look at a sutta, say:

SN 44.2 is, in fact, 100% the same, so that’s easy.

SA 106 appears, at a cursory read, to have the same protagonists and the same message, so it’s close to 100% the same.

In some other parallels there is overlap of some parts, but not others, so the overlap would be less than 100%. It becomes a case of judgement, and would be time-consuming to implement, but is potentially useful.


Okay, I understand now.

In our new system, we have more detail as to the kinds of parallels and relations. But it is very difficult to quantify. In many cases, relationships are of different kinds, not just degrees. For example, one text might refer by name to another text. This is a meaningful connection, even if there is no actual text shared between them.

In the future, what we could look at is to enhance our human-curated sets of parallels—which number in the tens of thousands—with AI. Linguistic patterns and similarities could be calculated across the corpus. This would allow us to recognize similar passages or linguistic patterns that fall outside the scope of recognized parallels. Such patterns could be mapped or visualized in various ways, for example, to show their distribution across different collections.
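As a toy illustration of the idea (not anything we have implemented): a crude bag-of-words cosine score over two passages. A real system would need multilingual sentence embeddings rather than word counts, but the shape of the computation — scoring every pair of passages so that near-parallels surface automatically — is the same. The sample passages are invented stand-ins for translated sutta text.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Crude bag-of-words cosine similarity between two passages.

    Splits on whitespace and compares word-count vectors. This is only
    a sketch of the idea; real passage matching would use multilingual
    embeddings, lemmatization, and so on.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented toy passages standing in for translated text:
p1 = "secluded from sensual pleasures he enters the first jhana"
p2 = "secluded from sensual pleasures one enters and dwells in the first jhana"
print(f"{cosine_similarity(p1, p2):.2f}")
```

Passage pairs scoring above some threshold, yet absent from the curated parallel sets, would be the interesting cases to flag for human review.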

But this is far from trivial. With current AI, we would need to compare the texts in English translation rather than in the original languages. And of course, we don’t have such translations for all the texts. A possible solution might lie in recent advances in training AI to learn to translate between new languages.

However, it’s not clear whether the Buddhist corpus is big enough for this to work, even for the Pali and Chinese, let alone the Tibetan and Sanskrit. Moreover, there are other problems, such as the massive abbreviations found in all the texts. How do we draw a map of, say, the jhana formula across all the texts when it is almost always highly abbreviated?


I meant something simpler. If a sutta has many parallels, we can be fairly confident it derives from an important common source, and we could assign it a correspondingly high confidence percentage. If, on the other hand, a sutta exists alone with no parallels, we might have less confidence in it.

So it’s not so much about how exactly a sutta matches up across its parallels, just whether parallels are present at all, and weighting the confidence of a sutta based on its “parallelization”.

…it’s totally possible I don’t understand the nature of the parallels though.


Sorry, my misunderstanding.

I’m not sure that this would achieve anything. On the one hand, it’s easy to see at a glance if a sutta has parallels, and how many. On the other hand, the number of parallels is only indirectly related to how confident we can be in its authenticity. This depends on many other factors.

For example, with the Anguttara, there is no really close parallel collection, as the Ekottara Agama is very different. However, this is most likely just an accident of history: there would have been other Ekottaras in ancient India more similar to the Pali collection. (In fact, a partial Chinese translation of another Ekottara supports this.) So the lack of parallels for a sutta in the Anguttara is much less significant than it would be for a Majjhima sutta.

There is also the complex interaction of other textual features, all of which must be taken into account when considering such matters.