A Weight Function for Parallels

sujato · December 5, 2016, 1:13am

Okay, I understand now.

In our new system, we record more detail about the kinds of parallels and relations. But these are very difficult to quantify. In many cases, relationships differ in kind, not just in degree. For example, one text might refer to another text by name. This is a meaningful connection, even if the two texts share no actual wording.

In the future, we could look at enhancing our human-curated sets of parallels, which number in the tens of thousands, with AI. Linguistic patterns and similarities could be calculated across the whole corpus. This would allow us to recognize similar passages or linguistic patterns that fall outside the scope of the recognized parallels. Such patterns could then be mapped or visualized in various ways, for example, to show their distribution across different collections.
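To give a concrete sense of what "calculating similarity across the corpus" might mean at its most basic, here is a minimal Python sketch. It deliberately swaps in a simple, non-AI technique (Jaccard similarity over character trigrams) rather than anything learned. The function names (`char_ngrams`, `jaccard_similarity`, `rank_similar`) and the sample passages are hypothetical illustrations, not a description of how SuttaCentral computes or stores parallels.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams in a lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Very short texts yield a single "gram" (or nothing if empty).
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity (0.0 to 1.0) of two passages' character n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank_similar(query: str, passages: dict[str, str], n: int = 3,
                 top_k: int = 3) -> list[tuple[str, float]]:
    """Rank passages (id -> text) by similarity to the query, highest first."""
    scores = [(pid, jaccard_similarity(query, text, n)) for pid, text in passages.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Illustrative strings only; the ids are made up.
    query = ("vivicceva kāmehi vivicca akusalehi dhammehi savitakkaṁ savicāraṁ "
             "vivekajaṁ pītisukhaṁ paṭhamaṁ jhānaṁ upasampajja viharati")
    passages = {
        "full-formula": query,
        "abbreviated": "vivicceva kāmehi ... pe ... paṭhamaṁ jhānaṁ upasampajja viharati",
        "unrelated": "evaṁ me sutaṁ",
    }
    for pid, score in rank_similar(query, passages):
        print(f"{pid}: {score:.2f}")
```

Because it compares raw character sequences, a measure like this can run on romanized Pali without any translation. It also shows the abbreviation problem in miniature: an abbreviated passage shares only part of its n-grams with the full formula, so it scores lower than the passage it actually stands for.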

But this is far from trivial. With current AI, we would have to compare the texts in English translation rather than in the original languages. And of course, we don't have such translations for all the texts. One possible solution might be the recent advances in training AI to learn for itself how to translate between new languages.

However, it's not certain whether the Buddhist corpus is big enough for this to work, even for Pali and Chinese, let alone Tibetan and Sanskrit. There are other problems too, such as the extensive abbreviation found throughout the texts. How do we draw a map of, say, the jhana formula across all the texts when it is almost always heavily abbreviated?