Digitizing All Buddhist Scriptures

Hello everyone.

I was just talking with two other people I found on Facebook about the digitization of all Buddhist scriptures. Basically, this guy’s idea consisted of digitizing the scriptures of all the different lineages (Zen, Tibetan, Pali, Korean, etc.) and putting them all in one central database. From there, many things could be done easily and freely, such as searching by metadata.

I realized that I have a lot to learn if I want to help develop this idea. What scriptures currently exist? Which have already been digitized and which have not? Which organizations have already digitized some and/or intend to do so in the future?

I had heard of Sutta Central, so I came to check it out some more, and I find that it seems very similar to what I was discussing with these people. I’m also aware of other organizations’ efforts to digitize scripture.

What do you think, would it be possible for an organization to get the data from various digitizing organizations and put all the scriptures in one database with a consistent, metadata-ble format? And would it be worthwhile/beneficial to do so?

Hi @DiegoHemken, this would be great, but there would be many things to consider.

SuttaCentral contains the entire Pali Canon, in Pali using Roman text, with translations for quite a lot of it.

It also has the early texts preserved in other languages:
Chinese text from CBETA: T01n0001 長阿含經 | CBETA 漢文大藏經
Sanskrit using Roman text from DSBC: http://www.dsbcproject.org/
plus some Tibetan and other dialects. Known parallels are documented, though this is still a work in progress.

Looking at the Chinese:

It seems that the parallels with the Pali texts are in the first four (out of 100) sections, plus the Vinaya and Abhidharma.

So there is an enormous amount of material that has no parallel in the Pali and other early canons:

Prajñāpāramitā (Perfection of Wisdom)
Saddharma Puṇḍarīka (The Lotus Sūtra)
Avataṃsaka (Flower Garland)
Ratnakūṭa (Jewel Peak)
Nirvāṇa (The Parinirvāṇa)

Presumably there would be parallels between the Chinese, Sanskrit, Tibetan, etc, texts of the Mahayana material, but that’s a whole huge field of study.

Automatic searching and collation would be an interesting challenge. I imagine that would require a database of terminology correspondence between Pali, Sanskrit, Chinese, etc languages…

I think these things are worth thinking about. The effort on SuttaCentral to provide translations of the Nikayas with sentence-by-sentence correspondence back to the Pali is something that would be very useful for the other Canons. Perhaps some of that already exists, but the major users of the Chinese texts would be Chinese speakers (though of modern Chinese…).



Thanks for the question, and to Mike for his helpful answer.

So far as I know, no-one has tried to digitize the entire corpus of Buddhist texts in one place.

Essentially what has happened is that various projects have digitized the traditional corpus of a particular tradition. So the Pali, Chinese, and Tibetan texts have been digitized, but they are all separate.

Sanskrit is a special case, as these texts are not really part of a recognized canon, but have been assembled from Sanskrit remnant texts discovered more or less randomly in various places.

Obviously such a project would be huge, and you’d need a clear and coherent strategy and use case to want to do it. The most obvious question is, why? What advantage would users gain from having all the texts in one place?

For SuttaCentral, our answer to that is, the texts we have here are those that stem from the earliest period of Buddhism, and in the majority of cases, the texts are very similar, just preserved in different languages. We think that it’s interesting and important to view the texts in this way. In addition, there are no other sites being currently developed that cover the early Buddhist texts in a comprehensive and professional way across all languages. So there’s a clear need there.

There may well be great advantages to taking a similar approach with the entire spectrum of texts, but you need to make sure you know what it is.

On one level, to do this with existing text corpuses is not all that difficult. The texts exist in digital form. Each collection is formatted in its own markup style, so you’d have to develop a script to transform each source into one consistent style. But the sources are pretty consistent within themselves, so this is no big deal.
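To make the "one script per source" idea concrete, here is a minimal sketch in Python. Both input formats, the function names, and the record fields (`source`, `segment`, `text`) are invented for illustration; the real collections each use their own markup, which you would need to inspect first.

```python
import re
import xml.etree.ElementTree as ET

# Both input formats below are hypothetical stand-ins for real source markup.

def from_xml_style(source_id, raw):
    """Convert a hypothetical XML-style source: <text><p n="1">...</p></text>."""
    root = ET.fromstring(raw)
    return [
        {"source": source_id, "segment": p.get("n"), "text": (p.text or "").strip()}
        for p in root.iter("p")
    ]

def from_line_style(source_id, raw):
    """Convert a hypothetical line-based source: '<number>|<text>' on each line."""
    records = []
    for line in raw.splitlines():
        match = re.match(r"\s*(\d+)\|(.*)", line)
        if match:
            records.append({
                "source": source_id,
                "segment": match.group(1),
                "text": match.group(2).strip(),
            })
    return records

# Tiny samples, just enough to exercise the converters.
xml_sample = '<text><p n="1">evaṁ me sutaṁ</p><p n="2">ekaṁ samayaṁ</p></text>'
line_sample = "1|如是我聞\n2|一時"

# After conversion, every record has the same shape regardless of origin.
unified = from_xml_style("pali-demo", xml_sample) + from_line_style("chinese-demo", line_sample)
for record in unified:
    print(record)
```

Each source gets its own small converter, and everything downstream only ever sees the common record shape.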

Then you can simply generate an index and throw it at the web, with a bit of CSS to make it look pretty. This wouldn’t be too hard; but I don’t know that it would offer much advantage over existing sites.

The hard part would be to truly unify the different corpuses, for example by providing parallels and so on, as we do on SC. Another difficult challenge is providing translations, if that is of interest.

On a more advanced level, there are plenty of things that would be very cool to do. I’d love to have a cross-lingual search engine for Buddhist texts. Search for “cat” and it gives you entries for biḷāra in Pali, bilāla in Sanskrit, 貓 in Chinese, and ཞི་མི། in Tibetan. Something a little like this. On the to-do list!
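A rough sketch of how the "cat" lookup might work, in Python. This is only an illustration under stated assumptions: the table layout, the function name, and the placeholder corpus are all hypothetical, and only the four terms for "cat" come from the post above. A real engine would need a proper multilingual glossary and tokenization for each script.

```python
# Concept -> equivalent term in each language.
# Only the "cat" entry is taken from the discussion; the structure is assumed.
TERMS = {
    "cat": {
        "pali": "biḷāra",
        "sanskrit": "bilāla",
        "chinese": "貓",
        "tibetan": "ཞི་མི།",
    },
}

# Placeholder corpus: one invented passage per language, plus one with no match.
CORPUS = [
    {"id": f"demo-{lang}", "lang": lang, "text": f"[placeholder] {term} [placeholder]"}
    for lang, term in TERMS["cat"].items()
]
CORPUS.append({"id": "demo-none", "lang": "pali", "text": "[placeholder without the term]"})

def cross_lingual_search(query, corpus, terms=TERMS):
    """Return (id, language, matched term) for passages containing an equivalent of query."""
    equivalents = terms.get(query.lower(), {})
    hits = []
    for doc in corpus:
        term = equivalents.get(doc["lang"])
        if term and term in doc["text"]:
            hits.append((doc["id"], doc["lang"], term))
    return hits

for hit in cross_lingual_search("cat", CORPUS):
    print(hit)
```

The hard part is not this lookup but building and maintaining the term table itself.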

Just as one note, on SC we don’t maintain our texts in a database. All the texts are simply kept as plain HTML5. We’ve found this to be a great approach for many reasons, not least of which is that we can edit the texts individually or collectively with just a text editor. It also means that they are completely portable and can be used by anyone with a computer.

I could go on for quite some time, but I’ll stop there!

Everything SuttaCentral does is freely available on Github. You are most welcome to make use of anything we have.


Thanks Bhante,

Very cool! I gather that’s the sort of thing that @DiegoHemken is thinking of. Search for Sāriputta and also get Śāriputra, etc, from all possible sources. Of course, for most of us such a search wouldn’t be much use unless the results were linked to translations…
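For the name case, a small Python sketch of what I have in mind. Stripping diacritics gets you from "Sāriputta" to "sariputta", but it will not turn Śāriputra into Sāriputta, so you still need a table of known variants. The table, the function names, and the placeholder passages here are all hypothetical.

```python
import unicodedata

def fold(text):
    """Lowercase and strip combining marks, e.g. 'Sāriputta' -> 'sariputta'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()

# Hypothetical variant table: folded spelling -> one shared key.
VARIANTS = {
    "sariputta": "sariputta",  # Pali: Sāriputta
    "sariputra": "sariputta",  # Sanskrit: Śāriputra
}

def canonical(name):
    """Map a name to its shared key, falling back to the folded spelling."""
    folded = fold(name)
    return VARIANTS.get(folded, folded)

def find_name(query, passages):
    """Return passages containing any known variant of the queried name."""
    key = canonical(query)
    return [p for p in passages if any(canonical(word) == key for word in p.split())]

# Invented placeholder passages.
passages = [
    "Sāriputta [placeholder passage]",
    "Śāriputra [placeholder passage]",
    "Ānanda [placeholder passage]",
]
print(find_name("sariputta", passages))
```

A plain-ASCII query finds both spellings, which matters for anyone who cannot easily type the diacritics.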

It’s one step along a very long road …

Thank you for the replies.

I am amazed with Sutta Central. It certainly covers a great need quite well and has potential to get even better in the future.

As to a collection of ALL Buddhist scriptures in one full-featured database, I think there are reasons why it would be good to develop even though it would require lots of time, money, and people.

Being superior to anything else out there in thoroughness and ease of use, it would become preeminent. Like Google. When in need of a search engine, everyone goes to Google because it’s by far the best. For a database of digitized scriptures, preeminence would be very valuable because it would displace dhamma-representations of inadequate quality.

It would appeal to the entire range of Buddhist sects, bringing them together to one place where they can more clearly see a reliable record of the differences and similarities in the evidence of what the Buddha actually taught. This could help reduce ignorance and misunderstandings. It could also make it more difficult for people to misrepresent the dhamma, because a ubiquitous and convenient reference could be easily checked to verify or falsify claims.

In short, it would increase the clarity of the Buddha dhamma in the world.

Humans who have not yet encountered the Buddha dhamma would be more likely to encounter it in a form whose accuracy has been preserved by the consensus of qualified scholars.

Humans who encounter this collection could more easily understand the history and differences of the scriptures than if they browsed through a bunch of repeatedly re-digested misrepresentations on the internet.

Humans who have already encountered misrepresentations of the Buddha dhamma could more easily see the inconsistencies and understand their cause.

Without any need for hostile criticism, it would help decrease ignorant adherence to sects that are based on incorrect dhamma.

Analysis of the scriptures would be much easier. For example, I could search for a sutta that I vaguely remember. The other day I couldn’t find the one where a guy asks the buddha if he will go to hell if he gets run over crossing a busy street while thinking akusala thoughts. I tried searching keywords on ATI and couldn’t find it. Also I was looking for a sutta I remember reading about how even a gift given without a particularly noble or pure intention is still meritorious. I searched ATI and I used the index in my Wisdom Pubs volumes and couldn’t find it. A complete database would have helped in those cases.

I could search any topic more easily because I could search across translations. For example, if I want to search for “effluents,” I could choose which scripture(s) to search in, and which translation(s) I want to see. Say I chose to include all available scriptures in the search, and specify that I want to see only translations in English. My search results could include suttas that use the translation “taints,” “fermentations,” etc., and their original sources might include “āsavā,” “อาสวา,” or “流.”
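Here is a minimal Python sketch of that "effluents" search, under stated assumptions: the table linking āsavā to its English renderings, the function names, and the placeholder translations are all hypothetical, and only the three English renderings come from my example above.

```python
# Source term -> English renderings used by different translators.
# The table structure is assumed; the renderings are the ones named above.
RENDERINGS = {"āsavā": ["effluents", "taints", "fermentations"]}

def expand(query):
    """Expand a query to every rendering that shares its source term."""
    q = query.lower()
    expanded = {q}
    for term, renderings in RENDERINGS.items():
        if q == term or q in renderings:
            expanded.add(term)
            expanded.update(renderings)
    return expanded

def search_translations(query, translations, language="en"):
    """Return ids of translations in the chosen language that match any rendering."""
    words = expand(query)
    return [
        t["id"]
        for t in translations
        if t["lang"] == language and any(w in t["text"].lower() for w in words)
    ]

# Invented placeholder translations.
translations = [
    {"id": "demo-1", "lang": "en", "text": "placeholder passage mentioning the taints"},
    {"id": "demo-2", "lang": "en", "text": "placeholder passage mentioning fermentations"},
    {"id": "demo-3", "lang": "en", "text": "placeholder passage about something else"},
    {"id": "demo-4", "lang": "de", "text": "placeholder passage mentioning effluents"},
]
print(search_translations("effluents", translations))
```

Searching for one translator's word then also finds passages where another translator chose a different word for the same term.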

I could search by the date or location of scribing. Sri Lanka, Japan, 100 CE, 1000 CE, etc.
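Filtering by that kind of metadata is simple once each text carries it. A small Python sketch, where the record fields, the function name, and all catalog entries are invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Manuscript:
    ident: str
    place: str
    year_ce: int

# Hypothetical catalog entries; none of these describe real manuscripts.
CATALOG = [
    Manuscript("demo-a", "Sri Lanka", 100),
    Manuscript("demo-b", "Japan", 1000),
    Manuscript("demo-c", "Sri Lanka", 1000),
]

def filter_catalog(catalog, place=None, earliest=None, latest=None):
    """Return ids of entries matching every filter that was supplied."""
    results = []
    for m in catalog:
        if place is not None and m.place != place:
            continue
        if earliest is not None and m.year_ce < earliest:
            continue
        if latest is not None and m.year_ce > latest:
            continue
        results.append(m.ident)
    return results

print(filter_catalog(CATALOG, place="Sri Lanka", latest=500))
```

The real work would be in collecting reliable dates and places for each source, not in the query itself.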

It could render interlinear versions which could be browsed online or printed with a free license. Here is an online interlinear rendition of Mark 7:15: http://biblehub.com/interlinear/mark/7-15.htm
showing 1) the original Greek script, 2) the original Greek in a romanized script, 3) a word-for-word translation to English, 4) a “parsing key” noting the tense and grammar of each word, and 5) a reference number to a glossary.

I know that what I’ve described would be a huge project, and it may be unrealistic to expect something so good to actually happen. And I know it’s not necessary. As it is, someone who wants to find good dhamma can do so. But I do see what I’ve described as ideal, and I do believe it’s possible. If I had billions of dollars I could just pay people to make it happen. Anyway, I hope that in this world with billions of people, and abundant resources and technology, the clarity and availability of the dhamma will improve.

I agreed with everything up to here! Fact is, you need people who care about what they’re doing, and no amount of money can change that.

But back to the main thing, for now you can just check out what’s around. Obviously you can, if you want, use the approach we have on SC, as this is the largest scale attempt so far at what you want. But perhaps another approach will suit you better. Have a look around our Github and let us know if you have any questions or need help.


I am interested in digitizing Dharma, primarily Vajrayana practices. I would love to discuss the implications of Vajrayana on the Blockchain with anyone interested. I am not so much interested in creating a scholarly database, but rather just making teachings and practice more accessible for the modern age. I feel we can use the blockchain to incorporate wearable EEG headsets and Augmented Reality for guided Yidam practices as well as creating e-books for sutra practices.
