Statistical analysis of Early Buddhist Texts

Okay, in this case you could use a little hack:

  • Disable cookies first.

  • Then set the number of search results per page to “50”.

  • Search for “brahma”.

  • Look at the URL line:

    [screenshot: the browser URL line showing the search query parameters]

  • Change the number in maxResults; for example, replace “50” with “100”.

  • Click “reload” in the upper left corner of your screen.

I tried it out with 100 and got 100 results, so you might have to try a still higher number. I’m not sure how far it will keep working; maybe 1000 will be a bit much … (The scope that has been translated by Bhante Sujato and published is around 4000 sutta files.)
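As a sketch of the trick above: the URL here is hypothetical (the real parameter layout may differ), but the whole “hack” amounts to rewriting one query parameter before reloading.

```shell
# Hypothetical Voice search URL; the real one may look different.
# The edit is just bumping the maxResults query parameter.
url='https://voice.suttacentral.net/?search=brahma&maxResults=50'
echo "$url" | sed 's/maxResults=50/maxResults=100/'
```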

You shouldn’t do this too often though; if many people do it often Voice will become v-e-r-y - - - s–l–o–w …

5 Likes

Thank you! I’ll try it.

2 Likes

Let us know the result. :smiley:

2 Likes
I’m going to post this in the recent discussion about deities.
3 Likes

Wow! Glad it worked! :+1:

3 Likes

We haven’t heard back from Animitto, so I’ll just make some general remarks.

  • If anyone wants to do statistical analysis of Pali texts, use the suttacentral/bilara-data repository on GitHub (the content for the Bilara translation webapp).
    • Segmented translations in English are also found there, as well as a growing collection of other languages.
  • For remaining texts, use the html_text directory (html-clean5 branch) of the suttacentral/sc-data repository on GitHub.
  • If you want more precise or specialized information than a regular search engine provides, clone the git repo and search it locally using Sublime text or some other tool.
  • To export texts into a spreadsheet, use Bilara i/o.
  • The main SC search uses Elasticsearch, SC-Voice uses ripgrep, while our translation webapp Bilara uses ArangoDB. All these have advantages and disadvantages, so you may get somewhat different results.
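To make the “clone the repo and search it locally” suggestion concrete, here is a minimal sketch. The file name, segment IDs, and text are invented stand-ins for real bilara-data files (which also carry diacritics, simplified to plain ASCII here):

```shell
# Fake one bilara-data-style segment file; in a real clone the Pali
# root texts live under root/pli/ms/sutta/.
mkdir -p demo/root/pli/ms/sutta
cat > demo/root/pli/ms/sutta/dn1_root-pli-ms.json <<'EOF'
{
  "dn1:1.1.1": "evam me sutam.",
  "dn1:1.1.2": "ekam samayam bhagava antara ca rajagaham"
}
EOF
# grep -rn works everywhere; ripgrep (rg) does the same job faster
# on the full corpus, e.g.: rg "bhagava" root/pli/ms/sutta
grep -rn "bhagava" demo/root/pli/ms/sutta
```

Each hit comes back as file:line:segment, so the segment ID travels with the match.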
7 Likes

This thread might interest you @animitto :slight_smile:

6 Likes

So far I have used the software cst4 (Chaṭṭha Saṅgāyana Tipiṭaka Version 4.0), available on tipitaka.org. It’s not the best of worlds, but it works. Has anyone worked both with cst4 and GitHub and can describe the differences?

What I like about cst4 is that I can work with wildcards/asterisks both for word beginnings and endings, and can search for two terms and define their maximum distance. The search doesn’t cover translations. I don’t think it’s possible to export anything to Excel.

Tools for corpus analysis are getting more and more user friendly by the way.

Check out Orange (open source python software):

There’s also NLTK for python:

https://www.nltk.org/

IMO, the next step of statistical analysis would maybe be to make use of these new tools that are becoming available :slight_smile:

2 Likes

GitHub is just the place where the texts are stored; it’s not an application. Or it is, but not that kind of application.

CST4 or DPR—or for that matter SC—offer an integrated package that facilitates certain kinds of search and analysis. These cover a lot of ordinary use cases and are fine for most people.

However, if you need some kind of specialized analysis not covered by these apps, one approach you could use is to clone the bilara-data repo locally. Then you can search or analyze it with any tool you like. Since it is pure and battle-tested JSON, it is easy to transform into any format, or just treat it as plain text.

For myself, my main tools are Sublime Text, which lets me do rich searching and regular expressions across the whole corpus, or a defined subset; and LibreOffice Calc, where I can import texts via bilara i/o and query or manipulate them in the various ways that a spreadsheet makes possible.

For example, I might want to search for cases where “dhamma” is translated as “thing”. Separate searches for “dhamma” or “thing” would be painful, but using bilara i/o I can do both at once.
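Because root and translation files share segment IDs, this kind of dual search can also be scripted directly. A sketch with invented file contents (bilara i/o itself works differently; this just shows the idea of intersecting the two sides by segment ID):

```shell
# Paired root/translation files keyed by the same segment IDs.
# Contents are made up; one segment per line keeps grep/cut simple.
cat > root-pli.json <<'EOF'
{"mn1:2.1": "sabbadhammamulapariyayam vo, bhikkhave, desessami",
"mn1:3.1": "idha, bhikkhave, assutava puthujjano"}
EOF
cat > translation-en.json <<'EOF'
{"mn1:2.1": "I shall teach you the explanation of the root of all things.",
"mn1:3.1": "Take an unlearned ordinary person"}
EOF
# Grep each side, keep just the segment ID (first quoted field),
# then intersect the two sorted ID lists.
grep 'dhamma' root-pli.json | cut -d'"' -f2 | sort > pli-ids.txt
grep 'thing' translation-en.json | cut -d'"' -f2 | sort > en-ids.txt
comm -12 pli-ids.txt en-ids.txt   # IDs matching on both sides
```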

(Incidentally, this is possible also in the Bilara webapp for translators, and we hope to bring it to SC one day!)

More ambitiously, someone with some basic programming skills can use one of the tools mentioned above by Erik, to which I would add texthero.

More advanced still, neural nets offer new possibilities, as can be seen at Buddhanexus:

(Not to be a party-pooper, but neural nets in their current form have, in my view, over-promised and under-delivered, and we seem to be approaching the diminishing returns phase of their evolution. Still, they may yet make significant contributions to Buddhist studies.)

That’s cool. You can also do this using Sublime Text and regular expressions, but it has more of a learning curve.

5 Likes

Could you please outline what the advantages are of using Sublime Text over CST4? I understand that it can make use of the translations and also generate a result sheet. Maybe you could take a screenshot of a result page?

By the way, here’s a screenshot from CST4:

1 Like

Sorry everyone and @Gillian . I was away. Thank you to @musiko for putting my post into context. Very interesting replies. Thank you all!

2 Likes

Thanks. I used cst4 before. Very interesting tool.

1 Like

Well, for a start, Sublime is cross-platform, so I can actually use it at all, so there’s that!

It’s apples and oranges, but for what it’s worth, I can trivially search for a segment ID and get all the associated information.

Or search for Pali segments that have the phrases “ca” and “nigaṇṭhe”, separated by any number of characters.

Or anything else that a regex can do.
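The “ca … nigaṇṭhe” search is an ordinary regular expression; the same pattern that works in Sublime’s Find in Files works in grep. The sample line below is invented, with diacritics simplified to plain ASCII:

```shell
# Regex: the word "ca", then "nigantha" after any number of
# characters, tried against a made-up Pali line.
line='sace pana tumhe ca nigantha upasankameyyatha'
echo "$line" | grep -E ' ca .*nigantha'
```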

But it has limitations. Searching for multiple words in different segments is not trivial.

But it’s not so much about the specific tools, it’s about the clarity of the underlying data. Our JSON files allow you to convert and process the content of the suttas easily and reliably in ways that are just not possible in an XML-based system like CSCD. Here, try it out. I just ran bilara i/o on DN, and generated a CSV file.

dn.zip (816.3 KB)

Unzip it and open it in a spreadsheet, or import it to a data-table, or open it in a text editor, or run it through an NLP program, or play with it in Airtable, or whatever. The point is, the data is not bound to the application. I’m hoping that, over time, lots of people will take it and use it in fun ways!
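As one illustration of “the data is not bound to the application”: once the export is a CSV, even awk can query it. The column layout below is a guess for illustration, not the actual bilara i/o output format:

```shell
# Invented three-column CSV, roughly the shape a segment export
# might take: segment ID, Pali, English.
cat > dn-sample.csv <<'EOF'
segment_id,pli,en
dn1:1.1.1,evam me sutam,So I have heard.
dn1:1.1.2,ekam samayam,At one time
EOF
# Print the English column for one segment ID.
awk -F, '$1 == "dn1:1.1.2" {print $3}' dn-sample.csv
```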

7 Likes

I think I get a vague idea of the freedom of this approach. Still, I think I speak for many people who are only mildly tech-savvy when I say that it would take time to get into such an application-free data-processing approach. Plus, I know the effect: even if I learned it once, I would probably forget it again if I didn’t use it for a while. I was pretty good with SPSS back in 1997, and in 2003 I had to learn it all over again. Thanks for the examples!

1 Like

Sure. But there are lots of programmers in the world, now they have a new toy to play with!

1 Like

I am quite interested in this topic, discussed a bit here and in a few scattered threads like

and

recently there was

Inspired by this I asked:

and got no on-topic replies, so I thought I would bump this thread. I am going to try to use some of the info gleaned from some of @sujato’s posts in this thread to use

to work out a couple of things that I would have thought would be of quite general interest but which I can’t find any definitive information about, namely;

  1. How many letters long is each of the Nikāyas? This would give some frame to questions of how often a given phrase appears in, say, MN vs AN. Is AN much longer than SN? If so, how much longer? I was surprised that this info is not more easily available.

  2. How many different words are in each of the Nikāyas? I.e. what is the vocabulary of each Nikāya/Vagga/whatever? This is of similar interest, it seems to me, to @Vimala’s word length research.

About 20-odd years ago I was a web developer, hacking together things in ASP and ColdFusion, working with SQL, etc., but that was a long time ago. More recently I got it into my head to try to host a local copy of SuttaCentral and see if I could implement the new search engine idea. Well, just getting Docker and the site to actually run as-is took so much out of me that I have avoided anything technical ever since. But I have licked my wounds enough now and will try to get back to this thread with answers to my two questions and a description of how I did it.

Maybe if other people have pet projects, examples or other threads of interest they could post here and we could get some mutual support going?

Metta

OK, for question one I forked bilara-data, downloaded a zip, extracted the files, and navigated to root/pli/ms/sutta. Then I ran

cat * | wc -w

but SN and AN were in subfolders, so for those I first ran

find . -mindepth 2 -type f -print -exec mv {} . \;
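For what it’s worth, a variant of the same count that avoids moving any files out of their subfolders; the little tree here is a made-up stand-in for the sn/ and an/ layout:

```shell
# Stand-in tree for sn/ and an/, which keep files in subfolders.
mkdir -p demo2/sn/sn1 demo2/an/an1
echo '{"sn1.1:1.1": "evam me sutam"}' > demo2/sn/sn1/sn1.1.json
echo '{"an1.1:1.1": "evam me sutam"}' > demo2/an/an1/an1.1.json
# cat every file found at any depth, then count words in one pass.
find demo2 -type f -name '*.json' -exec cat {} + | wc -w
```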

My results were

DN: 173,906
MN: 294,643
SN: 357,384
AN: 388,889

Hooray!

For question 2 I did the same thing, but with

cat * | sort | uniq -c | wc -l

and got

DN: 16403
MN: 27197
SN: 43464
AN: 41843

Now I have no idea if this is in any way accurate; for starters, I haven’t removed any of the JSON segment numbers, so they will all come up as unique, I guess …
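One wrinkle worth noting: `sort | uniq -c | wc -l` counts unique *lines*, not unique words, on top of the segment-ID problem. A sketch of one way to strip the IDs and split into words before counting, on an invented sample file (real bilara-data values also carry diacritics, which this crude `[:alpha:]` split may handle differently depending on locale):

```shell
# One invented segment file standing in for the corpus.
cat > dn-demo.json <<'EOF'
{"dn1:1.1": "evam me sutam", "dn1:1.2": "evam vutte"}
EOF
# Drop the "id": keys, split on non-letters into one word per line,
# deduplicate, and count the non-empty lines.
sed -E 's/"[a-z0-9.:-]+"[[:space:]]*:[[:space:]]*//g' dn-demo.json \
  | tr -cs '[:alpha:]' '\n' | sort -u | grep -c .
```

Here “evam” appears twice but is counted once, so the pipeline reports 4 distinct words.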

Keep in mind that the concept of letters and words is not easy to map between Pali and English. Perhaps syllables would be more accurate.

For example, “dh” is one letter. Really, “dha” is one letter. I’m not sure what you are trying to prove by counting letters, though.

BTW, you can edit your post to include more information rather than replying to yourself multiple times.

1 Like