SQL database of the Pali Canon

I’ve been working on a relational SQL database for the Pali Canon after discovering that there wasn’t an easily accessible one online! Here is the link to it:

Pali Canon db

I hope some folks find it helpful. If you have feedback, suggestions for improvement, or run into any issues, please let me know. Instructions for downloading and using it are in the link above. To summarize, I’ve posted a ~500 MB SQLite database file that contains a SQL representation of the Pali Canon, including all its texts and metadata.

This database is entirely sourced from SuttaCentral’s GitHub repository. I’m grateful for their vast and organized library of the Pali Canon which made this project possible.

I had initially developed this database with the hope of creating a mobile app for the Pali Canon after noticing that one as comprehensive as the SuttaCentral web app did not exist in the App Store. I hope to get to working on this soon, but in the meantime I thought someone out there might make use of this SQL database.

Why might a SQL database be useful?

SQL makes complex searches much easier. For instance, this query finds every English translation whose text contains the word “impermanence”:

SELECT
    tr.translation_id,
    tr.uid,
    ti.parent_uid,
    ti.basket,
    ti.translated_title,
    a.author,
    tr.text_content
FROM
    Translation AS tr
LEFT JOIN TextInfo AS ti
    ON tr.uid = ti.uid
LEFT JOIN Author AS a
    ON tr.author_uid = a.author_uid
WHERE
    tr.text_content LIKE '%impermanence%'
    AND tr.lang = 'en';

Yes, the search bar on suttacentral.net can already do this, but once you start querying you can combine filters, joins, and aggregates in ways that are hard to express through a web interface. A SQL database can also make development and scholarly research easier.
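As a rough sketch of a query that goes a step beyond a search bar, the snippet below counts matching translations per basket and author, using Python’s built-in sqlite3 module. It runs against a tiny in-memory database so it is self-contained. The rows are made-up placeholders, and the tables contain only the columns that appear in the query above, so the real schema may well differ. The helper names `build_toy_db` and `count_matches` are hypothetical.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory stand-in for the real database.

    Assumption: only the columns used in the example query are modelled here;
    the actual tables may have more columns or different types.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Author (author_uid TEXT PRIMARY KEY, author TEXT);
        CREATE TABLE TextInfo (
            uid TEXT PRIMARY KEY, parent_uid TEXT, basket TEXT, translated_title TEXT
        );
        CREATE TABLE Translation (
            translation_id INTEGER PRIMARY KEY,
            uid TEXT, author_uid TEXT, lang TEXT, text_content TEXT
        );
        """
    )
    # Placeholder rows for illustration only; none of this is real canon data.
    conn.executemany(
        "INSERT INTO Author VALUES (?, ?)",
        [("a1", "Translator One"), ("a2", "Translator Two")],
    )
    conn.executemany(
        "INSERT INTO TextInfo VALUES (?, ?, ?, ?)",
        [
            ("t1", "p1", "sutta", "Placeholder Text One"),
            ("t2", "p1", "sutta", "Placeholder Text Two"),
            ("t3", "p2", "vinaya", "Placeholder Text Three"),
        ],
    )
    conn.executemany(
        "INSERT INTO Translation (uid, author_uid, lang, text_content) "
        "VALUES (?, ?, ?, ?)",
        [
            ("t1", "a1", "en", "a line about impermanence"),
            ("t2", "a1", "en", "another line about impermanence"),
            ("t2", "a2", "en", "a line about something else"),
            ("t3", "a2", "en", "impermanence again"),
            ("t3", "a2", "de", "impermanence in a non-English row"),
        ],
    )
    return conn


def count_matches(conn, term, lang="en"):
    """Count translations containing `term`, grouped by basket and author."""
    query = """
        SELECT ti.basket, a.author, COUNT(*) AS n_matches
        FROM Translation AS tr
        LEFT JOIN TextInfo AS ti ON tr.uid = ti.uid
        LEFT JOIN Author AS a ON tr.author_uid = a.author_uid
        WHERE tr.text_content LIKE ? AND tr.lang = ?
        GROUP BY ti.basket, a.author
        ORDER BY n_matches DESC, ti.basket, a.author
    """
    # Bound parameters avoid quoting problems; note that % or _ inside
    # `term` would still act as LIKE wildcards.
    return conn.execute(query, (f"%{term}%", lang)).fetchall()


if __name__ == "__main__":
    for basket, author, n in count_matches(build_toy_db(), "impermanence"):
        print(basket, author, n)
```

To try this on the downloaded file instead, you would replace `build_toy_db()` with `sqlite3.connect(...)` pointed at wherever you saved the database, and adjust column names if they differ from the ones assumed here.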

Please let me know of any criticisms and suggestions!


Any chance you’d want to figure out a way to include Thanissaro’s translations, and perhaps Ven. Anigha’s? I already have them in a Postgres db and have the scraping/adding process (mostly) automated. But the format is very different. Happy to chat if that’s something you might be interested in. It would be the first to have all the up-to-date translations. I think it would be a great open source project. I’d be happy to help maintain it.

I’m not sure if you are familiar with these projects:

and
https://simsapa.github.io/

Feels like teaming up might help everyone. No idea if your projects are compatible, though.


Funnily enough, the first web incarnation of SuttaCentral was a LAMP application. We began replacing it in 2012 with what is now legacy:

There were no actual texts, just links, and the UI was unstyled tables. However, the DB had all the parallels that were previously stored in Word documents.

Fun times!


Yes, my goal is to include all translations out there (if licensing permits)! If you can let me know how/where you sourced Thanissaro and Ven. Anigha’s translations I’ll take a look at how to incorporate these translations in the pipeline.

You’re also more than welcome to contribute if you’d like to add these translations instead (simply fork the repo, create a new branch, and open a new pull request once you’re ready!). If you have any questions regarding the code, ask away.

Interesting! Is the codebase for that LAMP application available anywhere? I’m curious what the SQL database looked like.

Ahh, I love a bit of software archaeology. Here’s the repository for the old code:

Which was replaced by this:

Happy hacking!
