Commentaries Database in SuttaCentral

sarana · June 24, 2020, 12:21pm

Let me suggest here that the commentarial scriptures (these could come from VRI or from the Sinhalese, Burmese, and Thai editions, which are most certainly also in the public domain) be added to the basic SuttaCentral navigation tree, the way the CST4 software has them. The commentaries could be added below the fragments, divided into Sutta-Comy, Vinaya-Comy, Abhidhamma-Comy, and Other-Comy. The same could be done for the Sub-Commentaries, or they could be included under the Sutta/Vinaya/Abhidhamma Comy parents. For example:
Sutta-Comy

Digha-Comy
Digha-Sub-Comy
Majjhima-Comy
Majjhima-Sub-Comy
Samyutta-Comy
Samyutta-Sub-Comy
etc.

I am fluent in Pali, English, Sinhalese, and Burmese, so for me this kind of database, in addition to the treasure you have already made with the Pali-Mula, would be immensely helpful. Please consider my suggestion and let me know your thoughts.

Ceisiwr · June 24, 2020, 9:42pm

This would be great to see.

sujato · June 24, 2020, 10:24pm

Dear venerable,

I agree, and we have had some discussions about this.

Currently, the old VRI site at tipitaka.org has a reliable and simple interface for reading the commentarial literature. But there is no way to link from a text to its commentary; you have to navigate to it in each case. I would only be interested in hosting the commentarial texts on SC if we could provide substantially better functionality.

The Pali texts on SC are now in a very clean and well-structured form, which would make the addition of commentaries—and any level of sub-commentaries—trivial in terms of the application. The problem is preparing the data.

I have just spent the past several months working with a team to prepare and clean the Pali Tipitaka texts, so I have a good idea of the complexity of the task. It is doable, but not trivial. How I would go about it is this:

Scrape the XML files from the VRI site.
Match the references inside the XML files with those in our files.
- We already have a program that can do this kind of thing, and I’m assuming it could be adapted.
Once the texts are aligned, match the head-words of the comments against the text in SC’s text root.
- Since the editions are different, the matching would have to be loose and exceptions dealt with on a case by case basis.
Divide the commentary accordingly, so that each chunk of text in the commentary corresponds with a specific text ID in SuttaCentral.
Add the commentarial text to an /atthakatha directory in our bilara-data repo.
Figure out some way of handling HTML for longer entries.
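The "loose" matching of head-words in the steps above could start from something like the following. This is a minimal sketch, not an existing SuttaCentral program: it assumes the root text is available as an ordered list of `(segment_id, text)` pairs, that commentary head-words come in the same order as the root text, and that "loose" means "most of the head-word appears as one contiguous run in the segment". The function names, the 0.75 threshold, and the 50-segment search window are all hypothetical choices.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """NFC-normalize, lowercase, and strip punctuation so different editions compare loosely."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.sub(r"[^\w\s]", "", text)


def coverage(headword: str, segment: str) -> float:
    """Fraction of the headword covered by its longest common run with the segment (0.0 to 1.0)."""
    a, b = normalize(headword), normalize(segment)
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / len(a)


def align(headwords, segments, threshold=0.75, window=50):
    """Map each commentary headword to a root segment ID.

    segments: list of (segment_id, text) in root-text order.
    The search only moves forward, at most `window` segments past the last match.
    Returns (headword, segment_id) pairs; segment_id is None when nothing reaches
    the threshold, which flags the comment for case-by-case handling.
    """
    results = []
    start = 0
    for headword in headwords:
        best_id, best_score, best_pos = None, 0.0, start
        for pos in range(start, min(start + window, len(segments))):
            seg_id, text = segments[pos]
            score = coverage(headword, text)
            if score > best_score:
                best_id, best_score, best_pos = seg_id, score, pos
                if score == 1.0:
                    break  # the whole headword occurs verbatim; stop looking
        if best_score >= threshold:
            results.append((headword, best_id))
            start = best_pos  # later comments may still refer to this segment
        else:
            results.append((headword, None))
    return results
```

For example, the head-word "Evaṃ me sutanti" only partly matches a root segment reading "Evaṃ me sutaṃ." (the quotative ending differs), but still scores 0.8 and is assigned to it. Anything returned as `None` is exactly the kind of exception that would need a human.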

The upside is, the commentarial text would be firmly linked to its root in a consistent pattern that can be consumed by any app. That would automatically make it available for translation if anyone wishes to do so.
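To illustrate what "a consistent pattern that can be consumed by any app" might look like, here is a hedged sketch that turns aligned comments into a flat JSON object keyed by root segment ID, the kind of file that could sit in the proposed /atthakatha directory. The function names and the rule of joining several comments on one segment with a space are assumptions for illustration, not the actual bilara-data format.

```python
import json


def build_commentary_layer(aligned):
    """Group aligned comments under root segment IDs.

    aligned: (segment_id or None, comment_text) pairs in commentary order.
    Several comments on the same segment are joined with a space; comments
    without a segment ID are returned separately for manual handling.
    """
    layer = {}
    unmatched = []
    for seg_id, comment in aligned:
        if seg_id is None:
            unmatched.append(comment)
        elif seg_id in layer:
            layer[seg_id] += " " + comment
        else:
            layer[seg_id] = comment
    return layer, unmatched


def to_json(layer):
    """Serialize as segment ID -> text, keeping Pali diacritics as UTF-8."""
    return json.dumps(layer, ensure_ascii=False, indent=2)
```

Once such a file exists, any app can find the commentary on a root segment with a plain lookup such as `layer.get("dn1:1.1")`, and a translation would simply be another file with the same keys.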

This would work for many cases, especially where you have a simple comment on a root text. However, there are many exceptions; think of things like the Jatakas or Dhammapada, where a “commentary” may consist of many pages of text for a single verse; or the Nidanas, which don’t necessarily relate to a specific root text. We would have to develop a system for handling all such cases.

None of this is trivial, and we don’t have the resources to invest in it. However, if someone wants to take it on, we’d be happy to support it. I would guess it would take a small team of 3–4 people, including experts in both Pali and programming, about 6 months of full-time work.

If we did not segment the texts, but simply matched them sutta-by-sutta rather than passage-by-passage, it would be a lot easier. But it would also be a lot less useful, and I’m not convinced it would be significantly better than what is currently on the VRI site. But again, if someone was willing to do that, I’d be happy to have a look at it, and possibly support it if it looked workable.
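The difference between the two granularities can be shown in a few lines. Assuming segment IDs of the form `dn1:1.2` (sutta ID, colon, position within the sutta), a sutta-level version is just a segment-level layer collapsed per sutta, which makes visible what is lost: where inside the sutta each comment applies. The helper names are hypothetical.

```python
def sutta_id(segment_id: str) -> str:
    """'dn1:1.2' -> 'dn1': keep only the part before the colon."""
    return segment_id.split(":", 1)[0]


def by_sutta(layer):
    """Merge a segment-level commentary layer into one block per sutta, preserving order."""
    merged = {}
    for seg_id, comment in layer.items():
        key = sutta_id(seg_id)
        if key in merged:
            merged[key] += " " + comment
        else:
            merged[key] = comment
    return merged
```

Going from segments to suttas is trivial; going back the other way is the hard alignment work described above.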

sarana · June 25, 2020, 1:29am

This is wonderful, sadhu, sadhu, sadhu!

Now, what I am going to write you certainly know already, but still, it’s better to say it than not say it and miss something… Unfortunately, I am a new user here, so I cannot post more than one screenshot or more than two links. So please see the full message with links here: message to Ajahn Sujato - JustPaste.it. I have numbered the points to reduce any potential confusion.

Do you know DPR, the Digital Pali Reader by Ven. Yuttadhammo? (link) In the Dictionary tab, there is an “Atthakatha Dictionary”. Is that something you would like to do? I don’t know how he did it, but it lets you search for terms within the Atthakatha texts without any references. It simply “explains” any term I type in with the Atthakatha’s explanation. (3 links to screenshots)

I found this particularly helpful.

Just a small note for you: a huge amount of commentarial text on the Mula Pali also appears in AN 1, in the Etadaggavagga.
As for the Nidanas, I think they are commentaries on the Aṭṭhakavagga in the Suttanipata. (But no guarantee.) Oh, that’s the Niddesas. I am sorry if I misunderstood. If you mean the Mahaniddesa and Culaniddesa, they are basically commentaries on the Aṭṭhakavagga. As for nidanas in the sense of the Pali introductions embedded in almost every Pali commentary book, I think you do not have to join them with suttas. They stand well on their own, as long as you also keep the Commentaries available as standalone texts (as you have done with the Mula Nikayas).
I personally use (link), perhaps even more often than the average person uses his or her phone. I find this software the most useful for scholars like me, who constantly need to research, debate, and answer controversial Pali questions. CST4 works by paragraphs. For example, you open MN MulapannasaPali, go to paragraph 121, and there are two buttons in the window: Atthakatha and Tika. If you click Atthakatha, a new window pops up and shows you the Atthakatha to MN Mulapannasa right at paragraph 121. Clicking Tika pops up a window with the Tika right at paragraph 121. (There is no paragraph 121 in the Commentary and Tika, so it shows the closest paragraph, i.e., 119.) See some screenshots here: (3 links to screenshots)
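The paragraph-based linking described above is simple to reproduce. We don't know CST4's exact rule for a missing paragraph; the example (121 falls back to 119) is consistent with "take the nearest paragraph at or before the requested one", and that is what this sketch assumes. It also assumes the commentary is available as a mapping from paragraph number to text; both helper names are hypothetical.

```python
from bisect import bisect_right


def nearest_paragraph(target, available):
    """Largest paragraph number <= target in the sorted list `available`, or None."""
    i = bisect_right(available, target)
    return available[i - 1] if i else None


def linked_passage(paragraph, commentary):
    """commentary: dict of paragraph number -> text. Returns (paragraph, text) or None."""
    para = nearest_paragraph(paragraph, sorted(commentary))
    return None if para is None else (para, commentary[para])
```

With commentary paragraphs `[100, 112, 119, 130]`, asking for paragraph 121 returns 119, matching the behaviour in the screenshots. The binary search keeps each lookup fast even for a whole Nikaya.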

I cannot count the hundreds of hours I have saved thanks to this superfast linking. You may like to consider this way of referencing; I believe it is simple and very powerful.

I think I was not clear with my “suggestion.” What I would really wish to find in SuttaCentral (and I am sure thousands of scholars would too) is a database of the Commentaries (and Sub-Commentaries) with translations. The ready-made translations are what make SuttaCentral so valuable for me, because when I speak with Czech, Burmese, English, or Pali students or scholars, lay or monastic, I need to give them translations, and the sooner the better. Sometimes I do my own research and cannot be sure about the original Pali, so I look up the translation in Burmese or Sinhalese. But I need to open translations in different PDFs etc.; I am sure you know what I mean. So the main point of my suggestion was not to include “Commentaries” but rather “Commentaries and their translations.”

I have also seen some posts here in the forum where people ask for translations of Commentaries. I think if SuttaCentral had them in the database it would be just sooooo sweet.

I have 5 workers who work full-time on my large Dhamma projects. (All arranged according to the strictest Vinaya rules.) I cannot promise anything, but I would love to know how we could help and possibly join your amazing Dhamma endeavors.

Gabriel_L · June 25, 2020, 1:37am

sadhu bhante, out of curiosity, are your workers bhikkhus?

sarana · June 25, 2020, 1:42am

No no, Burmese gentlemen.

Gabriel_L · June 25, 2020, 1:44am

That’s nice. I am always glad to hear of collaboration between the parisa and the sangha towards preservation of the Dhamma-vinaya.

sujato · June 25, 2020, 2:16am

I’ve raised your user rights, so you should be fine in future.

Of course! Our Pali texts come from him.

I’d have to look at it in more detail. The scopes of the projects are a little different, as he focusses more specifically on the Pali. I am not sure how portable such a feature would be.

I was basically just thinking of anything that falls outside the normal sutta structure. For example, DN atthakatha begins with Ganthārambhakathā, which on VRI has a different URL than the first sutta.

To understand, SuttaCentral’s structure is built from the idea of a “sutta”. We have text, translation, and parallels, all pointing to a “sutta”. Now, if we’re talking about say DN or MN, it is obvious what a “sutta” means. However in other cases it is not so clear. For example, are we to count each item in a peyyala series as a “sutta”? How to represent “suttas” that do not actually exist? What of Vinaya rules? Is a rule a “sutta”? Again, sometimes a “rule” is as little as a single character in a Chinese text: is that a “sutta”? And what of something like the Dhammapada? Is that a “sutta” or is each verse a “sutta”? Each of these contexts, and many more, must be answered by making some decision. And it is always complicated because the different sources we rely on make different decisions.

So if we were to introduce the commentaries, it would add another layer of complexity to this. Not saying it can’t be done!

Okay, great. Well, that’s pretty good, and it may well be something we could build upon. It’s been ages since I used the CST4 desktop app.

It would be great, and once the texts are divided into segments, it is definitely doable. Translation works via a simple web app called Bilara, which we have been building for the last couple of years and which is now pretty much ready for use. It would simply be a matter of coordinating a team of volunteers to translate, one sentence at a time.
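Because a segment-by-segment translation is just another set of strings keyed by the same segment IDs as the root text, coordinating volunteers can start from a list of segments nobody has done yet. This is a minimal sketch, assuming root and translation are plain dicts of segment ID to text; it is not Bilara's actual code, and the function names are hypothetical.

```python
def untranslated(root, translation):
    """Segment IDs from the root text that still lack a non-empty translation, in root order."""
    return [seg_id for seg_id in root if not translation.get(seg_id, "").strip()]


def progress(root, translation):
    """Fraction of root segments that have been translated (0.0 to 1.0)."""
    if not root:
        return 0.0
    return 1 - len(untranslated(root, translation)) / len(root)
```

Whitespace-only entries count as untranslated, so a half-started segment still shows up in the to-do list.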

That’s awesome, thanks!

One thing I would really love is to have modern, idiomatic Burmese translations. Is that something your team would be interested to do?

But if you are really interested in working with the commentaries, then we can discuss it. Is there someone on your team with programming expertise?

sarana · June 25, 2020, 5:24am

Thank you for raising my rights.

Yes, indeed, “SuttaCentral” indicates that it is about suttas. Did you consider taking “Dhammakkhandha” as “sutta”? If so, you could take every paragraph in the Pali texts as one Dhammakkhandha (I suppose that’s the way the Burmese understand it) and reference according to paragraphs. I have actually seen that your sutta references do point to paragraphs in suttas, so it seems you already take it this way. I think you can resolve this problem the way you have done with the other translations of suttas. I see you link between Chinese, ancient, and various other translations and texts. Perhaps it would be possible to do it the same way with the Commentaries as well?

I see you show ancient fragments and their related texts separately in SuttaCentral’s main Tipitaka tree. I think you could take the commentarial portions that are not directly connected to a sutta and either keep them without a reference (for now) or give them a probable reference, in the way I think it is done with some of the fragments. I think references and connections can be gradually improved as time goes on…

Translations… yes, and Burmese translations… I think it would be easy to find people to add the content (especially Burmese translations) once you have the commentaries system ready. The Burmese literature of commentarial translations is vast and incredibly accurate. I also have the Sinhala Atuwa, the Sinhalese Commentaries, which also help a lot. Many (although not all) Sinhalese monks particularly appreciate them; even President Rajapaksha supported this modern edition. But as you might already know (or have presumed), none of it has been typed yet. It’s all just image PDFs, not searchable. I don’t have good connections with many Sinhalese people, but for Burmese there is real hope.

I am not a native English speaker, so I may sometimes misunderstand; please be patient with me. What do you mean by “idiomatic translation”?

None of my workers have programming skills, but I have some supporters who might be enthusiastic to work on this. I have seen ven. Subhuti contributing comments in this forum, he is actually a “professional programmer.” Let me know more details and we’ll get to work!

sujato · June 28, 2020, 2:53am

I’ve replied mostly in our private thread, so I’ll just say a few things here:

It can, but you should know, we have two quite separate systems working side-by-side on SuttaCentral.

The older system, which we call the “legacy HTML” texts, has texts (whether root texts or translations) in conventional HTML files, with internal paragraph links. The problem with this system is that all the content (text, markup, references, variants, notes, metadata …) is mixed up in one file, so it gets really messy and does not scale.

The new system, which all future development is based on, keeps all the text data as JSON objects, allowing us to cleanly separate all the different kinds of content. This is much more powerful, but it comes at a cost: it is a lot more work to prepare the files properly. How much work depends on the sources.
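To make the contrast with the legacy HTML files concrete: in a segmented system each kind of content lives in its own JSON object keyed by the same segment IDs, and an app only combines them at display time. The sketch below assumes a markup layer whose entries each contain a `{}` placeholder for the text; that layout, and the `render` function itself, are illustrative assumptions rather than the exact bilara-data format or SuttaCentral's code.

```python
from html import escape


def render(markup, root, translation=None):
    """Assemble HTML from separate layers that share segment IDs.

    markup: segment ID -> template containing one '{}' placeholder.
    Uses the translation for a segment when it is present and non-empty,
    otherwise falls back to the root text.
    """
    parts = []
    for seg_id, template in markup.items():
        text = (translation or {}).get(seg_id) or root.get(seg_id, "")
        parts.append(template.replace("{}", escape(text), 1))
    return "".join(parts)
```

A commentary layer would slot in the same way, as one more object keyed by segment ID, without touching the root text, the markup, or any existing translation.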

Anyhow, my point is, we can’t directly compare the legacy files on SC with any work we would do with the commentaries. It would be quite a new kind of project.

The starting point would be to look at the source code of the existing applications and see how they are linking the text and commentaries. Then we can think about how that might be implemented at SC.