Suggestion: Automated Submissions For Sutta Translations

I’ve gained a lot by being able to view alternate translations on SuttaCentral.net.

I remember reading that not every alternate translation that could be on SuttaCentral.net is actually there. The process for entering suttas is manual, and it is simply too much work.

How about automating that process?

I know that would be an amazing amount of work.

Perhaps it could be started with a standard sutta translation submission form on SuttaCentral.net, that a person would gain access to through permission, and where submissions would be held for approval.
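To make the idea concrete, here is a minimal sketch of the "permission, then held for approval" flow. Everything in it (the `SubmissionQueue` class, its method names, the in-memory storage) is an assumption for illustration, not anything SuttaCentral actually has.

```javascript
// Hypothetical in-memory model of a gated submission form.
// All names here are assumptions made for illustration only.
class SubmissionQueue {
  constructor(approvedSubmitters) {
    this.approvedSubmitters = new Set(approvedSubmitters);
    this.pending = [];   // submissions held for approval
    this.published = []; // submissions a reviewer has approved
  }

  // Only users who were granted permission may submit.
  submit(user, translation) {
    if (!this.approvedSubmitters.has(user)) {
      throw new Error(`${user} is not permitted to submit`);
    }
    this.pending.push({ user, translation });
  }

  // A reviewer moves one pending submission to the published list.
  approve(index) {
    const [item] = this.pending.splice(index, 1);
    if (item) this.published.push(item);
    return item;
  }
}

const queue = new SubmissionQueue(["translator-a"]);
queue.submit("translator-a", { sutta: "example-1", text: "Example text" });
queue.approve(0);
console.log(queue.published.length, queue.pending.length);
```

Nothing becomes public until a reviewer calls `approve`, which is the part that would still need a human.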

After that, perhaps a program could be made for webmasters of other translation sites, which they could tweak to automatically submit what they already have.

Perhaps before any of that work started the owners of the major translation sites could have a “council” about what their needs would be with automated translation submissions.


For automating anything, I recommend doing the task manually for a while and then slowly automating the parts of the process that make sense to automate.


So what, exactly, are the major translation sites you are talking about? The only other site I know of that could be called “major” in terms of volume is dhammatalks.org. If we expand the definition, then ancient-buddhist-texts.net would be included. And if we really stretched you might include suttafriends.org.

I don’t see any incentive for those three sites to publish their translations in two places. It’s a headache they don’t need. Especially when it comes to making any corrections or updates. They would have to do it in two places.

Also, for folks not aware, there are two kinds of translations on SuttaCentral. For the user the difference is kind of hidden. One is the “legacy” translations. These are hard-coded HTML. They do not have the ability to show the Pali line by line alongside the translation. The license on these translations varies by author.

The second type is the “Bilara” translations that are done using the Bilara software. This facilitates the creation of translations in a segmented fashion, allowing them to be (at the will of the end user) displayed line by line with the Pali. The license on these is CC0, similar to public domain.
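As a rough sketch of what "segmented" buys you: if the root text and the translation are both stored as maps keyed by the same segment ID, showing them line by line is just a lookup. The IDs, texts, and function names below are made up for illustration, and the exact file layout Bilara uses is assumed rather than quoted.

```javascript
// Hypothetical segment data: root text and translation share segment IDs.
// These IDs and strings are placeholders, not real SuttaCentral data.
const rootSegments = {
  "ex1:1.1": "Pali line one",
  "ex1:1.2": "Pali line two",
};
const translationSegments = {
  "ex1:1.1": "English line one",
  "ex1:1.2": "English line two",
};

// Pair each root segment with its translation by shared ID.
function alignSegments(root, translation) {
  return Object.keys(root).map((id) => ({
    id,
    root: root[id],
    translation: translation[id] ?? "", // untranslated segments stay empty
  }));
}

// Render the translation, optionally with the root text above each line.
function render(aligned, showRoot) {
  return aligned
    .map((seg) => (showRoot ? `${seg.root}\n${seg.translation}` : seg.translation))
    .join("\n");
}

console.log(render(alignSegments(rootSegments, translationSegments), true));
```

A legacy HTML page has no such shared keys, which is why it cannot be toggled this way.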

I find that Bhante Sujato is really open to suggestions on how to make SC better. But I also feel people need to respect when he has made a policy decision on how the site will run. I wouldn’t want him to have to spend time defending that decision when it could be spent doing productive things on the site. I don’t think it has ever been the mission of SC to provide all translations available. So your proposal would mean a change of that mission.

I believe (and I could be wrong) that if any of those three sites wanted to add their translations using the Bilara system they would be welcome to. That seems reasonable to me. But it would require them to change their license.

I think you know about the Citation Helper tool I created. Using that will cover most of the translations available free online. Since I created it, I have been watching the RSS feeds of the sites I mentioned above, and no new translations have been added to them. I have also written up instructions for using a browser plugin so you can select a citation on a web page and instantly open it in the Citation Helper, showing all the translations.


I did not know of the Citation Helper tool, I will check it out, thank you.


Oh, great. I find it extremely useful, but then again, I built it (with much help from @Khemarato.bhikkhu). I also wrote a bookmarklet to scrape the URLs from the results and create the text you see at the bottom of the daily sutta emails I put out. If you are interested, it’s this:

code
javascript: (() => {
  /* Collect every result link shown by the Citation Helper. */
  const currentCitations = document.getElementsByClassName("url-button-link");

  let numberOfItems = currentCitations.length;
  let textLinks = [];
  let audioLinks = [];
  let textSites = ["SC", "SF", "DT", "ABT"];
  let audioSites = ["PA", "SCV"];

  /* Sort the links into text sites and audio sites by their "site" attribute. */
  for (let i = 0; i < numberOfItems; i++) {
    const thisCitation = currentCitations[i].attributes.site.value;
    const thisUrl = currentCitations[i].href;
    if (textSites.includes(thisCitation)) {
      textLinks.push([thisCitation, thisUrl]);
    } else if (audioSites.includes(thisCitation)) {
      audioLinks.push([thisCitation, thisUrl]);
    }
  }

  /* Build "A, B or C." as HTML links for the text sites. */
  function createTextSiteText(textLinks) {
    let numberOfItems = textLinks.length;
    let output = "";
    for (let i = 0; i < numberOfItems; i++) {
      let prettySiteUrl = "";
      let endJoiner = "";
      let currentSiteName = textLinks[i][0];
      let currentSiteUrl = textLinks[i][1];

      switch (currentSiteName) {
        case "SC":
          prettySiteUrl = "SuttaCentral.net";
          break;
        case "SF":
          prettySiteUrl = "SuttaFriends.org";
          break;
        case "DT":
          prettySiteUrl = "DhammaTalks.org";
          break;
        case "ABT":
          prettySiteUrl = "Ancient-Buddhist-Texts.net";
          break;
        default:
          prettySiteUrl = "Something went wrong";
      }

      /* Last item ends the sentence, second to last gets " or ", the rest get commas. */
      if (i === numberOfItems - 1) {
        endJoiner = ".";
      } else if (i === numberOfItems - 2) {
        endJoiner = " or ";
      } else {
        endJoiner = ", ";
      }
      output += `<a href="${currentSiteUrl}" rel="noreferrer" target="_blank">${prettySiteUrl}</a>${endJoiner}`;
    }
    return output;
  }

  /* Same as above, for the audio sites. */
  function createAudioSiteText(audioLinks) {
    let numberOfItems = audioLinks.length;
    let output = "";
    for (let i = 0; i < numberOfItems; i++) {
      let prettySiteUrl = "";
      let endJoiner = "";
      let currentSiteName = audioLinks[i][0];
      let currentSiteUrl = audioLinks[i][1];

      switch (currentSiteName) {
        case "PA":
          prettySiteUrl = "PaliAudio.com";
          break;
        case "SCV":
          prettySiteUrl = "Voice.SuttaCentral.net";
          break;
        default:
          prettySiteUrl = "Something went wrong";
      }

      if (i === numberOfItems - 1) {
        endJoiner = ".";
      } else if (i === numberOfItems - 2) {
        endJoiner = " or ";
      } else {
        endJoiner = ", ";
      }

      output += `<a href="${currentSiteUrl}" rel="noreferrer" target="_blank">${prettySiteUrl}</a>${endJoiner}`;
    }
    return output;
  }

  /* Assemble the email footer and copy it to the clipboard. */
  let finalOutput = `Or read a different translation on ${createTextSiteText(textLinks)} Or <i>listen</i> on ${createAudioSiteText(audioLinks)}`;
  navigator.clipboard.writeText(finalOutput);
})();

There is one issue, though. The numbering on DhammaTalks.org is, for some suttas in SN and AN, off by a little bit. I haven’t decided on a way to fix that. If you have any ideas or suggestions, please discuss it in the citation helper thread. And of course the limitation for my tool is that it has to be updated when new translations on those sites come out. But that’s a fairly rare occurrence.
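One possible approach to the numbering mismatch, offered only as a sketch: keep a small override table for the suttas where DhammaTalks.org numbers differ, and fall back to the original citation everywhere else. The table name, the function name, and the entry below are hypothetical placeholders, not real numbering data.

```javascript
// Hypothetical override table: citation as entered -> number DhammaTalks uses.
// "SN 99.10" is a deliberately fake entry; real offsets would have to be
// collected by hand.
const dtOverrides = new Map([
  ["SN 99.10", "SN 99.11"],
]);

// Return the citation to use for DhammaTalks, falling back to the input.
function dtCitation(citation) {
  return dtOverrides.get(citation) ?? citation;
}

console.log(dtCitation("SN 99.10"));
console.log(dtCitation("MN 1"));
```

The trade-off is the same one mentioned above: the table has to be maintained by hand whenever the site changes.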

It is integrated (kind of) with the name lookup tool I built, which you may also find helpful.


That’s a great tool, and I think that, rather than adding copies of suttas, etc., to SC, it would be much more efficient to have more tools like that either on SC or linked from SC. There are similar issues for navigation. I’d love to have some friendlier tables of contents, particularly for the Vinaya, but, again, that could be handled by separate pages, rather than messing with the SC structure, which has been constructed for a particular task.


Thanks everyone for the contributions.

Just briefly, let me add a few words on the original topic, a conversion tool for suttas.

We have to distinguish two use cases: a general-purpose tool (X to sc) or a specific tool. The former is effectively impossible, since the source formats are too diverse. Moreover, it doesn’t really help anything. If you know what you’re doing, it is possible to convert a text the size of a Nikaya in a couple of hours.

The second option is more doable, if you have a tightly defined set of source files. I have thought of doing something like this, not for translations, but for Chinese source texts: cbeta to sc. Still, even in this case it would be a lot of work, and you’d probably be better off with a text editor and some regex.
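For a sense of what the "some regex" route looks like, here is a minimal sketch. It assumes a made-up source format (one paragraph per line, each starting with a number and a period) and a made-up target of `uid:paragraph.sentence` keys; a real source would need its own patterns.

```javascript
// Hypothetical source text: one numbered paragraph per line.
const source = `1. First paragraph. Second sentence.
2. Another paragraph.`;

// Convert the assumed format into a flat map of segment ID -> text.
function toSegments(text, uid) {
  const segments = {};
  for (const line of text.split("\n")) {
    const match = line.match(/^(\d+)\.\s+(.*)$/);
    if (!match) continue; // skip lines that don't fit the assumed format
    const [, para, body] = match;
    // Split after sentence-ending punctuation that is followed by whitespace.
    const sentences = body.split(/(?<=[.!?])\s+/);
    sentences.forEach((sentence, i) => {
      segments[`${uid}:${para}.${i + 1}`] = sentence;
    });
  }
  return segments;
}

console.log(toSegments(source, "ex1"));
```

The low-level part really is this short. Abbreviations, verse, headings, and footnotes are where the hand-tuning comes in.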

The ultimate problem is about abstraction. In file conversion, low level abstraction is trivial, but high level abstraction is really hard.

My NLP professor in Uni compared it to stone soup: no magic, you just add the recipes one at a time.

Granted, that was before GPT-3… :thinking:

Right, and the appropriate fancy tool might get the job done. But you still end up with the same problem. To set it up, run the tests, and evaluate the results, it’d require a specialized engineer and take a good deal of time. Meanwhile, an amateur armed with regex and Sublime Text could have prepared a half-dozen Pitakas.

I actually had this conversation a few years ago with a high-level engineer. He was a Vietnamese Buddhist working in Germany, and I approached him to discuss this problem. He immediately grasped the issue and said it’s not worth trying to engineer a solution. :person_shrugging:
