Dhamma transmission and the ellipsis

karl_lew · August 21, 2018, 2:10pm

And here I thought it was an easy one.

There are some considerations here.

The four Arahant sections appear to differ only in the exposition of dukkha (understood, greed, hate, delusion). Hearing and reciting these at length will be quite the experience.
There are two Perfected One sections that differ only in the exposition of realization (understood…, awakened…). Hearing and reciting these at length will also be quite the experience.
The mechanical expansion of MN1 itself still seems straightforward, even though I grant that reciting and listening to it is most certainly arduous at two hours in length. Did you encounter any ambiguity of meaning in the exhaustingly long two-hour version?
I now see the need to break up a two-hour sutta into chapters and sections for navigable listening, where a chapter is a group of semantically related expanded blocks with “surrounding content”. This jolts my mind into thinking about a standoff JSON markup for expansion with chapters (e.g., A mendicant who is perfected) and sections (e.g., one of the four Arahant expansion groups).

When you mentioned two hours, I was actually thrilled. Perhaps that makes me a bit of a masochist, but what I saw was the possibility of playing a single section repeatedly (e.g., Arahant/greed) to use as “sandpaper” for smoothing out a rough edge in our practice.

Thank you for the inspiration.

Are the transcripts for Pali Audio available on site? I would very much like to see them since they would inform any automated expansion effort.

karl_lew · August 21, 2018, 6:07pm

Might you by any chance have expanded MN1?

sujato · August 23, 2018, 1:19am

This is sounding fantastic; you’ve made incredible progress. By the way, I came across this article the other day; perhaps it might be useful:

Meanwhile, when you have a shareable service, let us know, we’d love to try it out! Just so you know, we are about to start a new round of upgrades to SC with our friends at STXnext. This won’t directly impact you, but it does mean that we have the opportunity to make any UI changes that would be helpful. So perhaps at some point we could sit down and talk this through. It will be at least a month, probably two or three, before we’re ready to do this, though.

Finally, if we are talking seriously about expanding texts by hand, the critical thing from our perspective is that we agree on an unambiguous way of expressing this in the data.

Currently we have something like dn1:1.1, where the number(s) before the colon represent the sutta number, and those after the colon represent the section and segment. Note that there is no explicit way of notating the segment as opposed to the section; perhaps we should do this. But currently, the final number represents the segment.

Adding additional text should be at the segment level. So we could technically just add another level, say dn1:1.1.1, dn1:1.1.2 and so on. But I think this would be a little confusing. And I’m wondering whether there might not be cases where the added segments are not mere subdivisions of the original segment. I’m not sure exactly what this might be, but I just have an inkling in my bones that we need something more.

Currently the SC standard is to use the CSS class .expanded to notate such cases.

https://suttacentral.net/zz1/zz/test

Perhaps we could introduce an x to indicate expanded text? Something like:

dn1:1.1 Thus have I heard
dn1:1.2 At one time …
dn1:1.2.x1 At one time the Buddha was staying at Savatthi, Jeta's Grove, Anathapindika's monastery.
dn1:1.3 Then he said:
dn1:1.3.x1 There the Buddha addressed the mendicants:
dn1:1.3.x2 "Mendicants!"
dn1:1.3.x3 "Yes, sir", replied the mendicants.
dn1:1.3.x4 "Listen up," said the Buddha.
dn1:1.3.x5 "Always," said the mendicants.
dn1:1.3.x6 The Buddha said this:
dn1:1.4 "Mendicants, I exhort you: ensure your data formats are explicit and unambiguous, lest you will suffer later on!"
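To see whether the x proposal is unambiguous in practice, here is a minimal JavaScript sketch (not SuttaCentral code) that parses ids such as dn1:1.3.x2 and sorts them so that each expanded segment follows its root segment. The function names are hypothetical, and the regular expression is an assumption that only covers simple sutta ids like dn1 or mn1.

```javascript
// Hypothetical parser for segment ids of the form "<sutta>:<section>.<segment>",
// optionally followed by ".x<n>" for the n-th expanded segment, e.g. "dn1:1.3.x2".
// Only simple sutta ids such as "dn1" or "mn1" are handled here.
function parseSegmentId(id) {
  const m = /^([a-z]+\d+):(\d+)\.(\d+)(?:\.x(\d+))?$/.exec(id);
  if (!m) throw new Error(`Unrecognized segment id: ${id}`);
  return {
    sutta: m[1],
    section: Number(m[2]),
    segment: Number(m[3]),
    expansion: m[4] === undefined ? null : Number(m[4]), // null for root segments
  };
}

// Order segments of one sutta numerically: a root segment sorts before its
// expansions, which sort as x1, x2, ... (compared as numbers, so x10 follows x9).
function compareSegmentIds(a, b) {
  const pa = parseSegmentId(a);
  const pb = parseSegmentId(b);
  return (
    pa.section - pb.section ||
    pa.segment - pb.segment ||
    (pa.expansion ?? 0) - (pb.expansion ?? 0)
  );
}

const ids = ["dn1:1.3.x2", "dn1:1.4", "dn1:1.3", "dn1:1.3.x1", "dn1:1.2.x1"];
console.log([...ids].sort(compareSegmentIds).join(" "));
// prints: dn1:1.2.x1 dn1:1.3 dn1:1.3.x1 dn1:1.3.x2 dn1:1.4
```

Comparing the parts as numbers rather than as strings matters as soon as a section reaches ten segments, since "dn1:1.10" sorts before "dn1:1.9" as text.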

So long as there is an explicit data markup, we can keep the expansions in a separate file and call them as needed.
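Keeping expansions in their own file only requires that each expanded id name its root segment. Here is a minimal JavaScript sketch, assuming both files are plain id-to-text objects and that the root file lists its segments in reading order; mergeExpansions and the file shapes are hypothetical, not an SC API.

```javascript
// Hypothetical root file: the text as transmitted, ellipses and all.
const root = {
  "dn1:1.1": "Thus have I heard",
  "dn1:1.2": "At one time …",
  "dn1:1.3": "Then he said:",
};
// Hypothetical expansion file: only "<parent>.x<n>" segments, in any order.
const expansions = {
  "dn1:1.3.x2": '"Mendicants!"',
  "dn1:1.2.x1": "At one time the Buddha was staying at Savatthi, Jeta's Grove, Anathapindika's monastery.",
  "dn1:1.3.x1": "There the Buddha addressed the mendicants:",
};

// Interleave expansions after their root segment. Each item is flagged so a
// renderer could, for example, apply the .expanded CSS class to it.
function mergeExpansions(root, expansions) {
  const byParent = new Map();
  for (const [id, text] of Object.entries(expansions)) {
    const m = /^(.*)\.x(\d+)$/.exec(id);
    if (!m) throw new Error(`Not an expansion id: ${id}`);
    if (!(m[1] in root)) throw new Error(`Expansion without root segment: ${id}`);
    if (!byParent.has(m[1])) byParent.set(m[1], []);
    byParent.get(m[1]).push({ id, n: Number(m[2]), text });
  }
  const merged = [];
  for (const [id, text] of Object.entries(root)) {
    merged.push({ id, text, expanded: false });
    const children = (byParent.get(id) || []).sort((a, b) => a.n - b.n);
    for (const c of children) merged.push({ id: c.id, text: c.text, expanded: true });
  }
  return merged;
}

console.log(mergeExpansions(root, expansions).map((s) => s.id).join(" "));
// prints: dn1:1.1 dn1:1.2 dn1:1.2.x1 dn1:1.3 dn1:1.3.x1 dn1:1.3.x2
```

Flagging expanded segments instead of overwriting the root text leaves the display choice (show both, or hide the abbreviated segment) to the renderer, and an orphaned expansion fails loudly instead of silently disappearing.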

@blake?

karl_lew · August 23, 2018, 10:17pm

Great article that provides a sobering overview of the challenges we face.

At this stage, probably the only consideration for STXnext would be a way to get to a dedicated voice assistant page from suttacentral.net, perhaps via a link on the expandable left sidebar.

The current SC numbering is quite natural, as it matches the formatted grouping of the Pali manuscripts, which basically have sections and segments. This is, I think, its primary function. With a reference to the original Pali manuscript, we can easily cross-reference the segments of translators who start from the same Pali manuscript. As you point out, it might be useful to add additional semantic levels (e.g., MN1:Arahant-section#.delusion-subsection#.segment#).

I would not remove the ellipses, since they are placeholders that indicate where the expansion takes place. It’s easy to look for the “…” and we need not change the id for expansion.
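Finding the expansion points really is a one-liner. A minimal sketch, assuming segments are an id-to-text object; the helper name is hypothetical, and treating three full stops as an ellipsis is an assumption that could also match text that legitimately contains "...".

```javascript
// Hypothetical helper: return the ids of segments whose text still contains an
// ellipsis, i.e. the places where an expansion would be attached.
// Both "…" (U+2026) and three full stops are treated as ellipses.
function findEllipsisSegments(segments) {
  return Object.entries(segments)
    .filter(([, text]) => text.includes("\u2026") || text.includes("..."))
    .map(([id]) => id);
}

const dn1 = {
  "dn1:1.1": "Thus have I heard",
  "dn1:1.2": "At one time …",
  "dn1:1.3": "Then he said:",
};
console.log(findEllipsisSegments(dn1).join(" ")); // prints: dn1:1.2
```

Because the id of the ellipsis segment is unchanged, it can serve directly as the key that an expansion layer attaches to.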

I have been thinking a lot about the standoff markup concept you mentioned, and it seems quite applicable to expansion: the expanded sutta is just another layer and will probably have its own markup.

sections: [
    { title: "Introduction", type: "chapter", start: "mn1:1.1", end: "mn1:2.1" },
    [
        { title: "Uneducated ordinary person", type: "section", start: "mn1:2.2", end: "mn1:2.4" },
        { start: "mn1:3.1", end: "mn1:3.5", template: true, expand: "earth" },
        // etc.
    ],
    [
        { title: "Trainee", start: ... }, // ...
    ],
    [
        // etc.
    ],
]

The general idea is to embrace standoff markup for expansions. This will give complete editorial freedom to multiple editors and translators, since each can have their own standoff markup. It also keeps the translation itself pristine and useful as an almost invariant primary source on its own. Basically, I’m thinking that we should explore the use of separate JSON dedicated to expansion. This expansion standoff markup can reside in a separate file or as a subtree within the sutta JSON document, but I would not interleave expansion notation with the sutta itself.
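To make "standoff" concrete: the markup only names start and end ids, and a player resolves those ranges against the untouched translation. A minimal JavaScript sketch, assuming the translation is an ordered array of [id, text] pairs and that ranges are inclusive; the function names are hypothetical, the ids mn1:1.2 and mn1:2.3 are placeholders, and the template and expand keys are ignored here.

```javascript
// Hypothetical helper: return the segments from "start" to "end" inclusive.
function resolveRange(segments, { start, end }) {
  const i = segments.findIndex(([id]) => id === start);
  const j = segments.findIndex(([id]) => id === end);
  if (i < 0 || j < i) throw new Error(`Bad range ${start}..${end}`);
  return segments.slice(i, j + 1);
}

// Flatten the (possibly nested) standoff list into individual entries.
function* walkSections(entries) {
  for (const entry of entries) {
    if (Array.isArray(entry)) yield* walkSections(entry);
    else yield entry;
  }
}

// Build a navigable playlist: one titled list of segment ids per entry.
function buildPlaylist(segments, sections) {
  return [...walkSections(sections)].map((s) => ({
    title: s.title ?? "(untitled)",
    ids: resolveRange(segments, s).map(([id]) => id),
  }));
}

// Placeholder translation: ids and texts are illustrative only.
const translation = ["mn1:1.1", "mn1:1.2", "mn1:2.1", "mn1:2.2", "mn1:2.3", "mn1:2.4"]
  .map((id) => [id, `(text of ${id})`]);
const sections = [
  { title: "Introduction", type: "chapter", start: "mn1:1.1", end: "mn1:2.1" },
  [{ title: "Uneducated ordinary person", type: "section", start: "mn1:2.2", end: "mn1:2.4" }],
];
for (const { title, ids } of buildPlaylist(translation, sections)) {
  console.log(`${title}: ${ids.join(", ")}`);
}
// prints:
// Introduction: mn1:1.1, mn1:1.2, mn1:2.1
// Uneducated ordinary person: mn1:2.2, mn1:2.3, mn1:2.4
```

Since the translation array is only read, any number of editors could supply their own sections lists against the same text.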

For now, I haven’t had time to work on expansion itself. I’ve been spending days patiently trying to teach Pali to Slow Amy. It’s actually quite the challenge! But we need this capability for reading suttas, because the translated suttas have Pali words in them. English-speaking voices such as Slow Amy completely garble Pali words, and she will require a special Pali lexicon for her very British accent.
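One way a Pali lexicon could be applied is through SSML, assuming the voice is driven by an SSML-capable engine (Amazon Polly, which provides Amy, accepts the speak, prosody and phoneme tags used below). This is a hedged JavaScript sketch: toSsml and the lexicon shape (a Map from lowercase, NFC-normalized word to an IPA string) are hypothetical, and no IPA values are supplied here.

```javascript
// Escape the characters that are special in SSML/XML text and attribute values.
function escapeXml(s) {
  const table = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return s.replace(/[&<>"']/g, (c) => table[c]);
}

// Wrap each word found in the (hypothetical) Pali lexicon in an SSML <phoneme>
// tag so that an English voice uses the supplied IPA instead of guessing.
// Words are runs of Unicode letters and combining marks, so diacritics such as
// ā or ṃ stay inside the word; lookup is case-insensitive on NFC-normalized text.
function toSsml(text, lexicon, rate = "slow") {
  const parts = text.split(/([\p{L}\p{M}]+)/u); // odd indices hold the words
  const body = parts
    .map((part, i) => {
      const ipa = i % 2 === 1 ? lexicon.get(part.normalize("NFC").toLowerCase()) : undefined;
      return ipa
        ? `<phoneme alphabet="ipa" ph="${escapeXml(ipa)}">${escapeXml(part)}</phoneme>`
        : escapeXml(part);
    })
    .join("");
  return `<speak><prosody rate="${rate}">${body}</prosody></speak>`;
}

// Usage: the IPA strings must come from someone who knows Pali pronunciation.
// const lexicon = new Map([["dukkha", "<IPA for dukkha>"]]);
// console.log(toSsml("Mendicants, this is dukkha.", lexicon));
```

Using a Map rather than a plain object avoids accidental matches on inherited property names such as "constructor". Polly also supports uploaded pronunciation lexicons, which would keep the markup out of the text entirely.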

sujato · August 24, 2018, 1:56am

Indeed. We’ll look at this in due course. I guess we should make sure this is one of the first elements accessed if anyone is using a site reader; then they can be informed that we have a voice interface right away.

Yeah, that’s not really a thing. Translators may work primarily from one text, but they generally consult a variety of editions and use the reading they think is best.

One issue, though, is that when we come to deal with multiple root editions, we will encounter cases where the root editions expand the text differently. I believe this will be fairly rare, as the editions are not very dissimilar when it comes to abbreviations. Still, it will happen from time to time, so any system we develop here for expansion should be applicable or adaptable to root texts as well.

Indeed. I discussed this with Blake yesterday and he emphasized the same point.

Gillian · August 24, 2018, 9:46am

I’m largely out of range atm. Will get back to you next week.

karl_lew · August 24, 2018, 12:40pm

Wow. Once again my naive assumptions are exploded. This is more complicated than genealogy! (Now my mind has gone off tangentially into blockchain for tracking translations and transmissions.)

With all the scattered sources, we may need a way to refer to a document or set of documents unambiguously. We can use signature guids for our JSON documents as they wander into and out of our universe. Signature guids are generated from the content. For JSON documents, I use merkle-json to generate unique references to specific versions of given JSON documents. These guids can link a global corpus by identity, no matter where the documents or their copies are. Using guids, we can provide unambiguous chains of reference spanning great stretches of time for future scholars and translators.
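The core idea of a content-derived guid fits in a few lines. This is a simplified, flat stand-in written with Node's built-in crypto module, not merkle-json's API: it assumes plain JSON data (no undefined values) and sorts object keys so that key order does not change the id. canonicalJson and contentGuid are hypothetical names.

```javascript
const crypto = require("crypto");

// Serialize with object keys sorted at every level, so two documents with the
// same content but a different key order produce the same string.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Content-derived identifier: every copy with identical content gets the same
// id, wherever it lives, and any edit produces a different id.
function contentGuid(doc) {
  return crypto.createHash("sha256").update(canonicalJson(doc), "utf8").digest("hex");
}

const segment = { id: "dn1:1.1", text: "Thus have I heard" };
console.log(contentGuid(segment)); // a 64-character hex string
```

A Merkle-style scheme would go further and hash subtrees individually, so that parts of a document can be referenced too; the flat hash above only identifies whole documents.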