Copying Pali & English verses

Snowbird · November 18, 2020, 11:28am

Has anyone figured out an easy way to copy verses off SuttaCentral so that they read as below? By verses I mean gatha, like in the Theragatha.

Pali line one
Pali line two
Pali line three
Pali line four
English line one
English line two
English line three
English line four

All I’m able to get is

English line onePali line one
English line twoPali line two
English line threePali line three
English line fourPali line four

or

English line one
Pali line one
English line two
Pali line two
English line three
Pali line three
English line four
Pali line four

Khemarato.bhikkhu · November 18, 2020, 11:45am

I believe @sabbamitta uses a command line version of voice search for beautiful copypasta?

sabbamitta · November 18, 2020, 12:03pm

Yes, indeed. @karl_lew built a version that automatically adds Markdown links to every segment. You can get either a bilingual or a monolingual version of your quote, or even trilingual (pli | en | de).

At the moment all links go to SuttaCentral’s English translation pages by Bhante Sujato and assume that you have set it to show the Pali text along with the translation. That will need to be adapted when more segmented translations become available on SC.

You find scv-bilara here, including some installation instructions (but I am not sure how up-to-date they are):

I could only install it on my computer with a lot of transatlantic help from Karl, since I am not a “native” programmer … But once installed it works great!

And maybe an easier way is to set SuttaCentral to bilingual line-by-line view and just copy from the website? It only works for segmented texts, just like scv-bilara. Here is the beginning of MN 33:

1.1So I have heard. Evaṃ me sutaṃ— 1.2At one time the Buddha was staying near Sāvatthī in Jeta’s Grove, Anāthapiṇḍika’s monastery. ekaṃ samayaṃ bhagavā sāvatthiyaṃ viharati jetavane anāthapiṇḍikassa ārāme. 1.3There the Buddha addressed the mendicants, Tatra kho bhagavā bhikkhū āmantesi: 1.4“Mendicants!” “bhikkhavo”ti.

1.5“Venerable sir,” they replied. “Bhadante”ti te bhikkhū bhagavato paccassosuṃ.

It looks like you have to adjust the line formatting.

In scv-bilara I get this for the same bit of text:

MN33:1.1: Evaṁ me sutaṁ—
MN33:1.1: So I have heard.
MN33:1.2: ekaṁ samayaṁ bhagavā sāvatthiyaṁ viharati jetavane anāthapiṇḍikassa ārāme.
MN33:1.2: At one time the Buddha was staying near Sāvatthī in Jeta’s Grove, Anāthapiṇḍika’s monastery.
MN33:1.3: Tatra kho bhagavā bhikkhū āmantesi:
MN33:1.3: There the Buddha addressed the mendicants,
MN33:1.4: “bhikkhavo”ti.
MN33:1.4: “Mendicants!”
MN33:1.5: “Bhadante”ti te bhikkhū bhagavato paccassosuṁ.
MN33:1.5: “Venerable sir,” they replied.

Maybe a bit many links …

Snowbird · November 18, 2020, 12:14pm

I don’t think that’s what I’m after. When I say verses, I mean gatha, like in the Theragatha.

I believe it would just output

English line one
Pali line one
English line two
Pali line two
English line three
Pali line three
English line four
Pali line four

which is also not what I’m after.

sabbamitta · November 18, 2020, 12:14pm

Only now I see that this is what you want. No, very sorry, but even awesome scv-bilara can’t give you this. You’d have to re-arrange the lines.

sabbamitta · November 18, 2020, 12:15pm

Yes, you’re right, and I just posted at the same moment as you …

karl_lew · November 18, 2020, 2:00pm

@snowbird, we can add an scv-bilara feature to group by major segment number bilingually. Is a command-line tool acceptable? Or do you need a web application (we haven’t yet had time to make scv-bilara into an “ebt-quote” web application)?

Snowbird · November 18, 2020, 2:34pm

Dear Karl, thank you so much for your offer. Recently Bhante Sujato walked me through setting up things on my machine so I could run the sheet_export.py script. So if it is a script like that, one I can run on a Windows machine with Python installed, that would be marvelous.

I’m fairly good at using regex to get things the way I want. So as long as the Pali is marked up differently from the English, the Pali and English are their own block elements (like p), and there are breaks between lines, I should be good. But this is what I’m after
<blockquote>“Putto buddhassa dāyādo, kassapo susamāhito; Pubbenivāsaṃ yovedi, saggāpāyañca passati.</blockquote>

<blockquote>Kassapa is the son and heir of the Buddha, whose mind is immersed in samādhi. He knows his past lives, he sees heaven and places of loss,</blockquote>

I know it’s not the best HTML, but it’s what I need to work with at the moment. If it is better for you to export as a Markdown file, I think that would work as well.
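The target format described here (one block element per language, with the lines of a verse joined by spaces) can be sketched in Python. This is only an illustration under stated assumptions: the function name `verse_to_blockquotes` is hypothetical, and the `lang` attributes are an added assumption to mark the Pali up differently from the English; they are not in the sample above.

```python
from html import escape


def verse_to_blockquotes(pali_lines, english_lines):
    """Return two <blockquote> elements: the Pali verse, then the English."""

    def block(lines, lang):
        # Join the lines of one verse with single spaces, as in the sample.
        text = " ".join(line.strip() for line in lines)
        # The lang attribute is an assumption, not part of the original sample.
        return f'<blockquote lang="{lang}">{escape(text, quote=False)}</blockquote>'

    return block(pali_lines, "pi") + "\n\n" + block(english_lines, "en")


pali = ["“Putto buddhassa dāyādo,", "kassapo susamāhito;"]
english = [
    "Kassapa is the son and heir of the Buddha,",
    "whose mind is immersed in samādhi.",
]
print(verse_to_blockquotes(pali, english))
```

Dropping the `lang` argument from the template gives plain `<blockquote>` elements like the ones shown above.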

I really appreciate whatever you can offer. Hopefully it will be beneficial to others as well.

sabbamitta · November 18, 2020, 2:54pm

Thig4.1:0.1: Therīgāthā
Thig4.1:0.1: Verses of the Senior Nuns
Thig4.1:0.2: Catukkanipāta
Thig4.1:0.2: The Book of the Fours
Thig4.1:0.3: 1. Bhaddākāpilānītherīgāthā
Thig4.1:0.3: 4.1. Bhaddā Kāpilānī
Thig4.1:1.1: “Putto buddhassa dāyādo,
Thig4.1:1.1: Kassapa is the son and heir of the Buddha,
Thig4.1:1.2: kassapo susamāhito;
Thig4.1:1.2: whose mind is immersed in samādhi.
Thig4.1:1.3: Pubbenivāsaṁ yovedi,
Thig4.1:1.3: He knows his past lives,
Thig4.1:1.4: saggāpāyañca passati.
Thig4.1:1.4: he sees heaven and places of loss,
Thig4.1:2.1: Atho jātikkhayaṁ patto,
Thig4.1:2.1: and has attained the end of rebirth:
Thig4.1:2.2: abhiññāvosito muni;
Thig4.1:2.2: that sage has perfect insight.
Thig4.1:2.3: Etāhi tīhi vijjāhi,
Thig4.1:2.3: It’s because of these three knowledges
Thig4.1:2.4: tevijjo hoti brāhmaṇo.
Thig4.1:2.4: that the brahmin is a master of the three knowledges.
Thig4.1:3.1: Tatheva bhaddā kāpilānī,
Thig4.1:3.1: In exactly the same way, Bhaddā Kāpilānī
Thig4.1:3.2: tevijjā maccuhāyinī;
Thig4.1:3.2: is master of the three knowledges, destroyer of death.
Thig4.1:3.3: Dhāreti antimaṁ dehaṁ,
Thig4.1:3.3: She bears her final body,
Thig4.1:3.4: jetvā māraṁ savāhiniṁ.
Thig4.1:3.4: having vanquished Māra and his mount.
Thig4.1:4.1: Disvā ādīnavaṁ loke,
Thig4.1:4.1: Seeing the danger of the world,
Thig4.1:4.2: ubho pabbajitā mayaṁ;
Thig4.1:4.2: both of us went forth.
Thig4.1:4.3: Tyamha khīṇāsavā dantā,
Thig4.1:4.3: Now we are tamed, our defilements have ended;
Thig4.1:4.4: sītibhūtamha nibbutā”ti.
Thig4.1:4.4: we’ve become cooled and quenched.
Thig4.1:5.1: …
Thig4.1:5.2: Bhaddā kāpilānī therī ….
Thig4.1:6.1: Catukkanipāto niṭṭhito.
Thig4.1:6.1: The Book of the Fours is finished.

This is the relevant verse in scv-bilara’s current Markdown output. I am posting this here to show the verse structure:

The “0” numbers after the colon are titles.
The “1” numbers after the colon are the first verse, the “2” the second, etc.
The “5” and “6” are the “end of sutta” and “end of chapter” parts that are mostly not translated—except for the “end of book” line thig4.1:6.1 in this case.

So in theory it would be possible to group by the numbers that come after the colon.

Would you like to see the segment numbers in your output?
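Grouping by the number after the colon could be sketched in Python roughly as follows. This is an illustration only, not part of scv-bilara: the name `group_by_verse` and the regular expression are assumptions, and the code relies on the layout of the listing above, where each segment id appears first with the Pali line and then again with the English line.

```python
import re

# "Thig4.1:1.1: text" -> sutta "Thig4.1", major "1", minor "1", text
SEGMENT = re.compile(
    r"^(?P<sutta>[^:\s]+):(?P<major>\d+)\.(?P<minor>\d+(?:\.\d+)*): (?P<text>.*)$"
)


def group_by_verse(lines):
    """Group segment lines by major segment number.

    Returns {major: {"pli": [...], "en": [...]}} in order of first appearance.
    A segment id seen for the first time is treated as Pali, a repeat as
    English, so untranslated segments end up with Pali only.
    """
    verses = {}
    seen = set()
    for line in lines:
        m = SEGMENT.match(line.strip())
        if not m:
            continue  # skip anything that is not a segment line
        seg_id = (m["sutta"], m["major"], m["minor"])
        verse = verses.setdefault(int(m["major"]), {"pli": [], "en": []})
        lang = "en" if seg_id in seen else "pli"
        seen.add(seg_id)
        verse[lang].append(m["text"])
    return verses


sample = [
    "Thig4.1:1.1: “Putto buddhassa dāyādo,",
    "Thig4.1:1.1: Kassapa is the son and heir of the Buddha,",
    "Thig4.1:1.2: kassapo susamāhito;",
    "Thig4.1:1.2: whose mind is immersed in samādhi.",
    "Thig4.1:2.1: Atho jātikkhayaṁ patto,",
    "Thig4.1:2.1: and has attained the end of rebirth:",
    "Thig4.1:5.1: …",
]

for major, verse in group_by_verse(sample).items():
    # Print the whole Pali verse first, then the whole English verse.
    print("\n".join(verse["pli"]))
    print("\n".join(verse["en"]))
    print()
```

Major number 0 collects the titles, so they can be kept or dropped separately from the verses.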

karl_lew · November 18, 2020, 3:01pm

Thanks for the example, Anagarika Sabbamitta. It will take a bit of work to support scv-bilara output grouped by verse, not by line. For example, one should be able to search for “son and heir” to get:

Thig4.1:1.1: Kassapa is the son and heir of the Buddha,
whose mind is immersed in samādhi.
He knows his past lives,
he sees heaven and places of loss,

I, too, have wished for this capability.

sabbamitta · November 18, 2020, 3:04pm

It’d certainly be awesome not just for one person. I’ll make an issue for us, maybe after @Snowbird has more precisely specified how the output should be, like:

Should it include segment numbers?
Should it include titles and end-of-sutta etc. stuff?

karl_lew · November 18, 2020, 3:09pm

For verse output, different formats could be supported: html, markdown, bilingual, etc.

@snowbird, scv-bilara runs on Linux. I think that Windows has belatedly, if reluctantly, embraced Linux support, so I’m hoping that we can get you the revised functionality early as a command-line tool. Eventually, I think we can offer a simpler version of scv-bilara on the web, perhaps as part of Voice. In either case, we’ll be looking into solving this shared need. Thank you for bringing up this topic.

Snowbird · November 18, 2020, 3:13pm

I think it will also be useful when it comes time to generate PDFs. Alternating line by line instead of verse by verse is a bit ugly and takes up quite a bit more space. Also, verses aren’t always translated line by line, so having them line by line isn’t even really helpful.

Thanks for bringing this up. I have no need for the segment numbers, but if they are there I can probably regex them out.
The titles, however, are needed. I forgot about them. They can be included in any way you like.
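Stripping the segment numbers with a regex might look like the following Python sketch. It assumes the `Thig4.1:1.1: ` prefix format seen in the scv-bilara output earlier in the thread; the names `SEGMENT_PREFIX` and `strip_segment_number` are hypothetical.

```python
import re

# Matches a leading segment id such as "Thig4.1:1.1: " or "MN33:1.2: ".
SEGMENT_PREFIX = re.compile(r"^[^:\s]+:\d+(?:\.\d+)*:\s*")


def strip_segment_number(line):
    """Remove a leading segment id; lines without one are returned unchanged."""
    return SEGMENT_PREFIX.sub("", line, count=1)


print(strip_segment_number("Thig4.1:1.1: Kassapa is the son and heir of the Buddha,"))
print(strip_segment_number("whose mind is immersed in samādhi."))
```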

Would the script be in Python? Because that runs fine on Windows. I have the git repository pulled down to my local machine.

I only need to be able to do it one time.

sabbamitta · November 18, 2020, 3:13pm

In the bilara-data html.json files verses are specified as gatha, see for example line 5 here:

github.com

sc-voice/bilara-data/blob/unpublished/html/pli/ms/sutta/kn/thig/thig4.1_html.json

{
  "thig4.1:0.1": "<article id='thig4.1'><header><ul><li class='division'>{}</li>",
  "thig4.1:0.2": "<li>{}</li></ul>",
  "thig4.1:0.3": "<h1 class='sutta-title'>{}</h1></header>",
  "thig4.1:1.1": "<blockquote class='gatha'><p><a class='vns' id='vns63'></a>{}<br>",
  "thig4.1:1.2": "{}<br>",
  "thig4.1:1.3": "{}<br>",
  "thig4.1:1.4": "{}</p>",
  "thig4.1:2.1": "<p><a class='vns' id='vns64'></a>{}<br>",
  "thig4.1:2.2": "{}<br>",
  "thig4.1:2.3": "{}<br>",
  "thig4.1:2.4": "{}</p>",
  "thig4.1:3.1": "<p><a class='vns' id='vns65'></a>{}<br>",
  "thig4.1:3.2": "{}<br>",
  "thig4.1:3.3": "{}<br>",
  "thig4.1:3.4": "{}</p>",
  "thig4.1:4.1": "<p><a class='vns' id='vns66'></a>{}<br>",
  "thig4.1:4.2": "{}<br>",
  "thig4.1:4.3": "{}<br>",
  "thig4.1:4.4": "{}</p></blockquote>",

This file has been truncated.

That’s something we have in mind anyway, so seeing there are actually users who are interested is great!
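The gatha markup in the html.json file could be used to find verse boundaries, as in the Python sketch below. It is an illustration under assumptions: the helper name `gatha_verses` is hypothetical, a verse is taken to be one `<p>...</p>` inside a `<blockquote class='gatha'>`, and the text is assumed to be available as a mapping keyed by the same segment ids as the markup. The sample data is abbreviated to the first verse.

```python
def gatha_verses(markup, text):
    """Yield one list of text lines per <p>...</p> inside a gatha blockquote.

    markup: segment id -> HTML template, in document order (as in html.json)
    text:   segment id -> text of that segment (assumed to share the same keys)
    """
    in_gatha = False
    current = None
    for seg_id, template in markup.items():
        if "class='gatha'" in template:
            in_gatha = True
        if in_gatha and "<p>" in template:
            current = []  # a new verse paragraph opens here
        if current is not None and seg_id in text:
            current.append(text[seg_id])
        if current is not None and "</p>" in template:
            yield current  # the verse paragraph closes here
            current = None
        if "</blockquote>" in template:
            in_gatha = False


markup = {
    "thig4.1:0.3": "<h1 class='sutta-title'>{}</h1></header>",
    "thig4.1:1.1": "<blockquote class='gatha'><p><a class='vns' id='vns63'></a>{}<br>",
    "thig4.1:1.2": "{}<br>",
    "thig4.1:1.3": "{}<br>",
    "thig4.1:1.4": "{}</p>",
}
pali = {
    "thig4.1:0.3": "1. Bhaddākāpilānītherīgāthā",
    "thig4.1:1.1": "“Putto buddhassa dāyādo,",
    "thig4.1:1.2": "kassapo susamāhito;",
    "thig4.1:1.3": "Pubbenivāsaṁ yovedi,",
    "thig4.1:1.4": "saggāpāyañca passati.",
}

for verse in gatha_verses(markup, pali):
    print(" ".join(verse))
```

Running the same function once with the Pali mapping and once with the English one gives the two verse blocks separately, without relying on the segment numbering.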

sabbamitta · November 18, 2020, 3:17pm

Voice is in JavaScript.

This is an issue to start with, which can be specified as more details come to mind:

karl_lew · November 18, 2020, 4:02pm

Indeed! Perhaps these may be able to help us…

The script is in JavaScript, which can also run on Windows. Likely you shall be our first Windows guinea pig.

Thank you, Anagarika Sabbamitta.

khagga · November 18, 2020, 4:10pm

Nerd moment: I was thinking about trying to write a quick bookmarklet that could serve as an interim solution: it could just select the right elements, grab the text, and then build some output HTML. However, it seems that this is not possible, since the custom elements in SC have their shadow DOMs set to closed.

Khemarato.bhikkhu · November 18, 2020, 9:09pm

Exactly. These segments are too small.

sujato · November 18, 2020, 10:28pm

Just to confirm, as noted, the only reliable way to do this at the content level would be to parse it back from the HTML.

Ideally, we might have a more explicit markup to group chunks of text together; perhaps we could look at adding something like that to Bilara.

However, note that we do essentially the same thing in the side-by-side view just with CSS. Check staging for the modern approach to this using display: grid. If you’re using the texts on a website this is easy to do.

Try doing the same thing on the staging site. It still uses Shadow DOM, but you never know, you might be in luck.

TBH, we render text to shadow DOM by accident: the site is built on Polymer, which uses shadow DOM by default. On the new site it may be better to switch off shadow DOM for the text and just use it for other components. But we haven’t really looked into it.

khagga · November 18, 2020, 11:19pm

Thanks for the pointers, Bhante, I’ll try that. I’ve never used Polymer, but I do love Web Components in general. I find the Shadow DOM part of the standard rather weird to be honest — I feel like most of the benefit in Web Components lies in simple custom elements. But that may be down to the fact that I work mostly alone on my own stuff, and the Shadow DOM is more oriented toward larger projects where a lot of people are trying to avoid stepping on each other’s toes.