Copying Pali & English verses

sujato · November 18, 2020, 11:33pm

Indeed. It’s an emerging spec and the right way to use it is still being worked out. It seems the trend is towards “use Shadow DOM where it makes sense”, which mostly is in encapsulated components.

I think the killer app for web components is the corporation. Use the same components with styles and all in any web app.

Snowbird · November 19, 2020, 5:02am

I’m happy to be a guinea pig. I could create what I need manually, but the chances of me making a mistake are non-zero. Please let me know when you have something for me to try.

karl_lew · November 19, 2020, 2:51pm

The newest scv-bilara version v1.4.39 will return the following for “son and heir”:

Thig4.1:1.1: “Putto buddhassa dāyādo,
kassapo susamāhito;
Pubbenivāsaṁ yovedi,
saggāpāyañca passati.
Thig4.1:1.1: Kassapa is the son and heir of the Buddha,
whose mind is immersed in samādhi.
He knows his past lives,
he sees heaven and places of loss,

The full command line for the above is:

scripts/js/search.js son and heir --groupVerse2 --break1

The actual output is:

> [Thig4.1:1.1](https://suttacentral.net/thig4.1): “Putto buddhassa dāyādo,  
kassapo susamāhito;  
Pubbenivāsaṁ yovedi,  
saggāpāyañca passati.
> [Thig4.1:1.1](https://suttacentral.net/thig4.1/en/sujato#thig4.1:1.1): Kassapa is the son and heir of the Buddha,  
whose mind is immersed in samādhi.  
He knows his past lives,  
he sees heaven and places of loss,

Here, a “verse” is simply a set of contiguous segments starting with an id ending in .1. The segment numbers are inclined to be semantic and we simply take advantage of Bhante Sujato’s numbering. You will notice that the “verse” shown is not exactly grammatically complete as it ends in a comma. The actual verse is longer in translation, although the Pali original verse does indeed end in a period.
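The grouping rule described above can be sketched roughly as follows. This is only an illustration, not the scv-bilara code: the `{ id, text }` segment shape and the `groupVerses` name are assumptions made for the example.

```javascript
// Hypothetical input: an ordered array of { id, text } segments.
// A new verse starts whenever the final number of the segment id is 1.
function groupVerses(segments) {
  const verses = [];
  for (const seg of segments) {
    const segNum = seg.id.slice(seg.id.lastIndexOf('.') + 1);
    if (segNum === '1' || verses.length === 0) {
      verses.push({ id: seg.id, lines: [] });
    }
    verses[verses.length - 1].lines.push(seg.text);
  }
  return verses;
}

const sampleSegments = [
  { id: 'thig4.1:1.1', text: '“Putto buddhassa dāyādo,' },
  { id: 'thig4.1:1.2', text: 'kassapo susamāhito;' },
  { id: 'thig4.1:1.3', text: 'Pubbenivāsaṁ yovedi,' },
  { id: 'thig4.1:1.4', text: 'saggāpāyañca passati.' },
  { id: 'thig4.1:2.1', text: '(first segment of the next verse)' },
];

const groupedVerses = groupVerses(sampleSegments);
console.log(groupedVerses.map((v) => `${v.id}: ${v.lines.length} line(s)`));
```

Comparing only the number after the last dot keeps an id such as `:1.11` from being mistaken for the start of a verse.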

The --groupVerse2 option indicates that bilingual output is requested. Monolingual and trilingual output are also available.

The --break1 option will honor line breaks by segment. The option --break0 simply joins all lines without a linebreak:

Thig4.1:1.1: “Putto buddhassa dāyādo, kassapo susamāhito; Pubbenivāsaṁ yovedi, saggāpāyañca passati.
Thig4.1:1.1: Kassapa is the son and heir of the Buddha, whose mind is immersed in samādhi. He knows his past lives, he sees heaven and places of loss,
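The difference between the two break modes can be shown with a small sketch. The `formatVerse` function and its parameters are hypothetical and only mimic the output shown above; they are not the actual implementation behind --break0 and --break1.

```javascript
// breakMode 1 keeps one segment per line; any other value joins with spaces.
function formatVerse(citation, lines, breakMode) {
  const separator = breakMode === 1 ? '\n' : ' ';
  return `${citation}: ${lines.join(separator)}`;
}

const paliLinesForBreaks = [
  '“Putto buddhassa dāyādo,',
  'kassapo susamāhito;',
  'Pubbenivāsaṁ yovedi,',
  'saggāpāyañca passati.',
];

console.log(formatVerse('Thig4.1:1.1', paliLinesForBreaks, 1));
console.log(formatVerse('Thig4.1:1.1', paliLinesForBreaks, 0));
```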

This implementation serves my own quotation needs. How well does it match yours?

Snowbird · November 19, 2020, 3:43pm

So, my need is not really for quotations. I just need a single copy of, say, the Therigatha exported. I’m not searching for individual verses. The line breaks you have are what I need. I need to end up with the Pali in bold and the English not. It looks like the URLs are different in a consistent way so I might be able to wrangle them to do what I need.

I have no need for the citations or the links (other than if I need them to mark up the Pali). This is what I’m working towards:
<h1>Name of the monk</h1>
<blockquote><p><strong>“Putto buddhassa dāyādo,<br>kassapo susamāhito;<br>Pubbenivāsaṃ yovedi,<br>saggāpāyañca passati.</strong></p></blockquote>

<blockquote><p>Kassapa is the son and heir of the Buddha,<br>whose mind is immersed in samādhi.<br>He knows his past lives,<br>he sees heaven and places of loss,</p></blockquote>

For the whole book. It’s just a one time thing, I don’t need to generate it on demand.

Does that make sense?
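For a one-time export like this, the target markup could be produced from verse lines along these lines. This is a minimal sketch under assumptions: `renderBlockquotes` and `escapeHtml` are hypothetical helpers, and the input is assumed to be two arrays of plain-text lines.

```javascript
// Escape characters that are special in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Pali goes in bold inside the first blockquote, English in the second.
function renderBlockquotes(paliLines, englishLines) {
  const pali = paliLines.map(escapeHtml).join('<br>');
  const english = englishLines.map(escapeHtml).join('<br>');
  return (
    `<blockquote><p><strong>${pali}</strong></p></blockquote>\n\n` +
    `<blockquote><p>${english}</p></blockquote>`
  );
}

console.log(
  renderBlockquotes(
    ['“Putto buddhassa dāyādo,', 'kassapo susamāhito;'],
    ['Kassapa is the son and heir of the Buddha,', 'whose mind is immersed in samādhi.']
  )
);
```

Joining with `<br>` rather than appending it to every line avoids a trailing `<br>` before the closing `</p>`.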

karl_lew · November 19, 2020, 4:56pm

Yes. That does make sense. And it might be easiest to simply deliver that to you in the form of a new GitHub repository with all the HTML files you need. Then you could work directly within the repository as you desire. Would that be acceptable?

sujato · November 19, 2020, 8:15pm

Of course! This method would work for any text if we can assume that it is all verses.

It might get tripped up on the “conclusion” lines at the end of texts, though.

Snowbird · November 20, 2020, 5:55am

I think so, although I am kind of faking understanding precisely what you are talking about.

As the conversation has developed, I realize that there are actually a couple of projects I could use this data for. Thinking about that, I wonder if a structure like this would be useful for lots of purposes:

<h1>
<span class="pali_nipata_title">Catukkanipāta</span>
<span class="translation_nipata_title">The Book of the Fours</span>
</h1>
<h2>
<span class="pali_sutta_title">Bhaddākāpilānītherīgāthā</span>
<span class="translation_sutta_title">4.1. Bhaddā Kāpilānī</span>
</h2>
<p class="pali_verse_paragraph">
<span class="citation pali_citation">
<a href="https://suttacentral.net/thig4.1">Thig4.1:1.1 </a>
</span>
“Putto buddhassa dāyādo,<br>
kassapo susamāhito;<br>
Pubbenivāsaṁ yovedi,<br>
saggāpāyañca passati.</p>
<p class="translation_verse_paragraph">
<span class="citation translation_citation">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:1.1">Thig4.1:1.1 </a>
</span>
Kassapa is the son and heir of the Buddha,<br>
whose mind is immersed in samādhi.<br>
He knows his past lives,<br>
he sees heaven and places of loss,</p>

What do you think about that? At the moment my projects involve the Thera and Therigatha. I’m not sure how the classes would apply to other verse-exclusive texts like Dhp, Vv, and Pv.

sujato · November 20, 2020, 7:49am

This may help to clean up the markup for you. A few tips I have learned along the way.

The pali and translation are the same “thing” and should be represented as such in the HTML. We use <blockquote> but <div> or whatever is fine too. (Blockquote really only makes semantic sense when a verse appears in a prose text, but it is a handy tag.)
I’d recommend just having one ID for the whole thing or you can put separate ones on text and translation (y tho?)
add lang and translate=no for a11y
Don’t do <br></p>

<div class="verse" id="thig4.1:1">
<span class="citation translation_citation">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:1.1">Thig4.1:1.1 </a>
</span>
<p class="pali_verse_paragraph" lang="pi" translate="no">
“Putto buddhassa dāyādo,<br>
kassapo susamāhito;<br>
Pubbenivāsaṁ yovedi,<br>
saggāpāyañca passati.</p>
<p class="translation_verse_paragraph" lang="en">
Kassapa is the son and heir of the Buddha,  <br>
whose mind is immersed in samādhi.<br>
He knows his past lives,<br>
he sees heaven and places of loss,</p>
</div>

Snowbird · November 20, 2020, 8:13am

Thank you for these. They all work for what I am trying to do, and I agree would be better in general. The <br></p> was a typo.

What do you think @karl_lew?

khagga · November 20, 2020, 11:25am

Now this is my kind of conversation

May I interject to ask what the translate="no" attribute does in <p class="pali_verse_paragraph" lang="pi" translate="no">?

(EDIT: …Oh I see now).

karl_lew · November 20, 2020, 3:48pm

I think that we have a new repository to look at:

github.com/sc-voice/bilara-verse/blob/main/translation/en/sujato/sutta/kn/thig/thig4.1.html

<div class="title" id="thig4.1:0.1">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:0.1">Thig4.1:0.1</a>
<div class="root_titles" lang="pli">
<h1>Therīgāthā</h1>
<h2>Catukkanipāta</h2>
<h3>1. Bhaddākāpilānītherīgāthā</h3>
</div>
<div class="translation_titles" lang="en">
<h1>Verses of the Senior Nuns</h1>
<h2>The Book of the Fours</h2>
<h3>4.1. Bhaddā Kāpilānī</h3>
</div>
</div>
<div class="verse" id="thig4.1:1.1">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:1.1">Thig4.1:1.1</a>
<p class="pali_verse_paragraph" lang="pi" translate="no">
“Putto buddhassa dāyādo,<br/>
kassapo susamāhito;<br/>
Pubbenivāsaṁ yovedi,<br/>
saggāpāyañca passati.

This file has been truncated.

Let’s get thig4.1 to look as we wish then I’ll run the script on other files.
BTW, what’s your Github username? I’ll give you access.

Snowbird · November 20, 2020, 4:13pm

Wow. That’s great! Thank you so much. I’m easily able to turn it into exactly what I’m after.

Snowbird · November 22, 2020, 2:40am

karl_lew:

<div class="verse" id="thig4.1:0.1">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:0.1">Thig4.1:0.1</a>
<p class="pali_verse_paragraph" lang="pi" translate="no">
Therīgāthā<br/>
Catukkanipāta<br/>
1. Bhaddākāpilānītherīgāthā
</p>
<p class="translation_verse_paragraph" lang="en">
Verses of the Senior Nuns<br/>
The Book of the Fours<br/>
4.1. Bhaddā Kāpilānī
</p>
</div>

Now that I look at this again, I think that it is better if this is rendered as three separate divs: one for the book title, one for the chapter title, and one for the “sutta title”. Each div should probably also have a class of its own, since styling is sure to be different between them. "translation_verse_paragraph" isn’t really accurate anyway.

sujato · November 22, 2020, 6:53am

I’d recommend using similar markup to SC here, modern HTML has some nice guidelines for this sort of thing.

As a rule use <p> for subheading-type things unless it is considered navigation, in which case use <ul>.

In this case, as these are not arbitrary bits of text to expand the heading, but rather, steps in a hierarchical structure, I think <ul> is more semantic.

But either is fine (or <div> or <span>). But you should definitely use <h1> for your main heading (or <h2> if there are multiple suttas on a page.)

Also, don’t use <br/> what is this, 2005?

TBH you don’t need classes for the verses, the lang attributes let you select them for styling just fine (.verse p:lang(pi)).

But you can use them for the heading if you like. Personally I’d try without to start with and see if I could get away with using CSS selectors to achieve the same thing. But that’s just me, I like using plain tags where I can, but there’s definitely an argument that we should class all the things.

Also I’d recommend wrapping each “sutta” in an <article> tag with the sutta id on it. An <article> represents a distinct, self-contained “thing”.

While we’re at it, technically a <section> tag might be better than <div> for the verses, as it essentially means “a part of a larger thing”.

<article id="thig4.1">
<header>
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:1.1">Thig4.1:1.1 </a>
<ul lang="pi" translate="no">
<li class="book">Therīgāthā</li>
<li class="part">Catukkanipāta</li>
<li class="title">1. Bhaddākāpilānītherīgāthā</li>
</ul>
<ul lang="en">
<li class="book">Verses of the Senior Nuns</li>
<li class="part">The Book of the Fours</li>
</ul>
<h1 class="title" lang="en">4.1. Bhaddā Kāpilānī</h1>
</header>
<section class="verse" id="thig4.1:1">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:1.1">Thig4.1:1.1 </a>
<p lang="pi" translate="no">
“Putto buddhassa dāyādo,<br>
kassapo susamāhito;<br>
Pubbenivāsaṁ yovedi,<br>
saggāpāyañca passati.</p>
<p lang="en">
Kassapa is the son and heir of the Buddha,  <br>
whose mind is immersed in samādhi.<br>
He knows his past lives,<br>
he sees heaven and places of loss,</p>
</section>
</article>
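A generator for the verse part of this structure might look like the sketch below. It is an illustration only: `renderVerseSection` and `escapeText` are hypothetical names, the citation link is left out for brevity, and the verse id is assumed to be safe to place in an attribute unescaped.

```javascript
// Escape characters that are special in HTML text content.
function escapeText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Emits one <section class="verse"> with a Pali and an English paragraph.
// The lang attributes carry the language, so no per-language classes are used.
function renderVerseSection(verseId, paliLines, englishLines) {
  const pali = paliLines.map(escapeText).join('<br>\n');
  const english = englishLines.map(escapeText).join('<br>\n');
  return [
    `<section class="verse" id="${verseId}">`,
    `<p lang="pi" translate="no">`,
    `${pali}</p>`,
    `<p lang="en">`,
    `${english}</p>`,
    `</section>`,
  ].join('\n');
}

console.log(
  renderVerseSection(
    'thig4.1:1',
    ['“Putto buddhassa dāyādo,', 'kassapo susamāhito;'],
    ['Kassapa is the son and heir of the Buddha,', 'whose mind is immersed in samādhi.']
  )
);
```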

Snowbird · November 22, 2020, 7:37am

I was talking about this regarding the book, section, etc. But I do agree that it isn’t needed for the verses.

I’m wondering why you don’t group the Pali and English for the book, part, and title together. Something like

<ul>
<li class="book"><span lang="pi">Therīgāthā </span><span lang="en">Verses of the Senior Nuns</span></li>
<li class="part"><span lang="pi">Catukkanipāta </span><span lang="en">The Book of the Fours</span></li>
</ul>

<h1 class="title"><span lang="pi">1. Bhaddākāpilānītherīgāthā </span><span lang="en">4.1. Bhaddā Kāpilānī</span></h1>

I’m also not sure why the first part should be a list. I think of a list of things all being at the same level hierarchically unless they are indented. Anyway, for my purposes for this specific set of projects it is all fine. Although I do really prefer that the Pali and English h tags be grouped together.

Thanks, Bhante for all your feedback and support.

karl_lew · November 22, 2020, 4:24pm

My muddled brain has somewhat parsed the above to come up with a new proposal:

github.com/sc-voice/bilara-verse/blob/main/translation/en/sujato/sutta/kn/thig/thig4.1.html

<div class="title" id="thig4.1:0.1">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:0.1">Thig4.1:0.1</a>
<div class="root_titles" lang="pli">
<h1>Therīgāthā</h1>
<h2>Catukkanipāta</h2>
<h3>1. Bhaddākāpilānītherīgāthā</h3>
</div>
<div class="translation_titles" lang="en">
<h1>Verses of the Senior Nuns</h1>
<h2>The Book of the Fours</h2>
<h3>4.1. Bhaddā Kāpilānī</h3>
</div>
</div>
<div class="verse" id="thig4.1:1.1">
<a href="https://suttacentral.net/thig4.1/en/sujato#thig4.1:1.1">Thig4.1:1.1</a>
<p class="pali_verse_paragraph" lang="pi" translate="no">
“Putto buddhassa dāyādo,<br/>
kassapo susamāhito;<br/>
Pubbenivāsaṁ yovedi,<br/>
saggāpāyañca passati.

This file has been truncated.

Snowbird · November 23, 2020, 10:09am

That would work for me. Although I do think Bhante is probably right that we should use <br> instead of <br/>.
Thanks!

karl_lew · November 23, 2020, 1:04pm

I disagree with Bhante. The self-closing <br/> is compatible with more parsers, some of which will not accept <br> at all. If you wish, I can change it, but as a standard, <br/> causes fewer problems.

Snowbird · November 23, 2020, 1:42pm

OK, I actually thought the same as you but wasn’t confident to defend it. Either one is fine with me.

karl_lew · November 23, 2020, 4:36pm

Bhante is quite right that <br/> is annoying. Unfortunately, we often have to work with annoying software when massaging HTML files. I would agree with Bhante that <br/> is not necessary for any modern browser. But with <br/> you’ll even be able to use XML parsers if that suits your needs. And XML is extremely annoying.
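If the two styles ever need to be reconciled in files that already exist, a simple text pass can rewrite every bare line break to the self-closing form. The sketch below is an assumption-laden illustration (`toSelfClosingBr` is a hypothetical helper) and only handles `<br>` tags without attributes.

```javascript
// Rewrites <br>, <BR>, <br /> and <br/> to the self-closing <br/> form.
// Tags carrying attributes (e.g. <br class="x">) are left untouched.
function toSelfClosingBr(html) {
  return html.replace(/<br\s*\/?>/gi, '<br/>');
}

console.log(toSelfClosingBr('kassapo susamāhito;<br>Pubbenivāsaṁ yovedi,<br />'));
```

Going the other way, for browser-only use, is the same replacement with `'<br>'` as the target.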