Fixing HTML on legacy texts

Aminah · June 16, 2019, 2:03pm

As per the discussion elsewhere, this task plods on (currently having got itself embroiled in correcting VI SN numbers). An update will follow in due course.

However, I wanted to quickly check up on a detail, prompted by the following in connection with po/json texts:

What is the correct way for the legacy texts? Currently, a couple of ways can be found:

<a class="sc" id="1">
(this mostly only appears in root texts, but also can be found elsewhere: nl/…/snp/dubois/, en/…/thig/thanissaro/, en/…/thig/others/, in vagga suttas and also in all but one of the dhp translations, <a class="sc" id="1" data-uid="dhp1">)
<a class="sc" id="sc1"> (this is most common across the legacy translations—where there are ids—and as <a class="sc" id="sc1" data-uid="dhp1"> in the IT dhp)
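
To get a rough tally of which of the two id styles a given file uses, something like the following could be run over its contents. This is only a sketch: the function name `count_sc_id_styles` and the labels `bare`, `prefixed` and `other` are made up for illustration, and it assumes the anchors always carry class="sc" as in the examples above.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ScAnchorCounter(HTMLParser):
    """Tally the id style found on <a class="sc"> anchors."""

    def __init__(self):
        super().__init__()
        self.styles = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag != "a" or "sc" not in classes:
            return
        anchor_id = attrs.get("id") or ""
        if re.fullmatch(r"\d+", anchor_id):
            self.styles["bare"] += 1        # e.g. id="1"
        elif re.fullmatch(r"sc\d+", anchor_id):
            self.styles["prefixed"] += 1    # e.g. id="sc1"
        else:
            self.styles["other"] += 1       # missing id or anything else


def count_sc_id_styles(html_text):
    """Return a dict mapping each (hypothetical) style label to its count."""
    parser = ScAnchorCounter()
    parser.feed(html_text)
    parser.close()
    return dict(parser.styles)
```

Run per file, this would show at a glance whether a translation mixes the two styles or sticks to one.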

Would, indeed, be good if this could be explained.

With regard to the AN ones and twos: in the majority of cases (de, en, es*, fr, hu, it, my, nl, ru, si, sl, vi) it’s not applied at all, and suttas are just separated by <h1> (well, now <h2> in my working branch) or have been translated as one sutta. In the instances where it is used (ca, cs, es – 1 suttas, id, no – 2 suttas, pt, th) there are a couple of slight variations in application:

<div id="text" lang="cs">
<section id="vagga">
<section class="sutta" id="1" data-uid="an1.1-10">

<div id="text" lang="th">
<section id="vagga" data-uid="an1-10">
<section class="sutta" id="1" data-uid="an1.1">

<div id="text" lang="id">
<section id="vagga">
<section class="sutta" id="1" data-uid="an1.1">
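
The variations above can be compared mechanically by listing, for each <section>, its id, class and data-uid. A minimal sketch, assuming only the attribute names shown in the three snippets; `section_uids` is a hypothetical helper name, not anything that exists in the repo:

```python
from html.parser import HTMLParser


class SectionUidCollector(HTMLParser):
    """Collect (id, class, data-uid) for every <section> start tag."""

    def __init__(self):
        super().__init__()
        self.sections = []

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        attrs = dict(attrs)
        # Attributes that are absent come back as None.
        self.sections.append(
            (attrs.get("id"), attrs.get("class"), attrs.get("data-uid"))
        )


def section_uids(html_text):
    """Return the collected tuples in document order."""
    collector = SectionUidCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sections
```

Comparing the output across languages would show, for instance, which files put a data-uid on the vagga section and which put a range on the sutta section.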

Trying to synthesize what has previously been discussed (particularly with reference to your outline of how vagga-suttas should be handled), and the impression I have from looking over the files that use data-uid, the following is my current thinking on what might need to be done (before the great reform by script):

Current (live site)

<div id="text" lang="th">
<section id="vagga" data-uid="an1-10">
<section class="sutta" id="1" data-uid="an1.1">
<article>
<div class="hgroup">
<p class="division">อังคุตตรนิกาย</p>
<p class="subdivision">เอกนิบาต</p>
<h1>1.1</h1>
</div>
<p>…</p>
</article>
</section>
<section class="sutta" id="2" data-uid="an1.2">
<article>
<div class="hgroup">
<h1>1.2</h1>
</div>
<p>…</p>
</article>
</section>
…
<aside id="metaarea">
…

As presently on the working HTML branch

<div id="text" lang="th">
<section class="vagga" id="an1.1-10">
<article>
<div class="hgroup">
<p class="division">อังคุตตรนิกาย</p>
<p class="subdivision">เอกนิบาต</p>
<h1>1.1–10</h1>
</div>
<h2 id="an1.1">1.1</h2>
<p>…</p>
<h2 id="an1.2">1.2</h2>
<p>…</p>
…
</article>
<aside id="metaarea">
…

Deduced/proposed

…
<div id="text" lang="th">
<section id="vagga" data-uid="an1-10">
<article>
<div class="hgroup">
<p class="division">อังคุตตรนิกาย</p>
<p class="subdivision">เอกนิบาต</p>
<h1>1.1-10</h1>
</div>
<article>
<h2 class="sutta" id="sc1" data-uid="an1.1">1.1</h2>
<p>…</p>
</article>
<article>
<h2 class="sutta" id="sc2" data-uid="an1.2">1.2</h2>
<p>…</p>
</article>
…
</section>
<aside id="metaarea">
…

(I didn’t yet put in the articles for vagga-sutta suttas, as I figured that could be handled by the pending script; but I may as well add them in now if I’m going to revisit these suttas anyway.)