Fixing HTML on legacy texts

As per discussion elsewhere, this task plods on (currently having got itself embroiled in correcting VI SN numbers). An update will follow in due course.

However, I wanted to quickly check on a detail, prompted by the following in connection with po/json texts:

What is the correct form of the sc anchor id for the legacy texts? Currently, two forms can be found:

  1. <a class="sc" id="1">
    (this appears mostly in root texts, but can also be found elsewhere: nl/…/snp/dubois/, en/…/thig/thanissaro/, en/…/thig/others/, in vagga suttas, and in all but one of the dhp translations, as <a class="sc" id="1" data-uid="dhp1">)

  2. <a class="sc" id="sc1"> (this is the most common form across the legacy translations, where there are ids, and appears as <a class="sc" id="sc1" data-uid="dhp1"> in the IT dhp)

It would indeed be good if this could be explained.
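
To put numbers on how widespread each of the two forms above is, a rough tally could be run over the files. This is only a sketch using Python's standard html.parser; the names ScAnchorCounter and count_sc_styles are made up for illustration, and it does not presume which form is the correct one.

```python
from collections import Counter
from html.parser import HTMLParser


class ScAnchorCounter(HTMLParser):
    """Tally the id styles found on <a class="sc"> anchors (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.styles = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only look at anchors whose class list contains "sc".
        if tag != "a" or "sc" not in (attrs.get("class") or "").split():
            return
        anchor_id = attrs.get("id") or ""
        if anchor_id.isdigit():
            self.styles["bare"] += 1        # form 1: id="1"
        elif anchor_id.startswith("sc") and anchor_id[2:].isdigit():
            self.styles["prefixed"] += 1    # form 2: id="sc1"
        else:
            self.styles["other"] += 1       # missing id or anything else


def count_sc_styles(html_text):
    """Return a dict such as {"bare": 3, "prefixed": 10} for one file's text."""
    parser = ScAnchorCounter()
    parser.feed(html_text)
    return dict(parser.styles)
```

Feeding each file's text to count_sc_styles and summing the results per folder would show where the bare and prefixed forms are mixed.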

With regard to the AN ones and twos: in the majority of cases (de, en, es*, fr, hu, it, my, nl, ru, si, sl, vi) it’s not applied at all, and suttas are either just separated by <h1> (well, now <h2> in my working branch) or have been translated as one sutta. In the instances where it is used (ca, cs, es – 1 suttas, id, no – 2 suttas, pt, th) there are a couple of slight variations in application:

<div id="text" lang="cs">
<section id="vagga">
<section class="sutta" id="1" data-uid="an1.1-10">

<div id="text" lang="th">
<section id="vagga" data-uid="an1-10">
<section class="sutta" id="1" data-uid="an1.1">

<div id="text" lang="id">
<section id="vagga">
<section class="sutta" id="1" data-uid="an1.1">
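
For comparing these variations across the language folders, something like the following could list where class, id and data-uid sit on each <section>. It is a minimal sketch with Python's standard html.parser; SectionLister and list_sections are hypothetical names, not part of any existing tooling.

```python
from html.parser import HTMLParser


class SectionLister(HTMLParser):
    """Collect (class, id, data-uid) for every <section> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sections = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            attrs = dict(attrs)
            # A missing attribute is recorded as None.
            self.sections.append(
                (attrs.get("class"), attrs.get("id"), attrs.get("data-uid"))
            )


def list_sections(html_text):
    """Return the section attribute triples in document order."""
    parser = SectionLister()
    parser.feed(html_text)
    return parser.sections
```

Run on the cs and th openings above, the first triple would differ only in whether the vagga section carries a data-uid.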

Trying to synthesize what has previously been discussed (particularly with reference to your outline of how vagga-suttas should be handled), and the impression I have from looking over the files that use data-uid, the following is my current thought on what might need to be done (before the great reform by script):

Current (live site)

<div id="text" lang="th">
<section id="vagga" data-uid="an1-10">
<section class="sutta" id="1" data-uid="an1.1">
<article>
<div class="hgroup">
<p class="division">อังคุตตรนิกาย</p>
<p class="subdivision">เอกนิบาต</p>
<h1>1.1</h1>
</div>
<p>…</p>
</article>
</section>
<section class="sutta" id="2" data-uid="an1.2">
<article>
<div class="hgroup">
<h1>1.2</h1>
</div>
<p>…</p>
</article>
</section>
…
<aside id="metaarea">
…

As presently on working HTML branch

<div id="text" lang="th">
<section class="vagga" id="an1.1-10">
<article>
<div class="hgroup">
<p class="division">อังคุตตรนิกาย</p>
<p class="subdivision">เอกนิบาต</p>
<h1>1.1–10</h1>
</div>
<h2 id="an1.1">1.1</h2>
<p>…</p>
<h2 id="an1.2">1.2</h2>
<p>…</p>
…
</article>
<aside id="metaarea">
…

Deduced/proposed

…
<div id="text" lang="th">
<section id="vagga" data-uid="an1-10">
<article>
<div class="hgroup">
<p class="division">อังคุตตรนิกาย</p>
<p class="subdivision">เอกนิบาต</p>
<h1>1.1-10</h1>
</div>
<article>
<h2 class="sutta" id="sc1" data-uid="an1.1">1.1</h2>
<p>…</p>
</article>
<article>
<h2 class="sutta" id="sc2" data-uid="an1.2">1.2</h2>
<p>…</p>
</article>
…
</section>
<aside id="metaarea">
…

(I didn’t yet put in the articles for vagga-sutta suttas as I figured it could be handled by the pending script; but I may as well add them in now if I’m going to revisit these suttas anyway).