A brief note on section numbering for DN and MN

sujato · September 19, 2017, 3:10am

On reviewing the reference numbers for DN and MN, I find there are a few problems, and have made some changes.

1. In DN, the section numbers have been incorrectly marked "wp", i.e. “Wisdom Publications edition”. In fact the section numbers derive ultimately from the PTS Pali edition, and should be marked "pts-cs", i.e. PTS chapter and section. This class is already implemented in our lists of editions in paragraph-num.scss. These section numbers were subsequently adopted by multiple editions, including the Wisdom one. I have made these changes to the PO file.
2. I think we should also change the label for the "wp" numbering in MN. While this section numbering is best known from Ven Bodhi’s Wisdom edition, it was in fact introduced by Ven Nyanamoli, as I can confirm from the scanned images of his translation. This system was, I think, first published in Khantipalo’s A Treasury of the Buddha’s Words. So I think it would be best to call it the “Nyanamoli” numbering, class="nya". This class will have to be added to Next.
3. Throughout, the PTS vol/page numbers have been marked inadequately as "pts". We should use the more specific "pts-pi-vp", i.e. PTS Pali vol/page. Again, this class is already implemented, and I have made the changes to the PO file. This change might affect the text-image displayer.
4. Some suttas in MN are missing a few wp numbers. I have fixed some, but there may be more. We should do some integrity testing on these, for example to ensure they are all sequential.
5. A number of suttas in DN lack these numbers altogether: dn1, dn2, dn20, dn31, dn32. I will add them myself to the PO file.
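For suttas that lack these numbers altogether, a first integrity test could simply list the texts that have no anchor of the expected class at all. A minimal sketch in Python, assuming the PO text of each sutta has already been loaded into a dict keyed by uid; the function names are hypothetical:

```python
import re

def natural_key(uid):
    """Split "dn20" into ["dn", 20, ""] so that dn2 sorts before dn10."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", uid)]

def suttas_missing_class(po_texts, css_class):
    """Return the uids whose PO text has no anchor with exactly this class.

    po_texts: hypothetical dict mapping a sutta uid (e.g. "dn1") to its PO text.
    css_class: e.g. "pts-cs" for DN or "nya" for MN.
    """
    # Include the closing quote so that "pts" does not also match "pts-cs".
    pattern = re.compile(r'class="%s"' % re.escape(css_class))
    missing = [uid for uid, text in po_texts.items() if not pattern.search(text)]
    return sorted(missing, key=natural_key)
```

Running it once per collection with the relevant class (pts-cs for DN, nya for MN) gives a list of texts to look at by hand.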

Note that these numbers will be used as the basis for the canonical segment numbers, and we should take care to make sure they’re correct.

The corrected versions will be used for Next; there is no need to change the old HTML files.

Vimala · September 19, 2017, 7:49am

I understand that you have already done most of these points in Pootle.
Here is a dump of the errors found in MN.
checkwp.txt.zip (512 Bytes)

This lists the sutta number and the wp numbers that are out of sequence.
So if it says:

mn4.html
8
27

It means that wp7 and wp26 are most likely missing in mn4, so that needs to be checked. It will also flag an error if the preceding number is not an integer, for instance if it is a range, but these are worth checking anyway.

Sometimes you will see something like:

mn108.html
9
8

This means that wp9 appears before wp8 in mn108, so that needs to be changed.
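A check of this kind can be sketched as follows. This is not Vimala’s script, just a rough Python approximation under some assumptions: anchors look like `<a class="wp" id="wp14">`, each number should be one more than the previous one, and, unlike the dump described above, a range such as `4-10` is treated as a single step (so the next expected number is 11). Any other non-integer id is flagged for a manual look.

```python
import re

# Captures the number part of anchors such as <a class="wp" id="wp14">.
WP_ANCHOR = re.compile(r'<a class="wp" id="wp([^"]+)"')
# A single number ("14") or a range ("4-10", hyphen or en dash).
NUMBER_OR_RANGE = re.compile(r"^(\d+)(?:[-\u2013](\d+))?$")

def out_of_sequence(po_text):
    """Return the wp ids that do not follow on from the previous one."""
    flagged = []
    previous = None  # last number covered so far
    for raw in WP_ANCHOR.findall(po_text):
        match = NUMBER_OR_RANGE.match(raw)
        if match is None:
            flagged.append(raw)  # not a number or range: check by hand
            previous = None      # restart the comparison after it
            continue
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if previous is not None and start != previous + 1:
            flagged.append(raw)
        previous = end
    return flagged
```

Printing the sutta name followed by each flagged value, one per line, gives output of the same shape as the mn4 example.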

In Pootle, make sure that the wp numbers are mentioned as:

#. <a class="wp" id="wp14">

That is, as a comment above the relevant segment, not inside the segment; otherwise it messes up the markup that is used in SC-Next.
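Misplaced anchors could be caught mechanically as well. A minimal sketch, assuming wp anchors should only ever appear on `#.` comment lines; inside a msgid or msgstr the quotes are escaped, which is why the pattern allows an optional backslash:

```python
import re

# class="wp" in a comment, or class=\"wp\" once escaped inside a msgid/msgstr string.
WP_CLASS = re.compile(r'class=\\?"wp\\?"')

def misplaced_wp_anchors(po_text):
    """Return (line_number, line) pairs for wp anchors not on a '#.' comment line."""
    problems = []
    for lineno, line in enumerate(po_text.splitlines(), start=1):
        if WP_CLASS.search(line) and not line.lstrip().startswith("#."):
            problems.append((lineno, line))
    return problems
```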

sujato · September 19, 2017, 7:50am

Cool, very excellent. I’ll work on these today.

Vimala · September 19, 2017, 7:51am

Just remember: it’s a rough Python dump, and I did not check all the entries against the HTML files. They are just flags to show that something is out of sequence at those points.

Vimala · September 19, 2017, 7:55am

Just noticed you added an extra point in there (number 2) that I had not seen before. I think this is best done with a regex in the .po files, which can then be uploaded to Pootle again. But we have to talk about all these things, because I’m not sure where everything is at, what I can and cannot do, and where.
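The regex approach might look something like this. It is a sketch only: the rename table reflects the points in the first post, and it assumes the id prefix changes along with the class (e.g. `id="wp14"` becoming `id="nya14"`), which the thread does not actually confirm. Matching the full `class="..."` value keeps `class="pts"` from also catching `class="pts-cs"`.

```python
import re

# Hypothetical rename table per collection: (old class, new class).
RENAMES = {
    "dn": [("wp", "pts-cs"), ("pts", "pts-pi-vp")],
    "mn": [("wp", "nya"), ("pts", "pts-pi-vp")],
}

def rename_classes(po_text, collection):
    """Rename anchor classes (and, by assumption, id prefixes) in PO comments."""
    for old, new in RENAMES.get(collection, []):
        pattern = r'<a class="%s" id="%s([^"]*)"' % (re.escape(old), re.escape(old))
        po_text = re.sub(
            pattern,
            lambda m, new=new: '<a class="%s" id="%s%s"' % (new, new, m.group(1)),
            po_text,
        )
    return po_text
```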

sujato · September 19, 2017, 7:58am

As to point 2, I have done this in the PO files, so it is a matter of updating the code. IIRC there are three places it needs doing. Anyway, don’t worry, I’ll do it; I wrote those classes in the first place, so it’s my mess to clean up!

frankk · September 19, 2017, 3:49pm

I guess it didn’t digest its prey very well?

sujato · September 20, 2017, 2:32am

@vimala, could I ask you to run this again?

I have located and corrected a bunch of errors in MN and DN, and it would be nice to do an additional verification. The classes are pts-cs in DN and nya in MN; neither of these classes is used in AN or SN.

One thing I noticed in the previous run: I think it didn’t properly handle ranges (e.g. 4-10), so it generated a bunch of false positives.

Here’s the latest version.

4nikayas.txt.zip (4.4 MB)

Vimala · September 20, 2017, 5:43am

Actually, the word “python” does not come from the snake, but from Monty Python (just thought I’d throw that bit of info in to confuse you).

Vimala · September 20, 2017, 6:01am

This is not a simple “run it again”, I’m afraid. Everything is in one huge .txt file?
There are several problems with this file. For instance:

> #. </p><h1> #msgctxt "dn1:2.1"
> msgid "Brahmajālasutta"
> msgstr "The Prime Net"
> 3
> #. </h1></div><h2> #msgctxt "dn1:3.1"
> msgid "1. Paribbājakakathā"
> msgstr "Talk on Wanderers"

First of all, I have no idea where the “3” between the lines comes from, and the msgctxt lines are wrongly marked up (not valid .po). It should be:

> #. </p><h1>
> msgctxt "dn1:2.1"
> msgid "Brahmajālasutta"
> msgstr "The Prime Net"
> 
> #. </h1></div><h2> 
> msgctxt "dn1:3.1"
> msgid "1. Paribbājakakathā"
> msgstr "Talk on Wanderers"
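The run-on `#msgctxt` could be moved onto its own line with a regex. A sketch, assuming the problem always has the shape shown above (a `#.` comment followed on the same line by `#msgctxt "..."`); the stray “3” is a separate issue that this does not touch:

```python
import re

# A "#." comment line with '#msgctxt "..."' run onto the end of it.
INLINE_MSGCTXT = re.compile(r'^(#\..*?)[ \t]*#(msgctxt "[^"]*")[ \t]*$', re.MULTILINE)

def split_inline_msgctxt(po_text):
    """Move a run-on '#msgctxt "..."' onto its own line as a proper msgctxt."""
    return INLINE_MSGCTXT.sub(r"\1\n\2", po_text)
```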

Another issue with the comments is lines like this:

#. <a class="sc" id="sc9"></a><a class="pts-vp-pi" id="pts-vp-pi1.153"></a> # RD’s translation of this passage is much more satisfactory than MW’s.

Which should be:

#. <a class="sc" id="sc9"></a><a class="pts-vp-pi" id="pts-vp-pi1.153"></a>
#. RD’s translation of this passage is much more satisfactory than MW’s.

That is, on a new line with #. in front of it, or alternatively on the same line but without the #, unless this is something that Blake has devised and filters out in the markup. I myself have coded such things as:

#. <a class="sc" id="sc9"></a><a class="pts-vp-pi" id="pts-vp-pi1.153"></a>
#. RD’s translation of this passage is much more satisfactory than MW’s.
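Trailing notes of this kind could be split onto their own `#.` line in a similar way. A sketch, assuming the note always follows the last `</a>` of an anchor comment and is separated from it by `# ` (hash and space):

```python
import re

# An anchor comment line with a note appended after " # ".
TRAILING_NOTE = re.compile(r"^(#\. .*?</a>)[ \t]+# (.+)$", re.MULTILINE)

def split_trailing_notes(po_text):
    """Move a ' # note' at the end of an anchor comment onto its own '#.' line."""
    return TRAILING_NOTE.sub(r"\1\n#. \2", po_text)
```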

I’ll split it into .po files and then check the numbers. And no, it did not handle the ranges at all, as I mentioned above.

But I will see if I can filter those out.
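Splitting the combined file back into per-sutta files can be keyed on the msgctxt, since its prefix before the colon is the sutta uid. A minimal sketch, assuming entries are separated by blank lines and each msgctxt is already on its own line; entries without a msgctxt (such as PO headers) are skipped and would need to be regenerated:

```python
import re

# Captures the uid part of a msgctxt, e.g. "dn1:2.1" -> "dn1".
MSGCTXT_UID = re.compile(r'^msgctxt "([^":]+):', re.MULTILINE)

def split_by_sutta(big_text):
    """Group blank-line-separated PO entries by the sutta uid in their msgctxt.

    Returns {uid: po_text}, in the order the suttas first appear.
    """
    files = {}
    for entry in re.split(r"\n\s*\n", big_text.strip()):
        match = MSGCTXT_UID.search(entry)
        if match is None:
            continue  # e.g. a PO header
        files.setdefault(match.group(1), []).append(entry)
    return {uid: "\n\n".join(entries) + "\n" for uid, entries in files.items()}
```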

sujato · September 20, 2017, 9:30am

If it’s a hassle, just leave it, we can sort it out with Blake when he gets back.

Vimala · September 20, 2017, 9:47am

I’m nearly done with the list, but was sidetracked by setting up the development environment in an Ubuntu Bash shell inside Windows 10 (which failed).

Vimala · September 20, 2017, 12:19pm

Voila!
pocheck.txt.zip (882 Bytes)
Happy to help with this

sujato · September 20, 2017, 10:33pm

Excellent, thanks so much.

blake · September 26, 2017, 6:53am

Getting on the same page here:

I would favor removing ALL paragraph-number-type things from the .po files and mapping them using the msgctxt. The point at which they are re-inserted could vary: the obvious points are when processing the po files (integrating them into the markup), or alternatively when activating textual information (sent as a separate JSON payload and spliced in using JavaScript).

A volume/page mapping JSON file might look a lot like this:

{
    "class": "sc",
    "id_prefix": "sc",
    "description": "SuttaCentral Paragraph Number",
    "mapping": {
        "dn1:4.1": "1",
        "dn1:5.1": "2",
        "dn1:6.1": "3",
        ...
        "an11.992:7.1": "3"
    }
}

Thoughts?
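To make the processing-time option concrete, here is a rough Python sketch that splices anchors into segments using a mapping of the shape above. The segment list format and the function name are assumptions, and the `...` in the example would have to be replaced by real entries for it to parse as JSON:

```python
import json

def splice_anchors(segments, mapping_json):
    """Prefix each mapped segment with its paragraph-number anchor.

    segments: list of (msgctxt, html) pairs in document order (assumed format).
    mapping_json: JSON text with "class", "id_prefix" and "mapping" keys.
    """
    spec = json.loads(mapping_json)
    out = []
    for ctx, html in segments:
        number = spec["mapping"].get(ctx)
        if number is not None:
            anchor = '<a class="%s" id="%s%s"></a>' % (spec["class"], spec["id_prefix"], number)
            html = anchor + html
        out.append((ctx, html))
    return out
```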

sujato · September 26, 2017, 6:56am

Oh absolutely, that would be perfect. The trick is, how do we keep it aligned when the segment numbers change? Ultimately we should aim to keep the segment numbers as stable as possible, but there will be some fluidity …

blake · September 26, 2017, 7:00am

I would initially key it to the sc numbers in the original texts, then from that to the msgctxts as they are now.

If you want something firmer then we could map it using sc numbers all the way, but I assume in the not-so-long run the msgctxts will become “set in stone”.
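Keying first to the sc numbers and then to the msgctxts amounts to composing two lookups. A sketch, with both input dicts assumed (hypothetically) to have been extracted already; numbers that cannot be resolved are returned separately rather than silently dropped:

```python
def compose_mappings(number_to_sc, sc_to_msgctxt):
    """Chain numbering id -> sc paragraph -> current msgctxt.

    Returns (msgctxt -> number mapping, list of unresolved numbers).
    """
    composed, unresolved = {}, []
    for number, sc in number_to_sc.items():
        ctx = sc_to_msgctxt.get(sc)
        if ctx is None:
            unresolved.append(number)
        else:
            composed[ctx] = number
    return composed, unresolved
```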

sujato · September 26, 2017, 8:32am

In DN it needs to be keyed to the pts-cs numbers, and in MN to the nya numbers; elsewhere it should be keyed to the sc numbers, except in the Vinaya, which is also keyed to the pts-cs numbers.
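That rule is simple enough to encode directly. A sketch; the `pli-tv-` prefix used here to recognise Vinaya texts is a hypothetical placeholder, not something stated in this thread:

```python
import re

VINAYA_PREFIX = "pli-tv-"  # hypothetical uid prefix for Vinaya texts

def key_class(uid):
    """Return the numbering class a text's mapping is keyed to."""
    if re.match(r"dn\d", uid) or uid.startswith(VINAYA_PREFIX):
        return "pts-cs"
    if re.match(r"mn\d", uid):
        return "nya"
    return "sc"
```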

Yes, this really needs to be the aim. So it’s just a matter of managing it in the meantime. I suspect that over the next while, I’ll keep noticing some segmenting issues and wanting to fix them, but it will get less and less. Eventually it will get close to zero, but we should probably assume that at any point we may want to change them. But the numbers involved will be so small that adjusting them by hand will be doable, as long as there is a clear way of doing it.
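One possible clear way of making the hand adjustments is a small rename table from old to new msgctxt, applied to the mapping. A sketch under that assumption; it refuses to let two numbers land on the same segment:

```python
def apply_segment_renames(mapping, renames):
    """Re-key a msgctxt -> number mapping after segments are renumbered.

    renames: hand-written dict of old msgctxt -> new msgctxt.
    Raises ValueError if two numbers would end up on the same segment.
    """
    updated = {}
    for ctx, number in mapping.items():
        new_ctx = renames.get(ctx, ctx)
        if new_ctx in updated:
            raise ValueError("segment %s would receive two numbers" % new_ctx)
        updated[new_ctx] = number
    return updated
```

Because the result is built from scratch, swaps (renaming a to b and b to a in the same table) work without a temporary key.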