Updated!
I am currently working on the PO files for Brahmali’s Vinaya translation. I have made some adjustments to the way the PO files are handled, in line with:
@Brahmali @blake @sabbamitta @greenTara
Here are the corrected files:
pootle-corrections-sep-2018.zip (2.6 MB)
- All Pali and Brahmali translations up to Pc 50.
- All Pali only texts for the remainder of the Vinaya.
- Detailed reference info up to Pc 50.
- Incomplete ref info for some Khandhakas.
- A few Sutta texts that also need updating.
- A set of corrections to sutta numberings, based on sanity checking of `msgctxt` numbers. The folder “corrected numbers” contains both the corrected files and the sanity test with notes.
Things I have done
- The basis of numbering is the `pts-cs` numbers of the PTS edition. Segments subdivide these.
- However, the notation `pts-cs` has been removed from the segment numbers. It is assumed.
- Generally speaking, the `msgctxt` numbers in these files do correctly follow the `pts-cs` numbers. This should be preserved as much as possible. Nonetheless, when the English is added it will also contain the `pts-cs` numbers. Where there are two sets, they can be checked against each other for accuracy. And if a number is missing from one, it can be supplied from the other. For this reason, ensure that all numbers are preserved for now. We can reconcile them later.
- Unnumbered text is assigned zeroth numbers. This is usually for headings or front matter that is not assigned a number in the `pts-cs` scheme. In such cases we insert a 0 so we can count the numbers without affecting the `pts-cs` count. For example, at the beginning of the Aniyata rules we have `pli-tv-bu-vb-ay1:0.1`. This zeroth level continues up to `pli-tv-bu-vb-ay1:0.5`.
- Segments that are in the translation but not in the Pali text are assigned an ID suffix `a` (e.g. `pli-tv-bu-vb-ss6:3.1.0a`), or in some cases also `b`. This is used for extra headings. In such cases the segment is blank in the Pali. In a few cases I have adjusted and simplified Brahmali’s versions: it is best to have as few extra segments as possible.
- Reference numbers included the class `pts-cs-empty`. These have been deleted. They were originally added as a display convenience.
- Explicit markers have been added for each kind of segment metadata. The form of the marker follows the current `#. VAR:` for variant readings. Thus we have:
    - `#. HTML:` = HTML markup
    - `#. VAR:` = variant reading
    - `#. REF:` = reference
    - `# NOTE:` = note by Brahmali
- Each of these kinds of metadata normally takes one line only. An exception is in the case of notes and variant readings, which occasionally have more than one per segment. These are genuine cases where they have two or more notes, or two or more variant readings per segment, so should be retained.
- Note that `# NOTE:` lacks the period after the hash. I think this is how to get the notes to work as comments in Pootle.
- The REF numbers are stripped of the HTML trappings and made consistent. They are comma separated.
- For some reason the word Ānanda appeared multiple times in the reference data. It has been removed.
- Sometimes the reference numbers and HTML structural markup were not cleanly separated. I have now separated them.
- The refs inconsistently used both `wt-pa` and `ms-pa`. I have changed these all to `ms`. They represent the `ms` (not `msdiv`) numbers in the Mahasangiti edition, which is the primary system in that edition.
- Cross-references use markdown link syntax: `[square brackets for the displayed text](followed by round brackets for the ID reference)`. The original uses both `class="cr"` and `<ref>`, but both can be treated the same way. Example: `To be expanded as in [Relinquishment 1, paragraphs 13–17](pli-tv-bu-vb-np1#13), with appropriate substitutions`
- The original included rule counts at the end of the text. These were of the form `<em>42</em>:<strong>91</strong>`. The first number is a rule count in that class of rules, the second the total rule count. These had been assigned a separate segment. In fact, there is no need for them at all. I have deleted them and their segments.
- Certain other numbers had also been assigned separate segments. But we should only give segments to genuine text, so I have moved the numbers into the previous `msgid`.
- All HTML/XML style markup has been removed from the text and translation. Instead we use the markdown-style conventions as defined in Nilakkhana. Things actually used in the text are:
    - `*abc*`: emphasis (= `<em>`)
    - `_abc_`: Pali text quoted in another language (= `<i lang="pli">`)
    - `**abc**`: strong emphasis (= `<strong>`). I think this is only in the unfinished Khandhaka texts and will end up being replaced.
    - `#`: numbers found in the text (= `.counter`)
    - `«abc»`: a note in the text identifying the speaker (= `.speaker`; once only!)
    - `[link text](link ID)`: link for cross-reference, etc. (usually = `<a class="cr">`)
- The reference numbers have been separated and keyed off the segment numbers. These are in a separate CSV file. I have retained the REF data in the PO files, and made all corrections in both places.
- I have used the new and simpler HTML “starter” code, which leaves off everything before `<section>`.
- I have added the missing text from Ss 13. This required re-segmenting it from the beginning.
- DN 6: I have fixed a numbering issue.
- AN 4.106 is a ghost text. We supply a file to explain that it is missing.
- The AN Elevens got seriously borked on Pootle; I have corrected them.
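To make the metadata conventions above concrete, here is a minimal Python sketch that groups the comment lines of one PO entry by marker and splits the comma-separated REF data. It is only an illustration under stated assumptions: the helper name `parse_entry_comments` and its dict output are hypothetical, it assumes each comment line begins with one of the four markers exactly as written above, and it is not the script used to prepare these files.

```python
from collections import defaultdict

# Markers as described above. NOTE uses "# " (no period after the hash),
# while the other kinds use "#. ".
MARKERS = {
    "#. HTML:": "html",
    "#. VAR:": "var",
    "#. REF:": "ref",
    "# NOTE:": "note",
}


def parse_entry_comments(lines):
    """Group the comment lines of one PO entry by metadata kind (hypothetical helper).

    Returns a dict mapping "html", "var", "ref" and "note" to lists of values.
    Every kind is stored as a list because VAR and NOTE can genuinely occur
    more than once per segment. Lines without a known marker are ignored.
    """
    meta = defaultdict(list)
    for line in lines:
        for marker, kind in MARKERS.items():
            if line.startswith(marker):
                value = line[len(marker):].strip()
                if kind == "ref":
                    # REF data is comma separated: store each reference separately.
                    meta[kind].extend(r.strip() for r in value.split(",") if r.strip())
                else:
                    meta[kind].append(value)
                break
    return dict(meta)


# Usage with made-up values (the reference IDs here are placeholders).
entry = [
    "#. HTML: <p>{}",
    "#. REF: ref-a, ref-b",
    "#. VAR: first variant",
    "#. VAR: second variant",
    "# NOTE: A translator's note.",
]
print(parse_entry_comments(entry))
```

Keeping VAR and NOTE as lists means a segment with two genuine notes or variants loses nothing.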
Things to be done
Blake
- Each set of reference numbers must be checked for sanity and completeness. See “Correcting the segment numbers” below.
- The `msgctxt` numbers must be re-incremented, as certain segments have been removed or merged.
- Check that NOTE displays as comments in Pootle.
- Upload all texts to Pootle and check that they work.
- Export texts to the site.
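The re-incrementing of the `msgctxt` numbers could be scripted along these lines. This is a hypothetical sketch, not an existing tool: `reincrement` assumes IDs are given in file order, that only the final level is a segment counter (the higher levels follow `pts-cs` and are left alone), that segment counters start at 1 under each parent, and that letter-suffixed segments take the number of the segment before them (0 if they come first, as in `1.0a`).

```python
import re
from collections import defaultdict

# Final path component: digits plus an optional lowercase suffix, e.g. "35" or "35a".
LAST_RE = re.compile(r"^(\d+)([a-z]?)$")


def reincrement(ids):
    """Renumber the final (segment) level of msgctxt IDs (hypothetical helper).

    Assumes IDs look like "<text-uid>:<n>.<n>...<n>[suffix]" and are in file
    order. Higher levels are never changed; only the last component is
    renumbered, restarting from 1 under each parent path.
    """
    counters = defaultdict(int)  # (uid, parent levels) -> last segment number used
    result = []
    for seg_id in ids:
        uid, path = seg_id.split(":", 1)
        *parents, last = path.split(".")
        match = LAST_RE.match(last)
        if match is None:
            raise ValueError(f"unexpected segment ID: {seg_id}")
        suffix = match.group(2)
        key = (uid, tuple(parents))
        if not suffix:
            # Plain segment: next number in the sequence.
            counters[key] += 1
        # Suffixed segment: reuse the current number and keep its letter.
        new_last = f"{counters[key]}{suffix}"
        result.append(f"{uid}:{'.'.join(parents + [new_last])}")
    return result
```

For example, `reincrement(["x:1.1", "x:1.2", "x:1.4"])` returns `["x:1.1", "x:1.2", "x:1.3"]`. Because a suffixed segment always reuses the previous number, the same pass also normalizes extra headings to the `1.42a` pattern.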
Later
These things can be left until the entire collection is done.
- Check and ensure heading levels are correct and consistent.
- Semantic labelling of headings and sections needs to be made consistent.
About the reference data
Here are the reference types we have, and what they mean.
- `pts-cs-seg`: The `pts-cs` numbers, with an added level to subdivide them into segments.
- `pts-cs`: The chapter and section numbers for the PTS Pali (and English) edition. These need to be checked to ensure they match up properly with the `pts-cs-seg` numbers. Once that is done, the `pts-cs` numbers can be deleted, as they are redundant.
- `sc`: The SuttaCentral paragraph numbers. These are created from the `ms` numbers, adjusted so that they are a simple increment from the start of each sutta (i.e. each HTML file on SC). As with the `pts-cs` numbers, they are redundant, so we should check that they match correctly with the `ms` numbers, then delete them.
- `ms`: These are the primary reference system of the Mahasangiti edition. They should be retained. Once we are confident that they are correct, we can use them as the key to import the remaining `ms` reference data for multiple editions, which is found here.
- `pts-vp-pli`: Volume/page for the PTS Pali edition.
- `pts-vp-en`: Volume/page for the PTS English edition.
- `msdiv`: These are from the Mahasangiti edition, and equal the paragraph numbering in the VRI source text. They occur only in Ss 13.
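The check of `sc` against `ms` could be automated roughly as follows. This sketch rests on assumptions that need confirming against the real CSV: that both numbers can be reduced to plain integers, that rows are in document order, and that within each text `sc` should equal `ms` minus the first `ms` number in that text, plus one. The function `check_sc_against_ms` is hypothetical.

```python
from itertools import groupby


def check_sc_against_ms(rows):
    """Report rows whose sc number is not the expected offset of the ms number.

    Hypothetical helper. Each row is (text_uid, sc, ms) with sc and ms as
    integers. Rows are grouped by consecutive text_uid, so they must already
    be in document order. Returns (text_uid, sc, ms, expected_sc) tuples.
    """
    problems = []
    for uid, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        first_ms = group[0][2]
        for _, sc, ms in group:
            # sc is assumed to count from 1 at the start of each text.
            expected = ms - first_ms + 1
            if sc != expected:
                problems.append((uid, sc, ms, expected))
    return problems
```

For rows `("a", 1, 10), ("a", 2, 11), ("a", 4, 12)` it reports `("a", 4, 12, 3)`. Only once such a check comes back empty would it be safe to drop the `sc` column.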
Correcting the segment numbers
Sometimes the numbering of extra segments is incorrect. The basic principle is that no extra segment should interfere with the numbering of the original text. Now, in our edition we have cases like this:
Added headings using zeroth numbers
pli-tv-bu-vb-ay1:0.6
pli-tv-bu-vb-ay1:1.0a
pli-tv-bu-vb-ay1:1.1
This is fine. The extra heading appears where the numbering restarts, and is accommodated with a zero. Zero-level numbers can, of course, also be added in the Mahasangiti text itself, as its heading levels are not incorporated as numbered sections. So adding the `a` suffix makes it explicit that this segment is in the translation alone.
Added segments in sequence
Sometimes the sequence does not restart yet we have an added segment, again usually a heading.
pli-tv-bu-vb-ay1:1.35
pli-tv-bu-vb-ay1:1.35a
pli-tv-bu-vb-ay1:1.36
Here the extra segment has the same number as the previous, with an added letter suffix. This is also correct.
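Both acceptable patterns above only sort correctly if each level is compared as a number and a letter suffix sorts just after its bare number; plain string sorting would, for instance, put `1.10` before `1.2`. Here is a hypothetical sort key, assuming IDs of the form `<text uid>:<n>.<n>...` with an optional lowercase suffix on any level. The text UID itself is compared as a plain string.

```python
import re

# One level of the path: digits plus an optional lowercase suffix.
_PART_RE = re.compile(r"^(\d+)([a-z]*)$")


def segment_sort_key(seg_id):
    """Hypothetical sort key for IDs such as "pli-tv-bu-vb-ay1:1.35a".

    Each level becomes (number, suffix), so "1.35" < "1.35a" < "1.36"
    and "0.6" < "1.0a" < "1.1".
    """
    uid, path = seg_id.split(":", 1)
    parts = []
    for part in path.split("."):
        match = _PART_RE.match(part)
        if match is None:
            raise ValueError(f"unexpected segment ID: {seg_id}")
        parts.append((int(match.group(1)), match.group(2)))
    return (uid, tuple(parts))


ids = ["pli-tv-bu-vb-ay1:1.36", "pli-tv-bu-vb-ay1:1.0a", "pli-tv-bu-vb-ay1:1.35a",
       "pli-tv-bu-vb-ay1:0.6", "pli-tv-bu-vb-ay1:1.35", "pli-tv-bu-vb-ay1:1.1"]
print(sorted(ids, key=segment_sort_key))
```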
Added segments that mess up the sequence
However, sometimes we have cases where the extra segments mess with the numbers:
pli-tv-bu-vb-np9:1.42
pli-tv-bu-vb-np9:1.43a
pli-tv-bu-vb-np9:1.44
This is incorrect; it should be:
pli-tv-bu-vb-np9:1.42
pli-tv-bu-vb-np9:1.42a
pli-tv-bu-vb-np9:1.43
So we shall have to test for such cases and fix them.
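One way to test for such cases is sketched below. It is an illustration under assumptions, not the sanity test included in the zip: `find_sequence_problems` is hypothetical, expects IDs in file order, and applies two rules within each text and parent level. A suffixed segment must reuse the previous segment's number (0 if it comes first), and a plain segment must be exactly one more than the previous one.

```python
import re

# Final path component: digits plus an optional lowercase suffix.
_LAST_RE = re.compile(r"^(\d+)([a-z]?)$")


def find_sequence_problems(ids):
    """Flag segment IDs that break the numbering sequence (hypothetical check).

    Returns (segment_id, expected_number) pairs; expected_number is None
    when the final component is not in the assumed "<digits>[letter]" form.
    """
    last_seen = {}  # (uid, parent levels) -> last segment number actually seen
    problems = []
    for seg_id in ids:
        uid, path = seg_id.split(":", 1)
        *parents, last = path.split(".")
        match = _LAST_RE.match(last)
        if match is None:
            problems.append((seg_id, None))
            continue
        number, suffix = int(match.group(1)), match.group(2)
        key = (uid, tuple(parents))
        previous = last_seen.get(key, 0)
        expected = previous if suffix else previous + 1
        if number != expected:
            problems.append((seg_id, expected))
        # Track the actual number, so one bad segment is reported once
        # rather than cascading through everything after it.
        last_seen[key] = number
    return problems
```

For `["n:1.1", "n:1.2", "n:1.3a", "n:1.4"]` it reports `("n:1.3a", 2)`, the same kind of error as `np9:1.43a` above. It also flags plain gaps, which are to be expected until the `msgctxt` numbers have been re-incremented.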