I am currently working on the PO files for Brahmali’s Vinaya translation. I have made some adjustments to the way the PO is handled in line with:
Here are the corrected files:
pootle-corrections-sep-2018.zip (2.6 MB)
- All Pali and Brahmali translations up to Pc 50.
- All Pali only texts for the remainder of the Vinaya.
- Detailed reference info up to Pc 50.
- Incomplete ref info for some Khandhakas.
- A few Sutta texts that also need updating.
- A set of corrections to sutta numberings, based on sanity checking of msgctxt numbers. The folder “corrected numbers” contains both the corrected files and the sanity test with notes.
Things I have done
- The basis of numbering is the pts-cs numbers of the PTS edition. Segments subdivide these.
- However, the notation pts-cs has been removed from the segment numbers. It is assumed.
- Generally speaking, the msgctxt numbers in these files do correctly follow the pts-cs numbers. This should be preserved as much as possible. Nonetheless, when the English is added it will also contain the pts-cs numbers. Where there are two sets, they can be checked against each other for accuracy. And if a number is missing from one, it can be supplied from the other. For this reason, ensure that all numbers are preserved for now. We can reconcile them later.
- Unnumbered text is assigned zeroth numbers. This is usually for headings or front matter that is not assigned a number in the pts-cs scheme. In such cases we insert a 0 so we can count the numbers without affecting the pts-cs count. For example, beginning the Aniyata rules we have pli-tv-bu-vb-ay1:0.1. This zeroth level continues up to
- Segments that are in the translation but not the Pali text are assigned an ID suffix a (e.g. pli-tv-bu-vb-ss6:3.1.0a), or in some cases, also b. This is used for extra headings. In such cases the segment is blank in the Pali. In a few cases I have adjusted and simplified these from Brahmali: it is best to have as few extra segments as possible.
- The reference numbers included entries with the class pts-cs-empty. These have been deleted. They were originally added as a display convenience.
- Explicit markers have been added for each kind of segment metadata. The form of the marker follows the current #. VAR: for variant readings. Thus we have:
  - #. HTML: = HTML markup
  - #. VAR: = variant reading
  - #. REF: = reference
  - # NOTE: = note by Brahmali
- Each of these kinds of metadata normally takes one line only. The exceptions are notes and variant readings, which occasionally occur more than once per segment. These are genuine cases of two or more notes, or two or more variant readings, on a single segment, so they should be retained.
- Note that # NOTE: lacks a trailing period after the hash. I think this is how to get the notes to work as comments in Pootle.
- The REF numbers are stripped of the HTML trappings and made consistent. They are comma separated.
- For some reason the word Ānanda appeared multiple times in the reference data. It has been removed.
- Sometimes the reference numbers and HTML structural markup were not cleanly separated. They have now been separated.
- The refs inconsistently used ms-pa. I have changed these all to ms. They represent the numbers in the Mahasangiti edition (as distinct from msdiv), which is the primary system in that edition.
- Use markdown link syntax for cross-references. This has [square brackets for the displayed text](followed by round brackets for the ID reference). The original uses two tags for this, one of them <ref>, but both can be treated the same way. Example:
  To be expanded as in [Relinquishment 1, paragraphs 13–17](pli-tv-bu-vb-np1#13), with appropriate substitutions
- The original included rule counts at the end of the text. These were of the form <em>42</em>:<strong>91</strong>. The first number is a rule count in that class of rules, the second the total rule count. These had been assigned a separate segment. In fact, there is no need for them at all. I have deleted them and their segments.
- Certain other numbers had also been assigned separate segments. But we should only give segments to genuine text, so I have moved the numbers into the previous
- All HTML/XML style markup has been removed from the text and translation. Instead we use the markdown-style conventions as defined in Nilakkhana. Things actually used in the text are:
  - *abc*: emphasis
  - _abc_: Pali text quoted in another language
  - **abc**: strong emphasis = <strong> (I think this is only in the unfinished Khandhaka texts and will end up being replaced.)
  - #: numbers found in text
  - «abc»: a note in the text identifying the speaker
  - [link text](link ID): link for cross reference, etc.
- The reference numbers have been separated and keyed off the segment numbers. These are in a separate CSV file. I have retained the REF data in the PO files, and made all corrections in both places.
- I have used the new and simpler HTML “starter” code, which leaves off everything before
- I have added the missing text from Ss 13. This required re-segmenting it from the beginning.
- DN 6 fixes a numbering issue.
- AN 4.106 is a ghost text. We supply a file to explain that it is missing.
- The AN Elevens got seriously borked on Pootle; I have corrected them.
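The marker scheme described in the list above can be illustrated with a small sketch. It assumes the comment lines of a single PO entry have already been gathered into a list of strings; the function name `parse_segment_comments` and all sample values are made up for illustration and are not taken from the actual files. In gettext terms, lines starting with `#.` are extracted comments and lines starting with `# ` are translator comments, which fits the reason given above for leaving the period off `# NOTE:`.

```python
from collections import defaultdict

# Marker prefixes as listed in the post, mapped to short keys.
# The keys ("html", "var", ...) are a hypothetical naming choice.
MARKERS = {
    "#. HTML:": "html",
    "#. VAR:": "var",
    "#. REF:": "ref",
    "# NOTE:": "note",
}


def parse_segment_comments(lines):
    """Group the marker-tagged comment lines of one PO entry by kind.

    Every kind maps to a list, because notes and variant readings can
    legitimately occur more than once per segment.
    """
    meta = defaultdict(list)
    for line in lines:
        for prefix, kind in MARKERS.items():
            if line.startswith(prefix):
                value = line[len(prefix):].strip()
                if kind == "ref":
                    # REF values are comma separated.
                    meta[kind].extend(
                        v.strip() for v in value.split(",") if v.strip()
                    )
                else:
                    meta[kind].append(value)
                break
    return dict(meta)


# Invented sample data, only to show the shape of the result.
sample = [
    "#. HTML: <p>",
    "#. REF: ms1, pts-vp-pli3.1",
    "#. VAR: first reading",
    "#. VAR: second reading",
    "# NOTE: a translator note",
]
parsed = parse_segment_comments(sample)
```

Keeping a list per kind means multiple notes or variant readings on one segment survive a round trip rather than being overwritten.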
Things to be done
- Each set of reference numbers must be checked for sanity and completeness. See “Correcting the segment numbers” below.
- msgctxt numbers must be re-incremented, as certain segments have been removed or merged.
- Check that NOTE displays as comments in Pootle.
- Upload all texts to Pootle and check that they work.
- Export texts to the site.
These things can be left until the entire collection is done.
- Check and ensure heading levels are correct and consistent.
- Semantic labelling of headings and sections needs to be made consistent.
About the reference data
Here are the reference types we have, and what they mean.
- pts-cs-seg: The pts-cs numbers, with an added level to subdivide for the segments.
- pts-cs: The chapter and section numbers for the PTS Pali (and English) edition. These need to be checked to ensure they match up properly with the pts-cs-seg numbers. Once done, the pts-cs can be deleted, as they are redundant.
- sc: The SuttaCentral paragraph numbers. These are created from the ms numbers, adjusted so that they are a simple increment from the start of each sutta (i.e. an HTML file on SC). As with the pts-cs numbers, they are redundant, so we should check that they match correctly with the ms numbers, then delete them.
- ms: These are the primary reference system of the Mahasangiti edition. They should be retained. Once we are confident that they are correct, we can use them as the key to import the remaining ms reference data for multiple editions, which is found here.
- pts-vp-pli: Volume/page for the PTS Pali edition.
- pts-vp-en: Volume/page for the PTS English edition.
- msdiv: They are from the Mahasangiti edition, and equal the paragraph numbering in the VRI source text. Only in Ss 13.
Correcting the segment numbers
Sometimes the numbering of extra segments is incorrect. The basic principle is that no extra segment should interfere with the numbering of the original text. Now, in our edition we have cases like this:
Added headings using zeroth numbers
This is fine. The extra heading appears as the numbering restarts, and is accommodated with a zero. Zero level numbers can, of course, be added in the Mahasangiti itself, as the heading levels are not incorporated as numbered sections. So adding the -a suffix makes it explicit that this is in the translation alone.
Added segments in sequence
Sometimes the sequence does not restart yet we have an added segment, again usually a heading.
Here the extra segment has the same number as the previous, with an added letter suffix. This is also correct.
Added segments that mess the sequence
However, sometimes we have cases where the extra segments mess with the numbers:
This is incorrect; it should be:
So we shall have to test for such cases and fix them.
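A test of this kind might look something like the following sketch. It is only one reading of the principle that no extra segment should interfere with the numbering of the original text, and it rests on several assumptions: segment IDs have the form uid:numbers with an optional single-letter suffix, a suffixed segment is acceptable if it is a zeroth number or repeats the number of the preceding original segment, and original segments under the same parent number should increase by one. The function names and the `example` IDs are hypothetical.

```python
import re

# Assumed ID shape: "<uid>:<n>(.<n>)*" plus an optional letter suffix,
# e.g. "pli-tv-bu-vb-ss6:3.1.0a".
SEG_RE = re.compile(r"^(?P<uid>[^:]+):(?P<nums>\d+(?:\.\d+)*)(?P<suffix>[a-z]?)$")


def parse_id(seg_id):
    """Split a segment ID into (uid, tuple of ints, suffix letter or '')."""
    m = SEG_RE.match(seg_id)
    if m is None:
        raise ValueError(f"unrecognised segment ID: {seg_id}")
    nums = tuple(int(n) for n in m.group("nums").split("."))
    return m.group("uid"), nums, m.group("suffix")


def find_sequence_problems(seg_ids):
    """Return IDs that look like they disturb the original numbering.

    Assumes all IDs come from one file, in document order.
    """
    problems = []
    prev = None  # number tuple of the last original (unsuffixed) segment
    for seg_id in seg_ids:
        _, nums, suffix = parse_id(seg_id)
        if suffix:
            # Extra segment: fine if zeroth, or if it reuses the previous number.
            if nums[-1] != 0 and nums != prev:
                problems.append(seg_id)
            continue
        # Original segment: within the same parent, expect previous + 1.
        if prev is not None and nums[:-1] == prev[:-1] and nums[-1] != prev[-1] + 1:
            problems.append(seg_id)
        prev = nums
    return problems


ok = ["example:1.1", "example:1.2", "example:1.2a", "example:1.3", "example:2.0a", "example:2.1"]
bad = ["example:1.1", "example:1.2a", "example:1.3"]
ok_result = find_sequence_problems(ok)
bad_result = find_sequence_problems(bad)
```

In the `bad` list the extra segment has taken a number of its own, so both it and the following original segment are reported. The output is meant as a list of candidates for manual review, not an automatic fix.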