SuttaCentral

Notes on preparing the segmented Vinaya files


#1

These are ongoing notes arising out of preparing the segmented Vinaya files.

Process

This is what I do, in case something goes horribly wrong and it needs to be done again!

  • Download all Pali Vinaya files from pootle (incl. Parivara and pm); the aim is to make sure everything is consistent.
  • remove Pootle header metadata.
  • update old-style HTML: class=hgroup, section, etc.
  • remove \ and \n.
  • put all msgid and msgstr on one line.
  • Sometimes a note is duplicated as HTML; these duplicates are removed.
  • where a segment has two notes, these are combined into one.
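
For anyone who has to redo this, the "one line" step could be scripted roughly like this. It is only a sketch, not the script actually used: the name join_po_strings is hypothetical, and it assumes that "put all msgid and msgstr on one line" means merging gettext continuation lines into one quoted string, and that "remove \n" means dropping the literal \n escape.

```python
import re

# Hypothetical helper, not the script actually used for the Vinaya files.
KEYWORD = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"\s*$')
CONTINUATION = re.compile(r'^"(.*)"\s*$')


def join_po_strings(text: str) -> str:
    """Put every msgctxt/msgid/msgstr on a single line."""
    out = []        # finished output lines
    current = None  # [keyword, accumulated string] while a field is open

    def flush():
        nonlocal current
        if current is not None:
            # Assumption: "remove \n" means dropping the literal two-character
            # escape sequence \n from the joined string.
            value = current[1].replace("\\n", "")
            out.append(f'{current[0]} "{value}"')
            current = None

    for line in text.splitlines():
        keyword = KEYWORD.match(line)
        continuation = CONTINUATION.match(line)
        if keyword:
            flush()
            current = [keyword.group(1), keyword.group(2)]
        elif continuation and current is not None:
            current[1] += continuation.group(1)
        else:
            # Comments and blank lines pass through unchanged.
            flush()
            out.append(line)
    flush()
    return "\n".join(out)
```

Comment lines (#.) and blank lines are left untouched, so the HTML in the comments is not affected.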

Notes

  • Vibhanga seems pretty clean!
  • Vibhanga uses nilakkhana. Ext links indicated with [](abc)
  • Internal references in Notes should be converted to segment numbers.

Questions

  • Heading segment numbers sometimes end in 0a
  • Remember, final close tags are missing!
  • segment ID pts-cs-bu-pj1 etc. seems excessive. Normally the context of a segment number is inferred. Should it not just be: pts-cs1?

Sometimes headings are added in translation that are not in MS. I am thinking we should use an x-notation to indicate all such cases. It will happen a lot once we start adding different editions, so it should be explicit and extensible. I am thinking instead of:

#. HTML: </h1></header><h2>
msgctxt "pli-tv-bu-vb-pj3:1.1.0a"
msgid ""
msgstr "Origin story"

#. HTML: </h2><h3>
msgctxt "pli-tv-bu-vb-pj3:1.1.0b"
msgid ""
msgstr "First sub-story"

We might have:

#. HTML: </h1></header><h2>
msgctxt "pli-tv-bu-vb-pj3:1.1.x.0.1"
msgid ""
msgstr "Origin story"

#. HTML: </h2><h3>
msgctxt "pli-tv-bu-vb-pj3:1.1.x.0.2"
msgid ""
msgstr "First sub-story"

Where:

  • pli-tv-bu-vb-pj3 = the UID, i.e. what comes before the colon; it indicates the general context.
  • :1.1 indicates the context in the text.
  • x indicates that this is an “extra” segment compared to the MS reference edition.
  • 0 indicates a heading.
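
To make the proposal a bit more concrete, here is a small sketch of how such an ID could be taken apart. The names SegmentId and parse_segment_id are hypothetical, and the sketch assumes that the number after the 0 is a running counter for the extra headings, which the list above does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentId:
    uid: str                # what comes before the colon
    context: str            # position in the text, e.g. "1.1"
    extra: bool             # True if the ID carries the "x" marker
    heading: bool           # True if the part after "x" starts with 0
    counter: Optional[int]  # assumed: running number of the extra segment


def parse_segment_id(msgctxt: str) -> SegmentId:
    uid, _, number = msgctxt.partition(":")
    parts = number.split(".")
    if "x" not in parts:
        # Ordinary segment: the whole number is the context.
        return SegmentId(uid, number, False, False, None)
    i = parts.index("x")
    tail = parts[i + 1:]
    heading = bool(tail) and tail[0] == "0"
    counter = int(tail[-1]) if len(tail) > 1 else None
    return SegmentId(uid, ".".join(parts[:i]), True, heading, counter)


seg = parse_segment_id("pli-tv-bu-vb-pj3:1.1.x.0.2")
print(seg.uid, seg.context, seg.extra, seg.heading, seg.counter)
```

Because the x is a separate dot-delimited part, a tool that only cares about the reference edition can skip every ID containing it.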

#2

This may come from the way we first started the segmentation work, still without entering the translated text. These were the initial instructions given to Tara and myself by Venerable Vimala:

Whatever plain text editor you normally use is fine. If you get it to use the syntax for ‘gettext’ (as opposed to html or anything else) for this it will probably show you nice colors.

The best way to work I find is to open up the pali and the english files with the same name side by side.

It is the pali file that will need a few changes in order to be used as a template for our pootle translation system. The only thing that is needed is to change the numbers (msgctxt) to the correct pts-cs numbers, which are to be found in the English translations only.

First some terminology:

This is a segment:

#. <html><head><meta charset="UTF-8"><meta author="Bhikkhu Brahmali"></head><body><div id="text"><section class="sutta" id="pli-tv-kd11"><article><div class="hgroup"><p class="collection">
msgctxt "pli-tv-kd11:0.1"
msgid "Theravāda Vinayapiṭaka"
msgstr ""

It consists of 4 parts:

  1. #. = anything after this is a comment for use in pootle. So the html is in here, but sometimes also other notes. In the English translation the pts-cs numbers can be found in here. I will come back to that later.

  2. msgctxt = message context number, always starting with the file name ‘pli-tv-kd11’ followed by a colon and a number. It is this number that needs to change. I’ve already changed it for all the first three segments in the files.

  3. msgid = pali text (or English in the english translation file). This you can use for comparison to see where a section with the correct pts-cs number starts.

  4. msgstr “” = this is where Ajahn Brahmali’s translation will go at a later stage. We keep that blank for now.
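
As an illustration of the four parts, the following sketch splits one segment into its comment, msgctxt, msgid and msgstr. The function name parse_segment is hypothetical, and it assumes straight double quotes and one line per field, as in the .po files themselves.

```python
import re

FIELD = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"$')


def parse_segment(block: str) -> dict:
    """Split one single-line-per-field segment into its four parts."""
    segment = {"comment": [], "msgctxt": None, "msgid": None, "msgstr": None}
    for line in block.strip().splitlines():
        line = line.strip()
        if line.startswith("#."):
            # Part 1: the comment, which carries the HTML.
            segment["comment"].append(line[2:].strip())
            continue
        m = FIELD.match(line)
        if m:
            # Parts 2-4: msgctxt, msgid, msgstr.
            segment[m.group(1)] = m.group(2)
    return segment


example = '''
#. <html><head><meta charset="UTF-8"><meta author="Bhikkhu Brahmali"></head><body><div id="text"><section class="sutta" id="pli-tv-kd11"><article><div class="hgroup"><p class="collection">
msgctxt "pli-tv-kd11:0.1"
msgid "Theravāda Vinayapiṭaka"
msgstr ""
'''
seg = parse_segment(example)
print(seg["msgctxt"], "|", seg["msgid"], "|", repr(seg["msgstr"]))
```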

The very first segment, the one that starts with ‘#Translation Template For SuttaCentral’, you can just leave as it is.

Let’s take this file pli-tv-kd11 as an example.

When we search for ‘pts-cs’ in the English file, the first occurrence is in line 40:

#. </h2><p> <a class="pts-cs" id="Kd.11.1.1" href="#Kd.11.1.1"></a> <a class="pts-vp-en" id="bd5.1"></a>
msgctxt "pli-tv-kd11:6.1"
msgid "At one time the Awakened One, the Lord was staying at Sāvatthī in the Jeta Grove in Anāthapiṇḍika’s monastery."
msgstr ""

The number for the pts-cs is given as “11.1.1”. The first number refers to the number of the file, so we disregard that and use ‘1.1’ for our base number.
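
Pulling the base number out of the comment could be sketched like this. The function name base_number is hypothetical, and the pattern assumes the anchor always looks like the one above: class before id, straight quotes, and an id of the form Kd.<file number>.<rest>.

```python
import re
from typing import Optional

PTS_CS = re.compile(r'class="pts-cs"\s+id="([^"]+)"')


def base_number(comment: str) -> Optional[str]:
    """Return the pts-cs number without the 'Kd' prefix and file number."""
    m = PTS_CS.search(comment)
    if m is None:
        return None                # this segment does not start a new section
    parts = m.group(1).split(".")  # "Kd.11.1.1" -> ["Kd", "11", "1", "1"]
    return ".".join(parts[2:])     # drop "Kd" and the file number


comment = '#. </h2><p> <a class="pts-cs" id="Kd.11.1.1" href="#Kd.11.1.1"></a> <a class="pts-vp-en" id="bd5.1"></a>'
print(base_number(comment))  # prints 1.1
```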

Then we have a look at the text “At one time the Awakened One, the Lord was staying at Sāvatthī in the Jeta Grove in Anāthapiṇḍika’s monastery.”. This corresponds to the text in the pali segment (also at line 40):

#. </p><p><a class="sc" id="2"></a><a class="pts-vp-pli" id="pts-vp-pli2.1"></a>
msgctxt "pli-tv-kd11:5.1"
msgid "Tena samayena buddho bhagavā sāvatthiyaṃ viharati jetavane anāthapiṇḍikassa ārāme."
msgstr ""

So now we know which segment in English corresponds to which segment in Pali and which number to start using.

So in the pali, the msgctxt becomes pli-tv-kd11:1.1.1

Then the next segment gets number pli-tv-kd11:1.1.2, etc. until the next pts-cs number is reached.

This is at line 65 in the English:

#. </p><p> <a class="pts-cs" id="Kd.11.1.2" href="#Kd.11.1.2"></a>
msgctxt "pli-tv-kd11:7.1"
msgid "Those who were modest monks looked down upon, criticized, spread it about, saying:"
msgstr ""

Which corresponds to the Pali in line 74:

msgctxt "pli-tv-kd11:5.9"
msgid "Ye te bhikkhū appicchā …"
msgstr ""

So from here on, the numbers become:

msgctxt "pli-tv-kd11:1.2.1"

msgctxt "pli-tv-kd11:1.2.2"

etc.

All the segments in the top header get numbers pli-tv-kd11:0.1, pli-tv-kd11:0.2, etc.
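
The numbering scheme described so far can be sketched as follows. This is an illustration only, not a tool that exists: renumber and its arguments are hypothetical, the indices in the usage example are made up, and the caller is assumed to have already worked out by hand (by comparing with the English file) at which pali segment each pts-cs number starts. As a simplification, everything before the first pts-cs number is treated as header.

```python
def renumber(uid, n_segments, starts):
    """Return the new msgctxt values for n_segments consecutive segments.

    starts maps the index of the segment where a pts-cs section begins
    to its base number, e.g. {3: "1.1", 5: "1.2"}.
    """
    result = []
    base, counter = "0", 0  # header segments are numbered 0.1, 0.2, ...
    for i in range(n_segments):
        if i in starts:
            # A new pts-cs number is reached: switch base and restart at 1.
            base, counter = starts[i], 0
        counter += 1
        result.append(f"{uid}:{base}.{counter}")
    return result


# Made-up example: 3 header segments, then sections 1.1 and 1.2.
for msgctxt in renumber("pli-tv-kd11", 6, {3: "1.1", 5: "1.2"}):
    print(msgctxt)
```

Only the start of each section has to be identified by hand; the counter within a section follows automatically.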

Below I’ve given the worked out example of this for the first lines.

NOTE: the English segments and the Pali segments do not always correspond completely. Sometimes there is one pali segment while the English has 4 or 5, or vice versa. Only the start of the section with the next pts-cs number is important.

One thing I find very helpful when I am not sure which translation corresponds to which Pali is to go to the website SuttaCentral and turn on the pali -> english lookup tool in the settings. Then hover over a word to see the english translation according to the dictionary.

If you want to know approximately where you are on that page, also turn on Textual information. The SC numbers in the margin correspond to the sc numbers in the pali .po file like this: <a class="sc" id="2"></a> = SC 2

Maybe @greenTara can say something about the instances with an “a” in the heading number?


#3

%0a is the URL encoding of a linefeed, so perhaps this is just some artefact from the imported source text?
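
The encoding itself is easy to check with Python's standard library. Whether the 0a in the heading numbers really has this origin remains a guess.

```python
from urllib.parse import unquote

decoded = unquote("%0a")  # percent-decoding of the byte 0x0A
print(repr(decoded))      # prints '\n'
```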