Summary
I have just completed a major set of corrections and adjustments to the paragraphing of the four main nikayas. Previously our paragraphs simply inherited those of the Mahasangiti edition on which our Pali text is based. These are generally sane, but far from perfect.
Improvements have been made in the following main areas:
- Use standard conventions for dialogue: each speaker gets a new paragraph.
- Ensure proper semantic divisions of doctrinal passages.
- Ensure abbreviated passages present properly.
To this end, over 12,000 new paragraph tags have been added, and a few hundred removed.
In addition, I made a number of other adjustments:
- Prefer a comma rather than a colon to introduce direct speech. Exceptions include such things as quotes of doctrinal statements.
- Many minor corrections and adjustments along the way.
- Use "gentleman" for kulaputta.
Process
To get it done, I used a quirky feature of our PO files: since they contain HTML tags in the notes, they can be renamed, and with a couple of tweaks, work as HTML files. This allowed me to view the presentation as I worked. In addition, it allowed me to run HTML Tidy over all files, picking up a number of errors.
My aim was to produce something that would require as little further processing as possible.
Segment conventions
Trailing space is correct as is, in both English and Pali. All segments include trailing space, except where they end with an em-dash. (It seemed to me this is the simplest solution, since no processing is required.)
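For anyone who wants to check the trailing-space rule mechanically, here is a minimal sketch. It assumes each segment is already available as a plain string; the function name `trailing_ok` is made up for illustration and is not part of any existing tooling.

```python
EM_DASH = "\u2014"  # the em-dash character

def trailing_ok(segment: str) -> bool:
    """Return True if a segment follows the trailing-space convention.

    Assumption: a segment is valid when it ends with an em-dash,
    or otherwise ends with a space.
    """
    if segment.endswith(EM_DASH):
        return True
    return segment.endswith(" ")
```

A segment such as `"So I have heard. "` passes, while the same text without the final space is flagged.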
The text follows nilakkhana. There are only two cases of markup, both in English only. These will have to be transformed on export to HTML:
- For list markup: ~ → <li>
  - Let the list items self-close. There is no need to add <ol> tags; these are already present.
- For emphasis: \*(.*?)\* → <em>$1</em>
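The two substitutions above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `export_inline` is hypothetical, it treats every `~` in an English segment as a list marker, and it is not the actual export code. Note that the `$1` backreference is written `\1` in Python.

```python
import re

# Same pattern as in the post: non-greedy text between two asterisks.
EMPHASIS = re.compile(r"\*(.*?)\*")

def export_inline(text: str) -> str:
    """Apply the two export substitutions to one English segment."""
    # List markup: the tilde becomes an opening <li>.
    # The item is left unclosed, as described above.
    text = text.replace("~", "<li>")
    # Emphasis: *word* becomes <em>word</em>.
    return EMPHASIS.sub(r"<em>\1</em>", text)
```

For example, `export_inline("~The *first* item ")` gives `"<li>The <em>first</em> item "`.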
In line with the Vinaya, all meta-info for segments, i.e. every line starting with #, has an explicit type. Here are examples of each type:
#. HTML: </span>
# NOTE: See BB.
#. VAR: bhaddasāritīre → kaddamadahatīre (bj, s1-3, km, pts1)
#. REF: pts-vp-pli5.477, sc1
Notes:
- There may be zero or one instances of HTML, REF, and NOTE per segment.
- There may be multiple VAR.
- The sequence of types is not necessarily consistent.
- Only NOTE lacks a trailing period after the #.
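As a rough illustration of these rules, the following sketch parses typed meta lines and checks the constraints listed in the notes. The names `parse_meta` and `check_segment` are hypothetical, and the sketch assumes the meta lines of one segment have already been collected into a list of strings.

```python
import re
from collections import Counter

# '#', an optional period, a space, the type, a colon, then the value.
META = re.compile(r"^#(\.?) (HTML|NOTE|VAR|REF): (.*)$")

def parse_meta(line):
    """Split one meta line into (type, value), enforcing the period rule."""
    m = META.match(line)
    if m is None:
        raise ValueError(f"untyped or malformed meta line: {line!r}")
    dot, kind, value = m.groups()
    # Only NOTE lacks the period after '#'; every other type has it.
    if (kind == "NOTE") == bool(dot):
        raise ValueError(f"wrong prefix for {kind}: {line!r}")
    return kind, value

def check_segment(lines):
    """Parse all meta lines of one segment and check the count limits."""
    parsed = [parse_meta(line) for line in lines]
    counts = Counter(kind for kind, _ in parsed)
    # HTML, REF, and NOTE may appear at most once; VAR may repeat.
    for kind in ("HTML", "REF", "NOTE"):
        if counts[kind] > 1:
            raise ValueError(f"more than one {kind} in segment")
    return parsed
```

Because the sequence of types is not necessarily consistent, the sketch does not check ordering.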
File conventions
The HTML for each file is complete, and ends with the required close tags. There is no need to add close </blockquote>, </section>, etc.
Versions
The added and deleted paragraphs are not explicitly indicated in the final commit.
However, if at any point someone wishes to reconstruct the added <p> tags, they are indicated with class="added" in this commit.
To recover the deleted paragraph tags: the original sc numbers were added per paragraph. So for any segment that contains an sc number, but has no <p> (or other block-level tag), a <p> can be inferred.
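The inference rule for deleted paragraphs could be sketched like this. It rests on several assumptions that are not confirmed above: that the sc number appears in the segment's REF value in the form `sc1`, that the segment's HTML value is where a block-level tag would show up, and that the tag list in `BLOCK_TAG` covers the block-level tags actually used. The function name `can_infer_p` is hypothetical.

```python
import re

# An sc number such as "sc1" inside a REF value (assumed form).
SC_NUMBER = re.compile(r"\bsc\d+\b")
# Opening block-level tags; this list is an assumption, adjust as needed.
BLOCK_TAG = re.compile(
    r"<(p|h[1-6]|li|ol|ul|blockquote|div|section|article|header)\b",
    re.IGNORECASE,
)

def can_infer_p(ref, html):
    """Return True if a deleted <p> can be inferred for this segment.

    ref and html are the segment's REF and HTML values, or None if absent.
    """
    has_sc = ref is not None and SC_NUMBER.search(ref) is not None
    has_block = html is not None and BLOCK_TAG.search(html) is not None
    return has_sc and not has_block
```

A segment with REF `pts-vp-pli5.477, sc1` and no opening block-level tag in its HTML would be reported as a place where a `<p>` once stood.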