Notes on the segmentation of Pali Vinaya with Brahmali's translation

Brahmali · June 5, 2019, 4:12am

Sādhu! And yes, “it” is correct. I have fixed it up.

I have started to look at how well the English translation on Pootle matches with the Pali segmentation, and so far it’s looking very good. It’s still a bit early for me to give definitive feedback, but my initial impression is that the quality of your work is very high. I soon intend to go through it all to fix up any minor problems.

If you wish to take a break from this work, now might be a good time to do so. My impression is that @sabbamitta has the most spare time to commit to this. Since there is so little left to do, I would suggest it might be most efficient to let her finish off what remains. However, as always I am flexible. Please let me know if you see this differently.

Regardless, thank you so much for all your work on this. I know you have a lot of other things going on in your life, and I am truly grateful you have been able to spare time for this project. Hopefully it will be of benefit to many monastics, and perhaps laypeople too.

I wish you the very best in your practice of the Dhamma.

greenTara · June 5, 2019, 10:46am

Yes, I would like to take a little time off from the segmentation work. I will concentrate on learning Sinhalese for a while!

This has been very rewarding work. I look forward to future collaborations.

sabbamitta · June 5, 2019, 12:58pm

Kd 16 segments 124 ff:

A few segments seem to have no English translation:

124 Tena kho pana samayena saṃghassa sosāniko bundikābaddho mañco uppanno hoti.
126 “Anujānāmi, bhikkhave, bundikābaddhaṃ mañcan”ti.
127 Bundikābaddhaṃ pīṭhaṃ uppannaṃ hoti.
129 “Anujānāmi, bhikkhave, bundikābaddhaṃ pīṭhan”ti.

Brahmali · June 5, 2019, 2:23pm

Thanks, but this is actually on purpose. See my note in segment 118. It is hard to discern the difference between the māsaraka bed/bench and the bundikābaddha bed/bench, and so I have chosen to group them together.

But as always, thanks for bringing up any apparent irregularities.

sabbamitta · June 5, 2019, 3:05pm

I did in fact read this note, but didn’t realize that it refers to multiple segments. I was happily engaged in copying & pasting without checking each segment more closely. Then, when I arrived at the end of the chapter, there were still a few segments left in the Pali for which I had no translation. It took me between half an hour and an hour to figure out which segments had no translation… by then I had forgotten about the note in segment 118.

sabbamitta · June 7, 2019, 11:35am

Kd 16 segment 694
“When I was a young, I stepped over this banyan tree, keeping it between my thighs, and the top shoots touched by belly.”

… touched my belly.

sabbamitta · June 8, 2019, 12:35pm

Kd 16 segment 754
“a woolen rug with long fleece on both side”

… on both sides

Edited later:

Segment 1134
“How can the monks at Āḷavī put monks in charge of such kind of work?

It seems the closing quotation marks are missing in the English. Added.

Segment 1152
“They told the Master and he sais,”

… and he said

Brahmali · June 11, 2019, 12:24pm

Thanks so much @sabbamitta for all this. I thought I would just post this so that the system will allow you to keep adding new posts. These systems are so bossy. How can you get good work done when you are thwarted by technology?

sabbamitta · June 12, 2019, 8:52am

Thank you, Ajahn. I guess Discourse hasn’t been set up primarily as a workplace, but rather as a discussion forum; from that perspective there is not much point in people discussing with themselves, hence the restriction that the same user can’t post more than three times in a row.

Kd 22 segments 228, 229, and 230 could be candidates to be merged together (marked “needs work” in the latter two).

Segment 279
“Who speak according to the Teaching”

Who speaks…

Brahmali · June 12, 2019, 10:24am

Thanks, but “speak” is actually correct. The agent is plural, “monks”. I have changed it back to “speak”. It’s better to be a little bit too vigilant than a bit too slack!

sabbamitta · June 12, 2019, 10:32am

Ah, okay. This would be handled differently in German… or actually, “who” in English can probably be either singular or plural? This wasn’t clear to me.

In German this would be:

Wer spricht…? (“Who speaks…?”) Die Mönche sprechen. (“The monks speak.”)

Even if the answer is plural, the question would be singular.

So I drew the wrong conclusion.

Segment 329
“the Order should settle this legal issue in the place where is arose”

… where it arose

Segment 388
“Revato then informed the Order”

In all other instances there is “Revata” for the name of this elder → changed here too.

sabbamitta · June 12, 2019, 1:27pm

The same again for segments 446, 447, and 448.

Segment 547
“These ten practice are contrary to the Teaching”

These ten practices…

Segment 551
“Revato then asked Sabbakāmī about these ten practices in the midst of the Order”

Revata, like before

sabbamitta · June 12, 2019, 1:39pm

Hmm… hmm… I am afraid I have bad news this time:

Unfortunately I now have to quit my job as a copier & paster of Ajahn @Brahmali’s Vinaya translation… because… THERE IS NOTHING LEFT TO COPY & PASTE!!! (Except for what Tracy is still doing, but that’s not my department.)

Yes, even the second council—if it has in fact happened exactly the way it is reported—has come to an end. And I was just so immersed in the story that I had to finish it…

I love this little chit chat between Arahants:

“My friend, what is your main meditation?”

“My main meditation, Venerable, is good will.”

“Your meditation is noble, for good will is a noble meditation.”

“Formerly, too, when I was a lay person, I habitually practiced good will, and now it is my main meditation. Besides, I attained perfection long ago. But what is your main meditation?”

“My main meditation is emptiness.”

“Your meditation is that of a great man, for emptiness is the meditation of a great man.”

“Formerly, too, when I was a lay person, I habitually practiced emptiness, and now it is my main meditation. Besides, I attained perfection long ago.”

So, and now I am jobless… no, not really!

sujato · June 12, 2019, 10:44pm

OMG congratulations!

Brahmali · June 13, 2019, 11:53am

Sādhu! Sādhu!! Sādhu!!!

Thanks so much @sabbamitta for hanging in there until this was completed. You have been a stalwart of this project. You have been incredibly helpful over such a long period of time. Without your support, this would have taken much longer. Not only that, but now that I have reviewed much of your work (and that of your two excellent colleagues), I can say that it is all of a very high quality. It’s rather amazing you have been able to do this without much knowledge of Pali.

The input project is now almost complete. There is a little bit left for @tracy to do on Kd14, but she too is getting very close. In a few days’ time we will be ready for the next stage. I am not sure how we proceed from here. @sujato?

sujato · June 14, 2019, 12:24am

Just let me know when it is done, and I’ll do the next step.

What that will involve is essentially this:

1. Download the PO files from Pootle. From then on, no work, corrections, or anything else should be done on these texts on Pootle.
2. I will go over the files and massage them until they are all in a consistent and clean form:
   - Ensure markup is correct; deduplicate where necessary.
   - Deduplicate references and put them in data form (e.g., `<a class="sc" id="sc12"></a>` will become `sc12`).
   - Ensure all meta content is on separate, labelled lines in the PO files.
   - Ensure each kind of content is on one line per segment.
3. Adjust segmenting.
4. Adjust paragraphing.
5. Run automated tests to ensure data reliability.
6. Convert to JSON.

To describe this all in more detail, let me give an example from Kd 9:

```
#. HTML: </p><p>
#. REF: sc2
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr ""
"<p><a class=\"pts-cs\" id=\"Kd.9.1.2\" href=\"#Kd.9.1.2\">Kd.9.1.2</a><a "
"class=\"ms-pa\" id=\"MS.3.1775\" href=\"#MS.3.1775\">MS.3.1775</a>Soon "
"afterwards Kassapagotta thought,"
```

After I have processed the PO file (step 2), this will look like:

```
#. HTML: </p><p>
#. REF: sc2, kd.9.1.2, ms.3.1775
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr "Soon afterwards Kassapagotta thought,"
```

I think you can see that this is much cleaner and better organized.
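For anyone curious how the reference cleanup in step 2 could work mechanically, here is a minimal Python sketch. It assumes the PO continuation lines have already been joined and the `\"` escapes undone, and that every reference anchor has the `<a class="…" id="…" …>…</a>` shape seen in the Kd 9 example. The function name `clean_segment` is hypothetical and not part of any SuttaCentral tool.

```python
import re

# Reference anchors as they appear in the raw msgstr, e.g.
# <a class="pts-cs" id="Kd.9.1.2" href="#Kd.9.1.2">Kd.9.1.2</a>
ANCHOR_RE = re.compile(r'<a class="[^"]+" id="(?P<id>[^"]+)"[^>]*>.*?</a>')
# Paragraph tags that duplicate what the "#. HTML:" comment already records
PARA_RE = re.compile(r"</?p>")


def clean_segment(refs, msgstr):
    """Move anchor ids from msgstr into the REF list.

    Returns (merged_refs, plain_text). Ids are lowercased, and duplicates
    are dropped while keeping first-seen order.
    """
    merged = list(refs)
    for match in ANCHOR_RE.finditer(msgstr):
        ref = match.group("id").lower()
        if ref not in merged:
            merged.append(ref)
    text = PARA_RE.sub("", ANCHOR_RE.sub("", msgstr)).strip()
    return merged, text


raw = ('<p><a class="pts-cs" id="Kd.9.1.2" href="#Kd.9.1.2">Kd.9.1.2</a>'
       '<a class="ms-pa" id="MS.3.1775" href="#MS.3.1775">MS.3.1775</a>'
       'Soon afterwards Kassapagotta thought,')
refs, text = clean_segment(["sc2"], raw)
print("#. REF: " + ", ".join(refs))  # #. REF: sc2, kd.9.1.2, ms.3.1775
print(f'msgstr "{text}"')            # msgstr "Soon afterwards Kassapagotta thought,"
```

A regular expression is adequate here only because the anchors are machine-generated with a fixed shape; irregular markup would call for a proper HTML parser such as Python's `html.parser`.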

Next, I will review the segments to improve consistency and accuracy (step 3). In the PO files, any segments that have been marked as “needing work” will carry the fuzzy flag (`#, fuzzy`). I will go through all the fuzzy segments and resolve the problems. I’ll also check generally for consistency and coherence in segmenting.
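Collecting the fuzzy segments is easy to automate. The sketch below is a deliberately small, stdlib-only parser that assumes the cleaned layout from step 2 (entries separated by blank lines, `msgctxt` on a single line). The sample entries are invented for illustration. A full PO library such as polib, which offers `fuzzy_entries()`, would be the more robust choice.

```python
import re


def fuzzy_contexts(po_text):
    """Return the msgctxt of every PO entry carrying the "fuzzy" flag.

    Assumes entries are separated by blank lines, flags live on lines
    starting with "#,", and msgctxt fits on one line.
    """
    found = []
    for entry in re.split(r"\n\s*\n", po_text.strip()):
        lines = entry.splitlines()
        flags = set()
        for line in lines:
            if line.startswith("#,"):
                flags.update(f.strip() for f in line[2:].split(","))
        ctx = next((l for l in lines if l.startswith("msgctxt ")), None)
        if "fuzzy" in flags and ctx is not None:
            found.append(ctx[len("msgctxt "):].strip().strip('"'))
    return found


# Invented sample: the second entry stands for a segment marked "needs work".
sample = """\
#. REF: sc2
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr "Soon afterwards Kassapagotta thought,"

#, fuzzy
msgctxt "pli-tv-kd9:1.2.2"
msgid "(Pali segment)"
msgstr "(draft translation)"
"""
print(fuzzy_contexts(sample))  # ['pli-tv-kd9:1.2.2']
```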

The next stage will be to review the paragraphing, as I have recently done with the nikayas (step 4). To do this, I take advantage of a little quirk in the PO files: since they have HTML tags recorded as comments, with a few tweaks they can be made to render as actual HTML files! Then I can visually review the paragraph breaks by just opening the files in a browser. I will make the paragraphs conform to the normal rules; for example, each speaker in a dialogue passage gets a new paragraph. Generally speaking, the outcome will be a more finely articulated and readable text with shorter paragraphs; occasionally, however, it also means combining existing paragraphs.
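The rendering trick can be sketched in a few lines of Python. This assumes that each `#. HTML:` comment holds the markup that goes *before* its segment (as `</p><p>` does in the Kd 9 example) and that a closing `</p>` ends the file; both are assumptions, and the function name `po_to_html` and the sample segments are hypothetical.

```python
import html


def po_to_html(segments, title="Paragraph preview", trailer="</p>"):
    """Render (html_comment, text) pairs as a standalone HTML page.

    Each comment's markup is emitted before its segment text; segments
    are separated by a space, and `trailer` closes the last paragraph.
    """
    body = "".join(f"{tag}{html.escape(text, quote=False)} " for tag, text in segments)
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>{body.rstrip()}{trailer}</body></html>\n"
    )


# Hypothetical segments: the first two share a paragraph, the third starts a new one.
segments = [
    ("<p>", "Soon afterwards Kassapagotta thought,"),
    ("", "(next segment of the same paragraph)"),
    ("</p><p>", "(first segment of a new paragraph)"),
]
page = po_to_html(segments)
# Save `page` as e.g. preview.html and open it in a browser to check paragraph breaks.
print(page)
```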

Adjusting the segments throws the numbering of the segments out of whack. This does not affect the reference numbers, only the msgctxt, which is the universal key for all information associated with that segment. So at the end of the process, Blake will regenerate the msgctxt numbers to ensure that they are all correct, sequential, and free of duplicates (step 5). He will also run tests to ensure that the text remains exactly as it was before this process. We will also run tests to ensure that all the markup is valid and correct, and that all the reference numbers are sane (for example, checking whether any page reference numbers are omitted or doubled).
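The checks in step 5 can be illustrated with a small sketch. The renumbering rule here (keep everything before the last `.`, then count 1, 2, 3… within each run of equal prefixes) is a hypothetical scheme, since the exact method Blake uses isn't described. The reference check assumes a series that should run in unbroken order, such as page numbers.

```python
def renumber(ids):
    """Hypothetical scheme: keep the prefix before the last '.', then
    number segments 1, 2, 3, ... within each run of equal prefixes."""
    result, prev_prefix, n = [], None, 0
    for seg_id in ids:
        prefix = seg_id.rsplit(".", 1)[0]
        n = n + 1 if prefix == prev_prefix else 1
        prev_prefix = prefix
        result.append(f"{prefix}.{n}")
    return result


def normalized(segments):
    """Join segment texts and collapse whitespace, so splitting or
    merging segments does not count as a change to the text."""
    return " ".join(" ".join(segments).split())


def text_unchanged(before, after):
    return normalized(before) == normalized(after)


def series_problems(numbers):
    """Report doubled, omitted, or out-of-order numbers in a reference
    series assumed to increase by one in document order."""
    problems = []
    for a, b in zip(numbers, numbers[1:]):
        if b == a:
            problems.append(f"doubled: {b}")
        elif b > a + 1:
            problems.append(f"omitted: {a + 1} to {b - 1}")
        elif b < a:
            problems.append(f"out of order: {b} after {a}")
    return problems


ids = ["pli-tv-kd9:1.2.1", "pli-tv-kd9:1.2.1", "pli-tv-kd9:1.2.3", "pli-tv-kd9:1.3.1"]
print(renumber(ids))
# ['pli-tv-kd9:1.2.1', 'pli-tv-kd9:1.2.2', 'pli-tv-kd9:1.2.3', 'pli-tv-kd9:1.3.1']
print(text_unchanged(["Soon afterwards", "Kassapagotta thought,"],
                     ["Soon afterwards Kassapagotta thought,"]))  # True
print(series_problems([1775, 1776, 1776, 1779]))
# ['doubled: 1776', 'omitted: 1777 to 1778']
```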

Up to now, we are still working with PO files, and they can, in principle, be re-uploaded to Pootle for further editing and so on. However, the aim is to move on to Bilara, so the next step is to adjust the data for Bilara (step 6). If the preparation work has been done well, this will be an automated process, merely duplicating the process that is being done at the moment for the nikayas. It will split the PO data into separate JSON files. Currently, the PO files hold everything in the same file: original text, translation, segment ID, reference numbers, HTML markup, variant readings, and comments, as well as PO-specific information. Keeping all of this straight in the same file is ridiculous. So the idea is to separate it cleanly by data type so that it may be recombined at will, all coordinated by the universal ID supplied by the segment number (which in PO is called msgctxt).
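A minimal sketch of the split, assuming each PO entry has already been read into a Python dict (a hypothetical in-memory form) and using the msgctxt as the key in every output file. The one-folder-per-layer layout is only loosely modeled on bilara-data; the real repository uses paths like `source/pli/ms/…` and `translation/en/sujato/…`, and stores markup as HTML.

```python
import json
from pathlib import Path


def split_layers(entries):
    """Split PO-style entries into one {segment_id: value} dict per data type.

    Each entry is a dict with msgctxt, msgid, msgstr and optional
    refs, html and comment keys.
    """
    layers = {"root": {}, "translation": {}, "reference": {}, "markup": {}, "comment": {}}
    for e in entries:
        key = e["msgctxt"]  # the universal segment ID ties all layers together
        layers["root"][key] = e["msgid"]
        layers["translation"][key] = e["msgstr"]
        if e.get("refs"):
            layers["reference"][key] = ", ".join(e["refs"])
        if e.get("html"):
            layers["markup"][key] = e["html"]
        if e.get("comment"):
            layers["comment"][key] = e["comment"]
    return layers


def write_layers(layers, out_dir, stem):
    """Write each layer to out_dir/<layer>/<stem>.json as UTF-8 JSON."""
    for name, data in layers.items():
        path = Path(out_dir) / name / f"{stem}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")


entry = {
    "msgctxt": "pli-tv-kd9:1.2.1",
    "msgid": "Atha kho kassapagottassa bhikkhuno etadahosi—",
    "msgstr": "Soon afterwards Kassapagotta thought,",
    "refs": ["sc2", "kd.9.1.2", "ms.3.1775"],
    "html": "</p><p>",
}
layers = split_layers([entry])
print(json.dumps(layers["reference"], indent=2))
```

`ensure_ascii=False` keeps diacritics such as ā and ṃ readable in the JSON instead of turning them into `\u` escapes.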

To see how beautiful these look, check out SN 1.1.

Root text:

https://github.com/suttacentral/bilara-data/blob/master/source/pli/ms/sn/sn1/sn1.1.json

Translation:

https://github.com/suttacentral/bilara-data/blob/master/translation/en/sujato/sn/sn1/sn1.1.json

References:

https://github.com/suttacentral/bilara-data/blob/master/reference/sn/sn1/sn1.1.json

In the markup files, we have the HTML skeleton, fleshed out with the ID numbers.

https://github.com/suttacentral/bilara-data/blob/master/markup/sn/sn1/sn1.1.html

By abstracting and separating concerns like this, we can combine these things across any language. The same set of references will work in Pali, English, Italian, or Thai. The same HTML markup will apply. If we like, we can apply comments across the different languages. None of this has been possible before, because the relevant data was embedded in a file and couldn’t be transferred from one context to another except by hand, which is exactly what you folks have been doing these past months. Now that you’ve done it, no one else will have to. Yay!
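Recombining is then just a join on the segment ID. In this sketch (function name and sample data hypothetical) the root text drives the order, and a segment missing from the translation yields an empty string, so an intentional gap, like the untranslated bundikābaddha segments in Kd 16, stays visible instead of shifting everything after it.

```python
def combine(root, translation, markup=None):
    """Line up root text, a translation and markup by shared segment ID.

    Order follows the root text. Segments missing from the translation
    or markup get an empty string.
    """
    markup = markup or {}
    return [
        (markup.get(seg_id, ""), seg_id, pali, translation.get(seg_id, ""))
        for seg_id, pali in root.items()
    ]


root = {
    "pli-tv-kd9:1.2.1": "Atha kho kassapagottassa bhikkhuno etadahosi—",
    "pli-tv-kd9:1.2.2": "(next Pali segment)",
}
english = {"pli-tv-kd9:1.2.1": "Soon afterwards Kassapagotta thought,"}
markup = {"pli-tv-kd9:1.2.1": "</p><p>"}

for tag, seg_id, pali, eng in combine(root, english, markup):
    print(f"{seg_id}\t{tag}\t{pali}\t{eng}")
```

Passing an Italian or Thai dict keyed by the same IDs in place of `english` reuses the root text, references, and markup unchanged.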

I would estimate roughly a month to complete the above process.

tracy · June 14, 2019, 4:18am

Shiny!!! Hooray data organization.

I hope to finish this weekend, definitely will within the next week.

Brahmali · June 14, 2019, 8:51am

I assume you will be doing the entire Vinaya Piṭaka in one go. If so, this would mean no editing for the duration of one month, right?

The original text that I have entered on Notepad is already formatted in this way. Many (all?) of the paragraph breaks I inserted in the plain text file have been kept in the Pootle version. Most of the time all you need to do is to make use of the html paragraph tags to recreate paragraph breaks at the right place.

So once the month of processing is over, I may continue the editing on Bilara?

I see what you mean.

And because there is nothing more to do, saṃsāra comes to an end.

sabbamitta · June 14, 2019, 9:11am

Huhh… wow! I never thought we could end Saṁsāra by means of copy & paste…

sujato · June 14, 2019, 9:53pm

That’s correct.

Oh, excellent, that makes things much easier. In that case, I will just do a brief review of the paragraphing. In any case, so long as it is generally okay, it can always be adjusted later; it is, after all, a matter of presentation rather than content.