Notes on the segmentation of Pali Vinaya with Brahmali's translation

Ah, okay. This would be handled differently in German… or actually, “who” in English can probably be both, singular or plural? This wasn’t clear to me.

In German this would be:

Wer spricht… ? — Die Mönche sprechen.

Even if the answer is plural, the question would be singular.

So I drew the wrong conclusion. :grin:


Segment 329
“the Order should settle this legal issue in the place where is arose”

… where it arose :white_check_mark:


Segment 388
“Revato then informed the Order”

In all other instances there is “Revata” for the name of this elder → changed here too. :white_check_mark:

2 Likes

The same again for segments 446, 447, and 448.


Segment 547
“These ten practice are contrary to the Teaching”

These ten practices :white_check_mark:


Segment 551
“Revato then asked Sabbakāmī about these ten practices in the midst of the Order”

Revata, like before :white_check_mark:

1 Like

Hmm… hmm… I am afraid I have bad news this time:

Unfortunately I now have to quit my job as a copier & paster of Ajahn @Brahmali’s Vinaya translation… because… THERE IS NOTHING LEFT TO COPY & PASTE!!! (Except for what Tracy is still doing, but that’s not my department :grin:)

Yes, even the second council—if it has in fact happened exactly the way it is reported—has come to an end. And I was just so immersed in the story that I had to finish it… :laughing: :tada:

I love this little chit chat between Arahants:

“My friend, what is your main meditation?”

“My main meditation, Venerable, is good will.”

“Your meditation is noble, for good will is a noble meditation.”

“Formerly, too, when I was a lay person, I habitually practiced good will, and now it is my main meditation. Besides, I attained perfection long ago. But what is your main meditation?”

“My main meditation is emptiness.”

“Your meditation is that of a great man, for emptiness is the meditation of a great man.”

“Formerly, too, when I was a lay person, I habitually practiced emptiness, and now it is my main meditation. Besides, I attained perfection long ago.”

:meditation: :meditation:

So, and now I am jobless… no, not really! :blush:

12 Likes

OMG congratulations!

8 Likes

Sādhu! Sādhu!! Sādhu!!!

Thanks so much @sabbamitta for hanging in there until this was completed. You have been a stalwart of this project. You have been incredibly helpful over such a long period of time. Without your support, this would have taken much longer. Not only that, but now that I have reviewed much of your work (and that of your two excellent colleagues), I can say that it is all of a very high quality. It’s rather amazing you have been able to do this without much knowledge of Pali.

The input project is now almost complete. There is a little bit left for @tracy to do on Kd14, but she too is getting very close. In a few days’ time we will be ready for the next stage. I am not sure how we proceed from here. @sujato?

10 Likes

Just let me know when it is done, and I’ll do the next step.

What that will involve is essentially this:

  1. Download the PO files from Pootle. From then on, no work, corrections, or anything should be done on these texts on Pootle.
  2. I will go over the files and massage them until they are all in a consistent and clean form:
    • Ensure markup is correct, deduplicate where necessary.
    • Deduplicate references and put them in data form (eg, <a class="sc" id="sc12"></a> will become sc12)
    • Ensure all meta content is on separate and labelled lines in PO files.
    • Ensure each kind of content is on one line per segment.
  3. Adjust segmenting
  4. Adjust paragraphing
  5. Run automated tests to ensure data reliability
  6. Convert to JSON.

To describe this all in more detail, let me give an example from Kd 9:

#. HTML: </p><p>
#. REF: sc2
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr ""
"<p><a class=\"pts-cs\" id=\"Kd.9.1.2\" href=\"#Kd.9.1.2\">Kd.9.1.2</a><a "
"class=\"ms-pa\" id=\"MS.3.1775\" href=\"#MS.3.1775\">MS.3.1775</a>Soon "
"afterwards Kassapagotta thought,"

When I have processed the PO file (step 2) this will look like:

#. HTML: </p><p>
#. REF: sc2, kd.9.1.2, ms.3.1775
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr "Soon afterwards Kassapagotta thought,"

Which I think you can see is much cleaner and better organized.
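As a rough illustration of that clean-up (step 2), here is a minimal Python sketch. It is not the actual script; the function name clean_segment and the regular expression are assumptions made for this example, and it assumes the inline anchors always have the class/id/href shape shown above.

```python
import re

# Matches inline anchors such as
# <a class="pts-cs" id="Kd.9.1.2" href="#Kd.9.1.2">Kd.9.1.2</a>
ANCHOR = re.compile(r'<a\s+class="[^"]*"\s+id="([^"]+)"[^>]*>.*?</a>')


def clean_segment(refs, msgstr):
    """Move inline reference anchors out of msgstr and into the REF list."""
    found = [ref.lower() for ref in ANCHOR.findall(msgstr)]
    text = ANCHOR.sub("", msgstr)
    # Paragraph tags are assumed to be recorded already in the "#. HTML:" comment.
    text = re.sub(r"</?p>", "", text).strip()
    merged = list(refs)
    for ref in found:
        if ref not in merged:  # deduplicate references
            merged.append(ref)
    return merged, text


refs, text = clean_segment(
    ["sc2"],
    '<p><a class="pts-cs" id="Kd.9.1.2" href="#Kd.9.1.2">Kd.9.1.2</a>'
    '<a class="ms-pa" id="MS.3.1775" href="#MS.3.1775">MS.3.1775</a>'
    "Soon afterwards Kassapagotta thought,",
)
print("#. REF: " + ", ".join(refs))
print(f'msgstr "{text}"')
```

Run on the Kd 9 segment, this prints the REF and msgstr lines in the cleaned form shown above.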

Next, I will review the segments to improve consistency and accuracy (step 3). In the PO files, any segments that have been marked as “needing work” will carry the flag “#, fuzzy”. So I go through all the fuzzy segments and resolve problems. I’ll also check generally for consistency and coherence in segmenting.
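For anyone curious, listing the flagged segments could look something like the sketch below. It assumes the standard gettext convention, where the flag sits on its own line as "#, fuzzy", and that entries are separated by blank lines. The function name fuzzy_ids, the first sample entry, and its "..." placeholders are invented for illustration.

```python
import re


def fuzzy_ids(po_text):
    """Return the msgctxt of every entry that carries the gettext fuzzy flag."""
    ids = []
    for entry in re.split(r"\n\s*\n", po_text.strip()):
        flagged = any(
            line.startswith("#,") and "fuzzy" in line
            for line in entry.splitlines()
        )
        ctxt = re.search(r'^msgctxt "(.*)"$', entry, flags=re.MULTILINE)
        if flagged and ctxt:
            ids.append(ctxt.group(1))
    return ids


# The first entry is a made-up placeholder; the second is the Kd 9 example
# with a fuzzy flag added for the sake of the demonstration.
SAMPLE_PO = '''
#. REF: sc1
msgctxt "pli-tv-kd9:1.1.1"
msgid "..."
msgstr "..."

#. REF: sc2
#, fuzzy
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr "Soon afterwards Kassapagotta thought,"
'''

print(fuzzy_ids(SAMPLE_PO))
```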

The next stage will be to review the paragraphing, as I have recently done with the nikayas (step 4). To do this, I take advantage of a little quirk in the PO files: since they have HTML tags recorded as comments, with a few tweaks they can be made to render as actual HTML files! Then I can visually review the paragraph breaks by just opening the files in a browser. I will make the paragraphs conform to the normal rules, for example, paragraphs for each speaker in a dialogue passage. Generally speaking, the outcome will be to make more finely articulated and readable text by having shorter paragraphs; occasionally, however, it also means combining existing paragraphs.
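To give a feel for the rendering trick, here is a small sketch that interleaves each entry's "#. HTML:" comment with its msgstr. It is only one possible way to do it, assuming cleaned files where every msgstr sits on a single line. The function name po_to_html and the first sample entry are hypothetical.

```python
import re


def po_to_html(po_text):
    """Interleave each entry's '#. HTML:' comment with its msgstr text."""
    parts = []
    for entry in re.split(r"\n\s*\n", po_text.strip()):
        html = re.search(r"^#\. HTML: (.*)$", entry, flags=re.MULTILINE)
        text = re.search(r'^msgstr "(.*)"$', entry, flags=re.MULTILINE)
        if html:
            parts.append(html.group(1))  # markup that precedes the segment
        if text:
            parts.append(text.group(1))  # the translated segment itself
    return "".join(parts)


# The first entry is an invented placeholder; the second is the Kd 9 example.
SAMPLE_PO = '''
#. HTML: <p>
msgctxt "pli-tv-kd9:1.1.1"
msgid "..."
msgstr "(previous segment)"

#. HTML: </p><p>
#. REF: sc2, kd.9.1.2, ms.3.1775
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr "Soon afterwards Kassapagotta thought,"
'''

print(po_to_html(SAMPLE_PO))
```

The result can be saved to a file and opened in a browser. In this sketch the last paragraph is left unclosed, which browsers tolerate for a quick visual check.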

When adjusting the segments, the numbering of the segments gets put out of whack. This does not affect the reference numbers, only the msgctxt, which is the universal key for all information associated with that segment. So at the end of the process, Blake will re-generate the msgctxt numbers to ensure that they are all correct, sequential, and unduplicated (step 5). He will also run tests to ensure that the text remains exactly as it was before this process. We will also run tests to ensure all the markup is valid and correct, and all the reference numbers are sane (for example, checking if any page reference numbers are omitted or doubled).
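A sanity check of the "omitted or doubled" kind might look like the following sketch. This is not Blake's actual test suite. It assumes page references end in a number after the last dot, and the helper name check_page_refs and most of the sample data are invented.

```python
def check_page_refs(refs, prefix):
    """Return (missing, doubled) page numbers for references starting with prefix."""
    pages = [int(ref.rsplit(".", 1)[1]) for ref in refs if ref.startswith(prefix)]
    if not pages:
        return [], []
    doubled = sorted({p for p in pages if pages.count(p) > 1})
    missing = sorted(set(range(min(pages), max(pages) + 1)) - set(pages))
    return missing, doubled


# Only sc2 and ms.3.1775 come from the example above; the rest is test data.
sample_refs = ["sc2", "ms.3.1775", "ms.3.1776", "ms.3.1778", "ms.3.1778"]
missing, doubled = check_page_refs(sample_refs, "ms.3.")
print("missing:", missing, "doubled:", doubled)
```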

Up to now, we are still working with PO files, and they can, in principle, be re-uploaded to Pootle for further editing and so on. However, the aim is to move on to Bilara, so the next step is to adjust the data for Bilara (step 6). If the preparation work has been done well, this will be an automated process, merely duplicating the process that is being done at the moment for the nikayas. This will split the PO data into separate JSON files. Currently, in the PO files, we have in the same file: original text, translation, segment ID, reference numbers, HTML markup, variant readings, and comments, as well as PO-specific file data. Keeping all of this straight in the same file is ridoinculous. So the idea is that this is cleanly separated by data type, and may be recombined at will, all coordinated by the universal ID supplied by the segment number (which in PO is called msgctxt).
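In outline, the split could work like the sketch below: one dictionary per data type, each keyed by the msgctxt. The layer names (root, translation, reference, html) and the function split_po are assumptions for illustration, and the real Bilara files may be organized differently.

```python
import json
import re

# One pattern per data type; each captures the value on a single PO line.
FIELDS = {
    "root": r'^msgid "(.*)"$',
    "translation": r'^msgstr "(.*)"$',
    "reference": r"^#\. REF: (.*)$",
    "html": r"^#\. HTML: (.*)$",
}


def split_po(po_text):
    """Split cleaned PO entries into one {segment_id: value} dict per data type."""
    layers = {name: {} for name in FIELDS}
    for entry in re.split(r"\n\s*\n", po_text.strip()):
        ctxt = re.search(r'^msgctxt "(.*)"$', entry, flags=re.MULTILINE)
        if not ctxt:
            continue  # skip anything without a segment ID, e.g. a file header
        for name, pattern in FIELDS.items():
            match = re.search(pattern, entry, flags=re.MULTILINE)
            if match:
                layers[name][ctxt.group(1)] = match.group(1)
    return layers


SAMPLE_PO = '''
#. HTML: </p><p>
#. REF: sc2, kd.9.1.2, ms.3.1775
msgctxt "pli-tv-kd9:1.2.1"
msgid "Atha kho kassapagottassa bhikkhuno etadahosi—"
msgstr "Soon afterwards Kassapagotta thought,"
'''

layers = split_po(SAMPLE_PO)
print(json.dumps(layers["translation"], ensure_ascii=False, indent=2))
```

Each of the resulting dictionaries could then be written to its own JSON file.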

To see how beautiful these look, check out SN 1.1.

Root text:

https://github.com/suttacentral/bilara-data/blob/master/source/pli/ms/sn/sn1/sn1.1.json

Translation:

https://github.com/suttacentral/bilara-data/blob/master/translation/en/sujato/sn/sn1/sn1.1.json

References:

https://github.com/suttacentral/bilara-data/blob/master/reference/sn/sn1/sn1.1.json

In the markup files, we have the HTML skeleton, fleshed out with the ID numbers.

https://github.com/suttacentral/bilara-data/blob/master/markup/sn/sn1/sn1.1.html

By abstracting and separating concerns like this, we can combine these things across any language. The same set of references will work in Pali, English, Italian, or Thai. The same HTML markup will apply. If we like, we can apply comments across the different languages. None of this has been previously possible, because the relevant data is embedded in a file, and can’t be transferred from one context to another except by hand, which is exactly what you folks have been doing these past months. Now that you’ve done it, no-one else will have to. Yay!
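To make the recombining idea concrete, here is a tiny sketch that joins any set of ID-keyed layers back into one record per segment. The function name recombine and the layer names are hypothetical, and the sample values are taken from the Kd 9 segment quoted earlier. A translation in another language would simply be one more dictionary with the same keys.

```python
def recombine(segment_ids, **layers):
    """Join any number of ID-keyed layers into one record per segment."""
    return [
        {"id": sid, **{name: layer.get(sid) for name, layer in layers.items()}}
        for sid in segment_ids
    ]


root = {"pli-tv-kd9:1.2.1": "Atha kho kassapagottassa bhikkhuno etadahosi—"}
english = {"pli-tv-kd9:1.2.1": "Soon afterwards Kassapagotta thought,"}
reference = {"pli-tv-kd9:1.2.1": "sc2, kd.9.1.2, ms.3.1775"}

rows = recombine(root.keys(), root=root, translation=english, reference=reference)
for row in rows:
    print(row["id"], "|", row["translation"], "|", row["reference"])
```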

I would estimate roughly a month to get the above process completed.

9 Likes

Shiny!!! Hooray data organization.

I hope to finish this weekend, definitely will within the next week.

8 Likes

I assume you will be doing the entire Vinaya Piṭaka in one go. If so, this would mean no editing for the duration of one month, right?

The original text that I have entered on Notepad is already formatted in this way. Many (all?) of the paragraph breaks I inserted in the plain text file have been kept in the Pootle version. Most of the time all you need to do is to make use of the html paragraph tags to recreate paragraph breaks at the right place.

So once the month of processing is over, I may continue the editing on Bilara?

I see what you mean. :lying_face:

And because there is nothing more to do, saṃsāra comes to an end. :thinking:

4 Likes

Huhh… wow! I never thought we could end Saṁsāra by means of copy & paste… :joy: :rofl:

6 Likes

That’s correct.

Oh, excellent, well that makes that much easier. In that case, I will just do a brief review of the paragraphing. In any case, so long as it is generally okay, it can always be adjusted later; it is, after all, a matter of presentation rather than content.

7 Likes

In some cases the paragraphs don’t quite match with the segment breaks. They can easily be found searching for “</p><p>” within a segment.
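A search along these lines could be scripted roughly as follows. This is only a sketch: it assumes the segments are available as a mapping from segment ID to translation text, and the second sample segment is invented to show a hit.

```python
def mid_segment_breaks(segments):
    """Return the IDs of segments whose text contains a paragraph break."""
    return [sid for sid, text in segments.items() if "</p><p>" in text]


segments = {
    "pli-tv-kd9:1.2.1": "<p>Soon afterwards Kassapagotta thought,",
    # Hypothetical segment with a paragraph break in the middle of its text.
    "pli-tv-kd9:1.2.2": "<p>first part</p><p>second part",
}
print(mid_segment_breaks(segments))
```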

4 Likes

Right, this would be part of item 2: move all HTML to the segment level where possible. In cases where HTML markup cannot be moved to a segment level, for example inline emphasis, we use markdown, as we do here on Discourse. (Actually a specific SC version of markdown called nilakkhana.)

2 Likes

kd14 is entered! It was fun to read an accessible translation and dip into the Vinaya and Pali, thank you! Now I gotta find another task…

7 Likes

Yay!!! Well done @tracy! Your support has been very valuable. You have done a tremendous job.

I have now reviewed eight Khandhakas, including at least one from each one of you, and the quality is very high. There are occasional mistakes, of course, a lack of which would only be attributable to super-normal powers! Not that you haven’t got them, it’s just that I am sure you wouldn’t flaunt them here on the forum. :grinning:

I wish to thank all three of you once more for your generous and kind contribution to this project. I am hoping this Vinaya translation will be of use to monastics and others for a long time to come, at least several decades. What a wonderful thing it is to have this available on the web. And that’s thanks to the three of you!

I wish you all a long and joyful association with the Dhamma.

@sabbamitta @greenTara

9 Likes

Nice name! :orange_heart:

4 Likes

And I would like to thank you in return for patiently answering all our questions, silly or otherwise, and for accompanying our work, never short of encouraging words!

For me this has been a great opportunity to learn both about the Vinaya and Pali. Even if a systematic study of Pali is still waiting for me to come, my knowledge and understanding now is so much better than when I started working on this project. Hopefully that will be very useful in other respects, so thank you for the opportunity! :anjal:

8 Likes

Dear friends, may I echo the celebratory words of Ven Brahmali, and add to the list Brahmali himself! It all looks like it’s coming together fantabulously.

If I understand correctly, everything is done now, is that right? If the work from your part is complete, I’ll download it and get to work.

7 Likes

Yes, you know, sort of. I was hoping to review the input before you download it. I’ve done 9 out of 22 Khandhakas so far. But perhaps it is not required? Or rather, perhaps I can do this at a later stage?

One of the problems is that segmentation of the Pali is often awkward. This will make the line-by-line display on SuttaCentral seem awkward too. I was hoping to go through all of this and streamline it. I am wondering, however, whether this can be done on Github, once everything has been uploaded there? Or is the Pali segmenting going to be fixed and unchangeable, as it was in Pootle?

4 Likes

Well, it’s up to you. Once I have finished my work, the whole text will be much cleaner and more consistent, which would make it easier for you. So it really just depends on how you want to work. If it’s something that can be readily done on Pootle, then by all means go ahead. Or if you are happy to work offline also, that is fine, but it may be better to wait until I have done my bit first.

The segmenting can be adjusted; it will not be as rigid as it is on Pootle (which is really just a problem with Pootle’s database). However, it is best to get it right first up and keep any later adjustments to a minimum.

I’m wondering whether you want to make similar adjustments to the Vibhangas?

2 Likes

When it comes to Pali segments that need merging, all I can do on Pootle is mark them with “needs work”. It’s not all that satisfactory, I feel.

Not sure. But it will take me a while to go through the entire Vinaya. If you have the time right now, I think it’s probably better for you to just go ahead. So I say, go for it!

Is this because the segment numbering gets out of whack?

1 Like