Thanks so much @sabbamitta for hanging in there until this was completed. You have been a stalwart of this project. You have been incredibly helpful over such a long period of time. Without your support, this would have taken much longer. Not only that, but now that I have reviewed much of your work (and that of your two excellent colleagues), I can say that it is all of a very high quality. It’s rather amazing you have been able to do this without much knowledge of Pali.
The input project is now almost complete. There is a little bit left for @tracy to do on Kd14, but she too is getting very close. In a few days’ time we will be ready for the next stage. I am not sure how we proceed from here. @sujato?
Which I think you can see is much cleaner and better organized.
Next, I will review the segments to improve consistency and accuracy (step 3). In the PO files, any segments that have been marked as “needing work” will have the tag #fuzzy. So I go through all the fuzzy segments and resolve problems. I’ll also check generally for consistency and coherence in segmenting.
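As a rough illustration of step 3, here is a minimal sketch of how the flagged segments could be listed. It assumes the flag appears in the usual gettext form (a `#, fuzzy` comment line) and that entries are separated by blank lines; the segment IDs and sample text are made up for the example.

```python
def fuzzy_contexts(po_text):
    """Return the msgctxt of every entry that carries a fuzzy flag."""
    found = []
    # Assumption: entries are separated by blank lines, as in ordinary PO files.
    for block in po_text.strip().split("\n\n"):
        lines = block.splitlines()
        is_fuzzy = any(l.startswith("#,") and "fuzzy" in l for l in lines)
        if not is_fuzzy:
            continue
        for l in lines:
            if l.startswith("msgctxt "):
                found.append(l[len("msgctxt "):].strip().strip('"'))
    return found


# Hypothetical sample data, only for illustration.
sample = '''#, fuzzy
msgctxt "pli-tv-kd1:1.1"
msgid "Tena samayena"
msgstr "At that time"

msgctxt "pli-tv-kd1:1.2"
msgid "buddho bhagavā"
msgstr "the Buddha"
'''

print(fuzzy_contexts(sample))
```

The result is just a to-do list of segment IDs to work through by hand.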
The next stage will be to review the paragraphing, as I have recently done with the nikayas (step 4). To do this, I take advantage of a little quirk in the PO files: since they have HTML tags recorded as comments, with a few tweaks they can be made to render as actual HTML files! Then I can visually review the paragraph breaks by just opening the files in a browser. I will make the paragraphs conform to the normal rules, for example, paragraphs for each speaker in a dialogue passage. Generally speaking, the outcome will be to make a more finely articulated and readable text by having shorter paragraphs; occasionally, however, it also means combining existing paragraphs.
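To give a feel for the trick in step 4, here is a small sketch. The exact comment format in our PO files is not shown here, so this assumes the tags sit in extracted comments (`#. <p>` and so on) and that each `msgstr` fits on one line. It is not the actual set of tweaks used.

```python
import html


def po_to_html(po_text):
    """Turn markup comments and translations into a crude HTML preview."""
    parts = []
    for line in po_text.splitlines():
        if line.startswith("#. "):
            comment = line[3:]
            if comment.startswith("<"):  # assumption: markup comments start with a tag
                parts.append(comment)
        elif line.startswith("msgstr "):
            text = line[len("msgstr "):].strip().strip('"')
            parts.append(html.escape(text))
    return "\n".join(parts)


# Hypothetical sample data, only for illustration.
sample = '''#. <p>
msgctxt "x:1.1"
msgid "..."
msgstr "First speaker."

#. </p><p>
msgctxt "x:1.2"
msgid "..."
msgstr "Second speaker."
'''

preview = po_to_html(sample)
print(preview)
```

Saving `preview` to a file and opening it in a browser shows each paragraph break at a glance.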
When adjusting the segments, the numbering of the segments gets put out of whack. This does not affect the reference numbers, only the msgctxt, which is the universal key for all information associated with that segment. So at the end of the process, Blake will re-generate the msgctxt numbers to ensure that they are all correct, sequential, and unduplicated (step 5). He will also run tests to ensure that the text remains exactly as it was before this process. We will also run tests to ensure all the markup is valid and correct, and all the reference numbers are sane (for example, checking if any page reference numbers are omitted or doubled).
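The renumbering and the "text is unchanged" test in step 5 might look something like the sketch below. This is not Blake's actual script. It assumes a hypothetical ID shape of `uid:section.segment`, where only the final counter is regenerated.

```python
from collections import Counter


def renumber(segments):
    """segments: list of (msgctxt, text) pairs in document order.

    Returns a new list in which the final counter restarts at 1 in each
    section and increases by one per segment.
    """
    counters = {}
    result = []
    for ctx, text in segments:
        section = ctx.rsplit(".", 1)[0]  # e.g. "kd1:1" from "kd1:1.4"
        counters[section] = counters.get(section, 0) + 1
        result.append((f"{section}.{counters[section]}", text))
    return result


def duplicates(segments):
    """Return every msgctxt that occurs more than once."""
    counts = Counter(ctx for ctx, _ in segments)
    return sorted(ctx for ctx, n in counts.items() if n > 1)


# Hypothetical IDs after some manual splitting and merging.
before = [("kd1:1.1", "a"), ("kd1:1.1", "b"), ("kd1:1.4", "c"), ("kd1:2.1", "d")]
after = renumber(before)

print(duplicates(before))  # the doubled key is reported
print(after)
```

Comparing the text column before and after is the simplest form of the check that nothing but the keys has changed.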
Up to now, we are still working with PO files, and they can, in principle, be re-uploaded to Pootle for further editing and so on. However, the aim is to move on to Bilara, so the next step is to adjust the data for Bilara (step 6). If the preparation work has been done well, this will be an automated process, merely duplicating the process that is being done at the moment for the nikayas. This will split the PO data into separate JSON files. Currently, in the PO files, we have in the same file: original text, translation, segment ID, reference numbers, HTML markup, variant readings, and comments, as well as PO-specific file information. Keeping all of this straight in the same file is ridoinculous. So the idea is that this is cleanly separated by data type, and may be recombined at will, all coordinated by the universal ID supplied by the segment number (which in PO is called msgctxt).
To see how beautiful these look, check out SN 1.1.
In the markup files, we have the HTML skeleton, fleshed out with the ID numbers.
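For those curious how the separation by data type works in principle, here is a minimal sketch. The field names and the sample entries are invented for illustration, and the real conversion may differ; the point is only that every output file is keyed by the same segment ID.

```python
import json

# Hypothetical field names, one per kind of data mentioned above.
FIELDS = ("root", "translation", "reference", "html", "variant", "comment")


def split_by_type(entries):
    """Split combined entries into one {segment_id: value} mapping per data type."""
    files = {field: {} for field in FIELDS}
    for entry in entries:
        seg_id = entry["msgctxt"]
        for field in FIELDS:
            value = entry.get(field)
            if value:
                files[field][seg_id] = value
    return files


# Made-up entries standing in for parsed PO data.
entries = [
    {"msgctxt": "x:1.1", "root": "root text one", "translation": "translation one",
     "reference": "pts1.1", "html": "<p>{}"},
    {"msgctxt": "x:1.2", "root": "root text two", "translation": "translation two",
     "html": "{}</p>", "comment": "check this"},
]

files = split_by_type(entries)
for field, data in files.items():
    print(field, json.dumps(data, ensure_ascii=False))
```

Each mapping would then be written to its own JSON file.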
By abstracting and separating concerns like this, we can combine these things across any language. The same set of references will work in Pali, English, Italian, or Thai. The same HTML markup will apply. If we like, we can apply comments across the different languages. None of this has been previously possible, because the relevant data is embedded in a file, and can’t be transferred from one context to another except by hand, which is exactly what you folks have been doing these past months. Now that you’ve done it, no-one else will have to. Yay!
I would estimate roughly a month to get the above process completed.
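A tiny sketch of the recombination, assuming each markup entry is a template with a `{}` placeholder where the segment text goes (the IDs and texts below are placeholders, not real data):

```python
def render(markup, text):
    """Fill each segment's markup template with that segment's text."""
    return "".join(
        template.format(text.get(seg_id, ""))
        for seg_id, template in markup.items()
    )


# One shared markup mapping, reused for every language.
markup = {"sn1.1:1.1": "<p>{}", "sn1.1:1.2": "{}</p>"}
english = {"sn1.1:1.1": "First segment. ", "sn1.1:1.2": "Second segment."}
italian = {"sn1.1:1.1": "Primo segmento. ", "sn1.1:1.2": "Secondo segmento."}

print(render(markup, english))
print(render(markup, italian))
```

The markup is written once and applied to whichever text happens to share the same IDs.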
I assume you will be doing the entire Vinaya Piṭaka in one go. If so, this would mean no editing for the duration of one month, right?
The original text that I have entered on Notepad is already formatted in this way. Many (all?) of the paragraph breaks I inserted in the plain text file have been kept in the Pootle version. Most of the time all you need to do is to make use of the html paragraph tags to recreate paragraph breaks at the right place.
So once the month of processing is over, I may continue the editing on Bilara?
I see what you mean.
And because there is nothing more to do, saṃsāra comes to an end.
Oh, excellent, well that makes that much easier. In that case, I will just do a brief review of the paragraphing. In any case, so long as it is generally okay, it can always be adjusted later; it is, after all, a matter of presentation rather than content.
Right, this would be part of item 2, move all HTML into segments where possible. In cases where HTML markup cannot be moved to a segment level, for example inline emphasis, we use Markdown, as we do here on Discourse. (Actually a specific SC version of Markdown called nilakkhana.)
Yay!!! Well done @tracy! Your support has been very valuable. You have done a tremendous job.
I have now reviewed eight Khandhakas, including at least one from each one of you, and the quality is very high. There are occasional mistakes, of course, a lack of which would only be attributable to super-normal powers! Not that you haven’t got them, it’s just that I am sure you wouldn’t flaunt them here on the forum.
I wish to thank all three of you once more for your generous and kind contribution to this project. I am hoping this Vinaya translation will be of use to monastics and others for a long time to come, at least several decades. What a wonderful thing it is to have this available on the web. And that’s thanks to the three of you!
I wish you all a long and joyful association with the Dhamma.
And I would like to thank you in return for patiently answering all our questions, silly or otherwise, and for accompanying our work, never short of encouraging words!
For me this has been a great opportunity to learn both about the Vinaya and Pali. Even if a systematic study of Pali still lies ahead of me, my knowledge and understanding now is so much better than when I started working on this project. Hopefully that will be very useful in other respects, so thank you for the opportunity!
Yes, you know, sort of. I was hoping to review the input before you download it. I’ve done 9 out of 22 Khandhakas so far. But perhaps it is not required? Or rather, perhaps I can do this at a later stage?
One of the problems is that segmentation of the Pali is often awkward. This will make the line-by-line display on SuttaCentral seem awkward too. I was hoping to go through all of this and streamline it. I am wondering, however, whether this can be done on GitHub, once everything has been uploaded there? Or is the Pali segmenting going to be fixed and unchangeable, as it was in Pootle?
Well, it’s up to you. Once I have finished my work, the whole text will be much cleaner and more consistent, which would make it easier for you. So it really just depends on how you want to work. If it’s something that can be readily done on Pootle, then by all means go ahead. Or if you are happy to work offline also, that is fine, but it may be better to wait until I have done my bit first.
The segmenting can be adjusted; it will not be as rigid as it is on Pootle (which is really just a problem with Pootle’s database). However, it is best to get it right first up and keep any later adjustments to a minimum.
I’m wondering whether you want to make similar adjustments to the Vibhangas?
A suggestion for the modification of segment breaks:
In passages like Atha kho āyasmā kaccānagotto yena bhagavā tenupasaṅkami; upasaṅkamitvā bhagavantaṃ abhivādetvā ekamantaṃ nisīdi. Ekamantaṃ nisinno kho āyasmā kaccānagotto bhagavantaṃ etadavoca: (here from the Kaccānagottasutta; but in the Vinaya there are plenty of such instances) the segment usually breaks after ekamantaṃ nisīdi, and then it leaves something like “and said” for the next segment. Wouldn’t it make more sense to merge segments of this sort?