Bi Pm / Vb translation & parallels issues

vimalanyani · July 2, 2017, 4:24pm

The Dharmaguptaka Bhikkhuni Parajikas 6-8 are incorrectly linked with the Pali.
It should be like this:
Dg 6 with Pi 8;
Dg 7 with Pi 6;
Dg 8 with Pi 7.

@Vimala, they are also wrong in the file from github you gave me earlier today.

vimalanyani · July 2, 2017, 4:35pm

The Sarvastivada Parajikas have the same mistake and should be linked like this:

Sarv 6 with Pi 8;
Sarv 7 with Pi 6;
Sarv 8 with Pi 7.

(same order as Dg)

vimalanyani · July 2, 2017, 6:02pm

Mu has the same mistake, too. Should be the same order as above.

sujato · July 2, 2017, 10:50pm

Thanks, Ayya. @vimala, can you work with Ayya on this? It would probably be best for @vimalanyani to check the Chinese texts, and I’ll check the Indic, if necessary.

Vimala · July 3, 2017, 6:29am

Thanks for this!
What I have now is the following:

“parallels”: [“pi-tv-bi-vb-pj6”, “pi-tv-bi-pm#pj6”, “pi-tv-pvr-bi#pj6”, “lzh-dg-bi-vb-pj6”, “lzh-dg-bi-pm#pj6”, “lzh-mi-bi-vb-pj6”, “lzh-mi-bi-pm#pj6”, “lzh-sarv-bi-vb-pj6”, “lzh-sarv-bi-pm#pj6”, “skt-sarv-bi-pm#pj6”, “lzh-mu-bi-vb-pj6”, “lzh-mu-bi-pm#pj6”, “bo-mu-bi-vb-pj6”, “bo-mu-bi-pm#pj6”, “lzh-mg-bi-vb-pj7”, “lzh-mg-bi-pm#pj7”, “skt-lo-bi-vb-pj7”]

“parallels”: [“pi-tv-bi-vb-pj7”, “pi-tv-bi-pm#pj7”, “pi-tv-pvr-bi#pj7”, “lzh-dg-bi-vb-pj7”, “lzh-dg-bi-pm#pj7”, “lzh-mi-bi-vb-pj8”, “lzh-mi-bi-pm#pj8”, “lzh-sarv-bi-vb-pj7”, “lzh-sarv-bi-pm#pj7”, “skt-sarv-bi-pm#pj7”, “lzh-mu-bi-vb-pj7”, “lzh-mu-bi-pm#pj7”, “bo-mu-bi-vb-pj7”, “bo-mu-bi-pm#pj7”, “lzh-mg-bi-vb-pj8”, “lzh-mg-bi-pm#pj8”, “skt-lo-bi-vb-pj8”]

“parallels”: [“pi-tv-bi-vb-pj8”, “pi-tv-bi-pm#pj8”, “pi-tv-pvr-bi#pj8”, “lzh-dg-bi-vb-pj8”, “lzh-dg-bi-pm#pj8”, “lzh-mi-bi-vb-pj7”, “lzh-mi-bi-pm#pj7”, “lzh-sarv-bi-vb-pj8”, “lzh-sarv-bi-pm#pj8”, “skt-sarv-bi-pm#pj8”, “lzh-mu-bi-vb-pj8”, “lzh-mu-bi-pm#pj8”, “bo-mu-bi-vb-pj8”, “bo-mu-bi-pm#pj8”, “lzh-mg-bi-vb-pj6”, “lzh-mg-bi-pm#pj6”, “skt-lo-bi-vb-pj6”]

You say that lzh-dg-bi-pm#pj6 etc. should not be in the same series as pi-tv-bi-pm#pj6, etc. But how about the whole rest of the series? For instance, I can also see that if that is not correct, it is also not correctly linked with pi-tv-bi-vb-pj6, etc. Also, f.i. if you change lzh-dg-bi-pm#pj6 in the first one to #pj7 as you suggest, is this then still correct with the other Vinayas in that series?

I know it looks complicated in these large lists of parallels, but if you change one somewhere, this can have effects on other series as well.

So can you be a bit more specific about what needs to be changed exactly to make sure the series are still correct in total? And Maybe @sujato can help with the Sanskrit Vinaya texts for that.
Thanks for your help!

Vimala · July 3, 2017, 7:37am

Now you can also see the parallels online:

https://staging.suttacentral.net/ll/pi-tv-bi-pm

vimalanyani · July 3, 2017, 11:10am

Ok, I’ll update the list.

Bhante @sujato: no need to check the skt lo text. I can do that, since I need to look into it anyway. But we have three more texts for which we mention parallels, but don’t have the texts online.

lzh-sarv-bi-pm-dh (can’t find anything on the dunhaung project website)
skt-sarv-bi-pm
skt-sarv-bi-vb

Without the text, I can’t check if they are matched correctly.
I’d also be curious to see them for the translation project. They are probably just fragments, but still, might be interesting…
Is there any way to display meta data on SC for these texts, so that people know where the information came from?

vimalanyani · July 3, 2017, 11:31am

There might be more updates on parallels as I progress with the translation project.

@Vimala: I think you are doing something with the patimokkha files to prepare them for pootle. I wonder though if there might be a problem, because all the patimokkhas have the rules in a different sequence.

I am planning to translate all the parallels together, rule by rule, so it would be better to have files where they are in the same sequence. I. e. is it possible to rearrange all the rules according to the pali sequence, plus the ones that don’t have a pali parallel, and then convert them back into the original sequence at the end?

Vimala · July 3, 2017, 12:03pm

That should not make any difference. The pootle files are just like the files you see on the site, like for instance https://suttacentral.net/pi/pi-tv-bi-pm, but just in a different format that makes it easier for translation and that can then be exported again to exactly that same format you see online.

I’m afraid you will have to keep switching between the different patimokkhas and keep the list with the correct parallels at hand when doing so.

Vimala · July 4, 2017, 5:44am

I’ve uploaded all the patimokkha files in pootle format here: lzh-pm.zip (98.4 KB)

We will not be using Pootle until after Pootle has been upgraded and the Chinese dictionary added. But as we discussed, it might be easier for you to just use these raw Pootle files in Sublime so you can have them next to each other. Have a look at the files and see what you think. No doubt you will have questions about them so please let me know.

The Chinese dictionary is also online in SuttaCentral and shows the meanings of characters on mouse-hover when activated.

vimalanyani · July 4, 2017, 4:47pm

Here’s the updated list.
bo-mu is tentative, (now following the numbering according to Waldschmidt, and also in accordance with lzh-mu), but best to wait for Ayya Dhammadinna’s approval.

“parallels”: [“pi-tv-bi-vb-pj6”, “pi-tv-bi-pm#pj6”, “pi-tv-pvr-bi#pj6”, “lzh-dg-bi-vb-pj7”, “lzh-dg-bi-pm#pj7”, “lzh-mi-bi-vb-pj8”, “lzh-mi-bi-pm#pj8”, “lzh-sarv-bi-vb-pj7”, “lzh-sarv-bi-pm#pj7”, “skt-sarv-bi-pm#pj7”, “lzh-mu-bi-vb-pj7”, “lzh-mu-bi-pm#pj7”, “bo-mu-bi-vb-pj7”, “bo-mu-bi-pm#pj7”, “lzh-mg-bi-vb-pj7”, “lzh-mg-bi-pm#pj7”, “skt-lo-bi-vb-pj7”]

“parallels”: [“pi-tv-bi-vb-pj7”, “pi-tv-bi-pm#pj7”, “pi-tv-pvr-bi#pj7”, “lzh-dg-bi-vb-pj8”, “lzh-dg-bi-pm#pj8”, “lzh-mi-bi-vb-pj7”, “lzh-mi-bi-pm#pj7”, “lzh-sarv-bi-vb-pj8”, “lzh-sarv-bi-pm#pj8”, “skt-sarv-bi-pm#pj8”, “lzh-mu-bi-vb-pj8”, “lzh-mu-bi-pm#pj8”, “bo-mu-bi-vb-pj8”, “bo-mu-bi-pm#pj8”, “lzh-mg-bi-vb-pj8”, “lzh-mg-bi-pm#pj8”, “skt-lo-bi-vb-pj8”]

“parallels”: [“pi-tv-bi-vb-pj8”, “pi-tv-bi-pm#pj8”, “pi-tv-pvr-bi#pj8”, “lzh-dg-bi-vb-pj6”, “lzh-dg-bi-pm#pj6”, “lzh-mi-bi-vb-pj6”, “lzh-mi-bi-pm#pj6”, “lzh-sarv-bi-vb-pj6”, “lzh-sarv-bi-pm#pj6”, “skt-sarv-bi-pm#pj6”, “lzh-mu-bi-vb-pj6”, “lzh-mu-bi-pm#pj6”, “bo-mu-bi-vb-pj6”, “bo-mu-bi-pm#pj6”, “lzh-mg-bi-vb-pj6”, “lzh-mg-bi-pm#pj6”, “skt-lo-bi-vb-pj6”]

vimalanyani · July 5, 2017, 5:33pm

(As per Bhante Sujato’s request, I’ll post any further comments that come up during the Bi Pm translation in this thread. Therefore, I’ve changed the title to make it more comprehensive.)

Bhante @Sujato, are there any guidelines for digitizing Sanskrit fragments?
This is an example of a Patimokkha fragment from the Waldschmidt book.

Obviously, Waldschmidt tried to represent the exact image of the fragment, i.e. the spaces between words, the linebreaks, and the hole in the fragment for the string. Should I try to preserve that when digitizing it? If yes, then should I just roughly put the right number of spaces between words, or is there any special formatting that we use? Do we have a symbol for the string hole? Should I preserve information about “recto/verso”?
We don’t seem to have any of that in the other Sanskrit texts on SC but I’m not sure if that is because we don’t want it or because the information wasn’t available.

OCR-ing the text doesn’t work for Sanskrit, as was to be expected, therefore I have to type everything manually.

sujato · July 5, 2017, 11:12pm

Okay, well may I first ask, how much text is involved?

The best way to approach it will be to use plain text. As long as this is done carefully and consistently we can transform it to HTML later. This is effectively what we do with the GRETIL texts, which are mostly plain text.

We should design the texts from the beginning so they can be easily segmented and added to Pootle. For patimokkha rules, we have one segment per rule, so this is easy. Each rule is on one line. Just add the rule ID at the start of the line. For example, for Nissaggiya Pacittiya 12, start the line np12.

As for the line numbers and page numbers, these can also be added in a similar way, using a unique form for each. So line numbers could be #1, #2, etc., and page numbers ^1, ^2, etc. It doesn’t really matter, so long as you are careful and consistent. We’ll regex them into proper metadata later. Doing it this way just saves time and make it easier to type and read the plain text document.

We should also ensure that each text has metadata with the correct ID number as given by Waldschmidt. So let us assume that the fragment above is part of a longer patimokkha text. Waldschmidt has named this kat-nr.539. Let us assume that this ID uniquely applies to this image. So we save the image as kat-nr.539.png. And in the text file, we include this ID number at the appropriate spot, again using some unique form, say $kat-nr.539. This will allow us to associate the correct source image for checking. The details of this will need sorting out, but this is the basic idea.

I assume that there are no paragraphs, etc. But let me know if there are, we can deal with that.

Each file should be headed with some meta text that defines these conventions, in addition to explaining the use of [brackets] etc.

I would suggest start with a small and simple text, share it here as you go, and we can sort the details out as we progress.

Yes, this can be done in a similar way, just use some unique sign for these.

Yes, they were adopted mostly from GRETIL, which ignores such things.

It’s a bit hard to say, but it looks to me like the spacing in that fragment is mostly determined by justification, i.e. the spacing is stretched to fill the line. (FYI, manuscripts themselves don’t use justification; they just break the line wherever it runs into the end.)

The exception is the space around the hole, where it appears to create a circular space with the hole in the middle. This kind of thing is messy to reproduce in a digital document, so I would ignore it. You can insert some kind of unique symbol where the hole is, but that should be enough. Any unique symbol is fine, we can transform it later.

Vimala · July 6, 2017, 6:12am

Sofar we have converted our HTML texts to Pootle and that worked fine. If @vimalanyani makes it in plain text (with whatever special signs in it), I can design a Pootle file that includes the correct html-coding in it as well as put it in html on the site. It might be slightly difficult to read in the Pootle file itself with all the <span class="gap"> and <span class="unclear"> etc. which have to stay within the segment and cannot be put in the comments.

Note that the PM files are each just one file with every rule as a separate paragraph <p></p>. So there is only one metaarea per each Patimokkha too.

vimalanyani · July 6, 2017, 10:15am

There are 4 manuscripts (all very fragmented): 2 Patimokkhas, 1 Vibhanga, 1 about Sikkhamana rules.

The longest text has 14 leaves, each with a recto and a verso, with content from Parajikas to Sekhiyas. The example I uploaded above is one recto, thus we have 28 times that amount of text for that manuscript. The shortest text (Sikkhamana) is just one recto and verso.

Waldschmidt is very consistent with the numbering, so I can just use his system to ID the text.

I’ll start with a manuscript of three leaves and see how it goes.

vimalanyani · July 6, 2017, 1:37pm

Here’s the first try. Manuscript Pb aka Kat.-Nr. 539.
This is the most decayed text, so there’s not much…

vimalanyani · July 6, 2017, 2:06pm

Just wondering now about copyright… Since we are taking bits from a freely available text and quoting the source in the meta data, we should be fine, right?

The digital Turfan archive, which possesses the original fragments - or at least the fragments which have survived bombing and looting during WW2 - , has a scary copyright notice on their website: http://turfan.bbaw.de/dta/dta_RulesEngl_index.html

edit: Is the Waldschmidt book actually freely available? I assumed it was, since Bhante @Sujato uploaded it to his thread about Vinaya parallels. But I can’t find a pdf on google. It’s from 1926, so copyright should have run out at some point (?).

sujato · July 7, 2017, 2:49am

Copyright doesn’t apply to ancient texts, regardless of what anyone says. The only thing that is potentially copyrightable is the image or design. So the photos of the Turfan manuscripts are copyright—madness though it is—but the text is not. Anyway, we can link to them. Having said which—and this is something it would be good to do for all our linked manuscripts—we should make local copies of all these, in case the website goes offline.

The images of the Waldschmidt book should be fine to publish.

Vimala · July 7, 2017, 6:20am

How about Waldschmidt’s translation thereof? It might be a very old (and maybe not so good) translation, but it would be nice to have on SC.

vimalanyani · July 7, 2017, 7:10am

Waldschmidt doesn’t actually translate the fragments. What he translates is most/all (?) the bhikkhuni-specific rules (not the shared rules) of all the Indic, Chinese and Tibetan Patimokkhas, including comments on the rules in the Vibhangas, if they are different. It is a great resource. It is very old, but not necessarily bad (haven’t looked at it in detail yet).

I am going to make use of it for my translations. Would be great to have it on SC, and I’d offer to prepare the texts, but there might be copyright issues. Bhante @Sujato, you posted it in your bibliography of Vinaya parallels, so do you think it is freely available?

The book was published in 1926, but Waldschmidt died in 1985, so 70 years after his death have not passed yet. He was 88 when he died, so by now it must be his grandchildren who would have the copyright (unless it is with the publisher), and it would be impossible to find them.