Tibetan updates, once more with feeling

sujato · November 1, 2017, 4:38am

As per the old issue here, we have long wanted to update the Tibetan texts.

I made some progress, but along the way got confused and made some errors of both detail and judgement, so I hope to fix those here.

The basic idea is that we completely eliminate the separate category of “Tibetan Critical editions” and just have “Derge” and “Upayika” for the suttas.

In addition there is the Vinaya of course. But the Vinaya has not yet been prepared for SC, and due to its complexity, we have no established set of semantic IDs yet. So for now we simply label the Vinaya texts with their Derge number. The only exception to this is xct-mu-kd-eimer, for which I have not been able to establish a Derge number.

In addition we make a set of other changes.

- Eliminate the old DQ (Derge/Peking) numbers completely. These are a hybrid, which, much like an uruk-hai, serves no master well. Use plain D (Derge) numbers instead.
- In the text files, reconcile the D numbers so that the title and the file ID are the same. The reason they differ is that the THLIB edition uses a different number for its D, while the correct one (AKA the one we use!) is their “Master Catalogue Number”. Add an explanation for this in the text files.
- Eliminate the old tib numbers completely.
- Eliminate duplicates.
- Ensure all text files are updated with the correct ID in:
  - file name
  - file metadata (e.g. <section class="sutta" id="d31">)
  - file heading
- In sutta.json:
  - add Sanskrit title,
  - use correct IDs,
  - move “Q” information (another edition of the Kangyur) to vol/page
- In d.json and parallels.json, use correct IDs.
- In doubtful cases, check against Skilling’s Mahasutras.
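
For anyone wanting to double-check the file name against the file metadata, here is a minimal sketch in Python. It assumes the text files are HTML named after their ID (e.g. a hypothetical d31.html) and that the metadata looks like the <section class="sutta" id="d31"> example above. The function names are made up for illustration, and the heading check is left out because its format is not specified here.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed pattern, modelled on the example <section class="sutta" id="d31">.
SECTION_ID = re.compile(r'<section\s+class="sutta"\s+id="([^"]+)"')


def section_id(html: str) -> Optional[str]:
    """Return the id of the first sutta section, or None if there is none."""
    match = SECTION_ID.search(html)
    return match.group(1) if match else None


def id_mismatch(file_name: str, html: str) -> Optional[str]:
    """Describe a file name / metadata mismatch, or return None if they agree."""
    expected = Path(file_name).stem  # "d31.html" -> "d31"
    found = section_id(html)
    if found is None:
        return f"{file_name}: no sutta section found"
    if found != expected:
        return f"{file_name}: file name says {expected!r}, metadata says {found!r}"
    return None
```

Running id_mismatch over each file's name and contents would list any file where an old ID has survived in the metadata.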

I have done all this, and I think it is ready. The transformation of the IDs from the old system is below. Hopefully we can simply upload these files as is to the next server.

Note that in an earlier version of this scheme I used many more # IDs. But on closer inspection I discovered that this was a mistake: all the sutra D numbers do in fact apply to the whole text, so there is no need for them. The information previously included in the # IDs is vol/page info, and is correctly recorded as such in sutta.json. The # IDs remain for the Vinaya texts only. We do not have the text files for these, so these are not file names, merely parallels. When the Vinaya is properly added we will change these to the correct semantic numbering.
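
To illustrate how a # ID breaks down, here is a rough Python sketch. The split on # follows directly from the IDs in this post. The further reading of the location as volume letters, folio number, and side (a or b) is my assumption based on how the IDs look, and the function names are hypothetical.

```python
import re

# Assumed shape of the part after '#': volume letters, folio number, side a/b.
LOCATION = re.compile(r"^([a-z]+)(\d+)([ab])$")


def split_id(text_id):
    """Split 'd1#kha101a' into ('d1', 'kha101a'); plain IDs get None."""
    base, sep, location = text_id.partition("#")
    return base, (location if sep else None)


def parse_location(location):
    """Split a location such as 'kha101a' into (volume, folio, side)."""
    match = LOCATION.match(location)
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    volume, folio, side = match.groups()
    return volume, int(folio), side
```

For example, split_id("d1#kha101a") gives the Derge number d1 plus the location kha101a, while split_id("d31") gives d31 with no location.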

Here, for the record, is the scheme for conversion.

Old ID	Next ID
dq10	d1#ga4a
dq20	d1#ka304b
dq30	d1#kha53a
dq40	d1#kha63a
dq50	d1#kha86a
dq60	d1#kha101a
dq70	d1#nga258a
dq80	d3#cha215a
dq85	d6#da191a
dq90	d6#tha82a
dq95	d31
dq100	d38
dq102	d41
dq103	d42
dq104	d211
dq105	d296
dq110	d297
dq115	d300
dq120	d313
dq122	d316
dq124	d331
dq126	d337
dq130	d338
dq140	d617
dq150	d974
dq160	up
tib10	(eliminated as a duplicate of d1#kha101a)
tib20	(eliminated as a duplicate of xct-mu-kd-eimer)
tib30	xct-mu-kd-eimer
tib40	(eliminated as a duplicate of d338)
tib60	d290
tib70	d291
tib75	d292
tib78	d293
tib80	d294
tib90	d34
tib100	d33
tib110	(eliminated as a duplicate of Up 3.050)
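
If it helps anyone applying the scheme, the table can be written as a Python lookup. This is only a sketch: the two dictionaries are copied from the table above, but the update_ids helper is hypothetical. It assumes that a reference to an eliminated duplicate should point at the text it duplicated, and it works on a plain list of IDs rather than on the actual layout of parallels.json, which is not shown here.

```python
# Old IDs that simply get a new ID.
RENAMED = {
    "dq10": "d1#ga4a", "dq20": "d1#ka304b", "dq30": "d1#kha53a",
    "dq40": "d1#kha63a", "dq50": "d1#kha86a", "dq60": "d1#kha101a",
    "dq70": "d1#nga258a", "dq80": "d3#cha215a", "dq85": "d6#da191a",
    "dq90": "d6#tha82a", "dq95": "d31", "dq100": "d38", "dq102": "d41",
    "dq103": "d42", "dq104": "d211", "dq105": "d296", "dq110": "d297",
    "dq115": "d300", "dq120": "d313", "dq122": "d316", "dq124": "d331",
    "dq126": "d337", "dq130": "d338", "dq140": "d617", "dq150": "d974",
    "dq160": "up", "tib30": "xct-mu-kd-eimer", "tib60": "d290",
    "tib70": "d291", "tib75": "d292", "tib78": "d293", "tib80": "d294",
    "tib90": "d34", "tib100": "d33",
}

# Old IDs eliminated as duplicates, with the text they duplicated.
DUPLICATE_OF = {
    "tib10": "d1#kha101a",
    "tib20": "xct-mu-kd-eimer",
    "tib40": "d338",
    "tib110": "Up 3.050",  # kept exactly as written in the table
}


def update_ids(ids):
    """Map old IDs to new ones, leave other IDs alone, drop repeats."""
    updated = []
    for old in ids:
        # Assumption: a duplicate is redirected to the text it duplicated.
        new = RENAMED.get(old) or DUPLICATE_OF.get(old) or old
        if new not in updated:
            updated.append(new)
    return updated
```

With this, update_ids(["dq60", "tib10", "mn1"]) collapses the first two entries into a single d1#kha101a and leaves mn1 untouched.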