Tibetan updates, once more with feeling

As per the old issue here, we have long wanted to update the Tibetan texts:

I made some progress, but along the way got confused and made some errors of both detail and judgement, so I hope to fix those here.

The basic idea is that we completely eliminate the separate category of “Tibetan Critical editions” and just have “Derge” and “Upayika” for the suttas.

In addition there is the Vinaya of course. But the Vinaya has not yet been prepared for SC, and due to its complexity, we have no established set of semantic IDs yet. So for now we simply label the Vinaya texts with their Derge number. The only exception to this is xct-mu-kd-eimer, for which I have not been able to establish a Derge number.

In addition we make a set of other changes.

  • Eliminate the old DQ (Derge/Peking) numbers completely. These are a hybrid which, much like an Uruk-hai, serves no master well. Use plain D (Derge) numbers instead.
  • In the text files, reconcile the D numbers so that the title and the file ID are the same. The reason they are different is that the THLIB edition uses a different number for its D, while the correct one (AKA the one we use!) is their “Master Catalogue Number”. Add an explanation for this in the text files.
  • Eliminate the old tib numbers completely.
  • Eliminate duplicates.
  • Ensure all text files are updated with correct ID in:
    • file name
    • file metadata (e.g. <section class="sutta" id="d31">)
    • file heading
  • In sutta.json:
    • add Sanskrit title,
    • use correct IDs,
    • move “Q” information (another edition of the Kangyur) to vol/page
  • In d.json and parallels.json use correct IDs
  • In doubtful cases check against Skilling’s Mahasutras.
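
As a rough illustration of the "correct ID in file name and file metadata" check, here is a minimal Python sketch. It assumes each text file is named after its ID (e.g. `d31.html`, where the `.html` extension is my assumption) and contains a `<section class="sutta" id="...">` tag as in the example above. The function name `check_ids` is hypothetical, and the file heading is not checked because its exact markup is not specified here.

```python
import re
from pathlib import Path

# Matches the metadata tag shown above, e.g. <section class="sutta" id="d31">.
SECTION_ID = re.compile(r'<section\s+class="sutta"\s+id="([^"]+)"')


def check_ids(file_name: str, html: str) -> list:
    """Return a list of problems; an empty list means the IDs agree."""
    expected = Path(file_name).stem  # "d31.html" -> "d31"
    problems = []
    match = SECTION_ID.search(html)
    if match is None:
        problems.append(f'{file_name}: no <section class="sutta"> tag found')
    elif match.group(1) != expected:
        problems.append(
            f"{file_name}: section id {match.group(1)!r} "
            f"does not match file name {expected!r}"
        )
    return problems
```

Running it over a folder of text files and printing any non-empty result would flag files where an old dq or tib ID was left behind in the metadata.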

I have done all this, and I think it is ready. The transformations of the IDs from the old system are listed below. Hopefully we can simply upload these files as is to the next server.

Note that in an earlier version of this scheme I used many more # IDs. On closer inspection I discovered that this was a mistake: all the sutra D numbers apply to the whole text, so there is no need for them. The information previously included in the # IDs is vol/page info, and is correctly recorded as such in sutta.json. The # IDs remain for the Vinaya texts only. We do not have the text files for these, so these are not file names, merely parallels. When the Vinaya is properly added we will change these to the correct semantic numbering.
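
To make the shape of these IDs concrete, here is a small Python sketch that splits an ID into its D number and the optional vol/page part after the `#`. The pattern is an assumption inferred from the IDs in this post (lowercase volume letters followed by a folio number and side `a` or `b`), and `split_id` is a hypothetical helper, not something that exists in the SC code.

```python
import re

# Assumed shape: "d" + number, optionally "#" + volume letters + folio + side.
ID_PATTERN = re.compile(r"^(d\d+)(?:#([a-z]+)(\d+[ab]))?$")


def split_id(text_id: str):
    """Split e.g. 'd1#kha101a' into ('d1', 'kha', '101a')."""
    match = ID_PATTERN.match(text_id)
    if match is None:
        raise ValueError(f"not a D-style ID: {text_id!r}")
    # Volume and folio are None for plain D numbers such as 'd31'.
    return match.group(1), match.group(2), match.group(3)
```

IDs that do not follow the D scheme, such as xct-mu-kd-eimer, raise a `ValueError` in this sketch.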

Here, for the record, is the scheme for conversion.

Old ID Next ID
dq10 d1#ga4a
dq20 d1#ka304b
dq30 d1#kha53a
dq40 d1#kha63a
dq50 d1#kha86a
dq60 d1#kha101a
dq70 d1#nga258a
dq80 d3#cha215a
dq85 d6#da191a
dq90 d6#tha82a
dq95 d31
dq100 d38
dq102 d41
dq103 d42
dq104 d211
dq105 d296
dq110 d297
dq115 d300
dq120 d313
dq122 d316
dq124 d331
dq126 d337
dq130 d338
dq140 d617
dq150 d974
dq160 up
tib10 (eliminated as a duplicate of d1#kha101a)
tib20 (eliminated as a duplicate of xct-mu-kd-eimer)
tib30 xct-mu-kd-eimer
tib40 (eliminated as a duplicate of d338)
tib60 d290
tib70 d291
tib75 d292
tib78 d293
tib80 d294
tib90 d34
tib100 d33
tib110 (eliminated as a duplicate of Up 3.050)
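
For anyone who needs to apply this table mechanically, for instance when updating references in parallels.json, the following Python sketch parses lines in the format used above. Only an excerpt of the table is included. The function names are hypothetical, and redirecting an eliminated ID to the text it duplicated is my assumption about the desired behaviour, not something stated above.

```python
import re

# A short excerpt of the table above, in the same line format.
TABLE_EXCERPT = """\
dq10 d1#ga4a
dq95 d31
dq160 up
tib10 (eliminated as a duplicate of d1#kha101a)
tib30 xct-mu-kd-eimer
tib110 (eliminated as a duplicate of Up 3.050)
"""

ELIMINATED = re.compile(r"^\(eliminated as a duplicate of (.+)\)$")


def parse_table(text: str):
    """Return (renamed, eliminated) dicts keyed by old ID."""
    renamed, eliminated = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        old_id, rest = line.split(maxsplit=1)
        match = ELIMINATED.match(rest)
        if match:
            eliminated[old_id] = match.group(1)  # surviving text
        else:
            renamed[old_id] = rest
    return renamed, eliminated


def convert(old_id: str, renamed: dict, eliminated: dict) -> str:
    """Map an old ID to its next ID; unknown IDs pass through unchanged."""
    if old_id in renamed:
        return renamed[old_id]
    if old_id in eliminated:
        # Assumption: point references at the text this one duplicated.
        return eliminated[old_id]
    return old_id
```

Keeping the eliminated entries in a separate dict makes it easy to review them by hand before rewriting any references.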