SuttaCentral

Proposal: use three-letter ISO codes for ancient languages


#1

Usually on SC we use two-letter ISO codes for languages. There are some exceptions:

  • For Sanskrit, we use skt rather than sa: lots of things in Pali/Sanskrit start with sa-, so the two-letter code is not very convenient.
  • For Chinese, we adopted the classical Chinese language code of lzh for ancient texts, using zh for modern Chinese.

It seems prudent to follow the same practice for Tibetan, and use the classical code xct for the ancient texts. While we don’t have any modern Tibetan translations, it is fairly likely that we will at some stage. The Dalai Lama has requested that the Pali canon be translated into Tibetan. Thus we should reserve bo for modern translations.

When it comes to the more obscure languages, most of them have only three-letter codes: pra for Prakrit, pgd for Gandhari, etc.

Thus almost all our ancient texts will use three-letter codes anyway, so why not make it a rule? We can adopt pli for Pali, and use the correct san for Sanskrit. This will give us a handy way of distinguishing ancient and modern languages, as, so far at least, all our modern languages use two-letter codes.

The full list of ancient languages will be as follows.

Pali         pli
Chinese      lzh
Tibetan      xct
Sanskrit     san
Prakrit      pra
Gandhari     pgd
Uighur       uig
Tocharian A  xto
Khotanese    kho
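
The rule proposed above can be sketched in a few lines of Python. This is only an illustration, not SuttaCentral's actual implementation: the names ANCIENT_LANGUAGES, is_ancient, and language_name are hypothetical, and the length check relies on the assumption stated in the post that all modern languages keep two-letter codes.

```python
from typing import Optional

# Codes copied from the list in the post; the dict itself is illustrative.
ANCIENT_LANGUAGES = {
    "pli": "Pali",
    "lzh": "Chinese",
    "xct": "Tibetan",
    "san": "Sanskrit",
    "pra": "Prakrit",
    "pgd": "Gandhari",
    "uig": "Uighur",
    "xto": "Tocharian A",
    "kho": "Khotanese",
}


def is_ancient(code: str) -> bool:
    """Hypothetical helper: under the proposed rule, a three-letter code
    marks an ancient language and a two-letter code a modern one."""
    return len(code) == 3


def language_name(code: str) -> Optional[str]:
    """Return the ancient language name for a code, or None if unknown."""
    return ANCIENT_LANGUAGES.get(code)
```

With this sketch, is_ancient("xct") is true for the classical Tibetan texts, while is_ancient("bo") is false, leaving bo free for modern Tibetan translations.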

#2

Not much response so far. Up to you, I guess.


#3

This has already been adopted by the team and is under implementation.


#4

Out of curiosity, I had to look it up (Wiki):

LZH may refer to:

Classical Chinese (ISO 639: lzh), a written form of Old Chinese
LHA (file format), a data compression format
Liuzhou Bailian Airport (IATA code: LZH), an airport in China

Classical Tibetan
Region: Tibet, North Nepal
Era: 10th–12th centuries
Language family: Sino-Tibetan > Tibeto-Kanauri ? > Bodish > Tibetic > Classical Tibetan
Early form: Old Tibetan
Writing system: Tibetan script

Language codes
ISO 639-3: xct
Linguist List: xct