Incorporating author/translator codes in SuttaCentral

sujato · November 14, 2016, 10:49am

This is an issue that will be pending for the future evolution of SuttaCentral. However, it came up recently in a specific text, so I thought it was worth sharing my thoughts here. This is mainly for @blake and @vimala, although any thoughts are welcome.

So far we only include one translation per text, and thus there has been no need to identify the translator in the URL. In the future, we will require this, as multiple translations for one text become available. In addition, this is one way to improve visibility of translators, an issue that has been flagged by some users.

## How Access to Insight does it: four-letter author codes

This issue was discussed quite some time ago by John Bullitt, who advocated for the significance of standardized and robust URLs. It’s beyond the scope of this article to discuss this principle. I would simply note that the example of SC shows that such standardized systems are difficult to enforce. At AtI they were not thinking of the vast range of non-Pali texts that we include, and which necessitated a complete rethink of how abbreviations are used. It’s all about the namespace. The critical thing is that the system be internally coherent and consistent; then it’s relatively trivial to convert things consistently to another namespace when needed.

Bullitt developed a system of four-letter author abbreviations. (For the sake of this article, I’ll just use “author” to include translator, editor, etc.) These are applied consistently throughout Access to Insight. So we can readily see that mn.001.than.html was translated by Ven. Thanissaro. The four-letter codes are short enough to maintain handy URLs, while at the same time allowing many authors to be identified from the abbreviation alone.
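To make the structure of such a filename concrete, here is a minimal sketch that splits one apart. The pattern is an assumption generalized from the single example above (collection, number, four-letter author code, `.html`); it is not taken from Access to Insight's own documentation, and `parse_ati_filename` is a hypothetical helper name.

```python
import re

# Assumed shape: <collection>.<number>.<four-letter author code>.html
# This is inferred from the one example "mn.001.than.html" only.
ATI_FILENAME = re.compile(
    r"^(?P<collection>[a-z]+)\.(?P<number>\d+)\.(?P<author>[a-z]{4})\.html$"
)


def parse_ati_filename(filename):
    """Return (collection, number, author_code), or None if the name does not fit."""
    match = ATI_FILENAME.match(filename)
    if match is None:
        return None
    return match["collection"], int(match["number"]), match["author"]


print(parse_ati_filename("mn.001.than.html"))
```

Note that the code only recovers the abbreviation `than`; mapping it to a person still needs a lookup table, which is exactly the limitation discussed next.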

So there are good reasons to adopt this scheme, and perhaps that is what we should do.

On the other hand, this scheme has limitations.

- In many cases, it is not apparent from the abbreviation who the author is. Could we guess that rhyc is Mrs. CAF Rhys Davids, while rhyt is Mr. TW Rhys Davids?
- Another limitation is the inclusion of authors from other languages. As Bullitt acknowledges, this falls outside the scope of his project.
- Even for English texts, we omit many of the authors in the AtI list, while adding new ones.

## Another option: full name slugs

Up until now, I have insisted on extreme mortification of URLs. Almost all our URLs rely solely on two-letter abbreviations and a number or two. We identify MN 1 with mn1, adding the language where necessary, /pi/mn1, /en/mn1, and so on. Perhaps it is time to relax this stringency.

While it is, I feel, not unreasonable to expect knowledgeable users to know the ISO language code for their language, and to know or acquire the knowledge of our abbreviation system at least for the main texts, remembering a long list of translator codes is too much. So perhaps we should instead supply a slug of the full author name.

This slug would follow the usual principles in designing stable URLs (as also discussed by John Bullitt): no diacriticals, only URL-safe glyphs, no spaces or underscores, all lowercase. Similar principles are used to slug the post titles as URLs here on Discourse (and WordPress, etc.). Titles are dropped. If an author has published under multiple names (for example Lawrence Khantipalo Mills), the most common name for that author is used (Khantipalo).

Thus:

- Sujāto would become sujato, Bodhi would be bodhi, Ṭhānissaro would be thanissaro.
- Lay authors would include the full surname, with initials if necessary. So caf-rhys-davids, tw-rhys-davids, norman.
- Groups can simply be listed as such: kelly-sawyer-wareham.
- Organizational names can be similarly treated: burmese-pitaka-association, yahoo-pali-group.
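The slug rules above could be sketched roughly as follows. This is only an illustration, not SuttaCentral code: the function name `author_slug` and the `TITLES` list are assumptions, and choosing the "most common name" for an author with several names would still need a hand-maintained table.

```python
import re
import unicodedata

# Hypothetical list of titles to drop; the post only says "Titles are dropped".
TITLES = {"ven", "bhikkhu", "bhikkhuni", "mr", "mrs", "ms", "dr"}


def author_slug(name):
    """Build a lowercase, hyphen-separated, ASCII-only slug from an author name."""
    # Decompose accented letters, then discard the combining marks (ā -> a, Ṭ -> T).
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop periods so initials such as "C.A.F." collapse into "caf".
    plain = plain.replace(".", "").lower()
    # Split on anything that is not a-z or 0-9, which also removes spaces,
    # underscores, and punctuation.
    words = [w for w in re.split(r"[^a-z0-9]+", plain) if w]
    # Remove titles, then join the remaining words with hyphens.
    return "-".join(w for w in words if w not in TITLES)


for name in ["Sujāto", "Ṭhānissaro", "Mrs C.A.F. Rhys Davids",
             "Kelly, Sawyer, Wareham", "Burmese Pitaka Association"]:
    print(name, "->", author_slug(name))
```

One design choice worth noting: stripping diacritics through Unicode decomposition avoids maintaining a character-by-character table, but it only works for letters that decompose into a base letter plus combining marks.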

I would suggest these be appended to the current URLs. In this way, if you leave out the author, or get it wrong, we can default back to the “details” page and you can find your way back easily. Since the author names are free of the confusing short URL codes, they can easily be read as plain English, and serve to practically identify the author. Thus we would have /en/mn1/sujato, /en/mn1/bodhi, and so on.
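As a rough sketch of that fallback behaviour: the `TRANSLATIONS` registry, the `resolve` function, and the tuple return values below are all hypothetical, invented only to illustrate the idea of serving a translation when the author slug is known and falling back to the details page otherwise.

```python
# Hypothetical registry: which author slugs exist for a given language and text.
TRANSLATIONS = {
    ("en", "mn1"): {"sujato", "bodhi"},
}


def resolve(path):
    """Map a path like /en/mn1/sujato to a translation, or fall back to details."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected at least /<lang>/<text>")
    lang, uid = parts[0], parts[1]
    author = parts[2] if len(parts) > 2 else None
    known_authors = TRANSLATIONS.get((lang, uid), set())
    if author in known_authors:
        return ("text", lang, uid, author)
    # Author missing or unrecognized: send the reader to the details page.
    return ("details", uid)


print(resolve("/en/mn1/sujato"))
print(resolve("/en/mn1"))
print(resolve("/en/mn1/bohdi"))
```

Because the author is the last path segment, a mistyped or omitted name degrades gracefully rather than producing a dead link.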

It would be more logical to have the language code next to the author name, so that is something we can look into.