Text structuring: Multiple Translations/Editions

blake · April 15, 2016, 9:08pm

This relates to the idea we’ve had for a while for supporting multiple translations in the same language, for example Bodhi and Sujato.

What I want to do is come up with a folder and URL structure which elegantly handles this.

The first thing is I think what we currently have as the collection/division/sutta structure should be labelled the “navigation” or “menu” structure because it’s not precisely canonical. It’s simply the structure we use to build our menus.

Next is the concept of a “manuscript”, “book” or “publication”, which may be online (technically we can even have an online edition of a manuscript). I would like to use the word “edition” to refer to any such thing, at least for organizational purposes, although perhaps “source” or “origin” would also work. Approximately speaking, an edition would be a collection of texts with a common creator (whether individual or organization) and copyright; more importantly, a text would be unique within an edition.

Edition:

An edition is a collection of texts with a common creator, whether an individual or an organization.
An edition has a label and a uid.
A text is unique within an edition.
Editions are at a higher organizational level than languages.
An edition should (but need not) have a common copyright applicable to all texts within it.
An edition may have translations from multiple languages and/or into multiple languages but this would be unusual. Example: The same translator doing both Pali and Sanskrit translations.
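As a rough illustration of the properties listed above, here is a minimal Python sketch. The class name, the field names, and the choice to key texts by (translation language, text uid) are assumptions made for the example, not anything decided in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Edition:
    """Hypothetical model of an edition as described in the list above."""
    uid: str                                  # e.g. "sujato"
    label: str                                # human-readable name
    creator: str                              # individual or organization
    copyright_notice: Optional[str] = None    # "should, but need not" be common
    # Assumption: a text is identified by (translation language, text uid).
    texts: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_text(self, lang: str, text_uid: str, filename: str) -> None:
        """Register a text, enforcing uniqueness within this edition."""
        key = (lang, text_uid)
        if key in self.texts:
            raise ValueError(f"{text_uid} ({lang}) already exists in edition {self.uid}")
        self.texts[key] = filename
```

Adding `("en", "dn1")` twice to the same edition raises `ValueError`, which is the "a text is unique within an edition" rule made explicit.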

I’d thought about having something like:

dn1-sujato.html
dn1-bodhi.html

But in the end from an organizational perspective, I think it’s best that all texts from a common creator should be in the same folder structure. Furthermore, I think that folder should be at the very highest level which would mean the file path might look like this:

/text/sujato/en/pi/su/dn/dn1.html

For ease of transition there would be a grab-all folder:

/text/common/...

While ideally you would place a group of translations from a particular origin into its own folder, you can also just dump them into common, at least to the extent there are no collisions with existing translations.

I’m not decided on whether it’s better to have text/sujato/en/pi or text/en/sujato/pi, the relevant thing is both the translation language and the translator actually exist outside the primary navigation structure. It will in any case be fairly rare that the same translator translates to multiple languages so maybe it better belongs under the translation language.
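To make the two candidate layouts concrete, here is a small Python sketch that builds the file path either way. The function name, its parameters, and the `edition_first` flag are hypothetical; only the resulting path shapes come from the examples above.

```python
from pathlib import PurePosixPath
from typing import Sequence


def text_path(edition: str, lang: str, root_lang: str,
              nav: Sequence[str], uid: str,
              edition_first: bool = True) -> PurePosixPath:
    """Build the file path for one text.

    edition_first=True  -> /text/<edition>/<lang>/<root_lang>/...
    edition_first=False -> /text/<lang>/<edition>/<root_lang>/...
    """
    head = (edition, lang) if edition_first else (lang, edition)
    # The navigation segments (e.g. "su", "dn") stay to the right of the
    # edition and the translation language in both layouts.
    return PurePosixPath("/text", *head, root_lang, *nav, f"{uid}.html")


print(text_path("sujato", "en", "pi", ["su", "dn"], "dn1"))
print(text_path("sujato", "en", "pi", ["su", "dn"], "dn1", edition_first=False))
```

Either way the edition sits outside the navigation structure, so switching between the two layouts is a matter of swapping two path segments.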

Now as for why I think each edition should have its own folder rather than mixing up texts from different editions. I feel it is superior from an organizational perspective: in a sense the texts from the same origin belong together more than texts in the same language. This is especially true when the origin is a book; the book object should be a container (i.e. folder) which contains all the texts from it. There is also the DRY principle: if you use mn1-bodhi.html, mn2-bodhi.html… and so on, you’re repeating yourself an awful lot. Then there is the language/division/subdivision/vagga type structure: the edition just doesn’t belong anywhere in that navigation structure, it belongs outside it (either to the right or the left). And finally, just practically, it is convenient for search/replace over multiple files, as often there will be a problem particular to a set of texts from a common source (say, for example, misuse of a class).

The other thing is the URL, which can be quite independent of the folder structure. It’s actually not important right now, because I’m mainly interested in making sure the data structure is forward compatible, but I’m thinking something like en/sujato/dn1 or en/dn1/sujato.

sujato · April 16, 2016, 1:41am

The idea on editions is great.

I would definitely put it under the same language. Even if a translator translates into multiple languages (which is so rare I can’t think of any examples on SC), there’s no real advantage in lumping them together: find/replace, publication details, copyright and so on will all be different.

Can, but if possible I’d like to keep them at least related, keep the structure clear.

I prefer the latter, it makes it clear it’s the same thing. You can drop the last item off and still get the same thing, or just rewrite the URL by hand to get a different author.

mikenz66 · April 16, 2016, 2:34am

Will it be possible to do this without completely breaking links? E.g. if you add the subdirectory, can /en/mn1 default to something? Similarly for links to paragraphs: if they break, we should at least get the sutta.

sujato · April 16, 2016, 3:03am

Don’t worry, we will try not to break anything. But it does get a bit complicated, so let me work through some of the logic.

Take a URL that points to, say:

/en/mn1

That is, the current translation on SC of MN1, which happens to be that of Bhikkhu Bodhi. That can be redefined as:

/en/mn1-bodhi

Or something of the sort. But if you go to en/mn1 it will redirect you to a translation. Here’s where it gets tricky.

Normally, the redirect would go to whatever is considered the “default” translation. Let’s assume that this is mine. So:

/en/mn1

shows the same text as:

/en/mn1-sujato

That is, unless you have specifically chosen to use Bhikkhu Bodhi’s texts by default (an option we plan to implement), in which case

/en/mn1

shows the same text as

/en/mn1-bodhi

So this is a bit awkward, the same URL can have two different texts.
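The redirect logic above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `EDITIONS` and `SITE_DEFAULT` tables, the function name, and the `preferred` parameter are all hypothetical, and the `-author` suffix is just the "something of the sort" form used above.

```python
from typing import Optional

# Hypothetical data: which editions exist per text, and the site-wide default.
EDITIONS = {"mn1": {"bodhi", "sujato"}}
SITE_DEFAULT = {"mn1": "sujato"}


def resolve(url: str, preferred: Optional[str] = None) -> str:
    """Map /<lang>/<uid>[-<edition>] to an explicit edition URL."""
    lang, slug = url.strip("/").split("/")
    uid, sep, edition = slug.rpartition("-")
    # Treat the suffix as an edition only if it is a known one for that text,
    # so that uids which themselves contain hyphens are left alone.
    if sep and edition in EDITIONS.get(uid, ()):
        return f"/{lang}/{uid}-{edition}"
    uid = slug
    # A user preference wins over the site default when that edition exists.
    if preferred in EDITIONS.get(uid, ()):
        return f"/{lang}/{uid}-{preferred}"
    return f"/{lang}/{uid}-{SITE_DEFAULT[uid]}"


print(resolve("/en/mn1"))                     # site default
print(resolve("/en/mn1", preferred="bodhi"))  # user preference
print(resolve("/en/mn1-bodhi"))               # explicit edition is kept
```

The awkwardness is visible in the first two calls: the same bare URL resolves to different texts depending on who is asking, whereas a URL that names the edition is stable.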

Moreover, a link, which may have been made to the Bodhi text, now points to another one. This might not matter in many cases, but sometimes it will.

A further problem is the paragraph linking. If I have a paragraph number for BB, this is not always going to work in another text.

We could kludge this problem by simply saying, keep the existing texts as default. But while this would be fine for BB’s texts, many of our translations are subpar and should definitely not be default.

One option might be to infer the text from the paragraph links. Let us assume we have a paragraph link to mn1#7. This will go to ID number 7 in the existing text. In other texts we would not reuse that scheme; instead we would prefix the ID, so that the link must be, for example, mn1#new7. If each of the texts has a distinct ID system, this might work, although it is somewhat complex.
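A minimal sketch of that inference, assuming each edition owns one ID prefix. The prefix table is hypothetical (the bare numeric IDs are mapped to the existing text and "new" to another edition purely for illustration). Note also that a URL fragment is not sent to the server, so in practice this logic would have to run in the browser; Python is used here only to show the idea.

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping from paragraph-ID prefix to edition.
PREFIX_TO_EDITION = {"": "bodhi", "new": "sujato"}


def edition_from_link(link: str) -> Tuple[str, Optional[str]]:
    """Split 'mn1#new7' into the text uid and the edition implied by the ID."""
    uid, _, fragment = link.partition("#")
    match = re.fullmatch(r"([a-z]*)(\d+)", fragment)
    if match is None:
        # No usable paragraph ID, so nothing can be inferred.
        return uid, None
    return uid, PREFIX_TO_EDITION.get(match.group(1))


print(edition_from_link("mn1#7"))
print(edition_from_link("mn1#new7"))
print(edition_from_link("mn1"))
```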

Blake or Vimala may have some more ideas on this.

mikenz66 · April 16, 2016, 3:10am

Thanks Bhante,

As I said, it would be good if broken links at least gave something reasonably logical. For example, I reported a problem with the linking to paragraphs in the Vinaya a little while ago. Using that link just gives a 404 error. A better solution would have been to give the Vinaya section. Of course, that might be too time-consuming to implement, but it would be a good aim.

sujato · April 16, 2016, 3:27am

Sorry, I’ve forgotten about this, can you let me know what happened, or link to the original report?

mikenz66 · April 16, 2016, 3:31am

Sorry, here’s the link.

sujato · April 16, 2016, 8:34am

Ahh, yes, all coming back to me now. You are quite right, the link should fall back to the page as a whole if it’s broken. For further details, we’ll have to wait a little and see how we proceed.

blake · April 16, 2016, 8:51am

Ah, well that’s a case of a link which never worked to begin with, nor can it be converted to a working link (since the language is lost). We will definitely make sure that links that work now will continue to work in the future. It is likely there will be minor incompatibilities though, but we will try to make sure they point to approximately the right place.

mikenz66 · April 16, 2016, 8:59am

Thanks @blake,

I was wondering if the system might be smart enough to do something helpful even for links that are wrong. You have code in the Discourse forum that recognises, for example, MN1. If the error-handling code did the same, it could at least offer some useful suggestion for a link like
suttacentral.org/.../mn1/…

Whether this is worth the trouble is, of course, another question.
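For what it’s worth, a rough sketch of what such an error handler could do: scan the segments of the failed path for something that looks like a sutta reference and suggest a page for it. The pattern, the short list of collection abbreviations, and the fallback language are assumptions for the example, not the forum’s actual recognition code.

```python
import re
from typing import Optional

# Illustrative pattern only: a few collection abbreviations followed by a number.
UID_PATTERN = re.compile(r"^(dn|mn|sn|an)\d+(\.\d+)?$", re.IGNORECASE)


def suggest(path: str, default_lang: str = "en") -> Optional[str]:
    """Return a suggested URL for a path that produced a 404, if any."""
    for segment in path.strip("/").split("/"):
        if UID_PATTERN.match(segment):
            # The language may be lost in a broken link, so fall back to a
            # default language (an assumption of this sketch).
            return f"/{default_lang}/{segment.lower()}"
    return None


print(suggest("/somewhere/MN1/whatever"))
print(suggest("/no/sutta/here"))
```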