Side-by-side Translation & Source?

samantha_vipassana · November 3, 2015, 6:59pm

I’m not sure if this is already available, but I think it would be really helpful to be able to see the Paali and the translation side-by-side in a split pane view. I know we can turn on “textual information” and match that up by switching between the /pi/ and /en/ for example, but it would be a lot easier to see them side-by-side imo.

Brahmali · November 4, 2015, 12:25am

Excellent suggestion. I support this 100%.

Vimala · November 4, 2015, 8:07am

It might be an idea to develop this, but in the meantime, I am using Tab Resize on Chrome.
It looks like this: https://samitaasbl.files.wordpress.com/2015/11/tabresize.png
And gives this result: https://samitaasbl.files.wordpress.com/2015/11/splitwindow.png

fxam · November 4, 2015, 12:59pm

I like the way Buddha-Vacana.org presents side-by-side translation.

sujato · November 4, 2015, 3:44pm

Sorry everyone, side by side is a terrible idea. You can see texts side by side by resizing your browser window. That’s what windows are for. It is simply bad design for a website to duplicate a basic function of every computer.

I use side by side windows all the time; in fact, this is how I normally work. I have infinitely more flexibility and power in how I arrange and scale windows than I can ever get by being forced into a configuration on a website. Not to mention the nightmare that is responsive design and adaptation to browser windows anywhere from a watch to a wall-sized screen.

The only advantage in side by side viewing is if you can match the exact place in the Pali with the exact place in the translation. But we can’t do that anyway, at least until the new translations are ready. When that happens, prepare to be awed …

samantha_vipassana · November 4, 2015, 6:40pm

I guess what I was getting at was that you could scroll through both in parallel (something you can’t do with independent tabs or windows). It would also be nice to hover over a word and see its counterpart highlighted on the other side. Sounds like that is something that is coming anyway!

Either way, I’m already very impressed with this site as it is.

LXNDR · November 4, 2015, 7:17pm

to me both sound like cool ideas, enhancement of user experience is always a good thing

the realization of the second one though i assume would be especially labor intensive programmatically, considering the number of languages in which the site hosts translations, and thus not very feasible in the short run

sujato · November 5, 2015, 2:05pm

This is the kind of thing which, with our “next-gen” translations, will become trivially easy. But it all takes time!

yap · April 5, 2016, 12:54pm

I’m not a fan of whole-sutta (chapter) side by side reading, but I think word-level parallel reading for a chosen paragraph is helpful. An example of word-level parallel reading: http://ya.ksana.tw/tipitaka/index2.html

Here is my approach:

Break the various recensions of the sutta text and its translations into small paragraphs and assign a unique ID to each paragraph. By “small” I mean something like the “lowest common multiple” of all recensions; for example, S56-11 requires at least 43 <p> to do things like this: http://rawgit.com/ksanaforge/nikaya_diff/master/
(VRI has 13 <p>, SC Pali has 14 <p>, SC English translation has 24 <p>)
Build one-to-one relationships automatically, and mark missing corresponding words and ambiguities (one-to-many or many-to-one) for human intervention.
For example: mapping “evaṃ me sutaṃ” to “如是我聞” (Kumārajīva), given p1=evaṃ, p2=me, p3=sutaṃ, k1=如, k2=是, k3=我, k4=聞, we thus have p1:k1k2, p2:k3, p3:k4

Another common translation, by XuanZang, is “聞如是”: x1=聞, x2=如, x3=是.
We have p1:x2x3, p2:null, p3:x1

Once we have this information, finding omitted or extra words in a translation is very easy.
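
A minimal sketch of how the mapping above could be stored and queried, assuming 0-based token indices and a plain dictionary from source index to a list of target indices. The names `omitted_source_words` and `extra_target_words` are hypothetical and not taken from any existing tool.

```python
# Tokens from the example above (0-based indices, so p1 is index 0).
pali = ["evaṃ", "me", "sutaṃ"]
kumarajiva = ["如", "是", "我", "聞"]
xuanzang = ["聞", "如", "是"]

# p1:k1k2, p2:k3, p3:k4
align_k = {0: [0, 1], 1: [2], 2: [3]}
# p1:x2x3, p2:null, p3:x1 (null is stored as an empty list)
align_x = {0: [1, 2], 1: [], 2: [0]}


def omitted_source_words(source, alignment):
    """Return source words that have no counterpart in the translation."""
    return [word for i, word in enumerate(source) if not alignment.get(i)]


def extra_target_words(target, alignment):
    """Return target words that are not linked to any source word."""
    used = {j for targets in alignment.values() for j in targets}
    return [word for j, word in enumerate(target) if j not in used]
```

With this layout, `omitted_source_words(pali, align_x)` reports the word “me”, which has no counterpart in “聞如是”, while the Kumārajīva mapping reports nothing omitted.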

It is obvious that word order is not likely to be the same across translations. At first I thought the order would be preserved at the paragraph level, but when paragraphs are very small, the target paragraphs might be swapped or rearranged, just as happens at the word level.

So I started building a tool to help with manual paragraph-breaking and alignment.

rawgit.com/ksanaforge/alignparagraph/master/

When the Enter key is pressed, instead of inserting a CR-LF into the text, the position is added to the “break point offsets” and the screen is laid out again accordingly. The text itself is kept intact.
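
The idea of keeping break points outside the text can be sketched roughly as follows. This is only an illustration under the assumption that offsets are character positions into an unmodified string; the class name `BreakPoints` and its methods are hypothetical and do not describe the actual tool.

```python
import bisect


class BreakPoints:
    """Paragraph breaks stored as offsets; the text itself is never changed."""

    def __init__(self, text):
        self.text = text
        self.offsets = []  # sorted character offsets where a new paragraph starts

    def add_break(self, offset):
        """Record a break at the cursor position (what pressing Enter would do)."""
        if not 0 < offset < len(self.text):
            raise ValueError("break must fall strictly inside the text")
        i = bisect.bisect_left(self.offsets, offset)
        # Ignore a break that is already recorded at this position.
        if i == len(self.offsets) or self.offsets[i] != offset:
            self.offsets.insert(i, offset)

    def paragraphs(self):
        """Slice the untouched text at the recorded offsets for display."""
        bounds = [0] + self.offsets + [len(self.text)]
        return [self.text[a:b] for a, b in zip(bounds, bounds[1:])]
```

Because the breaks live in a separate list, several different segmentations of the same recension can coexist, and joining the paragraphs always gives back the original text.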

I’m thinking of using the tool to create unified IDs for all recensions; these IDs can serve as an interchange unit for the various paragraph numbering systems. Any suggestions?

sujato · April 5, 2016, 10:58pm

Well, this all sounds awesome.

To start with, so you know what we’re working on at this end.

I am developing an English translation of the Pali text based on a segmented version of the Mahasangiti text. This can break the text at any level we choose, but so far we are working on breaking it at major punctuation, that is, most punctuation excluding commas. This results in a fairly consistently segmented text, and one which, in the vast majority of cases, can be well translated segment by segment.
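
As a rough illustration of breaking at major punctuation, here is a sketch that assumes “major punctuation” means the characters . ; : ? ! and that each break is followed by whitespace. The actual rule set is not specified here, and the ID format `<uid>:<n>` is hypothetical.

```python
import re

# Assumption: these five characters count as "major punctuation".
# Commas are deliberately left out, so they never start a new segment.
SPLIT_AFTER_MAJOR = re.compile(r"(?<=[.;:?!])\s+")


def segment(text):
    """Split text into segments after major punctuation; commas never split."""
    return [part for part in SPLIT_AFTER_MAJOR.split(text.strip()) if part]


def with_ids(uid, segments):
    """Pair each segment with a hypothetical ID of the form '<uid>:<n>'."""
    return [(f"{uid}:{n}", seg) for n, seg in enumerate(segments, start=1)]
```

A sketch this simple does not handle closing quotation marks after a full stop or inconsistent punctuation in the root text.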

There are of course exceptions to this. The root text is not always consistently punctuated, so the segments can be inconsistent. And sometimes syntax demands translating out of order, especially in verse. And there’s the question of the abbreviated sections, which are always hard to handle elegantly.

Given the nature of Pali, which can use very long sentences, we can sometimes end up with quite large chunks in one segment. Still, on the whole this gives us a nicely matched set of Pali/English segments. Here’s a sample text for you, a typical Anguttara sutta.

an5.032.po.zip (3.5 KB)

As you can see it’s all in PO, with the HTML embedded as comments so we can reconstitute the HTML files at the end of the day. Obviously this can be further transformed as needed. We’ve also developed a LaTeX flow for producing books.

Each segment has an ID, and that ID will be consistently matched between source and target texts across the whole canon.

But this is just the start. We also plan to support further translations into other target languages. In this way the IDs will not only be consistent with the English and Pali, but also with all other translations. So a reader of a sutta in Italian can check, say, what the English translation of that segment was. In addition, we plan to implement segment-level notes, which could become available cross-language as well. Display options also become available, and we plan to enable the user to read the text in different ways as convenient, such as side by side, line by line, or with original text as popup (like Google translate).
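
To make the shared-ID idea concrete, here is a small sketch assuming each language is stored as a dictionary keyed by segment ID. The IDs, language codes, and placeholder strings are all hypothetical, as are the function names.

```python
# Hypothetical IDs and placeholder strings, for illustration only.
SEGMENTS = {
    "pli": {"an5.32:1": "[Pali segment 1]", "an5.32:2": "[Pali segment 2]"},
    "en": {"an5.32:1": "[English segment 1]", "an5.32:2": "[English segment 2]"},
    "it": {"an5.32:1": "[Italian segment 1]"},  # segment 2 not yet translated
}


def lookup(segments, segment_id, lang):
    """Return one segment in one language, or None if it is not available."""
    return segments.get(lang, {}).get(segment_id)


def side_by_side(segments, source_lang, target_lang):
    """Return (id, source, target) rows in source order; target may be None."""
    target = segments.get(target_lang, {})
    return [
        (sid, text, target.get(sid))
        for sid, text in segments[source_lang].items()
    ]
```

Because every language shares the same keys, a reader of the Italian can call `lookup(SEGMENTS, "an5.32:1", "en")` to check the English for that segment, and the same rows could feed a side by side, line by line, or popup display.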

So far the scope of the translation is the four nikayas, and I plan to cover all the EBTs in Pali, that is, including the six early books of the Khuddaka, and the Vinaya. I’d also like to do the relevant Sanskrit texts.

Our web app will be made available for other translators to sign up to, and we will support translation into other target languages on an ongoing basis. All our translations and software are, of course, CC0, with no copyright restrictions of any sort.

In addition, we’ll look at extending support to the non-Indic languages. This is not trivial, however, mainly due to the Taisho text’s notoriously erratic punctuation. I haven’t looked into segmenting Tibetan texts yet, although given the relatively small quantity, doing it by hand is not unfeasible.

So anyway, at the very least making the ID system we use interchangeable with what you’re doing would be great.

But the main question I have for you is, how do you do it? I mean, to match up the texts at such a fine-grained level? To do it by hand with a small selection of texts is one thing, but across the whole corpus?