Evaluation of Translation Software

sirinath · July 12, 2015, 3:29am

I have seen on this forum and Ven. Sujato’s blog some references to translation-related software. Perhaps the discussion can be brought onto one thread so all information is in one place.

Brenna · July 12, 2015, 5:33am

The conversation regarding the translation software is here:

mhviriyo · April 25, 2017, 1:41pm

I see this is an old thread. So not sure whether to start a new one or just plug into this. I could not seem to find much other discussion regarding software solutions except the single one regarding Pootle.

I am embarking on translating MN 10 into Estonian (and there is a whole Tipitaka translation project being set up). I could use a thousand ways to go about it technically from a simple document to setting up a CAT environment of some sort – but ideally I’d like to do it right in relation to the wonderful work being conducted here from the word go. Since we have almost nothing (besides an older paper translation of the Dhammapada), we can turn any which technical way we want.

For example, I’d be comfortable working with the .po file found here, but our need is actually a human-readable text, not a machine-readable one. If I invest the time into working with primitive tools such as Poedit or Gtranslator (OSS and Linux, of course), I might end up with quite some double work. Any point in going the .po way right now?

For example, do we know of ways to extract the text out of a PO file (using a different spelling on purpose, for the search engine) into a nice human-friendly layout? The original purpose of the .po files seems a bit different (localization), so I’m wondering about that. The format does not seem to contain markers for paragraphs and such; it’s just about “expression-for-expression” adaptation, so that would lead to double work, etc.

sujato · April 25, 2017, 11:15pm

Hi Ven,

Lovely to see you here, and thanks for the question.

If you’re embarking on a large-scale Tipitaka translation project, I’d strongly encourage you to consider using our framework: that’s what we built it for! Not all the pieces are in place yet, which is why we have not opened it more widely, but the basic translation engine is more than usable. If you like, I can invite you to Pootle and you can have a look around.

You are quite right, PO files in and of themselves do not contain paragraphing and other essential information, such as reference metadata. What we do is extract that data from the HTML files and set it aside as comments in the PO file. It can then be recombined when exported back to HTML. For example:

#. </p><p><a class="sc" id="3"></a><a class="pts" id="pts5.312"></a>
msgctxt "sn54.1:6.1"
msgid "‘Pītippaṭisaṃvedī assasissāmī’ti sikkhati, ‘pītippaṭisaṃvedī passasissāmī’ti "
"sikkhati;"
msgstr "They practice like this: ‘I’ll breathe in experiencing rapture.’ They practice like this: ‘I’ll breathe out experiencing rapture.’"

This process is entirely automated, and the translator doesn’t have to worry about it. Just translate each segment, and it will be exported as well-structured HTML, ready for publishing on SC, and of course, perfectly usable in other sites as well.
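To make the recombination step concrete, here is a minimal Python sketch. It is not the actual SuttaCentral exporter; it only assumes what the example above shows: entries are separated by blank lines, the `#.` comment holds the HTML markup that precedes a segment, and `msgctxt` holds the segment ID. The names `parse_po` and `to_html` are made up for illustration, and joining segments with a single space is also an assumption.

```python
import re

# The entry quoted earlier in this thread, used as sample input.
SAMPLE_PO = '''#. </p><p><a class="sc" id="3"></a><a class="pts" id="pts5.312"></a>
msgctxt "sn54.1:6.1"
msgid "‘Pītippaṭisaṃvedī assasissāmī’ti sikkhati, ‘pītippaṭisaṃvedī passasissāmī’ti "
"sikkhati;"
msgstr "They practice like this: ‘I’ll breathe in experiencing rapture.’ They practice like this: ‘I’ll breathe out experiencing rapture.’"
'''

_ESCAPES = {"n": "\n", "t": "\t"}


def _unquote(token):
    """Strip the surrounding double quotes and resolve simple backslash escapes."""
    token = token.strip()
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        token = token[1:-1]
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), token)


def _new_entry():
    return {"comment": [], "msgctxt": "", "msgid": "", "msgstr": ""}


def parse_po(text):
    """Parse blank-line-separated PO entries into a list of dicts (simplified)."""
    entries = []
    entry, field = _new_entry(), None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current entry, if it holds anything.
            if entry["comment"] or entry["msgid"] or entry["msgstr"]:
                entries.append(entry)
            entry, field = _new_entry(), None
        elif line.startswith("#."):
            # Extracted comment: assumed here to carry the stored HTML markup.
            entry["comment"].append(line[2:].strip())
        elif line.startswith("#"):
            continue  # other comment types are ignored in this sketch
        elif line.startswith('"'):
            # Continuation line: append to whichever field came last.
            if field:
                entry[field] += _unquote(line)
        else:
            keyword, _, rest = line.partition(" ")
            if keyword in ("msgctxt", "msgid", "msgstr"):
                field = keyword
                entry[field] += _unquote(rest)
    if entry["comment"] or entry["msgid"] or entry["msgstr"]:
        entries.append(entry)
    return entries


def to_html(entries):
    """Put the stored markup back in front of each translated segment."""
    parts = ["".join(e["comment"]) + e["msgstr"] for e in entries]
    return " ".join(parts)
```

Calling `to_html(parse_po(SAMPLE_PO))` gives the paragraph and reference anchors followed directly by the English text of the segment. Swapping `msgstr` for `msgid` in `to_html` would give the Pali in the same layout.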

We have, in addition, already set up a PO 2 LaTeX process, which is used for printing books. Our edition of the Majjhima Nikaya in Portuguese has been done this way, and is at the printers now. This process also includes PO 2 EPUB.

In addition to this, the next version of Pootle (2.8) comes with built-in GitHub integration (so-called “Pootle FS”). This means that all translations done on our system will come under Git’s awesome version control, keeping a permanent and precise record of all work and changes.

When our new site is launched, which we are aiming for by the end of the year, it will contain my new translations, with all this stuff fully integrated under the hood.

While we are not yet ready to fully open up for translations into other languages, our overriding aim is to serve as a platform for a new generation of translations. We already have a few people interested.

As for working on your own PO machine, the only problem is the segmenting. If the text is segmented differently from ours, it becomes a hassle to integrate them: it must be done by hand. Of course you are most welcome to use our segmented Pali text if you like, should you prefer not to use our Pootle instance.
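For anyone working offline, a quick way to see whether a locally edited PO file still lines up with the reference segmentation is to compare the segment IDs. The following is only a rough sketch: it assumes the segment ID lives in `msgctxt`, as in the example above, and the function names and the shape of the report are made up for illustration.

```python
import re

# Matches lines such as: msgctxt "sn54.1:6.1"
_CTXT = re.compile(r'^msgctxt "(.*)"[ \t]*$', re.MULTILINE)


def segment_ids(po_text):
    """Return the msgctxt values of a PO file, in file order."""
    return _CTXT.findall(po_text)


def compare_segments(reference_po, translation_po):
    """Report segment IDs that are missing from, or extra in, the translation."""
    ref = segment_ids(reference_po)
    tra = segment_ids(translation_po)
    ref_set, tra_set = set(ref), set(tra)
    shared_in_ref_order = [s for s in ref if s in tra_set]
    shared_in_tra_order = [s for s in tra if s in ref_set]
    return {
        "missing": [s for s in ref if s not in tra_set],
        "extra": [s for s in tra if s not in ref_set],
        "same_order": shared_in_ref_order == shared_in_tra_order,
    }
```

If both lists come back empty and `same_order` is true, the two files share the same segmentation, at least at the level of IDs; anything else would need the manual reconciliation mentioned above.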

mhviriyo · April 26, 2017, 2:50am

Sorry. My unsuccessful choice of words. Irrelevant. Meant the “Entire Tipitaka into Estonian” project that an Estonian Bhikkhu has initiated (being different from my smaller goal of just getting down some critical suttas first – they would be included in the bigger project should it succeed in taking off, of course). He had commissioned setting up a collaborative translation environment but the direction did not technically work out. So now I can investigate Pootle and report to them as well.

sujato · April 26, 2017, 3:18am

Okay, I will invite you to join. We can set up an experimental branch in Estonian for you to play with.

Thanks for noticing! Yes, we take care to optimize such things.

Sorry, I seem to have missed something here: which other project are you referring to?

Indeed, this is an issue, which will affect any serious attempt at internationalisation. Generally speaking, the idea is that the segments are made on meaningful semantic units: sentences or clauses of sentences. On the whole, they can be translated segment-by-segment. But in any version there will be cases where this breaks down. So the idea is not to impose a literal and exact segment equivalence, but to use the segments as an aid to consistent and clear translation.

One issue is that as I go, I am finding problems with the current segmenting, and when I have finished my translation I will go back and fix them all. This is one of the reasons we are not eager to start having lots of translations just yet. But still, it’s not too hard to sort this out in limited cases.

I suspect that we will encounter many such issues and as time goes on will have to learn case by case what the best approach is. Anyway, we can give it a go and see.

As for the epub, yes, we will make them available, but like so many things, this is waiting for development.