Pootle for translation

blake · August 24, 2015, 11:47am

This appears to be primarily a performance problem. It seems to consistently work fine on my desktop, although in an earlier iteration of the code there were severe problems if you hit ctrl-shift-down twice in rapid succession (I fixed this by simply ignoring subsequent uses until the page had reloaded). I suspect these shortcuts were removed due to performance issues. That said, it should be fine on a powerful enough machine.

Also, I suggest upgrading the database from SQLite to MySQL. MySQL is the de facto SQL server used by Pootle in production, so switching is likely to reduce the number of potential problems encountered and is a wise thing to do in general. It may also make the advance-by-10 problem go away (though it works fine for me under both MySQL and SQLite). See the new instructions in the second post in this thread under “Upgrade to MySQL”; it should be relatively painless.

blake · August 24, 2015, 8:06pm

I have added this shortcut as ctrl+m, press multiple times to cycle through the suggestions. Patch as usual.

I’ll do this tomorrow. It’s tricky. It can appear at the start of paragraphs, at the end, or in the middle, and it can occur in pairs with a ‘pe’ in between. It’s really a case requiring special handling; it’s not tidy like conventional sentence breaks or <br>s.

Done. Pootle’s diff will show the extra stuff as being deleted, but that doesn’t impact the matching - it is compared with the cruft completely removed.

More or less impossible due to how amagama works; it’s not very delete-friendly. What I have done is make the translation memory regenerate every time the server is restarted, which will cause it to forget translations that no longer exist.
Although amagama isn’t very into deletes, I could add a soft hide function in JavaScript with significant hackery.

Also, I rewrote remember.py to make it more performant, so translation memory should be much more responsive now. In short, it checks the most recently changed file (i.e. the one you are working on) for modification very frequently (4 times a second at the moment), so changes will be picked up nearly instantly.
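For anyone curious, the polling idea can be sketched roughly like this. This is not the actual remember.py; the names `newest_file`, `FileWatcher` and `run`, and the `.po` suffix filter, are made up for illustration.

```python
import os
import time


def newest_file(directory, suffix=".po"):
    """Return the most recently modified file under directory, or None."""
    newest_path, newest_mtime = None, -1.0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                mtime = os.path.getmtime(path)
                if mtime > newest_mtime:
                    newest_path, newest_mtime = path, mtime
    return newest_path


class FileWatcher:
    """Remember a file's modification time and report when it changes."""

    def __init__(self, path):
        self.path = path
        self.last_mtime = os.path.getmtime(path)

    def changed(self):
        current = os.path.getmtime(self.path)
        if current != self.last_mtime:
            self.last_mtime = current
            return True
        return False


def run(directory, on_change, interval=0.25):
    """Poll the newest file about 4 times a second (hypothetical driver loop)."""
    watcher = FileWatcher(newest_file(directory))
    while True:
        # Sleeping between checks is what keeps the loop from hogging the CPU.
        time.sleep(interval)
        if watcher.changed():
            on_change(watcher.path)
```

A real version would also need to re-check which file is the newest from time to time, since the loop above only ever watches the file that was newest at startup.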

Also start.sh can now be used to restart the server (i.e. it will automatically kill it if it is already running).

I should be able to fix the other points. I’m not sure about find and replace; Pootle’s search hardly seems to work to begin with. It might be more of a 2.7 task.

sujato · August 25, 2015, 1:16am

Very nice, ta.

Well, see how we go.

Excellent, very helpful.

Yeah, don’t worry about it, I’ll just reset from time to time. See how we go with 2.7.

Okay, fine. It’s not too much of a problem now, but will become more so as I go on. Now if I change something there are only a few instances, but if I get to the end and want to redo a basic term, I’m stuffed. Not really, I can do it in a text editor, but still.

sujato · August 25, 2015, 1:22am

Ha ha ha. Relative to what, exactly? As I translated yesterday:

whipping, caning, and clubbing; cutting off hands or feet, or both; cutting off ears or nose, or both; the ‘porridge pot’, the ‘shell-shave’, the ‘demon’s mouth’, the ‘garland of fire’, the ‘burning hand’, the ‘grass blades’, the ‘bark dress’, the ‘antelope’, the ‘meat hook’, the ‘coins’, the ‘acid pickle’, the ‘twisting bar’, the ‘straw mat’; being splashed with hot oil, being fed to the dogs, being impaled alive, and being beheaded.

Relative to that, maybe.

Anyway, after much toil, everything seems to be working okay, except I get

File "/home/sujato/pootle/env/local/lib/python2.7/site-packages/django/templatetags/cache.py", line 20, in render
raise TemplateSyntaxError('"cache" tag got an unknown variable: %r' % self.expire_time_var.var)
TemplateSyntaxError: "cache" tag got an unknown variable: u'settings.CACHE_TIMEOUT'

Which leaves me with a delightfully minimalist instance of Pootle: pure, unsullied white. Maybe it’s for the best… Anyway, I’m back to SQLite for now.

Meanwhile,

He he he.

O, and

Has an existential problem, i.e. it doesn’t exist. You mean mysql-python, right?

And one more very minor bug. The popup for defining your own terminology is too persistent: it remains even on subsequent segments. It should fade with the normal lookup.

blake · August 25, 2015, 5:45pm

Ah yes, that bug. I suppose you could call it a “Everything is fine but I’m going to throw an exception and refuse to work anyway” bug.
It’s an instance-dependent bug. Sometimes it goes away by itself, sometimes it goes away with a trivial configuration change, and sometimes it is incredibly persistent. It often goes away with a trivial and irrelevant change in install procedure (e.g. using a different version of pip, or installing under a different user account). It is more likely to occur with SQLite (or at least is more reproducible with SQLite), but can occur with MySQL too. The most probable cause is that the server is transiently too busy to respond with a page, although it could also conceal a deeper problem.

I wouldn’t give up on MySQL just yet. The first thing to try would be running ./patch.sh; one of the things it does is clear the Django cache. The second thing is giving the server a few minutes to digest stuff on first startup, as the CACHE_TIMEOUT error will often occur while the server is busy.

Ironically, the CACHE_TIMEOUT bug is the primary one I’m concerned about with staying with SQLite, because while it’s highly erratic in its causes, it’s definitely more likely to occur with SQLite.

Strangely I am unable to reproduce this.

blake · August 25, 2015, 9:31pm

sc-html2po.py updated:
… and … pe … now end segments.

git pull the suttacentral repository for update.

sujato · August 25, 2015, 11:32pm

Are you sure? I’m not seeing any new commits.

sujato · August 26, 2015, 1:01am

I tried those things, no success.

I should mention I’m using MariaDB rather than MySQL. Maybe this is the problem, but it should be a drop in replacement. Anyway, shouldn’t we be using MariaDB? It’s where the cool kids are at these days…

sujato · August 26, 2015, 10:29am

And another question. As we’ve discussed before, we should have descriptions for each of the suttas (at least in theory). I’m wondering whether I can do these as we go in the “comments” field of Pootle? I think we may have discussed this earlier, so forgive me if I’ve forgotten!

Another way would be to put these in a text file, in which case it would be nice to have a plain text file with the sutta IDs listed one per line.

blake · August 26, 2015, 11:19am

I forgot to push. I’ve also pushed a change to the pootle repository - remember.py would use 100% cpu and didn’t work properly.

blake · August 26, 2015, 11:31am

Well, I got it to work with MariaDB, which does seem to produce the TIMEOUT error quite reliably, but for me the following change does eliminate the CACHE_TIMEOUT error under MariaDB:
edit ~/.pootle/pootle.conf
Under the CACHES section, replace:

'BACKEND': 'django.core.cache.backends.db.DatabaseCache',

with:

'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',

That’s all. It reduces the number of things the database is being used for, and seems to resolve the CACHE_TIMEOUT problem.
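For reference, a rough sketch of how the CACHES section might look after the change. The 'default' alias is Django's standard one; the LOCATION value is just a made-up label, and any other keys already present in your pootle.conf should be left as they are.

```python
# Sketch of the CACHES section in ~/.pootle/pootle.conf after the change.
CACHES = {
    'default': {
        # In-process memory cache instead of the database-backed cache.
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        # Hypothetical label; it only matters if several locmem caches are defined.
        'LOCATION': 'pootle-locmem',
    }
}
```

Bear in mind that the local-memory cache lives inside each server process, so it is emptied whenever the server restarts.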

blake · August 26, 2015, 12:02pm

That sounds excellent. Do whatever is most convenient for you as long as it is consistent. Note that Pootle comments are not written to the .po files; they are kept in the Pootle database. This is far from an insurmountable issue, though. For example, comments are easily extracted from the JSON produced by pootle dumpdata, and they are linked directly to the filename and, via a key, to the msgid/msgctxt. I think Pootle comments would be a convenient way to do this, and what I would suggest is making the first comment on the first entry the description.

sujato · August 26, 2015, 11:29pm

Do you mean the sutta title? 'cos that’s where I was thinking of doing it. I have been very occasionally using the comments field for actual comments, mainly in places where I disagree with BB’s translation. If I keep these out of the “title” comments, will it still be easy to extract them? Or would using some identifier help? Or is it best just to have no comments apart from the descriptions?

One advantage of using the comments field over a separate file is that it makes it easy for subsequent translators to use for the same purpose, but only if the usage is clear and unambiguous.

blake · August 27, 2015, 12:07am

Yeah, in practice.

Easy enough for me. You can also add a translation comment on the title but after the description comment, as I said “first comment on the first entry” would be description, anything subsequent being ignored.
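A rough sketch of how the “first comment on the first entry” rule could be applied to a dumpdata export. The outer layout (a list of records with "model", "pk" and "fields") is Django's standard JSON serialization, but the model name and field names below are assumptions that should be checked against the actual dump, as is the idea that a later comment in the same field is separated by a blank line.

```python
import json


def load_records(path):
    """Load the JSON written by a dumpdata export."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def extract_descriptions(records, model="pootle_store.unit",
                         store_field="store", index_field="index",
                         comment_field="translator_comment"):
    """Map each store key to the first comment block of its first unit.

    The model and field names are assumed, not confirmed.
    """
    first_units = {}
    for record in records:
        if record.get("model") != model:
            continue
        fields = record["fields"]
        store = fields[store_field]
        best = first_units.get(store)
        if best is None or fields[index_field] < best[index_field]:
            first_units[store] = fields

    descriptions = {}
    for store, fields in first_units.items():
        comment = (fields.get(comment_field) or "").strip()
        if comment:
            # Anything after the first blank line is ignored.
            descriptions[store] = comment.split("\n\n")[0].strip()
    return descriptions
```

The keys in the result are whatever the store field holds in the dump (most likely a database id), so a second pass over the store records would be needed to turn them into filenames.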

sujato · August 27, 2015, 12:21am

Excellent, that does it. Now running on MariaDB. Everything is much snappier. Before there was always a lag; I assumed it was the browser, but obviously it was the DB. And the “move ten up and down” works great, too.

I still hate working with databases though. Just sayin’.

sujato · September 20, 2015, 8:11am

So a minor question here for @blake .

I think I mentioned some time ago that Pootle says it supports markdown, which would be great at some point. Right now, however, I’m finding that occasionally I’m wanting to write lists, and there’s no convenient way of doing it. So I can just write the HTML, fine. But I’m wondering if we can just do it more simply.

What if I put some sign, say a ~ at the beginning of each list item. The item ends with the next major punctuation (;:—.?!)

~ is convenient because it’s not used in the PO files at all, I think. Then we can just convert this to an HTML <ol> later.

Let me know if you think this is a good idea, otherwise I’ll just write the HTML.

blake · September 21, 2015, 1:45pm

I think Pootle supporting markdown is in the context of static pages and perhaps templates. I would presume the translated strings would always be treated as plain text.

I think you could just use the hyphen since we don’t really use hyphens in the text content (they appear in the HTML, but this is irrelevant).

However, using HTML has the advantage of being very explicit, and since this is HTML5 you don’t need to bother with </li>; you only need the </ul> at the end.

I’ll leave it up to you, but it’s certainly not hard to convert markdown-style markup into HTML, with one proviso: it becomes harder to perform the conversion reliably if the list spills over multiple msgstrings, so it is probably safer in that case to use real HTML.
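To give an idea of how little is involved, here is a minimal sketch of the conversion for a list contained in a single msgstring. The function name `convert_list` is made up; the marker is a parameter, so it works with ~ or a hyphen, and each item is taken to end at the next major punctuation mark, as proposed above.

```python
import re

# Major punctuation that ends a list item: ; : em-dash . ? !
MAJOR_PUNCTUATION = ";:\u2014.?!"


def convert_list(text, marker="~", tag="ol"):
    """Turn marker-prefixed items in one message string into an HTML list."""
    stop = re.escape(MAJOR_PUNCTUATION)
    item_re = re.compile(
        re.escape(marker)
        + r"\s*([^" + stop + re.escape(marker) + r"]*[" + stop + r"]?)"
    )
    matches = list(item_re.finditer(text))
    if not matches:
        return text
    items = "".join("<li>" + m.group(1).strip() + "</li>" for m in matches)
    before = text[:matches[0].start()]
    after = text[matches[-1].end():]
    return before + "<" + tag + ">" + items + "</" + tag + ">" + after


print(convert_list("There are three feelings: ~ pleasant; ~ painful; ~ neutral. So it is."))
```

This treats everything between the first marker and the end of the last item as one list, so it would not cope with two separate lists in the same string, or with a list that continues into the next msgstring.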

sujato · September 25, 2015, 11:29pm

Okay, well that’s good anyway. I’m doing a static page for translators’ guidelines and markdown would be perfect. But maybe wait for 2.7.

Which, BTW, is out already. I’m thinking that when I break to come to Europe, we can take the time out to upgrade.

Yes, I’ll bear this in mind.

blake · September 30, 2015, 11:48am

Pootle 2.7.0 was buggy as heck, as in you had to muck around with configuration and probably fix a bug or two just to make it kind of work. They’re up to 2.7.2 now and working on 2.7.3, so development is fast.

I’ll probably take another crack at installing it soon and see how it is progressing.

sujato · October 1, 2015, 11:48pm

No worries, no hurry.

One minor detail for the next iteration. We should extend the autocorrect for quotes to include:

ellipsis: ... → …
en-dash: -- → –
em-dash: --- or –- → —
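A minimal sketch of these rules, assuming the typed forms are three periods, two hyphens, and three hyphens (or an en-dash followed by a hyphen, which is what you get when two hyphens have already been corrected and a third is typed). The real autocorrect runs in the browser, so Python here is only to show the logic; the names are made up.

```python
# Longer sequences come first so that "---" is not half-consumed by the "--" rule.
AUTOCORRECT = [
    ("...", "\u2026"),      # three periods -> ellipsis
    ("---", "\u2014"),      # three hyphens -> em-dash
    ("\u2013-", "\u2014"),  # en-dash followed by hyphen -> em-dash
    ("--", "\u2013"),       # two hyphens -> en-dash
]


def autocorrect(text):
    """Apply each replacement in order and return the corrected text."""
    for typed, replacement in AUTOCORRECT:
        text = text.replace(typed, replacement)
    return text
```

The order of the rules matters: if the two-hyphen rule ran first, three hyphens would end up as an en-dash followed by a stray hyphen.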

When, down the track, we start working with other languages, we’ll have to work out autocorrects for the different kinds of quote marks.