Pootle for translation

Okay, looking into this. A definite and actual issue here is if “rescan project files” is called on a large project, it will take a LOOOOOOOOOONG time to complete - maybe 10 minutes for the pali canon (but that depends on PC speed of course). During this time pootle will be at least partially unresponsive. Restarting pootle or the computer will not achieve anything, it will just continue rescanning after restarting, until it has finished. It will then respond normally again.

So for a large project, consider “rescan project files” a bit of a nuclear option.

What seems to work fine is uploading only the modified po file and choosing the “Overwrite the current file if it exists” option. This seems to work instantly and perfectly. Note you can upload a po file, or a zip of po files. Uploading the po file directly from ~/pootle/po/... is fine.

In principle it should be possible to implement the ability to modify the msgid directly from pootle, but bear in mind that this goes rather against the intended use case of pootle, and modifying pootle is always a bit of a case of going down the rabbit hole, because it’s always like javascript ajax > django > pootle library > translate toolkit library. I can probably do this though, especially if you anticipate regularly wishing to edit the pali.

That’s great, thanks.

I’m not sure what this means, and it doesn’t seem to do anything.

The only other one that I have been looking for is “Copy the contents from the original language”, which is supposed to be Alt+Down, but doesn’t work.

Okay, seeing changes now.

Where? Presumably in the SC folder?

Okay, I’ll try this, and we’ll see how often it comes up. If it’s only occasional, then this should be good enough.

One minor issue with sc-html2po that I just came across. Occasionally the Pali text has constructions like:

(Etadaggavaggo niṭṭhito.)

The script puts the ) in the next segment, by itself.

Yeah, the Pali Lookup is really a part of the SuttaCentral server, Pootle just hooks into it. The SuttaCentral server is responsible for loading all the data into Elasticsearch.

I’ve added this function. It is bound to alt+down, and also ctrl+b. Linux can be finicky about binding the alt key, and Ubuntu likes to intercept alt. At the moment I have my compose key set to right alt, which causes it to be intercepted before reaching javascript. Left alt is intercepted both by ubuntu (it brings up “type a command”) and by javascript, so it does technically work. Anyway, that’s why I set it to ctrl+b as well.

Changes are pushed, do the patch process as usual.

Fixed and pushed. The fix is that trailing punctuation after a segment break is now kept with the segment.
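For anyone curious, the idea is roughly as below. This is only a minimal sketch and not the actual sc-html2po code: the names `BREAKS`, `CLOSERS` and `split_segments` are made up for illustration, and the sets of breaking and closing marks are assumptions.

```python
import re

# Assumption: these marks end a segment.
BREAKS = ".?!"
# Assumption: these closing marks belong to the segment they follow.
CLOSERS = ")]’”'\""

_B = re.escape(BREAKS)
_C = re.escape(CLOSERS)

# A segment is text up to a run of breaking marks, plus any closing
# punctuation directly after it; or whatever unbroken text is left at the end.
SEGMENT_RE = re.compile(
    "[^" + _B + "]*[" + _B + "]+[" + _C + "]*\\s*|[^" + _B + "]+$"
)


def split_segments(text):
    """Split text into segments, keeping trailing closers with their segment."""
    segments = []
    for match in SEGMENT_RE.finditer(text):
        segment = match.group().strip()
        if segment:
            segments.append(segment)
    return segments


print(split_segments("(Etadaggavaggo niṭṭhito.) Next one."))
```

The point is just that the closing bracket is consumed by the same match as the full stop, so it never ends up as a segment by itself.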

I also fixed another bug - in the msgids, double quotes (i.e. in inline html) weren’t being escaped. Pootle seems to handle this leniently, but according to the po specification a po utility would be quite entitled to terminate the string early.
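As a rough illustration of the escaping (a sketch only, not the code in sc-html2po; `escape_po` and `format_msgid` are hypothetical names, and the sample markup is invented), a po string needs backslashes and double quotes escaped before it is wrapped in quotes:

```python
def escape_po(text):
    """Escape a string for use inside a double-quoted po entry."""
    # Backslashes go first, so the escapes added afterwards are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )


def format_msgid(text):
    """Return a single-line msgid entry for the given source text."""
    return 'msgid "{}"'.format(escape_po(text))


print(format_msgid('<a class="sc">1</a>'))
```

Without the escaping, the first quote inside the inline html would look like the end of the string to a strict parser.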

EDIT: Another update: inline HTML which follows a segment break will now generally be attached to that segment if it is sufficiently trivial, which is defined as “not containing anything more than punctuation and numbers”. This means things like brnums and anchors will be attached to the preceding segment.
Also an anchor at the end of a segment (without any following text) will be turned into an HTML comment, but only if it is empty - this works on the assumption it is a page number marker or something. Anchors with content (often a number) might want to be translated as an HTML list or something, so they remain part of the segment.
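The “sufficiently trivial” test could be sketched like this. It is an illustration under assumptions rather than the real implementation: `is_trivial` is a hypothetical name, the tag stripping is a crude regex, and the exact set of characters counted as punctuation is my guess.

```python
import re
import string

TAG_RE = re.compile(r"<[^>]+>")

# Assumption: "punctuation and numbers" means ASCII digits and punctuation,
# whitespace, and a few typographic marks.
TRIVIAL_CHARS = set(
    string.digits + string.punctuation + string.whitespace + "…‘’“”"
)


def is_trivial(html):
    """True if the markup holds nothing but punctuation and numbers."""
    text = TAG_RE.sub("", html)  # drop the tags, keep their text content
    return all(ch in TRIVIAL_CHARS for ch in text)


print(is_trivial('<a class="pts" id="p1"></a>'))
print(is_trivial("<i>pe</i>"))
```

Anything that passes would be glued to the preceding segment, and anything containing actual words stays where it is.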

Okay, this works, but on further attempts the “move ten segments up and down” shortcuts don’t work: in fact they often freeze the whole page, and you have to kill it and reload. Occasionally they do work, but too erratically to be any use. The “go to end and go to start” works fine, though.

ctrl+b works well, because you’re also using ctrl for accepting the segment and moving on. It would be nice to have something for accepting the TM suggestion. Something also with ctrl, perhaps ctrl+m or ctrl+?. Ideally you could select the first, second, or third suggestion, but even just choosing the first one would be a great help.

Thanks. I’m near the end of my first folder (AN1), so I’ll wait till I do the next folder before trying this out. One detail: can you add … to the list of “breaking punctuation”? I’m finding that abbreviated portions are slowing things down a lot, and this might help.

I’ve just discovered a bug in the Pali text. This is found in the actual text, not just the PO-ified one:

Tāni abhibhuyya: ‘ānāmi passāmī’ti

The “j” is left off “jānāmi”. This is in an1.394-574.html, I am not sure if it is elsewhere, but a quick search doesn’t show any other instances. I have fixed it, and some punctuation issues, and pushed the changes.

Here I’ll make a list of things that can improve Pootle. None of them are urgent, and we have discussed them before, but I am gathering them here so we have a record, especially as I get more experience using it.

  • Find & replace: if we leave out a “replace all” option, this would be a great help.
  • The inline markup, esp. variant readings, stuffs up the TM. It would be good to keep the variants, but hide them from TM.
  • A ⊗ to dismiss false TM suggestions. I think this would work better than using “smart” techniques. Often enough, the suggestion is just a mistake that you’ve corrected, or a phrasing that you’ve changed, and you easily know to get it gone, where a machine would struggle.
  • Occasionally a long compound triggers an overflow X. It would be nice to implement SC’s hyphenation.
  • Also on long compounds, the “add to terminology” widget breaks with compounds of a certain size; the box appears too far to the left and you can’t get to it.
  • I notice in pootle.conf that markdown can be enabled. I wonder whether this would be a good thing, for certain cases. For example, sometimes you want to use emphasis, or
  • make a list
  • where there isn’t one in the Pali, and markdown is great for this kind of thing. Or is it going to screw with our HTML?

This appears to be primarily a performance problem. It seems to consistently work fine on my desktop, although in an earlier iteration of the code there were severe problems if you hit ctrl-shift-down twice in rapid succession (I fixed this by simply ignoring subsequent uses until the page had reloaded). I suspect these shortcuts were removed due to performance issues. With that said, it should be fine on a powerful enough machine.

Also I suggest upgrading the database from SQLite to MySQL. As MySQL is the de facto SQL server used by Pootle in production, this is likely to reduce the number of potential problems encountered, so it is a wise thing to do in general, and it may make the advance-by-10 problem go away (though it works fine for me under both MySQL and SQLite). See the new instructions in the second post in this thread under “Upgrade to MySQL” - it should be relatively painless.

I have added this shortcut as ctrl+m, press multiple times to cycle through the suggestions. Patch as usual.

I’ll do this tomorrow. It’s tricky. It can appear at the start of paragraphs, at the end, or in the middle, and it can occur in pairs with a ‘pe’ in between. It’s really a case requiring special handling; it’s not tidy like conventional sentence breaks, or <br>s.

Done. Pootle’s diff will show the extra stuff as being deleted, but that doesn’t impact the matching - it is compared with the cruft completely removed.
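To give an idea of what “compared with the cruft completely removed” means, here is a small sketch. It is not how amagama scores matches (I use Python’s difflib here purely for illustration), and the `<span class="var">` markup for variant readings is an assumed convention, as are the names `strip_cruft` and `similarity`.

```python
import re
from difflib import SequenceMatcher

# Assumption: variant readings are wrapped in <span class="var">...</span>.
VARIANT_RE = re.compile(r'<span class="var"[^>]*>.*?</span>', re.S)
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def strip_cruft(source):
    """Remove variant readings and inline tags, then normalise whitespace."""
    text = VARIANT_RE.sub("", source)  # drop variants, content included
    text = TAG_RE.sub("", text)        # drop remaining tags, keep their text
    return WS_RE.sub(" ", text).strip()


def similarity(a, b):
    """Similarity ratio (0.0 to 1.0) of two sources after cleaning."""
    return SequenceMatcher(None, strip_cruft(a), strip_cruft(b)).ratio()


marked_up = 'Evaṃ <span class="var">(x)</span> me <a id="1"></a>sutaṃ'
print(strip_cruft(marked_up))
print(similarity(marked_up, "Evaṃ me sutaṃ"))
```

So two segments that differ only in their inline markup count as identical for matching purposes, even though the diff display still shows the markup as removed.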

More or less impossible due to how amagama works, it’s not very delete friendly. What I have done, is made it so the translation memory is regenerated every time the server is restarted, this will cause it to forget translations which no longer exist.
Although amagama isn’t very into deletes, I could add a soft hide function in javascript with significant hackery.

Also I rewrote remember.py to make it more performant, so translation memory should be much more responsive now. In short, it scans the most recently changed file (i.e. the one you are working on) for modifications very frequently (4 times a second atm), so changes will be picked up nearly instantly.
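The polling idea, in sketch form. This is not the actual remember.py: `newest_po_file` and `watch_file` are hypothetical names, the `max_polls` parameter exists only so the loop can be stopped in a test, and the real script may well detect changes differently.

```python
import os
import time


def newest_po_file(root):
    """Return the most recently modified .po file under root, or None."""
    newest, newest_mtime = None, -1.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".po"):
                path = os.path.join(dirpath, name)
                mtime = os.path.getmtime(path)
                if mtime > newest_mtime:
                    newest, newest_mtime = path, mtime
    return newest


def watch_file(path, on_change, interval=0.25, max_polls=None):
    """Poll one file's mtime and call on_change(path) when it changes.

    interval=0.25 gives the four checks per second mentioned above.
    """
    last_mtime = os.path.getmtime(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            last_mtime = mtime
            on_change(path)
        polls += 1
```

Checking a single file’s modification time is cheap, which is why it can be done this often without bogging the machine down.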

Also start.sh can now be used to restart the server (i.e. it will automatically kill it if it is already running).

I should be able to fix the other points, not sure about find and replace, pootle’s search hardly seems to work to begin with. It might be more of a 2.7 task.

Very nice, ta.

Well, see how we go.

Excellent, very helpful.

Yeah, don’t worry about it, I’ll just reset from time to time. See how we go with 2.7.

Okay, fine. It’s not too much of a problem now, but will become more so as I go on. Now if I change something there are only a few instances, but if I get to the end and want to redo a basic term, I’m stuffed. Not really, I can do it in a text editor, but still.

Ha ha ha. Relative to what, exactly? As I translated yesterday:

whipping, caning, and clubbing; cutting off hands or feet, or both; cutting off ears or nose, or both; the ‘porridge pot’, the ‘shell-shave’, the ‘demon’s mouth’, the ‘garland of fire’, the ‘burning hand’, the ‘grass blades’, the ‘bark dress’, the ‘antelope’, the ‘meat hook’, the ‘coins’, the ‘acid pickle’, the ‘twisting bar’, the ‘straw mat’; being splashed with hot oil, being fed to the dogs, being impaled alive, and being beheaded.

Relative to that, maybe.

Anyway, after much toil, everything seems to be working okay, except I get

  File "/home/sujato/pootle/env/local/lib/python2.7/site-packages/django/templatetags/cache.py", line 20, in render
    raise TemplateSyntaxError('"cache" tag got an unknown variable: %r' % self.expire_time_var.var)
TemplateSyntaxError: "cache" tag got an unknown variable: u'settings.CACHE_TIMEOUT'

Which leaves me with a delightfully minimalist instance of Pootle: pure, unsullied white. Maybe it’s for the best… Anyway, I’m back to SQLite for now.

Meanwhile,

He he he.

O, and

Has an existential problem, i.e. it doesn’t exist. You mean mysql-python, right?

And one more very minor bug. The popup for defining your own terminology is too persistent: it remains even on subsequent segments. It should fade with the normal lookup.


Ah yes, that bug. I suppose you could call it a “Everything is fine but I’m going to throw an exception and refuse to work anyway” bug.
It’s an instance dependent bug which sometimes goes away by itself, sometimes goes away with a trivial configuration change, sometimes is incredibly persistent, often goes away with a trivial and irrelevant change in install procedure (i.e. using a different version of pip, or installing under a different user account), is more likely to occur with SQLite (or at least is more reproducible with SQLite), but can occur with MySQL too. The most probable cause is that the server is transiently too busy to respond with a page, although it could also conceal a deeper problem.

I wouldn’t give up on MySQL just yet. The first thing to try would be running ./patch.sh, as one of the things it does is clear the django cache. The second thing is giving the server a few minutes to digest stuff on first startup, as the CACHE_TIMEOUT error will often occur while the server is busy.

Ironically, the CACHE_TIMEOUT bug is the primary one I’m concerned about with staying with SQLite, because while it’s highly erratic in its causes, it’s definitely more likely to occur with SQLite.

Strangely I am unable to reproduce this.

sc-html2po.py updated:
and … pe … now end segments.

git pull the suttacentral repository for update.

Are you sure? I’m not seeing any new commits.

I tried those things, no success.

I should mention I’m using MariaDB rather than MySQL. Maybe this is the problem, but it should be a drop-in replacement. Anyway, shouldn’t we be using MariaDB? It’s where the cool kids are at these days…

And another question. As we’ve discussed before, we should have descriptions for each of the suttas (at least in theory). I’m wondering whether I can do these as we go in the “comments” field of Pootle? I think we may have discussed this earlier, so forgive me if I’ve forgotten!

Another way would be to put these in a text file, in which case it would be nice to have a plain text file with the sutta IDs listed one per line.