Handling Site UI (and other dynamic text) in Bilara

sujato · June 16, 2020, 2:01am

We are looking to add the site UI translations to Bilara. Here is a technical discussion towards that.

@blake @Robbie @karl_lew @sabbamitta @HongDa @Aminah

Segment IDs

UI text differs in several respects from suttas. For the sutta, the segment numbers:

1. Uniquely identify a segment in the corpus.
2. Give the sequence of segments inside that text.
3. Reflect the semantic structure of the text and/or connect it with other editions.
4. Coordinate segments across a wide variety of materials (notes, variants, etc.).
5. Maintain consistency over time.

For UI text, points 3, 4, and 5 are irrelevant, which makes things simpler. On the other hand, we introduce a new element:

UI text changes over time.

This is a significant difference, since the assumption behind Sutta text is that it aims for 100% stability. Our site UI does not change very much, but still, it should be simple to update.

Currently the UI files use a hash to identify strings. This hash is wrapped in HTML for structure, and the whole is served as a tagged template literal via LitElement’s html function. Let’s take the Start page as an example.

 <p>
     ${this.localize('5b7b920de41da5d9441b0e71fbe6fa63')}
 </p>

The system is instructed to localize it when the text is requested in a different language.

Note that this system is quite different to the one used for texts on Bilara. Even the location is different: these texts are stored in /suttacentral.

Of the two main functions of segment IDs, hashes only achieve the first: they uniquely identify the string. However, as effectively random values, they do not provide a sequence. What to do?

This is my first proposal:

Handling of UI text should be as similar to suttas as possible.

With the sutta IDs we have the form:

"mn1:1.1": "So have I heard."

Here, what is to the left of the colon identifies the text, and what is to the right identifies the segment. We can do the same thing: the text is identified with the slug, and the number is a simple increment. Since we have no requirement to make the IDs readable or consistent with external sources, let’s just use a plain increment and make sure it’s sortable.

"start:0001": "When it comes to reading suttas, there are as many approaches as there are readers. The single most helpful thing to understand the suttas is a good teacher. As it says in the Boat Sutta (Snp 2.8):"
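A minimal sketch of how such IDs could be generated, assuming a four-digit zero-padded counter as in the example above. The function names `make_segment_id` and `number_segments` are hypothetical and not part of Bilara.

```python
def make_segment_id(slug: str, index: int, width: int = 4) -> str:
    """Build an ID such as 'start:0001' from a page slug and a 1-based counter."""
    if index < 1:
        raise ValueError("index must be >= 1")
    # Zero-padding keeps plain string sorting in step with numeric order.
    return f"{slug}:{index:0{width}d}"


def number_segments(slug: str, strings: list[str]) -> dict[str, str]:
    """Assign sequential, zero-padded IDs to the strings of one UI page."""
    return {make_segment_id(slug, i): text for i, text in enumerate(strings, start=1)}


page = number_segments("start", ["First paragraph.", "Second paragraph."])
```

With fixed-width padding, sorting the keys as plain strings gives the intended sequence, as long as a page never has more segments than the chosen width allows.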

So what happens if the text changes?

If a segment is removed, no problems. The numbering goes 001, 002, 003, then becomes 001, 003. It’s still sorted and still unique.

If a segment is changed, i.e. the content is edited, also no problem. It remains unique. The uniqueness is not meant to say that it is the same exact thing, merely that it occupies the same position.

If a segment is added, it’s a little more complicated. One way would be to add a point segment number to account for it: 001, 002, 002.1, 003.

Handling this is simpler than it was for the suttas, since there is no semantic requirement. For example, we don’t need to make sure that a heading has the same major segment number as the section it begins. Still, if lots of segments are added and changed over time, it might get complicated.
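If point numbers were used, sorting would need some care once a position gains ten or more insertions. The following is only a sketch under the assumption that IDs look like `slug:NNN` or `slug:NNN.N`; `segment_sort_key` is a hypothetical helper.

```python
def segment_sort_key(segment_id: str) -> tuple[str, tuple[int, ...]]:
    """Split 'start:002.1' into ('start', (2, 1)) so numbers compare numerically."""
    slug, _, number = segment_id.partition(":")
    return slug, tuple(int(part) for part in number.split("."))


ids = ["start:003", "start:002.1", "start:001", "start:002.10", "start:002.2", "start:002"]
ordered = sorted(ids, key=segment_sort_key)
```

A plain string sort would place `002.10` before `002.2`, which is one way the point-number approach can get complicated over time.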

Perhaps it would be better to simply add the segments and just increment everything after that. This is trivial in bilara i/o.
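As a rough illustration of the renumbering approach, assuming the segments of a page are held as an ordered list of strings. `insert_and_renumber` is a hypothetical name, not a Bilara i/o function.

```python
def insert_and_renumber(
    slug: str, texts: list[str], position: int, new_text: str, width: int = 4
) -> dict[str, str]:
    """Insert new_text as the segment at 1-based `position`, then renumber everything."""
    if not 1 <= position <= len(texts) + 1:
        raise ValueError("position out of range")
    updated = list(texts)  # copy, so the caller's list is left untouched
    updated.insert(position - 1, new_text)
    return {f"{slug}:{i:0{width}d}": text for i, text in enumerate(updated, start=1)}


result = insert_and_renumber("start", ["a", "b", "c"], 2, "new")
```

Note that every segment after the insertion point gets a new ID, so any translation file keyed by the same IDs would need the same shift applied. Handling all fields together in one spreadsheet row takes care of this.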

Making changes

Apart from ordinary editing of segments, changes must be done outside of Bilara. Normally this will use Bilara i/o.

1. The repo is updated locally.
2. Bilara i/o is run on the relevant text to produce a spreadsheet with all fields.
3. Fields are edited in the spreadsheet; rows may be added or deleted.
4. Segment numbers are generated in the spreadsheet.
5. Bilara i/o is run in reverse to regenerate the JSON files, which are pushed to Github.
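The round trip in these steps can be sketched as follows. This is not Bilara i/o itself, only an illustration using Python's standard `csv` module; the column names (`segment_id`, `root`, `translation-de`) and all function names are assumptions.

```python
import csv
import io


def to_rows(files: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Merge several {segment_id: text} files into one row per segment."""
    # dict.fromkeys keeps first-seen order while removing duplicate IDs.
    ids = dict.fromkeys(sid for segments in files.values() for sid in segments)
    return [
        {"segment_id": sid, **{name: segs.get(sid, "") for name, segs in files.items()}}
        for sid in ids
    ]


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows to CSV text; the csv module quotes embedded HTML and commas."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_files(text: str) -> dict[str, dict[str, str]]:
    """Reverse direction: split the spreadsheet back into per-column files."""
    files: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text, newline="")):
        sid = row["segment_id"]
        for name, value in row.items():
            if name == "segment_id":
                continue
            column = files.setdefault(name, {})
            if value:  # empty cells mean "no entry for this segment"
                column[sid] = value
    return files


files = {
    "root": {"start:0001": "Hello, <b>friend</b>", "start:0002": "World"},
    "translation-de": {"start:0001": "Hallo, <b>Freund</b>"},
}
round_tripped = csv_to_files(rows_to_csv(to_rows(files)))
```

Keeping one row per segment means that adding or deleting a row affects the root and all translations together.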

HTML in JSON

The sutta texts are almost entirely plain text, with occasional Markdown. The UI texts are more complex: they include a variety of HTML inside the segments. This is a hassle to edit, but there seems to be no alternative, so it should remain as-is.

Notifying translators of changes?

Finally, a question. This in fact applies to the sutta translations as well. What happens when a root is changed? We should notify anyone who has made a translation of that string. How best to do this? Is there a native Github function that can help?

And what about translations of suttas, where the root will not change, but where the English translation may change? A translator who has relied on that would surely want to know. Perhaps we should add a field to publication.json. In fact this is good information to know anyway.

"source": ["pli-ms", "en-sujato", "de-sabbamitta"]

Meaning:

“This translation is based on these principal sources: Mahasangiti in Pali, Sujato in English, and Sabbamitta in German.”

Then whenever there is a change to a string in the relevant source, the translator is notified.
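A minimal sketch of that lookup, assuming each publication record has been loaded as a dict carrying the proposed "source" list. The "id" key and the function name `publications_to_notify` are hypothetical.

```python
def publications_to_notify(publications: list[dict], changed_source: str) -> list[str]:
    """Return the IDs of publications that list changed_source among their sources."""
    return [
        pub["id"]
        for pub in publications
        if changed_source in pub.get("source", [])  # "external-source" is ignored
    ]


publications = [
    {"id": "de-sabbamitta", "source": ["pli-ms", "en-sujato"]},
    {"id": "en-sujato", "source": ["pli-ms"]},
    {"id": "xx-example", "external-source": ["pli-pts"]},
]
affected = publications_to_notify(publications, "en-sujato")
```

In this example a change in en-sujato would notify only de-sabbamitta, while a change in pli-ms would notify both de-sabbamitta and en-sujato.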

If a translator relies on a source that is not in Bilara, it can be recorded in a second field:

"external-source": ["pli-pts", "en-bodhi"]

karl_lew · June 16, 2020, 3:27am

Anagarika Sabbamitta has been meticulously looking at all commits and has followed all quite closely. Others might use native git functions to compare commits over a date range in root/… For example:

git whatchanged --since="1 week ago" -p root/pli/ms/sutta/**

Unfortunately, the web interface shows massive amounts of change in root over the past few months and it’s a bit hard to wade through.

sujato · June 16, 2020, 7:41am

I wonder if those Github APIs can be leveraged to provide notifications? It needs to be granular: a list of segments would be best.
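Independently of any Github API, a segment-level list could be computed by comparing two versions of the same JSON file. This is only a sketch, assuming both versions have already been loaded as {segment_id: text} dicts; `changed_segments` is a hypothetical name.

```python
def changed_segments(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List segment IDs that were added, removed, or edited between two versions."""
    return {
        "added": [sid for sid in new if sid not in old],
        "removed": [sid for sid in old if sid not in new],
        "edited": [sid for sid in new if sid in old and old[sid] != new[sid]],
    }


old = {"mn1:1.1": "So I have heard.", "mn1:1.2": "At one time..."}
new = {"mn1:1.1": "So have I heard.", "mn1:1.3": "A new segment."}
report = changed_segments(old, new)
```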

musiko · June 16, 2020, 11:25am

Is this something you would find useful?

karl_lew · June 16, 2020, 12:39pm

Root text changes should be rare and we’ve all been attending to those changes for renumbering, but your own ongoing translation changes will probably be followed avidly by all the readers of SuttaCentral, translators or otherwise. The informal D&D posts about typos seem to address such changes in a remarkably efficient, clear and public manner. The zealous may be inspired to resort to git or Github for further deep dive, but the D&D posts are quite useful to the community.

I’m also not sure we need to track the exact provenance of each translation in _publication.json since that tracking would be incomplete and brittle. Translators rely heavily on your own translation but they also draw on multiple resources in and beyond SC. For example, Anagarika Sabbamitta draws upon existing German translations that she can find. We may need to rely on translators’ own acknowledgements, which might properly reside in _authors.json.

sabbamitta · June 16, 2020, 12:54pm

Thank you for considering this, this would indeed be a great help.

That’s right, I do compare other translations. But I wouldn’t say I use them as my sources. My sources would indeed be well described by "source": ["pli-ms", "en-sujato"], and only changes in these would be of interest to me. They would definitely be of interest.

I’ve been taking a lot of trouble not to miss any changes in translation-en-sujato, as Karl already mentioned, and had just assumed that would always have to be the hard way. So the idea of being notified sounds like … delicious luxury!

sujato · June 16, 2020, 10:43pm

Perhaps, maybe it uses a similar functionality. The notifications would be on Bilara, though, (and maybe Github) not on Discourse.

Indeed. Generally, published texts won’t change very often, but Bilara texts will usually be “in-progress”. In any case, the issue isn’t relevant for third-party translations, as we have no means of tracking changes anyway.

If we track primary source texts via "source": ["pli-ms", "en-sujato"], probably the best thing would be to mention any other texts in "text_description" or in a more detailed project description.

We aim to please! I’ll discuss it with Blake tomorrow.