We are looking to add the site UI translations to Bilara. Here is a technical discussion towards that.
@blake @Robbie @karl_lew @sabbamitta @HongDa @Aminah
Segment IDs
UI text differs in several respects from suttas. For the sutta, the segment numbers:
- Uniquely identify a segment in the corpus.
- Give the sequence of segments inside that text.
- Reflect the semantic structure of the text and/or connect it with other editions.
- Coordinate segments across a wide variety of materials (notes, variants, etc.)
- Maintain consistency over time.
For UI text, the last three points are irrelevant, which makes things simpler. On the other hand, we introduce a new element:
- UI text changes over time.
This is a significant difference, since the assumption behind Sutta text is that it aims for 100% stability. Our site UI does not change very much, but still, it should be simple to update.
Currently the UI files use a hash to identify strings. This hash is wrapped in HTML for structure, and the whole is served as a tagged template literal via LitElement’s html function. Let’s take the Start page as an example.
<p>
${this.localize('5b7b920de41da5d9441b0e71fbe6fa63')}
</p>
The system is instructed to localize it when the text is requested in a different language.
Note that this system is quite different to the one used for texts on Bilara. Even the location is different: these texts are stored in /suttacentral.
Of the two main functions of segment IDs, hashes only achieve the first: they uniquely identify the string. However, being effectively random values, they do not provide a sequence. What to do?
This is my first proposal:
Handling of UI text should be as similar to suttas as possible.
With the sutta IDs we have the form:
"mn1:1.1": "So have I heard."
Here, what is to the left of the colon identifies the text, and what is to the right identifies the segment. We can do the same thing: the text is identified with the slug, and the number is a simple increment. Since we have no requirement to make the IDs readable or consistent with external sources, let’s just use a plain increment and make sure it’s sortable.
"start:0001": "When it comes to reading suttas, there are as many approaches as there are readers. The single most helpful thing to understand the suttas is a good teacher. As it says in the Boat Sutta (Snp 2.8):"
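A minimal sketch of how such IDs could be generated, assuming the strings for a page are already available in display order. The function name `number_segments` and the default width of four digits are illustrative choices, not part of any existing tool.

```python
def number_segments(slug: str, strings: list[str], width: int = 4) -> dict[str, str]:
    """Assign zero-padded incrementing IDs such as 'start:0001' to each string."""
    return {
        f"{slug}:{number:0{width}d}": text
        for number, text in enumerate(strings, start=1)
    }


segments = number_segments("start", ["First paragraph.", "Second paragraph."])

# Zero padding keeps plain string sorting in step with numeric order,
# as long as the number of segments fits within the chosen width.
assert list(segments) == sorted(segments)
```

With a width of four, this stays sortable for up to 9999 segments per page.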
So what happens if the text changes?
If a segment is removed, no problems. The numbering goes 001, 002, 003, then becomes 001, 003. It’s still sorted and still unique.
If a segment is changed, i.e. the content is edited, also no problem. It remains unique. The uniqueness is not meant to say that it is the same exact thing, merely that it occupies the same position.
If a segment is added, it’s a little more complicated. One way would be to add a point segment number to account for it: 001, 002, 002.1, 003.
Handling this is simpler than it was for the suttas, since there is no semantic requirement. For example, we don’t need to make sure that a heading has the same major segment number as the section it begins. Still, if lots of segments are added and changed over time, it might get complicated.
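If point numbers were used, sorting would need to compare the numeric parts rather than the raw strings, otherwise 002.10 would land before 002.2. A rough sketch, assuming IDs of the form slug:number with optional dot-separated sub-numbers (the helper name `segment_sort_key` is hypothetical):

```python
def segment_sort_key(segment_id: str) -> tuple[str, tuple[int, ...]]:
    """Split 'start:002.1' into ('start', (2, 1)) so numbers compare numerically."""
    slug, _, number = segment_id.partition(":")
    return slug, tuple(int(part) for part in number.split("."))


ids = ["start:003", "start:002.10", "start:001", "start:002.2", "start:002"]
ordered = sorted(ids, key=segment_sort_key)
print(ordered)
```

A parent number such as 002 sorts before its point segments because a shorter tuple compares as smaller than a longer tuple that starts with the same values.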
Perhaps it would be better to simply add the segments and just increment everything after that. This is trivial in Bilara i/o.
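To make the renumbering idea concrete, here is a small sketch that is independent of Bilara i/o. It assumes a page's segments are held in an ordered dict, and it returns an old-to-new ID map so the same shift could be applied to every translation file. The function name and return shape are illustrative assumptions. Note that it renumbers from 1, so any gaps left by earlier deletions are closed as well.

```python
def insert_and_renumber(
    segments: dict[str, str], after_id: str, new_text: str, width: int = 4
) -> tuple[dict[str, str], dict[str, str]]:
    """Insert new_text after after_id and renumber every segment from 1."""
    if after_id not in segments:
        raise KeyError(after_id)
    slug = after_id.partition(":")[0]
    renumbered: dict[str, str] = {}
    id_map: dict[str, str] = {}  # old ID -> new ID, for updating translations
    counter = 0
    for old_id, text in segments.items():
        counter += 1
        new_id = f"{slug}:{counter:0{width}d}"
        renumbered[new_id] = text
        id_map[old_id] = new_id
        if old_id == after_id:
            # The new segment takes the next number; later ones shift up by one.
            counter += 1
            renumbered[f"{slug}:{counter:0{width}d}"] = new_text
    return renumbered, id_map


root = {"start:0001": "a", "start:0002": "b", "start:0003": "c"}
renumbered, id_map = insert_and_renumber(root, "start:0001", "new")
```

The catch with this approach is that every translation of the page has to be renumbered with the same map, or its strings will drift out of alignment with the root.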
Making changes
Apart from ordinary editing of segments, changes must be done outside of Bilara. Normally this will use Bilara i/o.
- The repo is updated locally
- Bilara i/o is run on the relevant text to produce a spreadsheet with all fields
- Fields are edited in spreadsheet; rows may be added or deleted.
- Segment numbers generated in spreadsheet.
- Bilara i/o is run in reverse to regenerate the JSON files, which are pushed to GitHub.
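The last two steps can be pictured with a stand-in written against Python's standard library. This is not Bilara i/o's actual interface, just a sketch of the idea: each column of the edited sheet becomes one JSON file, and segment numbers are regenerated from row order. The column names `root-en` and `translation-de` and the function name are invented for the example.

```python
import csv
import io
import json


def rows_to_json(tsv_text: str, slug: str, width: int = 4) -> dict[str, dict[str, str]]:
    """Turn an edited TSV sheet into one {segment_id: text} dict per column."""
    # QUOTE_NONE treats quotation marks as ordinary characters,
    # which matters because segments may contain HTML attributes.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    files: dict[str, dict[str, str]] = {}
    for number, row in enumerate(reader, start=1):
        segment_id = f"{slug}:{number:0{width}d}"
        for column, text in row.items():
            if text:  # leave out empty cells, e.g. segments not yet translated
                files.setdefault(column, {})[segment_id] = text
    return files


edited_sheet = (
    "root-en\ttranslation-de\n"
    "Welcome\tWillkommen\n"
    "A newly added row\t\n"
    "Goodbye\tAuf Wiedersehen\n"
)
files = rows_to_json(edited_sheet, "start")
print(json.dumps(files["translation-de"], ensure_ascii=False, indent=2))
```

Because IDs come from row position, a row added in the sheet automatically shifts the numbers of everything below it in every column at once.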
HTML in JSON
The sutta texts are almost entirely plain text, with occasional Markdown. The UI texts are more complex: they include a variety of HTML inside the segments. This is a hassle to edit, but there seems to be no alternative, so it should remain as-is.
Notifying translators of changes?
Finally, a question. This in fact applies to the sutta translations as well. What happens when a root is changed? We should notify anyone who has made a translation of that string. How best to do this? Is there a native GitHub function that can help?
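Whatever the delivery mechanism turns out to be, the first step is working out which strings actually changed. A minimal sketch, assuming the old and new versions of a root file have both been loaded as {segment_id: text} dicts (the function name `diff_segments` is hypothetical):

```python
def diff_segments(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List segment IDs whose text changed, was added, or was removed."""
    return {
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }


old_root = {"start:0001": "Welcome", "start:0002": "Read the suttas"}
new_root = {"start:0001": "Welcome!", "start:0003": "Find a teacher"}
report = diff_segments(old_root, new_root)
```

A report like this only makes sense if IDs are stable between the two versions, so it fits the point-number approach better than wholesale renumbering.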
And what about translations of suttas, where the root will not change, but where the English translation may change? A translator who has relied on that would surely want to know. Perhaps we should add a field to publication.json. In fact this is good information to know anyway.
"source": ["pli-ms", "en-sujato", "de-sabbamitta"]
Meaning:
“This translation is based on these principal sources: Mahasangiti in Pali, Sujato in English, and Sabbamitta in German.”
Then whenever there is a change to a string in the relevant source, the translator is notified.
If a translator relies on a source that is not in Bilara, it can be recorded in a second field:
"external-source": ["pli-pts", "en-bodhi"]
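As a sketch of how the proposed fields could drive notifications: given a source that has changed, look up every translation that lists it under "source". The shape of the publication data below, and the project keys `fr-example` and `pt-example`, are made up for illustration and do not reflect the real publication.json.

```python
def translations_to_notify(changed_source: str, publications: dict[str, dict]) -> list[str]:
    """Return the translations that list changed_source under 'source'."""
    return sorted(
        name
        for name, publication in publications.items()
        if changed_source in publication.get("source", [])
    )


# Hypothetical publication entries, keyed by translation.
publications = {
    "fr-example": {
        "source": ["pli-ms", "en-sujato", "de-sabbamitta"],
        "external-source": ["pli-pts", "en-bodhi"],
    },
    "pt-example": {"source": ["pli-ms", "en-sujato"]},
}

print(translations_to_notify("en-sujato", publications))
print(translations_to_notify("de-sabbamitta", publications))
```

Entries under "external-source" are deliberately ignored here: since those texts are not in Bilara, there is no change event to react to, and the field serves as a record only.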