Text-critical information styling



gaiji: span (13144)
t-gaiji: span (61)
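Counts like the two above can be reproduced with a small class inventory. The sketch below is hypothetical: the function names are illustrative and the directory in the usage comment is an assumption. It uses only Python's standard `html.parser` to tally every (class, tag) pair.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tally (class name, tag) pairs, e.g. ("gaiji", "span")."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute can hold several space-separated names.
                for cls in value.split():
                    self.counts[(cls, tag)] += 1


def count_classes(html_texts):
    """Sum class counts over an iterable of HTML strings, one parser per text."""
    total = Counter()
    for text in html_texts:
        parser = ClassCounter()
        parser.feed(text)
        parser.close()
        total.update(parser.counts)
    return total


# Usage (the directory name is hypothetical):
# from pathlib import Path
# counts = count_classes(p.read_text(encoding="utf-8") for p in Path("html_text").rglob("*.html"))
# print(counts[("gaiji", "span")], counts[("t-gaiji", "span")])
```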

It seems to me that .gaiji is correct markup here and not .t-gaiji. Which one do you prefer?


They should be consistent; that’s the main thing. As a convention, I have used t- to indicate classes that are taken over from CBETA, so to be consistent we’d keep to t- here.

Yes, that sounds good. For now, if we need to use more colors than are provided in our sc-colors.html, we can just use any arbitrary color until we figure out exactly how many shades we need, then we can define them properly and add them to sc-colors.html.

For the rest of the cases you’ve mentioned, I’ll get around to them this afternoon or tomorrow. I’m still in Kaohsiung, off to Qimei in an hour or so.


Have a good flight. :small_airplane:
You might have seen my message on Slack, so I might not be at the next meeting. :cyclone: :sandstorm:

And I’m making a bogus sutta file for testing all the classes together because we don’t have access to everything on next-sc yet.



The ones that are in the html and not in the pootle files are now marked up as:
<span class="var" id="note1407" title="vibhīṭa­ka­miñjiyo (bj, pts1) | vibheda­ka­bījiyo (s2, s3, mr)">vibhītakamiñjiyo</span>

Should that change to:
<span class="var" title="vibhītakamiñjiyo → vibhīṭa­ka­miñjiyo (bj, pts1) | vibheda­ka­bījiyo (s2, s3, mr)">vibhītakamiñjiyo</span>

Another interesting one I found in pli-tv-kd15:
<span class="var" hot="" id="note232" kho="" lasu="" me="" moggall="" ph="" pubbe="" title="lasunena me āvusoti (pts1) | atha kho āyasmā sāriputto āyasmantaṃ ­mahā­mog­gallā­naṃ etadavoca-- \" udarav="">
What is that all about? Regex error?
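One way to find other tags damaged like this would be to flag any span carrying attribute names outside a small allow-list. This is a minimal sketch with hypothetical names. The allow-list is an assumption and not a project rule, and it only detects the damage; it does not explain what caused it.

```python
from html.parser import HTMLParser

# Attribute names we expect on spans; this allow-list is an assumption.
EXPECTED_ATTRS = {"class", "id", "title", "lang"}


class SuspiciousAttrFinder(HTMLParser):
    """Record spans whose attribute names fall outside EXPECTED_ATTRS."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        odd = [name for name, _ in attrs if name not in EXPECTED_ATTRS]
        if odd:
            line, _col = self.getpos()  # position of the offending start tag
            self.hits.append((line, odd))


def find_suspicious_spans(html_text):
    """Return (line number, [unexpected attribute names]) for each suspect span."""
    finder = SuspiciousAttrFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

Run over each file, this would list spans like the one above, where fragments of Pali text ended up as empty attributes such as `hot=""` or `kho=""`.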


Really? Crikey, hunker down and think of Dune!


That’s great. Hopefully we’ll arrive at an annotated Pali corpus eventually, and TEI markup will be quite helpful for structuring it.

For example, Frederik Elwert and Sven Sellmer have improved upon the VRI TEI markup, as described in their article:āli-canon.pdf/

and I think TEI markup will be essential for future work on an annotated Pali corpus.


Thanks for the reference.

They could have saved some time by using our text, as we have already done an equivalent, in some cases greater, level of semantic markup, as well as using a more accurate base text. But the work on POS tagging looks interesting.

We are already moving beyond the concepts of TEI, and hope to do so more in the future. Basically there are a number of inherent problems in the very notion of “markup”, and we believe that an emerging concept called “standoff properties” will be more powerful and flexible. The basic idea is that you separate the text, and any markup or reference data is maintained in separate files. Our new generation of translations applies a small step in this direction.
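To make the standoff idea concrete, here is a toy sketch. The plain text is stored once and never modified, and annotations live separately as (start, end, class) ranges that are only merged into HTML at render time. The names and data layout are hypothetical, not SuttaCentral’s actual format. The sketch only handles ranges that nest or are disjoint. Partially overlapping ranges, one of the things standoff properties can represent and inline markup cannot, would need a different output model.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated character
    end: int    # offset just past the last annotated character (start < end)
    cls: str    # class name, e.g. "add" or "var"


def render(text, annotations):
    """Produce HTML from plain text plus separately stored annotations.

    The text itself is never modified. Assumes ranges nest or are disjoint;
    partially overlapping ranges cannot be expressed as nested HTML spans.
    """
    opens, closes = {}, {}
    for a in annotations:
        opens.setdefault(a.start, []).append(a)
        closes.setdefault(a.end, []).append(a)
    out = []
    for i in range(len(text) + 1):
        # Every closing tag is "</span>", so only the number of closes matters.
        out.append("</span>" * len(closes.get(i, [])))
        # Open longer (outer) ranges before shorter (inner) ones.
        for a in sorted(opens.get(i, []), key=lambda a: -a.end):
            out.append(f'<span class="{escape(a.cls)}">')
        if i < len(text):
            out.append(escape(text[i]))
    return "".join(out)
```

For example, `render("evam me sutam", [Annotation(0, 4, "add"), Annotation(5, 7, "var")])` yields `<span class="add">evam</span> <span class="var">me</span> sutam`. The same text could be rendered with a different annotation file without touching it.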


yes, that’s right.

Yes, I agree.

Yes, it’s correct. The Pali text says: “The section on ‘arising together’ is [to be expanded] the same as the section on conditions.”

Yes, that’s right.

I think leaving it as add is okay. Generally speaking, we can consider add as a more general catch-all category, so it can include remarks indicating expansion of text. However, pe is more specific, so if it is clearly just an indication of expansion, pe is preferable. In this case, it’s not clear to me whether it indicates a pure expansion, so it’s best to leave it as add.

I agree with your remarks here.


Yes, I agree.

Again, I agree with your remarks here.

Again, I agree, although I would regard these as not so vital. If it’s possible to easily distinguish which should be pe, then great; otherwise it’s fine to leave them as add.


No, they should stay as they are. The difference is that with sic/corr, there is a direction: the incorrect reading has been changed to the correct reading. With the variants, there is no direction: they are merely variations in different manuscripts.
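The distinction can also be seen in the data itself. A var title such as `vibhīṭa­ka­miñjiyo (bj, pts1) | vibheda­ka­bījiyo (s2, s3, mr)` parses naturally into an unordered mapping from reading to sigla, with no “from” and “to”. A sic/corr pair, by contrast, is an ordered (incorrect, correct) pair. This is a minimal sketch: the title pattern is inferred from the examples in this thread, not from a formal spec, and the function name is hypothetical.

```python
import re

# Each reading looks like "text (siglum, siglum, ...)"; inferred from examples.
READING = re.compile(r"^\s*(?P<text>.+?)\s*\((?P<sigla>[^()]*)\)\s*$")


def parse_var_title(title):
    """Return {reading: frozenset(sigla)} for a title like 'a (bj) | b (s2, s3)'."""
    readings = {}
    for part in title.split("|"):
        m = READING.match(part)
        if not m:
            raise ValueError(f"unrecognised reading: {part!r}")
        # Drop soft hyphens (U+00AD) so readings compare cleanly.
        text = m.group("text").replace("\u00ad", "")
        sigla = frozenset(s.strip() for s in m.group("sigla").split(","))
        readings[text] = sigla
    return readings
```

Because the result is a mapping, swapping the order of the readings in the title gives an equal value. That is exactly the “no direction” property of variants.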

Lol, yes, this is some kind of demonic regex Mara.


I will no doubt miss a few, because going over 24,000 instances with a fine-toothed comb would take too much time, but I will do what I can.
One more question:
<p><span class="add">The devatā:</span></p> followed by a verse (English suttas).
Should this not be .speaker class?


Yes, it should.


I can manage to get most of them, except the Thai. Google Translate gives me interesting translations like “Sutra is not a pagan” and “Another 50 recipes to make this baroque implicit”, which don’t help. So for the Thai I’ve left it as is.


Now for the styling: I have not added anything for .corr and .var yet, but I did the styling for the rest.
So please run make run-dev and then make load-data, and go to … to see the test file. Some styles are applied all the time; others are only applied in Text Info mode. All the above styles should be there.

Other classes

Other classes we have not dealt with:

suppliedmetre: (this is where the metre is constantly visible, while in the metre class it only shows up when TI is on)

show: only appears together with “add” and does not seem to be doing much at all. I have also just found a file that does not show up in the menu, because the menu divides LAL into its chapters and this one does not. In any case, the headings use the “show” class. I suggest removing it everywhere (34 instances in 7 files).
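Removing a single class name is a small regex job if the attributes are double-quoted, which they are in the snippets in this thread. This hypothetical helper removes one class name from every class attribute and drops the attribute entirely if nothing is left. It is a regex sketch over HTML, not a parser, so it would also touch a literal `class="…"` inside attribute values or text.

```python
import re

# A class attribute including its leading whitespace, e.g. ' class="add show"'.
CLASS_ATTR = re.compile(r'\s+class="([^"]*)"')


def drop_class(html_text, unwanted="show"):
    """Remove one class name everywhere; drop the attribute if it ends up empty."""

    def fix(match):
        kept = [c for c in match.group(1).split() if c != unwanted]
        return f' class="{" ".join(kept)}"' if kept else ""

    return CLASS_ATTR.sub(fix, html_text)
```

Matching whole names after `split()` means a class like `showcase` would be left alone.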

term on strong tags: seems superfluous, because term is already defined as bold in the CSS.

long-var: is the same as var, but on a div block instead of a span. It seems a bit superfluous (see f.i. pi/ja539), as it is supposed to do exactly the same thing.

surplus: appears only 4 times in 1 file:

Sutta-classes that puzzle me:

uddesa: there are only 5 of those, like here: but there is no CSS for it. If they are changed, they also need changing in the Pootle files.

uddana & uddanagatha are not used consistently. Sometimes the Pali uses one and the English translation the other (see f.i. …), and I have used div class="uddana" in the Pootle files, but that might be wrong.

There are quite a few more but let’s just start with these.

Something else weird:

What is this: ??


Just to come back to the use of .scribe and .supplied in san-mu-kd6:

According to the source text: ITALICS for restored text.
So these are now marked supplied because according to your instructions: editorial restoration of lost text --> supplied

I agree that the .scribe can be better marked .add, but I wonder if you saw the note on top of the source text marking italics as restored (and thus supplied) text.

After changing all the .scribe there to .add, there are only 5 instances of .scribe left. There is also no title attribute on .scribe to denote what it really is.

  445  罽賓律師佛陀什。彌沙塞部僧也。</p>
  446  <a class="t" id="t0194b23"></a>
  447: <p><span class="scribe">
  448  <a class="sc" id="4"></a>
  449  <a class="t-linehead" id="t-linehead0194b23"></a>

  323  <p><a class="jb" data-uid="gdhp277" id="277"></a>akodhaṇasa vi<span class="add">yi</span>di<br>ṭ́hidadhamasa rayiṇo<br>suhu puruṣu asea<br>śidachade va sva ghari ◦</p>
  324  <p><a class="jb" data-uid="gdhp278" id="278"></a>uṣavha viva go­sag̱i<br>śilamadu akodhaṇo<br>baho ṇa payuvasadi<br>rayaṇa viva dhamia ◦</p>
  325: <p><a class="jb" data-uid="gdhp279" id="279"></a>hasti va muyajadaṇa<br>śela<span class="add">ṇa</span> hemavañ iva<br>sakaro va śravadiṇa<br>adico tavada r iva ◦<br><span class="scribe">khaṇakhaṇi tidikṣea<br>kodhu rakṣea atvaṇi</span></p>
  326  <p><a class="jb" data-uid="gdhp280" id="280"></a>jiṇa kodha akotheṇa<br>asadhu sadhuṇa jiṇa<br>jiṇa kradava daṇeṇa<br>saceṇa alia jiṇa ◦</p>
  327  <p><a class="jb" data-uid="gdhp281" id="281"></a>saca bhaṇi na kuvea<br>daya apadu yayida<br>edehi trihi ṭ́haṇehi<br>gacha devaṇa sadii ◦</p>

 1938  teṣāṃ ca yo nirodha evaṃvādī mahāśravaṇaḥ |</p>
 1939  </blockquote>
 1940: <p><span class="scribe">
 1941  <a class="sc" id="294"></a>
 1942  yo dharmo ’yaṃ pravaramahāyānayayisya śākyabhikṣuloka śrī<span class="supplied">dharasya</span>

  100  <span class="counter">tha</span> ||
  101  <span class="counter">tha</span> ||</p>
  102: <p lang="xct" class="scribe">
  103  <a class="sc" id="7"></a>
  104  ’phag-spa dge-’dun-phal-chen-pai ’jig rten-las-’das-par-smra-bai dge- sloṅ-gi

 1867  mahāśravaṇaḥ | ye dharmmo yaṃ pravaramahāyāna payiśya śākyabhikṣuloka
 1868  <span class="gap">....</span> |</p>
 1869: <p><span class="scribe">
 1870  śākyabhikṣu śrīvijaya­bhadra­likhitamidam</span></p>
 1871  </article>


I’ve added a pull request for the CSS/JS info to be uploaded to next-sc. It should be done when the devas wake up, so you can see how it looks on the bogus sutta.


Sounds wise.

Yes, go ahead.

The only reason to keep this would be to make the display more robust if the CSS is missing. But it’s not really critical, so I agree, replace them with span.
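The replacement itself can be a one-line regex substitution. This is a hypothetical sketch that assumes the tags appear exactly as `<strong class="term">` with no other classes or attributes and no nested `<strong>` inside a term. Anything written differently would be left untouched and would need a manual check.

```python
import re

# Assumes exactly <strong class="term">...</strong>, with no nested <strong>.
STRONG_TERM = re.compile(r'<strong class="term">(.*?)</strong>', re.DOTALL)


def strong_term_to_span(html_text):
    """Rewrite <strong class="term">x</strong> as <span class="term">x</span>."""
    return STRONG_TERM.sub(r'<span class="term">\1</span>', html_text)
```

Plain `<strong>` elements without the term class are not affected.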

Sure, just replace with var.

Yes, it’s odd, the GRETIL source gives these passages with <b> tags, and I guess I interpreted them in line with TEI’s “surplus”: “Text present in the source which the editor believes to be superfluous or redundant”

I’m not sure if this is really that different from <sic>. Looking at the Sanskrit, by comparison with the Pali, it does seem like the marked passages are surplus, although it’s not really clear. I would suggest simply keeping it as “surplus”, giving it secondary-text-color, and supplying the title as per the TEI definition.

I would just get rid of these.

The uddanagatha really just adds a bit of specificity. IIRC, I probably used it originally for the Vinaya in LaTeX, where inheriting style classes is harder than in CSS. So it really just means the same thing as a class="gatha" inside an uddana.

I’ll leave this up to you. If you think it’s easier to eliminate it and just infer the gatha styling within an uddana, that’s fine. Most of the uddanas are verse anyway, so uddanagatha isn’t really doing much. But if it’s easier to leave it, it doesn’t really matter if the markup is inconsistent between text and translation, again, it’s just a bit more specific.

I’m not sure what the issue is here?

“Note of attribution by the scribe of the manuscript.”


According to your post above on this issue, you wanted to replace all <span class="supplied"> with <i>. I wonder if that is correct. The source text notes that all italics mark restored text.

Will have a look at all this tomorrow/this evening/afternoon … whenever.


No, it’s not, I’m afraid: I was talking about one specific case.

In that case, a note in English contained a phrase in Sanskrit. The <i> tag in HTML5 is used for “alternate voice”, including uses of words foreign to the language context. Thus we normally use <i> for marking words in Pali or Sanskrit when they occur in a passage in another language.

supplied in other contexts should remain unchanged. I hope this doesn’t make a lot of work for you!

Just so we’re clear, none of this has any relation to how <i> tags may be used in GRETIL or any of our source texts.


Yes, I know. That’s why I enquired about it; it seemed very odd to change all supplied text to italics. Now I understand what you mean.


Well, the difference is that we have no more <sic> as per the above discussion.

Current code for surplus is:

.surplus {
display: none;
color: var(--sc-secondary-text-color);
}
.infomode .surplus {
display: inline;
}
So it disappears when TI is not on. Do you want to keep that?

I’ve done that in san-mu-kd6 as indicated, but please run a regex on <span class="add">.*?<span class="supplied">.*?</span></span> and you’ll see 35 more cases that might be the same. Please pull nextdata first and advise.
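For anyone reproducing that search in Python, here is a sketch using the same pattern. Note that without `re.DOTALL`, `.*?` does not cross line breaks, so an add span that wraps onto a new line before its supplied span will not be found. The pattern can also overmatch when an add span closes early and an unrelated supplied span follows, which is one reason these are only “might be the same” candidates. The file layout assumed by `scan` is hypothetical.

```python
import re
from pathlib import Path

# The pattern quoted above. Without re.DOTALL, ".*?" stays on one line.
ADD_WITH_SUPPLIED = re.compile(
    r'<span class="add">.*?<span class="supplied">.*?</span></span>'
)


def find_add_with_supplied(text):
    """Return every single-line match of the pattern in text."""
    return ADD_WITH_SUPPLIED.findall(text)


def scan(root):
    """Yield (path, match) for each .html file under root (layout assumed)."""
    for path in sorted(Path(root).rglob("*.html")):
        for m in find_add_with_supplied(path.read_text(encoding="utf-8")):
            yield path, m
```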

I’ve uploaded a new pull request that also shows the JS at work for all classes except .var and .corr. It’s best visible on the dn2 mockup (don’t look at the dn1 mockup). What happens now is that titles with explanations are only visible on classes that show a visible difference in the text. So f.i. on .gap the title is always visible, because .gap always has a different colour. But on .add, for instance, the title is only there when Textual Information is on. Does that make sense?
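The rule described here can be stated as a one-line predicate. The actual implementation is in the JS of the pull request. This is only a Python model of the rule as described, with hypothetical names. Only .gap is named as always styled, so the contents of the set are an assumption to be filled in.

```python
# Classes whose text is always visibly styled (e.g. .gap has its own colour).
# Only "gap" is named in this thread; the rest of the set is an assumption.
ALWAYS_STYLED = {"gap"}


def show_title(classes, info_mode):
    """Expose an element's explanatory title only when its markup is visible.

    Always-styled classes are visible at all times; every other class
    becomes visible only when Textual Information mode is on.
    """
    return info_mode or any(c in ALWAYS_STYLED for c in classes)
```

Keeping the set of always-styled classes in one place would make it easy to adjust once .var and .corr get their styling.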


Ayya, just so you know, I’ll be focusing on proofreading today, and will hopefully look at this tomorrow.