Text-critical information styling

Vimala · November 17, 2017, 1:50pm

Sure, but with things like this I find Discourse sometimes a bit difficult, especially if you change things later on. Maybe you can make new posts for new findings?

This is only one example; I have made quite a few changes already. But I’m stuck with the rest of them, so could you please go over the list I posted on Jira to see if there is anything else that needs changing, and whether there are any classes we have missed? There are some classes that only have 1 instance in the texts, so I assume they are wrong. There are also a few instances of empty classes, so those seem to be wrong too. I put a few small questions in the list for that as well.

I will check with Kuba. I think he was doing something with that in order to prevent it getting caught up with the Pali lookup tool. Otherwise, I will just put it in again.

For instance here: https://suttacentral.net/en/kv1.1/29.93-29.99 it is a span class. Maybe change all those numbers to a-tags (what you call reference tags?)

Will look into this a bit more later.

Vimala · November 17, 2017, 9:30pm

Re: .lem / .rdg classes:

I’ve changed a lot of them to .var but am still stuck on 22 files so need some help:

Find Results.txt.zip (11.3 KB)

Vimala · November 17, 2017, 10:11pm

In theory, that should work. In practice we have:
choice: span (578)
corr latin: span (4)
corr: span (3560)
sic: span (463)

So we have a total of 3564 corr-spans and far fewer choice- and sic-spans. This would indicate that “choice” is not the dominant way to use “corr” and “sic”. For instance, in the Dharmaskandha “corr” is used just like that, without any corrected form or anything: <span class="corr">tpadyante</span>. Or in T 10 it is used like this: <span class="corr" title="[悉&gt;] cf.【麗】"></span>, so there is a title attribute already, but nothing that is actually corrected …
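A tally like the counts above can be reproduced with a few lines of Python. This is only a rough sketch: the function name `tally_classes` and the sample string are made up for illustration, and it assumes class attributes are always written with double quotes.

```python
import re
from collections import Counter

# Opening tag that carries a class attribute, e.g. <span class="corr latin">.
OPEN_TAG = re.compile(r'<(?P<tag>[a-zA-Z][\w-]*)\b[^>]*?\bclass="(?P<cls>[^"]*)"')


def tally_classes(html, wanted=("choice", "corr", "sic")):
    """Count (class attribute, tag name) pairs that mention a wanted class."""
    counts = Counter()
    for m in OPEN_TAG.finditer(html):
        if any(c in wanted for c in m["cls"].split()):
            counts[(m["cls"], m["tag"])] += 1
    return counts


# Hypothetical sample; in practice this would be the text of each file.
sample = ('<span class="choice"><span class="sic">搏</span>'
          '<span class="corr">摶</span></span>'
          '<span class="corr latin">So CBETA.</span>')
for (cls, tag), n in sorted(tally_classes(sample).items()):
    print(f"{cls}: {tag} ({n})")
```

Running it over every file and summing the counters would give a list in the same "class: tag (count)" shape.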

sujato · November 19, 2017, 9:55am

Okay, so let’s work through the choice/sic/corr cases, figure out what’s going on and resolve them. For the most part, here I am just focusing on the markup rather than JS/CSS, although I make a few remarks about how these might be handled.

These should cover pretty much all the cases. I have messed around a lot while doing this, so I won’t make the changes myself, if that’s okay, but leave them to you.

One trick I use that may be handy: when doing work like this, I add a z or something to the changed cases (class="zcorr", etc.) so that I can be sure that all have been corrected. At the end I remove them.

Make sure to read the whole thing carefully before making any changes, and check with me if anything’s unclear.

The Right Way! Or is it?

Sometimes we have the full paradigm, which is the preferred way of using these tags. These should change as indicated above. For example in lzh-mi-bu-pm:

<span class="choice"><span class="sic">搏</span><span class="corr">摶</span></span>

Which above I suggested we adapt to:

<span class="corr" title="Corrected from: 搏">摶</span>

But see the next example!

Cases: 324

An Even Righter Way

Sometimes the Chinese text uses the following form:

 <span class="corr" title="[曰&gt;白]">白</span>

There’s actually a good argument in favor of something like this. Since it is a Chinese (or Sanskrit, etc.) text, perhaps it is better to adopt a language-independent way of presenting it. But we can do better than this ASCII approach:

 <span class="corr" title="[曰→白]">白</span>

What do you think? Using the arrow is nice and clear. If we need to add explanatory text, we can do it via JS rather than hard-coding.

Cases: about 800
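The swap from the ASCII form to the arrow could be scripted roughly as follows. This is a sketch, not a tested migration: `ascii_to_arrow` is a hypothetical name, and the pattern assumes the title starts with a well-formed [old&gt;new] pair, so the malformed cases are deliberately left alone.

```python
import re

# Title of a corr span that starts with a bracketed [old&gt;new] pair.
# Either side may be empty; "<" is excluded so unclosed titles do not match.
CORR_TITLE = re.compile(
    r'(<span class="corr" title="\[[^\]"<]*?)&gt;([^\]"<]*\])'
)


def ascii_to_arrow(html):
    """Replace the &gt; separator inside [old&gt;new] with an arrow."""
    return CORR_TITLE.sub(r"\1→\2", html)


print(ascii_to_arrow('<span class="corr" title="[曰&gt;白]">白</span>'))
```

Anything the pattern does not match stays unchanged, so broken titles can still be found afterwards by searching for the remaining &gt;.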

Another kind of thing

In Dk 5 we find eg.:

u<span class="corr">tpadyante</span>

Here no sic is present. All we can say is that this represents a form that has been corrected by an editor.

We also find things like this:

<span class="del">|</span> tasya tajjā vedanā <span class="corr">|</span>

The | is mere punctuation, and in this instance the editor has apparently deleted an earlier punctuation mark and used their own. Strictly speaking we should probably use sic instead of del here, but never mind.

Nothing needs to be done to the markup in these cases.

Cases: 1452

Complicated Chinese stuff

Sometimes we find corr used together with gaiji. Painful!

<span class="corr" title="[劫&gt;却]"><span class="gaiji">却<!--gaiji,卻,1[去*ㄗ],2&#x5374;,3--></span></span>

Here they can be treated as normal, just inserting an arrow and leaving the gaiji code as is. Note that such cases often span more than one line for some reason. Probably HTML Tidy or something decided to put the gaiji tag on its own line at some point, but this is a mistake.

<span class="corr" title="[劫→却]"><span class="gaiji">却<!--gaiji,卻,1[去*ㄗ],2&#x5374;,3--></span></span>

Cases: About 5

lem/rdg

In Uv and san-lo-bu-pm we find choice used with lem and rdg rather than sic and corr.

<span class="choice"><span class="lem">hastanirdhūnakaṃ</span><span class="rdg">Variant: hastanirdhūtakaṃ</span></span>

We can convert these to the normal var markup:

<span class="var" title="hastanirdhūtakaṃ">hastanirdhūnakaṃ</span>

Cases: 111
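For the simple wrapped form, the conversion might look something like this. It is a sketch under assumptions: `choice_to_var` is a made-up name, the three spans are assumed to contain plain text only, and the "Variant: " prefix is assumed to be the only hand-written label.

```python
import re

# <span class="choice"> wrapping exactly one lem and one rdg, text only.
CHOICE_LEM_RDG = re.compile(
    r'<span class="choice">\s*'
    r'<span class="lem">(?P<lem>[^<]*)</span>\s*'
    r'<span class="rdg">(?:Variant:\s*)?(?P<rdg>[^<]*)</span>\s*'
    r'</span>'
)


def choice_to_var(html):
    """Rewrite choice/lem/rdg as a single var span with the variant as title."""
    def repl(m):
        # Keep the title attribute valid if the variant contains a quote.
        variant = m["rdg"].strip().replace('"', "&quot;")
        return '<span class="var" title="{}">{}</span>'.format(variant, m["lem"])
    return CHOICE_LEM_RDG.sub(repl, html)


print(choice_to_var(
    '<span class="choice"><span class="lem">hastanirdhūnakaṃ</span>'
    '<span class="rdg">Variant: hastanirdhūtakaṃ</span></span>'
))
```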

A Little Odd, But Okay

In T 10 we find:

<span class="corr" title="[悉&gt;] cf.【麗】"></span>

The intent of this is clear enough. It is indicating that the reading 悉 has been removed as a superfluous term, citing the 麗 edition as authority.

Here the brackets serve to define the actual term, distinguishing it from the discussion about the term. This should probably be adopted as a universal notation in such cases. Thus the actual text is [in the brackets], while what is outside the brackets is comments or notes, etc. The brackets can be removed if desired for display purposes, but they do provide a handy way to structure the metadata.

Since the span is empty, there is no way of seeing this entity in the text. I propose that in such cases we leave the span empty, and insert an asterisk * when Textual Information is activated.

Otherwise, this remains the same, except that we use the arrow as before:

<span class="corr" title="[悉→] cf.【麗】"></span>

Cases: 33

We also find the edition cited when there is a term to be inserted, as eg.:

<span class="corr" title="[是此&gt;此是] cf.【麗】">此是</span>

Which means: “The apparently incorrect reading found in some sources 是此 has been corrected by the editor to 此是 based on the authority of the 麗 edition.”

Once again this can remain as is, except for the arrow:

    <span class="corr" title="[是此→此是] cf.【麗】">此是</span>

Cases: 913
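Because the brackets separate the readings from the notes, the title can be split mechanically, and any explanatory wording generated rather than hard-coded. Here is a small sketch of that idea in Python (on the site it would presumably be JS). The names `Correction`, `parse_corr_title` and `describe`, and the English wording, are all just illustrations.

```python
import re
from typing import NamedTuple, Optional


class Correction(NamedTuple):
    old: str   # reading before correction (may be empty)
    new: str   # corrected reading (may be empty)
    note: str  # anything after the brackets, e.g. "cf.【麗】"


# [old→new] followed by optional free-text notes.
TITLE = re.compile(r'^\[(?P<old>[^→\]]*)→(?P<new>[^\]]*)\]\s*(?P<note>.*)$')


def parse_corr_title(title: str) -> Optional[Correction]:
    """Split a corr title into its parts, or return None if it has another shape."""
    m = TITLE.match(title)
    if m is None:
        return None
    return Correction(m["old"], m["new"], m["note"])


def describe(c: Correction) -> str:
    """Build display text; the wording here is only a placeholder."""
    if c.old and c.new:
        text = f"Corrected from {c.old} to {c.new}"
    elif c.old:
        text = f"Removed: {c.old}"
    else:
        text = f"Inserted: {c.new}"
    return f"{text} ({c.note})" if c.note else text


print(describe(parse_corr_title("[是此→此是] cf.【麗】")))
```

Titles that do not follow the bracket notation come back as None, which is also a cheap way to list the ones that still need fixing by hand.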

In a few cases it seems I wrote the description of sic in by hand; these should be replaced:

<span class="choice"><span class="sic">搏</span><span class="corr">Sic!
Correct to: 摶</span></span>

Note that such cases span two lines for some reason, but this may not be consistent.

<span class="corr" title="[搏→摶]">摶</span>

Cases: 3

In a few cases we find sic and corr used without choice.

<span class="sic">Avidyāpratyayāḥ saṃskārāḥ katame</span><span class="corr">Saṃskārāḥ katame</span>

Correct to:

<span class="corr" title="[Avidyāpratyayāḥ saṃskārāḥ katame→Saṃskārāḥ katame]">Saṃskārāḥ katame</span>
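Sic/corr pairs, with or without the choice wrapper, could be folded into the arrow notation along these lines. This is a sketch with a hypothetical function name, `pair_to_corr`. It assumes both spans hold plain text, and it skips the hand-written "Sic! Correct to:" cases so that they can be fixed separately.

```python
import re

# A sic span immediately followed by a corr span, optionally wrapped in choice.
# The conditional (?(wrapped)...) only demands the closing tag if choice was opened.
SIC_CORR = re.compile(
    r'(?P<wrapped><span class="choice">)?'
    r'<span class="sic">(?P<sic>[^<]*)</span>'
    r'<span class="corr">(?P<corr>[^<]*)</span>'
    r'(?(wrapped)</span>)'
)


def pair_to_corr(html):
    """Rewrite sic+corr as one corr span titled [sic→corr]."""
    def repl(m):
        if m["corr"].lstrip().startswith("Sic!"):
            return m.group(0)  # hand-written description, leave for manual fix
        return '<span class="corr" title="[{}→{}]">{}</span>'.format(
            m["sic"], m["corr"], m["corr"]
        )
    return SIC_CORR.sub(repl, html)


print(pair_to_corr(
    '<span class="sic">Avidyāpratyayāḥ saṃskārāḥ katame</span>'
    '<span class="corr">Saṃskārāḥ katame</span>'
))
```

Nested cases, such as a sic that contains a rule-number span, do not match and are left as they are.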

Bad and Wrong Things

<span class="corr" title="[剌&gt;刺</span>

Yikes! Correct to:

<span class="corr" title="[剌→刺]">刺</span>

Cases: 5

<span class="corr" title="[十&gt;千] cf.【麗】">【麗】</span>

Here the edition has mistakenly made it into the text. Oops!

<span class="corr" title="[十→千] cf.【麗】">千</span>

Cases: 22

Sometimes we get more complex cases of the form:

<span class="corr" title="[娑&gt;裟]">異] cf.［麗】">［麗】</span>

Notice a couple of errors that seem to have messed up the regex, e.g. improper bracket matching. Now, in this case 娑 and 裟 are obviously similar and easily confused. The glyph 異 means “different”, and I guess it is saying that the “different reading is in the 麗 edition”. I think we can simplify this and at the same time correct the brackets:

    <span class="corr" title="[娑→裟] cf.【麗】">裟</span>

Infelicitous HTML

In a few cases ‘single quotes’ have been used for title='single quote thing', which messes up regex matching and should be standardized.

Cases: 7
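Standardizing the quotes is easy to script. A minimal sketch, assuming the single-quoted value never contains an apostrophe itself (`normalise_title_quotes` is a made-up name):

```python
import re

# title='...' where the value contains no apostrophe.
SINGLE_QUOTED_TITLE = re.compile(r"\btitle='([^']*)'")


def normalise_title_quotes(html):
    """Rewrite title='x' as title="x", escaping any double quote in the value."""
    def repl(m):
        value = m.group(1).replace('"', "&quot;")
        return 'title="{}"'.format(value)
    return SINGLE_QUOTED_TITLE.sub(repl, html)


print(normalise_title_quotes("<span class=\"corr\" title='[曰→白]'>白</span>"))
```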

Also, in addition to the cases noted above, there are a few instances where line breaks cause issues. I always prefer to have line breaks on block-level elements only, for this reason. I would suggest doing this with HTML Tidy to all texts before starting work!

My additions and the resurgence of the sic

In a few cases I have noted apparent mistakes in the text; these apply to what appear to be numbering errors in the Vinaya rules.

<span class="choice"><span class="sic"><span class="rule-number">五十</span></span><span class="corr latin">So CBETA.</span></span>

Change to:

<span class="sic" title="So CBETA."><span class="rule-number">五十</span></span>

Since the uncorrected version remains, in this case use of a sic class is required. So it seems we cannot get rid of this totally. There are a small number of other cases where apparent mistakes are marked like this:

<span class="sic">Matsumura: om.</span>

These should not need the Textual Information to be activated, but should always be colored and have title="Apparent error in text" if there is no title defined.

I suggest using red for the sic class.
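The default title could be supplied at display time, or written into the markup once. The sketch below takes the second route purely as an assumption; `add_default_sic_title` is a hypothetical name, and it only looks at spans whose class is exactly "sic".

```python
import re

# Opening tag of a sic span, with whatever other attributes follow the class.
SIC_OPEN = re.compile(r'<span class="sic"(?P<attrs>[^>]*)>')
DEFAULT_TITLE = "Apparent error in text"


def add_default_sic_title(html):
    """Give every sic span without a title the default one."""
    def repl(m):
        if re.search(r'\btitle\s*=', m["attrs"]):
            return m.group(0)  # already has a title, keep it
        return '<span class="sic" title="{}"{}>'.format(DEFAULT_TITLE, m["attrs"])
    return SIC_OPEN.sub(repl, html)


print(add_default_sic_title('<span class="sic">Matsumura: om.</span>'))
```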

sujato · November 20, 2017, 12:29am

Okay, let me look through these.

First thing I notice is that many of them are from avs, and in the Metadata for this, the source URL is incorrect or broken. It should be: Avadanasataka

Okay, so we have this:

<span class="lem">karmāṇi kalpakoṭiśatair api</span> | <span class="rdg">sonst: <span class="lem">karmāṇy api kalpaśatair api</span> |</span>

Which is very confused. In fact the second phrase is a variant of the first, and “sonst” is German for “otherwise”, which seems redundant, so we correct to:

<span class="var" title="karmāṇy api kalpaśatair api">karmāṇi kalpakoṭiśatair api</span>

There are several cases of this kind, sometimes using “auch” instead of “sonst”.

Another case in the same text:

<span class="lem">tumbaru</span>prabhṛtīni <span class="rdg">Speyer: tumburu</span>

Change to:

<span class="var" title="Speyer: tumburu">tumbaru</span>prabhṛtīni

I’m not sure what the standard syntax for editions is, but anyway Speyer is the edition.

<span class="lem">adhigamya</span> vīra buddhyā <span class="rdg">adhigatya</span>

Here no edition is mentioned. But the variant is separated from the reading.

<span class="var" title="adhigatya">adhigamya</span> vīra buddhyā

The remainder in avs are similar, and hopefully you can sort these out from here. Let me know if there are any issues.
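For the cases where the rdg trails after some intervening text, a pattern like the following might cover the simple ones. It is a sketch only: `separated_to_var` is a made-up name, the text between the two spans is assumed to contain no tags, and nested forms such as the “sonst:” example are deliberately not matched.

```python
import re

# A lem span, some plain text, then the rdg span holding the variant.
LEM_THEN_RDG = re.compile(
    r'<span class="lem">(?P<lem>[^<]*)</span>'
    r'(?P<between>[^<]*?)\s*'
    r'<span class="rdg">(?P<rdg>[^<]*)</span>'
)


def separated_to_var(html):
    """Move the trailing variant into the title of a var span on the lemma."""
    def repl(m):
        variant = m["rdg"].strip().replace('"', "&quot;")
        return '<span class="var" title="{}">{}</span>{}'.format(
            variant, m["lem"], m["between"]
        )
    return LEM_THEN_RDG.sub(repl, html)


print(separated_to_var(
    '<span class="lem">adhigamya</span> vīra buddhyā <span class="rdg">adhigatya</span>'
))
```

Whatever it leaves untouched still contains class="lem", so the remaining files are easy to find.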

Let me examine the cases in uv and make a new post on that.

sujato · November 20, 2017, 12:52am

Okay, now for the uv.

This is based on the TITUS version.

Once again, I notice that the metadata is incorrect; this time the publication dates are messed up.

Frankfurt a|M, 11.1. | 1.6. | 20.11. | 3.12.

Should be:

Frankfurt a/M, 11.1.1999 / 1.6.2000 / 20.11.2005 / 3.12.2008

But anyway, I think we should probably take a more radical approach to this. The TITUS sources are very weird and complex, and best avoided where possible. It seems this text has been independently digitized three times: by TITUS, GRETIL, and Anandajoti. I suggest we abandon the TITUS version and go with Anandajoti’s. He did a number of studies on this text over a few years, so we can be pretty confident it is accurate. And his markup is much more friendly than either GRETIL or TITUS.

What do you think?

https://www.ancient-buddhist-texts.net/Buddhist-Texts/S1-Udanavarga/index.htm

Vimala · November 20, 2017, 2:52pm

I congratulate you and @Aminah for making me look up at least one word a day in the Google Define function!

If only they were as simple as you describe:

traividyaḥ syāt sa ced bhikṣur sa ced bhavati traividyo

Change stuff like that to? (i.e. take the   and  out?):
traividyaḥ syāt sa ced bhikṣur

Also made a little error in sf102 which now reads:
<xspan class="corr" title="prakṣyāmi→Sic! So Matsumura. Correct to drakṣyāmi">Sic! So Matsumura. Correct to drakṣyāmi

So that should become something like this?:
<xspan class="corr" title="prakṣyāmi→drakṣyāmi [Matsumura]">drakṣyāmi

(leave the xspan for now please)

sujato · November 20, 2017, 11:50pm

Well, that example is from uv, and as I suggested above, I recommend that we ditch this version and use Anandajoti’s. I suspect that will be easier than trying to resolve all these issues, and will almost certainly result in a more satisfactory text overall.

Would you be happy to go ahead with this? If it’s too much work for now, we can just leave it till later, there’s no urgency.

Vimala · November 21, 2017, 12:01am

Sorry, did not get that far down yet
Sure, will do. No problem.

But can you answer my question re sf102 above please?

sujato · November 21, 2017, 12:18am

Sorry, I missed that. Yes, it looks correct.

Vimala · November 22, 2017, 1:35am

I’ve done the Udanavarga and uploaded it to the old SC so you can see it. It includes the metre as well. Please check if this is OK:

sujato · November 22, 2017, 8:56am

It looks perfect, Ayya, thanks so much.

Vimala · November 22, 2017, 11:34pm

A few more questions:

How about this: https://suttacentral.net/skt/avs39/5.598- and https://suttacentral.net/skt/avs39/3.339- and other such cases. Is there not a more elegant solution than to put this in the middle of the text?
class="sic" is now only used 6 times in 4 files. Maybe have a look at whether it is really necessary to have, or whether it can be simplified into another class? (see attached)
sic.txt.zip (2.5 KB)

I’ve pushed my changes so far to nextdata, still using the xspan class for those things that were corrected. That will have to change, but please have a look first.

Vimala · November 23, 2017, 12:00am

The span classes are all in KV in the form of numbers Kv9.6.1. I suggest changing these to <a class="kv" id="kv9.6.1">.

sujato · November 23, 2017, 1:21am

I’m not really sure what’s going on here, can you tell me what “marked” means?

But in any case, the English notes are not corr, they should be add.

You’re right, it’s looking superfluous now.

In the cases that you gave me:

lzh-sarv-bu-pm: just remove the sic span.
sf102: delete Matsumura: om.
up: delete A་
san-lo-bi-vb-ss6: This is a little obscure. The source file marks this with <> but I can’t find anywhere that this mark is explained. It’s not used anywhere else in this text. From the context, my guess is that it’s not actually sic but is more likely to mean supplied. Anyway, let’s use supplied.

Yes, that would be good.

Vimala · November 23, 2017, 1:34pm

??? Where do you see this?

Yes, makes sense. I will take care. But can you tell me what all the asterisks in the AVS mean? Can they be taken out? They don’t seem to have any particular markup, and are just there.

Will do.

I was wrong here. They are numbers but inside the text, referring to somewhere else. So they should be in the form of: <a class="cr" href="#pts-cs-697"></a> or whatever the correct link is. The Kv refers to the pts-cs numbers.

But that brings me to another problem I found with the a-tags of this class: the links don’t always work. The reason is that they do not refer to the correct numbers. For instance, in the English VB 14 it says <a class="cr" href="./vb12#pts-cs-626">626</a>, but there are no pts-cs numbers anywhere in the VB. They are called pts-s instead. So what is going on here? Is pts-s a typo, and should it be pts-cs?

Not to mention that the numbers appear multiple times in the margin …
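Broken cross-references of this kind can be listed automatically by comparing each link's anchor with the ids that actually exist in the target text. A rough sketch, assuming the pages are available as a dict from uid to HTML (the `pages` structure, the function name `broken_cr_links` and the sample id "pts-s-626" are all hypothetical):

```python
import re
from typing import Dict, List, Tuple

# Cross-reference link: optional relative target, then #anchor.
CR_LINK = re.compile(r'<a class="cr" href="(?P<target>[^"#]*)#(?P<anchor>[^"]+)"')
ID_ATTR = re.compile(r'\bid="([^"]+)"')


def broken_cr_links(pages: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (source uid, target uid, anchor) for links whose anchor is missing."""
    ids = {uid: set(ID_ATTR.findall(html)) for uid, html in pages.items()}
    broken = []
    for uid, html in pages.items():
        for m in CR_LINK.finditer(html):
            # "./vb12" -> "vb12"; an empty target means the same page.
            target = m["target"].rsplit("/", 1)[-1] or uid
            if m["anchor"] not in ids.get(target, set()):
                broken.append((uid, target, m["anchor"]))
    return broken


# Hypothetical two-page sample mirroring the VB case.
pages = {
    "vb14": '<a class="cr" href="./vb12#pts-cs-626">626</a>',
    "vb12": '<a class="pts-s" id="pts-s-626"></a>',
}
print(broken_cr_links(pages))
```

A target page that is not in the dict at all is reported as broken too, which seems the safer default.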

Any thoughts on what I uploaded or can you not look at that while you’re away? OK to leave it for now.

sujato · November 23, 2017, 2:15pm

Never mind, it must be added by the highlighting widget.

Yes, I don’t know either. If they’re meaningless, they can be removed.

Looks like it, I can’t think what pts-s would be.

Vimala · November 23, 2017, 2:19pm

Thanks. Did not expect you to be still up. I have plenty to get on with now so have a good trip!

Vimala · November 27, 2017, 10:31pm

.add / .pe

Please see below my analysis. @Sujato, please confirm if this is correct.

pe-classes

seem to only occur in the English (and one German) Abhidhamma texts and this is the full regex find: pe_finds.txt.zip (47.0 KB)

Where it says, for instance, “Answer as for “self-collectedness”, §11.” or “Continue as in the Fifth Type of Thought.”, that is the correct usage of the pe class.

But here it seems that the following should be .add-class:
are each single [factors]; (in DS 2.1.1)
 ...pe... (in Vb 6)

In the Pali Patthana we find things like:
<a class="sc" id="89"></a><a class="ms" id="p_37P1_1143"></a>Sahajātavāro paṭiccavārasadiso. https://suttacentral.net/pi/patthana1.10/-1
Which seems to be correct but I’m unsure.

add-classes

are a bit more complicated because we have nearly 24000 of them.

Pali
So first the Pali ones (add_pli.txt.zip (55.6 KB)).
In the Vinaya these are correctly just showing the rule names in pali or vagga names:
<h4><a id="pc31">Pācittiya 31. Āvasathapiṇḍasikkhāpadaṃ</a></h4>
<h3>Pattavaggo</h3>

In the Dhammapada you find things like:
<h4>Ānandattherapañhavatthu</h4> which is also correct.

Sanskrit

add_en.txt.zip (878.7 KB)

I’m not sure about these German additions in the Sanskrit text:
Nur im Tib. I think those notes are correct in .add. Please confirm.

But this is a reference together with a note, so should it be .pe? Hier im Tib. und Skt. Uddāna; cf. MPS p. 495:

All the rest seems to be OK.

Chinese
add_lzh.txt.zip (218.1 KB)

All seem to be used for rule classes and headings so there .add is fine.

English

add_en.txt.zip (878.7 KB)

In the translations from the Chinese it seems OK, just used for added headings, words and for the Vinaya rule classes.

Then the translations from the Pali Abhidhamma seem to be a mix:
Similar questions are then put respecting “spheres”, “elements”, and so on through the list of constituent species. The answers are identical with those given to similar questions in the previous “Summary,” viz., in §§64, 67, 70, 74, 83, 89, 95, 103, and 107–120.<a class="pts-cs" id="pts-cs124-145"></a> should probably be .pe.
Also continue as in <a class="cr" href="#pts-cs1.1.1"></a> and Complete as for defilements in previous section. should probably be .pe.

While “But when, as the result of this or that Jhāna the corresponding Jhāna is attained”, “Puggalavādin:” and “are replaced by:” are correct.

Sutta translations:
Things like “as in 6:2 §4” and “as above, down to:” should be .pe

But mostly the .add class is used correctly: “thinking” and “[4]”

There are roman numerals like this: iii, but that seems superfluous because roman_numerals are always added and should have their own CSS. Most roman_numerals classes do not have the .add-class added as well.

Vinaya translations:
There are instances like this in the English (Pali) Vinaya, which should probably be .pe (and then also need to be changed in the corresponding pootle files):
 To be expanded as in <a href="/en/pli-tv-bu-vb-np1#13">Relinquishment 1, paragraphs 13–17</a>, with appropriate substitutions. …

The combination with .cr class appears here too as above.
“the same three cases as above are repeated here also” would need the .pe class.

For the rule-names again the .add-class is correct.

Other languages
add_other.txt.zip (679.5 KB)

Most seem OK but there are some that should be .pe:
seperti Sutta 4, paragraf 5
(continua como no 4)
(сутта идентична предыдущей АН 10.108, но здесь вместо “слабительное” идёт “средство для рвоты”. И далее вместо “счищается” везде идёт “выблёвывается”)

Vimala · November 27, 2017, 11:02pm

Markup for .var and .corr

So basically both have the same CSS. Maybe a slightly different color?
In any case, this is the CSS we have for the .var class so far, but I think we can simplify that.
Note that the .deets class is added for the pop-up box that shows the variant reading.

var.scss.zip (780 Bytes)