Okay, so let’s work through the choice/sic/corr cases, figure out what’s going on and resolve them. For the most part, here I am just focusing on the markup rather than JS/CSS, although I make a few remarks about how these might be handled.
These should cover pretty much all the cases. I have messed around a lot while doing this, so I won’t make the changes myself, if that’s okay, but leave them to you.
One trick I use that may be handy: when doing work like this I add a z or something to the changed cases (class="zcorr", etc.) so that I can be sure that all have been corrected. At the end I remove them.
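If it helps, the z trick can be checked by script. This is only a rough sketch in Python; the function names and the list of tracked classes (corr, sic, var) are my assumptions, not something already in the toolchain.

```python
import re

# Hypothetical helpers; the "z" prefix is the temporary marker described above.
MARKED = re.compile(r'class="z(corr|sic|var)"')
UNMARKED = re.compile(r'class="(corr|sic|var)"')


def count_markers(html):
    """Return (already marked, still unmarked) counts for the tracked classes."""
    return len(MARKED.findall(html)), len(UNMARKED.findall(html))


def strip_markers(html):
    """Remove the temporary z prefix once every case has been reviewed."""
    return MARKED.sub(r'class="\1"', html)
```

When the second number returned by count_markers reaches zero, every case has been touched and strip_markers can be run.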
Make sure to read the whole thing carefully before making any changes, and check with me if anything’s unclear.
The Right Way! Or is it?
Sometimes we have the full paradigm, which is the preferred way of using these tags. These should change as indicated above. For example in lzh-mi-bu-pm:
<span class="choice"><span class="sic">搏</span><span class="corr">摶</span></span>
Which above I suggested we adapt to:
<span class="corr" title="Corrected from: 搏">摶</span>
But see the next example!
An Even Righter Way
Sometimes the Chinese text uses the following form:
<span class="corr" title="[曰>白]">白</span>
There’s actually a good argument in favor of something like this. Since it is a Chinese (or Sanskrit, etc.) text, perhaps it is better to adopt a language-independent way of presenting it. But we can do better than this ASCII approach:
<span class="corr" title="[曰→白]">白</span>
What do you think? Using the arrow is nice and clear. If we need to add explanatory text, we can do it via JS rather than hard-coding.
Cases: about 800
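With this many cases the change is worth scripting. Below is a minimal sketch in Python of the two conversions discussed so far, assuming the span contents are plain text with no nested tags. The function names are mine, and applying the bracket-and-arrow form to the full choice/sic/corr paradigm is my reading of the proposal.

```python
import re

# Matches the ASCII form inside a title, e.g. title="[曰>白]" or title="[悉>] cf.【麗】".
# Only the ">" between the brackets is touched, never the ">" that closes the tag.
ASCII_TITLE = re.compile(r'title="\[([^\[\]>"]*)>([^\[\]>"]*)\]')

# Matches the full choice/sic/corr paradigm with plain-text contents.
# The (?!Sic!) lookahead skips hand-written descriptions, which need separate handling.
CHOICE = re.compile(
    r'<span class="choice">'
    r'<span class="sic">([^<]*)</span>'
    r'<span class="corr">(?!Sic!)([^<]*)</span>'
    r'</span>'
)


def ascii_to_arrow(html):
    """Rewrite [old>new] as [old→new] inside title attributes."""
    return ASCII_TITLE.sub(r'title="[\1→\2]', html)


def choice_to_corr(html):
    """Collapse choice/sic/corr into a single corr span with a bracketed title."""
    return CHOICE.sub(r'<span class="corr" title="[\1→\2]">\2</span>', html)
```

Titles with a missing closing bracket do not match the first pattern, so they are left untouched for correction by hand.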
Another kind of thing
In Dk 5 we find eg.:
No sic is present. All we can say is that this represents a form that has been corrected by an editor.
We also find things like this:
<span class="del">|</span> tasya tajjā vedanā <span class="corr">|</span>
The | is mere punctuation, and in this instance the editor has apparently deleted an earlier punctuation mark and used their own. Strictly speaking we should probably use sic instead of del here, but never mind.
Nothing needs to be done to the markup in these cases.
Complicated Chinese stuff
Sometimes we find corr used together with gaiji. Painful!
<span class="corr" title="[劫>却]"><span class="gaiji">却<!--gaiji,卻,1[去*ㄗ],2却,3--></span></span>
Here they can be treated as normal, just inserting an arrow and leaving the gaiji code as is. Note that such cases often span more than one line for some reason. Probably HTML Tidy or something decided to put the gaiji tag on its own line at some point, but this is a mistake.
<span class="corr" title="[劫→却]"><span class="gaiji">却<!--gaiji,卻,1[去*ㄗ],2却,3--></span></span>
Cases: About 5
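For the stray line breaks in these gaiji cases, something like the following Python sketch could rejoin the tags before the arrow is inserted. It assumes the break falls directly between the opening corr tag and the gaiji span. I have not checked that this holds for every case, and the function name is made up.

```python
import re

# Whitespace (including line breaks) between the corr opening tag and the gaiji span.
# The title is matched as title="..." because it may itself contain ">".
GAIJI_BREAK = re.compile(
    r'(<span class="corr" title="[^"]*">)\s+(<span class="gaiji">)'
)


def join_gaiji(html):
    """Put the gaiji span back on the same line as its enclosing corr span."""
    return GAIJI_BREAK.sub(r'\1\2', html)
```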
In Uv and san-lo-bu-pm we find choice used with lem and rdg rather than sic and corr:
<span class="choice"><span class="lem">hastanirdhūnakaṃ</span><span class="rdg">Variant: hastanirdhūtakaṃ</span></span>
We can convert these to the normal var form:
<span class="var" title="hastanirdhūtakaṃ">hastanirdhūnakaṃ</span>
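A possible way to script this conversion, sketched in Python. It assumes the lem and rdg contents are plain text and that the rdg text may start with the literal prefix "Variant: ", which is dropped. The function name is hypothetical.

```python
import re

# choice/lem/rdg with plain-text contents; the "Variant:" prefix is optional.
LEM_RDG = re.compile(
    r'<span class="choice">'
    r'<span class="lem">([^<]*)</span>'
    r'<span class="rdg">(?:Variant:\s*)?([^<]*)</span>'
    r'</span>'
)


def lem_rdg_to_var(html):
    """Turn the lemma into the visible text and the reading into the title."""
    return LEM_RDG.sub(r'<span class="var" title="\2">\1</span>', html)
```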
A Little Odd, But Okay
In T 10 we find:
<span class="corr" title="[悉>] cf.【麗】"></span>
The intent of this is clear enough. It is indicating that the reading 悉 has been removed as a superfluous term, citing the 麗 edition as authority.
Here the square brackets serve to define the actual term, distinguishing it from the discussion about the term. This should probably be adopted as a universal notation in such cases. Thus the actual text is [in the brackets], while what is outside the brackets is comments, notes, etc. The brackets can be removed if desired for display purposes, but they do provide a handy way to structure the metadata.
Since the span is empty, there is no way of seeing this entity in the text. I propose that in such cases we leave the span empty, and insert an asterisk (*) when Textual Information is activated.
Otherwise, this remains the same, except we use the arrow as before:
<span class="corr" title="[悉→] cf.【麗】"></span>
We also find the edition cited when there is a term to be inserted, as eg.:
<span class="corr" title="[是此>此是] cf.【麗】">此是</span>
Which means: “The apparently incorrect reading found in some sources 是此 has been corrected by the editor to 此是 based on the authority of the 麗 edition.”
Once again this can remain as is, except for the arrow:
<span class="corr" title="[是此→此是] cf.【麗】">此是</span>
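To show how the bracket notation structures the metadata, here is a small Python sketch that splits a title into the original reading, the corrected reading and any note outside the brackets. The names Correction, parse_title and display_text are my own. In practice the display side would presumably be done in JS, so treat this only as an illustration of the logic.

```python
import re
from typing import NamedTuple


class Correction(NamedTuple):
    original: str   # reading before correction (may be empty)
    corrected: str  # reading after correction (may be empty)
    note: str       # anything outside the brackets, e.g. "cf.【麗】"


# Accepts both the arrow and the older ASCII ">" separator.
TITLE = re.compile(r'^\[([^\[\]→>]*)[→>]([^\[\]→>]*)\]\s*(.*)$', re.S)


def parse_title(title):
    """Return a Correction, or None if the title is malformed."""
    m = TITLE.match(title)
    if m is None:
        return None
    return Correction(m.group(1), m.group(2), m.group(3).strip())


def display_text(c, textual_info_on):
    """Show an asterisk for an empty correction when Textual Information is on."""
    if c.corrected == "" and textual_info_on:
        return "*"
    return c.corrected
```

A title that fails to parse is a candidate for the bad cases that need fixing by hand.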
In a few cases it seems I wrote the description of sic in by hand; these should be replaced:
<span class="choice"><span class="sic">搏</span><span class="corr">Sic!
Correct to: 摶</span></span>
Note that such cases span two lines for some reason, but this may not be consistent.
<span class="corr" title="[搏→摶]">摶</span>
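These hand-written cases could be caught with a pattern along the following lines (a Python sketch). It assumes the wording is always "Sic!" followed by "Correct to:", with arbitrary whitespace or line breaks in between, which may not hold everywhere. The function name is made up.

```python
import re

# \s* absorbs the line break after "Sic!" and any space after the colon.
HANDWRITTEN = re.compile(
    r'<span class="choice">'
    r'<span class="sic">([^<]*)</span>'
    r'<span class="corr">Sic!\s*Correct to:\s*([^<]*?)\s*</span>'
    r'</span>'
)


def handwritten_to_corr(html):
    """Replace the hand-written description with a bracketed title."""
    return HANDWRITTEN.sub(r'<span class="corr" title="[\1→\2]">\2</span>', html)
```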
In a few cases we find corr used without an enclosing choice:
<span class="sic">Avidyāpratyayāḥ saṃskārāḥ katame</span><span class="corr">Saṃskārāḥ katame</span>
<span class="corr" title="[Avidyāpratyayāḥ saṃskārāḥ katame→Saṃskārāḥ katame]">Saṃskārāḥ katame</span>
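A sketch of this conversion in Python, assuming the sic span is immediately followed by the corr span and both contain plain text. The lookbehind leaves pairs that sit directly inside a choice wrapper alone, since those are handled elsewhere. The function name is hypothetical.

```python
import re

# A sic span directly followed by a corr span, not preceded by a choice opening tag.
BARE_PAIR = re.compile(
    r'(?<!<span class="choice">)'
    r'<span class="sic">([^<]*)</span>'
    r'<span class="corr">([^<]*)</span>'
)


def bare_pair_to_corr(html):
    """Merge an unwrapped sic/corr pair into one corr span with a title."""
    return BARE_PAIR.sub(r'<span class="corr" title="[\1→\2]">\2</span>', html)
```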
Bad and Wrong Things
<span class="corr" title="[剌>刺</span>
Yikes! Correct to:
<span class="corr" title="[剌→刺]">刺</span>
<span class="corr" title="[十>千] cf.【麗】">【麗】</span>
Here the edition has mistakenly made it into the text. Oops!
<span class="corr" title="[十→千] cf.【麗】">千</span>
Sometimes we get more complex cases of the form:
<span class="corr" title="[娑>裟]">異] cf.［麗】">［麗】</span>
Notice a couple of errors that seem to have messed up the regex, e.g. improper bracket matching. Now, in this case 娑 and 裟 are obviously similar and easily confused. The glyph 異 means “different”, and I guess it is saying that the “different reading is in the 麗 edition”. I think we can simplify this and at the same time correct the brackets:
<span class="corr" title="[娑→裟] cf.【麗】">裟</span>
In a few cases ‘single quotes’ have been used, as in title='single quote thing', which messes up regex matching and should be standardized.
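Standardizing the quotes is easy to script. A minimal Python sketch follows; the function name is mine. Titles whose value itself contains a double quote are deliberately skipped, since converting them blindly would break the attribute.

```python
import re

# title='...' where the value contains neither kind of quote.
SINGLE_QUOTED = re.compile(r"title='([^'\"]*)'")


def normalize_title_quotes(html):
    """Rewrite title='x' as title="x"."""
    return SINGLE_QUOTED.sub(r'title="\1"', html)
```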
Also, in addition to the cases noted above, there are a few instances where line breaks cause issues. I always prefer to have line breaks on block-level elements only for this reason. I would suggest doing this with HTML Tidy to all texts before starting work!
My additions and the resurgence of the sic
In a few cases I have noted apparent mistakes in the text; these apply to what appear to be numbering errors in the Vinaya rules.
<span class="choice"><span class="sic"><span class="rule-number">五十</span></span><span class="corr latin">So CBETA.</span></span>
<span class="sic" title="So CBETA."><span class="rule-number">五十</span></span>
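For the rule-number cases specifically, the change could be sketched like this in Python. The pattern is deliberately narrow: it assumes the sic span contains exactly one rule-number span and the note sits in a span with class "corr latin", as in the example. Other shapes would need their own handling, and the function name is made up.

```python
import re

# choice > sic > rule-number, followed by an editorial note in "corr latin".
RULE_NOTE = re.compile(
    r'<span class="choice">'
    r'<span class="sic">(<span class="rule-number">[^<]*</span>)</span>'
    r'<span class="corr latin">([^<]*)</span>'
    r'</span>'
)


def rule_note_to_sic(html):
    """Keep the uncorrected text in a sic span and move the note into its title."""
    return RULE_NOTE.sub(r'<span class="sic" title="\2">\1</span>', html)
```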
Since the uncorrected version remains, in this case use of a sic class is required. So it seems we cannot get rid of this totally. There are a small number of other cases where apparent mistakes are marked like this:
<span class="sic">Matsumura: om.</span>
These should not need the Textual Information to be activated, but should always be colored and have title="Apparent error in text" if there is no title defined.
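If the default title is added in the markup rather than at display time via JS (either seems workable, and this is only one option), a Python sketch might look like this. It relies on the assumption that a sic span without a title has no other attributes at all. The function name is hypothetical.

```python
import re

# Only matches sic spans with no further attributes, i.e. no title yet.
SIC_NO_TITLE = re.compile(r'<span class="sic">')


def add_default_sic_title(html):
    """Give title-less sic spans the default explanatory title."""
    return SIC_NO_TITLE.sub(
        '<span class="sic" title="Apparent error in text">', html
    )
```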
I suggest using red for the