In the old SC, we introduced a method of displaying and clarifying text-critical information, mainly by using CSS styling. This must be adapted for the new site. @vimala, @blake, thoughts and opinions please!
Depending on how difficult it is, we can decide whether this is a requirement for the new site.
Background
In text studies, there are a variety of notations developed for handling the complex kinds of issues encountered when dealing with old manuscripts. These include such things as unclear letters, variants in different manuscripts, changes made by an editor, and so on. In print editions these are typically indicated with conventions like [square brackets], <angles>, and so on. Not only are these unsightly, but they are not very informative.
In addition, there are cases where one wants to indicate different parts of a text that are not specified in HTML alone, such as “term” and “gloss”.
Starting from the late 1980s, a loose consortium of academics has developed a specification called TEI (Text Encoding Initiative), which aims to handle such matters with markup. TEI is currently implemented as an XML specification, and many of our sister projects such as CBETA use it. However, TEI is complex, and we have not adopted it. These days, plain old HTML works just fine.
Still, where possible we adopt the naming and markup conventions of TEI.
The Job
Over time, our text-critical system has become somewhat over-complicated, and it’s not always implemented consistently. (This is, incidentally, a problem that has bedeviled TEI and prevented its wider adoption.) We should try to simplify it where possible and ensure it is correctly implemented on the new site. Because it is somewhat complex, we should not make it a requirement for the launch of the new site, but an enhancement.
Resources
Most of the relevant styles are in common.scss. This also gives explanations for the various classes.
Notes
I have not made a detailed study, but here are some corrections that are required.
- There’s a confusion in the application of .add and .pe. .pe is used to indicate an abbreviation, and tells the reader where to read the full version. It is used in this way in en/ds. In en/vb and (I think) Vinaya trans., .add is used for the same thing. We should use .pe for this, and reserve .add for text actually added.
- Note the markup and CSS for choice/corr/sic. This applies a CSS popup in the old version. We should probably make this consistent with our “var” markup.
- The markup for .lem and .rdg duplicates the normal .var. In fact .lem and .rdg are the standard terms in TEI for “main” reading and “variant” reading. We should replace all instances of lem/rdg with .var.
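The two renames in these notes (lem/rdg to var, and add to pe where add marks an abbreviation) could be scripted roughly like this. This is only a sketch: `renameClasses`, `LEM_RDG_TO_VAR` and `ADD_TO_PE` are names I made up for illustration, and it assumes class attributes are written with double quotes.

```javascript
// Hypothetical helper: rename class tokens inside class="..." attributes.
// `mapping` is a Map from old class name to new class name.
function renameClasses(html, mapping) {
  return html.replace(/(\s)class="([^"]*)"/g, (match, space, value) => {
    const tokens = value
      .split(/\s+/)
      .filter(Boolean)
      .map((token) => mapping.get(token) ?? token);
    // A rename can create duplicates (e.g. "lem var"), so keep each token once.
    return `${space}class="${[...new Set(tokens)].join(" ")}"`;
  });
}

// Safe to run everywhere, per the note on .lem and .rdg.
const LEM_RDG_TO_VAR = new Map([["lem", "var"], ["rdg", "var"]]);

// Only for texts where .add marks an abbreviation (en/vb and possibly the
// Vinaya translations); elsewhere .add must be left alone.
const ADD_TO_PE = new Map([["add", "pe"]]);

console.log(renameClasses('<span class="rdg">x</span>', LEM_RDG_TO_VAR));
// prints: <span class="var">x</span>
```

The add to pe mapping would need to be applied per collection, after checking that every .add in that collection really is an abbreviation note.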
Styles
Here is a rough guide to the style transformation, mostly dealing with the colors. It also gives an example of a text where the markup is used.
class | old display | found in | new display |
---|---|---|---|
.term, .gloss | color: pastel-color(dirty-green) | san-lo-bi-vb-np11 | accent-color |
.supplied | color:#B3A72D | san-lo-bi-vb-np11 | primary-color |
.supplied2 | color:#9B8D00 | san-lo-bi-vb-np11 | primary-color (different shade) |
.suttainfo, .xu, .w | color: misc-color(dark-medium-gray); background-color: misc-color(off-white) | /da1/lzh | secondary-text-color;tertiary-background-color |
.cr | color:#4805A2 | en/vb8 | default link style |
.add | color:#767676;@include sans-serif; | en/vb8 | secondary-text-color |
various classes | color:#767676 | | secondary-text-color |
.pe | font-style:italic; color:#9B8D00; | en/ds2.1.1 | secondary-text-color, italic |
.metre | color:#9B8D00 | pra/pdhp | accent-color |
.expanded | color: misc-color(light-medium-gray) | dn10/rhys-davids | secondary-text-color |
.sic | color: pastel-color(bright-red); | san-lo-bi-vb-np11 | accent-color |
.del | color:#EE1700 | gdhp | text-decoration:line-through |
.t-gaiji | color:darkcyan | lzh-mu-bi-pm | accent-color |
.var | | /pli/kv17.1 | secondary-accent-color |
.scribe | italic | gdhp | |
.gap | color:#B2AF8C | g2dhp | secondary-text-color |
.unclear | color:#767676 | g2dhp | secondary-text-color |
.del-scribe | none? | g3dhp | same as "del", but with title clarifying. |
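To make the "new display" column a bit more concrete, here is a rough sketch that turns a few rows of the table into CSS rules. The theme names (accent-color and so on) are the ones in the table. Everything else is an assumption for illustration: the `--sc-` custom property prefix, and the names `NEW_DISPLAY`, `ruleFor` and `buildStylesheet`.

```javascript
// A subset of the rows above: class name -> new display.
// Theme names come from the table; the "--sc-" prefix is a made-up convention.
const NEW_DISPLAY = new Map([
  ["term", { color: "accent-color" }],
  ["gloss", { color: "accent-color" }],
  ["supplied", { color: "primary-color" }],
  ["add", { color: "secondary-text-color" }],
  ["pe", { color: "secondary-text-color", fontStyle: "italic" }],
  ["del", { textDecoration: "line-through" }],
  ["var", { color: "secondary-accent-color" }],
]);

// Build one CSS rule for a class from its display description.
function ruleFor(className, display) {
  const declarations = [];
  if (display.color) declarations.push(`color: var(--sc-${display.color});`);
  if (display.fontStyle) declarations.push(`font-style: ${display.fontStyle};`);
  if (display.textDecoration) declarations.push(`text-decoration: ${display.textDecoration};`);
  return `.${className} { ${declarations.join(" ")} }`;
}

// Join the rules for every row into one stylesheet string.
function buildStylesheet(table) {
  return [...table].map(([name, display]) => ruleFor(name, display)).join("\n");
}

console.log(ruleFor("pe", NEW_DISPLAY.get("pe")));
// prints: .pe { color: var(--sc-secondary-text-color); font-style: italic; }
```

Keeping the mapping as data in one place would also make it easier to check that no two classes end up with the same style in the same text.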
Definitions
Here are working definitions of these classes, taken where possible from TEI.
class | definition |
---|---|
.term | contains a single-word, multi-word, or symbolic designation which is regarded as a technical term |
.gloss | identifies a phrase or word used to provide a gloss or definition for some other word or phrase |
.supplied | signifies text supplied by the transcriber or editor for any reason; for example because the original cannot be read due to physical damage, or because of an obvious omission by the author or scribe. |
.cr | (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text. The TEI term is actually "xr". Should we adopt this? |
.add | contains letters, words, or phrases inserted in the source text by an author, scribe, or a previous annotator or corrector. |
.pe | Used frequently in the Pali Abhidhamma to indicate a span of text describing how an abbreviation is meant to be filled in. Note that it shouldn't be used to indicate a mere abbreviation with ellipsis. |
.metre | Represents the metrical form of a line of verse. We don't follow TEI usage. |
.expanded | Text that is abbreviated in original but expanded in translation. Used solely for old DN translations that have been filled in. Not TEI. |
.sic | contains text reproduced although apparently incorrect or inaccurate |
.del | contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, or a previous annotator or corrector |
.t-gaiji | Indicates Chinese characters not in the Unicode set as used in our CBETA texts. |
.var | indicates a variant reading. Similar to TEI rdg, except that rdg refers only to a single reading, whereas for us var is used a little more loosely. |
.scribe | a remark by or about a translator or scribe that is found in the original text, usually recording their work or commenting on it in some way. (see discussion below) |
.gap | indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible |
.unclear | contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source. |
.del-scribe | the same as del, except deleted by scribe. However, this is included under the TEI definition for del. |
Notes on usage
.scribe
Scribe is intended to note a remark by or about a translator or scribe that is found in the original text, usually recording their work or commenting on it in some way.
It is encountered occasionally in the Chinese texts. In Sanskrit (san-mg-bu-pm), for example, we find śākyabhikṣu śrīvijayabhadralikhitamidam “This was written by the Buddhist monk Śrīvijayabhadra”.
In san-mu-kd6, however, it is used incorrectly, eg.:
<span class="scribe">MS adds <span class="supplied">jānantaḥ pṛcchanti ajānanto na pṛcchanti |</span></span>
Manuscript adds “knowing, he asks, not knowing he does not ask”.
What’s happening here is a well-known embarrassment in later Buddhism, where they felt the need to explain away the evident fact that the Buddha had to ask questions about things, whereas we all know he’s really omniscient and never really needs to ask about anything. The Pali commentaries make similar remarks, but here the remark has intruded on the text. The modern editor (Wille) identifies it as an alien intrusion on the text.
Let’s look at the original source for this at:
http://gretil.sub.uni-goettingen.de/gretil/1_sanskr/4_rellit/buddh/vinv_06u.htm
There are a number of problems here:
- Text marked as scribe is indicated in the source with {} not ⟪ ⟫.
- However, {} does not mean “deleted text”.
- Text marked as supplied is not in fact supplied text, but is text attested in the manuscript. Supplied is only for text missing in the manuscript.
- {} is used in a variety of ways:
  - Indicating apparent mistakes in text (= sic)
  - Describing mistakes in previous editions (“the following first two lines of fol. 94r have not been transliterated by Dutt”)
  - Random textual notes (“Tib. inserts the whole Vairambhasūtra; cf. AN II 54-57”)

So it would be best to mark these as add, not scribe, and replace supplied with <i>.
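The fix for san-mu-kd6 could be sketched like this. It only handles the flat shape shown in the example above (a scribe span holding plain text and one supplied span), so anything with further nested markup would need a proper HTML parser. `MISUSED_SCRIBE` and `fixMisusedScribe` are names invented for the sketch.

```javascript
// Matches only the flat shape seen in san-mu-kd6:
// <span class="scribe">text<span class="supplied">text</span>text</span>
const MISUSED_SCRIBE =
  /<span class="scribe">([^<]*)<span class="supplied">([^<]*)<\/span>([^<]*)<\/span>/g;

// Re-mark the outer span as an editorial addition and the inner one as <i>.
function fixMisusedScribe(html) {
  return html.replace(MISUSED_SCRIBE, '<span class="add">$1<i>$2</i>$3</span>');
}

const sample =
  '<span class="scribe">MS adds <span class="supplied">jānantaḥ pṛcchanti ajānanto na pṛcchanti |</span></span>';
console.log(fixMisusedScribe(sample));
// prints: <span class="add">MS adds <i>jānantaḥ pṛcchanti ajānanto na pṛcchanti |</i></span>
```

A genuine scribe remark with no supplied span inside it, like the one in san-mg-bu-pm, does not match the pattern and is left untouched.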
In san-mu-kd7 and san-mu-kd8 scribe is used yet another way, to mark up numbers in the text.
http://gretil.sub.uni-goettingen.de/gretil/1_sanskr/4_rellit/buddh/vinv07_u.htm
Since these are simple editorial additions, just use add.
Things that are missing
Apart from the classes mentioned here, there are a number of classes used in the Vinaya. They need to be assessed as well.
In addition, there are various classes used to indicate the end of texts, uddanas, and so on.
Method
At the moment, the implementation of these things is haphazard, as many of them were introduced little by little over time. Also, we are learning as we go, so sometimes old and new approaches are mixed up. We can take this chance to clean things up and make them nice and consistent.
I propose that we handle all this non-standard markup in a single web component. Ooh, can we call it sc-hypercritical.html? That would be cool.
So basically standard HTML stuff is handled in one place: paragraphs, headings, lists, quotes, tables. Then all the stuff that’s specifically pertaining to text-critical markup is in one component, keeping all the JS and CSS tidy.
Things to consider
- Should we go all-in on the custom elements approach, and replace eg. <span class="add"> with <sc-add>? It would be more work, as we would have to consider things like the interaction of block and inline elements. But it would create a cleaner and more meaningful markup.
- Speaking of, we should determine which of these is used for block-level elements, which for inline only, and which for both.
- We should also determine the scope of each class, so that we don’t end up, eg., using the same styling to indicate two different things in the same text. Generally speaking, though, it would be best for each class of thing to have a unique style, so far as this is reasonable.
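If we did go the custom elements route, the inline conversion might look something like the sketch below. It is only a rough idea under several assumptions: it touches only span elements carrying exactly one class and no other attributes, it assumes well-formed nesting, and apart from sc-add the element names (sc-pe, sc-var and so on) are hypothetical. Block-level elements are not handled at all, which is exactly the open question above.

```javascript
// Classes to convert; any other span is left as it is.
const CRITICAL_CLASSES = new Set([
  "add", "pe", "var", "supplied", "sic", "del", "gap", "unclear",
]);

function spansToCustomElements(html) {
  // For every open <span>, remember which closing tag it should get.
  const closers = [];
  return html.replace(/<span\b([^>]*)>|<\/span>/g, (tag, attrs) => {
    if (tag === "</span>") {
      // Close whatever was opened most recently.
      return closers.length ? closers.pop() : tag;
    }
    const single = /^\s+class="([a-z-]+)"\s*$/.exec(attrs);
    if (single && CRITICAL_CLASSES.has(single[1])) {
      closers.push(`</sc-${single[1]}>`);
      return `<sc-${single[1]}>`;
    }
    closers.push("</span>");
    return tag;
  });
}

console.log(spansToCustomElements('<span class="add">a <span class="x y">b</span> c</span>'));
// prints: <sc-add>a <span class="x y">b</span> c</sc-add>
```

The stack is there so that a nested plain span does not steal the closing tag of the converted element around it.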