Excellent.
I need to make the word Mahāvastu italic in the first sentence, "Here begins the Mahāvastu." I installed the MarkdownEditing package for Sublime Text (ST). In ST the italicised word appears as _ Mahāvastu _ (without the spaces between the word and the underscores), which should appear italicised in the browser but does not (it instead appears as _ Mahāvastu _). Just wondering if there are any ideas for a fix. (Note: I've enabled Markdown editing for the file in ST, indicated by the text in the file now appearing in ST grey on grey.)
Italicising with underscores is not an HTML standard, so the browser cannot recognize it. The standard is <i>...</i>
No problems, if this works for you it's fine. We can transform the Markdown to HTML later on.
Normally, however, what I do is mark such things directly in HTML in Sublime Text. Sublime makes this very fast. To put tags around one or more words, highlight them, then hit the relevant shortcut (for me it is alt+shift+w, but this might be different on your system). This puts tags around the word. For italics, just write "i" and you're done. It takes a fraction of a second, faster than writing the markdown, actually, because it does both tags simultaneously.
Spending a little time to get familiar with Sublime's shortcuts will pay off! You won't use all of them, but those you do will become your firm friends.
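As a rough illustration of what such a later Markdown-to-HTML step could look like for this one case, here is a minimal Python sketch. The function name `underscores_to_i` and the choice of `<i>` as the output tag are assumptions made for the example; a real conversion would normally go through a full Markdown processor.

```python
import re

# Matches _text_ only when the underscores are not inside a word,
# so identifiers such as snake_case_name are left alone.
UNDERSCORE_ITALIC = re.compile(r"(?<!\w)_([^_\s](?:[^_\n]*[^_\s])?)_(?!\w)")


def underscores_to_i(text):
    """Replace Markdown-style _word_ with <i>word</i> (illustrative helper)."""
    return UNDERSCORE_ITALIC.sub(r"<i>\1</i>", text)


print(underscores_to_i("Here begins the _Mahāvastu_."))
```

This only handles single-underscore italics on one line; anything more complex is better left to a proper Markdown tool.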
As a further refinement, I distinguish between <em></em> for normal italics used for emphasis, and <i></i> which is used specifically for foreign words, usually Pali/Sanskrit. This is according to the HTML5 semantic definitions of these, but they appear the same in a browser.
Having said which, in the specific case here, personally I wouldn't bother with italics at all. It's an old-fashioned usage. While it's not black and white, modern usage tends to not use italics for ancient sacred texts, such as the Bible or the Koran, and I think this should apply here as well. But that's a matter of taste, and I leave that up to you.
Thanks LXNDR. That has fixed it.
Thank you Bhante. That shortcut works on my system too and is good to know. Thanks also for the shortcut reference guide. I'm pretty sure that will come in handy.
Now that I understand all this, and that different italics tags are used in different situations, I would really like to know whether you prefer italics or not. I know you said you would leave it up to me (and being a little old-fashioned I like to use italics); however, for the sake of consistency, if SuttaCentral has dropped the use of italics then I am happy to follow suit.
Bhante. I have finished the first two pages (of the text) and what corresponds in the html file. That is, from "Prologue" to "Here ends the prologue of 'homages'".
The updated file is attached.
mvu-en.zip (618.3 KB)
NOTE 1
Regarding my previous comment about my preference that italics be used, I have now changed my mind. I note that the text uses italics for words like kalpas each time the word is used. I prefer to use italics the first time it is used and not thereafter, however, as there are so many Sanskrit words, it would be hard to recall whether it was the first time I had come across this word or not. Additionally, as there may be multiple people working on this one day, it would be better not to use italics at all. However, italics may still have a role and be used in some circumstances.
NOTE 2
Page 1 of the prologue: Śakyamuni and Śākyamuni are different Buddhas; however, for some reason the html file does not follow the text in its use of Śakyamuni and Śākyamuni: they are used oppositely. Śakyamuni in the text appears as Śākyamuni and vice versa. I have changed the html file and will follow the text.
There are other names where diacritical marks in the html do not follow the text. E.g., the text uses Samitāvin whereas the html uses Samitavin. Again, I have changed these to follow the text.
Unfortunately, however, the text is inconsistent in its use of diacritical marks. E.g., sometimes it says Tathāgata and sometimes Tathagata. When I see how common these errors are, it somewhat reduces my confidence in the text (when it comes to the use of diacritics)!
NOTE 3
Since the text uses Sanskrit, should we change "Arhan" to the common Sanskrit word "Arhat", which is more recognisable?
Great, it looks good so far.
Re italics, please use whatever you feel comfortable with. In books, I also prefer the "use first time" approach, which is more common in modern texts. But this is not an approach that suits online texts, which are not read in a linear order. In the texts I do for SC, I try to put all Pali/Sanskrit words in italics, apart from proper nouns. But we have a variety of styles on SC, so it's up to you.
Just by way of explanation, the reason I put Pali/Sanskrit words in italics, and specifically in <i></i> tags, is not just for presentation, but for semantics. Marking the text as a foreign language word opens up the potential for various kinds of automated processing. For example, we might make a widget that allows a reader to click on a Pali/Sanskrit word and have it speak the word to them. Or give a definition. And so on. It might also be used to enable proper pronunciation of such words for people using screen-readers.
We're not doing this yet, but it's only a matter of time and resources. It is, of course, possible to recognize Pali/Sanskrit words by running the text against a dictionary, but that is much more processor-intensive and inaccurate than simply reading a hard-coded tag.
The name of both Buddhas is spelled Śākyamuni in the original text. Any variation is just an error.
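To make the "simply reading a hard-coded tag" idea concrete, here is a small sketch using Python's standard `html.parser` module. The class and function names are hypothetical and this is not SC's actual tooling; it only shows that once words are wrapped in <i> tags, collecting them is a cheap, exact operation.

```python
from html.parser import HTMLParser


class ForeignWordCollector(HTMLParser):
    """Collects the text found inside <i>...</i> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <i> elements we are currently inside
        self.buffer = []  # text pieces of the <i> element being read
        self.words = []   # finished results, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "i" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.words.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def foreign_words(html):
    """Return the contents of every <i> element in the given HTML string."""
    parser = ForeignWordCollector()
    parser.feed(html)
    parser.close()
    return parser.words


sample = "<p>For many <i>kalpas</i> he was <em>truly</em> an <i>arhat</i>.</p>"
print(foreign_words(sample))
```

Note that the word inside <em> is skipped, which is exactly why keeping <i> and <em> distinct is useful for this kind of processing.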
When in doubt you can consult the original text if you like. The chapters are the same as the translation, and you can search a name in Chrome just using the non-diacritical spelling and it will highlight. Very handy!
Bear in mind that the source text is itself highly erratic and full of textual inconsistencies and errors. The translation will, of course, have errors as well, and the HTML file produced via OCR is full of them. Just do your best!
With systematic problems like Tathagata/Tathāgata I would use find/replace all. Remember to use "Preserve case" when doing this kind of find/replace.
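For anyone who prefers to script this kind of fix, here is a minimal Python sketch of a case-preserving replace. The helper name `replace_preserving_case` is made up for the example, and the casing rules are a simple approximation, not a description of how Sublime's "Preserve case" option is implemented.

```python
import re


def replace_preserving_case(text, wrong, right):
    """Replace every case-insensitive match of `wrong` with `right`,
    copying the casing pattern of each match (UPPER, lower or Capitalised)."""

    def fix(match):
        found = match.group(0)
        if found.isupper():
            return right.upper()
        if found.islower():
            return right.lower()
        if found[0].isupper():
            return right[0].upper() + right[1:]
        return right

    return re.sub(re.escape(wrong), fix, text, flags=re.IGNORECASE)


line = "The Tathagata spoke. TATHAGATA! tathagata."
print(replace_preserving_case(line, "Tathagata", "Tathāgata"))
```

Spellings that already carry the diacritic are not matched, so running it twice does no harm.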
If you like. The original text in fact has arhat.
That makes sense and it is the method that the text uses. It's also an easy rule to follow and fits with other SC texts. Additionally, it will help if SC does use tags for potential functions and features like you describe. If I don't add the tags now, it will be a lost opportunity, so I will start using the "i" and "em" italics tags where appropriate.
I checked and sure enough, as you say, the spelling of Śākyamuni is consistent in the original text. I then checked the archive.org version and the spelling is consistent there as well. It turns out that the version I downloaded from forgottenbooks.com is not pristine. It is, as far as I can tell, identical to the archive.org version except for diacritics, where there are many inconsistencies. The forgottenbooks version is, for some reason, corrupted or "not quite right". It means I can only rely on the archive version.
Thank you BTW for providing the link to the original text. I'll attempt to cross-reference this with the archive.org version if similar queries come up again.
Fine.
Good to know. You might want to use the optimized versions I uploaded yesterday, they will render much faster:
Thank you. They couldn't have come at a better time!
Hi Bhante. Just touching base. Please find attached an update of the file. I am up to the heading "The hell named Sañjīva" (bottom of p. 33 of Vol 1 of the pdf you supplied / line 118 of the html file).
mvu-en.zip (618.4 KB)
Okay, thanks so much. I'll check it out in the next day or so.
How's the work going?
Yes. It's going well… read half a sentence from the pdf, read the same half a sentence in the html, fix errors, repeat!
The text is enjoyable. There aren't too many errors. Most errors are missing diacritical marks (so the Pāli keyboard has been really good for that). Sometimes a "b" incorrectly appears as an "h" (especially if the original text is in italics).
I'm starting to get my head around html, so that, for example, I can indent text where required.
Bhante. A couple of queries.
QUERY 1
Tags appear in the html file where numbers in brackets appear in the pdf file.
Example:
Page 14 of the Mahāvastu (34th page of the pdf file): "…As the maturing of what karma does the cold wind blow on them? (17) Those who in this world scatter grain as bait for jackals…"
In the html file this appears as: "…As the maturing of what karma does the cold wind blow on them? <a class="sen" id="sen1.17"> Those who in this world scatter grain as bait for jackals…"
Do the tags in the html file need to be there?
QUERY 2
Tags also appear wherever there is a page number.
Example:
Where "14" appears on the top of page 14 of the Mahāvastu (34th page of the pdf file), corresponding tags appear in the html file: <a class="jones" id="jones1.14">
Do these tags need to be there?
I've had a chance to look at the text, and it's looking good, congrats.
Yes, keep all the tags. These encode the metadata for page references. The "sen" tags are the vol/page for Senart's original Sanskrit edition (which is used in most scholarly sources) and the "jones" tags are for Jones' translation. If you display the file normally as HTML they won't appear. They're used by SC to create the info revealed by the "Textual Information" button on the sidebar.
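If it helps to see what these reference tags carry, the following Python sketch lists them from a piece of HTML. It assumes the anchors look like the ones quoted in this thread (an <a> element with class "sen" or "jones" and an id); the class name `ReferenceCollector` and function `list_references` are hypothetical, and this is not how SC itself reads the files.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collects (edition, id) pairs from <a class="sen"> / <a class="jones"> anchors."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        for edition in ("sen", "jones"):
            if edition in classes and attributes.get("id"):
                self.refs.append((edition, attributes["id"]))


def list_references(html):
    """Return every sen/jones reference anchor found in the HTML string."""
    parser = ReferenceCollector()
    parser.feed(html)
    parser.close()
    return parser.refs


sample = (
    '<p>does the cold wind blow on them? <a class="sen" id="sen1.17"></a> '
    'Those who in this world <a class="jones" id="jones1.14"></a> scatter grain</p>'
)
print(list_references(sample))
```

A quick listing like this can also serve as a sanity check that no reference anchors were deleted by accident while proofreading.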
Sometimes I find it easier to make these visible in the HTML file. If you like, let me know and I can make changes in the file to make them display. But the way they are now is exactly how they will be in the final SC file.
As for the indented portions, these are verse, and if you're into digging into the HTML a little, it would be better to mark them as we mark verse on SC. The basic idea is that all verses are wrapped in <blockquote> tags. These will normally render with some indentation, so no need to worry about writing CSS for it.
Each set of verses is one <blockquote>. Within a set of verses, each verse is a <p>. And within each verse, each line (except the last line of the verse) is ended with a <br>.
Now, in the translation, and hence in the file you have, the verses are not broken up by lines. In fact they are not translated as verse at all. But I would suggest we insert lines. This is basically just to make it look like verse.
In the printed edition of the translation, they use italics to achieve this. But we can't preserve the stylistic quirks of the hundreds of printed and digital editions on which SC is based, so we do it our way. So we should remove the <em> tags that create the italics.
So how to add lines? Well, basically just insert a <br> tag to create a new line at a reasonable point in the text, so the verses end up as approximately four lines of a reasonable length. Don't worry about whether the translation matches line by line with the original: this never happens anyway, as the syntax of Indic verse is too convoluted.
Note that HTML ignores new lines that you insert in a file. It only recognizes tags. A "new line" is indicated in HTML with the <br> tag. (This is often used erroneously to create paragraphs or vertical space, but it is correctly used as we are doing here.) It's handy to indicate the verses by using <br> and starting a new line after it, but actually it doesn't make any difference to the final output. Nor, incidentally, does it matter whether there's an extra space before or after the <br> tag.
For the first couple of verses in Mvu, for example, we have:
<blockquote>
<p>The Enlightened One himself looked on this world<br>
and the world beyond,<br>
on the coming and going of men,<br>
on the round of passing away and coming to be.</p>
<p>The Seer himself reflects upon and understands<br>
the peculiar fruition of acts<br>
which is bound up with the nature of man,<br>
and the place wherein they come to fruition.</p>
… and so on until we reach the end of this set of verses.
</blockquote>
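The same structure can also be produced by a short script, which may help if there are many verses to mark up. This is only a sketch: the function name `verses_to_html` and the list-of-lists input format are assumptions for the example, and deciding where each line breaks still has to be done by hand.

```python
from html import escape


def verses_to_html(verses):
    """Build verse markup. `verses` is a list of verses; each verse is a list of line strings."""
    parts = ["<blockquote>"]
    for lines in verses:
        # <br> ends every line except the last one of the verse.
        body = "<br>\n".join(escape(line, quote=False) for line in lines)
        parts.append("<p>" + body + "</p>")
    parts.append("</blockquote>")
    return "\n".join(parts)


first_verse = [
    "The Enlightened One himself looked on this world",
    "and the world beyond,",
    "on the coming and going of men,",
    "on the round of passing away and coming to be.",
]
print(verses_to_html([first_verse]))
```
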
Does that make sense?
Thanks for the positive feedback. Good to know I'm on track.
Understood & done. Thanks.
Thank you, but that's fine. It's all good for me the way it is.
As you probably saw from the html file, I used CSS; however, I am more than happy to go with <blockquote> and follow the guide you provided. Thank you for the detailed explanation & instructions. I shall edit the file accordingly so that <blockquote> is used for the verse I have already done (the same one you used as the example).
Yes, thank you.
Bhante. Just on another note (sorry, I know this is not really the place), but on Sun 6 Mar the development group for the formation of the Buddhist Council of the Australian Capital Territory (BCACT) will have its second official meeting at Ven. Thich Quang Ba's temple, Sakyamuni Buddhist Centre.
We have a good number coming from various groups in the ACT. As you have some history in having helped establish a Buddhist council here in the ACT in 2011, and are very well known and respected here, would it be too much trouble to ask if you could please give us a few lines to show your support for the council, which could be delivered at the meeting? Just a few words would suffice. There is no need to single out individuals, just something to say that you support the formation and wish us all the best in our endeavours… that sort of thing? It could be something written, or a recorded message perhaps?
If you do not have time I completely understand.
Sure, I'll email it.
Thank you! :anjal: