Proofreading the Mahavastu

Thanks LXNDR. That has fixed it.

Thank you Bhante. That shortcut works on my system too and is good to know. Thanks also for the shortcut reference guide. I’m pretty sure that will come in handy.

Now that I understand all this, and that different italics tags are used in different situations, I would really like to know whether you prefer italics or not. I know you said you would leave it up to me (and, being a little old-fashioned, I like to use italics). However, for the sake of consistency, if SuttaCentral has dropped the use of italics then I am happy to follow suit.

Bhante. I have finished the first two pages (of the text) and what corresponds in the html file. That is, from: “Prologue” to “Here ends the prologue of ‘homages’”.
The updated file is attached.
mvu-en.zip (618.3 KB)

NOTE 1

Regarding my previous comment about my preference that italics be used, I have now changed my mind. I note that the text italicises words like kalpas every time they occur. I prefer to use italics the first time a word is used and not thereafter. However, as there are so many Sanskrit words, it would be hard to recall whether it was the first time I had come across a given word or not. Additionally, as there may be multiple people working on this one day, it would be better not to use italics at all. That said, italics may still have a role in some circumstances.

NOTE 2

Page 1 of the prologue: Śakyamuni and Śākyamuni are different Buddhas. However, for some reason the html file does not follow the text in its use of Śakyamuni and Śākyamuni: they are swapped. Śakyamuni in the text appears as Śākyamuni in the html, and vice versa. I have changed the html file and will follow the text.

There are other names where the diacritical marks in the html do not follow the text. E.g., the text uses Samitāvin whereas the html uses Samitavin. Again, I have changed these to follow the text.

Unfortunately, however, the text is inconsistent in its use of diacritical marks. E.g., sometimes it says Tathāgata and sometimes Tathagata. When I see how common these errors are, it somewhat reduces my confidence in the text (when it comes to the use of diacritics)!
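
One way to get a quick overview of such inconsistencies is to list every word that occurs in more than one spelling once the diacritics are ignored. The following is only a rough Python sketch under that assumption; the function names `strip_diacritics` and `find_diacritic_variants` are made up for this illustration, and the script cannot say which spelling is the right one.

```python
import re
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Return the word with combining marks removed (ā -> a, ś -> s)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_diacritic_variants(text: str) -> dict[str, set[str]]:
    """Group lowercased words that differ only in their diacritics.

    Maps the diacritic-free spelling to the set of spellings found in
    the text, keeping only groups with more than one spelling.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    # Normalise to NFC first so each accented letter is a single character.
    normalised = unicodedata.normalize("NFC", text)
    for word in re.findall(r"[^\W\d_]+", normalised):
        groups[strip_diacritics(word).lower()].add(word.lower())
    return {key: forms for key, forms in groups.items() if len(forms) > 1}
```

Running this over the plain text of a chapter would give a list of candidate words to check by eye against the printed page.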

NOTE 3

Since the text uses Sanskrit should we change ‘Arhan’ to the common Sanskrit word ‘Arhat’ which is more recognisable?

Great, it looks good so far.

Re italics, please use whatever you feel comfortable with. In books, I also prefer the “use first time” approach, which is more common in modern texts. But this is not an approach that suits online texts, which are not read in a linear order. In the texts I do for SC, I try to put all Pali/Sanskrit words in italics, apart from proper nouns. But we have a variety of styles on SC, so it’s up to you.

Just by way of explanation, the reason I put Pali/Sanskrit words in italics, and specifically in <i></i> tags is not just for presentation, but for semantics. Marking the text as a foreign language word opens up the potential for various kinds of automated processing. For example, we might make a widget that allows a reader to click on a Pali/Sanskrit word and have it speak the word to them. Or give a definition. And so on. It might also be used to enable proper pronunciation of such words for people using screen-readers.

We’re not doing this yet, but it’s only a matter of time and resources. It is, of course, possible to recognize Pali/Sanskrit words by running the text against a dictionary, but that is much more processor-intensive and inaccurate than simply reading a hard-coded tag.
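
To illustrate how little work it takes to read a hard-coded tag, here is a minimal Python sketch that collects everything wrapped in `<i>` tags using only the standard library. The names `ItalicTermCollector` and `tagged_terms` are invented for this example and are not part of any SC code.

```python
from html.parser import HTMLParser


class ItalicTermCollector(HTMLParser):
    """Collect the text inside every <i>...</i> element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0                 # number of <i> elements currently open
        self.buffer: list[str] = []    # text seen inside the open <i>
        self.terms: list[str] = []     # finished terms, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "i" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                term = "".join(self.buffer).strip()
                if term:
                    self.terms.append(term)
                self.buffer.clear()

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def tagged_terms(html: str) -> list[str]:
    """Return the contents of all <i> elements found in the HTML string."""
    parser = ItalicTermCollector()
    parser.feed(html)
    parser.close()
    return parser.terms
```

A widget for definitions or pronunciation could start from a list like this, with no dictionary lookup needed to decide which words are Pali or Sanskrit.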

The name of both Buddhas is spelled śākyamuni in the original text. Any variation is just an error.

When in doubt you can consult the original text if you like. The chapters are the same as the translation, and you can search a name in Chrome just using the non-diacritical spelling and it will highlight. Very handy!

Bear in mind that the source text is itself highly erratic and full of textual inconsistencies and errors. The translation will, of course, have errors as well, and the HTML file produced via OCR is full of them. Just do your best!

With systematic problems like Tathagata/Tathāgata I would use find/replace all. Remember to use “Preserve case” when doing this kind of find/replace.
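
For anyone who would rather script it than use an editor, the same idea can be sketched in Python. This is only an approximation of what an editor's “Preserve case” option does, and the helper names `match_case` and `replace_preserving_case` are made up for the example.

```python
import re


def match_case(template: str, replacement: str) -> str:
    """Give `replacement` the same capitalisation pattern as `template`."""
    if template.isupper():
        return replacement.upper()
    if template[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement.lower()


def replace_preserving_case(text: str, wrong: str, right: str) -> str:
    """Replace whole-word matches of `wrong`, ignoring case when matching
    and copying each match's capitalisation onto `right`."""
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: match_case(m.group(0), right), text)
```

For example, `replace_preserving_case(text, "tathagata", "tathāgata")` would correct the unmarked spelling while leaving words that already carry the diacritic untouched.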

If you like. The original text in fact has arhat.

That makes sense and it is the method that the text uses. It’s also an easy rule to follow and fits with other SC texts. Additionally, it will help if SC does use tags for potential functions and features like you describe. If I don’t add the tags now, it will be a lost opportunity, so I will start using the ‘i’ and ‘em’ italics tags where appropriate.

I checked and sure enough, as you say, the spelling of śākyamuni is consistent in the original text. I then checked the archive.org version and the spelling is consistent there as well. It turns out that the version I downloaded from forgottenbooks.com is not pristine. It is, as far as I can tell, identical to the archive.org version except for diacritics where there are many inconsistencies. The forgottenbooks version is, for some reason, corrupted or ‘not quite right’. It means I can only rely on the archive version.

Thank you BTW for providing the link to the original text. I’ll attempt to cross-reference this with the archive.org version if similar queries come up again.

Fine.

Good to know. You might want to use the optimized versions I uploaded yesterday, they will render much faster:

Thank you. They couldn’t have come at a better time!

Hi Bhante. Just touching base. Please find attached an update of the file. I am up to the heading “The hell named Sañjīva” (bottom of p. 33 of Vol 1 of the pdf you supplied / line 118 of the html file).

mvu-en.zip (618.4 KB)

Okay, thanks so much. I’ll check it out in the next day or so.

How’s the work going?

Yes. It’s going well…read half a sentence from the pdf, read the same half a sentence in the html, fix errors, repeat!

The text is enjoyable. There aren’t too many errors. Most errors are missing diacritical marks (so the Pāli keyboard has been really good for that). Sometimes a ‘b’ incorrectly appears as an ‘h’ (especially if the original text is in italics).

I’m starting to get my head around html, so that, for example, I can indent text where required.

Bhante. A couple of queries.

QUERY 1

Tags appear in the html file where numbers in brackets appear in the pdf file.

Example:

Page 14 of the Mahāvastu (34th page of the pdf file): “…As the maturing of what karma does the cold wind blow on them? (17) Those who in this world scatter grain as bait for jackals…”

In the html file this appears as: “…As the maturing of what karma does the cold wind blow on them? <a class="sen" id="sen1.17"> Those who in this world scatter grain as bait for jackals…”

Do the tags in the html file need to be there?

QUERY 2

Tags also appear wherever there is a page number.

Example:

Where ‘14’ appears on the top of page 14 of the Mahāvastu (34th page of the pdf file), corresponding tags appear in the html file: <a class="jones" id="jones1.14">

Do these tags need to be there?

I’ve had a chance to look at the text, and it’s looking good, congrats.

Yes, keep all the tags. These encode the metadata for page references. The “sen” tags are the vol/page for Senart’s original Sanskrit edition (which is used in most scholarly sources) and the “jones” are tags for Jones’ translation. If you display the file normally as HTML they won’t appear. They’re used by SC to create the info revealed by the “Textual Information” button on the sidebar.
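
Since these tags are invisible when the page is displayed, it is easy to delete one by accident while proofreading. As a hedged illustration, the Python sketch below lists every “sen” and “jones” reference in a file so that the list before and after editing can be compared. The names `ReferenceAnchorCollector` and `page_references` are hypothetical, and the empty `</a>` closing tag in the usage example is an assumption about how the file is written.

```python
from html.parser import HTMLParser


class ReferenceAnchorCollector(HTMLParser):
    """Record the class and id of every <a class="sen"> or <a class="jones">."""

    def __init__(self) -> None:
        super().__init__()
        self.references: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        kind = attributes.get("class")
        anchor_id = attributes.get("id")
        if kind in ("sen", "jones") and anchor_id:
            self.references.append((kind, anchor_id))


def page_references(html: str) -> list[tuple[str, str]]:
    """Return (class, id) pairs for all page-reference anchors, in order."""
    parser = ReferenceAnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.references
```

For instance, `page_references('<a class="sen" id="sen1.17"></a>')` gives `[("sen", "sen1.17")]`. If the list from the edited file differs from the list from the original file, a tag has been lost or altered.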

Sometimes I find it easier to make these visible in the HTML file. If you like, let me know and I can make changes in the file to make them display. But the way they are now is exactly how they will be in the final SC file.

As for the indented portions, these are verse, and if you’re into digging into the HTML a little, it would be better to mark them as we mark verse on SC. The basic idea is that all verses are wrapped in <blockquote> tags. These will normally render with some indentation, so no need to worry about writing CSS for it.

Each set of verses is one <blockquote>. Within a set of verses, each verse is a <p>. And within each verse, each line (except the last line of the verse) is ended with a <br>.

Now, in the translation, and hence in the file you have, the verses are not broken up by lines. In fact they are not translated as verse at all. But I would suggest we insert lines. This is basically just to make it look like verse.

In the printed edition of the translation, they use italics to achieve this. But we can’t preserve the stylistic quirks of the hundreds of printed and digital editions on which SC is based, so we do it our way. So we should remove the <em> tags that create the italics.

So how to add lines? Well, basically just insert a <br> tag to create a new line at a reasonable point in the text, so the verses end up as approximately four lines of a reasonable length. Don’t worry about whether the translation matches line by line with the original: this never happens anyway, as the syntax of Indic verse is too convoluted.
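
If a first rough pass is wanted before adjusting by hand, the splitting could be sketched in Python like this. It simply aims for lines of roughly equal length and knows nothing about sense or punctuation, so a “reasonable point” still needs a human eye. The function name `break_into_lines` is made up for the example.

```python
def break_into_lines(verse: str, line_count: int = 4) -> list[str]:
    """Split a verse into `line_count` lines of roughly equal length,
    breaking only between words."""
    words = verse.split()
    if not words:
        return []
    line_count = min(line_count, len(words))
    target = len(" ".join(words)) / line_count  # ideal characters per line
    lines: list[str] = []
    current: list[str] = []
    for index, word in enumerate(words):
        current.append(word)
        words_left = len(words) - index - 1
        lines_left = line_count - len(lines) - 1  # lines still to start
        # Close the line once it reaches the target length, or when the
        # remaining words are only just enough to fill the remaining lines.
        long_enough = len(" ".join(current)) >= target
        if lines_left > 0 and (long_enough or words_left == lines_left):
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return lines
```

Joining the result with `"<br>\n"` gives the body of one verse, with a <br> after every line except the last.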

Note that HTML ignores new lines that you insert in a file. It only recognizes tags. A “new line” is indicated in HTML with the <br> tag. (This is often used erroneously to create paragraphs or vertical space, but it is correctly used as we are doing here.) It’s handy to indicate the verses by using <br> and starting a new line after it, but actually it doesn’t make any difference to the final output. Nor, incidentally, does it matter whether there’s an extra space before or after the <br> tag.

For the first couple of verses in Mvu, for example, we have:

<blockquote>
<p>The Enlightened One himself looked on this world<br>
and the world beyond,<br>
on the coming and going of men,<br>
on the round of passing away and coming to be.</p>
<p>The Seer himself reflects upon and understands<br>
the peculiar fruition of acts<br>
which is bound up with the nature of man,<br>
and the place wherein they come to fruition.</p>

… and so on until we reach the end of this set of verses.

</blockquote>
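
The same structure can be generated mechanically. Here is a minimal Python sketch, assuming each verse is already available as a list of lines that are valid HTML fragments; the function name `verses_to_html` is invented for this illustration.

```python
def verses_to_html(verses: list[list[str]]) -> str:
    """Wrap a set of verses in one <blockquote>, one <p> per verse,
    with <br> after every line except the last line of each verse."""
    paragraphs = []
    for lines in verses:
        # Joining (rather than appending) keeps <br> off the final line.
        body = "<br>\n".join(lines)
        paragraphs.append(f"<p>{body}</p>")
    return "<blockquote>\n" + "\n".join(paragraphs) + "\n</blockquote>"
```

The line breaks in the generated file are only for readability, since the browser takes no notice of them.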

Does that make sense?

Thanks for the positive feedback. Good to know I’m on track. :smile:

Understood & done. Thanks.

Thank you, but that’s fine. It’s all good for me the way it is.

As you probably saw from the html file, I used CSS. However, I am more than happy to go with <blockquote> and follow the guide you provided. Thank you for the detailed explanation & instructions. I shall edit the file accordingly so that <blockquote> is used for the verse I have already done (the same one you used as the example).

Yes, thank you.

Bhante. Just on another note (sorry, I know this is not really the place), on Sun 6 Mar the development group for the formation of the Buddhist Council of the Australian Capital Territory (BCACT) will have its second official meeting at Ven. Thich Quang Ba’s temple, Sakyamuni Buddhist Centre.

We have a good number coming from various groups in the ACT. As you have some history in having helped establish a Buddhist council here in the ACT in 2011, and are very well known and respected here, would it be too much trouble to ask if you could please give us a few lines to show your support for the council which could be delivered at the meeting? Just a few words would suffice. There is no need to single out individuals, just something to say that you support the formation and wish us all the best in our endeavours…that sort of thing? It could be something written, or a recorded message perhaps?

If you do not have time I completely understand.

Sure, I’ll email it.

Thank you! :anjal:

Bhante. Please find attached the updated file, with the verses of the hells arranged within <blockquote> tags and with line breaks so that each stanza contains 4 lines. I hope the arrangement is ok. Thanks.

mvu-en.zip (618.6 KB)

The Mvu: a hell called “Saṃghāta” in the html file is “Sanghāta” in the pdf file. As I am following the pdf I am replacing Saṃghāta with Sanghāta but I am starting to have my doubts and think Saṃghāta may actually be correct. Any ideas anyone?

Hi Stu,

Saṃghāta is actually correct.

With metta from Perth.

Thank you Ajahn. I shall stick with Saṃghāta then.