Proofreading the Mahavastu

Recently Ayya Vimala and I have been working on improving our coverage of various Sanskrit texts, as well as implementing verse parallels.

As a side effect of this I had a look at the classic English translation of the Mahavastu. This is a fascinating text. It’s one of the few Mahasanghika texts in existence. It is a large compilation of texts, which readily pulls from anything from the early suttas and Vinaya, to verses, to Jatakas, to Mahayana sutras, all loosely arranged to tell the life of the Buddha, and included in the Vinaya.

Although it was translated into English last century, no proper digital text has been made. It is available on archive.org (for the links, see the Wikipedia page). There, the text is digitized via OCR, with all the usual problems inherent in this.

I’ve taken the text from there, cleaned and structured it, and made it into a fairly sane HTML file. As always on SC, I have removed footnotes and the like. It’s immeasurably better than the version on archive.org, but still needs a lot of proofreading.

If anyone’s interested in this, please let me know. Basically it would involve reading the text side by side with the scanned pages on archive.org and making corrections. It’s published in three volumes, so it’s a large text. Once it’s finished we can publish on SC.

If you want to check it out, here’s the corrected text I am working with.

mvu-en.html.zip (573.6 KB)


Yes please Bhante. I would like to help.

Hi Stu.

Great! If you download the text and check it out, and also check the scanned versions at archive.org, you should have a reasonable idea what’s required.

I’ve had one other possible offer to help, but still unsure if they’re available.

Thank you! Just for general information, I’ve found that the Mahavastu is also available on other sites. I have downloaded PDF versions of the three volumes from “forgottenbooks.com”. This will allow me to work offline. I propose to start at the beginning of Vol 1 and work my way through. If others want to help, sections can be divided up accordingly.

Great. I think forgotten books gets it from archive.org, so whatever is fine.

It’s a hot mess of a text, but there are lots of nice things in there!

Shall I copy the text from the html file you created into a word document for editing purposes?

No! Please edit the HTML file in a text editor, it will be a nightmare to edit in a word processor.

I use Sublime Text for editing, but any capable text editor will do. The main restriction is that it has to be able to handle large files, which rules out Atom and some others.

Sublime Text has a spellchecker, powerful regex, customizable appearance, and pretty much everything you’ll need.

Ok. I’ve downloaded Sublime Text and I’m away! Thanks for the opportunity to help. It looks like a juicy text and I am looking forward to getting my teeth into it. Would you like updates? I’m not sure if it would be useful to you, but if you want, I can email you the updated file every now and then?

Great. How about you make a start on a page or two and send me that so I can check it? But mostly it should be okay.

How are you with inputting Pali/Sanskrit diacriticals? We had a thread on here some time ago that talked about methods for doing this on the different operating systems.

Ok. Once I’ve done a couple of pages I’ll send you the updated file. Again, thank you. :grin:

Diacritics are fine. I use them regularly in MS Word. I haven’t checked it out yet but I assume if I have a letter with a diacritic in Word, I should be able to copy it straight into Sublime Text. Either way it should be ok. (I’ve probably had greater challenges inserting diacritical marks into websites. Platforms such as WordPress aren’t always diacritic-friendly.)

You can do this, but it’s clumsy. It’s much easier if you have a system-wide way of inserting them, then you don’t have to worry what application you’re using, or indeed if using them online. The thread I posted before gives solutions for all major operating systems. Only Linux has a native implementation, which is pretty sweet. But there are hacks for other, inferior, operating systems.

One detail which might seem silly, but is actually quite useful: make sure to use an editor font that is designed for legibility. (Hint: not Arial or Helvetica.) You have to clearly distinguish between things like l, I, 1 and other commonly confused glyphs. The text has a large number of these confusions (since it is OCR) so it’s important to catch them. I use Source Sans Pro, which is free, excellent, and has all the glyphs you need.
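As a rough aid for catching the glyph confusions mentioned above, here is a minimal Python sketch that flags words mixing letters with the digits 0 or 1. The pattern and the function name `suspicious_tokens` are assumptions made for illustration, not part of any existing workflow here, and it will miss confusions such as l versus I, which still need a careful eye and a legible font.

```python
import re

# Hypothetical heuristic: a "word" containing both an ASCII letter and the
# digit 0 or 1 often indicates an OCR confusion such as l/1 or O/0.
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*[01])\w+\b")

def suspicious_tokens(text):
    """Return tokens that contain both letters and the digits 0 or 1."""
    return MIXED.findall(text)

sample = "The Buddha rep1ied to the k1ng in the year 1952."
print(suspicious_tokens(sample))
```

Expect some false positives (for example ordinals like "1st"), so treat the output as a list of places to look rather than a list of errors.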


Ok, sorry, I shall have a more detailed look at that thread (I only read the first bit).

I’ll also make sure I use Source Sans Pro.

Thank you.

I’ve only just started this text, but based on the Early Buddhism course you and Ajahn Brahmali gave in 2013 and the subsequent book “Authenticity of the Early Buddhist Texts”, I would have supposed that this text does not fit the criteria of what is ‘authentic’. Is there, however, some suggestion that the Mahavastu may be authentic, and therefore that things like the Bodhisatva career have (at least some) legitimacy, and that this text fits at least some of the criteria scholars use to determine authenticity as we define it?

Like I said at the beginning:

It is a large compilation of texts, which readily pulls from anything from the early suttas and Vinaya, to verses, to Jatakas, to Mahayana sutras, all loosely arranged to tell the life of the Buddha, and included in the Vinaya.

The compilation is late, but it includes a variety of early materials, and other passages which, while not early, are still interesting, such as the many Jataka stories, which often give a different take to the ones found in Pali. The introduction to the translation gives more details.

I have installed the Pāli keyboard which is great and Source Sans Pro and am using this font in Sublime Text so all is going well.
However, when I make changes to the HTML file in Sublime Text and save them, they don’t always seem to be reflected when the file is viewed in a browser. For example:
Sublime Text: Here begins the Mahvāstu.
Chrome: Here begins the MahvÄstu.
The ‘Ä’ in the browser is actually ‘ā’ in Sublime Text. What you see in Sublime Text is not what you see in the browser. There are many more examples like this.

I have never used this piece of software, but I suggest checking the encoding in the program settings. The correct encoding should be UTF-8, I suppose; that is what works in Notepad++, at least.
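To illustrate why an encoding mismatch would produce exactly this symptom, here is a small Python sketch. It assumes the file is saved as UTF-8 and that the browser falls back to a single-byte Latin encoding; Latin-1 is used here as a stand-in, since the actual fallback depends on the browser and its settings.

```python
original = "ā"                    # U+0101, a with macron

# UTF-8 stores this character as two bytes.
raw = original.encode("utf-8")    # b'\xc4\x81'

# A reader that assumes a one-byte-per-character encoding sees two characters:
# 0xC4 is 'Ä', and 0x81 is an invisible control character.
misread = raw.decode("latin-1")
print(repr(misread))

# Nothing is lost in the file itself: decoding the same bytes as UTF-8
# gives back the original letter.
print(raw.decode("utf-8") == original)
```

In other words, the saved file is most likely fine; what needs fixing is how the browser is told to read it.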


Possibly this is the browser not reading your encoding. Usually we find that Chrome is very forgiving, but with Firefox you have to specify the Unicode encoding in the page header. Not sure about other browsers. I use Chrome so I get lazy!

Try adding this to the <head> of the HTML file:

 <meta charset="UTF-8"> 

Let me know if this works.
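If the tag has to be added to several files, something like the following Python sketch could do it. The function name `ensure_charset` is made up for this example, and it assumes the file already has a `<head>` element; it leaves the text unchanged if a charset declaration is already present.

```python
import re

META = '<meta charset="UTF-8">'

def ensure_charset(html):
    """Insert a UTF-8 charset declaration after <head> if none is present."""
    # Leave the document alone if any <meta ... charset...> already exists.
    if re.search(r"<meta[^>]+charset", html, flags=re.IGNORECASE):
        return html
    # Match <head> or <head ...attributes>, but not <header>.
    return re.sub(
        r"(<head(?:\s[^>]*)?>)",
        lambda m: m.group(1) + "\n" + META,
        html,
        count=1,
        flags=re.IGNORECASE,
    )

page = "<html><head><title>Mahavastu</title></head><body></body></html>"
print(ensure_charset(page))
```

When reading and writing the real file, pass `encoding="utf-8"` to `open()` so the script itself does not introduce a mismatch.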

Ok, thanks, I’ll try that. Like you, I would have thought it was a browser issue. Do you get the same issue when you open the file in your browser? It also happens for me in IE. If your suggestion doesn’t work I’ll send you a screenshot so you can see better what I mean.

If the tag doesn’t work, please send a screenshot and the modified file.

Yes, that’s fixed it. Thank you.
