We’re working on getting more texts available in Hindi, and need some help from a Hindi speaker who has a reasonable knowledge of the Suttas. If you can help, or know someone who can, then we’d love to hear from you.
There are Hindi translations of most of the Pali texts, and we have been having these typed up by contractors in India and adding them to the site. Currently we have the whole of DN and about half of MN. Contractors are finishing MN and working on AN.
We’ve run into a problem with AN, which is that the Hindi text doesn’t give any information as to when one sutta ends and another begins. No sutta numbers, no titles, nothing. So we have over a thousand suttas and no way of telling which is which.
The way SuttaCentral works is based entirely on the number assigned to the sutta. Get the number right, and everything just works. So we need someone to go through the whole of AN and add the correct numbers to the start of each sutta. The numbers need to reflect, not the implied numbering of the Hindi text, but the numbers as used on SuttaCentral. These will almost always be the same as the Bhikkhu Bodhi English edition. So essentially you will need to go through the text and add AN3.1, AN3.2 and so on at the start of each sutta, comparing the Hindi with Ven Bodhi’s edition, or with the Pali text on SuttaCentral.
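For anyone taking this on, a small script can catch skipped or repeated numbers once the markers are in. The following is only a rough sketch under assumptions of my own: that each marker (for example AN3.1, or a range such as AN1.1-10) sits at the very start of a line in a plain text file, and that numbering restarts at 1 in each nipata. The function names are made up for illustration and are not part of SuttaCentral's tooling.

```python
import re

# Assumed marker format (not an official spec): "AN<nipata>.<number>" or a
# range "AN<nipata>.<first>-<last>" at the very start of a line.
MARKER = re.compile(r"^AN(\d+)\.(\d+)(?:-(\d+))?\b")


def find_markers(text):
    """Return (line_no, nipata, first, last) for each line that starts with a marker."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        m = MARKER.match(line.strip())
        if m:
            nipata, first = int(m.group(1)), int(m.group(2))
            last = int(m.group(3)) if m.group(3) else first
            found.append((line_no, nipata, first, last))
    return found


def check_sequence(markers):
    """List the places where numbering inside a nipata skips or repeats."""
    problems = []
    expected = {}  # nipata -> next number we expect to see
    for line_no, nipata, first, last in markers:
        want = expected.get(nipata, 1)
        if first != want:
            problems.append(
                f"line {line_no}: expected AN{nipata}.{want}, found AN{nipata}.{first}"
            )
        expected[nipata] = last + 1
    return problems


if __name__ == "__main__":
    sample = "AN3.1 ...\nbody text\nAN3.2 ...\nAN3.4 ...\n"
    for problem in check_sequence(find_markers(sample)):
        print(problem)
```

A check like this only finds gaps and duplicates in the numbers themselves. Whether a given marker sits on the right sutta still has to be confirmed by comparing with Ven Bodhi’s edition or the Pali.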
Although it’s a simple job, it does need some discernment, as the division of suttas and so on will not be identical in the Hindi text and SuttaCentral. I would estimate that the job would take 10-20 hours of work.
While doing the job, it is also a chance to do some proofreading, which would also be of great help.
We have a Hindi speaker helping us with these texts. He doesn’t have time to do this job, but he will be able to offer support if needed.
I think I know someone who can help you in this regard. I know Ven. Bhikkhu Satyapala, who has even received an honour from the President of India, Mr. Pranab Mukherjee. He knows Pali and the entire Tipitaka, and I think he can help in this matter. If you agree, I will contact him and ask him to help with building this site.
We have had typed up a large number of texts in Hindi: the four main nikayas, several books from the Khuddaka, and the Vinaya Pitaka. This is a major achievement, and it deserves to be done well. The digitized Unicode texts are of course available for anyone who wants to use them.
There are two main tasks that we could use some help with.
Supplying sutta numbers for AN and SN to ensure that they match those on SuttaCentral.
Proofreading and general quality control.
If Ven Satyapala is able to help with either of these it would be wonderful.
Venerable Sujato, there are many mistakes in the Hindi suttas, and they are due to typing errors.
In some sentences the words are joined together, with no space or gap between them, which makes the text very difficult to read. It will definitely take hard work to correct. I have uploaded a snapshot of places where the spaces are missing. Please have a look and let your contractors know. I would be ready to help anytime.
Thanks, yes we are aware of this. It is apparently due to a change in the way Hindi is written, and our typists kept the old style in which the original books were printed. We have asked them to do it in the new style, but I’m not sure how successful this has been.
There are a lot of new and updated texts on the way, and I’ll let you know when these are here. I hope a lot of these problems will be solved, but we shall have to wait and see.
Hi bhante @sujato, I was recently near the Department of Buddhist Studies at Delhi University, and your Hindi translation work came to mind, so I went to the faculty of Buddhist Studies. Only one professor was available today; I tried to explain your work to him and how he could help with the Hindi translation.
Bhante Sujato, I have Hindi translation PDFs of DN, MN, SN, AN, Dhammapada, Theragatha, Jataka and Udana.
But the problem is that they are not editable PDFs. They can be edited only when we run OCR (Optical Character Recognition) on them. I have the Hindi OCR software which can be used to convert them into editable text. However, it may require human verification after OCR.
If you don’t mind, can I upload (or assist you to upload) the Hindi texts to SuttaCentral?
But it will require a lot of work, and somebody who understands Hindi.
Of course if you want to do this we would be very grateful. But how about we start with just one of the Nikayas and see how you go? The other volunteers and I will be very happy to code them once they are in a text file.
I just decided to read the whole thread because I’m not up to date with these things.
Bhante @Sujato is saying (in May 2015) that the DN, MN and part of the AN are already typed up by contractors in India, and that help is basically needed with the numbering and proofreading of the already typed-up texts. Then, in January 2016, you remarked that there are many mistakes in these texts.
I think it is best to wait until Bhante @Sujato comes back online tomorrow and gives us an overview of the current state of affairs, and then we can decide on the best course of action to take.
Our experience with using OCR on non-Roman texts is very poor. We tried it out and rapidly rejected it as an option, preferring to pay typists to do the job. The characters are not recognized with sufficient clarity, especially in old texts with poor printing and dubious or inconsistent scanning. Then there is the whole problem of the layout in book form, with page headers, numbers and so on, which is a hassle to deal with even in a text-based PDF. Our discussions with professionals in Sri Lanka and India essentially confirmed that OCR is not an appropriate tool for this kind of job.
We can’t. Any OCR project must begin with a rigorous proofreading to ensure the reliability of the texts. That’s not a job that SC can undertake, as we are stretched quite thin right now. We would not consider accepting raw OCR texts I’m afraid.
So far as I know, the typed texts we have are fairly good. I should hope so, because we put a lot of time and money into them. The only issue that people have raised with us is the word breaking, which is a stylistic issue. If someone is willing to go over them, it would be good to insert the word breaks where appropriate. Would you be interested to do this?
The main job that we need done is for a Hindi speaker to go through and sort out what we have and assign the proper number to each sutta. This is quite a simple job for someone familiar with the suttas. Just go one by one and put the proper SC number at the start of each sutta. Partly it can be automated, but it needs to be checked by hand, because each edition is slightly different.
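To give a rough idea of what the automated part might look like, here is a minimal sketch. It assumes the Hindi text has already been split into one chunk per sutta, and that someone has supplied the list of SC numbers for that section by hand. Both of those are assumptions, and the function names are hypothetical. The script only proposes numbers in order and raises a flag when the counts disagree, which is exactly where the checking by hand comes in.

```python
def propose_numbers(chunks, sc_ids):
    """Pair each text chunk with the next SC number, in order.

    Returns (pairs, needs_review). needs_review is True when the number of
    chunks differs from the number of SC numbers, which suggests the Hindi
    edition divides the suttas differently at some point.
    """
    pairs = list(zip(sc_ids, chunks))  # zip stops at the shorter list
    needs_review = len(chunks) != len(sc_ids)
    return pairs, needs_review


def render(pairs):
    """Write each SC number on its own line above the sutta it belongs to."""
    return "\n\n".join(f"{sc_id}\n{chunk.strip()}" for sc_id, chunk in pairs)


if __name__ == "__main__":
    # Placeholder chunks; in practice these would be the Hindi sutta texts.
    chunks = ["first sutta text", "second sutta text", "third sutta text"]
    sc_ids = ["AN3.1", "AN3.2"]  # hand-supplied list, deliberately one short
    pairs, needs_review = propose_numbers(chunks, sc_ids)
    print(render(pairs))
    if needs_review:
        print("\nCounts differ: check this section by hand.")
```

Even when the counts match, the pairing can still be off if one sutta was split and another merged, so a matching count is a hint and not a guarantee.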
First of all, congratulations on the redesign of SuttaCentral. It is very appealing and beautiful. I am very grateful for the great work done by you and your team to bring the invaluable Suttas to the entire world. Hope I can meet you someday.
I am very inspired to bring back the Dhamma to its place of origin (in any small way I can) and I am delighted you feel the same way.
I was talking to my friend Abhinav over at FB about starting a channel with him on YouTube having just Hindi language readings of the Suttas.
I think he might be busy, so I’ll go ahead myself. I would be grateful to receive your blessings in this small endeavour.
He also sent me to this forum link. I would be delighted to contribute to this project in any way I can in my free time as a Hindi speaker and a regular faithful practitioner. I will definitely have a look at those files and see what I can do.