Google Translate in Sanskrit

Google Translate has launched its Sanskrit translator. You have to enter the Sanskrit in Devanagari for it to be recognized.

For instance, using Udanavarga 1.4:

को नु हर्षः का आनन्द एवं प्रज्वलिते सति ।
अन्धकारं प्रविष्टाः स्था प्रदीपं न गवेषथ ।।

What is the joy and what is the pleasure when it is so ignited?
You have entered darkness and you are not looking for a lamp.

Bhante Sujato’s translation of its parallel, DHP 146:

What is joy, what is laughter,
when the flames are ever burning?
Shrouded by darkness,
would you not seek a light?

There are still a lot of issues with this machine translation but it is hailed in the Indology community as a great step forward.


Awesome, thanks for the tip! Do you know anywhere there is a technical discussion of how they did this? Normally AFAIK they train Google on parallel datasets in English.

We’ll have to revive our project to add script-changing to Sanskrit, then we can use this directly on SC.
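For anyone who wants to feed romanized text into it in the meantime, here is a minimal sketch of IAST to Devanagari conversion in Python. This is not SuttaCentral's script-changing code: the function name `iast_to_devanagari` and the lookup tables are my own assumptions, and the sketch ignores vowel hiatus (a followed by i is always read as ai), aspirate ambiguity, ḷ, avagraha, and numerals.

```python
import unicodedata

def _nfc(table):
    # Normalize keys so precomposed and combining spellings both match.
    return {unicodedata.normalize("NFC", k): v for k, v in table.items()}

# vowel -> (independent letter, dependent sign used after a consonant)
VOWELS = _nfc({
    "a": ("अ", ""), "ā": ("आ", "ा"), "i": ("इ", "ि"), "ī": ("ई", "ी"),
    "u": ("उ", "ु"), "ū": ("ऊ", "ू"), "ṛ": ("ऋ", "ृ"), "ṝ": ("ॠ", "ॄ"),
    "e": ("ए", "े"), "ai": ("ऐ", "ै"), "o": ("ओ", "ो"), "au": ("औ", "ौ"),
})
CONSONANTS = _nfc({
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "ṅ": "ङ",
    "c": "च", "ch": "छ", "j": "ज", "jh": "झ", "ñ": "ञ",
    "ṭ": "ट", "ṭh": "ठ", "ḍ": "ड", "ḍh": "ढ", "ṇ": "ण",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "ś": "श", "ṣ": "ष", "s": "स", "h": "ह",
})
SIGNS = _nfc({"ṃ": "ं", "ṁ": "ं", "ḥ": "ः"})
VIRAMA = "्"

def _next_token(text, i):
    # Greedy match: try two-character tokens (kh, ai, ...) before one-character ones.
    for size in (2, 1):
        tok = text[i:i + size]
        if tok in VOWELS or tok in CONSONANTS or tok in SIGNS:
            return tok
    return None

def iast_to_devanagari(text):
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    pending = False  # True while the last consonant still awaits a vowel
    i = 0
    while i < len(text):
        tok = _next_token(text, i)
        if tok is None:
            # Spaces, punctuation, unknown letters: close the consonant, pass through.
            if pending:
                out.append(VIRAMA)
            out.append(text[i])
            pending = False
            i += 1
            continue
        if tok in VOWELS:
            independent, dependent = VOWELS[tok]
            out.append(dependent if pending else independent)
            pending = False
        elif tok in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant cluster: suppress the inherent a
            out.append(CONSONANTS[tok])
            pending = True
        else:
            if pending:
                out.append(VIRAMA)
            out.append(SIGNS[tok])
            pending = False
        i += len(tok)
    if pending:
        out.append(VIRAMA)
    return "".join(out)

print(iast_to_devanagari("ko nu harṣaḥ"))  # को नु हर्षः
```

The output can then be pasted into the translator. A real converter would need word-boundary and sandhi handling that this sketch leaves out.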

Trying with the Pali of the same verse:

को नु हासो किमानन्दो, निच्चं पज्जलिते सति;
अन्धकारेन ओनद्धा, पदीपं न गवेसथ।

What is the laughter, what is the joy, when it is low;
You are covered with darkness, and you do not seek a lamp.

Not too bad, except it has confused nicca with nīca.

Trying a bit of Pali prose:

एवं मे सुतं—एकं समयं भगवा सावत्थियं विहरति जेतवने अनाथपिण्डिकस्स आरामे।
Thus it is said to me—Once upon a time the Lord was enjoying His pastimes in the forest of Jetavana in the resting place of the orphan Pindika.

Pretty good!

It often chokes and doesn’t recognize the Pali, but when it does it’s not bad.

It’s an interesting indication of just how close Pali and Sanskrit are. Much closer than English and French, or even English and Dutch; more like English and Scottish.


I will be excited when it can handle Kalidasa.


Sorry guys, but I don’t understand when studying Sanskrit would be needed. Aren’t we focused on the Pali traditions? Again, sorry, I’m not being judgmental, I’m just confused.

This page has some useful information about the SuttaCentral project

https://suttacentral.net/introduction?lang=en


There are many useful sources in Sanskrit which contain EBTs that have not been translated.

For example - Sanskrit sutras - SuttaCentral

The Vinaya part of SuttaCentral has many untranslated Vinaya texts in Sanskrit - SuttaCentral

And so on

There are also many non-Buddhist Sanskrit sources that are useful for understanding early Buddhism, like some Dharmasastras and the Upanishads.


I tried putting some longer texts into it; with longer prose you can see the limitations more easily.

For example, from the introductory verses to the Mahasamghika Pratimoksa:

नरेन्द्रदेवेन्द्रसुवन्दितेन त्रिलोकविद्युषु विशालकीर्तिना
बुद्धेन लोकानुचरेण तायिनामुदेशितं प्रातिमोक्षं विदुना

तं प्रातिमोक्षं भवदुःखमोक्षं श्रुत्त्वानुधीराः सुगतस्य भाषितां
षडिन्द्रियं सम्वरसम्वृतत्वात्करोन्ति जातीमरणस्य अन्तं

चिरस्य लब्ध्वा रतनानि त्रीणि बुद्धो योदं मायिकाञ्च शुद्धां
दौःशीलवद्यं परिवर्ज्जयित्वा विशुद्धशीला भवथाप्रमत्ताः

शीलेन युक्तो श्रमणो तिरेति शीलेन युक्तो ब्राह्मणो तिरेति
शीलेन युक्तो नरदेवपूज्यो शीलेन युक्तस्य हि प्रातिमोक्षं

अनेक बुद्धानुमतं विशुद्धं शीलं प्रतिष्ठा धरणीवसान्तं
तदाहरिष्याम्यहं संघमध्ये हिताय लोकस्य सदेवकस्य

Google Translate gives:

He was well-worshipped by kings and gods and had a vast fame among the lightnings of the three worlds
The Buddha, the follower of the world, the knower of the Pratimoksha, taught them

Hearing that Pratimoksha, the liberation from your suffering, the patients spoke of Sugata
Because the six senses are enveloped in Samvara, they put an end to caste and death

After a long time, the Buddha obtained three gems, Yoda and Mayika, pure
Avoid the fault of immorality and become pure and careless

A Sramana endowed with virtue crosses over, and a Brahmin endowed with virtue crosses over
A man endowed with virtue is worshiped by the gods for he who is endowed with virtue is liberated

Many Buddhas have permitted pure virtue, establishment, and the spring of the earth
Then I will take it in the midst of the Sangha for the welfare of the world and the gods

Example 2, the first two paragraphs of the Avadanasataka:

बुद्धो भगवान् सत्कृतो गुरुकृतो मानितः पूजितो राजभी राजमात्रैर् धनिभिः पौरैः श्रेष्ठिभिः सार्थवाहैर् देवैर् नागैर् यक्षैर् असुरैर् गरुडैः किन्नरैर् महोरगैर् इति देवनागयक्षासुरगरुडकिन्नरमहोरगाभ्यर्चितो बुद्धो भगवान् ज्ञातो महापुण्यो लाभी चीवरपिण्डपातशयनासनग्लानप्रत्ययभैषज्यपरिष्काराणां सश्रावकसंघो राजगृहम् उपनिश्रित्य विहरति वेणुवने कलन्दकनिवापे । तत्र भगवतो ’चिराभिसंबुद्धबोधेर् यशसा च सर्वलोक आपूर्णः ॥

अथ दक्षिणागिरिषु जनपदे संपूर्णो नाम ब्राह्मणमहाशालः प्रतिवसति आढ्यो महाधनो महाभोगो विस्तीर्णविशालपरिग्रहो वैश्रवणधनसमुदितो वैश्रवणधनप्रतिस्पर्धी । स च श्राद्धो भद्रः कल्याणाशय आत्महितपरहितप्रतिपन्नः कारुणिको महात्मा धर्मकामः प्रजावत्सलस् त्यागरुचिः प्रदानरुचिः प्रदानाभिरतो महति त्यागो वर्तते ॥

Google spits out:

The Buddha, the Supreme Personality of Godhead, was honored by his teacher, honored by the kings, the kings, the great citizens of the city, the gods, the serpents, the yakshas, the demons, the garudas, the great serpents, the gods, the serpents, the demons, the garuda, the demon, the great serpent, the greatly pious, the greatly pious. There the entire world was filled with the glory of the Supreme Personality of Godhead, who had been enlightened for a long time.

There lived in a town in the southern mountains a brāhmaṇa named Sampurṇa, who was very rich, very wealthy and very enjoyable. The shrāddha ceremony is auspicious, auspicious, devoted to the welfare of the self, compassionate, great, desires religious principles, affectionate to the subjects, is fond of renunciation, is fond of giving charity and is engaged in giving charity.

Still, better than nothing, cool stuff!

Also, Supreme Personality of Godhead, LMAO


I’ve been plugging Japanese into these AI translators for over a year. There’s usually at least one guffaw that takes place each day. But the guffaws are not the problem - they are entertainment during a dull day. It’s when I have to squint and check it with a dictionary and shake my head. Those are the translation issues that bother me. These tools are meant for people who don’t read or speak the language being translated, but they don’t seem to get any better than my quick drafts full of errors. The reality is that software is not intelligent and has no idea what language is. But, if a person knows or can learn enough of the language being translated to check the accuracy, it’s definitely a useful tool. It’s how I’ve been accessing the Japanese translation of the Dirgha Agama.


There will most likely have been a very extensive pre-training with just English. So everything that comes out at least sounds like correct English most of the time.

Then the actual training is done with a set of matching sentence pairs, Sanskrit ↔ English. But we suspect that this is done with modern Sanskrit, and therefore it does not perform so well with Classical or Vedic Sanskrit, as @Javier pointed out. This is still something that we want to do with BuddhaNexus at some point, but this is already a good step forward.

The problem often is that as the English is so good, it is a bit deceiving because people think it is an actual correct translation. I feel there should be a different name for these things and not ‘translation’. It is a useful tool but no more than that.

You might also be interested to learn a bit more on the different tools Google offers for this:


The “Supreme Personality of Godhead” is a dead giveaway that they used a host of modern translations of classical Sanskrit texts, probably scraped from the web. This is how AC Bhaktivedanta Swami Prabhupada translates “Bhagavan” in his (in)famous translation of the Bhagavad Gita, so they probably threw a bunch of translations into it and the AI chose this translation for Bhagavan because it’s so widespread on the English internet (due to the influence of ISKCON and their publishing capacity). The texts in the dataset likely also lean towards Hindu material, since this is what is more likely to be available online in an easily accessible bilingual format. Another giveaway that the dataset is biased towards Hindu texts is that you have to use Devanagari instead of IAST (Buddhist Sanskrit sources generally don’t use Devanagari anyways).
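To make the frequency point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the pairs in `toy_corpus` are invented, not taken from any real training set, and `most_common_rendering` is a hypothetical helper. Real neural translation systems do not count word pairs like this, but a rendering that dominates the training data tends to dominate the output in a similar way.

```python
from collections import Counter

def most_common_rendering(pairs, source_word):
    """Return the English rendering seen most often for source_word, or None."""
    counts = Counter(target for source, target in pairs if source == source_word)
    return counts.most_common(1)[0][0] if counts else None

# Invented counts: one publisher's rendering outnumbers all the others.
toy_corpus = [
    ("bhagavān", "the Supreme Personality of Godhead"),
    ("bhagavān", "the Supreme Personality of Godhead"),
    ("bhagavān", "the Supreme Personality of Godhead"),
    ("bhagavān", "the Blessed One"),
    ("bhagavān", "the Lord"),
]

print(most_common_rendering(toy_corpus, "bhagavān"))
```

With these invented counts the majority rendering wins even though the other two are more usual in Buddhist translations.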


Except on Wikipedia… another excellent source of multilingual texts in good markup suitable for training algorithms.

Unfortunately

However, I’ve been fighting this losing Devanagari battle for years. My latest shift in strategy has been to change them to Brahmi script instead of removing Indic scripts and just using IAST.


Does anyone know whether you can use copyrighted data scraped from the web as training data for a neural network? It is not like you are going to publish it somewhere, but there might still be some restrictions.

𑀲𑁆ā𑀥𑀼!

In the US you can. It’s considered fair use because the product is transformative enough and is in no way a substitute for the original works. This is the same doctrine that allows Google to scrape the web to create its search engine in the first place.

As a matter of courtesy, though, most crawlers will honor a site’s robots.txt file and not scrape whatever the site owner has asked to be left alone.
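A minimal sketch of that courtesy check, using Python's standard `urllib.robotparser`. The robots.txt content, the `example-crawler` user agent, and the example.org URLs are all made up for illustration, and `may_fetch` is my own helper name.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical rules: the site owner asks all crawlers to skip /private/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(may_fetch(EXAMPLE_ROBOTS, "example-crawler", "https://example.org/private/page"))  # False
print(may_fetch(EXAMPLE_ROBOTS, "example-crawler", "https://example.org/texts/page"))    # True
```

A real crawler would download the site's own robots.txt (the parser also has `set_url` and `read` for that) before requesting any pages.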

Fair use allows algorithmic results to include bits of the original works: for example a low resolution thumbnail image or a snippet of original text or a 15 second preview of the song or video, etc. The exact amount of reproduction allowed is not stipulated in US law but the big industry associations have by now settled with the big tech companies on certain guidelines and people generally respect those.


Thank you @Khemarato.bhikkhu ! This is very helpful!


Oh, there is a name for this: MTL, short for machine translation.

Many novel-reading websites host MTL novels. Because foreign-language web novels are popular and competent translators are scarce, people resort to MTL to read them.

The English is wonky, and anyone familiar with MTL can recognize the signs. Usually it is labeled as MTL to distinguish it from proper translation work.