Feature request: Chinese text with dictionary lookups inserted automatically

Feature request:

This should be easy to do for a programmer, and would be extraordinarily useful:

In the side-by-side and line-by-line Chinese + English views for Agama reading, for the Agamas that are not already translated, could we add a mode that automatically places the English translations looked up from the dictionary after each line of Chinese?

For example:

四阿含義同

would be displayed as:
四阿含/義同
four agamas / the meaning is the same

(To construct the English line, I just substituted in what the dictionary's bubble text shows when you hover the mouse over the Chinese text.)

If SuttaCentral did this automatically in the line-by-line or side-by-side view for Chinese Agama/English, then for the untranslated Agamas, which are the majority, one could at least get a reasonably accurate feel for what the sutra is talking about, and amateur Chinese-to-English Agama translators would have much of the grunt work removed.
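A minimal sketch of what such a mode could do, in Python: greedy longest-match segmentation against a dictionary, then one gloss per segment, reproducing the 四阿含/義同 example. The `GLOSSES` table is a tiny hypothetical stand-in, not SuttaCentral's actual lookup data or code, and real segmentation of classical Chinese is considerably harder than longest-match.

```python
# Hypothetical mini-dictionary standing in for the real lookup data.
GLOSSES = {
    "四阿含": "four agamas",
    "阿含": "agama",
    "義同": "the meaning is the same",
    "四": "four",
    "義": "meaning",
    "同": "same",
}


def segment(text, glosses, max_len=4):
    """Greedy longest-match: at each position take the longest dictionary
    entry (up to max_len characters), else fall back to one character."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted, so the loop always advances.
            if length == 1 or candidate in glosses:
                words.append(candidate)
                i += length
                break
    return words


def interlinear(text, glosses):
    """Return (chinese_line, english_line) with segments joined by ' / '.
    Characters with no entry are passed through unglossed."""
    words = segment(text, glosses)
    chinese = " / ".join(words)
    english = " / ".join(glosses.get(w, w) for w in words)
    return chinese, english


if __name__ == "__main__":
    for line in interlinear("四阿含義同", GLOSSES):
        print(line)
```

Run as a script, this prints `四阿含 / 義同` followed by `four agamas / the meaning is the same`. Longest-match is chosen only because it is the simplest way to prefer 四阿含 over 四 + 阿含.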


I’m afraid this is not easy at all to implement. Right now the Chinese and English texts are plain HTML. To put them into the side-by-side view I would have to convert them into Pootle files and manually go over each line to match them up. I’m already doing this with the Vinaya texts in Pali, and, for instance, Parajika 1 alone took me about a week! So this takes a lot of time. Of course, in the future we would like this, and once we have Pootle up and running for other translators and can also import the Chinese texts into it, a volunteer could go in and copy/paste the English alongside the Chinese. So if you’re volunteering :wink:
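For illustration only, a sketch of turning already-aligned Chinese segments into gettext PO entries, the kind of file Pootle works with, leaving `msgstr` empty for a volunteer to fill in. The segment IDs (`sa484:...`) and the one-entry-per-segment layout are assumptions, not SuttaCentral's actual file structure; the PO header entry is omitted, and the hard part described above, matching the lines up, is not automated here.

```python
def po_escape(s):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_po(segments):
    """segments: iterable of (segment_id, chinese, english) tuples.
    Each becomes one PO entry; the segment ID goes in msgctxt so that
    identical Chinese phrases in different places stay distinct."""
    lines = []
    for seg_id, zh, en in segments:
        lines.append('msgctxt "%s"' % po_escape(seg_id))
        lines.append('msgid "%s"' % po_escape(zh))
        lines.append('msgstr "%s"' % po_escape(en))
        lines.append("")  # blank line between entries
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical segment IDs; English left empty for a translator to fill in.
    print(to_po([
        ("sa484:0.1", "如是我聞", ""),
        ("sa484:1.1", "一時,佛住舍衛國祇樹給孤獨園。", ""),
    ]))
```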


How is the Chinese-to-English dictionary lookup done that produces the bubble text hovering over the cursor on SC? That’s the important part of the grunt work amateur translators waste a lot of time on. If SC could just dump the dictionary lookup of a text into an unformatted mass of HTML, that would still be immensely helpful.

Currently, if I use Google Translate on, for example, SA 484 from SC, I get the result pasted below. SC’s dictionary lookup would do a much better job because it would recognize many of the special phrases like “thus I have heard”, etc.

Saṃyuktāgama 雜阿含經

SA 484 (四八四) 跋陀羅

If I smell:

At one time, the house of the Buddha lived in the tree and gave it to Lonely Garden alone.

At 0123b21 hrs, the priests of the Vadodara and Psalm Adan lived alone and gave trees to Solitude Park.

At 0123b22 hrs, Ayana Sayādaw visited the Gandhara Institute of the Sakyamuni, and he was relieved by a total of inquiries. He lived on one side. At the time, His Holiness Ananda asked His Holiness the following words: “What is the name of the cloud? What is the first? What is the cloud? The cloud is the first? What cloud is the first?

0123b26 The Venerable Gandhara Says Amin says: “If Brahma is a man who creates and transforms himself as a father of the world, if he sees him, he will see the first.

0123b28 "Ananda! There are people living happy, everywhere moist, everywhere pleasing, full of body, full of dissatisfaction. The so-called free joy, he from the three voices, lifted and sang, said to the public: "extremely silent, from Joy, joy, and joy of living.

0123c04 "The second time, Ananda! There are people who are born out of joy and moisturizing this place. Everywhere is moist and full of pleasing. It is filled with full body. There is no place for dissatisfaction.

0123c07 "Why don’t think of the first in the cloud? Ananda! There are all living beings. There is nothing to know. There is no place for everything. If you want to think of others, it is the first name.

0123c09 “How does the cloud have the first? Repeatedly, Avon! There are all living beings who have no place to go. If you want to go from one place to another, you have the first place.”

0123c12 Ayana Sayādaw, Lord Buddha, said: “If there are many people who act as if they are seeing, if they say, what is the difference between you and the other person? What do you think is different? In order to see what is missing, it is to see the first. As they have heard, it is a famous name for everything you do. If you are happy, you will be the first to enjoy music. The leaker is the name of the first person. Truthfully observed, the leaks are the first and last name.”

At 0123c19, the two priests all talked about it, and they are going to get up.

如是我聞:

一時,佛住舍衛國祇樹給孤獨園。

0123b21爾時,尊者跋陀羅比丘及尊者阿難俱住祇樹給孤獨園。

0123b22爾時,尊者阿難往詣尊者跋陀羅所,共相問訊慰勞已,於一面住。時,尊者阿難問尊者跋陀羅比丘言:「云何名為見第一?云何聞第一?云何樂第一?云何想第一?云何有第一?」

0123b26尊者跋陀羅語尊者阿難言:「有梵天自在造作、化如意,為世之父,若見彼梵天者,名曰見第一。

0123b28「阿難!有眾生離生喜樂,處處潤澤,處處敷悅,舉身充滿,無不滿處。所謂離生喜樂,彼從三昧起,舉聲唱說,遍告大眾:『極寂靜者,離生喜樂,極樂者,離生喜樂。』諸有聞彼聲者,是名聞第一。

0123c04「復次,阿難!有眾生於此身離喜之樂潤澤,處處潤澤,敷悅充滿,舉身充滿,無不滿處,所謂離喜之樂,是名樂第一。

0123c07「云何想第一?阿難!有眾生度一切識入處無所有,無所有入處具足住,若起彼想者,是名想第一。

0123c09「云何有第一?復次,阿難!有眾生度一切無所有入處,非想非非想入處具足住,若起彼有者,是名有第一。」

0123c12尊者阿難語尊者跋陀羅比丘言:「多有人作如是見、如是說,汝亦同彼,有何差別?我作方便問汝,汝當諦聽,當為汝說。如其所觀,次第盡諸漏,是為見第一。如其所聞,次第盡諸漏,是名聞第一。如所生樂,次第盡諸漏者,是名樂第一。如其所想,次第盡諸漏者,是名想第一。如實觀察,次第盡諸漏,是名有第一。」

0123c19時,二正士共論說已,從座起去。


:laughing::laughing::laughing:

Not to make light of your problem, but that is hilarious and made my morning. Thank you.

Google translate is woefully inadequate for certain languages.


Actually, I often use Google Translate and WeChat’s translate (which I think is from Microsoft) to converse with Chinese friends or read spam that I get in Chinese… :laughing: It generally works quite well if they stay away from slang.

I think the problem here is that the Agamas are using an archaic form of Chinese (I imagine like Chaucer or Shakespeare), so it is no wonder Google Translate balks at that.

I think Frank’s suggestion is a good one. However, it is related to another feature request that has been discussed in the past: the ability to paste a block of Pali (or Agama Chinese in this case) into a text box for translation. Perhaps that would be easier to implement, and would do most of what Frank wants.


Yeah, that would work just fine. And preferably with a nice big buffer in that text box so most samyutta-sized (smallish) suttas could fit in one cut and paste!

Since the Agamas already have Taisho line numbers, I could write an offline tool that easily creates line-by-line or side-by-side HTML, grouped according to line numbers, possibly even split into comma- and period-delineated sections.
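A sketch of such an offline tool, assuming the pasted CBETA text keeps its Taisho reference at the start of each line (as in `0123b21爾時`). It removes the stray spaces left over from the printed line breaks, splits each line after commas, full stops, colons, semicolons, question marks and exclamation marks (full-width or ASCII), and emits a three-column HTML table. The `gloss` parameter is a hypothetical hook where a dictionary lookup could be plugged in; by default the English column is left blank.

```python
import html
import re

# Taisho line reference at the start of a line, e.g. "0123b21":
# four-digit page, register a/b/c, two-digit line.
TAISHO_LINE = re.compile(r"^(\d{4}[abc]\d{2})\s*(.*)$")
# Zero-width split point just after clause punctuation, full-width or ASCII:
# comma, ideographic full stop, colon, semicolon, question and exclamation marks.
CLAUSE_END = re.compile(r"(?<=[,\uff0c\u3002:\uff1a;\uff1b?\uff1f!\uff01])")


def parse_lines(raw):
    """Yield (taisho_ref or None, text) for each non-empty line, with the
    stray spaces from the printed line breaks removed."""
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        m = TAISHO_LINE.match(line)
        ref, text = (m.group(1), m.group(2)) if m else (None, line)
        yield ref, re.sub(r"\s+", "", text)


def split_clauses(text):
    """Split after each clause-ending mark, dropping empty pieces."""
    return [c for c in CLAUSE_END.split(text) if c]


def side_by_side_html(raw, gloss=lambda clause: ""):
    """One table row per clause: Taisho ref (first clause only), Chinese, English."""
    rows = []
    for ref, text in parse_lines(raw):
        for n, clause in enumerate(split_clauses(text)):
            cells = (ref if (ref and n == 0) else "", clause, gloss(clause))
            rows.append("<tr>" + "".join(
                "<td>%s</td>" % html.escape(c) for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    sample = "0123b21爾時\uff0c尊者跋陀羅比丘及尊者阿 難俱住祇樹給孤獨園\u3002"
    print(side_by_side_html(sample))
```

Grouping by Taisho line keeps each row traceable to the printed edition, while the clause split gives the shorter units that a side-by-side view needs.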


It is a JSON file in which each Chinese character is defined with an English translation.
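Taking that description literally, a per-character lookup can be sketched as below. The JSON excerpt and its flat character-to-string layout are assumptions; the actual file's name and structure on SuttaCentral may differ. A real file would be read with `json.load` on a UTF-8 file handle rather than from an inline string.

```python
import json

# Hypothetical excerpt of the lookup file, assumed to be a flat JSON object
# mapping one character to an English gloss.
SAMPLE_JSON = """{
    "四": "four",
    "阿": "(phonetic) a",
    "含": "contain",
    "義": "meaning; righteousness",
    "同": "same"
}"""


def load_glosses(text):
    """Parse the JSON text into a dict {character: gloss}."""
    return json.loads(text)


def gloss_per_character(chinese, glosses):
    """One (character, gloss) pair per non-space character; '?' if unknown."""
    return [(ch, glosses.get(ch, "?")) for ch in chinese if not ch.isspace()]


if __name__ == "__main__":
    glosses = load_glosses(SAMPLE_JSON)
    for ch, meaning in gloss_per_character("四阿含義同", glosses):
        print(ch, meaning, sep="\t")
```

On 四阿含義同 this gives separate glosses for 阿 and 含 instead of "agama", which illustrates why single-character definitions are a poor substitute for translating whole segments.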

The segmented texts like Bhante Sujato’s translations can have Pali as side-by-side, line-by-line or pop-up view. But these are proper translations of each whole segment and are far more than the dictionary definitions. I personally would not be in favor of making a dump of just the dictionary definitions for each separate Chinese character. I think it would be far more useful to spend some time at some point setting up Pootle with the Chinese texts so that somebody can properly translate them. After that they can appear in the same way: as segmented texts.

It’s a different form of Chinese with, if I’m not mistaken, different characters as well. Unfortunately, due to limitations of Chinese fontsets, the online Taisho texts have been changed slightly to accommodate this. Hopefully in the future we will have a better transcription.


Just to clarify, in the main, the ancient Chinese texts use the same characters as modern Chinese, although the meanings of words and turns of phrase may be quite different.

But in addition there are a fairly large number of glyphs that are not defined in modern Chinese Unicode, known as gaiji. The CBETA texts handle these by using a special markup to define these characters, which is really just an interim patch.
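To illustrate the interim-patch approach, here is a sketch that replaces gaiji references with real Unicode characters where a mapping is known and otherwise leaves a visible placeholder. The `<g ref="#CB00001"/>` markup form and the IDs are assumptions for illustration, not a statement of CBETA's exact syntax. The sketch also flags characters from Unicode planes 2 and 3, where the newer CJK extension blocks live and which many fonts still do not cover.

```python
import re

# Assumed gaiji markup for illustration: an empty element naming a
# character ID, e.g. <g ref="#CB00001"/>. CBETA's exact syntax may differ.
GAIJI = re.compile(r'<g ref="#(CB\d{5})"\s*/>')


def resolve_gaiji(text, unicode_map):
    """Swap each gaiji reference for its Unicode character when one is known;
    otherwise leave a visible placeholder such as [CB00002]."""
    return GAIJI.sub(lambda m: unicode_map.get(m.group(1), "[%s]" % m.group(1)), text)


def plane(ch):
    """Unicode plane of a character: 0 is the Basic Multilingual Plane."""
    return ord(ch) >> 16


def supplementary_ideographs(text):
    """Characters in planes 2 and 3, home of the newer CJK extension blocks."""
    return [ch for ch in text if plane(ch) in (2, 3)]


if __name__ == "__main__":
    # Hypothetical IDs; U+20000 is a real CJK Extension B character.
    text = '比丘<g ref="#CB00001"/>住<g ref="#CB00002"/>'
    fixed = resolve_gaiji(text, {"CB00001": "\U00020000"})
    print(fixed, [hex(ord(c)) for c in supplementary_ideographs(fixed)])
```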

Meanwhile, the SAT people in Japan have been systematically submitting such obscure characters to the Unicode consortium, and updating their corpus accordingly.

Thus currently the SAT edition of the Taisho canon has a better representation of the text using the correct Unicode glyphs. Unfortunately their text is deficient in other respects, notably in that it has no proper markup. So for the time being we continue to use the CBETA text, and wait for them to complete their Unicode upgrade over the next several years.
