Tool for Memorising Hanzi from Harivarman’s Satyasiddhishastra (Chengshilun)

That’s great news! Yeah, I bought a copy of the translation from the 70s - it’s more like a synopsis than a translation.

3 Likes

Yeah, in my opinion, the English translation of the Satyasiddhishastra isn’t good, unfortunately. But it has its merits, of course.
We are trying to develop a good translation that will be available for everyone.

1 Like

There’s this Buddhist Chinese - English dictionary, though perhaps @cdpatton might be better equipped to tell how legit it is. I’ve been using it for a while and it seems rather comprehensive.

The one issue with your list is that it’s a good stepping stone for general Chinese study, but if you omit repetitions, you’re also omitting compounds. I think it’s a good start nonetheless, but for a comprehensive study one needs to check for these manually. 一乘, for example, meaning “One Vehicle” (the translation of ekayāna), is easy enough to understand, but while 一 is one/same/unity and 九 means nine, the compound 一九 means Amitabha, and there’s no good way I know of to work that out from context.

That’s a most extreme example, and usually the compounds are legible if you know the general Buddhist vocabulary from Pāli / Sanskrit, but there are some edge cases like that. :slight_smile:
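In case it helps anyone scripting this: the compound problem can be partly automated with greedy longest-match segmentation. A minimal sketch, with a toy glossary I made up for illustration rather than entries from any real dictionary file:

```python
# Greedy longest-match segmentation: try the longest dictionary entry
# first, so compounds like 一乘 are caught before falling back to
# single characters. The glossary below is a made-up toy example.
GLOSSARY = {
    "一乘": "One Vehicle (ekayāna)",
    "一": "one; same; unity",
    "九": "nine",
    "乘": "vehicle; to ride",
}

def segment(text, glossary, max_len=4):
    i, out = 0, []
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in glossary:
                out.append(chunk)
                i += n
                break
        else:
            out.append(text[i])  # unknown character: keep as a single hanzi
            i += 1
    return out

print(segment("一乘九", GLOSSARY))  # ['一乘', '九']
```

It won’t disambiguate context-dependent readings like 一九, of course, but it at least surfaces compounds that a character-by-character list would miss.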

Otherwise, it’s rather easy to write a script that fetches the meanings of these from a common online dictionary into a simple text file. I had such a script for the Dao De Jing somewhere; I’ll see if I can modify it a little to use with this…

1 Like

I understand your point.

But, as the tool is focused on the Chengshilun, this isn’t such a significant issue: compound words recur more rarely in that text than in any other of Kumarajiva’s translations, so the context readily clarifies the meaning.

My Sensei, despite being a strong critic of Kumarajiva’s Madhyamaka thinking, believes that Harivarman’s text was the most clearly translated by Kumarajiva. It’s a lucid and extremely clear text with few compound words and many repetitions.

1 Like

Later, I can give some examples if it’s necessary.

1 Like

Since I can’t paste .txt files here, I’ll just post the Python script I ran to fetch the CC-CEDICT definitions. Pinyin diacritics were a bit problematic, but I think it works for the most part now. There are 14 characters without any definitions, which I think are simply not in the CC-CEDICT dictionary.

To run it, you need the CC-CEDICT file (cedict_ts.u8) in the same directory as the Python script:

And pasting that wall of Chinese text into an input.txt file should work for the most part.

And I’m pretty sure you can find a way to use the CC-CEDICT database directly for your own application. I’m convinced that the real challenge is handling the diacritic mapping rather than fetching the definitions themselves.

import re

# Path to your CEDICT file
CEDICT_FILE = "cedict_ts.u8"

# Input: text containing Chinese characters
INPUT_TEXT_FILE = "input.txt"

# Output: one character per line with pinyin and definition
OUTPUT_FILE = "character_definitions.txt"

# Tone marks
tone_marks = {
    'a': ['ā', 'á', 'ǎ', 'à'],
    'e': ['ē', 'é', 'ě', 'è'],
    'i': ['ī', 'í', 'ǐ', 'ì'],
    'o': ['ō', 'ó', 'ǒ', 'ò'],
    'u': ['ū', 'ú', 'ǔ', 'ù'],
    'ü': ['ǖ', 'ǘ', 'ǚ', 'ǜ'],
}

def numbered_pinyin_to_diacritic(pinyin: str) -> str:
    def convert_syllable(syllable):
        match = re.match(r"([a-züv]+)([1-5])$", syllable)
        if not match:
            return syllable
        base, tone_num = match.groups()
        base = base.replace('v', 'ü')
        tone = int(tone_num)
        if tone == 5:  # neutral tone: no diacritic
            return base
        # Standard placement rule: 'a' or 'e' always takes the mark,
        # 'o' takes it in "ou", otherwise the last vowel does
        # (so "liu2" becomes "liú", not "líu").
        if 'a' in base:
            idx = base.find('a')
        elif 'e' in base:
            idx = base.find('e')
        elif 'ou' in base:
            idx = base.find('o')
        else:
            idx = max(base.rfind(v) for v in 'iouü')
        if idx < 0:
            return base
        return base[:idx] + tone_marks[base[idx]][tone - 1] + base[idx + 1:]

    syllables = re.findall(r"[a-züv]+[1-5]", pinyin.lower())
    return ' '.join(convert_syllable(s) for s in syllables)


def clean_pinyin(pinyin):
    """ Clean out parenthetical tone markings (like '[tai2]') and convert them to diacritic form """
    # Convert pinyin inside square brackets (definitions or otherwise)
    pinyin = re.sub(r"\[([a-zü]+)(\d)\]", lambda m: numbered_pinyin_to_diacritic(m.group(1) + m.group(2)), pinyin)
    # Convert normal pinyin with tones
    pinyin = re.sub(r"([a-züv]+)(\d)", lambda m: numbered_pinyin_to_diacritic(m.group(1) + m.group(2)), pinyin)
    return pinyin


# Load CEDICT entries
def load_cedict(path):
    cedict = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            match = re.match(r"(\S+)\s+(\S+)\s+\[(.+?)\]\s+/(.+)/", line)
            if match:
                trad, simp, pinyin, definitions = match.groups()
                definitions = definitions.strip().replace("/", "; ")
                pinyin = clean_pinyin(pinyin)  # Convert pinyin to diacritic format
                # Also clean pinyin inside definitions
                definitions = clean_pinyin(definitions)
                for char in (trad, simp):
                    if char not in cedict:
                        cedict[char] = []
                    cedict[char].append({
                        "pinyin": pinyin,
                        "definition": definitions
                    })
    return cedict


# Read unique characters from input
def get_unique_characters(path):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return sorted(set(c for c in text if '\u4e00' <= c <= '\u9fff'))


# Main logic
cedict = load_cedict(CEDICT_FILE)
characters = get_unique_characters(INPUT_TEXT_FILE)

with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
    for char in characters:
        if char in cedict:
            entry = cedict[char][0]
            f.write(f"{char} ({entry['pinyin']}): {entry['definition']}\n")
        else:
            f.write(f"{char}: No definition found\n")

print(f"✅ Done! Definitions written to {OUTPUT_FILE}")
1 Like
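Since the thread mentioned reusing CC-CEDICT directly in another application, here is the tone-mark placement rule on its own, separate from the script, so it’s easier to port. A sketch, assuming CC-CEDICT-style numbered syllables like hao3 or liu2:

```python
# Standalone tone-mark placement: 'a'/'e' always carry the mark,
# 'o' does in "ou", otherwise the last vowel does. Tone 5 is neutral.
import re

MARKS = {
    'a': 'āáǎà', 'e': 'ēéěè', 'i': 'īíǐì',
    'o': 'ōóǒò', 'u': 'ūúǔù', 'ü': 'ǖǘǚǜ',
}

def mark(syllable: str) -> str:
    m = re.fullmatch(r"([a-zü]+)([1-5])", syllable.replace('v', 'ü'))
    if not m:
        return syllable
    base, tone = m.group(1), int(m.group(2))
    if tone == 5:  # neutral tone: no diacritic
        return base
    if 'a' in base:
        i = base.index('a')
    elif 'e' in base:
        i = base.index('e')
    elif 'ou' in base:
        i = base.index('o')
    else:  # last vowel takes the mark (liu2 -> liú)
        i = max(base.rfind(v) for v in 'iouü')
    return base[:i] + MARKS[base[i]][tone - 1] + base[i + 1:]

print(mark("hao3"), mark("liu2"), mark("lüe4"))  # hǎo liú lüè
```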

:heart_eyes:

Thanks a lot @Dogen .
I’m gonna try it now.

1 Like

This is gonna help a lot!

Really: many thanks!

1 Like

No problem at all, I’m glad that it’s of some help. :slight_smile:

Also, edit: if you’re going to expand the original tool to all the hanzi, I’d appreciate it if you kept a beginner-friendly first-300-500-words edition of your tool :sweat_smile:

I should fork it already really, but yknow… Laziness. :smiley:

1 Like

I’ll check, but I believe that 500 hanzi already cover more than 90% of the text.

If one masters these hanzi well, one can already handle the Chengshilun quite well. And, of course, there will always be the more “annoying” hanzi, and we’ll always use a dictionary for those. Even the Chinese themselves do. :laughing:
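If anyone wants to verify that kind of coverage figure, counting it takes only a few lines. A sketch; the sample string here is just a tiny illustrative stand-in, not text from the Chengshilun:

```python
# What share of a text do the top-N most frequent hanzi cover?
from collections import Counter

def coverage(text: str, top_n: int = 500) -> float:
    chars = [c for c in text if '\u4e00' <= c <= '\u9fff']  # CJK chars only
    top = Counter(chars).most_common(top_n)
    return sum(count for _, count in top) / len(chars)

sample = "一乘一乘九一"  # tiny stand-in for the real text
print(f"{coverage(sample, top_n=2):.0%}")  # 83%
```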

1 Like

Yes, and the new translation shows that N. Aiyaswami Sastri completely mishandled and mutilated all the complex and tricky places in the text (and there are many). He is often completely wrong about the meaning.

1 Like

Hi, everyone!

I have created another tool. This one is more like a Tetris game.

You can check it out and please let me know what you all think.

@Dogen

3 Likes

Yeah, people sometimes think learning Chinese is more difficult than other languages because of the number of “characters” - but actually they are words. Once you know a couple thousand words, you are doing pretty well. Then you learn more and more unusual words with experience like one would learning any other language.

1 Like

I spent more time with this version than with the last one I tried, but it’s a little too easy to be helpful. I’m getting the impression that the 3 wrong answers out of the 4 options are almost always the same.
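For what it’s worth, one common way to fix repetitive distractors (a sketch of the general technique, not a claim about how the tool actually picks them) is to sample the wrong options uniformly from the whole glossary each round:

```python
# Sample 3 distractors at random from the full pool, excluding the
# correct answer, then shuffle so the answer's position varies too.
import random

def pick_options(answer: str, pool: list[str], k: int = 3) -> list[str]:
    distractors = random.sample([w for w in pool if w != answer], k)
    options = distractors + [answer]
    random.shuffle(options)
    return options

pool = ["one", "vehicle", "dharma", "mind", "emptiness", "suffering"]
print(pick_options("dharma", pool))
```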

Oh, I see. I am gonna fix it.
Thanks a lot :pray:

1 Like

Fixed it.
Please, if you notice any issues, let me know.

1 Like

Mates, I’ve noticed that TetrisChengshilun behaves inconsistently across browsers. As a backend engineer, I’m struggling to make it fully responsive on tablets, desktops, and mobile phones. I’ve primarily tested it on Safari for iPad, my preferred browser for practice, and want to maintain its current functionality there while ensuring compatibility with others. If any frontend developer is willing to assist, please reach out. I believe it’s a simple task for a frontend developer.

1 Like

Men will literally write a Tetris game rather than use Anki flashcards. :joy:

In all seriousness, it’s fun! I seem to know more words by heart than I thought I did…

2 Likes