Seeking insight on SC publication & AI

I’m in a pickle and I figure it’s best to lay it out for a public question to Ven @sujato. I would also welcome the PoV of people like Ven @vimala on this thorny subject. :slight_smile:

For publishing my Therīgāthā translation, I’ve been keeping in touch with the local Sri Lankan embassy for a while now. So far they’ve been the only party interested in handling both the publishing and the funding of my work, for Pāli → Turkish.

Now, they’re in correspondence with the Athens Theravada Centre, who are working on their own private AI for Pāli translations. For the first stage, they want to use AI to translate my work into English, and a general English Therīgāthā translation into Turkish, to measure the fidelity of my work.

There’s only a single DHP/SNP translation available in Turkish, and I won’t beat around the bush - if you only Google translate the 50 most common Buddhist terms in those books, you’ll know that it’s no source material. So there’s very little established Buddhist terminology in Turkish, and I’m sailing uncharted waters.

So, to me this sounds like a benign use of AI: not as a source of authority, but as a way of making sure things are as accurate as they can be. Using Google Translate for the same purpose would be the exact same thing (and would still be using AI, not to mention the data breach).

But they would also like me to contribute to the development of the AI by providing terminology, support, etc.

Now personally, I don’t use AI in my translations, not just because I want to stay affiliated with SC if possible, but because it’s just not that good. Translating something like Vism would take the same amount of time with or without AI, if done right. The tech isn’t there yet, so it’s not part of my work.

Now, discussing AI in any fashion around here is hard. All I’m asking is, is it possible to contribute to SC, given that I don’t use AI for my translations, but I’m still allowing chosen parties (with monastics directly involved in the projects) to use my work for AI development?

1 Like

This sounds like it’s anything but benign.

If I am understanding it correctly… they are trying to use AI to evaluate your work? How can anyone see this as an appropriate usage of AI? I don’t mean this in some alarmist kind of way. It just seems like a profound misunderstanding of how LLMs should be used. Especially on minority languages like Pali and Turkish.

If they trust you so little that they would think that AI could be trusted more… gosh, that sounds like bad news. AI would need to be several degrees better than you if it was going to be used in this way. But I’m sure it is much, much worse.

Honestly, I’d just show them the exhaustive thread you participated in here on D&D as you did your work. If they don’t trust you after that, well, then it’s their loss.

But what you are really asking is whether you are somehow breaking your promise to SC by helping them develop their own AI by providing a term glossary? I don’t think that has any bearing on you publishing translations on SC that make no use of AI. Am I missing something?

5 Likes

It’s really less about trusting or not trusting me per se, and more about using this as an excuse to test the machines (people want to play with their toys), to see what comes up. Personally, I don’t mind scrutiny, and from what I’ve heard from the man in charge of the project, it’s still in development, so there’s no “Machine says this, so you’re wrong” assumption. It’s also a custom-built AI specifically trained on Pāli and corresponding translations.

But people want to see, roughly, that the text is what it claims to represent (I mean, remember the OG Thig fiasco…), so that the red tape for getting govt. funding for my publication is made easier, while in practice also testing how accurately the machines work.

Exactly my question, yes.

I would hope that it’s fine, but I really don’t want to presume. :slight_smile:

Thanks a lot for your encouraging words and support though Bhante. :heart: :lotus:

3 Likes

Well, it certainly didn’t take an LLM to see the problem there. All someone would need to do is speak Turkish and English and compare your work side by side. That it is a translation should be obvious.

I do think it’s good to ask/discuss. On this I can only give my opinion. If it were me and I felt obligated to participate, I would strongly suggest that Pali verse should not be used to train an LLM. It’s so idiosyncratic and the grammar is very… irregular. But establishing a whole Turkish Dhamma glossary—that’s a serious job and one that would probably be best done by a group of experienced people. Or you should spend a hundred years experimenting :grin:

2 Likes

:joy: Which is why I picked Thig first, because at least it only uses basic terminology so that I could spend a good amount of time getting those right, before moving on to more thorny doctrinal elements.

Early on, I tried to find a Turkish-speaking monastic on D&D and DW, without success. I’ve also been hunting for old Buddhist texts in Turkish. There are a few popular books from the ’70s, when people took such things (like proper translation and editing) a bit more seriously, and some of the terminology was quite useful, even if a tad outdated at times.

That’s actually very important insight that I’ll be sure to pass along.

1 Like

I agree. Alarm bells went off for me too.

1 Like

Okay. How else should Dhamma translation and glossary work begin, for a language in which it doesn’t really exist yet, and which no known monastics speak? :grin:

Also, I should clarify this, because maybe what I’ve said came off wrong: I said that they were interested in using my work to build a Turkish glossary, not in taking me as the authority on all things Pāli or Dharma, or in using the work with no oversight at all.

There are psychologists and linguists working on the Greek work, and they want to use my work as a first stepping stone with a similar group of assorted experts. So I am neither posing as the sole authority here, nor alone in building the glossary.

So if you still find something alarming about this, I’m all ears. :slight_smile:

2 Likes

I apologize for perhaps sounding curt; no criticism or critique was intended! Since the Venerable Sujato’s 16-part series on AI and the use of AI with Buddhist texts, I’ve become much more aware of the ramifications of AI on many levels, and when it overlaps with Buddhist texts and translations, my “alarm bells” go off.

2 Likes

Yes, I also didn’t mean to say you shouldn’t be involved in establishing a vocabulary. I think you should!! Just that it is a serious thing that should probably involve lots of people if there is the hope of speeding up the timeline that English has taken to accomplish the same thing.

2 Likes

Last I checked we were still discussing what Vedana means a few months ago… So can’t go any slower than that! :joy:

No problem. As I said, discussing AI is hard. :sweat_smile:

Which is why I think monastics working directly with such projects is important (Bhante @sujato visited some ground-level testing in the past, and Ven. @Vimala is still on the CBETA team, to the best of my knowledge), to highlight the challenges of such an undertaking. To me, people bumping into a wall trying to make machines do these translations only makes the human translations more valuable.

And this kind of work opportunity means I’ll get in touch with the aforementioned specialists and get to discuss the thorny sides of the language, which should help with my main work.

I wouldn’t have sought out something like this personally, but it kinda fell into my lap from the Sri Lankan embassy - the governmental & monastic oversight of the project is making me confident it won’t be a rogue, harmful endeavour (although perhaps one shouldn’t assume these things anyway).

Too true!

I posted a news article in some AI post a while back that the SL govt was looking to use AI to spread the Dhamma. Maybe this is part of that initiative.

1 Like

It might be. I think Mr. Michael Xynos, SL’s honorary consul in Greece, is playing a major role in it, and is probably the head of the whole enterprise. As I said, the monastic sangha is involved as well, so it’s probably the same clockwork.

I’ll keep you posted once I get more concrete details.

Edit: Yes, it’s this gentleman and the same Athens Theravada Centre mentioned above.

I’m mostly a spectator in this thread, due to my lack of qualifications. That said, I highly respect your work here on D&D.

I see the potential ethical dilemma you’re presenting.

You make it clear that there is a dire lack of Turkish-Pāli resources. I wouldn’t have known and appreciate understanding this now.

In my (not-famous) hand-drawn AI pie chart, I reserve 15% for Useful AI. Mostly we think about medical research and some specific types of clinical work.

For me, your scenario falls into this piece of the pie. If I were the project lead with funding in hand, I wouldn’t see any other viable way to verify the reliability of your Turkish translations from Pāli. Sitting there with the government officer, they would not give me the time of day if I said to them, “We have identified someone to start the translation project but there’s no way to verify the quality of their work upfront except for AI which we will not use on principle.”

I think, at that point, they’d look for a new project lead.

A dear nun recently mentioned to me, in passing, that an elderly someone standing close by had recently subjected their dog to euthanasia. She said, I don’t think we should ever do that to a dog. I had to catch myself for a half-minute because I couldn’t imagine my deceased pets having suffered through any more physical anguish at the end of their lives without my taking that step.

Obviously I’m mixing apples and oranges here (if not fruits and vegetables). But the point being, at certain points in time our ethical posture must make room for context.

Anyway, I hope this project works out for you and for the good of the dhamma for Turkish speakers.

2 Likes

I think the proper thing to say is:

“…but there’s no way to verify the quality of their work upfront except for AI which we will not use because it does not work.”

This for me has little to do with ethics as far as AI is concerned. Other than the ethics of using a tool/metric that is inaccurate.

How would this problem have been solved two years ago? Or three? Because it sure wasn’t going to be AI back then.

2 Likes

Well, going back to your earlier comment

I do feel that if public sector funds are in the mix here, the funding organization is responsible for verifying the reliability of someone’s work. Absent any other way of doing so (everyone’s starting from Ground Zero, based on the OP), why not use LLMs judiciously in this case? They can provide some verifiable level of assurance IMO.

Eventually there would be a growing network of human translators who are native Turkish speakers and study Pāli directly. I have to think this is how it would evolve, once a broader initiative gets off the ground.

The fact it hasn’t been solved yet says a lot. That’s why I say judicious use of LLMs here, with the assumptions I stated.

Because translating from one minority language to another is precisely something LLMs are bad at. It all comes back to the guy looking under the street light for his keys cause that’s where the light is.

And this is what is unethical, namely using tools inappropriately.

Clearly it is not impossible to find someone who speaks Turkish and English. All that person has to do is compare the Turkish translation to an English translation of the same text and they can tell if they are wildly different. There is no need for AI other than to sound like you are using some fancy technology.

3 Likes

How very exciting and frightening at the same time! :nerd_face:

Yeah, I also had no idea.

As to the OP’s question of verifying the fidelity of the translation: I don’t know about AI and LLMs and how they work, so I can’t really comment on this. Human proofreading is certainly the gold standard, but I would caution against just leaving it to people who are only fluent in Turkish and English. Maybe as a first step, but certainly not as the final vote.

Potential proofreaders of Dogen’s work need to be native (or C2) speakers of the target language and be familiar with the source language, Pali. Turkish and English are just too different, and allegedly ‘easy’ terms can turn out to be quite tricky, so sooner or later you need to be able to go back to the source language to weigh translation choices. I just did a quick search on the common challenges for both languages (English and Turkish), which confirmed this.

So, anyway - good luck Dogen - keep us posted!


For some language fun: This is what happens if you mean well but have no idea how stuff works (and I’m obviously not referring to the OP as he’s a native speaker) :smile:

3 Likes

If the parties are planning to compensate you then I would cooperate with them and their use of AI as much as is needed.

In any case keeping your hand in and your eye on the matter means that you will at least have some influence on the veracity of interpretation as you see it.

In the long run you will be contributing your view of wisdom to the future.

2 Likes