AI-1: Let’s Make SuttaCentral 100% AI-free Forever

I don’t deny the existence of the elite class’s involvement in Pali translation, and that of course opens up a whole different form of bias in translations. However, the three most prolific translators of Pali into English (Bhantes Bodhi, Thanissaro, and Sujato) are not a part of that. As far as I know, all three are mostly self-taught, although Bhante Bodhi would have had Sinhala teachers as guides.

I’m not so sure I agree. But I recognize many people would feel this way.

I think on an individual basis the problem of consuming “wonder bread” translations is not a big deal. It’s more that when the market is saturated with one thing, the other doesn’t have a chance.

But to your point on democratizing Pali education, I think the better solution to the problem of access is increasing the amount of language learning materials. But that isn’t a problem that either of us can solve.

I sincerely hope you can find a way to work this out.


But did this ever happen? I.e., do we have evidence of people disengaging from language learning because of the availability of machine translation?


To be clear, this has nothing to do with copyright. The state of play is that it is generally considered legally permissible to use copyrighted works when training an AI.

This is being heavily contested, and it is possible, even likely, that it will change in the future. If this happens, it would expose all the big AIs, as they would have to delete models that have cost billions to develop. Copyright applies pre-emptively, so that as a rule, things are under copyright unless they are released from it. Clearly existing copyright law is inadequate, and it is unclear how this will evolve in the future.


Requiring prescience of yourself 2 years ago seems unreasonable. :heart:

We are here now, looking ahead to MaraAutomation. In that Maraverse of desire fulfillment, the integrity of truth will become a beacon of light in a sea of mud. The 100% AI-free rule has firmly laid a spartan, rock-solid foundation of integrity for the way forward, with a commitment to build and maintain continued trust in SuttaCentral and its work for people taking refuge from Maramadness.

May your heart find a way to help us without compromise.
:pray:


Not at all, you made the best decision with the best intentions at the time. As did I when, for the past several years, I supported your work, encouraging the use of my translations, promoting BuddhaNexus on SuttaCentral, even donating hardware on which it was developed.

Over time, the situation with AI has changed, and with it, my perception. But that doesn’t say anything about your intentions. You’ve been most considerate in wanting to respect my wishes, and I am sure your own understanding is also evolving.

In terms of the changes that need to be made, take your time. I’m not pushing anyone to do anything. On the main SC site, on this forum, and on Voice, we’re discussing things and working out the details.


Couldn’t you finish your PhD without sharing the weights and datasets? You can still show how different prompting would affect the translation quality, and visualize token embeddings and their distances from core concepts. You can also show how well the model does on general translation tasks between these languages, with metrics broken down by model parameter count. These are all useful findings for LLM research, and they would not result in the automatically generated translations being used by anyone. You would still need to delete the weights and the datasets, but that is something you can explain in the thesis. I think it’s in fact a really good research finding to show that a community, despite its interest in the technology, eventually decided to move away from LLMs due to their limitations. You can show some of the mistranslations that can only be spotted by expert translators and could have caused issues for Buddhism if left unchecked.

Exactly, LLMs can only have experience from their inputs, which is in this case text only.
I think it is totally fine for AI not to have a heart, because that also means it cannot really have defilements. So in theory, it is possible for LLMs to pick up core concepts that most of us struggle with due to our defilements, to build a useful representation of that context and related concepts, and to produce outputs aligned with them (although these concepts would still only be text-based). Nevertheless, even if an LLM could generate better translations than a layperson would, the issue is that there is simply no way of telling whether the model is just making things up or is indeed returning something useful. Even if an LLM learned to mimic an understanding of the nuances in translation, given the way tokens are sampled, the longer the text, the more likely it is to stray off and make things up at random.


That’s unlikely to be the route I take, as I believe in the beneficial impact of open-source data and models. Also, I wouldn’t consider my work LLM research; I stay below 7B parameters. :slight_smile:
I am likely to switch to datasets whose authors have not voiced public opposition, e.g. Ven. Analayo, who is supportive of my work. I can’t ask permission from everybody who ever created something, but at least I can respect the wishes of those who have voiced their opposition so publicly.


Ayya @Sabbamitta and I have decided to shut down voice.suttacentral.net. Voice.suttacentral.net is no longer available.

For SuttaCentral.net to be 100% AI-free, I can see the following additional considerations:


Just to avoid any confusion: voice.suttacentral.net was the old version of Voice which was going to die soon anyway. For those who wish to listen to the suttas there is the new version, SC-Voice.net, which is hosted on a server that does not belong to SuttaCentral, and doesn’t have suttacentral in its name.


Bhante @sujato, can you please clarify this point? I seem to remember you saying that using this tech in search was fine.

We don’t have any MT-only interface translations; I would never have done that anyway. Several started out with Google Translate as a basis, but all were checked by humans.


We’ll have to consider these details carefully; I don’t have all the answers. What I would like is for the various contributors to think about what I’m saying and reach out to me if you can. Maybe soon we could have a Zoom session or similar to discuss policy in more detail?


@sujato The last I checked, your translations were CC0 (a public domain dedication). AI plays the “fair use” game with more heavily copyrighted material. With a CC0 declaration, AI or anyone else can use your translations as they please. But that is good… It makes them totally open and free for people to change some of the translations and make them better for others.

I probably would have gone with CC BY-NC. But from a dhamma point of view, CC0 is the least attached way of giving a gift of dhamma. Also, gpt+ is very different from gpt, especially in its ability to translate pāḷi. It does not use your characteristic words when translating.

As for SuttaCentral Discourse… perhaps that is what you are speaking about.
Perhaps when you say copyright, you are referring to material on Discourse.


Right, which would be relevant if we had a more restrictive license.

I mean everything.

@Snowbird, “Respectfully requests” is not a licence agreement, especially when coupled with CC0.

@sujato I wonder if you have used the premium gpt+. I have access to it if you want to see how well it translates text, even from the commentaries. It knows pāḷi grammar quite well, though not yet perfectly. The difference between gpt+ and gpt is huge, and an even larger gap will appear when GPT-5 comes out. GPT-4 Turbo is not GPT-4, and there is a reason for that: a lack of processing power is what makes it “turbo”. It is not perfect, but you can tell it knows a lot. The text below has not been edited for content.

Here is a random quote and translation from the Aṅgulimāla Sutta and its commentary.
Myanmar edition, Majjhimapaṇṇāsapāḷi, page 302, paragraph 348

  1. atha kho bhagavā tuṇhībhūto agamāsi.
    Then indeed, the Blessed One, being silent, went.*
  2. addasā kho coro aṅgulimālo bhagavantaṃ dūratova āgacchantaṃ.
    The robber Aṅgulimāla saw the Blessed One coming from a distance.*
  3. disvānassa etadahosi – "acchariyaṃ vata, bho, abbhutaṃ vata, bho!
    Upon seeing him, he thought, "Wonderful, indeed, sir! Marvelous, indeed, sir!*
  4. imañhi maggaṃ dasapi purisā vīsampi purisā tiṃsampi purisā cattārīsampi purisā paññāsampi purisā saṅkaritvā saṅkaritvā paṭipajjanti.
    On this path even ten men, twenty men, thirty men, forty men, fifty men, walk together for safety.*
  5. tepi mama hatthatthaṃ gacchanti.
    Even they fall into my hands.*
  6. atha ca panāyaṃ samaṇo eko adutiyo pasayha maññe āgacchati.
    But here, this ascetic comes alone, without a second, confidently, I think.*
  7. yaṃnūnāhaṃ imaṃ samaṇaṃ jīvitā voropeyya"nti.
    Why not I deprive this ascetic of his life?"*

Commentary: 348

  1. saṅkaritvā saṅkaritvāti saṅketaṃ katvā vaggavaggā hutvā.
    “Saṅkaritvā saṅkaritvā” means having formed a pact, they travel in groups.*
  2. hatthatthaṃ gacchantīti hatthe atthaṃ vināsaṃ gacchanti.
    “Hatthatthaṃ gacchanti” means they meet with destruction at the hands (of danger).*
  3. kiṃ pana te bhagavantaṃ sañjānitvā evaṃ vadanti asañjānitvāti? asañjānitvā.
    Do they speak thus knowing the Blessed One or not knowing him? Not knowing him.*
  4. aññātakavesena hi bhagavā ekakova agamāsi.
    Indeed, the Blessed One went alone in the guise of an unknown person.*
  5. coropi tasmiṃ samaye dīgharattaṃ dubbhojanena ca dukkhaseyyāya ca ukkaṇṭhito hoti.
    The robber at that time was weary due to long suffering from poor food and uncomfortable sleeping conditions.*
  6. kittakā panānena manussā māritāti? ekenūnasahassaṃ.
    How many people had he killed? Nine hundred and ninety-nine.*
  7. so pana idāni ekaṃ labhitvā sahassaṃ pūressatīti saññī hutvā yameva paṭhamaṃ passāmi, taṃ ghātetvā gaṇanaṃ pūretvā sippassa upacāraṃ katvā kesamassuṃ ohāretvā nhāyitvā vatthāni parivattetvā mātāpitaro passissāmīti aṭavimajjhato aṭavimukhaṃ āgantvā ekamantaṃ ṭhitova bhagavantaṃ addasa.
    Thinking that by capturing one more he would complete a thousand, he resolved that whoever he first saw, he would kill to complete his count, then after shaving his hair and beard, bathing, changing his clothes, he would go to see his parents, emerging from the forest to the edge of the forest, and staying at one side, he saw the Blessed One.*
  8. etamatthaṃ dassetuṃ "addasā kho"tiādi vuttaṃ.
    To illustrate this matter, it is said, “He saw, indeed…”*

You might be worried about AI taking SC data, but AI companies are clearly allowed to do whatever they want with it… even resell it on Amazon, as some people do with public domain texts. However, the real problem with AI might be one you have not yet contemplated: taking away many jobs. Robots are catching up. Twenty years ago, my idea was that robots would be ahead of AI. When the two come together within the next five years, many people will be out of work. It is already happening in the white-collar world, to programmers, a job we thought would be the safest from AI. Copilot and gpt+ are also very good at programming, and even at writing poetry. Customer support is on its way as well. One of the first things it did besides programming was perfect its legal knowledge to combat lawsuits. There are even thoughts of having AI world leaders. Let us hope that somehow some dhamma gets programmed into AI when and if that happens.

It will be interesting to see how and where it gets its data (which is secret) and what the result is. SC, while popular, is just a very small fraction of Theravāda Buddhism; in fact, SC is not Theravāda and bills itself as general, “open” Buddhism. AI knows the Sinhala and Myanmar languages much better than Google Translate does, but written, digitized data from those two countries is not so common. The audio dhamma talks are there, though. I’m not sure how well it can understand these languages from recordings or videos. There is a huge wealth of information and traditional dhamma talks if it can grab some of this data.

Another scary thing is the clickbait algorithms that rank posts. Just as a kid might think a lot of this content is real (not scripted) and normal (what happens in ordinary life), AI might think the same. It would be good if AI got some dhamma into its knowledge base.

Could you please refrain from posting AI junk on this thread? Thanks.

I feel like you’re not really hearing the things that I am saying. :pray:


Bhante, I feel that is a divisive comment. There are those of us who use AI out of necessity. I use AI to hear the teachings.

I understand that AI is causing tremendous pain and that you perceive “AI junk”. But I believe your comment can only be read as hostile. Since we are discussing how to make SuttaCentral free from AI, it seems reasonable to allow people who use AI to express their opinions.

mn8:12.26 ‘Others will be hostile, but here we will be without hostility.’

If I have misunderstood your intent, please excuse my misunderstanding.

I propose that we all look at the actual teachings in our discussions. In particular, I believe the following passage is foundational to our discussion; it quite succinctly summarizes the problem we face.

an1.130:1.1 “Mendicants, those mendicants who explain what is not the teaching as the teaching are acting for the hurt and unhappiness of the people, for the harm, hurt, and suffering of gods and humans.”

an1.131:1.1 “Mendicants, those mendicants who explain what is the teaching as not the teaching are acting for the hurt and unhappiness of the people, for the harm, hurt, and suffering of gods and humans.”

I believe that the above should be a published part of whatever license we craft regarding the teaching. We, as a group, must acknowledge the great responsibility that we hold together in our hands. The above quote is stark and clear. It has no wiggle room. It also places the consequences of whatever we do with the teachings squarely on our shoulders. The Buddha is not a traffic cop, but there is a very clear warning in the above that we all need to acknowledge.

In addition, I propose that instead of waving our hands at “Buddhist values”, we quote exactly what the Buddha said we should do. For me, that is MN8, which has wonderfully clear instructions. For example:

mn8:12.4 ‘Others will steal, but here we will not steal.’

I think the license should include a code of conduct such as the one in MN8. It should also include something like “We abide by the code of conduct of MN8. By using this content, you acknowledge that you will do the same.” It is not legally enforceable, but it is quite clear.

In other words, let us ground ourselves in the suttas, speak according to the suttas, and quote the suttas in our thoughts, speech and actions. I see too much hand waving, shouting, and fear and loathing running through this thread. Let us mind our own code of conduct here. The highest code of conduct is held by monastics, but what we are discussing concerns lay people. Lay people also need a code of conduct, clearly stated for Buddhists and non-Buddhists alike. Let us state it in terms we can all agree on. I think we can all agree on the suttas, so let’s use them.


As one who is automatically pushed to moderation (maybe I can get this “auto-mod” status removed?), I was wondering why this kind of remark came from the chief moderator himself.

In any case, the topic is AI. I believe I was on topic in posting a translation by gpt+, which might not be available for the rest of the world to see. Most of the world sees the free GPT-3.5 version, or even worse, the Bing version. Among the possible new guidelines for handling AI suggested in this thread was that anyone who posts AI output should label it as such. I complied with all of that.

I think AI is not there yet when it comes to answering dhamma questions that involve finding texts; that is indeed “junk”, because there are just too many mistakes or hallucinations. However, as for translation and pāḷi knowledge, I and others believe it is quite good. An extremely skilled pāḷi scholar’s response was “wow” when I showed him some samples. He is an abhivamsa monk, holding a special degree in pāḷi, and it is said that there is no pāḷi word that someone with his degree cannot understand. I once passed a Petavatthu story to another monk, who spent five full days translating it. He said the AI translation was not perfect, but if he had had it as a base, the work would probably have taken only 3-4 hours from start to finish instead of five full days.

So I don’t think it is “junk” and neither do 2 experienced pāḷi monks. Perhaps you can explain why the sample I posted here is junk? Furthermore, I would like to point out that, “What you see here is the worst it will ever be. It will only get better.”

Since you are a pāḷi translator, please point out where it was extremely wrong. Isn’t this part of the topic?

I am hearing that you are afraid about AI using the SC data found here and everywhere for dhamma purposes. If it is quoted in dhamma writings, it will re-verify itself based on those writings that quote AI.

I keep reminding you of the generous and wonderful CC0 declaration: its benefits, and how one can be taken advantage of under CC0. For instance, someone can freely sell your translations without giving credit if they want (even though you kindly request that courtesy). I do not see any problem whatsoever with AI using data that has been declared CC0. I tend to agree with fair use laws. If I read a freely available website (even a copyrighted one) on how to fix a Toyota and then write my own book on fixing cars, I don’t see any violation.

Perhaps you can elaborate what I’m not hearing?


The danger here is in the “quite”. It is deadly in the way that eating “only a bit of lead” might seem quite good. Lead and incoherence accumulate a toxicity that will destroy the Dhamma.

an1.130:1.1 “Mendicants, those mendicants who explain what is not the teaching as the teaching are acting for the hurt and unhappiness of the people, for the harm, hurt, and suffering of gods and humans.”

an1.131:1.1 “Mendicants, those mendicants who explain what is the teaching as not the teaching are acting for the hurt and unhappiness of the people, for the harm, hurt, and suffering of gods and humans.”

We absolutely need a refuge from AI. Bhante’s 100% AI-free solution is exactly that. With a 100% AI-free SuttaCentral, SuttaCentral monks can say definitively “that is the teaching” or “that is not the teaching” by examining any text presented by a human, a machine, monkeys at a typewriter, or Klingons.

But having a refuge from AI doesn’t solve the AI problem for the rest of us who have to live surrounded by AI. We need help discerning truth from lies and deception. With a 100% AI-free SuttaCentral, a distance appears between the AI world and SuttaCentral. How do we bridge that gap? How do we conduct ourselves in a sea of deception and lies? AI will soon be doing news broadcasts and summarizing our world in AI pablum, casually fed to us in bite-size chunks filled with advertisements. Can we all just run away somewhere? No. We are stuck here, and we have to deal with it somehow.

Telling lay people that AI is tainted and evil does not help at all. It leads to irrational behavior. It leads to extreme prejudice. The problem with AI is that it can generate deception as easily as truth. And the most dangerous AI is the one that is “quite good”. It is the “quite good” AI that can foster a sense of trust that will be betrayed casually in a random moment.

I myself have got rid of Google Assistant because it is too damn smart. I use Siri because she is so aggravatingly stupid that I am never tempted to trust her utterances.
DeepL is, likewise, so damn stupid that I have to be as vigilant as a cat waiting for a mouse to appear.
