AI-1: Let’s Make SuttaCentral 100% AI-free Forever

sujato · April 7, 2024, 9:27pm

AI is exploding. The underlying ideas have existed since the 1980s, but it is only with the release of OpenAI’s ChatGPT that it rose to dominate tech discourse. In the past, I have been interested in learning about the technology, and excited about possibilities. As time goes on, however, I have grown increasingly skeptical, to the point where I now believe that we need to ban the very existence of “strong AI” or “AGI”, by which I mean any computer agent that appears human. Even the weaker, but still powerful, forms of AI available today have over-hyped benefits and under-appreciated harms.

I’ll discuss these things later on, but for now, I want to make a firm proposal. SuttaCentral in all its aspects, including this forum, should make a commitment to being 100% free of AI content. No machine-generated content should be allowed in any of our platforms, promotions, data repos, or anywhere else. This applies even if edited by a human. That does not mean that there cannot be any AI behind the scenes, for example as coding assistants, search, and the like. Such things have drawbacks and benefits, and it is practically impossible to avoid them completely at this point. So the ban affects the actual presence of machine-generated text.

One reason for this is that, very rapidly, the internet is being filled with AI slop. A torrent of websites, apps, blogs, forums, spam, social media, advertising, you name it, is full of machine generated gunk. This ruins the entire digital landscape, starting with search engines. Very soon, if not already, human-generated “content” (O how I loathe that word!) will be swamped by AI slop, and the “better” the AI gets the harder it will be to tell the difference. The corollary of this is that in no long time, the few bastions of genuine humanity will be prized like precious gems.

As to use of our content by others, in particular scraping translation data to use for machine translations, this is not something that we can legally prevent. Even if there were copyright protections—which there are not as of yet—then our permissive CC0 license allows any kind of usage.

Nonetheless, we do ask that people use our work in the spirit of the Buddhist tradition. This, in my view, rules out any human-imitating AI applications such as a “Buddhist chatbot” or “AI translations”. The Buddhist tradition has always regarded translation as a sacred activity, and the texts produced as the Buddha’s words. Translations require not merely linguistic knowledge, but a deep understanding of the Dhamma and its expression.

I therefore ask that our content be left out of such projects. If someone wants to make a “Buddhist” chatbot or similar, I request that not only should they not add our work, they should not use general products like ChatGPT that already contain it. I also request that they delete any models trained on our work, as well as any textual corpus that has been generated with it.

So we should make this strong commitment and advertise it, badging our websites and books as “Not By AI”. There are already projects that do this, showing how broad the concern is.

If someone believes that AI has major benefits that we are missing out on, then great: build it yourself and prove me wrong.

sujato · April 7, 2024, 9:31pm

Just a note here. This post is the first part of what has become a long essay on AI. I’ve been working on this for the past few months, during which time I’ve been trying to clarify my ideas and my intentions.

I’ll post the essay in sections over the next while. It’s not fully finished, and TBH I want to get it done so I can focus on things that matter.

Anyway, so if you want some clarification as to why I think like this, it’s coming! Or maybe it’ll clarify nothing!

Meanwhile, there are already one or two examples of edited AI text in our repos. I haven’t checked in detail the status of these. But if you’re involved with this, please do contact me and we can chat about how to proceed.

christie · April 7, 2024, 9:45pm

Thank you, @sujato. I fully support your position on this.

It is kind of weird, because as a child I read science fiction books, and I particularly liked Asimov’s Foundation and Robot series as well as Frank Herbert’s Dune, and I simply could not comprehend why anyone would be “against” the very idea of artificial intelligence.

But now I understand. In a way, though, AI is interesting in that there are parallels to our sense of self, which is also artificial and constructed. So, seeing AI all around me is just a reminder that “I” am also like that.

sujato · April 7, 2024, 9:54pm

Greetings, fellow geek!

I’m not sure if you’ve heard this story, but it made me happy, so I’ll tell it again.

A few weeks ago, just as Dune the movie was released, I had a dana invitation with a Sri Lankan family. The owner, Pauline, had lost her husband a year ago.

They wanted a talk, and I’m thinking, “should I talk about Dune and AI? Probably a bad idea, they won’t know what I’m on about!”

Then I overheard some of them talking, and they had just seen Dune and loved it. Okay, I thought, let’s go!

Once I started, I learned something amazing. Pauline not only knew about science fiction, she worked for Arthur C. Clarke! At seventeen, looking for her first job, her father heard about this English writer who had just set up residence in Colombo, and was looking for a secretary. So she goes along and gets the job!

She worked for him for three years, which she loved. She said there were always interesting people coming and going, and fascinating conversations. And Clarke himself was always kind and inspiring.

But, she said, she didn’t really get the significance of it at the time. Only after she left did she realize what a special experience that was.

While she was there, she typed out a number of his manuscripts. So there’s a good chance that the books she typed as a young woman were then read over a decade later by a teenager in Perth, hunting through second-hand bookshops for good sci-fi!

christie · April 7, 2024, 10:13pm

That’s a lovely story, Bhante. It reminded me that a lot of A. C. Clarke’s stories seem to be inspired by Buddhism.

One of the books that inspired me when I was a child was “The Buddha’s Explanation of the Universe” because it read like a science fiction book. It claimed the Buddha knew about atoms, and the theory of the expanding and contracting universe, and drew parallels between Buddhist cosmology and modern physics. The author claimed all the information was in the Abhidhamma. The concept of mind moments of course blew my mind, and I still think of the mind today as a kind of computer, and each mind moment is like a clock cycle executing an instruction.

Of course, now that I’ve read the Abhidhamma, I know better. But if I had never read that book, I may not have made the journey, so in that sense that book was very valuable to me. I am sure there are many Buddhists today who are attracted to the religion because of the various planes of existence and devas etc.

Snowbird · April 7, 2024, 11:13pm

It’s interesting that in the guidelines for using their badges, they state that AI-translated content is allowed. While not as problematic as original content created by AI, I think that publishing AI translations is also problematic since you will still get AI-on-AI feedback loops. That is, AI-generated material ends up being used to train other AI translation tools.

sujato · April 7, 2024, 11:34pm

Right, yes, it’s very difficult to draw boundaries.

It’s a curious distinction, because in the case of copyright, the translation is considered an original creative work.

One distinction that I think is relevant is between server and client. When someone serves AI content, it gives it an air of authority, even if very vague. It’s like, this thing is valuable enough and reliable enough to make and keep and serve. And even if it’s clearly labelled as AI-generated, it’s easy for those labels to get unattached.

Whereas using something on the client side is different. In that case, I decide to right-click “translate page” and I know very well that it is a nonce-generated approximation. ChatGPT is sort of similar. The user is under control, and they can change things if they want, say by using a different prompt. A disadvantage of this approach is that it uses a lot more energy, as things have to be done every time. But on the other hand, it is clearer that it is a transient bit of information rather than a storable text.

I discussed some of these issues with Ajahn Brahm last night, and he agreed that AI-generated translations were a bad idea. He said, though, they may be okay if labelled as such. As it happens, Sam Altman said the same thing: AI content should be labelled. I had to explain to Ajahn that there’s just no way of making sticky metadata so that any “label” would be permanent. You can do it on, say, your website, but you have no control over what happens after that.

BTW, this is also one of the arguments that I will make for banning generative AI. We should adopt as a minimum standard the requirements spelled out by the makers of AI themselves, such as, in this case, all content should be labelled. Since they can’t do this, by their own standards it should not be allowed.

Same as if a car manufacturer says, “This vehicle is not safe without airbags” then doesn’t put in airbags, we wouldn’t allow it on the roads.

Invo · April 8, 2024, 12:10am

Bhante Sujato starts the Butlerian Jihad?

Snowbird · April 8, 2024, 12:20am

I agree 100%. When I am on a website and I select a target language for the content, I have a very high expectation that it is at least human verified. But if I am using my own tool (like Google Translate) then there is no question that I am reading something that may be crap.

Jhanarato · April 8, 2024, 2:51am

Agree.

I see re-posted ChatGPT output as nothing more than spam.

There may be a case for generative AI in search as that’s always going to be algorithmic.

Dan · April 8, 2024, 5:09am

General Wisdom is ok

NgXinZhao · April 8, 2024, 7:44am

Forever is a long time. While I am confident that AI cannot replace meditation interview teachers unless they have a mind and beings can be reborn into them and they can attain awakening, AI can do many other jobs which don’t require a mind.

I believe this forum has already ruled out meditation interview and guidance content, so I don’t think there’s much of a danger.

In the future, an AI can produce essays which are of higher quality than any human can. They can read all the suttas from all the traditions and list down all the differences, do indexing and give us the exact sutta etc. It’s like a Google search engine, just with a bit more convenience.

As AI goes into more and more of the internet, it’s going to get harder and harder to filter out what counts as AI-generated content or not.

Say website A uses AI to generate Buddhist articles, which are read by person C, who then posts his understanding of the article here with quotations here and there. Wouldn’t that be putting AI-generated content in the forum? Especially if website A doesn’t bother to label it as AI-generated content, or person C didn’t check who wrote it.

I normally just use Google Translate to do a rough first translation from the Chinese and then edit it myself into nicer English. Would such content not be allowed here either? Google Translate is also AI.

Bhante @sujato, I also think that asking creators of Buddhist AI to remove SuttaCentral content from their bots would do more harm than good. It’s unlikely that Buddhist AI projects would be stopped completely, and removing such content means the AI would rely on second- or third-hand accounts of the suttas, which can be more distorted. That makes the AI less reliable in the short run, until someone else makes a translation and puts it into the AI.

sujato · April 8, 2024, 8:54am

Don’t tempt me! In fact this was one of the articles that has influenced my thinking:

That’s correct. Look, I’m not saying that we should police every snippet of content, and I do similar things myself, but on the whole, we shouldn’t host AI content. It’ll take a while to sort out the details.

It’s possible. At the end of the day, we don’t really know, we can just make the best decisions we can. The reality is, of course, that OpenAI and the like will scrape us regardless.

Dana · April 8, 2024, 12:39pm

Gabriel_L · April 8, 2024, 1:25pm

Bhante, I get your point and concerns, and will comply, but frankly, resistance is futile. Mara always wins.

Cordeaux · April 8, 2024, 2:51pm

Sādhu. I am fully in support. I also think all audio versions of suttas should be human-read.
I’ve started doing some recordings of my own, and intend to do both English recitations and chanting of the Pali.

karl_lew · April 8, 2024, 2:58pm

Bhante @Sujato, I am currently translating your translations from EN to ES/PT with the help of DeepL, which is an AI translator. These works are therefore labelled as “EBT-DeepL” for clarity.

My motivation for doing so is simple. I would like to listen to ES/PT Dhamma and won’t live long enough to hear the works of contemporary segmented ES/PT translators.

On its own, DeepL randomizes the Dhamma, producing grammatically correct utterances that have randomly assigned synonyms for key Dhamma terms. This process destroys the coherence of the Dhamma with its randomness. This is not some mystical evil process. It is decoherence. It is the difference between the laser that touches the Moon from the Earth and a flashlight whose light is scattered in time and space fading soon into night.

The thing is…people randomize the Dhamma as well. I’ve caught myself using my own terms instead of the carefully chosen words used in your translations. And to guard against that very human randomization, I make a very conscious effort to quote your translations instead of summarizing them and train myself to use your specific word choices when talking about the Dhamma. That practice ensures coherence since each version of the Sujato translations is highly coherent. The vocabulary can change from version to version, but the coherence remains.

What makes human translation so special is that the human experience is singular and shareable. I have a head. We all each have a head. So when we talk about heads we know what we mean.

In contrast, AI is by design a collation where differences of view are mixed together statistically. The singularity of human experience is therefore muddled and randomized incoherently as a jittery average for translation. Pick up Google Translate to translate “pau” from PT to EN and you will hear “stick”. Pick up Google Translate to translate “um pau grande” from PT to EN and you will hear “a big dick”. This means that on average humans talk and think about big dicks a lot. Now if you translate “a big stick” to PT, you will hear “um pau grande”. THAT is the problem with AI translation. It’s inherently incoherent since it’s based on a group average.

To maintain coherence, humans review each other. This works because shareable mutually agreeable views erase idiosyncrasies. And just so I have heard:

SN45.3:1.3: “Sir, good friends, companions, and associates are the whole of the spiritual life.”

It is that very human coherence that has kept the Dhamma coherent for thousands of years.

So why am I using DeepL to create ES/PT translations from coherent EN translations? What madness is this?

But there is a reason and an important one.

The Dhamma is multilingually coherent. The Dhamma’s coherence extends beyond the view of a single language. Which means that one language alone will stumble on some words and phrases that describe experiences uncommon to the shared experience of that one language. And where one language stumbles, other languages will resonate deeply. Look at how the Portuguese nod wisely at “saudade” as a PT experience, one of deep joy and sorrow released to a defining, all-encompassing non-clinging memory in the passage of time. A language is a view.

Translation therefore affords an opportunity for new insights into the heart of human experience. Reviewing, editing, and questioning a raw DeepL translation is beneficial. And it’s hard, painstaking work. DeepL mangles the complexity of nested quotation marks, so it takes additional custom software to make the quotation marks come out right. More importantly DeepL’s natural tendency towards synonym randomness has to be strictly curtailed with the use of glossaries and other special software. The resulting software (EBT-DeepL) is therefore anything but general.
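To illustrate the glossary idea above, here is a minimal Python sketch. It is not the actual EBT-DeepL code, and it does not call DeepL at all: the glossary entries, the sample sentences, and the function name `missing_glossary_terms` are all hypothetical, chosen only to show how a reviewer's tool might flag output where a key term was given some other synonym.

```python
import re

# Hypothetical glossary: each EN key term maps to one approved ES rendering.
GLOSSARY = {
    "mindfulness": "atención plena",
    "suffering": "sufrimiento",
}


def missing_glossary_terms(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return the source terms whose approved rendering is absent from the target."""
    missing = []
    for term, approved in glossary.items():
        # Whole-word, case-insensitive match in the source segment.
        in_source = re.search(rf"\b{re.escape(term)}\b", source, re.IGNORECASE)
        # Simple case-insensitive substring match in the translated segment.
        in_target = approved.lower() in target.lower()
        if in_source and not in_target:
            missing.append(term)
    return missing


# A machine translation that drifted to a synonym ("dolor") gets flagged.
flagged = missing_glossary_terms(
    "This is the end of suffering.",
    "Este es el fin del dolor.",
    GLOSSARY,
)
print(flagged)  # ['suffering']
```

A check like this only reports drift for a human to look at. Deciding what the right rendering is remains with the reviewer.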

It is the reviewing and editing of translations that provides human benefit. This goes for human translations as well as AI translations. Quality rises with more human reviewers. Coherence and insights increase with more reviews.

In summary, I think of AI as a tool for summarizing human output. But like any output, for example sewage, it needs to be treated for potability. Yes. We’ve all been drinking sewage water for thousands and thousands of years–it’s just been filtered by earth, air and fire to the point where it’s just water again. And that is how we should treat AI output. It’s just sewage that needs to be treated by humans. AI isn’t some mystical evil taint. It’s just a computed average lacking the precision and consistency of human filtering. AI alone is just “a big dick”. Thank you, Google.

BethL · April 8, 2024, 2:59pm

This makes sense, for reasons you stated further in the thread. Personally, this one resonates the most:

It also seems plausible from a technology perspective.

I think it’s worthwhile for anyone who might unwittingly go into “blah blah blah” mode when you reference the “CC0 license” to review what it means:

CC0 enables scientists, educators, artists and other creators and owners of copyright- or database-protected content to waive those interests in their works and thereby place them as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law.

The Creative Commons (CC) licensing model supports the altruistic act to publish “stuff” so that everyone can benefit from it to the extent possible. The “owner” of the published stuff isn’t generating any wealth for themselves.

A shout-out to all us typists who learned to type with 10 fingers! To this day I tell my husband how much time I’ve gained over decades, compared to pecking with two fingers. Of course, Pauline may have pecked away…nah, I doubt it.

Which puts the onus on the servers to be fully transparent about how AI content is being served…and, as you imply, this is unlikely due to people’s underlying tendencies.

When I was playing pickleball in January (before I broke my wrist playing pickleball), I delighted in serving the ball in the most devilish ways. All to win a point. The serving process does not bring out the best in us unless we’re serving with supreme mindfulness and compassion…

… whereas (when I was playing pickleball) I was confident I could position myself to defend any devilish serve coming across the net. But that was from years of playing tennis. Many people start playing pickleball without this experience, and they are taken advantage of. It takes much practice and energy to learn to relate intelligently to what’s being served. Some people will never have the intelligence or energy, for different reasons.

So making it incumbent on the client for the whole Strong AI thing to work ethically seems dubious to me (as a thought experiment). To be clear, I think there are ethically positive reasons on the horizon for Strong AI. But, per the article you shared in this thread, traditional risk-benefit analysis quickly breaks down with Strong AI.

Agreed. In a way, for me, it’s like the Internet. During the early 1990s, we knew where it “was.” Now, no regular person (non-geek) knows or even cares where it “is.” It’s just there, magically.

Of course, the Strong AI “magic” depends on ever-increasing compute capacity housed in huge data centers that are grabbing all spare power capacity from the grids. Thank goodness we’re getting a break from the crypto-currency power-grid grab, but that will come back in full force.

All of which is to say this requires carbon-based resources which exact horrible tolls on our climate (not to mention on human bodies who are having to mine all the rare earth elements). I’m not certain we’ll solve the global warming crisis soon enough to make all this additional compute capacity available. So I support near-term measures now (such as your proposal for SuttaCentral) while doing the best we possibly can with the long-term issues and questions.

DonatorProponent · April 8, 2024, 7:04pm

Just to make sure - in this sentence you mean

Right?

Are “traditional” (pre GPT 3) text analysis methods OK?

Viveka · April 8, 2024, 8:15pm

Absolutely support this. Keep AI out of the Dhamma. I cannot think of anything more incongruous to the Path of Practice.

The Dhamma is already in a precarious state with so much false Dhamma out there… to me this AI infiltration is like the lid on the coffin.

@karl_lew I agree with a lot of what you say, and it is particularly this statistical drift of meaning that is so dangerous. However, I cannot realistically see teams of dedicated reviewers processing all AI output to counteract it. Possible in theory only, not in reality. I feel this drift is guaranteed.

Just to be clear - this drift must by default be in line with Ignorance and Delusion, going with the stream, the way that the worldlings perceive.