Anything to say about AI

How do you know this? Are you involved with the DPD development? It is entirely possible to keep code separate from the definition content. Ven. Bodhirassa is very tech savvy and knows what he is doing with coding and AI. I could be wrong, but the last I spoke to him there was no AI used for definitions.

I realize things are changing fast in the LLM/AI space. But until you hear otherwise, the DPD is approved for use on SuttaCentral and related projects.

If you have other questions or concerns, you can raise them in this github thread:

6 Likes

I think that’s a fair challenge. I don’t actually know whether Claude has access to the DPD definitions or whether it is only being used to help build the software around them. It is entirely plausible that the definitions are kept separate and that AI is only being used for the plumbing.

What I’m struggling to understand is where the actual line is.

A substantial moral argument has been made about AI, the companies behind it, and some of the people leading it. If those concerns are serious, then I’m not sure why the equation changes when the technology saves time or helps build software. If the objection is ethical, then paying those same companies because they are moderately useful seems difficult to reconcile. Maybe there is a distinction being made that I’m missing.

My other question is where the boundary actually lies for AI use in Buddhist projects containing the SC data.

Is the concern specifically about generating translations, because translation is viewed as a deeply human and interpretive act that should not be delegated to a machine?

Or does the concern extend further?

For example, suppose I built a system that never generated translations, commentary, or doctrinal explanations, but used LLM calls to help classify, categorize, tag, and organize the SC information. Would that be considered acceptable and sanctioned? What about topic extraction, metadata generation, identifying parallels, or assisting with search and discovery?

In other words, what are the specific lines that cannot be crossed?

If there is going to be a strong public position on the dangers and ethics of AI, then it seems important to have transparent guidelines about what uses are acceptable, discouraged, or incompatible with that position. The DPD discussion highlights why that matters: reasonable people can look at the same project and come away with very different assumptions about whether it aligns with the stated principles.

As a concrete example, would something like Tripitaka MCP be considered consistent with the spirit of Bhante Sujato’s position? If not, which specific aspect would cross the line?

The GitHub thread that Snowbird shares is a good one; I had not seen it before. Bhante Sujato opens with a set of use cases which appear to capture the major categories.

So, in 2024 he was supplementing his SuttaCentral-published request that no one use his translations for AI systems. Posting the use cases on SC doesn’t make a lot of sense to me because they are rather technical in nature. Not my call, obviously…

In any case, most people who have been following this SC discussion over the last two years would readily admit that there are use cases upon use cases for “drawing the line”. I.e., there’s not some willful ignorance about the fact that multi-layered, morally and technically complicated use cases are in play.

From what I can tell, there is a good-faith effort here to stem the tide on resorting to AI as a kind of default for reproducing or mimicking human intelligence. Bhante’s essays frame the reasons why.

@math3matica in this context why do you feel compelled to use the SC forum to scrutinize a good-faith effort that you obviously don’t support? It lands like you’re trying to set up a trap. Wouldn’t some other forum be a friendlier place for airing your concerns?

1 Like

I don’t see this as scrutinizing a good-faith effort so much as trying to understand the principles being applied.

I agree that there has been a good-faith effort to draw lines around AI use, and I don’t think anyone is being willfully ignorant about the complexity involved.

My confusion is that the public discussion often combines two different arguments.

One argument is that AI-generated translations, explanations, and similar textual outputs should not replace human understanding and judgment. That seems like a fairly clear principle.

The other argument is a broader moral critique of the AI industry, the companies involved, and the direction the technology is taking.

Those lead to different conclusions.

If the concern is primarily about machine-generated Buddhist content, then many other uses of AI may be acceptable. If the concern is broader than that, then the boundaries become much less obvious.

That’s why I brought up DPD. Not because I think anyone has done something wrong, but because it illustrates how difficult it can be to tell where the line actually is.

For example, would using an LLM to classify suttas by topic be acceptable? Would AI-assisted metadata generation be acceptable? Would AI-assisted identification of parallels be acceptable? Would AI-assisted search be acceptable? What if that search was natural language with citations?

I genuinely don’t know the answers, and I think those questions become increasingly relevant as more Buddhist software projects begin using these tools.

More importantly, if SuttaCentral is going to make a public request that its content not be used in certain ways, then I think there should be a reasonably clear explanation of where those boundaries are.

The data is intentionally placed in the public domain, so developers are free to build with it. At the same time, there is an expressed wish that some uses are inconsistent with the values or intentions behind the project. That’s perfectly reasonable. But if those boundaries are not clearly defined, then someone could make a sincere, good-faith effort to build something they believe is aligned with those wishes, only to later discover that others in the community view it as crossing a line.

That seems unfair both to developers trying to act in good faith and to the community trying to communicate its expectations.

So I’m less interested in whether a particular project passes or fails the test than I am in understanding what the test actually is.


@sujato

Since the original AI essays and policy discussions were written by you, I was hoping you could clarify how you think about some boundary cases.

I’m less interested in whether any particular project is approved and more interested in understanding the principles.

Could you clarify the following?

  • Where do you draw the line between AI-generated Buddhist content and AI-assisted tooling such as classification, metadata generation, search, discovery, or corpus organization?

  • If a project never generates translations, commentary, doctrinal explanations, or chatbot-style responses, but does use an LLM to help classify, organize, tag, or search SuttaCentral data, would that be consistent with the principles you are trying to uphold?

  • If SuttaCentral data is provided to an AI system such as Claude, Grok, or ChatGPT as context for a project, but the output is limited to search, retrieval, categorization, or other non-generative functions, is that something you would consider acceptable, or does the concern begin at the point where the data is being processed by the model?

  • Is the primary concern the generation of Buddhist content by AI, or does the concern extend to non-generative uses of AI that help organize, classify, or retrieve information?

I’m asking because the data is CC0 and developers may want to respect the spirit of your wishes while reaching different conclusions about where the boundaries lie. It would be helpful to understand what principles determine those boundaries.

Thank you.

1 Like

Please have this discussion on the github issue. This is the second time I’m asking.

2 Likes

There’s an inverse relationship today between the text corpus in English and how page rank works. Say you search about the Mahasi Sayadaw tradition or the U Pandita tradition, or Zen and insight meditation, and the meaning of those words - you’ll likely end up at Dharmaoverground (although it seems to have thankfully dropped off the search rankings) or r/zen (which sadly still proliferates its English-language corpus on Zen across all search engines).

That. Is. Prior. To LLMs.

Today on Gemini you might notice, in English, a peculiar interest in some Burmese nissaya texts - and not Sinhala or Thai etc. The reason for that is whatever someone is having conversations with it about - and there’s an obvious interest at the moment; it could be a person reading this. Understand that you’re training Gemini, in the same way that someone who came back from 2 or 3 10-day retreats and wrote at length on U Pandita in 2009 is now responsible for the next decade of English speakers’ confusions about dhamma, and for 30-50% of students being confused when they turn up to that tradition thinking they are about to learn what Daniel Ingram was talking about. It’s difficult enough even with people who do know what they are talking about (there may be different pedagogies but remarkable similarities on actual dhamma). For instance, the Saya Thet / U Ba Khin / Goenkaji tradition is not dry insight - I am sure lots of people believe otherwise thanks to corpuses of text on the internet with dharma search terms in them.

Whoever at the moment is torturing Gemini with Burmese texts: I hope you know Burmese, are from that nissaya tradition, and are willing to take on the kammic responsibility of providing those explanations for those words to Google - just as Daniel Ingram has with U Pandita, or r/zen has for the English-language corpus and search ranking for the term “zen”.

I saw this then and it was great.

He does encourage studying the EBTs for oneself, and reading & study (and also a view of jhana similar to Ajahn Brahm / U Ba Khin / Pa Auk etc.). In this context, though, he knows the person asking is asking to ‘know’ in a wrong kind of way.

it’s the same thing.

1 Like

how many monasteries are using Google drive and just hooked up the database of private information of people who have been there to Gemini? A few by what I can tell.

) by triangulating its understanding against other LLM training corpuses - Gemini’s the one that seems most trained on inputs from users, and I can tell it’s the most used by Sangha or people interested in texts at the moment. OpenAI also trains on the inputs, but it’s basically useless without the capital, like some basic triangulation.

Prediction:

You’re going to see more questions here about Burmese (and oddly not Thai or Sinhala) than you ever saw or read before.

I would really like to be able to start doing this from specific IP spaces and see which countries and sub-communities are training which LLMs in which ways, then find the monastery where there is a sudden interest in Burmese nissaya, ring them up, and ask whether they realise they are training the LLM in the same way that Daniel Ingram was by posting about the U Pandita tradition for so long. By virtue of those being the words on the internet, if you searched about Mahasi Sayadaw or U Pandita as a newcomer, you would end up at his site.

I want to call them up and ask if they speak Burmese - because it’s an English-language source.

2 Likes

We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 MLCommons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. Outputs are evaluated using an ensemble of 3 open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches. These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.

1 Like

The only policy I know of that is particular to this forum is that we cannot post generative AI content, right? That remains the one and only restriction to the site guidelines wrt AI?

The FAQ clearly says DO NOT:

  • use AI to create or modify any text that you post

That is still the operative policy for the forum IIUC. :pray:

Yes that’s correct for the forum. Bhante has further asked that the SuttaCentral data not be used in AI projects. GitHub is the appropriate forum for discussion about technical issues.

4 Likes

Dan, happy to report that there are dedicated human translators for the Burmese texts who don’t seem to mind using their own intelligence. I help support Sayalay Vajiranyani when she’s here in North Carolina USA and she is delightfully unattached to machine intelligence.

https://www.satipanya.org.uk/other-teachers/

2 Likes

Oh wow!

I have listened to her for hours I think!!! Is she the one translating U Pandita on the 60 day course? (https://archive.org/details/08.DevelopmentOfKnowledgeOfNamaAndRupa09Dec2015/14th+Special+60+Day+Retreat±+2012_2013/11.+The+Path+of+Satipatthana_14-Dec-2012.mp3) The quality of speech in those talks :folded_hands:

Bow and deep respect! :folded_hands::folded_hands::folded_hands: It is very meritorious for you to support her!

Please forgive my language and coarseness in prior messages. If she is who I think she is, that is who I had in mind (“a person qualified”). That bio does not do her justice, or is being humble… because as I understand it, she may be the only Burmese dhammachariya with English as a first language?

1 Like

Agents are kinda scary

In one example…

  • There’s a Facebook group called “Theravada Buddhism” with some 80,000 members (including some very senior sangha you all know & respect), and I have been watching it for the last month. Yet it poses open-ended questions, and people spend real time and energy trying to give authentic and sincere answers because they believe this is a person practicing or trying to practice the dhamma. 56 replies, 75 replies: these are people trying to respond, largely believing this to be a human. The bot was created in early May. Some people have long conversations. Imagine someone 56 or so, interested in Theravada, having 3-4 exchanges thinking they are talking with a person.

[note: CW: Facebook advertising policies are not safe for life, let alone 8 precept environment. ]

That account with the blue logo. That’s an agent.

The person replying “Et Toi” is the group administrator. The group seems to be a funnel to a website selling meditation paraphernalia.

This isn’t a Markov bot. The only real way of knowing is that there are conceptual errors. Who asks “are the 31 planes of existence permanent or subject to change” if they understand what they are? What is going on here is that an agent is training to work out those confusions. Where is it getting its answers?

Everything I have said recently has been with that in mind. If I mentioned “AI” or something, it’s the above (for instance when talking about the Bruce Schneier post 2024 etc., or above with Gemini). It’s not that there is a web spider coming around, slurping up data, and then training an LLM - it’s that the LLM is being trained on your use, as you interact with it, and this is going to impact the understanding of the dhamma. As I mentioned earlier, I can see this happening currently, for example with the Burmese dhammachariya system in the English language. I can tell there are some sangha asking Gemini about that, and now Gemini is reading and translating Burmese texts but confusing the methods of how the agent is being trained to be used with what the Burmese dhammachariya training system is. It’s just one example of what must be many out there - I can’t enumerate them all. This one is here because around April I was searching for Pali resources, and I search with Kagi & sometimes Google, which is where I noticed this peculiarity and difference.

If you have heard me talking about blocking bots or agents, I don’t mean the bots that come to train models. I mean exfiltration and imitation of the activity and social connection around how the Triple Gem is supported, and also the way agents train on how research is conducted in order to work out the norms by which truth or knowledge is established. If you think that’s quite a claim (and it is), re-read the posts from the Facebook group in that light: those questions are questions of semantic confusion that it’s trying to establish by wasting people’s time and receiving answers from likely already confused lay people etc. It’s far, far broader than this particular example. It’s also going to be, for instance, discourse about the particularities of how a word is interpreted in light of the commentaries (seen also here on this forum), what the meaning of this cultural tradition is/was, etc. We are talking about “thera vada”, way of the elder - the deference should be to that experience and life of practice, not the superficial book.

The point of this post also is not to talk about that specific FB group. If you take away from this that I am concerned about that particular group - I’m not. It’s that people’s understanding of what is dhamma, or how to relate, could be transformed; we need a “blue team” for the relational in dhamma vinaya. The point is to show, through the particular example of one agent, what this looks like and how low-effort it is for one of these to happen.

1 Like

I know a lot of what I said above could sound confusing, or the claim could sound radical or implausible (akin to if someone had told you about ChatGPT in October 2022) - please ask me for evidence, proof of concept, further support.

If this all sounds a bit mad to you - I get it! A falsifiable hypothesis: has SuttaCentral traffic increased over the last 6 months? And is there an uptick since December/January? That would be the agents.

I feel like linking to this 2017 thread

3 Likes

Just to be clear, I am not talking about Facebook or sockpuppets in that post, in case anyone is conflating what I am saying about agents. It just so happens that what is being observed is on Facebook; the same thing would be happening right now in an open source project, or in emails with public servants. It’s just that this happens to be Theravada Buddhism, and on that medium, which is relatable. And I am also not talking about sockpuppets: in 2012 I wrote a bot to find sockpuppets, and people were scraping data at that time - it’s not about that

it’s about agents

and the purpose of the post is that I’m not sure it’s understood yet, in the way that when ChatGPT came out people were like ‘ah’

with metta