Exploring the Development of a ChatGPT Plugin for SuttaCentral

Hi @Snowbird, thank you for your input. It’s always interesting to hear different perspectives on AI.

For anyone who’s curious about the kind of AI tools we’re working with, you might want to explore ChatGPT and Bing Chat. You can find ChatGPT at chat.openai.com and Bing Chat can be accessed from the chat option on www.bing.com.

There’s also an intriguing article, “Top 20 Most Insane Things ChatGPT Has Ever Done”, that shares some creative uses of ChatGPT. It’s quite a read for anyone interested in the potential of AI.

I understand your concerns about the potential for AI to ‘lie’ or misrepresent information. It’s important to remember that AI, including large language models (LLMs) like the one we’re working with, has limitations. These models generate responses based on patterns they’ve learned from large amounts of data, but they don’t understand or know the truth in the way humans do. It’s therefore crucial for users to understand these limitations and to always refer to trusted teachers or religious texts before accepting anything generated by an LLM as truth. It’s much like the saying, ‘don’t believe everything you read.’

One idea we’ve been considering is to embed a warning into the plugin itself, reminding users of these limitations each time they interact with it. Do you think that would help address your concerns?

Also, we’d love to have you help us test the plugin when it’s in a usable state. Your perspective could be very valuable in ensuring we’re addressing these important ethical considerations.

And as always, if there are more technical questions or if anyone wants to understand more about our project, I’m here for a chat. Always happy to discuss our work with those who are interested. Feel free to DM or post here.

1 Like

Venerable @Snowbird, thanks for saying these things - your site and posts have been a wonderful way into the scriptures for me. I have been trying to work through these ethics - thanks so much for engaging :pray:. In my previous work with LLMs, I was originally planning to put up a site inspired by yours, where the scriptures could be navigated by location, person, list, or subject appearing across multiple parts of the canon - after I’d checked and rewritten all the outputs, which I expected to take months or years.

Boycotting LLMs looks more and more like the best ethical position as this plays out, but I’ve chosen to try to reform rather than boycott. On top of the confident lies these models produce, are we entering a world where unfeeling LLMs teach children instead of people, every interaction with any service goes through a chatbot, our intelligence is outsourced, and our efforts/kamma are relinquished to some language model - not to mention large sections of the population becoming redundant or having their lives dominated by these things? It’s frightening. I also don’t think these models promise better understandings of dhamma by virtue of being a super-intelligence, though this may become true for other subjects imo.

You raise the point that the plugin will do summarisations - super problematic if they purport to interpret the dhamma. The Buddha declared that the 4NT are for “one who feels”. I would like us to stop the model from interpreting too much, or at least add a big disclaimer if people are trying to learn dhamma from it directly, as @jgreen01 has suggested.

With these in mind, I’m working on this because:

  • It’s a way to make ChatGPT answer more accurately rather than confidently spitting out falsehoods and misquoting scriptures. Some of its answers are already better than those found on many websites or forums. Better to improve it than to leave people with lies - I think there may be a kind of bad kamma in not attempting to improve it if I can.
  • It can do things with texts with a great amount of attention, and answer certain kinds of questions that are otherwise extremely arduous to find answers to or produce data about.
  • It is a way into the texts broadly - there is a possibility there will be answers that could lead to an improved dhamma understanding, but in order to do that, it at least needs to go to the texts.

I don’t think developing this plugin will lead to OpenAI utilising it for training - just inference. But this is something to research for sure, and yes something to ponder that it will sound more convincing if it can quote and yet infer wrongly :frowning: .

2 Likes

Is there a way to get an LLM to quote the source of the information associated with the output, every time a statement is made?
Especially when discussing EBTs (or any other specific topic), we notice how precise references are important. If the source of information is given, then at least someone could verify it.

3 Likes

Thank you Venerable @Snowbird for starting a really important conversation!

I work in a university, and we share similar concerns, especially with regard to supporting proper student growth.

One analogy that has helped me sort through all the concerns is the parallel with the massive proliferation of content on the internet over the past two decades, and the relationship of end-users to all this content. History has many lessons for us (both things to worry about and potential solutions) that I hope we can take forward. The SuttaCentral community has been a critical part of managing and mediating this online relationship between dharma and student, and has a lot of institutional experience that is super valuable in this regard.

This is not a perfect parallel, but I think one particularly apt example for thinking through the human-centered part of the equation is Wikipedia, because of its purported stature and authority. I came of age as Wikipedia was just starting, and back then there were lots of questions that are still pertinent to ChatGPT-like tools today: How much trust should we place in it? How do we ensure quality? What purpose does it serve, and what kind of tool is it? How do we use Wikipedia responsibly? How do we educate students in its use?

The long-term answer to these questions, for me, is that the responsible way to use Wikipedia is as a map to ideas/content/vocabulary that are new to me, so that I can begin looking for better sources and, ideally, people to talk to.

I suspect that, once all the hype comes back down to reality, something similar will happen with ChatGPT. It will forever change the way people begin to access information (just as Google search and Wikipedia did). I think it is important to get ahead of the many problems you have identified. Just as it became important to teach students to develop a measured approach to Wikipedia (i.e. it is not “ground truth”), we as a community have an important role in guiding the responsible use of, relationship to, and improvement of ChatGPT-like tools.

Thank you for joining the discussion and guiding us!

1 Like

Yes, I believe some plugins will do that @Ric - the web plugins show their URL sources, though not necessarily how they arrived at their conclusions.
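
For the curious, the general pattern is retrieval plus prompting: each retrieved passage is passed to the model together with its sutta ID and URL, and the prompt asks for a citation after every statement. Here is a rough sketch in Python, with placeholder IDs, URLs, and text rather than real plugin output:

```python
# Build a prompt in which every passage carries its own citation,
# so the model can (and is told to) reference its sources.
passages = [
    {
        "uid": "sn56.11",
        "url": "https://suttacentral.net/sn56.11",   # placeholder reference
        "text": "This is the noble truth of suffering...",
    },
]

context = "\n\n".join(f"[{p['uid']}] {p['text']} (source: {p['url']})" for p in passages)

prompt = (
    "Answer using ONLY the passages below. After every statement, cite the "
    "sutta ID in brackets and include its URL. If the passages do not cover "
    "the question, say so.\n\n"
    f"{context}\n\nQuestion: What is the first noble truth?"
)
print(prompt)  # this prompt would then be sent to the model
```

It still can’t show how the model reached a conclusion, but it does make every claim checkable against a linked passage.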

2 Likes

But the ability to misrepresent information is a design feature, not a limitation. The purpose of these LLMs is to use language convincingly and fluently, not to provide correct information. That the information they present is sometimes correct is a side effect, not a feature.

I played around quite a bit with ChatGPT asking it questions about Buddhism. It was very impressive how convincingly it could present wrong information. But the number of actual suttas it could present wrong information about was quite limited. If your plugin is attempting to train ChatGPT on a larger body of texts, then it will eventually be able to very convincingly present wrong information about a larger body of texts. No matter how much you feed it, it will still not distinguish between true and false information. And the whole while it will be shockingly convincing.

But they don’t understand or know truth at all. That’s the issue. They only know patterns.

I’m sorry if I seem hyperbolic, but you can’t put poison in someone’s drink and absolve yourself by putting a label on it. Especially if you don’t know how that drink is going to be served. If you want my ethical advice, it would be to not put poison in people’s drinks. Don’t ask me about how you can ethically put poison in people’s drinks.

And besides, once someone feeds the texts into the model, they are no longer in a position to even put a warning on things. So then they’ve poisoned the drink but it won’t be served in a glass with the warning label.

I really don’t mean my criticisms personally. It’s a general issue. But I’m still not seeing any benefit in feeding suttas into these LLMs. I’m open to hearing one, but I haven’t so far.

Honestly, I don’t think this is a parallel at all. Encyclopedias could never be used as primary sources in academic situations. And honestly, Wikipedia has never been considered an authority. Moreover, the purpose of Wikipedia is in fact to point readers to primary sources. And if there are inaccuracies, there is a method for correcting them. This is completely different from these LLMs, whose purpose is to sound fluent in language while hiding the primary sources and being impossible to correct.

There is no shortage of articles out there talking about the problems with these LLMs but I found this one especially interesting:

From the article…
5 Likes

@Snowbird I agree with you 100% on all your concerns! And we have no illusions that LLMs are “intelligent.”

Encyclopedias could never be used as primary sources in academic situations. And honestly, Wikipedia has never been considered an authority. Moreover, the purpose of Wikipedia is in fact to point readers to primary sources. And if there are inaccuracies, there is a method for correcting them.

Re: Wikipedia - 20 years ago, when it was hard to find information on the internet and primary sources were hard to verify, the talking point and fear among teachers was precisely that students would treat Wikipedia like an encyclopedia and as authoritative. Of course we know that’s the wrong way to use Wikipedia, but the concern was that people wouldn’t know…

We’re not talking about academic settings, where everybody knows that encyclopedias don’t serve as proper references in and of themselves for scholarship, but about the general population - your “lay person” so to speak, especially students - who need to look something up but are not necessarily creating more scholarship on top of those resources and might get sloppy with their methods. I think these issues of misplaced trust are enduring, perennial topics. I think we all agree because we’re all concerned about people misusing and misplacing trust in LLMs.

There are lots of important topics to cover. Wikipedia had to create governance systems to maintain the quality of the knowledge. Similar things have to be done today.

I think one potential concern, as you said, is naively training LLMs on suttas and trusting the output. This is also something we are wary of, and we are not currently looking to train on suttas. One potential strategy around this is to instead use ChatGPT only as a “natural language frontend” for interpreting user input, while on the backend routing that to trusted, non-LLM resources like the primary reference analysis systems michaelh is building, or other translation tools from the SC community. I work in a scientific field where information integrity is similarly important; the strategy many in the sciences are currently turning towards is similar - using ChatGPT only as a “natural language API”, but doing all the critical information retrieval/processing work with more traditional, transparent, and better-validated non-LLM systems.
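
To make that concrete, here is a minimal sketch of that division of labour in Python. The search endpoint, its response shape, and the prompt are placeholders (and it assumes the pre-1.0 openai client), so it only shows where the LLM stops and the trusted backend takes over - not a real implementation.

```python
# Sketch of the "natural language frontend" idea: the LLM only turns a user's
# question into a structured search request; retrieval is done by a
# conventional, non-LLM backend. The endpoint URL and response shape are
# hypothetical, not the real SuttaCentral API.
import json

import openai    # assumes the pre-1.0 openai client
import requests

def extract_search_terms(question: str) -> dict:
    """Ask the model for a structured query only -- no dhamma interpretation."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": (
                    "Convert the user's question into JSON with keys "
                    "'keywords' (list of strings) and 'language' (string). "
                    "Return JSON only."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return json.loads(response["choices"][0]["message"]["content"])

def search_suttas(query: dict) -> list:
    """Route the structured query to a trusted, deterministic search backend."""
    r = requests.get(
        "https://example.org/sutta-search",            # hypothetical endpoint
        params={"q": " ".join(query["keywords"])},
    )
    r.raise_for_status()
    return r.json()   # e.g. a list of {"uid": ..., "title": ..., "url": ...}

if __name__ == "__main__":
    hits = search_suttas(extract_search_terms("Which suttas discuss the five hindrances?"))
    for hit in hits:
        print(hit["uid"], hit["url"])
```

The design point is that the model never generates dhamma content itself; it only produces a query that a transparent, correctable system answers.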

One of our fears, as you have highlighted, is that the naive student out there will simply use ChatGPT and get all the wrong answers. We want to get ahead of the game and provide something that is as easy to use but much more trustworthy (e.g. because the backend is non-LLM, or is something that we actually validate extensively). Your concerns and this conversation are important in helping make an actually trustworthy tool. Building a tool that the community can control and can continuously validate, check, and fix is of utmost importance, just as it is for maintaining Wikipedia’s quality. The real work is not in the technology but in the community, and we are mindful that any tool needs to be developed, improved, and vetted with meticulous care. I hope you can continue to offer your wisdom and guidance.

Thank you!

3 Likes

The purpose of an LLM is to process input data into output data. An LLM cannot lie, because it cannot know and it cannot have intentions - there is nothing conscious in it: it is just a program.

The purpose of these LLMs is to convincingly imitate the use of language in a fluent way, but not to use it, because for a program there is no language, only data to process.

They don’t even have the ability to understand anything. They cannot know anything either; they simply transform input data into output data according to learned patterns.

In other words, an LLM is just a tool, like a hammer or a computer. How one uses this tool is not up to the tool, but only up to the user.

4 Likes

Hello everyone,

I wanted to provide a quick update on our progress with the ChatGPT plugin for SuttaCentral. We’ve made some significant strides and currently have a proof of concept (POC) in place. This is a big step forward and shows the potential of what we’re working on.

However, we’ve hit a bit of a waiting period. To move forward, we need plugin development access from ChatGPT. Unfortunately, this can take some time - we’ve heard it can be a matter of weeks. We’ve submitted our request and are eagerly awaiting their response.

While we wait, we’re not standing still. We’re going to start figuring out how to train a large language model (LLM). This is a new challenge, but we’re excited to dive into it.

@SebastianN, we’ve seen the great work you’ve been doing with the Linguae Dharmae project. We’d love to collaborate with you on this if you’re interested.

We appreciate your patience and support during this time. We’re excited about the potential of this project and can’t wait to share more updates soon.

Best regards,
Jon

2 Likes

This video gets very close to my own view of AI as someone with a BS in IT and a translator of ancient texts.

The basic problem with it is that:

  1. It’s not intelligent, therefore,
  2. Someone intelligent has to babysit it at all times,
  3. But it can be useful for basic processing of big data sets.

3 Likes

In early July, OpenAI released Code Interpreter to all ChatGPT Plus users.

This could be an alternative to the SC plugin. You can just upload an SQLite database of all the suttas and let ChatGPT query it, e.g. with full-text search. It will then generate responses based on the query results.

This way it’s all free - no need to pay for the API or to ask permission to release the plugin.
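
To give an idea of what Code Interpreter would actually run against such a database, here is a minimal sketch; the table layout and example rows are illustrative, not an actual SuttaCentral export format.

```python
# Minimal full-text-search sketch over a hypothetical suttas.db upload.
import sqlite3

conn = sqlite3.connect("suttas.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS suttas USING fts5(uid, title, body)")

# Example rows only; a real database would hold the full corpus.
conn.executemany(
    "INSERT INTO suttas (uid, title, body) VALUES (?, ?, ?)",
    [
        ("sn56.11", "Dhammacakkappavattana Sutta", "... the noble truth of suffering ..."),
        ("mn10", "Satipatthana Sutta", "... mindfulness of the body ..."),
    ],
)
conn.commit()

# Full-text search: find suttas matching a term, best matches first.
for uid, title in conn.execute(
    "SELECT uid, title FROM suttas WHERE suttas MATCH ? ORDER BY rank",
    ("mindfulness",),
):
    print(uid, title)
```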

1 Like

Code Interpreter is still a premium feature, available only to paid subscribers.

+1 for LangChain integration (#5)

I think having models that can retrieve suttas is more useful than training a Buddhist chat agent, as it would eventually hallucinate, adding more confusion to the path. Also, fine-tuning a base model on dhamma talks and suttas doesn’t feel right either, because the foundation model can have all kinds of rubbish in it from data crawled in unethical ways.

We can also use vector embeddings to increase the relevance of search results - that’s a quick win.
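
As a sketch of what that could look like, here is a tiny example using the sentence-transformers library; the model name and the example passages are just illustrative choices, not a recommendation.

```python
# Rank passages by semantic similarity to a query, instead of exact keyword overlap.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Mindfulness of breathing, when developed and cultivated, is of great fruit.",
    "There are these five hindrances: sensual desire, ill will, dullness, restlessness, and doubt.",
]
passage_embeddings = model.encode(passages, convert_to_tensor=True)

query_embedding = model.encode("suttas about anapanasati", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, passage_embeddings)[0]

# Print passages from most to least relevant.
for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:.3f}  {passage}")
```

The embeddings for the corpus can be precomputed, so the query-time cost is just one encode plus a similarity lookup.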

Hey, I have been following this with interest.

Any word from ChatGPT about the plugin development access request?

Hi thrasymachus,

Welcome to the D&D forum! We hope you enjoy the various resources, FAQs, and previous threads. You can use the search function for topics and keywords you are interested in. Forum guidelines are here: Forum Guidelines. May some of these resources be of assistance along the path.

If you have any questions or need further clarification regarding anything, feel free to contact the moderators by including @moderators in your post or a PM.

Regards,
trusolo (on behalf of the moderators)

1 Like

Nope, we’ve heard nothing. Plus I’ve been too busy working and studying to make any progress with a development version.

No matter what, people at large will be using ChatGPT for information about Buddhism. That information will draw on vast amounts of data outside the suttas, thus incorporating wrong information into the output.

As a check and balance to that, if the output draws only on data from SuttaCentral (the suttas), it may be significantly freer of misinformation. Then it’s up to

2 Likes

I was listening to a podcast today where a senior leader at Microsoft was describing / selling what sounded like a very simple and potentially helpful use case for LLMs - search.

The example he provided was an internal use case where Microsoft put its internal health insurance plan policy documents through a vectorization process, so that terms with a lot of synonyms - e.g. eyeglasses, glasses, corrective lenses - are all encoded as semantically close, and a search for any one of them would pull up them all (so someone could quickly find out which plans cover them).

This reminded me of my experience searching SuttaCentral, how I’ll frequently do several back-to-back searches trying to find a sutta I half remember, thinking “Was she called Migara’s mother in this sutta, or was she called Visakha?” Since, as has been pointed out, the “foundation” models are already pre-trained on Buddhist texts, I think they’d already place many of these semantically similar Buddhist terms close together in the vector space without requiring any fine-tuning or retraining.

They said that of course the older methods sometimes work better, but because in the end you collapse search results into a single ranking dimension, you can easily use a combined approach.
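
To illustrate that combined approach, here is a toy sketch of blending a keyword score and an embedding score into one ranking; the weighting and the example scores are made up, not a tested recipe.

```python
# Blend lexical (keyword) and semantic (embedding) relevance into a single ranking.
def hybrid_score(keyword_score: float, vector_score: float, alpha: float = 0.5) -> float:
    """Weighted combination of the two scores; alpha tunes the balance."""
    return alpha * keyword_score + (1 - alpha) * vector_score

results = [
    {"uid": "mn10", "keyword": 0.9, "vector": 0.4},
    {"uid": "sn47.10", "keyword": 0.2, "vector": 0.8},
]
for r in sorted(results, key=lambda r: hybrid_score(r["keyword"], r["vector"]), reverse=True):
    print(r["uid"], round(hybrid_score(r["keyword"], r["vector"]), 2))
```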

I don’t have a real web/software development background (I do data analysis), so I don’t know how SuttaCentral search works right now, or how hard it would be to implement this in practice (probably harder than the Microsoft official doing PR claimed), but this is an example of how I think the technology could be used in an ethically neutral/positive way.

2 Likes

I don’t know where the development of a proper ChatGPT plugin is, but here’s a link to a custom GPT I made to help me with questions of the Dhamma: https://chat.openai.com/g/g-Dk2ZdcXe0-dhamma-guide

Here are its directions: Dhamma Guide embodies the qualities of both a knowledgeable scholar and an empathetic guide. When addressing scholarly topics, it demonstrates deep knowledge and insight, providing detailed and accurate explanations of early Buddhist texts and teachings. In matters of daily practice and meditation advice, it adopts an empathetic and supportive tone, understanding the challenges and queries of practitioners, and offering guidance that is both practical and rooted in early Buddhist principles. This blend of scholarly depth and empathetic guidance ensures that Dhamma Guide meets the diverse needs of its users, whether they seek academic understanding or practical application of Buddhist teachings.

I haven’t plugged any custom data into it or inserted any script, so it’s basically just GPT-4 with a prompt. Still, it’s pretty good. It helps me find suttas on the regular.

2 Likes

Those aren’t actually directions; it’s a description - a description of what we all wish ChatGPT to be! Empathetic!

Do I have to spend $US20 to see what your prompt is, or will you share it here?

1 Like