I have some questions regarding the use of Sutta Central’s data

I’m currently developing a project where humans and AI collaborate on translations. In this project, we input the original text, have AI create a draft translation, and then human users evaluate this translation. If there are errors, users can add new translations. This system allows for continuous improvement of the translations.

I would like to add Pali canon texts to this project and hope to utilize Sutta Central’s data. However, I’m aware that Bhante Sujato @sujato has expressed concerns about AI, and that AI posts and translations are prohibited on Sutta Central.

In my project, we assume that AI translations are inferior to human translations. We clearly indicate that the translations are AI-generated and have humans evaluate them. I believe this is an approach that accelerates translation through human-AI collaboration and helps spread accurate knowledge more widely.

With this background in mind, I have the following questions:

  1. Is the use of Sutta Central’s data prohibited in my case as well?
  2. If it is prohibited, which specific data is forbidden to use? The original texts should be in the public domain and can’t be prohibited by anyone, right?

I would greatly appreciate your response. Thank you very much for your consideration.

If you’re looking solely for Pāli texts, I assume you could freely use Tipitaka Pāli Reader’s data. Bhante Subhuti (manager of TPR) has an active interest in monitoring the progress of AI translations, and periodically tests ChatGPT and similar tools. He might also provide you with further details.

This response of Sujato struck me as pretty memorable:

It’s good to keep it organic.

Thank you! I’ll ask him!

I have moved this question to a different category as it seems more appropriate.

Also, as English is not my native language, I am disclosing that I am using Claude for translation, in accordance with the rules.

Yes, I understand Bhante Sujato’s opinion.

I would like to rephrase my questions as follows:

This question is not intended to challenge anyone’s position, but to gain a clearer understanding of how to properly use Sutta Central’s resources while respecting the wishes of all contributors, and to record this for others who may have similar questions. Your insights will be valuable in navigating these complex issues of attribution, licensing, and ethical use in the digital age.

  • Specifically, for which folders and data in the repository can Sutta Central’s operators prohibit something? How is this distinguished from public domain texts?

Thank you for your time and consideration.

You can use the original (Pali) data that SuttaCentral scraped from the World Tipitaka website, in its original form, preserved in this repository:

I believe since these are the original files, not modified by SuttaCentral in any way, they should be okay to use. But best check with @sujato.

I’ve been thinking for some time about having my own website hosting the Tipitaka, and breaking my reliance on SuttaCentral, since I don’t use any of the translations.

If you make any progress on using this data, please let me know.

PS: If you want to avoid using ANYTHING by SuttaCentral, the original World Tipitaka source files from which SuttaCentral derives its data are contained in this repository:

Hey thanks for asking! I appreciate it.

SuttaCentral has previously considered using such a system, but on consideration we rejected it. It is my deeply-held belief that the creation of systems where humans end up being no more than editors of AI prompts will undermine the creative struggle that is essential to any work of lasting value. Time after time, we have seen situations where people using such AI-first approaches end up making basic mistakes that would have been avoided by any genuine expert in the field. But the AI convinces them that they know what they are doing, and they end up even refuting and rejecting the advice of people who have won their knowledge over decades of hard work. In my view, this shows the AI working exactly as designed, in order to create and amplify hallucinations in human beings.

AI-prompted translations may well be fine in the context of ephemeral or low-value text like marketing or spam, but they are not suitable for work of lasting value, like spiritual texts.

Yes.

It’s not illegal to use any of our data; we simply ask politely that people refrain from doing so.

Note that I also ask that people refrain from using upstream AI models that use SC’s data against our wishes, like ChatGPT or any of the large commercial models.

Thanks!
I see that data.
I need to organize it, but it seems to be usable.

Yes!

Thanks for your reply!
I see your point!

You believe that the creators of GPT/Transformer technology specifically designed it to create and amplify hallucinations in human beings? IOW, you believe the actual ML researchers that made the GPT/Transformer breakthrough had nefarious goals of amplifying hallucinations?

To make this a concrete question, do you believe that the authors of this famous paper that kickstarted the GPT/Transformer AI era had the goal of amplifying hallucinations in humans? :pray:

Please do, and good luck. I wish you the best in your work.

I think they did not understand the implications of their technology, which have only become apparent with its widespread deployment. Writers of fiction, on the other hand, whose job is to dream of possible futures, have long warned us. But certainly, the role of any “Buddhist” chatbot is to create the hallucination that what it produces is a meaningful expression of the Buddha’s teaching. It’s not: it’s a counterfeit.

But we should not be asking the creators of the technology for an opinion as to its worth: they have already decided. Hayao Miyazaki shook the scales from my eyes with this brief discussion, where he says, “I strongly feel this is an insult to life itself.” As can be expected, to search for this video today is to be met with a cascade of search results advertising AI that plagiarizes his work.
