Training AI models on the suttas

Definitely a use case I encounter repeatedly.

For me, one of the potentially exciting parts of training AI models on the suttas is that it allows new, more natural queries that don’t need to hit the exact keywords of a search.

E.g. “Which layperson said to a monk that, if the monk doesn’t teach the Dhamma, then he the layperson will teach the monk the Dhamma?” (Answer: Ugga the Householder) doesn’t yield good results in a Google search on SuttaCentral, but such queries might be a good fit for ChatGPT-like interfaces to AI models trained on the suttas!
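A minimal sketch of how such fuzzy queries can work: instead of matching keywords, embed the passages and the question into the same vector space and rank by similarity. This assumes the `sentence-transformers` package; the two passages are illustrative stand-ins, not a real SuttaCentral corpus.

```python
# Semantic search sketch: rank passages by meaning, not keywords.
# Assumes `pip install sentence-transformers`; passages are made-up stand-ins.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Ugga the householder said: if the monk does not teach me the Dhamma, "
    "I will teach the Dhamma to him.",
    "The Buddha praised renunciation as the escape from sensual pleasures.",
]

query = ("Which layperson said that if the monk doesn't teach the Dhamma, "
         "the layperson will teach the monk the Dhamma?")

# Embed corpus and query, then rank by cosine similarity.
corpus_emb = model.encode(passages, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
best = util.semantic_search(query_emb, corpus_emb, top_k=1)[0][0]
print(passages[best["corpus_id"]], best["score"])
```

None of the query’s words need to appear verbatim in the passage; the embeddings capture the paraphrase, which is exactly what a keyword search misses.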

I think the AI models really haven’t been trained on large enough data sets. Besides ChatGPT, recently I’ve also been playing with DALL-E, feeding it specific phrases from the suttas, e.g. “Four Noble Truths”.

It’s quite telling that the generated images are very clichéd Buddhist images, often with very strange alien gibberish, which is symptomatic of a model trained on insufficient data.
E.g. the prompt “There is what is given and what is offered and what is sacrificed; there is fruit and result of good and bad actions” yielded this alien masterpiece:

Very exciting!
The other potential benefit is the use of more natural query language. In addition to searching “renunciation”, you could ask “What did the Buddha teach about renunciation? Provide references from the Pali Canon and Chinese parallels.”
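One way this could look in practice is retrieval-augmented generation: fetch candidate passages (e.g. with the embedding search sketched above), then ask a chat model to answer with citations. The passage IDs and texts below are placeholders, not fetched from SuttaCentral, and the assembled prompt could be sent to any chat-completion API.

```python
# Retrieval-augmented query sketch. The retrieved passages are
# hypothetical placeholders; a real system would pull them from a
# sutta index via semantic search.
retrieved = [
    ("MN 19", "… placeholder Pali Canon passage on renunciation …"),
    ("MA 102", "… placeholder Chinese parallel passage …"),
]

question = ("What did the Buddha teach about renunciation? "
            "Provide references from the Pali Canon and Chinese parallels.")

context = "\n".join(f"[{ref}] {text}" for ref, text in retrieved)
prompt = (
    "Answer using only the passages below, and cite their IDs.\n\n"
    f"{context}\n\nQuestion: {question}"
)
print(prompt)  # pass this to the chat model of your choice
```

Grounding answers in retrieved passages also limits the model’s tendency to invent references, which matters more than usual when the sources are canonical texts.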

TIL that, even though Buddhist texts form one of the largest religious corpora in the world, that still isn’t big enough for training models… sigh.

Which probably points to where the underlying technology can improve. Recently, I was very surprised to learn that the ‘neurons’ in neural networks are actually a very oversimplified version of their biological counterparts… maybe improving the ‘neurons’ is where future AI gains will happen, like Moore’s Law did for semiconductor chips. :man_shrugging:
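For a sense of just how simplified: the standard artificial ‘neuron’ is only a weighted sum pushed through a fixed nonlinearity, as in the sketch below. There is no dendritic structure, spike timing, or chemistry.

```python
import math

def artificial_neuron(inputs, weights, bias):
    """The entire 'neuron' of a standard neural network."""
    # Weighted sum of inputs ...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ... squashed through a sigmoid activation.
    return 1.0 / (1.0 + math.exp(-z))

# Example: two inputs, two weights, one bias -- that's all the state there is.
print(artificial_neuron([0.5, 0.1], [0.8, -0.3], bias=0.1))
```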
