Narrating Reading Faithfully passages and making them available for download

Hello friends,

I love audiobooks, and for a while now I’ve been dreaming about recording the daily Reading Faithfully passages and making them freely available for others. Recently, I discovered how easy it is to generate high-quality text-to-speech with AI using ElevenLabs. The expressiveness of the narration is incredible and really brings these passages to life.

My question is this: may I create AI recordings of the Reading Faithfully passages and then offer these recordings freely to others?

I was thinking I might make a blog called Listening Faithfully where I make these recordings available. Or, if SuttaCentral would like to add these recordings to their Reading Faithfully archives, that would be even better!

The quality of the AI generated audio is far better than what I’m capable of recording myself at the moment. In fact, I’m using the audio I’ve been generating as a kind of model for good narration. For each file I generate, I try to make a recording of the same passage using my own voice. I try my best to emulate the excellent expressiveness and delivery the AI narrator produced. It’s difficult to match, but I think it will be helpful for me to practice in this way.

I’ve really been enjoying both generating and listening to these AI generated recordings. It helps me to engage with the texts. I think that it’s possible that it may help others engage with them as well.

In some passages, there are multiple characters: for example, a narrator, the Buddha, Venerable Ananda, or others. Using ElevenLabs Studio, it is possible to assign different voices to different parts, and suddenly the text becomes something like a play, or a dramatic radio broadcast. I particularly enjoy listening to these recordings.

One thing that I’ve found difficult while producing these is choosing which voice should play which part. For example, there is no perfect voice out there that would match the perfect wisdom of the Buddha. Since that is the case, I tend to switch narrators for his role. There are so many high quality voices to choose from, so it’s easy to do.

If anyone is interested in hearing the recordings I’ve generated so far, I’d be happy to share them.

1 Like

Namo Buddhaya!

I’m so happy to see your enthusiasm for the daily sutta readings. Glad it is helpful for you.

So a point of clarification… The readingfaithfully.org website (run by me) is a project independent from SuttaCentral.net, although of course it relies heavily on Bhante Sujato’s translations here.

As I’m sure you have seen, at the bottom of (99%) of the suttas there is a link to sc-voice.net that lets you listen to (mostly) TTS of the translation and the Pali. (Shout out to @sabbamitta.)

And at the bottom of many of the daily suttas is a link to human recordings on PaliAudio.com. (Shout out to @Paliaudio.com).

As far as copyright/permission goes… I think that, other than the few translations by Ven. K. Gnanananda, the original translations are all released under a license that permits republishing as audio.

5 Likes

First of all, thank you for running the readingfaithfully.org website! I really appreciate the work that you’ve put into it. These daily reflections are wonderful.

Also, thank you for the clarification you provided.

And thank you for referring me to PaliAudio.com. I hadn’t noticed the link to human recordings at the bottom of certain passages. I just checked it out, and the sound quality of the audio files there is excellent.

I am a big fan of sc-voice.net and am grateful for their tools. It’s nice to hear the Pali pronounced so clearly, side by side with its translation. The voices are good, but at the moment the English voices are not capable of reading the texts with the same expressive qualities and delivery as some of the other text-to-speech apps out there. There is something about the ElevenLabs narration that really brings these texts to life for me, which is why I am so interested in sharing it.

Would it be alright with you if I created a blog called Listening Faithfully? I feel that I should ask you this, since what I’d be doing is directly connected with the site you run.

2 Likes

I would check with Bhante @sujato about using his translations with AI.
It’s possibly an acceptable use case, but he is firmly in the no-AI camp, except for limited accessibility cases.

Noteworthy, on this topic, is https://www.bhashini.ai which is TTS developed for ancient and modern Indian languages. It has now been rolled out with DPD.

ETA: when I first saw this topic I thought that you were planning to read them. Which would be lovely! The correct audio gear could be purchased for the amount you would pay ElevenLabs in a year.

2 Likes

Thank you for your advice. I’ll check with Bhante @sujato about using his translations with AI.

After I generate an AI version, I usually have a go at recording the passage myself using the AI version as a reference for what I should be aiming for. I think that it helps me make my speech less monotone and more interesting to listen to. I would be happy to share those recordings as well. I’m just an amateur, though… I’m not sure how good they are objectively.

3 Likes

Hi @Jirayu , great that you like sc-voice!

With respect to the narrators, I personally do particularly like the fact that they don’t have the expressive qualities that you apparently like. I would feel rather distracted from the content by such expressiveness.

But the main reason why we are (still) staying with the current voices is that it is not easy to find a good Pali voice. Maybe that will change now with bhashini.ai.

1 Like

LOL, I thought that too!

As Ven. Sabbamitta’s comment shows, different people like different things. Personally I don’t like a flat reading, but I probably wouldn’t like a super emotional one either. I find the PaliAudio ones a great balance. But for sure I have heard other people also like a kind of flat voice. And also, your voice is likely not as flat as you think.

All that to say that if you record them, people will for sure listen. And as AI voices become more common, I have a theory that people will start to seek out and value actual human recordings.

Ven. @sabbamitta Are you still including human recordings in Voice? I know you have (had?) a few from Bhante there as an option.

@Jirayu, I certainly would encourage you to make and share recordings if you are motivated to do so. The gift of Dhamma is the greatest gift!

I actually have a page on the site with the title Listening Faithfully…

But I can’t copyright the phrase :face_with_hand_over_mouth:. As long as you make it clear that it’s your own project, that’s perfectly fine.

2 Likes

In theory yes. But none of us has the skills to edit the audios in order to make them fit the segment structure as @michaelh did for the ones Bhante Sujato has recorded. You cannot just record them segment-wise, that would give a very interrupted listening experience.

For those who are curious: in sc-voice you currently find Bhante’s recordings for SN1 and SN2.1-20 (select respective narrator voice for Pali and/or English in settings). However, the translations have been edited since these recordings were made; I don’t know how much in these particular cases, but you might find quite some discrepancy between what you read and what you hear. One of the few drawbacks of human voices is that they can’t so easily adapt to translation changes.

I guess this recording project came to a halt when Bhante started his revision of the entire canon, so it probably didn’t make much sense to record texts that would soon be outdated.

2 Likes

Yeah, the Paliaudio voice is very okay for me as well. But I have heard AI voices that were just so over-enthusiastic! That’s not my thing. And yes, in case of doubt, I personally would rather err on the flat side.

In any case I would second the encouragement for human recordings!

1 Like

In Audacity this is theoretically a tap of the key while listening to drop markers, tabbing to label markers and then split batch exporting. I was batch segmenting pali audio a few months back and it was pretty simple, though my labels weren’t complex.
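For anyone who would rather script the last step, here is a minimal Python sketch of the batch split. It is not my actual workflow. It assumes the labels were exported from Audacity as a tab-separated text file (start seconds, end seconds, label text on each line) and that the audio is an uncompressed WAV. The function names `parse_labels`, `segment_bounds`, and `split_wav` are made up for illustration, and point labels are assumed to mark the start of a segment that runs until the next label.

```python
import os
import wave


def parse_labels(text):
    """Parse an exported label track: start<TAB>end<TAB>label per line."""
    labels = []
    for line in text.splitlines():
        parts = line.split("\t")
        # Skip blank lines and the backslash-prefixed frequency-range lines.
        if len(parts) < 2 or line.startswith("\\"):
            continue
        name = parts[2].strip() if len(parts) > 2 else ""
        labels.append((float(parts[0]), float(parts[1]), name))
    return labels


def segment_bounds(labels, total_seconds):
    """Turn labels into (name, start, end) regions, sorted by start time."""
    labels = sorted(labels)
    bounds = []
    for i, (start, end, name) in enumerate(labels):
        if end <= start:
            # Point label (assumption): segment runs to the next label or EOF.
            end = labels[i + 1][0] if i + 1 < len(labels) else total_seconds
        bounds.append((name or f"segment-{i + 1}", start, end))
    return bounds


def split_wav(src_path, labels_text, out_dir):
    """Write one WAV file per label into out_dir and return the paths."""
    paths = []
    with wave.open(src_path, "rb") as wav:
        params = wav.getparams()
        rate = wav.getframerate()
        total = wav.getnframes() / rate
        for name, start, end in segment_bounds(parse_labels(labels_text), total):
            first, last = int(start * rate), int(end * rate)
            wav.setpos(first)
            frames = wav.readframes(last - first)
            path = os.path.join(out_dir, f"{name}.wav")
            with wave.open(path, "wb") as out:
                out.setparams(params)  # frame count is corrected on close
                out.writeframes(frames)
            paths.append(path)
    return paths
```

Calling `split_wav("recording.wav", open("labels.txt").read(), "out")` would write one file per label into `out`, which has to exist already. Labels are used directly as file names here, so they should avoid characters such as `:` or `/`.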

Hopefully I will be back to two-handed functioning soon and I can document my workflow if anyone is interested.

4 Likes

Perhaps I should add: no-one has the skills and capacity to edit audios. :smiley: But if someone would jump in to help out we’d love to add more human audios.

3 Likes

The trouble with AI voices is that they can get so “wonderful” that we lose ourselves and fall into the illusion that they are exemplary of what we should be. And that brings great sadness because our goal should not be to grasp at imagined machine perfection–that grasping would only lead to more suffering, a suffering of “not being good enough ourselves”. Upholding AI we become discontent with ourselves. We say, “Oh that AI Voice is SOOO WONDERFUL!”. And in that grasping misplaced acclaim, we would lose ourselves in discontent of our own daily being. :cry:

SC-Voice uses AI voices from Amazon Web Services Polly. We have hammered them into a semblance of utility. They are irritating and riddled with inaccuracies. We could expend enormous effort hammering them into a better state, at which point they would still be irritating and imperfect. And yet, in that very imperfection lies a seed of hope. And in that seed is a simple message. The simple message is this: “I could do better”. And so we can. We can learn and should learn to speak the suttas ourselves. We should do so because the suttas have been spoken for thousands of years with everyday voices of all kinds. And those voices have melded and blended spiritual friends in harmony and joy.

So I listen to SC-Voice trying to ease the roughness of that machine voice, wondering how I should speak these words for myself in the company of others.

:pray:

——
By the way. I do like the idea of distinguishing the speakers in AI audio. Blind people do not see quotes, so a demarcation of speaker change is actually quite helpful. A simple way to achieve that demarcation is to change the pitch and/or rate of speech for each speaker.
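As a rough illustration of that idea, and not a description of how SC-Voice actually does it, the sketch below wraps each speaker's lines in standard SSML `<prosody>` tags with a different pitch and rate. It assumes the TTS engine accepts SSML with `pitch` and `rate` attributes. The speaker names, the percentage values, and the function name `to_ssml` are hypothetical choices made for the example.

```python
from xml.sax.saxutils import escape

# Hypothetical per-speaker settings; an empty dict means the default voice.
SPEAKER_PROSODY = {
    "narrator": {},
    "buddha": {"pitch": "-10%", "rate": "90%"},
    "ananda": {"pitch": "+5%"},
}


def to_ssml(turns, prosody=SPEAKER_PROSODY):
    """Build one SSML document from (speaker, text) pairs."""
    parts = []
    for speaker, text in turns:
        attrs = prosody.get(speaker, {})
        body = escape(text)  # keep &, < and > from breaking the markup
        if attrs:
            attr_text = " ".join(f'{key}="{value}"' for key, value in attrs.items())
            body = f"<prosody {attr_text}>{body}</prosody>"
        parts.append(f"<p>{body}</p>")
    return "<speak>" + "".join(parts) + "</speak>"
```

Each speaker change then becomes audible as a shift in pitch or pace, even when the same underlying voice reads every part.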

5 Likes

I have been tinkering with Bhashini and a couple of other research-stage (https://syspin.iisc.ac.in/) Indic language TTS projects. Each one is very promising and there are a few things I have discovered.

  • Using a model trained on current languages like Maithili, Bhojpuri, Chhattisgarhi, etc. gives very authentic Pali sounds. Until recently, a Kannada model was used for Sanskrit and Pali.

  • There are models trained on Sanskrit (smaller datasets) but they tend to sound like polished newsreaders.

  • Speakers can be distinguished and given a different voice by inserting a prompt before the text but so far I have been doing it manually on small paragraphs.

  • The cool bit is while calling a model, one can specify parameters like speed, ambience, etc. via simple prompts and one can also specify emotive qualities like calm, agitated, rude, etc. via simple prompts.

I am still playing around with a small set of texts from SC. I will write a longer post detailing my results. I am doing all this locally. As of now, I do not know how one does this in an automated fashion like SC-Voice.

4 Likes

I was so excited when I saw this and then realized quickly that Sinhala isn’t a language of India. :pleading_face:

4 Likes

One of the challenges SCVoice faces is offline usage. A TTS model would be ideal, especially one based on Indian languages. In fact the Aditi voice is an en-HI voice.

Our offline efforts have solidified into active development of scVoice for iOS devices. Two approaches seem viable: 1) customize third party existing TTS Indian voices as you are doing, and 2) customize Apple Indian voices (e.g., Sangeeta). It would be very interesting to learn about your own research into potential Pali TTS that we could deploy for offline listening.

:pray:

Regarding bhashini.ai, I see a restrictive license policy that would not allow us to redistribute audio. For example, a simple conversion of format is a derivative work, I think:

2.1 Customer will not, directly or indirectly: reverse engineer, decompile, disassemble or otherwise attempt to discover the source code, object code or underlying structure, ideas, know-how or algorithms relevant to the Services or any software, documentation or data related to the Services (“Software”); modify, translate, or create derivative works based on the Services or any Software; use the Services or any Software for timesharing or service bureau purposes or otherwise for the benefit of a third; or remove any proprietary notices or labels. With respect to any Software that is distributed or provided to Customer for use on Customer premises or devices, Company hereby grants Customer a non-exclusive, non-transferable, non-sublicensable license to use such Software during the Term only in connection with the Services.

2 Likes

Oh, but the way they do a gazillion Indian languages is to use models of a near-enough language to do the TTS. The project started off with a few languages a few years ago and added languages one by one.

So for Sinhala this approach could work. There are smaller projects on GitHub for Sinhala:

3 Likes