Upload your own voice to sc-voice

What if users were able to pick a sutta, click on a line, and record themselves right there for others to hear?

There would be keys/buttons for starting, stopping, retrying, deleting, playing back, and saving a recording, for going to the next/previous line, and for requesting to upload to the site for others to hear.

I say both a button and a keyboard shortcut since the mic may pick up the sound of one or the other. Another way to avoid that is to start recording a second later and cut off the last fraction of a second.
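A rough sketch of the "cut off the click" idea, done after the fact by trimming samples rather than delaying the start. The function name, the one-second lead, and the quarter-second tail are all assumptions for illustration, not anything that exists in SC-Voice:

```python
def trim_recording(samples, sample_rate, lead_seconds=1.0, tail_seconds=0.25):
    """Drop the first lead_seconds and the last tail_seconds of a mono recording.

    samples is any sliceable sequence of audio samples (hypothetical input);
    sample_rate is in samples per second.
    """
    start = int(lead_seconds * sample_rate)
    end = len(samples) - int(tail_seconds * sample_rate)
    if end <= start:
        # Recording is too short to survive trimming: return an empty slice.
        return samples[:0]
    return samples[start:end]
```

With a real recording this would run on the decoded samples before the file is saved or uploaded.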

It would bring these problems:

  • corrupted/bad recordings (force a request/approval system)
  • spam uploads (rate limiting)
  • huge file size uploads (add file size limit)
  • in-progress/unapproved recordings: stored in browser cache or database?
  • repeated text: should the user really read “Aṅguttara Nikāya” over and over, or just use one recording of it and look that up by search wherever it occurs?
  • (probably a different project altogether) uploading an entire sutta recording to be parsed by line
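The first three safeguards in the list could be sketched roughly like this. It is only an illustration: the class name, the limits, and the status strings are made up, and a real site would keep this state in a database rather than in memory.

```python
import time
from collections import defaultdict, deque

MAX_BYTES = 5 * 1024 * 1024   # assumed size cap for one recorded line
MAX_UPLOADS = 10              # assumed uploads allowed per window
WINDOW_SECONDS = 3600         # assumed length of the rate-limit window


class UploadGate:
    """Hypothetical gate: size limit, per-uploader rate limit, approval queue."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._history = defaultdict(deque)  # uploader id -> recent upload times

    def check(self, uploader_id, size_bytes):
        if size_bytes > MAX_BYTES:
            return "rejected: file too large"
        now = self._clock()
        recent = self._history[uploader_id]
        # Forget uploads that have fallen out of the sliding window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_UPLOADS:
            return "rejected: rate limit"
        recent.append(now)
        # Nothing goes live directly; a reviewer still has to approve it.
        return "pending approval"
```

The uploader id could be an account, an IP address, or a browser token; which one to use is exactly the open question about user accounts.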

It’s hard for me to say how challenging this would be until after it has begun. I don’t feel like the interface is anything serious cost-wise, but storing the recordings and assigning each one to a person/IP/account could bring tough issues and might require user accounts. I don’t think a high number of accounts would be created anyway, since not many people record this, and audio recordings aren’t too big: “On average, audiobook files are 28 MB per hour” (Kobo). Similar websites exist for design reference, like howtopronounce.com.
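For a back-of-the-envelope storage estimate from that 28 MB per hour figure, something like the following would do. The function name is made up, and the real rate would depend on the audio format and bitrate chosen:

```python
MB_PER_HOUR = 28  # the audiobook average quoted from Kobo


def estimated_storage_mb(hours_of_audio):
    """Rough storage need in MB for a given number of recorded hours."""
    return hours_of_audio * MB_PER_HOUR
```

So, purely as an illustration, 100 hours of accepted recordings would come to about 2,800 MB.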

There are a few reasons for having this. It would improve accessibility for blind and vision-impaired people (giving them better access to the teachings). This site’s design would expand and encourage users to read translations in other languages, improving accessibility by language. It’d help those who prefer a human voice over the robot voice. It’d be a lot easier to have all the texts in voice if many people were recording them. Unlike many of the recordings of EBTs being read online, the advantage of this website is its pairing with the translation and its degree of control over line/phrase selection, allowing people to hear both the original and the translation easily (rather than skipping to the 6-minute mark of a recording and backing it up as needed). This also belongs in the series “A SuttaCentral creativity multiplier!”, as it’d allow people to express their own vocal interpretation of these texts and hear others’ too.

I could contribute to this by the way. (@sabbamitta, @firetick1)

4 Likes

Thank you @bran for your proposal. As you already said, there are a lot of challenges to it.

Perhaps the biggest problem from the Voice side of things is that there is simply no capacity to realize such an apparently substantial piece of work. We have only one programmer whose time is running out, and one helper who isn’t able to do programming work.

Perhaps the way to go is for someone else to clone SC-Voice and build their own thing on top of it.

You probably meant to tag in @karl_lew, that’s his username here on D&D.

3 Likes

Thank you, Ayya @Sabbamitta. :pray:

@Bran, as Ayya mentions, this is difficult currently. It is also very, very possible. You will soon be able to pay some company some money to do things like this. Text-to-speech voices are trained from human speech. There is an actual woman who spoke the initial words for Siri, the voice on my phone. She was chosen for the clarity and evenness of her speech. So yes, it is very possible, because we have done such things already, and AI technology will continue in this direction, doing things like resurrecting the voices of dead actors to use them for its own purposes. The woman who spoke Siri’s first words is no longer needed by Apple; she does other work as a voice actor.

From the point of view of the suttas, it is a bit odd however.

SN45.3:1.3: “Sir, good friends, companions, and associates are the whole of the spiritual life.”

If we spend our lives listening to our own voice, without good friends, companions and associates, are we living the whole of the spiritual life?

AN2.125:1.1: “There are two conditions for the arising of wrong view.
AN2.125:1.2: What two?
AN2.125:1.3: The words of another and irrational application of mind.

Is the desire to listen to our own voice rational or irrational?

What do you think, @Bran? :thinking:

:pray:

p.s., thank you for considering the blind. Voice is how I listen to the suttas. I am going blind.

2 Likes

I applaud this creative thinking! :pray:t3: :heart_eyes:

As an IT project manager, I wholly concur with Ven. Sabbamitta, absent a huge influx of resources.

As Bhante has highlighted in the scenarios of his stochastic-parrots essays, it’s hard to imagine that someone won’t end up doing something like this (with dead-actor voices).

In the meantime, as I’ve mentioned in the pāli class threads, I’ve “gone old school” and begun uploading all of Ven. Jiv’s pāli recordings that Frank has posted on the Internet. (This also includes a handful of Frank’s own recordings.)

To date, I’ve uploaded about 4GB and likely captured about 60-70% of what’s available. In the Google Drive, which is now publicly accessible via the link below, I also include a Google Sheet that keeps a running tally of all the suttas I’ve uploaded.

https://drive.google.com/drive/folders/14hd872VeebgRrdY8H_dPUhOpZxNimUTn?usp=drive_link

Out of respect for Frank’s privacy I’m not really saying much else about it. (I don’t know Frank.)

That said, I’ve deciphered some of his file naming tendencies and am starting to find Ven. Jiv recordings that aren’t linked to anything on the Internet.

So, we’ll see how much we end up with!

:pray:t3: :elephant: :smiling_face:

4 Likes

paliaudio.com have many of Bhante Sujato’s English translations already recorded.
They seem to be under Creative Commons Attribution 4.0 International licence.

I wonder if someone could help to add these to voice?

1 Like

AudioTipitaka was trying to do crowdsourced audio readings. If you are striving for a reasonable level of quality, there is a lot of work involved. Not a reason not to do this, but just a warning that it’s a bit more difficult than one would think. You have to do “proof-listening”, and if there are errors it’s much more difficult to correct them than with text editing.

@BethL, you could consider approaching @Paliaudio.com to see if they are missing any recordings by Ven. Jiv. They have some, but I don’t think all. And just as a technical matter, they were recorded from the Buddhajayantitipitika (BJT), not the edition found on SuttaCentral.

Ven. Snowbird, will do! I have been looking at Frank’s various sites for so many months to discern where they are all hiding :smirk_cat:.

Thanks… I did double and triple-checks against SuttaCentral to ensure that how I’ve labeled them in my Google file naming convention is consistent with SuttaCentral. My next step is to actually embed the SC links!

Thank you… yes, to clarify, these recordings by Ven. Jiv are in pāli.

1 Like

We should always keep in mind that Bhante Sujato’s translations are still subject to a lot of revision. I guess that any recorded version out there is likely to be outdated.

That is actually a great benefit of machine TTS, that it can easily be up to date with the latest version. No more difficult than text editing. Which is one of the reasons why we chose this approach with Voice.

They would have to be uploaded to a server after audio editing/segmenting. For the few files recorded by Bhante Sujato that we have so far, this has been done by @michaelh. :pray: After that, this server needs to be linked to Voice (I am not sure how that works; @karl_lew did that).

But I guess the English versions of those recordings are probably … totally outdated! :person_shrugging: So in this case the text that you read and the audio that you hear will differ.

2 Likes

Sadly, yes. Bhante’s existing audio recordings will eventually no longer match Bhante’s translations. For example, “identity view” has become “substantialist view”. The robot narrators, in contrast, automatically follow all SuttaCentral translations as they are published, so the robots will speak according to what is written.

Importantly, we should all be aware that robots make mistakes. We tend to assume robots are free of mistakes given their monotonous repetitiveness. Robots make mistakes. And we cannot fully trust these robot narrators because they make mistakes. We have to listen mindfully. One perfect example is the word “bow”, which has two (2) pronunciations. These pronunciations differ in meaning. There is a bow :bow_and_arrow: And there is a bow :bowing_man: The Buddha uses both these terms. So please note that the robots always make a mistake here and use the same pronunciation for both. This is bad, but I could not fix it. So please be mindful as you listen to the robots. Robots make mistakes.

Ideally, humans should just chant the suttas together as we all did in the past. Logistically, that is really hard to manage. So the robots are a poor approximation for what we really should be doing.

Finally, although it is technically possible to capture the sense of an individual human’s voice and reapply it in TTS so that it seems like that human, the prospect of training an AI voice to speak the Dhamma as an avatar for a deceased monastic is simply horrific. This is why I would strongly object to training any robot narrator to sound like an actual follower of the Teachings. The potential for harm there is enormous. It is very, very unethical because it is based on deceit.

MN8:12.32: ‘Others will be deceitful, but here we will not be deceitful.’

Please let us not upload our own voices to SC-Voice to speak on our behalf. I believe that to do so would not be ethical.

:pray:

2 Likes