Upload your own voice to sc-voice

bran · April 21, 2024, 12:12am

What if users were able to pick a sutta, click on a line, and record themselves right there for others to hear?

There would be keys/buttons for starting, stopping, retrying, deleting, playing back, and saving a recording, going to the next/previous line, and requesting to upload it to the site for others to hear.

I say a button and a keyboard shortcut since the mic may pick up the sound of one or the other. Another way to avoid that is for the recording to start a second late and for the last fraction of a second to be cut off.
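A rough sketch of that trimming idea, done after the fact on the recorded samples rather than by delaying the start. The function name `trim_clicks` and the default durations are made up for illustration; a real recorder in the browser would work on its own audio buffers.

```python
def trim_clicks(samples, sample_rate, lead_s=1.0, tail_s=0.25):
    """Drop the first lead_s seconds and the last tail_s seconds of a recording.

    samples is any sliceable sequence of audio samples (list, array, ...).
    lead_s and tail_s are assumed defaults, not measured values.
    """
    start = int(lead_s * sample_rate)
    end = len(samples) - int(tail_s * sample_rate)
    if end <= start:
        # Recording is shorter than the trimmed margins: nothing usable is left.
        return samples[:0]
    return samples[start:end]
```

The reader would then need to wait about a second after pressing the key before speaking, so the interface would probably want a short countdown.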

It would bring these problems:

corrupted/bad recordings (force a request/approval system)
spam uploads (rate limiting)
huge file size uploads (add file size limit)
in-progress/unapproved recordings: stored in browser cache or database?
repeated text - should the user really read “Aṅguttara Nikāya” over and over? or just use 1 recording of it and look that up by search wherever it occurs
(probably a different project altogether) uploading an entire sutta recording to be parsed by line
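The size limit and rate limiting from the list above could be sketched roughly like this. Everything here is an assumption for illustration: the limits, the names, and the idea of keying on an uploader id (which could be an account or an IP). It keeps state in memory only, so a real server would need something persistent.

```python
from collections import defaultdict, deque

# All three limits are assumed values, not anything SC-Voice defines.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024   # cap on a single recording
MAX_UPLOADS_PER_WINDOW = 20          # uploads allowed per uploader per window
WINDOW_SECONDS = 3600                # length of the sliding window

_recent = defaultdict(deque)         # uploader_id -> timestamps of accepted uploads


def accept_upload(uploader_id, size_bytes, now):
    """Return (accepted, reason) for one upload attempt at time `now` (seconds)."""
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file too large"
    stamps = _recent[uploader_id]
    # Forget uploads that have fallen out of the sliding window.
    while stamps and now - stamps[0] >= WINDOW_SECONDS:
        stamps.popleft()
    if len(stamps) >= MAX_UPLOADS_PER_WINDOW:
        return False, "rate limit exceeded"
    stamps.append(now)
    # Accepted uploads still wait for a human to approve them.
    return True, "queued for approval"
```

Accepted uploads only go into an approval queue here, so corrupted or bad recordings would still be caught by a person before anyone else hears them.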

It’s hard for me to say how challenging this would be until it has begun. I don’t feel like the interface is anything serious cost-wise, but storing the recordings and assigning each one to a person/IP/account could bring tough issues and might require user accounts. I don’t think a high number of accounts would be created anyway, since not many people record this, and audio recordings aren’t too big: “On average, audiobook files are 28 MB per hour” (Kobo). Similar websites exist for design reference, like howtopronounce.com.
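For a back-of-the-envelope feel for storage, here is the arithmetic using only the Kobo figure quoted above. The function names are made up, and real sizes depend on the codec and bitrate chosen, so treat this as an estimate only.

```python
MB_PER_HOUR = 28.0  # figure quoted above from Kobo; real encodings will differ


def storage_mb(hours_of_audio, mb_per_hour=MB_PER_HOUR):
    """Estimated storage in MB for a given number of hours of recordings."""
    return hours_of_audio * mb_per_hour


def clip_kb(seconds, mb_per_hour=MB_PER_HOUR):
    """Estimated size in KB of one short clip, such as a single recorded line."""
    return mb_per_hour * 1024 * seconds / 3600
```

At that rate, 100 hours of recordings would come to about 2.8 GB, which fits the point that storage size itself is not the hard part.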

There are a few reasons for having this. It would improve accessibility for blind and vision-impaired people (giving them better access to the teachings). It would encourage users to read translations in other languages, improving accessibility by language. It’d help those who prefer a human voice over the robot voice. It’d be a lot easier to have all the texts in voice if many people were recording them. Unlike many of the recordings of EBTs being read online, the advantage of this website is its pairing with the translation and the degree of control around line/phrase selection, allowing people to hear both the original and the translation easily (rather than skipping to the 6 minute mark of a recording and backing it up as needed). This is also in the series of A SuttaCentral creativity multiplier! as it’d allow people to express their own vocal interpretation of these texts and hear others’ too.

I could contribute to this by the way. (@sabbamitta, @firetick1)

sabbamitta · April 21, 2024, 6:57am

Thank you @bran for your proposal. As you already said, there are a lot of challenges to it.

Perhaps the biggest problem from the Voice side of things is that there is simply no capacity to realize such an apparently substantial piece of work. We have only one programmer whose time is running out, and one helper who isn’t able to do programming work.

Perhaps the path to go is that someone else can clone SC-Voice and build their own thing on top of it.

You probably meant to tag in @karl_lew, that’s his username here on D&D.

karl_lew · April 21, 2024, 10:53am

Thank you, Ayya @Sabbamitta.

@Bran, as Ayya mentions, this is difficult currently. It is also very, very possible. You will soon be able to pay some company some money to do things like this. Text-to-speech voices are trained from human speech. There is an actual woman who spoke the initial words for Siri, the voice on my phone. She was chosen for the clarity and evenness of her speech. So yes, it is very possible because we have done such things already, and AI technology will continue in this direction, doing things like resurrecting the voices of dead actors to use them for their own purposes. The woman who spoke Siri’s first words is no longer needed by Apple; she does other stuff as a voice actor.

From the point of view of the suttas, it is a bit odd however.

SN45.3:1.3: “Sir, good friends, companions, and associates are the whole of the spiritual life.”

If we spend our lives listening to our own voice, without good friends, companions and associates, are we living the whole of the spiritual life?

AN2.125:1.1: “There are two conditions for the arising of wrong view.
AN2.125:1.2: What two?
AN2.125:1.3: The words of another and irrational application of mind.

Is the desire to listen to our own voice rational or irrational?

What do you think, @Bran?

p.s., thank you for considering the blind. Voice is how I listen to the suttas. I am going blind.

BethL · April 21, 2024, 1:47pm

I applaud this creative thinking!

As an IT project manager, I wholly concur with Ven. Sabbamitta, absent a huge influx of resources.

As Bhante has highlighted in the scenarios in his stochastic-parrots essays, it’s hard to imagine that someone won’t end up doing something like this (with dead-actor voices).

In the meantime, as I’ve mentioned in the Pāli class threads, I’ve “gone old school” and begun uploading all of Ven. Jiv’s Pāli recordings that Frank has posted on the Internet. (This also includes a handful of Frank’s own recordings.)

To date, I’ve uploaded about 4GB and likely captured about 60-70% of what’s available. In the Google Drive, which is now publicly accessible via the link below, I also include a Google Sheet that keeps a running tally of all the suttas I’ve uploaded.

https://drive.google.com/drive/folders/14hd872VeebgRrdY8H_dPUhOpZxNimUTn?usp=drive_link

Out of respect for Frank’s privacy I’m not really saying much else about it. (I don’t know Frank.)

That said, I’ve deciphered some of his file naming tendencies and am starting to find Ven. Jiv recordings that aren’t linked to anything on the Internet.

So, we’ll see how much we end up with!

Pasanna · April 21, 2024, 10:45pm

paliaudio.com have many of Bhante Sujato’s English translations already recorded.
They seem to be under Creative Commons Attribution 4.0 International licence.

I wonder if someone could help to add these to voice?

Snowbird · April 21, 2024, 10:59pm

AudioTipitaka was trying to do crowdsourced audio readings. If you are striving for a reasonable level of quality, there is a lot of work involved. Not a reason not to do this, but just a warning that it’s a bit more difficult than one would think. You have to do “proof-listening”, and if there are errors, they are much more difficult to correct than with text editing.

@BethL, you could consider approaching @Paliaudio.com to see if they are missing any recordings by Ven. Jiv. They have some, but I don’t think all. And just as a technical matter, they were recorded from the Buddha Jayanti Tipitaka (BJT), not the edition found on SuttaCentral.

BethL · April 21, 2024, 11:44pm

Ven. Snowbird, will do! I have been looking at Frank’s various sites for so many months to discern where they are all hiding.

Thanks…I did double and triple-checks against SuttaCentral to ensure that how I’ve labeled them in my Google file naming convention is consistent with SuttaCentral. My next step is to actually embed the SC links!

BethL · April 21, 2024, 11:46pm

Thank you…yes, to clarify, these recordings by Ven. Jiv are in Pāli.

sabbamitta · April 22, 2024, 6:33am

We should always keep in mind that Bhante Sujato’s translations are still subject to a lot of revision. I guess that any recorded version out there is likely to be outdated.

That is actually a great benefit of machine TTS, that it can easily be up to date with the latest version. No more difficult than text editing. Which is one of the reasons why we chose this approach with Voice.

To add those recordings to Voice, they would have to be uploaded to a server after audio editing/segmenting. For the few files recorded by Bhante Sujato that we have so far, this has been done by @michaelh. After that, this server needs to be linked to Voice (I am not sure how that works, @karl_lew did that).

But I guess the English versions of those recordings are probably … totally outdated! So in this case the text that you read and the audio that you hear will differ.

karl_lew · April 22, 2024, 3:35pm

Sadly, yes. Bhante’s existing audio recordings will eventually no longer match Bhante’s translations. For example, “identity view” has become “substantialist view”. The robot narrators, in contrast, automatically follow all SuttaCentral translations as they are published, so the robots will speak according to what is written.

Importantly, we should all be aware that robots make mistakes. We tend to assume robots are free of mistakes given their monotonous repetitiveness. Robots make mistakes. And we cannot fully trust these robot narrators because they make mistakes. We have to listen mindfully. One perfect example is the word “bow”, which has two (2) pronunciations. These pronunciations differ in semantic meaning. There is a bow (the weapon). And there is a bow (the gesture of respect). The Buddha uses both these terms. So please note that the robots always make a mistake here and use the same pronunciation for both. This is bad, but I could not fix it. So please be mindful as you listen to the robots. Robots make mistakes.

Ideally, humans should just chant the suttas together as we all did in the past. Logistically, that is really hard to manage. So the robots are a poor approximation for what we really should be doing.

Finally, although it is technically possible to capture the sense of an individual human’s voice and reapply it in TTS so that it seems like that human, the prospect of training an AI voice to speak the Dhamma as an avatar for a deceased monastic is simply horrific. This is why I would strongly object to training any robot narrator to sound like an actual follower of the Teachings. The potential for harm there is enormous. It is very very unethical because it is based on deceit.

MN8:12.32: ‘Others will be deceitful, but here we will not be deceitful.’

Please let us not upload our own voices to SC-Voice to speak on our behalf. I believe that to do so would not be ethical.