My new project: recording the suttas 🎙

karl_lew · March 16, 2019, 9:56pm

Absolutely fantastic! problem solved!

Cool.

Then this means that Voice could actually just use the JSON timing map generated by aeneas and simply show the segment text according to the timing map. Anagarika @sabbamitta, I have added a new Voice backlog section with items for me to research the use of aeneas for auto-segmented replaying of an entire audio recording. I’ve also added an item for @Aminah to design the new settings UI to handle this.

What this will do is allow us to:

provide a Voice setting for Pali spoken by Bhante Sujato as the default if available and Aditi as the option.
provide a Voice setting for English spoken by Bhante Sujato as the default if available and all existing AI voices as optional.

@Michaelh, I’m guessing that aeneas mapping files take time to generate, so I will need to figure out where to store the mapping files in Voice. Also, would you generate the mapping files yourself, or would you expect this to be done by the Voice server?

Bhante @Sujato, just to confirm, we assume you would re-record entire suttas affected by translation changes made post-recording to avoid mismatch of the corrected text segment with its corresponding audio recording.

sabbamitta · March 16, 2019, 10:04pm

Oh really?? !!!

Thank you so much!!!

Timothy · March 16, 2019, 10:22pm

Absolutely marvelous!

sujato · March 17, 2019, 2:01am

Indeed, it does take some time. Less than a second, to be precise.

karl_lew · March 17, 2019, 2:44am

Sounds like you and Michael are happily familiar with this and will have no need for help in generating JSON mapping files. I shall be the happy recipient of that JSON. I assume that the filenames for .flac and -map.json content will follow hierarchies similar to mn/en/sujato/filename so that I can infer the proper urls for either resource.
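
A minimal sketch of how both URLs might be inferred from such a hierarchy. The root URL, the function name, and the exact `-map.json` placement are assumptions for illustration; the real naming convention had not been settled at this point.

```python
def resource_urls(root_url, collection, lang, author, sutta_id):
    """Build audio and timing-map URLs from a collection/lang/author/sutta path.

    All parameter names are hypothetical; only the general shape
    (e.g. mn/en/sujato/filename) comes from the discussion.
    """
    base = f"{root_url.rstrip('/')}/{collection}/{lang}/{author}/{sutta_id}"
    return {
        "audio": base + ".flac",
        "map": base + "-map.json",
    }


# Example with a placeholder root URL.
urls = resource_urls("https://example.org/audio/", "mn", "en", "sujato", "mn1")
print(urls["audio"])
print(urls["map"])
```

Keeping the root URL as a parameter means the same code could point at a different server later without touching stored data.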

I shall probably need to cobble together a simple Voice prototype for a single sutta flac with its mapping file before I can understand how Voice code needs to change. Up until now there has been a design assumption in Voice that sound files are segment level, not sutta level. When design assumptions change, code creaks and groans mightily with stress.

michaelh · March 17, 2019, 4:26am

Hi all!

I’ve written a changed wiki at Recording suttas: process for creating voice recordings · michaelh-sc/suttacentral Wiki · GitHub - can someone give the github user michaelh-sc permission to push straight to the wiki, as it ain’t easy to fork a github wiki and issue pull requests it seems.

Now, I would suggest Digital Ocean buckets in the long term, as they have a good geolocating CDN for everyone to access and are cheaper than S3. Wasabi and Cyberduck could be used for sharing working files around.

The only question (I’m still unsure if it’s answered) is are the segments from Aeneas mapping straight to segment?

Now I understand that the key is the actual text. Wow, that’s a very awesome solution, and all in RAM? - sadhu!!

I think that the intonation and context might get lost if we continue to have the same line pronounced in the same way for any sutta, what do we think? Might need a second subclassed guidcache where the key is the hash of suttaname-segment_number-isolang-author or just a key value pair of that key and a URL but I am no expert on nodejs at all.

Let me know if any of that wiki article needs changing, and if no one has any objections I’ll set up a wasabi bucket with keys very shortly and register it as a company thing with SC and owned by Ajahn @Sujato, and a list of authorised people later because anicca. Also a Digital Ocean object storage with the same keys - both can be accessed with Cyberduck, Cloudberry, and web if needed at very high speeds I bet.

Sadhu all!!!

Robbie · March 17, 2019, 11:06am

One of my favorite aspects of hanging around this forum is seeing progress unfold in action.

Thank you so much, Bhante @sujato, @michaelh, and @karl_lew, for making this project a reality. Buddhism was first introduced in the West in the 19th century. But it is only now, toward the end of the 2010s that the well-translated Dhamma is becoming accessible for everyone: rich or poor, blind or sighted, located in NYC or Sub-Saharan Africa, speaking English fluently or not-so-fluently. It’s an honor to witness this development.

@karl_lew It’s a curious state of affairs when two people are both convinced by the other’s position! However, your idea of combining flac files with images to form YouTube videos gave me an idea. What about combining selected sutta recordings with artwork already made by Ven Yodha? Then those could be uploaded to Dhammanet, which already has 5 000 subscribers. It would give the additional advantage of starting out small w.r.t. YouTube videos.

Thinking about the next decade, I could see great value in high-quality segmented translations to Mandarin Chinese, Hindi, and Spanish—on par with and a full substitute for Pāli/English—to make the Dhamma accessible to even more people, learning from the experience of distributing the Dhamma in English. My $.02

karl_lew · March 17, 2019, 1:55pm

Fantastic idea! We’re still waiting on the Vinaya from Ajahn Brahmali, but when that arrives we can start working on videos with Ven @Yodha's permission. As with the suttas, it would be good to have a choice of human (which human(s)?) and robot voices. Anagarika @Sabbamitta, might we have this on the Voice backlog?

Michael, thanks for the update. I don’t have wiki permissions myself so I cannot help with merging your change.

Would you say more about the following? I don’t quite understand the _3v2 bit. What’s the 3 and what’s the 2? And is that a full sutta or just a segment? Does the original ever get updated? How do we find the latest version?

for whole sutta and en/sujato/sn1.1_3v2.flac for any retakes of segments if needed.

One of the things I will need to determine is whether Voice should use cloud storage for each user request or only to fill its own cache. My initial research indicates that latency and/or egress costs may cause us to favor the cache solution rather than accessing cloud storage for each user request. If we do rely on Voice caching of cloud storage, then the advantage of a geolocating CDN would not matter so much, since the latency would be solved by the VPS cache and we would geolocate the VPSs instead.
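
A rough sketch of the cache-first option, written in Python purely for illustration. The class name and the injected `fetch` callable are assumptions, not Voice code; the point is only that cloud storage is hit once per file rather than once per user request.

```python
class ReadThroughCache:
    """Serve files from a local store, fetching from cloud storage only on a miss."""

    def __init__(self, fetch):
        # fetch is any callable(key) -> bytes, e.g. a cloud download (hypothetical).
        self._fetch = fetch
        self._store = {}
        self.misses = 0

    def get(self, key):
        if key not in self._store:
            # Only this branch would incur cloud latency and egress cost.
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def fake_cloud_fetch(key):
    # Stand-in for a real download; returns dummy bytes.
    return f"audio for {key}".encode("utf-8")


cache = ReadThroughCache(fake_cloud_fetch)
cache.get("sn1.1")
cache.get("sn1.1")
print(cache.misses)
```

A real cache would also need eviction and persistence to disk, which this sketch leaves out.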

The following example from the aeneas site isn’t JSON, but could certainly be a JSON object with text keys and value objects having start/end millisecond offset:

1 => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
That thereby beauty’s rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
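
As a sketch of that conversion, the lines above could be parsed into a JSON object like this. The function names and the `start`/`end` field names are assumptions for illustration, not the format aeneas itself writes.

```python
import json
import re

# Matches lines like: text => [00:00:02.640, 00:00:05.880]
LINE_RE = re.compile(
    r"^(?P<text>.*?) => \[(?P<start>[\d:.]+), (?P<end>[\d:.]+)\]$"
)


def to_millis(timestamp):
    """Convert "HH:MM:SS.mmm" to integer milliseconds."""
    hours, minutes, seconds = timestamp.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int((frac + "000")[:3])
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + millis


def parse_timing_lines(lines):
    """Map each text key to its start/end offsets in milliseconds."""
    timing = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a timing line
        timing[match.group("text")] = {
            "start": to_millis(match.group("start")),
            "end": to_millis(match.group("end")),
        }
    return timing


sample = [
    "1 => [00:00:00.000, 00:00:02.640]",
    "From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]",
]
print(json.dumps(parse_timing_lines(sample), indent=2))
```

Note that with text as the key, a line repeated within one recording would overwrite its earlier entry, so a segment id may be the safer key in practice.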

Bhante will repeat segments as he speaks each sutta, so the outcome will be more natural. However, if he updates the translation of any repeated segment, then all suttas that use the translated segment would need to be re-recorded. For example, if Bhante changes the translation of “There are 1.4 million main wombs,…”, then he would have to re-record three suttas. If Bhante changes the translation of the first jhana definition, then he would need to re-record 96 suttas.

The SoundStore GUID is the hash of the JSON signature, which for human voices will include the segment number since Bhante is recording entire suttas. This will disambiguate common segments. Indeed, we could dispense with the guid altogether since the key you gave would also work. I’ll need to check the Voice code to see which will be easiest.
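
To illustrate the idea, here is a minimal sketch of a key that includes the segment number. The choice of SHA-256, the hyphen-joined key layout, and the function name are all assumptions; the actual SoundStore signature in Voice may be computed differently.

```python
import hashlib


def segment_guid(sutta, segment, lang, author):
    """Hash a suttaname-segment_number-isolang-author key (hypothetical layout)."""
    key = f"{sutta}-{segment}-{lang}-{author}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Identical text in two different segments still gets two different GUIDs,
# because the segment number is part of the key.
a = segment_guid("sn1.1", "1.2", "en", "sujato")
b = segment_guid("sn1.1", "1.3", "en", "sujato")
print(a != b)
```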

Thanks for doing this. I’m slowly wading through the S3 Javascript SDK documentation and finding it rather massive and overwhelming. It may take a while to figure this out. Also, let’s wait for Bhante to tell us what is easiest for him to store the flac files that he creates. He may or may not want to upload to Wasabi directly himself and may need your help with that.

Oh and I think the Wasabi buckets would be public for read so that Voice can access them?
Bhante’s intent is broad distribution, so I think public for read is fine.

Aminah · March 17, 2019, 2:03pm

Done! (A warm hullo, btw)

sabbamitta · March 17, 2019, 3:01pm

I am not entirely sure what you mean. If we have Vinaya texts available for Voice we’ll use just the same robot voices as for the suttas. Human voices of course depend on recording. I am not sure what you want me to put on the Voice Backlog?

karl_lew · March 17, 2019, 5:19pm

Robbie had the cool idea of creating movies for YouTube from Ven. Yodha’s Dhamma Doodles with the text of the Vinaya and the sound of the Vinaya all synchronized together.
It’s a pretty amazing combination! And it would take several helping hands.

sabbamitta · March 17, 2019, 5:24pm

Oh, I just didn’t understand this falls within the scope of SC-Voice. But if you’d like to work on this I don’t want to stand in your way, of course!

It’s on the Backlog now.

karl_lew · March 17, 2019, 5:37pm

I actually don’t know who would do it, but they would likely need our help. Let’s just nominate @Robbie to lead the charge on this!

Robbie · March 17, 2019, 8:20pm

Nomination accepted!

Now I have questions!

How can I find the backlog? Do we also want to create videos for selected suttapiṭaka suttas (those which already have doodles)? Do we want to wait for human vinaya recordings? English only or Pāli/English?

sabbamitta · March 17, 2019, 8:33pm

Robbie · March 17, 2019, 8:36pm

@karl_lew I think we can use hand-made YouTube subs for the text. Then people can turn them on/off as they see fit.

sujato · March 18, 2019, 9:04am

yes, we will make sure the naming conventions are correct.

That was our assumption, too: Michael creates both segmented files for SCV, and whole files for ebooks, Youtube, internet archive, etc.

On the other hand, if you want to figure out how to deliver segmented audio direct from SCV using whole files as source, that’d be even better. The whole/segmented difference is, perhaps, better handled at the application level. For epubs, for example, we use the whole audio file and an XML smil file to tell it where the segments are. But of course, the bandwidth issues for an ebook are different than with SCV.

Anyway, from our point of view, we are happy to serve segmented audio files to SCV, but if we could serve whole audio files, we’d be even happier!

SCV won’t touch the flac files. We will serve pre-optimized opus files for production, and retain flac for archive only. Opus has really good compression! Just with some initial tests, a 6.7 MB flac file converts to a 1.1MB opus file with no audible loss of quality.

For this reason, I also do not think we should deploy low and high quality versions of the files. Just make opus files at a good compression with no audible quality loss. I’d like to keep our pipeline as simple as possible.

Yes.

This is a hard problem, and I don’t think there is an easy solution. We should maintain some kind of version control so that we can be sure that this audio is the recording of this text. It would be nice, but not guaranteed, to update the audio with every correction of the text. I suspect we’ll have to dribble this one down the field a bit and hope that we all die before it becomes a big issue.

For your edification and amusement, I include a sample of the files I created today. For Michael, I did the first ten suttas. Here I just upload one. This has English and Pali text, audio, and timing maps. The audio has both raw flac files as well as opus. Everything is, of course, just for testing purposes, these are not production files.

sn1.1.zip (10.1 MB)

Postscript: opus is supported by 86% of browsers currently.

But we need to use a “CAF” wrapping for Safari.

https://hetzel.net/2017-06-12/ios-11-opus-support-in-podcast-feeds/

sabbamitta · March 18, 2019, 10:09am

Wow—the Pali sounds so very beautiful! It’s certainly Aditi who will have the most serious competitor!!! Full of awe…

michaelh · March 18, 2019, 3:34pm

Thanks for adding me Aminah! Please give me another day to look through a bit more before I make any pull reqs/sudden pushes to wiki
All suttas on Youtube with Ven. Yodha’s artwork is a great idea imo, the dhamma doodles ven. Akaliko gave out once are so nice - can be done later once some recordings are finalised perhaps.

Sorry, that was an error on my part: I missed one digit. I’ll just follow the already worked out segment naming convention for final deliveries to the app, but I do suggest an underscore instead of a colon as the segment delimiter, and no versioning or step name for final opus files:
sn1.8_3.4.suffix
sutta_segment.webm
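
A small sketch of how a name in this form might be split back into its parts. The regular expression and the function name are assumptions based only on the `sn1.8_3.4.suffix` example above.

```python
import re

# sutta id, underscore, dotted segment number, dot, file suffix
NAME_RE = re.compile(
    r"(?P<sutta>[a-z]+\d+(?:\.\d+)*)"
    r"_(?P<segment>\d+(?:\.\d+)*)"
    r"\.(?P<suffix>[a-z][a-z0-9]*)"
)


def parse_segment_filename(name):
    """Return a dict of sutta, segment and suffix, or None if the name does not fit."""
    match = NAME_RE.fullmatch(name)
    return match.groupdict() if match else None


print(parse_segment_filename("sn1.8_3.4.webm"))
```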

OK, so a couple of servers per broad ‘region’ perhaps. CDNs can be pretty great, but perhaps too much data out. I’m not sure how many downloads there will be; it could go full viral. If the root URL is a variable instead of being stored in the database, it could point to various servers, or to one good web server in the app later.

If a sutta with timecodes for when the text shows is possible, it might be a faster experience for hifi than downloading all segments separately, either in parallel or one after the other. Buffering like a youtube video does?
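
A minimal sketch of the lookup this would need: given the current playback position in a whole-sutta file, find which segment text to show. The tuple layout and the function name are assumptions for illustration.

```python
from bisect import bisect_right


def segment_at(timing, position_ms):
    """Return the text playing at position_ms, or None if outside every segment.

    timing is a list of (start_ms, end_ms, text) tuples sorted by start_ms.
    """
    starts = [start for start, _, _ in timing]
    index = bisect_right(starts, position_ms) - 1
    if index < 0:
        return None  # before the first segment
    _, end, text = timing[index]
    return text if position_ms < end else None


timing = [
    (0, 2640, "1"),
    (2640, 5880, "From fairest creatures we desire increase,"),
]
print(segment_at(timing, 3000))
```

With a lookup like this, the player could stream one file and update the displayed text as the position advances, instead of requesting each segment separately.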

OK, a single bitrate opus in .webm and .caf seems like a good single target imo.

yodha · March 19, 2019, 11:03am

Sure, you can use any doodles you like. Let me know if you need any special topic or sutta not covered yet, and I’ll see what I can do.