Oh really?? !!!
Thank you so much!!!
Absolutely marvelous!
Indeed, it does take some time. Less than a second, to be precise.
Sounds like you and Michael are happily familiar with this and will have no need for help in generating JSON mapping files. I shall be the happy recipient of that JSON. I assume that the filenames for .flac and -map.json content will follow hierarchies similar to mn/en/sujato/filename, so that I can infer the proper URLs for either resource.
I shall probably need to cobble together a simple Voice prototype for a single sutta flac with its mapping file before I can understand how Voice code needs to change. Up until now there has been a design assumption in Voice that sound files are segment level, not sutta level. When design assumptions change, code creaks and groans mightily with stress.
Hi all!
I've written an updated wiki page at Recording suttas: process for creating voice recordings · michaelh-sc/suttacentral Wiki · GitHub. Can someone give the GitHub user michaelh-sc permission to push straight to the wiki? It ain't easy to fork a GitHub wiki and issue pull requests, it seems.
Now, I would suggest Digital Ocean buckets in the long term, as they have a good geolocating CDN for everyone to access and are cheaper than S3. Wasabi and Cyberduck can be used for sharing around working files.
The only question (I'm still unsure if it's answered) is: do the segments from Aeneas map straight to SuttaCentral segments?
Now I understand that the key is the actual text. Wow, that's a very awesome solution, and all in RAM! Sadhu!!
I think that the intonation and context might get lost if we continue to have the same line pronounced in the same way for every sutta. What do we think? We might need a second subclassed guidcache where the key is the hash of suttaname-segment_number-isolang-author, or just a key-value pair of that key and a URL, but I am no expert on Node.js at all.
Let me know if any of that wiki article needs changing, and if no one has any objections I'll set up a Wasabi bucket with keys very shortly and register it as a company thing with SC, owned by Ajahn @Sujato, with a list of authorised people added later, because anicca. Also a Digital Ocean object storage with the same keys - both can be accessed with Cyberduck, Cloudberry, and the web if needed, at very high speeds I bet.
Sadhu all!!!
One of my favorite aspects of hanging around this forum is seeing progress unfold in action.
Thank you so much, Bhante @sujato, @michaelh, and @karl_lew, for making this project a reality. Buddhism was first introduced in the West in the 19th century, but it is only now, toward the end of the 2010s, that the well-translated Dhamma is becoming accessible to everyone: rich or poor, blind or sighted, located in NYC or Sub-Saharan Africa, speaking English fluently or not-so-fluently. It's an honor to witness this development.
@karl_lew It's a curious state of affairs when two people are both convinced by the other's position! However, your idea of combining flac files with images to form YouTube videos gave me an idea. What about combining selected sutta recordings with artwork already made by Ven Yodha? Then those could be uploaded to Dhammanet, which already has 5,000 subscribers. It would give the additional advantage of starting out small w.r.t. YouTube videos.
Thinking about the next decade, I could see great value in high-quality segmented translations into Mandarin Chinese, Hindi, and Spanish (on par with, and a full substitute for, Pāli/English) to make the Dhamma accessible to even more people, learning from the experience of distributing the Dhamma in English. My $.02.
Fantastic idea! We're still waiting on the Vinaya from Ajahn Brahmali, but when that arrives we can start working on videos, with Ven @Yodha's permission. As with the suttas, it would be good to have a choice of human (which human(s)?) and robot voices. Anagarika @Sabbamitta, might we have this on the Voice backlog?
Michael, thanks for the update. I don't have wiki permissions myself, so I cannot help with merging your change.
Would you say more about the following? I don't quite understand the _3v2 bit. What's the 3 and what's the 2? And is that a full sutta or just a segment? Does the original ever get updated? How do we find the latest version?
for whole sutta and en/sujato/sn1.1_3v2.flac for any retakes of segments if needed.
One of the things I will need to determine is whether Voice should use cloud storage for each user request or only to fill its own cache. My initial research indicates that latency and/or egress costs may cause us to favor the cache solution rather than accessing the cloud storage for each user request. If we do rely on Voice caching of cloud storage, then the advantage of a geolocating CDN would not matter so much, since the latency would be solved by the VPS cache and we would geolocate the VPSes instead.
The following example from the aeneas site isn't JSON, but it could certainly become a JSON object with text keys and value objects holding start/end millisecond offsets:
1 => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
That thereby beautyās rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
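As a rough illustration of that idea, the fragment above could be parsed into such a JSON object like this (the line format and the millisecond conversion are my assumptions from the sample, not an established mapping format):

```javascript
// Hypothetical sketch: parse aeneas-style "text => [start, end]" lines
// into a JSON object keyed by text, with start/end in milliseconds.
const lines = [
  "1 => [00:00:00.000, 00:00:02.640]",
  "From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]",
];

// "00:00:02.640" -> 2640 milliseconds
function toMs(t) {
  const [h, m, s] = t.split(":");
  return Math.round(((+h * 60 + +m) * 60 + +s) * 1000);
}

const map = {};
for (const line of lines) {
  const m = line.match(/^(.*) => \[(.*), (.*)\]$/);
  if (m) {
    map[m[1]] = { start: toMs(m[2]), end: toMs(m[3]) };
  }
}

console.log(JSON.stringify(map, null, 2));
```

For real sutta files the keys would presumably be segment ids or segment text rather than sonnet lines, but the shape would be the same.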
Bhante will repeat segments as he speaks each sutta, so the outcome will be more natural. However, if he updates the translation of any repeated segment, then all suttas that use the translated segment would need to be re-recorded. For example, if Bhante changes the translation of "There are 1.4 million main wombs, …", then he would have to re-record three suttas. If Bhante changes the translation of the first jhana definition, then he would need to re-record 96 suttas.
The SoundStore GUID is the hash of the JSON signature, which for human voices will include the segment number, since Bhante is recording entire suttas. This will disambiguate common segments. Indeed, we could dispense with the GUID altogether, since the key you gave would also work. I'll need to check the Voice code to see which will be easiest.
Thanks for doing this. I'm slowly wading through the S3 JavaScript SDK documentation and finding it rather massive and overwhelming. It may take a while to figure this out. Also, let's wait for Bhante to tell us what is easiest for him for storing the flac files that he creates. He may or may not want to upload to Wasabi directly himself, and may need your help with that.
Oh, and I think the Wasabi buckets would be public for read so that Voice can access them?
Bhante's intent is broad distribution, so I think public for read is fine.
Done! (A warm hullo, btw)
I am not entirely sure what you mean. If we have Vinaya texts available for Voice, we'll use just the same robot voices as for the suttas. Human voices of course depend on recordings. I am not sure what you want me to put on the Voice Backlog?
Robbie had the cool idea of creating movies for YouTube from Ven. Yodha's Dhamma Doodles with the text of the Vinaya and the sound of the Vinaya all synchronized together.
It's a pretty amazing combination! And it would take several helping hands.
Oh, I just didn't understand that this falls within the scope of SC-Voice. But if you'd like to work on this, I don't want to stand in your way, of course!
It's on the Backlog now.
I actually don't know who would do it, but they would likely need our help. Let's just nominate @Robbie to lead the charge on this!
Nomination accepted!
Now I have questions!
How can I find the backlog? Do we also want to create videos for selected Suttapiṭaka suttas (those which already have doodles)? Do we want to wait for human Vinaya recordings? English only, or Pāli/English?
@karl_lew I think we can use hand-made YouTube subs for the text. Then people can turn them on/off as they see fit.
Yes, we will make sure the naming conventions are correct.
That was our assumption, too: Michael creates both segmented files for SCV, and whole files for ebooks, YouTube, Internet Archive, etc.
On the other hand, if you want to figure out how to deliver segmented audio direct from SCV using whole files as source, that'd be even better. The whole/segmented difference is, perhaps, better handled at the application level. For epubs, for example, we use the whole audio file and an XML smil file to tell the reader where the segments are. But of course, the bandwidth issues for an ebook are different than for SCV.
Anyway, from our point of view, we are happy to serve segmented audio files to SCV, but if we could serve whole audio files, we'd be even happier!
SCV won't touch the flac files. We will serve pre-optimized opus files for production, and retain flac for archive only. Opus has really good compression! Just with some initial tests, a 6.7 MB flac file converts to a 1.1 MB opus file with no audible loss of quality.
For this reason, I also do not think we should deploy low- and high-quality versions of the files. Just make opus files at a good compression with no audible quality loss. I'd like to keep our pipeline as simple as possible.
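For reference, a flac-to-opus step along these lines is commonly done with ffmpeg's libopus encoder; a sketch that builds the command (the 64k bitrate and the function name are my assumptions, not a settled pipeline):

```javascript
// Hypothetical sketch: build the ffmpeg command that would convert an
// archival flac file to a production opus file using libopus.
function flacToOpusCommand(flacPath, bitrate = "64k") {
  const opusPath = flacPath.replace(/\.flac$/, ".opus");
  return `ffmpeg -i ${flacPath} -c:a libopus -b:a ${bitrate} ${opusPath}`;
}

console.log(flacToOpusCommand("sn1.1.flac"));
// -> "ffmpeg -i sn1.1.flac -c:a libopus -b:a 64k sn1.1.opus"
```

The command string could then be run by whatever batch script processes Bhante's uploads; the right bitrate is something to settle by listening tests.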
Yes.
This is a hard problem, and I don't think there is an easy solution. We should maintain some kind of version control so that we can be sure that this audio is the recording of this text. It would be nice, but not guaranteed, to update the audio with every correction of the text. I suspect we'll have to dribble this one down the field a bit and hope that we all die before it becomes a big issue.
For your edification and amusement, I include a sample of the files I created today. For Michael, I did the first ten suttas; here I just upload one. This has English and Pali text, audio, and timing maps. The audio has both raw flac files as well as opus. Everything is, of course, just for testing purposes; these are not production files.
sn1.1.zip (10.1 MB)
Postscript: opus is supported by 86% of browsers currently.
But we need to use a "CAF" wrapping for Safari.
https://hetzel.net/2017-06-12/ios-11-opus-support-in-podcast-feeds/
Wow, the Pali sounds so very beautiful! It's certainly Aditi who will have the most serious competitor!!! Full of awe…
Thanks for adding me, Aminah! Please give me another day to look through a bit more before I make any pull reqs/sudden pushes to the wiki.
All suttas on YouTube with Ven. Yodha's artwork is a great idea imo; the Dhamma Doodles Ven. Akaliko gave out once are so nice. This can be done later, once some recordings are finalised, perhaps.
Sorry, that was an error on my part; I missed one digit. I'll just follow the already worked-out segment naming convention for final deliveries to the app, but I do suggest an underscore instead of a colon as the segment delimiter, and no versioning nor step name for final opus files:
sn1.8_3.4.suffix
sutta_segment.webm
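To make the suggested convention concrete, a file name under it could be parsed like so (the regex and field names are my assumptions for illustration; segment ids like 3.4 follow the example above):

```javascript
// Hypothetical sketch: parse "<sutta>_<segment>.<ext>" names such as
// "sn1.8_3.4.webm" into their parts.
function parseAudioName(name) {
  const m = name.match(/^([a-z]+[\d.]+)_([\d.]+)\.(\w+)$/);
  return m ? { sutta: m[1], segment: m[2], ext: m[3] } : null;
}

console.log(parseAudioName("sn1.8_3.4.webm"));
```

An underscore delimiter also keeps the names safe as URLs and on ordinary file systems, where a colon can be problematic.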
OK, so a couple of servers per broad "region", perhaps. CDNs can be pretty great, but perhaps too much data out - I'm not sure how many downloads there will be; it could go full viral. If the root URL is a variable instead of stored in the database, it could point to various servers, or to one good web server in the app later.
If a whole-sutta file with timecodes for when the text shows is possible, it might be a faster experience for hifi than downloading all segments separately, either in parallel or one after the other. Buffering like a YouTube video does?
OK, a single-bitrate opus in .webm and .caf seems like a good single target imo.
Sure, you can use any doodles you like. Let me know if you need any special topic or sutta not covered yet, and Iāll see what I can do.
Bhante! It is definitely very good work you are carrying out.