Cool hacker stuff to do with SuttaCentral? redbean!

Unfortunately so—it has always been our wish to make Voice work offline too, but currently that is far from close … so to speak.

1 Like

I worked a bit more on this “proof of concept”, and builds of the latest tag will now appear on this site, which tries to add some docs: sc-portable (everything is basically still only for testing use).

2 Likes

Perhaps you have already considered this, or it’s not suitable, but modern browsers can make use of a Web API for speech synthesis. An example is https://mdn.github.io/web-speech-api/speak-easy-synthesis/ . The quality is of course rather low compared to AWS.
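For anyone curious, a minimal sketch of what using that API could look like in the browser (the sample text and language tag are just placeholders; available voices vary by browser and OS):

```ts
// Minimal sketch of browser TTS via the Web Speech API.
// Note: getVoices() can be empty until the "voiceschanged" event fires.
function speak(text: string, lang = "en-GB"): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;

  // Pick the first voice matching the requested language, if any.
  const voice = window.speechSynthesis
    .getVoices()
    .find((v) => v.lang.startsWith(lang));
  if (voice) {
    utterance.voice = voice;
  }

  window.speechSynthesis.speak(utterance);
}

speak("At one time the Buddha was staying near Sāvatthī.");
```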

3 Likes

This won’t work for Pāḷi (or in older browsers), but I think it would be a nice option for the translation voice for people like me with a modern mobile browser but a high-latency internet connection. What do you think, @karl_lew ?

2 Likes

Yes, and most particularly: Try to get them to speak Pali!

Karl has put a lot of effort into optimizing the AWS Hindi voice Aditi for speaking Pali, and she is pretty good now. Inbuilt browser TTS systems can’t do that.

Oh, I only saw your post now … :smiley_cat:

1 Like

I just tried this sentence in English

At one time the Buddha was staying near Sāvatthī in Jeta’s Grove, Anāthapiṇḍika’s monastery.

The word “Anāthapiṇḍika” is way too much of a challenge!!! :open_mouth:

2 Likes

Offline Voice is indeed the mountain to climb, especially for Pali and for low-bandwidth users. Offline Voice with full audio caches running on a personal computer, even a Raspberry Pi, looks quite doable. I’ve been thinking about audio caching using GitHub, perhaps with a repository for each nikaya. What this might enable is something like “donate a speaking EBT-Pi to a monastic”.

I did try listening to the speech synthesiser, but it was much too jarring for study. Indeed, I regularly listen to bilingual Pali/English recordings for study, and I shudder at how Pali would be butchered. AWS narrators bridge that critical Pali gap for me, and it is currently simpler to store such audio than to re-create it using alternative technology.

I have tried AI voice compression techniques for several months and can get really close to the AWS sound. However, it’s a race between my declining mental capacity and what is thereby achievable. Currently, plain audio storage is the Occam’s razor of solutions.

2 Likes

Wouldn’t it be possible to do caching on a word (or “n-gram”) basis? Listening to some of the Pali text on SC Voice, it seems to me like the pronunciation of each word is independent of the sentence it’s in. Not sure if that would work out well, but just storing the audio of unique words and then combining those into sentences offline (“on the fly”) would be more powerful and space-efficient (if it’s possible).
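Just to make the idea concrete, here is a rough sketch of splicing cached per-word clips with the Web Audio API; the cache layout (`/audio/words/<word>.mp3`) is entirely made up:

```ts
// Rough sketch of playing a sentence from per-word audio clips in an offline cache.
// The "/audio/words/<word>.mp3" URL scheme is hypothetical.
async function playFromWordCache(sentence: string): Promise<void> {
  const ctx = new AudioContext();
  const words = sentence.toLowerCase().split(/\s+/).filter(Boolean);

  // Fetch and decode each unique word exactly once.
  const buffers = new Map<string, AudioBuffer>();
  for (const word of new Set(words)) {
    const response = await fetch(`/audio/words/${encodeURIComponent(word)}.mp3`);
    buffers.set(word, await ctx.decodeAudioData(await response.arrayBuffer()));
  }

  // Schedule the clips back to back; real speech would need cross-fades
  // and sentence-level prosody, which is exactly where this scheme struggles.
  let t = ctx.currentTime;
  for (const word of words) {
    const source = ctx.createBufferSource();
    source.buffer = buffers.get(word)!;
    source.connect(ctx.destination);
    source.start(t);
    t += source.buffer!.duration;
  }
}
```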

Were you using TensorFlow for this? Just asking, because they do have a “TensorFlow.js” port which works in the browser, with a CLI tool that can reduce the size of models by quantization. I used that in another instance to reduce a model from 80 MB to ca. 20 MB (with quality tradeoffs, of course), but it had nothing to do with audio, and perhaps those models are much bigger. There seems to be someone on the internet who has tried something like this before with standard TTS models.
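For reference, a minimal sketch of loading such a converted (and possibly quantized) model in the browser with TensorFlow.js; the model URL and input shape are placeholders:

```ts
import * as tf from "@tensorflow/tfjs";

// Sketch: load a model that was converted (and optionally quantized)
// with the TensorFlow.js converter. The URL below is a placeholder.
async function loadModel(): Promise<tf.GraphModel> {
  const model = await tf.loadGraphModel("/models/tts/model.json");

  // Warm up the model once so the first real inference isn't slow.
  // The input shape [1, 128] is purely illustrative.
  const warmup = model.predict(tf.zeros([1, 128])) as tf.Tensor;
  warmup.dispose();

  return model;
}
```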

2 Likes

There is already a really nice service for sending a sutta a day: Snowbird’s. I just started using it. It includes direct links to alternate translations, including SC.

1 Like

What about providing sutta translations in Klingon?

1 Like

https://suttacentral.net/snp1.8/kln/worf

2 Likes

Well. I’m impressed. LOL!

2 Likes

Word caching is really tricky, since the AWS voices are sensitive to punctuation and the most advanced AWS voices are sensitive to grammar. Speech is difficult.

Instead of word caching, I’ve been looking at caching by phoneme. The premise is that although MP3 compression is fantastic, some extra compression might be afforded by compressing a specific narrator’s speech (i.e., Aditi). Some projects at Google have beaten MP3 compression using AI. This is really promising, but given the lack of time, I’ve currently focused on brute force simplicity. Disk is simple and cheap on Linode.

I’ll revisit caching once I get the brute-force solution going for offline. With a brute-force approach, all the audio for a language should fit on a largish MicroSD card.

I’ll be experimenting with the Raspberry Pi 400.

Yes, indeed. I’ve been using the JS API for TensorFlow. It’s been a fascinating journey. Here’s a sound sample from my experiments:

As you can see, it needs a little more cooking. :laughing:

If you’re interested, AI audio compression for a single narrator seems like a really fun thing to look into. :wink:

4 Likes

Holy cow that’s come a long way! Not bad at all. We got to the moon (and back!) with worse audio quality than that!

2 Likes

I see…

For an 11 KB file, I am not sure you can increase the quality by much and keep the size, actually.

Yeah, that sounds complicated. I’d probably be more interested to find out whether you could train a model so that it can speak any text correctly. The results for English and other languages are pretty impressive. You’d need a lot of training data, of course. And usually the results are not as good as you expect them to be.

2 Likes

I agree. My initial attempts were with direct text-to-speech creation using AWS narrators as training data. AWS narrators are quite consistent, so my hypothesis was that training would be more successful. However, I could not get any acceptable degree of quality with the smallish models I was able to train. I’m sure it is possible; AWS itself is the shining example.

I’ve had much more success reducing the transformation gap by using a different approach. Essentially, I’m modelling the phases and amplitudes of harmonics, with supplementary pink noise for fricatives. The models are smaller and much faster to train. However, the compression isn’t quite there yet. It’s easy to get to MP3 levels of compression, but as you say, it’s a bit harder to push beyond.
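To give a rough feel for that kind of approach, here is only a toy illustration of harmonics-plus-noise resynthesis for a single frame, not the actual model: the frame layout is made up, and plain white noise stands in for the pink-noise component.

```ts
// Toy sketch of harmonics-plus-noise resynthesis for one frame.
// Per-harmonic amplitudes/phases plus a noise gain are the (much smaller)
// data that would be stored instead of raw audio samples.
interface HarmonicFrame {
  f0: number;            // fundamental frequency in Hz
  amplitudes: number[];  // one amplitude per harmonic
  phases: number[];      // one phase per harmonic, in radians
  noiseGain: number;     // level of the noise component (fricatives)
}

function synthesizeFrame(
  frame: HarmonicFrame,
  sampleRate: number,
  numSamples: number
): Float32Array {
  const out = new Float32Array(numSamples);
  for (let n = 0; n < numSamples; n++) {
    const t = n / sampleRate;
    let sample = 0;
    // Sum of sinusoidal harmonics at k * f0.
    for (let k = 0; k < frame.amplitudes.length; k++) {
      sample +=
        frame.amplitudes[k] *
        Math.sin(2 * Math.PI * (k + 1) * frame.f0 * t + frame.phases[k]);
    }
    // White noise stands in here for the pink-noise component.
    sample += frame.noiseGain * (Math.random() * 2 - 1);
    out[n] = sample;
  }
  return out;
}
```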

If you’d like to investigate the direct TTS AI for Pali, I will be able to provide endless training data for you. That would definitely be the compression sweet spot. :smile:

2 Likes

I guess you avoid ogg/opus for compatibility reasons? Those files are always really small.

If it’s no trouble for you to share some training data, I would maybe try something out. But since you already tried that approach and it seems hard to do, I probably won’t come up with something groundbreaking :smile:.

2 Likes

I’ve split this off into a new thread AI TTS for Pali for those interested in following:

2 Likes

Hi @olastor, you probably know already, but just a heads up: Justine just blew up HN again by announcing a new redbean 2.0 with even more awesome packed in there.

1 Like

Thank you, yes, I’ve actually seen it already! What I find most interesting is that she seems to be considering creating binaries with Python 3 instead of Lua.