Cool hacker stuff to do with SuttaCentral? redbean!

What about providing sutta translations in Klingon?

1 Like

https://suttacentral.net/snp1.8/kln/worf

2 Likes

Well. I’m impressed. LOL!

2 Likes

Word caching is really tricky, since the AWS voices are sensitive to punctuation, and the most advanced ones are also sensitive to grammar. Speech is difficult.

Instead of word caching, I’ve been looking at caching by phoneme. The premise is that although MP3 compression is fantastic, some extra compression might be gained by specializing to a single narrator’s speech (i.e., Aditi). Some projects at Google have beaten MP3 compression using AI. This is really promising, but given the lack of time, I’ve currently focused on brute-force simplicity. Disk is simple and cheap on Linode.
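
To make the phoneme idea concrete, a phoneme-keyed cache can be as simple as hashing the narrator name plus phoneme sequence into a file path. This is only a minimal Node/TypeScript sketch; the class and on-disk layout are my own assumptions, not the actual Voice implementation:

```ts
import { createHash } from "crypto";
import { promises as fs } from "fs";
import * as path from "path";

// Hypothetical phoneme-level audio cache, keyed by narrator + phoneme sequence.
class PhonemeCache {
  constructor(private root: string) {}

  private keyFor(narrator: string, phonemes: string[]): string {
    const hash = createHash("sha1")
      .update(`${narrator}:${phonemes.join(" ")}`)
      .digest("hex");
    return path.join(this.root, narrator, `${hash}.mp3`);
  }

  async get(narrator: string, phonemes: string[]): Promise<Buffer | null> {
    try {
      return await fs.readFile(this.keyFor(narrator, phonemes));
    } catch {
      return null; // cache miss
    }
  }

  async put(narrator: string, phonemes: string[], audio: Buffer): Promise<void> {
    const file = this.keyFor(narrator, phonemes);
    await fs.mkdir(path.dirname(file), { recursive: true });
    await fs.writeFile(file, audio);
  }
}
```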

I’ll revisit caching once I get the brute-force solution going for offline use. All the audio for a language should fit on a largish MicroSD card with a brute-force approach.
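
As a rough sanity check on the MicroSD claim, here’s a back-of-the-envelope estimate; the hours of narration and bitrate are purely illustrative assumptions, not measured figures:

```ts
// Back-of-the-envelope storage estimate; all inputs are illustrative assumptions.
const hoursOfAudio = 200;   // assumed total narration time for one language
const bitrateKbps = 64;     // assumed MP3 bitrate for speech
const bytes = hoursOfAudio * 3600 * (bitrateKbps * 1000) / 8;
const gib = bytes / 2 ** 30;
console.log(`~${gib.toFixed(1)} GiB`); // ~5.4 GiB — well within a largish MicroSD card
```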

I’ll be experimenting with the Raspberry Pi 400.

Yes, indeed. I’ve been using the JS API for TensorFlow. It’s been a fascinating journey. Here’s a sound sample from my experiments:

As you can hear, it needs a little more cooking. :laughing:

If you’re interested, AI audio compression for a single narrator seems like a really fun thing to look into. :wink:

4 Likes

Holy cow, that’s come a long way! Not bad at all. We got to the moon (and back!) with worse audio quality than that!

2 Likes

I see…

For an 11 KB file, I’m not sure you can improve the quality much while keeping the size, actually.

Yeah, that sounds complicated. I’d probably be more interested in finding out whether you could train a model so that it can speak any text correctly. The results for English and other languages are pretty impressive. You’d need a lot of training data, of course. And usually the results are not as good as you expect them to be.

2 Likes

I agree. My initial attempts were at direct text-to-speech generation using AWS narrators as training data. AWS narrators are quite consistent, so my hypothesis was that training would be more successful. However, I could not get an acceptable degree of quality with the smallish models I was able to train. I’m sure it is possible; AWS itself is the shining example.
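
For anyone curious what “AWS narrators as training data” looks like in practice, here’s a hedged sketch using the AWS SDK v3 Polly client to turn text segments into audio; the sample rate, region, and file handling are my own assumptions, not the actual pipeline:

```ts
import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";
import { promises as fs } from "fs";
import { Readable } from "stream";

const polly = new PollyClient({ region: "us-east-1" });

// Synthesize one text segment with a consistent narrator (e.g., Aditi) and
// save the raw PCM samples as one half of a (text, audio) training pair.
async function synthesize(text: string, outFile: string): Promise<void> {
  const { AudioStream } = await polly.send(
    new SynthesizeSpeechCommand({
      Text: text,
      VoiceId: "Aditi",
      OutputFormat: "pcm",  // raw samples are easier to train on than MP3
      SampleRate: "16000",
    })
  );
  if (!AudioStream) throw new Error("Polly returned no audio");
  const chunks: Buffer[] = [];
  for await (const chunk of AudioStream as unknown as Readable) {
    chunks.push(Buffer.from(chunk));
  }
  await fs.writeFile(outFile, Buffer.concat(chunks));
}
```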

I’ve had much more success reducing the transformation gap with a different approach. Essentially, I’m modelling the phases and amplitudes of harmonics, with supplementary pink noise for fricatives. The models are smaller and much faster to train. However, the compression isn’t quite there yet. It’s easy to get to MP3 levels of compression, but as you say, it’s a bit harder to push beyond that.
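
To give a flavour of the harmonics-plus-noise idea, here’s a hypothetical per-frame synthesis sketch (not the actual model): each frame is a sum of sinusoids with the modelled amplitudes and phases, plus a low-level filtered-noise component standing in for pink noise during fricatives.

```ts
// Hypothetical sketch of harmonics-plus-noise synthesis for one frame.
// amplitudes[k] and phases[k] describe harmonic k+1 of the fundamental f0;
// noiseGain scales a crude low-passed noise component for fricatives.
function synthesizeFrame(
  f0: number,              // fundamental frequency in Hz
  amplitudes: number[],    // per-harmonic amplitudes
  phases: number[],        // per-harmonic phases in radians
  noiseGain: number,
  sampleRate = 16000,
  frameLength = 320        // 20 ms at 16 kHz
): Float32Array {
  const frame = new Float32Array(frameLength);
  let noise = 0;
  for (let n = 0; n < frameLength; n++) {
    const t = n / sampleRate;
    let sample = 0;
    for (let k = 0; k < amplitudes.length; k++) {
      sample += amplitudes[k] * Math.sin(2 * Math.PI * f0 * (k + 1) * t + phases[k]);
    }
    // One-pole low-pass of white noise as a cheap stand-in for pink noise.
    noise = 0.98 * noise + 0.02 * (Math.random() * 2 - 1);
    frame[n] = sample + noiseGain * noise;
  }
  return frame;
}
```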

If you’d like to investigate direct TTS AI for Pali, I can provide endless training data for you. That would definitely be the compression sweet spot. :smile:

2 Likes

I guess you avoid ogg/opus for compatibility reasons? Those files are always really small.

If it’s no trouble for you to share some training data, I might try something out. But since you’ve already tried that approach and it seems hard to do, I probably won’t come up with anything groundbreaking :smile:.

2 Likes

I’ve split this off into a new thread, AI TTS for Pali, for those interested in following:

2 Likes

Hi @olastor, you probably know already, but just a heads-up: Justine just blew up HN again by announcing a new redbean 2.0 with even more awesome packed in there.

1 Like

Thank you, yes, I’ve actually seen it already! What I find most interesting is that she seems to be considering creating binaries with Python 3 instead of Lua.