SC-Voice: Raveena meets Slow Amy

karl_lew · August 23, 2018, 11:11pm

Suttas have Pali words in them. This is a challenge for text to speech (TTS), because the TTS voices do not know Pali. To help the voices out, I’ve added a Pali IPA lexicon to words/en.json. This is actually quite difficult to get right since we’re basically trying to arm-wrestle the Amazon artificial intelligence and get it to do something it really wasn’t designed for. Most of the available AWS English voices are terrible at Pali. The worst are the American voices. I only found two voices that are somewhat customizable. The first is SlowAmy, who you’ve heard before. The other voice is Raveena, who speaks Indian English. Raveena is actually quite good at belting out search results at a fast clip (she’s going along at +5%) and takes 18 seconds to SlowAmy’s 22 seconds. That is a significant benefit to the end user for navigation. Here are some samples:

Raveena is probably best for crisp voice interaction. And the measured pace of SlowAmy somehow invites a deeper consideration, so she might be best at reading passages and suttas.

I’m just learning Pali pronunciation myself, so if you hear any Pali mistakes, please let us know!

For the programmers out there, you may find it interesting to see the JavaScript code that generated these sound files. It uses the sc-voice library and just makes a few calls.
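
For anyone curious how a lexicon lookup like this can work in principle, here is a minimal sketch. It is not the sc-voice code and makes no sc-voice or AWS calls. The lexicon shape (word to IPA string) and the helper names `escapeXml` and `toSsml` are assumptions for illustration, and the real words/en.json may be laid out differently. The IPA strings are placeholders, not vetted Pali transcriptions. The only outside convention used is the standard SSML `<phoneme alphabet="ipa" ph="...">` element.

```javascript
// Hypothetical lexicon: lowercase word -> IPA string (placeholder values).
const lexicon = new Map([
  ["bhava", "bʰava"],
  ["ānanda", "aːnanda"],
]);

// Escape characters that are special in XML so the SSML stays well-formed.
function escapeXml(s) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return s.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Wrap every word found in the lexicon in an SSML phoneme tag.
// Words not in the lexicon pass through unchanged (apart from escaping).
function toSsml(text, lex) {
  const body = text.replace(/[\p{L}\p{M}]+|[^\p{L}\p{M}]+/gu, (chunk) => {
    const ipa = lex.get(chunk.normalize("NFC").toLowerCase());
    return ipa === undefined
      ? escapeXml(chunk)
      : `<phoneme alphabet="ipa" ph="${escapeXml(ipa)}">${escapeXml(chunk)}</phoneme>`;
  });
  return `<speak>${body}</speak>`;
}

console.log(toSsml("The bhava sutta", lexicon));
```

The resulting string is what would be handed to a TTS service as SSML input. Whether a given voice honors the IPA is a separate question.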

Gabriel_L · August 23, 2018, 11:56pm

I really like the Pali pronunciation!
Just make sure you get ’c’ pronounced as ‘ch’ (like in Italian!), and it will be perfect!

mikenz66 · August 24, 2018, 12:39am

Raveena is cool. Sounds much more natural to me…

karl_lew · August 24, 2018, 12:46am

I’ve uploaded MN10-14, which start with a “c”. These are quite the tongue twisters.

Looks like Raveena is the navigation favorite. There is something about her that just makes me want to clean my room and “get a move on”. She is actually the best voice for Pali and does a beautiful job with words like “bhava”. In the spoken examples above, Raveena is using SlowAmy’s IPA Lexicon. Thanks to the feedback, I will now proceed to tune up Raveena with her own phonemes.

mikenz66 · August 24, 2018, 1:35am

It’s probably unsurprising that an Indian-tuned voice will do better on Pali than an English or American (or New Zealand) one…

Gabriel_L · August 24, 2018, 1:39am

It seems it is reading ’cula’ as ’kula’; my understanding is that it should be read as ’chula’.
I am really glad to hear these audios. I must admit I very much prefer a machine to read the suttas for me.

sujato · August 24, 2018, 2:22am

These are remarkable. We have looked briefly into getting Pali pronounced correctly, but this is really getting there!

My only thought—and I say it with great reluctance!—is to bear in mind that many speakers of English, especially those who have learned it as a second language, struggle with unfamiliar dialects. And for most people that means “not American”. There are people in Asia who don’t listen to the BBC because they can’t understand it, but they’ll happily watch CNN.

Mat · August 24, 2018, 7:50am

Great work, so far!

I prefer Raveena, but she speaks too fast for the listener to appreciate the meaning of the dhamma! Slow Amy is better and calm; this would be a good speed, if it were possible to slow down Raveena’s voice.

Also, with Raveena as the default, would it be possible to have voice options so that listeners (sravakas) could literally choose the voice which creates the most wholesome effect?

Raveena pronounces sutta as a t (as in bat), when, if I’m correct, it should be ‘th’ (as in health). So -suththa. It would be great if this could be changed to the correct pronunciation.

with metta,

sujato · August 24, 2018, 10:45am

No! This is incorrect, there is no dental fricative th sound in Pali or Sanskrit.

T is an unaspirated voiceless dental stop, which is only found in English where the t follows an s: “still”, “stop”.
Th is an aspirated voiceless dental stop, similar to the normal English pronunciation of t (though you can put a bit more air behind it) in words like: “till”, “top”.

See explainer of aspirates here:

And see Wikipedia for correct pronunciation of Pali:

Pali, as a classical Indic language, has a clearly defined phonology which is laid out in detail in the old linguistic texts. There is no doubt as to how it is correctly pronounced. The international conventions for writing Pali and Sanskrit are based on this, and so long as they are understood correctly, can be relied on to pronounce Pali as precisely as any of the Indic scripts.

Modern Romanizations of Asian languages such as Thai and Sinhala are intended to represent the main language, and they do not always accurately represent the Pali or Sanskrit. I am not familiar with Sinhala, but in Thai, for example, “Buddha” is spelled พุทธ. The letter พ represents the voiced unaspirated labial, which is romanized as “b” in Pali. However in modern Thai this letter is unvoiced aspirated labial, identical with the letter romanized as “ph” in Pali. This is of course not an “f” sound as there is no labial fricative in Pali. Thus someone romanizing Thai would spell Buddha as “Phut”, which sounds much like the English word “put”. Thais will commonly pronounce the word the same way, although of course those who are educated in Pali will understand the correct pronunciation.

sabbamitta · August 24, 2018, 11:23am

Is the pronunciation given in this course correct?

Actually, I guess so, because it’s Ajahn Brahmali’s course.

karl_lew · August 24, 2018, 1:13pm

One of the maddening things about AI voices is that they encode local cultural conventions that diverge from global consistency. Pali is remarkably consistent and vowels sound the same no matter where they are (e.g., ananda). The AI voices assign different sounds to vowels in different parts of a word. This means that “a” varies in sound when spoken by the AI voices. In particular, the obnoxious American voices use “-er” for ending “-a”. Raveena’s voice is the most consistent.

I will go ahead and develop three English voices: Raveena (en-IN), Amy (en-GB), Salli (en-US). These will be available in two speeds: fast (for navigation), slow (for recitation). That’s a matrix of six possibilities of choice applicable to two different settings (navigation vs. recitation). Hopefully that will also suffice for global use. One significant downside of supporting multiple voices and speeds is disk storage. For crisp interaction, sc-voice caches sounds to avoid AWS Polly service lags and costs. Each of the sound samples in this post is about 140KB. With six voice variations, we would have about 840KB, almost 1MB, for a minuscule sample.
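
The storage arithmetic above can be sketched in a few lines. This is only a back-of-envelope estimate: the function names are made up for illustration, and 140KB is the rough per-sample size quoted in this post, not a measured average.

```javascript
// Voices and speeds from the plan above.
const VOICES = ["Raveena", "Amy", "Salli"];
const SPEEDS = ["fast", "slow"];
const KB_PER_SAMPLE = 140; // approximate size of one cached sample

// Every voice/speed combination that would need its own cached sound file.
function voiceVariants(voices, speeds) {
  return voices.flatMap((voice) => speeds.map((speed) => ({ voice, speed })));
}

// Rough cache size in KB for a number of text samples across all variants.
function cacheSizeKB(sampleCount, variantCount, kbPerSample) {
  return sampleCount * variantCount * kbPerSample;
}

const variants = voiceVariants(VOICES, SPEEDS);
console.log(variants.length); // 6
console.log(cacheSizeKB(1, variants.length, KB_PER_SAMPLE)); // 840
```

Scaling `sampleCount` up to the number of cached text segments gives a feel for how quickly the cache grows with each added voice or speed.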

Dhammanando · August 24, 2018, 3:51pm

Almost correct, except for the palatal stops ja and jha, which the speaker realises as post-alveolar affricates, i.e., ja as [dʒɑ] instead of [ɟɑ] and jha as [dʒʰɑ] instead of [ɟʱɑ].

Almost all Pali textbooks written for English users give erroneous instructions to the effect that the Pali ja should be pronounced like the ja in jam. The result is as described above.

In fact when Pali ja is pronounced palatally, to the ear of a native English speaker it should sound COMPLETELY different from the ja in jam but only very subtly different from the ga in gap.

Click here for an example of how ja should sound…

Mat · August 24, 2018, 4:56pm

This wiki article table agrees with that: Pali - Wikipedia

However, there is a voiceless aspirate of ‘th’ here, and the monk pronounces it, as in ‘health’: http://wisdomandwonders.org/itp/ -sorry, I can’t give a link to the exact column in the table.

In voiceless aspirates, dental ‘th’ is different from cerebral ‘ṭh’.

with metta,

karl_lew · August 24, 2018, 5:17pm

I will do my best to make Raveena, Amy and Salli speak Pali as they should. It is very difficult in that AWS Polly does not recognize and act upon the full IPA alphabet. For example, there is no way I can get Amy to say “bhava”. Amy says “bava”. Amy even speaks “bʰava” just like “bava”. Raveena, however, DOES say “bhava” correctly. But Raveena ignores “ā” as in “ananda” vs. “ānanda”. And Aditi, the AWS bilingual voice, also cannot distinguish between the two words in Pali without a lot of arm-wrestling.

I’m sure the situation will improve over time. I think I can fix glaring errors, but the subtle nuances may be beyond us at the moment.
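
One way to find out which words are at risk from the “ananda” vs. “ānanda” problem is to look for words that become identical once diacritics are removed. The sketch below does that for a plain word list. The function names are hypothetical, and it says nothing about what any particular voice actually does with the words; it only flags candidates worth listening to.

```javascript
// Remove combining marks (macrons, underdots, etc.) after Unicode decomposition.
function stripDiacritics(word) {
  return word.normalize("NFD").replace(/\p{M}/gu, "");
}

// Group distinct words that differ only by diacritics and return groups of two or more.
function diacriticCollisions(words) {
  const groups = new Map();
  for (const word of words) {
    const key = stripDiacritics(word.toLowerCase());
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(word);
  }
  return [...groups.values()].map((set) => [...set]).filter((group) => group.length > 1);
}

console.log(diacriticCollisions(["ananda", "ānanda", "bhava"]));
```

Each group returned is a set of words that a voice ignoring vowel length or retroflex marks would likely pronounce the same way.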

Mat · August 24, 2018, 6:15pm

This isn’t a big problem- the meaning will be informed from the context. As for the correct pronunciation for recitation, it is done differently in different countries…

Pali doesn’t have its own script, being an oral transmission, and in Sri Lanka it was written down verbatim in early Sinhalese.

with metta (mettha?)

karl_lew · August 24, 2018, 6:29pm

Oh no. Now you’ve gone all meta on us.

Mat · August 24, 2018, 6:56pm

Metta is naturally written as ‘meththa’ by Sri Lankans, according to how it is pronounced in Sinhalese. I had my email sign-off, meththa, ‘corrected’ to metta by Ven Mettavihari, a Danish monk who developed metta.lk, a precursor to suttacentral, many years ago.

http://srilanka.portbridge.com/images/kms.html

https://www.researchgate.net/figure/Illustration-of-the-Evolution-of-Sinhala-Script_fig4_277612133 - this shows the presence of ‘th’ from an early age.

So when metta is pronounced as meththa it sounds just a little softer and kinder than the harsh ‘t’ usage.

with loving-kindness,

sujato · August 25, 2018, 9:03am

Yes, of course, I should have given this link!

Huh, I’d never realized this. Now I get to research the phonetics … hmm.

There’s a nice chart here where you can hear the various sounds of the IPA:

English lacks a plosive palatal, so there is no way of giving an exact analogue. However Wikipedia says this:

[ɟ] is a less common sound worldwide than [d͡ʒ] because it is difficult to get the tongue to touch just the hard palate without also touching the back part of the alveolar ridge.[1] It is also common for the symbol ⟨ɟ⟩ to be used to represent a palatalized voiced velar stop or palato-alveolar/alveolo-palatal affricates, as in Indic languages. That may be considered appropriate when the place of articulation needs to be specified, and the distinction between stop and affricate is not contrastive.

Which seems reasonable.

Listening to the samples, both the one on the IPA chart, and the one you give, it seems to me that [ɟɑ] sounds closer to [dʒɑ] than [g]. Sorry!

How would you go about teaching the correct pronunciation to an English speaker?

Well, I don’t think he does, but regardless, that’s not how it is supposed to be pronounced.

Dhammanando · August 25, 2018, 11:20am

I think the only widely-spoken forms of English that contain [c] and [ɟ] are Jamaican Creole and Black British English. When Jamaican dub poets wish to indicate palatalization, they’ll do so by inserting a ‘y’ in the word. Linton Kwesi Johnson, for example, spells “car” as “kyar” and “guard” as “gyard”.

So, the palatals can be taught in the same way. For Pali ca, start by saying “car kyar, car kyar, car kyar, car kyar…” until one develops a sensitivity to the difference between the velar and the palatal initial consonants. Then try saying kyar without the y but without making it sound like car.

For Pali ja one repeats the process using “guard” and “gyard”.

karl_lew · August 25, 2018, 2:03pm

Sadly, AWS Polly generates identical sound files for many of these sounds and ignores IPA subtleties. I think what they have done here is applied the local human recording of “j” to the IPA symbol. We could not, for example, get Raveena to say “jar”. Nor can we get Salli to say “jhana”.

Here is Raveena saying “jhana” with three different IPA phonemes. Raveena is our best bet for Pali fidelity. Which one of these jhanas is closest to Pali?
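
For anyone who wants to run a similar listening comparison, here is a rough sketch of generating one SSML string per candidate transcription. The three transcriptions behind the samples above are not listed in this post, so the onsets below are simply symbols that came up in this thread, and the remainder “ɑnɑ” is a placeholder. The function name `phonemeVariants` is made up for illustration.

```javascript
// Build one SSML document per candidate onset so the results can be compared by ear.
// Inputs are assumed to contain no XML special characters.
function phonemeVariants(word, onsets, rest) {
  return onsets.map((onset) => {
    const ipa = onset + rest;
    return {
      ipa,
      ssml: `<speak><phoneme alphabet="ipa" ph="${ipa}">${word}</phoneme></speak>`,
    };
  });
}

// Candidate onsets taken from the discussion above; "ɑnɑ" is a placeholder remainder.
const candidates = phonemeVariants("jhana", ["dʒ", "dʒʰ", "ɟʱ"], "ɑnɑ");
for (const { ipa, ssml } of candidates) {
  console.log(ipa, ssml);
}
```

Since some voices render different IPA symbols identically, comparing the resulting sound files byte for byte is a quick way to tell whether a variant made any difference at all.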