Wanted 🕵️‍♀ : Translator for SC-Voice interface



Adding Portuguese/Cristiano is easy in principle (just update words/voices.json). However, it is logistically complicated, as Anagarika Sabbamitta mentions. We currently pay for 30GB of disk storage, and the disk is 40% full. To give you an idea of our disk needs, the Anguttara Nikaya in Pali alone takes up 1.8GB today. :open_mouth:

The website translations themselves are cheap and easy. They require minimal disk space. It is the TTS storage that consumes most of the disk. Storing the speech for all the available languages on a single server would probably require almost 1000GB (1TB). That would be quite costly for us to support. But supporting different web interface languages is easy and fun!
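To put those storage figures side by side, here is a small back-of-the-envelope sketch. It uses only the numbers quoted above; the variable names are made up for illustration and are not part of Voice.

```javascript
// Figures quoted in the posts above; everything else is derived from them.
const diskGB = 30;            // provisioned disk storage
const percentFull = 40;       // current usage, in percent
const allLanguagesGB = 1000;  // rough estimate for speech in all available languages

// Integer arithmetic keeps the results exact.
const usedGB = (diskGB * percentFull) / 100;   // space already in use
const freeGB = diskGB - usedGB;                // space still available
const growthFactor = allLanguagesGB / diskGB;  // how many times larger the disk would need to be

console.log(`used: ${usedGB}GB, free: ${freeGB}GB`);
console.log(`all languages would need roughly ${Math.round(growthFactor)}x the current disk`);
```

In other words, a single server holding every language would need a disk dozens of times larger than the current one, which is why the TTS side is the costly part.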

Because of the difficulty of TTS, we’re proceeding cautiously with the TTS portion of new languages. As Anagarika mentions, we are starting first with Deutsch and now, thanks to your team’s help, we are also exploring Portuguese. Each AWS Polly voice we add requires lots of careful editing to adjust pronunciation. For example, you’ve probably noticed that Gabriel_L has already made many corrections to the Portuguese voices for words like “arahant”. Each AWS Polly voice has its own quirks. Fixes for Ricardo will probably not work for Cristiano. :see_no_evil:

In the longer term, we need to work out a strategy for multilingual Voice. Although we might store everything on one huge server, perhaps that is not the way to proceed. Instead, it may make more sense to have local servers handle local languages. For example, we can host EU languages on an EU server or servers. And we can host Asian languages in data centers that will minimize latency. We could host the Voice Portuguese server in Brazil. This is quite doable. In fact, Aminah herself has created her own Voice server that we are using for staging. If the Portuguese team would like to explore this, we have instructions for AWS Installation. In this way, the Portuguese team could dive deep into Voice Portuguese with their own servers using a custom fork of Voice. We’re currently looking at Bilara integration, so your translations would also be hosted on those servers.

The logic for this is in public/js/scv-singleton.js:

var navLang = g.navigator && g.navigator.language;

For testing, I set my browser to Deutsch (for example) and then press Ctrl+Shift+R for a hard refresh of the website => Alles wird zu unserem Vergnügen auf Deutsch gezeigt (everything is shown in German for our enjoyment).
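As a rough illustration of how the browser language could drive the interface language, here is a minimal sketch. Only the `navLang` line comes from scv-singleton.js; `SUPPORTED_LANGS`, `pickInterfaceLang`, and the English fallback are assumptions made for this example, not the actual Voice code.

```javascript
// Hypothetical list of interface languages; the real set lives in the Voice source.
const SUPPORTED_LANGS = ["en", "de", "pt", "da", "ro", "vi"];

// Pick the interface language for a browser language tag such as "pt-BR".
// Falls back to English (an assumption) when the tag is missing or unsupported.
function pickInterfaceLang(navLang, supported = SUPPORTED_LANGS, fallback = "en") {
  if (typeof navLang !== "string" || navLang.length === 0) return fallback;
  const primary = navLang.toLowerCase().split("-")[0]; // "pt-BR" -> "pt"
  return supported.includes(primary) ? primary : fallback;
}

// Same guard as the line quoted from scv-singleton.js: `g` is the global object,
// and `navigator` may be absent outside a browser-like environment.
const g = typeof globalThis !== "undefined" ? globalThis : {};
const navLang = g.navigator && g.navigator.language;

console.log(pickInterfaceLang(navLang));
```

With this sketch, a browser set to `de-DE` would get the German interface, and a browser set to a language not in the list would fall back to English.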


Ah, like that.
Confirmed: setting my preferred language to Romanian makes it appear in Romanian by default.


The Danish interface is now on the staging server --> Alt vises på dansk til vores fornøjelse (everything is shown in Danish for our enjoyment)! :smile:


I can help for the Vietnamese translation :vietnam:
On a side note, I’m still editing the Vietnamese audio; it’s just that I have less free time at the moment.


Great, Phineas, thank you! Here is your file to translate.

If you have any questions, please check this thread; some issues have already been discussed here. If there is still something unclear, feel free to ask anytime.

And no worries about the Vietnamese audio: when it is done, it is done, and we will happily include it in the Voice Wiki. :pray:


Voice v1.8.11 is ready for staging with the Portuguese fixes as well as the initial Vietnamese file. I was saddened to discover that AWS Polly has no Vietnamese voice. We may have to rely on other TTS services such as Microsoft. I’m thinking about breaking out the Voice TTS adapter as its own GitHub Node.js project so that others can help with such efforts. It would be useful to have an open-source TTS package that presents a unified API for all languages, independent of the EBTs.
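To make the "unified API" idea concrete, here is one possible shape for such a package. This is purely a sketch: `TtsRegistry`, `StubAdapter`, and the service labels are hypothetical names, not the existing Voice adapter, and the stub makes no calls to AWS Polly or any other service.

```javascript
// Hypothetical unified TTS front end: each service is wrapped in an adapter
// that exposes the same async synthesize(text) method.
class TtsRegistry {
  constructor() {
    this.adapters = new Map(); // language code -> adapter
  }

  register(lang, adapter) {
    this.adapters.set(lang, adapter);
  }

  async synthesize(lang, text) {
    const adapter = this.adapters.get(lang);
    if (!adapter) throw new Error(`No TTS adapter registered for "${lang}"`);
    return adapter.synthesize(text);
  }
}

// Stand-in adapter that makes no network calls. A real adapter would call
// its TTS service here and return the audio instead of a description.
class StubAdapter {
  constructor(service, voice) {
    this.service = service;
    this.voice = voice;
  }

  async synthesize(text) {
    return { service: this.service, voice: this.voice, text };
  }
}

const tts = new TtsRegistry();
tts.register("pt", new StubAdapter("aws-polly", "Ricardo"));
tts.register("vi", new StubAdapter("other-service", "hypothetical-voice"));

tts.synthesize("pt", "arahant").then((result) => console.log(result));
```

The point of the design is that callers only name a language; which service sits behind it (Polly, Microsoft, or something else) becomes a registration detail, so a language without a Polly voice can be routed elsewhere without changing the calling code.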

Translation in general is quite a painstaking process. You are all helping the Voice web page with localizations for specific languages. However, as you’ve noticed, not all languages will have EBT translations in Voice. We are currently working on having Voice display German segmented translations. For some languages there may not yet be any segmented or even unsegmented translations available for Voice to display. Furthermore, Voice currently supports only English searches. Anagarika Sabbamitta and I are working to support German word search and are making great progress, but it will be some time before we can support Voice search for all languages. Voice search relies entirely on segmented translations and does not work with unsegmented translations. Nevertheless, it is quite exciting to see the languages of the world appear as shown above.

Thank you all!