Voice release v2.5: Search memorizer

sabbamitta · December 3, 2020, 7:27pm

Dear friends of Voice,

Tomorrow we are going to release a new version of Voice. We’ll let you know when everything has been successfully updated.

sabbamitta · December 4, 2020, 2:18pm

Hmm. Very sorry, we have to delay the release by one day …

It turned out that we had been testing an older version, and now prefer to wait another 24 hours after updating it.

sabbamitta · December 5, 2020, 2:15pm

… Finally, it’s done! The new Voice version 2.5 has just been released.

You perhaps won’t see much difference compared to the previous version. Yet @karl_lew has done a lot of work that is not so visible.

Firstly, we have built a new search memorizer. Previously, if people searched for the same thing a second time, the repeat search took just as long as the first one, even though the result was the same: “Root of suffering” returns 7 Suttas, whether you search for it the first time or a subsequent time. So we thought, why not remember these results? That way, a repeated search returns its result much faster! This seemed especially useful because our logs showed that most searches used terms from our examples list.

AN5.155:6.3:
It’s when the mendicants memorize the teaching—
statements, songs, discussions, verses, inspired exclamations, legends, stories of past lives, amazing stories, and classifications.
This is the first thing that leads to the continuation, persistence, and enduring of the true teaching.

We thought: “Why not teach our Voice robot to do the same thing?” So that’s what we did.

The new search memorizer now caches search results, and if the same search request is made again, the result is already there! This works both for Voice and for scv-bilara, our command line search tool.
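
For readers curious how a memorizer like this can work, here is a minimal sketch in Python, assuming results are keyed by the normalized query text plus the language. `SearchMemo`, `search_fn` and the lowercase/whitespace normalization are hypothetical illustrations, not the actual Voice or scv-bilara code; a real deployment would presumably also persist the cache to disk so that it survives server restarts.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature of the slow, uncached search: (query, language) -> results
SearchFn = Callable[[str, str], List[str]]


class SearchMemo:
    """Hypothetical memorizer that remembers search results by (query, language)."""

    def __init__(self, search_fn: SearchFn):
        self._search_fn = search_fn  # the expensive search we want to avoid repeating
        self._cache: Dict[Tuple[str, str], List[str]] = {}

    @staticmethod
    def _key(query: str, lang: str) -> Tuple[str, str]:
        # Assumption: normalize case and whitespace so trivially different
        # spellings of the same search share one cache entry
        return (" ".join(query.lower().split()), lang)

    def search(self, query: str, lang: str = "en") -> List[str]:
        key = self._key(query, lang)
        if key not in self._cache:
            # First time: run the real search and remember its result
            self._cache[key] = self._search_fn(query, lang)
        # Repeat request: answer straight from memory
        return self._cache[key]
```

The first call for a query runs the slow search; every later call with the same key is just a dictionary lookup. Keeping the language in the key means an English and a German search for the same words are remembered separately.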

This means that, since I have just searched for “root of suffering”, this term won’t take as long as it used to. From the day of the release onward, the memory cache will gradually fill up, and more and more terms will be available almost instantaneously. However, this also means that changes made to the texts won’t show up in cached search results. This isn’t very relevant for English, as Bhante Sujato now makes only occasional edits, but it is more relevant for German, where my translations are continually evolving and new texts are being added. In the future, that may also be the case for other languages.

Therefore, the search memory cache will be cleared whenever the content is updated, so that new search results can appear. But from now on we won’t update content as often as we used to. A faster search means that you will have to wait longer for new content to appear.
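
One simple way to clear memorized results on a content update is to tag the cache with a content version and empty it whenever that version changes. The sketch below assumes such a version string exists (for example, a commit hash of the text repository); `VersionedCache` and `on_content_update` are hypothetical names, not part of Voice.

```python
from typing import Any, Dict, Hashable, Optional


class VersionedCache:
    """Hypothetical cache that empties itself when the content version changes."""

    def __init__(self, content_version: str):
        # Assumption: some string identifies the current state of the texts
        self.content_version = content_version
        self._entries: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        return self._entries.get(key)

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value

    def on_content_update(self, new_version: str) -> bool:
        """Clear all memorized results if the texts changed; return True if cleared."""
        if new_version == self.content_version:
            return False  # same texts: cached results are still valid
        self._entries.clear()  # texts changed: old results may be stale
        self.content_version = new_version
        return True
```

Tying invalidation to a version rather than to a timer matches the trade-off described above: results stay cached for as long as the texts are unchanged, and only a content update forces fresh searches.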

As many of you may have noticed, the SuttaCentral main site has been down a couple of times over the past weeks, and this also led to Voice outages. Voice depends on SuttaCentral in various ways: it pulls legacy texts, titles, legacy author information, etc. from SuttaCentral; basically, all the data that are not in bilara-data, of which Voice has its own independent copy.

To reduce these dependencies, we have built another cache to store these data, except for the legacy texts; getting our own copy of those would be another huge task. With the new SC-API cache, Voice can stay up while SC is down, even though it can’t show legacy texts then. If you try to access a legacy text (for example, a translation by Bhikkhu Bodhi) while Voice is unable to respond to your request, you will at least see a relevant error message.
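
A sketch of the idea behind such an API cache, under stated assumptions: cacheable data (titles, author information) are served locally once fetched, legacy texts are always fetched live, and a failed live fetch turns into a readable error. `ScApiCache`, `fetch` and the example paths are hypothetical, not Voice's actual implementation; `rebuild()` stands for refreshing the cache from time to time to keep up with changes in the SuttaCentral data.

```python
from typing import Callable, Dict


class ScApiCache:
    """Hypothetical cache for SuttaCentral API data (titles, author info, ...)."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch  # network call to SuttaCentral; may raise OSError
        self._store: Dict[str, dict] = {}

    def get(self, path: str, cacheable: bool = True) -> dict:
        if cacheable and path in self._store:
            return self._store[path]  # served locally, even while SC is down
        try:
            data = self._fetch(path)
        except OSError as err:  # ConnectionError etc. are OSError subclasses
            # Uncached data (e.g. a legacy text): report a clear error instead
            raise LookupError(f"SuttaCentral is unavailable; cannot load {path!r}") from err
        if cacheable:
            self._store[path] = data
        return data

    def rebuild(self) -> None:
        """Re-fetch every cached entry to pick up changes in SuttaCentral data."""
        for path in list(self._store):
            try:
                self._store[path] = self._fetch(path)
            except OSError:
                pass  # keep the old copy if SC is unreachable during the rebuild
```

Serving cached entries first is what keeps the site up during an outage; the cost is that cached data can lag behind SuttaCentral until the next rebuild.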

To keep up to date with changes in the SuttaCentral data, this cache has to be rebuilt from time to time.

While fixing a download bug, we found that we needed to switch our Voice servers to a new operating system in order to run the latest version of ffmpeg, the software tool required for building the download files. We also upgraded our own machines to the new OS so that our local Voice installs stay compatible.

Along the way, we built a completely new downloader and added new download formats. Voice now allows downloads in MP3, Opus, or Ogg format; the latter two produce much smaller files than MP3, so more sound files can be stored in the same space.
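
To illustrate how one tool can produce all three formats, here is a hypothetical Python helper that builds an ffmpeg command line. The `-y`, `-i`, `-c:a` and `-metadata` options and the `libmp3lame`/`libopus` encoders are standard ffmpeg; mapping the Ogg download to the Opus encoder is an assumption (based on the note that Ogg and Opus carry the same audio), and the function and tag names are illustrative. This is not the actual Voice downloader.

```python
from typing import Dict, List

# Assumed encoder per download format. Ogg is mapped to Opus as well, since both
# downloads are said to carry the same audio; Voice's real settings may differ.
CODECS: Dict[str, List[str]] = {
    "mp3": ["-c:a", "libmp3lame"],
    "opus": ["-c:a", "libopus"],
    "ogg": ["-c:a", "libopus"],
}


def ffmpeg_command(src: str, dst_stem: str, fmt: str, tags: Dict[str, str]) -> List[str]:
    """Build (but do not run) an ffmpeg command converting src to the chosen format."""
    if fmt not in CODECS:
        raise ValueError(f"unsupported format: {fmt}")
    # -y overwrites an existing output file; -i names the input
    cmd = ["ffmpeg", "-y", "-i", src, *CODECS[fmt]]
    for key, value in tags.items():  # e.g. album, artist, title
        cmd += ["-metadata", f"{key}={value}"]
    # ffmpeg picks the container from the output file extension
    cmd.append(f"{dst_stem}.{fmt}")
    return cmd
```

On a machine with ffmpeg installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`. Writing tags such as `album` at conversion time is what lets media players sort the downloads later.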

We hope we have ironed out all the bugs in the new downloader, but if we haven’t, please help us find them!

Following a request by @Snowbird, we have added a new verse output format to our command line search tool, scv-bilara, which we would also like to make available later as a tool for general use, with a user-friendly interface. It turned out that this output format wasn’t quite what Snowbird needed, but I think all Voice devs who have tried it out are very happy to have it! So a big thank you to Snowbird for the request!

We also found another Pali pronunciation issue, which the SuttaCentral team kindly fixed for us by removing some numbers from the root text that should not have been there.

Well, and a few more bugs have been fixed as they arose … see all issues of this release here.

Thanks to all our users for being with us. You make us feel that our efforts are not in vain!!

Please make use of our feedback thread or call us by typing @devs-voice.

Stay tuned! We keep developing awesome features … at least Karl and I find them awesome, if nobody else, and we’re having much fun! Thank you so much, Karl, for working together in this way!

karl_lew · December 5, 2020, 2:59pm

I would also like to add that Anagarika Sabbamitta has been translating a LOT of German suttas, which are now available on Voice. Reading and listening to suttas in multiple languages is quite eye-opening. Each language has certain biases that are balanced out when reading or listening to multiple translations of the same sutta in different languages. Thank you, Anagarika Sabbamitta.

Anagarika Sabbamitta has also been quite prolific in expanding the Voice examples in both English and German. Example phrases are proving to be quite important. We analyzed Voice search strings for a month and discovered that most people prefer to use the Inspire Me! button, which searches for a random example phrase. We are slowly realizing the mnemonic value of these phrases: they provide a very strong basis for memorization and understanding. Please do let us know of any phrases you would like to see included in Voice examples; we are always on the lookout for more examples! And thank you, Ven. @Khemarato.bhikkhu, for your example.

As Anagarika Sabbamitta mentions, the new release of Voice supports Ogg and Opus. Ogg and Opus have the same audio content, but your operating system may prefer one to the other. Both Ogg and Opus downloads carry lots of metadata. With the new metadata tags, you’ll be able to order your Voice downloads by Author, Album or Title as you prefer. For example, I order my own downloads by Album, which includes a year/month prefix that allows me to sort downloads by date for future repeat listening. Anagarika Sabbamitta and I both use VLC for listening to Voice audio downloads.

Looking forward, we’ll be streamlining Voice infrastructure. We’re investigating the use of static website components that will allow us to offer almost instantaneous searches without a dedicated server. And without a dedicated server, Voice becomes truly free.

Kaz · December 6, 2020, 12:41am

Thank you, thank you very much for continuing to improve Voice, @sabbamitta and @karl_lew.
I truly just respect your efforts and dedication. Sadhu sadhu sadhu!

sabbamitta · December 6, 2020, 7:38am

And I am looking forward to the day when Voice will start speaking Japanese!