SuttaCentral Voice Assistant

sujato · September 22, 2018, 1:55am

Just looking around a little, it seems the best-reviewed option for something like this is either the Shure MV5 or the Yeti by Blue.

The main difference is that the Shure is cheaper and more portable, but the Yeti comes out with marginally better audio quality. But either one would be fine, I’m sure.

karl_lew · September 22, 2018, 11:01am

Release 0.3.0 in 5 minutes (4:05 AM PDT). RELEASE COMPLETED

Decreased concurrent AWS Polly requests from 20 to 10. Hopefully this will alleviate the “File too short” issue encountered when listening to an uncached sutta.
Inappropriate background image of Buddha removed. Buddha images shall be placed above eye level.
POSTPONED: Due to complexities of grammatical changes, MN8 will not be expanded. Instead, users will be directed to PaliAudio MN8
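
As an illustration of the first release note, capping the number of in-flight requests can be sketched with a semaphore. This is a minimal sketch in Python for illustration only, not SC-Voice's actual code: `synthesize_all` and `fake_synthesize` are hypothetical names, and the stand-in coroutine does not call AWS Polly.

```python
import asyncio

MAX_CONCURRENT = 10  # the release note lowers this cap from 20 to 10


async def synthesize_all(texts, synthesize, limit=MAX_CONCURRENT):
    """Run synthesize(text) for every text, with at most `limit` running at once."""
    gate = asyncio.Semaphore(limit)

    async def guarded(text):
        # Each task waits here until one of the `limit` slots is free.
        async with gate:
            return await synthesize(text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(t) for t in texts))


async def fake_synthesize(text):
    """Hypothetical stand-in for a real text-to-speech request."""
    await asyncio.sleep(0.01)
    return f"audio({text})"


if __name__ == "__main__":
    segments = [f"segment {i}" for i in range(25)]
    results = asyncio.run(synthesize_all(segments, fake_synthesize))
    print(len(results))
```

A lower cap trades some speed on uncached suttas for fewer simultaneous requests to the speech service.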

karl_lew · September 22, 2018, 12:17pm

This is a great opportunity!

And a new challenge. I looked into Git LFS (large file storage) per your suggestion. What it does is provide a small pointer file inside Git to alleviate the demands on Git file storage. The actual file resides in a cloud store of one’s choice. This latter point is crucial because that file storage is not part of GitHub, it is separate. Indeed the default implementation would require a $75/month subscription. Before proceeding, we need to decide as a team how and where SuttaCentral AV resources should be stored. We should have a single solution for AV. I personally have zero experience with CDNs and would gladly defer to someone with more expertise if we have such a person in the community. AV storage tends to be massive and CDNs provide local delivery globally.

Another consideration is offline. Currently SC-Voice can be installed in a remote offline location with a minimal set of sound files suitable for local study. And all that would fit cheaply on a MicroSD card. I’m not sure we could treat the entire SuttaCentral AV repository as liberally. There are storage, bandwidth as well as copyright issues. Copyright issues can be intense. If the copyright owner revokes permission, we must eliminate access to all affected content. Indeed, SC-Voice is bound by copyright to Bhante Sujato for all currently available content. SC-Voice does not archive Bhante Sujato’s text–it caches it from Bhante’s Github repository. If Bhante deletes the Github repository, SC-Voice becomes mute. Automatically.

Bhante Sujato knows more about audio recording than I do :D. I don’t know anything about microphones. But once recorded, the files will need to be stored in the above mentioned SuttaCentral AV repository. I’d say record one or two suttas and let’s tackle how to store them as a team effort. For file format, please provide MP3. I personally prefer OGG, but Apple does not support that. You may wish to retain higher fidelity recordings personally, but SC-Voice delivers MP3.

Yes. Absolutely. After yesterday’s conversation with Blake, I realized that SC-Voice can access suttacentral.net/api/suttas directly for all suttas. This is actually better than Pootl, since this API is actually used to generate SuttaCentral.net web pages. I’ll be implementing an SC-Voice adapter for this API. We will still fall back to Pootl if SC is unavailable, but SC will become the primary sutta source.
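
The "primary source with fallback" arrangement described above can be sketched as follows. This is a minimal sketch in Python for illustration only: the function names and the stand-in adapters are hypothetical, and nothing here reflects the real SuttaCentral or Pootl response formats.

```python
def fetch_with_fallback(sutta_id, sources):
    """Try each (name, fetch) pair in order; return (name, text) from the first success."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(sutta_id)
        except Exception as exc:  # a real adapter would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"no source could supply {sutta_id}: " + "; ".join(errors))


# Hypothetical stand-ins for the two adapters.
def from_suttacentral(sutta_id):
    raise ConnectionError("SC is unavailable")


def from_pootl(sutta_id):
    return f"text of {sutta_id} from Pootl"


if __name__ == "__main__":
    sources = [("suttacentral", from_suttacentral), ("pootl", from_pootl)]
    print(fetch_with_fallback("mn1", sources))
```

Keeping the sources in an ordered list means the primary source can be changed without touching the calling code.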

Yes. I will make the Wheel colored again and have it link to SuttaCentral when clicked. It was grey as a placeholder. For visual assistance, minimizing control options is important (to prevent endless sequential tabbing). The assistance link to SuttaCentral will actually be on the footer. Placing global controls on the footer reduces conceptual burden while retaining global feature access. This is why the top Settings icon is clickable but not tab-able. The bottom Settings icon is tab-able. Visual assistance considerations are counter-intuitive to the sighted.

It was actually inappropriate and below eye level.

Actually, I regularly use Ctrl+ or Ctrl- to adjust font size on the fly as needed. The font size is determined by the Vuetify framework as being optimal for most users.

With increasing blindness, white on black is better because the screen reader highlighting shows up more prominently. I’m actually not reading the text: I’m scanning for the orange screen reader highlight that shows me what is being spoken. That said, out of acknowledgement for sighted comfort, I would offer the light theme provided by Vuetify as an alternate choice.

Yes. I’d love that. The screen reader does give that for free, and ChromeVox can speak sentences, words or characters as they are highlighted in response to user keystroke navigation. Amy/Raveena serve a different use case of “eyes-free, hands-free background stream listening”. In combination, SC-Voice and the native screenreader provide a powerful set of tools to assist micro- as well as macro-comprehension. Unfortunately, arranging visual correspondence between an MP3 stream and onscreen text is horribly difficult–I would not know how to solve that.

suaimhneas · September 22, 2018, 6:23pm

The Voice Assistant sounds really amazing. The speed choices are about right for me (Raveena a bit faster for studying and Amy slower and more measured). I knew something like this should be technically possible, but it’s great someone with technical experience and ability has actually gone and done it (and had some helpers here too).

I do use audio quite a bit doing household jobs, exercising, commuting etc. As mentioned earlier in the thread, MN and DN and even some of the KN have a fair bit of audio coverage (via PaliAudio, Frankk’s audtip.org site and some commercial recordings also on Audible). Your Voice Assistant will enable me to do that also for SN and AN. I think I’ll use your tool to do an audio run through of some of the MaggaSamyutta of the SN in the near future. Thanks again!

Aminah · September 22, 2018, 9:10pm

Just flagging up a pronunciation quirk: Mallikā (Raveena), found in MN87, SC 5.2.

sujato · September 23, 2018, 12:13am

Thanks for the explanations!

Okay, well that is good to know. Keep it as-is for now, and we’ll give some thought to a best-practice long term solution. Perhaps we should use the Internet Archive as our base source, like audtip.org?

All my translations are CC0, so entirely copyright free and can be used by anyone in any way they wish.

For other translations, we are careful to ensure that we comply with all copyright restrictions, so this should not be a problem. Right now I am getting Ven Buddharakkhita to make a set of his files that only includes properly licensed texts.

MP3 is a quarter-century old format with higher bandwidth and lower audio quality (not to mention a long history of legal complications arising from patents) compared with the modern, open-source, patent-free opus/ogg.

http://listening-test.coresv.net/results.htm

But the richest company ever in human history doesn’t feel like supporting it because it doesn’t help them make even more money. So we should penalize all users. Makes sense! </sarcasm>

But seriously folks, this is annoying. SC embraces and supports open source, and as so often, the open source option is superior in every respect. This is not an accident: it is the flourishing of the commons.

Karl, the app is yours and the decision is yours. But can we consider offering opus/ogg as the primary file type, and maybe a fallback to MP3 for unsupported devices?

Viveka · September 23, 2018, 1:13am

I’m currently investigating the equipment I need to make good recordings of voice. The microphone info is straightforward, but I believe that recording software is also required. I have no idea about this at all, but imagine that you would have specific format requirements.
I need something with a simple non-geek interface.
I use a Windows 10 HP laptop. Previously, I used Apple, but had to replace my computer recently and Apple was just too expensive by comparison.

sujato · September 23, 2018, 1:16am

I’m afraid I know nothing about Windows software, and almost nothing about recording software. Maybe make a special post on here asking for help?

karl_lew · September 23, 2018, 4:06am

Thank you, Aminah!

Excellent idea. There are currently over 3850 suttas available via SC-Voice. With SuttaCentral API integration, that number will increase dramatically. A simple way to tackle this might be to offer expandable groups such as “Majjhima Nikāya 1-152”

I learn something every day here.

The Internet Archive looks like it was intended for such archival. Indeed, there are lots of Buddhism audio files already up there.

That said, the experience of browsing the Internet Archive gave me pause. It’s like drinking from a firehose. As people contribute more and more recordings, we’ll need a way to organize them and reference them. And at some point we would inflict overchoice on ourselves.

OK. Let’s go OGG. SC-Voice originally did OGG, but then it wouldn’t play on my iPhone, so I had to switch to MP3. The ffmpeg program converts between the two, so if we store in OGG, we can deliver in MP3. OGG files tend to be a little plumper and are good at high frequencies. OGG files make percussion music crisp and clean. I actually can’t hear the difference between OGG and MP3 voice files, however. But others might. The problem is that AWS Polly samples at 22kHz, which means that the highest frequency in Amy/Raveena voices is 11kHz (half the sampling rate). Old ears like mine can’t hear higher frequencies but young ears can hear up to 20kHz. For analog voice, the situation is quite different. Microphones will record all audible frequencies and make for a richer sound which, yes, should be represented as OGG, not MP3. Raveena and Amy will be MP3 robots and humans will be full quality.
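
Two points above lend themselves to a short sketch: the sampling arithmetic (the highest representable frequency is half the sampling rate) and the "store OGG, deliver MP3" conversion. This is a minimal sketch in Python for illustration only. The function names are hypothetical, the quality value of 4 is an arbitrary choice, and the command assumes an ffmpeg build that includes the libmp3lame encoder.

```python
import subprocess


def nyquist_khz(sample_rate_khz):
    """Highest frequency a sampled signal can represent: half the sampling rate."""
    return sample_rate_khz / 2


def ogg_to_mp3_command(ogg_path, mp3_path, quality=4):
    """Build an ffmpeg command line that transcodes an OGG file to VBR MP3."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", ogg_path,            # input file
        "-codec:a", "libmp3lame",  # MP3 encoder (must be compiled into ffmpeg)
        "-qscale:a", str(quality), # VBR quality; lower numbers mean higher quality
        mp3_path,
    ]


def convert(ogg_path, mp3_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ogg_to_mp3_command(ogg_path, mp3_path), check=True)


if __name__ == "__main__":
    print(nyquist_khz(22))  # 11.0
    print(" ".join(ogg_to_mp3_command("mn1.ogg", "mn1.mp3")))
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.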

frankk · September 23, 2018, 2:58pm

For info on how audtip.org does the filenames and MP3 metadata to match SuttaCentral sutta numbering, and to preserve proper sorting order with zero-padded reference numbers, see the link below. If you follow those standards, or something similar, it will make Karl’s job very easy to integrate into SC.

https://sites.google.com/a/audtip.org/wiki/
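
To show why zero-padded reference numbers matter for sorting, here is a minimal sketch in Python. The `padded_key` helper and the `mn0001`-style names are hypothetical illustrations, not the actual audtip.org convention, which is described at the link above.

```python
import re


def padded_key(name, width=4):
    """Zero-pad every run of digits so plain string sorting matches numeric order."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)


names = ["mn10", "mn2", "mn152", "mn1"]

# Plain string sorting compares character by character, so "mn10" lands before "mn2".
print(sorted(names))                         # ['mn1', 'mn10', 'mn152', 'mn2']

# With padded numbers the same string sort gives the expected numeric order.
print(sorted(padded_key(n) for n in names))  # ['mn0001', 'mn0002', 'mn0010', 'mn0152']
```

Most file browsers and standalone players sort names as plain strings, which is why the padding has to be in the file name itself.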

For recording software, Audacity is free and very good.

Make sure you get it from the official site, and when the download link comes up, make sure it’s coming from their site or a trusted site so you don’t get malware, bloatware, or adware.

For a microphone:
https://www.amazon.com/Zoom-H2N-H2n-Handy-Recorder/dp/B005CQ2ZY6/ref=sr_1_3?ie=UTF8&qid=1537713685&sr=8-3&keywords=zoom+h2

I did a lot of research on this, and that’s the microphone you want, unless you’re doing something really high end, have a recording studio, etc. I have an older-generation version of that mic and run it on two rechargeable AAs. It’s portable, so I can take it anywhere. Most of my recordings on audtip.org (see MN) and almost all of Ven. Jiv’s Pali readings were done with that mic. Bhante Sujato has complained about the background noise, but that’s not the mic’s fault; that’s the recording studio environment. Personally, that level of noise (on my English audtip.org recordings) doesn’t bother me, and I didn’t think it worth the suffering that would be required to make noiseless recordings, such as recording in a closet full of clothes to absorb the noise. I experimented with software noise reduction, but I didn’t like the results. It gets rid of all the noise, but it makes the human voice sound more sterile.

The Zoom H2 has really great quality. When I listen to it, I can hear in great detail even faint background noises like my housemate burping through closed doors, a very distant car driving by (my doors and windows are closed), birds chirping, etc. The level of detail is excellent, and I feel it’s worth it to preserve the raw human subtleties in the voice (compared to the sterilized, noise-reduced version). For example, when I read the 32 marks of a great man, you can hear the subtleties of my voice trying to suppress laughter and contempt.

The H2 is portable, so you can record in the jungle, in a closet if you want a noiseless background, and anywhere that you feel comfortable. I’d experimented with a few USB mics that require you to have a laptop or PC to hook up to, and some of those are really good, and a little cheaper than the Zoom H2, but you have less portability, and you need a laptop/PC that’s super quiet, otherwise the laptop fan hum is going to show up as noise.

On audtip.org, for English I have almost a complete set of MN. The EBT portion of KN is almost complete, and we could really use a female voice to read the Therigatha. So if that’s of interest to you, that would be a great one to record, as it’s not in existence yet in English.

frankk · September 23, 2018, 3:04pm

@karl_lew,

Is there an automated way to generate offline MP3s of the suttas, segmented, for people to use offline while driving/commuting with just an MP3 player and no internet connection?

A segmented sutta MP3 file means it’s broken up by those sections you have in MN 1, for example, or any of B. Sujato’s sutta sections in the sutta source, not just one large one-hour full sutta. The segmentation would allow the MP3 player to skip forward quickly to desired sections.

If the sutta section numbers are zero-padded to sort, and the sections have meaningful descriptions in the generated segment outputs, then the MP3 output should be sorted and ready to play on most MP3 players without any other special metadata.
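
The naming scheme asked for here, a zero-padded section number plus a meaningful description, might be sketched like this. This is a minimal sketch in Python for illustration only: `segment_filenames`, `slugify`, the exact name pattern, and the sample section titles are all hypothetical, not anything SC-Voice or audtip.org actually produces.

```python
import re


def slugify(text):
    """Lowercase the text and replace runs of non-ASCII-alphanumerics with hyphens."""
    # Note: this simple version also replaces diacritics such as "ā" with a hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def segment_filenames(sutta_id, section_titles, ext="mp3"):
    """Return one file name per section, numbered from 1 with zero padding."""
    # Pad to at least two digits, or more if there are 100+ sections.
    width = max(2, len(str(len(section_titles))))
    return [
        f"{sutta_id}-{i:0{width}d}-{slugify(title)}.{ext}"
        for i, title in enumerate(section_titles, start=1)
    ]


if __name__ == "__main__":
    for name in segment_filenames("mn1", ["The Ordinary Person", "The Trainee"]):
        print(name)
    # mn1-01-the-ordinary-person.mp3
    # mn1-02-the-trainee.mp3
```

Because the number field has a fixed width, a player that sorts file names alphabetically will play the segments in section order.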

karl_lew · September 23, 2018, 3:21pm

Yes. I plan on making full suttas available for download in a week or so after I hook up to all of SuttaCentral sutta content. Full length suttas will be expanded as possible and therefore surprisingly long. For example, I estimate that MN1 will take about 60 minutes with Raveena and longer for Amy.
Right now, SC-Voice permits download by section only.

Say more about the segmenting requirement? Did you want hundreds of spoken section numbers in MN1, for example (the expansions are auto-numbered)?

karl_lew · September 23, 2018, 3:31pm

@Viveka, @frankk, Bhante @sujato, and others…

Concerned about how to manage a vast collection of audio recordings, I have a simple idea to propose. The idea is to use a wiki. The SC-Voice wiki could be used. Wikis are crowd-sourced. We all contribute. We all moderate (or a select few, or something).

Here is a sample page for MN1.

SC-Voice could just link to each of these pages automatically. That would simplify the coding of SC-Voice and unleash the power of, well, everybody…

Viveka · September 23, 2018, 5:17pm

Thanks so much for this @frankk
I had a look at the mic. Am I correct in thinking that this mic has an SD card, and that you record to that directly from the mic… i.e. that you don’t need to have it plugged into a computer to record?
Also, that Audacity would be required for playback and editing on the computer.

@frankk @karl_lew I’m afraid that I’ve been away from tech for so long (think 20 years!), that I’m struggling to understand all the terms and relation of the individual bits to each other etc. I’m really going to need an idiot’s guide… for what you’d like me to do. I have no personal preferences, but am happy to contribute, wherever it is most needed. Apologies, but I’m really going to need instructions; buy this, install that, record these things, like this, in this format, and send it here…
Once I’m set up, I’ll just get on with it.

Snowbird · September 23, 2018, 5:56pm

Yes, it records to an SD card. It can also play back but only through earphones. The mic is good but you need to be prepared to have it relatively close to your mouth. You can also connect it to a computer and use it just as a USB mic, but as Frank said, it could pick up computer noise.

It can record in MP3 or as WAV files. It’s a great little unit. The older models are also good.

You may find the technical advice pages on the LibriVox project site helpful, starting here librivox.org/pages/about-recording/ as well as this page for setting up a recording system… https://wiki.librivox.org/index.php/Newbie_Guide_to_Recording

I should also say that Audacity is very easy to use. I have known many people who were quite computer illiterate and they mastered the basics of audio editing using Audacity very quickly. The system is visual and intuitive for doing basic editing. The fact that you have never done it before means nothing. It’s very easy.

Viveka · September 23, 2018, 8:26pm

Well, I’m sold on that unit… means recording is very simple.
Am happy with Audacity too - thank you.

I’ll ask more targeted questions when I have the gear, about how to arrange the recording files etc.

Many Thanks!

sujato · September 24, 2018, 2:29am

To be clear, the Zoom is not a mic, or not just a mic: it’s a recorder that includes a mic.

Zoom is good, we do recordings of talks and things on it. Ven Buddharakkhita’s sutta recordings were on something similar.

The advantage of the Shure mic I pointed out before is that it is just a mic: you plug it in via USB to your phone or PC. I haven’t heard the comparable sound, but I think probably both will be fine.

Once you’ve chosen the mic, it’s important to be careful and consistent in the audio environment and the distance from the mic. Different mics have very different behaviors. A stage mic demands a close and consistent distance from the lips. But I believe the Shure is designed to be more distant, as it is used by vloggers. So you can stick it on a desk a couple of feet away and it will be fine. There are advantages and disadvantages to both types. Basically, the further a mic sits away from your lips, the less effectively it will eliminate external noise. So if you’re in a quiet place, anything is fine, but if background noise is a problem you want something that sits close.

Snowbird · September 24, 2018, 3:23am

This is absolutely correct. On the flip side, the closer the mic is to your mouth, the more likely that it will pick up mouth noises.

Once you start recording you may hear all kinds of sounds you didn’t realize were there. I was surprised to hear airplane noises on the recording that I had never noticed in real life.

If you already own some kind of recorder, you may want to just start practicing with that to see what kind of problems you might have. Another problem with a mic far away from the mouth is that it is more likely to pick up room acoustics. So if you are in an empty room with nothing on the walls, the recording may be echoey. There really is a reason studio recordings sound as good as they do.

I just realized this conversation is happening in the voice assistant thread. Perhaps it needs its own.

frankk · September 24, 2018, 11:39am

Ideal case is something like KN Snp where each sutta probably avg’s about 3-5 min each reading time, and is already sectioned by SC sutta reference title+number.

audtip.org kn-snp-eng-than-rdrfrn sutta nipāta (Internet Archive)

SuttaCentral’s DN already seems to have logical sections built in.

Some MN suttas have sections built in, but most don’t. I’m planning to add sections for all MN suttas, in fact any suttas from any nikaya that could use section divisions. For example, in AN XX, each sutta would have at least XX sections added, like I did for this AN 8 sutta:

audtip.org an08-0030-anuruddha-eng-than-rdrfrn (Internet Archive)

For an AN XX sutta, you can see the advantage of doing that on standalone mp3 players. Example: I want to hear the 5th great thought, I can just hit the mp3 skip button 5 times and get there immediately.

For your MN 1 sutta, having hundreds of sections might be overkill, but maybe not for a DN sutta.

frankk · September 24, 2018, 11:51am

I tried a Shure mic once and had poor results. You really need to think about all the parts to buy with it to make it connect to a computer (the AD converter, the USB converter, etc.), and the number of choices and factors to consider can be overwhelming. And even the mic itself seemed like it would be better for certain kinds of singing but not so great for podcasting and sutta reading.