SuttaCentral Voice Assistant

frankk · September 24, 2018, 12:29pm

If you do a wiki, you would need moderators to make sure submissions and changes, like all of the SC sutta text, do not violate any copyright. There are other criteria as well, like knowing which suttas have been proof-listened for accuracy, whether the files are hosted on stable sites like archive.org, whether downloads are free of malware, etc.

I think a wiki can work somewhere in the process, but I don’t think you’d want the SC voice to be able to play any unmoderated submission from that wiki, just as SC sutta text collection does not just contain links to all sutta translations available on the web.

karl_lew · September 24, 2018, 2:13pm

Aha. I think I understand better. Sentences are too granular and even Bhante Sujato’s text segments are about that level. You have grouped at an even higher level presumably for ease of traversal and study. SuttaVoice does generate sections from autoexpanded suttas. The sections are derived from the expansion groups themselves. However, I’ll be adding an auto-section option that will do the same for any sutta. The auto-sectioning will group sentences/paragraphs into 3-4 minute sections to slide in under the Apple screensaver limitations. The sections will be titled with the starting phrase and all be accessible individually. The entire sutta will also be downloadable as you have done.
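
To make the auto-section idea concrete, here is a rough Python sketch of the grouping step. It is not SC-Voice code: the names `auto_section` and `Section`, the words-per-minute figure, and the 3.5 minute cap are all assumptions for illustration, and duration is estimated from word count rather than from real audio.

```python
from dataclasses import dataclass
from typing import List

WORDS_PER_MINUTE = 120      # assumed narration speed, not a measured value
MAX_SECTION_MINUTES = 3.5   # assumed cap inside the 3-4 minute window


@dataclass
class Section:
    title: str            # starting phrase of the section
    segments: List[str]   # sentences/paragraphs grouped into this section


def estimate_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration from the word count (a crude proxy)."""
    return len(text.split()) / wpm


def _make_section(segments: List[str], title_words: int) -> Section:
    """Title a section with the first few words of its first segment."""
    title = " ".join(segments[0].split()[:title_words])
    return Section(title=title, segments=list(segments))


def auto_section(segments: List[str],
                 max_minutes: float = MAX_SECTION_MINUTES,
                 title_words: int = 5) -> List[Section]:
    """Greedily pack consecutive segments into sections under max_minutes.

    A single segment longer than max_minutes still gets its own section.
    """
    sections: List[Section] = []
    current: List[str] = []
    current_minutes = 0.0
    for seg in segments:
        seg_minutes = estimate_minutes(seg)
        # Close the current section if adding this segment would exceed the cap.
        if current and current_minutes + seg_minutes > max_minutes:
            sections.append(_make_section(current, title_words))
            current, current_minutes = [], 0.0
        current.append(seg)
        current_minutes += seg_minutes
    if current:
        sections.append(_make_section(current, title_words))
    return sections
```

Each resulting section could then be voiced and offered individually, with the whole sutta still available as one download.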

On a side note, it is quite the experience to read a sutta while listening to an alternate translation spoken. The nuances stand out and are actually quite informative. I read Bhante Sujato’s translation of AN8.30 while listening to yours. Much learning!

Thank you for clearing up my confusion. I shall work on this hopefully in the next four weeks.

karl_lew · September 24, 2018, 2:22pm

Yes. The wiki is currently open to the public and it perhaps should not be. I was discussing this with my wife who advised me to be cautious of MP3 viruses. I shall restrict editing to approved contributors. I propose Bhante Sujato, Aminah, Viveka and Frankk as initial contributors. You will all have the right to allow others to contribute–that will be an extension of trust in your hands. Bhante Sujato will be a full admin and will be able to designate other admins. For this to work, please post your github account names here so that I may set up such access. SC-Voice will not play any of these audios directly, it will simply redirect listeners to the wiki page on a new browser tab. This should provide greatest flexibility of action with a measure of moderation.

Viveka · September 24, 2018, 5:15pm

What is github?

karl_lew · September 24, 2018, 5:56pm

Github is a place for software engineers to share. Anybody can join. Simply go here.

This will give you an account. Some software engineers enjoy stealing other people’s accounts, so you should probably consider two-factor authentication for security (as you should on any online account such as gmail, etc.). Logged into that account, you will be able to edit any wiki page in SC-Voice. For example, should you decide to record MN1, then you can upload it to the Internet Archive (frankk knows how to do that and can help you). Once uploaded, you or frankk can copy the link into the SC-Voice MN1 page, providing any relevant information. I already added one of frankk’s recordings as an example. As a moderator you’ll need to establish what’s important to document about a recording and how it should be laid out on the page. The existing layout of the wiki page was simply something off the top of my head–y’all should discuss how it should be done formally.

Once the moderators have determined proper wiki page layout, I can write a script to generate initial wiki pages for suttas with voice recordings. That will be a lot of pages (>4000), so a script will save us all time.

Viveka · September 24, 2018, 10:14pm

Thanks for that. I’ve made an account with the user name of SuttaVoice. And I’ve installed audacity.

This is so much more complicated than what I expected!

FYI, I just have to leave things for about a week - I’ll be back to ask more details later
Thanks

karl_lew · September 25, 2018, 12:11am

Github invitations have been sent to Bhante Sujato and SuttaVoice. Thank you for helping us out. Sorry for the chaos–we are all new to this.

karl_lew · October 1, 2018, 8:51pm

Releasing SC-Voice v0.4.0 at 2pm PDT (RELEASE COMPLETED)

This release brings with it direct content integration with SuttaCentral via the suttacentral.net/api REST services. With this release, you can now listen to suttas translated by:

Bhikkhu Sujato (existing/Supported)
Bhikkhu Bodhi (new/Legacy)
I.M. Horner (new/Legacy)
Thanissaro Bhikkhu (new/Legacy)

Also, the title details now serve as the “sutta card” for SC-Voice and provide blurb, attribution and links to additional resources.

As always, please do report mispronunciations or bugs.

more information…

sujato · October 2, 2018, 12:13am

Amazing work as always!

May I ask what kind of mic that was? The one I am recommending is specifically designed for podcasting and the like.

karl_lew · October 2, 2018, 2:25pm

The sc-voice wiki now has wiki pages for all Pali Audio MN and DN suttas. For example, there is MN96 and DN15. Each of these wiki pages will be served automatically from a SuttaCentral Voice Assistant link on the associated sutta.

@Viveka and @frankk please feel free to add your own wiki pages or edit any existing wiki page as you record your suttas. Doing so will make your recordings available to anybody using SuttaCentral Voice Assistant.

p.s., The Pali Audio SN and AN suttas are arranged slightly differently than on SuttaCentral, so I’ve not created any sc-voice wiki pages for them. We can add these later.

frankk · October 2, 2018, 3:02pm

Sorry, I don’t remember which one. It was analog, a famous model, probably the one they use for musicians recording with amps, preamps, lots of heavy and expensive analog equipment. The mic itself wasn’t that expensive, but between the A-to-D converters and the other things needed to connect it to USB and a PC, there were too many choices to make and try out. The setup that was recommended to me did not give good results in the final WAV recording.

I don’t doubt that Shure could make a good podcasting mic, and maybe even one with a direct USB PC hookup.

sujato · October 3, 2018, 12:55am

Yes, well that would make sense.

Outstanding!

Some news on the TTS front:

https://azure.microsoft.com/en-us/blog/microsoft-s-new-neural-text-to-speech-service-helps-machines-speak-like-people/

karl_lew · October 3, 2018, 1:57pm

Oh! That was a surprise. I did not realize Microsoft also joined the TTS bandwagon. That’s quite encouraging. The competition shall be fierce and healthy.

karl_lew · October 3, 2018, 2:46pm

Scanning the error logs I noticed:

20181001 21:32:57 WARN Error: loadSuttaJson() no sutta found for id:87
20181002 03:01:17 WARN Error: loadSuttaJson() no sutta found for id:Lokavipatti Sutta
20181002 12:09:23 WARN Error: loadSuttaJson() no sutta found for id:an.1.55-60
20181002 12:09:36 WARN Error: loadSuttaJson() no sutta found for id:an1.55-60
20181003 02:15:46 WARN Error: loadSuttaJson() no sutta found for id:mn 125
20181003 03:44:29 WARN Error: loadSuttaJson() no sutta found for id:1
20181003 03:49:37 WARN Error: loadSuttaJson() no sutta found for id:sn
20181003 03:49:44 WARN Error: loadSuttaJson() no sutta found for id:sn48
20181003 03:50:01 WARN Error: loadSuttaJson() no sutta found for id:sn
20181003 03:50:56 WARN Error: loadSuttaJson() no sutta found for id:linked discourses 48
20181003 03:53:42 WARN Error: loadSuttaJson() no sutta found for id:12
20181003 03:53:48 WARN Error: loadSuttaJson() no sutta found for id:sn
20181003 03:53:53 WARN Error: loadSuttaJson() no sutta found for id:sn 1
20181003 03:54:09 WARN Error: loadSuttaJson() no sutta found for id:42
20181003 03:54:22 WARN Error: loadSuttaJson() no sutta found for id:48.42
20181003 04:11:45 WARN Error: loadSuttaJson() no sutta found for id:dn
20181003 09:53:57 WARN Error: loadSuttaJson() no sutta found for id:SN 48.50
20181003 09:54:22 WARN Error: loadSuttaJson() no sutta found for id:SN v 225
20181003 10:25:27 WARN Error: loadSuttaJson() no sutta found for id:sn

Good news

SC-Voice automatically finds the enclosing range for Supported suttas such as an1.55 and maps them where possible to their enclosing files such as an1.51-60.
Spaces are significant, so “MN 125” should be entered as “MN125”. In the future, “MN 125” might work as a compound search interpreted as “suttas having both ‘MN’ and ‘125’”, but that has not been implemented. For now, just omit the spaces.
You can also go to a specific sutta such as “mn1/en/bodhi”. This is the canonical SC-Voice way of addressing a sutta and corresponds exactly to SuttaCentral.net’s REST API.
Searches for sutta by name (e.g., “Lokavipatti Sutta”) are not yet implemented.
42 may eventually be supported out of deference to HG2TG.
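
For anyone curious how the range lookup and the “mn1/en/bodhi” addressing could work, here is a minimal Python sketch. It is an illustration only, not the SC-Voice implementation: `find_enclosing`, `parse_reference`, and the list of known file ids are hypothetical, and the regular expressions only cover ids shaped like mn125, an1.55 or an1.51-60.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical id shapes: "mn125", "an1.55" (single) and "an1.51-60" (range).
SINGLE_RE = re.compile(r"^(?P<prefix>[a-z]+(?:\d+\.)?)(?P<num>\d+)$")
RANGE_RE = re.compile(r"^(?P<prefix>[a-z]+(?:\d+\.)?)(?P<start>\d+)-(?P<end>\d+)$")


def find_enclosing(sutta_id: str, known_files: Iterable[str]) -> Optional[str]:
    """Return the known file id that is, or encloses, sutta_id (else None).

    Spaces are left alone, so "mn 125" will not match "mn125".
    """
    sutta_id = sutta_id.lower()
    known = list(known_files)
    if sutta_id in known:
        return sutta_id
    single = SINGLE_RE.match(sutta_id)
    if single is None:
        return None
    number = int(single["num"])
    for file_id in known:
        rng = RANGE_RE.match(file_id)
        if (rng is not None
                and rng["prefix"] == single["prefix"]
                and int(rng["start"]) <= number <= int(rng["end"])):
            return file_id
    return None


def parse_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split "mn1/en/bodhi" into (sutta id, language, translator)."""
    parts = ref.split("/")
    sutta_id = parts[0].lower()
    lang = parts[1] if len(parts) > 1 else None
    translator = parts[2] if len(parts) > 2 else None
    return sutta_id, lang, translator


known = ["an1.51-60", "mn125"]            # made-up sample of file ids
print(find_enclosing("an1.55", known))    # the enclosing range file
print(find_enclosing("mn 125", known))    # None, because of the space
print(parse_reference("mn1/en/bodhi"))
```

Inputs from the log such as “sn”, “48.42” or “an.1.55-60” fall through to None in this sketch, which matches the “no sutta found” warnings above.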

Bad news (HELP)
I need help understanding what the desired outcome would be for searches such as these:

dn
SN v 225
87

SCMatt · October 3, 2018, 2:59pm

karl_lew · October 3, 2018, 3:00pm

@SCMatt

sujato · October 3, 2018, 11:54pm

That’s excellent.

Just so you know, on SC we use two forms of ID:

Jeans-and-tshirt: mn1, snp2.4, ud3.1
Suit-and-tie: MN 1, Snp 2.4, Ud 3.1

It’s not possible to convert from one to the other in all cases (because the capitalization conventions depend on understanding the Pali terms). So we maintain both sets of IDs. The jeans-and-tshirt form is used for processing, URLs, and the like, while the suit-and-tie form is for presentation.
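
As a rough illustration of keeping both sets of IDs side by side, here is a small Python sketch. The table holds only the three examples above; the names `ID_TABLE`, `display_id` and `processing_id` are made up for this sketch, and matching the presentation form case-insensitively is an assumption, not a description of how SC stores or looks up its IDs.

```python
from typing import Optional

# Both forms are stored explicitly because one cannot be derived from the other.
ID_TABLE = [
    ("mn1", "MN 1"),
    ("snp2.4", "Snp 2.4"),
    ("ud3.1", "Ud 3.1"),
]

# Processing form -> presentation form.
TO_DISPLAY = {uid: display for uid, display in ID_TABLE}
# Presentation form (lower-cased for lenient matching) -> processing form.
TO_UID = {display.lower(): uid for uid, display in ID_TABLE}


def display_id(uid: str) -> Optional[str]:
    """Look up the suit-and-tie form for a jeans-and-tshirt id."""
    return TO_DISPLAY.get(uid)


def processing_id(display: str) -> Optional[str]:
    """Look up the jeans-and-tshirt id for a suit-and-tie form."""
    return TO_UID.get(display.strip().lower())
```

A lookup in either direction is then a dictionary access, and anything not in the table simply returns None.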

None: these are malformed.

Can you let me know what the source is for the IDs that produce an error? I’ll look into it.

Gabriel_L · October 4, 2018, 5:03am

Hi @karl_lew,

I am not able to load in my browser (Chrome) the link at the OP:
http://50.18.90.151/scv/

It seems the right address is: http://50.18.90.151/scv/#/ (I found it in one of the posts within the thread)

Is there any plan to come up with a user-friendly web address for the SuttaCentral Voice Assistant?

sujato · October 4, 2018, 9:57am

There is, yes. Once it is ready, we’ll put it at voice.suttacentral.net.

karl_lew · October 4, 2018, 12:00pm

Oh dear. This is difficult. “MN 117” is actually NOT in mn117.po. Formal designations are apparently stored elsewhere and are not in the suttas themselves. @Blake, do you have any ideas on how I might add “suit-and-tie” search to SC-Voice?

SC-Voice currently lacks identity view. It reacts solely to the search string. My guess and hope was that someone here at SC might have entered these searches and might be able to shed light on the particular use case. For example, I could see how “dn” might simply return a catalog of Digha Nikaya. But that was my inference and I would prefer corroboration of that hypothesis to even start thinking about a solution.

I do think that a simple identity view might provide benefit as a study aid. I personally would like to know: 1) how many gazillion times I’ve listened to MN1, MN10 and MN2, etc. 2) what suttas I have NOT listened to, and 3) what suttas are generally popular. Such an identity view would probably not be traceable back to any individual, however. There still would be no login. Only a tracking guid generated from a secret phrase known only to the user.
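
Here is a minimal Python sketch of what such a login-free tracking guid and listen counter might look like. Everything in it is an assumption for illustration: the salt, the iteration count, the in-memory storage and the function names are hypothetical, and nothing like this exists in SC-Voice yet.

```python
import hashlib
import uuid
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

APP_SALT = b"sc-voice-identity-sketch"  # hypothetical fixed application salt
ITERATIONS = 100_000                    # assumed work factor for key stretching


def tracking_guid(secret_phrase: str) -> str:
    """Derive a stable guid from a phrase known only to the listener.

    The same phrase always gives the same guid; the phrase is never stored.
    """
    digest = hashlib.pbkdf2_hmac(
        "sha256", secret_phrase.strip().encode("utf-8"), APP_SALT, ITERATIONS
    )
    return str(uuid.UUID(bytes=digest[:16]))


# guid -> Counter of sutta id -> number of listens (in memory only here).
listens: Dict[str, Counter] = defaultdict(Counter)


def record_listen(guid: str, sutta_id: str) -> None:
    """Count one listen of sutta_id for this guid."""
    listens[guid][sutta_id] += 1


def unheard(guid: str, all_sutta_ids: Iterable[str]) -> List[str]:
    """Suttas this guid has not listened to yet."""
    return [s for s in all_sutta_ids if listens[guid][s] == 0]


def popular(n: int) -> List[Tuple[str, int]]:
    """The n most listened-to suttas across all guids."""
    total: Counter = Counter()
    for counts in listens.values():
        total.update(counts)
    return total.most_common(n)
```

The three helper functions line up with the three questions above: listen counts per sutta, suttas not yet heard, and general popularity. Note that a short or common phrase could collide with someone else's or be guessed, so the phrase would need to be reasonably long.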