I made a thing: EPUB + audio = 🤯

Things are starting to come together:

  1. @karl_lewl's SCV app, which syncs voice and text by segment (but segmenting only works with computer voices)
  2. EPUB generation
  3. Sutta reading

I took a bunch of these ideas and made a proof of concept for a high-quality audio ebook:

sn1.01.epub (930.8 KB)

The process is relatively straightforward, and it should be possible to automate it without much difficulty. I’ve mentioned most of these bits and pieces before, but here is the whole process to make this book.

Segmented sutta translation

I use SN 1.1 as a nice short example.


For the recording, I used a Rode M5 mic, Presonus Studio 24 interface, recorded on Ubuntu 18.10 with Audacity, with noise reduction and compression in post.

It proved to be a bit of a hassle to get the mic setup working. I think the problem was that JACK wasn’t installed. Anyway, it works fine, if a little flaky in terms of recognizing the I/O.

Create map of audio

I used Aeneas made by Read Beyond to map the audio onto the text file.


  1. Install aeneas.
  2. Create a plain text file from the source po file. One segment per line.
  3. python -m aeneas.tools.execute_task sn1.01.ogg sn1.01.txt "task_language=eng|os_task_file_format=json|is_text_type=plain" sn1.01map.json
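Step 2 could be scripted along these lines. This is only a minimal sketch: it assumes the translation sits in the `msgstr` field of each PO entry and that entries with an empty `msgid` (the PO header) should be skipped. The function name `po_to_lines` is made up for illustration, and a real PO file may need extra cleanup (markup, comments) that is not handled here.

```python
import re

# Matches the quoted part of a PO line such as: msgstr "some text"
_QUOTED = re.compile(r'"(.*)"\s*$')


def _unquote(line):
    """Return the text inside the quotes of a PO line, lightly unescaped."""
    match = _QUOTED.search(line)
    if not match:
        return ""
    return match.group(1).replace("\\n", " ").replace('\\"', '"')


def po_to_lines(po_text):
    """Return a list with one translated segment (msgstr) per item."""
    lines_out = []
    msgid, msgstr, active = "", "", None

    def flush():
        # Collapse whitespace and skip the header (empty msgid) and empty translations.
        text = " ".join(msgstr.split())
        if msgid and text:
            lines_out.append(text)

    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            flush()  # a new entry starts, so emit the previous one
            msgid, msgstr, active = _unquote(line), "", "msgid"
        elif line.startswith("msgstr "):
            msgstr, active = _unquote(line), "msgstr"
        elif line.startswith('"') and active == "msgid":
            msgid += _unquote(line)  # continuation line of a multi-line msgid
        elif line.startswith('"') and active == "msgstr":
            msgstr += _unquote(line)  # continuation line of a multi-line msgstr
        else:
            active = None  # comments, msgctxt, blank lines
    flush()
    return lines_out
```

Writing `"\n".join(po_to_lines(text))` to `sn1.01.txt` would then give the one-segment-per-line file that the aeneas command above takes as input.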

Make EPUB with embedded audio

I am no expert in EPUBs, so I just took one of the files from Read Beyond and copied it. I edited the file in Calibre and validated it in http://validator.idpf.org. I can’t really find much on the web for audio ebooks, so I just played around until it worked. The EPUB file is mostly valid, and anyway works fine. It’s very bare-bones, just enough to show off the text and audio concept.
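For anyone wanting to automate this step, the piece that ties text to audio in an EPUB 3 audio ebook is a Media Overlay (SMIL) file. Below is a rough sketch of generating one from the aeneas map. It is not how the file above was made (that was copied and edited by hand). It assumes the JSON map has a top-level `fragments` list whose items carry `id`, `begin` and `end`, and that the XHTML elements in the book use the same ids. The function name and the file paths are hypothetical.

```python
import json
from xml.sax.saxutils import quoteattr


def map_to_smil(map_json, text_href, audio_href):
    """Build an EPUB 3 Media Overlay document from an aeneas-style JSON map.

    text_href and audio_href are paths relative to the SMIL file
    (hypothetical names, chosen by whoever packages the book).
    """
    fragments = json.loads(map_json)["fragments"]
    pars = []
    for frag in fragments:
        # One <par> per segment: the text element plus its audio clip.
        text_src = quoteattr("{}#{}".format(text_href, frag["id"]))
        pars.append(
            "    <par id={pid}>\n"
            "      <text src={text}/>\n"
            '      <audio src={audio} clipBegin="{begin}s" clipEnd="{end}s"/>\n'
            "    </par>".format(
                pid=quoteattr("par_" + frag["id"]),
                text=text_src,
                audio=quoteattr(audio_href),
                begin=frag["begin"],
                end=frag["end"],
            )
        )
    return (
        '<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">\n'
        "  <body>\n" + "\n".join(pars) + "\n  </body>\n"
        "</smil>\n"
    )
```

The resulting file would still need to be listed in the package manifest and linked from the XHTML item via `media-overlay`, which is not shown here.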

Install ebook reader

Menestrello from Read Beyond is designed specifically for audio ebooks.


It’s a bit old, but works fine on Android.


It should be possible, given high quality human-read audio, to:

  • Create audio ebooks for all the suttas
  • Use the segmented audio files in SCV to complement the computer voice.

The process isn’t for you: it’s for our developers. Those who want to make recordings just need to worry about that. I’m still looking into it!


Wow! Even I… got it to work on my phone! :star_struck: :heart:


I wonder if you could flesh out the problem you are trying to solve by creating something like this. I’m in no way against it, I’m just curious who the audience is. It would have to be used on a tablet or desktop device, as I don’t think (but don’t actually know) that any e-ink readers can render multimedia EPUBs.

The big advantage of EPUBs (beyond reflowable text) is that they work completely offline and can be viewed on e-ink readers. But in this case I would think that either the file size for a complete nikaya would be problematic, or managing lots of smaller files would start to be difficult.

Now I believe that the Edge browser is the default (only pre-installed?) epub reader on Windows. I tried it out and the audio player was on the last page of the book. When I started it playing and scrolled back to the sutta, the audio stopped.

Anyway, not trying to rain on your parade, just curious what need this is seeking to fill.

EDIT: Ok, so I did some more reading. Is it your understanding that a special reader app is necessary for the functionality? I installed the Chrome extension Readium and got the warning that Chrome was discontinuing apps, so not to count on this extension.

I have zero experience with multimedia EPUB 3.0 things, but while researching other EPUB issues I came across several complaints that EPUB 3 includes so many features and ways of executing tasks that it is not very practical to implement them, as not all apps will render them correctly. Do the major EPUB reading apps support this functionality?

Also, I would suggest creating a proof of concept that was at least three or four times as long… When I opened the book in Chrome the whole sutta fit on one page so the read-along nature of the file was moot. :stuck_out_tongue_winking_eye:

EDIT 2: Sorry, just found this…

Menestrello is a free app specifically designed for Audio-eBooks, and we developed it because the other existing apps lack one feature or another. However any EPUB 3-compliant reading system should be able to open EPUB 3 Audio-eBooks; for example: Apple iBooks, IDPF Readium, or Infogrid Pacific AZARDI. Note, however, that not all the features present in Menestrello are available in the aforementioned applications: for example, Apple iBooks does not support Media Overlays in reflowable eBooks. Please also note that EPUB 3 is a relatively young standard, and many improvements to the current reading systems are expected by the end of 2014.


I think this is the major rub…

However any EPUB 3-compliant reading system should be able to open EPUB 3 Audio-eBooks

This is the thing with open formats. Everyone waves the flag “Open! Open!” But as I understand it, there is no such thing as legally “EPUB 3-compliant”, so developers are free to implement whatever they want or whatever there is a demand for.

Again, I’m not against having books like this, I just wonder if it wouldn’t be safer to implement it as part of the website.


You’re quite right, ereader support is patchy. I have, so far, only got it to work on Menestrello. But the point is that it’s trivially easy to create, once we have the source files. The whole process could be automated with a few simple scripts; aeneas was designed to support this.

That’s how standards work. It takes time, many years, for standards to be built and many more to be adopted. But once they are there, they stay. And there’s always the power of competition: once someone starts to popularize them, everyone will want to jump on board.

From my little research, it seems that the culprit here, as so often, is copyright. Amazon launched a feature similar to this, but the publishers said the rights to view the text did not include the right to listen to it, and so it goes.

Our web platform, Polymer, has had similar issues, and in fact much of our recent work has been to update the site to agree with the mature web standards adopted in late 2017. These things take time.

The functionality of this is similar to that on SCV already, except using human voices. From what I can see, adding human voices to SCV should now be trivial, but it remains to be seen. Adding similar functionality to the main site would be nice, maybe in the future, but it’s not on the immediate roadmap.



Are you talking about the text to speech feature? There were some publishers (authors?) who blocked the text to speech feature, but not many, as I understand. The quality of the text to speech was just not that great (I’m still not a fan of current text to speech for pleasure listening). But authors have to eat too. And the overhead cost of creating audio books is so large that anything that would decrease return on investment… well it’s not surprising they would try and block it if they could.

Now this is supported on some books, if you own the Audible version as well as the text: How do I read and listen to a book in the Kindle App?

So this may show that there is an interest in having this kind of a file.

Amazing. I think audio EPUBs will make the discourses even more accessible!

Surely we can use sub-nikāya audio EPUB units if size is a problem? Eg, Linked Discourses Volume I: Book With Verses.


I have the first 50 suttas from PaliAudio and they are 1.4 GB.

Okay, what about this: offer two versions of each audio EPUB, one with high and one with low audio quality + file size?

It would be great to have an API for accessing a human voice reading any particular text segment. This API would allow SCV to do the following for all languages segment-by-segment:

  1. show the Pali/translated text
  2. speak the Pali (AI or human)
  3. speak the translation (AI or human)

In this way, we would offer an assembly of many voices for the world to hear.
In this way, we could all recite together.


I love this idea. Keep it coming please! Many thanks in advance.


Indeed. We’ll see how it goes. Modern opus compression is pretty efficient.
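As a rough way to reason about file sizes, audio size is approximately bitrate times duration. The sketch below is back-of-the-envelope only: it ignores container overhead, and the bitrates in the comments are illustrative assumptions, not settings anyone here has chosen.

```python
def audio_size_mb(duration_seconds, bitrate_kbps):
    """Approximate size in megabytes (10**6 bytes) of an audio stream.

    bitrate_kbps is in kilobits per second; container overhead is ignored.
    """
    total_bits = duration_seconds * bitrate_kbps * 1000
    return total_bits / 8 / 1_000_000


# Illustrative (assumed) bitrates for one hour of speech:
one_hour = 60 * 60
low = audio_size_mb(one_hour, 32)    # a low-bitrate speech encode
high = audio_size_mb(one_hour, 128)  # a higher-quality encode
```

Under those assumed bitrates the higher-quality hour is four times the size of the lower one, which is the kind of trade-off a two-version offering would be making.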

For your purposes, would you rather have:

  1. Suttas divided up into audio files, one file per segment.
  2. One sutta per audio file, with a JSON mapping of segment start and finish.
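The two options are not far apart: given option 2, option 1 could in principle be derived by cutting the file at the mapped times. Here is a sketch that only builds the ffmpeg command lines and does not run them. It assumes an aeneas-style map with a `fragments` list carrying `id`, `begin` and `end`. The function name and the output naming pattern are hypothetical.

```python
import json


def split_commands(map_json, audio_path, out_pattern="{id}.ogg"):
    """Return one ffmpeg argument list per segment in the JSON map.

    Nothing is executed here; each list could be passed to subprocess.run.
    The audio is re-encoded rather than stream-copied, so cuts can land
    on the mapped times.
    """
    commands = []
    for frag in json.loads(map_json)["fragments"]:
        commands.append([
            "ffmpeg",
            "-i", audio_path,
            "-ss", str(frag["begin"]),  # segment start, in seconds
            "-to", str(frag["end"]),    # segment end, in seconds
            out_pattern.format(id=frag["id"]),
        ])
    return commands
```

How clean the cuts sound would depend on how well the map lines up with pauses in the reading, so the output would need checking by ear.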

Having one audio file per segment is exactly how SCV works. The audio files are concatenated on demand using ffmpeg. What this permits is the bilingual (Pali/translation) reading of the suttas. It also permits Pali-only as well as translation-only listening. These three styles of listening are all valuable to different people. And the three styles of listening are multiplied by the number of voices available. Currently the fast listeners amongst us (e.g., Aminah) prefer Raveena exclusively. The slow listeners amongst us (e.g., Karl) much prefer Amy.
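To make the bilingual case concrete, here is a small sketch of interleaving per-segment files for ffmpeg's concat demuxer. It is an illustration only, not SCV's actual code. The function names and file names are hypothetical, and it assumes paths contain no single quotes and that all files share the same codec and parameters.

```python
def interleave_concat_list(pali_files, translation_files):
    """Build the text of an ffmpeg concat list: Pali segment, then its translation."""
    lines = []
    for pali, translation in zip(pali_files, translation_files):
        lines.append("file '{}'".format(pali))
        lines.append("file '{}'".format(translation))
    return "\n".join(lines) + "\n"


def concat_command(list_path, out_path):
    """Return the ffmpeg arguments that join the files named in list_path."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", list_path,
        "-c", "copy",     # join without re-encoding
        out_path,
    ]
```

Passing only one of the two file lists through a similar loop would give the Pali-only or translation-only variants.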

Splitting up suttas into human recordings of each segment is a daunting task. It is much more complicated than recording a single sutta as a file. Yet that effort would extend the SuttaCentral Line-by-Line experience to the auditory realm.

In either case, having human recordings as a file per sutta or as many files per sutta would be a great gift for us all.

Thank you, Bhante.