Spelling errors in EA12.1 The One Way In Sūtra

karl_lew · September 26, 2018, 12:40am

I happened to choose EA12.1 as a test case for Sutta Central Voice Assistant. I found some mispronunciations as well as some misspellings. I corrected mispronunciations but need help fixing these:

carving vs. craving
un-wholesomeness vs. unwholesomeness
un-wholesome vs. unwholesome
bound-Lessons vs. boundlessness
knee cap vs. kneecap
fellings vs. feelings

sujato · September 26, 2018, 10:21am

I’ve fixed these, but frankly I’m not very enthusiastic about it.

Ever since we have shifted to the new PO segmented files, I have consistently said that the old HTML files are deprecated. We maintain them and can add new ones, but we no longer develop new features for them.

Why? Because of things like this. The EA translations are full of typos. Great, so we fix them. Problem is, the translations themselves are often terrible. I don’t just mean that they are awkwardly phrased and not entirely accurate. They frequently completely miss the point of the text, for example by taking a list of doctrinal terms and trying to construe it as if it were a sentence.

We are spending our time fixing things so that the mistake is presented correctly. That is why I want to spend my time and the time of my team creating new resources of assured and consistent quality, rather than wasting our time to bodge along old stuff, like a “cart kept going by being strapped up”.

So why then have them at all? Because at least it is something. There are very, very few translations from the Agamas, and this was pretty much the first sustained series of Agama translations ever published. But in the coming years I expect to see more Agama translations of a better quality, and older generations of translations should be allowed to retire with dignity.

This case is far from unique. All the old PTS translations are similarly dubious, and are really only of legacy value. In other cases, notably Bhikkhu Bodhi’s translations, the translations themselves are good, but the files they came in (originally, EPUBs from Wisdom) are poor. We have done what we can to fix them up, but I am sure that many mistakes in the files remain.

So we have made the effort to get all these things in a consistent format, correct them as best we can with our limited resources, and put them online. That’s enough.

I look at the possibilities that the segmented texts open up: there are so many great things! Even just with the voice app, we could read the Pali and English line by line. Eventually we’ll support other languages, so we could learn Italian by listening to English/Italian suttas!

I want to keep our systems as focussed and simple as possible so that we can fully take advantage of the segmented texts, without the additional overhead of maintaining dual, incompatible systems. We already have this overhead in our main app; I don’t want to perpetuate it again and again. Of course, it’s easy to say, well, it’s not hard, it’s not so complicated to add support for legacy texts. But then, yeah, this happens. And the next thing. And the next thing.

Anyway, as always, it is your app and your decision. But I just wanted you to know my position and why I hold it!

Mat · September 26, 2018, 11:42am

It makes sense to keep the workload manageable. I see a lot of corrections being posted which look like perfectionism to me, which might eat into the core ‘business’ of SC.

karl_lew · September 26, 2018, 1:06pm

I did notice the lack of quality and was a bit confused by it, hence a separate post. For example, there wasn’t a root text link. In fact, I was seeing translations of translations, basically the telephone game. You’ve actually answered the unspoken question of how much work we should devote to this. I was really not looking forward to scanning and correcting these texts. It does mean that SC-Voice will offer an even more degraded rendering as mispronunciations pile up on misspellings, but I’m quite grateful to stick to the segmented texts. SC-Voice scrapes the HTML and does muddle through, which is quite sufficient. I mostly wanted this particular one cleaned up because I listened to it all day yesterday and it was driving me a bit batty not fixing it.

SC-Voice has a unit test to verify HTML parsing and this sutta is in the unit tests. The importance of the sutta isn’t so much the content, but how SuttaCentral encapsulates the semantics. I learned about metaarea for example and will be posting that information alongside blurbs inside the SC-Voice pages. I also learned about suttaplex elements and will be basically adding the voice equivalent of a suttacard.

I also am very excited about this possibility and would like to start looking at one other language as soon as we can. Doing so will illuminate internationalization trouble-spots in the architecture–I’d rather address those now while I can. We are limited by the AWS Voices and I would choose a language with multiple voices and genders, simply because experience with English voices has shown how much that availability matters (i.e., Amy vs. Raveena). How should we proceed with other languages?

karl_lew · September 26, 2018, 1:16pm

No.

I really think we should fix any misspellings in Pali Canon segmented text. This is core content. Failure to do so accelerates the inevitable degradation of the scriptures. The misspellings lead to mispronunciations which lead to misunderstanding which leads to delusion which leads to suffering. It’s not about potato vs. potatoe. It’s about carving vs. craving. They sound completely different although they look kinda the same. If the content is already compromised because of multiple translations and lack of Pali root text, then I would agree that the effort involved is simply not worth it. But please, let’s keep the electronic record as clean as we can for the Pali Canon. It’s not just for us but for those not yet born.

In software engineering we have this notion of levels of support. Given the massive amount of information available on SuttaCentral, maintaining all of it would be infeasible. Adopting a formal support policy would help us focus our efforts on “the critical few”. For example, segmented suttas should be “supported”, whereas legacy suttas such as EA12.1 would be “unsupported” or “legacy”. If we wish, we can elaborate on levels of support:

Supported content semantic errors (e.g., “good” translated as “bad”, carving vs craving)
Supported content cosmetic errors (e.g., unwholesome vs. un-wholesome)
Legacy content errors of transcription (e.g., bound-LESSON vs. boundlessness)
Unsupported content (never changed).

SC-Voice will display “Supported” for all 3850 suttas in Pootle and will display “Legacy” for all other suttas served via the SuttaCentral API. Supported suttas will be available offline. Legacy suttas will not be available offline.
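To make the rule above concrete, here is a minimal sketch of the labelling logic. It is only an illustration under stated assumptions, not SC-Voice's actual code: the names `SupportLevel`, `support_level`, and `available_offline`, and the sample id set, are all hypothetical, and the real list of segmented suttas would come from the segmented files.

```python
from enum import Enum
from typing import AbstractSet


class SupportLevel(Enum):
    """Labels shown to the user (placeholder words, per the policy above)."""
    SUPPORTED = "Supported"
    LEGACY = "Legacy"


def support_level(sutta_id: str, segmented_ids: AbstractSet[str]) -> SupportLevel:
    """Segmented suttas are Supported; everything else is Legacy.

    `segmented_ids` is assumed to hold lowercase ids such as "mn1".
    """
    if sutta_id.lower() in segmented_ids:
        return SupportLevel.SUPPORTED
    return SupportLevel.LEGACY


def available_offline(level: SupportLevel) -> bool:
    """Only Supported suttas are offered offline."""
    return level is SupportLevel.SUPPORTED


# Hypothetical sample data: a tiny stand-in for the full segmented set.
segmented = frozenset({"mn1", "sn1.1"})

for sid in ("MN1", "EA12.1"):
    level = support_level(sid, segmented)
    print(sid, level.value, "offline" if available_offline(level) else "online only")
```

Keeping the decision in one function means the label and the offline rule cannot drift apart if the policy is later refined into more levels.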

Bhante @Sujato, please edit this Support Policy as you wish. The support policy link will be given on each page.

karl_lew · September 26, 2018, 3:42pm

Erm. I actually see my role as “original maintainer”. To facilitate that I have renamed sc-karl to sc-voice. The repository has now moved:

I’ll defer to the SuttaCentral team for future migration and/or deprecation.

Viveka · September 26, 2018, 9:15pm

So the previous link is no longer operating. Could you please send me an invite to the new one?

sujato · September 27, 2018, 1:48am

Well, it really depends on the work done by other translation teams. At the moment we have people working in Indonesian and Portuguese, and Chinese will hopefully start in the not too distant future. But starting is one thing, publishing is another. For the time being I would say we should stick with English only. But we should build it with the expectation that support for other languages will come in the future.

Agreed.

That sounds fine. I wonder if we should include something similar on the main site …

Sure! I simply meant that I respect your role as developer and chief designer of this app, so I am happy to defer to your opinion.

karl_lew · September 27, 2018, 2:38am

Oopsie. Here is the Audio MN1 wiki page. Log in to GitHub and see if you can edit that page. I have revoked public edits. You are all powerful.

I was thinking of adding an icon of some sort to each segmented sutta. This icon would be a guarantee of support and would be automatically shown on every segmented sutta. It’s basically a symbol of quality. In SC-Voice, I will use the word “Supported” as a placeholder and defer to the SuttaCentral UI team for the actual icon/graphic. Similarly, I will use the word “Legacy” on every non-segmented sutta as a placeholder for a corresponding icon/graphic. I believe that this will really help users find their way to content that is vetted by and consistent with multiple sources. For example, knowing that EA12.1 is Legacy would now temper my initial excitement at finding another sutta with jhana descriptions.

Thank you. If I need something, I’ll just use the German translation of MN1 to test the language code in preparation for future coordination.

sujato · September 27, 2018, 8:58am

We considered using something like this for the main site, but never settled on a suitable UI, and eventually let the idea slide. Something like a badge or star icon might work. The problem is that the UI rapidly becomes cluttered and things added for clarification end up weighing you down.

The other problem is that the supported texts follow a very easy pattern. They are the ones by me, and soon by Brahmali. So you put a badge or some kind of icon on every link, thousands of them, to keep on telling someone something that they figured out after the first couple of entries.

Perhaps it would be better to have a brief introduction for a first-time visitor, and just explain, “Translations by Sujato and Brahmali are produced by SuttaCentral and are fully supported, while other translations are legacy texts inherited from third parties and may contain errors.”

karl_lew · September 27, 2018, 12:27pm

Agreed. That’s clutter. However, in SC-Voice, the badge/icon will only be in the annotational blurb, which is discoverable on demand by expanding the blurb. Most folks don’t expand the blurb, which includes the translator’s name. For example, SC-Voice will show something like this in an expanded blurb for MN1:

(expanded)
Mūlapariyāya Sutta
Translated by Bhikkhu Sujato
The Buddha examines how the notion of a permanent self emerges from the process of perception. A wide range of phenomena are considered, embracing both naturalistic and cosmological dimensions. An unawakened person interprets experience in terms of a self, while those more advanced have the same experiences without attachment.
Supported