Aksharamukha: Indic Script Converter

virtualvinodh · December 10, 2018, 2:34am

Hello!

I’ve been working on a nifty tool to convert between 60+ Indic scripts, including Brahmi, Kharoshthi, Devanagari, Khmer, and Sinhala, as well as several romanization formats.

http://aksharamukha.appspot.com/#/converter

Also, you can view some sample texts in all of the scripts supported.

http://aksharamukha.appspot.com/#/texts/khuddakapatha

http://aksharamukha.appspot.com/#/texts/dhammachakka

(Just a note for Brahmi & Kharoshthi: Windows 10 has better support for compound clusters in these scripts. On other systems, the clusters might not be formed properly.)

Do try it out. Any feedback will be appreciated!

V

Dhammanando · December 11, 2018, 1:56pm

I tried it out this morning, converting between romanised Pali, Thai, Lanna, Burmese and Devanagari. It’s absolutely wonderful! Thanks enormously for all the effort you’ve put into it.

virtualvinodh · December 11, 2018, 6:46pm

Thanks for trying it out!

Hopefully, it’s of some use

V

anon61506839 · December 12, 2018, 2:46am

Absolutely! Now we can begin reading Pāli texts in Burmese and Sinhalese scripts that were never Romanised. This is a remarkable accomplishment, congratulations! (I checked several scripts, including Urdu, and it’s working well).

Was it an individual effort on your part, or does it belong to some project? Just curious!

sujato · December 12, 2018, 9:59am

May I echo the praises of my Dhamma brothers! This is a great piece of work, congratulations.

I would very much like to see if we can apply this for the texts on SuttaCentral. Currently we convert between a few scripts on the front end, but the implementation is buggy and the conversion limited.

I am wondering if it is possible to use your conversion scripts on our texts? How might this work? Our site will be made with LitElement, which basically means that any vanilla JS should work fine. Would we need to import any libraries?

virtualvinodh · December 12, 2018, 4:41pm

It should be possible. I have separated the front end and the back end for a reason. You just need to send a JSON request to the backend (currently hosted with Google) and it will send the transliterated results back as JSON. You should theoretically be able to host the backend code on your own server as well (which will make it faster).

AFAIK it shouldn’t take too much effort.
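For illustration only, here is a minimal sketch of what such a JSON round trip might look like from a client. The endpoint URL, the field names (`source`, `target`, `text`) and the script identifiers are all assumptions made up for this example; the thread does not give the real backend's schema, so check it before relying on any of this.

```python
import json
import urllib.request

# Hypothetical endpoint: the real backend URL is not stated in this thread.
ENDPOINT = "https://example.invalid/api/convert"


def build_request(text, source, target, endpoint=ENDPOINT):
    """Build (but do not send) a POST request carrying a JSON payload."""
    # Field names are assumptions, not the backend's documented schema.
    payload = {"source": source, "target": target, "text": text}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def convert(text, source, target, endpoint=ENDPOINT):
    """Send the request and decode the reply, assuming the reply is JSON."""
    request = build_request(text, source, target, endpoint)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request without touching the network.
req = build_request("dhammacakka", "IAST", "Bengali")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Only `build_request` runs here, so the example works offline; `convert` would need a real endpoint.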

V

virtualvinodh · December 12, 2018, 4:41pm

It was just me

V

sujato · December 12, 2018, 8:20pm

Thanks!

Great, we like JSON!

I’d be inclined to rely on your servers if possible. That may reduce the complexity of our setup, and ensure that we stay up-to-date with any improvements you make. But a few caveats:

Speed; although it remains to be seen how important that is.
Will it overload your servers? (I doubt it!)
Will your server remain reliable and consistent? (Any breaking API changes expected?)

Let’s say a user sets the script to Bengali. Then they call up MN 2 on SuttaCentral. The root text is in Roman, so it needs to be converted. The source is either:

Processed on our backend, or
Sent to your servers for processing.

Then it is served to our front end and rendered in the changed script.

Is that right?

Currently, I believe, our script changer is written in JS and works entirely on the front end. Is this possible with your setup?

virtualvinodh · December 12, 2018, 11:19pm

It’s Google. It should be fast enough.

It shouldn’t. I can set up a specific backend just for SuttaCentral and see how it fares. I do get a weekly bill. My guess is that it shouldn’t go beyond the free tier limit offered by Google.

As I said, the code is hosted on Google’s cloud servers. I’d assume it should be reliable enough. I can keep the API consistent. In case I update anything, I can keep you in the loop.

Yup.

Unfortunately no. When I started 5 years ago, JS wasn’t this popular.

If you want client-side processing, you can look at the mapping/rules and reimplement them in JS. For the mainstream scripts, it is not too complex (and shouldn’t take a lot of time).
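To give a feel for what "mapping/rules" means, here is a rough sketch of a table-driven Roman (IAST) to Devanagari converter. It is not Aksharamukha's code, and the tables cover only a small, hand-picked subset of letters (no retroflexes, nasals such as ṅ/ñ/ṃ, or Sanskrit-only vowels). It is written in Python to keep the example short; the same logic would port to JS directly.

```python
import unicodedata

VIRAMA = "\u094d"  # suppresses the inherent "a" of a consonant

# Partial tables, for illustration only.
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ",
    "c": "च", "ch": "छ", "j": "ज", "jh": "झ",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व", "s": "स", "h": "ह",
}

# vowel -> (independent letter, dependent sign used after a consonant)
VOWELS = {
    "a": ("अ", ""),          # inherent vowel: no sign needed
    "ā": ("आ", "\u093e"),
    "i": ("इ", "\u093f"),
    "ī": ("ई", "\u0940"),
    "u": ("उ", "\u0941"),
    "ū": ("ऊ", "\u0942"),
    "e": ("ए", "\u0947"),
    "o": ("ओ", "\u094b"),
}


def _match(text, i, table):
    """Return the longest key of `table` starting at text[i], or None."""
    for length in (2, 1):
        chunk = text[i:i + length]
        if chunk in table:
            return chunk
    return None


def to_devanagari(text):
    # NFC makes "a" + combining macron equal to the precomposed "ā".
    text = unicodedata.normalize("NFC", text.lower())
    out, i = [], 0
    while i < len(text):
        cons = _match(text, i, CONSONANTS)
        if cons:
            out.append(CONSONANTS[cons])
            i += len(cons)
            vowel = _match(text, i, VOWELS)
            if vowel:
                out.append(VOWELS[vowel][1])  # dependent vowel sign
                i += len(vowel)
            else:
                out.append(VIRAMA)  # no vowel follows: kill inherent "a"
            continue
        vowel = _match(text, i, VOWELS)
        if vowel:
            out.append(VOWELS[vowel][0])  # independent vowel letter
            i += len(vowel)
            continue
        out.append(text[i])  # pass anything unknown through unchanged
        i += 1
    return "".join(out)


print(to_devanagari("buddha"))
print(to_devanagari("dhamma"))
```

The two rules doing the real work are longest-match tokenising (so "dh" wins over "d" + "h") and the inherent-vowel handling; a production converter needs many more rules and much larger tables than this.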

V

sujato · December 12, 2018, 11:22pm

Okay, thanks for the info. That’s enough for us to take it to our devas. I’ll pitch it to our backend deva, Blake, and see how we go. Currently he’s building this:

virtualvinodh · December 12, 2018, 11:32pm

Cool. Just let me know.

I’d be glad to lend a hand in supporting more scripts (irrespective of my code being used :))

V

sujato · December 12, 2018, 11:39pm

Well, thanks.

Oh, one question I had: will this work for Sanskrit, or just Pali?

virtualvinodh · December 12, 2018, 11:45pm

It should work for both. In terms of writing, Pali is technically a subset of Sanskrit.

V

anon61506839 · December 25, 2018, 11:32am

That’s amazing! Please check your private inbox!

Coemgenu · December 25, 2018, 4:51pm

Doesn’t that depend on the ability of the program to produce a gazillion ligatures that Pāli doesn’t need? I just ask because I know nothing about computers.

virtualvinodh · December 26, 2018, 4:11pm

Actually, it depends on the font. The program just produces the Unicode codepoints that represent the script. The font (and the rendering engine of the application) decides how it should appear on the screen.
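A small sketch to make this concrete, using only Python's standard `unicodedata` module: it lists the code points stored for the Devanagari spelling of "buddha". The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(text):
    """Return one (code point, Unicode name) pair per character in `text`."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


word = "\u092c\u0941\u0926\u094d\u0927"  # "buddha" in Devanagari
for code, name in describe(word):
    print(code, name)
```

The cluster "ddha" is stored as three plain code points (DA, VIRAMA, DHA). Whether that shows up as a stacked ligature or as two letters with a visible virama is decided by the font and the shaping engine, not by the converter.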

V

Snowbird · March 9, 2019, 5:13am

I just noticed the checkbox to capitalize sentences. This is marvelous. Really a great, great feature.

It doesn’t seem to recognize quotation marks, though. For example, this doesn’t get capitalized:

“āyudhayakin pahara kǣvā vagē, hisa ginigattā vagē, ē bhikṣuva kāmāśāva nætikarala dānna sihi nuvaṇin inna ōna.”

I know it’s tricky, and I especially wouldn’t expect it to recognize quote marks within a paragraph. But perhaps at least when the quotation starts the paragraph. Both smart quotes and straight quotes don’t work.

Sadhu sadhu for your great work!

Snowbird · March 9, 2019, 5:25am

One usability suggestion…

All of the Romanized options have names that are quite obscure. In the interest of the non-specialist, could they be included in the list alphabetically as
Roman (ITRANS)
Roman (IAST)
Roman (ISO 15919)
etc.

I actually think saying English instead of Roman would be even better, but I recognize that is not accurate, really.

The reason I noticed this is because I am writing some instructions for Sinhala speaking monks who would be familiar with the term “Romanized” or who generally call the transliteration “English”. But these other names, even I haven’t heard. Except IPA and Velthuis.

As it is, the list is mostly composed of alphabets, so it seems like that should be the primary sorting method. At least if it was like that one would try searching for English or Latin if Roman was not there. As it is, one has to guess at the items grouped at the end.

Also not sure why Cyrillic is last.

virtualvinodh · March 9, 2019, 11:05pm

Thanks for the feedback @Snowbird.

I just noticed the checkbox to capitalize sentences. This is marvelous. Really a great, great feature.

That’s cool. I didn’t think many people would find it useful. I added it because, at least for me, somehow uncapitalized romanized texts look a bit unnatural, thereby affecting the readability of the text.

I know it’s tricky, and I especially wouldn’t expect it to recognize quote marks within a paragraph. But perhaps at least when the quotation starts the paragraph. Both smart quotes and straight quotes don’t work.

I have fixed this. Do give it a try and let me know if it works alright.

If you’d like to capitalize a specific word, you can mark it by placing @ before the word. For instance, @බුද්ධ
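The actual fix is not shown in this thread. Purely as an illustration of the behaviour being discussed, sentence capitalization that skips over leading quote marks could be sketched like this; the regex and the function name are assumptions for this example, and the @ marker is not handled here.

```python
import re

# Group 1: start of a line, or sentence-ending punctuation plus whitespace.
# Group 2: any opening quote marks, straight or smart.
# Group 3: the first word character of the sentence.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([\"'“‘]*)(\w)", re.MULTILINE)


def capitalize_sentences(text):
    """Uppercase the first letter of each sentence, ignoring leading quotes."""
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(), text
    )


print(capitalize_sentences("“āyudhayakin pahara kǣvā vagē. ē bhikṣuva”"))
```

Python's `str.upper()` handles letters with diacritics such as ā and ē, so no special casing is needed for romanized text.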

One usability suggestion…

It actually makes sense. I have modified it accordingly. Now the entire thing is sorted alphabetically.

V

Snowbird · March 10, 2019, 9:24am

Oh, no, I think it is an absolute must. That was one of the small things that bugged me about the DPR.

Thanks so much for this tool. It’s a great blessing.

I just tried the print feature using the Microsoft Print to PDF driver as well as another, and they both put the words “Print text” kind of in the middle of every page. I’m guessing it’s a bug and not a watermark. :-)