I’ve been working on a nifty tool to convert between 60+ Indic scripts, including Brahmi, Kharoshthi, Devanagari, Khmer and Sinhala, as well as various romanization formats.
(A note on Brahmi and Kharoshthi: Windows 10 has better support for compound clusters in these scripts. On other systems, the clusters might not be formed properly.)
I tried it out this morning, converting between romanised Pali, Thai, Lanna, Burmese and Devanagari. It’s absolutely wonderful! Thanks enormously for all the effort you’ve put into it.
Absolutely! Now we can begin reading Pāli texts in Burmese and Sinhalese scripts that were never Romanised. This is a remarkable accomplishment, congratulations! (I checked several scripts, including Urdu, and it’s working well).
Was it an individual effort on your part, or does it belong to a project? Just curious!
May I echo the praises of my Dhamma brothers! This is a great piece of work, congratulations.
I would very much like to see if we can apply this for the texts on SuttaCentral. Currently we convert between a few scripts on the front end, but the implementation is buggy and the conversion limited.
Would it be possible to use your conversion scripts on our texts? How might this work? Our site will be made with LitElement, which basically means that any vanilla JS should work fine. Would we need to import any libraries?
It should be possible. I have separated the front end and the back end for a reason. You just need to send a JSON request to the backend (currently hosted with Google), and it will send the transliterated result back as JSON. You should theoretically be able to host the backend code on your own server as well (which would make it faster).
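A minimal sketch of what such a front-end call could look like in TypeScript, since the site is built with LitElement. The endpoint URL, the JSON field names (`source`, `target`, `text`) and the response shape `{ "text": "..." }` are hypothetical placeholders, not the tool's actual API; the real request format would have to come from the backend itself.

```typescript
// Hypothetical endpoint: a placeholder, not the tool's real URL.
const ENDPOINT = "https://example.com/api/transliterate";

// Hypothetical request shape: the field names are assumptions.
interface TransliterationRequest {
  source: string; // e.g. "IAST"
  target: string; // e.g. "Bengali"
  text: string;
}

// Builds the JSON body sent to the backend.
function buildRequestBody(source: string, target: string, text: string): string {
  const req: TransliterationRequest = { source, target, text };
  return JSON.stringify(req);
}

// POSTs the text to the backend and returns the transliterated result.
// Assumes (hypothetically) that the backend replies with { "text": "..." }.
async function transliterate(source: string, target: string, text: string): Promise<string> {
  const response = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildRequestBody(source, target, text),
  });
  if (!response.ok) {
    throw new Error(`Transliteration request failed: ${response.status}`);
  }
  const data = (await response.json()) as { text: string };
  return data.text;
}
```

Keeping the body builder separate from the network call makes it easy to point the same code at a self-hosted copy of the backend later by changing only the endpoint.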
I’d be inclined to rely on your servers if possible. That may reduce the complexity of our setup, and ensure that we stay up-to-date with any improvements you make. But a few caveats:
Speed, although it remains to be seen how important that is.
Will it overload your servers? (I doubt it!)
Will your server remain reliable and consistent? (Any breaking API changes expected?)
Let’s say a user sets the script to Bengali. Then they call up MN 2 on SuttaCentral. The root text is in Roman, so it needs to be converted. The source is either:
Processed on our backend, or
Sent to your servers for processing.
Then it is served to our front end and rendered in the changed script.
Is that right?
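One way to picture the first option (processing on our backend) is a small cache in front of the converter, so each text is converted once per script rather than on every page view. A sketch under assumptions: the `Converter` type, the `ScriptCache` class and the `"Roman"` script name are hypothetical, and the converter is passed in so it could wrap either a local copy of the conversion code or a call to the remote servers.

```typescript
// A converter turns romanized text into the requested script. In practice it
// could wrap a call to the remote transliteration servers or a local copy of
// the conversion code; it is injected so the caching logic stays separate.
type Converter = (romanText: string, targetScript: string) => Promise<string>;

// Hypothetical server-side cache: each (text id, script) pair is converted once.
class ScriptCache {
  private readonly cache = new Map<string, string>();
  private readonly convert: Converter;

  constructor(convert: Converter) {
    this.convert = convert;
  }

  async get(textId: string, romanText: string, script: string): Promise<string> {
    // Root texts are stored in Roman script, so they need no conversion.
    if (script === "Roman") return romanText;

    const key = `${textId}:${script}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const converted = await this.convert(romanText, script);
    this.cache.set(key, converted);
    return converted;
  }
}
```

For example, a user who has chosen Bengali and opens MN 2 would trigger one conversion of the MN 2 root text; later requests for the same text and script in the same process are served from the cache.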
Currently, I believe, our script changer is written in JS and works entirely on the front end. Is this possible with your setup?
It shouldn’t. I can set up a dedicated backend just for SuttaCentral and see how it fares. I get billed weekly, and my guess is that the usage shouldn’t go beyond the free tier offered by Google.
As I said, the code is hosted on Google’s cloud servers, so I’d assume it should be reliable enough. I can keep the API consistent, and if I update anything, I’ll keep you in the loop.
Yup.
Unfortunately no. When I started 5 years ago, JS wasn’t this popular.
If you want client-side processing, you can look at the mapping/rules and reimplement them in JS. For the mainstream scripts, it is not too complex (and shouldn’t take a lot of time).
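A minimal sketch of what a client-side reimplementation could look like for one pair only: romanized Pāli (IAST-style letters) to Devanagari. The mapping table is an illustrative assumption written for this example, not the tool's actual rules, and it ignores numerals, the danda and other punctuation conventions. The core idea is that each consonant carries an inherent "a", a following vowel is written as a dependent sign, and two consonants in a row are joined with a virama.

```typescript
const VIRAMA = "\u094D"; // suppresses the inherent "a" to form clusters
const ANUSVARA = "\u0902"; // Devanagari sign used here for niggahita (ṃ)

// Illustrative subset of consonants; two-letter aspirates are matched first.
const CONSONANTS = new Map<string, string>([
  ["k", "क"], ["kh", "ख"], ["g", "ग"], ["gh", "घ"], ["ṅ", "ङ"],
  ["c", "च"], ["ch", "छ"], ["j", "ज"], ["jh", "झ"], ["ñ", "ञ"],
  ["ṭ", "ट"], ["ṭh", "ठ"], ["ḍ", "ड"], ["ḍh", "ढ"], ["ṇ", "ण"],
  ["t", "त"], ["th", "थ"], ["d", "द"], ["dh", "ध"], ["n", "न"],
  ["p", "प"], ["ph", "फ"], ["b", "ब"], ["bh", "भ"], ["m", "म"],
  ["y", "य"], ["r", "र"], ["l", "ल"], ["ḷ", "ळ"], ["v", "व"],
  ["s", "स"], ["h", "ह"],
]);

// Each vowel: [independent letter, dependent sign after a consonant].
// Short "a" has no sign because it is inherent in the consonant.
const VOWELS = new Map<string, [string, string]>([
  ["a", ["अ", ""]], ["ā", ["आ", "ा"]], ["i", ["इ", "ि"]], ["ī", ["ई", "ी"]],
  ["u", ["उ", "ु"]], ["ū", ["ऊ", "ू"]], ["e", ["ए", "े"]], ["o", ["ओ", "ो"]],
]);

function romanToDevanagari(input: string): string {
  // NFC so that letters like "ṭ" and "ā" are single code points.
  const text = input.normalize("NFC").toLowerCase();
  let out = "";
  let pending = false; // true if the last output was a consonant with no vowel yet
  let i = 0;
  while (i < text.length) {
    const two = text.slice(i, i + 2);
    const one = text.charAt(i);
    const token = two.length === 2 && CONSONANTS.has(two) ? two : one;
    const consonant = CONSONANTS.get(token);
    if (consonant !== undefined) {
      if (pending) out += VIRAMA; // cluster, e.g. "d" + "dh" in "buddha"
      out += consonant;
      pending = true;
      i += token.length;
      continue;
    }
    const vowel = VOWELS.get(one);
    if (vowel !== undefined) {
      out += pending ? vowel[1] : vowel[0];
      pending = false;
      i += 1;
      continue;
    }
    // Anything else (ṃ, spaces, punctuation) closes an open consonant.
    if (pending) out += VIRAMA;
    pending = false;
    out += one === "\u1E43" ? ANUSVARA : one; // U+1E43 is "ṃ"
    i += 1;
  }
  if (pending) out += VIRAMA;
  return out;
}
```

For example, `romanToDevanagari("buddha")` returns "बुद्ध". Other mainstream scripts would follow the same pattern with a different table.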
Okay, thanks for the info. That’s enough for us to take it to our devas. I’ll pitch it to our backend deva, Blake, and see how we go. Currently he’s building this:
Doesn’t that depend on the ability of the program to produce a gazillion ligatures that Pāli doesn’t need? I ask only because I know nothing about computers.
Actually, it depends on the font. The program just produces the Unicode code points that represent the script. The font (and the application’s rendering engine) decides how it appears on screen.
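To make this concrete: the conjunct "ddha" in Devanagari is stored as three code points (DA, VIRAMA, DHA). Whether it appears as a stacked ligature or as da with a visible virama followed by dha is decided by the font and shaping engine, not by the converter. A small sketch for inspecting the code points of any converted string (the function name is hypothetical):

```typescript
// Returns the Unicode code points of a string as "U+XXXX" labels.
// Array.from iterates by code point, so characters outside the BMP
// (such as Brahmi, U+11000 onwards) are kept whole.
function codePoints(text: string): string[] {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0) ?? 0;
    return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  });
}
```

For example, `codePoints("द्ध")` returns `["U+0926", "U+094D", "U+0927"]`, whatever the font does with them.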
All of the Romanized options have quite obscure names. In the interest of non-specialists, could they be included in the list alphabetically as:
Roman (ITRANS)
Roman (IAST)
Roman (ISO 15919)
etc.
I actually think saying English instead of Roman would be even better, but I recognize that is not accurate, really.
I noticed this because I am writing some instructions for Sinhala-speaking monks, who would be familiar with the term “Romanized” or who generally call the transliteration “English”. But even I haven’t heard of these other names, except IPA and Velthuis.
As it is, the list is mostly made up of script names, so alphabetical order seems like it should be the primary sorting method. If it were sorted that way, someone who didn’t find “Roman” would at least try searching for “English” or “Latin”. As it is, one has to guess at the items grouped at the end.
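A minimal sketch of the proposed labelling, assuming each list entry knows whether it is a romanization scheme; the `ScriptEntry` shape and the function names are hypothetical. Prefixing the schemes with "Roman (...)" lets a plain alphabetical sort group them together under R.

```typescript
// Hypothetical entry shape: a script or scheme name, and whether it is a
// romanization scheme (IAST, ITRANS, ISO 15919, ...).
interface ScriptEntry {
  name: string;
  isRomanization: boolean;
}

// Label romanization schemes as "Roman (...)" so they sort together under R.
function displayName(entry: ScriptEntry): string {
  return entry.isRomanization ? `Roman (${entry.name})` : entry.name;
}

// Returns all display names as one alphabetical list.
function sortedScriptNames(entries: ScriptEntry[]): string[] {
  return entries.map(displayName).sort((a, b) => a.localeCompare(b, "en"));
}
```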
I just noticed the checkbox to capitalize sentences. This is marvelous. Really a great, great feature.
That’s cool. I didn’t think many people would find it useful. I added it because, to me at least, uncapitalized romanized text looks a bit unnatural, which affects its readability.
I know it’s tricky, and I especially wouldn’t expect it to recognize quote marks within a paragraph. But perhaps at least when the quotation starts the paragraph. Both smart quotes and straight quotes don’t work.
I have fixed this. Do give it a try and let me know if it works alright.
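A minimal sketch of the kind of rule involved, assuming a simple approach: after the start of a paragraph or a sentence-ending ".", "?" or "!", skip spaces and quote marks (straight and smart) and capitalize the first character that follows. This is an illustrative reimplementation with a hypothetical function name, not the tool's own code; abbreviations and similar edge cases are ignored.

```typescript
// Quote marks to look past: straight double/single and smart “ ” ‘ ’.
const QUOTE_MARKS = new Set<string>(['"', "'", "\u201C", "\u201D", "\u2018", "\u2019"]);

// Capitalizes the first letter of each sentence in a paragraph, skipping
// spaces and quote marks; a sentence ends at ".", "?" or "!".
function capitalizeSentences(paragraph: string): string {
  let atSentenceStart = true;
  return Array.from(paragraph)
    .map((ch) => {
      let out = ch;
      if (atSentenceStart && ch !== " " && !QUOTE_MARKS.has(ch)) {
        out = ch.toUpperCase();
        atSentenceStart = false;
      }
      if (ch === "." || ch === "?" || ch === "!") atSentenceStart = true;
      return out;
    })
    .join("");
}
```

Skipping closing quotes as well as opening ones matters for text such as `sutaṃ.” ekaṃ`, where the quote sits between the full stop and the next sentence.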
If you’d like to capitalize a specific word, you can mark it by placing @ before the word. For instance, @බුද්ධ
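A minimal sketch of how such a marker could be handled as a post-processing step, assuming the `@` passes through transliteration unchanged and is applied to the romanized output; the function name is hypothetical and this is not the tool's actual implementation.

```typescript
// Capitalizes the first letter of every word marked with "@" and drops the
// marker. Assumes the marker survives transliteration unchanged, so
// "@බුද්ධ" would arrive here as something like "@buddha".
function applyCapitalizationMarkers(text: string): string {
  // The "u" flag makes \S match a whole code point, not half a surrogate pair.
  return text.replace(/@(\S)/gu, (_match: string, first: string) => first.toUpperCase());
}
```

Because `\S` requires a non-space character, a stray "@" followed by a space is left as it is.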
One usability suggestion…
It actually makes sense. I have modified it accordingly. Now the entire thing is sorted alphabetically.
Oh, no, I think it is an absolute must. That was one of the small things that bugged me about the DPR.
Thanks so much for this tool. It’s a great blessing.
I just tried the print feature using the Microsoft Print to PDF driver as well as another, and they both put the words “Print text” kind of in the middle of every page. I’m guessing it’s a bug and not a watermark. :-)