SuttaCentral

International Participative Pali Dictionary (IPPD): looking for comments and advice

This is my first post on the forum, so let me start by greeting everyone! And thanks for the wonderful resources SuttaCentral has to offer. To introduce myself briefly: I work in IT and have some experience with various open-source projects (https://luxcorerender.org/, https://www.picochess.org/, http://gaiachess.free.fr/, ...). I’m a beginner in Pali, but I have been very interested in the Pali Canon for over 15 years.

I’m French, and Pali/French dictionaries are not as complete as the English ones; I assume that many people are looking for a Pali dictionary in their native language. Also, the English resources are great (PTS/Cone/CPD/…), but I feel that there is no “structured” data or tooling for improving the English dictionaries.

So, here is the pharaonic project: building an international, participative, structured dictionary. The idea is to provide simple tools that let lexicographers and volunteers work collaboratively (think Wikipedia), and to consolidate their work in a simple, public, exportable database (from which a PDF, a queryable website and an API can be produced).

So the project is divided into three parts:

  1. A simple graphical user interface for non-IT lexicographers to work on the dictionary in a collaborative manner.
  2. A central database allowing querying and simple data export (JSON).
  3. Ready-to-use formatted data (PDF, website).

I already have working prototypes for all three parts:

  1. A proof-of-concept GUI (see attached screenshot); it will be free, open-source and multiplatform.
  2. A working MongoDB database (simple JSON storage).
  3. A working proof-of-concept PDF export (see attached PDF), produced with a tool I wrote some years ago.

I plan to have a working “sandbox” infrastructure in 1-2 months; at the moment, important features are still missing (a nicer GUI, versioning, etc.).

Once the tool is complete, I hope to create a ‘dictionary committee’ to organize the methodology around the dictionary; I already have plans for the “French committee”, but I hope there will be some international interest in this :slight_smile:

On the collaborative side, a lot of the work is common to all languages (selection of lemmas, grammatical classification, morphology, semantic relationships, etymology, etc.), and only some parts need specific work in each language (meanings, encyclopedic information). So I hope that the language teams will spur each other on and make progress together, even if their target languages are different.
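As a rough illustration of that split, here is a minimal Python sketch that keeps the language-independent fields apart from the per-language ones. The field names (`grammar`, `etymology`, `translations`, `meanings`) and the sample glosses are assumptions made for the example, not the actual IPPD entry structure.

```python
from dataclasses import dataclass, field


@dataclass
class Translation:
    """Language-specific part of an entry (hypothetical field names)."""
    meanings: list[str] = field(default_factory=list)
    notes: str = ""


@dataclass
class Entry:
    """One dictionary entry: shared fields plus one Translation per language."""
    lemma: str
    grammar: str = ""      # shared, e.g. "f." for a feminine noun
    etymology: str = ""    # shared across all target languages
    translations: dict[str, Translation] = field(default_factory=dict)

    def meanings_in(self, lang: str) -> list[str]:
        """Return the meanings for a language code, or [] if not translated yet."""
        translation = self.translations.get(lang)
        return translation.meanings if translation else []


# Illustrative data only: the shared fields are filled in once ...
metta = Entry(lemma="mettā", grammar="f.")
# ... and each language team only adds its own translation layer.
metta.translations["en"] = Translation(meanings=["loving-kindness"])
metta.translations["fr"] = Translation(meanings=["bienveillance"])

print(metta.meanings_in("fr"))
print(metta.meanings_in("it"))
```

With a layout like this, adding a new target language would only mean adding one more key under `translations`; the grammatical and etymological work would not need to be repeated.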

So I’m posting here for two reasons:

  1. The project is at a very early stage, so if you have any advice, or think some points should be improved, please let me know!

  2. One of the first things I have to do is decide on the structure of a dictionary entry. I’m not a lexicographer, so my proposal is directly inspired by Lexicography. I have attached it (see entry_structure.PDF), and I’m looking for comments on it.

I’m not allowed to upload file attachments, so here are the links:
Screenshot: https://drive.infomaniak.com/app/share/256623/e1a31580-83c1-4f49-8571-de70daf8d31f

Entry structure: https://drive.infomaniak.com/app/share/256623/69205432-e06f-40e6-81d1-91e2045d1d1a

PDF export sample: https://drive.infomaniak.com/app/share/256623/51b0fdbb-3110-4cfa-937f-a2a20d051216

11 Likes

Hi, welcome to the forum! This sounds like something Ven @sujato and his merry band of translators/coders might be interested in. :slightly_smiling_face:

5 Likes

Hi and welcome. It’s your first post, but it is already making a big impact on the community.

2 Likes

Greetings,
nice project! It looks like this approach has some things in common with Ven. Subhūti’s work on the UPR (Ultimate Pāli Reader; Discord; he is also on this forum under the username “bksubhuti”). One feature of this application will be a comprehensive, multilingual Pāḷi dictionary, so getting in touch with him might be fruitful. He and his team are also quite tech-savvy …

Mettā 2u!

4 Likes

And if a REST API is available we would love to use that for Voice.suttacentral.net.

2 Likes

Hi, just a couple of points for now. By the way, the last two files that you link to appear to be the same.

You can see SuttaCentral’s dictionary sources here:

And note that the scope of your project is similar to that of Charles Muller’s Digital Dictionary of Buddhism, which is (despite the slightly unclear title) a dictionary of Chinese Buddhist terms. He has been curating this as a volunteer contribution project since 1995; it is in fact one of the longest-lived collaborative projects on the internet.

http://www.buddhism-dict.net/ddb/

7 Likes

Thanks for all the links! I joined the Discord, and also checked the data provided by SuttaCentral on GitHub and Charles Muller’s project. Yes, I made a mistake when uploading the lemma structure file; I will post an updated one soon - the fields are still evolving day by day :slight_smile:

3 Likes

Dear @jromang,

Your project sounds interesting. I would like to talk with you personally to understand whether we could collaborate in some way. I have been collecting Pāli words for more than three years to build a Pāli-Italian dictionary. Unfortunately, the website is still closed to the public, even though more than 1000 words have now been collected. The problem is that I’m not a programmer, so I have to rely on others’ work for the technical side of things. I have been waiting more than a year for one or two people to help launch the website, but it has not been possible yet. Could you join me in a Skype conversation? My Skype name is gangvegr.

2 Likes

Dear @Antonio-Costanzo, I would be very interested in a collaboration if one is possible! As I said, the project is at a very early stage, but I would be very happy to learn about your needs and to see the data you have collected and the format you are using :slight_smile: My Skype is ‘jromang’; I will try to add you to my contacts.

1 Like

Here is a proposal for the structure of a dictionary entry: Entry Structure
It may look complicated, but the ‘lemma’ field is the only mandatory one; the other fields are optional, and many of them are common to all target languages. Most of the work (grammar classes, morphological structure, etc.) needs to be done only once, which will make it easier for contributors to translate into a new language.
All comments are welcome! :slight_smile:
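To show what “only the lemma is mandatory” could mean in practice, here is a small Python sketch of an entry check. The set of optional field names is hypothetical; the real list is the one in the linked entry structure.

```python
import json

# Hypothetical optional fields; the real list is in the linked entry structure.
OPTIONAL_FIELDS = {"grammar", "morphology", "etymology", "translations"}


def validate_entry(raw: str) -> dict:
    """Parse one JSON entry and apply the 'only lemma is mandatory' rule."""
    entry = json.loads(raw)
    if not isinstance(entry, dict):
        raise ValueError("an entry must be a JSON object")
    lemma = entry.get("lemma")
    if not isinstance(lemma, str) or not lemma.strip():
        raise ValueError("'lemma' is mandatory and must be a non-empty string")
    unknown = set(entry) - OPTIONAL_FIELDS - {"lemma"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return entry


# A lemma-only entry is already valid and can be enriched later.
print(validate_entry('{"lemma": "dhamma"}'))
```

A contributor could therefore start with a bare list of lemmas and fill in the optional fields over time, without ever producing an invalid entry.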

2 Likes

This is a great project and we’d love to have a REST API for lookup. :pray:

One consideration is a server vs. serverless implementation of your API. Notably, dictionaries rarely change, so a serverless REST API seems possible, perhaps using GitHub and/or NPM. The advantage of serverless is that there is no cloud vendor to pay and no server to maintain.

How would a serverless dictionary work?

A serverless dictionary would simply have its data files available on GitHub. A suitable companion JavaScript library could then be included in client applications such as Voice.suttacentral.net for dictionary lookup. Furthermore, GitHub Actions could be used to periodically update the automatically translated dictionaries as Google Translate improves. Other dictionaries would be manually maintained, but all dictionaries would share the same API for client access.
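A minimal sketch of such a client could look like the following. It is written in Python rather than JavaScript purely for illustration, and the URL, the file layout (`{"lemma": ["meaning", ...]}`) and the class name are all assumptions, not an existing library.

```python
import json
import urllib.request

# Hypothetical URL of a raw data file in a public GitHub repository.
DATA_URL = "https://raw.githubusercontent.com/example/pali-dictionary-data/main/en.json"


def fetch_text(url: str) -> str:
    """Download a static file; no application server is involved."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


class StaticDictionary:
    """Lookup over a JSON file shaped like {"lemma": ["meaning", ...]} (assumed layout)."""

    def __init__(self, json_text: str) -> None:
        self._entries: dict[str, list[str]] = json.loads(json_text)

    def lookup(self, lemma: str) -> list[str]:
        """Return the meanings of a lemma, or an empty list if it is unknown."""
        return self._entries.get(lemma, [])


# Offline demo with inline sample data; a real client would instead call
# StaticDictionary(fetch_text(DATA_URL)) once and cache the result.
sample = '{"mettā": ["loving-kindness"], "dhamma": ["teaching"]}'
dictionary = StaticDictionary(sample)
print(dictionary.lookup("mettā"))
```

Every dictionary published in the same file layout could then be read through the same `lookup` call, whether it is maintained by hand or generated automatically.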

3 Likes

I’ll provide that :+1:

As the goal of the website is to be a participative dictionary (written by humans, not only by Google Translate), I hope it will change very often :slight_smile: I already have all the database/server infrastructure running, but of course it will be possible to export the data files to GitHub automatically!
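One possible shape for such an export step, sketched in Python: the entries are written to a single JSON file in a stable order, so that repeated exports produce small diffs in the repository. The file name `entries.json` and the sample entries are assumptions; the list of dicts stands in for whatever the database returns.

```python
import json
import tempfile
from pathlib import Path


def export_entries(entries: list[dict], out_dir: Path) -> Path:
    """Write all entries to one JSON file, sorted by lemma for stable diffs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "entries.json"
    ordered = sorted(entries, key=lambda entry: entry["lemma"])
    with target.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps Pali diacritics readable in the file.
        json.dump(ordered, handle, ensure_ascii=False, indent=2, sort_keys=True)
        handle.write("\n")
    return target


# Stand-in for documents read from the database (illustrative data only).
entries = [{"lemma": "mettā", "grammar": "f."}, {"lemma": "dhamma", "grammar": "m."}]
path = export_entries(entries, Path(tempfile.mkdtemp()))
print(path.read_text(encoding="utf-8"))
```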

3 Likes

Hi @jromang, today I got a security warning when I tried to open your dictionary. Firefox says there is no valid certificate for pali.tny.ovh; the certificate is only valid for the following names: ippd.ovh and www.ippd.ovh.

Yes, sorry about that: I changed the domain name to https://ippd.ovh. If you use the new link, there should be no problem!

1 Like

Good, thank you! :white_check_mark:

Well, I can still say a bit more to make Discourse happy (my post was too short).

1 Like

Yes, it looks good, although I haven’t checked in detail.

One design feature to bear in mind: make it growable. The “only lemma is required” idea is a good start. As you well know, the field is littered with ambitious projects that collapsed under their scale. If the project adds value right away, it can be gradually improved and expanded indefinitely.

I’d encourage you to keep the main files in JSON. It serves well, it goes natively over the internet, and everyone understands it. On SC we use the DB for querying, not for data storage.
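A minimal Python sketch of that division of labour: the JSON stays the source of truth, and a throwaway database is rebuilt from it for querying. SQLite is used here only as a convenient stand-in (nothing above says which database SC uses), and the table layout and sample entries are assumptions.

```python
import json
import sqlite3

# Stand-in for the contents of a JSON data file (illustrative entries only).
JSON_TEXT = '[{"lemma": "dhamma", "grammar": "m."}, {"lemma": "mettā", "grammar": "f."}]'


def build_index(json_text: str) -> sqlite3.Connection:
    """Rebuild a throwaway query database from the JSON source of truth."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE entries (lemma TEXT PRIMARY KEY, grammar TEXT)")
    rows = [(e["lemma"], e.get("grammar", "")) for e in json.loads(json_text)]
    connection.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    return connection


db = build_index(JSON_TEXT)
# The database only answers queries; edits go to the JSON files and the index is rebuilt.
feminine = db.execute("SELECT lemma FROM entries WHERE grammar = ?", ("f.",)).fetchall()
print(feminine)
```

Because the index can always be regenerated, losing or replacing the database would cost nothing as long as the JSON files are kept.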

1 Like

Thanks! I’ll create an ‘ippd-data’ GitHub repository to export all the data in JSON :slight_smile: