@silence kindly offered to help out in making a better Pali compound-breaker for our lookup tool.
Currently it works automatically: it parses the compound, compares it against a list of words, and looks for places where it might be broken. Given the complex and unpredictable nature of Pali compounds, it’s not easy to get such an approach working reliably.
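To make the idea concrete, here is a minimal sketch of that kind of word-list matching. It is not the code the lookup tool actually uses. The function name `split_compound`, the `min_len` parameter, and the tiny `WORDS` set are all made up for illustration, and it deliberately ignores sandhi, which is a large part of why a naive approach struggles.

```python
from functools import lru_cache

# Toy word list for illustration only (an assumption); a real tool would
# load its headwords from the dictionary.
WORDS = frozenset({"dhamma", "cakka", "pavattana", "sutta"})


def split_compound(compound, words=WORDS, min_len=2):
    """Return every way to cut `compound` into pieces that all occur in `words`.

    Plain string matching only: no sandhi rules, no inflection handling.
    """

    @lru_cache(maxsize=None)
    def splits_from(start):
        # Reaching the end of the string means the pieces so far cover it.
        if start == len(compound):
            return [[]]
        results = []
        for end in range(start + min_len, len(compound) + 1):
            piece = compound[start:end]
            if piece in words:
                # Keep this piece and try to split whatever is left.
                for rest in splits_from(end):
                    results.append([piece] + rest)
        return results

    return splits_from(0)


print(split_compound("dhammacakka"))            # [['dhamma', 'cakka']]
print(split_compound("dhammacakkappavattana"))  # []
```

The second call finds nothing because the doubled `pp` at the join means that `pavattana` never appears as a plain substring. That is the sort of case where a better dictionary, sandhi-aware analysis, or a hand-made break would be needed.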
Just a couple of points worth bearing in mind.
- The current lookup is based on the New Concise Pali English Dictionary (NCPED), which is Buddhadatta’s old CPED, enhanced and corrected based on Cone’s Dictionary of Pali. We are awaiting the release of her third volume before this can be finalized.
- A fair number of extra improvements have been generated during the translation process, but these have not been integrated into the lookup. Essentially I broke compounds by hand while translating, but this was entirely ad hoc, and by no means all of them were done.
So to improve the tool, there are essentially three angles we can work on:
- Improve the underlying resources, especially the dictionary.
- Improve the automated analysis and breakup.
- Manually insert breakpoints in the text.
@silence, can you describe what your experience has been in doing this in the past, and how you envisage approaching the task?