A question about commas

josephzizys · April 3, 2022, 3:32am

I have been using SuttaCentral’s wonderful translations, displayed line by line with the Pali, together with Digital Pali Reader’s search functionality, to explore the frequencies of terms and formulas in the Nikayas. Both resources are excellent and I am very much like a kid in a candy store, learning new and, I think, interesting things about the texts all the time.

One thing I have noticed, in both the SuttaCentral (SC) Pali and the Digital Pali Reader (DPR) Pali, is that certain formulas are given with commas in the DN text and without commas in the MN text, or vice versa. DPR will miss an occurrence of a search string when the punctuation in the text differs from the string, while SC returns every page containing any combination of the words, or even related words, in no particular order. I am trying to learn contemporary web development so that I can get in there and fix these issues myself, but in the meantime I wanted to know: what is the actual source of SC’s Roman-script Pali text? Why are there commas, stops, and capitals in it? And why are those commas applied inconsistently across DN and MN?

Is there any possibility in the future of cleaning up these Pali sources to make them consistent? (I’m happy to be the one who attempts it.) Obviously I would not want to deprive users who are chanting and reading the Pali of their caps, commas, and stops. For search, though, if it were possible to query the text without those complications, I wouldn’t have to try multiple search strings with a comma in every place one might occur just to find all instances of a phrase.

Snowbird · April 3, 2022, 3:41am

FWIW, the DPR allows regex searching, so you could create a search string that matches the phrase both with and without commas, although it’s not ideal. I guess instead of word spaces you could just put something like [\s\.,;]+
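To make that concrete, here is a rough sketch in Python of building such a pattern from a plain phrase. It is not DPR code: the function name `flexible_pattern`, the punctuation set, and the sample strings are all assumptions chosen for illustration, and DPR's own regex dialect may differ in small ways.

```python
import re

# Separator allowed between words: any run of whitespace, stops, commas,
# or semicolons. This punctuation set is an assumption; extend as needed.
SEPARATOR = r"[\s.,;]+"


def flexible_pattern(phrase: str) -> re.Pattern:
    """Build a regex matching the phrase regardless of punctuation between words."""
    words = phrase.split()
    # Escape each word so it is matched literally, then join with the separator.
    return re.compile(SEPARATOR.join(re.escape(w) for w in words), re.IGNORECASE)


# Illustrative sample strings, not quoted from any particular edition.
pattern = flexible_pattern("vivicceva kāmehi vivicca akusalehi dhammehi")
with_commas = "Vivicceva kāmehi, vivicca akusalehi dhammehi"
without_commas = "vivicceva kāmehi vivicca akusalehi dhammehi"

print(bool(pattern.search(with_commas)))
print(bool(pattern.search(without_commas)))
```

The `re.IGNORECASE` flag also takes care of sentence-initial capitals, so one search string covers both punctuation and capitalization variants.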

Probably the best approach (meaning easiest, quickest, and most reliable) would be to work with the team at the DPR and modify the search to ignore punctuation. That is a very specific and reasonable enhancement.
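As a rough idea of what "ignore punctuation" could mean in practice, here is a minimal sketch in Python. It is only an illustration of the general technique (normalize both the text and the query before comparing), not how DPR or SuttaCentral actually implement search; the names `normalize` and `contains_phrase` are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    # Unicode general categories starting with "P" cover commas, stops,
    # semicolons, and curly quotes. Letters with diacritics are left untouched.
    no_punct = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(no_punct.lower().split())


def contains_phrase(text: str, phrase: str) -> bool:
    """Check for the phrase as a run of whole words, ignoring punctuation and case."""
    # Padding with spaces keeps the match on word boundaries.
    return f" {normalize(phrase)} " in f" {normalize(text)} "


# Illustrative sample string, not quoted from any particular edition.
sample = "Vivicceva kāmehi, vivicca akusalehi dhammehi."
print(contains_phrase(sample, "kāmehi vivicca"))
```

The displayed text stays exactly as it is, with its caps, commas, and stops; only the copy used for matching is normalized.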