Softening Search

Search often comes up with no results, I think, if the spelling isn’t 100% accurate. Can we have a softer, fuzzier approach to finding texts? Pali words may end with ‘o’ or ‘a’, and a small variation can sometimes mean no results at all.

With gratitude

Mat

I agree, this would be good. Various search issues are discussed here.

We will look at tweaking this, but we have other priorities at the moment.

It would be good if someone like @chansik_park could teach us to regex like the pros.

Haha. :grin:

So for client-side searching, which it seems we must resort to in the interim for exact searches, there are a few routes a typical user can take, each with its own installation steps, before being able to use regular expressions. I think the simplest is to download (or alternatively git-clone) the folder here, which holds just the Pali root texts (if I’m not mistaken), and then run the actual searches with either a command-line regular expression tool, e.g. grep, or a modern text editor such as Notepad++ or Sublime Text.

Generally speaking, the various engines implement a few different dialects of regular expression with differing levels of complexity, but there is a core subset that pretty much all engines support. The Digital Pali Reader’s engine, being a browser add-on, is based on that of JavaScript, for which the standard reference can be found here. The naggy but courteous indefinite-trial version of Sublime Text and the FOSS but Windows-only Notepad++ both use a richer dialect (PCRE). I’m pretty sure, though, that it’s a superset of 95%+ of the more beginner-friendly JavaScript dialect.

For Mat’s example, you might use a search term such as dev[ao]. If I were a more organized person, I’d probably have a file full of stock regular expression patterns to share, but for the most part I’ve found that one crude pattern takes me pretty far:

(word1part.{3,30}word2part)|(word2part.{3,30}word1part)

It checks whether the strings “word1part” and “word2part” ever occur between 3 and 30 characters apart, in either order. More recently I’ve found that the “negative lookahead” operator, e.g. (?!abc), is good for ad-hoc narrowing of searches.
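
For anyone who would rather try these patterns in a script than in an editor, here is a minimal sketch using Python’s built-in re module, whose dialect supports everything used above. The sample lines are made up and unaccented purely for illustration, and the helper name proximity_pattern is my own invention, not part of any tool mentioned in this thread.

```python
import re

def proximity_pattern(part1, part2, min_gap=3, max_gap=30):
    """Build the either-order proximity pattern from two literal word parts."""
    a, b = re.escape(part1), re.escape(part2)
    gap = ".{%d,%d}" % (min_gap, max_gap)
    # Same shape as (word1part.{3,30}word2part)|(word2part.{3,30}word1part)
    return re.compile("(%s%s%s)|(%s%s%s)" % (a, gap, b, b, gap, a))

# Made-up sample lines, for illustration only (not quoted from any text).
lines = [
    "devo ca brahma ca",
    "brahma pana devassa santike",
    "devata agacchi",
    "manusso gacchati",
]

# dev[ao] matches "deva" or "devo" anywhere in a line.
ending = re.compile(r"dev[ao]")
ending_hits = [ln for ln in lines if ending.search(ln)]

# "dev" and "brahma" within 3 to 30 characters of each other, either order.
near = proximity_pattern("dev", "brahma")
near_hits = [ln for ln in lines if near.search(ln)]

# Negative lookahead: "dev" that is NOT immediately followed by "ata".
not_devata = re.compile(r"dev(?!ata)")
lookahead_hits = [ln for ln in lines if not_devata.search(ln)]

print(ending_hits)
print(near_hits)
print(lookahead_hits)
```

Note that the dot does not match a line break by default, so when the files are read line by line the two word parts have to sit on the same line to count as a hit.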

If anyone has any particular search in mind I’d be happy to oblige with how I’d look for it.

Thank you, all. :+1:

Further improvements notwithstanding, I too encounter this issue. Try these approaches:

  1. By default, SC will try for a fuzzy match of Pali terms, so if you enter sujato it will also match sujata, sujatena, etc. But this has limits.
  2. Sometimes there is variation in spelling of Pali words, so this makes it complicated.
  3. The forms of words are not always easy to match. For example, Google uses its vast dataset and complicated language rules to recognize that “does” and “did” and “doing” are forms of the same word, but it is hard to teach a computer that karoti and akāsi and kubbamāno are the same word in different tenses.
  4. Try wrapping the term in asterisks.
  5. Try lopping off a character at a time from the ends, or even the beginning, of the word.
  6. Try Google site search: site:https://suttacentral.net
  7. Try searching on GitHub, in the text folder of the suttacentral/legacy-suttacentral-data repository.
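
The “lopping off a character at a time” approach in point 5 can also be automated when searching downloaded texts. What follows is only a rough sketch under my own assumptions: the function name truncated_search, the min_len cut-off, and the three sample strings are all hypothetical, and this is not how SC’s own search works.

```python
def truncated_search(term, texts, min_len=3):
    """Shorten `term` from the end until some text contains it.

    Returns (stem_used, matching_texts), or (None, []) if nothing matches
    before the stem gets shorter than `min_len`.
    """
    stem = term
    while len(stem) >= min_len:
        hits = [t for t in texts if stem in t]
        if hits:
            return stem, hits
        stem = stem[:-1]  # lop one character off the end and retry
    return None, []

# Made-up sample strings, for illustration only.
texts = ["sujatena vuttam", "buddho bhagava", "sujata dhita"]

stem, hits = truncated_search("sujato", texts)
print(stem, hits)
```

The min_len floor is there because a stem of one or two letters would match almost everything, which is no more useful than no results at all.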

Dear Ajahn Sujato,

Great suggestions!

Would different Pali words (with similar meanings) be included in the dictionary entry? If that is the case, does Search look in the entry itself rather than only the dictionary index? Not sure how it works, but it would be a good tool when trying to understand the meaning or the context of a particular Pali term.

This is a separate issue. It would be possible to do this using lists extracted from the Abhidhamma. There, we find fairly extensive and reliable lists of synonyms for stock doctrinal terms and ideas. It would be possible to compile such lists and supplement the regular search with a semantic search. Thus, for example, if you wanted to search for suttas on the topic of “energy” you could find contexts using padhāna, viriya, vāyāma, and so on. But this would be a major project!
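
To make the idea concrete, here is a toy sketch of how a synonym list could be folded into an ordinary text search. Everything in it is an assumption for illustration: the SYNONYMS table is hand-typed from the “energy” example above rather than extracted from the Abhidhamma, the function name semantic_pattern is made up, and the sample passages are invented.

```python
import re

# Hypothetical hand-made table; a real one would be compiled from the
# Abhidhamma synonym lists described above.
SYNONYMS = {"energy": ["padhāna", "viriya", "vāyāma"]}

def semantic_pattern(topic, synonyms=SYNONYMS):
    """Return a regex matching any listed synonym for `topic`.

    Falls back to a literal search for `topic` itself if it is not listed.
    """
    terms = synonyms.get(topic, [topic])
    return re.compile("|".join(re.escape(t) for t in terms))

# Made-up sample passages, for illustration only.
passages = ["viriyaṃ ārabhati", "saddhā hoti", "vāyāmaṃ karoti"]

pattern = semantic_pattern("energy")
found = [p for p in passages if pattern.search(p)]
print(found)
```

This only matches the listed forms as substrings, so inflected endings that change the final vowel would still be missed; handling those is part of what makes it a major project.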

No, I don’t like recommending major projects to monks! You need time to enjoy your enlightenment :mindblown:

:cactus::meditation:
