Search improvements

anon61506839 · September 21, 2018, 3:27am

I see! Let me try to explain further. Text analysis in Pāli is proving to be essential, and many are struggling with the text data-mining process (me included; I don’t even have a computer to tweak the experience as you suggest). SC can be a very helpful and powerful tool for this, and I already use it for these purposes. I don’t regard this as abuse, since “abuse” would require some harm (in this case either to the SC organism or to myself!). Using the search features in ways which go beyond what the SC team envisages is something I would rather call creativity! But I didn’t even know I was being creative: before your reply, I thought that you were actually aiming at text analysis through your technology, which it already partly provides.

Your engine offers a thorough search specifically through Pāli texts (unlike Google, which goes through most of the text on the WWW!). It also transcends declensional variants, which is awesome (you search for “chanda” and you also get “chando” etc., and sometimes the word will be listed even when it appears in compounds). But the really important thing for me is that you can also search for multiple words at once. Let me give an example:

Recently I came across “chanda, vitakka, & sañña” grouped together in a list (in the 1st & 2nd Vihārasuttas, SN 45.11-12), which caught my attention at once. Since the context in which they appeared was quite ambiguous, I wanted to see if they appear together as a group in any other sutta/context which could help illuminate their purport. It would be too time-consuming to do this by searching through text files (as far as my skills go), but SC can do it in a blink! The only problem is that the engine will also list all instances in which chanda, vitakka, & sañña appear individually. Then you get 700+ results, most of which list just one of these terms on its own. The question at this point is: “who’s got the patience to go through all of these just in case the three terms appear together in the midst of all these results?!” Hey! I would do it! But as I keep scrolling down, the page soon crashes. Even if it doesn’t, as I mentioned before, a single page reload is enough to lose your place, and you can only with difficulty return to where you stopped.
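To make the request concrete, here is a minimal Python sketch of an "all terms together" filter. It is not how SuttaCentral's search works; the function names, the sample IDs, and the sample strings are made up for illustration, and declension is approximated crudely by treating each search term as a word prefix (so “chand” matches “chando”). Words buried inside compounds would not be caught by this approach.

```python
import unicodedata

# Punctuation to trim from the edges of each word (illustrative, not exhaustive).
EDGE_PUNCTUATION = ".,;:!?\"'“”‘’()[]"

def normalize(word):
    """Lowercase, use composed Unicode form, and trim edge punctuation."""
    return unicodedata.normalize("NFC", word).lower().strip(EDGE_PUNCTUATION)

def contains_all(text, stems):
    """Return True if every stem is a prefix of at least one word in text."""
    words = [normalize(w) for w in text.split()]
    stems = [normalize(s) for s in stems]
    return all(any(w.startswith(s) for w in words) for s in stems)

def cooccurrences(texts, stems):
    """Return the IDs of the texts in which all stems occur together."""
    return [text_id for text_id, text in texts.items()
            if contains_all(text, stems)]

# Hypothetical sample data: placeholder strings, not quotations from any sutta.
texts = {
    "a": "chando ca vitakko ca saññā ca",
    "b": "chando uppajjati",
    "c": "vitakko ca saññā ca",
}
matches = cooccurrences(texts, ["chand", "vitakk", "saññ"])
print(matches)
```

With a filter like this, only the texts containing all three terms would remain, instead of every text that mentions any one of them.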

There may well be other forms of search, and other motivations for searching, that produce abundant results, especially in the case of someone who wishes to explore the text rather than search for something specific that they already know exists! There would be no need to abolish infinite scrolling; simply offering the option of displaying results by pages would be very helpful and would mostly solve the problem. Is this worth the trouble? I’d say it could even encourage text analysis in Pāli, especially for people who are not so skilled or independent technologically. I have already come across many researchers who would benefit from something like this (what about other participants here, would you find that useful?!).
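As a rough illustration of why paging helps here, the sketch below slices a result list into numbered pages. The function name, the page size of 20, and the result count are assumptions for the example, not anything SuttaCentral implements. The point is that a page number is a stable position: if it were kept in the URL, a reload would return to the same slice of results.

```python
def paginate(results, page, per_page=20):
    """Return one page of results plus the total number of pages.

    Pages are numbered from 1. A page past the end yields an empty list.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    # Ceiling division, with a minimum of one page for an empty result list.
    total_pages = max(1, -(-len(results) // per_page))
    start = (page - 1) * per_page
    return {
        "page": page,
        "total_pages": total_pages,
        "items": results[start:start + per_page],
    }

# Hypothetical result list standing in for a search with 705 hits.
results = list(range(1, 706))
last_page = paginate(results, page=36)
print(last_page["total_pages"], last_page["items"])
```

Infinite scrolling and paging can share the same underlying slicing; the difference is only whether the current offset is visible and recoverable.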

Just some suggestions. Thanks so much venerable.

karl_lew · September 21, 2018, 3:42pm

This is actually one of the things I’d like to add for SC-Voice. The conceptual overwhelm is even worse for assisted users. Ideally all searches would only result in a handful of choices. The main idea will be to use text segments as the definition of locality (as in “words close together”). In this manner, we simply do keyword search on text segments and sort by hits. This type of search may be out of reach of the Lucene-based content search currently in use, so I was going to experiment for SC-Voice unless @Blake or Bhante @Sujato already have such implementations underway for SuttaCentral. My benchmark for success would be to type “root of suffering” without quotes and get 10 suttas.
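A minimal sketch of that idea, in Python, might look like the following. This is only an illustration of "keyword search on segments, sorted by hits", not the SC-Voice implementation; the function name, the segment IDs, and the sample sentences are all hypothetical, and a hit is counted here as a distinct query word appearing in the segment.

```python
import re

def rank_segments(segments, query, limit=10):
    """Rank segment IDs by how many distinct query words each segment contains."""
    keywords = set(re.findall(r"\w+", query.lower()))
    scored = []
    for segment_id, text in segments.items():
        words = set(re.findall(r"\w+", text.lower()))
        hits = len(keywords & words)
        if hits:
            scored.append((hits, segment_id))
    # Most hits first; ties are broken by segment ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [segment_id for _, segment_id in scored[:limit]]

# Hypothetical segments: made-up IDs and sentences, not actual translations.
segments = {
    "s1:1": "Desire is the root of suffering.",
    "s1:2": "The root of the tree is deep.",
    "s2:1": "They walked to the village.",
}
ranked = rank_segments(segments, "root of suffering")
print(ranked)
```

Because each segment is short, requiring the words to share a segment acts as a proximity constraint, and the `limit` keeps the result list to a handful of choices. A fuller version would presumably ignore very common words such as “of” and group the top segments by sutta.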

Snowbird · September 21, 2018, 4:56pm

I didn’t find anyone mentioning this… I don’t see any way to modify the search string once I have searched. Am I missing something? As far as I can tell I have to start over if I want to make any changes.

sujato · September 22, 2018, 12:04am

Please go ahead, we don’t have any such plans at the moment.

Indeed, this is a UI mistake, we will fix it when we can.

Hey everyone, just so you know, a bit of bad news on the search front.

Having made search a major priority, when just getting started we ran into a major glitch with our translation engine. We have been relying heavily on Pootle, which handles all our new segmented text translations. But we need Pootle to integrate with GitHub to actually get the texts on the site. Pootle built a system for this, and put out a “release candidate” version. It was quite buggy, but we relied on it on the understanding that the polished version would soon be released. However, the Pootle project essentially stopped development completely without releasing the final version, and it is now effectively abandonware. This has left us in a pickle!

The long and short of it is, Blake has left aside the search upgrades for now and is building a new translation server. Search is still a big priority, but handling our translations properly is an even bigger one!

chanakavp · March 6, 2019, 7:47pm

Happy to hear that this is being worked on. I could not find the sutta by searching for “satipatthana sutta.” I was hoping it would be the first result.

But I found a workaround. The Google query satipatthana sutta site:suttacentral.net showed what I was looking for as the first result.

I would also suggest that the developers look at the search technology provided by https://www.algolia.com/. It seems very smart. But it’s not free; Google Custom Search is, I think.

I found out about Algolia on https://idratherbewriting.com/ which uses the technology. If interested, use the site as a demo to test it out.

Vimala · March 7, 2019, 8:24am

Thanks for the suggestion! Maybe @blake can have a look at this sometime.