Search improvements

karl_lew · August 8, 2018, 10:42pm

Yes it’s a bit clunky. And you need to type in the short id form to get to the English version. The wiki should provide implementation guidance since we’d like to minimize the need for such documentation.

Snowbird · August 9, 2018, 12:11am

I’d say more like non-functioning. Seems like being able to search for the names of a sutta should be a basic feature. But maybe I’m the only one who would do that.

Anyway, a big thank you to all who are working on this.

Personally I really liked the search feature in the Digital Pali Reader.

sujato · August 9, 2018, 12:19am

It is indeed! This is on the to-do list.

jerclarke · August 15, 2018, 5:33pm

Great! This was my main complaint I was going to bring up, really strange that the most popular words like “Satipatthana” don’t have the corresponding suttas as the first result.

FWIW I think to improve the filtering setup, you have to make a major change: In the results, show the search field with the previous search text filled out, with a button to resubmit.

This is a standard practice in Google and IME most major sites with good search. It lets the user see what they typed and consider changing it if they don’t get the results they want.

Right now you have to go back up, click the icon in the header, and re-type your search. IMHO that’s a really confusing path, and when you don’t get the results you want it’s extra-discouraging to find that your search isn’t in the field and has to be copy-pasted.

Having this field at the top of results also gives you a perfect place to add documentation and/or filtering tools.

You can even replace the current plaintext output of the search term with the field version, just make it big and pretty but still obviously a field.
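A minimal sketch of the "keep the query in the field" idea, not SuttaCentral's actual code. The `/search` path, the `query` parameter name and the `render_search_form` helper are all assumptions made for illustration:

```python
from html import escape


def render_search_form(previous_query: str) -> str:
    """Return an HTML search form whose text field repeats the previous query.

    The form action and field name are hypothetical placeholders.
    """
    # quote=True also escapes double quotes, so the query text
    # cannot break out of the value="..." attribute.
    value = escape(previous_query, quote=True)
    return (
        '<form action="/search" method="get">'
        f'<input type="search" name="query" value="{value}">'
        '<button type="submit">Search</button>'
        "</form>"
    )


print(render_search_form("Satipatthana"))
```

The results page would render this at the top, so the user can edit the text and resubmit without going back to the header icon.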

In terms of the nikaya filtering, I’ll +1 it as a feature.

In terms of the nikaya filtering, it could be some checkboxes (if any one of them is ticked, then only results from ticked ones are returned).

If that’s too clunky, another way would be to have text indicating how to limit a search using text query strings, which functions as guide, but then ALSO, if you click on the examples, they get inserted in the search form directly above.

So it says something like:

Filter by nikaya sn,mn,dn,an,kn:

And clicking on e.g. “sn” adds sn: to the start of the search query.
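To make the two variants concrete, here is a rough sketch covering both the clickable prefix and the checkbox rule ("if any are ticked, only results from ticked ones are returned"). The `sn:` prefix comes from the suggestion above; the function names, allowing several prefixes in a row, and the `(nikaya, title)` result shape are assumptions:

```python
KNOWN_NIKAYAS = ("sn", "mn", "dn", "an", "kn")


def add_prefix(query: str, nikaya: str) -> str:
    """Simulate clicking an example: put e.g. 'sn:' at the start of the query."""
    prefix = f"{nikaya}:"
    if query.startswith(prefix):
        return query  # prefix already there, leave the query alone
    return f"{prefix} {query}".strip()


def parse_query(query: str) -> tuple[set[str], str]:
    """Split leading nikaya prefixes from the remaining search text."""
    selected: set[str] = set()
    words = query.split()
    # Consume tokens such as "sn:" only while they sit at the front.
    while words and words[0].endswith(":") and words[0][:-1] in KNOWN_NIKAYAS:
        selected.add(words.pop(0)[:-1])
    return selected, " ".join(words)


def filter_results(results, selected):
    """Keep only results from selected nikayas; no selection means no filtering."""
    if not selected:
        return list(results)
    return [r for r in results if r[0] in selected]


query = add_prefix("root of suffering", "sn")
selected, text = parse_query(query)
hits = [("sn", "hypothetical hit A"), ("mn", "hypothetical hit B")]
print(query, selected, text, filter_results(hits, selected))
```

Ticked checkboxes and typed prefixes would both end up as the same `selected` set, so the two interfaces could share one filter.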

Maybe that’s too clever for its own good, but it would be nice to find a way that is a hybrid of “tell us what to do” and “help us do it”.

Also, it’s pretty clear that having some detailed instructions about filtering and tips for searching right on that page, hidden behind a “more search tips” kind of toggle button, would really help.

Thanks for wanting to review this!

sujato · August 16, 2018, 12:24am

You’re absolutely right, this is a UI failure, and I have also noticed it. I have put it on the 2do list.

The filtering suggestion is also a good one, pretty much how I’d like to do it.

richard.nagyfi · August 21, 2018, 7:59pm

Please note that ranking search results in any way will affect which suttas will be seen the most. Anything on the first page of Google results will have an extreme boost in traffic, while the sites Google finds less relevant will experience less traffic. Modern search engines are hybrids that also use collaborative filtering techniques to get better results: instead of just looking at the contents of a file, how users interact with that certain file is even more important. The search results we receive are also filtered by the context we unknowingly provide with our search and access history (so if I search for “bus” I will most likely get results from my country on public transportation, and since I’m into IT I will also get hits about the “bus” in computer architecture). The problem with this is that it requires user data to work, and users are mostly unaware that they are giving their data away during these interactions.

So in order to create an efficient search engine, first there should be a decision on how the automatic ranking should work and whether or not there is any user data that is offered for this service.

sujato · August 22, 2018, 1:24am

And a further consideration is that we have a didactic purpose: we are not neutral in regards to the information we serve. Primarily that means we want to highlight early texts rather than later. But it also means that we would like to broaden users’ horizons from the half-dozen suttas that are, for most, the be-all-and-end-all of the discourses. If someone wants to find the Metta Sutta or the Dhammacakka Sutta, they should be able to. But if they want a teaching on metta or the four noble truths, they should have more choices than these.

richard.nagyfi · August 22, 2018, 7:32am

It is also common practice to get information from outside sources to generate context (for instance Spotify’s bots -among many other things- also crawl lyrics which they use to find textual similarities for songs you enjoy and improve their models). Crawling the discussions and explanations on Suttas to improve search results could be useful - however, it also means that whatever ideas certain people had will be a basis on how users will reach these Suttas. If people’s understandings were wrong, the search results will also be off.

And yes, to broaden users’ horizons is also a challenge, as their preference biases will affect results as well. I assume that the popularity of Suttas also follows a Pareto distribution, meaning that about 20% of the Suttas will be quoted 80% of the time.

The evaluation of the results is also tricky, as it requires people with much knowledge in the domain so they can tell if the returned quotes are the ones that were indeed relevant. However, this also skews the ranking towards the preferences of the people who take part in the evaluation process.

So this is pretty much like an anti-search engine task, where the usual ranking algorithms would be partly counterproductive.

karl_lew · August 22, 2018, 2:01pm

In my searches, my own preferences haven’t been relevant–engineering and rock climbing are way too specific as attributes and only indirectly related to the suttas. Instead, I rely on the diversity of translation. For example, I was led to MN1 via Thanissaro Bhikkhu, who used “delight is the root of suffering”. We also have “relishing is the root of suffering” via Bhante Sujato. The diversity of translation itself provides guidance in search.

This very diversity of translation is one of the amazing things about SuttaCentral–we can find the same suttas via many paths. I still find it remarkable that “root of suffering” comes up only about 13 times in SC search results. That’s much better than any Google search I have conducted. SuttaCentral will also support more and more languages, each with their own cultural web of meaning. When I think in German, my perspectives shift. When I think in Spanish, my perspectives shift. Each of these languages will have their own translators, and I can see that we may in the future be able to use these alternate translations to automatically inform searches. Indeed, it is interesting to note that SC returns three (3) results for “Wurzel des Leidens”. Specifically, we have “the will is the root of suffering”. Now isn’t that a remarkable new spin on old suttas?

Denn der Wille ist die Wurzel des Leidens

richard.nagyfi · August 22, 2018, 2:19pm

I just realized it’s possible to build an automatic synonym dictionary to improve search results, by comparing multiple translations of the same Suttas. So “delight” could return Suttas with only the word “relishing” in them.
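A rough sketch of how that could look, assuming the translations are already aligned segment by segment (that alignment is an assumption, and so are all the function names). Words that appear on only one side of an aligned pair are paired up as synonym candidates:

```python
import re
from collections import Counter


def tokens(text: str) -> set[str]:
    """Lowercased set of words in a segment."""
    return set(re.findall(r"\w+", text.lower()))


def synonym_candidates(aligned_pairs, min_count=1):
    """Pair up words that occur on only one side of an aligned segment pair."""
    counts = Counter()
    for seg_a, seg_b in aligned_pairs:
        a, b = tokens(seg_a), tokens(seg_b)
        for word_a in a - b:
            for word_b in b - a:
                counts[(word_a, word_b)] += 1
    # Raising min_count keeps only pairs seen in several segments.
    return {pair for pair, n in counts.items() if n >= min_count}


def expand_query(word: str, candidates) -> set[str]:
    """Return the word plus every candidate synonym paired with it."""
    expanded = {word}
    for a, b in candidates:
        if word == a:
            expanded.add(b)
        elif word == b:
            expanded.add(a)
    return expanded


pairs = [("delight is the root of suffering",
          "relishing is the root of suffering")]
print(expand_query("delight", synonym_candidates(pairs)))
```

With real texts the differing words in a segment will rarely be one-to-one, so the candidate list would be noisy and a higher `min_count` (or human review) would probably be needed.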

sujato · August 23, 2018, 12:47am

If you can do that, great! But I suspect it’s not going to be trivial. If our different translations were all segmented it would make it a lot easier, but alas that is not the case.

Mat · August 23, 2018, 5:39am

There is a button for ‘dictionaries’ already (which is helpful) but what I really would like to see is a similar button for ‘suttas (only)’ as in most of my searches I just need to filter out non-EBT material including Dictionary entries.

richard.nagyfi · August 23, 2018, 5:49pm

I will try to write some proof-of-concept code to see if it’s possible. Is there a way I can find the Suttas with the most translations? Thank you.

sabbamitta · August 24, 2018, 6:36am

Maybe the metta sutta (Snp 1.8 and Kp 9)?

richard.nagyfi · August 24, 2018, 11:03am

thanks!

ERose · August 24, 2018, 12:35pm

The abundant diversity of translations, and your comments about perspectives-shifts from different languages, remind me what a beautiful useful persistently cultivated gift is SuttaCentral.net. As are the Suttas, and the Triple Gem!

=D May these be of benefit for many for a long time.

richard.nagyfi · August 28, 2018, 3:22pm

I’ve started comparing 3 versions of the Metta Sutta, but it’s much more complicated than I thought. My initial belief was that at least the number of paragraphs would be consistent, or that the dialogues would be similar, with the Buddha asking and the Bhikkhus replying, providing points of reference for comparison. Yet they are so different that it’s hard even for a human to find which sentence is which in the other translations.

https://suttacentral.net/iti27/en/ireland
https://suttacentral.net/an8.1/en/bodhi#4--7
https://suttacentral.net/an8.1/en/sujato#4--7

Still, it might be possible to extract “synonyms” from these texts, not by going through sentences, but by finding words that are unique to each translation. Although this approach would still not tell us which word means what, the words could still be assigned as possible search keywords for the other translations.
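A sketch of that document-level idea, needing no sentence alignment. The function names and the dictionary layout are assumptions, and the sample strings are the two renderings quoted earlier in this thread, not the Metta Sutta texts linked above:

```python
import re


def words(text: str) -> set[str]:
    """Lowercased vocabulary of one translation."""
    return set(re.findall(r"\w+", text.lower()))


def unique_words(translations: dict[str, str]) -> dict[str, set[str]]:
    """For each translation, the words found in no other translation."""
    vocab = {name: words(text) for name, text in translations.items()}
    unique = {}
    for name, own in vocab.items():
        others = set().union(*(v for n, v in vocab.items() if n != name))
        unique[name] = own - others
    return unique


def extra_keywords(translations: dict[str, str]) -> dict[str, set[str]]:
    """Give each translation the unique words of all the other translations."""
    unique = unique_words(translations)
    return {
        name: set().union(*(u for n, u in unique.items() if n != name))
        for name in translations
    }


sample = {
    "translation_a": "delight is the root of suffering",
    "translation_b": "relishing is the root of suffering",
}
print(extra_keywords(sample))
```

As noted above, this does not say which word corresponds to which. It only lets a search for a word from one translation also reach the others.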

karl_lew · August 28, 2018, 4:44pm

Welcome to my voice-assisted nightmare. Good thing I have short hair. It does not pull so easily.

anon61506839 · September 20, 2018, 4:32pm

Many thanks for the new enhanced SuttaCentral. Appreciation to Ven. @sujato & Ven. @Vimala as well as other geeks and contributors behind the scenes. It’s awesome!

I have some suggestions regarding ‘search’. Others have already mentioned “filters” by nikayas; I’d add filters by other sections of the Canon too (Vinaya: Patimokkha, Mahavagga, Cullavagga, etc.) and likewise for the seven Abhidhamma books. I don’t know how demanding the implementation of this is, but I presume if it can be applied for the Nikayas, then it would probably be easy to just extend the function to other parts of the Canon?

Then I would also suggest filtering by language (Pali, English, etc.).

The most important suggestion I have, and one which is already affecting my experience, is this: when sort of data-mining for recurrent words, like “dhamma” or “citta” etc., and you get over a thousand listings, you’re bound to stop your review at one point and come back to it at a later time. By then you might need to restart/reload the search page, and then it becomes very difficult to get back to where you stopped last time (even if you remember where you stopped!), and you end up scrolling up and down trying to catch the spot. Also as you keep scrolling down and the page keeps loading more data, it often eventually crashes (possibly because of my old device, but the data gets too much in a single page!).

The solution to this problem, I suppose, is to have search results displayed by pages rather than by scrolling down (like in google search for example); then it will have become easy to return to the page where one has last stopped. It would be great to have this feature at least as optional.
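For illustration only, a minimal sketch of page-based slicing. The `paginate` name and the default page size of 20 are assumptions. The point is that a page number is a stable position a user can note down and return to:

```python
def paginate(results, page: int, page_size: int = 20):
    """Return (items on the requested 1-based page, total number of pages)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    # Ceiling division without importing math; at least one (possibly empty) page.
    total_pages = max(1, -(-len(results) // page_size))
    start = (page - 1) * page_size
    return results[start:start + page_size], total_pages


hits = [f"hit{i}" for i in range(45)]
items, total = paginate(hits, page=3)
print(len(items), total)
```

Each page only needs its own slice of results, which also keeps the amount of data in a single page bounded.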

Another thing I noticed is that, at present, search results follow no discernible order. Listings appear from different nikayas and from Vinaya texts scrambled together, as opposed to from one nikaya at a time, then from the vinaya, etc. (Is this your experience too?!). It would be desirable and helpful to have them displayed in order, I believe. But again I don’t know how demanding the implementation of that may be.
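One way such an ordering could be sketched, assuming each result carries an ID like the ones in the URLs earlier in this thread (an8.1, iti27). The collection order, the ID pattern and the function name are assumptions and not how SuttaCentral actually ranks:

```python
import re

# Assumed display order; collections not listed here sort after these.
COLLECTION_ORDER = ["dn", "mn", "sn", "an"]


def sort_key(uid: str):
    """Sort by collection first, then numerically by the numbers in the ID."""
    match = re.match(r"([a-z]+)(\d+)(?:\.(\d+))?", uid)
    if match is None:
        # IDs that do not fit the simple pattern go last, alphabetically.
        return (len(COLLECTION_ORDER) + 1, uid, 0, 0)
    prefix, number, sub = match.groups()
    if prefix in COLLECTION_ORDER:
        rank = COLLECTION_ORDER.index(prefix)
    else:
        rank = len(COLLECTION_ORDER)
    return (rank, prefix, int(number), int(sub or 0))


ids = ["sn12.2", "an8.1", "mn1", "dn16", "iti27", "mn10"]
print(sorted(ids, key=sort_key))
```

Comparing the numbers as integers keeps mn10 after mn1 rather than before mn2, which plain string sorting would get wrong.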

Soon we will be able to say goodbye to “CTRL-f” on pdf files! Many thanks, great work, and best of luck in the future.

sujato · September 20, 2018, 11:25pm

Hey, thanks!

Sure.

This is less useful, as in the majority of cases words are unique to a language anyway. Currently you can filter by root texts and translations.

Honestly, I think this is pretty much abusing the concept of searching on a website. If you want to do serious data-mining of texts, you’d be far better off using a local text editor or other utility and a set of plain text files. We will be keeping the infinite loading.

Yes, this is one of the problems we want to fix.