In other words, when there is more than one result, in what order should they appear?
Currently, when a keyword is found in a text title, the result is displayed as a suttaplex card. On mobile these cards are all grouped at the top of the list; on desktop they appear in the right-hand column, separate from results where the word is found in the body of the text.
Here are some ideas gathered so far:
If keyword is in title (currently used on site)
Length of document
Frequency of keyword in document
Using beginner/intermediate/advanced tags
Number of footnotes/annotations
Number of translations of a document (because more translations might indicate importance)
The method on SC-Voice: The relevance score is simply the sum of the number of matches plus the fraction of matching segments.
DN → MN → SN → AN → Kp → Dhp → … → Vinaya → Abhidhamma?
You had asked in the other thread why AN results appear first. It could be an alphabetical thing. It could also be that the TF-IDF ranker tends to put shorter documents ahead of longer ones. In retrospect, that’s probably not the right approach for us. We probably want to encourage people to read longer suttas, right?
I limited a search to in:sn and it sorted them “alphabetically” in the sense of sn22.111 coming before sn22.99 etc.
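To illustrate the sn22.111 vs. sn22.99 ordering, here is a minimal Python sketch of a "natural" sort key that compares the numeric parts of an ID as numbers rather than as text. The function name `sutta_sort_key` and the sample IDs are only for illustration; this is not how SuttaCentral currently sorts.

```python
import re


def sutta_sort_key(uid: str) -> list:
    """Split an ID such as 'sn22.111' into alternating text and integer parts.

    re.split with a capturing group always yields text parts at even
    positions and digit runs at odd positions, so two keys never compare
    a string against an integer.
    """
    parts = re.split(r"(\d+)", uid.lower())
    return [int(p) if p.isdigit() else p for p in parts]


ids = ["sn22.99", "sn22.111", "sn22.9"]

# Plain string sorting compares character by character, so "111" lands before "9".
plain_order = sorted(ids)

# Natural sorting compares 9 < 99 < 111 as numbers.
natural_order = sorted(ids, key=sutta_sort_key)
```

With a key like this, results that are otherwise tied would at least come out in the order people expect when browsing a nikaya.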
See, that’s where I start to feel shaky. I guess all things being equal that’s not necessarily bad. But it doesn’t increase the chances of the sutta I’m looking for being closer to the top.
I think I tend to use the search to find a sutta I already know exists. Which is not the only way! If someone is just looking for a text about consciousness, then yeah, maybe a longer sutta would be more appropriate? But I’m really not sure. A very long sutta could mention consciousness in passing and not offer much more than a short sutta would. And if we went by sutta length, then we might as well just rank DN and MN first.
As far as segmented suttas go, it might be valuable to prioritize suttas where the keyword was found in a longer segment. That would downgrade all of the results where the only excerpt is “…feeling” etc.
However, that breaks down with the non-segmented texts. They seem to treat the whole paragraph as a segment regardless.
Perhaps the number of times a keyword appears in the sutta. A sutta that had the word “consciousness” 10 times might be more relevant than if it only mentioned it once or twice.
I’m trying to think of metrics we have about suttas that could even be used. We have the beginner, intermediate, advanced quality. But should beginner or advanced be prioritized? The vast majority of suttas don’t have that quality assigned, so maybe simply having one of those means that a human has decided it is of special importance.
We also know how many translations exist for any given sutta. In theory the more translations exist for a sutta the more important it is.
We know how many footnotes each sutta has. Should a highly commented on sutta rank higher?
In Voice, we use a relevance score for ranking search results.
Search results are sorted by relevance. The relevance score is simply the sum of the number of matches plus the fraction of matching segments. Suttas densely packed with search terms have highest relevance.
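As a rough illustration of that description, here is a minimal Python sketch of such a score. It assumes that "number of matches" means the total number of occurrences of the search term and that a sutta is given as a list of segment strings; Voice's actual code may count matches differently (for example, per matching segment), and the function name `relevance_score` is made up for this example.

```python
def relevance_score(segments: list[str], term: str) -> float:
    """Number of matches plus the fraction of segments that contain a match.

    Assumption: a "match" is one case-insensitive occurrence of the term.
    """
    if not segments:
        return 0.0
    needle = term.lower()
    match_count = 0
    matching_segments = 0
    for text in segments:
        occurrences = text.lower().count(needle)
        match_count += occurrences
        if occurrences:
            matching_segments += 1
    return match_count + matching_segments / len(segments)


# Hypothetical three-segment text: 3 occurrences, in 2 of the 3 segments.
example = [
    "Consciousness arises dependent on conditions.",
    "Feeling is impermanent.",
    "Consciousness is impermanent; consciousness is not self.",
]
score = relevance_score(example, "consciousness")
```

The integer part rewards the raw number of hits, while the fractional part acts as a tie-breaker in favour of texts where the term is spread across a larger share of the segments.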
This was done precisely to avoid AN results always being displayed first; by default, the software would use alphabetical order. And probably any software does, if not told otherwise.
BUT … by default Voice only searches in
Pali
segmented
from Mahasangiti manuscript, or basically, texts that have a translation by Bhante Sujato
suttas, no vinaya, no abhidhamma
So many of the problems SC has to solve do not apply here.
This is what I would have expected to be the primary weighting.
Then, if there are two or more suttas with the same score, I think your suggestion about the number of translations is reasonable.
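A minimal Python sketch of that idea: sort by relevance score first and fall back to the number of translations only when scores are tied. The `Result` fields and the sample numbers are hypothetical, not real SuttaCentral data.

```python
from typing import NamedTuple


class Result(NamedTuple):
    uid: str
    score: float            # primary relevance score (hypothetical values)
    translation_count: int  # number of available translations (hypothetical values)


def rank(results: list[Result]) -> list[Result]:
    """Highest score first; among equal scores, more translations first."""
    return sorted(results, key=lambda r: (-r.score, -r.translation_count))


results = [
    Result("an1.1", 2.5, 3),
    Result("mn9", 2.5, 12),
    Result("sn12.2", 4.0, 5),
]
ranked_uids = [r.uid for r in rank(results)]
```

Because the translation count is only the second element of the sort key, it never overrides a genuinely higher relevance score.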
However, if I’m looking for a sutta with an elephant in it, then I might not be looking for the most densely “elephanted” sutta, as opposed to if I’m looking for a sutta on consciousness. In this instance the title and the summary of the sutta would be useful in the weighting… which takes us back to the other thread on search criteria/filters. What about in:title and in:summary? Either way, these criteria seem important.
We currently have title:elephant. It’s not very flexible because it only takes a single keyword. I had this idea about how it could be implemented differently, but @Khemarato.bhikkhu has given me doubts about it.
However, the title is already given priority, just not in a way that I really like:
It puts them in the right column as SuttaPlex cards. I don’t like it for many reasons. It separates them out, so now there are two separate rankings. And being separate it’s easy to miss one or the other. It’s also not always obvious why a result is being given if one author puts that specific word in the title and another one doesn’t (see example below). And in the example above, the second title-first result has nothing at all to do with elephants! And with the suttaplex cards we get no excerpt. Just because a keyword is in a title doesn’t mean I don’t want to see the context of the word in the sutta itself.
Here is what the first result of title:elephant looks like:
I don’t know if I would always want to see DN first, but all other things being equal, perhaps it does make sense to show EBTs before everything else. Which in general would be what you are proposing I think? I just wouldn’t necessarily want the EBT results to be sorted that way.
I don’t know if I like that ranking for the nikayas.
SN is the original categorised search!
I don’t even think we should give a weighted ranking to the different Nikayas. I agree that Sutta > Vinaya > Abhidhamma within the context of this site.
I agree. I was a bit confused about having suttaplex cards on the right, but you can get used to anything!
Within the context of the search I would want to see the title, the translator (why is this called author?) and then the chunk of text where my search term is showing, with the term highlighted. I would also find it useful to know how many times that term appears in that particular text. (This is something that thebuddhaswords.net’s very basic search does; maybe I have just become accustomed to it.)
Interesting idea. For segmented texts I believe you are shown up to three segments that have the term. So if you know that’s how it works, then you already get some indication (1, 2, or 3/3+).
I wonder if segmented translations should be given weight? That would mean, for now, de facto showing Bhante Sujato’s and Bhante Brahmali’s first.
Yeah, I can’t make a strong argument. And of course there is nothing inherently better about the translation itself.
But in terms of site functionality, they show better excerpts in the results. Kind of.
And they will take you to a text that can be viewed side by side with the Pali, which in general I would say is better. But of course that won’t matter to everyone. Also, those are the only texts that have translator notes. Basically the segmented texts give you the most complete SuttaCentral experience.
The segmented texts tend to be newer as well (although I just now realized that @cdpatton’s translations are legacy texts). And potentially, although not necessarily, they have more uniform translations, since they were done using Bilara, which promotes that.
At this point I’m just kind of throwing out any possible way we might be able to rank them that might in some way be useful.
I think the biggest problem is that there are simply so many that it seems almost any ranking may be better than no ranking at all … :TRIPLE_SIGH:
That’s a great survey. Thank you ven @Snowbird for bringing it up. And thank you @HongDa for improvements.
It might be that my suggestions about filters are more on the grouping and sorting side of the already produced search results. But these are also a kind of filter, after all.
Here is an example with Kuṭhār search results on Suttacentral.net (partial match)
#1 IMO this grouping, like I did on dhamma.gift, looks more user friendly and helps when working with search results. The output on SuttaCentral is fine for 13 texts, but if there are 50+ texts it will become almost unmanageable to decide which texts might help more than others.
#2 The other big thing for the user is results aggregated by words. That can be a very important feature for the partial match search.
#3 is really minor but can be crucial for the user. Showing variants of the word like
In the part “Variants for Kuṭhār”
PS: I gave test links for fdg. If you see some tech info or errors, please don’t mind. I just wanted to show output that might improve the search result visualization for users on Sc.net.
I created this issue previously that kind of addresses your suggestion:
My personal feeling is that the Digital Pali Reader does such a good job at search for Pali that it’s not worth it to try and replicate it on SuttaCentral.
But if we do, I would like to see those partial match hits as well as breakdown by book.
Ideally I might like to see filter suggestions customized to the results (i.e. only show potential filters that would have some effect on the results)
That would avoid having all of the same book at the top of the results. But I’m not sure if that’s the best way to do it.
One thing that would be helpful is to have the default search order by available translations. English search terms like “Greed” will pull up texts which only have their titles translated. I’m sure this is helpful for some people who want to search in English but do their own translation in root texts, but if a major goal of the site / search is to help people read translated EBTs in a language they know, then the current functionality is a bit suboptimal.
If this is the root issue, why not just apply a simple discount, like multiplying by length (or some function of length, like log length)?
I think a part of a good benchmark might be that a search for right view returns MN9 at the top.
You don’t want really long texts which just briefly touch on a subject to rise to the top, but you also want longer texts which directly focus on a topic to be treated fairly.
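Here is a minimal Python sketch of the log-length idea, assuming the ranker already produces a base score that tends to favour short documents. The function name, the `log(1 + length)` form, and all of the numbers are assumptions for illustration only, not measurements from the site.

```python
import math


def length_adjusted(base_score: float, length_in_words: int) -> float:
    """Multiply the base score by a slowly growing function of length.

    Assumption: log(1 + length) is used so that a very long text gets only
    a modest boost and an empty text gets zero.
    """
    return base_score * math.log(1 + length_in_words)


# Hypothetical candidates: (label, base score, length in words).
candidates = [
    ("short, focused", 0.50, 50),
    ("long, focused", 0.40, 5000),
    ("very long, passing mention", 0.05, 20000),
]

adjusted = sorted(
    candidates,
    key=lambda c: length_adjusted(c[1], c[2]),
    reverse=True,
)
adjusted_labels = [label for label, _, _ in adjusted]
```

With these made-up numbers the long, focused text moves ahead of the short one, while the very long text with only a passing mention stays at the bottom, because the logarithm grows too slowly to make up for a weak base score. Whether that holds on real data would need checking against benchmark queries like the "right view" one.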
I’m not sure I quite understand. Could you say this in a different way?
Currently, title matches are shown as suttaplex cards in the right column on desktop and at the top of the page on mobile.
When viewed on mobile, common terms like this create the illusion that only items with title matches are returned.
That’s a very good point. Currently it seems to be third in the right column on desktop. It could automatically be brought to the top in this case if
we sorted by longest sutta first
or we sorted by “recommended suttas first”
To me, the second option seems more reliable.
Greed has a different problem. The first ten title results are all for the abbreviated/repetition series at the end of AN chapters. I don’t want to disparage any sutta ever, but I doubt those are the suttas people want most.
Ah, yes I guess I just misunderstood what I was seeing.
This may not belong in this thread, but I do find the cards’ presence / presentation in search somewhat odd. And you can get it where they appear for texts with no translations, e.g. if I search “seven buddhas” (in English) I get “suttaplex cards” for two untranslated Chinese texts.
or we sorted by “recommended suttas first”
To me, the second option seems more reliable.
I don’t know exactly what you mean in terms of an underlying implementation, but I would worry something like a “recommended” flag could lead to unintended consequences. For example, there’s a lot of AN entries which are much more relevant for the search term “harsh speech” (which occurs once in MN9’s Bodhi translation). You wouldn’t want “good” but tangential suttas being sorted above more directly relevant suttas.
I agree it is confusing. I’m happy to get any and all feedback.
I also find it odd.
I think the idea is that translated texts will appear first (although I’m not 100% sure this is happening everywhere and in all ways). In this case there were no translated suttas that have the words in the title.
Also a very good point! Currently there are only a limited number of suttas that have been given any kind of recommendation status. And as you point out it is for the whole sutta. Perhaps if more suttas could be given this recommendation status it would mitigate the problem.