I figured it could be repurposed. I mean I try my best to be a good Buddhist and as such I’m all down with lists, but I think after a point very long lists just get quite hard to read. Don’t worry though, Ang. Sabbamitta, I believe, will directly add anything very important to the current release plan. And I will take responsibility for shifting over new additions to later RPs. If it doesn’t work or seems too scattered, or we discover that there actually aren’t as many mispronunciation finds as we expected, we can always restore things as they were.
I’ve been thinking about this one a bit and I have to admit, I’m not 100% convinced, or at the very least feel there are other equally legitimate ways to consider things. Also, I still don’t quite see how this answers the original question of why a good search term might be turned down on the basis that it produces too many results when SCV has an inbuilt mechanism for narrowing down returned results to 3 and 5.
On the principle itself, I can think of at least a couple of counterpoints to the idea that it is preferable to only use search terms that generate one or two results, but I’ll just pause briefly on one: it goes rather against what, in my eyes at least, is quite an important part of getting a sense of the suttas. That is, it is the parts we find repeated over and over again that we can be most sure the Buddha actually did teach and that are central to his teaching. Excluding them from the examples feature could well give quite a skewed representation of the Buddha’s teaching.
Nevertheless, I also want to take the opportunity to set my reservations aside and just think about how I can most constructively approach things if I were to adopt your point of view (incidentally, I do, of course, see great merit in using the examples feature to help guide users with respect to the vocabulary found in the translations; I most naturally just want to question and explore any assumptions, my own most certainly included, fed into the app). When I do, I feel that maybe a tweak in method would be supportive of the given goal.
It is two separate things to (1) ask people what things they would ask the Buddha about and (2) highlight the specific language a translator has used to render his precise teaching. If the latter is taken as the primary interest over promoting a more “adventurous happenchance exploration” (which, by the by, I think can be as beneficial and beautiful as any other way) then I think we should be asking grep as much as we should be asking people. People may supply general themes, but it is grep that will give relevant search phrases.
Eg. taking up the theme of joy, grep already gives us our short(ish) list (of course, the exact search I ran can surely be fiddled with). By just a glance it’s pretty easy to weed out unsuitable phrases and select a few good items to add to the examples list:
A joy he is
and a joy he seems
and consists of joy and bliss
and find joy connected with
and finds joy connected with
and finds no joy connected with
and joy that ethical
are full of joy and
a source of joy and happiness
become filled with joy and happiness
be filled with joy and happiness
being full of joy in the
bring happiness and joy to themselves
bringing joy to the
bring pleasure and joy to yourself
But the joy and happiness
committed to the joy of
committed to the joy of solitude
do you experience joy in the
earlier ending of joy and aversion
ever feeling such joy and happiness
find joy connected with
finds joy in the
finds no joy in the
gives rise to joy and samadhi
hard to find joy in it
in rapture and joy because of
I say that joy connected with
I say that joy has a
Is internal joy with the
is unswerving finds joy in the
joy and clarity
joy and happiness
Joy and Happiness
joy in the
Joy is a
Joy is the
joy springs up
Joy springs up
me with a joy I never
one who lacks joy has destroyed
only natural that joy springs up
refers to the joy a parent
re full of joy and
re full of joy and happiness
Seeing the joy of those
speaking of such joy and
speak of such joy and happiness
that joy and happiness
them and take joy in them
Then joy and happiness
the rapture and joy that
the rapture and joy that faith
the rapture and joy that wisdom
they find joy in
they find joy in the
When you feel joy you need
which consists of joy and bliss
who has fulfilled joy has fulfilled
with joy in their
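For anyone who wants to try something similar without grep, here is a minimal Python sketch of the same idea: pull out each occurrence of a keyword together with a few neighbouring words, then de-duplicate and sort. It is not the exact search that produced the list above; the function name `phrases_around` and the window sizes are assumptions made for illustration only.

```python
import re

def phrases_around(text, keyword, before=3, after=2):
    """Return sorted, unique snippets made of up to `before` words,
    the keyword itself, and up to `after` following words.

    Hypothetical helper: the window sizes are illustrative guesses,
    not the settings used for the list in this thread.
    """
    word = r"[A-Za-z]+"
    pattern = re.compile(
        rf"(?:{word} ){{0,{before}}}\b{re.escape(keyword)}\b(?: {word}){{0,{after}}}",
        re.IGNORECASE,
    )
    # A set removes exact duplicates; sorting makes the list easy to scan.
    return sorted({m.group(0) for m in pattern.finditer(text)})

sample = "They're full of joy and happiness. Joy springs up."
snippets = phrases_around(sample, "joy")
```

Because only letters count as word characters here, a contraction such as "They're" is split at the apostrophe, which is one plausible reason for entries like "re full of joy and" in a list of this kind.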
Trying out this way, my subsequent nominations (that still conform to the desired not-too-many-results principle) would be:
“joy and bliss” (6)
“joy that ethical conduct brings” (1)
“being full of joy” (7)
“hard to find joy” (2)
“internal joy” (1)
“one who lacks joy” (5)
“May I feel joy” (2 - derived from the blurb result “When you feel joy”)
Absolutely. And since our users will have different perspectives all we really need to do is be clear on our own individual objectives as we add search terms. I merely provided my own perspective since that’s how I use SCV.
Ah. Right. I sort of left out the back story.
If a team of humans found 100 suttas for a search term, then there would be a lot of disagreement as to which “top 5” should be shown to the user. If a team of humans had only 5 suttas, then there would be no disagreement since all 5 should be returned. The longer the list, the more unfair SCV becomes in its recommendations. SCV does sort by relevance but there will be ties. And when there are ties, SCV sorts alphabetically. That would bias the results towards AN and completely hide the SN results. A shorter result set is therefore truer to the Dhamma: it’s what the user would have found personally in searching the suttas by hand. As the list gets longer, what we show the user starts behaving more like a rigged lottery and that didn’t feel proper to me.
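To make the tie-breaking effect concrete, here is a small Python sketch. It is only an illustration of "sort by relevance, then alphabetically", not SCV's actual code; the `Hit` type, the scores, and the sutta IDs are all made-up placeholders.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    sutta_id: str  # placeholder IDs, not real search results
    score: float   # stand-in relevance score; SCV's real scoring is not shown here

def top_n(hits, n=5):
    # Highest score first; equal scores fall back to alphabetical order of the ID.
    return sorted(hits, key=lambda h: (-h.score, h.sutta_id))[:n]

# Ten equally relevant hits: five from SN and five from AN.
hits = [Hit(f"sn{i}.1", 1.0) for i in range(1, 6)]
hits += [Hit(f"an{i}.1", 1.0) for i in range(1, 6)]

top = top_n(hits)
# Every slot goes to AN, because "an" sorts before "sn" when scores tie.
```

With only five hits in total there is nothing to cut, so the alphabetical tie-break cannot hide anything.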
There is also the technical reason that long search results take longer to process and result in a sluggish response. They take longer because each sutta found by grep is analysed individually to compute relevance and that takes CPU power. A 100 sutta result is about 20x more costly than a 5 sutta result. Yes, AWS charges for CPU usage. The CPU cost is zero right now, but as we scale on usage it won’t be.
This is a great example. Searching for “joy” returns 84 suttas (you can set maxResults to 100 in the URL to see this). You’ll notice the slowness caused by a larger search list. Compare with “root of suffering”, which is quick.
We can speed up the search for “joy” simply by using a different search term. For example, “find joy” returns 5 suttas quickly. And you have found 7 other search terms that are equally zippy.
One last consideration is that multiple examples can introduce bias. For example, adding 7 joy examples will bias search results towards joy vs. the other search terms. Indeed, joy will show up 7x more frequently than another example such as “root of suffering”. We’d literally be promoting the search for joy over the search to end suffering. That seems somehow…improper?
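The 7x figure can be checked with a few lines of Python. This sketch assumes an example is picked uniformly at random from the list, which is an assumption about how the examples feature chooses, and the two-theme list is a cut-down illustration rather than the real examples list.

```python
from collections import Counter
from fractions import Fraction

def theme_share(examples):
    """Chance that a uniformly random example belongs to each theme."""
    counts = Counter(theme for theme, _phrase in examples)
    total = len(examples)
    return {theme: Fraction(n, total) for theme, n in counts.items()}

# The seven nominated joy phrases plus one existing example.
joy_phrases = [
    "joy and bliss",
    "joy that ethical conduct brings",
    "being full of joy",
    "hard to find joy",
    "internal joy",
    "one who lacks joy",
    "May I feel joy",
]
examples = [("joy", p) for p in joy_phrases] + [("suffering", "root of suffering")]

share = theme_share(examples)
ratio = share["joy"] / share["suffering"]  # how much more often joy is suggested
```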
Right, apologies, I misunderstood you to mean that the way you set out was the way it had to be done universally, which is why I wanted to pause over the point, and just check if it was the best principle to run with.
Thanks so much for that, now I understand the point and I must say how much I love your attention and care for details like this. I agree, it is important, but I also think it’s important to get the right balance of factors with respect to any given point, and here, in terms of the actual function the examples feature is designed to serve, I’d feel that the process (top five filtered by highest hit rate, and then alphabetically) is pretty acceptable for a feature that is meant to help people as a first stepping stone or with a random suggestion. But one very high priority factor is this…
I think for various reasons we’re all on the same page about wanting to restrict the number of search results (as mentioned above, I think a shorter list can reduce overwhelm). The area of question for me is between, say, 5 and 15, not 5 and 100s, but of course, whatever the case, the general point is key: more results = greater cost.
Just out of curiosity, the maximum number of results SCV gives is 25, but does e.g. “joy” deliver an 84-result CPU hit or a 25-result CPU hit?
Yes, using grep; I just pre-grepped locally.
My point here was that, if we want to target meaningful phrases found in the suttas (around themes suggested by whomever) we may as well let the texts tell us what those exact phrases are.
Well, the tongue-in-cheek retort is that actually, it is correcting an existing bias in the list towards the negative!!!
The serious reply is that, yes absolutely, of course we want to keep things as level as possible (within pragmatic reason). This was just an example of a method. I’m very happy to do this process again for other major themes, if you or others would like to suggest them. At any rate, I’d really like to see that list get a little fatter.
Looking for more ecstasy one finds a drug dealer.
Looking for the end of suffering one finds the Buddha.
The search for “joy” consumes roughly 84/7 (about 12 times) as much CPU as the search for “root of suffering”, even though both display only 5 results, because SCV still has to process all 84 suttas to find the top five.
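As a rough model of why the cost follows the number of matches rather than the number of results shown, here is a Python sketch. It is not SCV's implementation: the scoring function is a dummy, the IDs are placeholders, and the figure of 7 matches for “root of suffering” is inferred from the 84/7 ratio above.

```python
def top_results(matches, score, max_results=5):
    """Score every match, then keep only the best `max_results`.

    Returns the displayed results and the number of scoring calls,
    which stands in for CPU work in this toy model.
    """
    calls = 0
    scored = []
    for m in matches:
        scored.append((score(m), m))  # each matched sutta is analysed individually
        calls += 1
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [m for _score, m in scored[:max_results]], calls

# Placeholder match lists; `len` is a dummy relevance score.
joy_shown, joy_calls = top_results([f"sutta{i}" for i in range(84)], len)
root_shown, root_calls = top_results([f"sutta{i}" for i in range(7)], len)

cost_ratio = joy_calls / root_calls  # 84 / 7
```

Both searches display five results, yet the first one did twelve times the scoring work.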
Here is the URL you typed. I initially got 5 results as well, but when pressing CTRL-SHIFT-R, I got a clean slate and 84 results. The browser works insanely hard to cache stuff and not go to the internet. CTRL-SHIFT-R is the “do as I say really!” command that avoids the cache.
That produced a list which I then enjoyed manually checking for search results numbers and listening to excerpts from in order to produce the suggested list.
Now, particularly with some of the excellent points about bias and proper representation Karl has raised, I have had to make sure I could account for my process. With respect to method, I am a keen fan of both the methodical and the whimsical (balanced according to the need of context) and think that here they complement each other nicely.
In turn, I figured that, if we accept the above joy list (which was extracted from an original list of 60), then the number of extracted terms with respect to “faith” should be proportional (with an original list of 141, I make it a final list of 16). However, I knocked off a few corresponding to a visual estimate of what is, to all intents and purposes, duplication (by virtue of e.g. capitalisation, or variation in plural, article and such).
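The proportional figure can be reproduced in Python as below. The function name is hypothetical, and the 7 is taken to be the number of joy terms nominated earlier in the thread, which is an assumption about what “the above joy list” refers to.

```python
def proportional_quota(selected, source_total, new_total):
    """Scale a selection count from one candidate list to another of a different size."""
    return round(selected * new_total / source_total)

# 7 joy terms chosen from 60 candidates, scaled to 141 "faith" candidates.
faith_quota = proportional_quota(7, 60, 141)  # 7 * 141 / 60 = 16.45, rounded to 16
```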
After that, whimsicality, tempered by a not >10 rule, completed the selection as I just went through them (not even from top to bottom, but bit top, bit bottom, bit middle), and to my own tastes this step has a tolerable degree of bias (again, thinking of the actual purpose of the given function, I think the degree of distortion is 1) negligible, 2) no greater (or possibly less) than by other methods, 3) not worth getting too tangled up about).
Lastly, on the results numbers, I figure with some higher numbers and some very low the overall CPU damage should on average be acceptable.
Grep is one of my happiest discoveries of last year:
Like really, I think it’s a bit questionable how much I love grep (in very brief, it’s a powerful search tool that you can use through Terminal, the scary command line thing one can use (and I have used) to break one’s computer).
The “unsegmented” text folder simply means that I’m still on another planet despite having had to have a nap! I meant “segmented” and was referring to the folder containing Bhante’s segmented files (I imagine you may know it from your work with the Vinaya - and yes, I know Vinaya results will actually be in the list too, but as far as I’m concerned it doesn’t matter given the rest of the process)
Oh and also, I think mixed methods is brilliant, so your way rocks!
Concerning your discussion on what are the important criteria for search phrases, I think that from whichever angle people approach the suttas, they will always end up with the way to the end of suffering. Simply because this is what the Buddha taught; he didn’t teach anything else. The whole mass of suttas are all an elaboration on the four noble truths, aren’t they?
Whether people are asking a question directly associated with their suffering, or they look for something inspiring, or whether they are just curious about how a certain concept developed over time, like this, their search will always lead them to the only thing the Buddha ever spoke about: suffering and the ending of suffering!