Google search for SuttaCentral

User @faujidoc1 has made a Google search for SuttaCentral.

I’ve had it in the back of my mind to add a Google search box to SC, and have been wondering what it would take. Obviously the recent search outages make this more important.

For such a long time we have gotten used to the idea that search is something simple, a solved problem, and we forget that Google worked out a massive set of technical issues and by doing so became one of the world’s biggest companies. Nothing about search is easy; as Chade Meng-Tan, one of the early Google engineers and an SC sponsor, told me, “search is more dark arts than science”.

Anyway, we have a long-term commitment to improving the search on SC, but we may add a third-party search as well. The obvious candidate is Google.

Pros:

  • Simple
  • Fast
  • It’s Google, you get to leverage Google’s ancient hoodoo.
  • Google site search is used on lots of major sites, eg. Guardian, Boing Boing.
  • Resilience: there’s a backup search if SC’s search fails
  • Every approach to search will give somewhat different results.

Cons:

  • You have to pay or else you have ads
  • Risk of creepy tracking
  • Being in thrall to dark overlords
  • Lack of customization of results
  • Can’t guarantee indexing of all texts
  • Interface is optimized for the wrong things, eg. “date”

Quality of results: mixed

Here are a few sample searches side by side.

Dhamma: a common Pali word

SC has a dictionary result, which is great. Both give relevant results, but Google is better at making core suttas prominent. Also, Google’s results are more focused on the EBTs, while SC’s are more random. We don’t weight results according to past searches, which would explain this.

winner: Google (narrowly)

Atammayata: a rare Pali word

SC gives some relevant results, although it doesn’t give them all, and misses the main canonical reference in MN. Google doesn’t get it at all.

winner: SuttaCentral!

Savatthi: a place name

SC gives a dictionary result and a map. Awesome! Both give relevant results.

winner: SuttaCentral

Cat: a moderately common English word

Both give relevant results, with similar content near the top. Google gives multiple results for dictionary entries that include the word cat. We don’t give dictionary results for English words.

winner: Google


So the outcome is that both approaches have their uses. Of course, using Google site search is always possible for a user, so the question is whether it’s worth exposing it more prominently.
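For reference, the user-side site search mentioned here is just an ordinary Google query using the `site:` operator. A minimal Python sketch, assuming the `suttacentral.net` domain; the helper name `site_search_url` is made up for illustration and is not part of any SC code:

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str = "suttacentral.net") -> str:
    """Build a Google search URL restricted to one site via the `site:` operator.

    Hypothetical helper for illustration; the default domain is an assumption.
    """
    params = {"q": f"site:{site} {query}"}
    # urlencode percent-encodes the colon and turns the space into "+"
    return "https://www.google.com/search?" + urlencode(params)


print(site_search_url("atammayata"))
```

A link like this could be offered next to the native results without embedding anything from Google on the page itself.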

Alternatives:

  • Use another service such as DuckDuckGo. (I’ve checked the results; they are similar to, and in some cases better than, Google’s.)
    • Can’t have results on page, you get redirected to DDG.
    • Which then exposes you to ads.
  • Expose Voice search on SC (only for a small subset of texts).

Anyway, not having used Google CSE, I’m interested to hear if anyone has any feedback, or if there are other alternatives.

13 Likes

Greetings Bhante :anjal:

I’m totally “voting” for adding Google search to SuttaCentral. It is an amazing alternative, and I find it most important to be able to find core suttas related to a particular word: for example, type “fear” and you’ve got pretty much all the suttas about it. I think the lack of such a Google or Google-like search is the last thing SC is missing to be the ultimate sutta platform. :smiley: Personally, I was even using other sites to search, and then typing the sutta into SC after I found it elsewhere, because many times I just could not find what I was looking for with SC search.

Also, if search engine creation is so difficult, why not use someone else’s work that is already done? :wink: At least by being on SuttaCentral, Google can make some good kamma to redeem itself from its Dark Overlord-like actions :smiley:

As to ads, I think most smart Internet users are using an ad blocker of some sort (especially on PCs/laptops), so the ads should not appear anyway. Even on phones there are browsers with ad blockers, so it shouldn’t be much of a problem, I guess :slight_smile: I’m not an expert, though.

I think it would help a lot of people to find the important suttas they’re looking for. And not everyone who uses SC is also on D&D, so not everyone has seen Faujidoc’s thread (great job Faujidoc btw! :wink: ). That’s why implementing it in SC as an additional, alternative search would be great. :slight_smile:

With Metta :heart: :anjal:

3 Likes

To be clear, we will not countenance for a second the possibility of ever having any ads for any purpose on SuttaCentral. Google offers a paid tier that has no ads, so we’d use that.

7 Likes

Edit:

:+1:t3: If it’s the paid tier with no ads, that’s great. We can all donate to cover the costs :pray:t3:

————
Ads… :nauseated_face:… personally I would not like to see them associated with SC in the long term.
I think the ads are embedded, @Invo, not pop-ups, so a blocker won’t work.

But until the search starts functioning properly, using Google might be the best alternative. :thinking:

Do we have any idea of the costs involved? If not too much I would be happy to make a contribution towards that. :anjal:

4 Likes

Another downside of Google is that it is blocked in some places. Notably China. Is that the case for DuckDuckGo?

3 Likes

How about YaCy?? It’s open source, but I’m not sure about the intricacies of how or if it would work for SC.

The feedback seems to be “nice idea, been around for ages, not sure how useful it is”

The thing is, we have a perfectly fine search engine, Elasticsearch, which is the most powerful and sophisticated software of its type. There’s no reason we can’t invest the time and effort into making it better, except of course that we don’t have any time because we are busy building other things!

What I’m looking for is a drop-in enhancement that doesn’t take programmers away from other things.

Oh, it also seems that they have a free, no-ad tier for non-profits, nice!

Well that’s very kind! It seems the paid version is $5 per thousand queries. I haven’t looked at the logs to see how many queries we get. Let’s look into the non-profit version first and see how it is.

Also, if the criterion is “simple paid service” then Algolia becomes an option. It’s $1/1000 queries.

3 Likes

Re: ads on the Google search result page. My understanding is that one has to sign up for AdSense to display ads on the custom search results page. You may have noticed that there are no ads at all on the Google search page I’ve created.
Of course, Google’s user agreement has you agree to ads, and perhaps they might place their own ads if there is heavy traffic… :thinking:
My thought in creating this customized search page was to solve an irritating problem (the frequent outages in the native Elasticsearch engine on SuttaCentral) as simply as possible.
The problem with using minor, unknown companies to provide your service is that they are minor and unknown for a very good reason… their product is still ‘promising’ (usually sucks) :rofl: And unless one is confident of being able to spot the next Google in the making, one is better off going with the already established player.

1 Like

It most certainly does have ads, and lots of them. Here’s the same search on CSE; on the left Firefox with adblocker, on the right, Chrome with no adblocker.

AdSense is so that you can make some money from the ads. Otherwise the Big G takes it all.

2 Likes

I have no idea how CBETA’s corpus search function (pictured) works, but it is something incredible. If SuttaCentral got something similar, that would be cool.

2 Likes

Look at that; I’ve been using Adblock on my devices for so long I forgot about its very existence! :laughing: :rofl: My bad! :blush: :pray:

In terms of number of searches, if it’s any help as an indicator, D&D search received 778 individual search requests in the last month.
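To put that number next to the per-query prices quoted earlier in the thread ($5 per thousand queries for the paid Google tier, $1 per thousand for Algolia), here is a rough back-of-the-envelope sketch in Python. It assumes simple pro-rata billing and uses the D&D figure only as a stand-in, since SC’s own query volume hasn’t been checked; `monthly_cost` is a made-up helper name.

```python
def monthly_cost(queries_per_month: int, usd_per_thousand: float) -> float:
    """Estimate a monthly bill, assuming plain pro-rata pricing (an assumption)."""
    return queries_per_month / 1000 * usd_per_thousand


# D&D search volume from this thread; SC's real volume is unknown.
d_and_d_queries = 778

# Rates as quoted in the thread, in USD per 1000 queries.
for name, rate in [("Google CSE paid tier", 5.0), ("Algolia", 1.0)]:
    print(f"{name}: ${monthly_cost(d_and_d_queries, rate):.2f} per month")
```

At that volume the estimate comes out at a few dollars a month at most, though actual billing rules (minimums, free quotas, block pricing) may differ.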

5 Likes

Yeah, I’ve got no idea what they’re doing with search. CBETA is mighty impressive these days, although remember it’s just one language. Search gets exponentially harder with multiple languages.

6 Likes

Could you create a configured Google search page outside of SuttaCentral that is linked to?

Available for those who wanted to use it but not integrated.

1 Like

Voice search results:

Winner? :wink:

4 Likes

Well that’s pretty much what @faujidoc1 has made. But we could put a link on the search results, sure.

Nice!

4 Likes

Here we go again.

3 Likes

We’re back, we made some upgrades, hopefully it will fix it.

3 Likes

Yes, working now - thanks!

2 Likes

:crossed_fingers: :crossed_fingers:

3 Likes