SuttaCentral: a new search

Today we have launched our new search engine on SuttaCentral. This has been many years in the making! There have been many previous threads about improving search, but progress has been stuck for technical reasons. We’re really happy with our new choice, so let me explain a little about what is happening.

This is a brand new and incomplete feature, and there will be bugs, so please be patient with us.

inelastic

When we launched SC with texts in 2012 we knew we’d have to provide full text search. At the time, the shiny new kid on the block was elasticsearch, built on the venerable Lucene library. We implemented this ourselves. I remember at one stage I had a discussion with a Buddhist programmer in England who did elasticsearch for corporations. I said we’d like to hire him, and he just laughed and said, you couldn’t afford it. It’s not easy, is the point. SuttaCentral’s friend Chade Meng Tan, who built the Chinese search for Google, said “search is more dark arts than science”.

Anyway, elastic is great! It’s fast, full-featured, free, and can expand to include huge datasets. It’s also terrible: it’s a major dependency that has to be updated, and the interface is completely different to everything else on the site. If you’ve got a team of engineers to maintain and adjust search for your e-commerce site, it’s a good choice. For us, there’s so much friction in changing or adjusting anything that we have simply left it alone for years.

arangoDB

In 2018 we adopted ArangoDB as our primary database. It’s awesome and our developers love it. While we were considering possible new search options, Arango went ahead and implemented a whole bunch of nice features which, taken together, make it a pretty fantastic search engine in itself.

So we decided to re-implement search using Arango. That means we can simply delete the search dependency and use what we already have, with the same methods that our engineers already use. Bye bye friction, hello easy tweaking and adjustments!

This work has been done by @HongDa — congratulations on your wonderful work!

what search can do

We implement a fairly rich range of features in the search.

  • type the sutta ID to get the card and results (mn123)
  • map results for locations (there seems to be a bug with this right now, only the map is showing!)
  • diacritical conversion
  • search in many languages
  • refine language selection via toolbar
  • show results from multiple dictionaries
  • use a range of filters to narrow results (see toolbar for list of filters)
    • one nice filter is in:ebt which lets you exclude later texts from search.
    • also search by author, collection, etc.
    • combine filters!

Probably a bunch of stuff I forgot.
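
To make two of the features above concrete, here is a minimal Python sketch of diacritical conversion and filter handling. It is not SuttaCentral's actual code: the function names are made up for illustration, and while `in:ebt` comes from the list above, the `author:` spelling of the author filter is only an assumption.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Lowercase text and strip combining marks, so 'Saṁyutta' matches 'samyutta'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def parse_query(query: str):
    """Split a query into free-text terms and key:value filters (hypothetical syntax)."""
    terms, filters = [], {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key and value:
            # A token such as in:ebt becomes a filter rather than a search term.
            filters[key.lower()] = value.lower()
        else:
            terms.append(fold_diacritics(token))
    return terms, filters


terms, filters = parse_query("Nikāya in:ebt author:sujato")
print(terms, filters)
```

Folding both the query and the indexed text the same way is what lets a search typed without diacritics still find the Pali spelling.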

what search can’t do

  • read your mind
  • replace other tools
  • end dukkha

Search of suttas presents a range of hard problems, not least of which is the repetitive nature of the texts. There’s no single solution.

If you think, “I get what I want better using Google site search / ChatGPT / Bing” then congratulations! Use them! We can’t and shouldn’t try to emulate the results of other services. We do try to offer a range of results that will be generally useful.

in the future

The feature is under active development and will evolve steadily over the next few weeks. If you want to notify us of search bugs you’re welcome, but remember, we have a solid list of issues to work through and yours may well be on it already.

Our initial aim is to get more-or-less parity with the old search, then work on a range of new features. We’ll get started on them soon.

You can keep track of progress here:

35 Likes

Sadhu Sadhu! Thank you so much for all the hard work.

If we find bugs, would you prefer that we create issues in the repo, after first checking the ones in the list you linked to above?

9 Likes

It can’t end dukkha, but it can find the root of suffering, which is an important precondition for ending dukkha.

The old search returned 12275 results for root of suffering, the new one returns 13—which is a number that you can handle and that is helpful.

Even with root OR of OR suffering, the new search shows “only” 9050 results. So I have no idea what the old one did to arrive at 12275. In any case, 9050 for root OR of OR suffering is still a lot, but it’s exactly what you asked for!
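
A toy Python sketch of why an OR query matches so many more documents than a plain multi-word query. It assumes the plain query requires every word to be present, which this thread does not confirm, and the documents and function names are invented for illustration.

```python
def tokenize(text: str) -> set:
    """Lowercase a text and return its set of distinct words."""
    return set(text.lower().split())


def match_all(query: str, docs: list) -> list:
    """Keep documents that contain every query word (assumed default behaviour)."""
    wanted = tokenize(query)
    return [d for d in docs if wanted <= tokenize(d)]


def match_any(query: str, docs: list) -> list:
    """Keep documents that contain at least one query word (OR behaviour)."""
    wanted = tokenize(query)
    return [d for d in docs if wanted & tokenize(d)]


docs = [
    "the root of suffering",
    "the end of the world",
    "a tree root",
    "noble silence",
]

print(len(match_all("root of suffering", docs)))
print(len(match_any("root of suffering", docs)))
```

A common word like "of" on its own is enough to pull a document into the OR results, which is how the count balloons.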

To cut a long story short: heartfelt congratulations on this development! It’s just wonderful!!!

:balloon: :tada: :confetti_ball:

17 Likes

Congratulations!

However, I just put the word ratana into the search and it didn’t return the ratana sutta (on the first page at least). :person_shrugging: Typing ratanasutta did return what I was after, but kids these days are lazy :stuck_out_tongue:

7 Likes

Sure, that would be great.

1 Like

The filters are AWESOME!
I’ve just been doing all sorts of cool searches.
Thank you!

Feature request (already)
I’m wondering if it’s possible to return the sutta card?
For my purpose I was searching for Pali phrases, then clicking through to the sutta, then having to click parallels so I could see the side by side in English.

2 Likes

Ooh. That sounds great.

1 Like

Like this?

This is quite possible, the main objection being that the suttaplex API contains a lot of data so it would weigh the search results down a lot. We need to figure out a way around this.

  • return partial data?
  • lazy load?
  • load on click?
  • load on user option?
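
As a rough illustration of the "load on click" option, here is a small Python sketch. The class name, the `fake_fetch` stand-in, and the shape of the returned data are all assumptions; the real suttaplex API is not shown in this thread.

```python
from typing import Callable, Dict


class LazySuttaplex:
    """Fetch card data only when a result is clicked, then reuse it."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch          # function that does the expensive lookup
        self._cache: Dict[str, dict] = {}
        self.fetch_count = 0         # how many real fetches have happened

    def on_click(self, uid: str) -> dict:
        if uid not in self._cache:
            self.fetch_count += 1
            self._cache[uid] = self._fetch(uid)
        return self._cache[uid]


def fake_fetch(uid: str) -> dict:
    """Hypothetical stand-in for a call to the suttaplex API."""
    return {"uid": uid, "parallels": []}


loader = LazySuttaplex(fake_fetch)
loader.on_click("mn123")
loader.on_click("mn123")   # second click is served from the cache
print(loader.fetch_count)
```

The search results themselves stay light, and the heavier card data is only paid for on the results someone actually opens.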

All outstanding issues are here:

2 Likes

FWIW, even with bad internet connections I always find the api to be quite zippy. Personally I’d be happy with a click to load.

And while we are talking about APIs, is there any chance there will be a search API? I didn’t see that in the list of issues.

1 Like

Load on click sounds reasonable to me too.

If I’ve got the results I don’t mind waiting a second for the parallels to pop up.
(Aussie monastery internets must be slower than in SL. SC is far from ‘zippy’ here sometimes)

1 Like

I meant individual API calls. The site itself is slow to load, but if you try the SC-light version, you can see what it’s like when only two API calls are made to build a sutta page.

3 Likes

Apologies if any are already aware of this issue but it seems English sutta titles are not returning with the new search. For example, if I search “bright protectors” I receive zero results. But if I search “bright principles” it will return Iti 42, even though the title of that sutta is “bright protectors”.


4 Likes

English titles are returning, just not those English titles. It seems that titles in the legacy texts are not being found by search? Is that a known error Bhante @Sujato? I couldn’t find an issue for it.

3 Likes

Seems not, could you add it to the list? Doesn’t hurt to have more tests.

1 Like

Are you referring to this list?

Looks like you have recently updated it.

2 Likes

Yes, that’s good, Hongda is working through the issues.

TBH it was launched prematurely, but I think it should be in a reasonable state in a few weeks.

3 Likes

They say if you’re not launching prematurely, you’re launching too late! :joy: No worries about the bugs and thanks for all the hard work, @HongDa :grin:

3 Likes

@SDC Thank you for posting. I too have not received any results when using the search bar. I tried on both my Android phone and my laptop.

1 Like

Maybe a dumb question, but how are the results ordered?
(I’d love to see how many times my search query appears in that result)

1 Like

The typical order is occurrences in the document divided by total words in the document… not sure if they’re doing anything special beyond that.
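
The ratio described above can be sketched in a few lines of Python. This only illustrates that "typical" ordering; whether SuttaCentral's search ranks this way is not confirmed here, and the function names are invented.

```python
def term_frequency(query: str, document: str) -> float:
    """Occurrences of the query words divided by total words in the document."""
    words = document.lower().split()
    if not words:
        return 0.0
    terms = set(query.lower().split())
    hits = sum(1 for w in words if w in terms)
    return hits / len(words)


def rank(query: str, docs: list) -> list:
    """Order documents from highest to lowest term frequency."""
    return sorted(docs, key=lambda d: term_frequency(query, d), reverse=True)


docs = ["a long text that mentions dukkha once", "dukkha dukkha"]
print(rank("dukkha", docs))
```

Because the count is divided by document length, a short text that repeats the word can outrank a long text that mentions it once.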

2 Likes