Today we launched our new search engine on SuttaCentral. This has been many years in the making! There have been many previous threads about improving search, but the work has been stuck for technical reasons. We’re really happy with our new choice, so let me explain a little about what is happening.
This is a brand new and incomplete feature, and there will be bugs, so please be patient with us.
inelastic
When we launched SC with texts in 2012 we knew we’d have to provide full text search. At the time, the shiny new kid on the block was Elasticsearch, built on the venerable Lucene library. We implemented this ourselves. I remember at one stage I had a discussion with a Buddhist programmer in England who did Elasticsearch for corporations. I said we’d like to hire him, and he just laughed and said, you couldn’t afford it. It’s not easy, is the point. SuttaCentral’s friend Chade Meng Tan, who built the Chinese search for Google, said “search is more dark arts than science”.
Anyway, Elastic is great! It’s fast, full-featured, free, and can expand to include huge datasets. It’s also terrible: it’s a major dependency that has to be updated, and the interface is completely different to everything else on the site. If you’ve got a team of engineers to maintain and adjust search for your e-commerce site, it’s a good choice. For us, there’s so much friction in changing or adjusting anything that we have simply left it untouched for years.
arangoDB
In 2018 we adopted ArangoDB as our primary database. It’s awesome and our developers love it. While we were considering possible new search options, Arango went ahead and implemented a whole bunch of nice features which, taken together, make it a pretty fantastic search engine in itself.
So we decided to re-implement search using Arango. That means we can simply delete the search dependency and use what we already have, with the same methods that our engineers already use. Bye bye friction, hello easy tweaking and adjustments!
This work has been done by @HongDa — congratulations on your wonderful work!
what search can do
We implement a fairly rich range of features in the search.
- type the sutta ID to get the card and results (mn123)
- map results for locations (there seems to be a bug with this right now, only the map is showing!)
- diacritical conversion
- search in many languages
- refine language selection via toolbar
- show results from multiple dictionaries
- use a range of filters to narrow results (see toolbar for list of filters)
  - one nice filter is in:ebt, which lets you exclude later texts from search.
  - also search by author, collection, etc.
  - combine filters!
Probably a bunch of stuff I forgot.
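To give a rough feel for two of the items above, diacritical conversion and filters like in:ebt, here is a minimal Python sketch. It is not how SuttaCentral does it (the real search runs inside ArangoDB); the function names `fold_diacritics` and `parse_query` are made up for illustration, and the `author:` syntax in the example is an assumption, since only in:ebt is named above.

```python
import unicodedata


def fold_diacritics(text):
    """Lowercase text and strip combining marks, so 'Nikāya' matches 'nikaya'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def parse_query(query):
    """Split a query into key:value filters and plain (folded) search terms.

    Hypothetical helper: tokens shaped like 'key:value' are treated as
    filters, everything else as a search term.
    """
    filters = {}
    terms = []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters[key.lower()] = value.lower()
        else:
            terms.append(fold_diacritics(token))
    return filters, terms


if __name__ == "__main__":
    print(fold_diacritics("Saṁyutta Nikāya"))  # samyutta nikaya
    print(parse_query("in:ebt author:sujato mettā"))
    # ({'in': 'ebt', 'author': 'sujato'}, ['metta'])
```

Folding both the query and the indexed text the same way is what lets someone type "metta" and still find "mettā"; keeping filters separate from terms is what makes combining them straightforward.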
what search can’t do
- read your mind
- replace other tools
- end dukkha
Search of suttas presents a range of hard problems, not least of which is the repetitive nature of the texts. There’s no single solution.
If you think, “I get what I want better using Google site search / ChatGPT / Bing” then congratulations! Use them! We can’t and shouldn’t try to emulate the results of other services. We do try to offer a range of results that will be generally useful.
in the future
The feature is under active development and will evolve steadily over the next few weeks. You’re welcome to notify us of search bugs, but remember, we have a solid list of issues to work through and yours may well be on it already.
Our initial aim is to get more-or-less parity with the old search, then work on a range of new features. We’ll get started on them soon.
You can keep track of progress here: