SuttaCentral Search Bugs 🪰

This whole thread will probably be a farm for the tests we need.

If there was someone who could set up the system for testing maybe I could be taught how to create new ones.

Well, not right now. But here’s a few quick ideas before I go to work:

  • Integrating this into GitHub’s CI system would be great, but perhaps just running it on your own machine would be fine for now.
  • Level of testing: are most of the bugs on the client or the server? If the latter, an API test is going to be much easier to set up.
  • Creating tests: unit tests are simple, but require programming knowledge. Something like Cucumber or its Python equivalent, Behave, would let you write an executable specification that’s much easier for non-technical folk to create and understand.
  • Alternatively, a front end could be developed where assertions could be expressed in a non-technical way.
  • Note that the presence of tests will make future features and refactoring much easier.
3 Likes

So for an absolute minimal example, this is how you might do it:

import pytest

def top_search_results(query: str) -> list[str]:
    # TODO: connect to test server and get the ids of the texts
    # Until then, return a fixed list so the test below can run.
    return ["an10.16", "an10.216", "an10.33"]


@pytest.mark.parametrize(
    "query,results", [
        ("cat", ["an10.16", "an10.216", "an10.33"]),
        # Add a new line here for each test.
    ]
)
def test_top_search_results(query, results):
    assert top_search_results(query) == results
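
To fill in the TODO above, one option is to call the search API directly. This is a minimal sketch, assuming the instant-search endpoint and query parameters that appear in the server log later in this thread. The names `BASE_URL`, `instant_search_url` and `instant_search` are hypothetical, and the shape of the JSON response isn't shown anywhere here, so pulling the text ids out of it is left open.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local development instance, as in the log later in this thread.
BASE_URL = "http://localhost:2580"


def instant_search_url(query: str, limit: int = 50, language: str = "en") -> str:
    """Build the instant-search URL with the parameters seen in the server log."""
    params = {
        "limit": limit,
        "query": query,
        "language": language,
        "restrict": "all",
        "matchpartial": "false",
    }
    return f"{BASE_URL}/api/search/instant?{urlencode(params)}"


def instant_search(query: str):
    """POST the query and return the decoded JSON body (structure not assumed)."""
    request = Request(instant_search_url(query), data=b"", method="POST")
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping the URL builder separate means it can be tested without a running server; only `instant_search` needs the test instance to be up.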
2 Likes

Thank you for that!

I absolutely agree that implementing something like this will save work in the long run.

Here is an odd one. https://suttacentral.net/search?query=in:mn+rag-robe with a hyphen returns 4 results, but https://suttacentral.net/search?query=in:mn+ragrobe without a hyphen returns nothing.

1 Like

Just had a look at the code. Looks like we use Algolia for some searches and ArangoDB for others. Both have a class TextLoader and both have a method fix_text() that might be the problem.

EDIT

No, the two methods are the same. Perhaps Algolia and ArangoDB deal with apostrophes differently?

3 Likes

Thanks for feedback!

Algolia may normalize punctuation, so the apostrophe in “potter’s shed” may be ignored or treated as a delimiter, causing the phrase to be indexed as something like “potters shed”. In this case, searching for “potters shed” may result in a match for “potter’s shed”.

When searching with ArangoSearch, there is no similar normalization, so searching for potters shed will not match potter's shed.

I will see if ArangoSearch provides a method to normalize punctuation.
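
In case it helps the discussion, here is what that kind of normalization looks like in plain Python. This is only a sketch and not an ArangoSearch feature: `normalize_apostrophes` is a hypothetical helper, and it assumes the same step would be applied both to the indexed text and to the incoming query, which I haven't checked against the code.

```python
# ASCII apostrophe plus the common typographic variants (’ ‘ ʼ).
APOSTROPHES = "'\u2019\u2018\u02bc"
_STRIP_APOSTROPHES = str.maketrans("", "", APOSTROPHES)


def normalize_apostrophes(text: str) -> str:
    """Remove apostrophes so "potter's shed" and "potters shed" compare equal."""
    return text.translate(_STRIP_APOSTROPHES)
```

Dropping the character rather than replacing it with a space is a choice: a space would turn "potter's" into two tokens, "potter" and "s", which matches a different set of queries.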

2 Likes

Perhaps this should be discussed on github, but I had a thought:

The fix_text() methods do some preprocessing of the search string. Would removing all apostrophes there solve the problem?

1 Like

Nope, my bad. The TextLoader classes are for setting up the search engines at the command line.

1 Like

Not really related, but it made me remember that when I search author:kelly on suttacentral I just get «Data Load Error». I tried to go through all the authors now and see if any other translator got the same result and I found some:

author:rhysdavids_litt

author:tw_rhysdavids

author:ukumarabhivamsa

author:unandamedha

author:unarada

Am I the only one who gets «Data Load Error»?

Hey Beaver,

I’m not sure if this is related, but I am taking a look at the code for loading data:

This is going to be ongoing work in my spare time, but I’ll see if this is related to that area of the code base.

Cheers,

Ajahn J.R.

3 Likes

Yesterday-ish all the filters were returning data load errors. I could only do a vanilla search

1 Like

@HongDa, do you know what might be happening?

I just tried:

in:an kassapa OR moggallana
in:dn cat
author:sujato kassapa OR moggallana
title:intention

It looks like nothing went wrong. Can you tell me a filter that would cause the error? I’ll check it out.

1 Like

NM it seems to be working now. Sorry

2 Likes

No need to apologize! It’s always good to report (after doing a hard refresh)

@Jhanarato mentioned this:

https://suttacentral.net/search?query=author:kelly

gives the data load error

1 Like

Reproducible on my machine. Below is the output when I hit http://localhost:2580/search?query=author:kelly

sc-nginx     | 10.0.2.2 - - [13/Mar/2025:09:33:06 +0000] "POST /api/search/instant?limit=50&query=author%3Akelly&language=en&restrict=all&matchpartial=false HTTP/1.1" 502 157 "http://localhost:2580/search?query=author:kelly" "Mozilla/5.0 (X11; Linux x86_64; rv:136.0) Gecko/20100101 Firefox/136.0"
sc-flask     |   File "/usr/local/lib/python3.11/site-packages/flask_restful/__init__.py", line 467, in wrapper
sc-flask     |     resp = resource(*args, **kwargs)
sc-flask     |            ^^^^^^^^^^^^^^^^^^^^^^^^^
sc-flask     |   File "/usr/local/lib/python3.11/site-packages/flask/views.py", line 107, in view
sc-flask     |     return current_app.ensure_sync(self.dispatch_request)(**kwargs)
sc-flask     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sc-flask     |   File "/usr/local/lib/python3.11/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
sc-flask     |     resp = meth(*args, **kwargs)
sc-flask     |            ^^^^^^^^^^^^^^^^^^^^^
sc-flask     |   File "/opt/sc/sc-flask/server/search/view.py", line 69, in post
sc-flask     |     return instant_search_query(
sc-flask     |            ^^^^^^^^^^^^^^^^^^^^^
sc-flask     |   File "/opt/sc/sc-flask/server/search/instant_search.py", line 153, in instant_search_query
sc-flask     |     fuzzy_dictionary_entries, hits, suttaplexs, total = process_search_results(
sc-flask     |                                                         ^^^^^^^^^^^^^^^^^^^^^^^
sc-flask     |   File "/opt/sc/sc-flask/server/search/instant_search.py", line 209, in process_search_results
sc-flask     |     sort_by_sutta_numbering_rules(hits)
sc-flask     |   File "/opt/sc/sc-flask/server/search/instant_search.py", line 1110, in sort_by_sutta_numbering_rules
sc-flask     |     input_list.sort(key=get_key)
sc-flask     |   File "/opt/sc/sc-flask/server/search/instant_search.py", line 1101, in get_key
sc-flask     |     integer_part, decimal_part = number_part.split('.')
sc-flask     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^
sc-flask     | ValueError: too many values to unpack (expected 2)
sc-flask     | [pid: 13|app: 0|req: 1/7] 10.0.2.2 () {60 vars in 1520 bytes} [Thu Mar 13 09:33:05 2025] POST /api/search/instant?limit=50&query=author%3Akelly&language=en&restrict=all&matchpartial=false => generated 0 bytes in 1342 msecs (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 0)
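
The last frames show `number_part.split('.')` being unpacked into exactly two names, which fails whenever the numeric part contains more than one dot. The log doesn't say which uid triggered it, so the value `"1.2.3"` below is purely illustrative, and `numeric_key` is a hypothetical replacement for the key function, not the project's code. A minimal sketch of a key that tolerates any number of parts:

```python
import re


def numeric_key(number_part: str) -> tuple[int, ...]:
    """Map '10.16' to (10, 16) and '1.2.3' to (1, 2, 3) for sorting."""
    return tuple(int(digits) for digits in re.findall(r"\d+", number_part))


# Illustrative ids only; "1.2.3" stands in for whatever value broke the unpacking.
ids = ["10.216", "10.16", "1.2.3", "10.33"]
ids.sort(key=numeric_key)
```

Tuples of integers compare element by element, so 10.16 sorts before 10.216 and an extra level of numbering no longer raises.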

Can I have a fixed width font please? :sob:

Issue created on GitHub, with monospaced font:

I tried doing author: brahmali and it didn’t return anything (no error) but author: brahmali sweep did return results.

Is the first criterion meant to return all texts by Aj. Brahmali?

I tried the same with author:Kelly but I got a data load error, as Aj. JR did.
So these are similar, but not the same problem.

author:brahmali worked for me now. Looks like you have a space after the colon, which might be why it didn't work.

I think so, I got 424 results.

author: brahmali sweep or author:brahmali sweep both got 8 results, no matter if there was a space or not after the colon.
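
If the space after the colon turns out to be the cause, one possible fix is to tighten the query before it is parsed. A rough sketch under that assumption: `tighten_filters` is a hypothetical helper, and the prefix list only covers the filters mentioned in this thread.

```python
import re

# Filter prefixes seen in this thread; the real list may well be longer.
FILTER_PREFIXES = ("author", "in", "title")
_FILTER_SPACE = re.compile(r"\b(" + "|".join(FILTER_PREFIXES) + r"):\s+")


def tighten_filters(query: str) -> str:
    """Remove whitespace between a filter prefix and its value."""
    return _FILTER_SPACE.sub(r"\1:", query)
```

With this, `author: brahmali` and `author:brahmali` become the same query, which would also make a good pair of cases for the parametrized test earlier in the thread.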

1 Like