This whole thread will probably be a farm for the tests we need.
If there was someone who could set up the system for testing maybe I could be taught how to create new ones.
Well, not right now. But here's a few quick ideas before I go to work:
So for an absolute minimal example, this is how you might do it:
import pytest


def top_search_results(query: str) -> list[str]:
    # TODO: connect to test server and get the ids of the texts
    return ["an10.16", "an10.216", "an10.33"]


@pytest.mark.parametrize(
    "query,results", [
        ("cat", ["an10.16", "an10.216", "an10.33"]),
        # Add a new line here for each test.
    ]
)
def test_top_search_results(query, results):
    assert top_search_results(query) == results
Thank you for that!
I absolutely agree that implementing something like this will save work in the long run.
Here is an odd one: https://suttacentral.net/search?query=in:mn+rag-robe (with a hyphen) returns 4 results without a hyphen, but https://suttacentral.net/search?query=in:mn+ragrobe (without a hyphen) returns nothing.
Just had a look at the code. Looks like we use Algolia for some searches and ArangoDB for others. Both have a class TextLoader, and both have a method fix_text() that might be the problem.
EDIT
No, the two methods are the same. Perhaps Algolia and ArangoDB deal with apostrophes differently?
Thanks for feedback!
Algolia may normalize punctuation, so the apostrophe in "potter's shed" may be ignored or treated as a delimiter, causing the phrase to be indexed as something like "potters shed". In that case, searching for "potters shed" may match "potter's shed".
When searching with ArangoSearch there is no similar normalization, so searching for potters shed will not match potter's shed.
I will see if ArangoSearch provides a method to normalize punctuation.
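To make the intended effect concrete, here is a rough sketch of the kind of normalization described above. It is plain Python, not ArangoSearch or Algolia code, and normalize_for_search is a made-up name. The assumption is that the same function would be applied to both the indexed text and the query, so that the two forms compare equal.

```python
import re
import unicodedata

# Straight and curly apostrophes that may show up in translations or queries.
_APOSTROPHES = "'\u2019\u2018\u02bc"


def normalize_for_search(text: str) -> str:
    """Hypothetical helper: lowercase, drop apostrophes, turn other punctuation into spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Remove apostrophes entirely so "potter's" becomes "potters".
    text = re.sub(f"[{re.escape(_APOSTROPHES)}]", "", text)
    # Treat any remaining punctuation (hyphens, commas, ...) as a word break.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


# Both spellings now reduce to the same string.
assert normalize_for_search("potters shed") == normalize_for_search("potter's shed")
```

Whether apostrophes should be dropped or treated as a delimiter is a design choice. This sketch drops them, which is what makes "potters shed" and "potter's shed" line up.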
Perhaps this should be discussed on github, but I had a thought:
The fix_text() methods do some preprocessing of the search string. Would removing all apostrophes there solve the problem?
Nope, my bad. The TextLoader classes are for setting up the search engines at the command line.
Not really related, but it made me remember that when I search author:kelly on suttacentral I just get «Data Load Error». I tried to go through all the authors now and see if any other translator got the same result and I found some:
author:rhysdavids_litt
author:tw_rhysdavids
author:ukumarabhivamsa
author:unandamedha
author:unarada
Am I the only one who gets «Data Load Error»?
Hey Beaver,
I'm not sure if this is related, but I am taking a look at the code for loading data:
This is going to be ongoing work in my spare time, but I'll see if this is related to that area of the code base.
Cheers,
Ajahn J.R.
Yesterday-ish all the filters were returning data load errors. I could only do a vanilla search
@HongDa, do you know what might be happening?
I just tried:
in:an kassapa OR moggallana
in:dn cat
author:sujato kassapa OR moggallana
title:intention
It looks like nothing went wrong. Can you tell me a filter that would cause the error? I'll check it out.
NM it seems to be working now. Sorry
No need to apologize! It's always good to report (after doing a hard refresh).
@Jhanarato mentioned this:
https://suttacentral.net/search?query=author:kelly
gives the data load error
Reproducible on my machine. Below is the output when I hit http://localhost:2580/search?query=author:kelly
sc-nginx | 10.0.2.2 - - [13/Mar/2025:09:33:06 +0000] "POST /api/search/instant?limit=50&query=author%3Akelly&language=en&restrict=all&matchpartial=false HTTP/1.1" 502 157 "http://localhost:2580/search?query=author:kelly" "Mozilla/5.0 (X11; Linux x86_64; rv:136.0) Gecko/20100101 Firefox/136.0"
sc-flask | File "/usr/local/lib/python3.11/site-packages/flask_restful/__init__.py", line 467, in wrapper
sc-flask | resp = resource(*args, **kwargs)
sc-flask | ^^^^^^^^^^^^^^^^^^^^^^^^^
sc-flask | File "/usr/local/lib/python3.11/site-packages/flask/views.py", line 107, in view
sc-flask | return current_app.ensure_sync(self.dispatch_request)(**kwargs)
sc-flask | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sc-flask | File "/usr/local/lib/python3.11/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
sc-flask | resp = meth(*args, **kwargs)
sc-flask | ^^^^^^^^^^^^^^^^^^^^^
sc-flask | File "/opt/sc/sc-flask/server/search/view.py", line 69, in post
sc-flask | return instant_search_query(
sc-flask | ^^^^^^^^^^^^^^^^^^^^^
sc-flask | File "/opt/sc/sc-flask/server/search/instant_search.py", line 153, in instant_search_query
sc-flask | fuzzy_dictionary_entries, hits, suttaplexs, total = process_search_results(
sc-flask | ^^^^^^^^^^^^^^^^^^^^^^^
sc-flask | File "/opt/sc/sc-flask/server/search/instant_search.py", line 209, in process_search_results
sc-flask | sort_by_sutta_numbering_rules(hits)
sc-flask | File "/opt/sc/sc-flask/server/search/instant_search.py", line 1110, in sort_by_sutta_numbering_rules
sc-flask | input_list.sort(key=get_key)
sc-flask | File "/opt/sc/sc-flask/server/search/instant_search.py", line 1101, in get_key
sc-flask | integer_part, decimal_part = number_part.split('.')
sc-flask | ^^^^^^^^^^^^^^^^^^^^^^^^^^
sc-flask | ValueError: too many values to unpack (expected 2)
sc-flask | [pid: 13|app: 0|req: 1/7] 10.0.2.2 () {60 vars in 1520 bytes} [Thu Mar 13 09:33:05 2025] POST /api/search/instant?limit=50&query=author%3Akelly&language=en&restrict=all&matchpartial=false => generated 0 bytes in 1342 msecs (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 0)
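The traceback ends at number_part.split('.') being unpacked into exactly two names, which fails as soon as number_part contains more than one dot. A minimal sketch of a more tolerant key follows. It is not the actual get_key from instant_search.py (which I have not reproduced here); numeric_key and the sample ids are made up for illustration.

```python
def numeric_key(number_part: str) -> tuple[int, ...]:
    """Hypothetical sort key: '10.16' -> (10, 16), '1.2.3' -> (1, 2, 3)."""
    parts = []
    for piece in number_part.split("."):
        # Pieces that are not plain ASCII digits sort as 0 instead of raising.
        is_number = piece.isascii() and piece.isdigit()
        parts.append(int(piece) if is_number else 0)
    return tuple(parts)


# Made-up number parts, including one with two dots like the case that crashed.
samples = ["10.216", "10.16", "1.2.3", "10.33"]
samples.sort(key=numeric_key)
assert samples == ["1.2.3", "10.16", "10.33", "10.216"]
```

Tuples of different lengths compare fine in Python, so a key like this would not care how many dots an id has. Whether that ordering matches the intended sutta numbering rules is a separate question.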
Can I have a fixed width font please?
Issue created on github, with monospaced font:
I tried doing author: brahmali and it didn't return anything (no error), but author: brahmali sweep did return results.
Is the first criterion meant to return all texts by Aj Brahmali?
I tried the same with author:Kelly but I got a data load error, as Aj. J.R. did.
So these are similar but not the same problem.
author:brahmali worked for me now. Looks like you have a space after the colon which might have made it not work.
I think so, I got 424 results.
author: brahmali sweep or author:brahmali sweep both got 8 results, no matter if there was a space or not after the colon.
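If the space after the colon turns out to matter only in some cases, one option would be to tidy the query before it reaches the search backend. This is only a sketch of that idea, assuming the filter prefixes are the ones seen in this thread (author, in, title). tidy_filter_query is a made-up name, not something in the SuttaCentral code.

```python
import re

# Filter prefixes that appear in this thread; the real list may be longer.
FILTERS = ("author", "in", "title")

# A filter prefix, a colon, then one or more whitespace characters.
_FILTER_SPACE = re.compile(rf"\b({'|'.join(FILTERS)}):\s+", flags=re.IGNORECASE)


def tidy_filter_query(query: str) -> str:
    """Hypothetical helper: remove whitespace between a filter prefix and its value."""
    return _FILTER_SPACE.sub(r"\1:", query.strip())


assert tidy_filter_query("author: brahmali") == "author:brahmali"
assert tidy_filter_query("author:brahmali sweep") == "author:brahmali sweep"
```

With something like this, author: brahmali and author:brahmali would be treated as the same query, so the results would no longer depend on the space.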