Improving "Join the discussion about this sutta" link

For folks who didn’t know, if you are on a sutta on the main site, you can click this menu item to open the forum in a search for the sutta number:

Clicking there will open up this page on the forum:

https://discourse.suttacentral.net/search?q="mn10 "

The following suggestion was made:

I wonder if anyone at @helpdesk-dd has done a deep dive on how the forum search works?

Wow! I just tried it out and it’s fantastic. Thank you SO much for posting this! Using this feature could help people to get more answers and avoid creating multiple discussions on the same topic.

:pray:

It’s funny because this has been there maybe for 5+ years. Perhaps some oldtimer can fill us in?

Did you just notice it?

Me? No. I’ve known about it for years. I just can’t pin down exactly when.

It’s a bit less useful for the Vinaya. For example if you are on a Vinaya page it looks like this:
https://discourse.suttacentral.net/search?q=%22pli-tv-bu-vb-pj1%20%22 and doesn’t return any results, probably because people tend not to post that kind of a reference.

Yes, it has been there for quite a while. The second link you get for MN10 was something I posted in 2015 :rofl:

It slipped my mind when the Forum stopped supplying links to suttas. I’m not sure whether it was still working at that point, but now it’s all back and working both ways!

I meant the feature on suttacentral.net. That feature could have been added long after you posted your message in 2015.

In any case, it looks like for some reason it is appending a space after the citation, and that may be why mn10. and mn10, are being missed. Will have to fool around with it.

I think it is appending the space to prevent a search for MN 10 from matching MN 101, etc.

Prepending and appending a forward slash instead ("/mn10/") seems like a functional workaround. Are there cases that this search term misses?
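
For anyone who wants to experiment, here is a minimal sketch of how the two query forms discussed so far (trailing space and surrounding slashes) could be turned into forum search URLs. The function name `forum_search_url` and the `style` parameter are made up for illustration; this is not the actual SuttaCentral code, only a guess at the encoding based on the URLs quoted in this thread.

```python
from urllib.parse import quote

FORUM_SEARCH = "https://discourse.suttacentral.net/search?q="


def forum_search_url(sc_id: str, style: str = "space") -> str:
    """Build a forum search URL for an SC ID such as 'mn10' (hypothetical helper)."""
    if style == "space":
        # Trailing space inside the quotes, so that mn10 does not match mn101.
        phrase = f'"{sc_id} "'
    elif style == "slashes":
        # Surrounding slashes; per this thread this seems to find only posts with an SC URL.
        phrase = f'"/{sc_id}/"'
    else:
        raise ValueError(f"unknown style: {style}")
    # safe="" so that "/" is percent-encoded along with the quotes and the space.
    return FORUM_SEARCH + quote(phrase, safe="")


print(forum_search_url("mn10"))
print(forum_search_url("pli-tv-bu-vb-pj1"))
print(forum_search_url("mn10", style="slashes"))
```

The second call reproduces the encoded Vinaya URL quoted earlier in the thread.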

I’ve checked my correspondence and I see I discussed the idea of posting here some links to discussions of suttas on DhammaWheel to take advantage of that feature back in May 2015…

Love this, thank you for sharing @Snowbird, Anumodami :pray: :+1:

This is a long-standing problem: Finding posts with exact SC IDs and a fix for sutta search integration from SuttaCentral (proposal).

The idea of using the tags with dot notation has an additional problem since the dot character is now forbidden for tags (I was thinking of a possible workaround using a unicode dot but this approach would need some additional coding both on the SC and D&D side).

Ah, I thought you might already be on top of this.

Would you prefer this thread be merged into the thread you linked to?

From what I read there it sounds like you are trying to set up some auto tagging system. Is there a reason that you don’t just want to do searches for the citations in the posts and then let the relevance option bubble the better results to the top?

At a quick glance it appears that this only returns results that include a url. It misses cases where it’s not in a url. Were you thinking that the / / would turn it into a regex search?

No, I included the slashes to match URLs. D&D (at least currently) is automatically turning sutta citations into links to SC, and it seems to me that the search engine finds those.

I tried regular expressions too but those seem unsupported. For example, \bmn *10\b would be a good enough pattern that wouldn’t rely on the presence of a URL to the sutta but would also avoid false positives like MN 101. If this could be enabled somehow, it would solve the problem without any pitfall that I can think of.
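
To show what I mean, here is a quick check of that pattern in Python. Matching case-insensitively is my assumption, and the sample strings are invented. It says nothing about what the forum search itself supports; it only illustrates what the pattern would and would not accept.

```python
import re

# The pattern from above; the IGNORECASE flag is an assumption.
MN10 = re.compile(r"\bmn *10\b", re.IGNORECASE)

samples = ["MN 10", "mn10.", "MN10,", "see /mn10/en/sujato", "MN 101", "mn101"]
for text in samples:
    # The first four contain a match; "MN 101" and "mn101" do not.
    print(f"{text!r}: {bool(MN10.search(text))}")
```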

Ah, right. However the way this works (@musiko please correct me if I’m wrong) is that this transformation happens on the fly when the post is loaded in the browser. No permanent changes are made to the actual text of the post.

You can kind of see this if you go back later to edit a post that has a citation. It’s still going to just be the text citation. Now if you copy/paste it into a new post then I think the link will go with it. For example:

false positives like [MN 101](https://suttacentral.net/mn101/en/sujato). If this could be

Unfortunately, there doesn’t seem to be support for an OR operator, and "mn10 " /mn10/ appears to be treated like "mn10 " AND /mn10/

You’re right, my previous message is not in the search results for "/mn101/", only for "MN 101.".

Strangely, it is not in the results even for "MN 101". The period in this case seems to be required.

Edit: Actually, I am not sure what is going on in the search results.

Something I have said myself too many times :rofl:!

We can continue here, I’m not sure that approach is still valid.

The issue is the word boundary: how to map the SC ID an1.1 only to the linkified strings AN 1.1, an1:1 and An 1.1, and not to AN 1.10, an1:10 and An 1.10, in D&D. This is why SC appends the space ("an1.1 ") to the Discourse search query, but this has issues with punctuation marks after the ID.
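
As a rough illustration of that mapping, here is a sketch that turns an SC ID into a regular expression. The helper `citation_pattern` is hypothetical, and it assumes IDs made of a lowercase collection abbreviation followed by dot-separated numbers, so something like pli-tv-bu-vb-pj1 is rejected. It is not how the linkifier on D&D is actually implemented.

```python
import re


def citation_pattern(sc_id: str) -> re.Pattern:
    """Hypothetical helper: regex for the citation forms of an SC ID like 'an1.1'."""
    m = re.fullmatch(r"([a-z]+)(\d+(?:\.\d+)*)", sc_id)
    if m is None:
        raise ValueError(f"unsupported SC ID: {sc_id}")
    collection, number = m.groups()
    # Accept "." or ":" between the numeric parts (AN 1.1, an1:1).
    number_re = r"[.:]".join(re.escape(part) for part in number.split("."))
    # No letter directly before, optional spaces, and no digit directly after.
    return re.compile(rf"(?<![a-z]){collection} *{number_re}(?!\d)", re.IGNORECASE)


an_1_1 = citation_pattern("an1.1")
for text in ["AN 1.1", "an1:1", "An 1.1", "AN 1.1.", "AN 1.10", "an1:10", "An 1.10"]:
    print(f"{text!r}: {bool(an_1_1.search(text))}")
```

The negative lookahead does the job of the appended space, while still allowing a punctuation mark right after the ID.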

Correct, and this is why the /an1.1/ approach only works for finding posts that actually contain direct links to SC (either inserted manually or transformed from linkified IDs by quoting them).

Tags could be a solution, but this would require generating thousands of (special) dotted tags in advance on D&D, creating a plugin to autotag topics with IDs in the OP (and replies?!), considering the max tag limit (currently 10) on topics, and updating the SC code to generate correct Discourse search strings. Therefore, I think this is not a feasible solution.

This would be an ideal solution if the linkified IDs were actually baked into the cooked post (i.e. the HTML rendered from the raw post and saved to the DB). But as it stands, the linkified IDs are generated by the client browser using JS on each page render (which has its pros and cons).

If you mean many more results, I’d guess it’s the database reindexing the newly searched strings.

@musiko Thank you for this explanation!

In your view, is it feasible to add this feature via a custom plugin to the Discourse instance that D&D is running on? For example, search results for “AN 1.1” could be filtered before displaying them so that “AN 1.10” and so on are discarded.

Discourse search is based on PostgreSQL full text search, and I see no way to hack that deeply into the core.

Full text search is very good at finding, well, full text, but not so much at finding arbitrary substrings.

Examine the different results using this /mn1/ and this “/mn1/” search string.

The first one finds many more approximate hits (e.g. /mn1 in /mn147/) than the second one, which finds exact strings only. This would be the ideal solution, if the linkified sutta ID string was actually part of the cooked post (which would also require a custom plugin, because the built-in linkify functionality doesn’t allow for regex substitution).

But the main problem with the plugin is that any change to the regex rules requires a rebake of all posts (which is very resource intensive and time consuming), as well as a forum rebuild on any plugin code change.

As a partial workaround, the SC search link to the forum could be changed to /search?q=%22%2Fmn1%2F%22 instead of /search?q=%22mn1%20%22 until we figure out a better way.

Actually, I’m not sure this would be an improvement, because it seems that there are many more sutta ID reference strings than direct links in the posts. There is also the issue of finding MN1 vs finding MN 1, since MN 1 is found by neither the existing nor the proposed search string.

Could we do it without cooking or re-cooking any of the posts?

Currently, we are querying the PostgreSQL database for only one search phrase, e.g., "MN1 ". Instead, I am thinking that we could query the database for ‘either "MN1" or "MN 1".’ This seems doable with PostgreSQL’s @@ operator. With this alternative query:

  1. The results would contain citations to both MN 1 and MN1. This would resolve the spacing issue between “MN” and “1.”
  2. We would have a false-positive issue: unlike the search phrase "MN1 ", the phrase "MN1" accepts punctuation after the citation, as in MN1., but it also accepts citations to different suttas, like MN10. However, this issue could be resolved by filtering the results for “true” citations to the sutta by matching each post against a case-insensitive regular expression.
    • This filtering could be done in Ruby in a plugin (maybe?), or in the PostgreSQL query (probably more efficient, e.g., with a CTE?).
    • The regular expression could be \bmn *1\b if PCRE or (^|[^a-z])mn *1([^0-9]|$) if Postgres.
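
To make the two steps above concrete, here is a small Python stand-in. It is only a sketch: the post texts are invented, `broad_candidates` merely imitates the loose "MN1" or "MN 1" query with a substring test (real PostgreSQL full text search tokenizes differently), and the filtering would really live in a plugin or in the SQL query rather than in Python.

```python
import re

# Invented post texts standing in for rows returned by the database.
posts = [
    "The Root of All Things is MN1.",
    "Compare MN 1 with the parallel.",
    "MN10 is on mindfulness.",
    "See mn 147 for that.",
    "No citation here.",
]


def broad_candidates(texts):
    """Step 1: loose match, imitating a query for either 'MN1' or 'MN 1'."""
    return [t for t in texts if "mn1" in t.lower() or "mn 1" in t.lower()]


# Step 2: strict filters, using the two patterns proposed above.
PCRE_STYLE = re.compile(r"\bmn *1\b", re.IGNORECASE)
POSIX_STYLE = re.compile(r"(^|[^a-z])mn *1([^0-9]|$)", re.IGNORECASE)


def true_citations(texts, pattern):
    """Keep only the candidates that really cite the sutta."""
    return [t for t in broad_candidates(texts) if pattern.search(t)]


print(broad_candidates(posts))            # includes the MN10 and mn 147 posts
print(true_citations(posts, PCRE_STYLE))  # only the MN1. and MN 1 posts remain
```

On these samples both patterns keep the same two posts, which is the behaviour the proposal is after.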

What do you think about this approach?