Finding posts with exact SC IDs and a fix for sutta search integration from SuttaCentral (proposal)

musiko · March 10, 2018, 10:57pm

Continuing the discussion from We're live!:

I included Blake as the primary dev on the SC integration plugin.

I have looked into existing solutions for tweaking the Discourse full-text search and, as Ayya Vimala mentioned before, there is not much interest among the devs. I then considered some sort of search hijack combined with regex parsing of the search string into (multiple) SC IDs plus the remaining keywords, but it seemed too complicated. Next I considered a full-text hack directly at the PostgreSQL level, but this usually turns into a maintenance nightmare where every site upgrade might break something.

I almost gave up, but then a very simple idea popped up: we have a solution implemented already, we just didn’t realize it.

So, we take Blake’s autolink plugin, modify it to add a tag with the newly generated id to the thread in which it is invoked, and Bob’s your uncle!

We can then search on D&D with tags:id1,id2 for posts containing id1 or id2, and with tags:id1+id2 for posts containing both id1 and id2. We can also send the exact same search string from SC to D&D.
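As a rough illustration of the two search forms above, here is a minimal Python sketch that builds the search string from a list of SC IDs. The function names and the base URL are placeholders of mine, not part of any existing plugin; the only thing taken from the forum itself is the `tags:` syntax and the standard Discourse `/search?q=` endpoint.

```python
from urllib.parse import urlencode


def build_tag_query(ids, match_all=False):
    """Return a search string such as 'tags:an3.65,mn10'.

    ',' means "any of these tags", '+' means "all of these tags".
    """
    cleaned = [i.strip().lower() for i in ids if i.strip()]
    if not cleaned:
        raise ValueError("at least one SC ID is required")
    separator = "+" if match_all else ","
    return "tags:" + separator.join(cleaned)


def build_search_url(base_url, ids, match_all=False):
    """Build a full search URL; base_url is a placeholder for the forum address."""
    query = urlencode({"q": build_tag_query(ids, match_all)})
    return f"{base_url.rstrip('/')}/search?{query}"


if __name__ == "__main__":
    print(build_tag_query(["an3.65", "mn10"]))
    print(build_tag_query(["an3.65", "mn10"], match_all=True))
    print(build_search_url("https://discourse.example", ["mn10"]))
```

Since SC would send the same string, the "discuss" button would only need the equivalent of `build_search_url` with the sutta's own ID.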

Of course it would be just too boring if this just worked, so I found some issues to fix.

Search with tags containing a dot doesn’t work as expected (other tags with SC IDs work OK).

AN3.65 and MA16 Kālāma: Comparative translation, by Piya Tan is tagged with an3.65

Search results for 'tags:an3.65' - Discuss & Discover returns 0 results

We would of course need to bulk update all existing threads with all ids already mentioned in the posts within the thread, but there seem to be some easy solutions for that:

Adding a tag in bulk without erasing existing tags - support - Discourse Meta

Bulk Actions: "Add Tags" in addition to "Change Tags" - feature - Discourse Meta

The SC discussion button should generate Search results for 'tags:mn10' - Discuss & Discover instead of Search results for '"mn10 "' - Discuss & Discover

The autolink plugin could be updated so that it contains a popup link to the sutta card plus an additional link to the EN translation by Sujato

This is needed for suttas that have a many-to-one mapping.

The autolink plugin could also be updated to recognize occurrences like an2,12 (typo) and an 2:12 (alternative style).
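For the typo and alternative-style variants just mentioned, a normalization step might look like the sketch below. The regular expression and the function name are my own assumptions, not the autolink plugin's actual pattern; it only handles a single candidate string, and scanning free text would need extra care (for example, the English word "an" followed by a number).

```python
import re

# Assumed shape of an ID: letters, a number, then optionally a second number
# separated by '.', ',' or ':' (with optional spaces around the separator).
_ID_RE = re.compile(r"([a-z]+)\s*(\d+)(?:\s*[.,:]\s*(\d+))?", re.IGNORECASE)


def normalize_sc_id(text):
    """Return the canonical lower-case ID (e.g. 'an2.12'), or None if no match."""
    match = _ID_RE.fullmatch(text.strip())
    if match is None:
        return None
    prefix, first, second = match.groups()
    uid = prefix.lower() + first
    if second is not None:
        uid += "." + second
    return uid


if __name__ == "__main__":
    for candidate in ["an2,12", "an 2:12", "AN3.65", "mn10"]:
        print(candidate, "->", normalize_sc_id(candidate))
```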
Then there is the issue of suttas on SC that don’t have a one-to-one mapping to URI

an2.11-20 should expand to tags:an2.11,an2.12,an2.13,an2.14,an2.15,an2.16,an2.17,an2.18,an2.19,an2.20 (user can then manually correct search string for specific sutta)

This can be done with a bit of code applied to all URIs that contain a hyphen, or with a mapping table (maybe the same one that will be needed for the existing links pointing to the previous version of SC).

Perhaps this could be baked into an expansion table (we know that new sutta ids will not be added, so this is a one-time job), and it would fix SC search problems for suttas like an2.12.
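The "bit of code" for hyphenated URIs and the one-time expansion table could be sketched roughly as follows. The names are hypothetical, and the pattern assumes that a range always ends in `start-end` as in an2.11-20; other ID shapes on SC may need different handling.

```python
import re

# Assumed range shape: letters, optional "<number>." part, then "start-end".
_RANGE_RE = re.compile(r"([a-z]+(?:\d+\.)?)(\d+)-(\d+)")


def expand_range(uid):
    """Expand 'an2.11-20' into ['an2.11', ..., 'an2.20']; other IDs pass through."""
    match = _RANGE_RE.fullmatch(uid)
    if match is None:
        return [uid]
    stem, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"invalid range in {uid!r}")
    return [f"{stem}{n}" for n in range(start, end + 1)]


def range_to_tag_query(uid):
    """Build the tags: search string for a (possibly ranged) ID."""
    return "tags:" + ",".join(expand_range(uid))


def build_expansion_table(range_uids):
    """One-time job: map every member ID back to the range URI that contains it."""
    table = {}
    for range_uid in range_uids:
        for member in expand_range(range_uid):
            table[member] = range_uid
    return table


if __name__ == "__main__":
    print(range_to_tag_query("an2.11-20"))
    print(build_expansion_table(["an2.11-20"])["an2.12"])
```

The reverse table is what a search for an2.12 would consult to find the text it actually lives in.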

This could also be extended for para references in suttas: first link always points to sutta card, second to exact paragraph or start of the sutta within the range

one-to-one relationship

/dn1 → /dn1 (/en/sujato/dn1)
/dn1.2 → /dn1 (/en/sujato/dn1#8)
/dn1.2.1 → /dn1 (/en/sujato/dn1#8)
/dn1.2.3 → /dn1 (/en/sujato/dn1#10)

many-to-one relationship

an2.11 → an2.11-20 (/en/sujato/an2.11-20#sc11.1)
an2.12 → an2.11-20 (/en/sujato/an2.11-20#sc12.1)
…
This would also work with the workaround for old links: /en/dn1 and /pi/dn1 → /dn1.
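To make the many-to-one case and the old-link workaround concrete, here is a small sketch that produces the two links (sutta card first, translation second) in the pattern shown above. The function names are mine, the `#sc<n>.1` anchor is simply copied from the an2.11-20 examples, and only the `en` and `pi` prefixes mentioned here are handled for old links.

```python
import re

# Old-style links as described above: an optional slash, 'en' or 'pi', then the ID.
_LEGACY_RE = re.compile(r"/?(en|pi)/([a-z]+\d[\d.\-]*)")


def resolve_links(uid, range_uid, lang="en", author="sujato"):
    """Return (card_path, text_path) for a sutta that lives inside a range."""
    number = uid.rsplit(".", 1)[-1]  # 'an2.12' -> '12'
    card_path = f"/{range_uid}"
    text_path = f"/{lang}/{author}/{range_uid}#sc{number}.1"
    return card_path, text_path


def legacy_to_card(path):
    """Map an old link such as '/en/dn1' or 'pi/dn1' to the card path '/dn1'."""
    match = _LEGACY_RE.fullmatch(path)
    return f"/{match.group(2)}" if match else None


if __name__ == "__main__":
    print(resolve_links("an2.12", "an2.11-20"))
    print(legacy_to_card("/en/dn1"), legacy_to_card("pi/dn1"))
```

The one-to-one dn1 examples cannot be computed this way, because the paragraph numbers (#8, #10) would have to come from a lookup table.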
I noticed that there are two types of #bookmarks in the suttas, #sc12 and #12 for dn and mn, but just #sc12 for an.

Maybe #sc12 could be used to mark just the positions (for sections and start of the sutta, without highlighting) and #12 could highlight the referenced para? Then sections and paragraphs could be differentiated like this:

/dn1.2 → /dn1 (/en/sujato/dn1#sc8)
/dn1.2.1 → /dn1 (/en/sujato/dn1#8)

The same principle should work for the agamas and the rest.

Your thoughts?

sujato · March 10, 2018, 11:16pm

Okay, great idea, it sounds cool. Let’s see what Vimala and Blake have to say.

Vimala · March 10, 2018, 11:32pm

Thank you @musiko. That’s fast work.
I’m a bit flat out with all the issues at the moment as well as the preparations for a Bhikkhuni ordination next weekend but will get back to this asap.

Vimala · March 12, 2018, 8:45pm

Just a few initial remarks:

It would be better if links to dn1 would simply go to the /dn suttaplex card so people can decide for themselves which translation they want in which language, rather than defaulting to English.

This won’t be needed. We hope to have this working by the end of the week.

Either will work. Going to #3 or to #sc3 will land you in the same place in the Sujato/Pali translations in segmented texts. The old HTML texts still use only the #3 format.

I would like to have @Blake’s opinion on the plugin solution, but that will have to wait until after next week: he’s a bit busy now because we still have STXNext working for us this week.