Finding posts with exact SC IDs and a fix for sutta search integration from SuttaCentral (proposal)

Continuing the discussion from We're live!:

@vimala @sujato @blake

I included Blake as the primary dev on the SC integration plugin.

I have looked into existing solutions for tweaking the Discourse full-text search and, as Ayya Vimala mentioned before, there is not much interest among the devs. I then considered some sort of search hijack, combined with regex parsing of the search string into (multiple) SC IDs plus the remaining keywords, but it seemed too complicated. Next I considered a full-text hack directly at the PostgreSQL/DB level, but this usually turns into a maintenance nightmare where every site upgrade might break something.

I almost gave up but then a very simple idea popped up: we have a solution implemented already, we just didn’t realize it :smile:

So, we take Blake’s autolink plugin, modify it to add a tag with the newly generated id to the thread in which it is invoked, and Bob’s your uncle!
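A rough sketch of the tagging step, in Python just for readability; it is not the plugin's actual code. The names `SC_ID_PATTERN`, `extract_sc_ids`, and `tags_for_topic` are made up for illustration. It assumes an SC ID looks like a short lowercase abbreviation followed by dotted numbers with an optional range, and that the ID can be used as a tag name unchanged, which has not been checked against Discourse's tag naming rules.

```python
import re

# Assumed shape of an SC ID: 2-4 letters, dotted numbers, optional "-N" range.
SC_ID_PATTERN = re.compile(r"\b([a-z]{2,4}\d+(?:\.\d+)*(?:-\d+)?)\b")


def extract_sc_ids(post_text):
    """Return the distinct SC IDs mentioned in a post, in order of first appearance."""
    seen = []
    for match in SC_ID_PATTERN.finditer(post_text.lower()):
        sc_id = match.group(1)
        if sc_id not in seen:
            seen.append(sc_id)
    return seen


def tags_for_topic(existing_tags, post_text):
    """Merge the IDs found in a post into the topic's tags without duplicates."""
    merged = list(existing_tags)
    for sc_id in extract_sc_ids(post_text):
        if sc_id not in merged:
            merged.append(sc_id)
    return merged
```

The pattern is deliberately loose, so anything shaped like an ID would be picked up; a real version would probably validate candidates against the list of known SC IDs.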

We can then search on D&D with tags:id1,id2 for posts containing id1 or id2 and with tags:id1+id2 for posts containing both id1 and id2. And we can also send the exact same search string from SC to D&D :smile:
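The two search forms can be generated from a list of IDs. This is only a sketch: `tag_query`, `search_url`, and `FORUM_BASE` are hypothetical names, and the base URL and the `/search?q=` path are assumptions about how the link from SC to D&D would be built.

```python
from urllib.parse import urlencode

# Assumption: the forum's base URL and a /search?q=... endpoint.
FORUM_BASE = "https://discourse.suttacentral.net"


def tag_query(sc_ids, match_all=False):
    """Build 'tags:a,b' (any of the IDs) or 'tags:a+b' (all of the IDs)."""
    if not sc_ids:
        raise ValueError("need at least one SC ID")
    separator = "+" if match_all else ","
    return "tags:" + separator.join(sc_ids)


def search_url(sc_ids, match_all=False, keywords=""):
    """Return a forum search URL for the tag query plus optional keywords."""
    query = tag_query(sc_ids, match_all)
    if keywords:
        query = keywords + " " + query
    # urlencode percent-escapes ':', ',' and '+' so the query survives the URL.
    return FORUM_BASE + "/search?" + urlencode({"q": query})
```

For example, `tag_query(["dn1", "mn10"])` gives `tags:dn1,mn10`, and passing `match_all=True` gives `tags:dn1+mn10`.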

Of course it would be just too boring if this just worked, so I found some issues to fix.

This could also be extended to paragraph references in suttas: the first link always points to the sutta card, the second to the exact paragraph, or to the start of the sutta within the range.

  • one-to-one relationship

    /dn1 → /dn1 (/en/sujato/dn1)
    /dn1.2 → /dn1 (/en/sujato/dn1#8)
    /dn1.2.1 → /dn1 (/en/sujato/dn1#8)
    /dn1.2.3 → /dn1 (/en/sujato/dn1#10)

  • many-to-one relationship

    an2.11 → an2.11-20 (/en/sujato/an2.11-20#sc11.1)
    an2.12 → an2.11-20 (/en/sujato/an2.11-20#sc12.1)

  • this would also work with the workaround for old links: /en/dn1 and /pi/dn1 → /dn1

  • I noticed that there are two types of #bookmarks in the suttas, #sc12 and #12 for dn and mn, but just #sc12 for an.

    Maybe #sc12 could be used to mark just the positions (for sections and start of the sutta, without highlighting) and #12 could highlight the referenced para? Then sections and paragraphs could be differentiated like this:

    /dn1.2 → /dn1 (/en/sujato/dn1#sc8)
    /dn1.2.1 → /dn1 (/en/sujato/dn1#8)

    The same principle should work for agamas and the rest
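A minimal sketch of how such a reference could be resolved into the two links. The lookup tables hold only the examples from this post; real data would have to come from the SuttaCentral texts. `SECTION_ANCHORS`, `RANGE_TEXTS`, `TEXT_PREFIX`, and `resolve` are hypothetical names, and `/en/sujato` is simply the translation used in the examples.

```python
import re

# Illustrative lookup data taken from the examples in this post.
SECTION_ANCHORS = {"dn1.2": "8", "dn1.2.1": "8", "dn1.2.3": "10"}
RANGE_TEXTS = [("an2", 11, 20)]  # (prefix, first sutta, last sutta)
TEXT_PREFIX = "/en/sujato"       # translation used in the examples


def resolve(ref):
    """Return (card link, text link) for a reference like '/dn1.2' or 'an2.12'."""
    ref = ref.strip().lstrip("/").lower()

    # One-to-one: a section or paragraph inside a single sutta.
    if ref in SECTION_ANCHORS:
        sutta = ref.split(".", 1)[0]
        return f"/{sutta}", f"{TEXT_PREFIX}/{sutta}#{SECTION_ANCHORS[ref]}"

    # Many-to-one: a sutta that is published as part of a range text.
    match = re.fullmatch(r"([a-z]+\d+)\.(\d+)", ref)
    if match:
        prefix, number = match.group(1), int(match.group(2))
        for range_prefix, first, last in RANGE_TEXTS:
            if prefix == range_prefix and first <= number <= last:
                uid = f"{prefix}.{first}-{last}"
                return f"/{uid}", f"{TEXT_PREFIX}/{uid}#sc{number}.1"

    # Otherwise treat the reference as a plain sutta ID.
    return f"/{ref}", f"{TEXT_PREFIX}/{ref}"
```

The section table is checked first because `dn1.2` (a section) and `an2.12` (a sutta inside a range) have the same shape, so only the lookup data can tell them apart.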

Your thoughts?


Okay, great idea, it sounds cool. Let’s see what Vimala and Blake have to say.


Thank you @musiko. That’s fast work.
I’m a bit flat out with all the issues at the moment, as well as the preparations for a Bhikkhuni ordination next weekend, but will get back to this ASAP.


Just a few initial remarks:

It would be better if links to dn1 simply went to the /dn suttaplex card, so people can decide for themselves which translation they want, and in which language, rather than defaulting to English.

This won’t be needed. We hope to have this working by the end of the week.

Either will work. Whether you go to #3 or to #sc3, you will end up in the same place in the segmented texts (Sujato’s translations and the Pali). The old HTML texts still use only the #3 format.

I would like to have @Blake’s opinion on the plugin solution, but that will have to wait until after next week: he’s a bit busy now, as we still have STXNext working for us this week.
