Citation Link Up tool 🔧 to turn citations into links for a block of text

Inspired by this post, and in my ongoing efforts to learn coding, I created this tool to take a block of text and turn the citations into links:

If you want to paste it into a post in the discussion forum here, select Markdown. If it is for a post on dhammawheel.com, select phpBB.

There are some limitations at this point. You need to use the same abbreviations that are used on SuttaCentral. And of course if the citations are bad, then the links will be bad. It also doesn’t accommodate ranges.

So, if you look at the post that inspired me to make it, the Udana citations won’t work because it says “Udana” instead of “Ud”. And the Iti links won’t work because they are in the form of chapter and sutta rather than suttas 1–112.

Any feedback would be most welcome. It’s mostly a learning project for me, so I’d be interested in feature requests.


This is awesome. So if I want to link to SN 2.12, and compare this with Snp 2.12, they’ll both work.

We should add this to Discourse! Can you turn your widget into a Discourse plugin?


Just to note: canonically, URLs on SC are all lowercase. Uppercase works, but in the past it didn’t, so if you can convert to all-lowercase it’d be more robust.


Thanks for the appreciation.

At the moment it is not really anything more than a regex search and replace. Accommodating the other two number formats for Iti and Ud will make it slightly more complex.
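For anyone curious what “a regex search and replace” involves, here is a minimal Python sketch. It is not the tool’s code, which isn’t shown in the thread: the loose `CITATION` pattern and the `to_markdown_links` helper are hypothetical, the `/en/sujato` path is borrowed from the rules posted later in the thread, and the URL is lowercased as suggested above.

```python
import re

# Hypothetical, deliberately loose pattern: an abbreviation followed by a number
# such as "12" or "2.12". It does NOT check whether the sutta actually exists.
CITATION = re.compile(r"\b(DN|MN|SN|AN|Snp|Ud)\s?(\d+(?:\.\d+)?)\b")

def to_markdown_links(text):
    """Replace each citation with a Markdown link to SuttaCentral."""
    def repl(m):
        abbr, number = m.group(1), m.group(2)
        uid = f"{abbr}{number}".lower()  # canonical SC URLs are lowercase
        return f"[{abbr} {number}](https://suttacentral.net/{uid}/en/sujato)"
    return CITATION.sub(repl, text)

print(to_markdown_links("Compare SN 2.12 with Snp 2.12."))
```

Because `\d+` accepts any number, this loose version would also link DN 99, which doesn’t exist; the range-restricted rules later in the thread avoid that.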

Done!

OK, I have the issue with the Iti and the Ud sorted. No matter which format those citations are in, it should give you a working link (assuming it is a valid citation).
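For the Ud half of that fix, one possible approach is to accept several spellings of the collection name and always emit `ud` in the URL. This Python sketch is an assumption, not the tool’s actual code; the alias list and `link_udana` are hypothetical. Converting Iti citations from chapter and sutta to the running number 1–112 would additionally need a table of chapter sizes, which isn’t given here.

```python
import re

# Hypothetical alias list: each spelling is treated as the "Ud" collection.
# Longer spellings come first so "Udana" is tried before plain "Ud".
UD_ALIASES = r"(?:Udana|Udāna|Ud)"
UD_CITATION = re.compile(rf"\b{UD_ALIASES}\s?([1-8])\.([1-9]|10)\b")

def link_udana(text):
    """Link Ud citations however the collection name is spelled."""
    def repl(m):
        number = f"{m.group(1)}.{m.group(2)}"
        return f"[Ud {number}](https://suttacentral.net/ud{number}/en/sujato)"
    return UD_CITATION.sub(repl, text)

print(link_udana("Udana 1.10, Udāna 2.3 and Ud 8.1"))
```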

4 posts were split to a new topic: Automatically linking up citations to suttas on D&D

A post was merged into an existing topic: Automatically linking up citations to suttas on D&D

I just added a preview of the HTML result so users can make sure all the links work. (I’m assuming that the other formats would produce the same preview.)

Bhante @Sujato, I think this is stable enough that it’s ready to be added to the “SC Awesome” page if you think it will be helpful.

(Small side note: for non-GitHub users, it’s a bit more user-friendly to link directly to the readme anchor on the GitHub page, i.e. https://github.com/suttacentral/awesome#readme)

Bhante, you can try these for size. The regex rules below will produce only valid SC links!

Use $2 for substitution and map each regex below to the corresponding SC ID:

  • \b(DN...)\b → [DN $2](https://suttacentral.net/dn$2/en/sujato)
  • \b(Ud...)\b → [Ud $2](https://suttacentral.net/ud$2/en/sujato)
\b(DN|Dn|dn)\s?([1-9]|[1-2][0-9]|3[0-4])\b

\b(MN|Mn|mn)\s?([1-9]|[1-9][0-9]|1[0-4][0-9]|15[0-2])\b

\b(AN|An|an)\s?((1|8)\.([1-9]|[1-9][0-9]|[1-5][0-9][0-9]|6[0-1][0-9]|62[0-7])|2\.([1-9]|[1-9][0-9]|[1-3][0-9][0-9]|4[0-7][0-9])|3\.([1-9]|[1-9][0-9]|[1-2][0-9][0-9]|3[0-4][0-9]|35[0-2])|4\.([1-9]|[1-9][0-9]|[1-6][0-9][0-9]|7[0-7][0-9]|78[0-3])|5\.([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-4][0-9]|115[0-2])|6\.([1-9]|[1-9][0-9]|[1-5][0-9][0-9]|6[0-4][0-9])|7\.([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-1][0-9]|112[0-4])|9\.([1-9]|[1-9][0-9]|[1-3][0-9][0-9]|4[0-2][0-9]|43[0-2])|10\.([1-9]|[1-9][0-9]|[1-6][0-9][0-9]|7[0-3][0-9]|74[0-6])|11\.([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-4][0-9]|115[0-1]))\b

\b(SN|Sn|sn)\s?(1\.([1-9]|[1-7][0-9]|8[0-1])|2\.([1-9]|[1-2][0-9]|30)|(3|4|11)\.([1-9]|1[0-9]|2[0-5])|(5|2[5-8]|41)\.([1-9]|10)|6\.([1-9]|1[0-5])|(7|18)\.([1-9]|1[0-9]|2[0-2])|(8|10|2[01])\.([1-9]|1[0-2])|9\.([1-9]|1[0-4])|12\.([1-9]|[1-9][0-9]|1[0-9][0-9]|20[0-9]|21[0-3])|(13|4[04])\.([1-9]|1[0-1])|14\.([1-9]|[1-3][0-9])|(15|54)\.([1-9]|1[0-9]|20)|(16|42)\.([1-9]|1[0-3])|17\.([1-9]|[1-3][0-9]|4[0-3])|19\.([1-9]|1[0-9]|2[0-1])|22\.([1-9]|[1-9][0-9]|1[0-5][0-9])|(23|30)\.([1-9]|[1-3][0-9]|4[0-6])|24\.([1-9]|[1-8][0-9]|9[0-6])|29\.([1-9]|[1-4][0-9]|50)|31\.([1-9]|[1-9][0-9]|10[0-9]|11[0-2])|32\.([1-9]|[1-4][0-9]|5[0-7])|(3[34])\.([1-9]|[1-4][0-9]|5[0-5])|35\.([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-3][0-9]|24[0-8])|36\.([1-9]|[1-2][0-9]|3[0-1])|37\.([1-9]|[1-2][0-9]|3[0-4])|(3[89])\.([1-9]|1[0-6])|43\.([1-9]|[1-3][0-9]|4[0-4])|45\.([1-9]|[1-9][0-9]|1[0-7][0-9]|180)|46\.([1-9]|[1-9][0-9]|1[0-7][0-9]|18[0-4])|47\.([1-9]|[1-9][0-9]|10[0-4])|48\.([1-9]|[1-9][0-9]|1[0-6][0-9]|17[0-8])|(49|53)\.([1-9]|[1-4][0-9]|5[0-4])|50\.([1-9]|[1-9][0-9]|10[0-8])|51\.([1-9]|[1-7][0-9]|8[0-6])|52\.([1-9]|1[0-9]|2[0-4])|55\.([1-9]|[1-6][0-9]|7[0-4])|56\.([1-9]|[1-9][0-9]|1[0-2][0-9]|13[0-1]))\b

\b(Dhp|DHP|dhp)\s?([1-9]|[1-9][0-9]|[1-3][0-9][0-9]|4[0-1][0-9]|42[0-3])\b

\b(Iti|ITI|iti)\s?([1-9]|[1-9][0-9]|10[0-9]|11[0-2])\b

\b(Snp|SNP|snp)\s?((1|3)\.([1-9]|1[0-2])|2\.([1-9]|1[0-4])|4\.([1-9]|1[0-6])|5\.([1-9]|1[0-9]))\b

\b(Thag|THAG|thag)\s?(1\.([1-9]|[1-9][0-9]|1[0-1][0-9]|120)|2\.([1-9]|[1-4][0-9])|3\.([1-9]|1[0-6])|(4|5)\.([1-9]|1[0-2])|6\.([1-9]|1[0-4])|7\.[1-5]|(8|17)\.[1-3]|(9|1[1389]|2[01])\.1|10\.[1-7]|1[245]\.[1-2]|16\.([1-9]|10))\b

\b(Thig|THIG|thig)\s?(1\.([1-9]|1[0-8])|2\.([1-9]|10)|(3|6)\.[1-8]|([489]|1[0-24-6])\.1|5\.([1-9]|1[0-2])|7\.[1-3]|13\.[1-5])\b

\b(Ud|UD|ud)\s?([1-8]\.([1-9]|10))\b

This should be a complete and accurate list of rules now (some tweaks were made, and some missing chapters in Thag and Thig were added).

Also, there is an added option to use : as a separator for SN (for legacy purposes).
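To apply these rules outside the tool, a Python sketch might look like this. It uses three of the rules exactly as listed (DN, MN and Ud); the `RULES` table and `link_up` helper are hypothetical names. The rules are written with JavaScript-style `$2` in the replacement, whereas Python’s `re.sub` writes the same group as `\2`; here a replacement function reads `m.group(2)` instead, which also makes it easy to lowercase the collection name for the URL.

```python
import re

# Three rules copied verbatim from the list above; group 2 holds the sutta number.
RULES = [
    ("DN", r"\b(DN|Dn|dn)\s?([1-9]|[1-2][0-9]|3[0-4])\b"),
    ("MN", r"\b(MN|Mn|mn)\s?([1-9]|[1-9][0-9]|1[0-4][0-9]|15[0-2])\b"),
    ("Ud", r"\b(Ud|UD|ud)\s?([1-8]\.([1-9]|10))\b"),
]

def link_up(text):
    """Apply each rule once, producing Markdown links like the $2 templates."""
    for abbr, pattern in RULES:
        text = re.sub(
            pattern,
            lambda m, abbr=abbr: (
                f"[{abbr} {m.group(2)}]"
                f"(https://suttacentral.net/{abbr.lower()}{m.group(2)}/en/sujato)"
            ),
            text,
        )
    return text

print(link_up("DN 34, MN 152 and Ud 8.10 exist; DN 35, MN 153 and Ud 9.1 do not."))
```

Out-of-range citations such as DN 35 or Ud 9.1 stay plain text, which is the point of spelling out each collection’s ranges. Each rule runs once over the text; with these three abbreviations, a later rule does not match inside the links an earlier one produced.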


Sorcery, I say! :man_mage:t2:

I’ll give it a try. Couple of things:

I’d recommend not including Sn as a possible citation for Saṁyutta Nikāya since it is a standard (if not outdated in a digital world) abbreviation for Sutta Nipāta. You would rarely see it for SN. In fact, I wouldn’t bother with Dn, Mn, or An either.

Is there a tool you used to create that? It’s amazing.


Here are a couple test cases, just to see what happens:

  • a long link with DN 1 inside
  • DN 1: The very first sutta
  • SN 1: a chapter
  • AN 1.10 is a sutta in a range
  • AN 1.1-10 is the range
  • References sometimes use colons like this: AN 10:5
  • References could occur inside code blocks MN 5 like this
  • lowercase AN is an actual word, though it shouldn’t be followed by numbers “in the wild”; still, I might make an 3xample just to try!
  • What about italic: Thag 1.1 and bold: Ud 1.2?
  • And what about deleted Thig 1.1 text?
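A quick way to check cases like these against the posted rules is to run them through Python’s `re` module. The sketch below (a hypothetical `cited` helper, not part of either tool) uses the DN, MN and AN rules posted earlier in the thread and reports what each would link. Under those rules, `AN 1.1-10` yields only `AN 1.1`, the colon form `AN 10:5` is not matched, `an 3xample` is left alone, and a citation inside backticks is still matched, since the regex knows nothing about Markdown.

```python
import re

# DN, MN and AN rules copied from the list above (AN split over several lines).
DN = r"\b(DN|Dn|dn)\s?([1-9]|[1-2][0-9]|3[0-4])\b"
MN = r"\b(MN|Mn|mn)\s?([1-9]|[1-9][0-9]|1[0-4][0-9]|15[0-2])\b"
AN = (
    r"\b(AN|An|an)\s?("
    r"(1|8)\.([1-9]|[1-9][0-9]|[1-5][0-9][0-9]|6[0-1][0-9]|62[0-7])"
    r"|2\.([1-9]|[1-9][0-9]|[1-3][0-9][0-9]|4[0-7][0-9])"
    r"|3\.([1-9]|[1-9][0-9]|[1-2][0-9][0-9]|3[0-4][0-9]|35[0-2])"
    r"|4\.([1-9]|[1-9][0-9]|[1-6][0-9][0-9]|7[0-7][0-9]|78[0-3])"
    r"|5\.([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-4][0-9]|115[0-2])"
    r"|6\.([1-9]|[1-9][0-9]|[1-5][0-9][0-9]|6[0-4][0-9])"
    r"|7\.([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-1][0-9]|112[0-4])"
    r"|9\.([1-9]|[1-9][0-9]|[1-3][0-9][0-9]|4[0-2][0-9]|43[0-2])"
    r"|10\.([1-9]|[1-9][0-9]|[1-6][0-9][0-9]|7[0-3][0-9]|74[0-6])"
    r"|11\.([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-4][0-9]|115[0-1])"
    r")\b"
)

def cited(text):
    """Return 'ABBR number' for every match, in rule order (DN, MN, AN)."""
    return [m.group(1) + " " + m.group(2)
            for rule in (DN, MN, AN) for m in re.finditer(rule, text)]

CASES = [
    "a long link with DN 1 inside",
    "DN 1: The very first sutta",
    "AN 1.10 is a sutta in a range",
    "AN 1.1-10 is the range",
    "References sometimes use colons like this: AN 10:5",
    "References could occur inside code blocks `MN 5` like this",
    "I might make an 3xample just to try!",
]

for case in CASES:
    print(f"{case!r} -> {cited(case)}")
```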

Yes, if at all possible, colons as well as periods should be acceptable as separators.
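If both separators are accepted, the chapter and sutta can be captured separately and always rejoined with a period for the URL. A Python sketch using a hypothetical variant of the Ud rule, where `[.:]` stands in for the literal period (`UD` and `link_ud` are illustrative names):

```python
import re

# Hypothetical variant of the Ud rule: chapter and sutta captured separately,
# with either "." or ":" accepted between them.
UD = re.compile(r"\b(Ud|UD|ud)\s?([1-8])[.:]([1-9]|10)\b")

def link_ud(text):
    def repl(m):
        # The URL always uses a period, whichever separator the citation used.
        number = f"{m.group(2)}.{m.group(3)}"
        return f"[Ud {number}](https://suttacentral.net/ud{number}/en/sujato)"
    return UD.sub(repl, text)

print(link_ud("Ud 1.10 and Ud 1:10"))
```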

I think the very nature of code blocks is that nothing should touch them, eh? I believe that the default behaviour of the code being used ignores things inside of code blocks.

By work, you mean…?

Personally, I think it would be fine to require a standard capitalization (e.g. DN, AN, Snp, Dhp). That might even give people the option of not having citations linked up automatically if they desired.

My idea here is simply to think of edge cases. What the behavior “should” be in these cases is up to y’all! But, fwiw, I agree that code block refs should probably be left unlinkafied.

Ah, I got confused as to which thread we were in. :man_facepalming:t2: Yes, with the tool I wrote, things inside code blocks will get linked. That’s not good.
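One way to leave code alone, sketched in Python under the assumption that the input is Markdown: split the text on fenced blocks and inline code spans, and run the linker only on the pieces in between. The code pattern is simplified (it ignores indented code blocks and multi-backtick spans), the DN rule is the one posted earlier in the thread, and `link_outside_code` is a hypothetical name.

```python
import re

# Fenced blocks (```...```) or inline code spans (`...`). Simplified: indented
# code blocks and spans delimited by longer backtick runs are not handled.
CODE = re.compile(r"(```.*?```|`[^`\n]*`)", re.DOTALL)
DN = re.compile(r"\b(DN|Dn|dn)\s?([1-9]|[1-2][0-9]|3[0-4])\b")

def link_dn(prose):
    return DN.sub(
        lambda m: f"[DN {m.group(2)}](https://suttacentral.net/dn{m.group(2)}/en/sujato)",
        prose,
    )

def link_outside_code(text):
    # Because CODE has one capturing group, re.split keeps the code segments
    # in the result at odd indices; only the even (prose) pieces are linked.
    parts = CODE.split(text)
    return "".join(part if i % 2 else link_dn(part) for i, part in enumerate(parts))

print(link_outside_code("DN 2 is linked, but `DN 2` is not."))
```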


For what it’s worth, those test cases should also apply to @musiko’s changes here on SuttaCentral Dungeons and Dragons Discuss and Discover :slight_smile:

Yes, several of these do break the linkup tool. I didn’t really consider people pasting in HTML text. I should probably just say that linking up code is unsupported / use at your own risk. I think there might be too many edge cases to deal with.

Ranges are already unsupported: the link is created only for the first part of the range.

But the “an 3example” case is a bug, and I will fix it.

Thanks!


Simply edit Sn out of (SN|Sn|sn).
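In Python, that edit is a one-word change to the rule. The sketch below trims the posted SN rule to its SN 2 branch for brevity (`SN2_RULE`, `SN2_RULE_NO_SN` and `matches` are hypothetical names) and shows that, once `Sn` is removed, a Sutta Nipāta-style `Sn 2.12` is no longer picked up. Matching is case-sensitive by default, which is what makes the edit effective.

```python
import re

# Excerpt of the posted SN rule, trimmed to the SN 2 branch (2.1 to 2.30).
SN2_RULE = r"\b(SN|Sn|sn)\s?(2\.([1-9]|[1-2][0-9]|30))\b"
# The same rule with "Sn" edited out of the abbreviation group.
SN2_RULE_NO_SN = SN2_RULE.replace("(SN|Sn|sn)", "(SN|sn)")

def matches(rule, text):
    """Return the full text of every match of rule in text."""
    return [m.group(0) for m in re.finditer(rule, text)]

print(matches(SN2_RULE, "SN 2.12 versus Sn 2.12"))        # ['SN 2.12', 'Sn 2.12']
print(matches(SN2_RULE_NO_SN, "SN 2.12 versus Sn 2.12"))  # ['SN 2.12']
```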

It might not seem so at first glance, but these are really simple regex rules (basically just matching numerical sequences 1…n).

The hard part was finding the ranges for each collection (I extracted them from sc-data/sc_bilara_data/html/pli/ms/sutta at master · suttacentral/sc-data · GitHub). The easier part was constructing the regex ranges by hand ([1-9] matches 1…9, [1-9][0-9] matches 10…99, etc.).
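The hand-built part can also be generated. Below is a Python sketch (a hypothetical `range_regex` helper, not what was used for the posted rules) that produces an alternation matching 1…n in the same style: one alternative per shorter digit length, then, for numbers as long as n, one alternative per digit position that keeps a prefix of n and lowers the next digit. For 34 and 152 it reproduces the DN and MN rules above character for character; the per-collection maximums still have to be looked up as described.

```python
import re

def digit_range(lo, hi):
    """Pattern for one digit between lo and hi inclusive."""
    return str(lo) if lo == hi else f"[{lo}-{hi}]"

def range_regex(n):
    """Regex alternation matching the integers 1..n (no leading zeros)."""
    digits = [int(d) for d in str(n)]
    width = len(digits)
    parts = []
    # Every number with fewer digits than n is in range: [1-9], [1-9][0-9], ...
    for length in range(1, width):
        parts.append("[1-9]" + "[0-9]" * (length - 1))
    # Numbers as long as n: keep n's first i digits, lower digit i, free the rest.
    for i, d in enumerate(digits):
        lo = 1 if i == 0 else 0          # no leading zero
        last = i == width - 1
        hi = d if last else d - 1        # only the final digit may equal n's digit
        if lo <= hi:
            prefix = "".join(str(x) for x in digits[:i])
            parts.append(prefix + digit_range(lo, hi) + "[0-9]" * (width - 1 - i))
    return "|".join(parts)

print(range_regex(34))   # [1-9]|[1-2][0-9]|3[0-4]
print(range_regex(152))  # [1-9]|[1-9][0-9]|1[0-4][0-9]|15[0-2]

# Slotting the generated alternation into a rule rebuilds the DN rule.
DN_RULE = rf"\b(DN|Dn|dn)\s?({range_regex(34)})\b"
print(re.search(DN_RULE, "See DN 34.").group(2))  # 34
```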


Right. I made that comment when I thought we were discussing the feature on the forum.

Thanks!


Implementation on the forum is simplified due to the current constraints on using |, so the rule here (for the time being) is simply to match the abbreviation case-insensitively (which will also match dN 1).
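In Python terms, that simplification corresponds to writing the abbreviation once and turning on case-insensitive matching. This is only an illustration of the trade-off, not the forum’s actual configuration, which isn’t shown in the thread; `DN_ANY_CASE` and `dn_citations` are hypothetical names.

```python
import re

# One spelling of the abbreviation plus re.IGNORECASE replaces (DN|Dn|dn),
# at the cost of also accepting mixed case such as "dN".
DN_ANY_CASE = re.compile(r"\b(dn)\s?([1-9]|[1-2][0-9]|3[0-4])\b", re.IGNORECASE)

def dn_citations(text):
    return [m.group(0) for m in DN_ANY_CASE.finditer(text)]

print(dn_citations("DN 1, Dn 2, dn 3 and dN 4"))  # ['DN 1', 'Dn 2', 'dn 3', 'dN 4']
```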


@Snowbird please replace all the rules in your tool. This should be the final version of the rules :grin:

Added support for legacy : notation for SN, AN, Ud and Snp.

Rules using : as a separator are not exact (i.e. they will map some non-existent citations to invalid links). Use $3.$4 for substitution and map each regex below to the corresponding SC ID:

  • \b(SN...)\b → [SN $3.$4](https://suttacentral.net/sn$3.$4/en/sujato)
  • \b(AN...)\b → [AN $3.$4](https://suttacentral.net/an$3.$4/en/sujato)
  • \b(Ud...)\b → [Ud $3.$4](https://suttacentral.net/ud$3.$4/en/sujato)
  • \b(Snp...)\b → [Snp $3.$4](https://suttacentral.net/snp$3.$4/en/sujato)
\b(SN|Sn|sn)\s?(([1-9]|[1-5][0-9]):([1-9]|[1-9][0-9]))\b

\b(SN|Sn|sn)\s?((12|22|31|35|45|46|47|48|50|56):([12][0-9][0-9]))\b

\b(AN|An|an)\s?(([1-9]|1[01]):([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-5][0-9]))\b

\b(Ud|UD|ud)\s?(([1-8]):([1-9]|10))\b

\b(Snp|SNP|snp)\s?(([1-5]):([1-9]|1[0-9]))\b
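Applied in Python, the $3.$4 substitution reads groups 3 and 4 (chapter and sutta) and joins them with a period. This sketch uses the AN and Ud colon rules exactly as listed; `COLON_RULES` and `link_colons` are hypothetical names.

```python
import re

# Colon rules copied from the list above; groups 3 and 4 are chapter and sutta.
COLON_RULES = [
    ("AN", r"\b(AN|An|an)\s?(([1-9]|1[01]):([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|10[0-9][0-9]|11[0-5][0-9]))\b"),
    ("Ud", r"\b(Ud|UD|ud)\s?(([1-8]):([1-9]|10))\b"),
]

def link_colons(text):
    """Rewrite legacy "AN 10:5" style citations as period-separated SC links ($3.$4)."""
    for abbr, pattern in COLON_RULES:
        text = re.sub(
            pattern,
            lambda m, abbr=abbr: (
                f"[{abbr} {m.group(3)}.{m.group(4)}]"
                f"(https://suttacentral.net/{abbr.lower()}{m.group(3)}.{m.group(4)}/en/sujato)"
            ),
            text,
        )
    return text

print(link_colons("AN 10:5 and Ud 2:3"))
```

As the note above says, these colon rules are approximate: `AN 11:1159` is also rewritten, although the period-based AN rule stops at 11.1151.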

Thanks! I’ll give it a spin.
