Automatically linking up citations to suttas on D&D

From my test cases, the only thing I’d ask for would be colon support:

  • SN 5:2

Some omissions were fixed; all suttas should now be automatically linkified (requires browser refresh). Here’s a list of all supported collections with corresponding last sutta IDs:

dhp Dhp 423
dn DN 34
iti Iti 112
mn MN 152
an 8122
an1 AN 1.627
an2 AN 2.479
an3 AN 3.352
an4 AN 4.783
an5 AN 5.1152
an6 AN 6.649
an7 AN 7.1124
an8 AN 8.627
an9 AN 9.432
an10 AN 10.746
an11 AN 11.1151
sn 3024
sn1 SN 1.81
sn2 SN 2.30
sn3 SN 3.25
sn4 SN 4.25
sn5 SN 5.10
sn6 SN 6.15
sn7 SN 7.22
sn8 SN 8.12
sn9 SN 9.14
sn10 SN 10.12
sn11 SN 11.25
sn12 SN 12.213
sn13 SN 13.11
sn14 SN 14.39
sn15 SN 15.20
sn16 SN 16.13
sn17 SN 17.43
sn18 SN 18.22
sn19 SN 19.21
sn20 SN 20.12
sn21 SN 21.12
sn22 SN 22.159
sn23 SN 23.46
sn24 SN 24.96
sn25 SN 25.10
sn26 SN 26.10
sn27 SN 27.10
sn28 SN 28.10
sn29 SN 29.50
sn30 SN 30.46
sn31 SN 31.112
sn32 SN 32.57
sn33 SN 33.55
sn34 SN 34.55
sn35 SN 35.248
sn36 SN 36.31
sn37 SN 37.34
sn38 SN 38.16
sn39 SN 39.16
sn40 SN 40.11
sn41 SN 41.10
sn42 SN 42.13
sn43 SN 43.44
sn44 SN 44.11
sn45 SN 45.180
sn46 SN 46.184
sn47 SN 47.104
sn48 SN 48.178
sn49 SN 49.54
sn50 SN 50.108
sn51 SN 51.86
sn52 SN 52.24
sn53 SN 53.54
sn54 SN 54.20
sn55 SN 55.74
sn56 SN 56.131
snp 73
snp1 Snp 1.12
snp2 Snp 2.14
snp3 Snp 3.12
snp4 Snp 4.16
snp5 Snp 5.19
thag 264
thag1 Thag 1.120
thag2 Thag 2.49
thag3 Thag 3.16
thag4 Thag 4.12
thag5 Thag 5.12
thag6 Thag 6.14
thag7 Thag 7.5
thag8 Thag 8.3
thag9 Thag 9.1
thag10 Thag 10.7
thag11 Thag 11.1
thag12 Thag 12.2
thag13 Thag 13.1
thag14 Thag 14.2
thag15 Thag 15.2
thag16 Thag 16.10
thag17 Thag 17.3
thag18 Thag 18.1
thag19 Thag 19.1
thag20 Thag 20.1
thag21 Thag 21.1
thig 73
thig1 Thig 1.18
thig2 Thig 2.10
thig3 Thig 3.8
thig4 Thig 4.1
thig5 Thig 5.12
thig6 Thig 6.8
thig7 Thig 7.3
thig8 Thig 8.1
thig9 Thig 9.1
thig10 Thig 10.1
thig11 Thig 11.1
thig12 Thig 12.1
thig13 Thig 13.5
thig14 Thig 14.1
thig15 Thig 15.1
thig16 Thig 16.1
ud 80
ud1 Ud 1.10
ud2 Ud 2.10
ud3 Ud 3.10
ud4 Ud 4.10
ud5 Ud 5.10
ud6 Ud 6.10
ud7 Ud 7.10
ud8 Ud 8.10

should work now (requires browser refresh).

3 Likes

It linkifies but sends me to a malformed URL. The : should be replaced with a . in the URL :slight_smile:

I have also seen the : used in the AN, Ud and Snp. Would be nice to accommodate that too.

1 Like

Unfortunately, the : as an optional separator proved to be more tricky than it looked. I’m removing this option for now (please refresh browser) and will investigate further.

You would also want to replace the SN rule for your tool with the updated one here: Citation Link Up tool 🔧 to turn citations into links for a block of text - #7 by musiko

Sorry, I’m having a little bit of a hard time tracking… is the : separator problematic for regex in general, or just for the Discourse implementation?

It’s the same regex rule, so both (the problem is not the engine but the rule itself, which I’ll try and fix).

Currently the rule is not capturing the left and the right number separately and this prevents linking to the correct id.

2 Likes

These should all work now (for legacy purposes only!—use dot notation for all new entries to D&D), with a caveat: legacy notation uses simplified rules which in some cases match non-existent sutta IDs to invalid SC links!

1 Like

Thanks! So, to be clear, if I use AN 4:9 it’ll work, you just recommend using dots (AN 4.9) instead? Or won’t you support colons going forward?

1 Like

The colon rule will map all existing entries on D&D. While it will also work for new entries, I recommend using dot notation as it is more precise (unlike the colon rule, it will not produce invalid SC links).

This is so great! It’s fun to see it all over the site now.

Just out of curiosity, could you share an example of what doesn’t link right with a colon?

1 Like

Basically anything that has the last number larger than the last existing sutta ID in that chapter (see table Automatically linking up citations to suttas on D&D - #30 by musiko).

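For example, colon notation links the first ID in each pair correctly, but the second points to a non-existent sutta: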

  • sn50:108 but not sn50:109
  • an8:627 but not an8:628
  • snp2:14 but not snp2:15

Dot notation doesn’t allow invalid IDs:

  • sn50.108 but not sn50.109
  • an8.627 but not an8.628
  • snp2.14 but not snp2.15
2 Likes

Why can’t you just use [:\.] in the regex?

2 Likes

This is because of the way capturing groups work.

Let’s say we have strings ranging from SC1.1 to SC9.9. Then the corresponding regex to match them would be

\bSC\s?([1-9]\.[1-9])\b

where

  • \b matches word boundary
  • \s matches whitespace character
    • ? repeated 0 or 1 times
  • () is a capturing group
    • [1-9] matches range of digits from 1–9
    • \. matches dot character (unescaped dot is a special character representing any character)

Capturing groups are enumerated (from left to right) and can be used in substitution rules using $ and group number, in this case $1.

This regex rule matches all strings from SC1.1 to SC9.9, and captures 1.1–9.9.

The substitution rule can then be https://example.com/sc$1, which will produce https://example.com/sc1.1 to https://example.com/sc9.9.
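For anyone who wants to try this outside Discourse, here is a minimal Python sketch of the same rule. The function name `linkify` is made up for illustration, and Python's `re.sub` writes the group reference as `\1` where the rule above uses `$1`; the pattern and the example.com URL are the ones from this post.

```python
import re

# Pattern from the post: "SC", optional whitespace, then digit.digit as group 1.
PATTERN = re.compile(r"\bSC\s?([1-9]\.[1-9])\b")

def linkify(text: str) -> str:
    """Replace every matching citation with its link (hypothetical helper)."""
    # \1 is Python's spelling of the $1 group reference used in the post.
    return PATTERN.sub(r"https://example.com/sc\1", text)

print(linkify("See SC1.1 and SC 9.9."))
```

Strings outside the range, such as SC0.1, are left untouched because the pattern does not match them.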

If we want to match both . and : (SC1.1 to SC9.9 and SC1:1 to SC9:9), the regex to match all strings would be

\bSC\s?([1-9][\.:][1-9])\b

but the capture this time would be 1.1–9.9 and 1:1–9:9. We cannot use $1 for substitution this time, because the link would be wrong for captures with :.

We can adjust the regex like this (note the two capturing groups)

\bSC\s?([1-9])[\.:]([1-9])\b

and then construct the substitution rule as https://example.com/sc$1.$2 (we capture two separate digits and manually substitute the separator with a dot).
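A small Python sketch of the two-group version, under the same assumptions as before (Python's `\1` and `\2` stand in for `$1` and `$2`, and `linkify_both` is a made-up name):

```python
import re

# Two capturing groups: the digit before the separator and the digit after it.
PATTERN = re.compile(r"\bSC\s?([1-9])[\.:]([1-9])\b")

def linkify_both(text: str) -> str:
    """Link SC citations written with either separator (hypothetical helper)."""
    # The separator itself is not captured, so the link always gets a dot.
    return PATTERN.sub(r"https://example.com/sc\1.\2", text)

print(linkify_both("SC1:1 and SC 9.9"))
```

Both notations end up with the same dotted link because the dot comes from the substitution rule, not from the matched text.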

This seemingly works, so why is it not working in our real-case scenario?

Note that all SC ranges in our example were of the same size (1.1–1.9, 2.1–2.9, …, 9.1–9.9). What if this was not so? Let’s say our ranges shrink progressively (for example 1.1–1.9, 2.1–2.8, …, 8.1–8.2 and 9.1). Now we cannot construct a simple matching regex rule, if we only want to match these specific ranges. The rule is a bit more complex, but still a single rule

\bSC\s?(?:(1)[\.:]([1-9])|(2)[\.:]([1-8])|(3)[\.:]([1-7])|(4)[\.:]([1-6])|(5)[\.:]([1-5])|(6)[\.:]([1-4])|(7)[\.:]([1-3])|(8)[\.:]([12])|(9)[\.:](1))\b

where

  • (?:) is a non-capturing group (the outermost group)
  • | is the OR operator (in this case it helps make separate matches for 1.1–1.9, 2.1–2.8, …, 8.1–8.2 and 9.1)

But there are more than two capturing groups now (nine pairs) and they are all enumerated from left to right, which makes it impossible to construct the replacement rule as before: $1.$2, because this will only substitute 1.1–1.9, but not the others.

For this to work the one rule must be either broken into nine separate rules, or we must accept that we will match a wider range, and by that capture some invalid combinations $1.$2 for the generated link.
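To see the numbering problem concretely, here is a Python sketch that runs the long rule from above and reports which groups were actually filled. The helper name `filled_groups` is made up for illustration; the pattern is copied from this post.

```python
import re

# The single strict rule from the post: nine alternatives, 18 capturing groups.
STRICT = re.compile(
    r"\bSC\s?(?:(1)[\.:]([1-9])|(2)[\.:]([1-8])|(3)[\.:]([1-7])|(4)[\.:]([1-6])"
    r"|(5)[\.:]([1-5])|(6)[\.:]([1-4])|(7)[\.:]([1-3])|(8)[\.:]([12])|(9)[\.:](1))\b"
)

def filled_groups(citation: str) -> dict:
    """Return {group number: captured text} for groups that took part in the match."""
    m = STRICT.search(citation)
    if m is None:
        return {}
    return {i: g for i, g in enumerate(m.groups(), start=1) if g is not None}

print(filled_groups("SC1:5"))  # groups 1 and 2 are filled
print(filled_groups("SC2:8"))  # groups 3 and 4 are filled, 1 and 2 are empty
print(filled_groups("SC2:9"))  # no match at all, 2.9 is outside the range
```

Only the first alternative fills groups 1 and 2, which is why a fixed `$1.$2` substitution cannot serve the other eight.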

2 Likes

Thank you sooo much for this reply. I appreciate all the time it must have taken to create.

I think I am starting to get it now. The core issue seems to be that while you could easily capture exclusively correct citations with a :, once captured you would not be able to use what you captured in a url, because the url would not work. If we weren’t restricted to regex only, then the solution would be as simple as doing a .replace(":",".")

Am I understanding correctly? And that if we were to create a regex for each of the chapters in the AN and SN that would be a technical solution (although possibly too much work for the software to do quickly enough?)

In my link up app, I’m not restricted to regex only. I just did that because I was lazy. Or maybe I could call it “getting a minimal viable product to market quickly.” :rofl:

In my Citation Helper app, since I am dealing with multiple websites, I have a basic structure object that defines what is allowable as a citation, and then I have an object for each website (e.g. SuttaCentral) that specifies what suttas are available and what, if any, are only available in a range.
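As a rough illustration of that kind of lookup (not the Citation Helper's actual code, which isn't shown in this thread), here is a hypothetical Python sketch. The names `LAST_ID`, `CITATION` and `normalize` are made up; the three last-sutta numbers are taken from the table earlier in the thread, and suttas that only exist as ranges are not handled.

```python
import re
from typing import Optional

# Last sutta number per chapter, copied from the table earlier in the thread
# (only three chapters, for illustration).
LAST_ID = {"sn50": 108, "an8": 627, "snp2": 14}

# Loose pattern: collection, chapter, separator (dot or colon), sutta number.
CITATION = re.compile(r"\b(sn|an|snp)\s?(\d+)[.:](\d+)\b", re.IGNORECASE)

def normalize(citation: str) -> Optional[str]:
    """Return the dot-notation ID if the citation is within range, else None."""
    m = CITATION.fullmatch(citation.strip())
    if m is None:
        return None
    chapter = f"{m.group(1).lower()}{m.group(2)}"
    number = int(m.group(3))
    # The range check happens in code, so the regex itself can stay simple.
    if 1 <= number <= LAST_ID.get(chapter, 0):
        return f"{chapter}.{number}"
    return None

print(normalize("sn50:108"))
print(normalize("sn50:109"))
```

With the validity check done outside the regex, the colon and dot notations can share one simple pattern.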

Obviously that is way more complicated and not suitable for the situation here without writing a whole new plugin. Personally I do think that your regex solution is a perfectly good solution here.

Thanks! I’m so happy seeing all the linked up citations as I browse the forum. I wonder if this will lead to more people taking the time to click to the suttas being discussed.

4 Likes

Exactly. The component on D&D only accepts regex, but you have more freedom in your environment. The easiest way to have both notations work correctly would be to match and replace using the exact regex first, and then run .replace(":", ".") on the result.
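A minimal sketch of that idea in Python, assuming the shrinking toy ranges from my earlier example (1.1–1.9 down to 9.1) and a made-up function name `linkify_strict`. The whole citation number is captured as one group, and the colon is swapped for a dot in a replacement callback:

```python
import re

# Exact rule for the toy ranges, with ONE capturing group around the number.
STRICT = re.compile(
    r"\bSC\s?(1[.:][1-9]|2[.:][1-8]|3[.:][1-7]|4[.:][1-6]|5[.:][1-5]"
    r"|6[.:][1-4]|7[.:][1-3]|8[.:][12]|9[.:]1)\b"
)

def linkify_strict(text: str) -> str:
    """Link only valid citations, normalizing ':' to '.' (hypothetical helper)."""
    def to_link(m: re.Match) -> str:
        # Match with the exact regex first, then fix the separator in code.
        return "https://example.com/sc" + m.group(1).replace(":", ".")
    return STRICT.sub(to_link, text)

print(linkify_strict("SC2:8 is valid, SC2:9 is not"))
```

This keeps the strict matching and still produces dotted links, which the regex-only component on D&D cannot do.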

Too much work for the human :grin:, software can process heaps.

Laziness is the mother of invention.

Probably doable using regex too, but whatever works for you is OK. I must always remind myself there are different tools for different jobs (Law of the instrument - Wikipedia).

It seems to work nicely and I’m also happy that you nudged me to do it, D&D looks way cooler with all these links.

3 Likes

I concur! :partying_face: It really does make it much easier to double check people’s references and dive into the suttas themselves. Kudos!

2 Likes

Added support for Vinaya (long and shorthand, with or without dashes, and Pli or Pi (for backward compatibility only) in long notation).

All links point to the translation by Bhante @Brahmali (where present; some translations seem not to be published yet, e.g. Bi Sk, Bu Pm and Bi Pm).

Bhikkhu Vibhanga
Pli-Tv-Bu-Vb-Pj4 Bu Pj 4
Pli-Tv-Bu-Vb-Ss13 Bu Ss 13
Pli-Tv-Bu-Vb-Ay2 Bu Ay 2
Pli-Tv-Bu-Vb-Np30 Bu Np 30
Pli-Tv-Bu-Vb-Pc92 Bu Pc 92
Pli-Tv-Bu-Vb-Pd4 Bu Pd 4
Pli-Tv-Bu-Vb-Sk75 Bu Sk 75
Pli-Tv-Bu-Vb-As7 Bu As 7
Bhikkhuni Vibhanga
Pli-Tv-Bi-Vb-Pj8 Bi Pj 8
Pli-Tv-Bi-Vb-Ss13 Bi Ss 13
Pli-Tv-Bi-Vb-Np12 Bi Np 12
Pli-Tv-Bi-Vb-Pc96 Bi Pc 96
Pli-Tv-Bi-Vb-Pd8 Bi Pd 8
Pli-Tv-Bi-Vb-Sk75 Bi Sk 75
Pli-Tv-Bi-Vb-As7 Bi As 7
Khandhaka
Pli-Tv-Kd22 Kd 22
Parivara
Pli-Tv-Pvr1.16 Pvr 1.16
Pli-Tv-Pvr2.16 Pvr 2.16
Pli-Tv-Pvr21 Pvr 21
Patimokkha
Pli-Tv-Bu-Pm Bu Pm
Pli-Tv-Bi-Pm Bi Pm
4 Likes

Thanks, @Musiko. And yes, you are right, some of the pages are not showing on SuttaCentral. I think the problem occurs whenever a single page covers more than one rule. This is so for the bhikkhunī sekhiyas and paṭidesanīyas. Bhante @Sujato, are you able to have a look at this?

1 Like

Could also be related to this issue: Linking together the Bhikkhuni and Bhikkhu rules which are the same

FWIW, I rarely see people giving citations for Vinaya, but it’s great to have them linking when they do.

1 Like