Automatically linking up citations to suttas on D&D

Ah, it actually wasn’t clear to me! I think dotted underlines usually indicate a definition is available if you hover, eh? From the standpoint of a general reader of posts, will it really matter if they were auto linked or not? I guess my point is that for that general reader, they would have no idea that dotted links were auto generated or even what auto generated links were at all.

I think for usability even the auto generated links should be styled in the same way as regular links, possibly with the addition of dotted underline.

As a side note, I’m an Accessibility Dark Theme user so for me the links are usually yellow, which I think is great. But to see what was going on, I tried out the Accessibility Light Theme and I find the gray color of the regular links to be awful. I don’t see how gray text could be considered accessible at all. Usually it indicates something is unavailable or possibly a visited link. I find it really hard to see where the links are. But I guess if no one else has complained then it’s not an issue? If it was really going to be an accessible theme, the links should be underlined for maximum usability. Anyway, just my $0.02.

@musiko, Is this live on the site now? Looks like it is. Amazing!

Edit: Just for completeness, I tried the legacy theme and it looks like auto generated and regular links are both styled the same. I’d also note that the very thin yellow underline on links doesn’t seem great for accessibility either.

1 Like

Once an auto link is quoted, it appears like a regular link, which I think is just fine. :+1:t2:

1 Like

This just in: all major collections are now supported. No need to copy and paste links to the suttas from the browser any more; simply put the abbreviation of the specific sutta in the post and let Discourse do its magic (may require a browser refresh once).

It’s best to use standard abbreviations: DN, MN, SN, AN, Dhp, Iti, Snp, Thag, Thig and Ud for consistency, but upper, proper and lowercase all work.

DN MN Dhp Iti
DN0 MN0 DHP0 ITI0
Dn0 Mn0 Dhp 0 Iti 0
dn 0 mn 0 dhp 0 iti 0
DN1 MN1 DHP1 ITI1
Dn1 Mn1 Dhp1 Iti1
dn1 mn1 dhp1 iti1
DN 34 MN 152 DHP 423 ITI 112
Dn 34 Mn 152 Dhp 423 Iti 112
dn34 mn152 dhp423 iti112
DN35 MN153 DHP424 ITI113
Dn35 Mn153 Dhp 424 Iti 113
dn 35 mn 153 dhp 424 iti 113
an 0 1 2 max max+1
0 an 0.0 AN0.1 An0.2 an0.9999 an0.10000
1 AN 1.0 AN1.1 An1.2 an1.627 an1.628
2 An2.0 AN2.1 An2.2 an2.479 an 2.480
11 an11.0 AN 11.1 An 11.2 an 11.1151 an 11.1152
12 an12.0 AN 12.1 An 12.2 an 12.9999 an 12.10000
sn 0 1 2 max max+1
0 sn 0.0 SN0.1 Sn0.2 sn0.9999 sn0.10000
1 SN 1.0 SN1.1 Sn1.2 sn1.81 sn1.82
2 Sn2.0 SN2.1 Sn2.2 sn2.30 sn 2.31
56 sn56.0 SN 56.1 Sn 56.2 sn 56.131 sn 56.132
57 sn57.0 SN 57.1 Sn 57.2 sn 57.9999 sn 57.10000
snp 0 1 2 max max+1
0 snp 0.0 SNP0.1 Snp0.2 snp0.9999 snp0.10000
1 SNP 1.0 SNP1.1 Snp1.2 snp1.12 snp1.13
2 Snp2.0 SNP2.1 Snp2.2 snp2.14 snp 2.15
5 snp5.0 SNP 5.1 Snp 5.2 snp 5.19 snp 5.20
6 snp6.0 SNP 6.1 Snp 6.2 snp 6.9999 snp 6.10000
thag 0 1 2 max max+1
0 thag 0.0 THAG0.1 Thag0.2 thag0.9999 thag0.10000
1 THAG 1.0 THAG1.1 Thag1.2 thag1.120 thag1.121
2 Thag2.0 THAG2.1 Thag2.2 thag2.49 thag 2.50
9 thag9.0 THAG 9.1 Thag 9.2 thag 9.1 thag 9.2
10 thag10.0 THAG 10.1 Thag 10.2 thag 10.9999 thag 10.10000
thig 0 1 2 max max+1
0 thig 0.0 THIG0.1 Thig0.2 thig0.9999 thig0.10000
1 THIG 1.0 THIG1.1 Thig1.2 thig1.18 thig1.19
2 Thig2.0 THIG2.1 Thig2.2 thig2.10 thig 2.11
16 thig16.0 THIG 16.1 Thig 16.2 thig 16.1 thig 16.2
17 thig17.0 THIG 17.1 Thig 17.2 thig 17.9999 thig 17.10000
ud 0 1 2 max max+1
0 ud 0.0 UD0.1 Ud0.2 ud0.9999 ud0.10000
1 UD 1.0 UD1.1 Ud1.2 ud1.10 ud1.11
2 Ud2.0 UD2.1 Ud2.2 ud2.10 ud 2.11
8 ud8.0 UD 8.1 Ud 8.2 ud 8.10 ud 8.11
9 ud9.0 UD 9.1 Ud 9.2 ud 9.9999 ud 9.10000
5 Likes

Sadhu sadhu!!!

Great work. It’s really impressive. This is a great enhancement to the forum.

BTW, for me some of the citations linked and some didn’t, but then when I did a hard refresh they all did. :+1:t2:

3 Likes

From my test cases, the only thing I’d ask for would be colon support:

  • SN 5:2

Some omissions were fixed; all suttas should now be automatically linkified (requires a browser refresh). Here’s a list of all supported collections with the corresponding last sutta IDs:

dhp dn iti mn
Dhp 423 DN 34 Iti 112 MN 152
an 8122
an1 AN 1.627
an2 AN 2.479
an3 AN 3.352
an4 AN 4.783
an5 AN 5.1152
an6 AN 6.649
an7 AN 7.1124
an8 AN 8.627
an9 AN 9.432
an10 AN 10.746
an11 AN 11.1151
sn 3024
sn1 SN 1.81
sn2 SN 2.30
sn3 SN 3.25
sn4 SN 4.25
sn5 SN 5.10
sn6 SN 6.15
sn7 SN 7.22
sn8 SN 8.12
sn9 SN 9.14
sn10 SN 10.12
sn11 SN 11.25
sn12 SN 12.213
sn13 SN 13.11
sn14 SN 14.39
sn15 SN 15.20
sn16 SN 16.13
sn17 SN 17.43
sn18 SN 18.22
sn19 SN 19.21
sn20 SN 20.12
sn21 SN 21.12
sn22 SN 22.159
sn23 SN 23.46
sn24 SN 24.96
sn25 SN 25.10
sn26 SN 26.10
sn27 SN 27.10
sn28 SN 28.10
sn29 SN 29.50
sn30 SN 30.46
sn31 SN 31.112
sn32 SN 32.57
sn33 SN 33.55
sn34 SN 34.55
sn35 SN 35.248
sn36 SN 36.31
sn37 SN 37.34
sn38 SN 38.16
sn39 SN 39.16
sn40 SN 40.11
sn41 SN 41.10
sn42 SN 42.13
sn43 SN 43.44
sn44 SN 44.11
sn45 SN 45.180
sn46 SN 46.184
sn47 SN 47.104
sn48 SN 48.178
sn49 SN 49.54
sn50 SN 50.108
sn51 SN 51.86
sn52 SN 52.24
sn53 SN 53.54
sn54 SN 54.20
sn55 SN 55.74
sn56 SN 56.131
snp 73
snp1 Snp 1.12
snp2 Snp 2.14
snp3 Snp 3.12
snp4 Snp 4.16
snp5 Snp 5.19
thag 264
thag1 Thag 1.120
thag2 Thag 2.49
thag3 Thag 3.16
thag4 Thag 4.12
thag5 Thag 5.12
thag6 Thag 6.14
thag7 Thag 7.5
thag8 Thag 8.3
thag9 Thag 9.1
thag10 Thag 10.7
thag11 Thag 11.1
thag12 Thag 12.2
thag13 Thag 13.1
thag14 Thag 14.2
thag15 Thag 15.2
thag16 Thag 16.10
thag17 Thag 17.3
thag18 Thag 18.1
thag19 Thag 19.1
thag20 Thag 20.1
thag21 Thag 21.1
thig 73
thig1 Thig 1.18
thig2 Thig 2.10
thig3 Thig 3.8
thig4 Thig 4.1
thig5 Thig 5.12
thig6 Thig 6.8
thig7 Thig 7.3
thig8 Thig 8.1
thig9 Thig 9.1
thig10 Thig 10.1
thig11 Thig 11.1
thig12 Thig 12.1
thig13 Thig 13.5
thig14 Thig 14.1
thig15 Thig 15.1
thig16 Thig 16.1
ud 80
ud1 Ud 1.10
ud2 Ud 2.10
ud3 Ud 3.10
ud4 Ud 4.10
ud5 Ud 5.10
ud6 Ud 6.10
ud7 Ud 7.10
ud8 Ud 8.10

should work now (requires browser refresh).

3 Likes

It linkifies but sends me to a malformed URL. The : should be replaced with a . in the URL :slight_smile:

I have also seen the : used in the AN, Ud and Snp. Would be nice to accommodate that too.

1 Like

Unfortunately, the : as an optional separator proved to be trickier than it looked. I’m removing this option for now (please refresh your browser) and will investigate further.

You would also want to replace the SN rule in your tool with the updated one here: Citation Link Up tool 🔧 to turn citations into links for a block of text - #7 by musiko

Sorry, I’m having a little bit of a hard time tracking… is the : separator problematic for regex in general, or just for the Discourse implementation?

It’s the same regex rule, so both (the problem is not the engine but the rule itself, which I’ll try and fix).

Currently the rule is not capturing the left and the right number separately and this prevents linking to the correct id.

2 Likes

These should all work now (for legacy purposes only! Use dot notation for all new entries to D&D), with a caveat: legacy notation uses simplified rules, which in some cases match non-existent sutta IDs and produce invalid SC links!

1 Like

Thanks! So, to be clear, if I write AN 4:9 it’ll work, you just recommend using dots (An 4.9) instead? Or you won’t support colons going forward?

1 Like

The colon rule will map all existing entries on D&D. While it will also work for new entries, I recommend using dot notation as it is more precise (unlike the colon rule, it will not map to invalid SC links).

This is so great! It’s fun to see it all over the site now.

Just out of curiosity, could you share an example of what doesn’t link right with a colon?

1 Like

Basically anything that has the last number larger than the last existing sutta ID in that chapter (see table Automatically linking up citations to suttas on D&D - #30 by musiko).

For example:

  • sn50:108 but not sn50:109
  • an8:627 but not an8:628
  • snp2:14 but not snp2:15

Dot notation doesn’t allow invalid IDs:

  • sn50.108 but not sn50.109
  • an8.627 but not an8.628
  • snp2.14 but not snp2.15
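
To make the difference concrete, here is a minimal Python sketch for a single chapter, SN 50 (last sutta ID 108 according to the table). Both rules are hypothetical stand-ins written for this illustration, not the rules actually installed on D&D, and the `matches` helper is made up as well.

```python
import re

# Hypothetical exact rule for SN 50 only: the number after the dot
# must be in the range 1..108 (the last existing sutta ID).
DOT_RULE = re.compile(r"\bsn\s?50\.([1-9]\d?|10[0-8])\b", re.IGNORECASE)

# Hypothetical simplified legacy rule: any chapter, any number after the colon.
COLON_RULE = re.compile(r"\bsn\s?(\d+):(\d+)\b", re.IGNORECASE)

def matches(rule, text):
    """Return every citation in `text` that the given rule would link."""
    return [m.group(0) for m in rule.finditer(text)]

print(matches(DOT_RULE, "sn50.108 sn50.109"))    # only the valid ID
print(matches(COLON_RULE, "sn50:108 sn50:109"))  # both, including the invalid one
```

The exact rule needs one such range per chapter, which is why the dot rules are long; the simplified rule is short but will happily link sn50:109.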
2 Likes

Why can’t you just use [:\.] in the regex?

2 Likes

This is because of the way capturing groups work.

Let’s say we have strings ranging from SC1.1 to SC9.9. Then the corresponding regex to match them would be

\bSC\s?([1-9]\.[1-9])\b

where

  • \b matches word boundary
  • \s matches whitespace character
    • ? repeated 0 or 1 times
  • () is a capturing group
    • [1-9] matches range of digits from 1–9
    • \. matches dot character (unescaped dot is a special character representing any character)

Capturing groups are enumerated (from left to right) and can be used in substitution rules using $ and group number, in this case $1.

This regex rule matches all strings from SC1.1 to SC9.9, and captures 1.1–9.9.

The substitution rule can then be https://example.com/sc$1, which will produce https://example.com/sc1.1 to https://example.com/sc9.9
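
For anyone who wants to try this outside Discourse, a minimal Python sketch of the same rule follows. Note that Python's `re.sub` spells the group reference `\1` rather than `$1`; the `linkify` name is made up for this example.

```python
import re

# Rule from the post: matches SC1.1 to SC9.9 and captures the "1.1" part.
PATTERN = re.compile(r"\bSC\s?([1-9]\.[1-9])\b")

def linkify(text: str) -> str:
    # \1 is Python's spelling of the $1 group reference.
    return PATTERN.sub(r"https://example.com/sc\1", text)

print(linkify("See SC1.1 and SC 9.9, but not SC0.5."))
```

SC1.1 and SC 9.9 are replaced by links, while SC0.5 is left alone because 0 is outside the [1-9] range.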

If we want to match both . and : as the separator (SC1.1 to SC9.9 and SC1:1 to SC9:9), the regex to match all strings would be

\bSC\s?([1-9][\.:][1-9])\b

but the capture this time would be 1.1–9.9 and 1:1–9:9. We cannot use $1 for substitution this time, because the link would be wrong for captures with :.

We can adjust the regex like this (note the two capturing groups)

\bSC\s?([1-9])[\.:]([1-9])\b

and then construct the substitution rule as https://example.com/sc$1.$2 (we capture two separate digits and manually substitute the separator with a dot).

This seemingly works, so why is it not working in our real-case scenario?
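
As a small Python sketch of the contrast (again with `\1` and `\2` in place of `$1` and `$2`; the names `NAIVE`, `TWO_GROUPS` and `linkify` are made up for this example):

```python
import re

# One capturing group around the whole ID: the separator ends up in the link.
NAIVE = re.compile(r"\bSC\s?([1-9][.:][1-9])\b")

# Two capturing groups: the separator is matched but not captured.
TWO_GROUPS = re.compile(r"\bSC\s?([1-9])[.:]([1-9])\b")

def linkify(text: str) -> str:
    # The dot between \1 and \2 is written by hand in the replacement.
    return TWO_GROUPS.sub(r"https://example.com/sc\1.\2", text)

print(NAIVE.sub(r"https://example.com/sc\1", "SC1:1"))  # link keeps the colon
print(linkify("SC1:1"))                                  # link uses a dot
```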

Note that all SC ranges in our example were of the same size (1.1–1.9, 2.1–2.8 would not be; ours were 1.1–1.9, 2.1–2.9, …, 9.1–9.9). What if this was not so? Let’s say our ranges shrink progressively, for example 1.1–1.9, 2.1–2.8, …, 8.1–8.2 and 9.1. Now we cannot construct a simple matching regex rule if we only want to match these specific ranges. The rule is a bit more complex, but still a single rule

\bSC\s?(?:(1)[\.:]([1-9])|(2)[\.:]([1-8])|(3)[\.:]([1-7])|(4)[\.:]([1-6])|(5)[\.:]([1-5])|(6)[\.:]([1-4])|(7)[\.:]([1-3])|(8)[\.:]([12])|(9)[\.:](1))\b

where

  • (?:) is a non-capturing group (the outermost group)
  • | is the OR operator (in this case it helps make separate matches for 1.1–1.9, 2.1–2.8, …, 8.1–8.2 and 9.1)

But there are more than two capturing groups now (nine pairs) and they are all enumerated from left to right, which makes it impossible to construct the replacement rule as before: $1.$2, because this will only substitute 1.1–1.9, but not the others.

For this to work the one rule must be either broken into nine separate rules, or we must accept that we will match a wider range, and by that capture some invalid combinations $1.$2 for the generated link.
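
The numbering problem is easy to see in a Python sketch. The pattern is the rule from above; everything else (`naive_linkify`, `to_link`, `linkify`) is made up for this illustration, and the last part steps outside the regex-only restriction, so it shows what a less restricted environment could do rather than something the D&D component supports.

```python
import re

# The shrinking-range rule from the post: nine pairs of capturing groups.
PATTERN = re.compile(
    r"\bSC\s?(?:(1)[.:]([1-9])|(2)[.:]([1-8])|(3)[.:]([1-7])|(4)[.:]([1-6])"
    r"|(5)[.:]([1-5])|(6)[.:]([1-4])|(7)[.:]([1-3])|(8)[.:]([12])|(9)[.:](1))\b"
)

# For "SC2:3" the digits land in groups 3 and 4; groups 1 and 2 stay unmatched.
m = PATTERN.search("SC2:3")
print(m.group(1), m.group(2), m.group(3), m.group(4))

def naive_linkify(text: str) -> str:
    # Same idea as $1.$2: only correct when the first alternative matched.
    return PATTERN.sub(r"https://example.com/sc\1.\2", text)

def to_link(match: re.Match) -> str:
    # Not regex-only: pick whichever pair of groups actually matched.
    chapter, number = (g for g in match.groups() if g is not None)
    return f"https://example.com/sc{chapter}.{number}"

def linkify(text: str) -> str:
    return PATTERN.sub(to_link, text)

print(naive_linkify("SC2:3"))  # broken link, the groups used are empty
print(linkify("SC2:3"))        # link built from the matched pair
```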

2 Likes

Thank you sooo much for this reply. I appreciate all the time it must have taken to create.

I think I am starting to get it now. The core issue seems to be that while you could easily capture exclusively correct citations with a :, once captured you would not be able to use what you captured in a url, because the url would not work. If we weren’t restricted to regex only, then the solution would be as simple as doing a .replace(":",".")

Am I understanding correctly? And that if we were to create a regex for each of the chapters in the AN and SN that would be a technical solution (although possibly too much work for the software to do quickly enough?)

In my link up app, I’m not restricted to regex only. I just did that because I was lazy. Or maybe I could call it “getting a minimal viable product to market quickly.” :rofl:

In my Citation Helper app, since I am dealing with multiple websites, I have a basic structure object that defines what is allowable as a citation, and then I have an object for each website (e.g. SuttaCentral) that specifies what suttas are available and what, if any, are only available in a range.
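
Purely as an illustration of that kind of structure (this is not the actual Citation Helper code; every name here is hypothetical), a lookup table of last sutta IDs per chapter could look like this in Python. The Snp and Ud numbers are copied from the table earlier in the thread, and example.com stands in for the real site.

```python
import re

# Hypothetical per-site table: last sutta ID in each chapter
# (Snp and Ud values taken from the table earlier in the thread).
LAST_ID = {
    "snp": {1: 12, 2: 14, 3: 12, 4: 16, 5: 19},
    "ud": {chapter: 10 for chapter in range(1, 9)},
}

# Loose pattern: abbreviation, optional space, chapter, "." or ":", number.
CITATION = re.compile(r"\b(snp|ud)\s?(\d+)[.:](\d+)\b", re.IGNORECASE)

def is_valid(abbr: str, chapter: int, number: int) -> bool:
    last = LAST_ID.get(abbr.lower(), {}).get(chapter)
    return last is not None and 1 <= number <= last

def linkify(text: str) -> str:
    def to_link(m: re.Match) -> str:
        abbr, chapter, number = m.group(1).lower(), int(m.group(2)), int(m.group(3))
        if not is_valid(abbr, chapter, number):
            return m.group(0)  # leave citations outside the table untouched
        return f"https://example.com/{abbr}{chapter}.{number}"
    return CITATION.sub(to_link, text)

print(linkify("Snp 2:14 exists, snp2.15 does not."))
```

Validation lives in the table instead of in the pattern, so the regex can stay loose and both separators work.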

Obviously that is way more complicated and not suitable for the situation here without writing a whole new plugin. Personally I do think that your regex solution is a perfectly good solution here.

Thanks! I’m so happy seeing all the linked up citations as I browse the forum. I wonder if this will lead to more people taking the time to click to the suttas being discussed.

4 Likes

Exactly. The component on D&D only accepts regex, but you have more freedom in your environment. The easiest way to have both notations work correctly would be to match and replace using the exact regex first, and then run .replace(":", ".") on the result.
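
A rough Python sketch of that two-step idea, using the shrinking SC ranges from the earlier example (the names are made up for this illustration). One detail worth noting as a design choice: the replace is applied to the captured ID only, because running it over the whole link would also turn the : in https:// into a dot.

```python
import re

# Exact rule for the shrinking ranges, with one capturing group around
# the whole ID and either separator allowed.
PATTERN = re.compile(
    r"\bSC\s?(1[.:][1-9]|2[.:][1-8]|3[.:][1-7]|4[.:][1-6]|5[.:][1-5]"
    r"|6[.:][1-4]|7[.:][1-3]|8[.:][12]|9[.:]1)\b"
)

def to_link(match: re.Match) -> str:
    # Normalise the separator on the captured ID only.
    return "https://example.com/sc" + match.group(1).replace(":", ".")

def linkify(text: str) -> str:
    return PATTERN.sub(to_link, text)

print(linkify("SC2:8 is valid, SC2:9 is not."))
```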

Too much work for the human :grin:, software can process heaps.

Laziness is the mother of invention.

Probably doable using regex too, but whatever works for you is OK. I must always remind myself there are different tools for different jobs (Law of the instrument - Wikipedia).

It seems to work nicely and I’m also happy that you nudged me to do it, D&D looks way cooler with all these links.

3 Likes