From my test cases, the only thing I’d ask for would be colon support:
- SN 5:2
Some omissions were fixed; all suttas should now be automatically linkified (requires browser refresh). Here’s a list of all supported collections with corresponding last sutta IDs:
dhp | dn | iti | mn |
---|---|---|---|
Dhp 423 | DN 34 | Iti 112 | MN 152 |
an | 8122 |
---|---|
an1 | AN 1.627 |
an2 | AN 2.479 |
an3 | AN 3.352 |
an4 | AN 4.783 |
an5 | AN 5.1152 |
an6 | AN 6.649 |
an7 | AN 7.1124 |
an8 | AN 8.627 |
an9 | AN 9.432 |
an10 | AN 10.746 |
an11 | AN 11.1151 |
sn | 3024 |
---|---|
sn1 | SN 1.81 |
sn2 | SN 2.30 |
sn3 | SN 3.25 |
sn4 | SN 4.25 |
sn5 | SN 5.10 |
sn6 | SN 6.15 |
sn7 | SN 7.22 |
sn8 | SN 8.12 |
sn9 | SN 9.14 |
sn10 | SN 10.12 |
sn11 | SN 11.25 |
sn12 | SN 12.213 |
sn13 | SN 13.11 |
sn14 | SN 14.39 |
sn15 | SN 15.20 |
sn16 | SN 16.13 |
sn17 | SN 17.43 |
sn18 | SN 18.22 |
sn19 | SN 19.21 |
sn20 | SN 20.12 |
sn21 | SN 21.12 |
sn22 | SN 22.159 |
sn23 | SN 23.46 |
sn24 | SN 24.96 |
sn25 | SN 25.10 |
sn26 | SN 26.10 |
sn27 | SN 27.10 |
sn28 | SN 28.10 |
sn29 | SN 29.50 |
sn30 | SN 30.46 |
sn31 | SN 31.112 |
sn32 | SN 32.57 |
sn33 | SN 33.55 |
sn34 | SN 34.55 |
sn35 | SN 35.248 |
sn36 | SN 36.31 |
sn37 | SN 37.34 |
sn38 | SN 38.16 |
sn39 | SN 39.16 |
sn40 | SN 40.11 |
sn41 | SN 41.10 |
sn42 | SN 42.13 |
sn43 | SN 43.44 |
sn44 | SN 44.11 |
sn45 | SN 45.180 |
sn46 | SN 46.184 |
sn47 | SN 47.104 |
sn48 | SN 48.178 |
sn49 | SN 49.54 |
sn50 | SN 50.108 |
sn51 | SN 51.86 |
sn52 | SN 52.24 |
sn53 | SN 53.54 |
sn54 | SN 54.20 |
sn55 | SN 55.74 |
sn56 | SN 56.131 |
snp | 73 |
---|---|
snp1 | Snp 1.12 |
snp2 | Snp 2.14 |
snp3 | Snp 3.12 |
snp4 | Snp 4.16 |
snp5 | Snp 5.19 |
thag | 264 |
---|---|
thag1 | Thag 1.120 |
thag2 | Thag 2.49 |
thag3 | Thag 3.16 |
thag4 | Thag 4.12 |
thag5 | Thag 5.12 |
thag6 | Thag 6.14 |
thag7 | Thag 7.5 |
thag8 | Thag 8.3 |
thag9 | Thag 9.1 |
thag10 | Thag 10.7 |
thag11 | Thag 11.1 |
thag12 | Thag 12.2 |
thag13 | Thag 13.1 |
thag14 | Thag 14.2 |
thag15 | Thag 15.2 |
thag16 | Thag 16.10 |
thag17 | Thag 17.3 |
thag18 | Thag 18.1 |
thag19 | Thag 19.1 |
thag20 | Thag 20.1 |
thag21 | Thag 21.1 |
thig | 73 |
---|---|
thig1 | Thig 1.18 |
thig2 | Thig 2.10 |
thig3 | Thig 3.8 |
thig4 | Thig 4.1 |
thig5 | Thig 5.12 |
thig6 | Thig 6.8 |
thig7 | Thig 7.3 |
thig8 | Thig 8.1 |
thig9 | Thig 9.1 |
thig10 | Thig 10.1 |
thig11 | Thig 11.1 |
thig12 | Thig 12.1 |
thig13 | Thig 13.5 |
thig14 | Thig 14.1 |
thig15 | Thig 15.1 |
thig16 | Thig 16.1 |
ud | 80 |
---|---|
ud1 | Ud 1.10 |
ud2 | Ud 2.10 |
ud3 | Ud 3.10 |
ud4 | Ud 4.10 |
ud5 | Ud 5.10 |
ud6 | Ud 6.10 |
ud7 | Ud 7.10 |
ud8 | Ud 8.10 |
should work now (requires browser refresh).
It linkifies but sends me to a malformed URL. The `:` should be replaced with a `.` in the URL.
I have also seen the `:` used in the AN, Ud and Snp. Would be nice to accommodate that too.
Unfortunately, the `:` as an optional separator proved to be trickier than it looked. I’m removing this option for now (please refresh browser) and will investigate further.
You would also want to replace the SN rule for your tool with the updated one here Citation Link Up tool 🔧 to turn citations into links for a block of text - #7 by musiko
Sorry, I’m having a little bit of a hard time tracking… is the `:` separator problematic for regex in general, or just for the Discourse implementation?
It’s the same regex rule, so both (the problem is not the engine but the rule itself, which I’ll try and fix).
Currently the rule is not capturing the left and the right number separately and this prevents linking to the correct id.
These should all work now (for legacy purposes only!—use dot notation for all new entries to D&D), with a caveat: legacy notation uses simplified rules which in some cases match non-existent sutta IDs to invalid SC links!
Thanks! So, to be clear, if I write AN 4:9 it’ll work, and you just recommend using dots (AN 4.9) instead? Or won’t you support colons going forward?
The colon rule will map all existing entries on D&D. While it will also work for new entries, I recommend using dot notation as it is more precise (unlike the colon rule, it will not produce invalid SC links).
This is so great! It’s fun to see it all over the site now.
Just out of curiosity, could you share an example of what doesn’t link right with a colon?
Basically anything that has the last number larger than the last existing sutta ID in that chapter (see table Automatically linking up citations to suttas on D&D - #30 by musiko).
For example:
Dot notation doesn’t allow invalid IDs:
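Using the SN 5 row of the table (last ID SN 5.10), the difference can be sketched roughly as follows. Both patterns and the `example.com` URL are hypothetical stand-ins, not the rules actually installed on D&D; the point is only that a bounded dot rule rejects SN 5.11 while a simplified colon rule links it anyway.

```python
import re

# Hypothetical stand-ins for the forum's real rules, covering chapter SN 5 only.
# The table lists SN 5.10 as the last sutta in that chapter.
DOT_RULE = re.compile(r"\bSN\s?(5\.(?:10|[1-9]))\b")   # bounded: 5.1 to 5.10
COLON_RULE = re.compile(r"\bSN\s?(\d+):(\d+)\b")       # simplified: any numbers


def link_dot(text: str) -> str:
    # Python writes \1 where the forum's substitution syntax writes $1.
    return DOT_RULE.sub(r"https://example.com/sn\1", text)


def link_colon(text: str) -> str:
    # Two captures, joined with a dot in the generated link.
    return COLON_RULE.sub(r"https://example.com/sn\1.\2", text)


valid_dot = link_dot("SN 5.10")       # linked
invalid_dot = link_dot("SN 5.11")     # left untouched, no such sutta
invalid_colon = link_colon("SN 5:11") # linked, but to a non-existent ID
```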
Why can’t you just use `[:\.]` in the regex?
This is because of the way capturing groups work.
Let’s say we have strings ranging from SC1.1 to SC9.9. Then the corresponding regex to match them would be
\bSC\s?([1-9]\.[1-9])\b
where
- `\b` matches a word boundary
- `\s` matches a whitespace character
- `?` means the preceding item is repeated 0 or 1 times
- `(` … `)` is a capturing group
- `[1-9]` matches one digit in the range 1–9
- `\.` matches a dot character (an unescaped dot is a special character representing any character)

Capturing groups are enumerated (from left to right) and can be used in substitution rules using `$` and the group number, in this case `$1`.
This regex rule matches all strings from SC1.1 to SC9.9, and captures 1.1–9.9.
The substitution rule can then be `https://example.com/sc$1`, which will produce `https://example.com/sc1.1` to `https://example.com/sc9.9`.
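For anyone who wants to try this outside the forum, here is a minimal sketch of the same rule in Python. The helper name `linkify` is made up for illustration, and Python spells the group reference `\1` where the substitution rule above uses `$1`.

```python
import re

# The rule from the post: "SC", optional whitespace, one captured "digit.digit".
PATTERN = re.compile(r"\bSC\s?([1-9]\.[1-9])\b")

# Python's equivalent of the substitution rule https://example.com/sc$1
REPLACEMENT = r"https://example.com/sc\1"


def linkify(text: str) -> str:
    """Replace every matching citation with its link."""
    return PATTERN.sub(REPLACEMENT, text)


linked = linkify("See SC1.1 and SC 9.9.")
unchanged = linkify("SC1.0")  # 0 is outside [1-9], so nothing matches
```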
If we want to match both `.` and `:` in SC1.1 to SC9.9 and SC1:1 to SC9:9, the regex to match all strings would be
\bSC\s?([1-9][\.:][1-9])\b
but the capture this time would be 1.1–9.9 and 1:1–9:9. We cannot use `$1` for substitution this time, because the link would be wrong for captures with `:`.
We can adjust the regex like this (note the two capturing groups)
\bSC\s?([1-9])[\.:]([1-9])\b
and then construct the substitution rule as `https://example.com/sc$1.$2` (we capture two separate digits and manually substitute the separator with a dot).
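A small side-by-side sketch of the two variants, again in Python (so `\1` and `\2` stand in for `$1` and `$2`). The variable names are illustrative only.

```python
import re

# One capturing group: the separator is captured along with the digits.
SINGLE_GROUP = re.compile(r"\bSC\s?([1-9][\.:][1-9])\b")

# Two capturing groups: the separator sits outside both captures.
TWO_GROUPS = re.compile(r"\bSC\s?([1-9])[\.:]([1-9])\b")

text = "SC1:1 and SC 9.9"

# The colon leaks into the first link.
leaky = SINGLE_GROUP.sub(r"https://example.com/sc\1", text)

# The dot is written by the substitution rule itself, so both links are clean.
clean = TWO_GROUPS.sub(r"https://example.com/sc\1.\2", text)
```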
This seemingly works, so why is it not working in our real-case scenario?
Note that all SC ranges in our example were of the same size (1.1–1.9, 2.1–2.9,…,9.1–9.9). What if this was not so? Let’s say our ranges shrink progressively, for example 1.1–1.9, 2.1–2.8,…,8.1–8.2 and 9.1. Now we cannot construct a simple matching regex rule if we only want to match these specific ranges. The rule is a bit more complex, but still a single rule
\bSC\s?(?:(1)[\.:]([1-9])|(2)[\.:]([1-8])|(3)[\.:]([1-7])|(4)[\.:]([1-6])|(5)[\.:]([1-5])|(6)[\.:]([1-4])|(7)[\.:]([1-3])|(8)[\.:]([12])|(9)[\.:](1))\b
where
- `(?:` … `)` is a non-capturing group (the outermost group)
- `|` is the OR operator (in this case it helps make separate matches for 1.1–1.9, 2.1–2.8,…, 8.1–8.2 and 9.1)

But there are more than two capturing groups now (nine pairs) and they are all enumerated from left to right, which makes it impossible to construct the replacement rule as before, `$1.$2`, because this will only substitute 1.1–1.9, but not the others.
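The numbering problem is easy to see by inspecting the groups of a match. This is a sketch in Python using the same rule; Python (3.5 and later) substitutes an empty string for a group that did not take part in the match, so the fixed `\1.\2` replacement degenerates for anything outside chapter 1.

```python
import re

# The single narrow rule from the post, split over lines for readability.
NARROW = re.compile(
    r"\bSC\s?(?:(1)[\.:]([1-9])|(2)[\.:]([1-8])|(3)[\.:]([1-7])"
    r"|(4)[\.:]([1-6])|(5)[\.:]([1-5])|(6)[\.:]([1-4])"
    r"|(7)[\.:]([1-3])|(8)[\.:]([12])|(9)[\.:](1))\b"
)

match = NARROW.search("SC2:5")
# Only the second alternative took part in the match, so its captures
# live in groups 3 and 4; groups 1 and 2 are empty.
first_pair = (match.group(1), match.group(2))
second_pair = (match.group(3), match.group(4))

# A replacement hard-wired to groups 1 and 2 therefore loses the numbers.
chapter_one = NARROW.sub(r"https://example.com/sc\1.\2", "SC1:5")
chapter_two = NARROW.sub(r"https://example.com/sc\1.\2", "SC2:5")
```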
For this to work, the one rule must either be broken into nine separate rules, or we must accept that we will match a wider range and thereby capture some invalid combinations of `$1.$2` for the generated link.
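Both options can be sketched in Python for the toy SC example. The `LAST_ID` table and the helper names are assumptions made up for this illustration; each narrow rule has exactly two groups, so the same `\1.\2` replacement works for all nine.

```python
import re

# Toy data for the shrinking ranges in the example: 1.1-1.9, 2.1-2.8, ..., 9.1.
LAST_ID = {chapter: 10 - chapter for chapter in range(1, 10)}

# Option 1: nine separate rules, one per chapter, two groups each.
NARROW_RULES = [
    re.compile(rf"\bSC\s?({chapter})[\.:]([1-{last}])\b")
    for chapter, last in LAST_ID.items()
]

# Option 2: one wide rule that also accepts non-existent IDs.
WIDE_RULE = re.compile(r"\bSC\s?([1-9])[\.:]([1-9])\b")

LINK = r"https://example.com/sc\1.\2"


def linkify_narrow(text: str) -> str:
    for rule in NARROW_RULES:
        text = rule.sub(LINK, text)
    return text


def linkify_wide(text: str) -> str:
    return WIDE_RULE.sub(LINK, text)


narrow = linkify_narrow("SC2:8 SC2:9")  # only the valid ID is linked
wide = linkify_wide("SC2:8 SC2:9")      # the invalid ID is linked too
```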
Thank you sooo much for this reply. I appreciate all the time it must have taken to create.
I think I am starting to get it now. The core issue seems to be that while you could easily capture exclusively correct citations with a `:`, once captured you would not be able to use what you captured in a URL, because the URL would not work. If we weren’t restricted to regex only, then the solution would be as simple as doing a `.replace(":",".")`
Am I understanding correctly? And that if we were to create a regex for each of the chapters in the AN and SN that would be a technical solution (although possibly too much work for the software to do quickly enough?)
In my link up app, I’m not restricted to regex only. I just did that because I was lazy. Or maybe I could call it “getting a minimal viable product to market quickly.”
In my Citation Helper app, since I am dealing with multiple websites, I have a basic structure object that defines what is allowable as a citation, and then I have an object for each website (e.g. SuttaCentral) that specifies what suttas are available and what, if any, are only available in a range.
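To give a rough idea of what such per-website objects might look like, here is a purely hypothetical sketch in Python. It is not the Citation Helper's actual code: the `Site` class, its fields and the `(1, 10)` range are invented for illustration, and only the last-ID numbers (SN 5.10, AN 1.627) come from the table earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    """Hypothetical description of which suttas one website offers."""
    name: str
    last_id: dict  # (book, chapter) -> last sutta number in that chapter
    ranges: dict = field(default_factory=dict)  # (book, chapter) -> [(start, end)]

    def resolve(self, book: str, chapter: int, number: int) -> Optional[str]:
        """Return the ID to link to, or None if the citation is not available."""
        key = (book, chapter)
        if not 1 <= number <= self.last_id.get(key, 0):
            return None
        # Suttas published only as part of a range resolve to the range ID.
        for start, end in self.ranges.get(key, []):
            if start <= number <= end:
                return f"{book}{chapter}.{start}-{end}"
        return f"{book}{chapter}.{number}"


site = Site(
    name="ExampleSite",
    last_id={("sn", 5): 10, ("an", 1): 627},
    ranges={("an", 1): [(1, 10)]},  # illustrative range only
)

single = site.resolve("sn", 5, 2)
missing = site.resolve("sn", 5, 11)
ranged = site.resolve("an", 1, 3)
```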
Obviously that is way more complicated and not suitable for the situation here without writing a whole new plugin. Personally I do think that your regex solution is a perfectly good solution here.
Thanks! I’m so happy seeing all the linked up citations as I browse the forum. I wonder if this will lead to more people taking the time to click to the suttas being discussed.
Exactly. The component on D&D only accepts regex, but you have more freedom in your environment. The easiest way to have both notations work correctly would be to match and replace using the exact regex first, and then run `.replace(":", ".")` on the result.
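As a rough sketch of that two-step idea in Python, reusing the toy SC ranges from the regex explanation (the `LAST_ID` table and function names are assumptions for illustration): the exact rule decides what counts as a valid citation, and the separator is normalised afterwards on the captured ID.

```python
import re

# Toy ranges from the SC example: 1.1-1.9, 2.1-2.8, ..., 9.1.
LAST_ID = {chapter: 10 - chapter for chapter in range(1, 10)}

# Exact rule with a single capturing group around the whole ID.
ALTERNATIVES = "|".join(
    rf"{chapter}[\.:][1-{last}]" for chapter, last in LAST_ID.items()
)
EXACT = re.compile(rf"\bSC\s?({ALTERNATIVES})\b")


def to_link(match: re.Match) -> str:
    # Normalise only the captured ID, then build the link around it.
    return "https://example.com/sc" + match.group(1).replace(":", ".")


def linkify(text: str) -> str:
    return EXACT.sub(to_link, text)


result = linkify("SC2:8, SC2.8, SC2:9")
```

The replacement is applied to the captured ID rather than to the finished link, because the link itself contains a colon in `https://` that must stay as it is.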
Too much work for the human; software can process heaps.
Laziness is the mother of invention.
Probably doable using regex too, but whatever works for you is OK. I must always remind myself there are different tools for different jobs (Law of the instrument - Wikipedia).
It seems to work nicely and I’m also happy that you nudged me to do it, D&D looks way cooler with all these links.
I concur! It really does make it much easier to double check people’s references and dive into the suttas themselves. Kudos!
Added support for Vinaya (long and shorthand, with or without dashes, and Pli or Pi (for backward compatibility only) in long notation).
All links point to the translation by Bhante @Brahmali (where present; some translations do not seem to be published yet, e.g. Bi Sk, Bu Pm and Bi Pm).
Bhikkhu Vibhanga | |
---|---|
Pli-Tv-Bu-Vb-Pj4 | Bu Pj 4 |
Pli-Tv-Bu-Vb-Ss13 | Bu Ss 13 |
Pli-Tv-Bu-Vb-Ay2 | Bu Ay 2 |
Pli-Tv-Bu-Vb-Np30 | Bu Np 30 |
Pli-Tv-Bu-Vb-Pc92 | Bu Pc 92 |
Pli-Tv-Bu-Vb-Pd4 | Bu Pd 4 |
Pli-Tv-Bu-Vb-Sk75 | Bu Sk 75 |
Pli-Tv-Bu-Vb-As7 | Bu As 7 |
Bhikkhuni Vibhanga | |
---|---|
Pli-Tv-Bi-Vb-Pj8 | Bi Pj 8 |
Pli-Tv-Bi-Vb-Ss13 | Bi Ss 13 |
Pli-Tv-Bi-Vb-Np12 | Bi Np 12 |
Pli-Tv-Bi-Vb-Pc96 | Bi Pc 96 |
Pli-Tv-Bi-Vb-Pd8 | Bi Pd 8 |
Pli-Tv-Bi-Vb-Sk75 | Bi Sk 75 |
Pli-Tv-Bi-Vb-As7 | Bi As 7 |
Khandhaka | |
---|---|
Pli-Tv-Kd22 | Kd 22 |
Parivara | |
---|---|
Pli-Tv-Pvr1.16 | Pvr 1.16 |
Pli-Tv-Pvr2.16 | Pvr 2.16 |
Pli-Tv-Pvr21 | Pvr 21 |
Patimokkha | |
---|---|
Pli-Tv-Bu-Pm | Bu Pm |
Pli-Tv-Bi-Pm | Bi Pm |
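The relationship between the two columns can be sketched as a small Python helper that expands shorthand into the long form shown in the tables. This is only an illustration of the mapping, not the rule set used on D&D: the function name `expand` is made up, it does not check the upper bounds listed above, and it does not handle the dashless or Pi variants.

```python
import re
from typing import Optional


def expand(citation: str) -> Optional[str]:
    """Expand a shorthand Vinaya citation to its long form, or return None."""
    # Vibhanga rules, e.g. "Bu Pj 4" -> "Pli-Tv-Bu-Vb-Pj4"
    m = re.fullmatch(r"(Bu|Bi) (Pj|Ss|Ay|Np|Pc|Pd|Sk|As) ?(\d+)", citation)
    if m:
        return f"Pli-Tv-{m[1]}-Vb-{m[2]}{m[3]}"
    # Khandhaka and Parivara, e.g. "Kd 22" or "Pvr 1.16"
    m = re.fullmatch(r"(Kd|Pvr) ?(\d+(?:\.\d+)?)", citation)
    if m:
        return f"Pli-Tv-{m[1]}{m[2]}"
    # Patimokkha, e.g. "Bu Pm"
    m = re.fullmatch(r"(Bu|Bi) Pm", citation)
    if m:
        return f"Pli-Tv-{m[1]}-Pm"
    return None


examples = [expand(c) for c in ("Bu Pj 4", "Kd 22", "Pvr 1.16", "Bi Pm")]
```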
Thanks, @Musiko. And yes, you are right, some of the pages are not showing on SuttaCentral. I think the problem occurs whenever a single page covers more than one rule. This is so for the bhikkhunī sekhiyas and paṭidesanīyas. Bhante @Sujato, are you able to have a look at this?
It could also be related to this issue: Linking together the Bhikkhuni and Bhikkhu rules which are the same
FWIW, I rarely see people giving citations for Vinaya, but it’s great to have them linking when they do.