Parallels for the Suttanipāta: would someone like to volunteer for data entry?

We’ve been very lucky to receive the kind assistance of a number of volunteers through Discourse, and I’m wondering whether I can push my luck!

Today, while checking something else, I stumbled on a file in the “was meant to be done but got forgotten” folder. It’s a highly detailed and complete set of parallels for the verses of the Suttanipāta (Snp on SC).

The source text is:

Yajima, Michihiko (矢島道彦) “Suttanipata対応句索引 An Index to Parallel Verses and Padas of the Suttanipata collected from Buddhist, Jain, and Brahmanical Texts.” — In: Bulletin of the Institute of Buddhist Culture, Tsurumi University. - 2 (1997), 1, S. 1-97.

It was brought to my attention by @Qianxi some time ago, but I’m only just getting around to it.

Thus far SC has not listed many verse parallels. Ayya @Vimala is addressing this by compiling a set of parallels for the Dhammapada. Adding the Suttanipāta to this would be tremendous.

The text is over 1,000 verses, and most verses have several parallels, so you can do the math. The data takes about 80 pages in the printed edition. So this is a job that will require care, patience, and attention to detail.

The file is a scanned pdf of moderate quality. I have OCR-d the text, and include the results. You can’t expect much from OCR in a case like this. Mostly the data will have to be entered by hand.

However the OCR did do a reasonable job of recognizing Chinese characters, so you should be able to enter these by cut and paste. Knowing Chinese will help, but it is not essential.

Let me know if anyone’s interested. Here are the files: (2.6 MB)


Wow, it doesn’t feel like nearly 3 years ago.

Yes, I’ll help if I can.

Will verse parallels be displayed in a new way, or will they simply be listed as partial parallels to the sutta?

Also, looking at the pdf, around half of the parallels are not whole verse parallels but parallels to certain lines in a verse.

Anyway, I’m happy to help if I’m able.

That is so good, thanks so much.

As for display, we have not yet considered this in detail. Our revised data system will allow for a much more granular approach to parallels, as well as different ways of viewing the data. I would guess that there are maybe 20,000 verse parallels altogether, so we need to find a way to use this data effectively.

Most likely, yes, they will be listed as a kind of partial parallel. But perhaps we will be more informative than that, specifying that it is a verse parallel. In addition, perhaps we will have special views for examining the verse parallels.

Yes. So that’s a consideration. As long as you’re happy to enter the data, we’ll handle it. Our texts will be segmented on a line by line basis, so we can directly link even on this granularity.

There are probably some texts that we’d want to exclude. The source lists parallels found in commentaries. Normally we don’t include these. If we were to record all commentarial quotes, well, that would be a job! More importantly, they usually don’t tell us anything interesting. So I’d suggest leaving these out. (Note that the commentaries are sometimes labelled with abbreviations that on SC are used for the Agamas. So SA for us is Samyuktagama, while in the paper it is Samyutta-Atthakatha.)

We can also leave out the Niddesa (Mnd and Cnd). Our Pali text already includes these cross-references, so we can automatically extract these and there is no need to enter them by hand.
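If it helps, the exclusions could also be applied mechanically after data entry. This is only a rough sketch: the `EXCLUDED` set holds labels named in this thread, whether the paper labels the Niddesa as Mnd/Cnd is an assumption, and the function name and the "SnA 1" test reference are made up for illustration.

```python
# Labels to drop: commentaries named in this thread, plus the Niddesa.
# Assumption: the entered data uses exactly these labels; adjust to the paper's list.
EXCLUDED = {"SA", "SnA", "VinA", "DAT", "Mnd", "Cnd"}


def keep_reference(ref):
    """Return False if the reference points to an excluded text."""
    ref = ref.strip()
    if ref.startswith("cf."):
        # Ignore the "compare" prefix when reading the abbreviation.
        ref = ref[3:].strip()
    # The abbreviation is everything before the first space.
    abbreviation = ref.split(" ", 1)[0]
    return abbreviation not in EXCLUDED


refs = ["GDh 82", "cf.Uv 32", "SnA 1"]  # "SnA 1" is a made-up reference
kept = [r for r in refs if keep_reference(r)]
```

Note that this compares the paper's abbreviations, so it has to run before anything is converted to SC style, where SA means Samyuktagama.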

Allow me to make a few more remarks on the task.

  • Don’t bother preserving diacritical marks in the text. So Thig rather than Thīg.
  • Ultimately we will transform all the references to the form used on SC. Mostly these are trivial. For example, “J” becomes “Ja” and so on. It’s up to you whether to do this yourself or leave it to us.
  • This also applies to the conventions of the references, for example use of commas and so on. My preference would be to give references in a comma separated list, eliminating commas inside the references. However, this is up to you. The most important thing is to be consistent.
  • In some cases the text gives alternate references. Including both styles merely clutters the data. For example, in verse 6a we have
    Vin ii,184(CV VII,1,6ab); Ud 20(II,10ab).
    I haven’t checked all such cases in the text, but in these instances the basic reference is vol/page while the bracketed reference is by sutta or section. Now, since SC is based on sutta/section, this is more useful for us. Also it is more precise, as it has the line numbers. So in this case we should have:
    CV VII,1,6ab; Ud II,10ab.
    Or, if we use the comma-separated conventions:
    Cv 7,1,6ab, Ud 2.10ab
    Ultimately these will be transformed to the URL IDs of SC, where “cv” for cullavagga is replaced by “kd” for Khandhaka:
  • As this example shows, we also prefer to eliminate Roman numerals except for vol. references.
  • What would be very handy is if you could input the Chinese references in the same form used on SC. Would this be possible?
  • For the comfort and well-being of our machines, we need to write each reference in full. So, for example, for 1a instead of
    J ii,113-4; 266; iv, 58
    we need
    J ii, 113-4; J ii, 266; J iv, 58.
    Or, if we follow the SC preferred style,
    Ja ii 113-4, Ja ii 266, Ja iv 58.
  • There’s no need to preserve the columns of the original. Just a comma separated list (or if preserving the style of the original references, a semi-colon separated list).
  • Regarding the OCR file, should you choose to use it, it will make more sense if you know that it simply runs line by line. I tried different configurations, and, while this is by no means perfect, this gave the most useful results.
  • In some cases the numbering will differ from that on SC, as we count the verses differently. I’d suggest just keeping it as is for now, we can adjust this later.
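For what it’s worth, the “write each reference in full” and “no diacritics” points above could be checked or partly automated with something like the following. It is a minimal sketch, not our actual pipeline: the function names and the `SC_ABBREVIATIONS` table are hypothetical, and the pattern only covers the simple vol/page style seen in the Jataka example.

```python
import re
import unicodedata

# Illustrative mapping only: "J" -> "Ja" is the one pair given above.
SC_ABBREVIATIONS = {"J": "Ja"}

# One reference: optional abbreviation, optional lower-case Roman volume,
# then a page or page range.
PART_RE = re.compile(
    r"^(?:(?P<abbr>[A-Z][A-Za-z]*)\s+)?"
    r"(?:(?P<vol>[ivxlc]+)\s*,\s*)?"
    r"(?P<page>\d+(?:-\d+)?)$"
)


def strip_diacritics(text):
    """Drop combining marks, so 'Thīg' becomes 'Thig'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_references(raw):
    """Write each semicolon-separated reference in full."""
    abbr = vol = None
    expanded = []
    for part in raw.split(";"):
        part = part.strip().rstrip(".")
        if not part:
            continue
        match = PART_RE.match(part)
        if match is None:
            raise ValueError(f"cannot parse reference: {part!r}")
        if match.group("abbr"):
            # A new text starts: do not carry the old volume over.
            abbr, vol = match.group("abbr"), match.group("vol")
        else:
            vol = match.group("vol") or vol
        if abbr is None:
            raise ValueError(f"no abbreviation to carry over for: {part!r}")
        pieces = [SC_ABBREVIATIONS.get(abbr, abbr), vol, match.group("page")]
        expanded.append(" ".join(p for p in pieces if p))
    return expanded


print(", ".join(expand_references("J ii,113-4; 266; iv, 58")))
```

Anything the pattern does not recognise raises an error rather than guessing, which seems the safer behaviour for hand-entered data.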
How about this for the first verse? The references to whole-sutta parallels (after the (1)) are written in a different format from line/verse parallels, but I couldn’t see how to correct this as I don’t have access to the books referred to. For the Chinese whole-sutta parallels, however, I could point to the first line of the ‘scroll/juan’ referred to.

I Uragavagga
(1) cf.GDh ii.Bhikkhu, Uv 32.Bhiksu, SDh 22.Uraga-v, cf.T 212.764c12, T 213.794a18
1 GDh 82, cf.Uv 32, cf.Uv 62-80, SDh 402, T 210.559c, T 213.797c
1a Dh 222a, cf.J iii.133-4, J iii.266, J iv.58, GDh 275, MBh i.74.2a, MBh i.74.4a, MBh iii.30.17a
1cd Sn 2-17cd, GDh 81-90, Mvu iii.105.15
1d Pv i.12.1, J iii.164, J iv.341, J v.100, J vi.361, Ap 394.13, Bv ix.28, cf.Suy i.2.2.1a, Utt 14.34ab, Utt 19.86d, BrhadUp iv.4.7, PrasnaUp v.5, MBh i.74.4c, MBh xii.242.11b

Are the “cf.” (compare) references useful? A cut-and-paste Google Translate rendering of page 4 of the Japanese introduction seems to say that ‘cf.’ indicates ‘somewhat similar’ rather than ‘strikingly similar/the same’. Is putting ‘cf.’ in front of the reference an OK way of recording these?

I won’t change any of the abbreviations because it would likely introduce mistakes.

I took out references to Dighanikayatthakatha-tika, Suttanipata-atthakatha, Vinaya-atthakatha (DAT, SnA, VinA - assume those are all commentaries?), but left in those to Upanishads, Mahabharata and Jaina texts (PrasnaUp, BrhadUp, MBh, Utt, Suy).

By the way, the mouseover comment on Chinese line references on SuttaCentral says that line references refer to ‘volume, page, column, line’. In fact, they only refer to Taisho sutra number + page, column and line. I don’t think there are any Taisho sutra numbers used across multiple volumes, so sutra references without volume number should still work in all cases. I have used
T Sutranumber.PageColumnLine
in the Snp references above, although usually the references given only go as far as a certain column.
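In case it is useful for checking the entries, the format above could be parsed roughly like this. This is just a sketch: `TaishoRef` and `parse_taisho` are names I made up, and it assumes columns are always a, b or c and that the line number may be missing.

```python
import re
from typing import NamedTuple, Optional


class TaishoRef(NamedTuple):
    sutra: int            # Taisho sutra number
    page: int
    column: str           # "a", "b" or "c"
    line: Optional[int]   # None when the reference stops at the column


# "T", sutra number, ".", page, column letter, optional line number.
TAISHO_RE = re.compile(r"^T\s*(\d+)\.(\d+)([abc])(\d+)?$")


def parse_taisho(ref):
    """Split a reference such as 'T 212.764c12' into its parts."""
    match = TAISHO_RE.match(ref.strip())
    if match is None:
        raise ValueError(f"not a Taisho reference: {ref!r}")
    sutra, page, column, line = match.groups()
    return TaishoRef(int(sutra), int(page), column, int(line) if line else None)


print(parse_taisho("T 212.764c12"))
print(parse_taisho("T 210.559c"))
```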

That’s great, looking very nice. I notice the MBh iii reference is not easy to parse, but I think you’ve got it right.

Indeed. For these, can you use a slightly different system? The “suttas” correspond to the basic ID on SuttaCentral, so it would make sense to just use these. So for the first row we would have:

Snp1.1, cf.GDh ii.Bhikkhu, Uv 32.Bhiksu, SDh 22.Uraga-v, cf.T 212.764c12, T 213.794a18

You could even keep these in a separate file if you like.

Fine: but would it be possible to indicate both first and last lines?

Yes, they are what we call “partial parallels”. Basically the idea is “if you are studying this thing you might want to check that thing”. It’d be better to label these with an asterisk as on SC:

1 GDh 82, Uv 32*, Uv 62-80*, SDh 402, T 210.559c, T 213.797c

I’m a little concerned about the lonely numbers at the start of the line. Theoretically they should be fine but it would be nice to disambiguate them for easier data processing later. Maybe just label them with a hash:

#1 GDh 82, Uv 32*, Uv 62-80*, SDh 402, T 210.559c, T 213.797c
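To show why the hash and asterisk make later processing easier, here is roughly how such a row might be read. It is a sketch under the conventions proposed above (hash-prefixed verse label, comma-separated references, trailing asterisk for partial parallels); the function names are hypothetical, not something that exists in our code.

```python
def cf_to_asterisk(ref):
    """Turn a 'cf.' prefix into the trailing asterisk used on SC."""
    ref = ref.strip()
    if ref.startswith("cf."):
        return ref[3:].strip() + "*"
    return ref


def parse_row(row):
    """Return (verse_label, [(reference, is_partial), ...]) for one row."""
    label, _, rest = row.strip().partition(" ")
    if not label.startswith("#"):
        raise ValueError(f"row does not start with a hash label: {row!r}")
    parallels = []
    for item in rest.split(","):
        item = item.strip()
        if not item:
            continue
        is_partial = item.endswith("*")
        parallels.append((item.rstrip("*").strip(), is_partial))
    return label[1:], parallels


verse, parallels = parse_row(
    "#1 GDh 82, Uv 32*, Uv 62-80*, SDh 402, T 210.559c, T 213.797c"
)
```

Because every row starts with a hash, a bare number inside a reference can never be mistaken for a verse label.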

Okay, it’s up to you. The only thing is the SC system is more robust. Similar to the issue above, it helps to avoid mistakes in data entry that might confuse a machine. Using DN, MN, SN, AN, Snp, Ja, Dhp is less likely to result in mistakes than the text’s D, M, S, A, Sn, J, Dh. But as long as you’re careful it isn’t that big of a deal.

Perfect. Yes, all those with A at the end are commentaries. DAT is a subcommentary (ṭīkā). These are all listed in the introduction, but if you have any doubts just let me know.

The inclusion of the non-Buddhist texts is one of the most interesting things about this concordance. I suppose we’ll have to work out how to link to them one day!

Yes, thanks, we should fix this. @Vimala can you adjust the text in the appropriate js file? Ta!

Yes, this is fine.

I have changed this into: Sutra number and page, column, and line in the Taishō canon.

I’m looking forward to receiving the finished data-set!

Not sure if it is of any use, but I noticed that in this text there is a reference to the work of Otto Franke from 1912, which is available on JSTOR:

I’m sure the Japanese article will have included all the references included in Franke.

Dear @Qianxi,
It has been some time since you volunteered your help in compiling this list and I must admit that I completely forgot about it until now. How are you getting on? Do you need any help?

Really sorry to @Vimala and @sujato for volunteering for this and then disappearing for four years! Life suddenly got very busy. I’m not in a position to help any time soon, but I wish you all the best, this site is such a valuable project.


No worries @Qianxi! Such is life.
I hope you are doing well. Please take good care of yourself!