A plea for (technical) help

I am a long-term lay practitioner of non-sectarian Buddhism with an interest in the EBTs.
I have recently discovered SuttaCentral and this forum, and it has really reinvigorated my practice and research.
I am trying to begin some more research, focusing on the distribution of doctrinal formulas across the first four Nikayas of the Pali Tripitaka.
I have been trying to find a way to search for strings of text, say “satipaṭṭhānā” or “pītiyā ca virāgā upekkhako ca viharati sato sampajāno”, and see a list of results from the first 4 Nikayas, preferably listing DN first, then MN, then SN, then AN.
That way I could quickly find parallels or repetitions in the 4N and compare them.

I have basically tried 4 options so far:

  1. SuttaCentral
  • Using the search bar I can get 584 results for “jhana”; however, if any filter is then selected, say “root texts”, I get 0. So I guess I just stick with the 584.
  • There does not seem to be any order to the results: MN, AN, Kv, they all just appear at random.
  • There doesn’t seem to be any way to get results for just the first 4 N.
  • The results load a few at a time, so you have to scroll down, click “load more”, and repeat.
  2. BuddhaNexus search bar
  • “jhana” gives 200 results and says that is the limit, so pretty much a non-starter.
  • Again, no way to limit results to the 4N.
  • Again, no way of ordering results by volume.
  3. BuddhaNexus text search
  • No way to search directly; you are required to start from a sutta.
  • Navigating to DN2, clicking on the first occurrence of “jhanam”, then changing the settings to “EARLY SUTTAS” and adjusting the length and similarity sliders gives 57 parallels of the first jhana formula.
  • The results are again not ordered by N.
  4. Digital Pali Reader
  • I navigate to search, type in “Jhana” and get the error “no books in selection”, and can’t figure out what to do.

So I am looking for advice!

What do others do to find where a doctrinal formula or term occurs in the first 4 Nikayas?
Any tips or tricks or other useful sites would be greatly appreciated.
Also links to tutorials or resources.

Much Metta!

2 Likes

What I did was use an ebook compiled from the 4 Nikāyas in EPUB format. Calibre is a very useful tool for doing that, and it includes an ebook editor.

Then, in most ebook readers, you can search for the term. It took a while in the Apple Books software: you need to wait for the search to finish before clicking to browse, or else it runs the search all over again.

I dunno where to download the EPUB version of B. Sujato’s 4 Nikāyas from SuttaCentral.

GitHub is too unfriendly for those of us who are not used to looking at programming stuff.

Edit add on: found something.

Edit add on 2:

https://thebuddhaswords.net/home/palisearch.html

This website might do.

3 Likes

Just tagging in @sabbamitta and @karl_lew for you, as they are search superheroes.

3 Likes

The DPR (Digital Pali Reader) will do exactly what you want.

Can you share a screenshot of what you are seeing?

2 Likes

To add to @NgXinZhao’s comment, here is a link to a thread containing the Digha, Majjhima, and Anguttara Nikaya as ebooks

1 Like

I’m still using the (by now old) software Chattha Sangayana Tipitaka 4.0 by the Vipassana Research Institute. You can use the asterisk (*) for wildcard searches, search for two words, define their maximum distance, define the books you’re interested in… Here’s a screenshot

A (maybe) disadvantage is that you need the correct orthography for the search, i.e. jhāna, not jhana, etc. One diacritical mark also differs from SuttaCentral’s: the Chattha uses ṃ while SuttaCentral uses ṁ.
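
The orthography issue can be worked around in your own scripts. The following is a minimal Python sketch, not part of any of the tools mentioned in this thread; the function names `fold_pali` and `unify_niggahita` are made up for illustration. It shows two options: stripping all diacritics for loose matching, or only converting ṃ to ṁ when exact spelling otherwise matters.

```python
import unicodedata


def fold_pali(text: str) -> str:
    """Lowercase and strip diacritics: 'jhāna' and 'jhana' become identical."""
    # NFD splits e.g. 'ā' into 'a' plus a combining macron, which is then dropped.
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unify_niggahita(text: str) -> str:
    """Convert dot-below m (U+1E43/U+1E42) to dot-above m (U+1E41/U+1E40)."""
    composed = unicodedata.normalize("NFC", text)
    return composed.replace("\u1e43", "\u1e41").replace("\u1e42", "\u1e40")


# Both spellings of the niggahita fold to the same plain-ASCII string.
print(fold_pali("jh\u0101na\u1e43") == fold_pali("jh\u0101na\u1e41"))
```

Folding loses information (for example, short and long vowels are no longer distinguished), so it suits finding candidates; `unify_niggahita` is the gentler choice when copying a search string from one edition to another.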

4 Likes

Well if there’s any superhero that’s @karl_lew, and I am just using the tools that he built. That is, scv-bilara, which is the basis of Voice search.

Now, scv-bilara (or Voice) can search in either Pali or a translation language (not many are available, but a few). Trying jhana, I get 5 results by default, or 50 results if I set the maximum number of search results to 50, which is the highest possible setting.

Voice does not aim to be complete, but to be useful, especially for users with a visual disability. Hence the limit on the number of search results, to avoid flooding.

Search results are ordered according to relevance in a certain text. That means, a text where a search term appears in 5 segments ranks higher than a text where it appears in only 1 segment; or if it’s in 1 segment in a short text, that’s higher than 1 segment in a long text. That’s basically the relevance order in Voice.
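
To make the ordering rule concrete, here is a small Python sketch of a score that behaves as described. The formula (matched segments plus the matched fraction of the text) is an assumption chosen only because it reproduces both properties; it is not taken from the Voice or scv-bilara source code, and the text names and counts are hypothetical.

```python
def relevance(matched_segments: int, total_segments: int) -> float:
    """Assumed score: match count first, match density as the tie-breaker."""
    if total_segments <= 0:
        raise ValueError("a text needs at least one segment")
    return matched_segments + matched_segments / total_segments


def rank(hits: dict) -> list:
    """Order text IDs from most to least relevant.

    `hits` maps a text ID to (matched_segments, total_segments).
    """
    return sorted(hits, key=lambda text_id: relevance(*hits[text_id]), reverse=True)


# Hypothetical counts, purely for illustration.
hits = {
    "long_text_one_hit": (1, 300),
    "short_text_one_hit": (1, 20),
    "text_five_hits": (5, 400),
}
print(rank(hits))
```

With these numbers the five-hit text comes first, and of the two single-hit texts the shorter one ranks higher.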

It is, however, possible to limit the search to a certain part of the canon by using Tipitaka Category flags (see here under “advanced search”). For example, jhana -tc:dn gives you all DN results, which are 23 in that case.

By design, search is limited to Sutta texts of the Mahasangiti that have at least one segmented translation, using SuttaCentral’s GitHub repository (Vinaya may be included at a later time).

So overall, this tool does not serve the needs described here, as I understand the OP is interested in a complete and comprehensive list of results. Perhaps it can be of partial use.

3 Likes

https://www.digitalpalireader.online/_dprhtml/index.html?feature=search&type=0&query=p%C4%ABtiy%C4%81%20ca%20vir%C4%81g%C4%81%20upekkhako%20ca%20viharati%20sato%20sampaj%C4%81no&MAT=m&set=dmsak&book=1&part=1&rx=true

See those blue ticks on the left? If you tick nothing you will get nothing.

I find that for searching, DPR is better than the VRI tool because on the right it shows the context of the search term.

4 Likes

I do tick boxes, @Snowbird. I use Linux, and I’ve tried it on Firefox and Chromium and get the same error…

1 Like

Can you try clicking on that link I gave above (the ugly one)? It should have all the flags in it to make it work.

And a screen shot would be great.

Wouldn’t hurt to check the console either.

1 Like

Oh! wait!! I didn’t have Mula ticked!!! :slight_smile:

now it works!

2 Likes

Sadhu!!

BTW, they have a fairly active Discord whatever you call it. That’s a great place to get help too.

And if you end up there, it might be good to suggest that the interface force you to have at least one source checked.

4 Likes

Do you program at all? If so, you can use my repo sutta science to do exactly that.

Assuming I’ve understood your question correctly, here’s an R script to do that as an example.

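```r
library(dplyr)
library(stringi)

# Pali segment data from the sutta-science repo (loads `pali_sutta_data`).
data_url <- "https://github.com/chaz23/sutta-science/raw/main/data/pali-texts/dataset_1/pli_sutta_data.Rda"

load(url(data_url))

# Phrase you want to search for (plain ASCII, no diacritics).
term <- "satipatthana"

pali_sutta_data %>%
  # Keep only segments from DN, MN, SN and AN.
  filter(grepl("^(d|m|s|a)n[0-9]", segment_id)) %>%
  # Strip diacritics so the ASCII search term can match.
  mutate(segment_text_trans = stri_trans_general(segment_text, id = "latin-ascii")) %>%
  filter(grepl(term, segment_text_trans, ignore.case = TRUE)) %>%
  # The sutta ID is everything before the colon in the segment ID.
  mutate(sutta = stri_extract(segment_id, regex = "^.*(?=:)")) %>%
  distinct(sutta)
```

3 Likes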

@josephzizys, as Ayya Sabbamitta mentions, we use scv-bilara, which is a Linux search tool for SuttaCentral segmented texts stored in bilara-data. Scv-bilara is multilingual and knows the Pali alphabet. In particular, scv-bilara allows romanized searches. This means that we can use scv-bilara to find dukkhassa mula, which returns a relevance-scored list of suttas. The relevance metric is quite important, as it scores definitional sources highest regardless of nikaya:

3.016   mn105
2.006   mn1
1.005   mn66

Pali searches are tricky. Spelling and word endings are mutable. Scv-bilara searches with regular expressions to match word prefixes. Indeed, Bhante Sujato himself recommends ripgrep. Ripgrep is the technology used to implement scv-bilara. So at this time the absolutely most precise tool for searching is actually ripgrep of bilara-data.
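
As a rough illustration of prefix matching with regular expressions, here is a Python sketch. It is not scv-bilara's code; the helper names are invented, and the segment strings are illustrative stand-ins rather than quotations from any edition. It assumes segments are held as a mapping from a segment ID to its text.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lowercase and drop diacritics, so a romanized query can match."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_pattern(phrase: str) -> re.Pattern:
    """Treat every word of the phrase as a word prefix with any ending."""
    words = fold(phrase).split()
    return re.compile(r"\s+".join(r"\b" + re.escape(w) + r"\w*" for w in words))


def search(segments: dict, phrase: str) -> list:
    """Return the IDs of segments whose folded text matches the phrase."""
    pattern = prefix_pattern(phrase)
    return [seg_id for seg_id, text in segments.items() if pattern.search(fold(text))]


# Illustrative strings only, with made-up segment IDs.
demo_segments = {
    "demo1:1.1": "Dukkhassa mūlaṁ",
    "demo1:1.2": "sukhassa mūlaṁ",
    "demo2:1.1": "dukkhassa mūlanti",
}
print(search(demo_segments, "dukkhassa mula"))  # ['demo1:1.1', 'demo2:1.1']
```

Because each word is only a prefix, typing a bare stem catches several case endings at once: a query such as `dukkhasamuday ariyasacc` would match both an accusative and a genitive form of the same phrase.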

As you’ve mentioned, doctrinal formulas are key, and both Ayya Sabbamitta and I are collecting lists of doctrinal formulas for use as examples in Voice. We store Bhante Sujato’s discovered doctrinal formulas in English as examples-sutta-1-sujato.txt and in German as examples-de-sutta-1-sabbamitta.txt

If you would like to help us discover and share such doctrinal examples, we’d love to work together. We are focused on doctrinal formulas in translation rather than doctrinal formulas in Pali. However, a doctrinal formula is a doctrinal formula, so if you find formulas we haven’t found, that would be COOL. We constantly find new ones ourselves.

:pray:

2 Likes

@josephzizys I have been doing something similar and had the same questions. Thanks for asking this question and thanks to everyone for their replies.
:pray:

3 Likes

Oh my goodness!!! thank you all SO MUCH!! this is so exciting, I now have multiple options that have me up to my neck in formulas and statistics and ideas, this is so great, this forum has been such a find, I really haven’t had this much fun on the internet EVER!

thank you all again.

Much Metta and Gratitude.

5 Likes

OK, a little update: I am using Digital Pali Reader and some of the results are confusing me. I did a search for dukkhasamudayaṃ ariyasaccaṃ and it gives MN10, while on BuddhaNexus and SuttaCentral MN10 does not come up but MN141 does.

After much noodling I figured out that Digital Pali Reader was ignoring dukkhasamudayassa ariyasaccassa in MN141, which is fine, except that I would much prefer to find it.

So I guess my questions now are: is the SuttaCentral Pali source more conservative than the Digital Pali Reader’s, and does that explain why SuttaCentral’s MN10 lacks dukkhasamudayaṃ ariyasaccaṃ?

And secondly, it has been a long time since I was a web programmer (back in the ASP and PHP days), but I am willing to get back into the game for the sake of the dhamma. Is there a gentle introduction to SuttaCentral’s GitHub repo? I am keen to get my hands dirty, as I would love to have Digital Pali Reader’s ability to organize results by Nikaya together with the nice variant results that SuttaCentral seems to do so well.
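
For what it is worth, the "organize results by Nikaya" part is small once matching segment IDs are in hand. Below is a Python sketch under the assumption that segment IDs have the `sutta:segment` shape used in the R script earlier in this thread; the function names and the segment numbers in the example are made up.

```python
import re

NIKAYA_ORDER = {"dn": 0, "mn": 1, "sn": 2, "an": 3}
# The lookahead requires a digit right after the prefix, so 'snp1.1' is excluded.
FOUR_NIKAYAS = re.compile(r"^(dn|mn|sn|an)(?=\d)")


def sutta_of(segment_id: str) -> str:
    """'mn141:2.1' -> 'mn141' (everything before the colon)."""
    return segment_id.split(":", 1)[0]


def sort_key(sutta_id: str):
    """Sort by Nikaya first, then numerically, so sn12.2 comes before sn12.10."""
    prefix = FOUR_NIKAYAS.match(sutta_id).group(1)
    numbers = tuple(int(n) for n in re.findall(r"\d+", sutta_id))
    return (NIKAYA_ORDER[prefix], numbers)


def four_nikaya_suttas(segment_ids) -> list:
    """Unique sutta IDs from DN, MN, SN and AN, listed in that order."""
    suttas = {sutta_of(s).lower() for s in segment_ids}
    return sorted((s for s in suttas if FOUR_NIKAYAS.match(s)), key=sort_key)


# Made-up segment numbers, for illustration only.
matches = ["mn141:2.1", "an4.10:1.1", "kv1.1:1.1", "dn22:17.1",
           "sn12.10:1.1", "sn12.2:1.1", "mn10:44.1", "dn22:18.1"]
print(four_nikaya_suttas(matches))
```

Texts outside the four Nikayas (here the Kv entry) are dropped, and repeated hits in one sutta collapse to a single line.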

all advice appreciated!

Metta

“I’m not dead yet!”
—PHP

I’d also be interested to read such a document. I’m guessing that the devs are too busy devving to maintain such a document.

I may have mentioned this before, but as we speak, there is work being done to improve the search feature on SC.

Here is one of the issues on github:

I think this is because in the DPR source MN10 is actually identical to DN22. I mean you can even see in the title of the page it says Mahāsatipaṭṭhāna. Strange, eh? I’m guessing that in the version used on SC they decided to correct this.

1 Like

Ha! Here I was clicking around trying to find the edition of the text being used and they haven’t used any!

Thanks, @Snowbird. I guess if I manage to figure out how to get SuttaCentral staged on my machine for development, I can write a wiki article about it and thus contribute to the community :slightly_smiling_face:

Metta