Hi Karl,
Much mudita for all your work on Voice, and I’m impressed by the search tool in scv-bilara.
Given all the wonderful new developments of SC, I am very humbled to give you an update from my end.
I have written a Python script that searches bilara, both for my own local study and to get an idea of how voice navigation could work in the future. The script is named text_analysis.py (link: text_analysis.py) and here are its features:
- Searching for arbitrary English words or sentences in the entire translation of the Tipitaka by Ven. Sujato.
For example, the shell command
python3.7 text_analysis.py "the five grasping aggregates"
will return an exhaustive list of results, as follows:
A. Returns each verse in which the search string was found, together with its key, as in this output from the command above:
"an10.60:4.4": "And so they meditate observing impermanence in the five grasping aggregates. ",
B. Returns the corresponding Pali verse from the root text underneath each match:
an10.60:4.4: Iti imesu pañcasu upādānakkhandhesu aniccānupassī viharati.
C. Returns a list of all sutta numbers where the search string was found.
D. Returns a list of suttas ordered by how often the search string occurs in them, for example:
[(6, 'sn22.122'), (6, 'dn22'), (5, 'sn22.89'), (5, 'sn22.123'), (4, 'sn22.82'), (4, 'mn109'), (3, 'sn22.48'), (3, 'mn44'), (3, 'mn149'), (3, 'mn141'), (3, 'mn10'), (2, 'sn4.16'), (2, 'sn22.100'), (2, 'mn28'), (2, 'mn23'), (2, 'mn122'), (2, 'an8.2'), (2, 'an4.90'), (1, 'sn56.11'), (1, 'sn46.30'), (1, 'sn45.178'), (1, 'sn45.159'), (1, 'sn35.245'), (1, 'sn35.238'), (1, 'sn22.79'), (1, 'sn22.47'), (1, 'sn22.105'), (1, 'sn22.104'), (1, 'sn22.103'), (1, 'mn9'), (1, 'mn75'), (1, 'mn151'), (1, 'mn112'), (1, 'dn34'), (1, 'dn33'), (1, 'dn14'), (1, 'an9.66'), (1, 'an6.63'), (1, 'an5.30'), (1, 'an4.41'), (1, 'an3.61'), (1, 'an10.60')]
E. Counts the word that immediately follows the search string and orders the results by frequency:
[['', 50], ['are', 11], ['in', 8], ['I’m', 3], ['as', 3], ['for', 3], ['.’', 1], ['?’', 1], ['is', 1], ['that', 1], ['’', 1]]
Another example of this feature: finding the names of all Venerables and how often they occur is easy:
python3.7 text_analysis.py "Venerable"
returns
[['Ānanda', 303], ['Sāriputta', 207], ['sir', 105], ['Mahāmoggallāna', 63], ['Anuruddha', 49], ['Mahākaccāna', 32], ['Rādha', 29], ['', 25], ['Udāyī', 25], ['Mahākassapa', 24], ['Channa', 20], ['Vaṅgīsa', 19], ['Bhāradvāja', 16], ['Rāhula', 15], ['s', 15], ['Mahākoṭṭhita', 14], ['Nārada', 14], ['Kassapa', 13], ['Bakkula', 12], ['Dhammika', 11], ['Mahācunda', 11], ['Samiddhi', 11], ['Upavāṇa', 11], ['Anurādha', 9], ['Phagguṇa', 9], ['Khemaka', 8], ['Meghiya', 8], ['Aṅgulimāla', 7], ['Isidatta', 7], ['Māluṅkyaputta', 7], ['Puṇṇa', 7], ['Bāhiya', 6], ['Mahākappina', 6], ['Nandaka', 6], ['Uttara', 6], ['Vakkali', 6], ['Bhaddiya', 5], ['Citta', 5], ['Kimbila', 5], ['Koṇḍañña', 5], ['Kāmabhū', 5], ['Raṭṭhapāla', 5], ['Sāriputta’s', 5], ['Upāli', 5], ['Assaji', 4], ['Bhūmija', 4], ['Mahaka', 4], ['Moggallāna', 4], ['Nanda', 4], ['Nāgita', 4], ['Uttiya', 4], ['Visākha', 4], ['Abhiya', 3], ['Bhadda', 3], ['Bhaddāli', 3], ['Gavampati', 3], ['Girimānanda', 3], ['Godhika', 3], ['Gotama', 3], ['Kassapagotta', 3], ['Lomasakaṅgiya', 3], ['Migajāla', 3], ['Nāgadatta', 3], ['Nāgasamāla', 3], ['Phagguna', 3], ['Piṇḍola', 3], ['Pukkusāti', 3], ['Soṇa', 3], ['Upasena', 3], ['Vacchagotta', 3], ['Bhaddaji', 2], ['Brahmadeva', 2], ['Candikāputta', 2], ['Cundaka', 2], ['Gavesī', 2], ['Godatta', 2], ['Kappa', 2], ['Khema', 2], ['Kosiya', 2], ['Lomasavaṅgīsa', 2], ['Mahāmoggallāna’s', 2], ['Māgaṇḍiya', 2], ['Nanda—who', 2], ['Pilindavaccha', 2], ['Puṇṇiya', 2], ['Revata', 2], ['Sandha', 2], ['Saviṭṭha', 2], ['Saṅgāmaji', 2], ['Sela', 2], ['Seniya', 2], ['Subhadda', 2], ['Subhūti', 2], ['Surādha', 2], ['Susīma', 2], ['Tissa', 2], ['Udena', 2], ['Vidhura', 2], ['Yasoja', 2], ['Ambaṭṭha', 1], ['Anuruddha’s', 1], ['Ariṭṭha', 1], ['Bhagu', 1], ['Brahmadeva’s', 1], ['Bāhuna', 1], ['Cetaka', 1], ['Cūḷapanthaka', 1], ['Dabba', 1], ['Dāsaka', 1], ['Gotama.’', 1], ['Gotama?’', 1], ['Isidāsī', 1], ['Kaccāna', 1], ['Kaccānagotta', 1], ['Lakkhaṇa', 1], ['Mahākaccāna’s', 1], ['Mogharāja', 1], ['Musila', 1], ['Māluṅkya', 1], ['Nanda—the', 1], ['Nigrodhakappa', 1], ['Raṭṭhapāla’s', 1], ['Sabhiya', 1], ['Sañjīva', 1], ['Senior', 1], ['Sujāta', 1], ['Tissa—the', 1], ['Tāḷapuṭa', 1]]
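For anyone curious how features A to E could fit together, here is a minimal sketch. It is not the code of text_analysis.py itself. It assumes the translation and the root text are each available as a dictionary mapping a segment key such as "an10.60:4.4" to its text, and all function names are hypothetical. The "demo" keys and sentences are invented placeholders, not real sutta text. The counting in D is per matching segment and the tokenizing in E is a plain whitespace split, so the numbers will not necessarily agree with the output shown above.

```python
from collections import Counter

# Sample data. The an10.60 segment is quoted from the post; the "demo" keys
# and sentences are invented placeholders, not real sutta text.
translation = {
    "an10.60:4.4": "And so they meditate observing impermanence in the five grasping aggregates. ",
    "demo1:1.1": "Venerable Ānanda went to Venerable Sāriputta and sat down. ",
    "demo1:1.2": "Venerable Ānanda sat down to one side. ",
    "demo2:1.1": "Then Venerable Ānanda spoke. ",
}
root = {
    "an10.60:4.4": "Iti imesu pañcasu upādānakkhandhesu aniccānupassī viharati.",
}


def search(phrase, segments):
    """A: return (key, text) pairs whose text contains the phrase (case-sensitive)."""
    return [(key, text) for key, text in segments.items() if phrase in text]


def pali_for(key, root_segments):
    """B: return the root-text segment stored under the same key, or None."""
    return root_segments.get(key)


def sutta_ids(hits):
    """C: return the sorted sutta numbers (the part of each key before ':')."""
    return sorted({key.split(":", 1)[0] for key, _ in hits})


def rank_suttas(hits):
    """D: return (count, sutta) pairs, suttas with most matching segments first."""
    counts = Counter(key.split(":", 1)[0] for key, _ in hits)
    return [(n, sutta) for sutta, n in counts.most_common()]


def next_words(phrase, hits):
    """E: count the whitespace-delimited token after each occurrence of phrase."""
    if not phrase:
        return []  # an empty phrase would match everywhere
    counts = Counter()
    for _, text in hits:
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            rest = text[end:].split()
            counts[rest[0] if rest else ""] += 1
            start = text.find(phrase, end)
    return counts.most_common()


if __name__ == "__main__":
    hits = search("Venerable", translation)
    print(rank_suttas(hits))
    print(next_words("Venerable", hits))
```

Using collections.Counter keeps both rankings to a line or two each, and most_common() already returns the entries sorted by descending count.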
- Furthermore, it returns all verses associated with a key, for example:
python3.7 text_analysis.py "mn21"
"mn21:0.1": "Middle Discourses 21 ",
mn21:0.1: Majjhima Nikāya 21
"mn21:0.2": "The Simile of the Saw ",
mn21:0.2: Kakacūpamasutta
"mn21:1.1": "So I have heard. ",
mn21:1.1: Evaṁ me sutaṁ—
and so on for the rest of the sutta.
- Returns a specific verse from a key:
python3.7 text_analysis.py "dn1:0.2"
"dn1:0.2": "The Prime Net ",
dn1:0.2: Brahmajālasutta
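The three kinds of input above (a phrase, a sutta number, a single verse key) could be told apart roughly as follows. This is only a sketch under assumptions: the regular expressions, the function names classify and lookup, and the dictionary layout are all hypothetical, and keys with ranges or other more complex forms are not handled.

```python
import re

# Hypothetical patterns: "dn1:0.2" is a verse key, "mn21" or "an10.60" a sutta number.
SEGMENT_RE = re.compile(r"[a-z]+\d+(\.\d+)*:\d+(\.\d+)*")
SUTTA_RE = re.compile(r"[a-z]+\d+(\.\d+)*")

# Sample segments quoted from the examples in the post.
translation = {
    "mn21:0.1": "Middle Discourses 21 ",
    "mn21:0.2": "The Simile of the Saw ",
    "mn21:1.1": "So I have heard. ",
    "dn1:0.2": "The Prime Net ",
}


def classify(query):
    """Return 'segment', 'sutta' or 'phrase' depending on the shape of the query."""
    if SEGMENT_RE.fullmatch(query):
        return "segment"
    if SUTTA_RE.fullmatch(query):
        return "sutta"
    return "phrase"


def lookup(query, segments):
    """Return the matching {key: text} entries for any of the three query kinds."""
    kind = classify(query)
    if kind == "segment":
        return {query: segments[query]} if query in segments else {}
    if kind == "sutta":
        prefix = query + ":"  # the colon keeps "mn2" from matching "mn21"
        return {k: v for k, v in segments.items() if k.startswith(prefix)}
    return {k: v for k, v in segments.items() if query in v}


if __name__ == "__main__":
    for key, text in lookup("mn21", translation).items():
        print(f'"{key}": "{text}",')
```

Checking for a verse key first matters, because everything before the colon in a verse key would also pass the sutta pattern.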
My idea is to have the program speak at least some of these search results in response to a voice navigation input.
Finally, sadhu to everyone for making this resource available.