Training AI models on the suttas

For a long time I’ve been working on software that can take all the segmented English translations of the suttas and help produce a large database of:

The speaker of each quoted passage, when the speaker is indicated somewhere in the text
So I have heard.  At one time the Buddha was staying near Rājagaha in the Mango Grove of Jīvaka Komārabhacca, together with a large Saṅgha of 1,250 mendicants.  Now, at that time it was the sabbath—the Komudi full moon on the fifteenth day of the fourth month—and King Ajātasattu Vedehiputta of Magadha was sitting upstairs in the royal longhouse surrounded by his ministers.  Then Ajātasattu expressed this heartfelt sentiment,   
 
    King Ajātasattu Vedehiputta: “Oh, sirs, this moonlit night is so very delightful, so beautiful, so glorious, so lovely, so striking. Now, what ascetic or
                                 brahmin might I pay homage to today, paying homage to whom my mind might find peace?”

  When he had spoken, one of the king’s ministers said to him,   
  
    one of the king's ministers: “Sire, Pūraṇa Kassapa leads an order and a community, and teaches a community. He’s a well-known and famous religious founder,
                                 regarded as holy by many people. He is of long standing, long gone forth; he is advanced in years and has reached the final
                                 stage of life. Let Your Majesty pay homage to him.  Hopefully in so doing your mind will find peace.”

  But when he had spoken, the king kept silent.  Another of the king’s ministers said to him,   
  
another of the king's ministers: “Sire, Makkhali Gosāla leads an order and a community, and teaches a community. He’s a well-known and famous religious founder,
                                 regarded as holy by many people. He is of long standing, long gone forth; he is advanced in years and has reached the final
                                 stage of life. Let Your Majesty pay homage to him.  Hopefully in so doing your mind will find peace.”

  But when he had spoken, the king kept silent.  Another of the king’s ministers said to him,   
  
another of the king's ministers: “Sire, Ajita Kesakambala leads an order and a community, and teaches a community. He’s a well-known and famous religious
                                 founder, regarded as holy by many people. He is of long standing, long gone forth; he is advanced in years and has reached the
                                 final stage of life. Let Your Majesty pay homage to him.  Hopefully in so doing your mind will find peace.”

  But when he had spoken, the king kept silent.  Another of the king’s ministers said to him,   
  
another of the king's ministers: “Sire, Pakudha Kaccāyana leads an order and a community, and teaches a community. He’s a well-known and famous religious
                                 founder, regarded as holy by many people. He is of long standing, long gone forth; he is advanced in years and has reached the
                                 final stage of life. Let Your Majesty pay homage to him.  Hopefully in so doing your mind will find peace.”

  But when he had spoken, the king kept silent.  Another of the king’s ministers said to him,   
  
another of the king's ministers: “Sire, Sañjaya Belaṭṭhiputta leads an order and a community, and teaches a community. He’s a well-known and famous religious
                                 founder, regarded as holy by many people. He is of long standing, long gone forth; he is advanced in years and has reached the
                                 final stage of life. Let Your Majesty pay homage to him.  Hopefully in so doing your mind will find peace.”

  But when he had spoken, the king kept silent.  Another of the king’s ministers said to him,   
  
another of the king's ministers: “Sire, Nigaṇṭha Nātaputta leads an order and a community, and teaches a community. He’s a well-known and famous religious
                                 founder, regarded as holy by many people. He is of long standing, long gone forth; he is advanced in years and has reached the
                                 final stage of life. Let Your Majesty pay homage to him.  Hopefully in so doing your mind will find peace.”

  But when he had spoken, the king kept silent.  Now at that time Jīvaka Komārabhacca was sitting silently not far from the king.  Then the king said to him,  

    King Ajātasattu Vedehiputta: “But my dear Jīvaka, why are you silent?”
    
            Jīvaka Komārabhacca: “Sire, the Blessed One, the perfected one, the fully awakened Buddha is staying in my mango grove together with a large Saṅgha
                                 of 1,250 mendicants. He has this good reputation:  ‘That Blessed One is perfected, a fully awakened Buddha, accomplished in
                                 knowledge and conduct, holy, knower of the world, supreme guide for those who wish to train, teacher of gods and humans,
                                 awakened, blessed.’  Let Your Majesty pay homage to him.  Hopefully in so doing your mind will find peace.”

    King Ajātasattu Vedehiputta: “Well then, my dear Jīvaka, have the elephants readied.”

            Jīvaka Komārabhacca: “Yes, Your Majesty,”

 replied Jīvaka. He had around five hundred female elephants readied, in addition to the king’s bull elephant for riding. Then he informed the king,   
 
            Jīvaka Komārabhacca: “The elephants are ready, sire. Please go at your convenience.”

  Then King Ajātasattu had women mounted on each of the five hundred female elephants, while he mounted his bull elephant. With attendants carrying torches, he set out in full royal pomp from Rājagaha to Jīvaka’s mango grove.  But as he drew near the mango grove, the king became frightened, scared, his hair standing on end.  He said to Jīvaka,   
   
    King Ajātasattu Vedehiputta: “My dear Jīvaka, I hope you’re not deceiving me! I hope you’re not betraying me!  I hope you’re not turning me over to my
                                 enemies!  For how on earth can there be no sound of coughing or clearing throats or any noise in such a large Saṅgha of 1,250
                                 mendicants?”

            Jīvaka Komārabhacca: “Do not fear, great king, do not fear! I am not deceiving you,  or betraying you,  or turning you over to your enemies.  Go
                                 forward, great king, go forward! Those are lamps shining in the pavilion.”

  Then King Ajātasattu rode on the elephant as far as the terrain allowed, then descended and approached the pavilion door on foot, where he asked Jīvaka,  
 
    King Ajātasattu Vedehiputta: “But my dear Jīvaka, where is the Buddha?”   
 
            Jīvaka Komārabhacca: “That is the Buddha, great king, that is the Buddha! He’s sitting against the central column facing east, in front of the
                                 Saṅgha of mendicants.”

  Then the king went up to the Buddha and stood to one side.  He looked around the Saṅgha of monks, who were so very silent, like a still, clear lake, and expressed this heartfelt sentiment,  

    King Ajātasattu Vedehiputta: “May my son, Prince Udāyibhadda, be blessed with such peace as the Saṅgha of mendicants now enjoys!”
 
                     the Buddha: “Has your mind gone to one you love, great king?”
 
    King Ajātasattu Vedehiputta: “I love my son, sir, Prince Udāyibhadda. May he be blessed with such peace as the Saṅgha of mendicants now enjoys!”

  Then the king bowed to the Buddha, raised his joined palms toward the Saṅgha, and sat down to one side.  He said to the Buddha,  
 
    King Ajātasattu Vedehiputta: “Sir, I’d like to ask you about a certain point, if you’d take the time to answer.”

                     the Buddha: “Ask what you wish, great king.”

    King Ajātasattu Vedehiputta: “Sir, there are many different professional fields. These include elephant riders, cavalry, charioteers, archers, bannermen,
                                 adjutants, food servers, warrior-chiefs, princes, chargers, great warriors, heroes, leather-clad soldiers, and sons of
                                 bondservants.  They also include bakers, barbers, bathroom attendants, cooks, garland-makers, dyers, weavers, basket-makers,
                                 potters, accountants, finger-talliers, or those following any similar professions. All these live off the fruits of their
                                 profession which are apparent in the present life.  With that they bring happiness and joy to themselves, their parents, their
                                 children and partners, and their friends and colleagues. And they establish an uplifting religious donation for ascetics and
                                 brahmins that’s conducive to heaven, ripens in happiness, and leads to heaven.  Sir, can you point out a fruit of the ascetic
                                 life that’s likewise apparent in the present life?”
 
                     the Buddha: “Great king, do you recall having asked this question of other ascetics and brahmins?”
 
    King Ajātasattu Vedehiputta: “I do, sir.”
 
    King Ajātasattu Vedehiputta: “If you wouldn’t mind, great king, tell me how they answered.”
 
    King Ajātasattu Vedehiputta: “It’s no trouble when someone such as the Blessed One is sitting here.”

                     the Buddha: “Well, speak then, great king.”
                                                                                                                     
    King Ajātasattu Vedehiputta: “One time, sir, I approached Pūraṇa Kassapa and exchanged greetings with him. When the greetings and polite conversation were
                                 over, I sat down to one side, and asked him the same question.  He said to me:  ‘Great king, the one who acts does nothing
                                 wrong when they punish, mutilate, torture, aggrieve, oppress, intimidate, or when they encourage others to do the same. They do
                                 ...
                                 bring happiness and joy to themselves, their parents, their children and partners, and their friends and colleagues. And they
                                 establish an uplifting religious donation for ascetics and brahmins that’s conducive to heaven, ripens in happiness, and leads
                                 to heaven.  Sir, can you point out a fruit of the ascetic life that’s likewise apparent in the present life?”
               
                     the Buddha: “I can, great king. Well then, I’ll ask you about this in return, and you can answer as you like.  What do you think, great
                                 king?  Suppose you had a person who was a bondservant, a worker. They get up before you and go to bed after you, and are
                                 obliging, behaving nicely and speaking politely, and gazing up at your face.  They’d think:  ‘The outcome and result of good
                                 deeds is just so incredible, so amazing!  For this King Ajātasattu is a human being, and so am I.  Yet he amuses himself,
                                 supplied and provided with the five kinds of sensual stimulation as if he were a god.  Whereas I’m his bondservant, his worker.
                                 I get up before him and go to bed after him, and am obliging, behaving nicely and speaking politely, and gazing up at his face.
                                 I should do good deeds.  Why don’t I shave off my hair and beard, dress in ocher robes, and go forth from the lay life to
                                 homelessness?’  After some time, that is what they do.  Having gone forth they’d live restrained in body, speech, and mind,
                                 living content with nothing more than food and clothes, delighting in seclusion.  And suppose your men were to report all this
                                 to you.  Would you say to them:  ‘Bring that person to me! Let them once more be my bondservant, my worker’?”
 
    King Ajātasattu Vedehiputta: “No, sir. Rather, I would bow to them, rise in their presence, and offer them a seat. I’d invite them to accept robes,
                                 almsfood, lodgings, and medicines and supplies for the sick. And I’d organize their lawful guarding and protection.”
 
                     the Buddha: “What do you think, great king? If this is so, is there a fruit of the ascetic life apparent in the present life or not?”
 
    King Ajātasattu Vedehiputta: “Clearly, sir, there is.”
 
                     the Buddha: “This is the first fruit of the ascetic life that’s apparent in the present life, which I point out to you.”
 
    King Ajātasattu Vedehiputta: “But sir, can you point out another fruit of the ascetic life that’s likewise apparent in the present life?”
               
                     the Buddha: “I can, great king. Well then, I’ll ask you about this in return, and you can answer as you like.  What do you think, great
                                 king?  Suppose you had a person who was a farmer, a householder, a hard worker, someone who builds up their capital.  They’d
                                 think:  ‘The outcome and result of good deeds is just so incredible, so amazing!  For this King Ajātasattu is a human being,
                                 and so am I.  Yet he amuses himself, supplied and provided with the five kinds of sensual stimulation as if he were a god.
                                 Whereas I’m a farmer, a householder, a hard worker, someone who builds up their capital.  I should do good deeds.  Why don’t I
                                 shave off my hair and beard, dress in ocher robes, and go forth from the lay life to homelessness?’  After some time they give
                                 up a large or small fortune, and a large or small family circle. They’d shave off hair and beard, dress in ocher robes, and go
                                 forth from the lay life to homelessness.  Having gone forth they’d live restrained in body, speech, and mind, living content
                                 with nothing more than food and clothes, delighting in seclusion.  And suppose your men were to report all this to you.  Would
                                 you say to them:  ‘Bring that person to me! Let them once more be a farmer, a householder, a hard worker, someone who builds up
                                 their capital’?”

    King Ajātasattu Vedehiputta: “No, sir. Rather, I would bow to them, rise in their presence, and offer them a seat. I’d invite them to accept robes,
                                 almsfood, lodgings, and medicines and supplies for the sick. And I’d organize their lawful guarding and protection.”

                     the Buddha: “What do you think, great king? If this is so, is there a fruit of the ascetic life apparent in the present life or not?”
 
    King Ajātasattu Vedehiputta: “Clearly, sir, there is.”
 
                     the Buddha: “This is the second fruit of the ascetic life that’s apparent in the present life, which I point out to you.”
 
    King Ajātasattu Vedehiputta: “But sir, can you point out a fruit of the ascetic life that’s apparent in the present life which is better and finer than
                                 these?”
 
                     the Buddha: “I can, great king. Well then, listen and pay close attention, I will speak.”
 
    King Ajātasattu Vedehiputta: “Yes, sir,”

 replied the king.  The Buddha said this:   
                                                                                                                                                                                                                                                                              
                     the Buddha: “Consider when a Realized One arises in the world, perfected, a fully awakened Buddha, accomplished in knowledge and conduct,
                                 holy, knower of the world, supreme guide for those who wish to train, teacher of gods and humans, awakened, blessed. He has
                                 ...
                                 fish swimming about or staying still.’  In the same way, when their mind has become immersed in samādhi like this—purified,
                                 bright, flawless, rid of corruptions, pliable, workable, steady, and imperturbable—they extend it and project it toward
                                 knowledge of the ending of defilements.  This too, great king, is a fruit of the ascetic life that’s apparent in the present
                                 life which is better and finer than the former ones.  And, great king, there is no other fruit of the ascetic life apparent in
                                 the present life which is better and finer than this.”

  When the Buddha had spoken, King Ajātasattu said to him,   
     
    King Ajātasattu Vedehiputta: “Excellent, sir! Excellent! As if he were righting the overturned, or revealing the hidden, or pointing out the path to the
                                 lost, or lighting a lamp in the dark so people with good eyes can see what’s there, the Buddha has made the teaching clear in
                                 many ways.  I go for refuge to the Buddha, to the teaching, and to the mendicant Saṅgha.  From this day forth, may the Buddha
                                 remember me as a lay follower who has gone for refuge for life.  I have made a mistake, sir. It was foolish, stupid, and
                                 unskillful of me to take the life of my father, a just and principled king, for the sake of authority.  Please, sir, accept my
                                 mistake for what it is, so I will restrain myself in future.”
  
                     the Buddha: “Indeed, great king, you made a mistake. It was foolish, stupid, and unskillful of you to take the life of your father, a just
                                 and principled king, for the sake of sovereignty. But since you have recognized your mistake for what it is, and have dealt
                                 with it properly, I accept it.  For it is growth in the training of the Noble One to recognize a mistake for what it is, deal
                                 with it properly, and commit to restraint in the future.”

  When the Buddha had spoken, King Ajātasattu said to him,  

 
    King Ajātasattu Vedehiputta: “Well, now, sir, I must go. I have many duties, and much to do.”

   
                     the Buddha: “Please, great king, go at your convenience.”

  Then the king, having approved and agreed with what the Buddha said, got up from his seat, bowed, and respectfully circled him, keeping him on his right, before leaving.  Soon after the king had left, the Buddha addressed the mendicants,   
  
                     the Buddha: “The king is broken, mendicants, he is ruined.  If he had not taken the life of his father, a just and principled king, the
                                 stainless, immaculate vision of the Dhamma would have arisen in him in that very seat.”

  That is what the Buddha said.  Satisfied, the mendicants were happy with what the Buddha said.
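The layout of the labelled transcript above (a right-aligned speaker label followed by a hanging-indented quote) can be reproduced with a short sketch like the one below. This is only an illustration: the function name `render_quote` and the two width values are assumptions read off the excerpt, not the project's actual code.

```python
import textwrap


def render_quote(speaker: str, quote: str,
                 label_width: int = 31, width: int = 120) -> str:
    """Format one attributed quote as '<right-aligned speaker>: <wrapped quote>'.

    label_width and width are hypothetical defaults chosen to resemble
    the excerpt above; they are not taken from the real software.
    """
    # Right-align the speaker name so that all quotes start in the same column.
    label = f"{speaker:>{label_width}}: "
    # Continuation lines are indented to line up under the start of the quote.
    return textwrap.fill(
        quote,
        width=width,
        initial_indent=label,
        subsequent_indent=" " * len(label),
    )


if __name__ == "__main__":
    print(render_quote("the Buddha", "“Ask what you wish, great king.”"))
    print()
    print(render_quote(
        "King Ajātasattu Vedehiputta",
        "“Sir, I’d like to ask you about a certain point, "
        "if you’d take the time to answer.”",
        width=80,
    ))
```

Keeping the label width fixed means a long label such as "another of the king's ministers" starts at the left margin while shorter names are padded, which is what makes the quotes line up.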
Where each discourse took place and who else was present, for each part of the sutta being asked about
{
  "dn2:1.5+dn2:1.6": "King Aj\u0101tasattu Vedehiputta",
  "dn2:2.2+dn2:2.4": "one of the king's ministers",
  "dn2:3.2+dn2:3.4": "another of the king's ministers",
  "dn2:4.2+dn2:4.4": "another of the king's ministers",
  "before^DN2_4.2^who_else_present": "a large Sa\u1e45gha of 1,250 mendicants",
  "before^DN2_4.2^where": "R\u0101jagaha in the Mango Grove of J\u012bvaka Kom\u0101rabhacca",
  "dn2:5.2+dn2:5.4": "another of the king's ministers",
  "dn2:6.2+dn2:6.4": "another of the king's ministers",
  "dn2:7.2+dn2:7.4": "another of the king's ministers",
  "dn2:8.3": "King Aj\u0101tasattu Vedehiputta",
  "dn2:8.4+dn2:8.8": "J\u012bvaka Kom\u0101rabhacca",
  "dn2:8.9": "King Aj\u0101tasattu Vedehiputta",
  "dn2:9.1": "J\u012bvaka Kom\u0101rabhacca",
  "before^DN2_9.1^who_else_present": "the king's ministers, the Buddha, a large Sa\u1e45gha of 1,250 mendicants, the king's bull elephant",
  "before^DN2_9.1^where": "J\u012bvaka's mango grove",
...
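To make the key format in the excerpt above easier to work with, here is a minimal Python sketch that turns each key into a small record. The key shapes it handles are inferred only from the lines shown (a single segment, a `+`-joined segment range, and `before^<SUTTA>_<segment>^<field>` context keys). The names `Entry`, `parse_entry` and `load_entries` are hypothetical, and the sketch assumes both ends of a range belong to the same sutta.

```python
import json
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str    # "speaker", or a context field such as "where"
    sutta: str   # e.g. "dn2"
    start: str   # first segment number, e.g. "1.5"
    end: str     # last segment number (equal to start for a single segment)
    value: str   # the speaker name or the context description


def _split_segment(ref: str) -> tuple[str, str]:
    """Split 'dn2:1.5' into ('dn2', '1.5')."""
    sutta, sep, segment = ref.partition(":")
    if not (sutta and sep and segment):
        raise ValueError(f"unrecognised segment reference: {ref!r}")
    return sutta, segment


def parse_entry(key: str, value: str) -> Entry:
    """Interpret one key/value pair (key shapes are assumptions)."""
    if key.startswith("before^"):
        # Context key, e.g. "before^DN2_4.2^where".
        _, ref, field = key.split("^", 2)
        sutta, _, segment = ref.partition("_")
        return Entry(field, sutta.lower(), segment, segment, value)
    # Speaker key: one segment, or "first+last" for a multi-segment quote.
    parts = key.split("+")
    sutta, start = _split_segment(parts[0])
    _, end = _split_segment(parts[-1])
    return Entry("speaker", sutta, start, end, value)


def load_entries(text: str) -> list[Entry]:
    """Parse a JSON object of the shape shown above into Entry records."""
    return [parse_entry(k, v) for k, v in json.loads(text).items()]


# A few lines copied from the excerpt; json.loads decodes the \u escapes.
SAMPLE = r'''{
  "dn2:8.3": "King Aj\u0101tasattu Vedehiputta",
  "dn2:8.4+dn2:8.8": "J\u012bvaka Kom\u0101rabhacca",
  "before^DN2_9.1^where": "J\u012bvaka's mango grove"
}'''

if __name__ == "__main__":
    for entry in load_entries(SAMPLE):
        print(entry)
```

Splitting the keys once up front keeps speaker attributions and location or audience context in the same record type, so they can be filtered by `kind` or sorted by segment later.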
Broad lists and numbered lists, identified from sentence structure or because the text names them
{
  "dn2": {
    "The six religious founders [dn2:0.3]": [
      "Pūraṇa Kassapa",
      "Makkhali Gosāla",
      "Ajita Kesakambala",
      "Pakudha Kaccāyana",
      "Sañjaya Belaṭṭhiputta",
      "Nigaṇṭha Nātaputta"
    ],
    "The different professional fields [dn2:11.0]": [
      "Elephant riders",
      "Cavalry",
      "Charioteers",
      "Archers",
      "Bannermen",
      "Adjutants",
      "Food servers",
      "Warrior-chiefs",
      "Princes",
      "Chargers",
      "Great warriors",
      "Heroes",
      "Leather-clad soldiers",
      "Sons of bondservants",
      "Bakers",
      "Barbers",
      "Bathroom attendants",
      "Cooks",
      "Garland-makers",
      "Dyers",
      "Weavers",
      "Basket-makers",
      "Potters",
      "Accountants",
      "Finger-talliers",
      "Those following any similar professions"
    ],
    "The things for which one who acts does not do wrong according to Pūraṇa Kassapa [dn2:16.0]": [
      "Punishing",
      "Mutilating",
      "Torturing",
      "Aggrieving",
      "Oppressing",
      "Intimidating",
      "Encouraging others to do the same",
      "Killing",
      "Stealing",
      "Breaking into houses",
      "Plundering wealth",
      "Stealing from isolated buildings",
      "Committing highway robbery",
      "Committing adultery",
      "Lying"
    ],
    "The things for which Makkhali Gosāla said there is no cause or condition [dn2:19.0]": [
      "The corruption of sentient beings",
      "The purification of sentient beings",
      "Acting of one's own volition",
      "Acting of another's volition",
      "Acting from a person's volition"
    ],
    "The things Makkhali Gosāla said all sentient beings, living creatures, beings, and souls lack [dn2:19.0]": [
      "Control",
      "Power",
      "Energy"
    ],
    "The things Makkhali Gosāla said are allotted [dn2:19.0]": [
      "Pleasure",
      "Pain"
    ],
    "The things Makkhali Gosāla said there are 7 of [dn2:19.0]": [
      "Main wombs",
      "Sub-eons",
      "Classes of rebirth",
      "Stages in a person's life",
      "Ājīvaka ascetics",
      "Wanderers",
      "Naked ascetics"
    ],
    "The things Makkhali Gosāla said there are 8.4 million of [dn2:19.0]": [
      "Great eons",
      "Fools",
      "Astute"
    ],
    "The things for which Ajita Kesakambala says there is no meaning [dn2:22.0]": [
      "Giving",
      "Sacrifice",
      "Offerings"
    ],
    "The things for which Ajita Kesakambala says there is no fruit or result [dn2:22.0]": [
      "Good deeds",
      "Bad deeds"
    ],
    "The things that Ajita Kesakambala says do not exist [dn2:22.0]": [
      "Afterlife",
      "Obligation to mother and father",
      "Beings that are reborn spontaneously",
      "Ascetic or brahmin who is well attained and practiced"
    ],
    "The seven substances which are not made, not derived, not created, without a creator, barren, steady as a mountain peak, standing firm like a pillar according to Pakudha Kaccāyana [dn2:25.0]": [
      "The substance of earth",
      "The substance of water",
      "The substance of fire",
      "The substance of air",
      "Pleasure",
      "Pain",
      "The soul"
    ],
    "The fourfold restraint according to Nigaṇṭha Nātaputta  [dn2:28.0]": [
      "Obstructed by all water",
      "Devoted to all water",
      "Shaking off all water",
      "Pervaded by all water"
    ],
    "The different professional fields [dn2:31.0]": [
      "Elephant riders",
      "Cavalry",
      "Charioteers",
      "Archers",
      "Bannermen",
      "Adjutants",
      "Food servers",
      "Warrior-chiefs",
      "Princes",
      "Chargers",
      "Great warriors",
      "Heroes",
      "Leather-clad soldiers",
      "Sons of bondservants",
      "Bakers",
      "Barbers",
      "Bathroom attendants",
      "Cooks",
      "Garland-makers",
      "Dyers",
      "Weavers",
      "Basket-makers",
      "Potters",
      "Accountants",
      "Finger-talliers",
      "Those following any similar professions"
    ],
    "The fruits of the ascetic life that are apparent in the present life [dn2:39.0]": [
      "The Realized One arises in the world",
      "The Realized One is perfected",
      "The Realized One is fully awakened",
      "The Realized One is accomplished in knowledge and conduct",
      "The Realized One is holy",
      "The Realized One is a knower of the world",
      "The Realized One is the supreme guide for those who wish to train",
      "The Realized One is the teacher of gods and humans",
      "The Realized One is awakened",
      "The Realized One is blessed",
      "The Realized One has realized with his own insight this world—with its gods, Māras and Brahmās, this population with its ascetics and brahmins, gods and humans",
      "The Realized One makes it known to others",
      "The Realized One teaches Dhamma that’s good in the beginning, good in the middle, and good in the end, meaningful and well-phrased",
      "The Realized One reveals a spiritual practice that’s entirely full and pure",
      "A householder hears that teaching, or a householder’s child, or someone reborn in some clan",
      "They gain faith in the Realized One, and reflect: ‘Living in a house is cramped and dirty, but the life of one gone forth is wide open. It’s not easy for someone living at home to lead the spiritual life utterly full and pure, like a polished shell. Why don’t I shave off my hair and beard, dress in ocher robes, and go forth from the lay life to homelessness?’",
      "After some time they give up a large or small fortune, and a large or small family circle",
      "They shave off hair and beard, dress in ocher robes, and go forth from the lay life to homelessness",
      "Once they’ve gone forth, they live restrained in the monastic code, conducting themselves well and seeking alms"
    ],
    "The things that some ascetics and brahmins still engage in, despite enjoying food given in faith [dn2:46.0]": [
      "Injuring plants and seeds",
      "Storing up goods for their own use",
      "Seeing shows",
      "Gambling that causes negligence",
      "Making use of high and luxurious bedding",
      "Beautifying and adorning themselves with garlands, fragrance, and makeup",
      "Engaging in unworthy talk",
      "Engaging in arguments",
      "Engaging in running errands and messages",
      "Engaging in deceit, flattery, hinting, and belittling, and using material possessions to chase after other material possessions"
    ],
    "The fields in which ascetics and brahmins should not engage [dn2:56.0]": [
      "Limb-reading",
      "Omenology",
      "Divining celestial portents",
      "Interpreting dreams",
      "Divining bodily marks",
      "Divining holes in cloth gnawed by mice",
      "Fire offerings",
      "Ladle offerings",
      "Offerings of husks, rice powder, rice, ghee, or oil",
      "Offerings from the mouth",
      "Blood sacrifices",
      "Palmistry",
      "Geomancy for building sites, fields, and cemeteries",
      "Exorcisms",
      "Earth magic",
      "Snake charming",
      "Poisons",
      "The crafts of the scorpion, the rat, the bird, and the crow",
      "Prophesying life span",
      "Chanting for protection",
      "Deciphering animal cries",
      "Reading the marks of gems, cloth, clubs, swords, spears, arrows, weapons, women, men, boys, girls, male and female bondservants, elephants, horses, buffaloes, bulls, cows, goats, rams, chickens, quails, monitor lizards, rabbits, tortoises, or deer",
      "Making predictions that the king will march forth or march back; or that our king will attack and the enemy king will retreat, or vice versa; or that our king will triumph and the enemy king will be defeated, or vice versa; and so there will be victory for one and defeat for the other",
      "Making predictions that there will be an eclipse of the moon, or sun, or stars; that the sun, moon, and stars will be in conjunction or in opposition; that there will be a meteor shower, a fiery sky, an earthquake, thunder; that there will be a rising, a setting, a darkening, a brightening of the moon, sun, and stars",
      "Making predictions about the results of all such phenomena"
    ],
    "The things for which a mendicant has mindfulness and situational awareness [dn2:65.0]": [
      "Going out and coming back",
      "Looking ahead and aside",
      "Bending and extending the limbs",
      "Bearing the outer robe, bowl and robes",
      "Eating, drinking, chewing, and tasting",
      "Urinating and defecating",
      "Walking, standing, sitting, sleeping, waking, speaking, and keeping silent"
    ],
    "The things with which a mendicant is content [dn2:66.0]": [
      "Robes to look after the body",
      "Almsfood to look after the belly"
    ],
    "The five hindrances [dn2:67.0]": [
      "Desire for the world",
      "Ill will and malevolence",
      "Dullness and drowsiness",
      "Restlessness and remorse",
      "Doubt"
    ],
    "The things that happen when a mendicant enters the first absorption [dn2:75.0]": [
      "They enter and remain in the first absorption",
      "They place the mind and keep it connected",
      "They drench, steep, fill, and spread their body with rapture and bliss born of seclusion",
      "There's no part of the body that's not spread with rapture and bliss born of seclusion"
    ],
    "The things that happen when a mendicant enters and remains in the second absorption [dn2:77.0]": [
      "The mendicant's mind is stilled",
      "The mendicant enters and remains in the second absorption",
      "The mendicant has rapture and bliss born of immersion",
      "The mendicant has internal clarity and confidence",
      "The mendicant has a unified mind",
      "The mendicant does not apply the mind and keep it connected"
    ],
    "The things that happen when a mendicant drenches, steeps, fills, and spreads their body with rapture and bliss born of immersion [dn2:77.0]": [
      "The mendicant drenches their body with rapture and bliss born of immersion",
      "The mendicant steeps their body with rapture and bliss born of immersion",
      "The mendicant fills their body with rapture and bliss born of immersion",
      "The mendicant spreads their body with rapture and bliss born of immersion",
      "There is no part of the mendicant's body that is not spread with rapture and bliss born of immersion"
    ],
    "The different types of absorption [dn2:79.0]": [
      "The first absorption",
      "The second absorption",
      "The third absorption",
      "The fourth absorption"
    ],
    "The eight knowledges which are a fruit of the ascetic life [dn2:83.0]": [
      "Knowledge and vision",
      "Mind-made body",
      "Psychic powers",
      "Clairaudience",
      "Comprehending the minds of others",
      "Recollection of past lives",
      "Clairvoyance",
      "The death and rebirth of sentient beings"
    ],
    "The different things that are used as an analogy for the mind-made body [dn2:85.0]": [
      "A reed and its sheath",
      "A sword and its scabbard",
      "A snake and its slough"
    ],
    "The qualities of the mind that is necessary for the creation of the mind-made body [dn2:85.0]": [
      "Purified",
      "Bright",
      "Flawless",
      "Rid of corruptions",
      "Pliable",
      "Workable",
      "Steady",
      "Imperturbable"
    ],
    "The many kinds of psychic power [dn2:87.0]": [
      "Multiplying themselves and becoming one again",
      "Going unimpeded through a wall, a rampart, or a mountain as if through space",
      "Diving in and out of the earth as if it were water",
      "Walking on water as if it were earth",
      "Flying cross-legged through the sky like a bird",
      "Touching and stroking with the hand the sun and moon, so mighty and powerful",
      "Controlling the body as far as the Brahmā realm"
    ],
    "The items that can be produced with the well-prepared clay, ivory, and gold [dn2:87.0]": [
      "Any kind of pot",
      "Any kind of ivory item",
      "Any kind of gold item"
    ],
    "The things that happen when their mind has become immersed in samādhi [dn2:89.0]": [
      "The mind becomes purified",
      "The mind becomes bright",
      "The mind becomes flawless",
      "The mind becomes rid of corruptions",
      "The mind becomes pliable",
      "The mind becomes workable",
      "The mind becomes steady",
      "The mind becomes imperturbable"
    ],
    "The two kinds of sounds one can hear when their mind has become immersed in samādhi [dn2:89.0]": [
      "Human sounds",
      "Divine sounds"
    ],
    "The things by which an astute person is known [dn2:91.0]": [
      "Good conduct by way of body",
      "Good conduct by way of speech",
      "Good conduct by way of mind"
    ],
    "The different types of mind [dn2:91.0]": [
      "Mind with greed",
      "Mind without greed",
      "Mind with hate",
      "Mind without hate",
      "Mind with delusion",
      "Mind without delusion",
      "Constricted mind",
      "Scattered mind",
      "Expansive mind",
      "Unexpansive mind",
      "Mind that is not supreme",
      "Mind that is supreme",
      "Immersed mind",
      "Unimmersed mind",
      "Freed mind",
      "Unfreed mind"
    ],
    "The many kinds of past lives recollected [dn2:93.0]": [
      "One life",
      "Two lives",
      "Three lives",
      "Four lives",
      "Five lives",
      "Ten lives",
      "Twenty lives",
      "Thirty lives",
      "Forty lives",
      "Fifty lives",
      "A hundred lives",
      "A thousand lives",
      "A hundred thousand lives",
      "Many eons of the world contracting",
      "Many eons of the world expanding",
      "Many eons of the world contracting and expanding"
    ],
    "The features and details past lives recollected [dn2:93.0]": [
      "The person's name",
      "The person's clan",
      "The person's appearance",
      "The person's food",
      "The person's feelings of pleasure and pain",
      "The person's death",
      "The person's rebirth"
    ],
    "The things by which a person is known [dn2:95.0]": [
      "Good conduct by way of body",
      "Good conduct by way of speech",
      "Good conduct by way of mind"
    ],
    "The things that a person does [dn2:95.0]": [
      "Entering and leaving a house",
      "Walking along the streets and paths",
      "Sitting at the central square"
    ],
    "The four things that the person in samādhi would see regarding defilements [dn2:97.0]": [
      "This is suffering",
      "This is the origin of suffering",
      "This is the cessation of suffering",
      "This is the practice that leads to the cessation of suffering"
    ],
    "The things by which the Buddha has made the teaching clear [dn2:99.0]": [
      "Righting the overturned",
      "Revealing the hidden",
      "Pointing out the path to the lost",
      "Lighting a lamp in the dark"
    ]
  }
}

Notice that the output above is missing some lists, includes things that shouldn’t be in any database, and gets some lists wrong, but much of it is quite accurate. It also tends to ‘summarize’ into a single word or phrase things that are more detailed, contextual, and explained in the text, which is worrying, especially because it can’t understand the text with any depth. I’m trying to make it as verbatim as possible, though. Some lists don’t need language model software to produce, but others can be constructed more quickly by LLMs, so that’s where this is coming from.
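One rough way to keep an eye on the "as verbatim as possible" goal is to flag every list item that does not literally occur in the passage it was extracted from. This is only a sketch: the names `normalize` and `non_verbatim_items` are made up for illustration, and the normalization (quotes, whitespace, case, trailing full stop) is an assumption about what should count as "the same text".

```python
import re


def normalize(text: str) -> str:
    """Unify curly quotes, collapse whitespace, and lowercase."""
    text = (
        text.replace("’", "'")
        .replace("‘", "'")
        .replace("“", '"')
        .replace("”", '"')
    )
    return re.sub(r"\s+", " ", text).strip().lower()


def non_verbatim_items(source_text: str, items: list) -> list:
    """Return the items that do not occur verbatim in source_text.

    A trailing full stop on an item is ignored, so a phrase lifted from
    the middle of a sentence can still match.
    """
    haystack = normalize(source_text)
    return [
        item for item in items
        if normalize(item).rstrip(".") not in haystack
    ]


# Hypothetical sample: one item is verbatim, the other is a summary.
passage = "When rebirth exists there’s old age and death."
extracted = [
    "When rebirth exists there's old age and death.",
    "Conditions for old age",
]
flagged = non_verbatim_items(passage, extracted)
print(flagged)
```

Anything this flags is not necessarily wrong, it is just a candidate for re-reading against the sutta.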

Any list that doesn’t reflect what the sutta says needs care: re-prompting, modifying the output, and re-reading the text. The most obvious problem in this example is that one of the important parts of DN2 is the gradual training sequence, and since each text prompt is at most 600 words or so, the model doesn’t pick up the whole gradual training sequence, only sections of it. Still, with the correct prompt the LLMs can quite often pick up implied lists.
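To make the 600-word limit concrete, here is a minimal sketch of how a sutta could be cut into prompt-sized chunks. It assumes the translation is available as a mapping from segment ID to text (I am assuming a Bilara-style JSON layout); the function name `chunk_segments` and the sample data are hypothetical. It also shows why a long sequence like the gradual training ends up split: chunks are cut purely on word count, never on meaning.

```python
from typing import Dict, List


def chunk_segments(
    segments: Dict[str, str], max_words: int = 600
) -> List[Dict[str, str]]:
    """Group consecutive segments into chunks of at most max_words words.

    Segments are never split. A single segment longer than max_words
    becomes a chunk of its own.
    """
    chunks: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    count = 0
    for seg_id, text in segments.items():
        n_words = len(text.split())
        # Start a new chunk if adding this segment would exceed the limit.
        if current and count + n_words > max_words:
            chunks.append(current)
            current, count = {}, 0
        current[seg_id] = text
        count += n_words
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample data: three segments of 300 words each.
sample = {f"dn2:{i}.1": "word " * 300 for i in range(1, 4)}
for chunk in chunk_segments(sample):
    print(list(chunk.keys()))
```

Overlapping the chunks, or cutting only at section boundaries, might catch more of the long sequences, but I haven't shown that here.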

Conditioned/causal sequences
 "The things that arise when certain conditions are met [dn14:2.18.0] (a sequence)": [
      "When rebirth exists there’s old age and death.",
      "When continued existence exists there’s rebirth.",
      "When grasping exists there’s continued existence.",
      "When craving exists there’s grasping.",
      "When feeling exists there’s craving.",
      "When contact exists there’s feeling.",
      "When the six sense fields exist there’s contact.",
      "When name and form exist there are the six sense fields.",
      "When consciousness exists there are name and form.",
      "Name and form are conditions for consciousness.",
      "Consciousness is a condition for name and form.",
      "Name and form are conditions for the six sense fields.",
      "The six sense fields are conditions for contact.",
      "Contact is a condition for feeling.",
      "Feeling is a condition for craving.",
      "Craving is a condition for grasping.",
      "Grasping is a condition for continued existence.",
      "Continued existence is a condition for rebirth.",
      "Rebirth is a condition for old age and death, sorrow, lamentation, pain, sadness, and distress to come to be.",
      "That is how this entire mass of suffering originates."
    ],
    "The things that cease when certain conditions are met [dn14:2.18.0] (a sequence)": [
      "When rebirth doesn’t exist there’s no old age and death.",
      "When rebirth ceases, old age and death cease.",
      "When continued existence doesn’t exist there’s no rebirth.",
      "When continued existence ceases, rebirth ceases.",
      "When grasping doesn’t exist there’s no continued existence.",
      "When grasping ceases, continued existence ceases.",
      "When craving doesn’t exist there’s no grasping."
    ],
    "The things that would happen if the Buddha Vipassī did not teach the Dhamma [dn14:3.1.0] (a sequence)": [
      "The world would be lost",
      "The world would perish",
      "People would not understand the Dhamma"
    ],

Notice it doesn’t know that this is a dependent origination sequence; it just calls it “The things that arise when certain conditions are met”, hehe. Sequences are presented in many different forms in the suttas, and often it isn’t immediately obvious that they’re conditioned sequences, or it’s debatable.

If anyone has suggestions for other metadata-y things that might make useful summarizations, pointing back to the text so they can be verified, I would love to hear them. @charith has also been working on producing a list of the speakers in the suttas. He’s also done some work with keywords and other data science stuff in R, and I hope we can collaborate more on producing graphs and graph networks that can be visualized.
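Since every list title in the output above carries its segment reference in square brackets, and sequences carry an "(a sequence)" marker, the pointer back to the text can be recovered mechanically. A small sketch, assuming the key format seen in the examples above; the pattern and the names `KEY_PATTERN`, `ListKey`, and `parse_key` are my own illustration, not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional

# Matches keys such as
#   "The many kinds of psychic power [dn2:87.0]"
#   "The things that cease ... [dn14:2.18.0] (a sequence)"
KEY_PATTERN = re.compile(
    r"^(?P<title>.+?)\s*"
    r"\[(?P<ref>[a-z]+\d+(?:\.\d+)*:\d+(?:\.\d+)*)\]"
    r"(?P<seq>\s*\(a sequence\))?\s*$"
)


class ListKey(NamedTuple):
    title: str
    ref: str
    is_sequence: bool


def parse_key(key: str) -> Optional[ListKey]:
    """Split a list key into title, segment reference, and sequence flag.

    Returns None when the key does not follow the assumed format.
    """
    match = KEY_PATTERN.match(key)
    if match is None:
        return None
    return ListKey(
        title=match.group("title"),
        ref=match.group("ref"),
        is_sequence=match.group("seq") is not None,
    )


parsed = parse_key(
    "The things that arise when certain conditions are met "
    "[dn14:2.18.0] (a sequence)"
)
print(parsed)
```

With the reference split out, each list can be linked to its segment for checking, and the sequence flag could feed a graph of conditioned sequences later on.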

I just thought I’d try extracting these features of the English-language segmented translations using specifically worded prompt engineering and one-shot examples followed by parsing the output, and it seemed to produce these things quite well. I did test on various small models without fine tuning: OPT-13B, GPT-J-6B, and BigScience Bloom 13B, but so far usually only the large language models BigScience Bloom 176B and GPT-3 have come back with decent answers about what’s in each list in a sutta, or with decent speaker attribution. I couldn’t get Stanford NLP to do speaker attribution reliably, but it has produced lemmatized and stemmed English for possible search, in case that’s useful at some point, and it seems to recognize place names and people’s names too.
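For anyone curious what "one-shot example followed by parsing the output" can look like, here is a minimal sketch. It is not my actual prompt wording: the instruction text, the function names `build_one_shot_prompt` and `parse_completion`, and the choice to ask for JSON are all assumptions for illustration. The call to the model itself is left out, since that depends on which model and API is used.

```python
import json


def build_one_shot_prompt(
    example_text: str, example_lists: dict, passage: str
) -> str:
    """Build a prompt with one worked example, then the passage to process."""
    worked = json.dumps(example_lists, ensure_ascii=False)
    return (
        "Extract every list from the passage as JSON, "
        "mapping a list title to its items.\n\n"
        f"Passage:\n{example_text}\nLists:\n{worked}\n\n"
        f"Passage:\n{passage}\nLists:\n"
    )


def parse_completion(completion: str) -> dict:
    """Return the first JSON object in the completion, keeping only
    entries whose value is a list of strings. Returns {} if none parses.
    """
    start = completion.find("{")
    if start == -1:
        return {}
    try:
        # raw_decode stops at the end of the first JSON value, so any
        # text the model keeps generating afterwards is ignored.
        obj, _ = json.JSONDecoder().raw_decode(completion[start:])
    except json.JSONDecodeError:
        return {}
    if not isinstance(obj, dict):
        return {}
    return {
        title: items
        for title, items in obj.items()
        if isinstance(items, list)
        and all(isinstance(item, str) for item in items)
    }


# Hypothetical completion: valid JSON followed by runaway generation.
raw = ' {"The two kinds of sounds [dn2:89.0]": ["Human sounds", "Divine sounds"]}\nPassage:'
print(parse_completion(raw))
```

Parsing defensively like this matters because the models often carry on generating another "Passage:" after the answer, or return something that isn't valid JSON at all.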

As for how to display or edit all the potential finalized outputs, I am contemplating forking bilara and SC itself, or perhaps putting each one up on a website as it is done, with a way to suggest changes.

Now back to the original discussion about fine tuning: my understanding is that these models are trained for text continuation, or “given this input, produce this output”. The task is feeding in the inputs and outputs in such a way that the model remembers all the possible things that might be asked about, each with its segment and sutta name. One could test this on the minimum needed (I think it’s probably around 200 sets) to see what happens on subsequent queries. The Reformer model can apparently do perfect recall, but it’s basically a database and can’t take natural language questions. Bloom is working on distributed training in Petals AI, but it’s still pretty alpha, I believe. Training Bloom 13B on Lambda GPU cloud might be possible fairly cheaply, but I’m unsure of the power usage. Bloom 176B was trained on French nuclear energy so it’s … carbon-neutral? :unamused:. There’s also this txtai thing that looks like it might be able to do natural language queries, and you can choose from various models, but I think the whole point of it is semantic search and recommendations, which will be hard for the suttas, but maybe the evil bits can be turned off.
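As a sketch of what those "given this input, produce this output" sets might look like, the verified lists could be turned into one JSON record per line. The field names `prompt` and `completion`, the question wording, and the function names are assumptions here; whatever training tool is used may expect a different layout.

```python
import json
from typing import List, Tuple


def to_training_records(
    lists: List[Tuple[str, str, List[str]]]
) -> List[dict]:
    """Turn (title, segment reference, items) triples into
    input/output pairs. The completion ends with the reference so the
    model is trained to point back to the text.
    """
    records = []
    for title, ref, items in lists:
        records.append({
            "prompt": f"List: {title}\nAnswer:",
            "completion": " " + "; ".join(items) + f" [{ref}]",
        })
    return records


def to_jsonl(records: List[dict]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps(record, ensure_ascii=False) for record in records
    )


# One verified list taken from the DN2 output above.
verified = [
    (
        "The different things that are used as an analogy "
        "for the mind-made body",
        "dn2:85.0",
        ["A reed and its sheath", "A sword and its scabbard",
         "A snake and its slough"],
    ),
]
print(to_jsonl(to_training_records(verified)))
```

A couple of hundred such records from checked lists would be enough for the small test described above, and only lists that have been verified against the text should go in.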
