List of concatenated texts for debaking

Concatenated texts are those where more than one “sutta” is contained within a single file. For this list, I only include those texts that may be meaningfully split into the respective suttas.

If we are to adopt the principle of not baking presentation assumptions into text data, then these are candidates for debaking. The segments of each of these need to be renamed from the range ID to the individual sutta ID.
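As a rough illustration of the renaming step, here is a minimal sketch that expands a range ID such as `an3.156-162` into the individual sutta IDs it covers. The function name `expand_range_id` and the regular expression are my own assumptions for illustration, not part of any existing tooling.

```python
import re

# Hypothetical pattern for a range ID like "an3.156-162" or "sn12.93-213".
RANGE_ID = re.compile(r"^([a-z]+)(\d+)\.(\d+)-(\d+)$")


def expand_range_id(range_id: str) -> list[str]:
    """Return the individual sutta IDs covered by a range ID."""
    match = RANGE_ID.match(range_id)
    if match is None:
        raise ValueError(f"not a range ID: {range_id!r}")
    collection, book = match.group(1), match.group(2)
    start, end = int(match.group(3)), int(match.group(4))
    if end < start:
        raise ValueError(f"range runs backwards: {range_id!r}")
    # Both ends of the range are inclusive.
    return [f"{collection}{book}.{n}" for n in range(start, end + 1)]
```

For example, `expand_range_id("sn33.51-54")` gives the four IDs from `sn33.51` to `sn33.54`.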

Note that the distinction between debakable and non-debakable texts usually follows explicit indications in the texts. Often the MS edition marks these suttas with a bracketed number, such as (1), at the end of each divisible sutta. In other cases, the different suttas are indicated by a repetition of the setting, or by a final Pali number (pathamam). On the other hand, Peyyala suttas are often indicated with an explicit remark that the text should be expanded. They are clearly intended to be a “range” and thus are not debaked.
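To make the bracketed-number case concrete, here is a hedged sketch of how segments might be reassigned. It assumes, purely for illustration, that segments are held as an ordered mapping of segment ID to text, that the closing segment of each sutta ends with a marker like `(1)`, and that the part of the ID after the colon is kept unchanged. The actual data layout and renumbering may well differ, and `debake_segments` is a hypothetical name.

```python
import re

# Assumed marker: a bracketed number at the very end of a segment, e.g. "... (1)".
END_MARKER = re.compile(r"\((\d+)\)\s*$")


def debake_segments(segments: dict[str, str], sutta_ids: list[str]) -> dict[str, str]:
    """Rename range-based segment IDs to individual sutta IDs.

    Segments are walked in order; each end marker closes the current sutta
    and moves on to the next ID in sutta_ids.
    """
    renamed: dict[str, str] = {}
    index = 0
    for segment_id, text in segments.items():
        if index >= len(sutta_ids):
            raise ValueError("more suttas in the text than IDs supplied")
        # Keep the segment number (after the colon) exactly as it was.
        _, _, suffix = segment_id.partition(":")
        renamed[f"{sutta_ids[index]}:{suffix}"] = text
        if END_MARKER.search(text):
            index += 1
    return renamed
```

Texts that are split by a repeated setting or a final Pali number would need a different boundary test, and Peyyala ranges would simply be skipped.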

  • Anguttara Ones and Twos.
  • an11.22-29
  • an3.156-162
  • an3.163-182
  • sn12.83-92
  • sn12.93-213
  • sn23.23-33
  • sn23.35-45
  • sn33.11-15
  • sn33.16-20
  • sn33.51-54
  • sn35.33-42
  • sn35.43-51
  • sn43.14-43
  • sn45.104-108
  • sn45.110-114
  • sn45.116-120

Also some segment corrections:

“an11.502-981:5.5”: “.”,

“sn45.50-54:1.4”: “yadidaṃ—”,
“sn45.50-54:1.5”: “chandasampadā … pe …”,


And going in the other direction, should sn24.38 to sn24.44 actually be baked? :cake:

No, they are unbaked in BB, so we follow him.

Note that doing this might have an effect on the parallels that refer to certain parts of these suttas. So that is something to keep in mind when you do this: check the parallels afterwards.


Right, yes, we shall have to take care of this.


Note, debaking is done.


Do you have a handy, easy to read final list?
As it happens, just yesterday I was referring to this list when coding up some texts and thought it curious that sn12.83-92 was on the list but sn12.93-213 wasn’t, but lo, they’re both included in the update.

I honestly have no idea how that happened. I was just using the list above. Must have stumbled on the extra text at some point! I’ll add it above, so both of these comments will look weird!

But apart from that, it is the same as the list. The only variation is that not all of the Ones and Twos needed debaking, a few were genuinely combined texts so were left as-is.

For the sake of clarity, as I’m trying to follow this for legacy coding, can I confirm that (going by the commit comments on GH) the only ones that remain as “single vaggasutta suttas” are: an1.378-393 and an2.230-279?

Yes, that’s correct.

Please note that this will also affect the menu structure as well as the parallels.
So what needs to be changed are:

Otherwise the files will simply not show up in the menu and suttaplex cards and parallels will not show.
You might want to make an issue on ZenHub for this.

In addition, the Buddhanexus data need changing, but we were already discussing pulling certain things together because of the large number of repetitions, for instance in the Samyuttas.

With regard to parallels, I wonder how this affects the paragraph numbers. Most of the time, paragraph numbers start at sc1 for each file (these were created automatically) and the parallels reflect this. I guess now all the parallels (95 for AN and 164 for SN) will have to be gone over by hand.

Only the segment IDs are changed, so no, I don’t think any of this will be affected.

Ta … thanks for checking. My bad … I thought it was also the filenames. Will just check the parallels after it has been implemented just in case.

The parallels are based on the actual text number (e.g. an1.1) rather than the range, which was what the old (pre-debaking) segment IDs had (e.g. an1.1-10:1). So the new system is in fact closer to what the parallels have. The only issue might be, not in the data, but in the processing. So once the new changes are pulled into the system, we can see if they throw any errors.
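For anyone wanting to check this after the changes land, a small sketch along these lines could list which texts still carry a range in their segment IDs. The helper names are hypothetical, and it assumes segment IDs have the form `uid:number` as in the examples above.

```python
import re

# A text UID that is still a range, e.g. "an1.378-393".
RANGE_UID = re.compile(r"^[a-z]+\d+\.\d+-\d+$")


def text_uid(segment_id: str) -> str:
    """Return the text part of a segment ID, e.g. "an1.1-10:1" -> "an1.1-10"."""
    return segment_id.split(":", 1)[0]


def still_ranged(segment_ids: list[str]) -> list[str]:
    """Return the sorted, de-duplicated text UIDs that are still ranges."""
    uids = {text_uid(segment_id) for segment_id in segment_ids}
    return sorted(uid for uid in uids if RANGE_UID.match(uid))
```

Going by the thread above, running this over the debaked Ones and Twos should report only an1.378-393 and an2.230-279.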