Aditi does have a higher pitch, but my hypothesis is that Aeneas relies heavily on the inter-segment pauses for breathing. This provides a coarse segmentation.
If you listen carefully to Bhante’s segment #4 audio, you will hear two (2) breaths.
- The first breath is an intermediate breath (which is only needed because the preceding inter-segment breath was too short).
- The second breath is an inter-segment breath.
Both breaths are the same length, which presents Aeneas with a problem: there are more audio fragments than the text would imply. In these cases, my hypothesis is that Aeneas uses a time-weighted average of the TTS audio to determine the inter-segment boundary. This would explain why Aeneas arbitrarily chose to chop “ayasma” in half. In other words, I think Aeneas got confused here and simply sliced at a computed time value, totally disregarding any audio pauses.
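To make the hypothesis concrete, here is a minimal Python sketch of what a time-weighted split would look like. It is only an illustration of my guess, not Aeneas's actual code, and the function name and numbers are made up for the example. The idea is that each segment gets a share of the real recording proportional to how long its TTS rendering is, so the cut points fall wherever the arithmetic puts them, whether or not there is a pause there.

```python
def proportional_boundaries(tts_durations, recording_length):
    """Scale per-segment TTS durations onto a real recording.

    Returns the end time (in seconds) of each segment in the recording.
    Hypothetical helper: it illustrates the hypothesis, not Aeneas itself.
    """
    total = sum(tts_durations)
    if total <= 0:
        raise ValueError("TTS durations must sum to a positive value")
    boundaries = []
    elapsed = 0.0
    for duration in tts_durations:
        elapsed += duration
        # Each cut is placed purely by time share; pauses are never consulted.
        boundaries.append(recording_length * elapsed / total)
    return boundaries


# Made-up numbers, not measured from the SN1.20 recording.
tts = [2.0, 3.0, 5.0]
cuts = proportional_boundaries(tts, 20.0)
```

If the speaker's pacing differs from the TTS pacing, a cut computed this way can easily land in the middle of a word, which is the kind of result we see with “ayasma”.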
When Bhante takes a deep breath before each segment, that deep breath itself provides a distinct timing block of silence. I’m guessing that Bhante is speaking from continuity born of immersion as one would in lengthy chants. In such a chanting mode, the breaths support the continuity of chanting to keep a certain cadence. Breaths would tend to be shallow or as needed.
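As a rough Python sketch of why a distinctly longer breath would help, suppose the silences in a recording have already been detected and we simply keep the longest ones as segment boundaries. This is my own illustration with made-up numbers and a hypothetical function name; I am not claiming Aeneas works this way.

```python
def pick_segment_pauses(pauses, segment_count):
    """Pick the pauses most likely to separate segments.

    pauses: list of (start_seconds, duration_seconds), one per detected silence.
    Returns the start times of the (segment_count - 1) longest pauses,
    in time order. Hypothetical helper for illustration only.
    """
    needed = segment_count - 1
    if needed > len(pauses):
        raise ValueError("fewer pauses than segment boundaries")
    # Longest pauses first; ties keep their original (time) order.
    longest = sorted(pauses, key=lambda p: p[1], reverse=True)[:needed]
    return sorted(start for start, _ in longest)


# Made-up example: two deep breaths and one short intermediate breath.
pauses = [(3.0, 0.9), (5.5, 0.3), (8.0, 1.0)]
boundaries = pick_segment_pauses(pauses, 3)
```

With a short intermediate breath the choice is unambiguous. When all the breaths are the same length, as in segment #4, the ranking is a tie and any choice between them is arbitrary.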
By taking a deep breath before each segment, Bhante would be mindfully segmenting the audio himself. This is why I think Bhante @Sujato will want to decide how he wants the user to experience his audio. The difference is one of presentation. One may speak the sutta as a single continuous chant, or one may simply speak each sutta segment mindfully. This is Bhante’s decision to make, not ours.
Note that the above is still a hypothesis. We only have the one recording of SN1.20.