A few years ago I actually created these segments.
Here are the top 50 SC segments with translations, plus another 100ish without.
It was a while ago now, so I don’t remember exactly what the criteria for weeding were. I think we took the segments and, if one contained more than one sentence, split it at the full stop. The second sheet in this document contains shorter phrases, which were split at commas or full stops.
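For anyone who wants to try something similar, the split-and-count step could be sketched roughly like this in Python. It is only a hypothetical illustration of the idea, not the script we used. The function name `count_pieces`, the regex patterns, the lowercasing, and treating `?` and `!` as sentence breaks alongside the full stop are all my assumptions.

```python
import re
from collections import Counter

# Assumption: a sentence ends at a full stop, question mark or exclamation mark
# followed by whitespace. The punctuation stays attached to the piece before it.
SENTENCE_SPLIT = r"(?<=[.?!])\s+"
# Assumption: shorter phrases additionally break after commas.
PHRASE_SPLIT = r"(?<=[,.?!])\s+"


def count_pieces(segments, pattern):
    """Hypothetical helper: split each segment with `pattern` and count the pieces."""
    counts = Counter()
    for seg in segments:
        for piece in re.split(pattern, seg.strip()):
            piece = piece.strip().lower()  # normalise case so repeats are merged
            if piece:
                counts[piece] += 1
    return counts


# Made-up sample segments, just to show the usage.
segments = [
    "Taṁ kissa hetu? Rūpaṁ anattā.",
    "Taṁ kissa hetu?",
    "Evaṁ me sutaṁ.",
]
sentences = count_pieces(segments, SENTENCE_SPLIT)
for phrase, n in sentences.most_common(3):
    print(n, phrase)
```

Passing `PHRASE_SPLIT` instead gives the shorter comma-separated phrases, like the ones in the second sheet. Any weeding of very common particles would still have to happen on top of this.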
These are a neglected work in progress. I offer them for the benefit of anyone interested.
Even if you don’t click through, I think it’s quite lovely that the most frequent phrase is
‘taṁ kissa hetu?’
ETA: the frequency numbers are not an exact science. We did our best, but things were surely missed in the attempt to avoid ending up with just a pile of *kho*/*pana*/*ti*-type phrases.