KN Thag, Theragāthā: Pāli + English side by side

I did a couple of quick regex substitutions on the Pali to get it to line up closely with the formatting of Bhante @sujato and J. Walton’s translation.

Any suggestions on how to quickly get the whole collection of English Thag into one sequential text file, so I can put it into the spreadsheet and match it up with the Pali?

There is a PDF of the Thag out there, but if I just copy all the text out of it, some characters such as “h” go missing.

And I don’t want to grab each individual Thag sutta from the KN section of SuttaCentral one by one.

kn-thag.zip (197.4 KB)

The zip file contains the thag.xls spreadsheet, with the first sutta in English and Pali so you can see how it works.

The HTML file was created with LibreOffice Calc (a free alternative to MS Excel, like OpenOffice) by saving the .xls as HTML.

Try the attached raw HTML files. If you know regex, you can filter out what you need and cat the files together to make one continuous file. If you then regex it into a comma-separated values (CSV) file, you can import that into LibreOffice Calc. Do the same with the other file, open both in LibreOffice, copy the whole column from one into the other, and you should be there. Good luck!

thag_pi.zip (307.8 KB)
thag_en.zip (386.5 KB)
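
For anyone who would rather script the filter-and-CSV step than do it by hand, here is a rough Python sketch. It uses the standard library's `html.parser` instead of raw regex, and it assumes that each verse or line sits in its own `<p>` element, which may not match the actual files in the zips. The names `ParagraphText`, `paragraphs`, and `side_by_side_csv` are made up for this example.

```python
import csv
import io
from html.parser import HTMLParser
from itertools import zip_longest


class ParagraphText(HTMLParser):
    """Collect the text of each <p> element as one string.

    Assumption: the text we want lives in <p> elements; adjust the tag
    name if the real files use something else.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # None while we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            # Join the pieces and collapse runs of whitespace and newlines.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def paragraphs(html):
    """Return the list of non-empty paragraph texts found in an HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def side_by_side_csv(pali_html, english_html):
    """Return CSV text with Pali in column 1 and English in column 2."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    rows = zip_longest(paragraphs(pali_html), paragraphs(english_html), fillvalue="")
    for pali, english in rows:
        writer.writerow([pali, english])
    return out.getvalue()
```

The rows are paired purely by position, so the two columns only line up if both files break the text into the same number of paragraphs. Any mismatch still has to be fixed by hand in the spreadsheet.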

Thanks, Ven. Vimala. With Windows PowerShell, I could cat all of those HTML files into one file:

PowerShell, cat all files into one:
Get-Content -Encoding utf8 .\*.html | Out-File -Encoding utf8 all.html

But there is one more snag to overcome: the file names don’t have zero-padded numbers, so the suttas end up in ASCII sort order instead of numerical order.
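
One way around the sorting problem, sketched in Python rather than PowerShell: split each file name into text and number parts and sort on those, so that "2" comes before "10" without renaming anything. The file naming pattern (something like `thag1.10.html`), the output name `all.html`, and the function names are assumptions for illustration only.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that compares digit runs as integers.

    re.split with a capturing group alternates text, digits, text, ...
    so the odd positions are always the digit runs.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def concat_in_numeric_order(folder, pattern="*.html", out_name="all.html"):
    """Concatenate matching files in numeric order into out_name.

    Returns the file names in the order they were written.
    """
    folder = Path(folder)
    files = sorted(
        # Skip the output file itself in case it is left over from an earlier run.
        (p for p in folder.glob(pattern) if p.name != out_name),
        key=lambda p: natural_key(p.name),
    )
    with open(folder / out_name, "w", encoding="utf-8") as out:
        for path in files:
            out.write(path.read_text(encoding="utf-8"))
            out.write("\n")
    return [p.name for p in files]
```

Calling `concat_in_numeric_order(".")` in the folder with the unzipped files would write `all.html` there. The key function on its own can also be passed to `sorted()` if only the ordering is needed.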

My programming skills are very rusty, so it will take a few more hours than I expected, but I should have something in the next day or two.