Bilara I/O Scripts on Windows

Jhanarato · August 15, 2020, 11:16am

G’day,

I helped Bhante Khemaratana get the import/export scripts for bilara-data running on Windows. In doing so we ran into a bug that turned up in at least two places. When a file is opened in text mode without an explicit encoding, Python uses the locale's preferred encoding, which is typically UTF-8 on Linux but CP1252 on Windows. Reading a UTF-8 file that way can throw a “UnicodeDecodeError”. There are two ways to handle this.

Python allows Windows users to set “UTF-8” mode via an environment variable or as an argument when launching the interpreter. We could simply make this a prerequisite for Windows users and document it accordingly, or else go through the code and explicitly specify the encoding when opening files. I’ve done this in one place and it works. Either way we’re expecting all files being read to be in UTF-8 format.
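For the "handle it in the code" option, here is a minimal sketch of what specifying the encoding looks like. The helper names read_json_utf8 and write_json_utf8 and the sample data are made up for illustration and are not taken from the actual bilara-data scripts; the only real change is the encoding="utf-8" argument to open().

```python
import json


def read_json_utf8(path):
    """Read a JSON file as UTF-8, whatever the platform default is.

    Hypothetical helper for illustration, not part of bilara-data.
    """
    # Without encoding=..., open() falls back to the locale encoding
    # (often CP1252 on Windows), which is what caused the bug.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def write_json_utf8(path, data):
    """Write a JSON file as UTF-8, keeping non-ASCII characters readable.

    Hypothetical helper for illustration, not part of bilara-data.
    """
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes "ṁ" as-is instead of as a \u escape.
        json.dump(data, f, ensure_ascii=False, indent=2)
```

With this approach the scripts behave the same on Linux and Windows without any special flags, at the cost of having to touch every open() call.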

So, handle it in the code or in the shell?

Cheers,

J.R.

Jhanarato · August 16, 2020, 4:11am

Bhante Khemaratana has shown python -Xutf8 to work, though we didn’t have any luck with the environment variable method. Probably my fault. If you happen to be on Windows with Python installed, you can try setting PYTHONUTF8 to 1 and see how you go.
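If you're not sure whether the flag or the environment variable actually took effect, a quick check along these lines may help. It's only a sketch: describe_text_encoding is a made-up name, and it assumes Python 3.7 or later, where sys.flags.utf8_mode exists.

```python
import locale
import sys


def describe_text_encoding():
    """Report whether UTF-8 mode is on and what open() will default to.

    Hypothetical helper for illustration only.
    """
    return {
        # True when started with -X utf8 or with PYTHONUTF8=1 set.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The encoding used by open() when none is given.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(describe_text_encoding())
```

Run it once plainly and once with python -Xutf8 (or with PYTHONUTF8 set to 1) and compare the two results.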

sujato · August 16, 2020, 8:36pm

Thanks for the feedback, I’ll tag @blake in on this. It sounds like at the very least we should document the fact that the flag is required.

On a related note, next year will be the 30th birthday of Unicode 1.0, and we’re still having to use special flags on Windows!

Jhanarato · August 17, 2020, 12:36am

All good mate. We got it working via the environment variable as well. A simple note on the readme would do fine.

sujato · August 17, 2020, 10:34pm

Feel free to edit the Readme; Blake is otherwise occupied right now with family.