I helped Bhante Khemaratana get the import/export scripts running for bilara-data on Windows. In doing so we ran into a bug that turned up in at least 2 places. When a file is opened in text mode without an explicit encoding, Python falls back to the locale's preferred encoding, which is typically UTF-8 on Linux but CP1252 on Windows. Reading a UTF-8 file that way can throw a “UnicodeDecodeError”, or silently garble non-ASCII characters. There are two ways to handle this.
Python allows Windows users to turn on “UTF-8” mode, either by setting the PYTHONUTF8=1 environment variable or by passing -X utf8 as an argument when launching the interpreter. We could simply make this a prerequisite for Windows users and document it accordingly, or else go through the code and explicitly specify the encoding wherever files are opened. I’ve done this in one place and it works. Either way we’re expecting all files being read to be in UTF-8 format.
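For reference, here is a minimal sketch of what the in-code option could look like. The helper names `load_json` and `save_json` and the sample segment are made up for illustration and are not taken from the actual scripts. The `cp1252` read simulates the Windows default so the failure can be reproduced on any platform.

```python
import json
import tempfile
from pathlib import Path


def load_json(path):
    """Read a JSON file as UTF-8, regardless of the platform's locale encoding."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_json(path, data):
    """Write a JSON file as UTF-8, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


# Hypothetical sample data; the key and text are only for illustration.
sample = {"mn1:1.1": "Evaṁ me sutaṁ"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"
    save_json(path, sample)

    # Simulate what happens when the Windows default (CP1252) is used to
    # read the file: "ṁ" is stored as the bytes E1 B9 81 in UTF-8, and
    # byte 0x81 has no mapping in Python's cp1252 codec.
    try:
        path.read_text(encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # With the encoding spelled out, the read works on any platform.
    loaded = load_json(path)

print("cp1252 read failed:", cp1252_failed)
print("round trip ok:", loaded == sample)
```

The advantage of this route is that the scripts behave the same no matter how the interpreter was launched. The downside is that every `open()` call has to be found and updated, and any new one that forgets the argument brings the bug back on Windows.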
So, handle it in the code or in the shell?