Monday, April 28, 2008

 

Dealing with "native" Windows encoding

Microsoft Windows and other Microsoft software, such as Microsoft Office, use the "UTF-16LE" encoding by default; when they offer a choice of encodings, they call it simply "Unicode". If the goal is to generate Unicode files that can be opened by all Microsoft applications, those files had better be in UTF-16LE.

Many languages and libraries offer built-in conversion to UTF-16LE; however, one must be aware of two potential problems: (1) the standard 2-byte header (the byte order mark, bytes FF FE) that Windows expects (and writes on output), and (2) the built-in DOS line-ending translation ("text mode"), which inserts an extra 0x0D byte before every 0x0A byte and so breaks the 2-byte alignment of the output; files must be written in "binary" mode.
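As a rough illustration of the first point, the sketch below (written for Python 3, which is an assumption; the sample string is made up) shows that the explicit little-endian codec does not emit the header by itself, so the two bytes have to be prepended by hand:

```python
import codecs

# Sample text, chosen only for illustration.
text = u"Hi\r\n"

# The explicit little-endian codec emits no byte order mark (BOM).
payload = text.encode("utf-16-le")

# The BOM for UTF-16LE is the two bytes FF FE; prepend it manually.
bom = codecs.BOM_UTF16_LE
data = bom + payload

print(repr(data))
```

Here `codecs.BOM_UTF16_LE` is a standard-library constant equal to the same two bytes that could be written as a literal.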

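A proper way to create a UTF-16LE file in Python is this:

# Binary mode ("wb") keeps newline translation away from the 2-byte code units.
fh = open("Test.txt", "wb")
# Byte order mark (BOM) for UTF-16LE.
fh.write(b"\xff\xfe")
# Encode explicitly; the line ending is written as a literal CR LF.
fh.write(u"Проверка\r\n".encode("UTF-16LE"))
fh.close()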
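The same steps can be wrapped in a small helper and checked by reading the file back. This is only a sketch for Python 3; the function name `write_utf16le` and the temporary path are hypothetical and are not part of any library:

```python
import codecs
import os
import tempfile


def write_utf16le(path, text):
    """Write text as UTF-16LE with a leading BOM, using binary mode."""
    with open(path, "wb") as fh:
        fh.write(codecs.BOM_UTF16_LE)       # the two bytes FF FE
        fh.write(text.encode("utf-16-le"))  # payload, no BOM added by the codec


# Write into a temporary directory so the example leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "Test.txt")
write_utf16le(path, u"Проверка\r\n")

# Read the raw bytes back to inspect what actually landed on disk.
with open(path, "rb") as fh:
    raw = fh.read()

print(repr(raw[:2]))
```

Decoding `raw` with the generic "utf-16" codec should return the original text, since that codec reads the BOM to determine byte order and then drops it.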
