python - UTF-8 and os.listdir()
I'm having trouble with a file whose name contains the "ș" character (that is \xC8\x99 in UTF-8, LATIN SMALL LETTER S WITH COMMA BELOW). I create a ș.txt file and then look for it with os.listdir(). Unfortunately, os.listdir() returns the name as s\xCC\xA6 ("s" + COMBINING COMMA BELOW), and my test program (below) fails.
This happens on my OS X machine, but it works on a Linux machine. Any thoughts on what exactly causes this behavior (both environments are configured with LANG=en_US.UTF8)?
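For reference, the two byte sequences mentioned here can be reproduced without touching the filesystem. This is a minimal sketch, assuming Python 3 (where str is already Unicode); the variable names are only illustrative:

```python
import unicodedata

composed = "\u0219"  # LATIN SMALL LETTER S WITH COMMA BELOW, one code point
decomposed = unicodedata.normalize("NFD", composed)  # "s" + U+0326 COMBINING COMMA BELOW

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes)          # b'\xc8\x99'
print(decomposed_bytes)        # b's\xcc\xa6'
print(composed == decomposed)  # False: same glyph, different code points
```

Both strings render the same, which is why a plain equality or membership test between them fails.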
Here is the test program:
    # coding: utf-8
    import os

    fname = "ș.txt"
    with open(fname, "w") as f:
        pass  # creating the file is enough for this test

    files = os.listdir(".")
    if fname in files:
        print "found"
    else:
        print "not found"
OS X returns filenames in decomposed form rather than composed form, so you will have to normalize the filenames back to the NFC composed form:
    import unicodedata
    files = [unicodedata.normalize('NFC', f) for f in os.listdir(u'.')]
This processes the filenames as unicode; otherwise you would need to decode the bytestring to unicode first.
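The effect of the normalization can be checked on any platform by simulating the decomposed name that OS X returns. This is a rough sketch, assuming Python 3 (so no separate decode step is needed); nfc_names is a hypothetical helper, not part of any library, and the listing is hard-coded rather than read from disk:

```python
import unicodedata


def nfc_names(names):
    """Hypothetical helper: return the given names normalized to NFC."""
    return [unicodedata.normalize("NFC", name) for name in names]


fname = "\u0219.txt"        # composed form, as typed in the source file
listing = ["s\u0326.txt"]   # simulated os.listdir() result in decomposed form

print(fname in listing)             # False
print(fname in nfc_names(listing))  # True
```

In a real program the list would come from os.listdir(), for example nfc_names(os.listdir(".")). It may also be worth normalizing fname itself, in case the name being searched for is not already in NFC.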