

To interpret a byte sequence as a text, you have to know theĬorresponding character encoding: unicode_text = code(character_encoding) Lines.append(code('utf-8', 'slashescape')) #print err, dir(err), err.start, err.end, err.objectĬodecs.register_error('slashescape', slashescape)

returnĪ tuple with a replacement for the unencodable part of the inputĪnd a position where encoding should continue""" It should be slower than the cp437 solution, but it should produce identical results on every Python version.
Convert int to string python update#
UPDATE 20170119: I decided to implement slash escaping decode that works for both Python 2 and Python 3. See Python’s Unicode Support for details. Lines.append(code('utf-8', 'backslashreplace')) That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions: PY3K = sys.version_info >= (3, 0) UPDATE 20170116: Thanks to comment by Nearoo - there is also a possibility to slash escape all unknown bytes with backslashreplace error handler.

UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding stuff into binary data without data loss and crashes, but it needs conversion tests, -> ->, to validate both performance and reliability. See the missing points in Codepage Layout - it is where Python chokes with infamous ordinal not in range. The same applies to latin-1, which was popular (the default?) for Python 2. UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid If you don't know the encoding, then to read binary input into string in Python 3 and Python 2 compatible way, use the ancient MS-DOS CP437 encoding: PY3K = sys.version_info >= (3, 0)īecause encoding is unknown, expect non-English symbols to translate to characters of cp437 (English characters are not translated, because they match in most single byte encodings and UTF-8).ĭecoding arbitrary binary input to UTF-8 is unsafe, because you may get this: > b'\x00\x01\xffsd'.decode('utf-8')
