# Solving encoding problems


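  • Convert strings to unicode as early as possible.
  • Sublime outputs UTF-8 by default, so use encode('utf-8') to turn unicode into something Sublime can print.
  • cmd outputs GBK by default, so use encode('gbk')/encode('gb2312')/encode('gb18030') to turn unicode into something cmd can print.
  • Convert unicode to the target encoding only at the final output step.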
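The rules above amount to "decode at input, encode at output". Below is a minimal sketch of that pattern. It assumes Python 3 syntax, where `bytes` and `str` play the roles that `str` and `unicode` play in the Python 2 notes here. The helper names `to_text` and `to_output` are hypothetical, chosen only for illustration.

```python
def to_text(raw, encoding='utf-8'):
    """Decode incoming bytes to text as early as possible."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text, nothing to do


def to_output(text, encoding):
    """Encode text to bytes only at the output boundary."""
    return text.encode(encoding)


raw = b'\xe9\xa2\x98\xe5\x90\x8d'      # UTF-8 bytes for two Chinese characters
text = to_text(raw)                     # work with text internally
utf8_bytes = to_output(text, 'utf-8')   # for a UTF-8 console
gbk_bytes = to_output(text, 'gbk')      # for a GBK console
```

Keeping all internal processing on decoded text means the choice of `'utf-8'` or `'gbk'` only appears at the two boundaries.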

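Repair a unicode string whose code points are really UTF-8 bytes (Python 2):

old = u'\xe9\xa2\x98\xe5\x90\x8d/\xe8\xb4\xa3\xe4\xbb\xbb\xe8\x80\x85:'

# chr(ord(x)) maps each code point back to one byte, giving a UTF-8 byte string
raw = ''.join([chr(ord(x)) for x in old])
# decoding those bytes as UTF-8 recovers the intended text
new = raw.decode('utf-8')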
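For comparison, here is a sketch of the same repair in Python 3 syntax (an assumption, since the snippet above is Python 2). It relies on the fact that Latin-1 maps code points 0 to 255 one-to-one onto bytes, so `encode('latin-1')` does the same job as the `chr(ord(x))` join. The variable names `raw` and `fixed` are illustrative.

```python
# Text whose code points are actually the bytes of a UTF-8 string
old = '\xe9\xa2\x98\xe5\x90\x8d/\xe8\xb4\xa3\xe4\xbb\xbb\xe8\x80\x85:'

# Latin-1 turns each code point below 256 into the byte with the same value
raw = old.encode('latin-1')

# Decoding those bytes as UTF-8 recovers the intended text
fixed = raw.decode('utf-8')
```

If `old` contains any code point above 255, `encode('latin-1')` raises `UnicodeEncodeError`, which is a useful signal that the string was not mis-decoded in this particular way.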

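Decode a byte string that contains literal \uXXXX escapes (Python 2):

s = '\u554a\u54c8'
u = s.decode('unicode-escape')

Or like this (trusted input only, since eval executes arbitrary code):

u = eval('u"' + s + '"')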
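A sketch of the same conversion in Python 3 syntax (an assumption; the notes here are Python 2). In Python 3 a `str` has no `decode` method, so the text is first encoded to bytes. `ast.literal_eval` is shown as a safer stand-in for `eval`; it is a swapped-in technique, not part of the original notes, and it assumes `s` contains no double quotes.

```python
import ast

# Raw string: the backslashes are literal characters, not escapes
s = r'\u554a\u54c8'

# Option 1: go through bytes, then decode with the unicode_escape codec
u1 = s.encode('ascii').decode('unicode_escape')

# Option 2: parse it as a string literal without executing arbitrary code
u2 = ast.literal_eval('"' + s + '"')
```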


Unescape HTML entities (Python 2):

import HTMLParser

html = '&lt;abc&gt;'
html_parser = HTMLParser.HTMLParser()
txt = html_parser.unescape(html)  # <abc>

Escape:

import cgi

html = cgi.escape(txt)  # &lt;abc&gt;
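A sketch of both directions in Python 3 syntax (an assumption; the calls above are Python 2). The standard `html` module provides `html.unescape` and `html.escape`, which replace `HTMLParser.unescape` and `cgi.escape`. The variable names are illustrative.

```python
import html

escaped = '&lt;abc&gt;'

# Entities -> characters
txt = html.unescape(escaped)

# Characters -> entities; by default quotes are escaped as well
back = html.escape(txt)
```

Pass `quote=False` to `html.escape` to leave `"` and `'` untouched, which matches the default behaviour of `cgi.escape`.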