String Encoding


ASCII


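  • String encoding that maps each 7-bit value (0 to 127) to an English letter, a digit, a punctuation mark or symbol, or a control character
    • ASCII Table
    • Of the 128 characters defined in ASCII, only 95 are printable (0x20 to 0x7E, space included); the other 33 are control characters
    • ASCII uses only 7 bits; "extended ASCII" code pages use the 8th bit for another 128 characters, but 256 slots are still not enough to encode all the world's languages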
  • Line terminator: the character sequence that marks the end of a line
    • On DOS/Windows it's "\r\n" whereas on Linux it's "\n"
    • "\r" is carriage return (0x0D)
    • "\n" is line feed or new line (0x0A)
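A minimal Python sketch of the points above (the sample strings are arbitrary): it counts the printable and control characters in the 7-bit range, shows a non-ASCII character failing to encode as ASCII, and prints the raw bytes of both line terminators.

```python
# Sketch: the 7-bit ASCII range and the two common line terminators.

# ASCII defines 128 values, 0x00-0x7F, so every one fits in 7 bits.
ascii_chars = [chr(i) for i in range(128)]
assert all(ord(c) < 0x80 for c in ascii_chars)

# Printable characters run from 0x20 (space) to 0x7E ('~'); the other 33 are control characters.
printable = [c for c in ascii_chars if 0x20 <= ord(c) <= 0x7E]
control = [c for c in ascii_chars if not (0x20 <= ord(c) <= 0x7E)]
print(len(printable), len(control))  # 95 33

# A character outside the 7-bit range cannot be encoded as ASCII.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", hex(ord(err.object[err.start])))  # not ASCII: 0xe9

# Line terminators as raw bytes: CR is 0x0D, LF is 0x0A (bytes.hex(sep) needs Python 3.8+).
windows_line = "hi\r\n".encode("ascii")
unix_line = "hi\n".encode("ascii")
print(windows_line.hex(" "))  # 68 69 0d 0a
print(unix_line.hex(" "))     # 68 69 0a

# Normalizing Windows line endings to Unix ones.
normalized = "a\r\nb\r\n".replace("\r\n", "\n")
print(normalized == "a\nb\n")  # True
```

When only the lines matter, str.splitlines() already accepts both terminators, which is often simpler than normalizing by hand.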

Unicode


  • Various encoding schemes were invented, but none covered every language until Unicode came along
    • Unicode Character Table
    • Unicode is a large table mapping every character to a unique number (its code point)
    • The first 128 code points map 1:1 to ASCII (and the first 256 map 1:1 to ISO-8859-1, also called Latin-1)
    • Different UTF encodings use different numbers of bytes per code point: UTF-8 uses 1 to 4, UTF-16 uses 2 or 4, UTF-32 always uses 4
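A short Python sketch of the code point vs. encoding distinction (the sample characters are arbitrary picks): each character has exactly one code point, but the number of bytes depends on the UTF encoding. The "-le" codec variants are used only so that no byte order mark is added to the counts.

```python
# Sketch: one code point per character, different byte counts per UTF encoding.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀

for ch in samples:
    cp = ord(ch)                      # Unicode code point
    utf8 = ch.encode("utf-8")         # 1 to 4 bytes
    utf16 = ch.encode("utf-16-le")    # 2 or 4 bytes (4 = surrogate pair); "-le" avoids a BOM
    utf32 = ch.encode("utf-32-le")    # always 4 bytes
    print(f"U+{cp:04X}", len(utf8), len(utf16), len(utf32))
# U+0041 1 2 4
# U+00E9 2 2 4
# U+20AC 3 2 4
# U+1F600 4 4 4

# The first 128 code points encode to the same single byte as ASCII.
print(all(chr(i).encode("utf-8") == bytes([i]) for i in range(128)))    # True
# The first 256 code points match ISO-8859-1 (Latin-1) byte for byte.
print(all(chr(i).encode("latin-1") == bytes([i]) for i in range(256)))  # True
```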

Cause Of Garbled Text


  • Decoding a byte sequence with a different encoding scheme than the one used to write it (the result is often called mojibake)
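A small Python illustration, assuming a producer that wrote UTF-8 and a consumer that wrongly assumed Windows-1252 (cp1252); the sample string and the choice of cp1252 as the "wrong" encoding are arbitrary. It also shows the reverse mistake, where the bytes are invalid UTF-8 and the decoder substitutes U+FFFD.

```python
# Sketch: bytes written as UTF-8 but read back as Windows-1252 (cp1252).
text = "caf\u00e9 \u20ac5"      # "café €5"
raw = text.encode("utf-8")       # b'caf\xc3\xa9 \xe2\x82\xac5'

# Decoding with the wrong scheme often succeeds silently and yields garbled text.
garbled = raw.decode("cp1252")
print(garbled)                   # cafÃ© â‚¬5

# No bytes were lost here, so reversing the mistake recovers the original.
print(garbled.encode("cp1252").decode("utf-8") == text)  # True

# The opposite mistake: Latin-1 bytes are invalid UTF-8, so errors="replace" inserts U+FFFD.
latin1_raw = "caf\u00e9".encode("latin-1")                          # b'caf\xe9'
print(latin1_raw.decode("utf-8", errors="replace") == "caf\ufffd")  # True
```

The round trip only works when the wrong decode was lossless; once bytes have been dropped or replaced, the original text cannot be fully recovered.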
