
<- .encodings[String Encoding] ->

ASCII

String encoding that maps each 7-bit value (0x00-0x7F) to an English letter, a digit, a punctuation mark or symbol, or a control character
- ASCII Table
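- Of the 128 characters defined in ASCII, only 95 are printable (0x20-0x7E, including space); the other 33 are control characters (0x00-0x1F and 0x7F)
- ASCII uses only 7 bits. 8-bit extensions use the spare eighth bit for 128 more characters, but 256 characters are still not enough to encode all other languages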
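The counts above can be checked with a short sketch, assuming Python 3 and only its standard library. For code values 0 to 127, `str.isprintable()` treats exactly 0x20 (space) through 0x7E as printable, so it reproduces the 95/33 split. The last part shows that a byte with the eighth bit set is rejected by the strict `ascii` codec.

```python
# ASCII defines 128 code values (0x00-0x7F), so 7 bits are enough.
ascii_chars = [chr(c) for c in range(128)]

# For ASCII input, str.isprintable() is True exactly for 0x20-0x7E.
printable = [ch for ch in ascii_chars if ch.isprintable()]
control = [ch for ch in ascii_chars if not ch.isprintable()]

print(len(printable))  # 95
print(len(control))    # 33 (0x00-0x1F plus 0x7F DEL)

# A byte with the high (eighth) bit set is outside 7-bit ASCII.
try:
    bytes([0xE9]).decode("ascii")
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)
```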
Line Terminator: encoded character sequence that marks the end of a line
- On DOS/Windows it's "\r\n" whereas on Linux it's "\n"
- "\r" is carriage return (0x0D)
- "\n" is line feed or new line (0x0A)
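When examining a file or a memory dump, the terminator convention can be guessed from its raw bytes. The sketch below assumes Python 3; `line_terminator_style` is a hypothetical helper written for illustration, not part of any library. It counts CRLF pairs and bare LF bytes, which is a heuristic rather than a guarantee.

```python
def line_terminator_style(data: bytes) -> str:
    """Hypothetical helper: guess the line terminator used in a byte buffer."""
    crlf = data.count(b"\r\n")           # 0x0D 0x0A pairs (DOS/Windows)
    lf_only = data.count(b"\n") - crlf   # 0x0A bytes not preceded by 0x0D
    if crlf and not lf_only:
        return "CRLF (DOS/Windows)"
    if lf_only and not crlf:
        return "LF (Linux)"
    if crlf and lf_only:
        return "mixed"
    return "none"

print(b"\r\n".hex())                                     # 0d0a
print(line_terminator_style(b"first\r\nsecond\r\n"))     # CRLF (DOS/Windows)
print(line_terminator_style(b"first\nsecond\n"))         # LF (Linux)
```

Subtracting the CRLF count works because every CRLF pair contains exactly one LF byte, and `bytes.count` cannot double-count the non-overlapping pattern `\r\n`.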

Unicode

Various encoding schemes were invented, but none covered every language until Unicode came along
- Unicode Character Table
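- Unicode is a large table mapping every character to a unique number (code point)
- The first 128 code points map 1:1 to ASCII (the first 256 map 1:1 to ISO-8859-1, also called Latin-1)
- Different UTF encodings use different numbers of bytes to encode those code points: UTF-8 uses 1 to 4 bytes, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes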
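A small sketch, assuming Python 3.8 or later (for the separator argument of `bytes.hex`), that prints the code point of a few sample characters chosen for illustration and their UTF-8 and UTF-16 little-endian bytes. It also checks that the first 128 code points encode to the same single byte as ASCII, and that the first 256 match Latin-1.

```python
samples = ["A", "é", "€", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # little-endian, no byte order mark
    print(f"U+{ord(ch):04X}  utf-8: {utf8.hex(' ')}  utf-16-le: {utf16.hex(' ')}")

# Code points 0-127 are a single byte in UTF-8, identical to ASCII.
first_128_match_ascii = all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))
# Code points 0-255 are a single byte in Latin-1 ...
first_256_match_latin1 = all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))
# ... but U+0080-U+00FF take two bytes in UTF-8.
print(first_128_match_ascii, first_256_match_latin1)  # True True
print("é".encode("utf-8").hex(" "))                   # c3 a9
```

The loop prints `41 00` as the UTF-16-LE form of "A". This interleaved-zero pattern is a common way to spot UTF-16 strings in a hex dump. "😀" (U+1F600) needs 4 bytes in both encodings; in UTF-16 it is a surrogate pair (`3d d8 00 de`).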

Cause Of Garbled Text

Decoding a byte sequence with a different encoding scheme than the one that was used to encode it
