final byte[] utf8bytes = input.getBytes(StandardCharsets.UTF_8) encodes the unknown character as a single-byte ASCII question mark. When Util.utf8ByteIndexesMapping(input, bytesLength) creates the byteIndexes array of size bytesLength (the length of utf8bytes), the array is 2 elements smaller than it should be, since the mapping treats the unknown character as 3 bytes. The chain of if statements then examines the original string, decides that the unknown character is 3 bytes, and fills 3 array slots with the character index. Eventually Arrays.fill goes out of bounds and an exception is thrown.
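The length mismatch described above can be reproduced in isolation. The sketch below assumes the "unknown character" is an unpaired surrogate, which is the one kind of char a Java String can hold that UTF-8 cannot encode. The class LoneSurrogateDemo and the method naiveUtf8Length are hypothetical stand-ins for the library's if-chain, not its actual code.

```java
import java.nio.charset.StandardCharsets;

class LoneSurrogateDemo {
    // Hypothetical stand-in for a per-char if-chain that ignores surrogate pairing.
    static int naiveUtf8Length(char c) {
        if (c < 0x80) return 1;   // ASCII
        if (c < 0x800) return 2;  // two-byte range
        return 3;                 // everything else, including lone surrogates
    }

    // Sum of the naive per-char estimates for the whole string.
    static int naiveTotalLength(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += naiveUtf8Length(s.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumption: the problematic input contains an unpaired high surrogate.
        String input = "a\uD800b";
        byte[] utf8bytes = input.getBytes(StandardCharsets.UTF_8);

        System.out.println(utf8bytes.length);        // 3: 'a', '?', 'b'
        System.out.println(naiveTotalLength(input)); // 5: 1 + 3 + 1
    }
}
```

An index array sized from the first number but filled according to the second is 2 slots short, which matches the out-of-bounds failure reported here.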
Interesting: invalid UTF-8 characters are not handled properly and are replaced with an ASCII "?" instead of U+FFFD. I will look into a different API that handles this correctly, or a lower-level API that gives me more control over how invalid characters are processed.
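One lower-level option in the JDK is java.nio.charset.CharsetEncoder, which lets the caller choose the replacement bytes. The following is a minimal sketch, not necessarily the fix adopted in the library; the class name ReplacingUtf8Encoder and its encode method are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ReplacingUtf8Encoder {
    // UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER): EF BF BD.
    private static final byte[] REPLACEMENT = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};

    // Encodes input as UTF-8, substituting U+FFFD for chars that cannot be
    // encoded (such as unpaired surrogates) instead of the default '?'.
    static byte[] encode(String input) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(REPLACEMENT);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(input));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = encode("a\uD800b");
        System.out.println(bytes.length);
    }
}
```

With a 3-byte replacement, the encoded length agrees with a per-char count that treats an unpaired surrogate as 3 bytes, so an index array sized from the encoded bytes would no longer come up short.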