Scanner.scan utf8ByteIndexesMapping Array Out Of Bounds #170

crusse54 · 2021-12-17T21:02:36Z

final byte[] utf8bytes = input.getBytes(StandardCharsets.UTF_8) encodes the unknown character as a single-byte ASCII question mark. When Util.utf8ByteIndexesMapping(input, bytesLength) creates the byteIndexes array of size bytesLength (the length of utf8bytes), the array is 2 elements smaller than the mapping needs, because the mapping treats the unknown character as 3 bytes. The set of if statements then looks at the original string, decides that the unknown character is 3 bytes, and fills 3 array slots with the character index. Eventually Arrays.fill goes out of bounds and an exception is thrown.
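The length mismatch can be reproduced in isolation. The sketch below assumes the "unknown character" is an unpaired UTF-16 surrogate, which is the case where String.getBytes substitutes "?". The class and method names are hypothetical, and expectedLength is only a simplified stand-in for the per-character size checks described above, not the library's actual code.

```java
import java.nio.charset.StandardCharsets;

class LoneSurrogateDemo {
    // Number of bytes String.getBytes really produces for the input.
    static int encodedLength(String input) {
        return input.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes a per-character size check would assume
    // (hypothetical simplification of the mapping's if statements).
    static int expectedLength(String input) {
        int total = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c <= 0x7F) {
                total += 1;
            } else if (c <= 0x7FF) {
                total += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < input.length()
                    && Character.isLowSurrogate(input.charAt(i + 1))) {
                total += 4; // valid surrogate pair: one 4-byte code point
                i++;
            } else {
                total += 3; // includes unpaired surrogates
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String input = "x\uD83Dy"; // unpaired high surrogate between two ASCII letters
        System.out.println(encodedLength(input));  // prints 3 ("x", "?", "y")
        System.out.println(expectedLength(input)); // prints 5 (1 + 3 + 1)
    }
}
```

An array sized from the first number and filled according to the second is 2 slots too short, which matches the out-of-bounds failure described above.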

gliwka · 2023-11-20T11:06:08Z

Interesting, invalid UTF-8 characters are not handled properly and are replaced with an ASCII "?" instead of U+FFFD. I will look into a different API that handles this correctly, or a lower-level API that gives me more control over how invalid characters are processed.
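One lower-level option in the JDK is java.nio.charset.CharsetEncoder, which lets the caller choose the replacement bytes. The following is a minimal sketch under that assumption, not the fix that was actually shipped. The class name Utf8WithReplacement and its encode method are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8WithReplacement {
    // UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER).
    private static final byte[] REPLACEMENT = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};

    // Encodes input as UTF-8, writing U+FFFD (3 bytes) for each unpaired
    // surrogate instead of the single-byte "?" used by String.getBytes.
    static byte[] encode(String input) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(REPLACEMENT);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(input));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrown unchecked for simplicity.
            throw new IllegalStateException(e);
        }
    }
}
```

With this encoder an unpaired surrogate occupies 3 bytes in the output, so a mapping that counts 3 bytes for that character stays in step with the array length. Using CodingErrorAction.REPORT instead would reject such input with an exception rather than replace it.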

gliwka · 2023-11-20T11:07:25Z

Alternatively, I might also find a way to directly operate on the strings without the mapping. Let me know if you also have any ideas.

yenuka78 · 2024-01-24T08:46:18Z

We also encountered this bug. When is a fix expected?

gliwka · 2024-01-26T20:29:17Z

A fix will be released in the coming days.

gliwka added the bug label Dec 2, 2023
