Why is `clean_str` present in `wikienv`? #31

jamesbraza · 2024-08-26T23:08:00Z

I am finding that certain strings break `clean_str`, for example:

p = "This is a test string with unicode escape: \\u00e9"

Passing this to `clean_str` raises:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 43: unexpected end of data

Why do we need to convert the string to UTF-8? And if it's required, why not just ignore conversion errors?
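For reference, here is a minimal sketch of what ignoring conversion errors could look like. It assumes `clean_str` is an encode/decode round-trip of roughly the shape shown in `clean_str_assumed`; that body is an assumption for illustration and is not quoted from `wikienv`. Both function names are hypothetical.

```python
def clean_str_assumed(p: str) -> str:
    # ASSUMPTION: approximate shape of clean_str, not copied from the repo.
    # Interprets backslash escapes, then re-reads the result as UTF-8 bytes.
    return p.encode().decode("unicode-escape").encode("latin1").decode("utf-8")


def clean_str_lenient(p: str) -> str:
    # Hypothetical variant: same round-trip, but every step skips
    # anything it cannot convert instead of raising.
    return (
        p.encode()
        .decode("unicode-escape", errors="ignore")
        .encode("latin1", errors="ignore")
        .decode("utf-8", errors="ignore")
    )


p = "This is a test string with unicode escape: \\u00e9"

try:
    clean_str_assumed(p)
except UnicodeDecodeError as exc:
    # The escape becomes the lone byte 0xe9, which is not valid UTF-8 on its own.
    print(f"strict version failed: {exc}")

# The lenient version returns the text with the undecodable byte dropped.
print(repr(clean_str_lenient(p)))
```

Note that `errors="ignore"` drops the offending byte silently, so the "é" is lost here. Using `errors="replace"` in the final `decode` would keep a visible U+FFFD marker instead.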


ysymyth · 2025-01-16T20:00:25Z

Does it actually break things when you run it? If so, happy to accept a PR!

jamesbraza changed the title ~~Why is clean_str present in the repo?~~ Why is clean_str present in wikienv? Aug 26, 2024
