Problems when dealing with invalidly-encoded filenames #575
Comments
My first impulse is PR welcome. I'm assuming this wouldn't affect the external API, but I'm wondering how we'd handle the |
Yes, I was thinking it wouldn't affect the API, but I didn't know about the I think it's possible for From this it looks like Node simply uses utf-8 encoding by default. This seems strange since I think Windows stores file names in UTF-16 / UCS-2 encoding, but I just checked on Windows and the Buffers are indeed utf-8 encoded. |
That's my first thought too, but how do we actually filter non-UTF8 files? |
Ah, I was thinking of not filtering non-UTF8 names and just sending whatever string we get from the UTF8 conversion to the |
@rossj @RyanZim, bringing this issue back up, because we face the same problem with I've been working on a port of Node.js' path methods that work on Buffers: https://github.com/bcoe/path-buffer I've made an effort to detect |
`fs-extra` version: 5.0.0

Hi there. I ran into some cases where `remove()` was unable to remove a directory due to filename encoding issues. I believe there are similar issues using `empty`, `copy`, and `move` operations (and their sync counterparts - basically anything that relies on `fs.readdir` / `fs.readdirSync`).

My issue arose when trying to `fs.remove()` some directories that were created from an unzip operation. During `remove`'s / `rimraf`'s tree walk, some of the returned directories seemed not to exist (although they did), causing the final `unlink` operation to fail (since it wasn't actually successfully emptied).

It seems that, in general, names on a file system are just byte sequences, which are not guaranteed to represent fully valid strings. This causes the bytes -> string -> bytes operation, that happens when listing and then operating on items in a directory using Node, to not always produce the same file name that it read.
This encoding problem has been a known Node issue for a while, which is why an option was added to return `Buffer`s from `fs.readdir`. My suggestion is to update the affected methods to use this `Buffer` option. I'm happy to work on a PR, but I wanted to at least get some feedback and discuss the issue before diving in.

Here are a couple Node issues relating to the file name encoding problem:
nodejs/node-v0.x-archive#2387
nodejs/node#5616
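
As a rough illustration of the `Buffer` approach (not taken from fs-extra's or rimraf's code), a tree walk could keep every name as raw bytes from `readdir` through to `unlink`. The helpers `joinBytes` and `removeTreeSync` are hypothetical names for this sketch, and it ignores error handling, retries, and platform quirks that a real implementation would need.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: join two path segments without decoding them to strings.
function joinBytes(dirBytes, nameBytes) {
  return Buffer.concat([dirBytes, Buffer.from(path.sep), nameBytes]);
}

// Hypothetical helper: recursively delete `dir`, never converting names to strings.
function removeTreeSync(dir) {
  const dirBytes = Buffer.isBuffer(dir) ? dir : Buffer.from(dir);
  // With encoding 'buffer', readdirSync returns each entry name as a Buffer.
  for (const name of fs.readdirSync(dirBytes, { encoding: 'buffer' })) {
    const child = joinBytes(dirBytes, name);
    // lstat so that symlinks are unlinked rather than followed.
    if (fs.lstatSync(child).isDirectory()) {
      removeTreeSync(child);
    } else {
      fs.unlinkSync(child);
    }
  }
  fs.rmdirSync(dirBytes);
}
```

Because the `fs` functions used here accept `Buffer` paths, the bytes passed to `unlink` are exactly the bytes `readdir` returned, so an invalidly-encoded name is never altered along the way.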
Thanks!