Update reader.py #353

Open
wants to merge 1 commit into
base: main
10 changes: 5 additions & 5 deletions clip_retrieval/clip_inference/reader.py
@@ -15,7 +15,7 @@ def folder_to_keys(folder, enable_text=True, enable_image=True, enable_metadata=
     image_files = None
     if enable_text:
         text_files = [*path.glob("**/*.txt")]
-        text_files = {text_file.relative_to(path).as_posix(): text_file for text_file in text_files}
+        text_files = {text_file.relative_to(path).with_suffix('').as_posix(): text_file for text_file in text_files}
     if enable_image:
         image_files = [
             *path.glob("**/*.png"),
@@ -29,10 +29,10 @@ def folder_to_keys(folder, enable_text=True, enable_image=True, enable_metadata=
             *path.glob("**/*.BMP"),
             *path.glob("**/*.WEBP"),
         ]
-        image_files = {image_file.relative_to(path).as_posix(): image_file for image_file in image_files}
+        image_files = {image_file.relative_to(path).with_suffix('').as_posix(): image_file for image_file in image_files}
     if enable_metadata:
         metadata_files = [*path.glob("**/*.json")]
-        metadata_files = {metadata_file.relative_to(path).as_posix(): metadata_file for metadata_file in metadata_files}
+        metadata_files = {metadata_file.relative_to(path).with_suffix('').as_posix(): metadata_file for metadata_file in metadata_files}
Owner

What is this doing?

Author

This removes the file suffix from the keys of the file path dictionaries. If the keys keep their suffix, then with --enable_text True the code uses a key ending in ".txt" to look up the corresponding image path in the image dictionary. That lookup raises

KeyError: 'xxx.txt'

which in turn results in

FileNotFoundError: [Errno 2] No such file or directory
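
The mismatch described above can be reproduced in miniature. This is only a sketch, not the project's code: the two dictionaries and the `lookup_image` helper are hypothetical stand-ins for the key-to-path mappings that `folder_to_keys` builds before this change.

```python
from pathlib import PurePosixPath

# Hypothetical stand-ins for the dictionaries built in folder_to_keys,
# keyed by relative path *including* the suffix (the behaviour before this PR).
text_files = {"xxx.txt": PurePosixPath("/data/xxx.txt")}
image_files = {"xxx.png": PurePosixPath("/data/xxx.png")}


def lookup_image(key, images):
    """Return the image path stored under key, or None if the key is absent."""
    try:
        return images[key]
    except KeyError:
        return None


# Keys taken from the text dictionary never match the image dictionary,
# because "xxx.txt" and "xxx.png" are different strings.
missing = [key for key in text_files if lookup_image(key, image_files) is None]
print(missing)
```

Every text key ends up in `missing`, which is the situation that surfaces as the KeyError reported in this comment.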

As an example, my local folder structure is:

[image: screenshot of the folder structure]

Without .with_suffix(''), the function produces:

keys: ['BoredApeYachtClub_0.txt', 'BoredApeYachtClub_5.txt', 'folder1/BoredApeYachtClub_0.txt', 'folder1/BoredApeYachtClub_2.txt', 'folder1/BoredApeYachtClub_3.txt', 'folder2/BoredApeYachtClub_3.txt', 'folder2/BoredApeYachtClub_4.txt', 'folder3/BoredApeYachtClub_3.txt']
text_files: {'BoredApeYachtClub_5.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_5.txt'), 'BoredApeYachtClub_0.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_0.txt'), 'folder1/BoredApeYachtClub_0.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_0.txt'), 'folder1/BoredApeYachtClub_2.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_2.txt'), 'folder1/BoredApeYachtClub_3.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_3.txt'), 'folder2/BoredApeYachtClub_3.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_3.txt'), 'folder2/BoredApeYachtClub_4.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_4.txt'), 'folder3/BoredApeYachtClub_3.txt': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder3/BoredApeYachtClub_3.txt')}
image_files: {'BoredApeYachtClub_5.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_5.png'), 'BoredApeYachtClub_0.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_0.png'), 'folder1/BoredApeYachtClub_0.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_0.png'), 'folder1/BoredApeYachtClub_2.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_2.png'), 'folder1/BoredApeYachtClub_3.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_3.png'), 'folder2/BoredApeYachtClub_3.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_3.png'), 'folder2/BoredApeYachtClub_4.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_4.png'), 'folder3/BoredApeYachtClub_3.png': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder3/BoredApeYachtClub_3.png')}

By contrast, with .with_suffix(''), the output is:

keys:  ['BoredApeYachtClub_0', 'BoredApeYachtClub_5', 'folder1/BoredApeYachtClub_0', 'folder1/BoredApeYachtClub_2', 'folder1/BoredApeYachtClub_3', 'folder2/BoredApeYachtClub_3', 'folder2/BoredApeYachtClub_4', 'folder3/BoredApeYachtClub_3']
text_files:  {'BoredApeYachtClub_5': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_5.txt'), 'BoredApeYachtClub_0': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_0.txt'), 'folder1/BoredApeYachtClub_0': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_0.txt'), 'folder1/BoredApeYachtClub_2': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_2.txt'), 'folder1/BoredApeYachtClub_3': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_3.txt'), 'folder2/BoredApeYachtClub_3': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_3.txt'), 'folder2/BoredApeYachtClub_4': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_4.txt'), 'folder3/BoredApeYachtClub_3': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder3/BoredApeYachtClub_3.txt')}
image_files:  {'BoredApeYachtClub_5': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_5.png'), 'BoredApeYachtClub_0': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/BoredApeYachtClub_0.png'), 'folder1/BoredApeYachtClub_0': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_0.png'), 'folder1/BoredApeYachtClub_2': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_2.png'), 'folder1/BoredApeYachtClub_3': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder1/BoredApeYachtClub_3.png'), 'folder2/BoredApeYachtClub_3': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_3.png'), 'folder2/BoredApeYachtClub_4': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder2/BoredApeYachtClub_4.png'), 'folder3/BoredApeYachtClub_3': PosixPath('/xxx/NFTs/BoredApeYachtClub_copy_V2/folder3/BoredApeYachtClub_3.png')}

Now, regardless of file format, corresponding files share the same key, so the dataloader can load the matching file for each key.
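
The key derivation used in the diff can be tried on its own with `pathlib`. The sketch below is illustrative: `key_for` is a hypothetical helper name, and `sample.v2.png` is a made-up file name added to show one caveat, namely that `with_suffix('')` drops only the last suffix.

```python
from pathlib import PurePosixPath

# Root folder as it appears in the output above.
root = PurePosixPath("/xxx/NFTs/BoredApeYachtClub_copy_V2")


def key_for(file_path, base):
    """Relative POSIX path with the final suffix removed (hypothetical helper)."""
    return file_path.relative_to(base).with_suffix("").as_posix()


# A text file and an image with the same stem map to the same key.
text_key = key_for(root / "folder1" / "BoredApeYachtClub_0.txt", root)
image_key = key_for(root / "folder1" / "BoredApeYachtClub_0.png", root)
print(text_key, image_key)

# Caveat: only the last suffix is stripped, so dots inside the stem survive.
dotted_key = key_for(root / "sample.v2.png", root)
print(dotted_key)
```

One consequence worth keeping in mind: two images that differ only by extension, such as `a.png` and `a.jpg`, would now map to the same key within the image dictionary.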

Finally, regarding your question in #352 (comment), I also tested that case. The modified code is compatible with the previous changes and produces the desired output.


     keys = None

@@ -41,9 +41,9 @@ def join(new_set):

     if enable_text:
         keys = join(text_files.keys())
-    elif enable_image:
+    if enable_image:
         keys = join(image_files.keys())
-    elif enable_metadata:
+    if enable_metadata:
         keys = join(metadata_files.keys())

     keys = list(sorted(keys))
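
The effect of replacing `elif` with `if` in this hunk can be sketched as follows. The body of `join` is not visible in the diff, so this assumes it intersects the new key set with the keys collected so far; `combine_keys` and its `chained` flag are hypothetical names used only for illustration.

```python
def combine_keys(key_sets, chained):
    """Combine per-modality key sets; None marks a disabled modality.

    chained=True mimics if/elif/elif: only the first enabled set is used.
    chained=False mimics independent ifs: all enabled sets are intersected.
    Intersection is an assumption about what join does.
    """
    keys = None
    for key_set in key_sets:
        if key_set is None:  # modality disabled, nothing to join
            continue
        keys = key_set if keys is None else keys & key_set
        if chained:
            break  # an elif chain stops after the first branch taken
    return sorted(keys) if keys is not None else []


text_keys = {"a", "b"}
image_keys = {"b", "c"}

# Text and image enabled, metadata disabled.
with_elif = combine_keys([text_keys, image_keys, None], chained=True)
with_if = combine_keys([text_keys, image_keys, None], chained=False)
print(with_elif, with_if)
```

Under that assumption, the `elif` version keeps every text key even when no matching image exists, while the `if` version keeps only the keys present in every enabled modality.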