Ability to rsync a zipped directory with GCS? #1765
merzak274j asked this question in Q&A (unanswered)
Replies: 1 comment
-
Update: I was able to get this to work, although my first version ran sequentially.
I used get_mapper to list all the files already in GCS, so that I only transfer new files. Building on that, I used ThreadPoolExecutor to create multiple connections to the ZIP filesystem, since the bottleneck seemed to be the connection rather than the number of files I was reading simultaneously. I have something that works now, but I'd still be interested in seeing whether rsync can apply to my use case.
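Stripped down, what I ended up with looks roughly like this (the FTP URL, bucket path and worker count are placeholders, and it assumes gcsfs is installed):

```python
import shutil
from concurrent.futures import ThreadPoolExecutor

import fsspec

# Placeholder locations standing in for the real FTP archive and GCS target.
ZIP_URL = "ftp://user:pass@server/path/to/archive.zip"
GCS_TARGET = "gs://my-bucket/targetdirectory"

# 1. List what is already in GCS so only new files get transferred.
existing = set(fsspec.get_mapper(GCS_TARGET))

# 2. List the members of the remote ZIP (a single connection is enough here).
listing_fs = fsspec.filesystem("zip", fo=ZIP_URL)
todo = [p for p in listing_fs.find("/") if p.lstrip("/") not in existing]

gcs = fsspec.filesystem("gcs")

def copy_member(member):
    # A fresh ZipFileSystem per task, bypassing fsspec's instance cache so
    # each worker holds its own FTP connection.
    zfs = fsspec.filesystem("zip", fo=ZIP_URL, skip_instance_cache=True)
    dest = f"{GCS_TARGET}/{member.lstrip('/')}"
    with zfs.open(member, "rb") as src, gcs.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(copy_member, todo))
```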
-
Hi,
Thanks for this great library.
I’m wondering if it’s possible to use rsync for my use case. So far, after trying a few different methods I have not been successful. I’ve also reviewed the Q&A and issues and wasn’t able to find a clear example of what I’m trying to do.
I would like to sync a zipped directory hosted on an FTP server to an unzipped directory on GCS.
rsync would decompress and copy files from ftp://user:[email protected]/path/to/archive.zip to gs://my-bucket/targetdirectory, with the update condition set to "never".
Is this supported? If so, would it be possible to get an example of the required syntax? So far my efforts all result in a StartsWith error or an IsADirectoryError.
I've tried a few different syntaxes: passing the FTP URL ending in .zip via URL chaining (as described in the URL chaining docs), which resulted in a "protocol not recognized (zip::ftp)" error, and opening the FTP archive with ZipFileSystem and setting the source to (myzipfilesysteminstance, "") or "/" to represent the root, even though rsync expects a string. Roughly what I tried is sketched below.
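Concretely, these are roughly the two calls I have been trying (credentials and paths are placeholders, and I may be misreading the signature, e.g. whether update_cond="never" is the right way to say "never overwrite"):

```python
import fsspec
from fsspec.generic import rsync

SRC = "ftp://user:pass@server/path/to/archive.zip"
DST = "gs://my-bucket/targetdirectory"

# Attempt 1: chained URL pointing at the root of the archive; for me this
# fails with a "protocol not recognized"-style error.
rsync(f"zip://::{SRC}", DST, update_cond="never")

# Attempt 2: build the ZIP filesystem explicitly and pass its root as the
# source; rsync seems to expect plain string URLs, so this fails too.
zipfs = fsspec.filesystem("zip", fo=SRC)
rsync((zipfs, ""), DST, update_cond="never")
```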
Based on some issues I read, I also tried registering a generic file system.
I noticed that url_to_fs does return the path to my GCS directory, but for the ZIP the returned path is an empty string.
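For reference, this is the kind of check I ran (placeholder URLs again):

```python
from fsspec.core import url_to_fs

# For the GCS target the returned path is the bucket/prefix, as expected.
gfs, gpath = url_to_fs("gs://my-bucket/targetdirectory")

# For the ZIP-over-FTP chain the returned path for the archive root comes
# back as an empty string, which seems to be what rsync then stumbles over.
zfs, zpath = url_to_fs("zip://::ftp://user:pass@server/path/to/archive.zip")
```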
How do I get rsync to recognize my ZIP file system as a directory?
My zip file is quite large (several GB) and contains tens of thousands of files, so I’m looking for a clean and efficient solution that may also leverage concurrency to sync the directories, which rsync seems to provide.
I am somewhat new to programming, so I'm not sure whether what I want is technically possible or, if it isn't, how else I could approach this with fsspec, perhaps by copying many files to a directory (a rough sketch of what I mean is below). Any help would be appreciated.
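In case rsync can't do this directly, the kind of fallback I have in mind is something like the following, staging through a local directory (placeholder paths; it would need several GB of local scratch space):

```python
import fsspec

zipfs = fsspec.filesystem("zip", fo="ftp://user:pass@server/path/to/archive.zip")
gcs = fsspec.filesystem("gcs")

# Pull the whole archive contents to a local staging directory, then push the
# directory tree to the GCS target in one recursive put.
zipfs.get("/", "/tmp/staging/", recursive=True)
gcs.put("/tmp/staging/", "gs://my-bucket/targetdirectory/", recursive=True)
```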