Fix #15
This PR adds two new schemes to `spider-feeder`, `http` and `https`. They just open URLs using `urlopen` from `urllib.request`. No extra data or certificates are allowed (maybe this is something that can be set somehow, but I haven't found a use case for it so far).
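
To make the behaviour concrete, here is a minimal sketch of what opening an `http`/`https` URI with `urlopen` could look like. It is not the code in this PR: `open_remote`, `SUPPORTED_SCHEMES` and the `opener` parameter are hypothetical names used only for illustration (the `opener` hook is there so the function can be exercised without network access).

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical constant: the two schemes this PR describes.
SUPPORTED_SCHEMES = ("http", "https")


def open_remote(uri, opener=urlopen):
    """Return the raw bytes served at an http(s) URI.

    `opener` defaults to urllib.request.urlopen; it is an illustrative
    hook (an assumption, not part of spider-feeder) for injecting a fake.
    """
    scheme = urlparse(uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported scheme: {scheme!r}")
    # Only the URL is passed: no request body and no SSL context,
    # mirroring the "no extra data or certificates" limitation.
    with opener(uri) as response:
        return response.read()
```

The returned bytes would still need to be decoded and parsed by whatever reads the input file.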
If there are query string parameters to be set, they can be formatted by the spider, as `%(param)s` placeholders are allowed in `SPIDERFEEDER_INPUT_URI`.

Also, I thought about including some transformations for common URLs.
For example, Google Drive displays URLs like `https://drive.google.com/file/d/<file id>/view?usp=sharing`, but the actual download URL is `https://drive.google.com/uc?export=download&id=<file id>`. To spare users from writing this conversion themselves, the transform could go inside the `open` method. What do you think?

If it does not go there and the user must apply it explicitly in the spider, we could at least provide a utility function like `get_google_drive_url(url)` that returns the proper URL, which the user could call when required.
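
As a starting point for discussion, a rough sketch of such a helper is below. It assumes the sharing URL always has the `/file/d/<file id>/...` shape shown above and returns any other URL unchanged. The regular expression and the pass-through behaviour are my assumptions, not something Google documents or this PR implements.

```python
import re
from urllib.parse import urlencode, urlparse

# Assumed shape of a Google Drive sharing path: /file/d/<file id>/...
_DRIVE_FILE_PATH = re.compile(r"^/file/d/(?P<file_id>[^/]+)")


def get_google_drive_url(url):
    """Convert a Google Drive sharing URL into its direct-download form.

    URLs that do not match the assumed sharing pattern are returned as-is.
    """
    parsed = urlparse(url)
    if parsed.netloc != "drive.google.com":
        return url
    match = _DRIVE_FILE_PATH.match(parsed.path)
    if match is None:
        return url
    query = urlencode({"export": "download", "id": match.group("file_id")})
    return f"https://drive.google.com/uc?{query}"
```

A spider could call this on the URL before assigning it to `SPIDERFEEDER_INPUT_URI`, or the `open` method could apply it internally if we decide to make the transform automatic.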