Fix #15: http/https inputs #22

Merged: ejulio merged 4 commits into master from feat-http_inputs on Sep 19, 2019.

ejulio (Owner) commented Sep 17, 2019

Fix #15
This PR adds two new schemes to spider-feeder, http and https.
Both simply open the URL with urlopen from urllib.request.
No extra request data or certificates can be passed (this could probably be made configurable, but I haven't found a use case for it so far).
Query string parameters can be filled in by the spider, since %(param)s placeholders are allowed in SPIDERFEEDER_INPUT_URI.
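A minimal sketch of the behavior described above. The helper names (format_input_uri, open_http, get_opener) and the OPENERS registry are hypothetical illustrations, not spider-feeder's actual API: the URI template is filled with %-formatting from values the spider supplies, and http/https URIs are opened with a plain urlopen call, with no request data or custom SSL context.

```python
from io import TextIOWrapper
from urllib.parse import urlparse
from urllib.request import urlopen


def format_input_uri(uri_template, params):
    """Fill %(param)s placeholders with spider-supplied values (hypothetical helper)."""
    return uri_template % params


def open_http(uri, encoding='utf-8'):
    """Open an http(s) URI as a text stream (hypothetical helper).

    As in the PR description, no request data or certificates are passed.
    """
    response = urlopen(uri)
    return TextIOWrapper(response, encoding=encoding)


# Hypothetical scheme registry: http and https share the same opener.
OPENERS = {
    'http': open_http,
    'https': open_http,
}


def get_opener(uri):
    """Pick the opener for the URI's scheme, failing loudly on unknown schemes."""
    scheme = urlparse(uri).scheme
    try:
        return OPENERS[scheme]
    except KeyError:
        raise ValueError(f'Unsupported input scheme: {scheme!r}') from None
```

For example, with a template of https://example.com/urls.csv?date=%(date)s and a spider providing date='2019-09-17', format_input_uri returns https://example.com/urls.csv?date=2019-09-17, and get_opener(uri)(uri) would then fetch it over the network.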

I also considered including some transformations for common URLs.
For example, Google Drive shows file URLs like https://drive.google.com/file/d/<file id>/view?usp=sharing, but the actual download URL is https://drive.google.com/uc?export=download&id=<file id>. To spare users from writing this conversion themselves, the transform could go inside the open method. What do you think?
If it doesn't go there and users must handle it explicitly in the spider, we could at least provide a utility function such as get_google_drive_url(url) that returns the download URL, for users to call when needed.
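A rough sketch of the proposed get_google_drive_url(url) utility, assuming Drive "view" links always follow the https://drive.google.com/file/d/<file id>/view pattern shown above. This is not an existing spider-feeder function; the regex and the pass-through of non-matching URLs are illustrative choices.

```python
import re
from urllib.parse import urlencode

# Assumed shape of a Drive "view" link; the captured group is the file id.
_DRIVE_VIEW_RE = re.compile(r'^https://drive\.google\.com/file/d/([^/?#]+)/view')


def get_google_drive_url(url):
    """Turn a Drive 'view' URL into its direct-download URL (hypothetical helper)."""
    match = _DRIVE_VIEW_RE.match(url)
    if match is None:
        # Not a Drive view link: return it unchanged so callers can apply it blindly.
        return url
    query = urlencode({'export': 'download', 'id': match.group(1)})
    return 'https://drive.google.com/uc?' + query
```

Returning non-matching URLs unchanged means the same function could either be called by users in their spiders or applied unconditionally inside the open method.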

Gallaecio (Contributor) commented

Regarding Google Drive, I think this could be covered in the documentation instead, but a utility function for it is probably fine too.

ejulio (Owner, Author) commented Sep 19, 2019

@Gallaecio, I created an issue for the utility function: #23

ejulio merged commit 692992c into master on Sep 19, 2019.
ejulio deleted the feat-http_inputs branch on September 19, 2019 at 13:45.

Linked issue: Allow http inputs
2 participants