Implement Seed-level video capture setting handling + Job-level PDF-only option #288

Open
wants to merge 17 commits into
base: master

@gretchenleighmiller gretchenleighmiller commented Sep 12, 2024

This PR covers the following:

  • Adds a new video_capture configuration option on the Seed level. This has four possible values; see newly added documentation for details.
  • Implements the video_capture option, which impacts yt-dlp extraction and MIME types of outlinks. The remainder of video capture handling is accomplished in warcprox via Warcprox-Meta headers.
  • Removes the previous file-based skip_av_seeds functionality.
  • Adds a new pdfs_only configuration option on the Job level. This is a boolean that defaults to False; see newly added documentation for details.
  • Implements the pdfs_only option based on the MIME type of outlinks. The remaining PDF-only filtering is accomplished in warcprox via Warcprox-Meta headers.
  • Documentation updates and minor style cleanup.
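
As a rough illustration of the MIME-type-based outlink filtering described above, here is a minimal sketch. It is not brozzler's actual code: the `Outlink` tuple shape, the `filter_outlinks` function, and the `skip_video` flag are all hypothetical names chosen for this example. In particular, `skip_video` is only a stand-in for whichever `video_capture` values affect outlinks (the four values are listed in the new documentation, not here), and dropping outlinks with an unknown MIME type under `pdfs_only` is an assumption.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical shape: (url, MIME type reported for the outlink, or None if unknown).
Outlink = Tuple[str, Optional[str]]


def _base_type(mime: Optional[str]) -> str:
    """Normalize a Content-Type value: drop parameters, trim, lowercase."""
    return (mime or "").split(";", 1)[0].strip().lower()


def filter_outlinks(
    outlinks: Iterable[Outlink],
    pdfs_only: bool = False,
    skip_video: bool = False,
) -> List[str]:
    """Return the URLs of outlinks that pass the MIME-type checks.

    pdfs_only:  keep only outlinks whose MIME type is application/pdf
                (assumption: unknown MIME types are dropped).
    skip_video: drop outlinks whose MIME type starts with "video/"
                (hypothetical stand-in for a video_capture setting).
    """
    kept = []
    for url, mime in outlinks:
        base = _base_type(mime)
        if pdfs_only and base != "application/pdf":
            continue
        if skip_video and base.startswith("video/"):
            continue
        kept.append(url)
    return kept
```

With both flags left at their defaults, every outlink passes through unchanged, which mirrors `pdfs_only` defaulting to False. Per the description, the rest of the filtering happens in warcprox via Warcprox-Meta headers and is not modeled here.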

@gretchenleighmiller changed the title from "Gmiller/2950 skip ytdlp" to "Implement Seed-level video capture setting handling" Sep 12, 2024
@gretchenleighmiller changed the title from "Implement Seed-level video capture setting handling" to "Implement Seed-level video capture setting handling + Job-level PDF-only option" Sep 20, 2024
@gretchenleighmiller marked this pull request as ready for review September 20, 2024 23:45

@galgeek galgeek left a comment

@gretchenleighmiller, thanks for your work on this!

I've left a couple of comments that it might be good to address.

3 participants