Check for PDF "attachments" with pdftk #2
Rough procedure:
This seems kinda ugly, but there aren't many programs out there that deal with PDF attachments.
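One way to do the check with pdftk is its `unpack_files` operation, which copies a PDF's attachments into a directory (`pdftk report.pdf unpack_files output dir/`). A scraper could unpack into a temporary folder and see whether anything came out. This is a minimal sketch, assuming the `pdftk` binary is on `PATH`. The helper names `build_unpack_command` and `list_attachments` are hypothetical:

```python
import os
import subprocess
import tempfile
from pathlib import Path


def build_unpack_command(pdf_path, output_dir):
    """Return the pdftk argv that copies every attachment in pdf_path into output_dir."""
    # pdftk syntax: pdftk <input.pdf> unpack_files output <directory>/
    # os.path.join(..., "") adds a trailing separator, mirroring pdftk's documented example.
    return ["pdftk", str(pdf_path), "unpack_files", "output", os.path.join(str(output_dir), "")]


def list_attachments(pdf_path):
    """Unpack attachments into a temporary directory and return their file names.

    Assumes pdftk is installed; raises subprocess.CalledProcessError if pdftk fails.
    """
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            build_unpack_command(pdf_path, tmp),
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
        )
        # An empty list means pdftk found nothing to unpack.
        return sorted(p.name for p in Path(tmp).iterdir())
```

Unpacking to a throwaway directory keeps the check side-effect free. The files only need to be copied somewhere permanent once a PDF actually has attachments.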
I wrote this up as an out-of-band script in my fork (https://github.com/divergentdave/inspectors-general/blob/scripts/find_pdf_attachments.py). @konklone, could you run this against your full archive when you get a chance?
I ran this on my local collection (an archive of just the current year) and found a few PDFs with
I reran this with more PDFs on my local machine and got the following interesting results (excluding qpdf failures, joboptions files, and accessibility report files). There are several important-looking file names, but attachments are scarce enough that putting the attached files into unitedstates/reports might be the best move.
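The exclusions mentioned above could be expressed as a small filter over the scan results. This is a sketch. The result shape is hypothetical and not taken from find_pdf_attachments.py: a dict mapping each PDF path to its list of attachment names, with `None` standing in for a PDF that qpdf failed on. The accessibility-report name patterns are also guesses:

```python
# Hypothetical patterns for attachments that are tool by-products rather than
# report content; adjust them to whatever the archive actually contains.
IGNORED_SUFFIXES = (".joboptions",)
IGNORED_NAME_FRAGMENTS = ("accessibility report", "accessibilityreport")


def is_interesting(attachment_name):
    """Return True unless the name looks like a joboptions file or an accessibility report."""
    name = attachment_name.lower()
    if name.endswith(IGNORED_SUFFIXES):
        return False
    # Treat underscores and hyphens as spaces so "Accessibility_Report" also matches.
    normalized = name.replace("_", " ").replace("-", " ")
    return not any(fragment in normalized for fragment in IGNORED_NAME_FRAGMENTS)


def interesting_results(results):
    """Filter {pdf_path: [attachment names] or None}; None marks a qpdf failure and is skipped."""
    kept = {}
    for pdf_path, names in results.items():
        if names is None:  # qpdf could not process this PDF
            continue
        wanted = [n for n in names if is_interesting(n)]
        if wanted:
            kept[pdf_path] = wanted
    return kept
```

PDFs whose only attachments are by-products drop out entirely, so the output lists just the reports worth a closer look.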
Agreed on all counts!
This will also be needed for "PDF portfolios" such as https://www.si.edu/Content/OIG/Audits/2015/A-14-06.pdf. According to https://blogs.adobe.com/pdfdevjunkie/2008/09/how_do_you_deal_with_large_pdf.html, "To maintain backward compatibility, a PDF Portfolio is basically a PDF with a bunch of attachments and some extra stuff in the catalog object." Edit: see also https://oversight.garden/reports?query=%22PDF+portfolio%22
Even if the USPS or DHS IGs don't have any, we should at least set up a process that emails the admin whenever it detects one.
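The notification could be as simple as an email sent whenever a scan turns up anything. This is a minimal sketch using the standard library's `smtplib` and `email.message`. The addresses, the SMTP host, and the `found` mapping (PDF path to attachment names) are placeholders, not part of the existing scraper:

```python
import smtplib
from email.message import EmailMessage

# Hypothetical addresses and SMTP host; replace them with the deployment's real settings.
ADMIN_ADDRESS = "admin@example.org"
SENDER_ADDRESS = "scraper@example.org"
SMTP_HOST = "localhost"


def build_alert(found):
    """Build an email listing PDFs (path -> attachment names) that contained attachments."""
    msg = EmailMessage()
    msg["Subject"] = f"PDF attachments detected in {len(found)} report(s)"
    msg["From"] = SENDER_ADDRESS
    msg["To"] = ADMIN_ADDRESS
    lines = [f"{path}: {', '.join(names)}" for path, names in sorted(found.items())]
    msg.set_content("\n".join(lines) + "\n")
    return msg


def alert_admin(found):
    """Send the alert only when something was found; return True if an email went out."""
    if not found:
        return False
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(build_alert(found))
    return True
```

Building the message separately from sending it makes the email content easy to check without an SMTP server. A run that finds nothing sends nothing.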