Check for PDF "attachments" with pdftk #2

Open
konklone opened this issue Nov 29, 2013 · 6 comments

Comments

@konklone
Member

Even if the USPS or DHS IG reports don't have any, at least set up a process that emails the admin whenever an attachment is detected.

@divergentdave
Contributor

Rough procedure:

  • Run qpdf on report with --decrypt option, output to a temporary file
  • Run pdftk report_decrypted.pdf unpack_files in a temporary directory
  • Check if pdftk created any files
  • Clean up

This seems kinda ugly, but there aren't many programs out there that deal with PDF attachments.
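A minimal sketch of the procedure above, not the actual script. It assumes `qpdf` and `pdftk` are on the `PATH`. The function name `find_attachments` and the injectable `run` parameter are hypothetical, added only so the logic can be exercised without the tools installed. It passes pdftk an explicit `output` directory rather than changing the working directory, and it treats qpdf's exit status 3 (warnings only) as success.

```python
import subprocess
import tempfile
from pathlib import Path


def find_attachments(report_path, run=subprocess.run):
    """Return the sorted file names attached to the PDF at report_path."""
    # The temporary directory is removed on exit, which covers "clean up".
    with tempfile.TemporaryDirectory() as tmp:
        decrypted = Path(tmp) / "report_decrypted.pdf"
        unpack_dir = Path(tmp) / "unpacked"
        unpack_dir.mkdir()

        # Step 1: decrypt the report into a temporary file.
        result = run(["qpdf", "--decrypt", str(report_path), str(decrypted)])
        # qpdf exits with 3 when it only emitted warnings; the output is still usable.
        if result.returncode not in (0, 3):
            raise RuntimeError(f"qpdf failed on {report_path}")

        # Step 2: ask pdftk to unpack any attachments into the scratch directory.
        run(
            ["pdftk", str(decrypted), "unpack_files", "output", str(unpack_dir)],
            check=True,
        )

        # Step 3: whatever pdftk created is an attachment.
        return sorted(p.name for p in unpack_dir.iterdir())
```

An empty list means pdftk created no files, i.e. the report has no attachments.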

@divergentdave
Contributor

I wrote this up as an out-of-band script in my fork (https://github.com/divergentdave/inspectors-general/blob/scripts/find_pdf_attachments.py). @konklone, could you run this against your full archive when you get a chance?

@divergentdave
Contributor

I ran this on my local collection (archive of just the current year) and found a few PDFs with Press Quality.joboptions or Standard.joboptions attached, which are configuration files for/from Acrobat Distiller. Several reports from the State Department OIG include an automatically-generated accessibility report (*.accreport.html) from Adobe Acrobat. Nothing too interesting so far.
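One way to filter out these uninteresting attachments is sketched below. The patterns come from the file names mentioned in this comment; the names `BORING_PATTERNS` and `interesting_attachments` are hypothetical and not taken from the script in the fork.

```python
from fnmatch import fnmatch

# Attachment types described above as uninteresting: Acrobat Distiller
# settings files and Acrobat's auto-generated accessibility reports.
BORING_PATTERNS = ("*.joboptions", "*.accreport.html")


def interesting_attachments(names):
    """Drop attachment names that match any of the boring patterns."""
    return [
        name
        for name in names
        # Lowercase the name so the match is case-insensitive on every platform.
        if not any(fnmatch(name.lower(), pattern) for pattern in BORING_PATTERNS)
    ]
```

For example, passing `["Standard.joboptions", "Final Report 100711.docx"]` would keep only the `.docx` file.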

@divergentdave
Contributor

I reran this with more PDFs on my local machine and got the following interesting results (excluding qpdf failures, joboptions files, and accessibility report files). There are several important-looking file names, but attachments are scarce enough that putting the attached files themselves into unitedstates/reports might be the best move.

data/treasury/2006/aprsep06/report.pdf has the following attachments: Front Cover and HIB - Final (10_30).pdf
data/treasury/2015/OIG-16-014/report.pdf has the following attachments: Re FW CPFS - Balance Sheet Presentation of Net Position.pdf, FW Proposed JV to DOI SCNP.pdf, RE Reclassification of Financial Statements for FY 2013.pdf, Re GFRS - Note 04A Direct Loans Receivable.pdf
data/pbgc/2006/2007-1-FA-0024-1/report.pdf has the following attachments: PBGC Response to OIG CG Financial Statement Audit Reports as revised1_.doc
data/osc/2016/FY2016-16-29%20DI-14-5128-16-29-DI-14-5218%20Agency%20Report/report.pdf has the following attachments: IMG_2190[1].JPG, IMG_2191[1].JPG, IMG_2189[1].JPG, IMG_0780.JPG, IMG_2193[1].JPG, IMG_0781.JPG, IMG_2192[1].JPG
data/osc/2012/FY2012-12-12d%20DI-11-2238%20and%20DI-11-2709%20-%20Supplemental%20Report/report.pdf has the following attachments: MS Comments in Blue.doc
data/peacecorps/2006/PC_South_Africa_Final_Evaluation-Report-IG-0702EA/report.pdf has the following attachments: Attachment Q.pdf, Attachment V - Early Funding Request 09-2005.pdf, Attachment F - GTOT RSA 06 Trainer.pdf, Attachment Z - 3 Walk-Around Personal Identification.pdf, Attachment X - 1 Courtship or Harrassment.pdf, Attachment W - IG Response Gene Peuse.pdf, Attachment O - D567D.pdf, Attachment S - vehfleetplan.pdf, Attachment T.pdf, Attachment BB - 5A Summary of Duties for APCD.pdf, Attachment P - vehiclemaintrecord.pdf, Attachment G - Housing_SS checklist RSA 07-6-2006.pdf, Attachment B - South Africa 170-06.pdf, Attachment K.pdf, Attachment L - MOU with FNB 08-2006.pdf, Attachment H - PCV Site Placement and Housing Checklist.pdf, Attachment CC - APCD Programming PA form 2003-2004.pdf, Attachment E - Weekly Self Assessments SA15.pdf, Attachment J - March 27 06 Minutes.pdf, Attachment D - EDUCATION - COMPETENCIES DRAFT.pdf, Attachment AA - 4 Are Rites Out of Step.pdf, Attachment C - South Africa 145-06.pdf, Attachment U - PCSA ARV-AB changes of COP Nov 2005.pdf, Attachment R - vehstareport.pdf, Attachment Y - 2 Accident.pdf, Attachment I - VAC Meeting Agenda 03-2006.pdf, Attachment A - TECHNICAL TRAINING PROGRAMME SA 15.pdf, Attachment N.pdf, Attachment M.pdf
data/dod/2011/SPO-2011-010/report.pdf has the following attachments: Final Report 100711.docx
data/smithsonian/2015/A-14-06/report.pdf has the following attachments: Transmittal Memo.pdf, KPMG Smithsonan A-133.pdf, DCAA Smithsonian A-133.pdf
data/nasa/2010/OMEGA-Report/report.pdf has the following attachments: OMEGA report FINAL Sept 20-v1.docx
data/nasa/2008/IG-09-006/report.pdf has the following attachments: Report_of_Independent_Auditors.pdf, Compliance_Report.pdf, Internal_Control_Report.pdf
data/nasa/2008/IG-09-006/report.decrypted.pdf has the following attachments: Report_of_Independent_Auditors.pdf, Compliance_Report.pdf, Internal_Control_Report.pdf
data/dot/2010/29584/report.pdf has the following attachments: quitecommands.xml
data/dot/2007/30096/report.pdf has the following attachments: Two Official in Bridge Division of NYDOT Charged in Bribery Scheme .doc
data/dot/2007/29992/report.pdf has the following attachments: New National Bridge Inspection Memo.doc
data/dot/2012/29073/report.pdf has the following attachments: FHWAARRA_FinalReport_4-5-12_CLee_MHchanges.docx
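The NASA IG-09-006 report shows up twice above, once as `report.pdf` and once as `report.decrypted.pdf`. A rough sketch for collapsing such duplicates, assuming the output lines keep exactly the format shown here; `group_by_report` is a hypothetical helper. The attachment list is deliberately left as one string, since file names can themselves contain commas.

```python
from pathlib import PurePosixPath

SEPARATOR = " has the following attachments: "


def group_by_report(lines):
    """Map each report directory to its attachment list, keeping the first line seen."""
    grouped = {}
    for line in lines:
        path, sep, names = line.strip().partition(SEPARATOR)
        if not sep:
            # Not a result line (for example, a blank line); skip it.
            continue
        # report.pdf and report.decrypted.pdf live in the same directory.
        report_dir = str(PurePosixPath(path).parent)
        grouped.setdefault(report_dir, names)
    return grouped
```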

@konklone
Member Author

konklone commented Dec 2, 2016

Agreed on all counts!

@divergentdave
Contributor

divergentdave commented Feb 14, 2017

This will also be needed for "PDF portfolios" such as this one: https://www.si.edu/Content/OIG/Audits/2015/A-14-06.pdf

According to https://blogs.adobe.com/pdfdevjunkie/2008/09/how_do_you_deal_with_large_pdf.html, "To maintain backward compatibility, a PDF Portfolio is basically a PDF with a bunch of attachments and some extra stuff in the catalog object."

Edit: https://oversight.garden/reports?query=%22PDF+portfolio%22
