Skip to content

ParanoiDF - PDF Analysis Suite based on PeePDF by Jose Miguel Esparza (http://peepdf.eternal-todo.com/). Tools added: Password cracking, redaction recovery, DRM removal, malicious JavaScript extraction, and more.

License

Notifications You must be signed in to change notification settings

patrickdw123/ParanoiDF

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ParanoiDF

The swiss army knife of PDF Analysis Tools. Based on peepdf - http://peepdf.eternal-todo.com. This README builds on the peepdf README.

This tool was developed as part of my M.Sc dissertation/project for the School of Computing (University of Kent, Canterbury, UK). The man behind the idea for this tool was Julio Hernandez-Castro (www.azhala.com).

Home Page

Features

See https://github.com/patrickdw123/ParanoiDF/wiki.

Dependancies

  • In order to crack passwords:
    • PdfCrack needed (apt-get install pdfcrack)
  • In order to remove DRM (editing, copying Etc.):
    • Calibre's ebook-convert needed (apt-get install calibre)
  • In order to decrypt PDFs:
    • qpdf needed (apt-get install qpdf)
  • In order to use the command redact:
    • NLTK (Natural Language ToolKit) needed (apt-get install python-nltk)
    • Java (Stanford parser is written in Java) needed (apt-get install default-jre)
  • To support XML output "lxml" is needed:
  • Included modules: lzw, colorama, jsbeautifier, ccitt, pythonaes (Thanks to all the developers!!)

Installation

No installation is needed apart of the commented dependencies, just execute:

python paranoiDF.py

Execution

There are two important options when ParanoidF is executed:

-f: Ignores the parsing errors. Analysing malicious files propably leads to parsing errors, so this parameter should be set. -l: Sets the loose mode, so does not search for the endobj tag because it's not obligatory. Helpful with malformed files.

  • Simple execution

Shows the statistics of the file after being decoded/decrypted and analysed:

python paranoiDF.py [options] pdf_file
  • Interactive console

Executes the interactive console, giving a wide range of tools to play with.

python paranoiDF.py -i 
  • Batch execution

It's possible to use a commands file to specify the commands to be executed in the batch mode. This type of execution is good to automatise analysis of several files:

python paranoiDF.py [options] -s script_file 

Some Hints

If the information shown when a PDF file is parsed is not enough to know if it's harmful or not, the following commands can help to do it:

  • tree

Shows the tree graph of the file or specified version. Here we can see suspicious elements.

  • offsets

Shows the physical map of the file or the specified version of the document. This is helpful to see unusual big objects or big spaces between objects.

  • search

Search the specified string or hexadecimal string in the objects (decoded and encrypted streams included).

  • object/rawobject

Shows the (raw) content of the object.

  • stream/rawstream

Shows the (raw) content of the stream.

TODO (with date that I intend to start work on)

V2.0:

  • Refine and test redaction thresholds: 30/08/2014
  • Add automation of retrieval of redaction box information such as font size, font and redaction box coordinates: 30/08/2014
  • Add other encryption algorithms (such as the AES): 1/10/2014
  • Digital Signatures analysis: 1/10/2014
  • Add a GUI

Bugs

Feel free to send bugs/criticisms/praises/comments to patrickdw123(at)gmail(dot)com.