PDF-to-Text

A few simple Python scripts to extract text from text-based or OCR-ed PDF files:

PDF-to-Text-A

This script searches only the specified directory (not its subdirectories) for PDF files, extracts their text, and saves it as one text file per PDF in the specified output directory.
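The flow described above can be sketched roughly as follows. This is a minimal illustration, not the script itself: the function names `list_pdfs`, `extract_text`, and `convert_directory` are hypothetical, and the extraction step assumes a recent PyPDF2 release (3.x) that provides `PdfReader`.

```python
import os


def list_pdfs(pdf_directory):
    """Return sorted paths of PDF files directly inside pdf_directory (no subfolders)."""
    return sorted(
        os.path.join(pdf_directory, name)
        for name in os.listdir(pdf_directory)
        if name.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(pdf_directory, name))
    )


def extract_text(pdf_path):
    """Extract the text of every page with PyPDF2 (assumes PyPDF2 3.x)."""
    from PyPDF2 import PdfReader  # imported here so list_pdfs works without PyPDF2

    reader = PdfReader(pdf_path)
    # extract_text() can yield nothing for pages without a text layer
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def convert_directory(pdf_directory, output_directory):
    """Write one .txt file per PDF into output_directory."""
    os.makedirs(output_directory, exist_ok=True)
    for pdf_path in list_pdfs(pdf_directory):
        base = os.path.splitext(os.path.basename(pdf_path))[0]
        out_path = os.path.join(output_directory, base + ".txt")
        with open(out_path, "w", encoding="utf-8") as out:
            out.write(extract_text(pdf_path))
```

Calling `convert_directory(pdf_directory, output_directory)` with the two paths configured in the script would then produce one `.txt` file per PDF.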

PDF-to-Text-B

This script searches only the specified directory (not its subdirectories) for PDF files, extracts their text, and combines it into a single text file saved in the specified output directory.
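Combining the extracted text into one file might look like the sketch below. The function name `combine_to_file`, the `extract` callback, and the `=====` separator line are assumptions made for illustration; the actual script may join the text differently.

```python
import os


def combine_to_file(pdf_paths, output_directory, extract,
                    combined_text_file_name="combined_text.txt"):
    """Write the text of all PDFs into one file and return that file's path.

    `extract` is any function that takes a PDF path and returns its text,
    for example a wrapper around PyPDF2.
    """
    os.makedirs(output_directory, exist_ok=True)
    combined_path = os.path.join(output_directory, combined_text_file_name)
    with open(combined_path, "w", encoding="utf-8") as out:
        for pdf_path in pdf_paths:
            # Hypothetical separator so each source file can be found later
            out.write(f"===== {os.path.basename(pdf_path)} =====\n")
            out.write(extract(pdf_path))
            out.write("\n\n")
    return combined_path
```

Passing the extraction step in as a function keeps the combining logic independent of the PDF library, which makes it easy to try out with plain strings first.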

PDF-to-Text-C

This script searches the specified directory and all its subdirectories for PDF files, extracts their text, and saves it as one text file per PDF in the specified output directory.
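A recursive search of this kind is commonly written with `os.walk`. The sketch below is an assumption about the approach, not a copy of the script, and `find_pdfs_recursive` is a hypothetical name.

```python
import os


def find_pdfs_recursive(pdf_directory):
    """Yield paths of PDF files in pdf_directory and all of its subdirectories."""
    for root, _dirs, files in os.walk(pdf_directory):
        for name in sorted(files):
            if name.lower().endswith(".pdf"):
                yield os.path.join(root, name)
```

One thing to watch for when all output goes into a single directory: PDFs with the same file name in different subdirectories would map to the same `.txt` name unless the output name includes part of the source path.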

PDF-to-Text-D

This script searches the specified directory and all its subdirectories for PDF files, extracts their text, and combines it into a single text file saved in the specified output directory.

How to use

PDF-to-Text-A

  1. Open the Python script in your code editor.
  2. In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
  3. In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
  4. Save the script and you're ready to go.
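As an illustration of steps 2 and 3, the two variables keep the script's placeholder values until you replace them. The helper `check_directories` below is hypothetical and not part of the script; it only shows one way to confirm the paths before running.

```python
import os

pdf_directory = '/path/to/pdf/files'            # placeholder; replace with your PDF folder
output_directory = '/path/to/output/directory'  # placeholder; replace with your output folder


def check_directories(pdf_dir, out_dir):
    """Return True if pdf_dir exists; create out_dir if it is missing."""
    if not os.path.isdir(pdf_dir):
        return False
    os.makedirs(out_dir, exist_ok=True)
    return True
```

On Windows, a raw string such as `r'C:\some\folder'` (an example path, not one from the script) avoids problems with backslashes being read as escape sequences.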

PDF-to-Text-B

  1. Open the Python script in your code editor.
  2. In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
  3. In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
  4. If desired, change the output file name 'combined_text.txt'.
  5. Save the script and you're ready to go.

PDF-to-Text-C

  1. Open the Python script in your code editor.
  2. In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
  3. In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
  4. Save the script and you're ready to go.

PDF-to-Text-D

  1. Open the Python script in your code editor.
  2. In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
  3. In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
  4. In combined_text_file_name = 'combined_text.txt' replace combined_text.txt with the desired output file name.
  5. Save the script and you're ready to go.

Requirements

To run any of these Python scripts, you need the PyPDF2 library installed. You can install it from your terminal using pip: pip install PyPDF2.
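To confirm the library is available before running a script, a quick check along these lines can help. It is a small sketch using only the standard library, and `pypdf2_available` is a hypothetical helper name.

```python
import importlib.util


def pypdf2_available():
    """Return True if the PyPDF2 package can be imported in this environment."""
    return importlib.util.find_spec("PyPDF2") is not None


if not pypdf2_available():
    print("PyPDF2 is not installed; run: pip install PyPDF2")
```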

Scripts written with the help of GPT-3.5.