This project provides a Python script to extract text from images of Georgia driver's permits using Optical Character Recognition (OCR) with Tesseract-OCR. The script enhances the image, extracts text, and saves the text to a CSV file.
- Image enhancement using OpenCV
- OCR text extraction with Tesseract-OCR
- Saves extracted text to a CSV file
- Sample image from the Georgia DDS provided for demonstration
This project is independent of the Georgia Department of Driver Services (DDS) and is not officially endorsed by or affiliated with DDS. The information provided in this project is for educational and informational purposes only. DDS is not responsible for any errors or omissions in the data or for any consequences arising from the use of this information.
- Clone the repository:
    git clone https://github.com/yourusername/your-repo.git
    cd your-repo
- Install dependencies:
    pip install -r requirements.txt
- Ensure Tesseract-OCR is installed. If it is not, download and install it first, then update the Tesseract path in the script (if necessary).
- Place the image of the driver's license in the same directory as the script, or provide the path to the image.
- Run the script:
    python 1-write.py
- The extracted text will be saved to drivers_license_data.csv.
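The Tesseract step above can be checked before running the script. The following is a minimal sketch, assuming the executable is named `tesseract`; `find_tesseract` is a hypothetical helper for illustration and is not part of this project's script.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_tesseract(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a path to the Tesseract executable, or None if it cannot be found.

    explicit_path is an optional, user-supplied location (hypothetical
    parameter); if it does not point to a file, the system PATH is searched.
    """
    if explicit_path and Path(explicit_path).is_file():
        return explicit_path
    # shutil.which returns None when the command is not on the PATH.
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract was not found; install it or pass its location explicitly.")
    else:
        print(f"Tesseract found at {path}")
        # With pytesseract installed, a script can then point the wrapper at it:
        # pytesseract.pytesseract.tesseract_cmd = path
```

If the helper returns `None`, the OCR step will likely fail until Tesseract is installed or its location is set in the script.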
your-repo/
│
├── README.md
├── requirements.txt
├── 1-write.py
├── drivers_license.jpg
├── enhanced_image.jpg (generated by the program)
└── drivers_license_data.csv (generated by the program)
The extracted text from the driver's license will be saved in drivers_license_data.csv, with each line representing a separate piece of text extracted from the image.
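To illustrate the one-row-per-line layout described above, here is a minimal sketch that uses Python's built-in `csv` module rather than the project's pandas-based code. The function names and the `text` column header are assumptions made for this example, not taken from the script.

```python
import csv
from typing import List


def text_to_rows(ocr_text: str) -> List[str]:
    """Split OCR output into non-empty, whitespace-trimmed lines."""
    return [line.strip() for line in ocr_text.splitlines() if line.strip()]


def save_rows(rows: List[str], csv_path: str, header: str = "text") -> None:
    """Write one extracted line per CSV row under a single column.

    The column name is an assumption; the project's script may use another.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([header])
        writer.writerows([row] for row in rows)
```

For example, `save_rows(text_to_rows(ocr_text), "drivers_license_data.csv")` would write each non-empty line of `ocr_text` (the string returned by the OCR step) as its own row.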
- Fork the repository.
- Create a feature branch (git checkout -b feature-branch).
- Commit your changes (git commit -am 'Add new feature').
  - Make frequent and small commits!
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
  - Once your request is approved, you are a contributor!
This project is licensed under the MIT License - see the LICENSE file for details.
- Georgia DDS - Sample image used in this project
- OpenCV - Computer vision library (Apache License 2.0)
- Pytesseract - Python wrapper for Tesseract (Apache License 2.0)
- Pandas - Data analysis library (BSD 3-Clause License)
- Pillow - Image processing library (HPND License)
requirements.txt: List of dependencies:
- opencv-python
- pytesseract
- pandas
- Pillow
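Two of these packages are imported under a different name than the one used to install them (`opencv-python` as `cv2`, `Pillow` as `PIL`). The sketch below, which is not part of the project, checks whether each dependency is importable; `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec
from typing import Dict, List

# Maps the pip package name to the module name used in import statements.
REQUIRED: Dict[str, str] = {
    "opencv-python": "cv2",
    "pytesseract": "pytesseract",
    "pandas": "pandas",
    "Pillow": "PIL",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install them with: pip install -r requirements.txt")
    else:
        print("All dependencies are importable.")
```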