Streamlit OCR Image Text Extractor

This is a web-based application that allows users to upload an image containing text and extract the text using Optical Character Recognition (OCR) technology. The application is built with Streamlit and uses Tesseract OCR for text extraction.

Features

  • Upload images in PNG or JPEG (.jpg, .jpeg) format.
  • Extract text from images using Tesseract OCR.
  • Simple and interactive web interface.
  • Dockerized for easy deployment.
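The upload and extraction features can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `is_supported_image` and `render_page` are hypothetical, and it assumes the `streamlit`, `pytesseract`, and `Pillow` packages are installed alongside the Tesseract binary.

```python
from pathlib import Path

# Extensions taken from the feature list (PNG, JPEG, JPG).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_supported_image(filename: str) -> bool:
    """Return True when the file name ends in a supported image extension."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def render_page() -> None:
    """Draw an upload widget and show the OCR result (hypothetical layout)."""
    # Third-party imports stay local so the helper above works without them.
    import pytesseract
    import streamlit as st
    from PIL import Image

    st.title("OCR Image Text Extractor")
    uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
    if uploaded is not None and is_supported_image(uploaded.name):
        image = Image.open(uploaded)
        st.image(image, caption=uploaded.name)
        # image_to_string runs the Tesseract binary on the image.
        st.text_area("Extracted text", pytesseract.image_to_string(image))
```

To try the sketch, call `render_page()` at the bottom of a script and start it with `streamlit run`.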

Prerequisites

  • Python 3.9+
  • Tesseract OCR

Installing Tesseract OCR

Windows

  1. Download the installer from the UB Mannheim Tesseract page.

  2. Install the application and add the installation directory to your PATH environment variable.

  3. Verify installation:

    tesseract --version

macOS

  1. Install via Homebrew:

    brew install tesseract
  2. Verify installation:

    tesseract --version

Linux

  1. Install via the package manager (Debian/Ubuntu shown):

    sudo apt update
    sudo apt install tesseract-ocr
  2. Verify installation:

    tesseract --version
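The same verification can be done from Python, which is handy before starting the app. The sketch below uses only the standard library; the function name `tesseract_version` is an illustrative choice, not part of this project.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    executable = shutil.which("tesseract")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract releases print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```

A `None` result usually means the installation directory has not been added to PATH.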

Local Development

Clone the Repository

git clone https://github.com/jasminaaa20/streamlit-ocr.git
cd streamlit-ocr

Create a Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Python Dependencies

pip install -r requirements.txt

Run the Application

streamlit run main.py

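Visit http://localhost:8501 in your browser to use the app. If your checkout names the entry point app.py (as listed under Application Structure), run streamlit run app.py instead.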


Docker Deployment

Build the Docker Image

docker build -t streamlit-ocr .

Run the Docker Container

docker run -p 8501:8501 streamlit-ocr

Access the app at http://localhost:8501.
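To confirm from a script that the container is serving requests, a small probe along these lines may help. It is a sketch using only the standard library; the function name `app_is_up` is hypothetical, and the default URL assumes the port mapping from the `docker run` command above.

```python
import urllib.request


def app_is_up(url: str = "http://localhost:8501", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to the app's URL answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


if __name__ == "__main__":
    print("app is up" if app_is_up() else "app is not reachable")
```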


Deploy to Google Cloud Run

Prerequisites to Deploy

  • Install the Google Cloud SDK.

  • Authenticate with your Google Cloud account:

    gcloud auth login
  • Enable the Cloud Run API:

    gcloud services enable run.googleapis.com

Steps to Deploy

  1. Build and Push the Image to Google Container Registry

    gcloud builds submit --tag gcr.io/<PROJECT-ID>/streamlit-ocr
  2. Deploy to Cloud Run

    gcloud run deploy streamlit-ocr \
        --image gcr.io/<PROJECT-ID>/streamlit-ocr \
        --platform managed \
        --region <REGION> \
        --allow-unauthenticated

    Replace <PROJECT-ID> with your Google Cloud project ID and <REGION> with your desired region.

  3. Access the Application

    After deployment, you’ll receive a URL to access your app.
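Cloud Run tells the container which port to listen on through the `PORT` environment variable, while the local instructions use 8501. Whether this project's Dockerfile already handles that is not stated here, so the following is only a sketch of one way a launcher could build the Streamlit command. The function name `streamlit_command` and the default script name are assumptions.

```python
import os
from typing import List, Mapping


def streamlit_command(env: Mapping[str, str], script: str = "main.py") -> List[str]:
    """Build a `streamlit run` command that honors the PORT variable if it is set."""
    # Fall back to Streamlit's usual local port when PORT is absent.
    port = env.get("PORT", "8501")
    return [
        "streamlit", "run", script,
        "--server.port", port,
        # Listen on all interfaces so the platform can route traffic to the container.
        "--server.address", "0.0.0.0",
    ]


if __name__ == "__main__":
    print(" ".join(streamlit_command(os.environ)))
```

An alternative that needs no code change is to pass `--port 8501` to `gcloud run deploy` so Cloud Run forwards requests to the port the container already uses.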


Application Structure

.
├── Dockerfile           # Docker configuration file
├── requirements.txt     # Python dependencies
├── app.py               # Streamlit application
└── README.md            # Documentation

Technologies Used

  • Streamlit: For building the web interface.
  • Tesseract OCR: For text extraction from images.
  • Docker: For containerization.
  • Google Cloud Run: For deployment.

License

This project is licensed under the MIT License.


Author

Akmal Ali Jasmin

LinkedIn post

Feel free to contribute or raise issues in the repository. Enjoy extracting text from images!
