This is a web-based application that allows users to upload an image containing text and extract the text using Optical Character Recognition (OCR) technology. The application is built with Streamlit and uses Tesseract OCR for text extraction.
- Upload images in PNG, JPEG, or JPG formats.
- Extract text from images using Tesseract OCR.
- Simple and interactive web interface.
- Dockerized for easy deployment.
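The upload-and-extract flow can be sketched in Python as below. This is a minimal sketch, not the repository's actual application code. It assumes the Pillow (`PIL`) and `pytesseract` packages are installed, and the helper names `is_supported_image` and `extract_text` are hypothetical. In a Streamlit app, the bytes would typically come from the object returned by `st.file_uploader`, for example via its `getvalue()` method.

```python
import io
from pathlib import Path

# Extensions from the feature list above (assumption: matching is case-insensitive).
ALLOWED_EXTENSIONS = {".png", ".jpeg", ".jpg"}


def is_supported_image(filename: str) -> bool:
    """Return True if the filename ends in one of the supported image extensions."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def extract_text(image_bytes: bytes) -> str:
    """Run Tesseract OCR on raw image bytes and return the recognized text.

    The imports are deferred so that is_supported_image() can be used
    without Pillow or pytesseract installed (assumed dependencies).
    """
    from PIL import Image  # Pillow: decodes the uploaded bytes into an image
    import pytesseract  # Python wrapper that calls the tesseract binary

    image = Image.open(io.BytesIO(image_bytes))
    return pytesseract.image_to_string(image).strip()


if __name__ == "__main__":
    for name in ["scan.PNG", "photo.jpg", "notes.gif"]:
        print(name, is_supported_image(name))
```

Keeping the extension check separate from the OCR call means unsupported files can be rejected before Tesseract is ever invoked.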
- Python 3.9+
- Tesseract OCR
- On Windows:
  - Download the installer from the UB Mannheim Tesseract page.
  - Run the installer and add the installation directory to your `PATH` environment variable.
  - Verify the installation:
    ```bash
    tesseract --version
    ```
- On macOS, install via Homebrew:
  ```bash
  brew install tesseract
  ```
  Then verify the installation:
  ```bash
  tesseract --version
  ```
- On Debian/Ubuntu Linux, install via the package manager:
  ```bash
  sudo apt update
  sudo apt install tesseract-ocr
  ```
  Then verify the installation:
  ```bash
  tesseract --version
  ```
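To check the Tesseract installation from Python instead (for example, before starting the app), a sketch like the following wraps the same `tesseract --version` call. It uses only the standard library; `tesseract_version` is a hypothetical helper name, and it looks the binary up on `PATH` the same way the shell does.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # Not installed, or its directory was not added to PATH.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Older Tesseract releases print the version banner to stderr, not stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```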
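To run the app locally, clone the repository, create and activate a virtual environment, install the dependencies, and start Streamlit:
```bash
git clone https://github.com/jasminaaa20/streamlit-ocr.git
cd streamlit-ocr
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
streamlit run main.py
```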
Visit http://localhost:8501 in your browser to use the app.
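To run the app in Docker instead, build the image and start a container that maps port 8501:
```bash
docker build -t streamlit-ocr .
docker run -p 8501:8501 streamlit-ocr
```
Access the app at http://localhost:8501.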
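Whether the app runs locally or in the Docker container, a quick reachability check can be scripted with the standard library. This sketch assumes the default address http://localhost:8501 and treats any 2xx response as "up"; `app_is_up` is a hypothetical helper, not part of the repository.

```python
import urllib.request


def app_is_up(url: str = "http://localhost:8501", timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    print("app is up" if app_is_up() else "app is not reachable")
```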
To deploy to Google Cloud Run:
- Install the Google Cloud SDK.
- Authenticate with your Google Cloud account:
  ```bash
  gcloud auth login
  ```
- Enable the Cloud Run API:
  ```bash
  gcloud services enable run.googleapis.com
  ```
- Build and push the image to Google Container Registry:
  ```bash
  gcloud builds submit --tag gcr.io/<PROJECT-ID>/streamlit-ocr
  ```
- Deploy to Cloud Run:
  ```bash
  gcloud run deploy streamlit-ocr \
    --image gcr.io/<PROJECT-ID>/streamlit-ocr \
    --platform managed \
    --region <REGION> \
    --allow-unauthenticated
  ```
  Replace `<PROJECT-ID>` with your Google Cloud project ID and `<REGION>` with your desired region.
- Access the application: after deployment, you'll receive a URL to access your app.
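The `<PROJECT-ID>` and `<REGION>` substitution can also be scripted. The sketch below (hypothetical helper `cloud_run_deploy_command`, standard library only) assembles the same `gcloud run deploy` invocation shown in the deploy step and quotes each argument with `shlex.join`. It only builds the command string and does not run gcloud itself.

```python
import shlex


def cloud_run_deploy_command(project_id: str, region: str,
                             service: str = "streamlit-ocr") -> str:
    """Build the Cloud Run deploy command with the placeholders filled in."""
    if not project_id or not region:
        raise ValueError("project_id and region must be non-empty")
    image = f"gcr.io/{project_id}/{service}"  # same image tag as the build step
    args = [
        "gcloud", "run", "deploy", service,
        "--image", image,
        "--platform", "managed",
        "--region", region,
        "--allow-unauthenticated",
    ]
    # shlex.join quotes any argument that would otherwise be split by the shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(cloud_run_deploy_command("my-project", "us-central1"))
```

To execute the result from Python, it could be passed to `subprocess.run(shlex.split(cmd), check=True)`.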
```
.
├── Dockerfile        # Docker configuration file
├── requirements.txt  # Python dependencies
├── app.py            # Streamlit application
└── README.md         # Documentation
```
- Streamlit: For building the web interface.
- Tesseract OCR: For text extraction from images.
- Docker: For containerization.
- Google Cloud Run: For deployment.
Feel free to contribute or raise issues in the repository. Enjoy extracting text from images!