Image Speak

For my graduation project, my team and I developed an Image Caption Generator that combines a pretrained Vision Transformer (ViT) model with a pretrained GPT-2 model to generate image captions. The main goal of the project was to explore the capabilities of AI in image understanding and to give users a platform for generating descriptive captions for their images.
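
The pairing of ViT and GPT-2 can be sketched with the Hugging Face `transformers` library, where ViT encodes the image and GPT-2 decodes the caption. This is a minimal sketch, not the project's actual code: the checkpoint name, the function names, and the generation settings (`max_length`, `num_beams`) are assumptions, since the README does not say which weights or parameters the team used.

```python
from typing import Any, Tuple

# Assumption: a publicly available ViT + GPT-2 captioning checkpoint on the
# Hugging Face Hub; the project's own weights are not named in this README.
CHECKPOINT = "nlpconnect/vit-gpt2-image-captioning"


def load_captioner(checkpoint: str = CHECKPOINT) -> Tuple[Any, Any, Any]:
    """Load the encoder-decoder model, the image processor, and the tokenizer."""
    # Imported lazily so the heavy dependency is only needed when loading weights.
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    processor = ViTImageProcessor.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    return model, processor, tokenizer


def generate_caption(
    image: Any,
    model: Any,
    processor: Any,
    tokenizer: Any,
    max_length: int = 16,  # assumed caption length limit
    num_beams: int = 4,    # assumed beam width
) -> str:
    """Encode the image with ViT, then decode a caption with GPT-2."""
    # The processor resizes and normalizes the image into a pixel tensor.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Beam search over the decoder yields token ids for the caption.
    output_ids = model.generate(
        pixel_values, max_length=max_length, num_beams=num_beams
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

In use, `load_captioner()` would be called once at startup, and each request would pass a PIL image in RGB mode to `generate_caption` together with the three loaded components.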

Key Features:

Used the GPT-2 model and the Vision Transformer to generate coherent, accurate captions from an input image.
Developed a user-friendly mobile app using Flutter, allowing users to interact with the AI seamlessly.
Employed Flask and FastAPI to create a scalable and efficient backend that handles AI requests and responses smoothly.
Used Google Cloud services such as the Text-to-Speech API and the Translation API for speech output and translation, relying on managed services for availability and performance.
Users can capture an image in the app and receive a description in near real time, which keeps the experience interactive and engaging.
Integrated a feature to convert generated captions to voice and translate them, enhancing accessibility and user engagement.
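
The flow these features describe (image in, caption out, then optional translation and speech) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the project's backend: every name here (`describe_image`, `Description`, the three callable types) is hypothetical, and the stages are injected as plain callables rather than tied to the real model or the Google Cloud client libraries.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# All names below are hypothetical. Each stage is a plain callable so the
# captioning model and the cloud clients can be swapped or stubbed out.
Captioner = Callable[[bytes], str]         # image bytes -> caption
Translator = Callable[[str, str], str]     # (text, target language) -> text
Synthesizer = Callable[[str, str], bytes]  # (text, language) -> audio bytes


@dataclass
class Description:
    caption: str
    language: str
    audio: Optional[bytes] = None


def describe_image(
    image_bytes: bytes,
    captioner: Captioner,
    translator: Translator,
    synthesizer: Optional[Synthesizer] = None,
    target_language: str = "en",
) -> Description:
    """Caption an image, then optionally translate and voice the caption."""
    caption = captioner(image_bytes)
    # Assumption: the captioning model emits English, so English is not translated.
    if target_language != "en":
        caption = translator(caption, target_language)
    # Speech is optional; skip it when no synthesizer is supplied.
    audio = synthesizer(caption, target_language) if synthesizer else None
    return Description(caption=caption, language=target_language, audio=audio)
```

With stub stages, `describe_image(b"...", lambda b: "a dog", lambda t, lang: t, target_language="en")` returns a `Description` whose caption is `"a dog"` and whose `audio` is `None`. A Flask or FastAPI route could wrap this function and pass in callables backed by the real services.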

The application demo

ImageSpeak.mp4


Zeinab-Mohsen/Image-Speak
