Zeinab-Mohsen/Image-Speak

Image Speak

For my graduation project, my team and I developed an Image Caption Generator that combines a pretrained GPT-2 model with a Vision Transformer (ViT) model to generate image captions. The main goal of the project was to explore the capabilities of AI in image understanding and to give users a platform for generating descriptive captions for their images.
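
As a rough illustration of how a ViT encoder and a GPT-2 decoder can be paired for captioning, here is a minimal sketch using the Hugging Face `transformers` library. The checkpoint name `nlpconnect/vit-gpt2-image-captioning` is a publicly available ViT + GPT-2 model used here as an assumption; the weights, generation settings, and the helper names `generate_caption` and `clean_caption` are hypothetical and may differ from what the project actually uses. It assumes `transformers`, `torch`, and `Pillow` are installed.

```python
def clean_caption(raw: str) -> str:
    """Collapse repeated whitespace and capitalize the first letter."""
    text = " ".join(raw.split())
    return text[:1].upper() + text[1:] if text else text


def generate_caption(
    image_path: str,
    checkpoint: str = "nlpconnect/vit-gpt2-image-captioning",  # assumed checkpoint
    max_length: int = 32,  # assumed generation settings
    num_beams: int = 4,
) -> str:
    """Caption one image with a ViT encoder and a GPT-2 decoder."""
    # Heavy dependencies are imported lazily so the module loads without them.
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    processor = ViTImageProcessor.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # ViT expects RGB input; the processor resizes and normalizes the image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # GPT-2 decodes token ids conditioned on the ViT image embeddings.
    output_ids = model.generate(
        pixel_values, max_length=max_length, num_beams=num_beams
    )
    caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return clean_caption(caption)
```

Calling `generate_caption("photo.jpg")` would download the checkpoint on first use and return a single caption string. In a server setting the model, processor, and tokenizer would normally be loaded once at startup rather than on every call.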

Key Features:

  1. Utilized the GPT-2 model and Vision Transformer to generate coherent and accurate captions based on image input.

  2. Developed a user-friendly mobile app using Flutter, allowing users to interact with the AI seamlessly.

  3. Employed Flask and FastAPI to create a scalable and efficient backend that handles AI requests and responses smoothly.

  4. Used Google Cloud services such as the Text-to-Speech API and the Translation API to ensure high availability and performance.

  5. Users can capture an image with the app, and it will generate a near real-time description, providing an interactive and engaging user experience.

  6. Integrated a feature to convert generated captions to voice and translate them, enhancing accessibility and user engagement.
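
The request flow described in the features above (image in, then caption, optional translation, and speech out) can be sketched as a single backend function. This is only an illustration under stated assumptions: the names `describe_image` and `Description` are hypothetical, the captioning, translation, and speech steps are passed in as plain callables rather than real model or Google Cloud clients, and captions are assumed to be produced in English.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Description:
    caption: str    # caption text in the requested language
    language: str   # language code of the caption and audio
    audio: bytes    # synthesized speech for the caption


def describe_image(
    image_bytes: bytes,
    captioner: Callable[[bytes], str],         # image bytes -> English caption
    translator: Callable[[str, str], str],     # (text, language) -> translated text
    synthesizer: Callable[[str, str], bytes],  # (text, language) -> audio bytes
    target_language: str = "en",
) -> Description:
    """Run the caption -> translate -> speech steps for one uploaded image."""
    if not image_bytes:
        raise ValueError("empty image payload")

    caption = captioner(image_bytes)

    # Translate only when the user asked for a language other than English.
    if target_language != "en":
        caption = translator(caption, target_language)

    audio = synthesizer(caption, target_language)
    return Description(caption=caption, language=target_language, audio=audio)


if __name__ == "__main__":
    # Stand-in callables for local experimentation without any cloud access.
    result = describe_image(
        b"fake-image-bytes",
        captioner=lambda data: "a dog running on grass",
        translator=lambda text, lang: f"[{lang}] {text}",
        synthesizer=lambda text, lang: text.encode("utf-8"),
        target_language="ar",
    )
    print(result.caption)
```

Passing the three steps in as callables keeps the flow testable without network access. In a deployed backend, a Flask or FastAPI route would receive the uploaded image and call a function like this with callables that wrap the captioning model and the cloud clients.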

The application demo

ImageSpeak.mp4
