For my graduation project, my team and I developed an Image Caption Generator that pairs a pretrained Vision Transformer (ViT) image encoder with a pretrained GPT-2 language model to generate image captions. The project's main goal was to explore what AI can do in image understanding and to give users a platform for generating descriptive captions for their images.
- Combined the pretrained ViT encoder with GPT-2 to generate coherent, accurate captions from input images.
- Built a user-friendly mobile app in Flutter that lets users interact with the model seamlessly.
- Used Flask and FastAPI to build a scalable, efficient backend that handles captioning requests and responses.
- Used Google Cloud services, namely the Text-to-Speech API and the Translation API, to power the app's voice and translation features.
- Users capture an image in the app and receive a description in near real time, making for an interactive and engaging experience.
- Added features that read generated captions aloud and translate them into other languages, improving accessibility and user engagement.