Narrating the Unseen: Real-Time Video Descriptions for Visually Impaired Individuals

This research explores a novel system that empowers visually impaired individuals by narrating their surroundings in spoken language, using a mobile camera as the sensor. Globally, approximately 2.2 billion people live with some form of visual impairment or blindness. To address this challenge, we propose an integrated solution that helps visually impaired users comprehend their environment by describing live video streams with generative AI techniques, and we present a comparative analysis of several pre-trained models for generating descriptive captions.
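
As a concrete illustration of the narration step, the sketch below reads a generated caption aloud on the client device. The choice of pyttsx3 as an offline text-to-speech engine is an assumption for illustration, not necessarily the engine used in this work.

```python
# Minimal sketch of the spoken-narration step: a generated caption is
# read aloud on the client device. pyttsx3 is an illustrative choice of
# offline text-to-speech engine, not necessarily the one used here.
import pyttsx3

def speak(caption: str) -> None:
    """Convert a generated scene description to speech."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 160)  # slightly slower than default for clarity
    engine.say(caption)
    engine.runAndWait()

speak("A person is crossing the street ahead of you.")
```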

The cornerstone of the proposed methodology is a pre-trained GPT-4 Vision multimodal model, trained on an extensive dataset comprising 13 million tokens. In addition, we engineered a robust client-server socket framework so that intensive computational tasks, particularly video stream preprocessing, run primarily server-side; both pieces are sketched below.
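
The captioning call itself can be sketched as follows, assuming the OpenAI Python SDK; the model name, prompt, and token limit shown are illustrative stand-ins rather than the exact configuration used in the study.

```python
# Illustrative call to a GPT-4 Vision-class model for frame captioning,
# via the OpenAI Python SDK. Model name, prompt, and token limit are
# assumptions for this sketch, not the study's exact configuration.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def caption_frame(jpeg_bytes: bytes) -> str:
    """Ask the multimodal model to describe one video frame."""
    b64 = base64.b64encode(jpeg_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this scene briefly for a blind pedestrian."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=100,
    )
    return response.choices[0].message.content
```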
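A minimal sketch of the client-server split follows, assuming TCP sockets carrying length-prefixed JPEG frames; the host, port, and framing protocol are illustrative choices, not the repository's exact wire format.

```python
# Minimal sketch of the server side of the split: the mobile client
# streams length-prefixed JPEG frames over TCP, and the server performs
# the heavy preprocessing and captioning. Host, port, and framing are
# illustrative assumptions.
import socket
import struct

HOST, PORT = "0.0.0.0", 9999  # assumed values

def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("client disconnected")
        buf += chunk
    return buf

def serve() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            while True:
                # Each frame is preceded by a 4-byte big-endian length header.
                header = recv_exact(conn, 4)
                (length,) = struct.unpack(">I", header)
                jpeg_bytes = recv_exact(conn, length)
                # Server-side: preprocess the frame and generate a caption
                # (e.g. with caption_frame above), then return the text.
                conn.sendall(b"caption placeholder\n")
```

Length-prefixing each frame keeps message boundaries explicit on a stream socket, which is why the server reads a fixed 4-byte header before every frame.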

A key aspect of the research is the evaluation of the generated captions. They are compared against reference captions using the established BLEU and ROUGE metrics; recognizing the semantic limitations of these n-gram-based scores, we also employ a semantic similarity metric for a more nuanced comparison. Together, these measures give a thorough assessment of how accurate and contextually relevant the system's descriptions are for visually impaired users.
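
Scoring a single generated caption against a reference can be sketched as below, using common open-source implementations of the three metrics; the specific libraries and embedding model are assumptions for illustration.

```python
# Illustrative scoring of one generated caption against a reference.
# Library choices and the embedding model name are assumptions for
# this sketch, not necessarily those used in the study.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

reference = "a man walks his dog along a busy street"
candidate = "a person is walking a dog on the street"

# BLEU: n-gram precision, smoothed because captions are short.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence overlap.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)

# Semantic similarity: cosine similarity between sentence embeddings,
# which credits paraphrases that BLEU and ROUGE penalize.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([reference, candidate])
semantic = util.cos_sim(emb[0], emb[1]).item()

print(bleu, rouge["rougeL"].fmeasure, semantic)
```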
