Narrating the Unseen: Real-Time Video Descriptions for Visually Impaired Individuals

This research explores a novel system that empowers visually impaired individuals by narrating their surroundings in spoken language, using a mobile camera as the sensor. Globally, approximately 2.2 billion people live with some form of visual impairment or blindness. To address this challenge, we propose an integrated solution that helps visually impaired users comprehend their environment by describing live video streams with generative AI techniques, and we present a comparative analysis of several pre-trained models for generating descriptive captions.
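
As a concrete illustration of the narration step, the sketch below reads a generated caption aloud on the client device. The choice of pyttsx3 as an offline text-to-speech engine is an assumption for illustration, not necessarily the engine used in this work.

```python
# Minimal sketch of the spoken-narration step: a generated caption is
# read aloud on the client device. pyttsx3 is an illustrative choice of
# offline text-to-speech engine, not necessarily the one used here.
import pyttsx3

def speak(caption: str) -> None:
    """Convert a generated scene description to speech."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 160)  # slightly slower than default for clarity
    engine.say(caption)
    engine.runAndWait()

speak("A person is crossing the street ahead of you.")
```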

The cornerstone of the proposed methodology is a pre-trained GPT-4 Vision multimodal model, trained on an extensive dataset comprising 13 million tokens. In addition, we engineered a robust client-server socket framework so that intensive computational tasks, particularly video stream preprocessing, run primarily server-side; both pieces are sketched below.
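
The captioning call itself can be sketched as follows, assuming the OpenAI Python SDK; the model name, prompt, and token limit shown are illustrative stand-ins rather than the exact configuration used in the study.

```python
# Illustrative call to a GPT-4 Vision-class model for frame captioning,
# via the OpenAI Python SDK. Model name, prompt, and token limit are
# assumptions for this sketch, not the study's exact configuration.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def caption_frame(jpeg_bytes: bytes) -> str:
    """Ask the multimodal model to describe one video frame."""
    b64 = base64.b64encode(jpeg_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this scene briefly for a blind pedestrian."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=100,
    )
    return response.choices[0].message.content
```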
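A minimal sketch of the client-server split follows, assuming TCP sockets carrying length-prefixed JPEG frames; the host, port, and framing protocol are illustrative choices, not the repository's exact wire format.

```python
# Minimal sketch of the server side of the split: the mobile client
# streams length-prefixed JPEG frames over TCP, and the server performs
# the heavy preprocessing and captioning. Host, port, and framing are
# illustrative assumptions.
import socket
import struct

HOST, PORT = "0.0.0.0", 9999  # assumed values

def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("client disconnected")
        buf += chunk
    return buf

def serve() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            while True:
                # Each frame is preceded by a 4-byte big-endian length header.
                header = recv_exact(conn, 4)
                (length,) = struct.unpack(">I", header)
                jpeg_bytes = recv_exact(conn, length)
                # Server-side: preprocess the frame and generate a caption
                # (e.g. with caption_frame above), then return the text.
                conn.sendall(b"caption placeholder\n")
```

Length-prefixing each frame keeps message boundaries explicit on a stream socket, which is why the server reads a fixed 4-byte header before every frame.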

A key aspect of the research is the evaluation of the generated captions. They are compared against reference captions using the established BLEU and ROUGE metrics; recognizing the semantic limitations of these n-gram-based scores, we also employ a semantic similarity metric for a more nuanced comparison. Together, these measures give a thorough assessment of how accurate and contextually relevant the system's descriptions are for visually impaired users.
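
Scoring a single generated caption against a reference can be sketched as below, using common open-source implementations of the three metrics; the specific libraries and embedding model are assumptions for illustration.

```python
# Illustrative scoring of one generated caption against a reference.
# Library choices and the embedding model name are assumptions for
# this sketch, not necessarily those used in the study.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

reference = "a man walks his dog along a busy street"
candidate = "a person is walking a dog on the street"

# BLEU: n-gram precision, smoothed because captions are short.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence overlap.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)

# Semantic similarity: cosine similarity between sentence embeddings,
# which credits paraphrases that BLEU and ROUGE penalize.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([reference, candidate])
semantic = util.cos_sim(emb[0], emb[1]).item()

print(bleu, rouge["rougeL"].fmeasure, semantic)
```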
