The Multimodal Video Description (MMVD) framework generates rich natural language descriptions of web videos by drawing on multiple sources of information, e.g., visual concepts, audio tracks, and video category information. The framework was one of the top three performers in the MSR-VTT grand challenge organized with ACM-MM 2016. This page provides the code, additional results, and illustrative examples of the challenges involved in producing natural language descriptions for "in the wild" web videos such as those found in the MSR-VTT dataset.
forked from VisionLearningGroup/MMVD
AndyMjw/MMVD