- [2017 CVPR] TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering, [paper], [bibtex], sources: [YunseokJANG/tgif-qa].
- [2017 ACMMM] Video Question Answering via Gradually Refined Attention over Appearance and Motion, [paper], [bibtex].
- [2018 CVPR] Motion-Appearance Co-Memory Networks for Video Question Answering, [paper], [bibtex].
- [2018 CVPR] Focal Visual-Text Attention for Visual Question Answering, [paper], [bibtex], sources: [JunweiLiang/FVTA_memoryqa].
- [2018 EMNLP] TVQA: Localized Compositional Video Question Answering, [paper], [bibtex], [supplementary], [homepage], sources: [jayleicn/TVQA].
- [2019 TPAMI] Focal Visual-Text Attention for Memex Question Answering, [paper], [bibtex], [homepage], sources: [JunweiLiang/FVTA_memoryqa].
- [2019 CVPR] Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering, [paper], [bibtex], [poster], sources: [fanchenyou/HME-VideoQA].
- [2019 AAAI] ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering, [paper], [bibtex], sources: [MILVLG/activitynet-qa].
- [2019 arXiv] TVQA+: Spatio-Temporal Grounding for Video Question Answering, [paper], [bibtex], [homepage].
- [2019 ACL] Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems, [paper], [bibtex], sources: [henryhungle/MTN].
- [2020 AAAI] Divide and Conquer: Question-Guided Spatio-Temporal Contextual Attention for Video Question Answering, [paper], [bibtex].
- [2020 ICLR] CLEVRER: Collision Events for Video Representation and Reasoning, [paper], [bibtex], [homepage].