- [2018 ECCV] Attention-aware Deep Adversarial Hashing for Cross-Modal Retrieval, [paper], [bibtex].
- [2019 CVPR] Deep Supervised Cross-modal Retrieval, [paper], [bibtex], sources: [penghu-cs/DSCMR].
- [2019 SIGIR] Scalable Deep Multimodal Learning for Cross-Modal Retrieval, [paper], [bibtex].
- [2019 ACMMM] A New Benchmark and Approach for Fine-grained Cross-media Retrieval, [paper], [bibtex], [homepage], sources: [PKU-ICST-MIPL/FGCrossNet_ACMMM2019].
- [2019 ICCV] Adversarial Representation Learning for Text-to-Image Matching, [paper], [bibtex].
- Resources: [forence/Awesome-Visual-Captioning].
- [2015 ICML] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, [paper], [slides], [bibtex], [homepage], sources: [kelvinxu/arctic-captions], [yunjey/show-attend-and-tell], [DeepRNN/image_captioning], [coldmanck/show-attend-and-tell].
- [2015 NeurIPS] Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, [paper], [bibtex].
- [2016 NeurIPS] Professor Forcing: A New Algorithm for Training Recurrent Networks, [paper], [bibtex], sources: [anirudh9119/LM_GANS].
- [2017 TPAMI] Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, [paper], sources: [tensorflow/models/im2txt].
- [2017 CVPR] SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning, [paper], [bibtex], sources: [zjuchenlong/sca-cnn.cvpr17].
- [2017 CVPR] Self-critical Sequence Training for Image Captioning, [paper], [bibtex], sources: [ruotianluo/self-critical.pytorch].
- [2018 CVPR] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, [paper], [bibtex], sources: [peteanderson80/bottom-up-attention], [hengyuan-hu/bottom-up-attention-vqa], [LeeDoYup/bottom-up-attention-tf].
- [2018 ACL] Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, [paper], [bibtex], [homepage], sources: [google-research-datasets/conceptual-captions].
- [2018 NeurIPS] Partially-Supervised Image Captioning, [paper], [bibtex].
- [2019 CVPR] Auto-Encoding Scene Graphs for Image Captioning, [paper], [bibtex], [post], sources: [yangxuntu/SGAE].
- [2019 ICCV] Attention on Attention for Image Captioning, [paper], [bibtex], sources: [husthuaan/AoANet].
- [2019 ICCV] Learning to Collocate Neural Modules for Image Captioning, [paper], [bibtex].
- [2020 arXiv] Deconfounded Image Captioning: A Causal Retrospect, [paper], [bibtex].
- [2021 ICML] Zero-Shot Text-to-Image Generation, [paper], [bibtex], sources: [openai/DALL-E].
- [2020 ECCV] Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning, [paper], [bibtex].
- [2020 TPAMI] Auto-encoding and Distilling Scene Graphs for Image Captioning, [paper], [bibtex], sources: [yangxuntu/SGAE].
- [2021 CVPR] Causal Attention for Vision-Language Tasks, [paper], [bibtex], [supplementary], sources: [yangxuntu/lxmertcatt].
- [2021 CVPR] RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words, [paper], [bibtex], sources: [zhangxuying1004/RSTNet].
- [2019 ACL] Expressing Visual Relationships via Language, [paper], [bibtex], sources: [airsplay/VisualRelationships].