Repository for ACL'23 Paper: "Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning"
Authors: Shivaen Ramshetty*, Gaurav Verma*, and Srijan Kumar
Affiliation: Georgia Institute of Technology
Paper (pdf): arXiv, ACL Anthology
Poster (pdf): ACL Underline
We provide an easy-to-follow repository with guided notebooks detailing our baselines, method, and evaluation.
The dataset subsets can be downloaded here:
To allow for rapid experimentation, we provide pre-computed objects and attributes for each dataset:
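These files can be loaded directly, without running a detector. Below is a minimal loading sketch, assuming a JSON layout that maps image IDs to object and attribute lists (the filename and schema here are assumptions; match them to the files you download):

```python
import json

# Hypothetical filename; substitute the pre-computed file downloaded above.
with open("detections_mscoco.json") as f:
    detections = json.load(f)

# Assumed layout: image_id -> {"objects": [...], "attributes": [...]}.
image_id, entry = next(iter(detections.items()))
print(image_id, entry["objects"][:3], entry["attributes"][:3])
```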
To perform object and attribute detection yourself:
- Set up the Bottom-Up Attention repo or use our Docker image.
- Download the pretrained model if you are setting it up yourself.
- Follow the instructions in detector/README.md to capture objects and attributes for the above data or your own (see the label-mapping sketch after this list).
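The detector's class and attribute predictions are indices into the Visual Genome vocabularies shipped with the Bottom-Up Attention repo. A minimal sketch of mapping indices back to readable labels; the paths and indices below are assumptions, so defer to detector/README.md:

```python
# Map raw detector outputs (class/attribute indices) to readable labels
# using the Visual Genome vocabularies from the Bottom-Up Attention repo.
# NOTE: the vocab paths below are assumptions; check detector/README.md.
def load_vocab(path):
    with open(path) as f:
        return [line.strip() for line in f]

objects = load_vocab("data/genome/1600-400-20/objects_vocab.txt")
attributes = load_vocab("data/genome/1600-400-20/attributes_vocab.txt")

# Hypothetical indices for one detected region. Depending on the detector,
# predictions may be offset by one to account for a background class.
obj_idx, attr_idx = 72, 11
print(f"region label: {attributes[attr_idx]} {objects[obj_idx]}")
```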
To augment and evaluate your own data, we provide scripts in XMAI; a simplified illustration of the augmentation step follows.
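The actual XMAI scripts choose insertions adversarially, using the downstream model to pick attributes that maximally degrade its performance; the sketch below only illustrates the insertion mechanics, with hypothetical detections:

```python
# Simplified illustration of cross-modal attribute insertion (NOT the exact
# XMAI scripts): prepend a detected attribute to each caption token that
# matches a detected object.
def insert_attributes(caption, obj_to_attr):
    remaining = dict(obj_to_attr)  # copy so the caller's mapping is untouched
    out = []
    for tok in caption.split():
        word = tok.lower().strip(".,")
        if word in remaining:
            out.append(remaining.pop(word))  # insert each attribute once
        out.append(tok)
    return " ".join(out)

# Hypothetical detections for the paired image.
pairs = {"dog": "furry", "frisbee": "red"}
print(insert_attributes("A dog catches a frisbee.", pairs))
# -> "A furry dog catches a red frisbee."
```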
Notebooks and data for our paper can be found within paper_experiments:
- The baseline notebook can be found here: paper_experiments/colab_notebooks/augmentation/TextAttack_baselines.ipynb
- We provide our resulting files under: paper_experiments/modified_captions/
- The XMAI notebook can be found here: paper_experiments/colab_notebooks/augmentation/TextAttack_baselines.ipynb
- We provide our main experiment results under: paper_experiments/modified_captions/
- We provide separate evaluation scripts for the MSCOCO and SNLI-VE datasets in paper_experiments/colab_notebooks/evaluation/ (a sketch of the measurement they report follows).
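The robustness measurement those scripts report boils down to comparing task performance on clean vs. augmented inputs. A minimal, self-contained sketch for the SNLI-VE case; every name here is hypothetical, and the dummy predictor stands in for a real model such as METER or OFA:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    image_id: str
    caption: str
    label: str  # SNLI-VE labels: entailment / neutral / contradiction

def accuracy(predict: Callable[[str, str], str], examples: List[Example]) -> float:
    """Fraction of examples the predictor labels correctly."""
    correct = sum(predict(ex.image_id, ex.caption) == ex.label for ex in examples)
    return correct / len(examples)

# Dummy predictor standing in for real model inference (hypothetical).
def dummy_predict(image_id: str, caption: str) -> str:
    return "entailment"

clean = [Example("1", "a dog runs", "entailment"),
         Example("2", "a cat sleeps", "contradiction")]
augmented = [Example("1", "a furry dog runs", "entailment"),
             Example("2", "a striped cat sleeps", "contradiction")]

drop = accuracy(dummy_predict, clean) - accuracy(dummy_predict, augmented)
print(f"accuracy drop under augmentation: {drop:.3f}")
```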
If you find our work useful, please cite our paper:

@inproceedings{ramshetty2023xmai,
  title={Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning},
  author={Ramshetty, Shivaen and Verma, Gaurav and Kumar, Srijan},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2023}
}
We also thank the authors and contributors of the following repositories:
@misc{yu2020buapt,
  author = {Yu, Zhou and Li, Jing and Luo, Tongan and Yu, Jun},
  title = {A PyTorch Implementation of Bottom-Up-Attention},
  howpublished = {\url{https://github.com/MILVLG/bottom-up-attention.pytorch}},
  year = {2020}
}

@inproceedings{dou2022meter,
  title = {An Empirical Study of Training End-to-End Vision-and-Language Transformers},
  author = {Dou, Zi-Yi and Xu, Yichong and Gan, Zhe and Wang, Jianfeng and Wang, Shuohang and Wang, Lijuan and Zhu, Chenguang and Zhang, Pengchuan and Yuan, Lu and Peng, Nanyun and Liu, Zicheng and Zeng, Michael},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022},
  url = {https://arxiv.org/abs/2111.02387}
}

@article{wang2022ofa,
  author = {Peng Wang and An Yang and Rui Men and Junyang Lin and Shuai Bai and Zhikang Li and Jianxin Ma and Chang Zhou and Jingren Zhou and Hongxia Yang},
  title = {OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework},
  journal = {CoRR},
  volume = {abs/2202.03052},
  year = {2022}
}

@misc{wu2019detectron2,
  author = {Yuxin Wu and Alexander Kirillov and Francisco Massa and Wan-Yen Lo and Ross Girshick},
  title = {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year = {2019}
}