A collection of papers, datasets, benchmarks, code, and model weights for Remote Sensing Cross-Modal Image-Text Retrieval (RSCMIT).

BaolanChen/Awesome-Remote-Sensing-Cross-Modal-Image-Text-Retrieval

Awesome-Remote-Sensing-Cross-Modal-Image-Text-Retrieval

A collection of papers, datasets, benchmarks, code, and model weights for Remote Sensing Cross-Modal Image-Text Retrieval (RSCMIT).


📢 Latest Updates

🔥🔥🔥 Last Updated on 2025.01.10 🔥🔥🔥

  • 2025.01.10: Added TeoChat, RingMoGPT, and VHM.
  • 2025.01.09: Added PERSVL and GeoChat.
  • 2024.12.24: Added CFITR.
  • 2024.12.09: Added CDMAN, MSA, KTIR, CMPAGL, CCLS2T, SARCI, FSISR, and SCAT.
  • 2024.12.05: Added SIRS and HVSA.

Remote Sensing Cross-Modal Image-Text Survey

| Paper | Title | Publication | Affiliation | Note |
|---|---|---|---|---|
| Paper | Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques | Remote Sensing 2025 | Northwestern Polytechnical University | |
| Paper | Foundation Models for Remote Sensing and Earth Observation: A Survey | arXiv 2024 | University of Tokyo and the RIKEN Center for Advanced Intelligence Project, Japan | |
| Paper | When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system | GRSM 2024 | Nanjing University of Aeronautics and Astronautics | |
| Paper | Vision-Language Models in Remote Sensing: Current progress and future trends | GRSM 2024 | King Abdullah University of Science and Technology | |
| Paper | Language Integration in Remote Sensing: Tasks, datasets, and future directions | GRSM 2023 | King Saud University | |
| Paper | Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works | TGRS 2023 | Central South University | |
| Paper | The Potential of Visual ChatGPT For Remote Sensing | arXiv 2023 | University of Western São Paulo | |
| Paper | Large Remote Sensing Models: Progress and Prospects (遥感大模型:进展与前瞻) | Geomatics and Information Science of Wuhan University (武汉大学学报 (信息科学版)) 2023 | Wuhan University | |

Remote Sensing Image-Text Datasets

| Dataset Name | Number of Images | Image Resolution | VLMs | Note |
|---|---|---|---|---|
| UCM-Captions | 2,100 | 256 × 256 | - | - |
| Sydney-Captions | 613 | 500 × 500 | - | - |
| RSICD | 10,921 | 224 × 224 | - | - |
| RSITMD | 4,743 | 256 × 256 | - | - |
| NWPU-Captions | 31,500 | 256 × 256 | - | - |
| RS5M | 5 million+ | All Resolutions | GeoRSCLIP | - |
| SkyScript | 5.2 million+ | All Resolutions | SkyCLIP | - |
| ChatEarthNet | 163,488 + 10,000 | - | ChatEarthNet | The dataset will be made publicly available. |
| GEOBench-VLM | over 10,000 | - | GEOBench-VLM | - |
| VRSBench | 29,614 | - | VRSBench | 29,614 images with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs (NeurIPS 2024 Datasets and Benchmarks Track) |

Remote Sensing Cross-Modal Image-Text Retrieval Models

| Paper | Title | Publication | Affiliation | Code | Note |
|---|---|---|---|---|---|
| CFITR | Toward Efficient and Accurate Remote Sensing Image–Text Retrieval With a Coarse-to-Fine Approach | GRSL 2024 | Beijing Foreign Studies University | Github | |
| PERSVL | Prior-Experience-based Vision-Language Model for Remote Sensing Image-Text Retrieval | TGRS 2024 | Xidian University | Github | |
| CDMAN | Thread the Needle: Cues-Driven Multi-Association for Remote Sensing Cross-Modal Retrieval | TGRS 2024 | Wuhan University of Technology | - | |
| MSA | Transcending Fusion: A Multiscale Alignment Method for Remote Sensing Image–Text Retrieval | TGRS 2024 | Xidian University | Github | |
| KTIR | Knowledge-aware Text-Image Retrieval for Remote Sensing Images | TGRS 2024 | EPFL | - | |
| CMPAGL | Cross-Modal Prealigned Method With Global and Local Information for Remote Sensing Image and Text Retrieval | TGRS 2024 | Shanghai Maritime University | Github | |
| FGIS | Fine-Grained Information Supplementation and Value-Guided Learning for Remote Sensing Image-Text Retrieval | JSTARS 2024 | Chongqing University | - | |
| EBAKER | Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning | ACM MM 2024 | Tianjin University | - | |
| CUP | Cross-Modal Remote Sensing Image–Text Retrieval via Context and Uncertainty-Aware Prompt | TNNLS 2024 | Xidian University | Github | |
| CCLS2T | Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval | TGRS 2024 | Xidian University | - | |
| GLISA | Global–Local Information Soft-Alignment for Cross-Modal Remote-Sensing Image–Text Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| SARCI | Scale-Aware Adaptive Refinement and Cross-Interaction for Remote Sensing Audio-Visual Cross-Modal Retrieval | TGRS 2024 | Wuhan University of Technology | Github | |
| MIIA | Masking-Based Cross-Modal Remote Sensing Image–Text Retrieval via Dynamic Contrastive Learning | TGRS 2024 | China University of Mining and Technology | - | |
| SCAT | Spatial–Channel Attention Transformer With Pseudo Regions for Remote Sensing Image-Text Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| FSISR | Cross-Modal Hashing With Feature Semi-Interaction and Semantic Ranking for Remote Sensing Ship Image Retrieval | TGRS 2024 | Harbin Institute of Technology | - | |
| SkyEyeGPT | Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | arXiv 2024 | Northwestern Polytechnical University | Github | |
| MFF-SFE | Cross-modal retrieval method based on MFF-SFE for remote sensing image-text | Journal of University of Chinese Academy of Sciences (中国科学院大学学报) 2024 | Aerospace Information Research Institute, Chinese Academy of Sciences | - | |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | TGRS 2024 | Hohai University | Github | |
| C2F-ITR | From Coarse To Fine: An Offline-Online Approach for Remote Sensing Cross-Modal Retrieval | IGARSS 2024 | Beijing Foreign Studies University | - | |
| MGRM-EL | Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| SIRS | Multitask Joint Learning for Remote Sensing Foreground-Entity Image–Text Retrieval | TGRS 2024 | Soochow University | Github | |
| PIR | A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | ACM MM 2023 (oral) | Zhejiang University of Technology | Github | |
| PE-RSITR | Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval | TGRS 2023 | Northwestern Polytechnical University | Github | |
| HVSA | Hypersphere-Based Remote Sensing Cross-Modal Text–Image Retrieval via Curriculum Learning | TGRS 2023 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| SWAN | Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval | ICMR 2023 (oral) | Zhejiang University of Technology | Github | |
| KAMCL | Knowledge-Aided Momentum Contrastive Learning for Remote-Sensing Image Text Retrieval | TGRS 2023 | Tianjin University | Github | |
| IEFT | Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval | TGRS 2023 | Xidian University | Github | |
| - | A Texture and Saliency Enhanced Image Learning Method For Cross-Modal Remote Sensing Image-Text Retrieval | IGARSS 2023 | Xidian University | - | |
| Multilanguage Transformer | Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval | JSTARS 2022 | King Saud University | - | |
| GaLR | Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information | TGRS 2022 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| - | Cross-modal retrieval of remote sensing images and text based on self-attention unsupervised deep common feature space | IJRS 2022 | National University of Defense Technology | - | |
| AMFMN | Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval | TGRS 2021 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| LW-MCR | A Lightweight Multi-Scale Crossmodal Text-Image Retrieval Method in Remote Sensing | TGRS 2021 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| VSE++ | VSE++: Improving Visual-Semantic Embeddings with Hard Negatives | BMVC 2018 (spotlight) | University of Toronto | Github | |

Remote Sensing Vision-Language Models for More Tasks

| Paper | Title | Publication | Affiliation | Code | Note |
|---|---|---|---|---|---|
| FIANet | Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | TGRS 2024 | Southwest Jiaotong University | Github | Image Segmentation |
| FedRSClip | FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models | arXiv 2024 | China Academy of Electronics and Information Technology | - | Scene Classification |

Remote Sensing Vision-Language Large & Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| VHM | VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis | arXiv 2024 | VHM | link |
| TeoChat | TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data | arXiv 2024 | TeoChat | link |
| RingMoGPT | RingMoGPT: A Unified Remote Sensing Foundation Model for Vision, Language, and Grounded Tasks | TGRS 2024 | RingMoGPT | - |
| EarthMarker | EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | TGRS 2024 | EarthMarker | link |
| GeoChat | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | CVPR 2024 | GeoChat | link |
| EarthGPT | EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain | TGRS 2024 | EarthGPT | link |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | TGRS 2024 | RemoteCLIP | link |
| RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | arXiv 2023 | RSGPT | link |
| GeoRSCLIP | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | arXiv 2023 | GeoRSCLIP | link |
| GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR 2024 | GRAFT | - |
| CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML 2023 | CSP | link |
| GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS 2023 | GeoCLIP | link |
| SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | arXiv 2023 | SatCLIP | link |

 

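Questions, Feedback, and Contributions to This Repository

I welcome all kinds of feedback, preferably shared via GitHub Issues. Likewise, if you have any questions or simply want to exchange ideas with others, feel free to post them.

Acknowledgements

Thanks to the authors of the related papers and projects.

Citation

If you find this project useful for your research, please consider citing it.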
