Feb. 12, 2024, 5:45 a.m. | João Daniel Silva, João Magalhães, Devis Tuia, Bruno Martins

cs.CV updates on arXiv.org

Image captioning and cross-modal retrieval are examples of tasks that involve the joint analysis of visual and linguistic information. Applied to remote sensing imagery, these tasks can help non-expert users extract relevant Earth observation information for a variety of applications. Still, despite some previous efforts, the development and application of vision and language models in the remote sensing domain have been hindered by the relatively small size of the available datasets and of the models used in previous studies. In …

