May 16, 2024, 4:45 a.m. | Mingming Zhang, Qingjie Liu, Yunhong Wang

cs.CV updates on

arXiv:2310.00022v3 Announce Type: replace
Abstract: Learning representations through self-supervision on unlabeled data has proven highly effective for understanding diverse images. However, remote sensing images often have complex and densely populated scenes with multiple land objects and no clear foreground objects. This intrinsic property generates high object density, resulting in false positive pairs or missing contextual information in self-supervised learning. To address these problems, we propose a context-enhanced masked image modeling method (CtxMIM), a simple yet efficient MIM-based self-supervised learning for …


