Feb. 8, 2024, 5:44 a.m. | Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung

cs.LG updates on arXiv.org

Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks by randomly masking image patches and reconstructing them. However, effective data augmentation strategies for MAE remain an open question, in contrast to contrastive learning, where augmentation plays a central role. This paper studies the prevailing mixing augmentation for MAE. We first demonstrate that naive mixing in fact degrades model performance by increasing the mutual information (MI). To address this, we propose homologous recognition, an auxiliary pretext …
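To make the setup concrete, below is a minimal sketch of what "mixing augmentation for MAE" means: patches from two images are interleaved, and the mixed image is then randomly masked before encoding. This is an illustrative reconstruction based only on the abstract, not the authors' implementation; the function names (`patchify`, `mix_and_mask`), the 50/50 per-patch mixing rule, the 16-pixel patch size, and the 75% mask ratio are all assumptions.

```python
# Sketch: patch-level image mixing followed by MAE-style random masking.
# Hypothetical helper names and hyperparameters; not the paper's code.
import torch

def patchify(imgs: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """(B, C, H, W) -> (B, N, patch*patch*C) non-overlapping patches."""
    b, c, h, w = imgs.shape
    gh, gw = h // patch, w // patch
    x = imgs.reshape(b, c, gh, patch, gw, patch)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(b, gh * gw, patch * patch * c)

def mix_and_mask(img_a, img_b, patch=16, mask_ratio=0.75):
    """Mix patches from two images, then randomly mask a fraction of them."""
    pa, pb = patchify(img_a, patch), patchify(img_b, patch)
    b, n, d = pa.shape
    # Per-patch coin flip: take each patch from img_a or img_b (assumed rule).
    take_a = torch.rand(b, n, 1) < 0.5
    mixed = torch.where(take_a, pa, pb)
    # MAE-style random masking: keep a random (1 - mask_ratio) subset of
    # patches; only these visible patches would be fed to the encoder.
    keep = int(n * (1 - mask_ratio))
    idx = torch.rand(b, n).argsort(dim=1)[:, :keep]
    visible = torch.gather(mixed, 1, idx.unsqueeze(-1).expand(-1, -1, d))
    return visible, take_a.squeeze(-1), idx

# Usage: mix two batches of 224x224 RGB images.
a, b = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
vis, source_is_a, kept = mix_and_mask(a, b)
print(vis.shape)  # (2, 49, 768) with 16x16 patches and 75% masking
```

On one reading of the abstract, the per-patch source labels (`source_is_a` above) are the kind of signal an auxiliary pretext task such as homologous recognition could supervise, by asking the model which source image each visible patch came from; the exact formulation is in the paper.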
