Aug. 11, 2023, 6:51 a.m. | Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

cs.CV updates on arXiv.org

We present a novel masked image modeling (MIM) approach, the context autoencoder
(CAE), for self-supervised representation pretraining. We pretrain an encoder
by making predictions in the encoded representation space. Pretraining
comprises two tasks: masked representation prediction, which predicts the
representations of the masked patches, and masked patch reconstruction, which
reconstructs the masked patches. The network is an encoder-regressor-decoder
architecture: the encoder takes the visible patches as input; the regressor
predicts the representations of the masked patches, which are expected …
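To make the encoder-regressor-decoder flow concrete, here is a minimal sketch of that pretraining step. This is not the authors' code: all module names, dimensions, and the pixel-regression reconstruction target are illustrative assumptions (the paper's actual reconstruction target and regressor details may differ), and positional embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAESketch(nn.Module):
    """Illustrative encoder-regressor-decoder flow for CAE-style pretraining."""
    def __init__(self, patch_dim=768, dim=192, heads=4, depth=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)  # flattened patch pixels -> latent
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, depth)
        # Regressor: learned mask queries cross-attend to the visible latents
        # to predict the representations of the masked patches.
        reg = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.regressor = nn.TransformerDecoder(reg, depth)
        self.mask_query = nn.Parameter(torch.zeros(1, 1, dim))
        dec = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, depth)
        self.to_pixels = nn.Linear(dim, patch_dim)  # latents -> patch pixels

    def forward(self, visible, masked):
        # visible: (B, Nv, patch_dim); masked: (B, Nm, patch_dim)
        # A real model would add positional embeddings before the encoder.
        z_vis = self.encoder(self.embed(visible))       # encode visible patches only
        q = self.mask_query.expand(masked.size(0), masked.size(1), -1)
        z_pred = self.regressor(q, z_vis)               # masked representation prediction
        with torch.no_grad():                           # stop-gradient target latents
            z_tgt = self.encoder(self.embed(masked))
        align = F.mse_loss(z_pred, z_tgt)               # alignment in representation space
        recon = F.mse_loss(self.to_pixels(self.decoder(z_pred)), masked)  # patch reconstruction
        return align + recon

model = CAESketch()
loss = model(torch.randn(2, 147, 768), torch.randn(2, 49, 768))
loss.backward()
```

The key design point the abstract emphasizes is that prediction happens in the encoded representation space (the alignment loss above), with a separate decoder handling reconstruction, so the encoder is dedicated to representation learning rather than reconstruction.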

