Nov. 18, 2022, 2:14 a.m. | Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdo

cs.CV updates on arXiv.org

Masked image modeling (MIM) learns visual representations by masking and
reconstructing image patches. Applying reconstruction supervision on the
CLIP representation has proven effective for MIM. However, how CLIP
supervision influences MIM performance remains under-explored. To
investigate strategies for refining CLIP-targeted MIM, we study two
critical elements of MIM, i.e., the supervision position and the mask ratio,
and reveal two interesting perspectives, relying on our developed simple
pipeline, context autoencoder with CLIP target (CAE v2). …
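The two elements the abstract studies can be illustrated with a minimal, hypothetical sketch: random patch masking at a configurable mask ratio, and a reconstruction loss applied at the masked positions against a target representation. Toy random features stand in for real CLIP targets, and all names here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask: True = patch hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=bool)
    idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[idx] = True
    return mask

def mim_loss(pred, target, mask):
    """MSE reconstruction loss, averaged over the masked patches only
    (the 'supervision position' determines which predictions feed this)."""
    diff = (pred - target) ** 2
    return diff[mask].mean()

rng = np.random.default_rng(0)
num_patches, dim = 196, 64  # 14x14 patches of a 224x224 image; toy feature dim
mask = random_mask(num_patches, mask_ratio=0.75, rng=rng)
target = rng.standard_normal((num_patches, dim))  # stand-in for CLIP features
pred = rng.standard_normal((num_patches, dim))    # stand-in for decoder output
loss = mim_loss(pred, target, mask)
```

Varying `mask_ratio` and which patch predictions enter `mim_loss` corresponds to the two knobs the paper investigates.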

