Jan. 7, 2022, 2:10 a.m. | Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

cs.LG updates on arXiv.org

This paper explores a better codebook for BERT pre-training of vision
transformers. The recent work BEiT successfully transfers BERT pre-training
from NLP to the vision field. It directly adopts a simple discrete VAE as the
visual tokenizer but does not consider the semantic level of the resulting
visual tokens. By contrast, discrete tokens in the NLP field are naturally
highly semantic. This difference motivates us to learn a perceptual codebook,
and we surprisingly find one simple yet effective idea: enforcing …
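The truncated abstract hints that the tokenizer's codebook is trained with a perceptual objective rather than a purely pixel-level one. As a rough illustration only, here is a minimal PyTorch sketch of a VQ-VAE-style visual tokenizer whose training loss adds a deep-feature (perceptual) term on top of pixel reconstruction. Everything here is an assumption for illustration: the architecture, the stand-in feature extractor, and all names (`PerceptualTokenizer`, `feature_net`, `w_perc`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through estimator."""
    def __init__(self, num_codes=8192, dim=256, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                                    # z: (B, D, H, W)
        B, D, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, D)          # (B*H*W, D)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        zq = self.codebook(idx).view(B, H, W, D).permute(0, 3, 1, 2)
        # Standard VQ-VAE codebook + commitment losses.
        vq_loss = F.mse_loss(zq, z.detach()) + self.beta * F.mse_loss(z, zq.detach())
        zq = z + (zq - z).detach()                           # straight-through gradient
        return zq, idx.view(B, H, W), vq_loss

class PerceptualTokenizer(nn.Module):
    """Hypothetical discrete-VAE-style tokenizer (encoder -> codebook -> decoder)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 128, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(128, 256, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, 1, 1))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 3, 4, 2, 1))
        self.vq = VectorQuantizer()

    def forward(self, x):
        zq, tokens, vq_loss = self.vq(self.enc(x))
        return self.dec(zq), tokens, vq_loss

# Stand-in frozen feature extractor; in practice this would be a pretrained
# deep network -- an assumption here, since the abstract does not name one.
feature_net = nn.Sequential(
    nn.Conv2d(3, 64, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, 2, 1), nn.ReLU()).eval()
for p in feature_net.parameters():
    p.requires_grad_(False)

def training_loss(model, x, w_perc=1.0):
    recon, _, vq_loss = model(x)
    pixel_loss = F.mse_loss(recon, x)
    # The idea hinted at in the abstract: match deep features of the
    # reconstruction to those of the input, not just raw pixels.
    perc_loss = F.mse_loss(feature_net(recon), feature_net(x))
    return pixel_loss + w_perc * perc_loss + vq_loss

model = PerceptualTokenizer()
x = torch.randn(2, 3, 64, 64)
loss = training_loss(model, x)
loss.backward()
print(float(loss))
```

The intuition for the extra term: two images can be close in pixel space yet far apart semantically, so supervising the tokenizer with deep features pushes the discrete tokens toward the higher semantic level that NLP tokens have naturally.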

arxiv bert cv training transformers vision
