Web: http://arxiv.org/abs/2206.07699

June 16, 2022, 1:11 a.m. | Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang

cs.LG updates on arXiv.org

With the success of vision-language pre-training, we have witnessed the state of the art being pushed forward in multi-modal understanding and generation. However, the current pre-training paradigm is either incapable of targeting all modalities at once (e.g., text generation and image generation) or requires multiple well-designed tasks, which significantly limits scalability. We demonstrate that a unified modal model can be learned with a prefix language modeling objective on text and image sequences. Thanks to this simple but powerful pre-training paradigm, our proposed …
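The abstract only names the objective, but the core idea of prefix language modeling is easy to sketch: tokens in the prefix attend to each other bidirectionally, tokens in the suffix attend causally, and the training loss is next-token prediction over the suffix only. The PyTorch sketch below illustrates that masking and loss under stated assumptions; it is not the paper's implementation, and `TinyPrefixLM`, `prefix_lm_mask`, the model sizes, and the use of discretized image tokens as the suffix are all hypothetical.

```python
# A minimal sketch of a prefix-LM objective over a concatenated
# "text + image token" sequence. NOT the paper's implementation:
# tokenization, model size, and masking details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Float attention mask: 0 where attention is allowed, -inf where blocked.

    Prefix positions attend bidirectionally over the whole prefix;
    suffix positions attend causally (prefix + earlier suffix tokens).
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    allowed = (j < prefix_len) | (j <= i)
    mask = torch.zeros(seq_len, seq_len)
    return mask.masked_fill(~allowed, float("-inf"))


class TinyPrefixLM(nn.Module):
    """Toy Transformer scoring every position over one shared vocabulary
    (text tokens and, hypothetically, discretized image tokens)."""

    def __init__(self, vocab_size: int, d_model: int = 64, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, prefix_len: int) -> torch.Tensor:
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        h = self.embed(tokens) + self.pos(pos)
        mask = prefix_lm_mask(seq_len, prefix_len).to(tokens.device)
        h = self.encoder(h, mask=mask)
        return self.lm_head(h)


# Toy batch: the first 6 tokens stand in for a text prefix, the rest for
# image tokens to be generated. The loss covers the suffix only, so the
# bidirectional prefix is conditioned on, never predicted.
vocab_size = 1000
model = TinyPrefixLM(vocab_size)
tokens = torch.randint(0, vocab_size, (2, 16))
prefix_len = 6

logits = model(tokens, prefix_len)
pred = logits[:, prefix_len - 1 : -1]  # positions predicting each suffix token
target = tokens[:, prefix_len:]
loss = F.cross_entropy(pred.reshape(-1, vocab_size), target.reshape(-1))
loss.backward()
print(f"prefix-LM loss: {loss.item():.3f}")
```

Because the suffix could just as well be text conditioned on image tokens, a single objective of this shape can cover both understanding-style and generation-style tasks, which is the scalability argument the abstract makes.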

arxiv, cv, language, language models, models
