All are Worth Words: A ViT Backbone for Diffusion Models. (arXiv:2209.12152v2 [cs.CV] UPDATED)
Nov. 18, 2022, 2:12 a.m. | Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, Jun Zhu
cs.LG updates on arXiv.org
Vision transformers (ViT) have shown promise in various vision tasks while
the U-Net based on a convolutional neural network (CNN) remains dominant in
diffusion models. We design a simple and general ViT-based architecture (named
U-ViT) for image generation with diffusion models. U-ViT is characterized by
treating all inputs including the time, condition and noisy image patches as
tokens and employing long skip connections between shallow and deep layers. We
evaluate U-ViT in unconditional and class-conditional image generation, as well
as …
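
The two ideas named in the abstract, treating time, condition and noisy image patches uniformly as tokens, and adding long skip connections between shallow and deep layers, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the function names (`patchify`, `build_tokens`, `block`, `forward`), the simplified single-head attention used as a stand-in for a full transformer block, the choice to merge each skip by concatenation followed by a linear projection, and all sizes and random weights.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)              # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def build_tokens(image, t_emb, c_emb, w_patch, patch):
    """Put the time token, the condition token and the patch tokens in one sequence."""
    patch_tokens = patchify(image, patch) @ w_patch        # (N, D)
    return np.concatenate([t_emb[None], c_emb[None], patch_tokens], axis=0)

def block(x, w):
    """Simplified stand-in for a transformer block (assumption, not the paper's block)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return x + attn @ x @ w                                 # residual update

def forward(tokens, block_ws, skip_ws):
    """Run k shallow blocks, one middle block and k deep blocks with long skips."""
    k = len(skip_ws)
    saved = []
    x = tokens
    for w in block_ws[:k]:                       # shallow half: remember outputs
        x = block(x, w)
        saved.append(x)
    x = block(x, block_ws[k])                    # middle block
    for w, w_skip in zip(block_ws[k + 1:], skip_ws):
        skip = saved.pop()                       # pair the deepest shallow output first
        # Assumed merge rule: concatenate along features, then project back to D.
        x = np.concatenate([x, skip], axis=-1) @ w_skip
        x = block(x, w)
    return x

# Toy sizes, chosen only for the example.
rng = np.random.default_rng(0)
D, patch = 16, 4
image = rng.normal(size=(8, 8, 3))               # stands in for a noisy image
t_emb = rng.normal(size=D)                       # stands in for a time embedding
c_emb = rng.normal(size=D)                       # stands in for a class embedding
w_patch = rng.normal(size=(patch * patch * 3, D)) * 0.1
block_ws = [rng.normal(size=(D, D)) * 0.1 for _ in range(5)]
skip_ws = [rng.normal(size=(2 * D, D)) * 0.1 for _ in range(2)]

tokens = build_tokens(image, t_emb, c_emb, w_patch, patch)
out = forward(tokens, block_ws, skip_ws)
print(tokens.shape, out.shape)
```

With an 8x8 image and 4x4 patches there are four patch tokens, so the sequence has six tokens once the time and condition tokens are prepended, and the output keeps that shape. Popping from the `saved` list pairs the last shallow block with the first deep block, mirroring the symmetric skips of a U-Net.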