May 24, 2024, 4:50 a.m. | Maciej Kilian, Varun Jampani, Luke Zettlemoyer

cs.CV updates on arXiv.org

arXiv:2405.13218v1 Announce Type: new
Abstract: Nearly every recent image synthesis approach, including diffusion, masked-token prediction, and next-token prediction, uses a Transformer network architecture. Despite this common backbone, there has been no direct, compute controlled comparison of how these approaches affect performance and efficiency. We analyze the scalability of each approach through the lens of compute budget measured in FLOPs. We find that token prediction methods, led by next-token prediction, significantly outperform diffusion on prompt following. On image quality, while next-token …
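The abstract's core methodology is a compute-controlled comparison: each approach is evaluated at matched FLOP budgets rather than matched parameter counts or training steps. A minimal sketch of that framing (not the authors' code), using the common 6·N·D approximation for Transformer training FLOPs, where N is parameter count and D is tokens seen; the budget and model sizes below are hypothetical:

```python
# Minimal sketch (not the paper's code): matching approaches on a fixed
# compute budget measured in FLOPs, using the ~6 * params * tokens
# approximation for Transformer training cost.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate Transformer training FLOPs: ~6 * N * D."""
    return 6.0 * n_params * n_tokens

def tokens_for_budget(budget_flops: float, n_params: float) -> float:
    """Tokens a model of a given size can train on within a FLOP budget."""
    return budget_flops / (6.0 * n_params)

if __name__ == "__main__":
    budget = 1e21  # hypothetical compute budget in FLOPs
    # At a fixed budget, larger models see proportionally fewer tokens;
    # a compute-controlled study sweeps this tradeoff per approach.
    for n_params in (1e8, 1e9, 1e10):  # hypothetical model sizes
        print(f"{n_params:.0e} params -> "
              f"{tokens_for_budget(budget, n_params):.2e} tokens")
```

Under this accounting, two methods are comparable only when their (params, tokens) pairs land on the same FLOP budget, which is the control the abstract says prior comparisons lacked.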

