May 24, 2024, 4:50 a.m. | Maciej Kilian, Varun Japan, Luke Zettlemoyer

cs.CV updates on arXiv.org arxiv.org

arXiv:2405.13218v1 Announce Type: new
Abstract: Nearly every recent image synthesis approach, including diffusion, masked-token prediction, and next-token prediction, uses a Transformer network architecture. Despite this common backbone, there has been no direct, compute controlled comparison of how these approaches affect performance and efficiency. We analyze the scalability of each approach through the lens of compute budget measured in FLOPs. We find that token prediction methods, led by next-token prediction, significantly outperform diffusion on prompt following. On image quality, while next-token …

abstract analyze architecture arxiv comparison computational compute cs.cv diffusion efficiency every image network network architecture next performance prediction scalability synthesis token transformer transformer network type

Senior Data Engineer

@ Displate | Warsaw

Junior Data Analyst - ESG Data

@ Institutional Shareholder Services | Mumbai

Intern Data Driven Development in Sensor Fusion for Autonomous Driving (f/m/x)

@ BMW Group | Munich, DE

Senior MLOps Engineer, Machine Learning Platform

@ GetYourGuide | Berlin

Data Engineer, Analytics

@ Meta | Menlo Park, CA

Data Engineer

@ Meta | Menlo Park, CA