Computational Tradeoffs in Image Synthesis: Diffusion, Masked-Token, and Next-Token Prediction | allainews.com

May 24, 2024, 4:50 a.m. | Maciej Kilian, Varun Jampani, Luke Zettlemoyer

cs.CV updates on arXiv.org arxiv.org

arXiv:2405.13218v1
Abstract: Nearly every recent image synthesis approach, including diffusion, masked-token prediction, and next-token prediction, uses a Transformer network architecture. Despite this common backbone, there has been no direct, compute-controlled comparison of how these approaches affect performance and efficiency. We analyze the scalability of each approach through the lens of compute budget measured in FLOPs. We find that token prediction methods, led by next-token prediction, significantly outperform diffusion on prompt following. On image quality, while next-token …
