April 2, 2024, 7:48 p.m. | Shikai Li, Jianglin Fu, Kaiyuan Liu, Wentao Wang, Kwan-Yee Lin, Wayne Wu

cs.CV updates on arXiv.org arxiv.org

arXiv:2404.01294v1 Announce Type: new
Abstract: We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images. Unlike current general-purpose foundation models that are stuck in the dilemma of inferior quality and text-image misalignment for humans, CosmicMan enables generating photo-realistic human images with meticulous appearance, reasonable structure, and precise text-image alignment with detailed dense descriptions. At the heart of CosmicMan's success are the new reflections and perspectives on data and models: (1) We found that data quality and a …

abstract alignment arxiv cs.cv current fidelity foundation foundation model general human human images humans image images photo quality text text-image text-to-image type

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Software Engineer, Machine Learning (Tel Aviv)

@ Meta | Tel Aviv, Israel

Senior Data Scientist- Digital Government

@ Oracle | CASABLANCA, Morocco