PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

March 8, 2024, 5:45 a.m. | Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

cs.CV updates on arXiv.org

arXiv:2403.04692v1 Announce Type: new
Abstract: In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. PixArt-Σ represents a significant advancement over its predecessor, PixArt-α, offering images of markedly higher fidelity and improved alignment with text prompts. A key feature of PixArt-Σ is its training efficiency. Leveraging the foundational pre-training of PixArt-α, it evolves from the 'weaker' baseline to a 'stronger' model via incorporating higher quality data, a process we term "weak-to-strong training". …
