April 14, 2024, 3:07 a.m. | /u/Jazzlike-Common-8978

Deep Learning www.reddit.com

Hi guys, I want to ask about the conditioning mechanism in DiT (Diffusion Transformer). It uses AdaLN, which applies scale-and-shift operators, and the authors report that it works better than cross-attention. However, I think AdaLN should be worse than cross-attention, because it only allows the conditioning information to be a single vector, which limits how much information it can carry.

Am I correct?
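To make clear what I mean, here's a minimal sketch of AdaLN conditioning (my own simplified PyTorch version, not the paper's code; the actual DiT block regresses separate shift/scale/gate parameters for the attention and MLP sublayers, i.e. 6×dim, and zero-initializes the final linear layer in the adaLN-Zero variant):

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """Simplified adaLN(-Zero)-style modulation: a single conditioning
    vector c (e.g. timestep + class embedding) is regressed into
    per-sample shift, scale, and gate vectors."""
    def __init__(self, dim: int):
        super().__init__()
        # LayerNorm without learned affine params; they come from c instead.
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # One MLP maps c -> (shift, scale, gate). adaLN-Zero would
        # zero-initialize the Linear so each block starts as identity.
        self.to_mod = nn.Sequential(nn.SiLU(), nn.Linear(dim, 3 * dim))

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); c: (batch, dim)
        shift, scale, gate = self.to_mod(c).chunk(3, dim=-1)
        # Broadcast the per-sample vectors over the token axis:
        # every token gets the same modulation.
        h = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return x + gate.unsqueeze(1) * h  # gated residual

x = torch.randn(2, 16, 128)   # 16 tokens of width 128
c = torch.randn(2, 128)       # one conditioning vector per sample
out = AdaLNBlock(128)(x, c)
```

The point I'm getting at: `c` is a single `(batch, dim)` vector broadcast over all tokens, whereas cross-attention lets each token attend over a whole sequence of conditioning tokens.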

[attached image: https://preview.redd.it/ljyh2xop1duc1.png?width=1181&format=png&auto=webp&s=78ba16dfd3a44e894a3714719cac9e9cd3d732a8]

