April 14, 2024, 3:07 a.m. | /u/Jazzlike-Common-8978

Deep Learning www.reddit.com

Hi guys, I want to ask about the conditioning mechanism in DiT (Diffusion Transformer). It uses AdaLN, which applies scale & shift operations, and the authors report that it works better than cross-attention. However, I think AdaLN must be worse than cross-attention, because it restricts the conditioning information to a single vector, which limits how much information it can carry.

Am I correct?

https://preview.redd.it/ljyh2xop1duc1.png?width=1181&format=png&auto=webp&s=78ba16dfd3a44e894a3714719cac9e9cd3d732a8
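For reference, here is a minimal sketch of the adaLN-Zero modulation in a DiT-style block in PyTorch (not the authors' code; module names like `ada_mlp` are made up for illustration). It shows the point behind the question: the condition `c` is a single vector per sample, projected to per-channel scale/shift/gate parameters that are applied uniformly to every token, whereas cross-attention would let the tokens attend over a whole sequence of conditioning embeddings.

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """Sketch of a DiT-style adaLN(-Zero) transformer block (illustrative only)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        # LayerNorms have no learned affine params; modulation comes from c instead.
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Single MLP maps the condition vector to all six modulation vectors
        # (shift/scale/gate for the attention branch and for the MLP branch).
        self.ada_mlp = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))
        # adaLN-Zero: zero-init so each residual branch starts as the identity.
        nn.init.zeros_(self.ada_mlp[-1].weight)
        nn.init.zeros_(self.ada_mlp[-1].bias)

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) image tokens; c: (B, dim) condition, e.g. timestep + class.
        shift1, scale1, gate1, shift2, scale2, gate2 = self.ada_mlp(c).chunk(6, dim=-1)
        # The same scale/shift is broadcast to every token: the condition only
        # enters the block as one dim-sized vector per sample.
        h = self.norm1(x) * (1 + scale1.unsqueeze(1)) + shift1.unsqueeze(1)
        x = x + gate1.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + scale2.unsqueeze(1)) + shift2.unsqueeze(1)
        x = x + gate2.unsqueeze(1) * self.mlp(h)
        return x
```

Note that in the DiT paper's class-conditional setting the condition really is just a timestep plus a class label, so it is naturally a single vector; the capacity concern raised above seems most relevant for sequence-valued conditions such as text embeddings.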

