Feb. 12, 2024, 5:42 a.m. | Yuhang Liu, Zhen Zhang, Dong Gong, Biwei Huang, Mingming Gong, Anton van den Hengel, Kun Zhang, Javen

cs.LG updates on arXiv.org

Multimodal contrastive representation learning methods have proven successful across a range of domains, partly due to their ability to generate meaningful shared representations of complex phenomena. To analyze and understand these learned representations in greater depth, we introduce a unified causal model designed specifically for multimodal data. By examining this model, we show that multimodal contrastive representation learning excels at identifying latent coupled variables within the proposed unified model, up to linear or permutation transformations resulting from different …
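
For readers unfamiliar with the objective, the sketch below shows one common form of multimodal contrastive loss: a symmetric InfoNCE loss of the kind used in CLIP-style training. This is an assumption about the general family of methods the abstract refers to, not the specific formulation analyzed in the paper. The function name `symmetric_info_nce`, the helper names, the toy embeddings, and the `temperature` default are all hypothetical and chosen only for illustration.

```python
import math

def _normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _row_cross_entropy(logits):
    # Mean over rows of -log softmax(row)[i]; the matching pair sits on the diagonal.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(za, zb, temperature=0.1):
    # za[i] and zb[i] are embeddings of the same sample in two modalities.
    # temperature is an illustrative default, not a value taken from the paper.
    za = [_normalize(v) for v in za]
    zb = [_normalize(v) for v in zb]
    logits = [[sum(a * b for a, b in zip(u, v)) / temperature for v in zb]
              for u in za]
    transposed = [list(col) for col in zip(*logits)]
    # Average the two directions: modality A to B, and modality B to A.
    return 0.5 * (_row_cross_entropy(logits) + _row_cross_entropy(transposed))

# Toy embeddings (made up for this example): three paired samples.
image_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text_emb = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

aligned = symmetric_info_nce(image_emb, text_emb)
mismatched = symmetric_info_nce(image_emb, text_emb[::-1])
print(aligned < mismatched)
```

The loss is lower when each embedding is most similar to its own partner in the other modality, which is what pulls the two modalities toward a shared representation.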


