May 8, 2023, 12:47 a.m. | Xu Yang, Jiawei Peng, Zihua Wang, Haiyang Xu, Qinghao Ye, Chenliang Li, Ming Yan, Fei Huang, Zhangzikang Li, Yu Zhang

cs.CV updates on arXiv.org arxiv.org

We propose to Transform Scene Graphs (TSG) into more descriptive captions. In
TSG, we apply multi-head attention (MHA) to design the Graph Neural Network
(GNN) that embeds scene graphs. After embedding, the different graph embeddings
carry distinct knowledge for generating words with different parts of speech;
e.g., object and attribute embeddings are good for generating nouns and
adjectives, respectively. Motivated by this, we design a Mixture-of-Experts
(MoE)-based decoder, in which each expert is built on MHA, to discriminate
among the graph embeddings when generating different kinds of words. …
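
The decoder idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one expert per embedding type (only the object and attribute types named above), omits all learned projection matrices, and takes the gate logits as a plain input. The names `attend`, `multi_head_attention`, `moe_decode_step`, and `gate_logits` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def multi_head_attention(query, nodes, num_heads):
    """MHA without learned projections (an assumption of this sketch):
    each head attends over its own slice of the feature dimension."""
    dim = len(query)
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = dim // num_heads
    out = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        node_slices = [n[lo:hi] for n in nodes]
        out.extend(attend(query[lo:hi], node_slices, node_slices))
    return out


def moe_decode_step(query, graph_embeddings, gate_logits, num_heads=2):
    """One decoding step: every expert runs MHA over one embedding type,
    and a soft gate mixes the expert outputs.

    graph_embeddings: dict mapping a type name to a list of node vectors.
    gate_logits: dict mapping the same names to unnormalized gate scores
                 (hypothetical input; a real model would predict these).
    """
    names = sorted(graph_embeddings)
    gate = softmax([gate_logits[n] for n in names])
    expert_outputs = [
        multi_head_attention(query, graph_embeddings[n], num_heads) for n in names
    ]
    mixed = [
        sum(g * out[i] for g, out in zip(gate, expert_outputs))
        for i in range(len(query))
    ]
    return mixed, dict(zip(names, gate))


if __name__ == "__main__":
    embeddings = {
        "object": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
        "attribute": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    }
    # A gate that favors the object expert, as one might expect for a noun.
    context, gate = moe_decode_step(
        [1.0, 0.0, 0.0, 0.0], embeddings, {"object": 2.0, "attribute": 0.0}
    )
    print(gate)
    print(context)
```

With a gate that leans toward the object expert, the mixed context vector is dominated by the object embeddings, which mirrors the intuition that object embeddings suit noun generation.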

