Web: http://arxiv.org/abs/2112.06482

Sept. 21, 2022, 1:14 a.m. | Xinyu Wang, Min Gui, Yong Jiang, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

cs.CL updates on arXiv.org

Recently, Multi-modal Named Entity Recognition (MNER) has attracted considerable
attention. Most existing work uses image information through region-level
visual representations obtained from a pretrained object detector and relies on
an attention mechanism to model the interactions between image and text
representations. However, such interactions are difficult to model, because image
and text representations are trained separately on data from their respective
modalities and are not aligned in the same space. As text representations take
the …
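To make the attention-based interaction concrete, here is a minimal sketch of generic cross-modal attention in which each text token (query) attends over region-level image features (keys and values). This is not the authors' method. The names `cross_attention` and `softmax` are hypothetical, and the sketch assumes both modalities already share one vector dimension. It also omits the learned query/key/value projections that real models use. The dot-product scores are only meaningful if the two spaces are aligned, which is the difficulty the abstract points out.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_tokens: List[Vector], regions: List[Vector]) -> List[Vector]:
    """Hypothetical sketch: fuse image-region features into each text token.

    Each text token acts as a query; region vectors serve as both keys and
    values (no learned projections). Assumes all vectors share dimension d,
    i.e. that text and image representations live in one aligned space.
    """
    d = len(regions[0])
    scale = math.sqrt(d)
    outputs = []
    for q in text_tokens:
        if len(q) != d:
            raise ValueError("text and region vectors must share a dimension")
        # Scaled dot-product score between this token and every region.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in regions]
        weights = softmax(scores)
        # Attention-weighted sum of region vectors, one coordinate at a time.
        fused = [sum(w * v[j] for w, v in zip(weights, regions)) for j in range(d)]
        outputs.append(fused)
    return outputs


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]      # toy token vectors
    image = [[1.0, 0.0], [0.0, 1.0]]     # toy region vectors
    for row in cross_attention(text, image):
        print([round(x, 3) for x in row])
```

Each token puts the most weight on the region whose vector points in the same direction. If the two spaces were not aligned, the same computation would still run, but the weights would no longer reflect any real text-image correspondence.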


