Web: http://arxiv.org/abs/2205.14458

Sept. 22, 2022, 1:14 a.m. | Longzhen Yang, Yihang Liu, Yitao Peng, Lianghua He

cs.CV updates on arXiv.org

Accuracy and diversity are two essential, measurable qualities in
generating natural and semantically correct captions. Many efforts have been
made to enhance one of them, but at the expense of the other because of the
trade-off between the two. In this work, we show that the inferior standard of
accuracy drawn from human annotations (leave-one-out) is not appropriate for
machine-generated captions. To improve diversity while maintaining solid
accuracy, we exploit a novel
Variational Transformer framework. By introducing the "Invisible Information
Prior" and the "Auto-selectable …


