all AI news
Variational Transformer: A Framework Beyond the Trade-off between Accuracy and Diversity for Image Captioning. (arXiv:2205.14458v2 [cs.CV] UPDATED)
cs.CV updates on arXiv.org arxiv.org
Accuracy and Diversity are two essential metrizable manifestations in
generating natural and semantically correct captions. Many efforts have been
made to enhance one of them with another decayed due to the trade-off gap. In
this work, we will show that the inferior standard of accuracy draws from human
annotations (leave-one-out) are not appropriate for machine-generated captions.
To improve diversity with a solid accuracy performance, we exploited a novel
Variational Transformer framework. By introducing the "Invisible Information
Prior" and the "Auto-selectable …
accuracy arxiv captioning diversity framework image trade transformer