Web: http://arxiv.org/abs/2205.04948

May 11, 2022, 1:10 a.m. | Jing Yang, Junwen Chen, Keiji Yanai

cs.CV updates on arXiv.org

In this paper, we present a cross-modal recipe retrieval framework,
Transformer-based Network for Large Batch Training (TNLBT), which is inspired
by ACME (Adversarial Cross-Modal Embedding) and H-T (Hierarchical Transformer).
TNLBT aims to accomplish retrieval tasks while generating images from recipe
embeddings. We apply the Hierarchical Transformer-based recipe text encoder,
the Vision Transformer (ViT)-based recipe image encoder, and an adversarial
network architecture to enable better cross-modal embedding learning for recipe
texts and images. In addition, we use self-supervised learning to exploit the
rich information …
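
To make the retrieval step concrete, here is a minimal sketch of how cross-modal retrieval over a shared embedding space is commonly done: each recipe embedding is compared against every image embedding, and images are ranked by similarity. This is an illustration only, not the TNLBT implementation. The cosine-similarity scoring, the function names `cosine` and `rank_images`, and the toy vectors are all assumptions; the abstract does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_images(recipe_emb, image_embs):
    """Return image indices ordered from most to least similar to the recipe."""
    scores = [cosine(recipe_emb, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)

# Toy, hand-made embeddings (hypothetical values, not taken from the paper).
# In a real system these would come from the text and image encoders.
recipe_embs = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
image_embs = [[0.9, 0.1, 0.3], [0.1, 0.8, 0.0], [0.5, 0.5, 0.5]]

# One ranking of all images per recipe query.
rankings = [rank_images(r, image_embs) for r in recipe_embs]
print(rankings)
```

With these toy vectors, the first recipe is closest to image 0 and the second to image 1. Retrieval quality would then be summarized by where the true match lands in each ranking.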

