Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training. (arXiv:2205.04948v1 [cs.CV])
Web: http://arxiv.org/abs/2205.04948
May 11, 2022, 1:10 a.m. | Jing Yang, Junwen Chen, Keiji Yanai
cs.CV updates on arXiv.org
In this paper, we present a cross-modal recipe retrieval framework,
Transformer-based Network for Large Batch Training (TNLBT), which is inspired
by ACME (Adversarial Cross-Modal Embedding) and H-T (Hierarchical Transformer).
TNLBT aims to accomplish retrieval tasks while generating images from recipe
embeddings. We apply the Hierarchical Transformer-based recipe text encoder,
the Vision Transformer (ViT)-based recipe image encoder, and an adversarial
network architecture to enable better cross-modal embedding learning for recipe
texts and images. In addition, we use self-supervised learning to exploit the
rich information …
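The full architecture and training objectives are described in the paper; as a rough illustration of the general idea (a transformer text encoder and a ViT-style image encoder mapped into a shared embedding space with a batch-wide contrastive objective that benefits from large batches), here is a minimal PyTorch sketch. The module sizes, the CLIP-style symmetric loss, and all class and function names are illustrative assumptions, not the authors' TNLBT implementation.

# Minimal sketch of cross-modal embedding learning with a large-batch
# contrastive (InfoNCE-style) loss. Sizes, loss choice, and names are
# illustrative assumptions, not the exact TNLBT configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Stand-in for the Hierarchical Transformer recipe-text encoder."""
    def __init__(self, vocab_size=30000, dim=512, depth=4, heads=8, max_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, tokens):                       # tokens: (B, L) int64
        x = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        x = self.encoder(x)
        return x.mean(dim=1)                         # (B, dim) pooled embedding

class ImageEncoder(nn.Module):
    """Stand-in for the ViT recipe-image encoder (patch embedding + transformer)."""
    def __init__(self, dim=512, depth=4, heads=8, patch=16, img=224):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, images):                       # images: (B, 3, H, W)
        x = self.patchify(images).flatten(2).transpose(1, 2)  # (B, N, dim)
        x = self.encoder(x + self.pos)
        return x.mean(dim=1)                         # (B, dim)

def contrastive_loss(txt_emb, img_emb, temperature=0.07):
    """Symmetric cross-modal contrastive loss; large batches help because
    every other pair in the batch acts as a negative example."""
    txt = F.normalize(txt_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    logits = txt @ img.t() / temperature             # (B, B) similarity matrix
    targets = torch.arange(txt.size(0), device=txt.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy forward/backward pass with random data.
text_enc, image_enc = TextEncoder(), ImageEncoder()
tokens = torch.randint(0, 30000, (8, 128))
images = torch.randn(8, 3, 224, 224)
loss = contrastive_loss(text_enc(tokens), image_enc(images))
loss.backward()

TNLBT additionally uses an adversarial network and image generation from recipe embeddings, which this sketch omits for brevity.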
Latest AI/ML/Big Data Jobs
Director, Applied Mathematics & Computational Research Division
@ Lawrence Berkeley National Lab | Berkeley, CA
Business Data Analyst
@ MainStreet Family Care | Birmingham, AL
Assistant/Associate Professor of the Practice in Business Analytics
@ Georgetown University McDonough School of Business | Washington DC
Senior Data Science Writer
@ NannyML | Remote
Director of AI/ML Engineering
@ Armis Industries | Remote (US only), St. Louis, California
Digital Analytics Manager
@ Patagonia | Ventura, California