July 21, 2022, 1:12 a.m. | Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

cs.CV updates on arXiv.org

Current state-of-the-art methods for image captioning employ region-based
features, which provide the object-level information essential for
describing image content and are usually extracted by an object detector
such as Faster R-CNN. However, region-based features have several issues:
a lack of contextual information, the risk of inaccurate detection, and
high computational cost. The first two could be resolved by additionally
using grid-based features. However, how to extract and fuse these two types
of features is uncharted. …

