March 22, 2024, 4:46 a.m. | Oliver Limoyo, Jimmy Li, Dmitriy Rivkin, Jonathan Kelly, Gregory Dudek

cs.CV updates on arXiv.org arxiv.org

arXiv:2401.11061v2 Announce Type: replace
Abstract: We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based …

abstract acquisition arxiv automated cs.ai cs.cv cs.ro framework guidance human images interactive language language model natural natural language photo photographer photography reference robot suggestions type via visual visual language model

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Principal Machine Learning Engineer (AI, NLP, LLM, Generative AI)

@ Palo Alto Networks | Santa Clara, CA, United States

Consultant Senior Data Engineer F/H

@ Devoteam | Nantes, France