Feb. 9, 2024, 5:47 a.m. | Xiaowen Sun, Jiazhan Feng, Yuxuan Wang, Yuxuan Lai, Xingyu Shen, Dongyan Zhao

cs.CV updates on arXiv.org

A picture is worth a thousand words, so it is crucial for conversational agents to understand, perceive, and effectively respond with pictures. However, we find that directly employing conventional image generation techniques is inadequate for conversational agents to produce image responses effectively. In this paper, we focus on the innovative dialog-to-image generation task, where the model synthesizes a high-resolution image aligned with the given dialog context as a response. To tackle this problem, we design a tailored fine-tuning approach on …
