Sept. 21, 2022, 1:13 a.m. | Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, Dacheng Tao

Referring Image Segmentation (RIS) is a fundamental vision-language task that
aims to connect image and language by outputting the object mask corresponding
to a given text description. Although many works have made considerable
progress on RIS, in this work we explore an essential question: "what if the
text description is wrong or misleading?" We term such a sentence a negative
sentence, and we find that existing works cannot handle such settings. To this
end, we …

