Hear to Segment: Unmixing the Audio to Guide the Semantic Segmentation. (arXiv:2305.07223v1 [cs.SD])
cs.CV updates on arXiv.org
In this paper, we focus on Audio-Visual Segmentation (AVS), a recently proposed
task that requires establishing a fine-grained correspondence between the audio
stream and image pixels. However, learning such a correspondence faces two key
challenges: (1) audio signals inherently exhibit a high degree of information
density, as sounds produced by multiple objects are entangled within the same
audio stream; (2) audio signals from objects of the same category tend to have
similar frequencies, which hampers …