A Closer Look at Audio-Visual Semantic Segmentation. (arXiv:2304.02970v3 [cs.CV] UPDATED)
cs.CV updates on arXiv.org
Audio-visual segmentation (AVS) is a complex task that involves accurately
segmenting the corresponding sounding object based on audio-visual queries.
Successful audio-visual learning requires two essential components: 1) an
unbiased dataset with high-quality pixel-level multi-class labels, and 2) a
model capable of effectively linking audio information with its corresponding
visual object. However, these two requirements are only partially addressed by
current methods, with training sets containing biased audio-visual data, and
models that generalise poorly beyond this biased training set. In this …