I see what you hear: a vision-inspired method to localize words. (arXiv:2210.13567v1 [cs.CV])

Oct. 26, 2022, 1:14 a.m. | Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Minsik Cho, Aman Chadha, Ashish Shrivastava, Oncel Tuzel, Devang Naik

cs.CV updates on arXiv.org

This paper explores the possibility of using visual object detection
techniques for word localization in speech data. Object detection has been
thoroughly studied in the contemporary literature for visual data. Noting that
an audio signal can be interpreted as a 1-dimensional image, object localization
techniques can be fundamentally useful for word localization. Building upon
this idea, we propose a lightweight solution for word detection and
localization. We use bounding box regression for word localization, which
enables our model to detect the …
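
To make the "audio as a 1-dimensional image" analogy concrete, here is a minimal sketch of how two standard object-detection tools, intersection-over-union and greedy non-maximum suppression, reduce to operations on time intervals. It is an illustration only and not the paper's implementation: the (start, end) segment format in seconds, the function names `iou_1d` and `nms_1d`, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a 1-D "bounding box" is a (start_sec, end_sec) interval.
Segment = Tuple[float, float]


def iou_1d(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(
    segments: Sequence[Segment],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Greedy non-maximum suppression over intervals.

    Returns the indices of the kept segments, highest score first.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a candidate only if it does not overlap heavily with a kept one.
        if all(iou_1d(segments[i], segments[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two overlapping candidates for one word, plus a separate later candidate.
    candidates = [(0.0, 1.0), (0.1, 1.1), (2.0, 3.0)]
    confidences = [0.9, 0.8, 0.7]
    print(nms_1d(candidates, confidences))
```

In this sketch the second candidate overlaps the first heavily and is suppressed, while the third, which does not overlap either, is kept. A real detector would additionally regress the interval endpoints from audio features, which is outside the scope of this example.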


