F-VLM: Open-vocabulary object detection upon frozen vision and language models
Google AI Blog ai.googleblog.com
Object detection is a fundamental vision task that aims to localize and recognize objects in an image. However, manually annotating bounding boxes or instance masks is tedious and costly, which limits the vocabulary of modern detectors to roughly 1,000 object classes. This is orders of magnitude smaller than the vocabulary people use to describe the visual world, and it leaves out many categories. Recent vision and …