June 1, 2023, 9:18 a.m. | /u/kazhdan_d

Computer Vision www.reddit.com

Hey folks,

As I'm sure you know, there's a lot of buzz around zero-shot foundation models these days (e.g. SAM, OWL-ViT, ImageBind), but not much info on how they should be compared against "classic" in-house models (e.g. YOLOv8 or YOLO-NAS).

Any thoughts on best practices for comparing and evaluating these models, and for deciding whether to integrate them into your CV pipeline?

P.S. Below is an example of an amusing edge case we found in a zero-shot object detection model from Hugging Face, where it is …

