Dec. 21, 2023, 1:34 a.m. | /u/fpgaminer

Machine Learning www.reddit.com

I'm a hobbyist ML researcher and finally, after a year of work, built a state of the art machine vision model from scratch. It's ViT-B/16 based, 448x448x3 input, 91M parameters, trained for 660M samples, with multi-label classification as the target task, on over 5000 unique tags.

All the big foundation vision models today were trained on heavily filtered datasets, greatly limiting the concepts they can represent, in line with arbitrary sets of rules for what is deemed "wholesome" by leading …

art classification clip image machine machinelearning machine vision parameters researcher sota state state of the art tagging tags vision vit work

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Research Scientist, Demography and Survey Science, University Grad

@ Meta | Menlo Park, CA | New York City

Computer Vision Engineer, XR

@ Meta | Burlingame, CA