Dec. 14, 2023, 4:13 p.m. | /u/FallMindless3563 | r/MachineLearning (www.reddit.com)

Hey all, I ran some experiments benchmarking fine-tuning of ViT, ResNet50, and CLIP on a Facial Emotion Recognition dataset. I'd read the original papers over the past few weeks, but wanted to get some practical, hands-on experience with the models themselves.

[https://blog.oxen.ai/practical-ml-dive-how-to-customize-a-vision-transformer-on-your-own-data/](https://blog.oxen.ai/practical-ml-dive-how-to-customize-a-vision-transformer-on-your-own-data/)

~ TLDR ~ ViT performed best in this small experiment, with minimal code. The task was classifying 7 different facial emotions such as "happy", "sad", and "angry"; rough sketches of the fine-tuned and zero-shot setups follow the results table below.

|Model|Accuracy|
|:-|:-|
|ViT (fine-tuned)|69%|
|ResNet50 (fine-tuned)|64%|
|Zero-Shot CLIP|53%|
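
The blog has the full walkthrough; as a rough illustration of what "minimal code" looks like for the ViT row, here's a sketch of a fine-tuning setup with HuggingFace transformers. The checkpoint, label order, dataset path, and hyperparameters are my placeholders, not necessarily what the blog used:

```python
# Minimal sketch of the ViT fine-tuning setup, assuming HuggingFace
# transformers/datasets. Checkpoint, label order, dataset layout, and
# hyperparameters are placeholders, not necessarily what the blog used.
import torch
from datasets import load_dataset
from transformers import (
    AutoImageProcessor,
    Trainer,
    TrainingArguments,
    ViTForImageClassification,
)

# Assumed label set: the standard 7 FER classes.
labels = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

checkpoint = "google/vit-base-patch16-224-in21k"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = ViTForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
)

# Placeholder path; expects train/ and validation/ subfolders of images.
ds = load_dataset("imagefolder", data_dir="emotions/")

def transform(batch):
    # Resize + normalize PIL images into pixel_values tensors.
    inputs = processor([img.convert("RGB") for img in batch["image"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

ds = ds.with_transform(transform)

def collate(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

args = TrainingArguments(
    output_dir="vit-emotion",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=2e-4,
    evaluation_strategy="epoch",
    remove_unused_columns=False,  # keep the raw "image" column for with_transform
)

Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    data_collator=collate,
).train()
```

The ResNet50 setup is the same idea with a different backbone, which is what makes the head-to-head comparison so cheap to run.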

Was honestly most …
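
For the Zero-Shot CLIP row, no training is involved at all: each image is scored against one text prompt per emotion and the highest-similarity prompt wins. A minimal sketch, assuming the openai/clip-vit-base-patch32 checkpoint and a generic prompt template (both are assumptions, the post doesn't specify):

```python
# Minimal sketch of zero-shot CLIP classification. The checkpoint,
# prompt template, and image path are assumptions, not from the post.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

labels = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
prompts = [f"a photo of a person with a {name} expression" for name in labels]

checkpoint = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(checkpoint)
processor = CLIPProcessor.from_pretrained(checkpoint)

image = Image.open("face.jpg")  # placeholder image path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

# logits_per_image holds one image-text similarity score per prompt.
logits = model(**inputs).logits_per_image
print(labels[logits.argmax(-1).item()])
```

Prompt wording matters a lot for zero-shot CLIP, which may partly explain the gap to the fine-tuned models.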

