Jan. 7, 2024, 10:49 a.m. | /u/randomnes-random

Computer Vision www.reddit.com

# Task:

**Fine-tune the SAM model on a custom dataset to segment objects without prompts (during both training and inference)**

# Approach:



https://preview.redd.it/8h0xgyk500bc1.png?width=1333&format=png&auto=webp&s=839212aaf8ab209ea0e4eadebaec9d3467c4df4c



>Note: This post was created from my Kaggle notebook -- [https://www.kaggle.com/code/yogendrayatnalkar/promptless-taskspecific-finetuning-of-metaai-sam](https://www.kaggle.com/code/yogendrayatnalkar/promptless-taskspecific-finetuning-of-metaai-sam)

## How does SAM work (high-level):

* SAM Encoder --> **ViT + Neck-Module** (consisting of 2 Conv2D layers that reduce the number of channels in the ViT output)
* The Encoder ViT has a patch-size of **16x16**.
* Input: **1024x1024x3**
* With the above patch-size and input-image-size, the …
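The last bullet is cut off in the source, so the following is only the arithmetic implied by the two numbers stated above (patch size 16 and input size 1024), not a reconstruction of the author's text. The helper name `patch_grid` is hypothetical and is not part of SAM or the notebook. A minimal sketch:

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square image.

    Assumes non-overlapping square patches, as in a standard ViT patch embedding.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Values taken from the bullets above: 1024x1024 input, 16x16 patches.
per_side, total = patch_grid(image_size=1024, patch_size=16)
print(per_side, total)  # 64 4096
```

So a 1024x1024 input split into 16x16 patches gives a 64x64 grid, or 4096 patch tokens.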
