July 25, 2022, 7:51 a.m. | /u/MultiheadAttention

Natural Language Processing www.reddit.com

My task is to fine-tune Bi-Encoder SBERT ([https://www.sbert.net/](https://www.sbert.net/)) on custom data.

My custom data consists of \~100K short sentence pairs with similar meaning.

I create \~500K negative pairs by negative sampling.

For Bi-Encoder SBERT fine-tuning I need a score for each pair. The only scores I have are 0 (for all negative samples) and 1 (for all positive samples).
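
To make the setup concrete, here is a rough sketch of how such a labeled set could be built. The post does not say how the negatives are sampled, so the strategy here (pairing each first sentence with the second sentence of a randomly chosen other pair), the default of 5 negatives per positive (inferred from \~500K / \~100K), and the name `build_labeled_pairs` are all assumptions for illustration.

```python
import random


def build_labeled_pairs(positive_pairs, negatives_per_positive=5, seed=0):
    """Return (sentence_a, sentence_b, score) triples with binary scores.

    Hypothetical helper: positives get score 1.0, randomly sampled
    negatives get score 0.0. The sampling strategy is an assumption.
    """
    rng = random.Random(seed)
    known_positives = set(positive_pairs)

    # Every known paraphrase pair is a positive example.
    labeled = [(a, b, 1.0) for a, b in positive_pairs]

    # Bound the number of draws so tiny datasets cannot loop forever.
    max_attempts = 20 * negatives_per_positive

    for i, (anchor, _) in enumerate(positive_pairs):
        seen = set()
        attempts = 0
        while len(seen) < negatives_per_positive and attempts < max_attempts:
            attempts += 1
            j = rng.randrange(len(positive_pairs))
            if j == i:
                continue
            candidate = positive_pairs[j][1]
            # Skip accidental true positives and repeated negatives.
            if (anchor, candidate) in known_positives or candidate in seen:
                continue
            seen.add(candidate)
            labeled.append((anchor, candidate, 0.0))

    return labeled
```

Each triple can then be wrapped in whatever example type the training code expects. Note that random negatives like these are not guaranteed to be truly dissimilar, which is part of why the 0/1 scores are in question.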


My question: Are those scores good enough for fine-tuning?

If not, how should I give a score for …
