Oct. 6, 2022, 5:51 p.m. | /u/MultiheadAttention

Natural Language Processing www.reddit.com

I have N sentences, and I expect most of them to have the same semantic meaning.

I've encoded each sentence, so I now have N embedding vectors. Most of the vectors are probably close to each other.

I've calculated the pairwise cosine distances, so I now have an NxN distance matrix.
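
For reference, a minimal sketch of how such a matrix could be built with NumPy. The function name `pairwise_cosine_distance` and the assumption that the embeddings arrive as an (N, d) array are illustrative only; the post does not say which encoder or library was used.

```python
import numpy as np

def pairwise_cosine_distance(embeddings):
    """Return the N x N matrix of cosine distances (1 - cosine similarity).

    `embeddings` is assumed to be array-like of shape (N, d).
    """
    X = np.asarray(embeddings, dtype=float)
    # Normalise each row to unit length; the clip guards against zero vectors.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.clip(norms, 1e-12, None)
    # Cosine similarity of unit vectors is just their dot product.
    sim = np.clip(X_unit @ X_unit.T, -1.0, 1.0)
    dist = 1.0 - sim
    # Remove floating-point noise on the diagonal.
    np.fill_diagonal(dist, 0.0)
    return dist
```

Cosine distance ranges from 0 (same direction) to 2 (opposite direction), so for sentence embeddings that mostly agree, most entries should be small.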

Based on this matrix, how do I find the outliers, i.e. the sentences whose meaning is very different from the rest?

* N is small, say 10 < N < 100
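
One possible approach, sketched here as an assumption rather than an established answer from the thread: score each sentence by its median distance to all the others, then flag scores that are unusually high using a modified z-score based on the median absolute deviation (MAD). The function name `find_outliers` is hypothetical, and the threshold of 3.5 is only a conventional default that would need tuning on real data. The sketch relies on the premise stated above that most sentences share one meaning; if the outliers form a large group of their own, a clustering method may fit better.

```python
import numpy as np

def find_outliers(dist, threshold=3.5):
    """Flag rows of a symmetric N x N distance matrix that look like outliers.

    Returns (indices, scores), where scores[i] is the median distance
    from item i to every other item.
    """
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    # Drop the zero diagonal so self-distance does not bias the score.
    off_diag = D[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    score = np.median(off_diag, axis=1)

    # Robust centre and spread of the scores.
    med = np.median(score)
    mad = np.median(np.abs(score - med))
    if mad == 0:
        # Assumption: when at least half the scores are identical, treat
        # anything above that common value as an outlier.
        return np.flatnonzero(score > med), score

    # Modified z-score; 0.6745 rescales MAD to match a standard deviation
    # under a normal distribution.
    z = 0.6745 * (score - med) / mad
    return np.flatnonzero(z > threshold), score
```

Only the upper tail is flagged, since a sentence that is unusually close to everything else is not an outlier in the sense asked about. With N between 10 and 100, the median and MAD are preferable to the mean and standard deviation because a few extreme sentences cannot inflate them.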

