April 9, 2023, 12:18 p.m. | /u/hardwareDE

Natural Language Processing www.reddit.com

Hey everyone, as part of a project (data cannot be disclosed) I am comparing around 5000 sentences by their semantic content. The aim is to find pairs or groups of sentences with identical or nearly identical content. The sentences are all very domain-heavy and often contain similar words in different contexts. Creating tf-idf vectors and grouping by cosine similarity works somewhat okay. Word frequency clustering did not work well, as many of the domain-specific words are dominating …
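
For readers who want to try the tf-idf plus cosine similarity baseline mentioned in the post, here is a minimal sketch in plain Python. It is not the poster's code: the tokenizer, the smoothed IDF formula, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `similar_pairs`) and the 0.8 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    # One term-count dictionary per sentence.
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences each term occurs in.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumed variant), so ubiquitous terms keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        vec = {t: tf * idf[t] for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # L2-normalise so that a dot product equals cosine similarity.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    # Both vectors are already L2-normalised sparse dicts.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def similar_pairs(sentences, threshold=0.8):
    # Return (i, j, score) for every sentence pair at or above the threshold,
    # highest score first. The threshold is an assumed starting point.
    vectors = tfidf_vectors(sentences)
    pairs = []
    for i, j in combinations(range(len(sentences)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return sorted(pairs, key=lambda p: -p[2])
```

Calling `similar_pairs(list_of_sentences)` returns candidate near-duplicate pairs by index. With around 5000 sentences the all-pairs loop covers roughly 12.5 million comparisons, so a sparse-matrix library would likely be faster in practice. The threshold would need tuning on the actual data.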

