Aug. 16, 2023, 6:07 p.m. | /u/Grand_Comparison2081

Natural Language Processing www.reddit.com

Hello, I’ve made sentence embeddings of my documents and want to do topic modeling by clustering these documents.

I’ve done dimensionality reduction and clustering and the results are ok. I used UMAP and HDBSCAN.

How can I troubleshoot whether my dimensionality reduction, clustering algorithm, or sentence embeddings are the reason some documents land in clusters where they don't seem to belong?
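One rough way to isolate the dimensionality reduction step, as a sketch rather than a definitive recipe: compare each document's nearest neighbours in the original embedding space with its nearest neighbours after reduction. If the overlap is low, the reduction is reshuffling neighbourhoods before the clustering ever runs. The function names here are hypothetical, the toy random vectors stand in for real sentence embeddings, and a crude "keep the first two coordinates" projection stands in for UMAP output. Euclidean distance is assumed; cosine distance may suit sentence embeddings better.

```python
import math
import random


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def neighbour_overlap(high_dim, low_dim, k=10):
    """Mean fraction of each point's k nearest neighbours shared by both spaces (0 to 1)."""
    high = knn_indices(high_dim, k)
    low = knn_indices(low_dim, k)
    return sum(len(h & l) / k for h, l in zip(high, low)) / len(high)


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-in for sentence embeddings: 100 "documents" in 20 dimensions.
    embeddings = [[random.gauss(0, 1) for _ in range(20)] for _ in range(100)]
    # Toy stand-in for a reduced space: keep only the first two coordinates.
    reduced = [vec[:2] for vec in embeddings]
    print("overlap, embeddings vs reduced:", neighbour_overlap(embeddings, reduced, k=10))
    print("overlap, embeddings vs itself: ", neighbour_overlap(embeddings, embeddings, k=10))
```

With real data, `high_dim` would be the sentence embeddings and `low_dim` the reduced vectors, in the same row order. A high overlap alongside poor clusters would point more toward the clustering settings or the embeddings themselves than toward the reduction. The brute-force neighbour search is quadratic, so a sample of the documents is probably enough.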

On top of that, I’ve read that doing dimensionality reduction but reducing to a medium number of dimensions (50, for …

