March 25, 2024, 4:42 a.m. | Alina Petukhova, Joao P. Matos-Carvalho, Nuno Fachada

cs.LG updates on arXiv.org

arXiv:2403.15112v1 Announce Type: cross
Abstract: Text clustering is an important approach for organising the growing amount of digital content, helping to structure uncategorised data and find hidden patterns in it. In this research, we investigated how different textual embeddings, particularly those used in large language models (LLMs), and clustering algorithms affect how text datasets are clustered. A series of experiments was conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and embedding …
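To make the embed-then-cluster pipeline described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' method: the bag-of-words `embed` function is only a toy stand-in for the LLM-based embeddings the abstract refers to, the `kmeans` helper is a bare-bones illustration of one possible clustering algorithm, and the four example documents are invented for this sketch.

```python
from collections import Counter
import math


def embed(text, vocab):
    """Toy bag-of-words embedding, L2-normalised.

    Assumption: this stands in for a real text embedding (e.g. one
    produced by an LLM); it is not what the paper uses.
    """
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def kmeans(vectors, k, iterations=10):
    """Plain k-means with squared Euclidean distance.

    The first k vectors seed the centroids, so the result is deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical example documents covering two topics.
docs = [
    "cats and dogs are pets",
    "stocks and bonds are investments",
    "dogs and cats chase balls",
    "bonds and stocks lose value",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [embed(d, vocab) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

Swapping `embed` for a different embedding while keeping `kmeans` fixed (or the reverse) is the kind of comparison the abstract describes, although the specific embeddings and algorithms evaluated in the paper are not listed in this excerpt.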

