all AI news
Text clustering with LLM embeddings
March 25, 2024, 4:42 a.m. | Alina Petukhova, Joao P. Matos-Carvalho, Nuno Fachada
cs.LG updates on arXiv.org arxiv.org
Abstract: Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data. In this research, we investigated how different textual embeddings - particularly those used in large language models (LLMs) - and clustering algorithms affect how text datasets are clustered. A series of experiments were conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and embedding …
abstract algorithms arxiv clustering cs.ai cs.cl cs.lg data datasets digital digital content embeddings hidden language language models large language large language models llm llms patterns research text textual type
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
AIML - Sr Machine Learning Engineer, Data and ML Innovation
@ Apple | Seattle, WA, United States
Senior Data Engineer
@ Palta | Palta Cyprus, Palta Warsaw, Palta remote