Correlation Dimension of Natural Language in a Statistical Manifold
May 13, 2024, 4:46 a.m. | Xin Du, Kumiko Tanaka-Ishii
cs.CL updates on arXiv.org arxiv.org
Abstract: The correlation dimension of natural language is measured by applying the Grassberger-Procaccia algorithm to high-dimensional sequences produced by a large-scale language model. This method, previously studied only in a Euclidean space, is reformulated in a statistical manifold via the Fisher-Rao distance. Language exhibits a multifractal, with global self-similarity and a universal dimension around 6.5, which is smaller than those of simple discrete random sequences and larger than that of a Barabási-Albert process. Long memory is …
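
To make the method named in the abstract concrete, here is a minimal sketch of a Grassberger-Procaccia estimate that uses the Fisher-Rao distance between categorical distributions, d(p, q) = 2 arccos(Σ_i √(p_i q_i)). It is not the authors' implementation. The function names are hypothetical, and the random Dirichlet samples are an assumed stand-in for the language-model output distributions the paper works with.

```python
import numpy as np


def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between categorical distributions.

    p and q hold probability vectors along the last axis.
    """
    bc = np.sum(np.sqrt(p * q), axis=-1)  # Bhattacharyya coefficient
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))


def pairwise_distances(points):
    """Fisher-Rao distances over all distinct pairs of rows in `points`."""
    i, j = np.triu_indices(len(points), k=1)
    return fisher_rao_distance(points[i], points[j])


def correlation_integral(distances, eps):
    """C(eps): fraction of pairs whose distance is below eps."""
    return float(np.mean(distances < eps))


def correlation_dimension(distances, eps_values):
    """Slope of log C(eps) against log eps, fitted by least squares."""
    eps_values = np.asarray(eps_values, dtype=float)
    c = np.array([correlation_integral(distances, e) for e in eps_values])
    mask = c > 0  # log is undefined where no pair falls within eps
    slope, _ = np.polyfit(np.log(eps_values[mask]), np.log(c[mask]), 1)
    return float(slope)


# Assumed toy data: 200 random distributions over 4 outcomes.
rng = np.random.default_rng(0)
points = rng.dirichlet(np.ones(4), size=200)
distances = pairwise_distances(points)
eps_values = np.logspace(np.log10(0.3), np.log10(1.5), 10)
print(correlation_dimension(distances, eps_values))
```

The slope is only meaningful over the range of eps where log C(eps) is close to linear in log eps, so in practice the fitting range has to be chosen by inspecting that curve rather than fixed in advance as it is here.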