May 13, 2024, 4:46 a.m. | Xin Du, Kumiko Tanaka-Ishii

cs.CL updates on arXiv.org arxiv.org

arXiv:2405.06321v1 Announce Type: new
Abstract: The correlation dimension of natural language is measured by applying the Grassberger-Procaccia algorithm to high-dimensional sequences produced by a large-scale language model. This method, previously studied only in a Euclidean space, is reformulated in a statistical manifold via the Fisher-Rao distance. Language exhibits a multifractal, with global self-similarity and a universal dimension around 6.5, which is smaller than those of simple discrete random sequences and larger than that of a Barab\'asi-Albert process. Long memory is …

abstract algorithm arxiv cond-mat.stat-mech correlation cs.ai cs.cl fisher language language model manifold natural natural language scale space statistical type via

ML/AI Engineer / NLP Expert - Custom LLM Development (x/f/m)

@ HelloBetter | Remote

Doctoral Researcher (m/f/div) in Automated Processing of Bioimages

@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena

Seeking Developers and Engineers for AI T-Shirt Generator Project

@ Chevon Hicks | Remote

Global Clinical Data Manager

@ Warner Bros. Discovery | CRI - San Jose - San Jose (City Place)

Global Clinical Data Manager

@ Warner Bros. Discovery | COL - Cundinamarca - Bogotá (Colpatria)

Ingénieur Data Manager / Pau

@ Capgemini | Paris, FR