Intellecta Cognitiva: A Comprehensive Dataset for Advancing Academic Knowledge and Machine Reasoning
April 23, 2024, 4:49 a.m. | Ajmal PS, Ditto PS, Jithin VG
Source: cs.CL updates on arXiv.org (arxiv.org)
Abstract: The Intellecta dataset emerges as an innovative synthetic dataset, engineered to enhance the cognitive processing capabilities of contemporary language models. Comprising 11.53 billion tokens (8.01 billion tokens of synthetic data combined with 3.52 billion tokens of rich textbook data), Intellecta is crafted to foster advanced reasoning and comprehensive educational narrative generation. The dataset is generated with the Mixtral-8x7B-Instruct-v0.1 model, which produces complex thought processes and detailed, textbook-style explanations, thus enabling language models to …