April 2, 2024, 7:52 p.m. | Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. H

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.01204v1 Announce Type: new
Abstract: Uncovering early-stage metrics that reflect final model performance is one core principle for large-scale pretraining. The existing scaling law demonstrates a power-law correlation between pretraining loss and training FLOPs, which serves as an important indicator of the current training state of large language models. However, this principle focuses only on the model's compression of the training data, which is inconsistent with capability improvements on downstream tasks. Some follow-up works attempted to …
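The scaling law the abstract refers to posits a power-law relation between pretraining loss L and training compute C (FLOPs), roughly L(C) = a * C^(-b), which reduces to a linear fit in log-log space. Below is a minimal sketch of that fit; the checkpoint numbers, variable names, and the extrapolation target are synthetic illustrations, not values from the paper.

```python
import numpy as np

# Hypothetical (compute, loss) pairs from early training checkpoints
# (synthetic numbers for illustration only).
flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
loss = np.array([3.10, 2.85, 2.62, 2.43, 2.25])

# Power law L(C) = a * C^(-b)  =>  log L = log a - b * log C,
# so a straight-line fit in log-log space recovers a and b.
slope, intercept = np.polyfit(np.log(flops), np.log(loss), 1)
a, b = np.exp(intercept), -slope

# Extrapolate the fitted law to a larger compute budget.
predicted_loss = a * (1e21) ** (-b)
print(f"fit: L(C) = {a:.3f} * C^(-{b:.4f})")
print(f"predicted loss at 1e21 FLOPs: {predicted_loss:.3f}")
```

The abstract's point is that this loss-vs-compute fit tracks compression of the training data, which need not move in lockstep with downstream task ability.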

