Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales. (arXiv:2304.06875v1 [cs.CL])
cs.LG updates on arXiv.org
As language models scale up, it becomes increasingly expensive to verify
research ideas because conclusions on small models do not trivially transfer to
large ones. A possible solution is to establish a generic system that directly
predicts some metrics for large models solely based on the results and
hyperparameters from small models. Existing methods based on scaling laws
require hyperparameter search on the largest models, which is impractical with
limited resources. We address this issue by presenting our discoveries
indicating …
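The generic idea the abstract describes — fitting a scaling curve to results from small models and extrapolating to a large one — can be sketched with a simple power-law loss fit. This is only an illustration of the conventional scaling-law approach the paper contrasts itself with, not the paper's μP-based method; the functional form L(N) = a·N^(−b) + c and all data points below are synthetic assumptions.

```python
# Illustrative sketch (not the paper's method): fit a power-law scaling
# curve L(N) = a * N**(-b) + c to eval losses measured on small models,
# then extrapolate the fitted curve to a large model size.

def fit_power_law(sizes, losses):
    """Grid-search the exponent b and offset c; for each pair, solve the
    scale a in closed form by least squares, and keep the best fit."""
    best = None
    b_grid = [i / 100 for i in range(5, 101)]   # candidate exponents
    c_grid = [i / 100 for i in range(0, 201)]   # candidate irreducible-loss offsets
    for b in b_grid:
        xs = [n ** (-b) for n in sizes]
        denom = sum(x * x for x in xs)
        for c in c_grid:
            ys = [l - c for l in losses]
            a = sum(x * y for x, y in zip(xs, ys)) / denom
            err = sum((a * x + c - l) ** 2 for x, l in zip(xs, losses))
            if best is None or err < best[0]:
                best = (err, a, b, c)
    return best[1:]

# Synthetic small-model measurements (parameter count, eval loss),
# generated here from L(N) = 5.0 * N**-0.3 + 1.0 purely for demonstration.
sizes = [1e6, 4e6, 16e6, 64e6]
losses = [5.0 * n ** -0.3 + 1.0 for n in sizes]

a, b, c = fit_power_law(sizes, losses)
predicted = a * (1e9) ** (-b) + c  # extrapolated loss for a 1B-parameter model
```

The weakness this abstract points at shows up here implicitly: such a fit is only trustworthy if each measured point uses well-tuned hyperparameters, which existing methods obtain by searching on the largest models as well.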