Feb. 6, 2024, 5:48 a.m. | Max Milkert David Hyde Forrest Laine

cs.LG updates on arXiv.org arxiv.org

A neural network with ReLU activations may be viewed as a composition of piecewise linear functions. For such networks, the number of distinct linear regions expressed over the input domain has the potential to scale exponentially with depth, but it is not expected to do so when the initial parameters are chosen randomly. This poor scaling can necessitate the use of overly large models to approximate even simple functions. To address this issue, we introduce a novel training strategy: we …

cs.ai cs.lg domain functions linear network networks neural network relu scale scaling training

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Senior Software Engineer, Generative AI (C++)

@ SoundHound Inc. | Toronto, Canada