Nov. 14, 2023, 8:32 p.m. | /u/cstein123

Machine Learning

Can LLMs stack more layers than the largest ones currently have, or is depth bottlenecked? Is it because the gradients can’t propagate properly back to the beginning of the network? Or because inference would be too slow?

If anyone could point me to a paper about how scaling works when stacking more layers, I would love to read it!
