June 14, 2024, 7:20 p.m. | /u/danielhanchen

Machine Learning www.reddit.com

I took a look at NVIDIA's 340B Nemotron LLM - some of my findings:

* **Squared ReLU**, unlike Llama's SwiGLU or Gemma's GeGLU. Different from the GLU variants found in [arxiv.org/pdf/2002.05202](http://arxiv.org/pdf/2002.05202) (GLU Variants Improve Transformer, Noam Shazeer)
* ReGLU is \[ ReLU(X \* W\_gate) \* (X \* W\_up) \] \* W\_down
* Squared ReLU instead needs 2 ReLUs with tied weights: \[ ReLU(X \* W\_up) \* ReLU(X \* W\_up) \] \* W\_down, so it's a bit like a GLU, but not the same
* Why does Squared …
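The two feed-forward variants above can be sketched in plain Python. This is a minimal illustration, not Nemotron's actual implementation: matrices are lists of lists, and the weight names (`W_gate`, `W_up`, `W_down`) follow the formulas in the bullets.

```python
def relu(x):
    # Elementwise ReLU over a vector
    return [max(0.0, v) for v in x]

def matvec(W, x):
    # W is a list of rows; returns W @ x as a vector
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def reglu_ffn(x, W_gate, W_up, W_down):
    # ReGLU: [ ReLU(X * W_gate) * (X * W_up) ] * W_down
    # Two separate input projections: a gated one and an ungated one.
    gate = relu(matvec(W_gate, x))
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

def squared_relu_ffn(x, W_up, W_down):
    # Squared ReLU: [ ReLU(X * W_up) * ReLU(X * W_up) ] * W_down
    # A single projection, ReLU'd and then squared ("tied weights").
    h = relu(matvec(W_up, x))
    hidden = [v * v for v in h]
    return matvec(W_down, hidden)
```

Note the key structural difference: ReGLU multiplies two *different* projections of the input, while Squared ReLU multiplies one projection by itself, so it has one fewer weight matrix for the same hidden width.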

