June 14, 2024, 7:20 p.m. | /u/danielhanchen

Machine Learning www.reddit.com

I took a look at NVIDIA's 340B Nemotron LLM - some of my findings:

* **Squared ReLU**, unlike Llama's SwiGLU or Gemma's GeGLU. Different from the GLU variants found in [arxiv.org/pdf/2002.05202](http://arxiv.org/pdf/2002.05202) (GLU Variants Improve Transformer, Noam Shazeer)
* ReGLU is \[ ReLU(X \* W\_gate) \* (X \* W\_up) \] \* W\_down
* Squared ReLU instead needs 2 ReLUs with tied weights: \[ ReLU(X \* W\_up) \* ReLU(X \* W\_up) \] \* W\_down, so it's a bit like a GLU, but not the same
* Why does Squared …
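The two feed-forward variants above can be sketched in plain Python. This is a minimal illustration, not Nemotron's actual implementation: matrices are lists of lists, and the weight names (`W_gate`, `W_up`, `W_down`) follow the formulas in the bullets.

```python
def relu(x):
    # Elementwise ReLU over a vector
    return [max(0.0, v) for v in x]

def matvec(W, x):
    # W is a list of rows; returns W @ x as a vector
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def reglu_ffn(x, W_gate, W_up, W_down):
    # ReGLU: [ ReLU(X * W_gate) * (X * W_up) ] * W_down
    # Two separate input projections: a gated one and an ungated one.
    gate = relu(matvec(W_gate, x))
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

def squared_relu_ffn(x, W_up, W_down):
    # Squared ReLU: [ ReLU(X * W_up) * ReLU(X * W_up) ] * W_down
    # A single projection, ReLU'd and then squared ("tied weights").
    h = relu(matvec(W_up, x))
    hidden = [v * v for v in h]
    return matvec(W_down, hidden)
```

Note the key structural difference: ReGLU multiplies two *different* projections of the input, while Squared ReLU multiplies one projection by itself, so it has one fewer weight matrix for the same hidden width.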

