[D] Nemotron-4 340b detailed analysis
June 14, 2024, 7:20 p.m. | /u/danielhanchen
Machine Learning www.reddit.com
* **Squared ReLU** activation, unlike Llama's SwiGLU and Gemma's GeGLU. It differs from the GLU variants in [arxiv.org/pdf/2002.05202](http://arxiv.org/pdf/2002.05202) (GLU Variants Improve Transformer, Noam Shazeer)
* ReGLU is \[ ReLU(X \* W\_gate) \* (X \* W\_up) \] \* W\_down
* Squared ReLU needs 2 ReLUs + tied weights: \[ ReLU(X \* W\_up) \* ReLU(X \* W\_up) \] \* W\_down, so it is a bit like a GLU, but not the same
* Why does Squared …
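
The ReGLU and Squared ReLU formulas above can be sketched in plain Python as follows. This is only an illustration under assumptions: the helper names (`matmul`, `relu`, `hadamard`, `reglu_mlp`, `squared_relu_mlp`) and the tiny matrices are made up for the example and are not taken from Nemotron-4 or any library. It also checks one elementwise identity, ReLU(z) \* z = ReLU(z) \* ReLU(z), which means ReGLU with W\_gate tied to W\_up gives the same output as the Squared ReLU form.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(M):
    """Elementwise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in M]

def hadamard(A, B):
    """Elementwise product of two same-shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def reglu_mlp(X, W_gate, W_up, W_down):
    # [ ReLU(X * W_gate) * (X * W_up) ] * W_down
    gated = hadamard(relu(matmul(X, W_gate)), matmul(X, W_up))
    return matmul(gated, W_down)

def squared_relu_mlp(X, W_up, W_down):
    # [ ReLU(X * W_up) * ReLU(X * W_up) ] * W_down  (one projection, used twice)
    h = relu(matmul(X, W_up))
    return matmul(hadamard(h, h), W_down)

# Hypothetical toy inputs: 2 tokens, hidden size 2, intermediate size 3.
X = [[1.0, -2.0], [0.5, 3.0]]
W_up = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]
W_down = [[1.0], [2.0], [-1.0]]

out_sq = squared_relu_mlp(X, W_up, W_down)
out_reglu_tied = reglu_mlp(X, W_up, W_up, W_down)  # W_gate tied to W_up

print(out_sq)                    # [[-5.25], [60.75]]
print(out_sq == out_reglu_tied)  # True
```

Note that the Squared ReLU form needs only two weight matrices (W\_up and W\_down), whereas an untied ReGLU needs three.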