March 19, 2024, 5:23 p.m. | /u/danielhanchen

Machine Learning www.reddit.com

Hey r/MachineLearning! You might have seen my post on [Twitter](https://twitter.com/danielhanchen/status/1765446273661075609), but in case you haven't: I found 8 bugs across multiple implementations of Google's Gemma :) The fixes have already been pushed to HF transformers' main branch, and Keras, PyTorch Gemma, and vLLM should have received the fix too :) [https://github.com/huggingface/transformers/pull/29402](https://github.com/huggingface/transformers/pull/29402)

By comparing 5 implementations, I found the following issues:

1. Must add `<bos>` or else losses will be very high.
2. There’s a typo for model in the …
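The `<bos>` point can be sketched in plain Python. This is a minimal illustration, not the actual fix from the PR: it assumes Gemma's `<bos>` id is 2 (as in the HF tokenizer) and uses made-up token ids for the rest of the sequence.

```python
# Illustrative sketch: Gemma expects every sequence to start with <bos>.
# BOS_ID = 2 matches the HF Gemma tokenizer's bos_token_id (assumption for this demo).
BOS_ID = 2

def ensure_bos(ids, bos_id=BOS_ID):
    """Prepend <bos> if it is missing; training/eval without it inflates the loss."""
    if ids and ids[0] == bos_id:
        return ids
    return [bos_id] + ids

# Token ids below are arbitrary placeholders, not real Gemma vocab entries.
print(ensure_bos([10, 11, 12]))   # -> [2, 10, 11, 12]
print(ensure_bos([2, 10, 11]))    # already has <bos>, left unchanged
```

In practice the HF tokenizer handles this for you when `add_bos_token=True`; the bug class here was pipelines tokenizing without it.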

