March 21, 2024, 10:09 a.m. | /u/DickMasterGeneral

Machine Learning www.reddit.com

Unfortunately, Microsoft didn’t release the weights, but if the choice is between the weights and the training code, then this is certainly the preferable option. [The post is here](https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf).

What do we think? Any ideas about that oddly shaped loss curve? Do you think this approach breaks down beyond 4B parameters? Thoughts on how LLMs could even work at such low precision? I know it’s not particularly scientific, but it seems pretty counterintuitive to me that you could potentially fit a fully functional 7B-parameter …
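On the low-precision question: the BitNet b1.58 paper describes quantizing weights to the ternary set {-1, 0, +1} via an "absmean" scheme, scaling each weight matrix by its mean absolute value before rounding and clipping. Here is a minimal PyTorch sketch under that assumption; the function name and `eps` value are illustrative, not from Microsoft's release:

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Quantize a weight tensor to ternary values {-1, 0, +1}.

    Sketch of the absmean scheme described in the BitNet b1.58 paper:
    scale by the mean absolute value, round, then clip to [-1, 1].
    """
    gamma = w.abs().mean()        # per-tensor absmean scale
    w_scaled = w / (gamma + eps)  # normalize so entries cluster near {-1, 0, 1}
    return w_scaled.round().clamp(-1, 1)

w = torch.randn(4, 4)
print(absmean_ternary_quantize(w))  # every entry is -1., 0., or 1.
```

Note that this only covers inference-time weight values: during training, the papers keep latent full-precision weights and pass gradients through the quantizer with a straight-through estimator, which is presumably why the released PDF focuses on training tips rather than a drop-in kernel.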

