March 24, 2024, 9:27 p.m. | /u/MidnightSun_55

Machine Learning www.reddit.com

From what I understand, the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" is about doing inference and training with -1, 0, and 1 as the possible values for the weights.
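If I've read it right, the weight quantization boils down to scaling each weight tensor by its mean absolute value and then rounding and clipping to {-1, 0, +1}. Roughly, in PyTorch (my own reading of the paper, not the authors' code; `ternarize` is just a name I made up):

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Absmean quantization: map a weight tensor to values in {-1, 0, +1}."""
    gamma = w.abs().mean()                       # per-tensor absmean scale
    return (w / (gamma + eps)).round().clamp_(-1, 1)
```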

Could this be applied to a simple feed-forward network trained on the MNIST dataset, to test and compare performance? It could also serve as a simple example to learn from.

Or are some LLM-specific components necessary for this to work, …
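To make the question concrete, here is the kind of minimal sketch I had in mind (assuming PyTorch and torchvision; `BitLinear` and `TernaryMLP` are my own names, and this only does the weight ternarization with a straight-through estimator, skipping the paper's activation quantization and the transformer-specific parts):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms


class BitLinear(nn.Linear):
    """Linear layer whose weights are ternarized to {-1, 0, +1} on the fly."""

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)          # absmean scale
        w_q = (w / scale).round().clamp(-1, 1) * scale  # ternary values, rescaled
        w_ste = w + (w_q - w).detach()                  # straight-through estimator
        return F.linear(x, w_ste, self.bias)


class TernaryMLP(nn.Module):
    """Plain feed-forward MNIST classifier built from ternary-weight layers."""

    def __init__(self):
        super().__init__()
        self.fc1 = BitLinear(784, 256)
        self.fc2 = BitLinear(256, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(F.relu(self.fc1(x)))


def train():
    ds = datasets.MNIST("data", train=True, download=True,
                        transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(ds, batch_size=128, shuffle=True)

    model = TernaryMLP()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    model.train()
    for epoch in range(3):
        for images, labels in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")


if __name__ == "__main__":
    train()
```

Swapping `BitLinear` back to `nn.Linear` would give the full-precision baseline, so the two could be compared directly on MNIST accuracy.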
