[P][D] A100 is much slower than expected at low batch size for text generation
Dec. 4, 2023, 2:23 a.m. | /u/currytrash97
Machine Learning www.reddit.com
Unfortunately, I’m limited in the infrastructure I can use to deploy this model. Batch inference is not supported. The infrastructure I have lets me deploy a copy of the model on a single A100, one per process, with up to 9 processes supported (these are called “replicas”). I understand that this makes little sense given my model is memory bound, and each …
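Since the post attributes the slowdown to the model being memory bound, a rough sanity check is to compare observed throughput against the bandwidth ceiling: at batch size 1, each generated token must stream all of the model weights from HBM once. A minimal sketch, assuming an A100’s ~2.0 TB/s HBM bandwidth and a hypothetical 7B-parameter fp16 model (neither figure is from the post):

```python
def decode_tokens_per_second(param_count, bytes_per_param, hbm_bandwidth_bytes):
    # At batch size 1, generating one token requires reading every
    # weight from HBM once, so throughput is capped by
    # bandwidth / model size, regardless of compute (FLOPs).
    model_bytes = param_count * bytes_per_param
    return hbm_bandwidth_bytes / model_bytes

# Hypothetical example: 7B params in fp16 (2 bytes each) on an A100
# with ~2.0e12 bytes/s of HBM bandwidth.
ceiling = decode_tokens_per_second(7e9, 2, 2.0e12)
print(f"~{ceiling:.0f} tokens/s upper bound")  # ~143 tokens/s
```

This is why replicating the model across processes helps little here: each replica re-reads the same weights, so aggregate throughput stays bandwidth-bound, whereas true batch inference would amortize one weight read across many sequences.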