Oct. 27, 2023, 5:18 p.m. | /u/faschu

Machine Learning www.reddit.com

What are the benefits of using an H100 over an A100 (both at 80 GB and both using FP16) for LLM inference?



Looking at the datasheets for both GPUs, the H100 has twice the max FLOPS, but they have almost the same memory bandwidth (about 2000 GB/s). Since memory bandwidth dominates LLM inference, I wonder what benefits the H100 offers. One benefit could, of course, be the ability to use FP8 (which is extremely useful), but I'm interested in the difference in …
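For intuition, here is a rough roofline-style sketch of batch-1 decode. The TFLOPS and bandwidth figures are approximate datasheet values (A100 80GB SXM and the ~2000 GB/s H100 PCIe the question refers to), and the 13B FP16 model is purely an illustrative assumption:

```python
# Back-of-envelope roofline estimate for single-stream FP16 decode.
# Figures are approximate datasheet values; the model size is an assumption.

GPUS = {
    #                    (peak dense FP16 tensor TFLOP/s, memory bandwidth GB/s)
    "A100 80GB (SXM)":  (312, 2039),
    "H100 80GB (PCIe)": (756, 2000),
}

# Hypothetical model: 13B parameters stored in FP16 (2 bytes per parameter).
params = 13e9
weight_bytes = params * 2

for name, (tflops, bw_gbs) in GPUS.items():
    # Batch-1 decode reads every weight once per token, so the best case is
    # bandwidth-bound: tokens/s <= bandwidth / bytes of weights.
    bw_bound_tok_s = bw_gbs * 1e9 / weight_bytes
    # A dense forward pass needs roughly 2 FLOPs per parameter per token.
    compute_bound_tok_s = tflops * 1e12 / (2 * params)
    print(f"{name}: bandwidth-bound ~{bw_bound_tok_s:.0f} tok/s, "
          f"compute-bound ~{compute_bound_tok_s:.0f} tok/s")
```

Under these assumptions both cards hit nearly the same bandwidth-bound ceiling for single-stream decode, which is why the extra FLOPS mostly pay off at larger batch sizes, longer prompts (prefill), or with FP8.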

