May 7, 2024, 5:36 a.m. | /u/Fit-Flow-4180

Machine Learning | www.reddit.com

I’ve read that inference speed for models like Llama-2 70B tops out at around 10 t/s. So I’ve been wondering how extremely large models like GPT-4 (rumored ~1T params?) manage their fast ~20 t/s inference. With 10x the params, they’ve got to have at least 3x the layers(?), and since every generated token has to pass through every layer sequentially, that should make inference much slower. Am I missing anything? What further improvements might these companies be making to power their fast APIs?

Edit: I must mention that you …
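As a sanity check on the numbers in the question: single-stream decoding is usually memory-bandwidth bound, so a rough estimate is tokens/sec ≈ effective memory bandwidth ÷ bytes of weights read per token. Below is a minimal back-of-envelope sketch in Python. The hardware bandwidth figures are public specs; the GPT-4 figures (a mixture-of-experts with ~280B *active* params per token, sharded over 8 GPUs) are unconfirmed rumors, used purely as assumptions for illustration.

```python
# Rough model: every decoded token must stream all *active* weights from
# memory, so tokens/sec ~= effective memory bandwidth / bytes per token.
# GPU bandwidth specs are public; the GPT-4 figures are rumors, not facts.

def tokens_per_sec(active_params_b, bytes_per_param, mem_bw_gbs,
                   num_gpus=1, efficiency=0.5):
    """Back-of-envelope decode speed for bandwidth-bound generation."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    effective_bw = mem_bw_gbs * 1e9 * num_gpus * efficiency
    return effective_bw / bytes_per_token

# Llama-2 70B in fp16 on one A100 80GB (~2 TB/s HBM): ~7 t/s, in line
# with the ~10 t/s figure above.
print(f"Llama-2 70B: {tokens_per_sec(70, 2.0, 2000):.1f} t/s")

# Rumored GPT-4: a mixture-of-experts where only ~280B of ~1.8T params
# are active per token (unverified), sharded over 8 H100s (~3.35 TB/s each).
print(f"GPT-4 (rumored MoE): {tokens_per_sec(280, 2.0, 3350, num_gpus=8):.1f} t/s")
```

The takeaway: if the MoE rumor is true, GPT-4 only reads a fraction of its total weights per token, and sharding across many GPUs multiplies effective bandwidth, so ~20 t/s becomes plausible despite the 10x parameter count. Batching, quantization, and speculative decoding would push throughput further, but the bandwidth arithmetic above is the first-order effect.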

