Scale LLM Inference on Amazon SageMaker with Multi-Replica Endpoints

Jan. 11, 2024, midnight | schmidphilipp1995@gmail.com (Philipp Schmid)

In this blog post, you will learn how to increase the throughput of Llama 13B on Amazon SageMaker using single-instance, multi-replica endpoints.
