April 23, 2024, 12:33 p.m. | /u/juliensalinas

Machine Learning www.reddit.com

Many people are trying to install and deploy their own LLaMA 3 model, so here is a tutorial I just made showing how to deploy LLaMA 3 on an AWS EC2 instance: [https://nlpcloud.com/how-to-install-and-deploy-llama-3-into-production.html](https://nlpcloud.com/how-to-install-and-deploy-llama-3-into-production.html)

Deploying LLaMA 3 8B is fairly easy, but LLaMA 3 70B is another beast. Given the amount of VRAM it needs, you will likely want to provision more than one GPU and use a dedicated inference server like vLLM to split the model across several GPUs, as in the sketch below.
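
As a minimal sketch, assuming vLLM is installed (`pip install vllm`) and the instance exposes four GPUs (the exact count you need depends on your quantization and VRAM budget), tensor parallelism can be enabled like this:

```python
from vllm import LLM, SamplingParams

# Assumption: 4 GPUs are visible on this instance and the
# meta-llama/Meta-Llama-3-70B-Instruct weights are accessible
# (the repo is gated on Hugging Face, so HF_TOKEN must be set).
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    tensor_parallel_size=4,  # shard the model across 4 GPUs
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What is tensor parallelism?"], sampling_params)
print(outputs[0].outputs[0].text)
```

For a long-running deployment you would more likely start vLLM's OpenAI-compatible server instead, with something like `python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B-Instruct --tensor-parallel-size 4`, and the tutorial linked above covers the EC2 provisioning side.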
