April 11, 2024, 6:27 a.m. | Jaydeep Biswas

DEV Community dev.to

Hey everyone,


I'm currently facing some challenges optimizing the response time of a model running on my AWS instance. Here's the setup: I'm using a g5.xlarge instance with a single NVIDIA A10G GPU (24 GB of VRAM). I recently fine-tuned mistralai/Mistral-7B-Instruct-v0.2 on my custom data, then merged the fine-tuned weights back into the base model, and also applied quantization to optimize it further.
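
For context, here's a simplified sketch of the merge step I'm describing (paths are placeholders, and I'm assuming a PEFT/LoRA-style adapter here, which is the typical fine-tuning setup at this model size):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER_DIR = "path/to/my-finetuned-adapter"   # placeholder: output of the fine-tuning run
MERGED_DIR = "path/to/merged-model"            # placeholder: where the merged model is saved

# Load the base model in fp16 and attach the fine-tuned adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER_DIR)

# Fold the adapter weights into the base weights and save a standalone merged model.
merged = model.merge_and_unload()
merged.save_pretrained(MERGED_DIR)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.save_pretrained(MERGED_DIR)
```

The merged model is what I then quantize and serve.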


However, when I send a request to my fine-tuned model, it's taking approximately 3 minutes to respond, even …
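
For reference, the request path looks roughly like this (simplified sketch; the 4-bit bitsandbytes config is just one example of the kind of quantization I mean, and the paths and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MERGED_DIR = "path/to/merged-model"  # placeholder: the merged model from the step above

# 4-bit load via bitsandbytes -- one example of applying quantization at load time.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MERGED_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MERGED_DIR,
    quantization_config=bnb_config,
    device_map="auto",   # place the model on the single A10G
)

# Example request (placeholder prompt, Mistral instruct format).
prompt = "[INST] Summarize the key points from my custom data. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```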

Tags: advice, ai, aws, gpu, llm, machinelearning, mistral, nvidia, python
