Seeking advice on optimizing response time and handling multiple requests on AWS instance with NVIDIA A10G GPU
DEV Community dev.to
Hey everyone,
I'm having trouble optimizing the response time of a model served from my AWS instance. Here's the setup: I'm using a g5.xlarge instance, which has a single NVIDIA A10G GPU with 24 GB of VRAM. I recently fine-tuned mistralai/Mistral-7B-Instruct-v0.2 on my custom data and merged the fine-tuned weights with the base model. I also applied quantization to optimize it further.
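As a rough sanity check on this setup, here is a minimal sketch of how much GPU memory the model weights alone would need at a few precisions. The parameter count is an assumed nominal 7 billion (the real figure is somewhat higher), and the estimate ignores the KV cache, activations, and framework overhead, so treat the numbers as a lower bound rather than a measurement.

```python
# Back-of-the-envelope weight-memory estimate for a "7B" model.
# Assumption: nominal 7e9 parameters; KV cache and runtime overhead are ignored.
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


NUM_PARAMS = 7e9  # assumed nominal parameter count
VRAM_GIB = 24     # A10G capacity from the setup above, treated as roughly 24 GiB

# Weight footprint at 16-bit, 8-bit, and 4-bit precision.
estimates = {bits: weight_memory_gib(NUM_PARAMS, bits) for bits in (16, 8, 4)}

for bits, gib in estimates.items():
    fits = gib < VRAM_GIB
    print(f"{bits}-bit weights: ~{gib:.1f} GiB, fits in {VRAM_GIB} GiB: {fits}")
```

If the weights fit in VRAM at the chosen precision, it is worth confirming that the model is actually loaded on the GPU rather than falling back to CPU or being split across devices.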
However, when I send a request to my fine-tuned model, it's taking approximately 3 minutes to respond, even …