March 18, 2024, 7:15 a.m. | /u/Aggravating-Floor-38

Natural Language Processing | www.reddit.com

I'm trying to deploy a Mistral 7B API endpoint for a RAG application I'm building. There are a few major things I'm confused about. I'm GPU poor :( so I was planning on using AWS SageMaker to deploy the model. The two-month free plan includes 125 hours of m4.xlarge or m5.xlarge instance time per month for inference. Would that be enough to set up an endpoint for quantized Mistral (I'm thinking 5-bit)? And if you don't have a GPU …
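For reference, a CPU-only serving setup for a 5-bit quantized Mistral 7B could look something like the sketch below, using llama-cpp-python with a GGUF build of the model behind a small FastAPI app. The model filename, thread count, and route name are assumptions, and the app would still need to be packaged into a container (for example via SageMaker's bring-your-own-container flow) before it could run as a SageMaker endpoint.

    # Minimal sketch: CPU-only endpoint for a 5-bit quantized Mistral 7B GGUF.
    # Assumptions: the GGUF path, thread count, and route are illustrative only.
    from fastapi import FastAPI
    from pydantic import BaseModel
    from llama_cpp import Llama

    app = FastAPI()

    # Q5_K_M is a common 5-bit quantization; this filename is hypothetical.
    llm = Llama(
        model_path="mistral-7b-instruct.Q5_K_M.gguf",
        n_ctx=4096,   # room for the RAG prompt plus retrieved chunks
        n_threads=4,  # m5.xlarge exposes 4 vCPUs
    )

    class Query(BaseModel):
        prompt: str
        max_tokens: int = 256

    @app.post("/generate")
    def generate(q: Query):
        out = llm(q.prompt, max_tokens=q.max_tokens, temperature=0.2)
        return {"text": out["choices"][0]["text"]}

Run locally with something like `uvicorn main:app --host 0.0.0.0 --port 8080`. A Q5_K_M build of Mistral 7B is roughly 5 GB on disk, so it should fit in the 16 GB of RAM on an m5.xlarge, but CPU-only generation will likely be on the order of a few tokens per second, which may or may not be acceptable for an interactive RAG app.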

