[D] Deploying Mistral 7B - Quantization Methods, Hosting Options etc. (for the GPU poor) | allainews.com

March 18, 2024, 7:19 a.m. | /u/Aggravating-Floor-38

Machine Learning www.reddit.com

I'm trying to deploy a Mistral 7B API endpoint for a RAG application I'm building, and there are a few major things I'm confused about. I'm GPU poor :( so I was planning to deploy the model on AWS SageMaker. The 2-month free plan includes 125 hours per month of m4.xlarge or m5.xlarge instance time for inference. Would that be enough to set up an endpoint for a quantized Mistral (I'm thinking 5-bit)? And like if you don't have a GPU …
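For a rough sanity check on the sizing question, here is a back-of-the-envelope sketch in Python. It only estimates the memory taken by the weights at a given bit width and how long 125 free hours last if the endpoint stays up around the clock. The parameter count (about 7.24 billion) and the 16 GiB of RAM for an xlarge-class instance are assumptions, not figures from the post, and real quantized files usually come out somewhat larger than this because of per-block scales. The KV cache and runtime overhead also need headroom on top of the weights.

```python
HOURS_PER_DAY = 24


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def days_of_uptime(free_hours: float) -> float:
    """Days an always-on endpoint can run on a given hour budget."""
    return free_hours / HOURS_PER_DAY


# Assumption: approximate parameter count for Mistral 7B.
N_PARAMS = 7.24e9
# Assumption: RAM available on an m4.xlarge / m5.xlarge class instance.
INSTANCE_RAM_GIB = 16.0
# From the post: free hours per month on the 2-month plan.
FREE_HOURS_PER_MONTH = 125

for bits in (16, 8, 5, 4):
    gib = weight_memory_gib(N_PARAMS, bits)
    share = gib / INSTANCE_RAM_GIB
    print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB ({share:.0%} of instance RAM)")

days = days_of_uptime(FREE_HOURS_PER_MONTH)
print(f"{FREE_HOURS_PER_MONTH} h/month is ~{days:.1f} days of 24/7 uptime")
```

By this estimate the 5-bit weights should fit in RAM with room to spare, so the tighter constraints are more likely CPU inference speed and the hour budget, which does not cover an always-on endpoint for a whole month.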


