In this blog post, you will learn how to accelerate Mixtral using speculative decoding (Medusa) and quantization (AWQ).


