Nov. 27, 2023, 9 p.m. | Venelin Valkov


Discover vLLM, UC Berkeley's open-source library for fast LLM inference, featuring the PagedAttention algorithm that delivers up to 24x higher throughput than Hugging Face Transformers. We'll compare vLLM and Hugging Face using the Llama 2 7B model and learn how to easily integrate vLLM into your projects.
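
For context, here is a minimal sketch of offline batch inference with vLLM's Python API. The model name, prompts, and sampling settings are illustrative assumptions, not taken from the video; adjust them for your setup (requires `pip install vllm` and access to the Llama 2 weights):

    # Minimal sketch: offline batch inference with vLLM.
    # Assumes access to the meta-llama/Llama-2-7b-hf checkpoint (illustrative choice).
    from vllm import LLM, SamplingParams

    # Illustrative prompts and sampling settings.
    prompts = [
        "Explain PagedAttention in one sentence.",
        "Why does KV-cache memory management matter for LLM serving?",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

    # Load the model; vLLM manages the KV cache with PagedAttention under the hood.
    llm = LLM(model="meta-llama/Llama-2-7b-hf")

    # Generate completions for all prompts in one batched call.
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(output.prompt)
        print(output.outputs[0].text)

The batched `generate` call is where vLLM's throughput advantage shows up: requests share GPU memory through paged KV-cache blocks instead of each reserving a full contiguous cache.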

vLLM page: https://blog.vllm.ai/2023/06/20/vllm.html

Discord: https://discord.gg/UaNPxVD6tv
Prepare for the Machine Learning interview: https://mlexpert.io
Subscribe: http://bit.ly/venelin-subscribe
GitHub repository: https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain

Join this channel to get access to the perks and support my work:
https://www.youtube.com/channel/UCoW_WzQNJVAjxo4osNAxd_g/join

00:00 - What is vLLM? …

