July 16, 2023, 3 p.m. | Venelin Valkov


Can you build a private chatbot with ChatGPT-like performance using a local LLM on a single GPU?

Mostly, yes! In this tutorial, we'll use Falcon 7B with LangChain to build a chatbot that retains conversation memory. By loading the model in 8-bit on a single T4 GPU, we can achieve decent performance (~6 tokens/second). We'll also explore techniques to improve output quality and speed, such as:

- Stopping criteria: detect the start of LLM "rambling" and stop the …
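A minimal sketch of the approach described above is shown below: load Falcon 7B in 8-bit with Hugging Face transformers, attach a custom stopping criterion that halts generation when the model starts a new dialogue turn (the "rambling" case), and wrap the pipeline in a LangChain `ConversationChain` with buffer memory. The checkpoint name, generation parameters, and stop strings here are assumptions for illustration, not the exact values used in the video.

```python
# Sketch: Falcon 7B in 8-bit on a single GPU + LangChain conversation memory.
# Model name, generation settings, and stop strings are assumptions.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    pipeline,
)
from langchain.llms import HuggingFacePipeline
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

MODEL_NAME = "tiiuae/falcon-7b-instruct"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    load_in_8bit=True,        # fits on a single T4 (~16 GB) via bitsandbytes
    trust_remote_code=True,
)


class StopOnTokens(StoppingCriteria):
    """Stop generation when the model begins a new dialogue turn."""

    def __init__(self, stop_token_ids):
        self.stop_token_ids = stop_token_ids

    def __call__(self, input_ids, scores, **kwargs):
        for stop_ids in self.stop_token_ids:
            if input_ids[0][-len(stop_ids):].tolist() == stop_ids:
                return True
        return False


# Strings that usually mark the start of a new turn in the conversation prompt
# (assumed; they match LangChain's default "Human:" / "AI:" labels).
stop_token_ids = [
    tokenizer(text, add_special_tokens=False)["input_ids"]
    for text in ("Human:", "AI:")
]

generate = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
    stopping_criteria=StoppingCriteriaList([StopOnTokens(stop_token_ids)]),
    return_full_text=False,
)

llm = HuggingFacePipeline(pipeline=generate)
chatbot = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(chatbot.predict(input="What can you tell me about Falcon 7B?"))
print(chatbot.predict(input="How much GPU memory does it need in 8-bit?"))
```

With the buffer memory in place, each call to `predict` prepends the earlier turns to the prompt, so the model can refer back to previous questions; the stopping criterion keeps it from continuing the conversation on its own past the current answer.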

