March 23, 2023, 2:53 p.m. | James Briggs

James Briggs www.youtube.com

In this video, we focus on preparing our text: loading it with LangChain data loaders, tokenizing it with tiktoken, chunking it with LangChain text splitters, and storing the result with Hugging Face Datasets. The focus is on OpenAI embedding and completion models, but the same logic applies to other LLMs, such as those available via Hugging Face or Cohere.
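To make the chunking step concrete, here is a minimal, dependency-free sketch of fixed-size chunking with overlap. It is not the notebook's code. As an assumption for illustration, it counts whitespace-separated words rather than real tokens; in the setup described above, the count would come from a tiktoken encoding (for example `tiktoken.get_encoding("cl100k_base")`) and the splitting from a LangChain text splitter. The names `chunk_words` and `to_records` and the default sizes are hypothetical.

```python
from typing import Dict, List


def chunk_words(text: str, chunk_size: int = 400, chunk_overlap: int = 20) -> List[str]:
    """Split text into windows of at most chunk_size words.

    Consecutive windows share chunk_overlap words so that context is not
    cut off abruptly at a chunk boundary. Words stand in for tokens here
    (an assumption; a real pipeline would count tokenizer tokens).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    words = text.split()
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def to_records(chunks: List[str], source: str) -> List[Dict[str, object]]:
    """Attach an id and source to each chunk, ready to store as rows."""
    return [
        {"id": f"{source}-{i}", "text": chunk, "source": source}
        for i, chunk in enumerate(chunks)
    ]


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for record in to_records(chunk_words(sample, chunk_size=4, chunk_overlap=1), "demo"):
        print(record)
```

The list of dictionaries returned by `to_records` is the kind of row-oriented structure that could then be written to JSON Lines or loaded into a Hugging Face dataset for storage.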

🔗 Notebook link:
https://github.com/pinecone-io/examples/blob/master/generation/langchain/handbook/xx-langchain-chunking.ipynb

🎙️ Support me on Patreon:
https://patreon.com/JamesBriggs

🎨 AI Art:
https://www.etsy.com/uk/shop/IntelligentArtEU

