March 7, 2024, 6:35 p.m. | Kaggle


About this project: The Yoruba-RAG project focuses on improving the performance of large language models, such as GPT-3, when answering questions in low-resource languages like Yoruba. The project scrapes a Yoruba-language blog with Beautiful Soup, stores the data in a text file, and splits it into smaller chunks. To embed Yoruba text effectively, the Language-agnostic BERT Sentence Embedding (LaBSE) model is used, and the resulting vectors are stored in a Chroma database. This enriched database significantly improves GPT's ability …
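The description does not give the exact chunking parameters the project uses, but the "dividing it into smaller chunks" step can be sketched in plain Python. Below is a minimal, hypothetical `chunk_text` helper assuming character-based chunks with a small overlap (a common default in RAG pipelines so that sentences straddling a boundary appear in at least one full chunk); the chunk list it returns is what would then be passed to LaBSE for embedding and stored in Chroma.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    NOTE: chunk_size/overlap values are illustrative assumptions, not
    the project's actual settings.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by chunk_size minus overlap so consecutive
        # chunks share their boundary text.
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded (e.g. with the `sentence-transformers/LaBSE` model) and inserted into a Chroma collection alongside an ID, so that question-time retrieval can pull the most relevant Yoruba passages into the GPT prompt.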

