Feb. 19, 2024, 9:47 p.m. | Eva Revear

Towards Data Science - Medium towardsdatascience.com

Building a Semantic Book Search: Scale an Embedding Pipeline with Apache Spark and AWS EMR Serverless

Using OpenAI’s CLIP model to support natural language search on a collection of 70k book covers

In a previous post I built a small proof of concept to see if I could use OpenAI’s CLIP model to build a semantic book search. It worked surprisingly well, in my opinion, but I couldn’t help wondering if it would be better with more data. The …

apache spark data engineering python semantic-search spark
