June 9, 2023, 11:47 p.m. | /u/davidmezzetti

Machine Learning www.reddit.com

This project explores baseball history using similarity search, the [Baseball Databank](https://github.com/chadwickbureau/baseballdatabank) dataset available on GitHub, [Streamlit](https://github.com/streamlit/streamlit) and [txtai](https://github.com/neuml/txtai).

Raw data is automatically downloaded from the Baseball Databank project and indexed. Two separate indexes are created: one for batting stats and one for pitching stats. Both use the same indexing pipeline, shown below.

[Image: indexing pipeline diagram]
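The project itself uses txtai for indexing and search. As a hedged, dependency-free illustration of "one pipeline, two indexes", here is a minimal sketch that assumes each season is represented as a vector of numeric stats. The field lists, the `id` key and the `SeasonIndex` class are hypothetical and are not txtai's API. Each column is z-scored before cosine similarity so that large counting stats don't drown out small rate stats such as ERA.

```python
import math
import statistics

# Hypothetical field choices for illustration; the real project's features may differ.
BATTING_FIELDS = ["H", "HR", "BB", "SO"]
PITCHING_FIELDS = ["W", "SO", "BB", "ERA"]


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class SeasonIndex:
    """Brute-force similarity index over player seasons (hypothetical, not txtai)."""

    def __init__(self, fields):
        self.fields = fields
        self.ids, self.vectors = [], []
        self.means, self.stdevs = [], []

    def _vectorize(self, row):
        # Z-score each field, then normalize, so every stat contributes on one scale.
        raw = [float(row[f]) for f in self.fields]
        scaled = [(x - m) / s for x, m, s in zip(raw, self.means, self.stdevs)]
        return normalize(scaled)

    def index(self, rows):
        # Same pipeline for batting and pitching: only the field list differs.
        columns = [[float(r[f]) for r in rows] for f in self.fields]
        self.means = [statistics.fmean(c) for c in columns]
        self.stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
        self.ids = [r["id"] for r in rows]
        self.vectors = [self._vectorize(r) for r in rows]

    def search(self, row, limit=10):
        """Return the `limit` most similar (id, score) pairs, best first."""
        query = self._vectorize(row)
        scores = [
            (uid, sum(a * b for a, b in zip(query, vec)))
            for uid, vec in zip(self.ids, self.vectors)
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:limit]
```

Building the pitching index is the same call with `SeasonIndex(PITCHING_FIELDS)` and pitching rows. The brute-force scan keeps the sketch short; a dedicated vector index such as the one txtai provides avoids comparing the query against every season.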

The application shows the name of the player, the year, a trend of their OPS+ over time and the 10 most similar seasons. This …
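OPS+ compares a player's on-base and slugging percentages with the league's for the same season, so 100 is league average. The post doesn't say how the application computes it; the sketch below uses the common unadjusted formula OPS+ = 100 * (OBP/lgOBP + SLG/lgSLG - 1), whereas published versions such as Baseball-Reference's also adjust for ballpark. It assumes Baseball Databank-style batting columns (`yearID`, `H`, `2B`, `3B`, `HR`, `AB`, `BB`, `HBP`, `SF`) and a hypothetical `league` mapping from year to league-wide (OBP, SLG).

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = ab + bb + hbp + sf
    return (h + bb + hbp) / denom if denom else 0.0


def slg(h, doubles, triples, hr, ab):
    """Slugging percentage: total bases / at-bats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab if ab else 0.0


def ops_plus(p_obp, p_slg, lg_obp, lg_slg):
    """Unadjusted OPS+: 100 * (OBP/lgOBP + SLG/lgSLG - 1); 100 is league average."""
    return 100 * (p_obp / lg_obp + p_slg / lg_slg - 1)


def ops_plus_trend(seasons, league):
    """Return (year, OPS+) pairs in chronological order.

    `seasons` are dicts with Databank-style batting columns (missing HBP/SF
    assumed already filled with 0); `league` is a hypothetical
    {year: (lgOBP, lgSLG)} mapping.
    """
    trend = []
    for s in sorted(seasons, key=lambda row: row["yearID"]):
        lg_obp, lg_slg = league[s["yearID"]]
        p_obp = obp(s["H"], s["BB"], s["HBP"], s["AB"], s["SF"])
        p_slg = slg(s["H"], s["2B"], s["3B"], s["HR"], s["AB"])
        trend.append((s["yearID"], round(ops_plus(p_obp, p_slg, lg_obp, lg_slg))))
    return trend
```

The resulting (year, OPS+) pairs are what a trend chart would plot; sorting by `yearID` keeps the line chronological even if seasons arrive out of order.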
