June 9, 2023, 11:47 p.m. | /u/davidmezzetti

Machine Learning www.reddit.com

This project explores baseball history using similarity search, the [Baseball Databank](https://github.com/chadwickbureau/baseballdatabank) dataset available on GitHub, [Streamlit](https://github.com/streamlit/streamlit) and [txtai](https://github.com/neuml/txtai).

Raw data is automatically downloaded from the Baseball Databank project and indexed. Two separate indexes are created, one for batting stats and one for pitching stats. The indexing pipeline is the same for both and is shown below.

https://preview.redd.it/9bn3gb5yw25b1.png?width=720&format=png&auto=webp&v=enabled&s=76f4cc6a6778b59c4c2ab2aec95421fd18bab1f1
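
To make the shared pipeline idea concrete, here is a minimal sketch in plain Python rather than txtai. It assumes each player season is turned into a numeric vector of stats, which the post does not spell out. `build_index`, the chosen stat columns, and the sample rows are all hypothetical placeholders, not real Baseball Databank records or the project's actual code. The same function builds both indexes; only the stat columns differ.

```python
import math


def zscore_columns(rows):
    """Standardize each stat column to mean 0 and unit variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def unit(vector):
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def build_index(records, stat_names):
    """Map (playerID, yearID) to a normalized stat vector (hypothetical helper)."""
    keys = [(r["playerID"], r["yearID"]) for r in records]
    rows = [[float(r[name]) for name in stat_names] for r in records]
    vectors = [unit(v) for v in zscore_columns(rows)]
    return dict(zip(keys, vectors))


# Placeholder rows for illustration only, not real player data
batting = [
    {"playerID": "player_a", "yearID": 2001, "HR": 40, "BB": 90, "SO": 100},
    {"playerID": "player_b", "yearID": 2001, "HR": 10, "BB": 30, "SO": 60},
    {"playerID": "player_c", "yearID": 2001, "HR": 25, "BB": 50, "SO": 150},
]
pitching = [
    {"playerID": "player_d", "yearID": 2001, "W": 20, "SO": 250, "ER": 60},
    {"playerID": "player_e", "yearID": 2001, "W": 8, "SO": 110, "ER": 95},
    {"playerID": "player_f", "yearID": 2001, "W": 14, "SO": 180, "ER": 70},
]

# One pipeline, two indexes
batting_index = build_index(batting, ["HR", "BB", "SO"])
pitching_index = build_index(pitching, ["W", "SO", "ER"])
```

Standardizing each column first keeps a large-valued stat from dominating the comparison. That is a choice made for this sketch, not something the post confirms.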

The application shows the name of the player, the year, a trend of their OPS+ over time and the 10 most similar seasons. This …
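
As a rough illustration of the "10 most similar seasons" lookup, the sketch below ranks seasons by cosine similarity in plain Python. It is an assumption about how such a query could work, not the project's txtai-based implementation. `most_similar`, the index layout, and the sample vectors are hypothetical.

```python
import heapq


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(index, query_key, limit=10):
    """Return up to `limit` (key, score) pairs, best match first."""
    query = index[query_key]
    # Skip the query season itself so it does not top its own results
    candidates = (
        (cosine(query, vector), key)
        for key, vector in index.items()
        if key != query_key
    )
    return [(key, score) for score, key in heapq.nlargest(limit, candidates)]


# Placeholder vectors keyed by (player, year); not real data
index = {
    ("player_a", 2001): [1.0, 0.0],
    ("player_b", 2001): [0.9, 0.1],
    ("player_c", 2001): [0.0, 1.0],
    ("player_a", 2002): [-1.0, 0.0],
}

results = most_similar(index, ("player_a", 2001), limit=2)
```

A brute-force scan like this is fine for a small table. A vector index such as the one txtai provides is the usual choice once the number of seasons grows.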

