May 6, 2024, 10:06 a.m. | /u/Boring_Astronaut_421

Machine Learning www.reddit.com

Hello people,
I am currently working as a data scientist at a startup. We need to extract entities from a text corpus of 10 billion tokens, and I don't know how to do this at such a scale. What should the pipeline look like? It would be helpful if you could share your knowledge or point me to a good research paper or blog post. We are currently working on 18 entities, and my boss wants me to reach 93% accuracy.
Thank you

