Optimizing for efficiency/memory use with spaCy and dask when preprocessing ~30M medium-large strings
May 26, 2022, 2:06 p.m. | /u/synthphreak
Natural Language Processing www.reddit.com
My current setup is that the comment strings are stored as a `dask.Series`. At first I was using `dask` methods to clean the comments in parallel (this step involves multiple passes, each using regex), then using `apply(nlp)` to convert each comment into a `spacy` `Doc` (this just uses …
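To make the cleaning step concrete, here is a minimal sketch of one way to fold several regex passes into a single function applied once per string, and to hand the results on in fixed-size batches. The patterns, the function names (`clean_comment`, `batched`, `clean_in_batches`), and the batch size are all hypothetical; the post does not show its actual regexes or code. The sketch uses only the standard library. In the setup described above, a function like `clean_comment` could plausibly be applied per partition with dask's `map_partitions`, and each batch passed to spaCy's `nlp.pipe` instead of calling `nlp` on one string at a time, but whether that helps for this workload is untested here.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical cleaning rules, compiled once at import time.
# The original post does not list its actual patterns.
_CLEANING_RULES = [
    (re.compile(r"https?://\S+"), " "),  # drop URLs
    (re.compile(r"<[^>]+>"), " "),       # drop HTML-like tags
    (re.compile(r"\s+"), " "),           # collapse runs of whitespace
]


def clean_comment(text: str) -> str:
    """Apply every rule in order, so each string is handled in one call."""
    for pattern, replacement in _CLEANING_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def clean_in_batches(
    comments: Iterable[str], batch_size: int = 1000
) -> Iterator[List[str]]:
    """Clean comments lazily, one batch at a time, to bound memory use."""
    for batch in batched(comments, batch_size):
        yield [clean_comment(comment) for comment in batch]
```

Because `clean_in_batches` is a generator, only one batch of cleaned strings is held in memory at a time; the batch size is a tuning knob, not a recommendation.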