Sept. 28, 2022, 12:25 a.m. | /u/Ok-Vermicelli9298

Natural Language Processing www.reddit.com

Hi!

I need to clean over 30 million tweets to train my ML model. I have tried regex and clean-text for basic cleaning such as removing punctuation and emojis, but my script takes over an hour to run, which is not optimal. Is this usual? If not, what alternative libraries can I use?

I am quite new to NLP, so I have no clue about other libraries.
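
For context, here is a minimal sketch of the kind of regex-based cleaning described above. It is not the actual script: the patterns, the lowercasing step, and the function name `clean_tweet` are all assumptions for illustration, using only Python's standard `re` module.

```python
import re

# Compile each pattern once up front so the per-tweet work is only substitution.
# All patterns below are illustrative assumptions, not the original script.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Anything that is neither a word character nor whitespace:
# this covers punctuation and emojis in a single pass.
NON_WORD_RE = re.compile(r"[^\w\s]")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions, punctuation, and emojis, then normalize spacing."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Collapse runs of whitespace left behind by the removals and lowercase.
    return SPACE_RE.sub(" ", text).strip().lower()


if __name__ == "__main__":
    print(clean_tweet("Hello, @bob!! Check https://t.co/abc 😀 now"))
```

Applied to a file, something like this would be called once per line, so the total runtime depends mostly on how many passes each tweet goes through.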

languagetechnology twitter
