July 31, 2023, 1:20 p.m. | Shivamshinde

Towards AI - Medium pub.towardsai.net

This article discusses the text cleaning techniques that are essential for getting the best performance out of textual data.

Photo by Amador Loureiro on Unsplash

For the demonstration of the text cleaning methods, we will use the text dataset named ‘metamorphosis’ from Kaggle.

Let’s start by importing the required Python libraries for the cleaning process.

import nltk, re, string
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

Now let’s load the dataset.

file_directory = 'link-to-the-dataset-local-directory'
file = open(file_directory, 'rt', encoding='utf-8') …
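
The excerpt stops before the cleaning steps themselves, so the following is only a minimal sketch of what loading and basic cleaning could look like, not the article's own implementation. It uses the standard library alone so it runs without any NLTK downloads. The helper names `load_text` and `clean_text`, the tiny `STOPWORDS` set, and the sample sentence are all hypothetical, chosen for illustration.

```python
import re
import string

# Hypothetical mini stopword list, for illustration only. The imports above
# suggest the article uses NLTK's English stopword list instead.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "was", "it"}


def load_text(path):
    """Read a whole UTF-8 text file; the with-block closes the file handle."""
    with open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


def clean_text(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stopwords."""
    lowered = text.lower()
    # string.punctuation covers ASCII punctuation only; curly quotes survive.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = re.split(r"\s+", no_punct.strip())
    return [token for token in tokens if token and token not in STOPWORDS]


# Made-up sample sentence, not taken from the dataset.
sample = "The door was locked, and Gregor waited."
print(clean_text(sample))  # ['door', 'locked', 'gregor', 'waited']
```

In practice, `load_text` would be called with the dataset path and its result passed to `clean_text`. A stemming step, for instance with the `PorterStemmer` imported above, could then be applied to each remaining token.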
