June 24, 2023, 2:29 a.m. | /u/newauthry

Data Science www.reddit.com

When building NLP vector embeddings for media, how do I handle and remove the proper nouns in each work? tf-idf also seems like a useless metric here, right? The highest tf-idf words will be words exclusive to a single work that aren't useful overall (e.g. "Jedi" will have a high tf-idf score for Star Wars but is useless for comparison, since no other media uses it). The opposite side of this is also a problem: if a character's name is …
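A minimal sketch of the tf-idf concern, assuming lowercased, whitespace-tokenized text and plain idf = log(N / df). A term confined to one work gets the maximum possible idf, so it tops that work's ranking, and dropping terms with low document frequency removes it. The toy corpus and the helper names `tfidf` and `filter_by_df` are hypothetical illustrations. scikit-learn's `TfidfVectorizer` offers a similar cut through its `min_df` parameter, though its default idf is smoothed rather than plain log(N / df).

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain tf-idf per document: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return scores, df


def filter_by_df(docs, df, min_df=2):
    """Drop terms that appear in fewer than min_df documents (e.g. names unique to one work)."""
    return [[t for t in doc if df[t] >= min_df] for doc in docs]


# Hypothetical toy corpus: one token list per work.
docs = [
    "jedi fight with hope and the jedi win".split(),
    "hobbits fight with hope and friendship".split(),
    "wizards fight with friendship and hope".split(),
]

scores, df = tfidf(docs)
top = max(scores[0], key=scores[0].get)  # highest-scoring term in the first work
filtered = filter_by_df(docs, df, min_df=2)
print(top)
print(filtered[0])
```

A document-frequency cut only removes terms confined to a single work. It also discards rare but meaningful words, and it does not touch terms that recur across several works, so the threshold is a trade-off rather than a fix.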
