Why is the TF-IDF matrix I calculated manually different from the sklearn TfidfVectorizer matrix?
Sept. 27, 2022, 1:59 p.m. | /u/Informal_Truck_1235
Data Science www.reddit.com
[TF-IDF manually calculated](https://preview.redd.it/clwuxvtqpeq91.png?width=397&format=png&auto=webp&s=ae6788cefae9b83b3033db3612ec83e8f5ded24f)
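The manual table survives only as an image link, so the exact formula behind it is not visible here. Assuming it follows a common textbook definition (an assumption, not confirmed by the post), it differs from the documented `TfidfVectorizer` defaults (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`) in three places: sklearn uses raw counts for tf, adds smoothing plus 1 to the idf, and L2-normalizes each row.

```latex
% Assumed textbook variant (f_{t,d} = count of term t in document d, N = number of documents)
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{\mathrm{df}(t)}, \qquad
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t)

% sklearn TfidfVectorizer defaults
\mathrm{idf}(t) = \ln\frac{1+N}{1+\mathrm{df}(t)} + 1, \qquad
w_{t,d} = f_{t,d}\,\mathrm{idf}(t), \qquad
\hat{w}_{t,d} = \frac{w_{t,d}}{\sqrt{\sum_{t'} w_{t',d}^{2}}}

% Example: "good" occurs in all N = 3 sentences
\text{textbook: } \log\tfrac{3}{3} = 0, \qquad
\text{sklearn: } \ln\tfrac{4}{4} + 1 = 1
```

So a word that appears in every sentence gets weight 0 under the textbook formula but a nonzero weight in sklearn.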
Calculated with **sklearn `TfidfVectorizer`**:
```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer

paragraph = """he is good boy.
she is good girl.
boy and girl are good."""

# Cleaning the texts
ps = PorterStemmer()
wordnet = WordNetLemmatizer()
# Split the paragraph into sentences (requires NLTK's "punkt" tokenizer data)
sentences = nltk.sent_tokenize(paragraph)
corpus = []
for i in range(len(sentences)):
    # Keep letters only, lowercase, then split into words
    review = re.sub('[^a-zA-Z]', ' ', sentences[i])
    review = review.lower()
    review = review.split()
    review = [wordnet.lemmatize(word) for word in review if not …
```
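To see where the numbers diverge, here is a minimal pure-Python sketch that computes both variants side by side. It assumes a hypothetical cleaned corpus (the three sentences lowercased with punctuation removed, with no stopword removal or lemmatization) because the post's own filtering step is cut off. The helper names `textbook_tfidf` and `sklearn_default_tfidf` are illustrative, and the "textbook" variant (tf = count / sentence length, idf = ln(N / df), no normalization) is only an assumption about what the manual table used. The second function mirrors the documented `TfidfVectorizer` defaults.

```python
import math
from collections import Counter

# Hypothetical cleaned corpus: the post's three sentences, lowercased with
# punctuation removed; stopword removal and lemmatization are NOT applied.
corpus = ["he is good boy", "she is good girl", "boy and girl are good"]


def _prepare(docs):
    """Tokenize on whitespace; return tokens, sorted vocabulary, and document frequencies."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})  # alphabetical, like sklearn's columns
    df = {t: sum(t in toks for toks in tokenized) for t in vocab}
    return tokenized, vocab, df


def textbook_tfidf(docs):
    """Assumed textbook variant: tf = count / doc length, idf = ln(N / df), no normalization."""
    tokenized, vocab, df = _prepare(docs)
    n = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] / len(toks) * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def sklearn_default_tfidf(docs):
    """TfidfVectorizer defaults: raw counts, idf = ln((1 + N) / (1 + df)) + 1, L2 row norm."""
    tokenized, vocab, df = _prepare(docs)
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))  # Euclidean length of the row
        rows.append([v / norm for v in row])
    return vocab, rows


vocab, manual = textbook_tfidf(corpus)
_, library = sklearn_default_tfidf(corpus)
# Compare the first sentence term by term
for term, m, s in zip(vocab, manual[0], library[0]):
    print(f"{term:>5}  textbook={m:.4f}  sklearn-style={s:.4f}")
```

With scikit-learn installed, the rows of `sklearn_default_tfidf(corpus)` should match `TfidfVectorizer().fit_transform(corpus).toarray()` up to floating-point error, with columns in the order given by `get_feature_names_out()`. Whitespace splitting stands in for sklearn's default `token_pattern`, which also drops single-character tokens; that only matters for corpora that contain them. Passing `smooth_idf=False, norm=None` removes the smoothing and the normalization, but sklearn still adds 1 to the idf, so a term present in every document keeps a nonzero weight.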