Unicode error in vectorizing text
Jan. 18, 2022, 12:54 p.m. | /u/Bishwa12
Natural Language Processing www.reddit.com
I am trying to vectorize the 20 Newsgroups data using the TensorFlow TextVectorization layer. If I limit the vocabulary size to some number, say 10000, it works fine. However, if I preprocess the data, or do not set the vocabulary size at all, I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 2257: invalid start byte
My question is: have I done something wrong in preprocessing? Because if I set the vocab …
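A minimal sketch of what this error usually means (this is not the OP's code, and the sample bytes are hypothetical): the byte 0xfe can never begin a valid UTF-8 sequence, so any text in the corpus stored in another encoding (e.g. Latin-1, common in the 20 Newsgroups files) will raise exactly this UnicodeDecodeError. One common workaround is to decode the raw bytes with an error handler, or as Latin-1, before handing the strings to TextVectorization:

```python
# Hypothetical bytes resembling a Latin-1-encoded newsgroup post.
raw = b"news \xfe caf\xe9"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print(e.reason)  # 0xfe is never a valid UTF-8 start byte

# Two workarounds before feeding the text to a TextVectorization layer:
lossy = raw.decode("utf-8", errors="replace")  # bad bytes become U+FFFD
latin = raw.decode("latin-1")                  # every byte maps to some char
print(lossy)
print(latin)
```

This would explain the observed behavior: with a small vocab, the layer may never touch the offending document's bytes in a way that forces a full UTF-8 decode, while custom preprocessing (or building the full vocabulary) does. Cleaning the corpus to valid UTF-8 up front sidesteps the problem entirely.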