Feb. 2, 2024, 2:08 p.m. | /u/aMnHa7N0Nme

Machine Learning www.reddit.com

So I have a task I'm trying to solve at the moment: the input strings in my datasets are 10-20K in length. What would be the best way to handle this with the tokenizer for BertForSequenceClassification?



I have checked Longformer and BigBird, but they are also limited in sequence length (4096 tokens, versus BERT's 512), which is still far short of my inputs.

Some help in this matter would be greatly appreciated.
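One common workaround (not part of the original post) is to split each long document into overlapping 512-token windows with the Hugging Face fast tokenizer's `return_overflowing_tokens` and `stride` options, run BertForSequenceClassification on each window, and aggregate the per-window predictions. A minimal sketch, assuming a `bert-base-uncased` checkpoint, a two-class task, and simple logit averaging as the aggregation rule:

```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

# Hypothetical document far longer than BERT's 512-token limit.
long_text = "word " * 15000

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

# Split the document into overlapping 512-token windows; the fast tokenizer
# returns each window as a separate row in the batch.
enc = tokenizer(
    long_text,
    max_length=512,
    stride=128,                      # overlap between consecutive windows
    truncation=True,
    return_overflowing_tokens=True,
    padding="max_length",
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
    ).logits

# One simple way to combine window-level predictions: average the logits
# across all windows of the document, then take the argmax.
doc_logits = logits.mean(dim=0)
pred = doc_logits.argmax().item()
print(pred)
```

Other aggregation choices (max-pooling the logits, majority vote over window predictions, or a small classifier over window embeddings) are also used in practice; which works best depends on the task.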

