Aug. 2, 2023, 1:43 p.m. | /u/juopitz

Machine Learning www.reddit.com

Hi,

Like other folks, I was quite curious about the recent [GZIP paper](https://aclanthology.org/2023.findings-acl.426/) presented at ACL 2023, in which the authors report strong text classification performance using a compression-based distance function in a KNN model.
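For readers who haven't seen the paper, here is a minimal sketch of the general idea. It assumes the normalized compression distance (NCD) computed with Python's standard `gzip` module and a plain majority-vote kNN. The helper names (`clen`, `ncd`, `knn_predict`) are hypothetical, and this is not the paper's released code.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small when x and y share content."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)  # compress the concatenated pair
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts closest to query under NCD."""
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ("the stock market rallied as investors cheered strong earnings reports", "business"),
    ("the striker scored twice as the home team won the championship final", "sports"),
]
print(knn_predict(train[0][0], train, k=1))
```

Every prediction needs one compression per (query, training text) pair. This pairwise cost is why speed comes up when comparing it with simpler distances.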

However, in the end, I am not sure whether GZIP can fully live up to the hype. I tested a very simple bag-of-words distance and found that it can achieve better results than GZIP while also being faster.
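The post does not say which bag-of-words distance was used. As an assumption, the sketch below keeps the same kNN classifier but swaps the gzip distance for cosine distance between word-count vectors. The names `bow`, `bow_distance` and `knn_predict` are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased word counts (a simple bag of words)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def bow_distance(x: str, y: str) -> float:
    """Cosine distance between the bag-of-words count vectors of x and y."""
    a, b = bow(x), bow(y)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0  # treat empty texts as maximally distant


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts closest to query."""
    neighbors = sorted(train, key=lambda pair: bow_distance(query, pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


train = [
    ("the stock market rallied as investors cheered strong earnings reports", "business"),
    ("the striker scored twice as the home team won the championship final", "sports"),
]
print(knn_predict("investors cheered the earnings", train, k=1))
```

For simplicity, this sketch recomputes the counts on every call. In practice, each text's counts can be computed once and cached, whereas NCD has to compress every concatenated pair. That is one plausible reason a bag-of-words distance runs faster.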

In a nutshell, I think …
