Aug. 2, 2023, 1:43 p.m. | /u/juopitz

Machine Learning www.reddit.com

Hi,

Like many others, I was quite curious about the recent [GZIP paper](https://aclanthology.org/2023.findings-acl.426/) presented at ACL 2023, in which the authors demonstrate strong text classification performance using a compression-based distance function in a kNN model.
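For anyone who has not looked at the method yet, here is a minimal sketch of the idea: a normalized compression distance (NCD) computed with gzip, plugged into a plain kNN vote. The helper names (`clen`, `ncd`, `knn_predict`) and the toy data layout are my own for illustration and are not taken from the paper's code.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # compress the concatenation
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Majority label among the k training texts closest to the query.

    `train` is a list of (text, label) pairs (a hypothetical layout).
    """
    neighbours = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Note that every prediction compresses the query together with each training text, so the cost grows linearly with the size of the training set.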

However, in the end, I am not sure whether GZIP can fully live up to the hype. I tested a very simple bag-of-words distance and found that it achieves better results than GZIP while also being faster.
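To make "bag-of-words distance" concrete, here is one possible sketch: a Jaccard distance over the sets of whitespace-separated, lowercased words, used in the same kind of kNN vote. This excerpt does not spell out the exact distance, so the Jaccard choice, the tokenization, and the function names are assumptions for illustration only.

```python
from collections import Counter


def bow(text: str) -> set[str]:
    """Bag (here: set) of lowercased, whitespace-separated words."""
    return set(text.lower().split())


def bow_distance(a: str, b: str) -> float:
    """Jaccard distance between word sets (an assumed stand-in distance)."""
    wa, wb = bow(a), bow(b)
    union = wa | wb
    if not union:  # both texts empty: treat as identical
        return 0.0
    return 1.0 - len(wa & wb) / len(union)


def knn_predict_bow(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Majority label among the k training texts closest to the query.

    `train` is a list of (text, label) pairs (a hypothetical layout).
    """
    neighbours = sorted(train, key=lambda pair: bow_distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Unlike the compression distance, the word sets of the training texts can be computed once and reused, which is one plausible reason such a baseline runs faster.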

In a nutshell, I think …
