March 1, 2024, 5:43 a.m. | Zhaoyuan Su, Ammar Ahmed, Zirui Wang, Ali Anwar, Yue Cheng

cs.LG updates on arXiv.org

arXiv:2402.13429v1 Announce Type: cross
Abstract: The number of pre-trained machine learning (ML) models is growing exponentially, but data reduction tools are not keeping pace. Existing data reduction techniques are not designed specifically for pre-trained model (PTM) dataset files, largely because the patterns and characteristics of these datasets, especially those relevant to data reduction and compressibility, are poorly understood.
This paper presents the first exhaustive analysis to date of the storage compressibility of PTM datasets. Our analysis …

