March 21, 2024, 4:48 a.m. | Dongsheng Zhu, Zhenyu Mao, Jinghui Lu, Rui Zhao, Fei Tan

cs.CL updates on arXiv.org

arXiv:2210.03963v2 Announce Type: replace
Abstract: Contrastive learning has recently achieved compelling performance in unsupervised sentence representation. As an essential element, however, data augmentation protocols have not been well explored. The pioneering work SimCSE, which resorts to a simple dropout mechanism (viewed as continuous augmentation), surprisingly dominates discrete augmentations such as cropping, word deletion, and synonym replacement, as reported. To understand the underlying rationales, we revisit existing approaches and attempt to hypothesize the desiderata of reasonable data augmentation methods: balance of semantic …
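The dropout trick the abstract refers to can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation (SimCSE applies dropout inside a Transformer encoder); here dropout is simulated directly on toy embedding vectors, and the two dropped-out views of each sentence serve as a positive pair in an InfoNCE-style contrastive loss, with other in-batch sentences as negatives. All names and values below are illustrative assumptions.

```python
import random
import math

def dropout_view(vec, p=0.1, rng=random):
    # Continuous augmentation: randomly zero features and rescale the rest,
    # mimicking how dropout yields two slightly different views of one input.
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(view1, view2, temperature=0.05):
    # InfoNCE over a batch: view1[i] and view2[i] are positives,
    # every other pair in the batch acts as a negative.
    losses = []
    for i, anchor in enumerate(view1):
        sims = [cosine(anchor, v) / temperature for v in view2]
        log_denom = math.log(sum(math.exp(s - max(sims)) for s in sims)) + max(sims)
        losses.append(log_denom - sims[i])
    return sum(losses) / len(losses)

random.seed(0)
# Toy "sentence embeddings"; in SimCSE these come from the encoder itself.
embeddings = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
v1 = [dropout_view(e) for e in embeddings]
v2 = [dropout_view(e) for e in embeddings]
loss = info_nce(v1, v2)
```

Because the two views of the same sentence share most coordinates, their cosine similarity stays high relative to the negatives, so the loss is small; a discrete augmentation that distorts semantics would push positive pairs apart and raise it.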

