Web: http://arxiv.org/abs/2201.10005

Jan. 26, 2022, 2:11 a.m. | Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johann

cs.LG updates on arXiv.org

Text embeddings are useful features in many applications such as semantic
search and computing text similarity. Previous work typically trains models
customized for different use cases, varying in dataset choice, training
objective, and model architecture. In this work, we show that contrastive
pre-training on unsupervised data at scale leads to high-quality vector
representations of text and code. The same unsupervised text embeddings that
achieve new state-of-the-art results in linear-probe classification also
display impressive semantic search capabilities and sometimes even …
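
The abstract includes no training code, so below is a minimal sketch, assuming a standard PyTorch setup, of the symmetric in-batch contrastive (InfoNCE) objective it describes. The random "query"/"document" tensors, the temperature of 0.07, and the nearest-neighbor lookup at the end are illustrative stand-ins, not the paper's actual encoders or hyperparameters.

import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.07):
    # Symmetric InfoNCE: paired rows of `a` and `b` are positives;
    # every other row in the batch serves as an in-batch negative.
    a = F.normalize(a, dim=-1)         # unit-length embeddings, so the
    b = F.normalize(b, dim=-1)         # dot product is cosine similarity
    logits = a @ b.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))  # positive pairs lie on the diagonal
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: 8 paired 64-dimensional "query"/"document" embeddings.
queries = torch.randn(8, 64)
documents = torch.randn(8, 64)
print(contrastive_loss(queries, documents).item())

# Semantic search with the same embeddings: rank all documents by cosine
# similarity to one query and keep the three nearest neighbors.
sims = F.normalize(queries[:1], dim=-1) @ F.normalize(documents, dim=-1).t()
print(sims.topk(k=3).indices)

Trained this way, a single embedding space serves both linear-probe classification (fitting a linear classifier on frozen embeddings) and nearest-neighbor semantic search, which is the dual use the abstract highlights.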

Tags: arxiv, code, text, training
