Feb. 19, 2024, 5:48 a.m. | Yiyi Chen, Heather Lent, Johannes Bjerva

cs.CL updates on arXiv.org arxiv.org

arXiv:2401.12192v2 Announce Type: replace
Abstract: Textual data is often represented as realnumbered embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be vulnerable to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defence mechanisms have been explored, these are exclusively focused on English, leaving other languages vulnerable to attacks. This work explores …

abstract arxiv breaches cs.ai cs.cl cs.cr data embedding embeddings information language language models large language large language models llms multilingual nlp research security security breaches service shows text text embedding textual type vulnerable

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US