Jan. 8, 2024, 5:22 a.m.

Simon Willison's Weblog (simonwillison.net)

Text Embeddings Reveal (Almost) As Much As Text


Embeddings of text - where a text string is converted into a fixed-length array of floating-point numbers - are demonstrably reversible: "a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly".
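For illustration, here is a minimal sketch of what such an embedding looks like, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (neither is specified in the post; any embedding model would make the same point):

```python
# A minimal sketch, assuming the sentence-transformers package and the
# all-MiniLM-L6-v2 model. Any embedding model illustrates the same idea.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embedding a short private string produces a fixed-length array of floats
# (384 dimensions for this model), regardless of the input length.
embedding = model.encode("Patient 4521 was prescribed 40mg of drug X")
print(embedding.shape)   # (384,)
print(embedding[:5])     # first few floating-point components
```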

This means that if you're using a vector database for embeddings of private data, you need to treat those embedding vectors with the same level of protection as the original …
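The recovery method quoted above can be sketched, very loosely, as a guess-and-refine loop. The embed() and correct() functions below are hypothetical stand-ins for a real embedding model and a trained correction model, not the paper's actual API:

```python
# A loose sketch of the "iteratively corrects and re-embeds" idea, NOT the
# paper's implementation. embed() and correct() are hypothetical stand-ins
# for a real embedding model and a trained correction/inversion model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in: a real attack calls the same embedding model that produced
    # the target vector being inverted.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def correct(hypothesis: str, target: np.ndarray, current: np.ndarray) -> str:
    # Stand-in: a real attack uses a trained model that proposes improved
    # text given the target embedding, the current guess and its embedding.
    return hypothesis  # placeholder: returns the guess unchanged

def invert(target: np.ndarray, steps: int = 10) -> str:
    hypothesis = ""                       # start from an initial guess
    for _ in range(steps):
        current = embed(hypothesis)       # re-embed the current guess
        if np.allclose(current, target):  # stop when the embeddings match
            break
        hypothesis = correct(hypothesis, target, current)
    return hypothesis
```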

ai array embeddings floating point inputs numbers privacy security string text token

AI Engineer Intern, Agents @ Occam AI | US

AI Research Scientist @ Vara | Berlin, Germany and Remote

Data Architect @ University of Texas at Austin | Austin, TX

Data ETL Engineer @ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist @ Lurra Systems | Melbourne

Lead Data Modeler @ Sherwin-Williams | Cleveland, OH, United States