March 26, 2024, 6:19 a.m.

Simon Willison's Weblog (simonwillison.net)

Cohere int8 & binary Embeddings - Scale Your Vector Database to Large Datasets


Jo Kristian Bergum told me: "The accuracy retention [of binary embedding vectors] is sensitive to whether the model has been using this binarization as part of the loss function."


Cohere provide an API for embeddings, and last week added support for returning binary vectors specifically tuned in this way.
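To make "binary vectors" concrete, here is a minimal sketch of the general technique. It assumes sign-based thresholding: each dimension becomes 1 if its value is greater than 0, else 0. The bits are packed 8 per byte, and vectors are compared by Hamming distance. It is not necessarily how Cohere derives its binary vectors server-side, and the function names (`binarize`, `hamming_distance`, `nearest`) are hypothetical, chosen for illustration.

```python
from typing import Sequence


def binarize(vector: Sequence[float]) -> bytes:
    """Pack a float vector into bytes, one bit per dimension.

    Assumption (hypothetical scheme): a dimension maps to 1 if its value
    is > 0, else 0 (so exactly 0.0 becomes 0). Bits are packed
    most-significant-bit first; a trailing partial byte is zero-padded.
    """
    out = bytearray((len(vector) + 7) // 8)
    for i, value in enumerate(vector):
        if value > 0:
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions where two packed binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 wherever the bits differ; count those 1s per byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def nearest(query: Sequence[float], corpus: list[Sequence[float]]) -> int:
    """Return the index of the corpus vector closest to query in Hamming space.

    Brute-force scan for illustration only; a vector database would use an index.
    """
    q = binarize(query)
    packed = [binarize(v) for v in corpus]
    return min(range(len(packed)), key=lambda i: hamming_distance(q, packed[i]))


if __name__ == "__main__":
    docs = [[0.2, -0.1, 0.7, -0.3], [-0.5, 0.4, -0.2, 0.9]]
    print(nearest([0.1, -0.2, 0.3, -0.4], docs))  # 0
```

The query and the first document share the same sign pattern, so their Hamming distance is 0. The second document differs in all four bits. Hamming distance reduces to XOR plus a popcount, which is why binary vectors are cheap to compare as well as to store. Bergum's point above is why this works well for some models and not others: if training never saw this thresholding, flipping each dimension to a single bit can discard more of the useful signal.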


250M embeddings (Cohere provide a downloadable dataset of 250M embedded documents from Wikipedia) at float32 (4 bytes) is …
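The storage savings come down to bytes per dimension: 4 for float32, 1 for int8, and 1/8 for binary (one bit). The sketch below computes raw storage per vector and for a collection. The embedding dimension is left as a parameter because it is an assumption here, not a figure taken from the post, and index and metadata overhead are ignored. The function names are hypothetical.

```python
def bytes_per_vector(dim: int, fmt: str) -> int:
    """Raw bytes for one embedding of `dim` dimensions in the given format."""
    if fmt == "float32":
        return 4 * dim  # 4 bytes per dimension
    if fmt == "int8":
        return dim  # 1 byte per dimension
    if fmt == "binary":
        return (dim + 7) // 8  # 1 bit per dimension, rounded up to whole bytes
    raise ValueError(f"unknown format: {fmt}")


def total_bytes(count: int, dim: int, fmt: str) -> int:
    """Raw storage for `count` vectors, ignoring index and metadata overhead."""
    return count * bytes_per_vector(dim, fmt)


if __name__ == "__main__":
    dim = 1024  # hypothetical dimension, for illustration only
    for fmt in ("float32", "int8", "binary"):
        print(fmt, bytes_per_vector(dim, fmt))
```

For any dimension that is a multiple of 8, int8 is 4× smaller than float32 and binary is 32× smaller, so the same ratios apply to the totals for a collection of any size.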
