May 14, 2024, 8:42 p.m.

Simon Willison's Weblog simonwillison.net

Context caching for Google Gemini

Another new Gemini feature announced today. Long context models make it possible to answer questions against large chunks of text, but the price of those long prompts can be prohibitive: $3.50 per million tokens for Gemini 1.5 Pro up to 128,000 tokens, and $7 per million beyond that.

Context caching offers a price optimization: the long prompt prefix can be reused between requests, halving the cost per prompt, but at an additional cost of $4.50 per million tokens per hour to keep …
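To get a feel for when the trade-off pays off, here is a rough back-of-the-envelope sketch in Python. It uses only the prices quoted above; everything else is my assumption, not Google's billing logic. In particular it assumes "halving" means the cached prefix is billed at 50% of the normal input rate, that the whole prompt switches to the $7 rate once it passes 128,000 tokens, and it ignores output tokens, the uncached part of each prompt, and any one-time cost of creating the cache. The function names are made up for illustration.

```python
# Prices quoted in the post, in USD per million tokens.
INPUT_PRICE_UP_TO_128K = 3.50
INPUT_PRICE_OVER_128K = 7.00
CACHE_STORAGE_PER_HOUR = 4.50

# Assumption: "halving the cost per prompt" = cached tokens billed at 50%.
CACHED_DISCOUNT = 0.5


def input_price(prefix_tokens: int) -> float:
    """Per-million input price, assuming the whole prompt moves to the
    higher tier once it exceeds 128,000 tokens."""
    if prefix_tokens <= 128_000:
        return INPUT_PRICE_UP_TO_128K
    return INPUT_PRICE_OVER_128K


def cost_without_cache(prefix_tokens: int, requests: int) -> float:
    """Cost of resending the full prefix with every request."""
    return prefix_tokens / 1e6 * input_price(prefix_tokens) * requests


def cost_with_cache(prefix_tokens: int, requests: int, hours: float) -> float:
    """Discounted per-request cost plus hourly storage for the cached prefix."""
    millions = prefix_tokens / 1e6
    per_request = millions * input_price(prefix_tokens) * CACHED_DISCOUNT
    storage = millions * CACHE_STORAGE_PER_HOUR * hours
    return per_request * requests + storage


def break_even_requests_per_hour(prefix_tokens: int) -> float:
    """Requests per hour at which the saving equals the storage fee.
    The prefix size cancels out, so only the price tier matters."""
    saving_per_million = input_price(prefix_tokens) * (1 - CACHED_DISCOUNT)
    return CACHE_STORAGE_PER_HOUR / saving_per_million


if __name__ == "__main__":
    prefix = 100_000  # hypothetical prefix size in tokens
    print(cost_without_cache(prefix, requests=10))
    print(cost_with_cache(prefix, requests=10, hours=1))
    print(break_even_requests_per_hour(prefix))
```

Under those assumptions the break-even point works out to roughly 2.6 requests per hour against the same prefix below 128,000 tokens, and about 1.3 per hour above it. Fewer requests than that and the storage fee outweighs the discount.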

ai caching context gemini generativeai google llms optimization

