March 6, 2024, 10:38 a.m. | Vrushank

DEV Community dev.to

LLMs get more expensive as your app consumes more tokens. Portkey's AI gateway lets you cache LLM responses and serve repeat requests from the cache to cut costs. Here's the best part: caching now works with streaming enabled.
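To set the scene, here is a minimal sketch of routing a chat completion through the gateway with caching turned on. It assumes the `portkey-ai` Python SDK, placeholder keys (`PORTKEY_API_KEY`, `OPENAI_VIRTUAL_KEY`), a placeholder model name, and a cache config of the form `{"cache": {"mode": "simple"}}`; verify the exact config schema against Portkey's docs.

```python
# Minimal sketch: a cached chat completion routed through Portkey's gateway.
# Assumes the portkey-ai Python SDK; the cache config shape below is an
# assumption to check against Portkey's docs.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",              # gateway key (placeholder)
    virtual_key="OPENAI_VIRTUAL_KEY",        # provider credentials stored in Portkey (placeholder)
    config={"cache": {"mode": "simple"}},    # serve repeat requests from the cache
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Summarize what an AI gateway does."}],
)

# An identical request sent again would be answered from the cache,
# so it costs no provider tokens.
print(response.choices[0].message.content)
```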


Streams are an efficient way to work with large responses because:



  • They reduce perceived latency for users of your app.

  • Your app doesn't have to buffer the entire response in memory (see the sketch after this list).
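
As a rough illustration, here is what consuming a streamed, cached response can look like. It is a sketch under the same assumptions as above (the `portkey-ai` Python SDK, placeholder keys and model, an assumed cache config schema) and it assumes OpenAI-style streaming chunks are what the gateway forwards.

```python
# Minimal sketch: consuming a streamed response chunk by chunk instead of
# buffering the whole thing. Assumes the portkey-ai Python SDK and
# OpenAI-style streaming chunks; the cache config shape is an assumption.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="OPENAI_VIRTUAL_KEY",
    config={"cache": {"mode": "simple"}},
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain response caching in one paragraph."}],
    stream=True,  # chunks arrive as they are generated (or replayed from the cache)
)

for chunk in stream:
    # Print each delta as soon as it arrives; nothing is held in memory
    # beyond the current chunk.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```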


Let's check out how to get cached responses to your app …

