March 6, 2024, 10:38 a.m. | Vrushank


LLMs become more expensive as your app consumes more tokens. Portkey's AI gateway lets you cache LLM responses and serve users from the cache to save costs. Here's the best part: caching now works with streaming enabled.
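As a rough illustration, here is a minimal Python sketch of what a cache-enabled request through Portkey's gateway could look like, using the standard OpenAI SDK pointed at the gateway. The gateway URL, header names, and the shape of the cache config are assumptions to verify against Portkey's documentation, not a definitive implementation.

```python
import json
from openai import OpenAI

# Point the standard OpenAI client at Portkey's AI gateway.
# NOTE: the base URL, header names, and cache-config shape below are
# assumptions — check Portkey's docs for the exact values.
client = OpenAI(
    api_key="OPENAI_API_KEY",
    base_url="https://api.portkey.ai/v1",
    default_headers={
        "x-portkey-api-key": "PORTKEY_API_KEY",
        "x-portkey-provider": "openai",
        # Ask the gateway to cache responses (exact-match caching).
        "x-portkey-config": json.dumps({"cache": {"mode": "simple"}}),
    },
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize streaming vs. buffering."}],
)
print(response.choices[0].message.content)
```

Repeating the same request should then be served from the gateway's cache instead of hitting the model again.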


Streams are an efficient way to work with large responses because:



  • They reduce perceived latency for users of your app.

  • Your app doesn't have to buffer the entire response in memory (see the sketch after this list).
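
Here is a minimal sketch of consuming such a response as a stream, reusing the `client` configured above. Each chunk is printed and discarded as it arrives, so nothing is buffered and the first tokens reach the user as soon as they are generated; model name and prompt are placeholders.

```python
# Stream the completion instead of waiting for the full response.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain response caching in one paragraph."}],
    stream=True,
)

# Handle each chunk immediately — no in-memory buffering of the full reply.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```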


Let's check out how to get cached responses to your app …

