June 28, 2024, 12:16 p.m. | Mateusz Charytoniuk

DEV Community dev.to

Paddler is an open-source load balancer and reverse proxy designed to optimize servers running llama.cpp.


Typical strategies such as round robin or least connections are not effective for llama.cpp servers, which rely on slots for continuous batching and concurrent requests, so a request should be routed to a server that currently has a free slot.


Paddler overcomes this by acting as a stateful load balancer that tracks each server's available slots and distributes requests accordingly. Additionally, Paddler uses agents to monitor the health of individual llama.cpp instances, providing feedback to the load balancer for optimal …
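To make the idea of slot-aware balancing concrete, here is a minimal Python sketch. It is not Paddler's implementation: the names `Upstream`, `SlotAwareBalancer`, `report`, `acquire`, and `release` are hypothetical, and the selection rule (pick the healthy server with the most idle slots) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Upstream:
    """Hypothetical snapshot of one llama.cpp instance, as an agent might report it."""
    name: str
    slots_idle: int
    healthy: bool = True


class SlotAwareBalancer:
    """Illustrative stateful balancer that remembers idle slots per upstream."""

    def __init__(self) -> None:
        self._upstreams: Dict[str, Upstream] = {}

    def report(self, name: str, slots_idle: int, healthy: bool = True) -> None:
        # An agent calls this to register or refresh the state of its instance.
        self._upstreams[name] = Upstream(name, slots_idle, healthy)

    def acquire(self) -> Optional[str]:
        # Consider only healthy upstreams that still have a free slot.
        candidates = [
            u for u in self._upstreams.values() if u.healthy and u.slots_idle > 0
        ]
        if not candidates:
            return None  # nothing available; a caller could queue or reject here
        # Assumed policy: prefer the upstream with the most idle slots.
        best = max(candidates, key=lambda u: u.slots_idle)
        best.slots_idle -= 1  # reserve the slot for this request
        return best.name

    def release(self, name: str) -> None:
        # Return the slot once the request has finished.
        upstream = self._upstreams.get(name)
        if upstream is not None:
            upstream.slots_idle += 1
```

Unlike round robin, `acquire` never hands a request to an instance whose slots are all busy or that was reported unhealthy; it returns `None` instead, leaving it to the caller to wait or fail.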

