Cost/Performance Optimization with LLMs [Panel]
MLOps.community mlops.community
Sign up for the next LLM in production conference here: https://go.mlops.community/LLMinprod
Watch all the talks from the first conference: https://go.mlops.community/llmconfpart1
// Abstract
This panel discussion explores the cost of running large language models (LLMs) and potential ways to reduce it. The panelists discuss the benefits of bringing LLMs in-house, such as latency optimization and greater control, and examine optimization methods including structured pruning and knowledge distillation. OctoML's platform is mentioned as a tool …