How to Serve LLM Completions in Production
Jan. 18, 2024, 9:19 p.m. | Mateusz Charytoniuk
DEV Community (dev.to)
Preparations
To start, you need to compile llama.cpp; the project's README covers the build steps. The server is compiled alongside the other targets by default.
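For example, a CPU-only build on Linux or macOS can look like this (a minimal sketch following the repository's README; the plain make build produces the server binary alongside the other tools):

    # fetch the sources and build all default targets, including the server
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make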
Once you have the server running, we can continue. We will use the Resonance PHP framework.
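Before wiring up Resonance, it is worth exercising the server's HTTP API directly, since any PHP client ultimately talks to this same endpoint. A quick smoke test, assuming the server is already up on 127.0.0.1:8080 with a model loaded (host and port depend on how you start it):

    # POST a prompt to the completion endpoint and read back the generated text
    curl --request POST \
        --url http://127.0.0.1:8080/completion \
        --header "Content-Type: application/json" \
        --data '{"prompt": "The capital of France is", "n_predict": 32}'

The response is a JSON object whose content field holds the generated text.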
Obtaining an Open-Source LLM
I recommend starting with either Llama 2 or Mistral. You need to download the pretrained weights and convert them into the GGUF format before they can be used with llama.cpp.
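As a sketch of that conversion, assuming you have downloaded Hugging Face weights into models/mistral-7b-v0.1/ (a placeholder path) and are using the conversion and quantization tools shipped with llama.cpp:

    # convert the pretrained weights to GGUF (16-bit floats)
    python3 convert.py models/mistral-7b-v0.1/ --outtype f16 --outfile models/mistral-7b-f16.gguf

    # optionally quantize to 4 bits to cut memory usage
    ./quantize models/mistral-7b-f16.gguf models/mistral-7b-q4_0.gguf q4_0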
Troubleshooting
Starting Server Without a GPU
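If you have no GPU, llama.cpp can run inference entirely on the CPU; it is slower, but works fine with smaller quantized models. A minimal sketch, assuming the quantized model from the previous step (flag names follow the server's --help output):

    # -c sets the context window, -t the number of CPU threads (match your physical cores)
    ./server -m models/mistral-7b-q4_0.gguf -c 2048 -t 8 --host 127.0.0.1 --port 8080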