April 26, 2024, 4:42 a.m. | Yu Gao, Juan Camilo Vega, Paul Chow

cs.LG updates on arXiv.org

arXiv:2404.16158v1 Announce Type: cross
Abstract: FPGAs are rarely mentioned when discussing the implementation of large machine learning applications, such as Large Language Models (LLMs), in the data center. Considerable evidence shows that a single FPGA can be competitive with GPUs in performance for some computations, especially those requiring low latency, and is often much more efficient when power is considered. This suggests that there is merit to exploring the use of multiple FPGAs for large machine learning applications. The challenge …
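
To make the low-latency scaling argument concrete, here is a minimal back-of-envelope sketch, not from the paper: it models per-token latency when a transformer's layers are pipelined across several FPGAs. The function name and every figure below are illustrative assumptions, not measurements from the work.

```python
# A minimal back-of-envelope sketch, not from the paper: per-token latency
# when a transformer's layers are pipelined across several FPGAs.
# All names and figures are illustrative assumptions, not measurements.

def per_token_latency_us(num_layers: int,
                         num_fpgas: int,
                         layer_compute_us: float,
                         inter_fpga_link_us: float) -> float:
    """Latency for one autoregressive token: the layers run in sequence,
    plus one link hop each time activations cross an FPGA boundary."""
    compute_us = num_layers * layer_compute_us
    communication_us = (num_fpgas - 1) * inter_fpga_link_us
    return compute_us + communication_us

# Example: a 32-layer model split across 4 FPGAs, assuming (hypothetically)
# 5 us of compute per layer and 2 us per inter-FPGA hop.
print(per_token_latency_us(32, 4, 5.0, 2.0))  # -> 166.0
```

Under these assumed numbers, splitting the model across devices adds only a few microseconds of link hops on top of the serial layer compute, which is one way to see why tightly coupled multi-FPGA platforms could remain attractive for low-latency inference at scale.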
