Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

Jan. 31, 2024, 1:17 a.m.

Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more...
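
To make the idea of "smaller" concrete, here is a minimal sketch of one common form of quantization: symmetric per-tensor int8 rounding, where each float weight is stored as an integer in [-127, 127] plus one shared scale. This is a generic illustration only; the summary above does not say which quantization scheme was applied to Llama2-70B-Chat, and the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (illustrative, not the article's method).

    Maps each float to an integer in [-127, 127] using a single shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero when every weight is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each stored value now needs 8 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per weight. Whether model quality survives that error depends on the model and the scheme, which is what the article evaluates.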
