May 6, 2024, 4:42 a.m. | Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe

cs.LG updates on arXiv.org

arXiv:2405.01943v1 Announce Type: cross
Abstract: The rapid advancement in Large Language Models (LLMs) has markedly enhanced the capabilities of language understanding and generation. However, the substantial model size poses hardware challenges, affecting both the memory needed for serving and the inference latency of token generation. To address these challenges, we propose Dependency-aware Semi-structured Sparsity (DaSS), a novel method for pruning the recently prevalent SwiGLU-based LLMs. Our approach incorporates structural dependency into weight magnitude-based unstructured pruning. We introduce an MLP-specific pruning metric …
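For context, semi-structured (N:M) sparsity keeps only N non-zero weights in every group of M consecutive weights, a pattern that hardware such as NVIDIA's sparse tensor cores can accelerate. The sketch below is a minimal, generic illustration of magnitude-based 2:4 semi-structured pruning, not the DaSS metric itself; the function and variable names are illustrative assumptions, not from the paper.

```python
import numpy as np

def nm_semi_structured_prune(weight: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Zero all but the n largest-magnitude entries in every group of m
    consecutive weights along the last dimension (generic N:M pruning,
    NOT the paper's dependency-aware DaSS metric)."""
    rows, cols = weight.shape
    assert cols % m == 0, "column count must be divisible by the group size m"
    groups = weight.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

# Example: prune a small random weight matrix to 2:4 sparsity (50% of weights zeroed).
w = np.random.randn(4, 8)
w_sparse = nm_semi_structured_prune(w, n=2, m=4)
print((w_sparse == 0).mean())  # ~0.5
```

DaSS, as described in the abstract, augments this magnitude criterion with structural dependency information specific to SwiGLU MLP blocks; see the paper for the actual pruning metric.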

