Web: http://arxiv.org/abs/2205.01571

May 4, 2022, 1:11 a.m. | Kuo-Wei Chang, Hsu-Tung Shih, Tian-Sheuan Chang, Shang-Hong Tsai, Chih-Chyau Yang, Chien-Ming Wu, Chun-Ming Huang

cs.LG updates on arXiv.org

Memory bandwidth has become the real-time bottleneck of current deep learning
accelerators (DLAs), particularly for high-definition (HD) object detection.
Under resource constraints, this paper proposes a low-memory-traffic DLA chip
with joint hardware and software optimization. To maximize hardware utilization
under limited memory bandwidth, we morph and fuse the object detection model into a
group fusion-ready model to reduce intermediate data access. This reduces
YOLOv2's feature memory traffic from 2.9 GB/s to 0.15 GB/s. To support group
fusion, …
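
To make the idea of reducing intermediate data access concrete, here is a minimal sketch of how fusing layers into groups can cut feature-map traffic to external memory. It is not the paper's method: the function `feature_traffic`, the feature-map sizes, and the grouping are all hypothetical, and the model assumes that intermediates inside a fused group never leave the chip and that weight traffic is ignored. Only the 2.9 GB/s and 0.15 GB/s figures come from the abstract.

```python
def feature_traffic(sizes, groups):
    """Feature-map bytes moved to or from external memory for one frame.

    sizes[i] is the size of feature map i (sizes[0] is the network input,
    sizes[-1] the final output). groups lists how many consecutive layers
    are fused together; a fused group reads its input once and writes its
    output once, keeping everything in between on chip (an assumption).
    """
    if sum(groups) != len(sizes) - 1:
        raise ValueError("groups must cover every layer exactly once")
    traffic = 0
    idx = 0
    for layers_in_group in groups:
        traffic += sizes[idx]      # read the group's input feature map
        idx += layers_in_group
        traffic += sizes[idx]      # write the group's output feature map
    return traffic


# Made-up feature-map sizes for a five-layer chain (arbitrary units).
sizes = [100, 400, 400, 200, 200, 50]

unfused = feature_traffic(sizes, [1, 1, 1, 1, 1])  # every layer on its own
fused = feature_traffic(sizes, [3, 2])             # two fused groups

# Reduction factor implied by the figures reported in the abstract.
reported_ratio = 2.9 / 0.15

print(unfused, fused, round(reported_ratio, 1))
```

In this toy chain, fusing removes the reads and writes of the three largest intermediate maps, which is where most of the traffic comes from. The figures reported in the abstract correspond to a reduction of roughly 19x, or about 95% of the feature memory traffic.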

