Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices. (arXiv:2207.12852v1 [cs.LG])
cs.LG updates on arXiv.org
Modern search systems use several large ranker models with transformer
architectures. These models require substantial computational resources and
are not suitable for use on devices with limited compute. Knowledge
distillation is a popular compression technique that can reduce the resource
needs of such models: a large teacher model transfers its knowledge to a
small student model. To drastically reduce memory requirements and energy
consumption, we propose two extensions to a popular sentence-transformer
distillation procedure: generation of an optimal size …
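To make the teacher-to-student transfer concrete, below is a minimal sketch of embedding-space knowledge distillation in PyTorch. It is not the paper's procedure: the toy encoders, the shared embedding dimension, and the MSE objective are all illustrative assumptions, chosen only to show a small student being trained to mimic a large frozen teacher's sentence embeddings.

```python
import torch
import torch.nn as nn

EMB_DIM = 384  # assumed shared output embedding dimension (illustrative)

class TinyEncoder(nn.Module):
    """Toy stand-in for a sentence encoder: bag-of-embeddings + projection.
    The paper's actual transformer architectures are not reproduced here."""
    def __init__(self, vocab_size: int = 30522, hidden: int = 128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, hidden)  # mean-pools token embeddings
        self.proj = nn.Linear(hidden, EMB_DIM)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.proj(self.emb(token_ids, offsets))

teacher = TinyEncoder(hidden=512)  # stand-in "large" teacher, kept frozen
student = TinyEncoder(hidden=64)   # compressed student to be trained

teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
mse = nn.MSELoss()

def distill_step(token_ids: torch.Tensor, offsets: torch.Tensor) -> float:
    """One distillation step: pull student embeddings toward teacher embeddings."""
    with torch.no_grad():
        target = teacher(token_ids, offsets)  # teacher output, no gradients
    pred = student(token_ids, offsets)
    loss = mse(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batch: two "sentences" as flattened token ids plus start offsets.
token_ids = torch.randint(0, 30522, (12,))
offsets = torch.tensor([0, 7])
print(distill_step(token_ids, offsets))
```

The sentence-transformers library offers a similar distillation setup (e.g. its MSELoss for matching a teacher's embeddings); the sketch above only illustrates the basic teacher-to-student transfer the abstract refers to, not the two extensions the paper proposes.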