Hyena Hierarchy: Towards Larger Convolutional Language Models
Source: Together blog, www.together.xyz
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attention layers to match Transformers, indicating a gap in capability. In this work, we propose Hyena, …
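To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a Hyena-style operator: an implicitly parametrized long convolution computed in O(L log L) via FFT, combined with data-controlled element-wise gating, in place of O(L²) dense attention. The class name `LongConvGatedBlock`, the use of a free filter parameter (the paper generates the filter with a small network over positions), and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.fft


class LongConvGatedBlock(nn.Module):
    """Hypothetical Hyena-style operator: gated long convolution via FFT."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        # Assumption: a free filter parameter stands in for the paper's
        # implicit parametrization (a small network over positions).
        self.filter = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)
        self.in_proj = nn.Linear(d_model, 2 * d_model)  # value and gate branches
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        L = x.shape[1]
        v, gate = self.in_proj(x).chunk(2, dim=-1)
        # Causal long convolution via zero-padded FFT: O(L log L),
        # versus the O(L^2) cost of materializing dense attention.
        n = 2 * L  # pad to avoid circular wraparound
        v_f = torch.fft.rfft(v, n=n, dim=1)
        k_f = torch.fft.rfft(self.filter[:L], n=n, dim=0)
        y = torch.fft.irfft(v_f * k_f.unsqueeze(0), n=n, dim=1)[:, :L]
        # Data-controlled gating modulates the convolution output.
        return self.out_proj(torch.sigmoid(gate) * y)


x = torch.randn(2, 1024, 64)                  # (batch, seq_len, d_model)
block = LongConvGatedBlock(d_model=64, seq_len=1024)
print(block(x).shape)                         # torch.Size([2, 1024, 64])
```

Because the convolution is evaluated in the frequency domain, the cost scales near-linearly with sequence length, which is what lets an operator like this serve as a subquadratic drop-in for attention at long context lengths.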