Oct. 28, 2022, 1:16 a.m. | Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini

cs.CL updates on arXiv.org

The multi-head self-attention mechanism of the transformer model has been
thoroughly investigated recently. In one vein of study, researchers are
interested in understanding why and how transformers work. In another vein,
researchers propose new attention augmentation methods to make transformers
more accurate, efficient, and interpretable. In this paper, we combine these two
lines of research in a human-in-the-loop pipeline to first discover important
task-specific attention patterns. Those patterns are then injected not only
into smaller models, but also into the …
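As a rough illustration of the idea, here is a minimal sketch (an assumption for clarity, not the paper's actual pipeline) of how a hand-crafted, task-specific attention pattern could be injected into scaled dot-product attention as an additive bias on the attention logits. The `attention_with_pattern` function, the `alpha` scaling factor, and the toy "previous token" pattern are all illustrative choices introduced here.

```python
# Minimal sketch (assumption, not the paper's implementation): single-head
# self-attention where a hand-crafted attention pattern is injected as an
# additive bias on the attention logits before the softmax.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_pattern(q, k, v, pattern=None, alpha=1.0):
    """q, k, v: (seq_len, d); pattern: (seq_len, seq_len) logit bias or None."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)          # standard scaled dot-product scores
    if pattern is not None:
        logits = logits + alpha * pattern  # inject the task-specific pattern
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

# Toy usage: bias attention toward the previous token (a "local" pattern).
rng = np.random.default_rng(0)
seq_len, d = 6, 8
q, k, v = (rng.normal(size=(seq_len, d)) for _ in range(3))
local_pattern = np.eye(seq_len, k=-1) * 5.0  # hypothetical hand-crafted pattern
out, weights = attention_with_pattern(q, k, v, pattern=local_pattern)
print(weights.round(2))
```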

arxiv attention exploitation human patterns segmentation summarization
