Oct. 25, 2022, 1:17 a.m. | Junwei Bao, Yifan Wang, Jiangyong Ying, Yeyun Gong, Jing Zhao, Youzheng Wu, Xiaodong He

cs.CL updates on arXiv.org arxiv.org

Conventional autoregressive left-to-right (L2R) sequence generation faces two
issues during decoding: it is limited to unidirectional modeling of the target
sequence, and it is constrained by strong local dependencies. To address these
problems, we propose P$^3$LM, a probabilistically permuted prophet language
model, which strengthens the modeling of bidirectional information and
long-distance token dependencies for sequence generation. Specifically, P$^3$LM
learns to generate tokens in permuted order with an order-aware transformer
decoder, as well as to generate the corresponding future $N$ tokens with a
multi-stream attention mechanism. …
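
The abstract describes two training ideas: generating tokens in a sampled permuted order, and predicting the $N$ future tokens at each step. The toy Python sketch below only illustrates how such permuted-order, future-$N$ prediction targets might be constructed for a single sequence; the function name and details are assumptions made for illustration, and the actual P$^3$LM relies on an order-aware transformer decoder with multi-stream attention, which this sketch does not implement.

```python
import random
from typing import List, Tuple

def build_permuted_prophet_targets(
    tokens: List[str], n_future: int = 2, seed: int = 0
) -> List[Tuple[str, List[str]]]:
    """For each decoding step under a sampled permutation, pair the token
    generated at that step with the next `n_future` tokens in the same
    permuted order (prophet-style multi-token prediction targets)."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)  # probabilistically permuted generation order
    targets = []
    for step, pos in enumerate(order):
        future_positions = order[step + 1 : step + 1 + n_future]
        targets.append((tokens[pos], [tokens[p] for p in future_positions]))
    return targets

if __name__ == "__main__":
    # Each entry: (token generated at this step, future-N tokens to predict).
    for current, future in build_permuted_prophet_targets(
        ["the", "cat", "sat", "down"], n_future=2
    ):
        print(current, "->", future)
```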

arxiv language modeling pre-training prophet training
