[R] Self-Attention: Positional encoding with QK kernels using FFT | allainews.com

Jan. 2, 2024, 9:39 p.m. | /u/alagagbar

Machine Learning www.reddit.com

I've pre-trained a tiny character-level transformer (4M parameters) with a training context length of 128 tokens (characters), using RoPE (rotary) positional encodings. Here is how my model performs with RoPE (I trained it on Polish, sorry). The prompt was " Adam Mickiewicz był to " ("Adam Mickiewicz was"):

Adam Mickiewicz był to konkomiste, narodzinie cesarza racja datka zajmowania nazwy. Co reguła we stanie można wyrażenie znane symbol języka, że niewyka paszawy, że
język dostowy posinistwa. Matejkoknieswobowi, wszczucie inazwidzstekwi.

Języ przeskobiślanistani nindyb pasowemodarówistanizacharajustęży, nisku daberzycze lawinersławachystrodwodateliżaćby, i istny celefystraminy …
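The post does not include its RoPE code. The following is a minimal sketch of rotary encoding for a single query or key vector, written in plain Python. It assumes the original RoFormer-style formulation, where adjacent dimension pairs (x[2i], x[2i+1]) are rotated by the angle pos · 10000^(-2i/d). `rope` and `dot` are hypothetical helper names. Many real implementations rotate the two halves of the vector instead of adjacent pairs, which is an equivalent permutation of the same idea.

```python
import math
from typing import List, Sequence


def rope(x: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Apply a rotary position embedding to one query/key vector at position `pos`.

    Assumption: adjacent pairs (x[2i], x[2i+1]) are rotated by pos * base**(-2i/d).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for i in range(d // 2):
        # Lower pair indices rotate faster; higher ones rotate more slowly.
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the i-th pair.
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, i.e. the unscaled attention score q . k."""
    return sum(u * v for u, v in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 0.8, -0.1, 0.4, 1.0, -0.7]
    k = [0.9, 0.2, -0.4, 0.6, 0.3, -0.8, 0.1, 0.5]
    # Both pairs of positions have the same offset (3), so the scores should match.
    s1 = dot(rope(q, 5), rope(k, 2))
    s2 = dot(rope(q, 13), rope(k, 10))
    print(abs(s1 - s2) < 1e-9)
```

Each pair is rotated by an angle proportional to its position. As a result, the score between a query at position m and a key at position n depends only on the offset m - n. This is the property that lets attention use relative positions without any learned position table.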
