Aug. 24, 2023, 2:16 p.m. | /u/seawee1

Machine Learning www.reddit.com

I'm working with 2D input, where I have discrete objects arranged in a grid-like structure with one temporal dimension and one spatial dimension. I'd like to process these inputs with a Transformer. Any idea what would be a suitable positional encoding to use for this? I could probably use something similar to what is used in ViT (2 spatial dimensions), but maybe there's something more suitable for the mixed "temporal-spatial" case?
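
For concreteness, here is a minimal sketch of one possible factorized encoding for this kind of grid. It is an illustration, not a recommendation taken from the post. It assumes half of the model channels carry a sinusoidal encoding of the time index and the other half carry a sinusoidal encoding of the spatial index. The function names (`sinusoidal_1d`, `time_space_encoding`), the parameter names, and the row-major flattening into `T * S` tokens are all hypothetical choices.

```python
import numpy as np


def sinusoidal_1d(num_positions: int, dim: int) -> np.ndarray:
    """Fixed sinusoidal encoding of shape (num_positions, dim); dim must be even."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    positions = np.arange(num_positions)[:, None]                    # (P, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)    # (dim/2,)
    angles = positions * freqs[None, :]                              # (P, dim/2)
    enc = np.zeros((num_positions, dim))
    enc[:, 0::2] = np.sin(angles)  # even channels: sine
    enc[:, 1::2] = np.cos(angles)  # odd channels: cosine
    return enc


def time_space_encoding(num_steps: int, num_slots: int, d_model: int) -> np.ndarray:
    """Hypothetical helper: encode a (time, space) grid, one row per token.

    The first d_model/2 channels depend only on the time index and the last
    d_model/2 channels depend only on the spatial index. Tokens are flattened
    row-major, so token t * num_slots + s corresponds to grid cell (t, s).
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    t_enc = sinusoidal_1d(num_steps, half)  # (T, half)
    s_enc = sinusoidal_1d(num_slots, half)  # (S, half)
    grid = np.concatenate(
        [
            np.broadcast_to(t_enc[:, None, :], (num_steps, num_slots, half)),
            np.broadcast_to(s_enc[None, :, :], (num_steps, num_slots, half)),
        ],
        axis=-1,
    )                                        # (T, S, d_model)
    return grid.reshape(num_steps * num_slots, d_model)


# Usage: add the result to the token embeddings before the first Transformer layer.
pe = time_space_encoding(num_steps=3, num_slots=4, d_model=8)
print(pe.shape)
```

Keeping the two axes in separate channel groups lets the model attend to "same time step" and "same spatial slot" independently. If the temporal axis should behave differently (for example, causal or relative offsets), the time half could be swapped for a learned or relative encoding without touching the spatial half.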

