May 19, 2022, 9:35 p.m. | /u/Black_Beard53

Computer Vision www.reddit.com

Hello all,

I am trying to implement an encoder-decoder architecture with an attention mechanism (not self-attention) for image sequences instead of text. So far, the only resources I have found deal with image-to-text. Has anyone worked on this before, or does anyone know of resources that would be helpful?

I am thinking of using a CNN to turn each image into a flattened feature vector, feeding those vectors to the encoder-decoder module sequentially, and training the model to obtain a latent representation …
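A minimal sketch of the pipeline described above, assuming PyTorch. Everything here is a hypothetical illustration rather than a known reference implementation: the class names (`FrameEncoder`, `AdditiveAttention`, `ImageSeq2Seq`), the layer sizes, the choice of GRUs, and the use of additive (Bahdanau-style) attention are all assumptions. The decoder predicts feature vectors, not pixels; mapping back to images would need an extra upsampling decoder that is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameEncoder(nn.Module):
    """Small CNN mapping one image to a flat feature vector (hypothetical layout)."""

    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, 32, 1, 1) regardless of input size
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, x):  # x: (N, C, H, W)
        return self.fc(self.conv(x).flatten(1))  # (N, feat_dim)


class AdditiveAttention(nn.Module):
    """Scores every encoder step against the current decoder state."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (B, H); enc_outputs: (B, T, H)
        energy = torch.tanh(self.w_enc(enc_outputs) + self.w_dec(dec_state).unsqueeze(1))
        weights = F.softmax(self.v(energy).squeeze(-1), dim=1)  # (B, T)
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)  # (B, H)
        return context, weights


class ImageSeq2Seq(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.cnn = FrameEncoder(feat_dim=feat_dim)
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.attn = AdditiveAttention(hidden_dim)
        self.decoder_cell = nn.GRUCell(feat_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, frames, out_len):
        # frames: (B, T, C, H, W)
        B, T = frames.shape[:2]
        # Run the CNN on all frames at once, then restore the time axis.
        feats = self.cnn(frames.flatten(0, 1)).view(B, T, -1)  # (B, T, feat_dim)
        enc_outputs, h_n = self.encoder(feats)  # (B, T, H), (1, B, H)
        state = h_n[-1]      # latent summary of the input sequence, (B, H)
        prev = feats[:, -1]  # assumption: start decoding from the last frame's features
        outputs, attn_maps = [], []
        for _ in range(out_len):
            context, weights = self.attn(state, enc_outputs)
            state = self.decoder_cell(torch.cat([prev, context], dim=1), state)
            prev = self.out(state)  # predicted feature vector for this step
            outputs.append(prev)
            attn_maps.append(weights)
        return torch.stack(outputs, dim=1), torch.stack(attn_maps, dim=1)


model = ImageSeq2Seq()
frames = torch.randn(2, 5, 3, 32, 32)  # 2 sequences of 5 RGB frames, 32x32 each
pred, attn = model(frames, out_len=4)  # pred: (2, 4, 128), attn: (2, 4, 5)
```

Here the final encoder hidden state plays the role of the latent representation, and the attention weights show which input frames each decoding step draws on. Training would need a loss on `pred`, for example against the CNN features of the target frames, which is another assumption left out of the sketch.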

attention computervision image seq2seq
