Aug. 29, 2022, 1:14 a.m. | Michał J. Tyszkiewicz, Kevis-Kokitsi Maninis, Stefan Popov, Vittorio Ferrari

cs.CV updates on arXiv.org arxiv.org

We propose a transformer-based neural network architecture for multi-object
3D reconstruction from RGB videos. It relies on two alternative ways to
represent its knowledge: as a global 3D grid of features and an array of
view-specific 2D grids. We progressively exchange information between the two
with a dedicated bidirectional attention mechanism. We exploit knowledge about
the image formation process to significantly sparsify the attention weight
matrix, making our architecture feasible on current hardware, both in terms of
memory and computation. …
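
To illustrate how knowledge of image formation can sparsify attention between a 3D grid and a 2D grid, here is a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera with intrinsics `K` and a world-to-camera pose `R`, `t`, and it lets each voxel attend only to the single feature-grid cell that its center projects into. The function name `sparse_attention_pairs` and all parameters are hypothetical and chosen for illustration.

```python
import numpy as np

def sparse_attention_pairs(voxel_centers, K, R, t, feat_hw, image_hw):
    """Return (voxel_idx, pixel_idx) pairs allowed to attend to each other.

    Assumption: a voxel is paired only with the 2D feature cell that its
    center projects into under a pinhole camera model.
    """
    feat_h, feat_w = feat_hw
    img_h, img_w = image_hw

    # World coordinates -> camera coordinates, shape (N, 3).
    cam = voxel_centers @ R.T + t

    # Voxels behind the camera cannot be seen in this view.
    in_front = cam[:, 2] > 1e-6
    depth = np.where(in_front, cam[:, 2], 1.0)  # avoid division by zero

    # Perspective projection to image-plane pixel coordinates.
    uvw = cam @ K.T
    u = uvw[:, 0] / depth
    v = uvw[:, 1] / depth

    # Image pixels -> cells of the coarser 2D feature grid.
    col = np.floor(u * feat_w / img_w).astype(int)
    row = np.floor(v * feat_h / img_h).astype(int)

    valid = in_front & (col >= 0) & (col < feat_w) & (row >= 0) & (row < feat_h)
    voxel_idx = np.nonzero(valid)[0]
    pixel_idx = row[valid] * feat_w + col[valid]  # flattened cell index
    return voxel_idx, pixel_idx


# Toy example: three voxels, one 8x8 image with a 4x4 feature grid.
K = np.array([[4.0, 0.0, 4.0],
              [0.0, 4.0, 4.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
voxels = np.array([[0.0, 0.0, 1.0],    # projects to the image center
                   [0.0, 0.0, -1.0],   # behind the camera
                   [10.0, 0.0, 1.0]])  # projects outside the image

vox_idx, pix_idx = sparse_attention_pairs(voxels, K, R, t, (4, 4), (8, 8))
dense_entries = len(voxels) * 4 * 4
print(f"kept {len(vox_idx)} of {dense_entries} voxel-pixel attention entries")
```

In this toy setup only one of the 48 possible voxel-pixel pairs survives, which is the kind of reduction that makes the attention matrix tractable in memory and computation. A real system would repeat this per view and feed the index pairs into a sparse attention kernel.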

