LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition. (arXiv:2305.03343v1 [cs.CV])
cs.CV updates on arXiv.org
Previous methods for dynamic facial expression recognition (DFER) in the wild
are mainly based on Convolutional Neural Networks (CNNs), whose local
operations ignore long-range dependencies in videos. Transformer-based
methods for DFER can achieve better performance but incur higher FLOPs and
computational cost. To address both problems, the local-global spatio-temporal
Transformer (LOGO-Former) is proposed to capture discriminative features within
each frame and model contextual relationships among frames while keeping
complexity in balance. Based on the priors that facial muscles move …
complexity. Based on the priors that facial muscles move …
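
To illustrate the complexity trade-off the abstract refers to, the sketch below counts the query-key pairs scored by self-attention under two layouts: joint attention over every token in a clip, and a factorized layout that attends within each frame and then across frames. This is not LOGO-Former's architecture, which the truncated abstract does not spell out. The function names, the clip size (16 frames, 49 tokens per frame), and the use of one summary token per frame for the temporal step are all assumptions made for illustration.

```python
def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


def joint_cost(frames: int, tokens_per_frame: int) -> int:
    """Every token in the clip attends to every other token in the clip."""
    return attention_pairs(frames * tokens_per_frame)


def factorized_cost(frames: int, tokens_per_frame: int) -> int:
    """Attention within each frame, then attention across frames.

    Assumption (hypothetical, not from the paper): the temporal step
    uses a single summary token per frame.
    """
    spatial = frames * attention_pairs(tokens_per_frame)  # per-frame attention
    temporal = attention_pairs(frames)                    # across-frame attention
    return spatial + temporal


if __name__ == "__main__":
    frames, tokens = 16, 49  # assumed clip size, chosen only for illustration
    print("joint:", joint_cost(frames, tokens))            # 614656
    print("factorized:", factorized_cost(frames, tokens))  # 38672
```

Under these assumed sizes the joint layout scores roughly 16 times as many pairs as the factorized one, which is the kind of gap that motivates restricting part of the attention to local scopes.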