Facial expression recognition (FER) is a core component of affective computing and offers substantial practical value in human–computer interaction. However, recognition in unconstrained scenarios remains challenging due to large intra-class variation, subtle inter-class differences, and environmental interference. To address these limitations, this paper introduces a novel Directional Attention Fusion and Multi-Head Spatial–Channel Attention Network (DAF-MHSCA). The proposed framework first extracts coarse-grained facial representations with a ResNet18 backbone. An Adaptive Feature Calibration (AFC) module then applies multi-scale dilated convolutions to these representations to capture fine-grained expression details. Next, a Directional Attention Fusion (DAF) module leverages self-attention and cross-attention along the width and height directions to generate spatial attention maps. Finally, a Multi-Head Spatial–Channel Attention (MHSCA) module performs joint spatial and channel-wise attention, guided by the previously generated maps, enabling more accurate emotion classification. Experimental results on five datasets show that the proposed method achieves notable improvements over state-of-the-art methods.
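To make the composition of these modules concrete, the sketch below outlines one possible PyTorch realization of the described pipeline. It is an illustration only: the channel widths, dilation rates, head counts, the exact directional attention formulation in DAF, and the SE-style channel gate standing in for the multi-head design of MHSCA are all assumptions, not the paper's specification.

```python
# Minimal PyTorch sketch of the DAF-MHSCA pipeline described in the abstract.
# All hyperparameters and attention formulations are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class AFC(nn.Module):
    """Adaptive Feature Calibration: parallel multi-scale dilated convolutions
    whose outputs are summed residually (assumed fusion scheme)."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):
        return F.relu(x + sum(b(x) for b in self.branches))


class DAF(nn.Module):
    """Directional Attention Fusion: self-attention over height tokens,
    cross-attention from width tokens to height tokens, fused into a
    spatial attention map (assumed formulation)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.h_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.w_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.proj = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        h_seq = x.mean(dim=3).permute(0, 2, 1)        # (b, h, c) height tokens
        h_out, _ = self.h_attn(h_seq, h_seq, h_seq)   # self-attention (height)
        w_seq = x.mean(dim=2).permute(0, 2, 1)        # (b, w, c) width tokens
        w_out, _ = self.w_attn(w_seq, h_out, h_out)   # cross-attention (width)
        h_map = h_out.permute(0, 2, 1).unsqueeze(3).expand(b, c, h, w)
        w_map = w_out.permute(0, 2, 1).unsqueeze(2).expand(b, c, h, w)
        # Fuse both directions into a single-channel spatial attention map.
        return torch.sigmoid(self.proj(torch.cat([h_map, w_map], dim=1)))


class MHSCA(nn.Module):
    """Spatial-channel attention guided by the DAF map. A single SE-style
    channel gate stands in for the paper's multi-head design (assumption)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, spatial_map):
        # Joint reweighting: channel gate (b,c,1,1) x spatial map (b,1,h,w).
        return x * self.channel_gate(x) * spatial_map


class DAFMHSCANet(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        backbone = resnet18(weights=None)  # untrained weights, for the sketch
        self.stem = nn.Sequential(*list(backbone.children())[:-2])  # (b,512,7,7)
        self.afc = AFC(512)
        self.daf = DAF(512)
        self.mhsca = MHSCA(512)
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):
        f = self.afc(self.stem(x))            # coarse features + calibration
        f = self.mhsca(f, self.daf(f))        # DAF map guides MHSCA
        return self.head(f.mean(dim=(2, 3)))  # global average pool + classify


if __name__ == "__main__":
    logits = DAFMHSCANet()(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 7])
```

The seven-way classification head reflects the basic-emotion setting common to FER benchmarks; datasets with a different label space would simply change `num_classes`.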



