Facial expression recognition (FER) is a core component of affective computing and offers substantial practical value in human–computer interaction. However, recognition in unconstrained scenarios remains challenging due to large intra-class variation, subtle inter-class differences, and environmental interference. To address these limitations, this paper introduces a novel Directional Attention Fusion and Multi-Head Spatial–Channel Attention Network (DAF-MHSCA). The proposed framework first extracts coarse-grained facial representations with a ResNet18 backbone. An Adaptive Feature Calibration (AFC) module then applies multi-scale dilated convolutions to these representations to capture fine-grained expression details. Next, a Directional Attention Fusion (DAF) module leverages self-attention and cross-attention along the width and height directions to generate spatial attention maps. Finally, a Multi-Head Spatial–Channel Attention (MHSCA) module performs joint spatial and channel-wise attention, guided by the previously generated maps, enabling more accurate emotion classification. Experimental results on five datasets show that the proposed method achieves notable improvements over state-of-the-art methods.
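To make the composition of these modules concrete, the sketch below outlines one possible PyTorch realization of the described pipeline. It is an illustration only: the channel widths, dilation rates, head counts, the exact directional attention formulation in DAF, and the SE-style channel gate standing in for the multi-head design of MHSCA are all assumptions, not the paper's specification.

```python
# Minimal PyTorch sketch of the DAF-MHSCA pipeline described in the abstract.
# All hyperparameters and attention formulations are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class AFC(nn.Module):
    """Adaptive Feature Calibration: parallel multi-scale dilated convolutions
    whose outputs are summed residually (assumed fusion scheme)."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):
        return F.relu(x + sum(b(x) for b in self.branches))


class DAF(nn.Module):
    """Directional Attention Fusion: self-attention over height tokens,
    cross-attention from width tokens to height tokens, fused into a
    spatial attention map (assumed formulation)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.h_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.w_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.proj = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        h_seq = x.mean(dim=3).permute(0, 2, 1)        # (b, h, c) height tokens
        h_out, _ = self.h_attn(h_seq, h_seq, h_seq)   # self-attention (height)
        w_seq = x.mean(dim=2).permute(0, 2, 1)        # (b, w, c) width tokens
        w_out, _ = self.w_attn(w_seq, h_out, h_out)   # cross-attention (width)
        h_map = h_out.permute(0, 2, 1).unsqueeze(3).expand(b, c, h, w)
        w_map = w_out.permute(0, 2, 1).unsqueeze(2).expand(b, c, h, w)
        # Fuse both directions into a single-channel spatial attention map.
        return torch.sigmoid(self.proj(torch.cat([h_map, w_map], dim=1)))


class MHSCA(nn.Module):
    """Spatial-channel attention guided by the DAF map. A single SE-style
    channel gate stands in for the paper's multi-head design (assumption)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, spatial_map):
        # Joint reweighting: channel gate (b,c,1,1) x spatial map (b,1,h,w).
        return x * self.channel_gate(x) * spatial_map


class DAFMHSCANet(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        backbone = resnet18(weights=None)  # untrained weights, for the sketch
        self.stem = nn.Sequential(*list(backbone.children())[:-2])  # (b,512,7,7)
        self.afc = AFC(512)
        self.daf = DAF(512)
        self.mhsca = MHSCA(512)
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):
        f = self.afc(self.stem(x))            # coarse features + calibration
        f = self.mhsca(f, self.daf(f))        # DAF map guides MHSCA
        return self.head(f.mean(dim=(2, 3)))  # global average pool + classify


if __name__ == "__main__":
    logits = DAFMHSCANet()(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 7])
```

The seven-way classification head reflects the basic-emotion setting common to FER benchmarks; datasets with a different label space would simply change `num_classes`.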



