With the rapid proliferation of multimodal social media platforms, fake news is increasingly disseminated through multiple modalities such as text, images, and videos, posing serious threats to social stability, public cognition, and the ecosystems of these platforms. Existing unimodal fake news detection methods face significant challenges in multimodal scenarios, as they cannot fully capture cross-modal semantic correlations and inconsistencies. Multimodal fake news detection, which integrates heterogeneous information from textual, visual, and audio modalities to explore inter-modal consistency and complementarity, has therefore become a major research focus in recent years. This paper presents a comprehensive survey of recent advances in multimodal fake news detection, systematically reviewing the fundamental concepts, detection tasks, and underlying technical principles of the field. Major benchmark datasets and commonly used evaluation metrics are then introduced, followed by a structured taxonomy of representative detection methods and a summary of their experimental results. Furthermore, the key challenges facing current research are discussed and promising future research directions are outlined. Compared with existing surveys, this work presents a more comprehensive categorization of methods that emphasizes the evolution of detection techniques, offers clearer comparisons of datasets and experimental analyses, and provides more practical insights for researchers and practitioners in multimodal fake news detection.



