  • Open Access
  • Article
Federated Bimodal Graph Neural Networks for Text-Image Retrieval
  • Xueming Yan 1, 2,   
  • Chuyue Wang 1,   
  • Yaochu Jin 3, *

Received: 24 Dec 2024 | Accepted: 20 Mar 2025 | Published: 27 Jun 2025

Abstract

Text-image retrieval, a key challenge at the intersection of computer vision and natural language processing, aims to retrieve the most semantically relevant image or text given a query in the opposite modality. However, growing privacy and security concerns make traditional centralized learning increasingly unsuitable for sensitive multimodal data. In this paper, we propose FedBi-GNNs, a federated learning framework for bimodal graph neural networks that enables collaborative training across decentralized clients without sharing private data. Each client independently constructs heterogeneous graphs from its local text and image data and learns cross-modal correspondences via bimodal graph matching. The resulting local representations are then aggregated at a central server using a heterogeneous federated aggregation scheme. Empirical results on the MSCOCO benchmark demonstrate that FedBi-GNNs significantly outperform existing state-of-the-art methods, offering improved retrieval accuracy, enhanced privacy preservation, and greater robustness to data heterogeneity across clients.
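To make the server-side step concrete, the following is a minimal sketch of size-weighted parameter averaging in the style of FedAvg. It is an assumption made for illustration only: the abstract does not specify the heterogeneous aggregation scheme used by FedBi-GNNs, so plain FedAvg is swapped in here. The function name `federated_average`, the parameter names `text_encoder` and `image_encoder`, and the toy values are all hypothetical.

```python
from typing import Dict, List

# Each client's model is represented as a mapping from a parameter name
# to a flat list of floats (a stand-in for real weight tensors).
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Size-weighted average of client parameters (FedAvg-style).

    Assumption: this is NOT the paper's heterogeneous aggregation scheme;
    it only illustrates the general idea of aggregating at a server.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weight each client's value by its local sample count.
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Hypothetical toy clients: only parameters are shared, never raw text or images.
clients = [
    {"text_encoder": [1.0, 2.0], "image_encoder": [0.0, 4.0]},
    {"text_encoder": [3.0, 6.0], "image_encoder": [2.0, 0.0]},
]
sizes = [1, 3]  # number of local text-image pairs per client (made up)

global_params = federated_average(clients, sizes)
print(global_params)
```

In this sketch the second client holds three times as much data, so its parameters dominate the average. A scheme designed for heterogeneous clients would typically replace the fixed sample-count weights with something more adaptive.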


How to Cite
Yan, X.; Wang, C.; Jin, Y. Federated Bimodal Graph Neural Networks for Text-Image Retrieval. International Journal of Network Dynamics and Intelligence 2025, 4 (2), 100009. https://doi.org/10.53941/ijndi.2025.100009.
Copyright & License
Copyright (c) 2025 by the authors.