
Data-Based Optimal Couple-Group Consensus Control for Heterogeneous Multi-Agent Systems via Policy Gradient Reinforcement Learning

  • Jun Li 1
  • Xiaoyu Pei 2
  • Lianghao Ji 2,*

Received: 30 Aug 2025 | Revised: 17 Dec 2025 | Accepted: 05 Jan 2026 | Published: 13 Jan 2026

Abstract

This paper investigates the optimal couple-group consensus control (OCGCC) problem for heterogeneous multi-agent systems (HeMASs) with completely unknown dynamics. The agents are divided into two groups according to their order, and the heterogeneous system is transformed into a homogeneous one by adding virtual velocities. A novel data-driven distributed control protocol for HeMASs is then proposed based on policy gradient reinforcement learning (RL). The algorithm is implemented asynchronously within an actor-critic (AC) framework, which addresses the computational imbalance caused by individual differences among the agents. Offline data sets are used to improve the learning efficiency of the system. Convergence and stability are established using functional analysis and Lyapunov stability theory. Finally, simulation examples confirm the effectiveness of the proposed algorithm.
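
The virtual-velocity idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the agent models, so everything here is an assumption made for illustration: discrete-time single- and double-integrator agents, a sampling period `DT`, and the hypothetical names `A_FIRST`, `B_FIRST`, `A_SECOND`, `B_SECOND`, and `step`. Padding the first-order agent with a velocity state that is held at zero gives both agent types the same state dimension, so they can be handled by a common update rule.

```python
import numpy as np

DT = 1.0  # sampling period (assumed, not taken from the paper)

# Second-order agent: state = [position, velocity], input = acceleration.
A_SECOND = np.array([[1.0, DT], [0.0, 1.0]])
B_SECOND = np.array([[0.0], [DT]])

# First-order agent padded with a virtual velocity held at zero:
# state = [position, virtual velocity], input = velocity command.
A_FIRST = np.array([[1.0, 0.0], [0.0, 0.0]])
B_FIRST = np.array([[DT], [0.0]])


def step(A, B, state, u):
    """One step of x(k+1) = A x(k) + B u(k) for a scalar input u."""
    return A @ state + B[:, 0] * u


# Both agent types now share one state layout and one update function.
first = np.array([1.0, 0.0])
second = np.array([-1.0, 0.5])
for _ in range(3):
    first = step(A_FIRST, B_FIRST, first, 0.2)
    second = step(A_SECOND, B_SECOND, second, -0.1)
```

With matching dimensions, a single actor-critic structure could in principle be applied to every agent; how the paper actually defines and uses the virtual velocities is not specified in the abstract.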

How to Cite
Li, J.; Pei, X.; Ji, L. Data-Based Optimal Couple-Group Consensus Control for Heterogeneous Multi-Agent Systems via Policy Gradient Reinforcement Learning. Journal of Machine Learning and Information Security 2026, 2 (1), 1. https://doi.org/10.53941/jmlis.2026.100001.
Copyright & License
Copyright (c) 2026 by the authors.