This paper investigates optimal couple-group consensus control (OCGCC) for heterogeneous multi-agent systems (HeMASs) with completely unknown dynamics. The agents are divided into two groups according to their order differences, and the heterogeneous system is transformed into a homogeneous one by introducing virtual velocities. A novel data-driven distributed control protocol is then proposed based on policy-gradient reinforcement learning (RL). The algorithm is implemented asynchronously within an actor-critic (AC) framework, which mitigates the computational imbalance caused by individual differences among agents. Learning efficiency is further improved by training on offline data sets. Convergence and stability are established via functional analysis and Lyapunov stability theory. Finally, the effectiveness of the proposed algorithm is confirmed by several simulation examples.
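To make the actor-critic idea concrete, the following is a minimal toy sketch of an AC-style consensus learner for scalar agents tracking a common group reference. All dynamics, gains, learning rates, and update rules here are illustrative assumptions for intuition only; they are not the paper's actual protocol or its convergence-guaranteed updates.

```python
import random

def run_ac_consensus(steps=2000, dt=0.05, seed=0):
    """Two scalar agents learn feedback gains driving their states toward a
    common group reference, with a policy-gradient-style actor and a TD critic.
    (Toy sketch: integrator dynamics, quadratic cost -- all assumed.)"""
    rng = random.Random(seed)
    ref = 1.0                                    # common group reference
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)] # agent states
    k = [0.1, 0.1]                               # actor gains (policy parameters)
    w = [0.0, 0.0]                               # critic weights: V(e) ~ w * e^2
    alpha_a, alpha_c, gamma = 0.01, 0.05, 0.95   # learning rates, discount
    for _ in range(steps):
        for i in range(2):
            e = x[i] - ref                       # consensus error
            u = -k[i] * e                        # linear state-feedback actor
            x_next = x[i] + dt * u               # plant step (unknown to learner)
            e_next = x_next - ref
            cost = e * e + 0.1 * u * u           # quadratic stage cost
            # TD error for the quadratic critic V(e) = w * e^2
            delta = cost + gamma * w[i] * e_next ** 2 - w[i] * e * e
            w[i] += alpha_c * delta * e * e      # semi-gradient critic update
            k[i] += alpha_a * delta * e * e      # heuristic actor update: raise
                                                 # the gain while cost persists
            k[i] = max(0.0, min(k[i], 5.0))      # clamp gain to a stable range
            x[i] = x_next
    return [abs(xi - ref) for xi in x]           # final consensus errors
```

Each agent updates its own actor and critic from local error signals only, which loosely mirrors the distributed, per-agent learning the abstract describes; the asynchronous scheduling and two-group structure of the actual algorithm are omitted here for brevity.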



