The rapid proliferation of intelligent mobile applications has made deploying Deep Neural Networks (DNNs) on Mobile Edge Devices (EDs) a pressing necessity. However, the limited computational resources of EDs often lead to high energy consumption and degraded inference accuracy. To address these challenges, we propose a Distributed Collaborative Inference (DCI) framework that reduces on-device overhead by distributing inference workloads across a network of EDs and Mobile Edge Computing (MEC) nodes. To cope with the system's dynamic conditions, we develop an evolutionary reinforcement learning algorithm based on the Cross-Entropy Method (CEM). A key feature of the algorithm is its use of negative Temporal Difference (TD) error as the fitness criterion for selecting elite members of the population. Learning from the high-quality samples these elites provide accelerates convergence and facilitates optimal decision-making in dynamic environments. Simulation results show that our method outperforms existing baselines, improving the task completion rate by 57.5% and reducing total system cost by 65.7%.
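To make the elite-selection idea concrete, the sketch below shows a generic CEM loop over policy parameters in which fitness is the negative (absolute) TD error, so samples with smaller TD error are kept as elites and used to refit the sampling distribution. This is a minimal illustration under assumptions, not the paper's implementation: the linear policy, the toy environment `env_step`, the fixed value estimate `value_fn`, and all constants (`POP_SIZE`, `ELITE_FRAC`, etc.) are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 4, 2          # assumed toy problem sizes
POP_SIZE, ELITE_FRAC, GENERATIONS = 32, 0.25, 50
GAMMA = 0.99

def policy_action(theta, state):
    """Linear policy: pick the action with the highest score."""
    logits = theta.reshape(ACTION_DIM, STATE_DIM) @ state
    return int(np.argmax(logits))

def rollout_td_error(theta, value_fn, env_step, horizon=20):
    """Mean absolute TD error of a short rollout under policy theta."""
    state = rng.normal(size=STATE_DIM)
    total = 0.0
    for _ in range(horizon):
        action = policy_action(theta, state)
        next_state, reward = env_step(state, action)
        td = reward + GAMMA * value_fn(next_state) - value_fn(state)
        total += abs(td)
        state = next_state
    return total / horizon

# Stand-in environment and value estimate (assumptions for this sketch).
w_value = rng.normal(size=STATE_DIM)
value_fn = lambda s: float(w_value @ s)

def env_step(state, action):
    next_state = 0.9 * state + 0.1 * rng.normal(size=STATE_DIM)
    reward = -float(np.linalg.norm(next_state)) + 0.1 * action
    return next_state, reward

# CEM loop: maintain a Gaussian over policy parameters, score each sample
# by *negative* TD error (lower TD error => higher fitness), keep the
# elites, and refit the Gaussian to them.
dim = STATE_DIM * ACTION_DIM
mu, sigma = np.zeros(dim), np.ones(dim)
n_elite = max(1, int(POP_SIZE * ELITE_FRAC))

for gen in range(GENERATIONS):
    population = mu + sigma * rng.normal(size=(POP_SIZE, dim))
    fitness = np.array([-rollout_td_error(p, value_fn, env_step)
                        for p in population])
    elites = population[np.argsort(fitness)[-n_elite:]]
    mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3  # keep variance alive

print("final mean parameter norm:", np.linalg.norm(mu))
```

The key design choice mirrored here is that elites are ranked by how well their transitions agree with the current value estimate (small TD error) rather than by raw episodic return, which is what lets the population supply high-quality samples for faster learning.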



