To address the issues of sparse rewards, low sample efficiency, and limited policy generalization in offline motion planning of spherical multi-telescopic-legged robots in complex three-dimensional environments, this paper proposes a Look-Ahead-Look-Back Hindsight Experience Replay (LALB-HER) algorithm. Building upon the future-goal sampling strategy of the conventional Hindsight Experience Replay (HER) framework, the proposed method introduces a backward-looking trajectory reinforcement mechanism that raises the utilization of past experiences, thereby mitigating the loss of generalization performance caused by the diminishing influence of older samples. The Soft Actor–Critic (SAC) algorithm is adopted as the underlying reinforcement learning framework, into which the proposed LALB-HER mechanism is integrated, and a task-oriented reward function is designed to promote stable convergence and efficient policy learning. Simulation results in complex environments demonstrate that the proposed method not only significantly accelerates policy convergence but also measurably improves the generalization of the learned policy across varying task conditions.
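The abstract does not specify how the look-ahead and look-back passes are combined, so the following is only a minimal sketch of the general idea: alongside HER's standard "future" strategy (relabeling a transition with goals achieved later in the same trajectory), goals achieved earlier in the trajectory are also sampled, so that older experience keeps contributing learning signal. The function name `sample_relabel_goals` and the parameters `k_future` and `k_past` are hypothetical, not taken from the paper.

```python
import random


def sample_relabel_goals(trajectory, t, k_future=4, k_past=4):
    """Sample substitute goals for the transition at timestep t.

    trajectory: list of achieved goals, one per timestep.
    Hypothetical illustration: HER's 'future' strategy draws goals
    achieved after t; a look-back pass additionally draws goals
    achieved before t. Each sampled goal would then be used to
    relabel the transition and recompute its reward.
    """
    future = trajectory[t + 1:]   # look-ahead: goals reached later
    past = trajectory[:t]         # look-back: goals reached earlier
    goals = []
    if future:
        goals += random.sample(future, k=min(k_future, len(future)))
    if past:
        goals += random.sample(past, k=min(k_past, len(past)))
    return goals
```

In a full replay buffer, each returned goal would replace the transition's original goal and the sparse reward would be recomputed against it, which is the standard HER relabeling step.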



