To address the issues of sparse rewards, low sample efficiency, and limited policy generalization in offline motion planning of spherical multi-telescopic-legged robots in complex three-dimensional environments, this paper proposes a Look-Ahead-Look-Back Hindsight Experience Replay (LALB-HER) algorithm. Building upon the future-goal sampling strategy of the conventional Hindsight Experience Replay (HER) framework, the proposed method introduces a backward-looking trajectory reinforcement mechanism that raises the utilization of past experiences, thereby mitigating the loss of generalization performance caused by the diminishing influence of older samples. The Soft Actor–Critic (SAC) algorithm is adopted as the underlying reinforcement learning framework, into which the proposed LALB-HER mechanism is integrated, and a task-oriented reward function is designed to promote stable convergence and efficient policy learning. Simulation results in complex environments demonstrate that the proposed method not only significantly accelerates policy convergence but also measurably improves the generalization of the learned policy across varying task conditions.
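The abstract does not specify how the look-ahead and look-back passes are combined, so the following is only a minimal sketch of the general idea: alongside HER's standard "future" strategy (relabeling a transition with goals achieved later in the same trajectory), goals achieved earlier in the trajectory are also sampled, so that older experience keeps contributing learning signal. The function name `sample_relabel_goals` and the parameters `k_future` and `k_past` are hypothetical, not taken from the paper.

```python
import random


def sample_relabel_goals(trajectory, t, k_future=4, k_past=4):
    """Sample substitute goals for the transition at timestep t.

    trajectory: list of achieved goals, one per timestep.
    Hypothetical illustration: HER's 'future' strategy draws goals
    achieved after t; a look-back pass additionally draws goals
    achieved before t. Each sampled goal would then be used to
    relabel the transition and recompute its reward.
    """
    future = trajectory[t + 1:]   # look-ahead: goals reached later
    past = trajectory[:t]         # look-back: goals reached earlier
    goals = []
    if future:
        goals += random.sample(future, k=min(k_future, len(future)))
    if past:
        goals += random.sample(past, k=min(k_past, len(past)))
    return goals
```

In a full replay buffer, each returned goal would replace the transition's original goal and the sparse reward would be recomputed against it, which is the standard HER relabeling step.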



