Learning-Based Optimization for Vehicle/Robot Routing Problems: A Survey

Jun Li

doi:10.53941/jaia.2026.100011

Abstract

Learning-based Neural Combinatorial Optimization (NCO) is an emerging paradigm for various vehicle/robot routing problems. It transitions solution strategies from manual heuristics to data-driven learning. This paper presents a systematic survey of deep learning-based approaches for route optimization. We first unify classical routing models and formulate reinforcement learning methods within a Markov Decision Process (MDP) framework. Existing literature is primarily classified into two categories: (1) end-to-end neural solvers, encompassing constructive and improvement-based methods, constraint-handling techniques, and various encoder–decoder or generative training schemes; and (2) scalability-oriented solvers, which leverage divide-and-conquer strategies to address large-scale routing problems. Finally, we discuss vital future research directions, including the integration of heuristic knowledge into NCO, large-scale multi-objective optimization, and automated modeling/solving. This survey offers a structured taxonomy of learning-based route optimization methods and discusses the potential for extending them to a broader class of combinatorial optimization problems.

Scilight Press
