Operations Research and Management Science, 2026, Vol. 35, Issue (1): 153-159. DOI: 10.12005/orms.2026.0022

• Applied Research •

Solving Vehicle Path Planning with Deep Reinforcement Learning Based on Multi-source Information Fusion

SU Jiman1,2, LU Yuming1, LI Zhengxiu3, HONG Lianhuan1, JIE Lilin4

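  1. School of Aeronautical Manufacturing and Mechanical Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063;
    2. School of Energy and Transportation, Nantong College of Science and Technology, Nantong, Jiangsu 226007;
    3. School of Information Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063;
    4. School of Instrument Science and Optical Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063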
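  • Received: 2024-06-15 Published: 2026-06-04
  • Corresponding author: LU Yuming (1969-), female, from Nanchang, Jiangxi; professor and master's supervisor; research interests: optimization theory, evolutionary algorithms, etc. Email: luyuming69@163.com.
  • About the author: SU Jiman (1998-), female, from Suzhou, Anhui; master's student; research interests: combinatorial optimization, deep reinforcement learning, etc.
  • Funding:
    National Natural Science Foundation of China (620066031); Natural Science Foundation of Jiangxi Province (20242BAB25094); Nantong Natural Science Foundation and Social and People's Livelihood Science and Technology Program (MSZ2024106)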

Deep Reinforcement Learning Based on Multi-source Information Fusion Solving Vehicle Path Planning Problem

SU Jiman1,2, LU Yuming1, LI Zhengxiu3, HONG Lianhuan1, JIE Lilin4   

  1. School of Aeronautical Manufacturing and Mechanical Engineering, Nanchang Hangkong University, Nanchang 330063, China;
    2. School of Energy and Transportation, Nantong College of Science and Technology, Nantong 226007, China;
    3. School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China;
    4. School of Instrument Science and Optical Engineering, Nanchang Hangkong University, Nanchang 330063, China
  • Received: 2024-06-15 Published: 2026-06-04

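Summary: The vehicle routing problem is a widely used problem model in modern logistics and transportation. Current deep-reinforcement-learning approaches to it suffer from long running times, slow solving speed, and an inability to obtain accurate solutions. To address these issues, an improved deep-reinforcement-learning method for multi-objective vehicle routing is proposed: diverse encoders mine multi-source information, and a decoder structure based on contextual multi-scale information is introduced to construct an accurate decision sequence. During model training, the Greedy Rollout baseline of the REINFORCE algorithm is used to improve solution quality. Experiments on problems of different sizes and comparisons with existing deep reinforcement learning algorithms show that the proposed algorithm yields more accurate solutions and substantially faster solving, and generalization experiments indicate that it has better robustness and generalization.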

Key words: vehicle routing, deep reinforcement learning, local representation, contextual multi-scale information, Greedy Rollout

Abstract: The vehicle routing problem is a combinatorial optimization model widely used in modern logistics and transportation, with applications in resource allocation, process optimization, network planning, and logistics. It is a typical NP-hard combinatorial optimization problem with high complexity and many constraints. Traditional solution methods include exact, heuristic, and approximation algorithms, but they struggle to obtain the optimal solution under multi-objective, multi-constraint conditions. In recent years, deep reinforcement learning has gradually been applied to vehicle routing: neural network models are built to learn problem features autonomously and to optimize in complex scenarios. However, existing methods based on deep reinforcement learning usually suffer from long running times, slow solving speed, and difficulty obtaining accurate solutions in practical applications. To address these problems, this paper proposes an improved multi-objective vehicle path planning method based on deep reinforcement learning.
The method uses a variety of encoders to mine multi-source information and introduces a decoder structure based on contextual multi-scale information to construct an accurate decision sequence. In the encoder, each sub-problem of the multi-objective combinatorial optimization problem is encoded so that multi-source information is mined deeply and captured comprehensively. The decoder constructs an accurate and coherent decision sequence based on the contextual multi-scale interaction space. During model training, the Greedy Rollout baseline of the REINFORCE algorithm is adopted to improve the solution quality and stability of the model and to accelerate convergence.
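The training idea named above, REINFORCE with a greedy rollout baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the one-parameter policy (logits proportional to negative distance, scaled by a hypothetical `theta`), the function names, the learning rate, and the single-route, single-objective setting are all assumptions made for illustration, standing in for the paper's encoder-decoder network. The sketch shows only the core mechanism: a tour is sampled from the current policy, a second tour is decoded greedily from a frozen baseline copy of the policy, and the difference in tour length weights the log-probability gradient.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))


def rollout(coords, theta, greedy, rng):
    """Build a tour from node 0 with a toy one-parameter policy (assumed).

    Logits are -theta * distance to each unvisited node. Returns the tour
    and d/dtheta of its log-probability under the policy.
    """
    tour, grad_logp = [0], 0.0
    unvisited = list(range(1, len(coords)))
    while unvisited:
        dists = [math.dist(coords[tour[-1]], coords[j]) for j in unvisited]
        logits = [-theta * d for d in dists]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        probs = [w / total for w in weights]
        if greedy:
            k = max(range(len(probs)), key=probs.__getitem__)
        else:
            k = rng.choices(range(len(probs)), weights=probs)[0]
        # d/dtheta log softmax_k = -d_k + sum_j p_j * d_j
        grad_logp += -dists[k] + sum(p * d for p, d in zip(probs, dists))
        tour.append(unvisited.pop(k))
    return tour, grad_logp


def reinforce_step(coords, theta, baseline_theta, lr, rng):
    """One REINFORCE update; the greedy rollout of a frozen policy is the baseline."""
    sampled, grad_logp = rollout(coords, theta, greedy=False, rng=rng)
    baseline, _ = rollout(coords, baseline_theta, greedy=True, rng=rng)
    advantage = tour_length(coords, sampled) - tour_length(coords, baseline)
    # Descend on expected tour length: gradient estimate = advantage * grad log p
    return theta - lr * advantage * grad_logp


# Usage on one random 10-node instance (sizes and step count are arbitrary).
rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(10)]
theta = baseline_theta = 1.0
for _ in range(200):
    theta = reinforce_step(coords, theta, baseline_theta, lr=0.05, rng=rng)
```

In practice the baseline policy is typically replaced by the current policy only when the latter performs better on held-out instances; that refresh step is omitted here to keep the sketch short.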
To verify the proposed method, experimental data are taken from classical vehicle routing problem datasets, and problem sets of different sizes are generated by simulation for evaluation. Experiments on problems of different scales and comparisons with existing deep reinforcement learning algorithms show that the proposed algorithm outperforms them in both solution quality and speed, and that it shows excellent robustness and generalization in the generalization experiments. Specifically, in terms of solution quality, the proposed algorithm solves the multi-objective combinatorial optimization problem accurately, and result quality is significantly improved. In terms of solving speed, the REINFORCE algorithm with the Greedy Rollout baseline improves the learning stability of the model, accelerates convergence, and significantly improves efficiency. In terms of generalization, the experiments verify the robustness and efficiency of the algorithm on complex problems with high levels of multi-variable interaction.
The improved model and algorithm proposed in this paper provide a new solution to the multi-objective vehicle path planning problem and have both theoretical and practical significance. Future work could further optimize the model structure, extend the method to dynamic VRPs, and combine it with other optimization methods to improve model performance.

Key words: vehicle routing, deep reinforcement learning, local representation, contextual multi-scale information, Greedy Rollout

Chinese Library Classification (CLC) number: