Deep Reinforcement Learning Based on Multi-source Information Fusion Solving Vehicle Path Planning Problem

Operations Research and Management Science, 2026, Vol. 35, Issue 1: 153-159. DOI: 10.12005/orms.2026.0022


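SU Jiman¹,², LU Yuming¹, LI Zhengxiu³, HONG Lianhuan¹, JIE Lilin⁴

1. School of Aeronautical Manufacturing and Mechanical Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063, China;
2. School of Energy and Transportation, Nantong College of Science and Technology, Nantong, Jiangsu 226007, China;
3. School of Information Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063, China;
4. School of Instrument Science and Optical Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063, China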

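Received: 2024-06-15 Published: 2026-06-04
Corresponding author: LU Yuming (b. 1969), female, from Nanchang, Jiangxi; professor and master's supervisor; research interests: optimization theory, evolutionary algorithms, etc. Email: luyuming69@163.com.
About the author: SU Jiman (b. 1998), female, from Suzhou, Anhui; master's student; research interests: combinatorial optimization, deep reinforcement learning, etc.
Funding: National Natural Science Foundation of China (620066031); Natural Science Foundation of Jiangxi Province (20242BAB25094); Nantong Natural Science Foundation and Social and People's Livelihood Science and Technology Program (MSZ2024106)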


Abstract

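Abstract: The vehicle routing problem is a widely used model in modern logistics and transportation. Current deep reinforcement learning approaches to it suffer from long running times, slow solving speed, and an inability to obtain accurate solutions. To address these problems, an improved deep reinforcement learning method for multi-objective vehicle routing is proposed. It uses diverse encoders to mine multi-source information and introduces a decoder structure based on contextual multi-scale information to construct accurate decision sequences. In the training phase, the REINFORCE algorithm with a Greedy Rollout baseline is used to improve solution quality. Experiments on problems of different scales and comparisons with existing deep reinforcement learning algorithms show that the proposed algorithm gives more accurate solutions and a large improvement in solving speed; generalization experiments further show that it has better robustness and generalization.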


Abstract: The vehicle routing problem is a combinatorial optimization model widely used in modern logistics and transportation, with applications in resource allocation, process optimization, and network planning. It is a typical NP-hard problem with high complexity and many constraints. Traditional solution methods include exact, heuristic, and approximation algorithms, but these struggle to obtain optimal solutions under multi-objective, multi-constraint conditions. In recent years, deep reinforcement learning has increasingly been applied to vehicle routing: neural network models learn problem features autonomously and optimize in complex scenarios. However, existing deep reinforcement learning methods often suffer from long running times, slow solving speed, and difficulty in obtaining accurate solutions in practice. To address these problems, this paper proposes an improved deep reinforcement learning method for multi-objective vehicle path planning.
The method uses diverse encoders to mine multi-source information and introduces a decoder built on contextual multi-scale information to construct accurate decision sequences. In the encoder, each subproblem of the multi-objective combinatorial optimization problem is encoded so that multi-source information is mined deeply and captured comprehensively. The decoder constructs an accurate, coherent decision sequence based on the contextual multi-scale interaction space. During training, the REINFORCE algorithm with a Greedy Rollout baseline is used to improve solution quality and training stability and to accelerate convergence.
To verify the proposed method, experimental data are drawn from classical vehicle routing problem datasets, and problem sets of different sizes are generated by simulation for evaluation. Experiments on problems of different scales and comparisons with existing deep reinforcement learning algorithms show that the proposed algorithm outperforms them in solution quality and speed and exhibits strong robustness and generalization in generalization experiments. In terms of solution quality, the proposed algorithm solves the multi-objective combinatorial optimization problem accurately, with significantly improved results. In terms of solving speed, the REINFORCE algorithm with the Greedy Rollout baseline improves the learning stability of the model, accelerates convergence, and significantly improves efficiency. In terms of generalization, the generalization experiments verify the robustness and efficiency of the algorithm on complex problems with multi-variable interactions.
The improved model and algorithm proposed in this paper provide a new solution to the multi-objective vehicle path planning problem and have both theoretical and practical significance. Future work could further optimize the model structure, extend the method to dynamic vehicle routing problems, and combine it with other optimization methods to improve performance.
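
The training scheme named in the abstract, REINFORCE with a Greedy Rollout baseline, can be illustrated with a minimal sketch. This is not the paper's model: the softmax-over-negative-distance policy, the `temperature` parameter standing in for network weights, and all function names below are assumptions made purely for illustration. The sketch only shows how the advantage is formed, namely the cost of a sampled tour minus the cost of a greedy tour from a frozen baseline policy.

```python
import math
import random

def tour_length(coords, tour):
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))

def rollout(coords, temperature, greedy, rng):
    """Build a tour node by node with a toy softmax policy (an assumption).

    Nearer nodes get higher probability. Returns the tour and the summed
    log-probability of the choices that were made.
    """
    tour, log_prob = [0], 0.0
    unvisited = set(range(1, len(coords)))
    while unvisited:
        cur = tour[-1]
        cand = sorted(unvisited)
        logits = [-math.dist(coords[cur], coords[j]) / temperature for j in cand]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        probs = [w / total for w in weights]
        if greedy:
            # Greedy rollout: always take the most probable next node.
            k = max(range(len(cand)), key=probs.__getitem__)
        else:
            # Sampling rollout: draw the next node from the policy.
            k = rng.choices(range(len(cand)), weights=probs)[0]
        log_prob += math.log(probs[k])
        tour.append(cand[k])
        unvisited.remove(cand[k])
    return tour, log_prob

def reinforce_terms(coords, temperature, baseline_temperature, rng):
    """Return (advantage, log_prob) for one problem instance.

    advantage = cost(sampled tour) - cost(greedy tour of the frozen baseline).
    """
    sampled, log_prob = rollout(coords, temperature, greedy=False, rng=rng)
    baseline, _ = rollout(coords, baseline_temperature, greedy=True, rng=rng)
    advantage = tour_length(coords, sampled) - tour_length(coords, baseline)
    return advantage, log_prob

def surrogate_loss(advantage, log_prob):
    """REINFORCE surrogate: its gradient w.r.t. policy parameters is
    advantage * grad(log_prob), with the advantage treated as a constant."""
    return advantage * log_prob

if __name__ == "__main__":
    rng = random.Random(0)
    coords = [(rng.random(), rng.random()) for _ in range(6)]
    adv, lp = reinforce_terms(coords, 0.5, 0.5, rng)
    print("advantage:", adv, "surrogate loss:", surrogate_loss(adv, lp))
```

In an actual neural implementation the log-probability would come from the decoder and an automatic differentiation framework would take the gradient of the surrogate loss; the baseline policy would typically be a periodically refreshed copy of the best model so far.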

Key words: vehicle routing, deep reinforcement learning, local representation, contextual multi-scale information, Greedy Rollout

CLC number: TP18; U492.22


SU Jiman, LU Yuming, LI Zhengxiu, HONG Lianhuan, JIE Lilin. Deep Reinforcement Learning Based on Multi-source Information Fusion Solving Vehicle Path Planning Problem[J]. Operations Research and Management Science, 2026, 35(1): 153-159.


