Operations Research and Management Science (运筹与管理) ›› 2020, Vol. 29 ›› Issue (5): 227-239. DOI: 10.12005/orms.2020.0137

• Review •

Research Progress and Prospects for Application of Reinforcement Learning in Operations Research

XU Xiang-bin, LI Zhi-peng

  1. School of Transportation and Logistics, East China Jiaotong University, Nanchang 330013, Jiangxi, China
  • Received: 2018-12-27 Published: 2020-05-25
  • About the authors: XU Xiang-bin (b. 1975), male, from Hukou, Jiangxi; professor, Ph.D.; main research interests: logistics and supply chain management. LI Zhi-peng (b. 1995), male, from Poyang, Jiangxi; master's student; main research interests: reinforcement learning and operations optimization.
  • Funding:
    National Natural Science Foundation of China (71761013); General Program of the Natural Science Foundation of Jiangxi Province (20181BAB201010)

Abstract: Reinforcement learning has become a research hotspot in artificial intelligence and has been applied successfully in many fields. It treats many operations optimization problems as sequential decision problems, models them as Markov decision processes, and solves them, which gives it considerable advantages on complex, dynamic, and stochastic problems. This paper reviews applications of reinforcement learning in operations optimization. It first introduces the basic principles of reinforcement learning and the research framework for applying it to operations optimization. It then reviews and summarizes research results in inventory control, route optimization, bin packing and loading, and job shop scheduling, and compares recent deep reinforcement learning with traditional methods in operations research applications to highlight the advantages of deep reinforcement learning. Finally, it proposes several research directions that merit further study, in the hope of providing a reference for research on reinforcement learning in operations optimization.

Key words: reinforcement learning, operations optimization, sequential decision, Markov decision process, deep reinforcement learning
