Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (8): 99-104. DOI: 10.12005/orms.2025.0247

• Theoretical Analysis and Methodological Study •

Dynamic Energy-saving Scheduling Method Based on Deep Reinforcement Learning for Flexible Job Shop

LU Xinyi, HAN Xiaolong   

  1. Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2023-08-15; Published: 2025-12-04
  • Corresponding author: LU Xinyi (b. 1998), female, a native of Suzhou, Jiangsu; master's student; research interests: procurement and supply chain management. Email: 992028708@qq.com.

Abstract: For the flexible job-shop scheduling problem in the context of green manufacturing, this paper considers the dynamic event of random job arrivals, builds a mixed-integer programming model that minimizes total tardiness and production energy consumption, and proposes DDQN-ST9, an algorithm based on composite scheduling rules and deep reinforcement learning. First, in line with the optimization objectives, three heuristic scheduling rules are designed for operation sequencing and three for machine selection. Second, the DDQN training framework is improved by introducing SumTree-based prioritized experience replay. In addition, six production-state features are set as the algorithm's input, the individual rules are combined into nine composite scheduling rules that form the action space, and a sparse reward mechanism is designed, yielding the adaptive scheduling algorithm DDQN-ST9. Finally, production instances of different sizes and parameters verify the effectiveness and superiority of DDQN-ST9 for the dynamic energy-saving flexible job-shop scheduling problem, as well as its generality across differently configured production environments.
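For context, the Double DQN learning target that the improved training framework builds on takes the standard textbook form (shown here for reference only; the reward r_t would follow the paper's sparse reward mechanism, which the abstract does not detail):

    y_t = r_t + \gamma \, Q\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \, \theta^{-}\big)

where \theta denotes the online network weights and \theta^{-} the periodically synchronized target network weights; decoupling action selection from action evaluation in this way reduces the overestimation bias of plain DQN.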

Abstract: The accelerating pace of economic globalization has exposed the manufacturing industry to fierce market competition and a volatile production environment. Responding effectively to the challenges posed by various dynamic events has therefore become a key issue for the survival of enterprises. In recent years, many enterprises have been turning to flexible manufacturing models. In this context, the dynamic flexible job-shop scheduling problem (DFJSP) has attracted extensive attention from both industry and academia.
Random events associated with jobs are one of the main drivers of dynamic scheduling; a typical case is the dynamic flexible job-shop scheduling problem with random job arrivals (DFJSP-RJA). To address this problem, this paper develops a mixed-integer programming model in line with the goals of green manufacturing. The optimization objective is to simultaneously minimize the total production delay and the energy consumption, the latter comprising both machine idling and processing energy consumption.
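As an illustrative sketch only, with weights and notation that are assumptions for exposition rather than the paper's exact formulation, a weighted-sum form of such a bi-objective model could read:

    \min F = \omega_1 \sum_{j=1}^{n} \max(0, \, C_j - d_j)
             + \omega_2 \Big( \sum_{k} P_k^{\mathrm{idle}} \, t_k^{\mathrm{idle}}
             + \sum_{j,i,k} P_k^{\mathrm{proc}} \, p_{jik} \, x_{jik} \Big)

where C_j and d_j are the completion time and due date of job j; P_k^{idle} and P_k^{proc} are the idle and processing power of machine k; t_k^{idle} is machine k's accumulated idle time; p_{jik} is the processing time of operation i of job j on machine k; and x_{jik} \in \{0,1\} indicates that assignment.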
Since deep reinforcement learning (DRL) can deliver both high solution quality and fast response in dynamic environments, and scheduling rules are widely used as an immediate-response method in earlier studies of dynamic scheduling problems, this paper combines the two and proposes DDQN-ST9, an algorithm based on composite scheduling rules and DRL. First, guided by the optimization objectives of on-time completion and energy saving, six production-state feature indicators with values in [0,1] are defined, and three scheduling rules are designed for operation sequencing and three for machine selection; these are used to construct the feature vectors and the action space of the algorithm. Then, prioritized experience replay implemented with a SumTree is introduced on top of the DDQN algorithm to accelerate convergence and improve training efficiency. DFJSP-RJA can be regarded as a Markov decision process (MDP): after a perturbation occurs, the agent integrates the current production-state information and selects the most suitable of nine composite scheduling rules to schedule both the original and the newly arrived jobs.
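The following Python sketch illustrates the two building blocks named above under stated assumptions: the rule names are hypothetical placeholders (the abstract does not enumerate the concrete six rules), and the SumTree is the standard prioritized-experience-replay structure rather than code from the paper.

    import numpy as np

    # Composite action space: 3 operation-sequencing rules x 3 machine-selection
    # rules = 9 composite scheduling rules. Rule names are placeholders.
    SEQUENCING_RULES = ["seq_rule_1", "seq_rule_2", "seq_rule_3"]
    MACHINE_RULES = ["mac_rule_1", "mac_rule_2", "mac_rule_3"]
    ACTIONS = [(s, m) for s in SEQUENCING_RULES for m in MACHINE_RULES]  # |A| = 9

    class SumTree:
        """Binary sum tree for prioritized experience replay.

        Leaves store transition priorities; every internal node stores the
        sum of its children, so drawing a sample with probability
        proportional to priority costs O(log n).
        """

        def __init__(self, capacity):
            self.capacity = capacity                 # max stored transitions
            self.tree = np.zeros(2 * capacity - 1)   # internal nodes + leaves
            self.data = [None] * capacity            # transitions, parallel to leaves
            self.write = 0                           # next leaf slot to overwrite

        def total(self):
            return self.tree[0]                      # root = sum of all priorities

        def add(self, priority, transition):
            leaf = self.write + self.capacity - 1
            self.data[self.write] = transition
            self.update(leaf, priority)
            self.write = (self.write + 1) % self.capacity

        def update(self, leaf, priority):
            change = priority - self.tree[leaf]
            self.tree[leaf] = priority
            while leaf != 0:                         # propagate change up to the root
                leaf = (leaf - 1) // 2
                self.tree[leaf] += change

        def get(self, s):
            """Walk down from the root to the leaf covering cumulative mass s."""
            idx = 0
            while 2 * idx + 1 < len(self.tree):      # stop when idx is a leaf
                left = 2 * idx + 1
                if s <= self.tree[left]:
                    idx = left
                else:
                    s -= self.tree[left]
                    idx = left + 1
            return idx, self.tree[idx], self.data[idx - self.capacity + 1]

To sample a minibatch, one would partition [0, total()] into equal segments and call get() with a uniform draw from each segment, which is the usual proportional-prioritization scheme.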
To comprehensively test the performance of DDQN-ST9 on the dynamic energy-saving scheduling problem for flexible job shops, the algorithm is simulated on several benchmark instances from the Kacem and Brandimarte series. First, the nine composite scheduling rules proposed in this paper are compared with five classical scheduling rules from the literature on Kacem and Brandimarte instances of different sizes, which verifies the superiority of DDQN-ST9 in terms of scheduling-rule design, scheduling-algorithm design, and the improved algorithm structure. Second, by varying both the delivery urgency factor and the parameter of the exponential distribution governing the random arrival of jobs, a number of Brandimarte instances with different delivery requirements and market demands are solved, verifying that DDQN-ST9 can effectively cope with a variety of production-environment configurations.
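A brief sketch of how such a test configuration might be generated, assuming (as the abstract's wording suggests but does not fully specify) exponential inter-arrival times for new jobs and TWK-style due dates scaled by the delivery urgency factor; the function and its parameters are illustrative, not the paper's:

    import random

    def generate_arrivals(n_new_jobs, mean_interarrival, urgency_factor, total_proc_time):
        """Hypothetical dynamic-event generator for DFJSP-RJA experiments.

        total_proc_time[j]: expected total processing time of new job j.
        A smaller urgency_factor yields tighter due dates (more urgent orders);
        a smaller mean_interarrival models higher market demand.
        """
        events, t = [], 0.0
        for j in range(n_new_jobs):
            t += random.expovariate(1.0 / mean_interarrival)  # exponential gap
            due = t + urgency_factor * total_proc_time[j]     # TWK-style due date
            events.append({"job": j, "release": t, "due": due})
        return events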
This paper focuses on combining DRL with scheduling rules for the dynamic energy-saving scheduling problem in a flexible job shop. Future work can extend the approach to other shop environments and consider the impact of other dynamic events on production scheduling, such as changes in machining times and machine failures. How to better optimize the solution of complex dynamic scheduling problems with deep reinforcement learning also deserves further investigation.

Key words: dynamic flexible job shop scheduling problem, deep reinforcement learning, heuristic scheduling rules, prioritized experience replay, adaptive scheduling

CLC number: