Operations Research and Management Science ›› 2024, Vol. 33 ›› Issue (9): 221-226. DOI: 10.12005/orms.2024.0309

• Management Science •

Research on Cloud Order Dynamic Acceptance and Scheduling Based on Deep Reinforcement Learning

DING Xianghai, ZHANG Mengchai, LIU Chunlai, HAN Jie

  1. School of Management, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
  • Received: 2022-03-27; Online: 2024-09-25; Published: 2024-12-31
  • Corresponding author: LIU Chunlai (1986-), male, from Qingdao, Shandong; lecturer; research interests: production and operations management.
  • About the authors: DING Xianghai (1971-), male, from Xiangtan, Hunan; professor; research interests: industrial engineering. ZHANG Mengchai (1997-), female, from Hangzhou, Zhejiang; master's student; research interests: intelligent optimization and scheduling. HAN Jie (1997-), male, from Dezhou, Shandong; master's student; research interests: shop scheduling.
  • Funding:
    National Natural Science Foundation of China (71901084); Zhejiang Provincial Natural Science Foundation (LQ19G020010); Fundamental Research Funds for the Provincial Universities of Zhejiang (GK199900299012-210); Humanities and Social Sciences Research Fund of the Ministry of Education (19YJC630099)

Research on Cloud Order Dynamic Acceptance and Scheduling Based on Deep Reinforcement Learning

DING Xianghai, ZHANG Mengchai, LIU Chunlai, HAN Jie   

  1. School of Management, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received:2022-03-27 Online:2024-09-25 Published:2024-12-31

Abstract: To solve the acceptance and scheduling problem for dynamically arriving cloud orders, a deep Q-network (DQN) algorithm with improvement strategies is proposed in a flexible flow shop setting. Given the two-stage nature of the problem, a joint decision model with an order-acceptance agent and an order-scheduling agent is designed: the acceptance agent aims to maximize profit, while the scheduling agent aims to minimize tardiness and disturbance. To handle the dynamic arrival of orders, a dynamic interaction mechanism between the two agents is designed. Within the scheduling agent, improvement strategies such as critical-path-based operation and machine candidate sets and earliest-start processing of operations are considered, and the DQN network structure is improved so that the number of rules for selecting jobs and machines increases to 50, enhancing the coordination between cloud orders and the production of existing orders. Numerical experiments show that, across problem instances of different sizes, the proposed algorithm performs well in terms of maximum profit, machine load, and algorithmic stability, and can improve enterprise profit and the order acceptance rate.

Keywords: order acceptance, dynamic decision-making, deep reinforcement learning, flexible flow shop

Abstract: Cloud manufacturing is a new intelligent manufacturing model that uses networks and a service platform to provide all kinds of on-demand manufacturing services for customer needs. Its main characteristics can be summarized as being customer-centric, service uncertainty, and service on demand. After a production enterprise joins a cloud manufacturing platform, it faces two types of orders: established existing orders and dynamically arriving cloud orders.
In the cloud manufacturing environment, the order acceptance and scheduling (OAS) problem with a flexible flow shop as the processing environment is described as follows. After the platform dispatches a cloud order to an enterprise with surplus capacity, the enterprise must decide whether to accept the order and, if so, complete the production arrangement while still producing its existing orders. Order arrivals follow a Poisson distribution, and each order specifies its quantity, price, due date, part number, and other information. When a cloud order is dynamically distributed to the enterprise, the enterprise must combine the order's production information, the current state of the shop floor, and the expected arrival of future orders to determine the set of accepted orders and their production schedule, so as to maximize the total profit of the enterprise.
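The dynamic arrival process described above can be sketched in a few lines of Python. This is a minimal illustration only: the Poisson arrivals and the order attributes (quantity, price, due date, part number) come from the problem description, but all numeric ranges and the arrival rate are hypothetical placeholders, since the paper's instance parameters are not given in this abstract.

```python
import random
from dataclasses import dataclass

@dataclass
class CloudOrder:
    quantity: int
    price: float
    due_date: float
    part_id: int
    arrival_time: float

def generate_cloud_orders(rate, horizon, rng):
    """Generate dynamically arriving cloud orders.

    Inter-arrival times are exponential, so the number of arrivals
    within the horizon follows a Poisson distribution with mean
    rate * horizon.
    """
    orders, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            break
        orders.append(CloudOrder(
            quantity=rng.randint(1, 20),               # illustrative range
            price=round(rng.uniform(50.0, 200.0), 2),  # illustrative range
            due_date=t + rng.uniform(10.0, 50.0),      # due date after arrival
            part_id=rng.randint(1, 5),
            arrival_time=t,
        ))
    return orders

rng = random.Random(42)
orders = generate_cloud_orders(rate=0.2, horizon=100.0, rng=rng)
```

On each such arrival, the enterprise would query the shop state and decide acceptance; the generator above only supplies the order stream.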
Based on the flexible flow shop, an improved DQN algorithm is proposed to solve the order acceptance and scheduling problem under dynamic order dispatching on the cloud platform. The order-acceptance agent aims to maximize profit, while the order-scheduling agent aims to minimize tardiness and disturbance. Because the two agents have different objective functions, they are heterogeneous agents; each therefore runs an independent DQN algorithm, and a dynamic interaction mechanism is established between them. When a cloud order arrives, the acceptance agent chooses to accept or reject it and passes the accepted order's information to the scheduling agent. By trying different scheduling rules, the scheduling agent finds the optimal scheduling strategy through the feedback provided by the reward function. The DQN network structure is improved within the scheduling agent, increasing the number of rules for selecting jobs and machines to 50, and further incorporating improvement strategies such as operation and machine candidate sets built around the critical path and earliest-start processing of operations.
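The two-agent interaction loop can be illustrated with a minimal sketch. The paper uses independent DQNs per agent; tabular Q-learning is substituted here purely to keep the example self-contained and dependency-free. The coarse shop-load state, the four-rule action set, and the reward shapes are all hypothetical placeholders, not the paper's actual design.

```python
import random
from collections import defaultdict

class QAgent:
    """Minimal tabular Q-learning agent (a stand-in for a DQN)."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9, eps=0.1, rng=None):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = rng or random.Random()

    def act(self, state):
        # Epsilon-greedy action selection.
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_actions)
        row = self.q[state]
        return row.index(max(row))

    def learn(self, state, action, reward, next_state):
        # One-step Q-learning (TD) update.
        best_next = max(self.q[next_state])
        td = reward + self.gamma * best_next - self.q[state][action]
        self.q[state][action] += self.alpha * td

# Interaction mechanism: the acceptance agent decides accept/reject;
# on accept, the scheduling agent picks one of several dispatching rules
# and both agents learn from the resulting feedback.
rng = random.Random(0)
acceptor = QAgent(n_actions=2, rng=rng)    # 0 = reject, 1 = accept
scheduler = QAgent(n_actions=4, rng=rng)   # hypothetical small rule set

for episode in range(200):
    load = rng.randrange(3)                # coarse shop-load state (placeholder)
    a = acceptor.act(load)
    if a == 1:
        rule = scheduler.act(load)
        # Hypothetical rewards: profit minus a load-dependent penalty for
        # the acceptor; negative tardiness/disturbance proxy for the scheduler.
        profit = rng.uniform(5.0, 15.0) - load * rng.uniform(0.0, 5.0)
        scheduler.learn(load, rule, -abs(profit - 10.0), load)
        acceptor.learn(load, a, profit, load)
    else:
        acceptor.learn(load, a, 0.0, load)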
The improved DQN algorithm is compared with heuristic rules, the Q-learning algorithm, and the standard DQN algorithm. Numerical experiments show that the improved algorithm is stable and outperforms the other algorithms in both maximum and average profit under different tardiness penalty factors, with a higher order acceptance rate and a more balanced machine load. When the number of cloud orders increases, even the worst solution of the improved algorithm is better than those of the other algorithms, which demonstrates its effectiveness. The agents' scheduling strategy can optimize the scheduling of dynamically arriving cloud orders, improve shop-floor resource utilization while existing orders are produced normally, and raise the profit and order acceptance rate of enterprises. The research also shows that most heuristic rules are myopic but perform better when combined with the DQN algorithm, and that different rules suit different scheduling objectives and production environments. When deciding whether to accept cloud orders, the DQN algorithm, after continuous learning, chooses appropriate scheduling rules and uses the improvement strategies to adjust the order acceptance and scheduling strategies in a short time, reducing shop-floor disturbance and the impact of tardiness penalty costs on profit, thereby ensuring that the enterprise obtains the maximum profit.

Key words: order acceptance, dynamic scheduling, reinforcement learning, flexible flow shop

CLC Number: