Research on Cloud Order Dynamic Acceptance and Scheduling Based on Deep Reinforcement Learning

doi:10.12005/orms.2024.0309

Operations Research and Management Science, 2024, Vol. 33, Issue (9): 221-226. DOI: 10.12005/orms.2024.0309

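DING Xianghai, ZHANG Mengchai, LIU Chunlai, HAN Jie

School of Management, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China

Received: 2022-03-27 Online: 2024-09-25 Published: 2024-12-31
Corresponding author: LIU Chunlai (born 1986), male, from Qingdao, Shandong; lecturer; research interest: production and operations management.
About the authors: DING Xianghai (born 1971), male, from Xiangtan, Hunan; professor; research interest: industrial engineering. ZHANG Mengchai (born 1997), female, from Hangzhou, Zhejiang; master's student; research interest: intelligent optimization and scheduling. HAN Jie (born 1997), male, from Dezhou, Shandong; master's student; research interest: shop-floor scheduling.
Funding:
National Natural Science Foundation of China (71901084); Natural Science Foundation of Zhejiang Province (LQ19G020010); Fundamental Research Funds for the Provincial Universities of Zhejiang (GK199900299012-210); Humanities and Social Sciences Research Fund of the Ministry of Education (19YJC630099)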

Abstract

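Summary: To solve the acceptance and scheduling problem for dynamically arriving cloud orders, this paper proposes a deep Q-network (DQN) algorithm combined with improvement strategies, set in a flexible flow shop. Because the problem has two stages, a joint decision model is designed with an order-acceptance agent and a scheduling agent: the order-acceptance agent maximizes profit, while the scheduling agent minimizes tardiness and disturbance. To handle the dynamic arrival of orders, a dynamic interaction mechanism between the two agents is designed. The scheduling agent incorporates improvement strategies such as critical-path-based operation candidate sets, machine candidate sets, and earliest-start processing of operations; the DQN network structure is also modified so that the number of rules for selecting jobs and machines increases to 50, improving the coordination between cloud orders and the production of existing orders. Numerical simulation experiments show that, across problem sizes, the proposed algorithm performs well in maximum profit, machine load, and algorithm stability, and can raise enterprise profit and the order acceptance rate.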

Keywords: order acceptance, dynamic decision-making, deep reinforcement learning, flexible flow shop

Abstract: Cloud manufacturing is a new intelligent manufacturing model that uses networks and service platforms to provide on-demand manufacturing services tailored to customer needs. Its main characteristics are customer-centricity, service uncertainty, and on-demand service. Once a production enterprise joins a cloud manufacturing platform, it handles two types of orders: existing orders that are already established, and cloud orders that arrive dynamically.
In the cloud manufacturing environment, the order acceptance and scheduling (OAS) problem with a flexible flow shop as the processing environment is described as follows. After the platform dispatches a cloud order to an enterprise with surplus capacity, the enterprise must decide whether to accept the order and, if so, arrange its production while continuing to produce its existing orders. Order arrivals follow a Poisson distribution, and each order specifies quantity, price, delivery time, part number, and other information. When a cloud order is dispatched to the enterprise, the enterprise must weigh the order's production information, the current state of the workshop, and the arrival of future orders to determine the set of accepted orders and their production schedule, so as to maximize total profit.
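The Poisson arrival of cloud orders described above can be illustrated with a minimal simulation sketch. It is not taken from the paper: the `CloudOrder` fields mirror the attributes the abstract lists, while the function name, the arrival rate, the horizon, and all value ranges are hypothetical choices made only for illustration. Arrivals are generated by drawing exponential inter-arrival times, which is the standard way to sample a Poisson process.

```python
import random
from dataclasses import dataclass


@dataclass
class CloudOrder:
    arrival_time: float  # when the platform dispatches the order
    quantity: int        # number of parts ordered
    unit_price: float    # price per part
    due_time: float      # delivery deadline
    part_number: int     # which part type is to be machined


def generate_cloud_orders(rate, horizon, seed=0):
    """Sample orders from a Poisson process with the given rate on [0, horizon).

    All attribute ranges below are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    orders = []
    t = rng.expovariate(rate)  # exponential gap before the first arrival
    while t < horizon:
        orders.append(CloudOrder(
            arrival_time=t,
            quantity=rng.randint(1, 20),
            unit_price=rng.uniform(10.0, 50.0),
            due_time=t + rng.uniform(20.0, 60.0),
            part_number=rng.randint(1, 5),
        ))
        t += rng.expovariate(rate)  # exponential gap to the next arrival
    return orders


orders = generate_cloud_orders(rate=0.1, horizon=200.0)
for order in orders[:3]:
    print(order)
```

With `rate=0.1` the mean gap between arrivals is 10 time units, so a horizon of 200 yields about 20 orders on average; the seed makes a given run reproducible.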
For a flexible flow shop, an improved DQN algorithm is proposed to solve the order acceptance and scheduling problem under dynamic order dispatching by a cloud platform. The order-acceptance agent aims to maximize profit, and the scheduling agent aims to minimize tardiness and disturbance. Because the two agents have different objective functions, they are heterogeneous; each therefore uses its own independent DQN, and a dynamic interaction mechanism is established between them. When a cloud order arrives, the order-acceptance agent accepts or rejects it and passes the information on accepted orders to the scheduling agent. The scheduling agent tries different scheduling rules and, guided by the feedback from its reward function, finds the best scheduling strategy. Within the scheduling agent, the DQN network structure is modified so that the number of rules for selecting jobs and machines increases to 50, and further improvement strategies are designed: operation candidate sets and machine candidate sets based on the critical path, and earliest-start processing of operations.
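The two-agent interaction described above, in which one agent accepts or rejects an arriving order and hands accepted orders to a second agent that picks a scheduling rule, can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: each agent is a tabular epsilon-greedy Q-learner standing in for a DQN, and the state encoding, the `toy_schedule` environment, the reward definitions, and all numeric values are hypothetical. Only the count of 50 rules comes from the abstract.

```python
import random

N_RULES = 50  # rule count quoted in the abstract; the rules themselves are not listed


class TabularAgent:
    """Epsilon-greedy Q-table, a simplified stand-in for one DQN agent."""

    def __init__(self, n_actions, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
        self.n_actions = n_actions
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.rng = random.Random(seed)
        self.q = {}  # state -> list of action values

    def values(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)  # explore
        row = self.values(state)
        return row.index(max(row))  # exploit

    def learn(self, state, action, reward, next_state):
        row = self.values(state)
        target = reward + self.gamma * max(self.values(next_state))
        row[action] += self.alpha * (target - row[action])


def toy_schedule(order_size, rule, load):
    """Hypothetical environment: (tardiness, disturbance) for a chosen rule."""
    tardiness = max(0, load + order_size - 10) + (rule % 5)
    disturbance = rule % 3
    return tardiness, disturbance


def process_order(order_size, price, load, acceptor, scheduler, penalty=1.0):
    state = min(load, 10)  # coarse workload bucket (assumed state encoding)
    if acceptor.act(state) == 0:  # 0 = reject, 1 = accept
        acceptor.learn(state, 0, 0.0, state)
        return load, 0.0
    rule = scheduler.act(state)  # accepted order is handed to the scheduler
    tardiness, disturbance = toy_schedule(order_size, rule, load)
    next_load = load + order_size
    next_state = min(next_load, 10)
    # Scheduling agent is penalised for tardiness and disturbance.
    scheduler.learn(state, rule, -(tardiness + disturbance), next_state)
    # Acceptance agent is rewarded with profit net of the tardiness penalty.
    profit = price - penalty * tardiness
    acceptor.learn(state, 1, profit, next_state)
    return next_load, profit


acceptor = TabularAgent(2, seed=1)
scheduler = TabularAgent(N_RULES, seed=2)
rng = random.Random(3)
load, total_profit = 0, 0.0
for _ in range(200):
    load = max(0, load - 2)  # existing work drains between arrivals
    load, profit = process_order(rng.randint(1, 4), 8.0, load, acceptor, scheduler)
    total_profit += profit
print(load, total_profit)
```

Keeping the two learners separate reflects the point that the agents pursue different objectives: the acceptance reward is driven by profit, while the scheduling reward depends only on tardiness and disturbance.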
The improved DQN algorithm is compared with heuristic rules, the Q-learning algorithm, and the standard DQN algorithm. Numerical experiments show that the improved algorithm is stable and outperforms the other algorithms in maximum and average profit under different delay penalty factors, with a higher order acceptance rate and a more balanced machine load. As the number of cloud orders increases, even the worst solution of the improved algorithm is better than those of the other algorithms, which demonstrates its effectiveness. The agents' scheduling strategy can optimize the scheduling of dynamically arriving cloud orders, improve workshop resource utilization while existing orders are produced normally, and raise the enterprise's profit and order acceptance rate. The study also finds that most heuristic rules are short-sighted on their own but perform better when combined with the DQN algorithm, and that different rules suit different scheduling objectives and production environments. When deciding whether to accept cloud orders, the DQN algorithm, after continued learning, selects appropriate scheduling rules and applies the improvement strategies; it can adjust the order acceptance and scheduling strategies in a short time, reduce workshop disturbance, and limit the impact of delay penalty costs on profit, thereby helping the enterprise maximize profit.

Key words: order acceptance, dynamic scheduling, reinforcement learning, flexible flow shop

CLC number: F252, TP183

DING Xianghai, ZHANG Mengchai, LIU Chunlai, HAN Jie. Research on Cloud Order Dynamic Acceptance and Scheduling Based on Deep Reinforcement Learning[J]. Operations Research and Management Science, 2024, 33(9): 221-226.
