Operations Research and Management Science ›› 2024, Vol. 33 ›› Issue (6): 71-77. DOI: 10.12005/orms.2024.0183

• Theory Analysis and Methodology Study •

Real-time Scheduling Method Based on Reinforcement Learning for Material Handling in Assembly Lines

XIA Beixin, GU Jiayi, TIAN Tong, YUAN Jie, PENG Yunfang   

  1. School of Management, Shanghai University, Shanghai 200444, China
  • Received: 2022-06-14   Online: 2024-06-25   Published: 2024-08-14

  • Corresponding author: PENG Yunfang (1984-), female, from Xiaogan, Hubei; Ph.D., associate professor. Research interests: production planning and scheduling, logistics system planning, operations research and optimization.
  • About the authors: XIA Beixin (1984-), male, from Ningbo, Zhejiang; Ph.D., associate professor. Research interests: system modeling and simulation, scheduling, data analysis.
  • Funding:
    National Natural Science Foundation of China (71801147); Shanghai Pujiang Talent Program (22PJC051)

Abstract: The scheduling of the workshop material handling system is an important part of the production control system in a manufacturing enterprise's flow shop. Timely and efficient material scheduling can effectively improve production efficiency and economic benefits. In actual production, random events make the workshop material handling system dynamic. To respond dynamically to changes in the state of the assembly line, and to effectively balance the production efficiency and energy consumption of mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm.
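As a minimal illustration of the Q-learning core that such a scheduling model builds on (the state and action encodings, learning rate, and discount factor below are generic placeholders, not the paper's exact design):

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor (assumed values)

Q = defaultdict(float)    # Q[(state, action)] -> estimated long-run return

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward r + GAMMA * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```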
The real-time state information of the manufacturing system comprises all of the system's state characteristics at a given moment. Because the system is too complex for a model to cover every possible state, and in order to simplify the model while preserving the accuracy of the decision model and keeping it tractable for reinforcement learning, this paper selects the system's current real-time information, its look-ahead information, and the slack time of each part as the state features used in the scheduling decision model. Five action groups are defined according to the number of parts transported and the transport sequence when multiple parts are carried. For each action group, the transport scheduling plan of the multi-load trolley is computed in three steps: selecting the transport task, calculating the start time, and coordinating the start time point. The reward and punishment function fed back by the system covers three dimensions: stockout time, handling distance, and line-side inventory of each part. These dimensions are given different weights according to the optimization goal, so as to realize the multi-objective optimization of minimizing the travel distance of the multi-load trolley and the line-side inventory of each part while satisfying, as far as possible, the on-time delivery of parts to the assembly line.
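As a concrete sketch of this design, the state features and the weighted three-dimensional reward could be expressed as follows; the feature names, weights, and sign conventions are illustrative assumptions, not values from the paper:

```python
from dataclasses import dataclass

@dataclass
class LineState:
    """Illustrative state features for the scheduling decision: current
    real-time information, look-ahead information, and per-part slack time."""
    line_inventory: tuple    # current line-side stock of each part
    lookahead_demand: tuple  # forecast consumption over a short horizon
    slack_time: tuple        # remaining time before each part stocks out

def reward(stockout_time, handling_distance, line_inventory, w=(0.5, 0.3, 0.2)):
    """Weighted penalty over the three feedback dimensions; the agent maximizes
    reward, so each term enters negatively. The weights reflect the optimization
    priorities and are placeholder values here."""
    return -(w[0] * stockout_time + w[1] * handling_distance + w[2] * line_inventory)
```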
To address the problem that the Q table becomes too large, this paper proposes an improved two-parameter greedy strategy for action selection and, on top of the greedy strategy, introduces an LSTM neural network to fit the Q values, approximating the Q-value function so as to strike a balance between speeding up convergence and avoiding premature convergence.
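A minimal sketch of how such an LSTM Q-value approximator and a two-parameter greedy selection rule might look (PyTorch is assumed; the network sizes, and the particular two-parameter form of an initial exploration rate plus a decay factor, are one plausible reading rather than the paper's specification):

```python
import torch
import torch.nn as nn

class LSTMQNet(nn.Module):
    """Approximates Q(s, .) from a short history of state feature vectors,
    replacing the oversized Q table (layer sizes are assumptions)."""
    def __init__(self, n_features, n_actions, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, state_seq):          # shape: (batch, time, n_features)
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1, :])    # one Q value per action group

def select_action(q_values, episode, eps0=1.0, decay=0.995):
    """Greedy selection governed by two parameters: eps0 sets the initial
    exploration rate and decay shrinks it each episode, trading faster
    convergence against premature convergence."""
    eps = eps0 * decay ** episode
    if torch.rand(1).item() < eps:
        return torch.randint(len(q_values), (1,)).item()  # explore
    return int(torch.argmax(q_values))                    # exploit
```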
This paper uses the Arena simulation software to build a simulation model of a mixed-flow automobile assembly line and compares the performance of different scheduling methods under different product mixes. The simulation results show that the improved Q-learning algorithm outperforms the other scheduling strategies: it effectively reduces the handling distance while ensuring that materials are delivered to the assembly line on time, so that maximum output is achieved. At the same time, the computation time the reinforcement learning method needs for a single scheduling decision is significantly shorter than that of the other methods, showing good real-time responsiveness and meeting the real-time requirements that the actual production environment places on the scheduling of the material handling system.

Key words: shop floor material handling system, reinforcement learning, Q-learning, hybrid policy
