Operations Research and Management Science ›› 2024, Vol. 33 ›› Issue (6): 71-77. DOI: 10.12005/orms.2024.0183

• Theory Analysis and Methodology Study •

Real-time Scheduling Method Based on Reinforcement Learning for Material Handling in Assembly Lines

XIA Beixin, GU Jiayi, TIAN Tong, YUAN Jie, PENG Yunfang   

  1. School of Management, Shanghai University, Shanghai 200444, China
  • Received: 2022-06-14   Online: 2024-06-25   Published: 2024-08-14

  • Corresponding author: PENG Yunfang (1984-), female, from Xiaogan, Hubei; Ph.D., associate professor. Research interests: production planning and scheduling, logistics system planning, operations research and optimization.
  • About the authors: XIA Beixin (1984-), male, from Ningbo, Zhejiang; Ph.D., associate professor. Research interests: system modeling and simulation, scheduling, data analysis.
  • Funding:
    National Natural Science Foundation of China (71801147); Shanghai Pujiang Talent Program (22PJC051)

Abstract: The scheduling of the workshop material handling system is an important part of the production control system in a manufacturing enterprise's flow shop. Timely and efficient material scheduling can effectively improve production efficiency and economic benefits. In actual production, random events make the workshop material handling system dynamic. To respond dynamically to changes in the state of the assembly line, and to effectively balance the production efficiency and energy consumption of mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm.
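As a minimal illustration of the Q-learning core that such a scheduling model builds on (the state and action encodings, learning rate, and discount factor below are generic placeholders, not the paper's exact design):

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor (assumed values)

Q = defaultdict(float)    # Q[(state, action)] -> estimated long-run return

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward r + GAMMA * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```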
The real-time state information of the manufacturing system comprises all of the system's state characteristics at a given moment. Because the system is too complex for a model to cover every possible state, and in order to simplify the model while preserving the accuracy of the decision model and keeping it tractable for reinforcement learning, this paper selects the system's current real-time information, its look-ahead information, and the slack time of each part as the state features used in the scheduling decision model. Five action groups are defined according to the number of parts transported and the transport sequence when multiple parts are carried. For each action group, the transport scheduling plan of the multi-load trolley is computed in three steps: selecting the transport task, calculating the start time, and coordinating the start time point. The reward and punishment function fed back by the system covers three dimensions: stockout time, handling distance, and line-side inventory of each part. These dimensions are given different weights according to the optimization goal, so as to realize the multi-objective optimization of minimizing the travel distance of the multi-load trolley and the line-side inventory of each part while satisfying, as far as possible, the on-time delivery of parts to the assembly line.
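As a concrete sketch of this design, the state features and the weighted three-dimensional reward could be expressed as follows; the feature names, weights, and sign conventions are illustrative assumptions, not values from the paper:

```python
from dataclasses import dataclass

@dataclass
class LineState:
    """Illustrative state features for the scheduling decision: current
    real-time information, look-ahead information, and per-part slack time."""
    line_inventory: tuple    # current line-side stock of each part
    lookahead_demand: tuple  # forecast consumption over a short horizon
    slack_time: tuple        # remaining time before each part stocks out

def reward(stockout_time, handling_distance, line_inventory, w=(0.5, 0.3, 0.2)):
    """Weighted penalty over the three feedback dimensions; the agent maximizes
    reward, so each term enters negatively. The weights reflect the optimization
    priorities and are placeholder values here."""
    return -(w[0] * stockout_time + w[1] * handling_distance + w[2] * line_inventory)
```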
To address the problem that the Q table becomes too large, this paper proposes an improved two-parameter greedy strategy for action selection and, on top of the greedy strategy, introduces an LSTM neural network to fit the Q values, approximating the Q-value function so as to strike a balance between speeding up convergence and avoiding premature convergence.
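A minimal sketch of how such an LSTM Q-value approximator and a two-parameter greedy selection rule might look (PyTorch is assumed; the network sizes, and the particular two-parameter form of an initial exploration rate plus a decay factor, are one plausible reading rather than the paper's specification):

```python
import torch
import torch.nn as nn

class LSTMQNet(nn.Module):
    """Approximates Q(s, .) from a short history of state feature vectors,
    replacing the oversized Q table (layer sizes are assumptions)."""
    def __init__(self, n_features, n_actions, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, state_seq):          # shape: (batch, time, n_features)
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1, :])    # one Q value per action group

def select_action(q_values, episode, eps0=1.0, decay=0.995):
    """Greedy selection governed by two parameters: eps0 sets the initial
    exploration rate and decay shrinks it each episode, trading faster
    convergence against premature convergence."""
    eps = eps0 * decay ** episode
    if torch.rand(1).item() < eps:
        return torch.randint(len(q_values), (1,)).item()  # explore
    return int(torch.argmax(q_values))                    # exploit
```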
This paper uses the Arena simulation software to build a simulation model of a mixed-flow automobile assembly line and compares the performance of different scheduling methods under different product mixes. The simulation results show that the improved Q-learning algorithm outperforms the other scheduling strategies: it effectively reduces the handling distance while ensuring that materials are delivered to the assembly line on time, so that maximum output is achieved. At the same time, the computation time the reinforcement learning method needs for a single scheduling decision is significantly shorter than that of the other methods, showing good real-time responsiveness and meeting the real-time requirements that the actual production environment places on the scheduling of the material handling system.

Key words: shop floor material handling system, reinforcement learning, Q-learning, hybrid policy
