Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol

Operations Research and Management Science, 2022, Vol. 31, Issue (10): 105-112. DOI: 10.12005/orms.2022.0326

• Theory Analysis and Methodology Study •

QI Yu-qing, ZHAO Xing-lei, ZHAO Tian-dong-jie

School of Economics and Management, Nanjing Tech University, Nanjing 211816, China

Received: 2020-10-20; Online: 2022-10-25; Published: 2022-11-14

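Corresponding author: QI Yu-qing (born 1984), male, from Yancheng, Jiangsu; associate professor, Ph.D.; research interests: logistics and supply chain management.
About the authors: ZHAO Xing-lei (born 1997), male, from Linyi, Shandong; master's degree; research interest: inventory management. ZHAO Tian-dong-jie (born 1996), female, from Zhongxian, Chongqing; master's degree; research interest: supply chain management.
Supported by:
National Natural Science Foundation of China, Young Scientists Fund (71701092); National Social Science Fund of China (20BGL025)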

Abstract: To stabilize their supply of goods and their supplier relationships, enterprises often sign framework agreements with suppliers that run for a fixed period. To address the problem of a retailer purchasing newsboy products under such a framework agreement, this paper builds an inventory decision model based on reinforcement learning and uses the Q-learning algorithm to obtain a near-optimal ordering policy. Demand is simulated with randomly generated samples, and ordering with Q-learning is compared against traditional ordering methods. Across repeated numerical experiments, the average profit of ordering with reinforcement learning is about 7% to 22% higher than that of the traditional methods (fixed-quantity ordering, moving-average forecasting, and exponential smoothing), and it differs from the average profit of the ideal state by about 8%. These findings support the effectiveness and feasibility of reinforcement learning for inventory problems. The paper also studies how parameter changes affect total profit: profit decreases as the greedy rate ε increases and increases as the learning rate α increases. These results offer a new approach to related inventory problems.

Key words: inventory model, framework agreement, Q-learning algorithm
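The abstract names the ingredients (a multi-period newsboy setting, a framework agreement, Q-learning with a greedy rate ε and a learning rate α) but not the model details. The following is a minimal sketch of how such a setup could look, not the authors' model. Every number, the uniform demand, the discount factor `gamma`, and in particular the form of the agreement (a minimum total order quantity with a per-unit penalty for any shortfall) are assumptions made only for illustration.

```python
import random

# All values below are illustrative assumptions, not figures from the paper.
PRICE, COST, SALVAGE = 10.0, 6.0, 2.0   # unit selling price, purchase cost, salvage value
PERIODS = 5                              # periods covered by one framework agreement
ACTIONS = [0, 5, 10, 15, 20]             # admissible order quantities per period
COMMITMENT = 40                          # assumed minimum total order over the agreement
PENALTY = 3.0                            # assumed charge per unit of unmet commitment


def sample_demand(rng):
    """Simulated demand; a uniform integer draw is a stand-in distribution."""
    return rng.randint(0, 20)


def period_profit(order, demand):
    """Single-period newsboy profit: unsold units are salvaged, not carried over."""
    sold = min(order, demand)
    return PRICE * sold + SALVAGE * (order - sold) - COST * order


def step(state, order, demand):
    """Advance one period. State = (period index, units still owed under the commitment)."""
    t, owed = state
    reward = period_profit(order, demand)
    owed = max(0, owed - order)
    t += 1
    done = t == PERIODS
    if done:
        reward -= PENALTY * owed  # settle any shortfall when the agreement expires
    return (t, owed), reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, reward, next_state, done, alpha, gamma):
    """Standard Q-learning update toward reward + gamma * max over next actions."""
    best_next = 0.0 if done else max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def train(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Run tabular Q-learning over repeated simulated agreement horizons."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = (0, COMMITMENT), False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action, sample_demand(rng))
            q_update(q, state, action, reward, next_state, done, alpha, gamma)
            state = next_state
    return q


def greedy_policy(q, state):
    """Order quantity with the highest learned value in the given state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    q_table = train()
    print("first-period order:", greedy_policy(q_table, (0, COMMITMENT)))
```

Rerunning `train` with different `epsilon` and `alpha` values and averaging the simulated profit is one way to probe the parameter sensitivity the abstract reports. The sketch makes no claim to reproduce the reported figures.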

CLC Number: F224

QI Yu-qing, ZHAO Xing-lei, ZHAO Tian-dong-jie. Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol[J]. Operations Research and Management Science, 2022, 31(10): 105-112.
