Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (12): 100-106. DOI: 10.12005/orms.2025.0381

• Applied Research •

Optimization of Project Scheduling with Limited Construction Site in a Carbon Trading Scheme Based on Deep Reinforcement Learning

LIU Hao, ZHANG Jingwen, CHEN Zhi, LI Heng

  1. School of Management, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
  • Received: 2024-08-01 Online: 2025-12-25 Published: 2026-04-29
  • Corresponding author: ZHANG Jingwen (1976-), female, born in Hancheng, Shaanxi; Ph.D., professor; research interests: project scheduling, service operations management. Email: zhangjingwen@nwpu.edu.cn.
  • About the author: LIU Hao (1995-), male, born in Wuhu, Anhui; Ph.D. candidate; research interest: project scheduling.
  • Supported by:
    the National Natural Science Foundation of China (71971173, 72201209); the Doctoral Dissertation Innovation Foundation of Northwestern Polytechnical University (SOMBC202203, CX2023069); and the Natural Science Basic Research Program of Shaanxi Province (2025JC-YBMS-800)

Optimization of Project Scheduling with Limited Construction Site in a Carbon Trading Scheme Based on Deep Reinforcement Learning

LIU Hao, ZHANG Jingwen, CHEN Zhi, LI Heng   

  1. School of Management, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2024-08-01 Online: 2025-12-25 Published: 2026-04-29

Abstract: Machinery and equipment used in construction projects generate substantial carbon emissions, and the carbon trading scheme is one effective path for the green transition of the construction industry. This paper therefore proposes the Project Scheduling Problem with a Limited Construction Site in a Carbon Trading Scheme (PSPLCS-CTS). An integer programming model of the PSPLCS-CTS is built with the objective of minimizing total cost; the problem is then transformed into a Markov Decision Process (MDP), and a two-stage algorithm combining Double Deep Q-Network and local search (Double DQN-LS) is designed to solve it. Experimental results show that the Double DQN-LS algorithm is better suited to large-scale problems: the solutions it obtains on the C-J30 and C-J60 instance sets are of higher quality than those of the Genetic Algorithm (GA) and the Estimation of Distribution Algorithm (EDA), while across all instances its average solving time is only about 6% of GA's and 12% of EDA's, a marked gain in solving efficiency.

Key words: carbon trading, project scheduling, Markov decision process, deep reinforcement learning

Abstract: Heavy machinery used in construction projects generates significant carbon emissions, and the carbon trading scheme aims to reduce these emissions through a market mechanism. This paper proposes the Project Scheduling Problem with a Limited Construction Site in a Carbon Trading Scheme (PSPLCS-CTS). The objective is to minimize the total project cost, including the carbon trading cost. We assume that construction machinery can operate at different speeds, leading to varying carbon emissions and activity durations. Upon project completion, if actual carbon emissions exceed the allocated quota, additional allowances must be purchased to cover the excess; conversely, any surplus quota can be sold.
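To make the settlement rule concrete, the carbon trading term can be written as a single signed cost; the symbols p (trading price), E (actual emissions), and Q (allocated quota) are our illustrative notation, not the paper's, and we assume for simplicity that purchases and sales settle at the same price:

$$C_{\mathrm{carbon}} = p\,(E - Q),$$

which is a purchase cost when actual emissions E exceed the quota Q, and revenue (a negative cost) when E < Q.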
Based on this analysis, we construct an integer programming model for the PSPLCS-CTS and then transform it into a Markov Decision Process (MDP) model. According to the problem's characteristics, we design the five key components of the MDP model: decision points, states, actions, state transition equations, and the reward function.
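A minimal Python sketch of how these five components might be encoded, under stated assumptions: the names, toy activity modes, precedence relations, and cost coefficients below are ours, not the paper's, and site-capacity checks and parallel execution are omitted for brevity.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class State:
        time: int                # current decision point
        done: frozenset          # finished activities
        emissions: float         # carbon emitted so far

    # Toy data: activity -> {speed mode: (duration, emissions)}.
    MODES = {1: {0: (3, 5.0), 1: (2, 8.0)},
             2: {0: (4, 6.0), 1: (3, 9.0)}}
    PRED = {1: set(), 2: {1}}    # precedence relations

    def actions(s: State):
        """Eligible (activity, speed-mode) pairs at a decision point."""
        return [(j, m) for j in MODES
                if j not in s.done and PRED[j] <= s.done
                for m in MODES[j]]

    def step(s: State, action):
        """State transition: schedule the chosen activity and accrue its
        emissions. The reward is the negative incremental cost, so
        maximizing the return minimizes the total cost."""
        j, m = action
        dur, em = MODES[j][m]
        nxt = State(s.time + dur, s.done | {j}, s.emissions + em)
        reward = -(1.0 * dur + 0.5 * em)   # toy time cost + carbon cost
        return nxt, reward, len(nxt.done) == len(MODES)

Starting from State(0, frozenset(), 0.0), repeatedly choosing an action from actions() and applying step() yields a complete schedule together with its accumulated (negative) cost.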
We develop a two-stage algorithm (Double DQN-LS) that combines a Double Deep Q-Network with local search to solve the MDP model. In the first stage, the agent interacts with the environment to generate experiences, which are stored in a replay buffer and then randomly sampled for training. The state and action information is converted into a matrix and fed into the network, where convolutional layers automatically extract features and the Q-value of the state-action pair is estimated. In addition, to reduce overestimation of the target value, the evaluation network selects the action during learning while the target network estimates its Q-value, which improves the stability and performance of the algorithm. In the second stage, two local search algorithms are employed to enhance the quality of the schedule produced by the Double DQN.
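A minimal PyTorch sketch of the decoupled target computation described above; the network handles, batch layout, and discount factor gamma are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def double_dqn_loss(batch, q_eval, q_target, gamma=0.99):
        """Double DQN update: the evaluation (online) network selects the
        next action, the periodically synced target network values it,
        which reduces the overestimation of plain DQN targets."""
        s, a, r, s_next, done = batch        # tensors from the replay buffer
        with torch.no_grad():
            a_next = q_eval(s_next).argmax(dim=1, keepdim=True)     # selection
            q_next = q_target(s_next).gather(1, a_next).squeeze(1)  # evaluation
            y = r + gamma * (1.0 - done) * q_next                   # TD target
        q_sa = q_eval(s).gather(1, a.unsqueeze(1)).squeeze(1)
        return F.mse_loss(q_sa, y)

In practice the loss is back-propagated through the evaluation network only, and the target network's weights are synchronized with it at fixed intervals.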
Finally, extensive computational experiments are conducted to verify the effectiveness of the algorithm. For each instance set, one sample is randomly selected at each level of the characteristic parameters for training. The trained Double DQN is then used to solve new instances, and the two local search algorithms refine the schedules it generates. The experimental results show that the proposed Double DQN-LS algorithm outperforms the Genetic Algorithm (GA) and the Estimation of Distribution Algorithm (EDA) on larger instances. Furthermore, the Double DQN-LS algorithm demonstrates a significant advantage in solving efficiency across all instances, with an average solving time of only about 6% of GA's and 12% of EDA's.

Key words: carbon trading, project scheduling, Markov decision processes, deep reinforcement learning
