Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (12): 100-106. DOI: 10.12005/orms.2025.0381

• Applied Research •

Optimization of Project Scheduling with Limited Construction Site in a Carbon Trading Scheme Based on Deep Reinforcement Learning

LIU Hao, ZHANG Jingwen, CHEN Zhi, LI Heng

  1. School of Management, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
  • Received: 2024-08-01 Online: 2025-12-25 Published: 2026-04-29
  • Corresponding author: ZHANG Jingwen (1976-), female, born in Hancheng, Shaanxi; Ph.D., professor; research interests: project scheduling, service operations management. Email: zhangjingwen@nwpu.edu.cn.
  • About the author: LIU Hao (1995-), male, born in Wuhu, Anhui; Ph.D. candidate; research interest: project scheduling.
  • Supported by:
    the National Natural Science Foundation of China (71971173, 72201209); the Doctoral Dissertation Innovation Foundation of Northwestern Polytechnical University (SOMBC202203, CX2023069); and the Natural Science Basic Research Program of Shaanxi Province (2025JC-YBMS-800)

Optimization of Project Scheduling with Limited Construction Site in a Carbon Trading Scheme Based on Deep Reinforcement Learning

LIU Hao, ZHANG Jingwen, CHEN Zhi, LI Heng   

  1. School of Management, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2024-08-01 Online: 2025-12-25 Published: 2026-04-29

Abstract: Machinery and equipment used in construction projects generate substantial carbon emissions, and the carbon trading scheme is one effective path for the green transition of the construction industry. This paper therefore proposes the Project Scheduling Problem with a Limited Construction Site in a Carbon Trading Scheme (PSPLCS-CTS). An integer programming model of the PSPLCS-CTS is built with the objective of minimizing total cost; the problem is then transformed into a Markov Decision Process (MDP), and a two-stage algorithm combining Double Deep Q-Network and local search (Double DQN-LS) is designed to solve it. Experimental results show that the Double DQN-LS algorithm is better suited to large-scale problems: the solutions it obtains on the C-J30 and C-J60 instance sets are of higher quality than those of the Genetic Algorithm (GA) and the Estimation of Distribution Algorithm (EDA), while across all instances its average solving time is only about 6% of GA's and 12% of EDA's, a marked gain in solving efficiency.

Key words: carbon trading, project scheduling, Markov decision process, deep reinforcement learning

Abstract: Heavy machinery used in construction projects generates significant carbon emissions, and the carbon trading scheme aims to reduce these emissions through a market mechanism. This paper proposes the Project Scheduling Problem with a Limited Construction Site in a Carbon Trading Scheme (PSPLCS-CTS). The objective is to minimize the total project cost, including the carbon trading cost. We assume that construction machinery can operate at different speeds, leading to varying carbon emissions and activity durations. Upon project completion, if actual carbon emissions exceed the allocated quota, additional allowances must be purchased to cover the excess; conversely, any surplus quota can be sold.
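To make the settlement rule concrete, the carbon trading term can be written as a single signed cost; the symbols p (trading price), E (actual emissions), and Q (allocated quota) are our illustrative notation, not the paper's, and we assume for simplicity that purchases and sales settle at the same price:

$$C_{\mathrm{carbon}} = p\,(E - Q),$$

which is a purchase cost when actual emissions E exceed the quota Q, and revenue (a negative cost) when E < Q.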
Based on this analysis, we construct an integer programming model for the PSPLCS-CTS and then transform it into a Markov Decision Process (MDP) model. According to the problem's characteristics, we design the five key components of the MDP model: decision points, states, actions, state transition equations, and the reward function.
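A minimal Python sketch of how these five components might be encoded, under stated assumptions: the names, toy activity modes, precedence relations, and cost coefficients below are ours, not the paper's, and site-capacity checks and parallel execution are omitted for brevity.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class State:
        time: int                # current decision point
        done: frozenset          # finished activities
        emissions: float         # carbon emitted so far

    # Toy data: activity -> {speed mode: (duration, emissions)}.
    MODES = {1: {0: (3, 5.0), 1: (2, 8.0)},
             2: {0: (4, 6.0), 1: (3, 9.0)}}
    PRED = {1: set(), 2: {1}}    # precedence relations

    def actions(s: State):
        """Eligible (activity, speed-mode) pairs at a decision point."""
        return [(j, m) for j in MODES
                if j not in s.done and PRED[j] <= s.done
                for m in MODES[j]]

    def step(s: State, action):
        """State transition: schedule the chosen activity and accrue its
        emissions. The reward is the negative incremental cost, so
        maximizing the return minimizes the total cost."""
        j, m = action
        dur, em = MODES[j][m]
        nxt = State(s.time + dur, s.done | {j}, s.emissions + em)
        reward = -(1.0 * dur + 0.5 * em)   # toy time cost + carbon cost
        return nxt, reward, len(nxt.done) == len(MODES)

Starting from State(0, frozenset(), 0.0), repeatedly choosing an action from actions() and applying step() yields a complete schedule together with its accumulated (negative) cost.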
We develop a two-stage algorithm (Double DQN-LS) that combines a Double Deep Q-Network with local search to solve the MDP model. In the first stage, the agent interacts with the environment to generate experiences, which are stored in a replay buffer and then randomly sampled for training. The state and action information is converted into a matrix and fed into the network, where convolutional layers automatically extract features and the Q-value of the state-action pair is estimated. In addition, to reduce overestimation of the target value, the evaluation network selects the action during learning while the target network estimates its Q-value, which improves the stability and performance of the algorithm. In the second stage, two local search algorithms are employed to enhance the quality of the schedule produced by the Double DQN.
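A minimal PyTorch sketch of the decoupled target computation described above; the network handles, batch layout, and discount factor gamma are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def double_dqn_loss(batch, q_eval, q_target, gamma=0.99):
        """Double DQN update: the evaluation (online) network selects the
        next action, the periodically synced target network values it,
        which reduces the overestimation of plain DQN targets."""
        s, a, r, s_next, done = batch        # tensors from the replay buffer
        with torch.no_grad():
            a_next = q_eval(s_next).argmax(dim=1, keepdim=True)     # selection
            q_next = q_target(s_next).gather(1, a_next).squeeze(1)  # evaluation
            y = r + gamma * (1.0 - done) * q_next                   # TD target
        q_sa = q_eval(s).gather(1, a.unsqueeze(1)).squeeze(1)
        return F.mse_loss(q_sa, y)

In practice the loss is back-propagated through the evaluation network only, and the target network's weights are synchronized with it at fixed intervals.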
Finally, extensive computational experiments are conducted to verify the effectiveness of the algorithm. For each instance set, one sample is randomly selected at each level of the characteristic parameters for training. The trained Double DQN is then used to solve new instances, and the two local search algorithms refine the schedules it generates. The experimental results show that the proposed Double DQN-LS algorithm outperforms the Genetic Algorithm (GA) and the Estimation of Distribution Algorithm (EDA) on larger instances. Furthermore, the Double DQN-LS algorithm demonstrates a significant advantage in solving efficiency across all instances, with an average solving time of only about 6% of GA's and 12% of EDA's.

Key words: carbon trading, project scheduling, Markov decision processes, deep reinforcement learning
