Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (9): 92-98. DOI: 10.12005/orms.2025.0280

• Theoretical Analysis and Methodology Study •

A Generalized Gradient-based Method for Solving Combinatorial Optimization Problems

ZHANG Han

  1. School of Data Science and Artificial Intelligence, Dongbei University of Finance and Economics, Dalian 116025, China
  • Received: 2024-01-23  Online: 2025-09-25  Published: 2026-01-19
  • About the author: ZHANG Han (1990-), male, born in Dalian, Liaoning; Ph.D., lecturer; research interests: neural networks and representation learning. Email: hanzhang@dufe.edu.cn.
  • Funding:
    Applied Basic Research Program of Liaoning Province (2023JH2/101600040); Basic Scientific Research Project of the Education Department of Liaoning Province (LJKMZ20221598); National Natural Science Foundation of China (72273019)

Abstract: The solution of a combinatorial problem with a linear objective function is not differentiable with respect to the parameters of a given problem instance. Consequently, when the solution of a combinatorial problem is used as the criterion for model training, it is difficult to optimize the model with gradient-based methods. Many authors have attempted to incorporate combinatorial optimization solvers, and convex optimization solvers more broadly, into gradient-based model training, and several methods for differentiating the solution vector of an optimization problem have emerged. In most cases, however, differentiating the objective value (rather than the solution vector) is sufficient to make use of the optimization problem, and the existing approaches introduce considerable unnecessary computational overhead. To address this, this paper proposes a method that performs gradient descent directly on the objective value of the solution to a combinatorial problem. Two experiments, (1) weakly supervised image classification and (2) global sequence alignment with a differentiable Encoder-Decoder architecture using SoftMax or Gumbel-SoftMax, show that the proposed method provides differentiable combinatorial losses for such problems and verify that, compared with existing methods, it offers more stable training and more accurate and efficient prediction.
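
The relation that makes such a combinatorial loss differentiable can be stated compactly. The following is a minimal sketch in generic notation of our own choosing (the symbols f, c and X below are illustrative, not taken from the paper):

```latex
% Optimal value of a combinatorial problem with a linear objective and cost vector c.
\[
  f(c) \;=\; \min_{x \in \mathcal{X}} \, c^{\top} x,
  \qquad \mathcal{X} \subseteq \{0,1\}^{n} \ \text{(finite set of feasible solutions)} .
\]
% By a Danskin/envelope-type argument, any optimal solution returned by one
% black-box solver call is a generalized gradient of the objective value in c:
\[
  x^{*}(c) \in \operatorname*{arg\,min}_{x \in \mathcal{X}} c^{\top} x
  \quad \Longrightarrow \quad
  x^{*}(c) \in \partial f(c) .
\]
```

In other words, one solver call per training step suffices, because the solution vector itself serves as a (generalized) gradient of the objective value.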


Abstract: Combinatorial problems with linear objective functions have a wide range of applications in real life, and there are many highly efficient algorithms for finding optimal solutions to these combinatorial optimization problems. With the continuous development of artificial intelligence technology, powerful function approximators such as neural networks make it possible to combine rich feature extraction with efficient combinatorial solvers, so that high-complexity combinatorial problems can be solved approximately and efficiently in an end-to-end manner, without compromise. However, the solutions to such combinatorial problems are not differentiable with respect to the parameters of the problem instance, so it is difficult to optimize a model with gradient-based methods when the solution of the combinatorial problem is used as the criterion for model training. Many authors have attempted to apply combinatorial optimization solvers, and convex optimization solvers more broadly, to gradient-trained models, and several methods have been developed to differentiate the solution vectors of optimization problems. In most cases, however, we only need to differentiate the objective value (not the solution vector), and existing methods introduce unnecessary computational overhead.
Here, we show how to perform gradient descent directly on the objective value of the solution to a combinatorial problem. Specifically, for combinatorial problems that can be expressed as integer linear programs and solved efficiently, the generalized gradients of the objective value with respect to the real-valued parameters of the problem exist and can be computed efficiently by a black-box combinatorial algorithm in a single run. Turning combinatorial solvers into differentiable building blocks of deep learning models in this way allows their internal algorithms to be executed more efficiently. While preserving the generality of combinatorial deep learning models, it addresses two difficulties: combinatorial solvers are hard to invoke directly, and their applicability under specific problem structures is hard to guarantee.
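
As an illustration of this construction, the following PyTorch sketch (our own minimal example, not the paper's reference implementation) wraps a black-box combinatorial solver in a custom autograd function; SciPy's Hungarian solver linear_sum_assignment stands in for the solver, and the backward pass returns the optimal solution as the generalized gradient of the objective value with respect to the cost matrix:

```python
# Minimal sketch: a differentiable combinatorial objective value.
# Forward: one black-box solver call; Backward: the optimal solution itself.
import torch
from scipy.optimize import linear_sum_assignment


class AssignmentObjective(torch.autograd.Function):
    """Optimal value of min_P <C, P> over permutation matrices P."""

    @staticmethod
    def forward(ctx, cost):
        # Single solver call on the detached cost matrix.
        c = cost.detach().cpu().numpy()
        rows, cols = linear_sum_assignment(c)
        # Encode the optimal assignment as a 0/1 permutation matrix.
        solution = torch.zeros_like(cost)
        solution[torch.as_tensor(rows), torch.as_tensor(cols)] = 1.0
        ctx.save_for_backward(solution)
        return (cost * solution).sum()        # objective value of the solution

    @staticmethod
    def backward(ctx, grad_output):
        # Generalized gradient of the optimal value w.r.t. the cost matrix
        # is the optimal solution (Danskin/envelope-type argument).
        (solution,) = ctx.saved_tensors
        return grad_output * solution


def assignment_loss(cost):
    """Differentiable matching cost: usable as an ordinary training loss."""
    return AssignmentObjective.apply(cost)
```

No differentiation through the solver's internal steps and no perturbation or relaxation of the problem is needed; the solver remains a black box.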
Moreover, we conduct two experiments: (1) weakly supervised image classification and (2) global sequence alignment with differentiable encoder-decoder architectures using Softmax or Gumbel-Softmax. The experimental results show that the proposed method can provide differentiable combinatorial losses for these problems. Compared with other existing methods, it trains more stably and predicts more accurately and efficiently. Specifically, in experiment (1), we use the class probability distribution that the model outputs for each feature vector, match the model's outputs for the feature vectors in a bag to the bag's class labels, and use the Hungarian algorithm as the combinatorial solver to find the permutation in this problem. Compared with existing gradient-based methods, the proposed generalized gradient-based method for solving combinatorial optimization problems provides effective training signals for large neural networks and is much faster in training than the current state-of-the-art methods. In experiment (2), we use Global Sequence Alignment (GSA) as the loss. Compared with training on the baseline loss, the proposed method achieves the best text summarization results on all three ROUGE evaluation metrics and is more accurate and efficient.
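
A hypothetical usage sketch for experiment (1), reusing assignment_loss from the block above; the tensor names, sizes, and label values here are illustrative assumptions, not data from the paper:

```python
# Hypothetical bag-level training step for weakly supervised classification.
import torch

num_instances, num_classes = 4, 10           # instances in one bag, number of classes
logits = torch.randn(num_instances, num_classes, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=-1)

bag_labels = torch.tensor([2, 2, 5, 7])      # multiset of class labels for the bag

# Cost of assigning instance i to label slot j: negative log-probability.
cost = -log_probs[:, bag_labels]             # shape (num_instances, num_instances)

loss = assignment_loss(cost)                 # optimal matching cost from the Hungarian solver
loss.backward()                              # generalized gradient flows back into the logits
```

For experiment (2), the decoder can expose differentiable token choices in the same spirit, e.g. via torch.nn.functional.gumbel_softmax (or a plain Softmax) over the vocabulary, before the sequence-alignment solver is invoked, so that the alignment objective value backpropagates into the encoder-decoder.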
In the future, we will further investigate how to apply the proposed method. For example, DEtection TRansformer (DETR) was the first algorithm to apply the Transformer encoder-decoder architecture to object detection, and its architecture has become a building block in many Transformer-based applications. However, DETR is usually trained with a set-based global loss, which can lead to inconsistency between the assignment cost and the global loss. How to use the generalized gradient-based method proposed in this paper to improve the convergence speed and performance of DETR is a valuable research direction.

Key words: differentiable combinatorial losses, generalized gradients, linear programs, neural networks

CLC Number: