Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (9): 92-98. DOI: 10.12005/orms.2025.0280

• Theoretical Analysis and Methodology Study •

A Generalized Gradient-based Method for Solving Combinatorial Optimization Problems

ZHANG Han

  1. School of Data Science and Artificial Intelligence, Dongbei University of Finance and Economics, Dalian 116025, China
  • Received: 2024-01-23  Online: 2025-09-25  Published: 2026-01-19
  • About the author: ZHANG Han (1990-), male, born in Dalian, Liaoning; Ph.D., lecturer; research interests: neural networks and representation learning. Email: hanzhang@dufe.edu.cn.
  • Funding:
    Applied Basic Research Program of Liaoning Province (2023JH2/101600040); Basic Scientific Research Project of the Education Department of Liaoning Province (LJKMZ20221598); National Natural Science Foundation of China (72273019)

Abstract: The solution of a combinatorial problem with a linear objective function is not differentiable with respect to the parameters of a given problem instance. Consequently, when the solution of a combinatorial problem is used as the criterion for model training, it is difficult to optimize the model with gradient-based methods. Many authors have attempted to incorporate combinatorial optimization solvers, and convex optimization solvers more broadly, into gradient-based model training, and several methods for differentiating the solution vector of an optimization problem have emerged. In most cases, however, differentiating the objective value (rather than the solution vector) is sufficient to make use of the optimization problem, and the existing approaches introduce considerable unnecessary computational overhead. To address this, this paper proposes a method that performs gradient descent directly on the objective value of the solution to a combinatorial problem. Two experiments, (1) weakly supervised image classification and (2) global sequence alignment with a differentiable Encoder-Decoder architecture using SoftMax or Gumbel-SoftMax, show that the proposed method provides differentiable combinatorial losses for such problems and verify that, compared with existing methods, it offers more stable training and more accurate and efficient prediction.
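
The relation that makes such a combinatorial loss differentiable can be stated compactly. The following is a minimal sketch in generic notation of our own choosing (the symbols f, c and X below are illustrative, not taken from the paper):

```latex
% Optimal value of a combinatorial problem with a linear objective and cost vector c.
\[
  f(c) \;=\; \min_{x \in \mathcal{X}} \, c^{\top} x,
  \qquad \mathcal{X} \subseteq \{0,1\}^{n} \ \text{(finite set of feasible solutions)} .
\]
% By a Danskin/envelope-type argument, any optimal solution returned by one
% black-box solver call is a generalized gradient of the objective value in c:
\[
  x^{*}(c) \in \operatorname*{arg\,min}_{x \in \mathcal{X}} c^{\top} x
  \quad \Longrightarrow \quad
  x^{*}(c) \in \partial f(c) .
\]
```

In other words, one solver call per training step suffices, because the solution vector itself serves as a (generalized) gradient of the objective value.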


Abstract: Combinatorial problems with linear objective functions have a wide range of applications in real life, and there are many highly efficient algorithms for finding optimal solutions to these combinatorial optimization problems. With the continuous development of artificial intelligence technology, powerful function approximators such as neural networks make it possible to combine rich feature extraction with efficient combinatorial solvers, so that high-complexity combinatorial problems can be solved approximately and efficiently in an end-to-end manner, without compromise. However, the solutions to such combinatorial problems are not differentiable with respect to the parameters of the problem instance, so it is difficult to optimize a model with gradient-based methods when the solution of the combinatorial problem is used as the criterion for model training. Many authors have attempted to apply combinatorial optimization solvers, and convex optimization solvers more broadly, to gradient-trained models, and several methods have been developed to differentiate the solution vectors of optimization problems. In most cases, however, we only need to differentiate the objective value (not the solution vector), and existing methods introduce unnecessary computational overhead.
Here, we show how to perform gradient descent directly on the objective value of the solution to a combinatorial problem. Specifically, for combinatorial problems that can be expressed as integer linear programs and solved efficiently, the generalized gradients of the objective value with respect to the real-valued parameters of the problem exist and can be computed efficiently by a black-box combinatorial algorithm in a single run. Turning combinatorial solvers into differentiable building blocks of deep learning models in this way allows their internal algorithms to be executed more efficiently. While preserving the generality of combinatorial deep learning models, it addresses two difficulties: combinatorial solvers are hard to invoke directly, and their applicability under specific problem structures is hard to guarantee.
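
As an illustration of this construction, the following PyTorch sketch (our own minimal example, not the paper's reference implementation) wraps a black-box combinatorial solver in a custom autograd function; SciPy's Hungarian solver linear_sum_assignment stands in for the solver, and the backward pass returns the optimal solution as the generalized gradient of the objective value with respect to the cost matrix:

```python
# Minimal sketch: a differentiable combinatorial objective value.
# Forward: one black-box solver call; Backward: the optimal solution itself.
import torch
from scipy.optimize import linear_sum_assignment


class AssignmentObjective(torch.autograd.Function):
    """Optimal value of min_P <C, P> over permutation matrices P."""

    @staticmethod
    def forward(ctx, cost):
        # Single solver call on the detached cost matrix.
        c = cost.detach().cpu().numpy()
        rows, cols = linear_sum_assignment(c)
        # Encode the optimal assignment as a 0/1 permutation matrix.
        solution = torch.zeros_like(cost)
        solution[torch.as_tensor(rows), torch.as_tensor(cols)] = 1.0
        ctx.save_for_backward(solution)
        return (cost * solution).sum()        # objective value of the solution

    @staticmethod
    def backward(ctx, grad_output):
        # Generalized gradient of the optimal value w.r.t. the cost matrix
        # is the optimal solution (Danskin/envelope-type argument).
        (solution,) = ctx.saved_tensors
        return grad_output * solution


def assignment_loss(cost):
    """Differentiable matching cost: usable as an ordinary training loss."""
    return AssignmentObjective.apply(cost)
```

No differentiation through the solver's internal steps and no perturbation or relaxation of the problem is needed; the solver remains a black box.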
Moreover, we conduct two experiments: (1) weakly supervised image classification and (2) global sequence alignment with differentiable encoder-decoder architectures using Softmax or Gumbel-Softmax. The experimental results show that the proposed method can provide differentiable combinatorial losses for these problems. Compared with other existing methods, it trains more stably and predicts more accurately and efficiently. Specifically, in experiment (1), we use the class probability distribution that the model outputs for each feature vector, match the model's outputs for the feature vectors in a bag to the bag's class labels, and use the Hungarian algorithm as the combinatorial solver to find the permutation in this problem. Compared with existing gradient-based methods, the proposed generalized gradient-based method for solving combinatorial optimization problems provides effective training signals for large neural networks and is much faster in training than the current state-of-the-art methods. In experiment (2), we use Global Sequence Alignment (GSA) as the loss. Compared with training on the baseline loss, the proposed method achieves the best text summarization results on all three ROUGE evaluation metrics and is more accurate and efficient.
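
A hypothetical usage sketch for experiment (1), reusing assignment_loss from the block above; the tensor names, sizes, and label values here are illustrative assumptions, not data from the paper:

```python
# Hypothetical bag-level training step for weakly supervised classification.
import torch

num_instances, num_classes = 4, 10           # instances in one bag, number of classes
logits = torch.randn(num_instances, num_classes, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=-1)

bag_labels = torch.tensor([2, 2, 5, 7])      # multiset of class labels for the bag

# Cost of assigning instance i to label slot j: negative log-probability.
cost = -log_probs[:, bag_labels]             # shape (num_instances, num_instances)

loss = assignment_loss(cost)                 # optimal matching cost from the Hungarian solver
loss.backward()                              # generalized gradient flows back into the logits
```

For experiment (2), the decoder can expose differentiable token choices in the same spirit, e.g. via torch.nn.functional.gumbel_softmax (or a plain Softmax) over the vocabulary, before the sequence-alignment solver is invoked, so that the alignment objective value backpropagates into the encoder-decoder.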
In the future, we will further investigate how to apply the proposed method. For example, DEtection TRansformer (DETR) was the first algorithm to apply the Transformer encoder-decoder architecture to object detection, and its architecture has become a building block in many Transformer-based applications. However, DETR is usually trained with a set-based global loss, which can lead to inconsistency between the assignment cost and the global loss. How to use the generalized gradient-based method proposed in this paper to improve the convergence speed and performance of DETR is a valuable research direction.

Key words: differentiable combinatorial losses, generalized gradients, linear programs, neural networks

CLC Number: