运筹与管理 (Operations Research and Management Science) ›› 2025, Vol. 34 ›› Issue (7): 183-188. DOI: 10.12005/orms.2025.0225

• Applied Research •

Gelbrich Distributionally Robust Optimization and its Application to Machine Learning

ZHANG Jilan1, TONG Xiaojiao2

  1. School of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China;
    2. School of Mathematics and Statistics, Hunan First Normal University, Changsha 410205, China
  • Received: 2023-08-11 Published: 2025-11-04
  • Corresponding author: TONG Xiaojiao (b. 1962), female, from Xinzhou, Hubei; Ph.D., professor. Research interests: optimization theory and methods; numerical methods for mathematical programming and their applications in power system analysis. Email: xjtong-csust@hotmail.com
  • About the author: ZHANG Jilan (b. 2000), female, from Loudi, Hunan; master's student. Research interest: optimization theory and methods.
  • Funding: National Natural Science Foundation of China (12171145)

Abstract (Chinese): Distributionally robust optimization (DRO), a stochastic optimization methodology for handling uncertainty in distributional information, is widely applied in machine learning and other fields. This paper constructs a distributionally robust optimization model based on the Gelbrich ambiguity set and transforms it, via optimization duality theory, into a form that is easy to compute and solve. Furthermore, under certain assumptions, the dualized model is proved to be equivalent to a class of semidefinite programming problems. As a typical application, the distributionally robust linear regression problem under the Gelbrich metric is considered and the corresponding optimization model is constructed. Numerical experiments verify the effectiveness of the model and its application in regression analysis.


Abstract: Stochastic optimization effectively describes decision-making problems that involve uncertain factors. A key issue in stochastic optimization is determining the distribution of the random variables, and as practical problems grow more complex, it becomes increasingly difficult to obtain this distributional information accurately. For situations where the distribution of the random variables is only partially known, researchers have extended stochastic optimization theory and methods and proposed distributionally robust optimization, which is now widely used in decision-making problems across many application areas. Distributionally robust optimization combines traditional robust optimization with stochastic optimization and can effectively handle optimal decision-making when the probability distribution of the random variables is uncertain. Its key issues are the construction of ambiguity sets and the tractable reformulation of the resulting models. Ambiguity sets based on the Wasserstein metric have been widely studied. However, as soon as one of the two distributions is no longer discrete, the Wasserstein distance can in general no longer be computed in polynomial time; computing it is a #P-hard problem. New metrics are therefore needed. On the other hand, distributionally robust optimization is widely used in machine learning: for example, it can be used to detect and handle outliers and to improve the classification accuracy of a model.
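For orientation, since the abstract does not reproduce the formulas: the Gelbrich distance between two mean-covariance pairs, as defined by Gelbrich (1990), and a generic DRO problem over a Gelbrich ball are sketched below. The symbols rho, mu-hat, Sigma-hat, and the loss ell are generic placeholders, not necessarily the paper's exact notation.

```latex
% Gelbrich distance between mean-covariance pairs (Gelbrich, 1990):
\[
  \mathrm{G}\bigl((\mu_1,\Sigma_1),(\mu_2,\Sigma_2)\bigr)
  = \sqrt{\|\mu_1-\mu_2\|_2^{2}
      + \operatorname{tr}\!\Bigl(\Sigma_1+\Sigma_2
        - 2\bigl(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\bigr)^{1/2}\Bigr)}
\]
% Generic DRO problem over a Gelbrich ball of radius \rho around the
% nominal moments (\hat{\mu},\hat{\Sigma}); \ell is the loss of decision x:
\[
  \min_{x \in X}\;
  \sup_{\mathbb{Q}\,:\;\mathrm{G}\bigl((\mu_{\mathbb{Q}},\Sigma_{\mathbb{Q}}),
       (\hat{\mu},\hat{\Sigma})\bigr)\le\rho}
  \mathbb{E}_{\mathbb{Q}}\bigl[\ell(x,\xi)\bigr]
\]
```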
Although distributionally robust optimization has great potential in machine learning, many challenges remain. One is how to construct ambiguity sets effectively; another is how to convert the distributionally robust optimization model into a solvable problem. Based on Gelbrich ambiguity sets, this paper constructs a distributionally robust optimization model and uses optimization duality theory to transform it into a form that is easy to compute and solve. The model is then applied to the linear regression problem in machine learning. Under certain assumptions, the preceding result is used to prove that the model is equivalent to a semidefinite programming problem. Finally, for the numerical experiments we select the red wine quality data set from the UCI machine learning repository, which is commonly used for regression analysis. The goal is to predict the quality score of red wine; the data set contains 1,599 samples, each with 11 features (such as fixed acidity, residual sugar, and alcohol) and one label (the quality score). We first apply min-max normalization to every feature and to the label in order to eliminate differences in scale between features. The effectiveness of the model is then verified from three aspects: the influence of the radius on the optimal error value, the influence of the sample size on the optimal error value, and a comparison between Gelbrich ambiguity sets and Wasserstein ambiguity sets.
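As an illustration of the preprocessing step only, here is a minimal Python sketch of the min-max normalization, assuming the standard semicolon-separated winequality-red.csv file from the UCI repository (the file name and the "quality" column name are assumptions based on the public data set, not taken from the paper):

```python
import pandas as pd

# Load the UCI wine-quality (red) data: 1,599 samples, 11 features + "quality" label.
# Assumes the standard semicolon-separated file from the UCI repository.
df = pd.read_csv("winequality-red.csv", sep=";")

# Min-max normalization of every column (features and label) to [0, 1],
# removing scale differences between features.
df_norm = (df - df.min()) / (df.max() - df.min())

X = df_norm.drop(columns="quality").to_numpy()   # 11 normalized features
y = df_norm["quality"].to_numpy()                # normalized quality score
```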
The numerical experiments lead to the following conclusions. (1) Theoretically, as the radius increases, the optimal value of the inner problem of the distributionally robust formulation grows, and hence so does the optimal value of the overall distributionally robust optimization problem; the computational results confirm this. (2) Drawing different random samples of the same size and repeating the experiment ten times yields ten optimal values of the same problem; as the sample size increases, these Gelbrich distributionally robust least absolute errors tend to coincide, indicating that the learned model becomes increasingly stable as the sample size grows. (3) Comparing the Gelbrich distributionally robust least absolute error (GDR-LAE) model with the type-1 Wasserstein distributionally robust least absolute error model shows that the error of the former is smaller, i.e., GDR-LAE fits the data better, which further verifies the computational effectiveness of the proposed model.
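The paper's SDP reformulation of GDR-LAE is not reproduced in this abstract, but the type-1 Wasserstein baseline in conclusion (3) admits a well-known reformulation: with a Euclidean transport cost on (x, y), the worst-case expected absolute error over a radius-eps Wasserstein ball equals the empirical least absolute error plus eps times the Lipschitz constant ||(w, -1)||_2 of the loss. A minimal sketch under that standard reformulation (not the paper's exact setup; cvxpy and the data from the previous sketch are assumed):

```python
import cvxpy as cp
import numpy as np

def wasserstein_dr_lae(X, y, eps):
    """Type-1 Wasserstein distributionally robust least absolute error.

    Standard reformulation for a Euclidean transport cost on (x, y):
    worst-case E|y - w.x - b| over a radius-eps 1-Wasserstein ball
    = empirical LAE + eps * ||(w, -1)||_2 (the loss's Lipschitz constant).
    """
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    empirical_lae = cp.sum(cp.abs(y - X @ w - b)) / n
    lipschitz = cp.norm(cp.hstack([w, np.array([-1.0])]), 2)
    prob = cp.Problem(cp.Minimize(empirical_lae + eps * lipschitz))
    prob.solve()
    return w.value, b.value, prob.value

# Hypothetical usage with the normalized wine data from the previous sketch:
# w, b, worst_case_error = wasserstein_dr_lae(X, y, eps=0.1)
```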
Since this paper considers only the linear regression problem in machine learning, constructing and solving distributionally robust optimization models for nonlinear regression and for other machine learning problems such as classification remains a topic for further study.

Key words: distributionally robust optimization, machine learning, Gelbrich ambiguity set, linear regression

CLC number: