Operations Research and Management Science, 2025, Vol. 34, Issue (1): 12-18. DOI: 10.12005/orms.2025.0003

• Theory Analysis and Methodology Study •

A Robust Control Chart for Monitoring High-dimensional Data Streams

DING Dong, JIANG Yalei   

  1. School of Management, Xi’an Polytechnic University, Xi’an 710048, China
  • Received: 2022-12-09  Online: 2025-01-25  Published: 2025-05-16

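A Robust Control Chart for Monitoring High-dimensional Data Streams

DING Dong, JIANG Yalei

  1. School of Management, Xi'an Polytechnic University, Xi'an, Shaanxi 710048, China
  • Corresponding author: DING Dong (1989-), female, native of Nanning, Guangxi; Ph.D., associate professor; research interests: statistical process control, quality management and quality engineering. Email: dingdong@xpu.edu.cn.
  • Funding:
    Key Research Base Project of Philosophy and Social Sciences, Department of Education of Shaanxi Province (21JZ030)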

Abstract: As technology advances rapidly, products offer more functions and their structures become increasingly complex, so multiple quality characteristics often have to be monitored simultaneously during production. At the same time, advances in science and technology, the Internet, and data-collection technology are rapidly increasing data dimensions, and the number of product indicators that must be monitored during production grows day by day. High-dimensional data streams therefore appear more and more frequently across industries, especially in sensor-based manufacturing and image processing. As a new type of data, they have attracted considerable attention and are already pervasive in daily life; examples include sensor readings, real-time meteorological cloud images captured by satellites, and user communication records.
However, the complexity of high-dimensional data brings many new challenges to quality monitoring. For instance, because of the large number of variables, the normality assumption often fails in high-dimensional settings, and the distribution form is usually unknown in practice. At the same time, control charts that detect only mean shifts can no longer meet practical needs. Statistical methods for monitoring high-dimensional data streams are therefore urgently needed.
To this end, a new robust control chart for monitoring independent high-dimensional data streams is proposed. First, local statistics for monitoring each dimension of the data streams are constructed by combining the score test statistic with the exponentially weighted moving average (EWMA) strategy. For the t-th observation $X_{k,t}$ of the k-th data stream, the resulting local charting statistic is
$R_{k,t}=\left(\theta_{k,t}\right)^{\mathrm{T}} I_{0}^{-1} \theta_{k,t},$
where $\theta_{k,t}$ is the EWMA-type score function vector and $I_{0}$ is the in-control Fisher information matrix. This statistic uses all data up to the current time point and assigns different weights to different observations, with more weight on recent ones. On this basis, global monitoring statistics are constructed using the sum, the maximum, and the top-r strategy. In particular, the chart based on the top-r statistic $Z_{\text{top-}r}$ performs better than $Z_{\max}$ and is more efficient than $Z_{\text{sum}}$, because it only needs to aggregate the r largest local statistics; this makes it more convenient to compute and more economical. Accordingly, we recommend $Z_{\text{top-}r}$ for detecting both mean shifts and variance shifts, and numerical simulations and a real case study demonstrate its effectiveness. The top-r charting statistic is
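To make the local statistic $R_{k,t}$ concrete, here is a minimal Python sketch under assumptions that are ours, not the paper's: each stream is standardized so that it is N(0, 1) in control; the score vector is taken with respect to (mean, variance), which at N(0, 1) gives $U(x)=(x,(x^2-1)/2)^{\mathrm{T}}$ and $I_0=\mathrm{diag}(1,1/2)$; and $\theta_{k,t}$ is a plain EWMA of these score vectors with a hypothetical smoothing parameter `lam`. The paper's transformation of the score test statistic, its handling of non-normal data, and any normalizing constant may differ.

```python
from typing import List, Tuple

# Hypothetical in-control model: N(0, 1) after standardization.
# Inverse Fisher information for (mu, sigma^2) at N(0, 1): I0 = diag(1, 1/2),
# so I0^{-1} = diag(1, 2).
I0_INV_DIAG = (1.0, 2.0)


def score_vector(x: float) -> Tuple[float, float]:
    """Score of N(mu, sigma^2) w.r.t. (mu, sigma^2), evaluated at mu = 0, sigma^2 = 1."""
    return (x, (x * x - 1.0) / 2.0)


def ewma_local_statistics(stream: List[float], lam: float = 0.1) -> List[float]:
    """Return R_t = theta_t^T I0^{-1} theta_t for every time t of one data stream.

    theta_t = (1 - lam) * theta_{t-1} + lam * U(x_t), starting from theta_0 = 0.
    """
    theta = [0.0, 0.0]
    stats = []
    for x in stream:
        u = score_vector(x)
        # EWMA update of the score vector: older observations get geometrically smaller weights
        theta = [(1.0 - lam) * th + lam * ui for th, ui in zip(theta, u)]
        # Quadratic form with a diagonal I0^{-1}
        stats.append(sum(d * th * th for d, th in zip(I0_INV_DIAG, theta)))
    return stats


print(ewma_local_statistics([1.0, 1.0], lam=0.5))  # -> [0.25, 0.5625]
```

Because the second score component $(x^2-1)/2$ moves away from zero whenever the spread of a stream changes, and $R_{k,t}$ squares it, a statistic of this form reacts to both increases and decreases in variance as well as to mean shifts.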
$Z_{\text{top-}r}=\sum_{k=1}^{r} R_{(k), t}=\sum_{k=1}^{r}\left[\left(\theta_{(k), t}\right)^{\mathrm{T}} I_{0}^{-1} \theta_{(k), t}\right], \quad 1 \leq r \leq p,$
where $R_{(k),t}$ denotes the k-th largest of the p local statistics $R_{1,t},\dots,R_{p,t}$ at time t, and $\theta_{(k),t}$ is the corresponding score vector. Simulation results show that, with a suitable choice of r, $Z_{\text{top-}r}$ is sensitive and robust in detecting process changes.
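Given the p local statistics at one time point, the three global statistics differ only in how they aggregate them. A minimal Python sketch follows; the function name `global_statistics` and the dictionary keys are hypothetical choices of ours.

```python
import heapq
from typing import Dict, List


def global_statistics(local: List[float], r: int) -> Dict[str, float]:
    """Combine the local statistics R_{1,t}, ..., R_{p,t} observed at one time t.

    Returns Z_sum (sum of all p values), Z_max (largest value) and
    Z_top-r (sum of the r largest values R_(1),t + ... + R_(r),t).
    """
    p = len(local)
    if not 1 <= r <= p:
        raise ValueError("r must satisfy 1 <= r <= p")
    top = heapq.nlargest(r, local)  # R_(1),t >= ... >= R_(r),t
    return {"sum": sum(local), "max": top[0], "top_r": sum(top)}


# Usage with hypothetical local statistics for p = 4 streams:
print(global_statistics([0.5, 4.0, 1.0, 3.0], r=2))
# -> {'sum': 8.5, 'max': 4.0, 'top_r': 7.0}
```

Two boundary cases show how the top-r statistic sits between the other two: with r = 1 it equals $Z_{\max}$, and with r = p it equals $Z_{\text{sum}}$. In this sketch `heapq.nlargest` selects the r largest values without fully sorting all p of them, although every local statistic still has to be updated at each time step.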
The method is applicable to both normally and non-normally distributed data. It can detect not only mean shifts but also variance shifts, a capability that many control charts lack. The monitoring performance of the proposed charts is evaluated by Monte Carlo simulation, with the average run length (ARL) as the performance measure, and the numerical results verify their effectiveness and robustness.
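The Monte Carlo ARL evaluation can be sketched as a short loop. The setup here is hypothetical and is not the paper's simulation design. There are p independent streams that are N(0, 1) in control, and the first `n_shifted` streams are drawn from N(`mean_shift`, `sd`²) starting at t = 1, which is a zero-state shift. The local statistic is the same normal-score EWMA form used as an assumed stand-in for the paper's. The global statistic is top-r, and the control limit `h` is arbitrary. In practice h would be calibrated so that the in-control ARL matches a prespecified target. Here h is simply an input, and runs are truncated at `max_len`, which biases the estimate downward when the true ARL is large.

```python
import heapq
import random


def run_length(p, r, h, lam, mean_shift, sd, n_shifted, rng, max_len):
    """Time of the first alarm Z_top-r > h, truncated at max_len.

    Hypothetical setup: the first n_shifted streams are drawn from
    N(mean_shift, sd^2) from t = 1 on; the remaining streams stay N(0, 1).
    """
    theta = [[0.0, 0.0] for _ in range(p)]  # EWMA score vectors, theta_{k,0} = 0
    for t in range(1, max_len + 1):
        local = []
        for k in range(p):
            x = rng.gauss(mean_shift, sd) if k < n_shifted else rng.gauss(0.0, 1.0)
            u0, u1 = x, (x * x - 1.0) / 2.0  # score w.r.t. (mu, sigma^2) at N(0, 1)
            th = theta[k]
            th[0] = (1.0 - lam) * th[0] + lam * u0
            th[1] = (1.0 - lam) * th[1] + lam * u1
            # R_{k,t} = theta^T I0^{-1} theta with I0^{-1} = diag(1, 2)
            local.append(th[0] ** 2 + 2.0 * th[1] ** 2)
        if sum(heapq.nlargest(r, local)) > h:  # top-r global statistic
            return t
    return max_len


def estimate_arl(p=10, r=3, h=5.0, lam=0.2, mean_shift=0.0, sd=1.0,
                 n_shifted=1, n_rep=200, max_len=500, seed=0):
    """Monte Carlo average run length over n_rep independent replications."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rep):
        total += run_length(p, r, h, lam, mean_shift, sd, n_shifted, rng, max_len)
    return total / n_rep


if __name__ == "__main__":
    arl0 = estimate_arl(n_rep=100, max_len=300)          # no shift
    arl1 = estimate_arl(sd=2.0, n_rep=100, max_len=300)  # variance shift in one stream
    print(f"in-control ARL ~ {arl0:.1f}, shifted ARL ~ {arl1:.1f}")
```

Comparing the in-control and shifted estimates for the same h is the basic comparison an ARL study reports. A smaller out-of-control ARL at a matched in-control ARL indicates faster detection.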
To illustrate how the new chart performs in practice, a case study is carried out on a real data set of 1,567 samples from a semiconductor manufacturing process, in which each observation vector consists of 590 variables. The results show that the proposed $Z_{\text{top-}r}$ chart achieves higher computational and detection efficiency and detects abnormal shifts in high-dimensional production data streams well.
The proposed control chart has several advantages. First, it handles both normal and non-normal data. Second, it detects not only mean shifts but also variance shifts. Finally, it only needs to focus on the r largest local statistics, so the statistic is simple in form, easy to compute, and efficient. Together, these advantages allow shifts in the data streams to be signaled quickly and effectively in actual production, so the new chart can be used to monitor product quality in practice.
This paper assumes that the data streams are mutually independent; in actual production, however, the relationships among data streams become more complex as the dimension increases. Future research could extend the proposed method to more general high-dimensional data streams.

Key words: high-dimensional data streams, robust control charts, EWMA, top-r statistics, statistical process control

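Abstract: With the rapid development of modern technologies such as sensors, high-dimensional data streams appear frequently in all industries. However, the complexity of high-dimensional data brings many challenges to quality monitoring. For example, the normality assumption often fails in high-dimensional settings, and the distribution form is usually unknown in practice; meanwhile, control charts that monitor only the mean can no longer meet practical needs, and the importance of monitoring variance has long been a consensus in academia and industry. To this end, a robust control chart for monitoring independent high-dimensional data streams is proposed. First, the classical score test statistic is mathematically transformed and combined with the exponentially weighted moving average (EWMA) method to construct local statistics for monitoring each data stream; on this basis, global statistics for monitoring high-dimensional data streams are constructed using the top-r and other methods. The proposed method is applicable to both normally and non-normally distributed data and can monitor the mean and the variance simultaneously. Numerical simulations and a real case study illustrate the effectiveness and robustness of the new method.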

Key words: high-dimensional data streams, robust control charts, EWMA, top-r statistics, statistical process control
