Citation

- Permanent Link:
- http://ufdc.ufl.edu/AA00052595/00001
## Material Information
- Title:
- Robust multivariate control charts
- Creator:
- Ajmani, Vivek Balraj, 1963-
- Publication Date:
- 1998
- Language:
- English
- Physical Description:
- xv, 196 leaves : ill. ; 29 cm.
## Subjects
- Subjects / Keywords:
- Charts ( jstor )
- Degrees of freedom ( jstor )
- False positive errors ( jstor )
- Gaussian distributions ( jstor )
- Sample size ( jstor )
- Signals ( jstor )
- Simulations ( jstor )
- Statistics ( jstor )
- Subroutines ( jstor )
- T distribution ( jstor )
- Dissertations, Academic -- Statistics -- UF ( lcsh )
- Statistics thesis, Ph. D ( lcsh )
- Genre:
- bibliography ( marcgt )
- non-fiction ( marcgt )
## Notes
- Thesis:
- Thesis (Ph. D.)--University of Florida, 1998.
- Bibliography:
- Includes bibliographical references (leaves 193-195).
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by Vivek Balraj Ajmani.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
- Resource Identifier:
- 029217888 ( ALEPH )
- 40096973 ( OCLC )

Full Text

ROBUST MULTIVARIATE CONTROL CHARTS

By VIVEK BALRAJ AJMANI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1998

To God for giving me strength, my family for their support, and Preeti for her love and friendship.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Dr. Geoffrey Vining for being my advisor. Everything I know about industrial statistics is directly attributable to his vast knowledge of the area and his excellent ability to impart this knowledge to his students. I look forward to continued collaboration with him in the future. I would also like to thank Dr. Ronald Randles for his continued support, encouragement, and kindness. Thanks also to Dr. Malay Ghosh, Dr. John Cornell, and Dr. Dianne Schaub for being on my dissertation committee. I would also like to thank Dr. William Woodall (University of Alabama) for agreeing to critique this dissertation, Dr. Thomas Hettmansperger (Pennsylvania State University) for allowing me to use his algorithm for the H chart, and Dr. Peter Rousseeuw (University of Antwerp) for supplying me with the FORTRAN code to compute data depths.

In addition, I would like to thank my family for their constant words of encouragement. This dissertation would not have been possible without their support. In particular, I would like to thank my nephews, Vinay, Timothy, and Arvin, and my niece, Anjali, for providing me with hours of fun and laughter.

Words are not enough to express thanks to my wife, Preeti. Preeti has been my "shelter from the storm." Her love, devotion, and kind words of encouragement have helped me get through some difficult and often very frustrating days. This dissertation is hers more than it is mine.

TABLE OF CONTENTS

- ACKNOWLEDGMENTS
- LIST OF FIGURES
- ABSTRACT
- CHAPTERS
- 1 INTRODUCTION
- 2 REVIEW OF LITERATURE
  - 2.1 The Shewhart Control Chart
  - 2.2 The Cumulative Sum Chart (CUSUM)
  - 2.3 The Exponentially Weighted Moving Average Chart (EWMA)
  - 2.4 A Review of Liu (1995)
- 3 ROBUSTNESS OF THE NORMAL THEORY MULTIVARIATE CONTROL CHARTS
  - 3.1 Distributions Used in the Simulation Study
  - 3.2 The Simulation Strategy
  - 3.3 The Performance of the χ² Chart
  - 3.4 The Performance of Hotelling's T² Chart
  - 3.5 Performance of the Multivariate CUSUM Charts
  - 3.6 Performance of the Multivariate EWMA Chart (MEWMA)
- 4 ROBUSTNESS OF LIU'S (1995) NONPARAMETRIC MULTIVARIATE CONTROL CHARTS
  - 4.1 The Notion of Data Depths
  - 4.2 The "r" Chart
  - 4.3 The "Q" Chart
  - 4.4 The "S" Chart
  - 4.5 Discussion of Simulation Results
- 5 ROBUST MULTIVARIATE CONTROL CHARTS UNDER A KNOWN COVARIANCE MATRIX
  - 5.1 Affine Invariant Multivariate One Sample Sign And Sign Rank Tests
  - 5.2 Robust Multivariate Shewhart Type Charts
  - 5.3 A Robust Multivariate Exponentially Weighted Moving Average Chart
- 6 AFFINE INVARIANT ROBUST MULTIVARIATE CONTROL CHARTS UNDER AN UNKNOWN COVARIANCE MATRIX
  - 6.1 Affine Invariant Multivariate Shewhart Type Charts
  - 6.2 An Affine-Invariant Multivariate EWMA Chart
- 7 SUMMARY AND CONCLUSIONS
- APPENDIX: FORTRAN PROGRAMS
  - 1. ARL of the χ² Chart for Individual Observations
  - 2. ARL of the T² Chart for Individual Observations
  - 3. ARL of the T² Chart for Subgroups of Observations
  - 4. ARL of Crosier's (1988) Multivariate CUSUM
  - 5. ARL of Pignatiello and Runger's (1990) Multivariate CUSUM
  - 6. ARL of the Normal Theory Multivariate EWMA Chart (r=0.30)
  - 7. ARL of the "r" Chart
  - 8. ARL of the "Q" Chart
  - 9. ARL of the "S" Chart
  - 10. ARL of the RST Chart
  - 11. ARL of the PR-SRT Chart
  - 12. ARL of the Robust EWMA Chart
  - 13. ARL of the V(n) Chart
  - 14. ARL of the W(n) Chart
  - 15. ARL of the H Chart
- REFERENCES
- BIOGRAPHICAL SKETCH

LIST OF FIGURES

- Figure 3.1 Plot of ARL versus λ of the χ² chart for individual observations
- Figure 3.2 Plot of the ARL versus λ of the χ² chart with subgroups of size 5
- Figure 3.3 Plot of ARL versus λ of the T² chart for individual observations under bivariate normality
- Figure 3.4 Plot of ARL versus λ of the T² chart for individual observations under the bivariate Cauchy distribution
- Figure 3.5 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 2 degrees of freedom
- Figure 3.6 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 5 degrees of freedom
- Figure 3.7 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 8 degrees of freedom
- Figure 3.8 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 18 degrees of freedom
- Figure 3.9 Plot of ARL versus λ of the T² chart for individual observations under the bivariate contaminated normal distribution
- Figure 3.10 Plot of ARL versus λ of the T² chart for individual observations under the bivariate normal distribution
- Figure 3.11 Plot of ARL versus λ of the T² chart for individual observations under the bivariate Cauchy distribution
- Figure 3.12 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 2 degrees of freedom
- Figure 3.13 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 5 degrees of freedom
- Figure 3.14 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 8 degrees of freedom
- Figure 3.15 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 18 degrees of freedom
- Figure 3.16 Plot of ARL versus λ of the T² chart for individual observations under the bivariate mixed normal distribution
- Figure 3.17 Plot of ARL versus λ of the T² chart for individual observations under the bivariate normal distribution
- Figure 3.18 Plot of ARL versus λ of the T² chart for individual observations under the bivariate Cauchy distribution
- Figure 3.19 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 2 degrees of freedom
- Figure 3.20 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 5 degrees of freedom
- Figure 3.21 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 8 degrees of freedom
- Figure 3.22 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 18 degrees of freedom
- Figure 3.23 Plot of ARL versus λ of the T² chart for individual observations under the bivariate mixed normal distribution
- Figure 3.24 Plot of ARL versus λ of the T² chart for subgroups of size 5 and a base period of size 25 under the various bivariate distributions
- Figure 3.25 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate normal distribution
- Figure 3.26 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate Cauchy distribution
- Figure 3.27 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 2 degrees of freedom
- Figure 3.28 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 5 degrees of freedom
- Figure 3.29 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 8 degrees of freedom
- Figure 3.30 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 18 degrees of freedom
- Figure 3.31 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate mixed normal distribution
- Figure 3.32 Plot of the ARL versus λ for the MEWMA chart for individual observations under different distributions
- Figure 4.1 Illustrating the values of the indicator function in the bivariate case
- Figure 4.2 Plot of ARL versus λ performance of the "r" chart under the various bivariate distributions
- Figure 4.3 Plot comparing performance of the T² chart with the "r" chart under bivariate normality
- Figure 4.4 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 2 d.f.
- Figure 4.5 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 5 d.f.
- Figure 4.6 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 8 d.f.
- Figure 4.7 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 18 d.f.
- Figure 4.8 Plot comparing the performance of the T² chart with the "r" chart under the bivariate mixed normal distribution
- Figure 4.9 Plot of ARL versus λ performance of the "Q" chart under the various bivariate distributions
- Figure 4.10 Plot of ARL versus λ performance of the "Q" chart under the various bivariate distributions
- Figure 4.11 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate normal distribution
- Figure 4.12 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 2 d.f.
- Figure 4.13 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 5 d.f.
- Figure 4.14 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 8 d.f.
- Figure 4.15 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 18 d.f.
- Figure 4.16 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate mixed normal distribution
- Figure 4.17 Plot of ARL versus λ for the "S" chart under the various bivariate distributions
- Figure 4.18 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate normal distribution
- Figure 4.19 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 2 d.f.
- Figure 4.20 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 5 d.f.
- Figure 4.21 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 8 d.f.
- Figure 4.22 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 18 d.f.
- Figure 4.23 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate mixed normal distribution
- Figure 5.1 Plot of ARL versus λ for the RST with n = 5 under various bivariate distributions
- Figure 5.2 Plot of ARL versus λ for the RST with n = 10 under various bivariate distributions
- Figure 5.3 Plot of ARL versus λ for the PR-SRT with n = 5 under various bivariate distributions
- Figure 5.4 Plot of ARL versus λ for the PR-SRT with n = 10 under various bivariate distributions
- Figure 5.5 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under bivariate normality
- Figure 5.6 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 2 d.f.
- Figure 5.7 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 5 d.f.
- Figure 5.8 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 8 d.f.
- Figure 5.9 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 18 d.f.
- Figure 5.10 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate mixed normal distribution
- Figure 5.11 Plot of ARL versus λ for the RAIIEWMA with r = 0.10 under various bivariate distributions
- Figure 6.1 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate normal distribution
- Figure 6.2 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
- Figure 6.3 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
- Figure 6.4 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
- Figure 6.5 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
- Figure 6.6 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate mixed normal distribution
- Figure 6.7 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate normal distribution
- Figure 6.8 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
- Figure 6.9 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
- Figure 6.10 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
- Figure 6.11 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
- Figure 6.12 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate mixed normal distribution
- Figure 6.13 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate normal distribution
- Figure 6.14 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
- Figure 6.15 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
- Figure 6.16 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
- Figure 6.17 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
- Figure 6.18 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate mixed normal distribution
- Figure 6.19 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate normal distribution
- Figure 6.20 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
- Figure 6.21 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
- Figure 6.22 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
- Figure 6.23 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
- Figure 6.24 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate mixed normal distribution
- Figure 6.25 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate normal distribution
- Figure 6.26 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
- Figure 6.27 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
- Figure 6.28 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
- Figure 6.29 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
- Figure 6.30 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate mixed normal distribution
- Figure 6.31 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate normal distribution
- Figure 6.32 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
- Figure 6.33 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
- Figure 6.34 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
- Figure 6.35 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
- Figure 6.36 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate mixed normal distribution
- Figure 6.37 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate normal distribution
- Figure 6.38 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate t distribution with 2 d.f.
- Figure 6.39 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate t distribution with 5 d.f.
- Figure 6.40 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate t distribution with 8 d.f.
- Figure 6.41 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate t distribution with 18 d.f.
- Figure 6.42 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate mixed normal distribution
- Figure 6.43 Plot showing the performance of the V_n-EWMA chart under the various bivariate distributions
- Figure 6.44 Plot comparing the performance of Lowry et al. (1992), REWMA, and V_n-EWMA charts under the bivariate normal distribution
- Figure 6.45 Plot comparing the performance of Lowry et al. (1992), REWMA, and V_n-EWMA charts under the bivariate t distribution with 2 d.f.
- Figure 6.46 Plot comparing the performance of Lowry et al. (1992), REWMA, and V_n-EWMA charts under the bivariate t distribution with 5 d.f.
- Figure 6.47 Plot comparing the performance of Lowry et al. (1992), REWMA, and V_n-EWMA charts under the bivariate t distribution with 8 d.f.
- Figure 6.48 Plot comparing the performance of Lowry et al. (1992), REWMA, and V_n-EWMA charts under the bivariate t distribution with 18 d.f.
- Figure 6.49 Plot comparing the performance of Lowry et al. (1992), REWMA, and V_n-EWMA charts under the bivariate mixed normal distribution

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ROBUST MULTIVARIATE CONTROL CHARTS

By Vivek Balraj Ajmani

May 1998

Chairman: G. Geoffrey Vining

Major Department: Statistics

Control charts are one of the most powerful tools for monitoring a process. Univariate control charts are useful for monitoring processes that manufacture products with a single quality characteristic of interest. In many cases, products may be characterized by two or more quality characteristics that jointly determine the usefulness or the quality of the product. In many instances, these quality characteristics are correlated and, therefore, alternative multivariate control chart techniques are required to monitor the process that manufactures such products.

The performance of the multivariate control chart procedures that are currently being used in industry and cited in the literature has been studied under the assumption that the underlying distribution of the process is multivariate normal. It is well known that in reality this assumption rarely holds. Our results indicate that the normal theory multivariate control charts perform poorly when departures from multivariate normality occur.

Alternatives to the normal theory multivariate control charts are needed in case the assumption of multivariate normality fails to hold. One such alternative is based on the notion of data depths, which leads to non-parametric multivariate control charts. However, our simulation studies indicate that the data depth multivariate control charts perform poorly both under multivariate normality and under departures from it.

We propose robust alternatives which are based on affine-invariant one-sample multivariate versions of the sign and sign-rank hypothesis tests.
These hypothesis tests are used to construct multivariate Shewhart type and exponentially weighted moving average (EWMA) charts. Our simulation results indicate that the performance of the proposed charts is comparable to the performance of the normal theory and the data depth based multivariate control charts under the assumption of multivariate normality. On the other hand, the performance of the proposed charts is an improvement over the performance of the normal theory and the data depth based multivariate control charts under departures from multivariate normality.

CHAPTER 1
INTRODUCTION

Global competition and increased consumer awareness have made American industries focus heavily on quality control issues. Statistical Process Control (SPC) provides an important set of tools for achieving quality control objectives. These tools help in achieving process stability through the reduction of process variability. Stable processes are required to meet the consumer's fitness for use criteria. Ideally, processes should operate with little variability around the target of the quality characteristic in question. Assignable causes or process shifts must be detected quickly so that corrective actions can be taken before many non-conforming units are produced.

Control charts, developed in the 1920s by Walter A. Shewhart of Bell Laboratories, are one of the most powerful on-line techniques for controlling process variability. They help in monitoring a process so that efforts can be made to improve the process. Since their introduction, control charts have gained wide usage and acceptance in industry, particularly in the manufacturing sector.

Univariate control charts are useful for manufactured items with only one quality characteristic of interest. For instance, the quality of a compact disc may be characterized by its diameter. A compact disc must have a diameter that is less than or equal to the diameter of the disc holder in the compact disc player.
Univariate control charts may be used in this case to monitor the target diameter of a compact disc. The objective would be to detect any deviations from the target diameter that individual discs or sample means exhibit as soon as they occur.

In many manufacturing situations, products may have two or more quality characteristics. In such cases the usefulness of the product is determined jointly by all of the quality variables, which often are correlated. As an example, the quality of certain types of tablets may be determined by their weight, degree of hardness, thickness, width, and length. These quality variables are correlated and, therefore, alternative methods of control are needed, since control charts for monitoring individual quality characteristics may not be adequate for detecting changes in the quality of such products. These methods are collectively classified as multivariate quality control techniques, and the control charts based on such procedures are called multivariate quality control charts.

Multivariate quality control was first introduced by Hotelling in 1947 in the testing of bombsights. Two bombsights from a lot of size 20 were selected at random. The sights were tested by taking two flights and dropping four bombs on each flight. The range error (measured in the flight direction of the airplane) and the deflection error (measured perpendicular to the direction of the flight) were used as a measure of the quality of the bombsights (Alt, 1984). Hotelling introduced a multivariate Shewhart type chart (the $T^2$ chart) for monitoring this process (Hotelling, 1947). Jackson (1956, 1959) introduced a control ellipse which produced the same result as the $T^2$ control chart proposed by Hotelling. The control ellipse and the $T^2$ chart are similar in the sense that points which plot out-of-control on the $T^2$ chart also plot out-of-control on the control ellipse. Since then, various authors have studied the performance and properties of the $T^2$ chart.
Alt (1984) and Jackson (1985) give a thorough literature review on this topic.

One of the drawbacks of the Shewhart control chart is its insensitivity to small shifts in the process mean. Alternatives to the Shewhart control chart when small shifts in the process mean are of interest include the cumulative sum (CUSUM) chart and the exponentially weighted moving average (EWMA) chart. The Shewhart, CUSUM, and EWMA charts differ with respect to how each chart uses the data from the production process. The Shewhart chart places all the weight on the most recent observation and, therefore, ignores information from past data. This makes the Shewhart chart insensitive to small shifts in the process mean. The CUSUM chart is based on the cumulative sum of the deviations of the observations from the target mean of the process and, therefore, each observation in this sum is equally weighted. Thus, the CUSUM chart uses information from both recent and past data. The EWMA chart is based on a statistic that gives less weight to past data than to present or more recent data through the use of a weighting constant. The weighting constant in the EWMA chart depends on the magnitude of the shift in the process mean that needs to be detected. The EWMA chart can be designed to behave like the Shewhart or the CUSUM charts by choosing an appropriate weighting constant. The CUSUM and the EWMA charts are, therefore, more sensitive to small shifts than the Shewhart chart since they make use of information from past data.

The multivariate generalizations of the CUSUM and EWMA charts have been studied extensively. See, for example, Woodall and Ncube (1985), Crosier (1988), Pignatiello and Runger (1990), and Lowry et al. (1992). The average run length (ARL) performance of these charts was shown to be an improvement over that of the multivariate Shewhart charts, particularly for small shifts in the process mean.
The average run length of a control chart is the average number of observations or samples that need to be collected before the control chart gives an out-of-control signal.

The performances of the multivariate Shewhart, CUSUM, and EWMA charts that are currently being used in industry and are cited in the literature have been studied under the assumption that the underlying distribution of the process is multivariate normal. It is well known that this assumption is rarely true in practice. Alternative methods of control are needed in case the assumption of multivariate normality is violated. Research in the area of robust multivariate control charts is needed. Liu (1995) proposed three nonparametric multivariate control charts that do not require any assumptions about the underlying distribution of the process. However, Liu did not conduct average run length studies of the proposed charts. Therefore, we cannot compare the performance of her charts with that of the normal theory based multivariate control charts. The performance of a control chart is measured by its in-control and out-of-control average run lengths. Typically, we would require a control chart to maintain its prespecified in-control average run length and quickly detect out-of-control states in a process.

The objectives of this dissertation are threefold. First, we provide a thorough literature review of both the normal theory and the non-parametric multivariate quality control charts. Secondly, we do a comprehensive study of the average run length performance of the existing methods under departures from the multivariate normality assumption. This includes the study of the average run length performance of the procedures suggested in Liu (1995). Finally, we propose robust alternatives to the existing multivariate control chart methods. The new methods are based on affine invariant multivariate one sample tests that were developed by Hettmansperger et al.
(1994), Peters and Randles (1990), and Randles (1989). The performances of the proposed multivariate control charts are similar to those of the existing methods or charts when the underlying distribution of the process is multivariate normal and are better than the performances of the existing methods under departures from the multivariate normality assumption. This dissertation provides an important contribution to the field of multivariate quality control charts in several ways. First, this dissertation gives a concise summary of the methods that are currently being used to solve multivariate quality control problems. Secondly, this dissertation explores the behavior of the existing methods under departures from the multivariate normality assumption. Next, we propose alternative methods that are shown to be more robust than the existing methods under deviations from multivariate normality. The new methods are robust in the sense that they maintain their pre-specified type I error rate (and, therefore, the pre-specified in-control average run length) and detect out-of-control conditions quickly under both multivariate normality and deviations from multivariate normality. Another important contribution of this dissertation relates to its effort to bridge the gap between theoretical and applied statistics. We use the affine invariant multivariate one sample tests that were developed by Hettmansperger et al. (1994), Peters and Randles (1990), and Randles (1989) to solve a problem in industrial statistics. That is, these affine invariant tests form a basis for the multivariate quality control charts that are proposed. The literature review in Chapter 2 is intended to acquaint the reader with the normal theory and the non-parametric multivariate quality control chart procedures that have been suggested in the literature. 
In Chapter 3, we thoroughly investigate the performance of the normal theory multivariate control charts under departures from the multivariate normality assumption. In Chapter 4, we investigate the behavior of the multivariate control charts that were proposed by Liu (1995). In Chapter 5, we propose robust multivariate control charts under the assumption that the variance-covariance matrix of the underlying distribution of the process is known. Chapter 6 extends Chapter 5 to the case when the variance-covariance matrix of the underlying distribution of the process is unknown. Chapter 7 contains conclusions and potential areas of further research into the area of multivariate quality control charts.

CHAPTER 2
REVIEW OF LITERATURE

This chapter is intended to acquaint the reader with the multivariate quality control chart procedures that have been proposed in the literature. We discuss the normal theory multivariate Shewhart, CUSUM, and EWMA charts as well as the non-parametric multivariate control charts that were introduced by Liu (1995). The multivariate Shewhart charts are discussed in Section 2.1. The theory underlying these charts is a simple extension of the theory underlying the univariate Shewhart charts. We, therefore, discuss the univariate Shewhart charts first. The multivariate CUSUM and EWMA charts are discussed in Sections 2.2 and 2.3, respectively. The theory behind these charts is also a straightforward extension of the theory behind the univariate CUSUM and EWMA charts. We, therefore, discuss the univariate CUSUM and EWMA charts first. We briefly discuss Liu's non-parametric charts in Section 2.4. A detailed discussion of these charts along with simulation results is presented in Chapter 4.

2.1 The Shewhart Control Chart

The Shewhart control chart is perhaps the most widely used control chart in statistical process control. Developed in the 1920s by Walter A.
Shewhart of Bell Labs, it has gained wide acceptance and usage in industry. Montgomery (1991) presents a detailed list of references and a good overview of the theoretical background and applications of the Shewhart chart. We begin our discussion by first describing the univariate Shewhart chart.

Let the quality characteristic of a manufactured item be denoted by $X$. An example is where $X$ represents the inside diameter of forged piston rings. Assume that $X \sim N(\mu_0, \sigma_0^2)$ where both $\mu_0$ and $\sigma_0$ are known, and consider a sample $X_1, \ldots, X_n$ from the manufacturing process. A univariate Shewhart control chart for the process mean is given by the following characteristics:

$$\begin{aligned} \mathrm{UCL} &= \mu_0 + z_{\alpha/2}\,\sigma_0/\sqrt{n} \\ \mathrm{CL} &= \mu_0 \\ \mathrm{LCL} &= \mu_0 - z_{\alpha/2}\,\sigma_0/\sqrt{n} \end{aligned} \tag{2.1}$$

where UCL, CL, and LCL are the upper control limit, the target or center line, and the lower control limit, respectively, and $z_{\alpha/2}$ is the upper $(\alpha/2)$th quantile of the standard normal distribution. For successive samples of size $n$, this control chart can be viewed, when the values of the means of the successive samples are plotted on it, as repeated tests of hypothesis of the form $H_0\colon \mu = \mu_0$ versus $H_a\colon \mu \neq \mu_0$ at the $\alpha$ level of significance. The regions above UCL and below LCL represent the rejection regions of the likelihood ratio test of the above hypothesis.

Several authors argue that control charts should not be viewed as repeated tests of hypothesis. We, however, disagree with that view. We believe that the advantages of formally viewing a control chart as a sequence of hypothesis tests clearly outweigh any disadvantages. Further, by viewing a control chart as a sequence of tests, we provide a formal basis for introducing novel, robust control chart procedures. For additional discussion on the relationship between control charts and hypothesis testing, see Woodall and Faltin (1996).

Most often, the nominal values of $\mu_0$ and $\sigma_0$ are unknown.
A typical way of estimating these unknown quantities is by taking $m$ preliminary samples of size $n$ each over a base period when the process is assumed to be in-control. The target mean $\mu_0$ is estimated by $\bar{\bar{X}}$, which is given by

$$\bar{\bar{X}} = \frac{1}{m}\sum_{i=1}^{m} \bar{X}_i$$

where $\bar{X}_i$ is the $i$th sample mean, and the process variance $\sigma_0^2$ is estimated by $\bar{S}^2$, which is defined as

$$\bar{S}^2 = \frac{1}{m}\sum_{i=1}^{m} S_i^2$$

where $S_i^2$ is the $i$th sample variance. Substituting these estimators in the control limits given in Equation (2.1), we get

$$\begin{aligned} \mathrm{UCL} &= \bar{\bar{X}} + A\sqrt{\bar{S}^2} \\ \mathrm{CL} &= \bar{\bar{X}} \\ \mathrm{LCL} &= \bar{\bar{X}} - A\sqrt{\bar{S}^2} \end{aligned} \tag{2.2}$$

where $A = z_{\alpha/2}/\sqrt{n}$ is tabulated in standard quality control textbooks such as Montgomery (1991). The control limits given in Equation (2.2) are called trial limits, and if the sample means from the preliminary samples fall between UCL and LCL, then these limits can be used for future control. If one or more of the preliminary sample means fall above UCL or below LCL, and if we know why they are out-of-control, the corresponding samples are dropped and the control limits are recalculated. This procedure is continued until all the preliminary sample means fall within the control limits.

In many instances, a manufactured item may have two or more quality characteristics that jointly define the usefulness of the product. For example, consider a bearing that has both an inner ($X_1$) and an outer ($X_2$) diameter. Suppose that $X_1$ and $X_2$ have a bivariate normal distribution with $\mathrm{Cov}(X_1, X_2) \neq 0$. One way of monitoring the values of these quality characteristics is by constructing two separate univariate Shewhart charts. The process is said to be in-control only if the sample means $\bar{X}_1$ and $\bar{X}_2$ both fall within their respective control limits. This method of control yields a joint rectangular control region, and the process is said to be in-control if the point $(\bar{X}_1, \bar{X}_2)$ falls inside this region.

There are two major problems associated with this approach. The first deals with wrong probability statements.
Assuming independence, suppose that we use a type I error rate of 0.05 to construct control charts for each of the two quality characteristics in the above example. The probability that each sample mean falls within its respective control limits is 0.95. However, the probability that both sample means simultaneously fall within their control limits is $0.95^2 = 0.9025$, producing an inflated type I error of $1 - 0.9025 = 0.0975$, or about 10%. Furthermore, the magnitude of inflation increases as the number of quality characteristics increases. In general, if there are $p$ independent quality characteristics and if $p$ univariate $\bar{X}$ charts, each with $\Pr(\text{type I error}) = \alpha$, are combined into a single chart, then the true probability of a type I error is $1 - (1 - \alpha)^p$. In most cases the $p$ quality characteristics are correlated and, therefore, this formula cannot be used to compute the effective type I error rate.

A second problem in using separate control charts for two or more quality characteristics is the resulting conflicting answers regarding the signals. Conflicting answers arise because under the assumption of a normal distribution (assuming unequal variances) the contours of constant probability are ellipses and not the square or rectangular region described above. We could, therefore, falsely claim that both $X_1$ and $X_2$ are in-control when at least one is not, or that one or both of $X_1$ and $X_2$ are out-of-control when, in fact, they are both in-control.

The general multivariate quality control problem consists of a repetitive process in which each item is characterized by $p$ quality characteristics $X_1, \ldots, X_p$. The underlying distribution of the $p$ random variables is assumed to be multivariate normal with a known mean vector $\boldsymbol{\mu}_0$ and a known variance-covariance matrix $\boldsymbol{\Sigma}$. The multivariate Shewhart control chart procedure can be viewed as a sequence of hypothesis tests of $H_0\colon \boldsymbol{\mu} = \boldsymbol{\mu}_0$ versus $H_a\colon \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$.
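The likelihood ratio test for this set of hypotheses specifies that the null hypothesis be rejected if

$$X^2 = n(\bar{\mathbf{X}} - \boldsymbol{\mu}_0)^T \boldsymbol{\Sigma}^{-1} (\bar{\mathbf{X}} - \boldsymbol{\mu}_0) > \chi^2_{p,\alpha} \tag{2.3}$$

where $\bar{\mathbf{X}}$ denotes the $(p \times 1)$ vector of sample means and $\chi^2_{p,\alpha}$ is the upper $\alpha$th quantile of the chi-square distribution with $p$ degrees of freedom. The control chart is formed by letting

$$\mathrm{UCL} = \chi^2_{p,\alpha}, \qquad \mathrm{LCL} = 0$$

and plotting the values of $X^2$. We conclude that the process is out-of-control if any of the collected samples yields a value of $X^2$ that falls above UCL.

Most often, both $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}$ are unknown and, therefore, have to be estimated from a base period of $m$ samples, each of size $n$, when the process is assumed to be in-control. The sample means and variances are then calculated as follows:

$$\bar{X}_{jk} = \frac{1}{n}\sum_{i=1}^{n} X_{ijk}, \qquad S^2_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} (X_{ijk} - \bar{X}_{jk})^2$$

where $X_{ijk}$ is the $i$th observation on the $j$th quality characteristic in the $k$th sample. The covariances between quality characteristics $X_j$ and $X_h$, where $j \neq h$, are given by

$$S_{jhk} = \frac{1}{n-1}\sum_{i=1}^{n} (X_{ijk} - \bar{X}_{jk})(X_{ihk} - \bar{X}_{hk}).$$

The statistics $\bar{X}_{jk}$, $S^2_{jk}$, and $S_{jhk}$ are then averaged over all $m$ samples to get

$$\bar{\bar{X}}_j = \frac{1}{m}\sum_{k=1}^{m} \bar{X}_{jk}, \qquad \bar{S}_j^2 = \frac{1}{m}\sum_{k=1}^{m} S^2_{jk}, \qquad \bar{S}_{jh} = \frac{1}{m}\sum_{k=1}^{m} S_{jhk}$$

where $j \neq h$. The sample means $\bar{\bar{X}}_j$ are the elements of the vector $\bar{\bar{\mathbf{X}}}$, and the $p \times p$ sample covariance matrix $\mathbf{S}$ is defined as

$$\mathbf{S} = \begin{pmatrix} \bar{S}_1^2 & \bar{S}_{12} & \cdots & \bar{S}_{1p} \\ \bar{S}_{12} & \bar{S}_2^2 & \cdots & \bar{S}_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{S}_{1p} & \bar{S}_{2p} & \cdots & \bar{S}_p^2 \end{pmatrix}.$$

We now replace $\boldsymbol{\mu}_0$ by $\bar{\bar{\mathbf{X}}}$ and $\boldsymbol{\Sigma}$ by $\mathbf{S}$ in the $X^2$ test statistic given in Equation (2.3) to get

$$T^2 = n(\bar{\mathbf{X}} - \bar{\bar{\mathbf{X}}})^T \mathbf{S}^{-1} (\bar{\mathbf{X}} - \bar{\bar{\mathbf{X}}}).$$

This test statistic is called Hotelling's $T^2$, and the control chart that is based on it is referred to as Hotelling's $T^2$ control chart. For large sample sizes and under the assumption that the process is in-control, the $T^2$ distribution converges to the $\chi^2$ distribution. Therefore, if $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}$ are estimated from a large number of preliminary samples, then it is customary to use the $X^2$ control chart that was described earlier. For small sample sizes the control chart is formed by letting

$$\mathrm{UCL} = T^2_{\alpha, p, n-p}, \qquad \mathrm{LCL} = 0$$

where $T^2_{\alpha, p, n-p}$ is the upper $\alpha$th quantile from the $T^2$ distribution with $p$ numerator and $n - p$ denominator degrees of freedom.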
We claim that the process is out-of-control whenever $T^2$ for any sample exceeds UCL. As in the case of the univariate control chart, the multivariate chart control limits are considered to be trial limits. If all the preliminary sample means yield an $X^2$ (or $T^2$) test statistic value that is less than $\chi^2_{p,\alpha}$ (or $T^2_{\alpha,p,n-p}$), then we can use these limits for future control. If one or more preliminary sample means yield an $X^2$ (or $T^2$) value that exceeds $\chi^2_{p,\alpha}$ (or $T^2_{\alpha,p,n-p}$), and if we know why they are out-of-control, the corresponding samples are dropped and the control limits are recalculated. The reader is referred to Alt (1984), Alt and Smith (1988), and Jackson (1980, 1985) for more details on Hotelling's $T^2$ chart.

Next, we discuss the principal components approach to the multivariate quality control chart problem. The procedure amounts to transforming the correlated quality characteristics into a set of new independent variables which are linear functions of the quality characteristics. The starting point of the statistical applications of the method of principal components is the sample covariance matrix $\mathbf{S}$. The axes of the control ellipsoid correspond to the eigenvectors of $\mathbf{S}$. For example, when $p = 2$, the axes of the control ellipse may be thought of as vectors in two-space that characterize the rotation of the original axes. The length of the major axis is $2\sqrt{\lambda_1 T^2_{\alpha,2,N-2}}$ and the length of the minor axis is $2\sqrt{\lambda_2 T^2_{\alpha,2,N-2}}$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $\mathbf{S}$ and

$$T^2_{\alpha,2,N-2} = \frac{2(N-1)}{N-2}\,F_{\alpha,2,N-2}.$$

Note that $T^2_{\alpha,2,N-2}$ and $F_{\alpha,2,N-2}$ are the upper $\alpha$th quantiles of the $T^2$ and $F$ distributions with 2 and $N - 2$ degrees of freedom, respectively. The coefficients of the first eigenvector of $\mathbf{S}$ are the cosines of the angles between the major axis and the $X_1$ and $X_2$ axes. Similarly, the coefficients of the second eigenvector of $\mathbf{S}$ are the cosines of the angles between the minor axis and the $X_1$ and $X_2$ axes.
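For the bivariate case, the pooled estimates, Hotelling's $T^2$ statistic, and the eigenvalues of $\mathbf{S}$ that determine the lengths of the control-ellipse axes can be computed directly. The following is a minimal sketch using only the Python standard library. The function names and the small data set are hypothetical and chosen for illustration only. The sketch does not compute the control limit; the upper quantile of the $T^2$ (or $F$) distribution is assumed to be taken from tables.

```python
import math

def pooled_estimates(samples):
    """Grand mean vector and pooled 2x2 covariance matrix from m
    preliminary samples, each a list of (x1, x2) pairs."""
    m = len(samples)
    grand = [0.0, 0.0]
    s11 = s22 = s12 = 0.0
    for sample in samples:
        n = len(sample)
        xbar1 = sum(x for x, _ in sample) / n
        xbar2 = sum(y for _, y in sample) / n
        grand[0] += xbar1 / m
        grand[1] += xbar2 / m
        # Within-sample variances and covariance, averaged over the m samples.
        s11 += sum((x - xbar1) ** 2 for x, _ in sample) / (n - 1) / m
        s22 += sum((y - xbar2) ** 2 for _, y in sample) / (n - 1) / m
        s12 += sum((x - xbar1) * (y - xbar2) for x, y in sample) / (n - 1) / m
    return grand, [[s11, s12], [s12, s22]]

def hotelling_t2(sample, grand, S):
    """T^2 = n (xbar - grand)^T S^{-1} (xbar - grand) for one bivariate sample."""
    n = len(sample)
    d1 = sum(x for x, _ in sample) / n - grand[0]
    d2 = sum(y for _, y in sample) / n - grand[1]
    det = S[0][0] * S[1][1] - S[0][1] ** 2
    # Quadratic form written out with the explicit inverse of a 2x2 matrix.
    quad = (S[1][1] * d1 * d1 - 2 * S[0][1] * d1 * d2 + S[0][0] * d2 * d2) / det
    return n * quad

def eigenvalues_2x2(S):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] ** 2
    root = math.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root

# Hypothetical preliminary data: m = 3 samples of size n = 3.
samples = [
    [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)],
    [(2.0, 2.5), (1.5, 3.5), (2.5, 3.0)],
    [(1.0, 3.0), (3.0, 4.0), (2.0, 2.0)],
]
grand, S = pooled_estimates(samples)
t2_values = [hotelling_t2(s, grand, S) for s in samples]
lam1, lam2 = eigenvalues_2x2(S)
```

Each value in `t2_values` would be plotted against the chosen UCL, and `lam1` and `lam2` would be multiplied by the tabulated $T^2$ quantile to obtain the squared half-lengths of the ellipse axes.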
We can generalize this concept to $p$ quality variables $\mathbf{X}^T = (X_1, \ldots, X_p)$. A principal axis transformation transforms $\mathbf{X}$ into $p$ uncorrelated variables $\mathbf{Y}^T = (Y_1, \ldots, Y_p)$. The coordinate axes of these new variables are described by the vectors $\mathbf{u}_1, \ldots, \mathbf{u}_p$, which make up the columns of the orthonormal matrix $\mathbf{U}$. The columns of $\mathbf{U}$ are the eigenvectors of $\mathbf{S}$. The transformation that yields $\mathbf{Y}$ is given by

$$\mathbf{Y} = \mathbf{U}^T(\mathbf{X} - \bar{\mathbf{X}})$$

where $\mathbf{X}$ and $\bar{\mathbf{X}}$ are the $p \times 1$ vectors of the original quality variables and their sample means. If $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{Y} \sim N_p(\mathbf{0}, \boldsymbol{\Lambda})$, where $\mathbf{0}$ is the $p \times 1$ vector of zeros and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\mathbf{S}$. It can be shown that the determinant of $\mathbf{S}$, $|\mathbf{S}|$, equals the determinant of $\boldsymbol{\Lambda}$, $|\boldsymbol{\Lambda}|$. Similarly, it can be shown that the trace of $\mathbf{S}$, $\mathrm{tr}(\mathbf{S})$, equals the trace of $\boldsymbol{\Lambda}$, $\mathrm{tr}(\boldsymbol{\Lambda})$. Therefore, the percentage of total variability associated with each principal component is given by $(\lambda_i / \mathrm{tr}(\mathbf{S})) \times 100$ for $i = 1, \ldots, p$.

Two alternative and sometimes more desirable ways of scaling the principal components are by using the following transformations:

$$\mathbf{Y}^{*} = \boldsymbol{\Lambda}^{1/2}\mathbf{U}^T(\mathbf{X} - \bar{\mathbf{X}}) \quad \text{and} \quad \mathbf{Y}^{**} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T(\mathbf{X} - \bar{\mathbf{X}}).$$

The $p \times 1$ vector $\mathbf{Y}^{*}$ has mean $\mathbf{0}$ and variance-covariance matrix equal to $\boldsymbol{\Lambda}^2$. On the other hand, the $p \times 1$ vector $\mathbf{Y}^{**}$ has mean $\mathbf{0}$ and variance-covariance matrix equal to $\mathbf{I}$, the identity matrix of order $p$. The vector $\mathbf{Y}^{**}$ is preferred in the multivariate quality control chart setting since it has an identity covariance matrix, so that each component of $\mathbf{Y}^{**}$ has unit variance. It can be shown that $T^2 = \mathbf{Y}^{**T}\mathbf{Y}^{**}$ (see Jackson, 1959). We can, therefore, plot $\mathbf{Y}^{**T}\mathbf{Y}^{**}$ on the $T^2$ control chart to monitor the process mean.

2.2 The Cumulative Sum Chart (CUSUM)

The cumulative sum (CUSUM) control chart, first proposed by Page (1954), is a powerful and popular method for monitoring industrial processes. Numerous studies on the CUSUM chart's performance and properties have been conducted; see Montgomery (1991) for a list of references on the subject.
The CUSUM can be viewed as a sequence of sequential probability ratio tests (SPRT), which are applications of the generalized likelihood ratio test that we describe below.

Consider testing $H_0\colon \theta = \theta_0$ versus $H_a\colon \theta = \theta_1$, where $\theta$ is the parameter of interest. The parameter values $\theta_0$ and $\theta_1$ can be looked upon as the in-control and the out-of-control values of the process. Suppose that we begin observing the process at time 0 and that we are able to make a decision whenever some value $\hat{\theta}_i$, $i = 1, 2, \ldots$, suggests there is a problem. Assume that $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n, \ldots$ are appropriate functions of sufficient statistics and consider a procedure that is based on the SPRT. Let the likelihood ratio at time $i$ be given by

$$z_i = \mathrm{Ln}\,\frac{f(x_i \mid \theta_1)}{f(x_i \mid \theta_0)}$$

where the $x_i$'s are random variables with a probability density function given by $f$, and Ln is the natural log function. For independent observations the log-likelihood statistic at the $n$th step is given by

$$Z_n = \sum_{i=1}^{n} z_i$$

and we conclude that the process is out-of-control whenever $Z_n \geq a$, where $a$ is some constant. Similarly, we can conclude that the process is in-control whenever $Z_n < b$, where $b$ is another constant. For a SPRT, $b$ is typically set at 0 since $b < 0$ slows the procedure's ability to detect an out-of-control state. Therefore, the CUSUM is a sequence of independent SPRTs with the following decision rules:

1. Signal an out-of-control state whenever $Z_n \geq a$.
2. Restart the CUSUM whenever $Z_n \leq 0$.
3. Continue the current CUSUM whenever $0 < Z_n < a$.

To illustrate the basic CUSUM procedure, consider a sequence of individual observations from a normal distribution with mean $\mu$ and variance $\sigma^2$. Assume that $\sigma^2$ is known and that $\mu_0$ and $\mu_1$ are the in-control and out-of-control values of the process mean $\mu$.
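The likelihood ratio at time $i$ is given by

$$z_i = \mathrm{Ln}\,\frac{f(x_i \mid \mu_1, \sigma^2)}{f(x_i \mid \mu_0, \sigma^2)} = \mathrm{Ln}\,\frac{\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_1)^2\right)}{\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_0)^2\right)} = \frac{(\mu_1 - \mu_0)x_i}{\sigma^2} - \frac{\mu_1^2 - \mu_0^2}{2\sigma^2}.$$

Therefore, the critical inequality is given by

$$0 < \sum_{i=1}^{n}\left[\frac{(\mu_1 - \mu_0)x_i}{\sigma^2} - \frac{\mu_1^2 - \mu_0^2}{2\sigma^2}\right] < a.$$

For $\mu_1 = \mu_0 + \delta\sigma$, where $\delta$ is the magnitude of the shift, each term of the sum equals $\delta\left[(x_i - \mu_0)/\sigma - \delta/2\right]$, so the critical inequality is given by

$$0 < \sum_{i=1}^{n}\left[\frac{x_i - \mu_0}{\sigma} - \frac{\delta}{2}\right] < \frac{a}{\delta}$$

and if we let

$$S_{n-1} = \sum_{i=1}^{n-1}\left[\frac{x_i - \mu_0}{\sigma} - \frac{\delta}{2}\right]$$

then the critical inequality becomes

$$0 < S_{n-1} + \frac{x_n - \mu_0}{\sigma} - \frac{\delta}{2} < \frac{a}{\delta}.$$

This critical inequality can be written as $0 < S_{n-1} + (x_n^{*} - d) < h$, where $x_n^{*} = (x_n - \mu_0)/\sigma$, $d = \delta/2$, and $h = a/\delta$. With the restart rule at zero, the CUSUM statistic is $S_n = \max[0, S_{n-1} + (x_n^{*} - d)]$, and the procedure is:

1. Let $S_0$ be the initial value for the CUSUM statistic.
2. Signal a possible out-of-control state whenever $S_n > h$.

In the previous development we assumed that $\mu_1 > \mu_0$. For the case when $\mu_1 < \mu_0$ (so that $\delta$, and hence $d$, is negative and $h = a/|\delta|$), the CUSUM statistic is $S_n = \min[0, S_{n-1} + (x_n^{*} - d)]$ and we signal an out-of-control state whenever $S_n < -h$. The basic parameters of the CUSUM are, therefore, $d$ and $h$. For monitoring a normal mean the typical value of $d$ for a one $\sigma$ shift in $\mu$ is 0.50.

Woodall and Ncube (1985) extended the univariate CUSUM scheme to the multivariate setting. They described a method for monitoring a $p$-dimensional multivariate normal process by using $p$ two-sided univariate CUSUM charts. Each quality characteristic is controlled by operating a two-sided univariate CUSUM chart. That is, the $j$th two-sided univariate CUSUM is operated by forming the cumulative sums

$$S_{j,t} = \max(0,\, S_{j,t-1} + \bar{x}_{jt} - k_j), \qquad T_{j,t} = \min(0,\, T_{j,t-1} + \bar{x}_{jt} + k_j)$$

where $S_{j,0} \geq 0$, $T_{j,0} \leq 0$, $k_j > 0$, and $\bar{x}_{jt}$ is the sample mean at time $t$ for quality characteristic $j$. The $j$th two-sided chart signals that the corresponding process mean has shifted when either $S_{j,t} > h_j$ or $T_{j,t} < -h_j$ for some CUSUM chart parameters $k_j$ and $h_j$. The process is declared out-of-control whenever any of the $p$ two-sided charts signals. A disadvantage of Woodall and Ncube's multivariate CUSUM chart is that its performance depends on the direction of the shift of the process mean.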
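The univariate recursion above can be sketched in a few lines of Python. This is an illustrative sketch, not the dissertation's own implementation: the function name `cusum_first_signal` and the default decision interval `h=5.0` are assumptions, while `d=0.5` is the typical value quoted in the text for a one $\sigma$ shift. The lower side is written with $+d$ for $d > 0$, the same convention as the $T_{j,t}$ recursion of Woodall and Ncube; this agrees with the form $x^{*} - d$ when $d = \delta/2$ is negative for a downward shift.

```python
def cusum_first_signal(xs, mu0, sigma, d=0.5, h=5.0):
    """Two-sided CUSUM on standardized observations.

    Returns the 1-based index of the first out-of-control signal,
    or None if the chart never signals. h is an assumed default.
    """
    upper = 0.0  # accumulates evidence of an upward shift
    lower = 0.0  # accumulates evidence of a downward shift
    for n, x in enumerate(xs, start=1):
        z = (x - mu0) / sigma            # standardized observation x*
        upper = max(0.0, upper + z - d)  # restart at zero when the sum drops to 0
        lower = min(0.0, lower + z + d)
        if upper > h or lower < -h:
            return n
    return None

# Ten in-control observations followed by a sustained two-sigma upward shift.
observations = [0.0] * 10 + [2.0] * 10
first_signal = cusum_first_signal(observations, mu0=0.0, sigma=1.0, d=0.5, h=4.0)
```

Running one such function per quality characteristic, each with its own $k_j$ and $h_j$, and declaring the process out-of-control when any of them signals gives the structure of the Woodall and Ncube scheme.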
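Healy (1987) showed that a univariate CUSUM chart which is based on a linear combination of the quality characteristics can be used in the multivariate case. The performance of Healy's multivariate CUSUM procedure also depends on the direction of the shift of the process mean. This procedure is based on the theory of sequential probability ratio tests and is explained below.

Assume that $\mathbf{X}_1, \ldots, \mathbf{X}_n, \ldots$ are distributed as $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu} = \boldsymbol{\mu}_1$ (the process mean in the in-control state) or $\boldsymbol{\mu} = \boldsymbol{\mu}_2$ (the process mean in the out-of-control state), and the variance-covariance matrix $\boldsymbol{\Sigma}$ is assumed to be known. The likelihood ratio at time $i$ is given by

$$\begin{aligned} \frac{f_2(\mathbf{X}_i)}{f_1(\mathbf{X}_i)} &= \frac{c\,\exp\left(-0.5(\mathbf{X}_i - \boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}_i - \boldsymbol{\mu}_2)\right)}{c\,\exp\left(-0.5(\mathbf{X}_i - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}_i - \boldsymbol{\mu}_1)\right)} \\ &= \exp\left((\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1)\right) \\ &= \exp\left((\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)\right) \end{aligned} \tag{2.4}$$

where $c = (2\pi)^{-p/2}|\boldsymbol{\Sigma}|^{-1/2}$. The log of the likelihood ratio at time $i$ is given by

$$\log\frac{f_2(\mathbf{X}_i)}{f_1(\mathbf{X}_i)} = (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)$$

and if we let

$$z_i = \log\frac{f_2(\mathbf{X}_i)}{f_1(\mathbf{X}_i)}$$

then, since the observation vectors are considered independent, the log-likelihood statistic at the $n$th step is given by $Z_n = \sum_{i=1}^{n} z_i$. We conclude that the process is out-of-control whenever $Z_n > L$, where $L$ is a constant that achieves a pre-specified in-control average run length. The critical inequality is, therefore, given by

$$0 < \sum_{i=1}^{n}\left[(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)\right] < L.$$

Let

$$\gamma = \sqrt{(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)}$$

represent the Mahalanobis distance between $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, and divide the critical inequality throughout by $\gamma$ to get

$$0 < \sum_{i=1}^{n}\left[\mathbf{a}^T\mathbf{X}_i - K\right] < H$$

where

$$\mathbf{a}^T = \gamma^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}, \qquad K = 0.5\,\gamma^{-1}(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1), \qquad H = \frac{L}{\gamma}.$$

Note that $\mathbf{a}^T(\mathbf{X}_n - \boldsymbol{\mu}_1)$ is distributed as $N(0, 1)$ in the in-control state and is distributed as $N(\gamma, 1)$ in the out-of-control state. We can rewrite the critical inequality as

$$0 < S_{n-1} + \mathbf{a}^T\mathbf{X}_n - K < H$$

by letting

$$S_{n-1} = \sum_{i=1}^{n-1}(\mathbf{a}^T\mathbf{X}_i - K).$$

The multivariate CUSUM procedure can, therefore, be defined as follows: signal an out-of-control state whenever

$$S_n = \max(S_{n-1} + \mathbf{a}^T\mathbf{X}_n - K,\, 0) > H.$$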
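The quantities $\mathbf{a}$, $K$, and $\gamma$ and the recursion for $S_n$ translate directly into code. The sketch below is illustrative only: the function name `healy_cusum_first_signal` is hypothetical, the inverse covariance matrix is assumed to be supplied by the caller as a list of rows, and the value of `H` used in the example is arbitrary rather than calibrated to any in-control average run length.

```python
import math

def healy_cusum_first_signal(observations, mu1, mu2, sigma_inv, H):
    """One-sided CUSUM on the linear combination a^T X.

    observations: iterable of length-p sequences
    mu1, mu2:     in-control and out-of-control mean vectors
    sigma_inv:    inverse of the known covariance matrix (assumed given)
    Returns the 1-based index of the first signal, or None.
    """
    p = len(mu1)
    delta = [mu2[j] - mu1[j] for j in range(p)]
    # w = Sigma^{-1} (mu2 - mu1)
    w = [sum(sigma_inv[j][k] * delta[k] for k in range(p)) for j in range(p)]
    # gamma is the Mahalanobis distance between mu1 and mu2.
    gamma = math.sqrt(sum(delta[j] * w[j] for j in range(p)))
    a = [wj / gamma for wj in w]
    K = 0.5 * sum(a[j] * (mu2[j] + mu1[j]) for j in range(p))
    s = 0.0
    for n, x in enumerate(observations, start=1):
        s = max(s + sum(a[j] * x[j] for j in range(p)) - K, 0.0)
        if s > H:
            return n
    return None

identity = [[1.0, 0.0], [0.0, 1.0]]
# Shift exactly in the anticipated direction mu1 -> mu2.
along = healy_cusum_first_signal([[1.0, 0.0]] * 10, [0.0, 0.0], [1.0, 0.0], identity, H=2.0)
# Large shift orthogonal to the anticipated direction.
across = healy_cusum_first_signal([[0.0, 5.0]] * 10, [0.0, 0.0], [1.0, 0.0], identity, H=2.0)
```

In this example the shift along the anticipated direction is detected, whereas the orthogonal shift is never detected because $\mathbf{a}^T\mathbf{X}$ does not change. This illustrates the dependence of the procedure on the direction of the shift.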
Therefore, for detecting a shift in the mean of a multivariate normal random variable, the CUSUM procedure reduces to a form of the univariate normal CUSUM procedure. The above formulation of the multivariate CUSUM procedure considers shifts in the mean vector in the direction $\boldsymbol{\mu}_1$ to $\boldsymbol{\mu}_2 = \boldsymbol{\mu}_1 + \boldsymbol{\delta}$, where $\boldsymbol{\delta} = \boldsymbol{\mu}_2 - \boldsymbol{\mu}_1$. If shifts along the line connecting $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, but in the direction away from $\boldsymbol{\mu}_2$, are also considered, then another one-sided CUSUM can be developed and combined with the first one-sided CUSUM to form an ordinary two-sided CUSUM.

The preceding derivation was modeled after the derivation of discriminant analysis by Anderson (1984), which we briefly discuss now. Let $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ be the probability density functions associated with the $p \times 1$ random vector $\mathbf{X}$ as belonging to the populations $\pi_1$ and $\pi_2$, respectively. An object $\mathbf{X}$ with associated sample space $\Omega$ must be assigned to either $\pi_1$ or $\pi_2$. Let $R_1$ be the set of $\mathbf{X}$ values for which we classify objects as $\pi_1$, and let $R_2 = \Omega - R_1$ be the remaining $\mathbf{X}$ values for which we classify objects as $\pi_2$. Assume that $\pi_2$ has the multivariate normal distribution with mean $\boldsymbol{\mu}_2$ and variance-covariance matrix $\boldsymbol{\Sigma}$, whereas $\pi_1$ has the multivariate normal distribution with mean $\boldsymbol{\mu}_1$ and variance-covariance matrix $\boldsymbol{\Sigma}$. It can then be shown that a vector of observations $\mathbf{X}$ is classified according to whether the log of the likelihood ratio $f_2(\mathbf{X})/f_1(\mathbf{X})$ exceeds a threshold determined by the ratio of the misclassification costs $c(2|1)$ and $c(1|2)$ and the ratio of $p_1$ and $p_2$, where $p_1$ and $p_2$ are the prior probabilities of $\pi_1$ and $\pi_2$. Note that $c(2|1)$ and $c(1|2)$ are the costs of misclassifying the observation vector $\mathbf{X}$ as $\pi_2$ when it should be classified as $\pi_1$, and vice versa. In the multivariate CUSUM setting, we assume that $p_1 = p_2$ and that $\pi_1$ and $\pi_2$ are the distributions of $\mathbf{X}$ in the in-control and the out-of-control states, respectively. The ratio between the two costs is equivalent to selecting a cut-off value to achieve a pre-specified in-control average run length.

Crosier (1988) proposed two multivariate CUSUM procedures.
The first procedure is based on accumulating successive values of $T$, the square root of Hotelling's $T^2$. This multivariate CUSUM is given by

$$S_n = \max(0,\, S_{n-1} + T_n - k_1),$$

where $S_0 > 0$ and $k_1 > 0$. This procedure signals that the process is out-of-control whenever $S_n > h_1$, where $h_1 > 0$ is chosen to achieve a pre-specified in-control average run length. Crosier's second chart is based on accumulating the observation vectors using the statistic

$$C_n = \left((\mathbf{S}_{n-1} + \mathbf{X}_n)^T\boldsymbol{\Sigma}^{-1}(\mathbf{S}_{n-1} + \mathbf{X}_n)\right)^{1/2}$$

and $\mathbf{S}_n = \mathbf{0}$ if $C_n \leq$ [...]

Pignatiello and Runger (1990) also proposed two multivariate CUSUM charts. The first procedure is based on accumulating successive values of Hotelling's $T^2$. This multivariate CUSUM scheme is given by

$$S_n = \max(0,\, S_{n-1} + T_n^2 - k_3),$$

where $S_0 > 0$ and $k_3 > 0$. This procedure signals that the process is out-of-control whenever $S_n > h_3$, where $h_3 > 0$ is chosen to achieve a pre-specified in-control average run length. Pignatiello and Runger's second chart is based on the following vectors of cumulative sums:

$$\mathbf{D}_i = \sum_{j=i-l_i+1}^{i} \mathbf{X}_j$$

and

$$MC_i = \max\left\{0,\, (\mathbf{D}_i^T\boldsymbol{\Sigma}^{-1}\mathbf{D}_i)^{1/2} - k_4 l_i\right\}, \tag{2.5}$$

where $k_4 > 0$ and

$$l_i = \begin{cases} l_{i-1} + 1, & \text{if } MC_{i-1} > 0 \\ 1, & \text{otherwise,} \end{cases} \qquad i = 1, 2, 3, \ldots$$

The process is declared out-of-control whenever $MC_i > h_4$, where $h_4 > 0$ is chosen to achieve a pre-specified in-control average run length. As was the case with Crosier's (1988) procedure, Pignatiello and Runger's vector-valued multivariate CUSUM is more sensitive to shifts in the process mean than the multivariate CUSUM procedure that is based on accumulating successive values of $T^2$.

2.3 The Exponentially Weighted Moving Average Chart (EWMA)

The univariate EWMA chart, introduced by Roberts (1959), is another alternative to the Shewhart chart when small shifts in the process mean are of interest. The performance of the EWMA is similar to that of the CUSUM in the sense that both charts are able to quickly detect small shifts in the process mean.
The Shewhart, CUSUM, and EWMA control charts differ in how each chart uses the data generated by the production process (Hunter, 1986). An out-of-control signal from the Shewhart chart depends entirely on the most recently plotted point. That is, the weight ($w_t$) given to the most recently plotted point is $w_t = 1$, and the weights given to all previous points are $w_{t-k} = 0$ for $k \ge 1$. The Shewhart chart thus ignores all information in the past data and is, therefore, insensitive to small shifts in the process. The out-of-control signal from the CUSUM chart depends upon the sum $S_T = \sum_{t=1}^{T} d_t$, where $d_t$ is the deviation of an observation $X_t$ from the target mean $\mu$. Thus, all the observations in the CUSUM are weighted equally, since all the $d_t$'s in the sum $S_T$ receive equal weight. On the other hand, the EWMA chart is based on a statistic that gives less and less weight to data as they get older. The performance of the EWMA chart, therefore, depends on the weighting constant. A smaller weighting constant leads to an EWMA that is more sensitive to smaller shifts in the process mean, whereas a large weighting constant leads to an EWMA that is more sensitive to larger shifts.

The univariate EWMA statistic is defined as

$Z_t = \omega \bar{X}_t + (1 - \omega) Z_{t-1}$,

where $0 < \omega < 1$ is the weighting constant, $\bar{X}_t$ is the sample mean at time $t$, and the starting value for the first sample ($t = 1$) is $Z_0 = \bar{\bar{X}}$, where $\bar{\bar{X}}$ is the average of the sample means from $m$ preliminary samples taken when the process is assumed to be in-control. Note that $Z_{t-1} = \omega \bar{X}_{t-1} + (1 - \omega) Z_{t-2}$, and we can substitute this in the formula for $Z_t$ to get

$Z_t = \omega \bar{X}_t + \omega (1 - \omega) \bar{X}_{t-1} + (1 - \omega)^2 Z_{t-2}$.

If we continue substituting recursively for $Z_{t-j}$ ($j = 2, 3, \ldots, t$) we get

$Z_t = \omega \sum_{j=0}^{t-1} (1 - \omega)^j \bar{X}_{t-j} + (1 - \omega)^t Z_0$.

It is clear that the weights $\omega (1 - \omega)^j$ decrease in value with the age of the sample mean, that is, as the value of $j$ increases.
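The recursion and its expanded form can be checked numerically. The following is a minimal Python sketch, not code from this study: the function names and the sample means are hypothetical and chosen only for illustration.

```python
def ewma(means, omega, z0):
    """Return Z_1, ..., Z_t from Z_t = omega * xbar_t + (1 - omega) * Z_{t-1}."""
    z, path = z0, []
    for xbar in means:
        z = omega * xbar + (1.0 - omega) * z
        path.append(z)
    return path


def ewma_expanded(means, omega, z0):
    """Return Z_t from the expanded sum over all past sample means."""
    t = len(means)
    # means[t - 1 - j] is the sample mean observed j periods before time t.
    weighted = sum(omega * (1.0 - omega) ** j * means[t - 1 - j] for j in range(t))
    return weighted + (1.0 - omega) ** t * z0


def ewma_weights(omega, count):
    """Weights omega * (1 - omega)^j for ages j = 0, ..., count - 1."""
    return [omega * (1.0 - omega) ** j for j in range(count)]


# Hypothetical sample means, used only to compare the two forms.
sample_means = [10.2, 9.7, 10.5, 10.1, 9.9]
recursive_value = ewma(sample_means, 0.2, 10.0)[-1]
expanded_value = ewma_expanded(sample_means, 0.2, 10.0)
```

Both forms should agree up to floating point rounding, and the list returned by `ewma_weights` decreases geometrically with the age index.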
For example, if the most recent sample mean has been assigned a weight of 0.20, then the sample mean at time $t - 1$ gets a weight of 0.16, the sample mean at time $t - 2$ gets a weight of 0.128, and so on. The values of the weights, therefore, decrease geometrically with age. If the sample means $\bar{X}_t$ are independent random variables with variance $\sigma^2 / n$, then it can be shown that the variance of $Z_t$ is

$\sigma^2_{Z_t} = \frac{\sigma^2}{n} \cdot \frac{\omega}{2 - \omega} \left[ 1 - (1 - \omega)^{2t} \right]$

with limiting value $\frac{\sigma^2}{n} \cdot \frac{\omega}{2 - \omega}$. We can, therefore, form the EWMA chart control limits as follows:

LCL $= \bar{\bar{X}} - k \sigma \sqrt{\omega / [(2 - \omega) n]}$, CL $= \bar{\bar{X}}$, and UCL $= \bar{\bar{X}} + k \sigma \sqrt{\omega / [(2 - \omega) n]}$,

where $k > 0$. The control limits in this case are based on the asymptotic variance of $Z_t$. Control limits that are based on the exact variance of $Z_t$ lead to a natural fast initial response EWMA control chart, where initial out-of-control states are detected quickly. However, in reality we would expect the process to be in-control at the start-up stage and then drift out-of-control and, therefore, we have used the asymptotic variance in the construction of the control chart. The univariate EWMA chart is a plot of $Z_t$ against time $t$, and the process is declared out-of-control whenever $Z_t$ falls above UCL or below LCL.

The design parameters of the EWMA chart are $k$, the multiple of $\sigma$ used in the control limits, and $\omega$. The parameter values are usually chosen to achieve pre-specified in-control and out-of-control average run lengths (ARL). Theoretical studies of the average run length properties of the EWMA chart have been conducted by Crowder (1989) and Lucas and Saccucci (1990). These studies provided average run length tables for a range of values of $\omega$ and $k$. An optimal design strategy would involve specifying the desired in-control and out-of-control average run lengths and the magnitude of the process shift that needs to be detected. Once these quantities are specified, the appropriate values of $\omega$ and $k$ are selected.
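The variance expression and the resulting control limits can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program used in this study: the function names are hypothetical, $\sigma$ is treated as known, and the limits use the standard deviation of $Z_t$ implied by the variance of $Z_t$ given above.

```python
import math


def ewma_variance(sigma, n, omega, t=None):
    """Variance of Z_t; the asymptotic value is returned when t is None."""
    limit = (sigma ** 2 / n) * omega / (2.0 - omega)
    if t is None:
        return limit
    return limit * (1.0 - (1.0 - omega) ** (2 * t))


def ewma_limits(center, sigma, n, omega, k=3.0, t=None):
    """Return (LCL, CL, UCL) as center -/+ k standard deviations of Z_t."""
    half_width = k * math.sqrt(ewma_variance(sigma, n, omega, t))
    return center - half_width, center, center + half_width


def first_ewma_signal(means, center, sigma, n, omega, k=3.0):
    """Return the first time t at which Z_t leaves the asymptotic limits, or None."""
    lcl, _, ucl = ewma_limits(center, sigma, n, omega, k)
    z = center  # Z_0 is set to the center line
    for t, xbar in enumerate(means, start=1):
        z = omega * xbar + (1.0 - omega) * z
        if z > ucl or z < lcl:
            return t
    return None
```

Passing a value of `t` to `ewma_limits` gives the narrower time-varying limits based on the exact variance, which corresponds to the fast initial response variant mentioned above.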
In general, small values of $\omega$ are chosen to detect small shifts in the process mean. The control limits are usually set at the standard ($k = 3$) $3\sigma$ limits.

Lowry et al. (1992) extended the univariate EWMA to the multivariate setting (MEWMA) by defining vectors of EWMA's as follows:

$Z_t = \Lambda X_t + (I - \Lambda) Z_{t-1}$, for $t = 1, 2, \ldots$,

with $Z_0 = 0$ and $\Lambda = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_p)$, $0 < \omega_j < 1$, $j = 1, 2, \ldots, p$, and where $I$ is the $p$-dimensional identity matrix. The random vectors $X_t$ are i.i.d. $N_p(0, \Sigma)$ for $t = 1, 2, \ldots$. The MEWMA chart gives an out-of-control signal as soon as

$T_t^2 = Z_t^T \Sigma_{Z_t}^{-1} Z_t > H$, (2.6)

where $H$ is chosen to achieve a pre-specified in-control average run length and $\Sigma_{Z_t}$ is the covariance matrix of $Z_t$. If there is no a priori reason to weight past observations differently for the $p$ quality characteristics being monitored, then equal weights are assigned, i.e., $\omega_1 = \omega_2 = \cdots = \omega_p = \omega$. In this case the MEWMA can be written as $Z_t = \omega X_t + (1 - \omega) Z_{t-1}$. It can be shown that the covariance matrix of $Z_t$ in this case is given by

$\Sigma_{Z_t} = \frac{\omega}{2 - \omega} \left[ 1 - (1 - \omega)^{2t} \right] \Sigma$

with asymptotic covariance matrix $\frac{\omega}{2 - \omega} \Sigma$. The MEWMA with the exact variance-covariance matrix leads to a natural fast initial response chart. Thus, initial out-of-control states are detected more quickly. However, it is more likely that the process will start up and remain in-control for a while and then shift out-of-control. Therefore, in practice the asymptotic variance-covariance matrix is used to calculate the MEWMA statistic.

Simulation results given in Lowry et al. (1992) indicate that smaller values of $\omega$ are more effective in detecting small shifts in the process mean. Their results also suggested that the performance of the MEWMA chart with $\omega = 0.10$ compares favorably with that of the multivariate CUSUM charts that have been proposed. However, the MEWMA is slow in detecting large shifts in the process mean. Therefore, it is recommended that the $T^2$ chart be used in conjunction with the MEWMA.
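The equal-weight MEWMA statistic with the asymptotic covariance matrix can be sketched for the bivariate case as follows. This is a minimal Python illustration, not the simulation code of this study: the function names are hypothetical, $\Sigma$ is assumed known, and the cut-off `big_h` must be supplied by the user because no value of $H$ is given here.

```python
def mewma_t2(observations, omega, sigma):
    """T^2 values for bivariate data with equal weights and Z_0 = 0.

    sigma is the known 2 x 2 covariance matrix ((a, b), (c, d)); the
    asymptotic covariance omega / (2 - omega) * sigma is used for Z_t.
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    scale = omega / (2.0 - omega)
    # Inverse of scale * sigma, written out for the 2 x 2 case.
    inv = ((d / (det * scale), -b / (det * scale)),
           (-c / (det * scale), a / (det * scale)))
    z1 = z2 = 0.0
    stats = []
    for x1, x2 in observations:
        z1 = omega * x1 + (1.0 - omega) * z1
        z2 = omega * x2 + (1.0 - omega) * z2
        t2 = (z1 * (inv[0][0] * z1 + inv[0][1] * z2)
              + z2 * (inv[1][0] * z1 + inv[1][1] * z2))
        stats.append(t2)
    return stats


def first_mewma_signal(observations, omega, sigma, big_h):
    """Return the first time t with T_t^2 > big_h, or None if there is no signal."""
    for t, t2 in enumerate(mewma_t2(observations, omega, sigma), start=1):
        if t2 > big_h:
            return t
    return None
```

In practice `big_h` would be chosen, for example by simulation, so that the chart attains the desired in-control average run length.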
However, in this case there is a trade-off between protection against large shifts and the quick detection of small shifts in the process mean. This is because the control limits of the MEWMA must be increased slightly to maintain the desired in-control ARL.

The EWMA chart can also be used as a process forecasting device. The EWMA provides a forecast of where the process mean will be at the next time period. In order to see this in the univariate case, we first write the EWMA at time $t$ as

EWMA $= Z_t = Z_{t-1} + \omega e_{t-1} = Z_{t-1} + \omega (\bar{x}_{t-1} - Z_{t-1})$,

where $Z_t$ is the predicted value of the process mean at time $t$ (the new EWMA), $\bar{x}_{t-1}$ is the estimate of the process mean at time $t - 1$, $Z_{t-1}$ is the predicted value of the process mean at time $t - 1$ (the old EWMA), $e_{t-1} = \bar{x}_{t-1} - Z_{t-1}$ is the observed error at time $t - 1$, and $\omega$ is a constant ($0 < \omega < 1$) that determines the depth of memory of the EWMA. We assume that the random error $e_t$ is normally distributed with zero mean and variance $\sigma_e^2$. Therefore, $Z_t$ is actually a forecast of the value of the process mean $\mu$ at time $t$, and it can be used as the basis for a dynamic process control algorithm. If the forecast of the mean differs from the target by a critical amount, then either the operator or some electro-mechanical control system can make the necessary process adjustment. The control limits on the EWMA chart can be used to signal when an adjustment is necessary, and the difference between the target and the forecast of the mean can be used to determine how much adjustment is necessary.

2.4 A Review of Liu (1995)

Liu (1995) introduced three non-parametric multivariate control charts: the "r" chart, the "Q" chart, and the "S" chart. These charts are based on the concept of data depth and do not require any assumptions to be made about the underlying distribution of the process.
The main idea behind the construction of these charts is the reduction of each multivariate observation, $X^T = (X_1, \ldots, X_p)$, to a univariate index, namely a ranking that is based on the notion of data depth. These ranks are then used to construct the multivariate control charts. We present a brief summary of the three non-parametric multivariate control charts in this section. A detailed discussion along with simulation results is given in Chapter 4.

For any point $X \in \mathbb{R}^p$, the simplicial depth of $X$ with respect to a distribution $G$ is given by

$D_G(X) = P_G(X \in s[X_1, \ldots, X_{p+1}])$,

where $s[X_1, \ldots, X_{p+1}]$ is a simplex whose vertices $X_1, \ldots, X_{p+1}$ are $p + 1$ random observations from $G$. The quantity $D_G(X)$ is a measure of how "deep" or how "central" the point $X$ is with respect to the distribution $G$. Most often, $G$ is unknown and only a sample $X_1, \ldots, X_m$ is available. The empirical depth of the point $X$ with respect to the data cloud $X_1, \ldots, X_m$ is given by

$D_{G_m}(X) = \binom{m}{p+1}^{-1} \sum I(X \in s[X_{i_1}, \ldots, X_{i_{p+1}}])$,

where the sum runs over all subsets of $p + 1$ points of the sample, $G_m$ is the empirical distribution of $X_1, \ldots, X_m$, and $I$ is the indicator function, equal to one if $X \in s[\cdot]$ and equal to zero otherwise. The quantity $D_{G_m}(X)$ measures how "deep" the point $X$ is within the data cloud $X_1, \ldots, X_m$.

In the multivariate control chart setting, the sample $X_1, \ldots, X_m$ is considered to be the base period sample and the point $X$ is considered to be an observation from the control period. The base period sample is assumed to come from a distribution $G$, while the control period sample is assumed to come from a distribution $F$. If the process is in-control then $G = F$; otherwise $G \ne F$. We will assume that both $G$ and $F$ are unknown.

We will now briefly discuss the three data depth multivariate control charts. First, consider taking a base period sample, $X_1, \ldots, X_m$, when the process is assumed to be in-control. Next, for each observation, $X$, in the control period consider the following test statistic:

$r_{G_m}(X) = \frac{1}{m} \sum_{j=1}^{m} I(D_{G_m}(X_j) \le D_{G_m}(X))$,

where $I$ is the indicator function, equal to one if the data depth of $X_j$ is less than or equal to the data depth of $X$ and equal to zero otherwise. The quantity $r_{G_m}(X)$ measures how outlying the point $X$ is with respect to the data cloud $X_1, \ldots, X_m$. A small value of $r_{G_m}(X)$ indicates that only a small fraction of the $X_j$'s are more outlying than the point $X$. This would indicate that the point $X$ is at the outskirts or on the boundary of the data cloud $X_1, \ldots, X_m$. The three non-parametric multivariate control charts are based on the statistic $r_{G_m}(X)$.

The "r" chart is constructed by first taking a base period of $m$ observations $X_1, \ldots, X_m$. Next, for each observation, $X_t^*$, in the control period, the statistic $r_{G_m}(X_t^*)$ is computed. The "r" chart is a plot of $r_{G_m}(X_t^*)$ versus time $t = 1, 2, \ldots$. The center line (CL) is set at 0.50 and the lower control limit (LCL) is set at $\alpha$. These control limits are based on the asymptotic distribution of $r_{G_m}(X_t^*)$, which is the uniform distribution between 0 and 1, $U[0, 1]$. We claim that the process is out-of-control whenever the value of $r_{G_m}(X_t^*)$ for a point $X_t^*$ is below LCL.

The "Q" chart is constructed by first taking a base period of $m$ observations $X_1, \ldots, X_m$. Next consider taking samples of size $n$ ($X_i^*$, $i = 1, \ldots, n$) in the control period. For each observation, $X_i^*$, in the control period sample, the statistic $r_{G_m}(X_i^*)$ is computed. The "Q" chart is based on the average of the $r_{G_m}(X_i^*)$ taken over the control period sample. That is, for each sample of size $n$ in the control period, we compute the statistic

$Q_t(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i^*)$,

where $G_m$ and $F_n$ are the empirical distributions of $X_1, \ldots, X_m$ and $X_1^*, \ldots, X_n^*$, respectively, and $t = 1, 2, \ldots$.
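For bivariate data ($p = 2$) the simplex is a triangle, so the empirical depth, the $r$ statistic, and the $Q$ statistic can be computed by brute force. The Python sketch below is only an illustration under that assumption: the function names are hypothetical, triangles are treated as closed sets, and enumerating all triples is practical only for small base period samples.

```python
from itertools import combinations


def _side(a, b, p):
    """Signed area test: positive if p lies to the left of the line from a to b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def in_triangle(p, a, b, c):
    """True if p lies in the closed triangle with vertices a, b, c."""
    d1, d2, d3 = _side(a, b, p), _side(b, c, p), _side(c, a, p)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_negative and has_positive)


def simplicial_depth(x, base):
    """Fraction of triangles formed from base points that contain x."""
    triples = list(combinations(base, 3))
    return sum(in_triangle(x, *tri) for tri in triples) / len(triples)


def r_statistic(x, base):
    """Fraction of base points whose depth does not exceed the depth of x."""
    depth_x = simplicial_depth(x, base)
    return sum(simplicial_depth(xj, base) <= depth_x for xj in base) / len(base)


def q_statistic(sample, base):
    """Average of the r statistic over a control period sample."""
    return sum(r_statistic(x, base) for x in sample) / len(sample)
```

A point far outside the base period cloud lies in no triangle, so its depth and its $r$ value are both zero, which would plot below any lower control limit $\alpha > 0$ on the "r" chart.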
Liu and Singh (1993) show by simulation that the asymptotic distribution of $Q(G_m, F_n)$ is

$N\left( \frac{1}{2}, \frac{1}{12} \left( \frac{1}{m} + \frac{1}{n} \right) \right)$.

The control limits of the "Q" chart are, therefore, given by CL $= 0.50$ and LCL $= 0.50 - Z_\alpha \sqrt{\frac{1}{12} \left( \frac{1}{m} + \frac{1}{n} \right)}$. The process is declared out-of-control whenever $Q_t(G_m, F_n)$ for the $t$th sample falls below LCL.

The "S" chart, which is analogous to the univariate CUSUM chart for the process mean, is constructed by first taking a base period sample $X_1, \ldots, X_m$. Next, for each observation, $X_i^*$, in the control period, the statistic $r_{G_m}(X_i^*)$ is computed. The "S" chart is based on the statistic $S_t(G_m)$, which is defined as

$S_t(G_m) = \sum_{i=1}^{t} \left[ r_{G_m}(X_i^*) - \frac{1}{2} \right]$.

Note that after $n$ control period observations the statistic can be rewritten as $S_n(G_m) = n[Q(G_m, F_n) - 1/2]$. We can, therefore, construct the "S" chart by letting CL $= 0$ and LCL $= -Z_\alpha \sqrt{n^2 [(1/m) + (1/n)] / 12}$. These control limits were derived by using the asymptotic distribution of $Q(G_m, F_n)$. The "S" chart is a plot of $S_t(G_m)$ versus time $t = 1, 2, \ldots$. We claim that the process is out-of-control whenever $S_t(G_m)$ falls below LCL.

With the literature review in the background, we will now discuss the performance of the normal theory and the non-parametric control charts under departures from the normality assumption. This will lay the foundation for introducing novel, robust multivariate control charts.

CHAPTER 3
ROBUSTNESS OF THE NORMAL THEORY MULTIVARIATE CONTROL CHARTS

In this chapter we investigate the performance of various normal theory multivariate control chart procedures under departures from the multivariate normality assumption. Included in this investigation are the average run length studies of the $\chi^2$ chart, the Hotelling's $T^2$ chart, the multivariate CUSUM procedures that were proposed by Crosier (1988) and Pignatiello and Runger (1990), and the multivariate EWMA chart that was proposed by Lowry et al. (1992). The various multivariate distributions that were used in the simulation study are discussed in Section 3.1 and the simulation strategy is outlined in Section 3.2.
The average run length performance of the $\chi^2$ charts under departures from multivariate normality is discussed in Section 3.3. We discuss the performance of the $\chi^2$ chart for individual observations and for subgroups of observations. The average run length performance of the Hotelling's $T^2$ chart under deviations from multivariate normality is discussed in Section 3.4. Again, we discuss the performance of the Hotelling's $T^2$ chart for individual observations and for subgroups of observations. The performances of the multivariate CUSUM and the multivariate EWMA charts under deviations from multivariate normality are discussed in Sections 3.5 and 3.6, respectively. The simulation programs to compute the average run lengths are given in the appendix.

3.1 Distributions Used in the Simulation Study

The simulation study was conducted by sampling random variates from distributions with elliptical directions. We define $X_1, \ldots, X_n$ to be observations from distributions with elliptical directions if they can be constructed in the following way: Let $U_1, \ldots, U_n$ be i.i.d. uniformly distributed over the $p$-dimensional unit hypersphere. Let $R_1, \ldots, R_n$ be any positive scalar random variables. Let $D$ be any nonsingular $p \times p$ matrix and form

$X_1 = R_1 D U_1, \ldots, X_n = R_n D U_n$.

Note that when $R_1, \ldots, R_n$ are i.i.d., the $X_i$'s are a sample from some elliptically symmetric population. In fact, this generation process characterizes all elliptically symmetric populations when the $R_i$'s are i.i.d. The class of elliptically symmetric distributions includes the multivariate normal distribution, the Pearson type VII heavy tailed distributions, and the Pearson type II light tailed distributions. The following bivariate distributions (with the exception of the bivariate mixed normal distribution) fall under the general class of elliptically symmetric distributions:

1. The bivariate normal distribution denoted by $N_2(\mu, \Sigma)$, where $\mu$ is the $2 \times 1$ mean vector and $\Sigma$ is the $2 \times 2$ covariance matrix, which is assumed to be known.
Without loss of generality it was assumed that $\mu = 0$ and $\Sigma = I$, where $0$ is the $2 \times 1$ zero vector and $I$ is the $2 \times 2$ identity matrix.
2. The bivariate Cauchy distribution: This was obtained by first generating a bivariate normal ($N_2(0, I)$) random variate and an independent Chi-square random variable ($\eta$) with one degree of freedom ($\nu = 1$). The bivariate Cauchy random vector is then given by $N_2(0, I) / \sqrt{\eta / \nu}$.
3. The bivariate t distribution with two degrees of freedom: This was generated by using the same method that is described in (2) but with $\nu = 2$.
4. The bivariate t distribution with five degrees of freedom: This was generated by using the same method that is described in (2) with $\nu = 5$.
5. The bivariate t distribution with eight degrees of freedom: This was generated by using the same method that is described in (2) with $\nu = 8$.
6. The bivariate t distribution with eighteen degrees of freedom: This was generated by using the same method that is described in (2) with $\nu = 18$.
7. The bivariate mixed normal distribution: This was obtained by first generating a uniform random variable ($U$) between 0 and 1. A random vector was generated from the bivariate normal $N_2(\mu_1, I)$ distribution if the observed value of $U$ was less than or equal to 0.50. A random vector was generated from the bivariate normal $N_2(\mu_2, I)$ distribution if the observed value of $U$ was greater than 0.50. Note that $\mu_1^T = (-1, -1)$ and $\mu_2^T = (1, 1)$.

For more information on these distributions or on techniques of multivariate simulation, refer to Johnson (1986).

3.2 The Simulation Strategy

The simulation study was conducted by using various FORTRAN 77 and IMSL subroutines on the UNIX platform. The type I error ($\alpha$) was set at 0.005. For a given, known variance-covariance matrix this yields an in-control average run length of $1 / \alpha = 200$. This implies that, on average, we would expect a false alarm every 200th observation. The average run length values in the simulation study are based on 100,000 out-of-control signals.
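The sampling schemes of Section 3.1 can be sketched with the Python standard library as follows. This is not the FORTRAN 77 and IMSL code used in the study; the function names are hypothetical, and the Chi-square variate with $\nu$ degrees of freedom is drawn as a gamma variate with shape $\nu / 2$ and scale 2.

```python
import math
import random


def bivariate_normal(rng, mean=(0.0, 0.0)):
    """One draw from N_2(mean, I)."""
    return (mean[0] + rng.gauss(0.0, 1.0), mean[1] + rng.gauss(0.0, 1.0))


def bivariate_t(rng, v):
    """One draw of N_2(0, I) / sqrt(eta / v); v = 1 gives the bivariate Cauchy."""
    z1, z2 = bivariate_normal(rng)
    eta = rng.gammavariate(v / 2.0, 2.0)  # Chi-square with v degrees of freedom
    scale = math.sqrt(eta / v)
    return (z1 / scale, z2 / scale)


def bivariate_mixed_normal(rng):
    """Equal mixture of N_2((-1, -1), I) and N_2((1, 1), I)."""
    if rng.random() <= 0.50:
        return bivariate_normal(rng, (-1.0, -1.0))
    return bivariate_normal(rng, (1.0, 1.0))


def nominal_in_control_arl(alpha):
    """In-control ARL when signals are independent with probability alpha."""
    return 1.0 / alpha


rng = random.Random(1998)  # arbitrary seed so the sketch is repeatable
cauchy_draw = bivariate_t(rng, 1)
t5_draw = bivariate_t(rng, 5)
mixed_draw = bivariate_mixed_normal(rng)
```

Setting `v` to 1, 2, 5, 8, or 18 covers distributions (2) through (6), and `nominal_in_control_arl(0.005)` gives the in-control average run length of 200 quoted above.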
To illustrate the simulation strategy that was used, consider the simulation study of Hotelling's $T^2$ chart for individual observations with a base period sample of $m$ observations. Assume that the observations are generated from the $N_2(\mu, \Sigma)$ distribution, where both $\mu$ and $\Sigma$ are unknown.

1. Generate $m$ observations ($X_1, \ldots, X_m$) from the $N_2(\mu, \Sigma)$ distribution.
2. Estimate both $\mu$ and $\Sigma$ as follows: the mean vector $\mu$ is estimated by $\bar{X} = \sum_{i=1}^{m} X_i / m$ and the covariance matrix $\Sigma$ is estimated by $S = \frac{1}{m - 1} \sum_{i=1}^{m} (X_i - \bar{X})(X_i - \bar{X})^T$.
3. Generate a future observation ($X_f$, where $f = 1, 2, \ldots$) from the $N_2(\mu, \Sigma)$ distribution.
4. Compute $T_f^2 = (X_f - \bar{X})^T S^{-1} (X_f - \bar{X})$.
5. If $T_f^2 \ge h$ (where $h$ is an appropriate cut-off point), record $f$ (the point where the out-of-control signal was observed) and go to step 1; else go to step 3.

These steps are repeated 100,000 times and the average run length is taken as the average of the 100,000 $f$ values that were recorded whenever $T_f^2 \ge h$.

3.3 The Performance of the $\chi^2$ Chart

Consider testing $H_0: \mu = 0$ versus $H_a: \mu \ne 0$. Here 0 is used without loss of generality, since $H_0: \mu = \mu_0$ can be tested by subtracting $\mu_0$ from each observation vector and testing whether these differences are located at 0. Since the $\chi^2$ test is invariant under all nonsingular linear transformations of the data, without loss of generality, we assume that $\Sigma = I$. The $\chi^2$ chart for individual observations is based on the test statistic $\chi^2 = X^T X$, and we claim that there is evidence to indicate that the process is out-of-control whenever the $\chi^2$ value for an observation $X$ exceeds $\chi^2_{0.005} = 10.60$ (since we are dealing with bivariate vectors). Figure 3.1 gives the average run length performance of the $\chi^2$ chart for various shifts in the process mean under each of the sampled distributions that are given in Section 3.1. The shifts are given in terms of the non-centrality parameter, which is defined as $\lambda = (\mu^T \mu)^{1/2}$, where $\mu$ is the mean of the process in the out-of-control state.
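Steps 1 through 5 of the simulation strategy can be sketched in Python as follows. This is a simplified stand-in for the FORTRAN 77 and IMSL programs in the appendix: the function names are hypothetical, the in-control distribution is taken to be $N_2(0, I)$, and far fewer replications are used than the 100,000 of the study. The helper `chi2_cutoff_bivariate` relies on the fact that a Chi-square variable with 2 degrees of freedom satisfies $P(\chi^2 > c) = e^{-c/2}$.

```python
import math
import random


def chi2_cutoff_bivariate(alpha):
    """Upper-alpha point of the Chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


def t2_run_length(m, h, rng, max_steps=10 ** 6):
    """One pass through steps 1-5; returns the index f of the first signal."""
    # Step 1: base period sample from N_2(0, I).
    base = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]
    # Step 2: sample mean vector and unbiased sample covariance matrix.
    mx = sum(x for x, _ in base) / m
    my = sum(y for _, y in base) / m
    sxx = sum((x - mx) ** 2 for x, _ in base) / (m - 1)
    syy = sum((y - my) ** 2 for _, y in base) / (m - 1)
    sxy = sum((x - mx) * (y - my) for x, y in base) / (m - 1)
    det = sxx * syy - sxy ** 2
    for f in range(1, max_steps + 1):
        # Step 3: future observation from the same distribution.
        dx = rng.gauss(0.0, 1.0) - mx
        dy = rng.gauss(0.0, 1.0) - my
        # Step 4: T^2 with the 2 x 2 inverse of S written out explicitly.
        t2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        # Step 5: record f at the first signal.
        if t2 >= h:
            return f
    return max_steps


def average_run_length(m, h, reps, seed=0):
    """Average of the recorded f values over reps replications."""
    rng = random.Random(seed)
    return sum(t2_run_length(m, h, rng) for _ in range(reps)) / reps
```

For example, `average_run_length(50, chi2_cutoff_bivariate(0.005), 1000)` gives a rough estimate of the in-control average run length with a base period of 50 observations and the asymptotic cut-off; the estimate is noisy at this number of replications.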
Note that $\lambda$ is the Mahalanobis distance between the mean of the process in the in-control state and the mean of the process in the out-of-control state. A value of $\lambda = 0$ indicates that the process is in-control.

Figure 3.1 shows that the performance of the $\chi^2$ chart is poor under the heavier tailed bivariate distributions, as indicated by the lower average run lengths. The $\chi^2$ chart maintains the pre-specified in-control average run length of 200 under multivariate normality. The in-control average run lengths are lower under the heavier tailed distributions, which indicates a false alarm rate higher than 0.005. As the degrees of freedom ($\nu$) of the multivariate t distribution increase, the average run lengths start to converge to the average run lengths of the chart under multivariate normality. This is expected, since as $\nu \to \infty$ the multivariate t distribution converges to the multivariate normal distribution. The type I error rate is also high under the mixed normal distribution. On the other hand, the type I error rates were found to be lower than the pre-specified type I error rates under the various Pearson type II distributions that were used in the study. We can, therefore, conclude that the $\chi^2$ chart for individual observations performs poorly when the distribution of the data is other than multivariate normal.

Figure 3.1 Plot of ARL versus $\lambda$ of the $\chi^2$ chart for individual observations.

Next, consider the performance of the $\chi^2$ chart with subgroups of $n$ observations. This chart is based on the test statistic $\chi^2 = n \bar{X}^T \bar{X}$, where $\bar{X}$ is the $p \times 1$ sample mean vector. We claim that there is evidence to indicate that the process is out-of-control whenever the $\chi^2$ value for a sample in the control period exceeds $\chi^2_{0.005} = 10.60$. Figure 3.2 shows the average run length performance of this chart under the various bivariate distributions that were used in this dissertation study.
The simulation results are based on subgroups of size 5 ($n = 5$) and the non-centrality parameter is given by $\lambda = (n \mu^T \mu)^{1/2}$.

Figure 3.2 Plot of the ARL versus $\lambda$ of the $\chi^2$ chart with subgroups of size 5.

Figure 3.2 shows that the performance of the $\chi^2$ chart with subgroups of size 5 is poor under the heavier tailed bivariate distributions. However, it maintains its pre-specified in-control average run length under multivariate normality. The type I error rates are high under both the heavier tailed and the multivariate mixed normal distributions. As expected, an increase in the degrees of freedom ($\nu$) of the multivariate t distributions causes the performance of the chart under the multivariate t distributions to converge to the performance of the chart under multivariate normality. The type I error rates of the chart under the various Pearson type II distributions were found to be lower than the pre-specified type I error rate. We can, therefore, conclude that the performance of the $\chi^2$ chart with subgroups of size 5 is poor under deviations from multivariate normality.

3.4 The Performance of Hotelling's $T^2$ Chart

We will first discuss the performance of Hotelling's $T^2$ chart for individual observations. This chart is based on the test statistic $T^2 = (X_f - \bar{X})^T S^{-1} (X_f - \bar{X})$, where $\bar{X}$ and $S$ are unbiased estimators of the process mean $\mu$ and the covariance matrix $\Sigma$, respectively, and $X_f$ is an observation from the control period. The estimators $\bar{X}$ and $S$ are computed from a base period of $m$ observations when the process is assumed to be in-control. Woodall and Sullivan (1996) have noted that the in-control average run lengths are smaller than the pre-specified in-control average run lengths when $S$ is used to estimate the variance-covariance matrix $\Sigma$.
They proposed alternative estimators which were shown to be more robust than $S$. These robust estimators were used in place of $S$ to form the $T^2$ statistic. We have assumed that the process is in-control in the base period and we, therefore, use $S$ in our simulation studies. The performance of the $T^2$ chart with $S$ replaced by an appropriate robust estimator of $\Sigma$ will be studied as a follow up to this dissertation.

We declare the process out-of-control whenever $T^2$ for an observation $X_f$ exceeds an appropriate cut-off value $h$. We shall study the performance of the $T^2$ chart for individual observations under three different cut-off values. The first cut-off value is based on the asymptotic distribution of $T^2$, which is Chi-square with $p$ degrees of freedom. The cut-off value in this case is $\chi^2_{0.005} = 10.60$. The second cut-off value is based on the exact distribution of $T^2$, which is related to the well known F distribution by the following relationship:

$T^2_{\alpha; p, m-p} = \frac{p(m - 1)}{m - p} F_{\alpha; p, m-p}$,

where $T^2_{\alpha; p, m-p}$ is the upper $\alpha$th percentile of the $T^2$ distribution with $p$ numerator and $m - p$ denominator degrees of freedom, and $F_{\alpha; p, m-p}$ is the upper $\alpha$th percentile of the F distribution with $p$ numerator and $m - p$ denominator degrees of freedom. The simulation study was conducted with base periods of 20, 50, and 100 observations. The cut-off values of the $T^2$ chart for individual observations with these base periods are 15.99, 12.34, and 11.41, respectively. Tracy, Young, and Mason (1992) discussed different cut-off values for the start-up and the control stages of the $T^2$ chart. For simplicity, we have assumed that the process is in-control at the start-up stage and thus does not require monitoring. The cut-off value in the control stage as suggested by Tracy, Young, and Mason (1992) is based on the assumption that the process mean, $\mu$, is known.
They suggest the following cut-off value:

$\frac{p(m + 1)(m - 1)}{m(m - p)} F_{\alpha; p, m-p}$,

where $F_{\alpha; p, m-p}$ is the upper $\alpha$th percentile of the F distribution with $p$ numerator and $m - p$ denominator degrees of freedom. In practice, the process mean is unknown and is estimated from a base period of $m$ observations. Our cut-off value is based on practical considerations and not convenience.

The third cut-off value was obtained by simulation to achieve an in-control average run length of 200. Note that run lengths are usually taken to follow the geometric distribution. However, one of the underlying assumptions of the geometric distribution is that the observations are independent. This assumption does not hold for the Hotelling's $T^2$ chart, since successive values of the $T^2$ statistic depend on the same base period estimates $\bar{X}$ and $S$ and are, therefore, not independent. The simulated cut-off values for base period sample sizes of 20, 50, and 100 are 10.90, 10.83, and 10.77, respectively.

Figures 3.3 through 3.9 show the average run length performance of the $T^2$ chart (with the cut-off value based on the asymptotic $\chi^2$ distribution) for individual observations with base periods of sizes 20, 50, and 100 taken from various bivariate distributions. These plots indicate that the performance of the $T^2$ chart (with the asymptotic cut-off value) for individual observations is poor under departures from multivariate normality. The in-control average run lengths are lower than the pre-specified value even when the underlying distribution of the process is multivariate normal. The type I error rates are high for the heavy tailed distributions. In comparison, the type I error rate is smaller than the pre-specified type I error rate when the underlying distribution of the process is assumed to be multivariate mixed normal. Increasing the base period sample size does not compensate adequately for this adverse behavior.
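The two F-based cut-off expressions discussed above can be evaluated in closed form in the bivariate case. The Python sketch below is an illustration under the assumption $p = 2$ only: it uses the fact that an F variable with 2 and $d$ degrees of freedom satisfies $P(F > x) = (1 + 2x/d)^{-d/2}$, and the function names are hypothetical.

```python
def f_upper_2(alpha, d):
    """Upper-alpha percentile of the F distribution with 2 and d degrees of freedom."""
    return (d / 2.0) * (alpha ** (-2.0 / d) - 1.0)


def cutoff_p_m_minus_1(alpha, m):
    """p(m - 1)/(m - p) times the F percentile, evaluated for p = 2."""
    p = 2
    return p * (m - 1) / (m - p) * f_upper_2(alpha, m - p)


def cutoff_p_m_plus_1(alpha, m):
    """p(m + 1)(m - 1)/(m(m - p)) times the F percentile, evaluated for p = 2."""
    p = 2
    return p * (m + 1) * (m - 1) / (m * (m - p)) * f_upper_2(alpha, m - p)


# Evaluate both expressions for the base period sizes used in the study.
cutoffs = {m: (cutoff_p_m_minus_1(0.005, m), cutoff_p_m_plus_1(0.005, m))
           for m in (20, 50, 100)}
```

Under these assumptions, with $\alpha = 0.005$ the expression with the factor $(m + 1)(m - 1) / [m(m - p)]$ evaluates to roughly 15.99, 12.35, and 11.42 for $m = 20$, 50, and 100, which is close to the cut-off values of 15.99, 12.34, and 11.41 quoted above.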
The type I error rates were lower than the pre-specified type I error rates under the various Pearson type II lighter tailed distributions that were used in this dissertation study. As a result, the in-control average run lengths of the $T^2$ chart are higher than the pre-specified in-control average run length. The average run lengths of the charts in the out-of-control states are consequently affected.

Figure 3.3 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under bivariate normality.
Figure 3.4 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate Cauchy distribution.
Figure 3.5 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 2 degrees of freedom.
Figure 3.6 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 5 degrees of freedom.
Figure 3.7 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 8 degrees of freedom.
Figure 3.8 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 18 degrees of freedom.
Figure 3.9 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate contaminated normal distribution.
Figures 3.10 through 3.16 show the performance of the $T^2$ chart (with cut-off values based on the exact distribution of $T^2$) for individual observations under the various bivariate distributions that were used in this dissertation and for different base period sample sizes. These plots indicate that the performance of the $T^2$ chart for individual observations (with cut-off value based on the exact distribution of $T^2$) is poor under both multivariate normality and deviations from multivariate normality. Under multivariate normality, the type I error rates are lower than the pre-specified type I error rate. Also note that under multivariate normality, the in-control average run lengths converge down to 200 as the base period sample size increases. The type I error rates are higher than the pre-specified type I error rate under the heavier tailed distributions. On the other hand, the type I error rate is very small under the bivariate mixed normal distribution. We have not included the performance of the $T^2$ chart for $m = 20$ under the mixed normal distribution since the average run lengths were very high (the in-control average run length is well over 15000). The type I error rates are small under the various Pearson type II light tailed distributions that were used in the dissertation study. We can, therefore, conclude that the performance of the $T^2$ chart (with cut-off values based on the exact distribution of $T^2$) is poor under both multivariate normality and deviations from multivariate normality.

Figure 3.10 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate normal distribution.
Figure 3.11 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate Cauchy distribution.
Figure 3.12 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 2 degrees of freedom.
Figure 3.13 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 5 degrees of freedom.
Figure 3.14 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 8 degrees of freedom.
Figure 3.15 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 18 degrees of freedom.
Figure 3.16 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate mixed normal distribution.

Figures 3.17 through 3.23 show the performance of the $T^2$ chart (with cut-off values obtained by simulation) for individual observations under the various bivariate distributions and for the different base period sample sizes. The plots indicate that these charts do not perform well under departures from multivariate normality. Although the pre-specified in-control average run length is achieved under the bivariate normal distribution, the type I error rates are high under the heavier tailed distributions. The type I error rate is smaller than the pre-specified type I error rate under the bivariate mixed normal distribution. Under bivariate normality, the performance of the $T^2$ charts with simulated cut-off values is an improvement over the performance of the $T^2$ charts with both asymptotic and exact distribution cut-off values.
The type I error rates are smaller than the pre-specified type I error rate under the various Pearson type II distributions that were used in the dissertation. We can, therefore, conclude that the performance of the $T^2$ chart (with cut-off values based on simulation) is poor under departures from multivariate normality. Our simulation study shows that the $T^2$ chart does not perform well under deviations from multivariate normality. The in-control average run lengths are lower than the pre-specified in-control average run length under the heavier tailed distributions and are higher than the pre-specified in-control average run length under the lighter tailed distributions.

Figure 3.17 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate normal distribution.

Figure 3.18 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate Cauchy distribution.

Figure 3.19 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 2 degrees of freedom.

Figure 3.20 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 5 degrees of freedom.

Figure 3.21 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 8 degrees of freedom.

Figure 3.22 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 18 degrees of freedom.
Figure 3.23 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate mixed normal distribution.

Next, consider the performance of the $T^2$ chart with subgroups of size n. This chart is based on the test statistic

$T^2 = n(\bar{X} - \bar{\bar{X}})^{T} S^{-1} (\bar{X} - \bar{\bar{X}})$

where n is the subgroup size, $\bar{X}$ is the p x 1 subgroup mean vector, $\bar{\bar{X}}$ is the p x 1 estimate of the process mean $\mu$, and S is the p x p unbiased estimator of the process covariance matrix $\Sigma$. The estimates $\bar{\bar{X}}$ and S are computed from a base period of m samples, each of size (say) n', taken when the process is assumed to be in-control. This simulation study is based on a base period of 25 samples, each of size 5, and subgroup samples that are also of size 5 (that is, m = 25, n' = 5, n = 5). We claim that there is evidence to indicate that the process is out-of-control whenever $T^2$ for a sample in the control period exceeds the asymptotic $\chi^2$ cut-off value 10.60. Figure 3.24 shows the average run length performance of this chart under the various bivariate distributions that were used in this dissertation study. The non-centrality parameter is given by $\lambda = (n\,\mu^{T}\Sigma^{-1}\mu)^{1/2}$.

Figure 3.24 Plot of ARL versus $\lambda$ of the $T^2$ chart for subgroups of size 5 and a base period of size 25 under the various bivariate distributions (legend: Normal, t(1), t(2), t(5), t(8), t(18), mixed normal).

Figure 3.24 shows that the performance of the $T^2$ chart with subgroups of size 5 is poor under both multivariate normality and deviations from multivariate normality. The in-control average run lengths of this chart are lower than the pre-specified nominal value under all the bivariate distributions that were used in this dissertation study.
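The decision rule described above can be sketched for the bivariate case as follows. This is a minimal illustration, not the dissertation's FORTRAN code: the function names are hypothetical, the base period estimates are passed in as plain tuples, and the cut-off uses the fact that a $\chi^2$ variable with 2 degrees of freedom has upper-tail probability exp(-x/2), so the cut-off for a type I error rate of 0.005 is -2 ln(0.005), which rounds to the 10.60 quoted in the text.

```python
import math


def chi2_cutoff_2df(alpha):
    """Upper-alpha quantile of the chi-square distribution with 2 d.f.

    For 2 d.f. the survival function is exp(-x / 2), so the quantile
    has the closed form -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


def t2_subgroup(xbar, grand_mean, S, n):
    """T^2 = n (xbar - grand_mean)' S^{-1} (xbar - grand_mean) for p = 2.

    xbar and grand_mean are length-2 sequences; S is a 2 x 2 covariance
    estimate given as ((s11, s12), (s21, s22)). All names are illustrative.
    """
    d0 = xbar[0] - grand_mean[0]
    d1 = xbar[1] - grand_mean[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0:
        raise ValueError("S must be positive definite")
    # Quadratic form with the explicit 2 x 2 inverse (1/det) [[d, -b], [-c, a]].
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return n * quad


cutoff = chi2_cutoff_2df(0.005)          # about 10.60
identity = ((1.0, 0.0), (0.0, 1.0))
t2_small = t2_subgroup((0.5, 0.0), (0.0, 0.0), identity, n=5)
t2_large = t2_subgroup((1.5, 1.0), (0.0, 0.0), identity, n=5)
signals = [t2 > cutoff for t2 in (t2_small, t2_large)]
```

With an identity covariance estimate, a subgroup mean of (0.5, 0) stays below the cut-off while a subgroup mean of (1.5, 1.0) signals.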
It is interesting to note that most texts suggest using a base period of 25 samples and subgroup samples of size 5 in both the base period and the control period while suggesting the use of the asymptotic $\chi^2$ cut-off value. The above figure clearly shows that the performance of the $T^2$ chart is poor with these values even under the assumption that the underlying distribution of the process is multivariate normal. The type I error rates are high for the heavier tailed distributions whereas the type I error rate is closer to the nominal type I error rate under the bivariate mixed normal distribution. Our simulation results indicated that the type I error rates under the various Pearson type II distributions were also very small (<< 0.005).

3.5 Performance of the Multivariate CUSUM Charts

We will now discuss the performance of the multivariate CUSUM charts that were proposed by Crosier (1988) and Pignatiello and Runger (1990). Crosier's multivariate CUSUM is given in Equation (2.4). For $k_2 = 0.50$ and a pre-specified in-control average run length of 200, $h_2 = 5.50$. These values are provided in Crosier (1988). The multivariate CUSUM proposed by Pignatiello and Runger is given in Equation (2.5). For $k_4 = 0.50$ and a pre-specified in-control average run length of 200, $h_4 = 4.75$. These values are provided in Pignatiello and Runger (1990). Figures 3.25 through 3.31 show the average run length performance of the two multivariate CUSUM charts (labeled MC1 and MC2 in the plots) under the various distributions that were used in this dissertation study.

Figure 3.25 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate normal distribution.

Figure 3.26 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate Cauchy distribution.
Figure 3.27 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 2 degrees of freedom.

Figure 3.28 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 5 degrees of freedom.

Figure 3.29 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 8 degrees of freedom.

Figure 3.30 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 18 degrees of freedom.

Figure 3.31 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate mixed normal distribution.

Figures 3.25 through 3.31 show that the performance of the multivariate CUSUM charts is poor under departures from multivariate normality. The false alarm rates are very high under both the heavier tailed distributions and the bivariate mixed normal distribution. The multivariate CUSUM charts are more sensitive to smaller shifts in the process mean than the $T^2$ chart. The type I error rates of both multivariate CUSUM charts under the various Pearson type II light tailed distributions that were used in this dissertation study are smaller than the pre-specified type I error rate. The above figures also indicate that the performances of the two multivariate CUSUM charts are very similar.

3.6 Performance of the Multivariate EWMA Chart (MEWMA)

We will now study the performance of the multivariate EWMA chart under deviations from multivariate normality. The procedure we use was suggested by Lowry et al. (1992) and is given in Equation (2.6).
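Since Equation (2.6) is not reproduced in this section, the following is only a sketch of the MEWMA recursion in the form commonly attributed to Lowry et al. (1992), written for the bivariate case. It assumes an in-control mean of zero, a known covariance matrix, and the asymptotic covariance of the smoothed vector; whether the dissertation uses the exact or the asymptotic covariance is not stated here. The function name is hypothetical, and the defaults are the smoothing constant 0.10 and the limit H = 8.66 quoted in this section.

```python
def mewma_signal_times(observations, sigma, r=0.10, H=8.66):
    """Return the 1-based times at which the MEWMA statistic exceeds H.

    observations: iterable of (x1, x2) pairs, in-control mean assumed (0, 0).
    sigma: 2 x 2 in-control covariance matrix ((s11, s12), (s21, s22)).
    The smoothed vector is Z_i = r * X_i + (1 - r) * Z_{i-1} with Z_0 = 0,
    and the plotted statistic is Z_i' Sigma_Z^{-1} Z_i, where the asymptotic
    form Sigma_Z = (r / (2 - r)) * Sigma is assumed.
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det <= 0:
        raise ValueError("sigma must be positive definite")
    scale = (2.0 - r) / r  # inverse of the asymptotic factor r / (2 - r)
    z0, z1 = 0.0, 0.0
    signals = []
    for i, (x0, x1) in enumerate(observations, start=1):
        z0 = r * x0 + (1.0 - r) * z0
        z1 = r * x1 + (1.0 - r) * z1
        # Quadratic form Z' Sigma^{-1} Z using the explicit 2 x 2 inverse.
        quad = (d * z0 * z0 - (b + c) * z0 * z1 + a * z1 * z1) / det
        if scale * quad > H:
            signals.append(i)
    return signals


identity = ((1.0, 0.0), (0.0, 1.0))
in_control = mewma_signal_times([(0.0, 0.0)] * 6, identity)
shifted = mewma_signal_times([(2.0, 0.0)] * 6, identity)
```

A sustained shift of two standard deviations in the first coordinate accumulates in the smoothed vector and produces a signal within a few observations, while observations sitting at the in-control mean never signal.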
For $\omega = 0.10$ and a pre-specified in-control average run length of 200, H = 8.66. These values are given in Lowry et al. (1992). Figure 3.32 shows the average run length performance of the MEWMA chart under the various bivariate distributions that were used in this dissertation study. Figure 3.32 shows that the performance of the MEWMA chart is poor under deviations from normality. The type I error rates are high under both the heavier tailed distributions and the bivariate mixed normal distribution. However, the performance of the MEWMA chart under multivariate normality is similar to the performances of the $T^2$ and CUSUM charts that were discussed in Sections 3.4 and 3.5 (Figures 3.3 and 3.25). The type I error rates of the MEWMA chart under the various Pearson type II light tailed distributions that were used in this study are smaller than the pre-specified type I error rate.

Figure 3.32 Plot of ARL versus $\lambda$ for the MEWMA chart for individual observations under different distributions (legend: Normal, t(1), t(2), t(5), t(8), t(18), mixed normal).

CHAPTER 4
ROBUSTNESS OF LIU'S (1995) NON-PARAMETRIC MULTIVARIATE CONTROL CHARTS

In this chapter, we introduce three non-parametric multivariate control charts that were proposed by Liu (1995) and also present simulation results of the average run length performances of these charts. Liu introduced the "r" chart, the "Q" chart, and the "S" chart. These charts are analogous to the univariate $X$, $\bar{X}$, and CUSUM charts, respectively. The main idea behind the proposed control charts is the reduction of each multivariate observation to a univariate index, namely a ranking that is based on the concept of data depth. These ranks are then used to construct the control charts. The concept of data depth and some statistics that are derived from it are discussed in Section 4.1. The "r" chart is introduced in Section 4.2.
We discuss the "Q" and the "S" charts in Sections 4.3 and 4.4, respectively.

4.1 The Notion of Data Depths

In this section we introduce the notion of data depths and also discuss some of the statistics that are derived from it. These statistics are then used to construct the data depth control charts. First, assume that the quality of each product can be classified by p quality characteristics. Let G be the prescribed distribution of the process when it is operating in-control and let $Y_1, \ldots, Y_m$ be m p x 1 random observation vectors from G. The sample $Y_1, \ldots, Y_m$ is referred to as the base period sample. Next, let $X_1, X_2, \ldots$ be new p x 1 observation vectors taken from the process during the control phase and assume that the $X_i$'s follow a distribution F. The $X_i$'s are used to determine if the process is operating in-control or whether the process has gone out of control. An out-of-control state would imply that a difference exists between the distributions G and F. The concept of data depth is used to determine if the two distributions are different.

For any point $y \in R^p$, the simplicial depth of y with respect to the distribution G is given by

$D_G(y) = P_G(y \in s[Y_1, \ldots, Y_{p+1}])$

where $s[Y_1, \ldots, Y_{p+1}]$ is a simplex whose vertices $Y_1, \ldots, Y_{p+1}$ are p + 1 random observations from G. The quantity $D_G(y)$ is a measure of how "deep" or how "central" the point y is with respect to the distribution G. Most often G is unknown and only a sample $Y_1, \ldots, Y_m$ is available. In this case the sample simplicial depth of the point y with respect to the data cloud $Y_1, \ldots, Y_m$ is given by

$D_{G_m}(y) = \binom{m}{p+1}^{-1} \sum I(y \in s[Y_{i_1}, \ldots, Y_{i_{p+1}}])$

where the sum runs over all subsets of p + 1 points from the sample and the function I(.) is an indicator function that is set equal to 1 if y is in s[.] and is set equal to 0 otherwise (see Figure 4.1 for an example in the bivariate case). The quantity $D_{G_m}(y)$ measures how deep y is within the data cloud $Y_1, \ldots, Y_m$. Note that $G_m$ is the empirical distribution of $Y_1, \ldots, Y_m$. Liu (1990) proved that $D_G(\cdot)$ is affine invariant.
This implies that it is invariant to nonsingular transformations of the data. This property is desirable since it leads to the construction of multivariate control charts which are invariant to the direction of the shift. The performance of the proposed charts can, therefore, be compared with the performance of other competing affine invariant multivariate control charts. The author also proved the uniform convergence of $D_{G_m}(\cdot)$ to $D_G(\cdot)$. This convergence property allows the estimation of $D_G(\cdot)$ from $D_{G_m}(\cdot)$ when G is unknown. Throughout this chapter, we will assume that the distribution G is unknown and will, therefore, have to be estimated by $G_m$.

Figure 4.1 Illustrating the values of the indicator function in the bivariate case.

To illustrate the notion of data depth, consider a sample $Y_1, \ldots, Y_n$ in $R^2$. Denote by $\Delta(Y_i, Y_j, Y_k)$ a triangle with vertices $Y_i$, $Y_j$, and $Y_k$. For a sample of size n we can form C(n,3) ("n choose 3") such triangles. Therefore, with any point $y \in R^2$ we can associate the number of such triangles that contain y. This number is largest when y lies close to the center of the data cloud $Y_1, \ldots, Y_n$, is close to zero if y is on the outskirts, and is exactly zero if y is outside the boundary of the data cloud. This concept of data depth extends very easily to p dimensions.

A center-outward ranking of the sample points is induced if the data depth of each point in the base period sample is computed and ranked in ascending order. If we use $Y_{[j]}$ to denote the sample point associated with the jth smallest depth value, then $Y_{[1]}, \ldots, Y_{[m]}$ are the order statistics of the $Y_i$'s with $Y_{[m]}$ being the most "central" or the "deepest" point. A small value of the depth indicates that the associated point is outlying with respect to the distribution G.

Next, we discuss some statistics that are derived from data depths. First, let $Y \sim G$ indicate that the random variable Y follows the distribution G.
Let

$r_G(y) = P(D_G(Y) \le D_G(y) \mid Y \sim G)$

and

$r_{G_m}(y) = \frac{1}{m} \sum_{j=1}^{m} I(D_{G_m}(Y_j) \le D_{G_m}(y))$

where I is the indicator function and is equal to one if $Y_j$ has a data depth which is less than or equal to the data depth of y. The probability $r_G(y)$ is a measure of how outlying the point y is with respect to G. A small value of $r_G(y)$ indicates that y is an outlier and is, therefore, not "central" with respect to the underlying distribution G. On the other hand, the quantity $r_{G_m}(y)$, which is an empirical version of $r_G(y)$, measures how outlying the point y is with respect to the data cloud $Y_1, \ldots, Y_m$. A small value of $r_{G_m}(y)$ indicates that only a small fraction of the $Y_i$'s are more outlying than the point y, which suggests that y is at the outskirts or on the boundary of the data cloud $Y_1, \ldots, Y_m$.

Next, define

$Q(G, F) = P(D_G(Y) \le D_G(X) \mid Y \sim G, X \sim F)$.

It can be shown that $Q(G, F) = E_F(r_G(X))$ where E stands for expected value and $r_G(X)$ measures the fraction of the G population that is more outlying than X. The quantity Q(G, F) is the average of such fractions over all the X's from the F population. A value of Q(G, F) < 0.50 implies that on the average more than 50% of the G population is more "central" than any observation X from F. If G is known but F is unknown, then Q(G, F) can be estimated by

$Q(G, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_G(X_i)$

where $F_n$ denotes the empirical distribution of $X_1, \ldots, X_n$. On the other hand, if both G and F are unknown then an estimate of Q(G, F) is given by

$Q(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i)$.

Throughout this chapter, we will assume that the distributions G and F are unknown and will, therefore, have to be estimated by $G_m$ and $F_n$, respectively.

4.2 The "r" Chart

The "r" chart, which is analogous to the univariate Shewhart chart for individual observations, is based on the statistic $r_{G_m}(\cdot)$ which was discussed in Section 4.1. It is constructed by first taking a base period sample of size m (labeled $Y_1, \ldots, Y_m$). Next, for each future observation $X_i$ in the control period, the statistic $r_{G_m}(X_i)$ is computed.
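For the bivariate case, the sample simplicial depth and the rank statistic $r_{G_m}(y)$ can be sketched by brute force over all triangles. This is an illustrative implementation only: it is far slower than the Rousseeuw and Ruts algorithm used in the dissertation, it treats triangles as closed sets, and it assumes that no three base period points are collinear. All function names are hypothetical.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of the triangle a, b, c."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_triangle(y, a, b, c):
    """True if y lies in the closed triangle with vertices a, b, c."""
    d1, d2, d3 = _orient(a, b, y), _orient(b, c, y), _orient(c, a, y)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth(y, sample):
    """Fraction of the C(m, 3) sample triangles that contain y."""
    triples = list(combinations(sample, 3))
    hits = sum(in_triangle(y, a, b, c) for a, b, c in triples)
    return hits / len(triples)


def r_statistic(y, sample):
    """Fraction of sample points whose depth does not exceed the depth of y."""
    depth_y = simplicial_depth(y, sample)
    count = sum(simplicial_depth(s, sample) <= depth_y for s in sample)
    return count / len(sample)


# A toy base period sample: four corners of a square and one interior point.
base = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (2.0, 1.0)]
depth_inner = simplicial_depth((2.0, 1.0), base)
depth_far = simplicial_depth((10.0, 10.0), base)
r_inner = r_statistic((2.0, 1.0), base)
r_far = r_statistic((10.0, 10.0), base)
```

A point far outside the data cloud has depth zero and therefore the smallest possible value of the rank statistic, while the interior point is at least as deep as every base period point.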
The "r" chart is a plot of $r_{G_m}(X_i)$ against time i. The center line of the control chart (CL) is set at 0.50 and the lower control limit (LCL) is set at the type I error rate ($\alpha$). These control limits are based on the asymptotic distribution of $r_{G_m}(\cdot)$. Liu and Singh (1993) proved that the asymptotic distribution of $r_{G_m}(\cdot)$ is U[0,1] (a uniform distribution between 0 and 1). The asymptotic distribution of $r_{G_m}(X_i)$ suggests that LCL = $\alpha$. We claim that the process is out-of-control whenever $r_{G_m}(X_i)$ for a point $X_i$ plots below $\alpha$.

To illustrate the construction of the "r" chart, consider the example given in Liu (1995). Liu considers a base period of 500 observations and records the number of observations, or fraction of the sample, with depth equal to zero. It is observed that 2.2% of the sample observations had a depth of zero, so the type I error rate ($\alpha$) is then set at 2.5%. Note that in this approach the type I error rate is set after observing the base period sample. In reality, the type I error rate is pre-specified and is usually set at 0.005. We require a base period sample size larger than 500 to achieve this pre-specified type I error rate. In reality, the base period sample size is usually less than 100. The type I error rate for base period sample sizes less than 100 is higher than 0.05. This was verified by simulation.

4.3 The "Q" Chart

The "Q" chart, which is similar to the univariate Shewhart chart with subgroups of observations, is based on the statistic $Q(G_m, F_n)$ which was discussed in Section 4.1. It is constructed by first taking a base period of m observations (labeled $Y_1, \ldots, Y_m$). Subgroups of size n (labeled $X_1, \ldots, X_n$) are taken in the control period. For each $X_i$ in the kth sample (k = 1, 2, ...) the statistic $r_{G_m}(X_i)$ is then computed. The statistic $Q(G_m, F_n)$ for the kth sample is given by

$Q_k(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i)$.

The "Q" chart is a plot of $Q_k(G_m, F_n)$ versus time k (k = 1, 2, ...).
The asymptotic distribution of $Q(G_m, F_n)$ is used for deriving the control limits of the "Q" chart. Liu and Singh (1993) show by simulation results that the large sample distribution of $Q(G_m, F_n)$ is

$N\left(\frac{1}{2}, \frac{1}{12}\left[\frac{1}{m} + \frac{1}{n}\right]\right)$.

Liu and Singh claim that this approximation holds well even when the size of the samples in the control period is as small as 5. However, our simulation results show that the performance of the "Q" chart is poor with control limits based on this approximation. The center line and lower control limit of the "Q" chart with the above approximation are given by

CL = 0.50 and LCL = $0.50 - z_\alpha \sqrt{\frac{1}{12}[(1/m) + (1/n)]}$.

The process is declared out of control whenever $Q_k(G_m, F_n)$ for the kth sample falls below LCL.

To illustrate the construction of the "Q" chart, let us consider an example given in Liu (1995). Liu considered a base period sample of 500 observations and observed that 2.2% of these observations had a depth of zero. The type I error rate ($\alpha$) was then set at 0.025. The control period sample size is set at 5. The control limits of the "Q" chart are, therefore, CL = 0.50 and LCL = 0.246, and the process is declared out-of-control whenever $Q_k(G_m, F_n)$ for the kth sample falls below 0.246.

4.4 The "S" Chart

The "S" chart, which is analogous to the univariate CUSUM chart for the process mean, is based on the statistic $S_n(G_m)$ which is defined as

$S_n(G_m) = \sum_{i=1}^{n} [r_{G_m}(X_i) - 1/2]$.

Note that $S_n(G_m)$ can be rewritten as $S_n(G_m) = n[Q(G_m, F_n) - 1/2]$ and we can construct the "S" chart by letting

CL = 0 and LCL = $-z_\alpha \sqrt{n^2[(1/m) + (1/n)]/12}$.

These control limits were derived by using the asymptotic distribution of $Q(G_m, F_n)$ which was given in Section 4.3. The "S" chart is a plot of $S_n(G_m)$ versus time n = 1, 2, .... We claim that the process is out of control whenever $S_n(G_m)$ at time n falls below LCL. Note that the above formulation of the "S" chart does not include a reset feature that is typical of the normal theory CUSUM charts.
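The lower control limits of the "Q" and "S" charts follow directly from the normal approximation above and can be checked numerically. The sketch below uses only the Python standard library; the function names are hypothetical, and $z_\alpha$ is taken to be the upper-$\alpha$ quantile of the standard normal distribution (about 1.96 for $\alpha$ = 0.025).

```python
import math
from statistics import NormalDist


def q_chart_lcl(alpha, m, n):
    """LCL = 0.50 - z_alpha * sqrt((1/12) * (1/m + 1/n))."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return 0.5 - z_alpha * math.sqrt((1.0 / m + 1.0 / n) / 12.0)


def s_chart_lcl(alpha, m, n):
    """LCL = -z_alpha * sqrt(n^2 * (1/m + 1/n) / 12), evaluated at time n."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return -z_alpha * math.sqrt(n * n * (1.0 / m + 1.0 / n) / 12.0)


# Liu's (1995) example as quoted in the text: alpha = 0.025, m = 500, n = 5.
q_lcl = q_chart_lcl(0.025, 500, 5)

# The "S" chart limit changes with n; here it is tabulated for n = 1, ..., 5.
s_lcls = [s_chart_lcl(0.025, 500, n) for n in range(1, 6)]
```

Because $S_n(G_m) = n[Q(G_m, F_n) - 1/2]$, the "S" chart limit at time n equals n times the distance of the "Q" chart limit below its center line, which gives a quick consistency check between the two formulas.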
Although the "S" chart is reset at zero whenever the statistic $S_n(G_m)$ falls below LCL, it is not reset to zero when $S_n(G_m)$ exceeds zero. This characteristic of the "S" chart slows its ability to detect out-of-control states.

To illustrate the construction of the "S" chart, let us consider an example from Liu (1995). Liu considered a base period of 500 observations and observed that 2.2% of the observations had a depth of zero. The type I error rate ($\alpha$) was then set at 0.025. The control limits of the "S" chart in this case are given by

CL = 0 and LCL = $-1.96\sqrt{n^2[(1/500) + (1/n)]/12}$.

The process is declared out-of-control whenever $S_n(G_m)$ falls below LCL.

4.5 Discussion of Simulation Results

We will now discuss the average run length performance of the three data depth multivariate control charts. Our results are based on simulation studies that were conducted using various FORTRAN 77 and IMSL subroutines. The type I error rate ($\alpha$) was set at 0.005. One of the first problems that we encountered was determining an appropriate base period sample size in order to achieve the pre-specified type I error rate. Liu (1995) simulated 500 random observations from the $N_2(0, I)$ distribution and observed that 2.2% of the base period sample, or 11 points, had zero data depth. The type I error rate was then set at 0.025. How large a base period sample size would we need to achieve a type I error rate of 0.005? A result of Heuter (1994) was used as a heuristic to determine an appropriate base period sample size. Heuter's result states that the expected number of vertices, or extreme points, on the convex hull of a bivariate normal sample of size n is given by $2\sqrt{2\pi \ln(n)}$ where $\pi$ = 3.142 and ln(.) is the natural log function. This result was used as a guideline, and base periods of sizes 2000, 2500, and 3000 were simulated from the $N_2(0, I)$ distribution.
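The heuristic above can be explored with a short sketch: the quoted expression $2\sqrt{2\pi \ln(n)}$ gives a rough target, and a small simulation counts the convex hull vertices of standard bivariate normal samples. This is an illustration in Python rather than the FORTRAN 77 and IMSL routines used in the study; the function names, the seed, and the number of replicates are all assumptions. The quoted formula is asymptotic, so simulated averages need not match it closely at these sample sizes.

```python
import math
import random


def expected_hull_vertices(n):
    """Heuristic expected number of hull vertices, 2 * sqrt(2 * pi * ln n)."""
    return 2.0 * math.sqrt(2.0 * math.pi * math.log(n))


def hull_vertex_count(points):
    """Number of convex hull vertices (Andrew's monotone chain algorithm)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return len(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            # Pop points that would make a clockwise or collinear turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The two endpoints appear in both half hulls.
    return len(lower) + len(upper) - 2


def simulated_mean_vertices(n, reps, seed=0):
    """Average hull vertex count over reps samples of size n from N2(0, I)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        total += hull_vertex_count(sample)
    return total / reps


heuristic = {n: expected_hull_vertices(n) for n in (2000, 2500, 3000)}
mean_2500 = simulated_mean_vertices(2500, reps=20)
implied_rate = mean_2500 / 2500  # fraction of base period points with zero depth
```

Dividing the average number of hull vertices by the base period sample size gives the fraction of base period points with zero depth, which is the quantity being matched to the pre-specified type I error rate.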
It was found that the average number of extreme points for a base period of 2500 observations was 12.24, thus giving a simulated type I error rate of 0.0048 which is very close to the nominal value of 0.005. The base period sample size was, therefore, set at 2500 observations for all three data depth control charts. The simulation results are based on only 1000 out-of-control signals since the algorithm to compute the data depths is very computer intensive. The data depths were computed by using the FORTRAN algorithm developed by Rousseeuw and Ruts (1992).

Figure 4.2 shows the average run length performance of the "r" chart under the bivariate distributions that were used in this dissertation. Note that the "r" chart was modified by using LCL = 0 instead of the recommended LCL = $\alpha$ due to the negligible difference between the pre-specified type I error rate of 0.005 and zero. Therefore, the process was declared out-of-control for any observation that was on the boundary of the data cloud that is determined by the base period sample. The performance of the modified "r" chart was found (by simulation) to be identical to the performance of the chart with LCL = $\alpha$. This is because the only way to achieve a value of the statistic less than $\alpha$ is to observe a depth equal to zero. We were also able to considerably reduce the computation time by using LCL = 0.

The plot shows that the in-control ARL is overestimated when the random deviates are sampled from the bivariate normal distribution. It is also clear that the performance of the "r" chart is poor under the heavier tailed distributions. The type I error rate is very small for the multivariate t distribution with 2 and 5 degrees of freedom. As the degrees of freedom increase, the type I error rate starts to converge to the type I error rate under the multivariate normal distribution. The type I error rate is overestimated for the bivariate mixed normal distribution.
The performance of the "r" chart under the multivariate t distribution with 2 degrees of freedom is unchanged under the various values of the non-centrality parameter. The performances of the "r" chart under multivariate normality and the mixed normal distribution are similar.

Figure 4.2 Plot of ARL versus $\lambda$ performance of the "r" chart under the various bivariate distributions (legend: Normal, t(2), t(5), t(8), t(18), mixed normal).

Figures 4.3 through 4.8 compare the performance of Hotelling's $T^2$ chart with the performance of the "r" chart under the various distributions that were used in this dissertation. Under bivariate normality, the average run lengths of the $T^2$ chart are shorter than those of the "r" chart for values of $\lambda$ between 0 and 2. The performance of the $T^2$ chart is better than the performance of the "r" chart under bivariate normality owing to the smaller base period sample sizes of the $T^2$ chart relative to the "r" chart. The in-control average run lengths of the $T^2$ chart under the various bivariate t distributions are shorter than the pre-specified in-control average run length. On the other hand, the in-control average run lengths of the "r" chart are higher than the pre-specified in-control average run length. The in-control average run length of the $T^2$ chart under the bivariate mixed normal distribution is higher than the pre-specified in-control average run length, whereas the in-control average run length of the "r" chart is lower than the pre-specified in-control average run length.

Figures 4.9 through 4.10 show the average run length performance of the "Q" chart for subgroups of sizes 5 and 10 under the various bivariate distributions that were used in this dissertation study.
The non-centrality parameter is given by $\lambda = (n\,\mu^{T}\Sigma^{-1}\mu)^{1/2}$. Figure 4.9 shows that the average run length performance of the "Q" chart with subgroups of size 5 is poor under both multivariate normality and departures from it. Figure 4.9 indicates that the in-control average run lengths of the "Q" chart are underestimated under all the bivariate distributions. This implies that the chart will give a high rate of false alarms even though the process is in-control.

Figure 4.3 Plot comparing the performance of the $T^2$ chart with the "r" chart under bivariate normality.

Figure 4.4 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 2 d.f.

Figure 4.5 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 5 d.f.

Figure 4.6 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 8 d.f.

Figure 4.7 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 18 d.f.

Figure 4.8 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate mixed normal distribution.

Figure 4.9 Plot of ARL versus $\lambda$ performance of the "Q" chart under the various bivariate distributions.

Figure 4.10 Plot of ARL versus $\lambda$ performance of the "Q" chart under the various bivariate distributions.
Figure 4.10 indicates that the performance of the "Q" chart can be improved upon by increasing the subgroup size. However, it is clear that increasing the subgroup size does not improve the performance dramatically. The type I error rates are still higher than the pre-specified type I error rate.

Figures 4.11 through 4.16 compare the performance of Hotelling's $T^2$ chart with the performance of the "Q" chart with subgroups of size 5 under the various distributions that were used in this dissertation study. Under bivariate normality, the in-control average run length of the $T^2$ chart is closer to the pre-specified in-control average run length than the in-control average run length of the "Q" chart. The average run lengths of the $T^2$ chart under the out-of-control states are shorter than the average run lengths of the "Q" chart under the out-of-control states. This is a direct consequence of the relationship between the type I error rate ($\alpha$) and the power of a hypothesis test. As the type I error rate increases so does the power of the hypothesis test. The in-control average run lengths of the $T^2$ chart are shorter than the in-control average run lengths of the "Q" chart under the bivariate t distribution with 2 and 5 degrees of freedom. On the other hand, the in-control average run lengths of the $T^2$ chart are higher than the in-control average run lengths of the "Q" chart under the bivariate t distribution with 8 and 18 degrees of freedom. The poor performance of the "Q" chart can be attributed to its cut-off value, which is based on the asymptotic distribution of the "Q" statistic. We feel that the performance of the "Q" chart can be improved upon by simulating the cut-off value. However, since the FORTRAN programs to compute data depths are very computer intensive, this method may require a considerable amount of computer time.
Figure 4.11 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate normal distribution.

Figure 4.12 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 2 d.f.

Figure 4.13 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 5 d.f.

Figure 4.14 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 8 d.f.

Figure 4.15 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 18 d.f.

Figure 4.16 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate mixed normal distribution.

Figure 4.17 shows the performance of the "S" chart under various bivariate distributions. Note that the in-control average run lengths are overestimated under all the bivariate distributions that were used in this dissertation. The in-control average run length is greatest for the mixed normal distribution. The long in-control average run lengths are due to the inability of the "S" CUSUM to reset to zero whenever the statistic $S_n(G_m)$ exceeds zero.

Figures 4.18 through 4.23 compare the performance of the multivariate CUSUM charts that were proposed by Crosier (1988) and Pignatiello and Runger (1990) with the performance of the "S" chart under the various bivariate distributions that were used in this dissertation.
Under bivariate normality, both the normal theory multivariate CUSUM charts attain the pre-specified in-control average run lengths and do very well in detecting out-of-control states. On the other hand, the "S" chart has an in-control average run length that is greater than the pre-specified in-control average run length. The out-of-control average run lengths of the "S" chart are greater than the out-of-control average run lengths of both the normal theory multivariate CUSUM charts. The in-control average run lengths of the normal theory CUSUM charts are shorter than the pre-specified in-control average run length for the various non-normal distributions. On the other hand, the in-control average run lengths of the "S" chart are greater than the pre-specified in-control average run length under the various non-normal distributions. This is a direct consequence of the relationship between the type I error rate ($\alpha$) of a hypothesis test and the power of the test. Also note that the in-control average run lengths of the "S" chart are high since it does not have the reset feature that is typical of the normal theory CUSUM procedures. In particular, although the "S" statistic is reset to zero when the process is declared out-of-control, it is not reset to zero when it takes a positive value.

Figure 4.17 Plot of ARL versus $\lambda$ for the "S" chart under the various bivariate distributions (legend: Normal, t(2), t(5), t(8), t(18), mixed normal).

Figure 4.18 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate normal distribution.

Figure 4.19 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 2 d.f. |

Full Text |

ROBUST MULTIVARIATE CONTROL CHARTS

By

VIVEK BALRAJ AJMANI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1998

To God for giving me strength, my family for their support, and Preeti for her love and friendship.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Dr. Geoffrey Vining for being my advisor. Everything I know about industrial statistics is directly attributable to his vast knowledge of the area and his excellent ability to impart this knowledge to his students. I look forward to continued collaboration with him in the future. I would also like to thank Dr. Ronald Randles for his continued support, encouragement, and kindness. Thanks also to Dr. Malay Ghosh, Dr. John Cornell, and Dr. Dianne Schaub for being on my dissertation committee. I would also like to thank Dr. William Woodall (University of Alabama) for agreeing to critique this dissertation, Dr. Thomas Hettmansperger (Pennsylvania State University) for allowing me to use his algorithm for the H chart, and Dr. Peter Rousseeuw (University of Antwerp) for supplying me with the FORTRAN code to compute data depths.

In addition, I would like to thank my family for their constant words of encouragement. This dissertation would not have been possible without their support. In particular, I would like to thank my nephews, Vinay, Timothy, and Arvin, and my niece, Anjali, for providing me with hours of fun and laughter.

Words are not enough to express thanks to my wife, Preeti. Preeti has been my "shelter from the storm."
Her love, devotion, and kind words of encouragement have helped me get through some difficult and often very frustrating days. This dissertation is hers more than it is mine.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTERS

1 INTRODUCTION
2 REVIEW OF LITERATURE
  2.1 The Shewhart Control Chart
  2.2 The Cumulative Sum Chart (CUSUM)
  2.3 The Exponentially Weighted Moving Average Chart (EWMA)
  2.4 A Review of Liu (1995)
3 ROBUSTNESS OF THE NORMAL THEORY MULTIVARIATE CONTROL CHARTS
  3.1 Distributions Used in the Simulation Study
  3.2 The Simulation Strategy
  3.3 The Performance of the χ² Chart
  3.4 The Performance of Hotelling's T² Chart
  3.5 Performance of the Multivariate CUSUM Charts
  3.6 Performance of the Multivariate EWMA Chart (MEWMA)
4 ROBUSTNESS OF LIU'S (1995) NONPARAMETRIC MULTIVARIATE CONTROL CHARTS
  4.1 The Notion of Data Depths
  4.2 The "r" Chart
  4.3 The "Q" Chart
  4.4 The "S" Chart
  4.5 Discussion of Simulation Results
5 ROBUST MULTIVARIATE CONTROL CHARTS UNDER A KNOWN COVARIANCE MATRIX
  5.1 Affine Invariant Multivariate One Sample Sign And Sign Rank Tests
  5.2 Robust Multivariate Shewhart Type Charts
  5.3 A Robust Multivariate Exponentially Weighted Moving Average Chart
6 AFFINE INVARIANT ROBUST MULTIVARIATE CONTROL CHARTS UNDER AN UNKNOWN COVARIANCE MATRIX
  6.1 Affine Invariant Multivariate Shewhart Type Charts
  6.2 An Affine-Invariant Multivariate EWMA Chart
7 SUMMARY AND CONCLUSIONS

APPENDIX: FORTRAN PROGRAMS
  1. ARL of the χ² Chart for Individual Observations
  2. ARL of the T² Chart for Individual Observations
  3. ARL of the T² Chart for Subgroups of Observations
  4. ARL of Crosier's (1988) Multivariate CUSUM
  5. ARL of Pignatiello and Runger's (1990) Multivariate CUSUM
  6. ARL of the Normal Theory Multivariate EWMA Chart (r = 0.30)
  7. ARL of the "r" Chart
  8. ARL of the "Q" Chart
  9. ARL of the "S" Chart
  10. ARL of the RST Chart
  11. ARL of the PR-SRT Chart
  12. ARL of the Robust EWMA Chart
  13. ARL of the V(n) Chart
  14. ARL of the W(n) Chart
  15. ARL of the H Chart

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

Figure 3.1 Plot of ARL versus λ of the χ² chart for individual observations
Figure 3.2 Plot of the ARL versus λ of the χ² chart with subgroups of size 5
Figure 3.3 Plot of ARL versus λ of the T² chart for individual observations under bivariate normality
Figure 3.4 Plot of ARL versus λ of the T² chart for individual observations under the bivariate Cauchy distribution
Figure 3.5 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 2 degrees of freedom
Figure 3.6 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 5 degrees of freedom
Figure 3.7 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 8 degrees of freedom
Figure 3.8 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 18 degrees of freedom
Figure 3.9 Plot of ARL versus λ of the T² chart for individual observations under the bivariate contaminated normal distribution
Figure 3.10 Plot of ARL versus λ of the T² chart for individual observations under the bivariate normal distribution
Figure 3.11 Plot of ARL versus λ of the T² chart for individual observations under the bivariate Cauchy distribution
Figure 3.12 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 2 degrees of freedom
Figure 3.13 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 5 degrees of freedom
Figure 3.14 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 8 degrees of freedom
Figure 3.15 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 18 degrees of freedom
Figure 3.16 Plot of ARL versus λ of the T² chart for individual observations under the bivariate mixed normal distribution
Figure 3.17 Plot of ARL versus λ of the T² chart for individual observations under the bivariate normal distribution
Figure 3.18 Plot of ARL versus λ of the T² chart for individual observations under the bivariate Cauchy distribution
Figure 3.19 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 2 degrees of freedom
Figure 3.20 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 5 degrees of freedom
Figure 3.21 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 8 degrees of freedom
Figure 3.22 Plot of ARL versus λ of the T² chart for individual observations under the bivariate t distribution with 18 degrees of freedom
Figure 3.23 Plot of ARL versus λ of the T² chart for individual observations under the bivariate mixed normal distribution
Figure 3.24 Plot of ARL versus λ of the T² chart for subgroups of size 5 and a base period of size 25 under the various bivariate distributions
Figure 3.25 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate normal distribution
Figure 3.26 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate Cauchy distribution
Figure 3.27 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 2 degrees of freedom
Figure 3.28 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 5 degrees of freedom
Figure 3.29 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 8 degrees of freedom
Figure 3.30 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 18 degrees of freedom
Figure 3.31 Plot of ARL versus λ showing the performance of the multivariate CUSUM procedures under the bivariate mixed normal distribution
Figure 3.32 Plot of the ARL versus λ for the MEWMA chart for individual observations under different distributions
Figure 4.1 Illustrating the values of the indicator function in the bivariate case
Figure 4.2 Plot of ARL versus λ performance of the "r" chart under the various bivariate distributions
Figure 4.3 Plot comparing performance of the T² chart with the "r" chart under bivariate normality
Figure 4.4 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 2 d.f.
Figure 4.5 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 5 d.f.
Figure 4.6 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 8 d.f.
Figure 4.7 Plot comparing the performance of the T² chart with the "r" chart under the bivariate t distribution with 18 d.f.
Figure 4.8 Plot comparing the performance of the T² chart with the "r" chart under the bivariate mixed normal distribution
Figure 4.9 Plot of ARL versus λ performance of the Q chart under the various bivariate distributions
Figure 4.10 Plot of ARL versus λ performance of the Q chart under the various bivariate distributions
Figure 4.11 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate normal distribution
Figure 4.12 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 2 d.f.
Figure 4.13 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 5 d.f.
Figure 4.14 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 8 d.f.
Figure 4.15 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate t distribution with 18 d.f.
Figure 4.16 Plot comparing the performance of the T² chart with the "Q" chart under the bivariate mixed normal distribution
Figure 4.17 Plot of ARL versus λ for the "S" chart under the various bivariate distributions
Figure 4.18 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate normal distribution
Figure 4.19 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 2 d.f.
Figure 4.20 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 5 d.f.
Figure 4.21 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 8 d.f.
Figure 4.22 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 18 d.f.
Figure 4.23 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate mixed normal distribution
Figure 5.1 Plot of ARL versus λ for the RST with n = 5 under various bivariate distributions
Figure 5.2 Plot of ARL versus λ for the RST with n = 10 under various bivariate distributions
Figure 5.3 Plot of ARL versus λ for the PR-SRT with n = 5 under various bivariate distributions
Figure 5.4 Plot of ARL versus λ for the PR-SRT with n = 10 under various bivariate distributions
Figure 5.5 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under bivariate normality
Figure 5.6 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 2 d.f.
Figure 5.7 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 5 d.f.
Figure 5.8 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 8 d.f.
Figure 5.9 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate t distribution with 18 d.f.
Figure 5.10 Plot comparing the T², the "Q", the RST, the PR-SRT, and the χ² charts under the bivariate mixed normal distribution
Figure 5.11 Plot of ARL versus λ for the RMEWMA with r = 0.10 under various bivariate distributions
Figure 6.1 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate normal distribution
Figure 6.2 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
Figure 6.3 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
Figure 6.4 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
Figure 6.5 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
Figure 6.6 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate mixed normal distribution
Figure 6.7 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate normal distribution
Figure 6.8 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
Figure 6.9 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
Figure 6.10 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
Figure 6.11 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
Figure 6.12 Plot comparing the performance of the V_n chart for different base period sample sizes and under the bivariate mixed normal distribution
Figure 6.13 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate normal distribution
Figure 6.14 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
Figure 6.15 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
Figure 6.16 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
Figure 6.17 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
Figure 6.18 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate mixed normal distribution
Figure 6.19 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate normal distribution
Figure 6.20 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
Figure 6.21 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
Figure 6.22 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
Figure 6.23 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
Figure 6.24 Plot comparing the performance of the W_n chart for different base period sample sizes and under the bivariate mixed normal distribution
Figure 6.25 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate normal distribution
Figure 6.26 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
Figure 6.27 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
Figure 6.28 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
Figure 6.29 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
Figure 6.30 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate mixed normal distribution
Figure 6.31 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate normal distribution
Figure 6.32 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 2 d.f.
Figure 6.33 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 5 d.f.
Figure 6.34 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 8 d.f.
Figure 6.35 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 18 d.f.
Figure 6.36 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate mixed normal distribution
Figure 6.37 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate normal distribution
Figure 6.38 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate t distribution with 2 d.f.
Figure 6.39 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate t distribution with 5 d.f.
Figure 6.40 Plot comparing the performance of the T², "Q", RST, PR-SRT, V_n, W_n, and H charts under the bivariate t distribution with 8 d.f.
142 Figure 6.41 Plot comparing the performance of the T2 , "Q", RST, P R SRT, Vn, W,,, and H charts under the bivariate t distribution with 18 d .f. ... . .... ..................... ... 143 Figure6.42 Plotcomparingtheperformanceofthe T2, "Q", RST , PR-SRT, Vn, W,,, and H charts under the bivariate mixed normal distribution ....... .... .... . ..... . ....... .... 143 Figure 6.43 Plot showing the performance of the Vn EWlv!A chart under the various bivariate distributions ...... . ... . . ..... .... . .... ... . . . ... .......... . ..... . . . .... . ..... ........ ....... .... ....... . 146 Figure 6.44 Plot comparing the performance of Lowry et al. (1992), REWMA , and Vn EWlv!A charts under the bivariate normal distribution . . ...... . . ........ . . . ..... . ...... . . 14 7 Figure 6 .4 5 Plot comparing the performance of Lowry et al. ( 1992 ), REWlv!A , and Vn -EWMA charts under the bivariate t distribution with 2 d.f. ..... . ... . ... . ... .... ..... 147 Figure 6.46 Plot comparing the performance ofLowry et al. (1992) , REWMA, and Vn EWlv!A charts under the bivariate t distribution with 5 d.f. ....................... ... 148 Figure 6.47 Plot comparing the performance of Lowry et al. (1992), REWMA, and Vn EWlv!A charts under the bivariate t distribution with 8 d.f. . ....... .... ............. . 148 Figure 6.48 Plot comparing the performance of Lowry et al. (1992) , REWlv!A, and Vn EWlv!A charts under the bivariate t distribution with 18 d.f. ....... . .... . .... .... . . . 149 Figure 6.49 Plot comparing the performance of Lowry et al. (1992), REWlv!A, and V" EWlv!A charts under the bivariate mixed normal distribution . .... . .... .... .......... 149 Xlll PAGE 14 Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosoph y ROBUST MULTIVARIATE CONTROL CHARTS By Chairman : G . 
Geoffrey Vining Major Department : Statistics Vivek Balraj Ajmani May 1998 Control charts are one of the most powerful tools for monitoring a process . Uni v ariate control charts are useful for monitoring processes that manufacture products with a single quality characteristic of interest. In many cases , products may be characterized by two or more quality characteristics that jointly determine the usefulness or the quality of the product. In man y ins tances , these quality characteristi c s ar e correlated and , therefore , alternative multivariate control chart techniques are required to monitor the process that manufactures such products . The performance of the multivariate control chart procedures that are currently being used in industry and that are being cited in the literature have been studied under the assumption that the underlying distribution of the process is multivariate normal. It is well known that in reality this assumption rarely holds . Our results indicate t hat the normal theory multivariate control charts p e rform poorly when departures from multivariate normality occur. Alternatives to the normal theory multivariat e control c h a rts a re needed XIV PAGE 15 in case the assumption of multivariate normality fails to hold . One such alternative is based on the notion of data depths which leads to non-parametric multivariate control charts. However , our simulation studies indicate that the performances of the data depth multivariate control charts are poor under both multivariate normality and under departures from it. We propose robust alternatives which are based on affine-invariant one-sample multivariate versions of the sign and sign-rank hypotheses tests . These hypotheses tests are used to construct multivariate Shewhart type and exponentially weighted moving average (EWMA) charts. 
Our simulation results indicate that the performance of the proposed charts is comparable to the performance of the normal theory and the data depth based multivariate control charts under the assumption of multivariate normality. On the other hand, the performance of the proposed charts is an improvement over the performance of the normal theory and the data depth based multivariate control charts under departures from multivariate normality.

CHAPTER 1
INTRODUCTION

Global competition and increased consumer awareness have made American industries focus heavily on quality control issues. Statistical Process Control (SPC) provides an important set of tools for achieving quality control objectives. These tools help in achieving process stability through the reduction of process variability. Stable processes are required to meet the consumer's fitness for use criteria. Ideally, processes should operate with little variability around the target of the quality characteristic in question. Assignable causes or process shifts must be detected quickly so that corrective actions can be taken before many non-conforming units are produced.

Control charts, developed in the 1920s by Walter A. Shewhart of Bell Laboratories, are one of the most powerful on-line techniques for controlling process variability. They help in monitoring a process so that efforts can be made to improve the process. Since their introduction, control charts have gained wide usage and acceptance in industry, particularly in the manufacturing sector.

Univariate control charts are useful for manufactured items with only one quality characteristic of interest. For instance, the quality of a compact disc may be characterized by its diameter. A compact disc must have a diameter that is less than or equal to the diameter of the disc holder in the compact disc player. Univariate control charts may be used in this case to monitor the target diameter of a compact disc.
The objective would be to detect any deviations from the target diameter that individual discs or sample means exhibit as soon as they occur.

In many manufacturing situations, products may have two or more quality characteristics. In such cases the usefulness of the product is determined jointly by these quality variables, which often are correlated. As an example, the quality of certain types of tablets may be determined by their weight, degree of hardness, thickness, width, and length. These quality variables are correlated and, therefore, alternative methods of control are needed, since control charts for monitoring individual quality characteristics may not be adequate for detecting changes in the quality of such products. These methods are collectively classified as multivariate quality control techniques, and the control charts based on such procedures are called multivariate quality control charts.

Multivariate quality control was first introduced by Hotelling in 1947 in the testing of bomb sights. Two bomb sights from a lot of size 20 were selected at random. The sights were tested by taking two flights and dropping four bombs on each flight. The range error (measured in the flight direction of the airplane) and the deflection error (measured perpendicular to the direction of the flight) were used as a measure of the quality of the bomb sights (Alt, 1984). Hotelling introduced a multivariate Shewhart type chart (the $T^2$ chart) for monitoring this process (Hotelling, 1947). Jackson (1956, 1959) introduced a control ellipse which produced the same result as the $T^2$ control chart proposed by Hotelling. The control ellipse and the $T^2$ chart are similar in the sense that points which plot out-of-control on the $T^2$ chart also plot out-of-control on the control ellipse. Since then, various authors have studied the performance and properties of the $T^2$ chart.
Alt (1984) and Jackson (1985) give a thorough literature review on this topic.

One of the drawbacks of the Shewhart control chart is its insensitivity to small shifts in the process mean. Alternatives to the Shewhart control chart when small shifts in the process mean are of interest include the cumulative sum (CUSUM) chart and the exponentially weighted moving average (EWMA) chart. The Shewhart, CUSUM, and EWMA charts differ with respect to how each chart uses the data from the production process. The Shewhart chart places all the weight on the most recent observation and, therefore, ignores information from past data. This makes the Shewhart chart insensitive to small shifts in the process mean. The CUSUM chart is based on the cumulative sum of the deviations of the observations from the target mean of the process and, therefore, each observation in this sum is equally weighted. Thus, the CUSUM chart uses information from both recent and past data. The EWMA chart is based on a statistic that gives less weight to past data than to present or more recent data through the use of a weighting constant. The weighting constant in the EWMA chart depends on the magnitude of the shift in the process mean that needs to be detected. The EWMA chart can be designed to behave like the Shewhart or the CUSUM chart by choosing an appropriate weighting constant. The CUSUM and the EWMA charts are, therefore, more sensitive to small shifts than the Shewhart chart since they make use of information from past data.

The multivariate generalizations of the CUSUM and EWMA charts have been studied extensively. See, for example, Woodall and Ncube (1985), Crosier (1988), Pignatiello and Runger (1990), and Lowry et al. (1992). The average run length (ARL) performance of these charts was shown to be an improvement over that of the multivariate Shewhart charts, particularly for small shifts in the process mean.
The average run length of a control chart is the average number of observations or samples that need to be collected before the control chart gives an out-of-control signal.

The performances of the multivariate Shewhart, CUSUM, and EWMA charts that are currently being used in industry and are cited in the literature have been studied under the assumption that the underlying distribution of the process is multivariate normal. It is well known that this assumption is rarely true in practice. Alternative methods of control are needed in case the assumption of multivariate normality is violated. Research in the area of robust multivariate control charts is needed. Liu (1995) proposed three non-parametric multivariate control charts that require no assumptions about the underlying distribution of the process. However, Liu did not conduct average run length studies of the proposed charts. Therefore, we cannot compare the performance of her charts with that of the normal theory based multivariate control charts. The performance of a control chart is measured by its in-control and out-of-control average run lengths. Typically, we would require a control chart to maintain its pre-specified in-control average run length and to quickly detect out-of-control states in a process.

The objectives of this dissertation are threefold. First, we provide a thorough literature review of both the normal theory and the non-parametric multivariate quality control charts. Secondly, we do a comprehensive study of the average run length performance of the existing methods under departures from the multivariate normality assumption. This includes the study of the average run length performance of the procedures suggested in Liu (1995). Finally, we propose robust alternatives to the existing multivariate control chart methods.
The new methods are based on affine invariant multivariate one sample tests that were developed by Hettmansperger et al. (1994), Peters and Randles (1990), and Randles (1989). The performances of the proposed multivariate control charts are similar to those of the existing methods or charts when the underlying distribution of the process is multivariate normal and are better than the performances of the existing methods under departures from the multivariate normality assumption.

This dissertation provides an important contribution to the field of multivariate quality control charts in several ways. First, this dissertation gives a concise summary of the methods that are currently being used to solve multivariate quality control problems. Secondly, this dissertation explores the behavior of the existing methods under departures from the multivariate normality assumption. Next, we propose alternative methods that are shown to be more robust than the existing methods under deviations from multivariate normality. The new methods are robust in the sense that they maintain their pre-specified type I error rate (and, therefore, the pre-specified in-control average run length) and detect out-of-control conditions quickly under both multivariate normality and deviations from multivariate normality.

Another important contribution of this dissertation relates to its effort to bridge the gap between theoretical and applied statistics. We use the affine invariant multivariate one sample tests that were developed by Hettmansperger et al. (1994), Peters and Randles (1990), and Randles (1989) to solve a problem in industrial statistics. That is, these affine invariant tests form a basis for the multivariate quality control charts that are proposed.

The literature review in Chapter 2 is intended to acquaint the reader with the normal theory and the non-parametric multivariate quality control chart procedures that have been suggested in the literature.
In Chapter 3, we thoroughly investigate the performance of the normal theory multivariate control charts under departures from the multivariate normality assumption. In Chapter 4, we investigate the behavior of the multivariate control charts that were proposed by Liu (1995). In Chapter 5, we propose robust multivariate control charts under the assumption that the variance-covariance matrix of the underlying distribution of the process is known. Chapter 6 extends Chapter 5 to the case when the variance-covariance matrix of the underlying distribution of the process is unknown. Chapter 7 contains conclusions and potential areas of further research into multivariate quality control charts.

CHAPTER 2
REVIEW OF LITERATURE

This chapter is intended to acquaint the reader with the multivariate quality control chart procedures that have been proposed in the literature. We discuss the normal theory multivariate Shewhart, CUSUM, and EWMA charts as well as the non-parametric multivariate control charts that were introduced by Liu (1995).

The multivariate Shewhart charts are discussed in Section 2.1. The theory underlying these charts is a simple extension of the theory underlying the univariate Shewhart charts. We, therefore, discuss the univariate Shewhart charts first. The multivariate CUSUM and EWMA charts are discussed in Sections 2.2 and 2.3, respectively. The underlying theory behind these charts is also a straightforward extension of the underlying theory behind the univariate CUSUM and EWMA charts. We, therefore, discuss the univariate CUSUM and EWMA charts first. We briefly discuss Liu's non-parametric charts in Section 2.4. A detailed discussion of these charts along with simulation results is presented in Chapter 4.

2.1 The Shewhart Control Chart

The Shewhart control chart is perhaps the most widely used control chart in statistical process control. Developed in the 1920s by Walter A. Shewhart of Bell Labs, it has gained wide acceptance and usage in industry. Montgomery (1991) presents a detailed list of references and a good overview of the theoretical background and applications of the Shewhart chart. We begin our discussion by first describing the univariate Shewhart chart.

Let the quality characteristic of a manufactured item be denoted by $X$. An example is where $X$ represents the inside diameter of forged piston rings. Assume that $X \sim N(\mu_0, \sigma_0^2)$, where both $\mu_0$ and $\sigma_0^2$ are known, and consider a sample $X_1, \ldots, X_n$ from the manufacturing process. A univariate Shewhart control chart for the process mean is given by the following characteristics:

$$\mathrm{UCL} = \mu_0 + z_{\alpha/2}\,\frac{\sigma_0}{\sqrt{n}}, \qquad \mathrm{CL} = \mu_0, \qquad \mathrm{LCL} = \mu_0 - z_{\alpha/2}\,\frac{\sigma_0}{\sqrt{n}} \qquad (2.1)$$

where UCL, CL, and LCL are the upper control limit, the target or center line, and the lower control limit, respectively, and $z_{\alpha/2}$ is the upper $(\alpha/2)$th quantile of the standard normal distribution. For successive samples of size $n$, this control chart can be viewed, when the values of the means of the successive samples are plotted on it, as repeated tests of hypothesis of the form $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$ at the $\alpha$ level of significance. The regions above UCL and below LCL represent the rejection regions of the likelihood ratio test of the above hypothesis.

Several authors argue that control charts should not be viewed as repeated tests of hypothesis. We, however, disagree with that view. We believe that the advantages of formally viewing a control chart as a sequence of hypothesis tests clearly outweigh any disadvantages. Further, by viewing a control chart as a sequence of tests, we provide a formal basis for introducing novel, robust control chart procedures. For additional discussion on the relationship between control charts and hypothesis testing, see Woodall and Faltin (1996).

Most often, the nominal values of $\mu_0$ and $\sigma_0^2$ are unknown.
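As a rough illustration of the known-parameter limits in Equation (2.1), the following Python sketch computes UCL, CL, and LCL and checks whether a sample mean signals. The function names and the numerical values (target 10, standard deviation 2, samples of size 4) are hypothetical and chosen only for illustration; they do not come from the dissertation.

```python
from math import sqrt
from statistics import NormalDist


def shewhart_limits(mu0, sigma0, n, alpha):
    """Return (LCL, CL, UCL) for the sample mean, following Equation (2.1)."""
    # Upper (alpha/2)th quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma0 / sqrt(n)
    return mu0 - half_width, mu0, mu0 + half_width


def out_of_control(sample, limits):
    """True if the sample mean falls above UCL or below LCL."""
    lcl, _, ucl = limits
    xbar = sum(sample) / len(sample)
    return xbar < lcl or xbar > ucl


# Hypothetical illustration values (assumptions, not taken from the text).
limits = shewhart_limits(mu0=10.0, sigma0=2.0, n=4, alpha=0.0027)
print(limits)
print(out_of_control([14.0, 13.5, 13.0, 14.0], limits))
```

With $\alpha = 0.0027$ the quantile is very close to 3, so the sketch reproduces the familiar three-sigma limits for the sample mean.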
A typical way of estimating these unknown quantities is by taking $m$ preliminary samples of size $n$ each over a base period when the process is assumed to be in-control. The target mean $\mu_0$ is estimated by $\bar{\bar{X}}$, which is given by

$$\bar{\bar{X}} = \frac{1}{m}\sum_{i=1}^{m}\bar{X}_i$$

where $\bar{X}_i$ is the $i$th sample mean, and the process variance $\sigma_0^2$ is estimated by $S^2$, which is defined as

$$S^2 = \frac{1}{m}\sum_{i=1}^{m}S_i^2$$

where $S_i^2$ is the $i$th sample variance. Substituting these estimators in the control limits given in Equation (2.1), we get

$$\mathrm{UCL} = \bar{\bar{X}} + A\sqrt{S^2}, \qquad \mathrm{CL} = \bar{\bar{X}}, \qquad \mathrm{LCL} = \bar{\bar{X}} - A\sqrt{S^2} \qquad (2.2)$$

where $A = z_{\alpha/2}/\sqrt{n}$ and is tabulated in standard quality control text books such as Montgomery (1991). The control limits given in Equation (2.2) are called trial limits, and if the sample means from the preliminary samples fall between UCL and LCL, then these limits can be used for future control. If one or more of the preliminary sample means fall above UCL or below LCL, and if we know why they are out-of-control, the corresponding samples are dropped and the control limits are recalculated. This procedure is continued until all the preliminary sample means fall within the control limits.

In many instances, a manufactured item may have two or more quality characteristics that jointly define the usefulness of the product. For example, consider a bearing that has both an inner ($X_1$) and an outer ($X_2$) diameter. Suppose that $X_1$ and $X_2$ have a bivariate normal distribution with $\mathrm{Cov}(X_1, X_2) \neq 0$. One way of monitoring the values of these quality characteristics is by constructing two separate univariate Shewhart charts. The process is said to be in-control only if the sample means $\bar{X}_1$ and $\bar{X}_2$ both fall within their respective control limits. This method of control yields a joint rectangular control region, and the process is said to be in-control if the point $(\bar{X}_1, \bar{X}_2)$ falls inside this region. There are two major problems associated with this approach. The first deals with wrong probability statements.
Assuming independence, suppose that we use a type I error rate of 0.05 to construct control charts for each of the two quality characteristics in the above example. The probability that each sample mean falls within its respective control limits is 0.95. However, the probability that both sample means simultaneously fall within their control limits is $0.95^2 = 0.9025$, producing an inflated type I error of about 10%. Furthermore, the magnitude of inflation increases as the number of quality characteristics increases. In general, if there are $p$ independent quality characteristics and if $p$ univariate $\bar{X}$ charts, each with Pr(type I error) $= \alpha$, are combined into a single chart, then the true probability of a type I error is $1 - (1 - \alpha)^p$. In most cases the $p$ quality characteristics are correlated and, therefore, this formula cannot be used to compute the effective type I error rate.

A second problem in using separate control charts for two or more quality characteristics is the resulting conflicting answers regarding the signals. Conflicting answers arise because, under the assumption of a normal distribution (assuming unequal variances), the contours of constant probability are ellipses and not the square or rectangular region as previously stated. We could, therefore, falsely claim that both $X_1$ and $X_2$ are in-control when at least one is not, or that one or both of $X_1$ and $X_2$ are out-of-control when, in fact, they are both in-control.

The general multivariate quality control problem consists of a repetitive process in which each item is characterized by $p$ quality characteristics $X_1, \ldots, X_p$. The underlying distribution of the $p$ random variables is assumed to be multivariate normal with a known mean vector $\boldsymbol{\mu}$ and a known variance-covariance matrix $\boldsymbol{\Sigma}$.
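The inflation of the type I error discussed earlier in this section, $1 - (1 - \alpha)^p$ under independence, is easy to check numerically. The sketch below is only an illustration. The function names are hypothetical, and the second function (the per-chart level that restores a desired overall level under independence) is an addition for illustration that is not part of the original discussion.

```python
def joint_type1_error(alpha, p):
    """Overall type I error of p independent charts, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** p


def per_chart_alpha(overall_alpha, p):
    """Per-chart level giving the desired overall level, assuming independence.

    This inversion is an illustrative addition, not a formula from the text.
    """
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / p)


# The overall error rate grows quickly with the number of characteristics.
for p in (1, 2, 5, 10):
    print(p, round(joint_type1_error(0.05, p), 4))
```

As the text notes, neither function applies when the quality characteristics are correlated, which is the usual case in practice.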
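The multivariate Shewhart control chart procedure can be viewed as a sequence of hypothesis tests of $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ versus $H_a: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$. The likelihood ratio test for this set of hypotheses specifies that the null hypothesis be rejected if

$$\chi^2 = n(\bar{\mathbf{X}} - \boldsymbol{\mu}_0)^T \boldsymbol{\Sigma}^{-1} (\bar{\mathbf{X}} - \boldsymbol{\mu}_0) > \chi^2_{p,\alpha} \qquad (2.3)$$

where $\bar{\mathbf{X}}$ denotes the $(p \times 1)$ vector of sample means and $\chi^2_{p,\alpha}$ is the upper $\alpha$th quantile of the Chi-square distribution with $p$ degrees of freedom. The control chart is formed by letting

$$\mathrm{UCL} = \chi^2_{p,\alpha}, \qquad \mathrm{LCL} = 0$$

and plotting the values of $\chi^2$. We conclude that the process is out-of-control if any of the collected samples yields a value of $\chi^2$ that falls above UCL.

Most often, both $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}$ are unknown and, therefore, have to be estimated from a base period of $m$ samples, each of size $n$, when the process is assumed to be in-control. The sample means and variances are then calculated as follows:

$$\bar{X}_{jk} = \frac{1}{n}\sum_{i=1}^{n} X_{ijk}, \qquad S^2_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{ijk} - \bar{X}_{jk})^2$$

where $X_{ijk}$ is the $i$th observation on the $j$th quality characteristic in the $k$th sample. The covariances between quality characteristics $X_j$ and $X_h$, where $j \neq h$, are given by

$$S_{jhk} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{ijk} - \bar{X}_{jk})(X_{ihk} - \bar{X}_{hk}).$$

The statistics $\bar{X}_{jk}$, $S^2_{jk}$, and $S_{jhk}$ are then averaged over all $m$ samples to get

$$\bar{\bar{X}}_j = \frac{1}{m}\sum_{k=1}^{m}\bar{X}_{jk}, \qquad S_j^2 = \frac{1}{m}\sum_{k=1}^{m}S^2_{jk}, \qquad S_{jh} = \frac{1}{m}\sum_{k=1}^{m}S_{jhk}$$

where $j \neq h$. The averaged sample means $\bar{\bar{X}}_j$ are the elements of the vector $\bar{\bar{\mathbf{X}}}$, and the $p \times p$ sample covariance matrix $\mathbf{S}$ is defined as

$$\mathbf{S} = \begin{pmatrix} S_1^2 & S_{12} & \cdots & S_{1p} \\ S_{12} & S_2^2 & \cdots & S_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ S_{1p} & S_{2p} & \cdots & S_p^2 \end{pmatrix}.$$

We now replace $\boldsymbol{\mu}_0$ by $\bar{\bar{\mathbf{X}}}$ and $\boldsymbol{\Sigma}$ by $\mathbf{S}$ in the $\chi^2$ test statistic given in Equation (2.3) to get

$$T^2 = n(\bar{\mathbf{X}} - \bar{\bar{\mathbf{X}}})^T \mathbf{S}^{-1} (\bar{\mathbf{X}} - \bar{\bar{\mathbf{X}}}).$$

This test statistic is called Hotelling's $T^2$, and the control chart that is based on it is referred to as the Hotelling's $T^2$ control chart. For large sample sizes and under the assumption that the process is in-control, the $T^2$ distribution converges to the $\chi^2$ distribution. Therefore, if $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}$ are estimated from a large number of preliminary samples, then it is customary to use the $\chi^2$ control chart that was described earlier.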
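The estimation steps above can be sketched for the bivariate case ($p = 2$). The following Python code is a minimal illustration under stated assumptions. It averages the sample means and the sample covariance matrices over $m$ preliminary samples of equal size $n$ and then evaluates $T^2$ for each sample. All function names and the data are hypothetical, and the $2 \times 2$ inverse is written out by hand to keep the sketch self-contained.

```python
def mean_vector(sample):
    """Means of the two quality characteristics in one sample."""
    n = len(sample)
    return [sum(obs[j] for obs in sample) / n for j in range(2)]


def covariance_matrix(sample):
    """2 x 2 sample covariance matrix with divisor n - 1."""
    n = len(sample)
    xbar = mean_vector(sample)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for obs in sample:
        for j in range(2):
            for h in range(2):
                s[j][h] += (obs[j] - xbar[j]) * (obs[h] - xbar[h]) / (n - 1)
    return s


def t2_values(samples):
    """Hotelling's T^2 for each of m preliminary samples of equal size n."""
    m, n = len(samples), len(samples[0])
    means = [mean_vector(s) for s in samples]
    covs = [covariance_matrix(s) for s in samples]
    # Average the sample means and covariance matrices over the m samples.
    grand = [sum(mu[j] for mu in means) / m for j in range(2)]
    S = [[sum(c[j][h] for c in covs) / m for h in range(2)] for j in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
    out = []
    for mu in means:
        d = [mu[0] - grand[0], mu[1] - grand[1]]
        quad = sum(d[j] * S_inv[j][h] * d[h] for j in range(2) for h in range(2))
        out.append(n * quad)
    return out


# Hypothetical preliminary data: m = 3 samples, n = 3 bivariate observations each.
samples = [
    [(10.1, 5.2), (9.8, 4.9), (10.3, 5.1)],
    [(9.9, 5.0), (10.2, 5.3), (10.0, 4.8)],
    [(10.4, 5.4), (10.1, 5.0), (9.7, 4.9)],
]
print(t2_values(samples))
```

Each value returned would be compared with the chosen upper control limit. Which limit to use ($\chi^2$ or $T^2$ based) depends on the number of preliminary samples, as discussed in the text.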
For small sample sizes the control chart is formed by letting

$$\mathrm{UCL} = T^2_{\alpha,p,n-p}, \qquad \mathrm{LCL} = 0$$

where $T^2_{\alpha,p,n-p}$ is the upper $\alpha$th quantile from the $T^2$ distribution with $p$ numerator and $n - p$ denominator degrees of freedom. We claim that the process is out-of-control whenever $T^2$ for any sample exceeds UCL.

As in the case of the univariate control chart, the multivariate chart control limits are considered to be trial limits. If all the preliminary sample means yield a $\chi^2$ (or $T^2$) test statistic value that is less than $\chi^2_{p,\alpha}$ ($T^2_{\alpha,p,n-p}$), then we can use these limits for future control. If one or more preliminary sample means yield a $\chi^2$ (or $T^2$) value that exceeds $\chi^2_{p,\alpha}$ ($T^2_{\alpha,p,n-p}$), and if we know why they are out-of-control, the corresponding samples are dropped and the control limits are recalculated. The reader is referred to Alt (1984), Alt and Smith (1988), and Jackson (1980, 1985) for more details on the Hotelling's $T^2$ chart.

Next, we discuss the principal components approach to the multivariate quality control chart problem. The procedure amounts to transforming the correlated quality characteristics into a set of new independent variables which are linear functions of the quality characteristics. The starting point of the statistical applications of the method of principal components is the sample covariance matrix $\mathbf{S}$. The axes of the control ellipsoid correspond to the eigenvectors of $\mathbf{S}$. For example, when $p = 2$, the axes of the control ellipse may be thought of as vectors in two-space that characterize the rotation of the original axes. The length of the major axis is $2\sqrt{\lambda_1 T^2_{\alpha,2}}$ and the length of the minor axis is $2\sqrt{\lambda_2 T^2_{\alpha,2}}$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $\mathbf{S}$ and

$$T^2_{\alpha,2} = \frac{2(N-1)}{N-2}\,F_{\alpha,2,N-2}.$$

Note that $T^2_{\alpha,2}$ and $F_{\alpha,2,N-2}$ are the upper $\alpha$th quantiles of the $T^2$ and $F$ distributions with 2 and $N - 2$ degrees of freedom, respectively.
The coefficients of the first eigenvector of $\mathbf{S}$ are the cosines of the angles between the major axis and the $X_1$ and the $X_2$ axes. Similarly, the coefficients of the second eigenvector of $\mathbf{S}$ are the cosines of the angles between the minor axis and the $X_1$ and the $X_2$ axes.

We can generalize this concept to $p$ quality variables $\mathbf{X}^T = (X_1, \ldots, X_p)$. A principal axis transformation transforms $\mathbf{X}$ into $p$ uncorrelated variables $\mathbf{Y}^T = (Y_1, \ldots, Y_p)$. The coordinate axes of these new variables are described by the vectors $\mathbf{u}_1, \ldots, \mathbf{u}_p$, which make up the columns of the orthonormal matrix $\mathbf{U}$. The columns of $\mathbf{U}$ are the eigenvectors of $\mathbf{S}$. The transformation that yields $\mathbf{Y}$ is given by

$$\mathbf{Y} = \mathbf{U}^T(\mathbf{X} - \bar{\mathbf{X}})$$

where $\mathbf{X}$ and $\bar{\mathbf{X}}$ are the $p \times 1$ vectors of the original quality variables and their sample means. If $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{Y} \sim N_p(\mathbf{0}, \boldsymbol{\Lambda})$, where $\mathbf{0}$ is the $p \times 1$ vector of zeros and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\mathbf{S}$. It can be shown that the determinant of $\mathbf{S}$, $|\mathbf{S}|$, equals the determinant of $\boldsymbol{\Lambda}$, $|\boldsymbol{\Lambda}|$. Similarly, it can be shown that the trace of $\mathbf{S}$, $\mathrm{tr}(\mathbf{S})$, equals the trace of $\boldsymbol{\Lambda}$, $\mathrm{tr}(\boldsymbol{\Lambda})$. Therefore, the proportion of total variability associated with each principal component is given by $(\lambda_i / \mathrm{tr}(\mathbf{S})) \times 100$ for $i = 1, \ldots, p$.

Two alternate and sometimes more desirable ways of scaling the principal components are by using the following transformations:

$$\mathbf{Y}^{*} = \boldsymbol{\Lambda}^{1/2}\mathbf{U}^T(\mathbf{X} - \bar{\mathbf{X}})$$

and

$$\mathbf{Y}^{**} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T(\mathbf{X} - \bar{\mathbf{X}}).$$

The $p \times 1$ vector $\mathbf{Y}^{*}$ has mean $\mathbf{0}$ and variance-covariance matrix equal to $\boldsymbol{\Lambda}^2$. On the other hand, the $p \times 1$ vector $\mathbf{Y}^{**}$ has mean $\mathbf{0}$ and variance-covariance matrix equal to $\mathbf{I}$, the identity matrix of order $p$. The vector $\mathbf{Y}^{**}$ is preferred in the multivariate quality control chart setting since it has an identity covariance matrix. Therefore, each component of $\mathbf{Y}^{**}$ has unit variance. It can be shown that $T^2 = \mathbf{Y}^{**T}\mathbf{Y}^{**}$ (see Jackson, 1959). We can, therefore, plot $\mathbf{Y}^{**T}\mathbf{Y}^{**}$ on the $T^2$ control chart to monitor the process mean.
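The identity between the scaled principal components and the quadratic form can be verified numerically for $p = 2$. The sketch below is only an illustration. It uses a closed-form eigen-decomposition of a symmetric $2 \times 2$ matrix, and the covariance matrix, mean vector, and observation are hypothetical values rather than data from the dissertation.

```python
from math import hypot, sqrt


def eigen_sym2(S):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric 2 x 2 matrix."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        return [pr[0] for pr in pairs], [pr[1] for pr in pairs]
    mid, rad = (a + c) / 2.0, sqrt(((a - c) / 2.0) ** 2 + b * b)
    lams = [mid + rad, mid - rad]
    vecs = []
    for lam in lams:
        v = (b, lam - a)                      # solves (S - lam * I) v = 0
        norm = hypot(v[0], v[1])
        vecs.append((v[0] / norm, v[1] / norm))
    return lams, vecs


def scaled_components(x, xbar, S):
    """Y** = Lambda^{-1/2} U^T (x - xbar): uncorrelated, unit-variance components."""
    lams, vecs = eigen_sym2(S)
    d = (x[0] - xbar[0], x[1] - xbar[1])
    return [(u[0] * d[0] + u[1] * d[1]) / sqrt(lam) for lam, u in zip(lams, vecs)]


def quadratic_form(x, xbar, S):
    """(x - xbar)^T S^{-1} (x - xbar) computed directly from the 2 x 2 inverse."""
    d = (x[0] - xbar[0], x[1] - xbar[1])
    det = S[0][0] * S[1][1] - S[0][1] ** 2
    return (S[1][1] * d[0] ** 2 - 2.0 * S[0][1] * d[0] * d[1] + S[0][0] * d[1] ** 2) / det


# Hypothetical covariance matrix, mean vector, and observation.
S = [[4.0, 1.5], [1.5, 1.0]]
y = scaled_components((12.0, 5.5), (10.0, 5.0), S)
print(sum(v * v for v in y), quadratic_form((12.0, 5.5), (10.0, 5.0), S))
```

The two printed numbers should agree up to rounding, and the eigenvalues returned by `eigen_sym2` sum to the trace of the matrix, which is what the proportion-of-variability formula relies on.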
2.2 The Cumulative Sum Chart (CUSUM)

The cumulative sum (CUSUM) control chart, first proposed by Page (1954), is a powerful and popular method for monitoring industrial processes. Numerous studies on the CUSUM chart's performance and properties have been conducted; see Montgomery (1991) for a list of references on the subject. The CUSUM can be viewed as a sequence of sequential probability ratio tests (SPRT), which are applications of the generalized likelihood ratio test that we describe below.

Consider testing $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$, where $\theta$ is the parameter of interest. The parameter values $\theta_0$ and $\theta_1$ can be looked upon as the in-control and the out-of-control values of the process. Suppose that we begin observing the process at time 0 and that we are able to make a decision whenever some value $\hat{\theta}_i$, $i = 1, 2, \ldots$, suggests there is a problem. Assume that $\hat{\theta}_1, \ldots, \hat{\theta}_n, \hat{\theta}_{n+1}, \ldots$ are appropriate functions of sufficient statistics and consider a procedure that is based on the SPRT. Let the log of the likelihood ratio at time $i$ be given by

$$z_i = \mathrm{Ln}\left[\frac{f(x_i; \theta_1)}{f(x_i; \theta_0)}\right]$$

where the $x$'s are random variables with a probability density function given by $f$ and Ln is the natural log function. For independent observations the log-likelihood statistic at the $n$th step is given by $Z_n = \sum_{i=1}^{n} z_i$, and we conclude that the process is out-of-control whenever $Z_n \geq a$, where $a$ is some constant. Similarly, we can conclude that the process is in-control whenever $Z_n < b$, where $b$ is another constant. For a SPRT, $b$ is typically set at 0 since $b < 0$ slows the procedure's ability to detect an out-of-control state. Therefore, the CUSUM is a sequence of independent SPRTs with the following decision rules:

1. Signal an out-of-control state whenever $Z_n \geq a$.
2. Restart the CUSUM whenever $Z_n \leq 0$.
3. Continue the current CUSUM whenever $0 < Z_n < a$.

To illustrate the basic CUSUM procedure, consider a sequence of individual observations from a normal distribution with mean $\mu$ and variance $\sigma^2$.
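Assume that $\sigma^2$ is known and that $\mu_0$ and $\mu_1$ are the in-control and out-of-control values of the process mean. The log of the likelihood ratio at time $i$ is given by

$$z_i = \frac{\mu_1 - \mu_0}{\sigma^2}\left(x_i - \frac{\mu_0 + \mu_1}{2}\right).$$

Therefore, the critical inequality is given by

$$0 < \sum_{i=1}^{n} z_i < a.$$

For $\mu_1 = \mu_0 + \delta\sigma$, where $\delta$ is the magnitude of the shift, the critical inequality is given by

$$0 < \sum_{i=1}^{n}\left(x_i - \mu_0 - 0.5\,\delta\sigma\right) < \frac{a\sigma}{\delta}.$$

[...]

$$S_n = \min[0,\, S_{n-1} + (x_n - d)]$$

and we signal an out-of-control state whenever $S_n < -h$. The basic parameters of the CUSUM are, therefore, $d$ and $h$. For monitoring a normal mean, the typical value of $d$ for a one $\sigma$ shift in $\mu$ is 0.50.

Woodall and Ncube (1985) extended the univariate CUSUM scheme to the multivariate setting. They described a method for monitoring a $p$-dimensional multivariate normal process by using $p$ two-sided univariate CUSUM charts. Each quality characteristic is controlled by operating a two-sided univariate CUSUM chart. That is, the $j$th two-sided univariate CUSUM is operated by forming the cumulative sums

$$S_{j,t} = \max(0,\, S_{j,t-1} + \bar{x}_{jt} - k_j)$$
$$T_{j,t} = \min(0,\, T_{j,t-1} + \bar{x}_{jt} + k_j)$$

where $S_{j,0} \geq 0$, $T_{j,0} \leq 0$, $k_j > 0$, and $\bar{x}_{jt}$ is the sample mean at time $t$ for quality characteristic $j$. The $j$th two-sided chart signals that the corresponding process mean has shifted when either $S_{j,t} > h_j$ or $T_{j,t} < -h_j$ for some CUSUM chart parameters $k_j$ and $h_j$. The process is declared out-of-control whenever any of the $p$ two-sided charts signals. A disadvantage of Woodall and Ncube's multivariate CUSUM chart is that its performance depends on the direction of the shift of the process mean.

Healy (1987) showed that a univariate CUSUM chart which is based on a linear combination of the quality characteristics can be used in the multivariate case. The performance of Healy's multivariate CUSUM procedure also depends on the direction of the shift of the process mean. This procedure is based on the theory of sequential probability ratio tests and is explained below.

Assume that $\mathbf{X}_1, \ldots, \mathbf{X}_n, \mathbf{X}_{n+1}, \ldots$ are distributed as $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu} = \boldsymbol{\mu}_1$ (the process mean in the in-control state) or $\boldsymbol{\mu} = \boldsymbol{\mu}_2$ (the process mean in the out-of-control state) and the variance-covariance matrix $\boldsymbol{\Sigma}$ is assumed to be known. The likelihood ratio at time $i$ is given by

$$\frac{c_p|\boldsymbol{\Sigma}|^{-1/2}\exp\left(-0.5(\mathbf{X}_i - \boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}_i - \boldsymbol{\mu}_2)\right)}{c_p|\boldsymbol{\Sigma}|^{-1/2}\exp\left(-0.5(\mathbf{X}_i - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}_i - \boldsymbol{\mu}_1)\right)} = \frac{\exp\left(-0.5(\mathbf{X}_i - \boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}_i - \boldsymbol{\mu}_2)\right)}{\exp\left(-0.5(\mathbf{X}_i - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}_i - \boldsymbol{\mu}_1)\right)}$$

$$= \exp\left((\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1)\right)$$

$$= \exp\left((\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)\right)$$

where $c_p = (2\pi)^{-p/2}$. The log of the likelihood ratio at time $i$ is given by

$$(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)$$

and if we let

$$z_i = (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)$$

then, since the observation vectors are considered independent, the log-likelihood statistic at the $n$th step is given by $Z_n = \sum_{i=1}^{n} z_i$. We conclude that the process is out-of-control whenever $Z_n \geq L$, where $L$ is a constant that achieves a pre-specified in-control average run length. The critical inequality is, therefore, given by

$$0 < \sum_{i=1}^{n}\left[(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)\right] < L.$$

Let

$$\gamma = \left[(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)\right]^{1/2}$$

represent the Mahalanobis distance between $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ and divide the critical inequality throughout by $\gamma$ to get

$$0 < \sum_{i=1}^{n}\left[\gamma^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}\mathbf{X}_i - 0.5\,\gamma^{-1}(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)\right] < \frac{L}{\gamma}.$$

[...] If shifts along the line connecting $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ but in the direction away from $\boldsymbol{\mu}_2$ are also considered, then another one-sided CUSUM can be developed and combined with the first one-sided CUSUM to form an ordinary two-sided CUSUM.

The preceding derivation was modeled after the derivation of discriminant analysis by Anderson (1984), which we briefly discuss now. Let $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ be the probability density functions associated with the $p \times 1$ random vector $\mathbf{X}$ as belonging to the populations $\pi_1$ and $\pi_2$, respectively. An object $\mathbf{X}$ with associated sample space $\Omega$ must be assigned to either $\pi_1$ or $\pi_2$.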
Let $R_1$ be the set of $\mathbf{X}$ values for which we classify objects as $\pi_1$ and let $R_2 = \Omega - R_1$ be the remaining $\mathbf{X}$ values for which we classify objects as $\pi_2$. Assume that $\pi_1$ has the multivariate normal distribution with mean $\boldsymbol{\mu}_1$ and variance-covariance matrix $\boldsymbol{\Sigma}$, whereas $\pi_2$ has the multivariate normal distribution with mean $\boldsymbol{\mu}_2$ and variance-covariance matrix $\boldsymbol{\Sigma}$. It can then be shown that a vector of observations $\mathbf{X}$ is classified as belonging to $\pi_1$ if

$$(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}\mathbf{X} - 0.5(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) \geq \mathrm{Ln}\left[\frac{c(1|2)}{c(2|1)}\,\frac{p_2}{p_1}\right]$$

where $p_1$ and $p_2$ are the prior probabilities of $\pi_1$ and $\pi_2$. Note that $c(2|1)$ and $c(1|2)$ are the costs of misclassifying the observation vector $\mathbf{X}$ as $\pi_2$ when it should be classified as $\pi_1$ and vice versa. In the multivariate CUSUM setting, we assume that $p_1 = p_2$ and that $\pi_1$ and $\pi_2$ are the distributions of $\mathbf{X}$ in the in-control and the out-of-control states, respectively. The ratio between the two costs is equivalent to selecting a cut-off value to achieve a pre-specified in-control average run length.

Crosier (1988) proposed two multivariate CUSUM procedures. The first procedure is based on accumulating successive values of $T$, the square root of Hotelling's $T^2$. This multivariate CUSUM is given by

$$S_n = \max(0,\, S_{n-1} + T_n - k_1)$$

where $S_0 \geq 0$ and $k_1 > 0$. This procedure signals that the process is out-of-control whenever $S_n > h_1$, where $h_1 > 0$ is chosen to achieve a pre-specified in-control average run length. Crosier's second chart is based on accumulating the observation vectors using the statistic

$$C_i = \left\{(\mathbf{S}_{i-1} + \mathbf{X}_i)^T\boldsymbol{\Sigma}^{-1}(\mathbf{S}_{i-1} + \mathbf{X}_i)\right\}^{1/2}$$

and

$$\mathbf{S}_i = \begin{cases} \mathbf{0}, & \text{if } C_i \leq k_2 \\ (\mathbf{S}_{i-1} + \mathbf{X}_i)(1 - k_2/C_i), & \text{if } C_i > k_2, \end{cases} \qquad i = 1, 2, \ldots,$$

where $\mathbf{S}_0 = \mathbf{0}$ and $k_2 > 0$. This procedure signals whenever

$$Y_i = \left\{\mathbf{S}_i^T\boldsymbol{\Sigma}^{-1}\mathbf{S}_i\right\}^{1/2} > h_2, \qquad (2.4)$$

where $h_2 > 0$ is chosen to achieve a pre-specified in-control average run length. Average run length studies conducted by Crosier (1988) show that the vector valued multivariate CUSUM is more sensitive to shifts in the process mean than the multivariate CUSUM that is based on accumulating successive values of $T$.
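The vector valued CUSUM of Crosier (1988) described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the observation vectors are already expressed as deviations from the in-control mean and that $\boldsymbol{\Sigma}^{-1}$ is supplied directly. The function names and the values of $k_2$ and $h_2$ are hypothetical and are not design values from the dissertation.

```python
from math import sqrt


def quad_form(v, sigma_inv):
    """v^T Sigma^{-1} v for a vector v and a given inverse covariance matrix."""
    p = len(v)
    return sum(v[j] * sigma_inv[j][h] * v[h] for j in range(p) for h in range(p))


def crosier_cusum(observations, sigma_inv, k2, h2):
    """Return (list of Y_i, index of the first signal or None), with S_0 = 0."""
    p = len(observations[0])
    s = [0.0] * p
    ys, first_signal = [], None
    for i, x in enumerate(observations, start=1):
        total = [s[j] + x[j] for j in range(p)]
        c = sqrt(quad_form(total, sigma_inv))          # C_i
        if c <= k2:
            s = [0.0] * p                              # reset to the zero vector
        else:
            s = [t * (1.0 - k2 / c) for t in total]    # shrink toward zero
        y = sqrt(quad_form(s, sigma_inv))              # Y_i in Equation (2.4)
        ys.append(y)
        if first_signal is None and y > h2:
            first_signal = i
    return ys, first_signal


# Hypothetical example: identity covariance and a sustained shift in X_1.
identity_inv = [[1.0, 0.0], [0.0, 1.0]]
shifted = [(1.0, 0.0)] * 6
print(crosier_cusum(shifted, identity_inv, k2=0.5, h2=2.2))
```

In this sketch the statistic grows by roughly $1 - k_2$ per observation under the sustained shift, while a sequence of observations at the target never leaves zero.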
Pignatiello and Runger (1990) also proposed two multivariate CUSUM charts. The first procedure is based on accumulating successive values of Hotelling's $T^2$. This multivariate CUSUM scheme is given by

$$S_n = \max(0,\, S_{n-1} + T_n^2 - k_3)$$

where $S_0 \geq 0$ and $k_3 > 0$. This procedure signals that the process is out-of-control whenever $S_n > h_3$, where $h_3 > 0$ is chosen to achieve a pre-specified in-control average run length. Pignatiello and Runger's second chart is based on the following vectors of cumulative sums:

$$\mathbf{D}_i = \sum_{j=i-l_i+1}^{i}\mathbf{X}_j$$

and

$$MC_i = \max\left\{0,\ (\mathbf{D}_i^T\boldsymbol{\Sigma}^{-1}\mathbf{D}_i)^{1/2} - k_4 l_i\right\} \qquad (2.5)$$

where $k_4 > 0$ and

$$l_i = \begin{cases} l_{i-1} + 1, & \text{if } MC_{i-1} > 0 \\ 1, & \text{otherwise,} \end{cases} \qquad i = 1, 2, 3, \ldots$$

The process is declared out-of-control whenever $MC_i > h_4$, where $h_4 > 0$ is chosen to achieve a pre-specified in-control average run length. As was the case with Crosier's (1988) procedure, Pignatiello and Runger's vector valued multivariate CUSUM is more sensitive to shifts in the process mean than the multivariate CUSUM procedure that is based on accumulating successive values of $T^2$.

2.3 The Exponentially Weighted Moving Average Chart (EWMA)

The univariate EWMA chart, introduced by Roberts (1959), is another alternative to the Shewhart chart when small shifts in the process mean are of interest. The performance of the EWMA is similar to that of the CUSUM in the sense that both charts are able to quickly detect small shifts in the process mean. The Shewhart, CUSUM, and EWMA control charts differ in how each chart uses the data generated by the production process (Hunter, 1986). An out-of-control signal from the Shewhart chart depends entirely on the most recently plotted point. That is, the weight ($w_t$) given to the most recently plotted point is $w_t = 1$, and the weights given to all previous points are $w_{t-k} = 0$ for $k \geq 1$. The Shewhart chart thus ignores all information in the past data and is insensitive, therefore, to small shifts in the process.
The out-of-control signal from the CUSUM chart depends upon the sum $S_T = \sum_{t=1}^{T} d_t$, where $d_t$ is the deviation of an observation $X_t$ from the target mean. Thus, all the observations in the CUSUM are weighted equally, since all the $d_t$'s in the sum $S_T$ receive equal weight. On the other hand, the EWMA chart is based on a statistic that gives less and less weight to data as they get older. The performance of the EWMA chart, therefore, depends on the weighting constant. A smaller weighting constant leads to an EWMA that is more sensitive to smaller shifts in the process mean, whereas a larger weighting constant leads to an EWMA that is more sensitive to larger shifts. The univariate EWMA statistic is defined as

$$Z_t = \omega \bar{X}_t + (1 - \omega) Z_{t-1},$$

where $0 < \omega < 1$ is the weighting constant, $\bar{X}_t$ is the sample mean at time $t$, and the starting value for the first sample ($t = 1$) is $Z_0 = \bar{\bar{X}}$, where $\bar{\bar{X}}$ is the average of the sample means from $m$ preliminary samples taken when the process is assumed to be in-control. Note that

$$Z_{t-1} = \omega \bar{X}_{t-1} + (1 - \omega) Z_{t-2},$$

and we can substitute this in the formula for $Z_t$ to get

$$Z_t = \omega \bar{X}_t + \omega (1 - \omega) \bar{X}_{t-1} + (1 - \omega)^2 Z_{t-2}.$$

If we continue substituting recursively for $Z_{t-j}$ ($j = 2, 3, \ldots, t$) we get

$$Z_t = \omega \sum_{j=0}^{t-1} (1 - \omega)^j \bar{X}_{t-j} + (1 - \omega)^t Z_0.$$

It is clear that the weights $\omega (1 - \omega)^j$ decrease in value with the age of the sample mean, that is, as the value of $j$ increases. For example, if the most recent sample mean has been assigned a weight of 0.20, then the sample mean at time $t - 1$ gets a weight of 0.16, the sample mean at time $t - 2$ gets a weight of 0.128, and so on. The values of the weights, therefore, decrease geometrically with age. If the sample means $\bar{X}_t$ are independent random variables with variance $\sigma^2 / n$, then it can be shown that the variance of $Z_t$ is

$$\sigma^2_{Z_t} = \frac{\sigma^2}{n} \left(\frac{\omega}{2 - \omega}\right) \left(1 - (1 - \omega)^{2t}\right),$$

with limiting value

$$\sigma^2_{Z} = \frac{\sigma^2}{n} \left(\frac{\omega}{2 - \omega}\right).$$

We can, therefore, form the EWMA chart control limits as follows:

$$LCL = \bar{\bar{X}} - k \sigma \sqrt{\frac{\omega}{(2 - \omega) n}}, \qquad CL = \bar{\bar{X}}, \qquad UCL = \bar{\bar{X}} + k \sigma \sqrt{\frac{\omega}{(2 - \omega) n}},$$

where $k > 0$.
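The recursion, the geometric weights, and the control limits can be checked numerically with the short Python sketch below. It assumes that the half-width of the limits is $k$ times the square root of the limiting variance of the EWMA statistic, $(\sigma^2 / n)\,\omega / (2 - \omega)$; the function names are hypothetical and chosen only for this illustration.

```python
import math

def ewma_weights(omega, t):
    """Weights omega * (1 - omega)**j on the sample mean at lag j = 0..t-1,
    together with the weight (1 - omega)**t carried by the starting value Z_0."""
    weights = [omega * (1.0 - omega) ** j for j in range(t)]
    return weights, (1.0 - omega) ** t

def ewma(xbars, omega, z0):
    """Apply Z_t = omega * Xbar_t + (1 - omega) * Z_{t-1} and return all Z_t."""
    z, path = z0, []
    for x in xbars:
        z = omega * x + (1.0 - omega) * z
        path.append(z)
    return path

def ewma_limits(center, sigma, n, omega, k=3.0):
    """(LCL, UCL) based on the limiting (asymptotic) variance of Z_t."""
    half_width = k * sigma * math.sqrt(omega / ((2.0 - omega) * n))
    return center - half_width, center + half_width
```

With a weighting constant of 0.20, `ewma_weights(0.2, 3)` returns the weights 0.20, 0.16 and 0.128 quoted in the text, and these weights together with the weight on the starting value sum to one. The recursion and the weighted-sum form give the same value of the statistic.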
These control limits are based on the asymptotic variance of $Z_t$. Control limits that are based on the exact variance of $Z_t$ lead to a natural fast initial response EWMA control chart, in which initial out-of-control states are detected quickly. However, in reality we would expect the process to be in-control at the start-up stage and then drift out-of-control and, therefore, we have used the asymptotic variance in the construction of the control chart. The univariate EWMA chart is a plot of $Z_t$ against time $t$, and the process is declared out-of-control whenever $Z_t$ falls above $UCL$ or below $LCL$.

The design parameters of the EWMA chart are $k$, the multiple of $\sigma$ used in the control limits, and $\omega$. The parameter values are usually chosen to achieve pre-specified in-control and out-of-control average run lengths (ARL). Theoretical studies of the average run length properties of the EWMA chart have been conducted by Crowder (1989) and Lucas and Saccucci (1990). These studies provided average run length tables for a range of values of $\omega$ and $k$. An optimal design strategy would involve specifying the desired in-control and out-of-control average run lengths and the magnitude of the process shift that needs to be detected. Once these quantities are specified, the appropriate values of $\omega$ and $k$ are selected. In general, small values of $\omega$ are chosen to detect small shifts in the process mean. The control limits are usually set at the standard ($k = 3$) $3\sigma$ limits.

Lowry et al. (1992) extended the univariate EWMA to the multivariate setting (MEWMA) by defining vectors of EWMAs as follows:

$$Z_t = \Lambda X_t + (I - \Lambda) Z_{t-1},$$

for $t = 1, 2, \ldots$, with $Z_0 = 0$ and $\Lambda = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_p)$, $0 < \omega_j < 1$, $j = 1, 2, \ldots, p$, and where $I$ is the $p$-dimensional identity matrix. The random vectors $X_t$ are i.i.d. $N_p(0, \Sigma)$ for $t = 1, 2, \ldots$.
The MEWMA chart gives an out-of-control signal as soon as

$$T_t^2 = Z_t^T \Sigma_{Z_t}^{-1} Z_t > H, \qquad (2.6)$$

where $H$ is chosen to achieve a pre-specified in-control average run length and $\Sigma_{Z_t}$ is the covariance matrix of $Z_t$. If there is no a priori reason to weight past observations differently for the $p$ quality characteristics being monitored, then equal weights are assigned, i.e., $\omega_1 = \omega_2 = \cdots = \omega_p = \omega$. In this case the MEWMA can be written as

$$Z_t = \omega X_t + (1 - \omega) Z_{t-1}.$$

It can be shown that the covariance matrix of $Z_t$ in this case is given by

$$\Sigma_{Z_t} = \frac{\omega}{2 - \omega} \left[1 - (1 - \omega)^{2t}\right] \Sigma,$$

with asymptotic covariance matrix

$$\Sigma_{Z} = \frac{\omega}{2 - \omega} \Sigma.$$

The MEWMA with the exact variance-covariance matrix leads to a natural fast initial response chart, so that initial out-of-control states are detected more quickly. However, it is more likely that the process will start up and remain in-control for a while and then shift out-of-control. Therefore, in practice the asymptotic variance-covariance matrix is used to calculate the MEWMA statistic.

Simulation results given in Lowry et al. (1992) indicate that smaller values of $\omega$ are more effective in detecting small shifts in the process mean. Their results also suggested that the performance of the MEWMA chart with $\omega = 0.10$ compares favorably with that of the multivariate CUSUM charts that have been proposed. However, the MEWMA is slow in detecting large shifts in the process mean. Therefore, it is recommended that the $T^2$ chart be used in conjunction with the MEWMA. In this case there is a trade-off between protection against large shifts and the quick detection of small shifts in the process mean, because the control limits of the MEWMA must be increased slightly to maintain the desired in-control ARL.

The EWMA chart can also be used as a process forecasting device. The EWMA provides a forecast of where the process mean will be at the next time period.
In order to see this in the univariate case, we first write the EWMA at time $t$ as

$$\mathrm{EWMA} = Z_t = Z_{t-1} + \omega e_t = Z_{t-1} + \omega (\bar{X}_{t-1} - Z_{t-1}),$$

where $Z_t$ = predicted value of the process mean at time $t$ (the new EWMA), $\bar{X}_{t-1}$ = estimate of the process mean at time $t - 1$, $Z_{t-1}$ = predicted value of the process mean at time $t - 1$ (the old EWMA), $e_t = \bar{X}_{t-1} - Z_{t-1}$ = observed error at time $t - 1$, and $\omega$ is a constant ($0 < \omega < 1$) that determines the depth of memory of the EWMA. We assume that the random error $e_t$ is normally distributed with zero mean and variance $\sigma_e^2$. Therefore, $Z_t$ is actually a forecast of the value of the process mean at time $t$, and it can be used as the basis for a dynamic process control algorithm. If the forecast of the mean differs from the target by a critical amount, then either the operator or some electro-mechanical control system can make the necessary process adjustment. The control limits on the EWMA chart can be used to signal when an adjustment is necessary, and the difference between the target and the forecast of the mean can be used to determine how much adjustment is necessary.

2.4 A Review of Liu (1995)

Liu (1995) introduced three non-parametric multivariate control charts: the "r" chart, the "Q" chart, and the "S" chart. These charts are based on the concept of data depth and do not require any assumptions about the underlying distribution of the process. The main idea behind the construction of these charts is the reduction of each multivariate observation, $X^T = (X_1, \ldots, X_p)$, to a univariate index, namely a ranking that is based on the notion of data depth. These ranks are then used to construct the multivariate control charts. We present a brief summary of the three non-parametric multivariate control charts in this section. A detailed discussion along with simulation results is given in Chapter 4.
For any point $X \in R^p$, the simplicial depth of $X$ with respect to a distribution $G$ is given by

$$D_G(X) = P_G\{X \in s[X_1, \ldots, X_{p+1}]\},$$

where $s[X_1, \ldots, X_{p+1}]$ is a simplex whose vertices $X_1, \ldots, X_{p+1}$ are $p + 1$ random observations from $G$. The quantity $D_G(X)$ is a measure of how "deep" or how "central" the point $X$ is with respect to the distribution $G$. Most often, $G$ is unknown and only a sample $X_1, \ldots, X_m$ is available. The empirical depth of the point $X$ with respect to the data cloud $X_1, \ldots, X_m$ is given by

$$D_{G_m}(X) = \binom{m}{p+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{p+1} \le m} I\left(X \in s[X_{i_1}, \ldots, X_{i_{p+1}}]\right),$$

where $G_m$ is the empirical distribution of $X_1, \ldots, X_m$, and $I$ is the indicator function, equal to one if $X \in s[\cdot]$ and equal to zero otherwise. The quantity $D_{G_m}(X)$ measures how "deep" the point $X$ is within the data cloud $X_1, \ldots, X_m$.

In the multivariate control chart setting, the sample $X_1, \ldots, X_m$ is considered to be the base period sample and the point $X$ is considered to be an observation from the control period. The base period sample is assumed to come from a distribution $G$, while the control period sample is assumed to come from a distribution $F$. If the process is in-control then $G = F$; otherwise $G \ne F$. We will assume that both $G$ and $F$ are unknown.

We will now briefly discuss the three data depth multivariate control charts. First, consider taking a base period sample, $X_1, \ldots, X_m$, when the process is assumed to be in-control. Next, for each observation, $X$, in the control period consider the following test statistic:

$$r_{G_m}(X) = \frac{1}{m} \sum_{j=1}^{m} I\left(D_{G_m}(X_j) \le D_{G_m}(X)\right),$$

where $I$ is the indicator function, equal to one if the data depth of $X_j$ is less than or equal to the data depth of $X$ and equal to zero otherwise. The quantity $r_{G_m}(X)$ measures how outlying the point $X$ is with respect to the data cloud $X_1, \ldots, X_m$. A small value of $r_{G_m}(X)$ indicates that only a small fraction of the $X_j$'s are more outlying than the point $X$.
This would indicate that the point $X$ is at the outskirts or on the boundary with respect to the data cloud $X_1, \ldots, X_m$. The three non-parametric multivariate control charts are based on the statistic $r_{G_m}(X)$.

The "r" chart is constructed by first taking a base period of $m$ observations $X_1, \ldots, X_m$. Next, for each observation, $X_t$, in the control period, the statistic $r_{G_m}(X_t)$ is computed. The "r" chart is a plot of $r_{G_m}(X_t)$ versus time $t = 1, 2, \ldots$. The center line ($CL$) is set at 0.50 and the lower control limit ($LCL$) is set at $\alpha$. These control limits are based on the asymptotic distribution of $r_{G_m}(X_t)$, which is the uniform distribution between 0 and 1, $U[0, 1]$, so that $LCL = \alpha$. We claim that the process is out-of-control whenever the value of $r_{G_m}(X_t)$ for a point $X_t$ is below $LCL$.

The "Q" chart is constructed by first taking a base period of $m$ observations $X_1, \ldots, X_m$. Next, consider taking samples of size $n$ ($X_i^t$, $i = 1, \ldots, n$) in the control period. For each observation, $X_i^t$, in the control period sample, the statistic $r_{G_m}(X_i^t)$ is computed. The "Q" chart is based on the average of the $r_{G_m}(X_i^t)$ taken over the control period sample. That is, for each sample of size $n$ in the control period, we compute the statistic

$$Q_t(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i^t),$$

where $G_m$ and $F_n$ are the empirical distributions of $X_1, \ldots, X_m$ and $X_1^t, \ldots, X_n^t$, respectively, and $t = 1, 2, \ldots$. Liu and Singh (1993) show by simulation that the asymptotic distribution of $Q_t(G_m, F_n)$ is

$$N\left(\frac{1}{2},\; \frac{1}{12}\left(\frac{1}{m} + \frac{1}{n}\right)\right).$$

The control limits of the "Q" chart are, therefore, given by $CL = 0.50$ and

$$LCL = 0.50 - z_\alpha \sqrt{\frac{1}{12}\left[\frac{1}{m} + \frac{1}{n}\right]}.$$

The process is declared out-of-control whenever $Q_t(G_m, F_n)$ for the $t$th sample falls below $LCL$.
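To make the depth-based statistics concrete, the following Python sketch computes the empirical simplicial depth, the rank statistic $r_{G_m}$, the sample average $Q$, and the lower control limit of the "Q" chart. It is a brute-force illustration under stated assumptions, not the implementation used in this dissertation: it covers only the bivariate case ($p = 2$, so the simplices are triangles), it treats triangles as closed sets, it assumes the base points are in general position (no three collinear), and all function names are hypothetical. The cost grows as $m^3$ per depth evaluation, so it is suitable only for small base samples.

```python
import math
from itertools import combinations
from statistics import NormalDist

def _cross(o, a, b):
    """z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(x, a, b, c):
    """True if x lies in the closed triangle with vertices a, b, c."""
    d1, d2, d3 = _cross(a, b, x), _cross(b, c, x), _cross(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def simplicial_depth(x, base):
    """Fraction of triangles formed from the base sample that contain x."""
    triangles = list(combinations(base, 3))
    return sum(in_triangle(x, a, b, c) for a, b, c in triangles) / len(triangles)

def r_stat(x, base):
    """Fraction of base points whose depth does not exceed the depth of x."""
    depth_x = simplicial_depth(x, base)
    return sum(simplicial_depth(xj, base) <= depth_x for xj in base) / len(base)

def q_stat(sample, base):
    """Average of r_stat over one control-period sample."""
    return sum(r_stat(x, base) for x in sample) / len(sample)

def q_chart_lcl(m, n, alpha):
    """LCL = 0.5 - z_alpha * sqrt((1/m + 1/n) / 12), z_alpha the upper alpha point."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return 0.5 - z_alpha * math.sqrt((1.0 / m + 1.0 / n) / 12.0)
```

A point far outside the base sample has depth zero and therefore a rank statistic of zero, which falls below any positive lower control limit of the "r" chart.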
The "S" chart, which is analogous to the univariate CUSUM chart for the process mean, is constructed by first taking a base period sample $X_1, \ldots, X_m$. Next, for each observation, $X_i$, in the control period, the statistic $r_{G_m}(X_i)$ is computed. The "S" chart is based on the statistic $S_n(G_m)$, which is defined as

$$S_n(G_m) = \sum_{i=1}^{n} \left[r_{G_m}(X_i) - \frac{1}{2}\right].$$

Note that $S_n(G_m)$ can be rewritten as $S_n(G_m) = n[Q(G_m, F_n) - 1/2]$. We can, therefore, construct the "S" chart by letting $CL = 0$ and

$$LCL = -z_\alpha \sqrt{n^2 \left[\frac{1}{m} + \frac{1}{n}\right] / 12}.$$

These control limits were derived by using the asymptotic distribution of $Q(G_m, F_n)$. The "S" chart is a plot of $S_n(G_m)$ versus time $n = 1, 2, \ldots$. We claim that the process is out-of-control whenever $S_n(G_m)$ falls below $LCL$.

With the literature review in the background, we will now discuss the performance of the normal theory and the non-parametric control charts under departures from the normality assumption. This will lay the foundation for introducing novel, robust multivariate control charts.

CHAPTER 3
ROBUSTNESS OF THE NORMAL THEORY MULTIVARIATE CONTROL CHARTS

In this chapter we investigate the performance of various normal theory multivariate control chart procedures under departures from the multivariate normality assumption. Included in this investigation are the average run length studies of the $\chi^2$ chart, Hotelling's $T^2$ chart, the multivariate CUSUM procedures that were proposed by Crosier (1988) and Pignatiello and Runger (1990), and the multivariate EWMA chart that was proposed by Lowry et al. (1992). The various multivariate distributions that were used in the simulation study are discussed in Section 3.1 and the simulation strategy is outlined in Section 3.2. The average run length performance of the $\chi^2$ chart under departures from multivariate normality is discussed in Section 3.3. We discuss the performance of the $\chi^2$ chart for individual observations and for subgroups of observations.
The average run length performance of Hotelling's $T^2$ chart under deviations from multivariate normality is discussed in Section 3.4, again for individual observations and for subgroups of observations. The performances of the multivariate CUSUM and the multivariate EWMA charts under deviations from multivariate normality are discussed in Sections 3.5 and 3.6, respectively. The simulation programs to compute the average run lengths are given in the appendix.

3.1 Distributions Used in the Simulation Study

The simulation study was conducted by sampling random variates from distributions with elliptical directions. We define $X_1, \ldots, X_n$ to be observations from distributions with elliptical directions if they can be constructed in the following way. Let $U_1, \ldots, U_n$ be i.i.d. uniformly distributed over the $p$-dimensional unit hypersphere. Let $R_1, \ldots, R_n$ be any positive scalar random variables. Let $D$ be any nonsingular $p \times p$ matrix and form

$$X_1 = R_1 D U_1, \; \ldots, \; X_n = R_n D U_n.$$

Note that when $R_1, \ldots, R_n$ are i.i.d., the $X_i$'s are a sample from some elliptically symmetric population. In fact, this generation process characterizes all elliptically symmetric populations when the $R_i$'s are i.i.d. The class of elliptically symmetric distributions includes the multivariate normal distribution, the Pearson type VII heavy tailed distributions, and the Pearson type II light tailed distributions. The following bivariate distributions (with the exception of the bivariate mixed normal distribution) fall under the general class of elliptically symmetric distributions:

1. The bivariate normal distribution, denoted by $N_2(\mu, \Sigma)$, where $\mu$ is the $2 \times 1$ mean vector and $\Sigma$ is the $2 \times 2$ covariance matrix, which is assumed to be known. Without loss of generality it was assumed that $\mu = 0$ and $\Sigma = I$, where $0$ is the $2 \times 1$ zero vector and $I$ is the $2 \times 2$ identity matrix.
2. The bivariate Cauchy distribution: This was obtained by first generating a bivariate normal ($N_2(0, I)$) random variate and an independent chi-square random variable ($\eta$) with one degree of freedom ($\nu = 1$). The bivariate Cauchy random vector is then given by $N_2(0, I) / \sqrt{\eta / \nu}$.
3. The bivariate t distribution with two degrees of freedom: This was generated by using the same method that is described in (2) but with $\nu = 2$.
4. The bivariate t distribution with five degrees of freedom: This was generated by using the same method that is described in (2) with $\nu = 5$.
5. The bivariate t distribution with eight degrees of freedom: This was generated by using the same method that is described in (2) with $\nu = 8$.
6. The bivariate t distribution with eighteen degrees of freedom: This was generated by using the same method that is described in (2) with $\nu = 18$.
7. The bivariate mixed normal distribution: This was obtained by first generating a uniform random variable ($U$) between 0 and 1. A random vector was generated from the bivariate normal $N_2(\mu_1, I)$ distribution if the observed value of $U$ was less than or equal to 0.50, and from the bivariate normal $N_2(\mu_2, I)$ distribution if the observed value of $U$ was greater than 0.50. Note that $\mu_1^T = (1, -1)$ and $\mu_2^T = (1, 1)$.

For more information on these distributions or on techniques of multivariate simulation, refer to Johnson (1986).

3.2 The Simulation Strategy

The simulation study was conducted by using various FORTRAN 77 and IMSL subroutines on the UNIX platform. The type I error ($\alpha$) was set at 0.005. For a given, known variance-covariance matrix this yields an in-control average run length of $1 / \alpha = 200$. This implies that on average we would expect one false alarm in every 200 observations. The average run length values in the simulation study are based on 100,000 out-of-control signals.
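The generation methods listed in Section 3.1 can be sketched in Python as follows. This is an illustrative sketch, not the FORTRAN 77 and IMSL code used in the study. It assumes that the bivariate t vector with $\nu$ degrees of freedom is formed by dividing a standard bivariate normal vector by $\sqrt{\eta / \nu}$, where $\eta$ is an independent chi-square variable with $\nu$ degrees of freedom (the Cauchy case being $\nu = 1$); the function names are hypothetical.

```python
import math
import random

def bivariate_normal(rng, mean=(0.0, 0.0)):
    """One draw from N_2(mean, I): two independent unit-variance normals."""
    return (mean[0] + rng.gauss(0.0, 1.0), mean[1] + rng.gauss(0.0, 1.0))

def chi_square(rng, nu):
    """Chi-square draw with integer nu degrees of freedom (sum of squared normals)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def bivariate_t(rng, nu):
    """Bivariate t with nu degrees of freedom; nu = 1 gives the bivariate Cauchy."""
    z = bivariate_normal(rng)
    scale = math.sqrt(chi_square(rng, nu) / nu)
    return (z[0] / scale, z[1] / scale)

def mixed_normal(rng, mu1=(1.0, -1.0), mu2=(1.0, 1.0)):
    """N_2(mu1, I) when U <= 0.5, otherwise N_2(mu2, I)."""
    u = rng.random()
    return bivariate_normal(rng, mu1 if u <= 0.5 else mu2)
```

A seeded generator such as `random.Random(12345)` can be passed as `rng` so that a simulation run is reproducible. Note that the mixed normal distribution defined this way has mean $(1, 0)$ rather than the zero vector.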
To illustrate the simulation strategy that was used, consider the simulation study of Hotelling's $T^2$ chart for individual observations with a base period sample of $m$ observations. Assume that the observations are generated from the $N_2(\mu, \Sigma)$ distribution where both $\mu$ and $\Sigma$ are unknown.

1. Generate $m$ observations ($X_1, \ldots, X_m$) from the $N_2(\mu, \Sigma)$ distribution.
2. Estimate both $\mu$ and $\Sigma$ as follows: the mean vector $\mu$ is estimated by $\bar{X} = \sum_{i=1}^{m} X_i / m$ and the covariance matrix $\Sigma$ is estimated by $S = \frac{1}{m - 1} \sum_{i=1}^{m} (X_i - \bar{X})(X_i - \bar{X})^T$.
3. Generate a future observation ($X_f$, where $f = 1, 2, \ldots$) from the $N_2(\mu, \Sigma)$ distribution.
4. Compute $T_f^2 = (X_f - \bar{X})^T S^{-1} (X_f - \bar{X})$.
5. If $T_f^2 \ge h$ (where $h$ is an appropriate cut-off point), record $f$ (the point where the out-of-control signal was observed) and go to step 1; else go to step 3.

These steps are repeated 100,000 times and the average run length is taken as the average of the 100,000 $f$ values that were recorded whenever $T_f^2 \ge h$.

3.3 The Performance of the $\chi^2$ Chart

Consider testing $H_0: \mu = 0$ versus $H_a: \mu \ne 0$. Here $0$ is used without loss of generality, since $H_0: \mu = \mu_0$ can be tested by subtracting $\mu_0$ from each observation vector and testing whether these differences are located at $0$. Since the $\chi^2$ test is invariant under all nonsingular linear transformations of the data, without loss of generality we assume that $\Sigma = I$. The $\chi^2$ chart for individual observations is based on the test statistic $\chi^2 = X^T X$, and we claim that there is evidence to indicate that the process is out-of-control whenever the $\chi^2$ value for an observation $X$ exceeds $\chi^2_{2, 0.005} = 10.60$ (since we are dealing with bivariate vectors). Figure 3.1 gives the average run length performance of the $\chi^2$ chart for various shifts in the process mean under each of the sampled distributions that are given in Section 3.1.
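Steps 1 to 5 of the simulation strategy for the $T^2$ chart for individual observations can be sketched in Python as follows. This is a small illustration under stated assumptions, not the program from the appendix: it uses the bivariate standard normal as the in-control distribution, the optional `shift` argument (added to the mean of future observations only) and the `max_f` safety cap on a single run are hypothetical additions, and far fewer replications than the 100,000 of the study should be used when trying it out.

```python
import random

def estimate(base):
    """Step 2: sample mean and unbiased covariance (s11, s12, s22) of 2-D points."""
    m = len(base)
    xbar = (sum(x[0] for x in base) / m, sum(x[1] for x in base) / m)
    s11 = sum((x[0] - xbar[0]) ** 2 for x in base) / (m - 1)
    s22 = sum((x[1] - xbar[1]) ** 2 for x in base) / (m - 1)
    s12 = sum((x[0] - xbar[0]) * (x[1] - xbar[1]) for x in base) / (m - 1)
    return xbar, (s11, s12, s22)

def t2(x, xbar, s):
    """Step 4: (x - xbar)' S^{-1} (x - xbar) using the explicit 2 x 2 inverse."""
    s11, s12, s22 = s
    det = s11 * s22 - s12 * s12
    d0, d1 = x[0] - xbar[0], x[1] - xbar[1]
    return (s22 * d0 * d0 - 2.0 * s12 * d0 * d1 + s11 * d1 * d1) / det

def run_length(rng, m, h, shift=(0.0, 0.0), max_f=100_000):
    """Steps 1 to 5 for one replicate: index f of the first signal (capped at max_f)."""
    base = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]  # step 1
    xbar, s = estimate(base)                                              # step 2
    for f in range(1, max_f + 1):
        x = (shift[0] + rng.gauss(0.0, 1.0), shift[1] + rng.gauss(0.0, 1.0))  # step 3
        if t2(x, xbar, s) >= h:                                           # steps 4, 5
            return f
    return max_f

def average_run_length(m, h, reps, seed=0, shift=(0.0, 0.0)):
    """Average of the recorded run lengths over reps independent replicates."""
    rng = random.Random(seed)
    return sum(run_length(rng, m, h, shift) for _ in range(reps)) / reps
```

For instance, `average_run_length(20, 10.60, 200, seed=1)` estimates the in-control average run length for a base period of 20 observations, and passing `shift=(3.0, 0.0)` gives the corresponding estimate after a shift in the first coordinate of the mean.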
The shifts are given in terms of the non-centrality parameter, which is defined as $\lambda = (\mu^T \mu)^{1/2}$, where $\mu$ is the mean of the process in the out-of-control state. Note that $\lambda$ is the Mahalanobis distance between the mean of the process in the in-control state and the mean of the process in the out-of-control state. A value of $\lambda = 0$ indicates that the process is in-control.

Figure 3.1 shows that the performance of the $\chi^2$ chart is poor under the heavier tailed bivariate distributions, as indicated by the lower average run lengths. The $\chi^2$ chart maintains the pre-specified in-control average run length of 200 under multivariate normality. The in-control average run lengths are lower under the heavier tailed distributions, which is indicative of a false alarm rate higher than 0.005. As the degrees of freedom ($\nu$) of the multivariate t distribution increase, the average run lengths start to converge to the average run lengths of the chart under multivariate normality. This is because, as $\nu \to \infty$, the multivariate t distribution converges to the multivariate normal distribution. The type I error rate is also high under the mixed normal distribution. On the other hand, the type I error rates were found to be lower than the pre-specified type I error rate under the various Pearson type II distributions that were used in the study. We can, therefore, conclude that the $\chi^2$ chart for individual observations performs poorly when the distribution of the data is other than multivariate normal.

Figure 3.1 Plot of ARL versus $\lambda$ of the $\chi^2$ chart for individual observations. (Curves: Normal, Cauchy, t-2, t-5, t-8, t-18, Mix Nor.)

Next, consider the performance of the $\chi^2$ chart with subgroups of $n$ observations. This chart is based on the test statistic $\chi^2 = n \bar{X}^T \bar{X}$, where $\bar{X}$ is the $p \times 1$ sample mean vector.
We claim that there is evidence to indicate that the process is out-of-control whenever the $\chi^2$ value for a sample in the control period exceeds $\chi^2_{2, 0.005} = 10.60$. Figure 3.2 shows the average run length performance of this chart under the various bivariate distributions that were used in this dissertation study. The simulation results are based on subgroups of size 5 ($n = 5$) and the non-centrality parameter is given by $\lambda = (n \mu^T \mu)^{1/2}$.

Figure 3.2 Plot of ARL versus $\lambda$ of the $\chi^2$ chart with subgroups of size 5. (Curves: Normal, Cauchy, t-2, t-5, t-8, t-18, Mix Nor.)

Figure 3.2 shows that the performance of the $\chi^2$ chart with subgroups of size 5 is poor under the heavier tailed bivariate distributions. However, it maintains its pre-specified in-control average run length under multivariate normality. The type I error rates are high under both the heavier tailed and the multivariate mixed normal distributions. As expected, an increase in the degrees of freedom ($\nu$) of the multivariate t distributions causes the performance of the chart under the multivariate t distributions to converge to the performance of the chart under multivariate normality. The type I error rates of the chart under the various Pearson type II distributions were found to be lower than the pre-specified type I error rate. We can, therefore, conclude that the performance of the $\chi^2$ chart with subgroups of size 5 is poor under deviations from multivariate normality.

3.4 The Performance of Hotelling's $T^2$ Chart

We will first discuss the performance of Hotelling's $T^2$ chart for individual observations. This chart is based on the test statistic

$$T^2 = (X_f - \bar{X})^T S^{-1} (X_f - \bar{X}),$$

where $\bar{X}$ and $S$ are unbiased estimators of the process mean $\mu$ and the covariance matrix $\Sigma$, respectively, and $X_f$ is an observation from the control period.
The estimators $\bar{X}$ and $S$ are computed from a base period of $m$ observations when the process is assumed to be in-control. Woodall and Sullivan (1996) have noted that the in-control average run lengths are smaller than the pre-specified in-control average run lengths when $S$ is used to estimate the variance-covariance matrix $\Sigma$. They proposed alternative estimators which were shown to be more robust than $S$. These robust estimators were used in place of $S$ to form the $T^2$ statistic. We have assumed that the process is in-control in the base period and we, therefore, use $S$ in our simulation studies. The performance of the $T^2$ chart with $S$ replaced by an appropriate robust estimator of $\Sigma$ will be studied as a follow up to this dissertation.

We claim that the process is out-of-control whenever $T^2$ for an observation $X_f$ exceeds an appropriate cut-off value, say $h_1$. We shall study the performance of the $T^2$ chart for individual observations under three different cut-off values. The first cut-off value is based on the asymptotic distribution of $T^2$, which is chi-square with $p$ degrees of freedom. The cut-off value in this case is $\chi^2_{2, 0.005} = 10.60$. The second cut-off value is based on the exact distribution of $T^2$, which is related to the well known F distribution by the following relationship:

$$T^2_{\alpha, p, m-p} = \frac{p(m - 1)}{(m - p)} F_{\alpha, p, m-p},$$

where $T^2_{\alpha, p, m-p}$ is the upper $\alpha$th percentile of the $T^2$ distribution with $p$ numerator and $m - p$ denominator degrees of freedom, and $F_{\alpha, p, m-p}$ is the upper $\alpha$th percentile of the F distribution with $p$ numerator and $m - p$ denominator degrees of freedom. The simulation study was conducted with base periods of 20, 50, and 100 observations. The cut-off values of the $T^2$ chart for individual observations with these base periods are 15.99, 12.34, and 11.41, respectively. Tracy, Young, and Mason (1992) discussed different cut-off values for the start-up and the control stages of the $T^2$ chart.
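The F-based cut-off values can be checked without statistical tables in the bivariate case. The Python sketch below relies on a standard closed form that holds only when the numerator degrees of freedom equal 2: the upper tail probability of $F_{2, d}$ is $(1 + 2x/d)^{-d/2}$, so the upper $\alpha$ point is $(d/2)(\alpha^{-2/d} - 1)$. The function names and the optional `future_obs` flag, which multiplies the scaling by $(m + 1)/m$, are hypothetical and introduced only for this check.

```python
def f_upper_point_num_df_2(alpha, den_df):
    """Upper alpha point of the F distribution with 2 numerator degrees of freedom.
    Uses P(F > x) = (1 + 2 * x / den_df) ** (-den_df / 2), valid for this case only."""
    return (den_df / 2.0) * (alpha ** (-2.0 / den_df) - 1.0)

def t2_cutoff(alpha, m, p=2, future_obs=False):
    """p * (m - 1) / (m - p) * F_{alpha, p, m - p}; when future_obs is True the
    scaling is multiplied by (m + 1) / m. Only p = 2 is supported here."""
    if p != 2:
        raise ValueError("closed form implemented for p = 2 only")
    scale = p * (m - 1) / (m - p)
    if future_obs:
        scale *= (m + 1) / m
    return scale * f_upper_point_num_df_2(alpha, m - p)

for m in (20, 50, 100):
    print(m, t2_cutoff(0.005, m), t2_cutoff(0.005, m, future_obs=True))
```

Under these assumptions, with $\alpha = 0.005$ and $p = 2$, the scaling $p(m - 1)/(m - p)$ gives roughly 15.2, 12.1, and 11.3 for $m = 20$, 50, and 100, whereas including the additional factor $(m + 1)/m$ gives values that agree with the quoted 15.99, 12.34, and 11.41 to within rounding in the second decimal place.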
For simplicity, we have assumed that the process is in-control at the start-up stage and thus does not require monitoring. The cut-off value in the control stage as suggested by Tracy, Young, and Mason (1992) is based on the assumption that the process mean, $\mu$, is known. They suggest the following cut-off value:

$$\frac{p(m + 1)(m - 1)}{m(m - p)} F_{\alpha, p, m-p},$$

where $F_{\alpha, p, m-p}$ is the upper $\alpha$th percentile of the F distribution with $p$ numerator and $m - p$ denominator degrees of freedom. In practice, the process mean is unknown and is estimated from a base period of $m$ observations. Our cut-off value is based on practical considerations and not convenience.

The third cut-off value was obtained by simulation to achieve an in-control average run length of 200. Note that run lengths follow the geometric distribution when successive observations are independent. This independence assumption does not hold for Hotelling's $T^2$ chart, since successive values of the $T^2$ statistic depend on the same base period estimates $\bar{X}$ and $S$. The simulated cut-off values for base period sample sizes of 20, 50, and 100 are 10.90, 10.83, and 10.77, respectively.

Figures 3.3 through 3.9 show the average run length performance of the $T^2$ chart (with the cut-off value based on the asymptotic $\chi^2$ distribution) for individual observations with base periods of sizes 20, 50, and 100 taken from various bivariate distributions. These plots indicate that the performance of the $T^2$ chart (with the asymptotic cut-off value) for individual observations is poor under departures from multivariate normality. The in-control average run lengths fall below the pre-specified value even when the underlying distribution of the process is multivariate normal. The type I error rates are high for the heavy tailed distributions.
In comparison, the type I error rate is smaller than the pre-specified type I error rate when the underlying distribution of the process is assumed to be multivariate mixed normal. Increasing the base period sample size does not compensate adequately for this adverse behavior. The type I error rates were lower than the pre-specified type I error rates under the various Pearson type II lighter tailed distributions that were used in this dissertation study. As a result, the in-control average run lengths of the $T^2$ chart are higher than the pre-specified in-control average run length. The average run lengths of the charts in the out-of-control states are consequently affected.

Figure 3.3 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under bivariate normality.
Figure 3.4 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate Cauchy distribution.
Figure 3.5 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 2 degrees of freedom.
Figure 3.6 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 5 degrees of freedom.
Figure 3.7 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 8 degrees of freedom.
Figure 3.8 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 18 degrees of freedom.
[Figure 3.9 and the text that follows it are not recoverable from the source.]
Figure 3.10 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate normal distribution.
Figure 3.11 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate Cauchy distribution.
Figure 3.12 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 2 degrees of freedom.
Figure 3.13 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 5 degrees of freedom.
Figure 3.14 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 8 degrees of freedom.
Figure 3.15 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 18 degrees of freedom.
Figure 3.16 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate mixed normal distribution.

[...] since the average run lengths were very high (the in-control average run length is well over 15000).
The type I error rates are small under the various Pearson type II light tailed distributions that were used in the dissertation study. We can, therefore, conclude that the performance of the $T^2$ chart (with cut-off values based on the exact distribution of $T^2$) is poor under both multivariate normality and deviations from multivariate normality.

Figures 3.17 through 3.23 show the performance of the $T^2$ chart (with cut-off values obtained by simulation) for individual observations under the various bivariate distributions and for the different base period sample sizes. The plots indicate that these charts do not perform well under departures from multivariate normality. Although the pre-specified in-control average run length is achieved under the bivariate normal distribution, the type I error rates are high under the heavier tailed distributions. The type I error rate is smaller than the pre-specified type I error rate under the bivariate mixed normal distribution. Under bivariate normality, the performance of the $T^2$ charts with simulated cut-off values is an improvement over the performance of the $T^2$ charts with both asymptotic and exact distribution cut-off values. The type I error rates are smaller than the pre-specified type I error rates under the various Pearson type II distributions that were used in the dissertation. We can, therefore, conclude that the performance of the $T^2$ chart (with cut-off values based on simulation) is poor under departures from multivariate normality.

Our simulation study shows that the $T^2$ chart does not perform well under deviations from multivariate normality. The in-control average run lengths are lower than the pre-specified in-control average run lengths under the heavier tailed distributions and are higher than the pre-specified in-control average run lengths under the lighter tailed distributions.
[Plots for Figures 3.17 through 3.23 are not reproduced; the recoverable legends list base period sizes m = 20, 50, and 100.]

Figure 3.17 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate normal distribution.

Figure 3.18 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate Cauchy distribution.

Figure 3.19 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 2 degrees of freedom.

Figure 3.20 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 5 degrees of freedom.

Figure 3.21 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 8 degrees of freedom.

Figure 3.22 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate t distribution with 18 degrees of freedom.

Figure 3.23 Plot of ARL versus $\lambda$ of the $T^2$ chart for individual observations under the bivariate mixed normal distribution.

Next, consider the performance of the $T^2$ chart with subgroups of size n. This chart is based on the test statistic

$$T^2 = n(\bar{X} - \bar{\bar{X}})' S^{-1} (\bar{X} - \bar{\bar{X}})$$

where n is the subgroup size, $\bar{X}$ is the p x 1 subgroup mean vector, $\bar{\bar{X}}$ is the p x 1 estimate of the process mean, and S is the p x p unbiased estimator of the process covariance matrix $\Sigma$.
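The subgroup statistic defined above can be sketched in a few lines. This is only an illustrative sketch for the bivariate case (p = 2), written in Python rather than the FORTRAN 77 used in the dissertation; the function name `t2_subgroup` and its argument names are hypothetical, and the mean and covariance estimates are assumed to have been computed already from the base period.

```python
from statistics import fmean

def t2_subgroup(subgroup, grand_mean, cov):
    """T^2 statistic for one bivariate subgroup (illustrative sketch).

    subgroup   : list of (x1, x2) observations; its length is n
    grand_mean : base-period estimate of the process mean, (m1, m2)
    cov        : base-period 2 x 2 covariance estimate,
                 ((s11, s12), (s21, s22))
    """
    n = len(subgroup)
    xbar1 = fmean(x[0] for x in subgroup)
    xbar2 = fmean(x[1] for x in subgroup)
    d1, d2 = xbar1 - grand_mean[0], xbar2 - grand_mean[1]
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # Quadratic form d' S^{-1} d via the closed-form inverse of a 2 x 2 matrix.
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return n * quad

# Made-up data: a subgroup of size 4 whose mean is (1, 1).
subgroup = [(0.0, 0.0), (2.0, 2.0), (1.0, 1.0), (1.0, 1.0)]
t2 = t2_subgroup(subgroup, grand_mean=(0.0, 0.0), cov=((1.0, 0.0), (0.0, 1.0)))
```

The closed-form 2 x 2 inverse keeps the sketch free of external libraries; for general p a linear-algebra routine would be the natural replacement.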
The estimates $\bar{\bar{X}}$ and S are computed from a base period of m samples, each of size (say) n', when the process is assumed to be in-control. This simulation study is based on a base period of 25 samples, each of size 5, and subgroup samples, each of size 5 also (that is, m = 25, n' = 5, n = 5). We claim that there is evidence to indicate that the process is out-of-control whenever $T^2$ for a sample in the control period exceeds the asymptotic $\chi^2$ cut-off value 10.60. Figure 3.24 shows the average run length performance of this chart under the various bivariate distributions that were used in this dissertation study. The non-centrality parameter is given by $\lambda = (n r)^{1/2}$.

Figure 3.24 Plot of ARL versus $\lambda$ of the $T^2$ chart for subgroups of size 5 and a base period of size 25 under the various bivariate distributions. [Plot not reproduced; legend: Normal, t(1), t(2), t(5), t(8), t(18), MxNor.]

Figure 3.24 shows that the performance of the $T^2$ chart with subgroups of size 5 is poor under both multivariate normality and deviations from multivariate normality. The in-control average run lengths of this chart are lower than the pre-specified nominal value under all the bivariate distributions that were used in this dissertation study. It is interesting to note that most texts suggest using a base period of 25 observations and subgroup samples of size 5 in both the base period and the control period while suggesting the use of the asymptotic $\chi^2$ cut-off value. The above figure clearly shows that the performance of the $T^2$ chart is poor with these values even under the assumption that the underlying distribution of the process is multivariate normal. The type I error rates are high for the heavier-tailed distributions, whereas the type I error rate is closer to the nominal type I error rate under the bivariate mixed normal distribution.
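As a check on the asymptotic cut-off value of 10.60 quoted above, and assuming it corresponds to a nominal type I error rate of $\alpha = 0.005$ with p = 2 quality characteristics, the value follows in closed form because a $\chi^2$ variable with 2 degrees of freedom is exponentially distributed with mean 2:

$$P(\chi^2_2 > c) = e^{-c/2} = \alpha \quad\Longrightarrow\quad c = -2\ln\alpha = -2\ln(0.005) \approx 10.60 .$$

Under the same assumption, and treating successive chart statistics as independent, the nominal in-control average run length is $1/\alpha = 200$.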
Our simulation results indicated that the type I error rates under the various Pearson type II distributions were also very small ($\ll 0.005$).

3.5 Performance of the Multivariate CUSUM Charts

We will now discuss the performance of the multivariate CUSUM charts that were proposed by Crosier (1988) and Pignatiello and Runger (1990). Crosier's multivariate CUSUM is given in Equation (2.4). For $k_2 = 0.50$ and a pre-specified in-control average run length of 200, $h_2 = 5.50$. These values are provided in Crosier (1988). The multivariate CUSUM proposed by Pignatiello and Runger is given in Equation (2.5). For $k_4 = 0.50$ and for a pre-specified in-control average run length of 200, $h_4 = 4.75$. These values are provided in Pignatiello and Runger (1990). Figures 3.25 through 3.31 show the average run length performance of the two multivariate CUSUM charts under the various distributions that were used in this dissertation study.

[Plots for Figures 3.25 through 3.28 are not reproduced; the recoverable legends list the two procedures as MC1 and MC2.]

Figure 3.25 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate normal distribution.

Figure 3.26 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate Cauchy distribution.

Figure 3.27 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 2 degrees of freedom.

Figure 3.28 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 5 degrees of freedom.
Figure 3.29 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 8 degrees of freedom.

Figure 3.30 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate t distribution with 18 degrees of freedom.

Figure 3.31 Plot of ARL versus $\lambda$ showing the performance of the multivariate CUSUM procedures under the bivariate mixed normal distribution.

Figures 3.25 through 3.31 show that the performance of the multivariate CUSUM charts is poor under departures from multivariate normality. The false alarm rates are very high under both the heavier-tailed distributions and the bivariate mixed normal distribution. The multivariate CUSUM charts are more sensitive to smaller shifts in the process mean than the $T^2$ chart. The type I error rates of both multivariate CUSUM charts under the various Pearson type II light-tailed distributions that were used in this dissertation study are smaller than the pre-specified type I error rate. The above figures also indicate that the performances of the two multivariate CUSUM charts are very similar.

3.6 Performance of the Multivariate EWMA Chart (MEWMA)

We will now study the performance of the multivariate EWMA chart under deviations from multivariate normality. The procedure we use was suggested by Lowry et al. (1992) and is given in Equation (2.6). For $\omega = 0.10$ and a pre-specified in-control average run length of 200, H = 8.66. These values are given in Lowry et al. (1992). Figure 3.32 shows the average run length performance of the MEWMA chart under the various bivariate distributions that were used in this dissertation study. Figure 3.32 shows that the performance of the MEWMA chart is poor under deviations from normality.
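Since Equation (2.6) is not reproduced in this section, the following is only a sketch of the MEWMA recursion in its commonly stated form, assuming that form matches the one used here. It further assumes a bivariate process with in-control mean (0, 0), a known covariance matrix, and the exact (time-dependent) covariance of the smoothed vector; the function names are hypothetical.

```python
def mewma_statistics(observations, cov, omega=0.10):
    """Chart statistic Z_i' Sigma_Z^{-1} Z_i for each bivariate observation.

    Assumes an in-control mean of (0, 0) and a known 2 x 2 covariance
    matrix cov = ((s11, s12), (s21, s22)).
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    z1 = z2 = 0.0
    stats = []
    for i, (x1, x2) in enumerate(observations, start=1):
        # Exponentially weighted moving average of the observation vectors.
        z1 = omega * x1 + (1.0 - omega) * z1
        z2 = omega * x2 + (1.0 - omega) * z2
        # Cov(Z_i) = factor * Sigma, with the exact time-dependent factor.
        factor = omega / (2.0 - omega) * (1.0 - (1.0 - omega) ** (2 * i))
        quad = (s22 * z1 * z1 - (s12 + s21) * z1 * z2 + s11 * z2 * z2) / det
        stats.append(quad / factor)
    return stats

def first_signal(stats, limit=8.66):
    """1-based index of the first statistic above the limit, or None."""
    for i, value in enumerate(stats, start=1):
        if value > limit:
            return i
    return None

identity = ((1.0, 0.0), (0.0, 1.0))
one_point = mewma_statistics([(1.0, 0.0)], identity)
shifted = mewma_statistics([(3.0, 3.0)] * 20, identity)
```

With the exact covariance factor, the first statistic reduces to the ordinary quadratic form of the first observation, which gives a simple sanity check. Replacing `factor` by its limiting value `omega / (2 - omega)` is the usual alternative.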
The type I error rates are high under both the heavier-tailed distributions and the bivariate mixed normal distribution. However, the performance of the MEWMA under multivariate normality is similar to the performances of the $T^2$ and CUSUM charts that were discussed in Sections 3.4 and 3.5 (Figures 3.3 and 3.25). The type I error rates of the MEWMA chart under the various Pearson type II light-tailed distributions that were used in this study are smaller than the pre-specified type I error rate.

[Figure 3.32: plot of ARL versus $\lambda$ for the MEWMA chart; the plot and caption were lost in extraction. Recoverable legend entries: Normal, t(1), t(2), t(5).]

CHAPTER 4
ROBUSTNESS OF LIU'S (1995) NON-PARAMETRIC MULTIVARIATE CONTROL CHARTS

In this chapter, we introduce three non-parametric multivariate control charts that were proposed by Liu (1995) and also present simulation results of the average run length performances of these charts. Liu introduced the "r" chart, the "Q" chart, and the "S" chart. These charts are analogous to the univariate $X$, $\bar{X}$, and CUSUM charts, respectively. The main idea behind the proposed control charts is the reduction of each multivariate observation to a univariate index, namely a ranking, that is based on the concept of data depth. These ranks are then used to construct the control charts. The concept of data depth and some statistics that are derived from it are discussed in Section 4.1. The "r" chart is introduced in Section 4.2. We discuss the "Q" and the "S" charts in Sections 4.3 and 4.4, respectively.

4.1 The Notion of Data Depths

In this section we introduce the notion of data depths and also discuss some of the statistics that are derived from it. These statistics are then used to construct the data depth control charts. First, assume that the quality of each product can be classified by p quality characteristics. Let G be the prescribed distribution of the process when it is operating in-control and let $Y_1, \ldots, Y_m$ be m p x 1 random observation vectors from G.
The sample $Y_1, \ldots, Y_m$ is referred to as the base period sample. Next, let $X_1, X_2, \ldots$ be new p x 1 observation vectors taken from the process during the control phase and assume that the $X_i$'s follow a distribution F. The $X_i$'s are used to determine if the process is operating in-control or whether the process has gone out of control. An out-of-control state would imply that a difference exists between the distributions G and F. The concept of data depth is used to determine if the two distributions are different.

For any point $y \in R^p$, the simplicial depth of y with respect to the distribution G is given by

$$D_G(y) = P_G\{\, y \in s[Y_1, \ldots, Y_{p+1}] \,\}$$

where $s[Y_1, \ldots, Y_{p+1}]$ is a simplex whose vertices $Y_1, \ldots, Y_{p+1}$ are p + 1 random observations from G. The quantity $D_G(y)$ is a measure of how "deep" or how "central" the point y is with respect to the distribution G. Most often G is unknown and only a sample $Y_1, \ldots, Y_m$ is available. In this case the sample simplicial depth of the point y with respect to the data cloud $Y_1, \ldots, Y_m$ is given by

$$D_{G_m}(y) = \binom{m}{p+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{p+1} \le m} I\left(y \in s[Y_{i_1}, \ldots, Y_{i_{p+1}}]\right)$$

... therefore, be compared with the performance of other competing affine invariant multivariate control charts. The author also proved the uniform convergence of $D_{G_m}(\cdot)$ to $D_G(\cdot)$. This convergence property allows the estimation of $D_G(\cdot)$ from $D_{G_m}(\cdot)$ when G is unknown. Throughout this chapter, we will assume that the distribution G is unknown and will, therefore, have to be estimated by $G_m$.

Figure 4.1 Illustrating the values of the indicator function in the bivariate case. [Drawing not reproduced; the two panels are labeled I(·) = 1 and I(·) = 0.]

To illustrate the notion of data depth, consider a sample $Y_1, \ldots, Y_n$ in $R^2$. Denote by $\Delta(Y_i, Y_j, Y_k)$ a triangle with vertices $Y_i$, $Y_j$, and $Y_k$. For a sample of size n we can form C(n,3) ("n choose 3") such triangles.
Therefore, for any point $y \in R^2$ we can now associate the number of such triangles that contain y inside. This number should be close to C(n,3) if y lies close to the center of the data cloud $Y_1, \ldots, Y_n$, should be close to zero if it is on the outskirts, and should be exactly zero if it is outside the boundary of the data cloud. This concept of data depth extends very easily to p dimensions.

A center-outward ranking of the sample points is induced if the data depth of each point in the base period sample is computed and ranked in ascending order. If we use $Y_{[j]}$ to denote the sample point associated with the jth smallest depth value, then $Y_{[1]}, \ldots, Y_{[m]}$ are the order statistics of the $Y_i$'s, with $Y_{[m]}$ being the most "central" or the "deepest" point. A small value of the depth indicates that the associated point is outlying with respect to the distribution G.

Next, we discuss some statistics that are derived from data depths. First, let $Y \sim G$ indicate that the random variable Y follows the distribution G. Let

$$r_G(y) = P\left(D_G(Y) \le D_G(y) \mid Y \sim G\right),$$

and

$$r_{G_m}(y) = \frac{1}{m} \sum_{i=1}^{m} I\left(D_{G_m}(Y_i) \le D_{G_m}(y)\right)$$

where I is the indicator function and is equal to one if $Y_i$ has a data depth which is less than or equal to the data depth of y. The probability $r_G(y)$ is a measure of how outlying the point y is with respect to G. A small value of $r_G(y)$ indicates that y is an outlier and is, therefore, not "central" with respect to the underlying distribution G. On the other hand, the quantity $r_{G_m}(y)$, which is an empirical version of $r_G(y)$, measures how outlying the point y is with respect to the data cloud $Y_1, \ldots, Y_m$. A small value of $r_{G_m}(y)$ indicates that only a small fraction of the $Y_i$'s are more outlying than the point y, which suggests that y is at the outskirts or on the boundary of the data cloud $Y_1, \ldots, Y_m$. Next, define

$$Q(G, F) = P\left(D_G(Y) \le D_G(X) \mid Y \sim G,\ X \sim F\right).$$
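The triangle-counting description of the sample simplicial depth, and the rank statistic built from it, can be sketched by brute force in the bivariate case. This is not the Rousseeuw and Ruts algorithm; it enumerates every triangle and is practical only for small samples. The function names are hypothetical, and triangles are treated as closed (boundary points count as inside), which is an assumption of this sketch.

```python
from itertools import combinations

def _cross(o, a, b):
    """Signed area (times 2) of the triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(y, a, b, c):
    """True if y lies in the closed triangle with vertices a, b, c.

    Degenerate (collinear) triangles are not treated specially.
    """
    d1, d2, d3 = _cross(a, b, y), _cross(b, c, y), _cross(c, a, y)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def simplicial_depth(y, sample):
    """Fraction of all triangles formed from the sample that contain y."""
    triangles = list(combinations(sample, 3))
    hits = sum(in_triangle(y, a, b, c) for a, b, c in triangles)
    return hits / len(triangles)

def r_statistic(y, sample):
    """Fraction of sample points whose depth is at most the depth of y."""
    depth_y = simplicial_depth(y, sample)
    count = sum(simplicial_depth(p, sample) <= depth_y for p in sample)
    return count / len(sample)

# Made-up base period: the four corners of a square.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
center_depth = simplicial_depth((2, 2), square)
outside_depth = simplicial_depth((10, 10), square)
```

A point outside the convex hull of the sample lies in no triangle, so its depth and its rank statistic are both zero, which is the case the "r" chart flags.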
It can be shown that $Q(G, F) = E_F(r_G(X))$, where E stands for expected value and $r_G(X)$ measures the fraction of the G population that is more outlying than X. The quantity Q(G, F) is the average of such fractions over all the X's from the F population. A value of Q(G, F) < 0.50 implies that on the average more than 50% of the G population is more "central" than any observation X from F. If G is known but F is unknown, then Q(G, F) can be estimated by

$$Q(G, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_G(X_i)$$

where $F_n$ denotes the empirical distribution of $X_1, \ldots, X_n$. On the other hand, if both G and F are unknown, then an estimate of Q(G, F) is given by

$$Q(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i).$$

Throughout this chapter, we will assume that the distributions G and F are unknown and will, therefore, have to be estimated by $G_m$ and $F_n$, respectively.

4.2 The "r" Chart

The "r" chart, which is analogous to the univariate Shewhart chart for individual observations, is based on the statistic $r_{G_m}(\cdot)$, which was discussed in Section 4.1. It is constructed by first taking a base period sample of size m (labeled $Y_1, \ldots, Y_m$). Next, for each future observation $X_i$ in the control period, the statistic $r_{G_m}(X_i)$ is computed. The "r" chart is a plot of $r_{G_m}(X_i)$ against time i. The center line of the control chart (CL) is set at 0.50 and the lower control limit (LCL) is set at the type I error rate ($\alpha$). These control limits are based on the asymptotic distribution of $r_{G_m}(\cdot)$. Liu and Singh (1993) proved that the asymptotic distribution of $r_{G_m}(\cdot)$ is U[0,1] (a uniform distribution between 0 and 1). The asymptotic distribution of $r_{G_m}(X_i)$ suggests that LCL = $\alpha$. We claim that the process is out-of-control whenever $r_{G_m}(X_i)$ for a point $X_i$ plots below $\alpha$.

To illustrate the construction of the "r" chart, consider the example given in Liu (1995).
Liu considers a base period of 500 observations and records the number of observations, or fraction of the sample, with depth equal to zero. It is observed that 2.2% of the sample observations had a depth of zero, so that the type I error rate ($\alpha$) is then set at 2.5%. Note that in this approach the type I error rate is set after observing the base period sample. In reality, the type I error rate is pre-specified and is usually set at 0.005. We require a base period sample size larger than 500 to achieve this pre-specified type I error rate. In reality, the base period sample size is usually less than 100. The type I error rate for base period sample sizes less than 100 is higher than 0.05. This was verified by simulation.

4.3 The "Q" Chart

The "Q" chart, which is similar to the univariate Shewhart chart with subgroups of observations, is based on the statistic $Q(G_m, F_n)$, which was discussed in Section 4.1. It is constructed by first taking a base period of m observations (labeled $Y_1, \ldots, Y_m$). Subgroups of size n (labeled $X_1, \ldots, X_n$) are taken in the control period. For each $X_i$ in the kth sample (k = 1, 2, ...) the statistic $r_{G_m}(X_i)$ is then computed. The statistic $Q(G_m, F_n)$ for the kth sample is given by

$$Q_k(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i).$$

The "Q" chart is a plot of $Q_k(G_m, F_n)$ versus time k (k = 1, 2, ...). The asymptotic distribution of $Q(G_m, F_n)$ is used for deriving the control limits of the "Q" chart. Liu and Singh (1993) show by simulation results that the large sample distribution of $Q(G_m, F_n)$ is

$$N\left(\frac{1}{2},\ \frac{1}{12}\left(\frac{1}{m} + \frac{1}{n}\right)\right).$$

Liu and Singh claim that this approximation holds well even when the size of the samples in the control period is as small as 5. However, our simulation results show that the performance of the "Q" chart is poor with control limits based on this approximation.
The center line and lower control limit of the "Q" chart with the above approximation are given by CL = 0.50 and

$$LCL = 0.50 - Z_\alpha \sqrt{\frac{1}{12}\left[\frac{1}{m} + \frac{1}{n}\right]}.$$

The process is declared out of control whenever $Q_k(G_m, F_n)$ for the kth sample falls below LCL.

To illustrate the construction of the "Q" chart, let us consider an example given in Liu (1995). Liu considered a base period sample of 500 observations and observed that 2.2% of these observations had a depth of zero. The type I error rate ($\alpha$) was then set at 0.025. The control period sample size is set at 5. The control limits of the "Q" chart are, therefore,

CL = 0.50
LCL = 0.246

and the process is declared out-of-control whenever $Q_k(G_m, F_n)$ for the kth sample falls below 0.246.

4.4 The "S" Chart

The "S" chart, which is analogous to the univariate CUSUM chart for the process mean, is based on the statistic $S_n(G_m)$, which is defined as

$$S_n(G_m) = \sum_{i=1}^{n} \left[ r_{G_m}(X_i) - \frac{1}{2} \right].$$

Note that $S_n(G_m)$ can be rewritten as $S_n(G_m) = n[Q(G_m, F_n) - 1/2]$, and we can construct the "S" chart by letting CL = 0 and

$$LCL = -Z_\alpha \sqrt{n^2 \left[\frac{1}{m} + \frac{1}{n}\right] / 12}.$$

These control limits were derived by using the asymptotic distribution of $Q(G_m, F_n)$, which was given in Section 4.3. The "S" chart is a plot of $S_n(G_m)$ versus time n = 1, 2, .... We claim that the process is out of control whenever $S_n(G_m)$ at time n falls below LCL. Note that the above formulation of the "S" chart does not include a reset feature that is typical of the normal theory CUSUM charts. Although the "S" chart is reset at zero whenever the statistic $S_n(G_m)$ falls below LCL, it is not reset to zero when $S_n(G_m)$ exceeds zero. This characteristic of the "S" chart slows its ability to detect out-of-control states.

To illustrate the construction of the "S" chart, let us consider an example from Liu (1995). Liu considered a base period of 500 observations and observed that 2.2% of the observations had a depth of zero. The type I error rate ($\alpha$) was then set at 0.025. The control limits of the "S" chart in this case are given by

CL = 0

$$LCL = -1.96 \sqrt{n^2 \left[\frac{1}{500} + \frac{1}{n}\right] / 12}.$$

The process is declared out-of-control whenever $S_n(G_m)$ falls below LCL.

4.5 Discussion of Simulation Results

We will now discuss the average run length performance of the three data depth multivariate control charts. Our results are based on simulation studies that were conducted using various FORTRAN 77 and IMSL subroutines. The type I error rate ($\alpha$) was set at 0.005. One of the first problems that we encountered was determining an appropriate base period sample size in order to achieve the pre-specified type I error rate. Liu (1995) simulated 500 random observations from the $N_2(0, I)$ distribution and observed that 2.2% of the base period sample, or 11 points, had zero data depth. The type I error rate was then set at 0.025. How large a base period sample size would we need to achieve a type I error rate of 0.005? A result of Heuter (1994) was used to heuristically determine an appropriate base period sample size. Heuter's result states that the expected number of vertices or extreme points on the convex hull of a multivariate normal sample of size n is given by $2\sqrt{2\pi \ln(n)}$, where $\pi = 3.142$ and $\ln(\cdot)$ is the natural log function. This result was used as a guideline, and base periods of sizes 2000, 2500, and 3000 were simulated from the $N_2(0, I)$ distribution. It was found that the average number of extreme points for a base period of 2500 observations was 12.24, thus giving a simulated type I error rate of 0.0048, which is very close to the nominal value of 0.005. The base period sample size was, therefore, set at 2500 observations for all three data depth control charts. The simulation results are based on only 1000 out-of-control signals since the algorithm to compute the data depths is very computer intensive.
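The control limits of Sections 4.3 and 4.4 and the convex hull heuristic quoted above are simple enough to check numerically. The sketch below uses only the Python standard library; the function names are hypothetical, and $Z_\alpha$ is taken to be the upper $\alpha$ quantile of the standard normal distribution, which is consistent with the value 1.96 used for $\alpha = 0.025$.

```python
from math import log, pi, sqrt
from statistics import NormalDist

def q_chart_lcl(alpha, m, n):
    """LCL of the "Q" chart from the normal approximation."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 0.5 - z * sqrt((1.0 / m + 1.0 / n) / 12.0)

def s_chart_lcl(alpha, m, n):
    """LCL of the "S" chart at time n from the same approximation."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return -z * sqrt(n ** 2 * (1.0 / m + 1.0 / n) / 12.0)

def expected_hull_vertices(n):
    """Heuristic expected number of convex hull vertices, 2*sqrt(2*pi*ln n)."""
    return 2.0 * sqrt(2.0 * pi * log(n))

# Liu's example: alpha = 0.025, base period of 500, subgroups of size 5.
lcl_q = q_chart_lcl(0.025, m=500, n=5)
lcl_s = s_chart_lcl(0.025, m=500, n=5)

# Heuristic fraction of base-period points on the hull for several sizes.
for size in (500, 2000, 2500, 3000):
    vertices = expected_hull_vertices(size)
    print(size, round(vertices, 2), round(vertices / size, 4))
```

Because $S_n(G_m) = n[Q(G_m, F_n) - 1/2]$, the "S" chart limit at time n is n times the distance of the "Q" chart limit below 0.50, which the two functions reproduce. The heuristic gives only a guideline; the text reports that the base period size was fixed by simulation.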
The data depths were computed by using the FORTRAN algorithm developed by Rousseeuw and Ruts (1992). Figure 4.2 shows the average run length performance of the "r" chart under the bivariate distributions that were used in this dissertation. Note that the "r" chart was modified by using LCL = 0 instead of the recommended LCL = $\alpha$ due to the negligible difference between the pre-specified type I error rate of 0.005 and zero. Therefore, the process was declared out-of-control for any observation that was on the boundary of the data cloud that is determined by the base period sample. The performance of the modified "r" chart was found (by simulation) to be identical to the performance of the chart with LCL = $\alpha$. This is because the only way to achieve a value of the depth less than $\alpha$ is to observe a depth equal to zero. We were also able to considerably reduce the computation time by using LCL = 0.

The plot shows that the in-control ARL is overestimated when the random deviates were sampled from the bivariate normal distribution. It is also clear that the performance of the "r" chart is poor under the heavier-tailed distributions. The type I error rate is very small for the multivariate t distribution with 2 and 5 degrees of freedom. As the degrees of freedom increase, the type I error rate starts to converge to the type I error rate under the multivariate normal distribution. The type I error rate is overestimated for the bivariate mixed normal distribution. The performance of the "r" chart under the multivariate t distribution with 2 degrees of freedom is unchanged under the various values of the non-centrality parameter. The performances of the "r" chart under multivariate normality and the mixed normal distribution are similar.

Figure 4.2 Plot of ARL versus $\lambda$ performance of the "r" chart under the various bivariate distributions. [Plot not reproduced; legend: Normal, t(2), t(5), t(8), t(18), MxNor.]

Figures 4.3 through 4.8 compare the performance of Hotelling's $T^2$ chart with the performance of the "r" chart under the various distributions that were used in this dissertation. Under bivariate normality, the in-control average run lengths of the $T^2$ chart are shorter than with the "r" chart for values of $\lambda$ between 0 and 2. The performance of the $T^2$ chart is better than the performance of the "r" chart under bivariate normality owing to the smaller base period sample sizes of the $T^2$ chart relative to the "r" chart. The in-control average run lengths of the $T^2$ chart under the various bivariate t distributions are shorter than the pre-specified in-control average run length. On the other hand, the in-control average run lengths of the "r" chart are higher than the pre-specified in-control average run length. The in-control average run length of the $T^2$ chart under the bivariate mixed normal distribution is higher than the pre-specified in-control average run length, whereas the in-control average run length of the "r" chart is lower than the pre-specified in-control average run length.

Figures 4.9 and 4.10 show the average run length performance of the "Q" chart for subgroups of sizes 5 and 10 under the various bivariate distributions that were used in this dissertation study. The non-centrality parameter is given by $\lambda = (n r)^{1/2}$. Figure 4.9 shows that the average run length performance of the "Q" chart with subgroups of size 5 is poor under both multivariate normality and departures from it. Figure 4.9 indicates that the in-control average run lengths of the "Q" chart are underestimated under all the bivariate distributions. This implies that the chart will give a high rate of false alarms even though the process is in-control.
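The link between a short in-control average run length and a high false alarm rate can be illustrated with a small simulation. The sketch assumes that successive chart statistics are independent and that each one signals with the same probability $\alpha$ while the process is in-control, so that the run length is geometric with mean $1/\alpha$; this is an idealization, since charts that share estimated parameters do not produce exactly independent points. The function name and the seed are arbitrary choices.

```python
import random

def simulate_arl(alpha, n_runs=20000, seed=1):
    """Estimate the in-control ARL of a chart that signals with
    probability alpha at each point, independently of other points."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        run_length = 1
        # Keep sampling until the first (false) signal occurs.
        while rng.random() >= alpha:
            run_length += 1
        total += run_length
    return total / n_runs

# A nominal alpha of 0.005 corresponds to an ARL near 200, while an
# inflated false alarm rate of 0.05 corresponds to an ARL near 20.
for alpha in (0.005, 0.05):
    print(alpha, round(simulate_arl(alpha, n_runs=5000), 1))
```

Under this idealization, an estimated in-control ARL well below the nominal 200 corresponds directly to a type I error rate well above 0.005.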
[Plots for Figures 4.3 through 4.8 are not reproduced; the recoverable legends list the $T^2$ chart with m = 20, 50, and 100 and the "r" chart.]

Figure 4.3 Plot comparing the performance of the $T^2$ chart with the "r" chart under bivariate normality.

Figure 4.4 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 2 d.f.

Figure 4.5 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 5 d.f.

Figure 4.6 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 8 d.f.

Figure 4.7 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate t distribution with 18 d.f.

Figure 4.8 Plot comparing the performance of the $T^2$ chart with the "r" chart under the bivariate mixed normal distribution.

[Figure 4.9: plot and caption lost in extraction; the recoverable legend lists Normal, t(2), t(5), t(8), and t(18).]

Figure 4.10 Plot of ARL versus $\lambda$ performance of the "Q" chart under the various bivariate distributions. [Plot not reproduced; legend: Normal, t(2), t(5), t(8), t(18), MxNor.]

Figure 4.10 indicates that the performance of the "Q" chart can be improved upon by increasing the subgroup size. However, it is clear that increasing the subgroup size does not improve the performance dramatically. The type I error rates are still smaller than the pre-specified type I error rate.
Figures 4.11 through 4.16 compare the performance of Hotelling's $T^2$ chart with the performance of the "Q" chart with subgroups of size 5 under the various distributions that were used in this dissertation study. Under bivariate normality, the in-control average run length of the $T^2$ chart is closer to the pre-specified in-control average run length than the in-control average run length of the "Q" chart. The average run lengths of the $T^2$ chart under the out-of-control states are shorter than the average run lengths of the "Q" chart under the out-of-control states. This is a direct consequence of the relationship between the type I error rate ($\alpha$) and the power of a hypothesis test. As the type I error rate increases, so does the power of the hypothesis test. The in-control average run lengths of the $T^2$ chart are shorter than the in-control average run lengths of the "Q" chart under the bivariate t distribution with 2 and 5 degrees of freedom. On the other hand, the in-control average run lengths of the $T^2$ chart are higher than the in-control average run lengths of the "Q" chart under the bivariate t distribution with 8 and 18 degrees of freedom. The poor performance of the "Q" chart can be attributed to its cut-off value, which is based on the asymptotic distribution of the "Q" statistic. We feel that the performance of the "Q" chart can be improved upon by simulating the cut-off value. However, since the FORTRAN programs to compute data depths are very computer intensive, this method may use a great deal of computer time.

Figure 4.11 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate normal distribution. [Plot not reproduced; legend: "Q" (n = 5), $T^2$ (n = 5).]
[Plots for Figures 4.12 through 4.16 are not reproduced; legend: "Q" (n = 5), $T^2$ (n = 5).]

Figure 4.12 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 2 d.f.

Figure 4.13 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 5 d.f.

Figure 4.14 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 8 d.f.

Figure 4.15 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate t distribution with 18 d.f.

Figure 4.16 Plot comparing the performance of the $T^2$ chart with the "Q" chart under the bivariate mixed normal distribution.

Figure 4.17 shows the performance of the "S" chart under various bivariate distributions. Note that the in-control average run lengths are overestimated under all the bivariate distributions that were used in this dissertation. The in-control average run length is greatest for the mixed normal distribution. The long in-control average run lengths are due to the inability of the "S" CUSUM to reset to zero whenever the statistic $S_n(G_m)$ exceeds zero.

Figures 4.18 through 4.23 compare the performance of the multivariate CUSUM charts that were proposed by Crosier (1988) and Pignatiello and Runger (1990) with the performance of the "S" chart under the various bivariate distributions that were used in this dissertation. Under bivariate normality, both the normal theory multivariate CUSUM charts attain the pre-specified in-control average run lengths and do very well in detecting out-of-control states.
On the other hand, the "S" chart has an in-control average run length that is greater than the pre-specified in-control average run length. The out-of-control average run lengths of the "S" chart are greater than the out-of-control average run lengths of both the normal theory multivariate CUSUM charts. The in-control average run lengths of the normal theory CUSUM charts are shorter than the pre-specified in-control average run lengths for the various non-normal distributions. On the other hand, the in-control average run lengths of the "S" chart are greater than the pre-specified in-control average run length under the various non-normal distributions. This is a direct consequence of the relationship between the type I error rate ($\alpha$) of a hypothesis test and the power of a test. Also note that the in-control average run lengths of the "S" chart are high since it does not have a reset feature that is typical of the normal theory CUSUM procedures. In particular, although the "S" statistic is reset to zero when the process is declared out-of-control, it is not reset to zero when it takes a positive value.

Figure 4.17 Plot of ARL versus $\lambda$ for the "S" chart under the various bivariate distributions. [Plot not reproduced; legend: Normal, t(2), t(5), t(8), t(18), MxNor.]

Figure 4.18 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate normal distribution. [Plot not reproduced; legend: Crosier, PigRunger, "S".]

Figure 4.19 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 2 d.f. [Plot not reproduced; legend: Crosier, PigRunger, "S".]
Figure 4.20 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 5 d.f.

Figure 4.21 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 8 d.f.

Figure 4.22 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate t distribution with 18 d.f.

Figure 4.23 Plot comparing the performance of the normal theory multivariate CUSUM charts with the "S" chart under the bivariate mixed normal distribution.

[Plots omitted; each shows ARL versus Lambda for the Crosier, Pignatiello-Runger, and "S" charts.]

Our studies show that the performances of the non-parametric multivariate control charts that were proposed by Liu (1995) are not satisfactory under both normality and deviations from it. Furthermore, a large base period sample is needed to maintain a type I error rate of 0.005 in the base period. However, this pre-specified type I error rate is not achieved in the control period. We introduce robust multivariate charts in Chapters 5 and 6 and show by simulation studies that the performances of the proposed charts are an improvement over both the normal theory and Liu's (1995) non-parametric multivariate control charts.

CHAPTER 5
ROBUST MULTIVARIATE CONTROL CHARTS UNDER A KNOWN COVARIANCE MATRIX

In this chapter, we propose robust alternatives to the normal theory $T^2$ chart and to the non-parametric multivariate control charts that were proposed by Liu (1995).
We assume that the covariance matrix of the underlying distribution of the process is known. In Chapter 6, we introduce affine invariant multivariate control charts under the assumption that the covariance matrix of the underlying distribution of the process is unknown. The performances of the proposed multivariate control charts compare well with the performances of both the normal theory and the non-parametric multivariate control charts that were proposed by Liu (1995) under the assumption that the underlying distribution of the process is multivariate normal. On the other hand, the proposed multivariate control charts outperform the normal theory and the non-parametric multivariate control charts when departures from the multivariate normal assumption are encountered.

Of the two proposed control charts, the first is based on the affine invariant one sample multivariate sign test developed by Randles (1989), while the second chart is based on the affine invariant one sample multivariate sign-rank test that was developed by Peters and Randles (1991). We summarize Randles (1989) and Peters and Randles (1991) in Section 5.1 and introduce two robust multivariate Shewhart type charts in Section 5.2. The first chart, which is based on Randles (1989), is called the Randles sign test chart (RST) and the second chart, which is based on Peters and Randles (1991), is called the Peters-Randles sign-rank test chart (PR-SRT). Simulation results of the average run length studies of these charts are also given in Section 5.2. We introduce a robust multivariate EWMA chart in Section 5.3. This chart, which is a modification of the multivariate EWMA chart that was introduced by Lowry et al. (1992), is based on Randles (1989) and is called Randles multivariate EWMA chart (RMEWMA). Simulation results of the average run length studies of this chart are also given in Section 5.3.

5.1 Affine Invariant Multivariate One Sample Sign And Sign Rank Tests

Let $X_1, \ldots, X_n$ denote a sample from a $p$-dimensional, absolutely continuous population with $p$-vector location parameter $\theta$. We wish to test $H_0: \theta = 0$ versus $H_a: \theta \neq 0$. Here $0$ is used without loss of generality, since $H_0: \theta = \theta_0$ can be tested by subtracting $\theta_0$ from each observation vector and testing whether these differences are located at $0$. This problem is usually solved by using Hotelling's $T^2$ procedure, which assumes that the underlying population is the $N_p(\theta, \Sigma)$ distribution with mean $\theta$ and covariance matrix $\Sigma$. This procedure rejects the null hypothesis $H_0$ whenever

$$T^2 = n\,\bar{X}^T S^{-1} \bar{X} \;\ge\; \frac{(n-1)p}{n-p}\, F_{\alpha,\,p,\,n-p},$$

where $\bar{X}$ is the $p \times 1$ sample mean vector, $S$ is the unbiased $p \times p$ sample covariance matrix estimate of $\Sigma$, and $F_{\alpha,\,p,\,n-p}$ is the upper $\alpha$th quantile of the F distribution with $p$ numerator and $n-p$ denominator degrees of freedom.

Randles (1989) introduced an affine invariant test statistic as an alternative to Hotelling's $T^2$. Randles' test is based on a sign statistic that uses the directions of the observations from $0$ rather than their distances from $0$. The proposed multivariate test is based on interdirections, which are discussed next.

Consider a pair of observations $X_j$ and $X_k$ in a sample of size $n$. Let $C_{jk}$ denote the number of hyperplanes formed by the origin $0$ and $p-1$ other points (excluding $X_j$ and $X_k$) such that $X_j$ and $X_k$ are on opposite sides of the hyperplane formed. Therefore, given a sample of size $n$, $C_{jk}$ is an integer between $0$ and $\binom{n-2}{p-1}$ inclusive. A value of $C_{jk} = 0$ implies that the points $X_j$ and $X_k$ are adjacent. The counts $C_{jk}$ are called interdirections, and $C_{jk}$ measures the angular distance between $X_j$ and $X_k$ in relation to the origin and the other data points.
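As a rough illustration of the counting rule just described, the following Python sketch computes an interdirection in the bivariate case ($p = 2$), where each hyperplane is simply the line through the origin and one other sample point. The function name `interdirection_2d` and the cross-product side test are illustrative choices rather than anything taken from Randles (1989), and ties (a point lying exactly on a line) are ignored on the assumption that the population is absolutely continuous.

```python
def interdirection_2d(points, j, k):
    """Count the lines through the origin and one other sample point
    that separate points[j] from points[k] (bivariate case, p = 2)."""
    xj, xk = points[j], points[k]
    count = 0
    for i, (a, b) in enumerate(points):
        if i == j or i == k:
            continue  # the hyperplane may not be built from X_j or X_k
        # The sign of the 2-D cross product tells on which side of the
        # line through the origin and (a, b) a given point falls.
        side_j = a * xj[1] - b * xj[0]
        side_k = a * xk[1] - b * xk[0]
        if side_j * side_k < 0:
            count += 1
    return count


# Hypothetical sample of five bivariate observations.
sample = [(1.0, 0.1), (-1.0, 0.2), (0.0, 1.0), (0.5, 1.0), (1.0, -1.0)]
c_far = interdirection_2d(sample, 0, 1)   # nearly opposite directions
c_near = interdirection_2d(sample, 2, 3)  # adjacent directions
```

In this made-up sample the first two points lie in nearly opposite directions, so all $n - 2 = 3$ lines separate them, while the third and fourth points are adjacent and no line separates them.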
Note that the interdirections involve only the direction of each data point $X_j$ from the origin and do not consider the distance at which $X_j$ lies from the origin.

Randles (1989) introduced a sign test based on interdirections. To describe this test, consider the test for general $p$ that rejects $H_0$ for large values of the statistic

$$V_n = \frac{p}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \cos(\pi \hat{p}_{jk}),$$

where

$$\hat{p}_{jk} = \begin{cases} (C_{jk} + d_n) \Big/ \binom{n}{p-1} & \text{if } j \neq k, \\ 0 & \text{if } j = k, \end{cases}$$

and

$$d_n = \frac{1}{2}\left[\binom{n}{p-1} - \binom{n-2}{p-1}\right].$$

The proportion $\hat{p}_{jk}$ is the observed fraction of times that $X_j$ and $X_k$ fall on opposite sides of the data-based hyperplanes. Randles (1989) showed that the sign test based on interdirections is invariant under non-singular linear transformations and that, under $H_0$, $V_n$ has a small-sample distribution-free property over the class of distributions with elliptical directions. This class of distributions includes all elliptically symmetric distributions and some other distributions that are skewed. Randles (1989) also showed that, under $H_0$, $V_n$ has an asymptotic Chi-square distribution with $p$ degrees of freedom.

The form of the statistic $V_n$ can be motivated by recalling Rayleigh's test, which tests the null hypothesis of a uniform distribution on the unit sphere versus an alternative distribution that is concentrated in an unknown direction. If $U_1, \ldots, U_n$ denote observations on the $p$-dimensional unit sphere, Rayleigh's statistic is

$$v_n = n p\, \bar{U}^T \bar{U} = \frac{p}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} U_j^T U_k,$$

where the term $U_j^T U_k$ is the cosine of the angle between $U_j$ and $U_k$. The statistic $V_n$ is formed by replacing the angle between $U_j$ and $U_k$ with an invariant estimator of that angle, namely $\pi \hat{p}_{jk}$. Note that in this case $X_i = U_i$ for $i = 1, \ldots, n$.

With Randles' (1989) sign test in the background, we now discuss the affine invariant multivariate one sample sign-rank test that was developed by Peters and Randles (1991).
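The sign statistic $V_n$ can be sketched in a few lines of Python once the interdirection counts are available. The sketch below takes an $n \times n$ matrix of counts `C` as given; the function name `sign_statistic` is hypothetical, and the correction term $d_n = \frac{1}{2}\left[\binom{n}{p-1} - \binom{n-2}{p-1}\right]$ is the form usually quoted for Randles' statistic, which should be checked against the original paper before being relied upon.

```python
import math


def sign_statistic(C, p):
    """V_n = (p / n) * sum_j sum_k cos(pi * p_hat_jk), computed from an
    n x n matrix C of interdirection counts (diagonal entries ignored)."""
    n = len(C)
    total = math.comb(n, p - 1)
    # Assumed correction term; see the hedge in the lead-in above.
    d_n = 0.5 * (total - math.comb(n - 2, p - 1))
    v = 0.0
    for j in range(n):
        for k in range(n):
            p_hat = 0.0 if j == k else (C[j][k] + d_n) / total
            v += math.cos(math.pi * p_hat)
    return p * v / n


# Toy bivariate inputs with n = 3: every pair adjacent, then every pair separated.
all_adjacent = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
all_separated = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
v_clustered = sign_statistic(all_adjacent, p=2)
v_spread = sign_statistic(all_separated, p=2)
```

With these toy inputs the clustered configuration gives a large value of the statistic and the spread-out configuration gives a value near zero, which matches the idea that $H_0$ is rejected when the directions concentrate.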
Since we have considered elliptically symmetric populations, it is logical to measure the distance of each observation from the origin in terms of elliptical contours. We can then use the ranks of these distances along with the observations' directions to form a test statistic. Specifically, we form estimated Mahalanobis distances via

$$\hat{D}_i = X_i^T \hat{\Sigma}^{-1} X_i \quad \text{for } i = 1, \ldots, n,$$

where $\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} X_i X_i^T$ is a consistent estimator of the covariance of $X$ under $H_0$ when $X$ is elliptically symmetric. Let $R_i = \operatorname{rank}(\hat{D}_i)$ among $\hat{D}_1, \ldots, \hat{D}_n$ for $i = 1, \ldots, n$. We now weight the $(j,k)$th term in the sum $V_n$ by $R_j R_k$, and consider the statistic

$$W_n = \frac{3p}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} \cos(\pi \hat{p}_{jk})\, \frac{R_j}{n}\, \frac{R_k}{n}.$$

The test statistic $W_n$ is affine invariant since both $\hat{p}_{jk}$ and $\hat{D}_i$ $(i = 1, \ldots, n)$ are affine invariant. Peters and Randles (1991) show that under $H_0$, $nW_n$ converges in distribution to a Chi-square random variable with $p$ degrees of freedom. The statistic $W_n$ can also be motivated from Rayleigh's statistic $w_n$ (say), which is given by

$$w_n = \frac{3p}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} U_j^T U_k\, \frac{R_j}{n}\, \frac{R_k}{n},$$

where the term $U_j^T U_k$ represents the cosine of the angle between $U_j$ and $U_k$. The statistic $W_n$ is formed by replacing the angle between $U_j$ and $U_k$ with an invariant estimator of that angle, namely $\pi \hat{p}_{jk}$. Note that in this case $X_i = U_i$ for $i = 1, \ldots, n$.

The test statistics $V_n$ and $W_n$ form a basis for all the control charts that are proposed in this chapter. We introduce the multivariate Shewhart type charts first.

5.2 Robust Multivariate Shewhart Type Charts

Assume that $X$ is elliptically symmetric with density function

$$f(x) = k_p\, |\Sigma|^{-1/2}\, g\{(x-\theta)^T \Sigma^{-1} (x-\theta)\}, \qquad (5.1)$$

where $x \in R^p$, $\Sigma$ is any nonsingular matrix, $\theta$ is the location parameter, $g$ is a one-dimensional real valued function that is independent of $p$, and $k_p > 0$. For simplicity, assume that $\Sigma$ is the covariance matrix of $X$ and is known. Define random vectors $U$ as follows:

$$U = \frac{\Sigma^{-1/2}(X-\theta)}{\sqrt{(X-\theta)^T \Sigma^{-1} (X-\theta)}}. \qquad (5.2)$$

Theorem 5.1. For any function $g$ in Equation (5.1), $U$ is distributed uniformly on the surface of a $p$-dimensional unit hypersphere.

Proof: For simplicity assume that $\theta = 0$ and $\Sigma = I$. Then

$$U = \frac{X}{\sqrt{X^T X}}.$$

We will make use of the polar representation of $U_1, \ldots, U_p$ in order to prove this theorem. A point $U$ on the $p$-dimensional unit hypersphere can be uniquely represented by $p-1$ angles $\theta_1, \ldots, \theta_{p-1}$ and the equations

$$\begin{aligned}
u_1 &= r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{p-2} \sin\theta_{p-1} \\
u_2 &= r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{p-2} \cos\theta_{p-1} \\
&\;\;\vdots \\
u_{p-2} &= r \sin\theta_1 \sin\theta_2 \cos\theta_3 \\
u_{p-1} &= r \sin\theta_1 \cos\theta_2 \\
u_p &= r \cos\theta_1,
\end{aligned}$$

where $r^2 = X^T X$, $0 < \theta_i \le \pi$ for $i = 1, \ldots, p-2$, $0 < \theta_{p-1} \le 2\pi$, and $r > 0$ (see Johnson 1986, p. 125-126).

We will first compute the Jacobian of the transformation from $U_1, \ldots, U_p$ to $r, \theta_1, \ldots, \theta_{p-1}$. Note that

$$\begin{aligned}
u_1^2 &= r^2 \sin^2\theta_1 \cdots \sin^2\theta_{p-2} \sin^2\theta_{p-1} \\
u_1^2 + u_2^2 &= r^2 \sin^2\theta_1 \cdots \sin^2\theta_{p-2} \\
&\;\;\vdots \\
u_1^2 + \cdots + u_p^2 &= r^2.
\end{aligned}$$

Differentiating the first of these gives

$$2u_1\,du_1 = 2r^2 \sin^2\theta_1 \cdots \sin^2\theta_{p-2} \sin\theta_{p-1} \cos\theta_{p-1}\, d\theta_{p-1} + \text{terms involving } dr, d\theta_1, \ldots, d\theta_{p-2}.$$

Differentiating the second gives

$$2u_1\,du_1 + 2u_2\,du_2 = 2r^2 \sin^2\theta_1 \cdots \sin^2\theta_{p-3} \sin\theta_{p-2} \cos\theta_{p-2}\, d\theta_{p-2} + \text{terms involving } dr, d\theta_1, \ldots, d\theta_{p-3},$$

and so on, down to the last, which gives

$$\sum_{i=1}^{p} 2u_i\,du_i = 2r\,dr.$$

Next, take the exterior products of all the terms on the left and of all the terms on the right (see Muirhead, 1982, p. 50-78). The exterior product on the left side is

$$2^p u_1 \cdots u_p \bigwedge_{i=1}^{p} du_i.$$

The exterior product on the right side is

$$2^p r^{2p-1} \sin^{2p-3}\theta_1 \sin^{2p-5}\theta_2 \cdots \sin\theta_{p-1} \cos\theta_1 \cos\theta_2 \cdots \cos\theta_{p-1} \bigwedge_{i=1}^{p-1} d\theta_i \wedge dr,$$

which equals

$$2^p u_1 \cdots u_p\; r^{p-1} \sin^{p-2}\theta_1 \sin^{p-3}\theta_2 \cdots \sin\theta_{p-2} \bigwedge_{i=1}^{p-1} d\theta_i \wedge dr,$$

since

$$u_1 \cdots u_p = r^p \sin^{p-1}\theta_1 \sin^{p-2}\theta_2 \cdots \sin\theta_{p-1} \cos\theta_1 \cdots \cos\theta_{p-1}.$$

Equating the two sides and using Theorem 2.1.1 (Muirhead, 1982, p. 52), the Jacobian of the transformation from $U_1, \ldots, U_p$ to $r, \theta_1, \ldots, \theta_{p-1}$ is given by

$$r^{p-1} \sin^{p-2}\theta_1 \sin^{p-3}\theta_2 \cdots \sin\theta_{p-2}.$$
This implies that the joint density function of $r, \theta_1, \ldots, \theta_{p-1}$ is proportional to

$$g(r^2)\, r^{p-1} \sin^{p-2}\theta_1 \sin^{p-3}\theta_2 \cdots \sin\theta_{p-2},$$

from which it is apparent that $r, \theta_1, \ldots, \theta_{p-1}$ are all independent and $\theta_k$ has density function proportional to $\sin^{p-1-k}\theta_k$. The joint density can be factored into the density of $r$ and

$$\sin^{p-2}\theta_1 \sin^{p-3}\theta_2 \cdots \sin\theta_{p-2},$$

which implies that $U_1, \ldots, U_p$ is uniformly distributed on the boundary of a $p$-dimensional unit hypersphere. (Q.E.D.)

Theorem 5.2. The random vector $U$ has expected value $0$ and covariance matrix $p^{-1} I$, where $I$ is the $p$-dimensional identity matrix.

Proof: Let $\mu = E(U)$ and $\Sigma_U = E(U U^T)$. Since $U$ is distributed uniformly on the $p$-dimensional unit hypersphere, for any $p \times p$ orthogonal matrix $H$, $HU$ is also uniformly distributed on the $p$-dimensional unit hypersphere. Therefore, $\mu = H\mu$ and $\Sigma_U = H \Sigma_U H^T$. Hence $\mu$ is the zero vector and $\Sigma_U$ is proportional to the identity matrix $I_p$. Since $\operatorname{trace}(\Sigma_U) = \operatorname{trace} E(U U^T) = E(\operatorname{trace}(U^T U)) = 1$, it follows that $\Sigma_U = (1/p) I_p$. (Q.E.D.)

Next, define

$$Z = \sum_{i=1}^{n} U_i$$

and note that from Theorem 5.2

$$E(Z) = 0, \qquad \operatorname{Var}(Z) = \frac{n}{p}\, I,$$

where $I$ is the $p$-dimensional identity matrix. We now define the test statistic RST as follows:

$$RST = Z^T (\operatorname{Var}(Z))^{-1} Z.$$

Note that RST can be re-written as

$$RST = \frac{p}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} U_j^T U_k,$$

where the term $U_j^T U_k$ is the cosine of the angle between the unit vectors $U_j$ and $U_k$.

Theorem 5.3. Under the assumptions that the underlying distribution of the process is elliptically symmetric and the process is in-control, the test statistic RST has an asymptotic Chi-square distribution with $p$ degrees of freedom.

Proof: First note that by the central limit theorem,

$$\sqrt{\frac{p}{n}}\; Z \xrightarrow{d} N_p(0, I).$$

Therefore,

$$RST = Z^T (\operatorname{Var}(Z))^{-1} Z \xrightarrow{d} \chi^2_p.$$

The RST chart is constructed by taking samples of size $n$ in the control period and computing the test statistic RST. The process is declared out-of-control whenever the statistic RST exceeds an appropriate cut-off value $L$. In reality the control period sample size is as small as 5.
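A minimal sketch of the RST computation for one control period sample is given below, restricted to the bivariate case and written in plain Python. The function name `rst_statistic` and its argument layout are illustrative assumptions. The sketch whitens each observation with the inverse Cholesky factor of $\Sigma$ instead of a symmetric $\Sigma^{-1/2}$; the two choices differ only by a rotation of the unit vectors, so the value of $(p/n)\,\lVert \sum_i U_i \rVert^2$ should be unaffected.

```python
import math


def rst_statistic(sample, sigma, theta=(0.0, 0.0)):
    """RST = (p / n) * ||sum_i U_i||^2 for bivariate data (p = 2), where
    U_i is the centred, whitened observation rescaled to unit length."""
    (s11, s12), (_, s22) = sigma
    # Cholesky factor of sigma, written as [[a, 0], [b, c]].
    a = math.sqrt(s11)
    b = s12 / a
    c = math.sqrt(s22 - b * b)
    z1 = z2 = 0.0
    for x1, x2 in sample:
        # Solve the triangular system to whiten the centred observation.
        w1 = (x1 - theta[0]) / a
        w2 = ((x2 - theta[1]) - b * w1) / c
        norm = math.hypot(w1, w2)
        z1 += w1 / norm
        z2 += w2 / norm
    p, n = 2, len(sample)
    return p * (z1 * z1 + z2 * z2) / n


identity = ((1.0, 0.0), (0.0, 1.0))
# Directions (1, 0), (1, 0), (0, 1): Z = (2, 1), so RST = (2 / 3) * 5.
rst_value = rst_statistic([(1.0, 0.0), (2.0, 0.0), (0.0, 3.0)], identity)
```

Because only directions enter the statistic, rescaling any observation away from the target leaves the value unchanged, and a sample whose directions cancel gives a value of zero.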
In this case, the cut-off value based on the asymptotic distribution of RST does not yield the pre-specified in-control average run length. Therefore, the cut-off value $L$ is obtained by simulation. The simulation study was based on a pre-specified in-control average run length of 200 and control period samples of size 5 and 10. The values of the cut-offs for control period samples of size 5 and 10 are 8.43 and 9.67, respectively. As the sample size increases, the simulated cut-off values converge to the asymptotic $\chi^2_{2,\,0.005}$ cut-off value of 10.60.

Figures 5.1 and 5.2 show the average run length performance of the RST chart for various values of the non-centrality parameter $\lambda = (n\tau)^{1/2}$ and under the bivariate distributions that are used in this dissertation. Figure 5.1 indicates that the RST chart with $n = 5$ maintains the pre-specified in-control average run length under the elliptically symmetric distributions. The in-control average run length is shorter than 200 under the bivariate mixed normal distribution. The figure also shows that the performance of the RST chart with $n = 5$ does not vary much for the different elliptically symmetric distributions. This is in contrast to the performance of the $T^2$ chart, in which the in-control average run lengths were very different for the various distributions. The in-control average run length of the $T^2$ chart under the bivariate mixed normal distribution is greater than the pre-specified in-control average run length. On the other hand, the in-control average run length of the RST chart is shorter than the pre-specified in-control average run length.

Figure 5.1 Plot of ARL versus $\lambda$ for the RST with n = 5 under various bivariate distributions.

[Plot omitted; ARL versus Lambda for the Normal, t(2), t(5), t(8), t(18), and mixed normal distributions.]

Figure 5.2 shows that the RST chart with $n = 10$ maintains the pre-specified in-control average run length under the elliptically symmetric distributions. The in-control average run length is shorter than the pre-specified in-control average run length under the bivariate mixed normal distribution, although it is greater than the in-control average run length of the chart with $n = 5$. The performance of the RST chart with $n = 10$ does not differ significantly for the various elliptically symmetric distributions. This is in contrast to the performance of both the $T^2$ and the "Q" chart, where the average run lengths were very different for the various distributions. Increasing the sample size makes the RST chart more sensitive, since the out-of-control average run lengths with $n = 10$ are shorter than the out-of-control average run lengths of the RST chart with $n = 5$.

[Figure 5.2 plot omitted; ARL versus Lambda for the RST chart with n = 10 under the Normal, t(2), t(5), t(8), t(18), and mixed normal distributions.]

The PR-SRT statistic is given by

$$PR\text{-}SRT = \frac{3p}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} U_j^T U_k\, \frac{R_j}{n}\, \frac{R_k}{n}.$$

Theorem 5.4. Under the assumptions that the underlying distribution of the process is elliptically symmetric and the process is in-control, the test statistic $n(PR\text{-}SRT)$ has an asymptotic Chi-square distribution with $p$ degrees of freedom.

Proof: Note that

$$n(PR\text{-}SRT) = \frac{3p}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} U_j^T U_k\, \frac{R_j}{n}\, \frac{R_k}{n}.$$

To establish the limiting distribution we use an approximating statistic

$$n(PR\text{-}SRT)^{*} = \frac{3p}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} U_j^T U_k\, H(D_j)\, H(D_k),$$

where $D_i = X_i^T \Sigma^{-1} X_i$ and $H(t) = P_{H_0}(D_i \le t)$ for $0 \le t$ (see Peters and Randles (1991)). Now let

$$Z = \sum_{i=1}^{n} U_i H(D_i)$$

and note that $E(U_i) = 0$ and that $E(H(D_i)) = E(U[0,1]) = 1/2$. Also note that $\operatorname{Var}(U_i) = p^{-1} I$ and that $E(H^2(D_i)) = E(U^2[0,1]) = 1/3$. Therefore,

$$E(Z) = \sum_{i=1}^{n} E(U_i H(D_i)) = \sum_{i=1}^{n} E(U_i)\, E(H(D_i)) = 0,$$

since $U_i$ is independent of $H(D_i)$ under the assumption that $X_i$ has a distribution that belongs to the family of elliptically symmetric distributions.
Also note that

$$\operatorname{Var}(Z) = \sum_{i=1}^{n} \operatorname{Var}(U_i H(D_i)) = \sum_{i=1}^{n} E\left(H^2(D_i)\, U_i U_i^T\right) = \sum_{i=1}^{n} E\left(H^2(D_i)\right) E\left(U_i U_i^T\right) = \frac{n}{3p}\, I_p,$$

where $I_p$ is the $p$-dimensional identity matrix. Next, write the approximating statistic as follows:

$$n(PR\text{-}SRT)^{*} = Z^T [\operatorname{Var}(Z)]^{-1} Z$$

and note that this can be rewritten as

$$\frac{3p}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} U_j^T U_k\, H(D_j)\, H(D_k).$$

Using the central limit theorem, we know that

$$\sqrt{\frac{3p}{n}}\; Z \xrightarrow{d} N_p(0, I).$$

Therefore, by the multivariate central limit theorem, $n(PR\text{-}SRT)$ has an asymptotic Chi-square distribution with $p$ degrees of freedom. (Q.E.D.)

The PR-SRT chart is constructed by taking samples of size $n$ and computing the test statistic $n(PR\text{-}SRT)$. The process is declared out-of-control whenever the value of the statistic $n(PR\text{-}SRT)$ exceeds an appropriate cut-off value $L$. In reality the control period sample size is as small as 5. In this case, the cut-off value based on the asymptotic distribution of $n(PR\text{-}SRT)$ does not yield the pre-specified in-control average run lengths. Therefore, the cut-off value $L$ is obtained by simulation to achieve a pre-specified in-control average run length. Our simulation studies are based on a pre-specified in-control average run length of 200 and control period samples of size 5 and 10. The value of the cut-off is 9.38 for a sample of size 5 and 10.32 for a sample of size 10.

Figures 5.3 and 5.4 show the average run length performance of the PR-SRT chart for different values of the non-centrality parameter $\lambda = (n\tau)^{1/2}$ under the various bivariate distributions that are used in this dissertation. Figure 5.3 shows that the PR-SRT chart with $n = 5$ maintains the pre-specified in-control average run length of 200 under all the elliptically symmetric distributions. The in-control average run length is shorter than 200 for the bivariate mixed normal distribution. The figure also shows that the out-of-control average run lengths are not too different for the different elliptically symmetric distributions.

Figure 5.4 shows that the PR-SRT chart with $n = 10$ maintains the pre-specified in-control average run length of 200 under all the elliptically symmetric distributions. The in-control average run length is again shorter for the bivariate mixed normal distribution. However, the in-control average run length under the bivariate mixed normal distribution is slightly greater for $n = 10$ than for $n = 5$. The figure also shows that the out-of-control average run lengths are not too different for the different elliptically symmetric distributions. Increasing the sample size makes the PR-SRT chart more sensitive to shifts in the process mean. This is evident from the fact that the out-of-control average run lengths of the chart with $n = 10$ are shorter than the respective out-of-control average run lengths of the chart with $n = 5$.

Figures 5.5 through 5.10 compare the average run length performances of the RST, the PR-SRT, the $T^2$, and the "Q" chart that was proposed by Liu (1995). We will restrict the comparisons to the case when the control period samples are of size 5.

Figure 5.3 Plot of ARL versus $\lambda$ for the PR-SRT with n = 5 under various bivariate distributions.

Figure 5.4 Plot of ARL versus $\lambda$ for the PR-SRT with n = 10 under various bivariate distributions.

[Plots omitted; each shows ARL versus Lambda for the Normal, t(2), t(5), t(8), t(18), and mixed normal distributions.]

Figure 5.5 Plot comparing the $T^2$, the "Q", the RST, the PR-SRT, and the $\chi^2$ charts under bivariate normality.

Figure 5.6 Plot comparing the $T^2$, the "Q", the RST, the PR-SRT, and the $\chi^2$ charts under the bivariate t distribution with 2 d.f.

Figure 5.7 Plot comparing the $T^2$, the "Q", the RST, the PR-SRT, and the $\chi^2$ charts under the bivariate t distribution with 5 d.f.

Figure 5.8 Plot comparing the $T^2$, the "Q", the RST, the PR-SRT, and the $\chi^2$ charts under the bivariate t distribution with 8 d.f.

Figure 5.9 Plot comparing the $T^2$, the "Q", the RST, the PR-SRT, and the $\chi^2$ charts under the bivariate t distribution with 18 d.f.

Figure 5.10 Plot comparing the $T^2$, the "Q", the RST, the PR-SRT, and the $\chi^2$ charts under the bivariate mixed normal distribution.

[Plots omitted; each shows ARL versus Lambda for the $T^2$, "Q", RST, PR-SRT, and ChiSq charts.]

Figure 5.5 shows that the RST and the PR-SRT charts maintain their pre-specified in-control average run lengths under bivariate normality. On the other hand, the $T^2$ and the "Q" charts do not maintain the pre-specified in-control average run lengths.
The short out-of-control average run lengths of both these charts are a direct consequence of their low in-control average run lengths. The $T^2$ chart does better than the "Q" chart under bivariate normality since it is constructed by using a base period of only 25 samples of size 5, whereas the "Q" chart is constructed by using a base period of 2500 observations.

Figures 5.6 through 5.9 indicate that the RST and the PR-SRT charts maintain their pre-specified in-control average run lengths under the various t distributions that were used in this dissertation. The plots also indicate that these charts do fairly well in detecting out-of-control states. On the other hand, the $T^2$ and the "Q" charts have high type I error rates under all the t distributions that were used in this dissertation. The low out-of-control average run lengths of the $T^2$ and the "Q" charts are a direct consequence of their low in-control average run lengths.

Figure 5.10 indicates that the in-control average run lengths of the RST and the PR-SRT charts are smaller than the pre-specified in-control average run length under the bivariate mixed normal distribution. The in-control average run length of the "Q" chart is very similar to that of the RST and the PR-SRT charts. The out-of-control average run lengths of the "Q" chart are smaller than those of the RST and the PR-SRT charts for non-centrality values greater than 1.00. The in-control average run lengths of the $T^2$ chart are higher than the in-control and the out-of-control average run lengths of the RST, the PR-SRT, and the "Q" charts.

5.3 A Robust Multivariate Exponentially Weighted Moving Average Chart

We introduce a robust multivariate EWMA chart in this section. This chart is a modification of the multivariate EWMA chart that was proposed by Lowry et al. (1992), which was discussed in Chapter 2. Again, we assume that $X$ is elliptically symmetric with density $f(x)$, which was given in Equation (5.1).
The robust multivariate EWMA statistic $Z_t$ at time $t$, where $t = 0, 1, \ldots$, is defined as

$$Z_t = r U_t + (1-r) Z_{t-1}, \qquad (5.3)$$

where $0 < r < 1$ is the weighting constant, $U_t$ (see Equation (5.2)) is the unit vector at time $t$, and the starting value ($t = 0$) of the robust multivariate EWMA statistic is $Z_0 = E(U) = 0$. The robust multivariate EWMA chart gives an out-of-control signal whenever the quadratic form

$$Z_t^T [\operatorname{Var}(Z_t)]^{-1} Z_t$$

exceeds $L$, where $L$ is obtained by simulation to achieve a pre-specified in-control average run length.

Theorem 5.5. The covariance matrix of the robust multivariate EWMA statistic at time $t$ is given by

$$\operatorname{Var}(Z_t) = \frac{r}{p(2-r)} \left[1 - (1-r)^{2t}\right] I$$

with asymptotic variance $\frac{r}{p(2-r)}\, I$, where $I$ is the $p$-dimensional identity matrix.

Proof: By repeated substitution in Equation (5.3), it can be shown that

$$Z_t = \sum_{j=1}^{t} r (1-r)^{t-j} U_j.$$

Thus

$$\begin{aligned}
\operatorname{Var}(Z_t) &= \sum_{j=1}^{t} \operatorname{Var}\left(r (1-r)^{t-j} U_j\right) \\
&= r^2 \sum_{j=1}^{t} (1-r)^{2(t-j)} \operatorname{Var}(U_j) \\
&= \frac{r^2 (1-r)^{2t}}{p} \sum_{j=1}^{t} (1-r)^{-2j}\, I_p \\
&= \frac{r \left(1 - (1-r)^{2t}\right)}{p(2-r)}\, I_p.
\end{aligned}$$

The asymptotic variance is obvious since $\left(1 - (1-r)^{2t}\right) \to 1$ as $t \to \infty$. (Q.E.D.)

Using the asymptotic variance, the quadratic form of the robust multivariate EWMA chart can be written as

$$p\, r (2-r) \sum_{j=1}^{t} \sum_{k=1}^{t} (1-r)^{t-j} (1-r)^{t-k}\, U_j^T U_k,$$

where the term $U_j^T U_k$ represents the cosine of the angle between the observations $X_j$ and $X_k$. On the other hand, the quadratic form of the normal theory multivariate EWMA chart is given by

$$r (2-r) \sum_{j=1}^{t} \sum_{k=1}^{t} (1-r)^{t-j} (1-r)^{t-k}\, X_j^T X_k,$$

where the term $X_j^T X_k$ represents the Mahalanobis distance between the observations $X_j$ and $X_k$.

Figure 5.11 shows the average run length performance of the robust multivariate EWMA control chart for various values of the non-centrality parameter $\lambda = (\tau)^{1/2}$ with $r = 0.10$ and under the various bivariate distributions that were used in this dissertation study. Figure 5.11 shows that the robust multivariate EWMA chart maintains the pre-specified in-control average run length under all the elliptically symmetric distributions that were used. The in-control average run length under the bivariate mixed normal distribution is approximately 190.
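The recursion in Equation (5.3) together with the covariance in Theorem 5.5 can be sketched in Python as follows. The sketch assumes that the charted quantity is the quadratic form $Z_t^T [\operatorname{Var}(Z_t)]^{-1} Z_t$ with the exact time-dependent covariance, and that the unit vectors $U_t$ have already been computed; the function names `rmewma_statistics` and `first_signal`, and the cut-off values used in the example, are hypothetical.

```python
def rmewma_statistics(unit_vectors, r, p=2):
    """Return Z_t' [Var(Z_t)]^{-1} Z_t for t = 1, 2, ..., where
    Z_t = r * U_t + (1 - r) * Z_{t-1} and Z_0 = 0."""
    z = [0.0] * p
    stats = []
    for t, u in enumerate(unit_vectors, start=1):
        z = [r * ui + (1.0 - r) * zi for ui, zi in zip(u, z)]
        # Scalar factor of Var(Z_t) from Theorem 5.5 (exact, not asymptotic).
        var_t = r / (p * (2.0 - r)) * (1.0 - (1.0 - r) ** (2 * t))
        stats.append(sum(zi * zi for zi in z) / var_t)
    return stats


def first_signal(stats, cutoff):
    """1-based time of the first statistic above the cut-off, or None."""
    for t, q in enumerate(stats, start=1):
        if q > cutoff:
            return t
    return None


# A persistent direction, as might be seen after a sustained mean shift.
shifted = rmewma_statistics([(1.0, 0.0)] * 50, r=0.10)
# Alternating directions cancel, so the statistic stays small.
balanced = rmewma_statistics([(1.0, 0.0), (-1.0, 0.0)] * 25, r=0.10)
```

With the exact covariance, the first charted value always equals $p$, because $1 - (1-r)^2 = r(2-r)$. A persistent direction then drives the statistic steadily upward, whereas directions that cancel keep it near its starting level.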
This behavior is in contrast to the performance of the normal theory multivariate EWMA chart, where the pre-specified in-control average run length was achieved only for the bivariate normal distribution. The performance of the robust multivariate EWMA chart under deviations from normality is clearly an improvement over the performance of the normal theory multivariate EWMA chart, where the type I error rates were higher than the pre-specified type I error rate.

Figure 5.11 Plot of ARL versus $\lambda$ for the RMEWMA with r = 0.10 under various bivariate distributions.

[Plot omitted; ARL versus Lambda for the Normal, t(2), t(5), t(8), t(18), and mixed normal distributions.]

CHAPTER 6
AFFINE INVARIANT ROBUST MULTIVARIATE CONTROL CHARTS UNDER AN UNKNOWN COVARIANCE MATRIX

In this chapter, we extend the robust multivariate control charts that were introduced in Chapter 5 to the case when the covariance matrix of the underlying distribution of the process is unknown. The performance of the proposed multivariate control charts compares well with the performance of both the normal theory and the non-parametric control charts that were proposed by Liu (1995) under the assumption that the underlying distribution of the process is multivariate normal. On the other hand, the performance of the proposed multivariate control charts is an improvement over the performance of both the normal theory and the non-parametric control charts that were introduced by Liu (1995) under deviations from the assumption of multivariate normality.

The proposed control charts are based on modifications of the affine invariant one sample multivariate sign test that was developed by Randles (1989) and Hettmansperger et al. (1994), and on the affine invariant one sample multivariate sign-rank test that was developed by Peters and Randles (1991). The robust multivariate Shewhart type charts are introduced in Section 6.1.
These charts are called the $V_n$ (Randles (1989)) chart, the $H$ (Hettmansperger et al. (1994)) chart, and the $W_n$ (Peters and Randles (1991)) chart, respectively. Simulation results of the average run length studies of these charts are also given in Section 6.1. A robust multivariate EWMA chart is introduced in Section 6.2. This chart is based on the work of Randles (1989) and is called the $V_n$ EWMA chart. Simulation results of the average run length studies of this chart are also given in Section 6.2.

6.1 Affine Invariant Multivariate Shewhart Type Charts

Assume that $X$ is elliptically symmetric and let $X_{1b}, \ldots, X_{mb}$ denote the base period sample. Let $\hat{\Sigma} = \frac{1}{m} \sum_{i=1}^{m} X_{ib} X_{ib}^T$ be a consistent estimator of the covariance of $X$ and consider a sample $\{X_1, \ldots, X_n\}$ in the control period. Consider testing the hypothesis $H_0: \theta = 0$ versus $H_a: \theta \neq 0$. Note that we are assuming that the mean of the process in the in-control state is $0$ and that we are interested in detecting departures from this target.

First we discuss the construction of the $V_n$ chart. Let the base period sample define a set of hyperplanes originating from the origin $0$. For instance, in the bivariate case a set of hyperplanes would be $m$ lines originating from the origin $0$. Next consider taking samples of size $n$ in the control period and consider pairs of observations $X_j$ and $X_k$ from this sample. Let $C_{jk}$ denote the number of hyperplanes formed by the origin $0$ and $p-1$ other points from the base period sample such that $X_j$ and $X_k$ are on opposite sides of the hyperplane formed. In the bivariate case, given a base period sample of size $m$, $C_{jk}$ is an integer between $0$ and $m$ inclusive. The counts $C_{jk}$ are called interdirections and they measure the angular distance between $X_j$ and $X_k$ in relation to the origin and the base period vectors. The $V_n$ chart is constructed by first taking a base period sample of size $m$.
Next, for each sample of size n in the control period compute the statistic $V_n$ which is defined as follows:

$$V_n = \frac{p}{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\cos(\pi\hat{p}_{jk})$$

where $\hat{p}_{jk}$ is as defined in Chapter 5. We claim that there is evidence to indicate that the process is out-of-control whenever $V_n$ for a sample in the control period exceeds an appropriate cut-off value L which is obtained by simulation to achieve a pre-specified in-control average run length. Simulation studies were conducted for base periods of size 25, 50, and 100 and control period samples of size 5 and 10. Figures 6.1 through 6.6 display the performance of the $V_n$ chart for different base period samples and a control period sample of size 5 under the various bivariate distributions that were used in this dissertation study. The cut-off values for base periods of size 25, 50, and 100 are 8.15, 8.31, and 8.38, respectively.

Figure 6.1 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate normal distribution. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.2 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 2 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.3 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 5 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.4 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 8 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.5 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 18 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.6 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate mixed normal distribution. [ARL versus Lambda; m = 25, 50, 100]

Figures 6.1 through 6.6 indicate that the $V_n$ chart maintains the pre-specified in-control average run length under all the bivariate distributions that were used in this dissertation study. The $V_n$ chart also does fairly well in detecting out-of-control states for the elliptically symmetric distributions. The performance of the chart is poor under the bivariate mixed normal distribution where an average of approximately 50 observations have to be sampled before an out-of-control signal is given when the non-centrality parameter equals 3.00.

Figures 6.7 through 6.12 display the performance of the $V_n$ chart for different base period samples and a control period sample of size 10 under the various bivariate distributions that were used in this dissertation study. The cut-off values for base periods of size 25, 50, and 100 are 8.54, 9.11, and 9.38, respectively.

Figure 6.7 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate normal distribution. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.8 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 2 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.9 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 5 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.10 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 8 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.11 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate t distribution with 18 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.12 Plot comparing the performance of the $V_n$ chart for different base period sample sizes and under the bivariate mixed normal distribution. [ARL versus Lambda; m = 25, 50, 100]

Figures 6.7 through 6.12 indicate that the $V_n$ chart maintains the pre-specified in-control average run length under the various bivariate distributions that were used in the dissertation study. The $V_n$ chart also does fairly well in detecting out-of-control states under the elliptically symmetric distributions. However, the performance of the $V_n$ chart is poor under the bivariate mixed normal distribution where an average of approximately 40 observations are needed before an out-of-control signal is given when the non-centrality parameter equals 3.00.

Figures 6.1 through 6.12 also indicate that increasing the control period sample size improves the performance of the $V_n$ chart. The out-of-control average run lengths of the charts with n = 10 are lower than the out-of-control average run lengths of the chart with n = 5.

Next, we discuss the construction of the $W_n$ chart. As before, let the base period sample define a set of hyperplanes originating from the origin 0. Next consider taking samples of size n in the control period.
For each observation, $X_i$, in the control period compute the estimated Mahalanobis distances $D_i$ via $D_i = X_i'\hat{\Sigma}^{-1}X_i$ where $\hat{\Sigma}$ is a consistent estimator of the variance-covariance matrix of $X$ and $i = 1, \ldots, n$. Let $R_i = \mathrm{rank}(D_i)$, $i = 1, 2, \ldots, n$. We now weight the $(j,k)$th term in the sum $V_n$ by $R_jR_k$ and consider the statistic

$$W_n = \frac{3p}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n}\cos(\pi\hat{p}_{jk})\,\frac{R_j}{n}\,\frac{R_k}{n}.$$

The $W_n$ chart is constructed by first taking a base period sample of size m. Next, for each sample of size n in the control period, the statistic $W_n$ is computed. We claim that there is evidence to indicate that the process is out-of-control whenever $nW_n$ for a sample in the control period exceeds an appropriate cut-off value L which is obtained by simulation to achieve a pre-specified in-control average run length. Simulation studies were conducted for base periods of size 25, 50, and 100 and control period samples of size 5 and 10. Figures 6.13 through 6.18 display the performance of the $W_n$ chart for different base period samples and control period samples of size 5 under the various bivariate distributions that were used in this dissertation study. The cut-off values for base periods of size 25, 50, and 100 are 9.14, 9.28, and 9.34, respectively.

The plots indicate that the $W_n$ chart maintains the pre-specified in-control average run length under the various bivariate distributions that were used in the dissertation study. The $W_n$ chart also does fairly well in detecting out-of-control states under the elliptically symmetric distributions. However, the performance of the $W_n$ chart is poor under the bivariate mixed normal distribution where an average of approximately 50 observations need to be sampled before an out-of-control signal is given when the non-centrality parameter equals 3.00.

Figure 6.13 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate normal distribution. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.14 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 2 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.15 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 5 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.16 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 8 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.17 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 18 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.18 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate mixed normal distribution. [ARL versus Lambda; m = 25, 50, 100]

Figures 6.19 through 6.24 display the performance of the $W_n$ chart for different base period samples and control period samples of size 10 under the various bivariate distributions that were used in this dissertation study. The appropriate cut-off values for base period sample sizes of 25, 50, and 100 are 9.58, 9.98, and 10.17, respectively.

Figure 6.19 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate normal distribution. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.20 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 2 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.21 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 5 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.22 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 8 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.23 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate t distribution with 18 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.24 Plot comparing the performance of the $W_n$ chart for different base period sample sizes and under the bivariate mixed normal distribution. [ARL versus Lambda; m = 25, 50, 100]

Figures 6.19 through 6.24 indicate that the $W_n$ chart maintains the pre-specified in-control average run length under the various bivariate distributions that were used in the dissertation study. The $W_n$ chart also does fairly well in detecting out-of-control states under the elliptically symmetric distributions. However, the performance of the $W_n$ chart is poor under the bivariate mixed normal distribution where an average of approximately 40 observations need to be sampled before an out-of-control signal is given when the non-centrality parameter equals 3.00. The plots also show that the performance of the $W_n$ chart improves with an increase in the base period sample size.
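As a rough illustration of how the two Shewhart type statistics of this section could be computed once the proportions $\hat{p}_{jk}$ are available, the sketch below assumes the forms $V_n = (p/n)\sum_j\sum_k \cos(\pi\hat{p}_{jk})$ and $W_n = (3p/n^2)\sum_j\sum_k \cos(\pi\hat{p}_{jk})(R_j/n)(R_k/n)$. The function names, the list-of-lists input, and the no-ties ranking are assumptions made for this example and are not taken from the appendix programs.

```python
import math


def v_statistic(p_hat, p=2):
    """V_n = (p / n) * sum_j sum_k cos(pi * p_hat[j][k])."""
    n = len(p_hat)
    total = sum(math.cos(math.pi * p_hat[j][k])
                for j in range(n) for k in range(n))
    return p * total / n


def ranks(values):
    """Ranks 1..n of the values, smallest first (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def w_statistic(p_hat, distances, p=2):
    """W_n = (3p / n^2) * sum_j sum_k cos(pi * p_hat[j][k]) * (R_j/n) * (R_k/n),
    where R_i is the rank of the estimated Mahalanobis distance D_i."""
    n = len(p_hat)
    r = ranks(distances)
    total = sum(math.cos(math.pi * p_hat[j][k]) * (r[j] / n) * (r[k] / n)
                for j in range(n) for k in range(n))
    return 3 * p * total / n ** 2
```

A chart would then signal when `v_statistic(...)` or `n * w_statistic(...)` exceeds the simulated cut-off value L for the chosen base period size.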
Figures 6.13 through 6.24 also indicate that increasing the control period sample size improves the performance of the $W_n$ chart. The out-of-control average run lengths of the charts with n = 10 are lower than the out-of-control average run lengths of the chart with n = 5.

We will now discuss the construction of the H chart. In order to do this we will first review the affine invariant multivariate one-sample test that was proposed by Hettmansperger et al. (1994). Assume that $\{X_1, \ldots, X_n\}$ is a random sample from a p-variate symmetric distribution with a probability density function which was given in Equation (5.1). Consider testing $H_0: \theta = 0$ versus $H_a: \theta \neq 0$. Let this sample define a set of hyperplanes formed by $p-1$ observations along with the origin. Denote by $N_q$ the number of hyperplanes that can be formed where $q = 1, \ldots, N_q$ is an index set that identifies each hyperplane. Next, for each observation $X_i$ in the sample, compute the sign vector, $S_n(X_i)$, which is defined as follows:

[display equation missing]

where $s_q(X_i) = \mathrm{sgn}(e_q'X_i)$ ($\mathrm{sgn}(\cdot) > 0$ or $\mathrm{sgn}(\cdot) < 0$) indicates whether $X_i$ is above or below the hyperplane identified by the index q and $e_q$ is a normal vector to the hyperplane identified by the index q. The multivariate sign test statistic for testing the null hypothesis $H_0$ is the sum of signs of the observations,

$$T_1 = \sum_{i=1}^{n} S_n(X_i).$$

Hettmansperger et al. (1994) show that under $H_0$, $n^{-1/2}T_1$ has mean 0 and covariance matrix

[display equation missing]

An asymptotically distribution free affine invariant version of the test statistic is

[display equation missing]

with a limiting $\chi^2$ distribution.

In order to construct the H chart, first, consider taking a base period sample, $(X_{1b}, \ldots, X_{mb})$, of size m. Let this base period sample define a set of hyperplanes formed by $p-1$ observations along with the origin. Denote by $N_q$ the number of possible hyperplanes that can be formed where $q = 1, \ldots, N_q$ is an index set that identifies each hyperplane.
Next, for each observation $X_{ib}$ ($i = 1, \ldots, m$) in the base period sample, compute $S_m(X_{ib})$ which is defined as follows:

[display equation missing]

where $s_q(X_{ib}) = \mathrm{sgn}(e_q'X_{ib})$ ($\mathrm{sgn}(\cdot) > 0$ or $\mathrm{sgn}(\cdot) < 0$) indicates whether $X_{ib}$ is above or below the hyperplane identified by the index q and $e_q$ is a normal vector to the hyperplane identified by the index q. We then compute the covariance matrix $B_1$ which is given by

[display equation missing]

Next, consider control period samples of size n. For each observation $X_i$ in the control period sample, compute $S_m(X_i)$ which is defined as follows:

[display equation missing]

where $s_q(X_i) = \mathrm{sgn}(e_q'X_i)$ ($\mathrm{sgn}(\cdot) > 0$ or $\mathrm{sgn}(\cdot) < 0$) indicates whether $X_i$ is above or below the hyperplane identified by the index q and $e_q$ is a normal vector to the hyperplane identified by the index q. Now, let

$$T_1 = \sum_{i=1}^{n} S_m(X_i)$$

and form the test statistic

[display equation missing]

We claim that there is evidence to indicate that the process is out of control whenever H for a control period sample exceeds an appropriate cut-off value L which is obtained by simulation to achieve a pre-specified in-control average run length. Simulation studies were conducted for base periods of size 25, 50, and 100 and control period samples of size 5 and 10. Figures 6.25 through 6.30 display the performance of the H chart for different base period samples and control period samples of size 5 under the various bivariate distributions that were used in this dissertation study. The cut-off values for base periods of size 25, 50, and 100 are 8.58, 8.71, and 8.94, respectively.

Figure 6.25 indicates that the H chart maintains the pre-specified in-control average run lengths under the bivariate normal distribution and under the various base period sample sizes. The plot also indicates that the chart is fairly insensitive to shifts in the process mean. For instance, an average of 13 observations is needed before an out-of-control signal is given when the non-centrality value is 2.50 and the base period sample size is 100.
Figure 6.25 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate normal distribution. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.26 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 2 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.27 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 5 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.28 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 8 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.29 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 18 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.30 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate mixed normal distribution. [ARL versus Lambda; m = 25, 50, 100]

Figures 6.26 and 6.27 indicate that the H chart does not maintain the pre-specified in-control average run length under the bivariate t distributions with 2 and 5 d.f. The in-control average run lengths converge to the pre-specified in-control average run length with an increase in the base period sample size. The plots also indicate that the chart is insensitive to shifts in the process mean under the bivariate t distributions with 2 and 5 d.f.
For instance, under the bivariate t distribution with 2 d.f., an average of 16 observations need to be sampled before an out-of-control signal is given when the non-centrality parameter value equals 3.00 and when the base period sample size is 100. In the case of the bivariate t distribution with 5 d.f., an average of 10 observations need to be sampled when the non-centrality parameter value equals 3.00 and when the base period sample size equals 100. Figures 6.28 and 6.29 indicate that the H chart maintains the pre-specified in-control average run length under the bivariate t distributions with 8 and 18 d.f. However, the plots also indicate that the chart is insensitive to shifts in the process mean. For instance, an average of about 9 observations need to be sampled before an out-of-control signal is given when the non-centrality value equals 3.00. Figure 6.30 indicates that the H chart maintains the pre-specified in-control average run length under the bivariate mixed normal distribution. However, the performance of the chart in the out-of-control state is poor. For instance, an average of 52 observations needs to be sampled before an out-of-control signal is given when the non-centrality value equals 3.00. Figures 6.25 through 6.30 also indicate that increasing the base period sample size marginally improves the performance of the H chart.

Figures 6.31 through 6.36 display the performance of the H chart for different base period samples and control period samples of size 10 under the various bivariate distributions that were used in this dissertation study. The cut-off values for base periods of size 25, 50, and 100 are 9.84, 10.01, and 10.31, respectively.

Figure 6.31 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate normal distribution. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.32 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 2 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.33 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 5 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.34 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 8 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.35 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate t distribution with 18 d.f. [ARL versus Lambda; m = 25, 50, 100]
Figure 6.36 Plot comparing the performance of the H chart for different base period sample sizes and under the bivariate mixed normal distribution. [ARL versus Lambda; m = 25, 50, 100]

Figures 6.31 through 6.36 indicate that the H chart maintains the pre-specified in-control average run length under all the bivariate distributions that were used in this dissertation. Figure 6.31 indicates that the chart does fairly well in detecting out-of-control states under the bivariate normal distribution. An average of 4 observations is needed when the non-centrality value equals 3.00 and when the base period sample size is 100. This is one-half of the number of observations required when the control period sample size is 5. Figures 6.32 through 6.35 indicate that the H chart does fairly well in detecting out-of-control states under the various heavier tailed distributions that were used.
The out-of-control average run lengths of the chart under the bivariate t distribution with 2 d.f. are higher than the out-of-control average run lengths under the other t distributions. Figure 6.36 indicates that the performance of the chart is poor under the bivariate mixed normal distribution. An average of 32 observations need to be sampled before an out-of-control signal is given when the non-centrality value equals 3.00 and when the base period sample size equals 100. Figures 6.31 through 6.36 also indicate that the performance of the H chart can be marginally improved by increasing the base period sample size.

Figures 6.25 through 6.36 indicate that the performance of the H chart improves as the control period sample size increases. The out-of-control average run lengths are smaller for control period samples of size 10 than for control period samples of size 5.

Figures 6.37 through 6.42 compare the performance of the $T^2$ chart with subgroups of observations, the "Q" chart, the RST chart, the PR-SRT chart, the $V_n$ chart, the $W_n$ chart, and the H chart under the various bivariate distributions that were used in this dissertation. We have restricted the comparisons to base periods of size 25 (except for the "Q" chart) and control period samples of size 5.

Figure 6.37 Plot comparing the performance of the $T^2$, "Q", RST, PR-SRT, $V_n$, $W_n$, and H charts under the bivariate normal distribution. [ARL versus Lambda; one curve per chart]
Figure 6.38 Plot comparing the performance of the $T^2$, "Q", RST, PR-SRT, $V_n$, $W_n$, and H charts under the bivariate t distribution with 2 d.f. [ARL versus Lambda; one curve per chart]
Figure 6.39 Plot comparing the performance of the $T^2$, "Q", RST, PR-SRT, $V_n$, $W_n$, and H charts under the bivariate t distribution with 5 d.f. [ARL versus Lambda; one curve per chart]
Figure 6.40 Plot comparing the performance of the $T^2$, "Q", RST, PR-SRT, $V_n$, $W_n$, and H charts under the bivariate t distribution with 8 d.f. [ARL versus Lambda; one curve per chart]
Figure 6.41 Plot comparing the performance of the $T^2$, "Q", RST, PR-SRT, $V_n$, $W_n$, and H charts under the bivariate t distribution with 18 d.f. [ARL versus Lambda; one curve per chart]
Figure 6.42 Plot comparing the performance of the $T^2$, "Q", RST, PR-SRT, $V_n$, $W_n$, and H charts under the bivariate mixed normal distribution. [ARL versus Lambda; one curve per chart]

Figures 6.37 through 6.42 indicate that the RST, PR-SRT, $V_n$, $W_n$, and H charts perform better than the $T^2$ and the "Q" charts under the elliptically symmetric distributions. The proposed charts maintain the pre-specified in-control average run length whereas the $T^2$ and "Q" charts have shorter in-control average run lengths and, therefore, a higher type I error rate. The plots also indicate that the $V_n$ and $W_n$ charts perform better than the H chart. The improved performance is only marginal under the bivariate normal distribution but is fairly significant under the various t distributions. The $T^2$ chart performs better than the other charts under the bivariate mixed normal distribution. The in-control average run length is fairly close to the pre-specified in-control average run length and the chart does well in detecting out-of-control states. On the other hand, even though the $V_n$, $W_n$, and H charts maintain the pre-specified in-control average run length, they are insensitive to shifts in the process mean for the mixed normal distribution.
The "Q", $V_n$, and $W_n$ charts have a low in-control average run length and, therefore, have a high type I error rate in that setting.

6.2 An Affine-Invariant Multivariate EWMA Chart

We will now introduce an affine-invariant multivariate EWMA chart under the assumption that the covariance matrix of the underlying distribution of the process is unknown. Again, assume that $X$ is elliptically symmetric with density given by Equation (5.1). Recall from Chapter 5 that the robust multivariate EWMA statistic at time t (under the assumption that the covariance matrix of the underlying distribution of the process is known) is given by

$$Z_t = rU_t + (1-r)Z_{t-1}$$

where $0 < r < 1$ is the weighting constant, $U_t$ is the unit vector (see Equation (5.2)) at time t and the starting value of the robust multivariate EWMA statistic at time 0 is given by $Z_0 = 0$. The process is declared out-of-control whenever

$$T_t^2 = Z_t'\Sigma_{Z_t}^{-1}Z_t$$

exceeds an appropriate cut-off value L which is obtained by simulation to achieve a pre-specified in-control average run length. Note that

$$T_t^2 = pr(2-r)\sum_{j=1}^{t}\sum_{k=1}^{t}(1-r)^{t-j}(1-r)^{t-k}\,U_j'U_k$$

where $U_j'U_k$ is the cosine of the angle between the observations $X_j$ and $X_k$.

Under the assumption that the covariance matrix of the underlying distribution of the process is unknown, we can modify Randles (1989) to construct an affine-invariant multivariate EWMA chart. First, consider taking a base period sample $(X_{1b}, \ldots, X_{mb})$ and let this sample define a set of hyperplanes where each hyperplane is formed by $p-1$ observations along with the origin. Next, consider taking observations $X_t$ ($t = 1, 2, \ldots$) in the control period. At each time period, t ($t = 1, 2, \ldots$), compute

$$T_t^{2*} = pr(2-r)\sum_{j=1}^{t}\sum_{k=1}^{t}(1-r)^{t-j}(1-r)^{t-k}\cos(\pi\hat{p}_{jk})$$

where $\hat{p}_{jk}$ is the proportion of hyperplanes that separate the observations $X_j$ and $X_k$ (see Chapter 5). Note that $\cos(\pi\hat{p}_{jk})$ is an affine-invariant estimator of $U_j'U_k$.
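The relation between the EWMA recursion and the double-sum form can be checked numerically with a small sketch. It assumes the recursion $Z_t = rU_t + (1-r)Z_{t-1}$ with $Z_0 = 0$ and, as an assumption inferred from the factor $pr(2-r)$ in the double sum, the covariance form $\Sigma_{Z_t} = \frac{r}{2-r}\cdot\frac{1}{p}I$; the function names are hypothetical. For the affine-invariant version, the matrix of cosines would be replaced by the values $\cos(\pi\hat{p}_{jk})$.

```python
import math


def ewma_t2_recursive(units, r, p):
    """T_t^2 for t = 1, 2, ... from Z_t = r*U_t + (1 - r)*Z_{t-1}, Z_0 = 0,
    using the assumed covariance (r / (2 - r)) * (1 / p) * I."""
    z = [0.0] * p
    values = []
    for u in units:
        z = [r * ui + (1.0 - r) * zi for ui, zi in zip(u, z)]
        values.append(p * (2.0 - r) / r * sum(zi * zi for zi in z))
    return values


def ewma_t2_double_sum(cosines, r, p):
    """T_t^2 = p*r*(2 - r) * sum_j sum_k (1-r)^(t-j) (1-r)^(t-k) * cosines[j][k]
    evaluated at t = len(cosines)."""
    t = len(cosines)
    weights = [(1.0 - r) ** (t - j) for j in range(1, t + 1)]
    total = sum(weights[j] * weights[k] * cosines[j][k]
                for j in range(t) for k in range(t))
    return p * r * (2.0 - r) * total
```

For unit vectors the two functions agree at every t when `cosines[j][k]` holds the inner product of the j-th and k-th unit vectors, which is the identity stated in the text.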
The process is declared out-of-control whenever the statistic $T_t^{2*}$ at time t exceeds an appropriate cut-off value L which is obtained by simulation to achieve a pre-specified in-control average run length. The simulation study was conducted by using a base period of size 25. The cut-off value of the affine-invariant multivariate EWMA chart for this base period and for r = 0.10 is 8.53. Figure 6.43 shows the performance of the proposed EWMA chart under the various bivariate distributions that were used in this dissertation.

Figure 6.43 Plot showing the performance of the $V_n$-EWMA chart under the various bivariate distributions. [ARL versus Lambda; legible series labels: t(2), t(5), t(8), t(18), MxNo]

Figure 6.43 indicates that the $V_n$-EWMA chart maintains the pre-specified in-control average run length under all the bivariate distributions that were used in this dissertation. It also does fairly well in detecting the out-of-control states under the various elliptically symmetric distributions. The performance of the chart is poor under the bivariate mixed normal distribution. For instance, an average of about 100 observations need to be sampled before an out-of-control signal is given when the non-centrality value equals 0.50.

Figures 6.44 through 6.49 compare the performance of the normal theory multivariate EWMA, the REWMA, and the $V_n$-EWMA charts under the various bivariate distributions and when r = 0.10.

Figure 6.44 Plot comparing the performance of Lowry et al. (1992), REWMA, and $V_n$-EWMA charts under the bivariate normal distribution. [ARL versus Lambda; series Lowry's, REWMA, $V_n$-EWMA]
Figure 6.45 Plot comparing the performance of Lowry et al. (1992), REWMA, and $V_n$-EWMA charts under the bivariate t distribution with 2 d.f. [ARL versus Lambda; series Lowry's, REWMA, $V_n$-EWMA]
Figure 6.46 Plot comparing the performance of Lowry et al. (1992), REWMA, and $V_n$-EWMA charts under the bivariate t distribution with 5 d.f. [ARL versus Lambda; series Lowry's, REWMA, $V_n$-EWMA]
Figure 6.47 Plot comparing the performance of Lowry et al. (1992), REWMA, and $V_n$-EWMA charts under the bivariate t distribution with 8 d.f. [ARL versus Lambda; series Lowry's, REWMA, $V_n$-EWMA]
Figure 6.48 Plot comparing the performance of Lowry et al. (1992), REWMA, and $V_n$-EWMA charts under the bivariate t distribution with 18 d.f. [ARL versus Lambda; series Lowry's, REWMA, $V_n$-EWMA]
Figure 6.49 Plot comparing the performance of Lowry et al. (1992), REWMA, and $V_n$-EWMA charts under the bivariate mixed normal distribution. [ARL versus Lambda; series Lowry's, REWMA, $V_n$-EWMA]

Figure 6.44 indicates that the multivariate EWMA chart that was proposed by Lowry et al. (1992) performs better than the REWMA and the $V_n$-EWMA charts. However, the performance of the REWMA and the $V_n$-EWMA charts is not too different from the performance of the normal theory multivariate EWMA chart. The REWMA and the $V_n$-EWMA charts are an improvement over the normal theory multivariate EWMA chart under the non-normal bivariate distributions that were used in the dissertation study. The proposed EWMA charts maintain their pre-specified in-control average run lengths and do fairly well in detecting out-of-control states. On the other hand, the normal theory EWMA chart has a very high false alarm rate under the non-normal bivariate distributions. Note that all three charts suffer from the problem of "inertia" in detecting larger shifts in the process mean.
CHAPTER 7
SUMMARY AND CONCLUSIONS

In this dissertation we have proposed charts that are robust to departures from normality as alternatives to the normal theory multivariate control charts. Our simulation results indicate that the normal theory multivariate control charts perform poorly under departures from the assumption of multivariate normality. When departures from normality occur, the type I error rates are high under the heavier tailed distributions and are low under the lighter tailed distributions. In reality, the assumption of multivariate normality rarely holds and in these cases, the use of the normal theory multivariate control charts may lead to wrong conclusions about the nature of the out-of-control signals. One of the objectives of any multivariate control procedure is to maintain the pre-specified type I error rate. Unfortunately, this objective is not achieved when normal theory multivariate control charts are used and deviations from the multivariate normality assumption occur.

The performance of the nonparametric control charts proposed by Liu (1995) was also studied. Our simulation results indicate that Liu's charts do not perform well under deviations from multivariate normality. With Liu's charts the type I error rates are inflated as is the case with most or all of the nonparametric multivariate control charts. Furthermore, to obtain a type I error rate that is acceptable, a large base period sample is required. However, in reality, large base periods are seldom used. Furthermore, the nonparametric CUSUM chart that was proposed by Liu (1995) does not have a reset feature and, therefore, is very slow in detecting small shifts in the process mean.

We have proposed alternative charts which are based on affine invariant multivariate one-sample versions of the sign and sign-rank tests that were developed by Randles (1989), Hettmansperger et al. (1994), and Peters and Randles (1991).
Simulation results indicate that our proposed charts not only maintain their pre-specified type I error rates under the various heavier tailed elliptically symmetric distributions that were used, but also detect out-of-control states more quickly than the other charts do. Simulation results also indicate that our proposed charts perform as well as the normal theory charts and Liu's nonparametric multivariate control charts even under multivariate normality.

Future areas of research involve the investigation of the performance of multivariate control charts under robust estimates of $\mu$ and $\Sigma$. This would include the investigation of Hotelling's $T^2$ chart with the unbiased estimators $\bar{X}$ and $S$ replaced by robust estimators such as the minimum volume estimators of multivariate location and scatter that were proposed by Rousseeuw (1984). We would also like to investigate the properties of the $W_n$ chart with the consistent estimator $\hat{\Sigma}$ replaced by the minimum volume estimator of multivariate scatter. When the multivariate control chart signals that the process is out of control, we believe it is important to investigate which of the p quality characteristics has led to the out-of-control signal. The procedures currently used to do this are normal theory based and we expect that they do not perform well under departures from normality. Finally, we believe that Bayesian multivariate control charts are an area for future research as well.

APPENDIX
FORTRAN PROGRAMS

The programs in this appendix are designed to compute in-control average run lengths of the different multivariate control chart procedures when the random deviates are sampled from the bivariate normal distribution. In order to compute average run lengths in the out-of-control states, simply reset the variable VALUE to an appropriate value of the non-centrality parameter.
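As a language-neutral cross-check of the kind of calculation these programs perform, the following sketch estimates the average run length of a chi-square chart for bivariate individual observations with identity covariance, and compares the in-control case with the closed form available for 2 degrees of freedom, where $P(\chi^2_2 > c) = e^{-c/2}$ and hence ARL $= e^{c/2}$. It is not a translation of the appendix programs: the function names, the `shift` argument (a mean shift applied to the first coordinate), and the number of replications are assumptions chosen for illustration.

```python
import math
import random


def chi2_arl_bivariate(cutoff):
    """Exact in-control ARL for 2 d.f.: P(chi2 > c) = exp(-c/2), ARL = exp(c/2)."""
    return math.exp(cutoff / 2.0)


def simulate_arl(cutoff, shift=0.0, runs=2000, seed=12345):
    """Monte Carlo estimate of the ARL and its simulation standard error.
    Each run draws bivariate standard normal observations (first coordinate
    shifted by `shift`) until x1^2 + x2^2 reaches the cut-off."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(runs):
        count = 0
        while True:
            count += 1
            x1 = rng.gauss(shift, 1.0)
            x2 = rng.gauss(0.0, 1.0)
            if x1 * x1 + x2 * x2 >= cutoff:
                break
        total += count
        total_sq += count * count
    avg = total / runs
    var = (total_sq - total ** 2 / runs) / (runs - 1)
    return avg, math.sqrt(var / runs)
```

With the cut-off 10.60 the closed form gives an in-control ARL of about 200.3, and the simulated average should agree with it to within a few standard errors; a nonzero `shift` yields the corresponding out-of-control run length.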
The programs can also be modified to compute the average run lengths of the multivariate control chart procedures when the random deviates are sampled from the non-normal distributions that were discussed in this dissertation. The calling programs for the subroutines to generate random deviates from other bivariate distributions can be found in the User's Manual for IMSL FORTRAN Subroutines for Statistical Analysis (1989).

1. ARL of the χ² Chart for Individual Observations

C
C     The key input and output variables will be declared now.
C
      integer irank,iseed,nout
      integer i,count1,count2,nr,k
      parameter (nr=1,k=2)
      real ssum1,ssum2,y(nr,k),cov(k,k),rsig(k,k),chi
      real avg,var,std,se,ymean(k),c(nr)
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach,blinf
      call umach(k,nout)
C
C     Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      open (unit=5,status='new',file='h0')
      value=0.00
      call chfac(2,cov,2,0.00001,irank,rsig,2)
      irank=12345679
      call rnset(irank)
      ssum1=0.0
      ssum2=0.0
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 i=1,100000
         count1=0
         count2=1
C
C     If count1 does not equal 1 a random deviate is generated.
C
   11    if (count1 .ne. 1) then
            call rnmvn(nr,k,rsig,k,y,nr)
C
C     Compute the chi-square statistic.
C
            chi=nr*blinf(k,k,cov,k,y,y)
            if (chi .ge. 10.60) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2**2
               count1=count1+1
            endif
            count2=count2+1
            goto 11
         endif
   10 continue
C
C     Compute the average run length and the simulation error.
C
      avg=ssum1/100000.0
      var=(ssum2-(ssum1**2)/100000.0)/99999.0
      std=sqrt(var)
      se=std/sqrt(100000.0)
      write (5,6) avg,std,se
    6 format(1x,f8.3,1x,f8.3,1x,f8.3)
      close(5)
      stop
      end

To modify program 1 to compute average run lengths of the χ² chart for subgroups of observations, reset the variable NR and add the following subroutine:

C
C     Subroutine to compute the mean vector of samples in the control
C     period.
C
      subroutine camean(n,vect,mean)
      integer n,i,j
      real vect(n,2),mean(2)
      do 10 i=1,2
         mean(i)=0.0
         do 20 j=1,n
            mean(i)=mean(i)+vect(j,i)
   20    continue
         mean(i)=mean(i)/n
   10 continue
      return
      end

To invoke this subroutine, add the following call statement in program 1 after the statement Call rnmvn(): Call camean(nr,y,ymean).

2. ARL of the T² Chart for Individual Observations

C
C     The key input and output variables will be declared now.
C
      integer irank,iseed,nout,index
      integer i,count1,count2,nr1,nr2,k
      parameter (nr1=20,nr2=1,k=2)
      real ssum1,ssum2,y1(nr1,k),y2(nr2,k),cov(k,k),rsig(k,k),chi
      real avg,var,std,se,ymean(k),covar(2,2),cut
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach,blinf,linrg
      call umach(k,nout)
C
C     Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      value=0.00
      iseed=1234569
      cut=10.60
      open (unit=5,status='new',file='h0')
      call chfac(2,cov,2,0.00001,irank,rsig,2)
      call rnset(iseed)
      ssum1=0.0
      ssum2=0.0
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 i=1,100000
C
C     Generate the base period sample and compute an estimate for the
C     process mean and the process covariance matrix.
C
         call rnmvn(nr1,k,rsig,k,y1,nr1)
C
C     Refer to program 1 for the program listing of the following
C     subroutine.
C
         call camean(nr1,y1,ymean)
         call cacov(nr1,y1,ymean,covar)
         count1=0
         count2=1
   11    if (count1 .ne. 1) then
C
C     If count1 does not equal 1 generate an observation in the
C     control period.
C
            call rnmvn(nr2,k,rsig,k,y2,nr2)
            y2(1,1)=y2(1,1)-ymean(1)
            y2(1,2)=y2(1,2)-ymean(2)
C
C     Compute the T-squared statistic.
C
            chi=blinf(k,k,covar,k,y2,y2)
            if (chi .ge. cut) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2**2
               count1=count1+1
            endif
            count2=count2+1
            goto 11
         endif
   10 continue
C
C     Compute the average run length and the simulation error.
C
      avg=ssum1/100000.00
      var=(ssum2-(ssum1**2)/100000.0)/99999.0
      std=sqrt(var)
      se=std/sqrt(100000.0)
      write (5,6) avg,std,se
    6 format(1x,f8.3,1x,f8.3,1x,f8.3)
      close(5)
      stop
      end
C
C     Subroutine to compute the sample covariance matrix.
C
      subroutine cacov(n,y1,mean,covar)
      integer i,j,n
      real y1(n,2),mean(2),covar(2,2)
      sum1=0.0
      sum2=0.0
      sum3=0.0
      do 10 i=1,n
         y1(i,1)=y1(i,1)-mean(1)
         y1(i,2)=y1(i,2)-mean(2)
         sum1=sum1+y1(i,1)*y1(i,1)
         sum2=sum2+y1(i,1)*y1(i,2)
         sum3=sum3+y1(i,2)*y1(i,2)
   10 continue
      covar(1,1)=sum1/(n-1)
      covar(1,2)=sum2/(n-1)
      covar(2,1)=covar(1,2)
      covar(2,2)=sum3/(n-1)
      call linrg(2,covar,2,covar,2)
      return
      end

The base period sample size in program 2 is set at 20. In order to simulate different base period sample sizes, simply reset the variable NR1.

3. ARL of the T² Chart for Subgroups of Observations

C
C     The key input and output variables will be declared now.
C
      integer irank,iseed,nout,index
      integer i,count1,count2,nr1,nr2,k
      parameter (nr1=20,nr2=5,k=2)
      real cov(k,k),mesum(k),omean(k),coor2(k)
      real tvect(nr2,k),tvect2(nr2,k),rsig(k,k)
      real coor(2),matrix(k,k),ssum1,ssum2,sampsum(k,k)
      real avg2,std2,matmean(k,k),a(k)
C
C     Declaring the IMSL subroutines.
C
      external chfac,rnmvn,rnset,umach,blinf,linrg
      open(unit=56,status='new',file='r20')
C
C     Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      iseed=12345679
      call rnset(iseed)
      call chfac(k,cov,k,0.00001,irank,rsig,k)
      value=0.00
      ssum1=0.0
      ssum2=0.0
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 count1=1,100000
         mesum(1)=0.0
         mesum(2)=0.0
         do 5 i=1,2
            do 6 j=1,2
               sampsum(i,j)=0.0
    6       continue
    5    continue
C
C     Generate 20 base period samples of size 5 and compute the
C     overall sample mean and sample covariance matrix.
C
         do 11 index2=1,nr1
            call rnmvn(nr2,k,rsig,k,tvect,nr2)
C
C     Refer to program 1 for a program listing of the following
C     subroutine.
C
            call camean(coor,tvect)
            mesum(1)=mesum(1)+coor(1)
            mesum(2)=mesum(2)+coor(2)
C
C     Refer to program 2 for a program listing of the following
C     subroutine.
C
            call cacov(matrix,coor,tvect)
            do 7 i=1,2
               do 8 j=1,2
                  sampsum(i,j)=sampsum(i,j)+matrix(i,j)
    8          continue
    7       continue
   11    continue
         do 1 i=1,2
            do 2 j=1,2
               matmean(i,j)=sampsum(i,j)/real(nr1)
    2       continue
    1    continue
         omean(1)=mesum(1)/real(nr1)
         omean(2)=mesum(2)/real(nr1)
C
C     Compute the inverse of the overall sample covariance matrix.
C
         call linrg(2,matmean,2,matmean,2)
         i=0
         count2=1
C
C     Generate control period samples of size 5.
C
   13    if (i .ne. 1) then
            call rnmvn(nr2,k,rsig,k,tvect2,nr2)
            call camean(coor2,tvect2)
            a(1)=coor2(1)-omean(1)+sqrt(value)
            a(2)=coor2(2)-omean(2)+sqrt(value)
C
C     Compute the T-squared statistic.
C
            t=nr2*blinf(k,k,matmean,k,a,a)
            if (t .ge. 10.60) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2*count2
               i=i+1
            endif
            count2=count2+1
            goto 13
         endif
   10 continue
C
C     Compute the average run length and the simulation error.
C
      call std(100000,ssum1,ssum2,avg2,std2)
      write(56,61) avg2,std2,std2/sqrt(100000.00)
   61 format(1x,f9.4,1x,f9.4,1x,f9.4)
      close(56)
      stop
      end
C
C     Subroutine to compute the average run length and the simulation
C     error.
C
      subroutine std(num,sum1,sum2,avg,std2)
      integer num
      real sum1,sum2,var,std2,avg
      avg=sum1/num
      var=(sum2-(sum1**2)/(num))/(num-1)
      std2=sqrt(var)
      return
      end

The average run length computations in program 3 are based on base periods of 20 samples of size 5 each and control period samples of size 5. These numbers may be modified by resetting the variables NR1 and NR2.

4. ARL of Crosier's (1988) Multivariate CUSUM

C
C     The key input and output variables will be defined now.
C
      integer irank,iseed,nout,index
      integer count1,count2,dev,i,index2
      real y(1,2),cov(2,2),avg,std2
      real ssum1,ssum2,value,rsig(2,2),s_n(2)
      real C_N,k,h,temp_1,temp_2,temp
C
C     Declaration of the IMSL subroutines.
C
      external rnmvn,rnset,chfac,umach
      open(unit=5,status='new',file='r0')
C
C     Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      iseed=12345679
      call rnset(iseed)
      call chfac(2,cov,2,0.00001,irank,rsig,2)
      value=0.0
      ssum1=0.0
      ssum2=0.0
      k=0.5
      h=5.50
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 count1=1,100000
         i=0
         count2=1
         s_n(1)=0.0
         s_n(2)=0.0
   12    if (i .ne. 1) then
            call rnmvn(1,2,rsig,2,y,1)
C
C     Crosier's multivariate CUSUM algorithm.
C
            temp_1=s_n(1)+y(1,1)+sqrt(value)
            temp_2=s_n(2)+y(1,2)+sqrt(value)
            C_N=(temp_1**2)+(temp_2**2)
            C_N=sqrt(C_N)
            if (C_N .le. k) then
               s_n(1)=0.0
               s_n(2)=0.0
            else
               s_n(1)=(s_n(1)+y(1,1))*(1-k/C_N)
               s_n(2)=(s_n(2)+y(1,2))*(1-k/C_N)
            endif
            t=(s_n(1)**2)+(s_n(2)**2)
            t=sqrt(t)
            if (t .ge. h) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2*count2
               i=i+1
            endif
            count2=count2+1
            goto 12
         endif
   10 continue
C
C     The average run lengths and the simulation error will be
C     computed here.
C
      call std(100000,ssum1,ssum2,avg2,std2)
      write(5,6) avg2,std2,std2/sqrt(100000.00)
    6 format(1x,f9.4,1x,f9.4,1x,f9.4)
      stop
      end

5. ARL of Pignatiello and Runger's (1990) Multivariate CUSUM

C
C     The key input and output variables will be declared now.
C
      integer irank,iseed,nout,index
      integer k,n,nr,count1,count2
      parameter(k=2,nr=1)
      real y(nr,k),ssum1,ssum2,C(k),Ct
      real cov(k,k),value,cut,MC1,rsig(k,k)
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach,blinf
      call umach(k,nout)
C
C     Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      value=0.0
      iseed=12345679
      cut=4.75
      ssum1=0.0
      ssum2=0.0
      open(unit=5,status='new',file='h0')
      call chfac(k,cov,k,0.00001,irank,rsig,k)
      MC1=0.0
      n=1
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 index=1,100000
         count1=0
         count2=1
         C(1)=0.0
         C(2)=0.0
   11    if (count1 .ne. 1) then
C
C     Pignatiello and Runger's (1990) multivariate CUSUM procedure.
C
            if (MC1 .gt. 0.0) then
               n=n+1
            else
               n=1
               C(1)=0.0
               C(2)=0.0
            endif
            call rnmvn(nr,k,rsig,k,y,nr)
            C(1)=C(1)+y(1,1)
            C(2)=C(2)+y(1,2)
            Ct=sqrt(blinf(k,k,cov,k,C,C))
            MC1=max(Ct-0.50*n,0.0)
            if (MC1 .ge. cut) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2**2
               count1=count1+1
            endif
            count2=count2+1
            goto 11
         endif
   10 continue
C
C     Compute the average run length and the simulation error.
C
      avg=ssum1/100000.0
      var=(ssum2-(ssum1**2)/100000.0)/99999.0
      std=sqrt(var)
      se=std/sqrt(100000.0)
      write(5,6) avg,std,se
    6 format(1x,f8.3,1x,f8.3,1x,f8.3)
      close(5)
      stop
      end

6. ARL of the Normal Theory Multivariate EWMA Chart (r=0.30)

C
C     The key input and output variables will be declared now.
C
      integer irank,iseed,nout
      integer index0,index1,k,rep
      parameter (k=2)
      integer nr2,ldr2
      parameter(nr2=1,ldr2=1)
      real Zprev(k),Znew(k),covar(2,2)
      real cut,ssum1,ssum2,avg,var,std,se
      real cov(k,k),rsig(k,k),y(nr2,k),chi
C
C     The IMSL subroutines will be declared now.
C C external chfac,mrnvn ,mset,umach,blinf call umach(2,nout) C Initialization of some of the variables. C cov(l, 1 )= 1.0 cov(l ,2)=0.0 PAGE 180 C cov(2,l)=O.O cov(2,2) = 1.0 open (unit=5 , status='new' ,file='hO') call chfac(k , cov.2,0.0000 l ,irank,rsig, 2) iseed= 12345679 call mset(iseed) cut=I0. 08 ssuml=O.O ssum2=0.0 covar( I , I )=5 .67 covar(l , 2)=0.0 covar(2 , I )=0.0 covar(2,2)=5 .67 165 C The first loop will generate 100 , 000 out-of-control signals. C C do I rep= I , I 00000 count! =O count2=1 Zprev( I ) = 0 . 0 Zprev(2) = 0 . 0 Znew( 1 )=O. 0 Znew(2) = 0.0 C If count I does not equal I we generate a random deviate from the control period. C 45 if (countl .ne. 1) then call mmv n(nr2 ,k ,rsig , 2 , y , ldr2) C C Lowry et. al ( 1992) multivariate EWMA procedure C Znew( 1 )=0 . 3 *y(l, 1 )+O. 7*Zprev( I) Znew(2) = 0.3*y(l , 2)+0.7*Zprev(2) chi= blinf(k , k,covar ,k,Znew,Znew) if (chi .ge. cut) then ssum 1 = ssum 1 + count2 ssum2 = ssum2 + count2 * count2 count 1 = count 1 + 1 endif count2=count2+ I Zprev( 1 ) =Zne w( 1) Zpre v( 2 ) = Zne w( 2 ) g oto 4 5 endif 1 continu e PAGE 181 166 avg=ssum 1/100000. 0 var=( ssum2-100000. 0 * (avg** 2) )/99999. 0 std=sqrt(var) se=std/sqrt(20000.0) write( 5 ,6)avg,std,se 6 format( 1 xJ8.3, 1 x , f8.3, 1 x , f8.3) close(5) stop end 7. ARL of the "r" Chart C C The key input and output variables will be declared now. C C integer irank,iseed,ldr , ldrsig,nout , nr , k parameter (nr=2500,ldrsig=2,ldr=2500,k=2) integer nr2,ldr2,totall , sum 1 parameter (nr2 = 1 , ldr2 =1) integer index 1,index2 , i,count2 real total 2, total3 , cov(2,2) ,rsig(k, k ), temp real x(nr2,2),tvect(nr,2) real ssurn 1,ssurn2,status real davg , dstd,value C The IMSL subroutines will be declared now. C C external chfac ,mmvn, mset , umach,munf call umach(2,nout) C Initialization of some of the variables. C ssurnl=O . O ssurn2 = 0.0 value =O.O cov(l, 1 )=1.0 cov(l ,2 ) = 0 . 
0 cov( 2, 1 ) = 0.0 cov(2,2) = 1.0 iseed = 12345679 call mset(iseed) call chfac(k , cov , 2,0.00001 , irank , rsig , ldrsig) PAGE 182 open( unit=5 ,status='new' ,file='rOt') C 167 C This first loop will generate 1000 out-of-control signals C do 10 indexl=l,1000 C C The base period is generated here. C C call rnmvn(nr,k,rsig,ldrsig,tvect,ldr) i=O count2=1 C If no out-of-control signal then generate another random deviate from the control C period. C 12 if (i .ne. 1) then C call rnmvn(nr2,k,rsig,ldrsig,x,ldr2) x(l , 1 )=x( 1, 1 )+sqrt( value) x( 1,2)=x( 1,2)+sqrt(value) C Compute the depth of the control period observation with respect to the base period C data cloud. C call depth( nr, tvect,x,status) if (status .eq. 0.0) then ssum 1 =ssum 1 +count2 ssum2=ssum2+count2 * *2 i=i+l endif count2=count2+ 1 goto 12 endif 10 continue C C See the previous programs for a program listing of the following subroutine. C call std(l000,ssuml,ssum2,davg,dstd) write(S,6) davg,dstd,dstd/sqrt(l 000 . 0) 6 format( 1 x,f9 .4, 1 x,f9 .4, 1 x,f9 .4) close(S) stop end C C This is the calling program for the main Depth Subroutine. C PAGE 183 subroutine depth(num, v ,x , sdep) integer num real v ( num , 2),x( 1 , 2),sdep , hdep real alpha(50000),tx(num) , ty(num) integer f(50000) , i , j , k do 20 i=l, num 168 tx(i)=v(i, 1) ty(i)=v(i , 2) 20 continue C call depth2(x( 1, 1 ) , x( 1 , 2) , num,tx,ty , alpha,f , sdep , hdep) return end C This subroutine will compute the data depths of points in the control period. C C subroutine depth2( u, v ,n , x,y ,alpha,f,sdep , hdep) real u, v , x(n) , y(n) , alpha(n),nums,nbad real p , p2 , eps,d,xu , yu,angle , alphk,betak , sdep , hdep integer f(n) , gi nums=O.O numh=O sdep=O.O hdep=O.O if (n .It. 1) return p=acos(-1.0) p2=p*2 . 0 eps=O.000001 nz=O c Construct the array ALPHA. C do 10 i=l, n d=sqrt((x(i)-u)*(x(i)-u)+(y(i)-v)*(y ( i)-v ) ) if ( d .le. eps) then nz=nz+l else xu=(x(i)-u) / d yu=(y(i)-v)/d if (abs(xu) .gt. abs(yu)) then if (x(i) .ge. 
u) then alpha(i-nz)=asin(yu) if (alpha(i-nz) .It. 0 . 0) then alpha(i-nz)=p2+alpha(i-nz) endif else PAGE 184 alpha(i-nz)=p-asin(yu) endif else if (y(i) .ge. v) then alpha(i-nz)=acos(xu) else alpha(i-nz)=p2-acos(xu) endif endif 169 if (alpha(i-nz) .ge. (p2-eps)) alpha(i-nz)=O.O endif 10 continue nn=n-nz if (nn .le . 1) goto 60 C c Sort the array ALPHA . C call sort(alpha,nn) C c Check whether Z=(U, V) lies outside the data cloud. C angle=alpha( 1 )-alpha(nn)+p2 do 20 i=2 , nn angle=amaxl (angle,(alpha(i)-alpha(i-1 ))) 20 continue if (angle .gt. (p+eps)) goto 60 C c Make smallest ALPHA equal to zero, c and compute NU=number of alpha < pi. C angle = alpha( 1) nu=O do 30 i = l ,nn alpha(i) = alpha(i)-angle if (alpha(i) .lt. (p-eps)) nu= nu+ 1 30 continue if (nu .ge. nn) goto 60 C c Mergesort the alpha with their antipodal angles beta , c and at the same time update i,f(i), and nbad. ja= l jb= l alphk = alpha( l) betak = alpha(nu + 1 )-p nn2 = nn * 2 PAGE 185 nbad=O 1=nu nf=nn do 40 j= 1,nn2 170 if ((alphk+eps) .lt. betak) then nf=nf+l if (ia .lt. nn) then ja=ja+l alphk=alpha(ia) else alphk=p2+ 1.0 endif else i=i+l if (i .eq. (nn+l)) then i=l nf=nf-nn endif endif f(i)=nf nbad=nbad+c((nf-i),2) if (ib .lt. nn) then jb=jb+l if ((ib+nu) .le. nn) then betak=alpha(ib+nu )-p else betak=alpha(ib+nu-nn)+p endif else betak=p2+ 1.0 endif 40 continue nums=c( nn,3 )-nbad C c Computation of NUMH for half space depth. C gi=O ja=l angle=alpha( 1) numh=minO( f( 1 ),( nn-f( 1))) do 50 i=2,nn if (alpha(i) .le. (angle+eps)) then ja=ja+l else gi=gi+ja PAGE 186 171 ja=l angle=alpha(i) endif ki=f(i)-gi numh=minO(numh ,minO(ki,(nn -ki))) 50 continue C c Adjust for the number NT of data points equal to (U,V): C 60 nums=nums+c(nz , 1 )*c(nn,2)+c(nz,2)*c(nn, 1 )+c(nz,3) if (n .ge. 3) sdep=(nums)/c(n,3) numh=nurnh+nz hdep=(numh+0.0)/(n+O.O) return end real function c(m,j) if (m .It. j) then c=O.O else am=real(m) if (j .eq. 1) c=am if(j . eq. 2) c=(am*(am-1))/2 if (j .eq. 
3) c=(am*(am-l)*(am-2))/6 endif return end subroutine sort(b,n) C c Sorts an array B (of length N <=SOOOO) in O(NLogN) time. C real b(n),amm,xx dimension jlv(50000),jrv(50000) jss=l jlv(l)= l jrv(l)= n 10 jndl=jlv(jss) j r= jrv(jss) jss=jss-1 20 jnc=jndl J = Jr jtwe=(jndl+jr)/2 xx = b(jtwe) 30 if (b(jnc) .ge. xx) goto 40 jnc=jnc+ 1 goto 30 PAGE 187 40 if (xx . ge. bU)) goto 50 j=j-1 goto 40 50 if (inc .gt. j) goto 60 amm=b(jnc ) b(jnc)=b(j) b(j)=amm jnc=jnc+l j=j-1 60 if (inc .le. j) goto 30 if ((j-jndl) .lt. (jr-jnc)) goto 80 if (jndl .ge. j) goto 70 jss=jss+ 1 jlv(jss)=jndl jrv(jss)=j 70 jndl=jnc goto 100 80 if (inc . ge. jr) goto 90 jss=jss+ 1 jlv(jss)=jnc jrv(jss)=jr 90 jr=j 1 OOif (jndl . lt. jr) goto 20 if (jss .ne. 0) goto 10 return end 172 8. ARL o f th e " O " C h a rt C C The key input and output variables will be declared now. C C integer irank , iseed , ldr , ldrsig , nout ,nr , k parameter (nr= 2500,ldrsi g=2, ldr= 2 500 , k = 2 ) integer nr2, ldr2 , total l ,suml parameter ( nr 2 =5 , ldr2= 1) integ e r ind e x 1 , ind ex2, i,count2 real total 2, total3 , cov( 2,2 ) , rsig(k , k ) , temp real x (nr 2,2 ) , tvect(nr ,2 ) , ndep(nr) ,s t a tus0 r e al ssum 1 ,s sum2 ,s t atus, cut,staav g real dav g, dstd , value , stattot , x 2 ( 1 , 2 ) PAGE 188 173 C The IMSL subroutines will be declared now. C C external chfac , rnmvn ,mset,umach,munf call umach(2 , nout) C Initialization of some of the variables . C C ssuml=O.O ssum2=0.0 value=O.O cut=0.1668 cov(l, 1 )= 1.0 cov(l ,2)=0.0 cov(2, 1 )=0.0 cov(2,2)=1.0 iseed= 12345679 call mset(iseed) call chfac(k , cov , 2 , 0.00001 , irank , rsig , ldrsig) open( unit=5 , status='new' , file='rOt') C This first loop will generate 1000 out-of-control signals C do 10 index 1 = 1, 1 000 C C The base period is generated here. C C call rnmvn(nr , k , rsig , ldrsig , tvect , ldr) call depthO( tvect , ndep) i=O count2=1 C If no out-of-control signal then generate another sample from the control C period. C 12 if (i .ne . 
1) then C call mmvn(nr2 , k,rsig,ldrsig , x,ldr2) x( 1, 1 )=x( 1 , 1 )+sqrt(value) x( 1,2 )=x( 1,2 )+sqrt( value) C Compute the depth of the control period observ ation w ith respect to the base period C data cloud. C stattot=O.O do 1 00 index 1 = 1 , nr2 x2( 1, 1 )=x(index 1, 1) PAGE 189 x2( l ,2)=x(index 1,2) C 174 C Refer to Program 7 for a program listing of this subroutine. C call depth( nr, tvect.x2 .status) C C This subroutine will compute the r values of each point in the control period. C call check( statusO , status,ndep) stattot=stattot+statusO I 00 continue staavg=stattot/real( nr2) if (staavg .le. cut) then ssum 1 =ssum I +count2 ssum2 =ssum2 +co un t2 * * 2 i=i+ I endif count2=count2+ I goto 12 endif IO continue C C See the previous programs for a program listing of the following subroutine . C call std( I 000,ssum 1,ssum2,davg,dstd) write(5,6) davg,dstd , dstd/sqrt( I 000.0) 6 format(l x,f9 .4, 1 x,f9 .4, I xJ9 .4) close(5) stop end C C This subroutine will compute the value of r for each point in the control period. C subroutine check( avg,status,ndep) integer sum,i real status,ndep(2500),avg sum=O do 10 i= I ,2500 if (ndep(i) .le. status) then sum=sum+l endif 10 continue avg=sum/2500.0 return end PAGE 190 175 C C This subroutine will compute the depths of the points in t he base period sample . C subroutine depthO( tvect , ndep) integer i , j ,k,f( 5000),index 1 real x( 5000),y( 5000) , u( 5000) , v( 5000) real tvect(2500 , 2),ndep(2500) , tx(5000) , ty(5000) real alpha(5000),tukdep do 10 i=l,2500 x(i)=tvect(i , 1) y(i)=tvect(i,2) u(i)=x(i) v(i)=y(i) 10 continue do 20 j= 1,2500 do 21 k=l, j-1 tx(k)=x(k) ty(k)=y(k) 21 continue do 22 l=j+ 1,2500 tx(l)=x(l) ty(l)=y(l) 22 continue do 23 index 1 = 1,2500 if (indexl . eq. j) then do 24 index2=j , 2499 tx(index2)=tx(index2+ 1) ty(index2)=ty(index2+ 1) 24 continue endif 23 continue call depth2(u(j),v(j),2499,tx,ty,alpha,f,simdep,tukdep) ndep(j )=simdep 20 continue return end 9. 
ARL of the "S" Chart C C The key input and output variables will be declared now. C PAGE 191 176 integer irank,iseed , ldr , ldrsig,nout , nr , k parameter (nr=2500,ldrsig=2,ldr=2500,k=2) integer nr2,ldr2,total 1,sum 1,vivek parameter ( nr2= 1,ldr2= 1) C integer index 1 , index2 , i,count2 real x2(1,2),ttl ,sl, s2 real total2 , total3,cov(2,2) , rsig(k,k) real x(nr2 , 2) , tvect(nr , 2),ndep(nr) real ssum 1 , ssum2 ,s tatus,status0 real davg,dstd,value C The IMSL subroutines will be declared now. C C external chfac,mmvn,mset,umach call umach(2 , nout) C Initialization of some of the variables. C C ssuml=O.O ssum2=0.0 value=O.O cov(l, 1 )= 1.0 cov(l ,2)=0.0 cov(2, 1 )=0.0 cov(2,2)=1.0 iseed= 12345679 call mset(iseed) call chfac(k,cov , 2,0 . 00001 , irank,rsig , ldrsig) open(unit=5 , status='new',file='r0') C The first loop will generate 1000 out-of-control signals. C do 10 indexl =1,1000 C C Generate the base period sample. C call mmvn(nr,k,rsig,ldrsig,tvect,ldr) C C Compute the depths of the observations in the base period sample. C Refer to Program 8 for a program listing of this subroutine. C call depthO( tvect,ndep) i=O count2=1 ttl=O.O PAGE 192 177 12 if (i .ne. 1) then C C If no out-of-control signal then generate a point from the control period. C C call rnmvn( 1 , k , rsig,ldrsig,x , 1) x2(1, 1 )=x( 1 , 1 )+sqrt(value) x2( 1,2)=x( 1 , 2)+sqrt(value) C Refer to program 7 for a listing of this subroutine . C call depth( nr , tvect , x2,status) C C Compute the r values for each point in the control period. C call check(statusO,status,ndep) ttl =ttl +status0-0.50 if (ttl .gt. 0.00) then ttl=0.00 else sl =count2*sqrt((l/2500. + l / real(count2))/12.) s2=ttl /sl if (s2 .le . -2 . 578) then ssum 1 =ssum 1 +count2 ssum2=ssum2+count2 * *2 i=i+l endif endif count2=count2+ 1 goto 12 endif 10 continue C C Refer to the previous programs for a program listing of this subroutine. 
C
      call std(500,ssum1,ssum2,davg,dstd)
      write(5,6) davg,dstd,dstd/sqrt(500.0)
    6 format(1x,f9.4,1x,f9.4,1x,f9.4)
      close(5)
      stop
      end

10. ARL of the RST Chart

C
C     The key input and output variables will be declared now.
C
      integer i,irank,iseed,k,ldr,ldrsig,nout,nr
      parameter(nr2=5,k=2,ldrsig=2)
      integer index0,index1,count2
      real avg,var,std,se,dis(nr2),temp
      real dist,tempi,tempj,tempij,rsig(2,2)
      real cov(k,k),stat,r2(nr2,2),ssum1,ssum2,cut,value
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach
      call umach(2,nout)
      open(unit=5,status='new',file='vn0')
C
C     Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      call chfac(k,cov,2,0.00001,irank,rsig,ldrsig)
      ssum1=0.0
      ssum2=0.0
      cut=8.43
      value=0.0
      iseed=12345679
      call rnset(iseed)
C
C     The initial loop will generate 100,000 out-of-control signals.
C
      do 10 index0=1,100000
         index1=0
         count2=1
   12    if (index1 .ne. 1) then
            call rnmvn(nr2,2,rsig,2,r2,nr2)
            do 1 i=1,nr2
               r2(i,1)=r2(i,1)+sqrt(value)
               r2(i,2)=r2(i,2)+sqrt(value)
    1       continue
            stat=0.0
            dist=0.0
C
C     The RST procedure.
C
            do 1002 i=1,nr2
               tempi=r2(i,1)**2+r2(i,2)**2
               do 1003 j=1,nr2
                  dist=r2(i,1)*r2(j,1)+r2(i,2)*r2(j,2)
                  tempj=r2(j,1)**2+r2(j,2)**2
                  tempij=sqrt(tempi*tempj)
                  dist=dist/tempij
                  stat=stat+dist
 1003          continue
 1002       continue
            stat=0.40*stat
            if (stat .ge. cut) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2**2
               index1=index1+1
            endif
            count2=count2+1
            goto 12
         endif
   10 continue
      avg=ssum1/100000.0
      var=(ssum2-(ssum1**2)/100000.0)/99999.0
      std=sqrt(var)
      se=std/sqrt(100000.0)
      write(5,6) avg,std,se
    6 format(1x,f8.3,1x,f8.3,1x,f8.3)
      stop
      end

11. ARL of the PR-SRT Chart

C
C     The key input and output variables will be declared now.
C
      integer i,irank,iseed,k,ldr,ldrsig,nout,nr
      parameter(nr2=10,k=2,ldrsig=2)
      integer index0,index1,count2
      real avg,var,std,se,dis(nr2),temp
      real dist(nr2,nr2),tempi,tempj,tempij,rsig(2,2)
      real cov(k,k),stat,r2(nr2,2),ssum1,ssum2,cut,value
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach
      call umach(2,nout)
      open(unit=5,status='new',file='vn0')
C
C     Initialization of some of the key variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      call chfac(k,cov,2,0.00001,irank,rsig,ldrsig)
      ssum1=0.0
      ssum2=0.0
      cut=10.32
      value=0.0
      iseed=12345679
      call rnset(iseed)
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 index0=1,100000
         index1=0
         count2=1
   12    if (index1 .ne. 1) then
            call rnmvn(nr2,2,rsig,2,r2,nr2)
            do 1 i=1,nr2
               r2(i,1)=r2(i,1)+sqrt(value)
               r2(i,2)=r2(i,2)+sqrt(value)
               dis(i)=r2(i,1)**2+r2(i,2)**2
    1       continue
C
            call sort(nr2,dis,r2)
            stat=0.0
C
C     The PR-SRT procedure.
C
            do 1002 i=1,nr2
               tempi=r2(i,1)**2+r2(i,2)**2
               do 1003 j=1,nr2
                  dist(i,j)=r2(i,1)*r2(j,1)+r2(i,2)*r2(j,2)
                  tempj=r2(j,1)**2+r2(j,2)**2
                  tempij=sqrt(tempi*tempj)
                  dist(i,j)=dist(i,j)/tempij
                  stat=stat+dist(i,j)*i*j/(nr2**2)
 1003          continue
 1002       continue
            stat=0.60*stat
            if (stat .ge. cut) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2**2
               index1=index1+1
            endif
            count2=count2+1
            goto 12
         endif
   10 continue
      avg=ssum1/100000.0
      var=(ssum2-(ssum1**2)/100000.0)/99999.0
      std=sqrt(var)
      se=std/sqrt(100000.0)
      write(5,6) avg,std,se
    6 format(1x,f8.3,1x,f8.3,1x,f8.3)
      stop
      end
C
C     The sort procedure (insertion sort on DATAIN, carrying the rows
C     of VECTOR along).
C
      subroutine sort(count,datain,vector)
      integer i,j,count
      real datain(count),temp,vector(count,2),tvect1,tvect2
      do 10 j=2,count
         temp=datain(j)
         tvect1=vector(j,1)
         tvect2=vector(j,2)
         i=j-1
C
C     The index test is nested so that datain(0) is never referenced.
C
    6    if (i .gt. 0) then
            if (datain(i) .gt. temp) then
               datain(i+1)=datain(i)
               vector(i+1,1)=vector(i,1)
               vector(i+1,2)=vector(i,2)
               i=i-1
               goto 6
            endif
         endif
         datain(i+1)=temp
         vector(i+1,1)=tvect1
         vector(i+1,2)=tvect2
   10 continue
      return
      end

12. ARL of the Robust EWMA Chart

C
C     The key input and output variables will be declared now.
C
      integer irank,iseed,ldr,ldrsig,nout,nr
      integer index0,index1,k,rep,nr2,ldr2
      parameter (ldrsig=2,k=2,nr2=1,ldr2=1)
      real Zprev(k),Znew(k),temp,temp1(1),temp2(1)
      real cut,ssum1,ssum2,value,avg,var,std,se
      real cov(k,k),rsig(k,k),y(nr2,k),temp3
      real t(k),chi,covar(k,k),df,c(1),u2
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach,linrg,blinf,rnunf
      call umach(2,nout)
C
C     Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      covar(1,1)=38.0
      covar(1,2)=0.0
      covar(2,1)=covar(1,2)
      covar(2,2)=38.0
      call chfac(k,cov,2,0.00001,irank,rsig,ldrsig)
      iseed=12345679
      call rnset(iseed)
      value=0.0
      cut=8.05
      ssum1=0.0
      ssum2=0.0
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 1 rep=1,100000
         count1=0
         count2=1
         Zprev(1)=0.0
         Zprev(2)=0.0
         Znew(1)=0.0
         Znew(2)=0.0
   45    if (count1 .ne. 1) then
C
C     The robust EWMA procedure.
C
            call rnmvn(nr2,k,rsig,ldrsig,y,ldr2)
            y(1,1)=y(1,1)+sqrt(value)
            y(1,2)=y(1,2)+sqrt(value)
            u2=sqrt(y(1,1)**2+y(1,2)**2)
            t(1)=y(1,1)/u2
            t(2)=y(1,2)/u2
            Znew(1)=0.1*t(1)+0.9*Zprev(1)
            Znew(2)=0.1*t(2)+0.9*Zprev(2)
            chi=blinf(k,k,covar,k,Znew,Znew)
            if (chi .ge. cut) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2*count2
               count1=count1+1
            endif
            count2=count2+1
            Zprev(1)=Znew(1)
            Zprev(2)=Znew(2)
            goto 45
         endif
    1 continue
      avg=ssum1/100000.0
      var=(ssum2-100000.0*(avg**2))/99999.0
      std=sqrt(var)
C
      se=std/sqrt(100000.0)
      stop
      end

13. ARL of the V(n) Chart

C
C     The key input and output variables will be declared now.
C
      integer i,irank,iseed,k,ldr,ldrsig,nout,nr
      parameter(nr=100,nr2=10,k=2,ldrsig=2,ldr=100)
      integer index1(nr),track(nr,nr),rep,index0
      integer index2,count2
      real avg,var,std,se,rsig(2,2),temp
      real cov(k,k),r(nr,k),phat(nr,nr)
      real w_n,wtemp,reject,val,r2(nr2,2),ssum1,ssum2,cut
      double precision pi
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach
      call umach(2,nout)
C
C     Initialization of some of the variables.
C
      pi=22.0/7.0
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      open(unit=1,status='new',file='rand0')
      call chfac(k,cov,2,0.00001,irank,rsig,ldrsig)
      ssum1=0.0
      ssum2=0.0
      iseed=12345679
      call rnset(iseed)
      rep=100000
      val=0.0
      cut=9.38
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 index0=1,rep
         call rnmvn(nr,k,rsig,ldrsig,r,ldr)
         index2=0
         count2=1
   12    if (index2 .ne. 1) then
            call rnmvn(nr2,2,rsig,2,r2,nr2)
            do 1001 i=1,nr2
               r2(i,1)=r2(i,1)+sqrt(val)
               r2(i,2)=r2(i,2)+sqrt(val)
 1001       continue
            do 30 i=1,nr2
               do 40 j=1,nr2
                  track(i,j)=0
                  phat(i,j)=0.0
   40          continue
   30       continue
            do 50 i=1,nr2
               do 60 j=1,nr2
                  call inter(nr,r2(i,1),r2(i,2),
     *                       r2(j,1),r2(j,2),r,track(i,j))
   60          continue
   50       continue
C
C     The V(n) procedure.
C
            do 80 i=1,nr2
               do 90 j=1,nr2
                  if (i .eq. j) then
                     phat(i,j)=0.0
                  else
                     phat(i,j)=(track(i,j)+1.0)/(nr+0.0)
                  endif
   90          continue
   80       continue
            wtemp=0.0
            do 100 i=1,nr2
               do 200 j=1,nr2
                  wtemp=cos(pi*phat(i,j))+wtemp
  200          continue
  100       continue
            w_n=(2./(nr2+0.0))*wtemp
            if (w_n .ge. cut) then
               ssum1=ssum1+count2
               ssum2=ssum2+count2**2
               index2=index2+1
            endif
            count2=count2+1
            goto 12
         endif
   10 continue
      avg=ssum1/100000.0
      var=(ssum2-(ssum1**2)/100000.)/99999.0
      std=sqrt(var)
      se=std/sqrt(100000.0)
      write(1,2) avg,std,se
    2 format(1x,f8.3,1x,f8.3,1x,f8.3)
      close(1)
      stop
      end
C
C     This subroutine will calculate the interdirections.
C
      subroutine inter(coun,va1,va2,vb1,vb2,v,track)
      integer coun,k,track
      real va1,va2,vb1,vb2,v(coun,2)
      real det1,det2
      track=0
      do 10 k=1,coun
         det1=va1*v(k,2)-va2*v(k,1)
         det2=vb1*v(k,2)-vb2*v(k,1)
         if (det1*det2 .lt. 0) then
            track=track+1
         endif
   10 continue
      return
      end

14. ARL of the W(n) Chart

C
C     The key input and output variables will be declared now.
C
      integer i,irank,iseed,k,ldr,ldrsig,nout,nr
      parameter(nr=25,nr2=15,k=2,ldrsig=2,ldr=25)
      integer index1(nr),track(nr,nr),rep,index0
      integer index2,count2
      real cov(k,k),r(nr,k),covar(k,k),dist(nr),y(k),phat(nr,nr)
      real w_n,wtemp,reject,val,r2(nr2,2),ssum1,ssum2,cut,c(nr2)
      real rsig(k,k)
      double precision pi
C
C     The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach,linrg,rnchi
      call umach(2,nout)
C
C     Initializations of some of the variables.
C
      pi=22.0/7.0
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      call chfac(k,cov,2,0.00001,irank,rsig,ldrsig)
      iseed=12345679
C
      ssum1=0.0
      ssum2=0.0
      call rnset(iseed)
      val=0.0
      cut=10.04
      rep=100000
C
C     The first loop will generate 100,000 out-of-control signals.
C
      do 10 index0=1,rep
         call rnmvn(nr,k,rsig,ldrsig,r,ldr)
         call cacov(nr,covar,r)
         call linrg(2,covar,2,covar,2)
         index2=0
         count2=1
   12    if (index2 .ne. 1) then
C
C     The W(n) procedure.
C
            call rnmvn(nr2,2,rsig,2,r2,nr2)
            do 5 i=1,nr2
               do 6 j=1,2
                  r2(i,j)=r2(i,j)+sqrt(val)
    6          continue
    5       continue
            do 20 i=1,nr2
               y(1)=r2(i,1)
               y(2)=r2(i,2)
               dist(i)=blinf(2,2,covar,2,y,y)
   20       continue
C
C     See program 11 for a listing of this subroutine.
C
            call sort(nr2,dist,r2)
            do 25 i=1,nr2
               index1(i)=i
   25       continue
            do 30 i=1,nr2
               do 40 j=1,nr2
                  track(i,j)=0
                  phat(i,j)=0.0
   40          continue
   30       continue
            do 50 i=1,nr2
               do 60 j=1,nr2
C
C     See program 13 for a program listing of the following
C     subroutine.
C
              call inter(nr,r2(i,1),r2(i,2),
     *                   r2(j,1),r2(j,2),r,track(i,j))
 60         continue
 50       continue
          do 80 i=1,nr2
            do 90 j=1,nr2
              if (i .eq. j) then
                phat(i,j)=0.0
              else
                phat(i,j)=(track(i,j)+1.0)/(nr+0.0)
              endif
 90         continue
 80       continue
          wtemp=0.0
          do 100 i=1,nr2
            do 200 j=1,nr2
              wtemp=cos(pi*phat(i,j))*index1(i)*index1(j)/
     *              ((nr2**2+0.0))+wtemp
 200        continue
 100      continue
          w_n=(6./(nr2+0.0))*wtemp
          if (w_n .ge. cut) then
            ssum1=ssum1+count2
            ssum2=ssum2+count2**2
            index2=index2+1
          endif
          count2=count2+1
          goto 12
        endif
 10   continue
      avg=ssum1/rep
      var=(ssum2-((ssum1)**2)/rep)/(rep-1)
      std=sqrt(var)
      se=std/sqrt(real(rep))
      stop
      end

15. ARL of the H Chart

C
C The key input and output variables will be declared now.
C
      integer irank,iseed,ldr,ldrsig,nout,nr
      integer index0,index1,k,rep
      parameter (nr=100,ldrsig=2,ldr=100,k=2)
      integer nr2,ldr2
      parameter(nr2=10,ldr2=10)
      real cut,ssum1,ssum2,value,avg,var,std,se
      real cov(k,k),rsig(k,k),y(nr2,k),yt(k,nr2)
      real x(nr,k),xt(k,nr),covar(k,k),u,v,eq(k),cp
      real osn(k,nr),ost(nr,k),t(k),chi,ost2(nr2,k),temp
C
C The IMSL subroutines will be declared now.
C
      external chfac,rnmvn,rnset,umach,linrg,blinf
      call umach(2,nout)
C
C Initialization of some of the variables.
C
      cov(1,1)=1.0
      cov(1,2)=0.0
      cov(2,1)=0.0
      cov(2,2)=1.0
      open (unit=5,status='new',file='h0')
      call chfac(k,cov,2,0.00001,irank,rsig,ldrsig)
      iseed=12345679
      call rnset(iseed)
      value=0.0
      cut=9.84
      ssum1=0.0
      ssum2=0.0
      do 1 rep=1,100000
        call rnmvn(nr,k,rsig,ldrsig,x,ldr)
C
C The first loop will generate 100,000 out-of-control signals.
C
        do 10 index0=1,nr
          xt(1,index0)=x(index0,1)
          xt(2,index0)=x(index0,2)
 10     continue
C
C Get the sign vectors of the vectors in the base period.
C
        do 40 index0=1,nr
          call Rn(nr,index0,xt,osn(1,index0),osn(2,index0))
          ost(index0,1)=osn(1,index0)
          ost(index0,2)=osn(2,index0)
 40     continue
C
C Compute the matrix B1.
C
        call cacov(nr,covar,ost)
        call linrg(k,covar,k,covar,k)
        count1=0
        count2=1
 45     if (count1 .ne. 1) then
C
C Get the control period vectors and compute the sign vectors.
C
          call rnmvn(nr2,k,rsig,ldrsig,y,ldr2)
          do 50 index0=1,nr2
            yt(1,index0)=y(index0,1)+sqrt(value)
            yt(2,index0)=y(index0,2)+sqrt(value)
 50       continue
          do 60 index0=1,nr2
            u=0.0
            v=0.0
            do 70 index1=1,nr
              eq(1)=xt(2,index1)
              eq(2)=-xt(1,index1)
              cp=eq(1)*yt(1,index0)+eq(2)*yt(2,index0)
              if (cp .lt. 0.0) then
                eq(1)=-eq(1)
                eq(2)=-eq(2)
              elseif (cp .gt. 0.0) then
                eq(1)=eq(1)
                eq(2)=eq(2)
              else
                eq(1)=0.0
                eq(2)=0.0
              endif
              u=u+eq(1)
              v=v+eq(2)
 70         continue
            ost2(index0,1)=u/nr
            ost2(index0,2)=v/nr
 60       continue
          call sum(nr2,ost2,t)
          chi=(blinf(k,k,covar,k,t,t))/real(nr2)
          if (chi .ge. cut) then
            ssum1=ssum1+count2
            ssum2=ssum2+count2*count2
            count1=count1+1
          endif
          count2=count2+1
          goto 45
        endif
 1    continue
C
C Compute the average run lengths and the simulation errors
C
      avg=ssum1/100000.0
      var=(ssum2-100000.0*(avg**2))/99999.0
      std=sqrt(var)
      se=std/sqrt(100000.0)
      write(5,6)avg,std,se
 6    format(1x,f8.3,1x,f8.3,1x,f8.3)
      close(5)
      stop
      end
C
C Compute the vector T1.
C
      subroutine sum(N,ost,t)
      integer index0,index1,N,k
      parameter (k=2)
      real ost(N,k),t(k)
      do 10 index0=1,k
        t(index0)=0.0
        do 20 index1=1,N
          t(index0)=t(index0)+ost(index1,index0)
 20     continue
 10   continue
      return
      end
C
C These subroutines will compute the sign vectors.
C
      subroutine Rn(N,i,x,u,v)
      integer N,i,q
      real x(2,N),u,v,s1,s2,t1,t2
      s1=0.0
      s2=0.0
      do 10 q=1,N
        call sqeq(N,i,q,x,t1,t2)
        s1=s1+t1
        s2=s2+t2
 10   continue
      u=s1/N
      v=s2/N
      return
      end
      subroutine sqeq(N,i,q,x,t1,t2)
      integer i,q,temp1,N
      real x(2,N),t1,t2
      real eq1,eq2,temp2
      t1=0.0
      t2=0.0
      eq1=0.0
      eq2=0.0
      temp1=q-i
      if (temp1 .ne. 0) then
        eq1=x(2,q)
        eq2=-x(1,q)
        temp2=eq1*x(1,i)+eq2*x(2,i)
        if (temp2 .lt. 0) then
          t1=-eq1
          t2=-eq2
        else
          t1=eq1
          t2=eq2
        endif
      endif
      return
      end

REFERENCES

Alt, F. B. (1984). Multivariate quality control. Encyclopedia of Statistical Sciences, Volume 6, S. Kotz and N. Johnson, eds. John Wiley and Sons, New York, 110-122.

Alt, F. B. and Smith, N. (1988). Multivariate Process Control. Handbook of Statistics, Elsevier, Amsterdam, 333-351.

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis (Second Edition). John Wiley and Sons, New York.

Crosier, R. B. (1988). Multivariate generalizations of cumulative sum quality-control schemes. Technometrics, 30(3), 291-303.

Crowder, S. V. (1989). Design of exponentially weighted moving average schemes. Journal of Quality Technology, 21(3), 155-162.

Healy, J. D. (1987). A note on multivariate CUSUM procedures. Technometrics, 29(4), 409-412.

Hettmansperger, T. P., Nyblom, J. and Oja, H. (1994). Affine invariant multivariate one sample sign tests. Journal of the Royal Statistical Society B, 56(1), 221-234.

Hueter, I. (1994). The convex hull of a normal sample. Advances in Applied Probability, 26, 855-875.

Hotelling, H. (1947). Multivariate quality control, illustrated by the air testing of sample bombsights. Techniques of Statistical Analysis. McGraw Hill, New York, 111-184.

Hunter, J. S. (1986). The exponentially weighted moving average chart. Journal of Quality Technology, 18(4), 203-210.

Jackson, J. E. (1956). Quality control methods. Industrial Quality Control, 12(7), 2-6.

Jackson, J. E.
(1959). Quality control methods for several related variables. Technometrics, 1(4), 359-377.

Jackson, J. E. (1980). Principal components and factor analysis: Part I - Principal components. Journal of Quality Technology, 12(4), 201-213.

Jackson, J. E. (1985). Multivariate Quality Control. Communications in Statistics - Theory and Methods, 14(11), 2657-2688.

Johnson, M. E. (1986). Multivariate Statistical Simulation. John Wiley and Sons, New York.

Liu, R. Y. (1990). On a notion of data depth based on random simplices. The Annals of Statistics, 18(1), 405-414.

Liu, R. Y. (1995). Control charts for multivariate processes. Journal of the American Statistical Association, 90(432), 1380-1387.

Liu, R. Y. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests. Journal of the American Statistical Association, 88(421), 252-260.

Lowry, C. A., Woodall, W. H., Champ, C. W., and Rigdon, S. E. (1992). A multivariate exponentially weighted moving average chart. Technometrics, 34(1), 46-53.

Lucas, J. M. and Saccucci, M. S. (1990). Exponentially weighted moving average control schemes: Properties and enhancements. Technometrics, 32(1), 1-29.

Montgomery, D. C. (1991). Introduction to Statistical Quality Control (Second Edition). John Wiley and Sons, New York.

Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. John Wiley and Sons, New York.

Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1), 100-115.

Peters, D. and Randles, R. H. (1990). A multivariate signed-rank test for the one-sample location problem. Journal of the American Statistical Association, 85(410), 552-557.

Pignatiello, J. J. and Runger, G. C. (1990). Comparisons of multivariate CUSUM charts. Journal of Quality Technology, 22(3), 173-186.

Randles, R. H. (1989). A distribution-free multivariate sign test based on interdirections.
Journal of the American Statistical Association, 84(408), 1045-1050.

Roberts, S. W. (1959). Control chart tests based on geometric moving average charts. Technometrics, 1(3), 239-250.

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79(5), 871-881.

Rousseeuw, P. J. and Ruts, I. (1992). Bivariate simplicial depth. Technical Report, University of Antwerp, Department of Mathematics and Computer Science.

Sullivan, J. H. and Woodall, W. H. (1996). A comparison of multivariate control charts for individual observations. Journal of Quality Technology, 28(4), 398-408.

Tracy, N. D., Young, J. C., and Mason, R. L. (1992). Multivariate control charts for individual observations. Journal of Quality Technology, 24(2), 88-95.

Woodall, W. H. and Ncube, M. M. (1985). Multivariate CUSUM quality control procedures. Technometrics, 27(3), 285-292.

Woodall, W. H. and Faltin, F. W. (1996). An overview and perspective on control charting. Statistical Applications in Process Control. Marcel Dekker, Inc., New York, 7-20.

User's Manual, IMSL STAT/LIBRARY, FORTRAN Subroutines for Statistical Analysis (1989).

BIOGRAPHICAL SKETCH

Vivek Balraj Ajmani was born on December 18, 1963 in Calcutta, India. He moved to Bombay, India in 1969 and lived there until 1987, earning a Diploma in Hotel Management and Catering Technology from the Institute of Hotel Management and Catering Technology in 1985. After a short stint as a Management Trainee at the Nataraj Hotel in Bombay, Vivek came to the Indiana University of Pennsylvania, Indiana, PA in January 1987. He earned a Bachelor of Science degree with a double major in Computer Science and Applied Mathematics in 1990 and earned a Master of Science in Applied Mathematics in 1992.
Vivek entered the Department of Statistics at the University of Florida in 1992 and received a Master of Statistics degree in 1994, and will receive a Doctor of Philosophy in Statistics in May 1998. While at the University of Florida, Vivek taught a number of undergraduate statistics courses, a job he enjoyed very much. Vivek was also a statistics consultant in the statistics center at the Center for Instructional & Research Computing Activities (CIRCA) from July 1994 until January 1998. Outside statistics, Vivek enjoys watching movies, listening to jazz and blues, cooking, and collecting Lionel toy trains. Vivek is married to the former Dr. Preeti Saini. After graduation, Vivek plans on working for the 3M company located in Austin, Texas.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

G. Geoffrey Vining, Chairman
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Ronald Randles
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Malay Ghosh
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
John Cornell
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Dianne Schaub
Assistant Professor of Industrial and Systems Engineering

This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1998

Dean, Graduate School |