Local assessment of perturbations

Material Information

Title:
Local assessment of perturbations
Physical Description:
viii, 202 leaves : ill. ; 29 cm.
Language:
English
Creator:
Hartless, Glen Lawson, 1971-
Publication Date:
2000
Subjects

Subjects / Keywords:
Statistics thesis, Ph. D   ( lcsh )
Dissertations, Academic -- Statistics -- UF   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 2000.
Bibliography:
Includes bibliographical references (leaves 194-201).
Statement of Responsibility:
by Glen Lawson Hartless.
General Note:
Printout.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 024884073
oclc - 45301198
System ID:
AA00020429:00001

Full Text

LOCAL ASSESSMENT OF PERTURBATIONS


By

GLEN LAWSON HARTLESS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2000

Copyright 2000

by

Glen Lawson Hartless

To Christy

ACKNOWLEDGMENTS

I would first like to thank my chairman, Jim Booth, and cochairman, Ramon Littell, for all of their help and guidance. Their combined knowledge, insight, and mentoring have been extraordinary. I would also like to express my gratitude to them for allowing me to pursue avenues of research that are probably closer to my heart than theirs. Thanks also go to the other esteemed professors who served on my supervisory committee: James Algina, Randy Carter, and James Hobert.

I would also like to acknowledge people who have influenced me academically and professionally over the years: Chuck Nowalk, John Grant, and many others in the Phillipsburg and Lopatcong school systems; Walter Pirie and the faculty at Virginia Tech; Ron DiCola at AT&T/Lucent; Larry Leemis at the College of William & Mary; Danny Perkins, Bo Beaulieu, and many others at the University of Florida; and fellow students Jonathan Hartzel, Phil McGoff, and Galin Jones, among many others. I would also like to thank Allan West for maintaining an extremely reliable computing environment.

Thanks also go to my parents for their love and support and for instilling in me the discipline needed to achieve my goals.

Finally, my utmost thanks go to my wife, Christine. I am forever grateful for her unequivocal love, unending patience, and thoughtful insight.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION TO LOCAL INFLUENCE

    1.1 Premise
    1.2 Measuring the Effects of Perturbations
    1.3 Local Influence Diagnostics
    1.4 Perturbations in Regression
    1.5 Basic Perturbations
    1.6 Brief Literature Review
    1.7 Outline

2 COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS

    2.1 Curvatures and the Direction of Maximum Curvature
    2.2 Local Influence Analysis of a Profile Likelihood Displacement
    2.3 Building Blocks of Local Influence for Regression
    2.4 Perturbations to the Response in Regression
    2.5 Unequal Measurement Precision Applied to Regression
    2.6 Benchmarks for Perturbation Analyses
    2.7 Local Assessment Under Reparameterizations

3 PERTURBATIONS TO THE X MATRIX IN REGRESSION

    3.1 Perturbations to Explanatory Variables
    3.2 Perturbations to a Single Column
    3.3 Perturbations to a Single Row
    3.4 Perturbations to the Entire Design Matrix

4 LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS

    4.1 Inferential Platform
    4.2 Local Assessment of Perturbations
    4.3 Acceleration, Curvature and Normal Curvature
    4.4 Algebraic Expressions for Local Assessment
    4.5 Computational Formulas for J and P
    4.6 Local Assessment for Maximum Likelihood Inference
    4.7 First-order Assessment of the Log Generalized Variance
    4.8 Local Assessment for Quasi-likelihood Inference

5 LOCAL ASSESSMENT OF PREDICTIONS

    5.1 Prediction via First Principles
    5.2 Predictive Likelihood
    5.3 Influence Measures for Prediction
    5.4 Estimation and Prediction in Linear Mixed Models
    5.5 Local Influence Methods for Linear Mixed Models
    5.6 One-way Random Effects Model
    5.7 Residual Variance Perturbation Scheme
    5.8 Cluster Variance Perturbation Schemes
    5.9 Application to Corn Crop Data

6 MULTI-STAGE ESTIMATIONS AND FUTURE RESEARCH

    6.1 Local Assessment in Multi-stage Estimations
    6.2 Synopsis and Additional Future Research

APPENDICES

A MODEL FITTING INFORMATION FOR SCHOOL DATA

B MODEL FITTING INFORMATION FOR CORN CROP DATA

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



LOCAL ASSESSMENT OF PERTURBATIONS



By

Glen Lawson Hartless

August 2000

Chairman: James G. Booth
Major Department: Statistics

Statistical models are useful and convenient tools for describing data. However, models can be adversely affected by mistaken information, poor assumptions, and small subsets of influential data. Seemingly inconsequential changes to the observed data, the postulated model, or the inference technique may greatly affect the scientific conclusions made from model-fitting. This dissertation considers local assessment of perturbations in parametric and semi-parametric models. The effects of perturbation are assessed using a Taylor-series approximation in the vicinity of the fitted model, leading to graphical diagnostics that allow the analyst to identify sensitive parameter estimates and sources of undue influence.

First, computational formulas are given to assess the influence of perturbations on linear combinations of the parameter estimates for maximum likelihood. A diagnostic plot that identifies sensitive fitted values and self-predicting observations in linear regression is introduced. Second, the eigensystem of the acceleration matrix for perturbing each element of the design matrix in linear regression is derived. This eigensystem leads to a diagnostic plot for identifying specific elements that are influential, and it is shown how the results relate to principal components, collinearity, and influence diagnostics.

Next, it is shown that the conceptual framework and computational tools extend easily from maximum likelihood to inference that is performed using estimating equations. In particular, local assessment of parameter estimates and standard errors derived from optimal estimating functions is addressed. Algebraic expressions and computational formulas are given for both first- and second-order assessment, and applications such as quasi-likelihood are considered.

Finally, the techniques are applied to linear mixed models in order to assess the influence of individual observations and clusters. While previous applications considered only estimation, the effects of perturbations on both estimation and prediction are addressed. It is shown that the effects on the fixed and random effect inferences can be assessed simultaneously by using an influence measure based on the unstandardized predictive log-likelihood of the random effects.

CHAPTER 1
INTRODUCTION TO LOCAL INFLUENCE
1.1 Premise

George Box (1982, p. 34) stated "all models are wrong; some models are useful," and A. F. Desmond (1997b, p. 117) stated "a robust method ought not to perform poorly in neighborhoods of the nominally true model." Indeed, conclusions based on a statistical model should be reliable under small perturbations to the data, the model's assumptions, or the particulars of the estimation method. In a seminal paper, Cook (1986) utilized the basic concept of perturbing the postulated model or the observed data to develop a variety of diagnostic tools. He showed how small perturbations can be used to develop easily computable diagnostics for almost any model estimated by maximum likelihood (ML).

1.2 Measuring the Effects of Perturbations

In a statistical analysis, parameter estimates are a crucial part of the decision-making process. These estimates are a function of the observed data, the postulated model, and the analyst's choice of statistical inference technique. Each of these three aspects of a statistical-modeling situation is subject to uncertainty and doubt: observed data may be incorrectly recorded or subject to rounding error, a slightly different model may be just as plausible as the postulated model, and nuisance aspects of modeling such as choosing hyperparameters are subject to second-guessing. If small changes in these aspects of a statistical modeling problem create large changes in parameter estimates, the analysis is unreliable. Therefore, the change in parameter estimates due to perturbation can serve as a barometer for the robustness of the scientific conclusions.








Following Cook (1986), a vector of q perturbations is denoted as ω = (ω_1, ω_2, ..., ω_q)', and the q-dimensional space of possible perturbations is denoted as Ω. Included in this space is one point, ω_0, that does not change the original formulation of the problem, and this is called the null perturbation. The perturbations are not parameters to be estimated; rather, they are known real-valued quantities introduced into the problem by the analyst. These perturbations might represent measurement errors, unequal variances, case weights, correlated errors, missing predictors, or some other change to the data, model, or inference method. Prudent choice of perturbations can lead to useful diagnostics. For example, if parameter estimates are sensitive to perturbing the contribution (weight) of a particular observation to estimation, it indicates that the observation is overly influential on the estimates.

For data y and parameters θ, let L(θ; y) denote the log-likelihood from the original postulated model, and let L_ω(θ; y) denote the log-likelihood resulting from the inclusion of perturbations. If the perturbation specified is the null perturbation, then the two likelihoods are equivalent. The maximum likelihood estimators (MLEs) under these two likelihoods are denoted θ̂ and θ̂_ω, respectively. Cook (1986) used the likelihood displacement to summarize parameter estimate changes under perturbation:

    LD(ω) = 2[L(θ̂; y) - L(θ̂_ω; y)].   (1.1)

From inspection of (1.1), we see that the likelihood displacement is the difference of the original likelihood evaluated at two values: the original MLEs and the perturbed MLEs. Because θ̂ maximizes the original likelihood, LD(ω) is non-negative and achieves its minimum value of zero when ω = ω_0. As the perturbed MLEs become increasingly different from the original MLEs, LD(ω) becomes larger. Thus, the size of LD(ω) provides a measure of how much the perturbation affects parameter estimation.
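Although the next section argues against computing LD(ω) pointwise, a brute-force evaluation makes the definition concrete. The sketch below is illustrative rather than part of the original text; it assumes Python with numpy, a normal linear model, and the additive response perturbation y + ω introduced in Section 1.4.1:

    import numpy as np

    def log_lik(beta, sigma2, X, y):
        """Normal linear model log-likelihood, up to an additive constant."""
        resid = y - X @ beta
        return -0.5 * len(y) * np.log(sigma2) - 0.5 * (resid @ resid) / sigma2

    def likelihood_displacement(w, X, y):
        """LD(w) of (1.1): refit with the perturbed response y + w, then
        evaluate the ORIGINAL likelihood at both sets of MLEs."""
        def mle(yy):
            beta = np.linalg.solve(X.T @ X, X.T @ yy)
            resid = yy - X @ beta
            return beta, (resid @ resid) / len(yy)   # MLE of sigma^2 divides by n
        beta0, s0 = mle(y)        # original MLEs
        betaw, sw = mle(y + w)    # perturbed MLEs
        return 2.0 * (log_lik(beta0, s0, X, y) - log_lik(betaw, sw, X, y))

By construction, likelihood_displacement returns zero at the null perturbation w = 0 and is non-negative elsewhere, mirroring the properties of LD(ω) noted above.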
Effects on a subset of parameters, θ_1 from θ' = (θ_1', θ_2'), can be obtained by using the profile likelihood displacement:

    LD^[θ_1](ω) = 2[L(θ̂_1, θ̂_2; y) - L(θ̂_1ω, g(θ̂_1ω); y)],   (1.2)

where θ̂_1ω is from θ̂_ω = (θ̂_1ω', θ̂_2ω')', and g(θ_1) is the vector that maximizes the profile log-likelihood, L(θ_1, θ_2), with respect to θ_2 for fixed θ_1. The profile likelihood displacement allows influence to be "directed" through θ̂_1, with the other parameter estimates changing only in sympathy.

In the next section it is shown how to obtain diagnostic tools from a local influence analysis of the likelihood displacement and the profile likelihood displacement.

1.3 Local Influence Diagnostics

Although we could simply choose values for ω and compute LD(ω), this is not particularly useful, for two reasons. First, LD(ω) is often not a closed-form function of ω; hence, finding its value may require re-fitting the model. Second, even if calculating LD(ω) were easy, we must somehow decide what values to use for ω. This choice would be highly data-dependent and a nuisance for the analyst.

Therefore, we would like a methodology that can examine the information contained in the function LD(ω) and yet does not require a choice of actual values for ω nor involve re-estimating the model parameters. Following Cook (1986), this can be accomplished using the notion of curvature. Simple plots can then be used to identify whether estimation is sensitive to any particular elements of the q-dimensional perturbation.


















[Figure 1.1: Schematic of a plane curve with its tangent line. The angle between the tangent line and the x-axis is φ.]


1.3.1 Curvature of a Plane Curve

Curvature is a measure of how quickly a curve turns or twists at a particular point (Swokowski 1988). Let (f(t), g(t)) be a smooth plane curve. The arc length parameter s of such a curve is the distance travelled along the curve from a fixed point A to another point on the curve (Figure 1.1). Letting t_A denote the value of t that yields point A, s can be computed as

    s(t) = ∫ from t_A to t of √((f'(u))² + (g'(u))²) du,   (1.3)

and we may express the curve in terms of s as (f(s⁻¹(s)), g(s⁻¹(s))). Additionally, let T(s) be the tangent line at a point on the curve, and let φ denote the angle between T(s) and the x-axis.

Curvature is defined as the absolute value of the rate of change in φ with respect to s:

    C = |dφ/ds|.   (1.4)

A plane curve that is turning quickly at (f(t), g(t)) has a larger curvature than a curve that is nearly straight at that point.








[Figure 1.2: Curve with two circles of curvature]



The reciprocal of curvature is called the radius of curvature, and it is the radius of the best-fitting circle at that point on the curve. These circles of curvature have the same first and second derivatives and tangent line as the curve and lie on its concave side. Figure 1.2 shows the curve ((3/2) sin t, (1/4) sin 2t) and two of its circles of curvature. The one on the left occurs when t = 3π/2, while the other occurs when t = arccos(1/√2). Their curvatures are 6 and 8/9, respectively. The expression "turning on a dime" refers to a sharp turn: one in which the radius of curvature is small, and consequently the curvature of the path is large.

The formula for curvature is

    C = |f'(t)g''(t) - g'(t)f''(t)| / [(f'(t))² + (g'(t))²]^(3/2).   (1.5)

We are particularly interested in assessing curvature at the local minimum of a function from x to y, i.e., the plane curve (x, g(x)) with f(t) = t = x. For this special case, curvature measures how quickly the function increases from that minimum value. Since g'(x_0) = 0 at the minimum, (1.5) simplifies to the acceleration of the curve:

    C = ∂²g(x)/∂x² evaluated at x = x_0,   (1.6)






















[Figure 1.3: Likelihood displacement with directional vectors]


where g(x_0) is the function's minimum. This result can be applied to LD(ω) at ω_0 when there is only a single perturbation, ω. Here, the curvature is given by

    C = ∂²LD(ω)/∂ω² evaluated at ω = ω_0,   (1.7)

and it measures how quickly the parameter estimates change when ω is moved away from the null perturbation. A large curvature indicates that even if ω is close, or local, to ω_0, the likelihood displacement increases substantially. Hence, a large curvature indicates that even a slight perturbation can be very influential on parameter estimates.

1.3.2 Using Plane Curves to Describe LD(ω)

The curvature of plane curves can also be used to describe LD(ω) when q > 1. Now ω can be moved away from ω_0 in different directions, as shown by the vectors in Figure 1.3 for a two-dimensional perturbation. These vectors can be written as

    ω(a) = ω_0 + a v,   (1.8)


















[Figure 1.4: Intersection of plane and surface creates space curve]


where the vector v determines the direction and the value a determines the position
relative to wo. Let P, denote a plane that contains w,(a) and extends directly
upward from 0 to intersect LD(w) (Figure 1.4). The intersection of P" and LD(w)
forms a space curve, (w,(a)', LD(w,(a)))1 Because this space curve is contained
within the plane P,, its curvature can be obtained as if it were a plane curve,

(a, LD(aw,(a)). Thus the curvature of LD(w,(a)) at w0 is simply

Cv 2LD( (a)) (1.9)
Oa2 La=O

Different directions produce curvatures of various sizes. The normalized vector that
produces the largest curvature is denoted Vmax and is of particular interest.

1.3.3 Direction of Maximum Curvature, v_max

The relative sizes of the elements of v_max indicate which elements of ω are most influential on the parameter estimates. For example, if the ith element of v_max


¹ More formally, these space curves are called normal sections. The term comes from the fact that each plane, P_v, contains the surface's principal normal vector, the vector that is orthogonal to the surface's tangent plane at ω_0. Curvatures of the normal sections are referred to as normal curvatures.








is large, then ω_i is a key player in producing the largest curvature of LD(ω) at the null perturbation, i.e., in producing an immediate change in parameter estimates.

The direction of maximum curvature is the main diagnostic of Cook's local influence methodology. Typically, v_max is standardized to have length one and is then plotted against the indices 1, 2, ..., q. If any elements are particularly large relative to the others, they stand out in the plot. The analyst then may want to take steps to alleviate the sensitivity to the perturbation. What, if any, steps to take depends upon the perturbation scheme and the particulars of the data analysis.

For some models and perturbation schemes, the formula for v_max is known explicitly, while in other situations it requires some moderate computations. In analyses such as regression, v_max is typically a function of familiar quantities.

1.4 Perturbations in Regression

Consider the normal linear regression model:

    Y = Xβ + ε,   (1.10)

where Y is an n × 1 vector of responses, X is a known n × k model matrix of rank k, β is a k × 1 vector of unknown parameters, and ε is an n × 1 vector of independent, identically distributed normal random variables with mean 0 and variance σ² > 0. Hence, Y ~ N(Xβ, σ²I_n), and θ = (β_1, ..., β_k, σ²)'.

1.4.1 Scheme 1: Measurement Error in the Response

In order to examine the effects of measurement error in the response, Schwarzmann (1991) introduced perturbations to the observed data. He considered fitting a model to the observed data set with slight perturbations: y_i + ω_i, for i = 1, ..., q. Hence, ω_0 is a vector of zeroes for this perturbation scheme. By letting q = n, all of the observations can be perturbed simultaneously. This allows the










Table 1.1: Scottish hills race data

    Observation   Record time (min.)   Distance (miles)   Climb (ft)
         1             16.083                2.5              650
         2             48.350                6.0             2500
         3             33.650                6.0              900
         4             45.600                7.5              800
         5             62.267                8.0             3070
         6             73.217                8.0             2866
         7            204.617               16.0             7500
         8             36.367                6.0              800
         9             29.750                5.0              800
        10             39.750                6.0              650
        11            192.667               28.0             2100
        12             43.050                5.0             2000
        13             65.000                9.5             2200
        14             44.133                6.0              500
        15             26.933                4.5             1500
        16             72.250               10.0             3000
        17             98.417               14.0             2200
        18             78.650                3.0              350
        19             17.417                4.5             1000
        20             32.567                5.5              600
        21             15.950                3.0              300
        22             27.900                3.5             1500
        23             47.633                6.0             2200
        24             17.933                2.0              900
        25             18.683                3.0              600
        26             26.217                4.0             2000
        27             34.433                6.0              800
        28             28.567                5.0              950
        29             50.500                6.5             1750
        30             20.950                5.0              500
        31             85.583               10.0             4400
        32             32.383                6.0              600
        33            170.250               18.0             5200
        34             28.100                4.5              850
        35            159.833               20.0             5000

    Source: Atkinson (1986)







analyst to determine which observations would have the most effect on parameter estimates in the event that there is measurement or rounding error.

1.4.1.1 Influence on the full vector of parameter estimates

Atkinson (1986) analyzed a data set of record times for 35 different hill races, as recorded by the Scottish Hill Runners Association (Table 1.1). Scatter plots indicate the positive association of the three variables: winning time, race distance, and race climb (Figure 1.5). The influence of perturbations on the parameter estimates from a regression of winning time on DISTANCE and CLIMB is of interest. For convenience, the regressors and response have been centered and scaled, thereby eliminating the need for an intercept. Thus, the full vector of p parameters is θ = (β_dist, β_climb, σ²)'.

Figure 1.6 shows v_max plotted against the indices 1 through q = n = 35 for the Scottish hills data. Element 18, which corresponds to perturbation of the 18th observation, is the dominant contributor to the direction of maximum curvature. This observation is from a short race with little CLIMB, and yet it has a large record time. Similar races have much smaller record times, indicating that race 18 is an outlier. Atkinson (1986) has actually suggested that the winning time for this race may have been 18.67 minutes rather than 1 hour and 18.67 minutes.

Interestingly, it can be shown that the direction of maximum curvature for Scheme 1 is proportional to the residuals. Thus, if we standardized the vector of residuals to length 1 and plotted them against case indices, we would obtain Figure 1.6. This implies that parameter estimates are most sensitive to a set of small additive measurement errors when those errors tend to exaggerate (or correct) the model's error in estimating each data point, with the most exaggeration (or correction) occurring for the most poorly predicted points.




















[Figure 1.5: Hills data; scatter plots with highlighted cases. Panels: (a) winning time vs. race distance; (b) winning time vs. race climb; (c) race climb vs. race distance.]

[Figure 1.6: Hills data; response perturbation; v_max for θ̂.]


[Figure 1.7: Hills data; response perturbation. Panels: (a) v_max for directed influence on β̂_dist; (b) v_max for directed influence on β̂_climb.]



1.4.1.2 Influence on a subset of parameter estimates

Plotting v_max for the profile likelihood displacement, LD^[θ_1](ω), can indicate which components of ω affect a subset of parameters. Examining influence on θ_1 = σ² results in the same v_max as the original likelihood displacement, i.e., it is proportional to the residuals. The directions of maximum curvature for θ_1 = β_dist and θ_1 = β_climb are shown in Figures 1.7(a) and 1.7(b). The first plot indicates that small measurement error in race 11's winning time is most influential on β̂_dist. In the plot of v_max for β̂_climb, elements 7 and 11 stand out. The fact that the two elements have opposite signs can be used for interpretation. They indicate that maximum curvature is primarily achieved by perturbing winning times 7 and 11 in opposite directions (i.e., making one observation smaller and the other larger). Thus, if the record winning times for races 7 and 11 were slightly closer together or farther apart, this would affect the slope estimate for CLIMB.

1.4.1.3 Plotting options

[Figure 1.8: Hills data; alternate rendering of v_max]

Note that the standardized v_max is only unique up to sign change. For example, Figure 1.8 displays the same standardized vector as Figure 1.6, but with all of the elements' signs switched. This suggests an alternative plot in which the



signs of the elements are suppressed: a star plot. Here, each element is displayed as a point, with the distance from the origin equal to its absolute value. The q elements are simply displayed around the point (0, 0), starting in the first quadrant and ending in the fourth quadrant. The plot is completed by labelling the points with indices. Labelling all points can result in a cluttered plot (Figure 1.9(a)), and so a better option is to label only the largest elements (Figure 1.9(b)). In these renderings, element 18 is again noticeably large, but our attention is also drawn to element 7. This observation has the characteristics of an influential observation: it has the highest climb and one of the longest distances (Figure 1.5(c)), as well as the second largest residual.

Unlike the index plot, the star plot does not give the illusion that the indices are in a specific order. However, the suppression of element signs can be an advantage or a disadvantage. The advantage is that when looking at several plots of directional vectors, suppressing the signs helps the analyst identify patterns by emphasizing element size. The disadvantage is that the diagnostic interpretation of the signs is lost.










[Figure 1.9: Hills data; |v_max| as a star plot. Panels: (a) all elements numbered; (b) largest elements numbered.]



1.4.2 Scheme 2: Perturbations to Measurement Precision

Another assumption that can be perturbed is that of equal variance (Cook 1986). This is accomplished by letting V(y_i) = σ²/ω_i for i = 1, ..., n. Here the null perturbation is ω_0 = 1_n.

With this perturbation scheme, we deem that some observations are observed with more precision than others. The resulting perturbed MLEs weight each case according to its perturbed variance. If certain cases are influential, then even a slight down-weighting or up-weighting of them produces large displacements.

This perturbation scheme has an important connection to the widely used influence measure Cook's Distance (Cook 1977). Recall that the ith Cook's Distance is a measure of the change in regression parameter estimates that results from deleting the ith observation. Clearly a deleted observation provides no information for finding β̂. Similarly, as the ith perturbation is decreased from 1, the ith observation provides less and less information to the estimation process owing to its increased variance. Thus, case-deletion can be thought of as a rather drastic perturbation: one that results in a variance → ∞ for the ith observation. Case-deletion is referred to as a global influence technique because it assesses the effect of perturbations that are not local to the null perturbation in Ω.

1.4.2.1 Influence on the full vector of parameter estimates

Figure 1.10(a) shows the direction of maximum curvature for the Scottish hills data set. Again, element 18 dominates the direction of maximum curvature. This implies that the precision of observation 18 is the most influential on the full set of parameter estimates.

1.4.2.2 Influence on σ̂²

Using the profile likelihood displacement, influence directed upon σ̂² alone can be assessed. Observation 18 is again implicated by v_max as being most locally influential (Figure 1.10(b)).

1.4.2.3 Influence on individual regression parameter estimates

Figure 1.10(d) shows v_max when isolating effects on β̂_dist. Although there are several points that stand out, observation 7 is identified as being most influential since element 7 is large. This race is also implicated as being locally influential in the plot of v_max for β̂_climb (Figure 1.10(c)). In fact, the DFBETAS diagnostic (Belsley et al. 1980) also identifies race 7 as being influential on β̂_climb. Recall that the value of DFBETAS_ij measures how the jth parameter estimate is affected by deleting the ith observation. Writing the n DFBETAS for the jth parameter estimate in vector form, it is shown in Section 2.5 that this vector is proportional to the following Hadamard (element-wise) product of vectors:































1
1hn rl rlj
1-hil
1
1-h2 r2 r2J (1.11)
1


1- ) f r. rj

Here, hi is the ith diagonal element of the "hat" matrix X(X'X)-1X', ri is the ih

residual, and rij is the ill residual from regressing Xj on the other columns of X.

It can be shown (see Section 2.5) that vma,, for directing the effects of the variance

perturbation on 4j is proportional to


ri rij

r2 r2j (1.12)
r. v L 2


r, rni


The elements of the two vectors differ only by a factor of 1 In fact, standardizing

the vector of 35 DFBETAS for 4c3imb to length 1 and plotting them (Figure 1.11)

produces a figure nearly identical to Figure 1.10(c).








1.4.2.4 Influence on both regression parameter estimates

Figure 1.10(e) displays the direction of maximum curvature for directed influence on β̂. Plot (f) is the direction of largest curvature under the constraint of orthogonality to v_max. The two plots are very similar to the directed influence plots in (c) and (d).

1.5 Basic Perturbations

As an alternative to finding the direction of maximum curvature, we may utilize basic perturbations. The ith basic perturbation is the same as the null perturbation except for an additive change of 1 in its ith element. For example, the first basic perturbation is ω_0 + (1, 0, ..., 0)', and corresponds to moving a distance of 1 along the first axis in Ω. Typically, the curvatures for each of the q basic perturbations would be computed and then plotted together.

As an example, consider applying the variance perturbation to a linear regression, and examining directed influence on β̂. In this case, the ith basic curvature, C_b_i, is a local influence analog to the ith Cook's Distance, because it corresponds to re-weighting a single observation. It is shown in Section 2.7.2.1 that the ith basic perturbation for this scheme measures influence on the ith fitted value. Figure 1.12(a) plots the 35 curvatures for the basic perturbations applied to the Scottish hills data, while Figure 1.12(b) plots the 35 Cook's Distances. Although the scales differ, they provide similar information. Both plots draw attention to race 7, but they differ somewhat in their emphasis of races 11 and 18.

[Figure 1.12: Hills data; C_b's for the variance perturbation and Cook's Distances. Panels: (a) basic curvatures; (b) Cook's Distances.]

1.6 Brief Literature Review

Local influence techniques have been applied to a number of statistical models, most notably univariate and multivariate linear models (Cook 1986; Cook and Weisberg 1991; Lawrance 1991; Schwarzmann 1991; Tsai and Wu 1992; Wu and Luo 1993a; Kim 1995; Kim 1996c; Tang and Fung 1996; Fung and Tang 1997; Kim 1998; Rancel and Sierra 1999). A number of articles have also appeared on applications in generalized linear models (GLMs) (Cook 1986; Thomas and Cook 1989; Thomas and Cook 1990; Thomas 1990; Lee and Zhao 1997) and survival analysis (Pettitt and Daud 1989; Weissfeld and Schneider 1990; Weissfeld 1990; Escobar and Meeker 1992; Barlow 1997; Brown 1997). Other applications include linear mixed models (Beckman et al. 1987; Lesaffre and Verbeke 1998; Lesaffre et al. 1999), principal components analysis (Shi 1997), factor analysis (Jung et al. 1997), discriminant analysis (Wang et al. 1996), optimal experimental design (Kim 1991), structural equations (Cadigan 1995; Wang and Lee 1996), growth curve models (Pan et al. 1996; Pan et al. 1997), spline smoothing (Thomas 1991), Box-Cox transformations (Lawrance 1988; Kim 1996b), nonlinear models (St. Laurent and Cook 1993; Wu and Wan 1994), measurement error models (Zhao and Lee 1995; Lee and Zhao 1996), and elliptical linear regression models (Galea et al. 1997).

Many authors have presented modifications of, extensions to, and theoretical justifications for the basic methodology (Vos 1991; Schall and Gonin 1991; Schall and Dunne 1992; Farebrother 1992; Wu and Luo 1993a; Billor and Loynes 1993; Kim 1996a; Fung and Kwan 1997; Lu et al. 1997; Cadigan and Farrell 1999; Poon and Poon 1999). Extensions to Bayesian methodology have also been considered (McCulloch 1989; Lavine 1992; Pan et al. 1996).

1.7 Outline

In Chapter 2, computational formulas for local influence analysis are discussed and derived for the two perturbation schemes discussed in this chapter. Chapter 3 contains derivations and examples of local influence diagnostics for linear regression under perturbations to the X matrix. In Chapter 4, the methodology is extended to a general class of influence measures and estimation techniques. Computational tools are also provided. In Chapter 5, inference and local influence analysis for prediction problems are discussed. Techniques for assessing estimation and prediction in linear mixed models are developed and applied to data. In Chapter 6, some preliminary results that can be applied to multi-stage estimations are presented. The chapter also contains commentary on additional areas of future research. The appendices contain details for data analyses.













CHAPTER 2
COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS

In this chapter, computational formulas for local influence are reviewed (Cook 1986; Beckman et al. 1987). To demonstrate the mechanics of the formulas, detailed derivations of previous results for the regression model (Cook and Weisberg 1991; Schwarzmann 1991) are presented with minor additions. Next, the issue of how to obtain objective benchmarks is considered, and some novel solutions are outlined. Finally, local influence is discussed under reparameterizations of the perturbation space Ω and the parameter space Θ. By applying previous results (Escobar and Meeker 1992; Pan et al. 1997), some new diagnostic tools for influence on fitted values in linear regression are derived.

2.1 Curvatures and the Direction of Maximum Curvature

Cook (1986) showed that the curvature of LD(ω) in direction v can be expressed as

    C_v = 2 v'Δ'(-L̈)⁻¹Δv,   (2.1)

where -L̈ is the observed information matrix of the original model,

    L̈ = ∂²L(θ; y)/∂θ∂θ' evaluated at θ = θ̂,

and Δ (p × q) has elements

    Δ_ij = ∂²L_ω(θ; y)/∂θ_i∂ω_j evaluated at θ = θ̂, ω = ω_0,

for i = 1, ..., p and j = 1, ..., q. This result is proved in Chapter 4 as a special case of Theorem 4.2.








The q × q matrix F = 2Δ'(-L̈)⁻¹Δ is the acceleration matrix. It is easily obtained because Δ is a function of L_ω(θ; y), and -L̈⁻¹ is typically computed as part of ML estimation since it serves as an estimated asymptotic covariance matrix for the parameter estimates. Hence, computing the curvature in a particular direction is straightforward. In addition, the diagonal elements of the acceleration matrix are the q basic curvatures. Finally, it is well known how to obtain the maximum value of a quadratic form such as (2.1) (Johnson and Wichern 1992, pp. 66-68). Specifically, C_max is the largest eigenvalue of F, and v_max is the corresponding eigenvector. Hence, finding C_max and v_max is numerically feasible.
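The computation just described is mechanical enough to sketch in a few lines. The following is a minimal illustration, not code from the dissertation; it assumes Python with numpy and that Delta (p × q) and Lddot (p × p) have already been evaluated at the fitted model:

    import numpy as np

    def local_influence(Delta, Lddot):
        """Acceleration matrix F = 2 Delta' (-Lddot)^(-1) Delta and its eigensystem."""
        F = 2.0 * Delta.T @ np.linalg.solve(-Lddot, Delta)
        eigvals, eigvecs = np.linalg.eigh(F)   # symmetric eigendecomposition, ascending
        C_max = eigvals[-1]                    # largest curvature
        v_max = eigvecs[:, -1]                 # corresponding unit-length direction
        basic = np.diag(F)                     # the q basic curvatures
        return C_max, v_max, basic, F

Using eigh rather than a general eigensolver exploits the symmetry of F, and the remaining eigenpairs give the orthogonal directions of large curvature discussed next.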

The acceleration matrix can provide additional information. For example, the second largest eigenvalue is the largest curvature in a direction that is orthogonal to v_max. Hence, its corresponding eigenvector indicates a direction of large local influence that is unrelated to the first one. Additional elements of the spectral decomposition of F can be used in a similar fashion.

A consequence of this derivation is that it shows how local influence techniques reduce the dimensionality of examining LD(ω) to a small number of curvatures and directional vectors. It is well known that the number of non-zero eigenvalues of a symmetric matrix such as F is equal to its rank. Since it is a product of other matrices, the rank of F is no more than the minimum of their ranks, which is typically p.

In the regression examples given previously, p = 3 and q = 35, meaning that the local influence approach can summarize the 35-dimensional surface LD(ω) with three curvatures and directional vectors. Similarly, LD^[θ_1](ω) was summarized with just two, as shown in Figures 1.10(c) and (d).

Despite the advantages of the eigen-analysis of F, one difficulty that arises is the occurrence of eigenvalues with multiplicities greater than one. For example, if C_max = λ_1 has multiplicity greater than 1, then there is no unique v_max to plot. In these situations it may be more prudent to look at directed influence on single parameter estimates.

2.2 Local Influence Analysis of a Profile Likelihood Displacement

The profile likelihood displacement, LD^[θ_1](ω), was presented in Section 1.2 as a measure of influence directed through θ̂_1. Cook (1986) showed that the acceleration matrix can be expressed as

    F^[θ_1] = -2Δ'(L̈⁻¹ - B_22)Δ,   (2.2)

where

    B_22 = [ 0   0
             0   L̈_22⁻¹ ],

L̈_22 comes from the partition

    L̈ = [ L̈_11   L̈_12
          L̈_21   L̈_22 ],

and

    L̈ = ∂²L(θ; y)/∂θ∂θ' evaluated at θ = θ̂.

This result is proved in Chapter 4 as a special case of Theorem 4.2.

Beckman et al. (1987) considered directing influence on a single parameter estimate. They noted that in this case, the acceleration matrix is the outer product of a vector, and hence has only one non-zero eigenvalue. Without loss of generality, suppose that we wish to direct influence on the first parameter estimate in θ̂. The eigensystem is then given by

    λ = 2||Δ'T||²,   v ∝ Δ'T,   (2.3)

where T is the vector for which TT' = -(L̈⁻¹ - B_22). It can be expressed as

    T = (1/√(-c)) [ 1
                    -L̈_22⁻¹L̈_21 ],   (2.4)

where c = L̈_11 - L̈_12L̈_22⁻¹L̈_21.

2.3 Building Blocks of Local Influence for Regression

In this section, the building blocks of F for a multiple linear regression are constructed. Subsequent sections apply these results to the response and variance perturbations used in Chapter 1.

2.3.1 Notation

Unless otherwise noted, the derivations assume the presence of an intercept in the model so that the residuals sum to zero. Centering and scaling the regressors tends to facilitate interpretation, but is not necessary.

    The number of regressors is k = p - 1.

    H = X(X'X)⁻¹X'.

    X_i represents the ith column of X, and x_i' represents the ith row of X.

    r = (I_n - H)y represents the residuals.

    r_i represents the residuals from fitting a model to the ith predictor using the other predictors. These can be considered as values of an adjusted predictor (i.e., adjusted for relationships with the other predictors). Note: Hr_i = r_i and (I - H)r_i = 0.

    X_(i) represents the X matrix with the ith column removed. Similarly, β_(i) is the regression parameter vector with the ith element removed, and Δ_(i) is Δ without the ith row.








2.3.2 Acceleration Matrix

The ML estimate of β is the same as that of ordinary least squares, β̂ = (X'X)⁻¹X'y, and the ML estimate of σ² is σ̂² = r'r/n. The p × p estimated asymptotic variance-covariance matrix of the MLEs is

    -L̈⁻¹ = [ σ̂²(X'X)⁻¹   0
             0'           2σ̂⁴/n ].

Partitioning Δ' into [Δ_β'  Δ_σ²'], the acceleration matrix is given by

    F = 2Δ'(-L̈⁻¹)Δ
      = 2σ̂² Δ_β'(X'X)⁻¹Δ_β + (4σ̂⁴/n) Δ_σ²'Δ_σ².


2.3.3 Acceleration Matrix for Directed Influence on σ̂²

The acceleration matrix for directed influence on σ̂² utilizes the matrix B_22, whose non-zero block here occupies the β position (θ_1 = σ², θ_2 = β), so that

    L̈⁻¹ - B_22 = [ 0    0
                   0'   -2σ̂⁴/n ].

This implies that the acceleration matrix for directed influence on σ̂² is

    F^[σ²] = -2Δ'(L̈⁻¹ - B_22)Δ = (4σ̂⁴/n) Δ_σ²'Δ_σ².   (2.5)

Noting that Δ_σ²' is in fact a q × 1 vector, there will be a single eigenvalue and eigenvector, given by

    C_max = (4σ̂⁴/n) ||Δ_σ²||²,   v_max ∝ Δ_σ²'.

These quantities are the maximum curvature and its direction, respectively.








2.3.4 Acceleration Matrix for Directed Influence on β̂

When directing influence on β̂ (so that θ_2 = σ²), we have

    L̈⁻¹ - B_22 = [ -σ̂²(X'X)⁻¹   0
                   0'            0 ].

The resulting acceleration matrix is

    F^[β] = 2σ̂² Δ_β'(X'X)⁻¹Δ_β.   (2.6)

Note that this implies that directed local influence on β̂ is equivalent to local influence for β̂ when treating σ² as known and equal to its MLE.

2.3.5 Acceleration Matrix for Directed Influence on a Single Regression Parameter

By applying the general results of Beckman et al. (1987), we can construct T (2.4) for directed influence on β̂_1 in a regression model. Using results for the inverse of a partitioned matrix, TT' = -(L̈⁻¹ - B_22) can be expressed as

    TT' = (σ̂²/d) [ 1                            -X_1'X_(1)(X_(1)'X_(1))⁻¹                              0
                   -(X_(1)'X_(1))⁻¹X_(1)'X_1     (X_(1)'X_(1))⁻¹X_(1)'X_1X_1'X_(1)(X_(1)'X_(1))⁻¹      0
                   0                             0'                                                    0 ],   (2.7)

where d = X_1'X_1 - X_1'X_(1)(X_(1)'X_(1))⁻¹X_(1)'X_1.

We denote (X_(1)'X_(1))⁻¹X_(1)'X_1 by γ̂_1, as it is the vector of regression parameter estimates from modeling X_1 as a linear combination of the other explanatory variables. Noting also that d = ||r_1||², this simplifies the notation of (2.7) greatly:

    -(L̈⁻¹ - B_22) = (σ̂²/||r_1||²) [ 1       -γ̂_1'      0
                                     -γ̂_1    γ̂_1γ̂_1'   0
                                     0       0'         0 ].   (2.8)

Using these results, we find

    T = (σ̂/||r_1||) [ 1
                      -γ̂_1
                      0 ],   (2.9)

and this leads to the following expressions for the maximum curvature and its direction when directing influence on β̂_1:

    C_max = 2||Δ'T||²,   v_max ∝ Δ'T.

The next two sections apply these results to document more completely the two perturbation schemes given in Chapter 1.
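Before doing so, here is a small computational sketch that assembles these building blocks. It assumes Python with numpy, and the helper names regression_blocks and adjusted_regressor are illustrative, not from the text:

    import numpy as np

    def regression_blocks(X, y):
        """Building blocks of Section 2.3 for the normal linear model."""
        n, k = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        beta_hat = XtX_inv @ X.T @ y          # OLS = ML estimate of beta
        H = X @ XtX_inv @ X.T                 # hat matrix
        r = y - H @ y                         # residuals
        sigma2 = (r @ r) / n                  # ML estimate of sigma^2 (r'r / n)
        # -Lddot^(-1) = diag(sigma2 (X'X)^(-1), 2 sigma2^2 / n)
        neg_Linv = np.zeros((k + 1, k + 1))
        neg_Linv[:k, :k] = sigma2 * XtX_inv
        neg_Linv[k, k] = 2.0 * sigma2 ** 2 / n
        return beta_hat, sigma2, H, r, neg_Linv

    def adjusted_regressor(X, j):
        """r_j: residuals from regressing the jth column of X on the others."""
        X_other = np.delete(X, j, axis=1)
        gamma_hat = np.linalg.lstsq(X_other, X[:, j], rcond=None)[0]
        return X[:, j] - X_other @ gamma_hat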

2.4 Perturbations to the Response in Regression

The results of this section closely follow those of Schwarzmann (1991), with only minor additions. As in Chapter 1, a set of additive perturbations is introduced into the observed data to examine the influence of measurement error in the response. An n × 1 vector of perturbations is introduced such that the perturbed likelihood is based on Y + ω ~ N(Xβ, σ²I_n). The null perturbation is ω_0 = 0_n. Equivalent results can be obtained by perturbing the model of the mean to be Xβ - ω; that is, by considering a model assuming Y ~ N(Xβ - ω, σ²I_n). Hence, the measurement error perturbation can be interpreted as a type of mean-shift outlier model; each perturbation is a "stand-in" for an additional parameter that could correct or exacerbate lack-of-fit.








2.4.1 Building Blocks

The perturbed likelihood is

    L_ω(θ; y) ∝ -(n/2) log σ² - (1/(2σ²)) (y - Xβ + ω)'(y - Xβ + ω)
              = -(n/2) log σ² - (1/(2σ²)) (ỹ'ỹ + 2ω'ỹ + ω'ω),

where ỹ = y - Xβ. The next step is construction of Δ. The partial derivative with respect to ω is

    ∂L_ω(θ; y)/∂ω = -(1/σ²)(ỹ + ω) = -(1/σ²)(y - Xβ + ω).

Second partial derivatives are then

    ∂²L_ω(θ; y)/∂ω∂β' = (1/σ²) X
    ∂²L_ω(θ; y)/∂ω∂σ² = (1/σ⁴)(y - Xβ + ω).

Evaluating at the original model yields

    ∂²L_ω(θ; y)/∂ω∂β' at θ = θ̂, ω = ω_0:  (1/σ̂²) X
    ∂²L_ω(θ; y)/∂ω∂σ² at θ = θ̂, ω = ω_0:  (1/σ̂⁴) r.

From these quantities, the n × p matrix Δ' can be constructed:

    Δ' = [ (1/σ̂²) X   (1/σ̂⁴) r ].

Finally, the acceleration matrix is

    F = 2Δ'(-L̈⁻¹)Δ = (2/σ̂²) X(X'X)⁻¹X' + (4/(nσ̂⁴)) rr'
      = (2/σ̂²) H + (4/(nσ̂⁴)) rr'.   (2.10)
2.4.2 Directed Influence on σ̂²

The acceleration matrix for the profile likelihood displacement for σ̂² is simply the second term of (2.10):

    F^[σ²] = (4/(nσ̂⁴)) rr'.   (2.11)

Solving the characteristic equations (F - λI)v = 0 directly, the single eigensolution is

    C_max = λ_max = 4/σ̂²,   v_max ∝ r.

Therefore, as mentioned in Chapter 1, v_max for directed influence on σ̂² is proportional to the residuals.
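This result is easy to confirm numerically. The following sketch uses simulated data (not the hills data) and assumes Python with numpy; it is illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 35, 2
    X = rng.standard_normal((n, k))                  # centered/scaled regressors
    y = X @ np.array([1.0, -0.5]) + rng.standard_normal(n)

    H = X @ np.linalg.solve(X.T @ X, X.T)
    r = y - H @ y
    sigma2 = (r @ r) / n                             # MLE of sigma^2

    F_s2 = 4.0 / (n * sigma2 ** 2) * np.outer(r, r)  # (2.11)
    vals, vecs = np.linalg.eigh(F_s2)
    print(np.isclose(vals[-1], 4.0 / sigma2))        # C_max = 4 / sigma2-hat: True
    print(np.allclose(np.abs(vecs[:, -1]),
                      np.abs(r) / np.linalg.norm(r)))  # v_max prop. to r: True

The eigenvalue check uses the identity ||r||² = n σ̂², which holds exactly for the ML estimate of σ².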

2.4.3 Directed Influence on β̂_j

Without loss of generality, consider directed influence on β̂_1. Rather than find the acceleration matrix, we proceed directly to determining the sole eigenvector, proportional to Δ'T, and the corresponding curvature 2||Δ'T||², as per (2.3) and (2.9):

    v_max ∝ Δ'T = (1/σ̂²) [X_1  X_(1)] (σ̂/||r_1||) [ 1
                                                     -γ̂_1 ]
                = (1/(σ̂ ||r_1||)) [X_1 - X_(1)γ̂_1]
                = (1/(σ̂ ||r_1||)) r_1
                ∝ r_1,

and

    C_max = 2 ||(1/(σ̂ ||r_1||)) r_1||² = 2/σ̂².

Therefore, the direction of maximum curvature for directed influence on β̂_j is proportional to the jth adjusted predictor.

2.4.3.1 Directed influence on β̂

The acceleration matrix for the profile likelihood displacement for β̂ is the first term of (2.10):

    F^[β] = (2/σ̂²) H.   (2.12)

This matrix has a single non-zero eigenvalue of 2/σ̂² with multiplicity k, implying that there are an infinite number of directional vectors that have maximum curvature. Although not discussed by Schwarzmann (1991), one set of orthogonal solutions can be motivated by considering the canonical form of the model:

    Y = Zα + ε,   (2.13)

where

    Z = XΦ,   α = Φ'β,   Φ = [φ_1 φ_2 ... φ_k],

and φ_i is the ith eigenvector of X'X. Because ΦΦ' = I_k, this is simply a reparameterization of the original regression model. The new regressors, Z_1 = Xφ_1, ..., Z_k = Xφ_k, are known as the principal components, and they are mutually orthogonal. In addition, each satisfies the characteristic equations of F^[β]:

    (F^[β] - (2/σ̂²)I) Z_i ∝ (H - I_n) Xφ_i = 0.   (2.14)

Therefore, the standardized principal component regressors can serve as an orthogonal set of directional vectors, each of which has maximum curvature. Note also that the k adjusted regressors, r_1, ..., r_k, are also directions of maximum curvature, but they do not constitute an orthogonal set.
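A quick numerical confirmation of (2.14), on simulated data and assuming Python with numpy (illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 35, 2
    X = rng.standard_normal((n, k))
    H = X @ np.linalg.solve(X.T @ X, X.T)

    _, Phi = np.linalg.eigh(X.T @ X)     # columns phi_i: eigenvectors of X'X
    Z = X @ Phi                          # principal-component regressors

    # (H - I) X phi_i = 0 because each Z_i lies in the column space of X,
    # so each standardized Z_i is an eigenvector of F^[beta] = (2/sigma2) H.
    print(np.allclose((H - np.eye(n)) @ Z, 0.0))   # True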

2.4.3.2 Influence on θ̂

The acceleration matrix F given in (2.10) has two non-zero eigenvalues: 4/σ̂² and 2/σ̂², with the second one having multiplicity k. The first eigenvector is ∝ r, and the set of k principal components can be used to complete an orthogonal set of eigenvectors. Confirmation of these solutions is straightforward.

2.5 Unequal Measurement Precision Applied to Regression

Cook introduced this perturbation in his original (1986) paper on local influence, and the results in this section come from his work. An n × 1 vector of perturbations is introduced such that the perturbed likelihood is based upon Y ~ N(Xβ, σ²D(1/ω)), where D(1/ω) denotes a diagonal matrix with ith diagonal element 1/ω_i. The null perturbation is ω_0 = 1_n. The n basic perturbations would be the local versions of the n Cook's Distances (Cook 1977; Cook 1979). In addition, the n basic perturbations applied to LD^[β](ω) are local influence analogs of the n DFBETAS for β̂.

A second interpretation of this perturbation scheme comes from the idea of case-weights. Case-weights are used in an analysis when each observation has been sampled with unequal probability. Under disproportionate sampling, cases are weighted by the inverse of their sampling probability in order to obtain unbiased regression parameter estimates. In the normal linear model, this weighting is numerically equivalent to perturbing the error variances. Hence, this perturbation scheme can be used to examine sensitivity to the assumption of random sampling.

2.5.1 Building Blocks

To begin, we develop the perturbed likelihood, which is

    L_ω(θ; y) ∝ -(n/2) log σ² - (1/(2σ²)) (y - Xβ)'D(ω)(y - Xβ)
              = -(n/2) log σ² - (1/(2σ²)) ỹ'D(ω)ỹ.

The next step is construction of Δ. The partial derivative with respect to β is

    ∂L_ω(θ; y)/∂β = -(1/(2σ²)) (-2X'D(ω)y + 2X'D(ω)Xβ) = (1/σ²) X'D(ω)ỹ,

and the partial derivative with respect to σ² is

    ∂L_ω(θ; y)/∂σ² = -n/(2σ²) + (1/(2σ⁴)) ỹ'D(ω)ỹ.

Second partial derivatives are then

    ∂²L_ω(θ; y)/∂β∂ω' = (1/σ²) X'D(ỹ)
    ∂²L_ω(θ; y)/∂σ²∂ω' = (1/(2σ⁴)) (ỹ ⊙ ỹ)',

and evaluating at the original model yields

    ∂²L_ω(θ; y)/∂β∂ω' at θ = θ̂, ω = ω_0:  (1/σ̂²) X'D(r)
    ∂²L_ω(θ; y)/∂σ²∂ω' at θ = θ̂, ω = ω_0:  (1/(2σ̂⁴)) (r ⊙ r)' = (1/(2σ̂⁴)) r_sq',

where ⊙ denotes the Hadamard (element-wise) product, implying that r_sq is an n × 1 vector with elements r_i². Combining these, we can write

    Δ' = [ (1/σ̂²) D(r)X   (1/(2σ̂⁴)) r_sq ]

and

    F = 2σ̂² (1/σ̂⁴) D(r)X(X'X)⁻¹X'D(r) + (4σ̂⁴/n)(1/(4σ̂⁸)) r_sq r_sq'
      = (2/σ̂²) D(r)HD(r) + (1/(nσ̂⁴)) r_sq r_sq'.   (2.15)


2.5.2 Directed Influence on σ̂²

The acceleration matrix for the profile likelihood displacement for σ̂² is simply the second term of (2.15):

    F^[σ²] = (1/(nσ̂⁴)) r_sq r_sq'.   (2.16)

Solving the characteristic equations directly, the single eigensolution is

    C_max = ||r_sq||²/(nσ̂⁴),   v_max ∝ r_sq.

This suggests that σ̂² is most affected by variance perturbations when they are scaled according to the squared residuals.

2.5.3 Directed influence on β̂_j

Without loss of generality, consider directed influence on β̂_1. Again we proceed directly to determining the sole eigenvector, proportional to Δ'T, and the corresponding curvature 2||Δ'T||², as per (2.3) and (2.9):

    v_max ∝ Δ'T = (1/σ̂²) D(r) [X_1  X_(1)] (σ̂/||r_1||) [ 1
                                                          -γ̂_1 ]
                = (1/(σ̂ ||r_1||)) D(r) [X_1 - X_(1)γ̂_1]   (2.17)
                = (1/(σ̂ ||r_1||)) D(r) r_1
                ∝ r ⊙ r_1,

and

    C_max = 2 ||(1/(σ̂ ||r_1||)) r ⊙ r_1||² = (2/(σ̂² ||r_1||²)) ||r ⊙ r_1||².   (2.18)

This suggests that maximum local influence of the perturbation on β̂_1 is achieved by perturbing cases that have both large residuals and extreme values for the adjusted covariate. Cook (1986) provided additional details of how this result compares to the information in a detrended added-variable plot.

Also, we may compare the information in v_max with that provided by the DFBETAS diagnostic (Belsley et al. 1980). Recall that the value of DFBETAS_ij is the standardized difference between β̂_j and β̂_j,(i), the estimate when deleting the ith observation. The formula is given by

    DFBETAS_ij = (β̂_j - β̂_j,(i)) / (s_(i) √(c_j'c_j)) = c_ij r_i / (s_(i) √(c_j'c_j) (1 - h_ii)),

where r_i is the ith residual, c_j is the jth column of X(X'X)⁻¹, c_ij is the ith element of c_j, and s_(i)² is the mean squared-error estimate of σ² computed with the ith case deleted. The vector c_j can be written in terms of the jth adjusted regressor by partitioning (X'X)⁻¹ as per (2.7). For j = 1,

    c_1 = [X_1  X_(1)] (1/||r_1||²) [ 1
                                      -γ̂_1 ] = (1/||r_1||²) [X_1 - X_(1)γ̂_1] = (1/||r_1||²) r_1.

Using this equality, the vector of n DFBETAS for β̂_j is proportional to

    [ r_1 r_1j / (1 - h_11) ]
    [ r_2 r_2j / (1 - h_22) ]
    [          ...          ]
    [ r_n r_nj / (1 - h_nn) ].   (2.19)

Since v_max ∝ r ⊙ r_j, observations with large leverage stand out more in a plot of the n DFBETAS for β̂_j than in a plot of v_max. This algebraic relationship between v_max and DFBETAS has not been noted by previous authors.

2.5.4 Directed influence on β̂

The acceleration matrix for directed influence on β̂ is the first term of (2.15):

    F^[β] = (2/σ̂²) D(r)HD(r).

The eigensystem of this matrix does not have a closed form. However, some idea of the directed influence of this perturbation on β̂ can be ascertained by utilizing basic perturbations. The ith basic perturbation corresponds to modifying the variance of only the ith case, and its curvature is the ith diagonal element of F^[β] (Cook 1986):

    C_b_i = (2/σ̂²) r_i² h_ii.   (2.20)

[Figure 2.1: Hills data; variance perturbation. Panels: (a) v_2; (b) v_3.]

This is similar to the ith Cook's Distance (Cook 1977):

    D_i = r_i² h_ii / (k s² (1 - h_ii)²).
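The parallel between the basic curvatures (2.20) and Cook's Distances can be seen directly on simulated data. The sketch below assumes Python with numpy and is illustrative only:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 35, 2
    X = rng.standard_normal((n, k))
    y = X @ np.array([1.0, -0.5]) + rng.standard_normal(n)

    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    r = y - H @ y
    sigma2 = (r @ r) / n                          # ML estimate of sigma^2
    s2 = (r @ r) / (n - k)                        # mean squared error

    Cb = 2.0 * r ** 2 * h / sigma2                # basic curvatures (2.20)
    D = r ** 2 * h / (k * s2 * (1.0 - h) ** 2)    # Cook's Distances

    # Similar information, different scales: compare the most extreme cases.
    print(np.argsort(Cb)[-3:], np.argsort(D)[-3:])

The two diagnostics differ by the factor (1 - h_ii)⁻², so cases with high leverage are emphasized somewhat more by Cook's Distance, consistent with the comparison of Figures 1.12(a) and (b).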


2.5.5 Influence on θ̂

The eigensystem of F is not closed-form, but it can be found numerically. As an example, consider the perturbation of the error variances for the Scottish hills data. There are three non-zero eigenvalues for F: 14.82, 4.91, and 0.37. The eigenvector associated with the largest eigenvalue was plotted in Figure 1.10(a). It is similar to v_max for directed influence on σ̂², as given in Figure 1.10(b). The other two eigenvectors of F are plotted in Figure 2.1.


2.6 Benchmarks for Perturbation Analyses

An analyst using local influence diagnostics will want to know when a curvature is "too big" and when an element of v_max is "too big." Calibration of local influence diagnostics was the topic of much debate in the discussion and rejoinder to Cook (1986), and there is still no consensus opinion on the subject. Much of the difficulty lies in the fact that the perturbation is introduced into the problem by the analyst. Because the technique does not involve hypothesis testing, critical values are not appropriate. However, there have been attempts to develop rule-of-thumb benchmarks, as exist for Cook's Distance and other common influence diagnostics.

In this section, we discuss some general difficulties in developing benchmarks for perturbation analyses, as well as some issues specific to the local influence approach. Although it is argued that all sensitivity is "relative," some simple methods for assessing the size of curvatures and the size of the elements of directional vectors are presented.

2.6.1 The Analyst's Dilemma

As mentioned before, the perturbation is introduced by the analyst, and this implies that he/she must decide whether the effects of perturbation are important, much the way a researcher decides what effect size is important. Consider a situation in which the researcher is actually able to give the size or stochastic behavior of the perturbation. Here, we may concretely measure the effects of perturbation by whether or not the conclusion of a hypothesis test changes. Alternatively, the effects can be measured by whether or not the perturbed estimates are outside an unperturbed confidence ellipsoid. However, despite having a concrete measure of the perturbation's effects, assessing the practical relevance of the results still falls to the analyst.

2.6.2 Calibrating Curvatures in Local Influence

In addition to the general limitations of a perturbation analysis, an analyst using the local influence approach must also contend with the fact that the behavior and size of the perturbations are undetermined apart from the basic formulation. Loynes (1986) and Beckman et al. (1987) noted that curvatures do not have the invariance properties needed for a catch-all method of benchmarking. This necessitates a reliance upon relative influence.








2.6.2.1 1-to-1 coordinate changes in Ω

Following Loynes (1986) and Beckman et al. (1987), let us consider 1-to-1 coordinate changes in the perturbation space Ω. That is, a new perturbation scheme is constructed with elements ω_i* = k(ω_i) for i = 1, ..., q, for a smooth function k. The new scheme's perturbation, ω*, will not yield the same curvature as ω. Specifically,

    C_v* = (∂k(ω_i)/∂ω_i, evaluated at ω_i = ω_0i)² C_v,

where ω_0i denotes the null perturbation value for ω_i.

As an example, consider perturbing the variance of y_1 to be σ²/ω_1² rather than σ²/ω_1. Here, k(ω_1) = ω_1², resulting in new curvatures that are (2ω_1)², evaluated at ω_1 = 1, i.e., 4 times the original ones. Hence, despite the fact that the two versions of Ω can produce the same set of perturbed models, the curvature in identical directions will change. This implies that a universal benchmark for curvatures is not possible. Indeed, given the arbitrariness of scaling the perturbation, it should come as no surprise that curvatures cannot be compared across perturbation schemes. However, the direction of maximum curvature is unchanged under 1-to-1 coordinate changes in Ω.

2.6.2.2 Interpreting curvatures

In this section, a simple way to assess curvatures from the same perturbation scheme is given. First, (a²/2)C_v is a Taylor-series approximation to LD(ω_0 + av) (Lawrance 1991; Escobar and Meeker 1992). Also, assuming that n is large, an asymptotic 100(1 - α)% likelihood-based confidence region for θ is given by values of θ such that

    2[L(θ̂; y) - L(θ; y)] ≤ χ²_(p, 1-α).

Thus, if the curvature in direction v is larger than 2χ²_(p, 1-α), then a perturbation at a distance of one from ω_0 in direction v moves the parameter estimates to the edge of the asymptotic confidence region.

To further develop this idea, let us define the size of a perturbation to be its Euclidean distance from the null perturbation (i.e., ||ω - ω_0||). Suppose further that we have perturbations in two different directions: ω_A in direction v_A and ω_B in direction v_B. If |C_A| is k times larger than |C_B|, then perturbation ω_B must be √k times larger than ω_A to have the same effect on the likelihood displacement. Equivalently, in order to perturb the estimates to the edge of the confidence ellipsoid, it would be necessary to travel √k times farther in direction v_B in Ω as in direction v_A. This provides a simple interpretation for the relative size of two curvatures.

2.6.2.3 Internal benchmarks for curvatures

Building on the interpretation given above, curvatures can be benchmarked
by their size relative to some sort of "average" curvature for a given scheme
and fitted model. For instance, the average of the $q$ basic curvatures for the fitted
model can serve as a benchmark. Alternatively, the average curvature could be
computed by (1) allowing the squared elements of $v$ to have a Dirichlet$(a_1, \ldots, a_q)$
distribution with all $a_i$ equal to one, and then (2) finding the expected curvature.
This would be equivalent to finding the average approximate displacement at a
fixed radius from $\omega_0$ (Cook 1986). If such a benchmark is difficult to compute, a
Monte Carlo estimate can be calculated. Because these benchmarks are computed
using curvatures from the fitted model, they result in comparisons that are internal
to the fitted model. The conformal normal curvature of Poon and Poon (1999)
would also be an internal way to assess the sizes of curvatures.
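
A Monte Carlo version of these internal benchmarks is straightforward. In the sketch
below, F is a hypothetical stand-in for the $q \times q$ acceleration matrix, so that
$C_v = |v' \ddot F v|$; the squared elements of each random direction are drawn from a
Dirichlet$(1, \ldots, 1)$ distribution with random signs, as described above.

    # Monte Carlo sketch of the internal benchmarks; F is a stand-in for the
    # acceleration matrix of an actual perturbation analysis.
    import numpy as np

    rng = np.random.default_rng(0)
    q = 8
    A = rng.standard_normal((q, q))
    F = A @ A.T                                  # stand-in acceleration matrix

    basic = np.abs(np.diag(F))
    print(basic.mean())                          # average of the q basic curvatures

    # squared elements of v drawn from Dirichlet(1, ..., 1), with random signs
    S = rng.dirichlet(np.ones(q), size=200_000)
    V = np.sqrt(S) * rng.choice([-1.0, 1.0], size=S.shape)
    curv = np.abs(np.einsum('ij,jk,ik->i', V, F, V))
    print(curv.mean())                           # Monte Carlo expected curvature

For a quadratic form the expected curvature reduces to the average of the basic
curvatures, so the two printed benchmarks agree up to Monte Carlo error.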

2.6.2.4 External benchmarks for curvatures

It is entirely possible that all of the curvatures of a fitted model are large
due to the nature of the data set, and internal benchmarks will not alert
the researcher to such a phenomenon. To address this, a benchmark that does not
depend on the observed data is needed. To this end, we might utilize the
expected value of the curvature in direction $v$, using $\hat\theta$ in place of the
unknown parameters $\theta$. The expected curvatures can then be plotted along with the
observed curvatures (Cadigan and Farrell 1999). If the expectation of $C_{\max}$ is
difficult to obtain, Monte Carlo simulation can be used to estimate it. As an
alternative, Schall and Dunne (1992) developed a means of external benchmarking based
on a scaled curvature that uses results on parameter orthogonality (Cox and Reid 1987).
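
In a regression setting, such a Monte Carlo external benchmark might be sketched as
follows. The quantities X, beta_hat, and sigma2_hat are assumed to come from the
fitted model, and curvatures() is a placeholder for whatever routine computes the
observed curvatures for a given data set.

    # Sketch of an external benchmark by simulation; curvatures() is a
    # placeholder for the analyst's own curvature computation.
    import numpy as np

    rng = np.random.default_rng(0)

    def expected_curvatures(X, beta_hat, sigma2_hat, curvatures, n_sim=500):
        sims = []
        for _ in range(n_sim):
            # generate data from the fitted model (theta_hat in place of theta)
            y_sim = X @ beta_hat + rng.normal(0.0, np.sqrt(sigma2_hat), X.shape[0])
            sims.append(curvatures(X, y_sim))
        return np.mean(sims, axis=0)   # estimate of E[C] under the fitted model

    # usage: avg_C = expected_curvatures(X, beta_hat, sigma2_hat, curvatures)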

2.6.2.5 Internal versus external benchmarks

Internal and external benchmarks are complementary tools, since each one

provides different diagnostic information. Comparing curvatures from the same

fitted model highlights relative sensitivity given the data and the design. Comparing

curvatures to their expectations highlights whether the sensitivities are expected

given the experimental or sampling design.

2.6.3 Calibration of Directional Vectors

For a directional vector from the eigensystem of $\ddot F$, the analyst must compare
the individual elements in the vector to determine which aspects of the perturbation
are most important. Typically, it is suggested that the analyst simply look for
elements that stand out as being large relative to the rest. Alternately, a vector
whose elements are all equal in size can serve as a means of comparison. That is to
say, if each of the $q$ elements of the perturbation contributes equally, then each
element equals either $1/\sqrt{q}$ or $-1/\sqrt{q}$. A particular element whose absolute
value is larger than, say, 2 or 3 times that of equally contributing elements (e.c.e.)
would then be of interest. Hence, the inclusion of horizontal lines at $2/\sqrt{q}$
and $-2/\sqrt{q}$ in the plot of directional vectors may be useful.
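
A sketch of such a display is given below, with a randomly generated unit vector
standing in for an actual directional vector.

    # Sketch of a directional-vector plot with e.c.e. reference lines.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    v = rng.standard_normal(25)
    v /= np.linalg.norm(v)                    # unit-length directional vector
    q = v.size
    ece = 1.0 / np.sqrt(q)                    # size of equally contributing elements

    plt.stem(np.arange(1, q + 1), v)
    for level in (2 * ece, -2 * ece):         # reference lines at +/- 2 e.c.e.
        plt.axhline(level, linestyle='--', color='gray')
    plt.xlabel('index'); plt.ylabel('element of v')
    plt.show()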

2.7 Local Assessment Under Reparameterizations

Pan et al. (1997) show that $v_{\max}$ is invariant to any 1-to-1 measurable
transformation of the parameter space. In fact, for $\theta = h(\phi)$, the entire
eigensystem of $\ddot F_\phi$ is the same as that of $\ddot F_\theta$. However, directed
influence on the estimates of the redefined parameters can then be examined. Using
similar steps to Pan et al. (1997) and Escobar and Meeker (1992), we can write update
formulas for $\Delta_\phi$ and $\ddot L_\phi$ and then calculate directed influence on a
subset of the redefined parameter estimates. Using partial differentiation, the
building blocks of the acceleration matrix under the parameterization $\phi$ can be
written:

$$\Delta_\phi = \frac{\partial \theta'}{\partial \phi}\, \Delta_\theta \tag{2.21}$$

$$\ddot L_\phi^{-1} = \left( \frac{\partial \theta}{\partial \phi'} \right)^{-1} \ddot L_\theta^{-1} \left( \frac{\partial \theta'}{\partial \phi} \right)^{-1}, \tag{2.22}$$

where all partial derivatives are evaluated at the fitted model. From this, the
acceleration matrix for directed influence on $\hat\phi_1$ can be obtained from the
partition $\phi = (\phi_1', \phi_2')'$:

$$\ddot F^{[\phi_1]} = -2 \Delta_\phi'\left( \ddot L_\phi^{-1} - B_{\phi 22} \right)\Delta_\phi
= -2 \Delta_\theta'\, \frac{\partial \theta}{\partial \phi'}\left( \ddot L_\phi^{-1} - B_{\phi 22} \right)\frac{\partial \theta'}{\partial \phi}\, \Delta_\theta, \tag{2.23}$$

where

$$B_{\phi 22} = \begin{bmatrix} 0 & 0 \\ 0 & \ddot L_{\phi 22}^{-1} \end{bmatrix},$$

$\ddot L_{\phi 22}$ comes from the partition

$$\ddot L_\phi = \begin{bmatrix} \ddot L_{\phi 11} & \ddot L_{\phi 12} \\ \ddot L_{\phi 21} & \ddot L_{\phi 22} \end{bmatrix},$$

and all partial derivatives are again evaluated at the fitted model.
Although Pan et al. (1997) suggested that (2.23) is invariant to the choice
of $\phi_2$, they did not provide a proof. To show this result, consider another
reparameterization $\psi$ for which the first $p_1$ elements are the same as those of
$\phi$:

$$\psi = \begin{bmatrix} \phi_1 \\ \psi_2 \end{bmatrix}.$$

This implies that

$$\frac{\partial \phi}{\partial \psi'} = \begin{bmatrix} I_{p_1} & 0 \\ \dfrac{\partial \phi_2}{\partial \psi_1'} & \dfrac{\partial \phi_2}{\partial \psi_2'} \end{bmatrix}, \tag{2.24}$$

and this leads to the following expression for the lower right submatrix of
$\ddot L_\psi$:

$$\ddot L_{\psi 22} = \left( \frac{\partial \phi_2}{\partial \psi_2'} \right)' \ddot L_{\phi 22}\, \frac{\partial \phi_2}{\partial \psi_2'}, \tag{2.25}$$

where all partial derivatives are evaluated at the original model. Meanwhile, the
acceleration matrix for directed influence on $\hat\phi_1$ computed under $\psi$ is

$$\ddot F^{[\psi_1]} = -2 \Delta_\phi'\, \frac{\partial \phi}{\partial \psi'}\left( \ddot L_\psi^{-1} - B_{\psi 22} \right)\frac{\partial \phi'}{\partial \psi}\, \Delta_\phi, \tag{2.26}$$

where $B_{\psi 22}$ is defined analogously to $B_{\phi 22}$ and all partial derivatives
are again evaluated at the original model.

The invariance of the acceleration matrix for directed influence on $\hat\phi_1$ to the
choice of $\psi_2$ is proved by showing that (2.26) is equivalent to (2.23). Using the
inverse of (2.25),

$$\ddot L_{\psi 22}^{-1} = \left( \frac{\partial \phi_2}{\partial \psi_2'} \right)^{-1} \ddot L_{\phi 22}^{-1} \left[ \left( \frac{\partial \phi_2}{\partial \psi_2'} \right)' \right]^{-1}, \tag{2.27}$$

the block structure of (2.24) gives

$$\frac{\partial \phi}{\partial \psi'}\, B_{\psi 22}\, \frac{\partial \phi'}{\partial \psi}
= \begin{bmatrix} 0 & 0 \\ 0 & \dfrac{\partial \phi_2}{\partial \psi_2'} \ddot L_{\psi 22}^{-1} \left( \dfrac{\partial \phi_2}{\partial \psi_2'} \right)' \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & \ddot L_{\phi 22}^{-1} \end{bmatrix} = B_{\phi 22},$$

while $\frac{\partial \phi}{\partial \psi'} \ddot L_\psi^{-1} \frac{\partial \phi'}{\partial \psi} = \ddot L_\phi^{-1}$.
Substituting these two identities into (2.26) completes the result.


2.7.1 Directed Influence on a Linear Combination of Parameter Estimates

As mentioned in Escobar and Meeker (1992), a specific application of these
results is to consider directed influence on a linear combination of parameter
estimates, $x'\hat\theta$. There are many ways to reparameterize the model such that
$x'\theta$ is one of the new parameters. For example, assuming that the first element
of $x$ is non-zero, $\phi$ can be defined as $A\theta$, where

$$A = \begin{bmatrix} x_1 & x_{(1)}' \\ 0_{p-1} & I_{p-1} \end{bmatrix}$$

and $x_1$ and $x_{(1)}$ are the first and remaining elements of $x$, respectively
(Escobar and Meeker 1992). Using equation (2.23), the acceleration matrix for directed
influence on $\phi_1 = x'\theta$ can be computed, and the resulting diagnostics are
invariant to the choice of $\phi_2, \ldots, \phi_p$. By using a prudent choice of new
parameters, we can obtain easy-to-compute formulas for the single eigensolution of
$\ddot F^{[x'\theta]}$, as per the following theorem.






Theorem 2.1

The maximum curvature and its direction for directed influence on $x'\hat\theta$ are
given by

$$C_{\max} = \frac{2}{\|k_1\|^2}\, \|\Delta_\theta' \ddot L_\theta^{-1} x\|^2, \qquad v_{\max} \propto \Delta_\theta' \ddot L_\theta^{-1} x,$$

where $k_1 = \ddot L_\theta^{-1/2} x$.

Proof of Theorem 2.1

To begin, consider a model with parameterization $\phi = K'\ddot L_\theta^{1/2}\theta$, where

$$K = \begin{bmatrix} k_1 & k_2 & \cdots & k_p \end{bmatrix}$$

is a full rank $p \times p$ matrix with

$$k_1 = \ddot L_\theta^{-1/2}x, \qquad k_i'k_j = 0 \;\; (i \ne j), \qquad k_i'k_i = k_1'k_1 \;\; (i = 2, \ldots, p).$$

By construction, $\phi_1 = k_1'\ddot L_\theta^{1/2}\theta = x'\theta$, and since
$K'K = \|k_1\|^2 I_p$,

$$\theta = (K'\ddot L_\theta^{1/2})^{-1}\phi = \frac{1}{\|k_1\|^2}\, \ddot L_\theta^{-1/2} K \phi.$$

Using the update formulas in (2.21) and (2.22) it follows that

$$\Delta_\phi = \frac{1}{\|k_1\|^2}\, K'\ddot L_\theta^{-1/2}\Delta_\theta, \qquad
\ddot L_\phi^{-1} = K'\ddot L_\theta^{1/2}\,\ddot L_\theta^{-1}\,\ddot L_\theta^{1/2}K = K'K = \|k_1\|^2 I_p.$$

Next, we show that directed influence on $\hat\phi_1 = x'\hat\theta$ is not a function of
$k_2, \ldots, k_p$. From equation (2.23), we find

$$\ddot F^{[x'\theta]} = -2\Delta_\phi'\left(\ddot L_\phi^{-1} - B_{\phi 22}\right)\Delta_\phi
= -2\|k_1\|^2\,\Delta_\phi' \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \Delta_\phi
= -\frac{2}{\|k_1\|^2}\, \Delta_\theta'\ddot L_\theta^{-1}x\,x'\ddot L_\theta^{-1}\Delta_\theta. \tag{2.28}$$

The single eigensolution of this rank-one matrix yields the final solution for
$C_{\max}$ and $v_{\max}$. □

The construction of $\phi$ is worth some note. The premultiplication of $\theta$ by
$K'\ddot L_\theta^{1/2}$ orthogonalizes the parameters, in the sense that the
off-diagonal elements of the observed information matrix for $\phi$ are zero. (Note
that the orthogonalization is with respect to the observed information matrix as in
Tawn (1987), rather than the expected information as in Cox and Reid (1987).) First,
$\alpha = \ddot L_\theta^{1/2}\theta$ is a parameterization with an observed information
matrix equal to the identity matrix. Then, $K$ is constructed to make orthogonal
contrasts of $\alpha$, with the first column corresponding to the linear combination of
interest. Finally, by scaling the columns of $K$ to all have the same length as $k_1$,
$K$ becomes a scalar multiple of an orthonormal matrix, and this leads to the
algebraic simplifications. Also, if $x'\theta = \theta_j$, the local influence
diagnostics provided by the theorem are the same as those provided by (2.3), as per
Beckman et al. (1987).
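
Theorem 2.1 also makes computation trivial: the directed-influence eigensolution
requires only one linear solve. The sketch below assumes the building block $\Delta$
($p \times q$) and the observed information matrix $J = -\ddot L_\theta$ are available;
working with $J$ rather than $\ddot L_\theta$ itself is a sign-convention assumption
made so that all quantities are positive definite.

    # Sketch of Theorem 2.1: rank-one eigensolution for directed influence
    # on x' theta_hat, assuming Delta (p x q) and J = -Lddot (p x p).
    import numpy as np

    def directed_influence_xtheta(Delta, J, x):
        Jinv_x = np.linalg.solve(J, x)              # J^{-1} x
        u = Delta.T @ Jinv_x                        # proportional to v_max
        C_max = 2.0 * (u @ u) / (x @ Jinv_x)        # ||k1||^2 = x' J^{-1} x
        v_max = u / np.linalg.norm(u)
        return C_max, v_max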








2.7.2 Application to Regression
As an example, consider perturbing the variances of a regression model and
examining influence on a linear combination of the regression parameter estimates,
$\hat\mu(x) = x'\hat\beta$. From (2.28), we find

$$\ddot F^{[x'\beta]} = -\frac{2}{\hat\sigma^2 h_*}\, D(r) X (X'X)^{-1} x\, x' (X'X)^{-1} X' D(r), \tag{2.29}$$

where $D(r)$ is the diagonal matrix with the residuals on the diagonal and
$h_* = x'(X'X)^{-1}x$. Further,

$$v_{\max} \propto D(r) X (X'X)^{-1} x \propto \widehat{\mathrm{Cov}}(X\hat\beta, x'\hat\beta) \circ r \tag{2.30}$$

$$C_{\max} = \frac{2}{\hat\sigma^2 h_*}\, \|D(r) X (X'X)^{-1} x\|^2, \tag{2.31}$$

where $\widehat{\mathrm{Cov}}(X\hat\beta, x'\hat\beta)$ is the vector of empirical
covariances between $X\hat\beta$ and $x'\hat\beta$, and $\circ$ denotes the elementwise
product.
2.7.2.1 A DFFITS-type measure

Using the variance perturbation and directed influence on $x'\hat\beta$, a local
influence analog of DFFITS can be defined. Recall that DFFITS$_i$ is defined as the
externally studentized change in the $i$th fitted value when deleting the $i$th
observation (Belsley et al. 1980):

$$\mathrm{DFFITS}_i = \frac{\hat\mu(x_i) - \hat\mu(x_i)_{[i]}}{s_{[i]}\sqrt{h_{ii}}},$$

where $\hat\mu(x_i)_{[i]}$ is the estimate of $\mu(x_i)$ with the $i$th observation
deleted and $s^2_{[i]}$ is the mean squared-error estimate of $\sigma^2$ with the $i$th
observation deleted. A large value for DFFITS indicates that the observation is
somewhat "self-predicting". It can be shown that DFFITS$_i$ simplifies to

$$\mathrm{DFFITS}_i = \frac{r_i\, h_{ii}^{1/2}}{s_{[i]}(1 - h_{ii})}.$$








A similar measure can be obtained by examining local influence on
$\hat\mu(x_i) = x_i'\hat\beta$ when using the $i$th basic variance perturbation. That
is, only the $i$th observation is reweighted, and the curvature for directed influence
on $\hat\mu(x_i)$ is examined. The curvature for the $i$th basic perturbation is the
$i$th diagonal element of (2.29) with $x = x_i$:

$$C_{b_i} = \mathrm{BasicFITS}_i = \frac{2 r_i^2 h_{ii}}{\hat\sigma^2}. \tag{2.32}$$

A large value of BasicFITS$_i$ indicates that perturbing the observation's
precision/weighting greatly influences its fitted value. Thus it is a local influence
analog of DFFITS. Coincidentally, BasicFITS$_i$ is equivalent to the $i$th basic
curvature for the acceleration matrix, as given in (2.20). That implies that for the
variance perturbation, the $n$ basic curvatures of $LD(\omega)$ are measuring influence
on the $n$ fitted values.

While BasicFITS$_i$ is the $i$th basic curvature when directing influence on
$\hat\mu(x_i)$, we can also define MaxFITS$_i$ to be $C_{\max}$ for directed influence
on $\hat\mu(x_i)$. Using (2.31),

$$C_{\max_i} = \mathrm{MaxFITS}_i = \frac{2}{\hat\sigma^2 h_{ii}} \sum_{j=1}^{n} r_j^2 h_{ij}^2. \tag{2.33}$$

By definition, the $i$th MaxFITS can be no smaller than the $i$th BasicFITS. While
BasicFITS provides information about which observation's perturbation influences
its own fitted value, MaxFITS provides information about how sensitive each fitted
value is relative to the others.
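
Both diagnostics depend only on the residuals, the hat matrix, and $\hat\sigma^2$, so
they can be computed directly, as the following sketch of the reconstructed formulas
(2.32) and (2.33) illustrates.

    # Sketch of BasicFITS and MaxFITS for a linear model; X includes any
    # intercept column and y is the response.
    import numpy as np

    def basic_and_max_fits(X, y):
        n, p = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        H = X @ XtX_inv @ X.T                 # hat matrix
        h = np.diag(H)
        r = y - H @ y                         # residuals
        sigma2 = (r @ r) / n                  # ML estimate of sigma^2
        basic = 2.0 * r**2 * h / sigma2                   # equation (2.32)
        maxfits = 2.0 * (H**2 @ r**2) / (sigma2 * h)      # equation (2.33)
        return basic, maxfits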
Example

Consider examining influence on the fitted values for each of the 35 covariate
profiles in the Scottish hills data set. Figure 2.2(a) displays the 35 BasicFITS
values (i.e., each of the basic curvatures for directed influence on the corresponding
$\hat\mu(x_i)$). Race 7 is indicated as being influential on its fitted value, while
race 18 is also shown











[Figure 2.2 appears here, with panels: (a) BasicFITS; (b) MaxFITS; (c) BasicFITS
(stars connected by solid line) and MaxFITS (triangles connected by dashed line).]

Figure 2.2: Hills data; $\sigma^2 D(1/\omega)$ perturbation; influence on fitted values




to be quite influential on its fitted value. Plot (b) shows the 35 MaxFITS values
(i.e., the $C_{\max}$'s for directed influence on each of the 35 $\hat\mu(x_i)$). Since
no point stands out as being large, the plot indicates that no fitted value is
particularly sensitive to

variance perturbations. Plot (c) shows both sets of diagnostics. The BasicFITS

values are stars joined by a solid line, while the MaxFITS values are triangles joined

by a dashed line. The closeness of the MaxFITS and BasicFITS values for race 7

indicates that this observation is somewhat self-predicting.













CHAPTER 3
PERTURBATIONS TO THE X MATRIX IN REGRESSION

In this chapter, the influence that infinitesimal errors in the X matrix can

have on parameter estimates in a linear regression is addressed. First, previous

results for single column and single row perturbations are reviewed. This is followed

by a generalization of these results to perturbing all elements in the X matrix. It is

shown that the eigensolution of $\ddot F^{[\beta]}$ is related to principal components regression,

collinearity, and outliers. Finally, new graphical techniques are presented with the

results applied to the Scottish hills data set and an educational outcomes data set.

3.1 Perturbations to Explanatory Variables

Letting k denote the number of predictors, Cook (1986) proposed that an

nk-sized perturbation can be incorporated into the design matrix in an additive

fashion. These perturbations represent fixed errors in the regressors and should

not be confused with random errors-in-variables techniques (Fuller 1987). The

perturbation scheme can be used to assess the effects of small rounding or truncation

errors in the explanatory variables. Upon perturbation, the X matrix becomes

$$X + W = X + \begin{bmatrix}
w_1 & w_{n+1} & w_{2n+1} & \cdots & w_{kn-n+1} \\
w_2 & w_{n+2} & & \cdots & w_{kn-n+2} \\
w_3 & & & \cdots & w_{kn-n+3} \\
\vdots & & & & \vdots \\
w_n & w_{2n} & w_{3n} & \cdots & w_{kn}
\end{bmatrix}.$$

The use of a single subscript implies w = vec(W), where the vec operator stacks the

columns of the matrix into a single column. When particular elements of W greatly

affect parameter estimation, it indicates that the corresponding element of X is








influential. This allows the analyst to identify specific combinations of explanatory
variable values that influence estimation. As will be shown in Section 3.4, sensitivity

of the regression parameter estimates to these kinds of perturbations is greater when
there is collinearity amongst the regressors and when the model is calibrated to fit
the observed data too closely.

3.2 Perturbations to a Single Column

The special case of only perturbing a single column of X was discussed
in Cook (1986), Thomas and Cook (1989), and Cook and Weisberg (1991). This
section recreates their results in detail.

3.2.1 Building Blocks

Here and in the following sections, $d_i$ represents a vector with a 1 in the $i$th
position and zeros elsewhere. The length of the vector depends on the context of its
use. Note that for $d_1$ of size $k$,

$$d_1'(X'X)^{-1}d_1 = \frac{1}{\|r_1\|^2} \tag{3.1}$$

and

$$X(X'X)^{-1}d_1 = \frac{1}{\|r_1\|^2}\, r_1, \tag{3.2}$$

where $r_1$ contains the residuals from regressing $X_1$ on the other columns of $X$.
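
These two identities are easily verified numerically; the following sketch uses
simulated regressors.

    # Quick numeric check of (3.1) and (3.2) with simulated regressors.
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 40, 4
    X = rng.standard_normal((n, k))
    d1 = np.zeros(k); d1[0] = 1.0

    # residuals from regressing X1 on the other columns
    X_rest = X[:, 1:]
    b = np.linalg.lstsq(X_rest, X[:, 0], rcond=None)[0]
    r1 = X[:, 0] - X_rest @ b

    XtX_inv = np.linalg.inv(X.T @ X)
    print(d1 @ XtX_inv @ d1, 1.0 / (r1 @ r1))             # (3.1): equal
    print(np.allclose(X @ XtX_inv @ d1, r1 / (r1 @ r1)))  # (3.2): True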

Without loss of generality, suppose perturbations are added to the $n$ values
of the first predictor (i.e., $X_1$ becomes $X_1 + \omega$). The perturbed likelihood is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left( y - X\beta - \beta_1\omega \right)'\left( y - X\beta - \beta_1\omega \right). \tag{3.3}$$






The next step is the construction of $\Delta$. Partial derivatives with respect to
$\omega$ are as follows:

$$\frac{\partial L_\omega(\theta; y)}{\partial \omega} = \frac{\beta_1}{\sigma^2}\left( y - X\beta - \beta_1\omega \right). \tag{3.4}$$

Second partial derivatives are then

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial \omega\, \partial \beta_1} = \frac{1}{\sigma^2}\left( y - X_{(1)}\beta_{(1)} - 2\beta_1(X_1 + \omega) \right)$$

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial \omega\, \partial \beta_{(1)}'} = -\frac{\beta_1}{\sigma^2}\, X_{(1)}$$

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial \omega\, \partial \sigma^2} = -\frac{\beta_1}{\sigma^4}\left( y - X\beta - \beta_1\omega \right).$$

Evaluating at the original model yields

$$\frac{\partial^2 L_\omega}{\partial \omega\, \partial \beta_1}\Big|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}\left( r - \hat\beta_1 X_1 \right), \qquad
\frac{\partial^2 L_\omega}{\partial \omega\, \partial \beta_{(1)}'}\Big|_{\hat\theta,\,\omega_0} = -\frac{\hat\beta_1}{\hat\sigma^2}\, X_{(1)}, \qquad
\frac{\partial^2 L_\omega}{\partial \omega\, \partial \sigma^2}\Big|_{\hat\theta,\,\omega_0} = -\frac{\hat\beta_1}{\hat\sigma^4}\, r,$$

and from these quantities we construct the $n \times p$ matrix

$$\Delta' = \frac{1}{\hat\sigma^2}\left[\; r d_1' - \hat\beta_1 X \;\;\Big|\;\; -\frac{\hat\beta_1}{\hat\sigma^2}\, r \;\right], \tag{3.5}$$

where $d_1$ is a $k \times 1$ vector. It follows that the acceleration matrix is given by

$$\ddot F = \frac{2}{\hat\sigma^2}\left[ \left( \frac{1}{\|r_1\|^2} + \frac{2\hat\beta_1^2}{n\hat\sigma^2} \right) rr' + \hat\beta_1^2 H - \frac{\hat\beta_1}{\|r_1\|^2}\left( r r_1' + r_1 r' \right) \right].$$







3.2.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is
given by

$$\ddot F^{[\sigma^2]} = -2\Delta_{\sigma^2}'\ddot L_{\sigma^2}^{-1}\Delta_{\sigma^2} \tag{3.6}$$

$$= \frac{4\hat\beta_1^2}{n\hat\sigma^4}\, rr'. \tag{3.7}$$

Solving the characteristic equations $(\ddot F^{[\sigma^2]} - \lambda I)v = 0$ directly,
the single eigensolution is

$$C_{\max} = \frac{4\hat\beta_1^2}{\hat\sigma^2}, \qquad v_{\max} \propto r.$$

Therefore the direction of maximum curvature for directed influence on $\hat\sigma^2$
is always proportional to the residuals regardless of which column of $X$ is
perturbed, but the corresponding curvature depends upon the effect size, $|\hat\beta_1|$.

3.2.3 Directed Influence on $\hat\beta_1$

Using the results of Beckman et al. (1987), the sole eigenvector is proportional
to $\Delta'\ddot L^{-1}d_1$ and the corresponding curvature follows from (2.3):

$$v_{\max} \propto \left( r d_1' - \hat\beta_1 X \right)(X'X)^{-1}d_1 = \frac{1}{\|r_1\|^2}\left( r - \hat\beta_1 r_1 \right) \tag{3.8}$$

and, using $r'r_1 = 0$,

$$C_{\max} = 2\left( \frac{n}{\|r_1\|^2} + \frac{\hat\beta_1^2}{\hat\sigma^2} \right). \tag{3.9}$$

The direction of maximum curvature is a weighted average of the residuals and the
adjusted regressor. As noted in Cook and Weisberg (1991), $\hat\beta_1 r_1 = \hat y - \hat y_{(1)}$,
where $\hat y_{(1)}$ is the vector of fitted values when $X_1$ is excluded from the
model. Using this result, we find $v_{\max} \propto 2r - r^{(1)}$, where $r^{(1)}$ is
the vector of residuals when modeling $y$ using all regressors except $X_1$. This
indicates that the direction is dominated by the residuals, a fact which is
demonstrated by examples later in this chapter. Cook and Weisberg (1991) discussed how
$v_{\max}$ relates to de-trended added variable plots.

3.2.4 Directed Influence on Other Regression Parameter Estimates

Thomas and Cook (1989) examined the influence that perturbations in one
column of $X$ can have on the regression parameter estimate corresponding to a
different column. Without loss of generality, suppose we perturb the first column of
$X$ and look at the effects on the last regression parameter estimate, $\hat\beta_k$.
Again using the results of Beckman et al. (1987) we can obtain the maximum curvature
and its direction:

$$v_{\max} \propto \left( r d_1' - \hat\beta_1 X \right)(X'X)^{-1}d_k \propto \hat\gamma_{k1}\, r + \hat\beta_1\, r_k, \tag{3.10}$$

where $\hat\gamma_{k1}$ is the regression parameter estimate corresponding to $X_1$
when predicting $X_k$ with the other columns of the $X$ matrix, and $r_k$ contains the
residuals from that regression. The maximum curvature is given by

$$C_{\max} = 2\left( \frac{n\hat\gamma_{k1}^2}{\|r_k\|^2} + \frac{\hat\beta_1^2}{\hat\sigma^2} \right). \tag{3.11}$$

When the partial correlation between $X_1$ and $X_k$ is zero, $C_{\max}$ achieves its
minimum value of $2\hat\beta_1^2/\hat\sigma^2$, yielding $v_{\max} \propto r_k$. In the
event that $X_k$ is a linear combination of the other regressors with non-zero
$\hat\gamma_{k1}$, $C_{\max} = \infty$ and $v_{\max} \propto r$. Between these two
extremes, the direction of maximum curvature is a weighted average of the residuals
and the adjusted $k$th regressor. These quantities are used to detect outliers and
measure leverage, respectively. If the partial correlation between $X_1$ and $X_k$ is
as large as that between $X_1$ and $y$, then the two vectors are equally weighted.
Note that this weighting is invariant to scale changes in the columns of the design
matrix as well as the response.
3.2.4.1 Directed influence on $\hat\beta$

Without loss of generality, we again consider perturbing the first column.
The rank-$k$ acceleration matrix is

$$\ddot F^{[\beta]} = \frac{2}{\hat\sigma^2}\left( \frac{1}{\|r_1\|^2} rr' + \hat\beta_1^2 H - \frac{\hat\beta_1}{\|r_1\|^2}\left( r r_1' + r_1 r' \right) \right). \tag{3.12}$$

The first eigenvalue of this matrix is unique, and yields a maximum curvature and
corresponding direction of

$$C_{\max} = 2\left( \frac{n}{\|r_1\|^2} + \frac{\hat\beta_1^2}{\hat\sigma^2} \right) \qquad \text{and} \qquad v_1 \propto r - \hat\beta_1 r_1.$$

Hence, the maximum curvature and its direction are identical to those of directed
influence on $\hat\beta_1$. Confirmation of this first eigensolution follows from
$r'r_1 = 0$, $Hr = 0$, and $Hr_1 = r_1$:

$$\ddot F^{[\beta]}\left( r - \hat\beta_1 r_1 \right)
= \frac{2}{\hat\sigma^2}\left[ \left( \frac{n\hat\sigma^2}{\|r_1\|^2} + \hat\beta_1^2 \right) r - \hat\beta_1\left( \hat\beta_1^2 + \frac{n\hat\sigma^2}{\|r_1\|^2} \right) r_1 \right]
= 2\left( \frac{n}{\|r_1\|^2} + \frac{\hat\beta_1^2}{\hat\sigma^2} \right)\left( r - \hat\beta_1 r_1 \right),$$

so that $(\ddot F^{[\beta]} - C_{\max} I)v_1 = 0$. The second eigenvalue is
$2\hat\beta_1^2/\hat\sigma^2$ and has multiplicity $k - 1$. As mentioned earlier,
this implies that there is no unique set of orthogonal eigenvectors to complete the
solution.

3.2.5 Influence on $\hat\theta$

The acceleration matrix for examining both $\hat\beta$ and $\hat\sigma^2$ simultaneously
has $k + 1 = p$ non-zero eigenvalues. The form of the eigensolution is unknown. My
numerical studies indicate that there are three unique non-zero eigenvalues, one of
which has multiplicity $k - 1$. The one with higher multiplicity appears to be the
second largest of the three, with a value of $2\hat\beta_1^2/\hat\sigma^2$.
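
The eigenstructure claimed in (3.12), and the repeated eigenvalue noted above, can be
checked numerically. The sketch below uses simulated data; the coefficient values are
arbitrary.

    # Numeric check of the eigenstructure of (3.12) for the X1 + w perturbation.
    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 60, 4
    X = rng.standard_normal((n, k))
    y = X @ np.array([1.5, -0.5, 0.8, 0.0]) + rng.standard_normal(n)

    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    beta = XtX_inv @ X.T @ y
    r = y - X @ beta
    sigma2 = (r @ r) / n
    b1 = np.linalg.lstsq(X[:, 1:], X[:, 0], rcond=None)[0]
    r1 = X[:, 0] - X[:, 1:] @ b1          # adjusted first regressor
    nr1 = r1 @ r1

    F = (2.0 / sigma2) * (np.outer(r, r) / nr1 + beta[0]**2 * H
                          - (beta[0] / nr1) * (np.outer(r, r1) + np.outer(r1, r)))
    eig = np.sort(np.linalg.eigvalsh(F))[::-1]
    print(eig[:k])                                   # k non-zero eigenvalues
    print(2 * (n / nr1 + beta[0]**2 / sigma2))       # matches eig[0]
    print(2 * beta[0]**2 / sigma2)                   # matches eig[1..k-1]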

Example
Huynh (1982) and Fung and Kwan (1997) examined data from 20 randomly
sampled schools from the Mid-Atlantic and New England States. The mean verbal
test score (VSCORE) of each school's 6th graders is regressed on staff salaries per
pupil (SALARY), % of white-collar fathers (WCOLR), a socio-economic composite
score (SES), mean teacher's verbal score (TSCORE), and mean mother's educational








Table 3.1: School data; Pearson correlation coefficients

SALARY WCOLR SES TSCORE MOMED VSCORE
SALARY 1.00
WCOLR 0.18 1.00
SES 0.22 0.82 1.00
TSCORE 0.50 0.05 0.18 1.00
MOMED 0.19 0.92 0.81 0.12 1.00
VSCORE 0.19 0.75 0.92 0.33 0.73 1.00


level (MOMED). Appendix A contains model fitting information and diagnostics for
the regression analysis, with all variables centered and scaled.

Diagnostic plots of the residuals, externally studentized residuals, Cook's

Distances, diagonal elements of the hat matrix, and DFBETAS appear in Figure

3.1. Ordinary least squares regression indicates cases 3 and 18 are outliers, although

Huynh (1982) also identified observation 11 as a mild outlier. This data set also

has some redundancy among the predictors SES, WCOLR and MOMED (Table

3.1). Although the X matrix is not ill-conditioned, the association between SES

and MOMED accounts for the unexpected negative slope estimate for MOMED.

Table 3.2 shows the non-zero curvatures (eigenvalues) for perturbing

single columns of the X matrix. Comparing the columns of the table provides

information on which regressor's measurement is most influential. Here we see

that the measurement of SES is typically the most influential, both for regression

parameter estimates and for &2. Comparing the rows provides information on which

parameter estimates are most influenced by these kinds of perturbations. Here we

see that WCOLR and MOMED estimates are the most sensitive. These are the two

regressors that are highly correlated with each other and with SES. It comes as no

surprise that the results for collinear explanatory variables are more susceptible to

measurement errors of this nature.
























[Figure 3.1 appears here, with panels: (a) Residuals; (b) Externally studentized
residuals; (c) Cook's Distances; (d) Hat diagonals; (e) DFBETAS for SALARY;
(f) DFBETAS for WCOLR; (g) DFBETAS for SES; (h) DFBETAS for TSCORE; (i) DFBETAS for
MOMED.]

Figure 3.1: School data; diagnostic plots








Table 3.2: School data; curvatures using the $X_j + \omega$ perturbation

Estimates in $LD(\omega)$          Column being perturbed
or $LD^{[\theta_1]}(\omega)$   SALARY     WCOLR      SES        TSCORE    MOMED
$\hat\theta$                   λ1=4.144   λ1=20.156  λ1=50.008  λ1=6.658  λ1=19.065
                               λ2=0.440   λ2=0.847   λ2=19.002  λ2=1.413  λ2=00.932
                               λ3=0.094   λ3=0.071   λ3=14.441  λ3=0.599  λ3=00.091

$\hat\beta$                    λ1=3.357   λ1=18.534  λ1=26.445  λ1=4.432  λ1=17.292
                               λ2=0.440   λ2=0.847   λ2=19.002  λ2=1.413  λ2=00.932

$\hat\beta_1$                  3.357      0.956      19.008     2.126     0.951
$\hat\beta_2$                  0.458      18.534     19.890     1.585     10.869
$\hat\beta_3$                  0.443      2.957      26.445     1.501     1.640
$\hat\beta_4$                  1.129      1.859      19.220     4.432     1.295
$\hat\beta_5$                  0.444      11.590     19.324     1.480     17.292

$\hat\sigma^2$                 0.881      1.693      38.004     2.825     1.865




Since perturbing the SES variable is most influential, examining directional

vectors for this perturbation is of interest. The element corresponding to case 18 is
the most noticeable in the plot of vmax for the likelihood displacement (Figure 3.2).
This observation has a large residual and is indicated as being influential in the
various diagnostic plots in Figure 3.1. Also note the similarity of Figure 3.2 to the
residuals (Figure 3.1(a)).
Plots (a)-(e) of Figure 3.3 show the directions of maximum curvature when
directing influence on each of the individual regression parameter estimates. In
general, the plots indicate that different elements of $X_3$ (the SES column) are
influential for different $\hat\beta_j$. Finally, Figure 3.3(f) provides the direction
of maximum curvature when directing influence on $\hat\sigma^2$. Recall that this
vector is proportional to the residuals.








[Figure 3.2 appears here.]

Figure 3.2: School data; $X_3 + \omega$ perturbation; $v_{\max}$ for influence on $\hat\beta$



3.3 Perturbations to a Single Row

The special case of only perturbing a single row of X was discussed in

Thomas and Cook (1989) in generalized linear models. Here, detailed derivations

for regression are presented. The results indicate that single row perturbations have

limited diagnostic value.


3.3.1 Building Blocks

Without loss of generality, suppose perturbations are added to the $k$ regressor
values of the first observation (i.e., $x_1'$ becomes $x_1' + \omega'$). For $d_1$ of
order $n$, the perturbed likelihood is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left( y - X\beta - d_1\omega'\beta \right)'\left( y - X\beta - d_1\omega'\beta \right). \tag{3.13}$$

Partial derivatives with respect to $\omega$ are as follows:

$$\frac{\partial L_\omega(\theta; y)}{\partial \omega} = \frac{1}{\sigma^2}\left( y_1 - x_1'\beta - \omega'\beta \right)\beta. \tag{3.14}$$





















[Figure 3.3 appears here, with panels: (a) $v_{\max}$ for $\hat\beta_1$;
(b) $v_{\max}$ for $\hat\beta_2$; (c) $v_{\max}$ for $\hat\beta_3$;
(d) $v_{\max}$ for $\hat\beta_4$; (e) $v_{\max}$ for $\hat\beta_5$;
(f) $v_{\max}$ for $\hat\sigma^2$.]

Figure 3.3: School data; $X_3 + \omega$ perturbation; directed influence plots






Since the null perturbation is $0$, evaluating at $\omega_0$ simplifies the expression.
We then proceed to take second partial derivatives and evaluate at the original model:

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial \omega\, \partial \beta'}\Big|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}\left( r_1 I_k - \hat\beta x_1' \right)$$

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial \omega\, \partial \sigma^2}\Big|_{\hat\theta,\,\omega_0} = -\frac{r_1}{\hat\sigma^4}\, \hat\beta,$$

where $r_1$ is the first residual. From these quantities we can construct the
$k \times p$ matrix

$$\Delta' = \frac{1}{\hat\sigma^2}\left[\; r_1 I_k - \hat\beta x_1' \;\;\Big|\;\; -\frac{r_1}{\hat\sigma^2}\, \hat\beta \;\right]. \tag{3.15}$$

The acceleration matrix is then given by

$$\ddot F = \frac{2}{\hat\sigma^2}\left[ \left( h_{11} + \frac{2 r_1^2}{n\hat\sigma^2} \right)\hat\beta\hat\beta' + r_1^2 (X'X)^{-1} - r_1 \hat\beta x_1'(X'X)^{-1} - r_1 (X'X)^{-1} x_1 \hat\beta' \right],$$

where $h_{11} = x_1'(X'X)^{-1}x_1$.








3.3.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$
is given by

$$\ddot F^{[\sigma^2]} = \frac{4 r_1^2}{n\hat\sigma^4}\, \hat\beta\hat\beta'. \tag{3.16}$$

Solving the characteristic equations $(\ddot F^{[\sigma^2]} - \lambda I)v = 0$ directly,
the single eigensolution is

$$C_{\max} = \frac{4 r_1^2 \|\hat\beta\|^2}{n\hat\sigma^4}, \qquad v_{\max} \propto \hat\beta.$$

Notice how the curvature depends upon the residual for the case that is being

perturbed, but the directional vector is always proportional to $\hat\beta$ regardless of which
row of X is being perturbed. This suggests that Vmax is not very useful for single

row perturbations. The values of Cmax for each row can be compared to assess which
row is most influential when perturbed. However, the relative effects only depend

upon the residuals, indicating there is little value added by plotting the curvatures.







3.3.3 Directed Influence on $\hat\beta_j$

Without loss of generality, we can direct influence on $\hat\beta_1$:

$$v_{\max} \propto \left( r_1 I_k - \hat\beta x_1' \right)(X'X)^{-1}d_1 \propto r_1 \tilde\gamma_1 + (\hat x_{11} - x_{11})\hat\beta$$

$$C_{\max} = \frac{2}{\hat\sigma^2 \|r_1\|^2}\, \left\| r_1 \tilde\gamma_1 + (\hat x_{11} - x_{11})\hat\beta \right\|^2,$$

where $\hat x_{11}$ is the predicted value of $x_{11}$ from modeling $X_1$ as a linear
combination of the other regressors, and $\tilde\gamma_1$ has a one in its first
position and the negatives of that regression's coefficient estimates elsewhere. (Here
$r_1$ denotes the first case residual, while $\|r_j\|^2$ denotes the squared length of
the residuals from regressing $X_j$ on the other columns.) More generally, the maximum
local influence on $\hat\beta_j$ from perturbing row $i$ is

$$v_{\max} \propto r_i \tilde\gamma_j + (\hat x_{ij} - x_{ij})\hat\beta$$

$$C_{\max} = \frac{2}{\hat\sigma^2 \|r_j\|^2}\, \left\| r_i \tilde\gamma_j + (\hat x_{ij} - x_{ij})\hat\beta \right\|^2.$$

If the columns of $X$ have all been standardized, then each of the elements of
$\tilde\gamma_j$ and of $\hat\beta$ is less than one. This suggests that element $j$ of
$v_{\max}$ is likely to be larger than the other elements. However, the signs of $r_i$
and $(\hat x_{ij} - x_{ij})\hat\beta_j$ can work to cancel each other out. Further
interpretation is suppressed in lieu of the more general


[Figure 3.4 appears here.]

Figure 3.4: School data; maximum curvatures for single row perturbations



perturbation presented in Section 3.4. Note that the curvature and directional

vector change if one or more of the columns of X are rescaled.


3.3.4 Influence on $\hat\beta$ and $\hat\theta$

The acceleration matrix for directed influence on $\hat\beta$ is

$$\ddot F^{[\beta]} = \frac{2}{\hat\sigma^2}\left\{ h_{11}\hat\beta\hat\beta' + r_1^2 (X'X)^{-1} - r_1 \hat\beta x_1'(X'X)^{-1} - r_1 (X'X)^{-1} x_1 \hat\beta' \right\}. \tag{3.17}$$

The eigensystem of this matrix is unknown, but my numerical examples indicate
that it has $k$ distinct non-zero eigenvalues. The eigensystem of $\ddot F$ for the
full set of parameter estimates is also unknown.


3.3.5 Example

The values of school 18's regressors are most influential when perturbing rows

of X one at a time, as evidenced by a large maximum curvature (Figure 3.4). For

perturbing row 18, $v_{\max}$ is nearly proportional to $\hat\beta$ (Figure 3.5). Note
that although element 3 of $v_{\max}$ appears quite large, it is actually less than
the benchmark of $2/\sqrt{5} \approx 0.894$.






[Figure 3.5 appears here.]

Figure 3.5: School data; $x_{18}' + \omega'$ perturbation; $v_{\max}$ for $\hat\beta$



3.4 Perturbations to the Entire Design Matrix

At this point, the general case of perturbing all $nk$ elements of the design
matrix is considered. We use $W$ to denote the perturbations in matrix form:

$$W = \begin{bmatrix}
w_1 & w_{n+1} & w_{2n+1} & \cdots & w_{kn-n+1} \\
w_2 & w_{n+2} & & \cdots & w_{kn-n+2} \\
w_3 & & & \cdots & w_{kn-n+3} \\
\vdots & & & & \vdots \\
w_n & w_{2n} & w_{3n} & \cdots & w_{kn}
\end{bmatrix},$$

and $w_j$ to denote the $j$th column of $W$.


3.4.1 Building Blocks

The derivations in this section only require the presence of an intercept in

the model (or centering of the response y). However, for ease of interpretation, we

assume the columns of $X$ have been centered and scaled. The perturbed likelihood is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left( y - X\beta - W\beta \right)'\left( y - X\beta - W\beta \right). \tag{3.18}$$





Proceeding in a similar fashion to the derivations for perturbing a single column,
partial derivatives with respect to $\omega = \mathrm{vec}(W)$ are as follows:

$$\frac{\partial L_\omega(\theta; y)}{\partial \omega} = \frac{1}{\sigma^2}\, \beta \otimes \left( y - X\beta - W\beta \right).$$

Since the null perturbation is $0$, evaluating at $\omega_0$ simplifies the expression:

$$\frac{\partial L_\omega(\theta; y)}{\partial \omega}\Big|_{\omega=\omega_0} = \frac{1}{\sigma^2}\, \beta \otimes \left( y - X\beta \right).$$

We then proceed to take second partial derivatives and evaluate at the original model:

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial \omega\, \partial \beta'}\Big|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}\left( I_k \otimes r - \hat\beta \otimes X \right) \tag{3.19}$$

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial \omega\, \partial \sigma^2}\Big|_{\hat\theta,\,\omega_0} = -\frac{1}{\hat\sigma^4}\, \hat\beta \otimes r. \tag{3.20}$$







From these quantities we can construct the $nk \times p$ matrix $\Delta'$, as given by
Cook (1986):

$$\Delta' = \frac{1}{\hat\sigma^2}\left[\; I_k \otimes r - \hat\beta \otimes X \;\;\Big|\;\; -\frac{1}{\hat\sigma^2}\, \hat\beta \otimes r \;\right], \tag{3.21}$$

where $\otimes$ denotes the Kronecker product (Searle 1982).

The next step is to construct the acceleration matrix. Here we use the fact
that $(A \otimes B)(C \otimes D) = AC \otimes BD$, assuming conformability of the
matrices. The acceleration matrix can be written as

$$\ddot F = \frac{2}{\hat\sigma^2}\left( I_k \otimes r - \hat\beta \otimes X \right)(X'X)^{-1}\left( I_k \otimes r' - \hat\beta' \otimes X' \right) + \frac{4}{n\hat\sigma^4}\left( \hat\beta\hat\beta' \right) \otimes \left( rr' \right). \tag{3.22}$$

3.4.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is
the second term of (3.22):

$$\ddot F^{[\sigma^2]} = \frac{4}{n\hat\sigma^4}\left( \hat\beta\hat\beta' \right) \otimes \left( rr' \right). \tag{3.23}$$

This matrix has a single non-zero eigenvalue, $4\|\hat\beta\|^2/\hat\sigma^2$, with
corresponding eigenvector $\propto \hat\beta \otimes r$. The solution is confirmed by
direct multiplication, using
$\left( \hat\beta\hat\beta' \otimes rr' \right)\left( \hat\beta \otimes r \right) = \|\hat\beta\|^2\|r\|^2\left( \hat\beta \otimes r \right)$
and $\|r\|^2 = n\hat\sigma^2$.







This result implies maximum local influence on $\hat\sigma^2$ is achieved by
perturbations in each column that are proportional to the residuals scaled by the
corresponding $\hat\beta_j$.

3.4.3 Directed Influence on $\hat\beta$

In this section, we derive the full eigensystem of $\ddot F^{[\beta]}$, where

$$\ddot F^{[\beta]} = \frac{2}{\hat\sigma^2}\left( I_k \otimes r - \hat\beta \otimes X \right)(X'X)^{-1}\left( I_k \otimes r' - \hat\beta' \otimes X' \right). \tag{3.24}$$

The following theorem provides the full eigensystem of $\ddot F^{[\beta]}$ for the
$X + W$ perturbation in a regression analysis. Note that the eigenvalues in the
theorem appeared in Cook (1986), but the eigenvectors appear here for the first time.

Theorem 3.1

Recall that $\varphi_j$ is the $j$th eigenvector of $X'X$ and that $X\varphi_j$ is the
$j$th principal component. Letting $\delta_j$ denote the $j$th eigenvalue of $X'X$,
the $k$ non-zero eigenvalues and eigenvectors of $\ddot F^{[\beta]}$ are

$$\lambda_j = 2\left( \frac{n}{\delta_{k-j+1}} + \frac{\|\hat\beta\|^2}{\hat\sigma^2} \right) \qquad v_j \propto \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1}, \tag{3.25}$$

for $j = 1, \ldots, k$.

Proof of Theorem 3.1

Write $M = I_k \otimes r - \hat\beta \otimes X$, so that
$\ddot F^{[\beta]} = \frac{2}{\hat\sigma^2}M(X'X)^{-1}M'$, and let
$\varphi = \varphi_{k-j+1}$ and $\delta = \delta_{k-j+1}$. Using $X'r = 0$,
$r'r = n\hat\sigma^2$, and $X'X\varphi = \delta\varphi$,

$$M'\left( \varphi \otimes r \right) = \varphi\, (r'r) - (\hat\beta'\varphi)\, X'r = n\hat\sigma^2\, \varphi$$

$$M'\left( \hat\beta \otimes X\varphi \right) = \hat\beta\, (r'X\varphi) - (\hat\beta'\hat\beta)\, X'X\varphi = -\|\hat\beta\|^2\delta\, \varphi,$$

so that $M'v_j = \left( n\hat\sigma^2 + \|\hat\beta\|^2\delta \right)\varphi$. Also,
$M\varphi = \varphi \otimes r - \hat\beta \otimes X\varphi = v_j$ (up to the
proportionality constant). It follows that

$$\ddot F^{[\beta]} v_j = \frac{2}{\hat\sigma^2}\, M (X'X)^{-1}\left( n\hat\sigma^2 + \|\hat\beta\|^2\delta \right)\varphi
= \frac{2\left( n\hat\sigma^2 + \|\hat\beta\|^2\delta \right)}{\hat\sigma^2\,\delta}\, M\varphi = \lambda_j v_j, \tag{3.26}$$

confirming the $j$th eigensolution. Finally, for $j \ne j^*$, the orthogonality of the
solutions follows:

$$\left( \varphi_j \otimes r - \hat\beta \otimes X\varphi_j \right)'\left( \varphi_{j^*} \otimes r - \hat\beta \otimes X\varphi_{j^*} \right)
= \varphi_j'\varphi_{j^*}\, r'r - (\varphi_j'\hat\beta)\, r'X\varphi_{j^*} - (\hat\beta'\varphi_{j^*})\, \varphi_j'X'r + \|\hat\beta\|^2\, \varphi_j'X'X\varphi_{j^*}
= 0 - 0 - 0 + 0 = 0. \;\; \square$$
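
Theorem 3.1 can also be verified numerically by building $\ddot F^{[\beta]}$ directly
from Kronecker products, as in the following sketch with simulated data.

    # Numeric check of Theorem 3.1 on simulated data.
    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 30, 3
    X = rng.standard_normal((n, k))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    r = y - X @ beta
    sigma2 = (r @ r) / n

    Ik = np.eye(k)
    M = np.kron(Ik, r[:, None]) - np.kron(beta[:, None], X)   # nk x k
    F = (2.0 / sigma2) * M @ np.linalg.solve(XtX, M.T)        # F^{[beta]}

    delta, Phi = np.linalg.eigh(XtX)          # eigenvalues in ascending order
    lam_theory = 2.0 * (n / delta + (beta @ beta) / sigma2)
    lam_numeric = np.sort(np.linalg.eigvalsh(F))[::-1][:k]
    print(np.sort(lam_theory)[::-1])          # matches lam_numeric
    print(lam_numeric)

    # direction for the largest curvature: phi_k (x) r - beta (x) X phi_k
    phi_k = Phi[:, 0]                         # eigenvector of the smallest delta
    v = np.kron(phi_k, r) - np.kron(beta, X @ phi_k)
    print(np.allclose(F @ v, lam_theory[0] * v))   # True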



3.4.3.1 Interpreting Theorem 3.1

The closed-form eigensolution in Theorem 3.1 gives insight into how slight
changes to the observed explanatory variables affect inference. To interpret the
theorem, consider the maximum curvature, as per (3.25):

$$C_{\max} = 2\left( \frac{n}{\delta_k} + \frac{\|\hat\beta\|^2}{\hat\sigma^2} \right). \tag{3.29}$$

The eigenvalues of $X'X$ are traditionally used to assess the amount of collinearity
in the regressors. If $\delta_k$ is small, there is redundancy in the explanatory
variables, and this causes the inversion of $X'X$ to be unstable. If this inversion is
unstable, so too are the regression parameter estimates,
$\hat\beta = (X'X)^{-1}X'y$. The first term in $C_{\max}$ is large when $\delta_k$ is
small, giving further evidence that the estimates are more sensitive when collinearity
is present. The second term of $C_{\max}$ is also interesting. It can be considered a
naive measure of goodness of fit; when the relationships between the explanatory
variables and $y$ are strong, $\|\hat\beta\|^2$ is large and $\hat\sigma^2$ is small.
Hence, "better fitting" models are more sensitive to perturbation. However,
$\|\hat\beta\|^2/\hat\sigma^2$ typically grows whenever additional variables are added
to the model, indicating more complicated models are also more sensitive to
perturbation. In fact, $\|\hat\beta\|^2/\hat\sigma^2$ bears some similarity to $R^2$:

$$R^2 = \frac{SS_{reg}}{SS_{yy}} = \frac{\|X\hat\beta\|^2}{n\hat\sigma^2 + \|X\hat\beta\|^2}.$$







The quantity $R^2$ is the proportion of variance explained by the model, and serves
as a measure of goodness-of-fit. However, $R^2$ can be made artificially large by
adding additional explanatory variables, even when they contribute very little to the
explanatory value of the model. The quantity $\|\hat\beta\|^2/\hat\sigma^2$ behaves
similarly: additional variables decrease the denominator and usually increase the
numerator. Of course, the scale of the regressors is a major factor in the size of
$\|\hat\beta\|^2$, indicating that scaling the regressors would be prudent for
interpretation.
So, we may interpret $C_{\max}$ in the following way. The vector of parameter
estimates is more sensitive when (a) there are collinear variables, (b) there are a
large number of explanatory variables, and (c) the model closely fits the data. We are
not surprised to see that the first two phenomena are sources of sensitivity. However,
the suggestion that better-fitting models are more sensitive to measurement error
may at first be disconcerting. When viewed in light of Type I and Type II errors,
this is not the case. Recall that a Type I error occurs when a test is wrongly
declared significant, while a Type II error occurs when a test is wrongly declared
insignificant. Generally, a Type I error is considered the worse of the two. So, if
there are strong relationships between $X$ and $y$, the formula for $C_{\max}$ implies
this might be missed because of measurement error in the regressors. However, if there
is a weak relationship, perturbations to $X$ would not make the relationship appear
strong. In this sense, the regression model is better protected against Type I errors
(concluding there are strong relationships when there are none) than Type II errors
(concluding there are weak relationships when in fact there are strong ones) when
there is measurement error in the regressors.

As a final comment on $C_{\max}$, the term $\|\hat\beta\|^2/\hat\sigma^2$ may be better
described as an indicator of a closely fitting model rather than a well fitting model.
This suggests
that if we construct complicated models to fit the nuances of a data set, the resulting








parameter estimates are more sensitive to rounding and measurement errors in the
explanatory variables.
Now let us consider $v_{\max}$ as per Theorem 3.1:

$$v_{\max} \propto \varphi_k \otimes r - \hat\beta \otimes X\varphi_k. \tag{3.30}$$

This vector and the other eigenvectors of $\ddot F^{[\beta]}$ provide detailed
information about which elements of $X$ most affect the stability of the MLEs. The
expression for $v_{\max}$ shows how outliers and leverage in the final principal
component regressor determine precisely what kinds of mismeasurements have large local
influence on $\hat\beta$. The directional vector can be used to scale $W$ to achieve
maximum local influence:

$$W_{\max} \propto \left[\; \varphi_{k1} r - \hat\beta_1 X\varphi_k \;\;\; \varphi_{k2} r - \hat\beta_2 X\varphi_k \;\;\; \cdots \;\;\; \varphi_{kk} r - \hat\beta_k X\varphi_k \;\right]. \tag{3.31}$$

The elements in the first column are a weighted average of the residuals ($r$) and the
regressor values of the last principal component regressor ($X\varphi_k$). The
respective weights are the first original regressor's contribution to the final
principal component ($\varphi_{k1}$) and the negative of the first regression
parameter estimate ($-\hat\beta_1$).

Alternatively, we can think of the elements in the first row as being a weighted
average of the explanatory variables' contributions to the final principal component
($\varphi_k'$) and the negative of the parameter estimates ($-\hat\beta'$). The
respective weights are the first residual ($r_1$) and the first observation's value
for the final principal component ($x_1'\varphi_k$).
3.4.3.2 A special case

In order to examine in more detail how the perturbations given in (3.31)
affect $\hat\beta$, we consider a simple situation for which there are closed forms
for the perturbed parameter estimates. Let us assume that the response and a pair of
regressors have been centered and scaled to have length one, and that the regressors
are orthogonal. The implications of these simplifications are as follows:

$$\|y\| = \|X_1\| = \|X_2\| = 1 \qquad r'r = 1 - \hat\beta_1^2 - \hat\beta_2^2$$
$$\hat\beta_1 = X_1'y \qquad \hat\beta_2 = X_2'y \qquad \delta_1 = \delta_2 = 1.$$

The first two eigenvalues of $\ddot F^{[\beta]}$ are the same:

$$C_{\max} = \lambda_1 = \lambda_2 = 2\left( n + \frac{\hat\beta_1^2 + \hat\beta_2^2}{\hat\sigma^2} \right),$$

and we may choose either of the following vectors for $v_{\max}$:

$$\begin{bmatrix} r - \hat\beta_1 X_1 \\ -\hat\beta_2 X_1 \end{bmatrix} \qquad \begin{bmatrix} -\hat\beta_1 X_2 \\ r - \hat\beta_2 X_2 \end{bmatrix}.$$

Without loss of generality we choose the first one. Letting $X_\omega$ denote the
perturbed design matrix, we have

$$X_\omega = X + aW = X + a\left[\; r - \hat\beta_1 X_1 \;\;\; -\hat\beta_2 X_1 \;\right]
= \left[\; ar + (1 - a\hat\beta_1)X_1 \;\;\; X_2 - a\hat\beta_2 X_1 \;\right]
= \left[\; X_{1,\omega} \;\; X_{2,\omega} \;\right] \tag{3.32}$$

for some real number $a$. Now, consider the effects of perturbation on $\hat\beta_1$.
Assuming a no-intercept model is still fit after perturbation, the perturbed estimate
is given by

$$\hat\beta_{1,\omega} = \frac{r_{1,\omega}'y}{\|r_{1,\omega}\|^2}, \tag{3.33}$$

where $r_{1,\omega}$ contains the residuals from fitting $X_{1,\omega}$ as a function
of $X_{2,\omega}$. The building blocks needed are as follows:

$$H_{2,\omega} = X_{2,\omega}(X_{2,\omega}'X_{2,\omega})^{-1}X_{2,\omega}' = \frac{1}{1 + a^2\hat\beta_2^2}\left( X_2 - a\hat\beta_2 X_1 \right)\left( X_2' - a\hat\beta_2 X_1' \right)$$

$$r_{1,\omega} = (I - H_{2,\omega})X_{1,\omega} = (1 - a\hat\beta_1 - ac\hat\beta_2)X_1 + cX_2 + ar$$

$$r_{1,\omega}'y = (1 - a\hat\beta_1 - ac\hat\beta_2)\hat\beta_1 + c\hat\beta_2 + a(1 - \hat\beta_1^2 - \hat\beta_2^2)$$

$$\|r_{1,\omega}\|^2 = (1 - a\hat\beta_1 - ac\hat\beta_2)^2 + c^2 + a^2(1 - \hat\beta_1^2 - \hat\beta_2^2),$$

where $c = \dfrac{a\hat\beta_2(1 - a\hat\beta_1)}{1 + a^2\hat\beta_2^2}$. It follows that

$$\hat\beta_{1,\omega} = \frac{(1 - a\hat\beta_1 - ac\hat\beta_2)\hat\beta_1 + c\hat\beta_2 + a(1 - \hat\beta_1^2 - \hat\beta_2^2)}{(1 - a\hat\beta_1 - ac\hat\beta_2)^2 + c^2 + a^2(1 - \hat\beta_1^2 - \hat\beta_2^2)}.$$

For the sake of comparison, we can also consider perturbing only the first
column. Perturbation in the direction of $v_{\max}$ when directing effects on
$\hat\beta_1$ results in the following perturbed $X$ matrix:

$$X_{\omega^*} = X + a^*W^* = X + a^*\left[\; r - \hat\beta_1 X_1 \;\;\; 0 \;\right]
= \left[\; a^*r + (1 - a^*\hat\beta_1)X_1 \;\;\; X_2 \;\right], \tag{3.34}$$

as per (3.8). Here, the perturbed estimate is given by

$$\hat\beta_{1,\omega^*} = \frac{(1 - a^*\hat\beta_1)\hat\beta_1 + a^*(1 - \hat\beta_1^2 - \hat\beta_2^2)}{(1 - a^*\hat\beta_1)^2 + a^{*2}(1 - \hat\beta_1^2 - \hat\beta_2^2)}.$$

In order for the two perturbations to be comparable, they should be equal in size.
This is accomplished by stacking the columns of the perturbation matrices and
making the resulting vectors equal in length. In other words, the Frobenius norms
of $aW$ and $a^*W^*$ should be equal; letting $a^* = a/\sqrt{1 - \hat\beta_2^2}$
provides the necessary adjustment.








Now we can plot $\hat\beta_{1,\omega}$ for some selected values of $\hat\beta_1$ and
$\hat\beta_2$, considering values of $a$ ranging from $-\sqrt{2}$ to $\sqrt{2}$. This
eliminates perturbation matrices with Frobenius norms that are larger than that of the
unperturbed $X$ matrix. Figure 3.6 shows plots of $\hat\beta_{1,\omega}$ for four sets
of parameter estimates. The solid lines represent the parameter estimate in the
direction $v_{\max}$ for the two-column perturbation, while the dashed lines represent
parameter estimates in the direction $v_{\max}$ for the one-column perturbation. All
of the influence curves in plots (a) and (b) have a value of .2 under the null
perturbation, i.e., when $a = 0$. Likewise, the curves in plots (c) and (d) are all
equal to .5 at the null perturbation. In all plots we see that the single column
perturbation does not have the same effect as the dual column perturbation. The
discrepancy between their effects is larger when $\hat\beta_2$ is larger (plots (b)
and (d)). A sketch of code for generating such influence curves appears below.
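
The sketch uses the closed-form expressions derived above; the choice
$\hat\beta_1 = .2$, $\hat\beta_2 = .4$ reproduces the setting of Figure 3.6(a).

    # Sketch of the influence curves for the simplified setup above
    # (orthonormal regressors, centered and scaled response).
    import numpy as np
    import matplotlib.pyplot as plt

    def beta1_two_col(a, b1, b2):
        c = a * b2 * (1 - a * b1) / (1 + a**2 * b2**2)
        num = (1 - a*b1 - a*c*b2) * b1 + c * b2 + a * (1 - b1**2 - b2**2)
        den = (1 - a*b1 - a*c*b2)**2 + c**2 + a**2 * (1 - b1**2 - b2**2)
        return num / den

    def beta1_one_col(a, b1, b2):
        a_star = a / np.sqrt(1 - b2**2)       # matches the Frobenius norms
        num = (1 - a_star*b1) * b1 + a_star * (1 - b1**2 - b2**2)
        den = (1 - a_star*b1)**2 + a_star**2 * (1 - b1**2 - b2**2)
        return num / den

    a = np.linspace(-np.sqrt(2), np.sqrt(2), 401)
    plt.plot(a, beta1_two_col(a, 0.2, 0.4), '-', label='both columns')
    plt.plot(a, beta1_one_col(a, 0.2, 0.4), '--', label='column 1 only')
    plt.xlabel('a'); plt.ylabel('perturbed estimate'); plt.legend(); plt.show()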

3.4.4 Influence in Principal Components Regression

Consider the canonical form of the model, as given in Chapter 1:

$$y = Z\alpha + \epsilon, \tag{3.35}$$

where $Z = X\Phi$ and $\alpha = \Phi'\beta$, with $\Phi = [\varphi_1 \cdots \varphi_k]$.
The following theorem proves that the direction of maximum curvature is the direction
of maximum directed influence on $\hat\alpha_k$.

Theorem 3.2

Let $\hat\alpha$ denote the MLE of $\alpha$ for model (3.35). The pairs
$\lambda_j, v_j$ given in Theorem 3.1 are the maximum curvature and direction of
maximum curvature for directed influence on $\hat\alpha_{k-j+1}$, $j = 1, \ldots, k$.































[Figure 3.6 appears here, with panels: (a) Influence curves for $\hat\beta_1 = .2$,
$\hat\beta_2 = .4$; (b) Influence curves for $\hat\beta_1 = .2$, $\hat\beta_2 = .7$;
(c) Influence curves for $\hat\beta_1 = .5$, $\hat\beta_2 = .4$; (d) Influence curves
for $\hat\beta_1 = .5$, $\hat\beta_2 = .7$.]

Figure 3.6: Effects of single and double column perturbations on $\hat\beta_{1,\omega}$.
Solid line: double column; dashed line: column 1 only.








Proof of Theorem 3.2

We begin by obtaining update formulas for $\Delta$ and $\ddot L^{-1}$ as per
Section 2.7. For the regression portion of the parameter vector, $\beta = \Phi\alpha$,
so that

$$\Delta_\alpha' = \frac{1}{\hat\sigma^2}\left( I_k \otimes r - \hat\beta \otimes X \right)\Phi \tag{3.36}$$

and

$$\ddot L_{\alpha\alpha}^{-1} = -\hat\sigma^2\, \Phi'(X'X)^{-1}\Phi = -\hat\sigma^2 D(1/\delta), \tag{3.37}$$

where $D(1/\delta)$ denotes the diagonal matrix with diagonal elements
$1/\delta_1, \ldots, 1/\delta_k$. Next, we find maximum directed influence on
$\hat\alpha_{k-j+1}$ using the results of Beckman et al. (1987):

$$v_{\max} \propto \Delta_\alpha'\ddot L_{\alpha\alpha}^{-1} d_{k-j+1} \propto \left( I_k \otimes r - \hat\beta \otimes X \right)\varphi_{k-j+1} = \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1}$$

and

$$C_{\max} = \frac{2}{\hat\sigma^2\,\delta_{k-j+1}}\left( n\hat\sigma^2 + \|\hat\beta\|^2\,\delta_{k-j+1} \right) = 2\left( \frac{n}{\delta_{k-j+1}} + \frac{\|\hat\beta\|^2}{\hat\sigma^2} \right).$$

Since these expressions are the $j$th eigensolution of $\ddot F^{[\beta]}$, the proof
is complete. □

This theorem gives added insight into the direction of maximum curvature.
In short, $v_{\max}$ is the direction that changes $\hat\alpha_k$ the most. Similarly,
the $j$th largest eigenvalue and associated eigenvector specifically correspond to
effects on $\hat\alpha_{k-j+1}$. Also, the sizes of the curvatures show that the
estimated coefficients for the most important principal component regressors are also
the least sensitive.








Example 1

Consider the Scottish hills data set described in Chapter 1. The purpose of
this example is twofold: to demonstrate how the X-matrix results can be displayed
graphically, and to demonstrate how collinearity affects the curvature values when
examining influence on $\hat\beta$.

Figure 3.7 gives the direction of maximum curvature for directed influence on
$\hat\sigma^2$. The first 35 elements correspond to perturbations of the DISTANCE
regressor values, while elements 36-70 correspond to perturbations of the CLIMB
regressor values. The value for race 18's DISTANCE is the largest contributor to this
influential perturbation. Elements 7 (DISTANCE for race 7) and 53 (CLIMB for
race 18) are also large.

The information in Figure 3.7 can be difficult to discern because the

perturbation vector enters into the likelihood in the form of a matrix, W. Another

way to display Vma is to first take its absolute value and then to consider the

elements corresponding to each column of W. Each portion can then be plotted

against indices 1,...,n and shown together (Figure 3.8(a)). This graphical display

provides a "profile plot" of each column of W's contribution to the direction of

maximum curvature. Here we see again that elements corresponding to observations

18 and 7 as being the most influential. There is also some evidence that measurement

error in the DISTANCE values (solid line) are more influential than error in the

CLIMB values (dashed line).

Figures 3.8(b) and 3.8(c) provide profile plots of the absolute values of the
two directional vectors corresponding to $v_1$ and $v_2$ of $\ddot F^{[\beta]}$. The first plot draws

attention to observations 7, 11, and 18, while no observations appear noteworthy in

the second plot.

At this point we turn our attention to the curvature values that accompany
these directional vectors (Table 3.3). The first column shows the curvatures for local








Table 3.3: Hills data; curvatures using the $X + W$ perturbation. Model I contains
DISTANCE and CLIMB. Model II contains DISTANCE and CLIMB-RATE.

Estimates in $LD(\omega)$
or $LD^{[\theta_1]}(\omega)$     Model I        Model II
$\hat\sigma^2$                λ1 = 30.47     λ1 = 28.52
$\hat\beta$                   λ1 = 21.15     λ1 = 16.41
                              λ2 = 16.49     λ2 = 16.25


influence when fitting the regression model for winning time as a linear combination
of DISTANCE and CLIMB. However, as was evident in Figure 1.5(c), the two
regressors are linearly related. An alternative model can be fit by replacing climb in
feet with climb in feet per distance travelled, CLIMB-RATE. This regressor does
not have the strong positive association with distance that the original coding of
CLIMB has. This should reduce the collinearity in the model, thereby reducing
sensitivity to perturbations in the X matrix.

Using standardized regressors, the new model is fit. Column 2 of Table 3.3
shows the curvature values for this new model. $C_{\max}$ for directed influence on
$\hat\beta$ has been substantially reduced, indicating that the MLEs for the second
model are indeed less sensitive to perturbations in the X matrix.

Figures 3.9(a), (b) and (c) provide the profile plots of the absolute values
of the directional vectors that accompany the curvatures in Table 3.3, Column II.
All three plots tend to indicate DISTANCE values as being more influential than
CLIMB-RATE values, and all three plots show substantial change from those in
Figure 3.8. The plot for $\hat\sigma^2$ now indicates the DISTANCE values for
observations 7 and 18 as influential, rather than just 18. No elements stand out in
$v_{\max}$, while case 11's DISTANCE value is a very substantial contributor to $v_2$.








[Figure 3.7 appears here.]

Figure 3.7: Hills data; $X + W$ perturbation; $v_{\max}$ for $\hat\sigma^2$



Example 2

We now consider applying the $X + W$ perturbation to the school data and
examining directed influence on $\hat\beta$. The purpose of this example is to
demonstrate alternative methods for displaying $v_{\max}$ when the number of
regressors and/or observations is large.

Figure 3.10 is the plot of $v_{\max}$ for directed influence on $\hat\beta$.
Elements 1-20 correspond to SALARY, elements 21-40 correspond to WCOLR, elements 41-60
correspond to SES, elements 61-80 correspond to TSCORE, and elements 81-100
correspond to MOMED. The two regressors that have little association with other
explanatory variables, SALARY and TSCORE, have small elements. In contrast,
some of the elements for the other regressors are more substantial. This plot does a
fair job of highlighting regressors, but a poor job of highlighting observations.

Figure 3.11 is the profile plot of $|v_{\max}|$. It is easy to see that cases 3 and 18
have multiple regressor values that are influential. In addition, one of the values
for case 2 is influential. However, it is difficult to keep track of which line
represents which column. Another option is to plot "profiles" of each row, as in
Figure 3.12. Here, some of the elements of observations 2, 3, and 18 are highlighted.























[Figure 3.8 appears here, with panels: (a) $|v_{\max}|$ for $\hat\sigma^2$;
(b) $|v_{\max}|$ for $\hat\beta$; (c) $|v_2|$ for $\hat\beta$.]

Figure 3.8: Hills data; $X + W$ perturbation; column profiles of directional vectors.
DISTANCE values are connected by a solid line. CLIMB values are connected by a dashed
line.


























[Figure 3.9 appears here, with panels: (a) $|v_{\max}|$ for $\hat\sigma^2$;
(b) $|v_{\max}|$ for $\hat\beta$; (c) $|v_2|$ for $\hat\beta$.]

Figure 3.9: Hills data; Model II; $X + W$ perturbation; column profiles of directional
vectors. DISTANCE values are connected by a solid line. CLIMB-RATE values are
connected by a dashed line.



















[Figure 3.10 appears here.]

Figure 3.10: School data; $X + W$ perturbation; $v_{\max}$ for directed influence on $\hat\beta$









[Figure 3.11 appears here.]

Figure 3.11: School data; $X + W$ perturbation; column profiles of $|v_{\max}|$ for
directed influence on $\hat\beta$. Thick solid line: SALARY; thin solid line: WCOLR;
dot-dashed line: SES; small-dashed line: TSCORE; large-dashed line: MOMED.


[Figure 3.12 appears here.]

Figure 3.12: School data; $X + W$ perturbation; row profiles of $|v_{\max}|$ for
directed influence on $\hat\beta$, with large elements labeled.


A bubble plot of $|W_{\max}|$ provides the best graphical display of the information
in $v_{\max}$ (Figure 3.13). Each element is represented by a point plotted according
to both the corresponding regressor (x-axis) and case (y-axis). Around each point is a
bubble with radius equal to the absolute value of the element, so that large bubbles
indicate large contributors to the direction of maximum curvature. Again, we see
that the elements corresponding to $x_{18,2}$, $x_{18,5}$, $x_{3,2}$, $x_{3,5}$, and
$x_{2,3}$ have the largest influence, as evidenced by having the largest bubbles.







[Figure 3.13 appears here.]

Figure 3.13: School data; $X + W$ perturbation; bubble plot of $|W_{\max}|$ for $\hat\beta$













CHAPTER 4
LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS

It is possible to examine the local influence of perturbations on measures other
than the likelihood displacement, $LD(\omega)$, and the profile likelihood displacement,
$LD^{[\theta_1]}(\omega)$. Several authors have applied local influence techniques to specific
measures of influence. For instance, Lawrance (1988) and Wu and Luo (1993c)
looked at influence on the Box-Cox transformation parameter. Also, influence
has been examined on the residual sums of squares in regression (Wu and Luo
1993b), deviance residuals and confidence regions in GLMs (Thomas and Cook 1990;

Thomas 1990), and goodness-of-link test statistics in GLMs (Lee and Zhao 1997).
However, considerably less attention has been given to assessing local influence
when estimation is not performed by ML. Some work has been done on extensions
to Bayesian analysis (McCulloch 1989; Lavine 1992; Pan et al. 1996) and $L_p$-norm
estimation (Schall and Gonin 1991).
In this chapter, local influence techniques are extended in two ways. First,

we consider the influence of perturbations on a general class of influence measures.

Second, we address parameter estimates that are found via estimating equations.
The intent is to unify nearly all previous work on this methodology and provide a

conceptual basis and computational tools for future applications. The first three
sections outline the inferential platform, a general class of influence measures,

and assumptions about the perturbations. Next, techniques for local assessment

of perturbations are discussed. In Section 4.4, we give some algebraic results
for the two main diagnostic quantities: the gradient vector and the acceleration
matrix. Section 4.5 contains computational formulas, while Sections 4.6-4.8 contain
applications. Several measures, including $LD(\omega)$ and $LD^{[\theta_1]}(\omega)$, are considered in








section 4.6 using ML. Section 4.7 addresses influence on the generalized variance
with an emphasis on ML. Finally, the last section considers quasi-likelihood.

4.1 Inferential Platform

This section outlines assumptions about estimation, the nature of
perturbations, and how the effects of perturbation are measured.

4.1.1 Estimating Functions

Suppose that a statistical analysis is performed to estimate $p$ parameters,
$\theta$. Further, suppose that point estimates, $\hat\theta$, are obtained as
solutions to a set of $p$ unbiased estimating equations,

$$h(\theta; y) = 0, \tag{4.1}$$

where for each estimating function, $h_i(\theta; y)$,

$$\mathrm{E}_\theta\left( h_i(\theta; y) \right) = 0, \qquad i = 1, \ldots, p.$$

Sometimes these estimating equations arise from minimizing or maximizing some
criterion with respect to 0. Examples include least-squares normal equations, score
equations and equations resulting from minimizing a Bayes risk. Quasi-likelihood

(QL) equations (Wedderburn 1974) were originally motivated by maximizing a
quasi-likelihood function, but the existence of the objective function is not necessary.

Generalized estimating equations (GEEs) (Liang and Zeger 1986), like QL equations,
merely require assumptions about first and second moments.
The theory of estimating equations (EEs) dates back to early work by V. P.
Godambe (Godambe 1960; Godambe 1976). There has been renewed interest in the
subject due to new estimation techniques for semi-parametric models and models for
clustered non-normal responses (Zeger and Liang 1986; Prentice 1988; McCullagh
and Nelder 1989; Desmond 1997a). A considerable amount of work has centered








on obtaining optimal estimating functions (OEFs) (Godambe 1976; Crowder 1986;
Crowder 1987; Godambe and Heyde 1987; Firth 1987; Godambe and Thompson
1989; Godambe 1991). To this end, the efficiency matrix of a set of $p$ estimating
functions is defined as

$$\mathrm{Eff}(h) = \left( \mathrm{E}\left[ \frac{\partial h}{\partial \theta'} \right] \right)^{-1} \mathrm{V}[h]\left( \mathrm{E}\left[ \frac{\partial h'}{\partial \theta} \right] \right)^{-1}. \tag{4.2}$$

Functions that minimize this non-negative definite matrix are considered optimal.
Roughly speaking, this index of efficiency is small when the variance of the
estimating functions is small and the average gradient is large. Provided that the
estimating equations arise from maximizing or minimizing some function, we may
interpret large gradients as indicating that the function is not flat around its
extremum.
When a fully parametric form for the data is assumed, the score
functions are the unique OEFs up to a scalar multiple (Godambe 1960; Bhapkar
1972). However, with fewer assumptions about the data, it is generally not possible
to find a unique set of optimal estimating functions. Instead, optimality is bestowed
upon estimating functions within a certain class. One such class of estimating
functions are those that are linear functions of the data.

Under mild regularity conditions, the asymptotic variance of $\hat\theta$ is in fact given
by (4.2) (Morton 1981; McCullagh 1983; Crowder 1986). Thus, minimizing $\mathrm{Eff}(h)$
is equivalent to minimizing the asymptotic variance of the parameter estimates.
Note that the first interpretation of the efficiency matrix is appealing in that it is a
finite sample index, but somewhat awkward in that it is based on a property of the
estimating functions. The second interpretation is appealing in that it is based on a

property of the estimates, but limited in that it requires asymptotics.
For the purposes of applying local influence diagnostics we assume only that
the estimating functions have been standardized. That is, given a set of unbiased
estimating functions, $h^*(\theta; y)$, we utilize the standardized estimating functions given








by

$$h(\theta; y) = -\left( \mathrm{E}\left[ \frac{\partial h^*}{\partial \theta'} \right] \right)'\left( \mathrm{V}[h^*] \right)^{-1} h^*. \tag{4.3}$$

This standardization has three implications. First, $\mathrm{V}(h(\theta; y))^{-1}$ is
equivalent to (4.2) and hence equivalent to the asymptotic variance of the parameter
estimates. In addition, the standardized estimating functions are optimal in the
following sense. Given a set of unbiased estimating functions, $\{h^*(\theta; y)\}$,
the OEFs amongst the class of linear combinations of the $h^*$ are given by (4.3).
Hence, the standardized estimating functions are an optimal rescaling of the original
estimating functions. Note that the negative sign is present in (4.3) so that the
score equations, rather than the negative of the score equations, are considered
standardized. Finally, standardized estimating functions are "information unbiased"
(Lindsay 1982), meaning that the following equality holds:

$$-\mathrm{E}\left[ \frac{\partial h}{\partial \theta'} \right] = \mathrm{E}[hh'] = \mathrm{V}[h]. \tag{4.4}$$

Hence, the asymptotic variance of $\hat\theta$ is
$\left( -\mathrm{E}[\partial h/\partial \theta'] \right)^{-1}$. This fact is used
later in the chapter to derive influence diagnostics for confidence ellipsoids. A
small simulation illustrating (4.4) appears below.
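
The sketch adopts the simplest possible assumption: the standardized estimating
function for a normal mean with known unit variance,
$h(\theta; y) = \sum_i (y_i - \theta)$, for which $-\mathrm{E}[\partial h/\partial \theta] = n = \mathrm{V}[h]$.

    # Monte Carlo sketch of information unbiasedness (4.4) for a normal mean.
    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, n_sim = 2.0, 25, 200_000

    y = rng.normal(theta, 1.0, size=(n_sim, n))
    h = (y - theta).sum(axis=1)       # h(theta; y) for each simulated data set
    print(h.var())                     # approximately n  (= V[h])
    print(n)                           # -E[dh/dtheta] = n exactly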

4.1.2 Perturbations
Perturbations are again introduced as a vector of q real numbers, Wo. These
perturbations may affect the data, the model and its assumptions, or perhaps even
just the estimation technique itself. Examples of data perturbations are minor
changes or rounding errors in the response and the explanatory variables. An
example of model perturbation is the variance perturbation scheme used in Chapters
1 and 2. Finally, perturbing the contribution of each observation to the estimating
functions by case-weights is an example of perturbing the estimation technique.
This perturbation does not affect the distributional assumptions of the data, but
rather how each observation is utilized for inference. Another perturbation to the








inference method would be to allow the value of the hyper-parameters in a Bayesian
analysis to vary. Though some might argue that the hyper-parameters are part of
the model, their values are often chosen for computational convenience.
For the purposes of this chapter, we assume that the perturbation scheme
results in a set of $p$ perturbed estimating equations,

$$h_\omega(\theta; y) = 0, \tag{4.5}$$

which have solutions $\hat\theta_\omega$. It is also assumed that there exists a null
perturbation, $\omega_0$, whose estimating equations are identical to the original
ones. Finally, it is assumed

that there are no constraints on w that would limit their values in a neighborhood of
w0. For example, constraining the space of variance perturbations to positive values

is allowed. However, constraining the perturbations to sum to one is not allowed.
These kinds of constraints could be accommodated, but are not considered here.

4.1.3 Influence Measures

The final item to specify is a quantity whose sensitivity to perturbation is of
interest. This influence measure serves as a barometer for how perturbation affects
inference. The measure can depend on $\omega$ directly and indirectly via some
$r \times 1$ vector of functions of the perturbed parameter estimates,
$\gamma(\hat\theta_\omega)$. Some influence measures, $m(\omega, \gamma(\hat\theta_\omega))$,
may be appropriate regardless of the method of estimation and the perturbation, while
others may be more application-specific.
Examples of rather general objective measures are the following:

$$m_1 = x'\hat\theta_\omega \tag{4.6}$$

$$m_2 = \frac{\hat\theta_i}{\hat\theta_{\omega,i}} \tag{4.7}$$

$$m_3 = \sum_i \frac{\left( y_i - \hat\mu_{\omega,i} \right)^2}{\hat\mu_{\omega,i}}, \tag{4.8}$$

where $\mathrm{E}_\omega(y_i) = \mu_{\omega,i}$. None of these three measures depend on
$\omega$ directly. The first is a linear combination of the perturbed parameter
estimates. Here, $\gamma(\hat\theta_\omega) = x'\hat\theta_\omega$ is a scalar. The
second measure is the ratio of the $i$th unperturbed parameter estimate to the $i$th
perturbed estimate. Again, $\gamma(\hat\theta_\omega) = \hat\theta_{\omega,i}$ is a
scalar. The third measure resembles a Pearson goodness-of-fit statistic based on the
perturbed estimates. Since it is not yet specified how the means, $\hat\mu_\omega$,
depend on the estimates, we simply write $\gamma(\hat\theta_\omega) = \hat\mu_\omega$.
The following measures are more application-specific, in that they depend on
how inference is performed and perhaps even upon the perturbation itself:

$$m_4 = LD(\omega) = 2\left[ L(\hat\theta; y) - L(\hat\theta_\omega; y) \right] \tag{4.9}$$

$$m_5 = LD^{[\theta_1]}(\omega) = 2\left[ L(\hat\theta_1, \hat\theta_2; y) - L(\hat\theta_{1\omega}, g(\hat\theta_{1\omega}); y) \right] \tag{4.10}$$

$$m_6 = LD^*(\omega) = 2\left[ L_\omega(\hat\theta_\omega; y) - L_\omega(\hat\theta; y) \right] \tag{4.11}$$

$$m_7 = \mathrm{E}\left[ \log\frac{f(y \mid \hat\theta)}{f(y \mid \hat\theta_\omega)} \right] \tag{4.12}$$

$$m_8 = \log\left| \mathrm{V}\left[ h_\omega(\hat\theta_\omega; y) \right]^{-1} \right|. \tag{4.13}$$

The first of these measures is the likelihood displacement, for which
$\gamma(\hat\theta_\omega) = \hat\theta_\omega$. The second is the profile likelihood
displacement, which has $\gamma(\hat\theta_\omega)' = (\hat\theta_{1\omega}', g(\hat\theta_{1\omega})')$,
where $g(\hat\theta_{1\omega})$ maximizes the profile likelihood of $\hat\theta_{1\omega}$
with respect to $\theta_2$. The third measure is a likelihood displacement with a
"moving frame," because the base likelihood of the measure is the perturbed one.
Obviously, these three measures are intended for use when estimation is performed by
maximum likelihood.

The influence measure $m_7$ is a Kullback-Leibler divergence between the
densities $f(y \mid \hat\theta)$ and $f(y \mid \hat\theta_\omega)$. The quantity is
non-negative, achieving its minimum value of zero when the two densities are
equivalent. Here, $\gamma(\hat\theta_\omega) = \hat\theta_\omega$.

The last measure addresses sensitivity of asymptotic standard errors. Recall
that the determinant of a matrix equals the product of its eigenvalues, and when
the matrix is a variance-covariance matrix for a vector of estimates $\hat\theta$, the squared