# Biased Bootstrap Methods for Semiparametric Models

 Permanent Link: http://ufdc.ufl.edu/UFE0021391/00001

## Material Information

Title: Biased Bootstrap Methods for Semiparametric Models
Physical Description: 1 online resource (112 p.)
Language: english
Creator: Giurcanu, Mihai C
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

## Subjects

Subjects / Keywords: 2sls, bootstrap, consistency, gmm, iteration, regression, resampling
Statistics -- Dissertations, Academic -- UF
Genre: Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

## Notes

Abstract: The finite sample properties of estimators in some moment condition models often differ substantially from the approximations provided by asymptotic theory. The bootstrap can provide a way to circumvent the inadequacies of asymptotic approximations, but in over identified models, where the dimension of the parameter is smaller than the number of moment conditions, the usual uniform bootstrap may be inconsistent. This problem is usually solved by recentering either the residuals, the sample estimating equations, or the statistic of interest. In this dissertation, we developed a new biased bootstrap methodology for moment condition models. This biased bootstrap is a form of weighted bootstrap with the weights chosen to satisfy the constraints imposed by the model. First, we construct a pseudo-parametric family of weighted empirical distributions, obtained by minimizing the Cressie-Read distance to the empirical distribution under the constraints imposed by the model. The resulting family has the least favorable property, meaning that the inverse of the Fisher information matrix evaluated at the MLE equals the sandwich estimator. By resampling within this family, we 'mimic' the parametric bootstrap for semiparametric models. An extension of this methodology for time series applies the biased bootstrap to the sample of blocks of consecutive observations. Our overall goal is to extend and develop the range of applications and theoretical properties of the biased bootstrap, focusing mainly in three directions. First, we prove that the biased bootstrap is consistent in moment condition models, with no need for 'recentering'. Moreover, by applying bootstrap recycling within the pseudo-parametric family, we obtain computationally feasible and more accurate iterated biased bootstrap procedures. The main idea here is to reuse the first level bootstrap resamples in order to estimate higher level parameters corresponding to the iterated bootstrap. 
This methodology is a competitor of the jackknife after bootstrap developed by Efron (1992) in the iid case and by Lahiri (2002) for dependent data. Third, new biased bootstrap procedures are proposed for problems where the usual uniform bootstrap fails, such as on the boundary of the parameter space and for certain asymptotically nonnormal statistics.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Mihai C Giurcanu.
Thesis: Thesis (Ph.D.)--University of Florida, 2007.
Local: Adviser: Presnell, Brett D.

## Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0021391:00001

BIASED BOOTSTRAP METHODS FOR SEMIPARAMETRIC MODELS

By

MIHAI C. GIURCANU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007

© 2007 Mihai C. Giurcanu

To my parents, my lovely wife Magda, and

my wonderful children Michael and Stefanie

ACKNOWLEDGMENTS

I am indebted to many people who gave me support and advice while I was a graduate student. First, I want to thank Professor Brett Presnell for giving me the opportunity to work on very interesting statistical topics during my graduate education. He was an invaluable source of inspiration in this fascinating research. This dissertation would not have been possible without his constant feedback and guidance. Our weekly discussions helped me better understand the statistical issues involved in our research. I also want to thank my graduate committee members, Professor Malay Ghosh, Professor Alex Trindade, Professor Jim Hobert, and Professor Murali Rao, for reading my research. Our conversations on different statistical topics improved my understanding and vision of statistics. I also want to thank my first mathematics teacher, Professor Ignatie Henny, who made me discover the beauty of mathematics. I especially want to thank my parents for their love and interest in my education, and my family, my wife Magda and my children Michael and Stefanie, for their love, support, encouragement, and understanding when I sometimes had to stay long hours at school to finish this dissertation.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 ESTIMATION IN MOMENT CONDITION MODELS

1.1 Introduction
1.2 Generalized Method of Moments
1.2.1 Review on Generalized Method of Moments
1.2.2 GMM Estimation and Asymptotic Results
1.2.3 Nested GMM Models
1.3 M-Estimation
1.4 Empirical Likelihood Estimation
1.4.1 Review on Empirical Likelihood
1.4.2 Empirical Likelihood for Moment Condition Models

2 THE BIASED BOOTSTRAP WITH IID OBSERVATIONS

2.1 Review on the Bootstrap with IID Observations
2.2 Least Favorable Families Corresponding to Z-Estimation Model
2.3 The Biased Bootstrap for GMM
2.3.1 Consistency Results for the Biased Bootstrap
2.3.2 The Biased Bootstrap Recycling
2.4 Instrumental Variables
2.4.1 Review on Instrumental Variables
2.4.2 Bootstrapping 2SLS Estimators
2.4.3 Simulations

3 THE BLOCK BIASED BOOTSTRAP FOR TIME SERIES

3.1 Review on Bootstrap for Time Series
3.2 The Block Biased Bootstrap for Generalized M-Estimators
3.3 Consistency Results for the Block Biased Bootstrap
3.4 Iterated Block Biased Bootstrap Recycling
3.5 An Application to the Optimal Block Size Selection

4 A HYBRID BIASED BOOTSTRAP

4.1 On the Boundary of the Parametric Space
4.2 Certain Asymptotically Nonnormal Statistics

APPENDIX

A PROOFS OF CONSISTENCY RESULTS FROM CHAPTER I

B PROOFS OF RESULTS FROM CHAPTER II

B.1 Least Favorable Families
B.2 Consistency of the Biased Bootstrap for GMM Estimators
B.3 Consistency Results for Bootstrapping 2SLS Estimators

C PROOFS OF CONSISTENCY RESULTS FROM CHAPTER III

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1 Estimated coverage probabilities for bootstrap one-sided, upper confidence intervals at a .90 nominal level, =7.8, B=1000, S=1000: GMM biased bootstrap (GBB), GMM uniform bootstrap (GUB), centered residuals (FCR), and based on asymptotic approximation (ASY)

2-2 Estimated coverage probabilities for bootstrap one-sided, upper confidence intervals at a .95 nominal level, =7.8, B=1000, S=1000: GMM biased bootstrap (GBB), GMM uniform bootstrap (GUB), centered residuals (FCR), and based on asymptotic approximation (ASY)

3-1 Computation of the optimal block size for uniform block bootstrap estimation of level-2 parameters Q1 and Q2 given by (3-22) and (3-23). The number of simulations is S=1000, and the number of bootstrap resamples is B=1000. An asterisk (*) shows the block size for which the minimum RMSE has been attained.

3-2 Computation of the optimal block size for moving block biased bootstrap estimation of level-2 parameters Q1 and Q2 given by (3-22) and (3-23). The number of simulations is S=1000, and the number of bootstrap resamples is B=1000. An asterisk (*) shows the block size for which the minimum RMSE has been attained.

4-1 The quantiles of the distribution of $T_n$ under p=0, and their bootstrap approximations given by the ordinary (uniform) bootstrap (UB) and the "hybrid" biased bootstrap (HBB), using B=1000 bootstrap resamples, S=1000 simulation runs and 6 n4

4-2 The quantiles of the asymptotic distribution of $T_n$ under p=0.2, and their bootstrap approximations given by the ordinary (uniform) bootstrap (UB) and the "hybrid" biased bootstrap (HB), using B=1000 bootstrap resamples, S=1000 simulation runs and T n-.4

LIST OF FIGURES

Figure

2-1 Estimated coverage errors corresponding to different bootstrap confidence intervals, at different α levels, sample size n=20, 40: o GMM Biased Bootstrap (GBB), D GMM uniform bootstrap, 0 GMM Recycling Biased Bootstrap (RBB), A centered residuals (FCR), V based on asymptotic approximation (ASY)

2-2 Estimated coverage errors corresponding to different bootstrap confidence intervals, at different α levels, sample size n=60, 80: o GMM Biased Bootstrap (GBB), D GMM uniform bootstrap, 0 GMM Recycling Biased Bootstrap (RBB), A centered residuals (FCR), V based on asymptotic approximation (ASY)

2-3 Estimated coverage errors corresponding to different bootstrap confidence intervals, at different α levels, sample size n=100, 200: o GMM Biased Bootstrap (GBB), D GMM uniform bootstrap, 0 GMM Recycling Biased Bootstrap (RBB), A centered residuals (FCR), V based on asymptotic approximation (ASY)

3-1 The bootstrap estimates of the RMSEs corresponding to different block bootstrap schemes. We used B=1000 outer bootstrap resamples (for the biased bootstrap recycling (BBR) and the adjusted biased bootstrap recycling (ABBR)) and, for the uniform double bootstrap (UB) and the double biased bootstrap (BB), an additional 500 inner bootstrap resamples for each outer bootstrap resample

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

BIASED BOOTSTRAP METHODS FOR SEMIPARAMETRIC MODELS

By

Mihai C. Giurcanu

August 2007

Chair: Brett Presnell
Major: Statistics

The finite sample properties of estimators in some moment condition models often differ substantially from the approximations provided by asymptotic theory. The bootstrap can provide a way to circumvent the inadequacies of asymptotic approximations, but in over-identified models, where the dimension of the parameter is smaller than the number of moment conditions, the usual uniform bootstrap may be inconsistent. This problem is usually solved by recentering either the residuals, the sample estimating equations, or the statistic of interest.

In this dissertation, we develop a new biased bootstrap methodology for moment condition models. This biased bootstrap is a form of weighted bootstrap with the weights chosen to satisfy the constraints imposed by the model. First, we construct a pseudo-parametric family of weighted empirical distributions, obtained by minimizing the Cressie-Read distance to the empirical distribution under the constraints imposed by the model. The resulting family has the least favorable property, meaning that the inverse of the Fisher information matrix evaluated at the MLE equals the sandwich estimator. By resampling within this family, we "mimic" the parametric bootstrap for semiparametric models. An extension of this methodology for time series applies the biased bootstrap to the sample of blocks of consecutive observations.

Our overall goal is to extend and develop the range of applications and theoretical properties of the biased bootstrap, focusing mainly on three directions. First, we prove that the biased bootstrap is consistent in moment condition models, with no need for "recentering". Second, by applying bootstrap recycling within the pseudo-parametric family, we obtain computationally feasible and more accurate iterated biased bootstrap procedures. The main idea here is to reuse the first level bootstrap resamples in order to estimate higher level parameters corresponding to the iterated bootstrap. Third, new biased bootstrap procedures are proposed for problems where the usual uniform bootstrap fails, such as on the boundary of the parameter space and for certain asymptotically nonnormal statistics.

CHAPTER 1
ESTIMATION IN MOMENT CONDITION MODELS

1.1 Introduction

In this chapter we review three moment condition models that will be used when defining the biased bootstrap procedures in the next chapters. We first present the Generalized Method of Moments (GMM), a very popular method of estimation in econometrics. We describe the GMM estimators and their asymptotic properties, and we give a new proof of consistency for GMM estimators. In the next section we describe M-estimation. M-estimators are more general than the GMM estimators, but their asymptotic properties are usually studied for some particular cases. Here we prove a consistency result for M-estimators based on concave criterion functions and an extension allowing for nuisance parameters. We close this chapter with empirical likelihood, a more recently proposed approach to inference. Empirical likelihood estimation offers an alternative to GMM estimation for over-identified models and it is closely connected with the biased bootstrap defined in the next sections. Some techniques used in proving the asymptotic properties of empirical likelihood are also used in proving the corresponding results for the biased bootstrap.

1.2 Generalized Method of Moments

1.2.1 Review on Generalized Method of Moments

The generalized method of moments was first introduced in the econometric literature by Hansen (1982) and has been widely applied to time series, cross-sectional data, and panel models, particularly with applications to economic and financial data. This methodology generalizes many standard estimation methods, including maximum likelihood (ML), ordinary least squares (OLS), generalized estimating equations (GEE), and the method of moments (MM), by allowing the number of estimating equations to exceed the number of free parameters.

As Hall (2005) remarks in a recent econometrics textbook, GMM has had a great impact in the econometrics literature mostly because economic models rarely provide a complete specification of the probability distributions of the data. Moreover, the optimality properties of maximum likelihood estimators (MLEs) are only attained when the distributional assumptions are correctly specified. Under misspecification, White (1982b) proved that (pseudo) MLEs are no longer optimal, emphasizing the need for alternative methods of estimation to reduce the impact of misspecification in parametric modeling.

In a recent review paper, Lindsay and Qu (2003) remark that GMM combines estimating equations in an optimal way and that the corresponding procedures are highly efficient in the following sense: GMM estimators are equivalent to the estimators based on the best linear combinations of the estimating equations, in terms of the asymptotic variance. In particular, if the set of estimating equations (moment conditions) contains the score equations of a correctly specified parametric model, then the GMM estimator is asymptotically equivalent to the MLE, though a second order effect may be evident with smaller samples when additional estimating equations are included in the set of scores.

Qu et al. (2000) argue that GMM can be used to improve the efficiency of generalized estimating equations in longitudinal data models (Liang and Zeger, 1987). They optimally combine an extended set of scores in such a way that, under misspecification, the estimators are more efficient than those based on the GEE for a given working correlation matrix. Park (2000) uses GMM to balance robustness and efficiency of point estimators, combining both efficient and robust scores for parameters in order to obtain a GMM estimator associated with the implied semiparametric model that is efficient for both heavy and light tailed distributions. Qu and Song (2002) use the GMM J-test of over-identifying restrictions for testing whether missing data is ignorable by constructing an additional set of scores based on different missing data patterns. They distinguish between ignorable and non-ignorable missing data models by whether the semiparametric model induced by the extended set of scores on different missing data patterns is true.

1.2.2 GMM Estimation and Asymptotic Results

Let $\mathcal{X} = \{X_1, \ldots, X_n\}$ be an iid sample from a distribution $F$ with support in $\mathbb{R}^d$, and let $\theta_0 \in \Theta \subset \mathbb{R}^p$ be the parameter to be estimated. For a given function $f$, we denote by $E_F[f] = E[f(X_1)]$ the expectation of $f$ with respect to the underlying distribution function $F$, and we denote by $F_n$ the empirical distribution function corresponding to the sample $\mathcal{X}$. As discussed in the introduction, a GMM model specifies population moment conditions in terms of a function $b : \mathbb{R}^d \times \Theta \to \mathbb{R}^q$ whose expectation under $F$ is zero. Let $b_\theta(x) = b(x, \theta)$. We give below a formal definition following Hall (2005).

Definition 1.2.1 (Population moment condition). The moment condition of a GMM model is given by

$$E[b(X, \theta_0)] = E_F[b_{\theta_0}] = 0. \tag{1-1}$$

Having defined the moment condition that identifies the GMM model, we also require that $\theta_0$ be globally identifiable (Hall, 2005).

Definition 1.2.2 (Global Identifiability). $\theta_0$ is globally identifiable if

$$E_F[b_\theta] \neq 0 \quad \text{for all } \theta \in \Theta \text{ with } \theta \neq \theta_0. \tag{1-2}$$

Usually, in order for $\theta_0$ to be globally identifiable, it is necessary for the dimension of $b_\theta$ (the "basic score") to equal or exceed the dimension of the parameter vector $\theta$, i.e. $p \le q$. Henceforth we will assume that this condition is satisfied.

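When $p = q$, the Z-estimator for $\theta_0 = \theta(F)$ is obtained by considering the sample version of (1-1),

$$E_{F_n}[b_\theta] = \frac{1}{n} \sum_{i=1}^{n} b(X_i, \theta) = 0, \tag{1-3}$$

and solving it in $\theta$. In this case, under regularity conditions, the solution $\hat\theta_n$ is consistent and asymptotically normal, with

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0, D^{-1} V (D^{-1})^T\big), \tag{1-4}$$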

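where $D = E_F[\nabla b_{\theta_0}]$, $\nabla b(x, \theta_0)$ is the Jacobian of $b(x, \theta_0)$, and $V = \mathrm{Var}_F[b_{\theta_0}]$ (van der Vaart, 1998, Theorem 5.21, p. 52). The sandwich estimator $\hat\Sigma$, the nonparametric estimator of the asymptotic covariance matrix of the Z-estimator, is given by

$$\hat\Sigma = \Big(\frac{1}{n} \sum_{i=1}^{n} \nabla b(X_i, \hat\theta_n)\Big)^{-1} \Big(\frac{1}{n} \sum_{i=1}^{n} b(X_i, \hat\theta_n)\, b(X_i, \hat\theta_n)^T\Big) \Big(\frac{1}{n} \sum_{i=1}^{n} \nabla b(X_i, \hat\theta_n)^T\Big)^{-1}. \tag{1-5}$$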
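As a small numerical illustration of (1-5), the following Python sketch computes a scalar Z-estimator and its sandwich variance. The score $b((x, y), \theta) = y - \theta x$, the toy data, and the function names `ratio_z_estimator` and `sandwich_variance` are assumptions made for this example only; they do not come from the dissertation.

```python
def ratio_z_estimator(xs, ys):
    """Solve (1/n) * sum(y_i - theta * x_i) = 0 for theta (a scalar Z-estimator)."""
    return sum(ys) / sum(xs)


def sandwich_variance(xs, ys, theta_hat):
    """Scalar version of the sandwich estimator for b((x, y), theta) = y - theta * x."""
    n = len(xs)
    # Average Jacobian of b with respect to theta: d/dtheta (y - theta * x) = -x.
    d_hat = sum(-x for x in xs) / n
    # Average outer product of the scores, here the mean squared residual.
    v_hat = sum((y - theta_hat * x) ** 2 for x, y in zip(xs, ys)) / n
    # D^{-1} V (D^{-1})^T reduces to v_hat / d_hat**2 in one dimension.
    return v_hat / d_hat ** 2


# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
theta_hat = ratio_z_estimator(xs, ys)
sigma_hat = sandwich_variance(xs, ys, theta_hat)
# Approximate standard error of theta_hat suggested by (1-4): sqrt(sigma_hat / n).
std_error = (sigma_hat / len(xs)) ** 0.5
print(theta_hat, sigma_hat, std_error)
```

In more than one dimension the same three factors would become matrices, and the inverses would be matrix inverses rather than scalar divisions.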

Unfortunately, (1-3) usually has no solution when $q > p$, even though the population equation (1-1) is satisfied. One way around this problem is to consider the GMM estimator.

Definition 1.2.3 (GMM Estimator). For any symmetric, positive definite matrix $W_n$ (possibly random), the GMM estimator of $\theta_0$ is defined as

$$\hat\theta_n = \arg\min_{\theta \in \Theta} b_n(\theta)^T W_n\, b_n(\theta), \tag{1-6}$$

where $b_n(\theta) = n^{-1} \sum_{i=1}^{n} b(X_i, \theta)$.

We will always assume that the sequence of matrices $(W_n)$ converges in probability to a (nonrandom) positive definite matrix $W$. Denote by $b(\theta) = E_F[b_\theta]$ the expectation of the basic score under the true distribution $F$ and let $Q(\theta) = b(\theta)^T W b(\theta)$ and $Q_n(\theta) = b_n(\theta)^T W_n b_n(\theta)$. Then it is obvious that

$$\theta_0 = \arg\min_{\theta \in \Theta} Q(\theta). \tag{1-7}$$

Hansen (1982) first showed that under classical regularity conditions, the GMM estimator is consistent and asymptotically normal. Usually, consistency results for GMM estimators are obtained assuming one of two types of conditions. Either the parameter space is assumed to be compact, in which case the criterion function is required to be continuous in the parameters, or the parameter space is arbitrary and the criterion function to be maximized (minimized) is assumed to be concave (convex) (Hayashi, 2000, pp. 456-458). We present here a general consistency result based on Theorem 5.7 of van der Vaart (1998, p. 45). Suppose that for every $\varepsilon > 0$

$$\sup_{\theta \in \Theta} |Q_n(\theta) - Q(\theta)| \xrightarrow{P} 0, \tag{1-8}$$

and

$$\inf_{\theta : \|\theta - \theta_0\| \ge \varepsilon} Q(\theta) > Q(\theta_0). \tag{1-9}$$

Then any sequence of estimators $\hat\theta_n$ with $Q_n(\hat\theta_n) \le Q_n(\theta_0) + o_P(1)$ converges in probability to $\theta_0$.

It is easy to see that if $\Theta$ is compact and $b(\theta)$ is continuous then the global identifiability property (1-2) and condition (1-9) are equivalent. Moreover, if we further assume that $b(x, \theta)$ is continuous in $\theta$ for all $x$ and $E_F[\sup_{\theta \in \Theta} \|b_\theta\|] < \infty$, then using a Uniform Law of Large Numbers (van der Vaart, 1998, Example 19.8, p. 272), we can also establish (1-8). Consequently, we have established the following corollary (for other proofs, see, e.g., Hall (2005, p. 67), Davidson and MacKinnon (1993, p. 592), and Mátyás (1999, p. 13)).

Corollary 1.2.1. Suppose that $\theta_0$ is an interior point of the parameter space $\Theta$ satisfying the population moment and the global identifiability conditions given in Definitions 1.2.1 and 1.2.2, respectively. Suppose further that $\Theta \subset \mathbb{R}^p$ is compact, that $b(x, \theta)$ is continuous in $\theta$, and that $E_F[\sup_{\theta \in \Theta} \|b_\theta\|] < \infty$. Then the GMM estimator defined in (1-6) is weakly consistent, i.e.

$$\hat\theta_n \xrightarrow{P} \theta_0.$$

Under additional regularity conditions, Hansen (1982) showed that GMM estimators are asymptotically normally distributed. We present here the most common conditions found in the statistical and econometrics literature to assure asymptotic normality. If, in addition to the conditions of Corollary 1.2.1, $b(x, \theta)$ is continuously differentiable with respect to $\theta$ for all $x$, $E_F\|b_{\theta_0}\|^2 < \infty$, and $E_F[\sup_{\theta \in \Theta} \|\nabla b_\theta\|] < \infty$, then

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0, (D^T W D)^{-1} D^T W V W D (D^T W D)^{-1}\big),$$

where $\nabla b(x, \theta_0)$ is the Jacobian of $b(x, \theta_0)$, with $(i, j)$th entry $\nabla b(x, \theta_0)_{ij} = \partial b_i(x, \theta_0) / \partial \theta_j$, $D = E_F[\nabla b_{\theta_0}]$, and $V = \mathrm{Var}_F[b_{\theta_0}]$.

Theorem 3.2 of Hansen (1982) proves that the choice $W_n = \hat V^{-1}$ gives the smallest asymptotic variance of GMM estimators, where $\hat V$ is any consistent estimator of $V$ (note though that $W_n$ is then random). In this case the GMM estimator is called efficient. However, in order to estimate $V$, we must first estimate $\theta_0$. Hansen proposed a two-step GMM estimator, which is obtained by first computing an initial (inefficient) estimator $\tilde\theta_n$ of $\theta_0$ using an arbitrary weight matrix $W$ (usually the identity), and then letting $\hat\theta_n = \arg\min_\theta b_n(\theta)^T \hat V(\tilde\theta_n)^{-1} b_n(\theta)$, where $\hat V(\theta)$ is an estimator of the asymptotic covariance of $b_n(\theta)$. Usually, $\hat V(\theta) = n^{-1} \sum_{i=1}^{n} b(X_i, \theta)\, b(X_i, \theta)^T$, or the centered version $\hat V(\theta) = n^{-1} \sum_{i=1}^{n} (b(X_i, \theta) - b_n(\theta))(b(X_i, \theta) - b_n(\theta))^T$ in semiparametric estimation, but the covariance can also be modeled parametrically. The asymptotic distribution of the two-step GMM estimator is

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0, (D^T V^{-1} D)^{-1}\big). \tag{1-10}$$
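The two-step procedure can be sketched in Python for a deliberately simple over-identified model. Everything specific here is an assumption made for illustration: the moment function $b(x, \theta) = (x_1 - \theta, x_2 - \theta)$ (two noisy measurements of one mean, so $q = 2$ and $p = 1$), the simulated data, and the helper names. Because this score is linear in $\theta$, the GMM criterion has the closed-form minimizer $\mathbf{1}^T W \bar{x} / \mathbf{1}^T W \mathbf{1}$, which keeps the sketch free of numerical optimizers.

```python
import random


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gmm_linear(means, w):
    """Minimize (means - theta * 1)^T W (means - theta * 1) over theta in closed form."""
    num = sum(w[i][j] * means[j] for i in range(2) for j in range(2))
    den = sum(w[i][j] for i in range(2) for j in range(2))
    return num / den


def score_cov(data, theta):
    """Uncentered estimate (1/n) * sum of b(X_i, theta) b(X_i, theta)^T."""
    n = len(data)
    v = [[0.0, 0.0], [0.0, 0.0]]
    for x1, x2 in data:
        b = (x1 - theta, x2 - theta)
        for i in range(2):
            for j in range(2):
                v[i][j] += b[i] * b[j] / n
    return v


def two_step_gmm(data):
    n = len(data)
    means = [sum(row[k] for row in data) / n for k in range(2)]
    # Step 1: inefficient estimate with the identity weight matrix.
    theta_init = gmm_linear(means, [[1.0, 0.0], [0.0, 1.0]])
    # Step 2: re-estimate with the inverse of the estimated score covariance.
    w_opt = inv2(score_cov(data, theta_init))
    theta_hat = gmm_linear(means, w_opt)
    # Plug-in version of (D^T V^{-1} D)^{-1}, with D = (-1, -1)^T in this model.
    avar = 1.0 / sum(w_opt[i][j] for i in range(2) for j in range(2))
    return theta_init, theta_hat, avar


random.seed(0)
# Assumed data-generating process: independent measurements of theta = 3
# with standard deviations 1 and 2.
data = [(random.gauss(3.0, 1.0), random.gauss(3.0, 2.0)) for _ in range(2000)]
theta_init, theta_hat, avar = two_step_gmm(data)
print(theta_init, theta_hat, avar)
```

Under these assumptions the second step weights the more precise measurement more heavily, which is the practical content of choosing $W_n = \hat V^{-1}$.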

The k-step GMM estimator is defined by iterating the second step above $k$ times, replacing $\tilde\theta_n$ by the current value of $\hat\theta_n$ at each iteration. Another efficient estimator, called the continuous updating GMM estimator, was developed by Hansen et al. (1996) and is defined to be

$$\hat\theta_n = \arg\min_{\theta \in \Theta} b_n(\theta)^T \hat V(\theta)^{-1} b_n(\theta). \tag{1-11}$$

It can be shown that the continuous updating GMM estimator is consistent and has the same asymptotic properties as the two-step GMM estimator given in (1-10).
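A minimal Python sketch of the continuous updating criterion (1-11) follows, again for the assumed toy moment function $b(x, \theta) = (x_1 - \theta, x_2 - \theta)$ with the uncentered $\hat V(\theta)$. The golden-section search, the bracket around the sample means, and all names are illustrative choices rather than anything prescribed by the text; the search assumes that the criterion is unimodal on the chosen bracket.

```python
import math
import random


def cue_objective(data, theta):
    """Q_n(theta) = b_n(theta)^T V_hat(theta)^{-1} b_n(theta) with uncentered V_hat."""
    n = len(data)
    b1 = sum(x1 - theta for x1, _ in data) / n
    b2 = sum(x2 - theta for _, x2 in data) / n
    v11 = sum((x1 - theta) ** 2 for x1, _ in data) / n
    v22 = sum((x2 - theta) ** 2 for _, x2 in data) / n
    v12 = sum((x1 - theta) * (x2 - theta) for x1, x2 in data) / n
    det = v11 * v22 - v12 ** 2
    # Quadratic form written out with the explicit 2x2 inverse.
    return (v22 * b1 * b1 - 2.0 * v12 * b1 * b2 + v11 * b2 * b2) / det


def golden_section_min(f, lo, hi, tol=1e-7):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; reuse c as the new right probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; reuse d as the new left probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


random.seed(1)
# Assumed data-generating process: independent measurements of theta = 3.
data = [(random.gauss(3.0, 1.0), random.gauss(3.0, 2.0)) for _ in range(500)]
m1 = sum(x1 for x1, _ in data) / len(data)
m2 = sum(x2 for _, x2 in data) / len(data)
theta_cue = golden_section_min(
    lambda t: cue_objective(data, t), min(m1, m2) - 1.0, max(m1, m2) + 1.0
)
print(theta_cue)
```

Unlike the two-step estimator, the weight matrix is re-evaluated at every candidate $\theta$, so the criterion is no longer quadratic in $\theta$ and a numerical search is used.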

The J-test of over-identifying restrictions, given in Lemma 4.2 of Hansen (1982), uses the statistic

$$Q_n(\hat\theta_n) = b_n(\hat\theta_n)^T \hat V(\hat\theta_n)^{-1} b_n(\hat\theta_n), \tag{1-12}$$

where $\hat\theta_n$ is an efficient GMM estimator, to test whether the GMM moment condition $E[b_\theta] = 0$ holds for some value of $\theta$. Under the same assumptions as before, the J-test statistic given in (1-12) for testing the over-identifying restrictions (1-1) satisfies

$$n Q_n(\hat\theta_n) \xrightarrow{d} \chi^2_{q-p}. \tag{1-13}$$
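The J-test can be illustrated with a short Python sketch. As before, the moment function $b(x, \theta) = (x_1 - \theta, x_2 - \theta)$, the simulated data, and the name `j_test` are assumptions made for this example. The sketch uses the centered covariance estimate, which does not depend on $\theta$ in this model, and since $q - p = 1$ it evaluates the $\chi^2_1$ tail probability with `math.erfc`.

```python
import math
import random


def j_test(data):
    """Return (theta_hat, J, p_value) for b(x, theta) = (x1 - theta, x2 - theta)."""
    n = len(data)
    m1 = sum(x1 for x1, _ in data) / n
    m2 = sum(x2 for _, x2 in data) / n
    # Centered estimate of the score covariance (free of theta in this model).
    v11 = sum((x1 - m1) ** 2 for x1, _ in data) / n
    v22 = sum((x2 - m2) ** 2 for _, x2 in data) / n
    v12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in data) / n
    det = v11 * v22 - v12 ** 2
    # Efficient GMM estimator: closed-form minimizer of b_n^T V^{-1} b_n.
    theta_hat = ((v22 - v12) * m1 + (v11 - v12) * m2) / (v11 + v22 - 2.0 * v12)
    b1, b2 = m1 - theta_hat, m2 - theta_hat
    q_n = (v22 * b1 * b1 - 2.0 * v12 * b1 * b2 + v11 * b2 * b2) / det
    # Guard against a tiny negative value caused by rounding.
    j_stat = max(n * q_n, 0.0)
    # Upper tail of a chi-square distribution with q - p = 1 degree of freedom.
    p_value = math.erfc(math.sqrt(j_stat / 2.0))
    return theta_hat, j_stat, p_value


random.seed(2)
# Both coordinates share the mean 3, so the moment condition can hold.
compatible = [(random.gauss(3.0, 1.0), random.gauss(3.0, 1.0)) for _ in range(400)]
# The coordinates have means 3 and 4, so no single theta satisfies both moments.
incompatible = [(random.gauss(3.0, 1.0), random.gauss(4.0, 1.0)) for _ in range(400)]
print(j_test(compatible))
print(j_test(incompatible))
```

A small p-value for the second data set would indicate that the two moment conditions cannot be satisfied by a common $\theta$, which is what the over-identifying restrictions test is designed to detect.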

1.2.3 Nested GMM Models

Lindsay and Qu (2003) analyzed the GMM methodology within parametric, semiparametric, and nonparametric frameworks. They identify three levels of nested models. At the first level, we find the parametric model. Here, in general, we identify some basic scores that define the parameter of interest. For example, for a univariate distribution, we can take $b_1(x, \theta) = x - \theta$ as a basic score for the mean and $b_2(x, \theta) = .5\,\mathrm{sign}(x - \theta)$ as a basic score for the median. In the case of a normal distribution with known variance, say $N(\theta_0, 1)$, both basic scores provide consistent estimators of the location parameter $\theta_0$. In this case, any efficient GMM estimator based on the basic scores $b_1$ and $b_2$ is fully efficient, in the sense that its asymptotic variance equals the inverse Fisher information.

If the parametric model is not correct, Lindsay and Qu (2003) define the semiparametric model implied by the moment conditions to be the set of all distributions $F$ compatible with the scores, i.e., the set of all $F$ for which there exists $\theta \in \Theta$ such that $E_F[b_\theta] = 0$. Then the GMM estimators are consistent under weakening of model assumptions. On the other hand, if the parametric model holds and the scores are correctly specified in the semiparametric model, then any efficient GMM estimator is first order equivalent to the MLE.

The semiparametric model is false when the underlying distribution $F$ is incompatible with the basic scores, i.e., $E_F[b_\theta] \neq 0$ for all $\theta \in \Theta$. In this situation, we can still obtain consistent estimators by taking as our parameter the value of $\theta \in \Theta$ that minimizes the quadratic distance between the expected scores $b(\theta)$ and the zero vector (also known as the Mahalanobis distance), i.e.,

$$\theta_0 = \arg\min_{\theta \in \Theta} Q(\theta), \tag{1-14}$$

where $Q(\theta) = b(\theta)^T V(\theta)^{-1} b(\theta)$, with $b(\theta) = E_F[b_\theta]$ and $V(\theta) = E_F[b_\theta b_\theta^T]$. Let $Q_n(\theta)$ be the sample version of $Q(\theta)$, i.e.,

$$Q_n(\theta) = b_n(\theta)^T \hat V(\theta)^{-1} b_n(\theta). \tag{1-15}$$

Lindsay and Qu (2003) call $Q_n(\theta)$ "the quadratic inference function" and they show that this inference function mimics the properties of the log-likelihood (even when the semiparametric model is false):

"Likelihood ratio" test: $n Q_n(\theta_0) - n Q_n(\hat\theta_n) \xrightarrow{d} \chi^2_p$.

"Profile likelihood" test: $n Q_n(\theta_{01}, \tilde\theta_2(\theta_{01})) - n Q_n(\hat\theta_1, \hat\theta_2) \xrightarrow{d} \chi^2_{\dim(\theta_1)}$ for testing $H_0 : \theta_1 = \theta_{01}$, where $\tilde\theta_2(\theta_{01}) = \arg\min_{\theta_2} Q_n(\theta_{01}, \theta_2)$.

"J-test of over-identifying restrictions": $n Q_n(\hat\theta_n) \xrightarrow{d} \chi^2_{q-p}$ under the semiparametric model, yielding a valid goodness-of-fit test for the semiparametric model.

1.3 M-Estimation

M-estimation is another popular method for finding estimators. For more references and results, see van der Vaart (1998), Serfling (1980), and Huber (1981). In this case, the parameter of interest $\theta_0$ is given as a maximizer of the population "criterion function" $m(\theta)$,

$$\theta_0 = \arg\max_{\theta \in \Theta} m(\theta). \tag{1-16}$$

The M-estimator is defined as a maximizer of the "sample criterion" function,

$$\hat\theta_n = \arg\max_{\theta \in \Theta} m_n(\theta). \tag{1-17}$$

For instance, if $m(\theta) = E[m(X, \theta)]$, we usually take $m_n(\theta) = n^{-1} \sum_{i=1}^{n} m(X_i, \theta)$.
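To make (1-17) concrete, the following Python sketch maximizes two concave sample criterion functions by ternary search. The criteria $m(x, \theta) = -(x - \theta)^2$ and $m(x, \theta) = -|x - \theta|$, the toy data, the search interval (the range of the data), and all function names are assumptions chosen for illustration; their maximizers are the sample mean and the sample median.

```python
import statistics


def sample_criterion(m, data, theta):
    """m_n(theta) = (1/n) * sum of m(X_i, theta)."""
    return sum(m(x, theta) for x in data) / len(data)


def maximize_concave(f, lo, hi, tol=1e-9):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    while hi - lo > tol:
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if f(left) < f(right):
            # For a concave f the maximizer cannot lie to the left of `left`.
            lo = left
        else:
            # Otherwise the maximizer cannot lie to the right of `right`.
            hi = right
    return (lo + hi) / 2.0


def m_estimate(m, data):
    # Assumes the maximizer lies within the range of the data,
    # which holds for the two criteria used below.
    return maximize_concave(
        lambda t: sample_criterion(m, data, t), min(data), max(data)
    )


# Hypothetical data with one large observation.
data = [2.0, 9.5, 3.1, 4.7, 3.9, 5.2, 4.1]
theta_mean = m_estimate(lambda x, t: -(x - t) ** 2, data)
theta_median = m_estimate(lambda x, t: -abs(x - t), data)
print(theta_mean, statistics.mean(data))
print(theta_median, statistics.median(data))
```

Concavity is what allows a bracketing search of this kind to work without a compact parameter space being specified in advance beyond the initial bracket.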

We consider now consistency results for M-estimators given as maximizers of concave criterion functions when the parameter space is not necessarily compact, a topic that has been actively researched in econometric theory. In the case of GMM estimators, the assumption of concavity is not restrictive for (basic) scores that are linear in the parameters, since, in this case, it can be shown that the negative of the quadratic inference function (1-15) is concave (Hayashi, 2000, p. 468).

Hayashi (2000, p. 458) presents a proposition from Newey and McFadden (1994, pp. 2133-2134) that establishes consistency for M-estimators based on concave criterion functions. Proposition 1.3.1 below gives sufficient conditions for consistency under weaker assumptions. Its proof, given in Appendix A, does not require that the sample criterion functions converge in probability on the entire parameter space, nor does it depend on the result that M-estimators corresponding to continuous criterion functions are consistent. Moreover, it does not require the parameter space to be convex, as in Hayashi (2000, p. 458). The proof uses only the fact that pointwise convergence in probability for concave functions on an open set implies uniform convergence on compact subsets of that open set (Pollard, 1991, sec. 6) and is easily adapted to accommodate nuisance parameters, as in Proposition 1.3.2 (proven in Appendix A). For further details and applications to some financial risk measures, see Giurcanu and Trindade (2006).

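Denote by $S(t_0, \varepsilon) = \{t \in \mathbb{R}^q : \|t - t_0\| = \varepsilon\}$ and $B(t_0, \varepsilon) = \{t \in \mathbb{R}^q : \|t - t_0\| \le \varepsilon\}$ the sphere and the closed ball, respectively, centered at $t_0$ with radius $\varepsilon$.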

Proposition 1.3.1 (Consistency under concavity). Let $m(\theta)$ and $m_n(\theta)$ be population and sample criterion functions, respectively, and let $\theta_0 = \arg\max_{\theta \in \Theta} m(\theta)$ and $\hat\theta_n = \arg\max_{\theta \in \Theta} m_n(\theta)$. Suppose that $\theta_0$ is globally identifiable, that $m_n(\theta)$ is concave in $\theta \in \Theta$ with probability 1, and that there exists a neighborhood $C \subset \Theta$ of $\theta_0$ such that for every $\theta \in C$, $m_n(\theta) \xrightarrow{P} m(\theta)$. Then $\hat\theta_n \xrightarrow{P} \theta_0$.

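Proposition 1.3.2 (Consistency under concavity with nuisance parameters). Let $m(\theta)$ and $m_n(\theta)$ be the population and sample criterion functions, respectively, and let $\theta_0 = \arg\max_{\theta \in \Theta} m(\theta)$, with $\theta_0 = (\theta_{0,1}, \theta_{0,2})$. Let $C \subset \Theta$ be a neighborhood of $\theta_0$ and define, for every $\theta = (\theta_1, \theta_2) \in C$, $\hat\theta_{n,2}(\theta_1) = \arg\max_{\theta_2 : (\theta_1, \theta_2) \in C} m_n(\theta_1, \theta_2)$. Suppose that $\theta_0$ is globally identifiable, that $m_n(\theta)$ is concave in $\theta$ with probability 1, and that for every $\theta \in C$, $m_n(\theta) \xrightarrow{P} m(\theta)$. For any consistent sequence of estimators $\hat\theta_{n,1}$ for $\theta_{0,1}$, let $\hat\theta_{n,2} = \hat\theta_{n,2}(\hat\theta_{n,1})$. Then $\hat\theta_{n,2} \xrightarrow{P} \theta_{0,2}$.

1.4 Empirical Likelihood Estimation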

1.4.1 Review on Empirical Likelihood

Empirical likelihood, introduced in a series of papers by Owen (1988, 1990, 1991), is a nonparametric approach to inference with applications in many areas of statistics. Empirical likelihood allows the use of likelihood methods without necessarily assuming that the data are drawn from a parametric family of distributions. As Owen (2001, p. 1) remarks in his comprehensive monograph on empirical likelihood, the advantages of empirical likelihood arise because "it combines the reliability of the nonparametric methods with the flexibility and effectiveness of the likelihood approach". He adopted the name "empirical likelihood" because the empirical distribution of the data plays an important role. As we will describe later in this section, alternative nonparametric likelihood ratios have been developed that are also based on the empirical distribution function, and as Owen (2001, p. 2) states in his book, "empirical likelihood ... is distinguished more by being a likelihood than by being empirical".

The main idea of empirical likelihood is to construct a likelihood ratio statistic for the parameter of interest using a multinomial distribution on the observed data. Owen (1988) proves an analogue of Wilks' Theorem, obtaining a $\chi^2$ asymptotic distribution for the negative of twice the log empirical likelihood ratio. As Owen (1988) remarks, this result is surprising because the number of nuisance parameters, $n - 1$, increases with the sample size.
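For the simplest case of a univariate mean, the empirical likelihood ratio statistic can be computed with a few lines of Python. The sketch below relies on the standard Lagrangian form of the multinomial weights, $w_i = 1 / \{n (1 + \lambda (x_i - \mu))\}$, where $\lambda$ solves $\sum_i (x_i - \mu) / (1 + \lambda (x_i - \mu)) = 0$. The bisection solver, the toy data, and the function names are assumptions made for this illustration.

```python
import math


def el_statistic(data, mu, iters=200):
    """Return (-2 log R(mu), weights) for the empirical likelihood of a mean."""
    z = [x - mu for x in data]
    if min(z) >= 0.0 or max(z) <= 0.0:
        raise ValueError("mu must lie strictly inside the range of the data")
    # All weights are positive exactly when lambda lies in this open interval.
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)

    def g(lam):
        # Derivative of the Lagrangian dual; strictly decreasing in lam.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    weights = [1.0 / (len(z) * (1.0 + lam * zi)) for zi in z]
    stat = 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
    return stat, weights


def chi2_1_pvalue(stat):
    """Upper tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Hypothetical data; mu = 4.0 is the hypothesized mean being tested.
data = [2.0, 9.5, 3.1, 4.7, 3.9, 5.2, 4.1]
stat, weights = el_statistic(data, 4.0)
print(stat, chi2_1_pvalue(stat))
```

The weights returned by the sketch form a reweighted empirical distribution whose mean equals the hypothesized value, and comparing the statistic with a $\chi^2_1$ quantile is the calibration suggested by the Wilks-type result.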

Empirical likelihood was initially applied to construct confidence regions for

parameters defined by statistical functionals, such as Z-estimators, Fréchet differentiable

functionals, and smooth functions of means (Owen, 1988). Owen (1991) applied empirical

likelihood to regression models by extending the theory to independent but not identically

distributed observations. Kolaczyk (1994) made further extensions to generalized linear

models.

Empirical likelihood was soon recognized as a serious competitor to contemporary

methods of nonparametric inference, such as the bootstrap. Hall and La Scala (1990)

argue that "empirical likelihood ... deserves a prominent place in the modern statistician's

armory of computer-intensive [tools]." They identify the following advantages of empirical

likelihood over the bootstrap:

(1) empirical likelihood provides confidence regions for multivariate parameters, and

the shapes are data driven, being concentrated in places where the density of the

parameter estimator is greatest;

(2) empirical likelihood is Bartlett correctable, i.e., a correction for the mean reduces the

coverage error of confidence regions based on empirical likelihood from order $n^{-1}$ to

order $n^{-2}$;

(3) empirical likelihood does not require estimation of scale or skewness;

(4) empirical likelihood regions are range preserving and transformation respecting.

Imbens (2002) gives a review of recent developments concerning maximum empirical

likelihood estimators, which are defined as the maximizers of the empirical likelihood

over the parameter space. He remarks that their main merit is that they circumvent

the need to estimate the covariance matrix (the asymptotic covariance of the sample

criterion function) necessary in the case of GMM estimators, and they also have a nice

information-theoretic interpretation. It turns out that the continuously updating GMM

estimator is a particular case of the generalized empirical likelihood estimator obtained

in the class of Cressie-Read power divergences for the Euclidean distance ($\rho = 2$). He

compares these estimators on a simulated dynamical panel data set.

Empirical likelihood has been successfully applied in a time series context as well;

chapter 8 of Owen (2001) is dedicated to this subject. By reducing to independence, he

shows how to apply empirical likelihood in the case of AR(1) processes. Extensions to

arbitrary order autoregressive processes are easily obtained, and it would be interesting to

see how inference based on empirical likelihood competes with the classical approaches in

time series. Kitamura (1997) introduced a blockwise empirical likelihood that preserves

the dependence structure within the observations. By extending results from Qin and

Lawless (1994), he derives an efficient estimator by maximizing the blockwise empirical

likelihood. This estimator is called the maximum blockwise empirical likelihood estimator

and is the counterpart for time series of the maximum empirical likelihood estimator.

1.4.2 Empirical Likelihood for Moment Condition Models

Let $X_1, \ldots, X_n$ be an iid sample from $F$, $\theta_0 \in \Theta \subset \mathbb{R}^p$ the parameter of interest,
and $b(x, \theta) = (b_1(x, \theta), \ldots, b_q(x, \theta))^T$ a $q$-dimensional function indexed by $\theta$, such that
$\theta_0 = \theta(F)$ is the unique solution to the population equation

$$E_F[b_{\theta_0}] = 0. \tag{1-18}$$

If p = q we obtain the classical Z-estimation model, and if q > p, then we obtain the GMM

model, as discussed in previous sections. We consider both models at the same time, and,

when necessary, we underline the differences. We first give some definitions.

Definition 1.4.1 (Nonparametric Likelihood). The nonparametric likelihood of a distribution $G$ is defined by

$$L(G) = \prod_{i=1}^{n} G\{X_i\}, \tag{1-19}$$

where $G\{X_i\}$ represents the probability of getting the value $X_i$ under the distribution $G$.

This is not a likelihood as defined in classical statistical theory, but L(G) is the

probability of obtaining exactly the observed values $X_1, \ldots, X_n$. In order to have a

positive nonparametric likelihood, $G$ must place positive probability on each observed

value $X_i$, $i = 1, \ldots, n$.

Definition 1.4.2 (Nonparametric Likelihood Ratio). The nonparametric likelihood ratio is defined to be

$$R(G) = \frac{L(G)}{L(F_n)}, \tag{1-20}$$

where $F_n$ is the empirical distribution function.

Definition 1.4.3 (Empirical Likelihood). The empirical likelihood for $\theta$ is defined as

$$\mathcal{R}(\theta) = \sup\{R(G) : G \ll F_n,\ E_G[b_\theta] = 0\}, \tag{1-21}$$

where the supremum is taken over all distributions supported on the sample.

It is shown in Owen (2001, pp. 11-12) that we can treat the data as if there were

no ties, by considering the probabilities associated with observations and not with their

values. If we represent any distribution $G \ll F_n$ by a vector of weights $p = (p_1, \ldots, p_n)$,
where $p_i = G\{X_i\}$, then the empirical likelihood ratio can be written in an equivalent form as

$$\mathcal{R}(\theta) = \sup_{p} \left\{ \prod_{i=1}^{n} n p_i : \sum_{i=1}^{n} p_i b(X_i, \theta) = 0,\ \sum_{i=1}^{n} p_i = 1,\ p \geq 0 \right\}. \tag{1-22}$$

Owen (1988) proves the following fundamental result. Let $X_1, \ldots, X_n \in \mathbb{R}^d$ be independent
random vectors with common distribution $F$. For $\theta \in \Theta \subset \mathbb{R}^p$ and $x \in \mathbb{R}^d$, let $b(x, \theta) \in \mathbb{R}^p$.
Let $\theta_0 \in \Theta$ be such that $\mathrm{Cov}_F[b_{\theta_0}]$ is finite and has rank $p$. If $\theta_0$ satisfies $E[b_{\theta_0}] = 0$, then
$-2\log(\mathcal{R}(\theta_0)) \xrightarrow{d} \chi^2_p$. As Owen (2001, p. 41) remarks in his monograph, an interesting
aspect of this asymptotic result is that it does not include conditions on $b(x, \theta)$ nor on
$E_F[b_\theta]$.
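
To make the statistic $-2\log\mathcal{R}(\theta_0)$ concrete, the following is a minimal sketch for the simplest case, the mean of a scalar sample, where $b(x, \theta) = x - \theta$. It assumes the Lagrange-multiplier form of the weights, $p_i = \{n(1 + \lambda(X_i - \theta))\}^{-1}$, and solves for $\lambda$ by bisection. The function name `el_log_ratio` and the bisection tolerances are illustrative choices, not part of the original text.

```python
import math

def el_log_ratio(x, mu):
    """Return -2 log R(mu) for the mean of a scalar sample x.

    Assumes the weights p_i = 1 / (n * (1 + lam * (x_i - mu))), where lam
    solves sum_i (x_i - mu) / (1 + lam * (x_i - mu)) = 0.
    """
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # mu is outside the convex hull: no valid weights
    # lam must keep every 1 + lam * z_i strictly positive
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # g is strictly decreasing in lam, so bisection finds its unique root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # -2 * sum_i log(n * p_i) = 2 * sum_i log(1 + lam * z_i)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)
```

Under these assumptions the statistic is zero at the sample mean and grows as `mu` moves away from it; comparing it with a $\chi^2_1$ quantile would give the confidence interval described in the text.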

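Let $\mathcal{F}_{c,n} = \{G : R(G) \geq c,\ G \ll F_n\}$ and $S_{c,n} = \bigcup_{G \in \mathcal{F}_{c,n}} \{t : E_G[b_t] = 0\}$. Owen's result
suggests taking $c = \exp(-\chi^2_p(1 - \alpha)/2)$, where $\chi^2_p(1 - \alpha)$ is the $1 - \alpha$ quantile of $\chi^2_p$, in
order to obtain an asymptotic $100(1 - \alpha)\%$ confidence region for $\theta_0$, i.e.

$$P(\theta_0 \in S_{c,n}) \to 1 - \alpha, \quad \text{as } n \to \infty. \tag{1-23}$$

Inspection of the proof reveals that the leading term in the asymptotic expansion of the
empirical likelihood ratio is

$$-2\log(\mathcal{R}(\theta_0)) = n \bar{b}^T V_n^{-1} \bar{b} + O_p(n^{-1}),$$

where $\bar{b} = n^{-1} \sum_{i=1}^{n} b(X_i, \theta_0)$ and $V_n = n^{-1} \sum_{i=1}^{n} b(X_i, \theta_0) b(X_i, \theta_0)^T$. Let $T^2 = n \bar{b}^T S^{-1} \bar{b}$
be the corresponding Hotelling's $T^2$ statistic, where

$$S = \frac{1}{n - 1} \sum_{i=1}^{n} (b(X_i, \theta_0) - \bar{b})(b(X_i, \theta_0) - \bar{b})^T.$$

Then $-2\log(\mathcal{R}(\theta_0)) = T^2 + O_p(n^{-1})$. Thus, Owen (1990) suggests using the quantiles of a
scaled Fisher's $F$ distribution, $\frac{(n-1)p}{n-p} F_{p,\,n-p}$, instead of a $\chi^2_p$ when constructing the empirical
likelihood confidence regions for $\theta_0$.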

Hall and La Scala (1990), DiCiccio and Romano (1989), and DiCiccio et al. (1991)

show that empirical likelihood is Bartlett correctable. Bartlett correction amounts to a

mean correction of the empirical likelihood in order to achieve a coverage accuracy of order

$O_p(n^{-2})$. Empirical likelihood is Bartlett correctable because the third and the fourth

cumulants of the components of the signed root of the empirical likelihood are of orders at

most $O_p(n^{-3/2})$ and $O_p(n^{-2})$, respectively. Consequently, the empirical likelihood admits

the expansion

2log(7(0o)) + x) + Op(-2). (1-24)

Since the algebraic expression for $a$ is fairly complicated, Hall and La Scala (1990) suggest

a bootstrap approximation.

Qin and Lawless (1994) extend empirical likelihood for Z-estimators to models where

the dimension of the estimating equation is greater than that of the parameter. They

define the maximum empirical likelihood estimator (MELE) to be the maximizer of the

empirical likelihood ratio over the parameter space, i.e.,

$$\tilde\theta = \arg\max_{\theta \in \Theta} \mathcal{R}(\theta). \tag{1-25}$$

Following the same arguments as in Owen (1990), Qin and Lawless (1994) prove that
the optimal weights are given by $p_i(\theta) = \{n(1 + \lambda^T b(X_i, \theta))\}^{-1}$ for any $\theta$ in a small
neighborhood of the true parameter $\theta_0$, where $\lambda$ is the Lagrange multiplier of the system
(1-22) and satisfies

$$\sum_{i=1}^{n} \frac{b(X_i, \theta)}{n(1 + \lambda^T b(X_i, \theta))} = 0.$$

It is an easy exercise to show that for $p = q$, $\tilde\theta = \hat\theta$, the usual Z-estimator, but for $p < q$, $\tilde\theta$
generally differs from the GMM estimator $\hat\theta$.

Qin and Lawless (1994) show that $\tilde\theta$, the MELE of $\theta_0$, is asymptotically normally
distributed, with the same limiting distribution as the efficient GMM estimator.
Specifically, assume that $E[b(X_i, \theta_0) b(X_i, \theta_0)^T]$ is positive definite, that $b(x, \theta)$ is two
times continuously differentiable in a neighborhood of $\theta_0$ where $\|b(x, \theta)\|$, $\|\nabla b(x, \theta)\|$,
and $\|\nabla^2 b(x, \theta)\|$ are bounded by some $F$-integrable function $G(x)$, and that the rank of
$E[\nabla b(X_i, \theta_0)]$ is $p$. Then

(1) with probability 1, $\mathcal{R}(\theta)$ attains its maximum at a value $\tilde\theta$ in the interior of the ball
$\|\theta - \theta_0\| \leq n^{-1/3}$,

(2) $\sqrt{n}(\tilde\theta - \theta_0) \xrightarrow{d} N(0, A)$ and $\sqrt{n}\,\tilde\lambda \xrightarrow{d} N(0, U)$, and

(3) $-2\log \mathcal{R}(\tilde\theta) \xrightarrow{d} \chi^2_{q-p}$,

where $I$ is the identity matrix, $A = (D^T V^{-1} D)^{-1}$, and $U = V^{-1}(I - D A D^T V^{-1})$.

Result (3) gives a test for over-identifying restrictions, a competitor to the GMM J-test as

described in previous sections.

Baggerly (1998) has generalized empirical likelihood by considering the family of
Cressie-Read power divergences. The Cressie-Read power divergence between $F_n$ and $F_p$
(or equivalently between the vector of uniform probabilities $p_0 = n^{-1}\mathbf{1}$ and $p$) is given by

$$D_\rho(p) = \frac{1}{n} \sum_{i=1}^{n} \phi_\rho(n p_i), \tag{1-26}$$

where

$$\phi_\rho(u) = \begin{cases} [\rho(\rho - 1)]^{-1}(u^\rho - 1), & \text{if } \rho \neq 0, 1, \\ -\log(u), & \text{if } \rho = 0, \\ u \log(u), & \text{if } \rho = 1. \end{cases} \tag{1-27}$$

The power divergences contain as special cases the forward ($\rho = 0$) and backwards ($\rho = 1$)
Kullback-Leibler divergences, Hellinger distance ($\rho = 1/2$), and Euclidean distance ($\rho = 2$).
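
The divergence in (1-26) and (1-27) is easy to evaluate directly. The sketch below is illustrative only: the function names `phi` and `cressie_read` are hypothetical, and it assumes the $\rho = 0$ branch is $-\log(u)$, which is the limit of the general branch as $\rho \to 0$.

```python
import math

def phi(u, rho):
    """Cressie-Read generator; the rho = 0 branch is assumed to be -log(u)."""
    if rho == 0:
        return -math.log(u)
    if rho == 1:
        return u * math.log(u)
    return (u ** rho - 1.0) / (rho * (rho - 1.0))

def cressie_read(p, rho):
    """Power divergence D_rho(p) between weights p and the uniform weights 1/n."""
    n = len(p)
    return sum(phi(n * pi, rho) for pi in p) / n
```

With uniform weights every term is $\phi_\rho(1) = 0$, so the divergence vanishes for each $\rho$; for example, `cressie_read([0.5, 0.3, 0.2], 2)` evaluates the Euclidean case on a non-uniform weight vector.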

Baggerly defined the empirical divergence for the mean for the whole class of Cressie-Read
discrepancy measures by generalizing (1-22) in case of the mean, i.e. for every $\rho \in \mathbb{R}$,

$$CR_\rho(\theta) = \inf_{p} \left\{ D_\rho(p) : \sum_{i=1}^{n} p_i X_i = \theta,\ \sum_{i=1}^{n} p_i = 1,\ p \geq 0 \right\}. \tag{1-28}$$

Baggerly (1998) showed that Owen's result on the asymptotics of empirical likelihood

holds for any member of the Cressie-Read family. Jing and Wood (1996) show that the

exponential empirical likelihood (obtained for $\rho = 1$) is not Bartlett correctable, and later,

Baggerly shows that empirical likelihood is the only element in the Cressie-Read family of

divergences that admits a Bartlett correction. Nevertheless, in his simulation results, the

use of a scaled Fisher's F distribution gives better coverage than both asymptotic $\chi^2$ and

Bartlett corrected confidence regions. Corcoran (1998) extends the class of discrepancy

statistics that admit Bartlett corrections. Smith (1997) introduces the class of generalized

empirical likelihood estimators defined as saddle points of an optimization problem defined

in terms of a normalized convex function. Newey and Smith (2004) show that this class of

estimators generalizes the class of minimum Cressie-Read discrepancy estimators.

CHAPTER 2
THE BIASED BOOTSTRAP WITH IID OBSERVATIONS

2.1 Review on the Bootstrap with IID Observations

Since its introduction by Efron (1979), the bootstrap has provided new methods

to applied statistics and motivated a myriad of new theoretical results. In a recent

edition of Statistical Science dedicated to the bootstrap, Efron (2003) remarked that the

bootstrap was initially introduced as an alternative to the jackknife for estimating the bias

and variance of an estimator. Since then, many new applications have been developed,

including bootstrap confidence intervals and significance tests, bootstrap bias reduction,

and bootstrap diagnostics.

In reviewing recent developments in bootstrapping, Davison et al. (2003) mentioned

several new directions of research, including highly accurate parametric bootstrap

procedures, theoretical properties for the nonparametric bootstrap with unequal

probabilities, the m-out-of-n bootstrap, bootstrap failures and remedies for superefficient

estimators, significance tests, and resampling for dependent data. Books that deal with

both theoretical properties and applications of bootstrap include Hall (1992), Efron and

Tibshirani (1993), Shao and Tu (1995), Davison and Hinkley (1997), and Lahiri (2003).

The main idea of the bootstrap is to estimate the sampling distribution of a statistic

by its (re)sampling distribution obtained under an estimate of the underlying distribution

of the data. This definition applies to both parametric and nonparametric problems as

follows. Suppose $\mathcal{X} = \{X_1, \ldots, X_n\}$ is an iid sample from a distribution $F$ and we want
to estimate the sampling distribution of a statistic $T_n = T_n(X_1, \ldots, X_n)$. Let $\mathcal{L}(T_n \mid F)$
represent the distribution of $T_n$ when the data $X_i$ are drawn from $F$.

In the parametric bootstrap, we consider a parametric model $\{F_\theta;\ \theta \in \Theta\}$ for the
underlying $F$, with $F = F_{\theta_0}$ and $\theta_0 \in \Theta$, where $\Theta$ is the parameter space. In order to
apply the bootstrap principle to this problem, we first estimate the parameter $\theta_0$ by a
consistent (and efficient) estimator $\hat\theta$ (usually the MLE) and take $F_{\hat\theta}$ as our estimate
of $F$. Next, let $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ be a bootstrap resample, i.e., conditional on $\mathcal{X}$,
$X_1^*, \ldots, X_n^*$ are iid from $F_{\hat\theta}$, and denote by $T_n^* = T_n(X_1^*, \ldots, X_n^*)$ the value of $T_n$ computed
for this (hypothetical) bootstrap resample $\mathcal{X}^*$. Then, the parametric bootstrap estimate
of $\mathcal{L}(T_n \mid F)$ is given by $\mathcal{L}(T_n^* \mid F_{\hat\theta})$.

In the nonparametric bootstrap, we usually take $F_n$, the empirical distribution
function of the sample $\mathcal{X}$, as our estimate of the underlying distribution $F$, though
weighted versions are also used. As before, let $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ be a bootstrap
resample, i.e., conditional on $\mathcal{X}$, $X_1^*, \ldots, X_n^*$ are iid from $F_n$, and denote by $T_n^* =
T_n(X_1^*, \ldots, X_n^*)$ the value of $T_n$ computed for $\mathcal{X}^*$. Then, the nonparametric bootstrap
estimate of $\mathcal{L}(T_n \mid F)$ is given by $\mathcal{L}(T_n^* \mid F_n)$.

Except for some special cases, there are no closed form expressions for $\mathcal{L}(T_n^* \mid F_n)$
(Hall, 1992, pp. 9-11), and bootstrap estimates are usually found by Monte Carlo
simulation. In this case, for a given integer $B$, we consider $B$ simulated bootstrap
resamples $\mathcal{X}_1^*, \ldots, \mathcal{X}_B^*$, and we compute the statistic $T_b^* = T_n(\mathcal{X}_b^*)$ for every resample
$\mathcal{X}_b^*$. Then, we approximate $\mathcal{L}(T_n^* \mid F_n)$ by the empirical distribution of the $T_b^*$'s, i.e.,

$$\mathcal{L}(T_n^* \mid F_n) \approx \frac{1}{B} \sum_{b=1}^{B} \delta_{T_b^*},$$

where $\delta_x$ is the unit point mass at $x$. Having a bootstrap estimate of the sampling
distribution of a statistic, we can estimate its bias, variance, and the quantiles of interest.
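
The Monte Carlo approximation just described can be sketched in a few lines. This is a minimal illustration of the uniform nonparametric bootstrap, using only the Python standard library; the function names, the default `B = 2000`, and the fixed seed are arbitrary illustrative choices.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, B=2000, seed=0):
    """Return B bootstrap replicates T*_b of `statistic` under the empirical distribution."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample draws n observations with replacement (uniform weights 1/n)
    return [statistic(rng.choices(sample, k=n)) for _ in range(B)]

def bootstrap_bias_variance(sample, statistic, B=2000, seed=0):
    """Estimate the bias and variance of `statistic` from its bootstrap replicates."""
    t_star = bootstrap_distribution(sample, statistic, B, seed)
    bias = statistics.mean(t_star) - statistic(sample)
    variance = statistics.variance(t_star)
    return bias, variance
```

For instance, `bootstrap_bias_variance(data, statistics.mean)` returns a bias estimate near zero and a variance estimate close to the plug-in variance of the sample mean; quantiles can be read off the sorted replicates in the same way.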

The bootstrap can be iterated, and usually each iteration reduces the order of error

of bootstrap confidence intervals and tests by a factor of $n^{-1/2}$, as in Hall and Martin

(1988). Generally, no more than two levels of bootstrap are employed, and such procedures

are referred to in the literature as "the iterated bootstrap", "the nested bootstrap", and

"the double bootstrap". The computational effort required by the iterated bootstrap is

generally taken to be the square of that required for one level of bootstrapping, which

is already computationally involved. Applications of the iterated bootstrap include

calibration of confidence regions (Beran, 1987, 1988; Hall, 1992), bias reduction (Hall and

Martin, 1988; Davison and Hinkley, 1997), variance stabilization (Tibshirani, 1988; Hall

and Presnell, 1999), and bootstrap diagnostics (Efron, 1992; Canty et al., 2000).

Often, the bootstrap provides more accurate results than first order asymptotic

approximations, without making use of the complex algebra of higher order expansions.

The analysis of the performance of bootstrap procedures generally relies on Edgeworth

expansions. The Edgeworth expansion is a refinement of the Central Limit Theorem that

gives the form of the error terms in an asymptotic approximation of the distribution of the

sample mean, extended by Bhattacharya and Ghosh (1978) to smooth functions of means.

Bootstrap versions of these expansions were developed by Hall (1988) in order to analyse

the performance of different types of bootstrap confidence intervals. As a consequence,

the bootstrap often gives rejection and coverage probabilities that are more accurate than

approximate large sample methods.

Generally, a good bootstrap procedure should satisfy two desiderata: it should yield

an asymptotically consistent estimate of the sampling distribution of a statistic and, for

small to moderate sample sizes, it should outperform asymptotic approximations. Shao

and Tu (1995) identify some techniques used in the statistical literature to establish

bootstrap consistency. The most popular technique is called imitation. The main idea here

is to imitate the proof for obtaining the asymptotic distribution of the statistic in order to

extend it for the bootstrap. The consistency of the bootstrap for the sample mean can be

proven this way (van der Vaart, 1998, Theorem 23.4). Then, by applying the delta method

for bootstrap (van der Vaart, 1998, Theorem 23.5), the consistency of the bootstrap for

smooth functions of sample means follows. Another technique uses Berry-Esseen type

inequalities. The main advantage of this method is that one can also obtain the rate of

convergence of the bootstrap estimates. Unfortunately, it is often difficult to obtain such

inequalities.

In order to show consistency of the bootstrap, one usually considers a metric $\delta$

that metrizes weak convergence in the space of distribution functions (Huber, 1981),

and then shows that the distance between the bootstrap distribution and the sampling

distribution of the statistic converges to zero either almost surely or in probability, as the

sample size increases. In the former case the bootstrap is called strongly consistent, in the

latter weakly consistent. We will also use the terminology of convergence in distribution,

conditionally given the data, almost surely or in probability.

If $G$ is continuous, $\delta$ can be taken to be the Kolmogorov-Smirnov distance, which

metrizes weak convergence in this case, but other metrics have been used in studying

the consistency of bootstrap. For example, Bickel and Freedman (1981) used Mallows'

distance in proving consistency of the bootstrap for t-statistics, von Mises functionals,

and empirical processes. Freedman (1981, 1984) also uses Mallows' distance to prove

the consistency of the bootstrap distribution of ordinary least squares (OLS) and two

stage least squares (2SLS) estimators in certain linear regression models. Using empirical

processes theory, bootstrap consistency can be also established for Hadamard differentiable

statistical functionals using the consistency of the empirical bootstrap for the Brownian

Bridge (Giné, 1990) and, more recently, van der Vaart (1998, pp. 332-334). Using a

functional delta method, one can prove bootstrap consistency for a myriad of statistics,

such as sample quantiles, L-estimators, and nonparametric goodness-of-fit statistics such

as the von Mises and Kolmogorov-Smirnov statistics.

To illustrate, using the same notations as above, let $\mathcal{X}$ be an iid sample from $F$
and let $G_n = \mathcal{L}(T_n \mid F)$ be the distribution of $T_n$. Having a bootstrap resample $\mathcal{X}^*$,
let $T_n^* = T_n(X_1^*, \ldots, X_n^*)$ be the bootstrap version of $T_n$ corresponding to $\mathcal{X}^*$ and let
$\hat{G}_n = \mathcal{L}(T_n^* \mid F_n)$ be its bootstrap distribution. Suppose that $T_n$ converges weakly to $G$,
so that $\delta(G_n, G) \to 0$ as $n \to \infty$, where $\delta$ is a metric that metrizes weak convergence of
distributions, such as Levy distance or bounded Lipschitz distance (Huber, 1981). While
the asymptotic theory approximates the distribution $G_n$ by its limit $G$, the bootstrap
approximates $G_n$ by its bootstrap distribution $\hat{G}_n$ (which is a random distribution). If the
sequence of random distributions $\hat{G}_n$ converges to $G$ in probability, i.e. $\delta(\hat{G}_n, G) \xrightarrow{P} 0$, then
we will say that $T_n^*$ converges weakly to $G$ in probability and write $T_n^* \overset{P}{\rightsquigarrow} G$. In this case,
by the triangle inequality, $\delta(\hat{G}_n, G_n) \xrightarrow{P} 0$, and the bootstrap is weakly consistent.

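In order to prove that $T_n^* \overset{P}{\rightsquigarrow} G$, the following notations are often used. We will say
that $T_n^* = o_{P^*}(1)$ if for every $\epsilon > 0$, $P(|T_n^*| > \epsilon \mid \mathcal{X}) \xrightarrow{P} 0$, where $P(\cdot \mid \mathcal{X})$ is the conditional
probability given the sample $\mathcal{X}$. In appendix B.2, we present a (conditional) version of
Slutsky's theorem, which shows that if $T_n^* \overset{P}{\rightsquigarrow} T$, $B_n^* = B + o_{P^*}(1)$, and $C_n^* = C + o_{P^*}(1)$, then
$B_n^* T_n^* + C_n^* \overset{P}{\rightsquigarrow} BT + C$. Finally, using a classical subsequence argument (Kallenberg, 2002,
Lemma 4.2, p. 63), note that $\delta(\hat{G}_n, G) \xrightarrow{P} 0$ if and only if for every subsequence $\hat{G}_{m_n}$ of $\hat{G}_n$,
there exists a further subsequence $\hat{G}_{l_n}$ of $\hat{G}_{m_n}$ such that $\delta(\hat{G}_{l_n}, G) \xrightarrow{a.s.} 0$ as $n \to \infty$.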

2.2 Least Favorable Families Corresponding to Z-Estimation Model

In this section, we introduce pseudo-parametric families of distributions associated

with Z-estimation model described in Section 1.2.2. Presnell (2002) shows that these

families are least favorable for parameters that are smooth functions of the mean vector.

In Theorem 2.2.1 below, we prove that this result is also true for the Z-estimation

model (1-3). Here, by least favorable we mean that, when evaluated at the maximum

likelihood estimator, the inverse of the Fisher information matrix corresponding to the

pseudo-parametric families defined in Section 1.2.2 is equal to the sandwich matrix, the

usual nonparametric estimator of the asymptotic variance of the Z-estimator. In this

sense, inference about the parameters is not made artificially easier by restricting attention

to these families of distributions (Stein, 1956).

There are a number of bootstrap confidence interval procedures based on least

favorable families of distributions. Efron's tilted bootstrap confidence intervals are based

on inverting a bootstrap hypothesis test carried out within a least favorable family. The

automatic percentile method (DiCiccio and Romano, 1989) can be applied in conjunction

with a least favorable family approach (DiCiccio and Romano, 1990). Also, DiCiccio and

Romano (1990) and Hall and Presnell (1999) suggest estimating the variance of $\hat\theta$ as a

function of $\theta$ by resampling along a least favorable family in order to compute variance
stabilized bootstrap-t intervals.
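Choose and fix any value of $\rho$. For a given value of $\theta$, we choose the vector of
probabilities $p = p(\theta)$ in order to

$$\text{minimize} \quad D_\rho(p) = \frac{1}{n} \sum_{i=1}^{n} \phi_\rho(n p_i), \tag{2-1a}$$

$$\text{subject to} \quad \sum_{i=1}^{n} p_i b(X_i, \theta) = 0, \quad \sum_{i=1}^{n} p_i = 1, \quad p_i \geq 0, \tag{2-1b}$$

where $\phi_\rho$ is given by (1-27). Note that

$$\phi_\rho'(u) = \begin{cases} (\rho - 1)^{-1} u^{\rho - 1}, & \text{if } \rho \neq 1, \\ 1 + \log(u), & \text{if } \rho = 1. \end{cases} \tag{2-2}$$

Using the same notation as in Presnell (2002), define

$$\Upsilon_\rho(p) \overset{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} (n p_i)\, \phi_\rho'(n p_i) = \begin{cases} (\rho - 1)^{-1} + \rho D_\rho(p), & \rho \neq 1, \\ 1 + D_\rho(p), & \rho = 1. \end{cases} \tag{2-3}$$

Applying a Lagrange multipliers argument to (2-1), for a fixed $\theta$,

$$0 = \frac{\partial}{\partial p_i} \left\{ D_\rho(p) - \lambda_0 \left[ \sum_{i=1}^{n} p_i - 1 \right] - \lambda^T \sum_{i=1}^{n} p_i b(X_i, \theta) \right\} = \phi_\rho'(n p_i) - \lambda_0 - \lambda^T b(X_i, \theta). \tag{2-4}$$

Multiplying (2-4) by $p_i$ and summing over $i$ gives

$$\Upsilon_\rho(p) - \lambda_0 = 0. \tag{2-5}$$

Equation (2-4) may be solved for $p_i$ using (2-5), yielding

$$n p_i = \begin{cases} \{1 + (\rho - 1)[\rho\delta + \lambda^T b(X_i, \theta)]\}^{1/(\rho - 1)}, & \text{if } \rho \neq 1, \\ \exp\{\delta + \lambda^T b(X_i, \theta)\}, & \text{if } \rho = 1, \end{cases} \tag{2-6}$$

where $\delta$ and $\lambda$ are chosen to satisfy the constraints $\sum_{i=1}^{n} p_i = 1$ and $\sum_{i=1}^{n} p_i b(X_i, \theta) = 0$.
Note that in this representation $\delta$ is equal to the minimized power divergence, i.e.,
$\delta = D_\rho(p)$.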
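
As a check on the general weight formula, it may help to write out the two Kullback-Leibler cases explicitly. The display below is a worked specialization, assuming the weights take the form $n p_i = \{1 + (\rho - 1)[\rho\delta + \lambda^T b(X_i, \theta)]\}^{1/(\rho - 1)}$ for $\rho \neq 1$ and $n p_i = \exp\{\delta + \lambda^T b(X_i, \theta)\}$ for $\rho = 1$; the normalization step for $\rho = 1$ uses only the constraint $\sum_i p_i = 1$.

$$
\begin{aligned}
\rho = 0: \quad & p_i = \frac{1}{n\,\{1 - \lambda^T b(X_i, \theta)\}}, \\
\rho = 1: \quad & p_i = \frac{\exp\{\lambda^T b(X_i, \theta)\}}{\sum_{j=1}^{n} \exp\{\lambda^T b(X_j, \theta)\}}, \qquad e^{-\delta} = \frac{1}{n} \sum_{j=1}^{n} \exp\{\lambda^T b(X_j, \theta)\}.
\end{aligned}
$$

The $\rho = 0$ case reproduces the empirical likelihood weights of Qin and Lawless (1994) up to the sign convention for $\lambda$, and the $\rho = 1$ case gives exponential tilting. In both cases $\lambda = 0$ returns the uniform weights $p_i = 1/n$.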

For any $\theta$, let $F_\theta$ be the weighted empirical distribution on the sample $\mathcal{X}$, corresponding
to the vector of weights $p(\theta)$ given by (2-6), i.e.

$$F_\theta = \sum_{i=1}^{n} p_i(\theta)\, \delta_{X_i},$$

where $\delta_x$ is the unit point mass at $x$. We show in the appendix that $\mathcal{F} = \{F_\theta\}_{\theta \in \Theta}$, the
resulting family of weighted empirical distributions indexed by $\theta$, has the nonparametric
least favorable property (Stein, 1956; DiCiccio and Romano, 1990).

Theorem 2.2.1. The inverse of the Fisher information matrix corresponding to the
pseudo-parametric families defined in Section 1.2.2, evaluated at the maximum likelihood
estimator, is equal to the sandwich matrix, the usual nonparametric estimator of the
asymptotic variance covariance matrix of the Z-estimator.

2.3 The Biased Bootstrap for GMM

In spite of the good asymptotic properties of GMM estimators, Monte Carlo

experiments have shown that the actual sizes of tests based on first order asymptotic

theory differ greatly from their nominal levels (Tauchen, 1986). In an effort to improve

finite sample performance of these tests, Hall and Horowitz (1996) devised a modified

bootstrap procedure that can be also applied with dependent data. Their procedure

recenters the moment conditions so that the modified moment conditions are fulfilled by

the sample.

Hahn (1996) showed that the bootstrap t-test for individual parameters is asymptotically

consistent without recentering. However, if the uncentered moment conditions are used,

then the bootstrap estimate of the distribution of the J-test statistic of over-identifying

restrictions is not consistent, a fact also claimed by Brown and Newey (2002) and Lindsay

and Qu (2003). As noted by Lindsay and Qu (2003), using the uncentered moment

conditions, the null hypothesis of mean-zero of the scores does not hold for the sample,

so that, one is sampling under an alternative hypothesis in which the mean of the scores

is not zero. This may have a small impact on the critical values if the null is true (if the

bootstrap is consistent under the null), but will have a great impact if the null is false;

consequently, the size of the bootstrap test might be nearly correct but its power may be

poor.

Hall and Presnell (1999) devised the biased bootstrap in order to improve the

performance of a wide range of statistical procedures for hypothesis testing, shrinkage,

robust estimation and variance stabilization. We use this methodology in order to

construct a semiparametric bootstrap for GMM. A biased bootstrap for GMM was

introduced by Brown and Newey (2002), though from another perspective. They argue

that the weighted empirical distribution that minimizes the Kullback-Leibler divergence to

the empirical distribution while satisfying the GMM equations (they call it "the empirical

likelihood distribution"), attains a semiparametric efficiency lower bound (Brown and

Newey, 1998). To show the consistency of the weighted bootstrap for t and J statistics,

they claimed that because the empirical likelihood distribution is more efficient than

the empirical distribution of the sample, the corresponding weighted bootstrap is both

consistent and more efficient than the uniform bootstrap.

Our approach is similar. We first introduce a family of weighted empirical distributions
$\mathcal{F} = \{F_\theta\}_{\theta \in \Theta}$ defined by (2-1), associated with the GMM model (1-1). We bootstrap
as if we had a parametric model: first we estimate the parameter $\theta_0 = \theta(F)$ using,
say, the GMM estimator $\hat\theta$; then, for the first level of bootstrap, a generic resample
$\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ is a sequence of (conditional) iid draws from $F_{\hat\theta} \in \mathcal{F}$. It is obvious
that if $\sum_{i=1}^{n} b(X_i, \hat\theta) = 0$, which is generally the case when $\dim(b) = \dim(\theta)$, then
$F_{\hat\theta} = F_n$, so that the biased bootstrap coincides in this case with the classical uniform
bootstrap. Let $\hat\theta^*$ be the bootstrap version of $\hat\theta$ on the resample $\mathcal{X}^*$. At the second
iteration of the biased bootstrap, a typical re-resample $\mathcal{X}^{**} = \{X_1^{**}, \ldots, X_n^{**}\}$ is a
(conditional) iid sample from the weighted empirical distribution $F_{\hat\theta^*}$ corresponding to
the parameter estimate $\hat\theta^*$. Here, the resampling procedure of the biased bootstrap differs
from the uniform bootstrap even when $\dim(b) = \dim(\theta)$, since typically $F_{\hat\theta^*} \neq F_n^*$, where
$F_n^*$ is the empirical distribution of the resample $\mathcal{X}^*$. In Section 2.3.2, we will see that the
biased bootstrap has certain computational advantages over the "uniform" bootstrap when
the bootstrap is iterated.

In this way, we mimic the parametric bootstrap for semiparametric models without

any adjustment of the moment conditions and without the need to find the centering value

for the bootstrap version of the statistics, as it is usually required in such situations, see

e.g. Shorack (1981), Freedman (1981), Hall and Horowitz (1996), and Lahiri (2003). To

summarize, the main steps in the biased bootstrap for GMM are as follows:

(1) Estimate the parameter by the GMM estimator $\hat\theta$.

(2) Find $F_{\hat\theta}$ as in (2-6).

(3) Resample the data from $F_{\hat\theta}$ to obtain bootstrap estimates of the
distributions of $\hat\theta$ and $nQ_n(\hat\theta)$.
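
The second and third steps can be sketched for a deliberately simple setting. The code below is a minimal illustration, not the procedure analysed in this chapter: it assumes a scalar moment function whose values $b_i = b(X_i, \hat\theta)$ have already been computed, and it uses exponential tilting (the $\rho = 1$ member of the Cressie-Read family) rather than the $\rho = 0$ case, because the tilted weights have a simple closed form. The function names are hypothetical.

```python
import math
import random

def _tilt(b, lam):
    """Weights proportional to exp(lam * b_i), plus their weighted mean of b."""
    m = max(lam * bi for bi in b)  # subtract the max exponent for numerical stability
    w = [math.exp(lam * bi - m) for bi in b]
    total = sum(w)
    p = [wi / total for wi in w]
    return p, sum(pi * bi for pi, bi in zip(p, b))

def tilted_weights(b):
    """Exponential-tilting weights p_i with sum(p_i) = 1 and sum(p_i * b_i) = 0."""
    if min(b) >= 0 or max(b) <= 0:
        raise ValueError("0 must lie strictly inside the range of the b_i")
    lo, hi = -1.0, 1.0
    # The weighted mean is increasing in lam, so widen the bracket, then bisect
    while _tilt(b, lo)[1] > 0:
        lo *= 2.0
    while _tilt(b, hi)[1] < 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _tilt(b, mid)[1] < 0:
            lo = mid
        else:
            hi = mid
    return _tilt(b, 0.5 * (lo + hi))[0]

def biased_resample(data, p, rng):
    """Draw a resample of size n from the weighted empirical distribution."""
    return rng.choices(data, weights=p, k=len(data))
```

A typical call would be `p = tilted_weights(b_values)` followed by `biased_resample(data, p, random.Random(0))`. When the moment values already average to zero the weights are uniform, so the sketch reduces to the ordinary uniform bootstrap.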

2.3.1 Consistency Results for the Biased Bootstrap

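Using the biased-bootstrap procedure, we estimate the distributions $\mathcal{L}(n^{1/2}(\hat\theta - \theta_0) \mid F)$
and $\mathcal{L}(nQ_n(\hat\theta) \mid F)$ with their bootstrap versions $\mathcal{L}(n^{1/2}(\hat\theta^* - \hat\theta) \mid F_{\hat\theta})$ and $\mathcal{L}(nQ_n^*(\hat\theta^*) \mid F_{\hat\theta})$,

where the weights of $F_{\hat\theta}$ are given by (2-6). We assume the following regularity conditions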

hold.

Assumption 2.3.1. The parameter space $\Theta \subset \mathbb{R}^p$ is compact and $\theta_0$ is an interior point
of $\Theta$ that satisfies the population moment condition and the global identifiability condition
given by Definitions 1.2.1-1.2.2.

Assumption 2.3.2. Suppose that $b_\theta(x)$ is three times differentiable in $\theta$ in a neighborhood
of $\theta_0$ for every $x$, such that $\|b(x, \theta)\|$, $\|\nabla b(x, \theta)\|$, $\|\nabla^2 b(x, \theta)\|$ and $\|\nabla^3 b(x, \theta)\|$ are all
bounded by some square integrable function $k$, with $E_F k^2 < \infty$.

Assumption 2.3.3. Assume that $V = E[b(X_i, \theta_0) b(X_i, \theta_0)^T]$ is positive definite and that
the rank of $D = E[\nabla b(X_i, \theta_0)]$ is $p$.

In the following proofs, we assume $\rho = 0$. The first lemma is concerned with the

existence of the weights satisfying (2-6). We show next the consistency of the biased

bootstrap for the sample mean of criterion functions (Theorem 2.3.1), then, we give a

lemma on conditional uniform convergence in probability that will be needed in proving

the weak consistency of GMM estimators and their bootstrap versions. We end with

a theorem on consistency of the biased bootstrap distribution of the GMM estimators

(Theorem 2.3.3). Theorem 2.3.4 shows the consistency of the biased bootstrap for the

J-test of over-identifying restrictions.

Lemma 2.3.1. Under Assumptions 2.3.1-2.3.3, 0 is inside the convex hull of {b(Xi, ) :

i = 1,..., n}, w ith ,.l .,,,.:.: /'; l 1, .,,. I,': I, 1.

Theorem 2.3.1. Let $\mathcal{X} = \{X_1, \ldots, X_n\}$ be an iid sample from $F$, and let $\theta_0 = \theta(F)$ be
the parameter of interest, assumed to satisfy equation (1-1). Let $\hat\theta$ be a sequence of GMM
estimators defined by (1-6) and let $F_{\hat\theta}$ be the weighted empirical distribution given by (2-6)
corresponding to $\hat\theta$. Let $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ be a biased bootstrap resample. Under
Assumptions 2.3.1-2.3.3, as $n \to \infty$,

$$n^{-1/2} \sum_{i=1}^{n} b(X_i^*, \hat\theta) \overset{P}{\rightsquigarrow} N(0, V). \tag{2-7}$$

We now give a conditional uniform convergence result.

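Lemma 2.3.2. Under the same conditions as in Theorem 2.3.1, the following uniform
convergence for the biased bootstrap holds:

$$\sup_{\theta \in \Theta} \left\| \frac{1}{n} \sum_{i=1}^{n} b(X_i^*, \theta) - E\, b_\theta \right\| = o_{P^*}(1), \tag{2-8}$$

where $E\, b_\theta = E\, b(X_i, \theta) = b(\theta)$.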

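Theorem 2.3.2. Under the same conditions as in Theorem 2.3.1, for any sequence of
GMM estimators $\hat\theta$ and (biased) bootstrap estimators $\hat\theta^*$, with $Q_n(\hat\theta) \geq Q_n(\theta_0) - o_P(1)$ and
$Q_n^*(\hat\theta^*) \geq Q_n^*(\theta_0) - o_{P^*}(1)$, we have $\hat\theta = \theta_0 + o_P(1)$ and $\hat\theta^* = \theta_0 + o_{P^*}(1)$, and hence also that
$\hat\theta^* = \hat\theta + o_{P^*}(1)$.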

Theorem 2.3.3. Let $\mathcal{X} = \{X_1, \ldots, X_n\}$ be an iid sample from $F$, and let $\theta_0 = \theta(F)$ be
the parameter of interest, assumed to satisfy equation (1-1). Let $\hat\theta$ be a sequence of GMM
estimators defined by (1-6) and let $F_{\hat\theta}$ be the weighted empirical distribution given by (2-6)
corresponding to $\hat\theta$. Let $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ be a biased bootstrap resample and let $\hat\theta^*$ be
the bootstrap version of $\hat\theta$ on $\mathcal{X}^*$. Under Assumptions 2.3.1-2.3.3, as $n \to \infty$,

$$n^{1/2}(\hat\theta^* - \hat\theta) \overset{P}{\rightsquigarrow} N\left(0,\ (D^T W D)^{-1} D^T W V W D (D^T W D)^{-1}\right). \tag{2-9}$$

Remark 2.3.1. Using the conditional form of Slutsky's Theorem given by Lemma B.2.4

in the appendix B.2, we can substitute for the nonrandom matrix W the nonparametric

estimator of V-1. As a consequence, the consistency of the bootstrap for the two-step

GMM estimator follows.

Remark 2.3.2. In the proof of Theorem 2.3.1, we use the fact that the weighted empirical

distribution satisfies the moment conditions. If we do not use the biased bootstrap, the

conditional mean of the (uniform) bootstrap sample mean $\bar{b}^*(\hat\theta)$ contains a random

element that disappears asymptotically, a result that is proven in the appendix B.2. The

fact that recentering is not necessary in the case of GMM estimation has been also proven

by Hahn (1996), though using another approach. As a consequence, without reweighting

or recentering, the bootstrap estimate of $\mathcal{L}(\sqrt{n}(\hat\theta - \theta_0) \mid F)$ is still consistent. On the

other hand, we will show also in the appendix B.2 that the usual (uniform, uncentered)

bootstrap distribution estimate of the J-test of over-identifying restrictions is weakly

inconsistent.

Theorem 2.3.4. Under the same conditions as in Theorem 2.3.3,

$$nQ_n^*(\hat\theta^*) \overset{P}{\rightsquigarrow} \chi^2_{q-p}. \tag{2-10}$$

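As a consequence of these asymptotic results, the Kolmogorov-Smirnov distance
between the bootstrap distributions and the sampling distributions of these statistics
converges to zero in probability as the sample size grows to infinity, that is:

$$\sup_{x} \left| P\left(n^{1/2}(\hat\theta^* - \hat\theta) \leq x \mid F_{\hat\theta}\right) - P\left(n^{1/2}(\hat\theta - \theta_0) \leq x \mid F\right) \right| \xrightarrow{P} 0,$$

and

$$\sup_{x} \left| P\left(nQ_n^*(\hat\theta^*) \leq x \mid F_{\hat\theta}\right) - P\left(nQ_n(\hat\theta) \leq x \mid F\right) \right| \xrightarrow{P} 0.$$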
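
In a simulation study neither distribution in these suprema is available in closed form, so the distance is typically approximated from two sets of Monte Carlo draws. The helper below is a small sketch of that approximation for scalar statistics; the name `ks_distance` is illustrative, and the two arguments stand for simulated values of the bootstrap statistic and of the original statistic, respectively.

```python
import bisect

def ks_distance(a, b):
    """Kolmogorov-Smirnov distance between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    na, nb = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the supremum
    # of their absolute difference is attained at one of the pooled sample points.
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / na
        fb = bisect.bisect_right(b_sorted, x) / nb
        d = max(d, abs(fa - fb))
    return d
```

The result carries Monte Carlo error from both samples, so a small value is only suggestive of the convergence stated above, not a verification of it.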

2.3.2 The Biased Bootstrap Recycling

Bootstrap recycling was first proposed by Newton and Geyer (1995). They suggested

drawing a common set of potential re-resamples from a single "design" distribution, and

then using importance weighting to "recycle" these re-resamples to estimate (conditional)

expectations for each first level bootstrap resample. Unfortunately, this method is

applicable only to the parametric bootstrap, mostly because in the nonparametric

bootstrap the support of the resample empirical distributions varies from resample to

resample. Thus, the majority of samples from any candidate distribution that dominates

all the resample distributions will have zero importance weights for most resamples,

leading to extremely inefficient and unstable Monte Carlo estimates (Ventura, 2000).

Using the recycling method, Presnell and Giurcanu (2007) construct a recycling

algorithm for the iterated biased bootstrap that yields a second order correct confidence

interval in the smooth function of means model. At the same time, this procedure

preserves the computational requirements of the single level bootstrap. The main idea of

this method is to use the importance sampling identity (or change of measure)

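$E_{\hat\theta^*}(T^{**}) = E_{\hat\theta}\left(T^* \frac{dP_{\hat\theta^*}}{dP_{\hat\theta}}\right)$, (2-11)

where $P_{\hat\theta} = F_{\hat\theta}^n$ is the $n$-fold product measure of $F_{\hat\theta}$ and $dP_{\hat\theta^*}/dP_{\hat\theta}$ is the Radon-Nikodym

derivative of $P_{\hat\theta^*}$ with respect to $P_{\hat\theta}$. The Monte Carlo recycling approximation of (2-11)

is given by

$\hat{E}_{\hat\theta^*}(T^{**}) = \frac{1}{B}\sum_{b=1}^{B} T_b^* w_b$, (2-12)

where the resamples $\mathcal{X}_b^*$, with $b = 1,\dots,B$, are iid from the design distribution $P_{\hat\theta}$ and

$w_b = \frac{P_{\hat\theta^*}(\mathcal{X}_b^*)}{P_{\hat\theta}(\mathcal{X}_b^*)}$ (2-13)

is the likelihood ratio statistic corresponding to the probability models $P_{\hat\theta^*}$ and $P_{\hat\theta}$,

evaluated for the resample $\mathcal{X}_b^*$. Using the definitions of $P_{\hat\theta^*}$ and $P_{\hat\theta}$, we see that

$\frac{P_{\hat\theta^*}(\mathcal{X}_b^*)}{P_{\hat\theta}(\mathcal{X}_b^*)} = \prod_{i=1}^{n} \left(\frac{p_i(\hat\theta^*)}{p_i(\hat\theta)}\right)^{m_{bi}}$, (2-14)

where $m_{bi} = \#\{j : X_{bj}^* = X_i\}$, and where here $\mathcal{X}_b^* = \{X_{b1}^*,\dots,X_{bn}^*\}$.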
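
The likelihood ratio in (2-14) depends on a resample only through its multiplicity counts, so it can be computed on the log scale to avoid underflow. The following is a minimal sketch under stated assumptions: the two probability vectors are made-up stand-ins for the weights $p_i(\hat\theta)$ and $p_i(\hat\theta^*)$, which in the text come from the Cressie-Read minimization in (2-6) and are not computed here, and the function name is hypothetical.

```python
import numpy as np

def log_recycling_weight(counts, p_target, p_design):
    """Log of prod_i (p_target[i] / p_design[i]) ** counts[i], as in (2-14)."""
    counts = np.asarray(counts, dtype=float)
    p_target = np.asarray(p_target, dtype=float)
    p_design = np.asarray(p_design, dtype=float)
    used = counts > 0  # observations absent from the resample contribute a factor of 1
    return float(np.sum(counts[used] * (np.log(p_target[used]) - np.log(p_design[used]))))

rng = np.random.default_rng(0)
n = 5
# Hypothetical weighted empirical distributions on the same n observations.
p_design = np.array([0.10, 0.20, 0.30, 0.25, 0.15])  # stands in for p_i(theta_hat)
p_target = np.array([0.15, 0.15, 0.30, 0.20, 0.20])  # stands in for p_i(theta_hat*)

idx = rng.choice(n, size=n, p=p_design)   # one resample drawn from the design distribution
counts = np.bincount(idx, minlength=n)    # multiplicities m_bi
w_b = np.exp(log_recycling_weight(counts, p_target, p_design))
```

Working with log-weights matters in practice because the product of $n$ ratios can be very small or very large for moderate $n$.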

In the following, we illustrate the use of the biased bootstrap, iterated biased

bootstrap and biased bootstrap recycling in a standard bootstrap estimation problem.

Suppose that $\mathcal{X} = \{X_1,\dots,X_n\}$ is an iid sample from $F$ and we want to estimate $T(F)$,

the solution in $t$ of the equation

$E[f_t(F, F_{\hat\theta})] = 0$, (2-15)

where $F_{\hat\theta}$ is the weighted empirical distribution corresponding to $\hat\theta$ and $f_t$ is a given

functional. Examples of such functionals include (Hall, 1992):

$f_t(F, F_{\hat\theta}) = \hat\theta - \theta_0 - t$, (2-16)

for bias estimation/reduction,

$f_t(F, F_{\hat\theta}) = (\hat\theta - \theta_0)^2 - t$, (2-17)

for mean square error estimation, and

$f_t(F, F_{\hat\theta}) = I\{\theta_0 \le \hat\theta + t\} - \alpha$, (2-18)

for $\alpha$ level upper confidence intervals, where $I\{\cdot\}$ represents the indicator function of a set.
The bootstrap estimate $T(F_{\hat\theta})$ of $T(F)$ solves the sample analogue of (2-15),

$E_{\hat\theta}[f_t(F_{\hat\theta}, F_{\hat\theta^*})] = 0$, (2-19)

obtained from (2-15) using the "plug-in rule", i.e., by substituting $F_{\hat\theta}$ for $F$
and $F_{\hat\theta^*}$ for $F_{\hat\theta}$.
Usually, $E[f_{T(F_{\hat\theta})}(F, F_{\hat\theta})] \neq 0$, and we might be interested in a further (additive)
correction $t(F)$, the solution in $t$ to the following equation

$E[f_{T(F_{\hat\theta})+t}(F, F_{\hat\theta})] = 0$. (2-20)

As before, let $t(F_{\hat\theta})$ be the bootstrap estimate of $t(F)$, which solves in $t$ the sample
analogue of (2-20)

$E_{\hat\theta}[f_{T(F_{\hat\theta^*})+t}(F_{\hat\theta}, F_{\hat\theta^*})] = 0$. (2-21)

This equation is again obtained using the "plug-in" rule, by substituting $F_{\hat\theta}$ for $F$ and $F_{\hat\theta^*}$
for $F_{\hat\theta}$ in (2-20). Note that $T(F_{\hat\theta^*})$ solves $E_{\hat\theta^*}[f_t(F_{\hat\theta^*}, F_{\hat\theta^{**}})] = 0$, and this is where the
second level of bootstrapping enters. The hope is then that $E[f_{T(F_{\hat\theta})+t(F_{\hat\theta})}(F, F_{\hat\theta})] \approx 0$.
This approximation needs to be taken in the following sense: it is not that $T(F_{\hat\theta}) + t(F_{\hat\theta})$
is closer to $T(F)$ than is $T(F_{\hat\theta})$, but that $E[f_{T(F_{\hat\theta})+t(F_{\hat\theta})}(F, F_{\hat\theta})]$ is closer to zero than is

$E[f_{T(F_{\hat\theta})}(F, F_{\hat\theta})]$.
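
For the bias functional (2-16), the sample equation (2-19) has the explicit solution $T(F_{\hat\theta}) = E_{\hat\theta}[\hat\theta^*] - \hat\theta$, which is approximated by averaging over resamples. The sketch below illustrates only this plug-in step. As a simplifying assumption it resamples from the uniform empirical distribution rather than from the weighted distribution $F_{\hat\theta}$, and it uses the plug-in variance as an example statistic; the function names are hypothetical.

```python
import numpy as np

def plug_in_variance(x):
    """Variance with divisor n, a statistic with known bias -sigma^2 / n."""
    return float(np.mean((x - np.mean(x)) ** 2))

def bootstrap_bias(x, statistic, n_boot, rng):
    """Monte Carlo solution of the sample equation for the bias functional:
    average of the resampled statistic minus the statistic on the data."""
    n = len(x)
    theta_hat = statistic(x)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        # Uniform resampling is an assumption of this sketch, not the biased bootstrap.
        reps[b] = statistic(x[rng.integers(0, n, size=n)])
    return float(np.mean(reps) - theta_hat)

rng = np.random.default_rng(1)
x = rng.normal(size=30)
bias_hat = bootstrap_bias(x, plug_in_variance, 20000, rng)
```

For this statistic the exact bootstrap bias is minus the plug-in variance divided by $n$, so the Monte Carlo value can be checked against a closed form.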
We can argue as in Hall and Martin (1988, pp. 663-665) that any iteration of the
biased bootstrap increases the accuracy of the estimation. Specifically, suppose that

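$E[f_{T(F_{\hat\theta})}(F, F_{\hat\theta})] = c(F)n^{-j/2} + O_p(n^{-(j+1)/2})$,

and

$\frac{\partial}{\partial t} E[f_{T(F)+t}(F, F_{\hat\theta})]\Big|_{t=0} \to a \neq 0$,

where $c$ is a "smooth functional". Then

$E[f_{T(F_{\hat\theta})+t(F_{\hat\theta})}(F, F_{\hat\theta})] = O_p(n^{-(j+1)/2})$.

The argument is identical to Hall and Martin (1988), using the additional fact that if

$\hat\theta = \theta_0 + O_p(n^{-1/2})$ then $\|F_{\hat\theta} - F\|_\infty = O_p(n^{-1/2})$.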
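The recycling biased bootstrap can be used to find the Monte Carlo estimate of

$T(F_{\hat\theta^*})$, the solution in $t$ of the equation

$E_{\hat\theta^*}[f_t(F_{\hat\theta^*}, F_{\hat\theta^{**}})] = 0$. (2-22)

For a given $b = 1,\dots,B$, let $\mathcal{X}_b^* = \{X_{b1}^*,\dots,X_{bn}^*\}$ be a biased bootstrap resample. Let

$\hat\theta_b^* = \hat\theta(\mathcal{X}_b^*)$, $b = 1,\dots,B$, be the version of $\hat\theta$ corresponding to $\mathcal{X}_b^*$. Then we obtain the
following recycling Monte Carlo approximation of the conditional expectation from (2-22)
as in (2-12):

$\hat{E}_{\hat\theta_b^*}[f_t(F_{\hat\theta_b^*}, F_{\hat\theta^{**}})] = \frac{1}{B}\sum_{b'=1}^{B} w_{bb'} f_t(F_{\hat\theta_b^*}, F_{\hat\theta_{b'}^*})$, (2-23)

where

$w_{bb'} = \prod_{i=1}^{n} \left(\frac{p_i(\hat\theta_b^*)}{p_i(\hat\theta)}\right)^{m_{b'i}}$

and

$m_{b'i} = \#\{j : X_{b'j}^* = X_i\}$.

The recycling Monte Carlo estimate $T^B(F_{\hat\theta_b^*})$ of $T(F_{\hat\theta_b^*})$ is the solution in $t$ of the equation

$\frac{1}{B}\sum_{b'=1}^{B} w_{bb'} f_t(F_{\hat\theta_b^*}, F_{\hat\theta_{b'}^*}) = 0$, (2-24)

where the upper index $B$ represents the number of simulated bootstrap samples. Hence,
the recycling Monte Carlo approximation $t^B(F_{\hat\theta})$ of $t(F_{\hat\theta})$ is the solution in $t$ of the

equation

$\frac{1}{B}\sum_{b=1}^{B} f_{T^B(F_{\hat\theta_b^*})+t}(F_{\hat\theta}, F_{\hat\theta_b^*}) = 0$. (2-25)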

2.4 Instrumental Variables

2.4.1 Review on Instrumental Variables

In the classical regression model, the assumption that the errors are independent of

regressors is necessary in order for OLS estimators to be consistent. In many observational

studies, this orthogonality condition between the regressors and the errors is not satisfied,

so new techniques have been developed to analyze such data. One such technique is the

method of instrumental variables, which assumes the existence of an instrumental variable,

i.e., a variable that is correlated with the regressors but uncorrelated with the errors. This

technique was proposed in the econometric literature by Reiersol (1941) and has been

developed theoretically by Durbin (1954), Sargan (1958), Brundy and Jorgenson (1971),

and White (1982a), among others. Extensive treatments of this methodology are given

in econometric texts, such as Davidson and MacKinnon (1993), Matyas (1999), Hayashi

(2000), and Hall (2005).

The method of instrumental variables is widely applied to cross-sectional, panel, and

time series models, and is more generally used to make causal inference in observational

studies and errors-in-variables models. A variety of estimators have appeared in the

econometrics literature, including two-stage least squares (2SLS), instrumental variable

(IV), and two-stage instrumental variable (2SIV) estimators. All these estimators can also

be viewed as particular types of GMM estimators.

Consider now the estimation of causal effects in an observational study (Angrist et al.,

1996). If one wants to estimate a treatment effect, but the individuals exert some control

over the treatment assignment, then differences between group means are biased. An

instrumental variable is a variable that is correlated with the exposure to the treatment,

but uncorrelated with the outcome after controlling for exposure to the treatment. Thus,

the variation in the instrumental variable can be used to replace the variation in the

treatment assignment when comparing the group effects.

As an example, consider the causal effect of years of schooling on earnings (Angrist

and Krueger, 1991). Individual and institutional decisions generate a correlation between

schooling and unobserved covariates such as ability and motivation, that are related to

potential earnings. For instance, if compulsory attendance laws were extended, then

those who had planned to be in school less would continue to earn less due to unobserved

covariates such as ability, motivation, and family background. A set of instruments in this

case is a set of variables that affect schooling but not earnings, once schooling is controlled

for (i.e. included in the regression equation). Angrist and Krueger (1991) use the quarter

of an individual's birth as an instrument. They argued that students who are born earlier

in the calendar year are typically older when they enter the school than students who are

born later in the year. This pattern arises because most districts do not admit students

unless they attain age 6 by January 1. Consequently, children born earlier in the year

attain the drop-out age after attending the school for a shorter period of time than those

born later in the calendar year. Other instruments used in studying the causal effect of

schooling on earnings include siblings composition (Butcher and Case, 1994) and proximity

to a nearby college (Card, 1995).

Instrumental variables have also been successfully applied in bio-statistical research.

In a recent article, Newhouse and McClellan (1998) describe how instrumental variables

can be applied to estimate treatment effects in an observational study when a controlled

trial cannot be done. They illustrate with an application to aggressive treatment of acute

myocardial infarction in the elderly. They use instrumental variables to estimate the effect

of catheterization on mortality rate. As an instrument, they use the "differential distance"

(the additional distance, if any, beyond the distance to the nearest hospital to reach a

catheterization hospital). They argue that the differential distance has no direct effect

on myocardial infarction but it affects the likelihood of catheterization (the greater the

differential distance, the less likely it is for a patient to be admitted to a catheterization

hospital). When estimating the effect of catheterization on mortality rate, the authors

obtain a substantial reduction in magnitude when using the IV estimate instead of the OLS

estimate. In their conclusion, the authors argue in favor of the IV estimation technique in

observational studies: "The results of a well-designed observational study are useful even if

the results of a clinical trial are available."

Hogan and Lancaster (2004) apply IV in order to infer causal effects in longitudinal

repeated measurements. They review two methods for estimating causal effects in

longitudinal data, inverse of probability weighting (IPW) and instrumental variables.

They apply these methods to the HERS data, a six-year natural history study that

enrolled 871 HIV-infected women starting in 1993, in order to estimate the therapeutic

effect of highly active antiretroviral therapy regimen (HAART) on CD4 cell count, using

marginal structural modeling. In this data set, the receipt of therapy varies with time and

depends on CD4 count and other covariates. They remark that both methods rely on two

important assumptions: no unmeasured confounding for IPW and the reliability of the

instruments for IV (they must be strongly correlated with the exposure to the HAART

therapy).

Before going into further detail, we present the 2SLS estimator for the linear IV

regression model. Consider the following regression model defined by

$y_i = x_i^T\beta + \epsilon_i$, $i = 1,\dots,n$, (2-26)

where $x_i$ is a $p \times 1$ vector of explanatory random variables for the observed random

variable $y_i$, $\epsilon_i$ is the unobserved error term, $\beta$ is the regression parameter, and the $q$

instruments are included in the $q \times 1$ vector $z_i$. Some of the following assumptions from

Hall (2005, pp. 34-42) can be relaxed (White, 1982a), but we confine ourselves to these to

simplify the presentation.

Assumption 2.4.1. The vectors $v_i = (x_i^T, z_i^T, \epsilon_i)^T$, for $i = 1,\dots,n$, are iid, $\mathrm{rank}(Q_{xz}) = p$,
and $\mathrm{rank}(Q_{zz}) = q$ with $p < q$, where $Q_{xz} = E[x_i z_i^T]$ and $Q_{zz} = E[z_i z_i^T]$.
Assumption 2.4.2 (Classical assumptions about the error $\epsilon_i$). (i) $E[\epsilon_i] = 0$, (ii) $E[\epsilon_i^2] = \sigma^2$,
(iii) $\epsilon_i$ and $z_i$ are independent.
Assumption 2.4.3 (Moment conditions).

$E\left[\|(x_i^T, z_i^T, \epsilon_i)^T\|^4\right] < \infty$. (2-27)

Let $X = [x_1,\dots,x_n]^T$ be the $n \times p$ matrix of explanatory random variables, $y$ the
$n \times 1$ vector of responses, $\epsilon = (\epsilon_1,\dots,\epsilon_n)^T$ the $n \times 1$ vector of unobserved error terms,
and $Z = [z_1,\dots,z_n]^T$ the $n \times q$ matrix of instrumental random variables, which usually
includes a column of 1's. In matrix notation, the linear model (2-26) can be written as

$y = X\beta + \epsilon$, (2-28)

where $Z$ and $\epsilon$ are independent, with $E[\epsilon] = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I$, where $I$ is the identity
matrix. The 2SLS estimator of $\beta$ is given by

$\hat\beta = (X^TZ(Z^TZ)^{-1}Z^TX)^{-1}X^TZ(Z^TZ)^{-1}Z^Ty$. (2-29)

Let $P_Z = Z(Z^TZ)^{-1}Z^T$ be the projection matrix onto the column space of $Z$; then the
2SLS estimator can be written in an equivalent form as

$\hat\beta = (X^TP_ZX)^{-1}X^TP_Zy$. (2-30)

The estimator given by (2-29) is called the two-stage least squares estimator (2SLS)
because it can be obtained in a two-step least squares procedure. At the first stage, regress
the columns of $X$ on the column space of $Z$ and then, at the second stage, regress $y$ on
the column space of the fitted values of $X$ obtained from the first stage regressions. To be
more exact, for each $j \in \{1,\dots,p\}$, consider the regressions

$x^{(j)} = Z\delta_j + \gamma_j$, $j = 1,\dots,p$,

where $x^{(j)}$ is the $j$th column of $X$, $Z$ is the matrix of explanatory variables and $\gamma_j$ is the
error term. The OLS estimate of $\delta_j$ is $\hat\delta_j = (Z^TZ)^{-1}Z^Tx^{(j)}$, a $q \times 1$ vector of estimates.

The fitted values from these regressions are $\hat{x}^{(j)} = P_Z x^{(j)}$, each of dimension $n \times 1$. By

concatenating these fitted column vectors $\hat{x}^{(j)}$, we obtain the fitted values of $X$ from these

regressions,

$\hat{X} = [\hat{x}^{(1)},\dots,\hat{x}^{(p)}] = Z(Z^TZ)^{-1}Z^TX = P_ZX$,

of dimension $n \times p$. Now, at the second stage, we consider the regression model with $y$ as

the response and $\hat{X}$ as the matrix of explanatory variables

$y = \hat{X}\beta + \gamma$, (2-31)

where $\gamma$ is the error term. The OLS estimator of $\beta$ in (2-31) is

$\hat\beta = (\hat{X}^T\hat{X})^{-1}\hat{X}^Ty$

$= (X^TP_ZX)^{-1}X^TP_Zy$,

in agreement with (2-29).
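
The agreement between the closed form (2-29) and the two-stage procedure can be checked numerically. This is a minimal sketch on simulated data; the dimensions, coefficients and function names are illustrative assumptions and do not come from the text.

```python
import numpy as np

def two_sls(y, X, Z):
    """Closed form (X' P_Z X)^{-1} X' P_Z y, without forming the n x n matrix P_Z."""
    ZtX = Z.T @ X
    A = np.linalg.solve(Z.T @ Z, ZtX)       # (Z'Z)^{-1} Z'X, shape q x p
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)   # (Z'Z)^{-1} Z'y, length q
    return np.linalg.solve(ZtX.T @ A, ZtX.T @ b)

def two_sls_two_stage(y, X, Z):
    """Stage 1: regress each column of X on Z. Stage 2: regress y on the fitted X."""
    delta_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)   # q x p first-stage coefficients
    X_fit = Z @ delta_hat                                # equals P_Z X
    beta_hat, *_ = np.linalg.lstsq(X_fit, y, rcond=None)
    return beta_hat

# Illustrative data: p = 2 regressors, q = 3 instruments (all values are made up).
rng = np.random.default_rng(2)
n, p, q = 200, 2, 3
Z = rng.normal(size=(n, q))
X = Z @ rng.normal(size=(q, p)) + rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

beta_closed = two_sls(y, X, Z)
beta_two_stage = two_sls_two_stage(y, X, Z)
```

Solving linear systems instead of inverting $Z^TZ$ explicitly is a numerical choice of this sketch; the estimator is the same.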

The 2SLS estimator $\hat\beta$ is also a feasible generalized least squares estimator (Hayashi,

2000, p. 59). To see this, multiplying (2-28) by $Z^T$, we obtain

$Z^Ty = Z^TX\beta + Z^T\epsilon$. (2-32)

Since $E[\epsilon_i] = 0$ and $z_i$ and $\epsilon_i$ are independent,

$\mathrm{Cov}[\epsilon_i z_i] = E[\epsilon_i^2 z_i z_i^T] = \sigma^2 Q_{zz}$,

and the GLS estimate of $\beta$ corresponding to (2-32) is

$\tilde\beta = (X^TZQ_{zz}^{-1}Z^TX)^{-1}X^TZQ_{zz}^{-1}Z^Ty$.

Now, since $n^{-1}Z^TZ \xrightarrow{p} Q_{zz}$, a feasible GLS estimator of $\beta$ is obtained by substituting the

consistent estimator $n^{-1}Z^TZ$ for $Q_{zz}$, which again yields $\hat\beta$ as given by (2-29).

The 2SLS estimator is also a particular case of an efficient GMM estimator. For
$i = 1,\dots,n$, let $u_i = (y_i, x_i^T, z_i^T)^T$ be the observations arranged in vector form. For
$u = (y, x^T, z^T)^T \in \mathbb{R}^{1+p+q}$ and $\beta \in \mathbb{R}^p$, let $g(u, \beta) = (y - x^T\beta)z$. From Assumption 2.4.2,
for all $i = 1,\dots,n$,

$E[g(u_i, \beta)] = 0$, (2-33)

where $\beta$ is the regression parameter in (2-28). By Assumption 2.4.1, $q > p$, so the $q$
moment conditions in (2-33) define a GMM model as in (1-1). Then, for a given weight
matrix $W_n$, the GMM estimator is

$\hat\beta = \arg\min_{\beta} g_n(\beta)^T W_n g_n(\beta)$, (2-34)

where $g_n(\beta) = n^{-1}\sum_{i=1}^{n}(y_i - x_i^T\beta)z_i = n^{-1}Z^T(y - X\beta)$. Since $Q_n(\beta) = g_n(\beta)^T W_n g_n(\beta)$
is differentiable with respect to $\beta$, the GMM estimator $\hat\beta$ is a solution to $\nabla Q_n(\beta) \propto
X^TZW_nZ^T(y - X\beta) = 0$, and it is given by

$\hat\beta = (X^TZW_nZ^TX)^{-1}X^TZW_nZ^Ty$. (2-35)

In order to obtain an efficient GMM estimator, we need to choose $W_n$ to be an estimator
of the inverse of the asymptotic covariance matrix of $n^{1/2}g_n(\beta)$. Since $g_n(\beta) = n^{-1}\sum_{i=1}^{n}(y_i - x_i^T\beta)z_i =

n^{-1}\sum_{i=1}^{n}\epsilon_i z_i$, using Assumptions 2.4.1-2.4.3, we obtain $n^{-1/2}\sum_{i=1}^{n}\epsilon_i z_i \xrightarrow{d} N(0, \sigma^2 Q_{zz})$.
Since $n^{-1}Z^TZ \xrightarrow{p} Q_{zz}$, and since (2-35) is unchanged when $W_n$ is multiplied by a positive
constant, an efficient GMM estimator is obtained from (2-35) by substituting $(Z^TZ)^{-1}$ for $W_n$, i.e.

$\hat\beta = (X^TP_ZX)^{-1}X^TP_Zy$, (2-36)

the same as in (2-29).
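
The reduction of the linear GMM estimator (2-35) to 2SLS can be illustrated with a short numerical check. The sketch below uses simulated data with made-up dimensions and coefficients; `gmm_linear_iv` is a hypothetical helper name. It also shows that rescaling the weight matrix leaves the estimator unchanged, which is why $(Z^TZ)^{-1}$ can stand in for $n(Z^TZ)^{-1}$.

```python
import numpy as np

def gmm_linear_iv(y, X, Z, W):
    """Linear GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'y for a q x q weight matrix W."""
    XtZ = X.T @ Z
    return np.linalg.solve(XtZ @ W @ XtZ.T, XtZ @ W @ (Z.T @ y))

# Illustrative over-identified design: p = 1 regressor, q = 3 instruments.
rng = np.random.default_rng(4)
n, p, q = 300, 1, 3
Z = rng.normal(size=(n, q))
X = Z @ np.full((q, p), 0.5) + rng.normal(size=(n, p))
y = X @ np.array([2.0]) + rng.normal(size=n)

W_2sls = np.linalg.inv(Z.T @ Z)
beta_gmm = gmm_linear_iv(y, X, Z, W_2sls)
beta_gmm_scaled = gmm_linear_iv(y, X, Z, n * W_2sls)   # same estimator, rescaled weights

# Direct 2SLS via the projection of X on the column space of Z.
X_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(X_fit.T @ X, X_fit.T @ y)

# A different weight matrix (identity) generally gives a different estimate.
beta_identity = gmm_linear_iv(y, X, Z, np.eye(q))
```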
Finally, if Assumptions 2.4.1-2.4.3 are fulfilled, then the 2SLS estimator is
asymptotically normally distributed (Hall, 2005), and in particular

$n^{1/2}(\hat\beta - \beta) \xrightarrow{d} N\left(0, \sigma^2(Q_{xz}Q_{zz}^{-1}Q_{xz}^T)^{-1}\right)$.

2.4.2 Bootstrapping 2SLS Estimators

Consistency results and bootstrap procedures for regression models have been

developed in Freedman (1981), Shorack (1981), Freedman (1984), and Wu (1986).

Freedman (1981) identifies two main bootstrap procedures for linear models: bootstrapping

residuals and bootstrapping cases. Consider the following linear model

$y_i = x_i^T\beta + \epsilon_i$, (2-37)

where $y_i$ and $x_i$ are the response and the explanatory variables, respectively, $\beta$ is the

regression parameter, and $\epsilon_i$ is the error term. The observed data are in the form

$\mathcal{X} = \{(x_1, y_1),\dots,(x_n, y_n)\}$. In regression models, the $\epsilon_i$'s are taken to be independent, zero
mean, and usually identically distributed. In this case, one usually first fits the model to

the data and then the residuals are resampled and added to the fitted values to obtain

the bootstrap sample. More precisely, let $\hat\epsilon_i = y_i - \hat{y}_i$ be the residuals from the regression

fit. Then, we first resample $n$ iid draws from $\hat{\mathcal{E}} = \{\hat\epsilon_1,\dots,\hat\epsilon_n\}$, obtaining the bootstrap

residual resample $\hat{\mathcal{E}}^* = \{\epsilon_1^*,\dots,\epsilon_n^*\}$. Finally, for each $i$, define

$y_i^* = x_i^T\hat\beta + \epsilon_i^*$. (2-38)

In the regression model, a typical bootstrap resample is then $\mathcal{X}^* = \{(y_i^*, x_i) : i = 1,\dots,n\}$,

so that the bootstrap estimate $\hat\beta^*$ is defined as the version of $\hat\beta$ on $\mathcal{X}^*$. This is

called resampling residuals.

When $\{(y_i, x_i) : i = 1,\dots,n\}$ are iid pairs, the linear model (2-37) is

called the correlation model, using the same terminology as Hall (1992, p. 170). In this

case, the pairs $(y_i^*, x_i^*)$ are resampled randomly from $\mathcal{X}$. Then, the bootstrap estimate $\hat\beta^*$

is defined as the version of $\hat\beta$ on $\mathcal{X}^*$.
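
The two resampling schemes described above, resampling residuals as in (2-38) and resampling cases in the correlation model, can be sketched side by side for an ordinary least squares fit. The data, the sample size and the function names below are illustrative assumptions; the residuals are used as-is, without recentering.

```python
import numpy as np

def ols(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def residual_bootstrap(y, X, n_boot, rng):
    """Keep the design fixed; resample residuals and add them to the fitted values."""
    n = len(y)
    fitted = X @ ols(y, X)
    resid = y - fitted
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        e_star = resid[rng.integers(0, n, size=n)]
        out[b] = ols(fitted + e_star, X)      # y* = X beta_hat + e*
    return out

def pairs_bootstrap(y, X, n_boot, rng):
    """Resample the cases (y_i, x_i) jointly, as in the correlation model."""
    n = len(y)
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = ols(y[idx], X[idx])
    return out

# Illustrative data with an intercept column (values are made up).
rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat = ols(y, X)
beta_star_resid = residual_bootstrap(y, X, 500, rng)
beta_star_pairs = pairs_bootstrap(y, X, 500, rng)
```

Because this example includes an intercept, the OLS residuals already sum to zero, so the centering question discussed in the text does not arise here.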

In the case of the zero-intercept regression model, Freedman (1981) and Shorack

(1981) argue that residuals need to be first re-centered before resampling, in order to

obtain consistent bootstrap estimates. Freedman remarks that "..., without centering,

the bootstrap will usually fail." We will prove that this depends on the underlying

assumptions on the data generating mechanism. The main reason is that in the case of

bootstrapping residuals, the (uniform) bootstrap introduces a bias in the estimates that

sometimes does not vanish asymptotically. (This issue does not arise in ordinary least

squares regression with an intercept.) We do not include the proof of the next theorem in

the appendix since it is similar to the following proofs regarding consistency results for

bootstrapping 2SLS estimators, later in this section.

Theorem 2.4.1. Suppose that the vectors $v_i = (x_i, \epsilon_i)$ are iid, with $\epsilon_i$ and $x_i$ independent.

Moreover, suppose that $E\|v_i\|^4 < \infty$ and let $\sigma^2 = \mathrm{Var}[\epsilon_i]$. Then

(i) If $E x_i = 0$, then the bootstrap is (weakly) consistent: $\sqrt{n}(\hat\beta^* - \hat\beta) \xrightarrow{d} N(0, \sigma^2\Sigma^{-1})$;

(ii) If $E x_i = \mu \neq 0$, then the following (conditional) weak convergence does not hold:

$\sqrt{n}(\hat\beta^* - \hat\beta) \xrightarrow{d} N(0, \sigma^2\Sigma^{-1})$.

In the case of the 2SLS estimator, in order for the ordinary uniform bootstrap to be

consistent, Freedman (1984, p. 834) argues that the residuals first need to be recentered,

since "As data, the residuals are not orthogonal to the instruments." This statement is

potentially misleading and might lead a practitioner to recenter the residuals unnecessarily.

We will show in this section that under general conditions, the uniform bootstrap estimate

of the distribution of 2SLS estimator is indeed weakly consistent without the need for

re-centering. The main idea here is that it can be shown that bootstrapping the data in

2SLS regression model is equivalent to bootstrapping cases, and bootstrapping cases in

2SLS model is equivalent to bootstrapping efficient GMM estimators, in the GMM model

associated with the regression model. We have shown in the appendix that the (uniform)

bootstrap is consistent for the sampling distribution of GMM estimators. Although Hahn

(1996) also proved the result of consistency for GMM estimators, he does not make the

connection between Freedman's resampling procedure with uncentered residuals and

bootstrapping the GMM estimators.

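To be more specific, let the residuals from the 2SLS fit be given by

$\hat\epsilon_i = y_i - x_i^T\hat\beta$, (2-39)

where $x_i^T$ is the $i$-th row of the matrix $X$. Generally, the $\hat\epsilon_i$'s are not orthogonal to the

instruments, i.e.,

$\sum_{i=1}^{n} \hat\epsilon_i z_i \neq 0$. (2-40)

Freedman defines the recentered residuals as the part of the residuals that is orthogonal to

the column space of $Z$ (the instruments), i.e.

$\tilde\epsilon_i = \hat\epsilon_i - z_i^T(Z^TZ)^{-1}Z^T\hat\epsilon$.

In order to preserve the dependence structure within $v_i = (x_i, z_i, \epsilon_i)$, Freedman resamples

from $\tilde{\mathcal{X}} = \{(x_i^T, z_i^T, \tilde\epsilon_i), i = 1,\dots,n\}$. Let $\tilde{\mathcal{X}}^* = \{(x_1^*, z_1^*, \tilde\epsilon_1^*),\dots,(x_n^*, z_n^*, \tilde\epsilon_n^*)\}$ be a

generic uniform resample from $\tilde{\mathcal{X}}$. Denote by $X^*$, $Z^*$, $P_Z^*$, and $\tilde\epsilon^*$ the bootstrap versions

of $X$, $Z$, $P_Z$, and $\tilde\epsilon$ on $\tilde{\mathcal{X}}^*$. Define the bootstrap observations $y^*$ by

$y^* = X^*\hat\beta + \tilde\epsilon^*$. (2-41)

The analog of $\hat\beta$ for the bootstrap resample $\tilde{\mathcal{X}}^*$ is given by

$\hat\beta^* = (X^{*T}P_Z^*X^*)^{-1}X^{*T}P_Z^*y^*$. (2-42)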

We now show that recentering is not necessary when bootstrapping 2SLS estimators.

Let $\hat{\mathcal{X}} = \{(x_1, z_1, \hat\epsilon_1),\dots,(x_n, z_n, \hat\epsilon_n)\}$ and let $\hat{\mathcal{X}}^* = \{(x_1^*, z_1^*, \hat\epsilon_1^*),\dots,(x_n^*, z_n^*, \hat\epsilon_n^*)\}$ be a

(uniform) bootstrap resample from $\hat{\mathcal{X}}$, i.e., $(x_1^*, z_1^*, \hat\epsilon_1^*),\dots,(x_n^*, z_n^*, \hat\epsilon_n^*)$ are (conditionally)

iid, (discrete) uniformly distributed on $\hat{\mathcal{X}}$. Denote by $X^*$, $Z^*$, $P_Z^*$, and $\hat\epsilon^*$ the bootstrap

versions of $X$, $Z$, $P_Z$, and $\hat\epsilon$ on $\hat{\mathcal{X}}^*$. Define the bootstrap observations $y^*$ and $\hat\beta^*$, the

bootstrap analog of $\hat\beta$ for the bootstrap resample, as before. It is an easy exercise to

show that resampling the $(x_i, z_i, \hat\epsilon_i)$'s with equal probabilities is equivalent to resampling

the cases $(y_i, x_i, z_i)$ with equal probabilities. Therefore, since 2SLS estimators are particular

cases of efficient GMM estimators (as in (2-36)), and using the result that the uniform

bootstrap is consistent for the distribution of the GMM estimators, we obtain that the uniform

bootstrap gives consistent estimates of the distribution of 2SLS estimators, without recentering the

residuals.
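
The equivalence claimed above can be verified directly: since $y_i = x_i^T\hat\beta + \hat\epsilon_i$ holds exactly for the uncentered 2SLS residuals, rebuilding $y^*$ from a resampled triple reproduces the resampled response. The sketch below checks this on simulated data and also computes Freedman's recentered residuals; the data-generating values and the helper name are illustrative assumptions.

```python
import numpy as np

def two_sls(y, X, Z):
    X_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # P_Z X
    return np.linalg.solve(X_fit.T @ X, X_fit.T @ y)

# Illustrative over-identified model: one regressor, two instruments (made-up values).
rng = np.random.default_rng(3)
n = 50
Z = rng.normal(size=(n, 2))
X = (Z @ np.array([0.8, 0.8]) + rng.normal(size=n))[:, None]
y = X @ np.array([1.5]) + rng.normal(size=n)

beta_hat = two_sls(y, X, Z)
resid = y - X @ beta_hat                    # uncentered 2SLS residuals, as in (2-39)

# Resample triples (x_i, z_i, resid_i) and rebuild the response as in (2-41).
idx = rng.integers(0, n, size=n)
y_star = X[idx] @ beta_hat + resid[idx]
same_as_cases = np.allclose(y_star, y[idx])  # identical to resampling (y_i, x_i, z_i)

# Freedman's recentered residuals: remove the projection onto the instruments.
resid_centered = resid - Z @ np.linalg.solve(Z.T @ Z, Z.T @ resid)
moment_uncentered = Z.T @ resid             # generally nonzero when q > p, see (2-40)
moment_centered = Z.T @ resid_centered      # zero up to rounding error
```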

The following consistency results can be generalized to allow for heteroskedasticity of

the error terms, but we limit the proof to these simpler hypotheses (White, 1982a).

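Theorem 2.4.2. Under Assumptions 2.4.1-2.4.3, for every $x \in \mathbb{R}^q$,

$P^*\left(n^{-1/2}\left(\sum_{i=1}^{n} z_i^*\hat\epsilon_i^* - \sum_{i=1}^{n} z_i\hat\epsilon_i\right) \le x\right) \to \Phi_{0,\sigma^2Q_{zz}}(x)$, (2-43)

where $\Phi_{0,\sigma^2Q_{zz}}(x)$ is the joint cumulative distribution function of a multivariate normal

vector with mean 0 and covariance matrix $\sigma^2Q_{zz}$.

Theorem 2.4.3. If Assumptions 2.4.1-2.4.3 are fulfilled, then the bootstrap estimate of

the distribution of the 2SLS estimator is weakly consistent, i.e.

$n^{1/2}(\hat\beta^* - \hat\beta) \xrightarrow{d} N\left(0, \sigma^2(Q_{xz}Q_{zz}^{-1}Q_{xz}^T)^{-1}\right)$. (2-44)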

2.4.3 Simulations

We present some simulation studies to compare the coverage performances of

confidence intervals based on different bootstrap procedures developed in this chapter.

We consider the following regression model

$y_i = a x_i + \epsilon_{1i}$, $i = 1,\dots,n$, (2-45)

where the (random) explanatory variables are given by

$x_i = \lambda\epsilon_{1i} + \epsilon_{2i}$, (2-46)

and the instruments $z_{1i}$ and $z_{2i}$ are generated by

$z_{1i} = \gamma\epsilon_{2i} + \epsilon_{3i}$, (2-47)

$z_{2i} = \gamma\epsilon_{2i} + \epsilon_{4i}$, (2-48)

and $\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i}, \epsilon_{4i}$ are iid, $(\chi^2_1 - 1)$ distributed. Hence, the $\epsilon_{ji}$'s have mean 0 and variance 2.

Let $x = (x_1,\dots,x_n)^T$ be the $n \times 1$ vector of explanatory variables, $y$ the $n \times 1$ vector of

responses, $\epsilon = (\epsilon_{11},\dots,\epsilon_{1n})^T$ the $n \times 1$ vector of unobserved error terms, and $z_1 = (z_{11},\dots,z_{1n})^T$

and $z_2 = (z_{21},\dots,z_{2n})^T$ the $n \times 1$ vectors of instrumental variables. In matrix notation,

the model given by (2-45)-(2-48) can be written in a more compact form as

$y = ax + \epsilon$, (2-49)

with $z_1$ and $z_2$ as instrumental variables. We can easily see that $z_1$ and $z_2$ are valid

instruments, in the sense that they are uncorrelated with the error term $\epsilon$ ($\mathrm{Cov}(z_{ji}, \epsilon_{1i}) = 0$)

and correlated with the endogenous variable $x$ ($\mathrm{Cov}(z_{ji}, x_i) = 2\gamma$). Moreover,

$\mathrm{Cov}(x_i, \epsilon_{1i}) = 2\lambda$, so that the classical regression model assumptions are not fulfilled.

As a consequence, the OLS estimate of $a$ is inconsistent in this case.
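
A direct simulation of the design (2-45)-(2-48) makes the inconsistency of OLS visible. In the sketch below, $\gamma = 0.8$ follows the table captions, while the values of $a$ and $\lambda$, the sample size and the function names are assumptions chosen only for illustration. With errors of variance 2, the no-intercept OLS slope converges to $a + \lambda/(\lambda^2 + 1)$, whereas 2SLS converges to $a$.

```python
import numpy as np

def simulate_iv(n, a, lam, gam, rng):
    """Draw (y, x, Z) from the simulation design with centered chi-square(1) errors."""
    eps = rng.chisquare(df=1, size=(n, 4)) - 1.0    # columns eps1..eps4: mean 0, variance 2
    x = lam * eps[:, 0] + eps[:, 1]
    y = a * x + eps[:, 0]
    Z = np.column_stack([gam * eps[:, 1] + eps[:, 2],
                         gam * eps[:, 1] + eps[:, 3]])
    return y, x, Z

def ols_slope(y, x):
    """No-intercept least squares slope."""
    return float(x @ y / (x @ x))

def two_sls_slope(y, x, Z):
    """2SLS with a single regressor: (x' P_Z x)^{-1} x' P_Z y."""
    x_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)
    return float(x_fit @ y / (x_fit @ x))

rng = np.random.default_rng(6)
a_true, lam, gam = 1.0, 1.0, 0.8   # a_true and lam are hypothetical; gam as in the captions
y, x, Z = simulate_iv(200_000, a_true, lam, gam, rng)

a_ols = ols_slope(y, x)            # close to a_true + lam / (lam**2 + 1)
a_2sls = two_sls_slope(y, x, Z)    # close to a_true
```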

In this simulation study, we compare the coverage performances of confidence

intervals based on biased bootstrap, double biased bootstrap, biased bootstrap recycling,

centered residual bootstrap (Freedman, 1984), uncentered residual bootstrap, centered

double bootstrap, and uncentered residual double bootstrap.

In order to define the biased bootstrap confidence intervals, we consider the

instrumental variable regression model in the GMM framework, as described in subsection 2.4.1.

For $i = 1,\dots,n$, let $u_i = (y_i, x_i, z_i^T)^T$ be the observations arranged in vector form. For

$u = (y, x, z^T)^T \in \mathbb{R}^4$ and $a \in \mathbb{R}$, let $g(u, a) = (y - ax)z$. By assumption,

$E[g(u_i, a)] = 0$, $i = 1,\dots,n$, (2-50)

where $a$ is the regression parameter. It is worth noting that $n^{-1}\sum_{i=1}^{n}(y_i - ax_i)z_i = 0$ generally does

not have a solution, since the dimension of the system of equations (2 in this case) is larger

than the dimension of the parameter (1 in this case). Thus, we are in the GMM setup.

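Let $\mathcal{U} = \{u_i : i = 1,\dots,n\}$ be the sample of observations arranged in vector form,

and let $\mathcal{F} = \{F_a : a \in \mathbb{R}\}$ be the least favorable family of weighted empirical distributions

on the sample $\mathcal{U}$ defined in Section 2.3 associated with the implied GMM model defined

by (2-50). Let $\hat{a}$ be the 2SLS estimator of $a$. We apply the bootstrap technique described

in subsection 2.3.2 to construct bootstrap confidence intervals. For an $\alpha$ level upper

confidence interval for $a$, we need to find $t_{1-\alpha}$, the solution of the "population equation"

$P(a \le \hat{a} - t_{1-\alpha}) - \alpha = 0$. (2-51)

We rewrite the left side of this equation as follows

$P(a \le \hat{a} - t_{1-\alpha}) - \alpha = P(\hat{a} - a \ge t_{1-\alpha}) - \alpha$

$= 1 - \alpha - P(\hat{a} - a < t_{1-\alpha})$

$= 1 - \alpha - E[I(\hat{a} - a < t_{1-\alpha})]$. (2-52)

Hence, $t_{1-\alpha}$ is the solution of the "population equation"

$E[I(\hat{a} - a < t_{1-\alpha})] = 1 - \alpha$. (2-53)

Let $\hat{t}_{1-\alpha}$ be the bootstrap estimate of $t_{1-\alpha}$, i.e. the solution of the "sample equation"

$E_{\hat{a}}[I(\hat{a}^* - \hat{a} < \hat{t}_{1-\alpha})] = 1 - \alpha$. (2-54)

The biased bootstrap upper confidence interval with nominal coverage $\alpha$ is then $J_{bb} =

(-\infty, \hat{a} - \hat{t}_{1-\alpha})$. Since typically

$P(a \le \hat{a} - \hat{t}_{1-\alpha}) \neq \alpha$, (2-55)

we can look for a (further) correction of the coverage level $\alpha$. Let $q(\alpha)$ be the solution of

$P(a \le \hat{a} - \hat{t}_{q(\alpha)}) = \alpha$. (2-56)

Let $\hat{q}(\alpha)$ be the bootstrap estimator of $q(\alpha)$, so that $\hat{q}(\alpha)$ satisfies

$P_{\hat{a}}(\hat{a} \le \hat{a}^* - \hat{t}^*_{\hat{q}(\alpha)}) = \alpha$, (2-57)

where $\hat{t}^*_q$ satisfies $P(\hat{a}^{**} - \hat{a}^* < \hat{t}^*_q \mid F_{\hat{a}^*}) = q$. Since $\hat{q}(\alpha)$ satisfies $E_{\hat{a}}[I(\hat{t}^*_{\hat{q}(\alpha)} \le \hat{a}^* - \hat{a})] = \alpha$,

and

$\hat{t}^*_{\hat{q}(\alpha)} \le \hat{a}^* - \hat{a} \iff E_{\hat{a}^*}[I(\hat{a}^{**} - \hat{a}^* < \hat{a}^* - \hat{a})] \ge \hat{q}(\alpha)$, (2-58)

it follows that $\hat{q}(\alpha)$ satisfies

$E_{\hat{a}}\left[I\left\{E_{\hat{a}^*}[I(\hat{a}^{**} - \hat{a}^* < \hat{a}^* - \hat{a})] \ge \hat{q}(\alpha)\right\}\right] = \alpha$. (2-59)

Consequently, $\hat{q}(\alpha)$ is the $(1 - \alpha)$th quantile of $E_{\hat{a}^*}[I(\hat{a}^{**} - \hat{a}^* < \hat{a}^* - \hat{a})]$.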

We now describe the Monte Carlo implementation of this bootstrap procedure.

For each biased bootstrap resample $\mathcal{X}_b^*$, $b = 1,\dots,B$, compute $\hat{a}_b^*$ and draw $C$ biased

bootstrap re-resamples $\mathcal{X}_{bc}^{**}$, $c = 1,\dots,C$. For each $c$, let $\hat{a}_{bc}^{**}$ be the bootstrap version of $\hat{a}$

on $\mathcal{X}_{bc}^{**}$. For every $b = 1,\dots,B$, let

$z_b^* = \frac{1}{C}\sum_{c=1}^{C} I(\hat{a}_{bc}^{**} - \hat{a}_b^* < \hat{a}_b^* - \hat{a})$. (2-60)

The Monte Carlo approximation of $\hat{q}(\alpha)$ is the $(1 - \alpha)$th quantile of $z_1^*,\dots,z_B^*$. Then,

compute $\hat{t}_{\hat{q}(\alpha)}$, the $\hat{q}(\alpha)$th quantile of $\hat{a}_b^* - \hat{a}$, $b = 1,\dots,B$. Hence, the $\alpha$th upper bootstrap

(coverage) calibrated confidence interval for $a$ is given by

$J_{gbb} = (-\infty, \hat{a} - \hat{t}_{\hat{q}(\alpha)})$. (2-61)

The same procedure is used for the (ordinary) double bootstrap, with the only

difference that the resamples are taken from uniform empirical distributions on the

samples. Hence $J_{gub}$, the Monte Carlo approximation of the double bootstrap (coverage)

calibrated upper confidence interval for $a$, can be found in a similar way.
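
The Monte Carlo steps around (2-60) and (2-61) can be sketched as a nested loop. To keep the example short and self-contained, the sketch below uses the ordinary (uniform) double bootstrap and the sample mean as the estimator, both of which are simplifying assumptions; the biased bootstrap version would replace the uniform draws by draws from the weighted empirical distributions. The function name and the values of $B$ and $C$ are hypothetical.

```python
import numpy as np

def double_bootstrap_upper_limit(x, alpha, B, C, rng, statistic=np.mean):
    """Coverage-calibrated upper confidence limit via nested resampling.

    Returns (uncalibrated limit, calibrated limit, calibrated level q_hat).
    """
    n = len(x)
    a_hat = statistic(x)
    a_star = np.empty(B)
    z = np.empty(B)
    for b in range(B):
        xb = x[rng.integers(0, n, size=n)]                  # first-level resample
        a_star[b] = statistic(xb)
        inner = np.array([statistic(xb[rng.integers(0, n, size=n)])
                          for _ in range(C)])               # second-level estimates
        # Proportion of re-resamples with a** - a*_b < a*_b - a_hat, as in (2-60).
        z[b] = np.mean(inner - a_star[b] < a_star[b] - a_hat)
    q_hat = float(np.quantile(z, 1.0 - alpha))              # (1 - alpha)th quantile of the z_b
    t_plain = np.quantile(a_star - a_hat, 1.0 - alpha)      # single-level quantile
    t_cal = np.quantile(a_star - a_hat, q_hat)              # q_hat-th quantile of a*_b - a_hat
    return float(a_hat - t_plain), float(a_hat - t_cal), q_hat

rng = np.random.default_rng(7)
x = rng.normal(loc=1.0, size=30)
upper_plain, upper_cal, q_hat = double_bootstrap_upper_limit(x, alpha=0.9, B=200, C=100, rng=rng)
```

The cost is $B \times C$ evaluations of the estimator, which is the burden that the recycling scheme is designed to avoid.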

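We now describe how recycling can be applied to this problem. Instead of redrawing

the second level bootstrap re-resamples $\mathcal{X}_{bc}^{**}$, $c = 1,\dots,C$, in order to estimate

$E_{\hat{a}_b^*}\left(I(\hat{a}^{**} - \hat{a}_b^* < \hat{a}_b^* - \hat{a})\right)$,

corresponding to the $b$th resample $\mathcal{X}_b^*$, we use the importance sampling (change of

measure) identity:

$z_b^* = E_{\hat{a}_b^*}\left(I(\hat{a}^{**} - \hat{a}_b^* < \hat{a}_b^* - \hat{a})\right) = E_{\hat{a}}\left(I(\hat{a}^* - \hat{a}_b^* < \hat{a}_b^* - \hat{a})\frac{dP_{\hat{a}_b^*}}{dP_{\hat{a}}}\right)$. (2-62)

Let $w_{bb'} = \prod_{i=1}^{n}\left(\frac{p_i(\hat{a}_b^*)}{p_i(\hat{a})}\right)^{m_{b'i}}$, where $m_{b'i} = \#\{j : X_{b'j}^* = X_i\}$. Hence,

$z_b^* = \frac{1}{B}\sum_{b'=1}^{B} I(\hat{a}_{b'}^* - \hat{a}_b^* < \hat{a}_b^* - \hat{a})\,w_{bb'}$. (2-63)

The Monte Carlo approximation of $\hat{q}(\alpha)$ is the $(1 - \alpha)$th quantile of $z_1^*,\dots,z_B^*$, and we

then compute $\hat{t}_{\hat{q}(\alpha)}$, the $\hat{q}(\alpha)$th quantile of $\hat{a}_b^* - \hat{a}$, $b = 1,\dots,B$. Hence, the $\alpha$th upper

bootstrap (coverage) calibrated confidence interval for $a$ is given by

$J_{rbb} = (-\infty, \hat{a} - \hat{t}_{\hat{q}(\alpha)})$. (2-64)

Let $\kappa_{bb'} = w_{bb'} / \sum_{b'=1}^{B} w_{bb'}$ be the normalized weights. Then, the (adjusted) Monte Carlo estimates
in (2-63) are:

$z_b^* = \sum_{b'=1}^{B} I(\hat{a}_{b'}^* - \hat{a}_b^* < \hat{a}_b^* - \hat{a})\,\kappa_{bb'}$. (2-65)

We denote by $J'_{rbb}$ the (adjusted) recycling biased bootstrap confidence interval.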
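
Given the first-level estimates and a matrix of log-weights, the adjusted estimates (2-65) reduce to a weighted average of indicators. The sketch below assumes the log-weights $\log w_{bb'}$ have already been computed from the family of weighted empirical distributions, which is not shown; the function name and the toy inputs are hypothetical. The log-weights are shifted by their row maximum before exponentiating, a numerical safeguard that cancels out in the normalization.

```python
import numpy as np

def recycled_z(a_star, a_hat, log_w):
    """Adjusted recycling estimates with normalized weights, as in (2-65).

    a_star : length-B array of first-level bootstrap estimates
    a_hat  : estimate on the original sample
    log_w  : B x B array, log_w[b, b2] = log w_{b b2}
    """
    a_star = np.asarray(a_star, dtype=float)
    log_w = np.asarray(log_w, dtype=float)
    # indicator[b, b2] = I(a*_{b2} - a*_b < a*_b - a_hat)
    indicator = (a_star[None, :] - a_star[:, None]) < (a_star[:, None] - a_hat)
    shifted = log_w - log_w.max(axis=1, keepdims=True)   # avoid overflow and underflow
    kappa = np.exp(shifted)
    kappa /= kappa.sum(axis=1, keepdims=True)            # each row of weights sums to 1
    return np.sum(indicator * kappa, axis=1)

# Toy check with equal weights: z_b is then a plain proportion.
z_equal = recycled_z([0.0, 1.0, 2.0], a_hat=1.0, log_w=np.zeros((3, 3)))
```

With normalized weights every $z_b^*$ lies in $[0, 1]$, which is not guaranteed for the unnormalized version (2-63).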

Tables 2-1 and 2-2 show the Monte Carlo estimates of the coverage probabilities

corresponding to different bootstrap confidence intervals: GMM biased bootstrap (GBB),

GMM uniform bootstrap (GUB), and Freedman's centered residuals (FCR). We have

also included the coverage of confidence intervals based on the asymptotic approximation

(ASY). The coverage probabilities were estimated by the proportion of nominal level

$\alpha$ upper confidence intervals containing the parameter $a$, using 1000 simulation runs

and 1000 bootstrap resamples, for $\alpha$ = .9, .95 and for sample sizes $n$ =

20, 40, 60, 80, 100, 200, 300. From these tables we remark that all three resampling

procedures perform similarly, for both types of bootstrap confidence intervals: percentile

and percentile-t. The bootstrap percentile-t confidence intervals perform slightly better

than the bootstrap percentile confidence intervals for all resampling procedures. The

confidence intervals based on the asymptotic approximation perform quite poorly in this

case even for relatively large sample sizes.

Figures 2-1, 2-2, and 2-3 show the coverage errors for different bootstrap percentile

confidence intervals, using different bootstrapping procedures for different nominal sizes

$\alpha$ = .8, .85, .9, .95, .99. The coverage probabilities were examined using 500 simulation

runs and 500 outer level resamples. From these plots we can conclude that, except for the

sample size $n$ = 20, when all bootstrap procedures perform similarly, for all other

sample sizes $n$ = 40, 60, 80, 100, 200 the biased bootstrap recycling confidence intervals

outperform all other bootstrap percentile confidence intervals.

Table 2-1. Estimated coverage probabilities for bootstrap one-sided, upper confidence
intervals at the $\alpha$ = .90 nominal level, $\gamma$ = .8, B = 1000, S = 1000: GMM biased
bootstrap (GBB), GMM uniform bootstrap (GUB), centered residuals (FCR),
and based on asymptotic approximation (ASY)

             Percentile                Percentile-t
  n      GBB     GUB     FCR       GBB     GUB     FCR       ASY
  20    0.912   0.928   0.915     0.825   0.831   0.816     0.998
  40    0.964   0.972   0.969     0.858   0.862   0.854     0.989
  60    0.968   0.973   0.976     0.861   0.869   0.861     0.989
  80    0.967   0.968   0.966     0.874   0.878   0.876     0.97
 100    0.963   0.969   0.967     0.865   0.874   0.869     0.961
 200    0.944   0.946   0.944     0.88    0.881   0.875     0.951
 300    0.932   0.93    0.929     0.877   0.879   0.878     0.95

Table 2-2. Estimated coverage probabilities for bootstrap one-sided, upper confidence
intervals at the $\alpha$ = .95 nominal level, $\gamma$ = .8, B = 1000, S = 1000: GMM biased
bootstrap (GBB), GMM uniform bootstrap (GUB), centered residuals (FCR),
and based on asymptotic approximation (ASY)
[Table values not recovered.]

Figure 2-1. Estimated coverage errors corresponding to different bootstrap confidence
intervals, at different $\alpha$ levels, sample size n = 40: ○ GMM Biased Bootstrap
(GBB), □ GMM uniform bootstrap, ◇ GMM Recycling Biased Bootstrap
(RBB), △ centered residuals (FCR), ▽ based on asymptotic approximation
(ASY). Horizontal axis: Nominal Level.

Figure 2-2. Estimated coverage errors corresponding to different bootstrap confidence
intervals, at different $\alpha$ levels, sample size n = [illegible]: ○ GMM Biased Bootstrap
(GBB), □ GMM uniform bootstrap, ◇ GMM Recycling Biased Bootstrap
(RBB), △ centered residuals (FCR), ▽ based on asymptotic approximation
(ASY). Horizontal axis: Nominal Level.

Figure 2-3. Estimated coverage errors corresponding to different bootstrap confidence
intervals, at different $\alpha$ levels, sample size n = 100: ○ GMM Biased
Bootstrap (GBB), □ GMM uniform bootstrap, ◇ GMM Recycling Biased
Bootstrap (RBB), △ centered residuals (FCR), ▽ based on asymptotic
approximation (ASY). Horizontal axis: Nominal Level.

CHAPTER 3
THE BLOCK BIASED BOOTSTRAP FOR TIME SERIES

3.1 Review on Bootstrap for Time Series

There are essentially two different ways to bootstrap dependent data. One is

model based, when the residuals from the model fit are assumed to be approximately

independent. In this case, the bootstrap samples are obtained by first resampling

appropriately the residuals from the model fit; the bootstrap resamples are then recovered

using the model structure with the estimated parameters. See for example Freedman

and Peters (1984), Freedman (1984), Efron and Tibshirani (1986), and Bose (1981).

Model free bootstrap procedures have been proposed in a series of papers by Hall (1985),

Carlstein (1986), and Künsch (1989). These are based on blocking arguments, in which

the data are first divided in blocks of consecutive observations, and the blocks instead of

the observations are resampled in order to obtain the bootstrap resamples. As a result, the

dependence structure within the time series is preserved within each block.

There are different types of blocking, including overlapping and nonoverlapping

blocks. These in turn give rise to different block bootstrap procedures, such as the

nonoverlapping block bootstrap (Carlstein, 1986), the moving block bootstrap (Künsch,

1989), the circular block bootstrap (Politis and Romano, 1992), and the stationary

bootstrap (Politis and Romano, 1994). All these block bootstrap methods are particular

cases of the generalized block bootstrap, as defined in Lahiri (2003, pp. 31-33). By

approximating general stationary time series with families of (increasingly more complex)

parametric models, sieve bootstraps have also been developed in a series of papers by

Kreiss (1992), Bühlmann (1997), and Choi and Hall (2000).

Instead of resampling the blocks with equal probabilities, we propose a new procedure

to resample the blocks from a weighted empirical distribution on the sample of blocks

which satisfies the sample moment conditions. The weights are obtained by minimizing the

Cressie-Read distance to the empirical distribution under the constraint that it satisfies

the moment conditions at the M-estimator. This resampling procedure can be applied

successfully also for the GMM model, and consistency proofs can be easily extended in

this case for bootstrapping GMM and GEL estimators and the test of over-identifying

restrictions. Moreover, as in the iid case, by applying a recycling algorithm (Section 2.3.2),

we can reuse the first level block bootstrap resamples when iterating the block biased

bootstrap in order to estimate higher level parameters.

3.2 The Block Biased Bootstrap for Generalized M-Estimators

In this section, we propose a different method to bootstrap generalized M-estimators

for time series, based on the biased-bootstrap. Brown and Newey (2002) applied the

biased-bootstrap for GMM model with iid observations. They anticipated that a similar

bootstrap procedure for dependent data would be possible.

Let $\mathcal{X} = \{X_1,\dots,X_n\}$ be a realization of a d-dimensional stationary time series. Denote by $F$ the common distribution of the $X_i$'s, and by $F_n = (1/n)\sum_{i=1}^n \delta_{X_i}$ the empirical distribution of the $X_i$'s. Suppose that the parameter of interest $\theta_0 \in \Theta \subset \mathbb{R}^p$ is defined implicitly as the unique solution to the "population equation"

$$E_F[b_{\theta_0}] \equiv E[b(X_1;\theta_0)] = 0, \tag{3-1}$$

for some $b : \mathbb{R}^d \times \Theta \to \mathbb{R}^p$. Bustos (1982) introduced the generalized M-estimator $\hat\theta$ of $\theta_0$ as a solution to the "empirical equation"

$$E_{F_n}[b_\theta] \equiv \frac{1}{n}\sum_{i=1}^n b(X_i;\theta) = 0. \tag{3-2}$$

This class of estimators includes the (pseudo) maximum likelihood estimators (MLEs) and

certain robust statistics in classical parametric time series. We now describe the moving

block bootstrap (though the methodology can be extended for non-overlapping blocks

as well). Let $B_i = \{X_i,\dots,X_{i+l-1}\}$ denote the block of length $l$ starting at $X_i$, with $i = 1,\dots,N$, where $N = n - l + 1$ is the total number of blocks, and let $\mathcal{B} = \{B_1,\dots,B_N\}$ be the sample of moving blocks. We will suppose throughout that the block length $l \equiv l(n)$ satisfies $l \to \infty$ and $l/n \to 0$ as $n \to \infty$.

Let $B_1^*,\dots,B_{b_l}^*$ denote a simple random sample drawn from $\mathcal{B}$, where $b_l = [n/l]$, i.e. $b_l$ is the integer part of the ratio between the sample size $n$ and the block size $l$ (though other values are also possible). Since each resampled block contains $l$ elements, in total we resample $n_l = l\,b_l$ elements. If we denote the elements of block $B_i^*$ by $X^*_{(i-1)l+1},\dots,X^*_{il}$, then $\mathcal{X}^* = \{X_1^*,\dots,X_{n_l}^*\}$ is a block bootstrap resample of size $n_l$. Generally, the bootstrap version of $\hat\theta$ and its centered and scaled version $T_n = n^{1/2}(\hat\theta - \theta_0)$ are defined in one of two ways.

One way is to define the bootstrap version $\theta^*$ of $\hat\theta$ as the solution of

$$\frac{1}{n_l}\sum_{i=1}^{n_l} b(X_i^*;\theta^*) = 0, \tag{3-3}$$

and then the bootstrap version of $T_n$ is $T_n^* = n_l^{1/2}(\theta^* - \tilde\theta)$, where the centering value $\tilde\theta$ is defined as the solution to

$$\tilde b(\tilde\theta) \equiv \frac{1}{n_l}\sum_{i=1}^{n_l} E_*\left[b(X_i^*;\tilde\theta)\right] = 0, \tag{3-4}$$

where $E_*[\,\cdot\,]$ is the conditional expectation given the sample $\mathcal{X}$. It is easy to see that $\tilde\theta$ is generally different from $\hat\theta$, so computation of $T_n^*$ requires solving an additional set of equations for the right centering value. Using $\hat\theta$ instead of $\tilde\theta$ when defining the bootstrap version of $T_n$ induces an extra bias (due only to the resampling procedure) that leads to either a worse approximation or inconsistency of the bootstrap (Lahiri, 2003, pp. 81-83).

Another way to define the bootstrap version of $\hat\theta$ is as a solution to the modified equation

$$\frac{1}{n_l}\sum_{i=1}^{n_l}\left[b(X_i^*;\theta^*) - \tilde b(\hat\theta)\right] = 0. \tag{3-5}$$

This approach was first suggested by Shorack (1981) in the context of bootstrapping M-estimators in linear regression models and also by Hall and Horowitz (1996) in the context of bootstrapping GMM estimators. Note that $\tilde b(\hat\theta)$ is the appropriate quantity that makes the estimating equation (3-5) conditionally unbiased. As Lahiri (2003, p. 83) remarks, one advantage of using (3-5) over (3-3) is that we need to solve only one set of equations (3-5), compared with two sets of equations (3-3) and (3-4) in the latter case.

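In order to define the weights of the blocks corresponding to the moving block biased bootstrap (MBBB), denote by $U_i(\theta) \equiv (1/l)\sum_{j=i}^{i+l-1} b(X_j;\theta)$ the average of moments at $\theta$ in block $B_i$. Then let $p = p(\theta) = (p_1,\dots,p_N)$ be the vector of probabilities that is closest to the vector of uniform weights $(1/N,\dots,1/N)$ such that the weighted mean of the block averages is equal to the zero vector. In other words, $p$ is the solution of the minimization problem

$$\text{minimize } D_\rho(p) = \sum_{i=1}^N \phi_\rho(Np_i), \qquad \text{subject to } \sum_{i=1}^N p_i = 1,\quad \sum_{i=1}^N p_i U_i(\theta) = 0, \tag{3-6}$$

where $\phi_\rho$ is the kernel of the Cressie-Read distance. The optimization problem (3-6) may be solved for $p_i$ as in the iid case, yielding

$$p_i(\theta) = \begin{cases} \dfrac{1}{N}\left\{1 + (\rho-1)\left[\rho\delta + \lambda^T U_i(\theta)\right]\right\}^{1/(\rho-1)}, & \text{if } \rho \neq 1, \\[2ex] \dfrac{1}{N}\exp\left\{\delta + \lambda^T U_i(\theta)\right\}, & \text{if } \rho = 1, \end{cases} \tag{3-7}$$

where $\delta$ and $\lambda$ are chosen to satisfy the constraints $\sum_{i=1}^N p_i = 1$ and $\sum_{i=1}^N p_i U_i(\theta) = 0$.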

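To obtain the MBBB samples, we first compute the estimate $\hat\theta$ of $\theta_0$, then randomly select $b_l$ blocks from the collection $\mathcal{B} = \{B_1,\dots,B_N\}$ corresponding to the vector of probabilities $p(\hat\theta)$ defined by (3-7). As before, each resampled block $B_i^*$ contains $l$ elements which we denote by $B_i^* = (X^*_{(i-1)l+1},\dots,X^*_{il})$, $i = 1,\dots,b_l$, and $n_l = l\,b_l$ is the total number of bootstrap values. Then $\mathcal{X}^* = \{X_1^*,\dots,X_{n_l}^*\}$ is a MBBB sample of size $n_l$. Let $\theta^*$ denote the bootstrap version of $\hat\theta$ corresponding to the biased bootstrap resample $\mathcal{X}^*$, defined as the solution to $(1/n_l)\sum_{i=1}^{n_l} b(X_i^*;\theta^*) = 0$. By construction, the (conditional) expectation given the sample $\mathcal{X}$ of the biased bootstrap mean of the moment conditions at $\hat\theta$ is the zero vector, i.e.

$$E_{\hat\theta}\left[\bar b_n^*(\hat\theta)\right] = E_{\hat\theta}\left[\frac{1}{n_l}\sum_{i=1}^{n_l} b(X_i^*;\hat\theta)\right] = \frac{1}{b_l}\sum_{i=1}^{b_l} E_{\hat\theta}\left[U_i^*(\hat\theta)\right] = \sum_{i=1}^N p_i(\hat\theta)\,U_i(\hat\theta) = 0, \tag{3-8}$$

where $\bar b_n(\theta) \equiv (1/n)\sum_{i=1}^n b(X_i;\theta)$, $\bar b_n^*(\theta)$ is its analogue on the resample $\mathcal{X}^*$, $U_i(\theta) \equiv (1/l)\sum_{j=i}^{i+l-1} b(X_j;\theta)$, and $U_i^*(\theta)$ is the average of moments in the resampled block $B_i^*$.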
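The resampling scheme of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: it treats only the scalar mean case $b(x;\theta) = x - \theta$ with $\rho = 0$ (the weights of (3-10)), it solves for the Lagrange multiplier by plain bisection, and the helper names `moving_block_means`, `el_block_weights`, and `mbbb_resample` are hypothetical.

```python
import numpy as np

def moving_block_means(x, l):
    """Means of all N = n - l + 1 overlapping blocks of length l."""
    c = np.concatenate(([0.0], np.cumsum(x)))
    return (c[l:] - c[:-l]) / l

def el_block_weights(u, tol=1e-12, max_iter=200):
    """Weights p_i = 1 / (N (1 + lam * u_i)) with sum_i p_i u_i = 0 (rho = 0)."""
    u = np.asarray(u, dtype=float)
    if not (u.min() < 0.0 < u.max()):
        raise ValueError("0 is outside the convex hull of the block averages")
    # All 1 + lam * u_i stay positive for lam in (lo, hi); on that interval
    # g(lam) = sum_i u_i / (1 + lam * u_i) is strictly decreasing.
    lo, hi = -1.0 / u.max(), -1.0 / u.min()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(u / (1.0 + mid * u)) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    p = 1.0 / (u.size * (1.0 + lam * u))
    return p / p.sum(), lam  # renormalise to absorb rounding error

def mbbb_resample(x, l, theta_hat, rng):
    """Draw b_l = [n / l] blocks with the biased weights and concatenate them."""
    n = x.size
    N, b_l = n - l + 1, n // l
    p, _ = el_block_weights(moving_block_means(x, l) - theta_hat)
    starts = rng.choice(N, size=b_l, replace=True, p=p)
    idx = (starts[:, None] + np.arange(l)).ravel()
    return x[idx], starts, p

# Illustrative data: a moving average of centred chi-square noise (an assumption).
rng = np.random.default_rng(0)
eps = rng.chisquare(1, size=126) - 1.0
x = (eps[:-1] + eps[1:]) / np.sqrt(2.0)
x_star, starts, p = mbbb_resample(x, l=5, theta_hat=x.mean(), rng=rng)
```

The weighted mean of the block averages $U_i(\hat\theta)$ under `p` is zero up to solver tolerance, which is the property expressed by (3-8). Bisection was chosen here only because it needs no extra dependencies; any root finder on the same interval would serve.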

3.3 Consistency Results for the Block Biased Bootstrap

Let $T_n = \sqrt{n}(\hat\theta - \theta_0)$ and $T_n^* = \sqrt{n_l}(\theta^* - \hat\theta)$, and denote by $G_n$ and $\hat G_n$ their distributions ($\hat G_n$ is a random distribution). Before describing the bootstrap for time series, we first introduce the notions of stationarity and the α-mixing property necessary for the central limit theorem in the case of dependent data. A sequence of random vectors $X_i$, $i = 1, 2, \dots$ is called (strictly) stationary if for every $i_1 < i_2 < \dots < i_k$, $k \in \mathbb{N}$, and for every $m \in \mathbb{N}$, the distributions of $(X_{i_1},\dots,X_{i_k})$ and $(X_{i_1+m},\dots,X_{i_k+m})$ are the same. The strong mixing or α-mixing coefficient of $(X_i)$ is defined as

$$\alpha(n) = \sup\left\{|P(A \cap B) - P(A)P(B)| : A \in \sigma\{X_j : 1 \le j \le k\},\ B \in \sigma\{X_j : j \ge k + n + 1\},\ k \in \mathbb{N}\right\}, \quad n \in \mathbb{N}. \tag{3-9}$$

We assume the following regularity conditions hold.

Assumption 3.3.1. The parameter space $\Theta \subset \mathbb{R}^p$ is compact, and $\theta_0$ is an inner point of $\Theta$ which satisfies the population moment condition given by (3-1).

Assumption 3.3.2. The function $b_\theta(x)$ is two times differentiable in $\theta$ in a neighborhood of $\theta_0$ for every $x$, such that $\|b(x,\theta)\|$, $\|\nabla b(x,\theta)\|$, and $\|\nabla^2 b(x,\theta)\|$ are all bounded by some square integrable function $k$, with $E_F k^2 < \infty$.

Assumption 3.3.3. Both $\Sigma_\infty \equiv \mathrm{Cov}(b(X_1,\theta_0), b(X_1,\theta_0)) + 2\sum_{i=1}^\infty \mathrm{Cov}(b(X_1,\theta_0), b(X_{1+i},\theta_0))$ and $D \equiv E[\nabla b(X_1,\theta_0)]$ are nonsingular.

Assumption 3.3.4. There exists a $\delta > 0$ such that $\sum_{n=1}^\infty \alpha(n)^{\delta/(2+\delta)} < \infty$. Also, the block length $l$ satisfies $l^{-1} + n^{-\delta/(4+2\delta)}\,l = o(1)$ and $E\|b(X_1,\theta_0)\|^{2+\delta} < \infty$.

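Under the conditions above (Lahiri, 2003, Theorem 4.2, p. 83), the following weak convergence holds: $G_n \rightsquigarrow G$, where $G = N(0, D^{-1}\Sigma_\infty (D^{-1})^T)$ and

$$\Sigma_\infty \equiv \lim_{n\to\infty} n\,\mathrm{Var}(\bar b_n(\theta_0)) = \mathrm{Cov}(b(X_1,\theta_0), b(X_1,\theta_0)) + 2\sum_{i=1}^\infty \mathrm{Cov}(b(X_1,\theta_0), b(X_{1+i},\theta_0)),$$

with $\bar b_n(\theta) \equiv (1/n)\sum_{i=1}^n b(X_i,\theta)$ and $D \equiv E[\nabla b(X_1,\theta_0)]$.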

The next lemma is concerned with the existence of the weights satisfying (3-6). Let $U_i = U_i(\hat\theta)$.

Lemma 3.3.1. Under the same conditions as in Theorem 3.3.1, 0 is inside the convex hull of $\{U_i : i = 1,\dots,N\}$, with probability approaching 1 as $n \to \infty$.

Let $\mathrm{Co}\{U_i : i = 1,\dots,N\} = \{\sum_{i=1}^N c_i U_i : c_i \ge 0, \sum_{i=1}^N c_i = 1\}$ be the convex hull of $\{U_i : i = 1,\dots,N\}$. Since $0 \in \mathrm{Co}\{U_i : i = 1,\dots,N\}$, there exists a unique set of weights $\{p_i : i = 1,\dots,N\}$ solving (3-6) such that $\sum_{i=1}^N p_i = 1$, $p_i > 0$ and $\sum_{i=1}^N p_i U_i = 0$.

We consider in detail the case when $\rho = 0$, with similar proofs holding for any member in the Cressie-Read family. Taking $\rho = 0$ in (3-7), the weights have the following simplified expressions

$$p_i = \frac{1}{N(1 + \lambda^T U_i)}, \tag{3-10}$$

where the Lagrange multiplier $\lambda$ satisfies

$$\sum_{i=1}^N \frac{U_i}{1 + \lambda^T U_i} = 0. \tag{3-11}$$
The next theorem shows the consistency of the moving block biased bootstrap distribution

of the mean of moment conditions.

Theorem 3.3.1. Let $\mathcal{X} = \{X_1,\dots,X_n\}$ be a realization of a strictly stationary time series and let $\theta_0$ be the parameter of interest, assumed to satisfy (3-1). Suppose that $\hat\theta$ is a sequence of generalized M-estimators defined by (3-2), and let $F_{\hat\theta}$ be the weighted empirical distribution given by (3-6). Let $\mathcal{X}^* = \{X_1^*,\dots,X_{n_l}^*\}$ be a block biased bootstrap resample. Under the assumptions 3.3.1-3.3.4, as $n \to \infty$,

$$n_l^{1/2}\,\bar b_n^*(\hat\theta) \overset{P}{\rightsquigarrow} N(0, \Sigma_\infty). \tag{3-12}$$

Lemma 2.3.2 and Theorem 2.3.2 from Section 2.3.1 hold also in this case. Since the

proofs are identical, we have not included them in the appendix.

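Lemma 3.3.2. Under the same conditions as in Theorem 3.3.1, the following uniform convergence for the biased bootstrap holds:

$$\sup_{\theta\in\Theta}\left\|\frac{1}{n_l}\sum_{i=1}^{n_l} b(X_i^*,\theta) - E_F b_\theta\right\| = o_{P^*}(1), \tag{3-13}$$

where $E_F b_\theta = E\,b(X_1,\theta) \equiv b(\theta)$.

Theorem 3.3.2. Under the same conditions as in Theorem 3.3.1 and for every sequence of generalized M-estimators $\hat\theta$ and (biased) bootstrap estimators $\theta^*$, with $\bar b_n(\hat\theta) = n^{-1}\sum_{i=1}^n b(X_i,\hat\theta) = o_P(1)$ and $\bar b_n^*(\theta^*) = n_l^{-1}\sum_{i=1}^{n_l} b(X_i^*,\theta^*) = o_{P^*}(1)$, we have $\hat\theta = \theta_0 + o_P(1)$ and $\theta^* = \theta_0 + o_{P^*}(1)$, and hence also that $\theta^* = \hat\theta + o_{P^*}(1)$.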

The next theorem shows that the MBBB distribution of the generalized M-estimator

is consistent.

Theorem 3.3.3. Let $\mathcal{X} = \{X_1,\dots,X_n\}$ be a realization of a strictly stationary time series and let $\theta_0$ be the parameter of interest, assumed to satisfy (3-1). Let $\hat\theta$ be a sequence of generalized M-estimators defined by (3-2) and let $F_{\hat\theta}$ be the weighted empirical distribution given by (3-6). Let $\mathcal{X}^* = \{X_1^*,\dots,X_{n_l}^*\}$ be a block biased bootstrap resample and let $\theta^*$ be the bootstrap version of $\hat\theta$ on $\mathcal{X}^*$. Under the assumptions 3.3.1-3.3.4, as $n \to \infty$,

$$n_l^{1/2}(\theta^* - \hat\theta) \overset{P}{\rightsquigarrow} N(0, D^{-1}\Sigma_\infty (D^{-1})^T). \tag{3-14}$$

3.4 Iterated Block Biased Bootstrap Recycling

For any block size $l$, first introduce a family $\mathcal{F} = \{F_\theta\}_{\theta\in\Theta}$ of weighted empirical distributions on the sample $\mathcal{B}$ of blocks of length $l$, defined by (3-7) and associated with the generalized M-estimation model (3-1). Then, estimate the parameter $\theta_0 = \theta(F)$ by $\hat\theta$, and, for the first level of block biased bootstrap, resample the blocks according to $F_{\hat\theta} \in \mathcal{F}$, obtaining the resample $\mathcal{X}^*$. Compute $\theta^*$, the biased bootstrap version of $\hat\theta$ corresponding to the resample $\mathcal{X}^*$. Then, at the second iteration, the block biased bootstrap re-resample is obtained from iid draws $B_i^{**}$ from the weighted empirical distribution $F_{\theta^*} \in \mathcal{F}$, obtaining the re-resample $\mathcal{X}^{**}$ (also called the inner level bootstrap resample).

Instead of drawing the second level bootstrap re-resamples in order to estimate $E_{\theta^*}(T^{**})$, we may use the importance sampling identity as in the iid case

$$E_{\theta^*}(T^{**}) = E_G\left(T^*\,\frac{dP_{\theta^*}}{dG}\right), \tag{3-15}$$

where $G$ is a given "design" distribution, $P_{\theta^*} = F_{\theta^*}^n$ is the n-fold product measure of $F_{\theta^*}$, and $dP_{\theta^*}/dG$ is the Radon-Nikodym derivative of $P_{\theta^*}$ with respect to $G$. Of course, if the statistic itself is difficult to evaluate for each sample, we can take $G = P_{\hat\theta}$ and reuse the first level biased bootstrap resamples $\mathcal{X}_b^*$, with $b = 1,\dots,B$. Hence, we have the following recycling approximation of (3-15) corresponding to resample $\mathcal{X}_b^*$

$$E_{\theta_b^*}(T^{**}) \approx \frac{1}{B}\sum_{b'=1}^B T(\mathcal{X}_{b'}^*)\,w_{bb'}, \tag{3-16}$$

where

$$w_{bb'} = \prod_{i=1}^N\left(\frac{p_i(\theta_b^*)}{p_i(\hat\theta)}\right)^{m_i^{(b')}} \quad\text{and}\quad m_i^{(b')} = \#\{j : B_{b'j}^* = B_i\}.$$

To illustrate the use of biased-bootstrap recycling, we consider a typical bootstrap estimation problem. Suppose that we want to estimate a characteristic $\varphi$ of the sampling distribution of $\hat\theta - \theta_0$, such as its bias, variance or αth quantile. Lahiri (2003, p. 2) calls $\theta_0$ a level-1 parameter and defines the level-2 parameter $\varphi$ as a functional related to the sampling distribution of an estimator of a level-1 parameter. Generally, $\varphi$ is the solution of a functional equation

$$E_F\,f_\varphi(F, F_{\hat\theta}) = 0. \tag{3-17}$$

Examples of such functionals are given by (2-17)-(2-19). Denote by $\hat\varphi(l)$ and $\hat\varphi(l)^*$ the biased bootstrap solutions of (3-17), which satisfy

$$E_{\hat\theta}\left[f_{\hat\varphi(l)}(F_{\hat\theta}, F_{\theta^*})\right] = 0 \quad\text{and}\quad E_{\theta^*}\left[f_{\hat\varphi(l)^*}(F_{\theta^*}, F_{\theta^{**}})\right] = 0. \tag{3-18}$$

Let $\mathcal{X}_1^*,\dots,\mathcal{X}_B^*$ be $B$ biased bootstrap resamples and let $\hat\varphi_B(l)$ be the Monte Carlo approximation of $\hat\varphi(l)$, defined as the solution in $\varphi$ of the equation

$$\sum_{b=1}^B f_\varphi(F_{\hat\theta}, F_{\theta_b^*}) = 0. \tag{3-19}$$

The recycling Monte Carlo approximation $\hat\varphi_B^*(l)$ of $\hat\varphi(l)^*$ (corresponding to resample $\mathcal{X}_b^*$) is the solution in $\varphi$ of the equation

$$\sum_{b'=1}^B f_\varphi(F_{\theta_b^*}, F_{\theta_{b'}^*})\,w_{bb'} = 0, \tag{3-20}$$

where $w_{bb'}$ is given by (3-16).

Remark 3.4.1. The advantage of the biased bootstrap recycling becomes more obvious

when the statistic is difficult to evaluate for each sample (as may be the case for the

generalized M-estimator) and we need to iterate the bootstrap. In this case, we do not

need to evaluate these estimators for the second level bootstrap re-resamples because we

use only the estimators already computed at the first level and the vectors of probabilities

corresponding to these estimates.

Remark 3.4.2. This biased bootstrap recycling methodology can be viewed as a competitor

of the jackknife-after-bootstrap developed by Efron (1992) in the iid case and by Lahiri

(2003) in the case of dependent data. The main idea of these methods is to use the first

level resamples when estimating higher level parameters corresponding to the iterated

bootstrap.

3.5 An Application to the Optimal Block Size Selection

We describe now a small simulation study to show the finite sample properties

of optimal block size selection using the recycling MBBB method described in the

previous section. We use the same time series model as Hall et al. (1995) and Lahiri

(2003, pp. 182-186) in which

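$$X_i = (\epsilon_i + \epsilon_{i+1})/\sqrt{2}, \quad i = 0, 1, \dots \tag{3-21}$$

where the $\epsilon_i$ are iid, $(\chi^2(1) - 1)$ distributed. Hence, $E\epsilon_1 = 0$ and $\mathrm{Var}\,\epsilon_1 = 2$. We consider the level-1 parameter $\theta = EX_1$ and its obvious estimator $\hat\theta = \bar X_n$, with the sample size $n = 125$. The level-2 parameters of interest are

$$\varphi_1 = n\,\mathrm{Var}(\bar X_n) \tag{3-22}$$

and

$$\varphi_2 = P(\bar X_n \le 0). \tag{3-23}$$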
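As a quick check on the model, the level-2 parameter $n\,\mathrm{Var}(\bar X_n)$ can be computed in closed form from the autocovariances of (3-21). The sketch below is illustrative only; the function names `simulate_series` and `phi1_exact` are hypothetical, and the autocovariances $\gamma_0 = 2$, $\gamma_1 = 1$ follow from $\mathrm{Var}\,\epsilon_1 = 2$.

```python
import numpy as np

def simulate_series(n, rng):
    """One realization X_1..X_n of X_i = (eps_i + eps_{i+1}) / sqrt(2)."""
    eps = rng.chisquare(df=1, size=n + 1) - 1.0   # centred chi-square(1) noise
    return (eps[:-1] + eps[1:]) / np.sqrt(2.0)

def phi1_exact(n):
    """n * Var(sample mean) for an MA(1) series with gamma_0 = 2, gamma_1 = 1."""
    gamma0, gamma1 = 2.0, 1.0
    return gamma0 + 2.0 * (1.0 - 1.0 / n) * gamma1

rng = np.random.default_rng(1)
x = simulate_series(125, rng)
print(x.shape, phi1_exact(125))
```

For $n = 125$ the closed form gives $2 + 2(124/125) = 3.984$, which agrees with the simulated true value reported later in this section.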

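For each block size $l$, let $\hat\varphi(l)$ denote the block biased bootstrap estimator of $\varphi$, based on the block size $l$. The optimal choice of the block size can be taken to be

$$l_0 = \operatorname*{argmin}_l \mathrm{MSE}(\hat\varphi(l)) = \operatorname*{argmin}_l E\left(\hat\varphi(l) - \varphi\right)^2. \tag{3-24}$$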

Using the same terminology as Lahiri (2003), $l_0$ is a level-3 parameter, since it relates to the sampling distribution of $\hat\varphi(l)$, which is an estimator of the level-2 parameter $\varphi$. Since the underlying $F$ is unknown, we can approximate the expectation from the last display by its biased bootstrap version,

$$\hat l_0 = \operatorname*{argmin}_l \mathrm{MSE}_{\hat\theta}(\hat\varphi^*(l)) = \operatorname*{argmin}_l E_{\hat\theta}\left(\hat\varphi^*(l) - \hat\varphi(\tilde l)\right)^2, \tag{3-25}$$

where $\tilde l$ is a reasonable pilot block size. Hence, in order to estimate $l_0$, we need to iterate the bootstrap or to apply recycling.

We describe in more detail the computational details of the recycling procedure proposed in this section, for the particular case when $\varphi_1 = n\,\mathrm{Var}(\bar X_n)$; the same ideas apply for distribution function estimation. We consider the forward Kullback-Leibler distance when defining the weights of the biased bootstrap, obtained for $\rho = 0$ in the family of power divergences given by (1-27). For a given value of the block size $l$, the vector of weights given by (3-7) is computed as

$$p_i(\mathcal{X}) = \frac{1}{N(1 - \lambda U_i(\mathcal{X}))}, \tag{3-26}$$

where $U_i(\mathcal{X}) = (1/l)\sum_{j=i}^{i+l-1} X_j - \bar X_n$ and $\lambda$ satisfies

$$\sum_{i=1}^N \frac{U_i(\mathcal{X})}{N(1 - \lambda U_i(\mathcal{X}))} = 0. \tag{3-27}$$

Let $F_{\mathcal{X}}$ be the weighted empirical distribution on the sample of blocks that assigns probability $p_i(\mathcal{X})$ to the block $B_i$. Now, let $\mathcal{X}_1^*,\dots,\mathcal{X}_B^*$ be $B$ moving-block biased-bootstrap resamples. For each $b = 1,\dots,B$, denote by $\bar X_b^*$ the sample mean of $\mathcal{X}_b^*$. Let

$$\hat\varphi_B(l) = \frac{1}{B}\sum_{b=1}^B n_l\left(\bar X_b^* - \bar X_n\right)^2$$

be the Monte Carlo approximation of the bootstrap estimate $\hat\varphi(l)$, where $n_l = l\,b_l$. Now, for each $b$, compute the vector of weights $p(\mathcal{X}_b^*) = (p_i(\mathcal{X}_b^*) : i = 1,\dots,N)$ using (3-26) and (3-27). First, a description of the computational details for the double biased bootstrap is presented and then its recycling version.

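For each $b$, consider $C$ re-resamples $\{\mathcal{X}_{bc}^{**} : c = 1,\dots,C\}$ from $F_{\mathcal{X}_b^*}$. For each $c$, compute $\bar X_{bc}^{**}$, the sample mean of $\mathcal{X}_{bc}^{**}$, so that the Monte Carlo approximation of $\hat\varphi_b^*(l) = n_l\,\mathrm{Var}_{\mathcal{X}_b^*}(\bar X^{**})$ is

$$\hat\varphi_{bC}^*(l) = \frac{1}{C}\sum_{c=1}^C n_l\left(\bar X_{bc}^{**} - \bar X_b^*\right)^2.$$

Finally, the Monte Carlo approximation $\mathrm{MSE}_{BC}(\hat\varphi^*(l))$ of $\mathrm{MSE}_{\mathcal{X}}(\hat\varphi^*(l))$ is

$$\mathrm{MSE}_{BC}(\hat\varphi^*(l)) = \frac{1}{B}\sum_{b=1}^B\left(\hat\varphi_{bC}^*(l) - \hat\varphi_B(\tilde l)\right)^2,$$

where $\tilde l$ is a pilot value for the block size (in the simulations below, $\tilde l = [125^{1/3}] = 5$). Although instead of $\hat\varphi_B(\tilde l)$ we could have used any consistent estimator of $\varphi_1$ (e.g. one based on a spectral density estimator evaluated at 0), a choice of a smoothing parameter (the bandwidth of the spectral density estimator) is still necessary due to the infinite dimensional nature of $\varphi_1$. The MC approximation of the bootstrap estimate of the optimal block size $l_0$ is

$$\hat l_0 = \operatorname*{argmin}_{l \in J_n} \mathrm{MSE}_{BC}(\hat\varphi^*(l)),$$

where $J_n$ is the set of potential block sizes. In the simulations below, we considered $J_{125} = \{1,\dots,10\}$.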
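In recycling, instead of drawing $C$ additional resamples for each $b$, we use the initial resamples $\{\mathcal{X}_b^* : b = 1,\dots,B\}$ together with the corresponding vectors of probabilities $(p(\mathcal{X}_b^*) : b = 1,\dots,B)$, computed as before. Using the importance sampling identity, we have

$$\hat\varphi_b^*(l) = n_l\,E_{\mathcal{X}_b^*}\left(\bar X^{**} - \bar X_b^*\right)^2 = n_l\,E_{\mathcal{X}}\left[\left(\bar X^* - \bar X_b^*\right)^2 \frac{dP_{\mathcal{X}_b^*}}{dP_{\mathcal{X}}}\right], \tag{3-28}$$

where $P_{\mathcal{X}} = F_{\mathcal{X}}^n$ is the n-fold product measure of $F_{\mathcal{X}}$. Using the resamples $\{\mathcal{X}_b^* : b = 1,\dots,B\}$, the Monte Carlo recycling approximation $\hat\varphi_{bB}^r(l)$ of $\hat\varphi_b^*(l)$ is

$$\hat\varphi_{bB}^r(l) = \frac{1}{B}\sum_{b'=1}^B n_l\left(\bar X_{b'}^* - \bar X_b^*\right)^2 \prod_{i=1}^N\left(\frac{p_i(\mathcal{X}_b^*)}{p_i(\mathcal{X})}\right)^{m_i^{(b')}}, \tag{3-29}$$

where

$$m_i^{(b')} = \#\{j : B_{b'j}^* = B_i\}. \tag{3-30}$$

Hence, the recycling approximation $\mathrm{MSE}_B^r(\hat\varphi^*(l))$ of $\mathrm{MSE}_{\mathcal{X}}(\hat\varphi^*(l))$ is

$$\mathrm{MSE}_B^r(\hat\varphi^*(l)) = \frac{1}{B}\sum_{b=1}^B\left(\hat\varphi_{bB}^r(l) - \hat\varphi_B(\tilde l)\right)^2.$$

Let now $k_{bb'} = w_{bb'}/\sum_{b'=1}^B w_{bb'}$ be the normalized weights, where $w_{bb'}$ denotes the product of probability ratios in (3-29). Then, the (adjusted) Monte Carlo approximation of $\hat\varphi_b^*(l)$ given by (3-28) is

$$\tilde\varphi_{bB}^r(l) = \sum_{b'=1}^B n_l\left(\bar X_{b'}^* - \bar X_b^*\right)^2 k_{bb'}. \tag{3-31}$$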
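The recycling formulas (3-29) and (3-31) reduce to a few array operations once the block counts and the weight vectors are stored as matrices. The following is a sketch under stated assumptions: the function name `recycling_estimates` and the array layout are hypothetical, and the importance weights are accumulated on the log scale purely for numerical stability.

```python
import numpy as np

def recycling_estimates(xbar_star, counts, p0, P, n_l):
    """
    Recycled variance estimates for every outer resample b.

    xbar_star : (B,)   means of the first-level resamples
    counts    : (B, N) counts[b', i] = times block i occurs in resample b'
    p0        : (N,)   first-level block probabilities p_i(X)
    P         : (B, N) P[b, i] = p_i(X*_b), weights recomputed at resample b
    n_l       : int    resample size l * b_l
    """
    log_ratio = np.log(P) - np.log(p0)          # (B, N)
    w = np.exp(log_ratio @ counts.T)            # w[b, b'], product of ratios
    # sq[b, b'] = n_l * (mean of resample b' - mean of resample b)^2
    sq = n_l * (xbar_star[None, :] - xbar_star[:, None]) ** 2
    raw = (sq * w).mean(axis=1)                 # unnormalised version, as in (3-29)
    k = w / w.sum(axis=1, keepdims=True)        # normalised weights k_{bb'}
    adjusted = (sq * k).sum(axis=1)             # adjusted version, as in (3-31)
    return raw, adjusted

# Toy inputs: when every P[b] equals p0, all importance weights are 1 and
# both versions reduce to the plain average of the squared deviations.
p0 = np.array([0.2, 0.3, 0.5])
P = np.tile(p0, (4, 1))
counts = np.array([[2, 0, 1], [1, 1, 1], [0, 3, 0], [1, 0, 2]])
xbar_star = np.array([0.1, -0.2, 0.4, 0.0])
raw, adjusted = recycling_estimates(xbar_star, counts, p0, P, n_l=6)
```

No second-level resampling or re-evaluation of the statistic is needed: only the first-level means and the probability vectors enter the computation, which is the point made in Remark 3.4.1.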

By running 20000 simulations from the model (3-21), true values of the level-2 parameters were found and they are given by $\varphi_1 = 3.984$ and $\varphi_2 = 0.516$. To find the true optimal block sizes using both uniform and biased bootstrap, we generate moving block bootstrap estimators of $\varphi_1$ and $\varphi_2$ for block sizes $l = 1,\dots,10$. Tables 3-1 and 3-2 give the mean, bias, standard deviation and root mean square errors (RMSE) of the MBB and MBBB estimators of $\varphi_1$ and $\varphi_2$ based on S=1000 simulation runs and B=1000 bootstrap resamples. From these tables we see that the optimal block size is $l = 3$ for $\varphi_1$ and $l = 2$ for $\varphi_2$.

Figure 3-1 shows the bootstrap estimates of the RMSEs of different bootstrap procedures for parameters $\varphi_1$ and $\varphi_2$, for different block sizes, for one realization of the process (3-21). For each bootstrap procedure, the optimal block length is the minimizer of the bootstrap estimates of the RMSE over all block lengths $l = 1,\dots,10$. We can see that, for this particular sample, the bootstrap procedures choose similar "optimal" block sizes. In these simulations, we used B=1000 outer bootstrap resamples (for the biased bootstrap recycling (BBR) and the adjusted biased bootstrap recycling (ABBR)) and an additional 500 inner bootstrap re-resamples for the uniform double bootstrap (UB) and the double biased bootstrap (BB).

Table 3-1. Computation of the optimal block size for uniform block bootstrap estimation of level-2 parameters φ1 and φ2 given by (3-22) and (3-23). The number of simulations is S=1000, and the number of bootstrap resamples is B=1000. An asterisk (*) shows the block size for which the minimum RMSE has been attained.

         Variance Estimation                 Distribution Function Estimation
  l    Mean     Bias     SD      RMSE       Mean     Bias      SD       RMSE
  1    1.972   -2.012   0.679   2.124      0.5094   -0.0068   0.0162   0.0175
  2    2.936   -1.048   1.032   1.470      0.5139   -0.0023   0.0164   0.0166*
  3    3.248   -0.737   1.200   1.408*     0.5141   -0.0021   0.0165   0.0167
  4    3.386   -0.598   1.329   1.457      0.5133   -0.0029   0.0171   0.0173
  5    3.448   -0.536   1.394   1.493      0.5135   -0.0027   0.0167   0.0169
  6    3.497   -0.488   1.474   1.552      0.5128   -0.0034   0.0169   0.0172
  7    3.516   -0.468   1.544   1.612      0.5122   -0.0040   0.0180   0.0183
  8    3.528   -0.457   1.617   1.680      0.5112   -0.0050   0.0176   0.0182
  9    3.545   -0.439   1.670   1.726      0.5114   -0.0048   0.0184   0.0190
 10    3.526   -0.458   1.710   1.770      0.5121   -0.0041   0.0186   0.0189

Table 3-2. Computation of the optimal block size for moving block biased bootstrap estimation of level-2 parameters φ1 and φ2 given by (3-22) and (3-23). The number of simulations is S=1000, and the number of bootstrap resamples is B=1000. An asterisk (*) shows the block size for which the minimum RMSE has been attained.

         Variance Estimation                 Distribution Function Estimation
  l    Mean     Bias     SD      RMSE       Mean     Bias      SD       RMSE
  1    1.977   -2.007   0.688   2.122      0.5092   -0.0070   0.0161   0.0175
  2    2.939   -1.045   1.040   1.474      0.5135   -0.0027   0.0164   0.0166*
  3    3.245   -0.740   1.201   1.411*     0.5133   -0.0029   0.0167   0.0169
  4    3.383   -0.601   1.318   1.448      0.5131   -0.0031   0.0174   0.0177
  5    3.451   -0.534   1.405   1.502      0.5124   -0.0038   0.0178   0.0180
  6    3.484   -0.500   1.479   1.561      0.5119   -0.0043   0.0171   0.0176
  7    3.504   -0.480   1.533   1.606      0.5124   -0.0038   0.0174   0.0178
  8    3.524   -0.460   1.618   1.681      0.5123   -0.0039   0.0174   0.0178
  9    3.523   -0.462   1.686   1.747      0.5111   -0.0051   0.0187   0.0194
 10    3.514   -0.470   1.725   1.787      0.5108   -0.0054   0.0182   0.0189

[Figure 3-1. Bootstrap estimates of the RMSEs corresponding to different block bootstrap schemes. Horizontal axis: Block Size. We used B=1000 outer bootstrap resamples (for the biased bootstrap recycling (BBR) and the adjusted biased bootstrap recycling (ABBR)) and, for the uniform double bootstrap (UB) and the double biased bootstrap (BB), an additional 500 inner bootstrap resamples for each outer bootstrap resample.]

CHAPTER 4
A HYBRID BIASED BOOTSTRAP

4.1 On the Boundary of the Parametric Space

We consider now a problem where the ordinary bootstrap fails. Unfortunately the

biased bootstrap, applied in the natural way, also fails. However, we are able to devise a

somewhat contrived biased bootstrap approach that is similar to the recentering solution

proposed by Datta (1995), but can be more generally applied.

Andrews (2000) gives an example where the uniform bootstrap fails. He remarks

"... we provide such a counterexample ... [which] is quite simple but it generalizes to a

wide variety of estimation problems that are of importance in econometric applications".

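Let $\mathcal{X} = \{X_1,\dots,X_n\}$ be an iid sample from the $N(\mu, 1)$ distribution. Suppose that the parameter space for $\mu$ is the nonnegative reals. Then the MLE of $\mu$ is $\hat\mu_n = \max\{\bar X_n, 0\}$, where $\bar X_n = n^{-1}\sum_{i=1}^n X_i$. Let $T_n = n^{1/2}(\hat\mu_n - \mu)$ and

$$T = \begin{cases} Z, & \text{if } \mu > 0, \\ \max\{Z, 0\}, & \text{if } \mu = 0, \end{cases} \tag{4-1}$$

where $Z \sim N(0, 1)$. Obviously, $T_n \xrightarrow{d} T$ as $n \to \infty$.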

Let $\mathcal{X}^* = \{X_1^*,\dots,X_n^*\}$ be a bootstrap sample from $F_n$, where $F_n$ is the empirical distribution function. Andrews (2000) shows that in the case when $\mu = 0$, the bootstrap is inconsistent. We devise here a biased bootstrap procedure that gives consistent bootstrap estimates for the distribution of the MLE. Consider a sequence of positive reals $\delta_n = n^{-\alpha}$, with $\alpha \in (0, 1/2)$, and define the resampling distribution as

$$G_n = \begin{cases} F_0, & \text{if } \bar X_n \le \delta_n, \\ F_n, & \text{if } \bar X_n > \delta_n, \end{cases} \tag{4-2}$$

where $F_0 = \operatorname{argmin}_p\{D_\rho(p) : \sum_{i=1}^n p_i X_i = 0, \sum_{i=1}^n p_i = 1, p_i > 0\}$ and $D_\rho$ is the Cressie-Read distance.

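Denote by $\mathcal{X}^* = \{X_1^*,\dots,X_n^*\}$ a bootstrap sample from $G_n$, and by $\hat\mu_n^*$ the bootstrap version of $\hat\mu_n$ computed on $\mathcal{X}^*$. The biased bootstrap version of $T_n$ is

$$T_n^* = \begin{cases} n^{1/2}\hat\mu_n^*, & \text{if } \bar X_n \le \delta_n, \\ n^{1/2}(\hat\mu_n^* - \hat\mu_n), & \text{if } \bar X_n > \delta_n. \end{cases} \tag{4-3}$$

In order to prove that $T_n^* \overset{P}{\rightsquigarrow} T$, we show that

$$\sup_x\left|P(T_n^* \le x \mid G_n) - P(T \le x)\right| \xrightarrow{P} 0. \tag{4-4}$$

The following relations are true:

$$\sup_x\left|P(T_n^* \le x \mid G_n) - P(T \le x)\right| \le \sup_x\left|P(n^{1/2}\hat\mu_n^* \le x \mid F_0) - P(T \le x)\right| I\{\bar X_n \le \delta_n\} + \sup_x\left|P(n^{1/2}(\hat\mu_n^* - \hat\mu_n) \le x \mid F_n) - P(T \le x)\right| I\{\bar X_n > \delta_n\}, \tag{4-5}$$

where $I\{\cdot\}$ is the indicator function of a set. If $\mu = 0$, it follows from the Law of Iterated Logarithm that

$$\limsup_n I\{\bar X_n > \delta_n\} \le I\left\{\limsup_n \frac{n^{1/2}\bar X_n}{(2\log\log n)^{1/2}} \cdot \frac{(2\log\log n)^{1/2}}{n^{1/2}\delta_n} \ge 1\right\} = 0 \text{ a.s.} \tag{4-6}$$

Since when $\mu = 0$, $\sup_x\left|P(n^{1/2}\hat\mu_n^* \le x \mid F_0) - P(T \le x)\right| = o_p(1)$, it follows that (4-4) holds. When $\mu > 0$, it follows that

$$\limsup_n I\{\bar X_n \le \delta_n\} \le I\left\{\liminf_n \frac{\bar X_n}{\delta_n} \le 1\right\} = 0 \text{ a.s.}$$

Since $\sup_x\left|P(n^{1/2}(\hat\mu_n^* - \hat\mu_n) \le x \mid F_n) - P(T \le x)\right| = o_p(1)$, it follows that (4-4) holds.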
4.2 Certain Asymptotically Nonnormal Statistics

Datta (1995) cites Babu (1984)'s paper to argue that the classical bootstrap

approximation "breaks down ... even for nice statistics." Babu (1984) remarked that

the bootstrap approximation of a smooth function of multivariate sample mean is not

consistent for "certain" values of the mean vector. Following their example, we propose

a biased bootstrap version that can correct the inconsistency of the classical uniform

bootstrap.

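Let $\mathcal{X} = \{X_1,\dots,X_n\}$ be an iid sample from the $N(\mu, 1)$ distribution and suppose that we are interested in estimating $\mu^2$, for which the MLE is $\bar X_n^2$. Consider

$$T_n = \begin{cases} n^{1/2}(\bar X_n^2 - \mu^2), & \text{if } \mu \neq 0, \\ n\bar X_n^2, & \text{if } \mu = 0. \end{cases} \tag{4-7}$$

Obviously, $T_n \xrightarrow{d} T$, where

$$T \sim \begin{cases} N(0, 4\mu^2), & \text{if } \mu \neq 0, \\ \chi_1^2, & \text{if } \mu = 0. \end{cases} \tag{4-8}$$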

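Datta (1995) shows that the classical uniform bootstrap is not valid in this case. To rectify this, as before, consider a sequence of positive reals $\delta_n = n^{-\alpha}$, with $\alpha \in (0, 1/2)$, and define the resampling distribution $G_n$ to be

$$G_n = \begin{cases} F_0, & \text{if } |\bar X_n| \le \delta_n, \\ F_n, & \text{if } |\bar X_n| > \delta_n, \end{cases} \tag{4-9}$$

where $F_0 = \operatorname{argmin}_p\{D_\rho(p) : \sum_{i=1}^n p_i X_i = 0, \sum_{i=1}^n p_i = 1, p_i > 0\}$ and $D_\rho$ is the Cressie-Read distance. Let $\mathcal{X}^* = \{X_1^*,\dots,X_n^*\}$ be a biased bootstrap sample, and let $\bar X_n^*$ be the bootstrap sample mean. The bootstrap version of (4-7) on the bootstrap sample $\mathcal{X}^*$ is

$$T_n^* = \begin{cases} n\bar X_n^{*2}, & \text{if } |\bar X_n| \le \delta_n, \\ n^{1/2}(\bar X_n^{*2} - \bar X_n^2), & \text{if } |\bar X_n| > \delta_n. \end{cases} \tag{4-10}$$

Using the same arguments as in the previous example, it can be shown that $T_n^* \overset{P}{\rightsquigarrow} T$.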
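The hybrid resampling rule of (4-9) and (4-10) can be sketched as follows. This is an illustration under assumptions, not the thesis's code: it fixes $\rho = 0$ (so $F_0$ carries empirical-likelihood-type weights), solves for the multiplier by bisection, and uses the hypothetical names `el_mean_zero_weights` and `hybrid_bootstrap_T`.

```python
import numpy as np

def el_mean_zero_weights(x, tol=1e-12, max_iter=200):
    """Weights p_i = 1 / (n (1 + lam * x_i)) with sum_i p_i x_i = 0 (rho = 0)."""
    x = np.asarray(x, dtype=float)
    if not (x.min() < 0.0 < x.max()):
        raise ValueError("0 is outside the convex hull of the sample")
    lo, hi = -1.0 / x.max(), -1.0 / x.min()   # keeps every 1 + lam * x_i > 0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(x / (1.0 + mid * x)) > 0.0:  # decreasing in lam
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    p = 1.0 / (x.size * (1.0 + 0.5 * (lo + hi) * x))
    return p / p.sum()

def hybrid_bootstrap_T(x, B, alpha, rng):
    """B bootstrap replicates of T_n^* and the label of the branch used."""
    n = x.size
    xbar = x.mean()
    delta_n = n ** (-alpha)
    if abs(xbar) <= delta_n:
        # Near the singular point: resample from the mean-zero tilted F_0.
        p = el_mean_zero_weights(x)
        draws = rng.choice(x, size=(B, n), replace=True, p=p)
        return n * draws.mean(axis=1) ** 2, "F_0"
    # Away from it: ordinary uniform bootstrap centred at the sample value.
    draws = rng.choice(x, size=(B, n), replace=True)
    return np.sqrt(n) * (draws.mean(axis=1) ** 2 - xbar ** 2), "F_n"

rng = np.random.default_rng(2)
x0 = rng.normal(size=100)
x0 = x0 - x0.mean()                      # sample mean 0: biased branch
t0, branch0 = hybrid_bootstrap_T(x0, B=200, alpha=0.4, rng=rng)
t1, branch1 = hybrid_bootstrap_T(x0 + 1.0, B=200, alpha=0.4, rng=rng)
```

In the first call the sample mean falls below $\delta_n$, so the replicates are nonnegative, as a chi-square limit requires; in the second call the rule switches to the uniform bootstrap.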

Table 4-1 gives the results of a simulation study that shows the performance of this "hybrid" biased bootstrap. Under $\mu = 0$, $T_n = n\bar X_n^2$ has a chi-square distribution with one degree of freedom. For these simulations, we have considered sample sizes n=50, 100, 200, 500. For each sample size, we have computed the quantiles of the ordinary (uniform) bootstrap estimate $\mathcal{L}(n(\bar X_n^{*2} - \bar X_n^2) \mid F_n)$ as well as the "hybrid biased bootstrap" $\mathcal{L}(T_n^* \mid G_n)$. For this simulation study we have used B=1000 bootstrap resamples, S=1000 simulated samples and $\delta_n = n^{-.4}$. We have also included the true quantiles of $T_n$, given by the quantiles of the chi-square distribution with one degree of freedom, for comparison.

It is obvious from this simulation study that this "hybrid biased bootstrap" outperforms

the ordinary bootstrap, as expected from the theoretical result. The uniform bootstrap

performs erratically in the tails, and the approximation does not improve with increasing

the sample size.

Table 4-2 gives the bootstrap estimates of the same quantities as above, under $\mu = 0.2$. In this case, $T_n = n^{1/2}(\bar X_n^2 - .2^2)$ has an asymptotic normal distribution $N(0, .4^2)$, for the sample sizes n=50, 100, 200, 500. As before, for each sample size, we have computed the quantiles of the ordinary (uniform) bootstrap estimate $\mathcal{L}(n^{1/2}(\bar X_n^{*2} - \bar X_n^2) \mid F_n)$ as well as the "hybrid biased bootstrap" $\mathcal{L}(T_n^* \mid G_n)$. We have also included the true quantiles of $T$, for comparison. It is obvious from this simulation study that this "hybrid" biased bootstrap approaches the ordinary bootstrap, as expected from the theoretical result. As the sample size increases, the two different bootstrap procedures give almost indistinguishable results.

Table 4-1. True quantiles of the distribution of $T_n$ under $\mu = 0$, and their bootstrap approximations given by the ordinary (uniform) bootstrap (UB) and the "hybrid" biased bootstrap (HBB), using B=1000 bootstrap resamples, S=1000 simulation runs and $\delta_n = n^{-.4}$. [Table entries are not recoverable from the source.]

Table 4-2. True quantiles of the asymptotic distribution of $T_n$ under $\mu = 0.2$, and their bootstrap approximations given by the ordinary (uniform) bootstrap (UB) and the "hybrid" biased bootstrap (HBB), using B=1000 bootstrap resamples, S=1000 simulation runs and $\delta_n = n^{-.4}$. [Table entries are not recoverable from the source.]

APPENDIX A
PROOFS OF CONSISTENCY RESULTS FROM CHAPTER I

In order to prove Propositions 1.3.1 and 1.3.2, we first present two results shared by concave criterion functions; for more details and applications, see Giurcanu and Trindade (2006).

Proposition A.0.1. Let $f : \mathbb{R}^q \to \mathbb{R}$ be a concave function and suppose that there exist $t_0 \in \mathbb{R}^q$ and $\epsilon > 0$ such that $f(t_0) > \sup_{t \in S(t_0,\epsilon)} f(t)$. If $t^* = \operatorname{argmax}_{t \in \mathbb{R}^q} f(t)$, then $t^* \in B(t_0, \epsilon)$.

Proof. Suppose on the contrary that $t^* \notin B(t_0, \epsilon)$. Then, there exists $0 < \lambda < 1$ such that $t_1 = \lambda t^* + (1-\lambda)t_0$ and $t_1 \in S(t_0, \epsilon)$. Then, the following relations are true

$$f(t_1) = f(\lambda t^* + (1-\lambda)t_0) \ge \lambda f(t^*) + (1-\lambda)f(t_0) \ge \lambda f(t_1) + (1-\lambda)f(t_0) > f(t_1),$$

a contradiction. □

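Proposition A.0.2. Let $m_n(\theta)$ and $m(\theta)$ be random and deterministic functions, respectively. If $\sup_{\theta\in\Theta}|m_n(\theta) - m(\theta)| \xrightarrow{P} 0$, then $\sup_{\theta\in\Theta} m_n(\theta) \xrightarrow{P} \sup_{\theta\in\Theta} m(\theta)$.

Proof. First, consider the case when $\sup_\theta m(\theta) = \infty$. Since

$$m_n(\theta) + |m_n(\theta) - m(\theta)| \ge m(\theta)$$

with probability one, it follows that

$$\sup_\theta m_n(\theta) + \sup_\theta |m_n(\theta) - m(\theta)| \ge \sup_\theta m(\theta)$$

with probability one. Since $\sup_\theta |m_n(\theta) - m(\theta)| \xrightarrow{P} 0$ and $\sup_\theta m(\theta) = \infty$, it follows that $\sup_\theta m_n(\theta) \xrightarrow{P} \infty$.

If $\sup_\theta m(\theta) < \infty$, using the following relations

$$\sup_\theta m_n(\theta) - \sup_\theta m(\theta) = \sup_\theta\left(m_n(\theta) - \sup_{\theta_1} m(\theta_1)\right) \le \sup_\theta\left(m_n(\theta) - m(\theta)\right) \le \sup_\theta |m_n(\theta) - m(\theta)|,$$

we obtain that with probability one,

$$\left|\sup_\theta m_n(\theta) - \sup_\theta m(\theta)\right| \le \sup_\theta |m_n(\theta) - m(\theta)|,$$

and from here, the result follows. □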

Proof of Proposition 1.3.1. Let $\epsilon > 0$ be small enough such that $B(\theta_0, \epsilon) \subset \Theta$. By Proposition A.0.1, ignoring sets of probability zero,

$$\left\{\sup_{\theta \in S(\theta_0,\epsilon)} m_n(\theta) < m_n(\theta_0)\right\} \subset \left\{\hat\theta_n \in B(\theta_0, \epsilon)\right\}. \tag{A-1}$$

Since pointwise convergence in probability for concave functions on an open set implies uniform convergence in probability on compact subsets of that open set (Pollard, 1991, sec. 6), using Proposition A.0.2 we obtain

$$\sup_{\theta \in S(\theta_0,\epsilon)} m_n(\theta) \xrightarrow{P} \sup_{\theta \in S(\theta_0,\epsilon)} m(\theta). \tag{A-2}$$

Since $\theta_0$ is globally identifiable, $\sup_{\theta \in S(\theta_0,\epsilon)} m(\theta) < m(\theta_0)$, and using (A-2) it follows that

$$P\left(\sup_{\theta \in S(\theta_0,\epsilon)} m_n(\theta) < m_n(\theta_0)\right) \to 1, \quad\text{as } n \to \infty.$$

From (A-1), we finally obtain $P(\hat\theta_n \in B(\theta_0, \epsilon)) \to 1$ as $n \to \infty$, which establishes the result. (When measurability is an issue, all results hold with respect to the outer probability $P^*$.) □

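Proof of Proposition 1.3.2. As before, since $m_n(\theta)$ is concave in $\theta$, uniform convergence in probability on any compact neighborhood $D$ of $\theta_0$ included in $\Theta$ holds, i.e.,

$$\sup_{\theta \in D} |m_n(\theta) - m(\theta)| \xrightarrow{P} 0. \tag{A-3}$$

The consistency of $\hat\theta_{n,1}$ and (A-3) imply that for every $\theta_2$ with $\theta = (\theta_{0,1}, \theta_2) \in \Theta$

$$m_n(\hat\theta_{n,1}, \theta_2) \xrightarrow{P} m(\theta_{0,1}, \theta_2). \tag{A-4}$$

To see this, note that for large enough $n$, and for any compact neighborhood $D$ of $\theta_0$ containing $(\theta_{0,1}, \theta_2)$ and contained in $\Theta$,

$$\begin{aligned} P\left(|m_n(\hat\theta_{n,1}, \theta_2) - m(\theta_{0,1}, \theta_2)| > \epsilon\right) &\le P\left(|m_n(\hat\theta_{n,1}, \theta_2) - m(\hat\theta_{n,1}, \theta_2)| > \epsilon/2\right) + P\left(|m(\hat\theta_{n,1}, \theta_2) - m(\theta_{0,1}, \theta_2)| > \epsilon/2\right) \\ &\le P\left(\sup_{\theta \in D} |m_n(\theta) - m(\theta)| > \epsilon/2\right) + P\left(|m(\hat\theta_{n,1}, \theta_2) - m(\theta_{0,1}, \theta_2)| > \epsilon/2\right) \\ &\to 0 \quad\text{as } n \to \infty. \end{aligned} \tag{A-5}$$

We used the fact that $m(\cdot, \theta_2)$ is continuous at $\theta_{0,1}$, so that the second term in the third line of the last display converges to 0. We have also used that $P((\hat\theta_{n,1}, \theta_2) \notin D) \to 0$. Since $k_n(\theta_2) := m_n(\hat\theta_{n,1}, \theta_2)$ satisfies the hypotheses of Proposition 1.3.1, we obtain

$$\operatorname*{argmax}_{\theta_2 : \theta \in \Theta} k_n(\theta_2) \xrightarrow{P} \operatorname*{argmax}_{\theta_2 : \theta \in \Theta} k(\theta_2),$$

where $k(\cdot) = m(\theta_{0,1}, \cdot)$ is the in-probability limit of $k_n$. Noting that $\hat\theta_{n,2} = \operatorname{argmax}_{\theta_2} k_n(\theta_2)$ and $\theta_{0,2} = \operatorname{argmax}_{\theta_2} k(\theta_2)$ gives the required result. □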

APPENDIX B
PROOFS OF RESULTS FROM CHAPTER II

B.1 Least Favorable Families

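Proof of Theorem 2.2.1. We consider only the case when $\rho \neq 1$, a similar proof holding also for $\rho = 1$. From (2-6),

$$n\nabla p_i\big|_{\theta=\hat\theta} = \left\{1 + (\rho-1)\left[\rho\delta + \lambda^T b(X_i,\theta)\right]\right\}^{(2-\rho)/(\rho-1)} \times \left(\rho\nabla\delta + (\nabla\lambda)^T b(X_i,\theta) + (\nabla b(X_i,\theta))\lambda\right)\Big|_{\theta=\hat\theta}. \tag{B-1}$$

From (2-1a), (2-2), and (2-5) it follows that $\delta|_{\theta=\hat\theta} = 0$ and $\lambda|_{\theta=\hat\theta} = 0$, so that, from (B-1)

$$n\nabla p_i\big|_{\theta=\hat\theta} = \rho\nabla\delta\big|_{\theta=\hat\theta} + (\nabla\lambda)^T\big|_{\theta=\hat\theta}\,b(X_i,\hat\theta). \tag{B-2}$$

Using (2-1b), it follows that $\sum_{i=1}^n \nabla p_i = 0$, so by (2-1a) we obtain

$$\rho\nabla\delta\big|_{\theta=\hat\theta} = \sum_{i=1}^n \nabla p_i\big|_{\theta=\hat\theta} = 0,$$

and from (B-2) it follows that

$$n\nabla p_i\big|_{\theta=\hat\theta} = (\nabla\lambda)^T\big|_{\theta=\hat\theta}\,b(X_i,\hat\theta). \tag{B-3}$$

Multiplying both sides of (B-3) by $(\nabla p_i)^T|_{\theta=\hat\theta}$ and summing over $i$ yields

$$n\sum_{i=1}^n \nabla p_i\big|_{\theta=\hat\theta}(\nabla p_i)^T\big|_{\theta=\hat\theta} = (\nabla\lambda)^T\big|_{\theta=\hat\theta}\sum_{i=1}^n b(X_i,\hat\theta)(\nabla p_i)^T\big|_{\theta=\hat\theta}. \tag{B-4}$$

Differentiating the first restriction of (2-1b) with respect to $\theta$ at $\hat\theta$ yields

$$\sum_{i=1}^n b(X_i,\hat\theta)(\nabla p_i)^T\big|_{\theta=\hat\theta} = -\frac{1}{n}\sum_{i=1}^n \nabla b(X_i,\hat\theta). \tag{B-5}$$

From (B-4) and (B-5)

$$n\sum_{i=1}^n \nabla p_i\big|_{\theta=\hat\theta}(\nabla p_i)^T\big|_{\theta=\hat\theta} = -(\nabla\lambda)^T\big|_{\theta=\hat\theta}\,\frac{1}{n}\sum_{i=1}^n \nabla b(X_i,\hat\theta). \tag{B-6}$$

Multiplying both sides of (B-3) by $b(X_i,\hat\theta)^T$ and summing over $i$ yields

$$n\sum_{i=1}^n \nabla p_i\big|_{\theta=\hat\theta}\,b(X_i,\hat\theta)^T = (\nabla\lambda)^T\big|_{\theta=\hat\theta}\sum_{i=1}^n b(X_i,\hat\theta)\,b(X_i,\hat\theta)^T. \tag{B-7}$$

From (B-5) and (B-7) it follows that

$$\nabla\lambda\big|_{\theta=\hat\theta} = -\left(\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)\,b(X_i,\hat\theta)^T\right)^{-1}\frac{1}{n}\sum_{i=1}^n \nabla b(X_i,\hat\theta). \tag{B-8}$$

Moreover, from (B-6) and (B-8) we have

$$n\sum_{i=1}^n \nabla p_i\big|_{\theta=\hat\theta}(\nabla p_i)^T\big|_{\theta=\hat\theta} = \left(\frac{1}{n}\sum_{i=1}^n \nabla b(X_i,\hat\theta)\right)^T\left(\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)\,b(X_i,\hat\theta)^T\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n \nabla b(X_i,\hat\theta)\right), \tag{B-9}$$

whose inverse is the usual sandwich estimator of the asymptotic variance of the Z-estimator $\hat\theta$ given in (1-5).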

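Now, consider $\mathcal{F} = \{F_\theta\}_{\theta\in\Theta}$ as a parametric family of distributions, indexed by $\theta$. The Fisher information corresponding to this parametric model is given by

$$I(\theta) = E_{F_\theta}\big[\nabla(\log F_\theta)\,(\nabla(\log F_\theta))^T\big] = \sum_{i=1}^n p_i\,[\nabla(\log p_i)]\,[\nabla(\log p_i)]^T = \sum_{i=1}^n \frac{1}{p_i}\nabla p_i\,(\nabla p_i)^T. \tag{B-10}$$

If we evaluate the Fisher information at $\hat\theta$, where $p_i = 1/n$, from (B-10) we obtain

$$I(\hat\theta) = n\sum_{i=1}^n \nabla p_i\big|_{\theta=\hat\theta}(\nabla p_i)^T\big|_{\theta=\hat\theta}. \tag{B-11}$$

From (B-9) and (B-11) it follows that

$$I(\hat\theta) = \Big(\frac{1}{n}\sum_{i=1}^n \nabla b(X_i,\hat\theta)\Big)^T\Big(\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)b(X_i,\hat\theta)^T\Big)^{-1}\Big(\frac{1}{n}\sum_{i=1}^n \nabla b(X_i,\hat\theta)\Big). \tag{B-12}$$

The last display shows that the Fisher information matrix $I(\hat\theta)$, evaluated at the Z-estimator $\hat\theta$, equals the inverse of the sandwich covariance matrix. $\square$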
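The least favorable property can be checked numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not the construction used in the thesis: it takes the scalar moment function $b(x,\theta) = x - \theta$, uses empirical likelihood weights of the form $p_i(\theta) = 1/(n(1+\lambda(x_i-\theta)))$ (the form quoted in the proof of Lemma B.2.1), and approximates $\nabla p_i$ by central finite differences. All function names are hypothetical.

```python
import random

def el_lambda(b):
    """Bisection solve of sum(b_i / (1 + lam*b_i)) = 0; needs min(b) < 0 < max(b)."""
    lo = (-1.0 / max(b)) * (1.0 - 1e-10)  # just inside the region 1 + lam*b_i > 0
    hi = (-1.0 / min(b)) * (1.0 - 1e-10)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        g = sum(bi / (1.0 + mid * bi) for bi in b)
        if g > 0.0:          # g is strictly decreasing in lam
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def el_weights(x, theta):
    """Weights p_i(theta) for the moment function b(x, theta) = x - theta."""
    b = [xi - theta for xi in x]
    lam = el_lambda(b)
    n = len(x)
    return [1.0 / (n * (1.0 + lam * bi)) for bi in b]

def fisher_information_at_mean(x, h=1e-4):
    """n * sum_i (dp_i/dtheta)^2 at theta_hat, i.e. (B-11) with p_i(theta_hat) = 1/n."""
    n = len(x)
    theta_hat = sum(x) / n
    p_plus = el_weights(x, theta_hat + h)
    p_minus = el_weights(x, theta_hat - h)
    grad = [(a - c) / (2.0 * h) for a, c in zip(p_plus, p_minus)]
    return n * sum(g * g for g in grad)

def inverse_sandwich(x):
    """D^T V^{-1} D with D = -1 and V the sample variance (divisor n)."""
    n = len(x)
    theta_hat = sum(x) / n
    V = sum((xi - theta_hat) ** 2 for xi in x) / n
    D = -1.0
    return D * (1.0 / V) * D

random.seed(0)
x = [random.gauss(1.0, 2.0) for _ in range(50)]
info = fisher_information_at_mean(x)
target = inverse_sandwich(x)
print(info, target)  # the two numbers should agree up to finite-difference error
```

In this scalar case both quantities reduce to the reciprocal of the sample variance, which is what (B-12) predicts.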

B.2 Consistency of the Biased Bootstrap for GMM Estimators

First we prove that the weights given by (2-6) exist with probability approaching one. This result extends the result of Owen (2001).

Proof of Lemma 2.3.1. Suppose to the contrary that 0 is not inside the convex hull of $\{b(X_i,\hat\theta) : i = 1,\dots,n\}$ with probability approaching one. Then there exist an $\epsilon > 0$ and a sequence $(n_k)_{k\ge 1}$ such that for all $k \ge 1$

$$P\big(0 \notin \mathrm{Co}\{b(X_j,\hat\theta_{n_k}) : j = 1,\dots,n_k\}\big) > \epsilon,$$

where $\mathrm{Co}\{b(X_j,\hat\theta_{n_k}) : j = 1,\dots,n_k\}$ denotes the convex hull generated by $\{b(X_j,\hat\theta_{n_k}) : j = 1,\dots,n_k\}$. Let

$$A_{n_k} = \{\omega : 0 \notin \mathrm{Co}\{b(X_j,\hat\theta_{n_k})(\omega) : j = 1,\dots,n_k\}\}.$$

Then for all $\omega \in A_{n_k}$, there exists $\eta_\omega \in \mathbb{R}^q$ such that $\|\eta_\omega\| = 1$ and $\eta_\omega^T b(X_j,\hat\theta_{n_k})(\omega) > 0$ for all $j = 1,\dots,n_k$. From this, it follows that for all $\omega \in A_{n_k}$,

$$P^\omega_{n_k}(\eta_\omega^T x > 0) = \frac{1}{n_k}\sum_{j=1}^{n_k} I\{\eta_\omega^T b(X_j,\hat\theta_{n_k})(\omega) > 0\} = 1, \tag{B-13}$$

where $P^\omega_{n_k}$ is the empirical distribution of $\{b(X_j,\hat\theta_{n_k})(\omega) : j = 1,\dots,n_k\}$. Consequently, for all $\omega \in A_{n_k}$,

$$\sup_{\|\eta\|=1} P^\omega_{n_k}(\eta^T x > 0) = 1. \tag{B-14}$$

Using a Glivenko-Cantelli result given by Theorem 19.4 of van der Vaart (1998, p. 270), the following convergence holds with probability 1:

$$\lim_{n}\sup_{\|\eta\|=1}\big|P_n(\eta^T x > 0) - P(\eta^T Z > 0)\big| = 0, \tag{B-15}$$

where $Z \sim N(0, V)$. From (B-15), it follows that

$$\sup_{\|\eta\|=1} P_n(\eta^T x > 0) \to \sup_{\|\eta\|=1} P(\eta^T Z > 0), \tag{B-16}$$

almost surely as $n \to \infty$. We take now $B = \limsup_k A_{n_k} = \bigcap_{k\ge 1}\bigcup_{m\ge k} A_{n_m}$. Since for all $m \ge 1$, $P(A_{n_m}) > \epsilon$, it follows that $P(B) \ge \epsilon$. For almost all $\omega \in B$, from (B-14) and (B-16), it follows that

$$\limsup_{n}\sup_{\|\eta\|=1} P^\omega_n(\eta^T x > 0) = 1 = \sup_{\|\eta\|=1} P(\eta^T Z > 0). \tag{B-17}$$

Since $V$ is positive definite (Assumption 2.3.3), we know that $\eta^T V\eta > 0$ for all $\eta$ such that $\|\eta\| = 1$. Thus

$$P(\eta^T Z > 0) = \tfrac{1}{2} \quad \text{for all } \|\eta\| = 1,$$

hence $\sup_{\|\eta\|=1} P(\eta^T Z > 0) = 1/2$, contradicting (B-17). $\square$
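The separating direction $\eta_\omega$ in the proof has a simple geometric reading when $q = 2$: the origin fails to be strictly inside the convex hull of the moment vectors exactly when all of them fit in a closed half-plane through the origin, that is, when the largest angular gap between consecutive vectors is at least $\pi$. The following sketch illustrates that check for two-dimensional points; the function name is hypothetical and the code assumes that no point coincides with the origin.

```python
import math

def origin_strictly_inside_hull_2d(points):
    """True if (0, 0) is in the interior of the convex hull of the 2-D points.

    Assumes no point equals the origin. The origin is interior exactly when
    the largest angular gap between consecutive directions is below pi;
    a gap above pi means some unit vector eta has eta^T b_j > 0 for all j.
    """
    angles = sorted(math.atan2(y, x) for x, y in points)
    if len(angles) < 3:
        return False
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps) < math.pi

surrounding = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
one_sided = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]
print(origin_strictly_inside_hull_2d(surrounding))
print(origin_strictly_inside_hull_2d(one_sided))
```

For general $q$ the same question is a linear feasibility problem; the angular test is used here only because it needs no external solver.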

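The next lemma gives the order of the Lagrange multiplier $\lambda$, and generalizes Theorem 3.2 of Owen (2001, pp. 219-222).

Lemma B.2.1. Under Assumptions 2.3.1-2.3.3, $\lambda = O_p(n^{-1/2})$.

Proof. Using (2-6) for the particular case when $\rho = 0$, it follows that the weights are given by $p_i(\hat\theta) = (n(1 + \lambda^T b(X_i,\hat\theta)))^{-1}$, where $\lambda$ satisfies $\sum_{i=1}^n p_i(\hat\theta)\,b(X_i,\hat\theta) = 0$. Hence

$$\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)\Big[1 - \frac{\lambda^T b(X_i,\hat\theta)}{1 + \lambda^T b(X_i,\hat\theta)}\Big] = 0. \tag{B-18}$$

Hence,

$$\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta) = \sum_{i=1}^n \frac{b(X_i,\hat\theta)\,b(X_i,\hat\theta)^T\lambda}{n(1 + \lambda^T b(X_i,\hat\theta))}. \tag{B-19}$$

Let $\tilde S = \sum_{i=1}^n \frac{b(X_i,\hat\theta)b(X_i,\hat\theta)^T}{n(1+\lambda^T b(X_i,\hat\theta))}$ and $S = \frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)b(X_i,\hat\theta)^T$, so that from (B-19) it follows that $\tilde S\lambda = \frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)$. Since $p_i > 0$, we obtain $1 + \lambda^T b(X_i,\hat\theta) > 0$, hence

$$\begin{aligned}
\frac{\lambda^T}{\|\lambda\|}S\lambda
&= \frac{\lambda^T}{\|\lambda\|}\,\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)b(X_i,\hat\theta)^T\lambda \\
&\le \frac{\lambda^T}{\|\lambda\|}\sum_{i=1}^n \frac{b(X_i,\hat\theta)b(X_i,\hat\theta)^T\lambda}{n(1+\lambda^T b(X_i,\hat\theta))}\Big(1 + \|\lambda\|\max_{i=1,\dots,n}\|b(X_i,\hat\theta)\|\Big) \\
&\le \|\tilde S\lambda\|\,(1 + \|\lambda\|W_n) \\
&= \Big\|\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)\Big\|\,(1 + \|\lambda\|W_n),
\end{aligned} \tag{B-20}$$

where

$$W_n = \max_{i=1,\dots,n}\|b(X_i,\hat\theta)\|.$$

From the last display, we obtain

$$\|\lambda\|\Big(\frac{\lambda^T S\lambda}{\|\lambda\|^2} - W_n\Big\|\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)\Big\|\Big) \le \Big\|\frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)\Big\|. \tag{B-21}$$

Since

$$S = \frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)b(X_i,\hat\theta)^T = V + o_p(1), \qquad \frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta) = O_p(n^{-1/2}),$$

and by Lemma B.2.2 we have $W_n = o_p(n^{1/2})$, from (B-21) it follows that $\lambda = O_p(n^{-1/2})$. $\square$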

The next lemma gives the order of $W_n$ defined above.

Lemma B.2.2. Under Assumptions 2.3.1-2.3.3, we have $W_n = o_p(n^{1/2})$, where $W_n$ is given by

$$W_n = \max_{i=1,\dots,n}\|b(X_i,\hat\theta)\|. \tag{B-22}$$

Proof. Using Assumption 2.3.2, by the mean value theorem and the triangle inequality, it follows that

$$\|b(X_i,\hat\theta)\| \le \|b(X_i,\theta_0)\| + \|\hat\theta - \theta_0\|\,k(X_i),$$

with probability one. Hence, the following inequality holds with probability one:

$$\max_{i=1,\dots,n}\|b(X_i,\hat\theta)\| \le \max_{i=1,\dots,n}\|b(X_i,\theta_0)\| + \|\hat\theta - \theta_0\|\max_{i=1,\dots,n}k(X_i). \tag{B-23}$$

Using Lemma 11.2 of Owen (2001, p. 218), it follows that

$$\max_{i=1,\dots,n}\|b(X_i,\theta_0)\| = o_p(n^{1/2}) \quad\text{and}\quad \max_{i=1,\dots,n}k(X_i) = o_p(n^{1/2}).$$

Using that $\hat\theta = \theta_0 + O_p(n^{-1/2})$, from (B-23) we have $W_n = o_p(n^{1/2})$. $\square$

The following lemma provides us with an order of convergence that will be needed in the proofs of the following theorems.

Lemma B.2.3. Under Assumptions 2.3.1-2.3.3,

$$\max_{i=1,\dots,n}\frac{1}{1 + \lambda^T b(X_i,\hat\theta)} = O_p(1).$$

Proof. The following inequality holds on the set $D_n = \{\|\lambda\|W_n < 1\}$:

$$\frac{1}{1 + \lambda^T b(X_i,\hat\theta)} \le \frac{1}{1 - \|\lambda\|W_n}. \tag{B-24}$$

Since $\lambda = O_p(n^{-1/2})$ and $W_n = o_p(n^{1/2})$, it follows that $\|\lambda\|W_n = o_p(1)$ and $P(D_n) \to 1$. From (B-24), we obtain that $\max_{i=1,\dots,n}\frac{1}{1+\lambda^T b(X_i,\hat\theta)} = O_p(1)$. $\square$

The next proof shows the consistency of the biased bootstrap for the sample mean of the moment conditions.

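Proof of Theorem 2.3.1. Conditional on $X_1, X_2, \dots$, the term $\frac{1}{n}\sum_{i=1}^n b(X_i^*,\hat\theta)$ is the sample mean of $n$ iid random variables. Using the definition of $F_{\hat\theta}$, the (conditional) mean and variance of these observations are

$$E_{\hat\theta}\big[b(X_i^*,\hat\theta)\big] = \sum_{i=1}^n p_i\,b(X_i,\hat\theta) = 0 \tag{B-25}$$

and

$$\mathrm{Var}_{\hat\theta}\big[b(X_i^*,\hat\theta)\big] = \sum_{i=1}^n p_i\,b(X_i,\hat\theta)b(X_i,\hat\theta)^T. \tag{B-26}$$

First, let us show that $\sum_{i=1}^n p_i\,b(X_i,\hat\theta)b(X_i,\hat\theta)^T \xrightarrow{P} V$ as $n \to \infty$. We rewrite (B-26) as

$$\sum_{i=1}^n p_i\,b(X_i,\hat\theta)b(X_i,\hat\theta)^T = \frac{1}{n}\sum_{i=1}^n b(X_i,\hat\theta)b(X_i,\hat\theta)^T + \sum_{i=1}^n\Big(p_i - \frac{1}{n}\Big)b(X_i,\hat\theta)b(X_i,\hat\theta)^T. \tag{B-27}$$

From Assumption 2.3.2 it follows that the following uniform law of large numbers holds, being an application of Theorem 19.4 of van der Vaart (1998, p. 270):

$$\sup_{\theta}\Big\|\frac{1}{n}\sum_{i=1}^n b(X_i,\theta)b(X_i,\theta)^T - V(\theta)\Big\| \xrightarrow{P} 0 \quad\text{as } n \to \infty,$$

where $V(\theta) = E_F[b_\theta b_\theta^T]$. Now, since $\hat\theta \xrightarrow{P} \theta_0$ and $V = V(\theta_0)$, it follows that the first term on the right side of (B-27) converges in probability to the asymptotic covariance $V$ as $n \to \infty$ (for this we also use the fact that $V(\theta)$ is continuous at $\theta_0$, which follows from Assumption 2.3.2). For the second term of (B-27), using the fact that $p_i = (n(1 + \lambda^T b(X_i,\hat\theta)))^{-1}$ from (2-9), we obtain the following relations (we use $\|\cdot\|$ to denote the Euclidean norm for vectors and the induced operator norm for matrices):

$$\begin{aligned}
\Big\|\sum_{i=1}^n (p_i - 1/n)\,b(X_i,\hat\theta)b(X_i,\hat\theta)^T\Big\|
&\le \frac{1}{n}\sum_{i=1}^n \frac{\|\lambda\|\,\|b(X_i,\hat\theta)\|^3}{1 + \lambda^T b(X_i,\hat\theta)} \\
&\le \|\lambda\|\max_{i=1,\dots,n}\frac{1}{1+\lambda^T b(X_i,\hat\theta)}\;\frac{1}{n}\sum_{i=1}^n\|b(X_i,\hat\theta)\|^3 \\
&= O_p(n^{-1/2})\,O_p(1)\,o_p(n^{1/2}) = o_p(1),
\end{aligned} \tag{B-28}$$

where the next to last relation follows from Lemmas B.2.1-B.2.3.

Since the $X_i^*$s are sampled from $F_{\hat\theta}$, which changes with $n$, the Central Limit Theorem for triangular arrays is used to complete the proof. It suffices to show that for every $\epsilon > 0$ (van der Vaart, 1998, p. 20)

$$E_{\hat\theta}\big[\|b(X_i^*,\hat\theta)\|^2\,I(\|b(X_i^*,\hat\theta)\| > \epsilon n^{1/2})\big] \xrightarrow{P} 0 \quad\text{as } n \to \infty. \tag{B-29}$$

The following relations are true:

$$\begin{aligned}
E_{\hat\theta}\big[\|b(X_i^*,\hat\theta)\|^2\,I(\|b(X_i^*,\hat\theta)\| > \epsilon n^{1/2})\big]
&= \sum_{i=1}^n p_i\|b(X_i,\hat\theta)\|^2\,I(\|b(X_i,\hat\theta)\| > \epsilon n^{1/2}) \\
&\le \sum_{i=1}^n p_i\|b(X_i,\hat\theta)\|^2\,I(W_n > \epsilon n^{1/2}).
\end{aligned} \tag{B-30}$$

As before, $\sum_{i=1}^n p_i\|b(X_i,\hat\theta)\|^2 \xrightarrow{P} E\|b_{\theta_0}\|^2$. From Lemma B.2.2, $I(W_n > \epsilon n^{1/2}) \xrightarrow{P} 0$, and hence (B-29) follows from (B-30).

Let $\hat G_n$ denote the conditional distribution of $\sqrt{n}\,b_n^*(\hat\theta)$ given the sample, and let $G = N(0, V)$. Using a subsequence argument for convergence in probability (Kallenberg, 2002, Lemma 4.2, p. 63), it follows that for any subsequence $(m_n) \subset (n)$, there is a further subsequence $(l_n) \subset (m_n)$ such that (B-29) holds along $(l_n)$ almost surely. Using the Central Limit Theorem for triangular arrays along $(l_n)$, it follows that $\delta(\hat G_{l_n}, G) \to 0$ almost surely, where $\delta$ is a metric on the space of distributions that metrizes weak convergence. From this, using again the subsequence criterion, it follows that $\delta(\hat G_n, G) \xrightarrow{P} 0$. In other words, we have shown that any subsequence of $\{\hat G_n\}$ has a further subsequence that converges in distribution to $G$ almost surely, so that the sequence converges in distribution in probability. $\square$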

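We now prove the (conditional) uniform convergence in probability result that will be used in proving the consistency of the GMM estimators (and their bootstrap versions) in Theorem 2.3.2.

Proof of Lemma 2.3.2. Without loss of generality, we suppose $q = 1$. Using the same arguments as in Example 19.8 of van der Vaart (1998), there exists a finite sequence of open balls $U_j$, $j = 1,\dots,m$, such that $\Theta \subset \bigcup_j U_j$, $b_{U_j}(x) \le b(x,\theta) \le b^{U_j}(x)$ for all $\theta \in U_j$ and for all $x$, and $E_F|b^{U_j} - b_{U_j}| < \epsilon$, where $b_U(x) = \inf_{\theta\in U} b(x,\theta)$ and $b^U(x) = \sup_{\theta\in U} b(x,\theta)$. For all $\theta \in U_j$,

$$\frac{1}{n}\sum_{i=1}^n b_{U_j}(X_i^*) \le \frac{1}{n}\sum_{i=1}^n b(X_i^*,\theta) \le \frac{1}{n}\sum_{i=1}^n b^{U_j}(X_i^*) \tag{B-31}$$

with probability one. Using the same techniques as in Theorem 2.3.1, we have $\frac{1}{n}\sum_{i=1}^n b_{U_j}(X_i^*) = E_F b_{U_j} + o_p^*(1)$, $\frac{1}{n}\sum_{i=1}^n b^{U_j}(X_i^*) = E_F b^{U_j} + o_p^*(1)$, and $\frac{1}{n}\sum_{i=1}^n b(X_i^*,\theta) = E_F b_\theta + o_p^*(1)$.

Hence, for all $\theta \in U_j$, $E_F b_{U_j} + o_p^*(1) \le b_n^*(\theta) \le E_F b^{U_j} + o_p^*(1)$. Hence, for all $\theta \in U_j$,

$$\Big|\frac{1}{n}\sum_{i=1}^n b(X_i^*,\theta) - E_F b_\theta\Big| < \epsilon + o_p^*(1). \tag{B-32}$$

Taking now the supremum over $\theta \in \Theta$ in (B-32), we obtain

$$\sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{i=1}^n b(X_i^*,\theta) - E_F b_\theta\Big| < \epsilon + o_p^*(1). \tag{B-33}$$

Taking now $\epsilon \to 0$ in (B-33), we obtain the desired result. $\square$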

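Proof of Theorem 2.3.2. First, by Lemma 2.3.2, we have

$$\sup_{\theta\in\Theta}|Q_n(\theta) - Q(\theta)| = o_p(1) \tag{B-34}$$

and

$$\sup_{\theta\in\Theta}|Q_n^*(\theta) - Q(\theta)| = o_p^*(1), \tag{B-35}$$

where we use the same technique for proving the first uniform convergence result (B-34). By Corollary 1.2.1 we obtain the consistency of $\hat\theta$. Using standard techniques, this in turn would imply that $\hat\theta = \theta_0 + O_p(n^{-1/2})$, so that all conditions of Theorem 2.3.1 hold. Since $Q_n^*(\hat\theta^*) \le Q_n^*(\hat\theta) + o_p^*(1)$ (by hypothesis) and $Q_n^*(\hat\theta) = Q(\hat\theta) + o_p^*(1)$ (by Theorem 2.3.1), we obtain

$$Q_n^*(\hat\theta^*) \le Q(\hat\theta) + o_p^*(1).$$

Hence, using the fact that $Q(\hat\theta) \le Q(\theta_0) + o_p(1)$, we have

$$\begin{aligned}
Q(\theta_0) - Q(\hat\theta^*) &\ge Q(\hat\theta) - Q(\hat\theta^*) + o_p(1) \\
&\ge Q_n^*(\hat\theta^*) - Q(\hat\theta^*) + o_p^*(1) = o_p^*(1).
\end{aligned} \tag{B-36}$$

But, for every $\epsilon > 0$, there exists $\eta > 0$ such that for every $\theta$ with $\|\theta - \theta_0\| \ge \epsilon$, $Q(\theta) > Q(\theta_0) + \eta$. Thus

$$\{\|\hat\theta^* - \theta_0\| \ge \epsilon\} \subset \{Q(\hat\theta^*) > Q(\theta_0) + \eta\}.$$

Since $P_*\{Q(\hat\theta^*) > Q(\theta_0) + \eta\} \xrightarrow{P} 0$, we have $P_*\{\|\hat\theta^* - \theta_0\| \ge \epsilon\} \xrightarrow{P} 0$, and hence $\hat\theta^* = \theta_0 + o_p^*(1)$. $\square$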

The next lemma will be used in the proof of Theorem 2.3.3. It is a version of Slutsky's theorem formulated for conditional convergence in distribution in probability.

Lemma B.2.4 (Lahiri (2003, Lemma 4.1, p. 77)). For $n \in \mathbb{N}$, let $a_n^*$, $b_n^*$ and $T_n^*$ be random variables, all defined on a common probability space $(\Omega, \mathcal{F}, P)$. Suppose that $\mathcal{X}_n$ is a sub-$\sigma$-field of $\mathcal{F}$ and that there exist $\mathcal{X}_n$-measurable random variables $a$ and $b$ such that

$$P(|a_n^* - a| > \epsilon \mid \mathcal{X}_n) + P(|b_n^* - b| > \epsilon \mid \mathcal{X}_n) \xrightarrow{P} 0 \quad\text{as } n \to \infty, \tag{B-37}$$

for every $\epsilon > 0$. Also, suppose that $T_n^* \xrightarrow{d} T$ conditionally, in probability, given $\mathcal{X}_n$, where $T$ represents a $\mathcal{F}$-measurable random variable. Then,

$$a_n^* T_n^* + b_n^* \xrightarrow{d} aT + b. \tag{B-38}$$

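Proof of Theorem 2.3.3. For simplicity we take the case when $p = 1$. Let $\Psi_n^*(\theta) = \nabla b_n^*(\theta)^T W b_n^*(\theta)$. By Taylor's theorem there exists a $\bar\theta^*$ between $\hat\theta$ and $\hat\theta^*$ such that

$$0 = \Psi_n^*(\hat\theta^*) = \Psi_n^*(\hat\theta) + (\hat\theta^* - \hat\theta)\nabla\Psi_n^*(\hat\theta) + \frac{1}{2}(\hat\theta^* - \hat\theta)^2\nabla^2\Psi_n^*(\bar\theta^*), \tag{B-39}$$

where

$$\nabla\Psi_n^*(\theta) = \nabla^2 b_n^*(\theta)^T W b_n^*(\theta) + \nabla b_n^*(\theta)^T W\nabla b_n^*(\theta), \tag{B-40}$$

and

$$\nabla^2\Psi_n^*(\theta) = \nabla^3 b_n^*(\theta)^T W b_n^*(\theta) + 3\nabla^2 b_n^*(\theta)^T W\nabla b_n^*(\theta). \tag{B-41}$$

Since $\nabla^2 b_n^*(\hat\theta) = E + o_p^*(1)$, $b_n^*(\hat\theta) = o_p^*(1)$, and $\nabla b_n^*(\hat\theta) = D + o_p^*(1)$, it follows that

$$\nabla\Psi_n^*(\hat\theta) = D^T W D + o_p^*(1), \tag{B-42}$$

where $E = \mathrm{E}[\nabla^2 b(X_i,\theta_0)]$. Moreover,

$$\begin{aligned}
\|\nabla^2\Psi_n^*(\bar\theta^*)\| &\le \|\nabla^3 b_n^*(\bar\theta^*)\|\,\|W\|\,\|b_n^*(\bar\theta^*)\| + 3\|\nabla^2 b_n^*(\bar\theta^*)\|\,\|W\|\,\|\nabla b_n^*(\bar\theta^*)\| \\
&\le 4\|W\|\Big(n^{-1}\sum_{i=1}^n k(X_i^*)\Big)^2.
\end{aligned}$$

Since $\big(n^{-1}\sum_{i=1}^n k(X_i^*)\big)^2(\hat\theta^* - \hat\theta) = o_p^*(1)$, it follows that

$$(\hat\theta^* - \hat\theta)\nabla^2\Psi_n^*(\bar\theta^*) = o_p^*(1). \tag{B-43}$$

By substituting (B-42) and (B-43) in (B-39), we obtain

$$\begin{aligned}
-\Psi_n^*(\hat\theta) &= (\hat\theta^* - \hat\theta)\Big(o_p^*(1) + D^T W D + \frac{1}{2}(\hat\theta^* - \hat\theta)\nabla^2\Psi_n^*(\bar\theta^*)\Big) \\
&= (\hat\theta^* - \hat\theta)\big(D^T W D + o_p^*(1)\big).
\end{aligned} \tag{B-44}$$

Hence

$$\sqrt{n}(\hat\theta^* - \hat\theta) = -(D^T W D)^{-1}\sqrt{n}\,\Psi_n^*(\hat\theta) + o_p^*(1). \tag{B-45}$$

Using (the conditional version of) Slutsky's theorem and Theorem 2.3.1, it follows that $\sqrt{n}\,\Psi_n^*(\hat\theta) \xrightarrow{d} N(0, D^T W V W D)$, and hence

$$\sqrt{n}(\hat\theta^* - \hat\theta) \xrightarrow{d} N\big(0, (D^T W D)^{-1}D^T W V W D\,(D^T W D)^{-1}\big). \qquad\square$$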
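The limiting variance $(D^TWD)^{-1}D^TWVWD(D^TWD)^{-1}$ is easy to evaluate when $p = 1$, because $D$ is then a $q$-vector and $D^TWD$ is a scalar. The sketch below is only a worked illustration with made-up numbers (the values of `D`, `V` and the helper names are assumptions, not quantities from the thesis); it compares the identity weight matrix with the choice $W = V^{-1}$, for which the formula collapses to $(D^TV^{-1}D)^{-1}$.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gmm_asymptotic_variance(D, W, V):
    """(D'WD)^{-1} D'WVWD (D'WD)^{-1} for p = 1 and a symmetric weight matrix W."""
    WD = matvec(W, D)
    bread = dot(D, WD)               # D^T W D, a scalar because p = 1
    meat = dot(WD, matvec(V, WD))    # D^T W V W D
    return meat / bread ** 2

D = [1.0, 1.0]                        # hypothetical derivative vector (q = 2)
V = [[1.0, 0.0], [0.0, 4.0]]          # hypothetical moment covariance
identity = [[1.0, 0.0], [0.0, 1.0]]
V_inv = [[1.0, 0.0], [0.0, 0.25]]

var_identity = gmm_asymptotic_variance(D, identity, V)
var_efficient = gmm_asymptotic_variance(D, V_inv, V)
print(var_identity, var_efficient)
```

With these numbers the efficient weighting gives the smaller variance, in line with the result of Hansen (1982) recalled in Chapter I.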

Remark B.2.1. Neither recentering nor the biased bootstrap is necessary in order to assure bootstrap consistency of the GMM estimators. In other words, the ordinary uniform bootstrap consistently estimates the distribution of $\sqrt{n}(\hat\theta - \theta_0)$, a fact also confirmed by Hahn (1996).

We show now that in the case of the uniform bootstrap, the previous consistency result holds. To this end, all results from the previous proof hold, up to and including (B-45). Hence, we need to show now that

$$\sqrt{n}\,\Psi_n^*(\hat\theta) \xrightarrow{d} N(0, D^T W V W D). \tag{B-46}$$

In the same way as in the proof of Theorem 2.3.1, it is easy to prove that the following convergence holds for the uniform bootstrap:

$$\sqrt{n}\big(b_n^*(\hat\theta) - b_n(\hat\theta)\big) \xrightarrow{d} N(0, V).$$

In order to show (B-46), by (conditional) Slutsky's theorem, it is hence enough to show that

$$\sqrt{n}\,\nabla b_n^*(\hat\theta)^T W b_n(\hat\theta) = o_p^*(1).$$

This follows from the following relation:

$$\sqrt{n}\,\nabla b_n^*(\hat\theta)^T W b_n(\hat\theta) = o_p^*(1) \quad\text{(by conditional Slutsky's theorem)}.$$

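We now prove that the biased bootstrap for the J-test for overidentified restrictions is consistent.

Proof of Theorem 2.3.4. As in the proof of Theorem 2.3.3, using the Taylor expansion we obtain that

$$V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta^*) = V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) + \sqrt{n}(\hat\theta^* - \hat\theta)V^{-1/2}\nabla b_n^*(\hat\theta) + o_p^*(1). \tag{B-47}$$

From (B-45), for the particular case when $W = V^{-1}$, it follows that

$$\sqrt{n}(\hat\theta^* - \hat\theta) = -(D^T V^{-1} D)^{-1}\sqrt{n}\,\Psi_n^*(\hat\theta) + o_p^*(1). \tag{B-48}$$

From (B-47) and (B-48), we obtain

$$\begin{aligned}
V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta^*) &= V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) + \sqrt{n}(\hat\theta^* - \hat\theta)V^{-1/2}\nabla b_n^*(\hat\theta) + o_p^*(1) \\
&= V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) - V^{-1/2}D(D^T V^{-1} D)^{-1}D^T V^{-1/2}\,V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) + o_p^*(1) \\
&= \big[I - V^{-1/2}D(D^T V^{-1} D)^{-1}D^T V^{-1/2}\big]V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) + o_p^*(1).
\end{aligned} \tag{B-49}$$

Since $V^{-1/2}D(D^T V^{-1} D)^{-1}D^T V^{-1/2}$ is idempotent of rank $p = 1$, it follows that $[I - V^{-1/2}D(D^T V^{-1} D)^{-1}D^T V^{-1/2}]$ is idempotent of rank $q - p$ (remember that we consider the case $p = 1$). Moreover, from Theorem 2.3.3, it follows that $V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) \xrightarrow{d} N(0, I)$. Hence

$$\begin{aligned}
nQ_n^*(\hat\theta^*) &= n\,b_n^*(\hat\theta^*)^T V^{-1} b_n^*(\hat\theta^*) = \big[V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta^*)\big]^T V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta^*) \\
&= \big[V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta)\big]^T\big[I - V^{-1/2}D(D^T V^{-1} D)^{-1}D^T V^{-1/2}\big]V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) + o_p^*(1) \\
&\xrightarrow{d} \chi^2_{q-p} \quad\text{(by Theorem 2.3.1)}.
\end{aligned} \tag{B-50}$$

Consequently, $nQ_n^*(\hat\theta^*) \xrightarrow{d} \chi^2_{q-p}$. $\square$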

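Remark B.2.2. We will prove now that without recentering (or without applying the biased bootstrap), the bootstrap estimate of the distribution of the J-statistic is inconsistent. First we show the following (unconditional) weak convergence:

$$\sqrt{n}\begin{pmatrix} b_n^*(\hat\theta) - b_n(\hat\theta) \\ b_n(\hat\theta)\end{pmatrix} \xrightarrow{d} N(0, B), \qquad B = \begin{pmatrix} V & 0 \\ 0 & A\end{pmatrix}, \tag{B-51}$$

where $A = I - V^{-1/2}D(D^T V^{-1} D)^{-1}D^T V^{-1/2}$. For every $x, y \in \mathbb{R}^q$, using the bounded convergence theorem and the analogue of Theorem 2.3.1 for the uniform bootstrap, the following relations are true:

$$\begin{aligned}
&P\big(\sqrt{n}(b_n^*(\hat\theta) - b_n(\hat\theta)) \le x;\ \sqrt{n}\,b_n(\hat\theta) \le y\big) \\
&\quad= E\Big(I(\sqrt{n}\,b_n(\hat\theta) \le y)\,E_*\big[I(\sqrt{n}(b_n^*(\hat\theta) - b_n(\hat\theta)) \le x)\,\big|\,\mathcal{X}_n\big]\Big) \\
&\quad= E\big(I(\sqrt{n}\,b_n(\hat\theta) \le y)\,F_X(x)\big) + E\Big(I(\sqrt{n}\,b_n(\hat\theta) \le y)\big(E_*\big[I(\sqrt{n}(b_n^*(\hat\theta) - b_n(\hat\theta)) \le x)\,\big|\,\mathcal{X}_n\big] - F_X(x)\big)\Big) \\
&\quad= F_X(x)F_Y(y) + o(1) \quad\text{(by Theorem 2.3.3)},
\end{aligned} \tag{B-52}$$

where $X \sim N(0, V)$ and $Y \sim N(0, A)$. Hence, (B-51) holds, so that the following (unconditional) weak convergence holds:

$$V^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) \xrightarrow{d} N\big(0, I + V^{-1/2}AV^{-1/2}\big). \tag{B-53}$$

Hence,

$$AV^{-1/2}\sqrt{n}\,b_n^*(\hat\theta) \xrightarrow{d} N\big(0, A + AV^{-1/2}AV^{-1/2}A\big). \tag{B-54}$$

Suppose on the contrary that $nQ_n^*(\hat\theta^*) \xrightarrow{d} \chi^2_{q-p}$; then, by the bounded convergence theorem and (B-49), it follows that $n\,b_n^*(\hat\theta)^T V^{-1/2}AV^{-1/2}b_n^*(\hat\theta) \xrightarrow{d} \chi^2_{q-p}$. From (B-53), we see that $C = A + AV^{-1/2}AV^{-1/2}A$ should be idempotent of rank $q - p$ (Driscoll, 1999). Of course this is not always the case, e.g. for $V = I$, $C = 2A$, which is not idempotent.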
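The counterexample at the end of the remark can be verified directly. The sketch below takes $V = I$, $p = 1$ and an arbitrary illustrative vector `d` in place of $D$ (the numbers and helper names are assumptions made for the example), builds $A = I - D(D^TD)^{-1}D^T$, and checks that $A$ is idempotent with trace $q - p$ while $C = A + AAA$ equals $2A$ and is not idempotent.

```python
def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def complement_projection(d):
    """A = I - d d^T / (d^T d), the matrix A of the remark when V = I and p = 1."""
    q = len(d)
    s = sum(x * x for x in d)
    return [[(1.0 if i == j else 0.0) - d[i] * d[j] / s for j in range(q)]
            for i in range(q)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

d = [1.0, 2.0, 2.0]                   # hypothetical D with q = 3, p = 1
A = complement_projection(d)
AAA = matmul(matmul(A, A), A)
C = [[a + t for a, t in zip(ra, rt)] for ra, rt in zip(A, AAA)]  # C = A + A I A I A
two_A = [[2.0 * a for a in row] for row in A]
trace_A = sum(A[i][i] for i in range(len(d)))

print(max_abs_diff(matmul(A, A), A))  # close to zero: A is idempotent
print(trace_A)                        # equals q - p up to rounding
print(max_abs_diff(C, two_A))         # close to zero: C = 2A
print(max_abs_diff(matmul(C, C), C))  # far from zero: C is not idempotent
```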

B.3 Consistency Results for Bootstrapping 2SLS Estimators

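First we prove a useful lemma.

Lemma B.3.1. Under Assumptions 2.4.1-2.4.3,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i(y_i - x_i^T\hat\beta) = Q_0\,\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i\epsilon_i + o_p(1), \tag{B-55}$$

where

$$Q_0 = I - Q_{xz}^T\big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}Q_{zz}^{-1} \tag{B-56}$$

is an idempotent matrix of rank $q - p$.

Proof. Since

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i(y_i - x_i^T\hat\beta) = \frac{1}{\sqrt{n}}\sum_{i=1}^n z_i\epsilon_i - Q_{xz}^T\sqrt{n}(\hat\beta - \beta) + o_p(1) \tag{B-57}$$

and

$$\begin{aligned}
\sqrt{n}(\hat\beta - \beta) &= \sqrt{n}\big(X^T Z(Z^T Z)^{-1}Z^T X\big)^{-1}X^T Z(Z^T Z)^{-1}Z^T\epsilon \\
&= \big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}Q_{zz}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i\epsilon_i + o_p(1),
\end{aligned} \tag{B-58}$$

from (B-57) and (B-58), it follows that

$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i(y_i - x_i^T\hat\beta) &= \frac{1}{\sqrt{n}}\sum_{i=1}^n z_i\epsilon_i - Q_{xz}^T\big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}Q_{zz}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i\epsilon_i + o_p(1) \\
&= Q_0\,\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i\epsilon_i + o_p(1).
\end{aligned} \tag{B-59}$$

$\square$

Remark B.3.1. As an immediate consequence of Lemma B.3.1, it follows that

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i(y_i - x_i^T\hat\beta) \xrightarrow{d} N(0, R), \tag{B-60}$$

where

$$R = \sigma^2 Q_0 Q_{zz}Q_0^T = \sigma^2 Q_{zz} - \sigma^2 Q_{xz}^T\big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}, \tag{B-61}$$

which is a matrix of rank $q - p$.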
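The expansion in Lemma B.3.1 has an exact finite-sample counterpart: if the population matrices $Q_{xz}$ and $Q_{zz}$ are replaced by the sample sums $X^TZ$ and $Z^TZ$, then $Z^T(y - X\hat\beta)$ equals $\hat Q_0 Z^T\epsilon$ with no remainder. The sketch below checks this identity on simulated data for $p = 1$ and $q = 2$; the data-generating process, the seed and every name in the code are assumptions chosen for illustration only.

```python
import random

def inv2(M):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

random.seed(1)
n, beta = 200, 1.5
Z = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
eps = [0.8 * vi + random.gauss(0, 1) for vi in v]         # errors correlated with x
x = [z[0] + 0.5 * z[1] + vi for z, vi in zip(Z, v)]       # endogenous regressor
y = [beta * xi + ei for xi, ei in zip(x, eps)]

Szz = [[sum(z[i] * z[j] for z in Z) for j in range(2)] for i in range(2)]  # Z'Z
Szx = [sum(z[i] * xi for z, xi in zip(Z, x)) for i in range(2)]           # Z'x
Szy = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(2)]           # Z'y
Sze = [sum(z[i] * ei for z, ei in zip(Z, eps)) for i in range(2)]         # Z'eps

Szz_inv = inv2(Szz)
a = [sum(Szx[i] * Szz_inv[i][j] for i in range(2)) for j in range(2)]     # x'Z (Z'Z)^{-1}
denom = sum(a[j] * Szx[j] for j in range(2))                              # x'Z (Z'Z)^{-1} Z'x
beta_hat = sum(a[j] * Szy[j] for j in range(2)) / denom                   # 2SLS estimate

# Sample analogue of Q_0 in (B-56)
Q0 = [[(1.0 if i == j else 0.0) - Szx[i] * a[j] / denom for j in range(2)]
      for i in range(2)]

lhs = [sum(z[i] * (yi - xi * beta_hat) for z, xi, yi in zip(Z, x, y)) for i in range(2)]
rhs = [sum(Q0[i][j] * Sze[j] for j in range(2)) for i in range(2)]
print(beta_hat)
print(lhs, rhs)
```

The trace of `Q0` equals $q - p = 1$ here, consistent with the rank statement of the lemma.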

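Here is the consistency proof for the proposed biased bootstrap procedure for 2SLS estimators. The main steps of the proof are similar to the proof of consistency of GMM estimators discussed in the previous section.

Proof of Theorem 2.4.2. Write $\hat\epsilon_j = y_j - x_j^T\hat\beta$ for the 2SLS residuals. We need to show that

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n\Big(z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big) \xrightarrow{d} N(0, \sigma^2 Q_{zz}). \tag{B-62}$$

For fixed sequences $y_1, y_2, \dots$, $z_1, z_2, \dots$, the term $n^{-1}\sum_{i=1}^n\big(z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\big)$ is the mean of $n$ observations $z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j$. The (conditional) mean and variance of these (hypothetical) observations are

$$E_*\Big[z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big] = n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j = 0,$$

and

$$\mathrm{Var}_*\Big[z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big] = \frac{1}{n}\sum_{i=1}^n \hat\epsilon_i^2 z_i z_i^T - \Big(n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big)\Big(n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big)^T. \tag{B-63}$$

The following convergence holds:

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \hat\epsilon_i^2 z_i z_i^T &= \frac{1}{n}\sum_{i=1}^n \epsilon_i^2 z_i z_i^T - \frac{2}{n}\sum_{i=1}^n z_i z_i^T\epsilon_i\big(x_i^T(\hat\beta - \beta)\big) + \frac{1}{n}\sum_{i=1}^n z_i z_i^T\big(x_i^T(\hat\beta - \beta)\big)^2 \\
&= \sigma^2 Q_{zz} + o_p(1).
\end{aligned} \tag{B-64}$$

Since $(x_i^*, z_i^*, \epsilon_i^*)$ are sampled from the empirical distribution on the sample, which changes with $n$, we use the Central Limit Theorem for triangular arrays. By Proposition 2.27 of van der Vaart (1998), we need to verify that the Lindeberg condition holds, i.e., for any $\delta > 0$,

$$E_*\Big[\Big\|z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\|^2 I\Big(\Big\|z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\| > \delta n^{1/2}\Big)\Big] \xrightarrow{P} 0. \tag{B-65}$$

The following relations hold:

$$\begin{aligned}
&E_*\Big[\Big\|z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\|^2 I\Big(\Big\|z_i^*\epsilon_i^* - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\| > \delta n^{1/2}\Big)\Big] \\
&\quad= \frac{1}{n}\sum_{i=1}^n\Big\|z_i\hat\epsilon_i - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\|^2 I\Big(\Big\|z_i\hat\epsilon_i - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\| > \delta n^{1/2}\Big) \\
&\quad\le \Big[\frac{1}{n}\sum_{i=1}^n\Big\|z_i\hat\epsilon_i - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\|^2\Big]\, I\Big(\max_{j=1,\dots,n}\|z_j\hat\epsilon_j\| + \Big\|n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\| > \delta n^{1/2}\Big).
\end{aligned} \tag{B-66}$$

Using Assumptions 2.4.1-2.4.3,

$$\begin{aligned}
\max_{j=1,\dots,n}\|z_j\hat\epsilon_j\| + \Big\|n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\| &= \max_{j=1,\dots,n}\|z_j(y_j - x_j^T\hat\beta)\| + o_p(1) \\
&= \max_{j=1,\dots,n}\|z_j(y_j - x_j^T\beta) - z_j x_j^T(\hat\beta - \beta)\| + o_p(1) \\
&\le \max_{j=1,\dots,n}\|z_j\epsilon_j\| + \|\hat\beta - \beta\|\max_{j=1,\dots,n}\|z_j x_j^T\| + o_p(1) \\
&= o_p(n^{1/2}) + O_p(n^{-1/2})\,o_p(n^{1/2}) = o_p(n^{1/2}),
\end{aligned} \tag{B-67}$$

where for the last relation we used the result of Lemma 11.2 of Owen (2001, p. 218). Hence $I\big(\max_{j=1,\dots,n}\|z_j\hat\epsilon_j\| + \|n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\| > \delta n^{1/2}\big) \xrightarrow{P} 0$. Moreover, using the same arguments as in the proof of (B-64),

$$\frac{1}{n}\sum_{i=1}^n\Big\|z_i\hat\epsilon_i - n^{-1}\sum_{j=1}^n z_j\hat\epsilon_j\Big\|^2 \xrightarrow{P} \sigma^2 E\|z_i\|^2,$$

therefore (B-65) holds. As before, using the subsequence argument, it follows that any subsequence has a further subsequence that converges in distribution conditionally almost surely, so that the sequence converges in distribution conditionally in probability. $\square$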

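Proof of Theorem 2.4.3. The following relations are true:

$$\begin{aligned}
n^{1/2}(\hat\beta^* - \hat\beta) &= n^{1/2}\big((X^{*T}P_Z^*X^*)^{-1}X^{*T}P_Z^*y^* - \hat\beta\big) \\
&= n^{1/2}\big((X^{*T}P_Z^*X^*)^{-1}X^{*T}P_Z^*(X^*\hat\beta + \epsilon^*) - \hat\beta\big) \\
&= n^{1/2}(X^{*T}P_Z^*X^*)^{-1}X^{*T}P_Z^*\epsilon^*.
\end{aligned} \tag{B-68}$$

Using that $n^{-1}X^{*T}Z^* = n^{-1}\sum_{i=1}^n x_i^*z_i^{*T} = Q_{xz} + o_p^*(1)$ and $n^{-1}Z^{*T}Z^* = n^{-1}\sum_{i=1}^n z_i^*z_i^{*T} = Q_{zz} + o_p^*(1)$, we obtain

$$n^{1/2}(\hat\beta^* - \hat\beta) = \big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}Q_{zz}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i^*\epsilon_i^* + o_p^*(1). \tag{B-69}$$

Using Theorem 2.4.2, we only need to prove that

$$\big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}Q_{zz}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i(y_i - x_i^T\hat\beta) = o_p(1). \tag{B-70}$$

Using Lemma B.3.1, we obtain

$$\begin{aligned}
\big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}Q_{zz}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i(y_i - x_i^T\hat\beta) &= \big(Q_{xz}Q_{zz}^{-1}Q_{xz}^T\big)^{-1}Q_{xz}Q_{zz}^{-1}Q_0\,\frac{1}{\sqrt{n}}\sum_{i=1}^n z_i\epsilon_i + o_p(1) \\
&= 0 + o_p(1) = o_p(1).
\end{aligned} \tag{B-71}$$

Hence (B-70) holds, so that (2-44) holds. $\square$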

APPENDIX C
PROOFS OF CONSISTENCY RESULTS FROM CHAPTER III

Proof of Lemma 3.3.1. We need to show that 0 is inside the convex hull of $\{U_j : j = 1,\dots,N\}$ with probability approaching one. Suppose to the contrary that the above statement is false. Then there exist an $\epsilon > 0$ and a sequence $(N_k)_{k\ge 1}$ such that for all $k \ge 1$

$$P\big(0 \notin \mathrm{Co}\{U_j : j = 1,\dots,N_k\}\big) > \epsilon,$$

where $\mathrm{Co}\{U_j : j = 1,\dots,N_k\}$ denotes the convex hull generated by the $U_j$s. Let

$$A_{N_k} = \{\omega : 0 \notin \mathrm{Co}\{U_j(\omega) : j = 1,\dots,N_k\}\}.$$

Then for all $\omega \in A_{N_k}$, there exists $\eta_\omega \in \mathbb{R}^q$ with $\|\eta_\omega\| = 1$ such that $\eta_\omega^T U_j(\omega) > 0$ for all $j = 1,\dots,N_k$. From this, it follows that for all $\omega \in A_{N_k}$,

$$P^\omega_{N_k}(\eta_\omega^T x > 0) = \frac{1}{N_k}\sum_{j=1}^{N_k} I\{\eta_\omega^T U_j(\omega) > 0\} = 1, \tag{C-1}$$

where $P^\omega_{N_k}$ is the empirical distribution of $\{U_j(\omega) : j = 1,\dots,N_k\}$. Consequently, for all $\omega \in A_{N_k}$, $\sup_{\|\eta\|=1} P^\omega_{N_k}(\eta^T x > 0) = 1$. Using a Glivenko-Cantelli result given by Theorem 19.4 of van der Vaart (1998, p. 270), the following convergence holds with probability one:

$$\lim_{n}\sup_{\|\eta\|=1}\big|P_n(\eta^T x > 0) - P(\eta^T Z > 0)\big| = 0, \tag{C-2}$$

where $Z \sim N(0, \Sigma_\infty)$. From (C-2), it follows that

$$\sup_{\|\eta\|=1} P_n(\eta^T x > 0) \to \sup_{\|\eta\|=1} P(\eta^T Z > 0),$$

almost surely as $n \to \infty$. We take now $B = \limsup_k A_{N_k} = \bigcap_{k\ge 1}\bigcup_{m\ge k} A_{N_m}$. Since for all $m \ge 1$, $P(A_{N_m}) > \epsilon$, it follows that $P(B) \ge \epsilon$. For almost all $\omega \in B$,

$$\limsup_{k}\sup_{\|\eta\|=1} P^\omega_{N_k}(\eta^T x > 0) = 1 = \sup_{\|\eta\|=1} P(\eta^T Z > 0). \tag{C-3}$$

Since $\Sigma_\infty$ is positive definite (Assumption 3.3.3), we know that $\eta^T\Sigma_\infty\eta > 0$ for all $\eta$ such that $\|\eta\| = 1$. Thus

$$P(\eta^T Z > 0) = \tfrac{1}{2} \quad\text{for all } \|\eta\| = 1,$$

hence $\sup_{\|\eta\|=1} P(\eta^T Z > 0) = 1/2$, contradicting (C-3). $\square$

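Let $W_N = \max_{i=1,\dots,N}\|U_i\|$. The next lemmas give the order of $\lambda$ and $W_N$, and generalize the results from the previous section to dependent data.

Lemma C.0.2. Under the same conditions as in Theorem 3.3.1,

$$\|\lambda\| = O_p(lN^{-1/2}). \tag{C-4}$$

Proof of Lemma C.0.2. From (3-11), it follows that

$$\frac{1}{N}\sum_{i=1}^N U_i\Big[1 - \frac{\lambda^T U_i}{1 + \lambda^T U_i}\Big] = 0. \tag{C-5}$$

Hence,

$$\frac{1}{N}\sum_{i=1}^N U_i = \sum_{i=1}^N \frac{U_i U_i^T\lambda}{N(1 + \lambda^T U_i)}. \tag{C-6}$$

Let $\tilde S = \sum_{i=1}^N \frac{U_i U_i^T}{N(1+\lambda^T U_i)}$ and $S = \frac{1}{N}\sum_{i=1}^N U_i U_i^T$, so that from (C-6) it follows that $\tilde S\lambda = \frac{1}{N}\sum_{i=1}^N U_i$. Since $p_i > 0$, we obtain $1 + \lambda^T U_i > 0$, hence

$$\begin{aligned}
\frac{\lambda^T}{\|\lambda\|}S\lambda &= \frac{\lambda^T}{\|\lambda\|}\,\frac{1}{N}\sum_{i=1}^N U_i U_i^T\lambda \\
&\le \frac{\lambda^T}{\|\lambda\|}\sum_{i=1}^N \frac{U_i U_i^T\lambda}{N(1+\lambda^T U_i)}\Big(1 + \|\lambda\|\max_{i=1,\dots,N}\|U_i\|\Big) \\
&\le \|\tilde S\lambda\|\,(1 + \|\lambda\|W_N) \\
&= \Big\|\frac{1}{N}\sum_{i=1}^N U_i\Big\|\,(1 + \|\lambda\|W_N),
\end{aligned}$$

where $W_N = \max_{i=1,\dots,N}\|U_i\|$. From the last display, we obtain

$$\|\lambda\|\Big(\frac{\lambda^T S\lambda}{\|\lambda\|^2} - W_N\Big\|\frac{1}{N}\sum_{i=1}^N U_i\Big\|\Big) \le \Big\|\frac{1}{N}\sum_{i=1}^N U_i\Big\|. \tag{C-7}$$

Using the similar results of Lahiri (2003, p. 52), the following orders hold:

$$\frac{l}{N}\sum_{i=1}^N U_i U_i^T = \Sigma_\infty + o_p(1) \quad\text{and}\quad \frac{1}{N}\sum_{i=1}^N U_i = O_p(N^{-1/2}).$$

Together with the result of the next lemma that $W_N = o(l^{-1}N^{1/2})$, from (C-7) we obtain the following order for the Lagrange multiplier $\lambda$:

$$\|\lambda\| = O_p(lN^{-1/2}). \tag{C-8}$$

$\square$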

Lemma C.0.3. Under the same conditions as in Theorem 3.3.1,

$$W_N = o(N^{1/2}l^{-1}). \tag{C-9}$$

Proof of Lemma C.0.3. Since

$$\|U_i(\hat\theta)\| \le \|U_i(\theta_0)\| + l^{-1}\sum_{j=i}^{i+l-1}k(X_j)\,\|\hat\theta - \theta_0\|, \tag{C-10}$$

it follows that

$$\max_{i=1,\dots,N}\|U_i(\hat\theta)\| \le \max_{i=1,\dots,N}\|U_i(\theta_0)\| + \|\hat\theta - \theta_0\|\max_{i=1,\dots,N} l^{-1}\sum_{j=i}^{i+l-1}k(X_j). \tag{C-11}$$

Hence, it is enough to show that $\max_{i=1,\dots,N}\|U_i(\theta_0)\| = o(l^{-1}N^{1/2})$. Since for any integer $i \ge 1$, $E[\|b(X_i,\theta_0)\|^{2+\delta}] < \infty$, it follows that for any $A > 0$, $\sum_{n\ge 1} P(\|b(X_1,\theta_0)\|^{2+\delta} > An) < \infty$. Using the (strict) stationarity property, the $b(X_i,\theta_0)$s are identically distributed, hence

$$\sum_{n\ge 1} P\big(\|b(X_n,\theta_0)\|^{2+\delta} > An\big) < \infty. \tag{C-12}$$

Therefore, by the Borel-Cantelli lemma, $\|b(X_n,\theta_0)\| \le A^{1/(2+\delta)}n^{1/(2+\delta)}$ eventually, for any $A > 0$. This implies that for every $i = 1,\dots,N$,

$$\|U_i(\theta_0)\| = \Big\|l^{-1}\sum_{j=i}^{i+l-1}b(X_j,\theta_0)\Big\| \le A^{1/(2+\delta)}N^{1/(2+\delta)}$$

eventually, for any $A > 0$. Consequently,

$$V_N = \max_{i=1,\dots,N}\|U_i(\theta_0)\| \le A^{1/(2+\delta)}N^{1/(2+\delta)}$$

eventually, for any $A > 0$. Taking $A$ arbitrarily close to 0 in a countable set, we obtain $V_N = o(N^{1/(2+\delta)})$ with probability one. Obviously, $N^{1/(2+\delta)} \le N^{1/2}l^{-1}$ for $l = O(N^{\delta/(4+2\delta)})$. Taking now $l = O(N^{\delta/(4+2\delta)})$ we finally obtain $V_N = o(N^{1/2}l^{-1})$ with probability one. Hence $W_N = o(N^{1/2}l^{-1})$ with probability one. $\square$

In the proof of Theorem 3.3.1, we will also need the following result.

Lemma C.0.4. Under the same conditions as in Theorem 3.3.1,

$$\max_{i=1,\dots,N}\frac{1}{1 + \lambda^T U_i} = O_p(1).$$

Proof of Lemma C.0.4. The following inequality holds on the set $D_N = \{\|\lambda\|W_N < 1\}$:

$$\frac{1}{1 + \lambda^T U_i} \le \frac{1}{1 - \|\lambda\|W_N}. \tag{C-13}$$

Since $\lambda = O_p(lN^{-1/2})$ and $W_N = o_p(l^{-1}N^{1/2})$, it follows that $P(D_N) \to 1$ and $\|\lambda\|W_N = o_p(1)$. From (C-13), $\max_{i=1,\dots,N}\frac{1}{1+\lambda^T U_i} = O_p(1)$. $\square$

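Proof of Theorem 3.3.1. Let $U_{l,i} = l^{-1/2}\sum_{j=i}^{i+l-1}b(X_j,\hat\theta)$ be the (scaled) average of moments at $\hat\theta$ in block $B_i$ and let $U^*_{l,i} = l^{-1/2}\sum_{j=i}^{i+l-1}b(X_j^*,\hat\theta)$ be its bootstrap version. For a fixed sequence $X_1, X_2, \dots$, the term $b_n^*(\hat\theta) = b^{-1}\sum_{i=1}^b U_i^*$ is the mean of $b$ (hypothetical) observations. Conditionally on $X$, the $U_i^*$, $i = 1,\dots,b$, are iid, with $P_*(U_i^* = U_j) = p_j$ for all $j = 1,\dots,N$. Using the definition of $F_{\hat\theta}$, the (conditional) mean and variance of these observations are

$$E_{\hat\theta}[U_i^*] = \sum_{i=1}^N p_i U_i = 0 \tag{C-14}$$

and

$$\mathrm{Var}_{\hat\theta}[U_i^*] = \sum_{i=1}^N p_i U_i U_i^T. \tag{C-15}$$

Since $\sqrt{n}\,b_n^*(\hat\theta) = b^{-1/2}\sum_{i=1}^b l^{1/2}U_i^* = b^{-1/2}\sum_{i=1}^b U^*_{l,i}$, it follows that

$$\mathrm{Var}_{\hat\theta}[U^*_{l,i}] = \sum_{i=1}^N p_i U_{l,i}U_{l,i}^T. \tag{C-16}$$

Using the same result as in Lahiri (2003, p. 52), we obtain

$$\frac{1}{N}\sum_{i=1}^N U_{l,i}U_{l,i}^T \xrightarrow{P} \Sigma_\infty. \tag{C-17}$$

In order to show that $\mathrm{Var}_{\hat\theta}[U^*_{l,i}] \xrightarrow{P} \Sigma_\infty$, we only need to show that

$$\sum_{i=1}^N p_i U_{l,i}U_{l,i}^T - \frac{1}{N}\sum_{i=1}^N U_{l,i}U_{l,i}^T = o_p(1). \tag{C-18}$$

To this end, we use the following inequalities:

$$\begin{aligned}
\Big\|\sum_{i=1}^N\Big(p_i - \frac{1}{N}\Big)U_{l,i}U_{l,i}^T\Big\| &\le \frac{1}{N}\sum_{i=1}^N\frac{\|\lambda\|\,\|U_i\|\,\|U_{l,i}\|^2}{1 + \lambda^T U_i} \\
&\le \|\lambda\|\max_{i=1,\dots,N}\|U_i\|\max_{i=1,\dots,N}\frac{1}{1+\lambda^T U_i}\;\frac{1}{N}\sum_{i=1}^N\|U_{l,i}\|^2 \\
&= O_p(lN^{-1/2})\,o_p(l^{-1}N^{1/2})\,O_p(1)\,O_p(1) = o_p(1),
\end{aligned}$$

where, for the last inequality, Lemmas C.0.2-C.0.4 are used. Since $E_{\hat\theta}[U^*_{l,i}] = 0$ and $\mathrm{Var}_{\hat\theta}[U^*_{l,i}] \xrightarrow{P} \Sigma_\infty$, the central limit theorem for triangular arrays is used. We only need to verify that the Lindeberg condition holds, i.e. for every $\epsilon > 0$,

$$E_{\hat\theta}\big[\|U^*_{l,i}\|^2 I(\|U^*_{l,i}\| > b^{1/2}\epsilon)\big] \xrightarrow{P} 0. \tag{C-19}$$

Since the $U^*_{l,i}$, $i = 1,\dots,b$, are conditionally iid, it follows that

$$\begin{aligned}
E_{\hat\theta}\big[\|U^*_{l,i}\|^2 I(\|U^*_{l,i}\| > b^{1/2}\epsilon)\big] &= \sum_{i=1}^N p_i\|U_{l,i}\|^2 I(\|U_{l,i}\| > b^{1/2}\epsilon) \\
&\le \sum_{i=1}^N p_i\|U_{l,i}\|^2 I\Big(\max_{i=1,\dots,N}\|U_{l,i}\| > b^{1/2}\epsilon\Big).
\end{aligned}$$

Since $\max_{i=1,\dots,N}\|U_{l,i}\| = o_p(N^{1/2}l^{-1/2}) = o_p(b^{1/2})$, we have

$$I\Big(\max_{i=1,\dots,N}\|U_{l,i}\| > b^{1/2}\epsilon\Big) \xrightarrow{P} 0.$$

Since $\sum_{i=1}^N p_i\|U_{l,i}\|^2 \xrightarrow{P} E\|b(X_i,\theta_0)\|^2$ (similarly to (C-17)), it follows that (C-19) holds. Using the same subsequence argument as before, it follows that any subsequence has a further subsequence that converges weakly, conditionally almost surely, so that the full sequence converges weakly, conditionally in probability. $\square$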
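The block quantities used in this proof can be computed with a few lines of code. The sketch below is an illustration only: it assumes a scalar moment function, overlapping (moving) blocks $B_i = (X_i,\dots,X_{i+l-1})$ with $N = n - l + 1$ blocks, and it reads $U_i$ as the block average so that $U_{l,i} = l^{1/2}U_i$, matching the scaling $\sqrt{n}\,b_n^*(\hat\theta) = b^{-1/2}\sum_i l^{1/2}U_i^*$ above. The function name and the toy input are hypothetical.

```python
import math

def block_statistics(b_values, l):
    """Return (U, U_l) for overlapping blocks of length l.

    b_values[j] stands for b(X_j, theta_hat); U[i] is the block average
    and U_l[i] = sqrt(l) * U[i] is the scaled block sum l^{-1/2} * sum.
    """
    n = len(b_values)
    N = n - l + 1                      # number of overlapping blocks (assumed scheme)
    U = [sum(b_values[i:i + l]) / l for i in range(N)]
    U_l = [math.sqrt(l) * u for u in U]
    return U, U_l

b_values = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy moment values
U, U_l = block_statistics(b_values, l=2)
W_N = max(abs(u) for u in U)           # the quantity W_N of Lemma C.0.3 in the scalar case
print(U)
print(U_l)
print(W_N)
```

The weights $p_i$ of the blockwise biased bootstrap are then attached to the $N$ values in `U`, in the same way as the weights of Appendix B are attached to individual observations.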

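Proof of Theorem 3.3.3. For simplicity we consider the case when $q = 1$. By Taylor's theorem, there exists a $\bar\theta^*$ on the line between $\hat\theta$ and $\hat\theta^*$ such that

$$0 = b_n^*(\hat\theta^*) = b_n^*(\hat\theta) + \nabla b_n^*(\hat\theta)(\hat\theta^* - \hat\theta) + \frac{1}{2}(\hat\theta^* - \hat\theta)^T\nabla^2 b_n^*(\bar\theta^*)(\hat\theta^* - \hat\theta). \tag{C-20}$$

Using Assumption 3.3.2, it follows that $\|\nabla^2 b_n^*(\bar\theta^*)\| \le n^{-1}\sum_{i=1}^n k(X_i^*)$. Hence

$$(\hat\theta^* - \hat\theta)^T\nabla^2 b_n^*(\bar\theta^*) = o_p^*(1).$$

Since $\nabla b_n^*(\hat\theta) = D + o_p^*(1)$, from (C-20) it follows that

$$\begin{aligned}
-b_n^*(\hat\theta) &= \Big(D + o_p^*(1) + \frac{1}{2}(\hat\theta^* - \hat\theta)^T\nabla^2 b_n^*(\bar\theta^*)\Big)(\hat\theta^* - \hat\theta) \\
&= \big(D + o_p^*(1)\big)(\hat\theta^* - \hat\theta).
\end{aligned} \tag{C-21}$$

Hence,

$$\sqrt{n}(\hat\theta^* - \hat\theta) = -D^{-1}\sqrt{n}\,b_n^*(\hat\theta) + o_p^*(1). \tag{C-22}$$

Using Theorem 3.3.1, it follows that $\sqrt{n}\,b_n^*(\hat\theta) \xrightarrow{d} N(0, \Sigma_\infty)$, and hence, using the (conditional) Slutsky's theorem, we have

$$\sqrt{n}(\hat\theta^* - \hat\theta) \xrightarrow{d} N\big(0, D^{-1}\Sigma_\infty(D^{-1})^T\big). \tag{C-23}$$

$\square$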

REFERENCES

PAGE 13

Hall ( 2005 ). E[b(X1;0)]=EF[b0]=0:(1{1)HavingdenedthemomentconditionthatidentiestheGMMmodel,wealsorequirethat0begloballyidentiable( Hall 2005 ). EF[b]6=0forall2with6=0:(1{2)Usually,inorderfor0tobegloballyidentiable,itisnecessaryforthedimensionofb(the\basicscore")toequalorexceedthedimensionoftheparametervector,i.e.pq.Henceforthwewillassumethatthisconditionissatised.Whenp=q,theZ-estimatorfor0=(F)isobtainedbyconsideringthesampleversionof( 1{1 ), EFn[b]=1 13

PAGE 14

vanderVaart 1998 ,Theorem5.21,p.52).Thesandwichestimator^,thenonparametricestimatoroftheasymptoticcovariancematrixoftheZ-estimator,isgivenby ^=1 1{3 )usuallyhasnosolutionwhenq>p,eventhoughthepopulationequation( 1{1 )issatised.OnewayaroundthisproblemistoconsidertheGMMestimator. ^n=argmin2bn()TWnbn();(1{6)wherebn()=1 Hansen ( 1982 )rstshowedthatunderclassicalregularityconditions,theGMMestimatorisconsistentandasymptoticallynormal.Usually,consistencyresultsforGMMestimatorsareobtainedassumingoneoftwotypesofconditions.Eithertheparameterspaceisassumedtobecompact,inwhichcasethecriterionfunctionisrequiredtobecontinuous 14

PAGE 15

Hayashi 2000 ,pp.456{458).WepresenthereageneralconsistencyresultbasedonTheorem5.7of vanderVaart ( 1998 ,p.45).Supposethatforevery>0 sup2jQn()Q()jP!0;(1{8)and inf:k0kQ()>Q(0):(1{9)Thenanysequenceofestimators^nwithQn(^n)Qn(0)+oP(1)convergesinprobabilityto0.Itiseasytoseethatifiscompactandb()iscontinuousthentheglobalidentiabilityproperty( 1{2 )andcondition( 1{9 )areequivalent.Moreover,ifwefurtherassumethatb(x;)iscontinuousinforallxandEF[sup2kbk]<1,thenusingaUniformLawofLargeNumbers( vanderVaart 1998 ,Example19.8,p.272),wealsocanestablish( 1{8 ).Consequently,wehaveestablishedthefollowingcorollary(forotherproofs,see,e.g., Hall ( 2005 ,p.67), DavidsonandMackinnon ( 1993 ,p.592),and Matyas ( 1999 ,p.13)). 1.2.1 and 1.2.2 ,respectively.SupposefurtherthatRpiscompact,thatb(x;)iscontinuousin,andthatEF[sup2kbk]<1.ThentheGMMestimatordenedin( 1{6 )isweaklyconsistent,i.e.^nP!0:Underadditionalregularityconditions, Hansen ( 1982 )showedthatGMMestimatorsareasymptoticallynormallydistributed.Wepresentherethemostcommonconditionsfoundinthestatisticalandeconometricsliteraturetoassureasymptoticnormality.If,inadditiontotheconditionsofCorrolary 1.2.1 ,b(x;)iscontinuouslydierentiablewith 15

PAGE 16

Hansen ( 1982 )provesthatthechoiceofWn=bV1givesthesmallestasymptoticvarianceofGMMestimators,where^VisanyconsistentestimatorofV(notethoughthatWnisthenrandom).InthiscasetheGMMestimatoriscalledecient.However,inordertoestimate^V,wemustrstestimate0.Hansenproposedatwo-stepGMMestimator,whichisobtainedbyrstcomputinganinitial(inecient)estimator~nof0usinganarbitraryweightmatrixW(usuallytheidentity),andthenletting^n=argminbn()T^V(~n)1bn(),where^V()isanestimatoroftheasymptoticcovarianceofbn().Usually,^V()=n1Pni=1b(Xi;)b(Xi;)T,orthecenteredversion^V()=n1Pni=1(b(Xi;)bn())(b(Xi;)bn())Tinsemiparametricestimation,butthecovariancecanbealsomodeledparametrically.Theasymptoticdistributionofthetwo-stepGMMestimatoris Hansenetal. ( 1996 )andisdenedtobe ^n=argminbn()T^V()1bn():(1{11)ItcanbeshownthatthecontinuousupdatingGMMestimatorisconsistentandhasthesameasymptoticpropertiesasthetwo-stepGMMestimatorgivenin( 1{10 ). 16

PAGE 17

Hansen ( 1982 ),usesthestatistic 1{12 )fortestingtheover-identifyingrestrictions( 1{1 )satises ( 2003 )analyzedtheGMMmethodologywithinparametric,semiparametric,andnonparametricframeworks.Theyidentifythreelevelsofnestedmodels.Attherstlevel,wendtheparametricmodel.Here,ingeneral,weidentifysomebasicscoresthatdenetheparameterofinterest.Forexample,foraunivariatedistribution,wecantakeb1(x;)=xasabasicscoreforthemeanandb2(x;)=sign(x):5asabasicscoreforthemedian.Inthecaseofnormaldistributionwithknownvariance,sayN(0;1),bothbasicscoresprovideconsistentestimatorsoflocationparameter0.Inthiscase,anyecientGMMestimatorbasedonthebasicscoresb1andb2isfullyecient,inthesensethatitsasymptoticvarianceequalstheinverseFisherinformation.Iftheparametricmodelisnotcorrect, LindsayandQu ( 2003 )denethesemiparametricmodelimpliedbythemomentconditionstobethesetofalldistributionsFcompatiblewiththescores,i.e.,thesetofallFforwhichthereexists2suchthatEFb=0.ThentheGMMestimatorsareconsistentunderweakeningofmodelassumptions.Ontheotherhand,iftheparametricmodelholdsandthescoresarecorrectlyspeciedinthesemiparametricmodel,thenanyecientGMMestimatorisrstorderequivalenttotheMLE. 17

PAGE 18

LindsayandQu ( 2003 )callQn()\thequadraticinferencefunction"andtheyshowthatthisinferencefunctionmimicsthepropertiesoflog-likelihood(evenwhenthesemiparametricmodelisfalse): vanderVaart ( 1998 ), Sering ( 1980 ),and Huber ( 1981 ).Inthiscase,theparameterofinterest0isgivenasamaximizerofthepopulation\criterionfunction"m() ^=argmax2mn():(1{17) 18

PAGE 19

1{15 )isconcave( Hayashi 2000 ,p.468). Hayashi ( 2000 ,p.458)presentsapropositionfrom NeweyandMcFadden ( 1994 ,pp.2133{2134)thatestablishesconsistencyforM-estimatorsbasedonconcavecriterionfunctions.Proposition 1.3.1 belowgivessucientconditionsforconsistencyunderweakerassumptions.Itsproof,giveninAppendix A ,doesnotrequirethatthesamplecriterionfunctionsconvergeinprobabilityontheentireparameterspace,nordoesitdependontheresultthatM-estimatorscorrespondingtocontinuouscriterionfunctionsareconsistent.Moreover,itdoesnotrequiretheparameterspacetobeconvex,asin Hayashi ( 2000 ,p.458).Theproofusesonlythefactthatpointwiseconvergenceinprobabilityforconcavefunctionsonanopensetimpliesuniformconvergenceoncompactsubsetsofthatopenset( Pollard 1991 ,sec.6)andiseasilyadaptedtoaccommodatenuisanceparameters,asinProposition 1.3.2 (proveninAppendix A ).Forfurtherdetailsandapplicationstosomenancialriskmeasures,see GiurcanuandTrindade ( 2006 ).DenotebyS(t0;)=ft2Rq:ktt0k=gandB(t0;)=ft2Rq:ktt0kgthesphere,respectively,theclosedballcenteredatt0ofradius.

PAGE 20

1.4.1ReviewonEmpiricalLikelihoodEmpiricallikelihood,introducedinaseriesofpapersby Owen ( 1988 1990 1991 ),isanonparametricapproachtoinferencewithapplicationsinmanyareasofstatistics.Empiricallikelihoodallowstheuseoflikelihoodmethodswithoutnecessarilyassumingthatthedataaredrawnfromaparametricfamilyofdistributions.As Owen ( 2001 ,p.1)remarksinhiscomprehensivemonographonempiricallikelihood,theadvantagesofempiricallikelihoodarisebecause\itcombinesthereliabilityofthenonparametricmethodswiththeexibilityandeectivenessofthelikelihoodapproach".Headoptedthename\empiricallikelihood"becausetheempiricaldistributionofthedataplaysanimportantrole.Aswewilldescribelaterinthissection,alternativenonparametriclikelihoodratioshavebeendevelopedthatarealsobasedontheempiricaldistributionfunction,andas Owen ( 2001 ,p.2)statesinhisbook,\empiricallikelihood...isdistinguishedmorebybeingalikelihoodthanbybeingempirical".Themainideaofempiricallikelihoodistoconstructalikelihoodratiostatisticfortheparameterofinterestusingamultinomialdistributionontheobserveddata. Owen ( 1988 )provesananalogueofWilksTheorem,obtaininga2asymptoticdistributionforthenegativeoftwicethelogempiricallikelihoodratio.As Owen ( 1988 )remarks,thisresultissurprisingbecausethenumberofnuisanceparameters,n1,increaseswiththesamplesize. 20

PAGE 21

Owen 1988 ). Owen ( 1991 )appliedempiricallikelihoodtoregressionmodelsbyextendingthetheoryforindependentandnon-identicaldistributedobservations. Kolaczyk ( 1994 )madefurtherextensionstogeneralizedlinearmodels.Empiricallikelihoodwassoonrecognizedasaseriouscompetitortocontemporarymethodsofnonparametricinference,suchasthebootstrap. HallandLaScala ( 1990 )arguethat\empiricallikelihood...deservesaprominentplaceinthemodernstatistician'sarmoryofcomputer-intensivetools".Theyidentifythefollowingadvantagesofempiricallikelihoodoverthebootstrap: (1) empiricallikelihoodprovidescondenceregionsformultivariateparameters,andtheshapesaredatadriven,beingconcentratedinplaceswherethedensityoftheparameterestimatorisgreatest; (2) empiricallikelihoodisBartlettcorrectable,i.e.,acorrectionforthemeanreducesthecoverageerrorofcondenceregionsbasedonempiricallikelihoodfromordern1toordern2; (3) empiricallikelihooddoesnotrequireestimationofscaleorskewness; (4) empiricallikelihoodregionsarerangepreservingandtransformationrespecting. Imbens ( 2002 )givesareviewofrecentdevelopmentsconcerningmaximumempiricallikelihoodestimators,whicharedenedasthemaximizersoftheempiricallikelihoodovertheparameterspace.Heremarksthattheirmainmeritisthattheycircumventtheneedtoestimatethecovariancematrix(theasymptoticcovarianceofthesamplecriterionfunction)necessaryinthecaseofGMMestimatorsandalsotheyhaveaniceinformation-theoreticinterpretation.Itturnsoutthatthecontinuously-updatingGMMestimatorisaparticularcaseofthegeneralizedempiricallikelihoodestimatorobtained 21


Owen (2001) is dedicated to this subject. By reducing to independence, he shows how to apply empirical likelihood in the case of AR(1) processes. Extensions to arbitrary order autoregressive processes are easily obtained, and it would be interesting to see how inference based on empirical likelihood competes with the classical approaches in time series. Kitamura (1997) introduced a blockwise empirical likelihood that preserves the dependence structure within the observations. By extending results from Qin and Lawless (1994), he derives an efficient estimator by maximizing the blockwise empirical likelihood. This estimator is called the maximum blockwise empirical likelihood estimator and is the counterpart for time series of the maximum empirical likelihood estimator.

$$E_F[b_{\theta_0}] = 0. \tag{1-18}$$

If $p = q$ we obtain the classical Z-estimation model, and if $q > p$, then we obtain the GMM model, as discussed in previous sections. We consider both models at the same time, and, when necessary, we underline the differences. We first give some definitions.


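Owen (2001, pp. 11-12) that we can treat the data as if there were no ties, by considering the probabilities associated with observations and not with their values. If we represent any distribution $G \ll F_n$ by a vector of weights $p = (p_1, \ldots, p_n)$, where $p_i = G\{X_i\}$, then the empirical likelihood ratio can be written in an equivalent form as

Owen (1988) proves the following fundamental result. Let $X_1, \ldots, X_n \in R^d$ be independent random vectors with common distribution $F$. For $\theta \in R^p$ and $x \in R^d$, let $b(x;\theta) \in R^p$. Let $\theta_0$ be such that $\mathrm{Cov}_F[b_{\theta_0}]$ is finite and has rank $p$. If $\theta_0$ satisfies $E_F[b_{\theta_0}] = 0$, then $-2\log(R(\theta_0)) \rightsquigarrow \chi^2_p$. As Owen (2001, p. 41) remarks in his monograph, an interesting aspect of this asymptotic result is that it does not include conditions on $b(x;\theta)$ nor on $E_F b_\theta$. Let $F_{c,n} = \{G : R(G) \ge c,\ G \ll F_n\}$ and $S_{c,n} = \bigcup_{G \in F_{c,n}} \{t : E_G[b_t] = 0\}$. Owen's result suggests taking $c = \exp(-\chi^2_p(1-\alpha)/2)$, where $\chi^2_p(1-\alpha)$ is the $1-\alpha$ quantile of $\chi^2_p$, in order to obtain an asymptotic $100(1-\alpha)\%$ confidence region for $\theta_0$, i.e.

$$P(\theta_0 \in S_{c,n}) \to 1-\alpha, \quad \text{as } n \to \infty. \tag{1-23}$$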


Owen (1990) suggests using the quantiles of a scaled Fisher's F distribution, $\frac{(n-1)p}{n-p} F_{p,n-p}$, instead of a $\chi^2_p$ when constructing the empirical likelihood confidence regions for $\theta_0$. Hall and La Scala (1990), DiCiccio and Romano (1989), and DiCiccio et al. (1991) show that empirical likelihood is Bartlett correctable. Bartlett correction amounts to a mean correction of the empirical likelihood in order to achieve a coverage accuracy of order $O_P(n^{-2})$. Empirical likelihood is Bartlett correctable because the third and the fourth cumulants of the components of the signed root of the empirical likelihood are of orders at most $O_P(n^{-3/2})$ and $O_P(n^{-2})$, respectively. Consequently, the empirical likelihood admits an expansion (1-24), involving a constant $a$ and a $\chi^2_q$ quantile, with remainder of order $O_P(n^{-2})$. Since the algebraic expression for $a$ is fairly complicated, Hall and La Scala (1990) suggest a bootstrap approximation.

Qin and Lawless (1994) extend empirical likelihood for Z-estimators to models where the dimension of the estimating equation is greater than that of the parameter. They define the maximum empirical likelihood estimator (MELE) to be the maximizer of the empirical likelihood ratio over the parameter space, i.e.,

$$\tilde\theta = \arg\max_{\theta \in \Theta} R(\theta). \tag{1-25}$$


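Owen (1990), Qin and Lawless (1994) prove that the optimal weights are given by $p_i(\theta) = \{n(1 + \lambda^T b(X_i;\theta))\}^{-1}$ for any $\theta$ in a small neighborhood of the true parameter $\theta_0$, where $\lambda$ is the Lagrange multiplier of the system (1-22) and satisfies

$$\sum_{i=1}^n \frac{b(X_i;\theta)}{1 + \lambda^T b(X_i;\theta)} = 0.$$

Qin and Lawless (1994) show that $\tilde\theta$, the MELE of $\theta_0$, is asymptotically normally distributed, with the same limiting distribution as the efficient GMM estimator. Specifically, assume that $E[b(X_i;\theta_0) b(X_i;\theta_0)^T]$ is positive definite, that $b(x;\theta)$ is two times continuously differentiable in a neighborhood of $\theta_0$ where $\|b(x;\theta)\|$, $\|\nabla b(x;\theta)\|$, and $\|\nabla^2 b(x;\theta)\|$ are bounded by some $F$-integrable function $G(x)$, and that the rank of $E[\nabla b(X_i;\theta_0)]$ is $p$. Then

(1) with probability 1, $R(\theta)$ attains its maximum at a value $\tilde\theta$ in the interior of the ball $\|\theta - \theta_0\| \le n^{-1/3}$ as $n \to \infty$,
(2)
(3)

Baggerly (1998) has generalized empirical likelihood by considering the family of Cressie-Read power divergences. The Cressie-Read power divergence between $F_n$ and $F_p$ (or equivalently between the vector of uniform probabilities $p_0 = n^{-1}\mathbf{1}$ and $p$) is given by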
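As a rough numerical illustration of the weight formula $p_i(\theta) = \{n(1 + \lambda^T b(X_i;\theta))\}^{-1}$, the following Python sketch solves the Lagrange multiplier equation for the scalar mean constraint $b(x;\theta) = x - \theta$. The function name `el_weights`, the toy data, and the safeguarded Newton iteration are illustrative assumptions and are not taken from the text.

```python
import numpy as np

def el_weights(b, max_iter=100, tol=1e-12):
    """Empirical likelihood weights p_i = 1 / (n * (1 + lam^T b_i)).

    b is an (n, q) array whose rows are b(X_i; theta). Hypothetical helper:
    a plain Newton iteration on sum_i b_i / (1 + lam^T b_i) = 0, with step
    halving so that every 1 + lam^T b_i stays positive.
    """
    n, q = b.shape
    lam = np.zeros(q)
    for _ in range(max_iter):
        denom = 1.0 + b @ lam
        score = (b / denom[:, None]).sum(axis=0)     # left side of the multiplier equation
        if np.linalg.norm(score) < tol:
            break
        jac = -(b / denom[:, None] ** 2).T @ b       # derivative of score with respect to lam
        step = np.linalg.solve(jac, -score)          # Newton step
        t = 1.0
        while np.any(1.0 + b @ (lam + t * step) <= 0.0):
            t *= 0.5                                 # shrink the step to keep all weights positive
        lam = lam + t * step
    p = 1.0 / (n * (1.0 + b @ lam))
    return p, lam

# Toy data (assumed for illustration) and a candidate mean inside the data range.
x = np.array([0.1, 0.5, 0.9, 1.3, 2.0, 2.4, 3.1, 3.3])
theta = 1.5
p, lam = el_weights((x - theta)[:, None])
log_el_ratio = np.sum(np.log(len(x) * p))            # log R(theta); -2 times this is Owen's statistic
```

At the solution the weights are positive, sum to one, and satisfy $\sum_i p_i (X_i - \theta) = 0$, which is what the constraint system requires; a production implementation would likely need a more careful line search.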


Baggerly defined the empirical divergence for the mean for the whole class of Cressie-Read discrepancy measures by generalizing (1-22) in the case of the mean, i.e. for every $\mu \in R$

Baggerly (1998) showed that Owen's result on the asymptotics of empirical likelihood holds for any member of the Cressie-Read family. Jing and Wood (1996) show that the exponential empirical likelihood (obtained when the Cressie-Read index equals 1) is not Bartlett correctable, and later, Baggerly shows that empirical likelihood is the only element in the Cressie-Read family of divergences that admits a Bartlett correction. Nevertheless, in his simulation results, the use of a scaled Fisher's F distribution gives better coverage than both asymptotic $\chi^2$ and Bartlett corrected confidence regions. Corcoran (1998) extends the class of discrepancy statistics that admit Bartlett corrections. Smith (1997) introduces the class of generalized empirical likelihood estimators defined as saddle points of an optimization problem defined in terms of a normalized convex function. Newey and Smith (2004) show that this class of estimators generalizes the class of minimum Cressie-Read discrepancy estimators.


Efron (1979), the bootstrap has provided new methods to applied statistics and motivated a myriad of new theoretical results. In a recent edition of Statistical Science dedicated to the bootstrap, Efron (2003) remarked that the bootstrap was initially introduced as an alternative to the jackknife for estimating the bias and variance of an estimator. Since then, many new applications have been developed, including bootstrap confidence intervals and significance tests, bootstrap bias reduction, and bootstrap diagnostics. In reviewing recent developments in bootstrapping, Davison et al. (2003) mentioned several new directions of research, including highly accurate parametric bootstrap procedures, theoretical properties for the nonparametric bootstrap with unequal probabilities, the m-out-of-n bootstrap, bootstrap failures and remedies for superefficient estimators, significance testing, and resampling for dependent data. Books that deal with both theoretical properties and applications of bootstrap include Hall (1992), Efron and Tibshirani (1993), Shao and Tu (1995), Davison and Hinkley (1997), and Lahiri (2003).

The main idea of the bootstrap is to estimate the sampling distribution of a statistic by its (re)sampling distribution obtained under an estimate of the underlying distribution of the data. This definition applies to both parametric and nonparametric problems as follows. Suppose $X = \{X_1, \ldots, X_n\}$ is an iid sample from a distribution $F$ and we want to estimate the sampling distribution of a statistic $T_n = T_n(X_1, \ldots, X_n)$. Let $\mathcal{L}(T_n \mid F)$ represent the distribution of $T_n$ when the data $X_i$'s are drawn from $F$. In the parametric bootstrap, we consider a parametric model $\{F_\theta;\ \theta \in \Theta\}$ for the underlying $F$, with $F = F_{\theta_0}$ and $\theta_0 \in \Theta$, where $\Theta$ is the parameter space. In order to apply the bootstrap principle to this problem, we first estimate the parameter $\theta_0$ by a consistent (and efficient) estimator $\hat\theta$ (usually the MLE) and take $F_{\hat\theta}$ as our estimate
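The bootstrap principle can be sketched for the nonparametric case, where $F$ is estimated by the empirical distribution of the sample and the resampling distribution of the statistic is approximated by simulation. The Python sketch below is only an illustration: the data, the statistic (the sample mean), the number of resamples `B`, and the function name `bootstrap_distribution` are assumptions and do not come from the text.

```python
import numpy as np

def bootstrap_distribution(x, statistic, B=2000, seed=0):
    """Return B bootstrap replicates of `statistic` under uniform resampling.

    Hypothetical helper: each row of `idx` indexes one resample of size n
    drawn with replacement from the observed sample.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    idx = rng.integers(0, n, size=(B, n))
    return np.array([statistic(x[row]) for row in idx])

# Toy sample (assumed for illustration).
x = np.array([2.1, 3.4, 1.9, 5.0, 4.2, 2.8, 3.7, 3.1, 4.4, 2.5])
t_star = bootstrap_distribution(x, np.mean)

se_boot = t_star.std(ddof=1)                       # bootstrap standard error of the mean
ci_percentile = np.quantile(t_star, [0.05, 0.95])  # simple percentile interval
```

The empirical distribution of `t_star` plays the role of the Monte Carlo approximation to the resampling distribution; for the sample mean its standard deviation should be close to the plug-in value $\{n^{-2}\sum_i (X_i - \bar X)^2\}^{1/2}$.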


(Hall 1992, pp. 9-11), and bootstrap estimates are usually found by Monte Carlo simulation. In this case, for a given integer $B$, we consider $B$ simulated bootstrap resamples $X^*_1, \ldots, X^*_B$, and we compute the statistic $T^*_b = T(X^*_b)$ for every resample $X^*_b$. Then, we approximate $\mathcal{L}(T^*_n \mid F_n)$ by the empirical distribution of the $T^*_b$'s, i.e.,

$$\mathcal{L}(T^*_n \mid F_n) \approx \frac{1}{B} \sum_{b=1}^B \delta_{T^*_b}.$$

Hall and Martin (1988). Generally, no more than two levels of bootstrap are employed, and such procedures are referred to in the literature as "the iterated bootstrap", "the nested bootstrap", and "the double bootstrap". The computational effort required by the iterated bootstrap is generally taken to be the square of that required for one level of bootstrapping, which is already computationally involved. Applications of the iterated bootstrap include calibration of confidence regions (Beran 1987, 1988; Hall 1992), bias reduction (Hall and


, 1988; Davison and Hinkley 1997), variance stabilization (Tibshirani 1988; Hall and Presnell 1999), and bootstrap diagnostics (Efron 1992; Canty et al. 2000).

Often, the bootstrap provides more accurate results than first order asymptotic approximations, without making use of the complex algebra of higher order expansions. The analysis of the performance of bootstrap procedures generally relies on Edgeworth expansions. The Edgeworth expansion is a refinement of the Central Limit Theorem that gives the form of the error terms in an asymptotic approximation of the distribution of the sample mean, extended by Bhattacharya and Ghosh (1978) to smooth functions of means. Bootstrap versions of these expansions were developed by Hall (1988) in order to analyse the performance of different types of bootstrap confidence intervals. As a consequence, the bootstrap often gives rejection and coverage probabilities that are more accurate than approximate large sample methods.

Generally, a good bootstrap procedure should satisfy two desiderata: it should yield an asymptotically consistent estimate of the sampling distribution of a statistic and, for small to moderate sample sizes, it should outperform asymptotic approximations. Shao and Tu (1995) identify some techniques used in the statistical literature to establish bootstrap consistency. The most popular technique is called imitation. The main idea here is to imitate the proof for obtaining the asymptotic distribution of the statistic in order to extend it to the bootstrap. The consistency of the bootstrap for the sample mean can be proven this way (van der Vaart 1998, Theorem 23.4). Then, by applying the delta method for bootstrap (van der Vaart 1998, Theorem 23.5), the consistency of the bootstrap for smooth functions of sample means follows. Another technique uses Berry-Esseen type inequalities. The main advantage of this method is that one can also obtain the rate of convergence of the bootstrap estimates. Unfortunately, it is often difficult to obtain such inequalities. In order to show consistency of the bootstrap, one usually considers a metric that metrizes weak convergence in the space of distribution functions (Huber 1981),


Bickel and Freedman (1981) used Mallows' distance in proving consistency of the bootstrap for t-statistics, von Mises functionals, and empirical processes. Freedman (1981, 1984) also uses Mallows' distance to prove the consistency of the bootstrap distribution of ordinary least squares (OLS) and two stage least squares (2SLS) estimators in certain linear regression models. Using empirical processes theory, bootstrap consistency can be also established for Hadamard differentiable statistical functionals using the consistency of the empirical bootstrap for the Brownian Bridge (Giné 1990), and more recently van der Vaart (1998, pp. 332-334). Using a functional delta method, one can prove bootstrap consistency for a myriad of statistics, such as sample quantiles, L-estimators, and nonparametric goodness-of-fit statistics such as the von Mises and Kolmogorov-Smirnov statistics.

To illustrate, using the same notations as above, let $X$ be an iid sample from $F$ and let $G_n = \mathcal{L}(T_n \mid F)$ be the distribution of $T_n$. Having a bootstrap resample $X^*$, let $T^*_n = T_n(X^*_1, \ldots, X^*_n)$ be the bootstrap version of $T_n$ corresponding to $X^*$ and let $\hat G_n = \mathcal{L}(T^*_n \mid F_n)$ be its bootstrap distribution. Suppose that $T_n$ converges weakly to $G$, so that $\rho(G_n, G) \to 0$ as $n \to \infty$, where $\rho$ is a metric that metrizes weak convergence of distributions, such as Lévy distance or bounded Lipschitz distance (Huber 1981). While the asymptotic theory approximates the distribution $G_n$ by its limit $G$, the bootstrap approximates $G_n$ by its bootstrap distribution $\hat G_n$ (which is a random distribution). If the sequence of random distributions $\hat G_n$ converges to $G$ in probability, i.e. $\rho(\hat G_n, G) \xrightarrow{P} 0$, then


B.2, we present a (conditional) version of Slutsky's theorem, that shows that if $T_n \rightsquigarrow_P T$, $B_n = B + o_P(1)$, and $C_n = C + o_P(1)$, then $B_n T_n + C_n \rightsquigarrow_P BT + C$. Finally, using a classical subsequence argument (Kallenberg 2002, Lemma 4.2, p. 63), note that $\rho(\hat G_n, G) \xrightarrow{P} 0$ if and only if for every subsequence $\hat G_{m_n}$ of $\hat G_n$, there exists a further subsequence $\hat G_{l_n}$ of $\hat G_{m_n}$, such that $\rho(\hat G_{l_n}, G) \xrightarrow{a.s.} 0$ as $n \to \infty$.

1.2.2 Presnell (2002) shows that these families are least favorable for parameters that are smooth functions of the mean vector. In Theorem 2.2.1 below, we prove that this result is also true for the Z-estimation model (1-3). Here, by least favorable we mean that, when evaluated at the maximum likelihood estimator, the inverse of the Fisher information matrix corresponding to the pseudo-parametric families defined in Section 1.2.2 is equal to the sandwich matrix, the usual nonparametric estimator of the asymptotic variance of the Z-estimator. In this sense, inference about the parameters is not made artificially easier by restricting attention to these families of distributions (Stein 1956).

There are a number of bootstrap confidence interval procedures based on least favorable families of distributions. Efron's tilted bootstrap confidence intervals are based on inverting a bootstrap hypothesis test carried out within a least favorable family. The automatic percentile method (DiCiccio and Romano 1989) can be applied in conjunction with a least favorable family approach (DiCiccio and Romano 1990). Also, DiCiccio and Romano (1990) and Hall and Presnell (1999) suggest estimating the variance of $\hat\theta$ as a


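minimize $D(p)$ ... (1-27). Note that ... Presnell (2002), define ... (2-1), for a fixed $\theta$ ... Setting to zero the derivative with respect to $p_i$ of the Lagrangian, formed from $D(p)$, the constraint $\sum_{i=1}^n p_i - 1 = 0$ with multiplier $\lambda_0$, and the constraint $\sum_{i=1}^n p_i\, b(X_i;\theta) = 0$ with multiplier $\lambda$, gives a stationarity condition relating a power of $n p_i$ to $\lambda_0$ and $\lambda^T b(X_i;\theta)$.

Multiplying (2-4) by $p_i$ and summing over $i$, gives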


(2-4) may be solved for $p_i$ using (2-5), yielding

(2-6), i.e. $F_\theta = \sum_{i=1}^n p_i(\theta)\,\delta_{X_i}$, where $\delta_x$ is the unit point mass at $x$. We show in the appendix that $\mathcal{F} = \{F_\theta\}_{\theta \in \Theta}$, the resulting family of weighted empirical distributions indexed by $\theta$, has the nonparametric least favorable property (Stein 1956; DiCiccio and Romano 1990).

1.2.2, evaluated at the maximum likelihood estimator, is equal to the sandwich matrix, the usual nonparametric estimator of the asymptotic variance covariance matrix of the Z-estimator.

Tauchen 1986). In an effort to improve finite sample performance of these tests, Hall and Horowitz (1996) devised a modified bootstrap procedure that can be also applied with dependent data. Their procedure recenters the moment conditions so that the modified moment conditions are fulfilled by the sample.


(1996) showed that the bootstrap t-test for individual parameters is asymptotically consistent without recentering. However, if the uncentered moment conditions are used, then the bootstrap estimate of the distribution of the J-test statistic of over-identifying restrictions is not consistent, a fact also claimed by Brown and Newey (2002) and Lindsay and Qu (2003). As noted by Lindsay and Qu (2003), using the uncentered moment conditions, the null hypothesis of mean-zero of the scores does not hold for the sample, so that one is sampling under an alternative hypothesis in which the mean of the scores is not zero. This may have a small impact on the critical values if the null is true (if the bootstrap is consistent under the null), but will have a great impact if the null is false; consequently, the size of the bootstrap test might be nearly correct but its power may be poor.

Hall and Presnell (1999) devised the biased bootstrap in order to improve the performance of a wide range of statistical procedures for hypothesis testing, shrinkage, robust estimation and variance stabilization. We use this methodology in order to construct a semiparametric bootstrap for GMM. A biased bootstrap for GMM was introduced by Brown and Newey (2002), though from another perspective. They argue that the weighted empirical distribution that minimizes the Kullback-Leibler divergence to the empirical distribution while satisfying the GMM equations (they call it "the empirical likelihood distribution") attains a semiparametric efficiency lower bound (Brown and Newey 1998). To show the consistency of the weighted bootstrap for t and J statistics, they claimed that because the empirical likelihood distribution is more efficient than the empirical distribution of the sample, the corresponding weighted bootstrap is both consistent and more efficient than the uniform bootstrap.

Our approach is similar. We first introduce a family of weighted empirical distributions $\mathcal{F} = \{F_\theta\}_{\theta \in \Theta}$ defined by (2-1), associated with the GMM model (1-1). We bootstrap as if we had a parametric model: first we estimate the parameter $\theta_0 = \theta(F)$ using, say, the GMM estimator $\hat\theta$; then, for the first level of bootstrap, a generic resample


2.3.2, we will see that the biased bootstrap has certain computational advantages over the "uniform" bootstrap when the bootstrap is iterated. In this way, we mimic the parametric bootstrap for semiparametric models without any adjustment of the moment conditions and without the need to find the centering value for the bootstrap version of the statistics, as is usually required in such situations; see e.g. Shorack (1981), Freedman (1981), Hall and Horowitz (1996), and Lahiri (2003). To summarize, the main steps in the biased bootstrap for GMM are as follows:

(2-6).

(2-6).

We assume the following regularity conditions hold.
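The idea of resampling from a weighted empirical distribution that satisfies the moment conditions can be sketched as follows. This is a minimal Python illustration under stated assumptions: it uses exponential tilting (one member of the Cressie-Read family, chosen here only because its weights have a simple closed form), a hypothetical two-dimensional moment function, and a stand-in value `theta_hat` in place of an actual GMM estimate. It does not reproduce the weights (2-6) of the text.

```python
import numpy as np

def tilting_weights(b, max_iter=100, tol=1e-10):
    """Weights p_i proportional to exp(lam^T b_i) with sum_i p_i b_i = 0.

    Hypothetical helper: damped Newton minimization of log(sum_i exp(lam^T b_i)),
    whose gradient is the weighted mean of the rows of b.
    """
    n, q = b.shape
    lam = np.zeros(q)

    def objective(l):
        return np.log(np.exp(b @ l).sum())

    for _ in range(max_iter):
        w = np.exp(b @ lam)
        p = w / w.sum()
        grad = p @ b                                   # weighted mean of b under current weights
        if np.linalg.norm(grad) < tol:
            break
        hess = (b * p[:, None]).T @ b - np.outer(grad, grad)
        step = np.linalg.solve(hess, -grad)
        t, f_old = 1.0, objective(lam)
        while objective(lam + t * step) > f_old + 1e-12 and t > 1e-8:
            t *= 0.5                                   # backtrack if the objective increased
        lam = lam + t * step
    w = np.exp(b @ lam)
    return w / w.sum()

def biased_resample(x, p, rng):
    """Draw n observations with replacement from sum_i p_i * delta_{x_i}."""
    idx = rng.choice(len(x), size=len(x), replace=True, p=p)
    return x[idx]

# Toy data and a hypothetical over-identified moment function (q = 2, one parameter).
x = np.array([0.1, 0.5, 0.9, 1.3, 2.0, 2.4, 3.1, 3.3])
theta_hat = 1.6                                        # stand-in for a GMM estimate
b = np.column_stack([x - theta_hat, (x - theta_hat) ** 2 - 1.2])

p = tilting_weights(b)
rng = np.random.default_rng(0)
x_star = biased_resample(x, p, rng)
```

Because the weighted distribution satisfies the sample moment conditions exactly, the resamples are drawn under the model constraints, which is the sense in which no recentering is needed.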


1.2.1-1.2.2 (2-6). We show next the consistency of the biased bootstrap for the sample mean of criterion functions (Theorem 2.3.1); then, we give a lemma on conditional uniform convergence in probability that will be needed in proving the weak consistency of GMM estimators and their bootstrap versions. We end with a theorem on consistency of the biased bootstrap distribution of the GMM estimators (Theorem 2.3.3). Theorem 2.3.4 shows the consistency of the biased bootstrap for the J-test of over-identifying restrictions.

2.3.1-2.3.3, 0 is inside the convex hull of $\{b(X_i;\hat\theta) : i = 1, \ldots, n\}$, with probability approaching 1.

(1-1). Let $\hat\theta$ be a sequence of GMM estimators defined by (1-6) and let $F_{\hat\theta}$ be the weighted empirical distribution given by (2-6) corresponding to $\hat\theta$. Let $X^* = \{X^*_1, \ldots, X^*_n\}$ be a biased bootstrap resample. Under the assumptions 2.3.1-2.3.3, as $n \to \infty$,


2.3.1, the following uniform convergence for the biased bootstrap holds:

2.3.1, for any sequence of GMM estimators $\hat\theta$ and (biased) bootstrap estimators $\hat\theta^*$, with $Q_n(\hat\theta) \le Q_n(\theta_0) + o_P(1)$ and $Q^*_n(\hat\theta^*) \le Q^*_n(\hat\theta) + o_P(1)$, we have $\hat\theta = \theta_0 + o_P(1)$ and $\hat\theta^* = \theta_0 + o_P(1)$, and hence also that $\hat\theta^* = \hat\theta + o_P(1)$.

(1-1). Let $\hat\theta$ be a sequence of GMM estimators defined by (1-6) and let $F_{\hat\theta}$ be the weighted empirical distribution given by (2-6) corresponding to $\hat\theta$. Let $X^* = \{X^*_1, \ldots, X^*_n\}$ be a biased bootstrap resample and let $\hat\theta^*$ be the bootstrap version of $\hat\theta$ on $X^*$. Under the assumptions 2.3.1-2.3.3, as $n \to \infty$,

B.2.4 in the appendix B.2, we can substitute for the nonrandom matrix $W$ the nonparametric estimator of $V^{-1}$. As a consequence, the consistency of the bootstrap for the two-step GMM estimator follows.

2.3.1, we use the fact that the weighted empirical distribution satisfies the moment conditions. If we do not use the biased bootstrap, the conditional mean of the (uniform) bootstrap sample mean $\bar b^*_n(\hat\theta)$ contains a random element that disappears asymptotically, a result that is proven in the appendix B.2. The fact that recentering is not necessary in the case of GMM estimation has been also proven by Hahn (1996), though using another approach. As a consequence, without reweighting or recentering, the bootstrap estimate of $\mathcal{L}$


B.2 that the usual (uniform, uncentered) bootstrap distribution estimate of the J-test of over-identifying restrictions is weakly inconsistent.

2.3.3 Newton and Geyer (1995). They suggested drawing a common set of potential re-resamples from a single "design" distribution, and then using importance weighting to "recycle" these re-resamples to estimate (conditional) expectations for each first level bootstrap resample. Unfortunately, this method is applicable only to the parametric bootstrap, mostly because in the nonparametric bootstrap the support of the resample empirical distributions varies from resample to resample. Thus, the majority of samples from any candidate distribution that dominates all the resample distributions will have zero importance weights for most resamples, leading to extremely inefficient and unstable Monte Carlo estimates (Ventura 2000). Using the recycling method, Presnell and Giurcanu (2007) construct a recycling algorithm for the iterated biased bootstrap that yields a second order correct confidence interval in the smooth function of means model. At the same time, this procedure preserves the computational requirements of the single level bootstrap. The main idea of
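The importance-weighting idea behind recycling can be illustrated with a small Python sketch. It assumes two weighted empirical distributions on the same observations: a "design" weight vector `p` from which resamples are drawn once, and a second weight vector `q` under which an expectation is wanted. The data, the particular choice of `q`, and the statistic (the resample mean) are illustrative assumptions; the sketch is not the algorithm of Presnell and Giurcanu (2007).

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.1, 0.5, 0.9, 1.3, 2.0, 2.4, 3.1, 3.3])   # toy data (assumed)
n, B = len(x), 20000

p = np.full(n, 1.0 / n)          # design weights used to draw the resamples
q = np.exp(0.1 * x)              # a nearby weighted distribution (assumed for illustration)
q = q / q.sum()

idx = rng.choice(n, size=(B, n), p=p)    # B resamples of size n, drawn once from the design
t_star = x[idx].mean(axis=1)             # statistic evaluated on each resample
lr = np.prod(q[idx] / p[idx], axis=1)    # likelihood ratio of each resample, q relative to p

recycled = np.mean(t_star * lr)          # importance-weighted estimate of E_q[T]
exact = q @ x                            # exact E_q[T], available because T is the mean
```

Since both distributions put positive mass on every observation, every likelihood ratio is finite and strictly positive, which is the property that makes recycling workable for weighted empirical distributions and fails for the uniform nonparametric bootstrap, where resample supports differ.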


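$$E_{\hat\theta^*}(T) = E_{\hat\theta}\left(T\,\frac{dP_{\hat\theta^*}}{dP_{\hat\theta}}\right)$$

(2-11) is given by the Monte Carlo average of $T\,dP_{\hat\theta^*}/dP_{\hat\theta}$ over the resamples, where

$$\frac{P_{\hat\theta^*}(X^*_b)}{P_{\hat\theta}(X^*_b)} \tag{2-13}$$

is the likelihood ratio statistic corresponding to the probability models $P_{\hat\theta^*}$ and $P_{\hat\theta}$ evaluated for the resample $X^*_b$. Using the definitions of $P_{\hat\theta^*}$ and $P_{\hat\theta}$, we see that this ratio is a product, over the $n$ elements of the resample, of ratios of the corresponding weights $p_i(\hat\theta^*)$ and $p_i(\hat\theta)$.

$$E f_t(F; F_{\hat\theta}) = 0, \tag{2-15}$$

where $F_{\hat\theta}$ is the weighted empirical distribution corresponding to $\hat\theta$ and $f_t$ is a given functional. Examples of such functionals include (Hall 1992):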


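(2-15),

$$E_{\hat\theta}[f_t(F_{\hat\theta}; F_{\hat\theta^*})] = 0, \tag{2-19}$$

is obtained from (2-15) using the "plug-in rule", i.e. by substituting $F_{\hat\theta}$ for $F$ and $F_{\hat\theta^*}$ for $F_{\hat\theta}$. Usually, $E f_{T(F_{\hat\theta})}(F; F_{\hat\theta}) \ne 0$, and we might be interested in a further (additive) correction $t(F)$, the solution in $t$ to the following equation

$$E f_{T(F_{\hat\theta}) + t}(F; F_{\hat\theta}) = 0. \tag{2-20}$$

As before, let $t(F_{\hat\theta})$ be the bootstrap estimate of $t(F)$, which solves in $t$ the sample analogue of (2-20)

$$E_{\hat\theta} f_{T(F_{\hat\theta^*}) + t}(F_{\hat\theta}; F_{\hat\theta^*}) = 0. \tag{2-21}$$

This equation is again obtained using the "plug-in" rule, by substituting $F_{\hat\theta}$ for $F$ and $F_{\hat\theta^*}$ for $F_{\hat\theta}$ in (2-20). Note that $T(F_{\hat\theta^*})$ solves $E_{\hat\theta^*} f_t(F_{\hat\theta^*}; F_{\hat\theta^{**}}) = 0$, and this is where the second level of bootstrapping enters. The hope is then that $E f_{T(F_{\hat\theta}) + t(F_{\hat\theta})}(F; F_{\hat\theta}) \approx 0$. This approximation needs to be taken in the following sense: it is not that $T(F_{\hat\theta}) + t(F_{\hat\theta})$ is closer to $t(F)$ than is $T(F_{\hat\theta})$, but that $E f_{T(F_{\hat\theta}) + t(F_{\hat\theta})}(F; F_{\hat\theta})$ is closer to zero than is $E f_{T(F_{\hat\theta})}(F; F_{\hat\theta})$. We can argue as in Hall and Martin (1988, pp. 663-665) that any iteration of the biased bootstrap increases the accuracy of the estimation. Specifically, suppose that

$$E f_{T(F_{\hat\theta})}(F; F_{\hat\theta}) = c(F)\, n^{-j/2} + O_P(n^{-(j+1)/2}),$$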


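$$\frac{\partial}{\partial t} E f_{T(F_{\hat\theta}) + t}(F; F_{\hat\theta}) \Big|_{t=0} \to a \ne 0,$$

where $c$ is a "smooth functional". Then

$$E f_{T(F_{\hat\theta}) + t(F_{\hat\theta})}(F; F_{\hat\theta}) = O_P(n^{-(j+1)/2}).$$

The argument is identical to Hall and Martin (1988), using the additional fact that if $\hat\theta = \theta_0 + O_P(n^{-1/2})$ then $\|F_{\hat\theta} - F\|_\infty = O_P(n^{-1/2})$.

The recycling biased bootstrap can be used to find the Monte Carlo estimate of $T(F_{\hat\theta^*})$, the solution in $t$ of the equation

$$E_{\hat\theta^*} f_t(F_{\hat\theta^*}; F_{\hat\theta^{**}}) = 0. \tag{2-22}$$

For a given $b = 1, \ldots, B$, let $X^*_b = \{X^*_{b1}, \ldots, X^*_{bn}\}$ be a biased bootstrap resample. Let $\hat\theta^*_b = \hat\theta(X^*_b)$, $b = 1, \ldots, B$, be the version of $\hat\theta$ corresponding to $X^*_b$. Then we obtain the following recycling Monte Carlo approximation of the conditional expectation from (2-22) as in (2-12):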


2.4.1 Review on Instrumental Variables

In the classical regression model, the assumption that the errors are independent of regressors is necessary in order for OLS estimators to be consistent. In many observational studies, this orthogonality condition between the regressors and the errors is not satisfied, so new techniques have been developed to analyze such data. One such technique is the method of instrumental variables, which assumes the existence of an instrumental variable, i.e., a variable that is correlated with the regressors but uncorrelated with the errors. This technique was proposed in the econometric literature by Reiersøl (1941) and has been developed theoretically by Durbin (1954), Sargan (1958), Brundy and Jorgenson (1971), and White (1982a), among others. Extensive treatments of this methodology are given in econometric texts, such as Davidson and MacKinnon (1993), Matyas (1999), Hayashi (2000), and Hall (2005). The method of instrumental variables is widely applied to cross-sectional, panel, and time series models, and is more generally used to make causal inference in observational studies and errors-in-variables models. A variety of estimators have appeared in the econometrics literature, including two-stage least squares (2SLS), instrumental variable (IV), and two-stage instrumental variable (2SIV) estimators. All these estimators can also be viewed as particular types of GMM estimators.

Consider now the estimation of causal effects in an observational study (Angrist et al. 1996). If one wants to estimate a treatment effect, but the individuals exert some control over the treatment assignment, then differences between group means are biased. An instrumental variable is a variable that is correlated with the exposure to the treatment, but uncorrelated with the outcome after controlling for exposure to the treatment. Thus,


Angrist and Krueger 1991). Individual and institutional decisions generate a correlation between schooling and unobserved covariates, such as ability and motivation, that are related to potential earnings. For instance, if compulsory attendance laws were extended, then those who had planned to be in school less would continue to earn less due to unobserved covariates such as ability, motivation, and family background. A set of instruments in this case is a set of variables that affect schooling but not earnings, once schooling is controlled for (i.e. included in the regression equation). Angrist and Krueger (1991) use the quarter of an individual's birth as an instrument. They argued that students who are born earlier in the calendar year are typically older when they enter school than students who are born later in the year. This pattern arises because most districts do not admit students unless they attain age 6 by January 1. Consequently, children born earlier in the year attain the drop-out age after attending school for a shorter period of time than those born later in the calendar year. Other instruments used in studying the causal effect of schooling on earnings include sibling composition (Butcher and Case 1994) and proximity to a nearby college (Card 1995).

Instrumental variables have also been successfully applied in bio-statistical research. In a recent article, Newhouse and McClellan (1998) describe how instrumental variables can be applied to estimate treatment effects in an observational study when a controlled trial cannot be done. They illustrate with an application to aggressive treatment of acute myocardial infarction in the elderly. They use instrumental variables to estimate the effect of catheterization on mortality rate. As instrument, they use the "differential distance" (the additional distance, if any, beyond the distance to the nearest hospital to reach a catheterization hospital). They argue that the differential distance has no direct effect on myocardial infarction but it affects the likelihood of catheterization (the greater the


Hogan and Lancaster (2004) apply IV in order to infer causal effects in longitudinal repeated measurements. They review two methods for estimating causal effects in longitudinal data, inverse probability weighting (IPW) and instrumental variables. They apply these methods to the HERS data, a six-year natural history study that enrolled 871 HIV-infected women starting in 1993, in order to estimate the therapeutic effect of highly active antiretroviral therapy regimen (HAART) on CD4 cell count, using marginal structural modeling. In this data set, the receipt of therapy varies with time and depends on CD4 count and other covariates. They remark that both methods rely on two important assumptions: no unmeasured confounding for IPW and the reliability of the instruments for IV (they must be strongly correlated with the exposure to the HAART therapy).

Before going into further detail, we present the 2SLS estimator for the linear IV regression model. Consider the following regression model defined by

Hall (2005, pp. 34-42) can be relaxed (White 1982a), but we confine ourselves to these to simplify the presentation.


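(2-26) can be written as

$$\hat\beta = \left\{X^T Z (Z^T Z)^{-1} Z^T X\right\}^{-1} X^T Z (Z^T Z)^{-1} Z^T y. \tag{2-29}$$

Let $P_Z = Z(Z^T Z)^{-1} Z^T$ be the projection matrix onto the column space of $Z$; then the 2SLS estimator can be written in an equivalent form as

$$\hat\beta = (X^T P_Z X)^{-1} X^T P_Z y. \tag{2-30}$$

The estimator given by (2-29) is called the two-stage least squares estimator (2SLS) because it can be obtained in a two-step least squares procedure. At the first stage, regress the columns of $X$ on the column space of $Z$ and then, at the second stage, regress $y$ on the column space of the fitted values of $X$ obtained from the first stage regressions. To be more exact, for each $j \in \{1, \ldots, p\}$, consider the regressions

$$x^{(j)} = Z\pi_j + \nu_j, \quad j = 1, \ldots, p,$$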


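(2-31) is

$$\hat\beta = (\hat X^T \hat X)^{-1} \hat X^T y = (X^T P_Z X)^{-1} X^T P_Z y,$$

in agreement with (2-29). The 2SLS estimator $\hat\beta$ is also a feasible generalized least squares estimator (Hayashi 2000, p. 59). To see this, multiplying (2-28) by $Z^T$, we obtain

(2-32) is

$$\tilde\beta = (X^T Z Q_{zz}^{-1} Z^T X)^{-1} X^T Z Q_{zz}^{-1} Z^T y.$$

Now, since $n^{-1} Z^T Z \xrightarrow{a.s.} Q_{zz}$, a feasible GLS estimator of $\beta$ is obtained by substituting the consistent estimator $n^{-1} Z^T Z$ for $Q_{zz}$, which again yields $\hat\beta$ as given by (2-29).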
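The agreement between the two-stage procedure and the closed form (2-30) can be checked numerically. The Python sketch below uses a simulated data-generating process that is purely an assumption for illustration (three instruments, one endogenous regressor, true coefficient 2.0); only the two estimator formulas come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 3))                          # instruments (q = 3), assumed design
u = rng.normal(size=n)                               # shared shock that makes X endogenous
X = (Z @ np.array([1.0, 0.5, -0.5]) + u + rng.normal(size=n))[:, None]
y = 2.0 * X[:, 0] + u + rng.normal(size=n)           # true coefficient 2.0 (assumed)

# Closed form (2-30): (X' P_Z X)^{-1} X' P_Z y, with P_Z the projection onto col(Z).
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_closed = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)

# Two stages: fitted values of X from regressing on Z, then OLS of y on the fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_two_stage = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# OLS of y on X for comparison; it is inconsistent here because X is correlated with u.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

The two 2SLS computations coincide up to rounding because $\hat X = P_Z X$ and $P_Z$ is symmetric and idempotent, so $\hat X^T \hat X = X^T P_Z X$ and $\hat X^T y = X^T P_Z y$.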


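2.4.2, for all $i = 1, \ldots, n$,

$$E[g(u_i;\beta)] = 0, \tag{2-33}$$

where $\beta$ is the regression parameter in (2-28). By Assumption 2.4.1, $q \ge p$, so the $q$ moment conditions in (2-33) define a GMM model as in (1-1). Then, for a given weight matrix $W_n$, the GMM estimator is

$$\hat\beta = \arg\min_\beta\, g_n(\beta)^T W_n\, g_n(\beta), \tag{2-34}$$

where $g_n(\beta) = n^{-1} \sum_{i=1}^n (y_i - x_i^T\beta) z_i = n^{-1} Z^T (y - X\beta)$. Since $Q_n(\beta) = g_n(\beta)^T W_n g_n(\beta)$ is differentiable with respect to $\beta$, the GMM estimator $\hat\beta$ is a solution to $\nabla Q_n(\hat\beta) = 0$, i.e. to $X^T Z W_n Z^T (y - X\hat\beta) = 0$, and it is given by

$$\hat\beta = (X^T Z W_n Z^T X)^{-1} X^T Z W_n Z^T y. \tag{2-35}$$

In order to obtain an efficient GMM estimator, we need to choose $W_n$ to be a consistent estimator of the inverse of the asymptotic covariance matrix of $n^{1/2} g_n(\beta)$. Since $g_n(\beta) = n^{-1} \sum_{i=1}^n (y_i - x_i^T\beta) z_i = n^{-1} \sum_{i=1}^n \epsilon_i z_i$, using Assumptions 2.4.1-2.4.3, we obtain $n^{-1/2} \sum_{i=1}^n \epsilon_i z_i \rightsquigarrow N(0, \sigma^2 Q_{zz})$. Since $n^{-1} Z^T Z \xrightarrow{a.s.} Q_{zz}$, and constant factors in $W_n$ do not change the estimator, an efficient GMM estimator is obtained from (2-35) by substituting $(Z^T Z)^{-1}$ for $W_n$, i.e.

$$\hat\beta = (X^T P_Z X)^{-1} X^T P_Z y, \tag{2-36}$$

the same as in (2-29). Finally, if Assumptions 2.4.1-2.4.3 are fulfilled, then the 2SLS estimator is asymptotically normally distributed (Hall 2005), and in particular

$$n^{1/2}(\hat\beta - \beta) \rightsquigarrow N\left(0,\ \sigma^2 (Q_{xz} Q_{zz}^{-1} Q_{xz}^T)^{-1}\right).$$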
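The closed form (2-35) can be sketched as a small function of the weight matrix. In the Python illustration below, the function name `linear_gmm` and the simulated data are assumptions; the sketch checks that the choice $W_n = (Z^T Z)^{-1}$ reproduces the 2SLS estimator (2-36) and that, for another weight matrix, the first-order condition holds at the solution.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'y for the moments (y_i - x_i' beta) z_i."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

# Simulated data (assumed design, for illustration only).
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 3))
u = rng.normal(size=n)
X = (Z @ np.array([1.0, 0.5, -0.5]) + u + rng.normal(size=n))[:, None]
y = 2.0 * X[:, 0] + u + rng.normal(size=n)

# Weight matrix (Z'Z)^{-1}: should coincide with 2SLS.
beta_gmm = linear_gmm(y, X, Z, np.linalg.inv(Z.T @ Z))
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_2sls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)

# Identity weight matrix: a valid but generally less efficient GMM estimator.
W_id = np.eye(Z.shape[1])
beta_id = linear_gmm(y, X, Z, W_id)
foc = X.T @ Z @ W_id @ Z.T @ (y - X @ beta_id)       # first-order condition at the solution
```

Rescaling `W` by a positive constant leaves the estimator unchanged, which is why $(Z^T Z)^{-1}$ can be used in place of an estimator of $(\sigma^2 Q_{zz})^{-1}$.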


Freedman (1981), Shorack (1981), Freedman (1984), and Wu (1986). Freedman (1981) identifies two main bootstrap procedures for linear models: bootstrapping residuals and bootstrapping cases. Consider the following linear model

(2-37) is called the correlation model, using the same terminology as Hall (1992, p. 170). In this case, the pairs $(y_i, x_i)$ are resampled randomly from $\mathcal{Z}$. Then, the bootstrap estimate $\hat\beta^*$ is defined as the version of $\hat\beta$ on $\mathcal{Z}^*$. In the case of the zero-intercept regression model, Freedman (1981) and Shorack (1981) argue that residuals need to be first re-centered before resampling, in order to obtain consistent bootstrap estimates. Freedman remarks that "..., without centering,


(i) If $E x_i = 0$, then the bootstrap is (weakly) consistent:

(ii) If $E x_i \ne 0$, then the following (conditional) weak convergence does not hold:

Freedman (1984, p. 834) argues that the residuals first need to be recentered, since "As data, the residuals are not orthogonal to the instruments." This statement is potentially misleading and might lead a practitioner to recenter the residuals unnecessarily. We will show in this section that under general conditions, the uniform bootstrap estimate of the distribution of the 2SLS estimator is indeed weakly consistent without the need for re-centering. The main idea here is that it can be shown that bootstrapping the data in the 2SLS regression model is equivalent to bootstrapping cases, and bootstrapping cases in the 2SLS model is equivalent to bootstrapping efficient GMM estimators in the GMM model associated with the regression model. We have shown in the appendix that the (uniform) bootstrap is consistent for the sampling distribution of GMM estimators. Although Hahn (1996) also proved the consistency result for GMM estimators, he does not make the connection between Freedman's resampling procedure with uncentered residuals and bootstrapping the GMM estimators.


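$$\hat\epsilon_i = y_i - x_i^T \hat\beta, \tag{2-39}$$

where $x_i^T$ is the $i$-th row of the matrix $X$. Generally, the $\hat\epsilon_i$'s are not orthogonal to the instruments, i.e.,

Freedman defines the recentered residuals as the part of the residuals that is orthogonal to the column space of $Z$ (the instruments), i.e.

$$\tilde\epsilon_i = \hat\epsilon_i - z_i^T (Z^T Z)^{-1} Z^T \hat\epsilon.$$

In order to preserve the dependence structure within $v_i = (x_i, z_i, \epsilon_i)$, Freedman resamples from $\tilde{\mathcal{Z}} = \{(x_i^T, z_i^T, \tilde\epsilon_i)^T,\ i = 1, \ldots, n\}$. Let $\tilde{\mathcal{Z}}^* = \{(x^*_1, z^*_1, \tilde\epsilon^*_1), \ldots, (x^*_n, z^*_n, \tilde\epsilon^*_n)\}$ be a generic uniform resample from $\tilde{\mathcal{Z}}$. Denote by $X^*$, $Z^*$, $P_{Z^*}$, and $\tilde\epsilon^*$ the bootstrap versions of $X$, $Z$, $P_Z$, and $\tilde\epsilon$ on $\tilde{\mathcal{Z}}^*$. Define the bootstrap observations $\tilde y^*$ by

$$\tilde y^* = X^* \hat\beta + \tilde\epsilon^*. \tag{2-41}$$

The analog of $\hat\beta$ for the bootstrap resample $\tilde{\mathcal{Z}}^*$ is given by

$$\tilde\beta^* = (X^{*T} P_{Z^*} X^*)^{-1} X^{*T} P_{Z^*} \tilde y^*. \tag{2-42}$$

We now show that recentering is not necessary when bootstrapping 2SLS estimators. Let $\mathcal{Z} = \{(x_1, z_1, \hat\epsilon_1), \ldots, (x_n, z_n, \hat\epsilon_n)\}$ and let $\mathcal{Z}^* = \{(x^*_1, z^*_1, \epsilon^*_1), \ldots, (x^*_n, z^*_n, \epsilon^*_n)\}$ be a (uniform) bootstrap resample from $\mathcal{Z}$, i.e. $(x^*_1, z^*_1, \epsilon^*_1), \ldots, (x^*_n, z^*_n, \epsilon^*_n)$ are (conditionally) iid, (discrete) uniformly distributed on $\mathcal{Z}$. Denote by $X^*$, $Z^*$, $P_{Z^*}$, and $\hat\epsilon^*$ the bootstrap versions of $X$, $Z$, $P_Z$, and $\hat\epsilon$ on $\mathcal{Z}^*$. Define the bootstrap observations $y^*$ and $\hat\beta^*$, the bootstrap analog of $\hat\beta$ for the bootstrap resample $\mathcal{Z}^*$, as before. It is an easy exercise to show that resampling the $(x_i, z_i, \hat\epsilon_i)$'s with equal probabilities is equivalent to resampling the cases $(y_i, x_i, z_i)$ with equal probabilities. Therefore, since 2SLS estimators are particular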


(2-36)), and using the result that the uniform bootstrap is consistent for the distribution of the GMM estimators, we obtain that the uniform bootstrap gives consistent estimators for 2SLS estimators, without recentering the residuals. The following consistency results can be generalized to allow for heteroskedasticity of the error terms, but we limit the proof to these simpler hypotheses (White 1982a).

2.4.1-2.4.3, for every $x \in R^k$,

2.4.1-2.4.3 are fulfilled, then the bootstrap estimate of the distribution of the 2SLS estimator is asymptotically consistent, i.e.
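The claim that resampling triples of regressors, instruments, and uncentered 2SLS residuals is the same as resampling cases can be checked numerically. The Python sketch below uses a simulated data set that is an assumption for illustration; the helper name `tsls` is hypothetical. It also shows that the 2SLS residuals are generally not orthogonal to the instruments when the model is over-identified, whereas the recentered residuals are.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS estimator (X' P_Z X)^{-1} X' P_Z y, computed without forming P_Z."""
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)      # P_Z X
    return np.linalg.solve(PZX.T @ X, PZX.T @ y)

# Simulated over-identified model (assumed design): two instruments, one regressor.
rng = np.random.default_rng(0)
n = 50
Z = rng.normal(size=(n, 2))
X = (Z @ np.array([1.0, 1.0]) + rng.normal(size=n))[:, None]
y = 0.8 * X[:, 0] + rng.normal(size=n)

beta_hat = tsls(y, X, Z)
resid = y - X @ beta_hat                              # uncentered 2SLS residuals
resid_c = resid - Z @ np.linalg.solve(Z.T @ Z, Z.T @ resid)   # recentered residuals

idx = rng.integers(0, n, size=n)                      # one uniform resample of indices
y_star = X[idx] @ beta_hat + resid[idx]               # rebuild y* from resampled triples
beta_star_resid = tsls(y_star, X[idx], Z[idx])
beta_star_cases = tsls(y[idx], X[idx], Z[idx])        # resample the cases directly
```

The two bootstrap estimates agree because $x_i^T\hat\beta + \hat\epsilon_i = y_i$ for every $i$, so the rebuilt responses are exactly the resampled responses.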


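and $\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i}, \epsilon_{4i}$ are iid, $(\chi^2_1 - 1)$ distributed. Hence, the $\epsilon_i$'s have mean 0 and variance 2. Let $x = (x_1, \ldots, x_n)^T$ be the $n \times 1$ vector of explanatory variables, $y$ the $n \times 1$ vector of responses, $\epsilon = (\epsilon_{11}, \ldots, \epsilon_{1n})^T$ the $n \times 1$ vector of unobserved error terms, and $z_1 = (z_{11}, \ldots, z_{1n})^T$ and $z_2 = (z_{21}, \ldots, z_{2n})^T$ the $n \times 1$ vectors of instrumental variables. In matrix notation, the model given by (2-45)-(2-48) can be written in a more compact form as

Freedman 1984), uncentered residual bootstrap, centered double bootstrap, and uncentered residual double bootstrap. In order to define the biased bootstrap confidence intervals, we consider the instrumental variable regression model in the GMM framework, as described in subsection 2.4.1. For $i = 1, \ldots, n$, let $u_i = (y_i, x_i, z_i^T)^T$ be the observations arranged in vector form. For $u = (y, x, z^T)^T \in R^4$ and $a \in R$, let $g(x, y, z, a) = (y - ax)z$. By assumption,

$$E[g(u_i; a)] = 0, \quad i = 1, \ldots, n; \tag{2-50}$$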


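2.3 associated with the implied GMM model defined by (2-50). Let $\hat a$ be the 2SLS estimator of $a$. We apply the bootstrap technique described in subsection 2.3.2 to construct bootstrap confidence intervals. For an $\alpha$ level upper confidence interval for $a$, we need to find $t_{1-\alpha}$, the solution of the "population equation"

$$P(a \le \hat a - t_{1-\alpha}) - \alpha = 0. \tag{2-51}$$

We rewrite the left side of this equation as follows

$$P(a \le \hat a - t_{1-\alpha}) - \alpha = P(\hat a - a \ge t_{1-\alpha}) - \alpha = 1 - \alpha - P(\hat a - a < t_{1-\alpha}) = 1 - \alpha - E\,I(\hat a - a < t_{1-\alpha}).$$

Hence, $t_{1-\alpha}$ is the solution of the "population equation"

$$E\,I(\hat a - a < t_{1-\alpha}) = 1 - \alpha. \tag{2-53}$$

Let $\hat t_{1-\alpha}$ be the bootstrap estimate of $t_{1-\alpha}$, i.e. the solution of the "sample equation"

$$E_{\hat a}\, I(\hat a^* - \hat a < \hat t_{1-\alpha}) = 1 - \alpha. \tag{2-54}$$

The biased bootstrap upper confidence interval with nominal coverage $\alpha$ is then $J_{bb} = (-\infty,\ \hat a - \hat t_{1-\alpha})$. Since typically

$$P(a \le \hat a - \hat t_{1-\alpha}) \ne \alpha, \tag{2-55}$$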


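$$P(a \le \hat a - \hat t_{q(\alpha)}) = \alpha. \tag{2-56}$$

Let $\hat q(\alpha)$ be the bootstrap estimator of $q(\alpha)$, so that $\hat q(\alpha)$ satisfies

$$P_{\hat a}(\hat a \le \hat a^* - \hat t^*_{\hat q(\alpha)}) = \alpha, \tag{2-57}$$

where $\hat t^*_\beta$ satisfies $P(\hat a^{**} - \hat a^* < \hat t^*_\beta \mid F_{\hat a^*}) = \beta$. Since $\hat q(\alpha)$ satisfies $E_{\hat a}[I(\hat t^*_{\hat q(\alpha)} \le \hat a^* - \hat a)] = \alpha$, and

$$\hat t^*_{\hat q(\alpha)} \le \hat a^* - \hat a \iff E_{\hat a^*}[I(\hat a^{**} - \hat a^* < \hat a^* - \hat a)] \ge \hat q(\alpha), \tag{2-58}$$

it follows that $\hat q(\alpha)$ satisfies

$$E_{\hat a}\, I\left\{E_{\hat a^*}[I(\hat a^{**} - \hat a^* < \hat a^* - \hat a)] \ge \hat q(\alpha)\right\} = \alpha. \tag{2-59}$$

Consequently, $\hat q(\alpha)$ is the $(1-\alpha)$th quantile of $E_{\hat a^*}[I(\hat a^{**} - \hat a^* < \hat a^* - \hat a)]$.

We now describe the Monte Carlo implementation of this bootstrap procedure. For each biased bootstrap resample $X^*_b$, $b = 1, \ldots, B$, compute $\hat a^*_b$ and draw $C$ biased bootstrap re-resamples $X^{**}_{bc}$, $c = 1, \ldots, C$. For each $c$, let $\hat a^{**}_{bc}$ be the bootstrap version of $\hat a$ on $X^{**}_{bc}$. For every $b = 1, \ldots, B$, let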


(2-63) are:

2-1 and 2-2 show the Monte Carlo estimates of the coverage probabilities corresponding to different bootstrap confidence intervals: GMM biased bootstrap (GBB), GMM uniform bootstrap (GUB), and Freedman's centered residuals (FCR). We have


2-1, 2-2, and 2-3 show the coverage errors for different bootstrap percentile confidence intervals, using different bootstrapping procedures for different nominal sizes $\alpha$ = .9, .85, .9, .95, .99. The coverage probabilities were examined using 500 simulation runs and 500 outer level resamples. From these plots we can conclude that, except for the sample size n = 20, when all bootstrap procedures perform similarly, for all other sample sizes n = 40, 60, 80, 100, 200 the biased bootstrap recycling confidence intervals outperform all other bootstrap percentile confidence intervals.

Table 2-1. Estimated coverage probabilities for bootstrap one-sided, upper confidence intervals at $\alpha$ = .90 nominal level, with a model parameter equal to .8, B = 1000, S = 1000: GMM biased bootstrap (GBB), GMM uniform bootstrap (GUB), centered residuals (FCR), and based on asymptotic approximation (ASY)

| n | Percentile GBB | Percentile GUB | Percentile FCR | Percentile-t GBB | Percentile-t GUB | Percentile-t FCR | Asym |
|---|---|---|---|---|---|---|---|
| 20 | 0.912 | 0.928 | 0.915 | 0.825 | 0.831 | 0.816 | 0.998 |
| 40 | 0.964 | 0.972 | 0.969 | 0.858 | 0.862 | 0.854 | 0.989 |
| 60 | 0.968 | 0.973 | 0.976 | 0.861 | 0.869 | 0.861 | 0.989 |
| 80 | 0.967 | 0.968 | 0.966 | 0.874 | 0.878 | 0.876 | 0.97 |
| 100 | 0.963 | 0.969 | 0.967 | 0.865 | 0.874 | 0.869 | 0.961 |
| 200 | 0.944 | 0.946 | 0.944 | 0.88 | 0.881 | 0.875 | 0.951 |
| 300 | 0.932 | 0.93 | 0.929 | 0.877 | 0.879 | 0.878 | 0.95 |

PAGE 57

Table 2-2. Estimated coverage probabilities for bootstrap one-sided, upper confidence intervals at the .95 nominal level, =.8, B = 1000, S = 1000: GMM biased bootstrap (GBB), GMM uniform bootstrap (GUB), centered residuals (FCR), and based on asymptotic approximation (ASY)

| n | Percentile GBB | Percentile GUB | Percentile FCR | Percentile-t GBB | Percentile-t GUB | Percentile-t FCR | Asym |
|---|---|---|---|---|---|---|---|
| 20 | 0.976 | 0.984 | 0.979 | 0.872 | 0.874 | 0.862 | 1 |
| 40 | 0.997 | 0.995 | 0.996 | 0.89 | 0.891 | 0.886 | 1 |
| 60 | 0.999 | 0.999 | 0.999 | 0.905 | 0.907 | 0.899 | 1 |
| 80 | 0.997 | 0.996 | 0.996 | 0.912 | 0.916 | 0.913 | 0.996 |
| 100 | 0.995 | 0.997 | 0.997 | 0.925 | 0.922 | 0.922 | 0.996 |
| 200 | 0.99 | 0.991 | 0.991 | 0.927 | 0.927 | 0.928 | 0.988 |
| 300 | 0.99 | 0.992 | 0.992 | 0.93 | 0.93 | 0.925 | 0.979 |

Figure 2-1. Estimated coverage errors corresponding to different bootstrap confidence intervals, at different levels, sample size n = 20, 40: GMM Biased Bootstrap (GBB), GMM uniform bootstrap, GMM Recycling Biased Bootstrap (RBB), centered residuals (FCR), based on asymptotic approximation (ASY)

PAGE 58

Figure 2-2. Estimated coverage errors corresponding to different bootstrap confidence intervals, at different levels, sample size n = 60, 80: GMM Biased Bootstrap (GBB), GMM uniform bootstrap, GMM Recycling Biased Bootstrap (RBB), centered residuals (FCR), based on asymptotic approximation (ASY)

Figure 2-3. Estimated coverage errors corresponding to different bootstrap confidence intervals, at different levels, sample size n = 100, 200: GMM Biased Bootstrap (GBB), GMM uniform bootstrap, GMM Recycling Biased Bootstrap (RBB), centered residuals (FCR), based on asymptotic approximation (ASY)

PAGE 59

Freedman and Peters (1984), Freedman (1984), Efron and Tibshirani (1986), and Bose (1981). Model-free bootstrap procedures have been proposed in a series of papers by Hall (1985), Carlstein (1986), and Künsch (1989). These are based on blocking arguments, in which the data are first divided into blocks of consecutive observations, and the blocks, instead of the observations, are resampled in order to obtain the bootstrap resamples. As a result, the dependence structure within the time series is preserved within each block. There are different types of blocking, including overlapping and nonoverlapping blocks. These in turn give rise to different block bootstrap procedures, such as the nonoverlapping block bootstrap (Carlstein 1986), the moving block bootstrap (Künsch 1989), the circular block bootstrap (Politis and Romano 1992), and the stationary bootstrap (Politis and Romano 1994). All these block bootstrap methods are particular cases of the generalized block bootstrap, as defined in Lahiri (2003, pp. 31–33). By approximating general stationary time series with families of (increasingly more complex) parametric models, sieve bootstraps have also been developed in a series of papers by Kreiss (1992), Bühlmann (1997), and Choi and Hall (2000).

Instead of resampling the blocks with equal probabilities, we propose a new procedure to resample the blocks from a weighted empirical distribution on the sample of blocks which satisfies the sample moment conditions. The weights are obtained by minimizing the Cressie-Read distance to the empirical distribution under the constraint that it satisfies

PAGE 60

2.3.2 ),wecanreusetherstlevelblockbootstrapresampleswheniteratingtheblockbiasedbootstrapinordertoestimatehigherlevelparameters. BrownandNewey ( 2002 )appliedthebiased-bootstrapforGMMmodelwithiidobservations.Theyanticipatedthatasimilarbootstrapprocedurefordependentdatawouldbepossible.LetX=fX1;:::;Xngbearealizationofad-dimensionalstationarytimeseries.DenotebyFthecommondistributionofXis,andbyFn=1=nPni=1XitheempiricaldistributionofXis.Supposethattheparameterofinterest02Rpisdenedimplicitlyastheuniquesolutiontothe\populationequation" EFb0=Eb(X1;0)=0;(3{1)forsomeb:Rd!Rp. Bustos ( 1982 )introducedthegeneralizedM-estimator^ofasasolutiontothe\sampleequation" EFnb^=1

PAGE 61

n!0asn!1.LetB1;:::;BbldenoteasimplerandomsampledrawnfromB,wherebl=n l,i.e.blistheintegerpartoftheratiobetweenthesamplesizenandtheblocksizel(thoughothervaluesarealsopossible).Sinceeachresampledblockcontainslelements,intotalweresamplen1=lblelements.IfwedenotetheelementsofblockBibyXi(l1)+1;:::;Xil,thenX=fX1;:::;Xn1gisablockbootstrapresampleofsizen1.Generally,thebootstrapversionof^anditscenteredandscaledversionTn=n1=2(^0)aredenedinoneoftwoways.Onewayistodenethebootstrapversion^of^asthesolutionof 1 b()=1 Lahiri 2003 ,pp.81{83).Anotherwaytodenethebootstrapversionof^isasasolutiontothemodiedequation 1 Shorack ( 1981 )inthecontextofbootstrappingM-estimatorsinlinearregressionmodelsandalsoby HallandHorowitz ( 1996 )inthecontextofbootstrappingGMMestimators.Notethatb(^)istheappropriatequantity 61

PAGE 62

3{5 )conditionallyunbiased.As Lahiri ( 2003 ,p.83)remarks,oneadvantageofusing( 3{5 )over( 3{3 )isthatweneedtosolveonlyonesetofequations( 3{5 ),comparedwithtwosetsofequations( 3{3 )and( 3{4 )inthelattercase.Inordertodenetheweightsoftheblockscorrespondingtomovingblockbiasedbootstrap(MBBB),denotebyUi()=1=lPi+l1j=ib(Xj;)theaverageofmomentsatinblockBi.Thenletp=p()=(p1;:::;pN)bethevectorofprobabilitiesthatisclosesttothevectorofuniformweights(1=N;:::;1=N)suchthattheweightedmeanoftheblockaveragesisequaltothezerovector.Inotherwords,pisthesolutionoftheminimizationproblemminimizeD(p)=1 3{6 )maybesolvedforpiasintheiidcase,yielding 3{7 ).Asbefore,eachresampledblockBicontainslelementswhichwedenotebyBi=(X(i1)l+1;:::;Xil),i=1;:::bl,andn1=lblisthetotalnumberofbootstrapvalues.ThenX=fX1;:::;Xn1gisaMBBBsampleofsizen1.Let^denotethebootstrapversionof^correspondingtothebiasedbootstrapresampleXdenedasthesolutionto(1=n1)Pn1i=1b(Xi;^)=0:Byconstruction,the(conditional)expectationgiventhesampleXofthebiasedbootstrapmeanofthemomentconditions 62

PAGE 63

E^[bn(^)]=E^h1 wherebn()=1=n1Pn1i=1b(Xi;)andUi()=1=lPi+l1j=ib(Xj;). 3{1 ). 63

PAGE 64

Lahiri 2003 ,Theorem4.2,p.83),thefollowingweakconvergenceholds:GnG,whereG=N0;D11(D1)Tand1=limn!1nVar(bn(0))=Cov(b(X1;0);b(X1;0))+2P1i=1Cov(b(X1;0);b(X1+i;0)),bn()=1=nPni=1b(Xi;),D=Erb(X1;0).Thenextlemmaisconcernedwiththeexistenceoftheweightssatisfying( 3{6 ).LetUi=Ui(^). 3.3.1 ,0isinsidetheconvexhulloffUi:i=1;:::;Ng,withprobabilityapproaching1asn!1.LetCoUi:i=1;:::;N=Pni=1iUi:i0;Pni=1i=1betheconvexhullforfUi:i=1;:::;ng.Since02CoUi:i=1;:::;N,thereexistsauniquesetofweightsfpi:i=1;:::;Ngsolving( 3{6 )suchthatPNi=1pi=1;pi0andPNi=1piUi=0.Weconsiderindetailthecasewhen=0,withsimilarproofsholdingforanymemberintheCressie-Readfamily.Taking=0in( 3{7 ),theweightshavethefollowingsimpliedexpressions 3{1 ).Supposethat^isasequenceofgeneralizedM-estimatorsdenedby( 3{2 ),andletF^betheweightedempiricaldistributiongivenby( 3{6 ).LetX=fX1;:::;Xn1gbeablockbiasedbootstrapresample.
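As a rough numerical illustration of the weights discussed above, the following Python sketch computes weights of the empirical-likelihood form $p_i = 1/\{N(1 + \lambda U_i)\}$ for scalar block averages $U_i$, by solving $\sum_i U_i/(1 + \lambda U_i) = 0$ for the Lagrange multiplier $\lambda$. The function name `el_weights`, the bisection solver, the restriction to a scalar moment condition, and the toy data are all assumptions made for illustration; they are not taken from the dissertation. In the scalar case the convex hull condition reduces to $\min_i U_i < 0 < \max_i U_i$.

```python
import numpy as np

def el_weights(u, tol=1e-12, max_iter=200):
    """Weights p_i = 1 / (N * (1 + lam * u_i)) with sum(p) = 1 and sum(p * u) = 0.

    Scalar case only; `u` holds hypothetical block averages U_i.
    """
    u = np.asarray(u, dtype=float)
    n = u.size
    if not (u.min() < 0.0 < u.max()):
        raise ValueError("0 must lie strictly inside the convex hull of u")
    # Every 1 + lam * u_i must stay positive, which confines lam to (lo, hi).
    lo = -1.0 / u.max()
    hi = -1.0 / u.min()

    def g(lam):
        # Strictly decreasing in lam on (lo, hi); its root gives the multiplier.
        return np.sum(u / (1.0 + lam * u))

    eps = 1e-10 * (hi - lo)
    a, b = lo + eps, hi - eps
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        if g(mid) > 0.0:
            a = mid  # root lies to the right of mid
        else:
            b = mid
        if b - a < tol:
            break
    lam = 0.5 * (a + b)
    p = 1.0 / (n * (1.0 + lam * u))
    return p, lam

# Toy data (hypothetical): block averages with a nonzero sample mean.
rng = np.random.default_rng(0)
u = rng.normal(0.3, 1.0, size=50)
p, lam = el_weights(u)
print(p.sum(), p @ u)
```

The weights sum to one automatically once the multiplier equation holds, so only the one-dimensional root has to be found; a vector-valued moment condition would need a multivariate solver instead.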

PAGE 65

3.3.1 { 3.3.4 ,asn!1, 2.3.2 andTheorem 2.3.2 fromSection 2.3.1 holdalsointhiscase.Sincetheproofsareidentical,wehavenotincludedthemintheappendix. 3.3.1 ,thefollowinguniformconvergenceforthebiasedbootstrapholds: 3.3.1 andforeverysequenceofgeneralizedM-estimators^and(biased)bootstrapestimators^,withb(^)=n1Pni=1b(Xi;^)=oP(1)andb(^)=n1Pni=1b(Xi;^)=oP(1),wehave^=0+oP(1)and^=0+oP(1),andhencealsothat^=^+oP(1).ThenexttheoremshowsthattheMBBBdistributionofthegeneralizedM-estimatorisconsistent. 3{1 ).Let^beasequenceofgeneralizedM-estimatorsdenedby( 3{2 )andletF^betheweightedempiricaldistributiongivenby( 3{6 ).LetX=fX1;:::;Xn1gbeablockbiasedbootstrapresampleandlet^bethebootstrapversionof^onX.Undertheassumptions 3.3.1 { 3.3.4 ,asn!1, 3{7 )andassociatedwiththegeneralizedM-estimationmodel( 3{1 ).Then,estimatetheparameter0=(F)by^, 65

PAGE 66

E^(T)=EGTdP^ whereGisagiven\design"distribution,P^=Fn^isthenfoldproductmeasureofF^anddP^ 3{15 )correspondingtore-sampleXb Lahiri ( 2003 ,p.2)calls0alevel-1parameteranddenesthelevel-2parameterasafunctionalrelatedtothesamplingdistributionofanestimatorofalevel-1parameter.Generally,isthesolutionofafunctionalequation Ef(F;F^)=0:(3{17) 66

PAGE 67

2{17 ){( 2{19 ).Denoteby^(l)and^(l)thebiasedbootstrapsolutionsof( 3{17 ),whichsatisfy E^f^(l)(F^;F^)=0andE^f^(l)(F^;F^)=0:(3{18)LetX1;:::;XBbeBbiasedbootstrapresamplesandlet^B(l)betheMonteCarloapproximationof^(l),denedasthesolutioninoftheequation 1 1 3{16 ). Efron ( 1992 )intheiidcaseandby Lahiri ( 2003 )inthecaseofdependentdata.Themainideaofthesemethodsistousetherstlevelresampleswhenestimatinghigherlevelparameterscorrespondingtotheiteratedbootstrap. 67
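The reuse of first-level resamples can be illustrated by importance reweighting. Suppose the first-level resamples are drawn as index vectors from a weighted empirical distribution with weights `p` on the observed points. An expectation under another weighted empirical distribution `q` on the same points can then be approximated from those same resamples, each reweighted by the likelihood ratio $\prod_j q_{i_j}/p_{i_j}$. The Python sketch below shows only this general principle; it is not the dissertation's exact estimator, and the names `draw_resamples` and `recycle_expectation`, the self-normalization of the weights, and the toy inputs are assumptions.

```python
import numpy as np

def draw_resamples(p, size, n_resamples, rng):
    """Draw index arrays of shape (n_resamples, size) from the weights p."""
    return rng.choice(len(p), size=(n_resamples, size), replace=True, p=p)

def recycle_expectation(stat_values, idx, p, q):
    """Estimate E_q[T] from resamples drawn under p via likelihood-ratio weights.

    stat_values[b] is the statistic on resample b and idx[b] holds its indices.
    The weights are self-normalized (an illustrative choice).
    """
    log_w = np.sum(np.log(q[idx]) - np.log(p[idx]), axis=1)
    w = np.exp(log_w - log_w.max())  # stabilize before normalizing
    return np.sum(w * stat_values) / np.sum(w)

# Toy example (hypothetical data): the statistic is the resample mean.
rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0])
p = np.full(4, 0.25)
q = np.array([0.1, 0.2, 0.3, 0.4])
idx = draw_resamples(p, size=4, n_resamples=200_000, rng=rng)
t = x[idx].mean(axis=1)
print(recycle_expectation(t, idx, p, q), float(q @ x))
```

No second level of resampling is drawn here: one set of first-level resamples serves every target distribution `q`, at the cost of extra Monte Carlo variance when `q` is far from `p`.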

PAGE 68

Halletal. ( 1995 )and Lahiri ( 2003 ,pp.182{186)inwhich Usingthesameterminologyas Lahiri ( 2003 ),l0isalevel-3parameter,sinceitrelatestothesamplingdistributionof^(l),whichisanestimatorofthelevel-2parameter.SincetheunderlyingFisunknown,wecanapproximatetheexpectationfromthelastdisplaybyitsbiasedbootstrapversion, ^l=argminlMSE^(^(l))=argminlE^^(l)^(~l)2;(3{25)where~lisareasonablepilotblocksize.Hence,inordertoestimate^l,weneedtoiteratethebootstraportoapplyrecycling.Wedescribeinmoredetailthecomputationaldetailsoftherecyclingprocedureproposedinthissection,fortheparticularcasewhen1=nVar(X),thesameideasapplyfordistributionfunctionestimation.WeconsidertheforwardKullback-Leiblerdistancewhendeningtheweightsofthebiasedbootstrap,obtainedfor=0inthefamilyof 68

PAGE 69

1{27 ).Foragivenvalueoftheblocksizel,thevectorofweightsgivenby( 3{7 )iscomputed 3{26 )and( 3{27 ).First,adescriptionofthecomputationaldetailsforthedoublebiasedbootstrapispresentedandthenitsrecyclingversion.Foreachb,considerCre-resamplesfXbc:c=1;:::;CgfromFXb.Foreachc,computeXbc,thesamplemeanofXbc,sothat,theMonteCarloapproximationof^b(l)=n1VarXbXis^Cb(l)=1 69

PAGE 70

^b(l)=n1EXbXXb2=n1EXXXb2dPXb ^rBb(l)=1 3{28 )is ^arBb(l)=BXb0=1n1(Xb0Xb)2kbb0:(3{31) 70

PAGE 71

(3-21), the true values of the two level-2 parameters were found to be 3.984 for the first and 0.516 for the second. To find the true optimal block sizes using both uniform and biased bootstrap, we generate moving block bootstrap estimators of the two parameters for block sizes l = 1, ..., 10. Tables 3-1 and 3-2 give the mean, bias, standard deviation, and root mean square error (RMSE) of the MBB and MBBB estimators of the two parameters, based on S = 1000 simulation runs and B = 1000 bootstrap resamples. From these tables we see that the optimal block size is $l_1^0 = 3$ for the first parameter and $l_2^0 = 2$ for the second. Figure 3-1 shows the bootstrap estimates of the RMSEs of the different bootstrap procedures for the two parameters, for different block sizes, for one realization of the process (3-21). For each bootstrap procedure, the optimal block length is the minimizer of the bootstrap estimate of the RMSE over all block lengths l = 1, ..., 10. We can see that, for this particular sample, the bootstrap procedures choose similar "optimal" block sizes. In these simulations, we used B = 1000 outer bootstrap resamples (for the biased bootstrap recycling (BBR) and the adjusted biased bootstrap recycling (ABR)) and an additional 500 inner bootstrap re-resamples for the uniform double bootstrap (UB) and the double biased bootstrap (BB).
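To make the block-size comparison concrete, here is a minimal Python sketch of a uniform moving block bootstrap estimate of the scaled variance of the sample mean, computed for block sizes l = 1, ..., 10. The AR(1) toy series, the function names `mbb_resample` and `mbb_scaled_variance`, and the number of resamples are assumptions for illustration; the process (3-21) and the weighted (biased) resampling of the dissertation are not reproduced.

```python
import numpy as np

def mbb_resample(x, block_len, rng):
    """One moving block bootstrap resample of length block_len * floor(n / block_len)."""
    x = np.asarray(x)
    n = x.size
    n_blocks = n // block_len
    # Overlapping blocks start at positions 0, ..., n - block_len.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()
    return x[idx]

def mbb_scaled_variance(x, block_len, n_boot, rng):
    """Monte Carlo MBB estimate of n1 * Var(resample mean), n1 = resample length."""
    means = np.array([mbb_resample(x, block_len, rng).mean() for _ in range(n_boot)])
    n1 = block_len * (len(x) // block_len)
    return n1 * means.var()

# Hypothetical AR(1) series, used only as toy data.
rng = np.random.default_rng(1)
e = rng.normal(size=300)
x = np.empty(300)
x[0] = e[0]
for t in range(1, 300):
    x[t] = 0.5 * x[t - 1] + e[t]

estimates = {l: mbb_scaled_variance(x, l, 500, rng) for l in range(1, 11)}
for l, est in estimates.items():
    print(l, round(est, 3))
```

With l = 1 the scheme reduces to the ordinary iid bootstrap, which ignores the serial dependence; larger blocks capture more of it at the price of higher variability.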

PAGE 72

Table 3-1. Computation of the optimal block size for uniform block bootstrap estimation of the two level-2 parameters given by (3-22) and (3-23). The number of simulations is S = 1000, and the number of bootstrap resamples is B = 1000. An asterisk (*) shows the block size for which the minimum RMSE has been attained.

| l | Variance: Mean | Variance: Bias | Variance: SD | Variance: RMSE | Distribution function: Mean | Distribution function: Bias | Distribution function: SD | Distribution function: RMSE |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.977 | -2.007 | 0.688 | 2.122 | 0.5092 | -0.0070 | 0.0161 | 0.0175 |
| 2 | 2.939 | -1.045 | 1.040 | 1.474 | 0.5135 | -0.0027 | 0.0164 | 0.0166 |
| 3 | 3.245 | -0.740 | 1.201 | 1.411 | | -0.0029 | 0.0167 | 0.0169 |
| 4 | 3.383 | -0.601 | 1.318 | 1.448 | 0.5131 | -0.0031 | 0.0174 | 0.0177 |
| 5 | 3.451 | -0.534 | 1.405 | 1.502 | 0.5124 | -0.0038 | 0.0178 | 0.0180 |
| 6 | 3.484 | -0.500 | 1.479 | 1.561 | 0.5119 | -0.0043 | 0.0171 | 0.0176 |
| 7 | 3.504 | -0.480 | 1.533 | 1.606 | 0.5124 | -0.0038 | 0.0174 | 0.0178 |
| 8 | 3.524 | -0.460 | 1.618 | 1.681 | 0.5123 | -0.0039 | 0.0174 | 0.0178 |
| 9 | 3.523 | -0.462 | 1.686 | 1.747 | 0.5111 | -0.0051 | 0.0187 | 0.0194 |
| 10 | 3.514 | -0.470 | 1.725 | 1.787 | 0.5108 | -0.0054 | 0.0182 | 0.0189 |

Table 3-2. Computation of the optimal block size for moving block biased bootstrap estimation of the two level-2 parameters given by (3-22) and (3-23). The number of simulations is S = 1000, and the number of bootstrap resamples is B = 1000. An asterisk (*) shows the block size for which the minimum RMSE has been attained.

| l | Variance: Mean | Variance: Bias | Variance: SD | Variance: RMSE | Distribution function: Mean | Distribution function: Bias | Distribution function: SD | Distribution function: RMSE |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.972 | -2.012 | 0.679 | 2.124 | 0.5094 | -0.0068 | 0.0162 | 0.0175 |
| 2 | 2.936 | -1.048 | 1.032 | 1.470 | 0.5139 | -0.0023 | 0.0164 | 0.0166 |
| 3 | 3.248 | -0.737 | 1.200 | 1.408 | | -0.0021 | 0.0165 | 0.0167 |
| 4 | 3.386 | -0.598 | 1.329 | 1.457 | 0.5133 | -0.0029 | 0.0171 | 0.0173 |
| 5 | 3.448 | -0.536 | 1.394 | 1.493 | 0.5135 | -0.0027 | 0.0167 | 0.0169 |
| 6 | 3.497 | -0.488 | 1.474 | 1.552 | 0.5128 | -0.0034 | 0.0169 | 0.0172 |
| 7 | 3.516 | -0.468 | 1.544 | 1.612 | 0.5122 | -0.0040 | 0.0180 | 0.0183 |
| 8 | 3.528 | -0.457 | 1.617 | 1.680 | 0.5112 | -0.0050 | 0.0176 | 0.0182 |
| 9 | 3.545 | -0.439 | 1.670 | 1.726 | 0.5114 | -0.0048 | 0.0184 | 0.0190 |
| 10 | 3.526 | -0.458 | 1.710 | 1.770 | 0.5121 | -0.0041 | 0.0186 | 0.0189 |
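The Bias, SD, and RMSE columns of Tables 3-1 and 3-2 are linked by the identity $\mathrm{RMSE}^2 = \mathrm{Bias}^2 + \mathrm{SD}^2$, which is exact when SD is the standard deviation of the estimates over the simulation runs computed with divisor S (treating the tabulated SD this way is an assumption). The short Python check below applies the identity to the variance-estimation columns of the first three rows of Table 3-1; the helper name `rmse_from_bias_sd` is hypothetical.

```python
import math

def rmse_from_bias_sd(bias, sd):
    """Root mean square error implied by a bias and a standard deviation."""
    return math.sqrt(bias ** 2 + sd ** 2)

# (bias, sd, reported RMSE) for l = 1, 2, 3, variance estimation, Table 3-1.
rows = [(-2.007, 0.688, 2.122), (-1.045, 1.040, 1.474), (-0.740, 1.201, 1.411)]
for bias, sd, reported in rows:
    print(round(rmse_from_bias_sd(bias, sd), 3), reported)
```

The recomputed values agree with the reported ones up to rounding. The same rows show the trade-off behind the optimal block size: the bias shrinks and the SD grows as l increases.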

PAGE 73

Figure 3-1. The bootstrap estimates of the RMSEs corresponding to different block bootstrap schemes. We used B = 1000 outer bootstrap resamples (for the biased bootstrap recycling (BBR) and the adjusted biased bootstrap recycling (ABBR)) and, for the uniform double bootstrap (UB) and the double biased bootstrap (BB), an additional 500 inner bootstrap resamples for each outer bootstrap resample.

PAGE 74

Datta ( 1995 ),butcanbemoregenerallyapplied. Andrews ( 2000 )givesanexamplewheretheuniformbootstrapfails.Heremarks\...weprovidesuchacounterexample...[which]isquitesimplebutitgeneralizestoawidevarietyofestimationproblemsthatareofimportanceineconometricapplications".LetX=fX1;:::;XngbeaniidsamplefromN(;1)distribution.Supposethattheparameterspaceforisthepositivereals.ThentheMLEofis^n=maxXn;0,whereXn=n1Pni=1Xi.LetTn=n1=2(^n)and Andrews ( 2000 )showsthatinthecasewhen=0,thebootstrapisinconsistent.WedevisehereabiasedbootstrapprocedurethatgivesconsistentbootstrapestimatesforthedistributionoftheMLE.Considerasequenceofpositiverealsn=n,with2(0;1=2)anddenetheresamplingdistributionas 74

PAGE 75

supxPTnxjGnPTxP!0:(4{4)Thefollowingrelationsaretrue: supxPTnxjGnPTxsupxPn1=2^nxjF0PTxIXnn +supxPn1=2(^n^n)xjFnPTxIXnn; whereIfgistheindicatorfunctionofaset.If=0,itfollowsfromtheLawofIteratedLogarithmthatlimsupnIfXnng=InlimsupnXn 4{4 )holds.When>0,itfollowsthatlimsupnIXnn=limsupnInXn 4{4 )holds. 75

PAGE 76

( 1995 )cites Babu ( 1984 )'spapertoarguethattheclassicalbootstrapapproximation\breaksdown...evenfornicestatistics." Babu ( 1984 )remarkedthatthebootstrapapproximationofasmoothfunctionofmultivariatesamplemeanisnotconsistentfor\certain"valuesofthemeanvector.Followingtheirexample,weproposeabiasedbootstrapversionthatcancorrecttheinconsistencyoftheclassicaluniformbootstrap.LetX=fX1;:::;XngbeaniidsamplefromN(;1)distributionandsupposethatweareinterestedinestimating2,forwhichtheMLEisX2.Consider Datta ( 1995 )showsthattheclassicaluniformbootstrapisnotvalidinthiscase.Torectifythis,asbefore,considerasequenceofpositiverealsn=n,with2(0;1=2)anddenetheresamplingdistributionGntobe 4{7 )onthebootstrap 76

PAGE 77

Table 4-1 gives the results of a simulation study that shows the performance of this "hybrid" biased bootstrap. Under $\mu = 0$, $T_n = n\bar{X}_n^2$ has a chi-square distribution with one degree of freedom. For these simulations, we have considered sample sizes n = 50, 100, 200, 500. For each sample size, we have computed the quantiles of the ordinary (uniform) bootstrap estimate $\mathcal{L}(n(\bar{X}_n^{*2} - \bar{X}_n^2) \mid F_n)$ as well as the "hybrid biased bootstrap" $\mathcal{L}(T_n^* \mid G_n)$. For this simulation study we have used B = 1000 bootstrap resamples, S = 1000 simulated samples, and the threshold sequence $n^{-.4}$. We have also included the true quantiles of $T_n$, given by the quantiles of the chi-square distribution with one degree of freedom, for comparison. It is obvious from this simulation study that this "hybrid biased bootstrap" outperforms the ordinary bootstrap, as expected from the theoretical result. The uniform bootstrap performs erratically in the tails, and the approximation does not improve as the sample size increases.

Table 4-2 gives the bootstrap estimates of the same quantities as above, under $\mu = 0.2$. In this case, $T_n = n^{1/2}(\bar{X}_n^2 - .2^2)$ has an asymptotic normal distribution $N(0, .4^2)$, for the sample sizes n = 50, 100, 200, 500. As before, for each sample size, we have computed the quantiles of the ordinary (uniform) bootstrap estimate $\mathcal{L}(n^{1/2}(\bar{X}_n^{*2} - \bar{X}_n^2) \mid F_n)$ as well as the "hybrid biased bootstrap" $\mathcal{L}(T_n^* \mid G_n)$. We have also included the true quantiles of $T_n$, for comparison. It is obvious from this simulation study that this "hybrid" biased bootstrap approaches the ordinary bootstrap, as expected from the theoretical result. As the sample size increases, the two different bootstrap procedures give almost indistinguishable results.
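The contrast between the two procedures in the zero-mean case can be reproduced in a few lines of Python. The sketch below is an illustration under stated assumptions and not the dissertation's implementation. In particular, the mean-zero resampling distribution is replaced here by the sample shifted to have mean zero; the threshold $n^{-.4}$ is applied to the absolute sample mean; and the same scaling by n is used in both branches, as appropriate for the zero-mean setting. The function names are hypothetical.

```python
import numpy as np

def uniform_boot(x, n_boot, rng):
    """Uniform bootstrap replicates of n * (resample mean^2 - sample mean^2)."""
    n = x.size
    means = rng.choice(x, size=(n_boot, n), replace=True).mean(axis=1)
    return n * (means ** 2 - x.mean() ** 2)

def hybrid_boot(x, n_boot, rng, delta=0.4):
    """Illustrative hybrid scheme.

    If |sample mean| <= n**(-delta), resample from the sample shifted to mean
    zero (a stand-in for a mean-zero resampling distribution) and return
    n * resample mean^2; otherwise fall back to the uniform bootstrap.
    """
    n = x.size
    if abs(x.mean()) <= n ** (-delta):
        centered = x - x.mean()
        means = rng.choice(centered, size=(n_boot, n), replace=True).mean(axis=1)
        return n * means ** 2
    return uniform_boot(x, n_boot, rng)

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=200)
probs = [0.05, 0.10, 0.90, 0.95]
print("uniform:", np.quantile(uniform_boot(x, 1000, rng), probs))
print("hybrid: ", np.quantile(hybrid_boot(x, 1000, rng), probs))
```

In the centered branch every replicate is nonnegative, like the chi-square limit, whereas the uniform replicates can be negative whenever the sample mean is not exactly zero.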

PAGE 78

Table 4-1. The quantiles of the distribution of $T_n$ under $\mu = 0$, and their bootstrap approximations given by the ordinary (uniform) bootstrap (UB) and the "hybrid" biased bootstrap (HBB), using B = 1000 bootstrap resamples, S = 1000 simulation runs, and the threshold sequence $n^{-.4}$

| | .05 | .10 | .15 | .20 | .80 | .85 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|
| True | | 0.0158 | 0.0358 | 0.0642 | 1.6424 | 2.0723 | 2.7055 | 3.8415 |
| UB, n=50 | -0.9823 | -0.9084 | -0.8186 | -0.7157 | 2.3037 | 2.9697 | 3.9059 | 5.5182 |
| HBB, n=50 | -0.0714 | -0.0539 | -0.0292 | 0.0027 | 1.4750 | 1.8641 | 2.4293 | 3.4429 |
| UB, n=100 | -0.9497 | -0.8806 | -0.7956 | -0.6956 | 2.2888 | 2.9500 | 3.8959 | 5.5003 |
| HBB, n=100 | -0.0405 | -0.0257 | -0.003 | 0.0264 | 1.4977 | 1.8891 | 2.4608 | 3.4919 |
| UB, n=200 | -1.0056 | -0.9321 | -0.8421 | -0.7390 | 2.3399 | 3.0151 | 3.9796 | 5.6105 |
| HBB, n=200 | -0.0241 | -0.0101 | 0.0111 | 0.0398 | 1.5202 | 1.9210 | 2.5082 | 3.5573 |
| UB, n=500 | -0.9200 | -0.8538 | -0.7710 | -0.6765 | 2.2791 | 2.9427 | 3.8789 | 5.4813 |
| HBB, n=500 | -0.0087 | 0.0039 | 0.0240 | 0.0520 | 1.5538 | 1.9587 | 2.5534 | 3.6234 |

Table 4-2. The quantiles of the asymptotic distribution of $T_n$ under $\mu = 0.2$, and their bootstrap approximations given by the ordinary (uniform) bootstrap (UB) and the "hybrid" biased bootstrap (HBB), using B = 1000 bootstrap resamples, S = 1000 simulation runs, and the threshold sequence $n^{-.4}$

| | .05 | .10 | .15 | .20 | .80 | .85 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|
| True | -0.657 | -0.512 | -0.414 | -0.336 | 0.336 | 0.414 | 0.512 | 0.657 |
| UB, n=50 | -0.379 | -0.336 | -0.297 | -0.256 | 0.470 | 0.607 | 0.783 | 1.088 |
| HBB, n=50 | -0.323 | -0.276 | -0.230 | -0.181 | 1.128 | 1.432 | 1.863 | 2.610 |
| UB, n=100 | -0.424 | -0.368 | -0.319 | -0.272 | 0.421 | 0.538 | 0.695 | 0.952 |
| HBB, n=100 | -0.386 | -0.329 | -0.275 | -0.223 | 0.878 | 1.112 | 1.443 | 2.016 |
| UB, n=200 | -0.481 | -0.406 | -0.346 | -0.292 | 0.395 | 0.501 | 0.642 | 0.866 |
| HBB, n=200 | -0.469 | -0.394 | -0.333 | -0.277 | 0.538 | 0.680 | 0.877 | 1.203 |
| UB, n=500 | -0.535 | -0.437 | -0.365 | -0.304 | 0.369 | 0.465 | 0.588 | 0.783 |
| HBB, n=500 | -0.534 | -0.437 | -0.365 | -0.303 | 0.372 | 0.470 | 0.593 | 0.791 |

PAGE 79

1.3.1 and 1.3.2 ,werstpresenttworesultssharedbyconcavecriterionfunctions,formoredetailsandapplications,see GiurcanuandTrindade ( 2006 ). Proof. Proof.

PAGE 80

1.3.1 A.0.1 ,ignoringsetsofprobabilityzero, Sincepointwiseconvergenceinprobabilityforconcavefunctionsonanopensetimpliesuniformconvergenceinprobabilityoncompactsubsetsofthatopenset( Pollard 1991 ,sec.6),usingProposition A.0.2 weobtain sup2S(0;)mn()P!sup2S(0;)m():(A{2)Since0isgloballyidentiable,sup2S(0;)m()
PAGE 81

1.3.2 sup2Dmn()m()P!0: Theconsistencyof^n;1and( A{3 )implythatforevery2with=(0;1;2)2C mn(^n;1;2)P!m(0;1;2):(A{4)Toseethis,notethatforlargeenoughn,andforanycompactneighborhoodDof0containing(0;1;2)andcontainedinC,Pmn(^n;1;2)m(0;1;2)Pmn(^n;1;2)m(^n;1;2)=2+Pm(^n;1;2)m(0;1;2)=2Psup2Djmn()m()j=2+Pm(^n;1;2)m(0;1;2)=2!0asn!1; 1.3.1 ,weobtainargmax2:2kn(2)P!argmax2:2k(2);wherek()=m(0;1;)istheinprobabilitylimitofkn.Notingthat^n;2=argmax2kn(2)and0;2=argmax2k(2),givestherequiredresult. 81

PAGE 82

2.2.1 2{6 ), From( 2{1a ),( 2{2 ),and( 2{5 )itfollowsthat=^=0and=^=0,sothat,from( B{1 ) 2{1b ),itfollowsthatPni=1rpi=0,soby( 2{1a )weobtainr=^=0(1)nXi=1rpi=^=0;andfrom( B{2 )itfollowsthat B{3 )by(rpi)T=^andsummingoveriyields 2{1b )withrespecttoat^yields B{4 )and( B{5 ) 82

PAGE 83

B{3 )byb(Xi;^)T,summingoveriyields B{5 )and( B{7 )itfollowsthat B{6 )and( B{8 )wehave 1{5 ).Now,considerF=fFg2asaparametricfamilyofdistributions,indexedby.TheFisherinformationcorrespondingtothisparametricmodelisgivenby IfweevaluatetheFisherinformationat^,from( B{10 )weobtain B{9 )and( B{11 )itfollows 83

PAGE 84

2{6 )existwithprobabilityapproachingone.Thisresultextendstheresultof Owen ( 2001 ). 2.3.1 P!nkT!x>0=1 supkk=1P!nkTx>0=1:(B{14)UsingaGlivenko-CantelliresultgivenbytheTheorem19.4of vanderVaart ( 1998 ,p.270),thefollowingconvergenceholdswithprobability1: limnsupkk=1PnTx>0PTZ>0=0;(B{15) 84

PAGE 85

B{15 ),itfollowsthat supkk=1PnTx>0!supkk=1PTZ>0;(B{16)almostsurelyasn!1.WetakenowB= limkAnk=Tk1SmkAnm.Sinceforallm1,PAnm,itfollowsthatP(B).Foralmostall!2B,from( B{14 )and( B{16 ),itfollowsthat limsupnsupkk=1P!nTx>0=1=supkk=1PTZ>0:(B{17)SinceVispositivedenite(Assumption 2.3.3 ),weknowthatTV>0forallsuchthatkk=1.ThusPTZ>0=1 2;forallkk=1;hencesupkk=1PTZ>0=1=2,contradicting( B{17 ). ThenextLemmagivestheorderoftheLagrangemultiplier,andgeneralizesTheorem3.2of Owen ( 2001 ,pp.219{222). 2.3.1 { 2.3.3 ,=OP(n1=2). Proof. 2{6 )fortheparticularcasewhen=0,itfollowsthattheweightsaregivenbypi(^)=(n(1+Tb(Xi;^)))1,wheresatisesPni=1pi()b(Xi;^)=0.Hence 1 1+Tb(Xi;^)#=0:(B{18)Hence, 1 n1+Tb(Xi;^):(B{19) 85

PAGE 86

B{19 )itfollowsthat~S=1 nkknXi=1b(Xi;^)b(Xi;^)T n1+Tb(Xi;^)1+kkmaxi=1;:::;nkb(Xi;^)k=k~Sk1+kkWn=1 B.2.2 ,wehaveWn=oP(n1=2),hence,from( B{21 )itfollowsthat=OP(n1=2). ThenextlemmagivestheorderofWndenedabove. 2.3.1 { 2.3.3 ,wehaveWn=oP(n1=2),whereWnisgivenby 2.3.2 ,bythemeanvaluetheoremandtriangleinequality,itfollowsthatkb(Xi;^)kkb(Xi;0)k+k^0kk(Xi);

PAGE 87

maxi=1;:::;nkb(Xi;^)kmaxi=1;:::;nkb(Xi;0)k+k^0kmaxi=1;:::;nk(Xi):(B{23)UsingLemma11.2of Owen ( 2001 ,p.218),itfollowsthatmaxi=1;:::;nkb(Xi;0)k=oP(n1=2)andmaxi=1;:::;nk(Xi)=oP(n1=2):Usingthat^=0+OP(n1=2),from( B{23 )wehaveWn=oP(n1=2). Thefollowinglemmaprovidesuswithanorderofconvergencethatwillbeneededintheproofsofthefollowingtheorems. 2.3.1 { 2.3.3 ,maxi=1;:::;n1 1+Tb(Xi;^)=OP(1): 1+Tb(Xi;^)1 1kkWn:(B{24)Since=OP(n1=2)andWn=oP(n1=2),itfollowsthatkkWn=oP(1)andP(Dn)!1.From( B{24 ),weobtainthatmaxi=1;:::;n1 1+Tb(Xi;^)=OP(1). Thenextproofshowstheconsistencyofthebiasedbootstrapforsamplemeanofmomentconditions. 2.3.1 E^b(X1;^)=nXi=1pib(Xi;^)=0(B{25)and Var^b(X1;^)=nXi=1pib(Xi;^)b(Xi;^)T:(B{26) 87

PAGE 88

B{26 )as FromAssumption 2.3.2 itfollowsthatthefollowinguniformlawoflargenumbersholds,beinganapplicationofTheorem19.4of vanderVaart ( 1998 ,p.270)sup1 B{27 )convergesinprobabilitytotheasymptoticcovarianceVasn!1(forthiswealsousethefactthatV()iscontinuousat0,whichfollowsfromAssumption 2.3.2 ).Forthesecondtermof( B{27 ),usingthefactthatpi=n(1+Tb(Xi;^))1from( 2{9 ),weobtainthefollowingrelations(weusekktodenotetheEuclideannormforvectorsandtheinducedoperatornormformatrices): wherethenexttolastrelationfollowsfromLemmas B.2.1 { B.2.3 .SinceXisaresampledfromF^,whichchangeswithn,inordertocompletetheproof,theCentralLimitTheoremfortriangulararraysisused.Itsucestoshowthatforevery>0( vanderVaart 1998 ,p.20) E^hkb(Xi;^)k2I(kb(Xi;^)kn1=2)iP!0asn!1:(B{29) 88

PAGE 89

E^kb(Xi;^)k2I(kb(Xi;^)kn1=2)=nXi=1pikb(Xi;^)k2I(kb(Xi;^)kn1=2)nXi=1pikb(Xi;^)k2I(Wnn1=2):(B{30)Asbefore,Pni=1pikb(Xi;^)k2P!Ekb0k2.FromLemma B.2.2 ,I(Wnn1=2)P!0,andhence,( B{29 )followsfrom( B{30 ).LetG^n=Lp Kallenberg 2002 ,Lemma4.2,p.63),itfollowsthatforanysubsequence(mn)(n),thereisafurthersubsequence(ln)(mn)suchthat( B{29 )holdsalong(ln)almostsurely.UsingtheCentralLimitTheoremfortriangulararraysalong(ln),itfollowsthat(G^ln;G)a:s:0;whereisametriconthespaceofdistributionsthatmetrizestheweakconvergence.Fromthis,usingagainthesubsequencecriterion,itfollowsthat(G^n;G)P0:Inotherwords,wehaveshownthatanysubsequenceoffG^nghasafurthersubsequencethatconvergesindistributiontoGalmostsurely,sothat,thesequenceconvergesindistributioninprobability. Wenowprovethe(conditional)uniformconvergenceinprobabilityresultthatwillbeusedinprovingtheconsistencyoftheGMMestimators(andtheirbootstrapversions)inTheorem 2.3.2 2.3.2 vanderVaart ( 1998 ),thereexistsanitesequenceofopenballsUi,i=1;:::;n,suchthatSli=1Ui,bUi(x)b(x;)bUi(x)forall2Uiandforallx,andEFjbUibUij,wherebUi(x)=inf2Uib(x;)andbUi(x)=sup2Uib(x;).Forall2Uj, 1 89

PAGE 90

2.3.1 ,wehave1 B{32 ),weobtain sup21 B{33 ),weobtainthedesiredresult. 2.3.2 2.3.2 ,wehave sup2jQn()Q()j=oP(1)(B{34)and sup2jQn()Q()j=oP(1);(B{35)whereweusethesametechniqueforprovingtherstuniformconvergenceresultof( B{34 ).ByCorollary 1.2.1 weobtaintheconsistencyof^.Usingstandardtechniques,thisinturnwouldimplythat^=0+OP(n1=2),sothatallconditionsofTheorem 2.3.1 hold.SinceQn(^)Qn(^)+oP(1)(byhypothesis)andQn(^)=Qn(^)+oP(1)(byTheorem 2.3.1 ),weobtainQn(^)Qn(^)+oP(1):Hence,usingthefactthatQn(^)Q(0)oP(1),wehave 90

PAGE 91

ThenextlemmawillbeusedintheproofoftheThereom 2.3.3 .ItisaversionofSlutsky'stheoremformulatedforconditionalconvergenceindistributioninprobability. 2.3.3 0=n(^)=n(^)+(^^)rn(^)+1 2(^^)2r2n(~);(B{39)where 91

PAGE 92

(^^)r2n(~)=oP(1):(B{43)Bysubstituting( B{42 )and( B{43 )in( B{39 ),weobtain 2(^^)r2n(~))=(^^)(DTWD+oP(1)):(B{44)Hence 2.3.1 ,itfollowsthatp Hahn ( 1996 ). 92

PAGE 93

B{45 ).Hence,weneedtoshownowthat 2.3.1 ),itiseasytoprovethatthefollowingconvergenceholdsfortheuniformbootstrapp B{46 ),by(conditional)Slutsky'stheorem,itishenceenoughtoshowthatp 2.3.4 2.3.3 ,usingtheTaylorexpansionweobtainthat B{45 ),fortheparticularcasewhenW=V1,itfollowsthat 93

PAGE 94

B{47 )and( B{48 ),weobtain SinceV1=2D(DTV1D)1DTV1=2isidempotentofrankp=1,itfollowsthatIV1=2D(DTV1D)1DTV1=2isidemptotentofrankqp(rememberthatweconsiderthecasep=1).Moreover,fromThereom 2.3.3 ,itfollowsthatV1=2p 2.3.1 ): Consequently,nQn(^)P2qp. 2.3.1 fortheuniformbootstrap,the 94

PAGE 95

2.3.3 ); B{52 )holds,sothat,thefollowing(unconditional)weakconvergenceholds B{49 )itfollowsthatnbn(^)TV1=2AV1=2bn(^)2qp.From( B{53 ),weseethatC=A+AV1=2AV1=2Ashouldbeidempotentofrankqp( Driscoll 1999 ).Ofcoursethisisnotalwaysthecase,e.g.forV=I,C=2A,whichisnotidempotent. 2.4.1 { 2.4.3

PAGE 96

Since from( B{57 )and( B{58 ),itfollows 1 B.3.1 ,itfollowsthat 1 whichisamatrixofrankqp. 96

PAGE 97

2.4.2 Eziin1nXj=1zj^j=n1nXj=1zj^jn1nXj=1zj^j=0;(B{62)and Varziin1nXj=1zj^j=1 1 Since(xi;zi;i)aresampledfromtheempiricaldistributiononthesampleZwhichchangeswithn,weusetheCentralLimitTheoremfortriangulararrays.ByProposition2.27of vanderVaart ( 1998 ),weneedtoverifythattheLindebergconditionholds,i.e.,forany>0, E"ziin1nXj=1^jzj2Iziin1nXj=1^jzjn1=2#P!0:(B{65) 97

PAGE 98

2.4.1 { 2.4.3 maxj=1;:::;nkzj^jk+n1nXj=1^jzj=maxj=1;::::nkzj(yj^Txj)k+oP(1)maxj=1;::::nkzj(yjTxj)(^)Txjkmaxj=1;::::nkzjjk+k^kmaxj=1;::::nkxjk=oP(n1=2)+OP(n1=2)oP(n1=2)=oP(n1=2); whereforthelastrelationweusedtheresultofLemma11.2of Owen ( 2001 ,p.218).HenceImaxj=1;:::nzj^j+n1Pnj=1^jzjn1=2P!0.Moreover,usingthesameargumentsasintheproofof( B{64 ),1 B{65 )holds.Asbefore,usingthesubsequenceargument,itfollowsthat,anysubsequence,hasafurthersubsequencethatconvergesindistributionconditionallyalmostsurely,sothat,thesequenceconvergeindistributionconditionallyinprobability. 98

PAGE 99

2.4.3 Usingthatn1XTZ=n1Pni=1xizTi=Qxz+oP(1)andn1ZTZ=n1Pni=1zizTi=Qzz+oP(1),weobtainn1=2(^^)=(QxzQ1zzQTxz)1QxzQ1zz1 (B{69)UsingTheorem 2.4.2 ,weonlyneedtoprovethat (QxzQ1zzQTxz)1QxzQ1zz1 B.3.1 ,weobtain(QxzQ1zzQTxz)1QxzQ1zz1 B{70 )holds,sothat( 2{44 )holds. 99

PAGE 100

3.3.1 P!NkT!x>0=1 vanderVaart ( 1998 ,p.270),thefollowingconvergenceholdswithprobabilityone: limnsupkk=1PnTx>0PTZ>0=0;(C{2)whereZN(0;1).From( C{2 ),itfollowsthatsupkk=1PnTx>0!supkk=1PTZ>0;almostsurelyasn!1.WetakenowB= limkANk=Tk1SmkANm.Sinceforallm1,PANm,itfollowsthatP(B).Foralmostall!2B, limsupksupkk=1P!NkTx>0=1=supkk=1PTZ>0:(C{3) 100

PAGE 101

3.3.3 ),weknowthatT1>0forallsuchthatkk=1.ThusPTZ>0=1 2;forallkk=1;hencesupkk=1PTZ>0=1=2,contradicting( C{3 ). LetWN=maxi=1;:::;NjjUijj.NextlemmasgivetheorderofandWN,andgeneralizetheresultsoffromtheprevioussectionfordependentdata. 3.3.1 C.0.2 3{11 ),itfollowsthat 1 1 N1+TUi:(C{6)Let~S=PNi=1UiUTi C{6 )itfollowsthat~S=1 NkkNXi=1UiUTi N1+TUi1+kkmaxi=1;:::;NkUik=k~Sk1+kkWN=1 101

PAGE 102

Lahiri ( 2003 ,p.52),thefollowingordersholdl NNXi=1UiUTi=1+oP(1);1 C{7 )weobtainthefollowingorderfortheLagrangemultiplier, 3.3.1 C.0.3 maxi=1;:::;NkUi(^)kmaxi=1;:::;NkUi(0)k+k^0kmaxi=1;:::;Nl1i+l1Xj=ik(Xi):(C{11)Hence,itisenoughtoshowthatmaxi=1;:::;NkUi(0)k=o(l1N1=2).Sinceforanyintegeri1,Ekb(Xi;0)k2+<1,itfollowsthatforanyA>0,Pn1P(kb(Xi;0)k2+An)<1.Usingthe(strict)stationarityproperty,b(Xi;0)sareidenticaldistributed,hence

PAGE 103

IntheproofofTheorem 3.3.1 ,wewillalsoneedthefollowingresult. 3.3.1 ,,maxi=1;:::;N1 1+TUi=OP(1): C.0.4 1+TUi1 1kkWN:(C{13)Since=OP(lN1=2)andWN=oP(l1N1=2),itfollowsthatP(DN)!1andkkWN=oP(1).From( C{13 ),maxi=1;:::;N1 1+TUi=OP(1). 3.3.1 E^Ui=NXi=1piUi=0(C{14)and Var^Ui=NXi=1piUiUTi:(C{15) 103

PAGE 104

Var^U1;i=NXi=1piU1;iUT1;i: Usingthesameresultasin Lahiri ( 2003 ,p.52),weobtain 1 C.0.2 { C.0.4 areused.SinceE^[U1;i]=0,andVar^[U1;i]P!1,thecentrallimittheoremfortriangulararraysisused.WeonlyneedtoverifythattheLindebergconditionholds,i.e.forevery>0, E^kU1;ik2IkU1;ikb1=2P!0:(C{19)SinceU1;i,i=1;:::;bareconditionallyiid,itfollowsthatE^kU1;ik2IkU1;ikb1=2=NXi=1pikU1;ik2IkU1;ikb1=2NXi=1pikU1;ik2Imaxi=1;:::;NkU1;ikb1=2:

PAGE 105

C{17 )),itfollowsthat( C{19 )holds.Usingthesamesubsequenceargumentasbefore,itfollowsthatanysubsequence,hasafurthersubsequencethatconvergesweakly,conditionallyalmostsurely,sothat,thefullsequencesequenceconvergeweakly,conditionallyinprobability. 3.3.3 0=bn(^)=bn(^)+rbn(^)(^^)+1 2(^^)Tr2bn(~)(^^):(C{20)UsingAssumption 3.3.2 ,itfollowsthatkr2bn(~)kn1Pni=1k(Xi).Hence(^^)Tr2bn(~)=oP(1):Sincerbn(^)=D+oP(1),from( C{20 )itfollows 2(^^)Tr2bn(~)(^^)=(D+oP(1))(^^): Hence, 3.3.1 ,itfollowsthatp 105

PAGE 106

Andrews, D. W. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica 68, 399–405.
Angrist, J. D., G. W. Imbens, and D. B. Rubin (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–455.
Angrist, J. D. and A. B. Krueger (1991). Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics 106, 979–1014.
Babu, G. J. (1984). Bootstrapping statistics with linear combinations of chi-squares as weak limit. Sankhya Ser. A 46, 85–93.
Baggerly, K. A. (1998). Empirical likelihood as a goodness-of-fit measure. Biometrika 85, 535–547.
Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika 74, 457–68.
Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association 83(403), 687–697.
Bhattacharya, R. N. and J. K. Ghosh (1978). On the validity of the formal Edgeworth expansion. The Annals of Statistics 6, 434–451.
Bickel, P. J. and D. A. Freedman (1981). Some asymptotic theory for the bootstrap. The Annals of Statistics 9, 1196–1217.
Bose, A. (1981). Edgeworth correction by bootstrap in autoregressions. The Annals of Statistics 16, 1709–1722.
Brown, B. W. and W. K. Newey (1998). Efficient semiparametric estimation of expectations. Econometrica 66, 453–464.
Brown, B. W. and W. K. Newey (2002). GMM, efficient bootstrapping, and improved inference. Journal of Business and Economic Statistics 20, 507–517.
Brundy, J. and D. Jorgenson (1971). Efficient estimation of simultaneous equations by instrumental variables. Review of Economics and Statistics 53, 207–224.
Bühlmann, P. (1997). Sieve bootstrap for time series. Bernoulli 3, 123–148.
Bustos, O. H. (1982). General M-estimates for contaminated p-th order autoregressive processes; consistency and asymptotic normality. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 59, 491–504.
Butcher, K. F. and A. Case (1994). The effect of sibling composition on women's education and earnings. Quarterly Journal of Economics 109, 531–563.

PAGE 107

Card, D. (1995). Using geographic variation in college proximity to estimate the returns to schooling. In Aspects of Labor Market Behavior: Essays in Honor of John Vanderkamp, University of Toronto Press.
Carlstein, E. (1986). The use of subseries methods for estimating the variance of a general statistic from a stationary time series. The Annals of Statistics 14, 1171–1179.
Choi, E. and P. Hall (2000). Bootstrap confidence regions computed from autoregressions of arbitrary order. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62, 461–477.
Corcoran, S. A. (1998). Bartlett adjustment of empirical discrepancy statistics. Biometrika 85, 967–972.
Datta, S. (1995). On a modified bootstrap for certain asymptotically nonnormal statistics. Statistics & Probability Letters 24, 91–98.
Davidson, R. and J. G. Mackinnon (1993). Estimation and Inference in Econometrics. New York: Oxford University Press.
Davison, A. and D. V. Hinkley (1997). Bootstrap Methods and their Applications. Cambridge University Press, Cambridge, UK.
Davison, A., D. V. Hinkley, and G. Young (2003). Recent developments in bootstrap methodology. Statistical Science 18, 141–157.
DiCiccio, T. J., P. Hall, and J. P. Romano (1991). Empirical likelihood is Bartlett-correctable. The Annals of Statistics 19(2), 1053–1061.
DiCiccio, T. J. and J. P. Romano (1989). The automatic percentile method: accurate confidence limits in parametric models. Canadian Journal of Statistics 17(2), 155–169.
DiCiccio, T. J. and J. P. Romano (1990). Nonparametric confidence limits by resampling methods and least favorable families. International Statistical Review 58, 59–76.
Driscoll, M. F. (1999). An improved result relating quadratic forms and chi-square distributions. The American Statistician 53, 273–275.
Durbin, J. (1954). Errors in variables. Review of the International Statistical Institute 22, 23–32.
Efron, B. (1979). Bootstrap methods: another look at jackknife. The Annals of Statistics 7, 1–26.
Efron, B. (1992). Jackknife-after-bootstrap standard errors and influence functions. Journal of the Royal Statistical Society, Series B 54, 83–127.

PAGE 108

Efron, B. and R. Tibshirani (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1, 54–77.
Efron, B. and R. J. Tibshirani (1993). Introduction to the Bootstrap. NY: Chapman and Hall.
Freedman, D. A. (1981). Bootstrapping regression models. The Annals of Statistics 9, 1218–1226.
Freedman, D. A. (1984). On bootstrapping two-stage least squares estimates in stationary linear models. The Annals of Statistics 12, 827–842.
Freedman, D. A. and S. F. Peters (1984). Bootstrapping an econometric model: Some empirical results. Journal of Business and Economic Statistics 2, 150–158.
Giné, E. (1990). Bootstrapping general empirical measures. The Annals of Probability 18, 851–869.
Giurcanu, M. and A. Trindade (2006). Establishing consistency of M-estimators under concavity with an application to some financial risk measures. Technical Report 00000, Department of Statistics, University of Florida.
Hahn, J. (1996). A note on bootstrapping generalized method of moments estimators. Econometric Theory 12, 187–196.
Hall, A. R. (2005). Generalized Method of Moments. New York: Oxford University Press.
Hall, P. (1985). Resampling a coverage process. Stochastic Processes and their Applications 20, 231–246.
Hall, P. (1988). Theoretical comparison of bootstrap confidence intervals. The Annals of Statistics 16, 927–953.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. New York: Springer-Verlag.
Hall, P., J. Horowitz, and B. Jing (1995). On blocking rules for the bootstrap with dependent data. Biometrika 82, 561–574.
Hall, P. and J. L. Horowitz (1996). Bootstrap critical values for tests based on generalized-method-of-moments estimators. Econometrica 64, 891–916.
Hall, P. and B. LaScala (1990). Methodology and algorithms of empirical likelihood. International Statistical Review 58, 109–127.
Hall, P. and M. A. Martin (1988). On bootstrap resampling and iteration. Biometrika 75, 661–671.

Hansen, L., J. Heaton, and A. Yaron (1996). Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262-280.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054.

Hayashi, F. (2000). Econometrics. Princeton University Press.

Hogan, J. W. and T. Lancaster (2004). Instrumental variables and inverse probability weighting for causal inference from longitudinal observational studies. Statistical Methods for Medical Research 13, 17-48.

Huber, P. J. (1981). Robust Statistics. New York: Wiley-Interscience.

Imbens, G. W. (2002). Generalized method of moments and empirical likelihood. Journal of Business & Economic Statistics 20, 493-506.

Jing, B. Y. and A. T. A. Wood (1996). Exponential empirical likelihood is not Bartlett correctable. The Annals of Statistics 24, 365-369.

Kallenberg, O. (2002). Foundations of Modern Probability. New York: Springer.

Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes. The Annals of Statistics 25, 2084-2102.

Kolaczyk, E. D. (1994). Empirical likelihood for generalized linear models. Statistica Sinica 4, 199-218.

Kreiss, J. P. (1992). Bootstrap procedures for AR(∞) processes, in Jöckel, K. H., Rothe, G. and Sendler, W., eds., Bootstrapping and Related Techniques. Heidelberg: Springer.

Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. The Annals of Statistics 17, 1217-1241.

Lahiri, S. N. (2003). Resampling Methods for Dependent Data. New York: Springer-Verlag.

Liang, K. Y. and S. L. Zeger (1987). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.

Lindsay, B. G. and A. Qu (2003). Inference functions and quadratic score tests. Statistical Science 18, 394-410.

Mátyás, L., ed. (1999). Generalized Method of Moments Estimation. New York: Cambridge University Press.

Newey, W. K. and R. J. Smith (2004). Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72, 219-255.

Newhouse, J. P. and M. McClellan (1998). Econometrics in outcomes research: The use of instrumental variables. Annual Review of Public Health 19, 17-34.

Newton, M. A. and C. J. Geyer (1995). Bootstrap recycling: A Monte Carlo alternative to nested bootstrap. Journal of the American Statistical Association 89, 905-912.

Owen, A. B. (1988). Empirical likelihood ratio confidence interval for a single functional. Biometrika 75, 237-249.

Owen, A. B. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics 18, 90-120.

Owen, A. B. (1991). Empirical likelihood for linear models. The Annals of Statistics 19, 1725-1747.

Owen, A. B. (2001). Empirical Likelihood. New York: Chapman & Hall/CRC.

Park, C. (2000). Robust estimation and testing based on quadratic inference functions. Ph.D. dissertation, Dept. of Statistics, Pennsylvania State Univ.

Politis, D. N. and J. P. Romano (1992). A circular block resampling procedure for stationary data, in R. LePage and L. Billard, eds., Exploring the Limits of Bootstrap. New York: Wiley.

Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89, 1303-1313.

Pollard, D. (1991). Asymptotics for least absolute deviations regression estimators. Econometric Theory 7, 186-199.

Presnell, B. (2002). A note on least favorable families and power divergence. Technical Report 0000, Department of Statistics, University of Florida.

Presnell, B. and M. Giurcanu (2007). Biased-bootstrap recycling. Technical Report 0000, Department of Statistics, University of Florida.

Qin, J. and J. Lawless (1994). Empirical likelihood and general estimating equations. The Annals of Statistics 22, 300-325.

Qu, A., B. G. Lindsay, and B. Li (2000). Improving generalized estimating equations using quadratic inference functions. Biometrika 87, 823-836.

Reiersøl, O. (1941). Confluence analysis by means of lag moments and other methods of confluence analysis. Econometrica 9, 1-24.

Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica 26, 393-415.

Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.

Shao, J. and D. Tu (1995). The Jackknife and the Bootstrap. New York: Springer.

Shorack, G. (1981). Bootstrapping robust regression. Technical Report 8, Department of Statistics, University of Washington.

Smith, R. (1997). Alternative semiparametric likelihood approaches to Generalized Method of Moments estimation. Economic Journal 107, 503-519.

Stein, C. (1956). Efficient nonparametric testing and estimation. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, pp. 187-195. University of California Press.

Tauchen, G. (1986). Statistical properties of generalized method-of-moments estimators of structural parameters obtained from financial market data. Journal of Business & Economic Statistics 4, 397-416.

Tibshirani, R. (1988). Variance stabilization and the bootstrap. Biometrika 75(3), 433-444.

van der Vaart, A. W. (1998). Asymptotic Statistics. New York: Springer-Verlag.

Ventura, V. (2000). Non-parametric bootstrap recycling. Technical Report 673, Department of Statistics, Carnegie Mellon University.

White, H. (1982a). Instrumental variables regression with independent observations. Econometrica 50, 483-499.

White, H. (1982b). Maximum likelihood estimation of misspecified models. Econometrica 50, 1-25.

Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis (with Discussion). The Annals of Statistics 14, 1261-1350.

Mihai Giurcanu was born on September 19, 1975 in Mangalia, Romania. Upon graduation from high school in July 1994, he enrolled as a student in the Faculty of Mathematics at the University of Bucharest, from which he received a Bachelor of Arts degree in Mathematics in July 1998. In September 1998 he entered a Master's program in Applied Statistics and Optimization at the Faculty of Mathematics at the University of Bucharest. During this time, he was appointed as a Teaching Assistant in Mathematics at the Polytechnic University of Bucharest. In July 2000, he received a Master of Arts degree in Applied Statistics and Optimization from the University of Bucharest. In November 2000, he obtained a Fellowship in Statistics at the Weierstrass Institute of Berlin. In August 2002 he entered a PhD program in the Department of Statistics at the University of Florida. During his graduate education at the University of Florida, he was also appointed as a teaching assistant and instructor for different classes in the Department of Statistics. He graduated in August 2007. His dissertation is entitled Biased Bootstrap Methods for Semiparametric Models. His main research interests in Statistics are resampling techniques, biostatistics, econometrics, and time series.