Interval Estimation for the Mean of the Selected Populations

Material Information

Title:
Interval Estimation for the Mean of the Selected Populations
Physical Description:
1 online resource (60 p.)
Language:
English
Creator:
Fuentes, Claudio
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Statistics
Committee Chair:
Casella, George
Committee Members:
Ghosh, Malay
Daniels, Michael J
Peter, Gary F

Subjects

Subjects / Keywords:
ci -- confidence -- coverage -- estimation -- inference -- interval -- largest -- mean -- normal -- population -- probability -- selected -- simultaneous
Statistics -- Dissertations, Academic -- UF
Genre:
Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Consider an experiment in which p independent populations pi_i, with corresponding unknown means theta_i, are available, and suppose that for every 1 <= i <= p we can obtain a sample X_i1,...,X_in from pi_i. In this context, researchers are sometimes interested in selecting the populations that give the largest sample means as a result of the experiment, and in estimating the corresponding population means theta_i. In this dissertation, we present a frequentist approach to the problem, based on the minimization of the coverage probability, and discuss how to construct confidence intervals for the mean of k >= 1 selected populations, assuming the populations pi_i are normal and have a common variance sigma^2. Finally, we extend the results to the case when the value of k is randomly chosen and discuss the potential connection of the procedure with false discovery rate analysis. We include numerical studies and a real application example which corroborate that this new approach produces confidence intervals that maintain the nominal coverage probability while taking into account the selection procedure.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Claudio Fuentes.
Thesis:
Thesis (Ph.D.)--University of Florida, 2011.
Local:
Adviser: Casella, George.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2012-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043284:00001


Full Text

PAGE 4

I would like to gratefully and sincerely thank Dr. George Casella for his guidance, understanding and patience during my graduate studies at the University of Florida. Working with him, as a research assistant and as a student, has been one of the most rewarding experiences of my life. His wealth of knowledge and experience has shaped not only the way I understand statistics today. I would also like to thank my graduate committee members: Dr. Michael Daniels, Dr. Malay Ghosh and Dr. Gary Peter for their understanding and support throughout the whole process. Their sharp comments and suggestions have greatly improved the quality of this work. I am deeply grateful to all my teachers and professors, in particular those at the University of Florida and the Pontificia Universidad Catolica de Chile. It is not an exaggeration to say that almost everything I know today is the product of their dedication and excellence at teaching. Without any doubt, they taught me more than I could learn. Thank you, Dr. Alvaro Cofre. I would not be here writing these lines if it were not for your constant support and inspiration. Finally, I would like to thank my parents, Jorge Fuentes and Edith Melendez. It is because of their unconditional love and support that I have been able to reach this far.

PAGE 5

                                                                 page
ACKNOWLEDGMENTS .......... 4
LIST OF TABLES .......... 6
LIST OF FIGURES .......... 7
ABSTRACT .......... 8
CHAPTER
1 INTRODUCTION .......... 9
  1.1 Two Formulations of the Problem .......... 9
  1.2 Inference on the Selected Mean .......... 10
2 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION .......... 13
  2.1 The Known Variance Case .......... 14
  2.2 The Unknown Variance Case .......... 21
  2.3 Numerical Studies .......... 22
  2.4 Tables and Figures .......... 24
3 CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF $k \ge 1$ POPULATIONS .......... 29
  3.1 An Alternative Approach .......... 35
  3.2 Numerical Studies .......... 42
  3.3 Tables and Figures .......... 44
4 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF A RANDOM NUMBER OF POPULATIONS .......... 46
  4.1 Connection to FDR .......... 48
  4.2 Tables and Figures .......... 50
5 APPLICATION EXAMPLE .......... 53
  5.1 Fixed Selection .......... 53
  5.2 Random Selection .......... 54
  5.3 Tables and Figures .......... 55
6 CONCLUSIONS .......... 56
LIST OF REFERENCES .......... 58
BIOGRAPHICAL SKETCH .......... 60

PAGE 6

Table                                                            page
2-1 Configuration of the new parameterization for the coverage probability .......... 24
2-2 Configuration of the new parameterization for the case $p = 3$ .......... 24
2-3 Representation of the parameters $\Delta_{i,j}$ when $p = k+1$ .......... 24
2-4 Coverage probability of 95% CI for the selected mean when $p = 4$ .......... 25
3-1 Structure of the $\Delta$'s for the case $p = 4$, $k = 2$ .......... 44
3-2 Coverage probabilities for the number of population means vs the number of selected populations .......... 44
3-3 Observed confidence coefficient for 95% CI when $p = 6$ .......... 44
3-4 Cutoff points for 95% CI using the new method .......... 45
5-1 Confidence intervals for fixed top log-score differences .......... 55
5-2 Confidence intervals for random top log-score differences .......... 55

PAGE 7

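Figure                                                           page
2-1 Coverage probability as a function of $\Delta_{21}$ and $\Delta_{32}$ when $p = 3$ .......... 25
2-2 Plot of $\partial h/\partial \Delta_{21}$ when $p = 3$ .......... 26
2-3 Plots of the first two terms of $\partial h/\partial \Delta_{21}$ .......... 26
2-4 Confidence coefficient vs the number of populations for the iid case and $\alpha = 0.05$ .......... 27
2-5 Cutoff point versus number of populations for the iid case and $\alpha = 0.05$ .......... 28
3-1 Coverage probabilities as a function of $\Delta$ when $p = 6$ .......... 45
4-1 Individual components for the coverage probability for random K .......... 50
4-2 Lower bound for random K varying the probability selection .......... 51
4-3 Coverage probabilities for random K for different values of $p$ .......... 52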

PAGE 9

Bechhofer (1954), Gupta and Sobel (1957). In his paper, Bechhofer presents a single-sample multiple decision procedure for ranking means of normal populations. Assuming the variances of the populations are known, he is able to obtain closed-form expressions for the probabilities of a correct ranking in different scenarios. This approach is more concerned with selection of the population with the largest mean than with estimation of that mean. Gupta and co-authors have pioneered the subset selection approach, in which a subset of populations is selected with a minimum guaranteed probability P of containing the largest mean (see Gupta and Panchapakesan (2002)), while Bechhofer uses an indifference zone. That is, there is a minimum guaranteed probability of selecting the population with the largest mean, as long as that mean is separated from the second largest by a specified distance (see Bechhofer et al. (1995)).

1. Select the population that has the largest parameter, $\max\{\theta_1, \ldots, \theta_p\}$, and estimate its value.

PAGE 10

2. Select the population with the largest sample mean, and estimate the corresponding $\theta_i$.

The first of these problems has been widely discussed in the literature. For example, Blumenthal and Cohen (1968) consider estimating the larger mean from two normal populations and compare different estimators, but they do not discuss how to make the selection. In this direction, Guttman and Tiao (1964) propose a Bayesian procedure consisting in the maximization of the expected posterior utility for a certain utility function $U(\theta_i)$. In the same direction, but from a frequentist perspective, Saxena and Tong (1969), Saxena (1976), and Chen and Dudewicz (1976) consider point and interval estimation of the largest mean.

Putter and Rubinstein (1968)). This issue becomes clear if we consider all the populations to be identically distributed, for we will be estimating the population mean by an extreme value. Dahiya (1974) addresses this problem for the case of two normal populations and proposes estimators that perform better in terms of the MSE. Progress was made by Cohen and Sackrowitz (1982), Cohen and Sackrowitz (1986) and Gupta and Miescke (1990), where Bayes and generalized Bayes rules were obtained and studied. However, performance theorems are scarce. One exception is Hwang (1993), who proposes an empirical Bayes estimator and shows that it performs better in terms of the Bayes risk with respect to any normal prior. Another exception is Sackrowitz and Samuel-Cahn (1984) who, in the case of the negative exponential distribution, find UMVUE and minimax estimators of the mean of the selected population. The problem of improving the intuitive estimator is technically difficult. In addition, despite the obvious bias problem, it has been difficult to establish its optimality

PAGE 11

Berger (1976), Brown (1979) and Lele (1993) are not straightforward. In this direction, Stein (1964) established the minimaxity and admissibility of the naive estimator for $k = 2$. Minimaxity for the general case was established later by Sackrowitz and Samuel-Cahn (1986), where they discussed the normal case for $k \ge 3$. Admissibility, for the general case, appears to be still open.

Similarly, interval estimation is equally challenging and, again, little can be found in the literature. Typically, confidence intervals are constructed in the usual way, using the standard normal distribution as a reference to attain the desired coverage probability. However, these intervals do not maintain the nominal coverage probability as the number of populations increases. Qiu and Hwang (2007) propose an empirical Bayes approach to construct simultaneous confidence intervals for K selected means, but we are not aware of any other attempts to solve this problem. In their paper, Qiu and Hwang consider a normal-normal model for the mean of the selected population, which assumes that each population mean $\theta_i$ follows a normal distribution. Under these assumptions they are able to construct simultaneous confidence intervals that maintain the nominal coverage probability and are substantially shorter than the intervals constructed using Bonferroni's bounds. However, the confidence intervals they propose are asymptotically optimal, and since their coverage probabilities are obtained by averaging over both sample space and prior, they do not give a valid frequentist interval.

Recently, a modern variation of this problem has become very popular, with a major reason being the explosion of genomic data, calling for the development of new methodologies. For instance, in genomic studies, looking either for differential expression or genome-wide association, thousands of genes are screened, but only a smaller number are selected for further study. Consequently, the assessment of

PAGE 14

2 ) can be explicitly determined using the joint distribution of $(X_1, \ldots, X_p)$. For example, when $i = 1$ (the first term of the sum), we have

2 ), assuming the population variance $\sigma^2$ is known, and present a new approach to obtain the desired confidence intervals.

PAGE 15

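2 ) in terms of $\Delta_{21}, \ldots, \Delta_{p1}$, and obtain

$$
P(\theta_1 \in X_1 \pm c,\ X_1 \ge X_2, \ldots, X_1 \ge X_p)
= P(|z| \le c,\ \omega_2 \ge \Delta_{21}, \ldots, \omega_p \ge \Delta_{p1})
= \frac{1}{(2\pi)^{p/2}} \int_{-c}^{c} \left\{ \prod_{j=2}^{p} \int_{\Delta_{j1}}^{\infty} e^{-\frac{1}{2}(\omega_j - z)^2}\, d\omega_j \right\} e^{-\frac{1}{2} z^2}\, dz .
$$

Notice that for fixed $z$, the integrals within the curly brackets $\{\,\}$ are essentially the tail probability of a normal distribution centered at $z$. Therefore, we can write

$$
P(|z| \le c,\ \omega_2 \ge \Delta_{21}, \ldots, \omega_p \ge \Delta_{p1})
= \int_{-c}^{c} \left\{ \prod_{j=2}^{p} \Phi(z - \Delta_{j1}) \right\} \phi(z)\, dz ,
$$

where $\phi(\cdot)$ denotes the pdf of the standard normal distribution. Of course, the same argument is valid for the remaining terms of the sum in ( 2 ). It follows that we can fully describe the probability $P(\theta_{(1)} \in X_{(1)} \pm c)$ in terms of a new set of parameters $\Delta_{ij}$'s, where $\Delta_{ij} = \theta_i - \theta_j$ for $1 \le i, j \le p$. Under this representation, for every $c > 0$, the value of the coverage probability $P(\theta_{(1)} \in X_{(1)} \pm c)$ is determined by the relative distances between the population means $\theta_i$, $i = 1, \ldots, p$. In other words, the coverage probability defines a function $h_c(\Delta) = P(\theta_{(1)} \in X_{(1)} \pm c)$, where $\Delta = (\Delta_{11}, \Delta_{12}, \ldots, \Delta_{pp})$ is the vector of possible configurations of the relative distances $\Delta_{ij}$'s. In this context, we can obtain confidence intervals for $\theta_{(1)}$ that have (at least) the right nominal level by first minimizing the function $h_c$. Specifically, given $0 < \alpha < 1$, we can determine the value of $c > 0$ that satisfies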

PAGE 16

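1.
2.
3. For $j > k$, $\Delta_{jk} = \Delta_{j,j-1} + \Delta_{j-1,j-2} + \cdots + \Delta_{k+1,k}$.

These properties reveal a certain underlying symmetry in the structure of the problem. This symmetry is portrayed in Table 2-1, where every entry $\Delta_{ij}$ corresponds to the difference between the values of $\theta_i$ and $\theta_j$ located in row $i$ and column $j$ respectively. In addition, Property 3 indicates that we only need to consider $p - 1$ parameters in order to determine the value of $P(\theta_{(1)} \in X_{(1)} \pm c)$. In fact, for any given ordering of the parameters $\theta_i$'s, we can always choose a representation of the probability in ( 2 ) based on $p - 1$ parameters $\Delta_{ij}$. As a result, the true ordering of the population means $\theta_i$'s is not particularly relevant in this approach, and hence we will assume (without any loss of generality) that $\theta_1 \ge \theta_2 \ge \cdots \ge \theta_p$.

Although the introduction of the new parameterization seems to reduce (in a sense) the complexity of the problem, the minimization of $h_c$ is still difficult: first, because of the delicate balance existing between the $\Delta_{ij}$'s in the full expression (see Table 2-1), and second, because the formula of the coverage probability is somewhat involved. To illustrate these problems, let us discuss the case $p = 2$. We have

$$
P(\theta_{(1)} \in X_{(1)} \pm c)
= \int_{-c}^{c} \Phi(z - \Delta_{12})\,\phi(z)\, dz + \int_{-c}^{c} \Phi(z + \Delta_{12})\,\phi(z)\, dz
= \int_{-c}^{c} \left[ \Phi(z - \Delta_{12}) + \Phi(z + \Delta_{12}) \right] \phi(z)\, dz ,
$$

where $\Delta_{12} > 0$. Since only the quantity in brackets $[\,]$ depends on $\Delta_{12}$ and $\phi(z) > 0$, it seems reasonable to think that $h_c(\Delta_{12}) = P(\theta_{(1)} \in X_{(1)} \pm c)$ is minimized at the same point where $g_z(\Delta_{12}) = \Phi(z - \Delta_{12}) + \Phi(z + \Delta_{12})$ finds its minimum. However, differentiating $g_z$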

PAGE 17

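2-2 . We obtain [...] where $\Delta_{12}, \Delta_{23} \ge 0$ and $\Phi(\cdot)$ denotes the cdf of the standard normal distribution. Preliminary studies suggest that the global minimum of $h_c(\Delta_{12}, \Delta_{23}) = P(\theta_{(1)} \in X_{(1)} \pm c)$ is located at the origin (see Figure 2-1), but a formal proof is required. To this end, it is sufficient to show that $\partial h_c/\partial \Delta_{23} > 0$ and $\partial h_c/\partial \Delta_{12} > 0$.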

PAGE 18

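$$
\begin{aligned}
&\frac{1}{2\pi} \int_{-c}^{c} \Phi(z + \Delta_{23})\, e^{-\frac{1}{2}(\Delta_{23} + \Delta_{12} + z)^2 - \frac{1}{2} z^2}\, dz
 - \frac{1}{2\pi} \int_{-c}^{c} \Phi(z - \Delta_{12})\, e^{-\frac{1}{2}(\Delta_{23} + \Delta_{12} - z)^2 - \frac{1}{2} z^2}\, dz \\
&\quad + \frac{1}{2\pi} \int_{-c}^{c} \Phi(z - \Delta_{23})\, e^{-\frac{1}{2}(\Delta_{12} + z)^2 - \frac{1}{2} z^2}\, dz
 - \frac{1}{2\pi} \int_{-c}^{c} \Phi(z - \Delta_{23} - \Delta_{12})\, e^{-\frac{1}{2}(\Delta_{12} - z)^2 - \frac{1}{2} z^2}\, dz .
\end{aligned}
$$

Since the partial derivative depends on both $\Delta_{12}$ and $\Delta_{23}$, the behavior of its sign is not obvious, but different numerical studies support the idea that the derivative is non-negative. Figure 2-2 shows the plot of the integrand of $\partial h_c/\partial \Delta_{12}$ for fixed values of $\Delta_{12}$ and $\Delta_{23}$. Notice that if we group the first two terms and the last two terms of ( 2 ), we can look at the partial derivative as the sum of two differences. In Figure 2-3 we observe (in separate plots) the integrands of the first two terms of the partial derivative $\partial h_c/\partial \Delta_{12}$, for fixed values of $\Delta_{12}$ and $\Delta_{23}$. The plot suggests that the integrands differ only by a location parameter. In fact, changing variables, we can rewrite the expression in ( 2 ) as

$$
D_1 = \frac{1}{2\pi} \left[ \int_{\Delta_{23} + \Delta_{12} - c}^{\Delta_{23} + \Delta_{12} + c} - \int_{-c}^{c} \right] \Phi(z - \Delta_{12})\, e^{-\frac{1}{2}(\Delta_{23} + \Delta_{12} - z)^2 - \frac{1}{2} z^2}\, dz ,
\qquad
D_2 = \frac{1}{2\pi} \left[ \int_{\Delta_{12} - c}^{\Delta_{12} + c} - \int_{-c}^{c} \right] \Phi(z - \Delta_{23} - \Delta_{12})\, e^{-\frac{1}{2}(\Delta_{12} - z)^2 - \frac{1}{2} z^2}\, dz .
$$

Recall that $\Delta_{12} > 0$; then, looking at $D_2$, we have two possibilities for the intervals of integration:

1.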

PAGE 19

2 ) we obtain that the ratio is greater than 1 (regardless of the case), which is compelling to conclude that $D_2 > 0$. Notice that the argument still holds if we replace the cdf $\Phi(\cdot)$ by any non-decreasing function or if we change the interval $(-c, c)$ for $(-c_1, c_2)$, where $c_1, c_2 > 0$. This way, we obtain the following more general result:

[...] $\ge 0$, where the inequality is strict whenever the function $f$ is monotonically increasing in $z$.

PAGE 20

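2.1 is that $D_1 > 0$. As a result, we obtain that $\partial h/\partial \Delta_{12} > 0$. A similar argument shows that $\partial h/\partial \Delta_{23} > 0$, completing the proof. It follows that the coverage probability $P(\theta_{(1)} \in X_{(1)} \pm c)$ is minimized at $\Delta_{12} = \Delta_{23} = 0$, that is, whenever $\theta_1 = \theta_2 = \theta_3$.

Observe that Proposition 2.1 gives a straightforward proof for the case $p = 2$. In effect, for $h_c(\Delta_{12}) = P(\theta_{(1)} \in X_{(1)} \pm c)$, we have [...] 2.1 with $f = 1/2$, we obtain that $h'_c(\Delta_{12}) \ge 0$. It immediately follows that the coverage probability is minimized at $\Delta_{12} = 0$, or equivalently, when $\theta_1 = \theta_2$.

For the general case ($p > 3$), we observe that when moving from the case $p = k$ to the case $p = k + 1$, we only need to include the extra parameter $\Delta_{k+1,k}$ in order to describe the problem (see Table 2-3). Then, using Proposition 2.1 and mathematical induction we obtain the following result: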

PAGE 21

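where $Z_i = (X_i - \theta_i)/\sigma$ and $\Delta_{ij} = (\theta_i - \theta_j)/\sigma$ for $1 \le i, j \le 3$. Notice that taking $t = s/\sigma$ we can rewrite each term in the sum ( 2 ) as a mixture. We obtain

$$
\begin{aligned}
P(\theta_{(1)} \in X_{(1)} \pm sc)
={}& \int_{0}^{\infty} P(|Z_1| \le ct,\ Z_1 \ge Z_2 + \Delta_{21},\ Z_1 \ge Z_3 + \Delta_{31} \mid t)\, \varphi(t)\, dt \\
&+ \int_{0}^{\infty} P(Z_2 \ge Z_1 + \Delta_{12},\ |Z_2| \le ct,\ Z_2 \ge Z_3 + \Delta_{32} \mid t)\, \varphi(t)\, dt \\
&+ \int_{0}^{\infty} P(Z_3 \ge Z_1 + \Delta_{13},\ Z_3 \ge Z_2 + \Delta_{23},\ |Z_3| \le ct \mid t)\, \varphi(t)\, dt ,
\end{aligned}
$$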

PAGE 22

2.1 ) that the probability $P(\theta_{(1)} \in X_{(1)} \pm tc \mid t)$ in the integral is minimized at $\theta_1 = \theta_2 = \theta_3$. The generalization of this result follows from a direct application of Lemma [...]

PAGE 23

2-4 shows the result of simulations considering up to 30 populations with the same mean and setting $\alpha = 0.05$. The solid blue line represents the confidence coefficient obtained using our proposed confidence intervals and the dashed red line depicts the behavior of the confidence coefficient obtained using the standard confidence intervals. Observe that the solid line is constant at the nominal level 95%.

Intuitively, in order to maintain the coverage probability constant, the confidence intervals need to get wider. However, this increment is not dramatic and slows down as the number of populations increases. For instance, if we consider 10000 populations, the value of the cutoff point is only about 4.41. In fact, from the inequality in Theorem 2.1 it can be determined that the behavior of the cutoff value $c$ [...]

2-5 shows the behavior of the cutoff point $c$ as the number of populations increases for the case $\alpha = 0.05$. The solid line corresponds to the value of the standard cutoff point for a 95% confidence interval ($z_{\alpha/2} = 1.96$). The dashed/dotted line represents the value of $c$ for the new confidence intervals and the dashed line corresponds to the cutoff values for the Bonferroni intervals.

In an applied situation, the population means $\theta_i$ ($1 \le i \le p$) will rarely be identical. Hence we need to compare the performance of the confidence intervals when the population means are different. Table 2-4 summarizes some results obtained by

PAGE 24

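2-4 ), the traditional intervals may perform poorly.

Table 2-1. Configuration of the new parameterization for the probability $P(\theta_{(1)} \in X_{(1)} \pm c)$. In the table $\Delta_{ij} = \theta_i - \theta_j$.

Table 2-2. Configuration of the new parameterization for the case $p = 3$, when $\Delta_{12}$ and $\Delta_{23}$ are the free parameters. In the table $\Delta_{ij} = \theta_i - \theta_j$.

Table 2-3. Representation of the parameters $\Delta_{i,j}$ for the case $p = k + 1$.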

PAGE 25

Table 2-4. Observed coverage probability of 95% CI for the mean of the selected population out of four populations using the traditional and the new method. The reported values correspond to the average after ten replications and the number in parentheses is the corresponding standard error.

Population means     Traditional        New
(0, 0, 0, 0)         0.904 (0.0016)     0.952 (0.0012)
(0, 0.25, 0.5, 1)    0.907 (0.0020)     0.952 (0.0011)
(0, 5, 10, 15)       0.950 (0.0014)     0.974 (0.0009)
(0, 0, 0, 2)         0.928 (0.0042)     0.9584 (0.0027)
(0, 0, 0, 5)         0.952 (0.0031)     0.973 (0.0028)

Figure 2-1. Coverage probability as a function of $\Delta_{21}$ and $\Delta_{32}$ when $p = 3$.
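
The pattern in Table 2-4 can be checked with a small Monte Carlo sketch. It assumes independent normal sample means with unit variance, uses the standard cutoff 1.96 for the traditional interval, and takes the new-method cutoff 2.234 for p = 4 from Table 3-4. The function name, the number of replications and the seed are illustrative choices, not the author's simulation code.

```python
import random


def selected_mean_coverage(means, c, reps=40000, seed=0):
    """Estimate P(theta_(1) in X_(1) +/- c) by simulation.

    Assumes X_i ~ N(means[i], 1) independently. X_(1) is the largest
    observation and theta_(1) is the mean of the population that produced it.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(m, 1.0) for m in means]
        best = max(range(len(xs)), key=xs.__getitem__)  # index of the selected population
        if abs(xs[best] - means[best]) <= c:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for means in [(0, 0, 0, 0), (0, 5, 10, 15)]:
        trad = selected_mean_coverage(means, 1.96)   # traditional cutoff
        new = selected_mean_coverage(means, 2.234)   # cutoff for p = 4 read from Table 3-4
        print(means, round(trad, 3), round(new, 3))
```

With equal means the traditional interval should land near 0.90 and the wider interval near the nominal 0.95, in line with the first row of the table; with well-separated means the selection effect disappears and the traditional interval recovers its nominal level.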

PAGE 26

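Figure 2-2. Plot of $\partial h/\partial \Delta_{21}$ for predetermined values of $\Delta_{21}$ and $\Delta_{32}$.

Figure 2-3. Plots of the first two terms of $\partial h/\partial \Delta_{21}$ for predetermined values of $\Delta_{21}$ and $\Delta_{32}$.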

PAGE 27

Figure 2-4. Confidence coefficient versus number of populations for the case of identical population means and $\alpha = 0.05$. The solid blue line corresponds to the confidence coefficient for the new confidence intervals, and the dashed red line corresponds to the confidence coefficient for the traditional confidence intervals.

PAGE 28

Figure 2-5. Cutoff point versus number of populations for the case of identical population means and $\alpha = 0.05$. The dashed blue line corresponds to the cutoff value for the traditional confidence interval, $z_{\alpha/2} = 1.96$. The dashed red line corresponds to the cutoff value for the new intervals and the dashed line corresponds to the cutoff value for the Bonferroni intervals.
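
The cutoff values for the new intervals can be traced to a short calculation, sketched here under the assumption (taken from the minimization result of this chapter) that the coverage probability is smallest when all population means are equal. In that configuration every relative distance is zero, each of the $p$ terms of the sum has the same value, and, because the derivative of $\Phi(z)^p$ is $p\,\Phi(z)^{p-1}\phi(z)$,

```latex
h_c(0) = p \int_{-c}^{c} \Phi(z)^{p-1} \phi(z)\, dz = \Phi(c)^{p} - \Phi(-c)^{p} .
```

Setting this quantity equal to $1 - \alpha$ determines $c$. For $p = 1$ and $p = 2$ the right-hand side reduces to $2\Phi(c) - 1$, which gives $c = 1.96$ when $\alpha = 0.05$. For $p = 10000$ the term $\Phi(-c)^p$ is negligible, so $\Phi(c) \approx 0.95^{1/10000} \approx 1 - 5.13 \times 10^{-6}$ and $c \approx 4.41$, which agrees with the value quoted in the text.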

PAGE 29

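3 ) as

$$
\sum_{j_1 \ne \cdots \ne j_k} P(\theta_{(1)} \in X_{(1)} \pm c, \ldots, \theta_{(k)} \in X_{(k)} \pm c,\ X_{(1)} = X_{j_1}, \ldots, X_{(k)} = X_{j_k}),
$$

where the sum has $p!/(p-k)!$ terms, one for each ordered choice of $k$ distinct indices. Let us consider first the case $p = 4$ and $k = 2$. Then, the probability of interest is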

PAGE 30

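3 ) can be written

$$
P(\theta_1 \in X_1 \pm c,\ \theta_2 \in X_2 \pm c,\ X_1 \ge X_2,\ X_2 \ge X_3,\ X_2 \ge X_4)
= P(|Z_1| \le c,\ |Z_2| \le c,\ Z_2 \le Z_1 + \Delta_{12},\ Z_3 \le Z_2 + \Delta_{23},\ Z_4 \le Z_2 + \Delta_{24})
$$

and making use of the normality assumptions, we can explicitly write

$$
\begin{aligned}
P(\theta_1 \in X_1 \pm c,\ \theta_2 \in X_2 \pm c,\ X_1 \ge X_2,\ X_2 \ge X_3,\ X_2 \ge X_4)
={}& \int_{-c}^{c} \int_{-c}^{\min(c,\, z_1 - \Delta_{21})} \Phi(z_2 - \Delta_{32})\, \Phi(z_2 - \Delta_{42})\, \phi(z_1)\phi(z_2)\, dz_2\, dz_1 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_2 - \Delta_{12})} \Phi(z_1 - \Delta_{31})\, \Phi(z_1 - \Delta_{41})\, \phi(z_1)\phi(z_2)\, dz_1\, dz_2
\end{aligned}
$$

Of course, the same argument is valid for the other terms in the sum. This way, considering all the 12 possible configurations for the order of the random variables $X_1$,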

PAGE 31

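3 ) in closed form

$$
\begin{aligned}
P(\theta_{(1)} \in X_{(1)} \pm c,\ \theta_{(2)} \in X_{(2)} \pm c)
={}& \int_{-c}^{c} \int_{-c}^{\min(c,\, z_1 - \Delta_{21})} \Phi(z_2 - \Delta_{32})\, \Phi(z_2 - \Delta_{42})\, \phi(z_1)\phi(z_2)\, dz_2\, dz_1 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_2 - \Delta_{12})} \Phi(z_1 - \Delta_{31})\, \Phi(z_1 - \Delta_{41})\, \phi(z_1)\phi(z_2)\, dz_1\, dz_2 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_1 - \Delta_{31})} \Phi(z_3 - \Delta_{23})\, \Phi(z_3 - \Delta_{43})\, \phi(z_1)\phi(z_3)\, dz_3\, dz_1 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_3 - \Delta_{13})} \Phi(z_1 - \Delta_{21})\, \Phi(z_1 - \Delta_{41})\, \phi(z_1)\phi(z_3)\, dz_1\, dz_3 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_1 - \Delta_{41})} \Phi(z_4 - \Delta_{24})\, \Phi(z_4 - \Delta_{34})\, \phi(z_1)\phi(z_4)\, dz_4\, dz_1 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_4 - \Delta_{14})} \Phi(z_1 - \Delta_{21})\, \Phi(z_1 - \Delta_{31})\, \phi(z_1)\phi(z_4)\, dz_1\, dz_4 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_2 - \Delta_{32})} \Phi(z_3 - \Delta_{13})\, \Phi(z_3 - \Delta_{43})\, \phi(z_3)\phi(z_2)\, dz_3\, dz_2 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_3 - \Delta_{23})} \Phi(z_2 - \Delta_{12})\, \Phi(z_2 - \Delta_{42})\, \phi(z_3)\phi(z_2)\, dz_2\, dz_3 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_2 - \Delta_{42})} \Phi(z_4 - \Delta_{14})\, \Phi(z_4 - \Delta_{34})\, \phi(z_4)\phi(z_2)\, dz_4\, dz_2 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_4 - \Delta_{24})} \Phi(z_2 - \Delta_{12})\, \Phi(z_2 - \Delta_{32})\, \phi(z_4)\phi(z_2)\, dz_2\, dz_4 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_3 - \Delta_{43})} \Phi(z_4 - \Delta_{14})\, \Phi(z_4 - \Delta_{24})\, \phi(z_3)\phi(z_4)\, dz_4\, dz_3 \\
&+ \int_{-c}^{c} \int_{-c}^{\min(c,\, z_4 - \Delta_{34})} \Phi(z_3 - \Delta_{13})\, \Phi(z_3 - \Delta_{23})\, \phi(z_3)\phi(z_4)\, dz_3\, dz_4
\end{aligned}
$$

In order to minimize this expression, we need to address two equally challenging difficulties: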

PAGE 33

3-1 . This pattern is particularly important since it suggests how to generalize the expression for any values of $p$ and $k$. In order to determine the configuration of $\Delta$'s that minimizes the expression in ( 3 ), we assume (without loss of generality) that $\theta_1 \ge \theta_2 \ge \theta_3 \ge \theta_4$; this way $\Delta_{ij} \ge 0$ for any $i \le j$. Also, we consider $\Delta_{12}$, $\Delta_{23}$ and $\Delta_{34}$ as free parameters. Based on our previous results, it is reasonable to believe that the minimum of ( 3 ) is reached at the origin. In order to prove this claim we have studied the behavior of the

PAGE 34

3-1 ), but the current formulation of the problem makes it difficult even to establish that it is not located at the interior of the region determined by $\Delta_{12}$, $\Delta_{23}$ and $\Delta_{34}$. These difficulties call for a different approach, which we discuss in the following section.

PAGE 36

where $I_j = \{j_1, \ldots, j_k\}$ is the set of indices for the top $k$ variables in the $j$-th arrangement and $I_j^c = \{j_{k+1}, \ldots, j_p\}$ is the set of indices for the bottom $p - k$ variables in the $j$-th arrangement. Notice that if $k = 1$ we are back in the case discussed in Chapter 2, and the case $k = p$ corresponds to simultaneous confidence intervals.

Let us take a closer look at this formula and consider first the case $p = 6$ and $k = 3$. In such a case, the sum in ( 3 ) will have $\binom{6}{3} = 20$ terms determined by the configurations

PAGE 39

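3 ) can be written as

$$
\iint_{R_1} \Phi(z_1 + \Delta_{12})\, \phi(z_1)\phi(z_3)\, dz_1\, dz_3 + \iint_{R_2} \Phi(z_3 + \Delta_{32})\, \phi(z_1)\phi(z_3)\, dz_1\, dz_3 .
$$

Similarly, the integral in ( 3 ) can be written as

$$
\iint_{R_1} \Phi(z_1 + \Delta_{12})\, \phi(z_1)\phi(z_3)\, dz_1\, dz_3 + \iint_{R_2} \Phi(z_1 + \Delta_{12})\, \phi(z_1)\phi(z_3)\, dz_1\, dz_3
$$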

PAGE 40

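where the equality is attained asymptotically as $\theta_p$ approaches infinity. Integrating ( 3 ) with respect to $z_p$, we obtain

$$
(\Phi(c) - \Phi(-c)) \left[ \sum_{j=1}^{\binom{p-1}{k-1}} \int_{-c}^{c} \cdots \int_{-c}^{c} \prod_{m \in I_j^c} \Phi\Big( \min_{\ell \in I_j \setminus \{p\}} \{ z_\ell + \theta_\ell \} - \theta_m \Big) \prod_{\ell \in I_j \setminus \{p\}} \phi(z_\ell)\, dz_\ell \right]
$$

where the quantity in brackets $[\,]$ is exactly the coverage probability for selecting $k - 1$ out of $p - 1$. Repeating the argument, but now letting $\theta_{p-1} \uparrow \infty$, we obtain the lower bound

$$
(\Phi(c) - \Phi(-c))^2 \left[ \sum_{j=1}^{\binom{p-2}{k-2}} \int_{-c}^{c} \cdots \int_{-c}^{c} \prod_{m \in I_j^c} \Phi\Big( \min_{\ell \in I_j \setminus \{p,\, p-1\}} \{ z_\ell + \theta_\ell \} - \theta_m \Big) \prod_{\ell \in I_j \setminus \{p,\, p-1\}} \phi(z_\ell)\, dz_\ell \right] .
$$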

PAGE 42

3-2 shows the result of a simulation study in which we considered six populations and varied the number of selected ones. In the first column we can see the number of population means set equal to zero (the

PAGE 43

3-3 summarizes the results for the observed coverage probabilities considering 6 populations obtained in a numerical study. The nominal level is 95%. In the table, the first column shows different configurations for the population means and the first row indicates the number of selected populations. We observed that for every configuration the observed coverage probability is never below the nominal level. These results remain valid for every other configuration we have considered (including changing the number of populations), which validates the reliability of the procedure.

Finally, we studied the behavior of the length of the intervals. In Chapter 2 we observed that the confidence intervals increase in length as the number of populations increases. This behavior is also expected when we are selecting $k > 1$ populations; however, it is important to determine how the value of $k$ affects the length of the intervals. Table 3-4 shows the results of a numerical study in which we considered different values of $p$ (total number of populations) and $k$ (number of selected populations). In the table, the first column shows the number of populations, and the first row the number of selected populations. In the body we observe the values of the cutoff points for 95% confidence intervals for the corresponding configuration, and the last column shows the cutoff values for 95% simultaneous confidence intervals using Bonferroni. We notice that the proposed intervals are always shorter than the Bonferroni intervals, even when we select all the available populations ($p = k$). This difference increases as the number of populations increases.

PAGE 44

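Table 3-1. Structure of the $\Delta$'s for the case $p = 4$, $k = 2$ (see 3 ). Each row represents a term in the sum.

Top
$(X_1, X_2)$    $+\Delta_{13}$  $+\Delta_{23}$  $+\Delta_{14}$  $+\Delta_{24}$
$(X_1, X_3)$    $+\Delta_{12}$  $-\Delta_{23}$  $+\Delta_{14}$  $+\Delta_{34}$
$(X_1, X_4)$    $+\Delta_{12}$  $-\Delta_{24}$  $+\Delta_{13}$  $-\Delta_{34}$
$(X_2, X_3)$    [...]

Table 3-2. Coverage probabilities for the number of population means equal to 0 (first column) vs the number of selected populations (first row).

# of $\theta_i = 0$    1      2      3      4      5      6
6                      0.740  0.740  0.739  0.738  0.714  0.531
5                      0.898  0.698  0.698  0.697  0.682  0.531
4                      0.904  0.813  0.662  0.662  0.654  0.531
3                      0.861  0.853  0.730  0.626  0.623  0.531
2                      0.819  0.818  0.805  0.658  0.592  0.531
1                      0.777  0.777  0.776  0.757  0.590  0.531
0                      0.740  0.740  0.739  0.738  0.714  0.531

Table 3-3. Observed coverage probability for 95% CI for the mean of the selected populations when $p = 6$ using the new method.

(0, 0, 0, 0, 0, 0) [...]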

PAGE 45

Table 3-4. Cutoff points for 95% CI for different values of $p$ and $k$ using the new method.

NumPop   k=1    k=2    k=3    k=4    k=5    Bonferroni
1        1.960                              1.960
2        1.960  2.236                       2.241
3        2.121  2.236  2.388                2.394
4        2.234  2.319  2.388  2.491         2.498
5        2.319  2.387  2.443  2.491  2.569  2.576

Figure 3-1. Coverage probabilities as a function of $\Delta$ when $p = 6$. The plots suggest the minimum is not reached at the origin.
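
The entries of Table 3-4 can be recomputed with a few lines of code. The sketch below assumes that the cutoff $c$ solves $(\Phi(c) - \Phi(-c))^{k-1}[\Phi(c)^{p-k+1} - \Phi(-c)^{p-k+1}] = 1 - \alpha$, that is, the lower bound stated for each term at the start of Chapter 4, and that the Bonferroni column is $z_{\alpha/(2p)}$. The function names and the bisection solver are illustrative choices rather than the author's code.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def lower_bound(c, p, k):
    """Lower bound on the coverage when k of p populations are selected."""
    hi, lo = _STD.cdf(c), _STD.cdf(-c)
    return (hi - lo) ** (k - 1) * (hi ** (p - k + 1) - lo ** (p - k + 1))


def cutoff(p, k=1, alpha=0.05, tol=1e-10):
    """Solve lower_bound(c, p, k) = 1 - alpha for c by bisection."""
    a, b = 0.0, 10.0  # the bound increases from 0 to (numerically) 1 on this range
    while b - a > tol:
        mid = (a + b) / 2
        if lower_bound(mid, p, k) < 1 - alpha:
            a = mid
        else:
            b = mid
    return (a + b) / 2


def bonferroni_cutoff(p, alpha=0.05):
    """Two-sided Bonferroni cutoff for p simultaneous intervals."""
    return _STD.inv_cdf(1 - alpha / (2 * p))


if __name__ == "__main__":
    for p in range(1, 6):
        row = [f"{cutoff(p, k):.3f}" for k in range(1, p + 1)]
        print(p, *row, f"{bonferroni_cutoff(p):.3f}")
```

Running the script prints one row per value of p, which can be compared against the table above; `cutoff(10000)` returns roughly 4.41, the figure quoted for 10000 populations in Chapter 2.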

PAGE 46

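From our previous results, we notice that for every term in the sum

$$
P(\theta_{(1)} \in X_{(1)} \pm c, \ldots, \theta_{(j)} \in X_{(j)} \pm c) \ge (\Phi(c) - \Phi(-c))^{j-1} \left[ \Phi^{p-j+1}(c) - \Phi^{p-j+1}(-c) \right],
$$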

PAGE 47

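4 ) and obtain [...] where $\Phi(\cdot)$ is the cdf of the standard normal distribution. Since the inequality above is not obtained by direct minimization of the coverage probability in ( 4 ), any solution based on ( 4 ) is likely to be too conservative. Therefore, it is important to assess the performance of the proposed bound in terms of its proximity to the coverage probability.

The first thing to determine is the behavior of the lower bounds at the component level ($K = j$). Figure 4-1 shows the results of a numerical study considering the components $K = 1, \ldots, K = 6$ of the coverage probability when $p = 6$. The dashed blue line shows the behavior of the respective component as the norm of $\theta = (\theta_1, \ldots, \theta_6)$ increases and the red solid line shows the corresponding lower bound. We observe that the lower bound (for the individual terms) is not extremely conservative.

On the other hand, the probability that $K = j$ is given by [...] where $P(X_i \ge d_j) = 1 - \Phi(d_j - \theta_i)$ is the probability of selection for population $i$. Notice that the expression in ( 4 ) resembles a binomial distribution. In fact, taking $\theta_1 = \cdots = \theta_p = \theta$, we have

$$
\sum_{i=1}^{\binom{p}{j}} \prod_{\ell \in I_i} \left[ 1 - \Phi(d_j - \theta_\ell) \right] \prod_{\ell \in I_i^c} \Phi(d_j - \theta_\ell)
= \binom{p}{j} \left[ 1 - \Phi(d_j - \theta) \right]^{j} \left[ \Phi(d_j - \theta) \right]^{p-j},
$$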

PAGE 48

4-2 shows the results of a numerical study in which we take $d_1 = \cdots = d_p = d$, and use the quantity $q$ as a tuning parameter. We see that by changing the value of the probability of selection we can move the position of the lower bound (red solid line) and produce some improvement in the approximation of the coverage probability.

Based on the previous observations, we can obtain an approximate solution to the problem and determine $c > 0$ using the equation

$$
1 - \alpha = \sum_{j=1}^{p} \binom{p}{j} (\Phi(c) - \Phi(-c))^{j-1} \left[ \Phi^{p-j+1}(c) - \Phi^{p-j+1}(-c) \right] \left[ 1 - \Phi(q_j) \right]^{j} \left[ \Phi(q_j) \right]^{p-j},
$$

for any $0 < \alpha < 1$. Numerical studies suggest the results based on the expression above are not extremely conservative. In addition, the results suggest the performance of the method greatly improves as the number of populations increases (see Figure 4-3).

Benjamini and Hochberg (1995) and is a technique commonly used by practitioners in the context of multiple testing. The main idea is to control the proportion of errors committed by falsely rejecting null hypotheses. In simple terms, the procedure works in the following way: suppose that we need to test hypotheses $H_1, \ldots, H_m$ and we are not willing to accept a proportion of false discoveries greater than $q$. We first rank the P-values (and corresponding hypotheses) resulting from all the tests from smallest to largest and define the sequence $q_1, q_2, \ldots, q_m$ according to $q_i = (i/m)\, q$ for $i = 1, \ldots, m$. Then, we define $k$ to be the largest $i$ such that P-value$_i$
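
The ranking rule just described can be sketched in code. The function below implements the standard Benjamini-Hochberg step-up rule, which is assumed here to be the procedure the text refers to: with m hypotheses and target rate q, it returns the largest rank i whose ordered P-value does not exceed (i/m)q. The function name and the example P-values are hypothetical.

```python
def bh_num_selected(p_values, q):
    """Number k of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    k is the largest rank i (1-based, after sorting in increasing order)
    such that the i-th smallest P-value is at most (i / m) * q; k is 0 if
    no rank qualifies.
    """
    m = len(p_values)
    k = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= (i / m) * q:
            k = i  # keep the largest qualifying rank
    return k


if __name__ == "__main__":
    example = [0.02, 0.5, 0.035, 0.03]  # hypothetical P-values
    print(bh_num_selected(example, q=0.05))
```

In the example, ranks 1 and 2 fail their thresholds (0.0125 and 0.025) but rank 3 passes (0.035 against 0.0375), so the rule selects k = 3 hypotheses. This illustrates why the largest qualifying rank is used rather than stopping at the first failure, and it is this data-dependent k that plays the role of the random number of selected populations.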

PAGE 50

Figure 4-1. Individual components and corresponding bounds for the terms of the coverage probability for random K when $p = 6$. The blue dashed line corresponds to the coverage probability and the red solid line is the lower bound.

PAGE 51

Figure 4-2. Behavior of the lower bound for random K when $p = 6$ as the probability of selection varies.

PAGE 52

Figure 4-3. Behavior of the coverage probabilities and respective lower bounds for random K as the population size $p$ varies.

PAGE 54

5-1 shows the mean, standard deviation and confidence intervals for the top 5 and bottom 5 selected genes. The table shows that, although the value of the cutoff point $c$ is seemingly large, the actual confidence intervals are narrow enough to draw practical conclusions.

5-2 shows the mean, standard deviation, P-value and confidence intervals for the mean difference of the 25 populations selected using the FDR criteria. Again, we observe that the intervals are narrow enough to carry out meaningful inference. In fact, the results of all the intervals agree with the conclusions of the tests.

PAGE 55

Table 5-1. Confidence intervals based on the selection of the top 100 log-score differences.

Ranking   Mean   StDev   95% CI
1         4.76   0.262   (4.247, 5.268)
2         4.38   0.303   (3.790, 4.969)
3         3.93   0.203   (3.534, 4.325)
4         3.79   0.519   (2.782, 4.804)
5         3.52   0.600   (2.351, 4.685)
96        1.35   0.930   (-0.457, 3.163)
97        1.35   0.680   (0.029, 2.675)
98        1.35   1.459   (-1.488, 4.189)
99        1.35   0.911   (-0.428, 3.118)
100       1.34   0.915   (-0.445, 3.117)

Table 5-2. Confidence intervals based on the selection of the top log-score differences, randomly chosen using FDR.

Mean   StDev   P-value    95% CI
4.38   0.301   2.67e-08   (3.778, 4.981)
3.93   0.203   3.93e-08   (3.526, 4.333)
4.76   0.262   2.13e-07   (4.237, 5.279)
2.85   0.662   1.60e-06   (1.532, 4.161)
0.95   0.236   2.00e-05   (0.483, 1.420)
3.03   0.588   2.82e-05   (1.859, 4.194)
0.99   0.338   4.84e-05   (0.320, 1.662)
0.48   0.086   6.44e-05   (0.311, 0.652)
1.83   0.351   6.50e-05   (1.130, 2.526)
1.24   0.449   6.64e-05   (0.345, 2.129)
0.88   0.232   6.95e-05   (0.417, 1.337)
1.25   0.457   7.87e-05   (0.344, 2.159)
2.61   1.177   8.81e-05   (0.271, 4.943)
1.32   0.483   9.02e-05   (0.357, 2.277)
0.98   0.173   9.16e-05   (0.638, 1.324)
3.52   0.600   9.57e-05   (2.328, 4.709)
2.74   0.739   1.04e-04   (1.277, 4.212)
1.42   0.450   1.15e-04   (0.530, 2.319)
1.00   0.431   1.19e-04   (0.139, 1.851)
1.12   0.510   1.25e-04   (0.103, 2.127)
3.50   0.582   1.29e-04   (2.343, 4.654)

PAGE 58

Bechhofer, R. (1954). A single-sample multiple decision procedure for ranking means of normal populations with known variances. The Annals of Mathematical Statistics 25(1), 16.
Bechhofer, R., T. Santner, and D. Goldsman (1995). Design and analysis of experiments for statistical selection, screening, and multiple comparisons. Wiley.
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289.
Berger, J. (1976). Inadmissibility results for generalized Bayes estimators of coordinates of a location vector. The Annals of Statistics, 302.
Blumenthal, S. and A. Cohen (1968). Estimation of the larger of two normal means. Journal of the American Statistical Association 63(323), 861.
Brown, L. (1979). A heuristic method for determining admissibility of estimators with applications. The Annals of Statistics, 960.
Chen, H. and E. Dudewicz (1976). Procedures for fixed-width interval estimation of the largest normal mean. Journal of the American Statistical Association 71(355), 752.
Cohen, A. and H. Sackrowitz (1982). Estimating the Mean of the Selected Population. Third Purdue Symposium on Statistical Decision Theory and Related Topics.
Cohen, A. and H. Sackrowitz (1986). A Decision Theoretic Formulation for Population Selection Followed by Estimating the Mean of the Selected Population. Fourth Purdue Symposium on Statistical Decision Theory and Related Topics, 243.
Dahiya, R. (1974). Estimation of the mean of the selected population. Journal of the American Statistical Association 69(345), 226.
Gupta, S. and K. Miescke (1990). On finding the largest normal mean and estimating the selected mean. Sankhya: The Indian Journal of Statistics, Series B 52(2), 144.
Gupta, S. and S. Panchapakesan (2002). Multiple decision procedures: theory and methodology of selecting and ranking populations. Society for Industrial Mathematics.
Gupta, S. and M. Sobel (1957). On a statistic which arises in selection and ranking problems. The Annals of Mathematical Statistics 28(4), 957.
Guttman, I. and G. Tiao (1964). A Bayesian approach to some best population problems. The Annals of Mathematical Statistics 35(2), 825.
Hwang, J. (1993). Empirical Bayes Estimation for the Means of the Selected Populations. Sankhya: The Indian Journal of Statistics, Series A 55(2), 285.

PAGE 59

Putter, J. and D. Rubinstein (1968). On estimating the mean of a selected population. Tech. Rept. 165.
Qiu, J. and J. Hwang (2007). Sharp simultaneous intervals for the means of selected populations with application to microarray data analysis. Biometrics 63, 767.
Sackrowitz, H. and E. Samuel-Cahn (1984). Estimation of the mean of a selected negative exponential population. Journal of the Royal Statistical Society. Series B (Methodological) 46(2), 242.
Sackrowitz, H. and E. Samuel-Cahn (1986). Evaluating the chosen population: a Bayes and minimax approach. Lecture Notes-Monograph Series, 386.
Saxena, K. (1976). A single-sample procedure for the estimation of the largest mean. Journal of the American Statistical Association, 147.
Saxena, K. and Y. Tong (1969). Interval estimation of the largest mean of k normal populations with known variances. Journal of the American Statistical Association, 296.
Stein, C. (1964). Contribution to the discussion of Bayesian and non-Bayesian decision theory. Handout, Institute of Mathematical Statistics Meeting.

PAGE 60

Claudio Fuentes was born in Chile in 1977. Upon graduation from high school, he enrolled as a student at the Pontificia Universidad Catolica de Chile, where he received a degree of Bachelor of Science in mathematics in 2001. During his undergraduate studies he was appointed as a teaching assistant for several courses. It was then that he developed a deep appreciation for teaching and decided to pursue an academic career. In December 2003, he received a master's degree in statistics from the same institution. In August 2005, he entered the graduate program in the Department of Statistics at the University of Florida. During his education there, he had the opportunity to work as a research assistant for Distinguished Professor Dr. George Casella, who became his advisor. In August 2008 he earned the degree of Master of Science in statistics with a thesis in cluster analysis, and in August 2011 he earned his Ph.D. in Statistics with a dissertation in interval estimation following selection. After graduation, he joined the Department of Statistics at Oregon State University as assistant professor.