Define b as the smallest integer larger than l/mn(j).

4. The number of true null hypotheses is estimated as go

= min(b, g).

Recently, Storey and Tibshirani [6] observed and demon-

strated in their Figure 1 that the density histogram of p-

values based on pl,..., pg from a microarray experiment

looked flat in the region of (0.5, 1). Based on the fact that

the null p-values of genes are uniformly distributed, most

of genes in the region of (0.5, 1) should be from the null.

Therefore, they used a smoothing method to estimate go

based on the flat region of the observed p-values. How-

ever, the implicit requirements of their method are: 1) g

should be large, and 2) the proportion of true null

hypotheses should also be large, such as 0.5, so that0 can

be estimated accurately. The method developed in this

paper tries to bypass these two requirements so that a

broad range of multiple testing problems can be applied.

It is conceivable that when the number of relevant genes

are known for certain target disease, a chip with a small

number of genes will be widely used for diagnosis. The

FDR methods, including both the BH and the aBH ver-

sions, have now been widely accepted for microarray anal-

yses. They have been implemented in the R program [7],

which can be incorporated into the Bioconductor package

[8]. Several microarray analysis computer programs, such

as SAM (Significance Analysis of Microarrays) [9] and

GeneSpring [10] also use the FDR criterion to identify dif-

ferentially expressed genes. Yang et al. [11] have used the

FDR criterion to determine the sample size in microarray

experiments.

Methods

Without loss of generality, we describe the method in a

one-sample testing problem. Its extension to commonly

used paired-t test, two-sample t test, or nonparametric

rank tests in microarray analysis is very straightforward.

Let Yi = (Y,i,..., Yig)', i = 1I..., n be g-variate vector samples

from populations with means p = (p/,..., pg)'. Define YO) =

(Y1j,..., y,1), ,j = 1,..., g as a row vector for the jth compo-

nent. The null hypothesis of the jth component is Ho0 : /j

= 0 and the alternative hypothesis is H1j: p/j # 0. Let the p-

Page 3 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

value for rejecting the null H0j be p, derived from a prop-

erly chosen test statistic.

Once the p-values are derived, we first summarize the

information content of p-values using a differentiable

real-valued decreasing function h, i.e., h(pi,..., pg) is

decreasing to all its arguments. We call h the combining

function and its value, 7 = h(pi,..., pg), the global statistic.

Next, define 77(0) to be the observed global statistic of 77

derived directly from the observed data. Note that a small

value of pj indicates Hoj is less likely to be true. Since h is a

decreasing function of its argument pj, a large value of /(0)

indicates a subset of the null hypotheses is likely to be true

alternatives. To determine whether 7/(0) is large enough to

make such a claim, we need to know the distribution of 77

when all the nulls are true. The distribution of 77 depends

on the distribution of its arguments, the correlation

between its arguments, and the combining function h. In

a microarray experiment, there seems no way to model or

estimate so many correlations (see, for example, Chapter

6 of Pesarin [12]). Hence, a reasonable approach which

can tackle the correlation within each experimental unit is

to determine the critical value by a permutation test. More

specifically, we calculate all B = 2" values of 7 based on

(coY1 ,..., co,Y3n), where ), = 1 (The multivariate two-sam-

ple permutation method can be found in Pesarin [12]).

Hence, we have a set of reference values {q0) ,..., (g 0)}

from the B permutations, and can now define the p-value

of the global 7/()-statistic as

(0) b=1 ) (0)

B

where I(x) is the indicator function that takes value 1 if the

statement x is true and 0 if it is not. To determine whether

the global null hypothesis of all p/),j = 1,..., g, is zero is to

evaluate if the global p-value, p(0), is less than or equal to

a significance level. However, this level is part of the esti-

mation process. It will be determined by the data (see later

Alg. 2).

When the global null hypothesis is rejected, it indicates

that not all null hypotheses Ho0s are true. Immediate ques-

tion is, which subset of hypotheses is the true alternative.

To determine whether to reject the hypothesis H(oi) is a

multiple testing problem. We can, however, determine the

size of true alternatives using the following iterated proc-

ess. We regard the gene with the smallest p-value as the

major contributor for the rejection and claim that the null

hypothesis is not true with this gene. When this gene is

removed from the data set, we continue the same process

with the rest g 1 genes and the p(O) computed by (2) is

now denoted by p(1). The whole process is then repeated

to produce a sequence of pseudo-global p-values, p(s), s =

1,..., g 1. We call the later step global p-value pseudo

because it is only based on the subset of the original data.

The detail of how to generate pseudo-global p-values is

given in the Algorithm section. First, we will use these

pseudo-global p-values to estimate the number of true

null genes go.

We observe that, intuitively, p(o) < p(1) < ... < p(9-i), and this

monotone increasing property will be proved in (J.1) of

the Justification section. In addition we will prove in

(J.2) of the Justification section that if r pseudo-global p-

values are less than a given value /(0
tor of the number of null genes is

g0o =g r+ p /(1- )2

and this estimator ensures that E [ go I > go. (The conserv-

ativeness of this estimate under other conditions using

other techniques has also been mentioned by Storey and

Tibshirani [13] and Efron et al. [ 14].) Since the inequality

E [ g0 ] > go holds for any value of t, the best choice of/ is

the one which makes go closest to go. By defining A = 8/

(1 l)2 and p(/) = r A, we observe that p(/j) is an increas-

ing function and then a decreasing function of 8/ for the

following reasons. When 8/ is small, A increases slowly.

However, r is an increasing function in /f and r is always

greater than 1. Therefore, p(/j) is an increasing function

for small value of /. On the other hand, when / -> 1, A

reaches oo. Since r is finite, p(/) becomes a decreasing

function in this range. Since (3) is equivalent to go = g -

p(8), the optimal value of / should be determined as

/ = arg max p(3)

/3

and the method of estimating go is equivalent to finding

the optimal /value.

The estimation of go can thus be summarized as

Alg. 2 global-p method in go estimation

1. List rp as the integer such that p(-1) '

p ) > /for a large number ofs for 0

Page 4 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

-10000 -5000 0 5000 10000 15000

0 0 0 0 0 0 O

Figure 5

The plot of function p(/) in a real data analysis to determine

the arg max p(3) for the tumor data.

2. Find such that p(0J) = r/- 6/(1 f)2 is maximized (see

Figure 5). Let this /fbe /*.

3. Let r,, be the integer such that p( -1) < V s = 1,..., r,

and p ) > ,*

4. Let g = min[g r, + /( *)2, g].

We called this global-p method and denoted it by .

Alg. 3 Differentially expressed gene selection with given

FDR

With go estimated, to control the FDR at a level for identi-

fying differentially expressed genes in microarray data

analysis with the conservative estimate go0, we simply test

each sub-hypothesis based on (1) with the new a- a/ go0,

i.e., rejects all H), j = 1,..., /, if p) < j x a and P(+) > (I

go

+ 1) x We also need to address the choices for the

combining function h. In this paper, we consider only two

commonly used ones. 1) Fisher's sum of logarithm

g

n= h(p ...,pg)= -21log(p,)

j=1

and 2) Liptak's sum of inverse standard Gaussian distribu-

tion functions

n= h(pl,...,pg f 1(I- ps).

j=1

More discussion on the choices of combining functions

can be found in Birnbaum [15] and Folks [16]. The com-

puter program for the proposed method, global-p, has

been implemented in R script and is publicly available

[171.

Results

Simulation

Two simulation studies were conducted. First, when the

number of hypotheses are small and second, when the

total number of hypotheses is extremely large. For small

number of hypotheses, we used the following configura-

tions: numbers of hypotheses are 16, 32, 64, and 128 with

sample size n = 10 and the proportions of go/g being 0,

0.25, 0.5 and 0.75. The true expressions under the alterna-

tive hypotheses are assumed to be variance 1 Gaussian

random variables with non-zero mean values p/ = dj. In

one set of experiments, all djs are set as 0.2 or 0.4; in the

other set the djs are divided into 4 equal size blocks with

values 0.2, 0.4, 0.6 or 0.8 in each block. The number of

permutations is B = 1,000 and the number of simulations

is 20,000. The FDR is set at a = 0.05. Four procedures are

used for testing: BH, aBH, our global-p test one with

Fisher's combining function, and one with Liptak's com-

bining function. For the purpose of comparison, we also

plug in the true value of go into the aBH method. This is

the ideal situation to reach the highest power.

Table I: False discovery rate when all null hypotheses are true.

BH aBH

0.047

0.049

0.049

0.048

0.047

0.049

0.049

0.048

Fisher

0.048

0.049

0.049

0.049

Liptak

0.048

0.050

0.049

0.049

Page 5 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

BMC Bioinformatics 2006, 7:15

Table 1 lists the simulated expected FDR when all null

hypotheses are true. Figures 1 and 2 show the estimated

FDR and power under various gene expression composi-

tions.

Based on this simulation results, all the four methods

have their expected FDR below the nominal FDR a = 0.05.

Both combining functions in our proposed method have

a higher power than the aBH procedure in most situa-

tions, while there is little difference between them. The

d = 0.2

1.0 -

= 0.8 -

--

Z 0.6 -

S0.4 -

0

0.2 -

O.-

0.0 -

- 1.0 -

0.8 -

0.6 -

LO 0.4 -

CN

0.2 -

0.0 -

- 1.0 -

S0.8 -

0.6 -

0-

O 0.4 -

LO

0.2 -

O.-

0.0 -

- 1.0 -

Z 0.8 -

z-

S0.6 -

LO 0.4 -

t-~

0.2 -

0.0 -

16 32 64 128

http://www.biomedcentral.com/1471-2105/7/15

improvement over the aBH methods is substantial when

the proportion of the true null hypotheses is small.

Since most microarray data consist of tens of thousands of

genes simultaneously and many of their expressions are

correlated, our second simulation tried to reflect these

facts. Storey and Tibshirani [13] proposed a clumpyy"

dependence model in which genes are partitioned into

blocks. The gene expressions are assumed independent

between blocks but dependent within the same block.

d = 0.4 Mixed d

------ - -

._

16

32 64 128 16 32 64 128

Figure 2

The same simulation study with estimated average powers, using the same legends as Figure I. The powers of the new method

are clearly higher than the existing two, especially when the null proportion is small.

Page 6 of 14

(page number not for citation purposes)

, , ,

IIII

http://www.biomedcentral.com/1471-2105/7/15

Following the clumpy model, 10 test and 10 control sam-

ples were generated from normal random variables with

mean 0 and standard deviation 1 where each sample con-

tained 10,000 genes. For test samples, genes with expres-

sion difference were added by 3 to represent the

expression differences. The number of differentially

expressed genes go were 100, 2,000, 5,000, 8,000 which

corresponded to the proportion of nulls 0o at 0.99, 0.8,

0.5, 0.2. To simulate intra-block dependency, we gener-

ated one vector of normal random variables with mean 0

and standard deviation 0.2 in each block of size 50 and

add it to every gene in that block. This process creates cor-

relations between genes. Since the true expression differ-

ences for non-null genes are moderate large in this

simulation, we expect that any good method should accu-

rately estimate o0.

In a recent article, Broberg [18] did a comparative review

of various newly developed methods for estimating r0. To

make a comparison, we presented the means and stand-

ard errors of estimates of 0o for all the methods describe in

Broberg's paper in addition to our proposed method (p )

over 1000 simulations. The details of r0estimation meth-

ods (BUM, SPLOSH, QVALUE, Boostrap LSE, SEP, LSL,

mgf, PRE) can be found in Broberg's paper and the pro-

grams for these methods can be found in the add-on pack-

ages of the freely available R software [19]. The simulation

results are shown in Table 2.

As expected, all of them performed very well for large o0.

If both the mean (bias) and standard error (stability) over

the whole range of 0o are considered, LSL and Global-p

stand out and LSL is better in stability. However, as we will

see from the next study, LSL seems to be over-conservative

in real data analysis. The means of both combining func-

tions of our proposed method performed well but Liptak's

function has a higher standard errors. Therefore, Fisher's

function will be used for the real data analysis. During the

simulation, we also noticed that current implementation

of SPLOSH, SEP, mgf, and PRE methods in R programs

failed to work when the number of hypotheses was small.

Real data analysis

We use a publically available experimental data set from

the Stanford microarray database [20] first to compare the

differences between BH, aBH and our method with

Fisher's combining function. The purpose of this experi-

ment was to identify genes that have different expressions

between prostate tumor tissue and matched normal tis-

sue. It consists of a total of 82 microarrays: 41 arrays were

from primary prostate tumors and the other half were

from matched normal prostate tissues. Each array con-

tains 32,152 different human genes. We chose this data

set because of the large number of replicates. With this

amount of replicates, we had a better idea of the ground

truth.

To compare performances of various methods, we split

the whole data set into two groups: a test data set and a

confirmation data set. Eleven pairs of tumor and normal

arrays were chosen for the test data set and the remaining

arrays were used for the confirmation. The number of

eleven test pairs were taken from a systematic sampling to

avoid bias. Patients 1, 5, 9,..., 41 in the original order from

the database were chosen as the test set. Expression infor-

mation is missing if it is labelled as failed, contaminated,

or flagged. Genes were removed from our study if more

than half of their expressions were missing in the original

data set. A total of 24,865 genes was used to identify dif-

ferences in expressions using BH, aBH and our proposed

procedures at 0.01 expected FDR level. Since the experi-

ment was designed with paired tumor-normal arrays, the

paired t-test was used to derive the p-value of each gene in

the test data set. The BH procedure identified 1,254 genes,

the aBH procedure identified 1,523 genes, and our pro-

posed method identified 2,119 genes. Our test is appar-

ently more powerful if it can maintain the required FDR

0.01. Since we did not have the biological information,

another approach was to estimate FDR based on our rejec-

tion set. Specifically, suppose the rejection set contains the

p-values that are less than and the estimated proportion

of null hypotheses is 0o An intuitive estimate for FDR is

[21,221

FDR() = (4)

Pr[P<^ ]

where Pr[P < 1] (= I(i P < )/g. Based on the

confirm data set, our proposed method reported an esti-

mator of r0, z0 = 0.5326. Using equation (4), the esti-

mated FDRs for BH, aBH and our method were 0.0054,

0.0067, 0.0099, respectively.

If we further reject more sub-hypotheses beyond the

number provided by our method, the FDR will exceed the

assigned level. For example, if we reject the next 500 null

sub-hypotheses that have the smallest p-values among

those not previously rejected, the estimated FDR is 0.0137

which is larger than the pre-assigned 0.01 level.

Moreover, we calculated the standard difference (paired t-

statistic) of expressions using the confirmation data set.

The confirmation data set contains 30 pairs of arrays so

Page 7 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

Table 2: The means (first row) and standard errors (second row) of estimates 7kO

0.2,0.5,0.8,0.99.

True TO

BUM

SPLOSH

LSL

Bootstrap LSE

QVALUE

SEP

mgf

0.200

0.1124

0.0021

0.8770

0.0737

0.2011

0.0005

0.1938

0.0079

0.1975

0.0107

0.9982

0.0004

0.2005

0.0014

0.1998

0.0017

Global-p

(Fisher)

Global-p

(Liptak)

0.5000

0.3962

0.0024

0.9030

0.0592

0.5015

0.0007

0.4905

0.0135

0.4979

0.0176

0.4978

0.0078

0.4944

0.0044

0.1712

0.0084

0.5002

0.0012

0.4992

0.0038

based on various methods when r0 =

0.8000

0.7288

0.0021

0.9343

0.0482

0.8016

0.0006

0.7863

0.0184

0.7980

0.0242

0.7982

0.0089

0.7973

0.0055

0.6568

0.0107

0.8002

0.0018

0.7978

0.0037

0.9900

1.0000

0.0000

0.9594

0.0271

0.9904

0.0003

0.9787

0.0159

0.9871

0.0154

0.9883

0.0090

0.9905

0.0062

0.9815

0.0066

0.9899

0.0019

0.9884

0.0027

that the statistic is a good estimate for the unknown stand-

ard difference. Figure 3 shows histograms of absolute

standard differences based on genes identified by the BH

procedure, by the aBH procedure, by our method, and the

extra 500 genes beyond our method. From the histo-

grams, several hundred differentially expressed genes that

were not identified by BH or aBH in the test arrays were

identified by our method and most of them have standard

expression differences greater than 2. However, the next

group of 500 genes beyond our method may contain too

many false discoveries to make the FDR acceptable.

Next, we use this data set to compare different estimates

of proportion of null genes, zr. The whole data set was

partitioned into four sub-data sets. We labelled the arrays

from 1 to 41 based on the original order in their database.

Sub-data set 1 contained 11 arrays with labels 1, 5, 9,...,

41; sub-data set 2 contains 10 arrays with labels 2,6,..., 38;

sub-data set 3 contains 10 arrays with labels 3,7,..., 39;

sub-data set 4 contained the remaining arrays. Although

we did not know the true value of %o in real data analysis,

we used the four sub-data sets to compare the variation of

different estimation methods. In addition, we used the

whole data set to check if the estimates are reliable. For

global-p method, we also plot the function of p(/) in Fig-

ure 5 using the whole data set. The maximum of p(/)

occurs at = 0.91.

The estimates 0o were listed in Table 3. All z0 estimates

in the subsets were larger than the ones in the whole data

set. This is expected because genes with small expressional

differences are more likely to be classified as null genes in

a small sample. From the %o estimates from the whole data

set, we may say the LSL gave a large %o value, PRE gave a

small value and all the others gave similar estimates. We

used LSL, PRE and Global-p to draw the p-value density

histogram using the whole data set in Figure 4. The upper

panel is the overall view and the lower panel is the closer

view. The green dash-dot line is the height using LSL esti-

mate; the red dash line PRE estimate; the blue dot line glo-

bal-p estimate. The LSL method, which is also shown in

Broberg's study, is too conservative producing the largest

value. The PRE method underestimated zr. If we used 0o

by PRE method to improve FDR procedure, it is likely to

have a higher FDR than the nominal level. We think our

%o estimate is reliable because it is close but higher than

the height of observed p-values in the region of (0.6, 1) if

we assume genes with expressional differences have rare

chance to have large observed p-values.

Conclusion

Benjamin and Hochberg's FDR procedure [21 opens an

useful approach for multiple comparisons in microarray

analysis. Due to the complexity in microarray experi-

ments, it seems unlikely that one method can dominate

all the other methods in all the cases. In this paper, we

have demonstrated a new method that is intuitive appear-

ing because:

Page 8 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

3 4 5 6 7 8 9 10 11 12

Adaptive BH

3 4 5 6 7 8 9 10 11 12

Proposed method

0 1 2 3 4 5 6 7 8 9 10 11 12

500 genes above proposed method

--1 -__fl

0 1

2 3 4 5 6 7 8 9 10 11 12

Absolute standard difference estimate

Figure 3

The histograms of the absolute standard difference estimates (x-axis) in the confirmation data set for the selected genes from

the three procedures based on the test data set for the prostate tumor data. The top is the selected gene histogram by the BH

procedure; the second is that by the aBH procedure; and the third is the one by our method. The similarity of the three histo-

grams means that the additional 596 genes picked by our method should maintain the same FDR (0.01) as the other two. The

bottom panel contains the next 500 genes beyond our proposed method. The left shift of this histogram means that the addi-

tional genes may contain too many false discoveries.

Page 9 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

-.- --

-. -- I I I I- 1 .

0.0 0.2 0.4 0.6 0.8 1.0

-II I I

0.0 0.2 0.4 0.6 0.8 1.0

Figure 4

The histogram of p-values in a real data analysis. About half of

the large p-values to the right look uniformly distributed.

1. It uses permutation test which can take care the com-

plex correlation structures in gene expressions.

2. Its global test based on sequentially eliminated signifi-

cant genes should provide a good stopping rule because

all the remaining genes are always considered together.

With the support of simulation and real data studies, the

new method should be a viable alternative to find the dif-

ferentially expressed genes in microarray experiments.

Algorithm

We illustrate the algorithm for calculating pseudo-global

p-values based on marginal p-values, pl,..., Pg Please

note that the procedure described here is the same regard-

less of go. We consider h(pl, ..., pgo ) in the summation

form:

q = h(p1,...,pgo)

go

Sh(Pi),

i=1

where is a differentiable decreasing function in which

dh(t)

h'(p) = dt exists and (p) -> oo as p -> 0. Note that

dt t = P

equation (5) totally meets the requirement that h should

be a differentiable real-valued decreasing function. The

realization of can be, for instance, (p) = -2log(p) if we use

Fisher's sum of logarithm or (p) = (D-1 (1 p) if we use

Liptak's sum of inverse standard Gaussian distribution

function. Recall that pj is the p-value obtained fromjth row

vector (Ylj,..., Yn,) and the ordered marginal p-values are

PP()
denoted by subscripts while the pseudo-global p-values in

(2) in are denoted by superscripts. Parentheses are used to

reveal the order property.

In one-sample testing problem, we permute data B times,

where each time we assign wi = 1 with equal probability

to (wY, ..., it Y ) For each permuted data, we use PbN) to

be the p-value from the bth permuted data for the gene

that produced p), j = 1,..., go, b = 1,..., B.

We summarize P(i) and Pb(i) using the following table.

P(1) PI(1) ... Pb(1) ... PB(1)

P(2) Pl1(2)

... Pb(2)

PB(2)

P(go) P1(go) Pb(go) "' PB(go)

Note that the first column is p-values of genes from the

original data while the remaining columns are from the

permuted data.

To simplify the notation, let (()) = (), b(i) = b(i) and 7 is

just the sum of (.). Therefore, we can transform the table

(6) into the following table with the addition of the last

row where its value 7 is the sum of the (.) in its corre-

sponding column.

h(1) h(1) ... hb(1) ... hB(1)

h(2) h1(2)

.. hb(2)

.. hB(2)

h(go) hl(go) ... hb(go) ... hB(go)

4 4 ... 4 ... 4

(0 ... 7 ) ... o 0)

Once o(0) and 0(),..., 7(B) are available, the global p-value,

by definition, is

(o) -1 b )1 b (o)

B

The same process is directly used to calculate the pseudo-

global p-values after removing the genes which have the

smallest p-values. Actually, we can use the available data

illustrated in table (8). By induction, suppose P(I),..., P(s)

Page 10 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

Table 3: Estimates of 7Ct based on the prostate tumor data set from the Stanford microarray database. The first four rows are

estimated based on the four partitions of the whole data set while the last row the is based on the data set. The means and standard

errors (5th and 6th rows) are calculated based on the estimates of the four sub-data sets.

BUM SPLOSH

0.4011

0.5211

0.5529

0.6056

0.5202

0.0867

0.3940

0.4844

0.5710

0.6104

0.6627

0.5821

0.0752

0.4706

LSL Bootstrap

LSE

0.8071

0.9106

0.9302

0.9628

0.9027

0.0672

0.6598

0.4802

0.5797

0.6235

0.6709

0.5886

0.0813

0.3949

QVALUE

0.4662

0.5679

0.6070

0.6618

0.5758

0.0826

0.3882

SEP mgf PRE Global-p

0.5792

0.5865

0.6232

0.6760

0.6162

0.0443

0.5979

0.5677

0.6866

0.7120

0.7529

0.6798

0.0796

0.4706

0.4536

0.5727

0.6097

0.6646

0.5751

0.0894

0.2803

0.5583

0.6722

0.7082

0.7493

0.6702

0.0821

0.4721

have been removed, the values of the raw data and their

reference sets are

h(s+1) hl(s+l) ... hb(s+1) ... hB(s+l)

h(s+2) hl(s+2) ... hb(s+2) ..*** hB(s+2)

(8)

h(go) hl(g0) ... b(o) ... hB(go)

(S) () ( (s)

1 1 ... 1,t. ... B

and the pseudo-global p-value after removing the s genes

is

B I(7(s) >(s))

p(s) b=1 b

B

Justification

Our justification is based on extreme cases to make the

proofs tractable. However, this requirement is not very

critical. Our simulation and real data analysis show that

even with moderate go value, the pseudo-global p-value,

p(l), starts to increase when our procedure has removed

most of the significant genes. Besides, it jumps up very fast

to reach the point that p(s) will be larger than threshold /8

and stop removing genes. Let g be the total number of sub-

hypotheses considered. Given g, let d = (d1,..., dg) denote

the vector containing the true values of pj, j = 1,..., g for

each sub-hypothesis; and R(g, d) be the number of YO)

removed using procedure p Without loss of generality,

we assume d( < ... < d(g-g0) < 0 and do) = 0 forj = g go +

1,..., g. We have

and the equality holds when d(gg0) -> The equality

affirms that the sub-sample YO) with extreme small dj < 0

is identified and removed from the component of 77. Actu-

ally, when the gene expression difference is far away from

zero, it will be identified by any reasonable multiple test-

ing procedure. Hence, in this extreme case, we only focus

on EV[R(go,O)] which is based on the remaining go sam-

ples that have null distributions. Our goal is reduced to

finding an upper bond of EV[R(go,O)], so that the

expected value of go can be bounded (below) by go.

J. I Monotone in Pseudo-global p-values

The key feature of our algorithm is that the pseudo-global

p-values are monotone increasing. We can be more precise

by showing that

Pr p() < p(+) (S) 1 ass > 0.

go

(10)

Romano (Section 2)[23] has proved that the distribution

of 7Sb) can be approximated by Gaussian distribution

using the Central Limit theorem. Therefore, if we let the

mean and standard deviation of reference samples 17s) (b

= 1,..., B) be p/(s) and o(s), respectively, and use Z to denote

the standard Gaussian random variable with distribution

function (D, then the pseudo-global p-value p(s) can be

expressed as

E [R(g,d) < g- go + E [R(go,0)],

Page 11 of 14

(page number not for citation purposes)

Sub-data I

Sub-data 2

Sub-data 3

Sub-data 4

Mean

Sd

Whole

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

B

Pr Z > (s) () 1()

SPr Z > C7()

where C(s) = -1(1 p(s)) is p(s)th upper percentile of the

standard Gaussian distribution. Hence, the relation

between 7l(s) and C(s) is

q(s) = P(s) + oCs) C (s).

Similarly, for the p/s+1) and c(s+1) as the mean and standard

deviation of reference samples fjs+l) (b = 1,..., B), we

repeat the same approximation method to express p(s+1) as

) > 1, n 1(S+))

B

Pr[Z> (s+l) (s+l)

Pr[ Z > (,(s ) + h(s+l)) (,(s+l) + h(s+l))

Pr[Z )> (s) _- (s+) ) h(s+)) )1

Pr Z > (+l)

Pr Z > (s) + (s)dcs) ((s+) + h(s+)

Pr Z > r(s) C(s) +

J(S) (s+1) h(s+l)

a (s+)

Comparing equations (11) and (13), we observe that to

prove p(s) < p(s+i) with probability 1 asymptotically when

p(s) is given is equivalent to prove that

(s) (S) (s+') -

SC() s+ (s+l) < C(s) (14)

O_(s+l) 4_(s+l)

with probability 1 asymptotically when C(s) is given.

As g0 -> oc, the sample deviations o(s) and gcs+1) are almost

identical so that U C(s) -> C(s). Next, using table (8)

again, we can express p/) /(S+i) as

B h(Pb+))

B

The distribution of the reference set P(s+i) ... PB(+I) con-

verges to the uniform (0, 1) distribution since the distri-

bution of the permuted test statistics, which are

corresponding to these p-values, converges to a Normal

distribution using Theorem 2.1 of Romano's work. There-

fore, if we define uniform (0, 1) random variable as U,

-~b=}1h +) B converges in probability to E[(U)] by the

B

Law of Large Number. Hence, to prove equation (14) we

only need to prove that

Pr[(p(,+l)) ->E[(U)]] -> 1. (15)

Assume, without loss of generality, that pl,..., Pgo corre-

(12) spond to the true null. To prove equation (15), we first

observe that, since pl,... Pgo are independent uniform

random variables, P(s+i) is the (s + 1)/g0-th quantile of the

uniform (0, 1) distribution. By defining

Cs+1,go = (s + 1) / go and using the distribution of sample

quantile [24], we have

8o-(p(s+i) ,+1,so) -> N(0,+1,sgo ( + lo0)).

Since the first derivative of exists, we can use the delta

method to have

so (h(p(s+l)-) h(s+l,g) -> N(0,(,2),

where = +1g (1 s+, ) [h'(s+,go )]2 represents

the asymptotical standard error of (p(s, )).

Finally, equation (15) can be proven as,

Pr[h(p(s+l)) < E[h(U)]]

=Pr h(p(s+l)) h(s+,go)

< I

E[h(U) ]-h(4+,g)

E[h(U)]-h(gsz^g)

Page 12 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

which converges to 0 because E[(U)] is finite and

h(G+i,go) as ,+,g = (s + 1)/go 0.

J.2 Determining the threshold P3 by (3)

First, we recall that, as p(0) is the global p-value based on

all null data, it is uniformly distributed. That is,

Pr[p(o) < ,61 = P.

We have proved that the pseudo-global p-values, p(s), are

monotone increasing when s/go -> 0. Therefore, there

exists m such that p(0) p() ... < p(m). Furthermore, for all

s & m,

Pr[p(S) <6 fpO) < f, j = 0, ..., s 1]

The purpose here is to derive equation (3) for any /within

(0,1) using the above inequality. We start with the

number of genes removed by our procedure. Observe that

Eo[R(go,O)]

go-1

= kPr[R(go,O) = k]

k=o

= -1 kpr [p(o) < ap) < l ,... p(k-1) < ,

k=i

(k) > ].

In addition,

Pr[p(0) < ,p) < p,..., p(1) < ,p() > /p]

= Pr[p(o)

\Y[Pr I p pS < pl= 0,..., s -

Pr[p(k) > p(i) < ,i= 0,...,k- 1

_< r/min(k-1,m)1

= km

where km = min(k, m + 1). Hence, we can find an upper

bound of the expected number of genes removed,

Ep[R(go,O)], which is

Ep[R(go,O)]

k=l

( (1-_p 1-(m+l)m+2 /(1 /)

(1- _)2

+(go m 2)m"+1

( )as m is large. (16)

(1 )2

Let A = A(/?) = /(1 ,)2. From equations (9) and (16), we

have

E [R(g,d) -Al

so that the number of true null estimate is

0 = g R(g, d) + A

and this estimate ensures that

E[ olg0 go-

References

1. Westfall PH, Young SS: Resampling-based multiple testing New York:

John Wiley & Sons; 1993.

2. Benjamini Y, Hochberg Y: Controlling the false discovery rate: A

Practical and powerful approach to multiple testing. JRSSB

1995, 57:289-300.

3. Benjamini Y, Hochberg Y: The control of the false discovery rate

in multiple testing under dependency. Annals of Statistics

2001:1165-1188.

4. Simes RJ: An improved Bonferroni procedure for multiple

tests of significance. Biometrika 1986, 73:.

5. Benjamini Y, Hochberg Y: On the adaptive control of the false

discovery rate in multiple testing with independent statis-

tics. journal of Education and Behavioral Statistics 2000, 25:60-83.

6. Storey JD, Tibshirani R: Statistical significance for genomewide

studies. PNAS 2003, 100:9440-9445.

7. Reiner A, Yekutieli D, Benjamini Y: Identifying differentially

expressed genes using false discovery rate controlling proce-

dures. Bioinformatics 2003, I 9:368-375.

8. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S,

Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W,

lacus S, Irizarry R, Li FLC, Maechler M, Rossini AJ, Sawitzki G, Smith

C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: Open

software development for computational biology and bioin-

formatics. 2004 [http://genomebiology.com/2004/5/10/R80].

9. Tusher VG, Tibshirani R, Chu G: Significance analysis of micro-

arrays applied to the ionizing radiation response. PNAS 2001,

98:5116-5121.

10. GeneSpring 7.1. Silicon Genetics [http://www.silicongenet

ics.com]

I I. Yang MCK, Yang JJ, Mclndoe RA, She JX: Microarray experimen-

tal design: power and samples size considerations. Physiol

Genomics 2003, 16:24-28.

12. Pesarin F: Multivariate permutation tests with applications in Biostatistics

West Sussex, England: John Wiley & Sons; 2001.

13. Storey JD, Tibshirani R: Estimating false discovery rates under

dependence, with applications to DNA microarrays. Technical

report, Department of Statistics, Stanford University 2001.

14. Efron B, Storey JD, Tibshirani R: Microarrays, Empirical Bayes

Methods, and False Discovery Rates. Technical Report 217,

Department of Statistics, Stanford University 2001.

15. Birnbaum A: Combining independent tests of significance. JASA

1954, 49:559-574.

Page 13 of 14

(page number not for citation purposes)

BMC Bioinformatics 2006, 7:15

http://www.biomedcentral.com/1471-2105/7/15

16. Folks JL: Combination of independent tests. In Handbook of Sta-

tistics Volume 4. Edited by: Krishnaiah PR, Sen PK. New York: Elsevier

Science Publishers; 1984:113-121.

17. Global-p website [http://www.stat.ufl.edu/-~jyang/global p/glo

bal.p.R

18. Broberg P: A comparative review of estimates of the propor-

tion unchanged genes and the false discovery rate. 8MC Bioin-

formatics 2005, 6(I 99):.

19. R Development Core Team: R: A language and environment for

statistical computing. R Foundation for Statistical Computing, Vienna,

Austria 2005 [http://www.R-project.org]. ISBN 3-900051-07-0

20. Lapointe J, Li C, Higgins JP, van de Rijn M, Bair E, Montgomery K, Fer-

rari M, Egevad L, Rayford W, Bergerheim U, Ekman P, DeMarzo AM,

Tibshirani R, Botstein D, Brown PO, Brooks JD, Pollack JR: Gene

expression profiling identifies clinically relevant subtypes of

prostate cancer. PNAS 2004, 101:811-816.

21. Storey JD: A direct approach to false discovery rates. JRSS8

2002, 64:479-498.

22. Genovese C, Wasserman L: A stochastic process approach to

false discovery control. The Annals of Statistics 2004,

32:1035-1061.

23. Romano JH: On the behavior of randomization tests without

a group invariance assumption. journal of the American Statistical

Association 1990, 85:.

24. Serfling RJ: Approximation Theorems of Mathematical Statistics New

York: John Wiley & Sons; 1980.

Page 14 of 14

(page number not for citation purposes)

Publish with BioMed Central and every

scientist can read your work free of charge

"BioMed Central will be the most significant development for

disseminating the results of biomedical research in our lifetime."

Sir Paul Nurse, Cancer Research UK

Your research papers will be:

available free of charge to the entire biomedical community

peer reviewed and published immediately upon acceptance

cited in PubMed and archived on PubMed Central

yours you keep the copyright

Submit your manuscript here: BioMedcentral

http://www.biomedcentral.com/info/publishing adv.asp

BMC Bioinformatics 2006, 7:15