Citation
Improved exact methods for statistical inference in contingency tables

Material Information

Title:
Improved exact methods for statistical inference in contingency tables
Creator:
Kim, Donguk, 1959-
Publication Date:
Language:
English
Physical Description:
viii, 252 leaves : ill. ; 29 cm.

Subjects

Subjects / Keywords:
Approximation ( jstor )
Computer printers ( jstor )
Confidence interval ( jstor )
Integers ( jstor )
P values ( jstor )
Probabilities ( jstor )
Random allocation ( jstor )
Statistical models ( jstor )
Statistics ( jstor )
Subroutines ( jstor )
Dissertations, Academic -- Statistics -- UF ( lcsh )
Statistics thesis Ph. D ( lcsh )
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1993.
Bibliography:
Includes bibliographical references (leaves 248-251).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Donguk Kim.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
Resource Identifier:
021623576 ( ALEPH )
33411939 ( OCLC )











IMPROVED EXACT METHODS FOR STATISTICAL INFERENCE
IN CONTINGENCY TABLES















By

DONGUK KIM


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1994


UNIVERSITY OF FLORIDA LIBRARIES































© Copyright 1994

by

Donguk Kim






































To my wife, daughter

and

my parents














ACKNOWLEDGEMENTS


I would like to express my sincere gratitude to Dr. Alan Agresti. Without his guidance and encouragement, this work would not have been completed. I would like to thank Dr. Mark Yang, Dr. Myron Chang, Dr. Brett Presnell, and Dr. David Wilson for their encouragement and advice while serving on my dissertation committee.

In my six years as a student here, I learned from all professors. I would also like to thank Dr. Yang and Dr. Randles for all the support while I worked as a consultant in the Biostatistics Division and as a teaching assistant. Also my thanks go to all my colleagues and friends.

Finally, I wish to express my special thanks to my family, especially my wife, YoungHee, for her love, patience, and encouragement, and my daughter, Minjee for her love. Furthermore, I would like to thank my parents for their love, encouragement, and support.














TABLE OF CONTENTS




ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION

1.1 Literature Review
1.2 Summary of Dissertation Work

2 IMPROVED EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION

2.1 Introduction
2.2 A Less Conservative P-value
2.3 A Less Conservative "Exact" Confidence Interval
2.4 Alternative Modifications of "Exact" Confidence Intervals
2.5 Connections with Logistic Regression
2.6 Discussion

3 APPROXIMATING EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION

3.1 Introduction
3.2 Tests of Conditional Independence Assuming No Three-factor Interaction
3.3 Tests of Conditional Independence Permitting Three-factor Interaction
3.4 The Construction of the Modified Exact P-value
3.5 Approximation of Exact P-values
3.6 Examples
3.7 FORTRAN Program for Simulation

4 IMPROVED EXACT TESTS FOR ORDINAL VARIABLES IN I x J x K TABLES

4.1 Introduction
4.2 Basic Results in Two-way Contingency Table
4.3 Unbiasedness of Tests in Three-way Contingency Tables
4.4 Complete Class of Tests
4.5 Admissible Tests
4.6 Exact, Unbiased and Admissible Tests
4.7 Example
4.8 Discussion

5 CONCLUSION

5.1 Discussion
5.2 Future Research

APPENDICES

A SOURCE CODE FOR EXACT INFERENCE

B SOURCE CODE FOR SIMULATION

B.1 Program Structure
B.2 Part of Source Code

REFERENCES

BIOGRAPHICAL SKETCH














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

IMPROVED EXACT METHODS FOR STATISTICAL INFERENCE IN CONTINGENCY TABLES

By

Donguk Kim

August 1994

Chairman: Alan Agresti
Major Department: Statistics

Ordinary "exact" methods can be highly conservative when the distribution of the test statistic is discrete. The conservativeness becomes more severe when the number of dimensions or the number of categories is small. We improve exact inferential methods by decreasing the conservativeness that occurs due to discreteness. In this dissertation, modifications of exact inferential methods are suggested for conditional associations in three-way contingency tables. For testing conditional independence, we present a modified P-value. It utilizes both the usual test statistic and, at the observed value of that statistic, a supplementary statistic directed toward a broader alternative. For 2 x 2 x K tables, we propose modified "exact" confidence intervals for an assumed common odds ratio based on inverting two separate one-sided tests using the modified P-value. We also present an alternative and usually even better way of constructing "exact" confidence intervals, based on inverting a two-sided test with a modified P-value.

For I x J x K tables, we discuss exact tests of conditional independence using six test statistics that have connections with loglinear models. Three statistics assume a lack of three-factor interaction, and the other three statistics do not require this assumption. All six statistics are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Then, we discuss possible alternative ways of forming modified exact P-values in I x J x K contingency tables, and we propose modified exact P-values for six tests corresponding to six loglinear models. For three-way contingency tables, computational algorithms have limited availability for tests of conditional independence when I and J exceed two. We use a simulation algorithm to obtain precise estimates of ordinary and modified exact P-values for cases for which the current computational algorithms are infeasible.

For I x J x K tables, we show how to construct exact, unbiased, and admissible tests for an ordinal alternative to conditional independence by using a modified P-value approach. This is a generalization of the results of Cohen and Sackrowitz for a test of independence in two-way contingency tables for an ordinal alternative. The ordinary test of conditional independence for 2 x 2 x K contingency tables is usually inadmissible.














CHAPTER 1
INTRODUCTION


1.1 Literature Review




Statistical inference for contingency tables generally is carried out by large-sample approximations for sampling distributions of the test statistic rather than the exact discrete distribution. A central concern is the quality of the asymptotic approximation. Large-sample approximations apply as the sample size grows, for a fixed number of cells. The adequacy of the chi-square approximation depends on both the sample size and the number of cells. Some contingency tables occur where the sample size is too small to apply asymptotic methods. Also, high-dimensional contingency tables tend to be sparse, and as a consequence the asymptotic approximation to the sampling distribution is often very poor. Agresti (1992) surveyed exact inference for contingency tables and explained the developments of exact methods for contingency tables. He suggested the use of exact methods instead of large-sample approximations when the application of asymptotic approximation is questionable. We focus on exact inferential methods for conditional associations in three-way contingency tables.

When the exact distribution of the test statistic is discrete, it is known that ordinary "exact" tests and confidence intervals can be highly conservative because of the discreteness of the distribution. Though exact tests are guaranteed to control the probability of Type I error at any nominal level, we may not achieve a probability of Type I error of the nominal level exactly. The actual probability of Type I error may be considerably smaller. For instance, in a 2 x 2 contingency table, Fisher's exact test is always conservative. For exact inference about a parameter of interest, we condition on sufficient statistics for unknown parameters to eliminate them. For an exact conditional test for categorical data, the reference set of tables over which the exact conditional distribution is defined is the set of contingency tables having certain marginal counts fixed. This extra conditioning makes the distribution of the test statistic more highly discrete.
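The conservativeness of Fisher's exact test can be checked numerically. The following Python sketch is an illustration only and is not part of the original work; the function names and the example margins are assumptions. For a 2 x 2 table with fixed margins, it computes the conditional null probability that a one-sided Fisher test rejects at a nominal level.

```python
from math import comb

def hypergeom_pmf(x, n1, n2, c1):
    """Null probability that n11 = x, given row totals n1, n2 and first column total c1."""
    return comb(n1, x) * comb(n2, c1 - x) / comb(n1 + n2, c1)

def actual_size(n1, n2, c1, alpha=0.05):
    """Conditional size of the one-sided Fisher test that rejects when P(n11 >= x) <= alpha."""
    support = range(max(0, c1 - n2), min(n1, c1) + 1)
    pmf = {x: hypergeom_pmf(x, n1, n2, c1) for x in support}
    # One-sided P-value attached to each possible value of n11.
    p_value = {x: sum(pmf[y] for y in support if y >= x) for x in support}
    # The size is the total null probability of the outcomes that lead to rejection.
    return sum(pmf[x] for x in support if p_value[x] <= alpha)

if __name__ == "__main__":
    print(actual_size(10, 10, 10))
```

With row totals 10 and 10 and first column total 10, the attainable one-sided P-values jump from about 0.0115 to about 0.0894, so the conditional size at nominal level 0.05 is only about 0.0115.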

Barnard (1947) proposed an unconditional exact test for 2 x 2 contingency tables. The reference set of his test is defined as the set of all tables with fixed row margins and all possible column margins. Since the column margins are not fixed, this unconditional test has many more tables in the reference set, and the distribution of the test statistic is less discrete. A disadvantage of the unconditional test is that computations are infeasible for larger tables, since maximizing over the space of nuisance parameters is needed for implementation. For further details, see Yates (1984) and Suissa and Shuster (1985).

One way to reduce conservativeness is the mid P adjustment. Let T be a test statistic and $t_o$ be its observed value. According to Lancaster (1961), the mid P adjustment counts only half of the probability of the observed value of T; that is, it subtracts half of the probability of the observed statistic from the usual exact P-value. This reduces the conservativeness due to discreteness without relying on randomization. One drawback is that it cannot guarantee exactness: because half of the probability of the observed statistic is subtracted from the exact P-value, the actual size may exceed the nominal level.
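To make the mid P adjustment concrete, the sketch below compares the ordinary one-sided exact P-value with the mid P-value for a single 2 x 2 table, using the hypergeometric null distribution of the first cell count. It is a minimal illustration in Python; the function names and the example table are assumptions, not taken from the original text.

```python
from math import comb

def null_pmf(n1, n2, c1):
    """Hypergeometric null distribution of n11 given row totals n1, n2 and column total c1."""
    lo, hi = max(0, c1 - n2), min(n1, c1)
    return {x: comb(n1, x) * comb(n2, c1 - x) / comb(n1 + n2, c1)
            for x in range(lo, hi + 1)}

def ordinary_p(pmf, t_obs):
    """Usual exact P-value: null probability that T is at least the observed value."""
    return sum(p for t, p in pmf.items() if t >= t_obs)

def mid_p(pmf, t_obs):
    """Mid P-value: the ordinary P-value minus half the probability of the observed value."""
    return ordinary_p(pmf, t_obs) - 0.5 * pmf[t_obs]

if __name__ == "__main__":
    pmf = null_pmf(10, 10, 10)   # table [[8, 2], [2, 8]] has n11 = 8
    print(ordinary_p(pmf, 8), mid_p(pmf, 8))
```

For the table with rows (8, 2) and (2, 8), the ordinary P-value is about 0.0115 and the mid P-value is about 0.0060.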

For nonparametric tests, Streitberg and Roehmel (1990) considered utilizing a secondary statistic together with the usual statistic to discriminate among those rank configurations that have the same value of the primary statistic. They showed that their test is uniformly more powerful than the Wilcoxon-Mann-Whitney test, and the P-value of this test employing any secondary statistic cannot be larger than the P-value from the ordinary test. A similar approach to reduce the conservativeness is due to Cohen and Sackrowitz (1992). They suggested a modified P-value that utilizes both the usual test statistic and, at the observed value of that statistic, the null table probability for a secondary partitioning for those tables having $T = t_o$. Instead of including all tables having $T = t_o$ in the calculation of the P-value, they include tables that are no more likely than the observed. They used this for ordinal tests in two-way tables.

Discreteness also affects interval estimation. An "exact" confidence interval for a parameter can be constructed by inverting the exact conditional test. The ordinary confidence interval (Cox 1970, Gart 1970, Mehta et al. 1985, Vollset et al. 1991) is based on inverting two separate one-sided tests using the ordinary P-value. Because of discreteness, we get a conservative confidence interval. The actual confidence coefficient is at least the nominal level.
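The inversion of two one-sided tests can be sketched for a single 2 x 2 table, where the conditional distribution of $n_{11}$ given the margins depends only on the odds ratio $\theta$. The Python code below is a hedged illustration rather than the algorithm of any of the cited papers: the function names, the bisection bounds on $\log \theta$, and the restriction to small tables (so that powers of $\theta$ do not overflow) are all assumptions.

```python
from math import comb, exp, inf

def cond_pmf(theta, n1, n2, c1):
    """Conditional distribution of n11 given the margins, for odds ratio theta."""
    lo, hi = max(0, c1 - n2), min(n1, c1)
    weights = {x: comb(n1, x) * comb(n2, c1 - x) * theta ** x
               for x in range(lo, hi + 1)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def upper_tail(theta, t, n1, n2, c1):
    return sum(p for x, p in cond_pmf(theta, n1, n2, c1).items() if x >= t)

def lower_tail(theta, t, n1, n2, c1):
    return sum(p for x, p in cond_pmf(theta, n1, n2, c1).items() if x <= t)

def solve_increasing(f, target, lo=-30.0, hi=30.0, iters=200):
    """Bisection on log(theta) for a function f that increases with theta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(exp(mid)) < target:
            lo = mid
        else:
            hi = mid
    return exp((lo + hi) / 2)

def exact_interval(t, n1, n2, c1, alpha=0.05):
    """Invert two one-sided conditional tests, each at level alpha / 2."""
    x_min, x_max = max(0, c1 - n2), min(n1, c1)
    # Lower limit: theta at which P(n11 >= t) equals alpha / 2.
    lower = 0.0 if t == x_min else solve_increasing(
        lambda th: upper_tail(th, t, n1, n2, c1), alpha / 2)
    # Upper limit: theta at which P(n11 <= t) equals alpha / 2 (decreasing in theta).
    upper = inf if t == x_max else solve_increasing(
        lambda th: -lower_tail(th, t, n1, n2, c1), -alpha / 2)
    return lower, upper

if __name__ == "__main__":
    print(exact_interval(8, 10, 10, 10))
```

Because each tail probability can take only finitely many values at a given $\theta$, the resulting interval has actual coverage at least the nominal level, which is the conservativeness described above.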

We could construct an exact confidence interval based on inverting a single two-sided test rather than two separate one-sided tests. Using a two-sided approach, Sterne (1954) constructed a confidence interval for a single binomial parameter, and Baptista and Pike (1977) constructed confidence limits for the odds ratio in a 2 x 2 table. This two-sided confidence interval also is conservative.

Some problems arise when exact methods are infeasible and the application of large-sample approximations is questionable. For large-sample inference about conditional association in three-way contingency tables, Mantel and Haenszel (1959) gave a test statistic comparing two groups on a binary response, adjusting for control variables. Since Cochran (1954) proposed a similar statistic, it is called the Cochran-Mantel-Haenszel statistic. This is a test for conditional independence in 2 x 2 x K tables. Also, Birch (1964) showed that under the assumption of a constant odds ratio within each of the tables, this test is uniformly most powerful unbiased.
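For reference, the large-sample Cochran-Mantel-Haenszel statistic for 2 x 2 x K tables can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name and input layout are hypothetical, the version shown omits any continuity correction, and each stratum is assumed to have more than one observation with positive margins.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel statistic (no continuity correction).

    tables: list of K strata, each a 2 x 2 table [[a, b], [c, d]].
    """
    deviation = 0.0   # sum over strata of n11k minus its null expectation
    variance = 0.0    # sum over strata of the hypergeometric null variance of n11k
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        r1, r2 = a + b, c + d
        c1, c2 = a + c, b + d
        deviation += a - r1 * c1 / n
        variance += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    return deviation ** 2 / variance

if __name__ == "__main__":
    print(cmh_statistic([[[8, 2], [2, 8]], [[8, 2], [2, 8]]]))
```

Under conditional independence the statistic is referred to a chi-squared distribution with one degree of freedom; the exact methods discussed here replace that approximation with the conditional distribution of $\sum_k n_{11k}$.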








Birch (1965) derived three test statistics for testing the null hypothesis of conditional independence of two variables in I x J x K contingency tables. These are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. These models assume a lack of three-factor interaction. When both classifications are nominal, the corresponding statistic is a generalized Cochran-Mantel-Haenszel test statistic to handle more than two groups or more than two responses. This method involves computing the expected values and the covariance matrix under the multiple hypergeometric probability model for each of the tables. These quantities then are summed across the tables, and a quadratic form of the test statistic is generated. When both classifications are ordinal, the corresponding statistic is the same as Mantel's (1963) score statistic. Furthermore, Birch's statistics are special cases of a general statistic proposed by Landis et al. (1978). These statistics have an asymptotic chi-squared distribution.

Rather than use large-sample approximations, we wish to conduct exact inference. Even though recent developments make exact methods feasible for some inferential analyses, because of computational complexity, we do not have exact methods for some situations. For three-way contingency tables, current computational algorithms for exact methods are restricted to certain analyses for 2 x J x K tables with ordered columns.

The Monte Carlo method is another alternative to either exact or asymptotic methods. This method is based on estimating the exact conditional sampling distribution of the statistic by generating random tables having the relevant fixed margins. It is useful for those situations where the data set is too large for an exact computation or too sparse to rely on the asymptotic theory. For table generation by simulating from a hypergeometric distribution, Boyett (1979) wrote a program that generates a two-way random table from the exact distribution with given row and column totals.








Patefield (1981) presented a program generating a random table, and his program is faster than Boyett's for larger sample sizes.

Agresti et al. (1979) utilized the Monte Carlo method effectively for a variety of tests for two-way tables. Even for large tables or large sample sizes, one can quickly approximate as closely as needed the ordinary and modified exact P-values for these statistics. This method consists of sampling contingency tables from the conditional reference set in proportion to their probabilities and computing an unbiased point estimate and a narrow confidence interval for an exact P-value.
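The Monte Carlo approach can be sketched for a two-way table. The Python code below is an illustration under stated assumptions, not a reproduction of the Boyett or Patefield programs: it draws tables with the observed margins by randomly permuting column labels (a simple shuffle-based scheme), uses the Pearson statistic, and reports a normal-approximation 95% interval for the exact P-value. All function names are hypothetical, and all margins are assumed positive.

```python
import random
from math import sqrt

def random_table(row_totals, col_totals, rng):
    """Draw a table with the given margins by shuffling column labels across rows."""
    labels = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(labels)
    table = [[0] * len(col_totals) for _ in row_totals]
    pos = 0
    for i, r in enumerate(row_totals):
        for j in labels[pos:pos + r]:
            table[i][j] += 1
        pos += r
    return table

def pearson_x2(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def mc_p_value(observed, n_sim=10000, seed=0):
    """Estimate the exact conditional P-value and a 95% interval for it."""
    rng = random.Random(seed)
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    t_obs = pearson_x2(observed)
    # Small tolerance so that tables tying with the observed statistic are counted.
    hits = sum(pearson_x2(random_table(rows, cols, rng)) >= t_obs - 1e-9
               for _ in range(n_sim))
    p_hat = hits / n_sim
    half = 1.96 * sqrt(p_hat * (1 - p_hat) / n_sim)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))

if __name__ == "__main__":
    print(mc_p_value([[8, 2], [2, 8]]))
```

The sample proportion is an unbiased estimate of the exact P-value, and the interval narrows at rate $1/\sqrt{\text{n\_sim}}$. For three-way tables the same idea would be applied stratum by stratum, since the conditional reference set fixes the margins of each partial table.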

When we construct a critical region for exact tests with some preassigned nominal level $\alpha$, supplementary randomization would be required at the boundary of the critical region in order to achieve the nominal size. This is typical for any discrete problem. After randomization, the resulting test may be inadmissible. Cohen and Sackrowitz (1991) focused on two-way tables and showed unbiasedness for the test of independence in two-way tables for an ordinal alternative. Eaton (1970) showed the essentially complete class in an exponential family. Eaton's theorem shows that the essentially complete class consists of tests whose acceptance regions are convex with possible randomization on the boundary of the acceptance region. Furthermore, Ledwina (1978a, 1984) gave the class of admissible rules in an exponential family. Using the same argument as Ledwina, Cohen and Sackrowitz (1991) proved a theorem that gives the class of exact, unbiased, and admissible tests in two-way contingency tables. They constructed the exact test of size $\alpha$ by ordering the tables according to their probabilities on sample points where the test would randomize. They made the number of tables on which randomization would occur considerably smaller than in the usual test.









1.2 Summary of Dissertation Work




In Chapter 2, we present exact tests of conditional independence against the alternative of no three-factor interaction. Our modified exact tests are adaptations of the ordinary exact conditional tests that are less conservative. We propose a modified P-value based on a secondary partitioning of the sample space beyond that generated by the test statistic. It utilizes both the usual test statistic and, at the observed value of that statistic, a supplementary statistic T' directed toward a broader alternative. In the calculation of the P-value, we include only those tables that are at least as contradictory to the null in terms of T'. One can calculate this modified P-value for any test statistic having a discrete distribution. The modified P-value is less discrete than the ordinary P-value, does not employ randomization, and leads to a less conservative "exact" test.

By inverting results of tests using modified P-values, we obtain an exact and less conservative confidence interval, in the sense that the modified confidence interval has confidence coefficient at least the nominal level and is narrower than the ordinary one. For 2 x 2 x K tables, we suggest a modified "exact" confidence interval inverting the test based on a modified one-sided P-value to make the actual confidence coefficient closer to the nominal value. Also, we present an alternative and usually even better way of constructing "exact" confidence intervals, based on inverting a two-sided test with a modified P-value.

Furthermore, we utilize the mid P-value to construct intervals applying these methods, although these are not exact. To compare these types of intervals, we calculate actual coverage probability or expected length of the confidence intervals based on inverting one-sided or two-sided tests using the ordinary or modified P-value.








In Chapter 3, we suggest exact inference regarding conditional associations in three-way contingency tables. For exact tests of conditional independence in I x J x K tables, three statistics assuming a lack of three-factor interaction are discussed, and then we provide three other test statistics permitting three-factor interaction. All six test statistics are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Also they have asymptotic chi-squared distributions. Using these statistics, we propose modified exact P-values for six tests for testing conditional independence with I x J x K tables.

For cases that are currently computationally infeasible, we construct a simulation algorithm to obtain precise estimates of ordinary and modified exact P-values, using a table-generation procedure suggested by Patefield (1981). We utilize six test statistics for exact tests of conditional independence.

In Chapter 4, we generalize results of Cohen and Sackrowitz (1991, 1992) to construct exact, unbiased, and admissible tests for an ordinal alternative to conditional independence for I x J x K tables. We first show unbiasedness of tests when one wishes to test a null hypothesis of conditional independence against the alternative of no three-factor interaction model in three-way contingency tables. Then we present the complete class of tests and admissible tests in an exponential family following Eaton (1970) and Ledwina (1978a, 1984). Using these arguments, we generalize to the three-way case some results of Cohen and Sackrowitz regarding admissibility of tests for two-way tables. Combining these, we have a theorem that gives the class of exact, unbiased, and admissible tests in three-way contingency tables.

With this theorem, we discuss how to construct unbiased tests and how to set up critical regions to obtain tests of conditional independence of fixed size $\alpha$ for an ordinal alternative. We construct the exact test of size $\alpha$ by ordering the tables according to a secondary statistic directed toward a broader alternative hypothesis at the randomization points, utilizing the modified approach discussed in Chapter 2. By using the modified approach, the resulting test is admissible after randomization, and it requires less randomization than usual. Also, we have actual size closer to a nominal value. The Appendix contains a FORTRAN program. Using this program, one can easily get ordinary and modified exact inference about conditional associations for 2 x 2 x K contingency tables.














CHAPTER 2
IMPROVED EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION


2.1 Introduction




When a test statistic has a discrete distribution, ordinary "exact" tests and confidence intervals can be highly conservative due to discreteness. If we conduct a test using some preassigned size $\alpha$, the probability of Type I error is always less than or equal to that preassigned value. If one constructs an "exact" confidence interval with confidence coefficient $1 - \alpha$, the actual confidence coefficient is at least that level and is unknown (Neyman 1935). We wish to improve ordinary exact inferential methods by decreasing the conservativeness that occurs due to discreteness. In this chapter, we suggest modifications of exact inferential methods for conditional associations in 2 x 2 x K contingency tables.

For instance, we present an example of a 2 x 2 x 5 table for which the ordinary 95% confidence interval for an assumed common odds ratio is (1.1, 531.5). The discreteness implies that .95 is a lower bound for the actual confidence coefficient. We show how to construct a modified confidence interval that also has the guarantee of at least 95% confidence, but takes the much shorter range (2.1, 67.3). Our approach is applicable for any contingency table of size larger than 2 x 2, but in this chapter we illustrate the arguments in terms of inferences about conditional associations in 2 x 2 x K contingency tables. The ideas and notation apply throughout the dissertation.








For three-way tables, consider the hypothesis of conditional independence of two variables, given the third one. For instance, if $\{\pi_{ijk}\}$ denote probabilities for a multinomial distribution over the I x J x K cells, where $\sum_{i,j,k} \pi_{ijk} = 1$, the hypothesis states that
$\pi_{ijk} = \pi_{i+k}\pi_{+jk}/\pi_{++k}$.

The subscript "+" denotes the sum over the index it replaces. Let $N = \{n_{ijk}\}$ denote the cell counts, with expected frequencies $\{m_{ijk}\}$. We discuss exact conditional tests of this hypothesis, generalizing Fisher's exact test for 2 x 2 tables. We also discuss confidence intervals for odds ratios pertaining to conditional association.
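As a small illustration of the hypothesis, the sketch below computes the fitted expected frequencies under conditional independence, using the standard estimate $\hat{m}_{ijk} = n_{i+k} n_{+jk} / n_{++k}$, which mirrors the form of the hypothesis for the cell probabilities. The code is a Python sketch with hypothetical names; it assumes the counts are stored as a nested list indexed `n[i][j][k]` and that every layer total $n_{++k}$ is positive.

```python
def fitted_conditional_independence(n):
    """Fitted values n_{i+k} * n_{+jk} / n_{++k} for counts n[i][j][k]."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    m = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for k in range(K):
        row = [sum(n[i][j][k] for j in range(J)) for i in range(I)]   # n_{i+k}
        col = [sum(n[i][j][k] for i in range(I)) for j in range(J)]   # n_{+jk}
        total = sum(row)                                              # n_{++k}
        for i in range(I):
            for j in range(J):
                m[i][j][k] = row[i] * col[j] / total
    return m

if __name__ == "__main__":
    counts = [[[8, 1], [2, 3]], [[2, 3], [8, 1]]]   # a 2 x 2 x 2 example
    print(fitted_conditional_independence(counts))
```

The fitted values reproduce the margins $\{n_{i+k}\}$ and $\{n_{+jk}\}$ of each partial table, which are exactly the quantities held fixed in the exact conditional tests.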

Let X denote the row classification, Y the column classification, and Z the layer classification. The hypothesis of conditional independence of X and Y, given Z, is usually tested against the alternative of no three-factor interaction. This alternative is the loglinear model of form
$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$, (2.1)


having sufficient statistics ($\{n_{ij+}\}$, $\{n_{i+k}\}$, $\{n_{+jk}\}$). The null hypothesis corresponds to the special case of this model in which all $\lambda_{ij}^{XY} = 0$. Exact conditional tests utilize the distribution of $\{n_{ij+}\}$, the sufficient statistics for these parameters, conditional on the other sufficient statistics, which relate to the remaining parameters. For the case of a 2 x 2 x K table, for instance, one uses the distribution of $\sum_k n_{11k}$, conditional on the row totals $\{n_{i+k}\}$ and column totals $\{n_{+jk}\}$ for the partial tables (Birch 1964). The parameter of interest for estimation is the assumed common odds ratio for each 2 x 2 table.
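For 2 x 2 x K tables, the null conditional distribution of $\sum_k n_{11k}$ is the convolution of K independent hypergeometric distributions, one per partial table. The following Python sketch illustrates that computation; the function names and the representation of each stratum as a triple (first row total, second row total, first column total) are assumptions made for the example.

```python
from math import comb

def stratum_pmf(n1, n2, c1):
    """Hypergeometric null distribution of n11k in one partial table."""
    lo, hi = max(0, c1 - n2), min(n1, c1)
    return {x: comb(n1, x) * comb(n2, c1 - x) / comb(n1 + n2, c1)
            for x in range(lo, hi + 1)}

def null_distribution_T(strata):
    """Distribution of T = sum_k n11k given all row and column totals."""
    dist = {0: 1.0}
    for n1, n2, c1 in strata:
        pmf = stratum_pmf(n1, n2, c1)
        new = {}
        # Convolve the running distribution with this stratum's distribution.
        for t, p in dist.items():
            for x, q in pmf.items():
                new[t + x] = new.get(t + x, 0.0) + p * q
        dist = new
    return dist

if __name__ == "__main__":
    dist = null_distribution_T([(3, 3, 3), (3, 3, 3)])
    print(sum(p for t, p in dist.items() if t >= 4))   # ordinary one-sided P-value
```

With two strata that each have row totals (3, 3) and first column total 3, the ordinary one-sided P-value for an observed T = 4 is 118/400 = 0.295.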

We present exact tests of conditional independence for the alternative of no three-factor interaction. Our modified exact tests are adaptations of the ordinary exact conditional tests that are less conservative. They use a modified P-value based on a secondary partitioning of the sample space beyond that generated by the test statistic. It utilizes both the usual test statistic and, at the observed value of that statistic, a supplementary statistic directed toward a broader alternative. A modified P-value is less discrete than the ordinary P-value and leads to less conservative "exact" tests. By inverting results of tests using modified P-values, we have an exact and less conservative confidence interval, in the sense that a modified confidence interval has confidence coefficient at least the nominal level, and it is narrower than the ordinary one.

Section 2 introduces the modified P-value and shows that its distribution can be much less discrete than that of the ordinary P-value. We compare the ordinary and modified P-values with examples. Furthermore, the null expected value of the P-value is discussed in both procedures in order to examine the degree of conservativeness. Section 3 discusses modified "exact" confidence intervals, based on inverting two one-sided tests using the modified P-value. Though they are also conservative, they may be much narrower than the usual one. Illustrations are given for estimating an assumed common odds ratio for several 2x2 tables. Section 4 presents an alternative and usually even better way of constructing "exact" confidence intervals, based on inverting a two-sided test with a modified P-value. Section 5 discusses some related results for logistic regression models, and Section 6 gives some comments.



2.2 A Less Conservative P-value




Suppose we would like to conduct an exact conditional test for categorical data using some preassigned size $\alpha$, such as 0.05. Denote by $\Gamma$ the set of contingency tables having the same marginal counts as the ones that are fixed by the conditioning argument for the exact conditional test. This is the set of tables over which the exact conditional distribution is defined. For the test of conditional independence, for instance, $\Gamma$ is the set of I x J x K tables of nonnegative integers, $\Gamma = \{Z : \sum_i z_{ijk} = n_{+jk}, \sum_j z_{ijk} = n_{i+k}, \text{ for all } i, j, k\}$.

It is usually not possible to construct a critical region for exact conditional tests with preassigned size $\alpha$ because of the discreteness of the distribution. If an exact test is desired of arbitrary size $\alpha$, supplementary randomization would be required to make the decision about whether to reject when a table occurs at the boundary of the critical region. In practice, it is unacceptable to employ randomization, and one normally simply reports a P-value. In general, suppose we have a test statistic T, such as a Wald, likelihood ratio, or score statistic, and suppose $t_o$ is the observed value of T. If large values of T contradict the null, the usual P-value is


$P = P_{H_0}(T \ge t_o)$, (2.2)

the probability under the null hypothesis that T is at least $t_o$. Ordinarily, if one wants to make a decision about $H_0$, one rejects if the P-value $\le \alpha$. The discreteness implies that the test based on the P-value is conservative in the sense that the actual size is

$P_{H_0}(P \le \alpha) \le \alpha$







particularly when data are discrete. However, the discreteness also affects interval estimation.



2.2.1 The Modified "Exact" P-value




To reduce the degree of conservativeness, we suggest a modified P-value based on a less discrete distribution than that of T. The modified P-value uses a partition of the sample space that is more refined than we get using T alone. We use T to construct a primary partitioning of all tables that have the sufficient statistics fixed by the conditional test. Then, within fixed values of T, we generate a secondary partitioning using some other index T' of the degree to which the data contradict the null hypothesis. The statistic T' is a test statistic directed toward a somewhat broader alternative hypothesis, hence detecting information that may be missed by T. Let $t_o$ and $t'_o$ denote the observed values of the primary and secondary statistics. The modified P-value is defined as
$P^* = P_{H_0}(T > t_o) + P_{H_0}(T = t_o, T' \ge t'_o)$, (2.4)


where the probabilities are computed under the null conditional distribution. Instead of including all tables having $T = t_o$ in the calculation of the P-value, we include only those that are at least as contradictory to the null in terms of having at least as large a value of T'.

To illustrate, consider testing conditional independence in 2 x 2 x K tables. Normally, if we expect about the same strength of association in each 2 x 2 stratum, we test against the alternative (2.1) of no three-factor interaction. Using this narrow alternative helps to build power compared to statistics based on the general alternative, even if we do not feel that reality exactly satisfies (2.1). Suppose we use as the primary statistic the score statistic, which is based on $T = \sum_k n_{11k}$, for the conditional set of tables having the same row and column totals as the observed table. Then one could use the score statistic for the general alternative (the saturated model) for the secondary partitioning. This is simply $T' = \sum_k X_k^2$, where $X_k^2$ denotes the Pearson statistic for testing independence in the kth partial table. The secondary statistic also contains information about the validity of the null hypothesis, but is directed toward a wider alternative.

Another possibility for the secondary partitioning is to use the null table probability, in which case T' can be expressed as the negative log of that probability. For a given value of T, tables that are less likely under the null are then considered to give greater evidence against the null. Let $B = \{Z : Z \in \Gamma,\ T = t_o,\ P(Z) \le P(N)\}$, where N denotes the observed table and the probabilities are computed under the null. The modified P-value is then

$$P^*_p = P_{H_0}(T > t_o) + P_{H_0}(B). \tag{2.5}$$

This modified P-value orders sample tables in $\Gamma$ according to their probabilities when $T = t_o$. Hence, it is based on the probability of the observed table as well as on a test statistic. Cohen and Sackrowitz (1992) used this type of P-value for ordinal tests in two-way tables. We will compare both ways of forming modified P-values, and the confidence intervals based on them, with examples. We prefer $P^*$ over $P^*_p$ for the modified P-value, because both T and T' are score statistics for testing conditional independence.

The setting and the statistic T in definitions (2.4) and (2.5) are arbitrary. One can calculate $P^*$ for any test statistic having a discrete distribution, since it satisfies $P_{H_0}(P^* \le \alpha) \le \alpha$ for $0 < \alpha < 1$. We show that under the null this modified P-value has the property

$$P_{H_0}(P^* \le \alpha) \le \alpha \quad \text{for } 0 < \alpha < 1. \tag{2.6}$$

Let $P^*$ be a modified P-value and let m be a possible marginal configuration. We first show that the conditional P-value has $P_{H_0}(P^* \le \alpha \mid m) \le \alpha$. The result is easily obtained by noting that the modified P-value is a special case of the usual P-value using a more refined partitioning of T and T'. The ordinary P-value uses a partitioning based on T, and it is the sum of $P_{H_0}(T = t_o)$ and the probability of more extreme values of T. The modified P-value uses a partitioning based on T and T' within T. Let Max(.) denote the maximum value, let Min(.) denote the minimum value, and let Gap(T) denote the minimum difference between two consecutive values of T. We assume that T and T' have positive values. Define a new statistic $T^* = T \times \mathrm{Max}(T')/\mathrm{Gap}(T) + T'$. If Min(T') equals 0, we transform from T' to T' + 1 in order to avoid ties in T*. Then $T^*(Z_1) > T^*(Z_2)$ for all tables $Z_1, Z_2$ with $T(Z_1) > T(Z_2)$. Let $t^*_o$ denote the value of T* for the observed table. Note that a partitioning of the sample space using T and T' within T is equivalent to a partitioning of the sample space using T*. Since there are no ties, ordering tables using T and T' within T is equivalent to ordering tables using T*. Then, the sum of the probability that T' is at least $t'_o$ at $T = t_o$ and the probability of more extreme values of T is equivalent to the sum of $P_{H_0}(T^* = t^*_o)$ and the probability of more extreme values of T*. That is,

$$P^* = P_{H_0}(T > t_o) + P_{H_0}(T = t_o,\ T' \ge t'_o) = P_{H_0}(T^* > t^*_o) + P_{H_0}(T^* = t^*_o).$$

Hence, the modified P-value is a special case of the usual P-value with a more refined partitioning, and we have $P_{H_0}(P^* \le \alpha \mid m) \le \alpha$. Then, under the null,

$$P_{H_0}(P^* \le \alpha) = E\left[P_{H_0}(P^* \le \alpha \mid m)\right] \le \alpha, \tag{2.7}$$

since the average of these conditional probabilities over all possible marginal configurations is less than or equal to $\alpha$. Thus, we have shown that the probability of Type I error is no greater than the nominal value.
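The construction of the combined statistic T* used in this argument can be checked numerically. The sketch below assumes a list of (T, T') pairs; the function name `combined_statistic` and the values are hypothetical. It confirms that ordering by T* reproduces the ordering by T and then by T' within T, including the shift of T' by 1 when Min(T') is 0.

```python
def combined_statistic(pairs):
    """Map each (T, T') pair to T* = T * Max(T') / Gap(T) + T'."""
    t_values = sorted({t for t, _ in pairs})
    gap = min(b - a for a, b in zip(t_values, t_values[1:]))
    # Shift T' by 1 when its minimum is 0, as in the text, to avoid ties.
    shift = 1 if min(t2 for _, t2 in pairs) == 0 else 0
    max_t2 = max(t2 for _, t2 in pairs) + shift
    return [t * max_t2 / gap + (t2 + shift) for t, t2 in pairs]

# Hypothetical (T, T') values; T' includes a zero to exercise the shift.
pairs = [(4, 0.0), (4, 2.5), (6, 0.0), (6, 1.0), (10, 2.5), (10, 0.5)]
t_star = combined_statistic(pairs)

by_t_star = sorted(range(len(pairs)), key=lambda i: t_star[i])
by_t_then_t2 = sorted(range(len(pairs)), key=lambda i: pairs[i])
print(by_t_star == by_t_then_t2)  # True
```

Without the shift, the pairs (4, 2.5) and (6, 0.0) in this toy data would both map to 10.5, which is the kind of tie the transformation T' + 1 is meant to remove.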
The modified P-value cannot be larger than the ordinary P-value, so the test based on it is less conservative in the sense that the actual size is closer to the nominal value. Also, the sampling distribution of the modified P-value is less discrete than usual in the sense that its support can have considerably more points. When each table with a particular statistic value T has the same value of T', then $P^*$ is the same as the usual exact P-value. As a special case, when there is only one table having each distinct value of T, such as in Fisher's exact test, they are identical. Note that if T is a score or Wald or likelihood-ratio statistic for a particular alternative, it does not help to take T' to be one of the other statistics for that same alternative. Because these tests all depend only on the sufficient statistics under the alternative, two tables that have the same value of T also have the same value of T', when T and T' are taken from these procedures. Thus, we base T' on a more general alternative, for which the extra sufficient statistic provides a finer partitioning.

When a test statistic has a continuous distribution, the P-value has a uniform(0,1) null distribution. Hence, for the continuous case the expected value of the P-value is 1/2. We prove now that in the discrete case the expected value of P under the null is greater than 1/2. For an arbitrary random variable X (Mood, Graybill and Boes 1974, page 65),

$$E X = \int_0^\infty [1 - F_X(x)]\,dx - \int_{-\infty}^0 F_X(x)\,dx = \int_0^\infty [1 - \Pr(X \le x)]\,dx - \int_{-\infty}^0 \Pr(X \le x)\,dx.$$

Thus, $E P = \int_0^1 [1 - \Pr(P \le p)]\,dp$. Since, from (2.6), $1 - \Pr(P \le p) \ge 1 - p$ for $0 < p < 1$, we have

$$E P \ge \int_0^1 (1 - p)\,dp = \frac{1}{2}.$$

In the discrete case, the P-value is stochastically larger than the uniform, and its expected value exceeds 1/2. Hence, we can describe the degree of conservativeness by comparing $E_{H_0}P$ to 0.5. If the expected value exceeds 0.5 by much, the conservativeness is severe.



2.2.2 The Modified Mid P-value




The mid P-value (Lancaster 1961) is another alternative to the usual P-value that many statisticians have recommended as a way of compromising between having a conservative test and using supplementary randomization (e.g., Barnard 1990). It is defined by

$$P_{mid} = P_{H_0}(T > t_o) + \tfrac{1}{2} P_{H_0}(T = t_o).$$

It subtracts half of the probability of the observed statistic from the usual exact P-value. The mid P-value has the appealing property that its null expected value for a discrete distribution equals exactly 1/2, the expected P-value for a continuous distribution. A disadvantage is that a test based on it is no longer "exact," the actual size possibly exceeding the nominal value.

The mid P-value assigns weight 1/2 to probabilities of all tables comparable to the observed table in the sense that $T = t_o$. For the modified P-value (2.4), the comparable tables are those with $T = t_o$ and $T' = t'_o$. Thus, we can define a mid P version of the modified P-value by

$$P^*_{mid} = P_{H_0}(T > t_o) + P_{H_0}(T = t_o,\ T' > t'_o) + \tfrac{1}{2} P_{H_0}(T = t_o,\ T' = t'_o). \tag{2.8}$$

Like the ordinary mid P-value, the modified mid P-value has null expected value equal to 1/2. The result is easily obtained by noting that the modified mid P-value is a special case of the usual mid P-value using a more refined partitioning of T and T'. The ordinary mid P-value uses a partitioning based on T, and it is the sum of half of $P_{H_0}(T = t_o)$ and the probability of more extreme values of T. The modified mid P-value uses a partitioning based on T and T' within T. We assume that T and T' have positive values. Let Gap(T) denote the minimum difference between two consecutive values of T. Define a new statistic $T^* = T \times \mathrm{Max}(T')/\mathrm{Gap}(T) + T'$. If Min(T') equals 0, we transform from T' to T' + 1 in order to avoid ties in T*. Then $T^*(Z_1) > T^*(Z_2)$ for all tables $Z_1, Z_2$ with $T(Z_1) > T(Z_2)$. Let $t^*_o$ denote the value of T* for the observed table. Note that a partitioning of the sample space using T and T' within T is equivalent to a partitioning of the sample space using T*. Since there are no ties, ordering tables using T and T' within T is equivalent to ordering tables using T*. Then, the sum of half of $P_{H_0}(T = t_o, T' = t'_o)$, the probability of more extreme values of T' at $T = t_o$, and the probability of more extreme values of T is equivalent to the sum of half of $P_{H_0}(T^* = t^*_o)$ and the probability of more extreme values of T*. That is,


$$P^*_{mid} = P_{H_0}(T > t_o) + P_{H_0}(T = t_o,\ T' > t'_o) + \tfrac{1}{2} P_{H_0}(T = t_o,\ T' = t'_o) = P_{H_0}(T^* > t^*_o) + \tfrac{1}{2} P_{H_0}(T^* = t^*_o).$$

Hence, the modified mid P-value is a special case of the mid P-value with a more refined partitioning, and its null expected value is equal to 1/2. Also, the difference between the modified P-value and the modified mid P-value is no greater than the difference between the ordinary P-value and the ordinary mid P-value. That is, $(P^* - P^*_{mid}) \le (P - P_{mid})$.
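The claim that both mid P-values have null expectation 1/2 can be checked with exact arithmetic. The sketch below reuses a hypothetical toy reference set of (null probability, T, T') triples; the function names `mid_p` and `modified_mid_p` are illustrative and not from the source.

```python
from fractions import Fraction

def mid_p(tables, t_obs):
    """Ordinary mid P-value: P(T > t_obs) + (1/2) P(T = t_obs)."""
    more = sum((p for p, t, _ in tables if t > t_obs), Fraction(0))
    tied = sum((p for p, t, _ in tables if t == t_obs), Fraction(0))
    return more + tied / 2

def modified_mid_p(tables, t_obs, t2_obs):
    """Modified mid P-value (2.8): ties are defined by both T and T'."""
    more = sum((p for p, t, t2 in tables
                if t > t_obs or (t == t_obs and t2 > t2_obs)), Fraction(0))
    tied = sum((p for p, t, t2 in tables
                if t == t_obs and t2 == t2_obs), Fraction(0))
    return more + tied / 2

# Hypothetical toy reference set (not from the source): (null prob, T, T').
toy = [
    (Fraction(1, 10), 3, 5),
    (Fraction(2, 10), 2, 4),
    (Fraction(3, 10), 2, 1),
    (Fraction(4, 10), 1, 2),
]
# Null expectation: weight each table's P-value by that table's probability.
e_mid = sum(p * mid_p(toy, t) for p, t, _ in toy)
e_mod_mid = sum(p * modified_mid_p(toy, t, t2) for p, t, t2 in toy)
print(e_mid, e_mod_mid)  # 1/2 1/2
```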



2.2.3 Examples




We consider the test of conditional independence in three-way contingency tables under the assumption of no three-factor interaction. We will illustrate the ordinary and modified P-values using 2 x 2 x 5 and 2 x 2 x 18 contingency tables. For 2 x 2 x K tables, the exact test utilizes the test statistic $T = \sum_k n_{11k}$, given $\{n_{1+k}, n_{2+k}, n_{+1k}, n_{+2k}\}$. It assumes homogeneity of the odds ratios in the 2 x 2 x K contingency tables. For modified P-values, we can utilize $\sum_k X_k^2$ or the table probability, P(Z), for the secondary statistic T'. In the examples we utilize $\sum_k X_k^2$ for T' in (2.4).

We illustrate the modified P-values (2.4) and (2.5) using Table 2.1, taken from Mantel (1963). It refers to the effectiveness of immediately injected or 1 1/2-hour-delayed penicillin in protecting rabbits against lethal injection with β-hemolytic streptococci. Let P = penicillin level, D = delay, and C = whether cured. Under the assumption of a constant odds ratio $\theta$ between D and C at each level of P, we test $H_0 : \theta = 1$ against $H_a : \theta > 1$. Our alternative is the higher cure rate for immediate injection. For the first and last table, the zero marginal count implies that the conditional distribution of $n_{11k}$ is degenerate, and the table makes no contribution to the test. Therefore, we can conduct the test using the three remaining tables.

The test statistic is $T = \sum_k n_{11k}$, given the marginal totals of the row and column variables at each level of the third one. For these tables, $t_o = 14$, and the four tables with $T \ge 14$ are $\{(n_{111}, n_{112}, n_{113}) = (3,6,6), (2,6,6), (3,5,6), (3,6,5)\}$. The values of T' for these four tables are 11.09, 7.54, 6.59, and 11.09, respectively. Among them, the observed table is (3,6,5). The ordinary exact P-value is $P = P_{H_0}(T \ge 14) = (2+9+16+2)/1452 = 0.0200$. The modified exact P-values are $P^* = P^*_p = (2+2)/1452 = 0.0028$, the null probability for the tables {(3,6,6), (3,6,5)}.
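The arithmetic for Table 2.1 can be reproduced by brute-force enumeration of the three informative strata. The sketch below is an illustration in Python, not the modified FORTRAN program described in the source; the margins are read from Table 2.1, and the names `STRATA`, `weight`, `pearson` and `summarize` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

# Informative strata of Table 2.1 (penicillin levels 1/4, 1/2, 1), written as
# (n_{1+k}, n_{+1k}, n_{+2k}): "None" row total, "Cured" and "Died" totals.
STRATA = [(6, 3, 9), (6, 8, 4), (6, 11, 1)]
OBSERVED = (3, 6, 5)  # observed n_{11k}: cured with no delay

def weight(x, n1p, np1, np2):
    """Unnormalised null weight of n_{11k} = x in one stratum."""
    return comb(np1, x) * comb(np2, n1p - x)

def pearson(x, n1p, np1, np2):
    """Pearson X^2 for the 2 x 2 table with n_{11k} = x and these margins."""
    n = np1 + np2
    n2p = n - n1p
    a, b, c = x, n1p - x, np1 - x
    d = n2p - c
    return Fraction(n * (a * d - b * c) ** 2, n1p * n2p * np1 * np2)

def summarize(cells):
    """Return (weight, T, T') for one table given its n_{11k} values."""
    w = prod(weight(x, *s) for x, s in zip(cells, STRATA))
    t2 = sum(pearson(x, *s) for x, s in zip(cells, STRATA))
    return w, sum(cells), t2

ranges = [range(max(0, n1p - np2), min(n1p, np1) + 1) for n1p, np1, np2 in STRATA]
tables = [summarize(cells) for cells in product(*ranges)]
total = sum(w for w, _, _ in tables)
w_obs, t_obs, t2_obs = summarize(OBSERVED)

p_ordinary = Fraction(sum(w for w, t, _ in tables if t >= t_obs), total)
p_star = Fraction(sum(w for w, t, t2 in tables
                      if t > t_obs or (t == t_obs and t2 >= t2_obs)), total)
p_star_p = Fraction(sum(w for w, t, _ in tables
                        if t > t_obs or (t == t_obs and w <= w_obs)), total)
print(t_obs, float(t2_obs))
print(float(p_ordinary), float(p_star), float(p_star_p))
```

Exact rational arithmetic is used for T' so that tables with equal Pearson statistics compare as equal, which a floating-point comparison would not guarantee.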
For another example, we consider Table 2.2, the "crying babies" data given by Cox (1970, p. 5), a 2 x 2 x 18 table. On each of 18 days, babies not crying at a specific time in a hospital ward served as subjects. On each day one baby chosen at random formed the experimental group, and the remainder were controls. Babies were identified as crying or not at the end of a specific period. For these tables, the observed values are $t_o = 15$ and $t'_o = 17.2601$, and the P-values are $P = 0.045$, $P^* = 0.024$, and $P^*_p = 0.021$.








There can be a considerable discrepancy between the behavior of the ordinary and modified "exact" P-values, the modified one having a distribution that can be much less discrete. For Table 2.1, the total number of possible P-values equals 9 for the ordinary P-value, 32 for $P^*$, and 35 for $P^*_p$. For Table 2.2, the corresponding numbers are 19, 115938, and 13110. Figure 2.1 presents the cumulative distribution functions of the ordinary exact P-value and of $P^*$ for null conditional distributions based on the fixed margins of Table 2.1. Figure 2.2 presents the analogous distributions for $P^*_p$. Also, Figures 2.3 and 2.4 display the corresponding cumulative distribution functions for null conditional distributions based on the fixed margins of Table 2.2. For Table 2.2, the cdf of $P^*$ or $P^*_p$ is practically indistinguishable from the uniform.

We can summarize the degree of conservativeness of each P-value using $E_{H_0}$(P-value). Using the conditional distribution based on the fixed margins of Table 2.1, $E_{H_0}P = 0.611$, $E_{H_0}P^* = 0.545$, and $E_{H_0}P^*_p = 0.542$. For Table 2.2, $E_{H_0}P = 0.576$, $E_{H_0}P^* = 0.500$, and $E_{H_0}P^*_p = 0.501$.
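For the ordinary P-value, the expected value under the null can be computed directly from the conditional distribution of T. The sketch below does this for the margins of Table 2.1, convolving the stratum distributions; the names `STRATA` and `t_distribution` are hypothetical. It also checks that the ordinary mid P-value has expectation exactly 1/2.

```python
from fractions import Fraction
from math import comb

# Informative strata of Table 2.1 as (n_{1+k}, n_{+1k}, n_{+2k}).
STRATA = [(6, 3, 9), (6, 8, 4), (6, 11, 1)]

def t_distribution(strata):
    """Null distribution of T = sum_k n_{11k} as {t: probability}."""
    dist = {0: 1}
    for n1p, np1, np2 in strata:
        step = {}
        for t, w in dist.items():
            for x in range(max(0, n1p - np2), min(n1p, np1) + 1):
                step[t + x] = step.get(t + x, 0) + w * comb(np1, x) * comb(np2, n1p - x)
        dist = step
    total = sum(dist.values())
    return {t: Fraction(w, total) for t, w in sorted(dist.items())}

dist = t_distribution(STRATA)
# Ordinary exact P-value and mid P-value attached to each possible value of T.
p_value = {t: sum(q for s, q in dist.items() if s >= t) for t in dist}
mid_p = {t: p_value[t] - dist[t] / 2 for t in dist}

e_p = sum(dist[t] * p_value[t] for t in dist)
e_mid = sum(dist[t] * mid_p[t] for t in dist)
print(len(dist), float(e_p), float(e_mid))
```

Under these assumptions the support of T has 9 points and the expected ordinary P-value rounds to 0.611, matching the values reported above; the excess over 1/2 equals half the sum of squared point probabilities of T.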

We now illustrate the ordinary and modified mid P-values. For the modified mid P-value, we can use $T' = \sum_k X_k^2$ or the table probability for the secondary statistic. For Table 2.1, $P_{mid} = 0.011$ and $P^*_{mid} = 0.002$ for both modified mid P-values, using $\sum_k X_k^2$ or the table probability. For Table 2.2, $P_{mid} = 0.028$, and $P^*_{mid} = 0.024$ with $T' = \sum_k X_k^2$ and 0.021 with the table probability. Figures 2.5 and 2.6 present the cumulative distribution functions of the modified exact P-value and the modified mid P-value using $T' = \sum_k X_k^2$, and the corresponding cumulative distribution functions using the table probability for T', respectively, for null conditional distributions based on the margins of Table 2.1. There is a good contrast between the behavior of the modified "exact" P-value and the modified mid P-value. The cdf of the modified P-value never exceeds the nominal level, but that of the modified mid P-value can exceed it. The modified mid P-value jumps and exceeds the nominal value before the modified P-value jumps close to the nominal value.

Figures 2.7 and 2.8 display the cumulative distribution functions of the ordinary mid P-value and the modified mid P-value using $T' = \sum_k X_k^2$, and the corresponding cumulative distribution functions using the table probability for the modified mid P-value, respectively, for the null conditional distribution based on the margins of Table 2.1. Though tests based on the ordinary and modified mid P-values are not "exact," the gap between the actual size and the nominal level tends to be less for the modified mid P-value than for the ordinary mid P-value. One way to measure how close the cdf of P is to the uniform cdf is by the measure

$$M = \int_0^1 |F(x) - G(x)|\,dx,$$

where F = cdf of P and G = uniform cdf. Using Table 2.1 with $T' = \sum_k X_k^2$, we have M = 0.055 for $P_{mid}$, and M = 0.022 for $P^*_{mid}$. For the exact P-values, we have M = 0.111 for P, and M = 0.045 for $P^*$.
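The measure M can be evaluated exactly for a discrete P-value, because F is a step function and the integrand is piecewise linear. The sketch below, with hypothetical helper names, does this for the ordinary exact and ordinary mid P-values under the margins of Table 2.1; the modified versions would need the full enumeration with T' and are not attempted here.

```python
from fractions import Fraction
from math import comb

STRATA = [(6, 3, 9), (6, 8, 4), (6, 11, 1)]  # (n_{1+k}, n_{+1k}, n_{+2k})

def t_distribution(strata):
    """Null distribution of T = sum_k n_{11k} as {t: probability}."""
    dist = {0: 1}
    for n1p, np1, np2 in strata:
        step = {}
        for t, w in dist.items():
            for x in range(max(0, n1p - np2), min(n1p, np1) + 1):
                step[t + x] = step.get(t + x, 0) + w * comb(np1, x) * comb(np2, n1p - x)
        dist = step
    total = sum(dist.values())
    return {t: Fraction(w, total) for t, w in dist.items()}

def abs_area(c, a, b):
    """Integral of |c - x| over [a, b] for a constant c and a <= b."""
    if c <= a:
        return ((b - c) ** 2 - (a - c) ** 2) / 2
    if c >= b:
        return ((c - a) ** 2 - (c - b) ** 2) / 2
    return ((c - a) ** 2 + (b - c) ** 2) / 2

def distance_from_uniform(p_values, probs):
    """M = integral over [0, 1] of |F(x) - x| for a discrete P-value."""
    mass = {}
    for v, q in zip(p_values, probs):
        mass[v] = mass.get(v, 0) + q
    m, f, left = Fraction(0), Fraction(0), Fraction(0)
    for knot in sorted(set(mass) | {Fraction(1)}):
        m += abs_area(f, left, knot)   # F equals f on [left, knot)
        f += mass.get(knot, 0)         # F jumps at each attainable P-value
        left = knot
    return m

dist = t_distribution(STRATA)
ts = sorted(dist)
exact = [sum(dist[s] for s in ts if s >= t) for t in ts]
mid = [p - dist[t] / 2 for p, t in zip(exact, ts)]
probs = [dist[t] for t in ts]
m_exact = distance_from_uniform(exact, probs)
m_mid = distance_from_uniform(mid, probs)
print(float(m_exact), float(m_mid))
```

With these margins the two values round to 0.111 and 0.055, in line with the figures quoted for P and the ordinary mid P-value.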

Table 2.1. Example for exact analyses.

Penicillin                      Response
Level         Delay           Cured   Died
1/8           None              0       6
              1 1/2 Hour        0       5
1/4           None              3       3
              1 1/2 Hour        0       6
1/2           None              6       0
              1 1/2 Hour        2       4
1             None              5       1
              1 1/2 Hour        6       0
4             None              2       0
              1 1/2 Hour        5       0

Source: Mantel (1963)









Table 2.2. Example for exact analyses.

              Treated                Control
Day     Not Crying   Crying    Not Crying   Crying
 1          1          0           3          5
 2          1          0           2          4
 3          1          0           1          4
 4          0          1           1          5
 5          1          0           4          1
 6          1          0           4          5
 7          1          0           5          3
 8          1          0           4          4
 9          1          0           3          2
10          0          1           8          1
11          1          0           5          1
12          1          0           8          1
13          1          0           5          3
14          1          0           4          1
15          1          0           4          2
16          1          0           7          1
17          0          1           4          2
18          1          0           5          3

Source: Cox (1970)










Figure 2.1. Two cumulative distribution functions of exact P-values with $T' = \sum_k X_k^2$, for the margins of Table 2.1. (Plot of P(P-value <= x) against x; curves: modified P-value, ordinary P-value.)

Figure 2.2. Two cumulative distribution functions of exact P-values with T' = P(Z), for the margins of Table 2.1. (Plot of P(P-value <= x) against x, 0.0 to 1.0; curves: modified P-value, ordinary P-value.)

Figure 2.3. Two cumulative distribution functions of exact P-values with $T' = \sum_k X_k^2$, for the margins of Table 2.2. (Plot of P(P-value <= x) against x, 0.0 to 1.0; curves: modified P-value, ordinary P-value.)

Figure 2.4. Two cumulative distribution functions of exact P-values with T' = P(Z), for the margins of Table 2.2. (Plot of P(P-value <= x) against x, 0.0 to 1.0; curves: modified P-value, ordinary P-value.)

Figure 2.5. Cumulative distribution functions of the modified exact P-value and the modified mid P-value with $T' = \sum_k X_k^2$, for the margins of Table 2.1. (Plot of P(P-value <= x) against x, 0.0 to 1.0; curves: modified P-value, modified mid P-value.)

Figure 2.6. Cumulative distribution functions of the modified exact P-value and the modified mid P-value with T' = P(Z), for the margins of Table 2.1. (Plot of P(P-value <= x) against x, 0.0 to 1.0; curves: modified P-value, modified mid P-value.)

Figure 2.7. Cumulative distribution functions of the ordinary mid P-value and the modified mid P-value with $T' = \sum_k X_k^2$, for the margins of Table 2.1. (Plot of P(P-value <= x) against x, 0.0 to 1.0; curves: modified mid P-value, ordinary mid P-value.)

Figure 2.8. Cumulative distribution functions of the ordinary mid P-value and the modified mid P-value with T' = P(Z), for the margins of Table 2.1. (Plot of P(P-value <= x) against x, 0.0 to 1.0; curves: modified mid P-value, ordinary mid P-value.)








2.2.4 Software




Thomas (1975) gave the first algorithm for exact analysis of several 2 x 2 contingency tables. This FORTRAN program required enumeration of all possible tables in the conditional reference set; hence, it could be slow. It provided exact tests for conditional independence as well as an exact confidence interval for a common odds ratio, and computed the conditional maximum likelihood estimate. Vollset and Hirji (1991) presented a fast FORTRAN program for the exact test of conditional independence and confidence interval for a common odds ratio in several 2 x 2 contingency tables.
We suggest modifications of exact methods based on ordering the tables by their secondary statistic. In order to implement a modified exact test, we need to compare the secondary statistic, T', of the generated table to that of the observed table, for tables such that $T = t_o$, and decide whether the table contributes to the P-values. We have modified Vollset and Hirji's FORTRAN program to implement modified exact P-values. Also, the modified software can compute the expected value and the cumulative distribution of P in both ordinary and modified procedures. The source code is listed as Appendix A.



2.3 A Less Conservative "Exact" Confidence Interval




Discreteness also affects confidence interval estimation. For the "exact" confidence interval with nominal confidence coefficient $1 - \alpha$, the actual confidence coefficient is at least that level and is unknown (Neyman 1935). Since the modified P-value is less discrete than the ordinary P-value and leads to less conservative "exact" tests, we can reduce the conservativeness by employing the modified P-value for the construction of confidence intervals.

For 2 x 2 x K tables, we suggest modified "exact" confidence intervals for an assumed common odds ratio based on inverting results of tests using the modified P-value. Such intervals have confidence coefficient guaranteed to equal at least the nominal level, but are narrower than the ordinary "exact" interval. Illustrations are given for estimating an assumed common odds ratio for several 2 x 2 tables.



2.3.1 The Ordinary "Exact" Confidence Interval




One can construct an exact confidence interval for a parameter by inverting the exact conditional test regarding the value of that parameter. For an ordinary exact confidence interval, one can invert the test based on the ordinary exact P-value.

To illustrate, suppose we want to estimate an assumed common odds ratio, $\theta$, in a 2 x 2 x K contingency table. The conditional probability of any table in the reference set, $\Gamma$, is

$$P(\{n_{11k}\} \mid \{n_{1+k}\}, \{n_{+1k}\}, \{n_{+2k}\}; \theta) = \frac{\prod_k \binom{n_{+1k}}{n_{11k}} \binom{n_{+2k}}{n_{1+k} - n_{11k}} \theta^{n_{11k}}}{\sum_{Z \in \Gamma} \prod_k \binom{n_{+1k}}{z_k} \binom{n_{+2k}}{n_{1+k} - z_k} \theta^{z_k}}, \tag{2.9}$$

where $\{z_1, \ldots, z_K\}$ denote values of $\{n_{111}, \ldots, n_{11K}\}$ for a table in the reference set $\Gamma$. Let $\Gamma_t = \{Z : Z \in \Gamma,\ \sum_k n_{11k} = t\}$. Ordinary exact confidence limits for the common odds ratio are constructed from the conditional distribution of $T = \sum_k n_{11k}$, that is

$$P(T = t; \theta) = \frac{c_t \theta^t}{\sum_{u = t_{min}}^{t_{max}} c_u \theta^u}, \tag{2.10}$$

where

$$c_t = \sum_{Z \in \Gamma_t} \prod_k \binom{n_{+1k}}{z_k} \binom{n_{+2k}}{n_{1+k} - z_k},$$

and where $t_{min} = \sum_k \max(0, n_{1+k} - n_{+2k})$ and $t_{max} = \sum_k \min(n_{1+k}, n_{+1k})$. The ordinary interval (Cox 1970, Gart 1970, Mehta et al. 1985, Vollset et al. 1991) is based on inverting two separate one-sided tests. It equals $(\theta_-, \theta_+)$, where for $t_{min} < t_o < t_{max}$,

$$\text{at } \theta = \theta_-: \quad P_1(\theta) = \sum_{t \ge t_o} P(t; \theta) = \frac{\alpha}{2},$$

$$\text{at } \theta = \theta_+: \quad P_2(\theta) = \sum_{t \le t_o} P(t; \theta) = \frac{\alpha}{2}.$$

When $t_o = t_{min}$, the lower endpoint is 0; if $t_o = t_{max}$, the upper endpoint is $\infty$. It is easily shown that $(\theta_-(t), \theta_+(t))$ has confidence coefficient at least $100(1 - \alpha)\%$ (Mehta et al. 1985). Due to discreteness of the distribution of T, we have only a conservative confidence interval, and the actual confidence coefficient is unknown.
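The inversion of the two one-sided tests can be sketched numerically. The code below computes the coefficients $c_t$ of (2.10) for the informative strata of Table 2.1 and solves $P_1(\theta) = \alpha/2$ and $P_2(\theta) = \alpha/2$ by bisection on the log scale. It is a minimal Python illustration with hypothetical names (`coefficients`, `tail`, `solve`), not the Vollset and Hirji program; the search bracket [1e-6, 1e6] is an assumption that happens to contain both limits here.

```python
from math import comb

STRATA = [(6, 3, 9), (6, 8, 4), (6, 11, 1)]  # (n_{1+k}, n_{+1k}, n_{+2k})
T_OBS, ALPHA = 14, 0.05

def coefficients(strata):
    """Coefficients c_t of (2.10), keyed by t, by convolving the strata."""
    coef = {0: 1}
    for n1p, np1, np2 in strata:
        step = {}
        for t, c in coef.items():
            for x in range(max(0, n1p - np2), min(n1p, np1) + 1):
                step[t + x] = step.get(t + x, 0) + c * comb(np1, x) * comb(np2, n1p - x)
        coef = step
    return coef

COEF = coefficients(STRATA)
T_MIN = min(COEF)

def tail(theta, keep):
    """Sum of P(t; theta) over the values of t for which keep(t) is true."""
    w = {t: c * theta ** (t - T_MIN) for t, c in COEF.items()}
    return sum(v for t, v in w.items() if keep(t)) / sum(w.values())

def p1(theta):  # P_1(theta) = P(T >= t_obs; theta), increasing in theta
    return tail(theta, lambda t: t >= T_OBS)

def p2(theta):  # P_2(theta) = P(T <= t_obs; theta), decreasing in theta
    return tail(theta, lambda t: t <= T_OBS)

def solve(func, target, increasing, lo=1e-6, hi=1e6):
    """Bisection on the log scale for func(theta) = target (monotone func)."""
    for _ in range(200):
        mid = (lo * hi) ** 0.5
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

lower = solve(p1, ALPHA / 2, increasing=True)
upper = solve(p2, ALPHA / 2, increasing=False)
print(round(lower, 2), round(upper, 2))
```

Bisection is adequate here because $P_1$ and $P_2$ are strictly monotone in $\theta$; the same shortcut would not be safe for a function that can be non-monotone.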



2.3.2 The Modified "Exact" Confidence Interval




To ensure that the actual confidence coefficient is closer to the nominal value and to obtain a narrower "exact" interval, one can invert the two one-sided tests based on the modified exact P-value. We illustrate this using a secondary statistic $\sum_k X_k^2(\theta)$ or the table probability to generate the secondary partitioning. In the non-null case, T' is defined as

$$\sum_k X_k^2(\theta) = \sum_k \sum_{i,j} \frac{\left(n_{ijk} - \hat{m}_{ijk}(\theta)\right)^2}{\hat{m}_{ijk}(\theta)},$$

where $\hat{m}_{ijk}(\theta)$ is the estimate of the expected cell count, assuming common odds ratio $\theta$. When $\theta = 1$, $\sum_k X_k^2(\theta)$ is the Pearson statistic for testing conditional independence.








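If large values of T' contradict the null, we let $B(\theta) = \{Z : Z \in \Gamma,\ T = t_o,\ T'(\theta) \ge t'_o(\theta)\}$. When the table probability is utilized, we denote by $P(Z; \theta)$ the probability of table Z when the common odds ratio is $\theta$, and let $B(\theta) = \{Z : Z \in \Gamma,\ T = t_o,\ P(Z; \theta) \le P(N; \theta)\}$. The modified "exact" confidence limits are found using the functions

$$P_1^*(\theta) = \sum_{t > t_o} P(t; \theta) + P[B(\theta); \theta],$$

$$P_2^*(\theta) = \sum_{t < t_o} P(t; \theta) + P[B(\theta); \theta].$$

The lower limit, $\theta^*_-$, is the smallest of all $\theta$'s to satisfy $P_1^*(\theta) \ge \alpha/2$, and the upper limit, $\theta^*_+$, is the largest of all $\theta$'s to satisfy $P_2^*(\theta) \ge \alpha/2$. When $P_1^*(\theta)$ and $P_2^*(\theta)$ are strictly monotone functions of $\theta$, the limits satisfy $P_1^*(\theta^*_-) = P_2^*(\theta^*_+) = \alpha/2$.

We show that the probability that this interval excludes $\theta$, $\Pr(\theta^*_- > \theta) + \Pr(\theta^*_+ < \theta)$, is at most $\alpha$. The lower limit is the smallest value of $\theta$ for which $P_1^*(\theta) \ge \alpha/2$. For $\theta < \theta^*_-$, $P_1^*(\theta) < \alpha/2$. It follows that

$$\Pr(\theta^*_- > \theta) \le \Pr\left(P_1^*(\theta) < \tfrac{\alpha}{2}\right) \le \Pr\left(P_1^*(\theta) \le \tfrac{\alpha}{2}\right) = E \Pr\left(P_1^*(\theta) \le \tfrac{\alpha}{2} \mid m\right) \le \frac{\alpha}{2},$$

where m denotes a possible marginal configuration, and the last step follows because of discreteness. For the upper limit, by the same arguments we have $\Pr(\theta^*_+ < \theta) \le \alpha/2$. The result follows.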

Clearly, this interval is contained within the ordinary one. Hence, the modified confidence interval is "exact," yet it has actual confidence coefficient closer to the nominal value than the ordinary "exact" interval. One can solve for the modified endpoints numerically, based on the ordinary endpoints as the initial values. The algorithm to find the endpoints is as follows. Start with an initial value based on the ordinary one, since the modified limits are contained within the ordinary ones. Note that $P_1(\theta)$ and $P_2(\theta)$ are strictly monotone functions of $\theta$ (Mehta et al. 1985). Also note that $P_1^*(\theta)$ is bounded by $P_1(\theta)$, and $P_2^*(\theta)$ is bounded by $P_2(\theta)$. Even though $P_1^*(\theta)$ and $P_2^*(\theta)$ need not be monotone functions of $\theta$, the limits can be found within the ordinary limits because they are bounded by $P_1(\theta)$ and $P_2(\theta)$, respectively. Hence ordinary confidence limits provide good starting values for both the monotone case and the non-monotone case. The initial value for the lower limit can be set to be $\theta_-$, and the initial value for the upper limit can be set to be $1.01 \times \theta_+$.

Suppose we want to find the lower limit. Generally, the searching algorithm is composed of two steps. The first step is to increase the value of $\theta$ until some value of $\theta$ has $P_1^*(\theta) \ge \alpha/2$. For the sake of the non-monotone case, the value of $\theta$ is increased by a small amount so that $P_1^*(\theta)$ cannot change much between two successive values of $\theta$. The second step is iteration within an interval to find the limit. Denote by $\theta_A$ the most recent estimate that has $P_1^*(\theta) < \alpha/2$ and denote by $\theta_B$ the most recent estimate that has $P_1^*(\theta) \ge \alpha/2$. The initial values of $\theta_A$ and $\theta_B$ are set to be zero. As $\theta$ changes, $\theta_A$ or $\theta_B$ is updated depending on the value of $P_1^*(\theta)$, and these values will be used for the second stage to determine an interval for iteration.

More specifically, if $P_1^*(\theta) < \alpha/2$, the current estimate is too small. If $P_1^*(\theta) > \alpha/2$, the current estimate is too large. For the first step, compute $P_1^*(\theta)$ at the initial value of $\theta$. If $P_1^*(\theta) = \alpha/2$, this is the limit. If $P_1^*(\theta) < \alpha/2$, multiply $\theta$ by 1.01 to increase the value of $\theta$. Using this new estimate, compute $P_1^*(\theta)$. Continue this process until some estimate is found that has $P_1^*(\theta) \ge \alpha/2$. Once this happens, the second step begins. Iteration occurs between two values of $\theta$. These two values are the previous estimate that has $P_1^*(\theta) < \alpha/2$ and the current estimate that has $P_1^*(\theta) \ge \alpha/2$. Note that $\theta_A$ and $\theta_B$ have been updated as the estimate changes. Then the new estimate is defined as $(\theta_A + \theta_B)/2$, and $P_1^*(\theta)$ is computed using this estimate. Depending on the value of $P_1^*(\theta)$, $\theta_A$ or $\theta_B$ is updated. The process continues until $|1 - \theta_A/\theta_B|$ is sufficiently close to zero, for example, $10^{-4}$. If $P_1^*(\theta)$ and $P_2^*(\theta)$ are strictly monotone functions, this algorithm finds the limits that satisfy $P_1^*(\theta^*_-) = P_2^*(\theta^*_+) = \alpha/2$. If they are not monotone functions, it finds the smallest of all $\theta$'s to satisfy $P_1^*(\theta) \ge \alpha/2$, and the largest of all $\theta$'s to satisfy $P_2^*(\theta) \ge \alpha/2$. Thus, this algorithm can be used for both monotone and non-monotone cases.
For the upper limit, the same procedure follows except that, starting from $\theta = \theta_+$, if $P_2^*(\theta) < \alpha/2$ we multiply $\theta$ by 0.99 to decrease the value of $\theta$. This comes from the fact that if $P_2^*(\theta) < \alpha/2$, the current estimate is too large, and if $P_2^*(\theta) > \alpha/2$, the current estimate is too small. This algorithm is an adaptation of one written by Baptista and Pike (1977) for exact two-sided confidence limits for an odds ratio in a 2 x 2 table.
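The two-step search can be sketched as follows for the modified limits based on the table probability, using the informative strata of Table 2.1. The starting values are the ordinary limits (1.08, 531.51) that the source reports for this table; the function names `modified_p` and `search_limit`, the midpoint update, and the use of the same set B for both one-sided functions are assumptions of this sketch rather than details confirmed by the source.

```python
from itertools import product
from math import comb, prod

STRATA = [(6, 3, 9), (6, 8, 4), (6, 11, 1)]  # (n_{1+k}, n_{+1k}, n_{+2k})
OBSERVED = (3, 6, 5)
ALPHA = 0.05
ORDINARY = (1.08, 531.51)  # ordinary limits reported in the source for Table 2.1

def coef(cells):
    """Coefficient of one table: product of binomial terms over strata."""
    return prod(comb(np1, x) * comb(np2, n1p - x)
                for x, (n1p, np1, np2) in zip(cells, STRATA))

ranges = [range(max(0, a - c), min(a, b) + 1) for a, b, c in STRATA]
tables = [(sum(z), coef(z)) for z in product(*ranges)]
t_obs, c_obs = sum(OBSERVED), coef(OBSERVED)
t_min = min(t for t, _ in tables)

def modified_p(theta, side):
    """P1*(theta) for side='lower', P2*(theta) for side='upper'."""
    total = extreme = 0.0
    for t, c in tables:
        w = c * theta ** (t - t_min)
        total += w
        beyond = t > t_obs if side == "lower" else t < t_obs
        # B(theta): tables tied on T whose probability is no larger than the
        # observed one; within T = t_obs this reduces to comparing coefficients.
        if beyond or (t == t_obs and c <= c_obs):
            extreme += w
    return extreme / total

def search_limit(p_func, theta, factor, level, tol=1e-4):
    """Step theta by `factor` until p_func(theta) >= level, then bisect."""
    previous = theta
    while p_func(theta) < level:
        previous, theta = theta, theta * factor
    # p(theta_a) < level <= p(theta_b), unless the start already qualified.
    theta_a, theta_b = previous, theta
    while abs(1 - theta_a / theta_b) > tol:
        mid = (theta_a + theta_b) / 2
        if p_func(mid) < level:
            theta_a = mid
        else:
            theta_b = mid
    return theta_b

lower = search_limit(lambda th: modified_p(th, "lower"), ORDINARY[0], 1.01, ALPHA / 2)
upper = search_limit(lambda th: modified_p(th, "upper"), 1.01 * ORDINARY[1], 0.99, ALPHA / 2)
print(round(lower, 2), round(upper, 2))
```

The 1 percent steps keep the first stage from jumping over a narrow region where the modified function first reaches $\alpha/2$, which is the concern raised above for the non-monotone case.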

Next, we show that when the ordinary P-value and the modified P-value $P^*_p$ based on table probabilities are identical, then the ordinary and modified exact confidence intervals (based on inverting the test using $P^*_p$) also are identical. Suppose we use the table probability for T'. By the definition,

$$P = P_{H_0}(T \ge t_o),$$

$$P^*_p = P_{H_0}(T > t_o) + P_{H_0}(\{Z : T = t_o,\ P(Z) \le P(N)\}).$$

When the ordinary and modified P-values are identical, we have

$$P_{H_0}(T = t_o) = P_{H_0}(\{Z : T = t_o,\ P(Z) \le P(N)\}).$$

Hence, the observed table has the largest null probability among those tables having $T = t_o$. This means that when $\theta = 1$, the coefficient for the observed table

$$\prod_k \binom{n_{+1k}}{n_{11k}} \binom{n_{+2k}}{n_{1+k} - n_{11k}}$$

is the largest among those coefficients for tables having $T = t_o$. Since for arbitrary $\theta$ we get

$$\prod_k \binom{n_{+1k}}{n_{11k}} \binom{n_{+2k}}{n_{1+k} - n_{11k}} \theta^{n_{11k}} = \left[\prod_k \binom{n_{+1k}}{n_{11k}} \binom{n_{+2k}}{n_{1+k} - n_{11k}}\right] \theta^{\sum_k n_{11k}},$$

the table probability for arbitrary $\theta$ depends on only this coefficient among tables with the same value of T. Because the observed table has the largest coefficient among those tables having $T = t_o$, it has the largest probability among those tables having $T = t_o$ for arbitrary $\theta$. Hence, $P(T = t_o; \theta) = P[B(\theta); \theta]$, and the ordinary and modified exact confidence intervals also are identical.

This property does not hold when $T' = \sum_k X_k^2(\theta)$ is used to construct the modified P-value. The expected cell counts in T' have explicit forms under the null, but they do not have explicit forms under the alternative assuming $\theta$, though they can be obtained by the iterative proportional fitting algorithm. For those tables having $T = t_o$, if the observed table has the smallest value of T' under the null, it does not necessarily have the smallest value of T' under the alternative. Hence, the ordinary and modified exact confidence intervals are not necessarily identical when $P = P^*$.

We now illustrate exact confidence intervals for a common odds ratio using Tables 2.1 and 2.2. The 95% "exact" interval using the ordinary approach is (1.08, 531.51) for Table 2.1 and (0.86, 21.37) for Table 2.2. The corresponding modified "exact" confidence interval using $T' = \sum_k X_k^2(\theta)$ is (2.08, 67.35) for Table 2.1 and (1.01, 13.63) for Table 2.2. Also, the corresponding modified "exact" confidence interval using the table probability for T' is (2.08, 67.35) for Table 2.1 and (1.04, 14.87) for Table 2.2. We see that inferences can be considerably sharper with the modified approach. For Table 2.1, for instance, the lower bound of the ordinary interval indicates that the true odds ratio could be quite close to conditional independence. The modified interval suggests that the odds ratio is substantively quite different from conditional independence.









2.4 Alternative Modifications of "Exact" Confidence Intervals




In previous sections, we have considered two types of probabilities, that is, the probability of obtaining T equal to or less than the observed value $t_o$, and separately the probability of obtaining T equal to or greater than the observed value $t_o$. Then, confidence limits are constructed by inverting the test. Hence, confidence intervals discussed so far are based on inverting two separate one-sided tests of level $\alpha/2$ each. We now suggest an alternative way to form an "exact" confidence interval for a common odds ratio. This method is based on inverting a single two-sided test rather than two one-sided tests.

We show that confidence intervals based on inverting two-sided tests tend to be less conservative than those based on inverting two separate one-sided tests. Also we discuss modified mid P confidence intervals based on inverting one-sided or two-sided tests using modified mid P-values.



2.4.1 The Ordinary Two-Sided "Exact" Confidence Interval




Sterne (1954) used a two-sided approach in constructing a confidence interval for a single binomial parameter, and Baptista and Pike (1977) used it to construct confidence limits for the odds ratio in a 2 x 2 table. We can extend this directly to 2 x 2 x K tables. For testing a particular value of $\theta$, a two-sided P-value is given by

$$P(\theta) = \sum_{\{t:\ P(t;\theta) \le P(t_o;\theta)\}} P(t; \theta). \tag{2.13}$$

When the distribution of T has probabilities monotonically increasing in t up to some point and then monotonically decreasing after that, this is simply a two-tail probability. (This has happened for all examples we have considered, and it may indeed be a property of the distribution of T for 2 x 2 x K tables; however, except for K = 1, it does not seem to be known whether the distribution of a sum of noncentral hypergeometric variates is unimodal.) The two-sided exact confidence interval then consists of the values of $\theta$ for which this two-sided P-value is at least $\alpha$. Alternatively, one could base the two-sided P-value on a non-null test statistic (such as the score statistic), and construct the confidence interval by inverting that test using the exact non-null distribution. We will discuss this in Chapter 5.

This two-sided approach produces an interval that is usually, but not necessarily, shorter than the ordinary one based on inverting two separate one-sided tests. Under certain conditions, it can be shown that the two-sided approach is better, at least for one of the endpoints. For instance, when the upper limit $\theta_+$ of this interval is quite large, the distribution of T often satisfies $P(t; \theta_+) > P(t_o; \theta_+)$ for all $t > t_o$. A special case of this holds when the probabilities are monotone increasing in t, which is guaranteed when $\theta_+ > \max_t\{c_{t-1}/c_t\}$. In order to show this, from (2.10) we have

$$P(T = t; \theta_+) = \frac{c_t \theta_+^t}{\sum_{u = t_{min}}^{t_{max}} c_u \theta_+^u}.$$

For $t_{min} < t \le t_{max}$,

$$P(T = t; \theta_+) - P(T = t - 1; \theta_+) = \frac{c_t \theta_+^t - c_{t-1} \theta_+^{t-1}}{\sum_{u = t_{min}}^{t_{max}} c_u \theta_+^u} = \frac{\theta_+^{t-1}}{\sum_{u = t_{min}}^{t_{max}} c_u \theta_+^u} \left(c_t \theta_+ - c_{t-1}\right).$$

If $\theta_+ > c_{t-1}/c_t$, then $P(T = t; \theta_+) > P(T = t - 1; \theta_+)$. Hence, if $\theta_+ > \max_t\{c_{t-1}/c_t\}$, the probabilities are monotone increasing in t. In this case, since $P(t; \theta_+) > P(t_o; \theta_+)$ for all $t > t_o$,

$$\sum_{\{t:\ P(t;\theta_+) \le P(t_o;\theta_+)\}} P(t; \theta_+) - \sum_{t \le t_o} P(t; \theta_+) = 0.$$

Hence, this upper limit $\theta_+$ is the same as the upper limit obtained using the one-sided testing approach with double the error probability. For instance, the upper limit of the 95% interval based on inverting a two-sided test is then the same as the upper limit of the 90% interval for the approach based on inverting two separate one-sided tests. Analogous remarks apply to the lower limit. In such cases, there is a clear advantage to using this approach based on two-sided tests. Unless one is specifically interested in a one-sided confidence interval (i.e., a lower bound alone or an upper bound alone for $\theta$), we prefer this approach.
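The two-sided P-value (2.13) and the monotonicity condition can be illustrated with the informative strata of Table 2.1. The sketch below uses hypothetical names (`coefficients`, `probabilities`, `two_sided_p`, `one_sided_p2`) and evaluates the functions at 261.49, the upper two-sided limit that the source reports for this table; it checks that this value exceeds the threshold $\max_t\{c_{t-1}/c_t\}$ and that the two-sided and one-sided P-values then coincide.

```python
from math import comb

STRATA = [(6, 3, 9), (6, 8, 4), (6, 11, 1)]  # (n_{1+k}, n_{+1k}, n_{+2k})
T_OBS = 14

def coefficients(strata):
    """Coefficients c_t of (2.10), keyed by t, by convolving the strata."""
    coef = {0: 1}
    for n1p, np1, np2 in strata:
        step = {}
        for t, c in coef.items():
            for x in range(max(0, n1p - np2), min(n1p, np1) + 1):
                step[t + x] = step.get(t + x, 0) + c * comb(np1, x) * comb(np2, n1p - x)
        coef = step
    return coef

COEF = coefficients(STRATA)
T_MIN = min(COEF)

def probabilities(theta):
    """Conditional distribution of T at odds ratio theta, as {t: P(t; theta)}."""
    w = {t: c * theta ** (t - T_MIN) for t, c in COEF.items()}
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}

def two_sided_p(theta):
    """Two-sided P-value (2.13): all t no more likely than the observed t."""
    prob = probabilities(theta)
    return sum(q for q in prob.values() if q <= prob[T_OBS])

def one_sided_p2(theta):
    """One-sided P_2(theta) = P(T <= t_obs; theta)."""
    prob = probabilities(theta)
    return sum(q for t, q in prob.items() if t <= T_OBS)

# Probabilities are monotone increasing in t once theta exceeds this ratio.
threshold = max(COEF[t - 1] / COEF[t] for t in COEF if t - 1 in COEF)
theta = 261.49  # upper two-sided limit reported in the source for Table 2.1
print(threshold, two_sided_p(theta), one_sided_p2(theta))
```

Because the set of values of t entering (2.13) changes with $\theta$, the two-sided P-value can jump as $\theta$ varies, so a full interval search would scan a grid rather than rely on bisection alone.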



2.4.2 The Modified Two-Sided "Exact" Confidence Interval




Following the modified approach of the previous section, one can construct a modification of this confidence interval based on two-sided tests by using a modified P-value. We define a modified two-sided P-value for testing a particular value of $\theta$ as

$$P^*(\theta) = P(\theta) - P(\{Z : Z \in \Gamma,\ P(t; \theta) = P(t_o; \theta),\ T'(\theta) < t'(\theta)\}). \qquad (2.14)$$

Again, if we use the table probability for the secondary partitioning, we define a modified two-sided P-value for testing a particular value of $\theta$ as

$$P^p(\theta) = P(\theta) - P(\{Z : Z \in \Gamma,\ P(t; \theta) = P(t_o; \theta),\ P(Z; \theta) > P(N; \theta)\}). \qquad (2.15)$$

For the modified two-sided confidence interval, we consider the shortest interval that contains all of the values of $\theta$ for which

$$P^*(\theta) > \alpha. \qquad (2.16)$$

The lower limit, $\theta^*_-$, is the smallest $\theta$ satisfying (2.16), and the upper limit, $\theta^*_+$, is the largest $\theta$ satisfying (2.16). We show that this confidence interval is "exact." For all values of $\theta$ lying outside the closed interval $\theta^*_- \le \theta \le \theta^*_+$, it follows that $P^*(\theta) \le \alpha$. Then

Pr(O < 0*_,0 > 0+) _< Pr(P*(O) < a)

< Pr(P*() < a)

= E Pr(P*(O) < lrrm)

< a.

Hence, $\Pr(\theta^*_- \le \theta \le \theta^*_+) \ge 1 - \alpha$.

This approach gives even narrower intervals than those obtained by inverting the two-sided test with the ordinary P-value. Note that $\theta_-$ is the smallest $\theta$ satisfying $P(\theta) > \alpha$. Thus, below $\theta_-$, there is no point having $P(\theta) > \alpha$. Also note that $P^*(\theta)$ is bounded by $P(\theta)$, that is, $P^*(\theta) \le P(\theta)$. For instance, at the ordinary lower limit, if $P^*(\theta_-) = P(\theta_-)$, then $\theta^*_- = \theta_-$. Otherwise, $\theta^*_- > \theta_-$. By a symmetric argument, $\theta^*_+ \le \theta_+$. Hence, the two-sided modified confidence interval is contained within the two-sided ordinary confidence interval.

We illustrate these alternative "exact" confidence intervals for the common odds ratio using Tables 2.1 and 2.2. For Table 2.1, the 95% confidence interval obtained by inverting a two-sided test is (1.29, 261.49) based on the ordinary exact P-values and (1.38, 40.45) based on the modified exact P-values, $P^*(\theta)$ and $P^p(\theta)$. Using Table 2.2, the confidence intervals are (0.88, 15.92) using the ordinary exact P-values, (1.01, 10.30) using $P^*(\theta)$, and (1.01, 11.14) using $P^p(\theta)$.

Table 2.3 contains 95% confidence intervals obtained using the two separate one-sided ordinary and modified exact P-values, and using the ordinary and modified two-sided exact P-values. For these tables, the confidence interval constructed using the ordinary two-sided P-value is shorter than the ordinary one based on two one-sided P-values. In fact, for each data set, the upper endpoint for the two-sided based interval equals the endpoint that would be obtained with the one-sided method for a 90% confidence interval. For each type of interval, the ones based on the modified P-value are narrower yet. For Table 2.2, the modified confidence interval based on $T' = \sum_k X_k^2$ is shorter than the corresponding confidence interval based on the table probability in both the one-sided and two-sided cases.

One way to compare methods of constructing confidence intervals, and to gauge their degree of conservativeness, is to use the coverage function (Vollset and Hirji 1991). For a given value of $\theta$, the coverage function is computed by summing $P(t; \theta)$ over the values of t for which the confidence interval contains that value of $\theta$. The function is then plotted against $\theta$. Hence, it displays how close the actual coverage probability is to the nominal coverage probability.
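The coverage function can be sketched in a few lines. The code below is a minimal illustration for the ordinary interval based on two separate one-sided tests, assuming a single 2 x 2 table with hypothetical margins (4, 4, 4). It uses the fact that $\theta$ lies in the interval computed from T = t exactly when both one-sided P-values at $\theta$ exceed $\alpha/2$. The function names are illustrative and are not those of the FORTRAN program.

```python
import math

def cond_probs(coef, theta):
    """P(t; theta) proportional to c_t * theta**t over the support of T."""
    w = [c * theta ** i for i, c in enumerate(coef)]
    total = sum(w)
    return [x / total for x in w]

def covered(p, i, alpha):
    """True if theta is inside the interval obtained when T takes its i-th value.

    p is the vector of conditional probabilities evaluated at theta.
    """
    upper_tail = sum(p[i:])        # P(T >= t_i; theta)
    lower_tail = sum(p[: i + 1])   # P(T <= t_i; theta)
    return upper_tail > alpha / 2 and lower_tail > alpha / 2

def coverage(coef, theta, alpha=0.05):
    """Sum of P(t; theta) over the t whose interval contains theta."""
    p = cond_probs(coef, theta)
    return sum(p[i] for i in range(len(p)) if covered(p, i, alpha))

# Hypothetical margins: row totals 4 and 4, first-column total 4.
coef = [math.comb(4, t) * math.comb(4, 4 - t) for t in range(5)]
grid = [math.exp(-4 + 0.1 * k) for k in range(81)]   # log theta from -4 to 4
cov = [coverage(coef, th) for th in grid]
print(min(cov), coverage(coef, math.exp(8)))
```

Plotting `cov` against the log of `grid` gives a display of the kind described above. With such small margins the curve stays well above the nominal 0.95, which illustrates the conservativeness that the modified P-values are meant to reduce.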

For the conditional distribution having the fixed marginal counts of Table 2.1, Figures 2.9 and 2.10 show the actual coverage probability as a function of the true log odds ratio, for 95% confidence intervals based on inverting separate one-sided tests using the ordinary or modified P-value. For the secondary partitioning in the modified P-value, we use $\sum_k X_k^2(\theta)$ for Figure 2.9 and the table probability for Figure 2.10. There is a clear advantage to using the interval based on the modified P-value. For Table 2.2, this calculation requires a great deal of computing time, and we have not been able to get results using the conditional distribution based on the margins of all 18 partial tables. Thus, we display results using various subsets of the partial tables of Table 2.2. Figure 2.11 gives an analogous display using various numbers of partial tables from Table 2.2. It shows how the conservativeness is reduced by using confidence intervals based on inverting tests with modified P-values. As the number of strata increases, the modified approach yields an actual level closer to the nominal level, and this holds over a broader range of odds ratio values.

For either approach, for sufficiently large $\theta$, all tables with those margins would have the lower bound of the interval below $\theta$; for sufficiently small $\theta$, all tables would have the upper bound above $\theta$. In such cases, the actual probability of coverage of a 100(1 - $\alpha$)% confidence interval has lower bound $1 - \alpha/2$. That bound is achieved at values of $\theta$ that are potential endpoints of the intervals (Neyman 1935). To show this, let $(\theta_-, \theta_+)$ denote the ordinary interval based on a one-sided test. Suppose that the value of the upper limit, $\theta_+$, is large enough so that all the lower limits from other possible tables are less than $\theta_+$. Since $\theta_+$ is constructed by inverting the one-sided $\alpha/2$ test, we have $P(T \le t_o; \theta_+) = \alpha/2$ and, accordingly, $P(T \ge t_o + 1; \theta_+) = 1 - \alpha/2$. The coverage function at $\theta = \theta_+$ is

$$C(\theta_+) = \sum_t I(t, \theta_+) P(t; \theta_+) = \sum_{t \ge t_o + 1} P(t; \theta_+) = 1 - \alpha/2,$$

where $I(t, \theta_+)$ is an indicator function of whether or not $\theta_+$ is within the confidence interval at T = t. Note that at $\theta = \theta_+$, we have $P(T \le t_o; \theta_+) = \alpha/2$, and $\theta_+$ is the upper limit. At some value T = t', the fact that $\theta_+$ is within this interval corresponds to $P(T \le t'; \theta_+) > \alpha/2$. In order to satisfy this, we need $t' \ge t_o + 1$, since $P(T \le t_o; \theta_+) = \alpha/2$. Hence, the coverage probability, which is the summation of $P(t; \theta_+)$ over t such that $t \ge t_o + 1$, is $1 - \alpha/2$. For $\theta > \theta_+$ the coverage probability is at least $1 - \alpha/2$.

Figures 2.12 and 2.13 give an analogous display for the confidence intervals based on inverting two-sided tests using the ordinary or modified P-value, using Table 2.1. For the secondary statistic T', Figure 2.12 uses $\sum_k X_k^2(\theta)$ and Figure 2.13 uses the table probability. Again, there is an advantage to the interval based on the modified P-value. Comparing the figures of coverage probability for confidence intervals, we see there is almost always an advantage to using the confidence interval based on inverting two-sided tests. Figure 2.14 gives an analogous display using some fixed sets of margins of Table 2.2. There is a dramatic improvement in the two-sided modified confidence intervals when the number of strata is large. As the number of strata increases, we can expect the actual coverage probability to be very close to the nominal coverage probability. When log $\theta$ is between -2 and 2, we see a large increase in the coverage probability for both the ordinary two-sided and modified two-sided confidence intervals. At that point, many new tables for which the confidence intervals contain the given value of $\theta$ are added to the calculation of the coverage probability, and the jump comes from the newly included non-null table probabilities. For the ordinary two-sided tests, the big jump occurs before the one for the modified two-sided tests, and the increase is greater. Also, at that jump point, more new tables are included for the coverage probability based on ordinary two-sided tests than for the one based on modified two-sided tests.

We have observed similar results using other sets of fixed margins. In particular, for the two-sided approach, for large $|\log \theta|$, the true coverage probability has 0.95 as a lower bound rather than 0.975. For the proof, let $(\theta_-, \theta_+)$ be the ordinary confidence interval based on the two-sided test. Suppose that the value of the upper limit, $\theta_+$, is large enough so that all of the lower limits from other possible tables are less than $\theta_+$. Then at $\theta = \theta_+$ we have $\sum_{\{t:\, P(t;\theta_+) \le P(t_o;\theta_+)\}} P(t; \theta_+) \le \alpha$ and, accordingly, $\sum_{\{t:\, P(t;\theta_+) > P(t_o;\theta_+)\}} P(t; \theta_+) \ge 1 - \alpha$. At $\theta = \theta_+$, the coverage function is

$$C(\theta_+) = \sum_t I(t, \theta_+) P(t; \theta_+) = \sum_{\{t:\, P(t;\theta_+) > P(t_o;\theta_+)\}} P(t; \theta_+) \ge 1 - \alpha,$$

since at $\theta = \theta_+$, we have $\sum_{\{t:\, P(t;\theta_+) \le P(t_o;\theta_+)\}} P(t; \theta_+) \le \alpha$. At some value T = t', the fact that $\theta_+$ is within this interval corresponds to $\sum_{\{t:\, P(t;\theta_+) \le P(t';\theta_+)\}} P(t; \theta_+) > \alpha$. In order to satisfy this, we need $P(t'; \theta_+) > P(t_o; \theta_+)$. Then the two-sided ordinary P-value is larger than $\alpha$ at T = t'. Hence, the coverage probability, which is the summation over t such that $P(t; \theta_+) > P(t_o; \theta_+)$, is at least $1 - \alpha$. Also, for $\theta > \theta_+$ the coverage probability is at least $1 - \alpha$.

For a special case, suppose that $P(t; \theta_+) > P(t_o; \theta_+)$ for all $t > t_o$. Then at $\theta = \theta_+$,

$$P(\theta_+) = \sum_{\{t:\, P(t;\theta_+) \le P(t_o;\theta_+)\}} P(t; \theta_+) = \sum_{t \le t_o} P(t; \theta_+) = \alpha.$$

Accordingly, we have $\sum_{t \ge t_o + 1} P(t; \theta_+) = 1 - \alpha$. Then the coverage function at $\theta = \theta_+$ is

$$C(\theta_+) = \sum_t I(t, \theta_+) P(t; \theta_+) = \sum_{t \ge t_o + 1} P(t; \theta_+) = 1 - \alpha,$$

since at some value T = t', the fact that $\theta_+$ is within this interval corresponds to $P(T \le t'; \theta_+) > \alpha$. This requires $t' \ge t_o + 1$, since $P(T \le t_o; \theta_+) = \alpha$. Hence the coverage function has $C(\theta_+) \ge 1 - \alpha$. This relates to the property mentioned previously, by which an interval endpoint for the two-sided approach with error probability $\alpha$ can equal one for the one-sided approach with error probability $2\alpha$.

So far, we have used the coverage probability to compare the methods of constructing the confidence interval. An alternative way to compare them is to compute the expected length of confidence intervals for $\theta$ or for log $\theta$. A complication results from infinite endpoints that occur at $T = t_{\max}$ or $T = t_{\min}$. Figure 2.15 displays the expected length of confidence intervals for $\theta$, for four methods, using the margins of Table 2.1. The two-sided modified confidence interval has the smallest expected length, uniformly for all $\theta$. For instance, the expected lengths at $\theta = 1$ are 21.84, 17.22, 13.78, and 11.21 for one-sided ordinary, one-sided modified, two-sided ordinary, and two-sided modified intervals, respectively. For this figure, we arbitrarily set the upper limit equal to 1000 whenever $T = t_{\max}$. Since the expected length depends on the upper limit at $T = t_{\max}$, that value was chosen to be almost two times the maximum finite upper limit among the four methods. Figure 2.16 presents the analogous expected length of confidence intervals for log $\theta$, using the margins of Table 2.1. Again, the two-sided modified confidence interval has uniformly the smallest expected length. We use 1.0 x 10- for the lower limit of $\theta$ at $T = t_{\min}$ and 1000 for the upper limit of $\theta$ at $T = t_{\max}$. Figures 2.17 and 2.18 give analogous displays using the margins of Table 2.1, comparing the lengths conditional on $T \ne t_{\min}$ or $t_{\max}$. Then, the expected length does not depend on the values of the lower limit at $T = t_{\min}$ and the upper limit at $T = t_{\max}$. Again, the two-sided modified confidence interval has uniformly the smallest expected length.



2.4.3 The One-Sided Mid P Confidence Interval




For confidence intervals for a common odds ratio based either on inverting two separate one-sided tests or inverting a two-sided test, one can construct even narrower intervals, albeit not "exact" ones, by inverting the tests based on the modified mid P-value. The ordinary mid P confidence limits based on inverting two separate one-sided tests are found using the functions

$$P_{mid(1)}(\theta) = P_1(\theta) - \tfrac{1}{2} P(t_o; \theta),$$

$$P_{mid(2)}(\theta) = P_2(\theta) - \tfrac{1}{2} P(t_o; \theta). \qquad (2.17)$$

The limits are determined by the same method used for the modified exact confidence interval, using $P_{mid(1)}(\theta)$ for the lower limit and $P_{mid(2)}(\theta)$ for the upper limit. Though approximate, this type of confidence interval based on the ordinary mid P-value has been observed empirically to behave well (Mehta and Walsh 1992).
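A minimal sketch of the ordinary one-sided mid P limits follows, again for a single 2 x 2 table with hypothetical margins (4, 4, 4) and hypothetical observed value $t_o = 3$. It assumes that $P_1(\theta)$ is the upper-tail probability $P(T \ge t_o; \theta)$ and $P_2(\theta)$ the lower-tail probability $P(T \le t_o; \theta)$, which is consistent with using $P_{mid(1)}$ for the lower limit and $P_{mid(2)}$ for the upper limit. All names are illustrative.

```python
import math

def cond_probs(coef, theta):
    """P(t; theta) proportional to c_t * theta**t over the support of T."""
    w = [c * theta ** i for i, c in enumerate(coef)]
    total = sum(w)
    return [x / total for x in w]

def upper_tail(coef, i, theta, mid=False):
    """P(T >= t_i; theta), minus half of P(t_i; theta) for the mid P version."""
    p = cond_probs(coef, theta)
    return sum(p[i:]) - (0.5 * p[i] if mid else 0.0)

def lower_tail(coef, i, theta, mid=False):
    """P(T <= t_i; theta), minus half of P(t_i; theta) for the mid P version."""
    p = cond_probs(coef, theta)
    return sum(p[: i + 1]) - (0.5 * p[i] if mid else 0.0)

def bisect_log(f, target, increasing):
    """Solve f(theta) = target for a monotone f by bisection on log(theta)."""
    lo, hi = math.log(1e-8), math.log(1e8)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        below = f(math.exp(mid)) < target
        if below == increasing:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def interval(coef, i, alpha=0.05, mid=False):
    """Limits from two one-sided tests; valid for t_min < t_o < t_max."""
    low = bisect_log(lambda th: upper_tail(coef, i, th, mid), alpha / 2, True)
    upp = bisect_log(lambda th: lower_tail(coef, i, th, mid), alpha / 2, False)
    return low, upp

coef = [math.comb(4, t) * math.comb(4, 4 - t) for t in range(5)]
exact = interval(coef, 3)             # ordinary exact one-sided limits
mid_p = interval(coef, 3, mid=True)   # ordinary mid P limits
print(exact, mid_p)
```

Because each mid P function is smaller than the corresponding exact tail probability and is still monotone in $\theta$, the mid P interval in this sketch lies inside the exact one.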

Following the modified approach based on using a one-sided modified mid P-value, let $B_1(\theta) = \{Z : Z \in \Gamma,\ T = t_o,\ T'(\theta) = t'(\theta)\}$. The modified mid P confidence interval based on inverting two separate one-sided tests uses

$$P^*_{mid(1)}(\theta) = P^*_1(\theta) - \tfrac{1}{2} P(B_1(\theta); \theta),$$

$$P^*_{mid(2)}(\theta) = P^*_2(\theta) - \tfrac{1}{2} P(B_1(\theta); \theta). \qquad (2.18)$$

The limits are chosen by the same method used for the modified exact confidence interval, using $P^*_{mid(1)}(\theta)$ for the lower limit and $P^*_{mid(2)}(\theta)$ for the upper limit. This approach tends to give narrower intervals than those obtained by inverting the one-sided test with the ordinary mid P-value. We illustrate these confidence intervals for the common odds ratio using Tables 2.1 and 2.2. For Table 2.1, the 95% confidence interval obtained by inverting a one-sided test is (1.34, 266.54) based on the ordinary mid P-values and (2.22, 56.00) based on the modified mid P-values using $\sum_k X_k^2(\theta)$ or the table probability for T'. Using Table 2.2, the confidence intervals are (0.98, 16.89) using the ordinary mid P-values, (1.01, 13.61) using the modified mid P-values with $\sum_k X_k^2(\theta)$, and (1.04, 14.85) using the modified mid P-values with the table probability for T'.



2.4.4 The Two-Sided Mid P Confidence Interval




As the two-sided approach tends to give an interval that is usually narrower than the one based on inverting two separate one-sided tests, we can construct a shorter interval using two-sided mid P-values. Though these cannot guarantee achieving at least the nominal confidence level, one could define mid P versions of the ordinary two-sided and modified two-sided intervals. For testing a particular value of $\theta$, a two-sided mid P-value can be defined as

$$P_{mid}(\theta) = P(\theta) - \tfrac{1}{2} P(\{Z : Z \in \Gamma,\ P(t; \theta) = P(t_o; \theta)\}). \qquad (2.19)$$

The limits are determined by the same method used for the two-sided exact confidence interval.

Following the modified approach, one can construct a modified confidence interval based on two-sided tests by using a modified mid P-value. We define a modified two-sided mid P-value for testing a particular value of $\theta$ as

$$P^*_{mid}(\theta) = P^*(\theta) - \tfrac{1}{2} P(\{Z : Z \in \Gamma,\ P(t; \theta) = P(t_o; \theta),\ T'(\theta) = t'(\theta)\}). \qquad (2.20)$$

Also, the limits are determined by the same method used for the two-sided exact confidence interval. We illustrate these confidence intervals for the common odds ratio using Tables 2.1 and 2.2. For Table 2.1, the 95% confidence interval obtained by inverting a two-sided test is (1.38, 131.51) based on the ordinary mid P-values and (1.38, 35.51) based on modified mid P-values using $T' = \sum_k X_k^2(\theta)$. Using Table 2.2, the confidence intervals are (1.01, 12.58) and (1.01, 10.29) using the ordinary and modified mid P-values with $T' = \sum_k X_k^2(\theta)$, respectively. For these data sets, the confidence interval constructed by using the ordinary two-sided mid P-values is shorter than the ordinary one based on two one-sided mid P-values. For each type of interval, the modified interval is narrower than the ordinary one. Table 2.4 summarizes these 95% confidence intervals using Table 2.1 and Table 2.2.

For the conditional distribution having the fixed marginal counts of Table 2.1, Figure 2.19 shows the actual coverage probability as a function of the true log odds ratio, for the 95% confidence intervals based on inverting separate one-sided tests using the ordinary mid P-value or the modified mid P-value with $T' = \sum_k X_k^2(\theta)$. The exact method yields a coverage exceeding the nominal level, whereas the coverage of the mid P-value fluctuates about the nominal level. For either approach, for sufficiently large $|\log \theta|$, the actual probability of coverage of a 100(1 - $\alpha$)% confidence interval is centered about $1 - \alpha/2$, and that of the modified mid P-value deviates less from $1 - \alpha/2$.

Figure 2.20 gives an analogous display for the confidence intervals based on inverting two-sided tests using the ordinary mid P-value or the modified mid P-value with $T' = \sum_k X_k^2(\theta)$. There is an advantage to the interval based on the modified P-value. For either approach, the actual probability of coverage of a 100(1 - $\alpha$)% confidence interval is centered about the nominal level, and that of the modified mid P-value is even closer to the nominal level. For intervals using mid P-values, we suggest the use of the confidence interval based on inverting two-sided tests using the modified mid P-value.



Table 2.3. Various 95% confidence intervals for the common odds ratio.

Method                        Data set 1       Data set 2
Exact CI
  Ordinary 1-sided P          1.08, 531.51     0.86, 21.37
  Modified 1-sided P (P*)     2.08, 67.35      1.01, 13.63
  Modified 1-sided P (P^p)    2.08, 67.35      1.04, 14.87
  Ordinary 2-sided P          1.29, 261.49     0.88, 15.92
  Modified 2-sided P (P*)     1.38, 40.45      1.01, 10.30
  Modified 2-sided P (P^p)    1.38, 40.45      1.01, 11.14
Approximate CI
  Mantel-Haenszel             1.03, 47.73      0.86, 12.93
  ML                          1.28, 128.12     0.99, 17.64











Table 2.4. Various 95% confidence intervals for the common odds ratio using mid P-value.

Method                            Data set 1       Data set 2
Approximate CI
  Ordinary 1-sided mid P          1.34, 266.54     0.98, 16.89
  Modified 1-sided mid P (P*)     2.22, 56.00      1.01, 13.61
  Modified 1-sided mid P (P^p)    2.22, 56.00      1.04, 14.85
  Ordinary 2-sided mid P          1.38, 131.51     1.01, 12.58
  Modified 2-sided mid P (P*)     1.38, 35.51      1.01, 10.29


[Plot: coverage probability versus log $\theta$, from -4 to 4; curves for one-sided modified P and one-sided ordinary P.]

Figure 2.9. Coverage probability for confidence intervals based on inverting one-sided tests with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.





























[Plot: coverage probability versus log $\theta$, from -4 to 4; curves for one-sided modified P and one-sided ordinary P.]

Figure 2.10. Coverage probability for confidence intervals based on inverting one-sided tests with T' = P(Z), for conditional distribution based on margins of Table 2.1.


[Plot: four panels, K = 3, K = 6, K = 9 and K = 12, each showing coverage probability versus log $\theta$, from -4 to 4; curves for one-sided modified P and one-sided ordinary P.]

Figure 2.11. Coverage probability for confidence intervals based on inverting one-sided tests with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on first K partial tables of Table 2.2.











[Plot: coverage probability versus log $\theta$, from -4 to 4; curves for two-sided modified P and two-sided ordinary P.]

Figure 2.12. Coverage probability for confidence intervals based on inverting two-sided tests with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.










[Plot: coverage probability versus log $\theta$; curves for two-sided modified P and two-sided ordinary P.]

Figure 2.13. Coverage probability for confidence intervals based on inverting two-sided tests with T' = P(Z), for conditional distribution based on margins of Table 2.1.











[Plot: four panels, K = 3, K = 6, K = 9 and K = 12, each showing coverage probability versus log $\theta$, from -4 to 4; curves for two-sided modified P and two-sided ordinary P.]

Figure 2.14. Coverage probability for confidence intervals based on inverting two-sided tests with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on first K partial tables of Table 2.2.











[Plot: expected length of the interval for $\theta$ versus $\theta$; curves for one-sided ordinary P, one-sided modified P, two-sided ordinary P and two-sided modified P.]

Figure 2.15. Expected length of confidence intervals for $\theta$, with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.










[Plot: expected length of the interval for log $\theta$ versus log $\theta$; curves for one-sided ordinary P, one-sided modified P, two-sided ordinary P and two-sided modified P.]

Figure 2.16. Expected length of confidence intervals for log $\theta$, with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.










[Plot: expected length of the interval for $\theta$ versus $\theta$; curves for one-sided ordinary P, one-sided modified P, two-sided ordinary P and two-sided modified P.]

Figure 2.17. Expected length of confidence intervals for $\theta$, conditional on $T \ne t_{\min}$ or $t_{\max}$, with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.










[Plot: expected length of the interval for log $\theta$ versus log $\theta$, from -4 to 4; curves for one-sided ordinary P, one-sided modified P, two-sided ordinary P and two-sided modified P.]

Figure 2.18. Expected length of confidence intervals for log $\theta$, conditional on $T \ne t_{\min}$ or $t_{\max}$, with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.










[Plot: coverage probability versus log $\theta$, from -4 to 4; curves for one-sided modified mid P and one-sided ordinary mid P.]

Figure 2.19. Coverage probability for confidence intervals based on inverting one-sided tests using mid P-values with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.










[Plot: coverage probability versus log $\theta$, from -4 to 4, for intervals based on two-sided mid P-values.]

Figure 2.20. Coverage probability for confidence intervals based on inverting two-sided tests using mid P-values with $T' = \sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.









2.5 Connections with Logistic Regression




Consider a set of independent binary variables, $Y_1, \ldots, Y_n$. Corresponding to each variable, $Y_j$, there is a (p x 1) vector $x_j = (x_{1j}, \ldots, x_{pj})'$ of explanatory variables. Let $\pi_j$ be the probability that $Y_j = 1$. Suppose that the response is related to the explanatory variables by the logistic regression model,

$$\log \frac{\pi_j}{1 - \pi_j} = x_j' \beta. \qquad (2.21)$$

The likelihood function is

$$\frac{\exp\left[\sum_j y_j (x_j' \beta)\right]}{\prod_j \left[1 + \exp(x_j' \beta)\right]}.$$

The p x 1 vector of sufficient statistics for $\beta$ is $t = \sum_{j=1}^{n} y_j x_j$.

Suppose p = 2, and we want to conduct inferences about $\beta_1$. Again, one can eliminate $\beta_2$ by conditioning on its sufficient statistic, $t_2 = \sum_j y_j x_{2j}$. One can treat the data for the logistic regression model as a three-way 2 x I x K table, where I and K are the numbers of distinct values of the explanatory variables $X_1$ and $X_2$, respectively.

Exact inference in logistic regression often is highly discrete, even degenerate. One can often alleviate this problem somewhat by treating the data as a contingency table and using the alternative way of constructing P-values discussed in Section 2. To illustrate, for Table 2.1 we let $\pi_{ij}$ denote the probability of cure for the jth individual at the ith penicillin level. The logistic model has the form

$$\log \frac{\pi_{ij}}{1 - \pi_{ij}} = \alpha_i + \beta x_{ij}, \qquad i = 1, \ldots, 3,$$

where $x_{ij}$ is a dummy variable for delay. The observed value of the sufficient statistic T is 14. For testing $H_0: \beta = 0$, the exact one-sided P-value is $P = P(T \ge 14) = 0.0200$. The modified exact P-value, using $T' = \sum_k X_k^2(\theta)$ or the table probability, is 0.0028.









2.6 Discussion




We have shown that use of a modified P-value leads to exact tests and confidence intervals that are less conservative than the usual ones. The improvement can be considerable when K is large but n is not, in which case there may be a large number of tables that have the same primary test statistic value but different secondary statistic values.

We prefer modified exact tests and confidence intervals over the ordinary exact ones, because they are less conservative than the ordinary ones but still guarantee at least the nominal level. We prefer confidence intervals based on inverting two-sided tests over those based on inverting two separate one-sided tests, because they tend to be less conservative. Likewise, for confidence intervals using mid P-values, we prefer intervals based on inverting two-sided tests using modified mid P-values.

For the secondary statistic, we have used $\sum_k X_k^2$ and the table probability in our examples, and clearly the reduction in conservativeness occurs with test statistics for more general alternatives. A FORTRAN program has been prepared, designed for IBM-compatible PCs or UNIX workstations, for computing modified P-values for tests of conditional independence and modified confidence intervals for an assumed common odds ratio. This program also computes the actual coverage probability and the expected length of confidence intervals using four methods. This program, for 2 x 2 x K tables, is an adaptation of one written by Vollset and Hirji (1991) for ordinary exact inference for such tables. Appendix A contains the FORTRAN source code.














CHAPTER 3
APPROXIMATING EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION


3.1 Introduction




For three-way tables, consider the hypothesis of conditional independence of X and Y, given Z. This hypothesis is usually tested against the alternative of no three-factor interaction. The general alternative that permits three-factor interaction is the general loglinear model for a three-way table and has the form

$$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}. \qquad (3.1)$$


When X or Y are ordinal, narrower alternatives can be constructed for the exact tests.

We suggest exact inference regarding conditional associations in three-way contingency tables. For I x J x K tables, we discuss six test statistics for conditional independence that have natural connections with loglinear models for various alternatives. We use a simulation algorithm to obtain precise estimates of exact P-values for cases that are currently computationally infeasible.

For three-way contingency tables, current computational algorithms for the exact methods are restricted to certain analyses for 2 x J x K tables. Also, when the sample size is small or when the contingency tables are sparse, large-sample approximations can be questionable. The Monte Carlo method is an alternative to either the exact or asymptotic methods. This method is based on estimating the exact conditional sampling distribution of the statistic by generating random tables having the relevant fixed margins. The advantage of this method is that the number of tables generated is fixed in advance, and the computing time does not depend greatly on the sample size n and the table size, compared to methods for exact analysis. For the random table generation, we use the procedure by Patefield (1981) that simulates hypergeometric distributions.
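The Monte Carlo idea can be sketched as follows. This is a minimal illustration and not Patefield's (1981) algorithm: it draws each stratum's table by randomly permuting the column labels of the observations, a simpler technique that also yields the hypergeometric distribution given the margins. The test statistic used here, $\sum_k \sum_i \sum_j u_i v_j n_{ijk}$, the example data, and all function names are assumptions made for illustration.

```python
import random

def random_table(row_totals, col_totals, rng):
    """One I x J table with the given margins, drawn under independence."""
    cols = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(cols)  # random assignment of column labels to observations
    table = [[0] * len(col_totals) for _ in row_totals]
    pos = 0
    for i, r in enumerate(row_totals):
        for j in cols[pos:pos + r]:
            table[i][j] += 1
        pos += r
    return table

def margins(table):
    """Row totals and column totals of a two-way table."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]

def statistic(strata, u, v):
    """Sum over strata and cells of u_i * v_j * n_ijk."""
    return sum(u[i] * v[j] * n
               for tab in strata
               for i, row in enumerate(tab)
               for j, n in enumerate(row))

def monte_carlo_p(strata, u, v, n_sim=2000, seed=1):
    """Proportion of simulated tables with a statistic at least the observed one."""
    rng = random.Random(seed)
    obs = statistic(strata, u, v)
    marg = [margins(t) for t in strata]
    hits = 0
    for _ in range(n_sim):
        sim = [random_table(r, c, rng) for r, c in marg]
        if statistic(sim, u, v) >= obs:
            hits += 1
    return hits / n_sim

# Hypothetical 2 x 3 x 2 data with equally spaced scores.
strata = [[[4, 2, 1], [1, 3, 5]], [[3, 3, 0], [2, 2, 4]]]
print(monte_carlo_p(strata, u=[1, 0], v=[1, 2, 3]))
```

The number of simulated tables, `n_sim`, is fixed in advance, so the cost grows with `n_sim` and only mildly with the sample size, in line with the advantage described above.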

Section 2 discusses exact tests of conditional independence in I x J x K tables using three statistics that are popular for asymptotic tests. These are naturally linked to alternatives corresponding to loglinear models that assume a lack of three-factor interaction. Section 3 presents three other statistics that do not require this assumption. All six test statistics are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Section 4 discusses possible alternative ways of forming modified exact P-values in I x J x K contingency tables, generalizing the modified P-value discussed in Chapter 2. We propose modified exact P-values for six tests for testing conditional independence with I x J x K tables.

Computational algorithms have limited availability for tests of conditional independence when I and J exceed two. Section 5 describes a Monte Carlo sampling routine that approximates the ordinary and modified exact P-values. We utilize six test statistics for exact tests of conditional independence. Section 6 illustrates approximate exact tests of conditional independence with examples, and Section 7 explains a FORTRAN program utilizing the simulation algorithm.



3.2 Tests of Conditional Independence Assuming No Three-factor Interaction




This section presents three test statistics for testing conditional independence of X and Y, given Z, in I x J x K contingency tables, proposed by Birch (1965). We present loglinear models for which these are score statistics. These models assume a lack of three-factor interaction. In the next section, we then present three adaptations of these statistics that do not require that assumption. In each case, one test treats both X and Y as nominal, one test treats X as nominal and Y as ordinal, and one test treats both as ordinal.

The asymptotic chi-squared theory is well developed for the statistics we present. Our focus will be to construct exact tests of conditional independence, using these statistics with the reference set $\Gamma$ of tables with the same margins. We use score statistics for loglinear models rather than likelihood-ratio or Wald statistics. This makes the computations for exact analyses simpler, since one does not need to fit the model for each table in $\Gamma$.



3.2.1 Nominal-by-Nominal Test




Birch (1965), Landis et al. (1978), and Mantel and Byar (1978) generalized the Cochran-Mantel-Haenszel statistic to handle more than two groups or more than two responses. Suppose X and Y are nominal. Let $n_k$ denote the counts for cells in the first I - 1 rows and J - 1 columns for stratum k of Z. Conditional on the row and column totals in that stratum, let $m_k$ denote the null expected value of $n_k$. Then $d = \sum_k (n_k - m_k)$ represents the (I - 1)(J - 1) x 1 vector having elements

$$d_{ij} = \sum_k \left( n_{ijk} - \frac{n_{i+k}\, n_{+jk}}{n_{++k}} \right), \qquad i = 1, \ldots, I-1, \quad j = 1, \ldots, J-1. \qquad (3.2)$$

Let $V_k$ denote the null covariance matrix of $n_k$, where

$$\mathrm{Cov}(n_{ijk}, n_{i'j'k}) = \frac{n_{i+k}(\delta_{ii'} n_{++k} - n_{i'+k})\, n_{+jk}(\delta_{jj'} n_{++k} - n_{+j'k})}{n_{++k}^2 (n_{++k} - 1)}, \qquad (3.3)$$

with $\delta_{ab} = 1$ if a = b and $\delta_{ab} = 0$ otherwise.









Then $V = \sum_k V_k$ is the null covariance matrix of d. The efficient score statistic for testing conditional independence against the alternative of no three-factor interaction is

$$C^2 = d' V^{-1} d. \qquad (3.4)$$

This is also called the generalized Cochran-Mantel-Haenszel statistic. Under conditional independence, this statistic has a large-sample chi-squared distribution with df = (I - 1)(J - 1). For K = 1 stratum with n observations, the statistic reduces to the multiple (n - 1)/n of the Pearson chi-squared statistic for testing independence.
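As a minimal sketch, the statistic in (3.4) can be computed directly from (3.2) and (3.3). The code below uses only the Python standard library, with a small Gaussian elimination routine in place of a matrix inverse. The function names and the example table are hypothetical, and the code assumes V is nonsingular.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def generalized_cmh(strata):
    """C^2 = d' V^{-1} d for a list of K tables, each an I x J list of lists."""
    n_rows, n_cols = len(strata[0]), len(strata[0][0])
    cells = [(i, j) for i in range(n_rows - 1) for j in range(n_cols - 1)]
    d = [0.0] * len(cells)
    v = [[0.0] * len(cells) for _ in cells]
    for tab in strata:
        row = [sum(r) for r in tab]
        col = [sum(c) for c in zip(*tab)]
        n = sum(row)
        if n < 2:
            continue  # such a stratum carries no information
        for a, (i, j) in enumerate(cells):
            d[a] += tab[i][j] - row[i] * col[j] / n           # equation (3.2)
            for b, (i2, j2) in enumerate(cells):
                v[a][b] += (row[i] * ((i == i2) * n - row[i2])
                            * col[j] * ((j == j2) * n - col[j2])
                            ) / (n * n * (n - 1))              # equation (3.3)
    x = solve(v, d)
    return sum(di * xi for di, xi in zip(d, x))

# Hypothetical single 2 x 3 table, used to check the K = 1 special case.
tab = [[5, 3, 2], [1, 4, 6]]
print(generalized_cmh([tab]))
```

With a single stratum the value printed should agree with (n - 1)/n times the Pearson chi-squared statistic of the same table, which gives a convenient check on an implementation.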

The statistic C2 is sensitive to detecting conditional associations when the association is similar in each stratum. Hence, the generalized Cochran-Mantel-Haenszel statistic has low power for detecting an association in which the patterns of association for some of the strata are in the opposite direction of the patterns displayed by other strata, relative to the case that the association is similar.



3.2.2 Ordinal-by-Ordinal Test




When X and Y are ordinal, it often makes sense to test against a narrow alternative, corresponding to a monotone trend in the conditional association. It then makes sense to form a test statistic using a model that is a special case of the no three-factor interaction model and reflects the ordinality, such as the model of homogeneous linear-by-linear association,

$$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta u_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \qquad (3.5)$$

It replaces the general association term $\lambda_{ij}^{XY}$ by a linear-by-linear term $\beta u_i v_j$, where $\{u_i\}$ and $\{v_j\}$ are monotone scores for levels of X and Y. The parameter $\beta$ in that model describes X-Y partial association. The model of conditional independence of X and Y is its special case in which $\beta = 0$. For this model, the sufficient statistic for $\beta$ is $\sum_k [\sum_i \sum_j u_i v_j n_{ijk}]$. When I = J = 2, the usual statistic $\sum_k n_{11k}$ results from the scores $u_1 = v_1 = 1$, $u_2 = v_2 = 0$. This is Birch's exact test statistic for testing conditional independence in 2 x 2 x K contingency tables, and we have utilized this statistic in Chapter 2 for the conditional exact test. Also, Mehta, Patel and Gray (1985) and Vollset, Hirji and Elashoff (1991) used this statistic to implement the exact test.

For the asymptotic test of H, /3 = 0, one can use Mantel's (1963) generalized statistic for detecting association between ordinal variables. This ordinal test focuses the departure from independence on a single degree of freedom. Suppose we expect a monotone conditional relationship between X and Y, with the same direction at each level of Z, and suppose that we can assign monotone scores {ui} to levels of X and {jv} to levels of Y. Then there is evidence of positive trend if, within each stratum, the statistic Ei~juivjnijk is greater than its expectation under independence.

For the model (3.5), given the marginal totals in each stratum and under conditional independence of X and Y,


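\[ E\Big(\sum_i \sum_j u_i v_j n_{ijk}\Big) = \frac{(\sum_i u_i n_{i+k})(\sum_j v_j n_{+jk})}{n_{++k}}, \]

\[ Var\Big(\sum_i \sum_j u_i v_j n_{ijk}\Big) = \frac{1}{n_{++k} - 1} \Big[\sum_i u_i^2 n_{i+k} - \frac{(\sum_i u_i n_{i+k})^2}{n_{++k}}\Big] \times \Big[\sum_j v_j^2 n_{+jk} - \frac{(\sum_j v_j n_{+jk})^2}{n_{++k}}\Big]. \]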


To summarize the correlation information from the K strata, Mantel (1963) proposed the statistic


\[ M^2 = \frac{\{\sum_k [\sum_i \sum_j u_i v_j n_{ijk} - E(\sum_i \sum_j u_i v_j n_{ijk})]\}^2}{\sum_k Var(\sum_i \sum_j u_i v_j n_{ijk})}. \tag{3.6} \]

This is the score statistic for testing conditional independence for model (3.5). It has an asymptotic chi-squared distribution with df = 1.
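The computation of (3.6) can be sketched in a few lines. The following is a minimal illustration, not the dissertation's FORTRAN program; the function name `mantel_m2` and the array layout (an I x J x K NumPy array) are assumptions made here for the example.

```python
import numpy as np

def mantel_m2(table, u, v):
    """Mantel's M^2 (3.6) for an I x J x K table with row scores u and column scores v."""
    n = np.asarray(table, dtype=float)          # assumed layout: n[i, j, k]
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    num, den = 0.0, 0.0
    for k in range(n.shape[2]):
        nk = n[:, :, k]
        N = nk.sum()
        if N < 2:                               # a stratum with fewer than 2 cases adds no information
            continue
        row, col = nk.sum(axis=1), nk.sum(axis=0)
        t_k = u @ nk @ v                        # sum_i sum_j u_i v_j n_ijk
        e_k = (u @ row) * (v @ col) / N         # null expectation given the margins
        su = (u ** 2) @ row - (u @ row) ** 2 / N
        sv = (v ** 2) @ col - (v @ col) ** 2 / N
        num += t_k - e_k
        den += su * sv / (N - 1)                # null variance given the margins
    return num ** 2 / den

# Hypothetical 2 x 2 x 1 table with scores (1, 0), for which T reduces to n_11k
example = np.array([[[3], [1]], [[1], [3]]])
print(mantel_m2(example, [1, 0], [1, 0]))
```

With I = J = 2 and scores (1, 0), the variance term reduces to the familiar hypergeometric variance of $n_{11k}$, so the sketch reproduces the Mantel-Haenszel form.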








3.2.3 Nominal-by-Ordinal Test




Suppose the row variable X is nominal and the column variable Y is ordinal. A useful loglinear model replaces the ordered row scores in model (3.5) by unordered parameters $\{\mu_i\}$,

\[ \log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \mu_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{3.7} \]


The sufficient statistics for $\{\mu_i\}$ are $\sum_j v_j n_{ij+}$, $i = 1, \ldots, I$. These can be interpreted as the row sums for a response Y within each level of X, using the scores $\{v_j\}$, summed over the strata. Assuming the model holds, we can test conditional independence by testing $\mu_1 = \mu_2 = \cdots = \mu_I$. Let $Y_1, \ldots, Y_{n_{++k}}$ be a random sample within stratum k, taking the scores $v_1, \ldots, v_J$. Let $l$ denote the $(I - 1) \times 1$ vector having elements

\[ l_i = \sum_k n_{i+k}(W_{ik} - W_k), \tag{3.8} \]

where

\[ W_{ik} = \sum_{h: X_h = i} Y_h / n_{i+k} = \sum_j n_{ijk} v_j / n_{i+k}, \quad h = 1, \ldots, n_{++k}, \]

and

\[ W_k = \sum_i \sum_j n_{ijk} v_j / n_{++k}. \]

Note that $W_{ik}$ is the row mean on Y at level i of X and level k of Z, treating Y as a response with scores $\{v_j\}$. Similarly, $W_k$ is the kth stratum mean for Y. Let $A$ denote the null covariance matrix of $l$, which has elements

\[ Cov(l_i, l_{i'}) = \sum_k \Big[ \frac{n_{i+k}(\delta_{ii'} n_{++k} - n_{i'+k})}{n_{++k}(n_{++k} - 1)} \sum_{h=1}^{n_{++k}} (Y_h - W_k)^2 \Big] = \sum_k \Big[ \frac{n_{i+k}(\delta_{ii'} n_{++k} - n_{i'+k})}{n_{++k}(n_{++k} - 1)} \sum_j n_{+jk}(v_j - W_k)^2 \Big], \tag{3.9} \]

where $\delta_{ii'} = 1$ if $i = i'$ and 0 otherwise.









Then the efficient score statistic for testing conditional independence against the alternative of (3.7) is $l'A^{-1}l$. This statistic is sensitive to location differences among the I conditional distributions of Y that are similar at each level of Z. The asymptotic null distribution is chi-squared with df = I - 1.

The three statistics just discussed were suggested by Birch (1965) for testing conditional independence. The three asymptotic tests are available in SAS (PROC FREQ).



3.2.4 Generalized Tests




The previous three statistics are special cases of a general statistic proposed by Landis et al. (1978). Let nk denote a column vector of the cell counts in stratum k, and let mk denote their expected values. Also let Pi+k denote the marginal proportion of ith row and let P+jk denote the marginal proportion of jth column. We introduce the following notation to define the generalized test statistic.



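\[ n_{ik}' = (n_{i1k}, \ldots, n_{iJk}) \]

\[ n_k' = (n_{1k}', \ldots, n_{Ik}') \]

\[ p_{i+k} = n_{i+k}/n_{++k} \]

\[ p_{+jk} = n_{+jk}/n_{++k} \]

\[ P_{*+k}' = (p_{1+k}, p_{2+k}, \ldots, p_{I+k}) = \Big(\frac{n_{1+k}}{n_{++k}}, \frac{n_{2+k}}{n_{++k}}, \ldots, \frac{n_{I+k}}{n_{++k}}\Big) \]

\[ P_{+*k}' = (p_{+1k}, p_{+2k}, \ldots, p_{+Jk}) = \Big(\frac{n_{+1k}}{n_{++k}}, \frac{n_{+2k}}{n_{++k}}, \ldots, \frac{n_{+Jk}}{n_{++k}}\Big) \]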









Assume that cell counts from different strata are independent. Landis et al. (1978) showed that under the hypothesis of conditional independence, the expected value and covariance matrix of the frequencies are, respectively,

\[ m_k = E[n_k \mid H_0] = n_{++k}(P_{*+k} \otimes P_{+*k}) \tag{3.10} \]

and

\[ Var[n_k \mid H_0] = \frac{n_{++k}^2}{n_{++k} - 1} [(D_{P_{*+k}} - P_{*+k}P_{*+k}') \otimes (D_{P_{+*k}} - P_{+*k}P_{+*k}')], \tag{3.11} \]

where $\otimes$ denotes Kronecker product multiplication and $D_a$ is a diagonal matrix with the elements of $a$ on the main diagonal.

The generalized statistic for testing conditional independence is defined as

\[ Q_M = G'V_G^{-1}G, \tag{3.12} \]

where

\[ G = \sum_k B_k(n_k - m_k), \]

\[ V_G = \sum_k B_k [Var(n_k \mid H_0)] B_k', \]

and where

\[ B_k = R_k \otimes C_k \]

is a matrix of fixed constants based on row scores $R_k$ and column scores $C_k$ for the kth stratum. When the null hypothesis is true, the statistic $Q_M$ is approximately distributed as chi-squared with degrees of freedom equal to the rank of $B_k$.

Suppose the row variable X is nominal and the column variable Y is ordinal. Then the mean score of Y is meaningful. In this case, the mean score is computed for each row of the table, and the alternative hypothesis is that, for at least one stratum, the mean scores of the I rows are unequal. The statistic is then sensitive to location differences among the I distributions of Y.








For this case we can define the matrix $R_k$, which has dimension (I - 1) x I, as

\[ R_k = (I_{I-1}, -J_{I-1}), \tag{3.13} \]

where $I_{I-1}$ is an identity matrix of rank I - 1, and $J_{I-1}$ is an (I - 1) x 1 vector of ones. The matrix has the effect of forming I - 1 independent contrasts of the I mean scores. The matrix $C_k$ has dimension 1 x J, and the scores are specified as one for each column. Then $Q_M$ sums over the K strata information about how the I row means compare to their null expected values, and it has df = I - 1.

When both variables are ordinal, $R_k$ and $C_k$ can be defined as $R_k = (u_1, \ldots, u_I)$ and $C_k = (v_1, \ldots, v_J)$. If the scores $R_k$ and $C_k$ are the same for all strata, $Q_M$ simplifies to $M^2$.

When both variables are nominal, $R_k = (I_{I-1}, -J_{I-1})$ and $C_k = (I_{J-1}, -J_{J-1})$ can be used. Then $Q_M$ simplifies to $d'V^{-1}d$ with df = (I - 1)(J - 1).
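The generalized statistic (3.12) translates almost directly into matrix code. The sketch below is illustrative only: the names `cmh_general` and `contrast_scores` are hypothetical, the same scores are assumed for every stratum, and the cell counts of each stratum are vectorized row by row so that the ordering matches the Kronecker products in (3.10) and (3.11).

```python
import numpy as np

def contrast_scores(levels):
    """The matrix (I_{L-1}, -J_{L-1}) of (3.13) for a nominal variable with `levels` categories."""
    return np.hstack([np.eye(levels - 1), -np.ones((levels - 1, 1))])

def cmh_general(table, row_scores, col_scores):
    """Q_M = G' V_G^{-1} G with B_k = R (x) C, using the same scores in every stratum."""
    n = np.asarray(table, dtype=float)               # assumed layout: n[i, j, k]
    R = np.atleast_2d(np.asarray(row_scores, dtype=float))
    C = np.atleast_2d(np.asarray(col_scores, dtype=float))
    B = np.kron(R, C)
    G = np.zeros(B.shape[0])
    V = np.zeros((B.shape[0], B.shape[0]))
    for k in range(n.shape[2]):
        nk = n[:, :, k]
        N = nk.sum()
        if N < 2:
            continue
        pr = nk.sum(axis=1) / N                      # row proportions P_{*+k}
        pc = nk.sum(axis=0) / N                      # column proportions P_{+*k}
        m = N * np.kron(pr, pc)                      # (3.10)
        var = N ** 2 / (N - 1) * np.kron(            # (3.11)
            np.diag(pr) - np.outer(pr, pr),
            np.diag(pc) - np.outer(pc, pc))
        G += B @ (nk.ravel() - m)                    # ravel() is row-major: index i*J + j
        V += B @ var @ B.T
    return float(G @ np.linalg.solve(V, G))

# Hypothetical 2 x 2 x 1 table: ordinal scores and nominal contrasts give the same value
example = np.array([[[3], [1]], [[1], [3]]])
print(cmh_general(example, [1, 0], [1, 0]))
print(cmh_general(example, contrast_scores(2), contrast_scores(2)))
```

Passing `contrast_scores(I)` for the rows and a single score vector for the columns gives the mean-score (nominal-by-ordinal) version with df = I - 1.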

For exact tests of conditional independence in I x J x K tables, we discussed test statistics assuming a lack of three-factor interaction. These are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Also they have asymptotic chi-squared distributions.



3.3 Tests of Conditional Independence Permitting Three-factor Interaction




The tests discussed so far assume no three-factor interaction. Suppose, instead, we expect the nature of the association between X and Y to vary considerably across levels of Z. Then one would test against an alternative that permits the association to vary across the strata of Z.








3.3.1 Nominal-by-Nominal Test




Suppose X and Y are nominal. Then one could test conditional independence against the saturated loglinear model, since the only more general model is the saturated model. An efficient score statistic is the Pearson statistic for testing conditional independence against the alternative of the saturated model (Agresti 1992). Letting $X_k^2$ denote the Pearson statistic for testing independence within the kth level of Z, this statistic is $\sum_k X_k^2$. The asymptotic distribution of this statistic is chi-squared with df = K(I - 1)(J - 1), since in each partial table $X_k^2$ has an asymptotic chi-squared distribution with df = (I - 1)(J - 1), and we have K independent partial tables. This is also the df for testing a loglinear model of conditional independence against the most general alternative.
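As a small illustration of this statistic, the sketch below sums the stratum-specific Pearson statistics. The function name `summed_pearson` and the I x J x K array layout are assumptions for the example; cells with zero expected count are skipped.

```python
import numpy as np

def summed_pearson(table):
    """Sum over strata of the Pearson statistics X_k^2, with df = K(I-1)(J-1)."""
    n = np.asarray(table, dtype=float)           # assumed layout: n[i, j, k]
    I, J, K = n.shape
    stat = 0.0
    for k in range(K):
        nk = n[:, :, k]
        expected = np.outer(nk.sum(axis=1), nk.sum(axis=0)) / nk.sum()
        mask = expected > 0                      # ignore cells in empty rows or columns
        stat += ((nk - expected)[mask] ** 2 / expected[mask]).sum()
    return stat, K * (I - 1) * (J - 1)

# Hypothetical 2 x 2 x 2 table with two identical strata
stratum = np.array([[3, 1], [1, 3]])
example = np.stack([stratum, stratum], axis=2)
print(summed_pearson(example))
```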



3.3.2 Ordinal-by-Ordinal Test




The model of homogeneous linear-by-linear association (3.5) allows association between two ordinal variables in each table and this association is homogeneous across levels of Z. When X and Y are ordinal, one sometimes expects a monotone association between X and Y that changes strength across levels of Z. We consider a loglinear model that permits association between X and Y within each level of Z, but heterogeneity among levels of Z, and the degree of heterogeneity is explained by its association parameter. A relevant loglinear model is then the heterogeneous linear-by-linear association model,

\[ \log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta_k u_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{3.14} \]







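For this model, the null hypothesis of conditional independence is $H_0: \beta_1 = \cdots = \beta_K = 0$. The loglikelihood is

\[ L(m) = \sum_i \sum_j \sum_k n_{ijk} \log m_{ijk} - \sum_i \sum_j \sum_k m_{ijk} \]

\[ = \sum_i \sum_j \sum_k n_{ijk}(\mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta_k u_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}) - \sum_i \sum_j \sum_k m_{ijk} \]

\[ = n\mu + \sum_i \lambda_i^X n_{i++} + \sum_j \lambda_j^Y n_{+j+} + \sum_k \lambda_k^Z n_{++k} + \sum_k \beta_k \sum_i \sum_j u_i v_j n_{ijk} + \sum_i \sum_k \lambda_{ik}^{XZ} n_{i+k} + \sum_j \sum_k \lambda_{jk}^{YZ} n_{+jk} - \sum_i \sum_j \sum_k m_{ijk}. \tag{3.15} \]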

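For this model the sufficient statistic for $\beta_k$ is $\sum_i \sum_j u_i v_j n_{ijk}$. For k = 1, ..., K, the derivative of the loglikelihood is

\[ \frac{\partial L(m)}{\partial \beta_k} = \sum_i \sum_j u_i v_j n_{ijk} - \sum_i \sum_j u_i v_j m_{ijk}. \]

Under the hypothesis of conditional independence, we have $\hat{m}_{ijk} = n_{i+k} n_{+jk} / n_{++k}$. Hence, for k = 1, ..., K,

\[ \frac{\partial L(m)}{\partial \beta_k}\Big|_{\hat{m}} = \sum_i \sum_j u_i v_j (n_{ijk} - \hat{m}_{ijk}) = \sum_i \sum_j u_i v_j \Big(n_{ijk} - \frac{n_{i+k} n_{+jk}}{n_{++k}}\Big) = n \sum_i \sum_j u_i v_j \Big(p_{ijk} - \frac{p_{i+k} p_{+jk}}{p_{++k}}\Big). \]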


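Let $s$ denote the K x 1 vector having elements

\[ s_k = \sum_i \sum_j u_i v_j \Big(p_{ijk} - \frac{p_{i+k} p_{+jk}}{p_{++k}}\Big) = \frac{1}{n} \sum_i \sum_j u_i v_j \Big(n_{ijk} - \frac{n_{i+k} n_{+jk}}{n_{++k}}\Big). \tag{3.16} \]

Then $s$ can be written as

\[ s = \begin{pmatrix} \sum_i \sum_j u_i v_j (p_{ij1} - p_{i+1} p_{+j1}/p_{++1}) \\ \vdots \\ \sum_i \sum_j u_i v_j (p_{ijk} - p_{i+k} p_{+jk}/p_{++k}) \\ \vdots \\ \sum_i \sum_j u_i v_j (p_{ijK} - p_{i+K} p_{+jK}/p_{++K}) \end{pmatrix} = \frac{1}{n} \begin{pmatrix} \sum_i \sum_j u_i v_j (n_{ij1} - n_{i+1} n_{+j1}/n_{++1}) \\ \vdots \\ \sum_i \sum_j u_i v_j (n_{ijk} - n_{i+k} n_{+jk}/n_{++k}) \\ \vdots \\ \sum_i \sum_j u_i v_j (n_{ijK} - n_{i+K} n_{+jK}/n_{++K}) \end{pmatrix}. \]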





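For fixed k, let $G_k(\pi) = \sum_i \sum_j u_i v_j (\pi_{ijk} - \pi_{i+k}\pi_{+jk}/\pi_{++k})$. Let $g_k$ represent the IJ x 1 vector having elements

\[ g_k(i,j) = \frac{1}{n_{++k}^2} \Big[\Big(u_i n_{++k} - \sum_a u_a n_{a+k}\Big)\Big(v_j n_{++k} - \sum_b v_b n_{+bk}\Big)\Big], \]

and let $g_k^D$ be the IJK x 1 vector with $g_k^{D\prime} = (0_{(k-1)IJ}', g_k', 0_{(K-k)IJ}')$. For example,

\[ g_1^D = \frac{\partial G_1(\pi)}{\partial \pi} = \frac{1}{\pi_{++1}^2} \begin{pmatrix} (u_1\pi_{++1} - \sum_a u_a\pi_{a+1})(v_1\pi_{++1} - \sum_b v_b\pi_{+b1}) \\ (u_1\pi_{++1} - \sum_a u_a\pi_{a+1})(v_2\pi_{++1} - \sum_b v_b\pi_{+b1}) \\ \vdots \\ (u_I\pi_{++1} - \sum_a u_a\pi_{a+1})(v_J\pi_{++1} - \sum_b v_b\pi_{+b1}) \\ 0_{(K-1)IJ} \end{pmatrix} = \frac{1}{n_{++1}^2} \begin{pmatrix} (u_1 n_{++1} - \sum_a u_a n_{a+1})(v_1 n_{++1} - \sum_b v_b n_{+b1}) \\ (u_1 n_{++1} - \sum_a u_a n_{a+1})(v_2 n_{++1} - \sum_b v_b n_{+b1}) \\ \vdots \\ (u_I n_{++1} - \sum_a u_a n_{a+1})(v_J n_{++1} - \sum_b v_b n_{+b1}) \\ 0_{(K-1)IJ} \end{pmatrix} = \begin{pmatrix} g_1 \\ 0_{(K-1)IJ} \end{pmatrix}, \]

where the second expression evaluates the derivative at the sample proportions. In general,

\[ g_k^D = \frac{\partial G_k(\pi)}{\partial \pi} = \frac{1}{\pi_{++k}^2} \begin{pmatrix} 0_{(k-1)IJ} \\ (u_1\pi_{++k} - \sum_a u_a\pi_{a+k})(v_1\pi_{++k} - \sum_b v_b\pi_{+bk}) \\ (u_1\pi_{++k} - \sum_a u_a\pi_{a+k})(v_2\pi_{++k} - \sum_b v_b\pi_{+bk}) \\ \vdots \\ (u_I\pi_{++k} - \sum_a u_a\pi_{a+k})(v_J\pi_{++k} - \sum_b v_b\pi_{+bk}) \\ 0_{(K-k)IJ} \end{pmatrix} = \frac{1}{n_{++k}^2} \begin{pmatrix} 0_{(k-1)IJ} \\ (u_1 n_{++k} - \sum_a u_a n_{a+k})(v_1 n_{++k} - \sum_b v_b n_{+bk}) \\ (u_1 n_{++k} - \sum_a u_a n_{a+k})(v_2 n_{++k} - \sum_b v_b n_{+bk}) \\ \vdots \\ (u_I n_{++k} - \sum_a u_a n_{a+k})(v_J n_{++k} - \sum_b v_b n_{+bk}) \\ 0_{(K-k)IJ} \end{pmatrix} = \begin{pmatrix} 0_{(k-1)IJ} \\ g_k \\ 0_{(K-k)IJ} \end{pmatrix}. \]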








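Also let $D$ represent the K x IJK matrix such that row k consists of $g_k^{D\prime}$, that is,

\[ D = \begin{pmatrix} \partial G_1(\pi)'/\partial \pi \\ \vdots \\ \partial G_K(\pi)'/\partial \pi \end{pmatrix}. \]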





The null asymptotic covariance matrix of $s$ is $H = D \Sigma D'/n$, where $n = \sum_i \sum_j \sum_k n_{ijk}$ and $\Sigma = Diag(p) - pp'$ with $p = \{p_{i+k} p_{+jk}/p_{++k}\}$. The score statistic for testing $H_0: \beta_1 = \cdots = \beta_K = 0$ is then $s'H^{-1}s$. From Rao (1973, page 418), the asymptotic distribution of $s$ is K-variate normal. Its mean is zero and its dispersion matrix is the information matrix. Hence the asymptotic distribution of $s'H^{-1}s$ is chi-squared with df = K. The number of df is the number of parameters being tested, or equivalently the rank of the asymptotic covariance matrix.
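The construction of $s$, $D$, and $H$ can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the function name `hetero_lbl_score` is hypothetical, the IJK-vector is ordered stratum by stratum (and row by row within a stratum), and $H$ is taken as $D \Sigma D'/n$ with $\Sigma$ evaluated at the fitted null proportions.

```python
import numpy as np

def hetero_lbl_score(table, u, v):
    """Score statistic s' H^{-1} s for H0: beta_1 = ... = beta_K = 0 in model (3.14)."""
    n = np.asarray(table, dtype=float)               # assumed layout: n[i, j, k]
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    I, J, K = n.shape
    ntot = n.sum()
    s = np.zeros(K)
    D = np.zeros((K, K * I * J))
    p0 = np.zeros(K * I * J)                         # fitted null proportions
    for k in range(K):
        nk = n[:, :, k]
        N = nk.sum()
        row, col = nk.sum(axis=1), nk.sum(axis=0)
        fitted = np.outer(row, col) / N              # n_{i+k} n_{+jk} / n_{++k}
        block = slice(k * I * J, (k + 1) * I * J)
        s[k] = (u @ (nk - fitted) @ v) / ntot        # (3.16)
        g_k = np.outer(u * N - u @ row, v * N - v @ col) / N ** 2
        D[k, block] = g_k.ravel()                    # row k of D is g_k^D'
        p0[block] = fitted.ravel() / ntot
    Sigma = np.diag(p0) - np.outer(p0, p0)
    H = D @ Sigma @ D.T / ntot
    return float(s @ np.linalg.solve(H, s)), K       # statistic and its df

# Hypothetical 2 x 2 x 2 table with two identical strata and scores (1, 0)
stratum = np.array([[3, 1], [1, 3]])
example = np.stack([stratum, stratum], axis=2)
print(hetero_lbl_score(example, [1, 0], [1, 0]))
```

Because each $g_k^D$ is nonzero only in its own stratum block, $H$ is diagonal here, and the statistic is a sum of K stratum-specific one-df components.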



3.3.3 Nominal-by-Ordinal Test




A loglinear model (3.7) implies there are row effects on the association, and these row effects are the same for each level of Z. In general cases when X is nominal and Y is ordinal, we might expect heterogeneity in the row effects on the association. Then a relevant loglinear model to allow heterogeneity across the strata is

\[ \log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \mu_{ik} v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{3.17} \]

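The model is sensitive to alternatives whereby means on Y vary across levels of both X and Z. For identifiability, we use the constraints $\sum_i \mu_{ik} = 0$. For this model, the null hypothesis of conditional independence is $H_0: \mu_{ik} = 0$ for i = 1, ..., I - 1 and k = 1, ..., K. The loglikelihood is

\[ L(m) = \sum_i \sum_j \sum_k n_{ijk} \log m_{ijk} - \sum_i \sum_j \sum_k m_{ijk} \]

\[ = \sum_i \sum_j \sum_k n_{ijk}(\mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \mu_{ik} v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}) - \sum_i \sum_j \sum_k m_{ijk} \]

\[ = n\mu + \sum_i \lambda_i^X n_{i++} + \sum_j \lambda_j^Y n_{+j+} + \sum_k \lambda_k^Z n_{++k} + \sum_i \sum_k \mu_{ik} \sum_j v_j n_{ijk} + \sum_i \sum_k \lambda_{ik}^{XZ} n_{i+k} + \sum_j \sum_k \lambda_{jk}^{YZ} n_{+jk} - \sum_i \sum_j \sum_k m_{ijk}. \tag{3.18} \]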

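For this model the sufficient statistic for $\mu_{ik}$ is $\sum_j v_j n_{ijk}$. For fixed i and k, the derivative of the loglikelihood is

\[ \frac{\partial L(m)}{\partial \mu_{ik}} = \sum_j v_j n_{ijk} - \sum_j v_j m_{ijk}. \]

Under the hypothesis of conditional independence, we have $\hat{m}_{ijk} = n_{i+k} n_{+jk} / n_{++k}$. Hence, for fixed i and k,

\[ \frac{\partial L(m)}{\partial \mu_{ik}}\Big|_{\hat{m}} = \sum_j v_j (n_{ijk} - \hat{m}_{ijk}) = \sum_j v_j \Big(n_{ijk} - \frac{n_{i+k} n_{+jk}}{n_{++k}}\Big) = n \sum_j v_j \Big(p_{ijk} - \frac{p_{i+k} p_{+jk}}{p_{++k}}\Big). \]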

For i = 1, ..., I - 1 and k = 1, ..., K, let $q$ be the K(I - 1) x 1 vector having elements

\[ q_{ik} = \sum_j v_j \Big(p_{ijk} - \frac{p_{i+k} p_{+jk}}{p_{++k}}\Big) = \frac{1}{n} \sum_j v_j \Big(n_{ijk} - \frac{n_{i+k} n_{+jk}}{n_{++k}}\Big) = \frac{1}{n} n_{i+k}(W_{ik} - W_k), \tag{3.19} \]

where $W_{ik} = \sum_j n_{ijk} v_j / n_{i+k}$ and $W_k = \sum_i \sum_j n_{ijk} v_j / n_{++k}$. Then $q$ can be written as

\[ q = \begin{pmatrix} \sum_j v_j (p_{1j1} - p_{1+1} p_{+j1}/p_{++1}) \\ \vdots \\ \sum_j v_j (p_{(I-1)j1} - p_{(I-1)+1} p_{+j1}/p_{++1}) \\ \vdots \\ \sum_j v_j (p_{1jk} - p_{1+k} p_{+jk}/p_{++k}) \\ \vdots \\ \sum_j v_j (p_{(I-1)jk} - p_{(I-1)+k} p_{+jk}/p_{++k}) \\ \vdots \\ \sum_j v_j (p_{1jK} - p_{1+K} p_{+jK}/p_{++K}) \\ \vdots \\ \sum_j v_j (p_{(I-1)jK} - p_{(I-1)+K} p_{+jK}/p_{++K}) \end{pmatrix}, \]

or, in terms of the cell counts,

\[ q = \frac{1}{n} \begin{pmatrix} \sum_j v_j (n_{1j1} - n_{1+1} n_{+j1}/n_{++1}) \\ \vdots \\ \sum_j v_j (n_{(I-1)j1} - n_{(I-1)+1} n_{+j1}/n_{++1}) \\ \vdots \\ \sum_j v_j (n_{1jk} - n_{1+k} n_{+jk}/n_{++k}) \\ \vdots \\ \sum_j v_j (n_{(I-1)jk} - n_{(I-1)+k} n_{+jk}/n_{++k}) \\ \vdots \\ \sum_j v_j (n_{1jK} - n_{1+K} n_{+jK}/n_{++K}) \\ \vdots \\ \sum_j v_j (n_{(I-1)jK} - n_{(I-1)+K} n_{+jK}/n_{++K}) \end{pmatrix}. \]








For fixed i and k, let $G_{ik}(\pi) = \sum_j v_j (\pi_{ijk} - \pi_{i+k}\pi_{+jk}/\pi_{++k})$. Let $r_{ik}$ represent the IJ x 1 vector having elements

\[ r_{ik}(i', j) = \frac{1}{n_{++k}^2} \Big[\Big(v_j n_{++k} - \sum_b v_b n_{+bk}\Big)(n_{++k}\delta_{ii'} - n_{i+k})\Big], \quad i' = 1, \ldots, I, \quad j = 1, \ldots, J, \]

and let $r_{ik}^E$ be the IJK x 1 vector with $r_{ik}^{E\prime} = (0_{(k-1)IJ}', r_{ik}', 0_{(K-k)IJ}')$. That is,

\[ r_{ik}^E = \frac{\partial G_{ik}(\pi)}{\partial \pi} = \begin{pmatrix} 0_{(k-1)IJ} \\ r_{ik} \\ 0_{(K-k)IJ} \end{pmatrix}, \]

where the entry of $\partial G_{ik}(\pi)/\partial \pi$ for cell $(i', j)$ of stratum k is

\[ \frac{1}{\pi_{++k}^2} \Big(v_j \pi_{++k} - \sum_b v_b \pi_{+bk}\Big)(\pi_{++k}\delta_{ii'} - \pi_{i+k}). \]

The second factor equals $\pi_{++k} - \pi_{i+k}$ in the block of J entries for row $i' = i$ and $-\pi_{i+k}$ in the blocks for all other rows. Evaluating at the sample proportions gives the expression for $r_{ik}(i', j)$ in terms of the cell counts.




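Also let $E$ represent the K(I - 1) x IJK matrix such that the row corresponding to (i, k) consists of $r_{ik}^{E\prime}$, that is,

\[ E = \begin{pmatrix} \partial G_{11}(\pi)'/\partial \pi \\ \vdots \\ \partial G_{(I-1)1}(\pi)'/\partial \pi \\ \vdots \\ \partial G_{1K}(\pi)'/\partial \pi \\ \vdots \\ \partial G_{(I-1)K}(\pi)'/\partial \pi \end{pmatrix}. \]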


The null asymptotic covariance matrix of $q$ is $R = E \Sigma E'/n$. The score statistic for testing $H_0: \mu_{ik} = 0$ for i = 1, ..., I - 1 and k = 1, ..., K is $q'R^{-1}q$. Its asymptotic distribution is chi-squared with df = K(I - 1). The number of df is the rank of the asymptotic covariance matrix, or equivalently the number of parameters being tested.
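The same recipe gives a numerical sketch of $q'R^{-1}q$. As before, this is illustrative rather than the author's program: the name `hetero_row_effects_score` is hypothetical, the IJK-vector is ordered stratum by stratum, and $R$ is taken as $E \Sigma E'/n$ with $\Sigma = Diag(p) - pp'$ evaluated at the fitted null proportions.

```python
import numpy as np

def hetero_row_effects_score(table, v):
    """Score statistic q' R^{-1} q for H0: mu_ik = 0 in model (3.17)."""
    n = np.asarray(table, dtype=float)               # assumed layout: n[i, j, k]
    v = np.asarray(v, dtype=float)
    I, J, K = n.shape
    ntot = n.sum()
    q = np.zeros(K * (I - 1))
    E = np.zeros((K * (I - 1), K * I * J))
    p0 = np.zeros(K * I * J)                         # fitted null proportions
    for k in range(K):
        nk = n[:, :, k]
        N = nk.sum()
        row, col = nk.sum(axis=1), nk.sum(axis=0)
        fitted = np.outer(row, col) / N
        block = slice(k * I * J, (k + 1) * I * J)
        p0[block] = fitted.ravel() / ntot
        col_part = v * N - v @ col                   # v_j n_{++k} - sum_b v_b n_{+bk}
        for i in range(I - 1):
            r = k * (I - 1) + i
            q[r] = ((nk[i] - fitted[i]) @ v) / ntot  # (3.19)
            row_part = N * (np.arange(I) == i) - row[i]   # n_{++k} delta_{ii'} - n_{i+k}
            E[r, block] = np.outer(row_part, col_part).ravel() / N ** 2
    Sigma = np.diag(p0) - np.outer(p0, p0)
    R = E @ Sigma @ E.T / ntot
    return float(q @ np.linalg.solve(R, q)), K * (I - 1)

# Hypothetical 3 x 2 x 1 table with column scores (1, 0)
example = np.array([[[3], [1]], [[2], [2]], [[1], [3]]])
print(hetero_row_effects_score(example, [1, 0]))
```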

For exact tests, one identifies any of these six statistics with T in the calculation of the exact P-value. We discuss next how to construct modified exact P-values for the six tests.



3.4 The Construction of the Modified Exact P-value




So far, we have discussed six test statistics for testing conditional independence of X and Y, given Z, in three-way contingency tables. The ordinary exact P-value can be constructed by utilizing these statistics. In Chapter 2, we proposed a modified exact P-value, to reduce the degree of conservativeness. It is based on both the usual test statistic and, at the observed value of T, a secondary statistic T' that generates a secondary partitioning. The statistic T' is a statistic directed toward a broader alternative. Then, T' can catch some information about the validity of the null hypothesis when the assumed alternative for T is not exactly satisfied. The modified exact P-value is defined in Chapter 2 as

\[ P^* = P_{H_0}(T > t_0) + P_{H_0}(T = t_0, T' \ge t') \]


when large values of T and T' contradict the null. We have shown in Chapter 2, using 2 x 2 x K tables, that the modified P-value has less discrete sampling distributions, and modified tests reduce the degree of conservativeness. We can apply this modified approach to I x J x K tables to reduce the conservativeness and to get sharper results.
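To make the definition concrete, the following sketch computes the ordinary and modified P-values from a fully enumerated null distribution. The function name `exact_p_values`, the tuple layout `(probability, T, T')`, and the toy distribution are assumptions for illustration; in practice, ties in T should be detected with a numerical tolerance when T is not integer valued.

```python
def exact_p_values(support, t_obs, t2_obs):
    """Ordinary and modified exact P-values from an enumerated null distribution.

    support: iterable of (probability, T, T') triples, one per table in the reference set.
    """
    support = list(support)
    # Ordinary P-value: all tables with T at least as large as observed
    p_ordinary = sum(p for p, t, _ in support if t >= t_obs)
    # Modified P-value: tables tied on T count only if T' is at least as large as observed
    p_modified = sum(p for p, t, t2 in support
                     if t > t_obs or (t == t_obs and t2 >= t2_obs))
    return p_ordinary, p_modified

# Hypothetical null distribution over four tables
toy = [(0.05, 4, 2.0), (0.10, 3, 5.0), (0.20, 3, 1.0), (0.65, 2, 0.5)]
print(exact_p_values(toy, t_obs=3, t2_obs=5.0))
```

In the toy distribution, the table tied with the observed T but with a smaller T' is excluded from the modified P-value, which is how the secondary partitioning reduces conservativeness.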

For testing conditional independence assuming no three-factor interaction, we denote by T1 the test statistic when both X and Y are nominal, by T2 the test statistic when X is nominal and Y is ordinal, by T2' the test statistic when X is ordinal and Y is nominal, and by T3 the test statistic when both X and Y are ordinal. Also, let T4, T5, T5' and T6 be the corresponding test statistics when we permit three-factor interaction. Note that these are score statistics.

In this section, we discuss possible alternative ways of forming modified P-values for testing conditional independence for I x J x K tables. Ordinary exact P-values for these six tests correspond to six loglinear models for primary alternative hypotheses. The general rule to construct the modified exact P-value is as follows. We use a score statistic for T', in order to have consistency. If there is only one potential statistic for T', we use that one. If there is more than one potential statistic, we apply a basic principle to choose a T' among them. We consider four such principles. The first principle is to choose a T' from the next most general alternative, while keeping the same assumption as T about three-factor interaction. The second principle is to choose a T' from the most general alternative, while keeping the same assumption as T about three-factor interaction. The third principle is to choose a T' from the most general alternative among all cases. The fourth principle is to choose a T' while keeping the nature of the classification variables. Next, we discuss all possible statistics for T' for the six cases. Note that the potential statistics for T' are T1, T2, T2', T3, T4, T5, T5', and T6. We first consider the tests assuming no three-factor interaction.

When both X and Y are nominal, the primary test statistic T is T1. The secondary statistic T' can be T4, since T4 corresponds to a more general alternative hypothesis. Second, when X is nominal and Y is ordinal, T is T2 and T' can be T1, T4, or T5. Third, when both X and Y are ordinal, T is T3 and T' can be T1, T2, T2', T4, T5, T5', or T6. Since T3 is constructed from the narrowest alternative, all the other statistics are potential statistics for T'.








Next, we permit three-factor interaction. First, when both X and Y are nominal, T is T4, but there is no general score statistic for T', since T is constructed from the most general alternative. We could, however, use the table probability for T' for the secondary partitioning. Second, when X is nominal and Y is ordinal, T is T5 and T' can be T4. Finally, when both X and Y are ordinal, T is T6, and T' can be T4, T5, or T5'. Table 3.1 summarizes all possible statistics for T' for the six tests.

We see that two cases have only one potential score statistic for T'. For the nominal-by-nominal case assuming no three-factor interaction, T' is T4, and for the nominal-by-ordinal case permitting three-factor interaction, T' is also T4. For the nominal-by-nominal case permitting three-factor interaction, there is no score statistic, but we could use the table probability. For these three cases, there is only one choice for T'. For the other three cases, we apply a basic principle in order to choose a T' among the potential statistics.

For the first principle, we choose a T' from the next most general alternative, while keeping the same assumption as T about three-factor interaction. Assuming no three-factor interaction, (T, T') is (T2, T1) for the nominal-by-ordinal case, since the nominal-by-nominal case is more general, and it also corresponds to the next most general alternative assuming no three-factor interaction in this case. For the ordinal-by-ordinal case, the next most general alternative corresponds to the nominal-by-ordinal case or the ordinal-by-nominal case. Hence (T, T') is (T3, T2) or (T3, T2'). Accordingly, for the ordinal-by-ordinal case permitting three-factor interaction, (T, T') is (T6, T5) or (T6, T5').

The second principle is to choose a T' from the most general alternative among the three cases, while keeping the same assumption as T about three-factor interaction. Then, assuming no three-factor interaction, the corresponding statistics for (T, T') are (T2, T1) for the nominal-by-ordinal case and (T3, T1) for the ordinal-by-ordinal case, since the nominal-by-nominal case is the most general among the three cases. Also, for the ordinal-by-ordinal case permitting three-factor interaction, (T, T') is (T6, T4).








For the third principle of the most general alternative among all cases, the corresponding statistics for (T, T') are (T2, T4), (T3, T4), and (T6, T4), since T4 corresponds to the most general alternative among all cases. For the fourth principle of keeping the nature of the classification variables, the corresponding statistics for (T, T') are (T2, T5) and (T3, T6). For the ordinal-by-ordinal case permitting three-factor interaction, T' does not have a potential statistic under this principle.

Among four principles, we prefer the first principle, since modified P-values can be defined for most cases using this principle, and it can utilize the ordinality of classification variables. For the second and third principles, T' does not consider possible ordinality. Table 3.2 summarizes test statistics for the construction of ordinary and modified exact P-values for testing conditional independence in I x J x K contingency tables using the first principle. For I x J x K contingency tables, the discreteness will not be severe when the sample size is large. But, when the sample size is small, the modified P-value can reduce the conservativeness. We discuss implementation of the exact tests in the next section.



Table 3.1. All possible statistics for T' for six tests.

                                                     T'
                            T     T1   T2   T2'  T3   T4   T5   T5'  T6
Assuming no three-factor interaction
  Nominal-by-Nominal        T1    .    .    .    .    T4   .    .    .
  Nominal-by-Ordinal        T2    T1   .    .    .    T4   T5   .    .
  Ordinal-by-Ordinal        T3    T1   T2   T2'  .    T4   T5   T5'  T6
Permitting three-factor interaction
  Nominal-by-Nominal        T4    .    .    .    .    .    .    .    .
  Nominal-by-Ordinal        T5    .    .    .    .    T4   .    .    .
  Ordinal-by-Ordinal        T6    .    .    .    .    T4   T5   T5'  .









Table 3.2. Test statistics for the construction of the ordinary and modified exact P-values P* for testing conditional independence in I x J x K contingency tables.

                            Ordinary P-value    Modified P-value P*
                            T                   (T, T')
Assuming no three-factor interaction
  Nominal-by-Nominal        T1                  (T1, T4)
  Nominal-by-Ordinal        T2                  (T2, T1)
  Ordinal-by-Ordinal        T3                  (T3, T2)
Permitting three-factor interaction
  Nominal-by-Nominal        T4                  (T4, P(Z))
  Nominal-by-Ordinal        T5                  (T5, T4)
  Ordinal-by-Ordinal        T6                  (T6, T5)



3.5 Approximation of Exact P-values




For three-way contingency tables, algorithms for testing conditional independence are available in widely-available software only for the 2 x J x K case with ordered columns (StatXact 1991). Even for table sizes where software exists, the reference set of tables for the conditional distribution is sometimes too large for an exact P-value computation. For instance, sometimes the sample size is moderately large but there are many cells and the table is sparse, so exact methods are infeasible but the use of standard asymptotic theory is questionable.

In some cases, one can obtain a very accurate approximation to the distribution of the test statistic using a saddlepoint approximation. This higher-order asymptotic approximation is more accurate than the normal approximation or the one- or two-term Edgeworth expansion. It is applicable to conditional densities and tail probabilities of sufficient statistics in exponential families. For example, to approximate conditional tail probabilities, one can use an approximation due to Skovgaard (1987). Davison (1988) applied the approximation to model (3.5) for 2 x 2 x K tables, and Pierce and Peters (1992) applied it to model (3.5) for K = 1.

To illustrate the saddlepoint approximation, we show how to apply it to the homogeneous linear-by-linear association model (3.5) for arbitrary K. Let $\hat{\beta}$ denote the ML estimate of $\beta$ in that model. Let $G^2(I)$ and $G^2(L \times L)$ denote the likelihood-ratio statistics for testing the goodness of fit of the conditional independence and homogeneous linear-by-linear association models. The conditional P-value for testing $H_0: \beta = 0$ against $H_a: \beta > 0$ has saddlepoint approximation

\[ \Pr(T \ge t_0 \mid \{n_{i+k}\}, \{n_{+jk}\}) \approx 1 - \Phi(z) + \phi(z)\Big(\frac{1}{w} - \frac{1}{z}\Big), \tag{3.20} \]

where

\[ z = sgn(\hat{\beta})[G^2(I) - G^2(L \times L)]^{1/2} \quad \text{and} \quad w = 2\sinh(\hat{\beta}/2)\{|I_{L \times L}| / |I_I|\}^{1/2}. \]

The matrices $I_I$ and $I_{L \times L}$ are the observed information matrices for the conditional independence model and the homogeneous linear-by-linear association model, and $\Phi$ and $\phi$ denote the standard normal cdf and pdf.

Since software is not yet available in the generality needed for the exact conditional methods we have described for I x J x K tables, we next present an alternative method that can approximate the exact conditional result as well as needed. This is the simple approach of performing a Monte Carlo simulation on the conditional set. The Monte Carlo method is an alternative to computing either the exact or asymptotic P-values. It is useful for those situations where the data set is too large for an exact P-value computation or too sparse to rely on the asymptotic theory.

Agresti et al. (1979) utilized this method effectively for a variety of tests for two-way tables. Even for large tables or large sample sizes, one can quickly approximate as closely as needed the ordinary and modified exact P-values for the six statistics presented in Sections 3.2 and 3.3. The method consists of sampling contingency tables from the conditional reference set in proportion to their probabilities, and computing an unbiased point estimate and a narrow confidence interval for an exact P-value. We constructed an algorithm to perform precise approximations for the exact inferences using a table-generation procedure suggested by Patefield (1981). For practical applications, we prefer this approximation to the saddlepoint because it is available more generally (e.g., for multi-degree-of-freedom statistics for testing vectors of parameters), because its accuracy is known to the user, and because that accuracy can be set as finely as one requires.

We proposed ordinary and modified exact P-values for six tests, and T and T' are defined in Table 3.2. To illustrate, suppose we want to estimate a modified exact one-sided P-value when X and Y are ordinal, assuming no three-factor interaction. Then we test against the narrower alternative of the homogeneous linear-by-linear association model (3.5). The secondary statistic T' is a test statistic directed toward a broader alternative hypothesis. For T', one possibility is the score statistic for the case of nominal-ordinal association assuming no three-factor interaction. Let t' be the observed value of T'. In this case we have $T = \sum_i \sum_j \sum_k u_i v_j n_{ijk}$, and T' is the score statistic discussed in Section 3.2.3. This is a one-sided test. Modified exact P-values for the other tests can be constructed in the same way by using T and T' in Table 3.2. They are two-sided tests.

To implement the exact tests, we sample M contingency tables, with replacement, from the reference set $\Gamma$ of tables with the same margins, where M is chosen to give the desired degree of accuracy with some fixed probability. Define the upper critical region of the reference set by

\[ \Gamma^* = \{Z \in \Gamma : T > t_0 \text{ or } (T = t_0 \text{ and } T' \ge t')\}. \]

The other possibility for T' is to use the null table probability. Under the null hypothesis of conditional independence, the probability of observing any specific $Z \in \Gamma$ is

\[ P(Z) = \prod_k \frac{(\prod_i n_{i+k}!)(\prod_j n_{+jk}!)}{n_{++k}! \prod_i \prod_j n_{ijk}!}. \tag{3.21} \]

Then we define the critical region of the reference set by

\[ \Gamma^*_p = \{Z \in \Gamma : T > t_0 \text{ or } (T = t_0 \text{ and } P(Z) \le P(N))\}, \]

where N denotes the observed table.

For the ith table sampled, let $y_i = 1$ if $Z_i \in \Gamma^*$, and let $y_i = 0$ otherwise. The point estimate of the modified P-value is

\[ \hat{P}^* = \frac{1}{M} \sum_i y_i, \]

the proportion of sampled tables in $\Gamma^*$. Likewise, the estimate of the modified P-value using the null table probability for T' can be defined using $\Gamma^*_p$, and we denote it by $\hat{P}^*_p$. For the estimate of the ordinary exact P-value, the upper critical region of the reference set, $\Gamma'$, is

\[ \Gamma' = \{Z \in \Gamma : T \ge t_0\}, \]

so that the estimate is the proportion of sampled tables that have a test statistic at least as large as the observed one.
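The sampling scheme can be sketched as follows. This is a minimal illustration under explicit assumptions, not the dissertation's FORTRAN program: instead of Patefield's (1981) algorithm, each stratum is generated by randomly permuting the column labels of the individual observations, which also yields tables with the observed margins in proportion to their null probabilities. The names `random_table`, `monte_carlo_p`, `T`, and `T_prime`, as well as the example table and scores, are hypothetical.

```python
import numpy as np

def random_table(row_totals, col_totals, rng):
    """One random two-way table with the given margins (permutation method, not Patefield's)."""
    rows = np.repeat(np.arange(len(row_totals)), row_totals)
    cols = rng.permutation(np.repeat(np.arange(len(col_totals)), col_totals))
    table = np.zeros((len(row_totals), len(col_totals)), dtype=int)
    np.add.at(table, (rows, cols), 1)               # count the (row, column) pairs
    return table

def monte_carlo_p(table, stat, stat2, M=2000, seed=0):
    """Estimate the ordinary and modified exact P-values by sampling M tables."""
    n = np.asarray(table, dtype=int)                # assumed layout: n[i, j, k]
    rng = np.random.default_rng(seed)
    t_obs, t2_obs = stat(n), stat2(n)
    rows, cols = n.sum(axis=1), n.sum(axis=0)       # shapes (I, K) and (J, K)
    hits_ord = hits_mod = 0
    for _ in range(M):
        z = np.stack([random_table(rows[:, k], cols[:, k], rng)
                      for k in range(n.shape[2])], axis=2)
        t, t2 = stat(z), stat2(z)
        hits_ord += int(t >= t_obs)
        hits_mod += int(t > t_obs or (t == t_obs and t2 >= t2_obs))
    p_ord, p_mod = hits_ord / M, hits_mod / M
    half_width = 1.96 * np.sqrt(p_mod * (1 - p_mod) / M)   # approximate 95% precision
    return p_ord, p_mod, half_width

u = np.array([1, 2, 3])                             # hypothetical row scores
v = np.array([1, 2])                                # hypothetical column scores

def T(z):
    """Primary statistic: sum of u_i v_j n_ijk (integer valued, so ties are exact)."""
    return int(np.einsum('i,j,ijk->', u, v, z))

def T_prime(z):
    """Illustrative secondary statistic: Pearson statistics summed over strata."""
    total = 0.0
    for k in range(z.shape[2]):
        zk = z[:, :, k].astype(float)
        e = np.outer(zk.sum(axis=1), zk.sum(axis=0)) / zk.sum()
        total += ((zk - e) ** 2 / e).sum()
    return total

observed = np.array([[[3, 2], [1, 2]], [[2, 2], [2, 1]], [[1, 1], [3, 3]]])
print(monte_carlo_p(observed, T, T_prime, M=2000, seed=1))
```

Using the summed Pearson statistic for T' is only for illustration; any of the secondary statistics in Table 3.2, or the null table probability, could be passed in its place.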



3.6 Examples



3.6.1 Example 1




We illustrate the exact tests using Table 3.3. This is a cross-classification of job satisfaction by income, controlling for gender, for black Americans sampled in the General Social Survey of 1991. In order to utilize ordinality in studying the partial association between income and satisfaction, we test conditional independence against the model (3.5) of homogeneous linear-by-linear association. Using equally spaced row and column scores, the likelihood-ratio chi-squared statistic for testing the fit of that model equals 12.33, with df = 17. The estimated association parameter is $\hat{\beta} = 0.388$ with s.e. = 0.155. The likelihood-ratio chi-squared statistic for testing conditional independence, assuming the model, is 19.37 - 12.33 = 7.04 with df = 1. There seems to be very strong evidence of a positive association between income and satisfaction. However, the data are sparse enough to make large-sample approximations questionable; yet the sample size is sufficiently large that exact analyses are infeasible. We used Monte Carlo sampling with M = 50,000, which guarantees that P-value estimators fall within 0.004 of the true P-value with probability at least 0.95.

For the exact tests assuming no three-factor interaction, the estimated ordinary exact P-values (with 95% precision indicated in parentheses) are 0.332 (± 0.004) for the nominal-by-nominal test, 0.024 (± 0.001) for the nominal-by-ordinal test, and 0.006 (± 0.001) for the ordinal-by-ordinal test. Using T' defined in Table 3.2, the corresponding estimated modified exact P-values P* are 0.332, 0.024, and 0.004. Using the null table probability for T', the corresponding estimated modified P-values $P^*_p$ are 0.332, 0.024, and 0.005. The distribution of T takes 121 separate points for the ordinal-by-ordinal test, and since the degree of discreteness is not severe, the two types of P-values are essentially the same. The asymptotic P-values are 0.335, 0.026, and 0.005, respectively. In this case, first-order asymptotic approximations work quite well.
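The quoted precisions are consistent with the usual normal-approximation half-width for a binomial proportion. The short sketch below checks this, assuming (as a simplification) that the half-width is $1.96\sqrt{p(1-p)/M}$; the function name `half_width` is hypothetical.

```python
import math

def half_width(p, M, z=1.96):
    """Approximate 95% half-width of a Monte Carlo estimate of a P-value p from M tables."""
    return z * math.sqrt(p * (1.0 - p) / M)

M = 50_000
# p = 0.5 is the worst case; the other values are the estimated ordinary P-values above
for p in (0.5, 0.332, 0.024, 0.006):
    print(f"p = {p:<5}  half-width = {half_width(p, M):.4f}")
```

The worst case p = 0.5 gives a half-width of about 0.0044, and the three estimated P-values give half-widths that round to 0.004, 0.001, and 0.001.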

For the other exact tests, permitting three-factor interaction, the estimated ordinary exact P-values are 0.281 for the nominal-by-nominal test, 0.089 for the nominal-by-ordinal test, and 0.020 for the ordinal-by-ordinal test. The corresponding estimated modified exact P-values, P* or $P^*_p$, are 0.281, 0.089, and 0.020. Also, the corresponding asymptotic P-values are 0.277, 0.089, and 0.020. Table 3.4 summarizes results for all six tests we have discussed. Note that we would not obtain strong evidence of association if we ignored the ordinality of the variables. For large n, since the discreteness is not severe, the modified approach is not needed. Generally, the modified P-value is less discrete than the ordinary P-value and leads to less conservative tests. For small n, we can see the advantage of using the modified approach.



Table 3.3. Cross-classification of job satisfaction with income, controlling for gender, for black Americans.

                             Satisfaction
Gender    Income       VD    LS    MS    VS
Male      < 5000        1     1     2     1
          < 15000       0     3     5     1
          < 25000       0     0     7     3
          > 25000       0     1     9     6
Female    < 5000        1     3    11     2
          < 15000       2     3    17     3
          < 25000       0     1     8     5
          > 25000       0     2     4     2

Source: General Social Surveys (1991).
VD: Very Dissatisfied, LS: A little Satisfied, MS: Moderately Satisfied, VS: Very Satisfied.




3.6.2 Example 2




We next illustrate the exact tests of independence using Table 3.5, which is a 3 x 2 table from the example in Table 1 of Patefield (1982). These are the results of a double-blind study concerning the use of Oxprenolol in the treatment of examination stress. Among 32 students, 15 were treated with Oxprenolol and 17 were given Diazepam (control). The examination results were compared with their tutor's prediction. The column classification is ordinal, and the row classification can be treated as ordinal since it has two levels.

Table 3.4. Estimated exact P-values for testing conditional independence in Table 3.3.

                            Ordinary    Modified      Modified         Asymptotic
                            P-value     P-value P*    P-value P*_p     P-value
Assuming no three-factor interaction
  Nominal-by-Nominal        0.332       0.332         0.332            0.335
  Nominal-by-Ordinal        0.024       0.024         0.024            0.026
  Ordinal-by-Ordinal        0.006       0.004         0.005            0.005
Permitting three-factor interaction
  Nominal-by-Nominal        0.281       0.281         0.281            0.277
  Nominal-by-Ordinal        0.089       0.089         0.089            0.089
  Ordinal-by-Ordinal        0.020       0.020         0.020            0.021

When X and Y are ordinal, a relevant model that reflects the ordinality in a two-way table is the model of linear-by-linear association, log rij = y + AX + AY + /3uivj. (3.22)


The independence model is the special case $\beta = 0$. We test independence against the model of linear-by-linear association in order to utilize ordinality. For unit-spaced scores, the likelihood-ratio chi-squared statistic for testing the fit of that model equals 2.64, with df = 1. The estimated association parameter is $\hat{\beta} = 1.706$ with s.e. = 0.773. The likelihood-ratio chi-squared statistic for testing independence, assuming the model, is 9.38 - 2.64 = 6.74 with df = 1 (P = 0.009). There seems to be very strong evidence that the examination grades compared with their tutor's prediction tend to be higher in the treatment group. Large-sample approximations are questionable
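The arithmetic behind the likelihood-ratio comparison above (9.38 - 2.64 = 6.74 on df = 1, P = 0.009) can be checked with a short sketch. It assumes that 9.38 is the likelihood-ratio statistic for the fit of the independence model, and it uses the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable. The function name is illustrative only.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For df = 1 the variable is Z**2 with Z standard normal, so
    P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Values quoted in the text; 9.38 is assumed to be the statistic for independence.
g2_independence = 9.38
g2_linear_by_linear = 2.64

lr_diff = g2_independence - g2_linear_by_linear
p_value = chi2_sf_df1(lr_diff)

print(f"LR difference = {lr_diff:.2f}, asymptotic P-value = {p_value:.3f}")
```

This reproduces only the large-sample P-value; it says nothing about how well the chi-squared approximation holds for a table with 32 observations.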




Full Text


IMPROVED EXACT METHODS FOR STATISTICAL INFERENCE IN CONTINGENCY TABLES

By

DONGUK KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1994


© Copyright 1994 by Donguk Kim


To my wife, daughter and my parents


ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Dr. Alan Agresti. Without his guidance and encouragement, this work would not have been completed. I would like to thank Dr. Mark Yang, Dr. Myron Chang, Dr. Brett Presnell, and Dr. David Wilson for their encouragement and advice while serving on my dissertation committee. In my six years as a student here, I learned from all professors. I would also like to thank Dr. Yang and Dr. Randles for all the support while I worked as a consultant in the Biostatistics Division and as a teaching assistant. Also my thanks go to all my colleagues and friends. Finally, I wish to express my special thanks to my family, especially my wife, YoungHee, for her love, patience, and encouragement, and my daughter, Minjee, for her love. Furthermore, I would like to thank my parents for their love, encouragement, and support.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 Literature Review
  1.2 Summary of Dissertation Work

2 IMPROVED EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION
  2.1 Introduction
  2.2 A Less Conservative P-value
  2.3 A Less Conservative “Exact” Confidence Interval
  2.4 Alternative Modifications of “Exact” Confidence Intervals
  2.5 Connections with Logistic Regression
  2.6 Discussion

3 APPROXIMATING EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION
  3.1 Introduction
  3.2 Tests of Conditional Independence Assuming No Three-factor Interaction
  3.3 Tests of Conditional Independence Permitting Three-factor Interaction
  3.4 The Construction of the Modified Exact P-value
  3.5 Approximation of Exact P-values
  3.6 Examples
  3.7 FORTRAN Program for Simulation

4 IMPROVED EXACT TESTS FOR ORDINAL VARIABLES IN I x J x K TABLES
  4.1 Introduction
  4.2 Basic Results in Two-way Contingency Table
  4.3 Unbiasedness of Tests in Three-way Contingency Tables
  4.4 Complete Class of Tests
  4.5 Admissible Tests
  4.6 Exact, Unbiased and Admissible Tests
  4.7 Example
  4.8 Discussion

5 CONCLUSION
  5.1 Discussion
  5.2 Future Research

APPENDICES

A SOURCE CODE FOR EXACT INFERENCE
B SOURCE CODE FOR SIMULATION
  B.1 Program Structure
  B.2 Part of Source Code

REFERENCES
BIOGRAPHICAL SKETCH


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMPROVED EXACT METHODS FOR STATISTICAL INFERENCE IN CONTINGENCY TABLES

By

Donguk Kim

August 1994

Chairman: Alan Agresti
Major Department: Statistics

Ordinary “exact” methods can be highly conservative when the distribution of the test statistic is discrete. This becomes more severe when the number of dimensions or the number of categories is small. We improve exact inferential methods by decreasing the conservativeness that occurs due to discreteness.

In this dissertation, modifications of exact inferential methods are suggested for conditional associations in three-way contingency tables. For testing conditional independence, we present a modified P-value. It utilizes both the usual test statistic and, at the observed value of that statistic, a supplementary statistic directed toward a broader alternative. For 2 x 2 x K tables, we propose modified “exact” confidence intervals for an assumed common odds ratio based on inverting two separate one-sided tests using the modified P-value. We also present an alternative and usually even better way of constructing “exact” confidence intervals, based on inverting a two-sided test with a modified P-value.

For I x J x K tables, we discuss exact tests of conditional independence using six test statistics that have connections with loglinear models. Three statistics assume a lack of three-factor interaction, and the other three statistics do not require this


assumption. All six statistics are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Then, we discuss possible alternative ways of forming modified exact P-values in I x J x K contingency tables, and we propose modified exact P-values for six tests corresponding to six loglinear models. For three-way contingency tables, computational algorithms have limited availability for tests of conditional independence when I and J exceed two. We use a simulation algorithm to obtain precise estimates of ordinary and modified exact P-values for cases for which the current computational algorithms are infeasible.

For I x J x K tables, we show how to construct exact, unbiased, and admissible tests for an ordinal alternative to conditional independence by using a modified P-value approach. This is a generalization of the results of Cohen and Sackrowitz for a test of independence in two-way contingency tables for an ordinal alternative. The ordinary test of conditional independence for 2 x 2 x K contingency tables is usually inadmissible.


CHAPTER 1
INTRODUCTION

1.1 Literature Review

Statistical inference for contingency tables generally is carried out by large-sample approximations for sampling distributions of the test statistic rather than the exact discrete distribution. A central concern is the quality of the asymptotic approximation. Large-sample approximations apply as the sample size grows, for a fixed number of cells. The adequacy of the chi-square approximation depends on both the sample size and the number of cells. Some contingency tables occur where the sample size is too small to apply asymptotic methods. Also, high-dimensional contingency tables tend to be sparse, and as a consequence the asymptotic approximation to the sampling distribution is often very poor. Agresti (1992) surveyed exact inference for contingency tables and explained the developments of exact methods for contingency tables. He suggested the use of exact methods instead of large-sample approximations when the application of asymptotic approximation is questionable.

We focus on exact inferential methods for conditional associations in three-way contingency tables. When the exact distribution of the test statistic is discrete, it is known that ordinary “exact” tests and confidence intervals can be highly conservative because of the discreteness of the distribution. Though exact tests are guaranteed to control the probability of Type I error at any nominal level, we may not achieve a probability of Type I error of the nominal level exactly. The actual probability of Type I error may be considerably smaller. For instance, in a 2 x 2 contingency table, Fisher’s


exact test is always conservative. For exact inference about a parameter of interest, we condition on sufficient statistics for unknown parameters to eliminate them. For an exact conditional test for categorical data, the reference set of tables over which the exact conditional distribution is defined is the set of contingency tables having certain marginal counts fixed. This extra conditioning makes the distribution of the test statistic more highly discrete.

Barnard (1947) proposed an unconditional exact test for 2 x 2 contingency tables. The reference set of his test is defined as the set of all tables with fixed row margins and all possible column margins. Since the column margins are not fixed, this unconditional test has many more tables in the reference set, and the distribution of the test statistic is less discrete. A disadvantage of the unconditional test is that computations are infeasible for larger tables, since maximizing over the space of nuisance parameters is needed for implementation. For further details, see Yates (1984) and Suissa and Shuster (1985).

One way to reduce conservativeness is the mid P adjustment. Let T be a test statistic and $t_o$ be its observed value. According to Lancaster (1961), the mid P adjustment utilizes half of the probability of the observed value of T; hence, it subtracts half of the probability of the observed statistic from the usual exact P-value. This reduces the conservativeness due to discreteness and does not rely on randomization to eliminate the conservativeness. One drawback is that it cannot guarantee exactness, in the sense that the actual size possibly exceeds the nominal level. This follows from the fact that the mid P approach subtracts half of the probability of the observed statistic from the exact P-value.
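As a minimal sketch of the mid P adjustment just described, the following computes the one-sided exact P-value and the mid P-value for a single 2 x 2 table with fixed margins, where the count in the first cell has a hypergeometric null distribution. The function names and the example margins are illustrative assumptions, not part of the dissertation.

```python
from math import comb


def hypergeom_pmf(x, row1, col1, n):
    """Null probability that the (1,1) cell equals x, given first-row total row1,
    first-column total col1, and grand total n."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def exact_and_mid_p(n11, row1, col1, n):
    """One-sided (upper-tail) exact P-value and mid P-value for the observed n11."""
    hi = min(row1, col1)  # largest value the (1,1) cell can take
    p_observed = hypergeom_pmf(n11, row1, col1, n)
    p_more_extreme = sum(hypergeom_pmf(x, row1, col1, n) for x in range(n11 + 1, hi + 1))
    exact_p = p_more_extreme + p_observed
    mid_p = p_more_extreme + 0.5 * p_observed  # only half of the observed probability
    return exact_p, mid_p


# Hypothetical table with all margins equal to 4 and observed n11 = 3.
exact_p, mid_p = exact_and_mid_p(n11=3, row1=4, col1=4, n=8)
print(f"exact P = {exact_p:.4f}, mid P = {mid_p:.4f}")
```

For these margins the two values are 17/70 and 9/70, so the mid P-value is smaller by exactly half the probability of the observed table.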
For nonparametric tests, Streitberg and Roehmel (1990) considered utilizing a secondary statistic together with the usual statistic to discriminate among those rank configurations that have the same value of the primary statistic. They showed that their test is uniformly more powerful than the Wilcoxon-Mann-Whitney test, and the


P-value of this test employing any secondary statistic cannot be larger than the P-value from the ordinary test. A similar approach to reduce the conservativeness is due to Cohen and Sackrowitz (1992). They suggested a modified P-value that utilizes both the usual test statistic and, at the observed value of that statistic, the null table probability for a secondary partitioning of those tables having $T = t_o$. Instead of including all tables having $T = t_o$ in the calculation of the P-value, they include tables that are no more likely than the observed. They used this for ordinal tests in two-way tables.

Discreteness also affects interval estimation. An “exact” confidence interval for a parameter can be constructed by inverting the exact conditional test. The ordinary confidence interval (Cox 1970, Gart 1970, Mehta et al. 1985, Vollset et al. 1991) is based on inverting two separate one-sided tests using the ordinary P-value. Because of discreteness, we get a conservative confidence interval. The actual confidence coefficient is at least the nominal level. We could construct an exact confidence interval based on inverting a single two-sided test rather than two separate one-sided tests. Using a two-sided approach, Sterne (1954) constructed a confidence interval for a single binomial parameter, and Baptista and Pike (1977) constructed confidence limits for the odds ratio in a 2 x 2 table. This two-sided confidence interval also is conservative.

Some problems arise when exact methods are infeasible and the application of large-sample approximations is questionable. For large-sample inference about conditional association in three-way contingency tables, Mantel and Haenszel (1959) gave a test statistic comparing two groups on a binary response, adjusting for control variables. Since Cochran (1954) proposed a similar statistic, it is called the Cochran-Mantel-Haenszel statistic. This is a test for conditional independence in 2 x 2 x K tables.
Also, Birch (1964) showed that under the assumption of a constant odds ratio within each of the tables, this test is uniformly most powerful unbiased.
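The Cochran-Mantel-Haenszel statistic mentioned above can be sketched as follows. This is the standard textbook form without a continuity correction, which is an assumption here since the text does not give the formula: within each stratum the (1,1) count is compared with its hypergeometric mean, and the differences and hypergeometric variances are summed across strata. The function name and the data are hypothetical.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel statistic (no continuity correction) for a list of
    2 x 2 tables [[n11, n12], [n21, n22]], one per stratum."""
    diff_sum = 0.0
    var_sum = 0.0
    for (n11, n12), (n21, n22) in tables:
        r1, r2 = n11 + n12, n21 + n22
        c1, c2 = n11 + n21, n12 + n22
        n = r1 + r2
        if n < 2:
            continue  # a stratum with fewer than two observations carries no information
        diff_sum += n11 - r1 * c1 / n                    # observed minus null mean
        var_sum += r1 * r2 * c1 * c2 / (n * n * (n - 1))  # hypergeometric variance
    if var_sum == 0.0:
        raise ValueError("all strata have degenerate margins")
    return diff_sum ** 2 / var_sum


# Two hypothetical strata.
strata = [[[3, 1], [1, 3]], [[2, 2], [1, 3]]]
cmh = cmh_statistic(strata)
print(f"CMH = {cmh:.4f}")  # compared with a chi-squared distribution with df = 1
```

Because the differences are summed before squaring, associations in opposite directions across strata cancel, which is why the statistic is most powerful when the odds ratio is roughly constant.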


Birch (1965) derived three test statistics for testing the null hypothesis of conditional independence of two variables in I x J x K contingency tables. These are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. These models assume a lack of three-factor interaction. When both classifications are nominal, the corresponding statistic is a generalized Cochran-Mantel-Haenszel test statistic that handles more than two groups or more than two responses. This method involves computing the expected values and the covariance matrix under the multiple hypergeometric probability model for each of the tables. These quantities then are summed across the tables, and a quadratic form of the test statistic is generated. When both classifications are ordinal, the corresponding statistic is the same as Mantel's (1963) score statistic. Furthermore, Birch's statistics are special cases of a general statistic proposed by Landis et al. (1978). These statistics have an asymptotic chi-squared distribution.

Rather than use large-sample approximations, we wish to conduct exact inference. Even though recent developments make exact methods feasible for some inferential analyses, because of computational complexity, we do not have exact methods for some situations. For three-way contingency tables, current computational algorithms for exact methods are restricted to certain analyses for 2 x J x K tables with ordered columns.

The Monte Carlo method is another alternative to either exact or asymptotic methods. This method is based on estimating the exact conditional sampling distribution of the statistic by generating random tables having the relevant fixed margins. It is useful for those situations where the data set is too large for an exact computation or too sparse to rely on the asymptotic theory.
For table generation by simulating from a hypergeometric distribution, Boyett (1979) wrote a program that generates a two-way random table from the exact distribution with given row and column totals.
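The Monte Carlo idea described above can be illustrated with a small sketch. It does not reproduce Boyett's or Patefield's programs; it assumes a simple permutation scheme (shuffle the column labels of the n observations against fixed row labels), which draws two-way tables with the given margins from the null conditional distribution. The test statistic is taken to be Pearson's chi-squared, and all names and data are illustrative.

```python
import math
import random


def random_table(row_totals, col_totals, rng):
    """Draw one table with the given margins by randomly pairing row and column labels."""
    rows = [i for i, r in enumerate(row_totals) for _ in range(r)]
    cols = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(cols)
    table = [[0] * len(col_totals) for _ in row_totals]
    for i, j in zip(rows, cols):
        table[i][j] += 1
    return table


def pearson_x2(table):
    """Pearson chi-squared statistic for independence in a two-way table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            if expected > 0:
                x2 += (table[i][j] - expected) ** 2 / expected
    return x2


def monte_carlo_p(observed, n_sim=10000, seed=1):
    """Estimate the exact conditional P-value and a 95% half-width for the estimate."""
    rng = random.Random(seed)
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    t_obs = pearson_x2(observed)
    hits = 0
    for _ in range(n_sim):
        if pearson_x2(random_table(row_totals, col_totals, rng)) >= t_obs - 1e-9:
            hits += 1
    p_hat = hits / n_sim
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n_sim)
    return p_hat, half_width


p_hat, half_width = monte_carlo_p([[3, 1], [1, 3]])  # hypothetical 2 x 2 table
print(f"estimated exact P = {p_hat:.3f} +/- {half_width:.3f}")
```

For this small table the exact P-value is available by enumeration (34/70), so the estimate can be checked directly; the method matters for tables where enumeration is infeasible.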


Patefield (1981) presented a program generating a random table, and his program is faster than Boyett's for larger sample sizes. Agresti et al. (1979) utilized the Monte Carlo method effectively for a variety of tests for two-way tables. Even for large tables or large sample sizes, one can quickly approximate as closely as needed the ordinary and modified exact P-values for these statistics. This method consists of sampling contingency tables from the conditional reference set in proportion to their probabilities and computing an unbiased point estimate and a narrow confidence interval for an exact P-value.

When we construct a critical region for exact tests with some preassigned nominal level $\alpha$, supplementary randomization would be required at the boundary of the critical region in order to achieve the nominal size. This is typical for any discrete problem. After randomization, the resulting test may be inadmissible. Cohen and Sackrowitz (1991) focused on two-way tables and showed unbiasedness for the test of independence in two-way tables for an ordinal alternative. Eaton (1970) characterized the essentially complete class in an exponential family. Eaton's theorem shows that the essentially complete class consists of tests whose acceptance regions are convex, with possible randomization on the boundary of the acceptance region. Furthermore, Ledwina (1978a, 1984) gave the class of admissible rules in an exponential family. Using the same argument as Ledwina, Cohen and Sackrowitz (1991) proved a theorem that gives the class of exact, unbiased, and admissible tests in two-way contingency tables. They constructed the exact test of size $\alpha$ by ordering the tables according to their probabilities on sample points where the test would randomize. They made the number of tables on which randomization would occur considerably smaller than in the usual test.


1.2 Summary of Dissertation Work

In Chapter 2, we present exact tests of conditional independence against the alternative of no three-factor interaction. Our modified exact tests are adaptations of the ordinary exact conditional tests that are less conservative. We propose a modified P-value based on a secondary partitioning of the sample space beyond that generated by the test statistic. It utilizes both the usual test statistic and, at the observed value of that statistic, a supplementary statistic T' directed toward a broader alternative. In the calculation of the P-value, we include only those tables that are at least as contradictory to the null in terms of T'. One can calculate this modified P-value for any test statistic having a discrete distribution. The modified P-value is less discrete than the ordinary P-value, does not employ randomization, and leads to a less conservative “exact” test.

By inverting results of tests using modified P-values, we obtain an exact and less conservative confidence interval, in the sense that the modified confidence interval has confidence coefficient at least the nominal level and is narrower than the ordinary one. For 2 x 2 x K tables, we suggest a modified “exact” confidence interval inverting the test based on a modified one-sided P-value to make the actual confidence coefficient closer to the nominal value. Also, we present an alternative and usually even better way of constructing “exact” confidence intervals, based on inverting a two-sided test with a modified P-value. Furthermore, we utilize the mid P-value to construct intervals applying these methods, although these are not exact. To compare these types of intervals, we calculate actual coverage probability or expected length of the confidence intervals based on inverting one-sided or two-sided tests using the ordinary or modified P-value.
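The modified P-value summarized above can be sketched by brute-force enumeration for a small 2 x 2 x K table. The sketch assumes a one-sided test with primary statistic T equal to the sum of the (1,1) counts over strata and secondary statistic T' equal to the sum of the per-stratum Pearson statistics (the choice discussed in Chapter 2). It also assumes every stratum has positive margins. Function names and data are hypothetical, and full enumeration is practical only for small tables.

```python
from itertools import product
from math import comb


def stratum_support(table):
    """All (n11, null probability, Pearson X^2) triples for one 2 x 2 stratum
    with its row and column totals held fixed."""
    (a, b), (c, d) = table
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    n = r1 + r2
    expected = r1 * c1 / n
    support = []
    for x in range(max(0, r1 + c1 - n), min(r1, c1) + 1):
        prob = comb(c1, x) * comb(c2, r1 - x) / comb(n, r1)
        # With fixed margins, X^2 = n^3 (x - expected)^2 / (r1 r2 c1 c2).
        x2 = (x - expected) ** 2 * n ** 3 / (r1 * r2 * c1 * c2)
        support.append((x, prob, x2))
    return support


def ordinary_and_modified_p(tables, tol=1e-9):
    """Return (P, P*) with P = Pr(T >= t_o) and
    P* = Pr(T > t_o) + Pr(T = t_o, T' >= t'_o) under conditional independence."""
    supports = [stratum_support(t) for t in tables]
    observed = [next(e for e in s if e[0] == t[0][0]) for t, s in zip(tables, supports)]
    t_obs = sum(e[0] for e in observed)
    t2_obs = sum(e[2] for e in observed)
    p_greater = p_equal = p_equal_refined = 0.0
    for combo in product(*supports):  # strata are independent given the margins
        prob = 1.0
        for _, p, _ in combo:
            prob *= p
        t = sum(e[0] for e in combo)
        t2 = sum(e[2] for e in combo)
        if t > t_obs:
            p_greater += prob
        elif t == t_obs:
            p_equal += prob
            if t2 >= t2_obs - tol:
                p_equal_refined += prob
    return p_greater + p_equal, p_greater + p_equal_refined


# Two hypothetical strata whose associations point in opposite directions.
data = [[[4, 0], [0, 4]], [[1, 3], [3, 1]]]
p_ordinary, p_modified = ordinary_and_modified_p(data)
print(f"ordinary P = {p_ordinary:.4f}, modified P* = {p_modified:.4f}")
```

In this constructed example many tables share the observed value of T but have a smaller T', so the modified P-value (393/4900) is far below the ordinary one (1545/4900). When all tables tied on T also tie on T', the two coincide.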


In Chapter 3, we suggest exact inference regarding conditional associations in three-way contingency tables. For exact tests of conditional independence in I x J x K tables, three statistics assuming a lack of three-factor interaction are discussed, and then we provide three other test statistics permitting three-factor interaction. All six test statistics are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Also they have asymptotic chi-squared distributions. Using these statistics, we propose modified exact P-values for six tests for testing conditional independence with I x J x K tables. For cases that are currently computationally infeasible, we construct a simulation algorithm to obtain precise estimates of ordinary and modified exact P-values, using a table-generation procedure suggested by Patefield (1981). We utilize six test statistics for exact tests of conditional independence.

In Chapter 4, we generalize results of Cohen and Sackrowitz (1991, 1992) to construct exact, unbiased, and admissible tests for an ordinal alternative to conditional independence for I x J x K tables. We first show unbiasedness of tests when one wishes to test a null hypothesis of conditional independence against the alternative of the no three-factor interaction model in three-way contingency tables. Then we present the complete class of tests and admissible tests in an exponential family following Eaton (1970) and Ledwina (1978a, 1984). Using these arguments, we generalize to the three-way case some results of Cohen and Sackrowitz regarding admissibility of tests for two-way tables. Combining these, we have a theorem that gives the class of exact, unbiased, and admissible tests in three-way contingency tables. With this theorem, we discuss how to construct unbiased tests and how to set up critical regions to obtain tests of conditional independence of fixed size $\alpha$, for an ordinal alternative.
We construct the exact test of size $\alpha$ by ordering the tables according to a secondary statistic directed toward a broader alternative hypothesis at the randomization points, utilizing the modified approach discussed in Chapter 2. By


using the modified approach, the resulting test is admissible after randomization, and it requires less randomization than usual. Also, we have actual size closer to the nominal value.

The Appendix contains a FORTRAN program. Using this program, one can easily get ordinary and modified exact inference about conditional associations for 2 x 2 x K contingency tables.


CHAPTER 2
IMPROVED EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION

2.1 Introduction

When a test statistic has a discrete distribution, ordinary “exact” tests and confidence intervals can be highly conservative due to discreteness. If we conduct a test using some preassigned size $\alpha$, the probability of Type I error is always less than or equal to that preassigned value. If one constructs an “exact” confidence interval with confidence coefficient $1 - \alpha$, the actual confidence coefficient is at least that level and is unknown (Neyman 1935).

We wish to improve ordinary exact inferential methods by decreasing the conservativeness that occurs due to discreteness. In this chapter, we suggest modifications of exact inferential methods for conditional associations in 2 x 2 x K contingency tables. For instance, we present an example of a 2 x 2 x 5 table for which the ordinary 95% confidence interval for an assumed common odds ratio is (1.1, 531.5). The discreteness implies that 0.95 is a lower bound for the actual confidence coefficient. We show how to construct a modified confidence interval that also has the guarantee of at least 95% confidence, but takes the much shorter range (2.1, 67.3).

Our approach is applicable for any contingency table of size larger than 2 x 2, but we illustrate the arguments in terms of inferences about conditional associations in 2 x 2 x K contingency tables. The ideas and notation apply throughout the dissertation. In this chapter we focus on 2 x 2 x K contingency tables.
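The conservativeness described above can be made concrete with a small sketch that computes the actual (conditional) size of a one-sided Fisher exact test at a nominal level. The margins are hypothetical and the function name is illustrative; the calculation is conditional on the margins of a single 2 x 2 table.

```python
from math import comb


def actual_size(row1, col1, n, alpha):
    """Conditional probability, under the null, that the one-sided exact P-value
    Pr(n11 >= x) is at most alpha, for a 2 x 2 table with fixed margins."""
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    pmf = {x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
           for x in range(lo, hi + 1)}
    size = 0.0
    for x in pmf:
        p_value = sum(pmf[y] for y in pmf if y >= x)
        if p_value <= alpha:
            size += pmf[x]  # this table falls in the rejection region
    return size


size = actual_size(row1=4, col1=4, n=8, alpha=0.05)
print(f"nominal size = 0.05, actual size = {size:.4f}")
```

With all margins equal to 4, only the most extreme table has a P-value below 0.05, so the actual size is 1/70, well under the nominal level.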


For three-way tables, consider the hypothesis of conditional independence of two variables, given the third one. For instance, if $\{\pi_{ijk}\}$ denote probabilities for a multinomial distribution over the I x J x K cells, where $\sum_i \sum_j \sum_k \pi_{ijk} = 1$, the hypothesis states that

\[ \pi_{ijk} = \pi_{i+k}\,\pi_{+jk} / \pi_{++k}. \]

The subscript “+” denotes the sum over the index it replaces. Let $N = \{n_{ijk}\}$ denote the cell counts, with expected frequencies $\{m_{ijk}\}$. We discuss exact conditional tests of this hypothesis, generalizing Fisher’s exact test for 2 x 2 tables. We also discuss confidence intervals for odds ratios pertaining to conditional association.

Let X denote the row classification, Y the column classification, and Z the layer classification. The hypothesis of conditional independence of X and Y, given Z, is usually tested against the alternative of no three-factor interaction. This alternative is the loglinear model of form

\[ \log m_{ijk} = \mu + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}, \tag{2.1} \]

having sufficient statistics $(\{n_{ij+}\}, \{n_{i+k}\}, \{n_{+jk}\})$. The null hypothesis corresponds to the special case of this model in which all $\lambda_{ij}^{XY} = 0$. Exact conditional tests utilize the distribution of the sufficient statistics for these parameters, conditional on the other sufficient statistics, which relate to the remaining parameters. For the case of a 2 x 2 x K table, for instance, one uses the distribution of $\sum_k n_{11k}$, conditional on the row totals $\{n_{i+k}\}$ and column totals $\{n_{+jk}\}$ for the partial tables (Birch 1964). The parameter of interest for estimation is the assumed common odds ratio for each 2 x 2 table.

We present exact tests of conditional independence for the alternative of no three-factor interaction. Our modified exact tests are adaptations of the ordinary exact conditional tests that are less conservative. They use a modified P-value based on a secondary partitioning of the sample space beyond that generated by the test statistic. It utilizes both the usual test statistic and, at the observed value of that statistic,


a supplementary statistic directed toward a broader alternative. A modified P-value is less discrete than the ordinary P-value and leads to less conservative “exact” tests. By inverting results of tests using modified P-values, we have an exact and less conservative confidence interval, in the sense that a modified confidence interval has confidence coefficient at least the nominal level, and it is narrower than the ordinary one.

Section 2 introduces the modified P-value and shows that its distribution can be much less discrete than that of the ordinary P-value. We compare the ordinary and modified P-values with examples. Furthermore, the null expected value of the P-value is discussed in both procedures in order to examine the degree of conservativeness. Section 3 discusses modified “exact” confidence intervals, based on inverting two one-sided tests using the modified P-value. Though they are also conservative, they may be much narrower than the usual one. Illustrations are given for estimating an assumed common odds ratio for several 2 x 2 tables. Section 4 presents an alternative and usually even better way of constructing “exact” confidence intervals, based on inverting a two-sided test with a modified P-value. Section 5 discusses some related results for logistic regression models, and Section 6 gives some comments.

2.2 A Less Conservative P-value

Suppose we would like to conduct an exact conditional test for categorical data using some preassigned size $\alpha$, such as 0.05. Denote by $\Gamma$ the set of contingency tables having the same marginal counts as the ones that are fixed by the conditioning argument for the exact conditional test. This is the set of tables over which the exact conditional distribution is defined. For the test of conditional independence,


for instance, $\Gamma$ is the set of I x J x K tables of nonnegative integers,

\[ \Gamma = \Big\{ Z : \sum_i z_{ijk} = n_{+jk},\ \sum_j z_{ijk} = n_{i+k},\ \text{for all } i, j, k \Big\}. \]

It is usually not possible to construct a critical region for exact conditional tests with preassigned size $\alpha$ because of the discreteness of the distribution. If an exact test is desired of arbitrary size $\alpha$, supplementary randomization would be required to make the decision about whether to reject when a table occurs at the boundary of the critical region. In practice, it is unacceptable to employ randomization, and one normally simply reports a P-value.

In general, suppose we have a test statistic T, such as a Wald, likelihood ratio, or score statistic, and suppose $t_o$ is the observed value of T. If large values of T contradict the null, the usual P-value is

\[ P = P_{H_0}(T \ge t_o), \tag{2.2} \]

the probability under the null hypothesis that T is at least $t_o$. Ordinarily, if one wants to make a decision about $H_0$, one rejects if the P-value is at most $\alpha$. The discreteness implies that the test based on the P-value is conservative in the sense that the actual size is

\[ P_{H_0}(P \le \alpha) \le \alpha \quad \text{for } 0 < \alpha < 1. \tag{2.3} \]

In the exact conditional approach, one conditions on sufficient statistics for unknown parameters in order to eliminate them. Then, the tail probability that determines the P-value does not depend on unknown parameters and can be exactly calculated. The extra conditioning reduces the set of possible test statistic values, making the distribution more highly discrete. Hence, tests of nominal size $\alpha$ based on the exact conditional P-value can be even more conservative. The actual probability of Type I error can be considerably less than the nominal value unless the sample size is reasonably large. This problem is exacerbated by the tendency of many users to put too much emphasis on testing at sacred levels such as 0.05. One can argue that one should simply report the P-value and not make comparisons to such arbitrary levels,


particularly when data are discrete. However, the discreteness also affects interval estimation.

2.2.1 The Modified “Exact” P-value

To reduce the degree of conservativeness, we suggest a modified P-value based on a less discrete distribution than that of T. The modified P-value uses a partition of the sample space that is more refined than we get using T alone. We use T to construct a primary partitioning of all tables that have the sufficient statistics fixed by the conditional test. Then, within fixed values of T, we generate a secondary partitioning using some other index T' of the degree to which the data contradict the null hypothesis. The statistic T' is a test statistic directed toward a somewhat broader alternative hypothesis, hence detecting information that may be missed by T. Let $t_o$ and $t'_o$ denote the observed values of the primary and secondary statistics. The modified P-value is defined as

\[ P^{*} = P_{H_0}(T > t_o) + P_{H_0}(T = t_o,\ T' \ge t'_o), \tag{2.4} \]

where the probabilities are computed under the null conditional distribution. Instead of including all tables having $T = t_o$ in the calculation of the P-value, we include only those that are at least as contradictory to the null in terms of having at least as large a value of T'.

To illustrate, consider testing conditional independence in 2 x 2 x K tables. Normally, if we expect about the same strength of association in each 2 x 2 stratum, we test against the alternative (2.1) of no three-factor interaction. Using this narrow alternative helps to build power compared to statistics based on the general alternative, even if we do not feel that reality exactly satisfies (2.1). Suppose we use as the primary statistic the score statistic, which is based on $T = \sum_k n_{11k}$ for the conditional


set of tables having the same row and column totals as the observed table. Then one could use the score statistic for the general alternative (the saturated model) for the secondary partitioning. This is simply $T' = \sum_k X_k^2$, where $X_k^2$ denotes the Pearson statistic for testing independence in the k-th partial table. The secondary statistic also contains information about the validity of the null hypothesis, but is directed toward a wider alternative.

Another possibility for the secondary partitioning is to use the null table probability, in which case T' can be expressed as the negative log of that probability. For a given value of T, tables that are less likely under the null are then considered to give greater evidence against the null. Let $B = \{Z : Z \in \Gamma,\ T = t_o,\ P_{H_0}(Z) \le P_{H_0}(N)\}$, where the probabilities are computed under the null. The modified P-value is then

\[ P^{*}_{p} = P_{H_0}(T > t_o) + P_{H_0}(B). \tag{2.5} \]

This modified P-value orders sample tables in $\Gamma$ according to their probabilities when $T = t_o$. Hence, it is based on the probability of the observed table as well as some test statistic. Cohen and Sackrowitz (1992) used this type of P-value for ordinal tests in two-way tables. We will compare both ways of forming modified P-values and confidence intervals based on these modified P-values, with examples. We prefer $P^{*}$ over $P^{*}_{p}$ for the modified P-value, because both T and T' are score statistics for testing conditional independence.

The setting and the statistic T in definitions (2.4) and (2.5) are arbitrary. One can calculate $P^{*}$ for any test statistic having a discrete distribution, since it satisfies $P_{H_0}(P^{*} \le \alpha) \le \alpha$ for $0 < \alpha < 1$. We show that under the null this modified P-value has the property

\[ P_{H_0}(P^{*} \le \alpha) \le \alpha \quad \text{for } 0 < \alpha < 1. \tag{2.6} \]

Let $P^{*}$ be a modified P-value and let m be a possible marginal configuration. We first show that the conditional P-value has $P_{H_0}(P^{*} \le \alpha \mid m) \le \alpha$. The result is


easily obtained by noting that the modified P-value is a special case of the usual P-value using a more refined partitioning of T and T'. The ordinary P-value uses a partitioning based on T, and it is the sum of $P_{H_0}(T = t_o)$ and the probability of more extreme values of T. The modified P-value uses a partitioning based on T and T' within T.

Let Max(·) denote the maximum value, let Min(·) denote the minimum value, and let Gap(T) denote the minimum difference between two consecutive values of T. We assume that T and T' have positive values. Define a new statistic

\[ T^{*} = T \times \mathrm{Max}(T') / \mathrm{Gap}(T) + T'. \]

If Min(T') equals 0, we transform from T' to T' + 1 in order to avoid ties in $T^{*}$. Then, $T^{*}(Z_1) > T^{*}(Z_2)$ for all tables $Z_1, Z_2$ with $T(Z_1) > T(Z_2)$. Let $t^{*}_o$ denote the value of $T^{*}$ for the observed table. Note that a partitioning of the sample space using T and T' within T is equivalent to a partitioning of the sample space using $T^{*}$. Since there are no ties, ordering tables using T and T' within T is equivalent to ordering tables using $T^{*}$. Then, the sum of the probability that T' is at least $t'_o$ at $T = t_o$ and the probability of more extreme values of T is equivalent to the sum of $P_{H_0}(T^{*} = t^{*}_o)$ and the probability of more extreme values of $T^{*}$. That is,

\[ P^{*} = P_{H_0}(T > t_o) + P_{H_0}(T = t_o,\ T' \ge t'_o) = P_{H_0}(T^{*} > t^{*}_o) + P_{H_0}(T^{*} = t^{*}_o). \]

Hence, the modified P-value is a special case of the usual P-value with a more refined partitioning, and we have $P_{H_0}(P^{*} \le \alpha \mid m) \le \alpha$. Then, under the null,

\[ P_{H_0}(P^{*} \le \alpha) = E[P_{H_0}(P^{*} \le \alpha \mid m)] \le \alpha, \tag{2.7} \]

since the average of these conditional probabilities over all possible marginal configurations is less than or equal to $\alpha$. Thus, we have shown that the probability of Type I error is no greater than the nominal value.

The modified P-values cannot be larger than the ordinary P-values, so the test based on them is less conservative in the sense that the actual size is closer to the nominal

value. Also, the sampling distribution of the modified P-value is less discrete than usual in the sense that its support can have considerably more points. When each table with a particular statistic value $T$ has the same value of $T'$, then $P^*$ is the same as the usual exact P-value. As a special case, when there is only one table having each distinct value of $T$, such as in Fisher’s exact test, they are identical. Note that if $T$ is a score or Wald or likelihood-ratio statistic for a particular alternative, it does not help to take $T'$ to be one of the other statistics for that same alternative. Because these tests all depend only on the sufficient statistics under the alternative, two tables that have the same value of $T$ also have the same value of $T'$, when $T$ and $T'$ are taken from these procedures. Thus, we base $T'$ on a more general alternative, for which the extra sufficient statistic provides a finer partitioning.

When a test statistic has a continuous distribution, the P-value has a uniform(0,1) null distribution. Hence, for the continuous case the expected value of the P-value is 1/2. We prove now that in the discrete case the expected value of $P$ under the null is at least 1/2. For an arbitrary random variable $X$ (Mood, Graybill and Boes 1974, page 65),

$$EX = \int_0^\infty [1 - F_X(x)]\,dx - \int_{-\infty}^0 F_X(x)\,dx = \int_0^\infty [1 - \Pr(X \le x)]\,dx - \int_{-\infty}^0 \Pr(X \le x)\,dx.$$

Thus, $EP = \int_0^1 [1 - \Pr(P \le p)]\,dp$. Since, from (2.6), $1 - \Pr(P \le p) \ge 1 - p$ for $0 < p < 1$,

$$EP \ge \int_0^1 [1 - p]\,dp = \frac{1}{2}.$$

In the discrete case, the P-value is stochastically larger than the uniform, and its expected value exceeds 1/2. Hence, we can describe the degree of conservativeness by

comparing $E_{H_0}P$ to 0.5. If the expected value exceeds 0.5 by much, the conservativeness is severe.

2.2.2 The Modified Mid P-value

The mid P-value (Lancaster 1961) is another alternative to the usual P-value that many statisticians have recommended as a way of compromising between having a conservative test and using supplementary randomization (e.g., Barnard 1990). It is defined by

$$P_{\mathrm{mid}} = P_{H_0}(T > t_o) + (1/2)\,P_{H_0}(T = t_o).$$

It subtracts half of the probability of the observed statistic from the usual exact P-value. The mid P-value has the appealing property that its null expected value for a discrete distribution equals exactly 1/2, the expected P-value for a continuous distribution. A disadvantage is that a test based on it is no longer “exact,” the actual size possibly exceeding the nominal value.

The mid P-value assigns weight 1/2 to probabilities of all tables comparable to the observed table in the sense that $T = t_o$. For the modified P-value (2.4), the comparable tables are those with $T = t_o$ and $T' = t'_o$. Thus, we can define a mid P version of the modified P-value by

$$P^*_{\mathrm{mid}} = P^* - (1/2)\,P_{H_0}(T = t_o,\ T' = t'_o). \tag{2.8}$$

Like the ordinary mid P-value, the modified mid P-value has null expected value equal to 1/2. The result is easily obtained by noting that the modified mid P-value is a special case of the usual mid P-value using a more refined partitioning of $T$ and $T'$. The ordinary mid P-value uses a partitioning based on $T$, and it is the sum of half of $P_{H_0}(T = t_o)$ and the probability of more extreme values of $T$. The modified

mid P-value uses a partitioning based on $T$ and $T'$ within $T$. We assume that $T$ and $T'$ have positive values. Let Gap($T$) denote the minimum difference between two consecutive values of $T$. Define a new statistic $T^* = T \times \mathrm{Max}(T')/\mathrm{Gap}(T) + T'$. If Min($T'$) equals 0, we transform from $T'$ to $T' + 1$ in order to avoid ties in $T^*$. Then, $T^*(Z_1) > T^*(Z_2)$ for all tables $Z_1, Z_2$ with $T(Z_1) > T(Z_2)$. Let $t_o^*$ denote the value of $T^*$ for the observed table. Note that a partitioning of the sample space using $T$ and $T'$ within $T$ is equivalent to a partitioning of the sample space using $T^*$. Since there are no ties, ordering tables using $T$ and $T'$ within $T$ is equivalent to ordering tables using $T^*$. Then, the sum of half of $P_{H_0}(T = t_o,\ T' = t'_o)$ and the probability of more extreme values of $T'$ at $T = t_o$ and more extreme values of $T$ is equivalent to the sum of half of $P_{H_0}(T^* = t_o^*)$ and the probability of more extreme values of $T^*$. That is,

$$P^*_{\mathrm{mid}} = P_{H_0}(T > t_o) + P_{H_0}(T = t_o,\ T' > t'_o) + (1/2)\,P_{H_0}(T = t_o,\ T' = t'_o) = P_{H_0}(T^* > t_o^*) + (1/2)\,P_{H_0}(T^* = t_o^*).$$

Hence, the modified mid P-value is a special case of the mid P-value with a more refined partitioning, and its null expected value is equal to 1/2. Also, the difference between the modified P-value and modified mid P-value is no greater than the difference between the ordinary P-value and ordinary mid P-value. That is, $(P^* - P^*_{\mathrm{mid}}) \le (P - P_{\mathrm{mid}})$.

2.2.3 Examples

We consider the test of conditional independence in three-way contingency tables under the assumption of no three-factor interaction. We will illustrate the ordinary and modified P-values using $2 \times 2 \times 5$ and $2 \times 2 \times 18$ contingency tables. For $2 \times 2 \times K$ tables, the exact test utilizes the test statistic $T = \sum_k n_{11k}$, given

$\{n_{1+k}, n_{2+k}, n_{+1k}, n_{+2k}\}$. It assumes homogeneity of the odds ratios in the $2 \times 2 \times K$ contingency tables. For modified P-values, we can utilize the table probability, $P(Z)$, for the secondary statistic $T'$. In the examples we utilize $\sum_k X_k^2$ for $T'$ in (2.4).

We illustrate the modified P-values (2.4) and (2.5) using Table 2.1, taken from Mantel (1963). It refers to the effectiveness of immediately injected or 1½-hour-delayed penicillin in protecting rabbits against lethal injection with β-hemolytic streptococci. Let $P$ = penicillin level, $D$ = delay, and $C$ = whether cured. Under the assumption of a constant odds ratio $\theta$ between $D$ and $C$ at each level of $P$, we test $H_0: \theta = 1$ against $H_a: \theta > 1$. Our alternative is the higher cure rate for immediate injection. For the first and last table, the zero marginal count implies that the conditional distribution of $n_{11k}$ is degenerate, and the table makes no contribution to the test. Therefore, we can conduct the test using the three remaining tables. The test statistic is $T = \sum_k n_{11k}$, given marginal totals of row and column variables at each level of the third one. For these tables, $t_o = 14$, and the four tables with $T \ge 14$ are $\{(n_{111}, n_{112}, n_{113}) = (3,6,6), (2,6,6), (3,5,6), (3,6,5)\}$. The values of $T'$ for these four tables are 11.09, 7.54, 6.59, and 11.09, respectively. Among them, the observed table is (3,6,5). The ordinary exact P-value is $P = P_{H_0}(T \ge 14) = (2 + 9 + 16 + 2)/1452 = 0.0200$. The modified exact P-values are $P^* = P_t^* = (2 + 2)/1452 = 0.0028$, the null probability for the tables $\{(3,6,6), (3,6,5)\}$.

For another example, we consider Table 2.2, the “crying babies” data given by Cox (1970, p. 5), a $2 \times 2 \times 18$ table. On each of 18 days, babies not crying at a specific time in a hospital ward served as subjects. On each day one baby chosen at random formed the experimental group, and the remainder were controls. Babies were identified as crying or not at the end of a specific period.
For these tables, the observed values are $t_o = 15$ and $t'_o = 17.2601$, and the P-values are $P = 0.045$, $P^* = 0.024$, and $P_t^* = 0.021$.
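
The Table 2.1 calculation can be checked by brute-force enumeration. The following is a minimal sketch, not the FORTRAN program used in this work: it assumes the three informative strata are encoded by their margins (the names `MARGINS`, `OBSERVED`, `support`, `weight`, and `pearson` are illustrative only), and it uses hypergeometric weights written in terms of the row totals, which are proportional to the binomial products used elsewhere in this chapter.

```python
from itertools import product
from math import comb, isclose

# Informative strata of Table 2.1 (penicillin levels 1/4, 1/2, 1).
# Each entry is (n_1+k, n_2+k, n_+1k): row totals and the "cured" column total.
MARGINS = [(6, 6, 3), (6, 6, 8), (6, 6, 11)]
OBSERVED = (3, 6, 5)  # observed n_11k in each stratum


def support(r1, r2, c1):
    """Possible values of n_11k given the margins of one 2x2 table."""
    return range(max(0, c1 - r2), min(r1, c1) + 1)


def weight(x, r1, r2, c1):
    """Unnormalised null (hypergeometric) probability of n_11k = x."""
    return comb(r1, x) * comb(r2, c1 - x)


def pearson(x, r1, r2, c1):
    """Pearson chi-squared statistic of the 2x2 table with n_11k = x."""
    n = r1 + r2
    a, b, c, d = x, r1 - x, c1 - x, r2 - (c1 - x)
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * (n - c1))


# Every table in the reference set: (cells, T, T', unnormalised probability).
tables = []
for cells in product(*(support(*m) for m in MARGINS)):
    w = 1
    for x, m in zip(cells, MARGINS):
        w *= weight(x, *m)
    t_prime = sum(pearson(x, *m) for x, m in zip(cells, MARGINS))
    tables.append((cells, sum(cells), t_prime, w))

total = sum(w for *_, w in tables)
_, t_obs, tp_obs, w_obs = next(row for row in tables if row[0] == OBSERVED)

# Ordinary exact P-value, P* of (2.4) with T' = sum of X_k^2, and P_t* of (2.5).
p_ordinary = sum(w for _, t, _, w in tables if t >= t_obs) / total
p_star = sum(w for _, t, tp, w in tables
             if t > t_obs or (t == t_obs and tp >= tp_obs - 1e-9)) / total
p_star_t = sum(w for _, t, _, w in tables
               if t > t_obs or (t == t_obs and w <= w_obs)) / total

print(round(p_ordinary, 4), round(p_star, 4), round(p_star_t, 4))
```

Under these assumptions the three values should agree with 29/1452 and 4/1452 as quoted for Table 2.1. Full enumeration is only practical for small reference sets such as this one.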

There can be a considerable discrepancy between the behavior of the ordinary and modified “exact” P-values, the modified one having a distribution that can be much less discrete. For Table 2.1, the total number of possible P-values equals 9 for the ordinary P-value, 32 for $P^*$, and 35 for $P_t^*$. For Table 2.2, the corresponding numbers are 19, 115938, and 13110. Figure 2.1 presents the cumulative distribution functions of the ordinary exact P-value and of $P^*$ for null conditional distributions based on the fixed margins of Table 2.1. Figure 2.2 presents the analogous distributions for $P_t^*$. Also, Figures 2.3 and 2.4 display the corresponding cumulative distribution functions for null conditional distributions based on the fixed margins of Table 2.2. For Table 2.2, the modified cdf for $P^*$ or $P_t^*$ has a distribution practically indistinguishable from the uniform.

We can summarize the degree of conservativeness of each P-value using $E_{H_0}$(P-value). Using the conditional distribution based on the fixed margins of Table 2.1, $E_{H_0}P = 0.611$, $E_{H_0}P^* = 0.545$, and $E_{H_0}P_t^* = 0.542$. For Table 2.2, $E_{H_0}P = 0.576$, $E_{H_0}P^* = 0.500$, and $E_{H_0}P_t^* = 0.501$.

We now illustrate the ordinary and modified mid P-values. For the modified mid P-value, we can use $T' = \sum_k X_k^2$ or the table probability for the secondary statistic. For Table 2.1, $P_{\mathrm{mid}} = 0.011$ and $P^*_{\mathrm{mid}} = 0.002$ for both modified mid P-values, using $\sum_k X_k^2$ or the table probability. For Table 2.2, $P_{\mathrm{mid}} = 0.028$, and $P^*_{\mathrm{mid}} = 0.024$ with $T' = \sum_k X_k^2$ and 0.021 with the table probability. Figures 2.5 and 2.6 present the cumulative distribution functions of the modified exact P-value and the modified mid P-value using $T' = \sum_k X_k^2$ and the corresponding cumulative distribution functions using the table probability for $T'$, respectively, for null conditional distributions based on the margins of Table 2.1. There is a good contrast between the behavior of the modified “exact” P-value and modified mid P-value.
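
The support sizes and null expected values quoted for Table 2.1 can be reproduced by enumerating the conditional reference set. This is a rough sketch under the same encoding assumption as before (margins given as row totals and the "cured" column total; `cells`, `p_ordinary`, and `p_modified` are hypothetical helper names), restricted to $T' = \sum_k X_k^2$.

```python
from itertools import product
from math import comb

MARGINS = [(6, 6, 3), (6, 6, 8), (6, 6, 11)]  # (n_1+k, n_2+k, n_+1k)


def cells(r1, r2, c1):
    """(n_11k, null weight, Pearson X^2) for each possible 2x2 table."""
    n = r1 + r2
    out = []
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        w = comb(r1, x) * comb(r2, c1 - x)
        det = x * (r2 - c1 + x) - (r1 - x) * (c1 - x)
        out.append((x, w, n * det ** 2 / (r1 * r2 * c1 * (n - c1))))
    return out


# Null probability of each (T, T') pair; T' is rounded so equal values share a key.
prob = {}
for combo in product(*(cells(*m) for m in MARGINS)):
    t = sum(x for x, _, _ in combo)
    t_prime = round(sum(x2 for _, _, x2 in combo), 9)
    w = 1
    for _, wk, _ in combo:
        w *= wk
    prob[(t, t_prime)] = prob.get((t, t_prime), 0) + w
total = sum(prob.values())
prob = {key: w / total for key, w in prob.items()}


def p_ordinary(t):
    """Ordinary exact P-value if T = t were observed."""
    return sum(p for (u, _), p in prob.items() if u >= t)


def p_modified(t, t_prime):
    """Modified P-value (2.4) if (T, T') = (t, t_prime) were observed."""
    return sum(p for (u, up), p in prob.items()
               if u > t or (u == t and up >= t_prime))


n_ordinary = len({t for t, _ in prob})   # distinct ordinary P-values
n_modified = len(prob)                   # distinct values of P*
e_ordinary = sum(p * p_ordinary(t) for (t, _), p in prob.items())
e_modified = sum(p * p_modified(t, tp) for (t, tp), p in prob.items())

print(n_ordinary, n_modified, round(e_ordinary, 3), round(e_modified, 3))
```

Each distinct $(T, T')$ pair yields a distinct value of $P^*$ because every table has positive null probability, which is why counting keys of `prob` counts support points.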
The modified P-value never exceeds the nominal level, but the modified mid P-value can exceed it. The modified

mid P-value jumps and exceeds the nominal value before the modified P-value jumps closely to the nominal value. Figures 2.7 and 2.8 display the cumulative distribution functions of the ordinary mid P-value and the modified mid P-value using $T' = \sum_k X_k^2$ and the corresponding cumulative distribution functions using the table probability for the modified mid P-value, respectively, for the null conditional distribution based on the margins of Table 2.1. Though tests based on the ordinary and modified mid P-value are not “exact,” the gap between the actual size and the nominal level tends to be less for the modified mid P-value than for the ordinary mid P-value. One way to measure how close the cdf of $P$ is to the uniform cdf is by the measure

$$M = \int |F(x) - G(x)|\,dx,$$

where $F$ = cdf of $P$ and $G$ = uniform cdf. Using Table 2.1 with $T' = \sum_k X_k^2$, we have $M = 0.055$ for $P_{\mathrm{mid}}$, and $M = 0.022$ for $P^*_{\mathrm{mid}}$. For the exact P-values, we have $M = 0.111$ for $P$, and $M = 0.045$ for $P^*$.

Table 2.1. Example for exact analyses.

Penicillin                       Response
Level        Delay            Cured    Died
1/8          None               0        6
             1 1/2 Hour         0        5
1/4          None               3        3
             1 1/2 Hour         0        6
1/2          None               6        0
             1 1/2 Hour         2        4
1            None               5        1
             1 1/2 Hour         6        0
4            None               2        0
             1 1/2 Hour         5        0

Source: Mantel (1963)
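
The measure $M$ can be evaluated exactly for a discrete P-value, because its cdf is a step function and $|F(x) - x|$ integrates in closed form on each step. The sketch below does this for the ordinary exact and ordinary mid P-values under the margins of Table 2.1; the function names and the choice to integrate over $[0, 1]$ are assumptions of this illustration.

```python
from math import comb

MARGINS = [(6, 6, 3), (6, 6, 8), (6, 6, 11)]  # (n_1+k, n_2+k, n_+1k)


def null_distribution(margins):
    """Null distribution of T = sum_k n_11k as a dict {t: probability}."""
    dist = {0: 1}
    for r1, r2, c1 in margins:
        new = {}
        for x in range(max(0, c1 - r2), min(r1, c1) + 1):
            w = comb(r1, x) * comb(r2, c1 - x)
            for t, c in dist.items():
                new[t + x] = new.get(t + x, 0) + c * w
        dist = new
    total = sum(dist.values())
    return {t: c / total for t, c in dist.items()}


def distance_from_uniform(points):
    """Integral over [0, 1] of |F(x) - x| for the step cdf F of a discrete P-value.

    `points` is a list of (P-value, probability) pairs.
    """
    points = sorted(points)
    knots = [0.0] + [v for v, _ in points] + [1.0]
    jumps = [0.0] + [p for _, p in points]
    m, level = 0.0, 0.0               # level = value of F on the current interval
    for i in range(len(knots) - 1):
        level += jumps[i]
        a, b = knots[i], knots[i + 1]
        if a <= level <= b:           # |level - x| changes sign inside [a, b]
            m += ((level - a) ** 2 + (b - level) ** 2) / 2
        else:                         # no sign change: |level - midpoint| * width
            m += abs(level - (a + b) / 2) * (b - a)
    return m


dist = null_distribution(MARGINS)
tail = {t: sum(p for u, p in dist.items() if u >= t) for t in dist}
m_ordinary = distance_from_uniform([(tail[t], p) for t, p in dist.items()])
m_mid = distance_from_uniform([(tail[t] - p / 2, p) for t, p in dist.items()])

print(round(m_ordinary, 3), round(m_mid, 3))
```

For the ordinary exact P-value the cdf never exceeds the uniform cdf, so $M$ reduces to $E_{H_0}P - 1/2$; this gives a quick consistency check against the expected values reported earlier.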

Table 2.2. Example for exact analyses.

              Treated                 Control
Day     Not Crying   Crying     Not Crying   Crying
 1           1          0            3          5
 2           1          0            2          4
 3           1          0            1          4
 4           0          1            1          5
 5           1          0            4          1
 6           1          0            4          5
 7           1          0            5          3
 8           1          0            4          4
 9           1          0            3          2
10           0          1            8          1
11           1          0            5          1
12           1          0            8          1
13           1          0            5          3
14           1          0            4          1
15           1          0            4          2
16           1          0            7          1
17           0          1            4          2
18           1          0            5          3

Source: Cox (1970)
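
As a small check on the observed statistics for Table 2.2, the following sketch recomputes $t_o$ and $t'_o = \sum_k X_k^2$ directly from the table. The list name `CRYING` and the helper `pearson` are illustrative; the Pearson statistic is taken in its usual $2 \times 2$ form, $n(ad - bc)^2$ divided by the product of the four margins.

```python
# Rows of Table 2.2 for days 1..18:
# (treated not crying, treated crying, control not crying, control crying).
CRYING = [
    (1, 0, 3, 5), (1, 0, 2, 4), (1, 0, 1, 4), (0, 1, 1, 5), (1, 0, 4, 1),
    (1, 0, 4, 5), (1, 0, 5, 3), (1, 0, 4, 4), (1, 0, 3, 2), (0, 1, 8, 1),
    (1, 0, 5, 1), (1, 0, 8, 1), (1, 0, 5, 3), (1, 0, 4, 1), (1, 0, 4, 2),
    (1, 0, 7, 1), (0, 1, 4, 2), (1, 0, 5, 3),
]


def pearson(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


t_obs = sum(a for a, _, _, _ in CRYING)             # T = sum_k n_11k
t_prime_obs = sum(pearson(*row) for row in CRYING)  # T' = sum_k X_k^2

print(t_obs, round(t_prime_obs, 4))
```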

Figure 2.1. Two cumulative distribution functions of exact P-values (modified and ordinary) with $T' = \sum_k X_k^2$, for the margins of Table 2.1. Vertical axis: P(P-value ≤ x).

Figure 2.2. Two cumulative distribution functions of exact P-values (modified and ordinary) with $T' = P(Z)$, for the margins of Table 2.1. Vertical axis: P(P-value ≤ x).

Figure 2.3. Two cumulative distribution functions of exact P-values with $T' = \sum_k X_k^2$, for the margins of Table 2.2. Vertical axis: P(P-value ≤ x).

Figure 2.4. Two cumulative distribution functions of exact P-values with $T' = P(Z)$, for the margins of Table 2.2. Vertical axis: P(P-value ≤ x).

Figure 2.5. Cumulative distribution functions of the modified exact P-value and the modified mid P-value with $T' = \sum_k X_k^2$, for the margins of Table 2.1. Vertical axis: P(P-value ≤ x).

Figure 2.6. Cumulative distribution functions of the modified exact P-value and the modified mid P-value with $T' = P(Z)$, for the margins of Table 2.1. Vertical axis: P(P-value ≤ x).

Figure 2.7. Cumulative distribution functions of the ordinary mid P-value and the modified mid P-value with $T' = \sum_k X_k^2$, for the margins of Table 2.1. Vertical axis: P(P-value ≤ x).

Figure 2.8. Cumulative distribution functions of the ordinary mid P-value and the modified mid P-value with $T' = P(Z)$, for the margins of Table 2.1. Vertical axis: P(P-value ≤ x).

2.2.4 Software

Thomas (1975) gave the first algorithm for exact analysis of several $2 \times 2$ contingency tables. This FORTRAN program required enumeration of all possible tables in the conditional reference set; hence, it could be slow. It provided exact tests for conditional independence as well as an exact confidence interval for a common odds ratio, and computed the conditional maximum likelihood estimate. Vollset and Hirji (1991) presented a fast FORTRAN program for the exact test of conditional independence and confidence interval for a common odds ratio in several $2 \times 2$ contingency tables.

We suggest modifications of exact methods based on ordering the tables by their secondary statistic. In order to implement a modified exact test, we need to compare the secondary statistic, $T'$, of the generated table to that of the observed table, for tables such that $T = t_o$, and decide whether the table contributes to the P-values. We have modified Vollset and Hirji’s FORTRAN program to implement modified exact P-values. Also, the modified software can compute the expected value and the cumulative distribution of $P$ in both ordinary and modified procedures. The source code is listed as Appendix A.

2.3 A Less Conservative “Exact” Confidence Interval

Discreteness also affects confidence interval estimation. For the “exact” confidence interval with nominal confidence coefficient $1 - \alpha$, the actual confidence coefficient is at least that level and is unknown (Neyman 1935). Since the modified P-value is less discrete than the ordinary P-value and leads to less conservative “exact” tests, we can

reduce the conservativeness by employing the modified P-value for the construction of confidence intervals. For $2 \times 2 \times K$ tables, we suggest modified “exact” confidence intervals for an assumed common odds ratio based on inverting results of tests using the modified P-value. Such intervals have confidence coefficient guaranteed to equal at least the nominal level, but are narrower than the ordinary “exact” interval. Illustrations are given for estimating an assumed common odds ratio for several $2 \times 2$ tables.

2.3.1 The Ordinary “Exact” Confidence Interval

One can construct an exact confidence interval for a parameter by inverting the exact conditional test regarding the value of that parameter. For an ordinary exact confidence interval, one can invert the test based on the ordinary exact P-value. To illustrate, suppose we want to estimate an assumed common odds ratio, $\theta$, in a $2 \times 2 \times K$ contingency table. The conditional probability of any table in the reference set, $\Gamma$, is

$$P(\{n_{11k}\} \mid \{n_{1+k}\}, \{n_{+1k}\}, \{n_{+2k}\}; \theta) = \frac{\prod_k \binom{n_{+1k}}{n_{11k}} \binom{n_{+2k}}{n_{1+k} - n_{11k}} \theta^{n_{11k}}}{\sum_{Z \in \Gamma} \prod_k \binom{n_{+1k}}{z_k} \binom{n_{+2k}}{n_{1+k} - z_k} \theta^{z_k}}, \tag{2.9}$$

where $\{z_1, \ldots, z_K\}$ denote values of $\{n_{111}, \ldots, n_{11K}\}$ for a table in the reference set $\Gamma$. Let $\Gamma_t = \{Z : Z \in \Gamma,\ \sum_k z_k = t\}$. Ordinary exact confidence limits for the common odds ratio are constructed from the conditional distribution of $T = \sum_k n_{11k}$, that is

$$P(T = t; \theta) = \frac{c_t \theta^t}{\sum_{u = t_{\min}}^{t_{\max}} c_u \theta^u}, \tag{2.10}$$

where

$$c_t = \sum_{Z \in \Gamma_t} \prod_k \binom{n_{+1k}}{z_k} \binom{n_{+2k}}{n_{1+k} - z_k},$$

and where $t_{\min} = \sum_k \max(0,\ n_{1+k} - n_{+2k})$ and $t_{\max} = \sum_k \min(n_{1+k},\ n_{+1k})$. The ordinary interval (Cox 1970, Gart 1970, Mehta et al. 1985, Vollset et al. 1991) is based on inverting two separate one-sided tests. It equals $(\theta_-, \theta_+)$, where for $t_{\min} < t_o < t_{\max}$,

$$\text{at } \theta = \theta_-: \quad P_1(\theta) = \sum_{t \ge t_o} P(t; \theta) = \frac{\alpha}{2}, \qquad \text{at } \theta = \theta_+: \quad P_2(\theta) = \sum_{t \le t_o} P(t; \theta) = \frac{\alpha}{2}. \tag{2.11}$$

When $t_o = t_{\min}$, the lower endpoint is 0; if $t_o = t_{\max}$, the upper endpoint is $\infty$. It is easily shown that $(\theta_-(t), \theta_+(t))$ has confidence coefficient at least $100(1 - \alpha)\%$ (Mehta et al. 1985). Due to discreteness of the distribution of $T$, we have only a conservative confidence interval, and the actual confidence coefficient is unknown.

2.3.2 The Modified “Exact” Confidence Interval

To ensure that the actual confidence coefficient is closer to the nominal value and to obtain a narrower “exact” interval, one can invert the two one-sided tests based on the modified exact P-value. We illustrate this using a secondary statistic $\sum_k X_k^2(\theta)$ or the table probability to generate the secondary partitioning. In the non-null case, $T'$ is defined as

$$T'(\theta) = \sum_k X_k^2(\theta) = \sum_k \sum_i \sum_j \frac{(n_{ijk} - m_{ijk}(\theta))^2}{m_{ijk}(\theta)},$$

where $m_{ijk}(\theta)$ is the estimate of the expected cell count, assuming common odds ratio $\theta$. When $\theta = 1$, $\sum_k X_k^2(\theta)$ is the Pearson statistic for testing conditional independence.
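
The ordinary limits in (2.11) can be found numerically from the coefficients $c_t$. The sketch below is an illustration only, not the modified Vollset and Hirji program: it builds coefficients proportional to $c_t$ by convolving the strata (written with row totals, which changes $c_t$ only by a constant factor and leaves (2.10) unchanged) and solves (2.11) by bisection on $\log\theta$. The search bracket `1e-6` to `1e6` and all function names are assumptions, and the case $t_o = t_{\min}$ or $t_o = t_{\max}$ is not handled.

```python
from math import comb

MARGINS = [(6, 6, 3), (6, 6, 8), (6, 6, 11)]  # Table 2.1: (n_1+k, n_2+k, n_+1k)
T_OBS, ALPHA = 14, 0.05


def coefficients(margins):
    """Coefficients proportional to c_t in (2.10), as a dict {t: coefficient}."""
    coef = {0: 1}
    for r1, r2, c1 in margins:
        new = {}
        for x in range(max(0, c1 - r2), min(r1, c1) + 1):
            w = comb(r1, x) * comb(r2, c1 - x)
            for t, c in coef.items():
                new[t + x] = new.get(t + x, 0) + c * w
        coef = new
    return coef


def pmf(coef, theta):
    """P(T = t; theta) from (2.10)."""
    t_min = min(coef)
    terms = {t: c * theta ** (t - t_min) for t, c in coef.items()}
    total = sum(terms.values())
    return {t: v / total for t, v in terms.items()}


def solve(f, lo=1e-6, hi=1e6, iters=200):
    """Bisection on the log scale for a root of f, given a sign change on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


coef = coefficients(MARGINS)


def p1(theta):
    """P_1(theta): probability of T >= t_o, increasing in theta."""
    return sum(p for t, p in pmf(coef, theta).items() if t >= T_OBS)


def p2(theta):
    """P_2(theta): probability of T <= t_o, decreasing in theta."""
    return sum(p for t, p in pmf(coef, theta).items() if t <= T_OBS)


lower = solve(lambda th: p1(th) - ALPHA / 2)
upper = solve(lambda th: p2(th) - ALPHA / 2)
print(round(lower, 2), round(upper, 2))
```

Bisection is adequate here because $P_1(\theta)$ and $P_2(\theta)$ are strictly monotone; the modified functions need the more careful search described in the text.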

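If large values of $T'$ contradict the null, we let $B(\theta) = \{Z : Z \in \Gamma,\ T = t_o,\ T'(\theta) \ge t'_o(\theta)\}$. When the table probability is utilized, we denote $P(Z; \theta)$ as the probability of table $Z$ when the common odds ratio is $\theta$, and let $B(\theta) = \{Z : Z \in \Gamma,\ T = t_o,\ P(Z; \theta) \le P(N; \theta)\}$. The modified “exact” confidence limits are found using the functions

$$P_1^*(\theta) = \sum_{t > t_o} P(t; \theta) + P[B(\theta); \theta], \qquad P_2^*(\theta) = \sum_{t < t_o} P(t; \theta) + P[B(\theta); \theta].$$

The lower limit, $\theta_-^*$, is the smallest of all $\theta$’s to satisfy $P_1^*(\theta) \ge \alpha/2$, and the upper limit, $\theta_+^*$, is the largest of all $\theta$’s to satisfy $P_2^*(\theta) \ge \alpha/2$. When $P_1^*(\theta)$ and $P_2^*(\theta)$ are strictly monotone functions of $\theta$, the limits satisfy $P_1^*(\theta_-^*) = P_2^*(\theta_+^*) = \alpha/2$.

We show that the probability that this interval excludes $\theta$, $\Pr(\theta_-^* > \theta) + \Pr(\theta_+^* < \theta)$, is at most $\alpha$. The lower limit is the smallest value of $\theta$ for which $P_1^*(\theta) \ge \alpha/2$. For $\theta < \theta_-^*$, $P_1^*(\theta) < \alpha/2$. It follows that

$$\Pr(\theta_-^* > \theta) \le \Pr(P_1^*(\theta) < \alpha/2) = E[\Pr(P_1^*(\theta) < \alpha/2 \mid m)] \le \frac{\alpha}{2},$$

where $m$ denotes a possible marginal configuration, and the last step follows because of discreteness. For the upper limit, by the same arguments we have $\Pr(\theta_+^* < \theta) \le \alpha/2$.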

endpoints numerically, based on the ordinary endpoints as the initial values. The algorithm to find the endpoints is as follows. Start with an initial value based on the ordinary one, since the modified limits are contained within the ordinary ones. Note that $P_1(\theta)$ and $P_2(\theta)$ are strictly monotone functions of $\theta$ (Mehta et al. 1985). Also note that $P_1^*(\theta)$ is bounded by $P_1(\theta)$, and $P_2^*(\theta)$ is bounded by $P_2(\theta)$. Even though $P_1^*(\theta)$ and $P_2^*(\theta)$ are not monotone functions of $\theta$, the limits can be found within the ordinary limits because they are bounded by $P_1(\theta)$ and $P_2(\theta)$, respectively. Hence ordinary confidence limits provide good starting values for both the monotone case and the non-monotone case. The initial value for the lower limit can be set to be $\theta_-$, and the initial value for the upper limit can be set to be $1.01 \times \theta_+$.

Suppose we want to find the lower limit. Generally, the searching algorithm is composed of two steps. The first step is to increase the value of $\theta$ until some value of $\theta$ has $P_1^*(\theta) > \alpha/2$. For the sake of the non-monotone case, the value of $\theta$ is increased by a small amount so that $P_1^*(\theta)$ cannot change much between two values of $\theta$. The second step is iteration within an interval to find the limit. Denote by $\theta_A$ the most recent estimate that has $P_1^*(\theta) < \alpha/2$ and denote by $\theta_B$ the most recent estimate that has $P_1^*(\theta) > \alpha/2$. The initial values of $\theta_A$ and $\theta_B$ are set to be zero. As $\theta$ changes, $\theta_A$ or $\theta_B$ is updated depending on the value of $P_1^*(\theta)$, and these values will be used for the second stage to determine an interval for iteration. More specifically, if $P_1^*(\theta) < \alpha/2$, the current estimate is too small. If $P_1^*(\theta) > \alpha/2$, the current estimate is too large.

For the first step, compute $P_1^*(\theta)$ at the initial value of $\theta$. If $P_1^*(\theta) = \alpha/2$, this is the limit. If $P_1^*(\theta) < \alpha/2$, multiply $\theta$ by 1.01 to increase the value of $\theta$. Using this new estimate, compute $P_1^*(\theta)$. Continue this process until some estimate is found that has $P_1^*(\theta) > \alpha/2$. Once this happens, the second step begins.
Iteration occurs between two values of $\theta$. These two values are the previous estimate that has $P_1^*(\theta) < \alpha/2$ and the current estimate that has $P_1^*(\theta) > \alpha/2$. Note that $\theta_A$ and $\theta_B$ have been updated as the estimate changes. Then the new estimate is defined as

and $P_1^*(\theta)$ is computed using this estimate. Depending on the value of $P_1^*(\theta)$, $\theta_A$ or $\theta_B$ is updated. The process continues until … $- 1$ is sufficiently close to zero, for example, …. If $P_1^*(\theta)$ and $P_2^*(\theta)$ are strictly monotone functions, this algorithm finds the limits that satisfy $P_1^*(\theta_-^*) = P_2^*(\theta_+^*) = \alpha/2$. If they are not monotone functions, it finds the smallest of all $\theta$’s to satisfy $P_1^*(\theta) \ge \alpha/2$ and the largest of all $\theta$’s to satisfy $P_2^*(\theta) \ge \alpha/2$. Thus, this algorithm can be used for both monotone and non-monotone cases. For the upper limit, the same procedure follows except that at $\theta = \theta_+$, if $P_2^*(\theta) < \alpha/2$, multiply $\theta$ by 0.99 to decrease the value of $\theta$. This comes from the fact that if $P_2^*(\theta) < \alpha/2$, the current estimate is too large, and if $P_2^*(\theta) > \alpha/2$, the current estimate is too small. This algorithm is an adaptation of one written by Baptista and Pike (1977) for exact two-sided confidence limits for an odds ratio in a $2 \times 2$ table.

Next, we show that when the ordinary P-value and the modified P-value $P_t^*$ based on table probabilities are identical, then the ordinary and modified exact confidence intervals (based on inverting the test using $P_t^*$) also are identical. Suppose we use the table probability for $T'$. By the definition,

$$P = P_{H_0}(T \ge t_o), \qquad P_t^* = P_{H_0}(T > t_o) + P_{H_0}(\{Z : T = t_o,\ P(Z) \le P(N)\}).$$

is the largest among those coefficients for tables having $T = t_o$. Since for arbitrary $\theta$ we get

$$P(Z; \theta) \propto \prod_k \binom{n_{+1k}}{n_{11k}} \binom{n_{+2k}}{n_{1+k} - n_{11k}} \theta^{n_{11k}},$$

and $\prod_k \theta^{n_{11k}} = \theta^{t_o}$ for every table with $T = t_o$, the table probability for arbitrary $\theta$ depends on only this coefficient. Because the observed table has the largest coefficient among those tables having $T = t_o$, it has the largest probability among those tables having $T = t_o$ for arbitrary $\theta$. Hence, $P(T = t_o; \theta) = P[B(\theta); \theta]$, and the ordinary and modified exact confidence intervals also are identical. This property does not hold when $T' = \sum_k X_k^2$ is used to construct the modified P-value. The expected cell counts in $T'$ have explicit forms under the null, but they do not have explicit forms under the alternative assuming $\theta$, though they can be obtained by the iterative proportional fitting algorithm. For those tables having $T = t_o$, if the observed table has the smallest value of $T'$ under the null, it does not necessarily have the smallest value of $T'$ under the alternative. Hence, the ordinary and modified exact confidence intervals are not necessarily identical when $P = P^*$.

We now illustrate exact confidence intervals for a common odds ratio using Tables 2.1 and 2.2. The 95% “exact” interval using the ordinary approach is (1.08, 531.51) for Table 2.1 and (0.86, 21.37) for Table 2.2. The corresponding modified “exact” confidence interval using $T' = \sum_k X_k^2(\theta)$ is (2.08, 67.35) for Table 2.1 and (1.01, 13.63) for Table 2.2. Also, the corresponding modified “exact” confidence interval using the table probability for $T'$ is (2.08, 67.35) for Table 2.1 and (1.04, 14.87) for Table 2.2. We see that inferences can be considerably sharper with the modified approach. For Table 2.1, for instance, the lower bound of the ordinary interval indicates that the true odds ratio could be quite close to conditional independence. The modified interval suggests that the odds ratio is substantively quite different from conditional independence.
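
The two-step search described in this section can be sketched for the table-probability version of the modified interval, using Table 2.1. This is a hedged illustration rather than the adapted Baptista and Pike routine: the midpoint update and the relative-width stopping rule in the second step are assumptions of this sketch, and names such as `tails` and `search` are hypothetical. It relies on the fact shown above that, with the table probability as $T'$, membership in $B(\theta)$ depends only on the binomial coefficient and not on $\theta$.

```python
from itertools import product
from math import comb

MARGINS = [(6, 6, 3), (6, 6, 8), (6, 6, 11)]  # Table 2.1: (n_1+k, n_2+k, n_+1k)
OBSERVED = (3, 6, 5)
ALPHA = 0.05


def weight(table):
    """Coefficient of theta^T in the probability of a table (up to a constant)."""
    w = 1
    for x, (r1, r2, c1) in zip(table, MARGINS):
        w *= comb(r1, x) * comb(r2, c1 - x)
    return w


supports = [range(max(0, c1 - r2), min(r1, c1) + 1) for r1, r2, c1 in MARGINS]
tables = list(product(*supports))
t_obs, w_obs = sum(OBSERVED), weight(OBSERVED)
t_min = min(sum(z) for z in tables)

coef = {}                                  # coefficient of theta^t for all tables
for z in tables:
    coef[sum(z)] = coef.get(sum(z), 0) + weight(z)
# Part of the T = t_obs coefficient contributed by tables in B.
coef_b = sum(weight(z) for z in tables if sum(z) == t_obs and weight(z) <= w_obs)


def tails(theta):
    """Return (P_1*(theta), P_2*(theta)) for the table-probability version."""
    term = {t: c * theta ** (t - t_min) for t, c in coef.items()}
    total = sum(term.values())
    b = coef_b * theta ** (t_obs - t_min)
    above = sum(v for t, v in term.items() if t > t_obs)
    below = sum(v for t, v in term.items() if t < t_obs)
    return (above + b) / total, (below + b) / total


def search(index, start, factor, tol=1e-8):
    """Step theta by `factor` until P*(theta) >= alpha/2, then bisect the last step."""
    prev = theta = start
    while tails(theta)[index] < ALPHA / 2:         # first step
        prev, theta = theta, theta * factor
    a, b = prev, theta                             # P*(a) < alpha/2 <= P*(b)
    while abs(b / a - 1) > tol:                    # second step
        mid = (a + b) / 2                          # midpoint update (assumed)
        if tails(mid)[index] < ALPHA / 2:
            a = mid
        else:
            b = mid
    return b


# Start from the ordinary limits reported in the text for Table 2.1.
lower = search(0, 1.08, 1.01)
upper = search(1, 1.01 * 531.51, 0.99)
print(round(lower, 2), round(upper, 2))
```

The small multiplicative steps mirror the text's concern about non-monotone $P_1^*(\theta)$ and $P_2^*(\theta)$: a coarse bracket could skip over the first crossing of $\alpha/2$.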

2.4 Alternative Modifications of “Exact” Confidence Intervals

In previous sections, we have considered two types of probabilities, that is, the probability of obtaining $T$ equal to or less than the observed value of $T = t_o$, and separately the probability of obtaining $T$ equal to or greater than the observed value $t_o$. Then, confidence limits are constructed by inverting the test. Hence, confidence intervals discussed so far are based on inverting two separate one-sided tests of level $\alpha/2$ each. We now suggest an alternative way to form an “exact” confidence interval for a common odds ratio. This method is based on inverting a single two-sided test rather than two one-sided tests. We show that confidence intervals based on inverting two-sided tests tend to be less conservative than those based on inverting two separate one-sided tests. Also we discuss modified mid P confidence intervals based on inverting one-sided or two-sided tests using modified mid P-values.

2.4.1 The Ordinary Two-Sided “Exact” Confidence Interval

Sterne (1954) used a two-sided approach in constructing a confidence interval for a single binomial parameter, and Baptista and Pike (1977) used it to construct confidence limits for the odds ratio in a $2 \times 2$ table. We can extend this directly to $2 \times 2 \times K$ tables. For testing a particular value of $\theta$, a two-sided P-value is given by

$$P(\theta) = \sum_{\{t\,:\,P(t;\theta) \le P(t_o;\theta)\}} P(t; \theta). \tag{2.13}$$

probability. (This has happened for all examples we have considered, and it may indeed be a property of the distribution of $T$ for $2 \times 2 \times K$ tables; however, except for $K = 1$, it does not seem to be known whether the distribution of a sum of noncentral hypergeometric variates is unimodal.) The two-sided exact confidence interval then consists of the values for $\theta$ for which this two-sided P-value equals at least $\alpha$. Alternatively, one could base the two-sided P-value on a non-null test statistic (such as the score statistic), and construct the confidence interval by inverting that test using the exact non-null distribution. We will discuss this in Chapter 5.

This two-sided approach produces an interval that is usually, but not necessarily, shorter than the ordinary one based on inverting two separate one-sided tests. Under certain conditions, it can be shown that the two-sided approach is better, at least for one of the endpoints. For instance, when the upper limit $\theta_+$ of this interval is quite large, the distribution of $T$ often satisfies $P(t; \theta_+) > P(t_o; \theta_+)$ for all $t > t_o$. A special case of this holds when the probabilities are monotone increasing in $t$, which is guaranteed when $\theta_+ > \max_t\{c_{t-1}/c_t\}$. In order to show this, from (2.10) we have

$$P(T = t; \theta_+) = \frac{c_t \theta_+^t}{\sum_{u = t_{\min}}^{t_{\max}} c_u \theta_+^u}.$$

For $t_{\min} < t \le t_{\max}$,

$$P(T = t; \theta_+) - P(T = t - 1; \theta_+) = \frac{c_t \theta_+^{t-1}\,(\theta_+ - c_{t-1}/c_t)}{\sum_{u = t_{\min}}^{t_{\max}} c_u \theta_+^u}.$$

Thus, when $\theta_+ > c_{t-1}/c_t$, $P(T = t; \theta_+) > P(T = t - 1; \theta_+)$ for arbitrary $t$. Hence, if $\theta_+ > \max_t\{c_{t-1}/c_t\}$, the probabilities are monotone increasing in $t$. In this case, since $P(t; \theta_+) > P(t_o; \theta_+)$ for all $t > t_o$,

$$\sum_{\{t\,:\,P(t;\theta_+) \le P(t_o;\theta_+)\}} P(t; \theta_+) = \sum_{t \le t_o} P(t; \theta_+) = \alpha.$$

Hence, this upper limit $\theta_+$ is the same as the upper limit obtained using the one-sided testing approach with double the error probability. For instance, the upper limit of the 95% interval based on inverting a two-sided test is then the same as the upper limit of the 90% interval for the approach based on inverting two separate one-sided tests. Analogous remarks apply to the lower limit. In such cases, there is a clear advantage to using this approach based on two-sided tests. Unless one is specifically interested in a one-sided confidence interval (i.e., a lower bound alone or an upper bound alone for $\theta$), we prefer this approach.

2.4.2 The Modified Two-Sided “Exact” Confidence Interval

Following the modified approach of the previous section, one can construct a modification of this confidence interval based on two-sided tests by using a modified P-value. We define a modified two-sided P-value for testing a particular value of $\theta$ as

$$P^*(\theta) = P(\theta) - P(\{Z : Z \in \Gamma,\ P(t; \theta) = P(t_o; \theta),\ T'(\theta) < t'_o(\theta)\}). \tag{2.14}$$

Again, if we use the table probability for the secondary partitioning, we define a modified two-sided P-value for testing a particular value of $\theta$ as

$$P_t^*(\theta) = P(\theta) - P(\{Z : Z \in \Gamma,\ P(t; \theta) = P(t_o; \theta),\ P(Z; \theta) > P(N; \theta)\}). \tag{2.15}$$

For the modified two-sided confidence interval, we consider the shortest interval that contains all of the values of $\theta$ for which

$$P^*(\theta) \ge \alpha. \tag{2.16}$$

The lower limit, $\theta_-^*$, is the smallest $\theta$ satisfying (2.16), and the upper limit, $\theta_+^*$, is the largest $\theta$ satisfying (2.16). We show that this confidence interval is “exact.” For all

values of $\theta$ lying outside the closed interval $\theta_-^* \le \theta \le \theta_+^*$, it follows that $P^*(\theta) < \alpha$. Then $\Pr(\theta < \theta_-^* \text{ or } \theta > \theta_+^*) \le \Pr(P^*(\theta) < \alpha) \le \alpha$, so the confidence coefficient is at least $1 - \alpha$.

This approach gives even narrower intervals than obtained by inverting the two-sided test with the ordinary P-value. Note that $\theta_-$ is the smallest $\theta$ satisfying $P(\theta) \ge \alpha$. Thus, before $\theta_-$, there is no point having $P(\theta) \ge \alpha$. Also note that $P^*(\theta)$ is bounded by $P(\theta)$ and $P^*(\theta) \le P(\theta)$. For instance, at the ordinary lower limit, if $P^*(\theta_-) = P(\theta_-)$, then $\theta_-^* = \theta_-$. Otherwise, $\theta_-^* > \theta_-$. By a symmetric argument, $\theta_+^* \le \theta_+$. Hence, the two-sided modified confidence interval is contained within the two-sided ordinary confidence interval.

We illustrate these alternative “exact” confidence intervals for the common odds ratio using Tables 2.1 and 2.2. For Table 2.1 the 95% confidence interval by inverting a two-sided test is (1.29, 261.49) based on the ordinary exact P-values and (1.38, 40.45) based on modified exact P-values, $P^*(\theta)$ and $P_t^*(\theta)$. Using Table 2.2 the confidence intervals are (0.88, 15.92) using the ordinary exact P-values, (1.01, 10.30) using $P^*(\theta)$, and (1.01, 11.14) using $P_t^*(\theta)$.

Table 2.3 contains 95% confidence intervals obtained using the two separate one-sided ordinary and modified exact P-values, and using the ordinary and modified two-sided exact P-values. For these tables, the confidence interval constructed using the ordinary two-sided P-value is shorter than the ordinary one based on two one-sided P-values. In fact, for each data set, the upper endpoint for the two-sided based interval equals the endpoint that would be obtained with the one-sided method for

a 90% confidence interval. For each type of interval, the ones based on the modified P-value are narrower yet. For Table 2.2 the modified confidence interval based on $T' = \sum_k X_k^2$ is shorter than the corresponding confidence interval based on the table probability in both one-sided and two-sided cases.

One way to compare the methods of constructing the confidence interval and to assess the degree of conservativeness is to use the coverage function (Vollset and Hirji 1991). The coverage function, for a given value of $\theta$, is computed by summation of $P(t; \theta)$ over $t$ for which the confidence interval contains the given value of $\theta$. The function is then plotted as a function of $\theta$. Hence, it displays how closely the actual coverage probability falls to the nominal coverage probability.

For the conditional distribution having the fixed marginal counts of Table 2.1, Figures 2.9 and 2.10 show the actual coverage probability as a function of the true log odds ratio, for 95% confidence intervals based on inverting separate one-sided tests using the ordinary or modified P-value. We use $\sum_k X_k^2$ for Figure 2.9 and the table probability for Figure 2.10, for the secondary partitioning in the modified P-value. There is a clear advantage to using the interval based on the modified P-value. For Table 2.2, this calculation requires a huge computing time, and we have not been able to get results using the conditional distribution based on the margins of all 18 partial tables. Thus, we display results using various subsets of the partial tables of Table 2.2. Figure 2.11 gives an analogous display using various numbers of partial tables from Table 2.2. It shows how the conservativeness is reduced by using confidence intervals based on inverting tests with modified P-values. As the number of strata increases, the modified approach yields actual level closer to the nominal level, and this holds over a broader range of odds ratio values.
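
The coverage function for the ordinary interval based on two one-sided tests can be sketched without solving for any endpoints. Because $P_1(\theta)$ and $P_2(\theta)$ are strictly monotone in $\theta$, the interval for a given $t$ contains $\theta$ exactly when both one-sided P-values evaluated at $\theta$ are at least $\alpha/2$. The code below uses the margins of Table 2.1; function names are illustrative, and the modified intervals are not covered by this shortcut.

```python
from math import comb

MARGINS = [(6, 6, 3), (6, 6, 8), (6, 6, 11)]  # Table 2.1: (n_1+k, n_2+k, n_+1k)
ALPHA = 0.05


def coefficients(margins):
    """Coefficients proportional to c_t in (2.10), as a dict {t: coefficient}."""
    coef = {0: 1}
    for r1, r2, c1 in margins:
        new = {}
        for x in range(max(0, c1 - r2), min(r1, c1) + 1):
            w = comb(r1, x) * comb(r2, c1 - x)
            for t, c in coef.items():
                new[t + x] = new.get(t + x, 0) + c * w
        coef = new
    return coef


def pmf(coef, theta):
    """P(T = t; theta) from (2.10)."""
    t_min = min(coef)
    terms = {t: c * theta ** (t - t_min) for t, c in coef.items()}
    total = sum(terms.values())
    return {t: v / total for t, v in terms.items()}


def coverage(coef, theta, alpha=ALPHA):
    """Probability, at odds ratio theta, that the ordinary interval contains theta."""
    prob = pmf(coef, theta)
    cover = 0.0
    for t, p in prob.items():
        p1 = sum(q for u, q in prob.items() if u >= t)  # P_1(theta) if t were observed
        p2 = sum(q for u, q in prob.items() if u <= t)  # P_2(theta) if t were observed
        if p1 >= alpha / 2 and p2 >= alpha / 2:         # theta inside (theta_-, theta_+)
            cover += p
    return cover


coef = coefficients(MARGINS)
grid = [10 ** (k / 4) for k in range(-16, 17)]          # theta from 1e-4 to 1e4
curve = [(theta, coverage(coef, theta)) for theta in grid]
print(round(coverage(coef, 1.0), 4), round(min(c for _, c in curve), 4))
```

Plotting `curve` against $\log\theta$ gives a picture of the kind the coverage figures describe: the coverage never drops below 0.95 and approaches 1 for extreme odds ratios.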
For either approach, for sufficiently large $\theta$, all tables with those margins would have lower bound of the interval below $\theta$; for sufficiently small $\theta$, all tables would have upper bound above $\theta$. In such cases, the actual probability of coverage of a

$100(1 - \alpha)\%$ confidence interval has lower bound $1 - \alpha/2$. That bound is achieved at values of $\theta$ that are potential endpoints of the intervals (Neyman 1935). To show this, let $(\theta_-, \theta_+)$ denote the ordinary interval based on a one-sided test. Suppose that the value of the upper limit, $\theta_+$, is large enough so that all the lower limits from other possible tables are less than $\theta_+$. Since $\theta_+$ is constructed by inverting the one-sided $\alpha/2$ test, we have $P(T \le t_o; \theta_+) = \alpha/2$ and $P(T \ge t_o + 1; \theta_+) = 1 - \alpha/2$ accordingly. The coverage function at $\theta = \theta_+$ is

$$C(\theta_+) = \sum_t I(t, \theta_+)\,P(t; \theta_+) = \sum_{t \ge t_o + 1} P(t; \theta_+) = 1 - \frac{\alpha}{2},$$

where $I(t, \theta_+)$ is an indicator function to indicate whether or not $\theta_+$ is within the confidence interval at $T = t$. Note that at $\theta = \theta_+$, we have $P(T \le t_o; \theta_+) = \alpha/2$, and $\theta_+$ is the upper limit. At some value of $T = t'$, the fact that $\theta_+$ is within this interval corresponds to $P(T \le t'; \theta_+) > \alpha/2$. In order to satisfy this, we need to have $t' \ge t_o + 1$, since $P(T \le t_o; \theta_+) = \alpha/2$. Hence, the coverage probability, which is the summation of $P(t; \theta_+)$ over $t$ such that $t \ge t_o + 1$, is $1 - \alpha/2$. For $\theta > \theta_+$ the coverage function has $C(\theta) \ge 1 - \alpha/2$.

Figures 2.12 and 2.13 give an analogous display for the confidence intervals based on inverting two-sided tests using the ordinary or modified P-value using Table 2.1. For the secondary statistic $T'$, Figure 2.12 uses $\sum_k X_k^2(\theta)$ and Figure 2.13 uses the table probability. Again, there is an advantage to the interval based on the modified P-value. Comparing the figures of coverage probability for confidence intervals, we see there is almost always an advantage to using the confidence interval based on inverting two-sided tests. Figure 2.14 gives an analogous display using some fixed sets of margins of Table 2.2. There is a dramatic improvement in the two-sided modified confidence intervals, when the number of strata is large. As the number of


strata increases, we can expect that the actual coverage probability is very close to the nominal coverage probability. When $\log\theta$ is between $-2$ and 2, we see there is a large increase in the coverage probability for both the ordinary two-sided and modified two-sided confidence intervals. At that point, many new tables for which the confidence intervals contain the given value of $\theta$ are added to the calculation of the coverage probability, and the jump comes from the newly included non-null table probabilities. For the coverage probability based on two-sided ordinary tests, the big jump occurs before the one for two-sided modified tests, and the amount of increase is greater. Also, at that jump point, more new tables are included for the coverage probability based on two-sided ordinary tests than for that based on two-sided modified tests. We have observed similar results using other sets of fixed margins.

In particular, for the two-sided approach, for large $|\log\theta|$, the true coverage probability has 0.95 as a lower bound rather than 0.975. For the proof, let $(\theta_-,\theta_+)$ be the ordinary confidence interval based on the two-sided test. Suppose that the value of the upper limit, $\theta_+$, is large enough so that all of the lower limits from other possible tables are less than $\theta_+$. Then at $\theta=\theta_+$ we have

$$\sum_{\{t:\,P(t;\theta_+)\le P(t_0;\theta_+)\}} P(t;\theta_+) \le \alpha, \qquad\text{so that}\qquad 1-\sum_{\{t:\,P(t;\theta_+)\le P(t_0;\theta_+)\}} P(t;\theta_+) \ge 1-\alpha.$$

At $\theta_+$ the coverage function is

$$C(\theta_+) = \sum_t I(t,\theta_+)\,P(t;\theta_+) = \sum_{\{t:\,P(t;\theta_+)>P(t_0;\theta_+)\}} P(t;\theta_+) \ge 1-\alpha,$$

since at $\theta=\theta_+$ the probabilities $P(t;\theta_+)$ that do not exceed $P(t_0;\theta_+)$ sum to at most $\alpha$. At some value $T=t'$, the fact that $\theta_+$ is within the interval corresponds to a two-sided P-value larger than $\alpha$. In order to satisfy this, we need to have $P(t';\theta_+) > P(t_0;\theta_+)$. Then the two-sided


ordinary P-value is larger than $\alpha$ at $T = t'$. Hence, the coverage probability, which is the summation over $t$ such that $P(t;\theta_+) > P(t_0;\theta_+)$, is at least $1-\alpha$. Also, for $\theta>\theta_+$ the coverage function has $C(\theta)\ge 1-\alpha$.

For a special case, suppose that $P(t;\theta_+) > P(t_0;\theta_+)$ for all $t > t_0$. Then at $\theta=\theta_+$,

$$C(\theta_+) = \sum_{\{t:\,P(t;\theta_+)>P(t_0;\theta_+)\}} P(t;\theta_+) = \sum_{t\ge t_0+1} P(t;\theta_+) = 1-\alpha,$$

since at some value $T=t'$, the fact that $\theta_+$ is within this interval corresponds to $P(T\le t';\theta_+) > \alpha$. This requires $t'\ge t_0+1$, since $P(T\le t_0;\theta_+)=\alpha$. Hence the coverage function has $C(\theta_+)\ge 1-\alpha$. This relates to the property mentioned previously, by which an interval endpoint for the two-sided approach with error probability $\alpha$ can equal one for the one-sided approach with error probability $2\alpha$.

So far, we have used the coverage probability to compare the methods of constructing the confidence interval. An alternative way to compare them is to compute the expected length of confidence intervals for $\theta$ or for $\log\theta$. A complication results from infinite endpoints that occur at $T=t_{\min}$ or $T=t_{\max}$. Figure 2.15 displays the expected length of confidence intervals for $\theta$, for four methods, using the margins of Table 2.1. The two-sided modified confidence interval has the smallest expected length, uniformly for all $\theta$. For instance, the expected lengths at $\theta=1$ are 21.84, 17.22, 13.78, and 11.21 for one-sided ordinary, one-sided modified, two-sided ordinary, and two-sided modified intervals, respectively. For this figure, we arbitrarily set the


upper limit equal to 1000 whenever $T=t_{\max}$. Since the expected length depends on the upper limit at $T=t_{\max}$, that value was chosen to be almost two times the maximum finite upper limit among the four methods. Figure 2.16 presents the analogous expected length of confidence intervals for $\log\theta$, using the margins of Table 2.1. Again, the two-sided modified confidence interval has uniformly the smallest expected length. We use $1.0\times10^{-?}$ for the lower limit of $\theta$ at $T=t_{\min}$ and 1000 for the upper limit of $\theta$ at $T=t_{\max}$. Figures 2.17 and 2.18 give analogous displays using the margins of Table 2.1, comparing the lengths conditional on $T\ne t_{\min}$ or $t_{\max}$. Then the expected length does not depend on the values of the lower limit at $T=t_{\min}$ and the upper limit at $T=t_{\max}$. Again, the two-sided modified confidence interval has uniformly the smallest expected length.

2.4.3 The One-Sided Mid P Confidence Interval

For confidence intervals for a common odds ratio based either on inverting two separate one-sided tests or on inverting a two-sided test, one can construct even narrower intervals, albeit not "exact" ones, by inverting the tests based on the modified mid P-value. The ordinary mid P confidence limits based on inverting two separate one-sided tests are found using the functions

$$P_{mid(1)}(\theta) = P_1(\theta) - \tfrac12 P(t_0;\theta), \qquad P_{mid(2)}(\theta) = P_2(\theta) - \tfrac12 P(t_0;\theta). \tag{2.17}$$

The limits are determined by the same method used for the modified exact confidence interval, using $P_{mid(1)}(\theta)$ for the lower limit and $P_{mid(2)}(\theta)$ for the upper limit. Though


approximate, this type of confidence interval based on the ordinary mid P-value has been observed empirically to behave well (Mehta and Walsh 1992).

Following the modified approach based on using a one-sided modified mid P-value, let $B_1(\theta) = \{z : z\in\Gamma,\ T=t_0,\ T'(\theta)=t_0'(\theta)\}$. The modified mid P confidence interval based on inverting two separate one-sided tests uses

$$P^*_{mid(1)}(\theta) = P^*_1(\theta) - \tfrac12 P[B_1(\theta);\theta], \qquad P^*_{mid(2)}(\theta) = P^*_2(\theta) - \tfrac12 P[B_1(\theta);\theta]. \tag{2.18}$$

The limits are chosen by the same method used for the modified exact confidence interval, using $P^*_{mid(1)}(\theta)$ for the lower limit and $P^*_{mid(2)}(\theta)$ for the upper limit. This approach tends to give narrower intervals than those obtained by inverting the one-sided test with the ordinary mid P-value.

We illustrate these confidence intervals for the common odds ratio using Tables 2.1 and 2.2. For Table 2.1, the 95% confidence interval by inverting a one-sided test is (1.34, 266.54) based on the ordinary mid P-values and (2.22, 56.00) based on the modified mid P-values using $\sum_k X_k^2(\theta)$ or the table probability for $T'$. Using Table 2.2, the confidence intervals are (0.98, 16.89) using the ordinary mid P-values, (1.01, 13.61) using the modified mid P-values with $\sum_k X_k^2(\theta)$, and (1.04, 14.85) using the modified mid P-values with the table probability for $T'$.

2.4.4 The Two-Sided Mid P Confidence Interval

As the two-sided approach tends to give an interval that is usually narrower than the one based on inverting two separate one-sided tests, we can construct a shorter interval using two-sided mid P-values. Though these cannot guarantee achieving at


least the nominal confidence level, one could define mid P versions of the ordinary two-sided and modified two-sided intervals. For testing a particular value of $\theta$, a two-sided mid P-value can be defined as

$$P_{mid}(\theta) = P(\theta) - \tfrac12 P(\{z : z\in\Gamma,\ P(t;\theta) = P(t_0;\theta)\}). \tag{2.19}$$

The limits are determined by the same method used for the two-sided exact confidence interval.

Following the modified approach, one can construct a modified confidence interval based on two-sided tests by using a modified mid P-value. We define a modified two-sided mid P-value for testing a particular value of $\theta$ as

$$P^*_{mid}(\theta) = P^*(\theta) - \tfrac12 P(\{z : z\in\Gamma,\ P(t;\theta) = P(t_0;\theta),\ T'(\theta)=t_0'(\theta)\}). \tag{2.20}$$

Again, the limits are determined by the same method used for the two-sided exact confidence interval.

We illustrate these confidence intervals for the common odds ratio using Tables 2.1 and 2.2. For Table 2.1, the 95% confidence interval by inverting a two-sided test is (1.38, 131.51) based on the ordinary mid P-values and (1.38, 35.51) based on modified mid P-values using $T'=\sum_k X_k^2(\theta)$. Using Table 2.2, the confidence intervals are (1.01, 12.58) and (1.01, 10.29) using the ordinary and modified mid P-values with $T'=\sum_k X_k^2(\theta)$, respectively. For these data sets, the confidence interval constructed by using the ordinary two-sided mid P-values is shorter than the ordinary one based on two one-sided mid P-values. For each type of interval, the modified interval is narrower than the ordinary one. Table 2.4 summarizes these 95% confidence intervals using Table 2.1 and Table 2.2.

For the conditional distribution having the fixed marginal counts of Table 2.1, Figure 2.19 shows the actual coverage probability as a function of the true log odds ratio, for the 95% confidence intervals based on inverting separate one-sided tests using the ordinary mid P-value or the modified mid P-value with $T'=\sum_k X_k^2(\theta)$. The


exact method yields a coverage exceeding the nominal level, whereas the coverage of the mid P-value fluctuates about the nominal level. For either approach, for sufficiently large $|\log\theta|$, the actual probability of coverage of a $100(1-\alpha)\%$ confidence interval is centered about $1-\alpha/2$, and that of the modified mid P-value deviates less from $1-\alpha/2$.

Figure 2.20 gives an analogous display for the confidence intervals based on inverting two-sided tests using the ordinary mid P-value or the modified mid P-value with $T'=\sum_k X_k^2(\theta)$. There is an advantage to the interval based on the modified P-value. For either approach, the actual probability of coverage of a $100(1-\alpha)\%$ confidence interval is centered about the nominal level, and that of the modified mid P-value is even closer to the nominal level. For intervals using mid P-values, we suggest the use of the confidence interval based on inverting two-sided tests using the modified mid P-value.

Method                        Data set 1        Data set 2
Exact CI
  Ordinary 1-sided P          (1.08, 531.51)    (0.86, 21.37)
  Modified 1-sided P (P*)     (2.08, 67.35)     (1.01, 13.63)
  Modified 1-sided P (P*)     (2.08, 67.35)     (1.04, 14.87)
  Ordinary 2-sided P          (1.29, 261.49)    (0.88, 15.92)
  Modified 2-sided P (P*)     (1.38, 40.45)     (1.01, 10.30)
  Modified 2-sided P (P*)     (1.38, 40.45)     (1.01, 11.14)
Approximate CI
  Mantel-Haenszel             (1.03, 47.73)     (0.86, 12.93)
  ML                          (1.28, 128.12)    (0.99, 17.64)
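The mid P adjustment that underlies the intervals above can be illustrated with a small sketch. It assumes the conditional distribution of the test statistic is already available as a table of point probabilities; the function name `one_sided_p_values` and the example distribution are hypothetical and are not taken from the FORTRAN program described in this chapter.

```python
from fractions import Fraction

def one_sided_p_values(pmf, t_obs):
    """Return (exact, mid) upper-tail P-values for a discrete statistic.

    pmf maps each support point t to P(T = t); t_obs is the observed value.
    """
    tail = sum(p for t, p in pmf.items() if t >= t_obs)  # P(T >= t_obs)
    point = pmf.get(t_obs, 0)                            # P(T = t_obs)
    # The mid P-value removes half of the probability of the observed point.
    return tail, tail - point / 2

# Hypothetical null distribution on {0, ..., 4}, for illustration only.
pmf = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
       3: Fraction(4, 16), 4: Fraction(1, 16)}
exact, mid = one_sided_p_values(pmf, 3)
print(exact, mid)
```

Because the mid P-value is never larger than the exact one, inverting it gives intervals that are no wider, at the cost of the coverage guarantee, which matches the behaviour reported above.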


Table 2.4. Various 95% confidence intervals for the common odds ratio using mid P-value.

Method                            Data set 1        Data set 2
Approximate CI
  Ordinary 1-sided mid P          (1.34, 266.54)    (0.98, 16.89)
  Modified 1-sided mid P (P*)     (2.22, 56.00)     (1.01, 13.61)
  Modified 1-sided mid P (P*)     (2.22, 56.00)     (1.04, 14.85)
  Ordinary 2-sided mid P          (1.38, 131.51)    (1.01, 12.58)
  Modified 2-sided mid P (P*)     (1.38, 35.51)     (1.01, 10.29)

[Figure: COVERAGE P plotted against LOG THETA]
Figure 2.9. Coverage probability for confidence intervals based on inverting one-sided tests with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


[Figure: COVERAGE P plotted against LOG THETA, horizontal axis from -4 to 4]
Figure 2.10. Coverage probability for confidence intervals based on inverting one-sided tests with $T' = P(Z)$, for conditional distribution based on margins of Table 2.1.


[Figure: four panels (K=3, K=6, K=9, K=12), each plotting COVERAGE P against LOG THETA]
Figure 2.11. Coverage probability for confidence intervals based on inverting one-sided tests with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on first $K$ partial tables of Table 2.2.


[Figure: coverage probability plotted against LOG THETA]
Figure 2.12. Coverage probability for confidence intervals based on inverting two-sided tests with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


[Figure: COVERAGE P plotted against LOG THETA]
Figure 2.13. Coverage probability for confidence intervals based on inverting two-sided tests with $T' = P(Z)$, for conditional distribution based on margins of Table 2.1.


[Figure: four panels (K=3, K=6, K=9, K=12), each plotting COVERAGE P against LOG THETA]
Figure 2.14. Coverage probability for confidence intervals based on inverting two-sided tests with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on first $K$ partial tables of Table 2.2.


[Figure: LENGTH (THETA), axis ticks from 100 to 500]
Figure 2.15. Expected length of confidence intervals for $\theta$, with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


[Figure: LENGTH (LOG THETA)]
Figure 2.16. Expected length of confidence intervals for $\log\theta$, with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


[Figure: LENGTH (THETA), axis ticks from 100 to 200]
Figure 2.17. Expected length of confidence intervals for $\theta$, conditional on $T\ne t_{\min}$ or $t_{\max}$, with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


[Figure: LENGTH (LOG THETA)]
Figure 2.18. Expected length of confidence intervals for $\log\theta$, conditional on $T\ne t_{\min}$ or $t_{\max}$, with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


[Figure: coverage probability (axis from 0.90 to 1.00) plotted against LOG THETA]
Figure 2.19. Coverage probability for confidence intervals based on inverting one-sided tests using mid P-values with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


[Figure: COVERAGE P (axis from 0.90 to 1.00) plotted against LOG THETA]
Figure 2.20. Coverage probability for confidence intervals based on inverting two-sided tests using mid P-values with $T'=\sum_k X_k^2(\theta)$, for conditional distribution based on margins of Table 2.1.


2.5 Connections with Logistic Regression

Consider a set of independent binary variables, $Y_1,\ldots,Y_n$. Corresponding to each variable $Y_j$, there is a $(p\times1)$ vector $x_j = (x_{1j},\ldots,x_{pj})'$ of explanatory variables. Let $\pi_j$ be the probability that $Y_j=1$. Suppose that the response is related to the explanatory variables by the logistic regression model,

$$\log\frac{\pi_j}{1-\pi_j} = \gamma + x_j'\beta. \tag{2.21}$$

The likelihood function is

$$\frac{\exp\left[\sum_{j=1}^n y_j(x_j'\beta+\gamma)\right]}{\prod_{j=1}^n\left[1+\exp(x_j'\beta+\gamma)\right]}.$$

The $p\times1$ vector of sufficient statistics for $\beta$ is $t=\sum_{j=1}^n x_jy_j$. Suppose $p=2$, and we want to conduct inferences about $\beta_1$. Again, one can eliminate $\beta_2$ by conditioning on its sufficient statistic, $t_2=\sum_{j=1}^n x_{2j}y_j$. One can treat the data for the logistic regression model as a three-way $2\times I\times K$ table, where $I$ and $K$ are the numbers of distinct values of the explanatory variables $X_1$ and $X_2$, respectively.

Exact inference in logistic regression often is highly discrete, even degenerate. One can often alleviate this problem somewhat by treating the data as a contingency table and using the alternative way of constructing P-values discussed in Section 2. To illustrate, for Table 2.1 we let $\pi_{ij}$ denote the probability of cure for the $j$th individual at the $i$th penicillin level. The logistic model has the form

$$\log\frac{\pi_{ij}}{1-\pi_{ij}} = \gamma_i + \beta x_{ij}, \qquad i=1,\ldots,3,$$

where $x_{ij}$ is a dummy variable for delay. The observed value of the sufficient statistic $T$ is 14. For testing $H_0:\beta=0$, the exact one-sided P-value is $P = P(T\ge14) = 0.0200$. The modified exact P-value, using $T'=\sum_k X_k^2(\theta)$ or the table probability, is 0.0028.
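The ordinary exact one-sided P-value quoted above can be sketched as follows. Conditional on all margins of the $2\times2\times K$ table, $T$ has probabilities proportional to $c(t)\,\theta^t$, where $c(t)$ is obtained by convolving the hypergeometric coefficients of the strata. The function names are hypothetical, and the three sets of margins are an assumption: Table 2.1 is not reproduced in this section, and the margins used here are those commonly reported for the three informative strata of the penicillin data. The modified P-value is not implemented in this sketch.

```python
from math import comb

def stratum_coeffs(n1, n2, m1):
    """Hypergeometric coefficients for one 2x2 table.

    n1, n2 are the row totals and m1 is the first column total; the entry for
    x is C(n1, x) * C(n2, m1 - x), the number of tables with first cell x.
    """
    lo, hi = max(0, m1 - n2), min(n1, m1)
    return {x: comb(n1, x) * comb(n2, m1 - x) for x in range(lo, hi + 1)}

def conditional_dist(strata, theta=1.0):
    """Distribution of T = sum of first cells, given all margins, at odds ratio theta."""
    coeffs = {0: 1}
    for n1, n2, m1 in strata:
        new = {}
        for t, c in coeffs.items():
            for x, cx in stratum_coeffs(n1, n2, m1).items():
                new[t + x] = new.get(t + x, 0) + c * cx   # convolution step
        coeffs = new
    weights = {t: c * theta ** t for t, c in coeffs.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def upper_tail(dist, t_obs):
    """One-sided P-value P(T >= t_obs)."""
    return sum(p for t, p in dist.items() if t >= t_obs)

# Assumed margins (row total 1, row total 2, first column total) per stratum.
strata = [(6, 6, 3), (6, 6, 8), (6, 6, 11)]
p_value = upper_tail(conditional_dist(strata), 14)
print(round(p_value, 4))  # approximately 0.0200 under the assumed margins
```

Passing a value of `theta` other than 1 gives the non-null conditional distribution, which is the ingredient needed for the coverage and expected-length calculations of Section 2.4.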


2.6 Discussion

We have shown that use of a modified P-value leads to exact tests and confidence intervals that are less conservative than the usual ones. The improvement can be considerable when $K$ is large but $n$ is not, in which case there may be a large number of tables with different values of the secondary statistic that have the same value of the primary test statistic. We prefer modified exact tests and confidence intervals over the ordinary exact ones, because they are less conservative but still guarantee at least the nominal level. We prefer confidence intervals based on inverting two-sided tests over those based on inverting two separate one-sided tests, because they tend to be less conservative. Likewise, for confidence intervals using mid P-values, we prefer intervals based on inverting two-sided tests using modified mid P-values. For the secondary statistic, we have used $\sum_k X_k^2$ and the table probability in our examples, and clearly the reduction in conservativeness occurs with test statistics for more general alternatives.

A FORTRAN program has been prepared, designed for IBM-compatible PCs or UNIX workstations, for computing modified P-values for tests of conditional independence and modified confidence intervals for an assumed common odds ratio. This program also computes the actual coverage probability and the expected length of confidence intervals using four methods. The program, for $2\times2\times K$ tables, is an adaptation of one written by Vollset and Hirji (1991) for ordinary exact inference for such tables. Appendix A contains the FORTRAN source code.


CHAPTER 3
APPROXIMATING EXACT INFERENCE ABOUT CONDITIONAL ASSOCIATION

3.1 Introduction

For three-way tables, consider the hypothesis of conditional independence of $X$ and $Y$, given $Z$. This hypothesis is usually tested against the alternative of no three-factor interaction. The general alternative that permits three-factor interaction is the general loglinear model for a three-way table and has the form

$$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}. \tag{3.1}$$

When $X$ or $Y$ is ordinal, narrower alternatives can be constructed for the exact tests.

We suggest exact inference regarding conditional associations in three-way contingency tables. For $I\times J\times K$ tables, we discuss six test statistics for conditional independence that have natural connections with loglinear models for various alternatives. We use a simulation algorithm to obtain precise estimates of exact P-values for cases that are currently computationally infeasible.

For three-way contingency tables, current computational algorithms for the exact methods are restricted to certain analyses for $2\times J\times K$ tables. Also, when the sample size is small or when the contingency tables are sparse, the applicability of large-sample approximations can be questionable. The Monte Carlo method is an alternative to either the exact or asymptotic methods. This method is based on estimating the exact conditional sampling distribution of the statistic by generating random tables having the relevant fixed margins. The advantage of this method is that the number of tables


generated is fixed in advance, and the computing time does not depend greatly on the sample size $n$ and the table size, compared to methods for exact analysis. For the random table generation, we use the procedure by Patefield (1981) that simulates hypergeometric distributions.

Section 2 discusses exact tests of conditional independence in $I\times J\times K$ tables using three statistics that are popular for asymptotic tests. These are naturally linked to alternatives corresponding to loglinear models that assume a lack of three-factor interaction. Section 3 presents three other statistics that do not require this assumption. All six test statistics are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Section 4 discusses possible alternative ways of forming modified exact P-values in $I\times J\times K$ contingency tables, generalizing the modified P-value discussed in Chapter 2. We propose modified exact P-values for six tests of conditional independence with $I\times J\times K$ tables.

Computational algorithms have limited availability for tests of conditional independence when $I$ and $J$ exceed two. Section 5 describes a Monte Carlo sampling routine that approximates the ordinary and modified exact P-values. We utilize six test statistics for exact tests of conditional independence. Section 6 illustrates approximate exact tests of conditional independence with examples, and Section 7 explains a FORTRAN program utilizing the simulation algorithm.

3.2 Tests of Conditional Independence Assuming No Three-factor Interaction

This section presents three test statistics for testing conditional independence of $X$ and $Y$, given $Z$, in $I\times J\times K$ contingency tables, proposed by Birch (1965). We present loglinear models for which these are score statistics. These models assume a lack of three-factor interaction. We then present three adaptations of these statistics


that do not require that assumption in the next section. In each case, one test treats both $X$ and $Y$ as nominal, one test treats $X$ as nominal and $Y$ as ordinal, and one test treats both as ordinal.

The asymptotic chi-squared theory is well developed for the statistics we present. Our focus will be to construct exact tests of conditional independence, using these statistics with the reference set $\Gamma$ of tables with the same margins. We use score statistics for loglinear models rather than likelihood-ratio or Wald statistics. This makes the computations for exact analyses simpler, since one does not need to fit the model for each table in $\Gamma$.

3.2.1 Nominal-by-Nominal Test

Birch (1965), Landis et al. (1978), and Mantel and Byar (1978) generalized the Cochran-Mantel-Haenszel statistic to handle more than two groups or more than two responses. Suppose $X$ and $Y$ are nominal. Let $n_k$ denote the counts for cells in the first $I-1$ rows and $J-1$ columns for stratum $k$ of $Z$. Conditional on the row and column totals in that stratum, let $m_k$ denote the null expected value of $n_k$. Then $d=\sum_k(n_k-m_k)$ represents the $(I-1)(J-1)\times1$ vector having elements

$$d_{ij} = \sum_k\left(n_{ijk} - \frac{n_{i+k}\,n_{+jk}}{n_{++k}}\right), \qquad i=1,\ldots,I-1,\quad j=1,\ldots,J-1. \tag{3.2}$$

Let $V_k$ denote the null covariance matrix of $n_k$, where, with $\delta_{ab}=1$ when $a=b$ and $\delta_{ab}=0$ otherwise,

$$\mathrm{Cov}(n_{ijk}, n_{i'j'k}) = \frac{n_{i+k}\,(\delta_{ii'}n_{++k}-n_{i'+k})\;n_{+jk}\,(\delta_{jj'}n_{++k}-n_{+j'k})}{n_{++k}^2\,(n_{++k}-1)}. \tag{3.3}$$


Then $V=\sum_k V_k$ is the null covariance matrix of $d$. The efficient score statistic for testing conditional independence against the alternative of no three-factor interaction is

$$C^2 = d'V^{-1}d. \tag{3.4}$$

This is also called the generalized Cochran-Mantel-Haenszel statistic. Under conditional independence, this statistic has a large-sample chi-squared distribution with $df=(I-1)(J-1)$. For $K=1$ stratum with $n$ observations, the statistic reduces to the multiple $(n-1)/n$ of the Pearson chi-squared statistic for testing independence.

The statistic is sensitive to detecting conditional associations when the association is similar in each stratum. Hence, the generalized Cochran-Mantel-Haenszel statistic has low power for detecting an association in which the patterns of association for some of the strata are in the opposite direction of the patterns displayed by other strata, relative to the case in which the association is similar.

3.2.2 Ordinal-by-Ordinal Test

When $X$ and $Y$ are ordinal, it often makes sense to test against a narrow alternative, corresponding to a monotone trend in the conditional association. It then makes sense to form a test statistic using a model that is a special case of the no three-factor interaction model and reflects the ordinality, such as the model of homogeneous linear-by-linear association,

$$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta u_iv_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{3.5}$$

It replaces the general association term $\lambda_{ij}^{XY}$ by a linear-by-linear term $\beta u_iv_j$, where $\{u_i\}$ and $\{v_j\}$ are monotone scores for levels of $X$ and $Y$. The parameter $\beta$ in that model describes $X$-$Y$ partial association. The model of conditional independence


of $X$ and $Y$ is its special case in which $\beta=0$. For this model, the sufficient statistic for $\beta$ is $\sum_k[\sum_i\sum_j u_iv_jn_{ijk}]$. When $I=J=2$, the usual statistic results from the scores $u_1=v_1=1$, $u_2=v_2=0$. This is Birch's exact test statistic for testing conditional independence in $2\times2\times K$ contingency tables, and we have utilized this statistic in Chapter 2 for the conditional exact test. Also, Mehta, Patel and Gray (1985) and Vollset, Hirji and Elashoff (1991) used this statistic to implement the exact test.

For the asymptotic test of $H_0:\beta=0$, one can use Mantel's (1963) generalized statistic for detecting association between ordinal variables. This ordinal test focuses the departure from independence on a single degree of freedom. Suppose we expect a monotone conditional relationship between $X$ and $Y$, with the same direction at each level of $Z$, and suppose that we can assign monotone scores $\{u_i\}$ to levels of $X$ and $\{v_j\}$ to levels of $Y$. Then there is evidence of positive trend if, within each stratum, the statistic $\sum_i\sum_j u_iv_jn_{ijk}$ is greater than its expectation under independence. For the model (3.5), given the marginal totals in each stratum and under conditional independence of $X$ and $Y$,

$$E\Big(\sum_i\sum_j u_iv_jn_{ijk}\Big) = \frac{\big(\sum_i u_in_{i+k}\big)\big(\sum_j v_jn_{+jk}\big)}{n_{++k}},$$

$$\mathrm{Var}\Big(\sum_i\sum_j u_iv_jn_{ijk}\Big) = \frac{1}{n_{++k}-1}\left[\sum_i u_i^2n_{i+k} - \frac{\big(\sum_i u_in_{i+k}\big)^2}{n_{++k}}\right]\left[\sum_j v_j^2n_{+jk} - \frac{\big(\sum_j v_jn_{+jk}\big)^2}{n_{++k}}\right].$$

To summarize the correlation information from the $K$ strata, Mantel (1963) proposed the statistic

$$M^2 = \frac{\Big\{\sum_k\big[\sum_i\sum_j u_iv_jn_{ijk} - E\big(\sum_i\sum_j u_iv_jn_{ijk}\big)\big]\Big\}^2}{\sum_k \mathrm{Var}\big(\sum_i\sum_j u_iv_jn_{ijk}\big)}. \tag{3.6}$$

This is the score statistic for testing conditional independence for model (3.5). It has an asymptotic chi-squared distribution with $df=1$.
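A minimal sketch of the computation of $M^2$ in (3.6) is given below. It takes the $K$ partial tables as nested lists and uses only the expectation and variance formulas stated above; the function name `mantel_m2` and the example table are hypothetical.

```python
def mantel_m2(tables, u, v):
    """Mantel's statistic (3.6) for a list of K partial tables (each I x J).

    u and v are the row and column scores.
    """
    num = 0.0   # sum over strata of (statistic - null expectation)
    var = 0.0   # sum over strata of the null variances
    for n in tables:
        row = [sum(r) for r in n]            # n_{i+k}
        col = [sum(c) for c in zip(*n)]      # n_{+jk}
        tot = sum(row)                       # n_{++k}
        stat = sum(u[i] * v[j] * n[i][j]
                   for i in range(len(u)) for j in range(len(v)))
        su = sum(ui * r for ui, r in zip(u, row))
        sv = sum(vj * c for vj, c in zip(v, col))
        var_u = sum(ui * ui * r for ui, r in zip(u, row)) - su * su / tot
        var_v = sum(vj * vj * c for vj, c in zip(v, col)) - sv * sv / tot
        num += stat - su * sv / tot
        var += var_u * var_v / (tot - 1)
    return num * num / var

# Hypothetical single 2x2 stratum with scores u = v = (1, 0).
m2 = mantel_m2([[[3, 1], [1, 3]]], [1, 0], [1, 0])
print(m2)
```

For a single $2\times2$ table the result equals $(n-1)/n$ times the Pearson statistic, which gives a convenient check on an implementation.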


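3.2.3 Nominal-by-Ordinal Test

Suppose the row variable $X$ is nominal and the column variable $Y$ is ordinal. A useful loglinear model replaces the ordered row scores in model (3.5) by unordered parameters $\{\mu_i\}$,

$$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \mu_iv_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{3.7}$$

The sufficient statistics for $\{\mu_i\}$ are $\sum_k\sum_j v_jn_{ijk}$, $i=1,\ldots,I$. These can be interpreted as the row sums for a response $Y$ within each level of $X$, using the scores $\{v_j\}$, summed over the strata. Assuming the model holds, we can test conditional independence by testing $\mu_1=\mu_2=\cdots=\mu_I$.

Let $Y_1,\ldots,Y_{n_{++k}}$ be a random sample within stratum $k$, which takes scores $v_1,\ldots,v_J$. Let $l$ denote the $(I-1)\times1$ vector having elements

$$l_i = \sum_k n_{i+k}\,(W_{ik}-\bar W_k), \qquad i=1,\ldots,I-1, \tag{3.8}$$

where $W_{ik}=\sum_j n_{ijk}v_j/n_{i+k}$ and $\bar W_k=\sum_i\sum_j n_{ijk}v_j/n_{++k}$. Note that $W_{ik}$ is the row mean on $Y$ at level $i$ of $X$ and level $k$ of $Z$, treating $Y$ as a response with scores $\{v_j\}$. Similarly, $\bar W_k$ is the $k$th stratum mean for $Y$. Let $A$ denote the null covariance matrix of $l$, which, from (3.3), has elements

$$\mathrm{Cov}(l_i,l_{i'}) = \sum_k \frac{n_{i+k}\,(\delta_{ii'}n_{++k}-n_{i'+k})}{n_{++k}\,(n_{++k}-1)}\left[\sum_j v_j^2n_{+jk} - \frac{\big(\sum_j v_jn_{+jk}\big)^2}{n_{++k}}\right]. \tag{3.9}$$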


Then the efficient score statistic for testing conditional independence against the alternative of (3.7) is $l'A^{-1}l$. This statistic is sensitive to location differences among the $I$ conditional distributions of $Y$ that are similar at each level of $Z$. The asymptotic null distribution is chi-squared with $df=I-1$.

The three statistics just discussed were suggested by Birch (1965) for testing conditional independence. The three asymptotic tests are available in SAS (PROC FREQ).

3.2.4 Generalized Tests

The previous three statistics are special cases of a general statistic proposed by Landis et al. (1978). Let $n_k$ denote a column vector of the cell counts in stratum $k$, and let $m_k$ denote their expected values. Also let $P_{i+k}$ denote the marginal proportion of the $i$th row and let $P_{+jk}$ denote the marginal proportion of the $j$th column. We introduce the following notation to define the generalized test statistic:

$$n_{ik} = (n_{i1k},\ldots,n_{iJk})', \qquad n_k = (n_{1k}',\ldots,n_{Ik}')',$$

$$P_{i+k} = \frac{n_{i+k}}{n_{++k}}, \qquad P_{+jk} = \frac{n_{+jk}}{n_{++k}},$$

$$P_{*+k} = (P_{1+k},P_{2+k},\ldots,P_{I+k})' = \left(\frac{n_{1+k}}{n_{++k}},\frac{n_{2+k}}{n_{++k}},\ldots,\frac{n_{I+k}}{n_{++k}}\right)',$$

$$P_{+*k} = (P_{+1k},P_{+2k},\ldots,P_{+Jk})' = \left(\frac{n_{+1k}}{n_{++k}},\frac{n_{+2k}}{n_{++k}},\ldots,\frac{n_{+Jk}}{n_{++k}}\right)'.$$


Assume that cell counts from different strata are independent. Landis et al. (1978) showed that under the hypothesis of conditional independence, the expected value and covariance matrix of the frequencies are, respectively,

$$m_k = E[n_k\,|\,H_0] = n_{++k}\,(P_{*+k}\otimes P_{+*k}) \tag{3.10}$$

and

$$\mathrm{Var}[n_k\,|\,H_0] = \frac{n_{++k}^2}{n_{++k}-1}\left\{\big(D_{P_{*+k}} - P_{*+k}P_{*+k}'\big)\otimes\big(D_{P_{+*k}} - P_{+*k}P_{+*k}'\big)\right\}, \tag{3.11}$$

where $\otimes$ denotes Kronecker product multiplication and $D_a$ is a diagonal matrix with the elements of $a$ on the main diagonal. The generalized statistic for testing conditional independence is defined as

$$Q_M = G'V_G^{-1}G, \tag{3.12}$$

where

$$G = \sum_k B_k(n_k-m_k), \qquad V_G = \sum_k B_k\,[\mathrm{Var}(n_k\,|\,H_0)]\,B_k',$$

and where $B_k = R_k\otimes C_k$ is a matrix of fixed constants based on row scores $R_k$ and column scores $C_k$ for the $k$th stratum. When the null hypothesis is true, the statistic $Q_M$ is approximately distributed as chi-squared with degrees of freedom equal to the rank of $B_k$.

Suppose the row variable $X$ is nominal and the column variable $Y$ is ordinal. Then the mean score of $Y$ is meaningful. In this case, the mean score is computed for each row of the table, and the alternative hypothesis is that, for at least one stratum, the mean scores of the $I$ rows are unequal. Then the statistic is sensitive to location differences among the $I$ distributions of $Y$.
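The Kronecker form (3.11) can be checked numerically against the elementwise covariance (3.3), since both describe the same multiple hypergeometric distribution within a stratum. The sketch below is an illustration under assumptions: the function names and the example table are hypothetical, and cells are ordered row by row, as in the definition of $n_k$.

```python
def elementwise_cov(table, i, j, i2, j2):
    """Cov(n_ij, n_i'j') for one stratum from the elementwise formula (3.3)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    a = row[i] * ((n if i == i2 else 0) - row[i2])
    b = col[j] * ((n if j == j2 else 0) - col[j2])
    return a * b / (n ** 2 * (n - 1))

def kronecker_cov(table):
    """Full IJ x IJ covariance from the Kronecker form (3.11)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    p = [r / n for r in row]   # marginal row proportions
    q = [c / n for c in col]   # marginal column proportions
    # D_p - p p' and D_q - q q'
    a = [[(p[i] if i == k else 0.0) - p[i] * p[k] for k in range(len(p))]
         for i in range(len(p))]
    b = [[(q[j] if j == m else 0.0) - q[j] * q[m] for m in range(len(q))]
         for j in range(len(q))]
    scale = n ** 2 / (n - 1)
    ncol = len(q)
    size = len(p) * ncol
    # Element ((i, j), (i', j')) of a Kronecker product is a[i][i'] * b[j][j'].
    return [[scale * a[r // ncol][c // ncol] * b[r % ncol][c % ncol]
             for c in range(size)] for r in range(size)]

# Hypothetical 2 x 3 stratum, for illustration only.
table = [[3, 1, 2], [1, 3, 2]]
full = kronecker_cov(table)
max_diff = max(abs(full[i * 3 + j][i2 * 3 + j2] - elementwise_cov(table, i, j, i2, j2))
               for i in range(2) for j in range(3)
               for i2 in range(2) for j2 in range(3))
print(max_diff)
```

The two forms should agree up to rounding error, so `max_diff` is expected to be negligible.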


For this case we can define the matrix $R_k$ that has dimension $(I-1)\times I$ as

$$R_k = [\,I_{I-1},\ -J_{I-1}\,], \tag{3.13}$$

where $I_{I-1}$ is an identity matrix of rank $I-1$, and $J_{I-1}$ is an $(I-1)\times1$ vector of ones. The matrix $R_k$ has the effect of forming $I-1$ independent contrasts of the $I$ mean scores. The matrix $C_k$ has dimension $1\times J$, and the scores are specified, one for each column. Then $Q_M$ sums over the $K$ strata information about how the $I$ row means compare to their null expected values, and it has $df=I-1$.

When both variables are ordinal, $R_k$ and $C_k$ can be defined as $R_k=(u_1,\ldots,u_I)$ and $C_k=(v_1,\ldots,v_J)$. If the scores $R_k$ and $C_k$ are the same for all strata, $Q_M$ simplifies to $M^2$. When both variables are nominal, $R_k=[\,I_{I-1},\ -J_{I-1}\,]$ and $C_k=[\,I_{J-1},\ -J_{J-1}\,]$ can be used. Then $Q_M$ simplifies to $d'V^{-1}d$ with $df=(I-1)(J-1)$.

For exact tests of conditional independence in $I\times J\times K$ tables, we discussed test statistics assuming a lack of three-factor interaction. These are score statistics for loglinear models that treat none, one, or both of the classifications as ordinal. Also, they have asymptotic chi-squared distributions.

3.3 Tests of Conditional Independence Permitting Three-factor Interaction

The tests discussed so far assume no three-factor interaction. Suppose, instead, we expect the nature of the association between $X$ and $Y$ to vary considerably across levels of $Z$. Then one would test against an alternative that permits the association to vary across the strata of $Z$.
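Whatever statistic is chosen, its exact conditional distribution can be approximated by simulation, as outlined in Section 3.1. The sketch below is an illustration under assumptions, not the routine used in this chapter: instead of Patefield's (1981) algorithm it draws each partial table by randomly pairing row labels with column labels, a simpler scheme that also generates tables with the observed margins under independence. All names and data are hypothetical.

```python
import random

def random_table(row_tot, col_tot, rng):
    """Draw an I x J table with the given margins under independence
    by randomly pairing row labels with column labels."""
    rows = [i for i, r in enumerate(row_tot) for _ in range(r)]
    cols = [j for j, c in enumerate(col_tot) for _ in range(c)]
    rng.shuffle(cols)
    table = [[0] * len(col_tot) for _ in row_tot]
    for i, j in zip(rows, cols):
        table[i][j] += 1
    return table

def monte_carlo_p(tables, statistic, n_sim=2000, seed=0):
    """Estimate P(statistic >= observed) given the margins of every stratum."""
    rng = random.Random(seed)
    margins = [([sum(r) for r in t], [sum(c) for c in zip(*t)]) for t in tables]
    observed = statistic(tables)
    hits = 0
    for _ in range(n_sim):
        sim = [random_table(r, c, rng) for r, c in margins]
        if statistic(sim) >= observed - 1e-12:   # tolerance for floating-point ties
            hits += 1
    return hits / n_sim

def first_cell_sum(tables):
    """Example statistic: the sum of the (1,1) cells over the strata."""
    return sum(t[0][0] for t in tables)

# Hypothetical single-stratum data; the exact value of P(n11 >= 3) is 17/70.
estimate = monte_carlo_p([[[3, 1], [1, 3]]], first_cell_sum, n_sim=5000, seed=1)
print(estimate)
```

The number of simulated tables is fixed in advance, so the running time grows with `n_sim` rather than with the size of the reference set.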


3.3.1 Nominal-by-Nominal Test

Suppose $X$ and $Y$ are nominal. Then one could test conditional independence against the saturated loglinear model, since the only more general model is the saturated model. An efficient score statistic is the Pearson statistic for testing conditional independence against the alternative of the saturated model (Agresti 1992). Letting $X_k^2$ denote the Pearson statistic for testing independence within the $k$th level of $Z$, this statistic is $\sum_k X_k^2$.

The asymptotic distribution of this statistic is chi-squared with $df=K(I-1)(J-1)$, since at each partial table $X_k^2$ has an asymptotic chi-squared distribution with $df=(I-1)(J-1)$, and we have $K$ independent partial tables. Also, this is the $df$ for testing a loglinear model of conditional independence against the most general alternative.

3.3.2 Ordinal-by-Ordinal Test

The model of homogeneous linear-by-linear association (3.5) allows association between two ordinal variables in each table, and this association is homogeneous across levels of $Z$. When $X$ and $Y$ are ordinal, one sometimes expects a monotone association between $X$ and $Y$ that changes strength across levels of $Z$. We consider a loglinear model that permits association between $X$ and $Y$ within each level of $Z$ but heterogeneity among levels of $Z$, where the degree of heterogeneity is explained by its association parameters. A relevant loglinear model is then the heterogeneous linear-by-linear association model,

$$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta_ku_iv_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{3.14}$$


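For this model, the null hypothesis of conditional independence is $H_0:\beta_1=\cdots=\beta_K=0$. The loglikelihood is

$$\begin{aligned}
L(m) &= \sum_i\sum_j\sum_k n_{ijk}\log m_{ijk} - \sum_i\sum_j\sum_k m_{ijk} \\
&= \sum_i\sum_j\sum_k n_{ijk}\big(\mu+\lambda_i^X+\lambda_j^Y+\lambda_k^Z+\beta_ku_iv_j+\lambda_{ik}^{XZ}+\lambda_{jk}^{YZ}\big) - \sum_i\sum_j\sum_k m_{ijk} \\
&= n\mu + \sum_i\lambda_i^Xn_{i++} + \sum_j\lambda_j^Yn_{+j+} + \sum_k\lambda_k^Zn_{++k} + \sum_k\beta_k\sum_i\sum_j u_iv_jn_{ijk} \\
&\quad + \sum_i\sum_k\lambda_{ik}^{XZ}n_{i+k} + \sum_j\sum_k\lambda_{jk}^{YZ}n_{+jk} - \sum_i\sum_j\sum_k m_{ijk}.
\end{aligned} \tag{3.15}$$

For this model the sufficient statistic for $\beta_k$ is $\sum_i\sum_j u_iv_jn_{ijk}$. For $k=1,\ldots,K$, the derivative of the loglikelihood is

$$\frac{\partial L(m)}{\partial\beta_k} = \sum_i\sum_j u_iv_jn_{ijk} - \sum_i\sum_j u_iv_jm_{ijk}.$$

Under the hypothesis of conditional independence, we have $m_{ijk}=n_{i+k}n_{+jk}/n_{++k}$. Hence, for $k=1,\ldots,K$,

$$\frac{\partial L(m)}{\partial\beta_k} = \sum_i\sum_j u_iv_j\,(n_{ijk}-m_{ijk}) = n\sum_i\sum_j u_iv_j\left(p_{ijk}-\frac{p_{i+k}\,p_{+jk}}{p_{++k}}\right).$$

Let $s$ denote the $K\times1$ vector having elements

$$s_k = \sum_i\sum_j u_iv_j\left(p_{ijk}-\frac{p_{i+k}\,p_{+jk}}{p_{++k}}\right) = \frac1n\sum_i\sum_j u_iv_j\left(n_{ijk}-\frac{n_{i+k}\,n_{+jk}}{n_{++k}}\right). \tag{3.16}$$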
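The elements $s_k$ in (3.16) are simple to compute from the partial tables. The following sketch uses a hypothetical function name and hypothetical data, and returns the vector $s$ in the scaling of (3.16), that is, divided by the total sample size $n$.

```python
def score_vector(tables, u, v):
    """Return [s_1, ..., s_K] from (3.16) for a list of K partial tables."""
    n = sum(sum(sum(r) for r in t) for t in tables)   # total sample size
    s = []
    for t in tables:
        row = [sum(r) for r in t]          # n_{i+k}
        col = [sum(c) for c in zip(*t)]    # n_{+jk}
        tot = sum(row)                     # n_{++k}
        s_k = sum(u[i] * v[j] * (t[i][j] - row[i] * col[j] / tot)
                  for i in range(len(u)) for j in range(len(v)))
        s.append(s_k / n)
    return s

# Hypothetical data: two 2x2 strata with scores u = v = (1, 0).
s = score_vector([[[3, 1], [1, 3]], [[2, 2], [2, 2]]], [1, 0], [1, 0])
print(s)
```

Keeping the $K$ components separate, rather than summing them as in $M^2$, is what allows the test to detect associations whose strength or direction differs across strata.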


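Then $s$ can be defined as

$$s = \begin{pmatrix} \sum_i\sum_j u_iv_j\big(p_{ij1}-p_{i+1}p_{+j1}/p_{++1}\big) \\ \sum_i\sum_j u_iv_j\big(p_{ij2}-p_{i+2}p_{+j2}/p_{++2}\big) \\ \vdots \\ \sum_i\sum_j u_iv_j\big(p_{ijk}-p_{i+k}p_{+jk}/p_{++k}\big) \\ \vdots \\ \sum_i\sum_j u_iv_j\big(p_{ijK}-p_{i+K}p_{+jK}/p_{++K}\big) \end{pmatrix} = \frac1n\begin{pmatrix} \sum_i\sum_j u_iv_j\big(n_{ij1}-n_{i+1}n_{+j1}/n_{++1}\big) \\ \sum_i\sum_j u_iv_j\big(n_{ij2}-n_{i+2}n_{+j2}/n_{++2}\big) \\ \vdots \\ \sum_i\sum_j u_iv_j\big(n_{ijk}-n_{i+k}n_{+jk}/n_{++k}\big) \\ \vdots \\ \sum_i\sum_j u_iv_j\big(n_{ijK}-n_{i+K}n_{+jK}/n_{++K}\big) \end{pmatrix}.$$

For fixed $k$, let $G_k(\pi)=\sum_i\sum_j u_iv_j\big(\pi_{ijk}-\pi_{i+k}\pi_{+jk}/\pi_{++k}\big)$. Let $g_k$ represent the $IJ\times1$ vector having elements

$$g_k(i,j) = \frac{\big(u_i\pi_{++k}-\sum_a u_a\pi_{a+k}\big)\big(v_j\pi_{++k}-\sum_b v_b\pi_{+bk}\big)}{\pi_{++k}^2},$$

and let $g_k^*$ be the $IJK\times1$ vector with $g_k^{*\prime}=(0',\ldots,g_k',\ldots,0')$. For example,

$$g_1^* = \frac{\partial G_1(\pi)}{\partial\pi} = \frac{1}{\pi_{++1}^2}\begin{pmatrix} \big(u_1\pi_{++1}-\sum_a u_a\pi_{a+1}\big)\big(v_1\pi_{++1}-\sum_b v_b\pi_{+b1}\big) \\ \big(u_1\pi_{++1}-\sum_a u_a\pi_{a+1}\big)\big(v_2\pi_{++1}-\sum_b v_b\pi_{+b1}\big) \\ \vdots \\ \big(u_i\pi_{++1}-\sum_a u_a\pi_{a+1}\big)\big(v_j\pi_{++1}-\sum_b v_b\pi_{+b1}\big) \\ \vdots \\ \big(u_I\pi_{++1}-\sum_a u_a\pi_{a+1}\big)\big(v_J\pi_{++1}-\sum_b v_b\pi_{+b1}\big) \\ 0_{(K-1)IJ} \end{pmatrix}$$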


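In terms of the observed counts,

$$g_1^* = \frac{1}{n_{++1}^2}\begin{pmatrix} \big(u_1n_{++1}-\sum_a u_an_{a+1}\big)\big(v_1n_{++1}-\sum_b v_bn_{+b1}\big) \\ \big(u_1n_{++1}-\sum_a u_an_{a+1}\big)\big(v_2n_{++1}-\sum_b v_bn_{+b1}\big) \\ \vdots \\ \big(u_in_{++1}-\sum_a u_an_{a+1}\big)\big(v_jn_{++1}-\sum_b v_bn_{+b1}\big) \\ \vdots \\ \big(u_In_{++1}-\sum_a u_an_{a+1}\big)\big(v_Jn_{++1}-\sum_b v_bn_{+b1}\big) \\ 0_{(K-1)IJ} \end{pmatrix} = \begin{pmatrix} g_1 \\ 0_{(K-1)IJ} \end{pmatrix},$$

and

$$g_k^* = \frac{\partial G_k(\pi)}{\partial\pi} = \frac{1}{\pi_{++k}^2}\begin{pmatrix} 0_{(k-1)IJ} \\ \big(u_1\pi_{++k}-\sum_a u_a\pi_{a+k}\big)\big(v_1\pi_{++k}-\sum_b v_b\pi_{+bk}\big) \\ \big(u_1\pi_{++k}-\sum_a u_a\pi_{a+k}\big)\big(v_2\pi_{++k}-\sum_b v_b\pi_{+bk}\big) \\ \vdots \\ \big(u_i\pi_{++k}-\sum_a u_a\pi_{a+k}\big)\big(v_j\pi_{++k}-\sum_b v_b\pi_{+bk}\big) \\ \vdots \\ \big(u_I\pi_{++k}-\sum_a u_a\pi_{a+k}\big)\big(v_J\pi_{++k}-\sum_b v_b\pi_{+bk}\big) \\ 0_{(K-k)IJ} \end{pmatrix}$$

$$\phantom{g_k^*} = \frac{1}{n_{++k}^2}\begin{pmatrix} 0_{(k-1)IJ} \\ \big(u_1n_{++k}-\sum_a u_an_{a+k}\big)\big(v_1n_{++k}-\sum_b v_bn_{+bk}\big) \\ \big(u_1n_{++k}-\sum_a u_an_{a+k}\big)\big(v_2n_{++k}-\sum_b v_bn_{+bk}\big) \\ \vdots \\ \big(u_in_{++k}-\sum_a u_an_{a+k}\big)\big(v_jn_{++k}-\sum_b v_bn_{+bk}\big) \\ \vdots \\ \big(u_In_{++k}-\sum_a u_an_{a+k}\big)\big(v_Jn_{++k}-\sum_b v_bn_{+bk}\big) \\ 0_{(K-k)IJ} \end{pmatrix} = \begin{pmatrix} 0_{(k-1)IJ} \\ g_k \\ 0_{(K-k)IJ} \end{pmatrix}.$$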


Also let $D$ represent the $K \times IJK$ matrix such that row $k$ consists of $g_k^{*\prime}$, that is,
\[
D = \left[\frac{\partial G_1(\pi)}{\partial \pi}, \ldots, \frac{\partial G_K(\pi)}{\partial \pi}\right]'.
\]
The null asymptotic covariance matrix of $s$ is $H = D\Sigma D'/n$, where $n = \sum_i\sum_j\sum_k n_{ijk}$ and $\Sigma = \mathrm{Diag}(p) - pp'$ with $p = \{p_{ijk}\}$. The score statistic for testing $H_0 : \beta_1 = \cdots = \beta_K = 0$ is then $s'H^{-1}s$. From Rao (1973, page 418), the asymptotic distribution of $s$ is $K$-variate normal. Its mean is zero and its dispersion matrix is the information matrix. Hence the asymptotic distribution of $s'H^{-1}s$ is chi-squared with df = $K$. The number of df is the number of components of parameters for testing, or the rank of the asymptotic covariance matrix.

3.3.3 Nominal-by-Ordinal Test

A loglinear model (3.7) implies there are row effects on the association, and these row effects are the same for each level of $Z$. In general cases when $X$ is nominal and $Y$ is ordinal, we might expect heterogeneity in the row effects on the association. Then a relevant loglinear model to allow heterogeneity across the strata is
\[
\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \mu_{ik} v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{3.17}
\]
The model is sensitive to alternatives whereby means on $Y$ vary across levels of both $X$ and $Z$. For identifiability, we use constraints $\mu_{Ik} = 0$. For this model, the null hypothesis of conditional independence is $H_0 : \mu_{ik} = 0$ for $i = 1, \ldots, I-1$ and


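$k = 1, \ldots, K$. The loglikelihood is
\[
\begin{aligned}
L(m) &= \sum_i\sum_j\sum_k n_{ijk}\log m_{ijk} - \sum_i\sum_j\sum_k m_{ijk} \\
&= \sum_i\sum_j\sum_k n_{ijk}\left(\mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \mu_{ik} v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}\right) - \sum_i\sum_j\sum_k m_{ijk} \\
&= n\mu + \sum_i \lambda_i^X n_{i++} + \sum_j \lambda_j^Y n_{+j+} + \sum_k \lambda_k^Z n_{++k} + \sum_i\sum_k \mu_{ik} \sum_j v_j n_{ijk} \\
&\quad + \sum_i\sum_k \lambda_{ik}^{XZ} n_{i+k} + \sum_j\sum_k \lambda_{jk}^{YZ} n_{+jk} - \sum_i\sum_j\sum_k m_{ijk}.
\end{aligned}
\tag{3.18}
\]
For this model the sufficient statistic for $\mu_{ik}$ is $\sum_j v_j n_{ijk}$. For fixed $i$ and $k$, the derivative of the loglikelihood is
\[
\frac{\partial L(m)}{\partial \mu_{ik}} = \sum_j v_j n_{ijk} - \sum_j v_j m_{ijk}.
\]
Under the hypothesis of conditional independence, we have
\[
m_{ijk} = \frac{n_{i+k}\, n_{+jk}}{n_{++k}}.
\]
Hence, for fixed $i$ and $k$,
\[
\frac{\partial L(m)}{\partial \mu_{ik}} = \sum_j v_j (n_{ijk} - m_{ijk}) = n \sum_j v_j \left(p_{ijk} - \frac{p_{i+k}\, p_{+jk}}{p_{++k}}\right).
\]
For $i = 1, \ldots, I-1$, $k = 1, \ldots, K$, let $q$ be the $K(I-1) \times 1$ vector having elements
\[
q_{ik} = \sum_j v_j \left(p_{ijk} - \frac{p_{i+k}\, p_{+jk}}{p_{++k}}\right) = \frac{1}{n}\sum_j v_j \left(n_{ijk} - \frac{n_{i+k}\, n_{+jk}}{n_{++k}}\right) = \frac{1}{n}\, n_{i+k}\,(W_{ik} - W_k), \tag{3.19}
\]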


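where $W_{ik} = \sum_j v_j n_{ijk} / n_{i+k}$ and $W_k = \sum_i\sum_j n_{ijk} v_j / n_{++k}$. Then $q$ can be defined as
\[
q = \begin{pmatrix}
\sum_j v_j \left(p_{1j1} - p_{1+1}p_{+j1}/p_{++1}\right) \\
\vdots \\
\sum_j v_j \left(p_{(I-1)jk} - p_{(I-1)+k}p_{+jk}/p_{++k}\right) \\
\vdots \\
\sum_j v_j \left(p_{1jK} - p_{1+K}p_{+jK}/p_{++K}\right) \\
\sum_j v_j \left(p_{2jK} - p_{2+K}p_{+jK}/p_{++K}\right) \\
\vdots \\
\sum_j v_j \left(p_{(I-1)jK} - p_{(I-1)+K}p_{+jK}/p_{++K}\right)
\end{pmatrix}.
\]
Or it can be written in terms of the cell counts, with elements $\frac{1}{n}\sum_j v_j \left(n_{ijk} - n_{i+k}n_{+jk}/n_{++k}\right)$ as in (3.19).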

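For fixed $i$, $k$, let $G_{ik}(\pi) = \sum_j v_j \left(\pi_{ijk} - \pi_{i+k}\pi_{+jk}/\pi_{++k}\right)$. Let $r_{ik}$ represent the $IJ \times 1$ vector having elements
\[
r_{ik}(i', j') = \frac{1}{\pi_{++k}^2}\left[\left(v_{j'}\pi_{++k} - \sum_b v_b \pi_{+bk}\right)\left(\pi_{++k}\delta_{ii'} - \pi_{i+k}\right)\right], \quad i' = 1, \ldots, I, \; j' = 1, \ldots, J,
\]
and let $r_{ik}^*$ be the $IJK \times 1$ vector with $r_{ik}^{*\prime} = (0'_{(k-1)IJ},\, r_{ik}',\, 0'_{(K-k)IJ})$. That is,
\[
r_{ik}^* = \frac{\partial G_{ik}(\pi)}{\partial \pi} = \frac{1}{\pi_{++k}^2}
\begin{pmatrix}
0_{(k-1)IJ} \\
(v_1\pi_{++k} - \sum_b v_b\pi_{+bk})(-\pi_{i+k}) \\
(v_2\pi_{++k} - \sum_b v_b\pi_{+bk})(-\pi_{i+k}) \\
\vdots \\
(v_J\pi_{++k} - \sum_b v_b\pi_{+bk})(-\pi_{i+k}) \\
\vdots \\
(v_1\pi_{++k} - \sum_b v_b\pi_{+bk})(\pi_{++k} - \pi_{i+k}) \\
(v_2\pi_{++k} - \sum_b v_b\pi_{+bk})(\pi_{++k} - \pi_{i+k}) \\
\vdots \\
(v_J\pi_{++k} - \sum_b v_b\pi_{+bk})(\pi_{++k} - \pi_{i+k}) \\
\vdots \\
(v_1\pi_{++k} - \sum_b v_b\pi_{+bk})(-\pi_{i+k}) \\
(v_2\pi_{++k} - \sum_b v_b\pi_{+bk})(-\pi_{i+k}) \\
\vdots \\
(v_J\pi_{++k} - \sum_b v_b\pi_{+bk})(-\pi_{i+k}) \\
0_{(K-k)IJ}
\end{pmatrix},
\]
where the block with factor $(\pi_{++k} - \pi_{i+k})$ corresponds to row $i' = i$. Or
\[
r_{ik}^* = \frac{1}{n_{++k}^2}
\begin{pmatrix}
0_{(k-1)IJ} \\
(v_1 n_{++k} - \sum_b v_b n_{+bk})(-n_{i+k}) \\
\vdots \\
(v_J n_{++k} - \sum_b v_b n_{+bk})(-n_{i+k}) \\
\vdots \\
(v_1 n_{++k} - \sum_b v_b n_{+bk})(n_{++k} - n_{i+k}) \\
\vdots \\
(v_J n_{++k} - \sum_b v_b n_{+bk})(n_{++k} - n_{i+k}) \\
\vdots \\
(v_1 n_{++k} - \sum_b v_b n_{+bk})(-n_{i+k}) \\
\vdots \\
(v_J n_{++k} - \sum_b v_b n_{+bk})(-n_{i+k}) \\
0_{(K-k)IJ}
\end{pmatrix}
= \begin{pmatrix} 0_{(k-1)IJ} \\ r_{ik} \\ 0_{(K-k)IJ} \end{pmatrix}.
\]
Also let $E$ represent the $K(I-1) \times IJK$ matrix such that the row corresponding to $i$, $k$ consists of $r_{ik}^{*\prime}$, that is,
\[
E = \left[\frac{\partial G_{11}(\pi)}{\partial \pi}, \ldots, \frac{\partial G_{(I-1)K}(\pi)}{\partial \pi}\right]'.
\]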

testing $H_0 : \mu_{ik} = 0$ for $i = 1, \ldots, I-1$ and $k = 1, \ldots, K$ is $q'R^{-1}q$. Its asymptotic distribution is chi-squared with df = $K(I-1)$. The number of df is the rank of the asymptotic covariance matrix or the number of components of parameters for testing. For exact tests, one identifies any of these six statistics with $T$ in the calculation of the exact P-value. We discuss next how to construct modified exact P-values for the six tests.

3.4 The Construction of the Modified Exact P-value

So far, we have discussed six test statistics for testing conditional independence of $X$ and $Y$, given $Z$, in three-way contingency tables. The ordinary exact P-value can be constructed by utilizing these statistics. In Chapter 2, we proposed a modified exact P-value to reduce the degree of conservativeness. It is based on both the usual test statistic $T$ and, at the observed value of $T$, a secondary statistic $T'$ that generates a secondary partitioning. The statistic $T'$ is directed toward a broader alternative, so $T'$ can catch some information about the validity of the null hypothesis when the assumed alternative for $T$ is not exactly satisfied. The modified exact P-value is defined in Chapter 2 as
\[
P^* = P_{H_0}(T > t_0) + P_{H_0}(T = t_0,\; T' \ge t_0'),
\]
when large values of $T$ and $T'$ contradict the null. We have shown in Chapter 2, using $2 \times 2 \times K$ tables, that the modified P-value has a less discrete sampling distribution, and that modified tests reduce the degree of conservativeness. We can apply this modified approach to $I \times J \times K$ tables to reduce the conservativeness and to get sharper results. For testing conditional independence assuming no three-factor interaction, we denote $T_1$ to be the test statistic when both $X$ and $Y$ are nominal, denote $T_2$ to be


the test statistic when $X$ is nominal and $Y$ is ordinal, denote $T_2'$ to be the test statistic when $X$ is ordinal and $Y$ is nominal, and denote $T_3$ to be the test statistic when both $X$ and $Y$ are ordinal. Also, let $T_4$, $T_5$, $T_5'$, and $T_6$ be the corresponding test statistics when we permit three-factor interaction. Note that these are score statistics. In this section, we discuss possible alternative ways of forming modified P-values for testing conditional independence for $I \times J \times K$ tables. Ordinary exact P-values for these six tests correspond to six loglinear models for primary alternative hypotheses.

The general rule to construct the modified exact P-value is as follows. We use a score statistic for $T'$, in order to have consistency. If there is only one potential statistic for $T'$, we use that one. But if there is more than one potential statistic, we apply a basic principle to choose a $T'$ among them.

Now, we establish basic principles. We can consider four types of principles. The first principle is to choose a $T'$ from the next most general alternative, while keeping the same assumption as $T$ about three-factor interaction. The second principle is to choose a $T'$ from the most general alternative, while keeping the same assumption as $T$ about three-factor interaction. The third principle is to choose a $T'$ from the most general alternative among all cases. The fourth principle is to choose a $T'$ while keeping the nature of the classification variables.

Next, we discuss all possible statistics for $T'$ for six cases. Note that all possible potential statistics for $T'$ are $T_1$, $T_2$, $T_2'$, $T_3$, $T_4$, $T_5$, $T_5'$, and $T_6$. We first consider the tests assuming no three-factor interaction. When both $X$ and $Y$ are nominal, the primary test statistic $T$ is $T_1$. The secondary statistic $T'$ can be $T_4$, since $T_4$ corresponds to a more general alternative hypothesis. Second, when $X$ is nominal and $Y$ is ordinal, $T$ is $T_2$ and $T'$ can be $T_1$, $T_4$, or $T_5$.
Third, when both $X$ and $Y$ are ordinal, $T$ is $T_3$ and $T'$ can be $T_1$, $T_2$, $T_2'$, $T_4$, $T_5$, $T_5'$, or $T_6$. Since $T_3$ is constructed from the narrowest alternative, the other statistics can be potential statistics for $T'$.


Next, we assume three-factor interaction. First, when both $X$ and $Y$ are nominal, $T$ is $T_4$, but there is no general score statistic for $T'$, since $T$ is constructed from the most general alternative. We could, however, use the table probability for $T'$ for the secondary partitioning. Second, when $X$ is nominal and $Y$ is ordinal, $T$ is $T_5$ and $T'$ can be $T_4$. Finally, when both $X$ and $Y$ are ordinal, $T$ is $T_6$, and $T'$ can be $T_4$, $T_5$, or $T_5'$. Table 3.1 summarizes all possible statistics for $T'$ for six tests.

We see two cases have only one potential score statistic for $T'$. For the nominal-by-nominal case assuming no three-factor interaction, $T'$ is $T_4$. Note that for the nominal-by-nominal case permitting three-factor interaction, there is no score statistic, but we could use the table probability. Also, for the nominal-by-ordinal case permitting three-factor interaction, $T'$ is $T_4$. For these three cases, there is only one choice for $T'$. For the other three cases, we apply a basic principle in order to choose a $T'$ among potential statistics.

For the first principle, we choose a $T'$ from the next most general alternative, while keeping the same assumption as $T$ about three-factor interaction. Assuming no three-factor interaction, $(T, T')$ is $(T_2, T_1)$ for the nominal-by-ordinal case, since the nominal-by-nominal case is more general, and it also corresponds to the next most general alternative assuming no three-factor interaction in this case. For the ordinal-by-ordinal case, the next most general alternative corresponds to the nominal-by-ordinal case or the ordinal-by-nominal case. Hence $(T, T')$ is $(T_3, T_2)$ or $(T_3, T_2')$. Accordingly, for the ordinal-by-ordinal case permitting three-factor interaction, $(T, T')$ is $(T_6, T_5)$ or $(T_6, T_5')$.

The second principle is to choose a $T'$ from the most general alternative among three cases, while keeping the same assumption as $T$ about three-factor interaction.
Then, assuming no three-factor interaction, the corresponding statistics for $(T, T')$ are $(T_2, T_1)$ for the nominal-by-ordinal case and $(T_3, T_1)$ for the ordinal-by-ordinal case, since the nominal-by-nominal case is the most general among three cases. Also, for the ordinal-by-ordinal case permitting three-factor interaction, $(T, T')$ is $(T_6, T_4)$.


For the third principle of the most general alternative among all cases, the corresponding statistics for $(T, T')$ are $(T_2, T_4)$, $(T_3, T_4)$, and $(T_6, T_4)$, since $T_4$ corresponds to the most general alternative among all cases.

For the fourth principle of keeping the nature of the classification variables, the corresponding statistics for $(T, T')$ are $(T_2, T_5)$ and $(T_3, T_6)$. For the ordinal-by-ordinal case permitting three-factor interaction, $T'$ does not have a potential statistic in this principle.

Among four principles, we prefer the first principle, since modified P-values can be defined for most cases using this principle, and it can utilize the ordinality of classification variables. For the second and third principles, $T'$ does not consider possible ordinality. Table 3.2 summarizes test statistics for the construction of ordinary and modified exact P-values for testing conditional independence in $I \times J \times K$ contingency tables using the first principle. For $I \times J \times K$ contingency tables, the discreteness will not be severe when the sample size is large. But when the sample size is small, the modified P-value can reduce the conservativeness. We discuss implementation of the exact tests in the next section.

Table 3.1. All possible statistics for $T'$ for six tests.

                                          T       Possible T'
Assuming no three-factor interaction
  Nominal-by-Nominal                      T_1     T_4
  Nominal-by-Ordinal                      T_2     T_1, T_4, T_5
  Ordinal-by-Ordinal                      T_3     T_1, T_2, T_2', T_4, T_5, T_5', T_6
Permitting three-factor interaction
  Nominal-by-Nominal                      T_4     (no score statistic)
  Nominal-by-Ordinal                      T_5     T_4
  Ordinal-by-Ordinal                      T_6     T_4, T_5, T_5'


Table 3.2. Test statistics for the construction of the ordinary and modified exact P-values for testing conditional independence in $I \times J \times K$ contingency tables.

                                          Ordinary P-value P    Modified P-value P*
                                          T                     (T, T')
Assuming no three-factor interaction
  Nominal-by-Nominal                      T_1                   (T_1, T_4)
  Nominal-by-Ordinal                      T_2                   (T_2, T_1)
  Ordinal-by-Ordinal                      T_3                   (T_3, T_2)
Permitting three-factor interaction
  Nominal-by-Nominal                      T_4                   (T_4, P(Z))
  Nominal-by-Ordinal                      T_5                   (T_5, T_4)
  Ordinal-by-Ordinal                      T_6                   (T_6, T_5)

3.5 Approximation of Exact P-values

For three-way contingency tables, algorithms for testing conditional independence are available in widely available software only for the $2 \times J \times K$ case with ordered columns (StatXact 1991). Even for table sizes where software exists, the reference set of tables for the conditional distribution is sometimes too large for an exact P-value computation. For instance, sometimes the sample size is moderately large but there are many cells and the table is sparse, so exact methods are infeasible but the use of standard asymptotic theory is questionable.

In some cases, one can obtain a very accurate approximation to the distribution of the test statistic using a saddlepoint approximation. This higher-order asymptotic approximation is more accurate than the normal approximation or the one- or two-term Edgeworth expansion. It is applicable to conditional densities and tail probabilities of sufficient statistics in exponential families. For example, to approximate


conditional tail probabilities, one can use an approximation due to Skovgaard (1987). Davison (1988) applied the approximation to model (3.5) for $2 \times 2 \times K$ tables, and Pierce and Peters (1992) applied it to model (3.5) for $K = 1$.

To illustrate the saddlepoint approximation, we show how to apply it to the homogeneous linear-by-linear association model (3.5) for arbitrary $K$. Let $\hat\beta$ denote the ML estimate of $\beta$ in that model. Let $G^2(I)$ and $G^2(L \times L)$ denote the likelihood-ratio statistics for testing the goodness of fit of the conditional independence and homogeneous linear-by-linear association models. The conditional P-value for testing $H_0 : \beta = 0$ against $H_a : \beta > 0$ has saddlepoint approximation
\[
\Pr\left(T \ge t_0 \mid \{n_{i+k}\}, \{n_{+jk}\}\right) \approx 1 - \Phi(z) + \phi(z)\left(\frac{1}{w} - \frac{1}{z}\right), \tag{3.20}
\]
where
\[
z = \mathrm{sgn}(\hat\beta)\sqrt{G^2(I) - G^2(L \times L)} \quad \text{and} \quad w = 2\sinh\left(\frac{\hat\beta}{2}\right)\frac{|I_L|^{1/2}}{|I_I|^{1/2}}.
\]
The matrices $I_I$ and $I_L$ are the observed information matrices for the conditional independence model and homogeneous linear-by-linear association model, and $\Phi$ and $\phi$ denote the standard normal cdf and pdf.

Since software is not yet available in the generality needed for the exact conditional methods we have described for $I \times J \times K$ tables, we next present an alternative method that can approximate the exact conditional result as well as needed. This is the simple approach of performing a Monte Carlo simulation on the conditional set. The Monte Carlo method is an alternative to computing either the exact or asymptotic P-values. It is useful for those situations where the data set is too large for an exact P-value computation or too sparse to rely on the asymptotic theory. Agresti et al. (1979) utilized this method effectively for a variety of tests for two-way tables. Even for large tables or large sample sizes, one can quickly approximate


as closely as needed the ordinary and modified exact P-values for the six statistics presented in Section 2 and Section 3. The method consists of sampling contingency tables from the conditional reference set in proportion to their probabilities, and computing an unbiased point estimate and a narrow confidence interval for an exact P-value. We constructed an algorithm to perform precise approximations for the exact inferences using a table-generation procedure suggested by Patefield (1981). For practical applications, we prefer this approximation to the saddlepoint because it is available more generally (e.g., for multi-degree-of-freedom statistics for testing vectors of parameters), because its accuracy is known to the user, and because that accuracy can be set as finely as one requires.

We proposed ordinary and modified exact P-values for six tests, and $T$ and $T'$ are defined in Table 3.2. To illustrate, suppose we want to estimate a modified exact one-sided P-value when $X$ and $Y$ are ordinal assuming no three-factor interaction. Then, we test against a narrower alternative of the homogeneous linear-by-linear association model (3.5). The secondary statistic $T'$ is a test statistic directed toward a broader alternative hypothesis. For $T'$, one possibility is the score statistic for the case of nominal-ordinal association assuming no three-factor interaction. Let $t_0$ and $t_0'$ denote the observed values of $T$ and $T'$. In this case $T$ is the test statistic for model (3.5) and $T'$ is a score statistic discussed in Section 3.2.3. This is a one-sided test. Accordingly, modified exact P-values for other tests can be constructed by using $T$ and $T'$ in Table 3.2. They are two-sided tests.

To implement the exact tests, we sample $M$ contingency tables, with replacement, from the reference set $\Gamma$ of tables with the same margins, where $M$ is chosen to give the desired degree of accuracy with some fixed probability. Define the upper critical region of the reference set by
\[
\Gamma^* = \{Z \in \Gamma : T > t_0 \text{ or } (T = t_0 \text{ and } T' \ge t_0')\}.
\]
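The sampling scheme just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FORTRAN program of this chapter: tables are generated by randomly permuting column labels within each stratum (a simple substitute for Patefield's procedure that also draws tables with the given margins under independence), and the secondary statistic `row_score_spread` is a hypothetical stand-in for $T'$, not one of the score statistics derived above. All function names are invented for the example.

```python
import random


def sample_stratum(row_totals, col_totals, rng):
    """Draw one I x J table with the given margins under independence
    by shuffling column labels (a stand-in for Patefield's algorithm)."""
    col_labels = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(col_labels)
    table = [[0] * len(col_totals) for _ in row_totals]
    pos = 0
    for i, r in enumerate(row_totals):
        for j in col_labels[pos:pos + r]:
            table[i][j] += 1
        pos += r
    return table


def margins(stratum):
    return [sum(row) for row in stratum], [sum(col) for col in zip(*stratum)]


def estimate_p_values(observed, stat, stat2, n_sim, seed=0):
    """Monte Carlo estimates of the ordinary and modified P-values.

    observed[k][i][j] holds the counts; stat and stat2 play the roles of
    T and T' and should return exactly comparable values (e.g. integers).
    """
    rng = random.Random(seed)
    t0, t0_prime = stat(observed), stat2(observed)
    fixed = [margins(s) for s in observed]
    ordinary = modified = 0
    for _ in range(n_sim):
        z = [sample_stratum(r, c, rng) for r, c in fixed]
        t = stat(z)
        if t > t0:
            ordinary += 1
            modified += 1
        elif t == t0:
            ordinary += 1
            if stat2(z) >= t0_prime:   # secondary partition of the tie set
                modified += 1
    return ordinary / n_sim, modified / n_sim


def lin_by_lin(z):
    """T = sum_k sum_i sum_j u_i v_j n_ijk with unit-spaced scores."""
    return sum((i + 1) * (j + 1) * n
               for s in z for i, row in enumerate(s) for j, n in enumerate(row))


def row_score_spread(z):
    """Hypothetical secondary statistic, used only to illustrate T'."""
    return sum(sum((j + 1) * n for j, n in enumerate(row)) ** 2
               for s in z for row in s)


# Made-up 2 x 3 x 2 table, for illustration only.
obs = [[[3, 1, 0], [1, 2, 2]], [[2, 1, 1], [0, 2, 3]]]
p_ord, p_mod = estimate_p_values(obs, lin_by_lin, row_score_spread, 2000)
```

Because the modified critical region is a subset of the ordinary one, the modified estimate can never exceed the ordinary estimate computed from the same sampled tables.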


The other possibility for $T'$ is to use the null table probability. Under the null hypothesis of conditional independence, the probability of observing any specific $z \in \Gamma$ is
\[
\Pr(Z = z) = \prod_{k=1}^{K} \frac{\prod_i n_{i+k}!\, \prod_j n_{+jk}!}{n_{++k}!\, \prod_i\prod_j n_{ijk}!}. \tag{3.21}
\]
Then we define the critical region of the reference set by
\[
\Gamma_P^* = \{Z \in \Gamma : T > t_0 \text{ or } (T = t_0 \text{ and } P(Z) \le P(N))\}.
\]
For the $i$th table sampled, let $y_i = 1$ if $z_i \in \Gamma^*$, and let $y_i = 0$ otherwise. The point estimate of the modified P-value is the proportion of sampled tables in $\Gamma^*$, $\hat{P}^* = \sum_i y_i / M$. Likewise, the estimate of the modified P-value using the null table probability for $T'$ can be defined using $\Gamma_P^*$, and we denote it by $\hat{P}_P^*$. For the estimate of the ordinary exact P-value, the upper critical region of the reference set is
\[
\Gamma' = \{Z \in \Gamma : T \ge t_0\},
\]
so the estimate is the proportion of sampled tables that have a test statistic at least as large as the observed one.

3.6 Examples

3.6.1 Example 1

We illustrate the exact tests using Table 3.3. This is a cross classification of job satisfaction by income, controlling for gender, for black Americans sampled in the


General Social Survey of 1991. In order to utilize ordinality in studying the partial association between income and satisfaction, we test conditional independence against the model (3.5) of homogeneous linear-by-linear association. Using equally spaced row and column scores, the likelihood-ratio chi-squared statistic for testing the fit of that model equals 12.33, with df = 17. The estimated association parameter is $\hat\beta = 0.388$ with s.e. = 0.155. The likelihood-ratio chi-squared statistic for testing conditional independence, assuming the model, is 19.37 − 12.33 = 7.04 with df = 1. There seems to be very strong evidence of a positive association between income and satisfaction. However, the data are sparse enough to make large-sample approximations questionable, yet the sample size is sufficiently large that exact analyses are infeasible. We used Monte Carlo sampling with M = 50,000, which guarantees that P-value estimators fall within 0.004 of the true P-value with probability at least 0.95.

For the exact tests assuming no three-factor interaction, the estimates of the ordinary exact P-values (with 95% precision indicated in parentheses) are 0.332 (± 0.004) for the nominal-by-nominal test, 0.024 (± 0.001) for the nominal-by-ordinal test, and 0.006 (± 0.001) for the ordinal-by-ordinal test. Using $T'$ defined in Table 3.2, the corresponding estimates of the modified exact P-values $P^*$ are 0.332, 0.024, and 0.004. Using the null table probability for $T'$ instead, the corresponding estimated modified P-values $P_P^*$ are 0.332, 0.024, and 0.005. The distribution of $T$ takes 121 separate points for the ordinal-by-ordinal test, and since the degree of discreteness is not severe, the two types of P-values are essentially the same. The asymptotic P-values are 0.335, 0.026, and 0.005, respectively. In this case, first-order asymptotic approximations work quite well.
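The precision figures quoted here follow from the binomial standard error of a Monte Carlo proportion. As a minimal sketch (the function name `mc_half_width` and the normal-approximation multiplier 1.96 are assumptions for illustration), the 95% half-width for an estimated P-value $p$ from $M$ sampled tables is $1.96\sqrt{p(1-p)/M}$, which is largest at $p = 0.5$:

```python
import math


def mc_half_width(p, m, z=1.96):
    """Approximate 95% half-width for a Monte Carlo proportion p from m draws."""
    return z * math.sqrt(p * (1.0 - p) / m)


worst_case = mc_half_width(0.5, 50000)     # bound that holds for any true P-value
at_0_332 = mc_half_width(0.332, 50000)     # nominal-by-nominal estimate
at_0_024 = mc_half_width(0.024, 50000)     # nominal-by-ordinal estimate
```

With M = 50,000 the worst-case half-width is about 0.0044, and at the estimates 0.332 and 0.024 it is roughly 0.004 and 0.001, in line with the values in parentheses above.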
For the other exact tests, permitting three-factor interaction, the estimates of the ordinary exact P-values are 0.281 for the nominal-by-nominal test, 0.089 for the nominal-by-ordinal test, and 0.020 for the ordinal-by-ordinal test. The corresponding estimates of the modified exact P-value, $P^*$ or $P_P^*$, are 0.281,


0.089, and 0.020. Also, the corresponding asymptotic P-values are 0.277, 0.089, and 0.020. Table 3.4 summarizes results for all six tests we have discussed. Note that we would not obtain strong evidence of association if we ignored the ordinality of the variables. For large n, since the discreteness is not severe, the modified approach is not needed. Generally, the modified P-value is less discrete than the ordinary P-value and leads to less conservative tests. For small n, we can see the advantage of using the modified approach.

Table 3.3. Cross-classification of job satisfaction with income, controlling for gender, for black Americans.

                          Satisfaction
Gender    Income        VD    LS    MS    VS
Male      < 5000         1     1     2     1
          < 15000        0     3     5     1
          < 25000        0     0     7     3
          > 25000        0     1     9     6
Female    < 5000         1     3    11     2
          < 15000        2     3    17     3
          < 25000        0     1     8     5
          > 25000        0     2     4     2

Source: General Social Surveys (1991)
VD : Very Dissatisfied, LS : A little Satisfied, MS : Moderately Satisfied, VS : Very Satisfied

Table 3.4. Estimated exact P-values for testing conditional independence in Table 3.3.

                                          Ordinary    Modified     Modified       Asymptotic
                                          P-value     P-value P*   P-value P*_P   P-value
Assuming no three-factor interaction
  Nominal-by-Nominal                      0.332       0.332        0.332          0.335
  Nominal-by-Ordinal                      0.024       0.024        0.024          0.026
  Ordinal-by-Ordinal                      0.006       0.004        0.005          0.005
Permitting three-factor interaction
  Nominal-by-Nominal                      0.281       0.281        0.281          0.277
  Nominal-by-Ordinal                      0.089       0.089        0.089          0.089
  Ordinal-by-Ordinal                      0.020       0.020        0.020          0.021

3.6.2 Example 2

We next illustrate the exact tests of independence using Table 3.5, which is a 3 x 2 table from the example in Table 1 of Patefield (1982). These are the results of a double-blind study concerning the use of Oxprenolol in the treatment of examination stress. Among 32 students, 15 were treated with Oxprenolol and 17 were given Diazepam (control). The examination results were compared with their tutor's prediction. The column classification is ordinal, and the row classification can be treated as ordinal since it has two levels. When $X$ and $Y$ are ordinal, a relevant model that reflects the ordinality in a two-way table is the model of linear-by-linear association,
\[
\log m_{ij} = \mu + \lambda_i^X + \lambda_j^Y + \beta u_i v_j. \tag{3.22}
\]
The independence model is the special case $\beta = 0$. We test independence against the model of linear-by-linear association in order to utilize ordinality. For unit-spaced scores, the likelihood-ratio chi-squared statistic for testing the fit of that model equals 2.64, with df = 1. The estimated association parameter is $\hat\beta = 1.706$ with s.e. = 0.773. The likelihood-ratio chi-squared statistic for testing independence, assuming the model, is 9.38 − 2.64 = 6.74 with df = 1 (P = 0.009). There seems to be very strong evidence that the examination grades compared with their tutor's prediction tend to be higher in the treatment group. Large-sample approximations are questionable


since the sample size is small. We use Monte Carlo sampling with M = 50,000 and compare the estimated exact P-value with the exact P-value.

For the exact tests of independence, the estimates of the ordinary exact P-values (with 95% precision indicated in parentheses) are 0.026 (± 0.001) for the nominal-by-nominal test, 0.024 (± 0.001) for the nominal-by-ordinal test, and 0.013 (± 0.001) for the ordinal-by-ordinal test. The corresponding estimates of the modified exact P-values $P^*$ are 0.026, 0.017, and 0.013. The asymptotic P-values are 0.028, 0.015, and 0.007, respectively. The ordinary exact P-value for the ordinal-by-ordinal test is 0.013. For an $I \times J$ table with ordinal variables, StatXact gives ordinary exact P-values, based on methodology in Agresti et al. (1990). Table 3.6 summarizes results for the tests we have discussed. Note that utilizing the ordinality provides very strong evidence of association. Also, the modified P-value can give sharper inference for small n.

Table 3.5. Examination results compared with tutor's predictions.

                   Results
Group        Better    Same    Worse
Treated         5        8       2
Control         0       11       6

Source: Patefield (1982)
Better : Better than predicted, Same : Same as predicted, Worse : Worse than predicted
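For a table as small as Table 3.5 the reference set can be enumerated completely, which gives a check on the Monte Carlo estimate for the ordinal-by-ordinal test. The sketch below is an illustration in Python under stated assumptions: column scores 1, 2, 3 for Better, Same, Worse, the treated-row score total as the statistic, and a one-sided P-value in the direction of better results for the treated group. The function name is invented for the example, and this is not the StatXact algorithm.

```python
from math import comb


def exact_one_sided_p(row1, col_totals, scores):
    """Exact P(T <= t0) for a 2 x 3 table with fixed margins, where T is
    the score total of the first row under the hypergeometric null."""
    n1 = sum(row1)
    t0 = sum(s * x for s, x in zip(scores, row1))
    total = comb(sum(col_totals), n1)        # number of equally likely allocations
    count = 0
    for a in range(min(col_totals[0], n1) + 1):
        for b in range(min(col_totals[1], n1 - a) + 1):
            c = n1 - a - b
            if c > col_totals[2]:
                continue                     # table not in the reference set
            t = scores[0] * a + scores[1] * b + scores[2] * c
            if t <= t0:
                count += (comb(col_totals[0], a)
                          * comb(col_totals[1], b)
                          * comb(col_totals[2], c))
    return count / total


# Table 3.5: treated row (5, 8, 2); column totals (5, 19, 8).
p_exact = exact_one_sided_p(row1=(5, 8, 2), col_totals=(5, 19, 8), scores=(1, 2, 3))
```

Under these assumptions the enumeration gives a value of about 0.013, which agrees with the ordinary exact P-value quoted above for the ordinal-by-ordinal test.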


Table 3.6.

                          Ordinary    Modified      Asymptotic
                          P-value     P-value P*    P-value
Nominal-by-Nominal        0.026       0.026         0.028
Nominal-by-Ordinal        0.024       0.017         0.015
Ordinal-by-Ordinal        0.013       0.013         0.007

3.7 FORTRAN Program for Simulation

Patefield (1981) provided a subroutine for generating two-way random tables with fixed row and column totals. We can apply his algorithm stratum by stratum in order to construct three-way random contingency tables. We utilize the six exact tests for testing conditional independence in $I \times J \times K$ contingency tables that were discussed in Section 2 and Section 3. These test statistics are score statistics for loglinear models, and they do not require fitting the model. The computations, which involve simulating exact conditional distributions, are considerably simpler when one can use test statistics that do not require fitting the model for each table generated for the simulations.

Boyett (1979) also constructed a subroutine that generates two-way random tables from the exact distribution with given row and column totals. Patefield's (1981) subroutine is faster for larger values of n, and it can calculate the probability of each generated random table. By the Monte Carlo sampling of tables in the reference set, we can approximate exact inference with simulated exact and modified exact P-values for testing conditional independence. By resampling these random contingency tables, the P-value is updated.

The FORTRAN program runs interactively. For computational accuracy, double precision is used. This program is designed for IBM-compatible PCs or UNIX


workstations, and the general structure of the program and part of the FORTRAN source code are listed in Appendix B.

3.7.1 Restrictions

Two-way random tables must have at least two rows and two columns, and row and column totals should be positive. The maximum number of rows and columns is 50, and the maximum number of strata is 20. The number (NROW − 1) x (NCOL − 1) should be less than 250. This is the maximum array size for the variance-covariance matrix in the nominal-by-nominal test. Recursive calculation of the log-factorial through $\log (n+1)! = \log n! + \log(n+1)$ has the disadvantage of accumulating a large rounding error (Verbeek and Kroonenberg 1985). For accuracy, double precision is used for the log-factorial, and the log-factorial can be computed up to 25000.
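The recursion for the log-factorial and its accumulated rounding error can be examined directly. The following sketch is in Python rather than FORTRAN and is not the program's code: it builds the table of $\log n!$ up to 25000 by the recursion in double precision and compares the last entry with the log-gamma function from the standard library, which is an independent way of obtaining $\log n!$. The function name is hypothetical.

```python
import math


def log_factorial_table(n_max):
    """Return [log 0!, ..., log n_max!] via log (n+1)! = log n! + log(n+1)."""
    table = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        table[n] = table[n - 1] + math.log(n)
    return table


logfact = log_factorial_table(25000)
# lgamma(n + 1) equals log n!, so the difference measures accumulated error.
drift = abs(logfact[25000] - math.lgamma(25001))
```

In double precision the drift at n = 25000 stays very small relative to the value of $\log 25000!$ (on the order of $2 \times 10^5$), which is consistent with the choice of double precision described above.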


CHAPTER 4
IMPROVED EXACT TESTS FOR ORDINAL VARIABLES IN $I \times J \times K$ TABLES

4.1 Introduction

Consider contingency tables under the full multinomial model where row and column classifications are ordinal. In two-way contingency tables when both classifications are ordinal, the null hypothesis of independence can be tested against the alternative that utilizes local log odds ratios. Many tests for measuring ordinal association have been proposed. We can utilize tests based on $C - D$, the number of concordant pairs minus the number of discordant pairs, or based on the gamma statistic. Both are discussed in Agresti (1990). Also, log-linear models with ordered categories are discussed. Agresti, Mehta, and Patel (1990) provide an algorithm that permits exact tests for the linear-by-linear association model for two-way contingency tables with ordered categories.

If an exact test is desired with size being equal to some preassigned value, then randomization would be required on some tables of observed frequencies. This is typical of any discrete problem. We want the resulting test to be admissible even though randomization occurred. Cohen and Sackrowitz (1991) proved a theorem that gives the class of exact, unbiased, and admissible tests. Also, Cohen and Sackrowitz (1992) suggested a procedure for an exact test of size $\alpha$, and a modified P-value. Such tests are performed conditionally, given the values of the sufficient statistics for the nuisance parameters under the null hypothesis. Hence, the critical value depends on the values of the sufficient statistics.


They constructed the exact test of size $\alpha$ by ordering the tables according to their probabilities on sample points where the test would randomize. They made the number of tables on which randomization would occur considerably smaller than in the usual test. We could use another test statistic directed toward a broader alternative hypothesis at the randomization points, utilizing the modified approach discussed in Chapter 2.

Cohen and Sackrowitz (C-S) focused on two-way tables, and showed unbiasedness of tests in two-way tables. Eaton (1970) showed the essentially complete class in an exponential family. Eaton's theorem shows that the essentially complete class consists of tests whose acceptance regions are convex with possible randomization on the boundary of the acceptance region. Furthermore, Ledwina (1978a, 1984) gave the class of admissible rules in an exponential family. Admissibility of tests for the C-S theorem is obtained using the arguments in Ledwina.

We focus on analyzing three-way tables. The problem we will consider is testing conditional independence, assuming that the model of no three-factor interaction holds. We first introduce theorems and lemmas from C-S (1991), and then generalize these to three-way contingency tables. In Section 2 we state the theorem of C-S (1991) as well as related lemmas that give the class of unbiased admissible tests. In Section 3 we show unbiasedness of tests when one wishes to test a null hypothesis of conditional independence against the alternative of the no three-factor interaction model in three-way contingency tables. Sections 4 and 5 present the complete class of tests and admissible tests in an exponential family. Using these arguments, the tests of the C-S theorem lie in a complete and admissible class when we consider three-way tables under the multinomial model. Section 6 generalizes to the three-way case some results of Cohen and Sackrowitz (1991, 1992) regarding admissibility of tests for two-way tables.
For an ordinal alternative, we discuss construction of tests of conditional independence that are exact, unbiased, and admissible. As a special case, we note that the ordinary randomized test of conditional independence for $2 \times 2 \times K$ tables is usually inadmissible. Section 7 illustrates the exact, unbiased, and admissible tests with examples. We test conditional independence in $2 \times 2 \times 5$ tables. Section 8 gives some comments.

4.2 Basic Results in Two-way Contingency Table

Consider testing independence against the alternative that all local log odds ratios are nonnegative with at least one local log odds ratio positive for a two-way table. We will state the class of tests that are simultaneously exact, unbiased, and admissible in this section. We need definitions and lemmas for the proof of unbiasedness of tests, obtained by Cohen and Sackrowitz (1991). We will extend their theorem and lemmas to three-way tables in the next section.

Consider an $I \times J$ contingency table under the full multinomial model where each classification is ordinal. Let $N = \{n_{ij}\}$ be the $I \times J$ two-way contingency table of cell frequencies, and let $\pi = \{\pi_{ij}\}$ be the $I \times J$ matrix of corresponding cell probabilities, where $n = \sum_i\sum_j n_{ij}$ and $\sum_i\sum_j \pi_{ij} = 1$. Let $n_{i+}$ be the $i$th row total of cell frequencies, $i = 1, \ldots, I$, and $n_{+j}$ the $j$th column total of cell frequencies, $j = 1, \ldots, J$, and let $m = (\{n_{i+}\}, \{n_{+j}\})$. We define the local log odds ratios as
\[
\psi_{ij} = \log \frac{\pi_{ij}\,\pi_{i+1,j+1}}{\pi_{i,j+1}\,\pi_{i+1,j}}, \quad i = 1, \ldots, I-1, \; j = 1, \ldots, J-1.
\]
Our testing problem can be expressed as testing the null hypothesis $H_0 : \psi_{ij} = 0$ for $i = 1, \ldots, I-1$, $j = 1, \ldots, J-1$. From Ledwina (1984), under the full multinomial model, the distribution of an observed random vector, $N$, of the cell frequencies $(n_{11}, \ldots, n_{IJ})$ can be written in the form
\[
f(N) = d^n\, n! \prod_{i,j} (n_{ij}!)^{-1} \exp\left(\sum_{i=1}^{I-1}\sum_{j=1}^{J-1} n_{ij} a_{ij} + \sum_{i=1}^{I-1} n_{i+} b_i + \sum_{j=1}^{J-1} n_{+j} d_j\right), \tag{4.1}
\]


where
\[
a_{ij} = \log \frac{\pi_{ij}\,\pi_{IJ}}{\pi_{iJ}\,\pi_{Ij}}, \quad b_i = \log \frac{\pi_{iJ}}{\pi_{IJ}}, \quad d_j = \log \frac{\pi_{Ij}}{\pi_{IJ}},
\]
and
\[
d = \left(1 + \sum_{i=1}^{I-1} e^{b_i} + \sum_{j=1}^{J-1} e^{d_j} + \sum_{i=1}^{I-1}\sum_{j=1}^{J-1} e^{a_{ij} + b_i + d_j}\right)^{-1}.
\]
Note that (4.1) is the density of a multivariate exponential family, and $a_{ij} = \sum_{s=i}^{I-1}\sum_{t=j}^{J-1} \psi_{st}$. Then our hypotheses become $H_0 : \psi_{ij} = 0$, $i = 1, \ldots, I-1$, $j = 1, \ldots, J-1$, and $H_a : \psi_{ij} \ge 0$, with strict inequality for at least one pair $(i, j)$. Also let $T_{ij} = \sum_{s=1}^{i}\sum_{t=1}^{j} n_{st}$, $T_i = (T_{i1}, \ldots, T_{i,J-1})$, $i = 1, \ldots, I-1$, and $T = (T_1, \ldots, T_{I-1})$. Attention can be restricted to the sufficient statistics $(T, m)$, which have the joint distribution
\[
f(t, m) = \beta(\psi, b, d) \exp\left(\sum_{i=1}^{I-1}\sum_{j=1}^{J-1} t_{ij}\psi_{ij} + \sum_{i=1}^{I-1} n_{i+} b_i + \sum_{j=1}^{J-1} n_{+j} d_j\right) g(t, m). \tag{4.2}
\]
Note that $(T, m)$ is a one-to-one linear transformation from the space of $N$.

Let us next consider the structure of an exact test. If one wishes an exact test such that the size is equal to the nominal value, any test procedure would require possible randomizations on some points of the distribution of the test statistic. For an observed table $N$, a test chooses rejection or acceptance with certain probabilities that depend on $N$, denoted by $\varphi(N)$ and $1 - \varphi(N)$, respectively. A randomized test is therefore completely characterized by $\varphi$, the critical function, with $0 \le \varphi(N) \le 1$ for all $N$. If $\varphi(N)$ takes on only the values 1 and 0, then this becomes a nonrandomized test. Let $\varphi(N)$ denote an exact test of size $\alpha$ depending on $T$ and $m$ for the hypotheses concerning the distribution of $N$ (or the joint distribution of $T$ and $m$), and also

denote the conditional test as a function of $t$, for each fixed $m$, by $\varphi_m(t)$. If the conditional test has conditional size $\alpha$, then the size of the original test can be obtained from the conditional tests by taking the expectation over $m$, which is $E_{\psi=0,\nu}[\varphi(N)] = E_{\nu}\{E_{\psi=0}[\varphi_m(T) \mid m]\} = \alpha$, where $\nu$ refers to the nuisance parameters for $m$. By Lehmann (1986), $m$ is sufficient and complete under the null, and any similar test of size $\alpha$ must have Neyman structure. Hence, the test for each fixed $m$, $\varphi_m(t)$, must have conditional size $\alpha$, i.e., $E_{\psi=0}[\varphi_m(T) \mid m] = \alpha$ for all $m$. Accordingly, if

Hence, an exact test is unbiased and admissible if and only if, conditionally given $m$, the acceptance regions are monotone (in the sense that the corresponding $\varphi_m$ is monotone) and convex, with randomization possible only at extreme points.

The following definitions and lemmas are used for the proof of the unbiasedness of tests in Theorem 4.2.1. Let $x$ be a $k \times 1$ vector lying in $X^k = X_1 \times X_2 \times \cdots \times X_k$, where each $X_i$ is a totally ordered subset of $\mathbb{R}$. Let $H_k$ denote the family of nondecreasing functions on $X^k$. Let $f(x)$ be a nonnegative function defined on $X^k$ satisfying
$$f(x \vee y)\, f(x \wedge y) \ge f(x)\, f(y), \tag{4.3}$$
where $\vee$ and $\wedge$ are the corresponding lattice operations on $X^k$, i.e., for $x = (x_1,\ldots,x_k)$ and $y = (y_1,\ldots,y_k)$,
$$x \vee y = (\max(x_1,y_1), \max(x_2,y_2), \ldots, \max(x_k,y_k))$$
and
$$x \wedge y = (\min(x_1,y_1), \min(x_2,y_2), \ldots, \min(x_k,y_k)).$$
From Karlin and Rinott (1980) we have the following definition.

Definition 4.2.1 A function with the property (4.3) is said to be multivariate totally positive of order 2 (MTP$_2$) on $X^k$. Also, a $k \times 1$ random vector $U = (U_1,\ldots,U_k)$ is MTP$_2$ if its density is MTP$_2$.

Multivariate total positivity is thus defined in terms of an ordering on a lattice. Karlin and Rinott showed that if $f(x)$ and $g(x)$ are MTP$_2$ on $X^k$, then $f(x)g(x)$ is MTP$_2$ on $X^k$. Also, if $f(x) = g(x_i, x_j)$, where $g$ is TP$_2$ on $X_i \times X_j$, then $f$ is MTP$_2$ on $X^k$. Hence, products of such functions are MTP$_2$ on $X^k$. In connection with MTP$_2$ densities, Fortuin, Ginibre, and Kasteleyn (1971) stated the following inequality, which we denote by FGK. Let $U$ be a random vector whose density is MTP$_2$ with respect
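to a product measure defined on a product set. Let $W_1$, $W_2$ be functions of $U$ lying in $H_k$. Then
$$E\,W_1W_2 \ge E\,W_1\; E\,W_2. \tag{4.4}$$
Now let $U$ and $V$ be $k \times 1$ random vectors. $U$ is stochastically smaller than $V$, written $U \le_{st} V$, if
$$E\,h(U) \le E\,h(V) \tag{4.5}$$
for all $h \in H_k$.

Lemma 4.2.1 Assume $H_0$ is true. Also assume, conditional on $m$, for $i = 1,\ldots,I-1$, $j = 1,\ldots,J-1$,

i) $T_1$ is MTP$_2$, (4.6)

ii) $T_i \mid T_1,\ldots,T_{i-1}$ is MTP$_2$ for all $i = 2,3,\ldots,I-1$, (4.7)

iii) $T_i \mid T_1,\ldots,T_{i-1} \;\le_{st}\; T_i \mid T_1',\ldots,T_{i-1}'$ if $T_j \le T_j'$ for $j = 1,\ldots,i-1$. (4.8)

Let $W, W^* \in H_{(I-1)(J-1)}$. Then
$$E\{W(T)W^*(T) \mid m\} \ge E\{W(T) \mid m\}\,E\{W^*(T) \mid m\}. \tag{4.9}$$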

The next lemmas establish the three conditions assumed in the previous lemma; the proofs are given by Cohen and Sackrowitz (1991).

Lemma 4.2.2 Under $H_0$, $T_1$ given $m$ is MTP$_2$.

Lemma 4.2.3 Under $H_0$, $T_i \mid T_1,\ldots,T_{i-1}, m$ is MTP$_2$ for all $i = 2,\ldots,I-1$.

Lemma 4.2.4 Under $H_0$, $T_i \mid T_1,\ldots,T_{i-1}, m$

(4.11) where $c_{ij} =$

Next, we show that tests based on $T = \sum_i\sum_j c_{ij}T_{ij}$ have a desirable monotonicity property; hence, they are unbiased. By the definition of monotonicity (all $T_{ij}$ are held fixed except one), for any $I$, $J$ the statistic $T = \sum_i\sum_j c_{ij}T_{ij}$ is monotone in $T_{ij}$, $i = 1,\ldots,I-1$, $j = 1,\ldots,J-1$. Hence, the test based on $T$ is monotone and therefore unbiased, since it satisfies the condition of Theorem 4.2.1. This is the test statistic that Cohen and Sackrowitz used in two-way tables with an ordinal alternative.
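As a numerical illustration of the notation in this section, the following Python sketch computes the local log odds ratios $\psi_{ij}$ and the cumulative sums $T_{ij} = \sum_{l \le i}\sum_{m \le j} n_{lm}$ for a small table, and checks the identity $\sum_i\sum_j n_{ij}\alpha_{ij} = \sum_i\sum_j \psi_{ij}T_{ij}$ that links the exponent of (4.1) to that of (4.2). The $3 \times 3$ probabilities and counts are hypothetical values chosen only for illustration, and the function names are assumptions of this sketch rather than anything taken from the thesis.

```python
from math import isclose, log

def local_log_odds(pi):
    """psi[i][j] = log(pi[i][j] * pi[i+1][j+1] / (pi[i][j+1] * pi[i+1][j]))."""
    rows, cols = len(pi), len(pi[0])
    return [[log(pi[i][j] * pi[i + 1][j + 1] / (pi[i][j + 1] * pi[i + 1][j]))
             for j in range(cols - 1)] for i in range(rows - 1)]

def alpha_terms(pi):
    """alpha[i][j] = log(pi[i][j] * pi[I][J] / (pi[i][J] * pi[I][j])),
    using the last row and last column as the reference categories."""
    rows, cols = len(pi), len(pi[0])
    last_r, last_c = rows - 1, cols - 1
    return [[log(pi[i][j] * pi[last_r][last_c] / (pi[i][last_c] * pi[last_r][j]))
             for j in range(cols - 1)] for i in range(rows - 1)]

def cumulative_sums(n):
    """T[i][j] = sum of n[l][m] over l <= i and m <= j, for i < I and j < J."""
    rows, cols = len(n), len(n[0])
    return [[sum(n[l][m] for l in range(i + 1) for m in range(j + 1))
             for j in range(cols - 1)] for i in range(rows - 1)]

# Hypothetical cell probabilities (summing to 1) and cell counts.
pi = [[0.20, 0.10, 0.05],
      [0.10, 0.15, 0.10],
      [0.05, 0.10, 0.15]]
n = [[4, 2, 1],
     [3, 5, 2],
     [1, 2, 6]]

psi = local_log_odds(pi)
alpha = alpha_terms(pi)
T = cumulative_sums(n)

# Both sides of the identity, summed over the (I-1) x (J-1) interior cells.
lhs = sum(n[i][j] * alpha[i][j] for i in range(2) for j in range(2))
rhs = sum(psi[i][j] * T[i][j] for i in range(2) for j in range(2))
print(T, isclose(lhs, rhs, abs_tol=1e-9))
```

The identity holds because $\alpha_{ij}$ is the sum of the $\psi_{lm}$ over $l \ge i$, $m \ge j$, so interchanging the order of summation moves the partial sums from the parameters onto the counts.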

For the admissibility portion, Ledwina's theorem will be applied and stated in Section 4.5. The unbiased and admissible tests will be explained in Section 4.6, and we will show that the tests satisfy the properties of Theorem 4.2.1.

4.3 Unbiasedness of Tests in Three-way Contingency Tables

In this section, we generalize Theorem 4.2.1 to testing conditional independence in three-way contingency tables. The unbiasedness portion of Theorem 4.2.1 for three-way tables is proved with lemmas, using the definitions and lemmas stated in Section 4.2. For the admissibility part, we apply the theorems in Ledwina (1978a, 1984) and Matthes and Truax (1967) for exponential families, which are stated in the next sections. Showing unbiasedness is the main part of proving Theorem 4.2.1 in three-way tables, and we follow the arguments in Ledwina for the admissibility of the tests. Together these give the exact, unbiased, and admissible tests in three-way tables.

4.3.1 Conditional Independence Model

We specify the general multinomial model in three-way tables and state the testing problem under the null hypothesis of conditional independence. We then prove unbiasedness of tests and the related lemmas for three-way tables.

Consider an $I \times J \times K$ contingency table under the multinomial model, where each row and column classification is ordinal. Let $N = \{n_{ijk}\}$ denote observed cell counts, with expected frequencies $\{m_{ijk}\}$. Let $\pi = \{\pi_{ijk}\}$ be probabilities for a multinomial distribution
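over $I \times J \times K$ cells, where $n = \sum_i\sum_j\sum_k n_{ijk}$ and $\sum_i\sum_j\sum_k \pi_{ijk} = 1$. From Ledwina (1978b), the distribution of $N$ can be written as
$$f(N) = f^{\,n}\,\frac{n!}{\prod_{i,j,k} n_{ijk}!}\exp\Big(\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\sum_{k=1}^{K-1} n_{ijk}a_{ijk} + \sum_{i=1}^{I-1}\sum_{j=1}^{J-1} n_{ij+}b_{ij} + \sum_{i=1}^{I-1}\sum_{k=1}^{K-1} n_{i+k}c_{ik} + \sum_{j=1}^{J-1}\sum_{k=1}^{K-1} n_{+jk}d_{jk} + \sum_{i=1}^{I-1} n_{i++}e_i + \sum_{j=1}^{J-1} n_{+j+}f_j + \sum_{k=1}^{K-1} n_{++k}g_k\Big), \tag{4.12}$$
where
$$a_{ijk} = \log\frac{\pi_{ijk}\,\pi_{iJK}\,\pi_{IjK}\,\pi_{IJk}}{\pi_{ijK}\,\pi_{iJk}\,\pi_{Ijk}\,\pi_{IJK}}, \qquad b_{ij} = \log\frac{\pi_{ijK}\,\pi_{IJK}}{\pi_{iJK}\,\pi_{IjK}},$$
$$c_{ik} = \log\frac{\pi_{iJk}\,\pi_{IJK}}{\pi_{iJK}\,\pi_{IJk}}, \qquad d_{jk} = \log\frac{\pi_{Ijk}\,\pi_{IJK}}{\pi_{IjK}\,\pi_{IJk}},$$
$$e_i = \log\frac{\pi_{iJK}}{\pi_{IJK}}, \qquad f_j = \log\frac{\pi_{IjK}}{\pi_{IJK}}, \qquad g_k = \log\frac{\pi_{IJk}}{\pi_{IJK}},$$
and
$$f = \Big(1 + \sum_i e^{e_i} + \sum_j e^{f_j} + \sum_k e^{g_k} + \sum_{i,j} e^{e_i+f_j+b_{ij}} + \sum_{i,k} e^{e_i+g_k+c_{ik}} + \sum_{j,k} e^{f_j+g_k+d_{jk}} + \sum_{i,j,k} e^{e_i+f_j+g_k+b_{ij}+c_{ik}+d_{jk}+a_{ijk}}\Big)^{-1}.$$
Let
$$\psi_{ij(k)} = \log\frac{\pi_{ijk}\,\pi_{i+1,j+1,k}}{\pi_{i,j+1,k}\,\pi_{i+1,j,k}},$$
which is the local log odds ratio in the $k$th stratum. Note that
$$\log\frac{\pi_{lmk}\,\pi_{IJk}}{\pi_{lJk}\,\pi_{Imk}} = \sum_{i=l}^{I-1}\sum_{j=m}^{J-1}\psi_{ij(k)}.$$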
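Hence, we have
$$a_{lmk} = \sum_{i=l}^{I-1}\sum_{j=m}^{J-1}\big(\psi_{ij(k)} - \psi_{ij(K)}\big).$$
Also let $T_{ij(k)} = \sum_{l=1}^{i}\sum_{m=1}^{j} n_{lmk}$ and $T_i^{(k)} = (T_{i1(k)}, T_{i2(k)}, \ldots, T_{i(J-1)(k)})$, $i = 1,\ldots,I-1$, and
$$T = \big(T_1^{(1)},\ldots,T_{I-1}^{(1)},\, T_1^{(2)},\ldots,T_{I-1}^{(2)},\, \ldots,\, T_1^{(K)},\ldots,T_{I-1}^{(K)}\big).$$
Then
$$\sum_{k=1}^{K-1}\Big[\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\big(\psi_{ij(k)} - \psi_{ij(K)}\big)T_{ij(k)}\Big] = \sum_{k=1}^{K-1}\Big[\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\sum_{l=1}^{i}\sum_{m=1}^{j}\big(\psi_{ij(k)} - \psi_{ij(K)}\big)n_{lmk}\Big]$$
$$= \sum_{k=1}^{K-1}\sum_{l=1}^{I-1}\sum_{m=1}^{J-1}\Big\{\sum_{i=l}^{I-1}\sum_{j=m}^{J-1}\big(\psi_{ij(k)} - \psi_{ij(K)}\big)\Big\}n_{lmk} = \sum_{k=1}^{K-1}\sum_{l=1}^{I-1}\sum_{m=1}^{J-1} a_{lmk}\,n_{lmk}.$$
Thus (4.12) can be written as
$$f(N) = \beta(\psi, c, d, e, f, g)\exp\Big(\sum_{k=1}^{K-1}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\big(\psi_{ij(k)} - \psi_{ij(K)}\big)T_{ij(k)} + \sum_{i=1}^{I-1}\sum_{j=1}^{J-1} n_{ij+}b_{ij} + \sum_{i=1}^{I-1}\sum_{k=1}^{K-1} n_{i+k}c_{ik} + \sum_{j=1}^{J-1}\sum_{k=1}^{K-1} n_{+jk}d_{jk} + \sum_{i=1}^{I-1} n_{i++}e_i + \sum_{j=1}^{J-1} n_{+j+}f_j + \sum_{k=1}^{K-1} n_{++k}g_k\Big)\, g(t, r). \tag{4.14}$$
Hence, no three-factor interaction has the following equivalent expressions, for all $i$ and $j$:
$$a_{ijk} = 0, \quad k = 1,\ldots,K-1$$
$$\Longleftrightarrow\; \psi_{ij(k)} = \psi_{ij(K)}, \quad k = 1,\ldots,K-1$$
$$\Longleftrightarrow\; \psi_{ij(1)} = \psi_{ij(2)} = \cdots = \psi_{ij(K)} = \psi_{ij}.$$
This means that the association between the row and column variables is identical at each level of the stratum variable.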

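When we test the model of conditional independence, we assume that the model of no three-factor interaction holds. Hence, we assume that for all $i$ and $j$, $\psi_{ij(k)} - \psi_{ij(K)} = 0$, $k = 1,\ldots,K-1$. Let $T_{ij+} = \sum_{l=1}^{i}\sum_{m=1}^{j} n_{lm+}$. Then
$$\sum_{i=1}^{I-1}\sum_{j=1}^{J-1} n_{ij+}b_{ij} = \sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(K)}T_{ij+}. \tag{4.15}$$
Let $m = (\{n_{i+k}\}, \{n_{+jk}\})$, $i = 1,\ldots,I-1$, $j = 1,\ldots,J-1$, $k = 1,\ldots,K$. Using (4.15), we rewrite (4.14) as
$$f(N) = \beta(\psi, c, d, e, f, g)\exp\Big(\sum_{k=1}^{K-1}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\big(\psi_{ij(k)} - \psi_{ij(K)}\big)T_{ij(k)} + \sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(K)}T_{ij+} + \sum_{i=1}^{I-1}\sum_{k=1}^{K-1} n_{i+k}c_{ik} + \sum_{j=1}^{J-1}\sum_{k=1}^{K-1} n_{+jk}d_{jk} + \sum_{i=1}^{I-1} n_{i++}e_i + \sum_{j=1}^{J-1} n_{+j+}f_j + \sum_{k=1}^{K-1} n_{++k}g_k\Big)\, g(t, m). \tag{4.16}$$
Note that $T_{ij+} = \sum_{k=1}^{K} T_{ij(k)}$. Then
$$\sum_{k=1}^{K-1}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\big(\psi_{ij(k)} - \psi_{ij(K)}\big)T_{ij(k)} + \sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(K)}T_{ij+}$$
$$= \sum_{k=1}^{K-1}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\big(\psi_{ij(k)} - \psi_{ij(K)}\big)T_{ij(k)} + \sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(K)}\sum_{k=1}^{K}T_{ij(k)}$$
$$= \sum_{k=1}^{K-1}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(k)}T_{ij(k)} + \sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(K)}T_{ij(K)} = \sum_{k=1}^{K}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(k)}T_{ij(k)}. \tag{4.17}$$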

Hence, using (4.17), we rewrite (4.16) as
$$f(N) = \beta(\psi, c, d, e, f, g)\exp\Big(\sum_{k=1}^{K}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(k)}T_{ij(k)} + \sum_{i=1}^{I-1}\sum_{k=1}^{K-1} n_{i+k}c_{ik} + \sum_{j=1}^{J-1}\sum_{k=1}^{K-1} n_{+jk}d_{jk} + \sum_{i=1}^{I-1} n_{i++}e_i + \sum_{j=1}^{J-1} n_{+j+}f_j + \sum_{k=1}^{K-1} n_{++k}g_k\Big)\, g(t, m). \tag{4.18}$$
From (4.18), we see that we may treat the observation as $(T, m)$ and the parameters as $(\psi, c, d, e, f, g)$. The problem is to test conditional independence under the assumption that the model of no three-factor interaction holds. Here we consider the problem under the simplifying assumption that the $\psi_{ij(k)}$ have a common value over $k$, so that the hypothesis reduces to $H_0: \psi = 0$, where the $\psi$'s are not assumed to be equal but $\psi_{ij(k)} = \psi_{ij}$, $k = 1,\ldots,K$, for $i = 1,\ldots,I-1$, $j = 1,\ldots,J-1$. Therefore, our hypotheses become
$$H_0: \psi_{ij(k)} = \psi_{ij(K)} \text{ and } \psi_{ij(K)} = 0, \quad i = 1,\ldots,I-1, \; j = 1,\ldots,J-1, \; k = 1,\ldots,K-1$$
$$\Longleftrightarrow\; \psi_{ij(k)} = \psi_{ij} = 0, \quad i = 1,\ldots,I-1, \; j = 1,\ldots,J-1,$$
$$H_a: \text{No Three-Factor Interaction Model.}$$
The test is carried out conditionally, given the values of the margins, and the conditional joint distribution of $N$ given $m$ under the null reduces to the product of $K$ hypergeometric mass functions, which is the table probability under the null.

4.3.2 Unbiasedness of Tests

In order to prove unbiasedness of tests in Theorem 4.2.1, we need the following lemma. The lemma assumes three conditions, which are verified after the proof of unbiasedness in Theorem 4.2.1. This test is done by conditioning on the
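values of all elements in the margins, $m = (\{n_{i+k}\}, \{n_{+jk}\})$, that are random, so that in the conditional model these margins $m$ are fixed, and cell counts from different strata are independent.

Lemma 4.3.1 Assume $H_0$ is true. Also assume, conditional on $m$,

i) $T_1^{(k)}$ is MTP$_2$ for all $k = 1,\ldots,K$, (4.19)

ii) $T_i^{(k)} \mid T_1^{(1)},\ldots,T_{i-1}^{(1)},\, T_1^{(2)},\ldots,T_{i-1}^{(2)},\, \ldots,\, T_1^{(K)},\ldots,T_{i-1}^{(K)}$ is MTP$_2$ for all $i = 2,3,\ldots,I-1$, $k = 1,\ldots,K$, (4.20)

iii) $T_i^{(k)} \mid T_1^{(1)},\ldots,T_{i-1}^{(1)},\ldots,T_1^{(K)},\ldots,T_{i-1}^{(K)} \;\le_{st}\; T_i^{(k)} \mid T_1'^{(1)},\ldots,T_{i-1}'^{(1)},\ldots,T_1'^{(K)},\ldots,T_{i-1}'^{(K)}$ for all $i = 2,3,\ldots,I-1$ if $T_j^{(k)} \le T_j'^{(k)}$, $j = 1,\ldots,i-1$, $k = 1,\ldots,K$. (4.21)

Let $W, W^* \in H_{K(I-1)(J-1)}$. Then
$$E\{W(T)W^*(T) \mid m\} \ge E\{W(T) \mid m\}\,E\{W^*(T) \mid m\}. \tag{4.22}$$

Proof. We suppress $m$, since all statements are conditional on $m$. Now
$$E\,W(T)W^*(T) = E\big\{E\big[W(T)W^*(T) \mid T_1^{(1)},\ldots,T_{I-2}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-2}^{(K)}\big]\big\}$$
$$\ge E\big\{E\big(W(T) \mid T_1^{(1)},\ldots,T_{I-2}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-2}^{(K)}\big)\,E\big(W^*(T) \mid T_1^{(1)},\ldots,T_{I-2}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-2}^{(K)}\big)\big\} \tag{4.23}$$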
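by (4.20) and the FGK (Fortuin, Ginibre, and Kasteleyn) inequality. The expression (4.23) can be written as
$$E\,W_1\big(T_1^{(1)},\ldots,T_{I-2}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-2}^{(K)}\big)\,W_1^*\big(T_1^{(1)},\ldots,T_{I-2}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-2}^{(K)}\big), \tag{4.24}$$
where
$$W_1\big(T_1^{(1)},\ldots,T_{I-2}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-2}^{(K)}\big) = E\big[W(T) \mid T_1^{(1)},\ldots,T_{I-2}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-2}^{(K)}\big]$$
and $W_1^*$ is defined in the same way from $W^*$. Note that (4.21) implies $W_1 \in H_{K(I-2)(J-1)}$ and $W_1^* \in H_{K(I-2)(J-1)}$. Therefore, one can use (4.20) again. Hence,
$$E\,W_1W_1^* = E\big\{E\big[W_1W_1^* \mid T_1^{(1)},\ldots,T_{I-3}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-3}^{(K)}\big]\big\}$$
$$\ge E\big\{E\big(W_1 \mid T_1^{(1)},\ldots,T_{I-3}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-3}^{(K)}\big)\,E\big(W_1^* \mid T_1^{(1)},\ldots,T_{I-3}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-3}^{(K)}\big)\big\} \tag{4.25}$$
$$= E\,W_2W_2^*,$$
by letting
$$W_2\big(T_1^{(1)},\ldots,T_{I-3}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-3}^{(K)}\big) = E\big[W_1 \mid T_1^{(1)},\ldots,T_{I-3}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-3}^{(K)}\big]$$
and
$$W_2^*\big(T_1^{(1)},\ldots,T_{I-3}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-3}^{(K)}\big) = E\big[W_1^* \mid T_1^{(1)},\ldots,T_{I-3}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-3}^{(K)}\big]. \tag{4.26}$$
The process can be repeated until we have that (4.25) is greater than or equal to
$$E\,W_{I-2}\big(T_1^{(1)},\ldots,T_1^{(K)}\big)\;E\,W_{I-2}^*\big(T_1^{(1)},\ldots,T_1^{(K)}\big). \tag{4.27}$$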
The last step comes from (4.19) and the FGK inequality. Also, by the definition of $W_{I-2}$,
$$E\,W_{I-2}\big(T_1^{(1)},\ldots,T_1^{(K)}\big) = E\,W\big(T_1^{(1)},\ldots,T_{I-1}^{(1)},\ldots,T_1^{(K)},\ldots,T_{I-1}^{(K)}\big). \tag{4.28}$$
Similarly for $W^*$. Using (4.28) on the right-hand side of (4.27) we have
$$E\{W(T)W^*(T) \mid m\} = E\big\{W\big(T_1^{(1)},\ldots,T_{I-1}^{(K)}\big)\,W^*\big(T_1^{(1)},\ldots,T_{I-1}^{(K)}\big)\big\}$$
$$\ge E\big\{W\big(T_1^{(1)},\ldots,T_{I-1}^{(K)}\big)\big\}\cdot E\big\{W^*\big(T_1^{(1)},\ldots,T_{I-1}^{(K)}\big)\big\} = E\{W(T) \mid m\}\,E\{W^*(T) \mid m\}. \tag{4.29}$$

Proof of Unbiasedness in Theorem 4.2.1

Now we show unbiasedness of tests in Theorem 4.2.1 in three-way tables. Let $f_\psi(t \mid m)$, with $T = \big(T_1^{(1)},\ldots,T_{I-1}^{(1)},\, T_1^{(2)},\ldots,T_{I-1}^{(2)},\, \ldots,\, T_1^{(K)},\ldots,T_{I-1}^{(K)}\big)$, denote the conditional density of $T \mid m$, where $\psi$ lies in the alternative space, and let $f_0(t \mid m)$ be the conditional density under the null. Using (4.18) we derive the conditional densities.

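Then the ratio $W^*(t) = f_\psi(t \mid m)/f_0(t \mid m)$ satisfies
$$W^*(t) \propto \exp\Big(\sum_{k=1}^{K}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(k)}T_{ij(k)}\Big). \tag{4.30}$$
Hence, $W^*(t)$ is monotone nondecreasing in $T$, and $W^*(t) \in H_{K(I-1)(J-1)}$ for any $\psi$ in the alternative space. Also, by assumption, the test $\varphi_m(t) \in H_{K(I-1)(J-1)}$. Consider $E_\psi[\varphi_m(T) \mid m]$ for $\psi$ in the alternative space:
$$E_\psi[\varphi_m(T) \mid m] = \sum_t \varphi_m(t)\,f_\psi(t \mid m) = \sum_t \varphi_m(t)\,W^*(t)\,f_0(t \mid m)$$
$$\ge \Big[\sum_t \varphi_m(t)\,f_0(t \mid m)\Big]\Big[\sum_t W^*(t)\,f_0(t \mid m)\Big], \text{ by (4.22)}$$
$$= \alpha. \tag{4.31}$$
The inequality follows by applying Lemma 4.3.1. Expression (4.31) implies conditional unbiasedness of $\varphi_m(t)$, which in turn implies unbiasedness of the original test $\varphi(N)$, by noting that $E_\nu[E_\psi(\varphi_m(T) \mid m)] \ge \alpha$, where $\nu$ refers to the nuisance parameters. This completes the unbiasedness portion of Theorem 4.2.1.

Proof of Lemmas

Now we verify (4.19), (4.20), and (4.21), which are the conditions assumed for Lemma 4.3.1.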

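i) Under $H_0$, $T_1^{(k)}$ given $m$ is MTP$_2$, for all $k = 1,\ldots,K$.

Proof. Let $1 \le k \le K$, and $i = 1,\ldots,I-1$, $j = 1,\ldots,J-1$. Then
$$T_1^{(k)} \mid \{n_{i+k'}\}, \{n_{+jk'}\}, \quad k' = 1,\ldots,K$$
$$\Longleftrightarrow\; T_1^{(k)} \mid \{n_{i+k}\}, \{n_{+jk}\}, \{n_{i+k'}\}, \{n_{+jk'}\}, \quad \text{where } k' = 1,\ldots,k-1,k+1,\ldots,K$$
$$\Longleftrightarrow\; T_1^{(k)} \mid \{n_{i+k}\}, \{n_{+jk}\}, \quad \text{since } \{n_{i+k'}\}, \{n_{+jk'}\} \text{ are independent of } T_1^{(k)}, \{n_{i+k}\}, \{n_{+jk}\}, \tag{4.32}$$
which is MTP$_2$ by Lemma 4.2.2.

ii) Under $H_0$, $T_i^{(k)} \mid T_1^{(1)},\ldots,T_{i-1}^{(1)},\ldots,T_1^{(K)},\ldots,T_{i-1}^{(K)}, m$ is MTP$_2$ for all $i = 2,3,\ldots,I-1$, $k = 1,\ldots,K$.

Proof. Let $1 \le k \le K$. For all $i = 2,\ldots,I-1$,
$$T_i^{(k)} \mid T_1^{(1)},\ldots,T_{i-1}^{(1)},\, T_1^{(2)},\ldots,T_{i-1}^{(2)},\, \ldots,\, T_1^{(K)},\ldots,T_{i-1}^{(K)}, m$$
$$\Longleftrightarrow\; T_i^{(k)} \mid T_1^{(k)},\ldots,T_{i-1}^{(k)},\, T_1^{(1)},\ldots,T_{i-1}^{(1)},\, \ldots,\, T_1^{(k-1)},\ldots,T_{i-1}^{(k-1)},\, T_1^{(k+1)},\ldots,T_{i-1}^{(k+1)},\, \ldots,\, T_1^{(K)},\ldots,T_{i-1}^{(K)}, m$$
$$\Longleftrightarrow\; T_i^{(k)} \mid T_1^{(k)},\ldots,T_{i-1}^{(k)}, \{n_{i+k}\}, \{n_{+jk}\}, \quad \text{by the independence of the strata,} \tag{4.33}$$
which is MTP$_2$ by Lemma 4.2.3.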

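iii) Under $H_0$, given $m$,
$$T_i^{(k)} \mid T_1^{(1)},\ldots,T_{i-1}^{(1)},\ldots,T_1^{(K)},\ldots,T_{i-1}^{(K)} \;\le_{st}\; T_i^{(k)} \mid T_1'^{(1)},\ldots,T_{i-1}'^{(1)},\ldots,T_1'^{(K)},\ldots,T_{i-1}'^{(K)}$$
for all $i = 2,3,\ldots,I-1$ if $T_j^{(k)} \le T_j'^{(k)}$, $j = 1,\ldots,i-1$, which is proven by Lemma 4.2.4, since by the independence of the strata the conditioning reduces to $T_1^{(k)},\ldots,T_{i-1}^{(k)}, \{n_{i+k}\}, \{n_{+jk}\}$.

Hence, all three conditions assumed for Lemma 4.3.1 are established. We next present the complete class of tests and admissible tests in an exponential family.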

4.4 Complete Class of Tests

We show that the tests in Theorem 4.2.1 lie in the complete class of tests. From (4.18) we have
$$f(N) = \beta(\psi, c, d, e, f, g)\exp\Big(\sum_{k=1}^{K}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}\psi_{ij(k)}T_{ij(k)} + \sum_{i=1}^{I-1}\sum_{k=1}^{K-1} n_{i+k}c_{ik} + \sum_{j=1}^{J-1}\sum_{k=1}^{K-1} n_{+jk}d_{jk} + \sum_{i=1}^{I-1} n_{i++}e_i + \sum_{j=1}^{J-1} n_{+j+}f_j + \sum_{k=1}^{K-1} n_{++k}g_k\Big)\, g(t, m). \tag{4.34}$$
We rewrite (4.34) in the following family of distributions,
$$P(T, Z; \psi, w) = C(\psi, w)\exp[\psi'T + w'Z]. \tag{4.35}$$
That is, a random vector $(T, Z) \in R^m \times R^n$ has an exponential density. Let $\Theta$ denote the natural parameter space, and assume $(0, 0)$ is an interior point of $\Theta$.

Eaton described an essentially complete class of tests, and we need the following notation to formulate Eaton's result. Let $\psi$ be the parameter of interest and $w$ be the nuisance parameters. The problem considered is that of testing the hypotheses
$$H_0: \psi = 0, \qquad H_a: \psi \in \Omega_1,$$
where $\Omega_1$ is contained in some half-space. It is assumed that for each $\psi \in \Omega_1$ there exists a $w \in R^n$ such that $(\psi, w) \in \Theta$. Let $V \subset R^m$ be the smallest convex cone containing $\Omega_1$, and let $V^-$ denote the normal cone of $V$, i.e.,
$$V^- = \Big\{u \in R^m : \sum_{i=1}^{m} u_iv_i \le 0 \text{ for all } v \in V\Big\}, \qquad m = \text{df}. \tag{4.36}$$

Moreover, $\Phi$ stands for the class of nonempty closed convex sets in $R^m$, and
$$\Phi(V) = \{C : C \in \Phi \text{ and } V^- \subset C - c \text{ for each } c \in \partial C\}, \tag{4.37}$$
where $\partial C$ stands for the boundary of $C$. Consider the set, $D^*(V)$, of test functions with the following property: if $\varphi \in D^*(V)$, there exists a measurable set $A \subset R^m \times R^n$ such that each $Z$ section, $A(Z) \subset R^m$, is in $\Phi(V)$ and
$$\varphi(t, z) = \begin{cases} 1 & \text{if } T \in A(Z)^c, \\ r(t, z) & \text{if } T \in \partial A(Z), \\ 0 & \text{if } T \in \text{Int } A(Z), \end{cases}$$
where $A(Z)^c$ refers to the complement of $A(Z)$. The notation $A(Z)$ refers to the $Z$ section of the acceptance region, that is, the acceptance region at fixed $Z = z$ when we consider the conditional test. Eaton (1970) showed that $D^*$ is an essentially complete class for testing $H_0: \psi = 0$ against $H_a: \psi \in \Omega_1$. In light of (4.34), the testing problem in three-way contingency tables fits the framework of Eaton, which yields the fact that the tests in Theorem 4.2.1 lie in the complete class of tests.

4.5 Admissible Tests

Matthes and Truax (1967) described the class of admissible tests on multivariate exponential distributions for testing $H_0: \psi = 0$ against $H_a: \psi \ne 0$, based on the conditional distribution of $T$ given $Z$. This description is given under the assumption that the support of the conditional distribution is finite. They showed that a test


developed by Matthes and Truax, Ledwina (1978a, 1984) gave admissibility of tests on multivariate exponential distributions with discrete support. It is characterized by the fact that the conditional distribution of $T$ given $Z = z$ is independent of the nuisance parameters $w$. Hence, we consider the admissibility on each section of $Z = z$ separately, and then obtain the class of admissible tests for the original problem.

The class of admissible tests for $H_0: \psi = 0$ against $H_a: \psi \in \Omega_1 \subset R^m$ in (4.35), based on the conditional distribution of $T$ given $Z = z$, is described as follows. A test $\varphi(t)$ is admissible if and only if there exists a set $A \in \Phi(V)$ in (4.37) such that on each surface of $Z = z$, $A(Z) \subset R^m$ and where $E$ denotes the set of all extreme points of $A$. This means that a test $\varphi(t)$ is admissible if and only if, for each fixed $z$, the acceptance region is convex and randomization happens only at extreme points.

Ledwina (1984) also gave connections between admissibility of tests for the conditional distributions and the initial problem of tests based on (4.35). Ledwina showed that the test is admissible for testing $H_0$ against $H_a$ if and only if for every fixed $Z = z$, the test

4.6 Exact, Unbiased and Admissible Tests

In this section we illustrate the exact, unbiased, and admissible tests that satisfy the properties of Theorem 4.2.1. We discuss how to construct unbiased tests and how to set up critical regions to obtain tests of conditional independence of fixed size $\alpha$ for the ordinal alternative. We focus on three-way tables where the row and column classifications are ordinal, and the contents of Sections 3, 4 and 5 are combined to give the unbiased and admissible tests. One advantage of ordinal models over nominal-scale models is that tests based on ordinal models have more power to detect certain types of association and interaction (Agresti, 1990).

The model of homogeneous linear-by-linear association, which utilizes the ordinality of $X$ and $Y$, is
$$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta u_iv_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{4.38}$$
We test conditional independence, $H_0: \psi_{ij(k)} = 0$, or equivalently, $\beta = 0$, against the alternative (4.38) of linear-by-linear association, using the sufficient statistic for $\beta$ in that model,
$$T = \sum_k\sum_i\sum_j u_iv_jn_{ijk}. \tag{4.39}$$
We show that a test based on $T$ satisfies the conditions of Theorem 4.2.1, so it is the exact, unbiased, and admissible test. First, we show that $T$ can be expressed as
$$T = \sum_{k=1}^{K}\Big[\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}(u_i - u_{i+1})(v_j - v_{j+1})T_{ij(k)}\Big] + C,$$
where $T_{ij(k)} = \sum_{l=1}^{i}\sum_{m=1}^{j} n_{lmk}$ and $C$ is a constant depending on the scores and the fixed marginal totals.
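The decomposition of $T$ stated above can be checked numerically. The Python sketch below computes $T = \sum_k\sum_i\sum_j u_iv_jn_{ijk}$ directly and through the cumulative sums $T_{ij(k)}$, with the convention $u_{I+1} = v_{J+1} = 0$, and confirms that the remainder $C$ is unchanged when a table is replaced by another table with the same margins. The two $3 \times 3$ strata, the shifted table, and the function names are hypothetical choices for illustration only; the scores are the equally spaced scores $u_i = I - (i-1)$, $v_j = J - (j-1)$.

```python
def cumulative(n, a, b):
    """Cumulative sum of counts over rows 0..a and columns 0..b (inclusive)."""
    return sum(n[i][j] for i in range(a + 1) for j in range(b + 1))

def linear_by_linear(strata, u, v):
    """Direct computation of T = sum_k sum_i sum_j u_i v_j n_ijk."""
    return sum(u[i] * v[j] * n[i][j]
               for n in strata for i in range(len(u)) for j in range(len(v)))

def decompose(strata, u, v):
    """Split T into the interior part (i < I and j < J) and the constant C
    collecting every term that involves the last row or the last column."""
    I, J = len(u), len(v)
    du = [u[i] - (u[i + 1] if i + 1 < I else 0) for i in range(I)]  # u_{I+1} = 0
    dv = [v[j] - (v[j + 1] if j + 1 < J else 0) for j in range(J)]  # v_{J+1} = 0
    interior, constant = 0, 0
    for n in strata:
        for a in range(I):
            for b in range(J):
                term = du[a] * dv[b] * cumulative(n, a, b)
                if a < I - 1 and b < J - 1:
                    interior += term
                else:
                    constant += term
    return interior, constant

u = [3, 2, 1]
v = [3, 2, 1]
stratum_a = [[4, 2, 1], [3, 5, 2], [1, 2, 6]]
stratum_b = [[2, 3, 1], [1, 4, 3], [0, 2, 5]]
# Same row and column totals as stratum_a, with one unit moved within a 2 x 2 block.
shifted_a = [[5, 1, 1], [2, 6, 2], [1, 2, 6]]

interior, constant = decompose([stratum_a, stratum_b], u, v)
interior2, constant2 = decompose([shifted_a, stratum_b], u, v)
direct = linear_by_linear([stratum_a, stratum_b], u, v)
print(direct == interior + constant, constant == constant2, interior2 - interior)
```

Because the scores here are decreasing in both dimensions, every coefficient $(u_i - u_{i+1})(v_j - v_{j+1})$ is positive, so moving mass toward the upper-left corner of a stratum raises the interior part while leaving $C$ fixed.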

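Let $u_{I+1} = v_{J+1} = 0$. Then
$$T = \sum_{k}\sum_{i=1}^{I}\sum_{j=1}^{J} u_iv_jn_{ijk}$$
$$= \sum_{k}\sum_{i=1}^{I}\sum_{j=1}^{J}\big[\{(u_i - u_{i+1}) + (u_{i+1} - u_{i+2}) + \cdots + (u_I - u_{I+1})\}\cdot\{(v_j - v_{j+1}) + (v_{j+1} - v_{j+2}) + \cdots + (v_J - v_{J+1})\}\,n_{ijk}\big]$$
$$= \sum_{k}\sum_{i=1}^{I}\sum_{j=1}^{J}\Big[\sum_{a=i}^{I}\sum_{b=j}^{J}(u_a - u_{a+1})(v_b - v_{b+1})\,n_{ijk}\Big]$$
$$= \sum_{k}\sum_{a=1}^{I}\sum_{b=1}^{J}\Big[(u_a - u_{a+1})(v_b - v_{b+1})\sum_{i=1}^{a}\sum_{j=1}^{b} n_{ijk}\Big]$$
$$= \sum_{k}\sum_{i=1}^{I}\sum_{j=1}^{J}(u_i - u_{i+1})(v_j - v_{j+1})\,T_{ij(k)}$$
$$= \sum_{k}\Big[\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}(u_i - u_{i+1})(v_j - v_{j+1})T_{ij(k)} + (u_I - u_{I+1})\sum_{j=1}^{J-1}(v_j - v_{j+1})T_{Ij(k)} + (v_J - v_{J+1})\sum_{i=1}^{I-1}(u_i - u_{i+1})T_{iJ(k)} + u_Iv_JT_{IJ(k)}\Big]$$
$$= \sum_{k}\sum_{i=1}^{I-1}\sum_{j=1}^{J-1}(u_i - u_{i+1})(v_j - v_{j+1})\,T_{ij(k)} + C.$$
Thus, $T$ is monotone in $\{T_{ij(k)}\}$ if the scores satisfy
$$(u_i - u_{i+1})(v_j - v_{j+1}) > 0, \tag{4.40}$$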
for $i = 1,\ldots,I-1$, $j = 1,\ldots,J-1$, that is, if the scores $\{u_i\}$, $\{v_j\}$ are both monotone increasing or both monotone decreasing. We note that the statistic $\sum_k\sum_i\sum_j T_{ij(k)}$ is a special case of $T$, for the equally spaced scores $\{u_i = I - (i-1)\}$ and $\{v_j = J - (j-1)\}$. Thus, tests based on $T$ are unbiased.

In constructing critical regions, we utilize a secondary statistic, $T'$, for ordering the tables for which $T = t_0$. The secondary statistic generates a secondary partitioning, which is used to set up critical regions that give tests of conditional independence of fixed size $\alpha$. When $I = J = 2$, we could use $T' = \sum_k X_k^2$ to order the tables for which $T = t_0$. The approach of Cohen and Sackrowitz (1992) is to utilize the conditional null probabilities of the tables to order them. These relate to the modified P-value, which we discussed in Chapter 2. The same argument applies if one uses some other secondary statistic.

Let $C_\alpha$ be a constant, depending on $m$, such that $P\{T \ge C_\alpha\} > \alpha$ and $P\{T > C_\alpha\} = A < \alpha$. The test rejects if $T > C_\alpha$. When $T = C_\alpha$, consider all tables having $T = C_\alpha$ and order the tables according to their secondary test statistic values. When large values of $T$ contradict the null, attention can be given to the tables having larger values of $T'$ among the tables having $T = C_\alpha$. Alternatively, a table with small probability under the null hypothesis would be unlikely to occur if $H_0$ is true, so for a particular value of $T$, a smaller null table probability corresponds to stronger contradiction of the null hypothesis. Hence, when the null table probability is used as the secondary statistic, attention can be given to the least probable tables in constructing the rejection region. Thus, when $T = C_\alpha$, we reject for those tables whose secondary statistic values are the largest or whose probabilities are smallest, and whose probabilities total at most $(\alpha - A)$. Instead of randomizing on all tables where $T = C_\alpha$, we allow randomization only at extreme points of a convex acceptance
section of the remaining points, so that the test is exact, unbiased, and admissible. We denote a test of this form by $\varphi^*$. Forming the critical region in this way gives a test that is less likely to require randomization than the usual test $\varphi$ that randomizes on the entire set $\{n : m \text{ fixed}, T = C_\alpha\}$. Also, the modified test is better than $\varphi$, since usually the entire set of tables having $T = C_\alpha$ contains nonextreme points, making $\varphi$ inadmissible.

In this section, we have shown that a test based on $T$ using monotone scores satisfies the properties of Theorem 4.2.1, since a test based on $T$ has a desirable monotonicity property by the construction of $T$, and we allow randomization only at extreme points of the convex acceptance section. Hence, the test $\varphi^*$ is exact, unbiased, and admissible. A nonrandomized test using $T$ is unbiased and admissible, but it would be conservative when used with a fixed size $\alpha$. The test $\varphi^*$, by contrast, would have actual size closer to the nominal level than the ordinary test.

4.7 Example

We consider the test of conditional independence in three-way contingency tables, where the row and column variables are ordinal. We assume that the model of no three-factor interaction holds, so that we can construct tests to increase power against important alternatives. We illustrate construction of an exact, unbiased, and admissible test using $2 \times 2 \times 5$ contingency tables. When $I = J = 2$, the usual statistic $\sum_k n_{11k}$ results from the scores $u_1 = v_1 = 1$, $u_2 = v_2 = 0$ in $T = \sum_k\sum_i\sum_j u_iv_jn_{ijk}$. When $I = J = 2$, the test $\varphi^*$ gives an alternative to the ordinary one for testing conditional independence for a set of $2 \times 2$ tables, under the assumption of a common odds ratio. The ordinary test is often inadmissible. For an $I \times J$ table, we can construct exact, unbiased, and admissible tests for an ordinal alternative to independence
by using a modified approach, but it is not easy to display the acceptance section if $I$ and $J$ are greater than 3.

4.7.1 Test of Conditional Independence: 2 x 2 x 5 tables

We utilize the middle three subtables of Table 2.1 to illustrate construction of an exact, unbiased, and admissible test. We study size $\alpha = 0.05$ tests based on $T = \sum_k n_{11k}$. Given the D and C marginal totals at each of these levels of P, $n_{112}$ can range between 0 and 3, $n_{113}$ can range between 2 and 6, and $n_{114}$ can be 5 or 6. The whole distribution of $n_{112}$, $n_{113}$ and $n_{114}$ is composed of 40 tables. Since $P\{T \ge 13\} = 0.1136 > \alpha$ and $P\{T > 13\} = 0.0200 < \alpha$, randomization is required for those tables with $T = 13$. We use $\sum_k X_k^2$ or the null table probability for the secondary statistic. The tables with $T \ge 13$ are as follows.

    T     (n_112, n_113, n_114)    Null probability    sum_k X_k^2
    15    (3, 6, 6)                 2/1452              11.09
    14    (2, 6, 6)                 9/1452               7.54
    14    (3, 5, 6)                16/1452               6.59
    14    (3, 6, 5)                 2/1452              11.09
    13    (1, 6, 6)                 9/1452               7.54
    13    (2, 6, 5)                 9/1452               7.54
    13    (3, 5, 5)                16/1452               6.59
    13    (3, 4, 6)                30/1452               5.09
    13    (2, 5, 6)                72/1452               3.04
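The tail probabilities quoted above, and the randomization probability of the usual test that randomizes equally on every table with $T = 13$, can be reproduced from the listed null probabilities. The following Python sketch does this with exact fractions. The numerators are those of the listing; the value 2 for table (3,6,6) is taken as the numerator implied by the stated $P\{T > 13\} = 0.0200$, and the variable and function names are assumptions of this sketch.

```python
from fractions import Fraction

DENOMINATOR = 1452

# Numerators of the null probabilities for the tables (n_112, n_113, n_114) with T >= 13.
numerators = {
    (3, 6, 6): 2,
    (2, 6, 6): 9, (3, 5, 6): 16, (3, 6, 5): 2,
    (1, 6, 6): 9, (2, 6, 5): 9, (3, 5, 5): 16, (3, 4, 6): 30, (2, 5, 6): 72,
}

def tail_probability(cutoff, strict):
    """P{T > cutoff} if strict is True, otherwise P{T >= cutoff}, where T is the
    sum of the three varying counts (the fixed strata add the same constant to every table)."""
    total = Fraction(0)
    for table, numerator in numerators.items():
        t = sum(table)
        if t > cutoff or (not strict and t == cutoff):
            total += Fraction(numerator, DENOMINATOR)
    return total

alpha = Fraction(5, 100)
p_ge = tail_probability(13, strict=False)
p_gt = tail_probability(13, strict=True)

# The usual randomized test rejects with probability gamma on every table with T = 13.
gamma = (alpha - p_gt) / (p_ge - p_gt)

print(round(float(p_ge), 4), round(float(p_gt), 4), round(float(gamma), 4))
```

With these inputs the size of the usual test is exactly $\alpha$, since $P\{T > 13\} + \gamma\,P\{T = 13\} = 0.05$ by construction of $\gamma$.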

The usual 0.05-size conditional test based on $T$ is
$$\varphi(n) = \begin{cases} 1 & \text{if } (n_{112}, n_{113}, n_{114}) = (3,6,6), (2,6,6), (3,5,6), (3,6,5), \\ 0.3206 & \text{if } (n_{112}, n_{113}, n_{114}) = (1,6,6), (2,5,6), (3,4,6), (2,6,5), (3,5,5), \\ 0 & \text{otherwise.} \end{cases}$$
This test randomizes with equal probability on all tables for which $T = 13$. Since the table (2,5,6) is an interior point of the line segment between tables (1,6,6) and (3,4,6), it is not an extreme point of a convex acceptance region. This makes $\varphi$ inadmissible, since randomization must occur only at extreme points for a test to be admissible. Hence, another test $\varphi^*$ will beat the test $\varphi$. Since the table (1,6,6) has the largest $\sum_k X_k^2$ value or the smallest null table probability among tables for which $T = 13$, it can be included in the rejection region. The table (2,5,6) is now an extreme point for this test. Since randomization is permitted only on the extreme points of the convex acceptance region, the test is admissible. The exact test $\varphi^*$ that orders the tables according to their secondary statistic values is
$$\varphi^*(n) = \begin{cases} 1 & \text{if } (n_{112}, n_{113}, n_{114}) = (3,6,6), (2,6,6), (3,5,6), (3,6,5), (1,6,6), (2,6,5), (3,5,5), \\ 0.3200 & \text{if } (n_{112}, n_{113}, n_{114}) = (3,4,6), \\ 0 & \text{otherwise.} \end{cases}$$
We can add tables to the rejection region as long as the probability of rejection does not exceed the size. Hence, the two tables (2,6,5) and (3,5,5) are entered into the rejection region, since they have the next largest $\sum_k X_k^2$ values or the next smallest null table probabilities. Furthermore, the table (2,5,6), which has the table probability
close to our size, can be excluded from randomization so that the table (3,4,6) is the only extreme point for possible randomization. The test $\varphi^*$ randomizes only on an extreme point (3,4,6) of its convex acceptance region, and it satisfies the properties of Theorem 4.2.1. Hence, it is exact, unbiased, and admissible. Compared to the previous test, it has the advantage of having only a single table for which randomization is necessary. The probability that randomization is required is only 0.0207, rather than 0.0937. In this data set, we get the same exact, unbiased, and admissible test using either $\sum_k X_k^2$ or the null table probability for the secondary statistic.

4.8 Discussion

For $I \times J \times K$ tables, we generalized results of Cohen and Sackrowitz (1992) and showed how to construct exact, unbiased, and admissible tests for an ordinal alternative to conditional independence. The ordinary exact test of conditional independence for $2 \times 2 \times K$ tables is often inadmissible.

In practice, randomized tests are unacceptable. Thus, even the tests described in Section 6 that require less randomization than usual are not intended for practical use. However, the results of that section suggest an obvious way of forming critical regions for tests so that one can have actual size closer to a desired value (such as 0.05) than would be possible with the ordinary test.
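The construction of $\varphi^*$ in Section 4.7.1 can be sketched as a greedy procedure: order the tables with $T = C_\alpha$ by the secondary statistic, reject outright while the accumulated size stays at or below $\alpha$, and randomize on the first table that no longer fits. The Python sketch below applies this to the five tables with $T = 13$, using the null probabilities and $\sum_k X_k^2$ values reported in that section. The function name and data layout are assumptions of this sketch, and the code does not itself verify convexity of the acceptance section or that the randomized table is an extreme point; in this example the result coincides with the test given in the text.

```python
from fractions import Fraction

def build_modified_test(boundary, p_above, alpha):
    """boundary: list of (table, null probability, secondary statistic) with T = C_alpha.
    p_above: P{T > C_alpha}. Returns (rejected tables, randomized table, gamma)."""
    # Largest secondary statistic first; ties keep their listed order.
    ordered = sorted(boundary, key=lambda row: row[2], reverse=True)
    rejected, total = [], p_above
    for table, prob, _secondary in ordered:
        if total + prob <= alpha:
            rejected.append(table)
            total += prob
        else:
            # First table that does not fit: randomize so that the size equals alpha.
            return rejected, table, (alpha - total) / prob
    return rejected, None, Fraction(0)

boundary = [
    ((1, 6, 6), Fraction(9, 1452), 7.54),
    ((2, 6, 5), Fraction(9, 1452), 7.54),
    ((3, 5, 5), Fraction(16, 1452), 6.59),
    ((3, 4, 6), Fraction(30, 1452), 5.09),
    ((2, 5, 6), Fraction(72, 1452), 3.04),
]
p_above = Fraction(29, 1452)   # P{T > 13}, from the tables with T = 14 and T = 15
alpha = Fraction(1, 20)

rejected, randomized, gamma = build_modified_test(boundary, p_above, alpha)
probs = {table: prob for table, prob, _ in boundary}
size = p_above + sum(probs[t] for t in rejected) + gamma * probs[randomized]
prob_randomize_modified = probs[randomized]
prob_randomize_usual = sum(probs.values())

print(rejected, randomized, float(gamma), float(size))
```

The last two quantities correspond to the comparison made in the text: randomization is needed with probability 30/1452 under the modified test and 136/1452 under the usual test.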

CHAPTER 5
CONCLUSION

5.1 Discussion

The conservativeness due to the discreteness of a statistic is a typical problem for exact inference with categorical data. Ways of reducing the conservativeness in exact tests and confidence intervals were proposed in Chapter 2. We prefer modified exact tests and confidence intervals to the ordinary exact ones because they are less conservative than the ordinary ones, but still guarantee at least the nominal level. We also prefer confidence intervals based on inverting two-sided tests over those based on inverting two separate one-sided tests because they tend to be less conservative.

The approach using a modified P-value can be utilized in approximating exact inference regarding conditional associations in $I \times J \times K$ tables. In Chapter 3 we discussed six test statistics for conditional independence. We obtained precise estimates of ordinary and modified exact P-values by using a simulation algorithm for cases that currently are computationally infeasible.

For $I \times J \times K$ tables, Chapter 4 discussed construction of tests of conditional independence that are exact, unbiased, and admissible for an ordinal alternative. By using a modified approach, less randomization is required than usual, and we obtain actual size closer to a nominal level. The ordinary exact test of conditional independence for $2 \times 2 \times K$ tables is often inadmissible, and we showed how to obtain improved tests.

5.2 Future Research

We have considered improved "exact" inference about conditional association in $2 \times 2 \times K$ contingency tables. The idea of a modified P-value can be applied to any contingency table, and it can be calculated for any test statistic having a discrete distribution.

One research study could be the application of the modified approach to exact tests of no three-factor interaction. Zelen (1971) presented an exact test of homogeneity of odds ratios in $2 \times 2 \times K$ tables. For an exact test of no three-factor interaction for $2 \times 2 \times K$ tables, an efficient score statistic against the saturated model is the Pearson statistic for testing the fit of that model (Agresti 1992). We could use this score statistic as a primary statistic and the table probability as a secondary statistic to define modified P-values. We could study how much improvement can be obtained by using a modified approach.

We could consider a modified confidence interval for the $\beta$ parameter in the linear-by-linear association model. Under the alternative, the conditional distribution of $T = \sum_i\sum_j u_iv_jn_{ij}$ has a noncentral hypergeometric distribution (2.10), with $\theta$ expressed in terms of $\beta$, where $C(t_0)$ is the sum of $(\prod_{i,j} n_{ij}!)^{-1}$ for all tables with the given marginal distributions having $T = t_0$ (Agresti et al. 1990). By using a modified confidence interval, we could reduce the conservativeness of the Agresti-Mehta-Patel interval.

As we mentioned in Section 2.4.1 for $2 \times 2 \times K$ tables, we could base confidence intervals on tests in which the two-sided P-value uses a non-null test statistic, instead of the table probability. For instance, we could consider a test statistic $T(\theta)$ standardized by $\sum_k V(n_{11k})$, where, under the alternative with assumed value $\theta$, $m_{11k}$ is the mean of $n_{11k}$ and $V(n_{11k})$ is its variance. Since for a fixed value of $\theta$, $\sum_k V(n_{11k})$ is a constant, $T(\theta)$ depends
only on its numerator. By using the exact non-null distribution, we could construct a two-sided ordinary or modified confidence interval.

Another area to consider is how we can apply importance sampling (Mehta, Patel, and Senchaudhuri 1988) as an alternative to conventional Monte Carlo sampling to simulate the exact distribution and to estimate exact significance levels. In importance sampling, the tables are selected in proportion to their importance for reducing the variance of the estimated Monte Carlo P-values, whereas in conventional Monte Carlo sampling, the tables are sampled independently with replacement from the reference set. Importance sampling should therefore improve both accuracy and speed.

We could use a simulation algorithm to approximate exact confidence intervals. For this, we need an algorithm to simulate the non-null distribution. Under the alternative, the joint probability distribution of a table is noncentral hypergeometric, and random tables should satisfy the association structure as well as the fixed margins. Just as we construct an "exact" confidence interval for a parameter by inverting the results of the exact conditional tests based on ordinary or modified exact P-values, we can approximate exact confidence intervals by the same method based on estimates of the ordinary or modified exact P-values.

Also, we could approximate exact inference for the test of no three-factor interaction. In this case the conditional reference set is the set of $I \times J \times K$ tables whose XY, XZ, YZ marginal tables are fixed at the corresponding values of the observed table. More power would be obtained for narrower alternatives that utilize ordinality.

For the test of conditional independence in $I \times J \times K$ tables, we defined the class of exact, unbiased, and admissible tests. There are other null hypotheses of interest. We could consider the class of exact, unbiased, and admissible tests for testing no three-factor interaction against an ordinal alternative.

In summary, we suggested exact inference regarding conditional associations in three-way tables, modifying the usual exact conditional approach. This seems to be a
promising approach for categorical data analysis, and more work can be done utilizing this approach.
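The conventional Monte Carlo scheme mentioned in Section 5.2, in which tables are sampled independently from the conditional reference set, can be sketched for the statistic $T = \sum_k n_{11k}$ in $2 \times 2 \times K$ tables. Under conditional independence with fixed margins, each $n_{11k}$ is hypergeometric and the strata are independent, so a table is drawn by sampling each stratum separately. The Python sketch below compares the Monte Carlo estimate of $P\{T \ge t\}$ with the exact value obtained by convolution. The margins, the cutoff, the number of draws, and the function names are all hypothetical choices made for illustration; this is not the simulation algorithm of Chapter 3 and it does not implement importance sampling.

```python
import random
from math import comb

def sample_n11(row1, col1, n, rng):
    """Draw n_11 for one 2 x 2 table with fixed margins: mark col1 of the n subjects
    and count how many marked subjects fall among row1 drawn without replacement."""
    population = [1] * col1 + [0] * (n - col1)
    return sum(rng.sample(population, row1))

def exact_upper_tail(margins, t_obs):
    """Exact P{T >= t_obs}, convolving the hypergeometric distributions of the strata."""
    dist = {0: 1.0}
    for row1, col1, n in margins:
        lo, hi = max(0, row1 + col1 - n), min(row1, col1)
        pmf = {x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
               for x in range(lo, hi + 1)}
        new_dist = {}
        for s, p in dist.items():
            for x, q in pmf.items():
                new_dist[s + x] = new_dist.get(s + x, 0.0) + p * q
        dist = new_dist
    return sum(p for s, p in dist.items() if s >= t_obs)

def monte_carlo_upper_tail(margins, t_obs, draws, seed=0):
    """Estimate P{T >= t_obs} from independent draws of whole tables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        t = sum(sample_n11(row1, col1, n, rng) for row1, col1, n in margins)
        if t >= t_obs:
            hits += 1
    return hits / draws

# Hypothetical margins for K = 3 strata: (first row total, first column total, stratum total).
margins = [(3, 6, 12), (6, 7, 12), (5, 6, 11)]
exact = exact_upper_tail(margins, 10)
estimate = monte_carlo_upper_tail(margins, 10, draws=20000, seed=1)
print(exact, estimate)
```

Replacing the indicator of $T \ge t$ with an indicator based on a secondary statistic would give the corresponding estimate of a modified P-value under the same sampling scheme.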

APPENDIX A SOURCE CODE EOR EXACT INEERENCE Following are FORTRAN source code for computing the ordinary and modified exact P-values, four types of confidence intervals, and coverage probability. Data or its file name can be entered from console, and this program provides four types of confidence intervals or coverage probability by the option. When the coverage probability is requested, it makes five output files. They are “00. Cl” for one-sided ordinary exact confidence interval, “OM.CI” for one-sided modified exact confidence interval, “TO. Cl” for two-sided ordinary exact confidence interval, “TM.CI” for twosided modified exact confidence interval, and “COVER.P” for coverage probability for four types of confidence intervals. This program, for 2 x 2 x K tables, is an adaptation of one written by Vollset and Hirji (1991) for ordinary exact inference. integer itab(1000,4) ,I0T0T INTEGER NIK(2, 1000) ,NJK(2,1000) ,NT0T(1000) INTEGER ISUMA,J,SCD INTEGER*2 JH3 , JM3 , JS3 , JSS3 integer infhyl(lOOO) , infhyu(lOOO) ,INUM(270000, 20) , INUMl (270000 , 1 ) double precision hyp(0 : 2000) ,ds(0 : 1 ,0 : 5500) ,ddl ,lge DOUBLE PRECISION C(5500) ,B,LLL,K,FF,R1 (5) ,R2(5) DOUBLE PRECISION ROOTRF , EPS ,X0 ,X1 ,X3 DOUBLE PRECISION LL,UL,MH,MUE,KA(100,2) ,RUL,RLL,MIDP,MAXP,PVAL2 DOUBLE PRECISION FLOWER, FUPPER,FL0WB0,FUPPB0,MAXPE,pobsh,0DR DOUBLE PREC I S I ON ALPHA , VRBG , SVRBG , ALL , AUL , START , ELL , EUL , ELL 1 , EUL 1 DOUBLE PRECISION PALPHA,P_UP,P_L0,P_UP1,P_L01 129


      DOUBLE PRECISION P_UP2,P_LO2,ELL2,EUL2
      DOUBLE PRECISION OOCI(270000,2),OMCI(270000,2),TOCI(270000,2)
      DOUBLE PRECISION TMCI(270000,2),COVER(1600,5)
      DOUBLE PRECISION HYPD10(270000,1),POOCI,POMCI,PTOCI,PTMCI
      DOUBLE PRECISION ELL3,EUL3,P_LO3,P_UP3,DALL
      DOUBLE PRECISION DENO,hypd(1000,0:2000),POBSH1,PEXIMP,PEX
      DOUBLE PRECISION HYPD2(270000,20),HYPD1(270000,1)
      DOUBLE PRECISION CHI(270000),CHIOBS
      CHARACTER*16 FNAME
      COMMON/PARAM/C,J,SCD,K,FF
      COMMON/CI1/ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh
      COMMON /CH/ NIK,NJK,NTOT
      COMMON /ART/ IOTOT
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX
      COMMON /DKIM/ DENO,ITOT,ISUML,INUM,HYPD2,INUM1,HYPD1
      COMMON /CHI/ CHI,CHIOBS
C
      EXTERNAL FLOWER,FUPPER,FLOWBO,FUPPBO
      DATA LGE /307.0D+00/
      DATA KA(95,1) /3.84145882D+00/ KA(95,2) /2.5D-02/
      DATA KA(90,1) /2.70554345D+00/ KA(90,2) /5.0D-02/
      DATA KA(99,1) /6.63489660D+00/ KA(99,2) /5.0D-03/
      DATA KA(80,1) /1.64237442D+00/ KA(80,2) /1.0D-01/
      DATA KA(50,1) /0.45493642D+00/ KA(50,2) /2.5D-01/
      DATA MIDP /0.D00/, MAXP /0.D00/
C
C     FF    IS 1 FOR EXACT AND 0.5 FOR MID-P EXACT
C     IMAX  MAX NO. OF ITERATIONS
C     K     ALPHA/2
C     EPS   STOPPING CRITERION
C
      LGE=307.0D+00
      WRITE(*,10000)
10000 FORMAT(3(/),T12,'***** Ex2x2xK (version 24.0 -- 5/94) *****',/,
     1 /,T12,'Ordinary and Modified Exact P-values and CIs',/,
     2 T12,'for several 2x2 tables.',/,
     3 T12,' One-sided and Two-sided Approach : ',/)
      write(*,10001)


10001 FORMAT(T7,' This program calculates',/,
     1 T7,' 1. Ordinary and Modified Exact P-values, ',/,
     2 T7,' 2. Four Types of Exact Confidence Limits',/,
     3 T7,'    for the Common Odds Ratio, and',/,
     4 T7,' 3. Coverage Probability for CIs.')
      WRITE(*,10002)
10002 FORMAT(/,T7,' The program of Vollset, Hirji, Elashoff is ',
     + /,T7,' graciously provided and slightly modified.',
     + /,T7,' Several routines are added for modified ',
     + 'exact inference.')
      WRITE(*,10004)
10004 FORMAT(/,/,T7,' Any questions about the use of this software',
     + /,T7,' can be directed to Dr. Alan Agresti or Donguk Kim.')
C
C     1-ALPHA/2
      IA = 95
      ALPHA=1.D0-DBLE(IA)/100.D0
    1 FF = 0.D+00
      IK=0
      IMAX=50
      K = KA(IA,2)
      EPS = 1.D-09
c     intrinsic functions : float(); dexp()
c
      iin=5
      iot=6
c
c     maximum number of strata = 1000
c     maximum value of range of
c     hypergeometric distribution = 2000
c     maximum value of range of
c     final distribution = 5500
      mxs = 1000
      mxz = 2000
      mxd = 5500


c
c     maximum stratum size
c
      mxss = 500000
C
C     READ DATA
C
   10 WRITE(IOT,999)
      WRITE(IOT,997)
      READ(IIN,15)FNAME
C     FNAME='peni.dat'
   15 FORMAT(A16)
      IF(FNAME .EQ. 'c' .OR. FNAME .EQ. 'C')THEN
      PRINT *,'GIVE 1-ALPHA/2: 50,80,90,95 OR 99)'
      READ(IIN,16)IA
   16 FORMAT(I2)
      GOTO 1
      END IF
      IF (FNAME .EQ. 'k' .OR. FNAME .EQ. 'K')THEN
CCCCCCCCCCCCCCCCCCCCCCCCCCCCC
   18 write(iot,20)
   20 format(/10x,'Enter no. of strata')
      read(iin,*) ik
      if (ik .lt. 1) goto 10
      open(unit=28,file='2x2.dat')
      do 30 i=1,ik
      write(iot,40)i
   40 format(/10x,'Enter table',1x,i3)
      read(iin,*)itab(i,1),itab(i,2),itab(i,3),itab(i,4)
      write(iot,*)(itab(i,j),j=1,4)
      write(28,*)(itab(i,j),j=1,4)
      NIK(1,I)=ITAB(I,1)+ITAB(I,2)
      NIK(2,I)=ITAB(I,3)+ITAB(I,4)
      NJK(1,I)=ITAB(I,1)+ITAB(I,3)
      NJK(2,I)=ITAB(I,2)+ITAB(I,4)


      NTOT(I)=NIK(1,I)+NIK(2,I)
   30 continue
      GOTO 100
      END IF
cccccccccccccccccccccccccccccccccc
      OPEN(UNIT=27,FILE=FNAME)
C     OPEN(UNIT=27,FILE='peni.dat')
      DO 70 I=1,MXS
      READ(27,*,END=100)(ITAB(I,J),J=1,4)
      IK=IK+1
      WRITE(IOT,*)(ITAB(I,J),J=1,4)
      NIK(1,I)=ITAB(I,1)+ITAB(I,2)
      NIK(2,I)=ITAB(I,3)+ITAB(I,4)
      NJK(1,I)=ITAB(I,1)+ITAB(I,3)
      NJK(2,I)=ITAB(I,2)+ITAB(I,4)
      NTOT(I)=NIK(1,I)+NIK(2,I)
   70 CONTINUE
  100 PRINT *,'NO. STRATA ',IK
      WRITE(*,80)
   80 FORMAT(/,'ENTER CODE FOR ANALYSIS :',
     1 /,/,' 1: P-VALUE AND CONFIDENCE INTERVAL',
     2 /,' 2: COVERAGE PROBABILITY FOR CIS.',/)
      READ(*,*)NCODE
      IF (NCODE .EQ. 1) GO TO 110
      IERR = 0
      jci=0
      call cnv2x2(ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh,
     1 jci,odr)
      print*
      print*,'TOTAL NO. OF RANDOM TABLES =',iotot
C
C     COMPUTE CIS FOR EACH RANDOM TABLE
C


      OPEN(UNIT=45,FILE='OO.CI')
      OPEN(UNIT=46,FILE='OM.CI')
      OPEN(UNIT=47,FILE='TO.CI')
      OPEN(UNIT=48,FILE='TM.CI')
      DO 5000 IAITN=1,IOTOT
C     print*,'data'
C     PRINT*,'NO. OF RANDOM TABLE =',IAITN
      DO 5010 I=1,IK
      ITAB(I,1)=INUM(IAITN,I)+INFHYL(I)
      ITAB(I,2)=NIK(1,I)-ITAB(I,1)
      ITAB(I,3)=NJK(1,I)-ITAB(I,1)
      ITAB(I,4)=NTOT(I)-(ITAB(I,1)+ITAB(I,2)+ITAB(I,3))
C     print*,itab(i,1),itab(i,2),itab(i,3),itab(i,4)
 5010 CONTINUE
      ff=0.d0
  110 IERR = 0
C     CALL GETTIM(JH1,JM1,JS1,JSS1)
      jci=0
      call cnv2x2(ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh,
     1 jci,odr)
      IF (IERR .GT. 0) THEN
      CALL ERROR(IERR,MXS,mxss,MXZ,MXD)
      GOTO 170
      END IF
C     CALL GETTIM(JH2,JM2,JS2,JSS2)
C     ITIME = 60*60*(JH2-JH1) + 60*(JM2-JM1) + JS2-JS1
c     ATIME = FLOAT(ITIME) + FLOAT(JSS2-JSS1)/100.0
  170 CONTINUE
C     CALCULATES OBSERVED POSITION IN SAMPLE SPACE
C     J    POSITION
C     SCD  SIZE COND. SAMPLE SPACE
C     ADDED BY DONGUK KIM
      ISUMA=0
C
      DO 180 I=1,IK


135 ISUMA = ISUMA + ITAB(I,1) 180 CONTINUE J = ISUMA K1 + 1 SCO = K2 K1 + 1 IF (NCODE .EQ. 1) THEN print* PRINT* , ' print*, 'DISTRIBUTION in T : ',kl , ' <= T <=',k2,scd,' values' print* ,' OBSERVED T is in ',j,' th position among ',scd print*, 'OBSERVED PRIMARY TEST STATISTIC =',isuma print*, 'OBSERVED SECONDARY TEST STATISTIC = ' , SNGL (CHIOBS) PRINT* print* PRINT*, ' WRITE(*,906)PEXIMP WRITE (*, 907) PEX PRINT* PRINT*, 'PROB. OF OBSERVED TABLES =' ,SNGL(P0BSH1/DEN0) PRINT*, ' PRINT* 906 FORMAT('THE MODIFIED EXACT P-VALUE =',3X,F12.6) 907 FORMATC'THE ORDINARY EXACT P-VALUE =',3X,F12.6) ENDIF DO 210 I=K1,K2 C(I-Kl+l)=DS(IPAR,I-kl) 210 CONTINUE IF (NCODE .EQ. 1) THEN open(unit=29,f ile='dist .fx5' ) DO 211 I=K1,K2 write(29,*)i-kl+l,C(i-kl+l) ,DEXP(C(i-kl+l)) ,I 211 CONTINUE ENDIF C C IP0S=1 IF OBSERVED IS ON LOWER BOUNDARY, 2 ON UPPER 0 OW C IP0S=0 IF(J .EQ. 1) IPOS =1 IF(J .EQ. SCD) IP0S=2 C PRINT *, 'IPOS' ,IPOS C


136 C CALCULATE STARTING VALUES C C CALL SATO (ITAB, IK, LL,UL,MH,KA,IA,RLL,RUL, IPOS) CALL SATO ( ITAB , IK , LL , UL , MH , KA , lA , RLL , RUL , IPOS , VRBG) Rl(3) = UL R2(3) = LL Rl(4) = RUL R2(4) = RLL C P-VALUES IFCIPOS .GE.DGOTO 220 FF=0 . 5 MIDP=FLOWER(O.DOO)+K PVAL2=FUPPER(0.D00)+K IF(MIDP .GT. PVAL2)MIDP=PVAL2 MIDP=MIDP*2 FF=1.0 MAXP=FLOWER (0 . DOO ) +K PVAL2=FUPPER(0 .DOO)+K IF(MAXP .GT. PVAL2)MAXP=PVAL2 MAXPE=MAXP C PRINT*, 'ONE SIDED P_EXACT ' ,MAXPE C MAXP=MAXP*2 220 FF=1.0 IFdPOS .EQ. DTHEN MAXP=2* (FLOWBO (0 . DOO) +K) FF=0.5 MIDP=2* (FLOWBO (0 . DOO) +K) ENDIF FF=0.5 IFCIPOS .EQ. 2)THEN MIDP=2* (FUPPBO (0 . DOO)+K) FF=1.0 MAXP=2* (FUPPBO (0. DOO) +K) ENDIF FF=0 DO 1000 JJ=1,2 FF=FF+0.5 C PRINT * , ' FF ' , FF IFCIPOS .EQ. 1) GOTO 300 IFCIPOS .EQ. 2) GOTO 310 C M.U.E IF(FF .EQ. 0.5) THEN


137 K = 4.999999999999999D-001 XO = DLOG(MH) C PRINT 'MH' ,MH CALL brent (XO , EPS , IMAX , ROOTRF , FLOWER , NRF) HUE = ROOTRF C PRINT 'M.U.E' ,ROOTRF K = KA(IA,2) C PRINT 'K' ,K END IF XO = DLOG(UL) CALL brent (XO , EPS , IMAX , ROOTRF , FLOWER , NRF) Rl(JJ)=ROOTRF XO = DLOG(LL) CALL brent (XO , EPS , IMAX , ROOTRF , FUPPER ,NRF) R2(JJ)=R00TRF GOTO 340 300 XO = DLOG(UL) CALL brent (XO , EPS , IMAX , ROOTRF , FLOWBO ,NRF) R1(JJ)=R00TRF GOTO 341 310 XO = DLOG(LL) CALL brent (XO , EPS , IMAX , ROOTRF , FUPPBO , NRF) R1(JJ)=R00TRF GOTO 342 340 IF(FF .EQ. 0.5) GOTO 1000 IF (NCODE .EQ.l) THEN WRITE(*,999)IA PRINT POINT ESTIMATES' PRINT ' WRITE ( * , 979 ) MH , DEXP (MUE) PRINT ' WRITE(*,993)MAXPE WRITE (*, 994) POBSH PRINT * , ' ' PRINT * , ' INTERVAL ESTIMATES LOWER + 2+ONESIDED P' PRINT ' ' WRITE(*,991)DEXP(R2(2)) ,DEXP(R1(2)) ,MAXP WRITE(*,992)DEXP(R2(D) ,DEXP(R1(D) ,MIDP WRITE(*,980)R2(3) ,R1(3) WRITE(*,981)R2(4) ,R1(4) UPPER


138 ENDIF GOTO 343 IF (FF .EQ. 0.5)G0T0 1000 IF (NCODE .EQ.l) THEN WRITE(*,999)IA PRINT LOWER BOUNDARY: UPPER LIMITS ONLY' PRINT ' PRINT * , ' INTERVAL ESTIMATES LOWER 2*0NESIDED P' WRITE(*,971)DEXP(R1(2)) ,MAXP WRITE(*,972)DEXP(R1(D) ,MIDP WRITE(*,973)R1(3) ENDIF GOTO 343 342 IF(FF .EQ. 0.5)G0T0 1000 IF (NCODE .EQ. 1) THEN WRITE(*,999)IA PRINT UPPER BOUNDARY: LOWER LIMITS OMLYÂ’ PRINT * , ' ' PRINT * , ' INTERVAL ESTIMATES LOWER 2*0NESIDED P' WRITE(*,974)DEXP(R1(2)) ,MAXP WRITE(*,975)DEXP(R1(D) ,MIDP WRITE(*,976)R2(3) ENDIF 343 CONTINUE IF (NCODE .EQ. 1) THEN PRINT*, ' ENDIF 971 FORMAT (' MAX-P EXACT 972 FORMAT (' MID-P EXACT 973 FORMAT (' MANTEL-HAENSZEL974 FORMAT (' MAX-P EXACT 975 FORMAT (' MID-P EXACT 976 FORMAT (' MANTEL-HAENSZEL979 FORMAT (' MANTEL-HAENSZEL ^ + ' MEDIAN UNBIASED =' ,F12 980 FORMAT (' MANTEL-HAENSZEL-; 981 FORMAT(' MANTEL-HAENSZEL-] M7X,2(5X,F12.6)) M7X,2(5X,F12.6)) ITO' ,17X,5X,F12.6) ' ,5X,F12.6,17X,F12.6) ' ,5X,F12.6,17X,F12.6) ITO' ,5X,F12.6) ' ,F12.6, ,2(5X,F12.6)) UPPER UPPER


139 988 990 C 990 991 992 993 994 995 996 997 +ge FORMATC/,/,' MODIFED P-VALUE ' , 3(5X ,F12 . 6) ) FORMAT ('MODIFED EXACT Cl USING BRENTl ' , 2(5X,F12 . 6) ) FORMATC/,/,' MID P CORRECTED P EXACT ' ,3(5X,F12 .6)) FORMATC' MAX-P EXACT ' , 3C5X,F12 . 6) ) FORMATC' MID-P EXACT ' , 3C5X ,F12 . 6) ) FORMATC' ONE SIDED P EXACT ',5X,F12.6) FORMATC' PROB OF OBSERVED TABLES ', 2X ,F12 . 6) F0RMATC2Cl0X,F12.6)) F0RMATC10X,F12.6) FORMAT ClOX, 'ENTER FILENAME Ck for keyboard entry c to chan alpha-level) ' ,4C/)) 998 FORMAT C/2X, 'ELAPSED TIME CSECS) = ',F8.2,' + ',F8.2,' = ' + F8.2) 999 F0RMATC3C/),2X,I2, + '•/. TWO-SIDED EXACT CONFIDENCE', + ' LIMITS FDR THE COMMON ODDS RATIO' + 2C/)) 1000 CONTINUE IF CNCODE .EQ. 1) THEN WRITE C*, 350) FORMATC/,/, 'MODIFIED EXACT CONFIDENCE INTERVAL CY=1,N=0) J) READC*,*)NC0DE1 IF CNCDDEl .NE. 1) THEN PRINT*, 'END' GO TO 1002 ENDIF END IF JCI=1 C ONE-SIDED MODIFIED P CONFIDENCE INTERVAL. 2001 CONTINUE C2001 WRITEC*,2005) 2005 FORMATC/,/, 'MODIFIED EXACT CONFIDENCE LIMITS FOR ',/, 1 'THE COMMON ODDS RATIO USING ITERA CY=1,N=0) ?',/,/) c READC*,*) JSCIl JSCI1=1 C PRINT*, 'JSCI1=1' IF CJSCIl .EQ. 0) GO TO 3001


140 C INITIAL VALUE FOR ITERAl IS THE LIMITS FROM ORDINARY EXACT Cl ALL=DEXP(R2(2)) AUL=DEXP(R1(2))*1.1D0 C AUL=DEXP(R1(2)) C print*, 'INITIAL VALUE USING ORDINARY EXACT Cl = ',all,aul C COMPUTE LOWER LIMIT ist=l JCI0=2 IF (J .EQ. SCD) ALL=DEXP(R1(2)) IF (J .EQ. 1) THEN ELLl=O.DO P_L01=1.D0 C PRINT*, 'LOWER LIMIT =' ,ELL1 C PRINT* GO TO 2006 END IF C PRINT*, 'INITIAL VALUE FOR THE LOWER LIMIT = ' ,ALL CALL ITERA 1 (ALPHA , ALL , ELL 1 , ist , JCI 0 , PALPHA) P_L01=PALPHA C print*, 'lower limit elll from ITERAl =',elll C PRINT* C COMPUTE UPPER LIMIT c START=1.D0/AUL 2006 START=AUL ist=2 JCI0=1 IF (J .EQ. SCD) THEN EUL1=99999. 999999 P_UP1=1.D0 C PRINT* ,' UPPER LIMIT =',EUL1 GO TO 2007 ENDIF C PRINT*, 'INITIAL VALUE FOR THE UPPER LIMIT = ' ,AUL C print* ,' start=' , start CALL ITERAl (ALPHA , START , EULl , ist , JCIO , PALPHA) P_UP1=PALPHA C print*, 'upper limit eull from ITERA1= ' , eull 2007 CONTINUE


141 IF (NCODE .EQ. 2) GO TO 3001 PRINT* PRINT*, ' WRITE(*,2010) ELL1,EUL1 PRINT*, 'P-VALUE FOR THE LIMIT (low, up) => ,P L01,P_UP1 PRINT*, ' 1___! , PRINT* 2010 FORMAT ('ONE-SIDED MODIFIED EXACT Cl ' , 2 (5X ,F12 . 6) ) C TWO-SIDED ORDINARY P CONFIDENCE INTERVAL. C C The P-value is the sum of the either tail. 3001 CONTINUE C3001 WRITE(*,3600) 3600 FORMAT(/,/, 'TWO-SIDED ORDINARY EXACT CONFIDENCE LIMITS ', 1 'FOR THE COMMON ODDS RATIO (Y=1,N=0) ?',/) c READ(*,*) JTSCI I00T0=1 JTSCI=1 C PRINT*, 'JTSCI=1' IF (JTSCI .EQ. 0) GO TO 1001 C STARTING VALUES ARE LIMITS FOR ORDINARY EXACT Cl. ALL=DEXP(R2(2)) AUL=DEXP(R1(2))*1.1D0 C AUL=DEXP(R1(2)) C print*, 'INITIAL VALUE USING ORDINARY EXACT Cl = ',all,aul C COMPUTE LOWER LIMIT ist=l IF (J .EQ. 1) THEN ELL3=0.D0 P_L03=1 .DO C PRINT*, 'LOWER LIMIT =',ELL3 C PRINT* GO TO 3006 ENDIF


142 IF (J .EQ. SCD) ALL=DEXP(R1(2)) C PRINT*, 'INITIAL VALUE FOR THE LOWER LIMIT = ' ,ALL CALL ITERA (ALPHA , ALL , ELL3 , ist , PALPHA , lOOTO) P_L03=PALPHA C print*, 'lower limit ell3 from ITERA =',ell3 C PRINT* C COMPUTE UPPER LIMIT c START=l.DO/AUL 3006 START=AUL ist=2 IF (J .EQ. SCD) THEN EUL3=99999. 999999 P_UP3=1 .DO C PRINT*, 'UPPER LIMIT =' ,EUL3 GO TO 3007 END IF C print* , ' start= ' , start C PRINT*, 'INITIAL VALUE FOR THE UPPER LIMIT = ',AUL CALL ITERA (ALPHA , START , EUL3 , ist , PALPHA , lOOTO) P_UP3=PALPHA C print*, 'upper limit eul3 from ITERA =',eul3 3007 CONTINUE IF (NCODE .EQ. 2) GO TO 1001 PRINT* PRINT*, ' WRITE (*,3989) ELL3,EUL3 PRINT*, 'P-VALUE FOR THE LIMIT (low, up) =\P_L03,P_UP3 PRINT*, ' 1 PRINT* 3989 FORMAT( 'TWO-SIDED ORDINARY EXACT Cl ' , 2 (5X ,F12 . 6) ) C TWO-SIDED MODIFIED P CONFIDENCE INTERVAL. C C The P-value is the sum of the either tail.


143 1001 CONTINUE ClOOl WRITE(*,600) 600 F0RMAT(/,/, 'TWO-SIDED MODIFIED EXACT CONFIDENCE LIMITS \ 1 'FOR THE COMMON ODDS RATIO (Y=1,N=0) ?',/) c READ(*,*) JSCI I00T0=2 JSCI=1 C PRINT*, MSCI=1' IF (JSCI .EQ. 0) GO TO 1002 C STARTING VALUES ARE LIMITS FOR ORDINARY EXACT Cl . ALL=DEXP(R2(2)) AUL=DEXP(R1(2))*1.1D0 C AUL=DEXP(R1(2)) C print*, INITIAL VALUE USING ORDINARY EXACT Cl = \all,aul C COMPUTE LOWER LIMIT ist=l IF (J .EQ. 1) THEN ELL2=0.D0 P_L02=1.D0 C PRINT*, 'LOWER LIMIT =',ELL2 C PRINT* GO TO 1006 ENDIF IF (J .EQ. SCD) ALL=DEXP(R1(2)) C PRINT*, 'INITIAL VALUE FOR THE LOWER LIMIT = ' ,ALL CALL ITERA (ALPHA , ALL , ELL2 , i st , PALPHA , lOOTO) P_L02=PALPHA C print*, 'lower limit ell2 from ITERA =',ell2 C PRINT* C COMPUTE UPPER LIMIT c START=1 .DO/AUL 1006 START=AUL ist=2 IF (J .EQ. SCD) THEN EUL2=99999. 999999


144 P_UP2=1 .DO C PRINT*, 'UPPER LIMIT =' ,EUL2 GO TO 1007 ENDIF C print*, ' start=' , start C PRINT*, 'INITIAL VALUE FOR THE UPPER LIMIT = ' ,AUL CALL ITERA (ALPHA , START , EUL2 , ist , PALPHA , lOOTO) P_UP2=PALPHA C print*, 'eul = ',eul c EUL=1.D0/EUL C print*, 'upper limit eul2 from ITERA =',eul2 1007 CONTINUE IF (NCODE .EQ. 2) GO TO 1008 PRINT* PRINT* , ' WRITE (*,989) ELL2,EUL2 PRINT*, 'P-VALUE FOR THE LIMIT (low, up) =',P_L02,P UP2 PRINT*, ' 1_I , PRINT* PRINT* , ' END ' 989 F0RMAT( 'TWO-SIDED MODIFIED EXACT Cl ' , 2 (5X ,F12 . 6) ) IF (NCODE .EQ. 1) GO TO 1002 C** + *****:|=*=K*:(c*»c=)c=|==|c=C„c^c*=(c**=)cXt»c:)c:tc*:t:*******=(c*=(c**=|c=|o|c*****,^**„t**:(t*** + * 1008 CONTINUE IF (J .EQ. 1 .OR. J .EQ. SCD) THEN IF (J .EQ. 1) THEN 00CI(IAITN,1)=0.D0 00CI(IAITN,2)=DEXP(R1(2)) 0MCI(IAITN,1)=ELL1 0MCI(IAITN,2)=EUL1 T0CI(IAITN,1)=ELL3 T0CI(IAITN,2)=EUL3 TMCI(IAITN,1)=ELL2 TMCI(IAITN,2)=EUL2 ENDIF IF (J .EQ. SCD) THEN 00CI(IAITN,1)=DEXP(R1(2)) OOCIdAITN, 2) =99999. 99999 0MCI(IAITN,1)=ELL1 0MCI(IAITN,2)=EUL1


145 T0CI(IAITN,1)=ELL3 T0CI(IAITN,2)=EUL3 TMCI(IAITN,1)=ELL2 TMCI(IAITN,2)=EUL2 END IF ELSE 00CI(IAITN,1)=DEXP(R2(2)) 00CI(IAITN,2)=DEXP(R1(2)) 0MCI(IAITN,1)=ELL1 0MCI(IAITN,2)=EUL1 T0CI(IAITN,1)=ELL3 T0CI(IAITN,2)=EUL3 TMCI(IAITN,1)=ELL2 TMCI(IAITN,2)=EUL2 END IF WRITE(45,5100)IAITN,00CI(IAITN,1),00CI(IAITN,2) WRITE(46 , 5100) lAITN, OMCI (lAITN, 1) , OMCI (lAITN, 2) WRITE(47,5100)IAITN,T0CI(IAITN,1) ,T0CI(IAITN,2) WRITE(48,5100)IAITN,TMCI(IAITN,1 ) , TMCI (IAITn' 2) 5000 CONTINUE c c COVERAGE PROBABILITY C C DALL=-5.51D0 IST=1 DO 5200 IAIN=1,1100 DALL=DALL+0.01D0 ALL=DEXP(DALL) CALL ITERA 1 0 ( ALPHA , ALL , ELL2 , I ST , PALPHA , HYPD 1 0 ) POOCI=O.DO POMCI=O.DO PTOCI=O.DO PTMCI=O.DO DO 5210 IAITN=1,I0T0T IF (ALL .GE. 00CI(IAITN,1) .AND. ALL .LE. OOCI (lAITN, 2) ) THEN


      POOCI=POOCI+HYPD10(IAITN,1)
      END IF
      IF (ALL .GE. OMCI(IAITN,1) .AND. ALL .LE. OMCI(IAITN,2)) THEN
      POMCI=POMCI+HYPD10(IAITN,1)
      END IF
      IF (ALL .GE. TOCI(IAITN,1) .AND. ALL .LE. TOCI(IAITN,2)) THEN
      PTOCI=PTOCI+HYPD10(IAITN,1)
      ENDIF
      IF (ALL .GE. TMCI(IAITN,1) .AND. ALL .LE. TMCI(IAITN,2)) THEN
      PTMCI=PTMCI+HYPD10(IAITN,1)
      ENDIF
 5210 CONTINUE
      COVER(IAIN,1)=DALL
      COVER(IAIN,2)=POOCI
      COVER(IAIN,3)=POMCI
      COVER(IAIN,4)=PTOCI
      COVER(IAIN,5)=PTMCI
 5200 CONTINUE
      OPEN(UNIT=50,FILE='COVER.P')
      DO 5220 IAIN=1,1100
      WRITE(50,5300)COVER(IAIN,1),COVER(IAIN,2),COVER(IAIN,3),
     1 COVER(IAIN,4),COVER(IAIN,5)
 5220 CONTINUE
      PRINT*,'END'
 5100 FORMAT(I5,2F12.6)
 5300 FORMAT(5F12.4)
 1002 END
      subroutine error(ierr,mxs,mxss,mxz,mxd)
      iot = 6
      if (ierr .eq. 1) then
      write(iot,10)mxs
   10 format(/10x,'Error : Maximum no. of strata = ',i4)
      return
      endif
      if (ierr .eq. 2) then
      write(iot,20)mxz
   20 format(/10x,'Insufficient memory : Increase size of',/,
     + 'array HYP to be more than ',i7)


      return
      endif
      if (ierr .eq. 3) then
      write(iot,30)mxd
   30 format(/10x,'Insufficient memory : Increase size of',/,
     + 'array DS to be more than ',i7)
      return
      endif
      if (ierr .eq. 4) then
      write(iot,40)mxss
   40 format(/10x,'Error : Maximum stratum size = ',i9)
      return
      endif
      return
      end
C**********
      SUBROUTINE CNV2X2(IK,MXS,MXZ,MXD,LGE,ITAB,HYP,DS,II,K1,K2,IERR,
     1 pobsh,JCI,ODR)
C
C     CONVOLVES HYPERGEOMETRIC DISTRIBUTIONS GENERATED
C     BY SEVERAL 2X2 TABLES
C
      INTEGER ITAB(MXS,4)
      DOUBLE PRECISION HYP(0:MXZ),DS(0:1,0:MXD),SUMLG
      DOUBLE PRECISION DD1,DD2,ONE,ZERO,HYMAX,DSMX,LGE,EL
      DOUBLE PRECISION ZLOG,ZEXP,X
      DOUBLE PRECISION hypsum(1000),hypobs(1000),pobsh,hypd(1000,0:2000)
      DOUBLE PRECISION DENO1,POBSH1,PEXIMP,PEX,ODR,PSI
      integer infhyl(1000),infhyu(1000)
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX
      DATA ONE, ZERO /1.0D+00, 0.0D+00/
      ZLOG(X) = DLOG(X)
      ZEXP(X) = DEXP(X)
C
C     CHECK INPUT PARAMETERS
C
      IF (IK .GT. MXS) IERR=1
      K1 = 0
      DO 1 I=1,IK
      IMM = ITAB(I,1) + ITAB(I,2)
      INN = ITAB(I,3) + ITAB(I,4)


ITT = ITAB(I,1) + ITAB(I,3) 148 C C LOWER AND UPPER LIMITS FOR STRATUM DISTRIBUTION C IF (ITT .GT. INN) THEN ILl = ITT-INN ELSE ILl = 0 END IF IF (ITT .LT. IMM) THEN IL2 = ITT ELSE IL2 = IMM END IF ITT = IL2 ILl IF (ITT .GT. MXZ) lERR = 2 K1 = K1 + ILl 1 CONTINUE IF (lERR .GT. 0) RETURN C DDl = 10*0NE EL = LGE*ZL0G(DD1) C C INITIALISE AND SET LOG-SCALE INDICATOR C II = 0 JJ = 1 IR = 0 DS(0,0) = ONE/DEXP(EL) DSMX = ZERO EL ILS = 0 C C FOR STRATA=1, . . . ,IK, COMPUTE HYPERGEOMETRIC DISTRIBUTION C AND PERFORM CONVOLUTION IN A RECURSIVE FASHION C DO 13 1=1, IK IMM = ITAB(I,1) + ITAB(I,2) INN = ITAB(I,3) + ITAB(I,4) ITT = ITAB(I,1) + ITAB(I,3) C C LOWER AND UPPER LIMITS FOR CONVOLUTION C IF (ITT .GT. INN) THEN ILl = ITT-INN ELSE


ILl = 0 END IF IF (ITT .LT. IMM) THEN IL2 = ITT ELSE IL2 = IMM END IF IL2 = IL2 ILl K2 = IR + IL2 IF (K2 .GT. MXD) THEN lERR = 3 RETURN END IF COMPUTE STRATUM DISTRIBUTION ON LOG-SCALE HYP(O) = ZERO DO 2 J=1,IL2 DDl = DBLE(FL0AT(IMM-J-IL1+1))*DBLE(FL0AT(ITT-J-IL1+D) DD2 = DBLE(FL0AT(J+IL1))*DBLE(FL0AT(INN-ITT+J+IL1)) HYP(J) = HYP(J-l) + ZL0G(DD1/DD2) print*, Mata ,j ,HYP(J) ,zexp(HYP(J)) , j+ILl 2 CONTINUE IF (ILS .EQ. 1) GOTO 9 GET MAXIMUM HYPERGEOMETRIC COEFFICIENT ON LOG-SCALE AND CHECK FOR POTENTIAL OVERFLOW IN STRATUM DISTRIBUTION AM = (1.0 + FL0AT(ITT))/(1.0 + FL0AT(INN+1)/FL0AT(IMM+1)) lAM = IFIX(AM) ILl IF (HYP(O) .GT. HYP(IL2)) THEN DO 3 J=0,IL2 HYP(J) = HYP(J) HYP(IL2) print*. 'j ,hyp(j) ,zexp(HYP(J)) ' , j >hyp(j) ,zexp(HYP(J)) 3 CONTINUE ENDIF HYMAX = HYP (I AM) IF (HYMAX .GT. EL) THEN ILS = 1 print*, 'ILS=1 in Cl' GOTO 7 ENDIF CHECK FOR POTENTIAL OVERFLOW IN THE ITH CONVOLUTION


150 IF (IL2 .LT. IR) THEN IXX = IL2 + 1 ELSE IXX = IR + 1 END IF C IXX = (IL2 + 1)*(IR + 1) DDl = DBLE(FLOATdXX)) DDl = ZLOG(DDl) DSMX = DSMX + HYMAX + DDl C WRITE(*,*)I, DSMX, HYMAX, DDl IF (DSMX .GE. EL ONE) THEN ILS = 1 print*, 'ILS=1 in Cl' GOTO 7 END IF C C CONVERT STRATUM DISTRIBUTION TO NATURAL SCALE C hypsum(i)=0 .do C PRINT*, 'ODR,JCI' ,ODR,JCI DO 4 J=0,IL2 IF (JCI .Eq. 1) THEN HYP(J) = ZEXP(HYP(J))*0DR**(J+IL1) GO TO 9999 END IF HYP(J) = ZEXP(HYP(J)) C ihyp(i,j)=j 9999 liypd(i, j)=hyp(j) c hypd(i,j) is hypergeometric prob dist for each stratum. hyps urn (i)=hyp sum (i)+hyp(j) C print*, 'HYP(J)',J, HYP( J) ,hypsum(i) 4 CONTINUE infhyl(i)=ill infhyu(i)=il2 c infhyu(i) is the no. of possible tables for the fixed 2x2 tables. ikim=itab(i , 1) -ILl hypobs ( i ) =hyp ( ikim) C HYPOBS (I) IS THE NUMERATOR OF PROB FOR EACH STRATUM. C PRINT* ,' STRATUM PROB ', I ,hyp (ikim) ,hypsum(i) , hypobs (i)/HYPSUM(I) c C PERFORM CONVOLUTION ON NATURAL SCALE


151 C DSMX = ZERO EL DO 6 J=0,K2 IF (J .GT. IL2) THEN IHl = J-IL2 ELSE IHl = 0 END IF IF (J .LT. IR) THEN IH2 = J ELSE IH2 = IR ENDIF DS(JJ,J) = DS(II,IH1)*HYP(J-IH1) DO 5 JR=IH1+1,IH2 DDl = DS(II, JR)*HYP(J-JR) DS(JJ,J) = DS(JJ,J) + DDl 5 CONTINUE DDl = ZLOG(DS(JJ,J)) IF (DDl .GT. DSMX) DSMX = DDl 6 CONTINUE GOTO 12 7 CONTINUE C C CONVERT (I-l)TH CONVOLVED DISTRIBUTION TO LOG SCALE C DO 8 KK=0,IR DSdi.KK) = EL + ZLOG(DS(II,KK)) 8 CONTINUE C C PERFORM CONVOLUTION ON LOGARITHMIC SCALE C 9 CONTINUE C if (ils .eq. 1) then hypsum(i)=0 .do C PRINT* d ODR, JCI' ,ODR,JCI DO 4444 J=0,IL2 IF (JCI .EQ. 1) THEN HYP(J) = ZEXP(HYP(J))*ODR**(J+ILl) GO TO 9998 ENDIF HYP(J) = ZEXP(HYP(J))


152 C ihyp(i,j)=j 9998 hypd(i, j)=hyp(j) c hypd(i,j) is hypergeometric prob dist for each stratum. hypsum ( i ) =hypsum ( i ) +hyp ( j ) C print*, 'HYP(J) \J, HYP ( J) ,hypsum(i) 4444 CONTINUE infhyl(i)=ill infhyu(i)=il2 c infhyu(i) is the no. of possible tables for the fixed 2x2 tables. ikim=itab(i , 1)-IL1 hypobs ( i ) =hyp ( ikim) C HYPOBS (I) IS THE NUMERATOR OF PROB FOR EACH STRATUM, endif DO 11 J=0,K2 IF (J .GT. IL2) THEN IHl = J-IL2 ELSE IHl = 0 ENDIF IF (J .LT. IR) THEN IH2 = J ELSE IH2 = IR ENDIF DS(JJ,J) = DS(II,IH1) + HYP(J-IHl) DO 10 JR=IH1+1,IH2 DDl = DS(II,JR) + HYP(J-JR) DD2 = DS(JJ,J) DS(JJ,J) = SUMLG(DD1,DD2) 10 CONTINUE 11 CONTINUE C C RESET FOR NEXT STEP C 12 II = JJ JJ = 1 II IR = K2 C C DELETE THE NEXT TWO STATEMENTS FROM C THE PUBLISHED ALGORITHM C C write(*,95)i,ils


153 95 format (/lOx, 'stratum no. i3 ,4x scale =', i2) C C 13 CONTINUE C ADDED BY DONGUK KIM POBSH=l .DO DO 20 1=1, IK 20 POBSH=POBSH*HYPOBS(l) P0BSH1=P0BSH C POBSH IS OBSERVED VALUE AND PROB IS POBSH/DENOl DEN01=1 .DO DO 21 1=1, IK 21 DEN01=DEN01*HYPSUM(I) P0BSH=P0BSH/DEN01 C PRINT*, 'PROB OF OBSERVED TABLE= ', POBSH C PRINT*, 'OBSERVED VALUE FOR PROB= ' , POBSHl C C NORMALISE FINAL DISTRIBUTION C IF (ILS .NE. 1) THEN DO 14 1=0, K2 DS(II,I) = ZLOG(DS(II,D) 14 CONTINUE ENDIF DSMX = DS(II,0) DO 15 1=1, K2 DDl = DS(II,I) DD2 = SUMLG (DSMX, DDl) DSMX = DD2 15 CONTINUE DO 16 1=0, K2 DS(II,I) = DS(II,I) DSMX 16 CONTINUE K2 = K1 + K2 c Added by DONGUK KIM C C CALCULATES OBSERVED POSITION IN SAMPLE SPACE C J POSITION C SCD SIZE COND. SAMPLE SPACE


154 ISUMA=0 DO 180 1=1, IK ISUMA = ISUMA + ITAB(I,1) 180 CONTINUE J = ISUMA K1 + 1 SCD = K2 K1 + 1 IF (JCI .EQ. 1) GO TO 999 JCI0=0 C IF (JCI .EQ. 1) GO TO 999 CALL IMPROV ( ik , it ab , hypd , inf hyl , inf hyu , POBSHl , PEXIMP , PEX , JCI 0 , PSI ) 999 RETURN END C ADDED BY DONGUK KIM DEC.l, 1992 C ENUMERATE ALL POSSIBLE TABLES WITH GIVEN 2x2 MARGINS, AND C COMPUTE IMPROVED EXACT AND EXACT UPPER AND LOWER TAIL PROBABILITY. SUBROUTINE IMPROV ( ik , it ab , hypd , inf hyl , inf hyu , POBSHl , PEXIMP 1 PEX, JCIO,PSI) PARAMETER (MAXT=270000) DOUBLE PRECISION HYPD (1000 , 0 : 2000) ,HYPSUM(1000) ,HYPD1 (270000 , 1) DOUBLE PRECISION DENO ,TS , POBSHl , PTOBSl ,PT0BS2 , PEXIMP , PEX DOUBLE PRECISION PEXIML,PEXIMU,PEXL,PEXU,HYPD2(270000,20) DOUBLE PRECISION PEXPl ,PEXP2 ,X,PDN1 ,PDN2 DOUBLE PRECISION PVDIS (270000 , 5) , PDT(500 , 3) ,PSI DOUBLE PRECISION CHI (270000) ,CHI1 (270000) ,CMH,CHIOBS ,G2 (270000) DOUBLE PRECISION G , GOBS , CUP ,FIT( 1000 , 2 , 2) C PDT(500,3) :Pr(P-value<=x) for two P-values. INTEGER ITAB(1000,4) ,INFHYL(1000) , INFHYU(IOOO) INTEGER INUM(270000,20) ,INUM1(270000,1) ,IOTOT INTEGER NIK(2,1000) ,NJK(2,1000) ,NT0T(1000) , MATRIX (1000, 2, 2) LOGICAL ISEA


155 COMMON /DKIM/ DENO , ITOT, ISUML, INUM,HYPD2 , INUMl ,HYPD1 c COMMON /DKIMl/ POBSHl , ISUMTS , IK COMMON /DKIMl/ ISUMTS COMMON /CH/ NIK,NJK,NTOT COMMON /ART/ lOTOT COMMON /CHI/ CHI,CHIOBS C HYPDl HAS PROB HAVING T .GE. T_OBS. C INUMl HAS VALUE IN EACH STRATUM HAVING T .GE. T_OBS . C THE MAXIMUM NO OF TABLES FOR T .GE. T_OBS IS ALLOWED TO BE 5500. C IF IT IS GREATER THAN 5500, MAXT IN HYPDl (MAXT , 1000) AND C INUMl (MAXT, 1000) SHOULD BE INCREASED. DEN0=1 .DO ISUML=0 ISUMU=0 DO 100 1=1, IK HYPSUM(I)=O.DO IJ=INFHYU(I) C print*, a,ILl,IL2,HYPSUM(I) M,INFHYL(I) ,INFHYU(I) ,HYPSUM(I) C IJ IS IL2 FOR EACH STRATUM DO 110 J=0,IJ 110 HYPSUM(I)=HYPSUM(I)+HYPD(I,J) I SUML=I SUML+INFHYL ( I ) ISUMU=ISUMU+INFHYU(I) DENO=DENO*HYPSUM ( I ) C DENO IS DENOMINATOR OF THE HYPERGEO. PROB. DIST. 100 CONTINUE C PRINT*, ^DENO=' , DENO ISUMA=0 DO 120 1=1, IK ISUMA=ISUMA+ITAB (I , 1) 120 CONTINUE I SUMTS = I SUMA1 SUML C PRINT*, 'I SUMA, I SUMTS' , ISUMA , ISUMTS C ISUMTS IS OBSERVED TEST STATISTICS WHICH WILL BE USED. IT0T=1 DO 130 1=1, IK


130 156 IT0T= ITOT* ( INFHYU ( I ) + 1 ) IOTOT=ITOT C PRINT*, 'NO. OF POSSIBLE TABLES = \ITOT C ITOT IS TOTAL NUMBER OF ENUMERATION FOR THE TABLES. IF (ISEA) GO TO 315 C C SET ISEA FOR THE SUBSEQUENT CALLS C ISEA=.TRUE. C TO MAKE INUM IC0UNT=0 NUM=1 DO 210 K1=1,IK-1 DO 220 K2=K1+1,IK 220 NUM=NUM* (INFHYU (K2)+l) C NUM IS NUMBER OF REPLICATES. K=K1 IN=INFHYU(K) 99 DO 230 K3=1,NUM IC0UNT=IC0UNT+1 INUM(ICOUNT,K)=IN 230 CONTINUE IF (IN .GT. 0) THEN IN=IN-1 GO TO 99 END IF IF (ICOUNT .LT. ITOT) THEN IN=INFHYU(K) GO TO 99 ELSE IC0UNT=0 NUM=1 ENDIF 210 CONTINUE C THE LAST STRATUM (i.e., LAST COLUMN IN ARRAY)


157 IC0UNT=0 K=IK IN=INFHYU(K) 250 IC0UNT=IC0UNT+1 INUM(ICDUNT,K)=IN IF (IN .GT. 0) THEN IN=IN-1 GO TO 250 END IF IF (ICOUNT .LT. ITOT) THEN IN=INFHYU(K) GO TO 250 END IF C PRINT* , ' ICOUNT= ' , ICOUNT C PRINT* ITOT= ^ , ITOT DO 300 1=1, ITOT ITS=0 DO 310 J=1,IK 310 ITS=ITS+INUM(I, J) INUM(I,IK+1)=ITS 300 CONTINUE 315 CONTINUE c PRINT*, 'DISPLAY ALL POSSIBLE TABLES (Y= 1 ,N= 0 ) ?' C READ(*,*)IDIS IDIS=0 IF (IDIS .EQ.l) THEN PRINT*, 'INPUT NO. OF INCREMENT :' C READ(*,*)INCR INCR=1 PRINT* , ' ENUMERATION ' DO 320 1=1, ITOT, INCR PRINT* , I , ( INUM ( I , J ) +INFHYL (J),J=1,IK), INUM ( I , IK+ 1 ) +I SUML 320 CONTINUE END IF C PRINT* C COMPUTATION OF PROB OF ALL RANDOM TABLES DO 350 K=1,IK


158 DO 355 I=l,ITOT IA=INUM(I,K) HYPD2(I,K)=HYPD(K,IA) 355 CONTINUE 350 CONTINUE DO 360 I=1,IT0T TS=1.D0 DO 365 J=1,IK 365 TS=TS*HYPD2(I, J) HYPD2(I,IK+1)=TS 360 CONTINUE C PRINT PROB OF ALL RANDOM TABLE TS=O.DO DO 367 I=1,IT0T TS=TS+HYPD2(I,IK+1) 367 CONTINUE C PRINT*, 'SUM OF VALUE, DENO, PROB= ' ,TS , DENO ,TS/DENO C PRINT* C PRINT*, 'PROB OF ALL RANDOM TABLES (Y=1,N=0) ?' C READ(*,*)IDSA IDSA=0 IF (IDSA .EQ. 1) THEN PRINT*, 'ENUMERATION OF PROB OF ALL RANDOM TABLES' DO 370 I=1,IT0T,INCR PRINT*,I, (sngI(HYPD2(I, J)) ,J=1,IK) ,sngl(HYPD 2 (I,IK+l)/DEN 0 ) 370 CONTINUE ENDIF C print* c C GENERATE ALL POSSIBLE RANDOM TABLES C C WRITE(*, 70010) C70010 FORMAT(/, 'PRINT X~2 AND G~2 FDR ALL RANDOM TABLES ? (Y=1,N=0)') C READ(*,*)IX2G2 IX2G2=0 C PRINT*, 'IX2G2=0'


159 NR0W=2 NC0L=2 NSTM=IK CUP=O.DO C IPF IS CALLED JUST ONE TIME WITHIN FIXED OR. IF (JCIO .NE. 0) THEN DO 375 K=1,IK MATRIX (K,1,1)=ITAB(K,1) MATRIX (K , 1 , 2) =ITAB (K , 2) MATRIX(K,2,1)=ITAB(K,3) MATRIX (K , 2 , 2) =ITAB (K , 4) 375 CONTINUE CALL IPF(PSI, IK, MATRIX, FIT) END IF DO 384 I=1,IT0T DO 386 K=1,IK MATRIX (K , 1 , 1) =INUM ( I , K) +INFHYL (K) MATRIX (K , 1 , 2) =NIK ( 1 , K) -MATRIX (K ,1,1) MATRIX (K , 2 , 1 ) =N JK ( 1 , K) -MATRIX (K ,1,1) MATRIX (K , 2 , 2) =NTOT (K) (MATRIX (K ,1,1) +MATRIX (K , 1 , 2) +MATRIX (K , 2 , 1 ) ) IF (IX2G2 .EQ. 1) THEN WRITE (*, 70000) I , K , MATRIX (K , 1 , 1 ) , MATRIX (K , 1 , 2) , MATRIX (K , 2 , 1) , 1 MATRIX (K, 2, 2) C WRITE(*,70001)K,NIK(1,K) ,NIK(2,K) ,NJK(1,K) ,NJK(2,K) ,NTOT(K) 70000 F0RMAT(2I5,' : ',4110) 70001 FORMATC 'TOTAL' ,6110) END IF 386 CONTINUE C IF (JCIO .NE. 0) CALL IPF(PSI , IK , MATRIX , FIT) CALL CMHNN 1 (NROW , NCOL , NSTM , NIK , N JK , NTOT , MATRIX , CMH , G , 1 JCIO, FIT) CHI(I)=CMH G2(I)=G CUP=CUP+HYPD2 (I , IK+1) /DENO IF (IX2G2 .EQ. 1) THEN WRITE(*, 70002)1, HYPD2(I,IK+1)/DEN0,CHI(I) ,G2(I) ,CUP 70002 FORMAT('NO., Pr(T), X~2, G~2 = ' , 15 ,4F14 . 7 , /)


160 END IF 384 CONTINUE C COMPUTE THE OBSERVED CHI-SQUARED STATISTIC : CHIOBS IF (IX2G2 .EQ. 1) THEN PRINT* PRINT* OBSERVED DATA' END IF DO 390 K=1,IK MATRIX (K,1,1)=ITAB(K,1) MATRIX (K , 1 , 2) =ITAB (K , 2) MATRIX (K , 2 , 1) =ITAB (K ,3) MATRIX (K , 2 , 2) =ITAB (K , 4) IF (IX2G2 .EQ. 1) THEN WRITE (* , 70004) K , ITAB (K , 1 ) , ITAB (K , 2) , ITAB (K , 3) , 1 ITAB (K, 4) 70004 F0RMAT(5X, 15, ' ; ',4110) C WRITE(*,70001)K,NIK(1,K) ,NIK(2,K) ,NJK(l,K) ,NJK(2,K) ,NTOT(K) END IF 390 CONTINUE CALL CMHNN 1 (NROW , NCOL , NSTM , NIK , N JK , NTOT , MATRIX , CMH , G , 1 JCIO,FIT) CHIOBS=CMH GOBS=G IF (IX2G2 .EQ.l) THEN WRITE (*, 70003) POBSHl/DENO, CHIOBS, GOBS 70003 FORMAT (' OBSERVED Pr(t_o), X~2, G"2 = ' , 3F14 . 7 , / , /) ENDIF 395 IF (JCIO .EQ. 3) GO TO 1000 C UPPER TAIL C TO MAKE INUMl C KIM_9.F IF (JCIO .EQ. 1) GO TO 666 IC0UNT1=0 DO 400 I=1,IT0T


161 IF (INUM(I,IK+1) .GE. ISUMTS) THEN IC0UNT1=IC0UNT1+1 INUMl (ICOUNTl , 1) =INUM(I , IK+1) HYPDl (ICOUNTl , 1) =HYPD2(I , IK+1) CHI1(IC0UNT1)=CHI(I) IF (ICOUNTl .GE. MAXT) THEN PRINT* INCREASE ARRAY INUMl, HYPDl IN SUBROUTINE IMPRIV' print*, 'icount,icountl' , icount , icountl go to 1000 END IF END IF 400 CONTINUE IF (JCIO .EQ. 0) THEN C PRINT*, 'NO OF TABLES FOR T .GE. T.OBS =MC0UNT1 END IF C PRINT* C PRINT*, 'DISPLAY ALL TABLES FOR T .GE. T_OBS (Y=1,N=0) ?' C READ(*,*)IDIS1 IDIS1=0 IF (IDISl .EQ. 1) THEN c PRINT*, 'INPUT NO. OF INCREMENT :' c READ(*,*)INCR1 INCR1=1 PRINT*, 'ENUMERATION FOR THOSE TABLES HAVING T .GE. T.OBS' DO 420 I=1,IC0UNT1,INCR1 PRINT* , I , INUMl (I , 1) +ISUML 420 CONTINUE ENDIF C PRINT* C COMPUTATION OF PROB OF OBSERVING OBSERVED AND RANDOM TABLES C PRINT*, 'PROB FOR THOSE TABLES HAVING T .GE. T OBS (Y=1,N=0) ' C READ(*,*)IDS2 IDS2=0


IF (IDS2 .EQ. 1) THEN PRINT* ENUMERATION OF PROB FOR THOSE TABLES HAVING T DO 550 1=1, ICOUNTl, INCH PRINT* , I , sngl (HYPD 1 ( I , 1 ) /DENO ) 550 CONTINUE END IF C print* PT0BS1=0.D0 C PRINT*, 'DISPLAY UPPER TAIL IMPROV. PROB (Y=1,N=0) ?' C READ(*,*)IDS3 IDS3=0 IF (IDS3 .EQ. 1) WRITE(*,888) c WRITE (*,888) DO 560 I=1,IC0UNT1 IF (INUM1(I,1) .GT. ISUMTS .OR. INUM1(I,1) .EQ. ISUMTS 1 .AND. CHIl(I) .GE. CHIOBS) THEN C 1 .AND. HYPD1(I,1) .LE. POBSHl) THEN PT0BS1=PT0BS1+HYPD1(I,1) if (IDS3 .NE. 1) GO TO 555 WRITE (* , 900) INUMl (I , 1) +ISUML , ISUMTS+ISUML , 2 HYPDl (I , 1) /DENO , POBSHl/DENO , PTOBS 1/DENO 900 F0RMAT(2(1X,I7) ,3(1X,F12.6)) 555 ENDIF 560 CONTINUE c ENDIF C print* PEXIMU=PT0BS1/DEN0 C PEXIMPU IS IMPROVED UPPER TAIL EXACT PROB. PT0BS2=0.D0 C PRINT*, 'DISPLAY UPPER TAIL PROB (Y=1,N=0) ?' C READ(*,*)IDS4 IDS4=0 IF (IDS4 .EQ. 1) WRITE(*,889) C WRITE(*,889) DO 570 I=1,IC0UNT1 IF (INUMl (1,1) .GE. ISUMTS) THEN PT0BS2=PT0BS2+HYPD1 (I , 1) IF (IDS4 .NE. 1) GO TO 565 WRITE (* , 900) INUMl (I , 1) +ISUML , ISUMTS+ISUML ,


163 3 HYPDl (I , 1) /DENO , POBSHl/DENO , PT0BS2/DEN0 C PRINT* , INUMl (I , 1) , ISUMTS , SNGL (HYPDl (I , 1) /DENO) , C 3 SNGL (POBSHl/DENO) , SNGL (PT0BS2/DEN0) 565 ENDIF 570 CONTINUE C ENDIF C print* PEXU=PT0BS2/DEN0 C PEXU IS UPPER TAIL EXACT PROB. C PRINT* , ' IMPROVED P.EXACT = ' , PEXIMP C PRINT*,' P.EXACT =',PEX c PRINT*, 'PROB. OF OBSERVED TABLES =', POBSHl/DENO C WRITE(*,901)PEXIMU C WRITE(*,902)PEXU C PRINT* C PRINT*, 'PROB. OF OBSERVED TABLES =', POBSHl/DENO C PRINT* IF (JCIO .EQ. 2) THEN PEXIMP=PEXIMU PEX=PEXU GO TO 1000 ENDIF 888 FORMAT(' T T_obs Pr(Ta) pr(Ta_obs) P(T>T_obs+ lorder) ' ) 889 FORMAT(' T T_obs Pr(Ta) pr(Ta_obs) P(T>=T obs) 2 ') 901 FORMAT(' IMPROVED UPPER P.EXACT =',3X,F12.6) 902 FORMAT(' UPPER P.EXACT =',3x!f12.6) C LOWER TAIL C TO MAKE INUMl C KIM.9.F 666 IC0UNT1=0 DO 600 I=1,IT0T IF (INUM(I,IK+1) .LE. ISUMTS) THEN IC0UNT1=IC0UNT1+1


INUMl (ICOUNTl , 1)=INUM(I , IK+1) HYPDl (ICOUNTl , 1)=HYPD2(I , IK+1) CHI1(IC0UNT1)=CHI(I) 164 IF (ICOUNTl .GE. MAXT) THEN PRINT*, INCREASE ARRAY INUMl, HYPDl IN SUBROUTINE IMPRIV' print* , ' icount , icountl ' , icount , icountl go to 1000 ENDIF END IF 600 CONTINUE IF (JCIO .EQ. 0) THEN C PRINT*, 'NO OF TABLES FOR T .LE. T_0BS =',IC0UNT1 ENDIF C PRINT* C PRINT*, 'DISPLAY ALL TABLES FOR T .LE. T_0BS (Y=1,N=0) ?' C READ(*,*)IDIS1 IDIS1=0 IF (IDISl .EQ. 1) THEN c PRINT*, 'INPUT NO. OF INCREMENT :' c READ(*,*)INCR1 INCR1=1 PRINT*, 'ENUMERATION FOR THOSE TABLES HAVING T .LE. T_OBS' DO 620 I=1,IC0UNT1,INCR1 PRINT* , I , INUMl (I , 1) +ISUML 620 CONTINUE ENDIF C PRINT* C COMPUTATION OF PROB OF OBSERVING OBSERVED AND RANDOM TABLES C PRINT*, 'PROB FOR THOSE TABLES HAVING T .LE. T_OBS (Y=1,N=0) ?' C READ(*,*)IDS2 IDS2=0 IF (IDS2 .EQ. 1) THEN PRINT*, 'ENUMERATION OF PROB FOR THOSE TABLES HAVING T .LE. T OBS' DO 750 I=1,IC0UNT1,INCR


      PRINT*,I,sngl(HYPD1(I,1)/DENO)
  750 CONTINUE
      END IF
C     print*
      PTOBS1=0.D0
C     PRINT*, 'DISPLAY LOWER TAIL IMPROV. PROB (Y=1,N=0) ?'
C     READ(*,*)IDS3
      IDS3=0
      IF (IDS3 .EQ. 1) WRITE(*,890)
c     WRITE(*,890)
      DO 760 I=1,ICOUNT1
      IF (INUM1(I,1) .LT. ISUMTS .OR. INUM1(I,1) .EQ. ISUMTS
     1 .AND. CHI1(I) .GE. CHIOBS) THEN
C    1 .AND. HYPD1(I,1) .LE. POBSH1) THEN
      PTOBS1=PTOBS1+HYPD1(I,1)
      if (IDS3 .NE. 1) GO TO 755
      WRITE(*,900) INUM1(I,1)+ISUML,ISUMTS+ISUML,
     2 HYPD1(I,1)/DENO,POBSH1/DENO,PTOBS1/DENO
  755 ENDIF
  760 CONTINUE
c     ENDIF
C     print*
      PEXIML=PTOBS1/DENO
C     PEXIMPL IS IMPROVED LOWER TAIL EXACT PROB.
      PTOBS2=0.D0
C     PRINT*, 'DISPLAY LOWER TAIL PROB (Y=1,N=0) ?'
C     READ(*,*)IDS4
      IDS4=0
      IF (IDS4 .EQ. 1) WRITE(*,891)
C     WRITE(*,891)
      DO 770 I=1,ICOUNT1
      IF (INUM1(I,1) .LE. ISUMTS) THEN
      PTOBS2=PTOBS2+HYPD1(I,1)
      IF (IDS4 .NE. 1) GO TO 765
      WRITE(*,900) INUM1(I,1)+ISUML,ISUMTS+ISUML,
     3 HYPD1(I,1)/DENO,POBSH1/DENO,PTOBS2/DENO
C     PRINT*,INUM1(I,1),ISUMTS,SNGL(HYPD1(I,1)/DENO),
C    3 SNGL(POBSH1/DENO),SNGL(PTOBS2/DENO)
  765 ENDIF


  770 CONTINUE
C     ENDIF
C     print*
      PEXL=PTOBS2/DENO
C     PEXL IS LOWER TAIL EXACT PROB.
C     PRINT*,'IMPROVED LOWER P.EXACT =',PEXIML
C     PRINT*,' LOWER P.EXACT =',PEXL
c     PRINT*,'PROB. OF OBSERVED TABLES =',POBSH1/DENO
C     WRITE(*,903) PEXIML
C     WRITE(*,904) PEXL
C     PRINT*
C     PRINT*,'PROB. OF OBSERVED TABLES =',POBSH1/DENO
C     PRINT*
      IF (JCIO .EQ. 1) THEN
      PEXIMP=PEXIML
      PEX=PEXL
      GO TO 1000
      ENDIF
890 FORMAT (' lorder) ' ) T T_obs Pr(Ta) pr(Ta_obs) P(T,/, 1 ' i.e., Pr(P-value <= x ,0

169 C IS Pr(P-value<=x) for x=PDT(IX,l). OPEN (UNIT=32 , FILE= ' pcdf .out 0 DO 1330 IX=1,500 WRITE (32 ,1340) PDT ( IX , 1 ) , PDT (IX , 2) /DENO , PDT ( IX , 3) /DENO 1330 CONTINUE 1340 F0RMAT(F12.6,1X,F12.6,1X,F12.6) 1000 RETURN END C********=(c**=)c*^c=(c=t==t=*=)=***^c*=t:**=)<>(c,|c**:*,,c=|<*=lT_obs+ 889 FORMAT (' T 2') T_obs Pr(Ta) pr(Ta_obs) P(T>=T_obs) 901 FORMAT ('IMPROVED UPPER P.EXACT =' ,3X,F12.6) 902 c FORMAT (' UPPER P.EXACT =',3X,F12.6,/) c— C LOWER TAIL C TO MAKE INUMl : ALL POSSIBLE TABLES SUCH THAT T<=t_obs C COMPUTATION OF PROB OF OBSERVING RANDOM TABLES C TO MAKE HYPDl : THE VALUE OF PROB FOR INUMl C PROB OF THE TABLE IS HYPDl (I , 1) /DENO 598 IC0UNT1=0 DO 600 I=1,IT0T IF (INUM(I,IK+1) .LE. ISUMTS) THEN IC0UNT1=IC0UNT1+1 C DO 610 J=1,IK+1 INUMl (ICOUNTl , 1)=INUM(I , IK+1) HYPDl (ICOUNTl , 1)=HYPD2(I , IK+1) C610 CONTINUE END IF 600 CONTINUE C WRITE (30, 103) ICOUNTl 103 FORMAT('NO OF TABLES FOR T .LE. T_OBS =',I10) PT0BS1=0.D0 PT0BS2=0.D0


      DO 760 I=1,ICOUNT1
      PTOBS2=PTOBS2+HYPD1(I,1)
      IF (INUM1(I,1) .LT. ISUMTS .OR. INUM1(I,1) .EQ. ISUMTS
     1 .AND. HYPD1(I,1) .LE. POBSH1) THEN
      PTOBS1=PTOBS1+HYPD1(I,1)
  755 ENDIF
  760 CONTINUE
      PEXIML=PTOBS1/DENO
C     PEXIMPL IS IMPROVED LOWER TAIL EXACT PROB.
      PEXL=PTOBS2/DENO
C     PEXL IS LOWER TAIL EXACT PROB.
      WRITE(30,903) INO,ISUMTS+ISUML,POBSH1/DENO,PEXIML,PEXL
C     WRITE(30,903) PEXIML
C     WRITE(30,904) PEXL
C     PRINT*
C     PRINT*,'PROB. OF OBSERVED TABLES =',POBSH1/DENO
C     PRINT*
      PEXIMP=PEXIML
      PEX=PEXL
  890 FORMAT(' T T_obs Pr(Ta) pr(Ta_obs) P(T)
  903 FORMAT(I8,I7,1X,F15.10,1X,F12.6,F12.6)
  904 FORMAT(' LOWER P.EXACT = ',3X,F12.6,/)
 1000 RETURN
      END
C     FROM KARIM MAY 90 SLR.FOR **********
      DOUBLE PRECISION FUNCTION SUMLG(DDD1,DDD2)


      DOUBLE PRECISION DDD,DD1,DD2,DDD1,DDD2
      DOUBLE PRECISION ZLOG,ZEXP,X
C
      ZLOG(X) = DLOG(X)
      ZEXP(X) = DEXP(X)
C
      DD1=DDD1
      DD2=DDD2
C     PRINT *, 'HELLO FROM WITHIN SUMLG'
C
      DDD = DD1
      IF (DD2 .GT. DD1) DDD=DD2
      DD1 = ZEXP(DD1-DDD)
      DD2 = ZEXP(DD2-DDD)
      DDD = ZLOG(DD1+DD2) + DDD
      SUMLG = DDD
      RETURN
      END
C **********
C     DIFFERENT SUMLG IN JUNE 91
c     DOUBLE PRECISION FUNCTION SUMLG(DD1,DD2)
c     DOUBLE PRECISION DDD,DD1,DD2
c     DOUBLE PRECISION ZLOG,ZEXP,X
cC
c     ZLOG(X) = DLOG(X)
c     ZEXP(X) = DEXP(X)
cC
c     DDD = DD1
c     IF (DD2 .GT. DD1) DDD=DD2
c     DD1 = ZEXP(DD1-DDD)
c     DD2 = ZEXP(DD2-DDD)
c     DDD = ZLOG(DD1+DD2) + DDD
c     SUMLG = DDD
c     RETURN
c     END
C **********
      DOUBLE PRECISION FUNCTION FLOWER(BETA)
C
C     CALCULATES P(T=T)*FF + P(T

C
      DO 30 I=1,SCD
      A(I)=C(I)+(I-1)*BETA
   30 CONTINUE
C
      SLA=A(1)
      DO 40 I=2,J-1
      SLA=SUMLG(SLA,A(I))
   40 CONTINUE
      SLO=A(J)
      SLB=A(J+1)
      DO 50 I=J+2,SCD
      SLB=SUMLG(SLB,A(I))
   50 CONTINUE
      UUU=SUMLG(SLA,SLB)
      SLA=SUMLG(SLA,SLO+DLOG(FF))
      UUU=SUMLG(UUU,SLO)
C
      SLU = DEXP(SLA - UUU)
      SLU = SLU - K
      FLOWER = SLU
      RETURN
      END
C **********
C **********
      DOUBLE PRECISION FUNCTION FLOWBO(BETA)
C
C     CALCULATES P(T=T)*FF WHEN T IS ON LOWER BOUNDARY
C
      DOUBLE PRECISION BETA,A(5500),SLA,SLB,SLU,UUU,SUMLG,C(5500),K,SLO
     +,FF
      INTEGER J,SCD
      COMMON/PARAM/C,J,SCD,K,FF
C
      DO 30 I=1,SCD
      A(I)=C(I)+(I-1)*BETA
   30 CONTINUE
C
      SLO=A(1)
      SLB=A(J+1)
      DO 50 I=J+2,SCD
      SLB=SUMLG(SLB,A(I))
   50 CONTINUE


      UUU=SUMLG(SLB,SLO)
      SLO=SLO+DLOG(FF)
C
      SLU = DEXP(SLO - UUU)
      SLU = SLU - K
      FLOWBO = SLU
      RETURN
      END
C **********
C **********
      DOUBLE PRECISION FUNCTION FUPPER(BETA)
C
C     CALCULATES P(T=T)*FF + P(T>T)
C
      DOUBLE PRECISION BETA,A(5500),SLA,SLB,SLU,UUU,SUMLG,C(5500),K,SLO
     +,FF
      INTEGER J,SCD
      COMMON/PARAM/C,J,SCD,K,FF
      DO 30 I=1,SCD
      A(I)=C(I)+(I-1)*BETA
   30 CONTINUE
C
      SLA=A(1)
      DO 40 I=2,J-1
      SLA=SUMLG(SLA,A(I))
   40 CONTINUE
      SLO=A(J)
      SLB=A(J+1)
      DO 50 I=J+2,SCD
      SLB=SUMLG(SLB,A(I))
   50 CONTINUE
      UUU=SUMLG(SLA,SLB)
      SLB=SUMLG(SLB,SLO+DLOG(FF))
      UUU=SUMLG(UUU,SLO)
      SLU = DEXP(SLB - UUU)
      SLU = SLU - K
      FUPPER = SLU
      RETURN
      END
C **********
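
Read as formulas, SUMLG and the two interior tail functions FLOWER and FUPPER appear to compute the quantities below. This is a sketch inferred from the listing rather than a statement by the author: a_i, c_i, \beta, f and \kappa are shorthand introduced here for A(I), C(I), BETA, FF and K, and the two subtractions in each function assume that the operators missing from the scanned statements SLU = DEXP(... UUU) and SLU = SLU K are minus signs.

```latex
\begin{align*}
\operatorname{SUMLG}(x,y) &= m + \log\bigl(e^{x-m} + e^{y-m}\bigr)
                           = \log\bigl(e^{x} + e^{y}\bigr),
  \qquad m = \max(x,y), \\
a_i &= c_i + (i-1)\,\beta, \qquad i = 1,\dots,\mathrm{SCD}, \\
\mathrm{FLOWER}(\beta) &= \frac{\sum_{i<J} e^{a_i} + f\,e^{a_J}}
                               {\sum_{i=1}^{\mathrm{SCD}} e^{a_i}} - \kappa, \\
\mathrm{FUPPER}(\beta) &= \frac{f\,e^{a_J} + \sum_{i>J} e^{a_i}}
                               {\sum_{i=1}^{\mathrm{SCD}} e^{a_i}} - \kappa .
\end{align*}
```

Shifting by m before exponentiating keeps every exponent at or below zero, so large log-scale terms do not overflow. Under this reading, a zero of FLOWER or FUPPER in BETA is a value at which the weighted tail probability equals K.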


      DOUBLE PRECISION FUNCTION FUPPBO(BETA)
C
C     CALCULATES P(T=T)*FF WHEN T IS ON UPPER BOUNDARY
C
      DOUBLE PRECISION BETA,A(5500),SLA,SLB,SLU,UUU,SUMLG,C(5500),K,SLO
     +,FF
      INTEGER J,SCD
      COMMON/PARAM/C,J,SCD,K,FF
C
      DO 30 I=1,SCD
      A(I)=C(I)+(I-1)*BETA
   30 CONTINUE
C
      SLA=A(1)
      DO 40 I=2,J-1
      SLA=SUMLG(SLA,A(I))
   40 CONTINUE
      SLO=A(J)
      UUU=SUMLG(SLA,SLO)
      SLO=SLO+DLOG(FF)
C
      SLU = DEXP(SLO - UUU)
      SLU = SLU - K
      FUPPBO = SLU
      RETURN
      END
C **********
      SUBROUTINE brent(X0,tol,IMAX,zbrent,Func,ITER)
c     FUNCTION ZBRENT(FUNC,X1,X2,TOL)
c     Van Wijngaarden-Dekker-Brent method
c     in Press WH, Flannery BP, Teukolsky SA, Vetterling WT:
c     Numerical Recipes The Art of Scientific Computing
c     (Fortran version). Cambridge: Cambridge University Press, 1989
c     code on pages 253-254.
c
c     Using Brent's method, find the root of a function FUNC known to
c     lie between X1 and X2. The root returned as ZBRENT will be refined
c     until its accuracy is TOL.
c     (EPS is machine floating point precision, see p 16)


c     eps changed declarations + delx0 introduced
c
      PARAMETER(ITMAX=100,EPS=1.d-14)
      double precision delx0,func,tol,zbrent,a,b,c,d,e,fa,fb,fc
      double precision p,q,r,s,xm,x0,tol1
      external func
      delx0=.2d+00
    1 A=X0-delx0
      B=X0+delx0
      FA=FUNC(A)
      FB=FUNC(B)
      IF(FB*FA.GT.0)then
C     print *,'BRACKET ROOT !!'
      delx0=delx0*2
      goto 1
      endif
c     no modifications below this line
      FC=FB
      DO 11 ITER=1,ITMAX
      IF(FB*FC.GT.0)THEN
      C=A
      FC=FA
      D=B-A
      E=D
      ENDIF
      IF(ABS(FC).LT.ABS(FB))THEN
      A=B
      B=C
      C=A
      FA=FB
      FB=FC
      FC=FA
      ENDIF
      TOL1=2.*EPS*ABS(B)+0.5*TOL
      XM=.5*(C-B)
      IF(ABS(XM).LE.TOL1 .OR. FB.EQ.0.)THEN
      ZBRENT=B
      RETURN
      ENDIF
      IF(ABS(E).GE.TOL1 .AND. ABS(FA).GT.ABS(FB))THEN
      S=FB/FA
      IF(A.EQ.C)THEN
      P=2.*XM*S
      Q=1.-S
      ELSE


      Q=FA/FC
      R=FB/FC
      P=S*(2.*XM*Q*(Q-R)-(B-A)*(R-1.))
      Q=(Q-1.)*(R-1.)*(S-1.)
      END IF
      IF(P.GT.0.) Q=-Q
      P=ABS(P)
      IF(2.*P .LT. MIN(3.*XM*Q-ABS(TOL1*Q),ABS(E*Q)))THEN
      E=D
      D=P/Q
      ELSE
      D=XM
      E=D
      END IF
      ELSE
      d=xm
      e=d
      endif
      A=B
      FA=FB
      IF(ABS(D) .GT. TOL1) THEN
      B=B+D
      ELSE
      B=B+SIGN(TOL1,XM)
      ENDIF
      FB=FUNC(B)
   11 CONTINUE
      PAUSE 'MAX IT'
      ZBRENT=B
      RETURN
      end
C **********
      SUBROUTINE brent1(X0,tol,IMAX,zbrent,ITER,JCI,JCIO,PALPHA)
c     FUNCTION ZBRENT(FUNC,X1,X2,TOL)
c     Van Wijngaarden-Dekker-Brent method
c     in Press WH, Flannery BP, Teukolsky SA, Vetterling WT:
c     Numerical Recipes The Art of Scientific Computing
c     (Fortran version). Cambridge: Cambridge University Press, 1989
c     code on pages 253-254.
c
c     Using Brent's method, find the root of a function FUNC known to
c     lie between X1 and X2. The root returned as ZBRENT will be refined


until its accuracy is TOL. (EPS is machine floating point precision, see p 16) eps changed declarations + delxO introduced IMPLICIT REAL*8 (A-H,0-Z) PARAMETER (ITMAX=1 00, EPS=l.d14) double precision delxO,tol,zbrent,a,b,c,d,e,fa,fb,fc double precision delxO ,f unc ,tol ,zbrent , a,b , c, d ,e ,f a,f b ,f c double precision p, q,r,s,xm,x0, toll, 0R1,0R2,FA1, FBI integer itab (1000 ,4) , inf hyl (1000) , infhyu(lOOO) double precision hyp(0 : 2000) ,ds(0 : 1 ,0 : 5500) ,lge ,P0BSH DOUBLE PRECISION hypd(1000 , 0 : 2000) , POBSHl ,PEXIMP,PEX INTEGER J,SCD DOUBLE PRECISION CA(5500) ,K ,FF COMMON/ Cl 1 / ik , mxs , mxz , mxd , Ige , it ab , hyp , ds , ipar , kl , k2 , i err , pobsh COMMON/ CI2/hypd , inf hyl , inf hyu , POBSHl , PEXIMP , PEX COMMON/PARAM/CA , J , SCD , K , FF external func delxO=l . 2d0 0R1=X0 IF (XO-delxO .LE. O.DO) THEN 0Rl=0Rl/2.d0 ELSE 0Rl=X0-delx0 END IF 0R2=X0+delx0 A=0R1 B=0R2 PRINT* PRINT*, 'ODDS RATIO 0R1,0R2= ',0R1,0R2 call cnv2x2(ik, mxs, mxz, mxd, Ige, itab, hyp, ds, ipar, kl,k2,ierr, pobsh, 1 jci,0Rl) CALL IMPROV ( ik , it ab , hypd , inf hyl , inf hyu , POBSHl , PEXIMP , PEX , JCI 0 , ORl ) FA=PEXIMP-K FA1=PEX-K call cnv2x2 (ik ,mxs ,mxz ,mxd , Ige , itab , hyp , ds , ipar ,kl , k2 , i err , pobsh , 1 jci,0R2) CALL IMPROV ( ik , it ab , hypd , inf hyl , inf hyu , POBSH 1 , PEX IMP , PEX , JCI 0 , 0R2 )


FB=PEXIMP-K FB1=PEX-K IF (FB*FA . GT . 0)then print 'BRACKET ROOT I | ' PRINT*.'K,X0,delx0,0Rl,0R2,FA,FB',K,X0,delx0,0Rl,0R2,FA,FB delx0=delx0*2 goto 1 endif no modifications below this line FC=FB DO 11 ITER=1,ITMAX IF(FB*FC.GT.O)THEN C=A FC=FA D=B-A E=D ENDIF IF(ABS(FC) .LT.ABS(FB))THEN A=B B=C C=A FA=FB FB=FC FC=FA ENDIF T0L1=2 . *EPS*ABS (B) +0 . 5*T0L XM=.5*(C-B) IF(ABS(XM) .LE.TOLl .OR. FB . EQ . 0 . )THEN ZBRENT=B 0R2=B PRINT*, 'ODDS RATIO 0R2 ',0R2 call cnv2x2 ( ik , mxs , mxz , mxd , Ige , it ab , hyp , ds , ipar , kl , k2 , ierr , pobsh , 1 jci,0R2) CALL IMPROV ( ik , it ab , hypd , inf hyl , inf hyu , POBSHl , PEXIMP , PEX , JCI 0 , 0R2 ) PALPHA=PEXIMP PRINT* , ' PEXIMP= ' , PEXIMP PRINT*, 'FIRST TETURN ZBRENT=B',B RETURN ENDIF


181 IF(ABS(E) .GE.TQLl .AND. ABS (FA) . GT . ABS (FB) )THEN S=FB/FA IF(A.EQ.C)THEN P=2.*XM*S Q=l.-S ELSE Q=FA/FC R=FB/FC P=S* (2 . (Q-R) (B-A) * (R1 . ) ) Q=(Q-1.)*(R-1.)*(S-1.) IF(P.GT.O.) Q=-Q P=ABS(P) IF(2.*P .LT. MIN(3.*XM*Q-ABS(T0Ll*q) ,ABS(E*q)))THEN E=D D=p/q ELSE D=XM E=D END IF ELSE d=xm e=d endif A=B FA=FB IF(ABS(D) .GT. TOLD THEN B=B+D ELSE B=B+SIGN(TOLl,XM) ENDIF 1 jci,DR2) CALL IMPRDV ( ik , it ab , hypd , inf hyl , inf hyu , PDBSHl , PEXIMP , PEX , JCIG , GR2) FB=PEXIMP-K FB1=PEX-K ENDIF C FB=FUNC(B) C DR2=B PRINT*, ^ ODDS RATIO 0R2 \0R2 11 CONTINUE PAUSE 'MAX IT' ZBRENT=B PRINT*, 'SECOND RETURN ZBRENT=B',B C


      RETURN
      end
      SUBROUTINE SATO(ITAB,IK,LL,UL,MH,KA,IA,RLL,RUL,IPOS,VRBG)
C
C
C     CALCULATES THE LIMITS OF EQUATION (2) IN
C     SATO, T. (1990). CONFIDENCE LIMITS FOR THE COMMON ODDS RATIO
C     BASED ON THE ASYMPTOTIC DISTRIBUTION OF THE MANTEL-HAENSZEL
C     ESTIMATOR. BIOMETRICS, 46, 71-80.
C
C
      INTEGER ITAB(1000,4),IA,IPOS
      DOUBLE PRECISION IN,IM,INN,R,S,P,Q,W,RK,SK,SQ,LL,UL,CHI2,MH
      DOUBLE PRECISION KA(100,2),SVD1,SVD2,SVD3,VRBG,RUL,RLL
      DATA W/0.D00/,RK/0.D00/,SK/0.D00/,SVD1/0.D00/,SVD2/0.D00/
      DATA SVD3/0.D00/
      CHI2 = KA(IA,1)
C     PRINT *,CHI2
C     ADDED BY DONGUK KIM, OCT. 3, 1993
C     THIS IS REQUIRED FOR THE ITERATION OF RANDOM TABLES.
C     SET TO ZERO.
      W=0.D0
      RK=0.D0
      SK=0.D0
      SVD1=0.D0
      SVD2=0.D0
      SVD3=0.D0
      DO 100 I=1,IK
      IT = ITAB(I,1) + ITAB(I,2)
      IN = ITAB(I,1) + ITAB(I,3)
      IM = ITAB(I,2) + ITAB(I,4)
      INN = IN + IM
      R = ITAB(I,1)*ITAB(I,4)/INN
      S = ITAB(I,2)*ITAB(I,3)/INN
      P = (ITAB(I,1) + ITAB(I,4))/INN
      Q = (ITAB(I,2) + ITAB(I,3))/INN
      W = W + (Q + 1/INN)*R + (P + 1/INN)*S


      RK = RK + R
      SK = SK + S
C
c     ROBINS J, BRESLOW NE, GREENLAND S. ESTIMATORS OF
C     THE MANTEL-HAENSZEL VARIANCE CONSISTENT IN BOTH
C     SPARSE DATA AND LARGE-STRATA LIMITING MODELS
C     BIOMETRICS 1986;42:311-23.
C
C     VARIANCE
      SVD1 = SVD1 + P*R
      SVD2 = SVD2 + (Q*R + P*S)
      SVD3 = SVD3 + Q*S
  100 CONTINUE
c     RBG limits (CONT)
      IF (IPOS .GE. 1) THEN
      RUL=999
      RLL=999
      GOTO 109
      END IF
      VRBG=SVD1/2/RK/RK + SVD2/2/RK/SK + SVD3/2/SK/SK
      RLL = DEXP(DLOG(RK/SK)-SQRT(CHI2*VRBG))
      RUL = DEXP(DLOG(RK/SK)+SQRT(CHI2*VRBG))
C     PRINT *,RLL,RUL
  109 SQ = SQRT((4*RK*SK + CHI2*W)*CHI2*W)
      IF(SK .EQ. 0.0) GOTO 110
      LL = (2*RK*SK + CHI2*W - SQ)/2/SK/SK
      UL = (2*RK*SK + CHI2*W + SQ)/2/SK/SK
      MH = RK/SK
C     PRINT *,RK/SK
C     PRINT *,LL,UL
      GOTO 120
  110 LL = (2*RK*SK + CHI2*W - SQ)/2/RK/RK
      UL = (2*RK*SK + CHI2*W + SQ)/2/RK/RK
      LL = 1/UL
C     PRINT *,'INFINITE POINT ESTIMATE LOWER LIMIT ONLY'
C     PRINT *,LL
  120 RETURN
      END
C234567
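
The quantities accumulated in SATO can be summarised as follows. This is a sketch read off the code, not a restatement of Sato (1990): the cell labels a_k, b_k, c_k, d_k for ITAB(k,1) to ITAB(k,4) are introduced here for illustration, \chi^2 stands for CHI2 = KA(IA,1), and the minus sign in the lower limit assumes that the operator dropped from the scanned statement LL = (2*RK*SK + CHI2*W SQ) is a subtraction.

```latex
\begin{align*}
n_k &= a_k + b_k + c_k + d_k, \qquad
R_k = \frac{a_k d_k}{n_k}, \quad
S_k = \frac{b_k c_k}{n_k}, \quad
P_k = \frac{a_k + d_k}{n_k}, \quad
Q_k = \frac{b_k + c_k}{n_k}, \\
R &= \sum_k R_k \;(\mathrm{RK}), \qquad
S = \sum_k S_k \;(\mathrm{SK}), \qquad
\hat\psi_{MH} = \frac{R}{S} \;(\mathrm{MH}), \\
W &= \sum_k \Bigl[\bigl(Q_k + \tfrac{1}{n_k}\bigr) R_k
              + \bigl(P_k + \tfrac{1}{n_k}\bigr) S_k\Bigr], \\
V_{RBG} &= \frac{\sum_k P_k R_k}{2R^2}
         + \frac{\sum_k (Q_k R_k + P_k S_k)}{2RS}
         + \frac{\sum_k Q_k S_k}{2S^2}, \\
(\mathrm{RLL},\,\mathrm{RUL}) &= \exp\Bigl(\log\hat\psi_{MH} \mp \sqrt{\chi^2\, V_{RBG}}\Bigr), \\
(\mathrm{LL},\,\mathrm{UL}) &= \frac{2RS + \chi^2 W \mp \sqrt{(4RS + \chi^2 W)\,\chi^2 W}}{2S^2} .
\end{align*}
```

When SK is zero the code takes the other branch: it divides by 2 RK^2 instead and then sets LL = 1/UL.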


      SUBROUTINE IT2(ALPHA,INOUT,CA1,CA2,PT,ATO,SUM,IOOTO)
C
C     INOUT=1 IF t_obs IS IN THE GIVEN PROBABILITY DISTRIBUTION WITH
C     PROBABILITY 1-ALPHA, ELSE 0
      IMPLICIT REAL*8 (A-H,O-Z)
c     PARAMETER(EPS=1.d-6)
      PARAMETER(EPS=1.d-14)
      integer itab(1000,4),infhyl(1000),infhyu(1000)
      double precision hyp(0:2000),ds(0:1,0:5500),lge,POBSH
      DOUBLE PRECISION hypd(1000,0:2000),POBSH1,PEXIMP,PEX
      INTEGER J,SCD
      DOUBLE PRECISION CA(5500),K,FF,CA1(5500),CA2(5500),INC(5500)
      DOUBLE PRECISION CA3(5500)
      COMMON/CI1/ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX
      COMMON/PARAM/CA,J,SCD,K,FF
      DO 10 I=1,K2-K1+1
   10 CA3(I)=CA1(I)
C     APPLYING MODIFIED P
      INOUT=1
      SUM=0.D0
      CALL SHELL(K2-K1+1,CA2)
      DO 100 I=1,K2-K1+1
      DO 110 I1=1,K2-K1+1
      IF (CA2(I) .EQ. CA3(I1)) THEN
      INC(I)=I1
C     FOR OTHER T THAT HAS THE SAME PROB.
      CA3(I1)=0.D0
      GO TO 100
      END IF
  110 CONTINUE
c105  PRINT*, 'CA2(I),INC(I)= ',CA2(I),INC(I)
  100 CONTINUE
C
C     FOR TWO-SIDED LIMITS ADD TERMS FROM SMALLEST PROB
C     IN ASCENDING ORDER OF SIZE (NOT FROM EITHER TAIL).
C
      IP=1
  150 SUM=SUM+CA2(IP)
      IIK=INC(IP)


      IF (IOOTO .EQ. 2) THEN
C     TWO SIDED-MODIFIED P (MODIFIED STERNE-TYPE P)
      IF (IIK .EQ. J) SUM=SUM-ATO
      END IF
      CA1(IIK)=0.D0
C     PRINT*, 'SUM, IIK= ', SUM, IIK
      IF (SUM .GE. ALPHA) THEN
C     PRINT*, 'SUM, IIK, INOUT= ', SUM, IIK, INOUT
      RETURN
      ENDIF
C     IF (INC(IP) .EQ. J) THEN
      IF (IP .EQ. K2-K1+1 .OR. CA2(IP+1) .GT. PT) THEN
      INOUT=0
C     PRINT*, 'SUM, IIK, INOUT= ', SUM, IIK, INOUT
      RETURN
      ENDIF
      IP=IP+1
      GO TO 150
      END
C234567
      SUBROUTINE ITERA(ALPHA,START,RHO1,ist,PALPHA,IOOTO)
C     GIVEN ALPHA, STARTING VALUE, ITERA ITERATES AND RETURNS RHO
C     A LOWER LIMIT.
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER(EPS=1.d-14)
      PARAMETER(NNIT=1000)
c     PARAMETER(EPS=1.d-6)
      integer itab(1000,4),infhyl(1000),infhyu(1000)
      double precision hyp(0:2000),ds(0:1,0:5500),lge,POBSH,PSI
      DOUBLE PRECISION hypd(1000,0:2000),POBSH1,PEXIMP,PEX
      DOUBLE PRECISION SRT(NNIT,2),SRTS(NNIT),SRTSR(NNIT,2)
      DOUBLE PRECISION SRT1(NNIT,2),SRTS1(NNIT),SRTSR1(NNIT,2)
      INTEGER J,SCD
      DOUBLE PRECISION CA(5500),K,FF,CA1(5500),CA2(5500),CC(5500)
      COMMON/CI1/ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX


186 COMMON/PARAM/CA , J , SCD , K , FF COMMDN/ITN/SRT , SRTS , SRTSR , SRTl , SRTS 1 , SRTSRl C IF (1ST .EQ. 1) THEN C OPEN (UNIT=35 ,FILE= ' st_lo_ci . out ' ) C OPEN (UNIT=39 ,FILE= ' st_lo_all . out 0 C ELSE C OPEN (UNIT=36 ,FILE= ' st_up_ci . out ’ ) C OPEN (UNIT=40 ,FILE= ‘ st_up_all . out ’ ) C ENDIF DO 5 JJI=1,NNIT SRT(JJI,l)=O.DO SRT(JJI,2)=0.D0 SRTS(JJI)=O.DO SRTSR(JJI,l)=O.DO SRTSR(JJI,2)=0.D0 SRTl(JJI,l)=O.DO SRT1(JJI,2)=0.D0 SRTSl(JJI)=O.DO SRTSRl(JJI,l)=O.DO 5 SRTSR1(JJI,2)=0.D0 PSI=START RHO=O.DO KL=0 ITE=0 ITE1=0 C FOR STERNE'S Cl, JCI=1 SHOULD BE ASSIGNED FOR ODR COMPUTATION, C WHENEVER WE CALL CNV2X2. JCI=1 10 call cnv2x2(ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,kl,k2,ierr,pobsh, 1 jci,PSI) JCI0=3 CALL IMPROV (ik , itab ,hypd , inf hyl , inf hyu , POBSHl , PEXIMP , 1 PEX, JCIO,PSI) C COMPUTE MODIFIED EXACT ALTERNATIVE PROB DISTN. CALL COMPT(CC,ATO,PT,IOOTO) c print* , ' =' ,psi


187 c CAl(I) : PROBABILITY DO 90 I=1,K2-K1+1 CA1(I)=CC(I) 90 CONTINUE c CA2(I) : DUPLICATE OF CAl AND C THIS WILL BE SORTED PROBABILITY IN ASCENDING ORDER AFTER ITl . DO 95 I=1,K2-K1+1 CA2(I)=CC(I) 95 CONTINUE C IN=INOUT (ALPHA) C CALL ITl (ALPHA, INOUT, CAl) CALL IT2 (ALPHA , INOUT , CA 1 , CA2 , PT , ATO , SUM , I OOTO ) IN=INOUT C C KL IS 0 UNTIL CORRECT VALUE IS SPANNED BY RHO AND OPSI, C THEN KL IS SET TO 1 . C C IN=1 IF PSI IS TOO LARGE, ELSE IN=0 . C C ATO IS INCLUDED IN ACCEPTANCE REGION. IF (CAl(J) .EQ. O.DO) THEN CA1(J)=AT0 ELSE CA1(J)=CA1(J)+AT0 END IF PCHK=O.DO DO 100 I=1,K2-K1+1 PCHK=PCHK+CA1(I) C IF (I .EQ. K2-K1+1) print*, i, CAl(I) ,pchk 100 continue C PRINT* , ' PSI , TWO-SIDED P-VALUE = ' , PSI , SUM C PRINT*, 'P.ACCEPT, TOTAL P = ' , PCHK , SUM+PCHK ITE1=ITE1+1 SRT1(ITE1,1)=PSI SRT1(ITE1,2)=SUM IF (ITEl .GT. 1000) THEN PRINT*, 'NOT CONVERGE IN TWO-SIDED P'


188 PALPHA=-99999 . 99999 RH01=-99999. 99999 GO TO 99 END IF C SUM IS THE TWO-SIDED P_VALUE. IF (IN .EQ. 0) THEN ITE=ITE+1 SRT(ITE,1)=PSI SRT(ITE,2)=SUM ENDIF IF (KL .EQ. 1) GO TO 40 IF (IN .EQ. 1) GO TO 20 RHO=PSI c PSI=PSI*1 . IDO if (ist .eq. 1) PSI=PSI*1 . OIDO if (ist .eq. 2) PSI=PSI*0 . 99D0 GO TO 10 20 KL=1 OPSI=PSI 30 PSI=(RH0+0PSI)*0.5D0 C C NEW ESTIMATE IS MIDPOINT OF SPANNING INTERVAL GO TO 10 40 IF (IN .EQ. 1) OPSI=PSI IF (IN .NE. 1) RHO=PSI C IF (DABS(RHO/OPSI -l.DO) .LT. EPS) RETURN IF (DABS(RHO/OPSI -l.DO) .LT. EPS) THEN IF (ITEl .GT. NNIT) THEN PRINT*, 'INCREASE NNIT FOR ARRAYS SRT,SRTS' GO TO 99 ELSE C PRINT*, 'NO OF ITERATION = Â’,ITE,ITE1 C PRINT* ENDIF DO 102 JJI=1,ITE 102 SRTS(JJI)=SRT(JJI,2) CALL SHELL1(ITE,SRTS)


189 DO 200 1=1, ITE DO 210 11=1, ITE IF (SETS (I) .EQ. SRT(I1,2)) THEN SRTSR(I,1)=SRT(I1,1) SRTSR(I,2)=SRT(I1,2) GO TO 200 END IF 210 CONTINUE 200 CONTINUE C IF (1ST .EQ.l) THEN C DO 105 JJI=ITE,1,-1 C105 WRITE(35,107) J JI , SRTSR( J JI , 1) , SRTSR( J JI , 2) C ELSE C DO 106 JJI=ITE,1,-1 C106 WRITE(36,107) J JI , SRTSR( J JI , 1) , SRTSR( J JI , 2) C ENDIF 107 F0RMAT(I10,2F20.15) C PALPHA=SRTSR(ITE,2) C RH01=SRTSR(ITE,1) C PRINT*, 'FINAL LIMIT (RHO) = ' ,RHO C SORTING BY THETA DO 300 JJI=1,ITE1 300 SRTS1(JJI)=SRT1(JJI,1) CALL SHELL 1(ITE1,SRTS1) C FOR THE LOWER LIMIT P_VALUE IS SAVED IN ASCENDING ORDER. IF (1ST .EQ. 1) THEN DO 310 I=1,ITE1 DO 320 11=1, ITEl IF (SRTSl(I) .EQ. SRT1(I1,1)) THEN SRTSR1(I,1)=SRT1(I1,1) SRTSR1(I,2)=SRT1(I1,2) SRT1(I1,1)=0.D0 C FOR THE SAKE OF THE SAME THETA. GO TO 310 ENDIF 320 CONTINUE 310 CONTINUE C FOR THE UPPER LIMIT P.VALUE IS SAVED IN ASCENDING ORDER. C THAT IS, THETA IS SAVED IN DESCENDING ORDER.


190 ELSE DO 330 I=1,ITE1 DO 340 I1=ITE1,1,-1 IF (SRTSl(I) .EQ. SRTldl,!)) THEN SRTSRl (ITEl-I+1 , 1)=SRT1 (II , 1) SRTSR1(ITE1-I+1,2)=SRT1(I1,2) SRTldl, 1)=0. DO C FOR THE SAKE OF THE SAME THETA. GO TO 330 END IF 340 CONTINUE 330 CONTINUE END IF DO 350 JJI=1,ITE1 IF (SRTSRl (JJI,1) .EQ. RHO) THEN ITE2=JJI GO TO 360 END IF 350 CONTINUE 360 CONTINUE C360 PRINT*, 'LIMIT (RHO) (ITE2) =',ITE2 ITE3=ITE2 DO 400 JJI=ITE2,1,-1 400 IF (SRTSRl (JJI, 2) .GT. ALPHA) ITE3=JJI IF (ITE3 .NE. ITE2) THEN ITE4=ITE3-1 ELSE ITE4=ITE3 ENDIF C PRINT* , ' ITE3 , ITE4 = ' , ITE3 , ITE4 C FIND THE MAXIMUM P_VALUE WHICH CAN NOT EXCEED ALPHA/2 AND ITS THETA. TMAX=SRTSR1(ITE4,1) PRMAX=SRTSR1 (ITE4 , 2) PALPHA=PRMAX RH01=TMAX C PRINT*, 'CLOSER ALPHA= ' , SRTSR(ITE, 1) ,SRTSR(ITE, 2) C PRINT*, 'LIMIT (RHO) = ' ,TMAX,PRMAX C IF (1ST .EQ.l) THEN C DO 420 JJI=ITE1,1,-1


C420  WRITE(39,107) JJI,SRTSR1(JJI,1),SRTSR1(JJI,2)
C     ELSE
C     DO 430 JJI=ITE1,1,-1
C430  WRITE(40,107) JJI,SRTSR1(JJI,1),SRTSR1(JJI,2)
C     END IF
   99 RETURN
      END IF
      GO TO 30
      END
C234567
      SUBROUTINE COMPT(CC,ATO,PT,IOOTO)
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER(EPS=1.d-14)
      integer itab(1000,4),infhyl(1000),infhyu(1000)
      INTEGER INUM(270000,20),INUM1(270000,1)
      double precision hyp(0:2000),ds(0:1,0:5500),lge,POBSH
      DOUBLE PRECISION hypd(1000,0:2000),POBSH1,PEXIMP,PEX
      DOUBLE PRECISION HYPD1(270000,1),HYPD2(270000,20)
      DOUBLE PRECISION HYPD3(270000,1)
C     HYPD3(270000,1) IS PR(T) FOR EACH TABLE.
      INTEGER J,SCD
      DOUBLE PRECISION CA(5500),K,FF,CC(5500)
      DOUBLE PRECISION CHI(270000),CHI1(270000),CHIOBS
      COMMON/CI1/ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX
      COMMON/PARAM/CA,J,SCD,K,FF
      COMMON /DKIM/ DENO,ITOT,ISUML,INUM,HYPD2,INUM1,HYPD1
c     COMMON /DKIM1/ POBSH1,ISUMTS,IK
      COMMON /DKIM1/ ISUMTS
      COMMON /CHI/ CHI,CHIOBS
      DO 250 I=1,K2-K1+1
  250 CC(I)=0.D0


      DO 300 I=1,ITOT
      IL=INUM(I,IK+1)+1
      CC(IL)=CC(IL)+HYPD2(I,IK+1)
  300 CONTINUE
c     aa=0.d0
      DO 310 I=1,K2-K1+1
      CC(I)=CC(I)/DENO
c     aa=aa+cc(i)
c     print*, i, cc(i),aa
  310 continue
      DO 320 I=1,ITOT
      IM=INUM(I,IK+1)+1
      HYPD3(I,1)=CC(IM)
  320 CONTINUE
      ICOUNT1=0
      DO 400 I=1,ITOT
C     IF (INUM(I,IK+1) .EQ. ISUMTS) THEN
      IF (HYPD3(I,1) .EQ. CC(J)) THEN
      ICOUNT1=ICOUNT1+1
      INUM1(ICOUNT1,1)=INUM(I,IK+1)
      HYPD1(ICOUNT1,1)=HYPD2(I,IK+1)
      CHI1(ICOUNT1)=CHI(I)
      END IF
  400 CONTINUE
      PTOBS3=0.D0
      PTOBS5=0.D0
      DO 560 I=1,ICOUNT1
C     IF (INUM1(I,1) .EQ. ISUMTS
C    1 .AND. HYPD1(I,1) .LE. POBSH1) THEN
C     IF (HYPD1(I,1) .LE. POBSH1) THEN
C     CHI-SQUARED STATISTIC IS USED FOR SECONDARY PARTITION.
      IF (CHI1(I) .GE. CHIOBS) THEN
      PTOBS3=PTOBS3+HYPD1(I,1)
      ELSE
      PTOBS5=PTOBS5+HYPD1(I,1)
      END IF
  560 CONTINUE
      PTO=PTOBS3/DENO


      ATO=PTOBS5/DENO
      PT=CC(J)
C     ATO IS INCLUDED IN ACCEPTANCE REGION.
C     PRINT*, 'IMPROVED P(T=T_0) = ',PTO
c     PRINT*, 'ORD P IMPROVED P(T=T_0) = ',ATO
C     CC(J)=PTO
      RETURN
      END
C234567
C     GIVEN ALPHA, STARTING VALUE, ITERA ITERATES AND RETURNS RHO A LIMIT.
      SUBROUTINE ITERA1(ALPHA,START,RHO1,ist,JCIO,PALPHA)
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER(EPS=1.d-14)
      PARAMETER(NNIT=1000)
c     PARAMETER(EPS=1.d-6)
      integer itab(1000,4),infhyl(1000),infhyu(1000)
      double precision hyp(0:2000),ds(0:1,0:5500),lge,POBSH,PSI
      DOUBLE PRECISION hypd(1000,0:2000),POBSH1,PEXIMP,PEX
      DOUBLE PRECISION SRT(NNIT,2),SRTS(NNIT),SRTSR(NNIT,2)
      DOUBLE PRECISION SRT1(NNIT,2),SRTS1(NNIT),SRTSR1(NNIT,2)
      INTEGER J,SCD
      DOUBLE PRECISION CA(5500),K,FF
      COMMON/CI1/ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX
      COMMON/PARAM/CA,J,SCD,K,FF
      COMMON/ITN/SRT,SRTS,SRTSR,SRT1,SRTS1,SRTSR1
C     IF (IST .EQ. 1) THEN
C     OPEN (UNIT=33,FILE='mp_lo_ci.out')
C     OPEN (UNIT=37,FILE='mp_lo_all.out')
C     ELSE
C     OPEN (UNIT=34,FILE='mp_up_ci.out')
C     OPEN (UNIT=38,FILE='mp_up_all.out')
C     ENDIF


194 DO 5 JJI=1,NNIT SRT(JJI,l)=O.DO SRT(JJI,2)=0.D0 SRTS(JJI)=O.DO SRTSR(JJI,l)=O.DO SRTSR(JJI,2)=0.D0 SRTl(JJI,l)=O.DO SRT1(JJI,2)=0.D0 SRTSl(JJI)=O.DO SRTSRl(JJI,l)=O.DO 5 SRTSR1(JJI,2)=0.D0 PSI=START RHO=O.DO KL=0 ITE=0 ITE1=0 AALP=ALPHA/2.D0 IF (J .EQ. 1 .OR. J .EQ. SCD) AALP=ALPHA C FOR STERNE'S Cl, JCI=1 SHOULD BE ASSIGNED FOR ODR COMPUTATION, C WHENEVER WE CALL CNV2X2 . JCI=1 10 call cnv 2 x 2 (ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,kl,k 2 ,ierr,pobsh, 1 jci,PSI) C JCI0=3 cc JCI0=0 CALL IMPROV ( ik , it ab , hypd , inf hy 1 , inf hyu , POBSHl , PEXIMP , 1 PEX, JCIO,PSI) ITE1=ITE1+1 SRT1(ITE1,1)=PSI SRT1(ITE1,2)=PEXIMP IF (ITEI .GT. 1000) THEN PRINT*, 'NOT CONVERGE IN ONE-SIDED MODIFIED P' RH01=-99999. 99999 PALPHA=-99999 . 99999


195 GO TO 99 END IF C print*, 'psi, PEXIMP =' ,psi ,PEXIMP C IF (PEXIMP .GE. ALPHA/2. DO) THEN IF (PEXIMP .GE. AALP) THEN IN=1 ELSE IN=0 ITE=ITE+1 SRT(ITE,1)=PSI SRT(ITE,2)=PEXIMP END IF C C KL IS 0 UNTIL CORRECT VALUE IS SPANNED BY RHO AND OPSI, C THEN KL IS SET TO 1 . C C IN=1 IF PSI IS TOO LARGE, ELSE IN=0 . C IF (KL .EQ. 1) GO TO 40 IF (IN .EQ. 1) GO TO 20 RHO=PSI if (ist .eq. 1) PSI=PSI*1 . OIDO if (ist .eq. 2) PSI=PSI*0 . 99D0 GO TO 10 20 KL=1 OPSI=PSI 30 PSI=(RH0+0PSI)*0.5D0 C C NEW ESTIMATE IS MIDPOINT OF SPANNING INTERVAL GO TO 10 40 IF (IN .EQ. 1) OPSI=PSI IF (IN .NE. 1) RHO=PSI c IF (DABS(RHO/OPSI -l.DO) .LT. EPS) RETURN IF (DABS(RHO/OPSI -l.DO) .LT. EPS) THEN JCI=1 PSI=RHO call cnv 2 x 2 (ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,kl,k 2 ,ierr,pobsh, 1 jci,PSI)


196 CALL IMPROV (ik , itab ,hypd , inf hyl , inf hyu , POBSHl , PEXIMP , 1 PEX,JCIO,PSI) C print*, 'FINAL LIMIT : psi, PEXIMP =' ,psi , PEXIMP ITE=ITE+1 SRT(ITE,1)=PSI SRT(ITE,2)=PEXIMP IF (ITEl .GT. NNIT) THEN PRINT* INCREASE NNIT FOR ARRAYS SRT,SRTS' GO TO 99 ELSE C PRINT*, 'NO OF ITERATION ITE,ITE1= ',ITE,ITE1 C PRINT* END IF DO 100 JJI=1,ITE 100 SRTS(JJI)=SRT(JJI,2) CALL SHELL1(ITE,SRTS) DO 200 1=1, ITE DO 210 11=1, ITE IF (SRTS(I) .EQ. SRT(I1,2)) THEN SRTSR(I,1)=SRT(I1,1) SRTSR(I,2)=SRT(I1,2) GO TO 200 END IF 210 CONTINUE 200 CONTINUE C IF (1ST .Eq.l) THEN C DO 105 JJI=ITE,1,-1 C105 WRITE(33,107) J JI , SRTSR( J JI , 1) ,SRTSR( J JI , 2) C ELSE C DO 106 JJI=ITE,1,-1 C106 WRITE(34,107) JJI ,SRTSR( JJI , 1) ,SRTSR( JJI ,2) C ENDIF 107 F0RMAT(I10,2F20.15) C PALPHA=SRTSR(ITE,2) C RH01=SRTSR(ITE,1) C SORTING BY THETA DO 300 JJI=1,ITE1


197 300 SRTS1(JJI)=SRT1(JJI,1) CALL SHELL1(ITE1,SRTS1) C FOR THE LOWER LIMIT P.VALUE IS SAVED IN ASCENDING ORDER. IF (1ST .EQ. 1) THEN DO 310 I=1,ITE1 DO 320 11=1, ITEl IF (SRTSl(I) .Eq. SRT1(I1,1)) THEN SRTSR1(I,1)=SRT1(I1,1) SRTSR1(I,2)=SRT1(I1,2) SRT1(I1,1)=0.D0 C FOR THE SAKE OF THE SAME THETA. GO TO 310 END IF 320 CONTINUE 310 CONTINUE C FOR THE UPPER LIMIT P_VALUE IS SAVED IN ASCENDING ORDER. C THAT IS, THETA IS SAVED IN DESCENDING ORDER. ELSE DO 330 1=1, ITEl DO 340 I1=ITE1,1,-1 IF (SRTSl(I) .EQ. SRTKII,!)) THEN SRTSR1(ITE1-I+1,1)=SRT1(I1,1) SRTSR1(ITE1-I+1,2)=SRT1(I1,2) SRT1(I1,1)=0.D0 C FOR THE SAKE OF THE SAME THETA. GO TO 330 ENDIF 340 CONTINUE 330 CONTINUE ENDIF DO 350 JJI=1,ITE1 IF (SRTSR1(JJI,1) .EQ. RHO) THEN ITE2=JJI GO TO 360 ENDIF 350 CONTINUE 360 CONTINUE C PRINT*, 'LIMIT (RHO) (ITE2) =',ITE2 ITE3=ITE2 DO 400 JJI=ITE2,1,-1 400 IF (SRTSR1(JJI,2) .GT. AALP) ITE3=JJI C400 IF (SRTSR1(JJI,2) .GT. ALPHA/2. DO) ITE3=JJI


      IF (ITE3 .NE. ITE2) THEN
      ITE4=ITE3-1
      ELSE
      ITE4=ITE3
      ENDIF
C     PRINT*, ' ITE3, ITE4 = ',ITE3,ITE4
C     FIND THE MAXIMUM P_VALUE WHICH CAN NOT EXCEED ALPHA/2 AND ITS THETA.
      TMAX=SRTSR1(ITE4,1)
      PRMAX=SRTSR1(ITE4,2)
      PALPHA=PRMAX
      RHO1=TMAX
C     PRINT*, 'CLOSER ALPHA/2=',SRTSR(ITE,1),SRTSR(ITE,2)
C     PRINT*, 'LIMIT(RHO) = ',TMAX,PRMAX
C     IF (IST .EQ.1) THEN
C     DO 420 JJI=ITE1,1,-1
C420  WRITE(37,107) JJI,SRTSR1(JJI,1),SRTSR1(JJI,2)
C     ELSE
C     DO 430 JJI=ITE1,1,-1
C430  WRITE(38,107) JJI,SRTSR1(JJI,1),SRTSR1(JJI,2)
C     ENDIF
   99 RETURN
      ENDIF
      GO TO 30
      END
C ********** SHELL SORT
C234567
      SUBROUTINE SHELL(N,ARR)
c     Sorts an array ARR of length N into ascending numerical order,
c     by the Shell-Mezgar algorithm (diminishing increment sort).
c     N is input; ARR is replaced on output by its sorted rearrangement.
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (ALN2I=1.D0/0.69314718,TINY=1.E-5)
      REAL*8 ARR(5500)
      LOGNB2=INT(ALOG(FLOAT(N))*ALN2I+TINY)


      M=N
      DO 12 NN=1,LOGNB2
      M=M/2
      K=N-M
      DO 11 J=1,K
      I=J
    3 CONTINUE
      L=I+M
      IF(ARR(L).LT.ARR(I)) THEN
      T=ARR(I)
      ARR(I)=ARR(L)
      ARR(L)=T
      I=I-M
      IF(I.GE.1)GO TO 3
      END IF
   11 CONTINUE
   12 CONTINUE
      RETURN
      END
C234567
      SUBROUTINE SHELL1(N,ARR)
c     Sorts an array ARR of length N into ascending numerical order,
c     by the Shell-Mezgar algorithm (diminishing increment sort).
c     N is input; ARR is replaced on output by its sorted rearrangement.
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (ALN2I=1.D0/0.69314718,TINY=1.E-5)
      PARAMETER (NNIT=1000)
      REAL*8 ARR(NNIT)
      LOGNB2=INT(ALOG(FLOAT(N))*ALN2I+TINY)
      M=N
      DO 12 NN=1,LOGNB2
      M=M/2
      K=N-M
      DO 11 J=1,K
      I=J
    3 CONTINUE
      L=I+M
      IF(ARR(L).LT.ARR(I)) THEN
      T=ARR(I)
      ARR(I)=ARR(L)
      ARR(L)=T
      I=I-M
      IF(I.GE.1)GO TO 3


      ENDIF
   11 CONTINUE
   12 CONTINUE
      RETURN
      END
      SUBROUTINE ITERA10(ALPHA,START,RHO1,ist,PALPHA,HYPD10)
C     GIVEN ALPHA, STARTING VALUE, ITERA ITERATES AND RETURNS RHO
C     A LOWER LIMIT.
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER(EPS=1.d-14)
      PARAMETER(NNIT=1000)
c     PARAMETER(EPS=1.d-6)
      integer itab(1000,4),infhyl(1000),infhyu(1000)
      double precision hyp(0:2000),ds(0:1,0:5500),lge,POBSH
      DOUBLE PRECISION hypd(1000,0:2000),POBSH1,PEXIMP,PEX
      DOUBLE PRECISION SRT(NNIT,2),SRTS(NNIT),SRTSR(NNIT,2)
      DOUBLE PRECISION SRT1(NNIT,2),SRTS1(NNIT),SRTSR1(NNIT,2)
      INTEGER J,SCD
      DOUBLE PRECISION CA(5500),K,FF,CA1(5500),CA2(5500),CC(5500)
      DOUBLE PRECISION HYPD2(270000,20),HYPD10(270000,1)
      COMMON/CI1/ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX
      COMMON/PARAM/CA,J,SCD,K,FF
      COMMON/ITN/SRT,SRTS,SRTSR,SRT1,SRTS1,SRTSR1
      PSI=START
      RHO=0.D0
      KL=0
      ITE=0
      ITE1=0
C     FOR STERNE'S CI, JCI=1 SHOULD BE ASSIGNED FOR ODR COMPUTATION,
C     WHENEVER WE CALL CNV2X2.
      JCI=1
   10 call cnv2x2(ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh,
     1 jci,PSI)
      JCIO=3
      CALL IMPROV(ik,itab,hypd,infhyl,infhyu,POBSH1,PEXIMP,


     1 PEX,JCIO,PSI)
C     COMPUTE MODIFIED EXACT ALTERNATIVE PROB DISTN.
      CALL COMPT10(CC,ATO,PT,HYPD10)
      RETURN
      END
C234567
      SUBROUTINE COMPT10(CC,ATO,PT,HYPD10)
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER(EPS=1.d-14)
      integer itab(1000,4),infhyl(1000),infhyu(1000)
      INTEGER INUM(270000,20),INUM1(270000,1)
      double precision hyp(0:2000),ds(0:1,0:5500),lge,POBSH
      DOUBLE PRECISION hypd(1000,0:2000),POBSH1,PEXIMP,PEX
      DOUBLE PRECISION HYPD1(270000,1),HYPD2(270000,20)
      DOUBLE PRECISION HYPD3(270000,1),HYPD10(270000,1)
C     HYPD10(270000,1) IS PR(T) FOR EACH TABLE.
      INTEGER J,SCD
      DOUBLE PRECISION CA(5500),K,FF,CC(5500)
      COMMON/CI1/ik,mxs,mxz,mxd,lge,itab,hyp,ds,ipar,k1,k2,ierr,pobsh
      COMMON/CI2/hypd,infhyl,infhyu,POBSH1,PEXIMP,PEX
      COMMON/PARAM/CA,J,SCD,K,FF
      COMMON /DKIM/ DENO,ITOT,ISUML,INUM,HYPD2,INUM1,HYPD1
c     COMMON /DKIM1/ POBSH1,ISUMTS,IK
      COMMON /DKIM1/ ISUMTS
      DO 300 I=1,ITOT
      HYPD10(I,1)=HYPD2(I,IK+1)/DENO
  300 CONTINUE
      RETURN
      END


C     EFFICIENT SCORE TEST STATISTICS
C234567
      SUBROUTINE CMHNN1(NROW,NCOL,NSTM,NIK,NJK,NTOT,MATRIX,CMH,G,
     1 JCIO,FIT)
C     TO COMPUTE THE EFFICIENT SCORE TEST STATISTIC.
C     MAX NO. OF STRATUM; 1000
C     NO. OF ROW AND COLUMN : 2, 2
C     COMMON IS USED FOR NIK, NJK, NTOT
C
      IMPLICIT REAL*8 (A-H,O-Z)
      DOUBLE PRECISION X,EV,CMH,G
      DOUBLE PRECISION FIT(1000,2,2)
      INTEGER MATRIX(1000,2,2),NIK(2,1000),NJK(2,1000),NTOT(1000)
C     COMMON /A1/ NIK, NJK, NTOT
C     DO 90 K=1,NSTM
C     WRITE(*,1000)K,MATRIX(K,1,1),MATRIX(K,1,2),MATRIX(K,2,1),
C    1 MATRIX(K,2,2)
C     WRITE(*,1001)K,NIK(1,K),NIK(2,K),NJK(1,K),NJK(2,K),NTOT(K)
C90   CONTINUE
C1000 FORMAT('DATA',5I10)
C1001 FORMAT(' TOTAL',6I10)
      X=0.D0
      G=0.D0
      DO 100 K=1,NSTM
      DO 110 I=1,NROW
      DO 110 J=1,NCOL
      IF (JCIO .EQ. 0) THEN
      EV=(NIK(I,K)*NJK(J,K))/DBLE(NTOT(K))
      ELSE
      EV=FIT(K,I,J)
      ENDIF
      X=X+((DBLE(MATRIX(K,I,J))-EV)**2)/EV
C     IF (MATRIX(K,I,J) .EQ. 0) GO TO 110
C     G=G+DBLE(MATRIX(K,I,J))*DLOG(DBLE(MATRIX(K,I,J))/EV)
  110 CONTINUE


  100 CONTINUE
      CMH=X
C     G=2.D0*G
C     WRITE(*,1010)CMH,G
C1010 FORMAT('CHI-SQUARED STATISTIC, G*2 =',2F12.7)
      RETURN
      END
c     ITERATIVE PROPORTIONAL FITTING ALGORITHM
C     (XZ,YZ) WITH N_{11K}=OR FOR K=1,IK, 1 OTHERWISE.
      SUBROUTINE IPF(PSI,IK,MATRIX,FIT)
C
      IMPLICIT REAL*8(A-H,O-Z)
C     MAX NO. OF STRATA=100
C     MAX NO. OF ITERATION=2000
      PARAMETER(EPS=1.D-8)
      DOUBLE PRECISION X(2,2,100),E(2000,2,2,100),EE(3,2,100)
      DOUBLE PRECISION XX(3,2,100)
      DOUBLE PRECISION ETH(100),FIT(1000,2,2)
      DOUBLE PRECISION THETA,PSI,XA,EA,P1,P2,P3
      DOUBLE PRECISION FSIK(2,100),FSJK(2,100)
      INTEGER MATRIX(1000,2,2)
      THETA=PSI
      DO 5 K=1,IK
      DO 5 I=1,2
      DO 5 J=1,2
    5 X(I,J,K)=DBLE(MATRIX(K,I,J))
      XA=0.D0
      DO 10 I=1,2
      DO 10 J=1,2
      DO 11 K=1,IK
   11 XA=XA+X(I,J,K)
      XX(1,I,J)=XA


      XA=0.D0
   10 CONTINUE
      XA=0.D0
      DO 20 I=1,2
      DO 20 K=1,IK
      DO 21 J=1,2
   21 XA=XA+X(I,J,K)
      XX(2,I,K)=XA
      XA=0.D0
   20 CONTINUE
      XA=0.D0
      DO 30 J=1,2
      DO 30 K=1,IK
      DO 31 I=1,2
   31 XA=XA+X(I,J,K)
      XX(3,J,K)=XA
      XA=0.D0
   30 CONTINUE
C
C     SET TO 1 FOR INITIAL VALUE OF EXPECTED VALUE
C
      DO 40 I=1,2
      DO 40 J=1,2
      DO 50 K=1,IK
      IF (I .EQ. 1 .AND. J .EQ. 1) THEN
        E(1,I,J,K)=THETA
      ELSE
        E(1,I,J,K)=1.D0
      END IF
   50 CONTINUE
C50   E(1,I,J,K)=1.D0
   40 CONTINUE
C
C     COMPUTATION ROUTINE
C
      N=1
C     KK=1
      KK=2
 2222 N=N+1
      IF (N .GT. 2000) THEN
      PRINT*,'INCREASE ARRAY E(2000,2,2,100) IN IPF'


      PRINT*,'IT DOES NOT CONVERGE WITHIN 2000 ITERATIONS.'
      GO TO 999
      END IF
      IF (KK .EQ. 1) GO TO 1000
      IF (KK .EQ. 2) GO TO 2000
      IF (KK .EQ. 3) GO TO 3000
C     STEP 1
 1000 EA=0.D0
      DO 45 I=1,2
      DO 45 J=1,2
      DO 46 K=1,IK
   46 EA=EA+E(N-1,I,J,K)
      EE(1,I,J)=EA
      EA=0.D0
   45 CONTINUE
      DO 100 I=1,2
      DO 100 J=1,2
      DO 100 K=1,IK
      E(N,I,J,K)=XX(1,I,J)*E(N-1,I,J,K)/EE(1,I,J)
  100 CONTINUE
      KK=KK+1
      GO TO 555
C
C     STEP 2
C
 2000 EA=0.D0
      DO 57 I=1,2
      DO 57 K=1,IK
      DO 51 J=1,2
   51 EA=EA+E(N-1,I,J,K)
      EE(2,I,K)=EA
      EA=0.D0
   57 CONTINUE
      DO 101 I=1,2
      DO 101 J=1,2
      DO 101 K=1,IK
      E(N,I,J,K)=XX(2,I,K)*E(N-1,I,J,K)/EE(2,I,K)
  101 CONTINUE
      KK=KK+1
      GO TO 555
C
C     STEP 3
C
 3000 EA=0.D0


      DO 60 J=1,2
      DO 60 K=1,IK
      DO 61 I=1,2
   61 EA=EA+E(N-1,I,J,K)
      EE(3,J,K)=EA
      EA=0.D0
   60 CONTINUE
      DO 102 I=1,2
      DO 102 J=1,2
      DO 102 K=1,IK
      E(N,I,J,K)=XX(3,J,K)*E(N-1,I,J,K)/EE(3,J,K)
  102 CONTINUE
C     KK=1
      KK=2
      GO TO 555
C
C     CHECK CONVERGENCE
C
  555 DO 103 I=1,2
      DO 103 J=1,2
      DO 103 K=1,IK
      N1=N-1
      N2=N-2
C     KEEP BOTH INDICES AT 1 OR ABOVE (N2 WOULD BE 0 WHEN N=2)
      IF (N1 .LT. 1) N1=1
      IF (N2 .LT. 1) N2=1
      P1=DABS(E(N,I,J,K)-E(N1,I,J,K))
      P2=DABS(E(N,I,J,K)-E(N2,I,J,K))
      P3=DABS(E(N1,I,J,K)-E(N2,I,J,K))
      IF (P1 .GT. EPS .OR. P2 .GT. EPS .OR. P3 .GT. EPS)
     1   GO TO 2222
  103 CONTINUE
      GO TO 1111
C
C     PRINT
C
 1111 CONTINUE
      DO 666 K=1,IK
      DO 667 I=1,2
      DO 667 J=1,2
  667 FIT(K,I,J)=E(N,I,J,K)
  666 CONTINUE
C     WRITE(*,131)
C     WRITE(*,123) (((X(I,J,K),K=1,IK),J=1,2),I=1,2)


C     WRITE(*,132)
C     DO 157 I1=1,3
C     WRITE(*,124)((XX(I1,J,K),K=1,IK),J=1,2)
C157  CONTINUE
C     WRITE(*,133)
C     WRITE(*,134)
C     DO 77 JJ=1,N
C     ESTIMATED ORS FOR EACH STRATUM
C     DO 78 K=1,IK
C78   ETH(K)=E(JJ,1,1,K)*E(JJ,2,2,K)/(E(JJ,1,2,K)*E(JJ,2,1,K))
C     NN=JJ-1
C     WRITE(*,125)NN,(((E(JJ,I,J,K),K=1,IK),J=1,2),I=1,2),
C    1   (ETH(K),K=1,IK)
C77   CONTINUE
C     WRITE(*,135)N-1
C     CHECK IF OBSERVED AND FITTED FREQUENCIES MATCH.
C     XX(2,I,K) : NIK(I,K) <->FSIK(I,K)
C     XX(3,J,K) : NJK(J,K) <->FSJK(J,K)
C     FSIK(I,K),FSJK(J,K)
      DO 670 K=1,IK
      DO 680 I=1,2
  680 FSIK(I,K)=0.D0
      DO 690 J=1,2
  690 FSJK(J,K)=0.D0
  670 CONTINUE
      DO 700 K=1,IK
      DO 710 I=1,2
      DO 710 J=1,2
  710 FSIK(I,K)=FSIK(I,K)+FIT(K,I,J)
      DO 720 J=1,2
      DO 720 I=1,2
  720 FSJK(J,K)=FSJK(J,K)+FIT(K,I,J)
  700 CONTINUE
C     WRITE(*,140)
C     WRITE(*,124)((FSIK(I,K),K=1,IK),I=1,2)
C     WRITE(*,141)
C     WRITE(*,124)((FSJK(J,K),K=1,IK),J=1,2)
  140 FORMAT(/,10X,'X-Z MARGINAL DATA FOR FITTED VALUES')


  141 FORMAT(/,10X,'Y-Z MARGINAL DATA FOR FITTED VALUES')
      DO 730 K=1,IK
      DO 730 I=1,2
      IF (DABS(XX(2,I,K)-FSIK(I,K)) .GT. EPS) THEN
        PRINT*,I,K,XX(2,I,K),FSIK(I,K),XX(2,I,K)-FSIK(I,K)
        PRINT*,I,K,' OBSERVED AND FITTED FREQUENCIES DOES NOT MATCH '
        PRINT*,' IN X-Z MARGINAL TABLE.'
      END IF
  730 CONTINUE
      DO 740 K=1,IK
      DO 740 J=1,2
      IF (DABS(XX(3,J,K)-FSJK(J,K)) .GT. EPS) THEN
        PRINT*,J,K,XX(3,J,K),FSJK(J,K),XX(3,J,K)-FSJK(J,K)
        PRINT*,J,K,' OBSERVED AND FITTED FREQUENCIES DOES NOT MATCH '
        PRINT*,' IN Y-Z MARGINAL TABLE.'
      ENDIF
  740 CONTINUE
  123 FORMAT(10(8F9.3,/))
  124 FORMAT(20(5F9.3,/))
  125 FORMAT(I3,1X,10(8F9.3,/))
  131 FORMAT(10X,'OUTPUT',/,10X,'DATA')
  132 FORMAT(/,10X,'MARGINAL DATA FOR EACH STEP')
  133 FORMAT(/,10X,'EXPECTED VALUE IN EACH ITERATION')
  134 FORMAT(6X,' M(111) M(112) M(121) M(122) M(211) M(212) ',
     1 ' M(221) M(222) OR1 OR2')
  135 FORMAT(10X,'CONVERGENCE IN ',I5,' ITERATIONS.')
  999 RETURN
      END
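
The fitting cycle used by subroutine IPF can be hard to follow through the GO TO logic, so here is a compact sketch of the same idea. It is written in free-form Fortran rather than the fixed form of the listing, and the program name, the two-stratum data, and the value of psi are all hypothetical. It assumes, as the listing does, that only the X-Z and Y-Z margins are matched and that the starting table has psi in cell (1,1,k) and 1 elsewhere; the convergence test (largest change over one full cycle) is a simplification, not the author's three-way comparison.

```fortran
! Sketch only: iterative proportional fitting of the X-Z and Y-Z margins
! of a 2 x 2 x K table. All names and data here are illustrative assumptions.
program ipf_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nk = 2              ! number of strata (assumed)
  integer, parameter :: maxit = 2000        ! same iteration cap as the listing
  real(dp), parameter :: eps = 1.0e-8_dp
  real(dp) :: x(2,2,nk), e(2,2,nk), eold(2,2,nk)
  real(dp) :: psi, s
  integer :: i, j, k, it

  ! Hypothetical observed counts x(i,j,k), stored column by column.
  x(:,:,1) = reshape([10.0_dp, 4.0_dp, 5.0_dp, 9.0_dp], [2,2])
  x(:,:,2) = reshape([ 7.0_dp, 6.0_dp, 3.0_dp, 8.0_dp], [2,2])
  psi = 2.0_dp                              ! assumed common odds ratio

  ! Starting values: psi in cell (1,1,k), 1 in every other cell.
  e = 1.0_dp
  e(1,1,:) = psi

  do it = 1, maxit
     eold = e
     ! Rescale so the fitted X-Z margin (sum over j) equals the observed one.
     do k = 1, nk
        do i = 1, 2
           s = sum(e(i,:,k))
           e(i,:,k) = e(i,:,k) * sum(x(i,:,k)) / s
        end do
     end do
     ! Rescale so the fitted Y-Z margin (sum over i) equals the observed one.
     do k = 1, nk
        do j = 1, 2
           s = sum(e(:,j,k))
           e(:,j,k) = e(:,j,k) * sum(x(:,j,k)) / s
        end do
     end do
     ! Stop when a full cycle changes no cell by more than eps.
     if (maxval(abs(e - eold)) <= eps) exit
  end do

  do k = 1, nk
     print '(a,i2,a,f8.4)', 'stratum', k, '  fitted odds ratio =', &
          e(1,1,k)*e(2,2,k) / (e(1,2,k)*e(2,1,k))
  end do
end program ipf_sketch
```

Row and column rescaling within a stratum leaves that stratum's odds ratio unchanged, which is why seeding cell (1,1,k) with psi yields fitted values whose conditional odds ratio stays at psi while the margins are matched.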


APPENDIX B
SOURCE CODE FOR SIMULATION

Following are program structure and part of FORTRAN source code for approximating exact inference about conditional association in I x J x K contingency tables. It shows how the estimate of the ordinary or modified exact P-value for six tests can be constructed.

B.1 Program Structure

Important parameters are defined as follows.

NROW : Integer : input : number of rows in the observed matrix
NCOL : Integer : input : number of columns in the observed matrix
NSTM : Integer : input : number of strata in the observed matrix
NROWT1 : Integer array(50) : output : vector of row totals for the observed matrix at each stratum
NCOLT1 : Integer array(50) : output : vector of column totals for the observed matrix at each stratum
NROWT : Integer array(20,50) : output : NROWT1 is combined for all the strata
NCOLT : Integer array(20,50) : output : NCOLT1 is combined for all the strata
NTOT : Integer array(20) : output : vector of stratum totals for the observed table
JWORK : Integer array(50) : output : workspace
MATRIX1 : Integer array(50,50) : output : the randomly generated two-way table at each stratum
MX : Integer array(20,50,50) : input : the observed three-way table


MATRIX : Integer array(20,50,50) : output : the randomly generated three-way table
NCODE : Integer : input : select the type of tests of conditional independence
NRCM : Integer : input : (NROW-1)x(NCOL-1)
IDUM : Negative Integer : input : Seed
CMH : double precision : output : score statistic

Important subroutines are defined as follows.

Subroutine RCONT2(NROW,NCOL,NSTM,NROWT1,NCOLT1,JWORK,MATRIX1,KEY,IFAULT,IDUM) : Generate two-way random tables with given marginal totals
Subroutine COMPTOT(K,NROW,NCOL,MX,NROWT,NCOLT,NTOT) : Compute row, column, and stratum totals
Double precision Function RAN1(IDUM) : Uniform random number generator, which is used in Subroutine RCONT2
Subroutine GETWTS(NROW,NCOL,WTR,WTC,NCODE) : Get scores if ordinal variable is used
Subroutine CMHNN(NRCM,NROW,NCOL,NSTM,MATRIX,CMH) : Compute score statistic assuming no three-factor interaction when both X and Y are nominal
Subroutine CMHNO(NROW,NCOL,NSTM,MATRIX,CMH) : Compute score statistic assuming no three-factor interaction when X is nominal,


and Y is ordinal
Subroutine CMHOO(NROW,NCOL,NSTM,MATRIX,CMH) : Compute score statistic assuming no three-factor interaction when both X and Y are ordinal
Subroutine CMHNN1(NRCM,NROW,NCOL,NSTM,MATRIX,CMH) : Compute score statistic permitting three-factor interaction when both X and Y are nominal
Subroutine CMHNO1(NROW,NCOL,NSTM,MATRIX,CMH) : Compute score statistic permitting three-factor interaction when X is nominal, and Y is ordinal
Subroutine CMHOO1(NROW,NCOL,NSTM,MATRIX,CMH) : Compute score statistic permitting three-factor interaction when both X and Y are ordinal

Other subroutines are involved to compute inverse matrix, matrix multiplication, and Kronecker product multiplication.

B.2 Part of Source Code

      PROGRAM THREEWAY
      PARAMETER(lda=250)
      PARAMETER(lda1=15)
      PARAMETER(epsilon=1.0E-14)
      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 D(50,50),D1(lda),VK(20,lda,lda),V(lda,lda)
      REAL*8 DIV(lda),det(2)


DIMENSION NR0WT(20,50) ,NC0LT(20 , 50) .MATRIX (20, 50, 50) DIMENSION JW0RK(50) ,MX(20,50,50) ,NT0T(20) ,NNT0T(20) DIMENSION NR0WTK50) ,NC0LT1(50) ,MATRIX1 (50 , 50) DIMENSION NR0WT2(50) ,NC0LT2(50) DIMENSION NIK(50,20),NJK(50,20) C DIMENSION NIJ(50,50) REAL*8 FACT(25001) ,WTR(50) ,WTC(50) LOGICAL KEY LOGICAL LSP,LSM LOGICAL KIM C LOGICAL KEYl COMMON /B/ NROWM,NCOLM,FACT C COMMON /Bl/ NIK,NJK COMMON /Al/ NIK,NJK,NTOT COMMON /A3/ D,Dl,VK,V,DIV,det COMMON /A4/ WTR,WTC C COMMON /TEMPRY/ HOP C DATA MAXTOT /25000/ C C input Simulatioii Informati on ******:(=!(=*** C WRITE(*, 10000) 10000 F0RMAT(3(/) ,T12, '***** LxL5 (version 8.0 — 4/16/94) 1 /,T12,'SIX EFFICIENT SCORE STATISTICS',/, 2 T12,'F0R TESTING CONDITIONAL INDEPENDENCE',/, 3 T12 , ' OF THREE-WAY TABLES . ' , /) write(* , 10001) 10001 F0RMAT(T12,' THIS PROGRAM CALCULATES',/, 1 T12, 'PRECISE ESTIMATES AND CONFIDENCE INTERVALS',/, 2 T12,'F0R THE MODIFIED EXACT P-VALUES . ' , / , 3 T12,'THEY UTILIZE BOTH SCORE STATISTICS.',/) WRITE (*,45) 45 FORMAT (/,/,/, 'ENTER NUMBER OF STRATUMS: ') READ(*,*) NSTM WRITE(*,50)


50 FORMAT (/, 'ENTER NUMBER OF ROWS AND COLS: ') READ(*,*) NROW,NCOL 52 WRITE(*,55) 55 FORMAT (/, 'ENTER CODE FOR TESTING: ', 1 1,1,' ASSUMING NO-THREE FACTOR INTERACTION :', 1 ! J ,Â’ 1 : NOMINAL BY NOMINAL' , 2 /,' 2 : NOMINAL BY ORDINAL', 3 ! 3 : ORDINAL BY ORDINAL ' , / 4 1,1,' W/0 ASSUMING NO-THREE FACTOR INTERACTION : ' , 1 />/.' 4 : NOMINAL BY NOMINAL' , 2 ! ,' 5 : NOMINAL BY ORDINAL', 3 ! ,' 6 : ORDINAL BY ORDINAL',/) READ(*,*) NCODE IF (NCODE. EQ.l .OR. NCODE. EQ. 2 .OR. NCODE. EQ. 3 .OR. 1 NCODE. EQ. 4 .OR. NCODE. EQ. 5 .OR. NCODE. EQ. 6) GO TO 57 PRINT*, 'PLEASE ENTER THE NUMBER (1 TO 6).' GO TO 52 57 IF (NCODE .EQ. 1 .OR. NCODE .EQ. 4) GO TO 60 CALL GETWTS (NROW , NCOL , WTR , WTC , NCODE) 60 WRITE(*,75) 75 FORMAT (/, 'ENTER OBSERVED TABLES FOR EACH STRATUM (ROW BY ROW)-') DO 5 K=1,NSTM PRINT* ,' STRATUM NUMBER =' ,K DO 10 1=1, NROW READ(*,*) (MX(K,I, J) , J=1,NC0L) 10 CONTINUE 5 CONTINUE WRITE(*,80) 80 FORMAT(/, 'ENTER NUMBER OF SIMULATION:') READ(*,*) NSIM WRITE(*,85) 85 FORMAT (/, 'ENTER THE SEED (INTEGER) :') READ(*,*) ISEED C DO 20 K=1,NSTM CALL COMPTOT (K , NROW , NCOL , MX , NROWT , NCOLT , NTOT)


CONTINUE DO 23 K=1,NSTM NNTOT(K)=NTOT(K) CALL SHELL (NSTM.NNTOT) NNTOTAL=NNTOT (NSTM) PRINT* PRINT*, 'MAX OF NTOT =',NNTOTAL PRINT* PRINT*, 'ROW TOTAL' DO 25 K=1,NSTM PRINT*, K,' : ' ,(NROWT(K,I),I=l,NROW) CONTINUE PRINT* PRINT* ,' COLUMN TOTAL' DO 27 K=1,NSTM PRINT*, K,' : ',(NCOLT(K,I),I=l,NCOL) CONTINUE PRINT* PRINT* , ' STRATUM TOTAL ' PRINT* , (NTOT(K) ,K=1 ,NSTM) PRINT* PRINT* CALL MARTAB (NROW , NCOL , NSTM , NROWT , NCOLT , NIK , N JK) NRCM=(NROW-l)*(NCOL-l) CALL CPOBS (NROW , NCOL , NSTM ,MX ,NNTOTAL , P_OBS) POBS=P_OBS IF (NCODE.LE.3) THEN IF (NCODE.Eq.l) THEN CALL CMHNN (NRCM , NROW , NCOL , NSTM , MX , CMH) CMHOBS=CMH PRINT*, 'CMH, CMHOBS=' ,CMH,CMHOBS CALL CMHNN 1 (NROW , NCOL , NSTM , MX , CMH) CMHOBSl=CMH PRINT*, 'CMH,CMHOBSl=' ,CMH,CMHOBSl ELSE IF (NC0DE.EQ.2) THEN


215 CALL CMHNO(NROW,NCOL,NSTM,MX,CMH) CMHOBS=CMH CALL CMHNN (NRCM , NROW , NCOL , NSTM ,MX , CMH) CMHOBSl=CMH ELSE CALL CMHOQ (NROW, NCOL, NSTM, MX, CMH) CMHOBS=CMH CALL CMHNO (NROW, NCOL, NSTM, MX, CMH) CMHOBSl=CMH END IF END IF ELSE IF (NCODE.EQ.4) THEN CALL CMHNN 1 (NROW , NCOL , NSTM , MX , CMH) CMHOBS=CMH C NO MORE GENERAL STATISTIC FOR T' ; WILL USE P({N}) ELSE IF (NC0DE.EQ.5) THEN CALL CMHNO 1 (NROW , NCOL , NSTM , MX , CMH) CMHOBS=CMH CALL CMHNN 1 (NROW , NCOL , NSTM , MX , CMH) CMHOBSl=CMH ELSE CALL CMHOO 1 (NROW , NCOL , NSTM , MX , CMH) CMHOBS=CMH CALL CMHNO 1 (NROW , NCOL , NSTM , MX , CMH) CMHOBSl=CMH END IF ENDIF END IF PRINT* PRINT*, 'THE OBSERVED PRIMARY SCORE STATISTIC =',CMHOBS PRINT*, 'THE OBSERVED SECONDRY SCORE STATISTIC =',CMHOBSl PRINT*, 'PROB OF OBSERVED TABLE = ' , SNGL (FOBS) PRINT* WRITE (*,28) ^8 FORMAT (/, 'PRINT EACH RANDOM TABLES ? (Y=1,N=0)') READ(*,*) NDATA begin simulation **********************:*: S=O.DO


216 ITC0UNT=0 120 PRINT* PRINT*, 'RANDOM TABLES WITH FIXED MARGINS' PRINT* ITC0UNT=ITC0UNT+1 IDUM=(-1)*ISEED DO 100 IJ=1,NSIM SH0P=1 .DO IF (NDATA .NE. 1) GO TO 103 PRINT*, 'SIMULATION NO = ',IJ 103 DO 105 K=1,NSTM KK=K DO 90 L=1,NR0W NROWTl (L) =NROWT(KK , L) 90 CONTINUE DO 92 L=1,NC0L NC0LT1(L)=NC0LT(KK,L) 92 CONTINUE CALL RC0NT2 ( IJ , KK , NROW , NCOL , NSTM , NROWTl , NCOLTl , JWORK , 1 MATRIX 1 , KEY , IFAULT , NNTOTAL , ISEED , IDUM) C PRINT*, 'PROB OF RANDOM TABLE =',HOP SHOP=SHOP*HOP DO 95 L=1,NR0W DO 96 M=1,NC0L MATRIX (KK , L ,M) =MATRIX1 (L , M) 96 CONTINUE 95 CONTINUE IF (NDATA .NE. 1) GO TO 97 DO 98 1=1, NROW PRINT* , (MATRIX (KK , I , J) , J=1 , NCOL) 98 CONTINUE PRINT* 97 IF (IFAULT .NE. 0) THEN


217 105 PRINT* , ' IFAULT GO TO 1000 END IF CONTINUE = \ IFAULT, 'KEY = ' ,KEY C C C C C CALL COMPMAR (NROW , NCOL , NSTM , MATRIX , NI J , KEYl ) POBS : PROB OF OBSERVED TABLE, COMPUTED FROM CPOBS FOR TOTAL STRATUM HOP : PROB OF RANDOM TABLE, COMPUTED FROM RC0NT2 FOR ONE STRATUM SHOP : PROB OF RANDOM TABLE , COMPUTED FROM RC0NT2 FOR TOTAL STRATUM IF (NC0DE.LE.3) THEN IF (NCODE.EQ.l) THEN CALL CMHNN (NRCM , NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN=CMH CALL CMHNN 1 (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN1=CMH IF (CMHRAN .GT. CMHOBS .OR. CMHRAN .EQ. CMHOBS .AND. CMHRAN 1 .GE. CMHOBS 1) 1 CMHRAN .EQ. CMHOBS .AND. SNGL(SHOP) .LE. SNGL(POBS)) ! S=S+1 IF (CMHRAN .GE. CMHOBS) S=S+1 ELSE IF (NC0DE.EQ.2) THEN CALL CMHNO (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN=CMH CALL CMHNN (NRCM , NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN 1=CMH IF (CMHRAN .GT. CMHOBS .OR. CMHRAN .EQ. CMHOBS .AND. CMHRANl .GE. CMHOBSl) CMHRAN .EQ. CMHOBS .AND. SNGL(SHOP) .LE. SNGL(POBS)) S=S+1 IF (CMHRAN .GE. CMHOBS) S=S+1 ELSE CALL CMHOO (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN=CMH CALL CMHNO (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN 1=CMH IF (CMHRAN .GT. CMHOBS .OR. CMHRAN .EQ. CMHOBS .AND. CMHRANl .GE. CMHOBSl) CMHRAN .EQ. CMHOBS .AND. SNGL(SHOP) .LE. SNGL(POBS))


218 2 S=S+1 C IF (CMHRAN .GE. CMHOBS) S=S+1 ENDIF END IF ELSE IF (NCODE.EQ.4) THEN CALL CMHNN 1 (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN=CMH C NO MORE GENERAL STATISTIC FOR T^ USE P({Z}) IF (CMHRAN .GT. CMHOBS .OR. 1 CMHRAN .EQ. CMHOBS .AND. SNGL(SHOP) .LE. SNGL(POBS)) 2 S=S+1 C IF (CMHRAN .GE. CMHOBS) S=S+1 ELSE IF (NC0DE.EQ.5) THEN CALL CMHNO 1 (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN=CMH CALL CMHNN 1 (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN 1=CMH IF (CMHRAN .GT. CMHOBS .OR. CMHRAN .EQ. CMHOBS .AND. CMHRAN 1 .GE. CMHOBS 1) CMHRAN .EQ. CMHOBS .AND. SNGL(SHOP) .LE. SNGL(POBS)) S=S+1 IF (CMHRAN .GE. CMHOBS) S=S+1 ELSE CALL CMHOO 1 (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN=CMH CALL CMHNO 1 (NROW , NCOL , NSTM , MATRIX , CMH) CMHRAN 1=CMH IF (CMHRAN .GT. CMHOBS .OR. CMHRAN .EQ. CMHOBS .AND. CMHRANl .GE. CMHOBS 1) CMHRAN .EQ. CMHOBS .AND. SNGL(SHOP) .LE. SNGL(POBS)) S=S+1 IF (CMHRAN .GE. CMHOBS) S=S+1 ENDIF ENDIF ENDIF


      IF (NDATA .NE. 1) GO TO 100
      PRINT*,'THE PRIMARY SCORE STATISTIC FROM RANDOM TABLE =',CMHRAN
      PRINT*,'THE SECONDARY SCORE STATISTIC FROM RANDOM TABLE =',CMHRAN1
      PRINT*,'PROB OF RANDOM TABLE = ',SNGL(SHOP)
      PRINT*,IJ,CMHRAN,CMHOBS,SNGL(S),SNGL(SHOP)
      PRINT*
  100 CONTINUE
      ITNSIM=NSIM*ITCOUNT
      P_EXACT=S/ITNSIM
      VAR_P=P_EXACT*(1.D0-P_EXACT)/ITNSIM
      CI1=P_EXACT-1.96D0*DSQRT(VAR_P)
      CI2=P_EXACT+1.96D0*DSQRT(VAR_P)
      STD_P=DSQRT(VAR_P)
      PRINT*
      PRINT*,'UPDATED ESTIMATE OF P_VALUE = ',SNGL(P_EXACT)
      PRINT*,'UPDATED ESTIMATE OF VAR_P =',SNGL(VAR_P)
      PRINT*,'UPDATED ESTIMATE OF STD_P = ',SNGL(STD_P)
      PRINT*
      PRINT*,'A 95% CONFIDENCE INTERVAL FOR UPDATED ESTIMATE OF P : '
      PRINT*,' ',SNGL(CI1),SNGL(CI2)
      WRITE(*,125) NSIM,ISEED
  125 FORMAT(/,I8,' TABLES SAMPLED WITH CURRENT STARTING SEED ',I8)
      WRITE(*,126) ITNSIM
  126 FORMAT(I8,' TABLES SAMPLED TOTALLY')
C**************** end of simulation ****************
      WRITE(*,110)
  110 FORMAT(/,'DO YOU WANT TO SAMPLE MORE TABLES ? (Y=1,N=0) :')
      READ(*,*) MORET
      IF (MORET .EQ. 1) THEN
        WRITE(*,115)
  115   FORMAT(/,'PLEASE REENTER THE SEED (INTEGER) :')
        READ(*,*) ISEED
        GO TO 120
      ENDIF


      PRINT*
      PRINT*,' END '
 1000 STOP
      END
C**************** end of main program ****************
C234567
      SUBROUTINE RCONT2(IJ,KK,NROW,NCOL,NSTM,NROWT1,NCOLT1,JWORK,
     1   MATRIX1,KEY,IFAULT,NNTOTAL,ISEED,IDUM)
C
C     ALGORITHM AS 159 APPL. STATIST. (1981) VOL. 30, NO. 1
C     GENERATE RANDOM TWO-WAY TABLE WITH GIVEN MARGINAL TOTALS
C
C     CODES ARE MODIFIED BY DONGUK KIM TO BE USED FOR THE GENERATION
C     OF THREE WAY TABLES, AND DEXP,DBLE,AND DLOG ARE USED INSTEAD OF
C     EXP, FLOAT, AND LOG.
C     NNTOTAL IS THE MAXIMUM OF NTOTAL FOR THE STRATUM
C     AND USED IN COMPUTING LOG-FACTORIALS.
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION NROWT1(50),NCOLT1(50),MATRIX1(50,50)
      DIMENSION JWORK(50)
      DIMENSION NROWT2(50),NCOLT2(50)
      REAL*8 FACT(25001)
      LOGICAL KEY
      LOGICAL LSP,LSM
      COMMON /B/ NROWM,NCOLM,FACT
C
      COMMON /TEMPRY/ HOP
C
      DATA MAXTOT /25000/
C
C     IDUM=(KK+(IJ-1)*NSTM)*(-1)-ISEED
C     PRINT*,'IDUM=',IDUM
C
      IFAULT=0
      DO 100 I=1,NROW
      IF (NROWT1(I) .LE. 0) GOTO 214
  100 CONTINUE


      NTOTAL=0
      DO 101 J=1,NCOL
      IF (NCOLT1(J) .LE. 0) GOTO 215
      NTOTAL=NTOTAL+NCOLT1(J)
  101 CONTINUE
      IF (NTOTAL .GT. MAXTOT) GOTO 216
      IF (KEY) GOTO 103
C
C     SET KEY FOR SUBSEQUENT CALLS
C
      KEY=.TRUE.
C
C     CHECK FOR FAULTS AND PREPARE FOR FUTURE CALLS
C
      IF (NROW .LE. 1) GOTO 212
      IF (NCOL .LE. 1) GOTO 213
      NROWM=NROW-1
      NCOLM=NCOL-1
C
C     CALCULATE LOG-FACTORIALS
C
      X=0.D0
      FACT(1)=0.D0
      DO 102 I=1,NNTOTAL
      X=X+DLOG(DBLE(I))
      FACT(I+1)=X
  102 CONTINUE
c     print*,'I factorial'
c     do 90 i=1,20
c     print*,i,dexp(fact(i))
c90   continue
C
C
C     CONSTRUCT RANDOM MATRIX
C
C
  103 DO 105 J=1,NCOLM
  105 JWORK(J)=NCOLT1(J)
      JC=NTOTAL
C
      HOP=1.D0


C
      DO 190 L=1,NROWM
      NROWTL=NROWT1(L)
      IA=NROWTL
      IC=JC
      JC=JC-NROWTL
      DO 180 M=1,NCOLM
      ID=JWORK(M)
      IE=IC
      IC=IC-ID
      IB=IE-IA
      II=IB-ID
C
C     TEST FOR ZERO ENTRIES IN MATRIX
C
      IF (IE .NE. 0) GOTO 130
      DO 121 J=M,NCOL
  121 MATRIX1(L,J)=0
      GOTO 190
C
C     GENERATE PSEUDO-RANDOM NUMBER
C
  130 RAND=RAN1(IDUM)
C
C     COMPUTE CONDITIONAL EXPECTED VALUE OF MATRIX(L,M)
C
  131 NLM=DBLE(IA*ID)/DBLE(IE)+0.5
      IAP=IA+1
      IDP=ID+1
      IGP=IDP-NLM
      IHP=IAP-NLM
      NLMP=NLM+1
      IIP=II+NLMP
      X=DEXP(FACT(IAP)+FACT(IB+1)+FACT(IC+1)+FACT(IDP)-
     1   FACT(IE+1)-FACT(NLMP)-FACT(IGP)-FACT(IHP)-FACT(IIP))
      IF (X .GE. RAND) GOTO 160
      SUMPRB=X
      Y=X
      NLL=NLM
      LSP=.FALSE.
      LSM=.FALSE.
C
C     INCREMENT ENTRY IN ROW L, COLUMN M
C
  140 J=(ID-NLM)*(IA-NLM)


      IF (J .EQ. 0) GOTO 156
      NLM=NLM+1
      X=X*DBLE(J)/DBLE(NLM*(II+NLM))
      SUMPRB=SUMPRB+X
      IF (SUMPRB .GE. RAND) GOTO 160
  150 IF (LSM) GOTO 155
C
C     DECREMENT ENTRY IN ROW L, COLUMN M
C
      J=NLL*(II+NLL)
      IF (J .EQ. 0) GOTO 154
      NLL=NLL-1
      Y=Y*DBLE(J)/DBLE((ID-NLL)*(IA-NLL))
      SUMPRB=SUMPRB+Y
      IF (SUMPRB .GE. RAND) GOTO 159
      IF (.NOT.LSP) GOTO 140
      GOTO 150
  154 LSM=.TRUE.
  155 IF (.NOT.LSP) GOTO 140
      RAND=SUMPRB*RAN1(IDUM)
      GOTO 131
  156 LSP=.TRUE.
      GOTO 150
  159 NLM=NLL
C
      HOP=HOP*Y
      GOTO 161
  160 HOP=HOP*X
  161 MATRIX1(L,M)=NLM
C
C160  MATRIX1(L,M)=NLM
      IA=IA-NLM
      JWORK(M)=JWORK(M)-NLM
  180 CONTINUE
      MATRIX1(L,NCOL)=IA
  190 CONTINUE
C
C     COMPUTE ENTRIES IN LAST ROW OF MATRIX
C
      DO 192 M=1,NCOLM
  192 MATRIX1(NROW,M)=JWORK(M)
      MATRIX1(NROW,NCOL)=IB-MATRIX1(NROW,NCOLM)
C     PRINT*,'HOP = ',HOP


C
C     CHECK THE RANDOM TABLES SATISFY FIXED ROW TOTALS AND COLUMN TOTALS.
C
      CALL COMPTOT1(NROW,NCOL,MATRIX1,NROWT2,NCOLT2)
      DO 195 M=1,NROW
      IF (NROWT2(M) .NE. NROWT1(M)) GO TO 200
  195 CONTINUE
      DO 197 M=1,NCOL
      IF (NCOLT2(M) .NE. NCOLT1(M)) GO TO 202
  197 CONTINUE
      RETURN
C
C     SET FAULTS
C
  212 IFAULT=1
      RETURN
  213 IFAULT=2
      RETURN
  214 IFAULT=3
      RETURN
  215 IFAULT=4
      RETURN
  216 IFAULT=5
      RETURN
  200 PRINT*,M,'th ROW TOTAL IS WRONG.'
      RETURN
  202 PRINT*,M,'th COLUMN TOTAL IS WRONG.'
      RETURN
      END
*     Uniform random generator
      DOUBLE PRECISION FUNCTION RAN1(IDUM)
      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 R(97)
      PARAMETER (M1=259200,IA1=7141,IC1=54773,RM1=3.8580247E-6)
      PARAMETER (M2=134456,IA2=8121,IC2=28411,RM2=7.4373773E-6)


      PARAMETER (M3=243000,IA3=4561,IC3=51349)
      DATA IFF /0/
      IF (IDUM.LT.0.OR.IFF.EQ.0) THEN
        IFF=1
        IX1=MOD(IC1-IDUM,M1)
        IX1=MOD(IA1*IX1+IC1,M1)
        IX2=MOD(IX1,M2)
        IX1=MOD(IA1*IX1+IC1,M1)
        IX3=MOD(IX1,M3)
        DO 11 J=1,97
          IX1=MOD(IA1*IX1+IC1,M1)
          IX2=MOD(IA2*IX2+IC2,M2)
          R(J)=(DBLE(IX1)+DBLE(IX2)*RM2)*RM1
   11   CONTINUE
        IDUM=1
      END IF
      IX1=MOD(IA1*IX1+IC1,M1)
      IX2=MOD(IA2*IX2+IC2,M2)
      IX3=MOD(IA3*IX3+IC3,M3)
      J=1+(97*IX3)/M3
      IF(J.GT.97.OR.J.LT.1)PAUSE
      RAN1=R(J)
      R(J)=(DBLE(IX1)+DBLE(IX2)*RM2)*RM1
      RETURN
      END
C234567
      SUBROUTINE GETWTS(NROW,NCOL,WTR,WTC,NCODE)
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER NROW,NCOL
      REAL*8 WTR(50),WTC(50)
      IF (NCODE .EQ. 2 .OR. NCODE .EQ. 5) GO TO 105
      WRITE(*,100)
  100 FORMAT(/,'ENTER ROW SCORES: ')
      READ(*,*) (WTR(I),I=1,NROW)
  105 WRITE(*,110)
  110 FORMAT(/,'ENTER COLUMN SCORES: ')
      READ(*,*) (WTC(J),J=1,NCOL)
      RETURN
      END
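
The simulation driver listed before these routines counts how many random tables are at least as extreme as the observed one and turns that count into an estimated P-value with a normal-approximation interval. The free-form Fortran sketch below illustrates that bookkeeping only. The program name and the five pairs of simulated statistics are hypothetical, and the tie-breaking rule shown (count a table when its primary statistic exceeds the observed value, or ties it while the secondary statistic is at least the observed secondary value) is our reading of the driver's IF statements, which are partly garbled in this copy.

```fortran
! Sketch only: Monte Carlo estimate of a modified P-value with a
! secondary statistic used to break ties. All data are made up.
program mc_pvalue_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nsim = 5
  ! Hypothetical primary (t) and secondary (t1) statistics of random tables.
  real(dp), parameter :: t(nsim)  = [3.1_dp, 4.0_dp, 4.0_dp, 5.2_dp, 1.7_dp]
  real(dp), parameter :: t1(nsim) = [0.9_dp, 2.5_dp, 1.0_dp, 0.3_dp, 2.2_dp]
  ! Hypothetical observed values of the two statistics.
  real(dp), parameter :: tobs = 4.0_dp, t1obs = 2.0_dp
  real(dp) :: s, p, var_p, half
  integer :: ij

  ! Count tables at least as extreme as the observed table.
  s = 0.0_dp
  do ij = 1, nsim
     if (t(ij) > tobs .or. (t(ij) == tobs .and. t1(ij) >= t1obs)) then
        s = s + 1.0_dp
     end if
  end do

  ! Binomial proportion, its estimated variance, and a 95% interval.
  p     = s / real(nsim, dp)
  var_p = p * (1.0_dp - p) / real(nsim, dp)
  half  = 1.96_dp * sqrt(var_p)

  print '(a,f8.4)', 'estimated P-value  =', p
  print '(a,f8.4,a,f8.4)', '95% interval       =', p - half, ' to', p + half
end program mc_pvalue_sketch
```

With these made-up values two of the five tables are counted, so the estimate is 0.4; with so few tables the interval is very wide, which is why the driver offers to keep sampling with a new seed.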


226 C SCORE STATISTICS 1 C234567 SUBROUTINE CMHNN (NRCM , NROW , NCOL , NSTM , MATRIX , CMH) C TO COMPUTE SCORE STATISTIC C FOR THE TEST OF THE CONDITIONAL INDEPENDENCE OF THE I*J*K TABLES C WHEN X IS NOMINAL AND Y IS NOMINAL. C COMMON IS USED FOR NIK,NJK,NTOT IMPLICIT REALMS (A-H,0-Z) PARAMETER(lda=250) DIMENSION MATRIX(20,50,50) ,NIK(50,20) ,NJK(50,20) ,NT0T(20) REALMS D(50,50) ,Dl(lda) ,VK(20,lda,lda) ,V(lda,lda) REALMS DlV(lda) ,det(2) c REAL*8 D(50, 50) ,D1(NRCM) ,VK(20, NRCM, NRCM) ,V(NRCM, NRCM) c REALMS DIV(NRCM) ,det(2) c realms VINV(lda,lda) integer Ida, NRCM, info , job LOGICAL KIM COMMON /Al/ NIK,NJK,NTOT COMMON /A3/ D,Dl,VK,V,DIV,det NR0WM=NR0W-1 NC0LM=NC0L-1 C COMPUTE D((NROWM*NCOLM) ,1) VECTOR DO 100 I=1,NR0WM DO 105 J=1,NC0LM D(I, J)=O.DO DO no K=1,NSTM D(I, J)=D(I,J)+(MATRIX(K,I,J)-(NIK(I,K)*NJK(J,K))/DBLE(NTOT(K))) 110 continue 105 CONTINUE 100 CONTINUE DO 115 I=1,NR0WM DO 120 J=1,NC0LM K=(I-1)*NC0LM+J 120 D1(K)=D(I,J) 115 CONTINUE


c c c c 122 160 150 140 130 125 190 180 170 195 227 IF (KIM) GO TO 15 SET KIM FOR SUBSEQUENT CALLS KIM=.TRUE. COMPUTE V ( (NROWM+NCOLM) , (NROWM*NCOLM) ) MATRIX DO 125 K=1,NSTM L=0 DO 130 I=1,NR0WM DO 140 J=1,NC0LM L=L+1 M=0 DO 150 IP=1,NR0WM DO 160 JP=1,NC0LM IND1=0 IND2=0 IF (I.EQ.IP) IND1=1 IF (J.EQ.JP) IND2=1 M=M+1 VK(K,L,M)=NIK(I,K)*(IND1*NT0T(K)-NIK(IP,K)) L *NJK( J ,K) * (IND2*NT0T(K) -NJK( JP ,K) ) / ^ DBLE(NT0T(K)*NT0T(K)*(NT0T(K)-1.D0)) CONTINUE CONTINUE CONTINUE CONTINUE CONTINUE DO 170 I=1,NRCM DO 180 J=1,NRCM V(I, J)=O.DO DO 190 K=1,NSTM V(I,J)=V(I,J)+VK(K,I,J) CONTINUE CONTINUE WRITE(*,195) FORMAT (/, 'PRINT NULL COVARIANCE MATRIX ? (Y=1,N=0)') READ(*,*) NCOV IF (NCOV .NE. 1) GO TO 196 print*


228 PRINT*, 'NULL COVARIANCE MATRIX DO 191 I=1,NRCM PRINT*, (SNGL(V(I,J)) ,J=1,NRCM) 191 CONTINUE 196 J0B=01 n=NRCM c lda=NRCM CALL dpof a(V,lda,n, inf o) IF (INFO .NE. 0) THEN WRITE(*,99) INFO 99 F0RMAT(/, 'THE FACTORIZATION IS NOT COMPLETE.',/, 1 'THE LEADING MINOR OF ORDER' , 15 ,' IS NOT POSITIVE DEFINITE.') PRINT* END IF CALL dpodi(V,lda,n,det, job) C COMPUTE DETERMINENT AND INVERSE MATRIX OF c A CERTAIN REAL SYMMETRIC POSITIVE DEFINITE MATRIX. C ONLY FOR SYMMETRIC MATRIX ! ! c DPODI PRODUCES THE UPPER HALF OF INVERSE OF V. C RETURNED V IS THE VAR-COV MATRIX OF V. DO 5 1=2, n DO 6 J=1,I-1 6 V(I,J)=V(J,I) 5 CONTINUE 7 WRITE(*,198) 198 FORMAT (/, 'PRINT INVERSE MATRIX OF NULL COV. MATRIX ? (Y=1,N=0)') READ(*,*) NINVC IF (NINVC .NE. 1) GO TO 15 PRINT* PRINT* , ' INVERSE MATRIX : ' do 10 i=l,n 10 print*, (SNGL(V(i,j)) ,j=l,n) 15 CALL MULTVA(D1,V,NRCM,NRCM,DIV) CALL INNER(DIV,D1,NRCM,CMHV) CMH=CMHV C PRINT*, 'SCORE STATISTIC =' ,CMH


229 RETURN END C SCORE STATISTIC 2 C234567 SUBROUTINE CMHNO (NROW , NCOL , NSTM , MATRIX , CMH) C TO COMPUTE SCORE STATISTIC C FOR THE TEST OF THE CONDITIONAL INDEPENDENCE OF THE I*J*K TABLES C WHEN X IS NOMINAL AND Y IS ORDINAL. C COMMON IS USED FOR NIK,NJK,NTOT C COMMON IS USED FOR WTR.WTC IMPLICIT REALMS (A-H,0-Z) PARAMETER(lda=250) PARAMETER (ldal=l 5) DIMENSION MATRIX(20,50,50) ,NIK(50,20) ,NJK(50,20) ,NT0T(20) REALMS WTR(50) ,WTC(50) C global arrays REAL*8 PIK(50,20) ,PJK(50,20) REALMS NK(20,lda,l) ,MK(20 ,lda, 1) ,VK(20 ,lda, Ida) REALMS GK(20,15,15) , VGK(20 , 15 , 15) ,G(15,15) ,VG(15,15) ,GT(15,15) REAL*8 BK(lda,lda) ,BKT(lda,lda) ,CK(15,15) C local arrays REAL*8 A(15,15) ,A1(15,15) , Cl (15, 15) ,C2 (15 , 15) , C3(15 , 15) REAL*8 D(15,15) ,B(15) ,Y(15) REAL*8 C(lda,lda) ,GNMK(lda,lda) ,YK(lda,lda) ,V(lda,lda) REAL*8 GTVG(15,15) int eger Ida , Idal , NROWM , NNN , NRNC , inf o , j ob LOGICAL KIM COMMON /Al/ NIK,NJK,NTOT COMMON /A4/ WTR.WTC NNN=1 NROWM=NROW-l NRNC=NROW*NCOL IF (KIM) GO TO 1000 C C SET KIM FOR SUBSEQUENT CALLS


230 C KIM=.TRUE. C COMPUTE NULL VAR-COV MATRIX VG(NROWM,NRDWM) DO 100 K=1,NSTM DO 110 I=1,NR0W 1 10 PIK ( I , K) =DBLE (NIK (I , K) ) /DELE (NTOT (K) ) DO 120 J=1,NC0L 120 PJK(J,K)=DBLE(NJK(J,K))/DBLE(NTOT(K)) 100 CONTINUE C COMPUTE Mk=E(Nk|HO) ,WHICH IS SAVED IN MK(K,NRNC,1) C DO 200 K=1,NSTM DO 230 I=1,NR0W 230 A(I,1)=PIK(I,K) DO 240 J=1,NC0L 240 A1(J,1)=PJK(J,K) CALL DIRECTMM ( A , A 1 , C , NROW , NNN , NCOL , NNN) C ********** :)c =|c ** j(c Xc =)c =)c :(c *** >|c !)c ****** =(c *:(<* ;t: *********** ** DO 250 I=1,NRNC 250 MK(K,I,1)=NT0T(K)*C(I,1) DO 255 1=1,15 DO 256 J=l,15 A(I, J)=O.DO Aid, J)=O.DO 256 CONTINUE 255 CONTINUE DO 257 1=1, Ida DO 257 J=1,NNN 257 C(I,J)=O.DO 200 C C— CONTINUE


231 C COMPUTE Var(Nk|HO),WHICH IS SAVED IN VK(K,I,J) C DO 350 K=1,NSTM DO 260 J=1,NC0L B(J)=PJK(J,K) A(J,1)=PJK(J,K) A1(1,J)=PJK(J,K) 260 CONTINUE CALL MMULTM (A , A1 , NCOL , NNN ,NCOL , Cl ) CALL DIAG(B,NCOL,D) CALL MSUBTM(D, Cl, NCOL, NCOL, C2) DO 265 1=1,15 265 B(I)=O.DO DO 267 1=1,15 DO 268 J=l,15 A(I, J)=O.DO Aid, J)=O.DO C1(I, J)=O.DO D(I, J)=O.DO 268 CONTINUE 267 CONTINUE DO 270 I=1,NR0W B(I)=PIK(I,K) A(I,1)=PIK(I,K) A1(1,I)=PIK(I,K) 270 CONTINUE CALL MMULTM ( A , A 1 , NRO W , NNN , NRO W , C 1 ) CALL DIAG(B,NROW,D) CALL MSUBTM(D,C1,NR0W,NR0W,C3) C******:|cj)c*>|c^cj|c*)(c*:)c*j|c:(oK5(ot::t:****!(<********s|c:(c*:t:*****!|t**** CALL D IRECTMM ( C3 , C2 , C , NRO W , NRO W , NCOL , NCOL ) C * =fc =tc J|c =(c J|C jK * J)C * :^o|c ,|o(c ,|c :(c ,K ^ ^ DO 280 I=1,NRNC DO 290 J=1,NRNC 290 VK(K , I , J)=DBLE(NTOT(K) *NTOT(K) )/DBLE(NTOT(K) -1) *C (I , J) 280 CONTINUE


232 DO 300 1=1,15 DO 310 J=l,15 A(I, J)=O.DO Aid, J)=O.DO C1(I, J)=O.DO C2(I, J)=O.DO C3(I, J)=O.DO D(I, J)=O.DO 310 CONTINUE 300 CONTINUE DO 320 1=1, Ida DO 330 J=l,lda 330 C(I,J)=O.DO 320 CONTINUE DO 340 1=1,15 340 B(I)=O.DO 350 CONTINUE C C COMPUTE SCORE MATRIX BK(NR0WM,NRNC)=CK(1 ,NCOL)@RK(NROWM,NROW) DO 400 I=1,NR0WM 400 Y(I)=1.D0 YY=-1.D0 CALL AUGMD(Y,NROWM,YY,D) C D(NROWM,NROW)=RK C CK(1,NC0L) IS COLUMN SCORES. DO 410 J=1,NC0L 410 CK(1, J)=WTC(J) C**>(c****=(c** + **:(c=(c=|c****j|c**:)c**:(c**:(c)|t*=(c****:tc**:t:>(c*>|c CALL DIRECTMM (D , CK , BK , NROWM , NROW , NNN , NCOL) CALL TRANS (BK, NROWM, NRNC,BKT) C BKT(NRNC, NROWM) IS TRANSPOSE OF BK (NROWM, NRNC) . DO 446 1=1,15 Y(I)=O.DO DO 447 1=1,15 446


233 DO 447 J=l,15 447 D(I,J)=O.DO C COMPUTE VG(NROWM,NROWM) . THIS IS SUMMING VGK(K,NROWM,NROWM) C .WHICH IS Bk(VAR(Nk|HO)Bk' , OVER K STRATUM. C DO 450 K=1,MSTM DO 460 I=1,NRNC DO 470 J=1,NRNC 470 GNMK(I,J)=VK(K,I,J) 460 CONTINUE CALL MMULTM 1 ( BK , GNMK , NRO WM , NRNC , NRNC , YK) CALL MMULTM 1 ( YK , BKT , NROWM , NRNC , NROWM , V) DO 475 1=1, NROWM DO 480 J=l, NROWM 480 VGK(K,I,J)=V(I,J) 475 CONTINUE DO 485 1=1, Ida DO 490 J=l,lda GNMKd, J)=O.DO YK(I, J)=O.DO V(I,J)=O.DO 490 CONTINUE 485 CONTINUE 450 CONTINUE DO 530 1=1, NROWM DO 540 J=l, NROWM VG(I, J)=O.DO DO 550 K=1,NSTM 550 VG(I,J)=VG(I,J)+VGK(K,I,J) 540 CONTINUE 530 CONTINUE C VG (NROWM, NROWM) IS VAR-COV MATRIX. WRITE(*,600) 600 FORMAT (/, 'PRINT NULL COVARIANCE MATRIX ? (Y=1,N=0)')


234 READ(*,*) MCOV IF (NCOV .NE. 1) GO TO 620 print* PRINT*, 'MULL COVARIANCE MATRIX DO 610 I=1,NR0WM PRINT* , (SMGL(VG(I , J) ) , J=1 ,NROWM) 610 CONTINUE 620 J0B=01 n=NROWM c Ida=MRCM CALL dpofa(VG,ldal,n, info) IF (INFO .ME. 0) THEM WRITE(*,699) INFO 699 FORMATC/, 'THE FACTORIZATION IS NOT COMPLETE.',/, 1 'THE LEADING MINOR OF ORDER', 15,' IS NOT POSITIVE DEFINITE.') PRINT* ENDIF CALL dpodi (VG , Idal ,n, det , job) DO 605 1=2, n DO 606 J=1,I-1 606 VG(I,J)=VG(J,I) 605 CONTINUE 7 WRITE(*,698) 698 FORMAT (/, 'PRINT INVERSE MATRIX OF NULL COV. MATRIX ? (Y=1,N=0)') READ(*,*) NINVC IF (NINVC .NE. 1) GO TO 1000 PRINT* PRINT* , ' INVERSE MATRIX : ' do 690 i=l,n 690 print*, (SMGL(VG(i,j)),j=l,n) C COMPUTE G(NR0WM,1). THIS IS SUMMING GK(K,NROWM, 1) C , WHICH IS Bk(Mk-Mk), OVER K STRATUM. C G(MR0WM,1) DEPENDS ON DATA Nk. 1000 DO 1005 K=1,NSTM


235 DO 1010 I=1,NR0W DO 1020 J=1,NC0L IJ=(I-1)*NC0L+J NK (K , IJ , 1) =MATRIX (K , I , J) 1020 CONTINUE 1010 CONTINUE DO 1030 I=1,NRNC GNMK(I,1)=NK(K,I,1)-MK(K,I,1) C NK(K,I,1) IS DEFINED AS REAL*8 1030 CONTINUE C ARRAYS ARE (Ida, Ida). MMULTMl IS CALLED INSTEAD OF MMULTM. CALL MMULTMl (BK , GNMK , NROWM, NRNC , NNN , YK) DO 1040 1=1, NROWM GK(K,I,1)=YK(I,1) 1040 CONTINUE DO 1050 1=1, Ida GNMK(I,1)=0.D0 YK(I,1)=0.D0 1050 CONTINUE 1005 CONTINUE DO 1060 1=1, NROWM DO 1070 J=1,NNN G(I, J)=O.DO DO 1080 K=1,NSTM 1080 G(I,J)=G(I,J)+GK(K,I,J) 1070 CONTINUE 1060 CONTINUE C C COMPUTE SCORE STATISTIC C CMH=G' (VG-'-l)G C C COMPUTE TRANSPOSE OF G (NROWM, 1)


236 DO 1100 I=1,NR0WM 1100 GT(1,I)=G(I,1) CALL MMULTM ( GT , VG , NNN , NRO WM , NRO WM , GTVG ) DO 1200 1=1,15 B(I)=O.DO Y(I)=O.DO 1200 CONTINUE DO 1210 I=1,NR0WM B(I)=GTVG(1,I) Y(I)=G(I,1) 1210 CONTINUE CALL INNER1(B,Y,NR0WM,CMHV) CMH=CMHV C PRINT*, 'C-M-H STATISTIC =\CMH DO 1212 1=1,15 B(I)=O.DO Y(I)=O.DO 1212 CONTINUE DO 1215 1=1,15 DO 1215 J=l,15 GTVG(I,J)=O.DO 1215 D(I,J)=0.D0 RETURN END C SCORE STATISTIC 3 C234567 SUBROUTINE CMHOO (NROW , NCOL , NSTM, MATRIX , CMH) C C C C C TO COMPUTE SCORE STATISTIC FOR THE TEST OF THE CONDITIONAL INDEPENDENCE OF THE I*J*K TABLES WHEN X IS ORDINAL AND Y IS ORDINAL. COMMON IS USED FOR NIK,NJK,NTOT COMMON IS USED FOR WTR,WTC IMPLICIT REAL*8 (A-H,0-Z) DIMENSION MATRIX(20,50,50) ,NIK(50,20) ,NJK(50,20) ,NT0T(20) REAL*8 WTR(50) ,WTC(50)


      COMMON /A1/ NIK,NJK,NTOT
      COMMON /A4/ WTR,WTC
      T=0.D0
      TT=0.D0
      DO 100 K=1,NSTM
      DO 110 I=1,NROW
      DO 120 J=1,NCOL
      T=T+WTR(I)*WTC(J)*(MATRIX(K,I,J)-(NIK(I,K)*NJK(J,K))
     1   /DBLE(NTOT(K)))
c     T1=WTR(I)*WTC(J)*(MATRIX(K,I,J)-(NIK(I,K)*NJK(J,K))/DBLE(NTOT(K)))
c     LINEAR RANK STATISTICS
      TT=TT+WTR(I)*WTC(J)*MATRIX(K,I,J)
c     TT1=WTR(I)*WTC(J)*MATRIX(K,I,J)
c     PRINT*,'T1=',T1,'TT1=',TT1
  120 CONTINUE
  110 CONTINUE
  100 CONTINUE
C     CMH=T
      CMH=TT
c     PRINT*,'T=',T,'TT=',TT
      RETURN
      END
C     SCORE STATISTIC 4
C234567
      SUBROUTINE CMHNN1(NROW,NCOL,NSTM,MATRIX,CMH)
C     TO COMPUTE SCORE STATISTIC
C     FOR THE TEST OF THE CONDITIONAL INDEPENDENCE OF THE I*J*K TABLES
C     WHEN X IS NOMINAL AND Y IS NOMINAL.
C     W/O ASSUMING NO-THREE FACTOR INTERACTION MODEL.
C     MAX NO. OF STRATUM: 10
C     MAX NO. OF ROW*COL : 250
C     COMMON IS USED FOR NIK,NJK,NTOT


      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION MATRIX(20,50,50),NIK(50,20),NJK(50,20),NTOT(20)
      COMMON /A1/ NIK,NJK,NTOT
      X=0.D0
      DO 100 K=1,NSTM
      DO 110 I=1,NROW
      DO 110 J=1,NCOL
      EV=(NIK(I,K)*NJK(J,K))/DBLE(NTOT(K))
      X=X+((DBLE(MATRIX(K,I,J))-EV)**2)/EV
c     print*,nik(i,k),njk(j,k),ntot(k)
c     print*,matrix(k,i,j),ev,x
  110 CONTINUE
  100 CONTINUE
      CMH=x
c     print*,'SCORE STATISTIC = ',CMH
      RETURN
      END
C     SCORE STATISTIC 5
C234567
      SUBROUTINE CMHNO1(NROW,NCOL,NSTM,MATRIX,CMH)
C     TO COMPUTE SCORE STATISTIC
C     FOR THE TEST OF THE CONDITIONAL INDEPENDENCE OF THE I*J*K TABLES
C     WHEN X IS NOMINAL AND Y IS ORDINAL.
C     W/O ASSUMING NO-THREE FACTOR INTERACTION MODEL.
C     MAX NO. OF STRATUM: 10
C     MAX NO. OF ROW*COL : 250
C     COMMON IS USED FOR NIK,NJK,NTOT
C     COMMON IS USED FOR WTR,WTC
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION MATRIX(20,50,50),NIK(50,20),NJK(50,20),NTOT(20)
      REAL*8 WTR(50),WTC(50)
      REAL*8 UV(100),VK(10),GK(100,250),D(100,2500),DT(2500,100)
      REAL*8 P(2500),DP(2500,2500),PP(2500,2500),SIGMA(2500,2500)
      REAL*8 DSIGMA(100,2500),COVG(100,100),DIV(100)


239 LOGICAL KIM COMMON /Al/ NIK,NJK,NTOT COMMON /A4/ WTR,WTC COMMON /A5/ P, DP, PP, SIGMA NNN=1 NRNC=NROW*NCOL KNRNC=NSTM*NRNC NROWM=NROW-l NRNK=(NROW-l)*NSTM NT0TAL=0 DO 100 K=1,NSTM 100 NTOTAL=NTOTAL+NTOT(K) c print* , ' ntotal= ' , ntotal L=0 DO 200 K=1,NSTM DO 210 I=1,NR0WM L=L+1 UV(L)=O.DO DO 220 J=1,NC0L UV (L) =UV (L) +WTC( J) * (MATRIX(K , I , J) 1 dble(nik(i,k)*njk(j,k))/dble(ntot(k))) 220 CONTINUE UV (L) =UV (L) /DBLE (NTOTAL) 210 CONTINUE 200 CONTINUE IF (KIM) GO TO 900 C C SET KIM FOR SUBSEQUENT CALLS C KIM=.TRUE. C NULL ASYMPTOTIC COVARIANCE OF SCORES. C COMPUTE GK(NRNK,NRNC) DO 250 K=1,NSTM VK(K)=O.DO DO 270 J=1,NC0L 270 VK(K)=VK(K)+WTC(J)*NJK(J,K) 250 CONTINUE


240 L=0 DO 280 K=1,NSTM DO 290 I=1,NR0WM L=L+1 M=0 DO 300 IP=1,NR0W IND1=0 IF (I .EQ. IP) IND1=1 DO 310 JP=1,NC0L M=M+1 GK(L,M)=(WTC(JP)*NTOT(K)-VK(K))* 1 (NTOT (K) *IND1-NIK (I , K) ) /DELE (NTOT (K) *NTOT (K) ) 310 CONTINUE 300 CONTINUE 290 CONTINUE 280 CONTINUE C COMPUTE D(NRNK,KNRNC) DO 320 I=1,NRNK DO 330 J=1,KNRNC 330 D(I,J)=0.d0 320 CONTINUE L=0 DO 350 K=1,NSTM DO 360 I=1,NR0WM L=L+1 DO 370 IJ=1,NRNC M=(K-1)*NRNC+IJ D(L,M)=GK(L,IJ) 370 CONTINUE 360 CONTINUE 350 CONTINUE C COMPUTE SIGMA (KNRNC,KNRNC)=DIAG(P)-PP^ L=0 DO 400 K=1,NSTM DO 410 I=1,NR0W DO 420 J=1,NC0L L=L+1 420 P (L) =DBLE (NIK ( I , K) *N JK ( J , K) ) /DELE ( (NTOT (K) *NTOTAL) ) 410 CONTINUE 400 CONTINUE


C     P(KNRNC), DP(KNRNC,KNRNC)
      CALL DIAG1(P,KNRNC,DP)
      CALL CMULR(P,KNRNC,NNN,KNRNC,PP)
      DO 500 I=1,KNRNC
      DO 510 J=1,KNRNC
  510 SIGMA(I,J)=DP(I,J)-PP(I,J)
  500 CONTINUE
C     COMPUTE COV(G(P))=D SIGMA D'/NTOTAL
C     TRANSPOSE OF D(NRNK,KNRNC) : DT(KNRNC,NRNK)
      DO 550 I=1,NRNK
      DO 560 J=1,KNRNC
  560 DT(J,I)=D(I,J)
  550 CONTINUE
C     COMPUTE D(NRNK,KNRNC)*SIGMA(KNRNC,KNRNC)=DSIGMA(NRNK,KNRNC)
C     print*, 'dsigma(NRNK,knrnc)'
      DO 600 I=1,NRNK
      DO 610 J=1,KNRNC
      DSIGMA(I,J)=0.D0
      DO 620 K=1,KNRNC
      DSIGMA(I,J)=DSIGMA(I,J)+D(I,K)*SIGMA(K,J)
  620 CONTINUE
      if (dabs(dsigma(i,j)) .lt. 1.0d-15) dsigma(i,j)=0.d0
  610 CONTINUE
  600 CONTINUE
c     do 622 i=1,NRNK
c622  print*, (sngl(dsigma(i,j)),j=1,knrnc)
C     COMPUTE DSIGMA(NRNK,KNRNC)*DT(KNRNC,NRNK)=COVG(NRNK,NRNK)
c     print*, 'covg(NRNK,NRNK)'
      DO 650 I=1,NRNK
      DO 660 J=1,NRNK
      COVG(I,J)=0.D0
      DO 670 K=1,KNRNC
  670 COVG(I,J)=COVG(I,J)+DSIGMA(I,K)*DT(K,J)
  660 CONTINUE
c     print*, (sngl(covg(i,j)),j=1,NRNK)
  650 CONTINUE
C     COMPUTE ESTIMATE COV G(P)
      DO 700 I=1,NRNK


      DO 710 J=1,NRNK
  710 COVG(I,J)=COVG(I,J)/DBLE(NTOTAL)
  700 CONTINUE
      WRITE(*,720)
  720 FORMAT(/,'PRINT NULL COVARIANCE MATRIX ? (Y=1,N=0)')
      READ(*,*) NCOV
      IF (NCOV .NE. 1) GO TO 760
      print*
      PRINT*, 'NULL COVARIANCE MATRIX :'
      DO 750 I=1,NRNK
      PRINT*, (SNGL(COVG(I,J)),J=1,NRNK)
  750 CONTINUE
C     CHOLESKY FACTORIZATION, THEN INVERSE ONLY (JOB=01)
  760 JOB=01
      n=NRNK
      lda=100
      CALL dpofa(COVG,lda,n,info)
      IF (INFO .NE. 0) THEN
      WRITE(*,699) INFO
  699 FORMAT(/,'THE FACTORIZATION IS NOT COMPLETE.',/,
     1 'THE LEADING MINOR OF ORDER',I5,' IS NOT POSITIVE DEFINITE.')
      PRINT*
      END IF
      CALL dpodi(COVG,lda,n,det,job)
C     COPY UPPER TRIANGLE OF THE INVERSE INTO THE LOWER TRIANGLE
      DO 800 I=2,n
      DO 810 J=1,I-1
  810 COVG(I,J)=COVG(J,I)
  800 CONTINUE
      WRITE(*,850)
  850 FORMAT(/,'PRINT INVERSE MATRIX OF NULL COV. MATRIX ? (Y=1,N=0)')
      READ(*,*) NINVC
      IF (NINVC .NE. 1) GO TO 900
      PRINT*
      PRINT*, 'INVERSE MATRIX :'
      do 860 i=1,n
  860 print*, (SNGL(COVG(i,j)),j=1,n)
C     COMPUTE SCORE STATISTIC : UV' COVG^-1 UV


  900 CALL MULTVA2(UV,COVG,NRNK,NRNK,DIV)
      CALL INNER2(DIV,UV,NRNK,CMHV)
      CMH=CMHV
C     PRINT*, 'SCORE STATISTIC FOR RANDOM TABLE =',CMH
      RETURN
      END
C     SCORE STATISTIC 6
C234567
      SUBROUTINE CMHOO
     1 (NROW,NCOL,NSTM,MATRIX,CMH)
C     TO COMPUTE SCORE TEST STATISTIC
C     FOR THE TEST OF THE CONDITIONAL INDEPENDENCE OF THE I*J*K TABLES
C     WHEN X IS ORDINAL AND Y IS ORDINAL.
C     W/O ASSUMING NO-THREE FACTOR INTERACTION MODEL.
C     MAX NO. OF STRATUM: 10
C     MAX NO. OF ROW*COL : 250
C     COMMON IS USED FOR NIK,NJK,NTOT
C     COMMON IS USED FOR WTR,WTC
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION MATRIX(20,50,50),NIK(50,20),NJK(50,20),NTOT(20)
      REAL*8 WTR(50),WTC(50)
      REAL*8 UV(10),UK(10),VK(10),GK(10,250),D(10,2500),DT(2500,10)
      REAL*8 P(2500),DP(2500,2500),PP(2500,2500),SIGMA(2500,2500)
      REAL*8 DSIGMA(10,2500),COVG(10,10),DIV(10)
      LOGICAL KIM
      COMMON /A1/ NIK,NJK,NTOT
      COMMON /A4/ WTR,WTC
      COMMON /A5/ P,DP,PP,SIGMA
C     KIM AND COVG MUST RETAIN THEIR VALUES BETWEEN CALLS
      SAVE KIM,COVG
      DATA KIM /.FALSE./
      NNN=1
      NRNC=NROW*NCOL
      KNRNC=NSTM*NRNC
      NTOTAL=0


      DO 100 K=1,NSTM
  100 NTOTAL=NTOTAL+NTOT(K)
c     print*, 'ntotal=',ntotal
C     SCORE VECTOR: WEIGHTED SUM OF OBSERVED MINUS EXPECTED COUNTS
      DO 200 K=1,NSTM
      UV(K)=0.D0
      DO 210 I=1,NROW
      DO 220 J=1,NCOL
      UV(K)=UV(K)+WTR(I)*WTC(J)*(MATRIX(K,I,J)-
     1 DBLE(NIK(I,K)*NJK(J,K))/DBLE(NTOT(K)))
  220 CONTINUE
  210 CONTINUE
      UV(K)=UV(K)/DBLE(NTOTAL)
  200 CONTINUE
      IF (KIM) GO TO 900
C
C     SET KIM FOR SUBSEQUENT CALLS
C
      KIM=.TRUE.
C     NULL ASYMPTOTIC COVARIANCE OF SCORES.
C     COMPUTE GK(NSTM,NRNC)
      DO 250 K=1,NSTM
      UK(K)=0.D0
      VK(K)=0.D0
      DO 260 I=1,NROW
  260 UK(K)=UK(K)+WTR(I)*NIK(I,K)
      DO 270 J=1,NCOL
  270 VK(K)=VK(K)+WTC(J)*NJK(J,K)
      IJ=0
      DO 280 I=1,NROW
      DO 290 J=1,NCOL
      IJ=IJ+1
      GK(K,IJ)=WTR(I)*WTC(J)-(WTR(I)*VK(K)
     1 +WTC(J)*UK(K))/DBLE(NTOT(K))
     2 +UK(K)*VK(K)/DBLE(NTOT(K)*NTOT(K))
  290 CONTINUE
  280 CONTINUE
  250 CONTINUE
C     COMPUTE D(NSTM,KNRNC)
      DO 300 K=1,NSTM


      DO 305 IJ=1,KNRNC
  305 D(K,IJ)=0.d0
  300 CONTINUE
c     print*, 'd(k,l)'
      DO 310 K=1,NSTM
      DO 320 IJ=1,NRNC
      L=(K-1)*NRNC+IJ
      D(K,L)=GK(K,IJ)
  320 CONTINUE
  310 CONTINUE
C     COMPUTE SIGMA(KNRNC,KNRNC)=DIAG(P)-PP'
      L=0
      DO 400 K=1,NSTM
      DO 410 I=1,NROW
      DO 420 J=1,NCOL
      L=L+1
  420 P(L)=DBLE(NIK(I,K)*NJK(J,K))/DBLE((NTOT(K)*NTOTAL))
  410 CONTINUE
  400 CONTINUE
C     P(KNRNC), DP(KNRNC,KNRNC)
      CALL DIAG1(P,KNRNC,DP)
      CALL CMULR(P,KNRNC,NNN,KNRNC,PP)
      DO 500 I=1,KNRNC
      DO 510 J=1,KNRNC
  510 SIGMA(I,J)=DP(I,J)-PP(I,J)
  500 CONTINUE
C     COMPUTE COV(G(P))=D SIGMA D'/NTOTAL
C     TRANSPOSE OF D(NSTM,KNRNC) : DT(KNRNC,NSTM)
      DO 550 I=1,NSTM
      DO 560 J=1,KNRNC
  560 DT(J,I)=D(I,J)
  550 CONTINUE
C     COMPUTE D(NSTM,KNRNC)*SIGMA(KNRNC,KNRNC)=DSIGMA(NSTM,KNRNC)
c     print*, 'dsigma(nstm,knrnc)'
      DO 600 I=1,NSTM
      DO 610 J=1,KNRNC
      DSIGMA(I,J)=0.D0
      DO 620 K=1,KNRNC


      DSIGMA(I,J)=DSIGMA(I,J)+D(I,K)*SIGMA(K,J)
c     print*, i,j,D(I,K),SIGMA(K,J),D(I,K)*SIGMA(K,J),DSIGMA(I,J)
  620 CONTINUE
      if (dabs(dsigma(i,j)) .lt. 1.0d-15) dsigma(i,j)=0.d0
  610 CONTINUE
  600 CONTINUE
C     COMPUTE DSIGMA(NSTM,KNRNC)*DT(KNRNC,NSTM)=COVG(NSTM,NSTM)
c     print*, 'covg(nstm,nstm)'
      DO 650 I=1,NSTM
      DO 660 J=1,NSTM
      COVG(I,J)=0.D0
      DO 670 K=1,KNRNC
  670 COVG(I,J)=COVG(I,J)+DSIGMA(I,K)*DT(K,J)
  660 CONTINUE
  650 CONTINUE
C     COMPUTE ESTIMATE COV G(P)
      DO 700 I=1,NSTM
      DO 710 J=1,NSTM
  710 COVG(I,J)=COVG(I,J)/DBLE(NTOTAL)
  700 CONTINUE
      WRITE(*,720)
  720 FORMAT(/,'PRINT NULL COVARIANCE MATRIX ? (Y=1,N=0)')
      READ(*,*) NCOV
      IF (NCOV .NE. 1) GO TO 760
      print*
      PRINT*, 'NULL COVARIANCE MATRIX :'
      DO 750 I=1,NSTM
      PRINT*, (SNGL(COVG(I,J)),J=1,NSTM)
  750 CONTINUE
  760 JOB=01
      n=NSTM
      lda=10
      CALL dpofa(COVG,lda,n,info)
      IF (INFO .NE. 0) THEN
      WRITE(*,699) INFO
  699 FORMAT(/,'THE FACTORIZATION IS NOT COMPLETE.',/,
     1 'THE LEADING MINOR OF ORDER ',I5,' IS NOT POSITIVE DEFINITE.')


      PRINT*
      ENDIF
      CALL dpodi(COVG,lda,n,det,job)
      DO 800 I=2,n
      DO 810 J=1,I-1
  810 COVG(I,J)=COVG(J,I)
  800 CONTINUE
      WRITE(*,850)
  850 FORMAT(/,'PRINT INVERSE MATRIX OF NULL COV. MATRIX ? (Y=1,N=0)')
      READ(*,*) NINVC
      IF (NINVC .NE. 1) GO TO 900
      PRINT*
      PRINT*, 'INVERSE MATRIX :'
      do 860 i=1,n
  860 print*, (SNGL(COVG(i,j)),j=1,n)
C     COMPUTE SCORE TEST STATISTIC : UV' COVG^-1 UV
  900 CALL MULTVA1(UV,COVG,NSTM,NSTM,DIV)
      CALL INNER1(DIV,UV,NSTM,CMHV)
      CMH=CMHV
C     PRINT*, 'SCORE STATISTIC FOR RANDOM TABLE =',CMH
      RETURN
      END
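The quantities assembled by the two routines headed SCORE STATISTIC 5 and SCORE STATISTIC 6 can be summarized in matrix form. The display below is a reading of the listings only, not a statement taken from the text of the dissertation. The notation is an assumption introduced here: n_{ijk} stands for MATRIX(K,I,J), n_{i+k} for NIK(I,K), n_{+jk} for NJK(J,K), n_{++k} for NTOT(K), n for NTOTAL, and r_i, c_j for the scores WTR(I), WTC(J). It is also assumed, from the comment lines, that the helper routines DIAG1 and CMULR form diag(p) and pp', and that the MULTVA and INNER routines evaluate the final quadratic form.

```latex
% Sketch read off the listings; all symbols are assumed notation.
\begin{align*}
p_{ijk} &= \frac{n_{i+k}\,n_{+jk}}{n_{++k}\,n},
  \qquad n=\sum_{k} n_{++k},
  \qquad \Sigma = \operatorname{diag}(p) - p\,p^{\top}, \\[4pt]
\bar r_k &= \frac{1}{n_{++k}}\sum_{i} r_i\,n_{i+k},
  \qquad
  \bar c_k = \frac{1}{n_{++k}}\sum_{j} c_j\,n_{+jk}, \\[4pt]
\text{nominal by ordinal:}\quad
u_{ik} &= \frac{1}{n}\sum_{j} c_j
  \Bigl(n_{ijk}-\frac{n_{i+k}\,n_{+jk}}{n_{++k}}\Bigr),
  \qquad i=1,\dots,I-1, \\
g_{(k,i),(i',j')} &= \bigl(c_{j'}-\bar c_k\bigr)
  \Bigl(\delta_{ii'}-\frac{n_{i+k}}{n_{++k}}\Bigr), \\[4pt]
\text{ordinal by ordinal:}\quad
u_{k} &= \frac{1}{n}\sum_{i}\sum_{j} r_i\,c_j
  \Bigl(n_{ijk}-\frac{n_{i+k}\,n_{+jk}}{n_{++k}}\Bigr), \\
g_{k,(i,j)} &= \bigl(r_i-\bar r_k\bigr)\bigl(c_j-\bar c_k\bigr), \\[4pt]
\widehat{\operatorname{Cov}}(u) &= \frac{1}{n}\,D\,\Sigma\,D^{\top},
  \qquad
  T = u^{\top}\,\widehat{\operatorname{Cov}}(u)^{-1}\,u .
\end{align*}
```

Here D is block structured: the row belonging to stratum k carries the entries g in the columns of stratum k and zeros elsewhere, which is what the loops that fill D from GK do. The value T is returned in CMH. The fragment at the top of this listing instead accumulates the sum over all cells of (n_{ijk} - m_{ijk})^2 / m_{ijk}, with m_{ijk} = n_{i+k} n_{+jk} / n_{++k}.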




BIOGRAPHICAL SKETCH

Donguk Kim was born on October 26, 1959, in Pusan, Korea. He was awarded a Bachelor of Economics degree in statistics in 1983 from Sung Kyun Kwan University, Korea, and received a Master of Economics degree in statistics from the same university in 1985. He entered graduate school at the University of Florida in spring 1989. While working toward his Ph.D. in statistics at the University of Florida, he also worked as a teaching assistant and as a statistical consultant for the Division of Biostatistics. He has been a member of the American Statistical Association since 1990. He is married and has one child. After graduation he looks forward to teaching and research.


I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Alan Agresti, Chairman
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Myron N. Chang
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Assistant Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

David C. Wilson
Professor of Mat


This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 1994

Dean, Graduate School