
# Using Ulam's Method to Test for Mixing

## Material Information

Title: Using Ulam's Method to Test for Mixing
Physical Description: 1 online resource (145 p.)
Language: english
Creator: Smith, Aaron
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

## Subjects

Subjects / Keywords: aaron, correlation, decay, eigenfunction, eigenvalue, fluid, incompressible, markov, measure, method, mixing, preserving, protocol, shift, smith, statistics, stirring, strong, subshift, ulam, weak
Mathematics -- Dissertations, Academic -- UF
Genre: Mathematics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

## Notes

Abstract: Ulam's method is a popular technique to discretize and approximate a continuous dynamical system. We propose a statistical test using Ulam's method to evaluate the hypothesis that a dynamical system with a measure-preserving map is weak-mixing. The test statistic is the second largest eigenvalue of a Monte Carlo stochastic matrix that approximates a doubly-stochastic Ulam matrix of the dynamical system. This eigenvalue leads to a mixing rate estimate. Our approach requires only one experiment while the most common method to evaluate the hypothesis, decay of correlation, requires iterated experiments. Currently, time of computation determines how many Monte Carlo points to use; we present a method based on desired accuracy and risk tolerance to decide how many points should be used. Our test has direct application to mixing relatively incompressible fluids, such as water and chocolate. Stirring protocols of compression-resistant fluids may be modeled with measure-preserving maps. Our test evaluates if a stirring protocol mixes and can compare mixing rates between different protocols.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Aaron Smith.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.

## Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041915:00001

Full Text

USING ULAM'S METHOD TO TEST FOR MIXING

By

AARON CARL SMITH

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010

© 2010 Aaron Carl Smith

I dedicate this to Bridgett for her support while I pursued my goals.

ACKNOWLEDGMENTS

I thank Professor Boyland for his guidance, and members of my supervisory

committee for their mentoring. I thank Bridgett and Akiko for their love and patience. I

needed the support they gave to reach my goal.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

ABSTRACT

CHAPTER

1 INTRODUCTION

    1.1 Hypotheses for Testing
    1.2 Testing Procedure

2 ERGODIC THEORY AND MARKOV SHIFTS

    2.1 Ergodic Theory
    2.2 Markov Shifts

3 STOCHASTIC AND DOUBLY-STOCHASTIC MATRICES

    3.1 Doubly-Stochastic Matrices
    3.2 Additional Properties of Stochastic Matrices

4 ESTIMATING THE RATE OF MIXING

    4.1 The Jordan Canonical Form of Stochastic Matrices
    4.2 Estimating Mixing Rate

5 PROBABILISTIC PROPERTIES OF DS-MATRICES

    5.1 Random DS-Matrices
    5.2 Metric Entropy of Markov Shifts with Random Matrices

6 PARTITION REFINEMENTS

    6.1 Equal Measure Refinements
    6.2 A Special Class of Refinements

7 PROBABILISTIC PROPERTIES OF PARTITION REFINEMENTS

    7.1 Entries of a Refinement Matrix
    7.2 The Central Tendency of Refinement Matrices
    7.3 Metric Entropy After Equal Measure Refinement

8 ULAM MATRICES

    8.1 Building the Stochastic Ulam Matrix
    8.2 Properties of Ulam Matrices

9 CONVERGENCE TO AN OPERATOR

    9.1 Stirring Protocols as Operators and Operator Eigenfunctions
    9.2 Convergence Results

10 DECAY OF CORRELATION

    10.1 Comparing Our Test to Decay of Correlation
    10.2 A Conjecture About Mixing Rate

11 CRITERIA FOR WHEN MORE DATA POINTS ARE NEEDED

    11.1 Our Main Criteria for When More Data Points Are Needed
    11.2 Other Criteria for When More Data Points Are Needed

12 PROBABILITY DISTRIBUTIONS OF DS-MATRICES

    12.1 Conditional Probability Distributions
    12.2 Approximating Probability Distributions
        12.2.1 The Dealer's Algorithm
        12.2.2 Full Convex Combinations
        12.2.3 Reduced Convex Combinations
        12.2.4 The DS-Greedy Algorithm
        12.2.5 Using the Greedy DS-Algorithm
        12.2.6 DS-Matrices Arising from Unitary Matrices

13 EXAMPLES

    13.1 The Reflection Map
    13.2 Arnold's Cat Map
    13.3 The Sine Flow Map (parameter 8/5)
    13.4 The Sine Flow Map (parameter 4/5)
    13.5 The Baker's Map
    13.6 The Chirikov Standard Map (parameter 0)

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

13-1 The Reflection Map

13-2 Arnold's Cat Map

13-3 The Sine Flow Map (parameter 8/5)

13-4 The Sine Flow Map (parameter 4/5)

13-5 The Baker's Map

13-6 The Chirikov Standard Map (parameter 0)

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

USING ULAM'S METHOD TO TEST FOR MIXING

By

Aaron Carl Smith

August 2010

Chair: Philip Boyland
Major: Mathematics

Ulam's method is a popular technique to discretize and approximate a continuous

dynamical system. We propose a statistical test using Ulam's method to evaluate the

hypothesis that a dynamical system with a measure-preserving map is weak-mixing.

The test statistic is the second largest eigenvalue of a Monte Carlo stochastic matrix that

approximates a doubly-stochastic Ulam matrix of the dynamical system. This eigenvalue

leads to a mixing rate estimate. Our approach requires only one experiment while the

most common method to evaluate the hypothesis, decay of correlation, requires iterated

experiments. Currently, time of computation determines how many Monte Carlo points

to use; we present a method based on desired accuracy and risk tolerance to decide

how many points should be used. Our test has direct application to mixing relatively

incompressible fluids, such as water and chocolate. Stirring protocols of compression

resistant fluids may be modeled with measure-preserving maps. Our test evaluates if a

stirring protocol mixes and can compare mixing rates between different protocols.

CHAPTER 1
INTRODUCTION

If we need to evaluate a stirring protocol's ability to mix an incompressible fluid in a

closed system, but cannot do so analytically, Ulam's method provides a Markov shift that

approximates the protocol. Since the fluid is incompressible, the stirring protocol defines

a measure preserving map where the volume of a region defines its measure. For a

function to be a stirring protocol, the function must be volume (measure) preserving.

Otherwise a closed system could have more or less mass after mixing than before.

Let D be the incompressible fluid we wish to mix (D is our domain); D is a bounded

and connected subset of R^k, k ∈ N, B is the Borel σ-algebra of D, and μ is the uniform

probability measure (rescaled Lebesgue measure) on (D, B). The function f : D → D is

defined by the given stirring protocol. Our dynamical system is (D, B, μ, f).

If a stirring protocol mixes, then the concentration of an ingredient will become

constant as the protocol is iterated. When a solution is mixed, the amount of an

ingredient in a region becomes proportional to the volume of the region. If we partition

the fluid into n parts, we may use an n x n stochastic matrix to represent how the stirring

protocol moves an ingredient; if the protocol mixes, then the matrix will become a rank

one matrix with all rows being equal as the protocol is iterated (rows of the rank one

matrix correspond to the measure of partition sets). When the partition is a Markov

partition, we may use powers of the matrix to represent how iterations of the stirring

protocol move an ingredient; powers of a stochastic matrix converge to a rank one

matrix if and only if the second largest eigenvalue is less than one in magnitude. Since

the fluid is partitioned into n sets, it is natural to partition the fluid into sets with volume

1/n. When we partition a stirring protocol in this manner, the approximating matrix and its

transpose will be stochastic.

The stochastic matrix that approximates the stirring protocol defines a Markov shift,

so the Markov shift approximates and models how stirring iterations move particles from

partition set to partition set. The Markov shift as a dynamical system approximates the

dynamical system defined by stirring, so we may use the Markov shift to make decisions about the stirring protocol.

We will call the matrix generated by Ulam's method an Ulam matrix; after rescaling

the rows of an Ulam matrix we will call the resulting matrix an Ulam stochastic

matrix. We will use the Markov shift defined by the Ulam stochastic matrix to decide

if (D, B, μ, f) is mixing. The magnitude of the second largest eigenvalue of the

Ulam stochastic matrix provides a test statistic to accept or reject the protocol as

weak-mixing. Also if we accept the protocol as mixing, the second largest eigenvalue

and the dimensions of the Ulam stochastic matrix provides an estimate of the rate of

mixing. From now on when we refer to eigenvalue size, we mean the magnitude of the

eigenvalues.

Ulam's method has been used to study hyperbolic maps [1], find attractors [2],

and approximate random and forced dynamical systems [3]. Stan Ulam proposed what

we call Ulam's method in his 1964 book Problems in Modern Mathematics [4]; the

procedure gives a discretization of a Perron-Frobenius operator (transfer operator). The

procedure provides a superior method of estimating long term distributions and natural

invariant measures of deterministic systems [3]. Eigenvalues and their corresponding

eigenfunctions of hyperbolic maps can reveal important persistent structures of a

dynamical system, such as almost-invariant sets [1, 5]. Ulam's finite approximation of

absolutely continuous invariant measures of systems defined by random compositions

of piecewise monotonic transformations converges [6]. Ulam's method may be used

to estimate the measure-theoretic entropy of uniformly hyperbolic maps on smooth

manifolds and obtain numerical estimates of physical measures, Lyapunov exponents,

decay of correlation and escape rates for everywhere expanding maps, Anosov

maps, maps that are hyperbolic on an attracting invariant set, and hyperbolic on a

non-attracting invariant set [7].

In this paper, we present the testing procedure and hypotheses of testing early on;
we do this to give an overview of the paper, help the reader understand the goals of this
work, and provide easy reference. Background information about ergodic theory and
Markov shifts is provided to establish our notation and review critical concepts. Since
the stirring protocol is modeled with a doubly-stochastic matrix, a review of stochastic
and doubly-stochastic matrix properties is presented. When the approximating Markov
shift defines a mixing dynamical system, the rate at which the matrix converges to
a rank one matrix provides an estimate of (D, B, μ, f)'s mixing rate; construction of
the estimate is provided to justify its utility.

When a decision is made based upon observations, statistics can establish
confidence in the decision, so the approximating matrix is treated as a random
variable in order that statistics may be used. Properties of random doubly-stochastic
matrices are given to illuminate the approximating matrix as a random variable.
How does the Markov shift change if the partition is refined? Will a sequence of
partitions lead to a sequence of Markov shift dynamical systems that converge?
What will the convergence rate be? As a first step to answering these questions,
we look at the relationship between Markov shifts before and after a refinement,
then investigate the probabilistic properties of random partitions. To be consistent,
we only consider refinements whose partition sets have equal volume.

Ulam's method gives an approximating matrix that usually cannot be observed
directly; numerical or statistical observations can approximate this matrix with a
Monte Carlo technique, and proof that the Monte Carlo technique converges is
provided. Ulam's method converges to an operator [6]; similarities between the
approximating matrix and the target operator are established, and proof of
convergence with respect to weak-mixing is given. Decay of correlation is a
well-established measure of mixing; the second largest eigenvalue of the
approximating matrix and decay of correlation are different measures of mixing.
Decay of correlation is a better measure of mixing, but it requires a sequence of
stirring iterations, while the second largest eigenvalue requires only one iteration.
If the partition is a Markov partition, then the two measures are equivalent. One
must decide if the sample size is sufficient when using a Monte Carlo technique;
we propose using a modified chi-squared test to evaluate sufficiency after building
the approximating matrix. Statistics requires probability distributions of observations;
several statistical and Monte Carlo methods of distribution estimation are given.

1.1 Hypotheses for Testing

The hypotheses for testing are

1. H0: (D, B, μ, f) is not ergodic (and hence not mixing).

2. Ha1: (D, B, μ, f) is ergodic but not weak-mixing.

3. Ha2: (D, B, μ, f) is weak-mixing (and hence ergodic).

We will define ergodic, weak-mixing and strong-mixing in the Ergodic Theory and

Markov Shifts chapter. We refer to H0 as the null hypothesis; Ha1 and Ha2 are called

alternative hypotheses. The procedure we present does not prove or disprove that

(D, B, μ, f) is ergodic or weak-mixing; it provides a method to decide to accept or reject

hypotheses. We use logical arguments of the form

"If statement A is correct, then statement B is correct" or

"If statement B is incorrect, then statement A is incorrect"

when proving a statement; in hypothesis testing we use

"If our observation is unlikely, then the null hypothesis is probably incorrect."

So hypothesis testing uses an argument similar to a contrapositive. When making a

decision we can make two mistakes:

1. We can reject Ho when Ho is correct. This is called a type I error.

2. We can fail to reject Ho when Ho is incorrect. This is called a type II error.

When we decide which hypothesis is the null and which are the alternatives, we

set the null hypothesis such that the consequences of a type I error are more severe

than the consequences of type II errors. Therefore it takes strong evidence to reject

Ho. In general, we cannot simultaneously control the type I error risk and the type II

error risk; by default we control the risk of a type I error. The maximum chance we are

willing to risk a type I error is called the significance level. To conduct proper hypothesis

testing, one must set the significance level before gathering observations. After setting

the significance level, one must establish what statistic to use (called the test statistic);

what values of the statistic result in rejecting H0 and what values result in failing to reject

H0 (the boundary between these two sets is called the critical value(s)).

For our problem, P̂ is a stochastic Ulam matrix that approximates P, a doubly-stochastic

matrix (ds-matrix). We will make our decision based on the following criteria.

1. Reject H0 in favor of Ha2 if |λ2(P̂)| < c2, where c2 is a critical value.

2. Reject H0 in favor of Ha1 if |λ2(P̂) − 1| > c1 and |λ2(P̂)| > c2, where c1 is a critical value.

3. Fail to reject H0 otherwise.

Since we are more concerned with a type I error, rejecting H0 is unlikely when

observing random events. To set the critical values we must have an estimate of the

probability distribution of the test statistic. Probability distributions of ds-matrices are

difficult to work with; the Monte Carlo chapter provides ways to approximate probability

distributions of test statistics.

1.2 Testing Procedure

Notation 1.2.1. Let J denote the n × n matrix with every entry equal to 1/n, and let I be the identity matrix. We will denote disjoint unions with ⊔.

1. Set the significance level.

2. Set n such that connected regions of measure 1/n are sufficiently small for the application. If an upper bound of (D, B, μ, f)'s entropy is known, call it h, set n such that e^h < n.

3. Decide which conditional probability distributions of |λ2(P̂)| to use when |λ2(P)| = 1. We propose using a beta distribution with α > 2 and β = 1 for Ha2.

4. Set critical values for Ha1 and Ha2.

5. Partition D into n connected subsets with equal measure,

   {D_i}_{i=1}^n,   D = ⊔_{i=1}^n D_i,   μ(D_i) = 1/n.

6. Randomly select m sample points in D; call the points {x_k}_{k=1}^m. Let m_i be the number of points in D_i.

7. Run one iteration of the mixing protocol.

8. Let m_ij be the number of points such that x_k ∈ D_i and f(x_k) ∈ D_j. Let M = (m_ij); M is called an Ulam matrix.

9. Let P̂ = (m_ij / m_i); P̂ is an Ulam stochastic matrix. Compare the second largest eigenvalue of P̂, λ2(P̂), to the critical values.

10. If there are concerns about eigenvalue stability, confirm the results of H0 versus Ha2 with σ1((I − J)P̂).

11. Make a decision about the hypotheses of testing based on the critical values.

12. If we reject H0 in favor of Ha2, let the rate at which (n − 1)|λ2(P̂)|^N → 0 as N → ∞ be our estimate of the rate of mixing.

13. Estimate the entropy of the dynamical system with

   −(1/n) Σ_{i=1}^n Σ_{j=1}^n p̂_ij log(p̂_ij).

Definition 1.2.2. When we partition D into n connected subsets of equal measure,

D = ⊔_{i=1}^n D_i, μ(D_i) = 1/n, we call the partition an n-partition.

After taking an n-partition, f maps some portion of state i to state j. Let P = (p_ij) be
the matrix where

p_ij = μ(x ∈ D_j ∧ f⁻¹(x) ∈ D_i) / μ(f⁻¹(x) ∈ D_i) = μ(x ∈ D_j | f⁻¹(x) ∈ D_i) = μ(f(x) ∈ D_j | x ∈ D_i).

Our measure μ is a probability measure on D, so by construction P is a stochastic matrix
with p_ij = μ(f(x) ∈ D_j | x ∈ D_i) and Σ_j p_ij = 1; each p_ij is a conditional probability.
When establishing an n-partition, the physical action of the mixing protocol on
the domain should be considered. If the domain is the unit disk and the mixing protocol
acts in a circular manner, then to check that regions closer to and farther from the origin
are mixing we could partition the disk into n rings with outer radii r_k = √(k/n),
k = 1, ..., n, so that each ring has measure 1/n. If the domain is a rectangle and we are
confident that the mixing protocol mixes horizontal (vertical) sections, then we may
partition the rectangle into n horizontal (vertical) smaller rectangles to check that
vertical (horizontal) regions mix. If we do not know about the stirring protocol before
partitioning, the partition should minimize subset diameter.
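The equal-measure ring construction can be checked in a few lines; this sketch is our own, using outer radii r_k = √(k/n) on the unit disk with area measure normalized to 1.

```python
import math

n = 8
radii = [math.sqrt(k / n) for k in range(n + 1)]  # r_0 = 0, ..., r_n = 1
# normalized area of the ring between r_{k-1} and r_k (disk area rescaled to 1)
ring_measures = [radii[k] ** 2 - radii[k - 1] ** 2 for k in range(1, n + 1)]
print(ring_measures)  # every ring has measure 1/n
```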

CHAPTER 2
ERGODIC THEORY AND MARKOV SHIFTS

2.1 Ergodic Theory

In this section we provide theorems and definitions from ergodic theory and hint at

how Markov shifts relate to approximating (D, B, μ, f).

Definition 2.1.1. [8] If (D, B, μ) is a probability space, a measure-preserving transformation

f is ergodic if, whenever B ∈ B and f⁻¹(B) = B, then μ(B) = 0 or μ(B^c) = 0.

This definition generalizes to σ-finite measure spaces, but we will only look at

probability spaces.

We will use the following theorem to define ergodic, weak-mixing and strong-mixing.

Theorem 2.1.2. [9] Let (D, B, μ) be a probability space and let S be a semi-algebra

that generates B. Let f : D → D be a measure-preserving transformation. Then

1. (D, B, μ, f) is ergodic if and only if for all A, B ∈ S,

   lim_{N→∞} (1/N) Σ_{i=0}^{N−1} μ(f^{−i}(A) ∩ B) = μ(A)μ(B).

2. (D, B, μ, f) is weak-mixing if and only if for all A, B ∈ S,

   lim_{N→∞} (1/N) Σ_{i=0}^{N−1} |μ(f^{−i}(A) ∩ B) − μ(A)μ(B)| = 0.

3. (D, B, μ, f) is strong-mixing if and only if for all A, B ∈ S,

   lim_{N→∞} μ(f^{−N}(A) ∩ B) = μ(A)μ(B).

Notice that (D, B, μ, f) strong-mixing ⇒ (D, B, μ, f) weak-mixing ⇒ (D, B, μ, f)

is ergodic.

Ergodic means that iterations of the function average out to be independent,

strong-mixing means that iterations of the function converge to independence, and

weak-mixing means that iterations of the function converge to independence except

for rare occasions [8].

2.2 Markov Shifts

Definition 2.2.1. A vector

p = (p_1, p_2, ..., p_n)

is a probability vector if

0 ≤ p_i ≤ 1

for all i and

p_1 + p_2 + ... + p_n = 1.

Definition 2.2.2. A matrix is a stochastic matrix if every row is a probability vector.

Notation 2.2.3. Let

X_n = {(x_i) : i ∈ N, x_i ∈ {1, 2, ..., n}} or

X_n = {(x_i) : i ∈ Z, x_i ∈ {1, 2, ..., n}}.

Notice that x_i ∈ {1, 2, ..., n} instead of x_i ∈ {0, 1, 2, ..., n − 1}. We do this for ease of

numerical indexing.

Definition 2.2.4. Subsets of X_n of the form

{(x_i) ∈ X_n : x_1 = a_1, x_2 = a_2, ..., x_k = a_k}

for given a_1, a_2, ..., a_k are called cylinder sets.

If Σ_n is the σ-algebra generated by cylinder sets, p is a length n probability vector,

and P is a stochastic matrix such that pP = p, then (X_n, Σ_n, (p, P)) defines a globally

consistent measure space where the measure of {(x_i) ∈ X_n : x_1 = a_1, x_2 = a_2, ..., x_k = a_k}

is (p_{a_1})(p_{a_1 a_2} ⋯ p_{a_{k−1} a_k}).

The measure of a cylinder set uses both p and P.
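The cylinder-set measure is straightforward to compute directly from p and P; a small sketch (the helper name and the illustrative 2-state chain are ours):

```python
import numpy as np

def cylinder_measure(p, P, a):
    """Measure of {(x_i) : x_1 = a_1, ..., x_k = a_k} under the Markov
    measure (p, P): p_{a_1} * p_{a_1 a_2} * ... * p_{a_{k-1} a_k}.
    States are 1-indexed, as in the text."""
    m = p[a[0] - 1]
    for s, t in zip(a, a[1:]):
        m *= P[s - 1][t - 1]
    return m

p = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])   # p P = p, so (p, P) is a valid Markov measure
print(cylinder_measure(p, P, [1, 1, 2]))  # 0.5 * 0.9 * 0.1 = 0.045
```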

Definition 2.2.5. If P is a stochastic matrix and p is a probability vector such that

p = pP, then p is called a stationary distribution of P.

Definition 2.2.6. Let f_n : X_n → X_n, f_n((x_i)) = (x_{i+1}); then f_n is called the shift map.

The shift map is a (p, P)-measure preserving map.

Remark 2.2.7. If p_j = 0 for some j, then the set of sequences with x_k = j for some k

has measure zero; without loss of generality, say that p_j > 0 for all j ∈ {1, 2, ..., n}.

Definition 2.2.8. The dynamical system (X_n, Σ_n, (p, P), f_n) is called a Markov shift,

with (p, P) as the Markov measure. If X_n = {(x_i) : i ∈ N, x_i ∈ {1, 2, ..., n}} then it is a

one-sided Markov shift. If X_n = {(x_i) : i ∈ Z, x_i ∈ {1, 2, ..., n}} then it is a two-sided

Markov shift.

Our goal is to use (X_n, Σ_n, (p, P), f_n) to approximate (D, B, μ, f). If we look at a point
x ∈ D, iterate f, and let a_N = i where f^N(x) ∈ D_i, then (p, P) gives the probability distribution
of cylinder sets. For our mixing problem an element of X_n represents the movement of
an ingredient particle while stirring, so we say that (X_n, Σ_n, (p, P), f_n) is a one-sided shift.

In the Doubly-Stochastic Matrices section, we will show that p = (1/n, 1/n, ..., 1/n) is

a stationary distribution for any n × n ds-matrix. For brevity we will let (p, P) represent
Markov shifts.

With Markov shifts, weak-mixing is equivalent to strong-mixing [9], so we will refer

to ((1/n, 1/n, 1/n, ..., 1/n), P) as mixing or not mixing. Since our Markov shift is only an
approximation of (D, B, μ, f), we may accept hypotheses of weak-mixing but not make

statements of strong-mixing when weak-mixing is accepted. Later we will show why

((1/n, 1/n, 1/n, ..., 1/n), P) is a reasonable approximation of (D, B, μ, f).
1. If ((1/n, 1/n, 1/n, ..., 1/n), P) is not ergodic, we will fail to reject the hypothesis that

(D, B, μ, f) is not ergodic and not weak-mixing.

2. If ((1/n, 1/n, 1/n, ..., 1/n), P) is ergodic but not mixing, we will reject the hypothesis

that (D, B, μ, f) is not ergodic in favor of the hypothesis that (D, B, μ, f) is ergodic

but not weak-mixing.

3. If ((1/n, 1/n, 1/n, ..., 1/n), P) is mixing, we will reject the hypothesis that (D, B, μ, f)

is not ergodic and not mixing in favor of the hypothesis that (D, B, μ, f) is ergodic

and weak-mixing.

The following lemma and theorems give us criteria for when

((1/n, 1/n, 1/n, ..., 1/n), P)

is ergodic or mixing.

Lemma 2.2.9. Let P be a stochastic matrix having a strictly positive probability vector p

with pP = p. Then

Q = lim_{N→∞} (1/N) Σ_{i=0}^{N−1} P^i

exists. The matrix Q is also stochastic and

PQ = QP = Q.

Any eigenvector of P for the eigenvalue 1 is also an eigenvector of Q. Also,

Q² = Q.

Theorem 2.2.10. Let f_n denote the (p, P) Markov shift (one-sided or two-sided). We can

assume p_i > 0 for all i, where p = (p_1, ..., p_n) (P is n × n). Let Q be the matrix obtained in

the lemma above; the following are equivalent:

1. (X_n, Σ_n, (p, P), f_n) is ergodic.

2. All rows of the matrix Q are identical.

3. Every entry in Q is strictly positive.

4. P is irreducible.

5. 1 is a simple eigenvalue of P.

We will set p = (1/n, 1/n, 1/n, ..., 1/n), P = (p_ij), p_ij = μ(f(x) ∈ D_j | x ∈ D_i). P is a

stochastic matrix, so 1 is an eigenvalue of P; if 1 is a simple eigenvalue of P and all other

eigenvalues are not 1 in magnitude, then we will reject the null hypothesis in favor of the

hypothesis that (D, B, μ, f) is ergodic.

Theorem 2.2.11. [9, 10] If f_n is the (p, P) Markov shift (either one-sided or two-sided),

the following are equivalent:

1. (X_n, Σ_n, (p, P), f_n) is weak-mixing.

2. (X_n, Σ_n, (p, P), f_n) is strong-mixing.

3. The matrix P is irreducible and aperiodic (i.e., there exists N > 0 such that the matrix P^N has

no zero entries).

4. For all states i, j we have (P^k)_ij → p_j as k → ∞.

5. 1 is the only eigenvalue of P with magnitude 1, and it is a simple root of P's

characteristic polynomial.

Say that {λ_1, λ_2, λ_3, ..., λ_n} is the multiset of P's eigenvalues with

1 = |λ_1| ≥ |λ_2| ≥ |λ_3| ≥ ... ≥ |λ_n|,

partially ordering the eigenvalues by magnitude and then by distance from 1. The previous

theorem implies that ((1/n, 1/n, 1/n, ..., 1/n), P) is mixing if and only if 1 > |λ_2|. So if

|λ_2| is smaller than one, then we will reject the null hypothesis in favor of the hypothesis

that (D, B, μ, f) is weak-mixing. Since (X_n, Σ_n, ((1/n, 1/n, 1/n, ..., 1/n), P), f_n) is only an

approximation of (D, B, μ, f), our test only checks for weak-mixing.
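Conditions 3 and 5 of the theorem above are easy to check numerically for small matrices; a sketch with two illustrative 2 × 2 ds-matrices of our own choosing:

```python
import numpy as np

def is_mixing(P, tol=1e-10):
    # Theorem 2.2.11, condition 5: mixing iff 1 is the only eigenvalue of
    # magnitude 1, i.e. the second largest magnitude is strictly below 1.
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return mags[1] < 1 - tol

cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])   # eigenvalues 1, -1: ergodic, not mixing
uniform = np.array([[0.5, 0.5], [0.5, 0.5]])  # eigenvalues 1, 0: mixing

print(is_mixing(cyclic), is_mixing(uniform))  # False True

# Cross-check via condition 3: some power P^N has no zero entries.
print(bool(np.all(np.linalg.matrix_power(uniform, 2) > 0)))  # True
print(bool(np.all(np.linalg.matrix_power(cyclic, 7) > 0)))   # False: odd powers swap the states
```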

Theorem 2.2.12. [9] The Markov shift (p, P), p = (p_i), P = (p_ij), has metric entropy

−Σ_{i=1}^{n} Σ_{j=1}^{n} p_i p_ij log(p_ij),

where we define 0 log(0) := 0.

Corollary 2.2.13. The sum −(1/n) Σ_{i=1}^{n} Σ_{j=1}^{n} p_ij log(p_ij) gives an estimate of the entropy of

(D, B, μ, f).

Proof. Set (p, P) = ((1/n, 1/n, 1/n, ..., 1/n), P), so that p_i = 1/n for all i. □

Theorem 2.2.14. If (p, P) is a Markov shift on n states (one-sided or two-sided) then its

entropy is less than or equal to log(n).

Corollary 2.2.15. If h is an upper bound for the entropy of (D, B, p, f) then we should

set

n > exp(h)

when we partition D.
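The entropy formula of Theorem 2.2.12 and the log(n) bound of Theorem 2.2.14 can be sketched in a few lines; the helper below is our own, using the convention 0 log 0 := 0.

```python
import numpy as np

def metric_entropy(p, P):
    # -sum_i sum_j p_i p_ij log(p_ij), with the convention 0 log 0 := 0
    safe = np.where(P > 0, P, 1.0)   # log(1) = 0 stands in for the 0 log 0 terms
    return -float(p @ (P * np.log(safe)).sum(axis=1)) + 0.0  # + 0.0 normalizes -0.0

n = 4
p = np.full(n, 1.0 / n)
uniform = np.full((n, n), 1.0 / n)       # all rows equal: the mixing extreme
cyclic = np.roll(np.eye(n), 1, axis=1)   # deterministic cyclic permutation
print(metric_entropy(p, uniform))        # log(4) ~ 1.3863, the log(n) upper bound
print(metric_entropy(p, cyclic))         # 0: a deterministic shift has zero entropy
```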

Example 2.2.16. Here are some examples of Markov shifts with four states.

1. ((1/4, 1/4, 1/4, 1/4), P) with

   P = [ 1/6  5/6   0    0  ]
       [ 5/6  1/6   0    0  ]
       [  0    0   1/3  2/3 ]
       [  0    0   2/3  1/3 ]

   is a non-ergodic Markov shift with entropy approximately 0.544.

2. ((1/4, 1/4, 1/4, 1/4), P) with

   P = [ 0  1  0  0 ]
       [ 0  0  1  0 ]
       [ 0  0  0  1 ]
       [ 1  0  0  0 ]

   is an ergodic Markov shift, but is non-mixing, with entropy 0.

3. ((1/4, 1/4, 1/4, 1/4), P) with

   P = [ 1/4  1/4  1/4  1/4 ]
       [ 1/4  1/4  1/4  1/4 ]
       [ 1/4  1/4  1/4  1/4 ]
       [ 1/4  1/4  1/4  1/4 ]

   is a mixing Markov shift. This Markov shift achieves the entropy upper bound log(4).
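The classifications in Example 2.2.16 can be verified from the spectra via Theorems 2.2.10 and 2.2.11; a short check (the helper name is ours):

```python
import numpy as np

def second_eig_mag(P):
    # magnitude of the second largest eigenvalue, sorted by magnitude
    return np.sort(np.abs(np.linalg.eigvals(P)))[::-1][1]

P1 = np.array([[1/6, 5/6, 0, 0],
               [5/6, 1/6, 0, 0],
               [0, 0, 1/3, 2/3],
               [0, 0, 2/3, 1/3]])   # block diagonal: 1 is not simple, not ergodic
P2 = np.array([[0., 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [1, 0, 0, 0]])       # cyclic permutation: ergodic, not mixing
P3 = np.full((4, 4), 0.25)          # uniform rows: mixing
print([float(round(second_eig_mag(P), 6)) for P in (P1, P2, P3)])  # [1.0, 1.0, 0.0]
```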

Example 2.2.17 (Special Case n = 2). If we have a 2-refinement of D, then

P = [  p   1−p ]
    [ 1−p   p  ].

The characteristic polynomial of P is (λ − 1)(λ + 1 − 2p), so λ2(P) = 2p − 1.

((1/2, 1/2), P) is ergodic ⟺ P ≠ I (i.e., p ≠ 1).

((1/2, 1/2), P) is mixing ⟺ P ≠ I and P ≠ [ 0 1 ; 1 0 ] (i.e., 0 < p < 1).

When n = 2, a doubly-stochastic matrix is symmetric and there are one or two distinct
entries, so it is easy to explicitly state all cases of ergodic and mixing Markov shifts.
When n > 2, the number of entries makes describing such Markov shifts with matrix
entries difficult.
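A quick numerical confirmation of λ2(P) = 2p − 1 for the 2 × 2 case (our own check, not from the text):

```python
import numpy as np

results = {}
for p in (0.0, 0.25, 0.5, 0.9):
    P = np.array([[p, 1 - p], [1 - p, p]])
    # symmetric 2x2 ds-matrix: eigenvalues should be 2p - 1 and 1
    results[p] = np.linalg.eigvalsh(P).tolist()   # ascending order
for p, eigs in results.items():
    print(p, eigs)  # 2p - 1 appears alongside the eigenvalue 1
```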

CHAPTER 3
STOCHASTIC AND DOUBLY-STOCHASTIC MATRICES

3.1 Doubly-Stochastic Matrices

When we construct the stochastic Ulam matrix that approximates our stirring

protocol, it will approximate a doubly-stochastic matrix. In this section we establish

properties of stochastic matrices and doubly stochastic matrices. Some of the results

are not used in our hypothesis tests, but can confirm statistical results and provide

intuition into the structure of doubly stochastic matrices.

Definition 3.1.1. If P is an n x n stochastic matrix such that P^T is also a stochastic
matrix, then P is a doubly-stochastic matrix. We will refer to such matrices as ds-
matrices.

Notation 3.1.2. Let M_n denote the set of n x n matrices such that all rows sum to one
(no restriction on entries).

Let P_n denote the set of n x n stochastic matrices.

Let DS_n denote the set of n x n ds-matrices.

Let S_n denote the set of n x n symmetric ds-matrices.

Notice that S_n ⊂ DS_n ⊂ P_n ⊂ M_n.

Remark 3.1.3. If P is an n x n symmetric stochastic matrix then P is a ds-matrix, but the
converse is not true.

Proof. The matrix P is stochastic, and by symmetry

P = P^T,

so P^T is also stochastic. Therefore P is doubly-stochastic.

For the converse, look at the counterexample

P = [1/3 1/3 1/3]
    [1/2 1/6 1/3]
    [1/6 1/2 1/3].

P^T is stochastic, but P is not symmetric.

Theorem 3.1.4. If P is the stochastic matrix formed by taking an n-partition of
(D, B, μ, f), then P is a ds-matrix.

Proof. By construction, P is a stochastic matrix. Since 0 ≤ p_ij ≤ 1 for all i, j, it suffices to
show that every column sums to one. Since the μ(x ∈ D_i) are all equal, μ(x ∈ D_i) = 1/n.
For each j,

Σ_{i=1}^n p_ij = Σ_{i=1}^n μ(f(x) ∈ D_j | x ∈ D_i)
             = Σ_{i=1}^n μ(f(x) ∈ D_j ∧ x ∈ D_i) / μ(x ∈ D_i)
             = n Σ_{i=1}^n μ(f(x) ∈ D_j ∧ x ∈ D_i)
             = n μ(f(x) ∈ D_j) = n μ(D_j) = n(1/n) = 1,

where the second-to-last equality uses that f is measure-preserving.

So Σ_i p_ij = 1 for every j. It follows that P^T is a stochastic matrix. □
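To illustrate Theorem 3.1.4, the sketch below builds a Monte Carlo Ulam matrix for the doubling map x ↦ 2x mod 1, a Lebesgue-measure-preserving map chosen here only for illustration (it is not one of the stirring protocols studied later); the partition size n = 4 and sample count m = 200,000 are arbitrary choices. Rows sum to one exactly, and the column sums are close to one, as the theorem predicts for the limiting matrix:

```python
import random

def ulam_matrix(f, n, m, seed=0):
    """Monte Carlo Ulam matrix: p_ij estimates mu(f(x) in D_j | x in D_i)
    for the equal-measure partition D_i = [i/n, (i+1)/n) of [0, 1)."""
    rng = random.Random(seed)
    counts = [[0] * n for _ in range(n)]
    for _ in range(m):
        x = rng.random()
        i = min(int(n * x), n - 1)
        j = min(int(n * f(x)), n - 1)
        counts[i][j] += 1
    return [[c / max(sum(row), 1) for c in row] for row in counts]

# The doubling map x -> 2x mod 1 preserves Lebesgue measure on [0, 1).
P = ulam_matrix(lambda x: (2 * x) % 1.0, n=4, m=200_000)

row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(4)) for j in range(4)]
print(row_sums)  # each exactly 1 by construction
print(col_sums)  # each close to 1: the Ulam matrix approximates a ds-matrix
```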

We skip the standard proofs of the following lemmas. The next lemma shows that all

of the eigenvalues of a stochastic matrix are on the unit disk of the complex plane.

Lemma 3.1.5. If P is an n x n stochastic matrix and λ is an eigenvalue of P, then

|λ| ≤ 1.

Next we show that the largest eigenvalue of any stochastic matrix is one with

an eigenvector of all ones. Because of this lemma, we may use (1/n,..., 1/n) as a

stationary distribution of P. Since our goal is to measure mixing between n states each

with measure 1/n, (1/n,..., 1/n) is intuitively the correct stationary distribution to use.

Without the lemma we would have the additional task of finding a stationary distribution

of P.

Lemma 3.1.6. If P is a stochastic matrix, then

(1, 1, ..., 1)

is a left eigenvector of P^T with eigenvalue 1. If P is an n x n ds-matrix, then

(1/n, 1/n, ..., 1/n)

is a stationary distribution.

The positive and negative entries of eigenvectors from an Ulam matrix can be used

to detect regions that do not mix or are slow to mix. The next theorem is critical to this

technique. In addition, the next theorem will help justify using eigenvectors from an Ulam

matrix to approximate eigenfunctions. Also, the theorem and its proof are very similar to an analogous result for eigenfunctions.

Theorem 3.1.7. If P is an n x n stochastic matrix with left eigenvector v = (v_i), vP = λv,

then

λ = 1  or  Σ_{i=1}^n v_i = 0.
i=1

Proof. The matrix P is a stochastic matrix, so all rows sum to one. Since vP = λv,
for each j,

Σ_{i=1}^n v_i p_ij = λ v_j.

Summing over j and interchanging the order of summation,

Σ_{i=1}^n v_i Σ_{j=1}^n p_ij = λ Σ_{j=1}^n v_j

Σ_{i=1}^n v_i (1) = λ Σ_{j=1}^n v_j

(1 − λ) Σ_{i=1}^n v_i = 0.

Thus we have the result. □
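Theorem 3.1.7 can be checked numerically on the non-symmetric 3 x 3 ds-matrix used earlier as a counterexample (a sketch; the left eigenvectors of P are computed as the eigenvectors of P^T):

```python
import numpy as np

# The non-symmetric 3x3 ds-matrix used as a counterexample in the text.
P = np.array([[1/3, 1/3, 1/3],
              [1/2, 1/6, 1/3],
              [1/6, 1/2, 1/3]])

# Left eigenvectors of P are the (right) eigenvectors of P^T.
vals, vecs = np.linalg.eig(P.T)
for lam, v in zip(vals, vecs.T):
    # Either lambda = 1, or the entries of the eigenvector sum to zero.
    print(lam, v.sum())
    assert abs(lam - 1) < 1e-9 or abs(v.sum()) < 1e-9
```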

Theorem 3.1.8 (The Perron-Frobenius Theorem for Stochastic Matrices). If P is a

stochastic matrix with strictly positive entries then

1. P has 1 as a simple eigenvalue.

2. The eigenvector corresponding to 1 has strictly positive entries.

3. All other eigenvalues have magnitude less than 1.

4. The eigenvectors that do not correspond to 1 cannot have strictly positive entries.

If P is a stochastic matrix with nonnegative entries then there is an eigenvector corre-

sponding to 1 with all entries in [0, ∞).

Corollary 3.1.9. If P̂ (our Monte Carlo approximation of P) has strictly positive entries,
then for each i, j,

μ(f(x) ∈ D_j ∧ x ∈ D_i) > 0 almost surely.

It follows that P has strictly positive entries, so (X_n, Σ_n, ((1/n, ..., 1/n), P), f_n) is weak-
mixing, and we will conclude that (D, B, μ, f) is weak-mixing.

When P is the matrix defined by

p_ij = μ(f(x) ∈ D_j | x ∈ D_i),

the vector

(1/n, 1/n, 1/n, ..., 1/n)

is a left eigenvector and stationary distribution of P. Thus

((1/n, 1/n, 1/n, ..., 1/n), P)

defines a one-sided or two-sided Markov shift. Our goal is to use

(X_n, Σ_n, ((1/n, 1/n, 1/n, ..., 1/n), P), f_n)

to approximate (D, B, μ, f). If we look at a point x ∈ D, iterate f, and let a_N = i where
f^N(x) ∈ D_i, then ((1/n, 1/n, 1/n, ..., 1/n), P) as a Markov shift gives the probability distribution
of cylinder sets. So our approximation gives us information about iterations of f on D.
Knowing that P is a ds-matrix gives us a stationary distribution of P that is consistent
with μ(D_i) = 1/n.
n
Now we take a look at the simplest case possible, n = 2. We will return to this

example to illustrate concepts.

Example 3.1.10 (Special Case n = 2). If we partition D into D_1 and D_2 with μ(D_1) = μ(D_2) =
1/2, we get the following equations for P.

p_11 + p_12 = 1
p_21 + p_22 = 1
p_11 + p_21 = 1
p_12 + p_22 = 1

Thus p_11 = p_22 and p_12 = p_21. If we set p = p_11 and q = 1 − p = p_12,

P = [p q]
    [q p].

Proposition 3.1.11. S_n = DS_n if and only if n = 2.

Proof. Break the proof into three cases: n = 2, n = 3, and n > 3.

n = 2: This case follows from the previous example. Every 2 x 2 ds-matrix is of the
form

[ p   1−p]
[1−p   p ]

for some p ∈ [0, 1], and every such matrix is symmetric.

n = 3: This case follows from the previous counterexample. The matrix

P = [1/3 1/3 1/3]
    [1/2 1/6 1/3]
    [1/6 1/2 1/3]

is a 3 x 3 ds-matrix that is not symmetric.

n > 3: Let P denote the matrix in the n = 3 case and I_{n−3} be the (n − 3) x (n − 3)
identity matrix. Then for each n > 3, the block-diagonal matrix

[P  0      ]
[0  I_{n−3}]

(with appropriately sized zero blocks off the diagonal) is an n x n ds-matrix that is not
symmetric. □

As n increases, the degrees of freedom for n x n ds-matrices increase and the
ways that a ds-matrix can deviate from being symmetric grow. Due to this, and the
observation that randomly generated ds-matrices are symmetric less frequently for large
n, we propose the following conjecture.

Conjecture 3.1.12. If M_n is the set of n x n matrices whose rows and columns sum to
one, g : M_n → R^{n²} with ||M||_F = ||g(M)||_2 for all M ∈ M_n, DS_n is the set of n x n ds-matrices,
and S_n is the set of symmetric n x n ds-matrices, then

lim_{n→∞} μ(g(S_n)) = 0

in the measure space (g(DS_n), B, μ) where μ is Lebesgue measure.

If this conjecture is correct, then observing a symmetric ds-matrix becomes less

likely as n -> oo.

Definition 3.1.13. If V is a vector space with real scalars and we take a linear com-
bination of elements from V, c_1 v_1 + c_2 v_2 + ... + c_N v_N, such that 0 ≤ c_i ≤ 1 and
c_1 + c_2 + ... + c_N = 1, then c_1 v_1 + c_2 v_2 + ... + c_N v_N is called a convex combination. We
refer to the coefficients {c_1, c_2, ..., c_N} as convex coefficients.

Remark 3.1.14. If {c_1, c_2, ..., c_N} are convex coefficients then c = (c_i) is a probability
vector. Convex combinations are weighted averages.

Theorem 3.1.15 (Birkhoff-von Neumann). [11] An n x n matrix is doubly stochastic if and
only if it is a convex combination of n x n permutation matrices.

So DS_n is a convex set with the permutation matrices being the extreme points of
the set. In fact, by Carathéodory's convex set theorem, every n x n ds-matrix is a convex
combination of (n − 1)² + 1 or fewer permutation matrices.
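The Birkhoff-von Neumann theorem can be made constructive for small n by a greedy procedure: repeatedly find a permutation supported on the positive entries of the residual and subtract off as much of it as possible. The brute-force search over permutations below is viable only for small n; it is an illustrative sketch, not an algorithm used elsewhere in this work:

```python
import itertools

def birkhoff_decompose(P, tol=1e-12):
    """Greedy Birkhoff decomposition of a small ds-matrix into a convex
    combination of permutation matrices (brute force over permutations)."""
    n = len(P)
    P = [row[:] for row in P]  # work on a copy; the input is left intact
    terms = []
    while True:
        best = None
        for perm in itertools.permutations(range(n)):
            w = min(P[i][perm[i]] for i in range(n))
            if w > tol and (best is None or w > best[1]):
                best = (perm, w)
        if best is None:
            return terms
        perm, w = best
        terms.append((w, perm))
        for i in range(n):
            P[i][perm[i]] -= w  # the minimum entry becomes exactly zero

P = [[1/3, 1/3, 1/3],
     [1/2, 1/6, 1/3],
     [1/6, 1/2, 1/3]]
terms = birkhoff_decompose(P)
print(terms)
print(sum(w for w, _ in terms))  # convex coefficients sum to ~1
```

Summing w * (permutation matrix of perm) over the returned terms reconstructs P.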

Notation 3.1.16. Let P̄ denote the n x n matrix with every entry equal to 1/n:

P̄ = [1/n ... 1/n]
    [ :        :]
    [1/n ... 1/n].

A quick computation shows that

P̄ = (1/n!) Σ_{k=1}^{n!} P_k,

where {P_k}_{k=1}^{n!} is the set of n x n permutation matrices.

Since P is the average of all n x n permutation matrices, P is the geometric center

of DSn. If we apply the uniform probability measure to the set of permutation matrices,

then P is the mean.
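The identity P̄ = (1/n!) Σ_k P_k is easy to verify exactly for small n, since each entry receives (n − 1)! of the n! permutations (a sketch using exact rational arithmetic):

```python
import itertools
from fractions import Fraction

n = 4
perms = list(itertools.permutations(range(n)))
avg = [[Fraction(0)] * n for _ in range(n)]
for perm in perms:
    for i, j in enumerate(perm):
        avg[i][j] += Fraction(1, len(perms))

# The average of all n! permutation matrices is Pbar: every entry is 1/n.
assert all(entry == Fraction(1, n) for row in avg for entry in row)
print(avg[0][0])  # 1/4
```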

Example 3.1.17 (Special Case n = 2). If n = 2 then a ds-matrix is of the form

[p q]
[q p],   q = 1 − p.

[p q] = p [1 0] + q [0 1]
[q p]     [0 1]     [1 0]

Notice that 2 x 2 ds-matrices are always symmetric.

Example 3.1.18 (Special Case n = 3). If P is a 3 x 3 ds-matrix then P is a convex
combination of the six 3 x 3 permutation matrices

[1 0 0] [1 0 0] [0 1 0] [0 0 1] [0 1 0] [0 0 1]
[0 1 0] [0 0 1] [1 0 0] [0 1 0] [0 0 1] [1 0 0]
[0 0 1] [0 1 0] [0 0 1] [1 0 0] [1 0 0] [0 1 0].

Since the first four permutation matrices are symmetric and the last two are not, the
convex coefficients of

[0 1 0]       [0 0 1]
[0 0 1]  and  [1 0 0]
[1 0 0]       [0 1 0]

determine how far a ds-matrix is from symmetry.

The next theorem shows that DS_n is a bounded set.

Theorem 3.1.19. If P is an n x n ds-matrix, then for the 1-norm, 2-norm, ∞-norm, and
Frobenius norm,

||P − P̄|| ≤ ||I − P̄||,

where I is the identity matrix.

Proof. By the Birkhoff-von Neumann theorem, P is a convex combination of permutation
matrices. Assume that P = Σ_{i=1}^{n!} a_i P_i where {P_i}_{i=1}^{n!} is the set of n x n permutation
matrices. Then

||P − P̄|| = ||Σ_{i=1}^{n!} a_i P_i − P̄||
          = ||Σ_{i=1}^{n!} a_i (P_i − P̄)||
          ≤ Σ_{i=1}^{n!} a_i ||P_i − P̄||
          ≤ (max_i ||P_i − P̄||) Σ_{i=1}^{n!} a_i
          = max_i ||P_i − P̄||.

Since P̄ has all equal entries, each of the listed norms of P_i − P̄ is the same for every
permutation matrix P_i, so

max_i ||P_i − P̄|| = ||I − P̄||.

Thus we get the upper bound. □

Corollary 3.1.20.

||P − P̄||_1 ≤ 2(n − 1)/n

||P − P̄||_∞ ≤ 2(n − 1)/n

||P − P̄||_2 ≤ 1

||P − P̄||_F ≤ √(n − 1)

Notice that all upper bounds are achieved when P is a permutation matrix.
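A quick numerical check of Corollary 3.1.20 (a sketch with n = 4; the identity matrix stands in for the permutation matrix that achieves the bounds):

```python
import numpy as np

def norm_gaps(P):
    """The four norms of P - Pbar from Corollary 3.1.20."""
    n = len(P)
    Pbar = np.full((n, n), 1.0 / n)
    D = P - Pbar
    return (np.linalg.norm(D, 1), np.linalg.norm(D, 2),
            np.linalg.norm(D, np.inf), np.linalg.norm(D, 'fro'))

n = 4
I = np.eye(n)                      # a permutation matrix: achieves the bounds
one, two, inf, fro = norm_gaps(I)
print(one, two, inf, fro)          # 2(n-1)/n, 1, 2(n-1)/n, sqrt(n-1)
assert abs(one - 2 * (n - 1) / n) < 1e-12
assert abs(two - 1.0) < 1e-12
assert abs(inf - 2 * (n - 1) / n) < 1e-12
assert abs(fro - np.sqrt(n - 1)) < 1e-12
```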

Proof. For the 1-norm, ∞-norm, and Frobenius norm the result follows from the definitions.
For the 2-norm, P̄ is a symmetric idempotent matrix, thus I − P̄ is a symmetric
idempotent matrix. Idempotent matrices have eigenvalues 0 and 1. Since I − P̄ is not
the zero matrix, it has one as its largest eigenvalue. Look at the largest singular value of
I − P̄:

||I − P̄||_2 = σ_1(I − P̄)
            = √(λ_1((I − P̄)^T (I − P̄)))
            = √(λ_1((I − P̄)(I − P̄)))
            = √(λ_1(I − P̄)) = 1.

Thus ||P − P̄||_2 ≤ 1. □

Example 3.1.21 (n = 2). If

P = [ p   1−p]        P̄ = [1/2 1/2]
    [1−p   p ],            [1/2 1/2],

then

||P − P̄||_1 = |2p − 1| ≤ 1,

||P − P̄||_2 = |2p − 1| ≤ 1,

||P − P̄||_∞ = |2p − 1| ≤ 1,

||P − P̄||_F = |2p − 1| ≤ 1.

The matrix P̄ is the center of the ds-matrices; how does ((1/n, ..., 1/n), P̄) compare to
other measures? If our Markov shift has ((1/n, ..., 1/n), P̄) as its measure, then knowing
which partition set x is in tells us nothing about which partition set f(x) is in. Thus P̄
is the matrix of optimal mixing. If |λ_2(P)| < 1, then (X_n, Σ_n, ((1/n, ..., 1/n), P), f_n) is a
mixing dynamical system and P^k → P̄ as k → ∞ (we will show this result while we
construct our mixing rate estimate). So a Markov shift (X_n, Σ_n, ((1/n, ..., 1/n), P), f_n)
being a mixing dynamical system is equivalent to P^k → P̄ as k → ∞. We will need a
Jordan canonical form of P to make an inference about the rate at which P^k → P̄ as
k → ∞. We will use the rate at which P^k → P̄ as k → ∞ to estimate the mixing rate of
(D, B, μ, f).
Lemma 3.1.22. The n x n matrix P̄ is an orthogonal projection [12] with characteristic
polynomial x^{n−1}(x − 1). Furthermore, the Jordan canonical form of P̄ has a single one on the
diagonal and all other entries are zero.

Proof. The matrix P̄ is a ds-matrix, hence (1/n, 1/n, ..., 1/n) is a stationary distribution:

(1/n, 1/n, ..., 1/n) P̄ = (1/n, 1/n, ..., 1/n).

Since every row of P̄ equals (1/n, ..., 1/n), it follows that

P̄² = P̄.

Since P̄ is symmetric and idempotent, it is an orthogonal projection and all eigenvalues are 0 or 1. All columns and all rows
are equal, so rank(P̄) = 1; thus only one eigenvalue does not equal zero. Since the rank
of a matrix is equal to the rank of its Jordan canonical form, the last part of the lemma
follows. □

3.2 Additional Properties of Stochastic Matrices

Since we are using an eigenvalue of a stochastic matrix as a test statistic, the

probability distribution of eigenvalues is important. We may use the next few results

to eliminate some measures from consideration as the probability distribution of λ_2(P̂)

when we have prior knowledge about P.

Proposition 3.2.1. If P is an n x n stochastic matrix then λ_2 + ... + λ_n ∈ [−1, n − 1].

Notice that the upper bound is achieved when P is the identity matrix and the lower
bound is achieved when

P = [0 1 0]
    [0 0 1]
    [1 0 0].

Proof. The matrix P is stochastic, so all entries are in [0, 1], thus trace(P) ∈ [0, n].

λ_1 + λ_2 + ... + λ_n = trace(P), so λ_1 + λ_2 + ... + λ_n ∈ [0, n]. Since λ_1 = 1, λ_2 + ... + λ_n ∈

[−1, n − 1].

Proposition 3.2.2. If P is an n x n stochastic matrix then det(P) ∈ [−1, 1].

The upper and lower bounds are achieved by permutation matrices.

Proof. All stochastic matrices have real entries, so det(P) ∈ R.

det(P) = λ_1 λ_2 ... λ_n

Taking absolute values of both sides, we get

|det(P)| = |λ_1 λ_2 ... λ_n|

         = |λ_1| |λ_2| ... |λ_n|

         ≤ 1.

Thus we get the result. □

Corollary 3.2.3. If P is an n x n stochastic matrix then λ_2 ... λ_n ∈ [−1, 1].

The next result shows that if we are working with a class of ds-matrices with large

entries on the diagonal, then the eigenvalues are not uniformly distributed on the unit

disk. If we write the Gershgorin circle theorem for stochastic matrices, we can quickly

find a region that will contain all eigenvalues.

Theorem 3.2.4. If P = (p_ij) is an n x n stochastic matrix then

|λ − min_i p_ii| ≤ 1 − min_i p_ii

for all eigenvalues λ.

Proof. By the Gershgorin circle theorem there exists i such that

|λ − p_ii| ≤ Σ_{j=1, j≠i}^n p_ij.

P is a stochastic matrix, thus

Σ_{j=1, j≠i}^n p_ij = 1 − p_ii.

It follows that

|λ − p_ii| ≤ 1 − p_ii.

So each eigenvalue is contained in a closed disk centered at p_ii with radius 1 − p_ii for
some i. Each of these disks has a diameter in [−1, 1] that contains 1. If we look at the real
numbers x in [−1, 1] contained in the ith disk,

|x − p_ii| ≤ 1 − p_ii

2p_ii − 1 ≤ x ≤ 1

2 min_j p_jj − 1 ≤ x ≤ 1

|x − min_j p_jj| ≤ 1 − min_j p_jj.

It follows that the real numbers contained in |λ − p_ii| ≤ 1 − p_ii are contained in the disk
defined by |λ − min_j p_jj| ≤ 1 − min_j p_jj. Since the center of each disk is contained in [0, 1],
all disks are contained in |λ − min_j p_jj| ≤ 1 − min_j p_jj. □
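The Gershgorin bound of Theorem 3.2.4 can be checked numerically (the 3 x 3 stochastic matrix below is an arbitrary illustration):

```python
import numpy as np

P = np.array([[0.70, 0.20, 0.10],
              [0.05, 0.80, 0.15],
              [0.10, 0.10, 0.80]])
c = P.diagonal().min()          # disk center: min_i p_ii
r = 1.0 - c                     # disk radius: 1 - min_i p_ii
for lam in np.linalg.eigvals(P):
    # every eigenvalue lies in the closed disk |lambda - c| <= r
    print(lam, abs(lam - c), r)
    assert abs(lam - c) <= r + 1e-12
```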

Corollary 3.2.5. If P is an n x n stochastic matrix with all diagonal entries greater than 1/2,
then P is invertible.

Proof. For an n x n matrix to be invertible, all eigenvalues must be nonzero. Each
eigenvalue satisfies

|λ − p_ii| ≤ 1 − min_i p_ii < 1/2

for some i. Since

1/2 < min_i p_ii,

the closed disk centered at p_ii with radius less than 1/2 does not contain zero. □

If we have prior knowledge of a positive lower bound of trace(P), then we may use
the next theorem to exclude some distributions from consideration as the probability
distribution of λ_2.

Theorem 3.2.6. If P is an n x n stochastic matrix then

(trace(P) − 1)/(n − 1) ≤ |λ_2|.

Proof. P is a nonnegative matrix, so trace(P) ≥ 0, and

trace(P) = 1 + λ_2 + ... + λ_n

         ≤ 1 + |λ_2| + ... + |λ_n|

         ≤ 1 + (n − 1)|λ_2|.

Therefore

(trace(P) − 1)/(n − 1) ≤ |λ_2|.

Thus we get the result. □

This bound is achieved when P is the identity matrix.

Theorem 3.2.7. If P is an n x n stochastic matrix then

|λ_n| ≤ |det(P)|^{1/(n−1)} ≤ |λ_2|.

Proof. Since P is stochastic,

λ_1 = 1.

Let's use the relationship between determinants and eigenvalues,

det(P) = λ_1 λ_2 ... λ_n

       = λ_2 ... λ_n.

Taking absolute values of both sides, we get

|det(P)| = |λ_2 ... λ_n| = |λ_2| |λ_3| ... |λ_n|.

By how we ordered the λ_i,

|λ_n|^{n−1} ≤ |λ_2| |λ_3| ... |λ_n| ≤ |λ_2|^{n−1}.

Thus

|λ_n| ≤ |det(P)|^{1/(n−1)} ≤ |λ_2|.

This gives us both inequalities. □

Notice that |λ_n| = |det(P)|^{1/(n−1)} = |λ_2| when P is a permutation matrix or P̄.

Corollary 3.2.8. If P is an n x n stochastic matrix then

max{ (trace(P) − 1)/(n − 1), |det(P)|^{1/(n−1)} } ≤ |λ_2|.
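Both lower bounds on |λ_2| from Theorem 3.2.6 and Theorem 3.2.7 can be checked numerically; the matrix below, 0.7 I + 0.1 (ones), is an illustrative ds-matrix for which both bounds happen to be tight:

```python
import numpy as np

P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])   # eigenvalues 1, 0.7, 0.7
n = len(P)
mags = sorted(np.abs(np.linalg.eigvals(P)), reverse=True)
lam2 = mags[1]
trace_bound = (np.trace(P) - 1) / (n - 1)
det_bound = abs(np.linalg.det(P)) ** (1 / (n - 1))
print(lam2, trace_bound, det_bound)  # all three are 0.7 for this matrix
assert max(trace_bound, det_bound) <= lam2 + 1e-9
```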

Proposition 3.2.9. If P is an n x n stochastic matrix then, for any matrix norm induced by
a vector norm,

1 ≤ ||P||.

Proof. One is an eigenvalue of any stochastic matrix, and an induced norm is at least the spectral radius. □

The identity matrix achieves the lower bound.

Proposition 3.2.10. If P is an n x n stochastic matrix, then

||P||_∞ = 1;

if P is moreover a ds-matrix, then also

||P||_1 = 1  and  ||P||_2 = 1.

Now we look at a matrix that defines a linear map that we may use to evaluate the
eigenvalues of P. We may use this map when eigenvalue stability is uncertain.

Notation 3.2.11. Let T denote the n x n matrix

T = I − P̄.

Notation 3.2.12. Let <u, v> denote the standard dot product of the vectors u and v.

The next theorem shows that λ_2(P) = λ_1(TP).

Theorem 3.2.13. If P is an n x n stochastic matrix, then

Λ(TP) = (Λ(P) ∪ {0}) \ {1},

where Λ denotes the multiset of eigenvalues. Moreover, if P is a ds-matrix, any eigenvector of P with λ ≠ 1
is an eigenvector of TP with the same eigenvalue.

Proof. Since P is a stochastic matrix,

u = (1/√n)(1, 1, ..., 1)^T

is an eigenvector with eigenvalue one and has norm one. Notice that P̄ = u u^T, so for any
vector v = (v_i),

Tv = (I − P̄)v = v − <v, u> u,

the orthogonal projection of v onto the orthogonal complement of u.

Suppose v is an eigenvector of P, Pv = λv. Then

TPv = T(λv) = λTv = λ(v − <v, u> u).

If λ ≠ 1 and P is a ds-matrix, then v^T is a left eigenvector of the stochastic matrix P^T,
so by Theorem 3.1.7 the entries of v sum to zero; hence <v, u> = 0 and

TPv = λv.

If v is a scalar multiple of u, then

TPv = Tv = v − <v, u> u = 0 = 0 · v,

so the eigenvalue 1 attached to u is replaced by 0.

If λ = 1 and v is not a multiple of u, then without loss of generality we may assume
that <v, u> = 0. It follows that

TPv = Tv = v.

Since TP has the same eigenvectors as P, with the eigenvalue 1 of u replaced by 0,
we get the result. □

So we may use TP to describe the eigenvalues of P; furthermore, λ_2(P) = λ_1(TP).
Next we look at how the singular values of TP compare to the eigenvalues of P.

Notation 3.2.14. If M is an n x m matrix, let σ_1(M), σ_2(M), ..., σ_min(m,n)(M) denote the
singular values of M;

σ_1(M) ≥ σ_2(M) ≥ ... ≥ σ_min(m,n)(M).

Theorem 3.2.15. If P is an n x n stochastic matrix, then

|λ_2(P)| ≤ σ_1(TP) ≤ 1.

Proof. Use an eigenvalue and singular value inequality [12] and our previous result,

|λ_2(P)| = |λ_1(TP)| ≤ σ_1(TP).

The upper bound of one follows from a straightforward computation. □

If we are concerned about the stability of eigenvalues from our approximating
matrix P̂, then we may use the eigenvalues of TP̂. If we do not trust the stability of
eigenvalues from either matrix, then we may use the first singular value of TP̂. Since
the first singular value of a matrix is very stable, σ_1(TP̂) is a better statistic when
eigenvalue stability is questionable. Unfortunately, the probability distribution of σ_1(TP̂)
will likely differ from the probability distribution of |λ_2(P̂)|.
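The inequality |λ_2(P)| ≤ σ_1(TP) ≤ 1 of Theorem 3.2.15 can be checked numerically (the circulant ds-matrix below is an arbitrary illustration; for it TP is normal, so the bound is tight):

```python
import numpy as np

n = 4
P = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.1, 0.4, 0.3, 0.2],
              [0.2, 0.1, 0.4, 0.3],
              [0.3, 0.2, 0.1, 0.4]])   # a circulant ds-matrix
Pbar = np.full((n, n), 1.0 / n)
T = np.eye(n) - Pbar

mags = sorted(np.abs(np.linalg.eigvals(P)), reverse=True)
lam2 = mags[1]
sigma1 = np.linalg.svd(T @ P, compute_uv=False)[0]
print(lam2, sigma1)  # |lambda_2(P)| <= sigma_1(TP) <= 1
```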

CHAPTER 4
ESTIMATING THE RATE OF MIXING

Time is money, so if we have several mixing protocols we would prefer one that

mixes rapidly. After we conclude that a protocol mixes, the next question is "How

fast does it mix?" In this section we establish a statistical measure of mixing rate for

(D, B, μ, f) when

(X_n, Σ_n, ((1/n, 1/n, ..., 1/n), P), f_n) is mixing.

4.1 The Jordan Canonical Form of Stochastic Matrices

Theorem 4.1.1. If P is an n x n ds-matrix such that |λ_2(P)| < 1, then

P^N → P̄ = [1/n ... 1/n]
          [ :        :]
          [1/n ... 1/n]

as N → ∞.
Proof. Let J be a Jordan canonical form of P with (J)_ii = λ_i and conjugating matrix E,

P = E J E^{−1},

and

P^N = E J^N E^{−1}.

Also, let B_{i,l} denote an l x l Jordan block of J with eigenvalue λ_i.

If |λ_2(P)| < 1, then 1 has multiplicity one, hence [1] is a Jordan block; all other
blocks have diagonal entries with magnitude less than one. If we look at powers of a
Jordan block B_{i,l} (N > n), the diagonal entries are λ_i^N, the subdiagonals are zeros, and the
dth superdiagonal entries are λ_i^{N−d} (N choose d). When |λ_2(P)| < 1 and i > 1,

Σ_{N=d}^∞ λ_i^{N−d} (N choose d)

passes the ratio test, so the sum converges. Thus all entries of B_{i,l}^N go to zero as
N → ∞. It follows that

J^N → [1 0 ... 0]
      [0 0 ... 0]
      [:        :]
      [0 0 ... 0]

as N → ∞. So J^N goes to a rank one matrix as N → ∞, and P^N = E J^N E^{−1}, thus P^N goes to
a rank one matrix as N → ∞. The left probability vector (1/n, 1/n, ..., 1/n) is a stationary
distribution for every ds-matrix, so

(1/n, 1/n, ..., 1/n) P = (1/n, 1/n, ..., 1/n),

and every row of the rank one limit is (1/n, ..., 1/n). Thus the statement follows. □
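Theorem 4.1.1 can be observed numerically: for a ds-matrix with |λ_2| < 1, the entrywise gap between P^N and P̄ shrinks roughly like |λ_2|^N (the circulant ds-matrix below is an illustrative choice):

```python
import numpy as np

n = 4
P = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.1, 0.4, 0.3, 0.2],
              [0.2, 0.1, 0.4, 0.3],
              [0.3, 0.2, 0.1, 0.4]])  # ds-matrix with |lambda_2| < 1
Pbar = np.full((n, n), 1.0 / n)

gaps = {}
for N in (1, 5, 20, 50):
    gaps[N] = np.abs(np.linalg.matrix_power(P, N) - Pbar).max()
    print(N, gaps[N])  # the entrywise gap decays geometrically
```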

The Markov shift ((1/n, ..., 1/n), P̄) intuitively gives the optimal mixing for a Markov

shift. Knowing x_i tells us nothing about x_{i+1}, and for any probability vector (p_1, p_2, ..., p_n),

(p_1, p_2, ..., p_n) P̄ = (1/n, 1/n, ..., 1/n).

When we use (p_1, p_2, ..., p_n) to represent a simple function approximating the initial

concentration of an ingredient to be mixed in D, a stirring protocol that mixes the

ingredient in one iteration will have (X_n, Σ_n, ((1/n, ..., 1/n), P̄), f_n) as the approximating

Markov shift.

4.2 Estimating Mixing Rate

The dynamical system (X_n, Σ_n, ((1/n, ..., 1/n), P), f_n) is our approximation of how

N iterations of our stirring protocol act on D. So the rate at which P^N goes to a rank one

matrix gives us a measure of the rate at which f mixes D. Let B_{i,l} be an l x l Jordan block of

P with eigenvalue λ_i. The rate at which B_{2,l}^N goes to zero for the largest l determines the

rate at which P^N goes to a rank one matrix. So if we know the Jordan canonical form of

P, we have a measure of the mixing rate. What if n is so large that the Jordan canonical
form of P is not computable?

Theorem 4.2.1. The sequence defined by

(N choose n − 1) |λ_2|^{N−n+1},   N ∈ N,

provides an estimate of f's rate of mixing.

Proof. Let J be the Jordan canonical form of P with conjugating matrix E,

P = E^{−1} J E,   P^N = E^{−1} J^N E.

If |λ_2(P)| = 1 then J^N does not go to a rank one matrix as N → ∞, and we do not
conclude that (D, B, μ, f) is weak-mixing.

If |λ_2(P)| < 1, then 1 has multiplicity one, hence [1] is a Jordan block; all other
blocks have diagonal entries with magnitude less than one. If we look at powers of
a Jordan block B_{i,l} (N > n), the diagonal entries are λ_i^N, the subdiagonals are zeros, and the
dth superdiagonal entries are λ_i^{N−d} (N choose d). So the upper right entry

of a Jordan block is the slowest to converge to zero. When |λ_2(P)| < 1 and i > 1,

Σ_{N=d}^∞ λ_i^{N−d} (N choose d)

passes the ratio test, so the sum converges. Thus all entries of B_{i,l}^N go to zero as

N → ∞. If we look at ratios of the upper right entries, we see that eigenvalue magnitude
influences the rate of convergence more than block size. Thus the upper right entry of

the largest block of λ_2 converges most slowly to 0. The largest block size possible for λ_2
is (n − 1) x (n − 1). Therefore the entries of J^N that converge to zero go to zero
no slower than the rate at which (N choose n − 1) λ_2^{N−n+1} → 0. The equivalence of P and J shows that

the rate at which (N choose n − 1) λ_2^{N−n+1} → 0 as N → ∞ gives an upper bound on the rate at which P^N goes to a
rank one matrix as N → ∞. □

Our mixing rate estimate is an upper bound on the rate at which P^N goes to a rank one

matrix. Since (X_n, Σ_n, (p, P), f_n) only approximates the dynamical system, we use the

rate at which

(N choose n − 1) |λ_2|^{N−n+1} → 0

as our mixing rate estimate instead of an upper bound on the rate of mixing.
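The mixing rate estimate can be tabulated directly (a sketch; n = 4 and |λ_2| = 0.5 are illustrative values, not observed ones). Although binomial growth makes the sequence rise at first, the geometric factor eventually dominates and the consecutive ratios settle toward |λ_2|:

```python
import math

def mixing_rate_estimate(lam2_abs, n, N):
    """The estimate C(N, n-1) |lambda_2|^(N-n+1): an upper-bound sequence
    on the rate at which P^N approaches the rank-one matrix Pbar."""
    return math.comb(N, n - 1) * lam2_abs ** (N - n + 1)

n, lam2 = 4, 0.5
seq = [mixing_rate_estimate(lam2, n, N) for N in range(n - 1, 31)]
print(seq[0], seq[-1])  # the sequence eventually decays to zero
ratios = [b / a for a, b in zip(seq, seq[1:])]
print(ratios[-1])       # consecutive ratios settle toward |lambda_2|
```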

CHAPTER 5
PROBABILISTIC PROPERTIES OF DS-MATRICES
After the value of n is set we know everything about

(X_n, Σ_n, ((1/n, 1/n, ..., 1/n), P), f_n)

except the ds-matrix P. We will treat P as a random event, and in this section we look
at properties of random doubly stochastic matrices. Most of the results presented apply
directly to our dynamical problem; others are presented to provide insight into random
ds-matrices. Since each entry of a random matrix defines a random variable, we will
spend quite a bit of time on the entries of random ds-matrices.
5.1 Random DS-Matrices

Definition 5.1.1. If x(ω) is an integrable function over the probability space (Ω, Σ, μ(ω)),
then the expected value of x is

E(x) = ∫ x(ω) dμ(ω).

If (x − E(x))² is an integrable function, then the variance of x is

V(x) = E((x − E(x))²).

If x(ω), y(ω) and x(ω)y(ω) are integrable functions over the probability space

(Ω, Σ, μ(ω)),

then the covariance of x and y is

cov(x, y) = E((x − E(x))(y − E(y))).

Some standard results are that

E(c) = c

for any constant c,

V(x) = E(x²) − (E(x))²

whenever E(x²) and E(x) are finite, and

cov(x, y) = E(xy) − E(x)E(y)

whenever E(xy), E(x), and E(y) are finite.

Theorem 5.1.2. If P = (p_ij) is a random stochastic matrix whose entries p_ij are identically
distributed, then

E(p_ij) = 1/n for all i, j.

Proof. Since P is a stochastic matrix and taking an expected value is a linear operation,

1 = E(Σ_{j=1}^n p_ij) = Σ_{j=1}^n E(p_ij) = n E(p_ij).

Thus E(p_ij) = 1/n. □

Theorem 5.1.3. If P = (p_ij) is a random n x n stochastic matrix whose entries p_ij are identically
distributed and N ∈ N, then

1/n^N ≤ E(p_ij^N) ≤ 1/n,

where p_ij^N = (p_ij)^N.

Proof. First we prove that E(p_ij^N) ≤ 1/n. For any stochastic matrix P,

p_i1 + p_i2 + ... + p_in = 1

(p_i1 + p_i2 + ... + p_in)^N = 1

p_i1^N + p_i2^N + ... + p_in^N + Θ = 1,

where Θ is the summation of the remaining addends after expanding the multinomial.
Since all entries are nonnegative, Θ ≥ 0, so

p_i1^N + p_i2^N + ... + p_in^N = 1 − Θ ≤ 1

E(p_i1^N + p_i2^N + ... + p_in^N) = E(1 − Θ) ≤ E(1)

n E(p_ij^N) = 1 − E(Θ) ≤ 1

E(p_ij^N) = (1 − E(Θ))/n ≤ 1/n.

Now we prove that 1/n^N ≤ E(p_ij^N) by using Minkowski's inequality.

1 = || p_i1 + p_i2 + ... + p_in ||_N

  ≤ || p_i1 ||_N + || p_i2 ||_N + ... + || p_in ||_N

Since the p_ij are identically distributed, these norms are all equal,

1 ≤ n E(p_ij^N)^{1/N}.

Dividing both sides by n gives

1/n ≤ E(p_ij^N)^{1/N}.

Finally, taking Nth powers of both sides gives the result,

1/n^N ≤ E(p_ij^N).

Hence 1/n^N ≤ E(p_ij^N) ≤ 1/n. □
Corollary 5.1.4. If P = (p_ij) is a random n x n stochastic matrix whose entries p_ij are identically
distributed, then

V(p_ij) ≤ 1/n − 1/n².

Proof.

V(p_ij) = E(p_ij²) − E(p_ij)²

        = E(p_ij²) − 1/n²

        ≤ 1/n − 1/n².

Thus the statement follows. □
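Theorem 5.1.2 and Corollary 5.1.4 can be checked by simulation. Rows normalized from iid exponentials (i.e., Dirichlet(1, ..., 1) rows) give a random stochastic matrix with identically distributed entries; this particular construction, the seed, and the trial count are illustrative choices:

```python
import random

def random_stochastic_row(n, rng):
    """One row of a random stochastic matrix: iid exponentials normalized,
    i.e. a Dirichlet(1, ..., 1) vector, so the entries are identically
    distributed (though not independent)."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(e)
    return [x / s for x in e]

rng = random.Random(1)
n, trials = 4, 20000
samples = [random_stochastic_row(n, rng)[0] for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials

print(mean)  # close to 1/n = 0.25 (Theorem 5.1.2)
print(var)   # below the bound 1/n - 1/n^2 = 0.1875 (Corollary 5.1.4)
```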

The next few theorems give properties of the covariance between entries of a

random n x n ds-matrix. Since all rows and all columns sum to one, if one entry changes

then at least one other entry on the same row and one other entry on the same column

must change to maintain the sum. It follows that the entries of a random n x n ds-matrix

cannot be independent.

Theorem 5.1.5. If P = (p_ij) is a random n x n stochastic matrix whose entries p_ij are identically
distributed, then for entries p_ij and p_ik from the same row,

0 ≤ E(p_ij p_ik) ≤ 1/n.

Proof. Since P is stochastic, 0 ≤ p_ij ≤ 1 for all entries, so p_ij p_ik ≥ 0 and p_ij p_ik ≤ p_ij.
Taking expected values,

0 ≤ E(p_ij p_ik) ≤ E(p_ij) = 1/n. □

sum(ũ) is a random convex combination almost surely. We may take the absolute value of real random variables that are continuous at zero to get convex combinations. If we change the distribution of the u_i's, we change the distribution of the convex combinations and thus change the distribution of the doubly stochastic matrices. To verify this, compare the results when the u_i's are independent uniform([0, 1]) random variables, and when u_i = v_i² where the v_i's are independent cauchy(0, 1) random variables.
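A minimal sketch of this construction, assuming the u_k's are independent uniform([0, 1]) random variables and the convex coefficients are c_k = u_k / sum(u) over all n! permutation matrices (feasible only for small n):

```python
import itertools
import random

def random_ds_matrix(n, rng):
    """Random ds-matrix as a full convex combination of all n! permutation
    matrices, with convex coefficients c_k = u_k / sum(u) from iid uniforms."""
    perms = list(itertools.permutations(range(n)))
    u = [rng.random() for _ in perms]
    s = sum(u)
    P = [[0.0] * n for _ in range(n)]
    for w, perm in zip(u, perms):
        for i, j in enumerate(perm):
            P[i][j] += w / s
    return P

rng = random.Random(7)
P = random_ds_matrix(3, rng)
for row in P:
    print([round(x, 3) for x in row])  # rows and columns both sum to 1
```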


We will refer to convex combinations of permutation matrices that use all n! permutation matrices almost surely as full convex combinations. We will call convex combinations that use (n − 1)² + 1 permutation matrices reduced convex combinations. The next result shows that full and reduced convex combinations sample from probability spaces with different measures.


It would be nice to be able to extend a set of observed ds-matrices whenever obtaining observations is difficult or expensive; Murali Rao created a way to extend a set of observations.


1. Apply the greedy ds-algorithm to our observed ds-matrices.
2. Use the method of moments or maximum likelihood estimation to approximate the parameter. Call the approximation b̂.
3. Use independent gamma(b̂, b̂) random variables to generate new ds-matrices.

The researcher must decide what value of b is appropriate. The c_k's are independent and identically distributed gamma(b, b).



[17], the expected value of the second largest eigenvalue of the ds-matrices goes to zero as n goes to infinity [18]. If A is a random n x m matrix with full column rank almost surely, U is a unitary qr-factor of A, and P is the matrix where P_ij = |U_ij|², then P is a random ds-matrix. Different probability measures for A will result in different probability measures for P. So any process that generates random matrices with full column rank may be used to generate ds-matrices.
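A sketch of the qr-factor construction described above (Gaussian entries are an illustrative choice for A; any full-column-rank ensemble works):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))     # full column rank almost surely
Q, _ = np.linalg.qr(A)              # orthogonal (unitary) qr-factor
P = np.abs(Q) ** 2                  # P_ij = |U_ij|^2

print(P.sum(axis=1))  # rows sum to 1
print(P.sum(axis=0))  # columns sum to 1: P is a ds-matrix
```

Since the rows and columns of Q are unit vectors, both the row and column sums of P are exactly one.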



table 13-1. For every partition P̂ ≈ P; m = 10⁶ appears to be sufficient and we fail to reject H₀ and correctly conclude that the map is not mixing.


Table 13-2. The Reflection Map: number of states, average |λ_2(P̂)|, typical p-value of λ_2.


Table 13-3. The Sine Flow Map (parameter 8/5): number of states, average |λ_2(P̂)|, typical p-value of λ_2.

whenever n > N. We ran our procedure 100 times with 10⁶ pseudorandomly generated points (uniformly) and saw the results listed in table 13-3. Typical significance levels will conclude that we used enough points for n = 4, 16, 64, 256, 1024, but m = 10⁶ appears to be insufficient when n = 4096. A typical P̂ for four states is

P̂ = [0.1655 0.2341 0.2330 0.3673]
    [0.2316 0.1656 0.3709 0.2318]
    [0.2332 0.3691 0.1647 0.2331]
    [0.3696 0.2326 0.2328 0.1650],

|λ_2(P̂)| ≈ 0.2048.


Table 13-4. The Sine Flow Map (parameter 4/5): number of states, average |λ_2(P̂)|, typical p-value of λ_2.

Typical significance levels will conclude that we used enough points when n = 4, 16, 64, 256, 1024, but m = 10⁶ is not sufficient when n = 4096. A typical P̂ for four states is

P̂ = [0.2015 0.2311 0.2294 0.3380]
    [0.2293 0.2018 0.3394 0.2296]
    [0.2307 0.3367 0.2010 0.2317]
    [0.3391 0.2296 0.2298 0.2015],

|λ_2(P̂)| ≈ 0.1375. It is believed that f mixes faster for larger T. Comparison of eigenvalue magnitude from the two sine flow map examples runs counter to this conjecture. The baker's map example shows how eigenvalue instability can alter the observed eigenvalues.



Table 13-5. The Baker's Map: number of states, average |λ_2(P̂)|, typical p-value of λ_2.

A typical P̂ for four states is

P̂ = [0.5013 0.4987   0      0   ]
    [  0      0    0.4995 0.5005]
    [0.5011 0.4989   0      0   ]
    [  0      0    0.4997 0.5003],

max(|P − P̂|) ≈ 0.0013, |λ_2(P̂)| ≈ 0.0079. Typical significance levels will conclude that we used enough points when n = 4, 16, 64, 256, 1024, but m = 10⁶ is not sufficient when n = 4096. We know that the map is mixing and the induced Markov shift is mixing, but the eigenvalues of the approximating matrices do not reflect the rate of mixing. This example demonstrates how eigenvalue instability and insufficient m can throw off an observation.


Table 13-6. The Chirikov Standard Map (parameter 0): number of states, average |λ_2(P̂)|, typical p-value of λ_2.

For this example we set k = 0 so that we may compute P, f(x, y) = (x, x + y). When we partition the unit square into four squares,

P = [1/2  0  1/2  0 ]
    [ 0  1/2  0  1/2]
    [1/2  0  1/2  0 ]
    [ 0  1/2  0  1/2],

which has characteristic polynomial λ²(λ − 1)². We ran our procedure 100 times with 10⁶ pseudorandomly generated points (uniformly) and saw the results listed in table 13-6. A typical P̂ for four states is

P̂ = [0.5012   0    0.4988   0   ]
    [  0    0.5000   0    0.5000]
    [0.5006   0    0.4994   0   ]
    [  0    0.5010   0    0.4990].
