with
1 > cm, > Cm2 > ... > Crm 1)2+1 > 0.
Since each step takes a maximal coefficient to construct a convex combination of a
dsmatrix, we call this the dsgreedy algorithm.
f = factorial(n);
pt = perms(1 : n); % perms generates all n! permutations.
M = zeros(n, n, f); % For each i, M(:,:,i) will be a permutation matrix.
for i = 1 : f; % This loops stores permutation matrices in M.
forj = 1: n;
M(j, pt(i, j), i) = 1;
end
end
v = zeros(l, f); %v will be the vector of convex coefficients
S P;
%This loops looks at the S c(i)*M(:,:,i) where M(:,:,i) is a permutation matrix
%and c(i) is that largest possible value s.t. S c(i)*M(:,:,i) has no negative entries.
forj = : (1 + (n 1)2);
c = zeros(l, f); %Stores largest values s.t. S c(i)*M(:,:,i) has no negative entries.
h = [ ]; %h will indicate where in c the max coefficient is.
for / = 1 : f;
c(i) = min((S. M(:,:, i)) ones(n, 1));
end
129
TABLE OF CONTENTS
ACKNOWLEDGMENTS .........................
LIST O F TABLES .. .. .. .. .. .. .. .
ABSTRACT ...............................
CHAPTER
1 INTRO DUCTION ..........................
1.1 Hypotheses for Testing ....................
1.2 Testing Procedure ....................
2 ERGODIC THEORY AND MARKOV SHIFTS ..........
2.1 Ergodic Theory . .
2.2 Markov Shifts .........................
3 STOCHASTIC AND DOUBLYSTOCHASTIC MATRICES ...
3.1 DoublyStochastic Matrices .................
3.2 Additional Properties of Stochastic Matrices ........
4 ESTIMATING THE RATE OF MIXING .. ............
4.1 The Jordan Canonical Form of Stochastic Matrices ....
4.2 Estimating Mixing Rate .. ................
5 PROBABILISTIC PROPERTIES OF DSMATRICES ......
5.1 Random DSMatrices .. .................
5.2 Metric Entropy of Markov Shifts with Random Matrices ..
6 PARTITION REFINEMENTS .. ................
6.1 Equal Measure Refinements .. ..............
6.2 A Special Class of Refinements ..............
7 PROBALISTIC PROPERTIES OF PARTITION REFINEMENTS
7.1 Entries of a Refinement Matrix .. .............
7.2 The Central Tendency of Refinement Matrices ......
7.3 Metric Entropy After Equal Measure Refinement .....
8 ULAM MATRICES .. ......................
8.1 Building the Stochastic Ulam Matrix .. ..........
8.2 Properties of Ulam Matrices .................
is constant for any length n permutations a then
E(A2A3...An) = 0.
Theorem 5.1.12. If P = (py) is a random n x n stochastic matrix where py are identically
distributed then E(A2 + A3 + ... + An) = 0.
Proof. Since P is a stochastic matrix, A1 = 1. By the definition of trace(P) and the
commutative property of the trace operator
trace(P) = 1 + 2 +...+ An and trace(P) = pl + p22 + ... + Pnn
It follows that
1 + A2 + ... + An = Pll + P22 + ... + Pnn
Taking expected values of both sides give
E(1 + A2 +... An) = E(p + P22 + ... Pnn)
1 + E(A2+ ... +An) = E(p1) + E(P22)+ ...+ E(nn)
1 1 1
=++...+
n n n
= 1.
Subtracting one from both sides gives the result D
Notation 5.1.13. If {PnJk 1 is a sequence of nk x nk matrices, let Ak,i denote Ai(Pn,).
We will use the next theorem to tell us about the eigenvalues of matrices that arise
from taking a sequence of refinements of {D,} 1.
Theorem 5.1.14. If {Pnjk =1 is a sequence of nk x nk stochastic matrices such that
E((Pn,)u) = for alli
nk
When we partition the unit square into four squares
1 0 0 0
0 0 1 0
1000
0010
P=
0100
0001
which has characteristic polynomial
(A 1)3(A + 1).
Our eigenvalues are 1, 1, 1, 1; P has four linearly independent eigenvectors so it is
diagonalizable. If we refine our partition into smaller squares we see that P will be a
diagonal block matrix with
/n [1 blocks and
n 01 blocks
2 1 0
(recall that n is a power of four). Since each block is diagonalizable, our refinement
matrices are diagonalizable with characteristic polynomial
(A 1) (A + 1) .
We ran our procedure 100 times with 106 pseudorandomly generated points
(uniformly) and saw the following results listed in table 131.
For every partition P = P, m = 106 appears to be sufficient and we fail to reject Ho
and correctly conclude that the map is not mixing.
[13] G. Casella, R. L. Berger, Statistical inference, The Wadsworth & Brooks/Cole
Statistics/Probability Series, Wadsworth & Brooks/Cole Advanced Books &
Software, Pacific Grove, CA, 1990.
[14] J. Ding, A. Zhou, Finite approximations of FrobeniusPerron operators. A solution
of Ulam's conjecture to multidimensional transformations, Phys. D 92 (12) (1996)
6168.
URL http://dx.doi.org/10.1016/01672789(95)002928
[15] P. Billingsley, Probability and measure, 3rd Edition, Wiley Series in Probability
and Mathematical Statistics, John Wiley & Sons Inc., New York, 1995, a
Wileylnterscience Publication.
[16] M. Rao, personal communication (2009).
[17] M. Poiniak, K. Zyczkowski, M. Kus, Composed ensembles of random unitary
matrices, J. Phys. A 31 (3) (1998) 10591071.
URL http://dx.doi.org/10.1088/03054470/31/3/016
[18] G. Berkolaiko, Spectral gap of doubly stochastic matrices generated from
equidistributed unitary matrices, J. Phys. A 34 (22) (2001) L319L326.
URL http://dx.doi.org/10.1088/03054470/34/22/101
[19] M. Giona, S. Cerbelli, Connecting the spatial structure of periodic orbits and
invariant manifolds in hyperbolic areapreserving systems, Phys. Lett. A 347 (46)
(2005) 200207.
URL http://dx.doi.org/10.1016/j .physleta.2005.08.005
REFERENCES
[1] G. Froyland, On Ulam approximation of the isolated spectrum and eigenfunctions of
hyperbolic maps, Discrete Contin. Dyn. Syst. 17 (3) (2007) 671689 (electronic).
URL http://dx.doi.org/10.3934/dcds.2007.17.671
[2] F. Y Hunt, Unique ergodicity and the approximation of attractors and their invariant
measures using Ulam's method, Nonlinearity 11 (2) (1998) 307317.
URL http://dx.doi.org/10.1088/09517715/11/2/007
[3] Froyland, G. Aihara, Kazuyuki, Ulam formulae for random and forced systems,
Proceedings of the 1998 International Symposium on Nonlinear Theory and its
Applications 2 (1998) 623626.
[4] S. M. Ulam, Problems in modern mathematics, Science Editions John Wiley &
Sons, Inc., New York, 1964.
[5] M. Dellnitz, G. Froyland, S. Sertl, On the isolated spectrum of the PerronFrobenius
operator, Nonlinearity 13 (4) (2000) 11711188.
URL http://dx.doi.org/10.1088/09517715/13/4/310
[6] T. Y Li, Finite approximation for the FrobeniusPerron operator. A solution to Ulam's
conjecture, J. Approximation Theory 17 (2) (1976) 177186.
[7] G. Froyland, Using Ulam's method to calculate entropy and other dynamical
invariants, Nonlinearity 12 (1) (1999) 79101.
URL http://dx.doi.org/10.1088/09517715/12/1/006
[8] J. R. Brown, Ergodic theory and topological dynamics, Academic Press [Harcourt
Brace Jovanovich Publishers], New York, 1976, pure and Applied Mathematics, No.
70.
[9] P. Walters, An introduction to ergodic theory, Vol. 79 of Graduate Texts in
Mathematics, SpringerVerlag, New York, 1982.
[10] V. Baladi, Positive transfer operators and decay of correlations, Vol. 16 of Advanced
Series in Nonlinear Dynamics, World Scientific Publishing Co. Inc., River Edge, NJ,
2000.
[11] G. Birkhoff, Three observations on linear algebra, Univ. Nac. Tucuman. Revista A. 5
(1946) 147151.
[12] G. H. Golub, C. F Van Loan, Matrix computations, 3rd Edition, Johns Hopkins
Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore,
MD, 1996.
143
The next three theorems for eigenfunctions of f* are similar to theorems regarding
eigenvectors of stochastic matrices and help justify using eigenvectors to approximate
eigenfunctions.
Proposition 9.1.2. If u is an integrable eigenfunction of f*, f*(u) = Au, and f is
pmeausure preserving, then
A = 1 or I udlu = 0.
Proof. The function f is measure preserving, so f, u o fdp = f, udp. Since u is an
eigenfunction, f*(u) = Au; so u o f = Au. Thus
/ ud = Aud p
(1 A) ud/l =0.
Thus the statement holds. D
So if u is an eigenfunction of f* and defines a probablitiy distribution function on D,
then A = 1.
Proposition 9.1.3. If u is an eigenfunction of f*, f*(u) = Au, and f is pmeausure
preserving, then
IA < 1.
Proof. By hypothesis Au = u o f, so if we integrate over any set in B,
j Aud i = u o fd/j l
JA Jud f =f(A)
It follows that for any natural number k,
Ak / ud/l = f ()ud/.
JA Jf(A)
(Pil+ Pi2 ... Pin) = 1
Pil + Pi2 + P in +0 = 1.
Where 0 is the summation of the remaining addends after expanding the multinomial.
 P _il PN ... Pn <
=1pN pN ...pi>0 <1
Pil + Pi2 + + Pin 
E(" Pi,2 ... P ) = E(1 ) < E(1)
nE(p) = 1 E(0) < 1
1 E(0) 1
E(pN) = (O) < 
n n n
1
Now we prove that < E (pN) by using Minkowski's inequality.
1 = Pil Pi2 ... +Pin
= (Pil + Pi2 + ... + Pin)
= E((Pil Pi2 + + Pin)N
= E ((pil+ Pi2 + ... Pi)N)
Since p, are identically distributed, these expected values are all equal,
1
Dividing both sides by n gives
n
Finally taking powers of both sides gives the result.
< E(p, ).
Conjecture 6.2.8. If {Pn, }J is a sequence of dsmatrices where P,,, is a refinement
matrix of P, for all k, then
{fA2(Pn, )} 1
is a submartingale.
This conjecture was proven for Dkrefinements by the previous theorems.
Theorem 6.2.9. If Pnk is a krefinement matrix of Pn, then
trace(Pnk) < k(trace(Pn)).
Proof. By the refinement equations,
k k
a=1 /=1
kpii.
n k k
^CC^+p,
i=1 a=1 /=31
n k
i=l a=l a#3
n k n
( Y a (i aP,)
i=1 a=1 i=1 a 3
It follows that
trace(Pnk)
(i p,. ) = ki p.
/=1 ai3 /=1
Since 0 < p, ,
trace(Pnk) < k(trace(Pn)).
Thus we get the result.
Thus
k Sp
i=1
ii1
2. If ((l/n, l/n, 1/n,..., l/n), P) is ergodic but not mixing, we will reject the hypothesis
that (D, B, p, f) is not ergodic in favor of the hypothesis that (D, B, p, f) is ergodic
but not weakmixing.
3. If ((1/n, 1/n, 1/n,..., 1/n), P) is mixing, we will reject the hypothesis that (D, B, p, f)
is not ergodic and not mixing in favor of the hypothesis that (D, B, p, f) is ergodic
and weakmixing.
The following lemma and theorems give us criteria for when
((1/n, 1/n, 1/n, ..., 1/n), P)
is ergodic or mixing.
Lemma 2.2.9. Let P be a stochastic matrix, having a strictly positive probability vector p
with pP = p, then
N1
Q = lim 1y9pi
i=
exists. The matrix Q is also stochastic and
PQ = QP= Q.
Any eigenvector of P for the eigenvalue 1 is also an eigenvector of Q. Also
Q2 = Q.
Theorem 2.2.10. Let fn denote the (p, P) Markov shift (onesided or twosided). We can
assume pi > 0, Vi, where # = (p, ..., pn) (P is n x n). Let Q be the matrix obtained in
the lemma above, the following are equivalent:
1. (Xn, Zn, (p, P), fQ) is ergodic.
2. All rows of the matrix Q are identical.
3. Every entry in Q is strictly positive.
4. P is irreducible.
Taking absolute values of both sides of the equation yields
JA Jud fk(A)
,Ak ud  _< ud _< uldy
JA Jud
A k udp < u\dp.
Now, u is an eigenfunction, thus u z 0 and there exists B c B such that JB ud/ z 0.
IAIk< ud, for all k e N.
I fB udlp
Since IAk is bounded above for all k e N, AI < 1. D
Notation 9.1.4. If u e L'(D, B, p), {{Dki iD ~ is a sequence of nkpartitions (I(ki) =
 for all k, i), let Uk denote the simple function over {DkiI} defined by
nk
n' f udp. ni
uk(X1) = U x) = Y [ ud,] (x).
k i=( i= 1
By construction
Sk dp/ = u d forallki,
and thus
SUk d = u dp for all k.
So Uk is our best approximation of u over the oalgebra generated by {IDki ,}. In fact
Uk is a projection of u onto the set of simple function over {Dki}i 1 [14]. If {Dk+li ,}I is
a refinement of {Dki} 1, then the set of simple functions over {Dki} 1 is contained in
the set of simple functions over {Dk li,}n1 ; it follows that uk+ is a better approximation
of u than uk. We use the entries from stationary distributions of {Pnk ,} to construct
approximations of stationary distributions of f*. If #k = kPnk and # it a probability vector
Proof. All stochastic matrices have real entries, so det(P) e R.
det(P) = AjA2...An
Taking absolute values of both sides, we get
Sdet(P) = AA2...An I
= AI 1II A2 ... An
<1.
Thus we get the result. D
Corollary 3.2.3. If P is an n x n stochastic matrix then A2...An e [1,1].
The next result shows that if we are working with a class of dsmatrices with large
entries on the diagonal, then the eigenvalues are not uniformly distributed on the unit
disk. If we write the Gershgorin circle theorem for stochastic matrices, we can quickly
find a region that will contain all eigenvalues.
Theorem 3.2.4. If P = (p,) is an n x n stochastic matrix then
A min Pii < 1 min pii
for all eigenvalues.
Proof. By Gershgorin circle theorem there exists i such that
n
A p,, < S PU
j= 1,ij
P is a stochastic matrix, thus
n
SP= Pii.
j 1,iJj
It follows that
A p,, I 1 p,.
Theorem 6.1.5. If Pnk is an nk x nk dsmatrix, then Pnk is a krefinement matrix for some
n x n dsmatrix.
Proof. The matrix Pnk is ds so all rows and columns sum to one, all entries are
nonnegative. Look at the block matrix formed by breaking Pnk into n2 blocks of size
k x k. Call the blocks {BU}l
B11... B1n
Pnk = '
Bnl ... Bnn
Take the sum of each block's entries. Let
(E Bll)...(E Bin)
B=
(E Bnl)...(E Bnn)
Notice that B is an n x n matrix. Since Pnk is ds, all rows and columns of B sum to k and
all entries of B are nonnegative. Hence ( )B is an n x n dsmatrix. D
k
If we take a sequence of refinement matrices
{Pnk =1 nklnk+l, Ak,2 = n2(Pnk),
then {Ak,2}k 1 measures mixing at each refinement. Since A(k+l),2 measures mixing on
a finer partition than Ak,2, one would expect {IAk,2 1' to be a nondecreasing sequence,
this is not the case. There are examples of sequences that have A(k+1),21 < Ak,2. The
observed refinement matrices with decreasing eigenvalue magnitude had poor mixing
between states of the form D,, D2,,..., Di,. That is, the mixing was poor between states
that were all contained in one prerefined state.
Proposition 6.1.6. If {Pnj lnk+l is a sequence of nkl refinement matrices, then
Snk
{Ak,2} is not necessarily a nondecreasing sequence.
Hence the results hold.
Definition 6.2.6. [13] The bias of a point estimator W of a parameter 0 is the difference
between E(W) and 0; that is,
BiaseW = E(W) 0.
If E(W) = 0, then we call W an unbiased estimator of 0. If E(W) / 0, then we call W a
biased estimator of 0.
Theorem 6.2.7. If Pn is an n x n dsmatrix with Dkrefinement matrix Pnk, then
A2(Pn)l < E(IA2(Pnk)l).
Furthermore IA2(Pn) is a biased estimator for IA2(Pnk)l whenever the probability
distribution of I A2 (Pnk) I has support above I A2 (Pn) That is to say
I 2(Pn)l < E(IA2(Pnk)l).
Proof. Pnk is a Dkrefinement matrix of Pn so
IA2(Pn)l < IA2(Pnk)l.
Hence
A2(Pn) < E(IA2(Pnk)).
Whenever the probability function of A2(Pnk)l has support above IA2(Pn)I,
A2(Pn) < E(IA2(Pnk)l).
Thus we get the result. D
When we refine {D,}n 1, the approximation of (D, B, p, f) is finer. So the criteria to
mix over the refined partition is more stringent. Since A2(Pnk)l is our measure of mixing,
we make the following conjecture.
12.2.2 Full Convex Combinations
The Birkhoffvon Neumann theorem provides a technique for generating Monte
Carlo probability distribution functions for real functions of dsmatrices. If we want to
randomly generate a doubly stochastic matrix, then the Birkhoffvon Neumann theorem
tells us that we may apply a randomly generated weighted average to the set of n x n
permutation matrices to get a randomly generated dsmatrix. Recording the statistics
of the random dsmatrices gives a Monte Carlo approximation of the desired probability
distribution.
Proposition 12.2.4. If u is a length N vector where ui is a nonnegative random variable
for all i and P(u = 0) = 0, then as is a random convex combination almost surely.
We may take the absolute value of real random variables that are continuous at
zero to get convex combinations. If we change the distribution of the u,'s, we change the
distribution of the convex combinations and thus change the distribution of the doubly
stochastic matrices. To verify this, compare the results when the u,'s are independent
uniform([0, 1]) random variables, and when u, = v2 where the v,'s are independent
cauchy(0, 1) random variables.
Notation 12.2.5. Let = (71,..., TN) denote a convex vector.
Proposition 12.2.6. If is a random probability vector (convex combination) of length N
and y, 72, ...., 7N are marginally identically distributed, then
1
E(yi) 
for all i.
Remark 12.2.7. The convex coefficients of the 7 from the previous theorem are not
independent, if one entry increases (decreases) then sum of the other terms must
decrease (increase) to preserve 71 + 72 + ... +
123
Algorithm 12.2.10 (Rao's Convex Data Extention). If {Pk}'L 1are the n x n permutation
matrices and y is a length n! convex vector, then for any permutation on (1, 2, ..., n!), ,
n!
7a'(uk) Pk
k=l
is an n x n dsmatrix.
So if is a random convex vector, then we may extend our data by randomly
selecting permutations to generate new dsmatrices.
12.2.3 Reduced Convex Combinations
Theorem 12.2.11. If P is an n x n dsmatrix then P equals a convex combination of n x n
permutation matrices with at most (n 1)2 + 1 nonzero coefficients.
Proof. There are n2 entries of an n x n matrix; all rows and all columns of a dsmatrix
sum to one so there are (n 1)2 degrees of freedom. We may treat the set of matrices
whose columns and rows sum to one as a set with dimension (n 1)2. By the
Birkhoffvon Neumann theorem the set of dsmatrices is convex with permutation
matrices as corners. By Carathodory's theorem for convex sets, every dsmatrix may be
expressed as a convex combination of (n 1)2 + 1 permutation matrices. D
We will refer to convex combinations of permutation matrices that use all n!
permutation matrices almost surely as full convex combinations. We will call convex
combinations that use (n 1)2 + 1 permutation matrices reduced convex combinations.
The next result shows that full and reduced convex combinations sample from
probability spaces with different measures.
Theorem 12.2.12. Let f and be random convex vectors of length n!. All entries of f
are nonzero almost surely; has (n 1)! or fewer nonzero entries. If { Pk } is the set
of n x n permutation matrices,
n!
Pf = Z(~f)kPk
k=l
126
Jensen's inequality
E(\ P P F)<
E( PPI)
E(ZZ(P 
i=1 J=1
i=1 j=1
i=1 j 1
n n1
ZZ gn2
i=1 j 1
n1
n
<
g
Taking square roots of both sides leads to
E(11 Pp) F )
We get the result when we take the limit as g goes to infinity.
We get the result when we take the limit as g goes to infinity.
The next example is an extreme case to show the futility of using too few points.
Example 12.2.3 (min mi = 1). If we run the dealer's algorithm to approximate the
probability distribution of dsmatrices for Ulam's method with min mi = 1, then g = 1. All
matrices in our Monte Carlo approximation will be dsmatrices with exactly one 1 in each
column and each row, thus the matrices are permutation matrices. All eigenvalues of a
permutation matrix have magnitude one. If follows that the subshifts of finite type arising
from the Monte Carlo matrices are nonmixing. So if we we use the Monte Carlo matrices
to estimate the probability distribution of I 2 we will fail to reject that (D, B, p, f) is not
weakmixing.
122
1)2)
n
1 )2)
n
Multiply both sides by n,
n(E(l,k(Di) n ) E(, ))E(1j ,)) =[(p(k)_ P),].
Square both sides,
n2(E(lfk l n) /n) E(1,k p))E(1 ,))2 =[(p(k) P),]2.
Take the sum of these terms over both i and j,
n n n n
"n2 (E(1 i) n) E(Ifk ))E(I,))2 = (P(k) 
i 1 j 1 i1 j 1
Take the square root of both sides to get the left hand side to equal the Frobenius norm,
n( (E(l1 Di)n) (E(1k ))E(1))2)/2 =( (p(k) P)) 2
i 1 =1 i=1 j 1
=  p(k) P IF
By the definition of covariance, we get the result.
So we may use the sequence { P(k) P IF}I 1 to detect decay of correlation
between simple functions over sets in ({ID,} 1). Also we may use the rate at which this
sequence goes to zero to measure the rate that the dynamical system strongmixes over
7({ i}D,} n )
Definition 10.1.5. Let (X, E, v, g) be a dynamical system, a finite partition of X, {Ai}nl,
is a generating partition of Z if
n
S= ( U U gk(Ai)).
oo
Since Markov partitions capture the dynamics of a system, we use the sequence
{11 p(k)_ pk IF} 2
109
Theorem 12.2.8. If C1, 2, ..., cNvI are independent identically distributed gamma(a,3)
random variables and 7 is the probability vector given by
Ci
C1 +... + CNv+'
then the marginal distribution for each yj is beta(a,Na).
Proof. Since c1, c2,...., CNI+ are gamma distributed, 0 < c, ..., CNv+ almost always. Since
c1, C2,..., CN+ are iid we may show the result for 7N+1 without loss of generality,
P(N+I < k) = P( CN < k)
C1 + ... + CN + CNv+
= P(CNv+ < (c1 +... + CN + CN +)k)
= P((1 k)CN+l < (c1 +... + CN)k)
(c +... + CN)k
= P(CNv< ).
1k
Now c1, C2,..., CN are gamma, the moment generating function of c + c2 + ... + cN
shows us that c + c2 + ... + Cv is a gamma(Noa,) random variable,
f0/0 j X& '1 x/S yNValey/P3
P(<7NI < k)= k  rox3NNc dxdy
d odf f< Xolex//3 yNaleY//I
dk P(dk ) [ (a)a F(Na)N d
Using the definition of derivatives and the dominated convergence theorem, we see that
d "f d xfxlex/P yNaley//3
dkP(<7N+l < k) = dNdxdy.
dk P dk 0 Fr(a) F(Na) Na
It follows that
SP(
dk
rorN(1 j)y+l_ y(N+I)aI exp( )dy.
F(a)F(Na)(i k)"+ (+ (I)"
k2E(p(P ) < k2p < k2 k k2 E(p2 )
1
E(p, ) < p < 1 E(p~+).
Using the left inequality we get
1 1
E(p2 ) < p < 1 E(p21) < 1 p
pO k WE)
1
P 1 < E(p,2 ) <
Combining this statement with the fact that a squared value is nonnegative gives us the
result. D
Corollary 7.1.5. If Pn is an n x n dsmatrix, Pnk is a krefinement of Pn, Pn = (p,), Pnk
(pij), p,, are identically distributed of each fixed ij, then
1 2
V(pP) < (1 k2)P
Proof. A standard result is that V(pij) = E(p2, ) E(p, )2.
Thus
p2
V(pidj) = E(p2J) E(p)2 = E(pi ) E E 2 J
Using the previous theorem we get
p2 1 2
V(pio) < p = (1 k2)P
So we have an upper bound on the variance. O
Theorem 7.1.6. If Pnk is a krefinement matrix of Pn, then
E(trace(Pnk)) = trace(Pn), and
E(A2(Pnk)+ 3(Pnk) + ... + Ak(Pn,)) = (Pn) + A3(P) + ... (Pn).
matrix defined by
(P,,y = (f () C IDk X E IDki)
Let Pn be the operator that maps simple functions over {D(ki,}7 to simple functions over
{kiD, 1 such that P (Uk(X)) corresponds to ik Pnk
By construction
i=
I udp.
Dr
When u defines a probability distribution, ik is a probability vector.
We will use
P, (uk(X))
to approximate
f*(u).
We will use left eigenvectors of
to construct approximations to eigenfunctions of f*. The next theorem shows that when
f* acts on probability distribution functions and u o f is continuous,
p*
nk
converges to the PerronFrobenius operator defined by our stirring protocol.
Theorem 9.2.4. If u e L'(D, B, p), (D, B, p, f) is a pmeasure preserving dynamical sys
tem, f{{IDki }1, is a sequence ofnkpartitions, and {Dk liI,}1 i's a refinement of
nk
Table 135. The Baker's Map
Number of States Average A2(P) I Typical pvalue of X2
4 0.03 0.004
16 0.17 0.00
64 0.33 0.00
256 0.44 0.00
1024 0.53 0.00
4096 0.61 1.00
A typical P for four states is
0.5013 0.4987 0 0
0 0 0.4995 0.5005
PM
0.5011 0.4989 0 0
0 0 0.4997 0.5003
max(IP P) 0.0013
A2(P)I z 0.0079.
Typical significance levels will conclude that we used enough points when
n = 4,16, 64, 256, 1024,
but m = 106 is not sufficient when n = 4096. We know that the map is mixing, the
induced Markov shift is mixing but the eigenvalues of the approximating matrices do not
reflect the rate of mixing. This example demonstrates how eigenvalue instability and
insufficient m can throw off an observation.
13.6 The Chirikov Standard Map (parameter 0)
The Chirikov standard map is a Lebesque measure preserving function that maps
the torus to itself
f(x, y) = (x+ ksin(27y), y +x k sin(27y)).
140
Algorithm 12.2.1 (The Dealer's Algorithm for DSMatrices in Matlab).
M = zeros(n, n);
deal = randperm(n g);
forj = 1 : n g;
M(mod(j, n) + 1, mod(deal(j), n) + 1)...
M(mod(j, n) + 1, mod(deal(j), n) + 1) + 1;
end
P =(1/g) M;
When we use Ulam's method to generate P our target matrix, P, is a dsmatrix
so the column and row sums of M = (my) should be close to constant if mi is
constant. The entries of P will be rational by construction. If we set g = min mi then
the Dealer's Algorithm gives us a way to generate a Monte Carlo approximation of the
probability distribution of P. The other algorithms we present approximate the probability
distribution of P. The dealer's algorithm should be used when
1. The number of sample points is smaller than we would like (min m, is small).
2. We have no knowledge of the distribution before hand.
3. We want to sample dsmatrices with entries from { : a e {0, 1,2,..., g}}.
g
Theorem 12.2.2. If P is an n x n dsmatrix generated by the dealer's algorithm, then
 P P IF > 0 in probability as g oo, and
E(ll P P F 0 as g oo
Proof. The second statement implies the first, so we just need to show convergence
of the expected value. Let P = M, where M is a random matrix defined by the
g
number from each suit a player receives in the dealer's algorithm. The entries of M, md,
are marginally nonindependent binomial ( g) random variables by construction. By
n
Some dynamical systems have atypical behavior for subsets of measure zero. If
we use a mesh of points for sampling, we may unintentionally sample exclusively from
an atypical subset. To avoid this type of sample bias, the points should have random
coordinates. This way if a subset has an atypical property, sampling will probably not
be from that subset only. Usually it is easier and faster to generate m random points
(m = Y:1 mi) in D then count mi, rather than generate m, points in D,. The quotient in
the proof that P converges to P provides a way to decide how large m should be before
generating data. If mini<,in{m,} is too small, generate more points and combine the
sets of points until min
The function f is measure preserving, so if after applying the map to the points
there is a Di with no data points, not enough points were used. If minl
sufficiently large and close to being constant, the number of points in Di before and
after mapping will be about the same.
The matrix P is a dsmatrix, P will be a stochastic matrix and may not be a
dsmatrix. For (v, P) to be a Markov shift, P must be a stochastic matrix and 7 must
be a left eigenvector of P and a positive probability vector. So the criteria of ergodicity
and mixing hold for our observations as long as P has a left eigenvector that is a
positive stationary distribution. We will show that P converges to a dsmatrix as
min1 o, so if our observations do not provide such a vector either not
enough points were used, a hypothesis was violated or an error was made.
8.2 Properties of Ulam Matrices
Theorem 8.2.1. Let P be an n x n stochastic matrix from Ulam's method (n is fixed) and
the sample points are independent uniform random variables, then
i >p p almost surely as mi oo,
and
E(p) = p,.
(v, appears in the vector k times for all i) is an eigenvector of Pnk with eigenvalue A, then
V1
v2
vn
is an eigenvector of Pn with eigenvalue A.
Proof. By the hypotheses
n k
j= 1 = 1
a=l j
n k k
j=1 a= 1 =1
The matrix Pnk is a krefinement matrix of Pn and
: v,.
k
5 Avi.
a=1
k 0kv,
k f 0, thus
n
v5kp, = Akv,
j=1
n
p, = Av,.
J=1
Hence the statement holds.
Theorem 6.1.8. If Pn is an n x n dsmatrix, Pnk is a krefinement of Pn, PnV = Av, v
(vi), then
Thus the vector
(v, appears k times) averages out over block rows to be like an eigenvector of Pnk.
Zk Tn 1k
ja _j_1 B=1P'.Jl/ J
( V1 ... V1 V2 ... ... Vn... Vn,
Now we look at a matrix that defines a linear map that we may use to evaluate the
eigenvalues of P. We may use this map when eigenvalue stability is uncertain.
Notation 3.2.11. Let T denote the n x n matrix
I P.
Notation 3.2.12. Let < u, v > denote the standard dot product on the vectors u and v.
The next theorem shows that A2(P) = Ai(TP).
Theorem 3.2.13. If P is an n x n stochastic matrix, then
A(TP) = (A(P)U{0}) {1},
where A denotes the multiset of eigenvalues. Moreover, any eigenvector of P with A f 1
is an eigenvector of TP with the same eigenvalue.
Proof. Since P is a stochastic matrix,
is an eigenvector with eigenvalue one and has norm one. For any vector, v = (vi),
< v, >
is orthogonal to u.
 < u > =V < V,
= Iv
1 1
)( )

/K
In this paper, we present the testing procedure and hypotheses of testing early on,
we do this to give an overview of paper, help the reader understand the goals of this
work, and provide easy reference. Background information about ergodic theory and
Markov shifts is provided to establish our notation and review critical concepts. Since
the stirring protocol is modeled with a doublystochastic matrix, a review of stochastic
and doublystochastic matrix properties is presented. When the approximating Markov
shift defines a mixing dynamical system, the rate at which the matrix converges to
a rank one matrix provides an estimate of (D, B, p, f)'s mixing rate; construction of
the estimate is provided to justify its utility. When a decision is made based upon
observations, statistics can establish confidence in the decision, the approximating
matrix is treated as a random variable so that statistics may be used. Properties of
random doublystochastic matrices are given to illuminate the approximating matrix
as a random variable. How does the Markov shift change if the partition is refined?
Will a sequence of partitions lead to a sequence of Markov shift dynamical systems
that converge? What will the convergence rate be? As a first step to answering
these questions, we look at the relationship between Markov shifts before and after
a refinement, then investigate the probabilistic properties of random partitions. To
be consistent, we only consider refinement that have partition sets of equal volume.
Ulam's method gives us a approximating matrix that usually cannot be observed directly,
numerical or statistical observations can approximate this matrix with a Monte Carlo
technique; proof that the Monte Carlo technique converges is provided. Ulam's method
converges to an operator [6]; similarities between the approximating matrix and the
target operator are established, and proof of convergence with respect weakmixing is
given. Decay of correlation is a well established measure of mixing; the second largest
eigenvalue of the approximating matrix and decay of correlation are different measures
of mixing. Decay of correlation is a better measure of mixing, but requires a sequence of
stirring iteration, the second largest eigenvalue requires one iteration. If the partition is a
P, we have a measure of the mixing rate. What if n is so large that the Jordan canonical
form of P is not computable?
Theorem 4.2.1. The sequence defined by
n( 1 I2 Nn 1
N E N provides an estimate of f's rate of mixing.
Proof. Let J be the Jordan canonical form of P with conjugating matrix E,
P = E1JE, pN = E1jNE.
If A2(P)I = 1 then JN does not go a rank one matrix as N > oo, and we do not
conclude that (D, B, /, f) is weakmixing.
If A2(P) < 1, 1 has multiplicity one, hence [1] is a Jordan block; all other
blocks have diagonal entries with magnitude less than one. If we look at powers of
a Jordan block Bi, (N > n), the diagonal is AN... A, the subdiagonals are zeros, the
superdiagonals are A" d() where d is the dth super diagonal. So the upper right entry
of a Jordan block is the slowest to converge to zero. When A2(P) < 1 and i > 1,
rN d(N)
N=d
passes the ratio test, so the sum converges. Thus all entries of BN go to zero as
N > oo. If we look at ratios of the upper right entries, we see that eigenvalue magnitude
influences the rate of convergence more than block size. Thus the upper right entry of
the largest block of A2 converges most slowly to 0. The largest block size possible for A2
is (n 1) x (n 1). Therefore the rate at which entries of J that converges to zero go
no slower than the rate that A'" 1(nN) > 0. The equivalence of P and J shows that
the rate that AN(nN) > 0 as N > oo gives an upper bound on the rate that pN goes to a
rank one matrix as N > oo. D
If u = 1, then Uk = 1, SO Uk = u. When u = 1, it represents an ingredient with
constant concentration, that is to say the ingredient is mixed throughout.
Notation 9.1.7. If {{IDki 1}} is a sequence of nkpartitions where {IDk+ li,} is an
nk refinement of {Dki} 1, nk nk +, nk < nk +, then let Sk and So denote the following
nk
aalgebras,
Sk =({Dki,} )
the intersection of all aalgebras containing the partition sets {Dki, 1},
00
S, =(SU Sk)
k=l
the intersection of all aalgebras containing UJ, Sk.
Notice that Sk C Sk+l
The function Uk is Skmeasurable and integrable, and
j uk d = j ud p for all A Sk
by construction. It follows that
Uk = E(UlSk).
Definition 9.1.8. [15] Let X1, X, ... be a sequence of random variables on a probability
space (Q, F, P), and let ,, F2, ... be a sequence of aalgebras in T. The sequence
{(Xk, Fk)} =1 is a martingale if these four conditions hold:
1. k Ck k+i;
2. Xk is measurable in Fk;
3. E(Xk) < oo;
4. with probability one, E(Xk+ Fk) = Xk.
We have the uniform([0, 1]) distribution forp when C is a uniform([p, 1 p]) random
variable.
The expected value of a beta(a, /) random variable is and p is fixed so if
a + p
E(c) = 0, then
a +p
It follows that
p2(1 p)
pa
So if E(c) = 0, then V(p) decreases as a increases.
12.2 Approximating Probability Distributions
The rest of the chapter will look at other ways to approximate the probability
distributions of dsmatrices when the central tendency of the distribution is not specified.
We propose using one of the following Monte Carlo technique to approximate probability
distributions of statistics) from random dsmatrices when n > 2.
1. Use the dealer's algorithm to generate dsmatrices
2. Take convex combinations of all n! permutation matrices to generate dsmatrices
3. Take convex combinations of (n 1)2 +1 or more permutation matrices to generate
dsmatrices
4. Use unitary matrices to generate dsmatrices
12.2.1 The Dealer's Algorithm
Say that a dealer has a deck with n suits and g cards in each suit (ng cards in
the deck), shuffles the cards such that the order of the cards is a uniformly distributed
random variable. Then the dealer deals the entire deck to n distinct players. We can
represent the number of cards in a suit that each player received with an n x n matrix,
each player corresponds to a row, each suit corresponds to a column. All rows and all
1
columns will sum to g. We get a dsmatrix after rescaling the matrix by
g
120
Example 3.1.17 (Special Case n = 2). If n = 2 then a dsmatrix is of the form[: q
q p
q = 1 p.
p q 1 0 01
= P + q
q p 0 1 1 0
Notice that for 2 x 2 dsmatrices are always symmetric.
Example 3.1.18 (Special Case n = 3). If P is an n x n dsmatrix then P is a convex
1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 0 1
combination of 0 1 0 0 1 0 1 0 0 1 1 0 0
00 1 1 0 0 1 1 0 0 1 0 0 1 0
Since the first four permutation matrices are symmetric and the last two are not, the
010 0 0 1
convex coefficients of 00 1 0 0 determine how far a dsmatrix is from
1 0 0 1 0
symmetry.
The next theorem shows that DS,, is a bounded set.
Theorem 3.1.19. If P is an n x n dsmatrix, then for the 1norm, 2norm ,0onorm and
Frobeniusnorm
I P P
Proof. By the Birkhoffvon Neumann theorem P is a convex combination of permutation
matrices. Assume that P = 1i, iPi where {Pi}n1 is the set of n x n permutation
matrices,
n! n!
 P P iPi a I
i=1 i=1
nI
= ai(P, P)I
i=1
n!
< a, I P, P
i=1
The following is the simplest general example of a krefinement. We present it to
show the relationships between the unrefined Markov shift and the refined Markov shift.
Example 6.1.3. If we have a 2partition and apply a 2refinement, what does P2 tell
us about P4 ? Here we are looking at the situation where P2 is a known matrix from a
Markov shift ((1/2, 1/2), P2) and we want to make an inference about the Markov shift
that results if we refine our partition, ((1/4, 1/4, 1/4, 1/4), P4). Let
P2 = q and
P1111 P1112 P1121 P1122
P1211 P1212 P1221 P1222
P4 =
P2111 P2112 P2121 P2122
P2211 P2212 P2221 P2222
First let's look at P4 as a dsmatrix without considering it as a refinement of P2. The
matrix P4 is a stochastic matrix so all rows sum to one.
P1111 + P1112 + P1121 + P1122 1
P1211 + P212 + P1221 + P222 1
P211 + P2112 + P2121 + P2122 1
P2211 + P2212 + P2221 + P2222 = 1
Our matrix P4 is a dsmatrix so all columns sum to one and we get the additional
equations.
P1111 + P1211 + P2111 + P2211 = 1
P1112 + P1212 + P2112 + P2212 = 1
P1121 + P1221 + P2121 + P2221 =1
P1122 + P1222 + P2122 + P2222 = 1
Corollary 2.2.13. The sum
(D, B, f).
n 1 i p,i (log(p,) gives an estimate of the entropy of
n
Proof. Set (p, P) = ((1/n, 1/n, 1/n,..., 1/n), P), pi = 1/n Vi.
Theorem 2.2.14. If (p, P) is a Markov shift on n states (onesided or twosided) then its
entropy is less than or equal to log(n).
Corollary 2.2.15. If h is an upper bound for the entropy of (D, B, p, f) then we should
set
n > exp(h)
when we partition D.
Example 2.2.16. Here are some examples of Markov shifts with four states.
1/6 5/6 0 0
5/6 1/6 0 0
1. ((1/4,1/4,1/4,1/4), ) is a nonergodic Markov sh
0 0 1/3 2/3
0 0 2/3 1/3
entropy approximately 0.544.
0 1 0 0
0 0 1 0
0010
2. ((1/4,1/4,1/4,1/4), ) is an ergodic Markov shift, but is nonn
0 0 0 1
1 0 0 0
entropy 0.
0001
1000
entropy 0.
1/4 1/4 1/4 1/4
1/4 1/4 1/4 1/4
3. ((1/4,1/4,1/4,1/4), 1/4 1/4 1/4 1/4
1/4 1/4 1/4 1/4
1/4 1/4 1/4 1/4
shift achieves the entropy upper bound
ift with
nixing, with
) is a mixing Markov shift. This Markov
LIST OF TABLES
Table page
131 The Reflection M ap . . 135
132 Arnold's Cat Map ........... ...................... 136
133 The Sine Flow Map (parameter 8/5) ........................ 137
134 The Sine Flow Map (parameter 4/5) ........................ 138
135 The Baker's M ap . . 140
136 The Chirikov Standard Map (parameter 0) . 141
Where mi is the number of states that start in state Di.
5. Let A, be P's second largest eigenvalue in magnitude. If A2 is not unique pick an
eigenvalue of minimal distance to one.
6. If IA2 is sufficiently smaller than 1, reject the hypothesis that (D, B, p, f) is not
mixing in favor of the hypothesis that (D, B, p, f) is weakmixing.
7. If IA2 is not sufficiently small, but IA2 11 is sufficiently large reject the hypothesis
that (D, B, p, f) is not ergodic in favor of the hypothesis that (D, B, p, f) is ergodic.
8. If AX is close to one, fail to reject the hypothesis that (D, B, p, f) is not ergodic.
9. If we accept the hypothesis that (D, B, p, f) is weakmixing, use the rate at which
(N,) 12 Nnl > 0 as N o
as an estimate of the rate at which f mixes.
10. Let
n n
t  p, log(p,)
n
i= j=1
be our estimate of our entropy. Note that by construction of npartitions, a sta
tionary distribution of P is (1/n, 1/n,..., 1/n), so we use (1/n, 1/n) for the
stationary distribution of our approximating Markov shift.
When we use the term sufficient, we compare the test statistic to the corresponding
critical value for hypothesis testing. The critical value comes from the level of significance
we set, and the test statistic's probability distribution. When we make a decision, it is
better to conclude that a weakmixing dynamical system is not mixing (type II error) than
to accept a nonmixing dynamical system as weakmixing (type I error). We will discuss
probability distributions in a later chapter. We propose using a beta distribution with
a > 2, and / = 1 to set the critical value for the weakmixing hypothesis.
Definition 8.1.2. The matrix M = (my) from the previous algorithm is called an Ulam
Matrix.
variable on the unit disk, then IA21 is a beta random variable with a = 2, / = 1. We
propose that if we have no insight into the value of a, then we should set it equal to two.
When the beta distribution is the correct distribution and a > 1, / = 1; if we set the
critical value with a smaller alpha parameter, we will be less likely to make a Type I error,
if we set the critical value with a larger alpha parameter, we will be less more likely to
make a Type I error. So it is better to use an alpha parameter that is too small rather
than too big.
The next example describes the probability distribution function if n = 2, and p is a
beta(a, /) random variable.
Example 12.1.1 (n=2). Say that
P= p q
PI
and we observe a perturbed version of P,
P=
P=P+e,
q=1lpe.
Where c is a random variable such that is a beta(a, /) random variable, then
P(p < kip) = P(p + c < kp) = k F(a ) ( 1 t)3ldt.
It follows that
If = 1, then P(p < kp) = k.
If a = 1, then P(p < kIp) = 1 (1 k).
If a = 1, and3 = 1, then P(p < kIp) = k.
119
pT is stochastic, but P is not symmetric.
Theorem 3.1.4. If P is the stochastic matrix formed by taking an npartition of
(D, B, p, f), then P is a dsmatrix.
Proof. By construction, P is a stochastic matrix. Since 0 < p, < 1 for all UI, it suffices to
show that the sum of all columns equals one. Since p(x e D,) are all equal, p(x e D,) =
1
n
n n
pY= u(f(x) Dxe D,)
i= 1 i= 1
p(f(x) e DAxeID,)
_(X E ]Di)
nt (f (x) e Dr Axe D,)
1
i= 1
n
n
= n yp(f(x) e Dj Ax e D,)
i 1
n
= np(f(x) e Dr Ax e D,)
i 1
= n(f(x) e D) = n(1/n) = 1.
So 2i py 1. It follows that PT is a stochastic matrix. D
We skip the standard proofs of the following lemmas. The next lemma shows that all
of the eigenvalues of a stochastic matrix are on the unit disk of the complex plane.
Lemma 3.1.5. If P is an n x n stochastic matrix and A is an eigenvalue of P then
A < 1.
Next we show that the largest eigenvalue of any stochastic matrix is one with
an eigenvector of all ones. Because of this lemma, we may use (1/n,..., 1/n) as a
stationary distribution of P. Since our goal is to measure mixing between n states each
with measure 1/n, (1/n,..., 1/n) is intuitively the correct stationary distribution to use.
The sum of each row (column) is one, so if py and Pij are on the same row (column)
and p. increases, pi,j tends to decrease to maintain the sum. If i f i' andj / j', when
p. increases, py, tends to decrease to maintain the row sum; when py, decreases, pi,,
tends to increase to maintain the column sum.
When P is a random matrix, det(P), trace(P), and A,(P) are random variables. We
look at properties of these random variables for the remainder of the chapter.
Theorem 5.1.10. If P = (py) is an n x n matrix (n > 2), and E(pI,(I)p22(2)...Pno(n)) is
constant for any length n permutations a, then
E(det(P)) = 0.
Proof. We need to use the definition of determinants that uses permutation,
det(P) = (1) Pl(1)P2,(2)...Pn(n).
a is a permutation
Where k, = 1 if is an odd permutation and k, = 0 if a is an even permutation.
E(det(P))= E( 1)k1(1)P2(2)Pn(n)
a is a permutation
S ( )k, E(P1,(1)P27(2) P...pn(n))
a is a permutation
= (l)kE(PllP22...Pnn)
aT is a permutation
= E(pP22...Pnn) (1)ke
a is a permutation
Since n > 2 half of the permutations are even, and half are odd. D
Corollary 5.1.11. If P = (pu) is an n x n matrix (n > 2), and
So each eigenvalue is contained in a closed disk centered at pii with radius 1 pii for
some i. All of these disks have a diameter in [1, 1] that contains 1. If we look at the real
numbers in [1, 1] contained the ith disk,
Ix P < 1 pii
Pii 1
2pii 1
2 min pj.1 < 2pii 1 x 1
J
2min p 1
J
min p 1
J J J
x minpy < 1 min pj.
J J
It follows that the real numbers contained in I A pi < 1 pi are contained in the disk
defined by I A minj py I< 1 minj py. Since the center of each disk is contained in [0, 1],
all disks are contained in I A minj pj 1< 1 minj pjj. D
Corollary 3.2.5. If P is an n x n stochastic matrix with all diagonal entries greater than 
2
then P is invertible.
Proof. For an n x n matrix to be invertible, all eigenvalues must be nonzero.
1
I A pi I< 1 minipii <
Since
1
< miniPii,
2
the open disk centered at pij with radius does not contain zero. D
2
log(n) < E(p, log(py)).
n
Thus we get the two inequalities. O
The next theorem is useful if one wishes to use the harmonic mean of the entries of
a random stochastic matrix.
Theorem 5.2.2. If P = (py) is a random n x n stochastic matrix where py are identically
distributed and 0 < py almost surely, then n < E( ).
1
Proof. Since f(x) = is a convex function on (0, oo), Jensen's inequality tells us that
X
1 1
< E(1)
E(pJ) pU
1/n pu
^)
n < E( ).
And the statement follows. O
Table 131. The Reflection Map
Number of States Average I A2(P) I Typical pvalue of X2
4 1 0
16 1 0
64 1 0
256 1 0
1024 1 0
4096 1 0
13.2 Arnold's Cat Map
Look at the unit square as a torus and apply Arnold's cat map,
2 1 x
f(x,y) = mod 1.
11
This map is strongmixing. When we partition the unit torus into four squares
1/4 1/4 1/4 1/4
1/4 1/4 1/4 1/4
1/4 1/4 1/4 1/4
1/4 1/4 1/4 1/4
which has characteristic polynomial
A3(A 1).
This matrix arises from our particular partition of four subsquares of the unit square. A
way to confirm that this is the correct matrix is to draw the mapping of the subsquares
on the xyplane, then look at where the four mapped subsquares are on the torus. The
eigenvalues are 1, 0, 0, 0. So when we partition the unit torus into four subsquares,
Arnold's cat map sends onefourth of a subsquare to each subsquare.
We ran our procedure 100 times with 106 pseudorandomly generated points
(uniformly) and saw the results listed in table 132.
135
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
USING ULAM'S METHOD TO TEST FOR MIXING
By
Aaron Carl Smith
August 2010
Chair: Philip Boyland
Major: Mathematics
Ulam's method is a popular technique to discretize and approximate a continuous
dynamical system. We propose a statistical test using Ulam's method to evaluate the
hypothesis that a dynamical system with a measurepreserving map is weakmixing.
The test statistic is the second largest eigenvalue of a Monte Carlo stochastic matrix that
approximates a doublystochastic Ulam matrix of the dynamical system. This eigenvalue
leads to a mixing rate estimate. Our approach requires only one experiment while the
most common method to evaluate the hypothesis, decay of correlation, requires iterated
experiments. Currently, time of computation determines how many Monte Carlo points
to use; we present a method based on desired accuracy and risk tolerance to decide
how many points should be used. Our test has direct application to mixing relatively
incompressible fluids, such as water and chocolate. Stirring protocols of compression
resistant fluids may be modeled with measurepreserving maps. Our test evaluates if a
stirring protocol mixes and can compare mixing rates between different protocols.
(n1)2 1+l
P = Y.YkP(k)
k= 1
12.2.6 DSMatrices Arising from Unitary Matrices
Every unitary matrix can be used to create a dsmatrix.
Proposition 12.2.14. If U is a unitary matrix and P is the matrix such that
P, = Uj ,
then P is a dsmatrix.
Proof. We must show that P is a nonnegative matrix whose columns and rows all sum
to one. By how we define P, P is a nonnegative matrix. Since U is unitary,
I= U*U.
So if
is a column vector of U, then
1 =< U, U< > uy2
Thus PT is a stochastic matrix. If we repeat this argument with
I = UU*,
we see that the rows of P sum to one, thus P is a stochastic matrix. Hence P is a
dsmatrix. O
n=2: This case follows from the previous example. Every 2 x 2 dsmatrix is of the
form
= p lp
1p p
for some p E [0, 1].
n=3: This case follows from a previous counterexample. The matrix
1/3 1/3 1/3
P= 1/2 1/6 1/3
1/6 1/2 1/3
is a 3 x 3 dsmatrix that is not symmetric.
n > 3: Let P denote the matrix in the n = 3 case, Ik be the k x k identity matrix, and
Ok be the k x k matrix with zero for all entries. Then for each n > 3,
P 03
03 /n3
is an n x n dsmatrix that is not symmetric. D
As n increases, the degrees of freedom for n x n dsmatrices increases and the
ways that a dsmatrix can deviate from being symmetric grow. Due to this and the
observation that randomly generated dsmatrices are symmetric less frequently for large
n, we propose the following conjecture.
Conjecture 3.1.12. If M, is the set of n x n matrices whose rows and columns sum to
one, g : M, > R"2, II M IIF= I g(M) 112 for all M E Mn, DSn, is the set of n x n dsmatrices,
and S, is the set of symmetric n x n dsmatrices, then
lim p(g(Sn)) = 0
noo
in the measure space (g(DSn), B, /) where p is Lebesque measure.
Lemma 9.2.6. If (X, E, /p) is a measure space and A, B, C c E, then
/p(A) p(B) < I<(AAB) and
p(A n B) p(A n C) < p(BAC)
Proof. For the first inequality,
p/(A) p(B)
= p(A n Bc)
= I (A n BC)
< 1p(A n Bc)
< (AAB).
 p(A n B) p(A n B) p(Ac n B)
p(Ac n B)
p(Ac n B)
Now the second inequality,
I(A n B) p(A n C)
p((A n B) n (A n C)c) p((A n B) n (A n C))
Ip((A n B) n (Ac U Cc)) p((Ac U B) n (A n C))
p(A n B n Cc) p(A n Bc n C)
(A n B n Cc) +p(A n Bc n C)
p(B n Cc) + (Bc n C)
p(BAC).
Hence we get both inequalities.
Our next theorem gives a criteria for weakmixing.
Theorem 9.2.7. If
1. {FTk }=1 is a sequence of aalgebras of D.
2. Fk C k+lVk
3. Fk < oo Vk
4. 7 = a(U 1 k)
5. (D, F,,, p) is a probability space
103
{D(,ki}1, u(x) defines a probability distribution, and u o f is continuous then
Pnk(Uk(X)) f*(u) in L1 as k oo
Proof. Since u defines a probability distribution on (D, B), uk(X) defines a probability
distribution and uk is a probability vector. The matrix Pnk is a doublystochastic matrix so
UkPnk is a probability vector. Thus Pnk(uk(x)) defines a probability distribution function on
(D, B). If
P(x A) = u d/u for all A B,
then
Su o fdp= P(f(x) e Dkj)
nk
= P(x e Dk)P(f(x) e Dk X Dki)
i= 1
nk(/ ud)P
i= 1
nk
= (/ uk dp)pU
i= 1
=/ Pk(Uk(x))dli.
It follows that
SPk(uk(x))d= J u o fdp/ for all ki.
Furthermore
/ P~ (uk(x))d = u o fd/p for all A e Sk for all k.
JA JA
100
CHAPTER 6
PARTITION REFINEMENTS
6.1 Equal Measure Refinements
If we know P and we refine our partition of D, what does P tell us about our new
Markov shift? Intuitively, we expect the refined subshift of finite type to be a better
approximation to (D, B, p, f). It is unreasonable to use a Markov shift from an npartition
to approximate a continuous dynamical system if the Markov shift does not provide
information about new Markov shifts formed after refining the partition. Here we present
the relationship between npartition Markov shifts and nkpartition Markov shifts where
the later is a refinement of the former. Partition each Di into k connected subsets,
k
a=l
1
nk
Each subset has equal measure. We will refer to such refinements as krefinements.
Notation 6.1.1. We will use the following notation when referring to refinement of
partitions.
1. Let P, be our stochastic matrix before refinement with entries py,
py = p(f(x) E DI x e D,).
2. Let Pnk be our stochastic matrix after refinement with entries pij,
Arrange the rows and columns of Pnk with the order
1112...14k2122...2k...nin2...7nk.
With this arrangement we can represent Pnk as a block matrix where each entry of P,
corresponds to a k x k block of Pnk
Table 132. Arnold's Cat Map
Number of States Average A2(P) I Typical pvalue of X2
4 0.00 0.00
16 0.09 0.00
64 0.45 0.00
256 0.45 0.00
1024 0.45 0.00
4096 0.49 1.00
A typical P for four states is
0.2507 0.2496 0.2501 0.2496
0.2513 0.2484 0.2505 0.2497
PM
0.2489 0.2511 0.2500 0.2500
0.2506 0.2504 0.2499 0.2491
max(IP P) 0.0016
A2(P)I 0.0016
Typical significance levels will conclude that we used enough points when
n = 4,16, 64, 256, 1024,
but m = 106 is not sufficient when n = 4096.
13.3 The Sine Flow Map (parameter 8/5)
The sine flow map is a well studied area preserving nonlinear map on the torus.
f(x, y) = (x+ Tsin(27ry), y+ Tsin(27r(x+ Tsin(27ry))))
When T = 8/5, f is chaotic, it is conjectured that f is chaotic when T = 4/5 [19]. For a
dynamical system to be chaotic it must be topologically mixing; that is to say, for any two
open sets A, B c D, there exists an N such that
f"(A) n B 0
136
If we relabel the subscripts of Di we can get
1/2 1/
1/2 1/
0 C
0 C
0
0
1/2
1/2
and
0.5012
0.5006
0
0
0.4988
0.4994
0
0
0
0
0.5000
0.5010
0
0
0.5000
0.4990
The graph of our Markov shifts has two disjoint subgraphs, hence our subshifts of finite
type are not ergodic. Typical significance levels will conclude that we used enough
points when
n = 4, 16, 64, 256, 1024, but m = 106 is not sufficient when n = 4096.
142
6.2 A Special Class of Refinements
Now we look at a special class of kpartitions. These partitions are interesting
because the eigenvalue multiset after refinement contains the eigenvalue multiset before
refinement.
Definition 6.2.1. If Pnk is a krefinement of Pn such that for every block matrix
PJil PilJk
PikJl PikJk
there exists a k x k dsmatrix Dy such that
P/ill PilJk
pyDy = ,
Pik ... P1kk
then Pnk is called a Dk refinement matrix of P,.
Definition 6.2.2. If Pnk is a krefinement of Pn such that for every block matrix
Philj PilJk
Pikl P1kk
there exists a k x k permutation matrix S such that
PJli PilJk
PikJl PJk
then Pnk is called a Skrefinement matrix of P,.
Remark 6.2.3. Every Skrefinement is a Dkrefinement and if we take a sequence of
Skrefinements then the matrices will become sparse.
BIOGRAPHICAL SKETCH
Aaron Carl Smith was born in Portland, Indiana, and grew up in Ashland, Oregon.
After serving in the United States Army, Aaron used the Montgomery GI bill to attend the
University of Florida. He is married to the most beautiful woman in the world, Bridgett
Smith; they have a wonderful daughter, Akiko.
145
Notice that the function inside the integral is the kernel of a gamma distribution.
d (P( < k)) F((N )a) k(1 k)N1
dk F(Na)F(a)
This is the probability distribution function of a beta(a,Na) random variable. By the
independence of C1, c2,..., cN, we have the result. O
Remark 12.2.9. If P is an n x n dsmatrix then P's convex combination may or may not
be unique.
Proof. Here we look at two examples that justify the statement.
1. If P is a permutation matrix, then the only convex coefficient that is not zero is
the coefficient corresponding to P. So permutation matrices have unique convex
coefficients.
1/n... 1/n
2. If P = then the convex vector (1/n! ... 1/n!) gives P.
1/n... 1/n
If Pk is the permutation matrix corresponding to
i i k mod n,
gcd(k, n) = 1,
take the convex combination where powers of Pk have coefficients 1/n and all
other coefficients are zero. This convex combination gives P also.
So some dsmatrices result from unique convex combinations and some do not. D
It would be nice to be able to extend a set of observed dsmatrices whenever
obtaining observations is difficult or expensive; Murali Rao [16] created a way to extend
a set of observations.
125
USING ULAM'S METHOD TO TEST FOR MIXING
By
AARON CARL SMITH
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2010
2.2 Markov Shifts
Definition 2.2.1. A vector
P= (pl p2... n)
is a probability vector if
0< pi <1,
for all i and
P+ P2 ... Pn = 1.
Definition 2.2.2. A matrix is a stochastic matrix if every row is a probability vector.
Notation 2.2.3. Let
Xn ={(xi) : i N, xi e {1,2, ..., n}} or
Xn ={(xi) : c Z, x e {1,2 .. n}}.
Notice that, e {1, 2,..., n} instead ofx, e {0, 1, 2,..., n 1}. We do this for ease of
numerical indexing.
Definition 2.2.4. Subsets of Xn of the form
{(x,) e Xn : X = a, X2 = a2, ...,Xk = ak
for a given a,, a, ..., ak are called cylinder sets.
If En is the oalgebra generated by cylinder sets, p is a length n probability vector
and P is a stochastic matrix such that pP = p, then (Xn, En, (p, P)) defines a globally
consistent measure space where the measure of {{x,} e Xn : xl = a, x2 = a2, ...Xk
ak} is (Pa1)(Pa1a2...Pak ak)"
The measure of a cylinder set uses both p and P.
Proof. The matrix P is a dsmatrix, hence (1/n, 1/n,..., 1/n) is a stationary distribution.
(1/n, 1/n, ..., 1/n)P = (1/n, 1/n, ..., /n).
It follows that
P2 =P.
Since P is an orthogonal projection, all eigenvalues are 0 or 1. All columns and all rows
are equal so rank(P) = 1, thus only one eigenvalue does not equal zero. Since the rank
of a matrix is equal to the rank of its Jordan canonical form, the last part of the lemma
follows. O
3.2 Additional Properties of Stochastic Matrices
Since we are using an eigenvalue of a stochastic matrix as a test statistic, the
probability distribution of eigenvalues is important. We may use the next few results
to eliminate some measures from consideration as the probability distribution of A2(P)
when we have prior knowledge about P.
Proposition 3.2.1. If P is an n x n stochastic matrix then A2 + ... + n [1, n 1].
Notice that the upper bound is achieved when P is the identity matrix and the lower
010
bound is achieved when P = 0 0 1
100
Proof. The matrix P is stochastic so all entries are on [0, 1], thus trace(P) e [0, n].
A + A2 + ... + An = trace(P) so A1 + A2 + ... + An e [0, n]. Since A1 = 1, A2 + ... An E
[1, n 1].
Proposition 3.2.2. If P is an n x n stochastic matrix then det(P) e [1, 1].
The upper and lower bounds are achieved by permutation matrices.
Proof. Proof by counterexample:
Let
4/33
3/11
1/33
19/33
5/33
4/33
20/33
4/33
5/11
10/33
2/11
2/33
3/11
10/33
2/11
8/33
The eigenvalues of P4 are approximately
1, 0.246, 0.044 0.167/, 0.044 + 0.167/,
so the magnitude of P4's eigenvalues are approximately
1, 0.246, 0.173, 0.173,
A4,2 M 0.246.
Using the refinement equations we see that
The eigenvalues of P2 are {1, 1/3}, A2,2
1/3 2/3
2/3 1/3
= 1/3. It follows that A4,2 < IA2,2
When we take a refinement, what do the eigenvalues and eigenvectors of Pn tell
us about the eigenvalues and eigenvectors of Pnk? For Pn to be of value, it needs to
capture the useful information from Pnk, if it does not, then Pn has no hope of being
useful in making a decision about (D, B, /, f). Since we are using an eigenvalue as
a test statistic, we present the next results describing the relationship between the
eigenvalues of Pn and Pnk
Theorem 6.1.7. If Pnk is a krefinement matrix of Pn and
v2... Vn ... Vn)
( V1 ... V1 V2 ...
it follows that the expected value exists. By the strong law of large numbers, and that the
data points are independent and indentically distributed, converges to its
mij +... + m,
expected value. O
Remark 11.1.4. When we use y as a test statistic, we let
mlj + ... + mnj
Mu = 0 whenever m, = 0.
mj + ... mn
We do this to prevent problems with zero denominators when
f(x) i D
for all data points. By construction, data points start in each Di, but it is possible to have
a sample where no points land in a particular partition set.
Theorem 11.1.5. If {m } are our counts from our Ulam matrix and mi = m2 = ... mn,
then
V( m )<1 p2
mlj + ... + mnj
Proof.
M( )2 < 1.
mlj +... + mnj
Take the expected value,
E(( m )2) < 1
mlj +... + mn
Subtract the term needed to make the left hand side equal the variance,
E(( m )2) (E( my ))2 < 1 (E( m
mlj +... + mn mj + ... + m mij ... + mn
V( M )<1 p2
Sv v...
So we have an upper bound for the variance of the quotient. D
for any constant c,
V(x) = E(x2) (E(x))2
whenever E(x2) and E(x) are finite, and
cov(x, y) = E(xy) E(x)E(y)
whenever E(xy), E(x), and E(xy) are finite.
Theorem 5.1.2. If P = (py) is a random stochastic matrix where py are identically
distributed then
1
E(py) = for all ij.
n
Proof. Since P is a stochastic matrix and taking an expected value is a linear operation,
n
E(p )
ji1
nE(py) = 1
E(pu)
Thus the statement holds. C
Theorem 5.1.3. If P = (py) is a random n x n stochastic matrix where py are identically
distributed and N E N, then
1 1
< E(p') <
n n
where p = (PU)N.
Proof. First we prove that E(pN) <
For any stochastic matrix P,
... + Pin = 1
Pil + Pi2
than the consequences of type II errors. Therefore it takes strong evidence to reject
Ho. In general, we cannot simultaneously control the type I error risk and the type II
error risk; by default we control the risk of a type I error. The maximum chance we are
willing to risk a type I error is called the significance level. To conduct proper hypothesis
testing, one must set the significance level before gathering observations. After setting
the significance level, one must establish what statistic to use (called the test statistic);
what set of statistics results in reject Ho, what set of statistics results in fail to reject Ho
(the boundary between these two sets is called critical valuess).
For our problem, P is a stochastic Ulam matrix that approximates P, a doublystochastic
matrix (dsmatrix). We will make our decision based on the following criteria.
1. Reject Ho in favor of Ha2 if I A2(P) I< c2 where c2 is a critical value.
2. Reject Ho in favor of Ha if I A2(P) 1 I> ci and I A2(P) I> c2 where cl is a critical
value.
3. Fail to reject Ho otherwise.
Since we are more concerned with a type I error, rejecting Ho is unlikely when
observing random events. To set the critical values we must have an estimate of the
probability distribution of the test statistic. Probability distibutions of dsmatrices are
difficult to work with, the Monte Carlo chapter provides ways to approximate probability
distributions of test statistics.
1.2 Testing Procedure
Notation 1.2.1. Let
1 1
n n
1 1
n n
and I be the identity matrix. We will denote disjoint unions with .
1. Set the significance level.
I dedicate this to Bridgett for her support while I pursued my goals.
CHAPTER 13
EXAMPLES
Here we demonstrate our procedure on well known maps on the unit square. We
partitioned the unit square into halfopen subsquares and set m = 106 points. The points
were uniformly distributed pseudorandom numbers generated with MATLAB's rand
function on default settings. A goal was for the points to be approximately independent
identically distributed uniform((0, 1) x (0, 1)) random variables. The number of partition
sets is a power of four since we are partitioning the unit square into subsquares,
n = 4,16, 64,256, 1024, 4096.
When a map is defined on the standard torus, we treat the surface as the unit square
with the edges identified. For each map we present one observed matrices when
n = 4 since this is the easiest case to interpret; we omit the matrices for n =
16, 64, 256, 1024, 4096. We present the average observed IA21. Under the assumption
that our conjectured test for when the data points are sufficient, we present typical
results from our x2test; the pvalues are presented rather than giving the results for a
particular significance level, this is done to allow the reader to make conclusions using
their own criteria.
13.1 The Reflection Map
The reflection map defined by
0 1 x
f(x, y) (Y, x),
1 0 y
reflects the unit square over the line x = y. The disk
{(x,y) : (x 1/2)2 + (y 1/2)2 < 1/4}
is mapped to itself and has measure r/4, thus the reflection map is not ergodic.
133
Taking the reduced row echelon form of the system of equations after including the
refinement equations (pivots are in bold font) we get
1 0
L 0
L 0
L 0
L 0
1 0
1 0
) 1
) 0
) 0
) 0
) 0
) 1
.1
) 1
11
1 1
) 0
) 0
) 0
) 0
S 0
) 0
) 1
1 1
) 0
0
S 0
S2p
1
2p
1
2p
0
1 2p
0
0
0
0
0
From the twelve equations the reduced row echelon form has eight pivots and four
zerorows. Thus including the refinement equations reduces one degree of freedom.
The zero matrix is not a solution to the system of equations, thus the solution set is
still not a vector space. If we convert the reduced row echelon form to matrices, we see
that a solution is
0
2p
0
1 2p
0
1 2p
0
2p
After taking an npartition, f maps some portion of state i to state. Let P = (p,) be
the matrix where
P (x C Dj A f'(x) E Di)
i '(x) p(x e Dlf (x) E ID) = p(f(x) E Dlx Ce ID).
S I(f(x) C Di)
Our measure p is a probability measure on D so by construction P is a stochastic matrix
with pi, = p(f(x) E D jx e D,) and j iP, = 1, p, is a conditional probability.
When establishing an npartition the physical action for the mixing protocol on
the domain should be considered. If the domain is the unit disk and mixing protocol
acts in a circular manner, to check that regions closer and farther from the origin are
mixing we could partition the disk into rings with radii r, = rk = If the
domain is a rectangle and we are confident that the mixing protocol mixes horizontal
(vertical) sections, then we may partition the rectangle into n horizontal (vertical) smaller
rectangles to check that vertical (horizontal) regions mix. If we do not know about the
stirring protocol before partitioning, the partition should minimize subset diameter.
11.2 Other Criteria for When More Data Points Are Needed
Next, we propose alternative criteria for deciding if mini mi is too small.
1. When we generate P, we know that it approximates an n x n dsmatrix, thus
column sums should be close to one. If
n n n n
j=1 i= 1j= i=1
is significantly larger than zero then more points should be used.
2. Suppose we want to check how close P is to being ds with respect to a particular.
If the value of
1 1 1
1 1= (P I0
1 1 \1
differs significantly from zero, then more points should be used.
3. All dsmatrices have (1/n, 1/n,..., 1/n) as a stationary distribution so if P does not
have a strictly positive stationary distribution, more points should be used.
4. All dsmatrices have (1/n, 1/n,.... 1/n) as a stationary distribution so if
SIn  < a, (1/n, 1/n, ..., 1/n)
sup < a, (1/n, :1 uP = u, a is a probability vector}
is far from one for all strictly positive stationary distributions of P, then more points
should be used.
116
CHAPTER 1
INTRODUCTION
If we need to evaluate a stirring protocol's ability to mix an incompressible fluid in a
closed system, but cannot do so analytically, Ulam's method provides a Markov shift that
approximates the protocol. Since the fluid is incompressible, the stirring protocol defines
a measure preserving map where the volume of a region defines its measure. For a
function to be a stirring protocol, the function must be volume (measure) preserving.
Otherwise a closed system could have more or less mass after mixing than before.
Let D be the incompressible fluid we wish to mix (D is our domain), D is a bounded
and connected subset of Rk, k e N, B is the Borel oalgebra of D,and p is the uniform
probability measure rescaledd Lebesque measure) of (D, B). The function f : D > D is
defined by the given stirring protocol. Our dynamical system is (D, B, p, f).
If a stirring protocol mixes, then the concentration of an ingredient will become
constant as the protocol is iterated. When a solution is mixed, the amount of an
ingredient in a region becomes proportional to the volume of the region. If we partition
the fluid into n parts, we may use an n x n stochastic matrix to represent how the stirring
protocol moves an ingredient; if the protocol mixes, then the matrix will become a rank
one matrix with all rows being equal as the protocol is iterated (rows of the rank one
matrix correspond to the measure of partition sets). When the partition is a Markov
partition, we may use powers of the matrix to represent how iterations of the stirring
protocol move an ingredient; powers of a stochastic matrix converge to a rank one
matrix if and only if the second largest eigenvalue is less than one in magnitude. Since
the fluid is partitioned into n sets, it is natural to partition the fluid into sets with volume
. When we partition a stirring protocol in this manner the approximating matrix and its
n
transpose will be stochastic.
The stochastic matrix that approximates the stirring protocol defines a Markov shift,
so the Markov shift approximates and models how stirring iterations move particles from
The next conjecture provides a hypothesis test to check if mini mi is large enough
after running Ulam method. The conjecture is based on the X2test where
(observed expected)2
expected
is the test statistic. Here we insert the entries of P for the observed values, 1/n for
the expected values in the denominator. Since we wish to check if our data points are
insufficient and P is a dsmatrix, we insert
my
mli +... + mnj
for the expected values in the numerator. In the chapter of examples, we assume this
conjecture is correct and give the results of this test.
Conjecture 11.1.6. If {m,} are our counts from our Ulam matrix and
mi, m2, ..., mn
are the number of points in
D1,D2, ..., D1
before applying our map, then
n n
n n( m, mU )2 21)
S J m j i ... m m, ... in ( 2
i=1 j=1
and this may be used as a test statistic for the hypothesis test
1. Ho : mi is insufficient for some i.
2. Ha : mi is sufficient for all i.
Where we will reject Ho in favor of Ha if
n n
n W (  m  y  )2
j + ...+ m, mil + ... min
is smaller than a critical value.
115
Theorem 6.1.2. If ((1/n,..., 1/n), P,) is the Markov shift that approximates (D, B, /, f)
after an npartition, and Pnk is the stochastic matrix after a krefinement, then
((1/nk,..., 1/nk), Pnk)
is a Markov shift and
k k
=31 a=l
We will refer to these equations as the refinement equations.
Proof. After refining, Pnk is an nk x nk dsmatrix so ((1/nk,..., 1/nk), Pnk) is a Markov
shift. Now let's look at the refinement equations.
k k k k
k = nk f (x) e D Ax e Di')
==1 a=l
/3 a1l nk
k k
= nk((f(x)E ~Ax Di,)
v ( Ax CD,
3=1 aa=
li(f(x) E D Ax E Di)
=kk
1
nk/(f (x) E D, Ax E Di)
(x D Ax i)
= kp.(
Thus we get the refinement equations. O
Proof.
P(k <1 P P ) = P(k <11 EJE1 EJE1 )
=P(k <11 E(J J)E1 II)
< P(k < E I J 1 E1 II).
Multiply both sides by negative one and add one to both sides,
1 P(k <11 P P ) > 1 P(k <11 E IIII J IIII E1 II).
By the properties of probability measures we see that
P(l P < k) > P(l E II JJ I E1 < k)
k
P(l P P < k) > P(ll J J < ).
II E I II E1 I
When E is unitary and we have the 2norm or the Frobenius norm
P(k <1 P P ) = P(k <1 EJE1 EJE1 )
= P(k <1I E(J J)E1 II)
= P(k <1 J J ).
Thus both statements hold. D
Taking absolute values of both sides, we get
det(P) A = ...An = A2  n .
By how we defined Ai,
SIn1<1 A2 I A... An I<1 A2 In1
Thus
I An I< n / det(P) I < A2 .
This gives us both inequalities. O
Notice that I An 1= "/ det(P) I = A2 I when P is a permutation matrix or P.
Corollary 3.2.8. If P is an n x n stochastic matrix then
max ce() det(P) }
n
Proposition 3.2.9. If P is an n x n stochastic matrix then for any matrix norm induced by
a vector norm
S<11 P II
Proof. One is the largest eigenvalue for any stochastic matrix. D
The identity matrix achieves the lower bound.
Proposition 3.2.10. If P is an n x n stochastic matrix, then
I P I =
II P 12 = 1
I P 1 =1
9 CONVERGENCE TO AN OPERATOR ................
9.1 Stirring Protocols as Operators and Operator Eigenfunctions
9.2 Convergence Results ......................
10 DECAY OF CORRELATION ......................
10.1 Comparing Our Test to Decay of Correlation .
10.2 A Conjecture About Mixing Rate .
11 CRITERIA FOR WHEN MORE DATA POINTS ARE NEEDED .
11.1 Our Main Criteria for When More Data Points Are Needed .
11.2 Other Criteria for When More Data Points Are Needed .
12 PROBABILITY DISTRIBUTIONS OF DSMATRICES .
12.1 Conditional Probability Distributions .
12.2 Approximating Probability Distributions .
12.2.1 The Dealer's Algorithm ............
12.2.2 Full Convex Combinations ..........
12.2.3 Reduced Convex Combinations .
12.2.4 The DSGreedy Algorithm ..........
12.2.5 Using the Greedy DSAlgorithm .
12.2.6 DSMatrices Arising from Unitary Matrices .
13 EXAM PLES .. .. .. .. .. .. .
13.1 The Reflection Map .................
13.2 Arnold's Cat Map ...................
13.3 The Sine Flow Map (parameter 8/5) .
13.4 The Sine Flow Map (parameter 4/5) ........
13.5 The Baker's Map ...................
13.6 The Chirikov Standard Map (parameter 0) .
REFERENCES ............. .. ..........
BIOGRAPHICAL SKETCH ...................
. 1 17
. 120
. 120
. 123
. 12 6
. 128
. 130
. 1 3 1
133
. 13 3
. 13 5
. 13 6
. 137
. 13 8
. 14 0
. 14 3
. 14 5
. 92
. 97
. 107
. 107
. 110
112
112
116
117
Table 133. The Sine Flow Map (parameter 8/5)
Number of States Average I A2(P) I Typical pvalue of X2
4 0.2042 0.000
16 0.2068 0.000
64 0.3539 0.000
256 0.5198 0.000
1024 0.6927 0.000
4096 0.7427 1.000
whenever n > N.
We ran our procedure 100 times with 106 pseudorandomly generated points
(uniformly) and saw the results listed in table 133.
Typical significance levels will conclude that we used enough points for
n = 4,16, 64, 256, 1024,
but m = 106 appears to be insufficient when n = 4096.
A typical P for four states is
0.1655 0.2341 0.2330 0.3673
0.2316 0.1656 0.3709 0.2318
0.2332 0.3691 0.1647 0.2331
0.3696 0.2326 0.2328 0.1650
A2(P) 0.2048
13.4 The Sine Flow Map (parameter 4/5)
Here we set the parameter of the sine flow map to 4/5.
f(x, y) = (x + (4/5) sin(27y), y + (4/5) sin(27r(x + (4/5) sin(27y))))
It is conjectured that this dynamical system is choatic.
We ran our procedure 100 times with 106 pseudorandomly generated points
(uniformly) and saw the results listed in table 134.
137
P log(P) < E(pinj. log(pi, )).
k k
Taking summations of both sides leads to
k k n n k k n n
EEI log( ) < EE pE E(pY lg(p,))
a= l /31 i=l j=1 a= 1 /=1 i=l j=1
n n k k n n
k 2log( ) < E( p ijl (p ))
i= 1 j= 1 a= 1= 1 i= 1 j= 1
1
The formula for metric entropy has a negative sign, the formula for hnk includes n
nk
1
multiply both sides by nk
nk
1kknn
al E( 1 il ip og
a 1 /3 1 i j 1
i=1 j=1
E(hnk) < ~pU log( )
i=1 j=1
n n n
E(hnk) < Py log(py) +
i=1 j=1 i
n n
n'6 log(k)
=1 j=1
Thus we get the bound.
E(hnk) < hn
Slog(k).
IfA = 1and / J then without loss on generality we may assume that < v, u >= 0.
It follows that
TPV =v.
Since TP has the same eigenvectors as P, we get the result. D
So we may use TP to describe the eigenvalues of P, furthermore A2(P) = A1(TP).
Next we look at how the singular values of TP compare to the eigenvalues of P.
Notation 3.2.14. If M is an n x m matrix, let 1 (M), 2,(M),...,. mn(m,n)(M) denote the
singular values of M;
71(M) > 72(M) > ... > min(m,n)(M).
Theorem 3.2.15. If P is an n x n stochastic matrix, then
IA2(P) < 7l(TP) < 1.
Proof. Use an eigenvalue and singular value inequality [12] and our previous result,
A2(P) = AI(TP) < a (TP).
The upperbound of one follows from a straight forward computation. O
If we are concerned about the stability of eigenvalues from our approximating
matrix, P, then we may use the eigenvalues of TP. If we do not trust the stability of
eigenvalues from either matrix, then we may use the first singular value of TP. Since
the first singular value of a matrix is very stable, o( TP) is a better statistic when
eigenvalue stability is questionable. Unfortunately, the probability distribution of aoi(TP)
will likely differ from the probability distribution of A2(P).
CHAPTER 7
PROBALISTIC PROPERTIES OF PARTITION REFINEMENTS
If we know P, of our npartition and we apply a krefinement, what do we expect of
Pnk? The matrix P, provides all of our knowledge of Pnk, so in this section we make the
assumption that for fixed i, the distribution of pij is identical for each a, 3.
7.1 Entries of a Refinement Matrix
First we look at probabilistic properties of the entries of a refinement matrix.
Theorem 7.1.1. If Pnk is a krefinement matrix of Pn and the distribution of pinj is
identical for each a, 3, then
E(p.) =
Proof. The refinement matrix, Pnk, is a krefinement matrix of Pn so k and Pn are known.
k k
S pj =
a= 13=1
k k
a= 13=1
k k
a= 13=1
The distribution of pi,. is identical for each a, 3, so
k2 E(pjl ) = kpu
E(p,^ = 6.
Thus we get the result. D
Theorem 7.1.2. If Pn is an n x n dsmatrix, Pnk is a krefinement of P,, Pn = (p,), Pnk =
(pij), p,, are identically distributed of each fixed ij, then
E(p( ) < kq2pq
1
4 minl
It follows that
P( PP >e)< n2
F 4e2 minl
Both n and c are fixed before we sample, so
PP 0
F
in probability as minli oo.
The previous theorem showed convergence in probability and gives insight into the
probability distribution of P. Our next theorem shows that
E( P P )>0as min {mi} oo.
F 1
So the next theorem implies the previous convergence result; we showed the last
theorem for its statement about the probability distribution.
Theorem 8.2.3. If P is an approximation of P generated by Ulam's method with a fixed
partition and the sample points are independent uniform random variables, then
E( P P ) as min {mi} oo.
ProoF 1e
Proof. By Jensen's inequality
(E( P ))2<
E( PP )
n n
i=1 j=1
n n
n n
iZ Z v(P
i =1 j=1
/1 j=1
p,)2)
5.2 Metric Entropy of Markov Shifts with Random Matrices
The next two theorems are important for estimating the metric entropy of
(Xn, En, (1/n, ..., 1/n), P), fn)
when P is a random variable. The metric entropy of the dynamical system is
n nPu Iog(pj).
n n
i=1 j=1
Where we define
0 log(O) =0.
Theorem 5.2.1. If P = (py) is a random n x n stochastic matrix where py are identically
distributed and 0 < py almost surely then
E(log(p,)) < log n, and
log(n) < E(p, log(py)).
n
Proof. First we will show the first inequality. Since f(x) = log x is a convex function on
(0, oo), Jensen's inequality tells us
E( log py) < log E(py)
E(log py) < log
n
E(log pu) < log n
E(log py) > log n.
Now we will show the second inequality. Since f(x) = x log x is convex on (0, oo),
Jensen's inequality tells us that
E(py) log(E(pu)) < E(pu log(pu))
1 1
1 log( ) < E(pu log(pu))
n n
Theorem 6.2.4 (The Boyland Theorem). If Pn is a dsmatrix, Pnk is a Dkrefinement of
V1
V/2
Pn and v2 is an eigenvector of Pn with eigenvalue A, then
\vn
V1 ... V1 V2 ... V2 ... Vn ... Vn
is an eigenvector of Pnk with eigenvalue A.
Proof. For any is
n k n k
: Y Pl vo = V Y pwo
j= 1 j 11 31
The refinement matrix, Pnk, is a Dkrefinement of Pn so every row of the block matrix that
corresponds to p, sums to p,, thus
n k n
C lZ = vy p,
j= 1 =1 j 1
Since we are working with an eigenvector of Pn,
n k
j= 13=1
Hence the statement follows. O
Corollary 6.2.5. If Pn is a dsmatrix, Pnk is a Dkrefinement of Pn, then
I Aj(Pn)I < Aj(Pnk) I forallj < n,
I An(Pn) I > Ank(Pnk) I,
I det(Pn) I > det(Pnk)
If we take averages,
N1
(f '(A) n B)
i0
)N1
p(A)(B) <1 (I(BABk) + (AAAk)
I (f'(Ak)n E
p (Ak)P(Bk) 
I(BBAk) I(AAAk)
SN1
Y( I/(fi(Ak))n Bk)
N
i=(
I p(Ak)(Bk)
o"(Uc TFk) so there exists KA, KBe, KABE e N such that
I(BABk) < e/4 whenever k > KA,
Ip(AAAk)I < e/4 whenever k > KBe
I (Ak)p(Bk)
ip(A)ip(B)I < e/4 whenever k > KABe
Thus
N1
S i(f'(A) n B)
i=O
N1
/ (f'(Ak)n Bk)
i=
/i(Ak)/(Bk))
whenever k > max{KA6, KBe, KABe}. Now,
I N1
lim p (f'(A) n B)
i= 1
0 for all A, B e Fk.
Hence there exists a K e N such that
N1
 I/p(f (Ak) Bk)
i=
p(Ak)P(Bk)I < e/4 whenever N > K.
105
k) (Ak)P(Bk)
p (A)/p(B) )
Let c > 0 be given, F.o
/l(Ak)P(Bk)
3e/4
p (A)/.(B) .
p(A)/p(B) I
p(A)/(B)
then we look to see if
nk
nkyPkl (X)
i=1
approximates a nonconstant stationary distribution of f*.
Theorem 9.1.5. If u e L'(D, B, p), {{rD,} ki nk is a sequence of nkpartitions where
{IDk li ) 1 is an kl refinement of {Dki 1 (nk nk+), then
nk
I Uk+ 1dl
SUkdip.
I Uk+ldl,
nk+1
/ nrk+ [
k+ 1
~nk+ 1 [ u
i=1
nk+l
nk 1 ,C
i 1 ,Cc
/,
i=1 :C
 / udi/
j ukdlp.
udp]1 (x)d/
dp] /
[
,(x)dp
udp] 
nk+l
udpl
udl
We present the next proposition since the result holds for f* and stochastic
matrices.
Proposition 9.1.6. (D, B, p, f) is a dynamical system, f is measure preserving, then
u = 1 is an eigenfunction of f* with eigenvalue one.
Proof.
CHAPTER 9
CONVERGENCE TO AN OPERATOR
9.1 Stirring Protocols as Operators and Operator Eigenfunctions
Notation 9.1.1. If (D, B, p, f) is a dynamical system, f is measure preserving, let f*
denote the operator
f : L'(D, B,) L'(D, B,)
f*(u)= uo f.
For our problem, f represents our stirring protocol, u is a probability distribution
function that measures the concentration of an ingredient before stirring and u o fk is a
probability distribution function that measures the concentration of the ingredient after
running the protocol k times. If our stirring protocol mixes, then for any continuous initial
concentration of our ingredient, the concentration should become constant as we run the
stirring protocol. Mathematically we represent this situation as
J u o fkd/ > p(A) for all A B as k o.
For mixing, we are concerned with u e L(ID, B, p) that define a probability
distribution function on D, i.e. u > 0 and 11 u 1= 1. When u is constant, the
concentration of the ingredient is constant and the ingredient is mixed. The function f*
is a PerronFrobenius operator defined by our stirring protocol. Stationary distributions
for a stochastic matrix are left eigenvectors with eigenvalue one, we want to look at
u e L(ID, B, p) that define stationary distributions for f*. We shall see that P is a
Galerkin projection of f*.
If u is a nonconstant nonnegative eigenfunction for f* with A = 1 that defines a
probability distribution function, then when we stir an ingredient with initial distribution u,
the concentration of the ingredient does not change. Since u is nonconstant, our stirring
protocol is not mixing.
is to have the distribution
P(A2(P) = 1)= 1
P(A2(P) = t)= Oif t 1.
We cannot reject Ho in favor of Ha2 if we use this distribution, so hopefully this is not
the correct distribution. Let's look at having A2(P) = 1 as a mode of the distribution.
1. Uniform distribution: All values on [0, 1] are a mode of the distribution.
g(t) = 1[o0,1](t)
2. Beta distribution: To have t = 1 be a mode we must set a > 1, = 1.
gal(t) = ata11[o,1](t)
3. Triangle distribution: The mode of the distribution is a so set a = 1.
ga(t) = 2tl[,1](t)
If a = 1, = 1, then the beta distribution gives us the uniform distribution; if
a = 2, = 1, then the beta distribution gives us the triangle distribution distribution
with mode of one. The beta family of distributions includes the uniform distribution and
the triangle distribution with mode at one. So we propose using the beta distribution with
a > 1, = 1.
gal(t) = ata11[o,1](t).
The beta distribution provides the additional advantage that it is an exponential family
of distributions. When a > 1, / = 1 and a significance level is set, the critical value
increases as a increases. If possible, before evaluating our stirring protocol, we should
look at several stirring patterns in the same class as f and use maximum likelihood
estimates or method of moments to estimate a. If A2 is a uniformly distributed random
118
Theorem 9.2.1 (A martingale convergence theorem). [15] If,, F, T,... is a sequence of
aalgebras satisfying F c F2 c ..., = = (U lTFk) and Z is integrable, then
E(Z Fk) E(Z 1F,) with probability one.
The next theorem gives us convergence of uk to u when S, = B. Thus it is
important to refine partitions such that the refinements generate the Borel oalgebra in
the limit.
Theorem 9.2.2. If u e L1(D, B, p), {{Dki } Inki is a sequence of nkpartitions where
{IDk+i}1 is an refinement of Dki, 1, nklnk+ and S, = B, then for a given E D
nk
Uk(X) > u(x) as k oo in p.
Proof. Here we use the previous martingale convergence theorem.
Uk(X)= E(u(x) Sk) E(u(x)S) = E(uB).
Since u is Bmeasureable, E(u1B) = u. Thus uk(X) u(x). D
Since we are working with a sequence of refinement partitions, uk(X) is contained
in the set of simple functions over {Dk +li,}1 Hence Uk+l(x) is no worse of an
approximation of u(x) than Uk(X).
Notation 9.2.3. If u e L1(D, B, p), (D, B, /, f) is a pmeasure preserving dynamical sys
tem, {{IDki I n =1 is a sequence of nkpartitions, and {Dk+li}+1 /s a nk refinement of
nk
{Dk,}i 1, let Uk R1Xnk be the vector where
(Ek)i = udp.
So the entries of Uk correspond to the coefficients of uk (x) (the coefficients of our
Galerkin projection of u onto the simple functions over {Dki,} 1). Let Pk, be the nk x nk
If we set up these equations as a system of equations and take the reduced row
echelon form (pivots are in bold font) we get
1 0 0
S 1 0
S 1 0
1 1 
0 0
. 1 1
. 1 1
) 0 0
0 1 1
0 0 1
1 0 0
1 0 1
0 1 0
0 0 1
1 1 1
0 0 0
Notice that from the eight equations the reduced row echelon form has seven pivots
and one zerorow.
The zero matrix is not a solution to the system of equations, thus the solution set is
not a vector space. If we convert the reduced row echelon form back to matrices, we see
that a solution is
0 0
0 0
0 1
1 0
This matrix added to any linear combination of the following matrices is a solution to
the system of equations.
1 0 0 1 0 0 0 0 0 0 0
1 0 1 0 1 1 0 0 0 0 0 0
0 1 0 1 0 0 0 0 0 1 1 0
0 1 1 0 1 1 0 0 0 1 1 0
1 0
) 1
L 0
) 1
L 0
) 1
) 0
) 0
Conjecture 10.2.1. For any measure preserving dynamical system (D, B, p, f) that is
strongmixing with an npartition
II Pk P lF< (k) P liF and
IA2(P)k < I,2(P(k)) .
If these conjectures are correct and the dynamical system is strongmixing, then
our weakmixing rate estimate converges faster than the rate that the dynamical system
strongmixes. That is to say
n ) A,(P) N" > 0 faster than f strongmixes.
Therefore
N1
I ~p(f'(A) n B) (A)p(B) < e whenever N > K.
i=
Thus we get the result. O
106
2010 Aaron Carl Smith
Definition 2.2.5. If P is a stochastic matrix and p is a probability vector such that
S= pP, then p is called a stationary distribution of P.
Definition 2.2.6. Let f, : E, Z,, f,((xi)) = (x, +) then f, is called the shift map.
The shift map is a (p, P)measure preserving map.
Remark 2.2.7. If pj = 0 for some j then the measure of {(x,) e X, : Xk = j} is zero;
without loss of generality say that pj > 0 for allj c {1, 2,..., n}. Otherwise if pj = 0 then
the set a sequences with xk = Pj for some k has zero measure.
Definition 2.2.8. The dynamical system (X,, E,, (p, P), fn) is called a Markov shift,
with (i, P) as the Markov measure. IfXn = {(x,) :i e N,x, e {1, 2,..., n}} then it is a
onesided Markov shift. IfXn = {(x,) : ie Z,, i {1,2,..., n}} then it is a twosided
Markov shift.
Our goal is to use (X,, Zn, (p, P), fn) to approximate (D, B, /, f). If we look at a point
x e D, iterate f, and let aN = / where fN(x) e Di, (p, P) gives the probability distribution
of cylinder sets. For our mixing problem an element of X, represents the movement of
an ingredient particle while stirring, so we say that (X,, Z,, (P, P), f,) is a onesided shift.
In the DoublyStochastic Matrices section, we will show that p = (1/n, 1/n,..., 1/n) is
a stationary distribution for any n x n dsmatrix. For brevity we will let (p, P) represent
Markov shifts.
With Markov shifts, weakmixing is equivalent to strongmixing [9], so we will refer
to ((1/n, 1/n, 1/n,..., 1/n), P) as mixing or not mixing. Since our Markov shift is only an
approximation of (D, B, p, f), we may accept hypotheses of weakmixing and not make
statements of strongmixing when weakmixing is accepted. Later we will show why
((1/n, 1/n, 1/n,..., 1/n), P) is a reasonable approximation of (D, B, p, f).
1. If ((1/n, 1/n, 1/n,..., 1/n), P) is not ergodic, we will fail to reject the hypothesis that
(D, B, p, f) is not ergodic and not weakmixing.
Markov partiition, then the two measures are equivalent. One must decide if the sample
size is sufficient when using a Monte Carlo technique, we propose using a modified
chisquared test to evaluate sufficiency after building the the approximating matrix.
Statistics requires probability distributions of observations, several statistical and Monte
Carlo methods of distribution estimation are given.
1.1 Hypotheses for Testing
The hypotheses for testing are
1. Ho: (D, B, f) is not ergodic (and hence not mixing).
2. Ha : (D, B, f) is ergodic but not weakmixing.
3. Ha2 : (D, B, p, f) is weakmixing (and hence ergodic).
We will define ergodic, weakmixing and strongmixing in the Ergodic Theory and
Markov Shifts chapter. We refer to Ho as the null hypothesis, Ha and Ha2 are called
alternative hypotheses. The procedure we present does not prove or disprove that
(D, B, j, f) is ergodic or weakmixing, it provides a method to decide to accept or reject
hypotheses. We use logical arguments of the form
If statement A is correct, then statement B is correct or
If statement B is incorrect, then statement A is incorrect.
when proving a statement; in hypothesis testing we use
If our observation is unlikely, then the null hypothesis is probably incorrect.
So hypothesis testing uses an argument similar to a contrapositive. When making a
decision we can make two mistakes
1. We can reject Ho when Ho is correct. This is called a type I error.
2. We can fail to reject Ho when Ho is incorrect. This is called a type II error.
When we decide which hypothesis is the null and which are the alternatives, we
set the null hypothesis such that the consequences of a type I error are more severe
In addition, by boundedness and monotonicity, if we take a sequence of Dkrefinements
and apply these functions to the refinement matrices, the resulting sequences will
converge.
Proof. Let A(P,), A(Pnk) denote the eigenvalue multisets of P, and Pnk; Pnk is a
Dkrefinement of P, so
A(Pn) C A(Pnk)
{1, A2(Pn),... An(P)} C {1, A2(Pnk) .... An(Pnk), ..., nk(Pnk)}
{A2(P), ..., n(Pn)} C {A2(Pnk), A.. An(Pnk), ..., nk(Pnk)}
{I A2(Pn) 1, .... IAn(Pn) } C {I A2(Pnk) I,  n(Pnk) A, I knk(Pnk)}
Since A(Pn) C A(Pnk), thejth largest element of { A2(Pn) I,..., An(Pn) i} is smaller
than the thejth largest element of { A2(Pnk) I ... I An(Pnk) I, I Ank(Pnk) i}, hence
I j(Pn) < Aj (Pnk) I for all j < n. Now if we take the minimum of both multisets,
{ A,2(Pn),. n(n)A }(P) } C f{ A2(Pnk) I, , n(Pnk) 1, ... I Annk(Pk) I
min{ A2(Pn) I .... n(Pn) } > min{ A2(Pnk) A .. I n(Pnk) A .. I nk(Pnk) }
I An(Pn) I>1 Ank(Pnk) I
When we take determinants of Pn and Pnk, we get
det(Pn) = J A
AEA(Pn)
det(Pk)= J A=( Jn A)( n A).
AeA(P n) AeA(Pn) Ae(A(Pn)A(Pn))
Eigenvalue magnitude of any stochastic matrix is bounded above by one, thus
I AI< I
AE(A(Pnk)A(Pn))
Idet(Pnk) = J Al  A < I n Al= det(Pn) .
AEA(Pn) Ae(A(Pn)A(Pn)) AEA(Pn)
CHAPTER 4
ESTIMATING THE RATE OF MIXING
Time is money, so if we have several mixing protocols we would prefer one that
mixes rapidly. After we conclude that a protocol mixes, the next question is "How
fast does it mix?" In this section we establish a statistical measure of mixing rate for
(D, B, p, f) when
(Xn, Zn, ((1/n, 1/n,..., 1/n), P), f,) is mixing.
4.1 The Jordan Canonical Form of Stochastic Matrices
Theorem 4.1.1. If P is an n x n dsmatrix such that IA2(P)I < 1 then
1/n ... 1/n
pN" as N oo.
1/n ... 1/n
Proof. Let J be a Jordan canonical form of P with (J)ii = A,, and conjugating matrix E,
P = EJE1,
and
pN = EN E.
Also, let Bil denote an / x / Jordan Block of J with eigenvalue Ai.
If A2(P) < 1, 1 has multiplicity one, hence [1] is a Jordan block; all other
blocks have diagonal entries with magnitude less than one. If we look at powers of
Jordan blocks Bi, (N > n), the diagonal is A"...A", the subdiagonals are zeros, the
superdiagonals are ANd() where d is the dth superdiagonal. When A2(P)I < 1 and
S>1,
N=d
Remark 6.2.10. The upper bound in the previous theorem is achieved when Pn and Pnk
are identity matrices or derangement permutation matrices.
Theorem 6.1.10. If Pn is an n x n dsmatrix, Pnk is a krefinement of Pn, Pn
v = (v,), u*Pn = Au*, u = (ui), then
( D1...u i U2... 12... lDn... ln) Pnk
V1'
Proof. The left hand side of the equation equals
Proof. The left hand side of the equation equals
n k n k
j j l 1uip, V
i= 1 a=1 j= 1 =3 1
Ak < v, > .
n n k k
Div pi i
i=l j1= a= 13=1
n n
Y jjv kpU
i=1 j 1
n n
i=1 j=1
n
k iAvi
i= 1
n
kA Di vi
i=1
Ak < v, > .
Thus we get the result.
Example 2.2.17 (Special Case n = 2). If we have a 2refinement ofD, P = q
1 p. The characteristic polynomial of P is (A 1)(A + 1 2p). So A2(P) =2p 1.
1 0
((1/2, 1/2), P) is ergodic P /
0 1
1 0 0 1
((1/2, 1/2), P) is mixing #P / or
01 1 0
When n = 2, a doublystochastic matrix is symmetric and there are one or two distinct
entries, so it is easy to explicitly state all cases of ergodic and mixing Markov shifts.
When n > 2, the numbers of entries make describing such Markov shifts with matrix
entries difficult.
Since we are treating the entries of a refinement matrix as a random event, what
can we say about the variance of entries?
Theorem 7.1.4. If P, is an n x n dsmatrix, Pnk is a krefinement of P,, P, = (p,), Pnk
(P, ,), pij, are identically distributed of each fixed ij, then
1
max(0, p +
Sk
Proof. First note that Pnk is a dsmatrix, so 0 < po < 1 and 0 < p,2 < 1. By the
refinement equations
k k
a= /3=1
k k
a 1p= )2
a1 131
k k
a=1 3=1
Pi jo Piprj
a7a' or373'
kpu
k2p2
PUp~
Now 0 < pij < 1, so
0 < pi5j, Pij, < k2 k
ao a'ord/3'
So if we use these upper and lower bounds, we see that
k k k k
a=1 =1 a =l/3=1
k k
5p o
a=l8=1
k k
S p 12 < k < k
=Taking expected values shows that
Taking expected values shows that
k k
5 p'jo P,^j,< k2 k + p2
a#a'or/33' a1 /3 1
k k
k +p YY
a= 1=1
k k
k 13 Y1
k k
E(E Ep ) <
a= 1 1
E(k2p2) < E(k2 k
k k
a= 13 1
1) < E(p2 ) < p2.
W3 Y
6. (D, Fo, I, f) is a dynamical system
7. f is measure preserving
8. limN 4 ,1 pl(f'(A)n B) p(A)p(B) = OVA, Be for all k.
then (D, FT, /, f) is weakmixing.
Proof. Let A, B e T,, Ak, Bk e Tk such that
p(AAAk) = min{/p(AAS) E Fk}
(BA/Bk) = min{p(BAS) : E TFk
Since Fk1 < oo and k C FT, Ak and Bk exist,
Ip(f'(A) n B) p(A) (B)I =p(f'(A) n B) p(f'(A) n Bk)
+p(f'(A) n Bk) (f'(Ak) Bk)
+ (f'(Ak) n Bk) p(Ak)I(Bk)
+ p(Ak)i(Bk) (A)p(B)
<(f'(A) n B) p(f'(A) n Bk)
+ (f'(A) n Bk) (f'(Ak) n Bk)
+ p(f'(Ak)n Bk) p(Ak)P(Bk)
S\p(Ak)A(Bk) p(A)p(B)
< (BABk)I + (f'(A)Af'(Ak))
p(f'(Ak)n Bk) p(Ak)(Bk)
p(Ak)(Bk) p)(A)p(B)
=p(BABk) + p(AAAk)
p (f'(Ak)n Bk) p(Ak)(Bk)
,p(Ak),(Bk) )(A) .(B).
then
1 < E(lim inf Ak2 Ak3 +... A+ Akk) < 0.
koo
Proof. Stochastic matrices have entries on [0, 1] so trace(Pjk) e [0, nk]. By Fatou's
lemma
0
koo koo
0
koo
Ak2 + Ak3
0
koo
0 <1 + E(lim inf Ak2
koo
0 <1 + E(lim inf Ak2
koo
1
koo
nk
S knk) < lim inf E( (Pk)ii)
oo i=
i 1
3  + knk) < lim inf >E((Pk)ii)
koo
i=1
nk
nk 1
3+ ... + Akn) < liminf V2
koo nk
i= 1
3 + + Aknk) <
... + kn, ) < 0.
Thus we get the result.
The next example gives us an idea of how much we can expect P to differ from P
when n = 2.
Example 5.1.15. If P is an 2 x 2 dsmatrix then
IIPP PII1= P P 12=I P P  =1 P P IIF
12p 1< 1
Where P =
variable, then
1p
p p
For these particular norms if p is a uniform[0, 1] random
E(I P P )
1
E(12p 11) =
2
Proof. Since 7 is an eigenvector of Pn,
PV = AV.
So it follows that
n
vj p, = Av.
J1
By the refinement equations
k n k
a=l j=l 3=
a1j1 /31
n k k
j=1 a=l /3=1
n
vj kpu
j 1
Akp,.
Divide both sides by k to get the result.
Remark 6.1.9. For any matrix M with left and right eigenvectors of A, u and v (u* M
AI* and MV = Av),
u*Mv = ,u* v = < au,v > .
Where denotes conjugate transpose, and < u, v > refers to the dot product.
=AT(
ATV
n
= TV.
TPV =AV.
If v is a scalar mutipletor of P, P= then,
TPv=T(vP')
=\1Tv
=\(lv < v, *> ).
IfAf1,< 7,?>= 0,
TP7 = A 7.
If v is a scalar multiple of i then
TPv = : = 0 U.
\0/
Theorem 11.1.2. If my counts the points such that x Di and f(x) e Dj for our
Ulam matrix where the data points are independent uniform random variables, and
mi = m2 = ... = mn, then
m
mlj + ... + mn
> p, as mi > oo almost surely
Proof. mik ~ binomial(pik, m) for each ik so by the strong law of large numbers
mik
> pik as mi > o almost surely.
iii
Now we take the limit of the quotient,
,. my
lim M
mloo mlj + ...
ml
mU
= lim m
mloo ml m
mi mi
mm
limmlo m
(limmlo o m )l + ... + (limmoo
lim m1
limm,oo m_
pmi
Ppp
Plj + ... + Pnj
Now ply + ... + p, = 1, since we use an npartition to generate our Ulam matrix. So
,. my
lim M = p6.
moo mlj ... + mnj
Thus the quotient converges to the entry of P.
Corollary 11.1.3. E( mU ) = P
mlj + ... + mn
Proof. Our measure space is a probability space and
0<
my
mlj + ...
< 1,
113
m,

PAGE 2
2
PAGE 3
3
PAGE 4
IthankProfessorBoylandforhisguidance,andmembersofmysupervisorycommitteefortheirmentoring.IthankBridgettandAkikofortheirloveandpatience.Ineededthesupporttheygavetoreachmygoal. 4
PAGE 5
page ACKNOWLEDGMENTS .................................. 4 LISTOFTABLES ...................................... 7 ABSTRACT ......................................... 8 CHAPTER 1INTRODUCTION ................................... 9 1.1HypothesesforTesting ............................. 12 1.2TestingProcedure ............................... 13 2ERGODICTHEORYANDMARKOVSHIFTS ................... 16 2.1ErgodicTheory ................................. 16 2.2MarkovShifts .................................. 17 3STOCHASTICANDDOUBLYSTOCHASTICMATRICES ............ 23 3.1DoublyStochasticMatrices .......................... 23 3.2AdditionalPropertiesofStochasticMatrices ................. 33 4ESTIMATINGTHERATEOFMIXING ....................... 41 4.1TheJordanCanonicalFormofStochasticMatrices ............. 41 4.2EstimatingMixingRate ............................ 42 5PROBABILISTICPROPERTIESOFDSMATRICES ............... 45 5.1RandomDSMatrices ............................. 45 5.2MetricEntropyofMarkovShiftswithRandomMatrices ........... 54 6PARTITIONREFINEMENTS ............................ 56 6.1EqualMeasureRenements ......................... 56 6.2ASpecialClassofRenements ........................ 68 7PROBALISTICPROPERTIESOFPARTITIONREFINEMENTS ......... 74 7.1EntriesofaRenementMatrix ........................ 74 7.2TheCentralTendencyofRenementMatrices ............... 78 7.3MetricEntropyAfterEqualMeasureRenement .............. 80 8ULAMMATRICES .................................. 82 8.1BuildingtheStochasticUlamMatrix ..................... 82 8.2PropertiesofUlamMatrices .......................... 84 5
PAGE 6
....................... 92 9.1StirringProtocolsasOperatorsandOperatorEigenfunctions ....... 92 9.2ConvergenceResults ............................. 97 10DECAYOFCORRELATION ............................. 107 10.1ComparingOurTesttoDecayofCorrelation ................. 107 10.2AConjectureAboutMixingRate ....................... 110 11CRITERIAFORWHENMOREDATAPOINTSARENEEDED .......... 112 11.1OurMainCriteriaforWhenMoreDataPointsAreNeeded ......... 112 11.2OtherCriteriaforWhenMoreDataPointsAreNeeded ........... 116 12PROBABILITYDISTRIBUTIONSOFDSMATRICES ............... 117 12.1ConditionalProbabilityDistributions ..................... 117 12.2ApproximatingProbabilityDistributions .................... 120 12.2.1TheDealer'sAlgorithm ......................... 120 12.2.2FullConvexCombinations ....................... 123 12.2.3ReducedConvexCombinations .................... 126 12.2.4TheDSGreedyAlgorithm ....................... 128 12.2.5UsingtheGreedyDSAlgorithm .................... 130 12.2.6DSMatricesArisingfromUnitaryMatrices .............. 131 13EXAMPLES ...................................... 133 13.1TheReectionMap .............................. 133 13.2Arnold'sCatMap ................................ 135 13.3TheSineFlowMap(parameter8/5) ..................... 136 13.4TheSineFlowMap(parameter4/5) ..................... 137 13.5TheBaker'sMap ................................ 138 13.6TheChirikovStandardMap(parameter0) .................. 140 REFERENCES ....................................... 143 BIOGRAPHICALSKETCH ................................ 145 6
PAGE 7
Table page 131TheReectionMap ................................. 135 132Arnold'sCatMap ................................... 136 133TheSineFlowMap(parameter8/5) ........................ 137 134TheSineFlowMap(parameter4/5) ........................ 138 135TheBaker'sMap ................................... 140 136TheChirikovStandardMap(parameter0) ..................... 141 7
PAGE 8
8
PAGE 9
9
PAGE 10
1 ],ndattractors[ 2 ],andapproximaterandomandforceddynamicalsystems[ 3 ].StanUlamproposedwhatwecallUlam'smethodinhis1964bookProblemsinModernMathematics[ 4 ],theproceduregivesadiscretizationofaPerronFrobeniusoperator(transferoperator).Theprocedureprovidesasuperiormethodofestimatinglongtermdistributionsandnaturalinvariantmeasuresofdeterministicsystems[ 3 ].Eigenvaluesandtheircorrespondingeigenfunctionsofhyperbolicmapscanrevealimportantpersistentstructuresofadyamicalsystems,suchasalmostinvariantsets[ 1 5 ].Ulam'sniteapproximationofabsolutelycontinuousinvariantmeasuresofsystemsdenedbyrandomcompositionsofpiecewisemonotonictransformationsconverges[ 6 ].Ulam'smethodmaybeusedtoestimatethemeasuretheoreticentropyofuniformlyhyperbolicmapsonsmoothmanifoldsandobtainnumericalestimatesofphysicalmeasures,Lyapunovexponents,decayofcorrelationandescaperatesforeverywhereexpandingmaps,Anosovmaps,mapsthatarehyperboliconanattractinginvariantset,andhyperboliconanonattractinginvariantset[ 7 ]. 10
PAGE 11
6 ];similaritiesbetweentheapproximatingmatrixandthetargetoperatorareestablished,andproofofconvergencewithrespectweakmixingisgiven.Decayofcorrelationisawellestablishedmeasureofmixing;thesecondlargesteigenvalueoftheapproximatingmatrixanddecayofcorrelationaredifferentmeasuresofmixing.Decayofcorrelationisabettermeasureofmixing,butrequiresasequenceofstirringiteration,thesecondlargesteigenvaluerequiresoneiteration.Ifthepartitionisa 11
PAGE 12
1. 2. 3. 1. WecanrejectHowhenHoiscorrect.ThisiscalledatypeIerror. 2. WecanfailtorejectHowhenHoisincorrect.ThisiscalledatypeIIerror.Whenwedecidewhichhypothesisisthenullandwhicharethealternatives,wesetthenullhypothesissuchthattheconsequencesofatypeIerroraremoresevere 12
PAGE 13
1. RejectHoinfavorofHa2ifj2(bP)j c1andj2(bP)jc2wherec1isacriticalvalue. 3. FailtorejectHootherwise.SincewearemoreconcernedwithatypeIerror,rejectingHoisunlikelywhenobservingrandomevents.Tosetthecriticalvalueswemusthaveanestimateoftheprobabilitydistributionoftheteststatistic.Probabilitydistibutionsofdsmatricesaredifculttoworkwith,theMonteCarlochapterprovideswaystoapproximateprobabilitydistributionsofteststatistics. Notation1.2.1. Setthesignicancelevel. 13
PAGE 14
Setnsuchthatconnectedregionsofmeasure1 3. Decidewhichconditionalprobabilitydistributionsofj2(bP)jwhenj2(P)j=1touse.Weproposeusingabetadistributionwith2and=1forHa2. 4. SetcriticalvaluesforHa1,Ha2. 5. PartitionDintonconnectedsubsetswithequalmeasure,fDigni=1,D=n]i=1Di,(Di)=1 RandomlyselectmsamplepointsinD,callthepointsfxkgmk=1.LetmibethenumberofpointsinDi. 7. Runoneiterationofthemixingprotocol. 8. Letmijbethenumberofpointssuchthatxk2Diandf(xk)2Dj.LetM=(mij),MisiscalledanUlamMatrix. 9. LetbP=(mij 10. Ifthereareconcernsabouteigenvaluestability,conrmtheresultsofHoversusHa2with1((IP)bP). 11. Makeadecisionaboutthehypothesesoftestingbasedonthecriticalvalues. 12. IfwerejectHoinfavorofHa2,lettherateatwhichNn1(2(bP))Nn+1!0asN!1beourestimateoftherateofmixing. 13. Estimateoftheentropyofthedynamicalsystemwith1
PAGE 15
15
PAGE 16
8 ]If(D,B,)isaprobabilityspace,ameasurepreservingtransformationfisergodicifB2Bandf1(B)=Bthen(B)=0or(Bc)=0.Thisdenitiongeneralizestonitemeasurespaces,butwewillonlylookatprobabilityspaces.Wewillusethefollowingtheoremtodeneergodic,weakmixingandstrongmixing. 9 ]Let(D,B,,f)beaprobabilityspaceandletSbeasemialgebrathatgeneratesB.Letf:D!Dbeameasurepreservingtransformation.Then 1. 8 ]. 16
PAGE 17
Denition2.2.1. 17
PAGE 18
9 ],sowewillreferto((1=n,1=n,1=n,...,1=n),P)asmixingornotmixing.SinceourMarkovshiftisonlyanapproximationof(D,B,,f),wemayaccepthypothesesofweakmixingandnotmakestatementsofstrongmixingwhenweakmixingisaccepted.Laterwewillshowwhy((1=n,1=n,1=n,...,1=n),P)isareasonableapproximationof(D,B,,f). 1. If((1=n,1=n,1=n,...,1=n),P)isnotergodic,wewillfailtorejectthehypothesisthat(D,B,,f)isnotergodicandnotweakmixing. 18
PAGE 19
If((1=n,1=n,1=n,...,1=n),P)isergodicbutnotmixing,wewillrejectthehypothesisthat(D,B,,f)isnotergodicinfavorofthehypothesisthat(D,B,,f)isergodicbutnotweakmixing. 3. If((1=n,1=n,1=n,...,1=n),P)ismixing,wewillrejectthehypothesisthat(D,B,,f)isnotergodicandnotmixinginfavorofthehypothesisthat(D,B,,f)isergodicandweakmixing.Thefollowinglemmaandtheoremsgiveuscriteriaforwhen((1=n,1=n,1=n,...,1=n),P)isergodicormixing. 1. 2. AllrowsofthematrixQareidentical. 3. EveryentryinQisstrictlypositive. 4.
PAGE 20
9 10 ]Iffnisthe(~p,P)Markovshift(eitheronesidedortwosided)thefollowingareequivalent: 1. 2. 3. ThematrixPisirreducibleandaperiodic(i.e.9N>0suchthatthematrixPNhasnozeroentries). 4. Forallstatesi,jwehave(Pk)ij!pj. 5. 9 ]TheMarkovshift(~p,P),~p=(pi),P=(Pij),hasmetricentropynXi=1nXj=1pipij(log(pij)).Wherewedene0log(0):=0.
PAGE 21
Proof. 1. ((1/4,1/4,1/4,1/4),2666666641=65=6005=61=600001=32=3002=31=3377777775)isanonergodicMarkovshiftwithentropyapproximately0.544. 2. ((1/4,1/4,1/4,1/4),2666666640100001000011000377777775)isanergodicMarkovshift,butisnonmixing,withentropy0. 3. ((1/4,1/4,1/4,1/4),2666666641=41=41=41=41=41=41=41=41=41=41=41=41=41=41=41=4377777775)isamixingMarkovshift.ThisMarkovshiftachievestheentropyupperboundoflog4.
PAGE 23
Proof.
PAGE 24
Proof. 1 Weskipthestandardproofsofthefollowinglemmas.Thenextlemmashowsthatalloftheeigenvaluesofastochasticmatrixareontheunitdiskofthecomplexplane. 24
PAGE 26
1. 2. Theeigenvectorcorrespondingto1hasstrictlypositiveentries. 3. Allothereigenvalueshavemagnitudelessthan1. 4. Theeigenvectorsthatdonotcorrespondto1havenonpositiveentries.IfPisastochasticmatrixwithnonnegativeentriesthenthereisaneigenvectorcorrespondingto1withallentrieson[0,1).
PAGE 27
2,wegetthefollowingequationsforP.p11+p12=1p21+p22=1p11+p21=1p12+p22=1Thusp11=p22,p12=p21.Ifwesetp=p11,q=1p=p12,P=264pqqp375. 27
PAGE 28
:Thiscasefollowsfromthepreviousexample.Every22dsmatrixisoftheformP=264p1p1pp375forsomep2[0,1].n=3 :Thiscasefollowsfromapreviouscounterexample.ThematrixP=2666641=31=31=31=21=61=31=61=21=3377775isa33dsmatrixthatisnotsymmetric.n>3 :LetPdenotethematrixinthen=3case,Ikbethekkidentitymatrix,and0kbethekkmatrixwithzeroforallentries.Thenforeachn>3,264P0303In3375isannndsmatrixthatisnotsymmetric. Asnincreases,thedegreesoffreedomfornndsmatricesincreasesandthewaysthatadsmatrixcandeviatefrombeingsymmetricgrow.Duetothisandtheobservationthatrandomlygenerateddsmatricesaresymmetriclessfrequentlyforlargen,weproposethefollowingconjecture.
PAGE 29
11 ]Annnmatrixisdoublystochasticifandonlyifitisaconvexcombinationofnnpermutationmatrices.SoDSnisaconvexsetwiththepermutationmatricesbeingtheextremepointsoftheset.Infact,byCaratheodory'sconvexsettheorem,everynndsmatrixisaconvexcombinationof(n1)2+1orfewerpermutationmatrices. 29
PAGE 31
Proof.
PAGE 32
2p+1 2p+1 2p1 2375k.ItfollowthatkPPk1=j2p1j1,kPPk2=j2p1j1,kPPk1=j2p1j1,kPPkF=j2p1j1.ThematrixPisthecenterofdsmatrices,howdoes((1=n,...,1=n),P)comparetoothermeasures?IfourMarkovshifthas((1=n,...,1=n),P)asitsmeasure,thenknowingwhichpartitionsetxisintellsusnothingaboutwhichpartitionsetf(x)isin.ThusPisthematrixofoptimalmixing.Ifj2(P)j<1,then(Xn,n,((1=n,...,1=n),P),fn)isamixingdynamicalsystemandPk!Pask!1(wewillshowthisresultwhileweconstructourmixingrateestimate).SoaMarkovshift,(Xn,n,((1=n,...,1=n),P),fn),beingamixingdynamicalsystemisequivalenttoPk!Pask!1.WewillneedaJordancanonicalformofPtomakeaninferenceabouttherateatwhichPk!Pask!1.WewillusetherateatwhichbPk!Pask!1toestimatethemixingrateof(D,B,,f). 12 ]withcharacteristicpolynomialxn1(x1).FuthermoretheJordanCanonicalformofPhasaoneonthediagonalandallotherentriesarezero.
PAGE 33
33
PAGE 34
Proof.
PAGE 35
2thenPisinvertible. Proof. 2.Since1 2
PAGE 36
Proof. ThisupperboundisachievedwhenPistheidentitymatrix.
PAGE 37
Noticethatjnj=n1p Theidentitymatrixachievesthelowerbound.
PAGE 38
Proof.
PAGE 40
SowemayuseTPtodescribetheeigenvaluesofP,furthermore2(P)=1(TP).NextwelookathowthesingularvaluesofTPcomparetotheeigenvaluesofP. 12 ]andourpreviousresult,j2(P)j=j1(TP)j1(TP).Theupperboundofonefollowsfromastraightforwardcomputation. Ifweareconcernedaboutthestabilityofeigenvaluesfromourapproximatingmatrix,bP,thenwemayusetheeigenvaluesofTbP.Ifwedonottrustthestabilityofeigenvaluesfromeithermatrix,thenwemayusetherstsingularvalueofTbP.Sincetherstsingularvalueofamatrixisverystable,1(TbP)isabetterstatisticwheneigenvaluestabilityisquestionable.Unfortunately,theprobabilitydistributionof1(TbP)willlikelydifferfromtheprobabilitydistributionofj2(bP)j. 40
PAGE 41
Theorem4.1.1.
PAGE 42
TheMarkovshift((1=n,...,1=n),P)intuitivelygivestheoptimalmixingforaMarkovshift.Knowingxitellsusnothingaboutxi+1,andforanyprobabilityvector,(p1,p2,...,pn),(p1,p2,...,pn)P=(1=n,1=n,...,1=n).Whenweuse(p1,p2,...,pn)torepresentasimplefunctionapproximatingtheinitialconcentrationofaningredienttobemixedinD,astirringprotocolthatmixestheingredientinoneiterationwillhave(Xn,n,((1=n,...,1=n),P),fn)astheapproximatingMarkovshift. 42
PAGE 43
Proof. 43
PAGE 44
44
PAGE 45
Denition5.1.1.
PAGE 46
Proof.
PAGE 48
Thenextfewtheoremsgivepropertiesofthecovariancebetweenentriesofarandomnndsmatrix.Sinceallrowsandallcolumnssumtoone,ifoneentrychangesthenatleastoneotherentryonthesamerowandoneotherentryonthesamecolumnmustchangetomaintainthesum.Iffollowsthattheentriesofarandomnndsmatrixcannotbeindependent.
PAGE 50
1. Ifj6=j0,then1 Ifi6=i0,then1
PAGE 53
ThenextexamplegivesusanideaofhowmuchwecanexpectPtodifferfromPwhenn=2. 2.
PAGE 55
Thenexttheoremisusefulifonewishestousetheharmonicmeanoftheentriesofarandomstochasticmatrix. Proof. 1=nE(1 55
PAGE 56
1. LetPnbeourstochasticmatrixbeforerenementwithentriespij,pij=(f(x)2Djjx2Di). LetPnkbeourstochasticmatrixafterrenementwithentriespij,pij=(f(x)2Djjx2Di).ArrangetherowsandcolumnsofPnkwiththeorder1112...1k2122...2k...n1n2...nk.WiththisarrangementwecanrepresentPnkasablockmatrixwhereeachentryofPncorrespondstoakkblockofPnk. 56
PAGE 57
57
PAGE 63
Proof. IfwetakeasequenceofrenementmatricesfPnkg1k=1,nkjnk+1,k,2=2(Pnk),thenfk,2g1k=1measuresmixingateachrenement.Since(k+1),2measuresmixingonanerpartitionthank,2,onewouldexpectfjk,2jg1k=1tobeanondecreasingsequence,thisisnotthecase.Thereareexamplesofsequencesthathavej(k+1),2j
PAGE 64
Whenwetakearenement,whatdotheeigenvaluesandeigenvectorsofPntellusabouttheeigenvaluesandeigenvectorsofPnk?ForPntobeofvalue,itneedstocapturetheusefulinformationfromPnk,ifitdoesnot,thenPnhasnohopeofbeingusefulinmakingadecisionabout(D,B,,f).Sinceweareusinganeigenvalueasateststatistic,wepresentthenextresultsdescribingtherelationshipbetweentheeigenvaluesofPnandPnk.
PAGE 65
Proof.
PAGE 67
67
PAGE 69
Proof.
PAGE 70
Proof.
PAGE 71
13 ]ThebiasofapointestimatorWofaparameteristhedifferencebetweenE(W)and;thatis,BiasW=E(W).IfE(W)=,thenwecallWanunbiasedestimatorof.IfE(W)6=,thenwecallWabiasedestimatorof. WhenwerenefDigni=1,theapproximationof(D,B,,f)isner.Sothecriteriatomixovertherenedpartitionismorestringent.Sincej2(Pnk)jisourmeasureofmixing,wemakethefollowingconjecture. 71
PAGE 72
72
PAGE 75
Proof. 75
PAGE 78
Notation7.2.1.
PAGE 79
Proof. ThenexttheoremindicateshowmuchwecanexpectPntodifferfrom
PAGE 81
81
PAGE 82
1. ApplyanpartitiontoD. 2. RandomlygenerateorstatisticallysamplemiuniformlydistributedindependentpointsinDiforalli(mi2N).Applyftothedatapoints. 3. Setmijequaltothenumberofpointssuchthatx2Di,f(x)2Dj. 4. LetbPbethematrixwithbpij=mij
PAGE 83
5. Letb2bebP'ssecondlargesteigenvalueinmagnitude.Ifb2isnotuniquepickaneigenvalueofminimaldistancetoone. 6. Ifjb2jissufcientlysmallerthan1,rejectthehypothesisthat(D,B,,f)isnotmixinginfavorofthehypothesisthat(D,B,,f)isweakmixing. 7. Ifjb2jisnotsufcientlysmall,butjb21jissufcientlylargerejectthehypothesisthat(D,B,,f)isnotergodicinfavorofthehypothesisthat(D,B,,f)isergodic. 8. Ifb2isclosetoone,failtorejectthehypothesisthat(D,B,,f)isnotergodic. 9. Ifweacceptthehypothesisthat(D,B,,f)isweakmixing,usetherateatwhichNn1jb2jNn+1!0asN!1asanestimateoftherateatwhichfmixes. 10. Let1
PAGE 84
Theorem8.2.1.
PAGE 85
SinceweareusinganapproximationofP,howfaroffisbP?HowmanydatapointsshouldbeusedtogeneratebP? 4mi
PAGE 86
4min1inmi.ItfollowsthatP(PbPF>)n2 TheprevioustheoremshowedconvergenceinprobabilityandgivesinsightintotheprobabilitydistributionofbP.OurnexttheoremshowsthatE(PbPF)!0asmin1infmig!1.Sothenexttheoremimpliesthepreviousconvergenceresult;weshowedthelasttheoremforitsstatementabouttheprobabilitydistribution.
PAGE 87
4minXi=1nXj=11 4min1inmi=n2 Wewantb2,toconvergetothecorrectvalue,otherwisethesecondlargesteigenvaluewouldmakeapoorteststatistic. 87
PAGE 88
13 ]IfTisafunctiononempiricaldataXandisaparameteroftheprobabilitydistributionofX,thenT(X)isasufcientstatisticforiftheconditionaldistributionofXgivenT(X)doesnotdependon. Proof. Onceagainwelookatthesimplestcase,herewemakeinferencesabouttheprobabilitydistributionofanUlamapproximationof22matrix.
PAGE 90
Proof. TheJordancanonicalformofPprovidesamixingrateestimate.WhatcanwesayaboutusingtheJordancanonicalformofbPtoapproximateP'scanonicalform?
PAGE 91
91
PAGE 92
Notation9.1.1. 92
PAGE 93
SoifuisaneigenfunctionoffanddenesaprobablitiydistributionfunctiononD,then=1.
PAGE 94
(Dki)]1Dki(x)=nknkXi=1[ZDkiud]1Dki(x).ByconstructionZDkiukd=ZDkiudforallki,andthusZDukd=ZDudforallk.SoukisourbestapproximationofuoverthealgebrageneratedbyfDkignki=1.InfactukisaprojectionofuontothesetofsimplefunctionoverfDkignki=1[ 14 ].IffDk+1ignk+1i=1isarenementoffDkignki=1,thenthesetofsimplefunctionsoverfDkignki=1iscontainedinthesetofsimplefunctionsoverfDk+1ignk+1i=1;itfollowsthatuk+1isabetterapproximationofuthanuk.WeusetheentriesfromstationarydistributionsoffbPnkg1k=1toconstructapproximationsofstationarydistributionsoff.If~pk=~pkbPnkand~pitaprobabilityvector 94
PAGE 96
15 ]LetX1,X2,...beasequenceofrandomvariablesonaprobabilityspace(,F,P),andletF1,F2,...beasequenceofalgebrasinF.Thesequencef(Xk,Fk)g1k=1isamartingaleifthesefourconditionshold: 1. 2. 3. 4. withprobabilityone,E(Xk+1jFk)=Xk.
PAGE 97
Proof. (Dkj)=RDkjud 97
PAGE 98
15 ]IfF1,F2,...isasequenceofalgebrassatisfyingF1F2...,F1=(S1k=1Fk)andZisintegrable,thenE(ZjFk)!E(ZjF1)withprobabilityone.ThenexttheoremgivesusconvergenceofuktouwhenS1=B.ThusitisimportanttorenepartitionssuchthattherenementsgeneratetheBorelalgebrainthelimit. Sinceweareworkingwithasequenceofrenementpartitions,uk(x)iscontainedinthesetofsimplefunctionsoverfDk+1ignk+1i=1.Henceuk+1(x)isnoworseofanapproximationofu(x)thanuk(x).
PAGE 101
101
PAGE 102
1. 2. 3. Wearenotsayingthat(D,Fk,,f)isaweakmixingdynamicalsystem,fdoesnotneedtobeFkmeasureable.ThispropositionshowsusthatifweobservetwosetsAk,Bk2FksuchthatlimN!11 102
PAGE 103
Ournexttheoremgivesacriteriaforweakmixing. 1. 2.
PAGE 104
7. 8. Proof.
PAGE 106
106
PAGE 107
Denition10.1.1. Proof.
PAGE 109
SowemayusethesequencefkP(k)PkFg1k=1todetectdecayofcorrelationbetweensimplefunctionsoversetsin(fDigni=1).Alsowemayusetherateatwhichthissequencegoestozerotomeasuretheratethatthedynamicalsystemstrongmixesover(fDigni=1).
PAGE 110
Proof. 110
PAGE 111
111
PAGE 112
1=n=(f(x)2Dj^x2Di) Itfollowsthatmij 112
PAGE 113
Proof.
PAGE 114
114
PAGE 115
1. 2.
PAGE 116
1. WhenwegeneratebP,weknowthatitapproximatesannndsmatrix,thuscolumnsumsshouldbeclosetoone.IfnXj=1(1nXi=1bpij)2=nXj=1(nXi=1bpij)2nissignicantlylargerthanzerothenmorepointsshouldbeused. 2. SupposewewanttocheckhowclosebPistobeingdswithrespecttoaparticular.IfthevalueofkbPT0BBBBBBB@11...11CCCCCCCA0BBBBBBB@11...11CCCCCCCAk=k(bPTI)0BBBBBBB@11...11CCCCCCCAkdifferssignicantlyfromzero,thenmorepointsshouldbeused. 3. Alldsmatriceshave(1=n,1=n,...,1=n)asastationarydistributionsoifbPdoesnothaveastrictlypositivestationarydistribution,morepointsshouldbeused. 4. Alldsmatriceshave(1=n,1=n,...,1=n)asastationarydistributionsoifsupnp k~uk:~ubP=~u,~uisaprobabilityvectoroisfarfromoneforallstrictlypositivestationarydistributionsofbP,thenmorepointsshouldbeused. 116
PAGE 117
1. 2. 1. Uniformdistribution:g(t)=1[0,1](t) Betadistribution:g(t)=()() (+)t1(1t)t11[0,1](t),>0,>0 Triangledistribution:ga(t)isthepiecewiselinearfunctionthatformsatrianglewiththeinterval[0,1]asthebaseandthepoint(a,2)astheapex.Whenj2(P)j=1,thecorrectdistributiontouseprobablyhasj2(bP)j=1asacentraltendency.Sincej2(bP)j1theonlywaytohavethemeanormedianequalone 117
PAGE 118
1. Uniformdistribution:Allvalueson[0,1]areamodeofthedistribution.g(t)=1[0,1](t) Betadistribution:Tohavet=1beamodewemustset1,=1.g1(t)=t11[0,1](t) Triangledistribution:Themodeofthedistributionisasoseta=1.ga(t)=2t1[0,1](t)If=1,=1,thenthebetadistributiongivesustheuniformdistribution;if=2,=1,thenthebetadistributiongivesusthetriangledistributiondistributionwithmodeofone.Thebetafamilyofdistributionsincludestheuniformdistributionandthetriangledistributionwithmodeatone.Soweproposeusingthebetadistributionwith1,=1.g1(t)=t11[0,1](t).Thebetadistributionprovidestheadditionaladvantagethatitisanexponentialfamilyofdistributions.When1,=1andasignicancelevelisset,thecriticalvalueincreasesasincreases.Ifpossible,beforeevaluatingourstirringprotocol,weshouldlookatseveralstirringpatternsinthesameclassasfandusemaximumlikelihoodestimatesormethodofmomentstoestimate.If2isauniformlydistributedrandom 118
PAGE 119
()()t1(1t)1dt.ItfollowsthatIf=1,thenP(bp
PAGE 120
+,andpisxedsoifE()=0,thenE(bp)=p= +.ItfollowsthatV(bp)=V()=p2(1p) 1. Usethedealer'salgorithmtogeneratedsmatrices 2. Takeconvexcombinationsofalln!permutationmatricestogeneratedsmatrices 3. Takeconvexcombinationsof(n1)2+1ormorepermutationmatricestogeneratedsmatrices 4. Useunitarymatricestogeneratedsmatrices 120
PAGE 121
1. Thenumberofsamplepointsissmallerthanwewouldlike(minmiissmall). 2. Wehavenoknowledgeofthedistributionbeforehand. 3. Wewanttosampledsmatriceswithentriesfromfa g:a2f0,1,2,...,ggg. 121
PAGE 122
g.TakingsquarerootsofbothsidesleadstoE(kPPkF)
PAGE 123
sum(~u)isarandomconvexcombinationalmostsurely.Wemaytaketheabsolutevalueofrealrandomvariablesthatarecontinuousatzerotogetconvexcombinations.Ifwechangethedistributionoftheui's,wechangethedistributionoftheconvexcombinationsandthuschangethedistributionofthedoublystochasticmatrices.Toverifythis,comparetheresultswhentheui'sareindependentuniform([0,1])randomvariables,andwhenui=v2iwherethevi'sareindependentcauchy(0,1)randomvariables.
PAGE 124
Proof. dkP(N+1
PAGE 125
dk(P(N+1
PAGE 126
Theorem12.2.11. Proof. Wewillrefertoconvexcombinationsofpermutationmatricesthatusealln!permutationmatricesalmostsurelyasfullconvexcombinations.Wewillcallconvexcombinationsthatuse(n1)2+1permutationmatricesreducedconvexcombinations.Thenextresultshowsthatfullandreducedconvexcombinationssamplefromprobabilityspaceswithdifferentmeasures.
PAGE 127
Proof. Itwouldbenicetobeabletoextendasetofobserveddsmatriceswheneverobtainingobservationsisdifcultorexpensive;MuraliRaocreatedawaytoextendasetofobservations.
PAGE 130
1. Applythegreedydsalgorithmtoourobserveddsmatrices. 2. Usethemethodofmomentsormaximumlikelihoodestimationtoapproximatetheparameter.Calltheapproximationb. 3. Useindependentgamma(b,b)randomvariablestogeneratenewdsmatrices.Theresearchermustdecidewhatvalueofbisappropriate.Theck'sareindependentandidenticallydistributedgamma(b,b).k=ck
PAGE 131
Proof. 131
PAGE 132
17 ],theexpectedvalueofthesecondlargesteigenvalueofthedsmatricesgoestozeroasngoestoinnity[ 18 ].IfAisarandomnmmatrixwithfullcolumnrankalmostsurely,UisaunitaryqrfactorofA,andPisthematrixwherePij=jUijj2,thenPisarandomdsmatrix.DifferentprobabilitymeasuresforAwillresultindifferentprobabilitymeasuresforP.Soanyprocessthatgeneratesrandommatriceswithfullcolumnrankmaybeusedtogeneratedsmatrices. 132
PAGE 133
133
PAGE 134
131 .ForeverypartitionbP=P,m=106appearstobesufcientandwefailtorejectHoandcorrectlyconcludethatthemapisnotmixing. 134
PAGE 135
TheReectionMap NumberofStatesAveragej2(bP)jTypicalpvalueof2 132 135
PAGE 136
Arnold'sCatMap NumberofStatesAveragej2(bP)jTypicalpvalueof2 AtypicalbPforfourstatesisbP2666666640.25070.24960.25010.24960.25130.24840.25050.24970.24890.25110.25000.25000.25060.25040.24990.2491377777775.max(jPbPj)0.0016j2(bP)j0.0016Typicalsignicancelevelswillconcludethatweusedenoughpointswhenn=4,16,64,256,1024,butm=106isnotsufcientwhenn=4096. 19 ].Foradynamicalsystemtobechaoticitmustbetopologicallymixing;thatistosay,foranytwoopensetsA,BD,thereexistsanNsuchthatfn(A)\B6=;
PAGE 137
TheSineFlowMap(parameter8/5) NumberofStatesAveragej2(bP)jTypicalpvalueof2 whenevern>N.Weranourprocedure100timeswith106pseudorandomlygeneratedpoints(uniformly)andsawtheresultslistedintable 133 .Typicalsignicancelevelswillconcludethatweusedenoughpointsforn=4,16,64,256,1024,butm=106appearstobeinsufcientwhenn=4096.AtypicalbPforfourstatesisbP2666666640.16550.23410.23300.36730.23160.16560.37090.23180.23320.36910.16470.23310.36960.23260.23280.1650377777775.j2(bP)j0.2048 134 137
PAGE 138
TheSineFlowMap(parameter4/5) NumberofStatesAveragej2(bP)jTypicalpvalueof2 Typicalsignicancelevelswillconcludethatweusedenoughpointswhenn=4,16,64,256,1024,butm=106isnotsufcientwhenn=4096.AtypicalbPforfourstatesisbP2666666640.20150.23110.22940.33800.22930.20180.33940.22960.23070.33670.20100.23170.33910.22960.22980.2015377777775.j2(bP)j0.1375ItisbelievedthatfmixesfasterforlargerT.Comparisonofeigenvaluemagnitudefromthetwosineowmapexamplesrunscountertothisconjecture.Thebaker'smapexampleshowshoweigenvalueinstabilitycanaltertheobservedeigenvalues.
PAGE 139
135 139
PAGE 140
TheBaker'sMap NumberofStatesAveragej2(bP)jTypicalpvalueof2 AtypicalbPforfourstatesisbP2666666640.50130.498700000.49950.50050.50110.498900000.49970.5003377777775.max(jPbPj)0.0013j2(bP)j0.0079.Typicalsignicancelevelswillconcludethatweusedenoughpointswhenn=4,16,64,256,1024,butm=106isnotsufcientwhenn=4096.Weknowthatthemapismixing,theinducedMarkovshiftismixingbuttheeigenvaluesoftheapproximatingmatricesdonotreecttherateofmixing.Thisexampledemonstrateshoweigenvalueinstabilityandinsufcientmcanthrowoffanobservation.
PAGE 141
TheChirikovStandardMap(parameter0) NumberofStatesAveragej2(bP)jTypicalpvalueof2 Forthisexamplewesetk=0sothatwemaycomputeP,f(x,y)=(x,x+y).WhenwepartitiontheunitsquareintofoursquaresP=2666666641=201=2001=201=21=201=2001=201=2377777775,whichhascharacteristicpolynomial2(1)2.Weranourprocedure100timeswith106pseudorandomlygeneratedpoints(uniformly)andsawtheresultslistedintable 136 .AtypicalbPforfourstatesisbP2666666640.501200.4988000.500000.50000.500600.4994000.501000.4990377777775.
PAGE 142
142
PAGE 143
[1] G.Froyland,OnUlamapproximationoftheisolatedspectrumandeigenfunctionsofhyperbolicmaps,DiscreteContin.Dyn.Syst.17(3)(2007)671(electronic).URL F.Y.Hunt,UniqueergodicityandtheapproximationofattractorsandtheirinvariantmeasuresusingUlam'smethod,Nonlinearity11(2)(1998)307.URL Froyland,G.Aihara,Kazuyuki,Ulamformulaeforrandomandforcedsystems,Proceedingsofthe1998InternationalSymposiumonNonlinearTheoryanditsApplications2(1998)623. [4] S.M.Ulam,Problemsinmodernmathematics,ScienceEditionsJohnWiley&Sons,Inc.,NewYork,1964. [5] M.Dellnitz,G.Froyland,S.Sertl,OntheisolatedspectrumofthePerronFrobeniusoperator,Nonlinearity13(4)(2000)1171.URL T.Y.Li,FiniteapproximationfortheFrobeniusPerronoperator.AsolutiontoUlam'sconjecture,J.ApproximationTheory17(2)(1976)177. [7] G.Froyland,UsingUlam'smethodtocalculateentropyandotherdynamicalinvariants,Nonlinearity12(1)(1999)79.URL J.R.Brown,Ergodictheoryandtopologicaldynamics,AcademicPress[HarcourtBraceJovanovichPublishers],NewYork,1976,pureandAppliedMathematics,No.70. [9] P.Walters,Anintroductiontoergodictheory,Vol.79ofGraduateTextsinMathematics,SpringerVerlag,NewYork,1982. [10] V.Baladi,Positivetransferoperatorsanddecayofcorrelations,Vol.16ofAdvancedSeriesinNonlinearDynamics,WorldScienticPublishingCo.Inc.,RiverEdge,NJ,2000. [11] G.Birkhoff,Threeobservationsonlinearalgebra,Univ.Nac.Tucuman.RevistaA.5(1946)147. [12] G.H.Golub,C.F.VanLoan,Matrixcomputations,3rdEdition,JohnsHopkinsStudiesintheMathematicalSciences,JohnsHopkinsUniversityPress,Baltimore,MD,1996. 143
PAGE 144
G.Casella,R.L.Berger,Statisticalinference,TheWadsworth&Brooks/ColeStatistics/ProbabilitySeries,Wadsworth&Brooks/ColeAdvancedBooks&Software,PacicGrove,CA,1990. [14] J.Ding,A.Zhou,FiniteapproximationsofFrobeniusPerronoperators.AsolutionofUlam'sconjecturetomultidimensionaltransformations,Phys.D92(12)(1996)61.URL P.Billingsley,Probabilityandmeasure,3rdEdition,WileySeriesinProbabilityandMathematicalStatistics,JohnWiley&SonsInc.,NewYork,1995,aWileyIntersciencePublication. [16] M.Rao,personalcommunication(2009). [17] M.Pozniak,K.Zyczkowski,M.Kus,Composedensemblesofrandomunitarymatrices,J.Phys.A31(3)(1998)1059.URL G.Berkolaiko,Spectralgapofdoublystochasticmatricesgeneratedfromequidistributedunitarymatrices,J.Phys.A34(22)(2001)L319L326.URL M.Giona,S.Cerbelli,Connectingthespatialstructureofperiodicorbitsandinvariantmanifoldsinhyperbolicareapreservingsystems,Phys.Lett.A347(46)(2005)200.URL
PAGE 145
AaronCarlSmithwasborninPortland,Indiana,andgrewupinAshland,Oregon.AfterservingintheUnitedStatesArmy,AaronusedtheMontgomeryGIbilltoattendtheUniversityofFlorida.Heismarriedtothemostbeautifulwomanintheworld,BridgettSmith;theyhaveawonderfuldaughter,Akiko. 145
