Timing and channel estimation in multiple-antenna communication systems

MISSING IMAGE

Material Information

Title:
Timing and channel estimation in multiple-antenna communication systems
Physical Description:
x, 95 leaves : ill. ; 29 cm.
Language:
English
Creator:
Liu, Yong
Publication Date:

Subjects

Subjects / Keywords:
Electrical and Computer Engineering thesis, Ph. D   ( lcsh )
Dissertations, Academic -- Electrical and Computer Engineering -- UF   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 2005.
Bibliography:
Includes bibliographical references.
Statement of Responsibility:
by Yong Liu.
General Note:
Printout.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 003477821
sobekcm - AA00004683_00001
System ID:
AA00004683:00001

Full Text











TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION
SYSTEMS

















By

YONG LIU


















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2005


































Copyright 2005

by

Yong Liu
















To my wife and parents.














ACKNOWLEDGMENTS


First of all, I would like to thank my advisor, Dr. Tan F. Wong, for his invaluable guidance,

help and constant encouragement during my graduate study at the University of Florida.

I also want to thank the other members of my graduate committee, Dr. John M. Shea, Dr.

Jose A. B. Fortes and Dr. William W. Hager, for their suggestions and help. My special thanks

go to Dr. William W. Hager for his great help during the development of the second half of this

work.

Finally, I am extremely grateful to my family for their encouragement, devotion and sup-

port throughout my whole life.














TABLE OF CONTENTS

page

ACKNOWLEDGMENTS.................. .. ............ iv

TA B LE . ... . .. . . vii

LIST OF FIGURES ..................................... viii

ABSTRACT ................. .. ... ......................... ix

CHAPTER

1 INTRODUCTION ............................... 1

1.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels ..... 3
1.2 Channel Estimation for Correlated MIMO Channels with Colored
Interference . . . 4
1.3 Organization of the Dissertation .................... 5

2 TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVER
RAYLEIGH FLAT-FADING CHANNELS ................ 7

2.1 Introduction .... .. .. .. .. 7
2.2 System M odel .............................. 8
2.3 Timing Estimation with Unknown Deterministic Channel ...... 11
2.3.1 M L Estimator ........ .. ... ........... 11
2.3.2 Cramer-Rao Bound ...................... 12
2.3.3 Optimal Training Scheme ................... 14
2.4 Timing Estimation with Random Channel .............. 29
2.4.1 M L Estimator ............... ........... 31
2.4.2 Cramer-Rao Bound ...................... 32
2.4.3 Optimal Training Scheme ................... 35
2.5 Discussions and Conclusions ..................... 37
2.5.1 Orthogonal Training Signals .................. 40
2.5.2 Perfectly Correlated Training Signals . .. 40
2.5.3 Deterministic vs Random Channel Approaches .... 41

3 CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS
WITH COLORED INTERFERENCE . . 42

3.1 Introduction .. .. .. .. 42
3.2 System Model ............... .... ........ 44
3.3 Optimal Training Sequence Design .................. 49
3.3.1 Solution Structure . . . 50
3.3.2 The Optimal E .......................... 56
v









3.3.3 Optimal Eigenvector Ordering . . 58
3.4 Estimation of Channel Statistics and Feedback Design ... 62
3.5 Numerical Results ............................ 69
3.5.1 Co-channel Interference . . 70
3.5.2 Jamming Signals ................ ........ 71
3.6 Conclusion .. .. .. .. .. 73
3.7 Appendix . . . . 76
3.7.1 A Trace Problem ...... . . 76
3.7.2 A Determinant Problem . . 81

4 CONCLUSION AND FUTURE WORK . . 87

4.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels 87
4.2 Channel Estimation for Correlated MIMO Channels with Colored
Interference ................ ......... 87
4.3 Timing Estimation for Correlated MIMO Channels with Colored Noise 88

REFERENCES ........... ....................... 89

BIOGRAPHICAL SKETCH ....... .. .............. 95














TABLE

Table page

1.1 M atrix N stations .. .. .. . .. 6














LIST OF FIGURES

Figure page

2.1 Outage probabilities achieved using different training signal sets for a system
with 4 transmit and 1 receive antennas. The unit of the threshold e is T2. 22

2.2 Outage probabilities achieved using orthogonal training signals for different
numbers of transmit antennas. One receive antenna is employed. The unit of
the threshold E is T ........ . . 23

2.3 Comparison of outage probabilities of the ML estimator obtained from simula-
tion and calculated from the CRB. The number of transmit antennas nt is 2
and = 10-4T2. ................. .... .. ..... 24

2.4 Comparison of the MSE of the ML estimator obtained from simulation and
the average CRB. The number of transmit antennas nt, is 2. The unit in the
vertical axis is T2............. ................... 30

2.5 Comparison of CRBs obtained using orthogonal training sequences and per-
fectly correlated training sequences for different numbers of transmit anten-
nas. Note that the CRB of the system with the perfectly correlated training
sequences is the same for any number of transmit antennas. . 38

2.6 Comparison of the MSE of the ML estimator obtained from simulation and the
CRB. The number of transmit antennas nt is 2. The unit in the vertical axis is
T 2. . . . . . 3 9

3.1 Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and high spatial correlation channel. . ... 72

3.2 Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and low spatial correlation channel. . ... 73

3.3 Comparison of total MSEs obtained using different training sequences. AR
jammers and high spatial correlation channel. . ... 74

3.4 Comparison of total MSEs obtained using different training sequences. AR
jammers and low spatial correlation channel. . .. 75














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION
SYSTEMS

By

Yong Liu

December 2005

Chair: Tan F. Wong
Major Department: Electrical and Computer Engineering

There is an increasing demand for next generation wireless networks, including wireless

local area networks and the third generation cellular networks, that can provide high data rate for

broadband services, improve quality of service (QoS), and support more users. The use of mul-

tiple transmit and receive antennas can offer substantial performance improvement to a wireless

communication system by making the use of the extra degrees of freedom in the spatial domain

and thus is a promising technique to satisfy this demand. Many of the current space-time coding

schemes proposed for multiple-antenna systems assume perfect timing estimation and channel

estimation to achieve the expected performance gain. The lack of timing synchronization be-

tween the transmit and receive signals and the inaccuracy of channel estimation could degrade

the system performance.

In the first half of this work, we investigate the problem of timing estimation in multiple-

antenna systems with the aid of training signals. A slow, independent and identically distributed

Rayleigh flat-fading channel model is considered. We derive two maximum likelihood timing

estimators based on two different approaches, namely treating the channel as deterministic and

random, and present the corresponding Cramer-Rao bounds (CRBs). Then the optimal designs

of training signals based on some figures of merit associated with the CRBs are discussed.









In the second half of this work, we study the problem of the estimation of correlated

multiple-input multiple-output (MIMO) channels with colored interference. The Bayesian chan-

nel estimator is derived and the optimal training sequences are designed based on the mean

square error of channel estimation. We propose an algorithm to estimate the long-term chan-

nel statistics in the construction of the optimal training sequences. We also design an efficient

scheme to feed back the required information to the transmitter where we can approximately

construct the optimal sequences. Numerical results show that the optimal training sequences

provide substantial performance gain for channel estimation when compared with other train-

ing sequences.














CHAPTER 1
INTRODUCTION


There is an increasing demand for next generation wireless networks, including wireless

local area networks and the third generation cellular networks, that can provide high data rate

for broadband services, improve quality of service (QoS), and support more users. The use of

multiple antennas at both the transmitters and receivers in wireless communication systems is

a significant technical breakthrough which can offer substantial performance improvement to

wireless links by making the use of the extra degrees of freedom in the spatial domain and thus

is a promising technique to fulfill these requirements. A system employing multiple transmit

and receive antennas is often called a multiple-input multiple-output (MIMO) system. Recently,

the MIMO system and its related techniques have been widely considered for next generation

wireless communication systems such as wireless local area networks (WLAN) and the third

generation (3G) cellular networks. With multiple antennas, the communication performance can

be improved by many orders of magnitude without increasing transmit power and bandwidth.

Only more hardware complexity is needed. This additional hardware requirement is enabled by

the increasing computational power of integrated circuits.

MIMO systems provide various benefits that include spatial multiplexing gain and diver-

sity gain. The information capacity of wireless communication systems increases significantly

by employing multiple antennas. It has been analytically proved that MIMO systems can pro-

vide a linear increase in capacity [1, 2] which is proportional to the minimum of the number

of transmit antennas and the number of receive antennas. This spatial multiplexing gain can

be obtained by transmitting independent data streams from different transmit antennas. The

increased information rate is achieved without the requirement of increasing the transmit power

and expanding the transmission bandwidth.









The physical characteristics of the wireless channel present a fundamental technical chal-

lenge for reliable communications. Wireless communication channels exhibit significant sig-

nal variations on a short term time scale which is known as fading. One way to mitigate the

degradation effects of fading is to employ diversity techniques which provide the receiver with

several replicas of the same transmitted signal over independent fading channels. The proba-

bility that all the received signals experience deep fades simultaneously reduces considerably.

Thus diversity techniques increase the reliability of wireless links and dramatically improve the

communication performance over fading channels. The commonly used diversity techniques

include time diversity, frequency diversity and spatial diversity. Time diversity can be provided

by channel coding combined with interleaving or automatic repeat request (ARQ) schemes. In

frequency diversity, the same narrowband signal is transmitted over over different frequency

bands to provide independent fading channels. Spatial diversity, which is also known as an-

tenna diversity obtained by the use of multiple antennas, is preferred over time diversity and

frequency diversity since it does not need to increase the transmit signal power and bandwidth.

If the fading effects between different pairs of transmit and receive antennas are approximately

independent and the transmitted signal is carefully designed, the received signals can be com-

bined at the receiver such that the fading of the resultant signal is greatly reduced compared to

a single antenna communication system and thus wireless link improvement is provided.

Space-time coding (STC) is one key technique that has been introduced to provide en-

hanced performance for wireless communication systems employed with multiple antennas.

Space time codes are designed to use the extra degrees of freedom in the spatial domain pro-

vided by extra antennas. They incorporate the temporal and spatial correlations into signals

from different transmit antennas to achieve transmit diversity and provide spatial multiplexing

gain. The main classes of space time codes include the Bell labs layered space-time architecture

(BLAST), space-time trellis codes (STTC) and space-time block codes (STBC).

Tarokh et al. [3] proposed space-time trellis codes which can provide full diversity gain at

the receiver. After that, many efforts have been made to improve the originally designed space-

time trellis codes [4, 5]. Since space-time trellis codes are designed based on trellis codes, they

provide additional coding gain. But the Viterbi algorithm has to be employed for the optimal









decoder of STTC, and thus the decoding complexity grows exponentially with the memory

length of trellis codes and the number of antennas.

To reduce the decoding complexity, Alamouti introduced a simple space-time block coding

scheme for a two transmit antenna system which can provide full diversity gain without sacrific-

ing the transmission data rate [6]. The scheme was extended to more than two transmit antennas

based on the theory of orthogonal designs [7, 8, 9]. Space-time block codes can be decoded us-

ing much simpler linear processing at the receiver compared with the Viterbi algorithm required

for space-time trellis codes. Although space-time block codes achieve the same diversity gain

as space-time trellis codes for the same number of transmit antennas, they do not provide any

significant coding gain. To make a compromise between STBC and STTC, the schemes of con-

catenating the traditional trellis codes with space-time block codes to obtain additional coding

gain has been proposed [10-14].

BLAST [15, 16] is the first space-time coding scheme proposed for MIMO systems which

provides spatial multiplexing. In BLAST, the multiple independent data streams are transmit-

ted from different transmit antennas, and are extracted by using the interference nulling and

interference successive cancelation strategies at the receiver. This decoding scheme operated

in spatial domain for BLAST is similar as the successive interference cancelation proposed for

multiuser detection [17] in CDMA systems. Field tests showed that BLAST provides a substan-

tial increase of data rates for wireless communication systems operating in practical channels

[18].

1.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels

To achieve the performance gain promised by the multiple antenna system, parameter es-

timations including channel estimation, timing estimation and frequency offset estimation are

key components of the space-time system design. Both channel estimation and frequency offset

estimation for MIMO systems have been extensively studied in the literature [19, 20].

An issue that has not been sufficiently explored is timing synchronization in multiple-

antenna systems. Inaccuracies in timing synchronization can degrade the performance of such

communication systems in a similar way as the MIMO channel estimation and frequency offset

estimation error do. For instance, many of the current space-time coding schemes proposed









for multiple-antenna systems assume perfect knowledge of timing and channel gains at the re-

ceiver in order to be able to achieve the promised diversity gain and capacity improvement.

The performance of these systems may be limited by the accuracy of timing estimation. One

objective of this work is to study the problem of timing estimation for a wireless communica-

tion system employing multiple transmit and receive antennas in a Rayleigh flat-fading channel

environment.

1.2 Channel Estimation for Correlated MIMO Channels with Colored Interference

For the multiple antenna communication system, theoretical analysis [1, 2, 15] shows that

the capacity increases linearly with the number of antennas under the assumption that channel

gains between different transmit and receive antennas are identical and independent distributed

(i.i.d.). The i.i.d. assumption is reasonable for sufficiently rich scattering environments. On the

other hand, it is also important to analyze the capacities, design optimal transmission strategies,

and investigate the related channel parameter estimation problem for MIMO systems in more

realistic situations which include spatially correlated channels and colored interference.

In the more realistic channel environment, fading correlation exists between the different

transmit antennas and receive antennas. It was shown [21 ] that the capacity of correlated MIMO

channels still grows linearly with the number of antennas but the growth rate is affected by the

channel correlations and smaller than that in independent fading channels. Based on the ca-

pacity results for correlated MIMO channels, optimal transmission strategies [22-25] have been

widely investigated. Jorswieck et al. [24] investigated the correlated Rayleigh flat fading MIMO

systems with perfect channel state information at the receiver and the channel covariance infor-

mation fed back to the transmitter. It was shown that transmitting signals along the directions

of the eigenvectors of the transmit correlation matrix is the optimal transmission strategy.

The capacity of MIMO channels has also been investigated for wireless communication

systems with colored interference. The scenario arises in cellular systems where the users in

one cell suffer from the co-channel interference from the users in other cells due to frequency

reuse, or in ad hoc networks where each transmitter-receiver pair suffers from the interference

from other transmitter-receiver pairs operating in the same frequency band. In Lozano et al.

[26], the capacity of MIMO systems with the presence of spatially colored interference was









investigated. It was shown that the capacity increases with the interference spatial correlation

and the lowest capacity is achieved when the interference is white. In Moustakas et al. [27], the

authors provided analytical expressions for the statistics of the mutual information for spatially

correlated channels with the presence of interference.

Channel estimation is necessary for coherent detection in multiple antenna communication

systems. The inaccuracy of channel estimation could degrade the system performance substan-

tially. There are few works considering the channel estimation problem for MIMO systems

in realistic situations, which include both spatially correlated channels and interference. So

another objective of this work is to investigate the problem of estimating correlated MIMO

channels with colored interference.
1.3 Organization of the Dissertation

The dissertation is organized in the following manner. The timing estimation problem

for MIMO systems with the aid of training signals is investigated in Chapter 2. In Chapter

3, we study the problem of estimating correlated MIMO channels in the presence of colored

interference. Conclusions are drawn in Chapter 4. The notation used in this dissertation is

summarized in Table 1.1 for clarity.























Table 1.1: Matrix Notations


A
a
Real(a)
It
0
diag(xi, x2,..
A7'
A*
AH
A1/2
vec(A)
tr(A)
det(A)
AB
a > b
A
ar
CN

b(t)


,X1)


matrix with complex entries
column vector with complex entries
real part of column vector a
n x n identity matrix
zero matrix
diagonal matrix with xa, 2, -... n as the diagonal elements
transpose of A
complex conjugate of A
complex conjugate transpose (Hermitian) of A
Hermitian square root of A
vector obtained by stacking columns of A on top of each other
trace of A
determinant of A
Kronecker product of A and B
inequality elementwise
matrix with real entries
column vector with real entries
complex Gaussian distribution
the first derivative of (t) w.r.t. t
the second derivative of 0(t) w.r.t. t


0) I

















CHAPTER 2
TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVER RAYLEIGH
FLAT-FADING CHANNELS

2.1 Introduction

In this chapter, we investigate the timing estimation problem for a wireless communication

system employing multiple transmit and receive antennas with the aid of training signals.

Previous related work was primarily restricted to acquisition in spread spectrum systems

with multiple receive antennas [28, 29]. In Dlugos et al. [28] and Win et al. [29], the maximum

likelihood estimator of the received code lag was obtained, and the error probability for the

acquisition system was derived. A deterministic but unknown channel was considered in Dlugos

et al. [28], whereas a flat Rayleigh fading channel with known statistics was assumed in Win

et al. [29]. An optimal estimator for code acquisition was derived in Shamain et al. [30] for

spatially correlated channels. In Zhang et al. [31], the performance of code acquisition in a

DS-CDMA system employing multiple transmit antennas was analyzed. Through simulations,

it was shown that the presence of multiple transmit antennas improved the code acquisition

performance, relative to that of a single-antenna system.

Issues related to parameter estimation of signals received by an array of antennas have

also been treated in the radar array signal processing literature [32, 33]. Time delay and spatial

signature estimation of known signals received by an array of antennas was investigated in

Swindlehurst et al. [34]. ML algorithms and the Cramer-Rao bound for time delay and array

calibration estimation were developed, and some computationally efficient approximations of

the ML algorithms were proposed. In Dogandzic et al. [35], ML methods were developed for

space-time fading channel estimation with an antenna array in spatially correlated noise. The

CRBs for the unknown directions of arrival, time delays, and Doppler shifts were derived, under

a structured and unstructured array response model.

In the present work, we consider a wireless communication system with multiple trans-

mit and receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh
7









flat-fading environment. The goal is to investigate the problem of timing estimation in such

a system with the aid of training signals. One of the main questions that we try to answer is

to find the optimal training signal design. We investigate the timing estimation problem under

two approaches. In the first approach, the channel is assumed to be unknown and determinis-

tic where joint estimation of the channel and delay is carried out. We derive an ML estimator

for joint channel and timing estimation, and compute the associated CRB. Then we discuss

the optimal training signals with respect to two performance measures based on the CRB: the

outage probability that the CRB is larger than a threshold and the average CRB. We show that

the optimal training scheme is one wherein orthogonal training signals from multiple transmit

antennas are used. In the second approach, the channel is assumed to be unknown but random

with known statistics. We use the likelihood function averaged over all random channel real-

izations to obtain the ML estimator for the delay. We derive the associated CRB and study the

optimal training scheme in terms of minimizing the CRB. We show that perfectly correlated

training signals employed at different transmit antennas constitute the optimal transmit scheme,

in contrast to orthogonal training signals in the first approach.

The rest of this chapter is organized in the following manner. The system model is in-

troduced in Section 2.2. In Section 2.3, we consider the timing estimation problem when the

channel is assumed to be unknown but deterministic. In Section 2.4, we study the problem of

timing estimation with the assumption that the channel is random but with known statistics. In

both sections, we derive the ML timing estimators and compute the associated CRBs. Optimal

training signal designs are discussed based on the corresponding CRBs. In Section 2.5, some

discussions comparing these two timing estimation approaches are provided.
2.2 System Model

We consider a single-user MIMO system with 7t transmit antennas and nr receive anten-

nas. We assume a quasi-static (block fading) channel where the channel varies slowly enough

to be considered invariant over a block. However, the channel changes to an independent value

from block to block. By using the unstructured array model [33], the received baseband signals









at the receive antennas are given in vector form by
nt
r(t) = hksk(t T) + n(t), (2.1)
k=1

where hk = [hk2, hk2,... hknr]w with hi3 denoting the channel gain from the ith transmit

antenna to the jth receive antenna, r(t) is the nr x 1 received signal vector from the receive an-

tenna array and Sk (t) is the transmitted training signal from the kth transmit antenna. Define the

channel vector as h = [hf h,..., h ]T. Also, n(I) is a complex, circular-symmetric, white

Gaussian noise process with zero mean and covariance matrix E[n(t)n(u)HI =2 a2 6(t u).

The symbol T denotes the unknown, deterministic delay to be estimated. This model assumes

that the delays between all pairs of transmit and receive antennas are the same. This corresponds

to the case in which the distance between the transmit and receive antenna arrays is much larger

than the sizes of the arrays.

We consider the Rayleigh flat-fading channel model, in which the channel coefficients

h,j are i.i.d. complex, circular-symmetric, zero-mean Gaussian random variables with the

CA/(0, p2) distribution, i.e.,


E[hkh] P2ur, E[hkhfk = 0, and E[hihH] = E[hihT] = 0, for i 5 j.


The conditional likelihood function of r(t), given the unknown r and h, can be written as


p(r(t) I, h) = Tr-ra- 2exp ( r(t) hkSk(t 7) dt), (2.2)
k=1

where we have assumed that the training signals sk(t), for k = 1,...., n,, have finite durations,

and the observation interval To is larger than the sum of the maximum training signal duration

and the maximum possible value of T. Thus the whole transmitted training signals are observed

at the receiver.








We can simplify the exponent of the likelihood function to find the sufficient statistics for
the estimation of the delay r:
To ut 2
/ r(t) hkSk(t-) dt
'0 k=1

= rH(t)r(t) dt 2Re{ [ rH(t)sk(t r) dt hk




= const 2Re r(t)sk(t r) dt hk
k=1 .
nt n t T o
+hfh s (t)sj (t) dt,
i1 =11
where the term const represents the part which does not depend on the delay r and the channel
h. Also, the last equality holds due to the assumption that To is larger than the sum of the
maximum training signal duration and the maximum possible delay.
Denote the matched filter output corresponding to the kth transmit signal by

rk(r) = r*(t)sk(t ) dt, k = 1,2.... nt. (2.3)

Note that r(r) = [rl(r)T, r2(7)T, ... r, (r7)T]T provides sufficient statistics for estimating r.
With this notation, we then have

2Re{ [ /T'rH(t)sk(t 7)dt hk} = 2Re{r(r)Th}. (2.4)

Denote the crosscorrelation between the training signals from the ith and jth transmit antennas
as
r = s*(t)s,(t) dt, (2.5)

which forms the (i, j)th element of the correlation matrix F. Let C = F 0 In,. Then, we have
nt 11t T(
E hhj s*(t)sj(t) dt = h"Ch. (2.6)
i=1 =1 0








From (2.4) and (2.6), the conditional likelihood function of r(r), given the unknowns r and h,
can be written as

p(r(r)|T, h) = 7r-' a-2nexp (const 2Re{r(T)Th} + hHCh1. (2.7)

Let (7r) = [Re(r(r))', -Im(r(r))T]T, f = [Re(h)T, Im(h)T]T, and

= Re (C) -Im(C) By using the isomorphism between real and complex matrices
Im(C) Re(C)
[36], we have 2Re{r(T)Th} = 2i(T)Th and h"Ch = 2h1TC. In terms of these real quantities,
the conditional likelihood function of f(r) is then

p(i(r)7r,h) = 7r-n"-2nexp cost 2(T) h + 2hTh (2.8)
2(2.8)


2.3 Timing Estimation with Unknown Deterministic Channel
In this chapter, we will treat h as unknown but deterministic in the estimation process and
consider the joint estimation of the delay r and the channel vector h.
2.3.1 ML Estimator
In this section, we develop the ML estimator for the joint estimation of the timing r and
the channel vector h. The joint ML estimate of r and h maximizes the conditional likelihood
function (2.8) as a function of r and h:

maxp((7-)|T, h) = max{mnaxp(r(r)|r, h)}. (2.9)
r,h h

Alternatively, we can maximize the log-likelihood function given by

1-
L = const + I (2f (r) h 2h'Tft). (2.10)

As suggested in (2.9), we first maximize the log likelihood function L over h. Taking the first
derivative of L with respect to (w.r.t.) h gives

aL 1
-- = {2f(r) 40C}.
Off 6,









By letting = 0, we get the ML estimate of the channel h as

m = -C- (T), (2.11)
2

where we have assumed that C, i.e. C, is nonsingular to obtain a unique estimation of the

channel. Then substituting (2.11) into (2.10) gives the ML estimate of the delay 7 in the form:

T,,,I = arg max {i(T)TC- ((T)}. (2.12)


To implement the ML estimator in general, we need to conduct a line search over all possible

values of r to maximize the above metric.
2.3.2 Cramer-Rao Bound

The Cramer-Rao bound gives a lower bound on the variance of any unbiased estimator

[36, 37]. It has been widely used to lower bound the mean square error (MSE) of symbol timing

estimators [38, 39]. It is well known [36, 37] that ML estimators, under mild regularity condi-

tions and with independent and identically distributed observations, are asymptotically unbiased

and efficient. It can be easily verified that the elements of r(T) given in (2.3) corresponding to

different receive antennas are i.i.d. observations. Thus for a particular realization of the channel

h, the ML estimator is asymptotically efficient, i.e., it approaches the CRB as the number of

receive antennas nr becomes large. Hence the CRB is a suitable performance measure for the

ML estimator of the delay 7. We will also verify the suitability of employing the CRB as a

performance metric by computer simulation examples.

The main result of this section on the CRB is contained in the following theorem.

Theorem 2.3.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the

training signals Sk(t), for k = 1,..., nt, exist and they are uniformly continuous on [0. To].

Together with the standard regularity conditions in [36, 37], the Cramer-Rao bound for the
estimation of the delay 7 for a given realization of the channel h is given by

,2 1
CRB(h) 2r() T E'TC- r' (2.13)








where E[ ] = [E[ a2 ], E[ ],.... E[ ] with

E = h' (t) ()}l=, i= 1,2,...,nt (2.14)
T k=1 k Jk= o

andE[9- = [E[ T E[2 ]T. .E[ ] with

E ) = h s (1)(t) dt, i = 1,2,...,nt. (2.15)
k=1 "
Proof The CRB for the estimation of r is given as

CRB(h) = (I-1)22, (2.16)

where I is the Fisher information matrix for the joint estimation of the channel h and the delay
T which is defined as

I = 112 -E -E (2.17)
121 122 -E 2Lj _-E Lj

Since t = 4C and = 0, we have
oh Oh
E 52L1 4-.
I1 = -E = -H (2.18)
1,h2I W2
Moreover,
112 = -E -21- (2.19)

Let v E [ =E E E[ .. ., -- T ,
thenl2 = = -^ [Re(v)T, _Im(v)T]T.
The ith block of v can be computed from
,(r) fT r(t) Os(t T) dt
1 r*(r'(t) 0 -

= nh .s*(I 7.) + n*(.)]) di
0 k=1
h= h*( s(t 7) dt n*(t) '(t-') dt.
k=1 0








The fact that the noise n(t) is zero-mean gives

E [ ( = h; s (t- r) (t dt = hi s*(t)(t) dt. (2.20)
ST k=1 k 9k=1

Finally, 122 = E [ ]) h = Re {E [a2)] h}. Similarly, 122 can be computed
from the fact that
E { )= h s*((t) i(t) dt. (2.21)
k=i 1
Applying the standard result on the inverse of a partitioned matrix to (2.16) and (2.17)
gives
CRB-'(h) = 122 1211-11112. (2.22)

By using the relationship between real and complex matrices [36], we get

121111112 2= T 2 V = 2VT-v*. (2.23)

Then the CRB of the estimation of the delay r is

CRB(h) = 2 ". (2.24)
2Re E[--J h +E [ C- E[LTJ



We note that the CRB varies with different choices of training signals. By carefully choos-
ing the training signals to minimize a suitable measure associated with the CRB, we can poten-
tially improve the estimation performance.
2.3.3 Optimal Training Scheme
Communication systems often employ the same symbol waveforms for both training and
data phases. The choice of the symbol waveform is mainly decided by the performance required
by data transmissions. In this section, we shall make the following simplifying but practically
reasonable assumptions on the training signals:









Assumption 1

Let ak = [ak(0)...., ak(N I)]r be the training sequence assigned to the kth transmit

antenna, and on this antenna the training signal waveform is of the form
N-1
Sk(t) = E ak(i)4(t iT,), (2.25)
i=0

where N is the number of training symbols and Vp(t) is the symbol waveform. We call the

N x nt matrix A = [a. a2, ..., an,] as the training sequence matrix.

Assumption 2

The symbol waveform Vj(t) is time-limited to a single symbol period [0, T,] so that adjacent

symbols do not interfere with each other. In addition, O(t) is sufficiently smooth to guarantee

the existence of uniformly continuous first and second derivatives. This condition is satisfied

for most symbol waveforms of practical interest. Two typical examples are the time-domain

raised-cosine pulse and the half-sine pulse.

Assumption 3

AHA is nonsingular, and hence F and C are also nonsingular. We note that this implies

that N > nt.

Under the assumptions stated above, the CRB for the timing estimation can be simplified

to the expression summarized in the following corollary.

Corollary 2.3.1 (Cramer-Rao bound). Given Assumptions 1-3, the CRB for the estimation of

-r for a particular realization of the channel h reduces to

CRB(h) = -2 1H (2.26)
2V, hH(AHA 9 In,)h'

where '0 = ,bc -r- I Od12, Ob f I(t)12 dt, V)c = f V*(t)(t) dt, and 'd = f ?*(t) (t) dt.

Proof With the three assumptions on the training signals, we have
7 ToN-1 N-1
s'(t)(t)dt = ( Eak*(m)O*(t mT,) E (l)ai(t-lT,)dt
m0 0n=0 1=0
H aa,. I (t)(t) dt (2.27)








Then, Eqn. (2.14) can be written in terms of the training sequences as:

E 02 :c h(a"a. (2.28)
k=1
Thus
E [2r h = ,E T hha'Ha, = V,1h(AHAI I)h. (2.29)
E -r2 i k-k
i=1 k=1

Hence 122 = -gRe E [ hr = 2chH(AHA 0,,)h.
Moreover, (2.15) can also be simplified in terms of the training sequences as:

E -d h.akai. (2.30)
k=l

Thus E = -)d(AHA Inr)*h*. Similarly, we have

= s(t)sj(t) dt = ,a aj (2.31)

and C = 'b(AHA 0 I,,).Hence, (2.23) can be written as

121 -1112 = 2Vdh (AHA 0 In,r)(bAHA 0 In,)-'V*(AHA 0 I1,)h
= 2 dIhH(AHA I9,)h. (2.32)
0-2 Ob

Then the Cramer-Rao bound for the estimation of the delay r is

CRB(h) = [122 121111121-1
= O hH(AHA 0 I,)h -2 d h(AA I,,) h

S a2' b 1 (2.33)
2(',00, + '..', 2) hH(AHA 0 Io,)h

By using some standard properties of the Fourier transform similar to the Parseval's theo-
rem, we have b = f[ |()|2d, '_ -2 .- j 2 (w) 2dw, and
ad = f J jl W'(w)j2dw, where T1(w) is the Fourier transform of V(t). Then according to the









Cauchy-Schwarz inequality, we have


0a = M'b + I dl2
[ T! 0 ~()12dw]2 +1[j +00 C(9

27r 27r


< 0. (2.34)


Since Ob > 0, we have > 0 which implies that the expression of the CRB given in (2.33)

is nontrivial.



As a result, the dependence of the CRB on the training signals Sk(t), for k = 1,..., nt,

simplifies into that on the training sequence matrix A and the symbol waveform i(t). In the

following two subsections, we optimize the training sequence matrix A in terms of two perfor-

mance measures, namely the outage probability that the CRB is larger than a threshold and the

average CRB over all channel realizations.

Outage probability

In this subsection, the outage probability that the CRB is larger than the threshold 6, i.e.

Pr(CRB(h) > (), is used as a performance measure with respect to which the training signals

from different transmit antennas are optimized.

Write the spectral decomposition of AHA as AHA = UAUH, where U is a unitary

matrix and A = diag{ A1, A2, ., An, } is the diagonal matrix containing the positive eigenvalues

of AHA. The design of the optimal training scheme can now be formulated as the following

optimization problem:

mmin Pr(CRB(h) > c)
A
subject to tr{AHA} = Eib1 A. <_

A, > 0, i = 1,...,n (2.35)


where pbtr {AHA} < P specifies a constraint on the total transmit power.








First, we consider a simple but important case: 2 transmit antennas and 1 receive antenna.
In this case, the optimization problem (2.35) can be simplified as follows. Starting from Corol-
lary 2.3.1, we have

Pr(CRB(h) > e) = Pr hHAHAh < (2.36)

With the spectral decomposition of AHA, hHAHAh = hHUAUHh = h'HAh' = Ai|i|2 +
A2h'12, where h' = UHh. Since h is a random vector with i.i.d. complex, circular-symmetric,
zero-mean Gaussian elements and U is a unitary matrix, h' is also a complex Gaussian random
vector with the same distribution as h. We note that Ih'I 2 has the exponential distribution with
E(|h 12) = p2.
Let X = and c = P for i = 1, 2, then

Pr hHAHAh < ) ) Pr (cXi + c2X2 < -, (2.37)
2 2ea/ 2cV\ Pp2

where X1 and X2 are independent random variables with exponential distribution and E(Xi) =

E(X2) = 1. The total power constraint A, < -is equivalent to c, + c2 < 1. Hence the
optimization problem can be rewritten in the following simple form:

min Pr (ciXi + c2X2 < 2Eb2
Ci,C2 \ pp2
subject to c + c2 < 1, and c1,c2 >0 (2.38)

In order to solve the above optimization problem, we employ the following result on the
Schur-convexity' of the distribution function of the linear combination of two exponential ran-
dom variables [41].
Lemma 2.3.1. Let X1 and X2 be independent random variables with exponential distribution,
and E(Xi) = E(X2) = 1. Then the function

F(ci. c2, x) = Pr(ciXi + c2X2 < x), where cl + c2 = 1 and cl, c2 > 0,



A detailed description on Schur-convexity and majorization can be found in Marshall et al.
[40].









is Schur convex on (cl, c2) ifx < 1, and it is Schur concave on (ci. c2) if x > 3/2.

Using the above lemma and considering the region in which the CRB threshold e >

- P-- the optimization cost function in (2.38) is a Schur convex function on (c1, c2) Thus

minimization of the cost function occurs if and only if cl = c2 = i.e.., A1 = A2 = [40].

This implies that the optimal A is such that A"A = -E1I2. The optimal training scheme is

summarized in the following theorem.

Theorem 2.3.2. Suppose that the CRB threshold > the training sequence matrix

A such that AHA = -I minimizes the outage probability of the CRB for a system with 2
2Vkb
transmit antennas and 1 receive antenna. That is, the optimal training sequences from different

transmit antennas are orthogonal to each other and have equal powers.

We shall see from the discussion in the next subsection on the average CRB (Corollary

2.3.2), the value ---- is exactly one half of the average CRB over all channel realizations.

Thus, it is reasonable to consider the stated region of the CRB threshold.

It seems natural that a result analogous to the one in Lemma 2.3.1 be true for the more

general case. While the proof of such a result remains open, there is strong evidence regarding

the Schur convexity of the function F(ci,..., c,,, x) = Pr(cl Xi + + cn, X,, < x) where Xi,

for i = 1 ... nt, are independent random variables with unit-mean exponential distribution.

The following conjecture has been advanced in Merkle et al. [41], supported by some strong

numerical results.

Conjecture 2.3.1. The family of unimodal distribution functions F(c ,.... cn, x) is increasing

with respect to the variance (i.e., Schur-convex) for small values x, and decreasing (i.e., Schur-

concave) for large values of x.

Based on the above conjecture, we conjecture that the result in Theorem 2.3.2 extends to

the case of arbitrary numbers of transmit and receive antennas:

Conjecture 2.3.2. When A"A = --I, the outage probability of the CRB is minimized if the

CRB threshold c is not too small. Thus the optimal training sequences from different transmit

antennas, in terms of minimizing the outage probability, are orthogonal to each other and have

equal powers.









In Hassibi et al. [42], the authors assumed perfect timing estimation and studied the prob-

lem of choosing the optimal training sequences for channel estimation to maximize a lower

bound on the capacity of the channel that was learned by training. The optimal training se-

quences for channel estimation turned out to have the same structure as those we get here for

timing estimation.

To illustrate our conjecture on the optimality of orthogonal sequences, we have carried

out a large number of numerical calculations. In the broad region of c that we are interested
in, we have not observed the existence of any other schemes which can achieve a lower out-

age probability than the orthogonal training signals. In Fig. 2.1, we plot, for instance, the

outage probabilities Pr(CRB(h) > e) for a system with 4 transmit antennas and a single re-

ceive antenna employing different training signal sets. Note that since P is the total transmit
power constraint, the signal-to-noise ratio (SNR) p2 here should be understood as the total

SNR for the whole training period instead of the SNR for one symbol period. The time-domain

raised-cosine pulse is used as the symbol waveform. The results in the figure suggest that the
orthogonal training signals are optimal and can provide a significant performance gain over the

other training signals.
In Fig. 2.2, we compare the outage performance of orthogonal training sequences for dif-

ferent numbers of transmit antennas. The results in the figure show that the use of multiple trans-
mit antennas can offer substantial estimation performance improvement over a single-antenna

system. For example, if we consider the outage probability Pr(CRB(h) > e) = 0.1, the two-

transmit antenna system can achieve a 4 dB performance gain and the four-transmit antenna

system can achieve a 6 dB performance gain. The performance gap grows with decreasing

outage probability.
More precisely, the outage probability for orthogonal training signals is given by

Pr(CRB(h) > E) = Pr hH h < ntb )

= 1 exp, 2 G pp2 2ba Pp2J (2.39)








where the second equality is obtained from the fact that hHh is X2 -distributed [43]. From
(2.39), it is not hard to see that when the SNR is large, i.e. > 't the outage probability
is approximately given by

1 F 2 ]n2
Pr(CRB(h) > c) b 2 [-f p] "(2.40)
(nnr),! 2eoa Pp2

Eqn. (2.40) indicates that the outage probability decreases with the (ntn,)th power of the re-
ciprocal of the SNR. The power ntn, is usually referred to as the diversity order of the system
[43]. Thus we conclude that the use of multiple transmit and receive antennas (with orthogonal
training signals) provides spatial diversity for timing estimation in the same way as space-time
coding does for demodulation [1, 15].
An important remaining issue is whether the ML estimator can achieve the outage prob-
ability of the CRB. For each realization of the channel h, the ML estimator is asymptotically
efficient with increasing number of receive antennas n,. We note that Pr(CRB(h) > e) =
Eh[l(CRB(h) > c)], where 1(-) is the indicator function. Because the indicator function is a
bounded function, the dominated convergence theorem [44] implies that the ML estimator can
achieve the outage probability of the CRB asymptotically.
To verify the suitability of using the outage probability as a performance metric when the
number of receive antennas is small, we evaluate the performance of the ML estimator via
Monte-Carlo simulations. In Fig. 2.3, we plot the outage probabilities of the ML estimator
obtained from simulation and calculated using the CRB, respectively, for a system with two
transmit antennas and employing orthogonal training sequences. It can be seen that the ML
estimator gives an outage probability performance very close to that predicted by the CRB even
for small values of n, = 1, 2, and 4. Hence, the simulation results verify that the outage prob-
ability of the CRB provides an effective performance metric also when the number of receive
antennas is small.
Average CRB
In this subsection, we use the CRB averaged over the Rayleigh flat-fading channel h as an
alternate performance measure based on which the training signals from the transmit antennas
are optimized.






































>7
R'~.
\ \ 'S
\\ \q N7
.\
\ \
~ \


I \


25
SNR (dB)


Figure 2.1: Outage probabilities achieved using different training signal sets for a system with
4 transmit and 1 receive antennas. The unit of the threshold e is Tf .


E- =10-2,C=c=C=C3=4
-e- =10 2,c =2/3,c2=1/6,C3=c =1/12
-B- C=10 ,c1=9/10,c2=C3=C4=1/30
-V- =102,c =99/100C2 =C3 =C4=1/300
-*- =10- C1=C=C3C4
-0- =10 3,C1 =2/3,c2=1/6,C3=c4=1/12
- E- =10 3,c =9/10,c2=C3=C4=1/30
E=10 3C =99/100,C2=C3=C4=1/300


10-3,
10







23

















-0- n=1,=10

100--. n =2,E=10-2
N10 .- n=4,E=103
N nt=4,E=10-3
Sn=2,E=10

( .. ^ ... .. .. .. .. ... n =1,E=10-4
G L -- nt=2,c=10-4
S. n=4,E=10

10-1 -\ V.




o N\










10V
10 15 20 25 30 35 40
SNR (dB)

Figure 2.2: Outage probabilities achieved using orthogonal training signals for different num-
bers of transmit antennas. One receive antenna is employed. The unit of the threshold e is
T 2








24



















10 ---I
ML, nr=I
S ML, nr=2
... .. .... M L n 4 ....
CRB, n=1
\ \_ CRB, nr=2
CRB, n,-4










0




10 -







18 20 22 24 26 28 30 32 34
SNR(dB)


Figure 2.3: Comparison of outage probabilities of the ML estimator obtained from simulation
and calculated from the CRB. The number of transmit antennas nt is 2 and e = T104T?.









After averaging over the Rayleigh flat-fading channel h, the average CRB is given as

Eh[CRB(h)] 2bEh [hH(AH IJ. (2.41)

The design of the optimal training scheme can now be formulated as the following optimization

problem:

mmn Eh hH(AHAIn,)h

subject to tr{AHA} =E Ai <

Ai > 0, i = 1,...,nt. (2.42)

The following theorem specifies the optimal training sequence that minimizes the average CRB.

Theorem 2.3.3. When AHA = P1I, the average CRB over the Rayleigh flat-fading channel

h is minimized. That is, the optimal training sequences from different transmit antennas, in

terms of minimizing the average CRB, are orthogonal to each other and have equal powers.

Proof Let W = U'A'U'H, where A' = diag{A'i, A',..., A, } contains the positive eigenval-

ues of the Hermitian matrix W, and U' is a unitary matrix. Consider the following optimization
problem:

mmn E[

subject to tr{W} = EI' A' < ,v

A' > 0, i= 1,...,ntnr. (2.43)

Note that

E [h h = E[hU = E[hHA = E [znt ,l (2.44)
h hHWh h H U'A'U'Hh h'HA'h' =E_ AI|hl|2

where h' = U'Hh. As before, h' is a complex Gaussian random vector with the same distribu-
tion as h.
Let g(A') = ---, where xi > 0 are assumed to be fixed constants. We study the convexity
property of g.









We have -_ _- --- and 2- ,= j, Then the Hessian G(A') ofg is








It is easily seen that every rows of the Hessian G(A') are dependent. So rank(G(A')) = 1.
Ai I j 1."


A'-)3 (E )3
G =A') 2aEj 2ax2xrt
( A, )3 ( A* )3 ". (E )3



It is easily seen that every rows of the Hessian G(A') are dependent. So rank(G(A')) = 1.

G(A') only has one nonzero eigenvalue which is E 2X2 > 0. (the sum of eigenvalues of a

matrix is equivalent with the sum of all diagonal elements.) All other eigenvalues are zero.

Hence, the Hessian G(A') is a positive semidefinite matrix. Then g(A') is a convex function on

R" = {(A,..., A,,,) : A' > 0, for i= 1,..., ntn,}.

In order to solve the above optimization problem, we employ the following result from the

theory of majorization [40]. We first introduce some fundamental concepts of majorization that

we require in the derivation of the optimal transmit scheme.

For any x = (x:,..., x,) R", let xp] > ...> x[ denote the components of x in

decreasing order.

Definition 2.3.1. For vectors x, y A C R", vector y majorizes x on A if
k k
X[i] < y[i], k = 1,...,n-1

n n

i=1 i=1
The notation x -< y means x is majorized by y on A, or y majorizes x on A.

Majorization makes precise the vague notion that the components of a vector x are less

spread out or more nearly equal than the components of vector y.

Definition 2.3.2. A real-valued function f defined on a set A C R"n is said to be Schur-convex

on A if

x y => f(x) f (y)

f is Schur-concave if the above inequality is reversed. It follows that f is Schur-convex on

A if and only if -f is Schur-concave on A.








Lemma 2.3.2. IfX1,.... X, are exchangeable random variables and the multi-variable, single-
valued function g is a symmetric Borel-measurable convex function, then the function

f(a,,..., an) = E[g(aX1...., aXn)]

is Schur convex.

Since h' are i.i.d., they are exchangeable random variables. Since hI are exchangeable
random variables and g(A') is a symmetric Borel-measurable convex function, the function
E 2[ 1", is Schur-convex by the lemma.
Moreover, since P, is majorized by (Ai..., ,A't,) whenever A > 0, A' =
nr, we know [40] that E n is minimized with A' = A'2 r = We
A-b 1 1. 1A,'h',j s 1 = 2 -t,,r kbnt'
note that this choice of A', i = 1,..., ntn,, also satisfies the constraints in the minimization
problem in (2.42). Thus, it is also a solution to the original minimization problem. Thus the
optimal training sequence matrix A should satisfy A"A = P I which implies that the train-
ing sequences from different transmit antennas are orthogonal to each other and have equal
powers. O

With the optimal training sequences, we can provide an explicit expression for the average
CRB which is described in the next corollary.
Corollary 2.3.2 (Average CRB). Using the optimal training scheme, the average CRB over the
Rayleigh flat-fading channel h is given by

Eh[CRB(h)] -- b 2- (2.45)
2 (n _L) Pp2

when ntn, > 2.

Proof From Theorem 2.3.3 and its derivation, the average CRB under the optimal training
scheme is given as
Eh[CRB(h)] = E2hbE p (2.46)
20,, bLnt 2i=l 2i I
where h' are i.i.d. complex circular-symmetric Gaussian random variables with the C.A/(0, p2)
distribution.








Let Y = ~ ~'r Ih' 2. Then Y is xine-distributed with the probability density function
(p.d.f.)
fy(y)=1 y-It-le -, for y > 0. (2.47)


Let Z = 1/Y. The p.d.f. of Z is given as


fz(z) = -fy = 7(tt2r ), n,,+l' for z > 0.

The expectation of Z can be computed as

E(Z) = zfz(z)dz
1 00
= )2ltTr2tF(t) j e- z"ntn-2 dz. (2.48)

When ntnr > 2 [45], we have

2n"i' (ntnr 2)! 1
E(Z) = p2nnr2ntr(ntn, 1)! p-2ntn,+2 p2(ntn 1) (2.49)

Then from (2.46) and (2.49), the average CRB can be written in a simplified way as

Eh[CRB(h)] = 0- (2.50)
2 (n, ) a Pp2



With the optimal orthogonall) training sequences, the average CRB is a simple function of
the constant --, which only depends on the symbol waveform V(t), the signal-to-noise ratio

P-2, the number of transmit antennas nt, and the number of receive antennas nr. Note that the
average CRB in the limit of large nt or large nr can be approximated as

Eh[CR-B(h)] (2.51)
2nr pp 2, 2 2n, (VbU + I[d12) pp2'

which is inversely proportional to the number of receive antennas nr. When V'(t) is symmetric
about -, such as the time-domain raised-cosine pulse and the half-sine pulse, pd becomes zero.
Then the average CRB for the estimation of the delay r with orthogonal training signals can be








written as
1 cr2
Eh[CRB(h)] = 1(2.52)
2 _-L) 32Pp2

where = 1/2 [ (t)(t) dt/2 i -2 1/2 is known as the root-mean-
square bandwidth [37] of the symbol waveform. Here ID(cw) is the Fourier transform of 0 (t).
We note that the average CRB can be decreased by increasing the bandwidth of the symbol
waveform.
As before, we would like to know whether the ML estimator can achieve the average
CRB. Because the function hH(AHA0I,)h is not a bounded function, thus unlike the outage
probability of the CRB, the ML estimator may not achieve the average CRB asymptotically
(see further discussion in Section 2.5.1). However, the average CRB provides a lower bound for
the variance of any unbiased timing estimator averaged over the channel realizations.
Again, we employ Monte-Carlo simulations to evaluate the performance of the ML esti-
mator with a small number of receive antennas. In Fig. 2.4, we compare the mean squared
error (MSE) achieved by the ML estimator and the average CRB given by (2.45) for a system
with two transmit antennas and employing orthogonal training sequences. For a single receive
antenna system, the performance of the ML estimator deviates significantly from the average
CRB. This is due to the events in which all the channel coefficients are very small simultane-
ously causing the estimation performance to be very poor. The large estimation errors caused by
these events dominate the MSE of the ML estimator. We can see from the figure that the effect
of these events diminishes as the number of receive antennas or the SNR increases. In the for-
mer case, the error dominating events become rarer as the number of receive antennas increases.
In the latter case, the estimation errors, and hence the effect of the error dominating events, get
smaller as SNR increases. For a reasonably small value of n.r, e.g. 4, and a reasonably high
SNR, e.g. 20 dB, we see that the average CRB is still a rather appropriate performance metric.
2.4 Timing Estimation with Random Channel

Recently, differential space-time coding schemes [46, 47, 48] have been developed where
channel estimates are not required at the receiver. For this situation, we only need to consider



















































20
SNR(dB)


Figure 2.4: Comparison of the MSE of the ML estimator obtained from simulation and the
average CRB. The number of transmit antennas nt is 2. The unit in the vertical axis is T2.








the estimation of the delay r. A reasonable model to represent this scenario is that the channel
is random with known statistics.
2.4.1 ML Estimator
Recall that the conditional likelihood function p((7)|17, f) of i (7) in terms of real vectors
and matrices is given by (2.8). With the assumption of i.i.d. Rayleigh flat-fading channels
between the transmit and receive antennas, we have E [hh ] = p21n,,r and E al2] = I2n,,.n
The joint probability density function of the channel vector h is given as

p() = exp { h- 2fTl. (2.53)

We can average p((rT) |r, h) over all realizations of h to obtain the unconditional likelihood
function as

p(i(r)|r)

= /P((r()T,hfi)p(h)dhfi

= const x 1 exp { f(r)T (20 + PI r(7r) (2.54)
v/det(2C + (I) P

where we have used the integral result from Cramer [49, 11.12.1 a]. The natural logarithm of
p(f(r) 17) is the log-likelihood function:

ln[p(r(7T)7)] = const + -T.fT 2C + -I f(7). (2.55)
p2 )

By using the relationship between real and complex matrices [36], the log-likelihood function
can be written in terms of complex quantities as
1 .7T C 2 -1
ln[p(r(r)|r)] = const + -r(r) C + -I) r(r)*. (2.56)

Hence the ML estimator for the delay r is given by

Tn-, = argmaxp(r(r) T) = arg max r(r)T C + 2-I) r(r)* (2.57)

We assume that L is known to the receiver for the implementation of the ML estimator. We
note that the matrix C + -1 is always invertible. So unlike the restriction in Section 2.3, C can








be singular which implies the training signals from different transmit antennas can be correlated
with each other.
2.4.2 Cramer-Rao Bound
The CRB for the timing estimation based on the random channel model is summarized in
the following theorem.
Theorem 2.4.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the
training signals Sk(t) exist and they are uniformly continuous on [0. To]. Together with the
standard regularity conditions [36, 37], the Cramer-Rao bound for the estimation of the delay
r over the i.i. d Rayleigh flat-fading channel model is given by

CRB = 0(2.58)
2n, tr{D-1G}"

where D = F + I1 and the (i, j)th element of G is

G p2 (IT Sk(t)A*(t) dt T s'(t)s j (t) dt
k=1 0
+ p2 sk(t)d*(t) dt ss(t)sj(t) dt
2 k=1 / \JI )

+p ( f sk(t)(t) dt) ( s(t)j(t) dt (2.59)
2 k=1 Jo Jo 0
fori,j = 1,2,...,nt.

Proof To derive the CRB, we start from the log-likelihood function in (2.56):

ln[p(r(r)|r)] = const + {r()TC + i 2 r(r)







The second derivative of the log likelihood function ln[p(r(T)|Ir)] w.r.t. r is


021n[p(r(T) lT)]
0r2
2 r(r) T2+ ) -r(*
a2 2r I p2 -)_


1 a2r( rT
U2- 2


2 2 -1)
= tr{(Cp2yI,

tr{ (C + I)-
T2 1 p2 W


12 r (7) C
r{c


P -1
I2 r()


12 (


) 2 T
2 '2


The expectation of the above is

E{ 021 n[p(r(i)r)] }




+ 2 2 E2 aTO
1 DT2 Tr( -

+-itr{(C + 0) E[ r(r) ] }
+-T-2tr{ (C E [r(T)* a2rT]}


Write r =
aO-


r ,()T o9r .T ar, T where the ith block can be computed as


h*~s (t ) dt-
k=1 k r -


(t) s(t r)
o 19'


Or (T)* Or (T)
= hk [ f Sk(t s(t ) dt


T O s(t s- r) dt]
0o kt a) -5


Ht
k=1 I


+ n(t) -' _r dt

+ TO nH(t) asj dt .
0 aM


Recall that the channel gain vector h is assumed to have i.i.d complex circular symmetric Gaus-
sian elements, i.e., E[hkhH] = p2,Ir, E[hkh ] = 0, and E[hihy] = E[hih '] = 0, i j. Thus


2 )
+ I21


-1 2 ( )*
9r2


D- 19iT f


2 -1 *2 "*
+ I r(r)


9Or(r)
Di-


(2.60)


Then


(2.61)








we have


E ari(r)* ar (r)
EL Sr r J
2 (p2 s k(t ) s*(- dt
k=1
T s(t T) s (t -
+f. 2)T dt

2= p k (1) (-dt ( To
k=1 0 0 J) o


Ss (t -7) (t T) dt

In'

st (1),(1)di) + a2 fTo u((t) dl}Inr.
0


As a result, we have E N o J = P 0 In,, where the (i, j)th element of P is given by


pIj = p2 s ( J s(t)*(t) dt)
k=1 0


( O (t) j (t) dt) + -2 (t),(t) dt.


Similarly, we also have E --5 r (T) ] + E r(r)*d2 = Q I, where the (i,j)th
element of Q is given by


"' To
Qj = P2 Sk
k=1

k=1
for i,j = 1,2,...,nt.
Let D = r + 2I, then
;;


E { ln[p(r(T))]
&r2


sTo (
S0(t) S(t)dt) +
) 0


J *(t)s (t) dt
Jo


S(t)S:(t) dt) (

Sk(t)s*(t) dt) (


+- 2 JT


= -tr{(D s In,)-(P I)} + 1tr{(D In,) (Q 0 I)}

= tr (D I,,) P + Q) 0 In
= tr D-1 P + 1Q I,

2 n, x tr D-1P + Q (2.62)
0`22 1


where the second equality is obtained by using (A g B)-1 = A-1 0 B- 1 and the third equality
is obtained from the property (A 0 B) (C 0 D) = (AC) (BD) [50]. Let G = P + 1Q. Since

s (t 7)sj(t 7) dt = F = const.


fo sWgj(t)dt
To /


s*(t)gj(t)dt,








differentiating the left side twice w.r.t. r gives

2 J *(t)sj(t) dt + f (t)sj(t) dt + J s*(t)sj(t) dt = 0.

Then using the above equality, the (i, j)th element of G becomes

1
Gi = P + IQ

SpE Sk(t),s(t) dt s(st),j(t) dt
k=1

+12 P f s (t)g*(t)dt) ( fo s*(t)sj(t) dt)

+ P2 E k(t)s*(t) dt s (t)sj(t) dt (2.63)
k= / \Jo /

for i, j = 1, 2,..., nt. As a result, we note that G does not depend on the noise a2. The CRB
of the timing estimation is given as

1 12
CRB = i = (2.64)
E{ 'i--In!i''ro i ...- 1, 2n, tr{D -1G }



2.4.3 Optimal Training Scheme
In this section, we impose Assumptions 1 and 2 made in Section 2.3 on the form of the
training signals. With these two assumptions, Gij can be simplified to

G = 0P2 Haaa = -P2a H {l H
k=1 k=1
Hence we have G = 0bp2AHAAHA. Thus the CRB for the timing estimation can be simpli-
fied as given in the following corollary.
Corollary 2.4.1. Given Assumptions 1 and 2, the Cramer-Rao bound for the estimation of the
delay r over the i.i.d Rayleigh flat-fading channel model reduces to

CRB = (2.65)
2nraP2 tr { (VbAHA + _I) AHAAHA}









Moreover, in terms of the eigenvalues A1, A2, ..., An, of AHA, we have
,7 2 1 n, u
tr AA p2 A AA"A = A2 (2.66)
P i= 1b + 7

Thus the minimization of the CRB is equivalent to the following optimization problem:

max E, I

subject to E1 Ai < P

Ai > 0, i = 1,...,nt. (2.67)

It can be easily verified that the cost function Y i -j' is a convex function on (A1,..., An,).

Then the following theorem specifies the optimal training sequences [51].

Theorem 2.4.2. The CRB is minimized by choosing the training sequence matrix A such that
A = -, and A2 .. = An, = 0. That is, the optimal training sequences from different

transmit antennas are perfectly correlated.

We note that the rank of the optimal training sequence matrix A is 1. This implies that we

can choose an arbitrary subset of transmit antennas to transmit the training signals as long as the

training sequences from the chosen transmit antennas are perfectly correlated with each other.

A common choice is to use the same training sequence and evenly assign the power to each

transmit antenna. With the optimal choice of training sequences, the corresponding minimum

CRB is given by:
CRB =(1+ ). (2.68)
2na Pp2 ( 2
On the other hand, when orthogonal training signals are employed, i.e., A = = An, =

the CRB is maximized to the value

CRB =-- 1 + n (2.69)
2nr,'da Pp2 PP2

Contrary to the previous case of joint estimation of the channel and delay where orthogonal

training sequences are optimal, they are the worst in terms of the CRB value for estimating the

delay under the random channel model. Fig. 2.5 compares the CRBs of the system with the

perfectly correlated training sequences and that with the orthogonal training sequences. Note









that the CRB of the system with the perfectly correlated training sequences is the same for any

number of transmit antennas. We see that the performance gain achieved by the optimal scheme

is obvious when the SNR is low. For any fixed nt, the performance gap vanishes as the SNR

becomes sufficiently large.

In Fig. 2.6, we compare the MSE achieved by the ML estimator and the CRB given in

(2.68) for a system with two transmit antennas and employing perfectly correlated training

sequences. The perfect correlation is obtained by using the same training sequence and evenly

assign the power to each transmit antenna. As will be discussed in Section V.B, no knowledge of

signal-to-noise ratio is needed to implement the ML delay estimator for this choice of perfectly

correlated training sequences. We observe from the figure that for a reasonably small value of

nr, e.g. 4, and a reasonably high SNR, e.g. 20 dB, the CRB is a tight lower bound on the MSE

performance of the ML estimator. This together with the asymptotic achievability of the CRB

suggest that it is an appropriate performance metric.

2.5 Discussions and Conclusions

In the previous two sections, we have studied the problem of timing estimation in multiple-

antenna systems from two different approaches. In Section 2.3, the channel h is assumed to

be unknown but deterministic and joint ML estimation of h and the delay 7 is performed. In

contrast, in Section 2.4, we assume that the channel is random but with known statistics and use

the likelihood function averaged over all channel realizations to construct the ML estimator for

the delay T. These two approaches lead to two different optimal training signal designs. For the

deterministic channel approach, we see that orthogonal training sequences minimize the outage

probability as well as the average CRB. For the random channel approach, perfectly correlated

training sequences minimizes the CRB. Here we compare these two approaches in terms of the

resulting ML estimators, CRBs, and suitability of the outage and average CRB performance

measures.





































10-2
L)


0 5 10


SNR(dB)


Figure 2.5: Comparison of CRBs obtained using orthogonal training sequences and perfectly
correlated training sequences for different numbers of transmit antennas. Note that the CRB
of the system with the perfectly correlated training sequences is the same for any number of
transmit antennas.


















































20
SNR(dB)


Figure 2.6: Comparison of the MSE of the ML estimator obtained from simulation and the
CRB. The number of transmit antennas nt is 2. The unit in the vertical axis is T2.









2.5.1 Orthogonal Training Signals

When orthogonal training signals are employed, both the ML estimators of the delay r

under the deterministic and random channel approaches, respectively, reduce to

,m = argmax{r(r)Tr(r)*}. (2.70)

Thus the equal gain combiner for the received signals from the receive antennas is the ML

estimator for both approaches. Under the deterministic channel approach, the average CRB has

the value
Eh[CRB(h)] _) 2b. (2.71)
2 (nr p) P2

Under the random channel approach, the CRB has the value

CRB- b 21 + nt)or (2.72)
2na Pp2 ( Pp 2

As discussed before, the CRB in (2.72) is asymptotically achievable by the ML estimator when

the number of receive antennas goes to infinity. In addition, the limiting ratio between (2.71)

and (2.72), when nr approaches infinity, is 1- which is smaller than 1. This implies that
1+nt -
the average CRB in (2.71) is not achievable by the ML estimator asymptotically when n,. ap-

proaches infinity. On the other hand, for small values of nr, the value in (2.71) can be larger

than the value of (2.72) when the SNR is large enough. More precisely, this happens when

> nt(nrnt 1). Thus in this case, the average CRB in (2.71) actually gives a tighter bound
on the performance of the ML estimator. The simulation results in Fig. 2.4 are in agreement

with this observation.

In this sense, the average CRB may not be as good a performance measure as the outage

probability in the deterministic channel approach since the latter is asymptotically achievable,
starting at very small values of n,., by the ML estimator. However, for small values of n, and at

high SNR, the average CRB may still be a reasonable performance metric.
2.5.2 Perfectly Correlated Training Signals

Under the random channel approach employing perfectly correlated training signals, we

have AHA = P qqT where q is an arbitrary nt x 1 vector with qTq = nt. For instance,









q = [1, 1,..., I]T when we use the same training sequence and evenly assign the power to each

transmit antenna. By using the matrix inversion formula, the ML delay estimator for this choice

of perfectly correlated sequences is reduced to be exactly the same as the one for orthogonal

training sequences given in (2.70). We note that the knowledge of the SNR is not needed

to implement the above ML estimator. Comparing the results in Figs. 2.4 and 2.6, the MSE

obtained by the ML estimator with the perfectly correlated training sequences is smaller than

that obtained by the ML estimator with orthogonal training sequences for all cases considered in

the simulation studies. This observation is in agreement with the training sequence optimization

result based on the CRB that the perfectly correlated sequences are superior than the orthogonal

sequences under the random channel approach.

In general, the SNR information is needed to implement the ML estimator. We also note

that perfectly correlated training signals are not applicable in the deterministic channel approach

since they cannot be used to estimate the channel vector h.

2.5.3 Deterministic vs Random Channel Approaches

The results and discussions in the previous sections provide some guidelines of whether

to use the deterministic or random channel approaches in estimating the timing parameter. If

the design consideration is the outage probability, i.e., neglecting a small percentage of the

worst-case channel realizations, one would employ the deterministic channel approach with

orthogonal training signals. On the other hand, if the average estimation (over all channel

realizations) error is the main design criterion, one would employ the random channel approach

with perfectly correlated training signals. We note that the perfectly correlated training signals

cannot be used for channel estimation. Thus they may be more suitable for space-time coding

schemes that do not require the channel information. In addition, the advantage of the perfectly

correlated training signals over orthogonal signals vanishes at high SNR in the random channel

approach. Thus when the number of transmit antennas is not very large and at high SNR, one

could employ orthogonal training signals for either of the two approaches.














CHAPTER 3
CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS WITH COLORED
INTERFERENCE

3.1 Introduction

Many multiple antenna communication systems are designed for coherent detection that

requires channel state information (CSI) in the demodulation process. For practical wireless

communication systems, it is common that the channel parameters are estimated by sending

known training symbols to the receiver. The performance of this kind of training-based chan-

nel estimation scheme depends on the design of training signals which has been extensively

investigated in the literature.

It is well known that imperfect knowledge of the channel has a detrimental effect on the

achievable rate it can sustain [52]. Training sequences can be designed based on information

theoretic metrics such as the ergodic capacity and outage capacity of a MIMO channel [42,

53, 54]. The mean square error (MSE) is another commonly used performance measure for

channel estimation. Many works [55-65] have be carried out to investigate the training sequence

design problem based on MSE for MIMO fading channels. In Wong et al. [61], the authors

studied the problem of training sequence design for multiple-antenna systems over flat fading

MIMO channels in the presence of colored interference. The MIMO channels are assumed to

be spatially white, i.e., there is no correlation among the transmit and receiver antennas. The

optimal training sequences were designed to minimize the channel estimation MSE under a total

transmit power constraint. The optimal training sequence design result implied that we should

intentionally assign transmit power to the subspace with less interference. A practical algorithm

of estimating the long-term second-order statistics of the interference correlation matrix and

an efficient scheme of feeding back necessary information to the transmitter for constructing

the optimal training sequences were also proposed. In Kotecha et al. [62], the problem of

transmit signal design was investigated for the estimation of spatial correlated MIMO Rayleigh

flat fading channels. The optimal training signal was designed to optimize two criteria: the
42









minimization of the channel estimation MSE and the maximization of the conditional mutual

information (CMI) between the channel and the received signal. The authors adopted the virtual

channel representation model [66] for MIMO correlated channels. It was shown that the optimal

training signal should be transmitted along the strong transmit eigen-directions in which more

scatters are present. The powers transmitted along these eigen-directions are determined by

the water-filling solutions based on the minimum MSE and maximum CMI criteria. In Cai

et al. [65], the space-time spreading scheme, block coding scheme and channel estimation

for correlated fading channels in the presence of interference have been studied. The authors

focused on the single receive antenna case and extended their results to the multiple receive

antennas case where receive antennas were assumed to be uncorrelated. Based on the previous

optimization results for the special case [63] where there was no interference, the space-time

beamforming (STBF) matrix was chosen as the training symbol matrix for the linear MMSE

channel estimator. Then the optimal power loading scheme was designed for the training symbol

matrices in this particular set.

In this chapter, we investigate the problem of estimating correlated MIMO channels with

colored interference. We adopt the correlated MIMO channel model [21, 67] which expresses

the channel matrix as a product of the receive correlation matrix, a white matrix with identically

and independent distributed (i.i.d.) entries, and the transmit correlation matrix. This model im-

plies that transmit and receiver correlation can be separated. This fact has been verified by field

measurements. The colored interference model used here is more suitable than the white noise

model when jamming signals and/or co-channel interference are present in the wireless com-

munication system. We consider an interference limited wireless communication system, and

assume that the thermal noise is small relative to interference and can be ignored. Then we show

that the covariance matrix of the interference has a Kronecker product form which implies that

the temporal and spatial correlation of the interference are separable. The channel estimation

MSE is used as a performance metric for the design of training sequences. The optimization

problem encountered here which minimizes the channel estimation MSE under a power con-

straint is a generalization of two previous optimization problems which are encountered widely









in the signal processing area [61, 63, 64, 68]. We first analyze the optimal structure of the solu-

tion by using the Lagrangian method, and then find the optimal power allocation scheme which

has the water-filling interpretation. Finally we determine the optimal ordering for the related

eigenvector matrices. In Cai et al. [65], the authors encountered the essentially same optimiza-

tion problem but with the different form. Based on the the previous optimization results for the

special case [63], the authors chose to optimize the training sequence matrix in a particular set

of matrices which have the same solution structure and eigenvector ordering as our solution.

Here we rigorously prove that this particular solution structure and eigenvector ordering result

are optimal for arbitrary matrices with the power constraint. The design of the optimal training

sequences has a clear physical interpretation which implies that we should assign more power to

the transmission direction constructed by the eigen-direction with larger channel gains and the

interference subspace with less interference. In order to implement the channel estimator and

construct the optimal training sequences, we propose an algorithm to estimate long-term chan-

nel statistics and design an efficient feedback scheme so that we can approximately construct

the optimal sequences at the transmitter. Numerical results show that with the optimal training

sequences, the channel estimation MSE can be reduced substantially when compared with the

use of other training sequences.

The chapter is organized in the following manner. The system model and linear MMSE

channel estimator that we consider are introduced in Section 3.2. In Section 3.3, The optimal

training sequence is designed based on minimizing the total channel estimation MSE. In Section

3.4, an algorithm for the estimation of long-term characteristics of the channel is proposed

and an efficient feedback scheme is designed. Numerical results are provided in Section 3.5.

Conclusion is drawn in Section 3.6.
3.2 System Model

We consider a single user link with multiple interferers. The desired user has nt transmit

antennas and n, receive antennas. We assume that there are M interfering signals and the ith

interferer has n, transmit antennas. The MIMO channel is assumed to be quasi-static (block

fading) in that it varies slowly enough to be considered invariant over a block. However, the

channel changes to independent values from block to block. We assume that the users employ









a frame-based transmission protocol which comprises training and payload data. The received

baseband signals at the receive antennas during the training period are given in matrix form by
M M
Y = HST+ HiST = HST+E = HST + E,. (3.1)

E

The n, x n.t matrix H and the nr x ni matrix Hi are the channel gain matrices from the transmitter

and the ith interferer to the receiver, respectively. S is the N x nt training symbol matrix known

to the receiver for estimating the channel gain matrix H of the desired user during the training

period. N is the number of training symbols from each transmit antenna and N is usually

much larger than nt. S, is the N x ni interference symbol matrix from the ith interferer. We

assume that the elements in S, are identically distributed zero-mean complex random variables,

correlated across both space and time. The interference processes are assumed to be wide-sense

stationary in time. We consider an interference limited wireless communication system. Hence

we assume that the thermal noise is small relative to interference and can be ignored [69].

We adopt the correlated MIMO channel model [21, 67] which models the channel gain

matrix H as:
H = R2H,, R1/2 (3.2)

where Rt models the correlation between the transmit antennas and Rr models the correlation

between the receive antennas, respectively. The notation (.)1/2 stands for the Hermitian square

root of a matrix. H,, is a matrix whose elements are independent and identical distributed zero-

mean circular-symmetric complex Gaussian random variables with unit variance. Let h,, =

vec(H,), where vec(X) is the vector obtained by stacking the columns of X on top of each

other, then we have

h = vec(H) = (Rt/ 0 R )h,,, (3.3)

with h ~ C.N(0, R, R,) where CAf denotes complex Gaussian distribution. Similarly, we

model the channel gain matrix from the ith interferer to the receiver as:


H, = R'/2H,iR (/2


(3.4)









and

hi = vec(Hi) = (R/02 R/2)hw. (3.5)

Using the vec operator, we can write the received signal in vector form as


y = vec(Y)
M
= (S 0 In,)vec(H) + -(S 0 In,)vec(Hz)
i=1
M
= (S In,)h + Z(S 0 I,)(R2 R2)h
i=1
M
= (S0In,)h+ ee
i=1
= (S In,)h + e (3.6)


where In, denotes the nr x n,. identity matrix.

To derive the linear MMSE channel estimator, we need the following lemma.

Lemma 3.2.1. E(e) = 0 and the covariance matrix of e is

M
E(eeH) = QNi Rr = QN 0 R, (3.7)
i=1

where
SRk(O) ... R,k(- 1)
QN = : (3.8)

E= RZk (N 1) k=, R1Ik(0)
and R,k (r) represents the time correlation between the signals at time instants m and m + T

from the kth antenna of the ith interferer

Proof Since h,i ~- CaV(0, I.,n,), E(ei) = 0. Then we have E(e) = 0.

The received signal from the ith interferer can be written as


E. = HiS, = R/2H,, R/2S. = R1/2H, S. (3.9)


Since Si is wide-sense stationary in time, S, is also wide-sense stationary in time.








Using the vec operator, we can rewrite the interfering signal from the ith interferer as


ej = vec(Ei) = (IN 0 R'/2)vec(H ,,S,).


(3.10)


The covariance matrix of e, is given as


E(eefH) = E[(IN 0 R'/2)vec(H,,S,)vec(Hw,,S)H(IN 0 R/2)H]

= (I R1/2)E[vec(HewS )vec(HS)H](IN RN 2).

Let e' = vec(HwiSi), we can show that the covariance matrix of e is

k,k(0)I, ... k= R,k(N 1)I
E[eeH] : ".

JR 1 Rk(N 1)I .. E 1= Rk(0)Ir
= QN In,,


(3.11)


(3.12)


where Rk,k (r) represents the time correlation between the signals at time instants m and m + T

from the kth antenna of the ith interferer. Then we have

E[eie'l] = (IN 0 R'/2)(QNi 0 In,)(IN 0 R1/2)


(3.13)


= QNi Rr..


The covariance matrix of e is then given as


M
E[eeH] = Z QN 0 R, = Q .
i=1


(3.14)


We note that QN captures the temporal correlation of the interference and R, represents

the spatial correlation. The covariance matrix of the interference has a Kronecker product form
which implies that the temporal and spatial correlation of the interference are separable.








We notice that (3.6) represents a linear model. Based on the Bayesian Gauss-Markov
Theorem [36], the linear minimum mean square error estimator (LMMSE) for h is given as:

h = [(S" 3 In,)(Q O R,)-'(S I,) + (Rt R,)-']-1(S 0 In.,)(QN O R,)-ly
= [(S1HQNIS + R7')-1 R,](SH Inr)(QN 3 Rr 1)y

= [(SHQ NS + Rtl)- SHQN' 0 I ]Jy. (3.15)

Using the equality vec(AYB) = (BT 0 A)vec(Y), we can rewrite the channel estimator in the
compact matrix form as
fH = YQ-1S(SHQ NS + Rt )-'. (3.16)

Hence the channel estimator does not depend on the receive channel correlation matrix Rr.
The performance of the channel estimator is measured by the estimation error e = h h
whose mean is zero and whose covariance matrix is

C, = E[(h h)(h h)"]
= [(SH 0 I.,)(QN Rr) (S 0 In,) + (Rt 0 R,)-]-

= [(SHQN)S) ( Rr1 + R '-1 9 R,-]-1

= [(S"QNS + Rt) Rr-1
= (SHQ NS + Rt 1)-' 3 R, (3.17)

where the third equality is due to (A B)(C D) = AC BD and (A B)- = A-' B-1.
The diagonal elements of the error covariance matrix C, yields the minimum Bayesian MSE.
The total MSE is the commonly used performance measure for MIMO channel estimation. By
using the fact that tr(A 0 B) = trAtrB, we have

tr(C,) = tr((SHQN'S + R '1)-' s R,) = tr((SHQN'S + Rt-')-)tr(Rr).

Thus the minimization of the total MSE over training sequences does not depend on the receive
channel correlation matrix. Only the temporal interference correlation matrix QN and the trans-
mit correlation matrix Rt need to be considered in obtaining the optimal training sequences.









3.3 Optimal Training Sequence Design

In this section, we investigate the problem of optimal training sequence design for channel

estimation. With the total Bayesian MSE as the performance measure, the optimization of

training sequences can be formulated as follows

min tr(SHQN S + Rt-1)-1 (3.18)
S
subject to tr{SHS} < P

where tr{SHS} < P specifies the power constraint.

The optimization problem itself is of great interest to researchers in the signal processing

and communication areas. Its special cases (with either QN or Rt equal to the identity matrix)

have been encountered widely in joint linear transmitter-receiver design [63, 68, 70], training

sequence design for channel estimation in multiple antenna communication systems [61, 64],

and spreading sequence optimization for code division multiple access (CDMA) communication

systems [71].

The solution in the special case Rt = I, found for example in Wong et al. [61] and

Scaglione et al. [68], can be expressed in terms of the eigenvalues and eigenvectors of QN and

a Lagrange multiplier associated with the power constraint. Similarly, the solution in the special

case QN = I, found for example in Zhou et al. [63] and Biguesh et al. [64], can be expressed in

terms of the eigenvalues and eigenvectors of Rt and a Lagrange multiplier associated with the

power constraint. The optimization of the generalized mean square error problem introduced

here is more difficult. We will show that (3.18) has a solution that can be expressed S = UEVH

where U and V are orthonormal matrices of eigenvectors for QN and Rt respectively, and E

is diagonal. Solving (3.18) involves computing diagonalizations of QN and Rt, and finding an

ordering for the columns of U and V. In Cai et al. [65], the authors encountered the essentially

same optimization problem but with the different form. Based on the the previous optimization

results for the special case [63], the authors chose to optimize the training sequence matrix in

a particular set of matrices which have the same solution structure and eigenvector ordering as

our solution. Here we rigorously prove that this particular solution structure and eigenvector

ordering result are optimal for arbitrary matrices with the power constraint.









A related optimization problem which minimizes the trace of the mean square error matrix

in a variant form is discussed in Section 3.7.1., and another optimization problem which max-

imizes the determinant of the inverse of the mean square error matrix is introduced in Section

3.7.2.

We solve the optimization problem (3.18) in three steps. First, we analyze the optimal

structure of the solution by using the Lagrangian method, then find the optimal power allocation

scheme, and finally determine the optimal ordering for the related eigenvector matrices.
3.3.1 Solution Structure

We begin by analyzing the structure of an optimal solution to (3.18). Let UAUH and

VAVH be diagonalizations of QN and Rt where the columns of U and V are orthonormal

eigenvectors. Let Aj, 1 < j < N, and 6~, 1 < i < nt, denote the diagonal elements of A and

A, respectively. We assume that the eigenvalues { A, } are arranged in increasing order, and {5,}

are arranged in decreasing order:

062> ... -> n, > 0. (3.19)

Let us define
T = UHSV. (3.20)

Substituting S = UTVH in (3.18) gives the following equivalent optimization problem:

min tr (THA -1T+ A-)-1 (3.21)

subject to tr (THT) < P, T E CNxt.


We now show that the solution to (3.21) has at most one nonzero in each row and column.

Theorem 3.3.1. There exists a solution of (3.21) of the form T = IIH1II2 where IIi and 112

are permutation matrices and oai = 0 for all i $ j.

Proof. We first prove the theorem under the following nondegeneracy assumption:


6 7 8j > 0 and A. $ A, > 0 for all i # j.


(3.22)








Since the cost function of (3.21) is a continuous function of A and A, and since any A > 0 and
6 > 0 can be approximated arbitrarily closely by vectors 6 and A satisfying the nondegeneracy
conditions (3.22), we conclude that the theorem holds for arbitrary A > 0 and 6 > 0.
There exists an optimal solution of (3.21) since the feasible set is compact and the cost
function is a continuous function of T. Since the eigenvalues of A T"A- TA are nonneg-
ative, it follows that for any choice of T,

tr (THA-T + A-)-1 trA(ATHA-1TA +I)-1 < tr (A),

with equality when T = 0. Hence, there exists a nonzero optimal solution of (3.21), which is
denoted T. According to the Lagrange multiplier theorem, the first-order necessary condition
for an optimal solution is the following: there exists a scalar 7 > 0 such that:

tr ((THA-1T + A-)- + 7THT)T= = 0. (3.23)

For notation simplicity, let
M = THA-T + A-1. (3.24)

For any invertible matrix M, the derivative of the inverse of a matrix [72] is given as:

dM-a MIdM)M-
=-M- M-
dT

Hence, (3.23) is equivalent to:

tr (7[TH6T + 6THT] M-THA-16T + 6THA- M-1) = 0

for all matrices 6T e CNxIt.
Let Real (z) denote the real part of z E C. Based on the fact that tr (A + AH) =
2(Real [tr (A)]) and tr (AB) = tr (BA), we have

Real [tr (yH6T M-2THA-16T)] = 0.

By taking 6T either pure real or pure imaginary, we deduce that


tr ([-'"H M-2THA-1]6T) = 0









for all 6T. By choosing 6T to be completely zero except for a single nonzero entry, we conclude
that
7TT" M-'T"A-1 = 0. (3.25)

If 7 = 0, then T = 0 since both A and A are invertible. Hence, 7 > 0.
We multiply (3.25) on the right by T to obtain

TT = M-2THA-'T = (THA-T + A-1) -2HA-1T (3.26)


Since THT is Hermitian, we have

(THA -T + A-1)-2THA-l = THA-T(THA-1 + A-1)-2.

Then we will show that THA-lT and A-1 commute with each other. We need the following

lemma [73]:

Lemma 3.3.1. If A and B are diagonalizable, they share the same eigenvector matrix if and

only ifAB = BA.

For the simplicity of notations, let A = T"A-'T and B = A-1. Then we have

(A + B)-2A = A(A + B)-2


According to Lemma 3.3.1, A and (A + B)-2 share the same eigenvector matrix. Since A + B

and (A + B)-2 have the same eigenvector matrix, A and A + B share the same eigenvector
matrix. Then we have
A(A+ B) = (A +B)A

Hence,
AB = BA,

which implies that THAl-T and A-1 commute with each other. Since A-1 is diagonal,
TH A-1T is diagonal. Since THA -T is diagonal, THT is diagonal by (3.26).

Since THA-"I' and A-1 are diagonal, both M and M-1 are diagonal. Hence, the factor
M-2 in (3.25) is diagonal with real diagonal elements denoted ej, 1 < j < ntt. By (3.25), we









have
7 j = e (3.27)
Ai

If lj, 7 0, then (3.27) implies that
ej
-- =7- 10.
Ai

By the nondegeneracy condition (3.22), no two diagonal elements of A are equal. If for any

fixed j, Tij $ 0 for i = i1 and i2, then the identity = -y7 yields a contradiction since 7 $ 0

and Ai, = Ai,. Hence, each column of T has at most one nonzero. Since THT is diagonal,

two different columns cannot have their single nonzero in the same row. This implies that each

column and each row of T have at most one nonzero. A suitable permutation of the rows and

columns of T gives a diagonal matrix E, which completes the proof. E

Combining the relationship (3.20) between T and S and Theorem 3.3.1, we conclude that

problem (3.18) has a solution of the form S = UII1EIH2VH, where Hi and H2 are permutation

matrices. We will show that we can eliminate one of these two permutation matrices.

Substituting S = UII NEH2VH in (3.18), the equivalent optimization problem is obtained

as:

min tr E(HHA-i + I12A-I 2 (3.28)
E,n1i.n2 \ /
M
subject to tr C 2o < P
i=1

where M represents the minimum of N and nt. In the above optimization problem, the mini-

mization is over diagonal matrices E with a, .. ., O-M as the diagonal elements, and two per-

mutation matrices HI and 1.2-

Since the symmetric permutations IIA-1f1I and IIHA-1H2H essentially interchange di-

agonal elements of A and A, (3.28) is equivalent to
M
mirn 2 (3.29)
l ^ aO/Ar(,) + 1/d7,2()
M
subject to o~ <- P, Tri E PN, X72 E P.,
i= 1


where PN is the set of bijections of {1, 2,..., N} onto itself.








We will now show that the optimal solution only depends on the smallest eigenvalues of

QN and the largest eigenvalues of Rt.
Lemma 3.3.2. Let UAUH and VAVH be diagonalizations of Q and D respectively where

the columns of U and V are orthonormal eigenvectors. Let a, 7ir, and 7r2 denote an optimal

solution of (3.29) and define the sets

M ={i: > 0}, Q={A,,(L):ieM}, and = {16,(j) :ie M},

If M has I elements, then the elements of the set Q constitute the I smallest eigenvalues of QN,

and the elements of R constitute the I largest eigenvalues Rt, respectively.

Proof Assume k M and A7r(k) < Ail(i) for some i E M. It is easy to see that by inter-

changing the values of 7r, (i) and 7r (k), the new i-th term in the cost function is smaller than the

previous i-th term. It contradicts the optimal assumption of a and 7r. Then Ar,(k) > A1(i).
Then, suppose that k M and (62(k) > 52(i) for some i 6 M. Let C denote the cost

value due to the sum of the i-th term and the k-th term before the interchange. Similarly, let C+

denote the cost value due to the sum of the i-th term and the k-th term after the interchange of

the values of 72(i) and 7r2(k). We have

1
C = a2/A() + /(+ 2(k)

and
C+ 1 + 0)2(I)
S-2/A71(i) + 1/6 2(k)
Since 6,:(k) > 62 (i),

C+ C
( 2(k) 4- 2())(42(k) 7r2(i)/A() W i2+2(k)/A r + j 2()/ ))
(0' 232(k)/ / A(7) + 1(r 7M/A7,it) + 1)
< 0.

The cost is reduced by interchanging the values of r2 (i) and 7r2(k), which violates the optimality
of ao- and 7r. Hence, 652(k) < 42(0). O









Using Lemma 3.3.2, we now show that one of the permutations in (3.29) can be deleted if

the eigenvalues of QN and Rt are arranged in a particular order.

Theorem 3.3.2. Let UAUH and VAV1H be diagonalizations of QN and Rt respectively where

the columns of U and V are orthonormal eigenvectors, the eigenvalues of QN are arranged in

increasing order and the eigenvalues of Rt are arranged in decreasing order If M is the

minimum of the rank of QN and R,, then (3.29) is equivalent to
M
min 1 (3.30)
1 2/Ai) + 1/6i
M
subject to cr,! < P, 7r C PM,
i=1
where cr = 0 for i > M.

Proof Since at most M eigenvalues of either QN or Rt are nonzero, it follows from Lemma

3.3.2 that the set M has at most M elements. Since the elements of Q are the smallest eigen-

values of Q and the elements of R are the largest eigenvalues of Rt, we can assume that

7ri(i) e [1, M] and 72(i) e [1, M] for each i e M. Hence, we restrict the sum in (3.29) to

those indices i S where
S = { j): 1 < j< M}.

Let us define = o-r Ij) and 7r(j) = t7r21(j)). Since 7r(j) E [1, Al] for j E [1, M], it

follows that rt PM. In (3.29) we restrict the summation to i E S and we replace i by 7F2 '(j)

to obtain
M M
1 where ( \ )2 < p
E / + 1/.2() aj/A(j) + 1/ ',
iES j=1 i=1

This completes the proof of (3.30). E

Combining the relationship (3.20) between T and S, Theorem 3.3.1 and Theorem 3.3.2

yields the following corollary:

Corollary 3.3.1. Problem (3.18) has a solution of the form S = UIIEVH where the columns

of U and V are orthonormal eigenvectors of QN and Rt respectively with the eigenvalues of









QN arranged in increasing order and the eigenvalues of Rt arranged in decreasing order, H is
a permutation matrix, and E is diagonal.

Proof Let a and 7r be a solution of (3.30). For i > M, define 7r(i) = i and aC = 0. If H is the

permutation matrix corresponding to 7r, then making a substitution S = UHIIVH in the cost

function of (3.18) yields the cost function in (3.30). Since (3.29) and (3.30) are equivalent by

Theorem 3.3.2, S is optimal in (3.18). E

3.3.2 The Optimal E

We now consider the optimization problem which minimizes the cost function over a with

the permutation 7r in (3.30) given. Then in the next subsection, we will find the optimatial

permutation 7r based on the solution to the optimization problem considered here. For the sake

of notation simplicity, let pi denote 1/AX(i) and q, denote 1/6i. Hence, for fixed 7r, (3.30) is

equivalent to the following optimization problem:

mmin (3.31)
a Pi + qi
i=1
M
subject to aE < P.
i=1

The solution of (3.31) can be expressed in terms of a Lagrange multiplier related to the power
constraint. The structure of this solution has a water filling interpretation in the communication

literature [74].

Theorem 3.3.3. The optimal solution of (3.31) is given by

S 1 q 1/2
ai = max { -i, 0 (3.32)
IPiP Pi

where the parameter p is chosen so that
M
:o = P. (3.33)


Proof. Since the minimization of the cost function in (3.31) is over a closed and bounded set,

there exists a solution. At an optimal solution to (3.31), the power constraint must be an equality.

Otherwise, we can multiply a by a scalar larger than 1 to reduce to the value of the cost function.
For the sake of notation simplicity, let ti = oa. Then the reduced optimization problem (3.31)









is equivalent to
M
min 1 (3.34)
t Z pit + qi
M
subject to t = P, t > 0.
i=1

Since the cost function is strictly convex and the constraint is convex, the optimal solution to

(3.34) is unique.

According to the Lagrange multiplier theorem, the first-order necessary conditions [51]

(Karush-Kuhn-Tucker conditions) for an optimal solution of (3.34) are the following: there

exists a scalar p > 0 and a vector v e IRM such that

S + p- vi = 0, v 0, ti > 0, vt = 0, 1 < i < M. (3.35)
(piti + qi)2

Due to the convexity of the cost and the constraint, any solution of these conditions is the unique

optimal solution of (3.34).

A solution to (3.35) can be obtained as follows. We define the function

(71q)= 1 (3.36)
\. Pi' Pi)(1

Here x+ = max{x, 0}. This particular value for ti is obtained by setting v, = 0 in (3.35) and

solving for ti; when the solution is < 0, we set ti(p) = 0 (this corresponds to the + operator

(3.36)). We note that t,(/p) is a decreasing function of p which approaches +00 as p approaches

0 and which approaches 0 as p grows to +oo. Hence, the equation
M
E ti(.) = P (3.37)
i=1

has a unique positive solution. Since ti(pi/q<2) = 0, we have ti(/) = 0 for p> p/qg2. Then we

have
A +. -=- +.p- > 0 for p > p./q2.
(piti(p) + qi)2 q2
We deduce that the Karush-Kuhn-Tucker conditions can be satisfied when p is the positive

solution of (3.37). E









3.3.3 Optimal Eigenvector Ordering

Finally, we need to find an optimal permutation in (3.30), i.e., an optimal ordering for the

eigenvalues of QN and Rt.

Theorem 3.3.4. If the eigenvalues {Ai} of QN are arranged in increasing order and the eigen-

values {6i } of Rt are arranged in decreasing order, then an optimal permutation in (3.30) is

r(i) = i, 1 < i < M. (3.38)


Proof Assume that there exist indices i and j such that i < j, A, > Aj and 6, > 6j, i.e., pi < pj

and qi < qj. A) and A are not arranged in the supposed optimal order for the eigenvalues of

QN. We will show that it will cause contradiction.
Let us consider the following optimization problem:

min + (3.39)
titj pti + qi pjtj + qj
subject to ti + tj = P, ti > 0, tj > 0,


where P = a' + o-J. Since a yields an optimal solution of (3.30), it follows that a solution of

the above optimization problem is ti = a2 and tj = aoj. Based on Theorem 3.3.3, the ti is given

as
t()= -qi, (3.40)
V Pit Pi
where p is a Lagrange multiplier obtained from the power constraint ti + tj = P as:
1 + 1
7-fi vpi ,p (3.41)
=p+ q (3.41)
Pi Pj

Let C denote the cost function for (3.39). Combining (3.40) and (3.41) gives
1 + 1 1 )2
C + p
piti + qi pjtj + qj P + +

Now, suppose that we interchange the values of pi and pj. Let C+ denote the cost value

associated with the interchange. With the assumption that the optimal solution of (3.39) is still









positive after the exchange of p, and p3, we have

( + 1 )2
C+ _= (; + v7) (3.42)
Pi Pj

We need to use the following lemma [40]:

Lemma 3.3.3. If ai, bi, i = 1,..., n are two sets of numbers,
n n n
jajbj.-i+i] E aibi < jaa[>]b[,]
i=1 i=1 i=1

From the above lemma, we have q- + l > 1 + 21. Then C+ < C.
Pi Pj Pi Pj
If C+ < C, it contradicts the optimality of u. Then we have C+ = C. Hence, for each i

and j with i < j, pi < pj and q, < qj, we can interchange the values of pi and pj to obtain a new

permutation with the same value for the cost function. After the interchange, we have pi > pj,

i.e., A, < Aj. In this way, the A, are arranged in increasing order. Since the 6i are arranged in

decreasing order, we conclude that the associated optimal permutation 7t is (3.38).

One technical point must now be checked: we should verify that if pi < pj and q, < qj

with i < j, and if we exchange pi and pj, then the corresponding optimal solution of (3.39)

remains positive.

To check it, we consider two cases respectively. For the first case, suppose ok > 0 with

i < j < k, Pk < Pi < Pj and q, < qj < qk. From (3.40), we have

o() = 1[ qk >0,
V Pk Pk

Then
1 qk


After the exchange,
1 p + i


1 q- 1 q 1 qk
l_{lp >, >_ V P, ) ) > 0.
PJ ( "3 I+)=3 f VP









Similarly,

1 q3 1 1 1 1 qk


For the second case, suppose j = max ()) and pi = min(Q),pi < pj and qg < q.

Since the original solution, before the exchange, is positive, it follows from (3.40) and

(3.41) that
P > q j and P > q q. (3.43)
Pt Pj V PiP, Pi
After the exchange, the analogous inequalities that must be satisfied to preserve nonnegativity

are
p > qj qi (3.44)
\fPip, Pj
and
p > q qj. (3.45)
VPP4', Pi
(3.45) is satisfied from (3.43) and the fact that pi < pj and qg < qj. If (3.44) is also satisfied, the

proof is completed.

If (3.44) is not satisfied, i.e., P < the associated cost after the exchange is

1 1
C* +-
p P + qg qj

where t = P and t, = 0. We will show that C* < C with P > -q P > --

and P < -k- -_ Letting C* < C gives
-- (i-ptp Pj

1 1 +1 )2
+ < .
PPq P P

Multiplying both sides of the above inequality with (pjP + qi)qj(P + ', + -) gives
Pi Pj

q (p + q + + ) + (p + )(+ + ) < 1 + 1
P +P p + Pt + )2(pip + q-)qj.

After considerable algebra on the above inequality, we find that to show C* < C is equivalent

to show that f(P) < 0 with

f(P) = pP2 + (q, + qj + Pj q p qj 2 qp ( q- )2, (3.46)
PA At vv v,1









when P (max[ ], -). Since
p ,p j pj) "',P-J j ]Pp .)"Sin e

f( )=(pj + 1)(q q)( q qj -) < 0







when --- > q and




we have C* < C. i

Combining Corollary 3.3.1, Theorem 3.3.3 and Theorem 3.3.4, we conclude that the opti-

mal training sequences should be designed according to the following theorem.

Theorem 3.3.5. Let UAUH and VAV" be the diagonalizations of QN and Rt respectively

where the columns of U and V are orthonormal eigenvectors, the corresponding eigenvalues

{ A } are arranged in increasing order, and {5, } are arranged in decreasing order Then the
optimal solution of (3.18) is given by

S = Uq V (3.47)


where m specifies the power allocation which is diagonal with diagonal elements given by





and F, = O for i > nt, with the parameter pi is chosen so that
nt

Si=
With the optimal training sequence, the channel estimator simplifies to


fl = YUr,, rVH,


(3.49)









where r = diag{yi, ... ., 7-, with y = the columns of U, are the eigenvectors of QN

corresponding to its nt smallest eigenvalues, and the columns of V, are the eigenvectors of Rt.

The design of the optimal training sequences summarized in the above theorem has a clear

physical interpretation. Each eigenvector of the transmit correlation matrix Rt represents the

transmit eigen-direction and the associated eigenvalue indicates the channel gain in that eigen-

direction. More power should be assigned to the signals transmitted along the eigen-direction

with larger channel gains. On the other hand, each eigenvector of the interference temporal

correlation matrix QN represents the interference subspace and the corresponding eigenvalue

indicates the amount of interference in that subspace. Hence, we should choose the subspaces

with the least amount of interference for transmission. To facilitate the understanding of the

physical meaning of optimal training sequences, we can rewrite them in an alternative way as
nt

i=1

where u, are orthonormal eigenvectors of QN with the corresponding eigenvalues arranged in an

increasing order and vi are orthonormal eigenvectors of Rt with the corresponding eigenvalues

arranged in a decreasing order. The vectors ui and vi form transmission directions in time

and space, respectively. The above theorem implies that the optimal training sequence design

put more power to the transmission direction constructed by the eigen-directions with larger

channel gains and the interference subspaces with less interference. The power assignment is

determined by the water-filling argument under a finite power constraint.

3.4 Estimation of Channel Statistics and Feedback Design

To implement the channel estimator and construct the optimal training sequences for chan-

nel estimation, we need the knowledge of the transmit antenna correlation matrix Rt and the

interference covariance matrix QN at both the receiver and transmitter sides. Since these two

matrices are long-term channel characteristics, they can be estimated by using the observed

training signals at the receiver end and then fed back to the transmitter end for the construc-

tion of the optimal training sequences. In this section, we propose an algorithm to estimate

these long-term channel characteristics and design an efficient feedback scheme so that we can

approximately construct the optimal training sequences at the transmitter end.








Let us assume that the training signal matrix S is sent over a block of K packets. The
received training signals for the nth packet are given as

y(n) = (S In,)h(n) + e(n)

= (S I.,)(R/2 R1/2)h(n) + e(n)

= (SR/2 0 R1/2)h,,(n) + e(n). (3.50)

We can calculate the sample average correlation matrix of the received signal from the previous
K packets as follows:
K
R= y(n)y(n). (3.51)
n= 1
It is easy to see that R is the sufficient statistics for the estimation of the second-order correlation
matrices Rt and QN if e(n) is Gaussian distributed.
We can show that the correlation matrix of the received signal has the Kronecker product
form:

R = E(y(n)y(n)H)

= (SR'/2 R1/2)E(h,,(n)h,(n)H)(R/2SH 0 R1/2) + E(e(n)e(n)H)

= (SRtSH) 0 Rr + QN o Rr

= (SRtSH + QN) 0 Rr

= RqoR,. (3.52)

where Rq = SRtSH + QN IfR Rq Rr, then R = aRq 0 1Rr for any a : 0. Hence, Rq
and R, can not be uniquely identified from observing y(n). Fortunately, the channel estimator
and the design of optimal sequences are invariant to scaling of the estimates of Rt and QN. This
can be explained as follows:

I'(n) = Y(n)(aQN)-S(SH(aQN)-lS + (aRt)-l)-1

= Y(n)QNIS(SHQNS + Rt1)-1

= H(n)









and

tr((SH(QQN)-lS + (aR,)-')-1 = atr((SHQNIS + R 1)-T).

We notice that the new cost function of the optimization problem is just a scaled version of the

original cost function.

For the estimation of Rq and Rr, we need to impose an additional constraint on Rr. Here

we force tr(R,) = nr. Then an iterative flip-flop algorithm [75, 76, 77] can be used to estimate

Rq and R,. If the received interference signal e(n) is Gaussian distributed, the flip-flop algo-

rithm provides the maximum likelihood estimates (MLE) of Rq and R,. [75] when it converges.

Otherwise, the algorithm gives the estimates of Rq and R, in the least square sense. For fixed

Rr, the MLE of Rq is obtained as

= or KY,(n)[YH(n)]'} (3.53)
U=l v=l 7=1

where ar, is the (u, v)th element of R-1 and Yu(n) is the uth row vector of the received signal

matrix Y(n). Similarly, for fixed Rq, the MLE of Rr is obtained as
N N K
RO= o: W (n)W'(n) (3.54)
u=l- 1= n=1

where oa, is the (u, v)th element of f'1 and W,(n) is the uth column vector of the received

signal Y(n). Then to get uniquely identifiable Rq and Rr, we need to scale Rr to make

tr(R,) = nr. We note that the terms inside the braces in (3.53) and (3.54) can be computed

before the running of the iterative estimation algorithm to reduce computational complexity. To

start the iterative algorithm, an initial value of either Rq or R, should be assigned. A natu-

ral choice is to initially make Ri = In,. Then the iterative algorithm alternates between the

calculation of Rk and ,. until convergence. While it is difficult to prove analytically that the

algorithm converges to the MLE, extensive data experiments [75] in statistics show that it al-

ways converges to the MLE for situations of practical sample sizes. The convergence in our

case is also verified by the numerical results in Section 3.5.








Then Rt and QN can be estimated based on Rq. We note that only 7'(QN) fn R -(S) can
be uniquely identified from Rq in the sense below (R. denotes the range space of a matrix and
R' denotes the perpendicular subspace of the range of a matrix):
Lemma 3.4.1. Let Rt and QN be Hermitian positive semi-definite matrices andRq = SRtSH+

QN, where S is offull rank. Let D = {(Rt, QN) : RZ(QN) C 7'(S)}. Then there is an 1-1
correspondence between Rq and (R,, QN) only for the pairs of (Rt, QN) in D.

Proof Let Ps = S(SHS)-ISH be the projection onto R7(S) and Ps = I -Ps be the projection
onto RT (S).
First, let (Rt, QN), (Rit, Q') e D. Let Rg = SRtSH + QN and R' = SRSH +Q'
Consider P(Rq = PsQN = QN, PsRq = SRtSH, and PR', = PsQ'N = Q'N PsR =
SR'tSH. Since S is of full rank, PsRq = PsR' iff Rt = Rt. Also since Ps and P' are
projections onto complementary subspaces, Rq -= R'q iffPR = P7R' and PsR = PsR',
i.e. (Rt, QN) = (R, Q'N).
Conversely, let (Rt, QN) e D and Rq = SRSH + QN. Now choose R'(t Rt and define

Q'N = QN + SRtSH SR'SH. Since R(QN) C R'(S) and S is of full rank, (Rt, Q'N) ID'.
But R' = SRSH + Q = SRtSH+ QN = Rq.

Based on the above lemma, we see that estimating QN and Rt simultaneously from Rq is
not possible. However, since PsRqPs = PsQNPs, we can estimate QN from P RqP .
For notation simplicity, let A denote Ps RqPs. Since the interference signals are wide-sense
stationary in time, QN is a Topelitz matrix which can be represented by a sequence {qk; k =
0, 1, (N- 1)} with QN = { qk,} = { qk-j }. Then the ijth element of P' QN PS is given
by E k Pimq-kPkj with pyj denoting the ijth element of PI. Equating the ijth element of

P QNPs with aj, we have a set of linear equations in {qk}. Noticing the hermitian nature of
PsQNPs and A and separating the real and imaginary parts of qk and aj, we have N2 linear
equations with 2N 1 unknowns in q, = [qo, Re(q1), Im(qi),..., Re(qN_1), Im(qN-1)]7. The
set of linear equations can be solved by employing the least square approach. Then the estimate
of QN can be constructed based on q,.








In addition, when N is large, QN can be approximated by a circulant matrix [78] with fixed

eigenvectors as:

QNg FNyF" (3.55)

where FN is the N x N FFT matrix and % is a diagonal matrix containing eigenvalues Oi. We

notice that we only require the nt smallest eigenvalues of QN and their corresponding eigen-
vectors in constructing the optimal training sequences. With the circulant matrix approximation

(3.55), it is equivalent to estimating the n, smallest eigenvalues O, and identifying the corre-
sponding columns of F. The nt, smallest positive eigenvalues of QN are used as the estimates

of the nt smallest Vi, and the corresponding columns of F are chosen as those closest to the
eigenvectors associated with the nt smallest positive eigenvalues of QN.
The estimates of the nt smallest Oi and the nt indices of the chosen columns of FN are

then fed back to the transmitter for the optimal training sequence construction. We notice that

it is bandwidth efficient to just feed back these indices of FN instead of the whole eigenvectors

of QN because the number of training symbols N during the training period is usually large.
To derive the estimator of Rt, we need the following lemma which establishes the asymp-

totical equivalence of QN and PsQNP, as N increases.
Lemma 3.4.2. With the assumption that QN is an absolutely summable Toeplitz matrix, QN and

PsQNPs are asymptotically equivalent. Since QN is Toeplitz, P QNP( is asymptotically
Toeplitz.

Proof Two definitions of the norms of a matrix which include the strong norm and weak norm
[78, 79] are needed to study the asymptotic equivalence of two matrices. The strong norm || A |
is defined as

I| A J|= maXx:x*x=1[x*A*Ax] = v/Amr (A*A)

where Amax represents the largest eigenvalues of a matrix. If A is Hermitian, I| A I|= Amax(A)|.
The weak norm of A is defined as


JAI = (n-'Tr[A*A]) .








Two sequences of n x n matrices An and Bn are said to be asymptotically equivalent [78]

if A, and Bn are uniformly bounded in strong norm:

|| An 1, Bn J|< M
and An B, approaches zero in weak norm as n -- oo:

lim An Bn| = 0.
n--00

If one of the two matrices is Toeplitz, then the other is said to be asymptotically Toeplitz.

Without the loss of generality, we assume that QN is an absolutely summable Toeplitz

matrix. (For the temporal interference correlation matrix QN arising from practical scenarios,

such as jamming signals and co-channel interference considered here, it is easy to verify that

QN is absolutely summable.) QN can be represented by a sequence {qk; k = 0, 1, +2,... }
with QN = {qkj} = {qk-j} and E= -oo qk < 00. It is shown [80] that QN is bounded in

strong norm as:
+o00
I QN <|| 2 1 I, = 2Mq < oo.
k=-oo
Then we need to show that |1 PsQNPs II is also bounded. Using the properties of the strong

norm, we have

II PQNPs II

= I (I- Ps)QN(I Ps) 1I

= II QN PsQN QNPS + PsQNPs II

< II QN II + II PsQN 1I + II QNPs I + II PsQNPs II.

To proceed, we need the following lemma [40]:

Lemma 3.4.3. For two Hermitian positive semi-definite matrices G and H,


Amax(GH) < Amax(G)Amax(H).








Then, we have

11 PsQN H= [Am.o(QNPsQN)] < [Am,(QN)Ama.(Ps)Ama(QN)]I = Am(Q) =|1 QN I

Similarly, 1I QNPs 11i11 QN 11 and || PsQNPs 1|111 QN I|. Thus, PQNP 11 4 |
QN 11= 8AM. Let M = 8M,, then | QN |11 M < c0 and I| PsQNP s II< M < 00.
Next, we need to show that the distance of the two matrices goes to zero asymptotically in
weak norm. Using the properties of weak norm, we have

IQN P QNPI

= IPsQN + QNPs PsQNPsI

< IPsQNl + IQNPsI + IPsQNPsI.

We need the following Lemma [78, 80]:

Lemma 3.4.4. Given two n x n matrices G and H, then

IGHI <11 G II IH.

The weak norm of Ps can be written as

Ps| = (N-Tr[S(SHS)-ISH]) = (N-1Tr[ ) = ( .

Then using the above lemma, we have

IQNPsI <11 QN II Psi = (n) I1 QN 1< (n)2Mq.

Similarly, IPsQNPsI <-1 PsQN II (') <11 QN II (a)! < (N) 2M, and
IPsQNI = IQNPsI < (L)22M,. Then, we can show that

IQN PsQNPs 3 im () 2Aq = 0.









Based on the above lemma, the transmit channel correlation matrix Rt can be estimated by

projecting the received signal onto R(S). Since N is usually much larger than nt, we have

Rq SRtSH + P QNPs, (3.56)


and hence

PsRqPs PsSR,SHPs + PsPsQNP sPs = SRSH. (3.57)

Then we can estimate the transmit channel correlation matrix R, using

ft = (SHS)-lSHfRS(SHS)-1. (3.58)

3.5 Numerical Results

In this section, we present some numerical results to show the performance gain for channel

estimation achieved by the designed optimal training sequences. We consider a MIMO system

with 3 transmit antennas and 3 receive antennas. The antennas form uniform linear arrays at both

the transmitter and the receiver. For a small angle spread, the correlation coefficient between

the ith and the jth transmit antenna [67] can be approximated as:

[Rlj 2 exp {-j2r sin A dt sin }dO = Jo(27r|i J sin A ), (3.59)
S27 j27ri sin A A

where Jo(x) is the zeroth order Bessel function of the first kind, A is the angle spread, dt is

the antenna spacing and A is the wavelength of a narrow-band signal. We set dt = 0.5A. In

the simulations, we consider two channels with different transmit channel correlations: a high

spatial correlation channel with A = 50 and a low spatial correlation channel with A = 25.

The receive correlation matrix Rr is calculated similarly as the transmit correlation matrix with

A = 25.
We consider two kinds of interference: co-channel interference from other users in the
same wireless system, and jamming signals, which are usually modeled by autoregressive (AR)
random processes.

We compare the channel estimation performance in terms of the total MSE for systems
using different sets of training sequences. The following training sequence sets are
considered for comparison: 1) the optimal training sequences described in Section 3.3; 2) the
approximate optimal training sequences constructed from the channel and interference statistics
obtained with the estimation algorithm proposed in Section 3.4; 3) the temporally optimal
training sequences, for which the transmit channel correlation matrix is assumed to be an
identity matrix and only the temporal interference correlation is considered in the design
(we also consider the approximate temporally optimal sequences, which are constructed from
the channel statistics obtained with the proposed algorithm); 4) the spatially optimal training
sequences, for which the interference is assumed to be temporally white and only the transmit
correlation is considered in the design (we also consider the approximate spatially optimal
sequences, which are constructed from the channel statistics obtained with the proposed
algorithm); 5) binary orthogonal sequences; 6) random sequences.
3.5.1 Co-channel Interference

In a cellular wireless communication system, co-channel interference (CCI) from other

cells exists due to frequency reuse. Hence, the interfering signals have the same signal format

as that of the desired user. We can express the interfering signal transmitted from the ith transmit

antenna of the mth interferer as

s_i^{(m)}(t) = √(P_m/n_t) ∑_{l=−∞}^{+∞} b_{i,l}^{(m)} ψ(t − lT − τ_m),    (3.60)

where P_m is the transmit power of the mth interferer, and {b_{i,l}^{(m)}} are data symbols transmitted
from the ith transmit antenna of the mth interferer. They are assumed to be i.i.d. binary random
variables with zero mean and unit variance. In addition, ψ(t) is the symbol waveform and T is
the symbol duration. It is assumed that the receiver is synchronized to the desired user but not
necessarily to the interfering signals, and τ_m is the symbol timing difference between the mth
interferer and the desired user signal. Without loss of generality, we assume 0 ≤ τ_m < T. The
elements of the interference symbol matrix S_I are samples of the matched filter output at the
receiver at time index jT. The (j, i)th element of S_I is

√(P_m/n_t) ∑_{l=−∞}^{+∞} b_{i,l}^{(m)} φ((j − l)T − τ_m),    (3.61)









with

φ(t) = ∫_{−∞}^{+∞} ψ(t + s) ψ*(s) ds,    (3.62)

where φ(t) is the autocorrelation of the symbol waveform. For the co-channel interference, the

temporal interference correlation is due to the intersymbol interference in the sampled interfer-

ing signals.

In the simulations, it is assumed that there are two interfering signals in the system and the
SIR (signal-to-interference ratio) is set to 0 dB. The ISI-free symbol waveform with raised
cosine spectrum is chosen as the symbol waveform. For this case, we have

φ(t) = sinc(πt/T) cos(πβt/T) / (1 − 4β²t²/T²).

We set the roll-off factor β = 0.5, τ_1 = 0.2T and τ_2 = 0.5T.
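The sketch below evaluates the raised-cosine autocorrelation φ(t) given above (with the convention sinc(x) = sin(πx)/(πx)) and uses it to fill one column of the interference symbol matrix as in (3.61); the unit interferer power, the binary symbols, and the truncation of the ISI sum to a finite window are illustrative assumptions.

```python
import numpy as np

def phi(t, T=1.0, beta=0.5):
    """Raised-cosine autocorrelation: sinc(t/T) cos(pi*beta*t/T) / (1 - (2*beta*t/T)**2)."""
    x = t / T
    denom = 1.0 - (2.0 * beta * x) ** 2
    if np.isclose(denom, 0.0):                        # removable singularity at t = T/(2*beta)
        return np.pi / 4.0 * np.sinc(1.0 / (2.0 * beta))
    return np.sinc(x) * np.cos(np.pi * beta * x) / denom

rng = np.random.default_rng(2)
N, T, tau = 20, 1.0, 0.2                              # training length, symbol period, offset tau = 0.2T
L = 8                                                 # ISI terms kept on each side (truncation)
b = rng.choice([-1.0, 1.0], size=N + 2 * L)           # i.i.d. binary interferer symbols

col = np.zeros(N)                                     # one column of S_I, cf. eq. (3.61), unit power
for j in range(N):
    for l in range(j - L, j + L + 1):
        col[j] += b[l + L] * phi((j - l) * T - tau, T)
print(np.round(col[:5], 3))
```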

In Fig. 3.1 and Fig. 3.2, we show the total channel estimation MSEs for the high spatial

correlation channel and low spatial correlation channel, respectively. For both cases, the opti-

mal sequences outperform the orthogonal sequences and random sequences significantly. For

the high spatial correlation channel, the optimal sequences provide a substantial performance

gain over both the spatially optimal sequences and the temporally optimal sequences. The ap-

proximate optimal sequences achieve most of the performance gain obtained by the optimal

sequences. For the low spatial correlation channel, the MSE performance of the approximate

optimal sequences is close to that of the optimal sequences. The temporal correlation has a

stronger impact on the channel estimation than the spatial channel correlation due to the fact

that the length of the training sequences N is much larger than the number of transmit antennas n_t.

It is verified by the simulation results shown in Fig. 3.2 that the temporally optimal sequences

achieve an estimation performance similar to that achieved by the optimal sequences. These two

optimal sequences provide significant performance gain over the spatially optimal sequences.

3.5.2 Jamming Signals

We assume that there are two jamming signals in the system. The jamming signals are

modeled as two first-order AR processes driven by temporally white Gaussian processes {u_{i,t}}
as

s_{i,t} = a_i s_{i,t−1} + u_{i,t},    (3.63)































Figure 3.1: Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and high spatial correlation channel.



Figure 3.2: Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and low spatial correlation channel.


where s_{i,t} represents the jamming signal transmitted by the ith jammer at the tth time index, a_i
is the temporal correlation coefficient, and u_{i,t} has zero mean and variance σ_{u_i}², which determines
the transmit power of the ith jammer. The SIR is set to 0 dB. We choose a_1 = 0.4 and
a_2 = 0.5.
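As a brief sketch of this jammer model (with an illustrative unit driving-noise variance), one can generate an AR(1) jamming sequence and its stationary temporal correlation matrix, which is Toeplitz as assumed in the analysis of Section 3.4:

```python
import numpy as np
from scipy.linalg import toeplitz

def ar1_jammer(n, a, sigma_u=1.0, rng=None):
    """Generate s_t = a*s_{t-1} + u_t with u_t ~ N(0, sigma_u^2), started in stationarity."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.zeros(n)
    s[0] = rng.normal(scale=sigma_u / np.sqrt(1.0 - a ** 2))
    for t in range(1, n):
        s[t] = a * s[t - 1] + rng.normal(scale=sigma_u)
    return s

def ar1_corr(n, a, sigma_u=1.0):
    """Stationary covariance E[s_t s_{t+k}] = sigma_u^2 a^|k| / (1 - a^2): a Toeplitz matrix."""
    k = np.arange(n)
    return toeplitz(sigma_u ** 2 * a ** k / (1.0 - a ** 2))

N = 32
Q = ar1_corr(N, 0.4) + ar1_corr(N, 0.5)        # two independent jammers, a1 = 0.4 and a2 = 0.5
s = ar1_jammer(N, 0.4, rng=np.random.default_rng(3))
print(np.round(Q[0, :4], 3), np.round(s[:4], 3))
```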

In Fig. 3.3 and Fig. 3.4, we show the total channel estimation MSEs for the high spatial

correlation channel and low spatial correlation channel, respectively. For AR jammers, simi-

lar conclusions on the estimation performance achieved by different training sequences can be

made as in the case of co-channel interference.

3.6 Conclusion

In this chapter, we consider a wireless communication system with multiple transmit and

receive antennas in a slow, Rayleigh flat-fading environment. We study the problem of the

estimation of correlated MIMO channels with colored interference. The Bayesian channel es-

timator is derived and the optimal training sequences are designed based on the mean square

error (MSE) of channel estimation. We propose an algorithm to estimate long-term channel
































Figure 3.3: Comparison of total MSEs obtained using different training sequences. AR jammers
and high spatial correlation channel.

























































Figure 3.4: Comparison of total MSEs obtained using different training sequences. AR jammers
and low spatial correlation channel.









statistics and design an efficient feedback scheme so that we can approximately construct the

optimal sequences at the transmitter. Numerical results show that the optimal training sequences

provide substantial performance gain for channel estimation when compared with other training

sequences.
3.7 Appendix

3.7.1 A Trace Problem

In this appendix, we analyze a variant of the optimization problem (3.18) which can be

formulated as

min_S tr(R_t^{1/2} S^H Q_N^{-1} S R_t^{1/2} + I_{n_t})^{-1}    (3.64)
subject to tr{S^H S} ≤ P.


The two trace optimization problems (3.18) and (3.64) are related through the form of their cost
functions. The cost function of the original optimization problem (3.18) can be rewritten as

tr(S^H Q_N^{-1} S + R_t^{-1})^{-1} = tr R_t (R_t^{1/2} S^H Q_N^{-1} S R_t^{1/2} + I)^{-1},


which can be viewed as a weighted version of the cost function of the new trace optimization problem

(3.64).

For the sake of notational simplicity, we consider the same optimization problem
as (3.64) with different but simpler notation:

min_S tr(D S^H Q S D + I)^{-1}    (3.65)
subject to tr(S^H S) ≤ P, S ∈ C^{m×n}.


Here Q is a nonzero Hermitian, positive semidefinite matrix, D is a nonzero Hermitian, positive

definite matrix, and the positive scalar P is the power constraint associated with the signal

S. The main results on the solution to the optimization problem (3.65) are cited here for the

completeness of the dissertation and the details can be found in the literature [81]. We write the

inverse matrix


C = (D S^H Q S D + I)^{-1}









for convenience.

As discussed before, the solution in the special case D = I can be expressed in terms

of the eigenvalues and eigenvectors of Q and a Lagrange multiplier associated with the power

constraint. For the optimization problem introduced here, D ≠ I and minimizing the trace of
C is more difficult. We will show that (3.65) has a solution that can be expressed as S = UΣV^H,
where U and V are orthonormal matrices of eigenvectors for Q and D respectively, and Σ
is diagonal. Solving (3.65) involves computing diagonalizations of Q and D, and finding an

ordering for the columns of U and V. We are able to evaluate the optimal ordering when either

P is large or P is small. However, for intermediate values of P, evaluating the optimal ordering

is more difficult. The problem (3.65) has a combinatorial nature, unlike the special case D = I.

The trace problem (3.65) arises in spreading sequence optimization for code division mul-

tiple access (CDMA) systems. In cellular communication systems, multiple access schemes

allow many users to share simultaneously a finite amount of radio resources. CDMA is one of

the main access techniques. It is adopted in the IS-95 system and will be used in next generation

cellular communication systems [82]. In a CDMA system, different users are assigned different

spreading sequences so that the users can share the communication channel. We consider the

uplink (communication from the mobile units to the base station) of a CDMA system where the

users within a base station are symbol synchronous. The co-channel interference from the users

in the neighboring cells are modeled by additive, colored Gaussian noise. The received signal

at the base station is
y = ∑_{i=1}^{K} h_i s_i x_i + e,

where K is the number of signals received by the base station, x_i is the symbol transmitted from
the ith user, s_i ∈ C^N is the spreading sequence assigned to the ith user, h_i is the channel gain
from the ith user to the base station, and e ∈ C^N is the additive, colored Gaussian noise with
zero mean and covariance Σ. Usually the sizes of K and N are comparable. It is assumed that
the symbols x_i are independent with zero mean and unit variance. The received signal can be

expressed as


y = SHx + e,    (3.66)









where S, the spreading sequence matrix, has jth column sj, and H is a diagonal matrix with

ith diagonal element hi. Again, by the Bayesian Gauss-Markov Theorem [36, 83], the MMSE

estimator of x is
x̂ = (H^H S^H Σ^{-1} S H + I)^{-1} H^H S^H Σ^{-1} y.

The corresponding covariance matrix of the estimation error is

C_x = (H^H S^H Σ^{-1} S H + I)^{-1}.


Finding the optimal spreading sequences for all the users, which minimize the estimation error in the
presence of the co-channel interference from other cells subject to a power constraint, corresponds to
(3.65) with Q = Σ^{-1} and D = H, a diagonal matrix.
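For concreteness, a minimal sketch of the MMSE symbol estimate and its error covariance written above; the spreading sequences, channel gains, and colored-noise covariance below are illustrative placeholders rather than a particular system design.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(4)
N, K = 16, 6                                             # spreading length, number of users

S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)    # spreading matrix (s_i in the columns)
H = np.diag(rng.uniform(0.5, 1.5, size=K))               # diagonal channel gains h_i
Sigma = toeplitz(0.6 ** np.arange(N))                    # colored Gaussian noise covariance

x = rng.choice([-1.0, 1.0], size=K)                      # unit-variance symbols
e = np.linalg.cholesky(Sigma) @ rng.standard_normal(N)
y = S @ H @ x + e                                        # received signal, eq. (3.66)

Si = np.linalg.inv(Sigma)
A = H.T @ S.T @ Si @ S @ H + np.eye(K)                   # H^H S^H Sigma^-1 S H + I
x_hat = np.linalg.solve(A, H.T @ S.T @ Si @ y)           # MMSE estimate of x
Cx = np.linalg.inv(A)                                    # error covariance matrix
print(np.round(x_hat, 2), round(float(np.trace(Cx)), 3))
```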
To solve the trace optimization problem, we begin by analyzing the structure of an optimal

solution to (3.65). Let UΛU^H and VΔV^H be diagonalizations of Q and D respectively (the
columns of U and V are orthonormal eigenvectors). Let δ_i, 1 ≤ i ≤ n, and λ_j, 1 ≤ j ≤ m,
denote the diagonal elements of Δ and Λ respectively. We assume that the eigenvalues are
arranged in decreasing order:

δ_1 ≥ δ_2 ≥ ... ≥ δ_n and λ_1 ≥ λ_2 ≥ ... ≥ λ_m.    (3.67)


Let us define
T = U^H S V.    (3.68)

Making the substitution S = UTVH in (3.65) yields the following equivalent problem:

min_T tr(Δ T^H Λ T Δ + I)^{-1}    (3.69)

subject to tr(T^H T) ≤ P, T ∈ C^{m×n}.


We now show that (3.69) has a solution with at most one nonzero in each row and column.
Theorem 3.7.1. There exists a solution of (3.69) of the form T = Π_1 Σ Π_2, where Π_1 and Π_2
are permutation matrices and Σ = {σ_{ij}} with σ_{ij} = 0 for all i ≠ j.

Combining the relationship (3.68) between T and S and Theorem 3.7.1, we conclude that
problem (3.65) has a solution of the form S = U Π_1 Σ Π_2 V^H, where Π_1 and Π_2 are permutation









matrices. We will now show that one of these two permutation matrices can be deleted if the

eigenvalues of D and Q are arranged in decreasing order.

Let N denote the minimum of m and n. Making the substitution S = U Π_1 Σ Π_2 V^H in
(3.65), we obtain the equivalent problem:

min tr((Π_2 Δ Π_2^H) Σ^H (Π_1^H Λ Π_1) Σ (Π_2 Δ Π_2^H) + I)^{-1}    (3.70)

subject to ∑_{i=1}^{N} σ_i² ≤ P.

Here the minimization is over diagonal matrices Σ with σ_1, ..., σ_N on the diagonal, and per-
mutation matrices Π_1 and Π_2.

The symmetric permutations Π_1^H Λ Π_1 and Π_2 Δ Π_2^H essentially interchange diagonal ele-
ments of Λ and Δ. Hence, (3.70) is equivalent to

min_{σ, π_1, π_2} ∑_{i=1}^{N} 1 / ((δ_{π_2(i)} σ_i)² λ_{π_1(i)} + 1)    (3.71)

subject to ∑_{i=1}^{N} σ_i² ≤ P, π_1 ∈ P_m, π_2 ∈ P_n,

where P_m is the set of bijections of {1, 2, ..., m} onto itself.

We first show that we can restrict our attention to the largest diagonal elements of D and

Q.

Lemma 3.7.1. Let UΛU^H and VΔV^H be diagonalizations of Q and D respectively, where
the columns of U and V are orthonormal eigenvectors. Let σ, π_1, and π_2 denote an optimal
solution of (3.71) and define the sets

𝒩 = {i : σ_i > 0}, Q = {λ_{π_1(i)} : i ∈ 𝒩}, and D = {δ_{π_2(i)} : i ∈ 𝒩}.

If 𝒩 has l elements, then the elements of the sets D and Q are all nonzero, and they constitute
the l largest eigenvalues of D and Q respectively.

Using Lemma 3.7.1, we now eliminate one of the permutations in (3.71).

Theorem 3.7.2. Let UΛU^H and VΔV^H be diagonalizations of Q and D respectively, where
the columns of U and V are orthonormal eigenvectors, and the eigenvalues of Q and D are
arranged in decreasing order as in (3.67). If K is the minimum of the rank of D and Q, then









(3.71) is equivalent to
min_{σ, π} ∑_{i=1}^{K} 1 / ((δ_i σ_i)² λ_{π(i)} + 1)    (3.72)

subject to ∑_{i=1}^{K} σ_i² ≤ P, π ∈ P_K,

where σ_i = 0 for i > K.

Proof. The proof is similar to that for Theorem 3.3.2. □

Corollary 3.7.1. Problem (3.65) has a solution of the form S = UΠΣV^H, where the columns of
U and V are orthonormal eigenvectors of Q and D respectively with the associated eigenvalues
arranged in decreasing order, Π is a permutation matrix, and Σ is diagonal.

Proof. The proof is similar to that for Corollary 3.3.1. □

Assuming the permutation π in (3.72) is given, let us now consider the problem of optimiz-
ing over σ. To simplify the indexing, let p_i denote λ_{π(i)}. Hence, for fixed π, (3.72) is equivalent
to the following optimization problem:

min_σ ∑_{i=1}^{K} 1 / ((δ_i σ_i)² p_i + 1)    (3.73)

subject to ∑_{i=1}^{K} σ_i² ≤ P.

The solution of (3.73) can be expressed in terms of a Lagrange multiplier for the constraint.

Theorem 3.7.3. The optimal solution of (3.73) is given by

σ_i = max{ 1/(δ_i (ρ p_i)^{1/2}) − 1/(δ_i² p_i), 0 }^{1/2},    (3.74)

where the parameter ρ is chosen so that

∑_{i=1}^{K} σ_i² = P.    (3.75)


Proof. The proof is similar to that for Theorem 3.3.3. □
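A numerical sketch of this power allocation, assuming the Lagrange-multiplier form of (3.74) as reconstructed above; the multiplier ρ is located by bisection on the power constraint (3.75), and the eigenvalue values below are purely illustrative.

```python
import numpy as np

def trace_waterfill(delta, p, P, iters=200):
    """Allocate sigma_i^2 = max(1/(delta_i*sqrt(rho*p_i)) - 1/(delta_i^2*p_i), 0) with
    rho chosen by bisection so that sum_i sigma_i^2 = P (total power decreases in rho)."""
    g = delta ** 2 * p                                    # per-mode gain delta_i^2 * p_i

    def total_power(rho):
        return np.sum(np.maximum(1.0 / (delta * np.sqrt(rho * p)) - 1.0 / g, 0.0))

    lo, hi = 1e-12, 1.0
    while total_power(hi) > P:                            # bracket the multiplier
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total_power(mid) > P else (lo, mid)
    rho = 0.5 * (lo + hi)
    return np.maximum(1.0 / (delta * np.sqrt(rho * p)) - 1.0 / g, 0.0)

delta = np.array([1.5, 1.0, 0.7])                         # eigenvalues of D (decreasing)
p = np.array([2.0, 1.0, 0.5])                             # permuted eigenvalues of Q
sig2 = trace_waterfill(delta, p, P=4.0)
print(np.round(sig2, 3), round(float(sig2.sum()), 3))     # allocation; total power equals P
```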









To solve (3.65), we need to find an optimal ordering for the eigenvalues of D and Q. In

Theorems 3.7.4 and 3.7.5, we determine the optimal ordering when the power P is either large
or small.

Theorem 3.7.4. If the eigenvalues {λ_i} and {δ_i} of Q and D respectively are arranged in
decreasing order, then for P sufficiently large, an optimal permutation in (3.72) is

π(i) = K + 1 − i, 1 ≤ i ≤ K; π(i) = i, i > K.    (3.76)

Theorem 3.7.5. Suppose the eigenvalues {λ_i} and {δ_i} of Q and D respectively are arranged in
decreasing order, and let L be the minimum of the multiplicities of δ_1 and λ_1. For P sufficiently
small, an optimal solution of (3.65) is

S = (P/L)^{1/2} ∑_{i=1}^{L} u_i v_i^H,    (3.77)

where u_i and v_i are the orthonormalized eigenvectors of Q and D associated with λ_1 and δ_1
respectively.
3.7.2 A Determinant Problem

In this appendix, we analyze the following matrix optimization problem where we maxi-

mize the determinant, denoted "det", of a matrix:

max_S det(D S^H Q S D + I)    (3.78)
subject to tr(S^H S) ≤ P, S ∈ C^{m×n}.

Since the determinant of the inverse of a matrix is the reciprocal of the determinant of the matrix,

it follows that problem (3.78) is equivalent to replacing trace by determinant in (3.65). Hence,

in the original problem (3.65), we minimize the sum of the eigenvalues of the MSE matrix C,

while in the second problem (3.78), we minimize the product of the eigenvalues of C. In either
case, we try to make the eigenvalues of C small, but with different metrics.
For the special case D = I, the solution of (3.78) can be found in Telatar [1], and for the
special case Q = I, the solution of (3.78) can be found in Zhou [63]. For the more general
problem (3.78), we again show that the solution can be expressed as S = UΣV^H, where U and V

are orthonormal matrices of eigenvectors for Q and D respectively, and E is diagonal. Unlike









the trace problem (3.65), the ordering of the columns of U and V does not depend on the

power P: the columns of U and V should be ordered so that the associated eigenvalues of

Q and D are in decreasing order. This optimal eigenvector ordering result is the same as that

for the optimization problem (3.18) in Section 3.3 when the same notations for corresponding

matrices are adopted. In Cai et al. [65], the authors formulated a similar optimization problem

while studying the space-time spreading (STS) scheme for correlated fading channels in the

presence of interference. Based on the previous optimization result for the special case Q = I

[63], UΣV^H was chosen as the STS matrix, and then the optimal eigenvector ordering and Σ
were determined. Here we solve the optimization problem (3.78) by using the method introduced

in Wong et al. [61] and Wong et al. [84]. (Two important matrix inequalities arising from

majorization theory [40] are used.)

The determinant problem arises from spreading sequence optimization for CDMA systems.

For CDMA systems, a different performance measure, which arises in information theory, is

the sum capacity of the channel. The mean square error is a performance measure for uncoded

systems, while the sum capacity is a performance measure for coded systems. It represents the

maximum sum of the rates at which users can transmit information reliably. The sum capacity

of the synchronous multiple access channel (3.66) is

C_sum = max I(x_1, ..., x_K; y),

where I represents the mutual information [74] between the inputs x1, X2,..., XK and the out-

put vector y. The maximization is over the independent random inputs X1, X2, ... XK. The

maximum is achieved when all the random inputs are Gaussian. In this case, the sum capacity

[71, 85] becomes

C_sum = (1/(2N)) log det(H^H S^H Σ^{-1} S H + I).
Since log is a monotone increasing function, the maximization of the sum capacity, subject to a

power constraint, corresponds to the optimization problem (3.78) with Q = Σ^{-1} and D = H.

The solution to the determinant problem (3.78) can be expressed as follows:

Theorem 3.7.6. Let UΛU^H and VΔV^H be the diagonalizations of Q and D respectively,
where the columns of U and V are orthonormal eigenvectors and the corresponding eigenvalues
{λ_i} and {δ_i} are arranged in decreasing order. If K is the minimum of the rank of Q and D,
then the optimal solution of (3.78) is given by

S = UΣV^H,    (3.79)

where Σ is diagonal with diagonal elements given by

σ_i = max{ 1/ρ − 1/(λ_i δ_i²), 0 }^{1/2} for 1 ≤ i ≤ K    (3.80)

and σ_i = 0 for i > K, where the parameter ρ is chosen so that

∑_{i=1}^{K} σ_i² = P.

Proof. Initially, let us assume that both D and Q are nonsingular; later we remove this restric-
tion. Insert T = Q^{1/2}S in (3.78) and multiply the objective function on the left and right by
det(D^{-1}) to obtain the following equivalent formulation:

max_T det(T^H T + D^{-2})    (3.81)

subject to tr(T T^H Q^{-1}) ≤ P, T ∈ C^{m×n}.


Let ω_i, 1 ≤ i ≤ n, denote the eigenvalues of T^H T arranged in decreasing order. By a theorem of

Fiedler [86] (also see [40, Chap. 9, G.4]), the determinant of a sum THT + D-2 of Hermitian

matrices is bounded by the product of the sum of the respective eigenvalues (assuming the

eigenvalues of THT and D are in decreasing order):


det(T^H T + D^{-2}) ≤ ∏_{i=1}^{n} (ω_i + δ_i^{-2}).    (3.82)

Also, by a theorem of Ruhe [87] (also see [40, Chap. 9, H2]), the trace of a product (TTH)Q-1

of Hermitian matrices is bounded from below by the sum of the product of respective eigenval-

ues (assuming the eigenvalues of TTH and Q are in decreasing order):
tr(T T^H Q^{-1}) ≥ ∑_{i=1}^{N} ω_i λ_i^{-1}, N = min{m, n},    (3.83)


since at most N eigenvalues of THT and TTH are nonzero.









We replace the cost function in (3.81) by the upper bound (3.82) and we replace the con-
straint in (3.81) by the lower bound (3.83) to obtain the problem:

max_ω (∏_{i=N+1}^{n} δ_i^{-2}) ∏_{i=1}^{N} (ω_i + δ_i^{-2})    (3.84)

subject to ∑_{i=1}^{N} ω_i λ_i^{-1} ≤ P, ω_i ≥ ω_{i+1} ≥ 0 for i < N.

If T is feasible in (3.81), then the squares of its singular values are feasible in (3.84) by (3.83).

And by (3.82), the value of the cost function in (3.84) is greater than or equal to the associated

value (3.81). Since the feasible set for (3.84) is closed and bounded, and since the cost function

is continuous, there exists a maximizing ω, and the maximum value of the cost function (3.84)

is greater than or equal to the maximum value in (3.81).

Consider the matrix T = UΩ^{1/2}V^H, where Ω is a diagonal matrix containing the max-
imizing ω on the diagonal. For this choice of T, the inequalities (3.82) and (3.83) are both
equalities. Hence, this choice for T attains the maximum in (3.81). The corresponding optimal
solution of (3.78) is

S = Q^{-1/2}T = UΛ^{-1/2}U^H U Ω^{1/2} V^H = UΛ^{-1/2}Ω^{1/2}V^H.    (3.85)


To complete the proof of the theorem, we need to explain how to compute the optimal ω in

(3.84).

At the optimal solution of (3.84), the power constraint must be an equality (otherwise, we

could multiply ω by a positive scalar and increase the cost). Let us ignore the monotonicity
constraint ω_i ≥ ω_{i+1} (we will show that the maximizer satisfies this constraint automatically).
After taking the log of the cost function, we obtain the following simplified version of (3.84):

max_ω ∑_{i=1}^{N} log(ω_i + δ_i^{-2})    (3.86)

subject to ∑_{i=1}^{N} ω_i λ_i^{-1} = P, ω ≥ 0.

Since the cost function is strictly concave, the maximizer of (3.86) is unique.









The first-order optimality conditions (KKT conditions) for an optimal solution of (3.86)

are the following: There exists a scalar ρ ≥ 0 and a vector ν ∈ R^N such that

1/(ω_i + δ_i^{-2}) − ρ/λ_i + ν_i = 0, ν_i ≥ 0, ω_i ≥ 0, ν_i ω_i = 0, 1 ≤ i ≤ N.    (3.87)

Analogous to the proof of Theorem 3.7.3, we define the function

ω_i(ρ) = (λ_i/ρ − δ_i^{-2})^+.    (3.88)


This particular value for ω_i is obtained by setting ν_i = 0 in (3.87) and solving for ω_i; when the
solution is < 0, we set ω_i(ρ) = 0 (this corresponds to the + operator in (3.88)). Observe that

ω_i(ρ) in (3.88) is a decreasing function of ρ which approaches +∞ as ρ approaches 0 and
which approaches 0 as ρ tends to +∞. Hence, the equation

∑_{i=1}^{N} ω_i(ρ) λ_i^{-1} = P    (3.89)

has a unique positive solution. We have ω_i = 0 for ρ ≥ λ_i δ_i², which implies that

ν_i = ρ/λ_i − 1/(ω_i(ρ) + δ_i^{-2}) = ρ/λ_i − δ_i² ≥ 0 when ρ ≥ λ_i δ_i².

It follows that the KKT conditions are satisfied when ρ is the positive solution of (3.89). Since
the λ_i and δ_i are both arranged in decreasing order, it follows that for any choice ρ > 0, the ω_i
given by (3.88) are in decreasing order. Hence, the constraint ω_{i+1} ≤ ω_i in (3.84) is satisfied
by the solution of (3.86). Combining the formula (3.88) for the solution of (3.86) with the
expression (3.85) for the solution of (3.78), we obtain the solution S given in (3.79) and (3.80),
where Σ = Λ^{-1/2}Ω^{1/2}.

Now suppose that either D or Q is singular. Let us consider a perturbed problem where

we replace Q by Q_ε = UΛ_εU^H and D by D_ε = VΔ_εV^H:

max_S det(D_ε S^H Q_ε S D_ε + I)    (3.90)
subject to tr(S^H S) ≤ P, S ∈ C^{m×n}.

Here Λ_ε and Δ_ε are obtained from Λ and Δ by setting δ_i = ε = λ_j for i or j > K. Since Q_ε
and D_ε are nonsingular, it follows from our previous analysis that the perturbed problem (3.90)









has a solution of the form S_ε = UΣ_εV^H, where the diagonal elements of Σ_ε are given by

σ_i^ε = max{ 1/ρ − 1/(λ_i δ_i²), 0 }^{1/2} for 1 ≤ i ≤ K, and σ_i^ε = max{ 1/ρ − 1/ε³, 0 }^{1/2} for i > K.    (3.91)

Let ρ be chosen so that

∑_{i=1}^{K} (σ_i^ε)² = P.

Observe that when ε³ < ρ, we have σ_i^ε = 0 for i > K and

∑_{i=1}^{N} (σ_i^ε)² = P.

Hence, for each ε > 0 with ε³ < ρ, the optimal solution of the perturbed problem does not
depend on ε and the trailing diagonal elements σ_i^ε for i > K vanish. Since the cost function in
the perturbed problem (3.90) is a continuous function of ε, we conclude that for ε³ < ρ, S_ε is the
optimal solution of (3.90) for ε = 0. The perturbed problem (3.90) with ε = 0 coincides with
the original problem (3.78). Consequently, the solution (3.79) and (3.80) is valid even when
either Q or D is singular. □
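To close this appendix, a short sketch that assembles the determinant-optimal S of Theorem 3.7.6 from the eigendecompositions of Q and D and the water-filling levels in (3.80); the matrices Q and D below are illustrative, and locating the multiplier ρ by bisection is one convenient choice among several.

```python
import numpy as np

def det_optimal_S(Q, D, P, iters=200):
    """Build S = U Sigma V^H with sigma_i^2 = max(1/rho - 1/(lambda_i*delta_i^2), 0), sum = P."""
    lam, U = np.linalg.eigh(Q)                     # eigh returns eigenvalues in ascending order
    dlt, V = np.linalg.eigh(D)
    lam, U = lam[::-1], U[:, ::-1]                 # reorder eigenvalues/vectors to decreasing
    dlt, V = dlt[::-1], V[:, ::-1]
    K = min(len(lam), len(dlt))
    g = lam[:K] * dlt[:K] ** 2                     # lambda_i * delta_i^2

    def power(rho):
        return np.sum(np.maximum(1.0 / rho - 1.0 / g, 0.0))

    lo, hi = 1e-12, 1.0
    while power(hi) > P:                           # bracket rho, then bisect on the power constraint
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(mid) > P else (lo, mid)
    sig = np.sqrt(np.maximum(1.0 / (0.5 * (lo + hi)) - 1.0 / g, 0.0))
    Sigma = np.zeros((Q.shape[0], D.shape[0]))
    Sigma[np.arange(K), np.arange(K)] = sig
    return U @ Sigma @ V.conj().T

Q = np.diag([2.0, 1.0, 0.5])                       # illustrative Q (m x m)
D = np.diag([1.5, 1.0, 0.7, 0.2])                  # illustrative D (n x n)
S = det_optimal_S(Q, D, P=3.0)
print(np.round(S, 3), round(float(np.trace(S.T @ S)), 3))   # trace equals the power budget P
```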














CHAPTER 4
CONCLUSION AND FUTURE WORK


To achieve the performance gain promised by multiple-antenna systems, parameter estima-
tion tasks, including timing estimation and channel estimation, are key components of the space-time

system design. In this work, we investigate the timing estimation and channel estimation prob-

lems for MIMO systems.
4.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels

In Chapter 2, we consider a wireless communication system with multiple transmit and

receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh flat-fading

environment. We study the problem of timing estimation in such a system with the aid of

training signals from two different approaches. In the first approach, the channel is assumed to

be unknown but deterministic and joint ML estimation of the channel and delay is performed. In

contrast, in the second approach, we assume that the channel is random but with known statistics

and use the likelihood function averaged over all random channel realizations to construct the

ML estimator for the delay. For both approaches, we derive the optimal training sequences

based on the performance measures associated with the CRB of timing estimation. These two

approaches lead to two different optimal training signal designs. For the deterministic channel

approach, we show that orthogonal training signals from multiple transmit antennas minimize

the outage probability as well as the average CRB. For the random channel approach, perfectly

correlated training signals employed at different transmit antennas minimize the CRB.
4.2 Channel Estimation for Correlated MIMO Channels with Colored Interference

In Chapter 3, we consider a wireless communication system with multiple transmit and

receive antennas in a slow, Rayleigh flat-fading environment. We investigate the problem of

estimating correlated MIMO channels in the presence of colored interference. The Bayesian

channel estimator is derived and the optimal training sequences are designed based on mini-

mizing the MSE of channel estimation. The design of the optimal training sequences has a
87









clear physical interpretation, which implies that we should assign more power to the transmis-
sion directions constructed from the eigen-directions with larger channel gains and the interference

subspaces with less interference. The power assignment is determined by the water-filling argu-

ment under a finite power constraint. In order to implement the channel estimator and construct

the optimal training sequences, we propose an algorithm to estimate long-term channel statis-

tics and design an efficient feedback scheme so that we can approximately construct the optimal

sequences at the transmitter. Numerical results show that with optimal training sequences, the

MSE of channel estimation can be reduced substantially when compared with other training

sequences.
4.3 Timing Estimation for Correlated MIMO Channels with Colored Noise

In the second chapter, we study the timing estimation problem with the assumption that

the fading coefficients between the pairs of transmit and receive antennas are independent and

identically distributed. This assumption does not generally hold in practice due to the antenna

spacings and orientation, the mutual coupling, the richness of scattering, and the presence of

dominant components [88]. Thus it is natural to extend the current work to investigate the

synchronization problem in correlated channels.

Another possible direction to extend the present work is to address the timing estimation

problem for the MIMO system in colored noise. It is more suitable to adopt the colored noise

model than the white noise model when jammers and co-channel interference are present in the

communication system.














REFERENCES


[1] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur Trans. Telecom., vol.
10, pp. 585-595, Nov. 1999.

[2] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environ-
ment when using multiple antennas," Wireless Commun. Mag., vol. 6, pp. 311-335, Mar.
1998.

[3] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wire-
less communication: performance criterion and code construction," IEEE Trans. Inform.
Theory, vol. 44, pp. 744-765, Mar. 1998.

[4] S. Baro, G. Bauch, and A. Hansmann, "Improved codes for space-time trellis-coded mod-
ulation," IEEE Comm. Lett., vol. 4, pp. 20-22, Jan. 2000.

[5] A. R. Hammons and H. E. Gamal, "On the theory of space-time codes for PSK modula-
tion," IEEE Trans. Inform. Theory, vol. 46, pp. 524-542, Mar. 2000.

[6] S. M. Alamouti, "A simple transmit diversity technique for wireless communications,"
IEEE J. Select. Areas in Commun., vol. 16, pp. 1451-1458, Oct. 1998.

[7] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding from orthogo-
nal designs," IEEE Trans. Inform. Theory, vol. 45, pp. 1456-1467, Jul. 1999.

[8] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding for wireless
communications: performance results," IEEE J. Select. Areas in Commun., vol. 17, pp.
452-460, Mar. 1999.

[9] G. Ganesan and P. Stoica, "Space-time diversity using orthogonal and amicable orthogonal
designs," Wireless Personal Communications, vol. 18, pp. 165-178, Aug. 2001.

[10] S. Alamouti, V. Tarokh and P. Poon, "Trellis-coded modulation and transmit diversity:
design criteria and performance evaluation," in Proc. IEEE Int. Conf. Universal Personal
Communications, vol. 2, Florence, Italy, Oct. 1998, pp. 703-707.

[11] B. M. Hochwald and T. L. Marzetta, "Unitary space-time modulation for multiple-antenna
communication in Rayleigh flat fading," IEEE Trans. Inform. Theory, vol. 46, pp. 543-
564, Mar. 2000.

[12] B. Hassibi and B. M. Hochwald, "High-rate codes that are linear in space and time," IEEE
Trans. Inform. Theory, vol. 48, pp. 1804-1824, Jul. 2002.

[13] S. Siwamogsathama and M. P. Fitz, "Robust space-time coding for correlated Rayleigh
fading channels," IEEE Trans. Signal Processing, vol. 50, pp. 2408-2416, Oct. 2002.









[14] Y. Gong and K. B. Letaief, "Concatenated space-time block coding with trellis coded
modulation in fading channels," IEEE Trans. Wireless Commun., vol. 4, pp. 580-590,
Oct. 2002.

[15] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading
environment when using multi-element antennas," Bell Labs. Tech. J. vol. 1, no. 2, pp.
41-59, 1996.

[16] G. Foschini, G. Golden, R. Valenzuela, and P. Wolniansky, "Simplified processing for
high spectral efficiency wireless communication employing multi-element arrays," IEEE
J. Select. Areas in Commun., vol. 17, pp. 1841-1852, Nov. 1999.

[17] S. Verdu, Multiuser Detection. Cambridge, UK: Cambridge Univ. Press, 1998.

[18] M. J. Gans, N. Amitay, Y. Yeh, H. Xu, T. Damen, R. Valenzuela, T. Sizer, R. Storz, D.
Taylor, W. MacDonald, C. Tran, and A. Adamiecki, "Outdoor BLAST measurement sys-
tem at 2.44 GHz: calibration and initial results," IEEE J. Select. Areas in Commun., vol.
20, pp. 570-583, Apr. 2002.

[19] C. Budianu and L. Tong, "Channel estimation for space-time orthogonal block codes,"
IEEE Trans. Signal Processing, vol. 50, pp. 2515-2528, Oct. 2002.

[20] P. Stoica and O. Besson, "Training sequence design for frequency offset and frequency-
selective channel estimation," IEEE Trans. Commun., vol. 51, pp. 1910-1917, Nov. 2003.

[21] C. Chuah, D. Tse, J. Kahn and R. Valenzuela, "Capacity scaling in MIMO wireless systems
under correlated fading," IEEE Trans. Inform. Theory, vol. 48, pp. 637-650, Mar. 2002.

[22] A. L. Moustakas and S. H. Simon, "Optimizing multiple-input single-output (MISO) com-
munication systems with general Gaussian channels: nontrivial covariance and nonzero
mean," IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2770-2780, Oct. 2003.

[23] S. A. Jafar and A. Goldsmith, "Multiple-antenna capacity in correlated Rayleigh fading
with channel covariance information," IEEE Trans. Wireless Commun., vol. 4, no. 3, pp.
990-997, May. 2005.

[24] E. A. Jorswieck and H. Boche, "Channel capacity and capacity-range of beamforming in
MIMO systems under correlated fading with covariance feedback," IEEE Trans. Wireless
Commun., vol. 3, pp. 1543-1553, Sep. 2004.

[25] E. A. Jorswieck and H. Boche, "Optimal transmission strategies and impact of correlation
in multiantenna systems with different types of channel state information," IEEE Trans.
Signal Processing, vol. 52, no. 12, pp. 3440-3453, Dec. 2004.

[26] A. Lozano and A. M. Tulino, "Capacity of multiple-transmit multiple-receive antenna ar-
chitectures," IEEE Trans. Inform. Theory, vol. 48, no. 12, pp. 3117-3128, Dec 2002.

[27] A. L. Moustakas, S. H. Simon, and A. M. Sengupta, "MIMO capacity through correlated
channels in the presence of correlated interferers and noise: a (not so) large N analysis,"
IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2545-2561, Oct. 2003.