Citation

## Material Information

Title:
Timing and channel estimation in multiple-antenna communication systems
Creator:
Liu, Yong
Publication Date:
Language:
English
Physical Description:
x, 95 leaves : ill. ; 29 cm.

## Subjects

Subjects / Keywords:
Antennas ( jstor )
Cost functions ( jstor )
Eigenvalues ( jstor )
Eigenvectors ( jstor )
Estimators ( jstor )
Matrices ( jstor )
Optimal solutions ( jstor )
Signals ( jstor )
Spacetime ( jstor )
Statistical estimation ( jstor )
Dissertations, Academic -- Electrical and Computer Engineering -- UF ( lcsh )
Electrical and Computer Engineering thesis, Ph. D ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

## Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 2005.
Bibliography:
Includes bibliographical references.
General Note:
Printout.
General Note:
Vita.
Statement of Responsibility:
by Yong Liu.

## Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
003477821 ( ALEPH )
AA00004683_00001 ( sobekcm )
770717180 ( OCLC )

Full Text

TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION
SYSTEMS

By

YONG LIU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

by

Yong Liu

To my wife and parents.

ACKNOWLEDGMENTS

First of all, I would like to thank my advisor, Dr. Tan F. Wong, for his invaluable guidance,

help and constant encouragement during my graduate study at the University of Florida.

I also want to thank the other members of my graduate committee, Dr. John M. Shea, Dr.

Jose A. B. Fortes and Dr. William W. Hager, for their suggestions and help. My special thanks

go to Dr. William W. Hager for his great help during the development of the second half of this

work.

Finally, I am extremely grateful to my family for their encouragement, devotion and sup-

port throughout my whole life.

page

ACKNOWLEDGMENTS.................. .. ............ iv

TA B LE ... .. . vii

LIST OF FIGURES ..................................... viii

ABSTRACT ................. .. ... ......................... ix

CHAPTER

1 INTRODUCTION ............................... 1

1.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels ..... 3
1.2 Channel Estimation for Correlated MIMO Channels with Colored
Interference . 4
1.3 Organization of the Dissertation .................... 5

2 TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVER

2.1 Introduction .... .. .. .. .. 7
2.2 System M odel .............................. 8
2.3 Timing Estimation with Unknown Deterministic Channel ...... 11
2.3.1 M L Estimator ........ .. ... ........... 11
2.3.2 Cramer-Rao Bound ...................... 12
2.3.3 Optimal Training Scheme ................... 14
2.4 Timing Estimation with Random Channel .............. 29
2.4.1 M L Estimator ............... ........... 31
2.4.2 Cramer-Rao Bound ...................... 32
2.4.3 Optimal Training Scheme ................... 35
2.5 Discussions and Conclusions ..................... 37
2.5.1 Orthogonal Training Signals .................. 40
2.5.2 Perfectly Correlated Training Signals .. 40
2.5.3 Deterministic vs Random Channel Approaches .... 41

3 CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS
WITH COLORED INTERFERENCE . 42

3.1 Introduction .. .. .. .. 42
3.2 System Model ............... .... ........ 44
3.3 Optimal Training Sequence Design .................. 49
3.3.1 Solution Structure . 50
3.3.2 The Optimal E .......................... 56
v

3.3.3 Optimal Eigenvector Ordering . 58
3.4 Estimation of Channel Statistics and Feedback Design ... 62
3.5 Numerical Results ............................ 69
3.5.1 Co-channel Interference . 70
3.5.2 Jamming Signals ................ ........ 71
3.6 Conclusion .. .. .. .. .. 73
3.7 Appendix . . 76
3.7.1 A Trace Problem ...... . 76
3.7.2 A Determinant Problem . 81

4 CONCLUSION AND FUTURE WORK . 87

4.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels 87
4.2 Channel Estimation for Correlated MIMO Channels with Colored
Interference ................ ......... 87
4.3 Timing Estimation for Correlated MIMO Channels with Colored Noise 88

REFERENCES ........... ....................... 89

BIOGRAPHICAL SKETCH ....... .. .............. 95

TABLE

Table page

1.1 M atrix N stations .. .. .. .. 6

LIST OF FIGURES

Figure page

2.1 Outage probabilities achieved using different training signal sets for a system
with 4 transmit and 1 receive antennas. The unit of the threshold e is T2. 22

2.2 Outage probabilities achieved using orthogonal training signals for different
numbers of transmit antennas. One receive antenna is employed. The unit of
the threshold E is T ........ . 23

2.3 Comparison of outage probabilities of the ML estimator obtained from simula-
tion and calculated from the CRB. The number of transmit antennas nt is 2
and = 10-4T2. ................. .... .. ..... 24

2.4 Comparison of the MSE of the ML estimator obtained from simulation and
the average CRB. The number of transmit antennas nt, is 2. The unit in the
vertical axis is T2............. ................... 30

2.5 Comparison of CRBs obtained using orthogonal training sequences and per-
fectly correlated training sequences for different numbers of transmit anten-
nas. Note that the CRB of the system with the perfectly correlated training
sequences is the same for any number of transmit antennas. 38

2.6 Comparison of the MSE of the ML estimator obtained from simulation and the
CRB. The number of transmit antennas nt is 2. The unit in the vertical axis is
T 2. . . 3 9

3.1 Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and high spatial correlation channel. ... 72

3.2 Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and low spatial correlation channel. ... 73

3.3 Comparison of total MSEs obtained using different training sequences. AR
jammers and high spatial correlation channel. ... 74

3.4 Comparison of total MSEs obtained using different training sequences. AR
jammers and low spatial correlation channel. .. 75

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION
SYSTEMS

By

Yong Liu

December 2005

Chair: Tan F. Wong
Major Department: Electrical and Computer Engineering

There is an increasing demand for next generation wireless networks, including wireless

local area networks and the third generation cellular networks, that can provide high data rate for

broadband services, improve quality of service (QoS), and support more users. The use of mul-

tiple transmit and receive antennas can offer substantial performance improvement to a wireless

communication system by making the use of the extra degrees of freedom in the spatial domain

and thus is a promising technique to satisfy this demand. Many of the current space-time coding

schemes proposed for multiple-antenna systems assume perfect timing estimation and channel

estimation to achieve the expected performance gain. The lack of timing synchronization be-

tween the transmit and receive signals and the inaccuracy of channel estimation could degrade

the system performance.

In the first half of this work, we investigate the problem of timing estimation in multiple-

antenna systems with the aid of training signals. A slow, independent and identically distributed

Rayleigh flat-fading channel model is considered. We derive two maximum likelihood timing

estimators based on two different approaches, namely treating the channel as deterministic and

random, and present the corresponding Cramer-Rao bounds (CRBs). Then the optimal designs

of training signals based on some figures of merit associated with the CRBs are discussed.

In the second half of this work, we study the problem of the estimation of correlated

multiple-input multiple-output (MIMO) channels with colored interference. The Bayesian chan-

nel estimator is derived and the optimal training sequences are designed based on the mean

square error of channel estimation. We propose an algorithm to estimate the long-term chan-

nel statistics in the construction of the optimal training sequences. We also design an efficient

scheme to feed back the required information to the transmitter where we can approximately

construct the optimal sequences. Numerical results show that the optimal training sequences

provide substantial performance gain for channel estimation when compared with other train-

ing sequences.

CHAPTER 1
INTRODUCTION

There is an increasing demand for next generation wireless networks, including wireless

local area networks and the third generation cellular networks, that can provide high data rate

for broadband services, improve quality of service (QoS), and support more users. The use of

multiple antennas at both the transmitters and receivers in wireless communication systems is

a significant technical breakthrough which can offer substantial performance improvement to

wireless links by making the use of the extra degrees of freedom in the spatial domain and thus

is a promising technique to fulfill these requirements. A system employing multiple transmit

and receive antennas is often called a multiple-input multiple-output (MIMO) system. Recently,

the MIMO system and its related techniques have been widely considered for next generation

wireless communication systems such as wireless local area networks (WLAN) and the third

generation (3G) cellular networks. With multiple antennas, the communication performance can

be improved by many orders of magnitude without increasing transmit power and bandwidth.

Only more hardware complexity is needed. This additional hardware requirement is enabled by

the increasing computational power of integrated circuits.

MIMO systems provide various benefits that include spatial multiplexing gain and diver-

sity gain. The information capacity of wireless communication systems increases significantly

by employing multiple antennas. It has been analytically proved that MIMO systems can pro-

vide a linear increase in capacity [1, 2] which is proportional to the minimum of the number

of transmit antennas and the number of receive antennas. This spatial multiplexing gain can

be obtained by transmitting independent data streams from different transmit antennas. The

increased information rate is achieved without the requirement of increasing the transmit power

and expanding the transmission bandwidth.

The physical characteristics of the wireless channel present a fundamental technical chal-

lenge for reliable communications. Wireless communication channels exhibit significant sig-

nal variations on a short term time scale which is known as fading. One way to mitigate the

several replicas of the same transmitted signal over independent fading channels. The proba-

bility that all the received signals experience deep fades simultaneously reduces considerably.

Thus diversity techniques increase the reliability of wireless links and dramatically improve the

communication performance over fading channels. The commonly used diversity techniques

include time diversity, frequency diversity and spatial diversity. Time diversity can be provided

by channel coding combined with interleaving or automatic repeat request (ARQ) schemes. In

frequency diversity, the same narrowband signal is transmitted over over different frequency

bands to provide independent fading channels. Spatial diversity, which is also known as an-

tenna diversity obtained by the use of multiple antennas, is preferred over time diversity and

frequency diversity since it does not need to increase the transmit signal power and bandwidth.

If the fading effects between different pairs of transmit and receive antennas are approximately

independent and the transmitted signal is carefully designed, the received signals can be com-

bined at the receiver such that the fading of the resultant signal is greatly reduced compared to

a single antenna communication system and thus wireless link improvement is provided.

Space-time coding (STC) is one key technique that has been introduced to provide en-

hanced performance for wireless communication systems employed with multiple antennas.

Space time codes are designed to use the extra degrees of freedom in the spatial domain pro-

vided by extra antennas. They incorporate the temporal and spatial correlations into signals

from different transmit antennas to achieve transmit diversity and provide spatial multiplexing

gain. The main classes of space time codes include the Bell labs layered space-time architecture

(BLAST), space-time trellis codes (STTC) and space-time block codes (STBC).

Tarokh et al. [3] proposed space-time trellis codes which can provide full diversity gain at

the receiver. After that, many efforts have been made to improve the originally designed space-

time trellis codes [4, 5]. Since space-time trellis codes are designed based on trellis codes, they

provide additional coding gain. But the Viterbi algorithm has to be employed for the optimal

decoder of STTC, and thus the decoding complexity grows exponentially with the memory

length of trellis codes and the number of antennas.

To reduce the decoding complexity, Alamouti introduced a simple space-time block coding

scheme for a two transmit antenna system which can provide full diversity gain without sacrific-

ing the transmission data rate [6]. The scheme was extended to more than two transmit antennas

based on the theory of orthogonal designs [7, 8, 9]. Space-time block codes can be decoded us-

ing much simpler linear processing at the receiver compared with the Viterbi algorithm required

for space-time trellis codes. Although space-time block codes achieve the same diversity gain

as space-time trellis codes for the same number of transmit antennas, they do not provide any

significant coding gain. To make a compromise between STBC and STTC, the schemes of con-

catenating the traditional trellis codes with space-time block codes to obtain additional coding

gain has been proposed [10-14].

BLAST [15, 16] is the first space-time coding scheme proposed for MIMO systems which

provides spatial multiplexing. In BLAST, the multiple independent data streams are transmit-

ted from different transmit antennas, and are extracted by using the interference nulling and

interference successive cancelation strategies at the receiver. This decoding scheme operated

in spatial domain for BLAST is similar as the successive interference cancelation proposed for

multiuser detection [17] in CDMA systems. Field tests showed that BLAST provides a substan-

tial increase of data rates for wireless communication systems operating in practical channels

[18].

1.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels

To achieve the performance gain promised by the multiple antenna system, parameter es-

timations including channel estimation, timing estimation and frequency offset estimation are

key components of the space-time system design. Both channel estimation and frequency offset

estimation for MIMO systems have been extensively studied in the literature [19, 20].

An issue that has not been sufficiently explored is timing synchronization in multiple-

antenna systems. Inaccuracies in timing synchronization can degrade the performance of such

communication systems in a similar way as the MIMO channel estimation and frequency offset

estimation error do. For instance, many of the current space-time coding schemes proposed

for multiple-antenna systems assume perfect knowledge of timing and channel gains at the re-

ceiver in order to be able to achieve the promised diversity gain and capacity improvement.

The performance of these systems may be limited by the accuracy of timing estimation. One

objective of this work is to study the problem of timing estimation for a wireless communica-

tion system employing multiple transmit and receive antennas in a Rayleigh flat-fading channel

environment.

1.2 Channel Estimation for Correlated MIMO Channels with Colored Interference

For the multiple antenna communication system, theoretical analysis [1, 2, 15] shows that

the capacity increases linearly with the number of antennas under the assumption that channel

gains between different transmit and receive antennas are identical and independent distributed

(i.i.d.). The i.i.d. assumption is reasonable for sufficiently rich scattering environments. On the

other hand, it is also important to analyze the capacities, design optimal transmission strategies,

and investigate the related channel parameter estimation problem for MIMO systems in more

realistic situations which include spatially correlated channels and colored interference.

In the more realistic channel environment, fading correlation exists between the different

transmit antennas and receive antennas. It was shown [21 ] that the capacity of correlated MIMO

channels still grows linearly with the number of antennas but the growth rate is affected by the

channel correlations and smaller than that in independent fading channels. Based on the ca-

pacity results for correlated MIMO channels, optimal transmission strategies [22-25] have been

widely investigated. Jorswieck et al. [24] investigated the correlated Rayleigh flat fading MIMO

systems with perfect channel state information at the receiver and the channel covariance infor-

mation fed back to the transmitter. It was shown that transmitting signals along the directions

of the eigenvectors of the transmit correlation matrix is the optimal transmission strategy.

The capacity of MIMO channels has also been investigated for wireless communication

systems with colored interference. The scenario arises in cellular systems where the users in

one cell suffer from the co-channel interference from the users in other cells due to frequency

reuse, or in ad hoc networks where each transmitter-receiver pair suffers from the interference

from other transmitter-receiver pairs operating in the same frequency band. In Lozano et al.

[26], the capacity of MIMO systems with the presence of spatially colored interference was

investigated. It was shown that the capacity increases with the interference spatial correlation

and the lowest capacity is achieved when the interference is white. In Moustakas et al. [27], the

authors provided analytical expressions for the statistics of the mutual information for spatially

correlated channels with the presence of interference.

Channel estimation is necessary for coherent detection in multiple antenna communication

systems. The inaccuracy of channel estimation could degrade the system performance substan-

tially. There are few works considering the channel estimation problem for MIMO systems

in realistic situations, which include both spatially correlated channels and interference. So

another objective of this work is to investigate the problem of estimating correlated MIMO

channels with colored interference.
1.3 Organization of the Dissertation

The dissertation is organized in the following manner. The timing estimation problem

for MIMO systems with the aid of training signals is investigated in Chapter 2. In Chapter

3, we study the problem of estimating correlated MIMO channels in the presence of colored

interference. Conclusions are drawn in Chapter 4. The notation used in this dissertation is

summarized in Table 1.1 for clarity.

Table 1.1: Matrix Notations

A
a
Real(a)
It
0
diag(xi, x2,..
A7'
A*
AH
A1/2
vec(A)
tr(A)
det(A)
AB
a > b
A
ar
CN

b(t)

,X1)

matrix with complex entries
column vector with complex entries
real part of column vector a
n x n identity matrix
zero matrix
diagonal matrix with xa, 2, -... n as the diagonal elements
transpose of A
complex conjugate of A
complex conjugate transpose (Hermitian) of A
Hermitian square root of A
vector obtained by stacking columns of A on top of each other
trace of A
determinant of A
Kronecker product of A and B
inequality elementwise
matrix with real entries
column vector with real entries
complex Gaussian distribution
the first derivative of (t) w.r.t. t
the second derivative of 0(t) w.r.t. t

0) I

CHAPTER 2
TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVER RAYLEIGH

2.1 Introduction

In this chapter, we investigate the timing estimation problem for a wireless communication

system employing multiple transmit and receive antennas with the aid of training signals.

Previous related work was primarily restricted to acquisition in spread spectrum systems

with multiple receive antennas [28, 29]. In Dlugos et al. [28] and Win et al. [29], the maximum

likelihood estimator of the received code lag was obtained, and the error probability for the

acquisition system was derived. A deterministic but unknown channel was considered in Dlugos

et al. [28], whereas a flat Rayleigh fading channel with known statistics was assumed in Win

et al. [29]. An optimal estimator for code acquisition was derived in Shamain et al. [30] for

spatially correlated channels. In Zhang et al. [31], the performance of code acquisition in a

DS-CDMA system employing multiple transmit antennas was analyzed. Through simulations,

it was shown that the presence of multiple transmit antennas improved the code acquisition

performance, relative to that of a single-antenna system.

Issues related to parameter estimation of signals received by an array of antennas have

also been treated in the radar array signal processing literature [32, 33]. Time delay and spatial

signature estimation of known signals received by an array of antennas was investigated in

Swindlehurst et al. [34]. ML algorithms and the Cramer-Rao bound for time delay and array

calibration estimation were developed, and some computationally efficient approximations of

the ML algorithms were proposed. In Dogandzic et al. [35], ML methods were developed for

space-time fading channel estimation with an antenna array in spatially correlated noise. The

CRBs for the unknown directions of arrival, time delays, and Doppler shifts were derived, under

a structured and unstructured array response model.

In the present work, we consider a wireless communication system with multiple trans-

mit and receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh
7

flat-fading environment. The goal is to investigate the problem of timing estimation in such

a system with the aid of training signals. One of the main questions that we try to answer is

to find the optimal training signal design. We investigate the timing estimation problem under

two approaches. In the first approach, the channel is assumed to be unknown and determinis-

tic where joint estimation of the channel and delay is carried out. We derive an ML estimator

for joint channel and timing estimation, and compute the associated CRB. Then we discuss

the optimal training signals with respect to two performance measures based on the CRB: the

outage probability that the CRB is larger than a threshold and the average CRB. We show that

the optimal training scheme is one wherein orthogonal training signals from multiple transmit

antennas are used. In the second approach, the channel is assumed to be unknown but random

with known statistics. We use the likelihood function averaged over all random channel real-

izations to obtain the ML estimator for the delay. We derive the associated CRB and study the

optimal training scheme in terms of minimizing the CRB. We show that perfectly correlated

training signals employed at different transmit antennas constitute the optimal transmit scheme,

in contrast to orthogonal training signals in the first approach.

The rest of this chapter is organized in the following manner. The system model is in-

troduced in Section 2.2. In Section 2.3, we consider the timing estimation problem when the

channel is assumed to be unknown but deterministic. In Section 2.4, we study the problem of

timing estimation with the assumption that the channel is random but with known statistics. In

both sections, we derive the ML timing estimators and compute the associated CRBs. Optimal

training signal designs are discussed based on the corresponding CRBs. In Section 2.5, some

discussions comparing these two timing estimation approaches are provided.
2.2 System Model

We consider a single-user MIMO system with 7t transmit antennas and nr receive anten-

nas. We assume a quasi-static (block fading) channel where the channel varies slowly enough

to be considered invariant over a block. However, the channel changes to an independent value

from block to block. By using the unstructured array model [33], the received baseband signals

at the receive antennas are given in vector form by
nt
r(t) = hksk(t T) + n(t), (2.1)
k=1

where hk = [hk2, hk2,... hknr]w with hi3 denoting the channel gain from the ith transmit

antenna to the jth receive antenna, r(t) is the nr x 1 received signal vector from the receive an-

tenna array and Sk (t) is the transmitted training signal from the kth transmit antenna. Define the

channel vector as h = [hf h,..., h ]T. Also, n(I) is a complex, circular-symmetric, white

Gaussian noise process with zero mean and covariance matrix E[n(t)n(u)HI =2 a2 6(t u).

The symbol T denotes the unknown, deterministic delay to be estimated. This model assumes

that the delays between all pairs of transmit and receive antennas are the same. This corresponds

to the case in which the distance between the transmit and receive antenna arrays is much larger

than the sizes of the arrays.

We consider the Rayleigh flat-fading channel model, in which the channel coefficients

h,j are i.i.d. complex, circular-symmetric, zero-mean Gaussian random variables with the

CA/(0, p2) distribution, i.e.,

E[hkh] P2ur, E[hkhfk = 0, and E[hihH] = E[hihT] = 0, for i 5 j.

The conditional likelihood function of r(t), given the unknown r and h, can be written as

p(r(t) I, h) = Tr-ra- 2exp ( r(t) hkSk(t 7) dt), (2.2)
k=1

where we have assumed that the training signals sk(t), for k = 1,...., n,, have finite durations,

and the observation interval To is larger than the sum of the maximum training signal duration

and the maximum possible value of T. Thus the whole transmitted training signals are observed

We can simplify the exponent of the likelihood function to find the sufficient statistics for
the estimation of the delay r:
To ut 2
/ r(t) hkSk(t-) dt
'0 k=1

= rH(t)r(t) dt 2Re{ [ rH(t)sk(t r) dt hk

= const 2Re r(t)sk(t r) dt hk
k=1 .
nt n t T o
+hfh s (t)sj (t) dt,
i1 =11
where the term const represents the part which does not depend on the delay r and the channel
h. Also, the last equality holds due to the assumption that To is larger than the sum of the
maximum training signal duration and the maximum possible delay.
Denote the matched filter output corresponding to the kth transmit signal by

rk(r) = r*(t)sk(t ) dt, k = 1,2.... nt. (2.3)

Note that r(r) = [rl(r)T, r2(7)T, ... r, (r7)T]T provides sufficient statistics for estimating r.
With this notation, we then have

2Re{ [ /T'rH(t)sk(t 7)dt hk} = 2Re{r(r)Th}. (2.4)

Denote the crosscorrelation between the training signals from the ith and jth transmit antennas
as
r = s*(t)s,(t) dt, (2.5)

which forms the (i, j)th element of the correlation matrix F. Let C = F 0 In,. Then, we have
nt 11t T(
E hhj s*(t)sj(t) dt = h"Ch. (2.6)
i=1 =1 0

From (2.4) and (2.6), the conditional likelihood function of r(r), given the unknowns r and h,
can be written as

p(r(r)|T, h) = 7r-' a-2nexp (const 2Re{r(T)Th} + hHCh1. (2.7)

Let (7r) = [Re(r(r))', -Im(r(r))T]T, f = [Re(h)T, Im(h)T]T, and

= Re (C) -Im(C) By using the isomorphism between real and complex matrices
Im(C) Re(C)
[36], we have 2Re{r(T)Th} = 2i(T)Th and h"Ch = 2h1TC. In terms of these real quantities,
the conditional likelihood function of f(r) is then

p(i(r)7r,h) = 7r-n"-2nexp cost 2(T) h + 2hTh (2.8)
2(2.8)

2.3 Timing Estimation with Unknown Deterministic Channel
In this chapter, we will treat h as unknown but deterministic in the estimation process and
consider the joint estimation of the delay r and the channel vector h.
2.3.1 ML Estimator
In this section, we develop the ML estimator for the joint estimation of the timing r and
the channel vector h. The joint ML estimate of r and h maximizes the conditional likelihood
function (2.8) as a function of r and h:

maxp((7-)|T, h) = max{mnaxp(r(r)|r, h)}. (2.9)
r,h h

Alternatively, we can maximize the log-likelihood function given by

1-
L = const + I (2f (r) h 2h'Tft). (2.10)

As suggested in (2.9), we first maximize the log likelihood function L over h. Taking the first
derivative of L with respect to (w.r.t.) h gives

aL 1
-- = {2f(r) 40C}.
Off 6,

By letting = 0, we get the ML estimate of the channel h as

m = -C- (T), (2.11)
2

where we have assumed that C, i.e. C, is nonsingular to obtain a unique estimation of the

channel. Then substituting (2.11) into (2.10) gives the ML estimate of the delay 7 in the form:

T,,,I = arg max {i(T)TC- ((T)}. (2.12)

To implement the ML estimator in general, we need to conduct a line search over all possible

values of r to maximize the above metric.
2.3.2 Cramer-Rao Bound

The Cramer-Rao bound gives a lower bound on the variance of any unbiased estimator

[36, 37]. It has been widely used to lower bound the mean square error (MSE) of symbol timing

estimators [38, 39]. It is well known [36, 37] that ML estimators, under mild regularity condi-

tions and with independent and identically distributed observations, are asymptotically unbiased

and efficient. It can be easily verified that the elements of r(T) given in (2.3) corresponding to

different receive antennas are i.i.d. observations. Thus for a particular realization of the channel

h, the ML estimator is asymptotically efficient, i.e., it approaches the CRB as the number of

receive antennas nr becomes large. Hence the CRB is a suitable performance measure for the

ML estimator of the delay 7. We will also verify the suitability of employing the CRB as a

performance metric by computer simulation examples.

The main result of this section on the CRB is contained in the following theorem.

Theorem 2.3.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the

training signals Sk(t), for k = 1,..., nt, exist and they are uniformly continuous on [0. To].

Together with the standard regularity conditions in [36, 37], the Cramer-Rao bound for the
estimation of the delay 7 for a given realization of the channel h is given by

,2 1
CRB(h) 2r() T E'TC- r' (2.13)

where E[ ] = [E[ a2 ], E[ ],.... E[ ] with

E = h' (t) ()}l=, i= 1,2,...,nt (2.14)
T k=1 k Jk= o

andE[9- = [E[ T E[2 ]T. .E[ ] with

E ) = h s (1)(t) dt, i = 1,2,...,nt. (2.15)
k=1 "
Proof The CRB for the estimation of r is given as

CRB(h) = (I-1)22, (2.16)

where I is the Fisher information matrix for the joint estimation of the channel h and the delay
T which is defined as

I = 112 -E -E (2.17)
121 122 -E 2Lj _-E Lj

Since t = 4C and = 0, we have
oh Oh
E 52L1 4-.
I1 = -E = -H (2.18)
1,h2I W2
Moreover,
112 = -E -21- (2.19)

Let v E [ =E E E[ .. ., -- T ,
thenl2 = = -^ [Re(v)T, _Im(v)T]T.
The ith block of v can be computed from
,(r) fT r(t) Os(t T) dt
1 r*(r'(t) 0 -

= nh .s*(I 7.) + n*(.)]) di
0 k=1
h= h*( s(t 7) dt n*(t) '(t-') dt.
k=1 0

The fact that the noise n(t) is zero-mean gives

E [ ( = h; s (t- r) (t dt = hi s*(t)(t) dt. (2.20)
ST k=1 k 9k=1

Finally, 122 = E [ ]) h = Re {E [a2)] h}. Similarly, 122 can be computed
from the fact that
E { )= h s*((t) i(t) dt. (2.21)
k=i 1
Applying the standard result on the inverse of a partitioned matrix to (2.16) and (2.17)
gives
CRB-'(h) = 122 1211-11112. (2.22)

By using the relationship between real and complex matrices [36], we get

121111112 2= T 2 V = 2VT-v*. (2.23)

Then the CRB of the estimation of the delay r is

CRB(h) = 2 ". (2.24)
2Re E[--J h +E [ C- E[LTJ

We note that the CRB varies with different choices of training signals. By carefully choos-
ing the training signals to minimize a suitable measure associated with the CRB, we can poten-
tially improve the estimation performance.
2.3.3 Optimal Training Scheme
Communication systems often employ the same symbol waveforms for both training and
data phases. The choice of the symbol waveform is mainly decided by the performance required
by data transmissions. In this section, we shall make the following simplifying but practically
reasonable assumptions on the training signals:

Assumption 1

Let ak = [ak(0)...., ak(N I)]r be the training sequence assigned to the kth transmit

antenna, and on this antenna the training signal waveform is of the form
N-1
Sk(t) = E ak(i)4(t iT,), (2.25)
i=0

where N is the number of training symbols and Vp(t) is the symbol waveform. We call the

N x nt matrix A = [a. a2, ..., an,] as the training sequence matrix.

Assumption 2

The symbol waveform Vj(t) is time-limited to a single symbol period [0, T,] so that adjacent

symbols do not interfere with each other. In addition, O(t) is sufficiently smooth to guarantee

the existence of uniformly continuous first and second derivatives. This condition is satisfied

for most symbol waveforms of practical interest. Two typical examples are the time-domain

raised-cosine pulse and the half-sine pulse.

Assumption 3

AHA is nonsingular, and hence F and C are also nonsingular. We note that this implies

that N > nt.

Under the assumptions stated above, the CRB for the timing estimation can be simplified

to the expression summarized in the following corollary.

Corollary 2.3.1 (Cramer-Rao bound). Given Assumptions 1-3, the CRB for the estimation of

-r for a particular realization of the channel h reduces to

CRB(h) = -2 1H (2.26)
2V, hH(AHA 9 In,)h'

where '0 = ,bc -r- I Od12, Ob f I(t)12 dt, V)c = f V*(t)(t) dt, and 'd = f ?*(t) (t) dt.

Proof With the three assumptions on the training signals, we have
7 ToN-1 N-1
s'(t)(t)dt = ( Eak*(m)O*(t mT,) E (l)ai(t-lT,)dt
m0 0n=0 1=0
H aa,. I (t)(t) dt (2.27)

Then, Eqn. (2.14) can be written in terms of the training sequences as:

E 02 :c h(a"a. (2.28)
k=1
Thus
E [2r h = ,E T hha'Ha, = V,1h(AHAI I)h. (2.29)
E -r2 i k-k
i=1 k=1

Hence 122 = -gRe E [ hr = 2chH(AHA 0,,)h.
Moreover, (2.15) can also be simplified in terms of the training sequences as:

E -d h.akai. (2.30)
k=l

Thus E = -)d(AHA Inr)*h*. Similarly, we have

= s(t)sj(t) dt = ,a aj (2.31)

and C = 'b(AHA 0 I,,).Hence, (2.23) can be written as

121 -1112 = 2Vdh (AHA 0 In,r)(bAHA 0 In,)-'V*(AHA 0 I1,)h
= 2 dIhH(AHA I9,)h. (2.32)
0-2 Ob

Then the Cramer-Rao bound for the estimation of the delay r is

CRB(h) = [122 121111121-1
= O hH(AHA 0 I,)h -2 d h(AA I,,) h

S a2' b 1 (2.33)
2(',00, + '..', 2) hH(AHA 0 Io,)h

By using some standard properties of the Fourier transform similar to the Parseval's theo-
rem, we have b = f[ |()|2d, '_ -2 .- j 2 (w) 2dw, and
ad = f J jl W'(w)j2dw, where T1(w) is the Fourier transform of V(t). Then according to the

Cauchy-Schwarz inequality, we have

0a = M'b + I dl2
[ T! 0 ~()12dw]2 +1[j +00 C(9

27r 27r

< 0. (2.34)

Since Ob > 0, we have > 0 which implies that the expression of the CRB given in (2.33)

is nontrivial.

As a result, the dependence of the CRB on the training signals Sk(t), for k = 1,..., nt,

simplifies into that on the training sequence matrix A and the symbol waveform i(t). In the

following two subsections, we optimize the training sequence matrix A in terms of two perfor-

mance measures, namely the outage probability that the CRB is larger than a threshold and the

average CRB over all channel realizations.

Outage probability

In this subsection, the outage probability that the CRB is larger than the threshold 6, i.e.

Pr(CRB(h) > (), is used as a performance measure with respect to which the training signals

from different transmit antennas are optimized.

Write the spectral decomposition of AHA as AHA = UAUH, where U is a unitary

matrix and A = diag{ A1, A2, ., An, } is the diagonal matrix containing the positive eigenvalues

of AHA. The design of the optimal training scheme can now be formulated as the following

optimization problem:

mmin Pr(CRB(h) > c)
A
subject to tr{AHA} = Eib1 A. <_

A, > 0, i = 1,...,n (2.35)

where pbtr {AHA} < P specifies a constraint on the total transmit power.

First, we consider a simple but important case: 2 transmit antennas and 1 receive antenna.
In this case, the optimization problem (2.35) can be simplified as follows. Starting from Corol-
lary 2.3.1, we have

Pr(CRB(h) > e) = Pr hHAHAh < (2.36)

With the spectral decomposition of AHA, hHAHAh = hHUAUHh = h'HAh' = Ai|i|2 +
A2h'12, where h' = UHh. Since h is a random vector with i.i.d. complex, circular-symmetric,
zero-mean Gaussian elements and U is a unitary matrix, h' is also a complex Gaussian random
vector with the same distribution as h. We note that Ih'I 2 has the exponential distribution with
E(|h 12) = p2.
Let X = and c = P for i = 1, 2, then

Pr hHAHAh < ) ) Pr (cXi + c2X2 < -, (2.37)
2 2ea/ 2cV\ Pp2

where X1 and X2 are independent random variables with exponential distribution and E(Xi) =

E(X2) = 1. The total power constraint A, < -is equivalent to c, + c2 < 1. Hence the
optimization problem can be rewritten in the following simple form:

min Pr (ciXi + c2X2 < 2Eb2
Ci,C2 \ pp2
subject to c + c2 < 1, and c1,c2 >0 (2.38)

In order to solve the above optimization problem, we employ the following result on the
Schur-convexity' of the distribution function of the linear combination of two exponential ran-
dom variables [41].
Lemma 2.3.1. Let X1 and X2 be independent random variables with exponential distribution,
and E(Xi) = E(X2) = 1. Then the function

F(ci. c2, x) = Pr(ciXi + c2X2 < x), where cl + c2 = 1 and cl, c2 > 0,

A detailed description on Schur-convexity and majorization can be found in Marshall et al.
[40].

is Schur convex on (cl, c2) ifx < 1, and it is Schur concave on (ci. c2) if x > 3/2.

Using the above lemma and considering the region in which the CRB threshold e >

- P-- the optimization cost function in (2.38) is a Schur convex function on (c1, c2) Thus

minimization of the cost function occurs if and only if cl = c2 = i.e.., A1 = A2 = [40].

This implies that the optimal A is such that A"A = -E1I2. The optimal training scheme is

summarized in the following theorem.

Theorem 2.3.2. Suppose that the CRB threshold > the training sequence matrix

A such that AHA = -I minimizes the outage probability of the CRB for a system with 2
2Vkb
transmit antennas and 1 receive antenna. That is, the optimal training sequences from different

transmit antennas are orthogonal to each other and have equal powers.

We shall see from the discussion in the next subsection on the average CRB (Corollary

2.3.2), the value ---- is exactly one half of the average CRB over all channel realizations.

Thus, it is reasonable to consider the stated region of the CRB threshold.

It seems natural that a result analogous to the one in Lemma 2.3.1 be true for the more

general case. While the proof of such a result remains open, there is strong evidence regarding

the Schur convexity of the function F(ci,..., c,,, x) = Pr(cl Xi + + cn, X,, < x) where Xi,

for i = 1 ... nt, are independent random variables with unit-mean exponential distribution.

The following conjecture has been advanced in Merkle et al. [41], supported by some strong

numerical results.

Conjecture 2.3.1. The family of unimodal distribution functions F(c ,.... cn, x) is increasing

with respect to the variance (i.e., Schur-convex) for small values x, and decreasing (i.e., Schur-

concave) for large values of x.

Based on the above conjecture, we conjecture that the result in Theorem 2.3.2 extends to

the case of arbitrary numbers of transmit and receive antennas:

Conjecture 2.3.2. When A"A = --I, the outage probability of the CRB is minimized if the

CRB threshold c is not too small. Thus the optimal training sequences from different transmit

antennas, in terms of minimizing the outage probability, are orthogonal to each other and have

equal powers.

In Hassibi et al. [42], the authors assumed perfect timing estimation and studied the prob-

lem of choosing the optimal training sequences for channel estimation to maximize a lower

bound on the capacity of the channel that was learned by training. The optimal training se-

quences for channel estimation turned out to have the same structure as those we get here for

timing estimation.

To illustrate our conjecture on the optimality of orthogonal sequences, we have carried

out a large number of numerical calculations. In the broad region of c that we are interested
in, we have not observed the existence of any other schemes which can achieve a lower out-

age probability than the orthogonal training signals. In Fig. 2.1, we plot, for instance, the

outage probabilities Pr(CRB(h) > e) for a system with 4 transmit antennas and a single re-

ceive antenna employing different training signal sets. Note that since P is the total transmit
power constraint, the signal-to-noise ratio (SNR) p2 here should be understood as the total

SNR for the whole training period instead of the SNR for one symbol period. The time-domain

raised-cosine pulse is used as the symbol waveform. The results in the figure suggest that the
orthogonal training signals are optimal and can provide a significant performance gain over the

other training signals.
In Fig. 2.2, we compare the outage performance of orthogonal training sequences for dif-

ferent numbers of transmit antennas. The results in the figure show that the use of multiple trans-
mit antennas can offer substantial estimation performance improvement over a single-antenna

system. For example, if we consider the outage probability Pr(CRB(h) > e) = 0.1, the two-

transmit antenna system can achieve a 4 dB performance gain and the four-transmit antenna

system can achieve a 6 dB performance gain. The performance gap grows with decreasing

outage probability.
More precisely, the outage probability for orthogonal training signals is given by

Pr(CRB(h) > E) = Pr hH h < ntb )

= 1 exp, 2 G pp2 2ba Pp2J (2.39)

where the second equality is obtained from the fact that hHh is X2 -distributed [43]. From
(2.39), it is not hard to see that when the SNR is large, i.e. > 't the outage probability
is approximately given by

1 F 2 ]n2
Pr(CRB(h) > c) b 2 [-f p] "(2.40)
(nnr),! 2eoa Pp2

Eqn. (2.40) indicates that the outage probability decreases with the (ntn,)th power of the re-
ciprocal of the SNR. The power ntn, is usually referred to as the diversity order of the system
[43]. Thus we conclude that the use of multiple transmit and receive antennas (with orthogonal
training signals) provides spatial diversity for timing estimation in the same way as space-time
coding does for demodulation [1, 15].
An important remaining issue is whether the ML estimator can achieve the outage prob-
ability of the CRB. For each realization of the channel h, the ML estimator is asymptotically
efficient with increasing number of receive antennas n,. We note that Pr(CRB(h) > e) =
Eh[l(CRB(h) > c)], where 1(-) is the indicator function. Because the indicator function is a
bounded function, the dominated convergence theorem [44] implies that the ML estimator can
achieve the outage probability of the CRB asymptotically.
To verify the suitability of using the outage probability as a performance metric when the
number of receive antennas is small, we evaluate the performance of the ML estimator via
Monte-Carlo simulations. In Fig. 2.3, we plot the outage probabilities of the ML estimator
obtained from simulation and calculated using the CRB, respectively, for a system with two
transmit antennas and employing orthogonal training sequences. It can be seen that the ML
estimator gives an outage probability performance very close to that predicted by the CRB even
for small values of n, = 1, 2, and 4. Hence, the simulation results verify that the outage prob-
ability of the CRB provides an effective performance metric also when the number of receive
antennas is small.
Average CRB
In this subsection, we use the CRB averaged over the Rayleigh flat-fading channel h as an
alternate performance measure based on which the training signals from the transmit antennas
are optimized.

>7
R'~.
\ \ 'S
\\ \q N7
.\
\ \
~ \

I \

25
SNR (dB)

Figure 2.1: Outage probabilities achieved using different training signal sets for a system with
4 transmit and 1 receive antennas. The unit of the threshold e is Tf .

E- =10-2,C=c=C=C3=4
-e- =10 2,c =2/3,c2=1/6,C3=c =1/12
-B- C=10 ,c1=9/10,c2=C3=C4=1/30
-V- =102,c =99/100C2 =C3 =C4=1/300
-*- =10- C1=C=C3C4
-0- =10 3,C1 =2/3,c2=1/6,C3=c4=1/12
- E- =10 3,c =9/10,c2=C3=C4=1/30
E=10 3C =99/100,C2=C3=C4=1/300

10-3,
10

23

-0- n=1,=10

100--. n =2,E=10-2
N10 .- n=4,E=103
N nt=4,E=10-3
Sn=2,E=10

( .. ^ ... .. .. .. .. ... n =1,E=10-4
G L -- nt=2,c=10-4
S. n=4,E=10

10-1 -\ V.

o N\

10V
10 15 20 25 30 35 40
SNR (dB)

Figure 2.2: Outage probabilities achieved using orthogonal training signals for different num-
bers of transmit antennas. One receive antenna is employed. The unit of the threshold e is
T 2

24

10 ---I
ML, nr=I
S ML, nr=2
... .. .... M L n 4 ....
CRB, n=1
\ \_ CRB, nr=2
CRB, n,-4

0

10 -

18 20 22 24 26 28 30 32 34
SNR(dB)

Figure 2.3: Comparison of outage probabilities of the ML estimator obtained from simulation
and calculated from the CRB. The number of transmit antennas nt is 2 and e = T104T?.

After averaging over the Rayleigh flat-fading channel h, the average CRB is given as

Eh[CRB(h)] 2bEh [hH(AH IJ. (2.41)

The design of the optimal training scheme can now be formulated as the following optimization

problem:

mmn Eh hH(AHAIn,)h

subject to tr{AHA} =E Ai <

Ai > 0, i = 1,...,nt. (2.42)

The following theorem specifies the optimal training sequence that minimizes the average CRB.

Theorem 2.3.3. When AHA = P1I, the average CRB over the Rayleigh flat-fading channel

h is minimized. That is, the optimal training sequences from different transmit antennas, in

terms of minimizing the average CRB, are orthogonal to each other and have equal powers.

Proof Let W = U'A'U'H, where A' = diag{A'i, A',..., A, } contains the positive eigenval-

ues of the Hermitian matrix W, and U' is a unitary matrix. Consider the following optimization
problem:

mmn E[

subject to tr{W} = EI' A' < ,v

A' > 0, i= 1,...,ntnr. (2.43)

Note that

E [h h = E[hU = E[hHA = E [znt ,l (2.44)
h hHWh h H U'A'U'Hh h'HA'h' =E_ AI|hl|2

where h' = U'Hh. As before, h' is a complex Gaussian random vector with the same distribu-
tion as h.
Let g(A') = ---, where xi > 0 are assumed to be fixed constants. We study the convexity
property of g.

We have -_ _- --- and 2- ,= j, Then the Hessian G(A') ofg is

It is easily seen that every rows of the Hessian G(A') are dependent. So rank(G(A')) = 1.
Ai I j 1."

A'-)3 (E )3
G =A') 2aEj 2ax2xrt
( A, )3 ( A* )3 ". (E )3

It is easily seen that every rows of the Hessian G(A') are dependent. So rank(G(A')) = 1.

G(A') only has one nonzero eigenvalue which is E 2X2 > 0. (the sum of eigenvalues of a

matrix is equivalent with the sum of all diagonal elements.) All other eigenvalues are zero.

Hence, the Hessian G(A') is a positive semidefinite matrix. Then g(A') is a convex function on

R" = {(A,..., A,,,) : A' > 0, for i= 1,..., ntn,}.

In order to solve the above optimization problem, we employ the following result from the

theory of majorization [40]. We first introduce some fundamental concepts of majorization that

we require in the derivation of the optimal transmit scheme.

For any x = (x:,..., x,) R", let xp] > ...> x[ denote the components of x in

decreasing order.

Definition 2.3.1. For vectors x, y A C R", vector y majorizes x on A if
k k
X[i] < y[i], k = 1,...,n-1

n n

i=1 i=1
The notation x -< y means x is majorized by y on A, or y majorizes x on A.

Majorization makes precise the vague notion that the components of a vector x are less

spread out or more nearly equal than the components of vector y.

Definition 2.3.2. A real-valued function f defined on a set A C R"n is said to be Schur-convex

on A if

x y => f(x) f (y)

f is Schur-concave if the above inequality is reversed. It follows that f is Schur-convex on

A if and only if -f is Schur-concave on A.

Lemma 2.3.2. IfX1,.... X, are exchangeable random variables and the multi-variable, single-
valued function g is a symmetric Borel-measurable convex function, then the function

f(a,,..., an) = E[g(aX1...., aXn)]

is Schur convex.

Since h' are i.i.d., they are exchangeable random variables. Since hI are exchangeable
random variables and g(A') is a symmetric Borel-measurable convex function, the function
E 2[ 1", is Schur-convex by the lemma.
Moreover, since P, is majorized by (Ai..., ,A't,) whenever A > 0, A' =
nr, we know [40] that E n is minimized with A' = A'2 r = We
A-b 1 1. 1A,'h',j s 1 = 2 -t,,r kbnt'
note that this choice of A', i = 1,..., ntn,, also satisfies the constraints in the minimization
problem in (2.42). Thus, it is also a solution to the original minimization problem. Thus the
optimal training sequence matrix A should satisfy A"A = P I which implies that the train-
ing sequences from different transmit antennas are orthogonal to each other and have equal
powers. O

With the optimal training sequences, we can provide an explicit expression for the average
CRB which is described in the next corollary.
Corollary 2.3.2 (Average CRB). Using the optimal training scheme, the average CRB over the
Rayleigh flat-fading channel h is given by

Eh[CRB(h)] -- b 2- (2.45)
2 (n _L) Pp2

when ntn, > 2.

Proof From Theorem 2.3.3 and its derivation, the average CRB under the optimal training
scheme is given as
Eh[CRB(h)] = E2hbE p (2.46)
20,, bLnt 2i=l 2i I
where h' are i.i.d. complex circular-symmetric Gaussian random variables with the C.A/(0, p2)
distribution.

Let Y = ~ ~'r Ih' 2. Then Y is xine-distributed with the probability density function
(p.d.f.)
fy(y)=1 y-It-le -, for y > 0. (2.47)

Let Z = 1/Y. The p.d.f. of Z is given as

fz(z) = -fy = 7(tt2r ), n,,+l' for z > 0.

The expectation of Z can be computed as

E(Z) = zfz(z)dz
1 00
= )2ltTr2tF(t) j e- z"ntn-2 dz. (2.48)

When ntnr > 2 [45], we have

2n"i' (ntnr 2)! 1
E(Z) = p2nnr2ntr(ntn, 1)! p-2ntn,+2 p2(ntn 1) (2.49)

Then from (2.46) and (2.49), the average CRB can be written in a simplified way as

Eh[CRB(h)] = 0- (2.50)
2 (n, ) a Pp2

With the optimal orthogonall) training sequences, the average CRB is a simple function of
the constant --, which only depends on the symbol waveform V(t), the signal-to-noise ratio

P-2, the number of transmit antennas nt, and the number of receive antennas nr. Note that the
average CRB in the limit of large nt or large nr can be approximated as

Eh[CR-B(h)] (2.51)
2nr pp 2, 2 2n, (VbU + I[d12) pp2'

which is inversely proportional to the number of receive antennas nr. When V'(t) is symmetric
about -, such as the time-domain raised-cosine pulse and the half-sine pulse, pd becomes zero.
Then the average CRB for the estimation of the delay r with orthogonal training signals can be

written as
1 cr2
Eh[CRB(h)] = 1(2.52)
2 _-L) 32Pp2

where = 1/2 [ (t)(t) dt/2 i -2 1/2 is known as the root-mean-
square bandwidth [37] of the symbol waveform. Here ID(cw) is the Fourier transform of 0 (t).
We note that the average CRB can be decreased by increasing the bandwidth of the symbol
waveform.
As before, we would like to know whether the ML estimator can achieve the average
CRB. Because the function hH(AHA0I,)h is not a bounded function, thus unlike the outage
probability of the CRB, the ML estimator may not achieve the average CRB asymptotically
(see further discussion in Section 2.5.1). However, the average CRB provides a lower bound for
the variance of any unbiased timing estimator averaged over the channel realizations.
Again, we employ Monte-Carlo simulations to evaluate the performance of the ML esti-
mator with a small number of receive antennas. In Fig. 2.4, we compare the mean squared
error (MSE) achieved by the ML estimator and the average CRB given by (2.45) for a system
with two transmit antennas and employing orthogonal training sequences. For a single receive
antenna system, the performance of the ML estimator deviates significantly from the average
CRB. This is due to the events in which all the channel coefficients are very small simultane-
ously causing the estimation performance to be very poor. The large estimation errors caused by
these events dominate the MSE of the ML estimator. We can see from the figure that the effect
of these events diminishes as the number of receive antennas or the SNR increases. In the for-
mer case, the error dominating events become rarer as the number of receive antennas increases.
In the latter case, the estimation errors, and hence the effect of the error dominating events, get
smaller as SNR increases. For a reasonably small value of n.r, e.g. 4, and a reasonably high
SNR, e.g. 20 dB, we see that the average CRB is still a rather appropriate performance metric.
2.4 Timing Estimation with Random Channel

Recently, differential space-time coding schemes [46, 47, 48] have been developed where
channel estimates are not required at the receiver. For this situation, we only need to consider

20
SNR(dB)

Figure 2.4: Comparison of the MSE of the ML estimator obtained from simulation and the
average CRB. The number of transmit antennas nt is 2. The unit in the vertical axis is T2.

the estimation of the delay r. A reasonable model to represent this scenario is that the channel
is random with known statistics.
2.4.1 ML Estimator
Recall that the conditional likelihood function p((7)|17, f) of i (7) in terms of real vectors
and matrices is given by (2.8). With the assumption of i.i.d. Rayleigh flat-fading channels
between the transmit and receive antennas, we have E [hh ] = p21n,,r and E al2] = I2n,,.n
The joint probability density function of the channel vector h is given as

p() = exp { h- 2fTl. (2.53)

We can average p((rT) |r, h) over all realizations of h to obtain the unconditional likelihood
function as

p(i(r)|r)

= /P((r()T,hfi)p(h)dhfi

= const x 1 exp { f(r)T (20 + PI r(7r) (2.54)
v/det(2C + (I) P

where we have used the integral result from Cramer [49, 11.12.1 a]. The natural logarithm of
p(f(r) 17) is the log-likelihood function:

ln[p(r(7T)7)] = const + -T.fT 2C + -I f(7). (2.55)
p2 )

By using the relationship between real and complex matrices [36], the log-likelihood function
can be written in terms of complex quantities as
1 .7T C 2 -1
ln[p(r(r)|r)] = const + -r(r) C + -I) r(r)*. (2.56)

Hence the ML estimator for the delay r is given by

Tn-, = argmaxp(r(r) T) = arg max r(r)T C + 2-I) r(r)* (2.57)

We assume that L is known to the receiver for the implementation of the ML estimator. We
note that the matrix C + -1 is always invertible. So unlike the restriction in Section 2.3, C can

be singular which implies the training signals from different transmit antennas can be correlated
with each other.
2.4.2 Cramer-Rao Bound
The CRB for the timing estimation based on the random channel model is summarized in
the following theorem.
Theorem 2.4.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the
training signals Sk(t) exist and they are uniformly continuous on [0. To]. Together with the
standard regularity conditions [36, 37], the Cramer-Rao bound for the estimation of the delay
r over the i.i. d Rayleigh flat-fading channel model is given by

CRB = 0(2.58)
2n, tr{D-1G}"

where D = F + I1 and the (i, j)th element of G is

G p2 (IT Sk(t)A*(t) dt T s'(t)s j (t) dt
k=1 0
+ p2 sk(t)d*(t) dt ss(t)sj(t) dt
2 k=1 / \JI )

+p ( f sk(t)(t) dt) ( s(t)j(t) dt (2.59)
2 k=1 Jo Jo 0
fori,j = 1,2,...,nt.

Proof To derive the CRB, we start from the log-likelihood function in (2.56):

ln[p(r(r)|r)] = const + {r()TC + i 2 r(r)

The second derivative of the log likelihood function ln[p(r(T)|Ir)] w.r.t. r is

021n[p(r(T) lT)]
0r2
2 r(r) T2+ ) -r(*
a2 2r I p2 -)_

1 a2r( rT
U2- 2

2 2 -1)
= tr{(Cp2yI,

tr{ (C + I)-
T2 1 p2 W

12 r (7) C
r{c

P -1
I2 r()

12 (

) 2 T
2 '2

The expectation of the above is

E{ 021 n[p(r(i)r)] }

+ 2 2 E2 aTO
1 DT2 Tr( -

+-itr{(C + 0) E[ r(r) ] }
+-T-2tr{ (C E [r(T)* a2rT]}

Write r =
aO-

r ,()T o9r .T ar, T where the ith block can be computed as

h*~s (t ) dt-
k=1 k r -

(t) s(t r)
o 19'

Or (T)* Or (T)
= hk [ f Sk(t s(t ) dt

T O s(t s- r) dt]
0o kt a) -5

Ht
k=1 I

+ n(t) -' _r dt

+ TO nH(t) asj dt .
0 aM

Recall that the channel gain vector h is assumed to have i.i.d complex circular symmetric Gaus-
sian elements, i.e., E[hkhH] = p2,Ir, E[hkh ] = 0, and E[hihy] = E[hih '] = 0, i j. Thus

2 )
+ I21

-1 2 ( )*
9r2

D- 19iT f

2 -1 *2 "*
+ I r(r)

9Or(r)
Di-

(2.60)

Then

(2.61)

we have

E ari(r)* ar (r)
EL Sr r J
2 (p2 s k(t ) s*(- dt
k=1
T s(t T) s (t -
+f. 2)T dt

2= p k (1) (-dt ( To
k=1 0 0 J) o

Ss (t -7) (t T) dt

In'

st (1),(1)di) + a2 fTo u((t) dl}Inr.
0

As a result, we have E N o J = P 0 In,, where the (i, j)th element of P is given by

pIj = p2 s ( J s(t)*(t) dt)
k=1 0

( O (t) j (t) dt) + -2 (t),(t) dt.

Similarly, we also have E --5 r (T) ] + E r(r)*d2 = Q I, where the (i,j)th
element of Q is given by

"' To
Qj = P2 Sk
k=1

k=1
for i,j = 1,2,...,nt.
Let D = r + 2I, then
;;

E { ln[p(r(T))]
&r2

sTo (
S0(t) S(t)dt) +
) 0

J *(t)s (t) dt
Jo

S(t)S:(t) dt) (

Sk(t)s*(t) dt) (

+- 2 JT

= -tr{(D s In,)-(P I)} + 1tr{(D In,) (Q 0 I)}

= tr (D I,,) P + Q) 0 In
= tr D-1 P + 1Q I,

2 n, x tr D-1P + Q (2.62)
0`22 1

where the second equality is obtained by using (A g B)-1 = A-1 0 B- 1 and the third equality
is obtained from the property (A 0 B) (C 0 D) = (AC) (BD) [50]. Let G = P + 1Q. Since

s (t 7)sj(t 7) dt = F = const.

fo sWgj(t)dt
To /

s*(t)gj(t)dt,

differentiating the left side twice w.r.t. r gives

2 J *(t)sj(t) dt + f (t)sj(t) dt + J s*(t)sj(t) dt = 0.

Then using the above equality, the (i, j)th element of G becomes

1
Gi = P + IQ

SpE Sk(t),s(t) dt s(st),j(t) dt
k=1

+12 P f s (t)g*(t)dt) ( fo s*(t)sj(t) dt)

+ P2 E k(t)s*(t) dt s (t)sj(t) dt (2.63)
k= / \Jo /

for i, j = 1, 2,..., nt. As a result, we note that G does not depend on the noise a2. The CRB
of the timing estimation is given as

1 12
CRB = i = (2.64)
E{ 'i--In!i''ro i ...- 1, 2n, tr{D -1G }

2.4.3 Optimal Training Scheme
In this section, we impose Assumptions 1 and 2 made in Section 2.3 on the form of the
training signals. With these two assumptions, Gij can be simplified to

G = 0P2 Haaa = -P2a H {l H
k=1 k=1
Hence we have G = 0bp2AHAAHA. Thus the CRB for the timing estimation can be simpli-
fied as given in the following corollary.
Corollary 2.4.1. Given Assumptions 1 and 2, the Cramer-Rao bound for the estimation of the
delay r over the i.i.d Rayleigh flat-fading channel model reduces to

CRB = (2.65)
2nraP2 tr { (VbAHA + _I) AHAAHA}

Moreover, in terms of the eigenvalues A1, A2, ..., An, of AHA, we have
,7 2 1 n, u
tr AA p2 A AA"A = A2 (2.66)
P i= 1b + 7

Thus the minimization of the CRB is equivalent to the following optimization problem:

max E, I

subject to E1 Ai < P

Ai > 0, i = 1,...,nt. (2.67)

It can be easily verified that the cost function Y i -j' is a convex function on (A1,..., An,).

Then the following theorem specifies the optimal training sequences [51].

Theorem 2.4.2. The CRB is minimized by choosing the training sequence matrix A such that
A = -, and A2 .. = An, = 0. That is, the optimal training sequences from different

transmit antennas are perfectly correlated.

We note that the rank of the optimal training sequence matrix A is 1. This implies that we

can choose an arbitrary subset of transmit antennas to transmit the training signals as long as the

training sequences from the chosen transmit antennas are perfectly correlated with each other.

A common choice is to use the same training sequence and evenly assign the power to each

transmit antenna. With the optimal choice of training sequences, the corresponding minimum

CRB is given by:
CRB =(1+ ). (2.68)
2na Pp2 ( 2
On the other hand, when orthogonal training signals are employed, i.e., A = = An, =

the CRB is maximized to the value

CRB =-- 1 + n (2.69)
2nr,'da Pp2 PP2

Contrary to the previous case of joint estimation of the channel and delay where orthogonal

training sequences are optimal, they are the worst in terms of the CRB value for estimating the

delay under the random channel model. Fig. 2.5 compares the CRBs of the system with the

perfectly correlated training sequences and that with the orthogonal training sequences. Note

that the CRB of the system with the perfectly correlated training sequences is the same for any

number of transmit antennas. We see that the performance gain achieved by the optimal scheme

is obvious when the SNR is low. For any fixed nt, the performance gap vanishes as the SNR

becomes sufficiently large.

In Fig. 2.6, we compare the MSE achieved by the ML estimator and the CRB given in

(2.68) for a system with two transmit antennas and employing perfectly correlated training

sequences. The perfect correlation is obtained by using the same training sequence and evenly

assign the power to each transmit antenna. As will be discussed in Section V.B, no knowledge of

signal-to-noise ratio is needed to implement the ML delay estimator for this choice of perfectly

correlated training sequences. We observe from the figure that for a reasonably small value of

nr, e.g. 4, and a reasonably high SNR, e.g. 20 dB, the CRB is a tight lower bound on the MSE

performance of the ML estimator. This together with the asymptotic achievability of the CRB

suggest that it is an appropriate performance metric.

2.5 Discussions and Conclusions

In the previous two sections, we have studied the problem of timing estimation in multiple-

antenna systems from two different approaches. In Section 2.3, the channel h is assumed to

be unknown but deterministic and joint ML estimation of h and the delay 7 is performed. In

contrast, in Section 2.4, we assume that the channel is random but with known statistics and use

the likelihood function averaged over all channel realizations to construct the ML estimator for

the delay T. These two approaches lead to two different optimal training signal designs. For the

deterministic channel approach, we see that orthogonal training sequences minimize the outage

probability as well as the average CRB. For the random channel approach, perfectly correlated

training sequences minimizes the CRB. Here we compare these two approaches in terms of the

resulting ML estimators, CRBs, and suitability of the outage and average CRB performance

measures.

10-2
L)

0 5 10

SNR(dB)

Figure 2.5: Comparison of CRBs obtained using orthogonal training sequences and perfectly
correlated training sequences for different numbers of transmit antennas. Note that the CRB
of the system with the perfectly correlated training sequences is the same for any number of
transmit antennas.

20
SNR(dB)

Figure 2.6: Comparison of the MSE of the ML estimator obtained from simulation and the
CRB. The number of transmit antennas nt is 2. The unit in the vertical axis is T2.

2.5.1 Orthogonal Training Signals

When orthogonal training signals are employed, both the ML estimators of the delay r

under the deterministic and random channel approaches, respectively, reduce to

,m = argmax{r(r)Tr(r)*}. (2.70)

Thus the equal gain combiner for the received signals from the receive antennas is the ML

estimator for both approaches. Under the deterministic channel approach, the average CRB has

the value
Eh[CRB(h)] _) 2b. (2.71)
2 (nr p) P2

Under the random channel approach, the CRB has the value

CRB- b 21 + nt)or (2.72)
2na Pp2 ( Pp 2

As discussed before, the CRB in (2.72) is asymptotically achievable by the ML estimator when

the number of receive antennas goes to infinity. In addition, the limiting ratio between (2.71)

and (2.72), when nr approaches infinity, is 1- which is smaller than 1. This implies that
1+nt -
the average CRB in (2.71) is not achievable by the ML estimator asymptotically when n,. ap-

proaches infinity. On the other hand, for small values of nr, the value in (2.71) can be larger

than the value of (2.72) when the SNR is large enough. More precisely, this happens when

> nt(nrnt 1). Thus in this case, the average CRB in (2.71) actually gives a tighter bound
on the performance of the ML estimator. The simulation results in Fig. 2.4 are in agreement

with this observation.

In this sense, the average CRB may not be as good a performance measure as the outage

probability in the deterministic channel approach since the latter is asymptotically achievable,
starting at very small values of n,., by the ML estimator. However, for small values of n, and at

high SNR, the average CRB may still be a reasonable performance metric.
2.5.2 Perfectly Correlated Training Signals

Under the random channel approach employing perfectly correlated training signals, we

have AHA = P qqT where q is an arbitrary nt x 1 vector with qTq = nt. For instance,

q = [1, 1,..., I]T when we use the same training sequence and evenly assign the power to each

transmit antenna. By using the matrix inversion formula, the ML delay estimator for this choice

of perfectly correlated sequences is reduced to be exactly the same as the one for orthogonal

training sequences given in (2.70). We note that the knowledge of the SNR is not needed

to implement the above ML estimator. Comparing the results in Figs. 2.4 and 2.6, the MSE

obtained by the ML estimator with the perfectly correlated training sequences is smaller than

that obtained by the ML estimator with orthogonal training sequences for all cases considered in

the simulation studies. This observation is in agreement with the training sequence optimization

result based on the CRB that the perfectly correlated sequences are superior than the orthogonal

sequences under the random channel approach.

In general, the SNR information is needed to implement the ML estimator. We also note

that perfectly correlated training signals are not applicable in the deterministic channel approach

since they cannot be used to estimate the channel vector h.

2.5.3 Deterministic vs Random Channel Approaches

The results and discussions in the previous sections provide some guidelines of whether

to use the deterministic or random channel approaches in estimating the timing parameter. If

the design consideration is the outage probability, i.e., neglecting a small percentage of the

worst-case channel realizations, one would employ the deterministic channel approach with

orthogonal training signals. On the other hand, if the average estimation (over all channel

realizations) error is the main design criterion, one would employ the random channel approach

with perfectly correlated training signals. We note that the perfectly correlated training signals

cannot be used for channel estimation. Thus they may be more suitable for space-time coding

schemes that do not require the channel information. In addition, the advantage of the perfectly

correlated training signals over orthogonal signals vanishes at high SNR in the random channel

approach. Thus when the number of transmit antennas is not very large and at high SNR, one

could employ orthogonal training signals for either of the two approaches.

CHAPTER 3
CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS WITH COLORED
INTERFERENCE

3.1 Introduction

Many multiple antenna communication systems are designed for coherent detection that

requires channel state information (CSI) in the demodulation process. For practical wireless

communication systems, it is common that the channel parameters are estimated by sending

known training symbols to the receiver. The performance of this kind of training-based chan-

nel estimation scheme depends on the design of training signals which has been extensively

investigated in the literature.

It is well known that imperfect knowledge of the channel has a detrimental effect on the

achievable rate it can sustain [52]. Training sequences can be designed based on information

theoretic metrics such as the ergodic capacity and outage capacity of a MIMO channel [42,

53, 54]. The mean square error (MSE) is another commonly used performance measure for

channel estimation. Many works [55-65] have be carried out to investigate the training sequence

design problem based on MSE for MIMO fading channels. In Wong et al. [61], the authors

studied the problem of training sequence design for multiple-antenna systems over flat fading

MIMO channels in the presence of colored interference. The MIMO channels are assumed to

be spatially white, i.e., there is no correlation among the transmit and receiver antennas. The

optimal training sequences were designed to minimize the channel estimation MSE under a total

transmit power constraint. The optimal training sequence design result implied that we should

intentionally assign transmit power to the subspace with less interference. A practical algorithm

of estimating the long-term second-order statistics of the interference correlation matrix and

an efficient scheme of feeding back necessary information to the transmitter for constructing

the optimal training sequences were also proposed. In Kotecha et al. [62], the problem of

transmit signal design was investigated for the estimation of spatial correlated MIMO Rayleigh

flat fading channels. The optimal training signal was designed to optimize two criteria: the
42

minimization of the channel estimation MSE and the maximization of the conditional mutual

information (CMI) between the channel and the received signal. The authors adopted the virtual

channel representation model [66] for MIMO correlated channels. It was shown that the optimal

training signal should be transmitted along the strong transmit eigen-directions in which more

scatters are present. The powers transmitted along these eigen-directions are determined by

the water-filling solutions based on the minimum MSE and maximum CMI criteria. In Cai

et al. [65], the space-time spreading scheme, block coding scheme and channel estimation

for correlated fading channels in the presence of interference have been studied. The authors

focused on the single receive antenna case and extended their results to the multiple receive

antennas case where receive antennas were assumed to be uncorrelated. Based on the previous

optimization results for the special case [63] where there was no interference, the space-time

beamforming (STBF) matrix was chosen as the training symbol matrix for the linear MMSE

channel estimator. Then the optimal power loading scheme was designed for the training symbol

matrices in this particular set.

In this chapter, we investigate the problem of estimating correlated MIMO channels with

colored interference. We adopt the correlated MIMO channel model [21, 67] which expresses

the channel matrix as a product of the receive correlation matrix, a white matrix with identically

and independent distributed (i.i.d.) entries, and the transmit correlation matrix. This model im-

plies that transmit and receiver correlation can be separated. This fact has been verified by field

measurements. The colored interference model used here is more suitable than the white noise

model when jamming signals and/or co-channel interference are present in the wireless com-

munication system. We consider an interference limited wireless communication system, and

assume that the thermal noise is small relative to interference and can be ignored. Then we show

that the covariance matrix of the interference has a Kronecker product form which implies that

the temporal and spatial correlation of the interference are separable. The channel estimation

MSE is used as a performance metric for the design of training sequences. The optimization

problem encountered here which minimizes the channel estimation MSE under a power con-

straint is a generalization of two previous optimization problems which are encountered widely

in the signal processing area [61, 63, 64, 68]. We first analyze the optimal structure of the solu-

tion by using the Lagrangian method, and then find the optimal power allocation scheme which

has the water-filling interpretation. Finally we determine the optimal ordering for the related

eigenvector matrices. In Cai et al. [65], the authors encountered the essentially same optimiza-

tion problem but with the different form. Based on the the previous optimization results for the

special case [63], the authors chose to optimize the training sequence matrix in a particular set

of matrices which have the same solution structure and eigenvector ordering as our solution.

Here we rigorously prove that this particular solution structure and eigenvector ordering result

are optimal for arbitrary matrices with the power constraint. The design of the optimal training

sequences has a clear physical interpretation which implies that we should assign more power to

the transmission direction constructed by the eigen-direction with larger channel gains and the

interference subspace with less interference. In order to implement the channel estimator and

construct the optimal training sequences, we propose an algorithm to estimate long-term chan-

nel statistics and design an efficient feedback scheme so that we can approximately construct

the optimal sequences at the transmitter. Numerical results show that with the optimal training

sequences, the channel estimation MSE can be reduced substantially when compared with the

use of other training sequences.

The chapter is organized in the following manner. The system model and linear MMSE

channel estimator that we consider are introduced in Section 3.2. In Section 3.3, The optimal

training sequence is designed based on minimizing the total channel estimation MSE. In Section

3.4, an algorithm for the estimation of long-term characteristics of the channel is proposed

and an efficient feedback scheme is designed. Numerical results are provided in Section 3.5.

Conclusion is drawn in Section 3.6.
3.2 System Model

We consider a single user link with multiple interferers. The desired user has nt transmit

antennas and n, receive antennas. We assume that there are M interfering signals and the ith

interferer has n, transmit antennas. The MIMO channel is assumed to be quasi-static (block

fading) in that it varies slowly enough to be considered invariant over a block. However, the

channel changes to independent values from block to block. We assume that the users employ

a frame-based transmission protocol which comprises training and payload data. The received

baseband signals at the receive antennas during the training period are given in matrix form by
M M
Y = HST+ HiST = HST+E = HST + E,. (3.1)

E

The n, x n.t matrix H and the nr x ni matrix Hi are the channel gain matrices from the transmitter

and the ith interferer to the receiver, respectively. S is the N x nt training symbol matrix known

to the receiver for estimating the channel gain matrix H of the desired user during the training

period. N is the number of training symbols from each transmit antenna and N is usually

much larger than nt. S, is the N x ni interference symbol matrix from the ith interferer. We

assume that the elements in S, are identically distributed zero-mean complex random variables,

correlated across both space and time. The interference processes are assumed to be wide-sense

stationary in time. We consider an interference limited wireless communication system. Hence

we assume that the thermal noise is small relative to interference and can be ignored [69].

We adopt the correlated MIMO channel model [21, 67] which models the channel gain

matrix H as:
H = R2H,, R1/2 (3.2)

where Rt models the correlation between the transmit antennas and Rr models the correlation

between the receive antennas, respectively. The notation (.)1/2 stands for the Hermitian square

root of a matrix. H,, is a matrix whose elements are independent and identical distributed zero-

mean circular-symmetric complex Gaussian random variables with unit variance. Let h,, =

vec(H,), where vec(X) is the vector obtained by stacking the columns of X on top of each

other, then we have

h = vec(H) = (Rt/ 0 R )h,,, (3.3)

with h ~ C.N(0, R, R,) where CAf denotes complex Gaussian distribution. Similarly, we

model the channel gain matrix from the ith interferer to the receiver as:

H, = R'/2H,iR (/2

(3.4)

and

hi = vec(Hi) = (R/02 R/2)hw. (3.5)

Using the vec operator, we can write the received signal in vector form as

y = vec(Y)
M
= (S 0 In,)vec(H) + -(S 0 In,)vec(Hz)
i=1
M
= (S In,)h + Z(S 0 I,)(R2 R2)h
i=1
M
= (S0In,)h+ ee
i=1
= (S In,)h + e (3.6)

where In, denotes the nr x n,. identity matrix.

To derive the linear MMSE channel estimator, we need the following lemma.

Lemma 3.2.1. E(e) = 0 and the covariance matrix of e is

M
E(eeH) = QNi Rr = QN 0 R, (3.7)
i=1

where
SRk(O) ... R,k(- 1)
QN = : (3.8)

E= RZk (N 1) k=, R1Ik(0)
and R,k (r) represents the time correlation between the signals at time instants m and m + T

from the kth antenna of the ith interferer

Proof Since h,i ~- CaV(0, I.,n,), E(ei) = 0. Then we have E(e) = 0.

The received signal from the ith interferer can be written as

E. = HiS, = R/2H,, R/2S. = R1/2H, S. (3.9)

Since Si is wide-sense stationary in time, S, is also wide-sense stationary in time.

Using the vec operator, we can rewrite the interfering signal from the ith interferer as

ej = vec(Ei) = (IN 0 R'/2)vec(H ,,S,).

(3.10)

The covariance matrix of e, is given as

E(eefH) = E[(IN 0 R'/2)vec(H,,S,)vec(Hw,,S)H(IN 0 R/2)H]

= (I R1/2)E[vec(HewS )vec(HS)H](IN RN 2).

Let e' = vec(HwiSi), we can show that the covariance matrix of e is

k,k(0)I, ... k= R,k(N 1)I
E[eeH] : ".

JR 1 Rk(N 1)I .. E 1= Rk(0)Ir
= QN In,,

(3.11)

(3.12)

where Rk,k (r) represents the time correlation between the signals at time instants m and m + T

from the kth antenna of the ith interferer. Then we have

E[eie'l] = (IN 0 R'/2)(QNi 0 In,)(IN 0 R1/2)

(3.13)

= QNi Rr..

The covariance matrix of e is then given as

M
E[eeH] = Z QN 0 R, = Q .
i=1

(3.14)

We note that QN captures the temporal correlation of the interference and R, represents

the spatial correlation. The covariance matrix of the interference has a Kronecker product form
which implies that the temporal and spatial correlation of the interference are separable.

We notice that (3.6) represents a linear model. Based on the Bayesian Gauss-Markov
Theorem [36], the linear minimum mean square error estimator (LMMSE) for h is given as:

h = [(S" 3 In,)(Q O R,)-'(S I,) + (Rt R,)-']-1(S 0 In.,)(QN O R,)-ly
= [(S1HQNIS + R7')-1 R,](SH Inr)(QN 3 Rr 1)y

= [(SHQ NS + Rtl)- SHQN' 0 I ]Jy. (3.15)

Using the equality vec(AYB) = (BT 0 A)vec(Y), we can rewrite the channel estimator in the
compact matrix form as
fH = YQ-1S(SHQ NS + Rt )-'. (3.16)

Hence the channel estimator does not depend on the receive channel correlation matrix Rr.
The performance of the channel estimator is measured by the estimation error e = h h
whose mean is zero and whose covariance matrix is

C, = E[(h h)(h h)"]
= [(SH 0 I.,)(QN Rr) (S 0 In,) + (Rt 0 R,)-]-

= [(SHQN)S) ( Rr1 + R '-1 9 R,-]-1

= [(S"QNS + Rt) Rr-1
= (SHQ NS + Rt 1)-' 3 R, (3.17)

where the third equality is due to (A B)(C D) = AC BD and (A B)- = A-' B-1.
The diagonal elements of the error covariance matrix C, yields the minimum Bayesian MSE.
The total MSE is the commonly used performance measure for MIMO channel estimation. By
using the fact that tr(A 0 B) = trAtrB, we have

tr(C,) = tr((SHQN'S + R '1)-' s R,) = tr((SHQN'S + Rt-')-)tr(Rr).

Thus the minimization of the total MSE over training sequences does not depend on the receive
channel correlation matrix. Only the temporal interference correlation matrix QN and the trans-
mit correlation matrix Rt need to be considered in obtaining the optimal training sequences.

3.3 Optimal Training Sequence Design

In this section, we investigate the problem of optimal training sequence design for channel

estimation. With the total Bayesian MSE as the performance measure, the optimization of

training sequences can be formulated as follows

min tr(SHQN S + Rt-1)-1 (3.18)
S
subject to tr{SHS} < P

where tr{SHS} < P specifies the power constraint.

The optimization problem itself is of great interest to researchers in the signal processing

and communication areas. Its special cases (with either QN or Rt equal to the identity matrix)

have been encountered widely in joint linear transmitter-receiver design [63, 68, 70], training

sequence design for channel estimation in multiple antenna communication systems [61, 64],

and spreading sequence optimization for code division multiple access (CDMA) communication

systems [71].

The solution in the special case Rt = I, found for example in Wong et al. [61] and

Scaglione et al. [68], can be expressed in terms of the eigenvalues and eigenvectors of QN and

a Lagrange multiplier associated with the power constraint. Similarly, the solution in the special

case QN = I, found for example in Zhou et al. [63] and Biguesh et al. [64], can be expressed in

terms of the eigenvalues and eigenvectors of Rt and a Lagrange multiplier associated with the

power constraint. The optimization of the generalized mean square error problem introduced

here is more difficult. We will show that (3.18) has a solution that can be expressed S = UEVH

where U and V are orthonormal matrices of eigenvectors for QN and Rt respectively, and E

is diagonal. Solving (3.18) involves computing diagonalizations of QN and Rt, and finding an

ordering for the columns of U and V. In Cai et al. [65], the authors encountered the essentially

same optimization problem but with the different form. Based on the the previous optimization

results for the special case [63], the authors chose to optimize the training sequence matrix in

a particular set of matrices which have the same solution structure and eigenvector ordering as

our solution. Here we rigorously prove that this particular solution structure and eigenvector

ordering result are optimal for arbitrary matrices with the power constraint.

A related optimization problem which minimizes the trace of the mean square error matrix

in a variant form is discussed in Section 3.7.1., and another optimization problem which max-

imizes the determinant of the inverse of the mean square error matrix is introduced in Section

3.7.2.

We solve the optimization problem (3.18) in three steps. First, we analyze the optimal

structure of the solution by using the Lagrangian method, then find the optimal power allocation

scheme, and finally determine the optimal ordering for the related eigenvector matrices.
3.3.1 Solution Structure

We begin by analyzing the structure of an optimal solution to (3.18). Let UAUH and

VAVH be diagonalizations of QN and Rt where the columns of U and V are orthonormal

eigenvectors. Let Aj, 1 < j < N, and 6~, 1 < i < nt, denote the diagonal elements of A and

A, respectively. We assume that the eigenvalues { A, } are arranged in increasing order, and {5,}

are arranged in decreasing order:

062> ... -> n, > 0. (3.19)

Let us define
T = UHSV. (3.20)

Substituting S = UTVH in (3.18) gives the following equivalent optimization problem:

min tr (THA -1T+ A-)-1 (3.21)

subject to tr (THT) < P, T E CNxt.

We now show that the solution to (3.21) has at most one nonzero in each row and column.

Theorem 3.3.1. There exists a solution of (3.21) of the form T = IIH1II2 where IIi and 112

are permutation matrices and oai = 0 for all i \$ j.

Proof. We first prove the theorem under the following nondegeneracy assumption:

6 7 8j > 0 and A. \$ A, > 0 for all i # j.

(3.22)

Since the cost function of (3.21) is a continuous function of A and A, and since any A > 0 and
6 > 0 can be approximated arbitrarily closely by vectors 6 and A satisfying the nondegeneracy
conditions (3.22), we conclude that the theorem holds for arbitrary A > 0 and 6 > 0.
There exists an optimal solution of (3.21) since the feasible set is compact and the cost
function is a continuous function of T. Since the eigenvalues of A T"A- TA are nonneg-
ative, it follows that for any choice of T,

tr (THA-T + A-)-1 trA(ATHA-1TA +I)-1 < tr (A),

with equality when T = 0. Hence, there exists a nonzero optimal solution of (3.21), which is
denoted T. According to the Lagrange multiplier theorem, the first-order necessary condition
for an optimal solution is the following: there exists a scalar 7 > 0 such that:

tr ((THA-1T + A-)- + 7THT)T= = 0. (3.23)

For notation simplicity, let
M = THA-T + A-1. (3.24)

For any invertible matrix M, the derivative of the inverse of a matrix [72] is given as:

dM-a MIdM)M-
=-M- M-
dT

Hence, (3.23) is equivalent to:

tr (7[TH6T + 6THT] M-THA-16T + 6THA- M-1) = 0

for all matrices 6T e CNxIt.
Let Real (z) denote the real part of z E C. Based on the fact that tr (A + AH) =
2(Real [tr (A)]) and tr (AB) = tr (BA), we have

Real [tr (yH6T M-2THA-16T)] = 0.

By taking 6T either pure real or pure imaginary, we deduce that

tr ([-'"H M-2THA-1]6T) = 0

for all 6T. By choosing 6T to be completely zero except for a single nonzero entry, we conclude
that
7TT" M-'T"A-1 = 0. (3.25)

If 7 = 0, then T = 0 since both A and A are invertible. Hence, 7 > 0.
We multiply (3.25) on the right by T to obtain

TT = M-2THA-'T = (THA-T + A-1) -2HA-1T (3.26)

Since THT is Hermitian, we have

(THA -T + A-1)-2THA-l = THA-T(THA-1 + A-1)-2.

Then we will show that THA-lT and A-1 commute with each other. We need the following

lemma [73]:

Lemma 3.3.1. If A and B are diagonalizable, they share the same eigenvector matrix if and

only ifAB = BA.

For the simplicity of notations, let A = T"A-'T and B = A-1. Then we have

(A + B)-2A = A(A + B)-2

According to Lemma 3.3.1, A and (A + B)-2 share the same eigenvector matrix. Since A + B

and (A + B)-2 have the same eigenvector matrix, A and A + B share the same eigenvector
matrix. Then we have
A(A+ B) = (A +B)A

Hence,
AB = BA,

which implies that THAl-T and A-1 commute with each other. Since A-1 is diagonal,
TH A-1T is diagonal. Since THA -T is diagonal, THT is diagonal by (3.26).

Since THA-"I' and A-1 are diagonal, both M and M-1 are diagonal. Hence, the factor
M-2 in (3.25) is diagonal with real diagonal elements denoted ej, 1 < j < ntt. By (3.25), we

have
7 j = e (3.27)
Ai

If lj, 7 0, then (3.27) implies that
ej
-- =7- 10.
Ai

By the nondegeneracy condition (3.22), no two diagonal elements of A are equal. If for any

fixed j, Tij \$ 0 for i = i1 and i2, then the identity = -y7 yields a contradiction since 7 \$ 0

and Ai, = Ai,. Hence, each column of T has at most one nonzero. Since THT is diagonal,

two different columns cannot have their single nonzero in the same row. This implies that each

column and each row of T have at most one nonzero. A suitable permutation of the rows and

columns of T gives a diagonal matrix E, which completes the proof. E

Combining the relationship (3.20) between T and S and Theorem 3.3.1, we conclude that

problem (3.18) has a solution of the form S = UII1EIH2VH, where Hi and H2 are permutation

matrices. We will show that we can eliminate one of these two permutation matrices.

Substituting S = UII NEH2VH in (3.18), the equivalent optimization problem is obtained

as:

min tr E(HHA-i + I12A-I 2 (3.28)
E,n1i.n2 \ /
M
subject to tr C 2o < P
i=1

where M represents the minimum of N and nt. In the above optimization problem, the mini-

mization is over diagonal matrices E with a, .. ., O-M as the diagonal elements, and two per-

mutation matrices HI and 1.2-

Since the symmetric permutations IIA-1f1I and IIHA-1H2H essentially interchange di-

agonal elements of A and A, (3.28) is equivalent to
M
mirn 2 (3.29)
l ^ aO/Ar(,) + 1/d7,2()
M
subject to o~ <- P, Tri E PN, X72 E P.,
i= 1

where PN is the set of bijections of {1, 2,..., N} onto itself.

We will now show that the optimal solution only depends on the smallest eigenvalues of

QN and the largest eigenvalues of Rt.
Lemma 3.3.2. Let UAUH and VAVH be diagonalizations of Q and D respectively where

the columns of U and V are orthonormal eigenvectors. Let a, 7ir, and 7r2 denote an optimal

solution of (3.29) and define the sets

M ={i: > 0}, Q={A,,(L):ieM}, and = {16,(j) :ie M},

If M has I elements, then the elements of the set Q constitute the I smallest eigenvalues of QN,

and the elements of R constitute the I largest eigenvalues Rt, respectively.

Proof Assume k M and A7r(k) < Ail(i) for some i E M. It is easy to see that by inter-

changing the values of 7r, (i) and 7r (k), the new i-th term in the cost function is smaller than the

previous i-th term. It contradicts the optimal assumption of a and 7r. Then Ar,(k) > A1(i).
Then, suppose that k M and (62(k) > 52(i) for some i 6 M. Let C denote the cost

value due to the sum of the i-th term and the k-th term before the interchange. Similarly, let C+

denote the cost value due to the sum of the i-th term and the k-th term after the interchange of

the values of 72(i) and 7r2(k). We have

1
C = a2/A() + /(+ 2(k)

and
C+ 1 + 0)2(I)
S-2/A71(i) + 1/6 2(k)
Since 6,:(k) > 62 (i),

C+ C
( 2(k) 4- 2())(42(k) 7r2(i)/A() W i2+2(k)/A r + j 2()/ ))
(0' 232(k)/ / A(7) + 1(r 7M/A7,it) + 1)
< 0.

The cost is reduced by interchanging the values of r2 (i) and 7r2(k), which violates the optimality
of ao- and 7r. Hence, 652(k) < 42(0). O

Using Lemma 3.3.2, we now show that one of the permutations in (3.29) can be deleted if

the eigenvalues of QN and Rt are arranged in a particular order.

Theorem 3.3.2. Let UAUH and VAV1H be diagonalizations of QN and Rt respectively where

the columns of U and V are orthonormal eigenvectors, the eigenvalues of QN are arranged in

increasing order and the eigenvalues of Rt are arranged in decreasing order If M is the

minimum of the rank of QN and R,, then (3.29) is equivalent to
M
min 1 (3.30)
1 2/Ai) + 1/6i
M
subject to cr,! < P, 7r C PM,
i=1
where cr = 0 for i > M.

Proof Since at most M eigenvalues of either QN or Rt are nonzero, it follows from Lemma

3.3.2 that the set M has at most M elements. Since the elements of Q are the smallest eigen-

values of Q and the elements of R are the largest eigenvalues of Rt, we can assume that

7ri(i) e [1, M] and 72(i) e [1, M] for each i e M. Hence, we restrict the sum in (3.29) to

those indices i S where
S = { j): 1 < j< M}.

Let us define = o-r Ij) and 7r(j) = t7r21(j)). Since 7r(j) E [1, Al] for j E [1, M], it

follows that rt PM. In (3.29) we restrict the summation to i E S and we replace i by 7F2 '(j)

to obtain
M M
1 where ( \ )2 < p
E / + 1/.2() aj/A(j) + 1/ ',
iES j=1 i=1

This completes the proof of (3.30). E

Combining the relationship (3.20) between T and S, Theorem 3.3.1 and Theorem 3.3.2

yields the following corollary:

Corollary 3.3.1. Problem (3.18) has a solution of the form S = UIIEVH where the columns

of U and V are orthonormal eigenvectors of QN and Rt respectively with the eigenvalues of

QN arranged in increasing order and the eigenvalues of Rt arranged in decreasing order, H is
a permutation matrix, and E is diagonal.

Proof Let a and 7r be a solution of (3.30). For i > M, define 7r(i) = i and aC = 0. If H is the

permutation matrix corresponding to 7r, then making a substitution S = UHIIVH in the cost

function of (3.18) yields the cost function in (3.30). Since (3.29) and (3.30) are equivalent by

Theorem 3.3.2, S is optimal in (3.18). E

3.3.2 The Optimal E

We now consider the optimization problem which minimizes the cost function over a with

the permutation 7r in (3.30) given. Then in the next subsection, we will find the optimatial

permutation 7r based on the solution to the optimization problem considered here. For the sake

of notation simplicity, let pi denote 1/AX(i) and q, denote 1/6i. Hence, for fixed 7r, (3.30) is

equivalent to the following optimization problem:

mmin (3.31)
a Pi + qi
i=1
M
subject to aE < P.
i=1

The solution of (3.31) can be expressed in terms of a Lagrange multiplier related to the power
constraint. The structure of this solution has a water filling interpretation in the communication

literature [74].

Theorem 3.3.3. The optimal solution of (3.31) is given by

S 1 q 1/2
ai = max { -i, 0 (3.32)
IPiP Pi

where the parameter p is chosen so that
M
:o = P. (3.33)

Proof. Since the minimization of the cost function in (3.31) is over a closed and bounded set,

there exists a solution. At an optimal solution to (3.31), the power constraint must be an equality.

Otherwise, we can multiply a by a scalar larger than 1 to reduce to the value of the cost function.
For the sake of notation simplicity, let ti = oa. Then the reduced optimization problem (3.31)

is equivalent to
M
min 1 (3.34)
t Z pit + qi
M
subject to t = P, t > 0.
i=1

Since the cost function is strictly convex and the constraint is convex, the optimal solution to

(3.34) is unique.

According to the Lagrange multiplier theorem, the first-order necessary conditions [51]

(Karush-Kuhn-Tucker conditions) for an optimal solution of (3.34) are the following: there

exists a scalar p > 0 and a vector v e IRM such that

S + p- vi = 0, v 0, ti > 0, vt = 0, 1 < i < M. (3.35)
(piti + qi)2

Due to the convexity of the cost and the constraint, any solution of these conditions is the unique

optimal solution of (3.34).

A solution to (3.35) can be obtained as follows. We define the function

(71q)= 1 (3.36)
\. Pi' Pi)(1

Here x+ = max{x, 0}. This particular value for ti is obtained by setting v, = 0 in (3.35) and

solving for ti; when the solution is < 0, we set ti(p) = 0 (this corresponds to the + operator

(3.36)). We note that t,(/p) is a decreasing function of p which approaches +00 as p approaches

0 and which approaches 0 as p grows to +oo. Hence, the equation
M
E ti(.) = P (3.37)
i=1

has a unique positive solution. Since ti(pi/q<2) = 0, we have ti(/) = 0 for p> p/qg2. Then we

have
A +. -=- +.p- > 0 for p > p./q2.
(piti(p) + qi)2 q2
We deduce that the Karush-Kuhn-Tucker conditions can be satisfied when p is the positive

solution of (3.37). E

3.3.3 Optimal Eigenvector Ordering

Finally, we need to find an optimal permutation in (3.30), i.e., an optimal ordering for the

eigenvalues of QN and Rt.

Theorem 3.3.4. If the eigenvalues {Ai} of QN are arranged in increasing order and the eigen-

values {6i } of Rt are arranged in decreasing order, then an optimal permutation in (3.30) is

r(i) = i, 1 < i < M. (3.38)

Proof Assume that there exist indices i and j such that i < j, A, > Aj and 6, > 6j, i.e., pi < pj

and qi < qj. A) and A are not arranged in the supposed optimal order for the eigenvalues of

QN. We will show that it will cause contradiction.
Let us consider the following optimization problem:

min + (3.39)
titj pti + qi pjtj + qj
subject to ti + tj = P, ti > 0, tj > 0,

where P = a' + o-J. Since a yields an optimal solution of (3.30), it follows that a solution of

the above optimization problem is ti = a2 and tj = aoj. Based on Theorem 3.3.3, the ti is given

as
t()= -qi, (3.40)
V Pit Pi
where p is a Lagrange multiplier obtained from the power constraint ti + tj = P as:
1 + 1
7-fi vpi ,p (3.41)
=p+ q (3.41)
Pi Pj

Let C denote the cost function for (3.39). Combining (3.40) and (3.41) gives
1 + 1 1 )2
C + p
piti + qi pjtj + qj P + +

Now, suppose that we interchange the values of pi and pj. Let C+ denote the cost value

associated with the interchange. With the assumption that the optimal solution of (3.39) is still

positive after the exchange of p, and p3, we have

( + 1 )2
C+ _= (; + v7) (3.42)
Pi Pj

We need to use the following lemma [40]:

Lemma 3.3.3. If ai, bi, i = 1,..., n are two sets of numbers,
n n n
jajbj.-i+i] E aibi < jaa[>]b[,]
i=1 i=1 i=1

From the above lemma, we have q- + l > 1 + 21. Then C+ < C.
Pi Pj Pi Pj
If C+ < C, it contradicts the optimality of u. Then we have C+ = C. Hence, for each i

and j with i < j, pi < pj and q, < qj, we can interchange the values of pi and pj to obtain a new

permutation with the same value for the cost function. After the interchange, we have pi > pj,

i.e., A, < Aj. In this way, the A, are arranged in increasing order. Since the 6i are arranged in

decreasing order, we conclude that the associated optimal permutation 7t is (3.38).

One technical point must now be checked: we should verify that if pi < pj and q, < qj

with i < j, and if we exchange pi and pj, then the corresponding optimal solution of (3.39)

remains positive.

To check it, we consider two cases respectively. For the first case, suppose ok > 0 with

i < j < k, Pk < Pi < Pj and q, < qj < qk. From (3.40), we have

o() = 1[ qk >0,
V Pk Pk

Then
1 qk

After the exchange,
1 p + i

1 q- 1 q 1 qk
l_{lp >, >_ V P, ) ) > 0.
PJ ( "3 I+)=3 f VP

Similarly,

1 q3 1 1 1 1 qk

For the second case, suppose j = max ()) and pi = min(Q),pi < pj and qg < q.

Since the original solution, before the exchange, is positive, it follows from (3.40) and

(3.41) that
P > q j and P > q q. (3.43)
Pt Pj V PiP, Pi
After the exchange, the analogous inequalities that must be satisfied to preserve nonnegativity

are
p > qj qi (3.44)
\fPip, Pj
and
p > q qj. (3.45)
VPP4', Pi
(3.45) is satisfied from (3.43) and the fact that pi < pj and qg < qj. If (3.44) is also satisfied, the

proof is completed.

If (3.44) is not satisfied, i.e., P < the associated cost after the exchange is

1 1
C* +-
p P + qg qj

where t = P and t, = 0. We will show that C* < C with P > -q P > --

and P < -k- -_ Letting C* < C gives
-- (i-ptp Pj

1 1 +1 )2
+ < .
PPq P P

Multiplying both sides of the above inequality with (pjP + qi)qj(P + ', + -) gives
Pi Pj

q (p + q + + ) + (p + )(+ + ) < 1 + 1
P +P p + Pt + )2(pip + q-)qj.

After considerable algebra on the above inequality, we find that to show C* < C is equivalent

to show that f(P) < 0 with

f(P) = pP2 + (q, + qj + Pj q p qj 2 qp ( q- )2, (3.46)
PA At vv v,1

when P (max[ ], -). Since
p ,p j pj) "',P-J j ]Pp .)"Sin e

f( )=(pj + 1)(q q)( q qj -) < 0

when --- > q and

we have C* < C. i

Combining Corollary 3.3.1, Theorem 3.3.3 and Theorem 3.3.4, we conclude that the opti-

mal training sequences should be designed according to the following theorem.

Theorem 3.3.5. Let UAUH and VAV" be the diagonalizations of QN and Rt respectively

where the columns of U and V are orthonormal eigenvectors, the corresponding eigenvalues

{ A } are arranged in increasing order, and {5, } are arranged in decreasing order Then the
optimal solution of (3.18) is given by

S = Uq V (3.47)

where m specifies the power allocation which is diagonal with diagonal elements given by

and F, = O for i > nt, with the parameter pi is chosen so that
nt

Si=
With the optimal training sequence, the channel estimator simplifies to

fl = YUr,, rVH,

(3.49)

where r = diag{yi, ... ., 7-, with y = the columns of U, are the eigenvectors of QN

corresponding to its nt smallest eigenvalues, and the columns of V, are the eigenvectors of Rt.

The design of the optimal training sequences summarized in the above theorem has a clear

physical interpretation. Each eigenvector of the transmit correlation matrix Rt represents the

transmit eigen-direction and the associated eigenvalue indicates the channel gain in that eigen-

direction. More power should be assigned to the signals transmitted along the eigen-direction

with larger channel gains. On the other hand, each eigenvector of the interference temporal

correlation matrix QN represents the interference subspace and the corresponding eigenvalue

indicates the amount of interference in that subspace. Hence, we should choose the subspaces

with the least amount of interference for transmission. To facilitate the understanding of the

physical meaning of optimal training sequences, we can rewrite them in an alternative way as
nt

i=1

where u, are orthonormal eigenvectors of QN with the corresponding eigenvalues arranged in an

increasing order and vi are orthonormal eigenvectors of Rt with the corresponding eigenvalues

arranged in a decreasing order. The vectors ui and vi form transmission directions in time

and space, respectively. The above theorem implies that the optimal training sequence design

put more power to the transmission direction constructed by the eigen-directions with larger

channel gains and the interference subspaces with less interference. The power assignment is

determined by the water-filling argument under a finite power constraint.

3.4 Estimation of Channel Statistics and Feedback Design

To implement the channel estimator and construct the optimal training sequences for chan-

nel estimation, we need the knowledge of the transmit antenna correlation matrix Rt and the

interference covariance matrix QN at both the receiver and transmitter sides. Since these two

matrices are long-term channel characteristics, they can be estimated by using the observed

training signals at the receiver end and then fed back to the transmitter end for the construc-

tion of the optimal training sequences. In this section, we propose an algorithm to estimate

these long-term channel characteristics and design an efficient feedback scheme so that we can

approximately construct the optimal training sequences at the transmitter end.

Let us assume that the training signal matrix S is sent over a block of K packets. The
received training signals for the nth packet are given as

y(n) = (S In,)h(n) + e(n)

= (S I.,)(R/2 R1/2)h(n) + e(n)

= (SR/2 0 R1/2)h,,(n) + e(n). (3.50)

We can calculate the sample average correlation matrix of the received signal from the previous
K packets as follows:
K
R= y(n)y(n). (3.51)
n= 1
It is easy to see that R is the sufficient statistics for the estimation of the second-order correlation
matrices Rt and QN if e(n) is Gaussian distributed.
We can show that the correlation matrix of the received signal has the Kronecker product
form:

R = E(y(n)y(n)H)

= (SR'/2 R1/2)E(h,,(n)h,(n)H)(R/2SH 0 R1/2) + E(e(n)e(n)H)

= (SRtSH) 0 Rr + QN o Rr

= (SRtSH + QN) 0 Rr

= RqoR,. (3.52)

where Rq = SRtSH + QN IfR Rq Rr, then R = aRq 0 1Rr for any a : 0. Hence, Rq
and R, can not be uniquely identified from observing y(n). Fortunately, the channel estimator
and the design of optimal sequences are invariant to scaling of the estimates of Rt and QN. This
can be explained as follows:

I'(n) = Y(n)(aQN)-S(SH(aQN)-lS + (aRt)-l)-1

= Y(n)QNIS(SHQNS + Rt1)-1

= H(n)

and

tr((SH(QQN)-lS + (aR,)-')-1 = atr((SHQNIS + R 1)-T).

We notice that the new cost function of the optimization problem is just a scaled version of the

original cost function.

For the estimation of Rq and Rr, we need to impose an additional constraint on Rr. Here

we force tr(R,) = nr. Then an iterative flip-flop algorithm [75, 76, 77] can be used to estimate

Rq and R,. If the received interference signal e(n) is Gaussian distributed, the flip-flop algo-

rithm provides the maximum likelihood estimates (MLE) of Rq and R,. [75] when it converges.

Otherwise, the algorithm gives the estimates of Rq and R, in the least square sense. For fixed

Rr, the MLE of Rq is obtained as

= or KY,(n)[YH(n)]'} (3.53)
U=l v=l 7=1

where ar, is the (u, v)th element of R-1 and Yu(n) is the uth row vector of the received signal

matrix Y(n). Similarly, for fixed Rq, the MLE of Rr is obtained as
N N K
RO= o: W (n)W'(n) (3.54)
u=l- 1= n=1

where oa, is the (u, v)th element of f'1 and W,(n) is the uth column vector of the received

signal Y(n). Then to get uniquely identifiable Rq and Rr, we need to scale Rr to make

tr(R,) = nr. We note that the terms inside the braces in (3.53) and (3.54) can be computed

before the running of the iterative estimation algorithm to reduce computational complexity. To

start the iterative algorithm, an initial value of either Rq or R, should be assigned. A natu-

ral choice is to initially make Ri = In,. Then the iterative algorithm alternates between the

calculation of Rk and ,. until convergence. While it is difficult to prove analytically that the

algorithm converges to the MLE, extensive data experiments [75] in statistics show that it al-

ways converges to the MLE for situations of practical sample sizes. The convergence in our

case is also verified by the numerical results in Section 3.5.

Then Rt and QN can be estimated based on Rq. We note that only 7'(QN) fn R -(S) can
be uniquely identified from Rq in the sense below (R. denotes the range space of a matrix and
R' denotes the perpendicular subspace of the range of a matrix):
Lemma 3.4.1. Let Rt and QN be Hermitian positive semi-definite matrices andRq = SRtSH+

QN, where S is offull rank. Let D = {(Rt, QN) : RZ(QN) C 7'(S)}. Then there is an 1-1
correspondence between Rq and (R,, QN) only for the pairs of (Rt, QN) in D.

Proof Let Ps = S(SHS)-ISH be the projection onto R7(S) and Ps = I -Ps be the projection
onto RT (S).
First, let (Rt, QN), (Rit, Q') e D. Let Rg = SRtSH + QN and R' = SRSH +Q'
Consider P(Rq = PsQN = QN, PsRq = SRtSH, and PR', = PsQ'N = Q'N PsR =
SR'tSH. Since S is of full rank, PsRq = PsR' iff Rt = Rt. Also since Ps and P' are
projections onto complementary subspaces, Rq -= R'q iffPR = P7R' and PsR = PsR',
i.e. (Rt, QN) = (R, Q'N).
Conversely, let (Rt, QN) e D and Rq = SRSH + QN. Now choose R'(t Rt and define

Q'N = QN + SRtSH SR'SH. Since R(QN) C R'(S) and S is of full rank, (Rt, Q'N) ID'.
But R' = SRSH + Q = SRtSH+ QN = Rq.

Based on the above lemma, we see that estimating QN and Rt simultaneously from Rq is
not possible. However, since PsRqPs = PsQNPs, we can estimate QN from P RqP .
For notation simplicity, let A denote Ps RqPs. Since the interference signals are wide-sense
stationary in time, QN is a Topelitz matrix which can be represented by a sequence {qk; k =
0, 1, (N- 1)} with QN = { qk,} = { qk-j }. Then the ijth element of P' QN PS is given
by E k Pimq-kPkj with pyj denoting the ijth element of PI. Equating the ijth element of

P QNPs with aj, we have a set of linear equations in {qk}. Noticing the hermitian nature of
PsQNPs and A and separating the real and imaginary parts of qk and aj, we have N2 linear
equations with 2N 1 unknowns in q, = [qo, Re(q1), Im(qi),..., Re(qN_1), Im(qN-1)]7. The
set of linear equations can be solved by employing the least square approach. Then the estimate
of QN can be constructed based on q,.

In addition, when N is large, QN can be approximated by a circulant matrix [78] with fixed

eigenvectors as:

QNg FNyF" (3.55)

where FN is the N x N FFT matrix and % is a diagonal matrix containing eigenvalues Oi. We

notice that we only require the nt smallest eigenvalues of QN and their corresponding eigen-
vectors in constructing the optimal training sequences. With the circulant matrix approximation

(3.55), it is equivalent to estimating the n, smallest eigenvalues O, and identifying the corre-
sponding columns of F. The nt, smallest positive eigenvalues of QN are used as the estimates

of the nt smallest Vi, and the corresponding columns of F are chosen as those closest to the
eigenvectors associated with the nt smallest positive eigenvalues of QN.
The estimates of the nt smallest Oi and the nt indices of the chosen columns of FN are

then fed back to the transmitter for the optimal training sequence construction. We notice that

it is bandwidth efficient to just feed back these indices of FN instead of the whole eigenvectors

of QN because the number of training symbols N during the training period is usually large.
To derive the estimator of Rt, we need the following lemma which establishes the asymp-

totical equivalence of QN and PsQNP, as N increases.
Lemma 3.4.2. With the assumption that QN is an absolutely summable Toeplitz matrix, QN and

PsQNPs are asymptotically equivalent. Since QN is Toeplitz, P QNP( is asymptotically
Toeplitz.

Proof Two definitions of the norms of a matrix which include the strong norm and weak norm
[78, 79] are needed to study the asymptotic equivalence of two matrices. The strong norm || A |
is defined as

I| A J|= maXx:x*x=1[x*A*Ax] = v/Amr (A*A)

where Amax represents the largest eigenvalues of a matrix. If A is Hermitian, I| A I|= Amax(A)|.
The weak norm of A is defined as

JAI = (n-'Tr[A*A]) .

Two sequences of n x n matrices An and Bn are said to be asymptotically equivalent [78]

if A, and Bn are uniformly bounded in strong norm:

|| An 1, Bn J|< M
and An B, approaches zero in weak norm as n -- oo:

lim An Bn| = 0.
n--00

If one of the two matrices is Toeplitz, then the other is said to be asymptotically Toeplitz.

Without the loss of generality, we assume that QN is an absolutely summable Toeplitz

matrix. (For the temporal interference correlation matrix QN arising from practical scenarios,

such as jamming signals and co-channel interference considered here, it is easy to verify that

QN is absolutely summable.) QN can be represented by a sequence {qk; k = 0, 1, +2,... }
with QN = {qkj} = {qk-j} and E= -oo qk < 00. It is shown [80] that QN is bounded in

strong norm as:
+o00
I QN <|| 2 1 I, = 2Mq < oo.
k=-oo
Then we need to show that |1 PsQNPs II is also bounded. Using the properties of the strong

norm, we have

II PQNPs II

= I (I- Ps)QN(I Ps) 1I

= II QN PsQN QNPS + PsQNPs II

< II QN II + II PsQN 1I + II QNPs I + II PsQNPs II.

To proceed, we need the following lemma [40]:

Lemma 3.4.3. For two Hermitian positive semi-definite matrices G and H,

Amax(GH) < Amax(G)Amax(H).

Then, we have

11 PsQN H= [Am.o(QNPsQN)] < [Am,(QN)Ama.(Ps)Ama(QN)]I = Am(Q) =|1 QN I

Similarly, 1I QNPs 11i11 QN 11 and || PsQNPs 1|111 QN I|. Thus, PQNP 11 4 |
QN 11= 8AM. Let M = 8M,, then | QN |11 M < c0 and I| PsQNP s II< M < 00.
Next, we need to show that the distance of the two matrices goes to zero asymptotically in
weak norm. Using the properties of weak norm, we have

IQN P QNPI

= IPsQN + QNPs PsQNPsI

< IPsQNl + IQNPsI + IPsQNPsI.

We need the following Lemma [78, 80]:

Lemma 3.4.4. Given two n x n matrices G and H, then

IGHI <11 G II IH.

The weak norm of Ps can be written as

Ps| = (N-Tr[S(SHS)-ISH]) = (N-1Tr[ ) = ( .

Then using the above lemma, we have

IQNPsI <11 QN II Psi = (n) I1 QN 1< (n)2Mq.

Similarly, IPsQNPsI <-1 PsQN II (') <11 QN II (a)! < (N) 2M, and
IPsQNI = IQNPsI < (L)22M,. Then, we can show that

IQN PsQNPs 3 im () 2Aq = 0.

Based on the above lemma, the transmit channel correlation matrix Rt can be estimated by

projecting the received signal onto R(S). Since N is usually much larger than nt, we have

Rq SRtSH + P QNPs, (3.56)

and hence

PsRqPs PsSR,SHPs + PsPsQNP sPs = SRSH. (3.57)

Then we can estimate the transmit channel correlation matrix R, using

ft = (SHS)-lSHfRS(SHS)-1. (3.58)

3.5 Numerical Results

In this section, we present some numerical results to show the performance gain for channel

estimation achieved by the designed optimal training sequences. We consider a MIMO system

with 3 transmit antennas and 3 receive antennas. The antennas form uniform linear arrays at both

the transmitter and the receiver. For a small angle spread, the correlation coefficient between

the ith and the jth transmit antenna [67] can be approximated as:

[Rlj 2 exp {-j2r sin A dt sin }dO = Jo(27r|i J sin A ), (3.59)
S27 j27ri sin A A

where Jo(x) is the zeroth order Bessel function of the first kind, A is the angle spread, dt is

the antenna spacing and A is the wavelength of a narrow-band signal. We set dt = 0.5A. In

the simulations, we consider two channels with different transmit channel correlations: a high

spatial correlation channel with A = 50 and a low spatial correlation channel with A = 25.

The receive correlation matrix Rr is calculated similarly as the transmit correlation matrix with

A = 25.
We consider two kinds of interference: the co-channel interference from other users in the

same wireless system and jamming signals which are usually modeled by autoregressive (AR)

random processes.
We compare the channel estimation performance in terms of the total MSE for systems

using different sets of training sequences. The following different training sequence sets are

considered for comparison: 1) the optimal training sequences described in Section 3.3., 2) the

approximate optimal training sequence constructed based on the channel and interference statis-

tics obtained by using the proposed estimation algorithm in Section 3.4., 3) the temporally op-

timal training sequences for which the transmit channel correlation matrix is assumed to be an

identity matrix and only temporal interference correlation is considered in designing the optimal

training sequences. (we also consider the approximate temporally optimal sequences which are

constructed based on the channel statistics obtained by using the proposed algorithm), 4) the

spatially optimal training sequences for which the interference is assumed to be temporally

white and only transmit correlation is considered in designing the optimal training sequences.

(we also consider the approximate spatially optimal sequences which are constructed based

on the channel statistics obtained by using the proposed algorithm), 5) Binary orthogonal se-
quences, 6) Random sequences.
3.5.1 Co-channel Interference

In a cellular wireless communication system, co-channel interference (CCI) from other

cells exists due to frequency reuse. Hence, the interfering signals have the same signal format

as that of the desired user. We can express the interfering signal transmitted from the ith transmit

antenna of the mth interferer as

s m)(t) = b2)0(t IT Tm) (3.60)
v 'ti l=-00

where P,, is the transmit power of the mth interferer, and {b)'I } are data symbols transmitted

from the ith transmit antenna of the mth interferer. They are assumed to be i.i.d. binary random

variables with zero mean and unit variance. In addition, )(t) is the symbol waveform and T is

the symbol duration. It is assumed that the receiver is synchronized to the desired user but not

necessarily to the interfering signals and Tr is the symbol timing difference between the rmth

interferer and the desired user signal. Without loss of generality, we assume 0 < r,, < T. The
elements of the interference symbol matrix Si are samples at the matched filter output at the

receiver at time index jT. The (j, i)th element of S, is

F=\- b'((j 1)T -T,) (3.61)
,-o'

with

(t) = (t s) "(s)ds (3.62)
--oo
where 4(t) is the autocorrelation of the symbol waveform. For the co-channel interference, the

temporal interference correlation is due to the intersymbol interference in the sampled interfer-

ing signals.

In the simulations, it is assumed that there are two interfering signals in the system and the

SIR (signal-to-interference ratio) is set to be OdB. The ISI-free symbol waveform with raised

cosine spectrum is chosen as the symbol waveform. For this case, we have
4(t) = sinc(wrt/T) cos(- r23T)

We set the roll-off factor 0 = 0.5, T1 = 0.2T and -r2 = 0.5T.

In Fig. 3.1 and Fig. 3.2, we show the total channel estimation MSEs for the high spatial

correlation channel and low spatial correlation channel, respectively. For both cases, the opti-

mal sequences outperform the orthogonal sequences and random sequences significantly. For

the high spatial correlation channel, the optimal sequences provide a substantial performance

gain over both the spatially optimal sequences and the temporally optimal sequences. The ap-

proximate optimal sequences achieve most of the performance gain obtained by the optimal

sequences. For the low spatial correlation channel, the MSE performance of the approximate

optimal sequences is close to that of the optimal sequences. The temporal correlation has a

stronger impact on the channel estimation than the spatial channel correlation due to the fact

that the length of training sequences N is much larger than the number of transmit antennas t.

It is verified by the simulation results shown in Fig. 3.2 that the temporally optimal sequences

achieve an estimation performance similar to that achieved by the optimal sequences. These two

optimal sequences provide significant performance gain over the spatially optimal sequences.

3.5.2 Jamming Signals

We assume that there are two jamming signals in the system. The jamming signals are

modeled as two first order AR processes driven by temporally white Guassian processes {ui,t}

as,

(3.63)

Si,t = aiSi,t-_1 + Ui,t

10-1

LU
w

10-2

I I I I I
15 20 25 30 35 40
N

45 50 55 60 65

Figure 3.1: Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and high spatial correlation channel.

S ... .......... 1) Optimal sequences
2) Approximate optimal sequences
S-9- 3) Temporally optimal sequences
S-e- 4) Spatially optimal sequences
-8- 5) Orthogonal sequences
-+- 6) Random sequences
-v- Approximate temporally optimal sequenes
S-0- Approximate spatially optimal sequences

S--- 1) Optimal sequences
S-*- 2) Approximate optimal sequences
S-v- 3) Temporally optimal sequences
/-e- 4) Spatially optimal sequences
10- ....... -e- 5) Orthogonal sequences
10- .. ... --- 6) Random sequences
+ -v- Approximate temporally optimal sequences
S \ ...... ...... -0- Approximate spatially optimal sequences

10-2 I I I
15 20 25 30 35 40 45 50 55 60 65
N

Figure 3.2: Comparison of total MSEs obtained using different training sequences. ISI-free
symbol waveform and low spatial correlation channel.

where si, represents the jamming signal transmitted by the ith jammer at the tth time index, ao

is the temporal correlation coefficient, and u,,t has zero mean with variance oa,, which decides

the transmit power of the ith jammer. The SIR is set to be 0 dB. We choose aI = 0.4 and

02 = 0.5.

In Fig. 3.3 and Fig. 3.4, we show the total channel estimation MSEs for the high spatial

correlation channel and low spatial correlation channel, respectively. For AR jammers, simi-

lar conclusions on the estimation performance achieved by different training sequences can be

made as in the case of co-channel interference.

3.6 Conclusion

In this chapter, we consider a wireless communication system with multiple transmit and

receive antennas in a slow, Rayleigh flat-fading environment. We study the problem of the

estimation of correlated MIMO channels with colored interference. The Bayesian channel es-

timator is derived and the optimal training sequences are designed based on the mean square

error (MSE) of channel estimation. We propose an algorithm to estimate long-term channel

I I I

- 1) Optimal squences
- 2) Approximate optimal sequences
-v- 3) Temporally optimal sequences
-e- 4) Spatially optimal sequences
-B- 5) Orthogonal sequencs
-i- 6) Random sequences
-v- Approximate temporally optimal sequences
-0- Approximate spatially optimal sequences

_0_0

L I I I
15 20 25 30 35 40
N

45 50 55 60 65

Figure 3.3: Comparison of total MSEs obtained using different training sequences. ARjammers
and high spatial correlation channel.

10
15

20 25 30 35 40
N

I I 55 60 65
45 50 55 60 65

Figure 3.4: Comparison of total MSEs obtained using different training sequences.
and low spatial correlation channel.

- 1) Optimal sequences
* 2) Approximate optimal sequences
-v- 3) Temporally optimal sequences
-e- 4) Spatially optimal sequences
-B- 5) Orthogonal sequences
-+- 6) Random sequences
-v- Approximate temporally optimal sequences
-0- Approximate spatially optimal sequences

AR jammers

\

statistics and design an efficient feedback scheme so that we can approximately construct the

optimal sequences at the transmitter. Numerical results show that the optimal training sequences

provide substantial performance gain for channel estimation when compared with other training

sequences.
3.7 Appendix

3.7.1 A Trace Problem

In this appendix, we analyze a variant of the optimization problem (3.18) which can be

formulated as

1 1
min tr(RfSHQN SR + It)-1 (3.64)
s
subject to tr{S"S} < P

Two different trace optimization problems (3.18) and (3.64) are related in the form of cost

functions. The cost function of the original optimization problem (3.18) can be rewritten as

tr(SHQ lS + R- 1)-1 = trRt(RFSHQN-SR? + I)-1,

which can be viewed as the weighting of the cost function of the new trace optimization problem

(3.64).

For the sake of notational simplicity, we consider the following same optimization problem

as (3.64) with different but simpler notations.

rain tr (DSHQSD + 1)-1 (3.65)
S
subject to tr (SHS) < P, S E C"X.

Here Q is a nonzero Hermitian, positive semidefinite matrix, D is a nonzero Hermitian, positive

definite matrix, and the positive scalar P is the power constraint associated with the signal

S. The main results on the solution to the optimization problem (3.65) are cited here for the

completeness of the dissertation and the details can be found in the literature [81]. We write the

inverse matrix

C = (DSHQSD + I)1)-

for convenience.

As discussed before, the solution in the special case D = I can be expressed in terms

of the eigenvalues and eigenvectors of Q and a Lagrange multiplier associated with the power

constraint. For the optimization problem introduced here, D \$ I and minimizing the trace of

C is more difficult. We will show that (3.65) has a solution that can be expressed S = UEVH

where U and V are orthonormal matrices of eigenvectors for Q and D respectively, and E

is diagonal. Solving (3.65) involves computing diagonalizations of Q and D, and finding an

ordering for the columns of U and V. We are able to evaluate the optimal ordering when either

P is large or P is small. However, for intermediate values of P, evaluating the optimal ordering

is more difficult. The problem (3.65) has a combinatorial nature, unlike the special case D = I.

The trace problem (3.65) arises in spreading sequence optimization for code division mul-

tiple access (CDMA) systems. In cellular communication systems, multiple access schemes

allow many users to share simultaneously a finite amount of radio resources. CDMA is one of

the main access techniques. It is adopted in the IS-95 system and will be used in next generation

cellular communication systems [82]. In a CDMA system, different users are assigned different

spreading sequences so that the users can share the communication channel. We consider the

uplink (communication from the mobile units to the base station) of a CDMA system where the

users within a base station are symbol synchronous. The co-channel interference from the users

in the neighboring cells are modeled by additive, colored Gaussian noise. The received signal

at the base station is
K
y = hisix + e,
i=1
where K is the number of signals received by the base station, xi is the symbol transmitted from

the ith user, si, CN is the spreading sequence assigned to the ith user, hi is the channel gain

from the ith user to the base station, and e E CN is the additive, colored Gaussian noise with

zero mean and covariance E. Usually the size of K and N are comparable. It is assumed that

the symbols x, are independent with zero mean and unit variance. The received signal can be

expressed as

(3.66)

y = SHx + e,

where S, the spreading sequence matrix, has jth column sj, and H is a diagonal matrix with

ith diagonal element hi. Again, by the Bayesian Gauss-Markov Theorem [36, 83], the MMSE

estimator of x is
x = (HHSHE-1SH + I)-HHSHE-ly.

The corresponding covariance matrix of the estimation error is

C, = (HHSHE-ISH + I)-

The optimal spreading sequences for all the users which minimizes the co-channel interference

to other cells, subject to a power constraint, corresponds to (3.65) with Q = E-1 and D = H,
a diagonal matrix.
To solve the trace optimization problem, we begin by analyzing the structure of an optimal

solution to (3.65). Let UAUH and VAVH be diagonalizations of Q and D respectively (the

columns of U and V are orthonormal eigenvectors). Let 6i, 1 < i < n, and Aj, 1 < j < m,

denote the diagonal elements of A and A respectively. We assume that the eigenvalues are
arranged in decreasing order:

1 62 > > 6n and A1> A2> ... > Am. (3.67)

Let us define
T = UHSV. (3.68)

Making the substitution S = UTVH in (3.65) yields the following equivalent problem:

min tr (ATHATA + I1)-1 (3.69)

subject to tr (THT) < P, T CCmxn.

We now show that (3.69) has a solution with at most one nonzero in each row and column.
Theorem 3.7.1. There exists a solution of (3.69) of the form T = II E112 where III and H2
are permutation matrices and o-i = Ofor all i =, j.
Combining the relationship (3.68) between T and S and Theorem 3.7.1, we conclude that
problem (3.65) has a solution of the form S = UIIEII2VH, where III and 112 are permutation

matrices. We will now show that one of these two permutation matrices can be deleted if the

eigenvalues of D and Q are arranged in decreasing order.

Let N denote the minimum of m and n. Making the substitution S = UfIIHI2VH in

(3.65), we obtain the equivalent problem:

min tr ((H2Ar2H)EH(H 1AI)E 2AH) + I (3.70)
N
subject to tr ao2 < P.
i=1

Here the minimization is over diagonal matrices E with 0-1,.... aN on the diagonal, and per-

mutation matrices II1 and II2.

The symmetric permutations HI'AII1 and I12AII' essentially interchange diagonal ele-

ments of A and A. Hence, (3.70) is equivalent to

min N 1 (3.71)
or, 11 2 (: (i)O-)2A,1(i) + 1
N
subject to -2 < P, 711 E Pm, 712 Pn
i=1

where Pm is the set of bijections of {l1, 2,.. ., m} onto itself.

We first show that we can restrict our attention to the largest diagonal elements of D and

Q.

Lemma 3.7.1. Let UAUH and VAVH be diagonalizations of Q and D respectively where

the columns of U and V are orthonormal eigenvectors. Let a, 7r1, and 7r2 denote an optimal

solution of(3.71) and define the sets

AN= {i: >0}, Q= {A,,(.) : i Af}, and D = {6,( : i E },

IfJ f has I elements, then the elements of the set D and Q are all nonzero, and they constitute

the I largest eigenvalues of D and Q respectively.

Using Lemma 3.7.1, we now eliminate one of the permutations in (3.71).

Theorem 3.7.2. Let UAUH and VAVH be diagonalizations of Q and D respectively where

the columns of U and V are orthonormal eigenvectors, and the eigenvalues of Q and D are

arranged in decreasing order as in (3.67). If K is the minimum of the rank of D and Q, then

(3.71) is equivalent to
K 1
mm 1 (3.72)
min (6,,)2A,(i)+ 1
K
subject to Za,2 P, 7rC 'PK,
i=1

where ai = 0 for i > K.

Proof The proof is similar to that for Theorem 3.3.2 E

Corollary 3.7.1. Problem (3.65) has a solution of the form S = UIIH~VH where the columns of

U andV are orthonormal eigenvectors of Q and D respectively with the associated eigenvalues

arranged in decreasing order, H is a permutation matrix, and E is diagonal.

Proof The proof is similar to that for Corollary 3.3.1. OD

Assuming the permutation 7r in (3.72) is given, let us now consider the problem of optimiz-

ing over a. To simplify the indexing, let pi denote A,(i). Hence, for fixed 7r, (3.72) is equivalent

to the following optimization problem:

min (3.73)
rnin ~ (6iC)2pi + 1

K
subject to af l P.
i=1

The solution of (3.73) can be expressed in terms of a Lagrange multiplier for the constraint.

Theorem 3.7.3. The optimal solution of (3.73) is given by

cr = max { P- p, 1/2 (3.74)

where the parameter p is chosen so that
K
or 2 = P. (3.75)

Proof The proof is similar to that for Theorem 3.3.3.

To solve (3.65), we need to find an optimal ordering for the eigenvalues of D and Q. In

Theorems 3.7.4 and 3.7.5, we determine the optimal ordering when the power P is either large
or small.

Theorem 3.7.4. If the eigenvalues {Ai } and {6, } of Q and D respectively are arranged in

decreasing order, then for P sufficiently large, an optimal permutation in (3.72) is

7r(i) = K + 1 i., < i < K, 7i) = i, i > K. (3.76)

Theorem 3.7.5. Suppose the eigenvalues {Ai } and {6i } of Q andD respectively are arranged in

decreasing order, and let L be the minimum of the multiplicities of 61 and A1. For P sufficiently
small, an optimal solution of (3.65) is

S = u'ivH (3.77)

where ui and vi are the orthonormalized eigenvectors of Q and D associated with A1 and 61

respectively.
3.7.2 A Determinant Problem

In this appendix, we analyze the following matrix optimization problem where we maxi-

mize the determinant, denoted "det", of a matrix:

max det (DSHQSD + I) (3.78)
S
subject to tr (SHS) < P, S E Cmxn

Since the determinant of the inverse of a matrix is the reciprocal of the determinant of the matrix,

it follows that problem (3.78) is equivalent to replacing trace by determinant in (3.65). Hence,

in the original problem (3.65), we minimize the sum of the eigenvalues of the MSE matrix C,

while in the second problem (3.78), we minimize the product of the eigenvalues of C. In either
case, we try to make the eigenvalues of C small, but with different metrics.
For the special case D = I, the solution of (3.78) can be found in Telatar [1], and for the
special case Q = I, the solution of (3.78) can be found in Zhou [63]. For the more general
problem (3.78), we again show that the solution can be expressed S = UEVH, where U and V

are orthonormal matrices of eigenvectors for Q and D respectively, and E is diagonal. Unlike

the trace problem (3.65), the ordering of the columns of U and V does not depend on the

power P the columns of U and V should be ordered so that the associated eigenvalues of

Q and D are in decreasing order. This optimal eigenvector ordering result is the same as that

for the optimization problem (3.18) in Section 3.3 when the same notations for corresponding

matrices are adopted. In Cai et al. [65], the authors formulated the similar optimization problem

while studying the space-time spreading (STS) scheme for correlated fading channels in the

presence of interference. Based on the previous optimization result for the special case Q = I

[63], UEV" was chosen as the STS matrix, and then the optimal eigenvector ordering and E

were decided. Here we solve the optimization problem (3.78) by using the method introduced

in Wong et al. [61] and Wong et al. [84]. (Two important matrix inequalities arising from

majorization theory [40] are used.)

The determinant problem arises from spreading sequence optimization for CDMA systems.

For CDMA systems, a different performance measure, which arises in information theory, is

the sum capacity of the channel. The mean square error is a performance measure for uncoded

systems, while the sum capacity is a performance measure for coded systems. It represents the

maximum sum of the rates at which users can transmit information reliably. The sum capacity

of the synchronous multiple access channel (3.66) is

Csum = max I(xi,..., XK; y),

where I represents the mutual information [74] between the inputs x1, X2,..., XK and the out-

put vector y. The maximization is over the independent random inputs X1, X2, ... XK. The

maximum is achieved when all the random inputs are Gaussian. In this case, the sum capacity

[71, 85] becomes

Csum = 1 log det (HHSHE-1SH + I).
2N
Since log is a monotone increasing function, the maximization of the sum capacity, subject to a

power constraint, corresponds to the optimization problem (3.78) with Q = E-1 and D = H.

The solution to the determinant problem (3.78) can be expressed as follows:

Theorem 3.7.6. Let UAUH and VAVH be the diagonalizations of Q and D respectively

where the columns ofU andV are orthonormal eigenvectors and the corresponding eigenvalues

{A,} and {6f} are arranged in decreasing order If K is the minimum of the rank of Q and D,

then the optimal solution of (3.78) is given by

S = UEVH, (3.79)

where E is diagonal with diagonal elements given by

a, = max i- 1/20 for 1 < i < K (3.80)

and ai = O for i > K, where the parameter p is chosen so that
K
^P.
i= 1

Proof Initially, let us assume that both D and Q are nonsingular later we remove this restric-

tion. Insert T = Q1/2S in (3.78) and multiply the objection function on the left and right by

det (D-1) to obtain the following equivalent formulation:

max det (THT + D-2) (3.81)
T
subject to tr (TTHQ-1) < P, T E Cmxn

Let wi, 1 < i < n, denote the eigenvalues of THT arranged in decreasing order. By a theorem of

Fiedler [86] (also see [40, Chap. 9, G.4]), the determinant of a sum THT + D-2 of Hermitian

matrices is bounded by the product of the sum of the respective eigenvalues (assuming the

eigenvalues of THT and D are in decreasing order):

det (THT + D-2) < U(w, + 6 2) (3.82)
i=1

Also, by a theorem of Ruhe [87] (also see [40, Chap. 9, H2]), the trace of a product (TTH)Q-1

of Hermitian matrices is bounded from below by the sum of the product of respective eigenval-

ues (assuming the eigenvalues of TTH and Q are in decreasing order):
N
tr (TTHQ-1) > -wA.-1, N = min{m,n}, (3.83)
i=1

since at most N eigenvalues of THT and TTH are nonzero.

We replace the cost function in (3.78) by the upper bound (3.82) and we replace the con-

straint in (3.78) by the lower bound (3.83) to obtain the problem:

max ( 2 n(wO + 6.2) (3.84)
i=N+ i=1
N
subject to L toA < P, w uwi+1 > 0 for i < N.
i=1

If T is feasible in (3.81), then the square of its singular values are feasible in (3.84) by (3.83).

And by (3.82), the value of the cost function in (3.84) is greater than or equal to the associated

value (3.81). Since the feasible set for (3.84) is closed and bounded, and since the cost function

is continuous, there exists a maximizing w, and the maximum value of the cost function (3.84)

is greater than or equal to the maximum value in (3.81).

Consider the matrix T = UVnl/2VH where f is a diagonal matrix containing the max-

imizing w on the diagonal. For this choice of T, the inequalities (3.82) and (3.83) are both

qualities. Hence, this choice for T attains the maximum in (3.81). The corresponding optimal

solution of (3.78) is

S = Q-1/T = UA-1/2UHU 1/2VH = UA-1/2nl/2VH. (3.85)

To complete the proof of the theorem, we need to explain how to compute the optimal W in

(3.84).

At the optimal solution of (3.84), the power constraint must be an equality (otherwise, we

could multiply w by a positive scalar and increase the cost). Let us ignore the monotonicity

constraint w _> o w+1 (we will show that the maximizer satisfies this constraint automatically).

After taking the log of the cost function, we obtain the following simplified version of (3.84):
N
max log(w + -2) (3.86)

N
subject to w iA1 = P, W > 0.
i=i

Since the cost function is strictly concave, the maximizer of (3.86) is unique.

The first-order optimality conditions (KKT conditions) for an optimal solution of (3.86)

are the following: There exists a scalar p > 0 and a vector v e R" such that

1 p
-i + vi = 0, > 0, w > 0, viwi = 0, 1 < i < N. (3.87)
+,_ + A2 A.

Analogous to the proof of Theorem 3.7.3, we define the function

i) = ( 2) + (3.88)

This particular value for wi is obtained by setting i = 0 in (3.87), solving for wu; when the

solution is < 0, we set wji(p) = 0 (this corresponds to the + operator (3.88)). Observe that

wi(/) in (3.88) is a decreasing function of y which approaches +oo as p approaches 0 and

which approaches 0 as p tends to +00. Hence, the equation
n
i(p) A\t = P (3.89)
i= 1

has a unique positive solution. We have wc = 0 for p > Ai62, which implies that

1 = ) ,+ 1 =- > -62++6,2 =0 when p>\Ai6.
i(p) + '(, -2 A, 6-2 + A -

It follows that the KKT conditions are satisfied when p is the positive solution of (3.89). Since

the At and 6i are both arranged in decreasing order, it follows that for any choice p > 0, the w,

given by (3.88) are in decreasing order. Hence, the constraint woi+ < wi in (3.84) is satisfied

by the solution of (3.86). Combining the formula (3.88) for the solution of (3.86) with the

expression (3.85) for the solution of (3.78), we obtain the solution S given in (3.79) and (3.80)

where E = A- 1/21/2.

Now suppose that either D or Q is singular. Let us consider a perturbed problem where

we replace Q by Q, = UAUH and D by D, = VAVH:

max det (DSHQSD, + I) (3.90)
S
subject to tr (SHS) < P, S Cmxn

Here A, and A, are obtained from A and A by setting 6, = c = Aj for i or j > K. Since Qe

and D, are nonsingular, it follows from our previous analysis that the perturbed problem (3.90)

has a solution of the form S, = UEVH where the diagonal elements of E, are given by

max 1 01/2 for 1 o = { 1 ) ~ (3.91)

max 0 fori > K.

Let p be chosen so that
K
ae)2 = P.
i=1
Observe that when c3 < p, we have ac = 0 for i > K and
N
c)2 = p.
i=1

Hence, for each e > 0 with e3 < p, the optimal solution of the perturbed problem does not

depend on e and the trailing diagonal elements a' for i > K vanish. Since the cost function in

the perturbed problem (3.90) is a continuous function of e, we conclude that for e3 < p, S, is the

optimal solution of (3.90) for e = 0. The perturbed problem (3.90) with e = 0 coincides with

the original problem (3.78). Consequently, the solution (3.79) and (3.80) is valid, even when

either Q or D is singular. F

CHAPTER 4
CONCLUSION AND FUTURE WORK

To achieve the performance gain promised by multiple antenna systems, parameter estima-

tions including timing estimation and channel estimation are key components of the space-time

system design. In this work, we investigate the timing estimation and channel estimation prob-

lems for MIMO systems.
4.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels

In Chapter 2, we consider a wireless communication system with multiple transmit and

receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh flat-fading

environment. We study the problem of timing estimation in such a system with the aid of

training signals from two different approaches. In the first approach, the channel is assumed to

be unknown but deterministic and joint ML estimation of the channel and delay is performed. In

contrast, in the second approach, we assume that the channel is random but with known statistics

and use the likelihood function averaged over all random channel realizations to construct the

ML estimator for the delay. For both approaches, we derive the optimal training sequences

based on the performance measures associated with the CRB of timing estimation. These two

approaches lead to two different optimal training signal designs. For the deterministic channel

approach, we show that orthogonal training signals from multiple transmit antennas minimize

the outage probability as well as the average CRB. For the random channel approach, perfectly

correlated training signals employed at different transmit antennas minimize the CRB.
4.2 Channel Estimation for Correlated MIMO Channels with Colored Interference

In Chapter 3, we consider a wireless communication system with multiple transmit and

receive antennas in a slow, Rayleigh flat-fading environment. We investigate the problem of

estimating correlated MIMO channels in the presence of colored interference. The Bayesian

channel estimator is derived and the optimal training sequences are designed based on mini-

mizing the MSE of channel estimation. The design of the optimal training sequences has a
87

clear physical interpretation which implies that we should assign more power to the transmis-

sion direction constructed by the eigen-directions with larger channel gains and the interference

subspaces with less interference. The power assignment is determined by the water-filling argu-

ment under a finite power constraint. In order to implement the channel estimator and construct

the optimal training sequences, we propose an algorithm to estimate long-term channel statis-

tics and design an efficient feedback scheme so that we can approximately construct the optimal

sequences at the transmitter. Numerical results show that with optimal training sequences, the

MSE of channel estimation can be reduced substantially when compared with other training

sequences.
4.3 Timing Estimation for Correlated MIMO Channels with Colored Noise

In the second chapter, we study the timing estimation problem with the assumption that

the fading coefficients between the pairs of transmit and receive antennas are independent and

identically distributed. This assumption does not generally hold in practice due to the antenna

spacings and orientation, the mutual coupling, the richness of scattering, and the presence of

dominant components [88]. Thus it is natural to extend the current work to investigate the

synchronization problem in correlated channels.

Another possible direction to extend the present work is to address the timing estimation

problem for the MIMO system in colored noise. It is more suitable to adopt the colored noise

model than the white noise model when jammers and co-channel interference are present in the

communication system.

REFERENCES

[1] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur Trans. Telecom., vol.
10, pp. 585-595, Nov. 1999.

[2] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environ-
ment when using multiple antennas," Wireless Commun. Mag., vol. 6, pp. 311-335, Mar.
1998.

[3] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wire-
less communication: performance criterion and code construction," IEEE Trans. Inform.
Theory, vol. 44, pp. 744-765, Mar. 1998.

[4] S. Baro, G. Bauch, and A. Hansmann, "Improved codes for space-time trellis-coded mod-
ulation," IEEE Comm. Lett., vol. 4, pp. 20-22, Jan. 2000.

[5] A. R. Hammons and H. E. Gamal, "On the theory of space-time codes for PSK modula-
tion," IEEE Trans. Inform. Theory, vol. 46, pp. 524-542, Mar. 2000.

[6] S. M. Alamouti, "A simple transmit diversity technique for wireless communications,"
IEEEJ. Select. Areas in Commun., vol. 16, pp. 1451-1458, Oct. 1998.

[7] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding from orthogo-
nal designs," IEEE Trans. Inform. Theory, vol. 45, pp. 1456-1467, July. 1999.

[8] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding for wireless
communications: performance results," IEEE J. Select. Areas in Commun., vol. 17, pp.
452-460, Mar. 1999.

[9] G. Ganesan and P. Stoica, "Space-time diversity using orthogonal and amicable orthogonal
designs," Wireless Personal Communications, vol. 18, pp. 165-178, Aug. 2001.

[10] S. Alamouti, V. Tarokh and P. Poon, "Trellis-coded modulation and transmit diversity:
design criteria and performance evaluation," in Proc. IEEE Int. Conf. Universal Personal
Communications, vol. 2, Florence, Italy, Oct. 1998, pp. 703-707.

[11] B. M. Hochwald and T. L. Marzetta, "Unitary space-time modulation for multiple-antenna
communication in Rayleigh flat fading," IEEE Trans. Inform. Theory, vol. 46, pp. 543-
564, Mar. 2000.

[12] B. Hassibi and B. M. Hochwald, "High-rate codes that are linear in space and time," IEEE
Trans. Inform. Theory, vol. 48, pp. 1804-1824, Jul. 2002.

[13] S. Siwamogsathama and M. P. Fitz, "Robust space-time coding for correlated Rayleigh
fading channels," IEEE Trans. Signal Processing, vol. 50, pp. 2408-2416, Oct. 2002.

[14] Y. Gong and K. B. Letaief, "Concatenated space-time block coding with trellis coded
modulation in fading channels," in IEEE Trans. Wireless Commun., vol. 4, pp. 580-590,
Oct. 2002.

[15] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading
environment when using multi-element antennas," Bell Labs. Tech. J. vol. 1, no. 2, pp.
41-59, 1996.

[16] G. Foschini, G. Golden, R. Valenzuela, and P. Wolniansky, "Simplified processing for
high spectral efficiency wireless communication employing multi-element arrays," IEEE
J. Select. Areas in Commun., vol. 17, pp. 1841-1852, Nov. 1999.

[17] S. Verdu, Multiuser Detection. Cambridge, UK: Cambridge Univ. Press, 1998.

[18] M. J. Gans, N. Amitay, Y. Yeh, H. Xu, T. Damen, R. Valenzuela, T. Sizer, R. Storz, D.
Taylor, W. MacDonald, C. Tran, and A. Adamiecki, "Outdoor BLAST measurement sys-
tem at 2.44 GHz: calibration and initial results," IEEE J. Select. Areas in Commun., vol.
20, pp. 570-583, Apr. 2002.

[19] C. Budianu and L. Tong, "Channel estimation for space-time orthogonal block codes,"
IEEE Trans. Signal Processing, vol. 50, pp. 2515-2528, Oct. 2002.

[20] P. Stoica and 0. Besson, "Training sequence design for frequency offset and frequency-
selective channel estimation," IEEE Trans. Commun., vol. 51, pp. 1910-1917, Nov. 2003.

[21] C. Chuah, D. Tse, J. Kahn and R. Valenzuela, "Capacity scaling in MIMO wireless systems
under correlated fading," IEEE Trans. Inform. Theory, vol. 48, pp. 637-650, Mar. 2002.

[22] A. L. Moustaks and S. H. Simon, "Optimizing multiple-input single-output (MISO) com-
munication systems with general Gaussian channels: nontrivial covariance and nonzero
mean," IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2770-2780, Oct. 2003.

[23] S. A. Jafar and A. Goldsmith, "Multiple-antenna capacity in correlated Rayleigh fading
with channel covariance information," IEEE Trans. Wireless Commun., vol. 4, no. 3, pp.
990-997, May. 2005.

[24] E. A. Jorswieck and H. Boche, "Channel capacity and capacity-range of beamforming in
MIMO systems under correlated fading with covariance feedback," IEEE Trans. Wireless
Commun., vol. 3, pp. 1543-1553, Sep. 2004.

[25] E. A. Jorswieck and H. Boche, "Optimal transmission strategies and impact of correlation
in multiantenna systems with different types of channel state information," IEEE Trans.
Signal Processing, vol. 52, no. 12, pp. 3440-3453, Dec. 2004.

[26] A. Lozano and A. M. Tulino, "Capacity of multiple-transmit multiple-receive antenna ar-
chitectures," IEEE Trans. Inform. Theory, vol. 48, no. 12, pp. 3117-3128, Dec 2002.

[27] A. L. Moustakas, S. H. Simon, and A. M. Sengupta, "MIMO capacity through correlated
channels in the presence of correlated interferers and noise: a (not so) large N analysis,"
IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2545-2561, Oct. 2003.

Full Text
xml version 1.0 encoding UTF-8
REPORT xmlns http:www.fcla.edudlsmddaitss xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.fcla.edudlsmddaitssdaitssReport.xsd
INGEST IEID EH096G9LG_U7LX2G INGEST_TIME 2011-11-02T14:50:52Z PACKAGE AA00004683_00001
AGREEMENT_INFO ACCOUNT UF PROJECT UFDC
FILES

PAGE 1

TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION SYSTEMS By YONG LIU A DISSERTATION PRES ENTE D TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2005

PAGE 2

PAGE 3

To my wife and parents.

PAGE 4

ACKNOWLEDGMENTS First of all I wou ld like to thank my advisor Dr. Tan F. Wong for his invaluable guidance help and constant encouragement during my graduate study at the University of Florida. I also want to thank the other members of my graduate committee Dr. John M Shea Dr. Jose A B Fortes and Dr. William W. Hager, for their suggestions and help My specia l thanks go to Dr. William W. Hager for his great help during the development of the second half of this work. Finally I am extreme l y gratefu l to my famil y for their encouragement devotion and sup port throughout my whole life. I V

PAGE 5

TABLE OF CONTENTS ACKNOWLEDGMENTS TABLE ...... LIST OF FIGURES ABSTRACT. CHAPTER INTRODUCTION 1.1 1.2 1.3 Timing Estimation fo r Rayleigh Flat-fading MIMO Channels Channel Estimation for Correlated MIMO Channel s with Colored Interference . . . . . Organization of the Dissertation . . . . . . . . . 2 TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVE R 3 RAYLEIGH FLAT-FADING CHANNELS 2 1 Introduction . . . . . . . 2.2 System Model .......... . 2.3 Timing Estimation with Unknown Detenninistic Channel 2.3. 1 ML Estimator ..... 2 3 2 CramerRao Bound ....... 2 3.3 Optima l Training Scheme ... . 2.4 Timing Estimation with Random Channel 2.4 1 ML Estimator . . . 2.4.2 CramerRao Bound .. 2.4.3 Optimal Training Scheme 2 5 Discus sions and Conclusions . 2 .5.1 Orthogonal Training Signa l s 2.5.2 Perfectly Correlate d Training Signa l s 2 5.3 Deterministic vs Random Channel Approaches CHANNEL ES TIMATION FOR CORRELATED MIMO CHANNELS WITH COLORED INTERFERENCE 3 .1 Introduction . . . . . 3 .2 System Mode l . . . . . 3.3 Optimal Training Sequence D esign 3 3.1 Solution Structure 3.3.2 The Optimal I: .. V lV Vll Vlll IX 3 4 5 7 7 8 11 11 1 2 14 29 31 32 35 37 40 40 41 42 42 44 49 50 56

PAGE 6

4 3.3 3 Optimal Eigenvector Ordering . ... ... . 3 .4 Estimation of Channel Statistics and Feedback Design 3 5 Numerical Results ...... 3. 5 1 Co-channel Interference 3.5.2 Jamming Signals 3 6 Conclusion ....... 3. 7 Appendix . . . . 3.7.1 A Trace Problem 3 7.2 A Determinant Problem CONCLUSION AND FUTURE WORK 4 1 4.2 Timing Estimation for Rayleigh Flat-fading MIMO Channels Channel Estimation for Correlated MIMO Channels with Colored 58 62 69 70 71 73 76 76 81 87 87 Interference . . . . . . . . . . . . . . 87 4.3 Timing Estimation for Correlated MIMO Channels with Colored Noise 88 REFERENCES BIOGRAPHICAL SKETCH VI 89 95

PAGE 7

TABLE Table 1.1 Matrix Notations . . . . . . . . . . . . . . . . 6 Vil

PAGE 8

LIST OF FIGURES Figure 2.1 Outage probabilities achieved using different training signal sets for a system with 4 transmit and I receive antennas The unit of the threshold E is T;. 2.2 Outage probabilities achieved using orthogonal training signals for different numbers of transmit antennas One receive antenna is employed. The unit of the threshold E is T52 . . . . . . . . . . . . . . . 2 3 Comparison of outage probabilities of the ML estimator obtained from simula tion and calculated from the CRB. The number of transmit antennas nt is 2 and E = 10-4rs2 ... .................... 2.4 Comparison of the MSE of the ML estimator obtained from simulation and the average CRB. The number of transmit antennas nt is 2 The unit in the rt. I .. T2 ve 1ca axis 1s s . . . . 2.5 Comparison of CRBs obtained using orthogonal training sequences and per fectly correlated training sequences for different numbers of transmit anten nas Note that the CRB of the system with the perfectly correlated training sequences is the same for any number of transmit antennas .......... 2.6 Comparison of the MSE of the ML estimator obtained from simulation and the CRB. The number of transmit antennas ni is 2 The unit in the vertical axis is T s 2 ............... ........................ 3 1 Comparison of total MSEs obtained using different training sequences !SI-free symbol waveform and high spatial correlation channel. . . . . . . 3.2 Comparison of total MSEs obtained usin g different training sequences. ISi-free 22 23 24 30 38 39 72 symbol waveform and low spatial correlation channel. . . . . 73 3.3 Comparison of total MSEs obtained using different training sequences. AR jammers and high spatial correlation channel. . . . . . . 74 3.4 Comparison of total MSEs obtained using different training sequences. AR jamrners and low spatial correlation channel. . . . . . . 75 Vlll

PAGE 9

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION SYSTEMS By Yong Liu December 2005 Chair: Tan F. Wong Major Department: Electrical and Computer Engineering There is an increasing demand for next generation wire less networks, including wireless local area networks and the third generation ce llular networks that can provide high data rate for broadband services improve quality of service (QoS), and support more u sers. The use of mul tiple transmit and receive antennas can offer substantial performance improvement to a wireless communication system by making th e u se of the extra degrees of freedom in the spatia l domain and thus is a promising technique to satisfy this demand. Many of the current space-time coding schemes proposed for multiple-antenna systems assume perfect timjng estimation and channel estimation to achieve the expected performanc e gain. The lack of timing synchronization be tween the transmit and receive signals and the in acc uracy of channel estimation could degrade the system performance In the first half of this work we investigate the probl e m of timing estimation in multiple antenna systems with the aid of training signa l s. A s low, independent and identically distribu ted Rayleigh flat-fading channel model i s considered. We derive two maximum lik e lihood timing estimators based on two different approaches, namely treating the channel as detenninistic and random, and present the corresponding Cramer-Rao bounds (CRBs). Then the optimal desi gns of trainin g s ignal s based on some figures of merit associated with the CRBs are discussed I X

PAGE 10

In the second half of this work we study the problem of the estimation of correlated multiple-input multiple-output (MIMO) channels with colored interference The Bayesian chan nel estimator is derived and the optimal training sequences are designed based on the mean square error of channel estimation. We propose an algorithm to estimate the long-tenn chan nel statistics in the construction of the optimal training sequences. We also design an efficient scheme to feed back the required information to the transmitter where we can approximately construct the optimal sequences Numerical results show that the optimal training sequences provide substantial perfonnance gain for channel estimation when compared with other train mg sequences. X

PAGE 11

CHAPTER 1 INTRODUCTION There is an increasing demand for next generation wireless networks including wireless local area networks and the third generation cellular networks that can provide high data rate for broadband services improve quality of service (QoS), and support more users. The use of multiple antennas at both the transmitters and receivers in wireless communication systems is a significant technical breakthrough which can offer substantial performance improvement to wireless links by making the use of the extra degrees of freedom in the spatial domain and thus is a promising technique to fulfill these requirements. A system employing multiple transmit and receive antennas is often called a multiple-input multiple-output (MIMO) system. Recently the MIMO system and its related techniques have been widely considered for next generation wireless communication systems such as wireless local area networks (WLAN) and the third generation (3G) cellular networks. With multiple antennas, t he communication performance can be improved by many orders of magnitude without increasing transmit power and bandwidth. Only more hardware complexity is needed This additional hardware requirement is enabled by the increasing computational power of integrated circuits. MIMO systems provide various benefits that include spatial multiplexing gain and diver sity gain. The information capacity of wireless communication systems increases significantly by employing multiple antennas. It has been analytically proved that MlMO systems can pro vide a linear increase in capacity [ 1 2] which is proportional to the minimum of the number of transrrut antennas and the number of receive antennas. This spatial multiplexing gain can be obtained by transmitting independent data strean1s from different transmit antennas The increased inform at ion rate is achieved without the requirement of increasing the transmit power and expanding the transmission bandwidth.

PAGE 12

2 The physical characteristics of the wireless channel present a fundamental technical chal lenge for reliable communications. Wireless communication channels exhibit s ignificant sig nal variations on a short term time scale which is kno w n as fading. One way to mitigate the degradation effects of fading is to emplo y diversit y techniques which pro vi de the receiver wit h severa l replicas of the same transmitted signal over ind epe ndent fadin g channels. The proba bility that all the received signals experience deep fades simultaneously reduces considerably. Thus diversity technique s increase the reliability of wireless links and dramaticall y improve the communication perform an ce o ve r fading channels. The commonly u sed diversity technique s include time diversity frequency diversity and spatial di vers ity Time di vers ity can be pro vi ded by channel coding combined w ith int erleavi n g or automatic rep eat request (ARQ ) schemes. In frequency diversity the sa me narrowband signal is tran s mitted over over differ ent frequency bands to pro vide independent fading channels. Spatial diversity wh ich is also known as an tenna di ve r s it y obtained by the use of multiple an tenn as is preferred over time diversity an d frequency dive r s ity since it does not need to incre ase the tra nsmit s ignal power and bandwidth. If the fading effects b etwee n di ffere nt pairs of transmit and r eceive antennas are approx im ate l y independent and the transmitted s ignal i s carefully de signed the received signals can be co m bined at the rec e iver such that the fading of the re s ultant s ignal i s great l y reduced compared to a s in g l e antenna communication syste m an d thus wireless link improvement i s provided Sp ace -tim e coding (S TC) is one ke y technique that has been introduced to provide en hanc ed performance for w ir e l ess communication systems emp lo yed with multiple antennas. Space time codes are d es i gned to u se th e extra d egrees of freedom in the s pati a l domain pro vided by extra antennas They incorporate the temporal and spatial correlations into signals from different transmit antennas to achieve tran smi t di versity and provide spatial multiplexing gain. The main classes of space time codes includ e the Bell l a b s l ayered spacetim e architecture (BLAST), space-time trellis codes (STTC) a nd space time block codes (STBC) Tarokh et al. [3] proposed space-time tr e lli s co d es which can provide full diversity gain at the receiver. After that many efforts ha ve b ee n m a de to impro ve the originally designed space time trellis co d es [4, 5). Since space tim e trellis co d es are designed b ased on trellis co d es they provide ad dit iona l codin g gain But the V it er bi algor ithm has to be employed for the optimal

PAGE 13

3 decoder of STTC and thus the decoding complexity grows exponentially with the memory length of trellis codes and the number of antennas To reduce the decoding complexity, Alamouti introduced a simple space-time block coding scheme for a two transmit antenna system which can pro vide full diversity gain without sacrific ing the transmission data rate [6]. The scheme was extended to more than two transmit antennas based on the theory of orthogonal designs [7, 8, 9]. Space-time block codes can be decoded us ing much simpler linear processing at the receiver compared with the Viterbi algorithm required for space-time trellis codes Although space-time block codes achieve the same diversity ga in as space-time trellis codes for the same number of transmit antennas, they do not provide any significant coding gain. To make a compromise between STBC and STTC, the schemes of con catenating the traditional trellis codes with space-time block codes to obtain additional coding gain has been proposed [ 10-14]. B LAST [ 15, 16] is the first space-time coding scheme propo sed for MIMO systems which provides spatial multiplexing. In BLAST the multiple independent data streams are transmit ted from di fferent transmit antennas, and are extracted by using the interference nulling and interference successive cancelation strategies at the receiver. This decoding scheme opera te d in spatial domain for BLAST is similar as the successive interference cancelation proposed for multiuser detection [ 17] in CDMA systems Field tests showed that BLAST provides a substan tial increase of data rates for wireless communication systems operating in practical channels [18]. 1.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels To achieve the performance gain promised by the multiple antenna system parameter es timations including channel estimation timing estimation and frequency offset estimation are key components of the space-time system de s ign Both channe l estimation and frequency offset estimation for MIMO systems ha ve b een extensively studied in the literature [19 20]. An issue that has not been sufficiently explored is timing s y nchroni z ation in multiple antenna syste ms Inaccuracies in timing sync hroni za tion can degrade the performance of such communication systems in a simi l ar way as the MIMO channel estimation and frequency offset est imation error do. For instance m any of the current space-time coding schemes proposed

PAGE 14

4 for multiple-antenna systems assume perfect knowledge of timing and channel gains at the re ceiver in order to be able to achieve the promised diversity gain and capacity improvement. The perfonnance of these systems may be limi ted by the accuracy of timing estimation. One objective of this work is to study the problem of timing estimation for a wireless communica tion system employing multiple transmit and receive antennas in a Rayleigh flat-fading channel environment. 1.2 Channel Estimation for Correlated MIMO Channels with Colored Interference For the multiple antenna communication system, theoretical analysis [ l 2 15] shows that the capacity increases l inearly with the number of antennas under the assumption that channel gains between different transmit and receive antennas are identical and independent distributed (i.i.d.). The i.i.d. assumption is reasonable for sufficiently rich scattering environments. On the other hand it is a l so important to analyze the capacities design optimal transmission strategies and investigate the related channel parameter estimation problem for MIMO systems in more realistic situations which include spatially corre l ated channels and colored interference In the more realistic channel environment fading correlation exists between the different transmit antennas and receive antennas It was shown [21] that the c a pacity of correlated MIMO channels still grows linearly with the number of antennas but the growth rate is affected by the channel correlation s and smaller than that in independent fadin g chann els. Based on the ca pacity results for correlated MlMO channels optimal transmission strategies [22-25] have been widely inve tigated. Jorswieck et al. [24] investigated the correlated Rayleigh flat fading MIMO systems with perfect channel state infonnation at the receiver and the channel covariance infor mation fed back to the transmitter. It was shown that transmitting signals along the directions of the eigenvectors of the transmit correlation matrix is the optimal transmission strategy. The capacity of MIMO channels has also been investigated for wireless communication systems with colored interference. The scenario arises in cellular systems where the users in one cell suffer from the co-channel interference from the users in other cells due to frequency reuse or in ad hoc networks where each transmitter-receiver pair suffers from the interference from other transmitter-receiver pairs operatin g in the s ame frequ e ncy band. In Lozano e t al. [26] the capacity of MIMO systems with the presence of spatially colored interference was

PAGE 15

5 investigated. It was shown that the capacity increases with the interference spatial correlation and the lowest capacity is achieved when the interference is white. In Moustakas et al. [27] the authors provided analytical expressions for the statistics of the mutual information for spatially correlated channels with the presence of interference Channel estimation is nece s sary for coherent detection in multiple antenna communication systems The inaccuracy of channel estimation could degrade the system performance substan tially There are few works considering the channel estimation problem for MIMO systems in realistic situations which include both spatially correlated channels and interference So another objective of this work is to investigate the problem of estimating correla t ed MIMO channels with colored interference 1 3 Organi z ation of the Dissertation The dissertation is organized in the following manner. The timing estimation problem for MIMO sys tems with the aid of training signals is investigated in Chapter 2. In Chapter 3 we study the problem of estimating correlated MIMO channels in the presence of colored interference. Conclusions are drawn in Chapter 4. The notation used in this dissertation is summarized in Table 1.1 for clarity.

PAGE 16

A a Real ( a ) I n 0 diag ( X1, X2, . Xn) A T A AH A 1 ; 2 vec( A ) tr ( A ) det ( A ) A B a > b A a CN ~(t) ;/;(t) Table 1 .1: Matrix Notations matrix with complex entries column vector with complex entries real part of column vector a n x n identity matrix zero matrix diagonal matrix with x1 x2 ... x11 as the diagonal elements transpose of A complex conjugate of A complex conjugate transpose (Hennitian) of A Hennitian square root of A vector obtained by stacking columns of A on top of each other trace of A determinant of A Kronecker product of A and B inequality elementwise matrix with rea l entries column vector with real entries complex Gaussian distribution the first derivative of 'lj;( t ) w .r. t. t the second derivative of 'lj;(t) w r.t. t 6

PAGE 17

CHAPTER2 TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVER RAYLEIGH FLAT-FADING CHANNELS 2 1 Introduction In this chapter we investigate the timing estimation problem for a wireless communication system employing multiple transmit and receive antennas with the aid of training signals. Previous related work was primarily restricted to acquisition in spread spectrum systems with multiple receive antennas [28, 29]. In Dlugos et al. [28] and Win et al. [29], the maximum likelihood estimator of the received code lag was obtained, and the error probability for the acquisition system was derived A deterministic but unknown channel was considered in Dlugos et al. [28] whereas a flat Rayleigh fading channel with known statistics was assumed in Win et al. [29]. An optimal estimator for code acquisition was derived in Shamain et al. [30] for spatially correlated channels. In Zhang et al. [31 ], the performance of code acquisition in a DS-CDMA system employing multiple transmit antennas was analyzed. Through simulations it was shown that the presence of multiple transmit antennas improved the code acquisition performance relative to that of a single-antenna system. Issues related to parameter estimation of signals received by an array of antennas have also been treated in the radar array signal processing literature [32, 33]. Time delay and spatial signature estimation of known signals received by an array of antennas was investigated in Swindlehurst et al. [34]. ML algorithms and the Cramer-Rao bound for time delay and array calibration e s timation were developed and some computationally efficient approximations of the ML algorithms were proposed. In Dogandzic et al. [35] ML methods were developed for space-time fading channel estimation with an antenna array in spatially correlated noise. The CRBs for the unknown directions of arrival time delays and Doppler shifts were derived under a structured and unstructured array response model. In the present work we consider a wireless communication sy tern with multiple trans mit and receive antennas in a slow independent and identically dis tributed (i.i d. ) Rayleigh 7

PAGE 18

8 flat-fading environment. The goal is to investigate the problem of timing estimation in such a system with the aid of training signals One of the main questions that we try to answer is to find the optimal training signal design. We investigate the timing estimation problem under two approaches. In the first approach the channel is assumed to be unknown and determinis tic where joint estimation of the channel and delay is carried out. We derive an ML estimator for joint channel and timing estimation and compute the associated CRB. Then we discuss the optimal training signals with respect to two performance measures based on the CRB: the outage probability that the CRB is larger than a threshold and the average CRB. We show that the optimal training scheme is one wherein orthogonal training signals from multiple transmit antennas are used. In the second approach, the channel is assumed to be unknown but random with known statistics. We use the likelihood function averaged over all random channel real izations to obtain the ML estimator for the delay. We derive the associated CRB and study the optimal training scheme in terms of minimizing the CRB. We show that perfectly correlated training signals employed at different transmit antennas constitute the optimal transmit scheme, in contrast to orthogonal training signals in the first approach. The rest of this chapter is organized in the followin g manner. The system model is in troduced in Section 2.2. In Section 2 3 we consider the timing estimation problem when the channel is asswned to be unknown but deterministic In Section 2.4 we study the problem of timing estimation with the assumption that the channel is random but with known statistics In both sections we derive the ML timing estimators and compute the associated CRBs. Optimal training signal designs are discussed based on the corresponding CRBs. In Section 2.5 some discussions comparing these two timing estimation approaches are provided. 2 2 System Model We consider a single-user MIMO system with nt transmit antennas and nr receive anten nas We assume a quasi-static (block fading) channel where th e channel varies slowly enough to be considered invariant over a block However the channel chang es to an independent value from block to block. By using the unstructured array model [33] the received baseband signals

PAGE 19

9 at the receive antennas are given in vector form by nt r(t) = L h ksk(t T ) + n ( t ) (2 1) k=l where h k = [ hki, h k2 . hknr]T with h i j denoting the channel gain from the i th transmit antenna to the jth receive antenna r ( t ) is the n r x 1 recei ve d signal vector from the receive an tenna array and sk( t ) is the transmitted training signal from the kth transmit antenna. Define the channel vector ash = [ hf, h I, ... h;'Jr. Also, n ( l ) i s a complex, circular-symmetric, white Gaussian noise process with zero mean and covariance matrix E [ n (t) n (u)Hj = o-2I n r 6 (t -u). The symbol T denotes the unknown deterministic delay to be estimated. This model assumes that the delays between all pair s of tran sm it and receive antennas are the same. This corresponds to the case in which the distance between the transmit and receive antenna arrays is much larger than the sizes of the arrays We consider the Raylei gh flat-fading channel model in which the channel coefficients h,j are i.i.d. complex circular-symmetric, zeromean Gaussian random variables with the CN( O p2) distribution, i.e. The conditional likelihood function ofr( t), g iven the unknown T and h, can be written as (2.2) where we have assumed that the training signals sk(l), for k = 1 ... n1 have finite durations and the observation interval T0 is larger than the sum of the maximum training signal duration and the maximum possible va lu e of T Thus the whole transmitted training signa l s are observed at the receiver.

PAGE 20

10 We can simplify the exponent of the likelihood function to find the sufficient statistics for the estimation of the delay T: f II r (t) t h,s,(t T) II' d t 1 T r H (t) r (l) d t 2Re{ t [ 1 T r H (t)s,(t T) dt l h } where the term canst represents the part which does not depend on the delay T and the channel h. Also, the last equality holds due to the assumption that T0 is larger than the sum of the maximum training signal duration and the maximum possible delay. Denote the matched filter output corresponding to the k th transmit si g nal by ( T o rk(T) = l o r*(t) k(t -T) dt, k = l 2 .... n t. (2.3) Ote that r ( T) = [ r 1 ( T f, r 2 ( T f ... r n, ( T tr provides Sufficient Statistics for estimating T. With this notation we then have (2.4) Denote the crosscorrelation between the training signals from the i th and j th transmit antennas as ( T o r i J = l o s;(t)sJ(t) dt, (2.5) which fonns the (i j)th element of the correlation matrix r. Let C = r Inr Then, we have (2 6)

PAGE 21

11 From (2.4) and (2 6) the conditional likelihood function of r ( T), given the unknowns T and h can be written as p ( r ( T ) IT,h ) = 7!'-nra--2nrexp [ : 2 (can s t 2Re{r(Tfh} + hHCh)]. (2.7) Let f (T) = [ Re ( r(T)f -lm( r(T)fir, h = [ Re ( hf, lm( hfir, and A ( Re ( C ) -lm(C) ) C = By using the isomorphjsm between real and complex matrices lm( C ) Re( C ) T TA A A A [36] we have 2Re{r(T) h} = 2f( T ) hand h/-/Ch = 2 h rCh. 1n tenns of these real quantities the conditional likelihood function off ( T) is then 2.3 Timjng Estimation with Unknown Deterministic Channel 1n trus chapter we will treat h as unknown but detenninistic in the estimation process and consider the joint estimation of the delay T and the channel vector h 2.3 1 ML Estimator 1n thls section we develop the ML estimator for the joint estimation of the timing T and the channel vector h The joint ML estimate of T and h maximjzes the conditional likelihood function (2 8 ) a s a function of T and h: m~p(f(T)IT, h ) = max{m_?,Xp(f (T) I T h )}. r h r h (2.9) Alternatively we can maximi z e the log-likelihood function given by (2.10) As suggested in ( 2 9) we first maximize the log likelihood function L over h. Taking the first derivative of L with respect to ( w.r.t.) h gi v es aL 1 A A A = { 2 f (T) 4Ch}. 8 h a-2

PAGE 22

12 By letting = 0 we get the ML estimate of the channel h as (2.11) where we have assumed that C i.e. C is nonsingular to obtain a unique estimation of the channel. Then substituting (2.11) into (2.10) gives the ML estimate of the delay T in the form: (2 .12) To implement the ML estimator in general, we need to conduct a line search over all possible values of T to maximize the above metric. 2.3 2 Cramer-Rao Bound The Cramer-Rao bound gives a lower bound on the variance of any unbiased estimator (36 37]. It has been widely used to lower bound the mean square error (MSE) of symbol timing estimators (38 39]. lt is well known (36 37] that ML estimators, under mild regularity condi tions and with independent and identically distributed observations are asymptotically unbiased and efficient. It can be easily verified that the elements of r(T) given in (2 3) corresponding to different receive antennas are i i d observations Thus for a particular realization of the channel h the ML estimator is asymptotically efficient, i.e., it approaches the CRB as the number of receive antennas nr become s lar ge. Hence the CRB i s a s uitable performance measure for the ML estimator of the delay T. We will also verify the s uitability of employing the CRB as a performance metric by computer simulation examples. The main re s ult of this sectio n on the CRB is contained in th e following theorem. Theorem 2.3.1 (Cramer-Rao bound). Suppose that th e first and second derivatives of the training signals sk(t), for k = 1 ... nt, exist and th ey are uniforml y c ontinuous on [ 0 T0 ] Together with th e standard regularity conditions in [36, 3 7}, th e Cramer -Rao bound for the estimat ion of the delay T for a given r e alization of the c hannel h is given b y a2 1 CRB( h ) = -----------2 Re{E[8~ ~\'")rh} + E [8~~)r C -1 [8~~)(' (2.13)

PAGE 23

13 i=l 2 ... ,nt (2.14) and E[a~~)l = [ E[ar~;T)F, E[ar~~r )ir .... E[ara~(r)lrf w ith i = 1 2 ... rlt. (2.15) Proof The CRB for the estimation of T is given as CRB(h) = ( 1 -1 )22, (2 16) where I is the Fisher information matrix for the joint estimation of the channel h and the delay T which is defined as Since = 4 C and = 0 we have 8 h 8 h [ a2 L ] 4 A l11 = E afi2 = (J2C. Moreover I12 = E [ ~ 2 ] = 2 2 E [ 8 r 8(T)] = 1I1-8 h8T (J T L t = E [8r ( r)] = E [8r 1 ( r f 8 r2(rf 8 rni(r f ]T e V 8T 8r ar . ar then 11 2 = -~v = -~ [Re( vf, -Im( v ff. The ith block of v can be computed from ( 2 17) (2 18) (2.19)

PAGE 24

14 The fact that the noise n (t) is zero-mean g ives [ O r (T)] n, ('o 8s ( t T) "' {To E = h i lo si( t T) i OT d t = h i l o si(t)si(t) dt (2 20) Finally I = -2 E [82r(r)r] h = -2 Re {E [82r(r}r] h}. Similarly 122 can be computed 22 from the fact that { 32 ( ) } n, / T0 E ;~2 T = h i l o sk(t)s i(t) dt. (2.21) Applying the standard result on the inverse of a partitioned matrix to (2.16 ) and (2.17) gives (2.22) By using the relationship between real and complex matrices [36] we get I I-11 2 A T ( 4 c~)1 2 A 2 r c-1 21 11 12 = 2v 2 2 v = 2v v a a a a (2 23) Then the CRB of the estimation of the delay T is a2 1 CRB ( h ) = --------------. 2 Re { E [ a~~~r)] Th} + E [a~~)] T c -1 E [ a~~l] (2.24) D We note that the CRB varies with different choices of training signals. By carefully choos ing the training signals to minimize a s ui tab l e measure associated with the CRB we can poten tially improve the estimation perfonnance. 2 3 3 Optimal Training Scheme Communication systems often employ the same symbo l waveforms for both training and data phases. The choice of the symbol waveform is mainly decided by the performance required by data transmissions. In this sect ion we shall make the followin g simpli fy in g but practically reasonable assumptions on the training signa l s:

PAGE 25

15 Assumption 1 Let a k = [ak(O), ... ak( l)]T be the training s equence assigned to the k th transmit antenna and on this antenna the training signal waveform is of the form N-1 sk(t) = L ak(i)?j)(t i T s), ( 2.25 ) i=O where N is the number of training symbols and 7/J(l) is the symbol waveform. We call the N x nt matrix A = [ a1 a2 ... cint] as the training sequence matrix. Assumption 2 The symbol wa vefonn 7/J(t) is time-limited to a single symbol period [ O Ts] so t h at adjacent symbols do not interfere with each other. In addition 7/J(t) is sufficiently smooth to guarantee the existence of uniformly continuous first and second derivatives. This condition is satisfied for most symbol waveforms of practical interest. Two typical examples are the time-domain rai s ed-cosin e pulse and the half-sine pulse. Assumption 3 AH A is nonsingular and hence r and C are also nonsingular. We note that this implies that N nt. Under the a s sumptions stated above the CRB for the timing estimation can be simplified to the expres s ion summarized in the following corollary Corollary 2.3.1 (Cramer-Rao bound). Give n As s umptions 1 3 t h e CRB f or the est imatio n of T for a parti c ular r e alization of the c hann e l h r educes to CT27/Jb 1 CRB( h ) = -27/Ja hH (AHA Inr)h ( 2.26) w h e r e 7Pa = 7Pb7Pc + J7/JdJ2 7Pb = J l7/J(t)J2 dt, 7Pc = J 7/J*(t);/;(t) dt, and7/Jd = J 7/J*(t)~(t) dt. Proof With th e thr e e as s umption s on th e trainin g sig n a l s we h ave T o N-1 1 1 L a~(m )?j)*(t mTs) L a i ( l ) (t -lTs) dt O m = O l = O a : a i j 7/)*(t );/;(t) dt ( 2 .2 7)

PAGE 26

Then Eqn. (2.14) can be written in terms of the training seq uence s as: Thus [ 82r ( T)] T T H H H E 872 h = VJc 8 h i h k a k a i = 'l/Jch ( A A InJh Hencel22 = -~Re [E [8~V)r h] T = -~'l/J c h H ( AHA Inr)h Moreover (2.15) can also be simplified in terms of the training sequences as : Thus E [ a~~) ] = -'l/Jd(AH A Inr)*h*. Similarly, we have rij = j s;( t)s1( t) dt = 'l/Jb a f a 1 and C = 't/;b(A HA l11J .Hence (2.23) can be written as 22 VJdh H (AHA@ InJ('l/Jb AH A @ Inrt1'l/Jd(A HA @ Inr)h (J 16 (2.28) (2 .29) (2.30) (2.31) 2_ l'l/Jdl2 hH (AHA I ) h (2.32) (J2 VJb nr Then the Cramer-Rao bound for the estimation of the delay T is CRB ( h ) [ l22 l2111} 112]-1 [ -(J~ 'l/JchH (AHA@ Inr)h -:2 l ~ ~\1H ( A H A @ Inr)h ]-l (J2VJb 1 = 2('l/JbVJc + l'l/Jdl2)hH( AHA @ Inr)h. (2 -33 ) By u sing some standard properties of the Fourier transform similar to the Parseval 's theo rem we hav e VJb = 2 ~ J ~00 l\ll(w)l2dw, VJc = -2 ~ J ~00 w2l\ll(w)l2dw, and VJd = f,; J ~ wl\ll(w)l2dw, where \ll(w) is the Fourier tran sform of 'l/;(t) Then according to the

PAGE 27

17 Cauchy-Schwarz inequality we have (2 34) Since 'l/Jb 2 0 we have !: 2: 0 which implies that the expression of the CRB given in (2 33) is nontrivial. D As a result the dependence of the CRB on the training signals sk(t), fork = 1 ... n i simplifies into that on the training sequence matrix A and the symbol waveform 'l/J( t ) In the following two subsections we optimi z e the training sequence matrix A in tenns of two perfor mance measures namely the outage probabi l ity that the CRB is larger than a threshold and the average CRB over all channel realizations. Outage probability In this subsection, the outage probability that the CRB is larger than the thresho l d E, i.e. Pr( CRB ( h ) > E), is used as a performance measure with respect to which the training signals from different transmit antennas are optimized. Write the spectral decomposition of AH A as AH A = U AU H where U is a unitary matrix and A = diag{ A1 A2 ... A n1 } is the diagonal matrix containing the positive eigenvalues of AH A. The design of the optimal training scheme can now be formulated as the following optimization problem: mm Pr(CRB ( h ) 2: E) A subject to tr{AH A} = I: ~1 Ai::; ; A,> 0 i = 1 ... ni (2 35) where 'l/Jbtr{ AH A} ::; P specifies a constraint on the total transmit power.

PAGE 28

18 First, we consider a simple but important case: 2 transmit antennas and 1 receive antenna. In this case, the optimization problem (2.35) can be simplified as follows. Starting from Corol lary 2 3.1, we have Pr( CRB ( h ) l) = Pr ( hH AH Ah :S -;;i~) (2 36) With the spectral decomposition of AH A, hH AH Ah= hHUAUHh = h H Ah'= .X1'h;j2 + .X2j h~l2 where h1 = UHh. Since his a random vector with i.i d complex circular-symmetric zero-mean Gaussian elements and U is a unitary matrix h1 is also a complex Gaussian random vector with the same distribution as h. We note that lh:12 has the exponential distribution with E (lh:12 ) = p2. L X lh' 1 2 d ,\; ti -1 2 h et i -y an ci PNb, or i , t en (2 37) where X 1 and X2 are independent random variables with exponential distribution and E ( X i) = E( X2 ) = 1. The total power constraint ~~!1 .Xi :S ; is equivalent to 1 + c2 :S 1. Hence the optimization problem can be rewritten in the following simple fonn: subject to c1 + c2 :S 1 and c1 c2 > 0 (2.38) In order to solve the above optimization problem we employ the following result on the Schur-convexity1 of the distribution function of the linear combination of two exponential ran dom variables [ 41]. Lemma 2.3.1. L e t X1 and X2 b e ind e p endent random v ariabl es w ith exp on e ntial distribution and E(X1 ) = E(X2) = 1. The n the func tion 1 A detailed description on Schur-convexity and majorization can be found in Marshall et al. [40].

PAGE 29

19 is Schur convex on ( c1 c2 ) if x 1, and it is Schur concave on ( c1 c2 ) if x 3 /2. Using the above lemma and considering the region in which the C RB threshold E > -/t ;;2 the optimization cost function in (2.3 8) is a Schur convex function on ( c1 c2 ) T hus minimization of the cost function occurs if and only i f c1 = c 2 = i .e. >-1 = >-2 = 2 ;b [ 40]. This impli es that the optimal A is s uch that A1 1 A = 2;b 12 The optimal training scheme is s ummarized in the following theorem. ,t,2 2 Theorem 2.3.2. Suppos e that th e CRB thr eshold t -f,t /;p2, the training sequence matrix A such that A H A = 2: b I minimiz es th e outage probability of the CRB for a system with 2 transmit antennas and 1 r ece iv e antenna. That is the optimal training sequences from differ en t transmit antennas are orthogonal to each other and have eq ual powers. We s hall see from the di sc u ss ion in t h e next s ub sectio n on the average CRB ( Corollary 2.3 .2), the value 21a ;;2 i s exactly one half of th e average CRB over all channel realizations. Thus it is reasonable to consider the stated reg ion of the CRB threshold It seems natural that a result analogous to th e one in Lemma 2 3.1 be tru e for the mo re general case. While the proof of s uch a result remains open there is strong evidence regarding the Schur convexity of the function F(c1 ... Cn,, x) = Pr(c1X1 + + Cn1Xn1 x) where X i fo r i = 1. ... n t are indep en d ent random var iabl es w i th unit-m ea n exponential distri but ion. The following conjecture ha s been advanced in Merkle et al. [ 41 ], s upp orted by some strong numerical r es ult s Conjecture 2.3.1. The famil y of unimodal distribution functions F ( c1 .... en1 x ) is increasing wit h respect to the variance (i. e Schur-convex) for small va lu es x, and decreasing ( i.e., Schur co ncave) for larg e va lu es of x Based on the above conjecture, we conjecture that the result in Theorem 2 3 2 extends to the case of ar bit rary numbers of transmit an d receive antennas: Conjecture 2.3.2 When AH A = ./ I th e outage probabili ty of the CRB is minimi zed if the 'f'bnt CRB threshold t is not too small Thus the optimal training sequ e nces from diffe r e nt transmit antennas in terms of minimi zing the outage probability, are orthogonal to eac h other and have equa l powers

PAGE 30

20 In Hassibi et al. [42], the authors assumed perfect timing estimation and studied the prob lem of choosing the optimal training sequences for channel estimation to maximize a lower bound on the capacity of the channel that was learned by training. The optimal training se quences for channel estimation turned out to have the same structure as those we get here for timing estimation To illustrate our conjecture on the optimality of orthogonal sequences, we have carried out a large number of numerical calculations. In the broad region of 1: that we are interested in we have not observed the existence of any other schemes which can achieve a lower out age probability than the orthogonal training signals. In Fig. 2.1 we plot for instance, the outage probabilities Pr( CRB ( h ) 2'. 1:) for a system with 4 transmit antennas and a single re ceive antenna employing different training signal sets. Note that since P is the total transmit power constraint, the signal-to-noise ratio (SNR) !:..{-here should be understood as the total (J SNR for the whole training period instead of the SNR for one symbol period. The time-domain raised-cosine pulse is used as the symbol waveform. The results in the figure suggest that the orthogonal training signals are optimal and can provide a significant performance gain over the other training signals. In Fig. 2.2 we compare the outage performance of orthogonal training sequences for dif ferent numbers of transmit antennas. The results in the figure show that the use of multiple transmit antennas can offer substantial estimation performance improvement over a single-antenna system. For example, if we consider the outage probability Pr( CRB ( h ) 2: 1:) = 0 1 the two transmit antenna system can achieve a 4 dB performance gain and the four-transmit antenna system can achieve a 6 dB performance gain The performance gap grows with decreasing outage probability. More precisely the outage probability for orthogonal training signals is given b y Pr( CRB ( h ) 2'. 1:) = (2.39)

PAGE 31

21 where the second equality is obtained from the fact that hHh is X~ntnr -di stributed [43]. From (2.39), it is not hard to see that when the SNR is large i.e !:.-fn2t~t, the outage probability U f.'f'a is approximately given by 1 Pr(CRB( h ) t:) ( ) I ntnr. (2.40) Eqn. (2.40) indicates that the outage probability decreases with the (ntn, ) th power of the re ciprocal of the SNR. The power ntnr is usually referred to as the diversity order of the system [ 43]. Thus we conclude that the use of multiple transmit and receive antennas (with orthogonal training signals) provides spatial diversity for timing estimation in the same way as space time coding does for demodulation [ 1 15]. An important remaining issue is whether the ML estimator can achieve the outage prob ability of the CRB. For each realization of the channel h the ML estimator is asymptotically efficient with increasing number of receive antennas nr. We note that Pr(CRB( h ) t:) = Eh [ 1 ( CRB ( h ) t:)], where 1 (-) is the indicator function Because the indicator function is a bounded function the dominated convergence theorem [44] implies that the ML estimator can achieve the outage probability of the CRB asymptotically To verify the suitability of using the outage probability as a performance metric when the number of receive antennas is small we evaluate the performance of the ML estimator via Monte-Carlo simulations. In Fig. 2.3 we plot the outage probabilities of the ML estimator obtained from simulation and calcu l ated using the CRB, respectively for a system with two transmit antennas and employing orthogonal training sequences. It can be seen that the ML estimator gives an outage probability perfonnance very close to that predicted by the CRB even for small values of n,. = 1 2, and 4 Hence, the simulation results verify that the outage prob ability of the CRB provides an effective performance metric also when the number of receive antennas is small. Average CRB In this subsection, we use the CRB averaged over the Rayleigh flat-fading channel has an alternate performance measure based on which the trainin g signals from the transmit antennas are optimized

PAGE 32

I q .. \ \ \ \ ... \ .. I ... \ \ .. \ 0 .... \ \ \ \( \ .. iu \ . \ \ ... \.'. \ ' 'v, \ :( \ . '\;( \ \ W 10 -3 '--------'---'---'----'--'-------'-'1--'----~ ~ '--------"'---~-~ 10 15 20 25 30 35 40 SNR ( dB) 22 Figure 2.1: Outage probabilities achieved using different training signal sets for a system with 4 transmit and 1 receive antennas. The unit of the threshold E is Tt

PAGE 33

w I\ :2 ar er s it 'Q 1 0-1 0 10 -2 Q .
PAGE 34

:.0 ctl .0 e C. Q) 0, ctl :5 0 10 r_--,-~=--,-'""=--::--r---.----------r-----r------r-;:::::====I=====,---,--, -eML, n,=I \ 18 20 22 ... ........ .. 24 .., .. ' 26 SNR ( dB) \ \ \ 28 \ \ \ 30 -ML, n, = 2 -+ML, n,=4 _ CRB, n,=I CRB, n ,=2 \ \ CRB n,=4 32 34 24 F igure 2.3 : Comparison of outage probabilitie s of the ML estimator obtained from simulation and calculated from the CRB The number of transmit antennas nt is 2 and E = 10 -4T ;

PAGE 35

25 After averaging over the Rayleigh flat-fading channel h the average CRB is given as (2.41) The design of the optimal training scheme can now be formulated as the following optimization problem : min A subject to tr{AH A}= ''nt ,\ < .!:.._ 61=1 1 1Pb i = 1 ... ,nt. (2.42) The following theorem specifies the optimal training sequence that minimizes the average CRB. Theorem 2.3.3. When AH A = ./ I the average CRB over the Rayleigh flat-fading channel 'l'bnt h is minimi zed. That is the optimal training sequences from different transmit antennas in t e rms of minimizing th e averag e CRB are orthogonal to e a c h oth er and have equal powers. Proof Let W = U A U H where A = diag{>~ >.~, ... >-~inJ contains the positive eigenval ues of the Hermitian matrix W and U is a unitary matrix. Consider the following optimization problem : min E [11H~11] w subiect to tr{W} = "ntnr >.' < nrP J 61=1 1 ,j;b (2.43) Note that (2.44) where h' = u H h. As befor e, h is a complex Gaussian random vector with the same distribu tion ash. Let g(>.') = E ~'.x; where xi 2 0 are assumed to be fixed constants. We study the convexity property of g

PAGE 36

26 n r h .E.fL -x d __E:JJ_ 2XiXj Th th H G ( ') f vve ave o A ; ( L A;x;)2 an o).;o).~ = o:::: ).;x; )3 en e essian o g 1s 2xf 2x1x2 2 x1 :tnt C L ).:x;)3 ( L ).;x;)3 (LA;x;)3 G(A1 ) = 2x2: q 2x~ 2x2Xnt (LA;x;)3 ( L A;x;)3 ( L A;x;)3 It is easily seen that every rows of the Hessian G ( A1 ) are dependent. So r ank(G ( A1)) = 1. G(A1 ) only has one nonzero eigenvalue which is 0fA:::)3 2 0. (the sum of eigenvalues of a matrix is equivalent with the sum of all diagonal elements.) All other eigenvalues are zero. Hence the Hessian G(l) is a positive semidefinite matrix Then g(l ) is a convex function on In order to solve the a bove optimization problem, we employ the following result from the theory of majorization [40]. We first introduce some fundamental concepts of majorization that we require in the derivation of the optimal transmit scheme. For any x = (x1 . x11 ) E lRn, let X[i] 2 2 X[n] denote the components of x in decreasing order. Definition 2.3.1. For vectors x y E A C JR11, vector y majorizes x on A if k k L X[i] L Y[i), k = 1 ... n 1 i=l i = l n n LX[i] = LY[i] i=l i=l The notation x ---< y means x is majorized by y on A, or y majori zes x on A. Majorization makes precise the vague notion that the components of a vector x are less spread out or more nearly equa l than the components of vector y Definition 2.3.2. A r ea lva lu e d function f de.fined on a set A C ]Rn is said to be S ch urconvex on A if X---< y =} j(x) j ( y ) f is Schur-concave if the above inequality is reversed It follows that f is Schur-convex on A if and only if -f is Schur-concave on A.

PAGE 37

27 Lemma 2.3.2. If X 1 . Xn are exchangeab le random variables and the multi-variable, single valued function g is a s y mmetric Borel-measurable convex function then the function is Schur convex Since h : are i.i.d. they are exchangeable random var iables Since h: are exchangeable random variables and g(>.') is a symmetric Borel-mea sura ble convex function, the function E [I:~;7}>-'.lh;l2 ] is Schur-convex by the lemma. Moreover since ( 1/l;t, ... 1/l;t) is majorized by ( >.~, . >.~tnJ whenever>: > 0 I:>-: = ~:, we know [40] that E [I:~;7}>.'.lh;l2 ] is minimi zed with>. ~ = >.~ = = >.~tnr = w:nt. We note that this choice of>:, i = 1 ... ntnr, also satisfies the constraints in the minimization problem in (2.42) Thus it i s also a solution to the original minimization problem Thus the optimal training sequence matrix A should satisfy AH A = ./ I which implies that the train-'f'bn, ing sequences from different transmit antennas are orthogonal to each other and have equal powers. D With the optimal training sequences we can provide an explicit expression for the average CRB which is described in the next corollary Corollary 2.3.2 (Average CRB). Using the optimal training sch e me, th e average CRB over the Rayleigh fl.at-fading channel h is given by 'l/J~ (j2 Eh[ CRB( h)] = ( ) P2 2 n _!__ P r nt 'Pa (2.45) Proof From Theorem 2.3 3 and its derivation the avera g e CRB under the optimal training scheme is given as (2.46) where h : are i.i d complex circular-symmetric Gaussian random variab l es with the CN(0, p2 ) distribution.

PAGE 38

28 Let Y = I:7~~r lhW. Then Y is X~ntnr -di stributed with the probability density function (p.d.f.) for y 2'. 0 (2.47) for z > 0 The expectation of Z can be computed as (2.48) When n t n r 2'. 2 [45], we have (2.49) Then from (2.46) and (2.49) the average CRB can be written in a simplified way as VJ~ (52 Eh[ CRB ( h ) ] = ( ) p 2 2 n ...!._ a l p r nt f.//a (2.50) D With the optimal (orthogonal) training sequences the average CRB is a simple function of the constant -210 which only depends on the symbol waveform 1/;(t), the signal-to noise ratio !:.ii;-, the number of transmit antennas ni, and the number of receive antennas n,.. Note that the a average CRB in the limit of la rge ni or large n r can be approximated as (2.51) which is inversely proportional to the number of receive antennas nr When 1/;(t) is symmetric about ~ such as the tim e-domain raised-cosin e pulse and the half-sine pulse VJd b e comes z ero Then the average CRB for the estimation of the delay T with orthogonal training signals can be

PAGE 39

29 written as 1 (72 Eh [ CRB ( h)] = ( ) -p 2 2 n -1... {]2 P T nt (2 52) [-11] l / 2 [-,/J* (t),/J( t )dt] 112 [J~00 w21'1'(w)l2dw] 1 1 2 where {J VJb -l1/J(t)I dt -J !;: l lf ,(w)l2dw 1s known as the root-meansquare bandwidth [37] of the symbol waveform. Here w(w) is the Fourier transform of 'l/J(t). We note that the average CRB can be decreased by increasing the bandwidth of the symbol waveform. As before we would like to know whether the ML estimator can achieve the average CRB. Because the function hH( A H l I nrl h is not a bounded function thus unlike the outage probability of the CRB, the ML estimator may not achieve the average CRB asymptotically ( see further discussion in Section 2. 5 .1 ). However the average CRB provides a lower bound for the variance of any unbiased timing estimator averaged over the channel realizations. Again we employ Monte-Carlo simu l ations to evaluate the performance of the ML esti mator with a small number of receive antennas. ln Fig 2.4, we compare the mean squared error (MSE) achieved by the ML estimator and the average CRB given by (2.45) for a system with two transmit antennas and employing orthogonal training sequences. For a single receive antenna system, the performance of the ML estimator deviates significantly from the average CRB. This is due to the events in which all the channel coefficients are very small simultane ously causing the estimation performance to be very poor The large estimation errors caused by these events dominate the MSE of the ML estimator. We can see from the figure that the effect of these events diminishes as the number of receive antennas or the SNR increases. ln the former case, the error dominating events become rarer as the number of receive antennas increases. In the latter case the estimation errors and hence the effect of the error dominating events get smaller as SNR increases For a reasonab l y small value of nr, e .g. 4 and a reasonably high SNR, e.g. 20 dB we see that the average CRB is still a rather appropriate performance metric 2.4 Timing Estimation with Random Channel Recently differential space-time coding schemes [46 47, 48] have been developed where channel estimates are not required at the receiver. For this situation, we only need to consider

PAGE 40

100 ..::-:----,-,-,-,------,-,.,.,:-:--:----:---:-:----,----,-....,..,.,...,...,-;7'.""7---,--,-r,--:----:--:-----:-----,----:--:-r:----,--------:i Average CRB, n,=1 ............ . . < r ~:-~~), ! = + Average CRB, n,=2 0 Average CRB n,=4 'v Average CRB, n,=8 -+MSE, n,=1 -tMSE n,=2 -e-MSE, n,=4 -V-MSE,n,=8 : -. : ....... .... 10 -6 .__ ____ ...,__ ____ _,__ ____ ___. _____ _,__ ____ _._ ____ __J 5 10 15 20 SNR(dB) 25 30 35 30 Figure 2.4: Comparison of the MSE of the ML estimator obtained from simulation and the average CRB. The number of transmit antennas n t is 2. The unit in the vertical axis is T;.

PAGE 41

31 the estimation of the delay T. A reasonable model to represent this scenario is that the channel is random with known statistics. 2.4.1 ML Estimator Recall that the conditional likelihood function p ( f ( T) j T h ) off ( T) in terms of real vectors and matrices is given by (2 8). With the asswnption of i.i .d. Rayleigh flat-fading channels between the transmit and receive antennas we have E [hhH] = p2Intnr and E [ fifi T ] = ~ I2ninr The joint probability density function of the channel vector h is given as (2.53) We can average p( f ( T) jT, h ) over all realizations of h to obtain the unconditional likelihood function as p(f (T)jT) J p(f(T)jT h )p( h ) d h COnStX = exp{~f(Tf(2c+a:1)\ (T) } (2.54) J det ( 2 C + I ) a P where we have used the integral result from Cramer [49 11.12. l a]. The natural logarithm of p(f( T) jT) is the log-likelihood function: (2.55) By using the relationship betwe en real and complex matrices [36] the log-likelihood function can be written in terms of complex quantities as (2.56) Hence the ML es timator for the d e l ay T is g iv en by We assume that u; is known to the receiver for the implementation of the ML estimator. We p note that the matrix C + ~; I is always invertible. So unlike the restriction in Section 2.3, C can

PAGE 42

32 be singular which implies the training signals from different transmit antennas can be correla te d with each other. 2.4.2 Cramer-Rao Bound The CRB for the timing estimation based on the random channel model is summarized in the following theorem Theorem 2.4.1 (Cramer-Rao bound). Suppose that th e first and second derivatives of the training signals sk(t) ex ist and they are uniformly continuous on [ O T0]. Together with the standard regulari ty condit ions {36, 3 7 } the Cramer-Rao bound for th e esti mation of the d el ay over the i.i.d Rayleigh flat-fading chann e l mode l is given by (72 1 CRB=------2nr tr{D -1G } w here D = r + ~ I and the (i, j)th e l emen t ofG is for i j = 1 2, . nt. Proof To derive the CRB we start from the lo glikelihood function in (2.56): 1 ( 2 )-l ln[p(r(,)I,)] =canst+ r72r(,f C + ;21 r (,)*. (2.58) (2.59)

PAGE 43

33 The second deri v ative of the log likelihood function ln[p( r (T)J T ) ] w.r .t. T i s The expectation of the above is ( 2 60) Write a r ( r ) = [ari(r)T a r 2 (rf ... a rnt(r)T] T where the ith block can be computed as ar ar ar , ar 8 r i(T) ~h* [ 1To *( ) 8si (t T ) d ] 1To *( ) 8si(t T) d ---6 k s k t T ---t -n t ----t. 8 T k=l O 8T o 8T (2.61) Then 8ri(T)* 8r1(T) T 8T 8T { t,h [f s,(t 7)s;t 7 ) dt] + f n(t/sit 7 ) dt} { t,h/'[ f s;(t 7Jst 7 ) dt] + f nH(t); ~ 7 7 ) dt} Rec a ll that th e c hann e l ga in vector h i s asswne d t o ha ve i.i d compl ex c ir c ul ar sy mmetri c G a u s-

PAGE 44

34 we have As a result, we have E [ atr a rtf] = P Inr where the ( i j ) th element of Pis given by Similarly we also have E [0 2;~;r r ( T f ] + E [ r(T)*82;~~f ] = Q Inr, where the (i j ) th element of Q is given by Q ; p' t, ( 1 r s,(t)s;(t) d t ) ( 1r. sk(t)s; (t) dt) + a' 1 r s;(t)s;(t) dt for i j = l 2 . nt 2 Let D = r + a 2 I, then p (2.62) where the second equality is obtained by using ( A B ) -1 = A -1 B -1 and the third equality is obtained from the property ( A B)( C D) = (AC) (BD) [50]. Let G = P + Q. Since

PAGE 45

35 differentiating the left side twice w.r.t. T gives Then using the above equality, the ( i j)th element of G becomes (2.63) for i j = 1 2 ... nt, As a result we note that G does not depend on the noise CJ2 The CRB of the timing estimation is given as CRB = l E{ a2ln[p( r (r)lr))} 8r2 (J2 1 2 n,. tr{D-1G} (2.64) D 2.4.3 Optimal Training Scheme In this section, we impose Assumptions 1 and 2 made in Section 2 3 on the form of the training signals. With these two assumptions G i j can be simplified to Hence we have G = 'lj;ap2 AH AA HA. Thus the CRB for the timing estimation can be simpli fied as given in the followin g corollary. Corollary 2.4.1. Giv en Ass umptions 1 and 2, th e Cramer-Rao bound for the estimation of the d e la y T ove r th e i i d Rayl e igh flat-fading c hann e l model redu ces to (2.65)

PAGE 46

36 Moreover, in terms of the eigenvalues A1 A2 ... Ant of A H A we have tr 'lj;bAH A + CI2 I AH AAH A = L i a2 { ( 2 )-1 } nt A2 p i=l 'lj;bA-i + p2 (2.66) Thus the minimization of the CRB is equivalent to the following optimization problem : subject to A i 2:: 0 i = 1 ... n t (2 67) It can be easily verified that the cost function ~n..'..1 >..; 2 is a convex function on ( A1 . A n t). D i -,t,b>..;+~ p Then the following theorem specifies the optimal training sequences [51]. Theorem 2.4.2. The CRB is minimized by c hoosin g the tra ining sequence matrix A such that A1 = ~ and A2 = = Ant = 0. That is the optimal training sequences from differ ent transmit antennas are perfectl y c orrelat ed. We note that the rank of the optimal training sequence matrix A is 1. This implies that we can choose an arbitrary subset of transmit antennas to transmit the training signals as long as the training sequences from the chosen transmit antennas are perfectly correlated with each other. A common choice is to use the same training sequence and evenly assign the power to each transmit antenna. With the optimal choice of training sequences, the corresponding minimwn CRB is given by : 'lj; 2 (I2 ( (I2 ) CRB = -2 i:, p 2 1 + p 2 n r'f/a p p (2.68) On the other hand when orthogonal trainin g signals are emp lo yed, i.e. A1 = = Ant = ~ , nt'l'b the CRB is maximized to the value CRB = ___ b ___ l+nt. 'lj; 2 (I2 ( (I2 ) 2 nr'l/Ja P p2 Pp2 (2 69) Contrary to the previous case of joint estimation of the channel and delay where orthogonal training sequences are optimal they are the worst in terms of the C RB value for est imatin g the delay under the random channel model. Fig. 2.5 compares the CRBs of the system with the perfectly correlated trainin g sequences and that with the orthogonal training sequences Note

PAGE 47

37 that the CRB of the system with the perfectly correlated tra ining sequences is the same for any number of transmit antennas. We see that the perfonnance gain achieved by the optimal scheme is obvious when the SNR is low. For any fixed nt, the performance gap va nishes as the SNR becomes sufficiently large. In Fig 2.6, we compare the MSE achieved by the ML estimator and the CRB given in (2.68) for a system with two transmit antennas and employing perfectly correlated training sequences The perfect correlation is obtained by using the same training sequence and evenly assign the power to each transmit antenna. As will be discussed in Section V.B, no knowledge of signal-to-noise ratio is needed to implement the ML delay estimator for this choice of perfectly correlated training sequences. We observe from the figur e that for a reasonably small value of nr, e.g. 4, and a reasonably high SNR, e.g. 20 dB the CRB is a tight lower bound on the MSE performance of the ML estimator. This together with the asymptotic achievability of the CRB suggest that it is an appropriate performance metric. 2.5 Discussions and Conclusions In the previous two sections, we have studied the problem of timing estimation in multiple antenna systems from two different approaches. In Section 2.3 the channel h is ass umed to be unknown but deterministic and joint ML estimation of h and the dela y is performed. In contrast, in Section 2.4, we assume that the channel is random but with known statistics and use the likelihood function averaged over all channel realizations to construct the ML estimator for the delay,. These two approaches lead to two different optimal training signal designs. For the deterministic channel approach we see that orthogonal training sequences minimize the outage probability as well as the average CRB .For the random channel approach, perfectly correlated training sequences minimizes the CRB. Here we compare these two approaches in terms of the resulting ML estimators, CRBs, and suitability of the outage and average CRB performance measures.

PAGE 48

' ~ ....... .. ......... ........ .. ~ ~:--,"., ... ~ : : :...:'=-k_'. ,0 vi:., --'vn ,=1, nt1 --e-n ,=1, nt4 -+-n,=1, ni=B 9 n,=2 ni=1 e n ,=2, n t 4 + n ,=2, ni=B -"""-, ~~~~-... '~ .. ----~'<~-..... "i.\-. -< '1\$l ._ ...... 10 - .__ _____ _.__ _____ _,__ _____ _._ _____ ___.__ __ ___, .... _, 0 5 10 1 5 20 SNR ( dB) 38 Figure 2 5 : Comparison of CRBs obtained usin g ortho g onal training sequences and perfec tl y correlated training sequences for differ e nt numbers of transmit ant e nnas Note th a t the C RB of the system with the perfectly correlated training sequences is the same for any nwnber of tran smit ant e nna s

PAGE 49

10 ..-,-:--:-,-----;-,..,--,,,...,..,..,...,,,...,.,....,.,777-,-,--:--,-:-7"1"'.:-:--:-.,....,-:-"7"7"7-,-,-,-.,..,..,--r,-:----,-,-,,-:--:----:--:.,....,-:--:r----;-,-,---,----,-,-:--:---,--,=====;;=:=:==-, CRB n,=1 + CRB n,=2 e CRB, n,=4 "fl CRB n ,=8 _._ MSE n,=1 -+MSE n,=2 --eMSE n,=4 --VMSE n,=8 10-6~----~-------'--------'-----~----~------' 5 10 15 20 SNR(dB) 25 30 35 39 Figure 2.6: Comparison of the MSE of the ML estimator obtained from simu l ation and the CRB The number of transmit antennas nt is 2. The unit in the vertical axis is r t

PAGE 50

40 2.5.1 Orthogonal Trairung Signals When orthogonal training signals are employed, both the ML estimators of the delay T under the detenninistic and random channel approaches, respectively, reduce to (2. 70) Thus the equal gain combiner for the received signals from the receive antennas is the ML estimator for both approaches. Under the deterministic channel approach, the average CRB has the value 't/Jt (52 Eh[CRB(h)] = ( ) p"2 2 n -1... ' P r nt 'Pa (2.71) Under the random channel approach, the CRB has the value (2.72) As discussed before the CRB in (2.72) is asymptotically achievable by the ML estimator when the number of receive antennas goes to infinity ln addition, the limiting ratio between (2.71) and (2.72), when n r approaches infinity is 1 which is smaller than 1. This implies that l+ntPp'I the average CRB in (2.71) is not achievable by the ML estimator asymptotically when n,. approaches infinity. On the other hand for small values of nr, the value in (2.71) can be larger than the value of (2.72) when the SNR is large enough. More precisely this happens when !:J-> nt(n r n t -1). Thus in this case the average CRB in (2.71) actually gives a tighter bound on the performance of the ML estimator. The simulation results in Fig. 2.4 are in agreement with this observation 1n this sense the average CRB may not be as good a perfonnance measure a s the outage probability in the detenninistic channel approach since the latter is asymptotically achievable starting at very small values of n,., by the ML estimator. However for small values o f n1 and at high SNR, the average CRB may still be a reasonable performance metric. 2.5 2 Perfectl y Corr e lated Trairung Si g nals Under the random channel approach employing perfectly correlated training signals w e have AH A = / qqT where q is an arbitrary nt x 1 vector with q T q = nt. For instance 'l'bnt

PAGE 51

41 q = [ 1 1 . lf when we use the same training sequence and evenly assign the power to each transmit antenna. By using the matrix inversion formula the ML delay estimator for t his choice of perfectly correlated sequences is reduced to be exactly the same as the one for orthogonal training sequences given in (2. 70) We note that the knowledge of the SNR ff is not needed (j to implement the above ML estimator. Comparing the results in Figs. 2.4 and 2 6, the MSE obtained by the ML estimator with the perfectly correlated training sequences is smaller than that obtained by the ML estimator with orthogonal training sequences for all cases considered in the simulation studies. This observation is in agreement with the training sequence optimization result based on the CRB that the perfectly correlated sequences are superior than the orthogonal sequences under the random channel approach In general the SNR information is needed to implement the ML estimator. We also note that perfectly correlated training signals are not applicable in the deterministic channel approach since they cannot be used to estimate the channel vec tor h. 2.5.3 Deterministic vs Random Channel Approaches The results and discussions in the previous sections provide some guidelines of whether to use the deterministic or random channel approaches in estimating the timing parameter. If the design consideration is the outage probability i.e. neglecting a small percentage of the worst-case channel realizations one would employ the deterministic channel approach with orthogonal training signals. On the other hand if the average estimation ( over all channel realizations) error is the main design criterion one would employ the random channel approach with perfectly correlated training signals We note that the perfectly correlated training signals cannot be used for channel estimation. Thus they may be more suitable for space-time coding schemes that do not require the channel information. In addition, the advantage of the perfectly correlated training signals over orthogonal signals vanishes at high SNR in the random channel approach. Thus when the number of transmit antennas i s not very large and at high SNR, one could employ orthogonal training signals for e ither of the two approaches.

PAGE 52

CHAPTER3 CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS WITH COLORED INTERFERENCE 3 .1 Introduction Many multiple antenna communication systems are designed for coherent detection that requires channel state information (CSI) in the demodulation process For practical wireless communication systems it is common that the channel parameters are estimated by sending known training symbols to the receiver. The performance of this kind of training-based chan nel estimation scheme depends on the design of training signals which has been extensively investigated in the literature. It is well known that imperfect knowledge of the channel has a detrimental effect on the achievable rate it can sustain [52]. Training sequences can be designed based on infonnation theoretic metrics such as the ergodic capacity and outage capacity of a MIMO channel [ 42 53, 54]. The mean square error (MSE) is another commonly used performance measure for channel estimation. Many works [55-65] have be carried out to investigate the training sequence design problem based on MSE for MIMO fading channels. In Wong et al. [61] the authors studied the problem of training sequence design for multiple-antenna systems over flat fading MIMO channels in the presence of colored interference. The MIMO channels are assumed to be spatially white, i.e. there is no correlation among the transmit and receiver antennas. The optimal training sequences were designed to minimize the channel estimation MSE under a total transmit power constraint. The optimal training sequence design result implied that we should intentionally assign transmit power to the subspace with less interference A practical algorithm of estimating the long-term second-order statistics of the interference correlation matrix and an efficient sc heme of feedin g back necessary information to the transmjtter for constructing the optimal training sequences were also proposed. In Kotecha et al. [62] the problem of transmit signal design was inve st igated for the estimation of spatial correlated MlMO Rayl e i g h flat fading channels. The optimal training signal was desi g ned to optimi z e two criteria: the 42

PAGE 53

43 minimization of the channel estimation MSE and the maximization of the conditional mutual information (CMI) between the channe l and the received signal. The authors adopted the virtua l channel representation model [66] for MIMO correlated channels It was shown that the optimal training signal should be transmitted along the strong transmit eigen-directions in which more scatters are present. The powers transmitted along these eigen directions are determined by the water-filling solutions based on the minimum MSE and maximum CMI criteria. In Cai et al. [65], the space-time spreading scheme, block coding scheme and channel estimation for correlated fading channe l s in the presence of interference have been studied The authors focused on the single receive antenna case and extended their results to the multiple receive antennas case where receive antennas were assumed to be uncorrelated Based on the previous optimization results for the special case [63] where there was no interference the space-time beamforming (STBF) matrix was chosen as the training symbol matrix for the linear MMSE channel estimator. Then the optima l power loading scheme was designed for the training symbol matrices in this particular set. In this chapter we investigate the prob l em of estimating correlated MIMO channels with colored interference. We adopt the correlated MIMO channel model [21, 67] which expresses the channel matrix as a product of the receive correlation matrix, a white matrix with identically and independent distributed (i.i.d.) entries, and the transmit correlation matrix. This model im p l ies that transmit and receiver correlation can be separated. This fact has been verified by field measurements. The colored interference mode l used here is more suitable than the white noise model when jamming signals and/or co-channel interference are present in the wireless com munication system. We consider an interference limited wireless communication system, and assume that the thennal noise is small relative to interference and can be ignored. Then we show that the covariance matrix of the interference has a Kronecker product form which implies that the temporal and spatial correlation of the interference are separable The channel estimation MSE is used as a performance metric for the design of training sequences The optimization problem encountered here which minimizes the channel estimation MSE under a power con straint is a generalization of two previous optimi z ation problems which are encountered widely

PAGE 54

44 in the signal processing area (61, 63, 64, 68]. We first analyze the optimal structure of the solu tion by using the Lagrangian method and then find the optimal power allocation scheme which has the water-filling interpretation. Finally we determine the optimal ordering for the related eigenvector matrices. In Cai et al. (65], the authors encountered the essentially same optimiza tion problem but with the different form. Based on the the previous optimization results for the special case (63] the authors chose to optimize the training sequence matrix in a particular set of matrices which have the same solution structure and eigenvector ordering as our solution. Here we rigorously prove that this particular solution structure and eigenvector ordering result are optimal for arbitrary matrices with the power constraint. The design of the optimal training sequences has a clear physical interpretation which implies that we should assign more power t o the transmission direction constructed by the eigen-direction with larger channel gains and the interference subspace with less interference. In order to implement the channel estimator and construct the optimal training sequences we propose an algorithm to estimate long-term chan nel statistics and design an efficient feedback scheme so that we can approximately construct the optimal sequences at the transmitter. Numerical results show that with the optimal training sequences, the channel estimation MSE can be reduced substantially when compared with the use of other training sequences. The chapter is organized in the following manner The system model and linear MMSE channel estimator that we consider are introduced in Section 3.2. In Section 3 3 The optimal training sequence is designed based on minimi z ing the total channel estimation MSE. In Section 3.4, an algorithm for the estimation of long-term characteristics of the channel is proposed and an efficient feedback scheme is designed. Numerical results are provided in Section 3.5. Conclusion is drawn in Section 3.6. 3.2 System Model We consider a single user link with multiple interferers. The desir e d user has nt transmit antennas and n r receive antennas. We assume that there are M interfering signals and the i th interferer has n i transmit antennas. The MIMO channel is assumed to be quasi-static (block fading) in that it varies slowly enough to be considered invariant over a block. However the channel changes to independent values from block to block. We assume that the users employ

PAGE 55

45 a frame-based transmis s ion protocol which comprises training and pay lo a d data. The receiv ed baseband s ignals at the receive antennas during the training period are given in matrix form by M M Y =HST+ LHi S i = HST +E =HST+ LEi. (3.1) i=l Then,. x nt matrix H and the nr x n i matrix H i are the channel gain matrices from the transmitter and the i th interferer to the receiver, respectively. S is the x nt training symbol matrix known to the receiver for estimating the channel gain matrix H of the desired user during the training period. N is the number of training symbo l s from each transmit antenna and N is usuall y much larger than nt S i is the N x ni interference symbol matrix from the i th interferer We assume that the elements in S i are identically distributed zero-mean complex random variables correlated across both space and time. The interference processes are assumed to be wide-sense stationary in time We consider an interference limited wireless c01mnunication system. Hence we assume that the thennal noise is small re l ative to interference and can be ignored [69]. We adopt the correlated MIMO channe l model [21, 67] which models the channel gain matrix Has: (3.2) where R t models the corre l ation between the transmit antennas and R r models the correlation between the receive antennas respectively The notation (-)112 stands for the Hermitian square root of a matrix. H w is a matrix whose e l eme nts are indep e ndent and identical distributed ze ro mean circular-symmetric complex Gaussian random variables with unit variance. Let h w = vec( H w), where vec( X ) i s the vec tor obtained by stacking the columns of X on top of each other then we h ave (3.3) with h ,....., CN(O Rt Rr ) where CN denotes complex Gaussian distribution. Similarly we model the channel gain matrix from the i th inter ferer to the receiver as: H R 1 ; 2 H R 1/2 1 -r WI ti (3.4)

PAGE 56

and Using the v e c operator we can write the received signal in vector form as y v ec ( Y ) M ( S I n r )vec(H) + I)si Inr)v ec( Hi) i=l M ( S Inr) h + 2)Si I n r )( R ; /2 R ;1 2) hwi i=l M ( S l11J h + L e ; where Inr denotes the n, x nr identity matrix. To derive the linear MMS E channel estimator we need the following lemma. Lemma 3 2.1. E ( e ) = 0 and th e c ovarian ce matrix of e i s M E (eeH) = L Q N i R r = Q N R i=l where 46 (3.5) (3.6) (3.7 ) (3. 8 ) and R1,k ( T) r eprese nts th e tim e cor relatio n b etween th e s i g nals at t i m e ins tants m an d m + T from the k th ant e nna of th e i th int erfe r er. Proof Since hwi ,..__, CN( 0 Inrnt ), E( e ; ) = 0. Then we have E ( e) = 0. The received signal from the i th interferer can be writt e n as E i = H i S i = R ;1 2H w ; R ;(2s i = R ;1 2HwiS i. ....___., S ; Since S i is wide-s e nse stationary in time S i is also wid e -sense s tationary in t ime. (3.9)

PAGE 57

47 Using the vec operator we can rewrite the interfering signal from the i th interferer as The covariance matrix of e i is given as E[(I N R ;1 2)vec( HwiS i )vec( HwiS i) H ( I N R ; l2)H] ( I N R ;1 2)E[vec( HwiS i)vec( Hwisit](I N @ R ;12). Let e : = vec( H w i S i), we can show that the covariance matrix of e : is I::Z ~1 Rtk(N -l ) I r QNi I n r, (3.10) (3.11) (3.12) where R l k ( T ) represents the time correlation between the signals at time instants m and m + T from the k th antenna of the i th interferer. Then we have The covariance matrix of e is then given as M E [eeH] = L Q N i R,. = Q R,. i=l (3.13) (3.14) D We note that Q N captures the temporal correlation of the interfer e nce and R,. represents the spatial correlation The covariance matrix of th e interference h as a Kronecker product form which implies that the temporal and spatial correlation o f the inter fere nce are se parable.

PAGE 58

48 We notice that (3. 6) represents a linear model. Based on the Bayesian Gauss-Markov Theorem [36], the linear minimum mean square error estimator (LMMSE) for h is given as: h [(SH InJ(Q N Rrt1( S InJ + ( R t Rrt1J-1(SH I nr)( Q N R r t1Y [(SHQ, / S + R ;ltl Rr](S H I nr)( QNl R;l)y (3.15) Using the equality vec( A YB) = ( B T A )vec( Y), we can rewrite the channel estimator in the compact matrix fom1 as iI = YQN1s (s11 Q N 1 s + R ;1 )-1 (3. 16) Hence the channel estimator does not depend on the receive channel correlation matrix Rr. The performance of the channel estimator is measured by the estimation error E = h h whose mean is zero and whose covariance matrix is C f E [ ( h h )(h h )"] [(SH Inr)( Q N Rr)-1( S Inr) +(R t Rr) -ljl [(SHQNlS ) R;l + R;l R;li-1 [(S" QNIS + R;l) R;li-1 ( SHQNls + R;ltl R r (3.1 7 ) where the third equality is due to ( A B)(C D ) = ACBD and ( A B ) -1 = A -1 B -1 The diagonal elements of the error covariance matrix C f y ields the minimum Ba y e s ian MSE The total MSE is the commonly used performance measure for MIMO channel estimation. By usin g the fact that tr ( A B ) = trAtrB, we have Thus the minimi za tion of the total MS E over trainin g sequ e nces do es not depend on the recei v e channel correlation matrix. Only the temporal interference correlation matrix Q and the trans mit correlation matrix R t need to be con s idered in obtainin g the optimal trainin g sequences.

PAGE 59

49 3.3 Optimal Training Sequence Design In this section we investigate the problem of optimal training sequence design for channel estimation. With the total Bayesian MSE as the perfonnance measure, the optimization of training sequences can be formulated as follows mm s subject to tr(SHQ;:/s + R 11t1 tr{SHS} P where tr{SH S} P specifies the power constraint. (3 .18) The optimization problem itself is of great interest to researchers in the signal processing and communication areas Its special cases (with either Q N or R t equal to the identity matrix) have been encountered widely in joint linear transmitter-receiver design [63, 68, 70], training sequence design for channel estimation in multiple antenna communication systems [61, 64], and spreading sequence optimization for code division multiple access (CDMA) communication systems [71]. The solution in the special case Rt = I, found for example in Wong et al. [61] and Scaglione et al. [68], can be expressed in terms of the eigenvalues and eigenvectors of Q N and a Lagrange multiplier associated with the power constraint. Similarly the solution in the special case Q N = I found for example in Zhou et al. [63] and Biguesh et al. [64] can be expressed in terms of the eigenvalues and eigenvectors of R t and a Lagrange multiplier associated with the power constraint. The optimization of the generalized mean square error problem introduced here is more difficult. We will show that (3.18) has a solution that can be expressed S = U'EVH where U and V are orthonormal matrices of eigenvectors for Q N and R t respectively, and 'E is diagonal. Solving (3 18) involves computing diagonali z ations of Q N and Rt, and finding an ordering for the columns of U and V. In Cai et al. [65] the authors encountered the essentially same optimization problem but with the different fonn. Based on the the previous optimization results for the special case [ 63], the authors chose to optimize the training sequence matrix in a particular set of matrices which have the same solution structure and eigenvector ordering as our solution. Here we ri g orously prove that this particular solution structure and eigenvector ordering result are optimal for arbitrary matrices with the power constraint.

PAGE 60

50 A related optimization problem which minimizes the trace of the mean square error matrix in a variant form is discussed in Section 3 7.1. and another optimization problem which max imizes the determinant of the inverse of the mean square error matrix is introduced in Section 3.7.2. We solve the optimization problem (3 .18) in three steps. First, we analyze the optimal structure of the solution by using the Lagrangian method then find the optimal power allocation scheme and finally determine the optimal ordering for the related eigenvector matrices 3.3 1 Solution Structure We begin b y analyzing the structure of an optimal so lution to (3.18). Let UAUH and V .6. V H be diagonalizations of Q N and R t where the columns of U and V are orthonormal eigenvectors. Let A j, 1 ::; j ::; N, and c5i 1 ::; i ::; nt, denote the diagonal elements of A and .6., respectively. We assume that the eigenvalues { >-.;} are arranged in increasing order and { c5;} are arranged in decreasing order: (3 .19) Let us define (3.20) Substituting S = UTV H in (3 .18) gives the following equiva lent optimiz ation problem: mm (3.21) subject to tr (THT) ::; P T E (CNxnt. We now show that the solution to (3.21) has at most one nonzero in each row and column. Theorem 3.3.1. There exists a solution of (3.2 1 ) of th e form T = I11l:I12 where I11 and Il2 are p er mutation matrices and aij = 0 for a ll i #j Proof We first prove the theorem under the following nond egeneracy assumption: c5; =/-c5j > 0 and >-.i =/\ > 0 for all i =/j. (3 .22 )

PAGE 61

51 Since the cost function of (3 .21) is a continuous function of a and A and since any ,\ > 0 and 8 > 0 can be approximated arbitrarily closely by vectors 8 and ,\ satisfying the nondegeneracy conditions (3. 22), we conclude that the theorem holds for arbitrary ,\ > 0 and 8 > 0 There exists an optimal solution of (3 21) since the feasible set is compact and the cost function is a continuous function of T. Since the eigenvalues of a TH A -1T a are nonneg ative it follows that for any choice of T with equality when T = 0 Hence there exists a nonzero optimal solution of (3 .21), which is denoted T According to the Lagrange multiplier theorem the first-order necessary condition for an optimal solution is the following: there exists a scalar 'Y 2: 0 such that: d tr ( ( T H A -1T + a -1)-1 + 'VT HT) -= 0 d T I T ~ For notation simplicity, let For any invertible matrix M the derivative of the inverse of a matrix [72] is given as: Hence (3.23) is equivalent to : for all matrices 8 T E c N x n,_ (3. 23) ( 3.24) Let Real ( z ) denote the real part of z E C. Based on the fact that tr ( A + AH) 2( Real [tr ( A)]) and tr (AB) = tr (BA) we have By talcing 8 T either pure real or pure im a ginary we deduce that

PAGE 62

52 for all b"T. By choosing 8 T to be completely zero except for a single nonzero entry we conclude that -yTH M -2T11 A l = 0 (3.25) If -y = 0, then T = 0 since both A and A are invertible. Hence, -y > 0. We multiply (3.25) on the right by T to obtain (3. 26) Since THT is Hermitian we have Then we will show that TH A -1T and A -1 commute with each other. We need the following lemma (73]: Lemma 3.3.1. If A and Bare diagonalizable, they share the same eigenvecto r matr ix if and onl y if AB = BA. For the simplicity of notations let A = TH A -1T and B = A -1 Then we have ( A + B t2 A = A ( A + B t2 According to Lemma 3.3 1 A and ( A + B)-2 share the same eigenvector matrix. Since A + B and ( A + B)-2 ha ve the same eigenvector matrix A and A + B share the sam e eigenvector matrix. Then we have A ( A + B ) = (A+ B ) A Hence AB= BA, which implies that TH A -1T and A -1 commute with each other Since A -1 is diagonal T H A -1T i s dia g onal. Since T H A -1T is diagonal THT is dia gona l by (3. 26) Since TH A -1T and A -1 are diagonal, both Mand M -1 are diagonal. Hence the factor M -2 in (3.25) is dia g onal with real diagonal e l e m e nts d e noted ej, 1 :S j :S nt. By (3.25) we

PAGE 63

53 have (3. 27) If tij =I= 0, then (3 .27) implies that By the nondegeneracy condition (3.22), no two diagonal e lemen ts of A are equal. If for any fixed j, tij =I= 0 for i = i1 and i2 then the identity = yields a contradiction since =I= 0 and >.i1 =I= >.i2 Hence, each column of T has at most one nonzero. Since 'i'H'i' is diagonal two different columns cannot have their sing le nonzero in the same row. This implies that each column and each row of T have at most one nonzero A suitable permutation of the rows and columns of T gives a diagonal matrix E, which completes the proof. D Combining the relationship (3.20) between T and Sand Theorem 3.3. l we conclude that problem ( 3 .18) has a solution of the form S = UI11EI12 VH, where I11 and I12 are permutation matrices We will show that we can eliminate one of these two permutation matrices. Substituting S = UI11EI12 yH in (3.18 ), the equivalent optimization problem is obtained as: (3.28) 1\1 subject to tr L CY; :S P i=l where M represents the minimum of N and nt, In the above optimization problem, the mini mization is over diagonal matrices E with CY1 ... CYM as the diagona l elements, and two per mutation matrices I11 and I12 Since the symmetric permutations II{' A -1 I11 and I12d -1 II{ essentially interchange di agona l elements of A and d (3.28) is equivalent to M 1 min :z:= 2; N cr,1r1,1r2 i = l (Ji A1r1(i) + 1 1r2(i) M s ubject to L CY. ; :SP 1r1 E PN, 1r2 E Pnt i = l where P N is the set of bijections of {1, 2 ... N} onto its e lf. (3.29)

PAGE 64

54 We will now show that the optimal solution only depends on the smallest eigenvalues of Q N and the largest eigenvalues of Rt. Lemma 3.3.2. Let U AU H and V Do V H be diagonalizations of Q and D respectively where the columns ofU and V are orthonormal e ig envectors. Let a, 1r1 and 1r2 denote an optimal solution of (3.29) and define the sets M = { i : a i > O}, Q = P.1ri( i): i EM}, and R = {61r2(i) : i EM}, ff M has l elements, th e n the elements of the set Q constitute the l smallest eigenvalues ofQN, and the elements ofR constitute the l largest eigenvalues Rt, respectively. Proof Assume k (/. M and >-1ri(k) < >-1ri(i) for some i E M. It is easy to see that by inter changing the values of 1r1 ( i) and 1r1 ( k ) the new i -th term in the cost function is smaller than the previous i -th term. It contradicts the optimal assumption of a and 1r. Then >-1ri(k) 2:'. >-1ri(i) Then, suppose that k (/. M and 81r2 ( k ) > 81r2(i) for some i E M. Let C denote the cost value due to the sum of the i -th term and the k-th term before the interchange. Similarly let c+ denote the cost value due to the sum of the i -th term and the k -th term after the interchange of the values of 1r2 ( i) and 1r2 ( k). We have and c+ -C < 0 (81r2(k) 81r2(i))(ai 4 81r2(k)81r2(i)/ >-;1(i) + a/81r2(k ) / >-1r1(i) + a/81r2(i)/ A1r1(i)) (a/61r2 ( ki/ >-1ri (i) + l ) (a/81r2(ii/ A1r1(i) + 1 ) The cost is reduced by interchanging the va lues of 1r2 ( i) and 1r2 ( k), which violates the optimality of a and 1r. Hence 61r2 (k) 81r2(i) D

PAGE 65

55 Using Lemma 3 3 2 we now show that one of the permutations in (3.29) can be deleted if the eigenvalues of Q N and R t are arranged in a particular order. Theorem 3.3.2. Let U AU H and V V H be diagonalizations of Q N and R t respectively where the columns ofU and V are orthonormal e ig envectors, the eigenvalues of Q N are arranged in increasing order and the eigenval u es of Rt are arranged in decreasing order. ff M is the minimum of the rankofQ N and R t then ( 3 29 ) is equival e nt to min U,7r M l (Jl / >-1r(i) + 1/0 (3.30) M subject to L CJ; P 1r E PM, i = l where CJi = 0 for i > M Proof Since at most M eigenva lu es of either Q N or R t are nonzero, it follows from Lemma 3.3.2 that the set M has at most M elements Since the elements of Q are the smallest eigen values of Q and the e l ements of R are the largest eigenva lu es of R t we can assume that 1r1 ( i ) E [ l NI] and 1r2(i ) E [ l AI] for each i E M. Hence, we restrict the sum in (3.29) to those indices i E S where Let us define CJ; = CJ1r;1(j) and 1r(j) = 1r1(1r21(j)). Since 1r(j) E [ l J\l] for j E [ l NI], it follows that 1r E P M In (3.29) we restrict the summation to i E S and we replace i by 1r21(j ) to obtain M where L ( (J; )2 P. i=l This completes the proof of (3 .30) D Combining the relationship (3.20) between T and S Theorem 3.3 1 an d Theorem 3 3 2 yie lds the following corollary: Coro llar y 3.3.1. Probl em (3.18 ) has a solution of the form S = UII:EVH wh e r e the columns of U and V are orthonormal e ig e nv e ctors of Q N and R t r espec ti ve l y with the e ig e nvalu e s o f

PAGE 66

56 Q N arrang e d in increasing orde r and th e e igenval ue s of R t arrang e d in d ecreasing order, IT i s a permutation matrix, and :E is diagonal Proof Let a and 1r be a solution of (3.30). For i > M define 1r(i) = i and ai = 0. Ifll is the permutation matrix corresponding to 1r, then making a substitution S = UIT:EVH in the cost function of (3.18) yields the cost function in (3.30). Since (3. 29) and (3.30 ) are equ i valent by Theorem 3 3.2 Sis optimal in (3. 18). 3.3.2 The Optimal :E D We now consider the optimization problem which minimizes the cost function over a with the permutation 1r in (3 30) given. Then in the next subsection we will find the optimatial permutation 1r based on the solution to the optimization problem considered here. For the sake of notation simplicity let Pi denote 1 / >..1r(i) and qi denote 1/ 6i. Hence for fixed 1r (3.30) is equivalent to the following optimization problem: min (T subject to M l ~Pia;+ M I: a?::; P. i=l (3. 31) The solution of (3 31) can be expressed in terms of a Lagrange multiplier related to the power constraint. The structure of this solution has a water fillin g interpreta t ion in the communication literature [7 4]. Theorem 3.3.3. Th e optimal solution o/(3.31) is give n b y a ; = max { /1 q \ 0}112, VPii, p; w h e r e th e parame t e r is c hosen s o that ( 3 32) ( 3 3 3 ) Proof Sinc e th e minimi z ation of the co s t function in (3 .31 ) is ov e r a closed and bound e d set, there exists a solution. At an optimal solution to (3 31) the power constraint must be an equali ty Otherwise we can multiply a by a scalar larger than l to reduce to the value o f the cost function. For the sake of notation simplici ty, let t i = a;. Then the r e duced optimization probl e m (3. 31)

PAGE 67

57 is equivalent to min t M l P it i + Q i (3 -34) M subject to L t i = P t o i = l Since the cost function is strictly convex and the constraint is convex the optimal solution to (3.34) is unique. According to the Lagrange multiplier theorem the first-order necessary conditions [51] (Karush-Kuhn-Tucker conditions) for an optimal solution of (3.34) are the following: there exists a scalar 0 and a vector v E IR. M such that 1 :Si :S M (3.35) Due to the convexity of the cost and the constraint any solution of these conditions is the unique optimal solution of (3.34). A solution to (3.35) can be obtained as follows. We define the function (3.36) Here x+ = max{x, 0}. This particular value for t i is obtained b y setting v i = 0 in (3.35) and solving for t i ; when the solution is < 0 we set t i() = 0 (this corresponds to the + operator (3.36)) We note that t i() is a decreasing function of which approaches +oo as approaches 0 and which approaches O as grows to +oo. Hence the equation M Lti() = P (3.3 7 ) i=l has a unique positive solution Since t i (pd q;) = 0, we have t i() = 0 for P d q;. Then we have p Pi O fi / 2 ( ( ) ) 2 + = -2 + > or > p q,. P it i + q q We deduce that the Karush-Kuhn-Tucker conditions can be satisfied when is the positi ve so lution of (3.3 7) D

PAGE 68

58 3 3.3 Optimal Eigenvector Ordering Finally we need to find an optimal permutation in ( 3.30), i.e. an optimal ordering for the eigenvalues of Q N and Rt. Theorem 3.3.4. If the eigenvalues {>.i } ofQN are arranged in increasing order and the eigen values { bi} of R t are arranged in decr e asing order, then an optimal permutation in ( 3.30) is 1r( i) = i 1 :S i :S M. (3.38) Proof Assume that there exist indices i and j such that i < j, A i > \ and b i > b1 i.e., P i < p1 and q i < q1 A i and A1 are not arranged in the supposed optimal order for the eigenvalues of Q N We will show that it will cause contradiction. Let us consider the following optimization problem: 1 1 mm ---+---ti,ti P i l i + Q i p1t1 + Q1 (3. 39) subject to t i + t1 = P t i 0 t1 0 where P = a-;+ aJ. Since o yields an optimal solution of (3.30) it follows that a solution of the above optimization problem is t i = o; and t1 = oJ. Based on Theorem 3 3 3 the t i is given as { Q i t i()= --, P P i where is a Lagrange multiplier obtained from the power constraint t i + t1 = P as: 1 + 1 ...ffi viii ./= P+!li+.
PAGE 69

59 positive after the exchange of P i and p1 we have (_1 + _1 )2 + ..ff; ../Pi C = P+!li.+.!li. P i Pi (3.42) We need to use the following lemma [40]: Lemma 3.3.3 If ai, b i i = 1 ... n are two sets of numb ers, n n n i=l i=l i=l From the above lemma we have !li. + .!Ii > .!Ii+ !!i.._ Then c+ < C. Pi Pi Pi Pi If c+ < C, it contradicts the optimality of a. Then we have c+ = C Hence for each i and j with i < j, Pi < p 1 and q i < q 1 we can interchange the values of Pi and p 1 to obtain a new permutation with the same value for the cost function. After the interchange, we have Pi > p 1 i.e ., A i < A1 In this way the A i are arranged in increasing order. Since the 6 i are arranged in decreasing order, we conclude that the associated optimal permutation 1r is (3.38). One technical point must now be checked: we should verify that if P i < p 1 and qi < q 1 with i < j, and if we exchange p.; and p1 then the corre s ponding optimal solution of (3.39) remains positive To check it we consider two cases respectively. For the first case suppose ak > 0 with i < j < k, Pk < P i < p 1 and qi < q 1 < qk. From (3.40), we have Then After the exchange 1 # P+ ~+.!Ii 1 P i Pi > _ 1 + 1 r;-; ..ff; ../Pi V ,.., ( + ) I+ q i 1 1 qi 1 1 qk ) ti = ---=-(---) > (---> 0 P1 + Pj ,JP; if y'Pj y'Pj ..j ./Pk

PAGE 70

60 Similarly For the second case suppose j = max(N) and P i = min(Q),P i < P j and q i < Q j Since the original so lution befor e th e exchange, is positi ve, it follows from (3.40) and (3.41) that P > _q_i q j and P > ___2j_ qi Jp;pj Pj Jp;pj Pi (3.43) After the exchange, the analogous inequalities that must be satisfied to preserve nonnegativity are ( 3 44) and p > _q_i qj. Jp;lij p; (3.45) (3.45) is satisfied from (3.43) and the fact that p ; < Pj and q i < Q j If (3.44) is also satisfied, the proof is completed. If (3.44) is n o t sa tisfied i.e., P '.S ../;iPj -9};, the associated cos t after the exchange is where t i = P and tj = 0. We will show that C* :S C with P > ../P;Pi ~' P > ..;!~P; ~ and P < _Jj_ !Ji Letting C* < C g i ves ../PiP; Pi ( 1 + 1 )2 1 + _!_ < 7rf; 7rffq. P p + q q p + !Ji + :!1. J t J P i Pi Multiplyin g both s id es of the above inequality with (p1P + qi)qj(P +;:+~)gives After considerable algebra on th e abov e inequali ty, we find th at to s how C* :S C is e qui va l e nt to s how th at f ( P ) :S 0 w ith

PAGE 71

61 when P E ( max[____!li__ -':!i __!!j_ -!!i] __!!j_ -9i). Since ) P i P i Pi' JPi P i P i ) P i P i P i when _!ii..._ -':!i > __!!j_ -IJi ) P i Pi P i -JPiPi P i when __!!j_ -!Ii > _!Ji..._ -':!i and ../PiPj P i J P i P i P i we have C* :'.SC. D Combining Corollary 3.3.1 Theorem 3.3.3 and Theorem 3.3.4 we conclude that the opti mal training sequences should be designed according to the following theorem Theorem 3.3.5. L e t U AU H and VA V H b e th e diagonali z ation s of Q N and R t resp ect i vely w here th e columns ofU and V are orthonormal e ig e nv ect ors th e c orr e sp o ndin g ei genval ue s { >.i } are arranged in in c r e asing ord er, and { c5i} are arra n g e d in d ec reasing order. Then t he optimal solution of ( 3 1 8) i s given by (3.4 7 ) wher e :E sp e cifi es th e pow er allo ca tion whi c h is diagonal wi th dia go nal e l eme nts g i ve n b y (3.48 ) and a i = 0 for i > nt, w i th th e par am e ter is c h ose n so t ha t n t L a l= P. 'i= l With the optimal training sequence the channel estima tor simplifies to H H = YUntrVnt, (3.49 )

PAGE 72

62 where r = diag{'Y1, ... ,nJ with 'Yi= at~:~.x; the columns ofU111 are the eigenvectors of Q N corresponding to its nt smallest eigenvalues, and the columns ofV 111 are the eigenvectors of R t The design of the optimal training seq uence s summarized in the above theorem has a clear physical interpretation. Each eigenvector of the transmit correlation matrix R i represents the transmit eigen-direction and the associated eigenvalue indicates the channel gain in that eigen direction. More power should be assigned to the signals transmitted along the eigen-direction with larger channel gains. On the other hand each eigenvector of the interference temporal correlation matrix Q N represents the interference subspace and the corresponding eigenvalue indicates the amount of interference in that su b space. Hence, we should choose the subspaces with the least amount of interference for transmission To facilitate the understanding of the physical meaning of optimal training sequences, we can rewrite them in an alternative way as n t S = L CTi u ivt i = l where u i are orthonormal e i genvectors of Q N with the corresponding eigenvalues arranged in an increasing order and v i are orthonormal eigenvectors of R i with the corresponding eigenvalues arranged in a decreasing order. The vectors u i and v i form transmission directions in time and space respectively. The above theorem implies that the optimal training sequence design put more power to the transmission direction constructed by the eigen-directions with larger channe l gains and the interference subspaces with l ess interference. The power assignment is determined by the water-filling arg um ent under a finite power constraint. 3.4 Estimation of Channel Statistics and Feedback Design To implement the channel estimator and construct the optima l training sequences for chan nel estimation we need the knowledge of the transmit antenna correlation matrix R t and the interference covariance matrix Q N at both the receiver and transmitter sides. Since these two matrices are long-term channel characteristics they can be estimated by using the observed training signals at the receiver end and then fed back to the transmitter end for the construc tion of the optimal training sequences. In this section, we propose an algorithm to estimate these l ong-term channel characteristics and design an efficient feedback scheme so that we can approximately construct the optimal training sequences at the transmitter end.

PAGE 73

63 Let us assume that the training signa l matrix S is sent over a block of K packets. The received training signals for the n th packet are given as y (n) ( S 0 InJh(n) + e (n) ( S I nr)(R~1 2 R ;12) hw(n) + e (n) (SR;1 2 0 R;l2) h w(n) + e (n). (3.50) We can calculate the samp l e average correlation matrix of the recei ved signal from the previous K packets as follows : l l( A H R = K 6 y(n)y(n) n = l (3.51) It is easy to see that R is the sufficient statistics for the estimation of the second-order correlation matrices R t and Q N if e( n) is Gaussian distributed. We can show that the correlation matrix of the received signal has the Kronecker product form: R E(y (n) y (n)H) = (SRz12 0 R ;1 2)E(hw(n)h w(nt)( R z1 2S H R ;1 2 ) + E( e (n) e (n)H) (SRiSH) R,. + Q N R r (SRiSH + QN) R,. (3.52) where R q = SRiSH + Q N If R = R q Rr, then R = a R q ~R,. for any a=/: 0. Hence, R q and R,. can not be uniquely identified from observing y (n) Fortunately the channel estimator and the design of optimal sequences are invariant to scaling of the estimates of R t and QN. This can be explained as follows: fI' (n) Y (n)(a Q N ) -1s(SH (a Q N ) -1s + ( a Ri)-1 ) 1 Y (n) Q,/S (SH Q ;~,1S + R t1 ) 1 H(n)

PAGE 74

64 and We notice that the new cost function of the optimization problem is just a scaled vers ion of the original cost function For the estimation of R q and R,. we need to impose an additional constraint on R r Here we force tr ( R,. ) = nr Then an iterative flip-flop algorithm [ 75 76 77] can be used to estimate R q and Rr. If the received interference signa l e(n) is Gaussian distributed the flip-flop algo rithm provide s the maximum lik e lihood estimates (MLE) of R q and R,. [75] when it converges. Otherwise the algorithm gives the estimates of R q and R,. in the least square sense. For fixed R,. the MLE of R q is obtained as (3.53) where a~v is the (u v ) th element of R;:-1 and Y u(n) is the u th row v ector of the received signal matrix Y (n) Similarly for fixed Rq, the MLE of R r is obtained as (3.54) where a~v is the ( u v) th e l ement of ft;1 and W u(n) is the u th column ve ctor of the received signa l Y (n) Then to get uniquel y identifiable R q and R r, we need to scale R,. to make tr (Rr) = nr, We note that the terms inside the brace s in (3.53) and (3.54) can be computed before the running of the iterative estimation a l gorithm to reduce computational complexity To start the iterative a lgorithm an initial va lue of either R q or R,. shou ld be assigned. A natu ral choice is to initially make Rr = l11r Then the iterati ve algorithm alternates between the calculation of Rq and ii;. until convergence. While it is difficult to prove ana lytica lly that the algorithm conver ges to the MLE extensive dat a experiments [75] in statistics show that it al ways converges to the MLE for situations of practical samp l e sizes. The convergence in our case is also verified by the num erica l results in Section 3.5.

PAGE 75

65 Then R t and Q N can be estimated based on Rq. We note that only R ( Q N) n R.l(S) can be uniquely identified from R q in the sense below ('R denotes the range space of a matrix and n.1 denotes the perpendicular subspace oftbe range of a matrix): Lemma 3.4.1. Let R t and Q N be Hermiti an positive semi-definite matrices andR q = SRtSH + QN, where Sis of full rank. Let IIJl = { ( Ri, QN) : R(QN) C R.l(S)}. Then there is an 1-1 correspondence between R q and ( R1 , QN) only for the pairs of ( R t QN) in IIJl. Proof Let Ps = S (SHs)-1SH be the projection onto R ( S ) and Pt= 1-Ps be the projection onto R.l(S ). First, let ( R t, QN) ( R~, Q ~) E IIJl. Let R q = SRt SH + Q N and R ~ = SR~ SH + Q ~ C d p .l .l Q Q s H .l I .l I I ons1 er 8Rq = P8 N = N PsRq = R t S and P8R q = P8Q N = QN, PsRq = SR:sH. Since S is of full rank PsRq = PsR~ iff R t = R ; Also since Ps an d P t are projections onto complementary subspaces, R q = R ~ iff P t R q = Pt R ~ and PsRq = PsR~, i.e. (Rt, QN) = ( R ; Q ~) Conversely let ( R , QN) E IIJl and R ei= SRt SH + Q N Now choose R : =I= R t and define Q ~ = Q N + SRt SH sR;sH. Since R ( QN) C R.l( S ) ands is of full rank ( R ; Q~) (/. IIJl. D Based on the above lemma we see that estimating Q N and R t simultaneously from R q is not possible Howerver since PtRqPt = PtQNPt, we can estimate Q N from PtRqPt. For notation simplicity, let A denote Pt RqPt. Since the interference signa ls are wide-sense stationary in time, Q N is a Topelitz matrix which can be represented by a sequence { Qk; k = 0 1 (N 1)} with Q N = { Q k j } = { Qkj}Then the ijtb element of PtQNPt is given by I:1 I:k P i lQl -kPkj with P i J denoting the ijth element of Pt. Eq uating the ijth element of PtQN P t with a i1 we have a set oflinear equations in {qk}. Noticing the hermitian nature of P t Q N P t and A and separating the real an d imaginary parts of qk and aij, we have N2 lin ear equations with 2 1 unknowns in Q r = [ q0 Re (q1 ) Im (q1 ) ... Re(qN-i), Im(qN_i)JT. The set of lin ea r equations can be s olved b y e mploying the l east sq uar e approach. Then the estimate of Q N can be constructed based on Q r

PAGE 76

66 In addition when N is large Q N can be approximated by a circulant matrix [78] with fixed eigenvectors as: (3.55) where F N is the N x N FFT matrix and \JI is a diagonal matrix containing eigenvalues "Pi We notice that we only require the nt smallest eigenvalues of Q N and their corresponding eigen vectors in constructing the optimal training sequences. With the circulant matrix approximation (3.55) it is equivalent to estimating the ni smallest eigenvalues 1/); and identifying the corre sponding columns of F. The nt smallest positive eigenvalues of Q N are used as the estimates of the ni smallest '!/);, and the corresponding columns of F are chosen as those closest to the eigenvectors associated with the ni smallest positive eigenvalues of Q N The estimates of the nt smallest '!/Ji and the ni indices of the chosen columns of F N are then fed back to the transmitter for the optimal training sequence construction. We notice that it is bandwidth efficient to just feed back these indices of F N instead of the whole eigenvectors of Q N because the number of training symbols N during the trainin g period is usually large. To derive the estimator of Ri, we need the following lemma which establishes the asymp totical equivalence of Q N and P t Q N P t as N increases. Lemma 3.4.2. With the assumption that Q N is an absolutel y summable Toeplitz matrix Q N and PtQNPt are as y mptoticall y e quival e nt Sinc e Q N is Toeplit z PtQNPt is asymptotically Toeplitz. Proof Two definitions of the norms of a matrix which include the strong norm and weak norm [78 79] are needed to study the asymptotic equivalence of two matrices The strong norm II A I I is defined as II A II= maxx:xx=i[ x A *Ax] = J Amax(A A ) where Amax represents the lar gest eigenvalues of a matrix If A i s Hermitian 1 1 A II= I Amax(A)I. The weak norm of A is defined as

PAGE 77

67 Two sequences of n x n matrices A n and B n are said to be asymptotically equi v alent [78] if A n and B n are uniformly bounded in strong norm: I I A n II, I I B n 11:S NJ< 00 and A n B11 approaches zero in weak norm as n oo : lim I A, B n l = 0 n->oo If one of the two matrices is Toeplitz then the other is said to be asymptotically Toeplitz Without the loss of generality, we assume that Q N is an absolutely summable Toeplitz matrix (For the temporal interference correlation matrix Q N arising from practical scenarios such as jamming signals and co-channel interference considered here it is easy to v erify that Q N is absolutely summable.) Q N can be represented by a sequence { qk; k = 0 l , .. } with Q N = {qk j } = {qkj } and L t: _00 l q k l < oo It is shown [ 80] that Q N is bounded in strong norm as: +oo I I Q N 1 1:S 2 L lqkl = 2Mq < 00. k = oo Then we need to s how that I I P t Q N P t I I is also bounded. Usin g the properties o f t he strong norm we have I I P~QNP~ I I II (I Ps) QN(I -Ps) I I I I Q N PsQN Q NPs + PsQNPs II < I I Q N I I + I I PsQN II + I I Q NPs I I + I I PsQNPs II To proceed we need the followin g lemma [ 40]: Lemma 3.4.3. For two H e rmitian pos itiv e semid efi nit e matri ce s G and H

PAGE 78

68 Then we ha ve Similarly I I Q NPs 1 1 ~11 Q N I I and I I PsQNPs 11~11 Q N IIThus, I I PtQN P t I I ~ 4 I I Q N II= 8 M q Let M = 8Mq then II Q N I I ~ M < oo and 11 P t Q N P t I I ~ M < oo Next we need to show that the distan ce of the two m atr ices goes to zero asymptoticall y in weak norm. Using the propertie s of weak norm, we have IQN P t Q N Ptl = IPsQN + Q NPs PsQNPsl We need the following Lemma [78 80]: Lemma 3.4.4. Giv e n two n x n matric e s G and H th e n IGHI ~ I I G II I HI. The weak norm of P s can b e written as Then u sing the a bo ve l emma we have Similarly I P s Q N P s l ~ I I P s Q N II en ~II Q N I I (;J ) (W )Mq and IPsQNI = I Q NPsl (;J ) 2MI/. Then we can show that D

PAGE 79

69 Based on the above lemma the transmit channel correlation matrix R t can be estimated by projecting the received signal onto R ( S). Since N is usually much larger than nt, we have (3.56) and hence (3.57) Then we can estimate the transmit channel correlation matrix R u si ng (3.58) 3.5 Numerical Results In this section we present some numerical results to show the performance gain for channel estimation achieved by the designed optimal training sequences We consider a MIMO system with 3 transmit antennas and 3 receive antennas. The antennas form uniform linear arrays at both the transmitter and the receiver. For a small angle spread the correlation coefficient between the i th and the j th transmit antenna [67] can be approximated as : 1 f27r d d [ R tkJ 271" J o exp{ j 27fli jl sin 6 ; s in 0}d0 = J0(27fli jl s in 6 ; ) (3.59) where 10 ( x) is the zeroth order Bessel function of the first kind 6 is the angle spread, dt is the antenna spacing and >. is the wavelength of a narrow-band signal. We set dt = 0.5>.. In the simulations, we consider two channels with different transmit channel correlations: a high spatial correlation channel with 6. = 5 and a low spatial correlation channel with 6. = 25. The receive correlation matrix R,. is calculated similarly as the transmit correlation matrix with 6. = 25 We consider two kinds of interference : the co-channel interference from other users in the same wireless system and jamming signals which are usually modeled by autoregressive (AR) random proces ses. We compare the channel estimation performance in terms of the total MSE for systems using different sets of training sequences. The following different tr a inin g sequence sets are considered for comparison : 1) the optim a l training sequences de scr ibed in Section 3.3., 2) the

PAGE 80

70 approximate optimal training sequence constructed based on the channel and interference statis tics obtained by using the proposed estimation algorithm in Section 3.4 3) the temporally op timal training sequences for which the transmit channel correlation matrix is assumed to be an identity matrix and only temporal interference correlation is considered in designing the optimal training sequences. (we also consider the approximate temporally optimal seq uences which are constructed based on the channel statistics obtained by using the proposed algorithm), 4) the spatially optimal training sequences for which the interference is assumed to be temporally white and only transmit correlation is considered in designing the optimal training sequences. (we also consider the approximate spatially optimal sequences which are constructed based on the channel statistics obtained by using the proposed algorithm), 5) Binary orthogonal se quences 6) Random sequences. 3. 5 .1 Co-channel Interference In a cellular wireless communication system, co-channel interference (CCI) from other cells exists due to frequency reuse. Hence the interfering signals have the same signal format as that of the desired user We can express the interfering signal transmitted from the i th transmit antenna of the mth interferer as (3.60) where P m is the transmit power of the m th interferer and { b ~ 7 ) } are data symbols tra nsmitted from the i th transmit antenna of the mth interferer. They are assumed to be i.i.d. binary random variables with zero mean and unit variance. In addition '1/J(t) is the symbol waveform and Tis the symbol duration. It is assumed that the receiver is synchronized to the desired user but not necessarily to the interfering signa l s and Tm is the symbol timing difference b etween the m th interferer and the desired user signal. Without l oss of generality, we assume 0 :S Tm < T. The elements of the interference symbol matrix S i are samples at the matched filter output at the receiver at time index jT. The ( j i )t h element of S i is (3. 61)

PAGE 81

71 with ~(t) = 1 : 'lj)(t s )'lj)H (s) d s (3.62) where ~(t) is the autocorrelation of the symbol waveform. For the co-channel interference, the temporal interference correlation is due to the intersymbol interference in the sampled interfer ing signals. In the simulations, it is assumed that there are two interfering signals in the system and the SIR (signal-to-interference ratio) is set to be OdB. The ISi-free symbol waveform with raised cosine spectrum is chosen as the symbol waveform. For this case we have c o s ( 1r f]t / T ) 'lj)(t) = smc(1rt/ T) /32 2 / T2 1 4 t We set the roll-off factor /3 = 0.5, T 1 = 0 .2T and T 2 = 0.5T. In Fig. 3.1 and Fig. 3.2, we show the total channel estimation MSEs for the high spatial correlation channel and low spatial correlation channel respectively For both cases, the opti mal sequences outperform the orthogonal sequences and random sequences significantly. For the high spatial correlation channel, the optimal sequences provide a substantial performance gain over both the spatially optimal sequences and the temporally optimal sequences. The ap proximate optimal sequences achieve most of the perfonnance gain obtained by the optimal sequences. For the low spatial correlation channel the MSE perfonnance of the approxima t e optimal sequences is close to that of the optimal sequences The temporal correlation has a stronger impact on the channel estimation than the spatial channel correlation due to the fact that the length of training sequences N is much larger than the number of transmit antennas t It is verified by the simulation results shown in Fig. 3.2 that the temporally optimal sequences achieve an estimation performance similar to that achieved by the optimal sequences. These two optimal sequences provide significant performance gain over the spatially optimal sequences. 3.5.2 Jamming Signals We assume that there are two jamming signals in the system The jamming signals are modeled as two first order AR processes driven by temporally white Guassian processes { ui, t } as si, t = aisi, t 1 + ui,t (3. 6 3 )

PAGE 82

w Cl) 10 -1 ~--~--~---~--~--~-;:.:::.:::.:::.::r.:::.:::.:::.:::.:::.:::.r.:::.:::.:::.:::.:::.:::r.:::.:::.:::.:::.:::~~.:::.:::.:::.:::.:::-----.L., \ .... '* 15 20 25 30 35 40 N ---*1) Optimal sequences 2 ) Approximate optimal sequences -v--3) Temporally optimal sequences --e4) Spatially optimal sequences -e5) Orthogonal sequences -+6) Random sequences --<:7-Approximate temporally optimal sequenes -0Approximate spatially optimal sequences 45 50 55 60 65 72 Figure 3.1: Comparison of total MSEs obtained using different training sequences. ISi-free symbol waveform and high spatial correlation channel.

PAGE 83

UJ Cl) --+1) Optimal sequences --+2) Approximate optimal sequences --'7-3) Te mpora lly optimal sequences --e-4) Spatially optimal sequences . -e-5) Orthogona l squences -t6) Random sequences --'vApproximate tempora ll y optimal sequences -0Approximate spatial! optimal sequences 10-2~-~-----'----'-----'---_._ __ _,__ __ _._ __ ....__ __ _..__ __, 15 20 25 30 35 40 N 45 50 55 60 65 73 Figure 3.2: Comparison of total MSEs obtained using different training sequences. ISi-free symbol waveform and low spatial correlation channel. where si,t represents the jamming signal transmitted by the i th jammer at the t th time index ai is the temporal corre lati on coefficient and u i t has zero mean with variance O"~,i which decides the transmit power of the i th jammer. The SIR is set to be O dB. We choose a1 = 0 .4 and 0'.2 = 0.5. In Fig. 3.3 and Fig 3.4 we show the total channel estimation MSEs for the high spatial correlation channel and low spatial correlation channel respecti v ely. For AR jarnmers simi lar conclusions on the estimation performance achieved by different training sequences can be made as in th e case of co-channel interference 3 6 Conclusion In this chapter we consider a wireless communication system with multiple transmit and receive antennas in a slow Rayleigh flat-fading environment. We study the problem of the estimation of correlated MIMO channe l s with colored interference The Bayesian channel es timator is d erive d and the optim a l tr a inin g s equ e nces a r e d e si gne d ba se d on th e m e an squ are error (MSE) of channel estimation. We propose an algorithm to estimate long-term chann e l

PAGE 84

UJ Cl) ::E 15 .... .. 20 .... '* .... 25 30 "'lit-.:. : ..... 35 40 N 74 --+1) Optimal squences "* -2) Approximate optimal sequences ----'
PAGE 85

w (/) :::i; ~--~------~--------< -+1) Optimal sequences 2) Approximate optimal sequences --'173 ) Temporally optimal sequences --&4) Spatially opti m al sequen ces -e5 ) Orthogonal sequen ces -+-6) Random sequences --vApproximate temporally opt i mal sequences -0Approximate spatially optimal sequences 10 -2~-~ --~---~--~--~---~-~---~--~--~ 15 20 25 30 35 40 N 45 50 55 60 65 75 Figure 3.4: Comparison of total MSEs obtained using different training sequences. ARjammers and low s patial correlation channel.

PAGE 86

76 statistics and design an efficient feedback scheme so that we can approximately construct the optimal sequences at the transmitter. Numerica l resu l ts show that the optimal training sequences provide substantial performance gain for channe l estimation when compared with other training sequences. 3. 7 Appendix 3.7.l A Trace Problem In this appendix, we analyze a variant of the optimization problem (3 .18) which can be fonnulated as mm s 1 l tr ( R f SHQ;,1SRt + lt)-1 (3. 64) subject to tr{S1-1S} ::; P Two different trace optimization problems (3 .18) and (3. 64) are re l ated in the form of cost functions. The cost function of the original optimization problem (3.18) can be rewritten as which can be viewed as the weighting of the cost function of the new trace optimization problem (3. 64). For the sake ofnotational simp l icity we consider the following same optimization problem as (3.64) with different but simpler notations. min s subject to tr (S1-1S) ::; P S E cmxn_ (3.65) Here Q is a nonzero Hermitian positive semidefinite matrix Dis a nonzero Hermitian positive definite matrix and the positive scalar P is the power constraint associated with the signal S The main results on the solution to the optimization problem (3.65) are cited here for the completeness of the dissertation and the details can be found in the literature [81]. We write the inverse matrix

PAGE 87

77 for convenience. As discussed before the solution in the special case D = I can be expressed in terms of the eigenvalues and eigenvectors of Q and a Lagrange multiplier associated with the power constraint. For the optimization problem introduced here D # I and minimizing the trace of C is more difficult. We will show that (3.65) has a solution that can be expressed S = U~VH where U and V are orthonormal matrices of eigenvectors for Q and D respectively, and is diagonal. Solving (3.65) involves computing diagonalizations of Q and D, and finding an ordering for the columns of U and V We are able to evaluate the optimal ordering when either Pis large or Pis small. However for intermediate values of P, evaluating the optimal ordering is more difficult. The problem (3.65) has a combinatorial nature unlike the special case D = I. The trace problem (3.65) arises in spreading sequence optimization for code division mul tiple access (CDMA) systems. In cellular communication systems, multiple access schemes allow many users to share simultaneously a finite amount of radio resources CDMA is one of the main access techniques It is adopted in the IS-95 system and will be used in next generation cellular communication systems [82]. In a CDMA system, different users are assigned different spreading seq uences so that the users can share the communication channel. We consider the uplink ( communication from the mobile units to the base station) of a CDMA system where the users within a base station are symbol synchronous. The co-channel interference from the users in the neighboring cells are modeled by additive colored Gaussian noise. The received signal at the base station is I( y = L h i s ixi + e i=l where I< is the number of signals received by the base station, x i is the symbol transmitted from the i th user, s i E CN is the spreading sequence assigned to the i th user h i is the channel gain from the i th user to the base station and e E CN is the additive colored Gaussian noise with zero mean and covariance E. Usually the size of J{ and N are comparable. It is assumed that the symbols xi are independent with zero mean and unit variance. The received signal can be expressed as y = SHx+ e (3.66)

PAGE 88

78 where S, the spreading sequence matrix has jth column sj, and H is a diagonal matrix with i th diagonal element h i Again by the Bayesian Gauss-Markov Theorem [36 83], the MMSE estimator of x is The corresponding covariance matrix of the estimation error is The optimal spreading sequences for all the users which minimizes the co-channel interference to other cells subject to a power constraint, corresponds to (3.65) with Q = E -1 and D = H a diagonal matrix. To solve the trace optimization problem we begin by analyzing the structure of an optimal solution to (3.65). Let UAUH and VD. yH be diagonalizations of Q and D respectively (the columns of U and V are orthonormal eigenvectors). Let i 1 :S i :S n, and A j 1 :S j :S m, denote the diagonal elements of D. and A respectively We assume that the eigenvalues are arranged in decreasing order: (3.67) Let us define (3.68) Making the substitution S = UTVH in (3.65) yields the following equivalent problem: mm (3.69) subject to tr (TH T ) :S P T E cmxn We now show that (3.6 9) has a solution with at most one nonzero in each row and column Theorem 3.7.1. There exists a solu tion o/(3.69) of th e form T = II1I::II2 where II1 and II2 are permutation matri ce s and aij = 0 for all i =/= j. Combining the relationship (3.68) between T and S and Theorem 3.7.1, we conclude that problem (3.65) has a solution of the form S = UII1I::II2 yH, where II1 and II2 are permutation

PAGE 89

79 matrices. We will now show that one of these two permutation matrices can be deleted if the eigenvalues of D and Q are arranged in decreasing order. Let N denote the minimum of m and n. Making the substitution S = UII1I;II2 y H in (3.65), we obtain the equivalent problem: (3.70) N subject to tr La ; P. i=l Here the minimization is over diagonal matrices :E with a1 ... a N on the diagonal and per mutation matrices II1 and II2 The symmetric permutations II~ AII1 and II2AIIf essentially interchange diagonal ele ments of A and A. Hence (3. 70) is equivalent to N 1 mi n L 2 o-,1r1 ,1r2 i=l (51r2(i)a i ) >-1r1(i) + 1 (3. 71) N subject to L a ; P 7r1 E Pm, 1r2 E P n i=l where P m is the s et of bijections of {1, 2 ... m } onto itself. We first show that we can restrict our attention to the largest diagonal elements of D and Q. Lemma 3.7.1. L e t UAUH and VA yH b e dia go nali zations of Q and D r esp ect i ve l y w h ere the c olumns of U and V ar e orthonormal e ig e nv ec tors. L e t a, 1r1 and 1r2 d e not e a n optimal s olution of(3.7 1 ) and d efi n e th e se t s N = { i : ai>O}, Q ={A.,,.1(i):iE N } an d D = { b.,,.2(i):iEN } If N ha s l e l e m en ts then the e l e m e nt s of t h e set V and Q a r e a ll n on z e ro, and t hey c ons titute the l la rges t e ig e n v alu es o f D and Q r es p ec ti ve ly. Usin g L emma 3 .7. l w e now eliminat e on e o f th e p e rmutation s in (3. 7 1 ). Theorem 3. 7 .2. L e t U AU H and VA V H b e di ago nali za tion s of Q and D res p ective l y where the c olumn s ofU and V ar e o r thonormal e i ge nv ectors, and th e e i ge n va lu es of Q and Dare arranged in d ecre a sing ord e r a s in (3.67). If J( is the m ini mum of t h e rank ofD an d Q the n

PAGE 90

80 (3. 71) is equivalent to min K 1 L (5a-)2>. C i=l I I 7r I (3.72) K subject to La ; :SP, 1r E PK i=l where O"i = 0 for i > K. Proof The proof is similar to that for Theorem 3 .3.2 D Corollary 3.7.1. Problem (3.65) has a solution of the form S = urn:vH where the columns of U andV are orthonormal eigenvectors ofQ andD respectively with the associated eigenvalues arranged in decreasing order, IT is a permutation matrix, and :Eis diagonal. Proof The proof is similar to that for Corollary 3.3.1. D Assuming the permutation 1r in (3. 72) is given, let us now consider the problem of optimiz ing over a. To simplify the indexing let P i denote >-1r(i) Hence, for fixed -rr, (3 .72) is equivalent to the following optimization problem: min 17 /( 1 I: ( 6 a )2p i=l I l l (3. 73) K subject to La; :S P. i=l The solution of (3.73) can be expressed in terms of a Lagrange multiplier for the constraint. Theorem 3.7.3. The optimal solution of(3. 73) is given by (3.74) where the parameter is chosen so that K La ; = P (3. 75) i=l Proof The proof is similar to that for Theorem 3.3.3 D

PAGE 91

81 To solve (3.65), we need to find an optimal ordering for the eigenvalues of D and Q In Theorems 3.7.4 and 3.7 5 we determine the optimal ordering when the power Pis either large or small. Theorem 3. 7.4. If the eigenvalues {>.i} and {bi } of Q and D respectively are arranged in decreasing order, then for P sufficiently large an optimal permutation in (3.72) is 1r(' i) = K + 1 i 1 i K 1r(i) = i, i > K (3.76) Theorem 3. 7.5. Suppos e the eigenvalues { >.i} and { bi} of Q and D respectively are arranged in decreasing order, and let L be the minimum of the multiplicities of 1 and >.1 For P sufficiently small, an optimal solution of (3.65) is (3.77) where u i and v i are the orthonormalized e ig envectors of Q and D associated with >.1 and 61 respectively. 3.7.2 A Determinant Problem In this appendix, we analyze the following matrix optimization problem where we maxi mize the determinant, denoted det" of a matrix: max det ( DSH QSD + I ) (3.78) s subject to tr (SH S ) P s E cmx n Since the determinant of the inverse of a matrix is the reciprocal of the determinant of the matrix, it follows that problem (3.78) is equivalent to replacing trace by determinant in (3 65). Hence in the original problem (3 65), we minimize the sum of the eigenvalues of the MSE matrix C, while in the second problem (3 78), we minimize the product of the eigenvalues of C. In either case, we try to make the eigenvalues of C small but with different metrics. For the special case D = I the solution of (3.78) can be found in Telatar [1] and for the special case Q = I the solution of (3.78) can be found in Zhou [63 ]. For the more general problem (3.78) we again show that the so lution can be expressed S = U~VH, where U and V are orthonormal matrices of eigenvectors for Q and D respectively and is dia gonal. Unlike

PAGE 92

82 the trace problem (3.65), the ordering of the columns of U and V does not depend on the power P the columns of U and V should be ordered so that the associated eigenvalues of Q and D are in decreasing order This optimal eigenvector ordering result is the same as that for the optimization problem (3. 18) in Section 3.3 when the same notations for corresponding matrices are adopted. In Cai et al. [65] the a uthor s formulated the similar optimization problem while studying the space -tim e spreading (STS) scheme for correlated fading channels in the presence of interference. Based on the previous optimization result for the special case Q = I [63] U:EVH was chosen as the STS matrix and then the optimal eigenvector ordering and :E were decided. Here we so l ve the optimizat ion problem (3. 78) by using the method introduced in Wong et al. [ 61] and Wong et al. [84]. (Two important matrix inequalities arising from majorization theory [40] are used.) The determinant problem arises from spreading sequence optimization for C D MA systems. For CDMA systems a different performance measure which arises in information theory is the sum capacity of the channel. The mean square error is a performance measure for uncoded systems, while the sum capacity is a performance measure for coded systems. It represents the maximum sum of the rates at which users can transmit information reliably The sum capacity of the synchronous multiple access channel (3. 66) is C sum = max (xi ... XK; y ), where I represents the mutual infonnation [74] between the inputs x1 x2 ... XK and the out put vector y. The maximization is over the independent random inputs x1 x2 ... x K The maximum is achieved when all the random inputs are Gaussian. In t his case the sum capacity [71, 85] becomes 1 H H 1 C sum = 2 N l og det ( H S E SH+ I ) Since l og is a monotone increasing function the maximization of the sum capacity subject to a power constraint corr e spond s to the optimization problem ( 3.78) with Q = E -1 a nd D = H The solution to the determinant problem (3.78) can be expressed as follows : Theorem 3. 7.6. L e t VA UH and V Ll V H b e th e dia g onali zati ons of Q and D r e sp ec tiv ely wh e r e th e columns ofV andV are orthonormal e igenv ec tors and th e c orr e spondin g ei g e nva l u e s

PAGE 93

83 {>.i } and { (q are arranged in decreasing order. If K is the minimum of the rank ofQ and D then the optimal solution of (3. 7 8) is given by ( 3 .79) where :E is diagonal w ith diagonal e l ements given by { 1 1 } l/2 a i = max >.ic5;' 0 for 1 i K (3.80) and a i = 0 for i > K, where the parameter i s chosen so that Proof Initially let us assume that both D and Q are nonsingular later we remove this res tric tion. Insert T = Q112S in (3.78) and multiply the objection function on the left and right by det ( D -1 ) to obtain the following equivalent fonnulation: max det ( T H T + n -2 ) (3.81) T subject to tr (TTH Q -1 ) P T E cmx n Let wi, 1 i n, denote the eigenvalues of TH T arranged in decreasing order By a theorem of Fiedler (86] (also see (40 Chap. 9 G.4]) the determinant of a sum THT + n -2 of Hermitian matrices is bounded by the product of the sum of the respective eigenvalues (assuming the eigenvalues of T H T and Dare in decreasing order): n det (THT + n -2 ) IT (wi + & ;2 ) (3.82) i=l Also by a theorem of Ruh e (87] (also see (40 Chap 9 H 2 ]) the trace of a product (TTH)Q -1 of Hermitian matrices is bounded fr om b e low b y the sum o f th e product of respective eigenval ues (assuming the eigenvalues of TTH and Qare in decre as ing order): N tr (TTHQ-1 ) I::wi>.;1 N = min{m, n } (3.8 3) i=l since at most N eige nvalue s ofTH T an d TTH are non ze ro.

PAGE 94

84 We replace the cost function in (3.78) by the upper bound (3.82) and we replace the con straint in (3.78) by the lower bound (3.83) to obtain the problem: m ~ CTI,&;-') D,(w, + &;-') (3. 84) N subject to L wi>.;1 P wi 2'.: wi+1 2'.: 0 for i < N. i=l IfT is feasible in (3.81), then the square of its sing ul ar values are feasible in (3.84) by (3.83) And by (3 82) the value of the cost function in (3.84) is greater than or equa l to the associated value (3.81). Since the feasible set for (3. 84) is closed and bounded, and since the cost function is continuous there exists a maximizing w, and the maximum value of the cost function (3. 84) is greater than or equa l to the maximum value in (3.81). Consider the matrix T = un112yH where n is a diagonal matrix containing the max imizing w on the diagonal. For this choice of T the inequalities (3.82) and (3.83) are both equalities. Hence this choice for T attains the maximum in (3 8 I). The corresponding optimal solution of (3.78) is (3. 85) To comp l ete the proof of the theorem we need to explain how to compu t e the optimal w in (3.84) At the optimal solution of (3. 84) the power constraint must be an equality (otherwise we could multip l y w by a positive sca lar and increase the cost) Let us ignore the monotonicity constraint w i 2'.: wi+1 (we will show that the maximizer satisfies this constraint automatically). After taking the log of the cost function we obtain the following simplified version of (3. 84 ) : N max L l og(w i + i:5i-2 ) (3.86) w i = l N subject to L w i>.;1 = P w 2'.: 0 i = l Since the cost function is strict l y concave, the maximizer o f (3 .86) is unique.

PAGE 95

85 The first-order optimality conditions (KKT conditions) for an optim a l solution of (3.86) are the following: There exists a scalar 2'. 0 and a vector v E ]Rn such that (3.87) Analogous to the proof of Theorem 3. 7 .3, we define the function (3.88) This particular value for w; is obtained by setting v; = 0 in (3.87), solving for w ; ; when the solution is < 0 we set w i() = 0 (this corresponds to the + operator (3.88)). Observe that wi() in (3.88) is a decreasing function of which approaches +oo as approaches O and which approaches O as tends to +oo. Hence the equation n I: Wi()Ai1 = p (3.89) i=l has a unique positive solution. We have w i = 0 for 2'. >.;c5f which implies that I I 2 2 2 v = -2 + -= --2 + -> -c5 + c5 = 0 when 2'. >..c5, . wi() + c5;-A ; c5;-A ; ' It follows that the KKT conditions are satisfied when is the positi v e so lution of (3.89) Since the A i and c5i are both arranged in decrea s ing order it follows tha t for any choice 2'. 0 the wi given by (3.88) are in decreasing order. Hence, the constr a int wi+1 :S w ; in (3.84) is satisfied by the solution o f (3.86). Combining the formula (3.88) for the solution of (3.86) with the expression (3.85) for the s olution of (3.78), we obt a in th e so lution S given in (3.7 9) a nd (3.80) where 'E = A -1 / 2n112 Now suppose that e ith er D or Q i s singu l ar Let u s consider a p ertur b e d problem where we replace Q by Q f = UAf U H an d D by D f = V -6.f V H : max d et ( D f S H Q fSDf + I ) s s ubj ect to tr ( S H S ) :S P s E cmxn (3.90) He r e A f and -6.f are obtained from A and .6. b y setti n g c5i = E = >.1 for i or j > I<. Since Q f and D f are nonsingular it follows from our pr ev iou s a n a l ysis th at the p erturbe d problem (3. 90)

PAGE 96

86 has a solution of the form S = U~l V H where the diagonal elements of~ are given by { 1 1 }1 / 2 max ---2 0 >-A { 1 1 } 112 max 0 c:;3 for 1 :S i :S K for i > K Let be chosen so that i=l Observe that when c::3 < we ha v e a f = 0 for i > K and N I)an2 = P. i=l (3. 91) Hence for each c:: > 0 with c::3 < the optimal solution of the perturbed problem does not depend on c:: and the trailing diagonal elements a f for i > K vanish. Since the cost function in the perturbed problem (3. 90) is a continuous function of c::, we conclude that for c::3 < S l is the optimal solution of (3.90) for c:: = 0. The perturbed problem (3. 90) with c:: = 0 coincides with the original problem (3.78). Consequently, the solution (3 79) and (3. 80) is valid even when either Q or D is singular. D

PAGE 97

CHAPTER4 CONCLUSION AND FUTURE WORK To achieve the performance gain promised by multiple antenna systems parameter estima tions including timing estimation and channel estimation are key components of the space-time system design. In this work we investigate the timing estimation and channel estimation prob lems for MIMO systems. 4.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels In Chapter 2 we consider a wireless communication system with multiple transmit and receive antennas in a slow independent and identically distributed (i.i.d.) Rayleigh flat-fading environment. We study the problem of timing estimation in such a system with the aid of training signals from two different approaches. In the first approach the channel is assumed to be unknown but deterministic and joint ML estimation of the channel and delay is performed. In contrast in the second approach we assume that the channel is random but with known statistics and use the likelihood function averaged over all random channel realizations to construct the ML estimator for the delay. For both approaches, we derive the optimal training sequences based on the performance measures associated with the CRB of timing estimation. These two approaches lead to two different optimal training signal designs. For the deterministic channel approach we show that orthogonal training signals from multiple transmit antennas minimi z e the outage probability as well as the average CRB. For the random channel approach perfectly correlated training signals employed at different transmit antennas minimi z e the CRB. 4.2 Channel Estimation for Correlated MIMO Channels with Colored Interference In Chapter 3 we consider a wireless communication system with multiple transmit and receive antennas in a slow Rayleigh flat-fading environment. We investigate the problem of estimating correlated MIMO channels in the presence of colored interference. The Bayesian channel estimator is derived and the optimal training sequences are designed based on mini mizing the MSE of channel estimation. The design of the optimal training sequences has a 87

PAGE 98

88 clear physical interpretation which implies that we should assign more power to the transmis sion direction constructed by the eigen-directions with larger channel gains and the interference subspaces with l ess interference. The power assignment is determined by the water-filling argu ment under a finite power constraint. In order to implement the channel estimator and construct the optimal training sequences, we propose an algorithm to estimate long-term channel statis tics and design an efficient feedback scheme so that we can approximately construct the optimal sequences at the transmitter Numerical results show that with optimal training sequences, the MSE of channel estimation can be reduced substantially when compared with other training sequences 4 3 Timing Estimation for Corre l ated MIMO Channels with Colored Noise In the second chapter we study the timing estimation problem with the assumption that the fading coefficients between the pairs of transmit and receive antennas are independent and identically distributed. This assumption does not generally hold in practice due to the antenna spacings and orientation, the mutual coupling, the richness of scattering and the presence of dominant components [88]. Thus it is natural to extend the current work to investigate the synchronization problem in correlated channels Another possible direction to extend the present work is to address the timing estimation problem for the MIMO system in colored noise. It is more suitable to adopt the colored noise model than the white noise model when jamrners and co-channel interference are present in the communication system

PAGE 99

REFERENCES [ l] I. E. Telatar "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecom., vol. l 0, pp. 585-595, Nov. 1999. [2] G. J. Foschini and M. J Gans "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Commun. Mag., vol. 6 pp. 311-335, Mar. 1998. [3] V. Tarokh, N. Seshadri and A. R Calderbank, "Space-time codes for high data rate wire less communication: perfonnance criterion and code construction ," IEEE Trans. Inform. Theory, vol. 44, pp. 744-765, Mar. 1998 [4] S. Baro, G Bauch and A. Hansmann, Improved codes for space -time trellis-coded mod ulation IEEE Comm. Lett ., vol. 4, pp. 20 22 Jan. 2000. [ 5] A. R. Hammons and H. E. Gama! On the theory of space-time codes for PSK modula tion ," IEEE Trans Inform. Theory, vol. 46, pp. 524 542, Mar. 2000. [6] S. M. Alamouti "A simple transmit diver sity technique for wireless communications," IEEE J. Select. Areas in Commun., vol. 16 pp. 1451 1458, Oct. 1998. [7] V. Tarokh H Jafarkhani, and A. R. Calderbank, "Space-time block coding from orthogo nal designs ," IEEE Trans. Inform Theory, vol. 45, pp. 1456-1467, July. 1999. [8] V. Tarokh, H. Jafarkhani and A R. Calderbank, Space-time block coding for wireless communications: performance results IEEE J. Sel ec t Areas in Commun., vol. 1 7, pp 452-460, Mar. 1999. [9] G. Ganesan and P. Stoica, "Space-time div ersity using orthogonal and amicable orthogonal designs ," Wireless Personal Communications vol. 18, pp. 165 178 Aug. 2001. [ l O] S. Alamouti V. Tarokh and P. Poon "Tre llis-coded modulation and transmit di versity: design criteria and performance evaluation," in Proc IEEE Int Conj Universal Personal Communi cations vol. 2 Florence Italy Oct. 1998 pp. 703 7 07. [11] B M Hochwald and T. L. Marzetta "Unitary space-time modulation for multiple-antenna communication in Rayleigh flat fading ," IEEE Trans Inform Theory, vol. 46 pp. 543 564, Mar. 2000. [12] B. Hassibi and B. M. Hochwald High-rate codes that are lin ear in space and time ," IEEE Trans. Inform Theory, vol. 48, pp. 1804-1824 Jul. 2002. [13] S. Siwamogsathama and M. P. Fitz, Robust space-time coding for correlated Rayleigh fading channels ," IEEE Trans. Signal Pro cess ing vol. 50 pp. 2408 2416, Oct. 2002. 89

PAGE 100

90 [14) Y. Gong and K. B. Letaief, "Concatenated space -time block coding with trellis coded modulation in fading channels," in IEEE Trans. Wireless Commun., vol. 4 pp. 580-590, Oct. 2002. [15) G J. Foschini, "Layere d space-time architecture for wireless communication in a fading environment when using multi-element antennas ," B e ll Labs Tech. J vol. 1 no. 2 pp. 41-59, 1996. [16) G. Foschini G. Golden R. Valenzuela, and P Wolniansky Simplified processing for high spectral efficiency wireless communication employing multi-element arrays," IEEE J Select. Areas in Commun ., vol. 17, pp. 1841 1852 Nov. 1999. [17) S. Verdu Multiuser Detection Cambridge, UK: Cambridge Univ. Press, 1998. [18) M. J. Gans, N. Amitay, Y. Yeh H Xu, T. Darnen, R. Valenzuela T Sizer, R. Storz D. Taylor W. MacDonald, C. Tran and A. Adamiecki, Outdoor BLAST measurement system at 2.44 GHz: calibration and initial results ," IEEE J Select. Areas in Commun ., vol. 20, pp. 570 583 Apr. 2002 [ 19) C. Budianu and L. Tong Channel estimation for space-time orthogonal block codes ," IEEE Trans. Signal Processing, vol. 50 pp. 2515 2528 Oct. 2002 [20) P Stoica and 0. Besson "Training sequence design for frequency offset and frequency selective channel estimation," IEEE Trans Commun ., vol. 51, pp. 1910-1917, Nov. 2003. [21) C. Chuah, D. Tse, J. Kahn and R. Valenzuela, Capacity scaling in MIMO wireless systems under correlated fading ," IEEE Trans Inform Theory, vol. 48, pp. 637 650, Mar. 2002. [22) A. L. Moustaks and S. H. Simon Optimi zing multiple-input single-output (MISO) com munication systems with general Gaussian channels: nontrivial covariance and nonzero mean ," IEEE Trans Inform Theory vol. 49 no. 10 pp. 2770-2780, Oct. 2003. [23) S. A. Jafar and A. Goldsmith "Multip l e-antenna capacity in correlated Rayleigh fading with channel covariance information IEEE Trans. Wireless Commun. vol. 4, no 3 pp 990-997 May. 2 005. [24) E. A. Jorswieck and H. Boche Channel capacity and capacity-range of beamforming in MIMO systems under correlated fading with covariance feedback," IEEE Trans Wir e l ess Commun vol. 3, pp. 1543-1553 Sep. 2004. [25) E. A. Jorswieck and H. Boche Optimal transmission strategies and impact of correlation in multiantenna systems with different types of channel state information," IEEE Trans. Signal Pro cessing, vol. 52 no. 12, pp. 3440-3453, Dec. 2004. [26) A. Lo za no and A M. Tulino "Capacity of multiple-transmit multiple-receive antenna ar chitectures IEEE Trans. Inform Theo ry, vol. 48, no. 12, pp. 3117 3128 Dec 2 002 [27] A. L. Moustakas S. H. Simon, and A. M Sengupta "MIMO capacity through correlated channe l s in the presence of correlated interferers and noise: a (not so) l arge N analysis ," IEEE Trans. Info rm. Theory, vol. 49, no 10 pp 2545 2561 Oct. 2 003.

PAGE 101

91 [28] D. M. Dlugos and R. A. Scholtz "Acquisition of spread spectrum signals b y an adapti v e array IEEE Transa c tions on A cous tics Speec h and S ignal Proc e ssing, vol. 37 no. 8 pp. 1253 1270 Aug. 1989. [29] M. Z. Win and R. K. Mallik, Acquisition of spread spectrum signals in Rayleigh fading in Proc IEEE GLOBECOM 02, vol. 1 Taipei, Taiwan ROC, Nov. 2002 pp. 856-860. [30] P Shamain and L. B Milstein, "Acquisition of direct sequence spread spectrum signals with correlated fading IEEE J S e lect. Areas in Commun vol. 19 pp. 2406-2419, Dec. 2001. [31] Y. Zhang and S L. Miller Acquisition performance in transmission di v ersity CDMA systems in Pro c IEEE GLOBECOM 'OJ, v ol. 2 San Antonio Texas Nov 2001 pp. 827 831. [32] H. Krirn and M Vi berg Two decades of array signal processing research IEEE Sign a l Pro ces sin g Maga z in e, pp 67 94 July 1996 [33] A. L. Swindlehurst and P Stoica Maximum likelihood methods in radar array signal processing ," Pro c IEEE v ol. 86 pp. 421--441 Feb 1998 [34] A L. Swindlehurst Time delay and spatial signature estimation using known as yn chronous signals IEEE T r ans Si gnal Pro cessing, vol. 46, pp. 449--4 62 Feb 1998. [35] A Dogandzic and A Nehorai "Space-time fading channel estimation and symbol detec tion in unknown spatially correlated noise IEEE T r ans. S ignal Pro ce ssin g v ol. 50 pp. 457--474 Mar. 2002. [36] S M Kay Fundam e ntal s o f Stati s ti cal Signal Pro cess ing : Estimation T h e o ry E nglewood Cliffs NJ: Prentice Hall 1993. [37] H. V. Poor, An Introdu c tion to Si g nal D e t ec tion and Estim at ion. New York: Springer Verlag 1988. [38] F. M. Gardner, "Demodulator reference recovery techniques suited for digital implement a tion European Space Agency Final Reports, ESTEC Contract no. 6847 / 86/NL/DG, May. 1988 [39] M Moe neclaey, A fundamental lower bound on the performance o f practical joint carrier and bit synchronizer s, IEEE Tran s Comm un. vol. 32 pp. 1007-1012 Sep. 1984 [40] A. W. Marshall and I. Olk.in ine qualiti e s : The o ry o f M ajoriza tio n and Its Applicati ons. New York: Academic 1979 [ 41] M. Merkle and L. Petro v ic On Schur-con v exity of some di s tribution functions ," P ub/. In s t Math. v ol. 56 no. 7 pp. 111118 1994 [ 42 ] B. Has s ibi and B. M Hochwald "How much trainin g is need e d in mult i pl e-antenna wire less links ," I EEE Trans. Inform The o ry vol. 49, pp. 951 963 Apr. 2003 [43] J G Proaki s D igita l Co mmuni catio ns 4th e d Eng l ewook cliffs NJ: Prentice Hall 2000 [44] W. Rudin P r in c ipl es of Math e mati cal Analysis. New York : McGraw Hill 19 76.

PAGE 102

92 [45] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals Series, and Products, 5th ed. New York: Academic, 1994. [46] B. L. Hughes, "Differential space-time modulation," IEEE Trans. Inform Theory vol. 46, pp. 2567 2578, Nov. 2000. [ 47] V. Tarokh and H. Jafarkhani, "A differential detection scheme for transmit diversity," IEEE J. Select. Areas in Commun. vol. 18, pp. l 169 1174, July 2000. [48] B. M. Hochwald and W. Sweldens, "Differential unitary space-time modulation," IEEE Trans Commun. vol. 48 pp. 2041 2052 Dec. 2000. [49] H. Cramer, Mathematical Methods of Statistics Princeton, NJ: Princeton Univ. Press 1946. [50] A. Graham, Kronecker Products and Matrix Calculus w ith Application New York: Wiley, 1981. [51] E. K. P. Chong and S. H. Zak,An Introduction to Optimization. New York: Wiley, 1996. [52] M. Medard The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel," IEEE Trans Inform. Theo ry vol. 46, no. 3, pp. 933 946 May 2000. [53] F. Digham N. Mehta, A. Molisch, and J. Zhang, Joint pilot and data loading technique for MIMO systems operating with covariance feedback," Int e rnational Confer e nce on 3G Mobile Communication Technologies, London, UK, Oct. 2004. [54] X. Ma, L. Yang and G. B. Giannakis, "Optimal training for MIMO frequency-selective fading channels IEEE Trans Wireless Commun., vol. 4, pp. 453-466, Mar. 2005. [55] M. Dong and L. Tong, "Optimal design and placement of pilot symbols for channel esti mation IEEE Trans. Signal Pro ce ssing, vol. 50 pp. 3055-3069 Dec. 2002 [56] S. Yang and J. Wu Optimal binary training sequence design for multiple-antenna systems over dispersive fading channels IEEE Trans V eh. T ec hnol vol. 51, pp 127 1-1276 Sep. 2002. [57] P Fan and W. H. Mow, "On optimal training sequence design for multiple-antenna systems over dispersive fading channels and its extensions," IEEE Trans. V eh. Technol. vol. 53, pp. 1623-1626 ,Sep.2004. [58] P. Fan, N Suehiro, N Kuroyanagi and X. M. Deng, A class of binary sequences with zero correlation zone Elec tron. L ett., vol. 35, pp. 77 7 -779 1999 [59] P. Fan and L. Hao, "Generalized orthogonal sequences and their applications in s y n chronous CDMA systems," IEICE Trans. Fundam ent., vol. E83-A no. 11, pp. 1-16 2000. [60] C. Fragouli N. Al-Dhahir and W. Turin Training based channel estimation for multiple antenna broadband transmissions IEEE Trans. Wir e l e ss Commun. vol. 2 pp. 384-391 Mar. 2003.

PAGE 103

93 [61] T. F. Wong and B. Park "Tra ining sequence optimization in MIMO systems with colored interference IEEE Trans. Commun., vol. 52, pp. 1939-1947, Nov. 2004. [62] J. H. Kotecha and A. M. Sayeed, Transmit signal design for optimal estimation of corre lated MIMO channels," IEEE Trans. Signal Pro cessing, vol. 52, pp 546-557 Feb. 2004. [63] S Zhou and G B Giannakis Optimal transmitter eigen-beamforming and space-time block coding based on channel correlations IEEE Trans Inform Theory vol. 49, pp. 1673 1690 July 2003. [64] M. Biguesh and A B Gershman "MIMO channel estimation: optimal training and trade offs between estimation techniques," IEEE Int ernationa l Conference on Communications, Paris France, Jun. 2004, pp. 2658 2662. [65] X. Cai, G. B. Giannakis and M. D. Zoltowski, "Space-time spreading and block coding for correlated fading channels in the presence of interference IEEE Trans. Commun. vol. 53, pp. 515-525, Mar. 2005. [66] A. M. Sayeed "Deconstrucing multi-antenna fading channels "IEEE Trans Signal Pro cessing, vol. 50 pp. 2563-2579, Oct. 2002 [67] D. S. Shiu G. J. Foschini, M. J. Gans, and J. M Kahn, "Fading correlation and its effect on the capacity of multielement antenna systems IEEE Trans Commun ., vol. 48 pp. 502-513 Mar. 2000. [68] A. Scaglione P. Stoica S. Barbarossa, G. B. Giannakis and H. Sampath "Optima l designs for space-time linear precoders and decoders IEEE Trans Signal Processing vol. 50, pp. I 051-1064 May 2002 [69] Y. Song and S. D. Blostein "Channel estimation and data detection for MIMO systems under spatially and temporally colored interference ," EURASIP Jou rnal of Applied Signal Processing pp. 685-695 May. 2004. [70] D Palomar J. Cioffi and M Lagunas, "Joint Tx-Rx beamforming design for multicar rier MIMO channels: a unified framework for convex optimization IEEE Trans. Signal Processing vol. 51, pp. 2381-2401, Sept. 2003. [71] P. Viswanath and V. Anantharam "Optimal se qu ences for CDMA under colored noise: a schur-saddle function property ," IEEE Trans. Inform. Theo ry, vol. 48 pp. 1295 -1318, 2002 [72] J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statis ti cs and Econometri c s Chichester, West Sussex: Wiley 1988 [73] G. Strang, Lin ear Algebra and I ts Appli ca tions 3rd ed. Orlando FL: Harcourt Brace Jo vanovich 1988 [74] T M Cover and J. A. Thomas Elements of I nformation Theory. New York: Wiley 1991. [75] N. Lu "Tests on multiplicative covariance structures ," Ph.D. Thesis University of Iowa 2002

PAGE 104

94 [76] P. J. Brown, M. G Kenward, and E. E. Bassett "Bayesian discrimination with longitudinal data," Biostatistics, vol. 2 pp 417-432, 2001. [77] P. Dutilleul, "The MLE algorithm for the matrix normal distribution ," Journal of statistical computation and simulation, vol. 64, pp. 105123 1999. [78] R. M Gray, "On the asymptotic eigenvalue distribution ofToeplitz matrices," IEEE Trans Inform Theory vol. 18, pp. 725 730, 1972. [79] U. Grenander and G. Szego, Toeplitz Forms and Their Applications. Berkeley, Calif.: Univ. California Press, 1958. [80] R. M Gray Toeplitz and Circulant Matric es: a R eview, Re v ised Aug. 2002 [Online ]. Available: http://www-ee stanford edu/,..,_,gray / toeplit z. pdf. [81] W. W. Hager Y. Liu and T. F. Wong "Optimiztion of generalized mean square error in signal processing and communication," submitted to Linear Algebra and Its Applications, Apr. 2005 Availabe: http://www.math.ufl edu /,..,_,hager. [82] A J. Viterbi CDMA-Principles of Spread Sp ec trum Communications. Reading MA: Addison-Wesley, 1995. [83] U Madhow and M Honig "MMSE in interference suppression for direct sequence spread spectrum CDMA ," IEEE Trans. Commun., vol. 42, pp. 3178-3188 Nov. 1994. [84] T. F. Wong and T. M. Lok, Transmitter adaptation in multicode OS-CDMA systems," IEEE J S e le c t Areas Commun ., vol. 19, pp. 69-82, Jan. 2001. [85] S Verdu Capacity region of Gaussian CDMA channels: the symbol-synchronous case," 24th Annu. Alle rton Conj Communications, Control and Computing, Monticello IL Oct. 1986 pp. l 025-1034 [86] M Fiedler, "Bounds for the determinant of the sum of hermitian matrices ," Pro Amer. Math. Soc. vol. 30 pp 27-31 1971. [87] A. Rube Perturbation bounds for means of eigenvalues and invarieant s ubspace ," Nordisk Tidskr. Informationsbehandling (BIT) vol. 10 pp 343-354 19 7 0. [88] C. Oest ges and A J. Paulraj "A physical scattering model for MIMO macrocellular wire less channels ," IEEE J S e l ect. Areas in Commun. vol. 21 pp 721-729, Jun. 2003

PAGE 105

BIOGRAPHICAL SKETCH Yong Liu received the B.Eng. in information engineering from Zhejiang Uni v ersity Hangzhou China in 1999 and the M Sc degree in electrical and computer engineering from the University of Florida Gainesville, in 2002 He joined the University of Florida in Fall 2002 to pursue a Ph.D. degree in electrical and computer engineering. He carried ou t research in wireless communication in the Wireless Information Networkin g Group durin g his Ph .D. studies. 9 5

PAGE 106

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality as a disser tation for the degree of Doctor of Philosophy. Tan~?,/A)g Associate Professor of Electrical and Computer Engineering I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation an d is fully adequate, in scope and quality as a disser tation for the degree of Doctor of Philosophy . ortes h Eminent Scholar of Electrical and Compu te r Engineering I certify that I have read this study and that in m y opinion it conforms to acceptable standards of scholarly pre se ntation and is fully adequate, in cope and qu ity as a disser tation for the degree of Doctor of Philosophy Computer Engineering I certify that I have read this study and that in my opinion i t confonns to acce ptable standards of scholarly presentation and is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. -tJ~ ~;b#c William W Ha ger Professor of Mathematics

PAGE 107

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. December 2005 Pramod P Khargonekar Dean College of Engineering Kenneth J. Gerhardt Interim Dean Graduate School

PAGE 108

I L 17 .~; 2as_._ .,__.._....,,,, __ L ?-83 UNIVERSITY OF FLORIDA 111111111/I Ill I l l lllll l llll I I IIIIII I III III I Ill l llll I l l llll l Ill I I 3 1262 08554 2594