TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION SYSTEMS

By

YONG LIU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by Yong Liu

To my wife and parents.

ACKNOWLEDGMENTS

First of all, I would like to thank my advisor, Dr. Tan F. Wong, for his invaluable guidance, help and constant encouragement during my graduate study at the University of Florida. I also want to thank the other members of my graduate committee, Dr. John M. Shea, Dr. Jose A. B. Fortes and Dr. William W. Hager, for their suggestions and help. My special thanks go to Dr. William W. Hager for his great help during the development of the second half of this work. Finally, I am extremely grateful to my family for their encouragement, devotion and support throughout my whole life.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
TABLE
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
  1.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels
  1.2 Channel Estimation for Correlated MIMO Channels with Colored Interference
  1.3 Organization of the Dissertation
2 TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVER RAYLEIGH FLAT-FADING CHANNELS
  2.1 Introduction
  2.2 System Model
  2.3 Timing Estimation with Unknown Deterministic Channel
    2.3.1 ML Estimator
    2.3.2 Cramer-Rao Bound
    2.3.3 Optimal Training Scheme
  2.4 Timing Estimation with Random Channel
    2.4.1 ML Estimator
    2.4.2 Cramer-Rao Bound
    2.4.3 Optimal Training Scheme
  2.5 Discussions and Conclusions
    2.5.1 Orthogonal Training Signals
    2.5.2 Perfectly Correlated Training Signals
    2.5.3 Deterministic vs Random Channel Approaches
3 CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS WITH COLORED INTERFERENCE
  3.1 Introduction
  3.2 System Model
  3.3 Optimal Training Sequence Design
    3.3.1 Solution Structure
    3.3.2 The Optimal E
    3.3.3 Optimal Eigenvector Ordering
  3.4 Estimation of Channel Statistics and Feedback Design
  3.5 Numerical Results
    3.5.1 Co-channel Interference
    3.5.2 Jamming Signals
  3.6 Conclusion
  3.7 Appendix
    3.7.1 A Trace Problem
    3.7.2 A Determinant Problem
4 CONCLUSION AND FUTURE WORK
  4.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels
  4.2 Channel Estimation for Correlated MIMO Channels with Colored Interference
  4.3 Timing Estimation for Correlated MIMO Channels with Colored Noise
REFERENCES
BIOGRAPHICAL SKETCH

TABLE

1.1 Matrix Notations

LIST OF FIGURES

2.1 Outage probabilities achieved using different training signal sets for a system with 4 transmit antennas and 1 receive antenna. The unit of the threshold $\epsilon$ is $T^2$.
2.2 Outage probabilities achieved using orthogonal training signals for different numbers of transmit antennas. One receive antenna is employed. The unit of the threshold $\epsilon$ is $T^2$.
2.3 Comparison of outage probabilities of the ML estimator obtained from simulation and calculated from the CRB. The number of transmit antennas $n_t$ is 2 and $\epsilon = 10^{-4}T^2$.
2.4 Comparison of the MSE of the ML estimator obtained from simulation and the average CRB. The number of transmit antennas $n_t$ is 2. The unit of the vertical axis is $T^2$.
2.5 Comparison of CRBs obtained using orthogonal training sequences and perfectly correlated training sequences for different numbers of transmit antennas. Note that the CRB of the system with perfectly correlated training sequences is the same for any number of transmit antennas.
2.6 Comparison of the MSE of the ML estimator obtained from simulation and the CRB. The number of transmit antennas $n_t$ is 2. The unit of the vertical axis is $T^2$.
3.1 Comparison of total MSEs obtained using different training sequences: ISI-free symbol waveform and high spatial correlation channel.
3.2 Comparison of total MSEs obtained using different training sequences: ISI-free symbol waveform and low spatial correlation channel.
3.3 Comparison of total MSEs obtained using different training sequences: AR jammers and high spatial correlation channel.
3.4 Comparison of total MSEs obtained using different training sequences: AR jammers and low spatial correlation channel.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TIMING AND CHANNEL ESTIMATION IN MULTIPLE-ANTENNA COMMUNICATION SYSTEMS

By

Yong Liu

December 2005

Chair: Tan F. Wong
Major Department: Electrical and Computer Engineering

There is an increasing demand for next-generation wireless networks, including wireless local area networks and third-generation cellular networks, that can provide high data rates for broadband services, improve quality of service (QoS), and support more users.
The use of multiple transmit and receive antennas can substantially improve the performance of a wireless communication system by exploiting the extra degrees of freedom in the spatial domain, and is thus a promising technique for satisfying this demand. Many of the current space-time coding schemes proposed for multiple-antenna systems assume perfect timing and channel estimation in order to achieve the expected performance gain. A lack of timing synchronization between the transmitted and received signals, as well as inaccurate channel estimation, could degrade system performance. In the first half of this work, we investigate the problem of timing estimation in multiple-antenna systems with the aid of training signals. A slow, independent and identically distributed Rayleigh flat-fading channel model is considered. We derive two maximum likelihood timing estimators based on two different approaches, namely treating the channel as deterministic and as random, and present the corresponding Cramer-Rao bounds (CRBs). The optimal design of training signals based on figures of merit associated with the CRBs is then discussed. In the second half of this work, we study the estimation of correlated multiple-input multiple-output (MIMO) channels with colored interference. The Bayesian channel estimator is derived, and the optimal training sequences are designed based on the mean square error of channel estimation. We propose an algorithm to estimate the long-term channel statistics required in the construction of the optimal training sequences. We also design an efficient scheme to feed back the required information to the transmitter, where the optimal sequences can be approximately constructed. Numerical results show that the optimal training sequences provide substantial performance gains for channel estimation compared with other training sequences.
CHAPTER 1
INTRODUCTION

There is an increasing demand for next-generation wireless networks, including wireless local area networks and third-generation cellular networks, that can provide high data rates for broadband services, improve quality of service (QoS), and support more users. The use of multiple antennas at both the transmitters and the receivers of wireless communication systems is a significant technical breakthrough. By exploiting the extra degrees of freedom in the spatial domain, it can substantially improve the performance of wireless links, and it is thus a promising technique for fulfilling these requirements. A system employing multiple transmit and receive antennas is often called a multiple-input multiple-output (MIMO) system. Recently, MIMO systems and related techniques have been widely considered for next-generation wireless communication systems such as wireless local area networks (WLAN) and third-generation (3G) cellular networks. With multiple antennas, communication performance can be improved by orders of magnitude without increasing the transmit power or the bandwidth. The price is additional hardware complexity, which is made affordable by the increasing computational power of integrated circuits.

MIMO systems provide various benefits, including spatial multiplexing gain and diversity gain. The information capacity of a wireless communication system increases significantly when multiple antennas are employed. It has been shown analytically that MIMO systems can provide a linear increase in capacity [1, 2], proportional to the minimum of the number of transmit antennas and the number of receive antennas. This spatial multiplexing gain can be obtained by transmitting independent data streams from different transmit antennas. The increased information rate is achieved without increasing the transmit power or expanding the transmission bandwidth.
The physical characteristics of the wireless channel present a fundamental technical challenge for reliable communications. Wireless channels exhibit significant signal variations on a short time scale, a phenomenon known as fading. One way to mitigate the degradation caused by fading is to employ diversity techniques, which provide the receiver with several replicas of the same transmitted signal over independent fading channels. The probability that all the received replicas experience deep fades simultaneously is then considerably reduced. Thus diversity techniques increase the reliability of wireless links and dramatically improve communication performance over fading channels. Commonly used diversity techniques include time diversity, frequency diversity and spatial diversity. Time diversity can be provided by channel coding combined with interleaving, or by automatic repeat request (ARQ) schemes. In frequency diversity, the same narrowband signal is transmitted over different frequency bands to provide independent fading channels. Spatial diversity, also known as antenna diversity, is obtained by the use of multiple antennas. It is preferred over time diversity and frequency diversity since it requires no increase in the transmit signal power or bandwidth. If the fading between different pairs of transmit and receive antennas is approximately independent and the transmitted signal is carefully designed, the received signals can be combined at the receiver so that the fading of the resulting signal is greatly reduced compared with a single-antenna system, thereby improving the wireless link.

Space-time coding (STC) is a key technique that has been introduced to enhance the performance of wireless communication systems equipped with multiple antennas. Space-time codes are designed to use the extra degrees of freedom in the spatial domain provided by the additional antennas.
They introduce temporal and spatial correlation into the signals sent from different transmit antennas to achieve transmit diversity and provide spatial multiplexing gain. The main classes of space-time codes include the Bell Labs layered space-time architecture (BLAST), space-time trellis codes (STTC) and space-time block codes (STBC). Tarokh et al. [3] proposed space-time trellis codes, which can provide full diversity gain at the receiver. Since then, many efforts have been made to improve the original space-time trellis codes [4, 5]. Because space-time trellis codes are built on trellis codes, they also provide coding gain. However, the optimal decoder for STTC requires the Viterbi algorithm, so the decoding complexity grows exponentially with the memory length of the trellis code and the number of antennas. To reduce the decoding complexity, Alamouti introduced a simple space-time block coding scheme for systems with two transmit antennas that provides full diversity gain without sacrificing the transmission data rate [6]. The scheme was extended to more than two transmit antennas based on the theory of orthogonal designs [7, 8, 9]. Space-time block codes can be decoded with much simpler linear processing at the receiver than the Viterbi algorithm required for space-time trellis codes. Although space-time block codes achieve the same diversity gain as space-time trellis codes for the same number of transmit antennas, they do not provide significant coding gain. As a compromise between STBC and STTC, schemes that concatenate traditional trellis codes with space-time block codes to obtain additional coding gain have been proposed [10-14]. BLAST [15, 16] is the first space-time coding scheme proposed for MIMO systems that provides spatial multiplexing.
In BLAST, multiple independent data streams are transmitted from different transmit antennas and are separated at the receiver using interference nulling and successive interference cancellation. This spatial-domain decoding scheme for BLAST is similar to the successive interference cancellation proposed for multiuser detection [17] in CDMA systems. Field tests showed that BLAST provides a substantial increase in data rate for wireless communication systems operating in practical channels [18].

1.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels

To achieve the performance gain promised by multiple-antenna systems, parameter estimation, including channel estimation, timing estimation and frequency offset estimation, is a key component of space-time system design. Both channel estimation and frequency offset estimation for MIMO systems have been extensively studied in the literature [19, 20]. An issue that has not been sufficiently explored is timing synchronization in multiple-antenna systems. Inaccurate timing synchronization can degrade the performance of such systems in much the same way as MIMO channel estimation and frequency offset estimation errors do. For instance, many of the current space-time coding schemes proposed for multiple-antenna systems assume perfect knowledge of timing and channel gains at the receiver in order to achieve the promised diversity gain and capacity improvement, so the performance of these systems may be limited by the accuracy of timing estimation. One objective of this work is to study the problem of timing estimation for a wireless communication system employing multiple transmit and receive antennas in a Rayleigh flat-fading channel environment.
1.2 Channel Estimation for Correlated MIMO Channels with Colored Interference

For multiple-antenna communication systems, theoretical analysis [1, 2, 15] shows that the capacity increases linearly with the number of antennas under the assumption that the channel gains between different transmit and receive antennas are independent and identically distributed (i.i.d.). The i.i.d. assumption is reasonable for sufficiently rich scattering environments. On the other hand, it is also important to analyze the capacity, design optimal transmission strategies, and investigate the related channel parameter estimation problems for MIMO systems in more realistic situations, which include spatially correlated channels and colored interference. In such environments, fading correlation exists among the transmit antennas and among the receive antennas. It was shown [21] that the capacity of correlated MIMO channels still grows linearly with the number of antennas, but the growth rate is affected by the channel correlations and is smaller than that of independent fading channels. Based on capacity results for correlated MIMO channels, optimal transmission strategies [22-25] have been widely investigated. Jorswieck et al. [24] investigated correlated Rayleigh flat-fading MIMO systems with perfect channel state information at the receiver and channel covariance information fed back to the transmitter. It was shown that transmitting signals along the eigenvectors of the transmit correlation matrix is the optimal transmission strategy. The capacity of MIMO channels has also been investigated for wireless communication systems with colored interference.
This scenario arises in cellular systems, where users in one cell suffer co-channel interference from users in other cells due to frequency reuse, and in ad hoc networks, where each transmitter-receiver pair suffers interference from other transmitter-receiver pairs operating in the same frequency band. In Lozano et al. [26], the capacity of MIMO systems in the presence of spatially colored interference was investigated. It was shown that the capacity increases with the spatial correlation of the interference and that the lowest capacity is achieved when the interference is white. In Moustakas et al. [27], the authors provided analytical expressions for the statistics of the mutual information of spatially correlated channels in the presence of interference. Channel estimation is necessary for coherent detection in multiple-antenna communication systems, and inaccurate channel estimates can degrade system performance substantially. Few works consider the channel estimation problem for MIMO systems in realistic situations that include both spatially correlated channels and interference. Thus, another objective of this work is to investigate the problem of estimating correlated MIMO channels with colored interference.

1.3 Organization of the Dissertation

The dissertation is organized as follows. The timing estimation problem for MIMO systems with the aid of training signals is investigated in Chapter 2. In Chapter 3, we study the problem of estimating correlated MIMO channels in the presence of colored interference. Conclusions are drawn in Chapter 4. The notation used in this dissertation is summarized in Table 1.1.

Table 1.1: Matrix Notations

$A$ : matrix with complex entries
$a$ : column vector with complex entries
$\mathrm{Re}(a)$ : real part of column vector $a$
$I_n$ : $n \times n$ identity matrix
$0$ : zero matrix
$\mathrm{diag}(x_1, x_2, \ldots, x_n)$ : diagonal matrix with $x_1, x_2, \ldots, x_n$ as the diagonal elements
$A^T$ : transpose of $A$
$A^*$ : complex conjugate of $A$
$A^H$ : complex conjugate transpose (Hermitian) of $A$
$A^{1/2}$ : Hermitian square root of $A$
$\mathrm{vec}(A)$ : vector obtained by stacking the columns of $A$ on top of each other
$\mathrm{tr}(A)$ : trace of $A$
$\det(A)$ : determinant of $A$
$A \otimes B$ : Kronecker product of $A$ and $B$
$a \geq b$ : elementwise inequality
$\tilde A$ : matrix with real entries
$\tilde a$ : column vector with real entries
$\mathcal{CN}$ : complex Gaussian distribution
$\dot\psi(t)$ : first derivative of $\psi(t)$ w.r.t. $t$
$\ddot\psi(t)$ : second derivative of $\psi(t)$ w.r.t. $t$

CHAPTER 2
TIMING ESTIMATION IN MULTIPLE-ANTENNA SYSTEMS OVER RAYLEIGH FLAT-FADING CHANNELS

2.1 Introduction

In this chapter, we investigate the timing estimation problem for a wireless communication system employing multiple transmit and receive antennas with the aid of training signals. Previous related work was primarily restricted to acquisition in spread-spectrum systems with multiple receive antennas [28, 29]. In Dlugos et al. [28] and Win et al. [29], the maximum likelihood (ML) estimator of the received code lag was obtained, and the error probability of the acquisition system was derived. A deterministic but unknown channel was considered in Dlugos et al. [28], whereas a flat Rayleigh fading channel with known statistics was assumed in Win et al. [29]. An optimal estimator for code acquisition in spatially correlated channels was derived in Shamain et al. [30]. In Zhang et al. [31], the performance of code acquisition in a DS-CDMA system employing multiple transmit antennas was analyzed, and simulations showed that multiple transmit antennas improved code acquisition performance relative to a single-antenna system. Issues related to parameter estimation of signals received by an antenna array have also been treated in the radar and array signal processing literature [32, 33]. Time delay and spatial signature estimation of known signals received by an antenna array was investigated in Swindlehurst et al. [34].
In that work, ML algorithms and the Cramer-Rao bound (CRB) for time delay and array calibration estimation were developed, and some computationally efficient approximations of the ML algorithms were proposed. In Dogandzic et al. [35], ML methods were developed for space-time fading channel estimation with an antenna array in spatially correlated noise, and the CRBs for the unknown directions of arrival, time delays and Doppler shifts were derived under both structured and unstructured array response models.

In the present work, we consider a wireless communication system with multiple transmit and receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh flat-fading environment. The goal is to investigate timing estimation in such a system with the aid of training signals, and one of the main questions we try to answer is how to design the optimal training signals. We investigate the timing estimation problem under two approaches. In the first approach, the channel is assumed to be unknown and deterministic, and the channel and the delay are estimated jointly. We derive the ML estimator for joint channel and timing estimation and compute the associated CRB. We then discuss the optimal training signals with respect to two performance measures based on the CRB: the outage probability that the CRB exceeds a threshold, and the average CRB. We show that the optimal training scheme is one in which orthogonal training signals are sent from the multiple transmit antennas. In the second approach, the channel is assumed to be unknown but random with known statistics. We use the likelihood function averaged over all channel realizations to obtain the ML estimator of the delay, derive the associated CRB, and study the optimal training scheme in terms of minimizing the CRB.
We show that, in this second approach, perfectly correlated training signals at the different transmit antennas constitute the optimal training scheme, in contrast to the orthogonal training signals of the first approach.

The rest of this chapter is organized as follows. The system model is introduced in Section 2.2. In Section 2.3, we consider the timing estimation problem when the channel is assumed to be unknown but deterministic. In Section 2.4, we study timing estimation under the assumption that the channel is random with known statistics. In both sections, we derive the ML timing estimators, compute the associated CRBs, and discuss optimal training signal designs based on the corresponding CRBs. In Section 2.5, the two timing estimation approaches are compared and discussed.

2.2 System Model

We consider a single-user MIMO system with $n_t$ transmit antennas and $n_r$ receive antennas. We assume a quasi-static (block fading) channel that varies slowly enough to be considered invariant over a block, but changes to an independent value from block to block. Using the unstructured array model [33], the received baseband signals at the receive antennas are given in vector form by

$$ r(t) = \sum_{k=1}^{n_t} h_k\, s_k(t-\tau) + n(t), \tag{2.1} $$

where $h_k = [h_{k1}, h_{k2}, \ldots, h_{kn_r}]^T$, with $h_{ij}$ denoting the channel gain from the $i$th transmit antenna to the $j$th receive antenna, $r(t)$ is the $n_r \times 1$ signal vector received by the antenna array, and $s_k(t)$ is the training signal transmitted from the $k$th transmit antenna. Define the channel vector as $h = [h_1^T, h_2^T, \ldots, h_{n_t}^T]^T$. The noise $n(t)$ is a complex, circular-symmetric, white Gaussian process with zero mean and covariance matrix $E[n(t)n(u)^H] = 2\sigma^2 I_{n_r}\,\delta(t-u)$. The symbol $\tau$ denotes the unknown, deterministic delay to be estimated. This model assumes that the delays between all pairs of transmit and receive antennas are the same.
This corresponds to the case in which the distance between the transmit and receive antenna arrays is much larger than the sizes of the arrays. We consider the Rayleigh flat-fading channel model, in which the channel coefficients $h_{ij}$ are i.i.d. complex, circular-symmetric, zero-mean Gaussian random variables with the $\mathcal{CN}(0, \rho^2)$ distribution, i.e., $E[h_k h_k^H] = \rho^2 I_{n_r}$, $E[h_k h_k^T] = 0$, and $E[h_i h_j^H] = E[h_i h_j^T] = 0$ for $i \neq j$.

The conditional likelihood function of $r(t)$, given the unknowns $\tau$ and $h$, can be written as

$$ p(r(t) \mid \tau, h) \propto \exp\left( -\frac{1}{2\sigma^2}\int_0^{T_0} \Big\| r(t) - \sum_{k=1}^{n_t} h_k s_k(t-\tau) \Big\|^2 dt \right), \tag{2.2} $$

where we have assumed that the training signals $s_k(t)$, for $k = 1, \ldots, n_t$, have finite durations, and that the observation interval $T_0$ is larger than the sum of the maximum training signal duration and the maximum possible value of $\tau$. Thus the entire transmitted training signals are observed at the receiver. We can simplify the exponent of the likelihood function to find sufficient statistics for estimating the delay $\tau$:

$$ \begin{aligned} \int_0^{T_0} \Big\| r(t) - \sum_{k=1}^{n_t} h_k s_k(t-\tau) \Big\|^2 dt &= \int_0^{T_0} r^H(t) r(t)\, dt - 2\,\mathrm{Re}\left\{ \sum_{k=1}^{n_t} \left[ \int_0^{T_0} r^H(t) s_k(t-\tau)\, dt \right] h_k \right\} + \sum_{i=1}^{n_t}\sum_{j=1}^{n_t} h_i^H h_j \int_0^{T_0} s_i^*(t-\tau) s_j(t-\tau)\, dt \\ &= \text{const} - 2\,\mathrm{Re}\left\{ \sum_{k=1}^{n_t} \left[ \int_0^{T_0} r^H(t) s_k(t-\tau)\, dt \right] h_k \right\} + \sum_{i=1}^{n_t}\sum_{j=1}^{n_t} h_i^H h_j \int_0^{T_0} s_i^*(t) s_j(t)\, dt, \end{aligned} $$

where const represents the part that depends on neither the delay $\tau$ nor the channel $h$. The last equality holds because $T_0$ is larger than the sum of the maximum training signal duration and the maximum possible delay. Denote the matched filter output corresponding to the $k$th transmitted signal by

$$ r_k(\tau) = \int_0^{T_0} r^*(t)\, s_k(t-\tau)\, dt, \quad k = 1, 2, \ldots, n_t. \tag{2.3} $$

Note that $r(\tau) = [r_1(\tau)^T, r_2(\tau)^T, \ldots, r_{n_t}(\tau)^T]^T$ provides sufficient statistics for estimating $\tau$. With this notation, we have

$$ 2\,\mathrm{Re}\left\{ \sum_{k=1}^{n_t} \left[ \int_0^{T_0} r^H(t) s_k(t-\tau)\, dt \right] h_k \right\} = 2\,\mathrm{Re}\{ r(\tau)^T h \}. \tag{2.4} $$

Denote the cross-correlation between the training signals from the $i$th and $j$th transmit antennas as

$$ \Gamma_{ij} = \int_0^{T_0} s_i^*(t)\, s_j(t)\, dt, \tag{2.5} $$

which forms the $(i, j)$th element of the correlation matrix $\Gamma$. Let $C = \Gamma \otimes I_{n_r}$. Then we have

$$ \sum_{i=1}^{n_t}\sum_{j=1}^{n_t} h_i^H h_j \int_0^{T_0} s_i^*(t)\, s_j(t)\, dt = h^H C h. \tag{2.6} $$

From (2.4) and (2.6), the conditional likelihood function of $r(\tau)$, given the unknowns $\tau$ and $h$, can be written as

$$ p(r(\tau) \mid \tau, h) \propto \exp\left( -\frac{1}{2\sigma^2}\left[ \text{const} - 2\,\mathrm{Re}\{ r(\tau)^T h \} + h^H C h \right] \right). \tag{2.7} $$

Let $\tilde r(\tau) = [\mathrm{Re}(r(\tau))^T, -\mathrm{Im}(r(\tau))^T]^T$, $\tilde h = [\mathrm{Re}(h)^T, \mathrm{Im}(h)^T]^T$, and

$$ \tilde C = \frac{1}{2}\begin{bmatrix} \mathrm{Re}(C) & -\mathrm{Im}(C) \\ \mathrm{Im}(C) & \mathrm{Re}(C) \end{bmatrix}. $$

Using the isomorphism between real and complex matrices [36], we have $2\,\mathrm{Re}\{r(\tau)^T h\} = 2\tilde r(\tau)^T \tilde h$ and $h^H C h = 2\tilde h^T \tilde C \tilde h$. In terms of these real quantities, the conditional likelihood function of $\tilde r(\tau)$ is

$$ p(\tilde r(\tau) \mid \tau, \tilde h) \propto \exp\left( -\frac{1}{2\sigma^2}\left[ \text{const} - 2\tilde r(\tau)^T \tilde h + 2\tilde h^T \tilde C \tilde h \right] \right). \tag{2.8} $$

2.3 Timing Estimation with Unknown Deterministic Channel

In this section, we treat $h$ as unknown but deterministic in the estimation process and consider the joint estimation of the delay $\tau$ and the channel vector $h$.

2.3.1 ML Estimator

In this subsection, we develop the ML estimator for the joint estimation of the timing $\tau$ and the channel vector $h$. The joint ML estimate of $\tau$ and $h$ maximizes the conditional likelihood function (2.8) over $\tau$ and $\tilde h$:

$$ \max_{\tau, \tilde h}\, p(\tilde r(\tau) \mid \tau, \tilde h) = \max_{\tau}\left\{ \max_{\tilde h}\, p(\tilde r(\tau) \mid \tau, \tilde h) \right\}. \tag{2.9} $$

Equivalently, we can maximize the log-likelihood function

$$ L = \text{const} + \frac{1}{2\sigma^2}\left( 2\tilde r(\tau)^T \tilde h - 2\tilde h^T \tilde C \tilde h \right). \tag{2.10} $$

As suggested by (2.9), we first maximize $L$ over $\tilde h$. Taking the first derivative of $L$ with respect to (w.r.t.) $\tilde h$ gives

$$ \frac{\partial L}{\partial \tilde h} = \frac{1}{2\sigma^2}\left\{ 2\tilde r(\tau) - 4\tilde C \tilde h \right\}. $$

Setting $\partial L/\partial \tilde h = 0$, we obtain the ML estimate of the channel as

$$ \hat{\tilde h}_{ML} = \frac{1}{2}\tilde C^{-1}\tilde r(\tau), \tag{2.11} $$

where we have assumed that $C$, or equivalently $\tilde C$, is nonsingular, so that the channel estimate is unique. Substituting (2.11) into (2.10) gives $L = \text{const} + \frac{1}{4\sigma^2}\tilde r(\tau)^T \tilde C^{-1}\tilde r(\tau)$, so the ML estimate of the delay $\tau$ is

$$ \hat\tau_{ML} = \arg\max_{\tau}\left\{ \tilde r(\tau)^T \tilde C^{-1} \tilde r(\tau) \right\}. \tag{2.12} $$

To implement the ML estimator in general, we need to conduct a line search over all possible values of $\tau$ to maximize this metric.

2.3.2 Cramer-Rao Bound

The Cramer-Rao bound gives a lower bound on the variance of any unbiased estimator [36, 37]. It has been widely used to lower-bound the mean square error (MSE) of symbol timing estimators [38, 39].
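The line search in (2.12) can be illustrated with a small discrete-time simulation. The following Python sketch is only an approximation, and it rests on assumptions that are not part of the text. Integrals are replaced by sums over samples, candidate delays are restricted to integer numbers of samples, and the symbol waveform is taken to be a half-sine pulse with $T_s = 1$. The training sequences, channel gains and noise level are hypothetical, and the helper names (`half_sine`, `training_signal`, `invert`, `simulate`, `ml_delay`) are illustrative. The sketch evaluates the metric in its complex form. With the definitions above, $\tilde r(\tau)^T \tilde C^{-1} \tilde r(\tau) = 2\, r(\tau)^T C^{-1} r(\tau)^*$, and because $C^{-1} = \Gamma^{-1} \otimes I_{n_r}$, the quadratic form splits into one term per receive antenna. The factor 2 and the sample spacing do not change the maximizer, so both are dropped.

```python
import math
import random


def half_sine(t):
    """Assumed symbol waveform psi(t): half-sine pulse on [0, 1), i.e. Ts = 1."""
    return math.sin(math.pi * t) if 0.0 <= t < 1.0 else 0.0


def training_signal(seq, ns):
    """Samples of s_k(t) = sum_i a_k(i) psi(t - i*Ts), ns samples per symbol, cf. (2.25)."""
    return [seq[n // ns] * half_sine((n % ns) / ns) for n in range(len(seq) * ns)]


def invert(mat):
    """Gauss-Jordan inverse of a small (complex) matrix with partial pivoting."""
    n = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def simulate(seqs, h, delay, max_delay, ns, noise_std, rng):
    """Sampled version of (2.1): r_j[n] = sum_k h[k][j] * s_k[n - delay] + noise."""
    sigs = [training_signal(a, ns) for a in seqs]
    length = len(sigs[0]) + max_delay  # window holds the whole burst for every candidate delay
    rx = []
    for j in range(len(h[0])):  # one received stream per receive antenna
        rj = [complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std)) for _ in range(length)]
        for k, s in enumerate(sigs):
            for n, v in enumerate(s):
                rj[n + delay] += h[k][j] * v
        rx.append(rj)
    return rx, sigs


def ml_delay(rx, sigs, max_delay):
    """Grid-search version of (2.12) over integer sample delays 0..max_delay."""
    nt = len(sigs)
    # Gamma_{ik} = sum_n conj(s_i[n]) s_k[n], cf. (2.5); the sample spacing is dropped.
    gamma = [[sum(x.conjugate() * y for x, y in zip(sigs[i], sigs[k])) for k in range(nt)]
             for i in range(nt)]
    ginv = invert(gamma)
    best_d, best_metric = 0, -math.inf
    for d in range(max_delay + 1):
        metric = 0.0
        for rj in rx:
            # Matched-filter outputs (2.3) for this receive antenna at delay d.
            mf = [sum(rj[n + d].conjugate() * s[n] for n in range(len(s))) for s in sigs]
            # Per-antenna term of r^T (Gamma^{-1} kron I) r^*, real and nonnegative.
            metric += sum(mf[i] * ginv[i][k] * mf[k].conjugate()
                          for i in range(nt) for k in range(nt)).real
        if metric > best_metric:
            best_d, best_metric = d, metric
    return best_d


# Hypothetical setup: n_t = 2, n_r = 2, N = 7 symbols, 8 samples per symbol.
seqs = [[1, 1, 1, -1, -1, 1, -1], [1, -1, 1, 1, -1, -1, -1]]
h = [[0.8 + 0.3j, -0.5 + 0.9j], [0.4 - 0.7j, 1.1 + 0.2j]]  # h[k][j]: tx k -> rx j
rx, sigs = simulate(seqs, h, delay=13, max_delay=40, ns=8, noise_std=0.01, rng=random.Random(1))
print(ml_delay(rx, sigs, max_delay=40))  # prints 13
```

The metric at a candidate delay equals the energy of the received samples projected onto the span of the delayed training signals. This is why the noise-free maximum falls exactly at the true delay. The sketch searches only an integer-sample grid. Fractional delays would need a finer grid or interpolation around the peak, which is not attempted here.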
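It is well known [36, 37] that, under mild regularity conditions and with independent and identically distributed observations, ML estimators are asymptotically unbiased and efficient. It can be easily verified that the elements of $r(\tau)$ in (2.3) corresponding to different receive antennas are i.i.d. observations. Thus, for a particular realization of the channel $h$, the ML estimator is asymptotically efficient, i.e., its MSE approaches the CRB as the number of receive antennas $n_r$ becomes large. Hence the CRB is a suitable performance measure for the ML estimator of the delay $\tau$. We will also verify by computer simulation that the CRB is a suitable performance metric. The main result of this section is contained in the following theorem.

Theorem 2.3.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the training signals $s_k(t)$, for $k = 1, \ldots, n_t$, exist and are uniformly continuous on $[0, T_0]$. Together with the standard regularity conditions in [36, 37], the Cramer-Rao bound for the estimation of the delay $\tau$ for a given realization of the channel $h$ is

$$ \mathrm{CRB}(h) = \frac{\sigma^2}{-\mathrm{Re}\left\{ E\left[\frac{\partial^2 r(\tau)}{\partial\tau^2}\right]^T h \right\} - E\left[\frac{\partial r(\tau)}{\partial\tau}\right]^T C^{-1}\, E\left[\frac{\partial r(\tau)}{\partial\tau}\right]^*}, \tag{2.13} $$

where $E[\partial^2 r(\tau)/\partial\tau^2] = [E[\partial^2 r_1(\tau)/\partial\tau^2]^T, E[\partial^2 r_2(\tau)/\partial\tau^2]^T, \ldots, E[\partial^2 r_{n_t}(\tau)/\partial\tau^2]^T]^T$ with

$$ E\left[\frac{\partial^2 r_i(\tau)}{\partial\tau^2}\right] = \sum_{k=1}^{n_t} h_k^* \int_0^{T_0} s_k^*(t)\, \ddot s_i(t)\, dt, \quad i = 1, 2, \ldots, n_t, \tag{2.14} $$

and $E[\partial r(\tau)/\partial\tau] = [E[\partial r_1(\tau)/\partial\tau]^T, E[\partial r_2(\tau)/\partial\tau]^T, \ldots, E[\partial r_{n_t}(\tau)/\partial\tau]^T]^T$ with

$$ E\left[\frac{\partial r_i(\tau)}{\partial\tau}\right] = -\sum_{k=1}^{n_t} h_k^* \int_0^{T_0} s_k^*(t)\, \dot s_i(t)\, dt, \quad i = 1, 2, \ldots, n_t. \tag{2.15} $$

Proof. The CRB for the estimation of $\tau$ is given by

$$ \mathrm{CRB}(h) = (I^{-1})_{22}, \tag{2.16} $$

where $I$ is the Fisher information matrix for the joint estimation of the channel $\tilde h$ and the delay $\tau$, defined as

$$ I = \begin{bmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{bmatrix} = \begin{bmatrix} -E\left[\frac{\partial^2 L}{\partial\tilde h\,\partial\tilde h^T}\right] & -E\left[\frac{\partial^2 L}{\partial\tilde h\,\partial\tau}\right] \\ -E\left[\frac{\partial^2 L}{\partial\tau\,\partial\tilde h^T}\right] & -E\left[\frac{\partial^2 L}{\partial\tau^2}\right] \end{bmatrix}. \tag{2.17} $$

Since $\partial^2 L/\partial\tilde h\,\partial\tilde h^T = -\frac{2}{\sigma^2}\tilde C$ does not depend on the observations, we have

$$ I_{11} = \frac{2}{\sigma^2}\tilde C. \tag{2.18} $$

Moreover,

$$ I_{12} = I_{21}^T = -E\left[\frac{\partial^2 L}{\partial\tilde h\,\partial\tau}\right] = -\frac{1}{\sigma^2} E\left[\frac{\partial\tilde r(\tau)}{\partial\tau}\right]. \tag{2.19} $$

Let $v = E[\partial r(\tau)/\partial\tau]$; then $E[\partial\tilde r(\tau)/\partial\tau] = [\mathrm{Re}(v)^T, -\mathrm{Im}(v)^T]^T$. The $i$th block of $v$ can be computed from

$$ \frac{\partial r_i(\tau)}{\partial\tau} = -\int_0^{T_0} r^*(t)\, \dot s_i(t-\tau)\, dt = -\int_0^{T_0} \Big[ \sum_{k=1}^{n_t} h_k^* s_k^*(t-\tau) + n^*(t) \Big] \dot s_i(t-\tau)\, dt = -\sum_{k=1}^{n_t} h_k^* \int_0^{T_0} s_k^*(t-\tau)\, \dot s_i(t-\tau)\, dt - \int_0^{T_0} n^*(t)\, \dot s_i(t-\tau)\, dt. $$

Since the noise $n(t)$ has zero mean,

$$ E\left[\frac{\partial r_i(\tau)}{\partial\tau}\right] = -\sum_{k=1}^{n_t} h_k^* \int_0^{T_0} s_k^*(t-\tau)\, \dot s_i(t-\tau)\, dt = -\sum_{k=1}^{n_t} h_k^* \int_0^{T_0} s_k^*(t)\, \dot s_i(t)\, dt. \tag{2.20} $$

Finally, $I_{22} = -E[\partial^2 L/\partial\tau^2] = -\frac{1}{\sigma^2} E[\partial^2\tilde r(\tau)/\partial\tau^2]^T \tilde h = -\frac{1}{\sigma^2}\mathrm{Re}\{ E[\partial^2 r(\tau)/\partial\tau^2]^T h \}$, where, in the same way as (2.20),

$$ E\left[\frac{\partial^2 r_i(\tau)}{\partial\tau^2}\right] = \sum_{k=1}^{n_t} h_k^* \int_0^{T_0} s_k^*(t)\, \ddot s_i(t)\, dt. \tag{2.21} $$

Applying the standard result on the inverse of a partitioned matrix to (2.16) and (2.17) gives

$$ \mathrm{CRB}^{-1}(h) = I_{22} - I_{21} I_{11}^{-1} I_{12}. \tag{2.22} $$

Using the relationship between real and complex matrices [36], we obtain

$$ I_{21} I_{11}^{-1} I_{12} = \frac{1}{2\sigma^2} E\left[\frac{\partial\tilde r(\tau)}{\partial\tau}\right]^T \tilde C^{-1} E\left[\frac{\partial\tilde r(\tau)}{\partial\tau}\right] = \frac{1}{\sigma^2}\, v^T C^{-1} v^*. \tag{2.23} $$

Then the CRB for the estimation of the delay $\tau$ is

$$ \mathrm{CRB}(h) = \frac{\sigma^2}{-\mathrm{Re}\left\{ E\left[\frac{\partial^2 r(\tau)}{\partial\tau^2}\right]^T h \right\} - E\left[\frac{\partial r(\tau)}{\partial\tau}\right]^T C^{-1}\, E\left[\frac{\partial r(\tau)}{\partial\tau}\right]^*}. \tag{2.24} $$

We note that the CRB varies with the choice of training signals. By carefully choosing the training signals to minimize a suitable measure associated with the CRB, we can potentially improve the estimation performance.

2.3.3 Optimal Training Scheme

Communication systems often employ the same symbol waveform for both the training and data phases, and the choice of waveform is mainly dictated by the performance required for data transmission. In this section, we make the following simplifying but practically reasonable assumptions on the training signals:

Assumption 1. Let $a_k = [a_k(0), \ldots, a_k(N-1)]^T$ be the training sequence assigned to the $k$th transmit antenna. The training signal waveform on this antenna is of the form

$$ s_k(t) = \sum_{i=0}^{N-1} a_k(i)\, \psi(t - iT_s), \tag{2.25} $$

where $N$ is the number of training symbols and $\psi(t)$ is the symbol waveform. We call the $N \times n_t$ matrix $A = [a_1, a_2, \ldots, a_{n_t}]$ the training sequence matrix.

Assumption 2. The symbol waveform $\psi(t)$ is time-limited to a single symbol period $[0, T_s]$, so that adjacent symbols do not interfere with each other. In addition, $\psi(t)$ is sufficiently smooth to guarantee the existence of uniformly continuous first and second derivatives. This condition is satisfied by most symbol waveforms of practical interest; two typical examples are the time-domain raised-cosine pulse and the half-sine pulse.

Assumption 3. $A^H A$ is nonsingular, and hence $\Gamma$ and $C$ are also nonsingular. Note that this implies $N \geq n_t$.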
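Equation (2.22) follows from applying the partitioned-inverse (Schur complement) identity to (2.17). As a minimal numerical check, the Python sketch below uses a hypothetical $3 \times 3$ Fisher information matrix, ordered as two real channel parameters followed by the delay, with made-up values. It compares the last diagonal entry of $I^{-1}$ with $1/(I_{22} - I_{21}I_{11}^{-1}I_{12})$. It also shows why the unknown channel can only increase the bound: the subtracted term is nonnegative because $I_{11}^{-1}$ is positive definite. The function names are illustrative.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small real matrix with partial pivoting."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def crb_last_param(fisher):
    """1 / (I22 - I21 I11^{-1} I12) for the last parameter, as in (2.22)."""
    n = len(fisher) - 1
    i11 = [row[:n] for row in fisher[:n]]  # nuisance block (channel parameters)
    i12 = [row[n] for row in fisher[:n]]   # coupling between channel and delay
    i21 = fisher[n][:n]
    i22 = fisher[n][n]                     # delay-only information
    i11_inv = inverse(i11)
    coupling = sum(i21[a] * i11_inv[a][b] * i12[b] for a in range(n) for b in range(n))
    return 1.0 / (i22 - coupling)


# Hypothetical symmetric positive-definite Fisher matrix for (h_re, h_im, tau).
fisher = [[2.0, 0.3, 0.5],
          [0.3, 1.5, -0.2],
          [0.5, -0.2, 3.0]]
print(crb_last_param(fisher), inverse(fisher)[2][2])  # the two numbers agree
print(1.0 / fisher[2][2])  # bound if the channel were known: never larger
```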
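Under the assumptions stated above, the CRB for timing estimation simplifies to the expression summarized in the following corollary.

Corollary 2.3.1 (Cramer-Rao bound). Given Assumptions 1-3, the CRB for the estimation of $\tau$ for a particular realization of the channel $\mathbf{h}$ reduces to

$$\mathrm{CRB}(\mathbf{h}) = \frac{-\sigma^2\psi_b}{2\psi_a\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}}, \tag{2.26}$$

where $\psi_a = \psi_b\psi_c + |\psi_d|^2$, $\psi_b = \int_0^{T_s}|\psi(t)|^2dt$, $\psi_c = \int_0^{T_s}\psi^*(t)\psi^{(2)}(t)\,dt$, and $\psi_d = \int_0^{T_s}\psi^*(t)\psi^{(1)}(t)\,dt$.

Proof. With the three assumptions on the training signals, and since the shifted pulses do not overlap, we have

$$\int_0^{T_0}s_k^*(t)\,s_i^{(2)}(t)\,dt = \int_0^{T_0}\sum_{m=0}^{N-1}a_k^*(m)\psi^*(t-mT_s)\sum_{l=0}^{N-1}a_i(l)\psi^{(2)}(t-lT_s)\,dt = \mathbf{a}_k^H\mathbf{a}_i\int_0^{T_s}\psi^*(t)\psi^{(2)}(t)\,dt = \psi_c\,\mathbf{a}_k^H\mathbf{a}_i. \tag{2.27}$$

Then (2.14) can be written in terms of the training sequences as

$$E\left[\frac{\partial^2\mathbf{r}_i(\tau)}{\partial\tau^2}\right] = \psi_c\sum_{k=1}^{n_t}\mathbf{h}_k^*\,\mathbf{a}_k^H\mathbf{a}_i. \tag{2.28}$$

Thus

$$E\left[\frac{\partial^2\mathbf{r}(\tau)}{\partial\tau^2}\right]^T\mathbf{h} = \psi_c\sum_{i=1}^{n_t}\sum_{k=1}^{n_t}\mathbf{h}_k^H\mathbf{h}_i\,\mathbf{a}_k^H\mathbf{a}_i = \psi_c\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}. \tag{2.29}$$

Hence $I_{22} = -\frac{2}{\sigma^2}\mathrm{Re}\left\{E[\partial^2\mathbf{r}(\tau)/\partial\tau^2]^T\mathbf{h}\right\} = -\frac{2\psi_c}{\sigma^2}\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}$. Moreover, (2.15) can also be simplified in terms of the training sequences as

$$E\left[\frac{\partial\mathbf{r}_i(\tau)}{\partial\tau}\right] = -\psi_d\sum_{k=1}^{n_t}\mathbf{h}_k^*\,\mathbf{a}_k^H\mathbf{a}_i. \tag{2.30}$$

Thus $E[\partial\mathbf{r}(\tau)/\partial\tau] = -\psi_d(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})^*\mathbf{h}^*$. Similarly, we have

$$\int_0^{T_0}s_i^*(t)\,s_j(t)\,dt = \psi_b\,\mathbf{a}_i^H\mathbf{a}_j \tag{2.31}$$

and $\mathbf{C} = \psi_b(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})$. Hence, (2.23) can be written as

$$\mathbf{I}_{21}\mathbf{I}_{11}^{-1}\mathbf{I}_{12} = \frac{2}{\sigma^2}\psi_d\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})(\psi_b\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})^{-1}\psi_d^*(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h} = \frac{2|\psi_d|^2}{\sigma^2\psi_b}\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}. \tag{2.32}$$

Then the Cramer-Rao bound for the estimation of the delay $\tau$ is

$$\mathrm{CRB}(\mathbf{h}) = \left[I_{22} - \mathbf{I}_{21}\mathbf{I}_{11}^{-1}\mathbf{I}_{12}\right]^{-1} = \left[-\frac{2\psi_c}{\sigma^2}\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h} - \frac{2|\psi_d|^2}{\sigma^2\psi_b}\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}\right]^{-1} = \frac{-\sigma^2\psi_b}{2(\psi_b\psi_c + |\psi_d|^2)\,\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}}. \tag{2.33}$$

By using standard properties of the Fourier transform similar to Parseval's theorem, we have

* $\psi_b = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\Psi(\omega)|^2d\omega$;
* $\psi_c = -\frac{1}{2\pi}\int_{-\infty}^{\infty}\omega^2|\Psi(\omega)|^2d\omega$;
* $\psi_d = \frac{j}{2\pi}\int_{-\infty}^{\infty}\omega|\Psi(\omega)|^2d\omega$,

where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$. Then, according to the Cauchy-Schwarz inequality,

$$\psi_a = \psi_b\psi_c + |\psi_d|^2 = -\left[\frac{1}{2\pi}\int_{-\infty}^{\infty}|\Psi(\omega)|^2d\omega\right]\left[\frac{1}{2\pi}\int_{-\infty}^{\infty}\omega^2|\Psi(\omega)|^2d\omega\right] + \left[\frac{1}{2\pi}\int_{-\infty}^{\infty}\omega|\Psi(\omega)|^2d\omega\right]^2 \leq 0. \tag{2.34}$$

Since $\psi_b > 0$, we have $-\psi_a/\psi_b \geq 0$, which implies that the CRB given in (2.33) is nonnegative, as it must be. ∎

As a result, the dependence of the CRB on the training signals $s_k(t)$, for $k = 1, \ldots, n_t$, reduces to a dependence on the training sequence matrix $\mathbf{A}$ and the symbol waveform $\psi(t)$.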
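The constants $\psi_b$, $\psi_c$, $\psi_d$, and $\psi_a$ of Corollary 2.3.1 are easy to evaluate for the two pulses mentioned in Assumption 2. The sketch below integrates them numerically with the composite Simpson rule, using $T_s = 1$, for two pulses:

* the half-sine pulse $\sin(\pi t)$;
* a raised-cosine-shaped pulse, taken here as $1 - \cos(2\pi t)$ on $[0, 1]$ (our assumption about its exact form).

It then checks $\psi_a \leq 0$ as in (2.34). For these real pulses, which are symmetric about $T_s/2$, $\psi_d = 0$ and $\psi_c = -\int|\psi^{(1)}(t)|^2dt$. The expected closed-form values are therefore $\psi_a = -\pi^2/4$ and $\psi_a = -3\pi^2$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for m in range(1, n):
        total += (4 if m % 2 else 2) * f(a + m * h)
    return total * h / 3

def psi_constants(psi, dpsi, d2psi, T_s=1.0):
    """Return (psi_a, psi_b, psi_c, psi_d) of Corollary 2.3.1 for a real pulse on [0, T_s]."""
    psi_b = simpson(lambda t: psi(t) ** 2, 0.0, T_s)
    psi_c = simpson(lambda t: psi(t) * d2psi(t), 0.0, T_s)
    psi_d = simpson(lambda t: psi(t) * dpsi(t), 0.0, T_s)
    psi_a = psi_b * psi_c + abs(psi_d) ** 2
    return psi_a, psi_b, psi_c, psi_d

# Half-sine pulse sin(pi t) and its first two derivatives (T_s = 1).
w = math.pi
half_sine = (lambda t: math.sin(w * t),
             lambda t: w * math.cos(w * t),
             lambda t: -w * w * math.sin(w * t))

# Raised-cosine-shaped pulse 1 - cos(2 pi t) on [0, 1] (assumed form) and derivatives.
v = 2 * math.pi
raised_cos = (lambda t: 1 - math.cos(v * t),
              lambda t: v * math.sin(v * t),
              lambda t: v * v * math.cos(v * t))

if __name__ == "__main__":
    for name, pulse in (("half-sine", half_sine), ("raised-cosine", raised_cos)):
        print(name, psi_constants(*pulse))
```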
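In the following two subsections, we optimize the training sequence matrix $\mathbf{A}$ with respect to two performance measures:

* the outage probability that the CRB exceeds a threshold;
* the average CRB over all channel realizations.

Outage probability

In this subsection, the outage probability that the CRB is larger than the threshold $\epsilon$, i.e., $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon)$, is the performance measure with respect to which the training signals from the different transmit antennas are optimized. Write the spectral decomposition of $\mathbf{A}^H\mathbf{A}$ as $\mathbf{A}^H\mathbf{A} = \mathbf{U}\mathbf{\Lambda}\mathbf{U}^H$. Here $\mathbf{U}$ is a unitary matrix and $\mathbf{\Lambda} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_{n_t}\}$ is the diagonal matrix containing the positive eigenvalues of $\mathbf{A}^H\mathbf{A}$. The design of the optimal training scheme can now be formulated as the following optimization problem:

$$\min_{\mathbf{A}}\ \Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon)\quad\text{subject to}\quad\mathrm{tr}\{\mathbf{A}^H\mathbf{A}\} = \sum_{i=1}^{n_t}\lambda_i \leq \frac{P}{\psi_b},\ \ \lambda_i > 0,\ i = 1, \ldots, n_t, \tag{2.35}$$

where $\psi_b\,\mathrm{tr}\{\mathbf{A}^H\mathbf{A}\} \leq P$ specifies a constraint on the total transmit power.

First, we consider a simple but important case: two transmit antennas and one receive antenna. In this case, the optimization problem (2.35) can be simplified as follows. Starting from Corollary 2.3.1, we have

$$\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = \Pr\left(\mathbf{h}^H\mathbf{A}^H\mathbf{A}\mathbf{h} < \frac{-\sigma^2\psi_b}{2\epsilon\psi_a}\right). \tag{2.36}$$

With the spectral decomposition of $\mathbf{A}^H\mathbf{A}$, $\mathbf{h}^H\mathbf{A}^H\mathbf{A}\mathbf{h} = \mathbf{h}^H\mathbf{U}\mathbf{\Lambda}\mathbf{U}^H\mathbf{h} = \mathbf{h}'^H\mathbf{\Lambda}\mathbf{h}' = \lambda_1|h'_1|^2 + \lambda_2|h'_2|^2$, where $\mathbf{h}' = \mathbf{U}^H\mathbf{h}$. The vector $\mathbf{h}$ has i.i.d. complex, circularly symmetric, zero-mean Gaussian elements and $\mathbf{U}$ is unitary. Hence $\mathbf{h}'$ is also a complex Gaussian random vector with the same distribution as $\mathbf{h}$. We note that $|h'_i|^2$ has an exponential distribution with $E(|h'_i|^2) = \rho^2$. Let $X_i = |h'_i|^2/\rho^2$ and $c_i = \psi_b\lambda_i/P$ for $i = 1, 2$. Then

$$\Pr\left(\mathbf{h}^H\mathbf{A}^H\mathbf{A}\mathbf{h} < \frac{-\sigma^2\psi_b}{2\epsilon\psi_a}\right) = \Pr\left(c_1X_1 + c_2X_2 < \frac{-\sigma^2\psi_b^2}{2\epsilon\psi_aP\rho^2}\right), \tag{2.37}$$

where $X_1$ and $X_2$ are independent exponential random variables with $E(X_1) = E(X_2) = 1$. The total power constraint $\lambda_1 + \lambda_2 \leq P/\psi_b$ is equivalent to $c_1 + c_2 \leq 1$.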
Hence the optimization problem can be rewritten in the following simple form:

$$\min_{c_1, c_2}\ \Pr\left(c_1X_1 + c_2X_2 < \frac{-\sigma^2\psi_b^2}{2\epsilon\psi_aP\rho^2}\right)\quad\text{subject to}\quad c_1 + c_2 \leq 1,\ \ c_1, c_2 \geq 0. \tag{2.38}$$

To solve this problem, we employ the following result on the Schur-convexity¹ of the distribution function of a linear combination of two exponential random variables [41].

Lemma 2.3.1. Let $X_1$ and $X_2$ be independent exponential random variables with $E(X_1) = E(X_2) = 1$. Then the function $F(c_1, c_2, x) = \Pr(c_1X_1 + c_2X_2 < x)$, where $c_1 + c_2 = 1$ and $c_1, c_2 \geq 0$, is Schur-convex on $(c_1, c_2)$ if $x \leq 1$, and it is Schur-concave on $(c_1, c_2)$ if $x \geq 3/2$.

¹ A detailed description of Schur-convexity and majorization can be found in Marshall et al. [40].

Since the cost function in (2.38) is nonincreasing in $c_1$ and $c_2$, the constraint $c_1 + c_2 \leq 1$ is active at the optimum, so the lemma applies. Consider the region in which the CRB threshold satisfies $\epsilon \geq -\sigma^2\psi_b^2/(2\psi_aP\rho^2)$. In this region the cost function in (2.38) is Schur-convex on $(c_1, c_2)$. Thus the cost function is minimized if and only if $c_1 = c_2 = 1/2$, i.e., $\lambda_1 = \lambda_2 = P/(2\psi_b)$ [40]. This implies that the optimal $\mathbf{A}$ satisfies $\mathbf{A}^H\mathbf{A} = \frac{P}{2\psi_b}\mathbf{I}_2$. The optimal training scheme is summarized in the following theorem.

Theorem 2.3.2. Suppose that the CRB threshold $\epsilon \geq -\sigma^2\psi_b^2/(2\psi_aP\rho^2)$. Then the training sequence matrix $\mathbf{A}$ such that $\mathbf{A}^H\mathbf{A} = \frac{P}{2\psi_b}\mathbf{I}_2$ minimizes the outage probability of the CRB for a system with two transmit antennas and one receive antenna. That is, the optimal training sequences from different transmit antennas are orthogonal to each other and have equal powers.

The discussion of the average CRB in the next subsection (Corollary 2.3.2) shows that the value $-\sigma^2\psi_b^2/(2\psi_aP\rho^2)$ is exactly one half of the average CRB over all channel realizations. Thus, it is reasonable to consider the stated region of the CRB threshold.

It seems natural that a result analogous to Lemma 2.3.1 holds in the more general case. A proof of such a result remains open. However, there is strong evidence for the Schur-convexity of the function $F(c_1, \ldots, c_{n_t}, x) = \Pr(c_1X_1 + \cdots + c_{n_t}X_{n_t} < x)$. Here $X_i$, for $i = 1, \ldots, n_t$, are independent random variables with unit-mean exponential distribution. The following conjecture has been advanced in Merkle et al. [41], supported by strong numerical results.

Conjecture 2.3.1. The family of unimodal distribution functions $F(c_1, \ldots, c_{n_t}, x)$ is increasing with respect to the variance (i.e., Schur-convex) for small values of $x$, and decreasing (i.e., Schur-concave) for large values of $x$.

Based on the above conjecture, we conjecture that the result in Theorem 2.3.2 extends to arbitrary numbers of transmit and receive antennas:

Conjecture 2.3.2. When $\mathbf{A}^H\mathbf{A} = \frac{P}{n_t\psi_b}\mathbf{I}_{n_t}$, the outage probability of the CRB is minimized if the CRB threshold $\epsilon$ is not too small. Thus the optimal training sequences from different transmit antennas, in terms of minimizing the outage probability, are orthogonal to each other and have equal powers.

In Hassibi et al. [42], the authors assumed perfect timing estimation. They studied how to choose the training sequences for channel estimation that maximize a lower bound on the capacity of the channel learned by training. The optimal training sequences for channel estimation turned out to have the same structure as those we obtain here for timing estimation.

To illustrate our conjecture on the optimality of orthogonal sequences, we have carried out a large number of numerical calculations. In the broad region of $\epsilon$ that we are interested in, we have not observed any other scheme that achieves a lower outage probability than the orthogonal training signals. In Fig. 2.1, for instance, we plot the outage probabilities $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon)$ for a system with four transmit antennas and a single receive antenna employing different training signal sets. Since $P$ is the total transmit power constraint, the signal-to-noise ratio (SNR) $P\rho^2/\sigma^2$ here should be understood as the total SNR for the whole training period, not the SNR for one symbol period.
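Lemma 2.3.1 can be illustrated numerically. The sketch below uses the closed-form distribution function of $c_1X_1 + c_2X_2$ for independent unit-mean exponential variables:

* $1 - (c_1e^{-x/c_1} - c_2e^{-x/c_2})/(c_1 - c_2)$ when $c_1 \neq c_2$;
* the Erlang form $1 - e^{-x/c}(1 + x/c)$ when $c_1 = c_2 = c$.

The helper name and the sample splits are illustrative choices. With the full power used ($c_2 = 1 - c_1$), the equal split gives the smallest value for a threshold below 1 and the largest for a threshold above 3/2, as the lemma states.

```python
import math

def two_exp_cdf(c1: float, c2: float, x: float) -> float:
    """Pr(c1*X1 + c2*X2 < x) for independent unit-mean exponentials X1, X2 (c1, c2 > 0)."""
    if math.isclose(c1, c2, rel_tol=1e-12):
        # Equal weights: the sum is Erlang (gamma with shape 2 and scale c1).
        return 1.0 - math.exp(-x / c1) * (1.0 + x / c1)
    # Distinct weights: hypoexponential distribution of order 2.
    return 1.0 - (c1 * math.exp(-x / c1) - c2 * math.exp(-x / c2)) / (c1 - c2)

if __name__ == "__main__":
    splits = [0.5, 0.7, 0.9]  # c1, with c2 = 1 - c1 (full power used)
    for x in (0.8, 2.0):      # one threshold in each regime of Lemma 2.3.1
        print(x, [round(two_exp_cdf(c, 1.0 - c, x), 4) for c in splits])
```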
The time-domain raised-cosine pulse is used as the symbol waveform. The results in the figure suggest that the orthogonal training signals are optimal and can provide a significant performance gain over the other training signals.

In Fig. 2.2, we compare the outage performance of orthogonal training sequences for different numbers of transmit antennas. The results show that the use of multiple transmit antennas offers a substantial improvement in estimation performance over a single-antenna system. For example, at an outage probability $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = 0.1$, the two-transmit-antenna system achieves a 4 dB gain and the four-transmit-antenna system a 6 dB gain. The performance gap grows as the outage probability decreases. More precisely, the outage probability for orthogonal training signals is given by

$$\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = \Pr\left(\mathbf{h}^H\mathbf{h} < \frac{-n_t\sigma^2\psi_b^2}{2\epsilon\psi_aP}\right) = 1 - \exp\left(\frac{n_t\sigma^2\psi_b^2}{2\epsilon\psi_aP\rho^2}\right)\sum_{k=0}^{n_tn_r-1}\frac{1}{k!}\left(\frac{-n_t\sigma^2\psi_b^2}{2\epsilon\psi_aP\rho^2}\right)^k, \tag{2.39}$$

where the second equality follows from the fact that $2\mathbf{h}^H\mathbf{h}/\rho^2$ is $\chi^2$ distributed with $2n_tn_r$ degrees of freedom [43]. From (2.39), it is not hard to see that when the SNR is large, i.e., $P\rho^2/\sigma^2 \gg -n_t\psi_b^2/(2\epsilon\psi_a)$, the outage probability is approximately

$$\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) \approx \frac{1}{(n_tn_r)!}\left[\frac{-n_t\psi_b^2}{2\epsilon\psi_a}\cdot\frac{\sigma^2}{P\rho^2}\right]^{n_tn_r}. \tag{2.40}$$

Equation (2.40) indicates that the outage probability decreases with the $(n_tn_r)$th power of the reciprocal of the SNR. The exponent $n_tn_r$ is usually referred to as the diversity order of the system [43]. Thus we conclude that the use of multiple transmit and receive antennas (with orthogonal training signals) provides spatial diversity for timing estimation. It does so in the same way as space-time coding does for demodulation [1, 15].

An important remaining issue is whether the ML estimator can achieve the outage probability of the CRB. For each realization of the channel $\mathbf{h}$, the ML estimator is asymptotically efficient as the number of receive antennas $n_r$ increases. We note that $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = E_{\mathbf{h}}[\mathbb{1}(\mathrm{CRB}(\mathbf{h}) > \epsilon)]$, where $\mathbb{1}(\cdot)$ is the indicator function.
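Before turning to the ML estimator, the closed form (2.39) and its high-SNR approximation (2.40) can be checked numerically. In the sketch below, `x` stands for the normalized threshold $-n_t\sigma^2\psi_b^2/(2\epsilon\psi_aP\rho^2)$ and `n` for the diversity order $n_tn_r$. The function names are hypothetical. For small outage probabilities the code sums the tail $e^{-x}\sum_{k\geq n}x^k/k!$ directly. This avoids the cancellation that the $1 - (\cdots)$ form of (2.39) suffers in floating point.

```python
import math

def outage_orthogonal(x: float, n: int) -> float:
    """Outage probability (2.39): Pr(Y < x) for Y ~ Gamma(n, 1), n = n_t * n_r."""
    if x <= 0.0:
        return 0.0
    if x >= n:
        # Probability is not small here, so 1 - exp(-x) sum_{k<n} x^k/k! is accurate.
        head = sum(math.exp(k * math.log(x) - x - math.lgamma(k + 1)) for k in range(n))
        return 1.0 - head
    # Small-probability regime: sum exp(-x) sum_{k>=n} x^k/k! term by term.
    term = math.exp(n * math.log(x) - x - math.lgamma(n + 1))
    total = term
    k = n + 1
    while term > 1e-17 * total:
        term *= x / k
        total += term
        k += 1
    return total

def outage_high_snr(x: float, n: int) -> float:
    """High-SNR approximation (2.40): x^n / n!."""
    return x ** n / math.factorial(n)

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, [outage_orthogonal(x, n) for x in (1e-1, 1e-2, 1e-3)])
```

Dividing `x` by 10, which corresponds to a 10 dB SNR increase, lowers the outage probability by roughly a factor of $10^{n_tn_r}$ once `x` is small. This is the diversity order discussed above.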
Because the indicator function in $\Pr(\mathrm{CRB}(\mathbf{h}) > \epsilon) = E_{\mathbf{h}}[\mathbb{1}(\mathrm{CRB}(\mathbf{h}) > \epsilon)]$ is bounded, the dominated convergence theorem [44] implies that the ML estimator achieves the outage probability of the CRB asymptotically. To verify that the outage probability is a suitable performance metric when the number of receive antennas is small, we evaluate the performance of the ML estimator via Monte Carlo simulations. Fig. 2.3 plots the outage probabilities of the ML estimator obtained from simulation, together with those calculated from the CRB. The system has two transmit antennas and employs orthogonal training sequences. The ML estimator gives an outage probability very close to that predicted by the CRB, even for the small values $n_r = 1$, 2, and 4. Hence, the simulation results verify that the outage probability of the CRB is an effective performance metric even when the number of receive antennas is small.

Average CRB

In this subsection, we use the CRB averaged over the Rayleigh flat-fading channel $\mathbf{h}$ as an alternative performance measure with respect to which the training signals from the transmit antennas are optimized.

Figure 2.1: Outage probabilities achieved using different training signal sets for a system with 4 transmit and 1 receive antennas. The unit of the threshold $\epsilon$ is $T_s^2$. (Plot of outage probability versus SNR (dB). Curves for $\epsilon = 10^{-2}$ and $\epsilon = 10^{-3}$, each with $(c_1, c_2, c_3, c_4) = (1/4, 1/4, 1/4, 1/4)$, $(2/3, 1/6, 1/12, 1/12)$, $(9/10, 1/30, 1/30, 1/30)$, and $(99/100, 1/300, 1/300, 1/300)$.)

Figure 2.2: Outage probabilities achieved using orthogonal training signals for different numbers of transmit antennas. (Plot of outage probability versus SNR from 10 to 40 dB. Curves for $n_t = 1, 2, 4$ with $\epsilon = 10^{-2}, 10^{-3}, 10^{-4}$.) One receive antenna is employed.
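The unit of the threshold $\epsilon$ is $T_s^2$.

Figure 2.3: Comparison of outage probabilities of the ML estimator obtained from simulation and calculated from the CRB. The number of transmit antennas $n_t$ is 2 and $\epsilon = 10^{-4}T_s^2$. (Plot of outage probability versus SNR from 18 to 34 dB. Curves: ML and CRB for $n_r = 1, 2, 4$.)

After averaging over the Rayleigh flat-fading channel $\mathbf{h}$, the average CRB is given by

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = \frac{-\sigma^2\psi_b}{2\psi_a}E_{\mathbf{h}}\left[\frac{1}{\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}}\right]. \tag{2.41}$$

The design of the optimal training scheme can now be formulated as the following optimization problem:

$$\min_{\mathbf{A}}\ E_{\mathbf{h}}\left[\frac{1}{\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}}\right]\quad\text{subject to}\quad\mathrm{tr}\{\mathbf{A}^H\mathbf{A}\} = \sum_{i=1}^{n_t}\lambda_i \leq \frac{P}{\psi_b},\ \ \lambda_i > 0,\ i = 1, \ldots, n_t. \tag{2.42}$$

The following theorem specifies the optimal training sequences that minimize the average CRB.

Theorem 2.3.3. When $\mathbf{A}^H\mathbf{A} = \frac{P}{\psi_bn_t}\mathbf{I}_{n_t}$, the average CRB over the Rayleigh flat-fading channel $\mathbf{h}$ is minimized. That is, the optimal training sequences from different transmit antennas, in terms of minimizing the average CRB, are orthogonal to each other and have equal powers.

Proof. Let $\mathbf{W} = \mathbf{U}'\mathbf{\Lambda}'\mathbf{U}'^H$, where $\mathbf{\Lambda}' = \mathrm{diag}\{\lambda'_1, \lambda'_2, \ldots, \lambda'_{n_tn_r}\}$ contains the positive eigenvalues of the Hermitian matrix $\mathbf{W}$ and $\mathbf{U}'$ is a unitary matrix. Consider the following optimization problem:

$$\min_{\mathbf{W}}\ E\left[\frac{1}{\mathbf{h}^H\mathbf{W}\mathbf{h}}\right]\quad\text{subject to}\quad\mathrm{tr}\{\mathbf{W}\} = \sum_{i=1}^{n_tn_r}\lambda'_i \leq \frac{n_rP}{\psi_b},\ \ \lambda'_i > 0,\ i = 1, \ldots, n_tn_r. \tag{2.43}$$

Note that

$$E\left[\frac{1}{\mathbf{h}^H\mathbf{W}\mathbf{h}}\right] = E\left[\frac{1}{\mathbf{h}^H\mathbf{U}'\mathbf{\Lambda}'\mathbf{U}'^H\mathbf{h}}\right] = E\left[\frac{1}{\mathbf{h}'^H\mathbf{\Lambda}'\mathbf{h}'}\right] = E\left[\frac{1}{\sum_{i=1}^{n_tn_r}\lambda'_i|h'_i|^2}\right], \tag{2.44}$$

where $\mathbf{h}' = \mathbf{U}'^H\mathbf{h}$. As before, $\mathbf{h}'$ is a complex Gaussian random vector with the same distribution as $\mathbf{h}$. Let $g(\boldsymbol{\lambda}') = 1/\sum_{i=1}^{n_tn_r}\lambda'_ix_i$, where the $x_i > 0$ are fixed constants. We study the convexity of $g$. We have

$$\frac{\partial g}{\partial\lambda'_i} = \frac{-x_i}{\left(\sum_k\lambda'_kx_k\right)^2}\quad\text{and}\quad\frac{\partial^2 g}{\partial\lambda'_i\,\partial\lambda'_j} = \frac{2x_ix_j}{\left(\sum_k\lambda'_kx_k\right)^3}.$$

Then the Hessian $\mathbf{G}(\boldsymbol{\lambda}')$ of $g$ is

$$\mathbf{G}(\boldsymbol{\lambda}') = \frac{2}{\left(\sum_k\lambda'_kx_k\right)^3}\mathbf{x}\mathbf{x}^T,\quad\mathbf{x} = [x_1, \ldots, x_{n_tn_r}]^T.$$

Every row of $\mathbf{G}(\boldsymbol{\lambda}')$ is a multiple of $\mathbf{x}^T$, so $\mathrm{rank}(\mathbf{G}(\boldsymbol{\lambda}')) = 1$. $\mathbf{G}(\boldsymbol{\lambda}')$ has only one nonzero eigenvalue, $2\sum_ix_i^2/(\sum_k\lambda'_kx_k)^3 > 0$, since the sum of the eigenvalues of a matrix equals its trace.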
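All other eigenvalues are zero. Hence, the Hessian $\mathbf{G}(\boldsymbol{\lambda}')$ is positive semidefinite, and $g(\boldsymbol{\lambda}')$ is a convex function on $\mathbb{R}_+^{n_tn_r} = \{(\lambda'_1, \ldots, \lambda'_{n_tn_r}) : \lambda'_i \geq 0,\ i = 1, \ldots, n_tn_r\}$.

To solve the above optimization problem, we employ the following result from the theory of majorization [40]. We first introduce the fundamental concepts of majorization required in the derivation of the optimal training scheme. For any $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$, let $x_{[1]} \geq \cdots \geq x_{[n]}$ denote the components of $\mathbf{x}$ in decreasing order.

Definition 2.3.1. For vectors $\mathbf{x}, \mathbf{y} \in \mathcal{A} \subset \mathbb{R}^n$, $\mathbf{y}$ majorizes $\mathbf{x}$ on $\mathcal{A}$ if

$$\sum_{i=1}^kx_{[i]} \leq \sum_{i=1}^ky_{[i]},\ k = 1, \ldots, n-1,\quad\text{and}\quad\sum_{i=1}^nx_{[i]} = \sum_{i=1}^ny_{[i]}.$$

The notation $\mathbf{x} \prec \mathbf{y}$ means that $\mathbf{x}$ is majorized by $\mathbf{y}$ on $\mathcal{A}$, or equivalently that $\mathbf{y}$ majorizes $\mathbf{x}$ on $\mathcal{A}$. Majorization makes precise the vague notion that the components of a vector $\mathbf{x}$ are less spread out, or more nearly equal, than the components of a vector $\mathbf{y}$.

Definition 2.3.2. A real-valued function $f$ defined on a set $\mathcal{A} \subset \mathbb{R}^n$ is said to be Schur-convex on $\mathcal{A}$ if

$$\mathbf{x} \prec \mathbf{y}\ \text{on}\ \mathcal{A}\ \Rightarrow\ f(\mathbf{x}) \leq f(\mathbf{y}).$$

$f$ is Schur-concave if the above inequality is reversed. It follows that $f$ is Schur-convex on $\mathcal{A}$ if and only if $-f$ is Schur-concave on $\mathcal{A}$.

Lemma 2.3.2. If $X_1, \ldots, X_n$ are exchangeable random variables and the multivariable, single-valued function $g$ is a symmetric Borel-measurable convex function, then the function $f(a_1, \ldots, a_n) = E[g(a_1X_1, \ldots, a_nX_n)]$ is Schur-convex.

Since the $|h'_i|^2$ are i.i.d., they are exchangeable random variables. The function $(y_1, \ldots, y_{n_tn_r}) \mapsto 1/\sum_iy_i$ is symmetric, Borel-measurable, and (by the Hessian computation above) convex on the positive orthant. By the lemma, the function $E\left[1/\sum_{i=1}^{n_tn_r}\lambda'_i|h'_i|^2\right]$ is therefore Schur-convex in $\boldsymbol{\lambda}'$. Moreover, $(P/(\psi_bn_t), \ldots, P/(\psi_bn_t))$ is majorized by $(\lambda'_1, \ldots, \lambda'_{n_tn_r})$ whenever $\lambda'_i \geq 0$ and $\sum_i\lambda'_i = n_rP/\psi_b$. Hence we know [40] that $E\left[1/\sum_{i=1}^{n_tn_r}\lambda'_i|h'_i|^2\right]$ is minimized with $\lambda'_1 = \lambda'_2 = \cdots = \lambda'_{n_tn_r} = \frac{n_rP}{\psi_bn_tn_r} = \frac{P}{\psi_bn_t}$. The constraint $\sum_i\lambda'_i \leq n_rP/\psi_b$ is active at the optimum because the cost decreases as any $\lambda'_i$ increases. This choice of $\lambda'_i$, $i = 1, \ldots, n_tn_r$, also satisfies the constraints of the minimization problem in (2.42). Indeed, $\mathbf{W} = \frac{P}{\psi_bn_t}\mathbf{I}_{n_tn_r} = \mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r}$ for $\mathbf{A}^H\mathbf{A} = \frac{P}{\psi_bn_t}\mathbf{I}_{n_t}$. Thus, it is also a solution to the original minimization problem.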
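Definition 2.3.1 is easy to check mechanically. The helper below is an illustrative sketch; the name `is_majorized` and the tolerance are our own choices. It tests whether $\mathbf{x} \prec \mathbf{y}$ by comparing partial sums of the decreasingly sorted components. It confirms that the equal allocation is majorized by other allocations with the same total, which is the fact used in the proof of Theorem 2.3.3.

```python
def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y (x ≺ y) in the sense of Definition 2.3.1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    sx = sy = 0.0
    for k in range(len(xs) - 1):
        sx += xs[k]
        sy += ys[k]
        if sx > sy + tol:  # a partial sum of x exceeds the matching partial sum of y
            return False
    # The full sums must agree.
    return abs(sum(xs) - sum(ys)) <= tol

if __name__ == "__main__":
    equal = [0.25] * 4
    for alloc in ([0.4, 0.3, 0.2, 0.1], [1.0, 0.0, 0.0, 0.0], equal):
        print(alloc, is_majorized(equal, alloc))
```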
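Consequently, the optimal training sequence matrix $\mathbf{A}$ satisfies $\mathbf{A}^H\mathbf{A} = \frac{P}{\psi_bn_t}\mathbf{I}_{n_t}$. This implies that the training sequences from different transmit antennas are orthogonal to each other and have equal powers. ∎

With the optimal training sequences, we can give an explicit expression for the average CRB, as described in the next corollary.

Corollary 2.3.2 (Average CRB). Using the optimal training scheme, the average CRB over the Rayleigh flat-fading channel $\mathbf{h}$ is given by

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = \frac{-\sigma^2\psi_b^2}{2\left(n_r - \frac{1}{n_t}\right)\psi_aP\rho^2} \tag{2.45}$$

when $n_tn_r \geq 2$.

Proof. From Theorem 2.3.3 and its derivation, the average CRB under the optimal training scheme is

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = \frac{-\sigma^2\psi_b}{2\psi_a}E\left[\frac{1}{\frac{P}{\psi_bn_t}\sum_{i=1}^{n_tn_r}|h'_i|^2}\right] = \frac{-n_t\sigma^2\psi_b^2}{2\psi_aP}E\left[\frac{1}{\sum_{i=1}^{n_tn_r}|h'_i|^2}\right], \tag{2.46}$$

where the $h'_i$ are i.i.d. complex circularly symmetric Gaussian random variables with the $\mathcal{CN}(0, \rho^2)$ distribution. Let $Y = \sum_{i=1}^{n_tn_r}|h'_i|^2$. Then $Y$ is (scaled) $\chi^2$ distributed with probability density function (p.d.f.)

$$f_Y(y) = \frac{1}{\rho^{2n_tn_r}(n_tn_r-1)!}\,y^{n_tn_r-1}e^{-y/\rho^2},\quad y \geq 0. \tag{2.47}$$

Let $Z = 1/Y$. The p.d.f. of $Z$ is $f_Z(z) = \frac{1}{z^2}f_Y\left(\frac{1}{z}\right) = \frac{e^{-1/(\rho^2z)}}{\rho^{2n_tn_r}(n_tn_r-1)!\,z^{n_tn_r+1}}$ for $z > 0$. The expectation of $Z$ can be computed as

$$E(Z) = \int_0^\infty zf_Z(z)\,dz = \frac{1}{\rho^{2n_tn_r}(n_tn_r-1)!}\int_0^\infty e^{-1/(\rho^2z)}z^{-n_tn_r}\,dz. \tag{2.48}$$

When $n_tn_r \geq 2$, [45] gives

$$E(Z) = \frac{\rho^{2(n_tn_r-1)}(n_tn_r-2)!}{\rho^{2n_tn_r}(n_tn_r-1)!} = \frac{1}{\rho^2(n_tn_r-1)}. \tag{2.49}$$

Then from (2.46) and (2.49), the average CRB can be written in the simplified form

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = \frac{-n_t\sigma^2\psi_b^2}{2(n_tn_r-1)\psi_aP\rho^2} = \frac{-\sigma^2\psi_b^2}{2\left(n_r - \frac{1}{n_t}\right)\psi_aP\rho^2}. \tag{2.50}$$

∎

With the optimal (orthogonal) training sequences, the average CRB is a simple function of four quantities: the constant $-\psi_b^2/\psi_a$, which depends only on the symbol waveform $\psi(t)$; the SNR $P\rho^2/\sigma^2$; the number of transmit antennas $n_t$; and the number of receive antennas $n_r$. In the limit of large $n_t$ or large $n_r$, the average CRB can be approximated as

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] \approx \frac{-\sigma^2\psi_b^2}{2n_r\psi_aP\rho^2} = \frac{\sigma^2\psi_b^2}{2n_r\left(-\psi_b\psi_c - |\psi_d|^2\right)P\rho^2}, \tag{2.51}$$

which is inversely proportional to the number of receive antennas $n_r$. When $\psi(t)$ is symmetric about $T_s/2$, as for the time-domain raised-cosine pulse and the half-sine pulse, $\psi_d$ becomes zero.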
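A Monte Carlo sketch can confirm both Theorem 2.3.3 and the closed form (2.49)-(2.50). The function name, seed, sample size, and weights below are illustrative assumptions. The code draws each $|h'_i|^2$ as an exponential variable with mean $\rho^2$, which is valid for circularly symmetric $\mathcal{CN}(0, \rho^2)$ entries. It then estimates $E[1/\sum_i\lambda'_i|h'_i|^2]$ for an equal allocation and for a more spread-out allocation with the same total. The equal-allocation estimate should match $1/(\rho^2(n_tn_r - 1))$. The unequal one should be larger, as Schur-convexity predicts.

```python
import random

def mean_inverse_quadratic_form(weights, rho2=1.0, trials=100_000, seed=1):
    """Monte Carlo estimate of E[1 / sum_i w_i |h_i|^2] with |h_i|^2 ~ Exp(mean rho2)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        q = sum(w * rng.expovariate(1.0 / rho2) for w in weights)
        acc += 1.0 / q
    return acc / trials

if __name__ == "__main__":
    n_t, n_r, rho2 = 2, 2, 1.0
    n = n_t * n_r
    equal = [1.0] * n               # equal eigenvalues (orthogonal training)
    unequal = [2.5, 1.0, 0.4, 0.1]  # same total, more spread out
    print(mean_inverse_quadratic_form(equal, rho2), 1.0 / (rho2 * (n - 1)))  # vs. (2.49)
    print(mean_inverse_quadratic_form(unequal, rho2))  # larger, per Theorem 2.3.3
```

Multiplying the equal-allocation estimate by $-n_t\sigma^2\psi_b^2/(2\psi_aP)$, as in (2.46), gives a Monte Carlo estimate of the average CRB (2.45).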
With $\psi_d = 0$, the average CRB for the estimation of the delay $\tau$ with orthogonal training signals can be written as

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = \frac{1}{2\left(n_r - \frac{1}{n_t}\right)}\cdot\frac{\sigma^2}{\beta^2P\rho^2}, \tag{2.52}$$

where

$$\beta = \left[\frac{\int_{-\infty}^{\infty}\omega^2|\Psi(\omega)|^2d\omega}{\int_{-\infty}^{\infty}|\Psi(\omega)|^2d\omega}\right]^{1/2}$$

is known as the root-mean-square bandwidth [37] of the symbol waveform. Here $\Psi(\omega)$ is the Fourier transform of $\psi(t)$. We note that the average CRB can be decreased by increasing the bandwidth of the symbol waveform.

As before, we would like to know whether the ML estimator can achieve the average CRB. $\mathrm{CRB}(\mathbf{h})$ is proportional to $1/\mathbf{h}^H(\mathbf{A}^H\mathbf{A}\otimes\mathbf{I}_{n_r})\mathbf{h}$, which is not a bounded function of $\mathbf{h}$. Thus, unlike the outage probability of the CRB, the average CRB may not be achieved asymptotically by the ML estimator (see further discussion in Section 2.5.1). However, the average CRB provides a lower bound on the variance of any unbiased timing estimator averaged over the channel realizations.

Again, we employ Monte Carlo simulations to evaluate the performance of the ML estimator with a small number of receive antennas. In Fig. 2.4, we compare the mean squared error (MSE) achieved by the ML estimator with the average CRB given by (2.45). The system has two transmit antennas and employs orthogonal training sequences. For a single-receive-antenna system, the performance of the ML estimator deviates significantly from the average CRB. This is due to events in which all the channel coefficients are simultaneously very small, which makes the estimation performance very poor. The large estimation errors caused by these events dominate the MSE of the ML estimator. The figure shows that the effect of these events diminishes as the number of receive antennas or the SNR increases. As the number of receive antennas increases, the error-dominating events become rarer. As the SNR increases, the estimation errors, and hence the effect of the error-dominating events, become smaller. For a reasonably small value of $n_r$, e.g., 4, and a reasonably high SNR, e.g., 20 dB, the average CRB is still a rather appropriate performance metric.

Figure 2.4: Comparison of the MSE of the ML estimator obtained from simulation and the average CRB. The number of transmit antennas $n_t$ is 2. The unit on the vertical axis is $T_s^2$. (Horizontal axis: SNR (dB).)

2.4 Timing Estimation with Random Channel

Recently, differential space-time coding schemes [46, 47, 48] have been developed in which channel estimates are not required at the receiver. In this situation, we only need to consider the estimation of the delay $\tau$. A reasonable model for this scenario is that the channel is random with known statistics.

2.4.1 ML Estimator

Recall that the conditional likelihood function $p(\tilde{\mathbf{r}}(\tau)|\tau, \tilde{\mathbf{h}})$ of $\tilde{\mathbf{r}}(\tau)$, in terms of real vectors and matrices, is given by (2.8). With the assumption of i.i.d. Rayleigh flat-fading channels between the transmit and receive antennas, we have $E[\mathbf{h}\mathbf{h}^H] = \rho^2\mathbf{I}_{n_tn_r}$ and $E[\tilde{\mathbf{h}}\tilde{\mathbf{h}}^T] = \frac{\rho^2}{2}\mathbf{I}_{2n_tn_r}$. The joint probability density function of the channel vector $\tilde{\mathbf{h}}$ is

$$p(\tilde{\mathbf{h}}) = \frac{1}{(\pi\rho^2)^{n_tn_r}}\exp\left\{-\frac{\tilde{\mathbf{h}}^T\tilde{\mathbf{h}}}{\rho^2}\right\}. \tag{2.53}$$

We can average $p(\tilde{\mathbf{r}}(\tau)|\tau, \tilde{\mathbf{h}})$ over all realizations of $\tilde{\mathbf{h}}$ to obtain the unconditional likelihood function

$$p(\tilde{\mathbf{r}}(\tau)|\tau) = \int p(\tilde{\mathbf{r}}(\tau)|\tau, \tilde{\mathbf{h}})\,p(\tilde{\mathbf{h}})\,d\tilde{\mathbf{h}} = \mathrm{const}\times\frac{1}{\sqrt{\det\left(2\tilde{\mathbf{C}} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)}}\exp\left\{\frac{1}{\sigma^2}\tilde{\mathbf{r}}(\tau)^T\left(2\tilde{\mathbf{C}} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\tilde{\mathbf{r}}(\tau)\right\}, \tag{2.54}$$

where we have used the integral result from Cramer [49, 11.12.1a]. The natural logarithm of $p(\tilde{\mathbf{r}}(\tau)|\tau)$ is the log-likelihood function

$$\ln p(\tilde{\mathbf{r}}(\tau)|\tau) = \mathrm{const} + \frac{1}{\sigma^2}\tilde{\mathbf{r}}(\tau)^T\left(2\tilde{\mathbf{C}} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\tilde{\mathbf{r}}(\tau). \tag{2.55}$$

By using the relationship between real and complex matrices [36], the log-likelihood function can be written in terms of complex quantities as

$$\ln p(\mathbf{r}(\tau)|\tau) = \mathrm{const} + \frac{1}{\sigma^2}\mathbf{r}(\tau)^T\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{r}(\tau)^*. \tag{2.56}$$

Hence the ML estimator of the delay $\tau$ is given by

$$\hat{\tau}_{ML} = \arg\max_{\tau}\ p(\mathbf{r}(\tau)|\tau) = \arg\max_{\tau}\ \mathbf{r}(\tau)^T\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{r}(\tau)^*. \tag{2.57}$$

We assume that $\sigma^2/\rho^2$ is known to the receiver for the implementation of the ML estimator. We note that the matrix $\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}$ is always invertible.
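Hence, unlike in Section 2.3, $\mathbf{C}$ may be singular, which implies that the training signals from different transmit antennas can be correlated with each other.

2.4.2 Cramer-Rao Bound

The CRB for timing estimation under the random channel model is summarized in the following theorem.

Theorem 2.4.1 (Cramer-Rao bound). Suppose that the first and second derivatives of the training signals $s_k(t)$ exist and are uniformly continuous on $[0, T_0]$. Together with the standard regularity conditions [36, 37], the Cramer-Rao bound for the estimation of the delay $\tau$ over the i.i.d. Rayleigh flat-fading channel model is given by

$$\mathrm{CRB} = \frac{-\sigma^2}{2n_r\,\mathrm{tr}\{\mathbf{D}^{-1}\mathbf{G}\}}, \tag{2.58}$$

where $\mathbf{D} = \mathbf{F} + \frac{\sigma^2}{\rho^2}\mathbf{I}$ and the $(i, j)$th element of $\mathbf{G}$ is

$$G_{ij} = \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^{(1)*}(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j^{(1)}(t)\,dt\right) + \frac{\rho^2}{2}\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^*(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j^{(2)}(t)\,dt\right) + \frac{\rho^2}{2}\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^{(2)*}(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j(t)\,dt\right) \tag{2.59}$$

for $i, j = 1, 2, \ldots, n_t$.

Proof. To derive the CRB, we start from the log-likelihood function in (2.56),

$$\ln p(\mathbf{r}(\tau)|\tau) = \mathrm{const} + \frac{1}{\sigma^2}\mathbf{r}(\tau)^T\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{r}(\tau)^*.$$

The second derivative of the log-likelihood function with respect to $\tau$ is

$$\frac{\partial^2\ln p(\mathbf{r}(\tau)|\tau)}{\partial\tau^2} = \frac{2}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\frac{\partial\mathbf{r}(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}(\tau)^T}{\partial\tau}\right\} + \frac{1}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\left[\mathbf{r}(\tau)^*\frac{\partial^2\mathbf{r}(\tau)^T}{\partial\tau^2} + \frac{\partial^2\mathbf{r}(\tau)^*}{\partial\tau^2}\mathbf{r}(\tau)^T\right]\right\}.$$

Its expectation is

$$E\left[\frac{\partial^2\ln p(\mathbf{r}(\tau)|\tau)}{\partial\tau^2}\right] = \frac{2}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}E\left[\frac{\partial\mathbf{r}(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}(\tau)^T}{\partial\tau}\right]\right\} + \frac{1}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\left(E\left[\mathbf{r}(\tau)^*\frac{\partial^2\mathbf{r}(\tau)^T}{\partial\tau^2}\right] + E\left[\frac{\partial^2\mathbf{r}(\tau)^*}{\partial\tau^2}\mathbf{r}(\tau)^T\right]\right)\right\}. \tag{2.60}$$

Write $\frac{\partial\mathbf{r}(\tau)}{\partial\tau} = \left[\frac{\partial\mathbf{r}_1(\tau)^T}{\partial\tau}, \ldots, \frac{\partial\mathbf{r}_{n_t}(\tau)^T}{\partial\tau}\right]^T$, where the $i$th block can be computed as

$$\frac{\partial\mathbf{r}_i(\tau)}{\partial\tau} = -\sum_{k=1}^{n_t}\mathbf{h}_k^*\int_0^{T_0}s_k^*(t-\tau)\,s_i^{(1)}(t-\tau)\,dt - \int_0^{T_0}\mathbf{n}^*(t)\,s_i^{(1)}(t-\tau)\,dt.$$

Then

$$\frac{\partial\mathbf{r}_i(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}_j(\tau)^T}{\partial\tau} = \left[\sum_{k=1}^{n_t}\mathbf{h}_k\int_0^{T_0}s_k(t-\tau)s_i^{(1)*}(t-\tau)\,dt + \int_0^{T_0}\mathbf{n}(t)s_i^{(1)*}(t-\tau)\,dt\right]\left[\sum_{k=1}^{n_t}\mathbf{h}_k^H\int_0^{T_0}s_k^*(t-\tau)s_j^{(1)}(t-\tau)\,dt + \int_0^{T_0}\mathbf{n}^H(t)s_j^{(1)}(t-\tau)\,dt\right].$$

Recall that the channel gain vector $\mathbf{h}$ is assumed to have i.i.d. complex circularly symmetric Gaussian elements. That is, $E[\mathbf{h}_k\mathbf{h}_k^H] = \rho^2\mathbf{I}_{n_r}$ and $E[\mathbf{h}_k\mathbf{h}_k^T] = \mathbf{0}$, and $E[\mathbf{h}_i\mathbf{h}_j^H] = E[\mathbf{h}_i\mathbf{h}_j^T] = \mathbf{0}$ for $i \neq j$. Thus

$$E\left[\frac{\partial\mathbf{r}_i(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}_j(\tau)^T}{\partial\tau}\right] = \left\{\rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^{(1)*}(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j^{(1)}(t)\,dt\right) + \sigma^2\int_0^{T_0}s_i^{(1)*}(t)s_j^{(1)}(t)\,dt\right\}\mathbf{I}_{n_r}. \tag{2.61}$$

As a result, $E\left[\frac{\partial\mathbf{r}(\tau)^*}{\partial\tau}\frac{\partial\mathbf{r}(\tau)^T}{\partial\tau}\right] = \mathbf{P}\otimes\mathbf{I}_{n_r}$, where the $(i, j)$th element of $\mathbf{P}$ is given by

$$P_{ij} = \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^{(1)*}(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j^{(1)}(t)\,dt\right) + \sigma^2\int_0^{T_0}s_i^{(1)*}(t)s_j^{(1)}(t)\,dt.$$

Similarly, we have $E\left[\mathbf{r}(\tau)^*\frac{\partial^2\mathbf{r}(\tau)^T}{\partial\tau^2}\right] + E\left[\frac{\partial^2\mathbf{r}(\tau)^*}{\partial\tau^2}\mathbf{r}(\tau)^T\right] = \mathbf{Q}\otimes\mathbf{I}_{n_r}$, where the $(i, j)$th element of $\mathbf{Q}$ is given by

$$Q_{ij} = \rho^2\sum_{k=1}^{n_t}\left[\left(\int_0^{T_0}s_k(t)s_i^*(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j^{(2)}(t)\,dt\right) + \left(\int_0^{T_0}s_k(t)s_i^{(2)*}(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j(t)\,dt\right)\right] + \sigma^2\left[\int_0^{T_0}s_i^*(t)s_j^{(2)}(t)\,dt + \int_0^{T_0}s_i^{(2)*}(t)s_j(t)\,dt\right]$$

for $i, j = 1, 2, \ldots, n_t$. Let $\mathbf{D} = \mathbf{F} + \frac{\sigma^2}{\rho^2}\mathbf{I}$, so that $\mathbf{C} + \frac{\sigma^2}{\rho^2}\mathbf{I} = \mathbf{D}\otimes\mathbf{I}_{n_r}$. Then

$$E\left[\frac{\partial^2\ln p(\mathbf{r}(\tau)|\tau)}{\partial\tau^2}\right] = \frac{2}{\sigma^2}\mathrm{tr}\left\{(\mathbf{D}\otimes\mathbf{I}_{n_r})^{-1}(\mathbf{P}\otimes\mathbf{I}_{n_r})\right\} + \frac{1}{\sigma^2}\mathrm{tr}\left\{(\mathbf{D}\otimes\mathbf{I}_{n_r})^{-1}(\mathbf{Q}\otimes\mathbf{I}_{n_r})\right\} = \frac{2}{\sigma^2}\mathrm{tr}\left\{\left(\mathbf{D}^{-1}\left(\mathbf{P} + \tfrac{1}{2}\mathbf{Q}\right)\right)\otimes\mathbf{I}_{n_r}\right\} = \frac{2n_r}{\sigma^2}\mathrm{tr}\left\{\mathbf{D}^{-1}\left(\mathbf{P} + \tfrac{1}{2}\mathbf{Q}\right)\right\}, \tag{2.62}$$

The second equality uses $(\mathbf{A}\otimes\mathbf{B})^{-1} = \mathbf{A}^{-1}\otimes\mathbf{B}^{-1}$ and the property $(\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D}) = (\mathbf{A}\mathbf{C})\otimes(\mathbf{B}\mathbf{D})$ [50]. The third equality uses $\mathrm{tr}\{\mathbf{X}\otimes\mathbf{I}_{n_r}\} = n_r\,\mathrm{tr}\{\mathbf{X}\}$.

Let $\mathbf{G} = \mathbf{P} + \frac{1}{2}\mathbf{Q}$. Since $\int_0^{T_0}s_i^*(t-\tau)s_j(t-\tau)\,dt = F_{ij}$ is constant in $\tau$, differentiating it twice with respect to $\tau$ gives

$$\int_0^{T_0}s_i^{(2)*}(t)s_j(t)\,dt + 2\int_0^{T_0}s_i^{(1)*}(t)s_j^{(1)}(t)\,dt + \int_0^{T_0}s_i^*(t)s_j^{(2)}(t)\,dt = 0.$$

By this equality, the $\sigma^2$ terms in $P_{ij} + \frac{1}{2}Q_{ij}$ cancel, and the $(i, j)$th element of $\mathbf{G}$ becomes

$$G_{ij} = P_{ij} + \tfrac{1}{2}Q_{ij} = \rho^2\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^{(1)*}(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j^{(1)}(t)\,dt\right) + \frac{\rho^2}{2}\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^*(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j^{(2)}(t)\,dt\right) + \frac{\rho^2}{2}\sum_{k=1}^{n_t}\left(\int_0^{T_0}s_k(t)s_i^{(2)*}(t)\,dt\right)\left(\int_0^{T_0}s_k^*(t)s_j(t)\,dt\right) \tag{2.63}$$

for $i, j = 1, 2, \ldots, n_t$. As a result, $\mathbf{G}$ does not depend on the noise variance $\sigma^2$. The CRB of the timing estimation is

$$\mathrm{CRB} = \frac{1}{-E\left[\partial^2\ln p(\mathbf{r}(\tau)|\tau)/\partial\tau^2\right]} = \frac{-\sigma^2}{2n_r\,\mathrm{tr}\{\mathbf{D}^{-1}\mathbf{G}\}}. \tag{2.64}$$

∎

2.4.3 Optimal Training Scheme

In this section, we impose Assumptions 1 and 2 of Section 2.3 on the form of the training signals. With these two assumptions, $G_{ij}$ simplifies to

$$G_{ij} = \rho^2\left(|\psi_d|^2 + \psi_b\psi_c\right)\sum_{k=1}^{n_t}\mathbf{a}_i^H\mathbf{a}_k\mathbf{a}_k^H\mathbf{a}_j = \psi_a\rho^2\,\mathbf{a}_i^H\left(\sum_{k=1}^{n_t}\mathbf{a}_k\mathbf{a}_k^H\right)\mathbf{a}_j.$$

Hence we have $\mathbf{G} = \psi_a\rho^2\mathbf{A}^H\mathbf{A}\mathbf{A}^H\mathbf{A}$. Thus the CRB for timing estimation can be simplified as stated in the following corollary.

Corollary 2.4.1.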
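Given Assumptions 1 and 2, the Cramer-Rao bound for the estimation of the delay $\tau$ over the i.i.d. Rayleigh flat-fading channel model reduces to

$$\mathrm{CRB} = \frac{-\sigma^2}{2n_r\psi_a\rho^2\,\mathrm{tr}\left\{\left(\psi_b\mathbf{A}^H\mathbf{A} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{A}^H\mathbf{A}\mathbf{A}^H\mathbf{A}\right\}}. \tag{2.65}$$

Moreover, in terms of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_{n_t}$ of $\mathbf{A}^H\mathbf{A}$, we have

$$\mathrm{tr}\left\{\left(\psi_b\mathbf{A}^H\mathbf{A} + \frac{\sigma^2}{\rho^2}\mathbf{I}\right)^{-1}\mathbf{A}^H\mathbf{A}\mathbf{A}^H\mathbf{A}\right\} = \sum_{i=1}^{n_t}\frac{\lambda_i^2}{\psi_b\lambda_i + \frac{\sigma^2}{\rho^2}}. \tag{2.66}$$

Thus the minimization of the CRB is equivalent to the following optimization problem:

$$\max_{\lambda_1, \ldots, \lambda_{n_t}}\ \sum_{i=1}^{n_t}\frac{\lambda_i^2}{\psi_b\lambda_i + \frac{\sigma^2}{\rho^2}}\quad\text{subject to}\quad\sum_{i=1}^{n_t}\lambda_i \leq \frac{P}{\psi_b},\ \ \lambda_i \geq 0,\ i = 1, \ldots, n_t. \tag{2.67}$$

It can be easily verified that the cost function $\sum_i\lambda_i^2/(\psi_b\lambda_i + \sigma^2/\rho^2)$ is convex on $(\lambda_1, \ldots, \lambda_{n_t})$. A convex function attains its maximum over this feasible set at a vertex. Hence the following theorem specifies the optimal training sequences [51].

Theorem 2.4.2. The CRB is minimized by choosing the training sequence matrix $\mathbf{A}$ such that $\lambda_1 = P/\psi_b$ and $\lambda_2 = \cdots = \lambda_{n_t} = 0$. That is, the optimal training sequences from different transmit antennas are perfectly correlated.

We note that the rank of the optimal training sequence matrix $\mathbf{A}$ is 1. This implies that we can choose an arbitrary subset of transmit antennas to transmit the training signals, as long as the training sequences from the chosen transmit antennas are perfectly correlated with each other. A common choice is to use the same training sequence and assign the power evenly to each transmit antenna. With the optimal choice of training sequences, the corresponding minimum CRB is

$$\mathrm{CRB} = \frac{-\sigma^2\psi_b^2}{2n_r\psi_aP\rho^2}\left(1 + \frac{\sigma^2}{P\rho^2}\right). \tag{2.68}$$

On the other hand, when orthogonal training signals are employed, i.e., $\lambda_1 = \cdots = \lambda_{n_t} = \frac{P}{n_t\psi_b}$, the CRB is maximized, taking the value

$$\mathrm{CRB} = \frac{-\sigma^2\psi_b^2}{2n_r\psi_aP\rho^2}\left(1 + \frac{n_t\sigma^2}{P\rho^2}\right). \tag{2.69}$$

This contrasts with the previous case of joint channel and delay estimation, where orthogonal training sequences are optimal. Here they are the worst in terms of the CRB for estimating the delay under the random channel model. Fig. 2.5 compares the CRB of the system with perfectly correlated training sequences to that of the system with orthogonal training sequences.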
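The comparison in Fig. 2.5 follows directly from (2.65)-(2.69). The sketch below evaluates the CRB of (2.65) through the eigenvalue form (2.66) and checks the results against the closed forms (2.68) and (2.69). It uses two allocations:

* a perfectly correlated one ($\lambda_1 = P/\psi_b$, all others zero);
* an orthogonal one (equal $\lambda_i$).

The function names and parameter values are illustrative assumptions: a half-sine pulse with $T_s = 1$ (so $\psi_b = 1/2$ and $\psi_a = -\pi^2/4$), $\sigma^2 = 0.1$, $\rho^2 = 1$, $n_r = 2$, and $P = 1$.

```python
import math

def crb_random_channel(eigs, psi_a, psi_b, sigma2, rho2, n_r):
    """CRB (2.65) written through the eigenvalue sum (2.66) of A^H A."""
    s = sum(lam * lam / (psi_b * lam + sigma2 / rho2) for lam in eigs)
    return -sigma2 / (2.0 * n_r * psi_a * rho2 * s)

def crb_correlated(P, psi_a, psi_b, sigma2, rho2, n_r):
    """Closed form (2.68): perfectly correlated training, lambda_1 = P / psi_b."""
    return -sigma2 * psi_b ** 2 / (2.0 * n_r * psi_a * P * rho2) * (1.0 + sigma2 / (P * rho2))

def crb_orthogonal(n_t, P, psi_a, psi_b, sigma2, rho2, n_r):
    """Closed form (2.69): orthogonal training, lambda_i = P / (n_t psi_b)."""
    return -sigma2 * psi_b ** 2 / (2.0 * n_r * psi_a * P * rho2) * (1.0 + n_t * sigma2 / (P * rho2))

if __name__ == "__main__":
    params = dict(psi_a=-math.pi ** 2 / 4, psi_b=0.5, sigma2=0.1, rho2=1.0, n_r=2)
    P = 1.0
    for n_t in (1, 2, 4):
        corr = crb_random_channel([P / params["psi_b"]] + [0.0] * (n_t - 1), **params)
        orth = crb_random_channel([P / (n_t * params["psi_b"])] * n_t, **params)
        print(n_t, corr, orth)  # corr is the same for every n_t; orth grows with n_t
```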
Note that the CRB of the system with perfectly correlated training sequences is the same for any number of transmit antennas. The performance gain achieved by the optimal scheme is significant when the SNR is low. For any fixed $n_t$, the performance gap vanishes as the SNR becomes sufficiently large. In Fig. 2.6, we compare the MSE achieved by the ML estimator with the CRB given in (2.68). The system has two transmit antennas and employs perfectly correlated training sequences. The perfect correlation is obtained by using the same training sequence and assigning the power evenly to each transmit antenna. As discussed in Section 2.5.2, no knowledge of the signal-to-noise ratio is needed to implement the ML delay estimator for this choice of perfectly correlated training sequences. The figure shows that, for a reasonably small value of $n_r$ (e.g., 4) and a reasonably high SNR (e.g., 20 dB), the CRB is a tight lower bound on the MSE of the ML estimator. This, together with the asymptotic achievability of the CRB, suggests that it is an appropriate performance metric.

2.5 Discussions and Conclusions

In the previous two sections, we have studied the problem of timing estimation in multiple-antenna systems using two different approaches. In Section 2.3, the channel $\mathbf{h}$ is assumed to be unknown but deterministic, and joint ML estimation of $\mathbf{h}$ and the delay $\tau$ is performed. In contrast, in Section 2.4, the channel is assumed to be random with known statistics. The likelihood function averaged over all channel realizations is then used to construct the ML estimator of the delay $\tau$. These two approaches lead to two different optimal training signal designs. For the deterministic channel approach, orthogonal training sequences minimize both the outage probability and the average CRB. For the random channel approach, perfectly correlated training sequences minimize the CRB.
Here we compare these two approaches in terms of the resulting ML estimators, the CRBs, and the suitability of the outage probability and the average CRB as performance measures.

Figure 2.5: Comparison of CRBs obtained using orthogonal training sequences and perfectly correlated training sequences for different numbers of transmit antennas. Note that the CRB of the system with the perfectly correlated training sequences is the same for any number of transmit antennas. (Horizontal axis: SNR (dB).)

Figure 2.6: Comparison of the MSE of the ML estimator obtained from simulation and the CRB. The number of transmit antennas $n_t$ is 2. The unit on the vertical axis is $T_s^2$. (Horizontal axis: SNR (dB).)

2.5.1 Orthogonal Training Signals

When orthogonal training signals are employed, the ML estimators of the delay $\tau$ under the deterministic and random channel approaches both reduce to

$$\hat{\tau}_{ML} = \arg\max_{\tau}\left\{\mathbf{r}(\tau)^T\mathbf{r}(\tau)^*\right\}. \tag{2.70}$$

Thus the equal-gain combiner of the signals from the receive antennas is the ML estimator for both approaches. Under the deterministic channel approach, the average CRB has the value

$$E_{\mathbf{h}}[\mathrm{CRB}(\mathbf{h})] = \frac{-\sigma^2\psi_b^2}{2\left(n_r - \frac{1}{n_t}\right)\psi_aP\rho^2}. \tag{2.71}$$

Under the random channel approach, the CRB has the value

$$\mathrm{CRB} = \frac{-\sigma^2\psi_b^2}{2n_r\psi_aP\rho^2}\left(1 + \frac{n_t\sigma^2}{P\rho^2}\right). \tag{2.72}$$

As discussed before, the CRB in (2.72) is asymptotically achievable by the ML estimator as the number of receive antennas goes to infinity. In addition, as $n_r$ approaches infinity, the ratio of (2.71) to (2.72) tends to $1/\left(1 + \frac{n_t\sigma^2}{P\rho^2}\right)$, which is smaller than 1. This implies that the average CRB in (2.71) is not achievable by the ML estimator asymptotically as $n_r$ approaches infinity. On the other hand, for small values of $n_r$, the value in (2.71) can be larger than that in (2.72) when the SNR is large enough. More precisely, this happens when $P\rho^2/\sigma^2 > n_t(n_tn_r - 1)$. In this case, the average CRB in (2.71) actually gives a tighter bound on the performance of the ML estimator. The simulation results in Fig. 2.4 agree with this observation.
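To make (2.70) concrete, here is a discrete-time sketch of the equal-gain-combining delay estimator. Each receive antenna's samples are correlated with every transmit antenna's training waveform, the squared magnitudes are summed over all antennas, and the delay that maximizes this sum is chosen. The sketch rests on assumptions that are not made in the text:

* delays are integer numbers of samples;
* the half-sine pulse is sampled at four points per symbol;
* the orthogonal training set for $n_t = 2$ consists of two length-4 Walsh sequences;
* the Rayleigh coefficients have unit variance;
* the SNR is specified per sample.

`estimate_delay_once` and the other helper names are hypothetical. This is not the simulation setup used for Figs. 2.3, 2.4, and 2.6.

```python
import math
import random

def half_sine(samples_per_symbol: int):
    """Half-sine pulse sin(pi t / T_s) sampled at the midpoint of each sample slot."""
    L = samples_per_symbol
    return [math.sin(math.pi * (m + 0.5) / L) for m in range(L)]

def training_waveform(seq, pulse):
    """Discrete-time analogue of (2.25): concatenate a_k(i) * psi over the symbols."""
    return [complex(a * p) for a in seq for p in pulse]

def egc_metric(rx, waveforms, delay: int) -> float:
    """Equal-gain-combining metric r(tau)^T r(tau)^* of (2.70) at an integer delay."""
    total = 0.0
    for y in rx:             # sum over receive antennas
        for s in waveforms:  # correlate with each transmit antenna's waveform
            corr = sum(y[delay + n] * s[n].conjugate() for n in range(len(s)))
            total += abs(corr) ** 2
    return total

def estimate_delay_once(true_delay=7, max_delay=20, n_r=2, snr_db=20.0, seed=0):
    """Simulate one received block and return the integer delay maximizing (2.70)."""
    rng = random.Random(seed)
    pulse = half_sine(4)
    walsh = [[1, 1, 1, 1], [1, -1, 1, -1]]  # orthogonal, equal-power sequences
    waveforms = [training_waveform(a, pulse) for a in walsh]
    length = max_delay + len(waveforms[0])
    noise_std = math.sqrt(10 ** (-snr_db / 10) / 2)  # per real dimension
    rx = []
    for _ in range(n_r):
        # Rayleigh fading coefficients with E|h|^2 = 1, one per transmit antenna.
        h = [complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
             for _ in waveforms]
        y = [complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
             for _ in range(length)]
        for hk, s in zip(h, waveforms):
            for n, v in enumerate(s):
                y[true_delay + n] += hk * v
        rx.append(y)
    metrics = [egc_metric(rx, waveforms, d) for d in range(max_delay + 1)]
    return max(range(max_delay + 1), key=metrics.__getitem__)

if __name__ == "__main__":
    print(estimate_delay_once(true_delay=7, n_r=4, snr_db=30.0))
```

Replacing the two Walsh codes with two copies of one sequence gives the perfectly correlated case of Section 2.5.2. According to the text, the same metric is also the ML rule in that case.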
Because the average CRB in (2.71) is not asymptotically achievable, it may not be as good a performance measure as the outage probability in the deterministic channel approach. The outage probability is asymptotically achievable by the ML estimator and is already closely approached at very small values of $n_r$. However, for small values of $n_r$ and at high SNR, the average CRB may still be a reasonable performance metric.

2.5.2 Perfectly Correlated Training Signals

Under the random channel approach employing perfectly correlated training signals, we have $\mathbf{A}^H\mathbf{A} = \frac{P}{\psi_bn_t}\mathbf{q}\mathbf{q}^T$, where $\mathbf{q}$ is an arbitrary $n_t \times 1$ vector with $\mathbf{q}^T\mathbf{q} = n_t$. For instance, $\mathbf{q} = [1, 1, \ldots, 1]^T$ when we use the same training sequence and assign the power evenly to each transmit antenna. By using the matrix inversion formula, the ML delay estimator for this choice of perfectly correlated sequences reduces to exactly the estimator for orthogonal training sequences given in (2.70). We note that knowledge of the SNR is not needed to implement this ML estimator. For general training signals, however, the SNR information is needed to implement the ML estimator in (2.57).

Comparing the results in Figs. 2.4 and 2.6, the ML estimator achieves a smaller MSE with perfectly correlated training sequences than with orthogonal ones. This holds in all cases considered in the simulation studies. The observation agrees with the training sequence optimization based on the CRB, under which perfectly correlated sequences are superior to orthogonal sequences in the random channel approach. We also note that perfectly correlated training signals are not applicable in the deterministic channel approach, since they cannot be used to estimate the channel vector $\mathbf{h}$.

2.5.3 Deterministic vs. Random Channel Approaches

The results and discussions in the previous sections provide some guidelines on whether to use the deterministic or the random channel approach in estimating the timing parameter.
If the design criterion is the outage probability, i.e., neglecting a small percentage of the worst-case channel realizations, one would employ the deterministic channel approach with orthogonal training signals. On the other hand, if the average estimation error (over all channel realizations) is the main design criterion, one would employ the random channel approach with perfectly correlated training signals. We note that perfectly correlated training signals cannot be used for channel estimation. Thus they may be more suitable for space-time coding schemes that do not require channel information. In addition, the advantage of perfectly correlated training signals over orthogonal signals vanishes at high SNR in the random channel approach. Thus, when the number of transmit antennas is not very large and the SNR is high, one could employ orthogonal training signals for either of the two approaches.

CHAPTER 3
CHANNEL ESTIMATION FOR CORRELATED MIMO CHANNELS WITH COLORED INTERFERENCE

3.1 Introduction

Many multiple-antenna communication systems are designed for coherent detection, which requires channel state information (CSI) in the demodulation process. In practical wireless communication systems, the channel parameters are commonly estimated by sending known training symbols to the receiver. The performance of such training-based channel estimation schemes depends on the design of the training signals, which has been extensively investigated in the literature. It is well known that imperfect knowledge of the channel has a detrimental effect on the achievable rate that the channel can sustain [52]. Training sequences can be designed based on information-theoretic metrics such as the ergodic capacity and outage capacity of a MIMO channel [42, 53, 54]. The mean square error (MSE) is another commonly used performance measure for channel estimation.
Many works [55-65] have been carried out to investigate the training sequence design problem based on the MSE for MIMO fading channels. In Wong et al. [61], the authors studied the problem of training sequence design for multiple-antenna systems over flat fading MIMO channels in the presence of colored interference. The MIMO channels were assumed to be spatially white, i.e., there is no correlation among the transmit antennas or among the receive antennas. The optimal training sequences were designed to minimize the channel estimation MSE under a total transmit power constraint. The optimal design implied that transmit power should be intentionally assigned to the subspace with less interference. A practical algorithm for estimating the long-term second-order statistics of the interference correlation matrix, and an efficient scheme for feeding back the information the transmitter needs to construct the optimal training sequences, were also proposed. In Kotecha et al. [62], the problem of transmit signal design was investigated for the estimation of spatially correlated MIMO Rayleigh flat fading channels. The optimal training signal was designed according to two criteria: minimization of the channel estimation MSE and maximization of the conditional mutual information (CMI) between the channel and the received signal. The authors adopted the virtual channel representation model [66] for correlated MIMO channels. It was shown that the optimal training signal should be transmitted along the strong transmit eigendirections in which more scatterers are present. The powers transmitted along these eigendirections are determined by water-filling solutions based on the minimum MSE and maximum CMI criteria. In Cai et al. [65], the space-time spreading scheme, the block coding scheme, and channel estimation for correlated fading channels in the presence of interference were studied.
The authors focused on the single receive antenna case and extended their results to the multiple receive antenna case, in which the receive antennas were assumed to be uncorrelated. Based on previous optimization results for the special case without interference [63], the space-time beamforming (STBF) matrix was chosen as the training symbol matrix for the linear MMSE channel estimator. The optimal power loading scheme was then designed for training symbol matrices in this particular set.

In this chapter, we investigate the problem of estimating correlated MIMO channels in the presence of colored interference. We adopt the correlated MIMO channel model [21, 67], which expresses the channel matrix as the product of the square root of the receive correlation matrix, a white matrix with independent and identically distributed (i.i.d.) entries, and the square root of the transmit correlation matrix. This model implies that the transmit and receive correlations can be separated, a property that has been verified by field measurements. The colored interference model used here is more suitable than the white noise model when jamming signals and/or co-channel interference are present in the wireless communication system. We consider an interference-limited wireless communication system and assume that the thermal noise is small relative to the interference and can be ignored. We then show that the covariance matrix of the interference has a Kronecker product form, which implies that the temporal and spatial correlations of the interference are separable. The channel estimation MSE is used as the performance metric for the design of training sequences. The optimization problem encountered here, which minimizes the channel estimation MSE under a power constraint, is a generalization of two optimization problems that are encountered widely in the signal processing area [61, 63, 64, 68].
We first analyze the optimal structure of the solution by using the Lagrangian method, then find the optimal power allocation scheme, which has a water-filling interpretation, and finally determine the optimal ordering of the related eigenvector matrices. In Cai et al. [65], the authors encountered essentially the same optimization problem in a different form. Based on previous optimization results for the special case [63], they chose to optimize the training sequence matrix over a particular set of matrices that have the same solution structure and eigenvector ordering as our solution. Here we rigorously prove that this solution structure and eigenvector ordering are optimal over arbitrary matrices satisfying the power constraint. The optimal training sequence design has a clear physical interpretation: more power should be assigned to the transmission directions formed by the eigendirections with larger channel gains and the interference subspaces with less interference. In order to implement the channel estimator and construct the optimal training sequences, we propose an algorithm to estimate the long-term channel statistics and design an efficient feedback scheme so that the optimal sequences can be approximately constructed at the transmitter. Numerical results show that, with the optimal training sequences, the channel estimation MSE can be reduced substantially compared with the use of other training sequences.

The chapter is organized as follows. The system model and the linear MMSE channel estimator are introduced in Section 3.2. In Section 3.3, the optimal training sequence is designed by minimizing the total channel estimation MSE. In Section 3.4, an algorithm for estimating the long-term characteristics of the channel is proposed and an efficient feedback scheme is designed. Numerical results are provided in Section 3.5. Conclusions are drawn in Section 3.6.
3.2 System Model

We consider a single-user link with multiple interferers. The desired user has $n_t$ transmit antennas and $n_r$ receive antennas. We assume that there are $M$ interfering signals and that the $i$th interferer has $n_i$ transmit antennas. The MIMO channel is assumed to be quasi-static (block fading), in that it varies slowly enough to be considered invariant over a block; however, the channel changes to independent values from block to block. We assume that the users employ a frame-based transmission protocol that comprises training and payload data. The received baseband signals at the receive antennas during the training period are given in matrix form by
$$\mathbf{Y} = \mathbf{H}\mathbf{S}^T + \sum_{i=1}^{M}\mathbf{H}_i\mathbf{S}_i^T = \mathbf{H}\mathbf{S}^T + \sum_{i=1}^{M}\mathbf{E}_i = \mathbf{H}\mathbf{S}^T + \mathbf{E}. \tag{3.1}$$
The $n_r \times n_t$ matrix $\mathbf{H}$ and the $n_r \times n_i$ matrix $\mathbf{H}_i$ are the channel gain matrices from the transmitter and from the $i$th interferer to the receiver, respectively. $\mathbf{S}$ is the $N \times n_t$ training symbol matrix, known to the receiver, which is used to estimate the channel gain matrix $\mathbf{H}$ of the desired user during the training period. $N$ is the number of training symbols sent from each transmit antenna, and $N$ is usually much larger than $n_t$. $\mathbf{S}_i$ is the $N \times n_i$ interference symbol matrix from the $i$th interferer. We assume that the elements of $\mathbf{S}_i$ are identically distributed zero-mean complex random variables, correlated across both space and time. The interference processes are assumed to be wide-sense stationary in time. We consider an interference-limited wireless communication system; hence we assume that the thermal noise is small relative to the interference and can be ignored [69]. We adopt the correlated MIMO channel model [21, 67], which models the channel gain matrix $\mathbf{H}$ as
$$\mathbf{H} = \mathbf{R}_r^{1/2}\mathbf{H}_w\mathbf{R}_t^{T/2}, \tag{3.2}$$
where $\mathbf{R}_t$ models the correlation between the transmit antennas and $\mathbf{R}_r$ models the correlation between the receive antennas. The notation $(\cdot)^{1/2}$ stands for the Hermitian square root of a matrix, and $(\cdot)^{T/2}$ denotes its transpose.
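$\mathbf{H}_w$ is a matrix whose elements are independent and identically distributed zero-mean circularly symmetric complex Gaussian random variables with unit variance. Let $\mathbf{h}_w = \mathrm{vec}(\mathbf{H}_w)$, where $\mathrm{vec}(\mathbf{X})$ is the vector obtained by stacking the columns of $\mathbf{X}$ on top of each other. Then we have
$$\mathbf{h} = \mathrm{vec}(\mathbf{H}) = (\mathbf{R}_t^{1/2}\otimes\mathbf{R}_r^{1/2})\mathbf{h}_w, \tag{3.3}$$
with $\mathbf{h} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_t\otimes\mathbf{R}_r)$, where $\mathcal{CN}$ denotes the complex Gaussian distribution. Similarly, we model the channel gain matrix from the $i$th interferer to the receiver as
$$\mathbf{H}_i = \mathbf{R}_r^{1/2}\mathbf{H}_{w,i}\mathbf{R}_i^{T/2}, \tag{3.4}$$
where $\mathbf{R}_i$ models the correlation between the transmit antennas of the $i$th interferer, and
$$\mathbf{h}_i = \mathrm{vec}(\mathbf{H}_i) = (\mathbf{R}_i^{1/2}\otimes\mathbf{R}_r^{1/2})\mathbf{h}_{w,i}. \tag{3.5}$$
Using the vec operator, we can write the received signal in vector form as
$$\begin{aligned}\mathbf{y} = \mathrm{vec}(\mathbf{Y}) &= (\mathbf{S}\otimes\mathbf{I}_{n_r})\mathrm{vec}(\mathbf{H}) + \sum_{i=1}^{M}(\mathbf{S}_i\otimes\mathbf{I}_{n_r})\mathrm{vec}(\mathbf{H}_i)\\ &= (\mathbf{S}\otimes\mathbf{I}_{n_r})\mathbf{h} + \sum_{i=1}^{M}(\mathbf{S}_i\otimes\mathbf{I}_{n_r})(\mathbf{R}_i^{1/2}\otimes\mathbf{R}_r^{1/2})\mathbf{h}_{w,i}\\ &= (\mathbf{S}\otimes\mathbf{I}_{n_r})\mathbf{h} + \sum_{i=1}^{M}\mathbf{e}_i\\ &= (\mathbf{S}\otimes\mathbf{I}_{n_r})\mathbf{h} + \mathbf{e},\end{aligned} \tag{3.6}$$
where $\mathbf{I}_{n_r}$ denotes the $n_r \times n_r$ identity matrix. To derive the linear MMSE channel estimator, we need the following lemma.

Lemma 3.2.1. $E(\mathbf{e}) = \mathbf{0}$ and the covariance matrix of $\mathbf{e}$ is
$$E(\mathbf{e}\mathbf{e}^H) = \sum_{i=1}^{M}\mathbf{Q}_{N_i}\otimes\mathbf{R}_r = \mathbf{Q}_N\otimes\mathbf{R}_r, \tag{3.7}$$
where
$$\mathbf{Q}_{N_i} = \begin{bmatrix}\sum_{k=1}^{n_i}R_{ik}(0) & \cdots & \sum_{k=1}^{n_i}R_{ik}(1-N)\\ \vdots & \ddots & \vdots\\ \sum_{k=1}^{n_i}R_{ik}(N-1) & \cdots & \sum_{k=1}^{n_i}R_{ik}(0)\end{bmatrix}, \tag{3.8}$$
$\mathbf{Q}_N = \sum_{i=1}^{M}\mathbf{Q}_{N_i}$, and $R_{ik}(\tau)$ represents the time correlation between the signals at time instants $m$ and $m+\tau$ from the $k$th antenna of the $i$th interferer.

Proof. Since $\mathbf{h}_{w,i} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{n_r n_i})$, $E(\mathbf{e}_i) = \mathbf{0}$, and hence $E(\mathbf{e}) = \mathbf{0}$. The received signal from the $i$th interferer can be written as
$$\mathbf{E}_i = \mathbf{H}_i\mathbf{S}_i^T = \mathbf{R}_r^{1/2}\mathbf{H}_{w,i}\mathbf{R}_i^{T/2}\mathbf{S}_i^T = \mathbf{R}_r^{1/2}\mathbf{H}_{w,i}\tilde{\mathbf{S}}_i^T, \tag{3.9}$$
where $\tilde{\mathbf{S}}_i = \mathbf{S}_i\mathbf{R}_i^{1/2}$. Since $\mathbf{S}_i$ is wide-sense stationary in time, $\tilde{\mathbf{S}}_i$ is also wide-sense stationary in time. Using the vec operator, we can rewrite the interfering signal from the $i$th interferer as
$$\mathbf{e}_i = \mathrm{vec}(\mathbf{E}_i) = (\mathbf{I}_N\otimes\mathbf{R}_r^{1/2})\mathrm{vec}(\mathbf{H}_{w,i}\tilde{\mathbf{S}}_i^T). \tag{3.10}$$
The covariance matrix of $\mathbf{e}_i$ is
$$E(\mathbf{e}_i\mathbf{e}_i^H) = (\mathbf{I}_N\otimes\mathbf{R}_r^{1/2})\,E[\mathrm{vec}(\mathbf{H}_{w,i}\tilde{\mathbf{S}}_i^T)\mathrm{vec}(\mathbf{H}_{w,i}\tilde{\mathbf{S}}_i^T)^H]\,(\mathbf{I}_N\otimes\mathbf{R}_r^{1/2})^H.$$
Let $\mathbf{e}'_i = \mathrm{vec}(\mathbf{H}_{w,i}\tilde{\mathbf{S}}_i^T)$. We can show that the covariance matrix of $\mathbf{e}'_i$ is
$$E[\mathbf{e}'_i\mathbf{e}'^H_i] = \begin{bmatrix}\sum_{k=1}^{n_i}R_{ik}(0)\mathbf{I}_{n_r} & \cdots & \sum_{k=1}^{n_i}R_{ik}(1-N)\mathbf{I}_{n_r}\\ \vdots & \ddots & \vdots\\ \sum_{k=1}^{n_i}R_{ik}(N-1)\mathbf{I}_{n_r} & \cdots & \sum_{k=1}^{n_i}R_{ik}(0)\mathbf{I}_{n_r}\end{bmatrix} \tag{3.11}$$
$$= \mathbf{Q}_{N_i}\otimes\mathbf{I}_{n_r}, \tag{3.12}$$
where $R_{ik}(\tau)$ represents the time correlation between the signals at time instants $m$ and $m+\tau$ from the $k$th antenna of the $i$th interferer.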
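Then we have
$$E[\mathbf{e}_i\mathbf{e}_i^H] = (\mathbf{I}_N\otimes\mathbf{R}_r^{1/2})(\mathbf{Q}_{N_i}\otimes\mathbf{I}_{n_r})(\mathbf{I}_N\otimes\mathbf{R}_r^{1/2}) = \mathbf{Q}_{N_i}\otimes\mathbf{R}_r. \tag{3.13}$$
Assuming that the channels of different interferers are mutually independent, the cross terms vanish and the covariance matrix of $\mathbf{e}$ is
$$E[\mathbf{e}\mathbf{e}^H] = \sum_{i=1}^{M}\mathbf{Q}_{N_i}\otimes\mathbf{R}_r = \mathbf{Q}_N\otimes\mathbf{R}_r. \tag{3.14}$$
∎

We note that $\mathbf{Q}_N$ captures the temporal correlation of the interference and $\mathbf{R}_r$ represents its spatial correlation. The covariance matrix of the interference has a Kronecker product form, which implies that the temporal and spatial correlations of the interference are separable.

Since (3.6) is a linear model, the Bayesian Gauss-Markov theorem [36] gives the linear minimum mean square error (LMMSE) estimator of $\mathbf{h}$ as
$$\begin{aligned}\hat{\mathbf{h}} &= [(\mathbf{S}^H\otimes\mathbf{I}_{n_r})(\mathbf{Q}_N\otimes\mathbf{R}_r)^{-1}(\mathbf{S}\otimes\mathbf{I}_{n_r}) + (\mathbf{R}_t\otimes\mathbf{R}_r)^{-1}]^{-1}(\mathbf{S}^H\otimes\mathbf{I}_{n_r})(\mathbf{Q}_N\otimes\mathbf{R}_r)^{-1}\mathbf{y}\\ &= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}\otimes\mathbf{R}_r](\mathbf{S}^H\otimes\mathbf{I}_{n_r})(\mathbf{Q}_N^{-1}\otimes\mathbf{R}_r^{-1})\mathbf{y}\\ &= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}\mathbf{S}^H\mathbf{Q}_N^{-1}\otimes\mathbf{I}_{n_r}]\mathbf{y}.\end{aligned} \tag{3.15}$$
Using the identity $\mathrm{vec}(\mathbf{A}\mathbf{Y}\mathbf{B}) = (\mathbf{B}^T\otimes\mathbf{A})\mathrm{vec}(\mathbf{Y})$, we can rewrite the channel estimator in compact matrix form as
$$\hat{\mathbf{H}} = \mathbf{Y}\mathbf{Q}_N^{-1}\mathbf{S}(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}. \tag{3.16}$$
Hence the channel estimator does not depend on the receive correlation matrix $\mathbf{R}_r$. The performance of the channel estimator is measured by the estimation error $\boldsymbol{\epsilon} = \mathbf{h} - \hat{\mathbf{h}}$, whose mean is zero and whose covariance matrix is
$$\begin{aligned}\mathbf{C}_\epsilon = E[(\mathbf{h}-\hat{\mathbf{h}})(\mathbf{h}-\hat{\mathbf{h}})^H] &= [(\mathbf{S}^H\otimes\mathbf{I}_{n_r})(\mathbf{Q}_N\otimes\mathbf{R}_r)^{-1}(\mathbf{S}\otimes\mathbf{I}_{n_r}) + (\mathbf{R}_t\otimes\mathbf{R}_r)^{-1}]^{-1}\\ &= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S})\otimes\mathbf{R}_r^{-1} + \mathbf{R}_t^{-1}\otimes\mathbf{R}_r^{-1}]^{-1}\\ &= [(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})\otimes\mathbf{R}_r^{-1}]^{-1}\\ &= (\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}\otimes\mathbf{R}_r,\end{aligned} \tag{3.17}$$
where we have used $(\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D}) = \mathbf{A}\mathbf{C}\otimes\mathbf{B}\mathbf{D}$ and $(\mathbf{A}\otimes\mathbf{B})^{-1} = \mathbf{A}^{-1}\otimes\mathbf{B}^{-1}$. The diagonal elements of the error covariance matrix $\mathbf{C}_\epsilon$ are the minimum Bayesian MSEs of the individual channel coefficients. The total MSE is the commonly used performance measure for MIMO channel estimation. Using $\mathrm{tr}(\mathbf{A}\otimes\mathbf{B}) = \mathrm{tr}(\mathbf{A})\,\mathrm{tr}(\mathbf{B})$, we have
$$\mathrm{tr}(\mathbf{C}_\epsilon) = \mathrm{tr}\big((\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}\otimes\mathbf{R}_r\big) = \mathrm{tr}\big((\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}\big)\,\mathrm{tr}(\mathbf{R}_r).$$
Thus the minimization of the total MSE over training sequences does not depend on the receive correlation matrix. Only the temporal interference correlation matrix $\mathbf{Q}_N$ and the transmit correlation matrix $\mathbf{R}_t$ need to be considered in obtaining the optimal training sequences.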
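The Kronecker manipulations in (3.3), (3.15), and (3.17) can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the channel model in the form $\mathbf{H} = \mathbf{R}_r^{1/2}\mathbf{H}_w\mathbf{R}_t^{T/2}$, uses random Hermitian positive-definite stand-ins for $\mathbf{R}_t$, $\mathbf{R}_r$, and $\mathbf{Q}_N$, and introduces hypothetical helpers `random_hpd`, `herm_sqrt`, and `vec`. It compares the full-size LMMSE quantities with the reduced Kronecker forms and the trace factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hpd(n):
    """Random Hermitian positive-definite matrix (stand-in for R_t, R_r, Q_N)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return a @ a.conj().T + n * np.eye(n)

def herm_sqrt(r):
    """Hermitian square root R^{1/2} via the eigendecomposition."""
    w, v = np.linalg.eigh(r)
    return (v * np.sqrt(w)) @ v.conj().T

def vec(x):
    """Stack the columns of x (column-major order)."""
    return x.reshape(-1, order="F")

n_t, n_r, N = 2, 3, 6
R_t, R_r, Q_N = random_hpd(n_t), random_hpd(n_r), random_hpd(N)
S = rng.standard_normal((N, n_t)) + 1j * rng.standard_normal((N, n_t))

# (3.2)-(3.3): H = R_r^{1/2} H_w R_t^{T/2}  =>  vec(H) = (R_t^{1/2} kron R_r^{1/2}) vec(H_w)
H_w = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
H = herm_sqrt(R_r) @ H_w @ herm_sqrt(R_t).T
model_ok = np.allclose(vec(H), np.kron(herm_sqrt(R_t), herm_sqrt(R_r)) @ vec(H_w))

# Full-size quantities for y = (S kron I) h + e, Cov(e) = Q_N kron R_r, Cov(h) = R_t kron R_r.
A = np.kron(S, np.eye(n_r))
C_e = np.kron(Q_N, R_r)
C_h = np.kron(R_t, R_r)
C_err = np.linalg.inv(A.conj().T @ np.linalg.solve(C_e, A) + np.linalg.inv(C_h))
W_full = C_err @ A.conj().T @ np.linalg.inv(C_e)      # first line of (3.15)

# Reduced forms: last line of (3.15) and last line of (3.17).
G = np.linalg.inv(S.conj().T @ np.linalg.solve(Q_N, S) + np.linalg.inv(R_t))
W_kron = np.kron(G @ S.conj().T @ np.linalg.inv(Q_N), np.eye(n_r))

print(model_ok,
      np.allclose(W_full, W_kron),
      np.allclose(C_err, np.kron(G, R_r)),
      np.isclose(np.trace(C_err), np.trace(G) * np.trace(R_r)))
```

The check works with the vector form directly, so it does not depend on how the compact matrix form (3.16) is arranged.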
3.3 Optimal Training Sequence Design

In this section, we investigate the problem of optimal training sequence design for channel estimation. With the total Bayesian MSE as the performance measure, the optimization of training sequences can be formulated as
$$\min_{\mathbf{S}}\ \mathrm{tr}(\mathbf{S}^H\mathbf{Q}_N^{-1}\mathbf{S} + \mathbf{R}_t^{-1})^{-1}\quad\text{subject to}\quad\mathrm{tr}\{\mathbf{S}^H\mathbf{S}\}\le P, \tag{3.18}$$
where $\mathrm{tr}\{\mathbf{S}^H\mathbf{S}\}\le P$ specifies the power constraint. This optimization problem is of considerable interest to researchers in the signal processing and communication areas. Its special cases (with either $\mathbf{Q}_N$ or $\mathbf{R}_t$ equal to the identity matrix) have been encountered widely in joint linear transmitter-receiver design [63, 68, 70], training sequence design for channel estimation in multiple-antenna communication systems [61, 64], and spreading sequence optimization for code division multiple access (CDMA) communication systems [71]. The solution in the special case $\mathbf{R}_t = \mathbf{I}$, found for example in Wong et al. [61] and Scaglione et al. [68], can be expressed in terms of the eigenvalues and eigenvectors of $\mathbf{Q}_N$ and a Lagrange multiplier associated with the power constraint. Similarly, the solution in the special case $\mathbf{Q}_N = \mathbf{I}$, found for example in Zhou et al. [63] and Biguesh et al. [64], can be expressed in terms of the eigenvalues and eigenvectors of $\mathbf{R}_t$ and a Lagrange multiplier associated with the power constraint. The generalized mean square error problem introduced here is more difficult to optimize. We will show that (3.18) has a solution that can be expressed as $\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, where $\mathbf{U}$ and $\mathbf{V}$ are orthonormal matrices of eigenvectors of $\mathbf{Q}_N$ and $\mathbf{R}_t$, respectively, and $\boldsymbol{\Sigma}$ is diagonal. Solving (3.18) involves computing the diagonalizations of $\mathbf{Q}_N$ and $\mathbf{R}_t$ and finding an ordering for the columns of $\mathbf{U}$ and $\mathbf{V}$. In Cai et al. [65], the authors encountered essentially the same optimization problem in a different form.
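Based on previous optimization results for the special case [63], the authors chose to optimize the training sequence matrix over a particular set of matrices that have the same solution structure and eigenvector ordering as our solution. Here we rigorously prove that this solution structure and eigenvector ordering are optimal over arbitrary matrices satisfying the power constraint. A related optimization problem, which minimizes the trace of the mean square error matrix in a variant form, is discussed in Section 3.7.1, and another optimization problem, which maximizes the determinant of the inverse of the mean square error matrix, is introduced in Section 3.7.2. We solve the optimization problem (3.18) in three steps. First, we analyze the optimal structure of the solution by using the Lagrangian method; then we find the optimal power allocation scheme; and finally we determine the optimal ordering of the related eigenvector matrices.

3.3.1 Solution Structure

We begin by analyzing the structure of an optimal solution to (3.18). Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be diagonalizations of $\mathbf{Q}_N$ and $\mathbf{R}_t$, where the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors. Let $\lambda_j$, $1\le j\le N$, and $\delta_i$, $1\le i\le n_t$, denote the diagonal elements of $\boldsymbol{\Lambda}$ and $\boldsymbol{\Delta}$, respectively. We assume that the eigenvalues $\{\lambda_j\}$ are arranged in increasing order and $\{\delta_i\}$ are arranged in decreasing order:
$$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N, \qquad \delta_1 \ge \delta_2 \ge \cdots \ge \delta_{n_t} > 0. \tag{3.19}$$
Let us define
$$\mathbf{T} = \mathbf{U}^H\mathbf{S}\mathbf{V}. \tag{3.20}$$
Substituting $\mathbf{S} = \mathbf{U}\mathbf{T}\mathbf{V}^H$ in (3.18) gives the following equivalent optimization problem:
$$\min_{\mathbf{T}}\ \mathrm{tr}(\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1})^{-1}\quad\text{subject to}\quad\mathrm{tr}(\mathbf{T}^H\mathbf{T})\le P,\ \mathbf{T}\in\mathbb{C}^{N\times n_t}. \tag{3.21}$$
We now show that the solution to (3.21) has at most one nonzero entry in each row and column.

Theorem 3.3.1. There exists a solution of (3.21) of the form $\mathbf{T} = \boldsymbol{\Pi}_1\boldsymbol{\Sigma}\boldsymbol{\Pi}_2$, where $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$ are permutation matrices and $\sigma_{ij} = 0$ for all $i\ne j$.

Proof. We first prove the theorem under the following nondegeneracy assumption:
$$\delta_i \ne \delta_j > 0 \quad\text{and}\quad \lambda_i \ne \lambda_j > 0 \quad\text{for all } i\ne j. \tag{3.22}$$
Since the cost function of (3.21) is a continuous function of $\boldsymbol{\lambda}$ and $\boldsymbol{\delta}$, and since any $\boldsymbol{\lambda} > 0$ and $\boldsymbol{\delta} > 0$ can be approximated arbitrarily closely by vectors $\boldsymbol{\lambda}$ and $\boldsymbol{\delta}$ satisfying the nondegeneracy conditions (3.22), the theorem then holds for arbitrary $\boldsymbol{\lambda} > 0$ and $\boldsymbol{\delta} > 0$.

There exists an optimal solution of (3.21), since the feasible set is compact and the cost function is a continuous function of $\mathbf{T}$. Since the eigenvalues of $\boldsymbol{\Delta}^{1/2}\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T}\boldsymbol{\Delta}^{1/2}$ are nonnegative, it follows that for any choice of $\mathbf{T}$,
$$\mathrm{tr}(\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1})^{-1} = \mathrm{tr}\,\boldsymbol{\Delta}(\boldsymbol{\Delta}^{1/2}\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T}\boldsymbol{\Delta}^{1/2} + \mathbf{I})^{-1} \le \mathrm{tr}(\boldsymbol{\Delta}),$$
with equality if and only if $\mathbf{T} = \mathbf{0}$. Hence there exists a nonzero optimal solution of (3.21), which we denote by $\bar{\mathbf{T}}$. According to the Lagrange multiplier theorem, the first-order necessary condition for an optimal solution is the following: there exists a scalar $\gamma\ge 0$ such that the first-order variation of the Lagrangian vanishes,
$$\delta\Big[\mathrm{tr}\big((\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\mathbf{T} + \boldsymbol{\Delta}^{-1})^{-1}\big) + \gamma\,\mathrm{tr}(\mathbf{T}^H\mathbf{T})\Big]_{\mathbf{T}=\bar{\mathbf{T}}} = 0, \tag{3.23}$$
for every perturbation $\delta\mathbf{T}$. For notational simplicity, let
$$\mathbf{M} = \bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}} + \boldsymbol{\Delta}^{-1}. \tag{3.24}$$
For any invertible matrix $\mathbf{M}$, the differential of the inverse [72] is $d\mathbf{M}^{-1} = -\mathbf{M}^{-1}(d\mathbf{M})\mathbf{M}^{-1}$. Hence (3.23) is equivalent to
$$\mathrm{tr}\big(\gamma[\bar{\mathbf{T}}^H\delta\mathbf{T} + \delta\mathbf{T}^H\bar{\mathbf{T}}] - \mathbf{M}^{-1}[\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\delta\mathbf{T} + \delta\mathbf{T}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}]\mathbf{M}^{-1}\big) = 0$$
for all matrices $\delta\mathbf{T}\in\mathbb{C}^{N\times n_t}$. Let $\mathrm{Re}(z)$ denote the real part of $z\in\mathbb{C}$. Using $\mathrm{tr}(\mathbf{A} + \mathbf{A}^H) = 2\,\mathrm{Re}[\mathrm{tr}(\mathbf{A})]$ and $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$, we have
$$\mathrm{Re}\big[\mathrm{tr}(\gamma\bar{\mathbf{T}}^H\delta\mathbf{T} - \mathbf{M}^{-2}\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\delta\mathbf{T})\big] = 0.$$
By taking $\delta\mathbf{T}$ either purely real or purely imaginary, we deduce that $\mathrm{tr}([\gamma\bar{\mathbf{T}}^H - \mathbf{M}^{-2}\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}]\delta\mathbf{T}) = 0$ for all $\delta\mathbf{T}$. By choosing $\delta\mathbf{T}$ to be zero except for a single nonzero entry, we conclude that
$$\gamma\bar{\mathbf{T}}^H - \mathbf{M}^{-2}\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1} = \mathbf{0}. \tag{3.25}$$
If $\gamma = 0$, then $\bar{\mathbf{T}} = \mathbf{0}$, since both $\mathbf{M}$ and $\boldsymbol{\Lambda}$ are invertible. Hence $\gamma > 0$. We multiply (3.25) on the right by $\bar{\mathbf{T}}$ to obtain
$$\gamma\bar{\mathbf{T}}^H\bar{\mathbf{T}} = \mathbf{M}^{-2}\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}} = (\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}} + \boldsymbol{\Delta}^{-1})^{-2}\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}. \tag{3.26}$$
Since $\bar{\mathbf{T}}^H\bar{\mathbf{T}}$ is Hermitian, we have
$$(\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}} + \boldsymbol{\Delta}^{-1})^{-2}\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}} = \bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}(\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}} + \boldsymbol{\Delta}^{-1})^{-2}.$$
We now show that $\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}$ and $\boldsymbol{\Delta}^{-1}$ commute. We need the following lemma [73]:

Lemma 3.3.1. If $\mathbf{A}$ and $\mathbf{B}$ are diagonalizable, they share the same eigenvector matrix if and only if $\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A}$.

For simplicity of notation, let $\mathbf{A} = \bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}$ and $\mathbf{B} = \boldsymbol{\Delta}^{-1}$.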
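Then we have
$$(\mathbf{A} + \mathbf{B})^{-2}\mathbf{A} = \mathbf{A}(\mathbf{A} + \mathbf{B})^{-2}.$$
According to Lemma 3.3.1, $\mathbf{A}$ and $(\mathbf{A} + \mathbf{B})^{-2}$ share the same eigenvector matrix. Since $\mathbf{A} + \mathbf{B}$ and $(\mathbf{A} + \mathbf{B})^{-2}$ have the same eigenvector matrix, $\mathbf{A}$ and $\mathbf{A} + \mathbf{B}$ share the same eigenvector matrix. Then we have $\mathbf{A}(\mathbf{A} + \mathbf{B}) = (\mathbf{A} + \mathbf{B})\mathbf{A}$. Hence $\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A}$, which implies that $\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}$ and $\boldsymbol{\Delta}^{-1}$ commute. Since $\boldsymbol{\Delta}^{-1}$ is diagonal with distinct diagonal elements by (3.22), $\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}$ is diagonal. Since $\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}$ is diagonal, $\bar{\mathbf{T}}^H\bar{\mathbf{T}}$ is diagonal by (3.26). Since $\bar{\mathbf{T}}^H\boldsymbol{\Lambda}^{-1}\bar{\mathbf{T}}$ and $\boldsymbol{\Delta}^{-1}$ are diagonal, both $\mathbf{M}$ and $\mathbf{M}^{-1}$ are diagonal. Hence the factor $\mathbf{M}^{-2}$ in (3.25) is diagonal with real diagonal elements denoted $e_j$, $1\le j\le n_t$. Writing $\bar{t}_{ij}$ for the $(i,j)$ entry of $\bar{\mathbf{T}}$, (3.25) gives
$$\gamma\bar{t}_{ij} = \frac{e_j\bar{t}_{ij}}{\lambda_i}. \tag{3.27}$$
If $\bar{t}_{ij}\ne 0$, then (3.27) implies that $e_j/\lambda_i = \gamma\ne 0$. By the nondegeneracy condition (3.22), no two diagonal elements of $\boldsymbol{\Lambda}$ are equal. If, for some fixed $j$, $\bar{t}_{ij}\ne 0$ for $i = i_1$ and $i = i_2$, then the identity $e_j/\lambda_{i_1} = \gamma = e_j/\lambda_{i_2}$ yields a contradiction, since $\gamma\ne 0$ and $\lambda_{i_1}\ne\lambda_{i_2}$. Hence each column of $\bar{\mathbf{T}}$ has at most one nonzero entry. Since $\bar{\mathbf{T}}^H\bar{\mathbf{T}}$ is diagonal, two different columns cannot have their single nonzero entries in the same row. This implies that each column and each row of $\bar{\mathbf{T}}$ have at most one nonzero entry. A suitable permutation of the rows and columns of $\bar{\mathbf{T}}$ gives a diagonal matrix $\boldsymbol{\Sigma}$, which completes the proof. ∎

Combining the relationship (3.20) between $\mathbf{T}$ and $\mathbf{S}$ with Theorem 3.3.1, we conclude that problem (3.18) has a solution of the form $\mathbf{S} = \mathbf{U}\boldsymbol{\Pi}_1\boldsymbol{\Sigma}\boldsymbol{\Pi}_2\mathbf{V}^H$, where $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$ are permutation matrices. We will show that one of these two permutation matrices can be eliminated. Substituting $\mathbf{S} = \mathbf{U}\boldsymbol{\Pi}_1\boldsymbol{\Sigma}\boldsymbol{\Pi}_2\mathbf{V}^H$ in (3.18), we obtain the equivalent optimization problem
$$\min_{\boldsymbol{\Sigma},\boldsymbol{\Pi}_1,\boldsymbol{\Pi}_2}\ \mathrm{tr}\big(\boldsymbol{\Sigma}^H\boldsymbol{\Pi}_1^H\boldsymbol{\Lambda}^{-1}\boldsymbol{\Pi}_1\boldsymbol{\Sigma} + \boldsymbol{\Pi}_2\boldsymbol{\Delta}^{-1}\boldsymbol{\Pi}_2^H\big)^{-1}\quad\text{subject to}\quad\sum_{i=1}^{M}\sigma_i^2\le P, \tag{3.28}$$
where $M$ denotes the minimum of $N$ and $n_t$. In the above optimization problem, the minimization is over diagonal matrices $\boldsymbol{\Sigma}$ with $\sigma_1, \ldots, \sigma_M$ as the diagonal elements and over the two permutation matrices $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$. Since the symmetric permutations $\boldsymbol{\Pi}_1^H\boldsymbol{\Lambda}^{-1}\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2\boldsymbol{\Delta}^{-1}\boldsymbol{\Pi}_2^H$ essentially interchange the diagonal elements of $\boldsymbol{\Lambda}$ and $\boldsymbol{\Delta}$, (3.28) is equivalent to
$$\min\ \sum_{i=1}^{M}\frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(i)}}\quad\text{subject to}\quad\sum_{i=1}^{M}\sigma_i^2\le P,\ \pi_1\in\mathcal{P}_N,\ \pi_2\in\mathcal{P}_{n_t}, \tag{3.29}$$
where $\mathcal{P}_N$ is the set of bijections of $\{1, 2, \ldots, N\}$ onto itself. We now show that the optimal solution depends only on the smallest eigenvalues of $\mathbf{Q}_N$ and the largest eigenvalues of $\mathbf{R}_t$.

Lemma 3.3.2. Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be diagonalizations of $\mathbf{Q}_N$ and $\mathbf{R}_t$, respectively, where the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors. Let $\boldsymbol{\sigma}$, $\pi_1$, and $\pi_2$ denote an optimal solution of (3.29), and define the sets
$$\mathcal{M} = \{i : \sigma_i > 0\},\quad \mathcal{Q} = \{\lambda_{\pi_1(i)} : i\in\mathcal{M}\},\quad \mathcal{R} = \{\delta_{\pi_2(i)} : i\in\mathcal{M}\}.$$
If $\mathcal{M}$ has $l$ elements, then the elements of $\mathcal{Q}$ constitute the $l$ smallest eigenvalues of $\mathbf{Q}_N$, and the elements of $\mathcal{R}$ constitute the $l$ largest eigenvalues of $\mathbf{R}_t$.

Proof. Assume that $k\notin\mathcal{M}$ and $\lambda_{\pi_1(k)} < \lambda_{\pi_1(i)}$ for some $i\in\mathcal{M}$. By interchanging the values of $\pi_1(i)$ and $\pi_1(k)$, the new $i$th term in the cost function becomes smaller than the previous $i$th term, while the $k$th term is unchanged since $\sigma_k = 0$. This contradicts the optimality of $\boldsymbol{\sigma}$ and $\pi_1$. Hence $\lambda_{\pi_1(k)}\ge\lambda_{\pi_1(i)}$. Next, suppose that $k\notin\mathcal{M}$ and $\delta_{\pi_2(k)} > \delta_{\pi_2(i)}$ for some $i\in\mathcal{M}$. Let $C$ denote the sum of the $i$th and $k$th terms of the cost before the interchange, and let $C^+$ denote the same sum after interchanging the values of $\pi_2(i)$ and $\pi_2(k)$. We have
$$C = \frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(i)}} + \delta_{\pi_2(k)}\quad\text{and}\quad C^+ = \frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(k)}} + \delta_{\pi_2(i)}.$$
Writing $a = \sigma_i^2/\lambda_{\pi_1(i)} > 0$ and using $\delta_{\pi_2(k)} > \delta_{\pi_2(i)}$, we obtain
$$C^+ - C = -\frac{(\delta_{\pi_2(k)} - \delta_{\pi_2(i)})\big[(a\delta_{\pi_2(k)} + 1)(a\delta_{\pi_2(i)} + 1) - 1\big]}{(a\delta_{\pi_2(k)} + 1)(a\delta_{\pi_2(i)} + 1)} < 0.$$
The cost is reduced by interchanging the values of $\pi_2(i)$ and $\pi_2(k)$, which violates the optimality of $\boldsymbol{\sigma}$ and $\pi_2$. Hence $\delta_{\pi_2(k)}\le\delta_{\pi_2(i)}$. ∎

Using Lemma 3.3.2, we now show that one of the permutations in (3.29) can be deleted if the eigenvalues of $\mathbf{Q}_N$ and $\mathbf{R}_t$ are arranged in a particular order.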
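The interchange step in the proof of Lemma 3.3.2 reduces to a two-term inequality, which the sketch below checks on random values. It is only an illustration: `pair_cost` is a hypothetical helper for the sum of the active $i$th term and the inactive $k$th term of (3.29), and the sampling ranges are arbitrary. It also compares the difference $C^+ - C$ with the closed form given in the proof.

```python
import numpy as np

def pair_cost(a, d_active, d_inactive):
    """ith (active) plus kth (inactive, sigma_k = 0) terms of the cost in (3.29).
    a = sigma_i^2 / lambda_{pi_1(i)} > 0; the kth term reduces to 1/(1/delta) = delta."""
    return 1.0 / (a + 1.0 / d_active) + d_inactive

rng = np.random.default_rng(1)
max_gap = -np.inf
for _ in range(10_000):
    a = rng.uniform(0.01, 10.0)
    x = rng.uniform(0.1, 5.0)              # delta_{pi_2(i)}
    y = x + rng.uniform(0.01, 5.0)         # delta_{pi_2(k)} > delta_{pi_2(i)}
    diff = pair_cost(a, y, x) - pair_cost(a, x, y)      # C^+ - C
    closed = -(y - x) * ((a * y + 1) * (a * x + 1) - 1) / ((a * y + 1) * (a * x + 1))
    assert np.isclose(diff, closed)
    max_gap = max(max_gap, diff)
print(max_gap < 0)   # True when every interchange reduced the cost
```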
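Theorem 3.3.2. Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be diagonalizations of $\mathbf{Q}_N$ and $\mathbf{R}_t$, respectively, where the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors, the eigenvalues of $\mathbf{Q}_N$ are arranged in increasing order, and the eigenvalues of $\mathbf{R}_t$ are arranged in decreasing order. If $M$ is the minimum of the ranks of $\mathbf{Q}_N$ and $\mathbf{R}_t$, then (3.29) is equivalent to
$$\min\ \sum_{i=1}^{M}\frac{1}{\sigma_i^2/\lambda_{\pi(i)} + 1/\delta_i}\quad\text{subject to}\quad\sum_{i=1}^{M}\sigma_i^2\le P,\ \pi\in\mathcal{P}_M, \tag{3.30}$$
where $\sigma_i = 0$ for $i > M$.

Proof. Since at most $M$ eigenvalues of either $\mathbf{Q}_N$ or $\mathbf{R}_t$ are nonzero, it follows from Lemma 3.3.2 that the set $\mathcal{M}$ has at most $M$ elements. Since the elements of $\mathcal{Q}$ are the smallest eigenvalues of $\mathbf{Q}_N$ and the elements of $\mathcal{R}$ are the largest eigenvalues of $\mathbf{R}_t$, we can assume that $\pi_1(i)\in[1, M]$ and $\pi_2(i)\in[1, M]$ for each $i\in\mathcal{M}$. Hence we restrict the sum in (3.29) to the indices $i\in\mathcal{S}$, where $\mathcal{S} = \{\pi_2^{-1}(j) : 1\le j\le M\}$. Let us define $\tilde{\sigma}_j = \sigma_{\pi_2^{-1}(j)}$ and $\pi(j) = \pi_1(\pi_2^{-1}(j))$. Since $\pi(j)\in[1, M]$ for $j\in[1, M]$, it follows that $\pi\in\mathcal{P}_M$. In (3.29), we restrict the summation to $i\in\mathcal{S}$ and replace $i$ by $\pi_2^{-1}(j)$ to obtain
$$\sum_{i\in\mathcal{S}}\frac{1}{\sigma_i^2/\lambda_{\pi_1(i)} + 1/\delta_{\pi_2(i)}} = \sum_{j=1}^{M}\frac{1}{\tilde{\sigma}_j^2/\lambda_{\pi(j)} + 1/\delta_j},\quad\text{where}\quad\sum_{j=1}^{M}\tilde{\sigma}_j^2\le P.$$
This completes the proof of (3.30). ∎

Combining the relationship (3.20) between $\mathbf{T}$ and $\mathbf{S}$, Theorem 3.3.1, and Theorem 3.3.2 yields the following corollary.

Corollary 3.3.1. Problem (3.18) has a solution of the form $\mathbf{S} = \mathbf{U}\boldsymbol{\Pi}\boldsymbol{\Sigma}\mathbf{V}^H$, where the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors of $\mathbf{Q}_N$ and $\mathbf{R}_t$, respectively, with the eigenvalues of $\mathbf{Q}_N$ arranged in increasing order and the eigenvalues of $\mathbf{R}_t$ arranged in decreasing order, $\boldsymbol{\Pi}$ is a permutation matrix, and $\boldsymbol{\Sigma}$ is diagonal.

Proof. Let $\boldsymbol{\sigma}$ and $\pi$ be a solution of (3.30). For $i > M$, define $\pi(i) = i$ and $\sigma_i = 0$. If $\boldsymbol{\Pi}$ is the permutation matrix corresponding to $\pi$, then substituting $\mathbf{S} = \mathbf{U}\boldsymbol{\Pi}\boldsymbol{\Sigma}\mathbf{V}^H$ in the cost function of (3.18) yields the cost function in (3.30). Since (3.29) and (3.30) are equivalent by Theorem 3.3.2, $\mathbf{S}$ is optimal in (3.18). ∎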
3.3.2 The Optimal $\boldsymbol{\Sigma}$

We now consider the optimization problem that minimizes the cost function over $\boldsymbol{\sigma}$ for a given permutation $\pi$ in (3.30). In the next subsection, we find the optimal permutation $\pi$ based on the solution to the optimization problem considered here. For notational simplicity, let $p_i$ denote $1/\lambda_{\pi(i)}$ and $q_i$ denote $1/\delta_i$. Hence, for fixed $\pi$, (3.30) is equivalent to the following optimization problem:
$$\min_{\boldsymbol{\sigma}}\ \sum_{i=1}^{M}\frac{1}{p_i\sigma_i^2 + q_i}\quad\text{subject to}\quad\sum_{i=1}^{M}\sigma_i^2\le P. \tag{3.31}$$
The solution of (3.31) can be expressed in terms of a Lagrange multiplier related to the power constraint. The structure of this solution has a water-filling interpretation in the communication literature [74].

Theorem 3.3.3. The optimal solution of (3.31) is given by
$$\sigma_i = \left(\max\left\{\frac{1}{\sqrt{\mu p_i}} - \frac{q_i}{p_i},\ 0\right\}\right)^{1/2}, \tag{3.32}$$
where the parameter $\mu$ is chosen so that
$$\sum_{i=1}^{M}\sigma_i^2 = P. \tag{3.33}$$

Proof. Since the minimization of the cost function in (3.31) is over a closed and bounded set, a solution exists. At an optimal solution to (3.31), the power constraint must hold with equality; otherwise, we could multiply $\boldsymbol{\sigma}$ by a scalar larger than 1 to reduce the value of the cost function. For notational simplicity, let $t_i = \sigma_i^2$. Then the reduced optimization problem (3.31) is equivalent to
$$\min_{\mathbf{t}}\ \sum_{i=1}^{M}\frac{1}{p_it_i + q_i}\quad\text{subject to}\quad\sum_{i=1}^{M}t_i = P,\ \mathbf{t}\ge\mathbf{0}. \tag{3.34}$$
Since the cost function is strictly convex and the constraint set is convex, the optimal solution to (3.34) is unique. According to the Lagrange multiplier theorem, the first-order necessary conditions [51] (Karush-Kuhn-Tucker conditions) for an optimal solution of (3.34) are the following: there exist a scalar $\mu\ge 0$ and a vector $\boldsymbol{\nu}\in\mathbb{R}^M$ such that
$$-\frac{p_i}{(p_it_i + q_i)^2} + \mu - \nu_i = 0,\quad \nu_i\ge 0,\quad t_i\ge 0,\quad \nu_it_i = 0,\quad 1\le i\le M. \tag{3.35}$$
Due to the convexity of the cost and the constraint, any solution of these conditions is the unique optimal solution of (3.34). A solution to (3.35) can be obtained as follows. We define the function
$$t_i(\mu) = \left(\frac{1}{\sqrt{\mu p_i}} - \frac{q_i}{p_i}\right)^+, \tag{3.36}$$
where $x^+ = \max\{x, 0\}$.
This particular value for $t_i$ is obtained by setting $\nu_i = 0$ in (3.35) and solving for $t_i$; when the solution is negative, we set $t_i(\mu) = 0$ (this corresponds to the $+$ operator in (3.36)). We note that $t_i(\mu)$ is a continuous function of $\mu$ that is strictly decreasing wherever it is positive, approaches $+\infty$ as $\mu$ approaches 0, and approaches 0 as $\mu$ grows to $+\infty$. Hence the equation
$$\sum_{i=1}^{M}t_i(\mu) = P \tag{3.37}$$
has a unique positive solution. Since $t_i(p_i/q_i^2) = 0$, we have $t_i(\mu) = 0$ for $\mu\ge p_i/q_i^2$. Then
$$\nu_i = \mu - \frac{p_i}{(p_it_i(\mu) + q_i)^2} = \mu - \frac{p_i}{q_i^2}\ge 0\quad\text{for}\quad\mu\ge p_i/q_i^2.$$
We deduce that the Karush-Kuhn-Tucker conditions are satisfied when $\mu$ is the positive solution of (3.37). ∎

3.3.3 Optimal Eigenvector Ordering

Finally, we need to find an optimal permutation in (3.30), i.e., an optimal ordering of the eigenvalues of $\mathbf{Q}_N$ and $\mathbf{R}_t$.

Theorem 3.3.4. If the eigenvalues $\{\lambda_i\}$ of $\mathbf{Q}_N$ are arranged in increasing order and the eigenvalues $\{\delta_i\}$ of $\mathbf{R}_t$ are arranged in decreasing order, then an optimal permutation in (3.30) is
$$\pi(i) = i,\quad 1\le i\le M. \tag{3.38}$$

Proof. Assume that there exist indices $i$ and $j$ such that $i < j$, $\lambda_{\pi(i)} > \lambda_{\pi(j)}$, and $\delta_i > \delta_j$, i.e., $p_i < p_j$ and $q_i < q_j$. Then $\lambda_{\pi(i)}$ and $\lambda_{\pi(j)}$ are not arranged in the supposed optimal order for the eigenvalues of $\mathbf{Q}_N$. We will show that this leads to a contradiction. Consider the following optimization problem:
$$\min_{t_i,t_j}\ \frac{1}{p_it_i + q_i} + \frac{1}{p_jt_j + q_j}\quad\text{subject to}\quad t_i + t_j = \bar{P},\ t_i\ge 0,\ t_j\ge 0, \tag{3.39}$$
where $\bar{P} = \sigma_i^2 + \sigma_j^2$. Since $\boldsymbol{\sigma}$ yields an optimal solution of (3.30), a solution of the above optimization problem is $t_i = \sigma_i^2$ and $t_j = \sigma_j^2$. Based on Theorem 3.3.3, $t_i$ is given by
$$t_i = \frac{1}{\sqrt{\mu p_i}} - \frac{q_i}{p_i}, \tag{3.40}$$
where $\mu$ is the Lagrange multiplier obtained from the power constraint $t_i + t_j = \bar{P}$ as
$$\frac{1}{\sqrt{\mu}}\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right) = \bar{P} + \frac{q_i}{p_i} + \frac{q_j}{p_j}. \tag{3.41}$$
Let $C$ denote the cost function for (3.39). Combining (3.40) and (3.41) gives
$$C = \frac{1}{p_it_i + q_i} + \frac{1}{p_jt_j + q_j} = \frac{\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2}{\bar{P} + \frac{q_i}{p_i} + \frac{q_j}{p_j}}.$$
Now suppose that we interchange the values of $p_i$ and $p_j$. Let $C^+$ denote the cost associated with the interchange.
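Under the assumption that the optimal solution of (3.39) is still positive after the exchange of $p_i$ and $p_j$, we have
$$C^+ = \frac{\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2}{\bar{P} + \frac{q_i}{p_j} + \frac{q_j}{p_i}}. \tag{3.42}$$
We need the following lemma [40]:

Lemma 3.3.3. If $a_i, b_i$, $i = 1, \ldots, n$, are two sets of numbers, then
$$\sum_{i=1}^{n}a_{[i]}b_{[n-i+1]}\le\sum_{i=1}^{n}a_ib_i\le\sum_{i=1}^{n}a_{[i]}b_{[i]},$$
where $a_{[1]}\ge\cdots\ge a_{[n]}$ denote the $a_i$ arranged in decreasing order, and similarly for $b_{[i]}$.

From the above lemma, we have $q_i/p_j + q_j/p_i\ge q_i/p_i + q_j/p_j$. Then $C^+\le C$. If $C^+ < C$, this contradicts the optimality of $\boldsymbol{\sigma}$. Then we have $C^+ = C$. Hence, for each $i$ and $j$ with $i < j$, $p_i < p_j$, and $q_i < q_j$, we can interchange the values of $p_i$ and $p_j$ to obtain a new permutation with the same value of the cost function. After the interchange, we have $p_i > p_j$, i.e., $\lambda_{\pi(i)} < \lambda_{\pi(j)}$. In this way, the $\lambda_{\pi(i)}$ are arranged in increasing order. Since the $\delta_i$ are arranged in decreasing order, we conclude that the associated optimal permutation $\pi$ is (3.38).

One technical point must now be checked: we should verify that if $p_i < p_j$ and $q_i < q_j$ with $i < j$, and if we exchange $p_i$ and $p_j$, then the corresponding optimal solution of (3.39) remains positive. We consider two cases. For the first case, suppose that $\sigma_k > 0$ for some $k$ with $i < j < k$, $p_k < p_i < p_j$, and $q_i < q_j < q_k$. From (3.40), we have
$$\sigma_k^2 = \frac{1}{\sqrt{\mu p_k}} - \frac{q_k}{p_k} > 0,\quad\text{so that}\quad\frac{1}{\sqrt{\mu}} > \frac{q_k}{\sqrt{p_k}}.$$
After the exchange,
$$\frac{1}{\sqrt{\mu p_i}} - \frac{q_j}{p_i} = \frac{1}{\sqrt{p_i}}\left(\frac{1}{\sqrt{\mu}} - \frac{q_j}{\sqrt{p_i}}\right) > \frac{1}{\sqrt{p_i}}\left(\frac{q_k}{\sqrt{p_k}} - \frac{q_j}{\sqrt{p_i}}\right) > 0.$$
Similarly,
$$\frac{1}{\sqrt{\mu p_j}} - \frac{q_i}{p_j} = \frac{1}{\sqrt{p_j}}\left(\frac{1}{\sqrt{\mu}} - \frac{q_i}{\sqrt{p_j}}\right) > \frac{1}{\sqrt{p_j}}\left(\frac{q_k}{\sqrt{p_k}} - \frac{q_i}{\sqrt{p_j}}\right) > 0.$$
For the second case, suppose that $j = \max\mathcal{M}$ and $p_i = \min\{p_l : l\in\mathcal{M}\}$, with $p_i < p_j$ and $q_i < q_j$. Since the original solution, before the exchange, is positive, it follows from (3.40) and (3.41) that
$$\bar{P} > \frac{q_j}{\sqrt{p_ip_j}} - \frac{q_i}{p_i}\quad\text{and}\quad\bar{P} > \frac{q_i}{\sqrt{p_ip_j}} - \frac{q_j}{p_j}. \tag{3.43}$$
After the exchange, the analogous inequalities that must be satisfied to preserve positivity are
$$\bar{P} > \frac{q_j}{\sqrt{p_ip_j}} - \frac{q_i}{p_j} \tag{3.44}$$
and
$$\bar{P} > \frac{q_i}{\sqrt{p_ip_j}} - \frac{q_j}{p_i}. \tag{3.45}$$
Inequality (3.45) follows from (3.43) and the facts that $p_i < p_j$ and $q_i < q_j$. If (3.44) is also satisfied, the proof is complete. If (3.44) is not satisfied, i.e., $\bar{P}\le q_j/\sqrt{p_ip_j} - q_i/p_j$, the associated cost after the exchange is
$$C^* = \frac{1}{p_j\bar{P} + q_i} + \frac{1}{q_j},$$
where $t_i = \bar{P}$ and $t_j = 0$.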
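We will show that $C^* < C$ under the conditions $\bar{P} > q_j/\sqrt{p_ip_j} - q_i/p_i$, $\bar{P} > q_i/\sqrt{p_ip_j} - q_j/p_j$, and $\bar{P}\le q_j/\sqrt{p_ip_j} - q_i/p_j$. The inequality $C^* < C$ reads
$$\frac{1}{p_j\bar{P} + q_i} + \frac{1}{q_j} < \frac{\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2}{\bar{P} + \frac{q_i}{p_i} + \frac{q_j}{p_j}}.$$
Multiplying both sides by $(p_j\bar{P} + q_i)\,q_j\left(\bar{P} + \frac{q_i}{p_i} + \frac{q_j}{p_j}\right)$ gives
$$q_j\left(\bar{P} + \frac{q_i}{p_i} + \frac{q_j}{p_j}\right) + (p_j\bar{P} + q_i)\left(\bar{P} + \frac{q_i}{p_i} + \frac{q_j}{p_j}\right) < \left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2(p_j\bar{P} + q_i)\,q_j.$$
After some algebra, showing $C^* < C$ is equivalent to showing $f(\bar{P}) < 0$, where
$$f(\bar{P}) = p_j\bar{P}^2 + \left(q_i + q_j + \frac{p_j}{p_i}(q_i - q_j) - 2q_j\sqrt{\frac{p_j}{p_i}}\right)\bar{P} + \left(\frac{q_i}{\sqrt{p_i}} - \frac{q_j}{\sqrt{p_j}}\right)^2, \tag{3.46}$$
for $\bar{P}\in\left(\max\left[\frac{q_j}{\sqrt{p_ip_j}} - \frac{q_i}{p_i},\ \frac{q_i}{\sqrt{p_ip_j}} - \frac{q_j}{p_j}\right],\ \frac{q_j}{\sqrt{p_ip_j}} - \frac{q_i}{p_j}\right]$. Since $f$ is a convex quadratic function of $\bar{P}$, it suffices to check the endpoints of this interval. Direct substitution gives
$$f\left(\frac{q_j}{\sqrt{p_ip_j}} - \frac{q_i}{p_j}\right) = q_j(q_i - q_j)\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2\left(\sqrt{\frac{p_j}{p_i}} - 1\right) < 0,$$
$$f\left(\frac{q_j}{\sqrt{p_ip_j}} - \frac{q_i}{p_i}\right) = q_j\sqrt{p_j}\left(\sqrt{\frac{p_j}{p_i}} - 1\right)\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2\left(\frac{q_i}{\sqrt{p_i}} - \frac{q_j}{\sqrt{p_j}}\right),$$
$$f\left(\frac{q_i}{\sqrt{p_ip_j}} - \frac{q_j}{p_j}\right) = \sqrt{p_j}\,(q_i - q_j)\left(\frac{1}{\sqrt{p_i}} + \frac{1}{\sqrt{p_j}}\right)^2\left(\frac{q_i}{\sqrt{p_i}} - \frac{q_j}{\sqrt{p_j}}\right).$$
The left endpoint of the interval is the first of the two lower bounds when $q_i/\sqrt{p_i}\le q_j/\sqrt{p_j}$ and the second otherwise; in either case $f$ is nonpositive there. Hence $f(\bar{P}) < 0$ on the interval, and we have $C^* < C$. ∎

Combining Corollary 3.3.1, Theorem 3.3.3, and Theorem 3.3.4, we conclude that the optimal training sequences should be designed according to the following theorem.

Theorem 3.3.5. Let $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ and $\mathbf{V}\boldsymbol{\Delta}\mathbf{V}^H$ be the diagonalizations of $\mathbf{Q}_N$ and $\mathbf{R}_t$, respectively, where the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors, the corresponding eigenvalues $\{\lambda_i\}$ are arranged in increasing order, and $\{\delta_i\}$ are arranged in decreasing order. Then the optimal solution of (3.18) is given by
$$\mathbf{S} = \mathbf{U}\boldsymbol{\Phi}\mathbf{V}^H, \tag{3.47}$$
where $\boldsymbol{\Phi}$ specifies the power allocation. It is an $N\times n_t$ diagonal matrix with diagonal elements
$$\phi_i = \left(\max\left\{\sqrt{\frac{\lambda_i}{\mu}} - \frac{\lambda_i}{\delta_i},\ 0\right\}\right)^{1/2},\quad 1\le i\le n_t, \tag{3.48}$$
and all other entries equal to zero, where the parameter $\mu$ is chosen so that $\sum_{i=1}^{n_t}\phi_i^2 = P$. With the optimal training sequence, the channel estimator simplifies to
$$\hat{\mathbf{H}} = \mathbf{Y}\mathbf{U}_{n_t}\boldsymbol{\Gamma}\mathbf{V}^H, \tag{3.49}$$
where $\boldsymbol{\Gamma} = \mathrm{diag}\{\gamma_1, \ldots, \gamma_{n_t}\}$ with $\gamma_i = \phi_i\delta_i/(\phi_i^2\delta_i + \lambda_i)$, the columns of $\mathbf{U}_{n_t}$ are the eigenvectors of $\mathbf{Q}_N$ corresponding to its $n_t$ smallest eigenvalues, and the columns of $\mathbf{V}$ are the eigenvectors of $\mathbf{R}_t$.

The design of the optimal training sequences summarized in the above theorem has a clear physical interpretation. Each eigenvector of the transmit correlation matrix $\mathbf{R}_t$ represents a transmit eigendirection, and the associated eigenvalue indicates the channel gain in that eigendirection. More power should be assigned to the signals transmitted along the eigendirections with larger channel gains.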
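Theorem 3.3.5 translates directly into a few lines of linear algebra. The sketch below is illustrative and assumes $N\ge n_t$ (as stated in the system model). It finds $\mu$ in (3.48) by bisection on a logarithmic scale, which is a solver choice made here rather than one prescribed in the text. The helper names `waterfill_powers`, `optimal_training`, `mse_cost`, and `estimator_matrix` are hypothetical. It compares the resulting MSE objective with random training matrices of equal power and checks the simplified estimator (3.49) against the compact form (3.16).

```python
import numpy as np

def waterfill_powers(lam, delta, P, iters=200):
    """phi_i^2 = max(sqrt(lam_i/mu) - lam_i/delta_i, 0), cf. (3.48), with mu set so the powers sum to P."""
    def powers(mu):
        return np.maximum(np.sqrt(lam / mu) - lam / delta, 0.0)
    lo, hi = 1.0, 1.0
    while powers(lo).sum() < P:        # total power is nonincreasing in mu
        lo /= 2.0
    while powers(hi).sum() > P:
        hi *= 2.0
    for _ in range(iters):             # bisection on a log scale keeps sum(lo) >= P >= sum(hi)
        mid = np.sqrt(lo * hi)
        if powers(mid).sum() > P:
            lo = mid                   # too much power: raise mu
        else:
            hi = mid
    return powers(np.sqrt(lo * hi))

def optimal_training(Q_N, R_t, P):
    """S = U Phi V^H of (3.47) and the gains gamma_i of (3.49); assumes N >= n_t."""
    N, n_t = Q_N.shape[0], R_t.shape[0]
    lam, U = np.linalg.eigh(Q_N)                   # eigh returns eigenvalues in ascending order
    delta, V = np.linalg.eigh(R_t)
    delta, V = delta[::-1], V[:, ::-1]             # descending order for R_t
    lam = lam[:n_t]                                # n_t smallest eigenvalues of Q_N
    phi = np.sqrt(waterfill_powers(lam, delta, P))
    Phi = np.zeros((N, n_t))
    Phi[:n_t, :n_t] = np.diag(phi)
    gamma = phi * delta / (phi**2 * delta + lam)   # diagonal of Gamma
    return U @ Phi @ V.conj().T, gamma, U[:, :n_t], V

def mse_cost(S, Q_N, R_t):
    """Objective of (3.18): tr((S^H Q_N^{-1} S + R_t^{-1})^{-1})."""
    G = S.conj().T @ np.linalg.solve(Q_N, S) + np.linalg.inv(R_t)
    return np.trace(np.linalg.inv(G)).real

def estimator_matrix(S, Q_N, R_t):
    """W with H_hat = Y W, following the compact form (3.16)."""
    G = S.conj().T @ np.linalg.solve(Q_N, S) + np.linalg.inv(R_t)
    return np.linalg.solve(Q_N, S) @ np.linalg.inv(G)

rng = np.random.default_rng(2)
def random_hpd(n):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return a @ a.conj().T + 0.5 * np.eye(n)

N, n_t, P = 8, 3, 5.0
Q_N, R_t = random_hpd(N), random_hpd(n_t)
S_opt, gamma, U_nt, V = optimal_training(Q_N, R_t, P)
best = mse_cost(S_opt, Q_N, R_t)

# Random training matrices scaled to the same total power P, for comparison.
trials = []
for _ in range(500):
    S = rng.standard_normal((N, n_t)) + 1j * rng.standard_normal((N, n_t))
    S *= np.sqrt(P / np.trace(S.conj().T @ S).real)
    trials.append(mse_cost(S, Q_N, R_t))
print(best, min(trials))
print(np.allclose(estimator_matrix(S_opt, Q_N, R_t), U_nt @ np.diag(gamma) @ V.conj().T))
```

Bisection is used only because the total power in (3.37) is monotone in $\mu$; any scalar root finder would serve equally well.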
Similarly, each eigenvector of the interference temporal correlation matrix $Q_N$ spans an interference subspace, and its eigenvalue measures the amount of interference in that subspace. Hence, the training should be sent in the subspaces with the least interference. The physical meaning of the optimal training sequences is clearer if we rewrite them as
$$S = \sum_{i=1}^{n_t}\phi_i u_i v_i^H.$$
Here the $u_i$ are orthonormal eigenvectors of $Q_N$ with eigenvalues in increasing order, and the $v_i$ are orthonormal eigenvectors of $R_t$ with eigenvalues in decreasing order. The vectors $u_i$ and $v_i$ form transmission directions in time and space, respectively. The theorem says that the optimal design puts more power on the directions that combine a large channel gain with little interference. The power assignment follows a water-filling argument under a finite power constraint.

3.4 Estimation of Channel Statistics and Feedback Design

The channel estimator and the optimal training sequences require the transmit antenna correlation matrix $R_t$ and the interference covariance matrix $Q_N$ at both the receiver and the transmitter. Both matrices are long-term channel characteristics. They can therefore be estimated from the training signals observed at the receiver and fed back to the transmitter, which uses them to build the optimal training sequences. In this section, we propose an algorithm to estimate these long-term characteristics. We also design an efficient feedback scheme that lets the transmitter approximately construct the optimal training sequences. Assume that the training signal matrix $S$ is sent over a block of $K$ packets.
The received training signals for the $n$th packet are
$$y(n) = (S\otimes I_{n_r})h(n) + e(n) = (S\otimes I_{n_r})(R_t^{1/2}\otimes R_r^{1/2})h_w(n) + e(n) = (SR_t^{1/2}\otimes R_r^{1/2})h_w(n) + e(n). \tag{3.50}$$
The sample correlation matrix of the received signal over the previous $K$ packets is
$$\hat{R} = \frac{1}{K}\sum_{n=1}^{K}y(n)y^H(n). \tag{3.51}$$
If $e(n)$ is Gaussian, $\hat{R}$ is a sufficient statistic for estimating the second-order correlation matrices $R_t$ and $Q_N$. The correlation matrix of the received signal has Kronecker product form:
$$R = E\{y(n)y^H(n)\} = (SR_t^{1/2}\otimes R_r^{1/2})\,E\{h_w(n)h_w^H(n)\}\,(R_t^{H/2}S^H\otimes R_r^{H/2}) + E\{e(n)e^H(n)\} = (SR_tS^H)\otimes R_r + Q_N\otimes R_r = (SR_tS^H + Q_N)\otimes R_r = R_q\otimes R_r, \tag{3.52}$$
where $R_q = SR_tS^H + Q_N$. If $R = R_q\otimes R_r$, then also $R = \alpha R_q\otimes\frac{1}{\alpha}R_r$ for any $\alpha \neq 0$. Hence, $R_q$ and $R_r$ cannot be uniquely identified from $y(n)$. Fortunately, the channel estimator and the design of the optimal sequences are invariant to scaling of the estimates of $R_t$ and $Q_N$:
$$\hat{H}'(n) = Y(n)(\alpha Q_N)^{-1}S\left(S^H(\alpha Q_N)^{-1}S + (\alpha R_t)^{-1}\right)^{-1} = Y(n)Q_N^{-1}S\left(S^HQ_N^{-1}S + R_t^{-1}\right)^{-1} = \hat{H}(n)$$
and
$$\mathrm{tr}\left\{\left(S^H(\alpha Q_N)^{-1}S + (\alpha R_t)^{-1}\right)^{-1}\right\} = \alpha\,\mathrm{tr}\left\{\left(S^HQ_N^{-1}S + R_t^{-1}\right)^{-1}\right\}.$$
The scaled cost function is thus just a multiple of the original cost function. To estimate $R_q$ and $R_r$, we therefore add the constraint $\mathrm{tr}(R_r) = n_r$. An iterative flip-flop algorithm [75, 76, 77] can then estimate $R_q$ and $R_r$. If the received interference $e(n)$ is Gaussian, the flip-flop algorithm gives the maximum likelihood estimates (MLE) of $R_q$ and $R_r$ [75] when it converges. Otherwise, it gives least-squares estimates. For fixed $\hat{R}_r$, the MLE of $R_q$ is
$$\hat{R}_q = \frac{1}{Kn_r}\sum_{u=1}^{n_r}\sum_{v=1}^{n_r}\sigma_{uv}\left\{\sum_{n=1}^{K}Y_v^T(n)\left[Y_u^H(n)\right]^T\right\}, \tag{3.53}$$
where $\sigma_{uv}$ is the $(u,v)$th element of $\hat{R}_r^{-1}$ and $Y_u(n)$ is the $u$th row of the received signal matrix $Y(n)$.
Similarly, for fixed $\hat{R}_q$, the MLE of $R_r$ is
$$\hat{R}_r = \frac{1}{KN}\sum_{u=1}^{N}\sum_{v=1}^{N}\varpi_{uv}\left\{\sum_{n=1}^{K}W_v(n)W_u^H(n)\right\}, \tag{3.54}$$
where $\varpi_{uv}$ is the $(u,v)$th element of $\hat{R}_q^{-1}$ and $W_u(n)$ is the $u$th column of the received signal matrix $Y(n)$. To make $\hat{R}_q$ and $\hat{R}_r$ uniquely identifiable, $\hat{R}_r$ is then scaled so that $\mathrm{tr}(\hat{R}_r) = n_r$.

The terms inside the braces in (3.53) and (3.54) can be computed before the iterations start, which reduces complexity. The algorithm needs an initial value for either $\hat{R}_q$ or $\hat{R}_r$, and a natural choice is $\hat{R}_r = I_{n_r}$. It then alternates between computing $\hat{R}_q$ and $\hat{R}_r$ until convergence. Convergence to the MLE is hard to prove analytically. However, extensive experiments in the statistics literature [75] show that the algorithm always converges to the MLE for practical sample sizes. The numerical results in Section 3.5 also confirm convergence in our case.

$R_t$ and $Q_N$ can then be estimated from $\hat{R}_q$. Here $\mathcal{R}$ denotes the range space of a matrix and $\mathcal{R}^{\perp}$ its orthogonal complement. The pair $(R_t, Q_N)$ can be uniquely identified from $R_q$ only when $\mathcal{R}(Q_N) \subseteq \mathcal{R}^{\perp}(S)$, in the following sense.

Lemma 3.4.1. Let $R_t$ and $Q_N$ be Hermitian positive semidefinite matrices and $R_q = SR_tS^H + Q_N$, where $S$ is of full rank. Let $\mathcal{D} = \{(R_t, Q_N) : \mathcal{R}(Q_N) \subseteq \mathcal{R}^{\perp}(S)\}$. Then $R_q$ and $(R_t, Q_N)$ are in one-to-one correspondence only for pairs $(R_t, Q_N)$ in $\mathcal{D}$.

Proof. Let $P_S = S(S^HS)^{-1}S^H$ be the projection onto $\mathcal{R}(S)$, and let $P_S^{\perp} = I - P_S$ be the projection onto $\mathcal{R}^{\perp}(S)$.

First, let $(R_t, Q_N), (R_t', Q_N') \in \mathcal{D}$, with $R_q = SR_tS^H + Q_N$ and $R_q' = SR_t'S^H + Q_N'$. Then
$$P_S^{\perp}R_q = P_S^{\perp}Q_N = Q_N, \qquad P_SR_q = SR_tS^H,$$
$$P_S^{\perp}R_q' = P_S^{\perp}Q_N' = Q_N', \qquad P_SR_q' = SR_t'S^H.$$
Since $S$ is of full rank, $P_SR_q = P_SR_q'$ iff $R_t = R_t'$. Since $P_S$ and $P_S^{\perp}$ project onto complementary subspaces, $R_q = R_q'$ iff $P_S^{\perp}R_q = P_S^{\perp}R_q'$ and $P_SR_q = P_SR_q'$, i.e., iff $(R_t, Q_N) = (R_t', Q_N')$.

Conversely, let $(R_t, Q_N) \in \mathcal{D}$ and $R_q = SR_tS^H + Q_N$.
Now choose $R_t' \neq R_t$ and define $Q_N' = Q_N + SR_tS^H - SR_t'S^H$. Since $\mathcal{R}(Q_N) \subseteq \mathcal{R}^{\perp}(S)$ and $S$ is of full rank, $(R_t', Q_N') \notin \mathcal{D}$. But $R_q' = SR_t'S^H + Q_N' = SR_tS^H + Q_N = R_q$. ∎

By the lemma, $Q_N$ and $R_t$ cannot in general be estimated simultaneously from $R_q$. However, $P_S^{\perp}R_qP_S^{\perp} = P_S^{\perp}Q_NP_S^{\perp}$, so $Q_N$ can be estimated from $P_S^{\perp}\hat{R}_qP_S^{\perp}$. For brevity, let $A$ denote $P_S^{\perp}\hat{R}_qP_S^{\perp}$.

The interference is wide-sense stationary in time, so $Q_N$ is a Toeplitz matrix. It is represented by a sequence $\{q_k;\ k = 0, \pm 1, \ldots, \pm(N-1)\}$ with $Q_N = \{q_{k,j}\} = \{q_{k-j}\}$. The $ij$th element of $P_S^{\perp}Q_NP_S^{\perp}$ is $\sum_{m}\sum_{k}p_{im}\,q_{m-k}\,p_{kj}$, where $p_{ij}$ is the $ij$th element of $P_S^{\perp}$. Setting this equal to $a_{ij}$, the $ij$th element of $A$, gives a set of linear equations in $\{q_k\}$. We use the Hermitian structure of $P_S^{\perp}Q_NP_S^{\perp}$ and $A$ and separate the real and imaginary parts of $q_k$ and $a_{ij}$. This gives $N^2$ linear equations in $2N-1$ real unknowns:
$$q_r = [q_0, \mathrm{Re}(q_1), \mathrm{Im}(q_1), \ldots, \mathrm{Re}(q_{N-1}), \mathrm{Im}(q_{N-1})]^T.$$
The system is solved by least squares, and the estimate of $Q_N$ is then built from $q_r$.

When $N$ is large, $Q_N$ can also be approximated by a circulant matrix [78] with fixed eigenvectors:
$$Q_N \approx F_N\Psi F_N^H, \tag{3.55}$$
where $F_N$ is the $N\times N$ FFT matrix and $\Psi$ is a diagonal matrix of eigenvalues $\psi_i$. Only the $n_t$ smallest eigenvalues of $Q_N$ and their eigenvectors are needed to construct the optimal training sequences. Under (3.55), this amounts to estimating the $n_t$ smallest $\psi_i$ and identifying the corresponding columns of $F_N$:

- The $n_t$ smallest positive eigenvalues of $\hat{Q}_N$ serve as estimates of the $n_t$ smallest $\psi_i$.
- The chosen columns of $F_N$ are those closest to the eigenvectors that belong to these eigenvalues.

These $n_t$ eigenvalue estimates and the $n_t$ column indices of $F_N$ are then fed back to the transmitter, which uses them to construct the optimal training sequences.
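The estimation chain of this section can be prototyped as follows. The sketch makes these assumptions:

- The training observations are Gaussian and are generated synthetically from the Kronecker model (3.52).
- (3.53) and (3.54) are written in an equivalent matrix form.
- The least-squares fit keeps all $2N^2$ real equations instead of the $N^2$ independent ones. This is a simplification on our part.

The names `flip_flop`, `toeplitz_basis` and `estimate_QN`, the stopping rule, and all numerical values are hypothetical.

```python
import numpy as np


def flip_flop(Y, n_iter=100, tol=1e-10):
    """Alternating (flip-flop) estimate of R = R_q kron R_r, cf. (3.53)-(3.54).

    Y: (K, n_r, N) array holding the received training blocks Y(n).
    The scale ambiguity is removed by forcing tr(R_r) = n_r.
    """
    K, n_r, N = Y.shape
    Rr = np.eye(n_r, dtype=complex)                       # initial value R_r = I
    for _ in range(n_iter):
        # (3.53) in matrix form: conj(sum_n Y(n)^H R_r^{-1} Y(n)) / (K n_r)
        Rq = np.einsum('kai,ab,kbj->ij', Y.conj(), np.linalg.inv(Rr), Y).conj() / (K * n_r)
        # (3.54) in matrix form: sum_n Y(n) (R_q^{-1})^T Y(n)^H / (K N)
        Rr_new = np.einsum('kai,ij,kbj->ab', Y, np.linalg.inv(Rq).T, Y.conj()) / (K * N)
        c = n_r / np.trace(Rr_new).real
        Rr_new, Rq = c * Rr_new, Rq / c                   # keep the Kronecker product, fix tr(R_r)
        converged = np.linalg.norm(Rr_new - Rr) <= tol * np.linalg.norm(Rr)
        Rr = Rr_new
        if converged:
            break
    return Rq, Rr


def toeplitz_basis(N):
    """Real-coefficient basis of N x N Hermitian Toeplitz matrices, ordered as
    q_r = [q_0, Re q_1, Im q_1, ..., Re q_{N-1}, Im q_{N-1}]."""
    basis = [np.eye(N, dtype=complex)]
    for k in range(1, N):
        Tk = np.eye(N, k=-k)                              # ones where i - j = k
        basis += [Tk + Tk.T, 1j * (Tk - Tk.T)]
    return basis


def estimate_QN(Rq_hat, S):
    """Least-squares Hermitian Toeplitz fit of P_perp Q_N P_perp to P_perp Rq_hat P_perp."""
    N = S.shape[0]
    P_perp = np.eye(N) - S @ np.linalg.solve(S.conj().T @ S, S.conj().T)
    A = P_perp @ Rq_hat @ P_perp
    basis = toeplitz_basis(N)
    G = np.stack([(P_perp @ B @ P_perp).ravel() for B in basis], axis=1)
    # keep the unknowns real by stacking real and imaginary parts of all entries
    G_ri = np.vstack([G.real, G.imag])
    a_ri = np.concatenate([A.ravel().real, A.ravel().imag])
    q_r = np.linalg.lstsq(G_ri, a_ri, rcond=None)[0]
    return sum(c * B for c, B in zip(q_r, basis))


# Hypothetical scenario
rng = np.random.default_rng(0)
K, n_r, N, nt = 2000, 3, 16, 3
idx = np.arange(N)
QN = 0.5 ** np.abs(idx[:, None] - idx[None, :])                        # Toeplitz Q_N
Rt = 0.8 ** np.abs(np.arange(nt)[:, None] - np.arange(nt)[None, :])
Rr = 0.6 ** np.abs(np.arange(n_r)[:, None] - np.arange(n_r)[None, :])  # tr(R_r) = n_r
S = (rng.standard_normal((N, nt)) + 1j * rng.standard_normal((N, nt))) / np.sqrt(2)
Rq = S @ Rt @ S.conj().T + QN
# Y(n) = R_r^{1/2} Z(n) R_q^{T/2} has E[vec(Y) vec(Y)^H] = R_q kron R_r
Z = (rng.standard_normal((K, n_r, N)) + 1j * rng.standard_normal((K, n_r, N))) / np.sqrt(2)
Y = np.linalg.cholesky(Rr) @ Z @ np.linalg.cholesky(Rq).T
Rq_hat, Rr_hat = flip_flop(Y)
QN_hat = estimate_QN(Rq_hat, S)
```

Without noise, $P_S^{\perp}R_qP_S^{\perp} = P_S^{\perp}Q_NP_S^{\perp}$ holds exactly. The Toeplitz fit applied to the true $R_q$ should then recover $Q_N$ up to rounding, provided the stacked system has full column rank. We expect full rank for a randomly drawn $S$.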
Feeding back these indices of $F_N$ instead of the full eigenvectors of $Q_N$ saves bandwidth, because the number of training symbols $N$ is usually large.

To derive the estimator of $R_t$, we need the following lemma. It shows that $Q_N$ and $P_S^{\perp}Q_NP_S^{\perp}$ become asymptotically equivalent as $N$ increases.

Lemma 3.4.2. Assume that $Q_N$ is an absolutely summable Toeplitz matrix. Then $Q_N$ and $P_S^{\perp}Q_NP_S^{\perp}$ are asymptotically equivalent. Since $Q_N$ is Toeplitz, $P_S^{\perp}Q_NP_S^{\perp}$ is asymptotically Toeplitz.

Proof. The proof uses two matrix norms, the strong norm and the weak norm [78, 79].

The strong norm $\|A\|$ is defined as
$$\|A\| = \max_{x:\,x^*x=1}\left[x^*A^*Ax\right]^{1/2} = \sqrt{\lambda_{\max}(A^*A)},$$
where $\lambda_{\max}$ denotes the largest eigenvalue of a matrix. If $A$ is Hermitian positive semidefinite, $\|A\| = \lambda_{\max}(A)$.

The weak norm of an $n\times n$ matrix $A$ is $|A| = \left(n^{-1}\mathrm{tr}[A^*A]\right)^{1/2}$.

Two sequences of $n\times n$ matrices $A_n$ and $B_n$ are asymptotically equivalent [78] if both of the following hold:

- They are uniformly bounded in strong norm: $\|A_n\|, \|B_n\| \le M < \infty$.
- Their difference vanishes in weak norm: $\lim_{n\to\infty}|A_n - B_n| = 0$.

If one of the two sequences is Toeplitz, the other is called asymptotically Toeplitz.

As stated in the lemma, $Q_N$ is an absolutely summable Toeplitz matrix. This is easy to verify for the temporal interference correlation matrices that arise in practice, such as those of the jamming signals and cochannel interference considered here. $Q_N$ is represented by a sequence $\{q_k;\ k = 0, \pm 1, \pm 2, \ldots\}$ with $Q_N = \{q_{k,j}\} = \{q_{k-j}\}$ and $\sum_{k=-\infty}^{\infty}|q_k| < \infty$. Define $M_q = \sum_{k=-\infty}^{\infty}|q_k|$. It is shown in [80] that $Q_N$ is bounded in strong norm:
$$\|Q_N\| \le 2\sum_{k=-\infty}^{\infty}|q_k| = 2M_q < \infty.$$
Next we show that $\|P_S^{\perp}Q_NP_S^{\perp}\|$ is also bounded. By the properties of the strong norm,
$$\|P_S^{\perp}Q_NP_S^{\perp}\| = \|(I - P_S)Q_N(I - P_S)\| = \|Q_N - P_SQ_N - Q_NP_S + P_SQ_NP_S\| \le \|Q_N\| + \|P_SQ_N\| + \|Q_NP_S\| + \|P_SQ_NP_S\|.$$
To proceed, we need the following lemma [40]:

Lemma 3.4.3. For two Hermitian positive semidefinite matrices $G$ and $H$, $\lambda_{\max}(GH) \le \lambda_{\max}(G)\lambda_{\max}(H)$.

Then
$$\|P_SQ_N\| = \left[\lambda_{\max}(Q_NP_SQ_N)\right]^{1/2} \le \left[\lambda_{\max}(Q_N)\lambda_{\max}(P_S)\lambda_{\max}(Q_N)\right]^{1/2} = \lambda_{\max}(Q_N) = \|Q_N\|.$$
Similarly, $\|Q_NP_S\| \le \|Q_N\|$ and $\|P_SQ_NP_S\| \le \|Q_N\|$. Thus $\|P_S^{\perp}Q_NP_S^{\perp}\| \le 4\|Q_N\| \le 8M_q$. With $M = 8M_q$, we have $\|Q_N\| \le M < \infty$ and $\|P_S^{\perp}Q_NP_S^{\perp}\| \le M < \infty$.

Next, we show that the distance between the two matrices goes to zero in weak norm. By the properties of the weak norm,
$$|Q_N - P_S^{\perp}Q_NP_S^{\perp}| = |P_SQ_N + Q_NP_S - P_SQ_NP_S| \le |P_SQ_N| + |Q_NP_S| + |P_SQ_NP_S|.$$
We need the following lemma [78, 80]:

Lemma 3.4.4. For two $n\times n$ matrices $G$ and $H$, $|GH| \le \|G\|\,|H|$.

The weak norm of $P_S$ is
$$|P_S| = \left(N^{-1}\mathrm{tr}[S(S^HS)^{-1}S^H]\right)^{1/2} = \left(N^{-1}\mathrm{tr}[I_{n_t}]\right)^{1/2} = \left(\frac{n_t}{N}\right)^{1/2}.$$
By Lemma 3.4.4,
$$|Q_NP_S| \le \|Q_N\|\,|P_S| = \left(\frac{n_t}{N}\right)^{1/2}\|Q_N\| \le \left(\frac{n_t}{N}\right)^{1/2}2M_q.$$
Similarly,
$$|P_SQ_NP_S| \le \|P_SQ_N\|\,|P_S| \le \|Q_N\|\left(\frac{n_t}{N}\right)^{1/2} \le \left(\frac{n_t}{N}\right)^{1/2}2M_q,$$
and $|P_SQ_N| = |Q_NP_S| \le (n_t/N)^{1/2}2M_q$. Therefore,
$$\lim_{N\to\infty}|Q_N - P_S^{\perp}Q_NP_S^{\perp}| \le \lim_{N\to\infty}3\left(\frac{n_t}{N}\right)^{1/2}2M_q = 0. \quad ∎$$
By Lemma 3.4.2, $R_t$ can be estimated by projecting the received signal onto $\mathcal{R}(S)$. Since $N$ is usually much larger than $n_t$,
$$R_q \approx SR_tS^H + P_S^{\perp}Q_NP_S^{\perp}, \tag{3.56}$$
and hence
$$P_SR_qP_S \approx P_SSR_tS^HP_S + P_SP_S^{\perp}Q_NP_S^{\perp}P_S = SR_tS^H. \tag{3.57}$$
The transmit channel correlation matrix is then estimated as
$$\hat{R}_t = (S^HS)^{-1}S^H\hat{R}_qS(S^HS)^{-1}. \tag{3.58}$$

3.5 Numerical Results

This section presents numerical results that show the channel estimation gain achieved by the optimal training sequences. We consider a MIMO system with 3 transmit and 3 receive antennas. The antennas form uniform linear arrays at both the transmitter and the receiver.
For a small angle spread, the correlation coefficient between the $i$th and $j$th transmit antennas [67] can be approximated as
$$[R_t]_{ij} \approx \frac{1}{2\pi}\int_0^{2\pi}\exp\left\{j2\pi(i-j)\frac{d_t}{\lambda}\sin\Delta\,\sin\theta\right\}d\theta = J_0\!\left(2\pi(i-j)\frac{d_t}{\lambda}\sin\Delta\right), \tag{3.59}$$
where $J_0(x)$ is the zeroth-order Bessel function of the first kind, $\Delta$ is the angle spread, $d_t$ is the antenna spacing, and $\lambda$ is the wavelength of the narrowband signal. We set $d_t = 0.5\lambda$. We simulate two channels with different transmit correlations:

- a high spatial correlation channel with $\Delta = 5^\circ$;
- a low spatial correlation channel with $\Delta = 25^\circ$.

The receive correlation matrix $R_r$ is computed in the same way, with $\Delta = 25^\circ$. We consider two kinds of interference: cochannel interference from other users of the same wireless system, and jamming signals, which are usually modeled as autoregressive (AR) random processes. Performance is measured by the total channel estimation MSE. The following training sequence sets are compared: 1) the optimal training sequences of Section 3.3; 2) the approximate optimal training sequences, built from the channel and interference statistics estimated by the algorithm of Section 3.4; 3) the temporally optimal training sequences, designed with the transmit channel correlation matrix set to the identity so that only the temporal interference correlation is used (we also consider approximate temporally optimal sequences, built from the estimated channel statistics); 4) the spatially optimal training sequences, designed for temporally white interference so that only the transmit correlation is used.
(we also consider approximate spatially optimal sequences, built from the estimated channel statistics); 5) binary orthogonal sequences; 6) random sequences.

3.5.1 Cochannel Interference

In a cellular system, frequency reuse causes cochannel interference (CCI) from other cells. The interfering signals therefore have the same format as the desired user's signal. The signal transmitted from the $i$th antenna of the $m$th interferer is
$$s_i^{(m)}(t) = \sqrt{P_m}\sum_{l=-\infty}^{\infty}b_{i,l}^{(m)}\,\psi(t - lT - \tau_m), \tag{3.60}$$
where $P_m$ is the transmit power of the $m$th interferer. The $\{b_{i,l}^{(m)}\}$ are the data symbols sent from the $i$th antenna of the $m$th interferer, assumed to be i.i.d. binary random variables with zero mean and unit variance. $\psi(t)$ is the symbol waveform and $T$ is the symbol duration. The receiver is synchronized to the desired user but not necessarily to the interferers, and $\tau_m$ is the symbol timing offset between the $m$th interferer and the desired user. Without loss of generality, $0 \le \tau_m < T$.

The interference symbol matrix $S_m$ of the $m$th interferer contains the samples at the receiver's matched filter output at times $jT$. Its $(j,i)$th element is
$$[S_m]_{j,i} = \sqrt{P_m}\sum_{l=-\infty}^{\infty}b_{i,l}^{(m)}\,\phi\big((j-l)T - \tau_m\big), \tag{3.61}$$
with
$$\phi(t) = \int_{-\infty}^{\infty}\psi(t+s)\,\psi^*(s)\,ds, \tag{3.62}$$
the autocorrelation of the symbol waveform. For cochannel interference, the temporal correlation comes from the intersymbol interference in the sampled interfering signals.

In the simulations, there are two interfering signals and the signal-to-interference ratio (SIR) is 0 dB. The symbol waveform is the ISI-free waveform with a raised-cosine spectrum, for which
$$\phi(t) = \mathrm{sinc}(\pi t/T)\,\frac{\cos(\pi\beta t/T)}{1 - 4\beta^2t^2/T^2}, \qquad \mathrm{sinc}(x) = \frac{\sin x}{x}.$$
We set the roll-off factor to $\beta = 0.5$, with $\tau_1 = 0.2T$ and $\tau_2 = 0.5T$.

Fig. 3.1 and Fig. 3.2 show the total channel estimation MSEs for the high and low spatial correlation channels, respectively. In both cases, the optimal sequences clearly outperform the orthogonal and random sequences.

For the high spatial correlation channel, the optimal sequences give a substantial gain over both the spatially optimal and the temporally optimal sequences. The approximate optimal sequences achieve most of this gain.

For the low spatial correlation channel, the approximate optimal sequences perform close to the optimal sequences. Here the temporal correlation affects channel estimation more than the spatial correlation, because the training length $N$ is much larger than the number of transmit antennas $n_t$. Fig. 3.2 confirms this: the temporally optimal sequences perform about as well as the optimal sequences, and both clearly outperform the spatially optimal sequences.

3.5.2 Jamming Signals

We assume two jamming signals, modeled as first-order AR processes driven by temporally white Gaussian processes $\{u_{i,t}\}$:
$$s_{i,t} = a_is_{i,t-1} + u_{i,t}. \tag{3.63}$$

Figure 3.1: Comparison of total MSEs obtained using different training sequences. ISI-free symbol waveform and high spatial correlation channel. [Plot: total MSE versus training length N, 15 ≤ N ≤ 65.]
Figure 3.2: Comparison of total MSEs obtained using different training sequences. ISI-free symbol waveform and low spatial correlation channel. [Plot: total MSE versus training length N, 15 ≤ N ≤ 65. Curves in Figs. 3.1 and 3.2: 1) optimal sequences; 2) approximate optimal sequences; 3) temporally optimal sequences; 4) spatially optimal sequences; 5) orthogonal sequences; 6) random sequences; approximate temporally optimal sequences; approximate spatially optimal sequences.]

In (3.63), $s_{i,t}$ is the jamming signal transmitted by the $i$th jammer at time index $t$ and $a_i$ is its temporal correlation coefficient. $u_{i,t}$ has zero mean and variance $\sigma_{u_i}^2$, which sets the transmit power of the $i$th jammer. The SIR is 0 dB, and we choose $a_1 = 0.4$ and $a_2 = 0.5$. Fig. 3.3 and Fig. 3.4 show the total channel estimation MSEs for the high and low spatial correlation channels, respectively. The different training sequences compare in the same way as for cochannel interference.

3.6 Conclusion

This chapter considers a wireless communication system with multiple transmit and receive antennas in a slow Rayleigh flat-fading environment. We study the estimation of correlated MIMO channels in colored interference. We derive the Bayesian channel estimator and design the optimal training sequences to minimize the mean square error (MSE) of channel estimation.
We also propose an algorithm to estimate the long-term channel statistics and design an efficient feedback scheme, so that the transmitter can approximately construct the optimal sequences. Numerical results show that the optimal training sequences give a substantial channel estimation gain over the other training sequences.

Figure 3.3: Comparison of total MSEs obtained using different training sequences. AR jammers and high spatial correlation channel. [Plot: total MSE versus training length N, 15 ≤ N ≤ 65.]

Figure 3.4: Comparison of total MSEs obtained using different training sequences. AR jammers and low spatial correlation channel. [Plot: total MSE versus training length N, 15 ≤ N ≤ 65. Curves in Figs. 3.3 and 3.4: 1) optimal sequences; 2) approximate optimal sequences; 3) temporally optimal sequences; 4) spatially optimal sequences; 5) orthogonal sequences; 6) random sequences; approximate temporally optimal sequences; approximate spatially optimal sequences.]

3.7 Appendix

3.7.1 A Trace Problem

This appendix analyzes a variant of the optimization problem (3.18):
$$\min_S\ \mathrm{tr}\left(R_t^{1/2}S^HQ_N^{-1}SR_t^{1/2} + I_{n_t}\right)^{-1} \quad\text{subject to}\quad \mathrm{tr}\{S^HS\} \le P. \tag{3.64}$$
The cost functions of (3.18) and (3.64) are related. The cost function of (3.18) can be rewritten as
$$\mathrm{tr}\left(S^HQ_N^{-1}S + R_t^{-1}\right)^{-1} = \mathrm{tr}\,R_t\left(R_t^{1/2}S^HQ_N^{-1}SR_t^{1/2} + I\right)^{-1},$$
which is a weighted version of the cost function of (3.64). To simplify notation, we study problem (3.64) written with new symbols:
$$\min_S\ \mathrm{tr}\left(DS^HQSD + I\right)^{-1} \quad\text{subject to}\quad \mathrm{tr}(S^HS) \le P,\ S \in \mathbb{C}^{m\times n}. \tag{3.65}$$
Here $Q$ is a nonzero Hermitian positive semidefinite matrix, $D$ is a nonzero Hermitian positive definite matrix, and the positive scalar $P$ is the power constraint on the signal $S$. The main results on (3.65) are summarized here for completeness, and the details are in [81]. For convenience, write $C = (DS^HQSD + I)^{-1}$.

As discussed before, in the special case $D = I$ the solution is expressed through the eigenvalues and eigenvectors of $Q$ and a Lagrange multiplier for the power constraint. Here $D \neq I$, which makes minimizing the trace of $C$ harder. We will show that (3.65) has a solution of the form $S = U\Sigma V^H$, where $U$ and $V$ are orthonormal eigenvector matrices of $Q$ and $D$, respectively, and $\Sigma$ is diagonal. Solving (3.65) requires the diagonalizations of $Q$ and $D$ and an ordering of the columns of $U$ and $V$. We can determine the optimal ordering when $P$ is either large or small. For intermediate $P$ this is harder, because, unlike the case $D = I$, problem (3.65) is combinatorial.

The trace problem (3.65) arises in spreading sequence optimization for code division multiple access (CDMA) systems. In cellular systems, multiple access schemes let many users share a finite amount of radio resources at the same time. CDMA is one of the main access techniques. It is used in the IS-95 system and will be used in next-generation cellular systems [82]. In CDMA, each user is assigned its own spreading sequence so that the users can share the channel. We consider the uplink (from the mobile units to the base station) of a CDMA system in which the users served by a base station are symbol synchronous.
The cochannel interference from users in neighboring cells is modeled as additive colored Gaussian noise. The received signal at the base station is
$$y = \sum_{i=1}^{K}h_is_ix_i + e,$$
where:

- $K$ is the number of signals received by the base station;
- $x_i$ is the symbol transmitted by the $i$th user;
- $s_i \in \mathbb{C}^N$ is the spreading sequence of the $i$th user;
- $h_i$ is the channel gain from the $i$th user to the base station;
- $e \in \mathbb{C}^N$ is additive colored Gaussian noise with zero mean and covariance $\Sigma$.

$K$ and $N$ are usually comparable. The symbols $x_i$ are assumed independent with zero mean and unit variance. The received signal can be written as
$$y = SHx + e, \tag{3.66}$$
where the spreading sequence matrix $S$ has $j$th column $s_j$ and $H$ is diagonal with $i$th diagonal element $h_i$. Again, by the Bayesian Gauss-Markov theorem [36, 83], the MMSE estimator of $x$ is
$$\hat{x} = \left(H^HS^H\Sigma^{-1}SH + I\right)^{-1}H^HS^H\Sigma^{-1}y.$$
The covariance matrix of the estimation error is $C_x = (H^HS^H\Sigma^{-1}SH + I)^{-1}$. Choosing the spreading sequences of all users to minimize the total MSE $\mathrm{tr}(C_x)$ under cochannel interference from other cells, subject to a power constraint, is problem (3.65) with $Q = \Sigma^{-1}$ and diagonal $D = H$.

To solve the trace problem, we first analyze the structure of an optimal solution to (3.65). Let $U\Lambda U^H$ and $V\Delta V^H$ be diagonalizations of $Q$ and $D$, respectively, where the columns of $U$ and $V$ are orthonormal eigenvectors. Let $\delta_i$, $1 \le i \le n$, and $\lambda_j$, $1 \le j \le m$, be the diagonal elements of $\Delta$ and $\Lambda$, respectively. The eigenvalues are arranged in decreasing order:
$$\delta_1 \ge \delta_2 \ge \cdots \ge \delta_n \quad\text{and}\quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m. \tag{3.67}$$
Define
$$T = U^HSV. \tag{3.68}$$
Substituting $S = UTV^H$ into (3.65) gives the equivalent problem
$$\min_T\ \mathrm{tr}\left(\Delta T^H\Lambda T\Delta + I\right)^{-1} \quad\text{subject to}\quad \mathrm{tr}(T^HT) \le P,\ T \in \mathbb{C}^{m\times n}. \tag{3.69}$$
We now show that (3.69) has a solution with at most one nonzero entry in each row and column.

Theorem 3.7.1.
There exists a solution of (3.69) of the form $T = \Pi_1\Sigma\Pi_2$, where $\Pi_1$ and $\Pi_2$ are permutation matrices and $\sigma_{ij} = 0$ for all $i \neq j$.

Combining (3.68) with Theorem 3.7.1, problem (3.65) has a solution of the form $S = U\Pi_1\Sigma\Pi_2V^H$, where $\Pi_1$ and $\Pi_2$ are permutation matrices. We now show that one of the two permutation matrices can be dropped if the eigenvalues of $D$ and $Q$ are arranged in decreasing order. Let $N$ be the minimum of $m$ and $n$. Substituting $S = U\Pi_1\Sigma\Pi_2V^H$ into (3.65) gives the equivalent problem
$$\min\ \mathrm{tr}\left((\Pi_2\Delta\Pi_2^T)\Sigma^H(\Pi_1^T\Lambda\Pi_1)\Sigma(\Pi_2\Delta\Pi_2^T) + I\right)^{-1} \quad\text{subject to}\quad \sum_{i=1}^{N}\sigma_i^2 \le P. \tag{3.70}$$
The minimization is over diagonal matrices $\Sigma$ with $\sigma_1, \ldots, \sigma_N$ on the diagonal and over permutation matrices $\Pi_1$ and $\Pi_2$. The symmetric permutations $\Pi_1^T\Lambda\Pi_1$ and $\Pi_2\Delta\Pi_2^T$ simply permute the diagonal elements of $\Lambda$ and $\Delta$. Hence, (3.70) is equivalent to
$$\min_{\sigma,\,\pi_1,\,\pi_2}\ \sum_{i=1}^{N}\frac{1}{(\delta_{\pi_2(i)}\sigma_i)^2\lambda_{\pi_1(i)} + 1} \quad\text{subject to}\quad \sum_{i=1}^{N}\sigma_i^2 \le P,\ \pi_1 \in \mathcal{P}_m,\ \pi_2 \in \mathcal{P}_n, \tag{3.71}$$
where $\mathcal{P}_m$ is the set of bijections of $\{1, 2, \ldots, m\}$ onto itself. We first show that only the largest diagonal elements of $D$ and $Q$ matter.

Lemma 3.7.1. Let $U\Lambda U^H$ and $V\Delta V^H$ be diagonalizations of $Q$ and $D$, respectively, where the columns of $U$ and $V$ are orthonormal eigenvectors. Let $\sigma$, $\pi_1$ and $\pi_2$ be an optimal solution of (3.71), and define the sets
$$\mathcal{N} = \{i : \sigma_i > 0\}, \qquad \mathcal{Q} = \{\lambda_{\pi_1(i)} : i \in \mathcal{N}\}, \qquad \mathcal{D} = \{\delta_{\pi_2(i)} : i \in \mathcal{N}\}.$$
If $\mathcal{N}$ has $l$ elements, then the elements of $\mathcal{D}$ and $\mathcal{Q}$ are all nonzero and are the $l$ largest eigenvalues of $D$ and $Q$, respectively.

Using Lemma 3.7.1, we now eliminate one of the permutations in (3.71).

Theorem 3.7.2. Let $U\Lambda U^H$ and $V\Delta V^H$ be diagonalizations of $Q$ and $D$, respectively, where the columns of $U$ and $V$ are orthonormal eigenvectors and the eigenvalues of $Q$ and $D$ are in decreasing order as in (3.67).
If $K$ is the minimum of the ranks of $D$ and $Q$, then (3.71) is equivalent to
$$\min_{\sigma,\,\pi}\ \sum_{i=1}^{K}\frac{1}{(\delta_i\sigma_i)^2\lambda_{\pi(i)} + 1} \quad\text{subject to}\quad \sum_{i=1}^{K}\sigma_i^2 \le P,\ \pi \in \mathcal{P}_K, \tag{3.72}$$
where $\sigma_i = 0$ for $i > K$.

Proof. The proof is similar to that of Theorem 3.3.2. ∎

Corollary 3.7.1. Problem (3.65) has a solution of the form $S = U\Pi\Sigma V^H$, where $\Pi$ is a permutation matrix, $\Sigma$ is diagonal, and the columns of $U$ and $V$ are orthonormal eigenvectors of $Q$ and $D$, respectively, with eigenvalues in decreasing order.

Proof. The proof is similar to that of Corollary 3.3.1. ∎

Now fix the permutation $\pi$ in (3.72) and optimize over $\sigma$. To simplify indexing, let $p_i = \lambda_{\pi(i)}$. For fixed $\pi$, (3.72) becomes
$$\min_\sigma\ \sum_{i=1}^{K}\frac{1}{(\delta_i\sigma_i)^2p_i + 1} \quad\text{subject to}\quad \sum_{i=1}^{K}\sigma_i^2 \le P. \tag{3.73}$$
The solution of (3.73) can be expressed through a Lagrange multiplier for the constraint.

Theorem 3.7.3. The optimal solution of (3.73) is
$$\sigma_i = \left(\max\left\{\frac{1}{\delta_i\sqrt{\mu p_i}} - \frac{1}{\delta_i^2p_i},\ 0\right\}\right)^{1/2}, \tag{3.74}$$
where the parameter $\mu$ is chosen so that
$$\sum_{i=1}^{K}\sigma_i^2 = P. \tag{3.75}$$

Proof. The proof is similar to that of Theorem 3.3.3. ∎

To solve (3.65), we must also find an optimal ordering of the eigenvalues of $D$ and $Q$. Theorems 3.7.4 and 3.7.5 give the optimal ordering for large and for small power $P$.

Theorem 3.7.4. Suppose the eigenvalues $\{\lambda_i\}$ of $Q$ and $\{\delta_i\}$ of $D$ are in decreasing order. Then for sufficiently large $P$, an optimal permutation in (3.72) is
$$\pi(i) = K + 1 - i,\ \ 1 \le i \le K, \qquad \pi(i) = i,\ \ i > K. \tag{3.76}$$

Theorem 3.7.5. Suppose the eigenvalues $\{\lambda_i\}$ of $Q$ and $\{\delta_i\}$ of $D$ are in decreasing order, and let $L$ be the smaller of the multiplicities of $\delta_1$ and $\lambda_1$. For sufficiently small $P$, an optimal solution of (3.65) is
$$S = \sqrt{\frac{P}{L}}\sum_{i=1}^{L}u_iv_i^H, \tag{3.77}$$
where $u_i$ and $v_i$ are orthonormal eigenvectors of $Q$ and $D$ associated with $\lambda_1$ and $\delta_1$, respectively.
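For small $K$, the combinatorial nature of (3.72) can be explored by brute force. For each permutation $\pi$, the sketch solves (3.73) with the water-filling rule (3.74) and keeps the cheapest ordering. Note the following:

- The eigenvalues are hypothetical.
- Exhaustive enumeration is our illustrative choice and is only practical for small $K$.
- The helper names `waterfill_trace` and `best_permutation` are hypothetical.

With these values, the search is expected to return the reversed ordering (3.76) for large $P$. For small $P$, it should put all power on the pair $(\lambda_1, \delta_1)$, consistent with Theorems 3.7.4 and 3.7.5.

```python
import itertools

import numpy as np


def waterfill_trace(delta, p, P, iters=200):
    """Sketch of Theorem 3.7.3 for a fixed pairing p_i = lambda_{pi(i)}.

    Minimizes sum_i 1 / ((delta_i sigma_i)^2 p_i + 1) subject to sum_i sigma_i^2 = P.
    Returns (sigma^2, cost).
    """
    x = delta**2 * p                                    # effective gain of branch i

    def power(mu):
        # sigma_i^2 = max{1/(delta_i sqrt(mu p_i)) - 1/(delta_i^2 p_i), 0}, cf. (3.74)
        return np.maximum(1.0 / np.sqrt(mu * x) - 1.0 / x, 0.0)

    hi = np.max(x)                                      # every sigma_i vanishes for mu >= hi
    lo = hi
    while power(lo).sum() < P:
        lo /= 2.0
    for _ in range(iters):                              # geometric bisection on mu
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if power(mid).sum() > P else (lo, mid)
    sigma2 = power(np.sqrt(lo * hi))
    return sigma2, float(np.sum(1.0 / (x * sigma2 + 1.0)))


def best_permutation(delta, lam, P):
    """Exhaustive search over pi in P_K for (3.72); feasible only for small K."""
    best_pi, best_cost = None, np.inf
    for pi in itertools.permutations(range(len(lam))):
        _, cost = waterfill_trace(delta, lam[list(pi)], P)
        if cost < best_cost - 1e-12:                    # keep the first of tied orderings
            best_pi, best_cost = pi, cost
    return best_pi, best_cost


delta = np.array([2.0, 1.2, 0.5])   # hypothetical nonzero eigenvalues of D, decreasing
lam = np.array([3.0, 1.0, 0.4])     # hypothetical nonzero eigenvalues of Q, decreasing
pi_large, _ = best_permutation(delta, lam, P=1e6)
pi_small, _ = best_permutation(delta, lam, P=1e-4)
print("large P:", pi_large, " small P:", pi_small)
```

Orderings are 0-indexed, so `(2, 1, 0)` corresponds to $\pi(i) = K + 1 - i$ in (3.76). For large $P$, the cost of each ordering behaves like $\left(\sum_i 1/(\delta_i\sqrt{p_i})\right)^2/P$. Pairing large $\delta_i$ with small $\lambda$ then minimizes it, which is the rearrangement argument of Lemma 3.3.3.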
3.7.2 A Determinant Problem

This appendix analyzes the following problem, which maximizes a determinant (denoted "det"):
$$\max_S\ \det\left(DS^HQSD + I\right) \quad\text{subject to}\quad \mathrm{tr}(S^HS) \le P,\ S \in \mathbb{C}^{m\times n}. \tag{3.78}$$
The determinant of a matrix inverse is the reciprocal of the determinant, so (3.78) is (3.65) with the trace replaced by the determinant. Problem (3.65) minimizes the sum of the eigenvalues of the MSE matrix $C$, while (3.78) minimizes their product. Both try to make the eigenvalues of $C$ small, but with different metrics.

The solution of (3.78) is known for the special case $D = I$ (Telatar [1]) and for the special case $Q = I$ (Zhou [63]). For the general problem (3.78), we again show that the solution can be written as $S = U\Sigma V^H$, where $U$ and $V$ are orthonormal eigenvector matrices of $Q$ and $D$, respectively, and $\Sigma$ is diagonal. Unlike the trace problem (3.65), the ordering of the columns of $U$ and $V$ does not depend on $P$: they are ordered so that the eigenvalues of $Q$ and $D$ are decreasing. With matching notation for the corresponding matrices, this is the same optimal ordering as for problem (3.18) in Section 3.3.

Cai et al. [65] formulated a similar problem while studying space-time spreading (STS) for correlated fading channels with interference. Building on the result for $Q = I$ [63], they chose $U\Sigma V^H$ as the STS matrix and then determined the optimal eigenvector ordering and $\Sigma$. Here we solve (3.78) with the method of Wong et al. [61] and Wong et al. [84].
(This method relies on two matrix inequalities from majorization theory [40].)

The determinant problem arises in spreading sequence optimization for CDMA systems, where information theory supplies a different performance measure: the sum capacity of the channel. The MSE measures performance of uncoded systems, while the sum capacity measures performance of coded systems. It is the maximum sum of the rates at which the users can transmit reliably. The sum capacity of the synchronous multiple access channel (3.66) is
$$C_{\mathrm{sum}} = \max\ I(x_1, \ldots, x_K; y),$$
where $I$ is the mutual information [74] between the inputs $x_1, x_2, \ldots, x_K$ and the output vector $y$. The maximization is over independent random inputs, and the maximum is reached when all inputs are Gaussian. The sum capacity [71, 85] is then
$$C_{\mathrm{sum}} = \frac{1}{2N}\log\det\left(H^HS^H\Sigma^{-1}SH + I\right).$$
Since $\log$ is monotonically increasing, maximizing the sum capacity under a power constraint is problem (3.78) with $Q = \Sigma^{-1}$ and $D = H$. The solution of (3.78) is as follows.

Theorem 3.7.6. Let $U\Lambda U^H$ and $V\Delta V^H$ be the diagonalizations of $Q$ and $D$, respectively, where the columns of $U$ and $V$ are orthonormal eigenvectors and the eigenvalues $\{\lambda_i\}$ and $\{\delta_i\}$ are in decreasing order. If $K$ is the minimum of the ranks of $Q$ and $D$, the optimal solution of (3.78) is
$$S = U\Sigma V^H, \tag{3.79}$$
where $\Sigma$ is diagonal with
$$\sigma_i = \left(\max\left\{\frac{1}{\mu} - \frac{1}{\lambda_i\delta_i^2},\ 0\right\}\right)^{1/2}, \quad 1 \le i \le K, \tag{3.80}$$
and $\sigma_i = 0$ for $i > K$. The parameter $\mu$ is chosen so that $\sum_{i=1}^{K}\sigma_i^2 = P$.

Proof. First assume that both $D$ and $Q$ are nonsingular; this restriction is removed later.
Insert $T = Q^{1/2}S$ in (3.78) and multiply the matrix inside the determinant on the left and right by $D^{-1}$. This scales the objective function by the positive constant $\det(D^{-1})^2$ and gives the following equivalent formulation:

$$\max_{T}\ \det\left(T^HT + D^{-2}\right) \quad \text{subject to} \quad \operatorname{tr}\left(TT^HQ^{-1}\right) \le P, \quad T \in \mathbb{C}^{m\times n}. \tag{3.81}$$

Let $w_i$, $1 \le i \le n$, denote the eigenvalues of $T^HT$ arranged in decreasing order. By a theorem of Fiedler [86] (see also [40, Chap. 9, G.4]), the determinant of a sum $T^HT + D^{-2}$ of Hermitian matrices is bounded above by the product of the sums of the respective eigenvalues (assuming the eigenvalues of $T^HT$ and $D$ are in decreasing order):

$$\det\left(T^HT + D^{-2}\right) \le \prod_{i=1}^{n}\left(w_i + \delta_i^{-2}\right). \tag{3.82}$$

Also, by a theorem of Ruhe [87] (see also [40, Chap. 9, H.2]), the trace of a product $(TT^H)Q^{-1}$ of Hermitian matrices is bounded below by the sum of the products of the respective eigenvalues (assuming the eigenvalues of $TT^H$ and $Q$ are in decreasing order):

$$\operatorname{tr}\left(TT^HQ^{-1}\right) \ge \sum_{i=1}^{N} w_i\lambda_i^{-1}, \qquad N = \min\{m, n\}, \tag{3.83}$$

since at most $N$ eigenvalues of $T^HT$ and $TT^H$ are nonzero.

We replace the cost function in (3.81) by the upper bound (3.82) and the constraint in (3.81) by the lower bound (3.83) to obtain the problem

$$\max_{w}\ \left(\prod_{i=N+1}^{n}\delta_i^{-2}\right)\prod_{i=1}^{N}\left(w_i + \delta_i^{-2}\right) \quad \text{subject to} \quad \sum_{i=1}^{N} w_i\lambda_i^{-1} \le P, \quad w_1 \ge w_2 \ge \cdots \ge w_N \ge 0. \tag{3.84}$$

If $T$ is feasible in (3.81), then the squares of its singular values are feasible in (3.84) by (3.83). And by (3.82), the value of the cost function in (3.84) is greater than or equal to the associated value in (3.81). Since the feasible set for (3.84) is closed and bounded, and since the cost function is continuous, there exists a maximizing $w$, and the maximum value of the cost function in (3.84) is greater than or equal to the maximum value in (3.81).

Consider the matrix $T = U\Omega^{1/2}V^H$, where $\Omega$ is a diagonal matrix containing the maximizing $w$ on its diagonal. For this choice of $T$, the inequalities (3.82) and (3.83) are both equalities. Hence, this choice of $T$ attains the maximum in (3.81). The corresponding optimal solution of (3.78) is

$$S = Q^{-1/2}T = U\Lambda^{-1/2}U^HU\Omega^{1/2}V^H = U\Lambda^{-1/2}\Omega^{1/2}V^H. \tag{3.85}$$

To complete the proof of the theorem, we need to explain how to compute the optimal $w$ in (3.84). At the optimal solution of (3.84), the power constraint must be an equality (otherwise, we could multiply $w$ by a scalar greater than one and increase the cost). Let us ignore the monotonicity constraint $w_i \ge w_{i+1}$ (we will show that the maximizer satisfies this constraint automatically). After taking the log of the cost function, we obtain the following simplified version of (3.84):

$$\max_{w}\ \sum_{i=1}^{N}\log\left(w_i + \delta_i^{-2}\right) \quad \text{subject to} \quad \sum_{i=1}^{N} w_i\lambda_i^{-1} = P, \quad w \ge 0. \tag{3.86}$$

Since the cost function is strictly concave, the maximizer of (3.86) is unique. The first-order optimality conditions (KKT conditions) for an optimal solution of (3.86) are the following: there exist a scalar $\mu \ge 0$ and a vector $\nu \in \mathbb{R}^N$ such that

$$\frac{1}{w_i + \delta_i^{-2}} - \frac{\mu}{\lambda_i} + \nu_i = 0, \quad \nu_i \ge 0, \quad w_i \ge 0, \quad \nu_i w_i = 0, \quad 1 \le i \le N. \tag{3.87}$$

Analogous to the proof of Theorem 3.7.3, we define the function

$$w_i(\mu) = \left(\frac{\lambda_i}{\mu} - \delta_i^{-2}\right)_+. \tag{3.88}$$

This particular value for $w_i$ is obtained by setting $\nu_i = 0$ in (3.87) and solving for $w_i$; when the solution is negative, we set $w_i(\mu) = 0$ (this corresponds to the $+$ operator in (3.88)). Observe that $w_i(\mu)$ in (3.88) is a nonincreasing function of $\mu$ which approaches $+\infty$ as $\mu$ approaches 0 and which vanishes for $\mu \ge \lambda_i\delta_i^2$. Hence, the equation

$$\sum_{i=1}^{N} w_i(\mu)\lambda_i^{-1} = P \tag{3.89}$$

has a unique positive solution. We have $w_i(\mu) = 0$ for $\mu \ge \lambda_i\delta_i^2$, which implies that

$$\nu_i = \frac{\mu}{\lambda_i} - \frac{1}{w_i(\mu) + \delta_i^{-2}} = \frac{\mu}{\lambda_i} - \delta_i^2 \ge \frac{\lambda_i\delta_i^2}{\lambda_i} - \delta_i^2 = 0 \quad \text{when } \mu \ge \lambda_i\delta_i^2.$$

It follows that the KKT conditions are satisfied when $\mu$ is the positive solution of (3.89). Since the $\lambda_i$ and $\delta_i$ are both arranged in decreasing order, it follows that for any choice of $\mu > 0$, the $w_i$ given by (3.88) are in decreasing order. Hence, the constraint $w_{i+1} \le w_i$ in (3.84) is satisfied by the solution of (3.86). Combining the formula (3.88) for the solution of (3.86) with the expression (3.85) for the solution of (3.78), we obtain the solution $S$ given in (3.79) and (3.80), where $\Sigma = \Lambda^{-1/2}\Omega^{1/2}$; indeed, $\sigma_i^2 = w_i(\mu)/\lambda_i = \left(1/\mu - 1/(\lambda_i\delta_i^2)\right)_+$, and (3.89) becomes $\sum_i \sigma_i^2 = P$. Now suppose that either $D$ or $Q$ is singular.
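Before turning to the singular case, the nonsingular construction just completed can be sketched numerically. The following Python/NumPy code is a minimal sketch, assuming $Q$ ($m\times m$) and $D$ ($n\times n$) are Hermitian positive definite. It works with the water level $t = 1/\mu$, for which the left side of (3.89) becomes $\sum_i (t - 1/(\lambda_i\delta_i^2))_+$, a piecewise-linear function. This lets the active set be found by scanning the number of active terms $k$ from $N$ downward instead of calling a generic root finder. The sketch then forms $S = U\Sigma V^H$ and checks the KKT conditions (3.87), the ordering of the $w_i$, the active power constraint, and that no randomly drawn $S$ with $\operatorname{tr}(S^HS) = P$ gives a larger $\log\det(DS^HQSD + I)$. The function names `optimal_S` and `objective` and the test sizes are hypothetical choices for illustration.

```python
import numpy as np

def optimal_S(Q, D, P):
    """Water-filling solution of max det(D S^H Q S D + I) s.t. tr(S^H S) <= P.

    Sketch of Theorem 3.7.6 for Hermitian positive definite Q (m x m) and
    D (n x n). Returns S, mu, and the w_i, lambda_i, delta_i for i <= N.
    """
    lam, U = np.linalg.eigh(Q)
    delta, V = np.linalg.eigh(D)
    lam, U = lam[::-1], U[:, ::-1]            # decreasing order, as in the theorem
    delta, V = delta[::-1], V[:, ::-1]
    m, n = Q.shape[0], D.shape[0]
    N = min(m, n)
    lam, delta = lam[:N], delta[:N]
    inv_g = 1.0 / (lam * delta**2)             # 1/(lambda_i delta_i^2), increasing in i
    # Water level t = 1/mu solving sum_i (t - inv_g_i)_+ = P, i.e. (3.89).
    # Try k active terms, largest k first; k = 1 always succeeds since P > 0.
    for k in range(N, 0, -1):
        t = (P + inv_g[:k].sum()) / k
        if t > inv_g[k - 1]:
            break
    sigma2 = np.maximum(t - inv_g, 0.0)        # sigma_i^2 from (3.80)
    Sigma = np.zeros((m, n))
    Sigma[np.arange(N), np.arange(N)] = np.sqrt(sigma2)
    S = U @ Sigma @ V.conj().T                 # S = U Sigma V^H, (3.79)
    w = lam * sigma2                           # w_i = lambda_i sigma_i^2
    return S, 1.0 / t, w, lam, delta

def objective(S, Q, D):
    """log det(D S^H Q S D + I), the log of the cost in (3.78)."""
    M = D @ S.conj().T @ Q @ S @ D + np.eye(D.shape[0])
    return np.linalg.slogdet(M)[1]

rng = np.random.default_rng(1)
m, n, P = 3, 4, 2.0

def rand_pd(k):
    """Random Hermitian positive definite k x k test matrix."""
    X = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return X @ X.conj().T + 0.1 * np.eye(k)

Q, D = rand_pd(m), rand_pd(n)
S, mu, w, lam, delta = optimal_S(Q, D, P)

# KKT multipliers of (3.87): nu_i = mu/lambda_i - 1/(w_i + delta_i^{-2}).
nu = mu / lam - 1.0 / (w + delta**-2.0)
assert np.all(nu >= -1e-9) and np.allclose(nu * w, 0.0, atol=1e-9)
assert np.all(np.diff(w) <= 1e-9)                      # w_i nonincreasing
assert np.isclose(np.trace(S.conj().T @ S).real, P)    # power constraint is active

# No random S on the power sphere tr(S^H S) = P should do better.
best = objective(S, Q, D)
for _ in range(2000):
    R = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    R *= np.sqrt(P / np.trace(R.conj().T @ R).real)
    assert objective(R, Q, D) <= best + 1e-9
```

For instance, with $Q = \mathrm{diag}(3, 1)$, $D = \mathrm{diag}(2, 1, 0.5)$ and $P = 1$, both modes are active, $t = 25/24$ (so $\mu = 24/25$), and $\sigma_1^2 = 23/24$, $\sigma_2^2 = 1/24$.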
Let us consider a perturbed problem where we replace $Q$ by $Q_\varepsilon = U\Lambda_\varepsilon U^H$ and $D$ by $D_\varepsilon = V\Delta_\varepsilon V^H$:

$$\max_{S}\ \det\left(D_\varepsilon S^HQ_\varepsilon SD_\varepsilon + I\right) \quad \text{subject to} \quad \operatorname{tr}\left(S^HS\right) \le P, \quad S \in \mathbb{C}^{m\times n}. \tag{3.90}$$

Here $\Lambda_\varepsilon$ and $\Delta_\varepsilon$ are obtained from $\Lambda$ and $\Delta$ by setting $\delta_i = \varepsilon$ and $\lambda_j = \varepsilon$ for $i > K$ and $j > K$, so that $\lambda_i\delta_i^2 = \varepsilon^3$ for $i > K$. Since $Q_\varepsilon$ and $D_\varepsilon$ are nonsingular, it follows from our previous analysis that the perturbed problem (3.90) has a solution of the form $S_\varepsilon = U\Sigma_\varepsilon V^H$, where the diagonal elements of $\Sigma_\varepsilon$ are given by

$$\sigma_i^{\varepsilon} = \begin{cases} \left[\max\left\{\dfrac{1}{\mu} - \dfrac{1}{\lambda_i\delta_i^2},\ 0\right\}\right]^{1/2}, & 1 \le i \le K,\\[2ex] \left[\max\left\{\dfrac{1}{\mu} - \dfrac{1}{\varepsilon^3},\ 0\right\}\right]^{1/2}, & i > K. \end{cases} \tag{3.91}$$

Let $\mu$ be chosen so that $\sum_{i=1}^{K}(\sigma_i^{\varepsilon})^2 = P$. Observe that when $\varepsilon^3 < \mu$, we have $\sigma_i^{\varepsilon} = 0$ for $i > K$ and $\sum_{i=1}^{N}(\sigma_i^{\varepsilon})^2 = P$. Hence, for each $\varepsilon > 0$ with $\varepsilon^3 < \mu$, the optimal solution of the perturbed problem does not depend on $\varepsilon$, and the trailing diagonal elements $\sigma_i^{\varepsilon}$ for $i > K$ vanish. Since the cost function in the perturbed problem (3.90) is a continuous function of $\varepsilon$, we conclude that for $\varepsilon^3 < \mu$, $S_\varepsilon$ is the optimal solution of (3.90) for $\varepsilon = 0$. The perturbed problem (3.90) with $\varepsilon = 0$ coincides with the original problem (3.78). Consequently, the solution (3.79) and (3.80) is valid even when either $Q$ or $D$ is singular. □

CHAPTER 4 CONCLUSION AND FUTURE WORK

To achieve the performance gains promised by multiple-antenna systems, parameter estimation, including timing estimation and channel estimation, is a key component of space-time system design. In this work, we investigate the timing estimation and channel estimation problems for MIMO systems.

4.1 Timing Estimation for Rayleigh Flat-fading MIMO Channels

In Chapter 2, we consider a wireless communication system with multiple transmit and receive antennas in a slow, independent and identically distributed (i.i.d.) Rayleigh flat-fading environment. We study the problem of timing estimation in such a system with the aid of training signals from two different approaches. In the first approach, the channel is assumed to be unknown but deterministic, and joint ML estimation of the channel and delay is performed.
In contrast, in the second approach, we assume that the channel is random with known statistics and use the likelihood function averaged over all random channel realizations to construct the ML estimator of the delay. For both approaches, we derive the optimal training sequences based on the performance measures associated with the CRB of timing estimation. These two approaches lead to two different optimal training signal designs. For the deterministic channel approach, we show that orthogonal training signals from multiple transmit antennas minimize the outage probability as well as the average CRB. For the random channel approach, perfectly correlated training signals employed at different transmit antennas minimize the CRB.

4.2 Channel Estimation for Correlated MIMO Channels with Colored Interference

In Chapter 3, we consider a wireless communication system with multiple transmit and receive antennas in a slow Rayleigh flat-fading environment. We investigate the problem of estimating correlated MIMO channels in the presence of colored interference. The Bayesian channel estimator is derived, and the optimal training sequences are designed by minimizing the MSE of channel estimation. The design of the optimal training sequences has a clear physical interpretation: more power should be assigned to the transmission directions constructed from the eigendirections with larger channel gains and the interference subspaces with less interference. The power assignment is determined by a water-filling argument under a finite power constraint. In order to implement the channel estimator and construct the optimal training sequences, we propose an algorithm to estimate long-term channel statistics and design an efficient feedback scheme so that the optimal sequences can be approximately constructed at the transmitter.
Numerical results show that with optimal training sequences, the MSE of channel estimation can be reduced substantially compared with other training sequences.

4.3 Timing Estimation for Correlated MIMO Channels with Colored Noise

In Chapter 2, we study the timing estimation problem under the assumption that the fading coefficients between pairs of transmit and receive antennas are independent and identically distributed. This assumption does not generally hold in practice because of antenna spacing and orientation, mutual coupling, the richness of scattering, and the presence of dominant components [88]. Thus it is natural to extend the current work to the synchronization problem in correlated channels. Another possible extension of the present work is to address the timing estimation problem for MIMO systems in colored noise. The colored noise model is more suitable than the white noise model when jammers and co-channel interference are present in the communication system.

REFERENCES

[1] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecommun., vol. 10, pp. 585-595, Nov. 1999.
[2] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Commun., vol. 6, pp. 311-335, Mar. 1998.
[3] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Trans. Inform. Theory, vol. 44, pp. 744-765, Mar. 1998.
[4] S. Baro, G. Bauch, and A. Hansmann, "Improved codes for space-time trellis-coded modulation," IEEE Commun. Lett., vol. 4, pp. 20-22, Jan. 2000.
[5] A. R. Hammons and H. E. Gamal, "On the theory of space-time codes for PSK modulation," IEEE Trans. Inform. Theory, vol. 46, pp. 524-542, Mar. 2000.
[6] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Select. Areas Commun., vol. 16, pp. 1451-1458, Oct. 1998.
[7] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding from orthogonal designs," IEEE Trans. Inform. Theory, vol. 45, pp. 1456-1467, Jul. 1999.
[8] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding for wireless communications: performance results," IEEE J. Select. Areas Commun., vol. 17, pp. 452-460, Mar. 1999.
[9] G. Ganesan and P. Stoica, "Space-time diversity using orthogonal and amicable orthogonal designs," Wireless Personal Commun., vol. 18, pp. 165-178, Aug. 2001.
[10] S. Alamouti, V. Tarokh, and P. Poon, "Trellis-coded modulation and transmit diversity: design criteria and performance evaluation," in Proc. IEEE Int. Conf. Universal Personal Communications, vol. 2, Florence, Italy, Oct. 1998, pp. 703-707.
[11] B. M. Hochwald and T. L. Marzetta, "Unitary space-time modulation for multiple-antenna communication in Rayleigh flat fading," IEEE Trans. Inform. Theory, vol. 46, pp. 543-564, Mar. 2000.
[12] B. Hassibi and B. M. Hochwald, "High-rate codes that are linear in space and time," IEEE Trans. Inform. Theory, vol. 48, pp. 1804-1824, Jul. 2002.
[13] S. Siwamogsatham and M. P. Fitz, "Robust space-time coding for correlated Rayleigh fading channels," IEEE Trans. Signal Processing, vol. 50, pp. 2408-2416, Oct. 2002.
[14] Y. Gong and K. B. Letaief, "Concatenated space-time block coding with trellis coded modulation in fading channels," IEEE Trans. Wireless Commun., vol. 4, pp. 580-590, Oct. 2002.
[15] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Tech. J., vol. 1, no. 2, pp. 41-59, 1996.
[16] G. Foschini, G. Golden, R. Valenzuela, and P. Wolniansky, "Simplified processing for high spectral efficiency wireless communication employing multi-element arrays," IEEE J. Select. Areas Commun., vol. 17, pp. 1841-1852, Nov. 1999.
[17] S. Verdu, Multiuser Detection. Cambridge, UK: Cambridge Univ. Press, 1998.
[18] M. J. Gans, N. Amitay, Y. Yeh, H. Xu, T. Damen, R. Valenzuela, T. Sizer, R. Storz, D. Taylor, W. MacDonald, C. Tran, and A. Adamiecki, "Outdoor BLAST measurement system at 2.44 GHz: calibration and initial results," IEEE J. Select. Areas Commun., vol. 20, pp. 570-583, Apr. 2002.
[19] C. Budianu and L. Tong, "Channel estimation for space-time orthogonal block codes," IEEE Trans. Signal Processing, vol. 50, pp. 2515-2528, Oct. 2002.
[20] P. Stoica and O. Besson, "Training sequence design for frequency offset and frequency-selective channel estimation," IEEE Trans. Commun., vol. 51, pp. 1910-1917, Nov. 2003.
[21] C. Chuah, D. Tse, J. Kahn, and R. Valenzuela, "Capacity scaling in MIMO wireless systems under correlated fading," IEEE Trans. Inform. Theory, vol. 48, pp. 637-650, Mar. 2002.
[22] A. L. Moustakas and S. H. Simon, "Optimizing multiple-input single-output (MISO) communication systems with general Gaussian channels: nontrivial covariance and nonzero mean," IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2770-2780, Oct. 2003.
[23] S. A. Jafar and A. Goldsmith, "Multiple-antenna capacity in correlated Rayleigh fading with channel covariance information," IEEE Trans. Wireless Commun., vol. 4, no. 3, pp. 990-997, May 2005.
[24] E. A. Jorswieck and H. Boche, "Channel capacity and capacity-range of beamforming in MIMO systems under correlated fading with covariance feedback," IEEE Trans. Wireless Commun., vol. 3, pp. 1543-1553, Sep. 2004.
[25] E. A. Jorswieck and H. Boche, "Optimal transmission strategies and impact of correlation in multi-antenna systems with different types of channel state information," IEEE Trans. Signal Processing, vol. 52, no. 12, pp. 3440-3453, Dec. 2004.
[26] A. Lozano and A. M. Tulino, "Capacity of multiple-transmit multiple-receive antenna architectures," IEEE Trans. Inform. Theory, vol. 48, no. 12, pp. 3117-3128, Dec. 2002.
[27] A. L. Moustakas, S. H. Simon, and A. M. Sengupta, "MIMO capacity through correlated channels in the presence of correlated interferers and noise: a (not so) large N analysis," IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2545-2561, Oct. 2003.