DESIGN AND ANALYSIS OF OPTIMAL DECODING MODELS FOR BRAIN-MACHINE INTERFACES

By

SUNG-PHIL KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2005

Copyright 2005 by Sung-Phil Kim

This document is dedicated to my mother and my wife.

ACKNOWLEDGMENTS

I would like to thank God for His endless love in giving me the best at all times in my life. I would also like to thank my mother, my brothers, and my wife for their love, support, and unwavering belief in me.

I would like to thank Dr. Jose C. Principe for his untiring support, advice, and guidance. I will never forget how he inspired me to think as a researcher. I am very grateful to Dr. John G. Harris, Dr. Michael C. Nechyba, Dr. Karl Gugel, and Dr. Mark C. K. Yang for their support and advice on brain-machine interface research. I am also exceptionally grateful to Dr. Justin C. Sanchez, Dr. Yadunandana N. Rao, Dr. Deniz Erdogmus, and Shalom Darmanjian for their sincere support and collaboration. I must acknowledge Dr. Miguel A. L. Nicolelis and Dr. Jose M. Carmena for the opportunity to conduct this research with their support. Final thanks go to Yuseok Ko, Jeongho Cho, and Dongho Han, who have always helped and encouraged me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
   Review of BMI Signal Processing
   Approaches
   Outline

2 EXPERIMENTAL SETUPS FOR BRAIN-MACHINE INTERFACES
   Recording of Electrical Activity of Neuronal Ensembles
   Behavioral Tasks
   Properties of Data
      Neuronal Firing Patterns
      Hand Movements

3 LINEAR MODELING
   Linear Modeling for BMIs
   The Wiener Filter
   Stochastic Gradient Learning
   Other Linear Modeling

4 REGULARIZED LINEAR MODELING
   Dimension Reduction Using Subspace Projection
      A Hybrid Subspace Projection
      Design of a Decoding Model Using the Subspace Wiener Filter
   Parsimonious Modeling in Time Using the Gamma Filter
   Regularization by Parameter Constraints
      Review of Shrinkage Methods
         Shrinkage methods
         The relationship between subspace projection and ridge regression
         Comparison of shrinkage methods
      Regularization Based on the L2-Norm Penalty
      Regularization Based on the L1-Norm Penalty

5 NONLINEAR MIXTURE OF MULTIPLE LINEAR MODELS
   Nonlinear Mixture of Linear Models Approach
      Nonlinear Mixture of Competitive Linear Models
      Time Delay Neural Networks
   BMI Design Using NMCLM
   Analysis
      Evaluation of Training Performance for NMCLM
      Analysis of Linear Filters

6 COMPARISON OF MODELS
   Comparison of Model Parameters
   Performance Evaluation
   Statistical Performance Comparison

7 MULTIRESOLUTION ANALYSIS FOR BMI
   Multiresolution Analysis of Neuronal Spike Trains
      Multiresolution Analysis
      Multiresolution Analysis for the BMI Data
      The Analysis of the Linear Model Based on the Multiresolution Representation
   Comparison of Models with the Multiresolution Representation
   Combination of Linear and Nonlinear Models
      Nonlinear Modeling
      Simulations
   Discussions

8 DETERMINATION OF NEURONAL FIRING PATTERNS USING NONNEGATIVE MATRIX FACTORIZATION
   Nonnegative Matrix Factorization
   Factorization of Neuronal Bin Count Matrix
      Data Preparation
         3D food reaching data
         2D target reaching data
      Analysis of Factorization Process
         Choice of the number of bases
         How does NMF find repeated patterns?
         Local minima problem
   Case Study A: 3D Food Reaching
   Case Study B: 2D Target Reaching
   Model Improvement Using NMF
   Discussions

9 REAL TIME NEURONAL SUBSET SELECTION
   On-Line Variable Selection
   On-Line Channel Selection Method
      Determination of Selection Criterion
      Determination of Threshold in LAR Using Surrogate Data
      Conditional Selection Criterion
   Experiments of Neuronal Subset Selection
   Discussions

10 CONCLUSIONS AND FUTURE WORKS

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1 The distributions of the sorted neuronal activity for each monkey in motor cortical areas
4-1 Procedure of the LAR algorithm
6-1 The generalization performances of linear models and nonlinear models for the 3D food reaching task
6-2 The generalization performances of linear models and nonlinear models for the 2D target reaching task
6-3 The t-test results for the difference of the magnitude of error vectors from the test dataset between the Wiener filter and other models
7-1 The number of the selected neurons in each cortical area
7-2 The number of the nonzero weights
7-3 The number of neurons selected by LAR for each model
7-4 Performance comparison between the multiresolution and the single resolution models
7-5 Performance comparison between the combinatory model and the single linear model
8-1 Comparison of important neurons: food reaching
8-2 Comparison of important neurons: target reaching
8-3 Performance evaluation of the Wiener filter and the mixture of multiple models based on NMF
9-1 Procedure of the LAR algorithm: revisited
9-2 The modified LAR algorithm for online variable selection

LIST OF FIGURES

Figure

1-1 A system identification block diagram for BMIs
2-1 An experimental setup of the 3D reaching task
2-2 An experimental setup of the 2D target reaching task. The monkey moves a cursor (yellow circle) to a randomly placed target (green circle), and is rewarded if the cursor intersects the target
2-3 An example of the binned data
2-4 The plots of the average (dot) and the standard deviation (bar) for each neuron of three monkeys
2-5 The trajectories of the estimated mean firing rates for movement (solid line) and rest (dotted line) over a sequence of subsets
2-6 Illustrations of nonstationary properties of the input autocorrelation matrix
2-7 Sample trajectories of (a) 3D food reaching, and (b) 2D target reaching movements
2-8 The db6 continuous wavelet coefficients of trajectory signals of (a) 3D food reaching, and (b) 2D target reaching
3-1 The topology of the linear filter designed for BMIs in the case of the 3D reaching task
3-2 The Hinton diagram of the weights of the Wiener filter for food reaching
3-3 The Hinton diagram of the weights of the Wiener filter for target reaching
4-1 The overall diagram of the subspace Wiener filter
4-2 The contour map of the validation MSE for (a) food reaching, and (b) target reaching
4-3 The first three projection vectors in PCA for (a) food reaching, and (c) target reaching, and PLS for (b) food reaching, and (d) target reaching, respectively
4-4 An overall diagram of a generalized feedforward filter
4-5 The contour maps of the validation MSE computed at each grid point (K, μ) for (a) food reaching, and (b) target reaching
4-6 Contours of the Lp-norm of the weight vector for various values of p in the 2D weight space
4-7 Convergence of the regularization parameter δ(n) over iterations; (a) food reaching, and (b) target reaching
4-8 The histogram of the magnitudes of weights over all the coordinates of hand position, trained by weight decay (solid line) and NLMS (dotted line); (a) food reaching, and (b) target reaching
4-9 An illustration of the LAR procedure
5-1 An overall diagram of the nonlinear mixture of competitive linear models
5-2 Demonstration of the localization of competitive linear models
5-3 Frequency response of ten FIR filters; (left) pole-zero plots, (right) frequency responses
6-1 The actual hand trajectory (dotted red line) and the estimated hand trajectory (solid black line) in the x-, y-, and z-coordinates for the 3D food reaching task on a sample part of the test data
6-2 The actual hand trajectory (dotted red line) and the estimated hand trajectory (solid black line) in the x- and y-coordinates for the 2D target reaching task on a sample part of the test data
6-3 The distributions of normalized weight magnitudes of four linear models over neuronal space for (a) food reaching, and (b) target reaching
6-4 Comparison of the CEM of the nine models for (a) the food reaching task, and (b) the target reaching task
7-1 An illustration of the scaled convolution output from the Haar à trous wavelet transform
7-2 An example of the series of u_j(k) along with the corresponding hand trajectories
7-3 The demonstration of the relation between the neuronal firing activity representation at each scale (solid lines) and the hand position trajectory at the x-coordinate (dotted lines)
7-4 The distribution of the selected input variables for (a) x-coordinate, and (b) y-coordinate of position, and (c) x-coordinate, and (d) y-coordinate of velocity
7-5 The CEM curves of the single resolution model (red dotted lines), and the multiresolution model (black solid lines)
7-6 An example of the residual trajectory from a linear model (the x-coordinate)
7-7 An example of the output trajectories of the combinatory network and the single linear model
7-8 Tap outputs from two generalized feedforward filters for a neuronal bin count input with different delays: the gamma, and Haar wavelet
8-1 Segmentation of the reaching trajectories: reach from rest to food, reach from food to mouth, and reach from mouth to rest position
8-2 The NMF results for food reaching
8-3 The NMF results for target reaching
8-4 The hand position samples collected along with peaks in each NMF encoding (left), and the mean and variance of each set (right)
8-5 The probabilities of the occurrence for hand position to be in each of sixteen angle bins
8-6 Tuning curves of neuronal firing patterns encoded in each NMF basis for 16 angle bins
9-1 The diagram of the architecture of the real time neuronal subset selection method
9-2 An illustration of the successive maximum correlation over stages in the case of two variables (channels)
9-3 Examples of the maximum absolute correlation curve in LAR
9-4 Neuronal subset selection examples
9-5 Demonstration of filter outputs before subset selection; (top) synchronized data, (bottom) desynchronized data
9-6 Neuronal subset selection conditioned by the correlation between filter outputs and desired response
9-7 Demonstration of the robustness of the algorithm to initial conditions
9-8 An example of the outputs of two tracking systems with (solid line), and without online channel selection (dashed line)
9-9 Neuronal subset selection for all three coordinates of food reaching movement
9-10 Neuronal subset selection over 2,000-second data; (a) subsets in the early part, and (b) subsets in the late part of the data
9-11 Neuronal subset selection for a 2D target reaching BMI
9-12 2D hand trajectories in five sample data segments selected in Fig. 9-11
9-13 Selection of individual neurons over a series of reaching movements
9-14 The distribution of the subset size over a series of reaching movements
9-15 Comparison of the average misadjustment per movement between the standard MIMO system learned by LMS and the MIMO system with online channel selection

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DESIGN AND ANALYSIS OF OPTIMAL DECODING MODELS FOR BRAIN-MACHINE INTERFACES

By

Sung-Phil Kim

May 2005

Chair: Jose C. Principe
Major Department: Electrical and Computer Engineering

The role of decoding models in the design of brain-machine interfaces (BMIs) is to approximate the mapping from the firing activity of a cortical neuronal ensemble to the associated behavior. The linear model, which in a statistical signal processing setting is called the Wiener filter, has been the primary vehicle for estimating this mapping. One of the purposes of this dissertation is to conduct an extensive comparative study of multi-input, multi-output (MIMO) decoding models in two experimental BMI settings in which monkeys perform dissimilar behavioral tasks.
The issues in decoding model estimation for BMIs include the large input dimensionality, the spatiotemporal neural firing patterns, nonstationarity, and the adequacy of the linearity assumption. These issues lead us to concentrate our studies on four research directions: the topology of the models (linear versus nonlinear), regularization both in space and time, preprocessing from discrete events to continuous input variables, and ways to cope with the nonstationarity present in the data. The comparison of the optimized linear and nonlinear MIMO models with the Wiener filter based on generalization performance shows that the improvement, although statistically significant, is minor with respect to the baseline.

A second line of investigation deals with the analysis of motor cortex activity based on experimental BMI setups. First, we propose an input-based strategy that uses nonnegative matrix factorization (NMF) to uncover spatiotemporal patterns in neuronal ensembles correlated to behavior. The specific spatiotemporal patterns of neural activity can be determined from the NMF basis vectors using only the input data, and their temporal relationships with behavior can be extracted from the NMF encodings. Second, a real time neuronal subset selection method is developed to find the subset of neurons that is most relevant to the kinematic trajectories at every sampling time instance. The method, based on an online implementation of the LAR (Least Angle Regression) algorithm, requires the availability of the desired response. The experimental analysis demonstrates the nonstationary characteristics of the relationship between the activity of the neuronal ensemble and behavior.

CHAPTER 1
INTRODUCTION

The direct control of machines by thought was rather close to fiction until recent developments in neuroscience that seek direct interfaces between brain and machines. This emerging field has been called brain-machine interfaces (BMIs).
One of the clinical demands driving BMIs is restoring motor function in 'locked-in' patients who suffer from paralysis caused by traumatic or degenerative lesions. In fact, more than 200,000 patients in the United States live with partial or total permanent paralysis, with 11,000 new cases each year [Nob99]. Eventually, BMIs may also impact the very paradigm of human-computer interfaces.

Several research groups have demonstrated that subjects can control robotic arms or computer cursors on a screen by using their brain activity [Car03, Cha99, Ken98, Mor99, Mus04, Ser02, She03, Tay02 and Wes00]. These demonstrations in rodents, primates, and human patients show promising ways to bypass spinal cord lesions. In these experiments, up to a hundred electrodes are chronically implanted in motor areas of the cortex to record the electrical activity of hundreds of neurons. The control signals for external devices are extracted by a series of signal processing modules, including spike detection/sorting algorithms and decoding algorithms.

This experimental BMI paradigm, which is illustrated in Fig. 1-1, relies on three basic elements. Long-term, stable recordings enable us to obtain a mass of neuronal activity through microelectrode arrays. A mathematical model extracts the information about motor parameters from the neuronal activity recordings in real time. A prosthetic device such as a robotic arm receives control signals from the mathematical model to reproduce the subject's intended movement.

[Block diagram: spike sorting → binning → Wiener filter, with desired response d(n).]
Figure 1-1. A system identification block diagram for BMIs.

This dissertation mainly focuses on building mathematical models in BMIs.
These models use spike trains provided by spike sorting algorithms as inputs, and as the desired response movement parameters such as hand position, velocity, or gripping force, which are synchronously recorded by optical sensors during the motor performance of the subject. The design of these models can be viewed as a system identification problem [Hay96a]. Recent investigations in BMI modeling have demonstrated successful estimation of the transfer function from motor cortex neural firing patterns to the hand movement trajectory of primates with a relatively simple Wiener filter [Cha99; Mor99; Ser02 and Wes00]. If one considers the complexity of the motor system, starting from the intricate firing modulation of millions of cells in the cortex, passing through the added complexity of the spinal cord up to the spatiotemporal firing of the motor neurons that control each muscle fiber, it is rather surprising that a simple linear projection in the input space is able to capture the behavior of this complex system with correlation coefficients around 0.8 between the desired and actual trajectories. This leads us to examine, from an optimal signal processing framework, the challenges and opportunities of this class of models for BMIs.

There are several challenges in this application, given the BMI setup. First, the spatiotemporal patterns in the spike train data are not fully known and thus cannot guide us in the proper way of designing the models. Second, this is a MIMO (multiple-input multiple-output) mapping problem with a large dimensionality (i.e., for 100 neuronal inputs, a Wiener filter with 10 taps has 1,000 free parameters for each output coordinate). Third, the statistics are not constant either in time or in space. Fourth, some neuronal firings are not related to the task and therefore constitute noise in the data. Fifth, there is no way of knowing whether the true mapping is linear or nonlinear.
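To make this model class concrete, the following minimal sketch shows how a MIMO Wiener filter with a tap delay line can be fit by least squares. The data here are synthetic, and the sizes (20 neurons, 10 taps, 3 output coordinates) are illustrative only, not those of the recorded datasets:

```python
import numpy as np

def embed(bins, taps):
    """Stack `taps` delayed copies of the bin-count matrix.
    bins: (T, N) spike counts; returns (T - taps + 1, N * taps)."""
    T, N = bins.shape
    return np.hstack([bins[taps - 1 - d : T - d] for d in range(taps)])

rng = np.random.default_rng(0)
T, N, taps = 1000, 20, 10            # hypothetical sizes (e.g., 100-ms bins)
bins = rng.poisson(2.0, size=(T, N)).astype(float)
hand = rng.standard_normal((T, 3))   # stand-in for the 3D hand position

X = embed(bins, taps)                # tap-delayed inputs
d = hand[taps - 1:]                  # align the desired response
X = np.hstack([X, np.ones((X.shape[0], 1))])   # bias term
# Wiener solution W = (X^T X)^{-1} X^T d, via a numerically safer lstsq
W, *_ = np.linalg.lstsq(X, d, rcond=None)
y = X @ W                            # estimated trajectory
print(W.shape)                       # (201, 3): N*taps+1 weights per coordinate
```

Note how quickly the parameter count grows: with 100 neurons instead of 20, the same 10-tap filter would already carry 1,000 weights per output coordinate, which is the dimensionality problem discussed above.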
Despite all these difficult questions, the linear model learns the trajectory with a mean correlation coefficient of 0.6 to 0.8; it is therefore instructive to undertake a systematic analysis of the issues involved in deriving Wiener filters for BMIs.

Review of BMI Signal Processing

An approach to restoring motor function in paralyzed patients using direct interfaces between cortical motor areas and artificial actuators was first proposed by Schmidt [Sch80]. He proposed connecting the electrical activity of a cortical neuronal ensemble to an actuator to bypass spinal cord injuries. More recently, Chapin and coworkers demonstrated that rats could be trained to receive rewards of water drops by pressing a lever to control the rotation of a robotic arm [Cha99]. A linear model learned by least squares used the activities of 21-46 neurons in primary motor cortex (M1) as inputs to predict the motion of the robot. The rats turned out to learn to control the robotic arm using only neuronal signals, without moving their arms. Afterwards, other research groups joined this line of study of experimental BMIs.

Wessberg et al. [Wes00], in a joint research group including Duke University, SUNY, and MIT, demonstrated real time control of a robotic arm using up to 100 neuronal activities. A Wiener filter or time delay neural network (TDNN) was designed to predict the 3D hand trajectories of food reaching movements using neuronal bin count data with 100-ms nonoverlapping time windows embedded by a 10-tap delay line. Carmena et al. at Duke University also showed that with a relatively large number of cells (>100), monkeys could brain-control a robot arm to perform two distinct motor tasks, reaching and grasping [Car03]. In these experiments, the monkeys controlled a real robotic actuator through a closed-loop BMI. They also reported changes in the contributions of neuronal populations during learning. Taylor et al.
at Arizona State University presented a 3D cursor tracking BMI [Tay02], in which a monkey made arm movements in a 3D virtual environment to reach a randomly placed target. Using 18 cells from the primary motor area (M1), they investigated the effect of visual feedback on movements by comparing open-loop trajectories of hand-controlled cursor movements and closed-loop trajectories of brain-controlled cursor movements. A coadaptive movement prediction algorithm based on a population vector method, developed to track changes in cell tuning properties during brain-controlled movement, iteratively refines the estimate of the cell tuning properties as a subject attempts a series of brain-controlled movements.

Other work on decoding algorithms in BMIs was reviewed by Schwartz et al. [Sch01]. In this review, parametric linear models, including the population vector algorithm [Geo83] and the Wiener filter, and nonparametric methods, including the maximum likelihood estimate, principal component analysis (PCA) [Isa00], and self-organizing feature maps (SOFM) [Lin97], were introduced as algorithms for extracting motor-related information from neural activity for BMIs.

Serruya et al. in the Donoghue laboratory at Brown University demonstrated that monkeys could track a continuously moving visual object on a video monitor by moving a manipulandum [Ser02]. A Wiener filter with 50-ms bins embedded by 20-tap delay lines was used to predict hand position from the activities of 7-30 M1 cells. They also showed that the time required to acquire targets using brain control was very similar to that with hand control. Wu et al. in the same group proposed using a Kalman filter as a decoding model [Wu03] to find a probabilistic relationship between motion and mean firing rates (in 140-ms time windows).
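To illustrate this decoding framework, the sketch below runs a generic Kalman filter with the hand kinematics as the hidden state and the binned firing rates as observations. All matrices and data are synthetic placeholders, not the model Wu et al. fitted; in practice A, H, W, and Q are estimated from training data:

```python
import numpy as np

def kalman_decode(Z, A, H, W, Q, x0, P0):
    """Decode a state sequence from observations Z.
    State model:  x_k = A x_{k-1} + w,  w ~ N(0, W)
    Observation:  z_k = H x_k + q,      q ~ N(0, Q)"""
    x, P = x0, P0
    out = []
    for z in Z:
        # predict from the kinematic dynamics
        x = A @ x
        P = A @ P @ A.T + W
        # update with the firing-rate observation
        S = H @ P @ H.T + Q
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(1)
n_state, n_cells, T = 4, 25, 200                # hypothetical sizes
A = np.eye(n_state)                             # placeholder dynamics
H = rng.standard_normal((n_cells, n_state))     # placeholder tuning matrix
W, Q = 0.01 * np.eye(n_state), np.eye(n_cells)
Z = rng.standard_normal((T, n_cells))           # synthetic binned rates
xs = kalman_decode(Z, A, H, W, Q, np.zeros(n_state), np.eye(n_state))
print(xs.shape)                                 # (200, 4)
```

Unlike the Wiener filter, this recursive formulation carries an explicit model of the kinematic dynamics and an uncertainty estimate P at every time step.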
The same authors extended this Kalman filtering framework to build a mixture of linear models using a switching Kalman filter, in which the hidden state variables were estimated by the expectation-maximization (EM) algorithm [Wu04]. Andersen and coworkers at Caltech implanted microelectrode arrays in the posterior parietal cortex (PPC), which is thought to be responsible for the planning of movements [And04, Mus04 and She03]. High-level signals related to the goal of movements were decoded using the maximum likelihood estimate of cursor positions from ~40 neuronal activities in the PPC of monkeys. They demonstrated that neuronal activity in PPC could provide information about movement plans, and thus could be used for various neural prosthetic applications without moving limbs. Kennedy et al. first demonstrated a human BMI by implanting a special electrode in the human neocortex to extract signals to control a cursor on a computer monitor [Ken98]. Using spike trains as input to a computer, severely disabled patients could learn to move a cursor.

Our group at the University of Florida, in collaboration with Duke University, has designed decoding models for 3D food reaching and 2D target reaching BMIs, including the Wiener filter and recursive multilayer perceptrons (RMLP) [San02a]. Based on a sensitivity analysis of the trained linear and nonlinear models, we improved the performance of the models using only the relevant neuronal activities [San03b]. A further development, switching multiple linear models combined by a nonlinear network, was proposed by Kim et al. to increase prediction performance in food reaching [Kim03b]. Recently, Rao et al. demonstrated that echo state networks could be used as an alternative to nonlinear models such as the RMLP or TDNN, with relatively uncomplicated training [Rao04]. Overall reviews of BMIs can be found in the following studies: [And04, Don02, Nic01, Nic03a and Sch04]. For overall reviews of brain-computer interfaces (BCIs), see Wolpaw et al.
[Wol02] and Friehs et al. [Fri04].

Approaches

In this dissertation, we address the following issues. First, we apply the Wiener filter algorithm [Hay96a] to the BMI application and show its performance on two types of training data: the food reaching and target reaching experimental datasets. This algorithm will be the gold standard for the other adaptive methods developed. Then we compare other adaptive algorithms that reach the same solution in the statistical sense for stationary data, but may handle the nonstationary nature of the data better. We are referring to the least mean square (LMS) algorithm, implemented here in its normalized form (NLMS) [Hay96a].

The issue of the number of free parameters of the model is handled by three different techniques. The first is the subspace Wiener filter, which first projects the input data using principal component analysis (PCA) [Hay96b], and then derives a Wiener filter for the desired response. Although PCA has been the major subspace projection method, it does not orient the projection to take advantage of the structure of the desired response. As an alternative, we propose a new idea of seeking the subspace decomposition in the joint space through a hybrid subspace method, which combines the criteria of PCA and partial least squares (PLS) [Jon93 and Kim03a]. We also reduce the number of degrees of freedom of the model by using a generalized feedforward filter based on the gamma tap delay line [Pri93], which can cover the same memory depth as the tap delay line with a smaller filter order. The third method uses online regularization based on the L1-norm penalty [Has01], which drives the values of unimportant weights toward zero during training. The problem of finding the optimal parameter for the penalty function is also addressed.

The next issue covered in this dissertation relates to the adequacy of the linear modeling.
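Before turning to the nonlinear models, the first of the three parameter-reduction techniques, the subspace Wiener filter, can be sketched as follows. The data and dimensions here are synthetic and illustrative; the point is only the two-stage structure of PCA projection followed by least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
T, D, K = 800, 200, 30             # samples, embedded input dim, subspace size
X = rng.standard_normal((T, D))    # stand-in for tap-delayed bin counts
d = rng.standard_normal((T, 3))    # stand-in for 3D hand position

Xc = X - X.mean(axis=0)
# PCA via SVD: rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:K].T                       # (D, K) projection onto the top K components
Z = Xc @ P                         # reduced-dimension input
# Wiener filter in the subspace: K*3 weights instead of D*3
W, *_ = np.linalg.lstsq(Z, d - d.mean(axis=0), rcond=None)
y = Z @ W + d.mean(axis=0)
print(W.shape)                     # (30, 3)
```

The projection cuts the free parameters from D per output coordinate to K, at the cost of choosing the subspace without reference to the desired response, which is exactly the limitation the hybrid PCA/PLS method above is meant to address.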
We design a nonlinear mixture of switching, competitive linear models that implements a locally linear but globally nonlinear model [Kim03b]. This structure can be thought of as a time delay neural network (TDNN) [Hay96b] that is trained in a different way to overcome the difficulty of training thousands of parameters with relatively small datasets.

An important contribution of BMIs to brain-related research fields is the opening of a new avenue for experimental studies of the real time operation of neural systems in behaving animals [Nic03a]. For instance, using experimental BMIs, we may be able to explore the real-time nonstationary operation of a neuronal ensemble in association with behavior. The contributions of the individual cells in a large neuronal population to the encoding of motor parameters can also be analyzed through BMIs. In this respect, we investigate the properties of the neuronal ensemble synchronized with behavior in BMIs using several approaches.

First, we seek a way to represent neuronal activity more efficiently in the context of BMI modeling. Through multiresolution analysis [Mur04] of the neural spike trains, we can construct a richer input space to possibly extract more of the encoded information, thus enhancing the prediction models [Kim05c]. The issue of designing suitable models in this extended input feature space is addressed. Second, we demonstrate an approach to determining neuronal spatiotemporal patterns using nonnegative matrix factorization (NMF) [Lee99]. This mathematical procedure, which was introduced for image processing, can be used to extract the spatiotemporal patterns of different neuronal populations without training models [Kim05d]. Third, a real time neuronal subset selection algorithm is developed to find out which groups of neuronal activities exhibit relevance to a particular hand trajectory, and to investigate the nonstationary characteristics of the neuronal ensemble in time [Kim05b].
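The factorization underlying the second approach can be sketched with a minimal implementation of Lee and Seung's multiplicative updates on a synthetic bin-count matrix. This is a generic NMF sketch, not the analysis pipeline of chapter 8:

```python
import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    """Factor a nonnegative matrix V (neurons x time bins) as V ~ W H,
    with W (neurons x r bases) and H (r x bins encodings) nonnegative."""
    n, m = V.shape
    rng = np.random.default_rng(3)
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(iters):
        # Lee-Seung multiplicative updates for the Frobenius-norm cost;
        # eps guards against division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# synthetic bin counts: 50 neurons, 300 bins of Poisson spike counts
V = np.random.default_rng(4).poisson(1.5, size=(50, 300)).astype(float)
W, H = nmf(V, r=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape)   # (50, 4) (4, 300)
```

Because both factors are constrained to be nonnegative, the columns of W behave as additive, parts-like patterns over the neuronal ensemble, and the rows of H give their activation over time, which is what makes the basis vectors and encodings interpretable as spatial patterns and their temporal relationships with behavior.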
This selection scheme is developed based on the linear filters used for BMIs.

Outline

The dissertation is organized as follows. The experimental BMI paradigms and the descriptions of the recorded datasets are presented in chapter 2. We revisit the applications of linear adaptive filters, including the Wiener filter, to BMIs in chapter 3. In chapter 4, several regularization methods are investigated to address the problem of a large number of free parameters. In chapter 5, a nonlinear modeling technique using competitive multiple linear models is introduced and discussed. The experimental results and the comparisons of all the models for the two different behavioral tasks are summarized in chapter 6. Further developments of BMI models based on multiresolution analysis are demonstrated in chapter 7. Several analytical methods, including NMF and online subset selection using experimental BMIs, are introduced in chapters 8 and 9. Conclusions and future research directions are discussed in chapter 10.

CHAPTER 2
EXPERIMENTAL SETUPS FOR BRAIN-MACHINE INTERFACES

The datasets used for the prediction models were collected in an experimental BMI paradigm by the Nicolelis lab at Duke University. In this paradigm, the electrical activity of cortical neuronal ensembles from awake, behaving primates was recorded and used by statistical models to control a robotic arm that reproduced the arm movements of the primates. In this chapter, we describe the recording of the activity of the neuronal ensembles and the experimental paradigm for the behavioral tasks. The properties of the datasets are also presented.

Recording of Electrical Activity of Neuronal Ensembles

Multiple microwire arrays were chronically implanted in multiple cortical areas of one adult female owl monkey (Aotus trivirgatus) named Belle, and two adult female Rhesus monkeys (Macaca mulatta) named Ivy and Aurora.
In the owl monkey, multiple low-density microelectrode arrays (NB Labs, Denison, TX), each including 16 to 32 50-μm Teflon-coated stainless steel microwires, were implanted in the left dorsal premotor cortex (PMd), left primary motor cortex (M1), left posterior parietal cortex (PP), right PMd and M1, and right PP cortex [Wes00]. In the first Rhesus monkey (Aurora), multiple high-density microelectrode arrays developed at Duke University were implanted in the right PMd, right M1, right primary somatosensory cortex (S1), right supplementary motor area (SMA), and the left M1 cortex. In the second Rhesus monkey (Ivy), multiple high-density microelectrode arrays were implanted in the right PP, M1, and SMA cortex [Car03, Nic03b]. After the surgical procedures, a multichannel acquisition processor (MAP, Plexon, Dallas, TX) cluster was used in the experiments to record the neuronal action potentials simultaneously. The spikes of single neurons from each microwire were discriminated based on time-amplitude discriminators and a principal component (PC) algorithm [Nic97, Wes00]. The analog waveform of the action potential and the firing time of each spike were stored. The firing times were binned within a 100-ms nonoverlapping window, yielding a sequence of counts of the number of spikes in each bin. The distribution of the activity of the sorted neurons over cortex is presented in Table 2-1 for each monkey. In this table, the indices of the sorted neuronal activity based on the electrode arrays are used for identification purposes. These indices will be used throughout the remainder of the dissertation. Note that in Table 2-1, contra indicates the cortical areas in the hemisphere opposite to the moving hand, while ipsi indicates the areas in the same hemisphere.

Table 2-1. The distributions of the sorted neuronal activity for each monkey in motor cortical areas.
    Monkey   PP contra   M1 contra    PMd contra   S1 contra     SMA contra    M1 ipsi      PMd/M1 ipsi
    Belle    1-33 (33)   34-54 (21)   55-81 (27)   -             -             -            82-104 (23)
    Ivy      1-49 (49)   50-139 (90)  -            -             140-192 (53)  -            -
    Aurora   -           67-123 (57)  1-66 (66)    124-161 (38)  162-180 (19)  181-185 (5)  -

    Each cell gives the index range of the sorted neuronal activity; the number in parentheses is the number of sorted neuronal activities in the cortical area.

Behavioral Tasks

During a recording period, each primate was trained to perform particular motor tasks. In the first experimental setup, an owl monkey (Belle) performed three-dimensional movements to reach for food randomly placed at one of four positions on a tray, as depicted in Fig. 2-1. In this task, the monkey placed its hand on a platform attached to the chair. When a barrier was opened, the monkey reached for and grabbed the food. The location and orientation of the monkey's wrist were continuously recorded using a plastic strip with multiple fiber optic sensors (Shape Tape, Measurand Inc., Fredericton, NB, Canada) [Wes00]. These signals were sampled at 200 Hz.

Figure 2-1. An experimental setup of the 3D reaching task.

In the second experimental setup, the Rhesus monkeys (Aurora and Ivy) performed a two-dimensional target reaching task (Fig. 2-2). In this task, the monkey was cued to move a cursor on a computer screen by controlling a hand-held manipulandum in order to reach a target. The monkey was rewarded when the cursor intersected the target. The position of the manipulandum was continuously recorded at a 1000-Hz sampling rate.

Properties of Data

BMI models are designed to receive the binned spike counts as input signals and to predict hand position or velocity as desired signals. Before describing the BMI models, it is informative to get a picture of the characteristics of the input-output data. Therefore, we present here several characteristics of the data which are used for all BMI models in the remainder of this dissertation.

Figure 2-2. An experimental setup of the 2D target reaching task.
The monkey moves a cursor (yellow circle) to a randomly placed target (green circle), and is rewarded if the cursor intersects the target.

Neuronal Firing Patterns

First, examples of the binned data are illustrated in Fig. 2-3 for six sample neurons collected from the M1 cortex of Belle. We can notice that some neurons fire more frequently than others.

Figure 2-3. An example of the binned data.

Second, we examine the descriptive statistics of the binned data over all neurons. The first statistic that we evaluate is the sparseness of the data, measured by the ratio of the number of null bins (containing no spike) to the total number of bins. The sparseness is 85.6% for Belle's dataset, 65.2% for Ivy's, and 60.5% for Aurora's. Then, the average and the standard deviation of the bin count for each neuron are evaluated in the three datasets, as depicted in Fig. 2-4. It shows the variation of these statistics over the neuronal space.

Figure 2-4. The plots of the average (dot) and the standard deviation (bar) of the bin count for each neuron of the three monkeys: (a) Belle, (b) Ivy, and (c) Aurora.

In addition, the difference in firing rates during movement and rest for the 3D reaching task is evaluated. In order to quantify the difference, we estimate the mean firing rate during movement and rest separately. We collect 1,300 seconds of contiguous data samples from Belle's dataset, and manually select 81 movement subsets from them. The remaining parts are referred to as rest subsets. The mean firing rate of each movement and rest subset is then estimated by averaging the bin counts over all neurons and the time period of the given subset. Figure 2-5 shows the resulting estimates of the mean firing rates for movement and rest. It shows that neurons tend to fire more frequently on average during movement.
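The 100-ms binning and the sparseness statistic described above are straightforward to reproduce. The following sketch is illustrative only (the function names and the toy spike trains are our own, not part of the recording pipeline); it bins per-neuron spike times into nonoverlapping count windows and measures the fraction of null bins:

```python
import numpy as np

def bin_spikes(spike_times, duration, bin_width=0.1):
    """Bin a list of per-neuron spike time arrays (seconds) into
    nonoverlapping count windows (100 ms by default, as in the text)."""
    n_bins = int(np.ceil(duration / bin_width))
    edges = np.arange(n_bins + 1) * bin_width
    # rows: neurons, columns: time bins
    return np.stack([np.histogram(t, bins=edges)[0] for t in spike_times])

def sparseness(binned):
    """Ratio of null bins (no spike) to the total number of bins."""
    return np.mean(binned == 0)

# toy example: 3 hypothetical neurons over 1 second
rng = np.random.default_rng(0)
spikes = [np.sort(rng.uniform(0, 1.0, size=k)) for k in (5, 2, 0)]
counts = bin_spikes(spikes, duration=1.0, bin_width=0.1)
print(counts.shape)        # (3, 10)
print(sparseness(counts))
```

The same `counts` matrix (transposed to samples-by-neurons) is the raw material for the embedded input vectors used by the models in chapter 3.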
However, due to the uncertainty of the segmentation between movement and rest, these average statistics are variable and subject to change. It is also noteworthy that the mean firing rate tends to decrease with time.

Figure 2-5. The trajectories of the estimated mean firing rates for movement (solid line) and rest (dotted line) over the sequence of subsets.

Finally, the nonstationary characteristics of the input are investigated by observing the temporal change of the input autocorrelation matrix. The autocorrelation matrix of the multidimensional input data is estimated based on the assumption of ergodicity (see chapter 3 for details). In order to monitor the temporal change, the autocorrelation matrix is estimated for a sliding time window (4,000 samples long) which slides by 1,000 samples (100 seconds). For each estimated autocorrelation matrix, the condition number and the maximum eigenvalue are computed as summaries of the properties of the matrix. The results for the three datasets are presented in Fig. 2-6. It is observed that the properties of the input autocorrelation matrix vary over time.

Figure 2-6. Illustrations of the nonstationary properties of the input autocorrelation matrix (condition number, top, and maximum eigenvalue, bottom, per window index for Belle, Ivy, and Aurora). The dotted lines in the bottom panels indicate the reference maximum eigenvalue computed over the entire data samples.

Hand Movements

The hand movements of the primates are mainly parameterized by the trajectories of the hand positions. We treat these trajectories as our desired signals to be predicted. Note that the hand positions, sampled at 200 Hz or 1000 Hz, are downsampled to 10 Hz to be synchronized with the 100-ms binned data.
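The sliding-window analysis of Fig. 2-6 can be sketched as follows. This is a hypothetical reimplementation (the window and step sizes come from the text; everything else, including the time-average correlation estimator and the toy Poisson data, is assumed) that tracks the condition number and maximum eigenvalue per window:

```python
import numpy as np

def window_condition_numbers(X, win=4000, step=1000):
    """For each sliding window of X (samples x channels), estimate the
    autocorrelation matrix by time averaging and return its condition
    number and maximum eigenvalue, as monitored in Fig. 2-6."""
    stats = []
    for start in range(0, X.shape[0] - win + 1, step):
        seg = X[start:start + win]
        R = seg.T @ seg / win                  # time-average estimate
        eigs = np.linalg.eigvalsh(R)           # ascending eigenvalues
        stats.append((eigs.max() / max(eigs.min(), 1e-12), eigs.max()))
    return np.array(stats)

# toy stand-in for binned counts: 10,000 samples of 20 channels
rng = np.random.default_rng(1)
X = rng.poisson(0.5, size=(10000, 20)).astype(float)
stats = window_condition_numbers(X)
print(stats.shape)  # (7, 2): 7 windows x (condition number, max eigenvalue)
```

A drifting condition number across windows is the kind of evidence of nonstationarity the text describes.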
Before investigating the characteristics of the desired signals, we first present sample trajectories from the two different tasks (food reaching for Belle and target reaching for Ivy) in Fig. 2-7. In the food reaching movement (Fig. 2-7a), the trajectory approximately spans a hyperplane in which three specific parts of the movement, reach to food, food to mouth, and mouth to rest, are placed. Figure 2-7a depicts three reaching movements. In Fig. 2-7b, a 2D trajectory in the target reaching task over a 4-second duration is depicted. The trajectory starts from the dot in the middle of the figure and ends at the arrow. It demonstrates that the trajectory in this task spans the entire given 2D space and is more irregular than in 3D food reaching.

Figure 2-7. Sample trajectories of (a) 3D food reaching, and (b) 2D target reaching movements.

Now, we seek to observe the nonstationary characteristics of these trajectory signals. The continuous wavelet transform based on a basic wavelet function such as the Daubechies wavelet (the db6 wavelet is used in this analysis) [Dau92] is performed to see the frequency change over time. 10,000-sample trajectory data from both 3D food reaching and 2D target reaching are used for the wavelet analysis. The absolute values of the wavelet coefficients are plotted in Fig. 2-8. From this wavelet transform, we can clearly see the nonstationarity of the trajectory signals for both tasks.

Figure 2-8. The db6 continuous wavelet coefficients (absolute values) of the trajectory signals of (a) 3D food reaching, and (b) 2D target reaching. Darker pixels indicate larger coefficient values.
CHAPTER 3
LINEAR MODELING

In this chapter, we will present the design of adaptive linear filters for BMIs and the standard methods to estimate their parameters.

Linear Modeling for BMIs

Consider a set of spike counts from M neurons, and a hand position vector d in R^C (C is the output dimension, C = 2 or 3). The spike count of each neuron is embedded by an L-tap time-delay line. Then, the input vector for a linear model at a given time instance n is composed as

x(n) = [x_1(n), x_1(n-1), ..., x_1(n-L+1), x_2(n), ..., x_M(n-L+1)]^T, x(n) in R^(L*M),

where x_i(n-j) denotes the spike count of neuron i at time instance n-j. A linear model estimating the hand position at time instance n from the embedded spike counts can be described as

y_c = sum_{i=1}^{M} sum_{j=0}^{L-1} x_i(n-j) w_ij^c + b_c, (3-1)

where y_c is the c-coordinate of the hand position estimated by the model, w_ij^c is the weight on the connection from x_i(n-j) to y_c, and b_c is a bias for the c-coordinate. The bias can be removed from the model when we normalize x and d such that E[x] = 0 and E[d] = 0, where E[.] denotes the mean operator. Note that this model can be regarded as a combination of C separate linear models, each estimating one coordinate of the hand position from identical input. In matrix form, we can rewrite (3-1) as

y = W^T x, (3-2)

where y is a C-dimensional output vector, and W is a weight matrix of dimension (L*M) x C (with the bias removed; (L*M+1) x C otherwise). Each column of W consists of [w_10^c, w_11^c, ..., w_1,L-1^c, w_20^c, w_21^c, ..., w_M,L-1^c]^T.

Fig. 3-1 shows the topology of the linear model for the BMI application, which will be kept basically unchanged in the remainder of this dissertation. The most significant differences will be in the number of parameters and in the way the parameters w_ij^c of the model are computed from the data.
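As a concrete illustration of the embedding that forms x(n), the following sketch (our own helper function; the column ordering is assumed to match the vector written above) builds the full matrix of embedded input vectors from an (N, M) matrix of binned counts:

```python
import numpy as np

def embed(counts, L=10):
    """Build the L-tap time-delay input vectors of Eq. (3-1) from an
    (N, M) matrix of binned counts. The first L-1 samples are dropped;
    column order is [x_1(n), ..., x_1(n-L+1), x_2(n), ...]."""
    N, M = counts.shape
    X = np.zeros((N - L + 1, L * M))
    for i in range(M):               # neuron index
        for j in range(L):           # time lag
            X[:, i * L + j] = counts[L - 1 - j : N - j, i]
    return X

# toy check: 6 samples of 2 neurons, counts 0..11
counts = np.arange(12, dtype=float).reshape(6, 2)
X = embed(counts, L=3)
print(X.shape)  # (4, 6): (N-L+1) rows, L*M columns
print(X[0])     # [x_1(2), x_1(1), x_1(0), x_2(2), x_2(1), x_2(0)]
```

Each row of X is one x(n); stacking the desired hand positions for the same time indices gives the paired training set used throughout this chapter.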
All the models are applied to estimate the 3D or 2D hand positions using L = 10 taps, with M = 99 (Belle) neurons (after eliminating the ones that do not fire during the training parts of the recordings) for the food reaching task, and M = 192 (Ivy) or 185 (Aurora) for the target reaching task. The length of the time delays (L) is determined based on the preliminary BMI study of the correlation between time lags and hand movements in Wessberg et al. [Wes00], where the neuronal firings up to 1 second before the current hand movement were found to be significantly correlated with the movement.

Figure 3-1. The topology of the linear filter designed for BMIs in the case of the 3D reaching task. x_i(n) is the bin count input from the i-th neuron (M neurons in total) at time instance n, and z^-1 denotes a discrete time delay operator. y_c(n) is the hand position in the c-coordinate, w_ij^c is the weight on x_i(n-j) for y_c(n), and L is the number of taps.

The sizes of the training and testing sets are 10,000 samples (~16.7 minutes) and 3,000 samples (~5 minutes), respectively, for all the models and all three datasets. The size of the training set is empirically chosen as a compromise between nonstationarity and the quality of estimation: a longer training set can improve the estimation of the parameters, but increases the chance of including more nonstationary characteristics of the data in the estimation. The weights are fixed after adaptation, and the outputs of the model are produced for novel testing samples. Performance of the model is evaluated on these testing outputs with respect to generalization. The following quantitative performance measures are used to evaluate the accuracy of the estimation:

1. The correlation coefficient (CC) quantifies the linear relationship between the estimated and actual hand trajectories, defined as

CC = C_dy / (s_d * s_y), (3-3)

where C_dy denotes the covariance between the two variables d and y, and s_d (or s_y) denotes the standard deviation of d (or y).
In our evaluation, C_dy is the covariance between the actual hand trajectory (d) and its estimation by the model (y).

2. The signal-to-error ratio (SER) is the ratio of the powers of the actual hand trajectory signal and the error of a model, defined as

SER = sum_{k=1}^{K} d(k)^2 / sum_{k=1}^{K} e(k)^2, (3-4)

where d(k) and e(k) are the actual hand signal and the error at time instance k, and K is the size of the window in which SER is computed.

3. The cumulative error metric (CEM) estimates the cumulative distribution function of the error radius, defined as

CEM(r) = Pr(|e| <= r). (3-5)

So, CEM(r) is the estimated probability that the radius of the error vector is less than or equal to a certain value r.

We compute CC and SER over a short sliding time window in order to see if a given model predicts better for a particular part of the trajectory. The size of the window is determined empirically. For the food reaching data, the size is set to 4 seconds, which is approximately the duration of a single reaching movement. However, the duration of a movement cannot be estimated for the target reaching data, since there is no apparent rest period between consecutive reaching movements. Therefore, the size of the window for the target reaching data is set long enough (1 minute) to make the computation of CC and SER reliable. For comparison between different models, the averages of CC and SER over every window are computed. These computations are conducted separately for each coordinate of the hand position. Furthermore, we divide the evaluation results of food reaching into two modes: movement and rest. In each mode, the averages of CC and SER over the three coordinates are used for evaluation instead of the individual CC and SER of each coordinate. For target reaching, where the separation between movement and rest is not apparent, the evaluation is executed separately for each coordinate.
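The three measures translate into a few lines of code. The sketch below is a minimal illustration (the sample-statistics conventions, such as the covariance normalization, are our assumptions, not prescribed by the text):

```python
import numpy as np

def cc(d, y):
    """Correlation coefficient, Eq. (3-3), for one coordinate."""
    return np.cov(d, y)[0, 1] / (np.std(d, ddof=1) * np.std(y, ddof=1))

def ser(d, y):
    """Signal-to-error ratio, Eq. (3-4), over one window."""
    e = d - y
    return np.sum(d ** 2) / np.sum(e ** 2)

def cem(d, y, r):
    """Cumulative error metric, Eq. (3-5): estimated P(|error| <= r).
    d, y: (samples, coordinates) trajectories."""
    radius = np.linalg.norm(d - y, axis=1)
    return np.mean(radius <= r)

# toy trajectory and a noisy estimate of it
rng = np.random.default_rng(2)
d = np.sin(np.linspace(0, 10, 500))
y = d + 0.1 * rng.standard_normal(500)
print(cc(d, y), ser(d, y))
```

In the dissertation's protocol, `cc` and `ser` would be averaged over sliding windows (4 s or 1 min as described above), while `cem` is evaluated on the full error-vector radii.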
The three performance measures introduced here complement one another. CC measures the linear covariance between the actual and estimated trajectories, thus evaluating tracking ability, but it does not measure the bias of the estimation. This shortcoming is remedied by SER, which is based on error measurement. However, SER has the problem that it is sensitive to the coordinate system, which is calibrated artificially. For instance, with similar error power, SER becomes relatively large when the magnitude of the actual trajectory becomes large, thus increasing the signal power; yet the magnitude of the hand position does not carry any practical meaning. This problem is counterbalanced by CEM, in which only the radius of the error vector matters. CEM also provides a statistical tool for performance measurement, which is especially useful for statistical comparison of models on average. Hence, the three measures jointly allow a more comprehensive performance evaluation than any individual measure used separately.

The Wiener Filter

The transfer function from the neuronal bin counts to hand position can be estimated by linear adaptive filters, among which the Wiener filter plays a central role [Hay96]. The weight matrix of the Wiener filter for the MIMO case is estimated by the Wiener-Hopf solution as

W_Wiener = R^-1 P. (3-6)

R is the correlation matrix of the neural spike inputs, with dimension (L*M) x (L*M),

R = [ r_11  r_12  ...  r_1M
      r_21  r_22  ...  r_2M
      ...
      r_M1  r_M2  ...  r_MM ], (3-7)

where r_ij is the L x L cross-correlation matrix between neurons i and j (i != j), and r_ii is the L x L autocorrelation matrix of neuron i. P is the (L*M) x C cross-correlation matrix between the neuronal bin counts and the hand position,

P = [ p_11  ...  p_C1
      ...
      p_1M  ...  p_CM ], (3-8)

where p_ci is the cross-correlation vector between neuron i and the c-coordinate of the hand position. The estimated weights W_Wiener are optimal under the assumptions that the error is drawn from a white Gaussian distribution and the data are stationary.
The predictor W_Wiener minimizes the mean square error (MSE) cost function,

J = E[|e|^2], e = d - y. (3-9)

Each subblock matrix r_ij can be decomposed as

r_ij = [ r_ij(0)    r_ij(1)    ...  r_ij(L-1)
         r_ij(-1)   r_ij(0)    ...  r_ij(L-2)
         ...
         r_ij(1-L)  r_ij(2-L)  ...  r_ij(0) ], (3-10)

where r_ij(τ) represents the correlation between neurons i and j at time lag τ. These correlations, which are the second-order moments of the discrete-time random processes x_i(m) and x_j(k), are functions of the time difference (m-k) under the assumption of wide-sense stationarity (m and k denote discrete time instances for each process). Assuming that the random process x_i(k) is ergodic for all i, we can utilize time-average statistics to estimate the correlations. In this case, the estimate of the correlation between two neurons, r_ij(m-k), can be obtained by

r_ij(m-k) = E[x_i(m) x_j(k)] ≈ (1/N) sum_{n=1}^{N} x_i(n) x_j(n-(m-k)), for i, j = 1, ..., M, (3-11)

where N is the number of samples. The cross-correlation vector p_ci can be decomposed and estimated in the same way. r_ij(τ) is estimated using equation (3-11) from the neuronal bin count data, with x_i(n) and x_j(n) being the bin counts of neurons i and j, respectively. From equation (3-11), it can be seen that r_ij(τ) is equal to r_ji(-τ). Since these two correlation estimates are positioned on opposite sides of the diagonal entries of R, the equivalence between r_ij(τ) and r_ji(-τ) leads to the symmetry of R. The symmetric matrix R can then be inverted efficiently by using the Cholesky factorization; once the factorization is available, solving for the weights costs O(N^2) instead of O(N^3), where N is the number of parameters. Notice that R must be a nonsingular matrix to obtain the solution from (3-6). However, if the condition number of R is very large, so that R is close to a singular matrix, then W_Wiener may be inadequately determined. This usually happens when the number of samples is too small, or the input variables are linearly dependent on each other.
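A minimal sketch of the Wiener-Hopf solution (3-6), using time-average correlation estimates and a Cholesky-based solve as discussed above. The synthetic data and the optional `ridge` parameter are our own additions for illustration (the ridge term anticipates the regularization discussed in chapter 4; the text's Wiener solution does not use it):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def wiener_weights(X, D, ridge=0.0):
    """Solve W = R^{-1} P, Eqs. (3-6)-(3-8), via Cholesky factorization.
    X: (N, L*M) embedded inputs, D: (N, C) desired hand positions."""
    N = X.shape[0]
    R = X.T @ X / N                 # time-average autocorrelation estimate
    P = X.T @ D / N                 # input/desired cross-correlation
    c, low = cho_factor(R + ridge * np.eye(R.shape[0]))
    return cho_solve((c, low), P)   # (L*M, C) weight matrix

# synthetic sanity check: recover a known linear mapping
rng = np.random.default_rng(3)
X = rng.standard_normal((2000, 30))
W_true = rng.standard_normal((30, 3))
D = X @ W_true + 0.01 * rng.standard_normal((2000, 3))
W = wiener_weights(X, D)
print(np.allclose(W, W_true, atol=0.05))  # True
```

The Cholesky route exploits the symmetry of R noted above; if R were near-singular, a positive `ridge` would play the role of the conditioning fix described next.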
In such a case, we can reduce the condition number by adding an identity matrix multiplied by some constant to R before inversion. This procedure is called ridge regression in statistics [Hoe70], and the solution obtained by it minimizes a cost function that linearly combines the MSE in (3-9) with a regularization term. The details will be discussed in chapter 4. In our estimation of the Wiener solution, however, we do not employ this regularization scheme.

Figures 3-2 and 3-3 display the Hinton diagrams of the weights of the Wiener filter obtained by (3-6) for food reaching and target reaching, respectively. Each column of W_Wiener (i.e., the weight vector of the Wiener filter for one coordinate) is rearranged in matrix form to show the spatiotemporal structure of the weight vectors. In this matrix form, the neuron indices are aligned on the x-axis and the time lags on the y-axis. Note that the first row of the matrix corresponds to the zero lag (the instantaneous neuronal bin counts), followed by successive rows corresponding to increasing lags (up to nine). In a Hinton diagram, white pixels denote positive signs, black pixels denote negative signs, and the size of a pixel indicates the magnitude of the weight.

From the Hinton diagrams, we can probe the contribution of individual neurons to the output of the Wiener filter. For this purpose, the weights represented in the Hinton diagrams are obtained from input in which each neuronal bin count time series x_i(n) is normalized to have unit variance. Then, the value of a weight represents the sensitivity of the filter output to the corresponding input [San03a]. We can also see the sign of the correlation between a particular neuronal input and the output. For instance, the weights for the neurons indexed by 5, 7, 21, 23, and 71 exhibit relatively large positive values for food reaching (see Fig. 3-2), indicating that those neuronal activities are positively correlated with the output.
On the other hand, the weights for neurons 26, 45, 74, and 85 exhibit large negative values, indicating negative correlation between those neuronal inputs and the output. There are also some neurons for which the weights have both positive and negative values (e.g., 14 and 93). It is also possible from these diagrams to examine the significant time lags for each neuron in terms of the contribution to the filter output. For instance, in the case of neuron 7 or 93, the recent bin counts seem to be more correlated with the current output, whereas for neuron 23 or 74, the delayed bin counts seem to be more correlated with the current output. Similar observations can be made for target reaching in Fig. 3-3.

Figure 3-2. The Hinton diagram of the weights of the Wiener filter for food reaching.

Figure 3-3. The Hinton diagram of the weights of the Wiener filter for target reaching.

Stochastic Gradient Learning

The underlying assumption of the Wiener filter is that the statistics of the data are time-invariant. In a nonstationary environment, where the statistics of the data vary in time, the Wiener filter only uses the average statistics to determine the weights. The normalized least mean squares (NLMS) algorithm, a modified version of the least mean squares (LMS) algorithm, can train the weights effectively for nonstationary inputs by varying the learning rate [Hay96]. It utilizes a stochastic estimate of the power of the input signal to adjust the learning rate at each time instance. The weights at a given time instance n are updated by NLMS as

w_NLMS(n+1) = w_NLMS(n) + (η / (γ + ||x(n)||^2)) e_c(n) x(n), (3-12)

where η satisfies 0 < η < 2, γ is a small positive constant, e_c(n) is the output error for the c-coordinate, and x(n) is the input vector.
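Equation (3-12) translates directly into code. The sketch below is a toy, noiseless single-coordinate example (the text's actual settings are η = 0.01 and γ = 1; the demo uses a larger η purely so the toy run converges quickly):

```python
import numpy as np

def nlms(X, d, eta=0.01, gamma=1.0):
    """Normalized LMS update of Eq. (3-12) for one output coordinate.
    X: (N, L*M) embedded inputs, d: (N,) desired trajectory."""
    w = np.zeros(X.shape[1])
    for n in range(X.shape[0]):
        x = X[n]
        e = d[n] - w @ x                        # instantaneous error
        w += (eta / (gamma + x @ x)) * e * x    # power-normalized step
    return w

# noiseless toy problem: recover a known weight vector
rng = np.random.default_rng(4)
X = rng.standard_normal((5000, 10))
w_true = rng.standard_normal(10)
d = X @ w_true
w = nlms(X, d, eta=0.5, gamma=1.0)
print(np.allclose(w, w_true, atol=0.1))  # True
```

Note how the effective step η / (γ + ||x(n)||²) shrinks when the input power rises; this is exactly the mechanism, discussed below, by which NLMS adapts more aggressively during low-firing rest periods.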
If we let η(n) = η / (γ + ||x(n)||^2), then the NLMS algorithm can be viewed as the LMS algorithm with a time-varying learning rate,

w_NLMS(n+1) = w_NLMS(n) + η(n) e_c(n) x(n). (3-13)

Although the weights in NLMS converge to the same solution as the Wiener filter in the statistical sense for stationary data, the solution will be different for nonstationary data. The weights of the linear filter for BMIs are estimated by NLMS with the settings η = 0.01 and γ = 1. In the empirical analysis of the resulting outputs of this filter, we observed that for food reaching the accuracy of the estimation is improved compared to the Wiener filter, especially during rest (see the details of the results in chapter 6). This means that the weights found a better compromise between the two very different characteristics of movement and rest. This improvement is achieved because of the update rule (3-12): the weights in NLMS are updated with a relatively high learning rate during rest, since the total firing count increases during movement (see Fig. 2-5). Thus, for the class of motor behaviors in which movement periods are separated by rest, the NLMS algorithm captures more information about rest positions than the Wiener filter.

Other Linear Models

For comparison with other linear models proposed for BMIs, a Kalman filter is designed and its prediction performance is evaluated on the same data used in this dissertation. The Kalman filter, which estimates the internal state of a linear dynamical system [Kal60] and produces a generative model for the data, has been proposed to learn the dynamical nature of the biological motor system in BMIs [Wu03, San02b]. In the Kalman filtering framework, the system state includes the hand position, velocity, and acceleration, and the observation includes the neuronal bin counts.
Based on the assumption of a linear relationship (with additive Gaussian noise) between the state and the observation, as well as between the states at the current and previous time instances, the Kalman filter recursively estimates the hand kinematics in real time from the cortical neurons. Although the system parameters representing the linear relationships are fixed after training, the Kalman filter can adjust its gain to track the time-varying nature of the motor system. We briefly review the Kalman filter method used for BMIs. The linear dynamic equation for the state is given by

z(n+1) = A z(n) + ω(n), (3-14)

where z(n) is a state vector for the hand kinematics such that z(n) = [p_x(n) p_y(n) v_x(n) v_y(n) a_x(n) a_y(n)]^T; p_c(n) denotes the hand position for the c-coordinate, v_c(n) the velocity, and a_c(n) the acceleration, at time instance n. For food reaching, p_z(n), v_z(n), and a_z(n) are added to the state vector. ω(n) is a process noise vector following a Gaussian distribution with a zero-mean vector and a covariance matrix Ω. The state-output mapping equation is given by

x(n) = H z(n) + u(n), (3-15)

where x(n) is the instantaneous neuronal bin count vector (binned by a 100-ms nonoverlapping time window). Note that Wu et al. designed the same Kalman filter with a different window size (70 ms) [Wu03]. u(n) is a measurement noise term following a Gaussian distribution with a zero-mean vector and a covariance matrix Q. Given the training set, A and H are determined by least squares (LS), which solves the following optimization problems:

A = argmin_A sum_{n=1}^{N-1} ||z(n+1) - A z(n)||^2, (3-16)

H = argmin_H sum_{n=1}^{N} ||x(n) - H z(n)||^2. (3-17)

Given A and H, the estimates of the covariance matrices Ω and Q can be obtained by

Ω = (1/(N-1)) sum_{n=1}^{N-1} (z(n+1) - A z(n)) (z(n+1) - A z(n))^T, (3-18)

Q = (1/N) sum_{n=1}^{N} (x(n) - H z(n)) (x(n) - H z(n))^T. (3-19)

With the model (A, H, Ω, Q) obtained, the Kalman filter estimates the state of the hand kinematics from the novel neuronal bin count vectors (the test data) in real time.
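The parameter estimation step of (3-16)-(3-19) is ordinary least squares. A sketch under assumed array shapes (states and observations stored as rows; the toy data are synthetic and only exercise the observation model):

```python
import numpy as np

def fit_kalman_params(Z, X):
    """Least-squares estimation of (A, H, Omega, Q), Eqs. (3-16)-(3-19).
    Z: (N, state_dim) kinematic states, X: (N, obs_dim) binned counts."""
    Z0, Z1 = Z[:-1], Z[1:]
    A = np.linalg.lstsq(Z0, Z1, rcond=None)[0].T   # Z1 ~ Z0 @ A.T
    H = np.linalg.lstsq(Z, X, rcond=None)[0].T     # X  ~ Z  @ H.T
    W = Z1 - Z0 @ A.T                              # state residuals
    Omega = W.T @ W / (len(Z) - 1)                 # Eq. (3-18)
    V = X - Z @ H.T                                # observation residuals
    Q = V.T @ V / len(Z)                           # Eq. (3-19)
    return A, H, Omega, Q

# toy check: recover a known observation matrix H
rng = np.random.default_rng(5)
Z = rng.standard_normal((1000, 6))     # stand-in states (pos, vel, acc in 2D)
H_true = rng.standard_normal((8, 6))
X = Z @ H_true.T + 0.01 * rng.standard_normal((1000, 8))
A, H, Omega, Q = fit_kalman_params(Z, X)
print(np.allclose(H, H_true, atol=0.05))  # True
```

The fitted (A, H, Ω, Q) would then be held fixed and plugged into the test-time recursion given next; only the gain K(n) and covariance P(n) continue to adapt.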
The state estimate ẑ(n) and the Kalman gain matrix K(n) are updated at each time instance by the following recursion:

P^-(n) = A P(n-1) A^T + Ω, (3-20)

K(n) = P^-(n) H^T (H P^-(n) H^T + Q)^-1, (3-21)

ẑ(n) = A ẑ(n-1) + K(n) (x(n) - H A ẑ(n-1)), (3-22)

P(n) = (I - K(n) H) P^-(n). (3-23)

Note that the error covariance matrix P and the state vector estimate ẑ must be initialized before starting this recursion.

CHAPTER 4
REGULARIZED LINEAR MODELING

In chapter 3, we demonstrated the design of linear filters that can be adapted for BMI applications. Despite the intrinsic sophistication of the BMI system, the simple linear filter (which merely combines the weighted bin count inputs) could estimate the primate's hand position fairly well, in particular showing the ability to track the low-frequency trajectory. Based on this fact, we seek to improve the performance of linear models by importing advanced learning techniques. Among these, the class of regularization methods is preferred, since it yields smoother function approximations that improve the generalization performance of BMI models. In this chapter, we propose three different regularization approaches. The first approach reduces the input space dimension using subspace projection and subsequently operates the linear filter in the subspace. The second approach reduces the filter order in each neuronal channel by employing the gamma delay line. The third approach places constraints on the model parameter space to reduce the effective number of parameters. We will discuss the methodology, implementation, and analysis of these regularization approaches in this chapter.

Dimension Reduction Using Subspace Projection

One of the challenges in the design of decoding models for BMIs is that some neurons' firings are not substantially modulated during task performance, and they only add noise to the multichannel input data.
In addition, some neurons' firings are correlated with each other; thus it may be advantageous to blend these inputs to improve model performance. Subspace projection, which can reduce the noise and blend correlated input signals together, may curtail unnecessary firing signals through a proper projection matrix. It also reduces the number of degrees of freedom in the multichannel data, and consequently decreases the variance of the model. Here, we introduce a hybrid subspace projection method derived by combining the criteria of principal component analysis (PCA) and partial least squares (PLS). We will then design the subspace Wiener filter based on this hybrid subspace projection for BMIs.

A Hybrid Subspace Projection

PCA, which preserves maximum variance in the data, has been widely adopted as a projection method [Hay96b]. The projection vector w_PCA is determined by maximizing the variance of the projection outputs as

w_PCA = argmax_w J_PCA(w) = E[|x^T w|^2] = w^T R_s w, subject to ||w|| = 1, (4-1)

where R_s is the input covariance matrix computed over the neuronal space only (it is an M x M matrix, where M is the number of neurons), and x is an M x 1 instantaneous neuronal bin count vector. It is well known that w_PCA turns out to be the eigenvector of R_s corresponding to the largest eigenvalue. An M x S projection matrix that constructs an S-dimensional subspace then consists of the S eigenvectors corresponding to the S largest eigenvalues. However, PCA does not exploit information in the joint space of input and desired response. This means that there may be directions with large variance that are not important for describing the correlation between input and desired response (e.g., some neuronal modulations related to the monkey's anticipation of reward might be substantial, but less useful for the direct estimation of movement parameters), but that will be preserved by the PCA decomposition.
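For reference, the PCA projection matrix of (4-1) is simply the top-S eigenvectors of R_s. A sketch (the toy Poisson counts and the zero-meaning step are our assumptions about the preprocessing):

```python
import numpy as np

def pca_projection(counts, S):
    """Return the M x S projection matrix of Eq. (4-1): the eigenvectors
    of the neuronal-space covariance R_s with the S largest eigenvalues.
    counts: (N, M) binned spike counts."""
    Rs = np.cov(counts, rowvar=False)        # M x M covariance
    eigvals, eigvecs = np.linalg.eigh(Rs)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:S]    # indices of S largest
    return eigvecs[:, order]

# toy stand-in for zero-meaned binned counts: 3000 samples, 40 neurons
rng = np.random.default_rng(6)
counts = rng.poisson(0.5, size=(3000, 40)).astype(float)
counts -= counts.mean(axis=0)
W = pca_projection(counts, S=10)
print(W.shape)                           # (40, 10)
print(np.allclose(W.T @ W, np.eye(10)))  # True: orthonormal columns
```

Projecting `counts @ W` yields the S subspace channels that, in the subspace Wiener filter below, are each embedded with an L-tap delay line.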
One subspace projection method that constructs the subspace in the joint space is PLS, which seeks the projection maximizing the cross-correlation between the projection outputs and the desired response [Jon93]. Given an input vector x and a desired response d, a projection vector of PLS, w_PLS, maximizes the following criterion:

w_PLS = arg max_w J_PLS(w) = E[(x^T w)d] = w^T E[xd] = w^T p,    (4-2)

where p is defined as the M x 1 cross-correlation vector between x and d. The consecutive orthogonal PLS projection vectors are computed using the deflation method [Hay96b].

There have been efforts to find a better projection that combines the properties of PCA and PLS. Continuum regression (CR), introduced by Stone and Brooks [Sto90], attempted to blend the criteria of ordinary least squares (OLS), PCA, and PLS. Recently, we have proposed a hybrid criterion function similar to CR, together with a stochastic learning algorithm to estimate the projection matrix [Kim03a]. The learned projection can be PCA, PLS, or a combination of the two. The hybrid criterion function combining PCA and PLS is given by

J(w, λ) = (w^T p)^{2λ} (w^T R_s w)^{1-λ} / (w^T w),    (4-3)

where λ is a balancing factor between PCA and PLS. This criterion covers the continuous range between PLS (λ = 1) and PCA (λ = 0).¹ Since the log function is monotonically increasing, the criterion can be rewritten as

log(J(w, λ)) = λ log(w^T p)^2 + (1 - λ) log(w^T R_s w) - log(w^T w).    (4-4)

¹ CR covers OLS, PLS, and PCA. However, since we are only interested in the case when subspace projection is necessary, OLS can be omitted from our criterion.

We seek to maximize this criterion for 0 ≤ λ ≤ 1. There are two learning algorithms derived in [Kim03a] to find w (one based on gradient descent, the other on a fixed-point algorithm), but we opt to use the fixed-point learning algorithm here due to its fast convergence and independence of a learning rate.
The estimation of w at the (k+1)th iteration in the fixed-point algorithm is given by

w(k+1) = (1 - T) w(k) + T [ λ p / (w(k)^T p) + (1 - λ) R_s w(k) / (w(k)^T R_s w(k)) ],    (4-5)

with a random initial vector w(0). T (0 < T ≤ 1) is a relaxation factor introduced to prevent oscillating behavior near convergence. The convergence rate is affected by T, which produces a tradeoff between convergence speed and accuracy. Note that the fastest convergence is obtained with T = 1. The consecutive projection vectors are again learned by the deflation method, forming the columns of a projection matrix W. After projection onto the subspace by W, we embed the input signal at each channel with an L-tap delay line and design the Wiener filter to estimate the hand position. Figure 4-1 illustrates the overall diagram of the subspace Wiener filter.

Figure 4-1. The overall diagram of the subspace Wiener filter. y(n) denotes the estimated hand position vector. There are L-1 delay operators (z^-1) for each subspace channel.

Design of a Decoding Model Using the Subspace Wiener Filter

The holdout cross-validation method [Bis95] is utilized to determine both the optimal subspace dimension (S) and λ simultaneously. 10,000 consecutive data samples are divided into 9,000 training and 1,000 validation samples for both the food reaching and target reaching tasks. The MSE over the validation samples is computed after training for each pair (S_i, λ_j), where S_i ∈ {20, 21, ..., 60} and λ_j ∈ {0, 0.1, ..., 1}. In Fig. 4-2, the contour map of the computed MSE is depicted. The minimum MSE is found at (S, λ) = (37, 0.9) for food reaching and (S, λ) = (44, 0.6) for target reaching, respectively. The validation MSE also tends to be smaller for larger λ in the lower subspace dimensions, while the MSE levels are rather flat in the higher subspace dimensions. This indicates that PLS plays a more important role in building a better subspace Wiener filter at lower subspace dimensions.
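The fixed-point recursion (4-5) can be sketched directly in NumPy. The unit-norm renormalization of w after each update and the use of sample estimates for R_s and p are our assumptions here; function and variable names are also our own.

```python
import numpy as np

def hybrid_projection(X, d, lam, T=1.0, n_iter=100, seed=0):
    """Fixed-point estimate of one hybrid PCA/PLS projection vector (4-5).

    X   : (N, M) matrix of neuronal bin counts (rows are samples)
    d   : (N,) desired response (one hand-position coordinate)
    lam : balance between PLS (lam = 1) and PCA (lam = 0)
    T   : relaxation factor, 0 < T <= 1
    """
    Rs = X.T @ X / len(X)          # sample input covariance (M x M)
    p = X.T @ d / len(X)           # sample input/desired cross-correlation
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        update = np.zeros_like(w)
        if lam > 0:                # PLS term of (4-5)
            update += lam * p / (w @ p)
        if lam < 1:                # PCA term of (4-5)
            update += (1 - lam) * (Rs @ w) / (w @ Rs @ w)
        w = (1 - T) * w + T * update
        w /= np.linalg.norm(w)     # assumed renormalization step
    return w
```

With lam = 0 the recursion reduces to power iteration on R_s (recovering the PCA direction), and with lam = 1 it converges to the direction of p (the PLS direction), matching the two limits of the criterion.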
Figure 4-2. The contour map of the validation MSE for (a) food reaching, and (b) target reaching. The darker lines indicate lower MSE levels.

To further investigate the difference between the subspaces produced by PCA and PLS, the first three projection vectors are estimated by setting λ = 0 or 1 in (4-5), as presented in Fig. 4-3. Note that PLS yields separate vectors corresponding to each hand position coordinate since it utilizes the desired response, while PCA needs only one projection regardless of coordinate. In food reaching, the projection vectors of PCA have large weights on the neurons that fire frequently. For instance, the neurons indexed as 42, 57, and 93 are empirically found to have the largest firing counts. Since the neural firing data are sparse, PCA builds a subspace around frequently firing neurons in order to preserve the variance. On the other hand, the PLS projection places larger weights on different neurons that do not fire very frequently, such as the neurons indexed as 7 and 23. From the Hinton diagram described in the previous chapter (see Fig. 3-2), these neurons were found to contribute significantly to the output of the Wiener filter designed for BMIs. Therefore, by exploiting the information in the joint space, PLS is able to utilize the information from important neurons even when they do not fire very frequently. For target reaching, we can also observe that more neurons are involved in the PLS projection vectors than in those of PCA. The neurons with larger weights in the PCA projection, again, are observed to fire more frequently. It is interesting to observe that for target reaching, the subspace dimension obtained from the cross-validation is of the same order as the number of neurons obtained in the neuron dropping analysis performed in Sanchez et al. [San03b].
In fact, the number of important neurons, for which the correlation coefficient between model outputs and desired hand trajectories is maximized, is 35, which is close to the subspace dimension of 44. The empirical measurements of performance on the test data using the subspace Wiener filter with the above parameters demonstrate that the generalization performance of the subspace Wiener filter for both tasks reaches a slightly higher level than that of the Wiener filter or the linear filter trained by NLMS (see Chapter 6). We expect, however, much greater improvements from the subspace projection methods for larger datasets (more than 200 neurons; Carmena, J.M., Lebedev, M.A., & Nicolelis, M.A.L., unpublished observations) and anticipate that these techniques will be important in the foreseeable future when the number of simultaneously recorded neurons surpasses 1,000.

Figure 4-3. The first three projection vectors in PCA for (a) food reaching and (c) target reaching, and in PLS for (b) food reaching and (d) target reaching, respectively.

Parsimonious Modeling in Time Using the Gamma Filter

The large number of parameters in decoding models is caused not only by the number of neurons but also by the number of time delays required to capture the history of the neuronal firings over time. Although we use a 10-tap delay line in this study, the size of the delay line can vary depending on the bin size (e.g., if we use a 50-ms time bin, the number of time lags increases to 20). Hence, it is desirable to represent the temporal patterns of neuronal data in a more efficient way that reduces the number of taps.
A linear filter as described in the previous chapters can be decomposed into multiple finite impulse response (FIR) filters, one assigned to every neuron. An FIR filter has the advantages of trivial stability and easy adaptation. However, in an FIR filter the length of the impulse response and the filter order are equivalent. Hence, when a problem requires a deep memory and a small number of parameters, an infinite impulse response (IIR) system is more appropriate. However, the stability issue in adaptation and the nonconvex error surface of an IIR filter pose nontrivial challenges for practical use. A generalized feedforward filter provides a signal processing framework that incorporates both FIR and IIR characteristics into a single system by employing a local feedback structure [Pri93]. As shown in Fig. 4-4, an input signal is delayed at each tap by a delay operator defined by a specific transfer function G(z). Note that when G(z) = z^-1, it becomes an FIR filter. The transfer function of the overall system, H(z), is stable when G(z) is stable, since

H(z) = Σ_{k=0}^{K} w_k (G(z))^k,    (4-6)

where K is the number of taps. It has been shown that a generalized feedforward filter provides trivial stability and easy adaptation while decoupling the memory depth from the filter order.

Figure 4-4. An overall diagram of a generalized feedforward filter [Pri93]. x_0(n) is an instantaneous input, and y(n) is the filter output.

The gamma filter is a special case of the generalized feedforward filter with G(z) = μ/(z − (1 − μ)), where μ is a feedback parameter. The impulse response of the transfer function from the input to the kth tap, denoted g_k(n), is given by

g_k(n) = Z^{-1}((G(z))^k) = C(n-1, k-1) μ^k (1 − μ)^{n−k} u(n − k),    (4-7)

where Z^{-1}(·) indicates the inverse z-transform, C(n-1, k-1) the binomial coefficient, and u(n) the step function. When μ = 1, the gamma filter becomes an FIR filter. The stability of the gamma filter in adaptation is guaranteed for 0 < μ < 2 due to the local feedback structure.
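The gamma delay line can be simulated directly from the local feedback recursion implied by G(z) = μ/(z − (1 − μ)): each tap is a first-order low-pass of the previous tap, x_k(n) = (1 − μ) x_k(n−1) + μ x_{k−1}(n−1). The sketch below (function name and array layout are our own) reduces to a plain tapped delay line when μ = 1.

```python
import numpy as np

def gamma_delay_line(x, K, mu):
    """Run a K-tap gamma delay line over a 1-D signal.

    Each tap follows the local feedback recursion
        x_k(n) = (1 - mu) * x_k(n-1) + mu * x_{k-1}(n-1).
    Returns a (len(x), K+1) array whose columns are the tap signals
    x_0(n), ..., x_K(n); a weighted sum of these columns with tap
    weights w_k implements the gamma filter output y(n).
    """
    taps = np.zeros((len(x), K + 1))
    taps[:, 0] = x                       # x_0(n): the instantaneous input
    state = np.zeros(K + 1)
    for n in range(len(x)):
        prev = state.copy()
        state[0] = x[n]
        for k in range(1, K + 1):
            state[k] = (1 - mu) * prev[k] + mu * prev[k - 1]
        taps[n, 1:] = state[1:]
    return taps
```

Feeding an impulse through the line reproduces the gamma kernels of (4-7); for example, tap 1 decays as μ(1 − μ)^{n−1}.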
The memory depth D of a Kth-order gamma filter with feedback parameter μ is given by

D = K/μ for μ ≤ 1, or D = K/(2 − μ) for μ > 1.    (4-8)

If we define the resolution as R = μ, the property of the gamma delay line can be described by

K = D × R for μ ≤ 1, or K = D × (2 − R) for μ > 1.    (4-9)

This property shows that the gamma filter decouples the memory depth from the filter order by adjusting the feedback parameter μ. In the case of μ = 1 (i.e., the FIR filter), the resolution is maximized whereas the memory depth is minimized for a given filter order. This choice, however, sometimes results in overfitting when the signal to be modeled requires more time delays than descriptive parameters. Therefore, the gamma filter, with a proper choice of the feedback parameter, can avoid overfitting through its decoupled memory structure. The tap weights can be updated using NLMS, so the computational complexity is of the same order as that of FIR filters. The feedback parameter μ can also be adapted from the data. However, instead of adaptively learning μ, we search for the best combination of K and μ using cross-validation. In the same way as in the previous section, the MSE on a validation set is computed for each pair (K_i, μ_j), where K_i ∈ {2, 3, ..., 10} (we ignore the case K = 1, which implements a memoryless process) and μ_j ∈ {0.1, 0.2, ..., 1.9}. The number of samples is 9,000 for training and 1,000 for validation. The contour of the validation MSE is shown in Fig. 4-5. The minimum MSE is achieved at (K, μ) = (4, 0.3) for food reaching and (K, μ) = (10, 1.2) for target reaching, respectively. The memory depth estimated by this empirical method becomes D ≈ 13 for the food reaching task and D = 12.5 for the target reaching task. The savings in the number of parameters are 60% (from 3,120 to 1,248) for the food reaching task.
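The memory-depth relation (4-8) is simple enough to express as a small helper (the function name is ours); for the cross-validated values reported above it gives D ≈ 13.3 for (K, μ) = (4, 0.3) and D = 12.5 for (K, μ) = (10, 1.2).

```python
def gamma_memory_depth(K, mu):
    """Memory depth of a K-tap gamma filter, per (4-8).

    D = K / mu        for mu <= 1
    D = K / (2 - mu)  for mu > 1
    """
    return K / mu if mu <= 1 else K / (2 - mu)
```

Note how μ trades memory depth against temporal resolution at a fixed filter order K, which is exactly the decoupling the text describes.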
It appears that the temporal resolution of the filter (R) for target reaching is larger than that for food reaching: R = 0.3 for food reaching and 0.8 for target reaching, respectively. This may indicate that the relatively irregular target reaching movement requires a finer temporal resolution.

Figure 4-5. The contour maps of the validation MSE computed at each grid point {K_i, μ_j} for (a) food reaching, and (b) target reaching. The darker lines denote lower MSE levels.

The generalization performance of the gamma filter with the optimized K and μ is evaluated on the novel test data. The empirical results show that the gamma filter exhibits slightly better performance than both the Wiener filter and the FIR filter trained by NLMS (see Chapter 6).

Regularization by Parameter Constraints

There have been numerous efforts in model selection to deal with the bias-variance dilemma [Gem92]. One of them is pruning, which seeks to eliminate unnecessary parameters by imposing constraints on the model parameter space (see [Ree93] for a review). Among many pruning techniques, weight decay has been widely used due to its simplicity and fair performance [Kro92]. Weight decay is based on an error cost function to which a penalty term on the parameters is added. This penalty restricts the L2-norm of the parameter vector and is balanced against the MSE cost by a regularization parameter. Although weight decay originated in the neural networks field, it shares the same cost function with a statistical method called ridge regression [Hoe70]. A difference is that ridge regression provides an analytical solution, whereas weight decay provides an iterative solution. Hence, understanding ridge regression may give us a better appreciation of weight decay. One of the interesting features of ridge regression is its link to subspace projection, especially PCA.
This feature lets us see in which directions of the input space ridge regression (or weight decay) prunes more. This property will be reviewed in more detail shortly. Ridge regression belongs to the class of shrinkage methods in statistical learning. As stated earlier, it employs the L2-norm penalty. However, recent studies in statistical learning have revealed that the L1-norm penalty sometimes provides a better shrinkage solution in many applications than the L2-norm penalty [Has01]. LASSO (Least Absolute Shrinkage and Selection Operator) has been a prominent algorithm [Tib96] among the L1-norm based shrinkage methods. However, its implementation is computationally complex. LAR (Least Angle Regression) has recently been proposed by Efron et al., providing a framework that incorporates LASSO and forward stagewise selection [Efr04]. With LAR, the computational complexity of the learning algorithm can be significantly reduced.

It is notable that we have already applied this class of regularization for BMIs using NLMS, since NLMS can be viewed as the solution to a constrained optimization problem [Hay96a]. In fact, the NLMS algorithm described in (3-12) is the solution to the following problem:

minimize ||w(n+1) − w(n)||²    (4-10)
subject to d(n) − w(n+1)^T x(n) = 0

for a given desired response d(n) and an input vector x(n) [Dou94]. It has also been shown in [Slo93] that NLMS can be viewed as the solution to the following optimization problem:

w(n+1) = arg min_w { (1/μ − 1)||x(n)||² ||w − w(n)||² + |d(n) − w^T x(n)|² },    (4-11)

where μ is the step size. In the NLMS algorithm, the weights are thus updated such that the change in the weight vector is minimized. The NLMS algorithm can therefore be viewed as the solution to an error minimization problem with constraints on the difference between successive weight updates. In this section, we review statistical shrinkage methods and their relationship with subspace projection. Then, the application of ridge regression and weight decay to BMIs is investigated.
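A one-step NLMS update makes the constrained-optimization view concrete: with μ = 1, the a posteriori error in the constraint of (4-10) vanishes, and the weight change is the minimum-norm one, lying along x(n). The sketch below is ours; the small ε guard against a zero-norm input is a common practical addition, not part of the text.

```python
import numpy as np

def nlms_step(w, x, d, mu=1.0, eps=1e-8):
    """One NLMS update: w(n+1) = w(n) + mu * e(n) * x(n) / ||x(n)||^2.

    With mu = 1 the updated weights satisfy d(n) - w(n+1)^T x(n) = 0
    (the constraint in (4-10)), and the correction w(n+1) - w(n) is the
    smallest-norm change achieving it, since it lies along x(n).
    """
    e = d - w @ x                         # a priori error
    return w + mu * e * x / (x @ x + eps)
```

Smaller μ trades off constraint satisfaction against a smaller weight change, which is the balance expressed by (4-11).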
Finally, the properties of the LAR algorithm and its application to BMIs are discussed.

Review of Shrinkage Methods

Here, we review the basic concepts of coefficient shrinkage methods. The link between subspace projection and shrinkage methods is then illustrated. Various shrinkage methods are finally illustrated both from a geometric view and within a Bayesian framework.

Shrinkage methods

Consider a constrained minimization problem, for a given input vector x and a desired output d, such that

ŵ = arg min_w E[(d − w^T x)²], subject to ||w||² ≤ t,    (4-12)

where w is a linear model parameter vector and ŵ is the optimal solution. This modeling technique is called ridge regression. When there are many correlated input variables in a linear model, the estimated weights can be poorly determined, with high variance. For instance, the effect of a large positive weight on one input variable can be canceled by a large negative weight on another, correlated input variable. If we restrict the size of the weights as in (4-12), such a problem can be effectively prevented. The other motivation for ridge regression is to make the input autocovariance matrix nonsingular even when it is not of full rank. Let X be an N x L input matrix in which each row represents an observation vector (x in equation 4-12), and let d be an N x 1 desired output vector. N indicates the number of observations, and L is the input dimension. We assume that each column of X is normalized to have zero mean. Then, the optimal solution ŵ in (4-12) given by ridge regression is

ŵ = (R + δI)^{-1} P,    (4-13)

where I is an L x L identity matrix, and R and P represent X^T X and X^T d, respectively. Notice that the matrix R + δI is invertible even if R is singular. We can obtain some insight into the properties of ridge regression through the singular value decomposition (SVD) of X.
The SVD of X is given by

X = UΛV^T,    (4-14)

where U and V are N x L and L x L orthogonal matrices, and Λ is an L x L diagonal matrix with diagonal entries λ_1 ≥ λ_2 ≥ ... ≥ λ_L ≥ 0, called the singular values. Then, the prediction outputs yielded by ridge regression can be written using the SVD as

Xŵ = X(X^T X + δI)^{-1} X^T d = UΛ(Λ² + δI)^{-1}ΛU^T d = Σ_i u_i [λ_i²/(λ_i² + δ)] u_i^T d,    (4-15)

where u_i is the ith column of U. From (4-15), we can see that ridge regression finds the coordinates of d with respect to each orthonormal basis vector u_i, and then shrinks each coordinate by λ_i²/(λ_i² + δ) (δ > 0). Therefore, a coordinate with a smaller λ_i will be shrunk more. It is easy to show that the singular values λ_i indicate the variances of the principal components of X [Has01]. Hence, a smaller singular value corresponds to a direction of smaller variance, which is shrunk more by ridge regression.

Now we consider LASSO as an L1-norm based shrinkage method. The fundamental difference between ridge regression and LASSO is the penalty in the cost function:

w_LASSO = arg min_w E[(d − w^T x)²], subject to Σ_{i=1}^{L} |w_i| ≤ t.    (4-16)
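The shrinkage factors λ_i²/(λ_i² + δ) in (4-15) can be exposed directly by computing the ridge solution through the SVD. The following sketch (function and variable names are ours) agrees with the direct solution (4-13).

```python
import numpy as np

def ridge_fit_svd(X, d, delta):
    """Ridge solution via the SVD, exposing the shrinkage factors of (4-15).

    Returns (w, shrink), where shrink[i] = lam_i^2 / (lam_i^2 + delta)
    is the factor applied to the coordinate of d along the basis u_i.
    """
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    shrink = lam**2 / (lam**2 + delta)
    # w = V diag(lam / (lam^2 + delta)) U^T d, equivalent to (R + delta*I)^-1 P
    w = Vt.T @ ((lam / (lam**2 + delta)) * (U.T @ d))
    return w, shrink
```

Since NumPy returns the singular values in descending order, the returned shrink factors are descending as well: the low-variance directions, listed last, are the ones shrunk most.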