FROM CORTICAL NEURAL SPIKE TRAINS TO BEHAVIOR: MODELING AND ANALYSIS

By

JUSTIN CORT SANCHEZ

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2004

Copyright 2004 by Justin Cort Sanchez

This dissertation is dedicated to my father. His sense of practicality and his love of biology have inspired me to become the Biomedical Engineer that I am today.

ACKNOWLEDGMENTS

The path to achieving a doctorate is like a rollercoaster ride. Some days you are up, some days you are down, and other times you just feel like screaming. All along the ride, though, my wife Karen has been strapped in the seat right next to me. Never in my life has a person brought such a sense of calmness and balance. Not only does she make everything better, but she always brings out the best in me.

Operating this rollercoaster ride were my committee members. They made sure that I was safely strapped in and prepared to ride. When I started my graduate studies on Brain-Machine Interfaces, I knew nothing about signal processing. With their infinite patience and guidance, I was given the opportunity to grow as a person and as a researcher. The chair of my committee, Dr. Jose Principe, inspired me to think big and sent me all over the world to meet the leaders in the field. I hope that one day I will be able to give back the gifts and opportunities that my committee members have so graciously given me.

I would like to thank my family for giving me the financial and emotional support to ride this rollercoaster. They have always believed in me. They have always supported me. They have sacrificed themselves for me. They taught me that when everything is said and done, your family will always be there for you. Along for the ride were Deniz, Ken, and Yadu, who cheered me on along the way.
There is a saying that you are only as good as the people around you, and I believe that all of them made me a better person, both in terms of friendship and signal processing. I will never forget the discussions we had during our 14-hour ride through the Canadian Rockies. I thank them for all of their help and insight over the years. I also cannot forget Phil, who was involved in this research from the beginning; Scott Morrison, for the long hours with the DSP; and Shalom, for always adding a bit of humor to all of the smoke and mirrors. Last but not least, I need to thank Brian Whitten. Throughout our 12-year friendship in Tampa and Gainesville, we always thought that we would end up as rock stars. We had some great times getting away from work and playing shows at all of the smoky bars in Florida. In the end, though, he became a Veterinarian and I became a Biomedical Engineer; how unlikely!

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  Historical Overview of BMI Modeling Approaches
  Foundations of Neuronal Recordings
    Characteristics of Neuronal Activity
    Local Neuronal Correlations

2 DUKE BRAIN-MACHINE INTERFACE PARADIGM
  Neuronal Recording Methodology
  Behavioral Experiments

3 MODELING PROBLEM
  What Mapping Does the Model Have to Find?
  Signal Processing Approaches to Modeling
    White Box
    Gray Box
      Population vector algorithm
      Todorov's mechanistic model
      Implementation of "gray box" models
    Black Box
      Finite impulse response filter
      Time-delay neural network
      Recurrent multilayer perceptron
  Development and Testing of BMI Models
    Reaching Task Performance
      Topology and training complexity comparisons
      Regularization, weight decay, and cross validation
      Performance metrics
      Test set performance
    Cursor Control Task
  Discussion

4 RMLP MODEL GENERALIZATION
  Motivation for Studying the RMLP
  Motivation for Quantifying Model Generalization
  Multi-Session Model Generalization
  Multi-Task Model Generalization
  Discussion

5 ANALYSIS OF THE NEURAL-TO-MOTOR REPRESENTATION SPACE CONSTRUCTED BY THE RMLP
  Introduction
  Understanding the RMLP Mapping Network Organization
    Understanding the Mapping
    Input Layer Weights
    Output Layer Weights
  Cursor Control Mapping
  Discussion

6 INTERPRETING CORTICAL CONTRIBUTIONS THROUGH TRAINED BMI MODELS
  Introduction
  Cortices Involved in Hand Reaching
    Belle's Cortical Contributions
    Carmen's Cortical Contributions
  Cortices Involved in Cursor Tracking
  Discussion

7 ASCERTAINING THE IMPORTANCE OF NEURONS
  Introduction
  Assumptions for Ranking the Importance of a Neuron
  Sensitivity Analysis for Reaching Tasks
  Cellular Importance for Cursor Control Tasks
    Model-Dependent Sensitivity Analysis
    Model-Independent Cellular Tuning Analysis
  Relative Ranking of Neurons
  Model Performance with a Subset of Sensitive Cells
  Implications of Sensitivity Analysis for Model Generalization
  Discussion

8 CONCLUSIONS AND FUTURE DIRECTIONS
  Summary of Contributions
  Difficulties Encountered in BMI Research
  Future Directions
  Final Thoughts

APPENDIX

A TRAINING THE RMLP
  Introduction
  Topology and Trajectory Learning
  Monte Carlo Simulations

B EVALUATION OF THE RMLP MODELING PERFORMANCE USING SPIKE-SORTED AND NON-SPIKE-SORTED NEURONAL FIRING PATTERNS
  Introduction
  Data Preparation
  Simulations
  Discussion and Conclusion

C MODEL EXCITATION WITH RANDOM INPUTS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1. Neuronal activity for a 25-minute recording session
2-1. Assignment of electrode arrays to cortical regions for owl monkeys
2-2. Assignment of electrode arrays to cortical regions for Rhesus monkeys
3-1. Model parameters
3-2. Model computational complexity
3-3. Reaching task testing CC and SER (Belle)
3-4. Reaching task testing CC and SER (Carmen)
3-5. Reaching task testing CC and SER (Ivy)
3-6. Reaching task testing CC and SER (Aurora)
4-1. Scenario 1: Significant decreases in correlation between sessions
4-2. Scenario 2: Significant increases in correlation between sessions
4-3. Multi-task testing CC and SER values
4-4. Significance of CC compared to FIR filter
6-1. Summary of cortical assignments
7-1. The 10 top-ranked cells
7-2. Test set correlation coefficients using the full ensemble
7-3. Test set correlation coefficients using the 10 most important neurons
A-1. RMLP performance as a function of the number of hidden PEs
A-2. Average testing correlation coefficients as a function of trajectory length
A-3. Training performance for 100 Monte Carlo simulations
A-4. Best performing network
B-1. Testing SER and CC values
C-1. Model performance using random vs. real neuronal activity

LIST OF FIGURES

1-1. Conceptual drawing of BMI components
1-2. The spike-binning process. A) Cellular potentials, B) A spike train, C) Bin count for a single cell, D) An ensemble of bin counts
1-3. Time-varying statistics of neuronal recordings for two behaviors
1-4. Local neuronal correlations time-synchronized with hand position and velocity
2-1. Research components of the Duke-DARPA Brain-Machine Interface Project
2-2. Chronic implant distributed over six distinct cortical areas of a Rhesus monkey
2-3. Reaching-task experimental setup and a representative 3-D hand trajectory
2-4. Using a joystick the monkey controlled the cursor to intersect the target (Task 2) and to grasp a virtual object by applying a gripping force indicated by the rings (Task 3)
3-1. Kalman filter block diagram
3-2. FIR filter topology. Each neuronal input s^N contains a tap-delay line with I taps
3-3. Time-delay neural network topology
3-4. Fully connected, state recurrent neural network
3-5. Reaching movement trajectory
3-6. Testing performance for three reaching movements (Belle)
3-7. Reaching task testing CEM (Belle)
3-8. Testing performance for three reaching movements (Carmen)
3-9. Reaching task testing CEM (Carmen)
3-10. Testing performance for three reaching movements (Ivy)
3-11. Testing performance for three reaching movements (Aurora)
3-12. Reaching task testing CEM (Ivy)
3-13. Reaching task testing CEM (Aurora)
4-1. Two scenarios for data preparation
4-2. Scenario 1: Testing correlation coefficients for HP, HV, and GF
4-3. Scenario 2: Testing correlation coefficients for HP, HV, and GF
4-4. Multi-task model training trajectories
4-5. CEM curves for linear and nonlinear models trained on a multi-task
4-6. Multi-task testing trajectories centered upon a transition between the tasks
5-1. Preactivity and activity in a RMLP with one hidden PE
5-2. Operating points on hidden layer nonlinearity
5-3. Input layer decomposition into W1s(t) (solid) and Wfy1(t-1) (dashed)
5-4. Norm of the input vector (104 neurons)
5-5. Angle between s(t) and W1. Direction cosines for successive input vectors s(t) and s(t-1)
5-6. Selection of neurons contributing the most to input vector rotation
5-7. Output layer weight vector direction for one PE
5-8. Movement trajectory with superimposed output weight vectors (solid) and principal components (dashed). This view is in the direction of PC3
5-9. RMLP network decomposition for the cursor control task
6-1. One movement segmented into rest/food, food/mouth, and mouth/rest motions
6-2. FIR filter (Aurora): Testing output X, Y, and Z trajectories (bold) for one desired movement (light) from fifteen Wiener filters trained with neuronal firing counts from all combinations of four cortical areas
6-3. RMLP (Aurora): Testing output X, Y, and Z trajectories (bold) for one desired movement (light) from fifteen RMLPs trained with neuronal firing counts from all combinations of four cortical areas
6-4. RMLP (Carmen): Testing output X, Y, and Z trajectories (bold) for one desired movement (light) from three RMLPs trained with neuronal firing counts from all combinations of two cortical areas
6-5. RMLP (Aurora): Testing outputs (o markers) and desired positions (x markers) for six models trained with each separate cortical input. Testing (X, Y) correlation coefficients are provided in the title of each subplot
6-6. RMLP (Ivy): Testing outputs (o markers) and desired positions (x markers) for four models trained with each separate cortical input. Testing (X, Y) correlation coefficients are provided in the title of each subplot
7-1. Sensitivity at time t for a typical neuron as a function of Δ
7-2. RMLP time-varying sensitivity. A) X, Y, Z desired trajectories for three similar movements. B) Neuronal firing counts summed (at each time bin) over 104 neurons. C) Sensitivity (averaged over 104 neurons) for three coordinate directions
7-3. Reaching task neuronal sensitivities sorted from minimum to maximum for a movement. The ten highest sensitivities are labeled with the corresponding neuron
7-4. Testing outputs for RMLP models trained with subsets of neurons. A, B, C) X, Y, and Z trajectories (bold) for one movement (light) from three RMLPs trained with the highest, intermediate, and lowest sensitivity neurons. D) CEM decreases as sensitive neurons are dropped
7-5. Belle's neuronal firing counts from the ten highest and lowest sensitivity neurons time-synchronized with the trajectory of one reaching movement
7-6. Sensitivity-based neuronal ranking for hand position and velocity for two sessions using a RMLP. The cortical areas corresponding to the ten highest ranking HP, HV, and GF neurons are given by the colormap
7-7. Neuronal ranking based upon the depth of the tuning for each cell. The cortical areas corresponding to the most sharply tuned cells are given by the colormap
7-8. A-D) Cellular tuning curves for hand direction and gripping force using a model-independent ranking method, E-F) Tuning curves for two representative cells from plots B) and D)
7-9. Scatter plot of neuron ranking for tuning depth and sensitivity analysis
7-10. Model performance as a function of the number of cells utilized. Cells were removed from the analysis either by randomly dropping cells (neuron dropping, ND) or by removing the least sensitive cell in an ordered list (computed from a sensitivity analysis, SA)
A-1. RMLP learning curve. MSE (upper curve), cross-validation (lower curve)
A-2. Training MSE curves for 100 Monte Carlo simulations
B-1. Signal-to-error ratio between the actual and estimated hand coordinates for SS and NSS data
B-2. Peaks of hand trajectory (Z coordinate)
B-3. Estimation errors for six peaks. Targets are represented by an x at the origins. The average error (mm) in each direction is displayed on the respective axis
C-1. Performance measures for five Monte Carlo simulations using neuronal and random data

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

FROM CORTICAL NEURAL SPIKE TRAINS TO BEHAVIOR: MODELING AND ANALYSIS

By Justin Cort Sanchez
May 2004
Chair: Jose C. Principe
Major Department: Biomedical Engineering

Brain-machine interface (BMI) design can be achieved by training linear and nonlinear models with simultaneously recorded cortical neural activity and goal-directed behavior. Real-time implementation of this technology requires reliable and accurate signal processing models that produce small error variance in the estimated kinematic trajectories. In this dissertation, the mapping performance and generalization of a recurrent multilayer perceptron (RMLP) is compared with standard linear and nonlinear signal processing models for two species of primates and two behavioral tasks. Each modeling approach is shown to have strengths and weaknesses that are compared experimentally. The RMLP approach shows very accurate peak amplitude estimations with small error variance using a parsimonious model topology. To validate and advance the state of the art of this BMI modeling design, it is necessary to understand how the proposed model represents the neural-to-motor mappings. The RMLP is analyzed here, and an interpretation of the neural-to-motor solution of this network is built by tracing the signals through the topology using signal processing concepts. We then propose the use of optimized BMI models for analyzing neural activity to assess the role and importance of individual neurons and cortical areas in generating the performed movement.
It is further shown that by pruning the initial ensemble of neural inputs with the ranked importance of cells, a reduced set of cells can be found that exceeds the BMI performance levels of the full ensemble.

CHAPTER 1
INTRODUCTION

Throughout a lifetime, it is not often that one has the opportunity to be a part of a revolution. However, when the time arises, it is often an unexpected, challenging, and turbulent time because details about the future remain unknown. For example, history has seen political, ethical, and scientific revolutions in which nations were born, lives were lost, and lifestyles were changed. Each of these events bestowed on the revolutionists an opportunity to change their lives by learning about the core of their beliefs, which enabled them to expand their horizons. Acting upon the new opportunities, the activists embraced the new perspectives, which undoubtedly made the future more clear. Presently we are in the midst of a technological revolution. Our lives are being transformed by worldwide high-speed communications and digital technology that allows for instant gratification in the exchange of ideas and experiences. Interaction with digital computers is the means through which this revolution is taking place. As time passes and our scientific abilities develop, it remains unknown how deeply cultures will embrace this technology to express their will, and share their ideas and experiences. We must ask ourselves how and when the line between man and machine will blur or even vanish. How can we prepare for this merger? What are the scientific, ethical, and engineering challenges that must be overcome for such a change? What opportunities should be seized now that will shape our future?
Recently, several landmark experimental paradigms have begun to blur the line between man and machine by showing the feasibility of using neuroprosthetic devices to restore motor function and control in individuals who are "locked in" or who have lost the ability to control the movement of their limbs [1-19]. In these experiments, researchers seek both to rehabilitate and to augment the performance of neural-motor systems using Brain-Machine Interfaces (BMIs) that directly transfer the intent of the individual (as collected from the brain cortex) into control commands for prosthetic limbs and computers. Brain-Machine Interface research has been motivated by the need to help the more than 200,000 individuals in the U.S. suffering from a wide variety of neurological disorders that include spinal cord injury and diseases of the peripheral nervous system [20]. While the symptoms and causes of these disabilities are diverse, one characteristic is common to many of these neurologic conditions: normal functioning of the brain remains intact. If the brain is spared from injury and control signals can be extracted, the BMI problem becomes one of finding optimal signal processing techniques to efficiently and accurately convert these signals into operative control commands.

Figure 1-1. Conceptual drawing of BMI components.

A conceptual drawing of a BMI is depicted in Fig. 1-1, where neural activity from hundreds of cells is recorded (step 1), conditioned (step 2), and translated (step 3) directly into hand position (HP), hand velocity (HV), and hand gripping force (GF) of a prosthetic arm, or cursor control for a computer. Our study focused on step 3 of the diagram, where optimal signal processing techniques were used to find the functional relationship between neuronal activity and behavior.
From an optimal signal processing viewpoint, BMI modeling in step 3 is a challenging task because of several factors: the intrinsically partial access to motor cortex information due to subsampling of the neural activity, the unknown aspects of neural coding, the huge dimensionality of the problem, and the need for real-time signal processing algorithms. The problem is further complicated by the need for good generalization in nonstationary environments, which depends on model topologies, fitting criteria, and training algorithms. Finally, reconstruction accuracy must be assessed, since it is linked to the choice of linear vs. nonlinear and feedforward vs. feedback models. Since the basic biological and engineering challenges associated with optimal signal processing for BMI experiments require a highly interdisciplinary knowledge base involving neuroscience, electrical and computer engineering, and biomechanics, the BMI modeling problem is introduced in several steps. First, an overview of the pioneering modeling approaches gives the reader a deeper understanding of what has been accomplished in this area of research. Second, the reader is familiarized with the characteristics of the neural recordings used in signal processing methods.

Historical Overview of BMI Modeling Approaches

The foundations of BMI research were laid in the early 1980s by E. M. Schmidt [1], who was interested in finding out whether it was possible to use neural recordings from the motor cortex of a primate to control external devices. In this pioneering work, Schmidt measured how well primates could be conditioned to modulate the firing patterns of single cortical cells using a series of eight target lamps, each symbolizing a cellular firing rate that the primate was required to produce. The study confirmed that a primate could intentionally match the target firing rates, and it also estimated the information transfer rate of the neural recordings to be half that of using the intact motor system as the output.
With this result, Schmidt proposed that engineered interfaces could be designed to use modulations of neural firing rates as control signals. Shortly after Schmidt published his results, Georgopoulos and Schwartz presented a theory for neural population coding of hand kinematics, as well as a method for reconstructing hand trajectories called the population vector algorithm (PVA) [7]. Using center-out reaching tasks, Georgopoulos proposed that each cell in the motor cortex has a "preferred hand direction" for which it fires maximally, and that the distribution of cellular firing over a range of movement directions can be characterized by a simple cosine function [5]. In this theory, arm movements were shown to be constructed by a population "voting" process among the cells: each cell makes a vectorial contribution to the overall movement in its preferred direction, with magnitude proportional to the cell's average firing rate [21]. Schmidt's proof of concept and Georgopoulos' BMI application to reaching tasks spawned a variety of studies implementing "out of the box" signal processing modeling approaches. One of the most notable studies, by Chapin et al. [3], showed that a Jordan-style recurrent neural network could be used to translate the neural activity (21 to 46 neurons) of rats trained to obtain water by pressing a lever with a paw. The usefulness of this BMI was shown when the animals routinely stopped physically moving their limbs to obtain the water reward. Also in the neural network class, Lin et al. [22] used a self-organizing map (SOM) that clustered neurons with similar firing patterns, which then indicated movement directions for a spiral drawing task. Borrowing from control theory, Kalaska and Scott [23] proposed the use of forward and inverse control architectures for reaching movements.
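The cosine-tuning and population-voting ideas behind the PVA can be illustrated with a small numerical sketch. This is not the published algorithm applied to real data: the cell count, baseline rate, and modulation depth below are hypothetical, and the ensemble is idealized (noiseless cosine tuning with evenly spaced preferred directions, which makes the decoded angle exact).

```python
import numpy as np

# Toy illustration of the population vector algorithm (PVA).
# All numbers are hypothetical; real cells are noisy and unevenly tuned.
n_cells = 50
preferred = np.linspace(0.0, 2 * np.pi, n_cells, endpoint=False)  # preferred angles (rad)
baseline, modulation = 10.0, 8.0                                  # firing-rate model (Hz)

def firing_rates(theta):
    """Cosine tuning: a cell fires maximally when the movement direction
    theta aligns with its preferred direction."""
    return baseline + modulation * np.cos(theta - preferred)

def population_vector(rates):
    """Each cell 'votes' along its preferred direction, weighted by how far
    its rate deviates from baseline; the vector sum gives the movement angle."""
    w = rates - baseline
    x = np.sum(w * np.cos(preferred))
    y = np.sum(w * np.sin(preferred))
    return np.arctan2(y, x)

true_angle = np.pi / 3                        # a 60-degree center-out reach
decoded = population_vector(firing_rates(true_angle))
print(np.degrees(decoded))                    # recovers ~60 degrees
```

With evenly spaced preferred directions the cross terms in the vector sum cancel, so the population vote recovers the movement angle exactly; with a finite, unevenly sampled, noisy ensemble the estimate is only approximate, which is part of what later chapters' model comparisons address.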
Also during this period, other researchers presented interpretations of population coding, including a probability-based population coding from Sanger [24] and muscle-based cellular tuning from Mussa-Ivaldi [25]. Almost 20 years after Schmidt and Georgopoulos' initial experiments, Nicolelis and colleagues [19] presented the next major advancement in BMIs by demonstrating a real (nonportable) neuroprosthetic device in which the neuronal activity of a primate was used to control a robotic arm. This research group hypothesized that the information needed for the BMI is distributed across several cortices, and therefore neuronal activity was collected from 100 cells in multiple cortical areas (premotor, primary motor, and posterior parietal) while the primate performed a 3-D feeding (reaching) task. Linear and nonlinear signal processing techniques, including a frequency-domain Wiener filter (WF) and a time-delay neural network (TDNN), were used to estimate hand position. Trajectory estimates were then transferred via the Internet to a local robot and to a robot located at another university. In parallel with the work of Nicolelis, Donoghue and colleagues [16] presented a contrasting view of BMIs by showing that a 2-D computer cursor control task could be achieved using only a few neurons (between 7 and 30) located only in the primary motor cortex of a primate. The Wiener filter signal processing methodology was again implemented here; however, this paradigm was closed-loop, since the primate received instant visual feedback from the cursor position output by the WF. The novelty of their experiment was the primate's opportunity to incorporate the signal processing model into its motor processing. The final BMI approach presented is from the Andersen research group [26], who showed that the end point of hand reaching can be estimated using a Bayesian probabilistic method.
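The Wiener filter that recurs in these studies is, in its time-domain form, a least-squares fit of the kinematics onto a tap-delay embedding of the binned firing counts. The sketch below illustrates that idea on purely synthetic data; the bin counts, tap count, and noise level are invented for illustration, and the cited work used real recordings (and, in [19], a frequency-domain implementation).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for binned firing counts: T time bins x N cells.
T, N, L = 2000, 10, 10                  # bins, cells, taps (1 s at 100 ms bins)
spikes = rng.poisson(1.0, (T, N)).astype(float)

def embed(x, L):
    """Tap-delay embedding: concatenate L delayed copies of every channel,
    so row t holds the most recent L bins of activity for all cells."""
    T, N = x.shape
    X = np.empty((T - L + 1, N * L))
    for tau in range(L):
        X[:, tau * N:(tau + 1) * N] = x[L - 1 - tau:T - tau]
    return X

X = embed(spikes, L)
# Fabricate a 2-D hand trajectory as a fixed linear mix of spike history.
true_w = rng.normal(0.0, 0.1, (N * L, 2))
hand = X @ true_w + rng.normal(0.0, 0.05, (X.shape[0], 2))

# Wiener solution: least-squares weights mapping embedded spikes to kinematics.
w, *_ = np.linalg.lstsq(X, hand, rcond=None)
pred = X @ w
cc = np.corrcoef(pred[:, 0], hand[:, 0])[0, 1]
print(round(cc, 3))                     # high correlation on this synthetic data
```

Because the synthetic trajectory really is a linear function of spike history, the fit is nearly perfect; on real recordings the same filter achieves only moderate correlation coefficients, which is the performance gap the models in Chapter 3 are compared against.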
Neural recordings were taken from the Parietal Reach Region (PRR), since this region is believed to encode the planning and target of hand motor tasks. Using this hypothesis, they devised a paradigm where a primate was cued to move its hand to a rectangular grid of target locations presented on a computer screen. The neural-to-motor translation involves computing the likelihood of the neural activity given a particular target. While this technique has been shown to accurately predict the end point of hand reaching, it differs from the aforementioned techniques by not accounting for the hand trajectory.

Foundations of Neuronal Recordings

One of the most important steps in implementing an optimal signal processing technique for any application is data analysis. Optimality in the signal processing technique implies that the a priori information about the statistics of the data matches the a priori information used in designing the signal processing technique [27]. In the case of BMIs, the statistical properties of the neural recordings and the analysis of neural ensemble data are not fully understood. Hence, our lack of information means that the neural-to-motor translation is not guaranteed to be optimum. Despite this reality, by developing new neuronal data-analysis techniques, the match between neural recordings and BMI design can be improved [28, 29]. For this reason, it is important for the reader to be familiar with the characteristics of the neural recordings that would be encountered.

Characteristics of Neuronal Activity

The process of extracting signals from the motor, premotor, and parietal cortices of a behaving animal involves the implantation of subdural microwire electrode arrays into the brain tissue (usually layer V) [19]. At this point, the reader should be aware that the scope of sampling in current BMI studies involves tens to hundreds of cortical cells recorded from tissue that is estimated to contain 10¹¹ neurons, 10¹⁴ synapses, and 10¹⁰ cortical circuits [30].
Each microwire measures the potentials (action potentials) resulting from ionic current exchanges across the membranes of neurons locally surrounding the electrode. Typical cellular potentials (Fig. 1-2A) have magnitudes ranging from hundreds of microvolts to tens of millivolts, and durations from tenths of a millisecond to a couple of milliseconds [29]. Since action potentials are so short in duration, it is common to treat them as point processes, where the continuous voltage waveform is converted into a series of timestamps indicating the instants in time when the spikes occurred. Using the timestamps, a series of pulses or spikes (zeros or ones) can be used to visualize the activity of each neuron; this time series (Fig. 1-2B) is referred to as a spike train. The spike trains of neural ensembles have several properties, including sparsity, nonstationarity, and discrete (non-continuous) values. While the statistical properties of neural recordings can vary depending on the sampled area, the animal, and the behavioral paradigm, in general, spike trains can be estimated to have a Poisson distribution [28]. To reduce the sparsity in the neuronal recordings, a method of binning is used to count the number of spikes in 100 ms non-overlapping windows, as shown in Fig. 1-2C. This method greatly reduces the number of zeros in the digitized time series and also provides a time-to-amplitude conversion of the firing events. Even with the binning procedure, the data remains extremely sparse.

Figure 1-2. The spike-binning process. A) Cellular potentials, B) A spike train, C) Bin count for a single cell, D) An ensemble of bin counts.

To assess the degree of sparsity and nonstationarity in BMI data, we used observations from a 25-minute BMI experiment from the Nicolelis lab at Duke University (Table 1-1). The table shows that the percentage of zeros can be as high as 80%.
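The timestamp-to-bin-count conversion described above can be sketched as follows. This is a minimal illustration with a toy spike train; the function name and data are ours, not from the recording software:

```python
import numpy as np

def bin_spike_times(spike_times_s, duration_s, bin_width_s=0.1):
    """Convert one neuron's spike timestamps (in seconds) into counts
    over non-overlapping windows (100 ms by default, as in the text)."""
    n_bins = int(np.ceil(duration_s / bin_width_s))
    edges = np.arange(n_bins + 1) * bin_width_s
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return counts

# Toy example: a sparse spike train over 1 second.
spikes = np.array([0.012, 0.015, 0.350, 0.351, 0.352, 0.980])
counts = bin_spike_times(spikes, duration_s=1.0)
print(counts)                 # 10 bins of 100 ms each
print(np.mean(counts == 0))   # fraction of empty bins (the sparsity measure)
```

Note that binning turns the point-process timestamps into an integer-valued time series, which is exactly the time-to-amplitude conversion mentioned above; the fraction of empty bins corresponds to the "percentage of zeros" reported in Table 1-1.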
Next, an ensemble average of the firing rate per minute is computed using non-overlapping 1-minute windows averaged across all cells. The ensemble of cells used in this analysis primarily contains low firing rates, as indicated by the small ensemble average. Additionally, we can see the time variability of the 1-minute ensemble average, given by the associated standard deviation.

Figure 1-3. Time-varying statistics of neuronal recordings for two behaviors (reaching task and cursor control)

The important message of this analysis is that standard optimal signal processing techniques (Wiener filters, neural networks, etc.) were not designed for data that is nonstationary and discrete valued. In Figure 1-3, the average firing rate of the ensemble (computed in non-overlapping 60-second windows) is tracked for a 38-minute session. From minute to minute, the mean value of the firing rate can change drastically depending on the movement being performed. Ideally, we would like our optimal signal processing techniques to capture the changes observed in Fig. 1-3. However, the reader should be aware that in the environment of this dataset, applying any of the "out of the box" signal processing techniques means that the neural-to-motor mapping is not optimal. More importantly, any performance evaluations and model interpretations drawn by the experimenter can be directly linked to, and biased by, the mismatch between the data and the model type.

Table 1-1. Neuronal activity for a 25-minute recording session
                                      Percentage of Zeros   Average Firing Rate (spikes/cell/minute)
3D Reaching Task (104 cells)          86                    0.25 ± 0.03
2D Cursor Control Task (185 cells)    60                    0.69 ± 0.02

Local Neuronal Correlations

Another method to describe the nonstationary nature of the data is by computing local neuronal correlations.
In this scheme, we attempt to observe information contained in cell assemblies, or subsets of neurons in each cortical area that mutually excite each other [31]. We attempt to extract useful information in the local bursting activity of cells in the ensemble by analyzing the local correlations among the cells. To identify and extract this local activity, the cross-correlation function in Eq. 1-1 was used to quantify synchronized firing among cells in the ensemble. In this case, small sliding (overlapping) windows of data are defined by A(t), which is a matrix containing L delayed versions of the firing patterns from 180 cells. At each time tick t in the simulation, the cross-correlation between neurons i and j for all delays l is computed. Since we are interested in picking only the most strongly synchronized bursting neurons in the local window, we simply average over the delays in Eq. 1-2. We define C_j as a vector representing how well correlated the activity of neuron j is with the rest of the neuronal ensemble. Next, a single 180x1 vector at each time tick t is obtained in Eq. 1-3 by summing the cellular assembly cross-correlations only within each sampled cortical area, M1, PMd, SMA, and S1.

C(t)_{i,j,l} = E[A(t)_i A(t - l)_j]    (1-1)

C(t)_{i,j} = (1/L) sum_{l=1}^{L} C(t)_{i,j,l}    (1-2)

c(t)_j = sum_{i in cortex} C(t)_{i,j}    (1-3)

The neural correlation measure in Eq. 1-3 was then used as a real-time marker of neural activity corresponding to segments of movements of a hand trajectory. In Fig. 1-4, highly correlated neuronal activity is shown to be time-varying in synchrony with changes in movement direction, as indicated by the vertical dashed line. Figure 1-4 shows that the firing rates contained in the data are highly variable even for similar movements. Using "off the shelf" signal processing modeling approaches, it is difficult to capture this local variability in time, since many of the models use inner product operations that average the ensemble's activity.
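One possible reading of Eqs. 1-1 to 1-3 can be sketched as follows. The function name, the area assignments, and the toy data are illustrative; the dissertation's exact windowing and normalization may differ:

```python
import numpy as np

def local_correlation(window, areas, n_delays=5):
    """Sketch of Eqs. 1-1 to 1-3. 'window' is a (cells x time) array of bin
    counts for one short sliding window; 'areas' maps an area name to the
    list of cell indices recorded there. Returns c(t): one synchrony value
    per cell, obtained by summing correlations only within its own area."""
    n_cells, n_time = window.shape
    C = np.zeros((n_cells, n_cells))
    for l in range(1, n_delays + 1):
        a, b = window[:, l:], window[:, :-l]
        C += (a @ b.T) / (n_time - l)   # Eq. 1-1: sample estimate of E[A(t) A(t-l)]
    C /= n_delays                       # Eq. 1-2: average over the delays
    Cj = C.mean(axis=0)                 # correlation of cell j with the ensemble
    c = np.zeros(n_cells)
    for idx in areas.values():          # Eq. 1-3: sum within each cortical area
        c[idx] = Cj[idx].sum()
    return c

# Toy usage: 6 cells split over two areas, 50 bins of data.
rng = np.random.default_rng(1)
window = rng.poisson(1.0, size=(6, 50)).astype(float)
c = local_correlation(window, {"M1": [0, 1, 2], "PMd": [3, 4, 5]})
print(c)
```

Sliding this computation over overlapping windows yields the time-varying synchrony marker discussed above.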
Figure 1-4. Local neuronal correlations time-synchronized with hand position and velocity

CHAPTER 2
DUKE BRAIN-MACHINE INTERFACE PARADIGM

The ultimate goal for modeling and analyzing the relationship from cortical neuronal spike trains to behavior is to develop components of a real closed-loop BMI system. The envisioned multi-component, multi-university experimental paradigm was developed for primates by Miguel Nicolelis at Duke University (Fig. 2-1). This novel research paradigm seeks to collect neuronal activity from multiple cortices and translate the encoded information into commands for a robotic arm that the primate can see and react to. The hope is that visual feedback and somatosensory stimulation from the movement of the robotic arm will allow the primate to incorporate the mechanical device into its own cognitive space. It remains unknown how neural control of a mechanical device will impact the normal neurophysiology of the primate. Modeling and analysis of neuronal and behavioral recordings bridges the gap between component 1 and component 5 of the diagram, and provides an intersection for studying motor neurophysiology. Here we provide a description of the neuronal recording techniques and behavioral paradigms. Data collected from these experiments serve as inputs and desired responses for the models in the remainder of this dissertation.

Neuronal Recording Methodology

Data for the following experiments were generated in the primate laboratory at Duke University, and neuronal recordings were collected from four adult, female primates: two owl monkeys (Aotus trivirgatus), Belle and Carmen, and two Rhesus monkeys (Macaca mulatta), Aurora and Ivy.
All four animals were instrumented with several high-density microelectrode arrays, each consisting of up to 128 microwires (30 to 50 µm in diameter, spaced 300 µm apart) distributed in a 16 x 8 matrix. Each recording site occupies a total area of 15.7 mm² (5.6 x 2.8 mm) and is capable of recording up to four single cells from each microwire, for a total of 512 neurons (4 x 128).

Figure 2-1. Research components of the Duke-DARPA Brain-Machine Interface Project

Through a series of small craniotomies, electrode arrays were implanted stereotaxically in seven cortical neural structures that are involved in controlling fine arm and hand movements. The following cortical areas and arm/hand neural structures were targeted using neuroanatomical atlases and intraoperative stimulation: the anterior parietal cortex (areas 3a, 1, 2, 5); area 7a of the posterior parietal cortex (this area receives both visual and tactile inputs and projects to the premotor cortex); the primary motor (M1) cortex; the dorsolateral premotor area (PMd); the supplementary motor area (SMA); and the primary somatosensory cortex (S1) [19, 32]. The assignment of electrode arrays and the sampled cortical area for each primate are shown in Tables 2-1 and 2-2. The number of cells contained in each session's (S1, S2) recording is also included. Notice that the number of cells in each session can either increase or decrease.
For Belle, Aurora, and Ivy, the time interval between sessions was 1 day; in the case of Carmen, only one experiment could be obtained. Figure 2-2 shows a fully instrumented primate that has microwire arrays implanted into three cortical areas in each hemisphere.

Table 2-1. Assignment of electrode arrays to cortical regions for owl monkeys
Belle (right handed):
  Area 1: Posterior Parietal (PP contra), S1: 33 cells, S2: 20 cells
  Area 2: Primary Motor (M1 contra), S1: 21 cells, S2: 29 cells
  Area 3: Dorsal Premotor (PMd contra), S1: 27 cells, S2: 19 cells
  Area 4: Primary Motor & Dorsal Premotor (M1/PMd ipsi), S1: 23 cells, S2: 23 cells
Carmen (right handed):
  Area 2: Primary Motor (M1 contra), S1: 37 cells
  Area 3: Dorsal Premotor (PMd contra), S1: 17 cells

Table 2-2. Assignment of electrode arrays to cortical regions for Rhesus monkeys
Aurora (left handed):
  Area 2: Primary Motor (M1 contra), S1: 56 cells, S2: 57 cells
  Area 4: Dorsal Premotor (PMd contra), S1: 64 cells, S2: 66 cells
  Area 5: Somatosensory (S1 contra), S1: 39 cells, S2: 38 cells
  Area 6: Supp. Motor Associative (SMA contra), S1: 19 cells, S2: 19 cells
  Area 7: Primary Motor (M1 ipsi), S1: 5 cells, S2: 5 cells
Ivy (left handed):
  Area 1: Posterior Parietal (PP contra), S1: 49 cells, S2: 50 cells
  Area 2: Primary Motor (M1 contra), S1: 90 cells, S2: 97 cells
  Area 6: Supp. Motor Associative (SMA contra), S1: 53 cells, S2: 58 cells

On recovering from the surgical procedure, the primates are treated with a daily regimen of antibiotics. Neuronal recordings begin 15 days after surgery. Multi-neuron recording hardware and software developed by Plexon Inc. (Dallas, TX) are used in the experimental setup. With Plexon's Multichannel Many Neuron Acquisition Processor (MNAP), the neuronal action potentials are amplified and band-pass filtered (500 Hz to 5 kHz), and later digitized at a sampling rate of 30 kHz. From the raw electrode voltage potentials, cells are sorted, and single spikes are discriminated using a principal component algorithm and a pair of time-voltage windows per unit.
Particular neurons are sorted by the BMI experimenter, who adjusts the sizes and positions of the time-voltage boxes in an attempt to gather the features of single cells. The firing times of all sorted spikes are transferred to the hard disk of a controller PC. Neuronal firing times are then binned (added) in non-overlapping windows of 100 ms, and these are the values used directly as inputs to the models in this dissertation. Along with the firing times, the MNAP software keeps records of the action potential waveforms for all sorted cells, so that single neurons can be identified and compared over the span of several recording sessions.

Figure 2-2. Chronic implant distributed over six distinct cortical areas of a Rhesus monkey

Behavioral Experiments

Three separate behavioral tasks were used in this study. In the first experiment, the firing times of single neurons were recorded while the primates (Belle and Carmen) performed a 3D reaching task that involved a right-handed reach to food and then placing the food in the mouth. The primates are seated in a chair (Fig. 2-3) and are cued to reach to food in one of four positions. The primate's hand position, used as the models' desired signal, was recorded (with a time-shared clock) and digitized at a 200 Hz sampling rate. To take into account the primate's reaction time, the spike trains were delayed by 0.230 seconds¹ with respect to the hand position. Neuronal and positional data were collected during two independent sessions on 2 consecutive days from the same primate performing the same reaching task. The task and recording procedure were repeated for the second primate in two independent sessions over 2 consecutive days for each of the tasks.
In this experiment, since reaching movements are sparsely embedded in a background of resting movements, a large set of 20,000 samples was used for model training and 3,000 samples for model testing. The second task, a cursor control task, involved the presentation of a randomly placed target (large disk) on a computer monitor in front of the monkey (Aurora and Ivy). The monkey used a hand-held manipulandum (joystick) to move the cursor (smaller circle) so that it intersects the target (Fig. 2-4). On intersecting the target with the cursor, the monkey received a juice reward. While the monkey performed the motor task, the hand position and velocity for each coordinate direction (HPx, HPy and HVx, HVy) were recorded in real time (1000 Hz) along with the corresponding neural activity.

¹ For the purpose of this dissertation, all results are consistent on an interval of 0.030 to 0.430 seconds. While optimization of the delay for a particular animal and motor task can be the subject of another study, we have observed performance to decrease with increasing delay.

Figure 2-3. Reaching-task experimental setup and a representative 3D hand trajectory

In the third task, the monkey was presented with the cursor in the center of the screen and two concentric rings. The diameter difference of these two circles instructed the amount of gripping force the animal had to produce. Gripping force (GF) was measured by a pressure transducer located on the joystick. The size of the cursor grew as the monkey gripped the joystick, providing continuous visual feedback of the amount of gripping force. The force instruction changed every trial, while the position of the joystick was fixed. Hand kinematic parameters were digitally low-pass filtered and downsampled to 10 Hz. Both the neural recordings and the behavior were time-aligned² and used directly as inputs and desired signals for each model.
During each recording session, a sample data set was chosen consisting of 8,000 data points. This data set was segmented into two exclusive parts: 5,000 samples for model training and 3,000 samples for model testing. Results in [32] showed that models trained with approximately 10 minutes of data produced the best fit.

² We assume that the memory structures in each model can account for the delay between neural activity and the generation of the corresponding behavior.

Figure 2-4. Using a joystick, the monkey controlled the cursor to intersect the target (Task 2) and to grasp a virtual object by applying a gripping force indicated by the rings (Task 3).

CHAPTER 3
MODELING PROBLEM

What Mapping Does the Model Have to Find?

The models implemented in BMIs must learn to interpret neuronal activity and accurately translate it into motor commands. By analyzing recordings of neural activity collected simultaneously with behavior, the aim is to find the functional relationship between neural activity and the kinematic variables. An important question here is how to choose the class of functions and model topologies that best match the data and that are sufficiently powerful to create a mapping from neuronal activity to a variety of behaviors. As a guide, prior knowledge about the nervous system can be used to help develop this relationship. Since the experimental paradigm involves measuring only two variables (neural firing and behavior), we are directed to a general class of Input-Output (I/O) models that have the ability to create a representation space between two time series. Within this class, several candidate models are available. Based on the amount of neurophysiologic information known about the system, an appropriate model can be chosen. Three types of I/O models, based on the amount of prior knowledge, exist in the literature [33]:

* White Box. The model is perfectly known and built from physical insight and observations.
* Gray Box.
Some physical insight into the model is known, but other parameters need to be determined from the data.
* Black Box. No physical insight is available or used to choose the model. The chosen model is known to be robust in a variety of applications.

The choice of a white, gray, or black box is dependent upon our ability to access and measure signals at various levels of the motor system, as well as the computational cost of implementing the model in current computing hardware.

Signal Processing Approaches to Modeling

White Box

The first modeling approach, the "white box," would require the highest level of physiologic detail. Starting with behavior and tracing back, the system comprises muscles, peripheral nerves, the spinal cord, and ultimately the brain. This is a daunting task of system modeling due to the complexity, interconnectivity, and dimensionality of the involved neural structures. Model implementation would require the parameterization of a complete motor system [34] that includes the cortex, cerebellum, basal ganglia, thalamus, corticospinal tracts, and motor units. Since all of the details of each component/subcomponent of the described motor system remain unknown and are the subject of study for many neurophysiologic research groups around the world, it is not presently feasible to implement white-box BMIs. Even if it were possible to parameterize the system to some high level of detail, implementing the system in our state-of-the-art computers and digital signal processors (DSPs) would be a cumbersome task.

Gray Box

Next, the "gray box" model requires a reduced level of physical insight. In the "gray box" approach, one could take a particularly important feature of the motor nervous system, incorporate this knowledge into the model, and then use data to determine the rest of the unknown parameters. Two examples of "gray box" models can be found in the BMI literature.
First, one of the most common examples is Georgopoulos' population vector algorithm (PVA) [7]. Using observations that cortical neuronal firing rates were dependent on the direction of arm movement, a model was formulated to incorporate a weighted sum of the neuronal firing rates. The weights of the model are then determined from the neural and behavioral recordings. A second example is given by Todorov, who extended the PVA by observing multiple correlations of M1 firing with movement position, velocity, acceleration, force exerted on an object, visual target position, movement preparation, and joint configuration [5, 6, 10, 12, 14, 16, 18, 19, 35-38]. With these observations, Todorov proposed a minimal, linear model that relates the delayed firings in M1 to the sum of many mechanistic variables (position, velocity, acceleration, and force of the hand) [39]. Todorov's mechanistic variables are incorporated into signal processing methodologies through a general class of generative models [4, 40]. Using knowledge about the relationship between arm kinematics and neural activity, the states (preferably the feature space of Todorov) of linear or nonlinear dynamical systems can be assigned. This methodology is supported by a well-known training procedure developed by Kalman [41]. Since the formulation of generative models is recursive in nature, it is believed that the model is well suited for learning about motor systems, because the states are all intrinsically related in time.

Population vector algorithm

The first model discussed is the population vector algorithm, which assumes that a cell's firing rate is a function of the velocity vector associated with the movement performed by the individual. The PVA model is given by Eq. (3-1)

s_n(V) = b_0 + b_x v_x + b_y v_y + b_z v_z = B . V = |B||V| cos θ    (3-1)

where the firing rate s_n for neuron n is a weighted (b_{x,y,z}) sum of the vectoral components (v_{x,y,z}) of the unit velocity vector V of the hand, plus the mean firing rate b_0. The relationship in Eq.
(3-1) is the inner product between the velocity vector of the movement and the weight vector for each neuron. The inner product (i.e., the spiking rate) of this relationship becomes maximum when the weight vector B is collinear with the velocity vector V. At this point, the weight vector B can be thought of as the cell's preferred direction for firing, since it indicates the direction for which the neuron's activity will be maximum. The weights b_n can be determined by multiple regression techniques [7]. Each neuron makes a vectoral contribution w_n in the direction of its preferred direction B_n, with magnitude given in Eq. (3-2). The resulting population vector or movement is given by Eq. (3-3), where the reconstructed movement at time t is simply the sum of each neuron's preferred direction weighted by the firing rate.

w_n(V, t) = s_n(V) - b_0    (3-2)

P(V, t) = sum_{n=1}^{N} w_n(V, t) (B_n / |B_n|)    (3-3)

Todorov's mechanistic model

An extension to the PVA has been proposed by Todorov [39], who considered multiple correlations of M1 firing with movement velocity and acceleration, position, force exerted on an object, visual target position, movement preparation, and joint configuration [5, 6, 10, 12, 14, 16, 18, 19, 35-38]. Todorov proposed an alternative hypothesis stating that neural correlates with kinematic variables are epiphenomena of muscle activation stimulated by neural activation. Using studies showing that M1 contains multiple, overlapping representations of arm muscles, forms dense corticospinal projections to the spinal cord, and is involved with the triggering of motor programs and the modulation of spinal reflexes [30], Todorov proposed a minimal, linear model that relates the delayed firings in M1 to the sum of mechanistic variables (position, velocity, acceleration, and force of the hand).
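Before turning to Todorov's model in detail, the PVA of Eqs. (3-1) to (3-3) can be sketched numerically: fit the tuning parameters by multiple regression on synthetic cosine-tuned data, then decode a movement direction by the population vote. All names and data here are illustrative, not taken from the BMI recordings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_samples = 50, 2000

# Synthetic cosine-tuned neurons (Eq. 3-1): s_n = b0 + B_n . V + noise.
B = rng.normal(size=(n_neurons, 2))            # true preferred-direction weights
b0 = rng.uniform(5, 10, size=n_neurons)        # true mean firing rates
V = rng.normal(size=(n_samples, 2))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # unit velocity vectors
S = b0 + V @ B.T + 0.1 * rng.normal(size=(n_samples, n_neurons))

# Fit b0 and B by multiple regression, as in [7].
X = np.hstack([np.ones((n_samples, 1)), V])
coef, *_ = np.linalg.lstsq(X, S, rcond=None)
b0_hat, B_hat = coef[0], coef[1:].T

# Decode a new movement by the population vote (Eqs. 3-2 and 3-3).
v_true = np.array([np.cos(0.7), np.sin(0.7)])
s = b0 + B @ v_true                            # observed firing rates
w = s - b0_hat                                 # Eq. 3-2: contribution magnitudes
P = (w[:, None] * (B_hat / np.linalg.norm(B_hat, axis=1, keepdims=True))).sum(axis=0)
P /= np.linalg.norm(P)                         # direction of the population vector
print(P, v_true)
```

With enough neurons whose preferred directions cover the space, the normalized population vector P points close to the true movement direction; a non-uniform distribution of preferred directions biases the estimate.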
Todorov's model takes the form

U s(t - Δ) = F GF(t) + m HA(t) + b HV(t) + k HP(t)    (3-4)

where the neural population vector U is scaled by the neural activity s(t) and related to the scaled kinematic properties of gripping force GF(t), hand acceleration HA(t), velocity HV(t), and position HP(t)³. From the BMI experimental setup, a spatial sampling (in the hundreds of neurons) of the input s(t) and the hand position, velocity, and acceleration are collected synchronously; therefore, the problem is one of finding the appropriate constants using a system identification framework [27]. Todorov's model in Eq. (3-4) assumes a first-order force production model and a local linear approximation to multijoint kinematics that may be too restrictive for BMIs. The mechanistic model for neural control of motor activity given in Eq. (3-4) involves a dynamical system where the output variables of the motor system (position, velocity, acceleration, and force) are driven by a high-dimensional input signal comprised of delayed ensemble neural activity [39]. In this interpretation of Eq. (3-4), the neural activity can be viewed as the cause of the changes in the mechanical variables, and the system will be performing decoding. In an alternative interpretation of Eq. (3-4), one can regard the neural activity as a distributed representation of the mechanical activity, and the system will be performing generative modeling. Next, a more general state space model implementation, the Kalman filter, will be presented. This filter corresponds to the representation interpretation of Todorov's model for neural control.

³ The mechanistic model reduces to the PVA if the force, acceleration, and position terms are removed.

Implementation of "gray box" models

The Kalman formulation attempts to estimate the state, x(t), of a linear dynamical system (Fig. 3-1).
Here, for BMI applications, it is assumed that the hand position, velocity, and acceleration and the neuronal spike counts are governed by a linear dynamical equation. In this model, the state vector is defined in Eq. (3-5)

x(t) = [HP(t) HV(t) HA(t) f_1(t) ... f_N(t)]^T    (3-5)

where HP, HV, and HA are the hand position, velocity, and acceleration vectors⁴, respectively. The spike counts of the N neurons are also included in the state vector as f_1, ..., f_N. The Kalman formulation consists of a generative model for the data, specified by the linear dynamic equation for the state in Eq. (3-6)

x(t + 1) = A x(t) + w(t)    (3-6)

where w(t) is assumed to be a zero-mean noise term with covariance W. The output mapping (from state to spike trains) for this BMI linear system is simply

y(t) = C x(t) + v(t)    (3-7)

where v(t) is the zero-mean measurement noise with covariance V, and y is a vector consisting of the neuronal firing patterns binned in non-overlapping windows. In this specific formulation, the output-mapping matrix is C = [0_{N×9} I_N] and the output noise is zero, i.e., V = 0. This recursive state estimation using hand position, velocity, and acceleration is potentially more powerful than the linear filter mapping just neural activity to position. This advantage comes at the cost of long training set requirements, since the model contains many parameters to be optimized.

⁴ The state vector is of dimension 9+N; each kinematic variable contains an x, y, and z component, plus the dimensionality of the neural ensemble.

Figure 3-1. Kalman filter block diagram

Suppose there are L training samples of x(t) and y(t), and the model parameters A and W are determined using least squares. The optimization problem to be solved is given by Eq. (3-8)

A = argmin_A sum_{t=1}^{L-1} ||x(t + 1) - A x(t)||²    (3-8)

The solution to this optimization problem is found to be Eq. (3-9)

A = X_1 X_0^T (X_0 X_0^T)^{-1}    (3-9)

where the matrices are defined as X_0 = [x_1 ... x_{L-1}] and X_1 = [x_2 ... x_L]. The estimate of the covariance matrix W can then be obtained using Eq. (3-10).
W = (X_1 - A X_0)(X_1 - A X_0)^T / (L - 1)    (3-10)

Once the system parameters are determined using least squares on the training data, the model obtained (A, C, W) can be used in the Kalman filter to generate estimates of hand positions from neuronal firing measurements. Essentially, the model proposed here assumes a linear dynamical relationship between current and future trajectory states and spike counts. Since the Kalman filter formulation requires a reference output from the model, the spike counts are assigned to the output, as they are the only available signals. The Kalman filter is an adaptive implementation of the Luenberger observer, where the observer gains are optimized to minimize the state estimation error variance. In real-time operation, the Kalman gain matrix K in Eq. (3-12) is updated using the projection of the error covariance in Eq. (3-11) and the error covariance update in Eq. (3-14). During model testing, the Kalman gain correction is a powerful method for decreasing the estimation error. The state in Eq. (3-13) is updated by adjusting the current state value by the error multiplied with the Kalman gain.

P⁻(t + 1) = A P(t) A^T + W    (3-11)

K(t + 1) = P⁻(t + 1) C^T (C P⁻(t + 1) C^T)^{-1}    (3-12)

x(t + 1) = A x(t) + K(t + 1)(y(t + 1) - C A x(t))    (3-13)

P(t + 1) = (I - K(t + 1) C) P⁻(t + 1)    (3-14)

Using the measured position, velocity, and acceleration as states and the neuronal firing counts as model outputs within this recursive framework, this approach may seem to be the best state-of-the-art method available to understand the encoding and decoding between neural activity and hand kinematics. Unfortunately, for BMIs this particular formulation is faced with problems of parameter estimation. The generative model is required to find the mapping from the low-dimensional kinematic parameter state space to the high-dimensional output space of neuronal firing patterns (100+ dimensions). Estimating model parameters from the collapsed space to the high-dimensional neural space can be difficult and can yield multiple solutions.
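The least-squares training of Eqs. (3-8) to (3-10) and the recursion of Eqs. (3-11) to (3-14) can be sketched on a toy linear system as follows. The dimensions and data are illustrative (a 4-dimensional state with the last component standing in for the neural readout), not the BMI recordings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 500

# Synthetic state sequence from a stable linear system, x(t+1) = A x(t) + w(t).
A_true = 0.85 * np.eye(d) + 0.03 * rng.normal(size=(d, d))
X = np.zeros((d, T))
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + 0.1 * rng.normal(size=d)

# Training: least-squares fit of A (Eq. 3-9) and covariance W (Eq. 3-10).
X0, X1 = X[:, :-1], X[:, 1:]
A = X1 @ X0.T @ np.linalg.inv(X0 @ X0.T)
W = (X1 - A @ X0) @ (X1 - A @ X0).T / (T - 1)

# Decoding: here C reads out only the last ("neural") state component,
# a toy analogue of C = [0_{N x 9} I_N], with V = 0 as in the text.
C = np.zeros((1, d)); C[0, -1] = 1.0
x_hat, P = np.zeros(d), np.eye(d)
for t in range(T - 1):
    P_minus = A @ P @ A.T + W                              # Eq. 3-11
    K = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T)   # Eq. 3-12
    y = C @ X[:, t + 1]                                    # observed output
    x_hat = A @ x_hat + K @ (y - C @ A @ x_hat)            # Eq. 3-13
    P = (np.eye(d) - K @ C) @ P_minus                      # Eq. 3-14
print(np.linalg.norm(A - A_true))
```

The unobserved state components (the analogue of the kinematics) are corrected at every step through the Kalman gain, which is the recursion's advantage over a purely feedforward mapping.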
For this modeling approach, our use of physiologic knowledge in the framework of the model actually complicates the mapping process. As an alternative, one could disregard any knowledge about the system being modeled and use a strictly data-driven methodology to build the model. Other BMI research groups have studied and extended the Kalman formulation presented here for neural decoding [42-44]. A multiple-model implementation, or "switching Kalman filter," has been proposed by Black et al. [43]. Additionally, particle filtering has been applied to both linear and nonlinear state equations with Gaussian and Poisson models [42, 44]. While these approaches to the BMI mapping problem are novel and interesting, they have produced performance results similar to those of the standard Kalman formulation.

Black Box

The last I/O model presented is the "black box" model for BMIs. In this case, it is assumed that no physical insight is available for the model. The foundations of this type of time series modeling were laid by Norbert Wiener for applications of gun control during World War II [45]. While military gun control applications may not seem to have a natural connection to BMIs, Wiener provided the tools for building models that correlate a wide variety of inputs (in our case, neuronal firing rates) and outputs (hand/arm movements). While the Wiener filter is topographically similar to the PVA algorithm, it is interesting to note that it was developed more than thirty years before Georgopoulos was developing his linear model relating neuronal activity to arm movement direction. The three input-output modeling abstractions have gained large support from the scientific community and are also a well-established methodology in signal processing and control theory for system identification [27]. Engineers have applied these models for many years to a wide variety of applications and have proven that the method produces viable phenomenological descriptions when properly applied [46, 47].
One of the advantages of the technique is that it quickly finds, with relatively simple algorithms, optimal mappings (in the sense of minimum error power) between different time series using a nonparametric approach (i.e., without requiring a specific model for the time-series generation). These advantages have to be weighed against the abstract (nonstructural) level of the modeling and the difficulties of the method, such as determining a reasonable fit, a model order, and a topology that appropriately represent the relationships between the input and desired response time series.

Finite impulse response filter

The first black box model that we will discuss assumes that there exists a linear mapping between the desired hand kinematics and the neuronal firing counts. In this model, the delayed versions of the firing counts, s(t-l), are the bases that construct the output signal. Figure 3-2 shows the topology of the multiple-output Wiener filter (WF), where the output y_j is a weighted linear combination of the 10 most recent values of the neuronal inputs s, as given in Eq. (3-15) [27]. (In our studies we have observed neuronal activity correlated with behavior for up to 10 lags.) Here y_j can be defined to be any of the single coordinate directions of the kinematic variables HP, HV, HA, or GF. The model parameters are computed using either the optimal linear least squares (LS) solution (the Wiener solution) or the LMS algorithm, which uses stochastic gradient descent [27]. The Wiener solution is given by Eq. (3-16), where R and P_j are the autocorrelation and cross-correlation functions, respectively, and d_j is one of the following: hand trajectory, velocity, or gripping force. (Each neuronal input and desired trajectory for the WF was preprocessed to have a zero mean.) Using the iterative LMS algorithm in Eq. (3-17), the model parameters are updated incrementally using η, the constant learning rate, and the error e_j(t) = d_j(t) - y_j(t).
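As an illustrative sketch of the two estimators just described (not the original Matlab/NeuroSolutions code; the embedding and variable names are assumptions for the example), both can be written in a few lines of NumPy:

```python
import numpy as np

def wiener_solution(S, d, ridge=0.0):
    """Closed-form Wiener solution W = R^-1 P.

    S: (T, N*L) matrix of tap-delay-embedded neuronal bin counts
    d: (T,) desired kinematic variable (zero-mean, per the preprocessing note)
    ridge: optional diagonal loading for a poorly conditioned R
    """
    R = S.T @ S / len(S)                       # autocorrelation matrix
    P = S.T @ d / len(S)                       # cross-correlation vector
    return np.linalg.solve(R + ridge * np.eye(R.shape[0]), P)

def lms_update(w, s_t, d_t, lr=0.01):
    """One stochastic-gradient (LMS) step: w <- w + lr * e(t) * s(t)."""
    e_t = d_t - w @ s_t                        # instantaneous error
    return w + lr * e_t * s_t

# toy example: recover a known linear mapping from synthetic data
rng = np.random.default_rng(0)
S = rng.normal(size=(2000, 4))
w_true = np.array([0.5, -1.0, 2.0, 0.25])
d = S @ w_true

w_wiener = wiener_solution(S, d)               # one-shot solution
w_lms = np.zeros(4)                            # iterative solution
for s_t, d_t in zip(S, d):
    w_lms = lms_update(w_lms, s_t, d_t)
```

On this noiseless toy problem both estimators recover the true weights; with real neuronal data, the ridge term becomes important (see the regularization discussion later in this chapter).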
y_j(t) = W_j s(t)    (3-15)

W_j = R^(-1) P_j = E[s^T s]^(-1) E[s^T d_j]    (Wiener)    (3-16)

w_j(t+1) = w_j(t) + η e_j(t) s(t)    (LMS)    (3-17)

Figure 3-2. FIR filter topology. Each neuronal input s_N contains a tap-delay line with l taps.

Linear filters trained with mean square error (MSE) provide the best linear estimate of the mapping between neural firing patterns and hand position. Even though the solution is guaranteed to converge to the global optimum, the model assumes that the relationship between neural activity and hand position is linear, which may not be the case. Furthermore, for large input spaces, including memory in the input introduces many extra degrees of freedom to the model, hindering its generalization capabilities.

Time-delay neural network

Spatiotemporal nonlinear mappings of neuronal firing patterns to hand position can be constructed using time-delay neural networks (TDNN). This topology (Fig. 3-3) is more powerful than the linear FIR filter because the output of each hidden PE can be thought of as a nonlinear adaptive basis of the output space used to project the high-dimensional data. These projections can then be linearly combined to form the outputs that predict the desired hand movements. This architecture consists of a tap-delay-line memory structure at the input, in which past neuronal firing patterns are stored, followed by a nonlinearity. The output of the first hidden layer of the network can be described by the relation y_1(t) = f(W_1 s(t)), where f(.) is the hyperbolic tangent nonlinearity, tanh(βx). The input vector s includes the l most recent spike counts from the N input neurons. In this model the delayed versions of the firing counts, s(t-l), are the bases that construct the output of the hidden layer. The number of delays in the topology should be set such that there is significant coupling between the input and the desired signal.
The output layer of the network produces the hand trajectory y_2(t) using a linear combination of the hidden states, y_2(t) = W_2 y_1(t). The weights (W_1, W_2) of this network can be trained using static backpropagation (a simple application of the chain rule that propagates the gradients through the topology) with mean square error (MSE) as the learning criterion.

Figure 3-3. Time-delay neural network topology. (The logistic function is another common nonlinearity used in neural networks.)

While the nonlinear nature of the TDNN may make it seem an attractive choice for BMIs, putting memory at the input of this topology presents difficulties for training and model generalization. Adding memory to the high-dimensional neural input introduces many free parameters to train. For example, if a neural ensemble contains 100 neurons with ten delays of memory and the TDNN topology contains five hidden processing elements (PEs), 5,000 free parameters are introduced in the input layer alone. Large datasets and slow learning rates are required to avoid overfitting. Untrained weights can also add variance to the testing performance, thus decreasing accuracy.

Recurrent multilayer perceptron

The final black box BMI model discussed is potentially the most powerful because it not only contains a nonlinearity but also includes dynamics through the use of feedback. The recurrent multilayer perceptron (RMLP) architecture in Fig. 3-4 consists of an input layer with N neuronal input channels, a fully connected hidden layer of nonlinear processing elements (PEs) (in this case tanh), and an output layer of linear PEs.

Figure 3-4. Fully connected, state-recurrent neural network.

Each hidden-layer PE is connected to every other hidden PE through a unit time delay. In the input-layer equation, Eq.
(3-18), the state produced at the output of the first hidden layer is a nonlinear function of a weighted combination (including a bias) of the current input and the previous state. The feedback of the state allows continuous representations on multiple timescales and effectively implements a short-term memory mechanism. Here, f(.) is a sigmoid nonlinearity (in this case tanh), and the weight matrices W_1, W_2, and W_f, as well as the bias vectors b_1 and b_2, are again trained using synchronized neural activity and hand position data. The output of each hidden PE can be thought of as a nonlinear adaptive basis of the output space used to project the high-dimensional data. These projections are then linearly combined to form the outputs of the RMLP that predict the desired hand movements, as shown in Eq. (3-19). One disadvantage of the RMLP compared with the Kalman filter is that there is no known closed-form solution for estimating the matrices W_f, W_1, and W_2; therefore, gradient descent learning is used. The RMLP can be trained (see Appendices A, B, and C for details) with backpropagation through time (BPTT) or real-time recurrent learning (RTRL) [48]. Later, in Chapter 4, the training formulations of the Kalman filter and the RMLP will be compared and contrasted.

y_1(t) = f(W_1 x(t) + W_f y_1(t-1) + b_1)    (3-18)

y_2(t) = W_2 y_1(t) + b_2    (3-19)

Development and Testing of BMI Models

Reaching Task Performance

Preliminary BMI modeling studies focused on comparing the performance of linear, generative, nonlinear, feedforward, and dynamical models for the hand-reaching motor task. The four models studied were the FIR Wiener filter, the TDNN (the nonlinear extension of the Wiener filter), the Kalman filter, and the RMLP. Since each of these models employs very different principles and has different mapping power, it is expected that they will perform differently; however, the extent to which they differ remains unknown.
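To make the RMLP state recursion of Eqs. (3-18) and (3-19) concrete, here is a minimal NumPy sketch of the forward pass; the random weights and Poisson spike counts are placeholders for illustration, not trained values from the study:

```python
import numpy as np

def rmlp_forward(spike_counts, W1, Wf, b1, W2, b2):
    """Run the RMLP of Eqs. (3-18)/(3-19) over a sequence of neuronal inputs.

    spike_counts: (T, N) binned firing counts
    Returns a (T, d) array of predicted kinematics.
    """
    n_hidden = b1.shape[0]
    y1 = np.zeros(n_hidden)                    # hidden state, fed back through a unit delay
    outputs = []
    for x_t in spike_counts:
        y1 = np.tanh(W1 @ x_t + Wf @ y1 + b1)  # Eq. (3-18): nonlinear state update
        outputs.append(W2 @ y1 + b2)           # Eq. (3-19): linear readout
    return np.array(outputs)

# dimensions follow Table 3-1: 104 inputs, 5 hidden PEs, 3 outputs
rng = np.random.default_rng(1)
N, H, D, T = 104, 5, 3, 50
W1 = rng.normal(scale=0.1, size=(H, N))
Wf = rng.normal(scale=0.1, size=(H, H))
W2 = rng.normal(scale=0.1, size=(D, H))
b1, b2 = np.zeros(H), np.zeros(D)
x = rng.poisson(2.0, size=(T, N)).astype(float)  # surrogate binned spike counts
y = rmlp_forward(x, W1, Wf, b1, W2, b2)
```

Note how the memory lives entirely in the small hidden state y1 rather than in a tap-delay line on the 104-dimensional input, which is why the RMLP needs far fewer weights than the TDNN.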
Here, a comparison of gray and black box BMI models for a hand-reaching task is provided.

Topology and training complexity comparisons

One of the most difficult aspects of modeling for BMIs is the dimensionality of the neuronal input. Because of this large dimensionality, even the simplest models contain topologies with thousands of free parameters. Moreover, the BMI model is often trying to approximate relatively simple trajectories resembling sine waves, which can practically be approximated with only two free parameters. Immediately we are faced with avoiding overfitting the data. Large dimensionality also has an impact on the computational complexity of the model, which can require thousands of additional multiplications, divisions, and function evaluations. This is especially a problem if we wish to implement the model in low-power portable digital signal processors (DSPs). Here we assess each of the four BMI models in terms of its number of free parameters and computational complexity.

Model overfitting is often described in terms of the prediction risk (PR), which is the expected performance of a topology when predicting new trajectories not encountered during training [49]. Several estimates of the PR for linear models have been proposed in the literature [50-53]. A simple way to develop a formulation for the prediction risk is to assume the quadratic form in Eq. (3-20), where e is the training error for a model with D parameters and N training samples. In this quadratic formulation, we can consider an optimal number of parameters, D_opt, that minimizes the PR. We wish to estimate how the prediction risk varies with D, which can be obtained by a simple Taylor series expansion of Eq. (3-20) around D_opt, as performed in [53]. Manipulation of the Taylor expansion yields the general form in Eq. (3-21). Other formulations for the prediction risk include the generalized cross-validation (GCV) and Akaike's final prediction error (FPE), given in Eqs. (3-22) and (3-23).
The important characteristic of Eqs. (3-21) to (3-23) is that they all involve the interplay between the number of model parameters and the number of training samples. In general, the prediction risk increases as the number of model parameters increases.

PR = E[e^2(D, N)]    (3-20)

PR ≈ e^2 (1 + 2D/N)    (3-21)

GCV = e^2 / (1 - D/N)^2    (3-22)

FPE = e^2 (1 + D/N) / (1 - D/N)    (3-23)

The formulations for the prediction risk presented here have been extended to nonlinear models [54]. While the estimation of the prediction risk for linear models is rather straightforward, in the case of nonlinear models the formulation is complicated by two factors. First, the nonlinear formulation involves computing the effective number of parameters (a number that differs from the true number of parameters in the model), which is nontrivial to estimate since it depends on the amount of model bias, the model nonlinearity, and the amount of regularization used in training [54]. Second, the formulation involves computing the noise covariance matrix of the desired signal, another quantity that is nontrivial to compute, especially in the context of BMI hand trajectories.

For the reaching task dataset, all of the models use 104 neuronal inputs, as shown in Table 3-1. The first encounter with an explosion in the number of free parameters occurs for the Wiener filter and the TDNN, since they contain a 10-tap delay line at the input; the number of inputs is immediately multiplied by 10. The TDNN topology has the greatest number of free parameters among the feedforward topologies, 5,215, because the neuronal tap-delay memory structure is also multiplied by the 5 hidden processing elements following the input. The Wiener filter, which does not contain any hidden processing elements, has 3,120 free parameters. In the case of the Kalman filter, which is the largest topology, the number of parameters explodes due to the size of the A and C matrices, since they both scale with the square of the dimensionality of the 104 neuronal inputs.
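Assuming the standard forms of GCV and FPE, these prediction-risk estimates are easy to evaluate for the parameter counts in Table 3-1 (a sketch; the unit training error is a placeholder, since the absolute MSE only scales the estimates):

```python
def gcv(mse, D, N):
    """Generalized cross-validation estimate of prediction risk."""
    return mse / (1.0 - D / N) ** 2

def fpe(mse, D, N):
    """Akaike's final prediction error estimate of prediction risk."""
    return mse * (1.0 + D / N) / (1.0 - D / N)

# adapted-weight counts from Table 3-1, with N = 20,010 training samples
for name, D in [("Wiener", 3120), ("TDNN", 5215), ("Kalman", 12073), ("RMLP", 560)]:
    print(f"{name:7s} GCV={gcv(1.0, D, 20010):.3f}  FPE={fpe(1.0, D, 20010):.3f}")
```

Both estimates equal the training MSE when D = 0 and grow monotonically with D for fixed N, which is the point made above: for the same training error, the 12,073-parameter Kalman filter carries a far larger predicted generalization penalty than the 560-parameter RMLP.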
Finally, the RMLP topology is the most frugal, since it moves its memory structure to the hidden layer through the use of feedback, yielding a total of 560 free parameters.

To quantify how the number of free parameters affects model training time, the models were trained on a Pentium 4 class computer with 512 MB DDR RAM, using the software package NeuroSolutions for the neural networks [55] and Matlab for computing the Kalman and Wiener solutions. The training times of all four topologies are given in Table 3-1. For the Wiener filter, the computation of the inverse of a 1040x1040 autocorrelation matrix took 47 seconds in Matlab, which is optimized for matrix computations. For the neural networks, the complete set of data is presented to the learning algorithm in several iterations called epochs. In NeuroSolutions, whose implementation is based in C, 20,010 samples were presented 130 times (22 min. 15 sec.) for the TDNN and 1,000 times (6 min. 35 sec.) for the RMLP [55]. The TDNN was trained with backpropagation, and the RMLP was trained with backpropagation through time (BPTT) [48] with a trajectory of 30 samples and learning rates of 0.01, 0.01, and 0.001 for the input, feedback, and output layers, respectively. Momentum learning was also implemented with a rate of 0.7. One hundred Monte Carlo simulations with different initial conditions were conducted on the neuronal data to improve the chances of obtaining the global optimum. Of all the Monte Carlo simulations, the network with the smallest error achieved an MSE of 0.0203±0.0009. A small training standard deviation indicates that the network repeatedly achieved the same level of performance. Neural network training was stopped using the method of cross-validation (batch size of 1,000 pts.) to maximize the generalization of the network [56]. The Kalman filter proved to be the slowest of the single-pass solutions to train, since the update of the Kalman gain requires several matrix multiplications and divisions.
In these simulations, the number of epochs was chosen based upon performance in a 1,000-sample cross-validation set, which will be discussed in the next section. To maximize generalization during training, ridge regression, weight decay, and slow learning rates were also implemented.

Table 3-1. Model parameters
                           Wiener Filter   TDNN              Kalman           RMLP
Training Time              47 sec.         22 min. 15 sec.   2 min. 43 sec.   6 min. 35 sec.
Number of Epochs           1               130               1                1000
Cross-validation           N/A             1000 pts.         N/A              1000 pts.
Number of Inputs           104             104               104              104
Number of Tap Delays       10              10                N/A              N/A
Number of Hidden PEs       N/A             5                 113 (states)     5
Number of Outputs          3               3                 9                3
Number of Adapted Weights  3120            5215              12073            560
Regularization             0.1 (RR)        1e-5 (WD)         N/A              1e-5 (WD)
Learning Rates             N/A             1e-4 (input),     N/A              1e-2 (input),
                                           1e-5 (output)                      1e-2 (feedback),
                                                                              1e-3 (output)

The number of free parameters is also related to the computational complexity of each model, given in Table 3-2. The numbers of multiplications, additions, and function evaluations describe how demanding the topology is for producing an output. The computational complexity becomes especially critical when implementing the model in a low-power portable DSP, which is the intended outcome for BMI applications. In Table 3-2, define N0, t, d, and N1 to be the number of inputs, tap delays, outputs, and hidden PEs, respectively. Only the numbers of multiplications and function evaluations are presented, since the number of additions is essentially identical to the number of multiplications. Again it can be seen that the demanding models contain memory in the neural input layer. With the addition of each neuronal input, the computational complexity of the Wiener filter increases by 10 and that of the TDNN by 50. The Kalman filter is the most computationally complex (O((N0 + 9)^3)), since both the state transition and output matrices contain the dimensionality of the neuronal input.
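The per-output multiplication counts of Table 3-2 can be tallied directly; this sketch (an illustrative function, not part of the original study) evaluates them for the reaching-task dimensions:

```python
def multiplications(model, N0, t, d, N1):
    """Per-output-sample multiplication counts following Table 3-2.

    N0: number of inputs, t: tap delays, d: outputs, N1: hidden PEs.
    """
    if model == "wiener":
        return N0 * t * d
    if model == "tdnn":
        return N0 * t * N1 + N1 * d
    if model == "rmlp":
        return N0 * N1 + N1 * d + N1 * N1   # input, readout, and feedback terms
    if model == "kalman":
        return (N0 + 9) ** 3                # order-of-magnitude bound, O((N0 + 9)^3)
    raise ValueError(f"unknown model: {model}")

# reaching-task dimensions: 104 neurons, 10 taps, 3 outputs, 5 hidden PEs
for m in ("wiener", "tdnn", "kalman", "rmlp"):
    print(m, multiplications(m, 104, 10, 3, 5))
```

For the feedforward models the multiplication count coincides with the adapted-weight counts of Table 3-1 (3,120 for the Wiener filter, 5,215 for the TDNN, 560 for the RMLP), while the cubic Kalman bound dwarfs all of them.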
For the neural networks, the number of function evaluations is not demanding, since both the TDNN and RMLP contain only five. Comparing the neural network training times also illustrates the computational complexity of each topology: the TDNN (the most computationally complex) requires the most training time and allows only on the order of a hundred presentations of the training data. As a rule of thumb to overcome these difficulties, BMI architectures should avoid the use of memory structures at the input.

Table 3-2. Model computational complexity
                 Multiplications                 Function Evaluations
Wiener Filter    N0 x t x d                      N/A
TDNN             N0 x t x N1 + N1 x d            N1
Kalman           O((N0 + 9)^3)                   N/A
RMLP             N0 x N1 + N1 x d + N1 x N1      N1

Regularization, weight decay, and cross-validation

The primary goal in BMI modeling experiments is to produce the best estimates of HP, HV, and GF from neuronal activity that has not been used to train the model. This testing performance describes the generalization ability of the models. To achieve good generalization for a given problem, the first two considerations to be addressed are the choice of model topology and the training algorithm. These choices are especially important in the design of BMIs because performance depends upon how well the model deals with the large dimensionality of the input as well as how the model generalizes in nonstationary environments. The generalization of the model can be explained in terms of the bias-variance dilemma of machine learning [57], which is related to the number of free parameters of a model. The MIMO structures of the BMIs built for the data presented here were shown to have from as few as several hundred to as many as several thousand free parameters. At one extreme, if the model does not contain enough parameters, there are too few degrees of freedom to fit the function to be estimated, which results in bias errors. At the other extreme, models with too many degrees of freedom tend to overfit the function to be estimated.
In terms of BMIs, models tend to err on the latter side because of the large dimensionality of the input. We discussed earlier that BMI model overfitting is especially a problem in topologies where memory is implemented in the neural input layer: with each new delay element, the number of free parameters scales with the number of input neurons, as in the FIR filter and the TDNN. To handle the bias-variance dilemma, we would like to effectively eliminate extraneous model parameters, i.e., reduce the value of D in Eqs. (3-21) to (3-23). One could use the traditional Akaike or BIC criteria; however, the MIMO structure of BMIs excludes these approaches [58]. As a second option, regularization techniques can be implemented during model training that attempt to drive the values of unimportant weights to zero, effectively pruning the size of the model topology [59].

In BMI experiments we are not only faced with regularization issues but must also consider the ill-conditioned model solutions that result from the use of finite datasets. For example, computing the optimal solution for the linear Wiener filter involves inverting a poorly conditioned input correlation matrix that results from sparse, highly variable neural firing data. One method of dealing with this problem is to use the pseudoinverse. However, since we are interested in both conditioning and regularization, we chose to use ridge regression (RR) [60], in which an identity matrix scaled by a white noise variance is added to the correlation matrix. The criterion function of RR is given by

J(w) = E[e^2] + δ||w||^2    (3-24)

where w are the weights, e is the model error, and the additional term δ||w||^2 smooths the cost function. The choice of the amount of regularization (δ) plays an important role in generalization performance: for larger values of δ, performance can suffer because SNR is sacrificed for smaller condition numbers. It has been proposed by Larsen et al.
that δ can be optimized by minimizing the generalization error with respect to δ [47]. For other model topologies such as the TDNN and the RMLP, and for the LMS update of the FIR filter, weight decay (WD) regularization is an online counterpart of RR that minimizes the criterion function in Eq. (3-24) using the stochastic gradient, updating the weights by Eq. (3-25):

w(n+1) = w(n) + η∇J(n) - δw(n)    (3-25)

Both RR and weight decay can be viewed as implementations of a Bayesian approach to complexity control in supervised learning using a zero-mean Gaussian prior [61].

A second method that can be used to maximize the generalization of a BMI model is cross-validation. Developments in learning theory have shown that during model training there is a point of maximum generalization, after which model performance on unseen data begins to deteriorate [56]. Beyond this point, the model is said to be overtrained. To circumvent this problem, a cross-validation set can be used to indicate an early stopping point in the training procedure. To implement this method, the training data is divided into a training set and a cross-validation set. Periodically during model training, the cross-validation set is used to test the performance of the model. When the error on the validation set begins to increase, training is stopped.

Performance metrics

The most common metric for reporting results in BMI is the correlation coefficient (CC) computed between the desired trajectory and the model output. While the CC is the gold standard in BMI research, it is not free from biases. The CC is a good measure of linearity between two signals; however, in testing our BMI models we often encountered signals that were linearly related but also contained constant biases or spatial errors (in millimeters) incurred in the tasks.
In [12] we proposed an additional metric to quantify the accuracy of the mapping: the signal-to-error ratio (SER), also computed between the actual and estimated hand trajectories. The SER (the power of the desired signal divided by the power of the estimation error) gives a measure of the accuracy of the estimated position in terms of the error variance. High SERs are desirable, since they are produced when the estimated output error variance is small.

Throughout our BMI modeling studies we also observed that CC and SER do not relate directly to errors in the physical world. In a BMI application, we are interested in quantifying the distance (e.g., in millimeters) between the target position and the BMI output. In order to evaluate the performance of a BMI, we propose a more specific figure of merit that emphasizes the accuracy of the reach portion of the movement, which we call the cumulative error metric (CEM). The CEM was inspired by the receiver operating characteristic (ROC), a hallmark of detection theory. The CEM is defined as the probability of finding errors smaller than a given size (in millimeters) along the trajectory. To use this metric, plot the probability of finding a network output within a 3-D radius r around the desired data point, defined by

CEM(r) = P[||e|| < r]    (3-26)

where e = d - y. It is therefore very easy to visually assess the quality of an algorithm in terms of its maximum error (the right extent of the curve) and how probable large errors are (the closer the CEM is to the top left corner of the plot, the better). This plot contains a lot of information, since it tells how accurate the BMI is throughout the test set or simply over the movement trajectory (much like a receiver operating characteristic used in detection). It should also be noted that similar curves can have large sample-by-sample deviations, for instance when one curve is delayed with respect to the other.
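A minimal sketch of the three metrics on synthetic trajectories (illustrative NumPy code; the toy signals stand in for recorded BMI data):

```python
import numpy as np

def cc(d, y):
    """Correlation coefficient between desired and estimated trajectories."""
    return np.corrcoef(d, y)[0, 1]

def ser_db(d, y):
    """Signal-to-error ratio in dB: desired power over error power."""
    return 10 * np.log10(np.sum(d ** 2) / np.sum((d - y) ** 2))

def cem(d, y, radii):
    """Cumulative error metric, Eq. (3-26): P(||e|| < r) for each radius r.

    d, y: (T, 3) desired and estimated 3-D hand positions.
    """
    err = np.linalg.norm(d - y, axis=1)        # per-sample Euclidean error
    return np.array([np.mean(err < r) for r in radii])

# toy desired trajectory and a noisy estimate of it
t = np.linspace(0, 4 * np.pi, 400)
d = np.stack([np.sin(t), np.cos(t), 0.5 * t], axis=1)
y = d + 0.1 * np.random.default_rng(3).normal(size=d.shape)

curve = cem(d, y, radii=[0.1, 0.2, 0.5, 1.0])  # nondecreasing in r, like an ROC
```

Plotting `curve` against the radii gives exactly the CEM curves shown later in Figs. 3-7, 3-9, 3-12, and 3-13: a better model pushes the curve toward the top-left corner.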
Therefore, the CEM is judged to be a sensitive metric when a delay between the spike data and the hand positions exists.

Test set performance

The performance of the four BMI models was evaluated in terms of CC, SER, and CEM. Each of the models approximates a trajectory like the one illustrated in Fig. 3-5. The reaching task, which consists of a reach to food and a subsequent reach to the mouth, is embedded in periods where the animal's hand is at rest, as shown by the flat trajectories to the left and right of the movement. Since we are interested in how the models perform in each mode of the movement, we present CC, SER, and CEM curves for both movement and rest periods. The performance metrics are also computed using a sliding window of 40 samples (4 seconds) so that an estimate of the standard deviation can be quantified. The window length of 40 was selected because each movement spans about 4 seconds.

Figure 3-5. Reaching movement trajectory.

In testing, all model parameters were fixed and 3,000 consecutive bins (300 secs) of novel neuronal data were fed into the models to predict new hand trajectories. Fig. 3-6 shows the outputs (bold traces) of the four topologies for three desired reaching movements (thin traces). While only three movements are presented for simplicity, it can be shown that during the short period of observation (5 min) there is no noticeable degradation of the model fitting across time. From the plots it can be seen that, qualitatively, all topologies do a fair job of capturing the reach to the food and the initial reach to the mouth. However, the FIR filter, TDNN, and Kalman filter cannot maintain the peak values of HP at the mouth position. Additionally, the FIR filter and Kalman filter have smooth transitions between the food and mouth positions, while the RMLP and TDNN change their positions sharply in this region. The noisiest trajectories were produced by the FIR and Kalman filters, while the TDNN produced sharp changes in trajectory during the resting periods.
The RMLP generated the smoothest trajectories during rest and maintained the peak values better than the other models. The reaching task testing metrics are presented in Table 3-3. It can be seen in the table that the CC can give a misleading perspective of performance, since all the models produce approximately the same values. Nevertheless, the Kolmogorov-Smirnov (KS) test at a p value of 0.05 is used to compare the correlation coefficients with those of the simplest model, the FIR filter. The TDNN, Kalman filter, and RMLP all produced CC values that were significantly different from the FIR filter's, and the CC values themselves can be used to gauge whether the difference is significantly better. The RMLP had the highest SER, indicating that on average it produced smaller errors. All four models have poor resting CC values, which can be attributed to the output variability in the trajectory (i.e., there is not a strong linear relationship between the output and the desired trajectories). Overall, the probability of finding small errors is highest for the RMLP, as shown in the CEM curves of Fig. 3-7, although it is overtaken by the FIR filter, which performs better for larger errors. The Kalman filter has the worst overall CEM curve. In the movements of the reaching task, though, the Kalman filter always outperforms the FIR filter, and for moderate and large errors it outperforms the TDNN and RMLP. The RMLP outperforms the TDNN in the movements, as corroborated by the trajectories (Fig. 3-6).

Figure 3-6. Testing performance for three reaching movements (Belle).

Table 3-3.
Reaching task testing CC and SER (Belle)
                              Linear Model (FIR)  TDNN        Kalman Filter  RMLP
CC (movement)                 0.83±0.09           0.80±0.17   0.83±0.11      0.84±0.15
CC KS test (movement)         0                   1           1              1
SER (dB) (movement)           5.97±1.31           5.45±2.52   5.30±2.07      6.93±2.55
CC (rest)                     0.10±0.29           0.04±0.25   0.03±0.26      0.06±0.25
CC KS test (rest)             0                   1           1              1
SER (dB) (rest)               3.16±2.69           3.00±5.55   0.93±3.69      6.99±3.95

Figure 3-7. Reaching task testing CEM (Belle). [Curves for the entire test trajectory and for the movements (hits) of the test trajectory versus the 3-D error radius (mm), for the FIR, TDNN, Kalman, and RMLP models.]

The training of the four models was repeated for the second owl monkey, Carmen. In this experiment, the model topologies (FIR, TDNN, Kalman, and RMLP) and the type of reaching movement were held as controls. The only variable was the neuronal recordings extracted from the behaving primate. In this case, only about half the number of cells (54 neurons) was collected compared to the previous experiment. The reduction in the number of inputs resulted from not sampling the PP and M1/PMd ipsi cortices (see Table 2-1). In Fig. 3-8 the output trajectories from the four models are presented as bold traces. Qualitatively, we see an immediate decrease in performance. All of the generated trajectories are much noisier and miss capturing the peaks of the movements. Specifically, none of the four models can produce an increase in any of the coordinates during the reach to the food.

Figure 3-8. Testing performance for three reaching movements (Carmen).

Quantification of this reduction in performance is given in Table 3-4, where the CC values showed a 30% drop.
The worst performance during the resting periods was produced by the Kalman filter, with an SER of -3.11 dB. The relative trends in performance were again maintained: the Kalman filter and FIR produced noisy trajectories, the TDNN produced sharp spikes, and the RMLP and Kalman filter maintained the peak values. The CEM curves (Fig. 3-9) show again that the RMLP outperforms all other models for small errors, but for large errors the RMLP is comparable with the FIR.

At this point, for a reaching BMI task, the expressive power of each of the four models has been demonstrated. Depending upon the use of nonlinearity, dynamics, or tap-delay memory structures, different levels of performance can be obtained in terms of trajectory smoothness and reconstruction of peak values. While some of these differences may be subtle, we observed the largest change in performance when switching between neuronal inputs. The use of a reduced number of neurons and sampled cortical areas had a dramatic effect on model performance even though the topologies and task remained fixed.

Table 3-4. Reaching task testing CC and SER (Carmen)
                              Linear Model (FIR)  TDNN        Kalman Filter  RMLP
CC (movement)                 0.64±0.24           0.42±0.38   0.63±0.28      0.65±0.24
CC KS test (movement)         0                   1           0              0
SER (dB) (movement)           4.82±3.29           4.21±3.75   3.46±3.80      5.04±3.67
CC (rest)                     0.03±0.28           0.18±0.32   0.02±0.30      0.04±0.27
CC KS test (rest)             0                   1           1              1
SER (dB) (rest)               1.03±3.16           0.22±3.32   -3.11±3.72     0.22±3.51

Figure 3-9. Reaching task testing CEM (Carmen). [Curves for the entire test trajectory versus the 3-D error radius (mm), for the FIR, TDNN, Kalman, and RMLP models.]

Cursor Control Task

In light of the results produced in the reaching task experiments, BMI modeling was extended to the cursor control task, again for two primates (in this case Rhesus monkeys, which have a more sophisticated nervous system).
In the experiments with Ivy and Aurora, the results were not as dramatic, but the relative features in the model outputs were the same. Again the FIR filter had difficulty maintaining the peaks of the trajectories (see Fig. 3-10, Y coordinate) and the TDNN produced sharp changes in the trajectories (see Fig. 3-11, Y coordinate). Both the Kalman filter and the RMLP produced the best reconstructions of the trajectories in terms of smoothness and capturing the peaks of the movements. In Tables 3-5 and 3-6 this result is corroborated by the high CC and SER values. Overall, the TDNN, Kalman, and RMLP all produce CC values that are significantly better than the FIR filter's, as indicated by the KS test. In both tables we present the X and Y coordinates separately, since there is a noticeable difference in the level of performance for each coordinate. This discrepancy can be attributed to differences in the nature of the movements (i.e., in each experiment one coordinate primarily contains either a low- or high-frequency component). Performance may depend upon the model's ability to cope with both characteristics. Another possible explanation is that the cursor control experimental paradigm involves a coordinate transformation for the primate: manipulandum movements are made in a plane parallel to the floor, while the target cursor is displayed on a computer monitor whose screen is parallel to the walls. Depending on how the experimenters at Duke University defined the coordinate axes, one coordinate is the same in both planes while the other is rotated 90°.

Figure 3-10.
Testing performance for a three reaching movements (Ivy) XCoordinate 50 olyAv~g wwiV 50 z z I n 0 50 100 150 200 0 50 100 150 200 50 50 50 50 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 Time (100ms) 50 100 150 Time (100ms) Figure 311. Testing performance for a three reaching movements (Aurora) Table 35. Reaching task testing CC and SER (Ivy) FIR TDNN Kalman RMLP Correlation X 0.640.16 0.580.12 0.660.18 0.650.17 Coefficient Y 0.40+0.24 0.470.23 0.58+0.23 0.46+0.31 CCKSTest X 0 1 1 0 Y 0 1 1 1 SER (dB) X 1.502.29 1.351.42 1.972.10 1.582.62 Y 0.102.09 0.992.52 0.083.76 0.623.93 Table 36. Reaching task testing CC and SER (Aurora) FIR TDNN Kalman RMLP Correlation X 0.650.21 0.690.17 0.620.26 0.660.24 Coefficient Y 0.770.15 0.790.11 0.820.11 0.790.16 CCKSTest X 0 1 1 1 Y 0 1 1 1 SER (dB) X 1.342.45 16.11+2.18 15.652.33 16.992.48 Y 3.242.70 13.801.97 14.562.63 14.302.80 YCoord in ate Entire Test Trajectory ..... FIR  TDNN Kalman  RMLP Error Radius (mm) Figure 312. Reaching task testing CEM (Ivy) Entire Test Trajectory ..... FIR TDNN Kalman , RMLP 10 20 30 40 Error Radius (mm) Figure 313. Reaching task testing CEM (Aurora) In terms in CEM, for both sessions (Figs. 313 and 314) all four models do not deviate much from each other indicating that the performance is basically the same for the models. For a given probability, the maximum difference in error radius was four millimeters. While the variation in each model's CEM small, the slope of the curves for each primate varied greatly. For Ivy the probability increased 0.03 percent per millimeter increase in radius while Aurora doubled that amount for 0.06 percent pre millimeter. Again we attribute the increase in performance to differences in the cells from the of sampled cortices. For ivy only three cortices (Ml, SMA, and PP) were used while Aurora's data included (Ml, S1, SMA, PMd, and Mlipsi). Further analysis of the contributions of each cortex will be presented later in this dissertation. 
Discussion

With each trained model, we evaluated performance with standard metrics such as the correlation coefficient (CC) and compared them with newer metrics: the signal-to-error ratio (SER) and the cumulative error metric (CEM). This study showed that each model's performance can vary depending on the task, number of cells, sampled cortices, species, and individual. For the reaching movements, the nonlinear RMLP performed significantly better than the linear models. However, for the cursor control task all models performed similarly in terms of the performance metrics used. Examples of the differences were demonstrated in the smoothness of the trajectories as well as the ability to capture the peaks of the movements. Variability in these results indicates a need to further study model performance on more controlled neuronal/behavioral datasets that include a variety of motor tasks, trajectories, and dynamic ranges. In BMI experiments, we are seeking movement trajectories that are similar to real hand movements. In particular, we desire accurate, smooth trajectories. In the experiments presented here, the nonlinear, dynamical model (RMLP) produced the most realistic trajectories compared to the FIR, TDNN, and Kalman filter. The ability of this model to perform better may result from the use of saturating nonlinearities in the large movement space as well as the ability to use neuronal inputs at multiple timescales through the use of feedback in the hidden layer. In all models studied, though, performance (to different degrees) was also affected by the order of the model (i.e., the number of free parameters). This was especially true for models that implement memory structures in the input space, such as the linear models and the TDNN. The RMLP overcomes this issue by reducing the number of free parameters, shifting the memory structure to the hidden layer through the use of feedback.
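The three evaluation metrics discussed above can be made concrete. The sketch below is a minimal numpy illustration (the function names and array conventions are mine, not from the dissertation): CC over a trajectory, SER in dB, and CEM as the fraction of test samples whose 3D error falls within a given radius.

```python
import numpy as np

def correlation_coefficient(desired, predicted):
    """Pearson CC between a desired and a reconstructed trajectory."""
    d = desired - desired.mean()
    y = predicted - predicted.mean()
    return float(np.sum(d * y) / np.sqrt(np.sum(d**2) * np.sum(y**2)))

def signal_to_error_ratio(desired, predicted):
    """SER in dB: power of the desired signal over power of the error."""
    error = desired - predicted
    return float(10 * np.log10(np.sum(desired**2) / np.sum(error**2)))

def cumulative_error_metric(desired_xyz, predicted_xyz, radii):
    """CEM: probability that the 3D error falls within each radius."""
    err = np.linalg.norm(desired_xyz - predicted_xyz, axis=1)
    return np.array([np.mean(err <= r) for r in radii])
```

For the tables above, CC and SER would be computed per movement (or per window) and reported as mean ± standard deviation.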
With the performance results reported in these experiments, we can now discuss practical considerations when building BMIs. By far the easiest model to implement is the Wiener filter. With its quick computation time and straightforward linear mathematical theory, it is clearly an attractive choice for BMIs. We can also explain its function in terms of simple weighted sums of delayed versions of the ensemble neuronal firing (i.e., it correlates neuronal activity with HP). However, as the trajectories in Figs. 3-6, 3-8, 3-10, and 3-11 show, the output is noisy and does not accurately capture the details of the movement. We can attribute these errors first to the solution obtained from inverting a poorly conditioned autocorrelation matrix and second to the number of free parameters in the model topology. While we might expect that adding nonlinearity to the Wiener filter topology, as in the TDNN, would yield a more powerful tool, we found that the large increase in the number of free parameters overshadowed the gain in performance. Training the TDNN is slow, tedious, and subject to getting trapped in local minima. Moving to a Kalman-based training procedure, we expected that the online computation of the Kalman gain would improve performance; however, this technique suffered from parameter estimation issues. Knowledge of these training and performance issues leads us to the RMLP. By moving the memory structure to the hidden layer, we immediately gain a reduction in the number of free parameters. This change is not without cost, since the BPTT training algorithm is more difficult to implement than, for example, the Wiener solution. Nevertheless, the combination of dynamics and nonlinearity in the hidden layer allowed the model to accurately capture the quick transitions in the movement as well as maintain the peak hand positions at the mouth. Capturing these positions resulted in larger values of the correlation coefficient and SER.
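To make the Wiener discussion concrete, the sketch below builds the tap-delay embedding of the ensemble firing and solves the normal equations, adding a small ridge term as one common remedy for the poorly conditioned autocorrelation matrix mentioned above (the data layout and the regularization constant are my assumptions, not details from the dissertation):

```python
import numpy as np

def wiener_filter(spikes, hand_pos, taps=10, ridge=1e-3):
    """Least-squares FIR mapping from binned spike counts to hand position.

    spikes: (T, N) binned firing of N neurons; hand_pos: (T, D) trajectory.
    A small ridge term is added because the ensemble autocorrelation
    matrix is typically poorly conditioned.
    """
    T, N = spikes.shape
    # Tap-delay embedding: each row holds the current and taps-1 past bins.
    X = np.zeros((T - taps + 1, N * taps))
    for t in range(taps - 1, T):
        X[t - taps + 1] = spikes[t - taps + 1:t + 1][::-1].ravel()
    d = hand_pos[taps - 1:]
    R = X.T @ X + ridge * np.eye(N * taps)   # regularized autocorrelation
    P = X.T @ d                              # cross-correlation with desired
    W = np.linalg.solve(R, P)                # Wiener-Hopf solution
    return W, X

# Usage: W, X = wiener_filter(spikes, hand_pos); prediction = X @ W
```

The output at each time step is exactly the weighted sum of delayed ensemble firing described in the text; the ridge constant trades a small bias for numerical stability of the inversion.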
While the RMLP was able to outperform the other three topologies, it is not free from error; the output is still extremely noisy for applications of real BMIs (imagine trying to grasp a glass of water). The search for the right modeling tools and techniques to overcome the errors presented here is the subject of future research on optimal signal processing for brain-machine interfaces.

CHAPTER 4
RMLP MODEL GENERALIZATION

Motivation for Studying the RMLP

In the previous chapter, the performance of four models (grey and black box) was compared for BMI experiments involving reaching and cursor tracking tasks. The goal was to compare and contrast how model topologies and training methodologies affect trajectory reconstruction for BMIs. Evaluation of model performance using CC, SER, and CEM indicated that the RMLP is a better choice compared to the other models. Here we continue the evaluation of the RMLP by discussing generalization, or the ability of the RMLP to continuously produce accurate estimates of the desired trajectory over long testing sets of novel data. First, we discuss in further detail the list below, which indicates why the RMLP is an appropriate choice for BMI applications.

* The RMLP topology has the desired biological plausibility needed for BMI design
* The use of nonlinearity and dynamics gives the RMLP a powerful approximating capability
* The RMLP produced equivalent or better CC, SER, and CEM with the fewest number of model parameters

While the RMLP may at first appear to be an "off-the-shelf" black box model, comparison to Todorov's mechanistic model reveals that it has the desired biological plausibility needed for BMI design. To illustrate the plausibility, we propose a general (possibly nonlinear) state-space model implementation that corresponds to the representation interpretation of Todorov's model for neural control:

s(t) = g(s(t-1)) + w(t-1)   (State dynamics)
x(t) = h(s(t)) + v(t)       (Output mapping)

In Eq.
(4-1), the state vector s(t) can include the mechanical variables position, velocity, acceleration, and force. A linear state-space filter, of which the Kalman filter is one example, is a special case of Eq. (4-1) in which g(.) and h(.) are linear. For BMIs, the output vector x(t) must consist of the neuronal activity, since in testing mode (after optimization of the model's g(.) and h(.) is complete) the BMI will operate using only the neuronal activity. The modeling errors and measurement noise are summarized by the two noise terms w(t) and v(t). In comparison, the RMLP approach for the BMI involves the cause interpretation of Todorov's model. Differences between the representation and cause interpretations result in very different training procedures for the two topologies. Referring to Eq. (3-19), the model output y2 consists only of the hand position; the RMLP must therefore learn to build an efficient internal dynamical representation of the other mechanical variables (velocity, acceleration, and force) through the use of feedback. In fact, in the RMLP model, we can regard the hidden state vector (y1 in Eq. 3-18) as the RMLP representation of these mechanical variables driven by the neural activity in the input (x). Hence, the dynamical nature of Todorov's model is implemented through the nonlinear feedback in the RMLP. The output layer is responsible for extracting the position information from the representation in y1 using a linear combination. An interesting analogy exists between the output layer weight matrix W2 (Eq. 3-19) in the RMLP and the matrix U (Eq. 3-4) in Todorov's model. This analogy stems from the fact that each column of U represents a direction in the space spanning the mixture of mechanical variables to which the corresponding individual neuron is cosine-tuned, which is a natural consequence of the inner product. Similarly, each column of W2 represents a direction in the space of hand position to which a nonlinear mixture of neuronal activity is tuned.
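Following the notation above (y1 the hidden state of Eq. 3-18, y2 the linear output of Eq. 3-19), a single forward pass of the RMLP might be sketched as below; the tanh saturating nonlinearity and the zero initial state are assumptions for illustration:

```python
import numpy as np

def rmlp_forward(x_seq, W1, Wf, b1, W2, b2):
    """One-hidden-layer RMLP forward pass (cf. Eqs. 3-18 and 3-19).

    x_seq: (T, N) neuronal input. The hidden state y1 carries the
    internal dynamical representation through the feedback weights Wf;
    the output layer extracts hand position by a linear combination.
    """
    T = x_seq.shape[0]
    y1 = np.zeros(b1.shape[0])       # hidden state, zero initial condition
    outputs = []
    for t in range(T):
        y1 = np.tanh(W1 @ x_seq[t] + Wf @ y1 + b1)   # Eq. 3-18: recurrent hidden layer
        outputs.append(W2 @ y1 + b2)                  # Eq. 3-19: linear readout of position
    return np.array(outputs)
```

Each row of the returned array is one position estimate; training (BPTT) would adjust W1, Wf, W2, and the biases to minimize the MSE to the desired trajectory.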
While the state and output equations of the RMLP and Kalman filter can be written in a similar form, the training of the two formulations is very different. On one hand, neural networks simply adjust model parameters by minimizing a cost function, typically the mean-square error (MSE) between the output of the model and the desired trajectory. On the other hand, the Kalman formulation assumes a probabilistic model of the temporal kinematics. Also included in the Kalman formulation is an estimate of the uncertainty in the state and output estimates. It may seem that the Kalman formulation is a more mathematically principled and structured approach; however, the performance of the model is subject to the assumptions imposed by the probabilistic model. It is often difficult to corroborate these assumptions (Gaussian firing statistics, linear kinematics) with the unknown aspects of neural coding and motor systems. The Kalman formulation also suffers from issues of parameter estimation, which can be overcome by using backpropagation as in the RMLP BPTT formulation. Compared to the other models (FIR, TDNN), the RMLP is the only topology capable of creating Todorov's internal position, velocity, and acceleration representations that may be necessary for BMI applications. The expressive power of the RMLP is greater because it can utilize the dynamics created by feedback in the hidden layer to create these states. The other topologies are far more restrictive. For example, the FIR limits the neural-to-motor mapping to a linear feedforward system, and there is still no proof that the neural-to-motor mapping is linear. The TDNN takes this a step further by using nonlinearity; however, it is still feedforward. Since BMI model generalization is related to the complexity of the specific task at hand, we wish to equip the topology with features that give it the best chance of succeeding in the task.
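The difference in model order between the topologies can be tallied directly. A rough parameter count (the bias conventions and example sizes are mine, for illustration only) shows how the FIR and TDNN, whose memory sits in the high-dimensional input, grow much faster than the RMLP, whose memory sits in the hidden layer:

```python
def fir_params(n_neurons, taps, n_outputs):
    # One weight per neuron per tap per output coordinate, plus a bias.
    return (n_neurons * taps + 1) * n_outputs

def tdnn_params(n_neurons, taps, hidden, n_outputs):
    # The tap-delay memory sits in the high-dimensional input layer.
    return (n_neurons * taps + 1) * hidden + (hidden + 1) * n_outputs

def rmlp_params(n_neurons, hidden, n_outputs):
    # Memory moves to the hidden layer: input, feedback, output weights.
    return (n_neurons + 1) * hidden + hidden * hidden + (hidden + 1) * n_outputs
```

With, say, 104 neurons, 10 taps, 5 hidden PEs, and a 3D output, the FIR already carries 3123 weights while the RMLP needs only 568.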
Therefore, we believe the use of both nonlinearity and dynamics is the most appropriate. Lastly, the RMLP produced equivalent or better CC, SER, and CEM values with the fewest number of model parameters. One explanation of this performance level is that the RMLP generalized better during testing. As discussed earlier, the ability of a model to generalize is directly related to its complexity, or the number of free parameters in the topology. The RMLP effectively reduces the number of parameters by moving the memory structure away from the neuronal input, which has large dimensionality. This topology is capable of achieving an equivalent memory depth in samples with fewer parameters. The use of a low-complexity smooth function enables the topology to avoid overfitting the data.

Motivation for Quantifying Model Generalization

There are two aspects of model generalization that we deem important for the success of real BMIs. The ultimate vision is that a BMI could be operated by a paralyzed patient who, through some training procedure, could use the neuroprosthetic device for long periods of time and for a wide variety of movements. If BMI models are not reliable in this sense, then the user would have to continuously retrain the system, which would interrupt the freedom to operate the device. Currently, BMI models have only been tested on limited datasets involving only one session and one type of movement (either reaching or cursor control). Here we will investigate model performance for both multi-session and multi-task datasets. We have shown that the RMLP generalizes well over short testing datasets (5 min.). In the introduction of this dissertation, it is claimed that the modeling problem is complicated by the nonstationary nature of the neuronal recordings.
At this point, it remains unclear whether the nonstationarity is a result of a change in the underlying properties of individual cells or whether the ensemble activity itself is evolving as a function of the attention state of the primate. If we refer to the literature, many BMI groups have found experimentally that the model mappings had to be updated frequently (i.e., every 10 min.) to achieve a high level of performance [16, 18, 19]. The consensus is nonspecific and states, without quantification, that the models need to be updated because the ensemble of recorded neurons is changing its firing properties and directional tuning frequently. This statement implies that fixed model parameters are no longer relevant for producing accurate trajectory reconstructions.

Multi-Session Model Generalization

The scheme used to test multi-session model generalization involves the concatenation of datasets (neuronal and behavioral recordings) from several recording sessions spanning several days. The objective is to train a model with data from session 1 and continue testing into session 2. Two scenarios for testing model generalization are presented in Fig. 4-1. Since only two recording sessions from Duke University included the information necessary to concatenate the datasets, the second scenario involves reversing the order of the sessions. Optimally, the concatenation of several consecutive sessions would be desirable since it would mimic a real BMI situation; however, the data was not available. Over the span of several recording sessions, it was shown in Tables 2-1 and 2-2 that the number of cells in the ensemble can vary. In order to test a fixed model in another recording session, the experimenter must be careful that the same training neurons are assigned to their respective weights in the testing set. Without aligning cells to the weights that they were specifically trained for, the model is not guaranteed to produce testing movements that are in the same class as the training movements.
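A minimal sketch of this cell-to-weight alignment, assuming each session's units carry stable labels (e.g., obtained by matching spike templates across sessions, as described next); the function name and data layout are illustrative assumptions:

```python
import numpy as np

def align_sessions(ids_s1, spikes_s1, ids_s2, spikes_s2):
    """Keep only cells present in both sessions, in matching column order.

    ids_*: lists of unit labels; spikes_*: (T, N) binned counts whose
    columns follow ids_*. Returns the two aligned spike matrices and
    the common labels, so that session-2 activity feeds the weights
    trained on the corresponding session-1 cells.
    """
    common = [u for u in ids_s1 if u in set(ids_s2)]
    cols1 = [ids_s1.index(u) for u in common]
    cols2 = [ids_s2.index(u) for u in common]
    return spikes_s1[:, cols1], spikes_s2[:, cols2], common
```

Any cell missing from either session is dropped, which is consistent with testing a fixed model on only the units common to both recordings.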
To prevent this problem, Duke University saves the spike templates used for spike-sorting the cells during the data collection process. In other recording sessions, the templates can be compared and similar cells in both sessions can be indexed. In the case of Aurora's BMI dataset used in these experiments, only 180 cells were common to the two sessions (183 in S1, 185 in S2), and they were aligned in sequence in the data concatenation process.

Figure 4-1. Two scenarios for data preparation. [Scenario 1: train on session 1, test into session 2; scenario 2: the order of the sessions is reversed.]

To test the RMLP generalization, the model was first trained with the first 5000 samples of the dataset corresponding to scenario 1 or 2. An RMLP with topology identical to those used in the cursor tracking experiments of Chapter 2 was employed in the training of each scenario. Upon training completion, the model parameters were fixed and novel neuronal data samples were run through the networks. A minimum of 44 minutes and a maximum of 85.5 minutes of testing data were used in the experiments. The duration between sessions was one day for HP and HV, while GF had a two-day time lapse.

Figure 4-2. Scenario 1: Testing correlation coefficients for HP, HV, and GF. [Windowed correlation versus time (30-s windows) for the X and Y coordinates of position and velocity and for gripping force.]

The testing CC values, computed in 30-second non-overlapping windows, are presented for both scenarios in Figs. 4-2 and 4-3. The selected window size represents a compromise between time resolution and the need for enough data samples to compute the correlation. For comparison, the experiment described above was repeated for the simplest BMI model, the FIR filter (10 tap-delays), which will serve as a control.
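The windowed correlation used here might be computed as in the sketch below (a 300-sample window corresponds to 30 s at the 100 ms bin size used in these experiments; the function name is mine):

```python
import numpy as np

def windowed_cc(desired, predicted, window):
    """Correlation coefficient in non-overlapping windows.

    desired, predicted: 1-D arrays for one kinematic coordinate;
    window: samples per window (e.g. 300 for 30 s at 100 ms bins).
    """
    ccs = []
    for start in range(0, len(desired) - window + 1, window):
        d = desired[start:start + window]
        y = predicted[start:start + window]
        ccs.append(np.corrcoef(d, y)[0, 1])   # per-window Pearson CC
    return np.array(ccs)
```

The resulting sequence of per-window CC values is what the figures plot over time and what the session-comparison tests below operate on.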
The most obvious trend in the curves is that they are highly variable, and this observation is corroborated by the large standard deviations reported in Tables 3-3 through 3-6. We can also see that both the RMLP and FIR filter fail to produce high values for the same time windows. Lastly, depending upon the scenario, the correlation curves show either a downward trend (Fig. 4-2 HP, HV; Fig. 4-3 GF) or an upward trend (Fig. 4-3 HP, HV; Fig. 4-2 GF).

Figure 4-3. Scenario 2: Testing correlation coefficients for HP, HV, and GF. [Windowed correlation versus time (30-s windows) for the FIR and RMLP; the S2-to-S1 transition is marked in each panel.]

The trends in the correlation curves are quantified for each scenario in Tables 4-1 and 4-2. Here the Kolmogorov-Smirnov (KS) test and the two-sample t-test at a p value of 0.05 are used to compare the correlation coefficients from each session. The KS test is a "goodness of fit" test that measures whether two independent samples (for BMIs, the correlations in S1 and S2) are drawn from the same underlying continuous population. Since we observe both increases and decreases in the mean, the t-test will also be used to determine whether the two independent samples come from distributions with equal means. Included in both tables are the mean values of the correlation for each session, which can be used to gauge whether significant changes in correlation were increases or decreases. In scenario 1, the HP and HV datasets failed the KS test, indicating that the correlations in each session belong to separate populations. Additionally, the average correlation values (highlighted in the table) from S2 were all significantly lower than in S1. This result would seem to indicate that both the RMLP and FIR models generalize well up to a point and then the performance slowly degrades.
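In practice this session comparison would use library routines such as scipy.stats.ks_2samp and scipy.stats.ttest_ind; the numpy-only sketch below shows the two-sample KS decision itself, using the large-sample critical constant c(0.05) ≈ 1.36, so that a returned 1 corresponds to a 1 entry in the tables that follow:

```python
import numpy as np

def ks_2samp_stat(a, b):
    """Two-sample KS statistic: maximum distance between the empirical
    CDFs of the per-window correlations from the two sessions."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side='right') / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side='right') / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def ks_reject(a, b, c_alpha=1.36):
    """1 if the two samples are judged to come from different
    populations at alpha = 0.05 (large-sample approximation)."""
    n, m = len(a), len(b)
    return int(ks_2samp_stat(a, b) > c_alpha * np.sqrt((n + m) / (n * m)))
```

The same per-window CC arrays would also feed the two-sample t-test for equality of means.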
It should be noted here that, between model training and testing, performance will always degrade. However, the amount of degradation depends upon the particular segment used in training, because of the nonstationary nature of the data. Additionally, this result does not guarantee that performance will not increase in other signal segments, depending upon the conditions in the data. In the case of the GF, a significant increase in the observed correlations occurred. In Fig. 4-2, over samples 15 through 75, after inspection of the model output and desired GFs, the statistical properties of the desired signal were observed to change significantly. Differences in performance between the sessions could be attributed to differences in the dynamic range and frequency of the gripping forces that could not be captured by the trained model. In scenario 2, an equivalent or significantly increased HP and HV testing performance was observed as time progressed. Significant values are highlighted in Table 4-2. Most CC values achieved on average a 17% increase in correlation in S2. This result, which is contrary to what is reported in the BMI literature, quantifies that both linear and nonlinear models can generalize for testing periods close to an hour in length, even when the recording sessions are separated by one or two days (1 day HP, HV; 2 days GF). Again we also observed a significant change in the GF performance, but this time performance degraded on average to a low value of 0.45. During testing, the dynamic range and frequency changed to values not encountered during training, which may again explain the poor performance. Table 4-1.
Scenario 1: Significant decreases in correlation between sessions
                       HP      HV      GF
FIR    X   KS          1       1       1
           t-test      1       1       1
           Mean S1     0.74    0.71    0.62
           Mean S2     0.51    0.54    0.72
       Y   KS          1       1       --
           t-test      1       1       --
           Mean S1     0.68    0.67    --
           Mean S2     0.60    0.55    --
RMLP   X   KS          1       1       1
           t-test      1       1       1
           Mean S1     0.75    0.75    0.63
           Mean S2     0.53    0.55    0.79
       Y   KS          1       1       --
           t-test      1       1       --
           Mean S1     0.75    0.71    --
           Mean S2     0.63    0.60    --

Table 4-2. Scenario 2: Significant increases in correlation between sessions
                       HP      HV      GF
FIR    X   KS          1       1       1
           t-test      1       1       1
           Mean S1     0.60    0.57    0.87
           Mean S2     0.70    0.68    0.43
       Y   KS          0       0       --
           t-test      0       0       --
           Mean S1     0.63    0.64    --
           Mean S2     0.60    0.62    --
RMLP   X   KS          1       1       1
           t-test      1       1       1
           Mean S1     0.60    0.59    0.88
           Mean S2     0.70    0.69    0.48
       Y   KS          0       0       --
           t-test      0       0       --
           Mean S1     0.65    0.61    --
           Mean S2     0.66    0.63    --

Multi-Task Model Generalization

The second component of BMI model generalization involves testing the RMLP model on a multi-task movement. To stress the generalization capability of the model, we require data that fully utilizes the 3D working space of the primate. In animal experiments, it is often difficult to obtain these datasets because BMI experimental paradigms involving primates require carefully planned behavioral tasks that are motivational, nonthreatening, and require simple skill sets for the primate. Even with extensive planning, the primates must be trained for several months prior to neuroprosthetic implantation and behavioral experimentation. For these reasons, the datasets that have been provided by Duke University often involve only single-task behaviors collected from several species of primates. Here, data from a single primate performing multiple tasks in a single session was not available; therefore, we devised an experimental paradigm that could demonstrate multi-task model generalization using existing datasets. The experimental paradigm involves concatenating datasets from the behavioral experiments of Belle (hand reaching) and Aurora (cursor control).
The idea is to mix segments of 3D hand reaching with 2D cursor control movements, thus forcing the trained network topology to learn changes in movement frequency and dynamic range. Obviously, since the dimensionality (both neuronal and positional) of the two datasets differs, several constraints were artificially imposed. First, the desired trajectory for all movements included X, Y, and Z coordinates; therefore, segments containing 2D movements were zeroed in the Z direction. An example of the desired signal used in model training is presented in Fig. 4-4, which shows how the small-amplitude cursor control movement (samples 1000-4000) is interleaved with the large-amplitude reaching movement (samples 4000-7000). The entire training dataset consists of 14000 samples with alternating patterns of the two behaviors. The second constraint involves the neuronal inputs and is highly artificial, since the resulting dataset concatenates neurons from different cortices of two different species (owl and Rhesus monkeys). In this experiment we make the strong assumption that assigning different types of cells to each model weight is not a significant problem. Only the number of cells used from each recording session is preserved in the concatenation process. The firing patterns of 104 cells were selected using a "cellular importance measure" discussed later in Chapter 7 of this dissertation. As with the desired dataset, the firing patterns from the 104 cells were alternated in synchrony with their respective movements in this multi-task paradigm.

Figure 4-4.
Multi-task model training trajectories. [X-, Y-, and Z-coordinates of the desired signal over the first 9000 training samples.]

The RMLP topology trained using this dataset was modified to contain fifteen hidden PEs, giving the model extra degrees of freedom to account for the variability encountered in the movements. In the separate behavioral experiments, the BPTT training trajectory length was chosen to match the number of samples of each movement. For our datasets, 40 samples and 50 samples were chosen for the cursor tracking and hand reaching movements, respectively. In the multi-task experiment, the trajectory length was chosen to match the larger movement, in this case the hand reaching task with 50 samples. Given the strong assumptions about the data in this experiment, we were surprised by the high performance of the linear and nonlinear models. While the RMLP has been shown to be a universal mapper in R^N, it is not obvious that the linear FIR filter would be able to reconstruct the two trajectories in testing. The FIR CC values were all significantly lower than those of the RMLP, as shown by the KS test in Table 4-4. One of the most attractive features of the RMLP is its ability to zero the Z coordinate in the 2D movements. The RMLP produced the smallest mean square error during these segments, as shown in the last column of Table 4-3. Differences in performance are also confirmed by the variance in the CEM curves of Fig. 4-5. Since the variance in the curves is small, this result shows that, for this dataset, a topology with a sufficient number of degrees of freedom and training examples can simultaneously learn the functional mapping for two species of primates and two movements. Next, we would like to push the two topologies to a point where they may fail to produce a good mapping in this multi-task setting. As discussed earlier, the expressive power of the RMLP is much greater than that of the FIR because of the use of nonlinear recurrent PEs in the hidden layer.
We believe that the power of these elements will become much more important when the size of the topology is greatly reduced. This may occur when recordings of the neuronal ensemble contain only a few important (see Chapter 7) cells. To test this hypothesis, both topologies were retrained using only a subset of 10 important cells. Again, in Table 4-3 we present the testing CC and SER values, which show an overall decrease in both models' performance. However, we observed a much greater average drop in performance by the FIR filter. In this case, the CC values for the FIR decreased by 12% while the RMLP had only a 7% drop. In Fig. 4-6 we can observe qualitative differences in performance by plotting the testing X, Y, and Z model outputs (bold) along with the desired signal, centered upon a transition in the type of movement (reaching vs. cursor). In the figure, the X-coordinate of the FIR tends to average the outputs for both reaching and cursor control, while the RMLP is capable of distinguishing between the two. We can additionally observe that the FIR continues to have problems producing flat outputs in the Z-coordinate.

Table 4-3. Multi-task testing CC and SER values (Correlation Coefficient / SER, dB)
              X                        Y                        Z                        MSE (Z dir.)
FIR           0.53±0.20 / 1.40±1.19    0.59±0.18 / 1.57±1.40    0.61±0.20 / 0.68±1.07    13.76
RMLP          0.60±0.18 / 1.60±1.40    0.64±0.19 / 1.88±1.56    0.69±0.23 / 1.06±1.96    1.39
FIR subset    0.26±0.26 / 1.05±0.71    0.51±0.22 / 1.39±1.03    0.59±0.24 / 0.61±0.87    8.96
RMLP subset   0.52±0.21 / 1.38±1.01    0.53±0.25 / 1.61±1.21    0.67±0.27 / 1.06±1.94    1.09

Table 4-4. Significance of CC compared to FIR filter (KS test)
              X    Y    Z
FIR           0    0    0
RMLP          1    1    1
FIR subset    1    1    1
RMLP subset   1    1    1

Figure 4-5. CEM curves for linear and nonlinear models trained on the multi-task data. [Entire test trajectory; RMLP, FIR, RMLP subset, and FIR subset versus error radius (mm).]

Figure 4-6.
Multi-task testing trajectories centered upon a transition between the tasks. [RMLP subset and FIR subset outputs across the reaching and cursor segments.]

Discussion

During training, any model topology with a sufficient number of parameters can produce an output with zero error, especially when the desired signal is simple. Models of this type that produce zero training error have overfit, or memorized, the training data and will not perform well when faced with novel testing datasets. In BMI modeling experiments, the chance of overfitting the data is especially high because the large dimensionality of the neuronal input leads to topologies with hundreds or even thousands of parameters that are attempting to estimate trajectories resembling simple sine waves. These undesirable qualities bring a sense of brittleness to BMI models and suggest that they will not perform well in real BMI applications, where fixed models will be required to produce accurate trajectories for a wide variety of movements and over long periods of time. The generalization of our most promising network, the RMLP, was tested for both the longest-duration dataset in our archive and for combined movement/species tasks. The results of our experiments reflect precepts that are well known in adaptive filter and neural network theory involving properties of both the training data and the model topology. In terms of training data, all model topologies assume that the input and desired signals have stationary statistics. The correlation plots in Fig. 1-4 clearly show that this is not the case: local neuronal activity sometimes fires in synchrony with peaks in movement and sometimes it does not. Consequently, the models selected in these experiments learn to average cellular responses (a result of the inner-product filtering operations in all topologies). Time instances where this averaging is not appropriate are reflected in the dips in correlation of Figs. 4-2 and 4-3.
We also observed that dips in the correlation occurred when the models were trained on data that was not representative of the data encountered during testing. Qualitative observations about model generalization regarding the number of training observations, the number of model parameters, the distribution function of the desired signal, and the final training error have been well described by Vapnik's principle of Empirical Risk Minimization and the VC (Vapnik-Chervonenkis) dimension. Vapnik's theory provides a mathematical foundation for choosing model/data design parameters for best generalization. In practice, though, designing the model for a specific problem/dataset involves a compromise between available data, model expressive power, and computational complexity. In our experiments on the multi-task dataset, we found that large linear and nonlinear models (those including the full ensemble of cells) performed well in testing. From a signal processing point of view, this result is expected because optimally orienting a hyperplane becomes much simpler when the sheer size of the neuronal input naturally maps the problem to a high-dimensional space. Moreover, we trained both linear and nonlinear models with observations of reaching and cursor tracking covering the entire testing distribution function. In this case, the regression problem may be solved by a simple linear system. Mapping to high-dimensional spaces and then performing linear regression is one of the fundamental principles of Support Vector regression. However, when the input was pruned to include only ten cells, the size of the space of both models was reduced significantly, but the performance of the RMLP remained high while that of the FIR dropped greatly. This result shows that, compared to the FIR, the RMLP is capable of a better compromise in representing different patterns, as can be expected.
The use of multiple PEs in the hidden layer gives the RMLP the ability to nonlinearly segment the space of the movement and to assign neuronal activity to each of the segments with the input layer weights. The FIR filter is limited to a simple linear transformation directly from the neuronal activity to the movement coordinate system. In the next chapter we will look in depth at how the RMLP finds the functional mapping by tracing the signals through the topology. In summary, we have shown for two BMI models (linear feedforward and nonlinear feedback) trained with MSE that testing performance may not be as brittle as expected. Depending upon the training dataset, it is possible (in open-loop experiments) for model performance to increase significantly even when the animal was disconnected from the model for up to two days. Moreover, in high-dimensional spaces both linear and nonlinear models have the expressive power to simultaneously learn the functional mapping between cortical activity and behavior for two species of animals and two very distinct hand movements. In environments where dimensionality is at a premium, the nonlinear feedback model (RMLP) proved to be more capable of producing high performance. To achieve high performance in real BMI applications, the results of these experiments advocate the use of training sets that incorporate diverse training movements, coupled with a parsimonious model with broad expressive power.

CHAPTER 5
ANALYSIS OF THE NEURAL TO MOTOR REPRESENTATION SPACE CONSTRUCTED BY THE RMLP

Introduction

In order to advance the state-of-the-art in BMI design and gain confidence in the models obtained, it is critical to understand how they exploit information in the neural recordings. Moreover, comprehending the models' solution may help discover features in the neural recordings that were previously unrecognizable.
With an explanation of how the network addressed these simple tasks, we can guide the development of our next-generation BMI models, which will face more complicated motor control problems. It is relatively simple to analyze linear BMI models because they are well understood and the mathematical theory of least squares is well developed [27, 46, 47]; however, we are interested in analyzing our best performing neural network, the RMLP, in a nonautonomous task. It is a common belief in the neural network and machine learning communities that neural networks are "black boxes" and that interpreting the function of nonlinear dynamical networks is extremely difficult [62, 63], the exception being the autonomous Hopfield network with its fixed-point dynamics. Here we will apply both signal processing constructs and dynamics to make sense of the representations constructed by the topology, weights, and activations of the model. We hypothesize that a trained RMLP can be used to discover how changes in the neural input create changes in behavior. By tracing the signals through the topology, visualizing projections of the input and intermediate states, and mathematically understanding the dynamics of the system, we will show the mechanisms by which real neuronal activity is transformed into hand trajectories by the RMLP. From a neurophysiologic point of view, we are using a signal processing modeling framework as a tool to tell us about the mapping from neural activity to behavior. We will demonstrate that for BMI experiments the choice of a parsimonious model, the use of the desired response in a model-based framework, and the appropriate use of signal processing theory provide sufficient evidence to present constructs that agree with basic principles of neural control.

Understanding the RMLP Mapping

Network Organization

To elucidate how the RMLP maps the neuronal firing patterns to hand position, we examine the representation space constructed by the trained network.
We interpret the output of the hidden layer PEs as the bases, and the output layer weights as the coordinates of the projection. For tanh nonlinearities, the bases are ridge functions [48], and the input weight vector defines the direction in the large input space along which the ridge is oriented. To motivate the analysis, we consider two networks with hidden layers containing five (best performing network) and one (simplest RMLP architecture) processing elements (PEs). Analysis of the RMLP organization will first be performed for the reaching task, since the simple, repeated movement contains landmarks (see Fig. 3-5) to which we can refer. After this analysis is performed, we will compare the organization to the random movements in the cursor control task.

Understanding the Mapping

We examine the performance of the first layer of the simplest RMLP network by plotting the neuronal projections before entering (preactivity) and after leaving (activity) the hidden layer in Fig. 5-1. We see that the PE is bursting during the reaching instants.9 This observation offers a great simplification of the RMLP analysis because in the single-PE network, Wf reduces to a scalar. Linearizing the input layer equation in (3-18) around zero (the point of maximum slope), we can unfold the recursive equations of the RMLP over time as

y1(t) = f'(0) W1 [ s(t) + f'(0) Wf s(t-1) + (f'(0) Wf)^2 s(t-2) + ... + (f'(0) Wf)^n s(t-n) ]   (5-1)

We see that y1(t) is constructed by exponentially weighting the past neuronal activity by Wf. We have observed that Wf settles to a value that makes this operating point locally unstable (the origin is an unstable fixed point). For example, when the slope of the tanh function at this point is 0.5, Wf settles to 2.1, yielding a pole of the linearized system at 0.5 x 2.1 = 1.05. This locally unstable behavior of the RMLP is the reason for the bursting, and it segments the movements from the arm at rest very accurately. Figure 5-1.
Preactivity and activity in an RMLP with one hidden PE. (Footnote 9: In multiple-PE cases only one PE exhibits bursting, which is another reason for studying this network.)

Figure 5-2. Operating points on the hidden layer nonlinearity (PE input vs. PE output)

Observing Fig. 5-2, the two saturation regions of the PE are obviously stable (f'(x) ≈ 0), and the system state will tend to converge to one of them. The operating point of the hidden PE is controlled by W1 s(t) + Wf y1(t-1). When this value is slightly negative, the operating point moves to the lower negative region of the nonlinearity (labeled Rest in Fig. 5-2). Otherwise, the operating point moves to the upper positive region (Movement). In order to understand how the projected time-varying neural activity vector and the feedback interact to produce the bursting behavior during movement observed in Fig. 5-1, we decompose the first layer of the network into its two components, W1 s(t) and Wf y1(t-1), and plot each of these components in Fig. 5-3 during one movement preceded by a resting period. Notice that when the arm is at rest, both W1 s(t) and Wf y1(t-1) are negative, so according to our analysis the PE is saturated in the negative region. In the figure, we see that every time W1 s(t) approaches zero (e.g., at t = 40) we obtain a rapid increase in the feedback term at the next step (e.g., t = 41), because the operating point approaches the unstable region of the hidden PE and the PE goes briefly out of negative saturation. We further observe that in order for the feedback to remain positive, W1 s(t) must be sustained at a positive value for some samples (e.g., t = 110-130); then the feedback kicks in, amplifies this minute change, and the operating point of the hidden PE goes to the positive saturation, smoothing out the changes in the input. This condition corresponds to the movement part of the trajectory. Therefore, we can think of the feedback loop as a switch that is triggered when the projection of the input W1 s(t) approaches zero and becomes positive.
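This switching behavior can be reproduced with a minimal numerical sketch. The feedback value Wf = 2.1 is taken from the text; the input projection values are hypothetical, chosen only to mimic a rest/movement/rest sequence. With Wf = 2.1 the linearized pole exceeds 1, so the origin is unstable and the feedback toggles the PE between the two saturation regions of the tanh.

```python
# Sketch (hypothetical inputs) of the single-hidden-PE recursion:
#   y1(t) = tanh(W1.s(t) + Wf * y1(t-1))
import numpy as np

def simulate_pe(projection, wf=2.1):
    """Iterate the hidden-PE recursion given the input projection W1.s(t)."""
    y = np.zeros(len(projection))
    for t in range(1, len(projection)):
        y[t] = np.tanh(projection[t] + wf * y[t - 1])
    return y

# Projection: negative at rest, sustained positive during movement, then rest
proj = np.concatenate([np.full(50, -0.8), np.full(50, 0.8), np.full(50, -0.8)])
y = simulate_pe(proj)

# PE sits in negative saturation at rest; the positive feedback amplifies the
# sustained positive projection and drives the PE into positive saturation
print(round(y[40], 2), round(y[90], 2), round(y[140], 2))
```

The sustained sign of the projection, not its magnitude, flips the switch, which matches the interpretation of the feedback loop given above.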
Figure 5-3. Input layer decomposition into W1 s(t) (solid) and Wf y1(t-1) (dashed)

Input Layer Weights

The input neural activity vector controls the RMLP operation by maintaining the PE input around W1 s(t) ≈ 0, which is achieved by placing, through training, the W1 vector perpendicular to the input. This seems an odd choice, since this is the unstable operating point of the dynamics. What does this solution tell us in terms of neural activity? To answer this question, we have to do some further signal processing analysis. Remember that the neural activity is a long vector with 104 entries (neural channels). As we can see from Fig. 5-3, the inner product with W1 is very noisy, reflecting the highly irregular nature of the neural firings. We first computed the norm of the input vector over time, shown in Fig. 5-4. As we can see, the norm of s(t) is mostly constant, meaning that on average the number of spikes per unit time (100 ms) is basically constant. Therefore, we conclude that what differentiates the neural firings during a movement must be a slight rotation of the vector that makes W1 s(t) barely positive. We plot in Fig. 5-5, subplot 1, the angle at successive time ticks (100 ms) between the inputs and the W1 vector to corroborate our reasoning. Notice that immediately preceding and during movement the angle between the input and W1 vectors becomes slightly less than 90 degrees, as expected. The sign change is a result of a slight rotation (92° to 88°) of the neural activity vector when initiating and during movement. The components of this vector are the neuronal firing counts; therefore, either all components change or just a few change. In order to find out which case applies here, we computed the directional cosine change for all 104 neurons during movement.10
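The two geometric quantities used in this analysis, the angle between s(t) and W1 and the direction cosine between successive inputs, can be sketched as follows. The vectors are synthetic; only the 104-dimensional size mirrors the recording, and the small rotation toward W1 is a hypothetical stand-in for the movement-related change.

```python
# Sketch of the angle and direction-cosine measures (synthetic vectors).
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def direction_cosine(u, v):
    """Cosine of the angle between two successive input vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(1)
w1 = rng.normal(size=104)                             # input-layer weight vector
s_rest = rng.normal(size=104)
s_rest -= w1 * np.dot(s_rest, w1) / np.dot(w1, w1)    # make s orthogonal to W1
s_move = s_rest + 0.05 * w1                           # slight rotation toward W1

a_rest = angle_deg(w1, s_rest)        # 90 degrees: projection W1.s is zero
a_move = angle_deg(w1, s_move)        # slightly below 90: projection positive
dc = direction_cosine(s_rest, s_move) # close to 1: only a small rotation
print(a_rest, a_move, dc)
```

A direction cosine near 1 with an angle crossing through 90° is exactly the signature described in the text: the input vector barely rotates, but that rotation is enough to change the sign of the projection onto W1.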
Figure 5-6 shows a plot of the neurons and their average weighted firing during a rest and a movement segment of data. We can observe that neurons 4, 5, 7, 22, 26, 29, 38, 45, 93, and 94 can be considered the ones that affect the rotation of the neural vector the most. Figure 5-5, subplot 2, shows the directional cosine for successive inputs of the subspace spanned by the ten most important neurons, and the directional cosine of the rest of the neural population, consisting of 94 neurons. Basically there is no change in the rotation of the neural activity in the 94-dimensional space, while in the space of the 10 most important neurons the directional cosines change appreciably. (Footnote 10: The ten neurons with the highest weighted average firing rate over a rest/movement window of 150 samples are selected. The weights are the first-layer weights of the RMLP corresponding to each neuron.)

Figure 5-4. Norm of the input vector (104 neurons)

Figure 5-5. Angle between s(t) and W1; direction cosines for successive input vectors s(t) and s(t-1)

Figure 5-6. Selection of neurons contributing the most to input vector rotation

Output Layer Weights

In this single-hidden-PE case, the second (output) layer consists of a 3x1 weight vector used to predict 3D hand position. The network output is always a scaled version of this vector, where the time-varying scale factor is given by the hidden PE output. Hence, this simplified single-hidden-PE network can only generate movements along a 1D space (a line). In light of this 1D result, we proceed with our best performing network, which contains five hidden PEs.11
The weight connections for one of the hidden PE directions are shown in bold in Fig. 5-7. Each hidden PE direction contributes to the X, Y, and Z outputs through the weights Wx, Wy, and Wz. (Footnote 11: For multiple PEs, one eigenvalue of Wf leads to an unstable pole, so the results extend.) Let us relate the output layer weight vectors to the reaching movement. A plot of the hand trajectory in 3D space, along with the weight vectors associated with each hidden PE, is shown in Fig. 5-8. As a first qualitative observation, PE #1 seems to point to the mouth of the primate while PEs #2, #4, and #5 point to the food. We also observe PE #3 to be pointing from the food to the mouth. To quantify these observations, we plot in Fig. 5-8 the principal components (PCs) of the movement (the view is in the direction of the third PC) and compute the angle (in degrees) between the output weight vectors and the principal components. The first two PCs point in the directions of maximum variance (rest-food and rest-mouth) of the movement, while the third PC captures little variance. The corresponding eigenvalues are 568, 207, and 30, indicating that the first two PCs capture 70% and 26% of the movement variance, while the third PC captures only 4%. Since the trajectory of reaching for food and putting it in the mouth essentially lies in a plane, the first two PCs define a minimal spanning set for this movement. PC1 points toward the mouth while PC2 points toward the food. The third PC is orthogonal to this plane. Investigating the angles between the PE weights and the PCs shows that while the directions of PE #1 and PE #3 approach PC1 (with angle differences of 28° and 34°, respectively), the directions of PE #2 and PE #5 approach PC2 (23° and 20°). Finally, PE #4 aligns itself nearly equidistant from PC1 and PC2 (46° and 44°). This indicates that the network output weight vectors organize so that each specializes for either the reach to the food or the reach to the mouth.
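The principal-component computation used above can be sketched on a synthetic planar trajectory (a stand-in for the food-to-mouth reach; the output weight vector is likewise hypothetical): the PCs come from the eigendecomposition of the trajectory covariance, the eigenvalues give the captured variance fractions, and the angle between a weight vector and each PC quantifies its alignment.

```python
# PCA of a synthetic, nearly planar 3D trajectory plus weight-vector angles.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 4 * np.pi, 500)
# Trajectory lying almost in the x-y plane, with small out-of-plane noise
traj = np.column_stack([3 * np.cos(t), 1.5 * np.sin(t),
                        0.05 * rng.normal(size=t.size)])

cov = np.cov(traj, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

variance_pct = 100 * eigvals / eigvals.sum()
print(variance_pct)   # first two PCs capture nearly all the variance

def angle_deg(u, v):
    """Angle between vector directions in degrees (sign-insensitive)."""
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

w_out = np.array([2.8, 0.4, 0.1])             # hypothetical output weight vector
angles = [angle_deg(w_out, eigvecs[:, k]) for k in range(3)]
print(angles)   # small angle to PC1: this vector "specializes" for PC1
```

As in the dissertation's analysis, a weight vector forming a small angle with PC1 and a large angle with PC3 indicates specialization along one movement direction within the movement plane.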
All PEs form large angles with PC3, indicating that the network exploits the fact that the movements lie in a plane.

Figure 5-7. Output layer weight vector direction for one PE

Figure 5-8. Movement trajectory with superimposed output weight vectors (solid) and principal components (dashed). This view is in the direction of PC3

Cursor Control Mapping

One of the most interesting aspects of the RMLP organization is that through the training procedure the input layer weights were adapted so that they are orthogonal to the neuronal input vector. In the reaching task, we suspected that this orientation of the weights, combined with unstable dynamics, was a special solution for a reaching task requiring a quick burst in the change of position. Until the trained RMLP for the cursor control task was analyzed, it remained unknown how the network would solve the mapping problem for smooth trajectories. The same analysis techniques used for the reaching task are presented in Fig. 5-9 for the cursor control task. Here, 150 samples of the smooth trajectory are analyzed and, for simplicity, the decomposition is performed on an RMLP trained with one hidden PE.12 In the first subplot, we see that again the neuronal input vector and the input layer weights are roughly orthogonal. At certain time instances, the angle decreases to less than 90°, and immediately following this crossing a large increase in the weighted feedback results (see Fig. 5-9, subplot 2, dashed line).13 Again we see that the feedback is triggered by a rotation of the neuronal input vector. The projection of the neuronal input vector, shown in subplot 2 (solid line), remains negative for all negative hand positions (Fig. 5-9, subplots 4 and 5). Only when the projection of the input approaches zero does the feedback kick in to amplify the signal and saturate the nonlinearity shown in subplot 3.
Recall that for the reaching task the negative tail of the nonlinearity was assigned to resting movements and the positive tail was assigned to reaches; however, in this cursor control task the negative tail of the nonlinearity is assigned to all negative positions while the positive tail is assigned to positive positions. (Footnote 12: This network achieved testing correlation coefficients of 0.40 and 0.71 for the X and Y coordinates, respectively. Footnote 13: This trend repeated itself for the multi-PE network. The weights for each PE were oriented such that the angle formed with the input was either slightly greater or slightly less than 90°; these differences are a matter of a sign change in the weights.) Analysis of the output layer vectors of the trained RMLP network revealed that again they basically span the 2D space of the cursor movements. Since there are no observable landmarks in this random movement, the analysis of the PE alignment is not as clear.

Figure 5-9. RMLP network decomposition for the cursor control task (subplot 1: angle between input(t) and W1 in degrees; subplot 2: y1 decomposition; subplots 4 and 5: X and Y coordinates)