The UF Time Machine

Permanent Link: http://ufdc.ufl.edu/UFE0042337/00001

Material Information

Title: The UF Time Machine: A Spike Based Computation Architecture
Physical Description: 1 online resource (127 p.)
Language: english
Creator: Garg, Vaibhav
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: analog, circuit, computation, events, fpga, java, neuromorphic, neuron, simulator, spike, spikesim, synapses, usb, verilog, weights
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The purpose of this research is to investigate a general-purpose spike-based computation architecture. The brain consists of atomic units called neurons, which communicate with each other using timing events generally called spikes. Information is encoded in the timing between the spikes. The structure of the neuron network, the efficacy of the connections (weights) between these neurons and the different properties of the neurons together process the information captured by the brain. This mode of computation can be very efficient in terms of power and area, and can help solve complex engineering problems such as speech recognition effectively. This work builds simplified circuit models of these neurons and organizes them into a configurable network. The architecture is called "The UF Time Machine" because it not only operates on timing events but also introduces a novel concept of storing weights in time rather than as an analog or digital value. Instead of delivering an instantaneous spike from one spiking neuron to another, the weight is sent as the pulse width of the spike. This work also presents a comprehensive review of existing spike-based architectures, examining the advantages and shortcomings of each. An analog very large scale integration (VLSI) prototype chip has been designed, fabricated and tested. This chip contains 32 integrate-and-fire neuron circuits and 1024 synapses, all of which can be active at the same time. A digital implementation of the network on a field programmable gate array (FPGA), with 196 neurons and 50,176 simultaneously active synapses of each type (excitatory and inhibitory), has also been realized. It is shown that the FPGA architecture can be scaled to over 6000 neurons and over three million synapses. A digital FPGA controller for routing spikes and setting weights has been implemented which can process up to 34 million synaptic events per second. An extensible Java-based behavioral simulator called "SpikeSim" has been developed to speed up the design and testing of this architecture. To sum up, "The UF Time Machine" provides a densely integrated, highly configurable spike-based computation substrate for implementing spike-based algorithms.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Vaibhav Garg.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Harris, John G.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0042337:00001

Full Text

THE UF TIME MACHINE: A SPIKE BASED COMPUTATION ARCHITECTURE

By

VAIBHAV GARG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010

© 2010 Vaibhav Garg

To my wife, Varsha

ACKNOWLEDGMENTS

I am greatly indebted to my adviser Dr. John G. Harris, firstly for his initial encouragement to enroll in the PhD program and then for providing constant guidance and support throughout my PhD. He has been a patient PhD adviser, a very helpful mentor and a good friend for the past five and a half years. From him I have learned how to be inquisitive, think intelligently and critically view new ideas. He has taught me a lot about neuromorphic engineering and the basics of spike computation. I am very grateful to him for giving me the opportunity to attend the prestigious Telluride Neuromorphic Workshop in Colorado three times, where I met some of the brightest minds in the world and made new friends. Apart from research, he also gave me opportunities to work on other projects and take up internships, which helped in my professional and personal growth.

I would like to thank my committee members Dr. José C. Principe, Dr. José A. B. Fortes and Dr. Arunava Banerjee for providing valuable feedback on my research. I am especially grateful to Tobi Delbrück for answering my questions related to circuit and USB based designs.

My CNEL labmates, especially Manu Rastogi, Jie Xu, Vishnu Ravinuthula and Alexander Singh-Alvarado, have been a constant source of support, motivation and fun at different points during my research. I have gained immensely from my intense discussions with them, which gave rise to new ideas. I especially want to mention Ravi Shekhar, who has been instrumental in the completion of this work. I am very grateful to him for brainstorming ideas with me and for his dedication and support in doing monotonous layouts and late-night wading through Verilog and Java code. I also want to thank Mike Stapelton for his helpful hints on PCB design and Jason Kawaja for always heeding my incessant requests for new software for our CNEL servers.

This thesis would never have been possible without the support and motivation of my lovely wife Varsha. She always encouraged me to give my best and prayed to God for me. She provided the emotional support that I needed and willingly made sacrifices at both personal and professional levels. She has shown tremendous patience with my late nights out for work and with my not having the time to call her when she moved away to get her second PhD. A scholar herself, she will, I am sure, be the happiest person to see me become a PhD.

I would like to acknowledge my funding source, the National Science Foundation (NSF), which partially supported my research through grant number 05412410.

I am indebted to my parents for providing the best education that I could have had and for making me capable of pursuing my higher studies here at the University of Florida. I am also very grateful to Iqbal Qaiyumi and Shaheda Qaiyumi for making me feel at home in Gainesville. As always, I feel God is with me as long as I work hard and follow his path.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 Motivation
  1.2 Outline of the Thesis
2 BACKGROUND
  2.1 Some Important Concepts
  2.2 Single Chip and Multi-Chip Architectures
  2.3 Digital/FPGA Based Architectures
  2.4 Neuromorphic Chip Communication
  2.5 Discussion
3 THE UF TIME MACHINE: A SPIKING NEURON ARCHITECTURE
  3.1 Description
    3.1.1 Neuron, Synapses and Weights
    3.1.2 Communication and Control
      3.1.2.1 Arbitrated address event representation
  3.2 Hardware Implementation
    3.2.1 Mixed-Signal Implementation
      3.2.1.1 Analog spiking neuron array
      3.2.1.2 Custom asynchronous counter design
      3.2.1.3 Analog synapses and digital weights
      3.2.1.4 Custom AER circuit design
    3.2.2 FPGA Based Implementation
  3.3 USB Event Monitoring
    3.3.1 Opal Kelly's XEM3050
    3.3.2 Event Monitoring and Time Stamping
      3.3.2.1 Timestamp and event storage
      3.3.2.2 Monitor state machine
      3.3.2.3 Main state machine
  3.4 Digital Controller
    3.4.1 Connection Table Module
    3.4.2 Weight Buffer Module
    3.4.3 Controller FSM
  3.5 Discussion
4 SPIKESIM: SIMULATOR FOR THE NEURON ARCHITECTURE
  4.1 Software Architecture
    4.1.1 Time Step Simulation
    4.1.2 Event-Based Simulation
    4.1.3 Software Architecture
  4.2 Summary of Features
  4.3 Sample Applications
    4.3.1 Liquid State Machine
    4.3.2 Edge Detection Example
    4.3.3 Playing Card Recognition
5 RESULTS AND PERFORMANCE METRICS
  5.1 Custom analog circuits
  5.2 FPGA-Based Digital Circuits
    5.2.1 Neuron Array
    5.2.2 Controller
    5.2.3 USB communication
  5.3 Scaling and Comparison
  5.4 Discussion
6 CONCLUSION AND FUTURE WORK
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 Truth table for 1 bit counter
3-2 Truth table for clockGen logic for the counter
3-3 Buffer amplifier design characteristics
3-4 Monitor state machine inputs and outputs
3-5 Main state machine inputs and outputs
5-1 Fabricated 32 neuron chip specifications
5-2 Fabricated counter specifications
5-3 Digital neuron array specifications
5-4 Digital logic requirements for the neuron on the Xilinx XC3S4000 FPGA
5-5 Digital controller specifications
5-6 Digital controller speed specifications when using block RAM on FPGA
5-7 Digital controller speed specifications when using external SDRAM
5-8 Digital logic requirements for the controller on the Xilinx XC3S4000 FPGA
5-9 USB timestamp board specifications
5-10 USB timestamp board resource requirements
5-11 Comparison of UFTM with other architectures

LIST OF FIGURES

3-1 Block diagram of spiking neuron network
3-2 Block diagram of a neuron and synapses
3-3 Concept of storing weights in time
3-4 AER block diagram
3-5 Neuron-AER handshake sequence
3-6 Integrate-and-fire neuron circuit
3-7 Simulation results: Integrate-and-fire neuron
3-8 Gate level diagram for one bit counter
3-9 Various small blocks for counter
3-10 Simulation results: Asynchronous counter
3-11 Current DAC schematic
3-12 Current DAC copy cells
3-13 Current splitter buffers
3-14 Simulation: Neuron and synapse circuits simulated together
3-15 AER blocks' schematics
3-16 Simulation: 8 input AER
3-17 Digital neuron block diagram
3-18 Digital simulation of UFTM on FPGA
3-19 XEM3050 board from Opal Kelly
3-20 USB monitoring device overview
3-21 Monitoring Machine state diagram for USB monitor
3-22 Main Machine state diagram for USB monitor
3-23 Digital controller block diagram
3-24 Buffers for digital controller
4-1 A UML diagram example
4-2 Relationship between packages in SpikeSim
4-3 UML notation of SimulationSetup class in SpikeSim
4-4 UML notation of different classes in SpikeSim
4-5 UML notation of NeuronArray class in SpikeSim
4-6 Spiking activity of an LSM
4-7 Circuit used for spike smoothing
4-8 Setup used to achieve negative threshold in edge detection
4-9 Edge detection simulation results on SpikeSim
  A Original image converted to spike timings
  B Smoothed spike timings and detected edge
4-10 Playing cards imaging setup
4-11 Spike generation from time-to-first-spike emulation
  A Edges for 8 of hearts card
  B Spikes from imager for 8 of hearts card
4-12 Time-to-first-spike matrix
4-13 Speed performance of SpikeSim
4-14 Recording internal variables of SpikeSim
5-1 32 Neuron chip micrograph
5-2 Chip results: asynchronous counter
5-3 Chip results: neuron membrane voltage
5-4 Chip results: Variation in spike rates with refractory period
5-5 Chip results: Variation in spike rate with current
5-6 Chip results: Power and energy per spike
5-7 PCB board setup

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

THE UF TIME MACHINE: A SPIKE BASED COMPUTATION ARCHITECTURE

By

Vaibhav Garg

December 2010

Chair: John G. Harris
Major: Electrical and Computer Engineering

The purpose of this research is to investigate a general-purpose spike-based computation architecture. The brain consists of atomic units called neurons, which communicate with each other using timing events generally called spikes. Information is encoded in the timing between the spikes. The structure of the neuron network, the efficacy of the connections (weights) between these neurons and the different properties of the neurons together process the information captured by the brain. This mode of computation can be very efficient in terms of power and area, and can help solve complex engineering problems such as speech recognition effectively. This work builds simplified circuit models of these neurons and organizes them into a configurable network.

The architecture is called "The UF Time Machine" because it not only operates on timing events but also introduces a novel concept of storing weights in time rather than as an analog or digital value. Instead of delivering an instantaneous spike from one spiking neuron to another, the weight is sent as the pulse width of the spike. This work also presents a comprehensive review of existing spike-based architectures, examining the advantages and shortcomings of each. An analog very large scale integration (VLSI) prototype chip has been designed, fabricated and tested. This chip contains 32 integrate-and-fire neuron circuits and 1024 synapses, all of which can be active at the same time. A digital implementation of the network on a field programmable gate array (FPGA), with 196 neurons and 50,176 simultaneously active synapses of each type (excitatory and inhibitory), has also been realized. It is shown that the FPGA architecture can be scaled to over 6000 neurons and over three million synapses. A digital FPGA controller for routing spikes and setting weights has been implemented which can process up to 34 million synaptic events per second. An extensible Java-based behavioral simulator called SpikeSim has been developed to speed up the design and testing of this architecture. To sum up, The UF Time Machine provides a densely integrated, highly configurable spike-based computation substrate for implementing spike-based algorithms.
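
The idea of storing a weight in time can be illustrated with a short sketch. It rests on assumptions of our own, not taken from the thesis: the receiving synapse injects a constant current for as long as the incoming pulse is high, so the charge delivered to the membrane capacitor (Q = I·t), and therefore the voltage step (ΔV = Q/C), grows linearly with the pulse width. The constants SYNAPTIC_CURRENT, TIME_UNIT and MEMBRANE_CAP and the functions encode_weight and membrane_step are hypothetical names chosen for illustration, not values or interfaces from the UF Time Machine.

```python
# Hypothetical circuit constants, chosen only for illustration.
SYNAPTIC_CURRENT = 1e-9   # A, current injected while the pulse is high
TIME_UNIT = 1e-6          # s of pulse width per unit of digital weight
MEMBRANE_CAP = 1e-12      # F, membrane capacitance of the target neuron


def encode_weight(weight: int) -> float:
    """Map a non-negative digital weight to a pulse width in seconds."""
    if weight < 0:
        raise ValueError("weight must be non-negative; sign comes from the synapse type")
    return weight * TIME_UNIT


def membrane_step(width: float, excitatory: bool = True) -> float:
    """Voltage change from integrating a constant current for `width` seconds."""
    charge = SYNAPTIC_CURRENT * width   # Q = I * t
    dv = charge / MEMBRANE_CAP          # dV = Q / C
    return dv if excitatory else -dv


if __name__ == "__main__":
    for w in (0, 1, 4, 10):
        width = encode_weight(w)
        print(f"weight={w:2d}  width={width:.1e} s  dV={membrane_step(width):.1e} V")
```

Under this simplified model, a weight of 4 delivers four times the charge of a weight of 1, and in this sketch the sign of the effect is set by whether the synapse is excitatory or inhibitory rather than by the pulse itself.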

CHAPTER 1
INTRODUCTION

1.1 Motivation

"As engineers we would be foolish to ignore the lessons of a billion years of evolution."
Carver Mead, 1993

Biological systems, especially the animal brain, are known to perform complex large-scale tasks quickly and efficiently, in ways that may never be matched by man-made machines. These biological systems are characterized by performance metrics such as high sensitivity, wide dynamic range and high signal-to-noise ratio. For example, state-of-the-art voice recognition systems are limited to a fixed-size vocabulary and work only at much lower noise levels than the animal auditory system tolerates. These biological systems also have a superior architecture compared to current digital computation machines. The brain consumes only 12 W of power, compared to anywhere between 132 W and 200 W for the latest Intel® processor. The brain is very compact and has very high fault tolerance. Noise is ubiquitous and there are no reliable components. This is in contrast to digital circuits, which must be error-free and have built-in noise margins. With continuing advances in technology, it is becoming clear that Moore's law may not continue to hold and that we may hit fundamental limits of materials, device and circuit engineering. To advance, we need to move away from the von Neumann style of computing and instead explore computation inspired by the animal brain. Computation in the brain is characterized by continuous adaptation, parallel processing and approximate solutions. The question is: can we capture these properties in silicon hardware?

Carver Mead and other researchers realized the similarity between the physics of transistor devices and that of neurons, the basic building blocks of the brain [76]. The functioning of these neurons is very similar to that of analog circuits operating in the subthreshold region, which is characterized by an exponential relationship between current and voltage. This started a new research field called neuromorphic engineering, originally defined as the development of artificial systems that employ the physical properties of, or the information representations found in, biological nervous systems.

Over the past three decades, neuromorphic engineers have built hardware integrated circuits that mimic biology and perform certain tasks such as vision, audition and locomotion. Building hardware is advantageous compared to software emulation due to its compactness, low power consumption and real-time operation. Many neuroscientists and engineers are trying to explore the basic computation techniques of the brain. The brain is composed of layers of neurons, all of which produce an action potential, or spike, that is communicated between neurons through connections called synapses. A variety of neuron arrays have been fabricated with neurons having properties similar to those found in the brain. The circuits that closely model these action potentials and the response of synapses to them require a large number of components and electrical biases which need to be precisely fine-tuned. This is because we still lack the innovations in materials demonstrated by biology. With high variability and practical difficulties in analog circuit fabrication, most researchers find it hard to configure these ICs and to perform useful computational tasks, even those for which the chips were designed. Many scientists have postulated that the real information lies in the timing of the spikes observed in the brain: the shape of the action potential is less important than the precise occurrence of spikes in the network and the modulation of synaptic strength.

In this thesis, a novel time-based computation architecture called the UF Time Machine (UFTM) is presented. It tries to simplify the operation of these neuron arrays while maintaining a rich set of features that allow researchers to implement different algorithms on a hardware-feasible architecture. It provides a general-purpose computation substrate, much as a microprocessor does for software. UFTM tries to take advantage of the rapid advancement of digital circuits to implement complex operations, and of analog circuits to solve the continuous-time differential equations of neuron models. UFTM provides a universal front end for rapid development of algorithms by integrating software emulation, FPGA-based hardware emulation and analog-chip-based emulation. A 32-neuron analog chip with 1024 synapses has been designed to implement the UFTM. A digital implementation with 196 neurons and 100,352 synapses has been built on an FPGA. A digital controller capable of processing 34 mega-events per second has been implemented on an FPGA.

1.2 Outline of the Thesis

Chapter 2 reviews the recent history of spiking architecture design. It reviews some of the recently published analog-chip-based and FPGA-based neural architectures and techniques for neuromorphic chip communication. Chapter 3 describes the UF Time Machine architecture. It describes the various components of UFTM, followed by simulations and fabrication results. It also includes a description of, and simulation results for, the address event representation circuits used for neuromorphic chip communication. Details of the digital implementation of the spiking architecture and of a USB communication module are also presented. Chapter 4 introduces SpikeSim, a Java-based simulator for UFTM. It is a behavioral simulator that can be used for rapid development of algorithms and to provide design insights and requirements for the hardware implementation. Chapter 5 summarizes the results from the fabricated chip and the FPGA-based implementation. This thesis concludes with a discussion and future work in Chapter 6.
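
The synapse counts quoted in the abstract and in Section 1.1 are mutually consistent under one assumption that the text does not state explicitly: that each FPGA neuron has 256 excitatory and 256 inhibitory synapse inputs (50,176 / 196 = 256). The short Python check below makes that arithmetic explicit; the helper names fan_in and scaled_total are illustrative only, and the even split of synapses across neurons is an assumption.

```python
# Figures quoted in the abstract and in Section 1.1.
ANALOG_NEURONS, ANALOG_SYNAPSES = 32, 1024
FPGA_NEURONS, FPGA_SYNAPSES_PER_TYPE = 196, 50_176


def fan_in(total_synapses: int, neurons: int) -> int:
    """Synapses per neuron, assuming an even split (an assumption, not stated in the text)."""
    per_neuron, remainder = divmod(total_synapses, neurons)
    if remainder:
        raise ValueError("synapses do not divide evenly among neurons")
    return per_neuron


def scaled_total(neurons: int, per_type: int = 256, types: int = 2) -> int:
    """Total synapses if every neuron keeps `per_type` inputs of each of `types` kinds."""
    return neurons * per_type * types


analog_fan_in = fan_in(ANALOG_SYNAPSES, ANALOG_NEURONS)      # synapses per analog neuron
fpga_fan_in = fan_in(FPGA_SYNAPSES_PER_TYPE, FPGA_NEURONS)   # per synapse type on the FPGA
fpga_total = 2 * FPGA_SYNAPSES_PER_TYPE                      # excitatory + inhibitory

if __name__ == "__main__":
    print(analog_fan_in, fpga_fan_in, fpga_total, scaled_total(6000))
```

With that fan-in, 196 neurons give 2 × 50,176 = 100,352 synapses, matching the Chapter 1 figure, and 6000 neurons give 3,072,000, in line with the "over three million synapses" scaling claim.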

CHAPTER 2
BACKGROUND

There has been substantial progress in neuromorphic hardware design by various research groups in the last two decades. Hence, there is a large and diverse body of literature related to hardware spiking neurons [14, 41, 56, 58], learning mechanisms in hardware [10, 79, 80], spiking sensors [70, 108, 116] and spiking networks [15, 54, 59, 104, 109]. A comprehensive survey of early neuromorphic VLSI work can be found in Lande [68]. Indiveri et al. [61] and Liu et al. [72] thoroughly review the recent advances in neuromorphic engineering and their applications to building cognitive or intelligent neuromorphic machines. To keep this review relevant to our research objectives, some basic concepts helpful in understanding the literature review are introduced first, followed by some of the recently published general-purpose single-chip and multi-chip spiking neuron architectures. A review of FPGA-based implementations of digital neurons is then followed by a discussion of the available mechanisms for communicating with, and capturing output from, these VLSI hardware chips.

2.1 Some Important Concepts

"The fact that we can build devices that implement the same basic operations as those the nervous system uses leads to the inevitable conclusion that we should be able to build entire systems based on the organizing principles used by the nervous system. I will refer to these systems generically as neuromorphic systems."
Carver Mead, 1990

Mead envisioned that the similarities between biological networks and analog circuits operating in the CMOS subthreshold regime, and between the physics of neurons and that of transistors, could be exploited to develop a new field of engineering [76]. Since 1989, many circuits have been developed in silicon, some of which model biology closely and others of which perform engineering tasks more efficiently than our current systems. The fundamental building block for most neuromorphic systems is the neuron.
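
As a concrete reference point for the neuron models discussed in this section, the simplest of them, a non-leaky integrate-and-fire neuron, fits in a few lines of Python. This is a generic textbook sketch in normalized units (capacitance and threshold of 1), not a model of any specific chip reviewed in this chapter; leak, refractory period and synaptic filtering are deliberately omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntegrateAndFireNeuron:
    """Non-leaky integrate-and-fire neuron in normalized units (illustrative)."""
    capacitance: float = 1.0
    threshold: float = 1.0
    v_reset: float = 0.0
    v: float = 0.0                                   # membrane voltage
    spike_times: List[float] = field(default_factory=list)

    def step(self, current: float, t: float, dt: float) -> bool:
        """Integrate `current` over [t, t + dt); emit a spike and reset at threshold."""
        self.v += current * dt / self.capacitance    # capacitor integrates input current
        if self.v >= self.threshold:
            self.spike_times.append(t + dt)          # record the spike time
            self.v = self.v_reset                    # reset the membrane after firing
            return True
        return False


def simulate(neuron: IntegrateAndFireNeuron, current: float, steps: int,
             dt: float = 1.0) -> List[float]:
    """Drive the neuron with a constant current and return its spike times."""
    for k in range(steps):
        neuron.step(current, k * dt, dt)
    return neuron.spike_times


if __name__ == "__main__":
    print(simulate(IntegrateAndFireNeuron(), current=0.25, steps=12))
```

In this model, doubling the input current halves the interspike interval (a current of 0.5 fires every 2 time units instead of every 4), which is the rate-versus-current behavior expected of an integrate-and-fire circuit.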

PAGE 17

Asimpleneuroncanbethoughtofacurrentintegratorwhichemitsaspikewhenever theintegratedcapacitorvoltagealsocalledthemembranevoltagereachesathreshold. Itisbelievedthatthetimingbetweenthesespikeencodeinformation.Therearemany modelsofneuronsrangingfromconductance-basedmodelswheretheinputcurrentis dynamicallycontrolledbytheintegratedvoltagetoamuchsimplerhardwareplausible model:theintegrate-and-remodelwhichignoresmuchofthedynamicsobservedin biology[see36,44].Thereisanongoingdebateonhowaccuratethebiologicalmodels needtobeinordertoperformusefulcomputations. Neuronsareinterconnectedbybiologicalstructurescalledsynapses.Synaptic connectionsinbiologyarecharacterizedbyaweight,probabilityoftransmissionofa spikeandadynamicresponsetotheincomingspike.Aspikeemittedbyapre-synaptic neurondeliveredtoapost-synapticneuroncausesacurrenttobeinjectedintothe targetneuronlteredbythesynapse'sresponse.Dynamicsynapsesarethought tocontributetocomputationinthebrain.Eachneuroncanhaveahundredsto thousandsofsynapticconnections.Synapsesincircuitscanbesimplyimplemented asinstantaneousmultipliersfollowedbyaddersoramorecomplexcircuitwitha linear/non-linearresponsepossiblywithmemory[14]. Neuromorphicengineersbuildcircuitswhichmodelthevisualandauditory pathways,theolfactorysystemandothermotor-sensoryfunctionsofthebrain.They alsobuildlargespikingneuronarrayswhichareusedtomodelcertainhigher-level functionsofthebrain.Severalalgorithmsbasedonspikingneuronshavealsobeen developedwhichcanbeusedtoaccomplishengineeringtasksmoreeffectively.Ifthe spikingarraysarebuilttobegenericenough,thesealgorithmscanbeimplementedon thesespikingneuronprocessors. MostspikingarraysusethestandardizedAddressEventRepresentationAER[69, 75,102]totransmitinformationoutsidethechip.AERtakesadvantageofthesparse activityobservedinbiology.Ratherthancontinuouspollingsynchronizedtoaclock, 17

PAGE 18

AERoutputstheaddressofanyneuronwheneveritspikesasynchronouslyi.e.itis event-based.Arbitrationisusedasamechanismforcollisiondetectionandresolution. IfthespikingactivityofthenetworkisveryhighthentheperformanceofAERdegrades gracefully[24]. 2.2SingleChipandMulti-ChipArchitectures Inthedesignofalargeneuronspikingarray,themostimportantdecisiontobe madeisthatofnumberofsynapsesandconnectivity.Implementingon-chipconnectivity allowsforhigherbandwidthfortransmissionofspikesasspikesareroutedwithinthe chiplocallyandimplementationofsynapseswithspecialproperties[60,101].Theother alternativeistoimplementvirtualconnectivitywhereeachpre-synapticspikeisbrought outsidethechipandmappedbackontothechipviasomemechanismsuchasalook uptableLUTora1-1mapper.Cauwenberghsetal.[33]presentedsuchachipwith 1024analogVLSIintegrate-and-reneuronsandeachwith128probabilisticsynapses. TheyimplementedtheconnectivityusingaLUTstoredona128K 16RAM.Synaptic efcacy w isexpressedasthecombinedeffectofthreebio-physicalmechanisms: w = abc where a representsthequantalneurotransmitterrelease, b istheprobabilityofsynaptic releaseand c isthemeasureofpost-synapticeffect.Inthisimplementationthey kept c asconstantwith a and b asvariable.Foreveryspike,arandomnumberis generatedandiscomparedtotheprobabilityifthespikeneedstobetransmitted.For communicationofspikes,insteadofusinganarbitratedaddresseventrepresentation AER,apollingmechanismisusedwhichtendstoslowdownthesystem.This systemwasthenusedforaboundarycontoursystemforimagesegmentationand boundarycompletioninthepresenceofclutterandocclusion.Amoreadvancedsystem usingthesameideaoftheLUTwaspresentedbyVogelsteinetal.[110].Calledthe IFATintegrate-and-rearraytransceiver,thesystemconsistedofa60 40array 18

PAGE 19

of conductance-based neurons [111]. The weights in the neuron are implemented using a switched-capacitor array. Following the equation $w = abc$, $c$ is now used to set the weight. The system uses an AER representation to communicate with an FPGA and a micro-controller unit to manage the incoming and outgoing spikes. Each synaptic update is equivalent to instantaneously dumping a quantity of charge onto the membrane capacitor on a clock edge. The IFAT is general enough to allow different algorithms to be programmed on it, but it has a few limitations. Apart from its large size, mismatch in the capacitor array can cause neurons to have very different characteristics. More importantly, the instantaneous update of the membrane voltage tends to destroy the time structure of the spikes that are generated. In other words, the effect of the precise timing of two pre-synaptic spikes is lost, because the neuron is updated only on a clock edge. In every clock cycle only one neuron is updated while the rest sit idle. The authors suggest sending multiple charge packets by varying $a$ and $b$ to emulate some of the dynamics. However, continually updating the probability or weight of a connection in this way would strain bandwidth and resources. Vogelstein et al. [109] later used the IFAT in a multi-chip configuration simulating 4800 neurons with 4 million synapses to implement the first few stages of salience detection and object recognition. Though the number of synapses seems impressive, all synaptic evaluations take place serially. The published system also limits the total number of synaptic connections that can be evaluated at any given time.

Indiveri et al. [60] presented an array of 32 low-power spiking neurons. Each neuron has 8 synapses: 6 excitatory and 2 inhibitory. The synapses are addressable via the AER protocol. A spike generated in the array is captured at the output using AER and can be routed arbitrarily to any of the synapses. These synapses implement some observed biological phenomena, such as short-term depression [117] and long-term depression [19, 115]. Implementing such synapses helps model biology closely but limits the number of synapses because of their
size. Moreover, it is difficult to multiplex these synapses among different neurons for efficient use of resources. Most of these synaptic circuits require a large number of biases common to all synapses, which leads to a parametric search in a very high-dimensional space. Mitra et al. [80] built an advanced version of this chip with 2048 synapses, in which a multiplexer connects a single neuron to 128, 256, 512 or 1024 synapses. Because there are only 2048 synapses, the number of usable neurons drops from 16 to 2 as the multiplexer setting increases. This makes the chip more configurable, but it trades off resources between synapses and neurons. The chip was used to implement a spike-based plasticity mechanism [26]. When each of the two neurons is connected to 1024 synapses, many of those synapses may sit idle, although they could otherwise drive other neurons. As we will see later, advances in high-speed digital VLSI allow synapses to be multiplexed, something that does not occur in biology.

Arthur and Boahen [11] presented an STDP (spike-timing dependent plasticity) chip. It consisted of a 32 × 32 array of excitatory principal neurons commingled with a 16 × 16 array of inhibitory interneurons. Each principal neuron had 21 STDP synapses. A CPLD (complex programmable logic device) mediates the communication of spikes and implements connections using a look-up table stored in a RAM chip. The excitatory neuron itself consisted of circuits for a soma, a synapse, a refractory period, an axon hillock and a calcium-dependent potassium channel. The STDP synapses can be turned on or off from a computer connected to the board through a USB interface. The principal neurons and their STDP synapses were used to demonstrate how STDP compensates for variability in the neuron parameters and how it can be used to learn and recall a pattern. In other work [12], the inhibitory interneurons were used to demonstrate how neurons synchronize in the gamma frequency range. A diffusor circuit connects each inhibitory neuron to its neighbors, realizing all-to-all inhibition. It was shown that the neurons can synchronize through shunting inhibition and synaptic delays. This
chip is only partially configurable, which restricts the connectivity and weights. To mimic biology closely, all circuits use a large number of biases, making it harder to configure the chip for other applications.

Giulioni et al. [48] demonstrated a chip that also implemented the stop-learning mechanism [26] in a network that classifies correlated patterns. The chip consists of 32 integrate-and-fire neurons with spike-frequency adaptation and 2016 Hebbian, bistable, spike-driven stochastic synapses. Each neuron has 31 recurrent synapses connecting every other neuron to it. In addition, there are 32 synaptic connections from outside the chip, to which spikes are delivered through the AER interface. Each synapse has a configuration block called the Type Assignment Circuit (TAC), which uses three bits to define whether the synapse is inhibitory, excitatory or disabled. A 4032-bit shift register loads the configuration information for all synapses. This system, dubbed F-LANN [47], allows a certain degree of configurability but has shortcomings. Connections cannot be made or broken on the fly, because doing so requires re-programming the whole shift register. Also, the synaptic weights are controlled by global analog biases. These limitations make the system suitable only for certain learning tasks, such as the one presented in the paper.

Koickal et al. [67] adopted another approach to reconfigurable neuromorphic computing, similar to field programmable gate arrays (FPGA) or field programmable analog arrays (FPAA) [2]. The authors describe a basic building block, a generic time event block. It consists of a large-time-constant, balanced-structure operational amplifier, a capacitor array, a comparator and two transmission gates. They show that a spiking neuron, a dynamic synapse and a spike-timing dependent learning circuit can each be built by using these blocks multiple times with different connections. Thus a spiking neuron network can be realized from many such blocks with appropriate connections. However, each event block is large because of its analog components. The spike-timing dependent circuit alone requires 16 blocks. The
chip presented used 10 mm² of area to implement just 10 event blocks. The authors do not describe how the connections between these blocks will be made. Nor do they describe how the analog biases and digital bits required by a particular circuit will be supplied to the event blocks.

Schemmel et al. [96] describe a wafer-scale neuron network, FACETS (Fast Analog Computing with Emergent Transient States) [4]. To achieve a high density of neurons and synaptic connections, each FACETS silicon wafer consists of 56 reticles. Each reticle contains eight HICANN (High Input Count Analog Neural Network) chips. Each HICANN chip contains an ANNCORE (Analog Neural Network Core) with 128K synapses and 512 membrane circuits, which can be combined into neurons with up to 16K synapses. The wafers are connected to a motherboard containing FPGAs for inter-chip connectivity. Extensive routing between the chips on the wafer is available and can be programmed using various algorithms [42]. Reticles communicate through a continuous-time event transmission protocol that combines spatial and temporal multiplexing. A single low-voltage differential signaling scheme is used to lower power consumption. Repeaters and signal drivers, together with a complex routing mechanism, allow synaptic events to cross chip boundaries. A post-processing step is required to connect wires between different chips on the same wafer. The FACETS program, with its intricate framework of custom hardware and software, allows large-scale simulations. However, FACETS hardware is very expensive and complicated to develop. Only a few research groups can fabricate it, and only after years of research and development.

2.3 Digital/FPGA-Based Architectures

FPGA-based spiking neuron networks (SNNs) have become an attractive alternative both to software simulation and to dedicated hardware (Section 2.2). Biologically inspired SNNs typically require a large number of neurons to be simulated for a given task, and they need to exploit the inherent parallelism of biological systems. This is computationally very expensive in software simulation
running on computers based on the serial von Neumann architecture. As the number of neurons and interconnections grows, software simulation struggles to keep up in real time unless huge resources such as supercomputers are used [9]. Developing application-specific ICs (ASICs) is not only time consuming and expensive but also inflexible. The neuron model and connectivity on these ASICs are often fixed, so the chips can be configured to run only a small set of tasks. Any change requires a re-design of the ASIC. FPGA implementations, on the other hand, offer a mix of low cost, speed, parallelism and flexibility, but they are limited in computational power and precision. Some recent advances in FPGA-based neuromorphic systems are reviewed here.

Maguire et al. [74] comprehensively review earlier ASIC-, DSP- and FPGA-based implementations of both artificial neural networks and spiking neural networks. Earlier implementations such as NESPINN [63] and MASPINN [97] are accelerator boards that connect to a host computer to speed up SNN simulation. A more recent DSP-based sequential simulation accelerator, ParSpike [113], showed advantages for simulating a large vision network. Roggen et al. [93] reported a cellular SNN model with reconfigurable connectivity implemented on an FPGA. They used 64 neurons to demonstrate an obstacle avoidance task with a robot. Ros et al. [94] published another hardware accelerator board called the RT-Spike. It can simulate 1024 neurons with 4096 connections on a Xilinx Virtex 2000E FPGA mounted on a PCI board, and it communicates with the host computer for routing and learning updates. In the same paper, Maguire et al. [74] present an implementation of a conductance-based leaky integrate-and-fire model with STDP learning on a Xilinx Virtex II XC28000 FPGA. The number of neurons that can be implemented is inversely proportional to the number of synapses. For a biologically plausible ratio of 100 synapses per neuron, a total of 4 neurons can be implemented on each FPGA. To increase the number of neurons, time multiplexing is used. The control
and bookkeeping of this scheme are handled by a MicroBlaze soft processor core. This gave a speed-up factor of about 125,000 for each neuron.

Schrauwen et al. [100] provide an excellent review of architectures used for FPGA implementations. They describe and re-implement two architectural styles for a spiking neural network, then detail an architecture that mixes the two and has certain advantages. The authors are particularly interested in an implementation that runs in real time rather than as an accelerated simulation, and they apply it to speech recognition.

The first architecture is serial processing, parallel arithmetic [93]. Here the synapses are computed serially, but the membrane potential update and reset happen in parallel. The re-implemented design with 200 neurons can be clocked at 100 MHz and is 347 times faster than speech sampled at 16 kHz. The second architecture uses parallel processing, serial arithmetic [46]. Here the synaptic input is processed in parallel using an adder tree, and the membrane potential is updated serially. In the re-implemented version the adder is pipelined, giving a clock speed of 116 MHz for 200 neurons, which is 205 times faster than speech sampled at 16 kHz.

The novel architecture presented is serial processing, serial arithmetic. Here all the information, including synapses, weights, membrane voltage and decay, is stored in memory. Many neurons are multiplexed onto one physical neuron called the Processing Element (PE). A controller manages many PEs. The authors implement a Liquid State Machine [73] on this architecture for speech recognition and can run 1600 neurons at 16 kHz using 40 PEs. By running the FPGA at a much slower clock, the authors trade speed for a larger number of neurons.

A more recent FPGA-based spiking neuron array was presented by Cassidy et al. [31]. A leaky integrate-and-fire neuron with STDP learning is implemented on a Xilinx Spartan XC3S1500 FPGA. The design consists of 32 identical neurons that communicate over an AER [22] bus. An AER re-mapper establishes all-to-all connectivity with a fan-out of four. The authors implement auditory, parameter search
and balanced excitation experiments running 3125 times faster than real time. The authors suggest that time multiplexing could be used, but they do not present a controller design. In [30], an array of dynamical digital silicon neurons implementing the well-known Izhikevich neuron model [62] was presented. The array consists of 32 physical neurons, each multiplexed 8 times, to simulate 256 neurons. It operates 5,000 times faster than real time.

FPGAs offer an acceptable trade-off between configurability and ease of design on one hand and power and area on the other. However, they are not a good solution for massively scaling neuromorphic architectures [74], because they suffer from routing congestion and high power consumption. The hardware must also be reconfigured every time a new neuron or synapse model is implemented. The SpiNNaker chip [89] aims to overcome these limitations with a custom ASIC-based neuromorphic platform for large-scale modeling and for exploring different models. Each SpiNNaker chip contains many ARM9 processor cores running at around 200 MHz. Each core has its own private memory, called Tightly Coupled Memory (TCM), divided into 32 KB of Instruction TCM (ITCM) and 64 KB of Data TCM (DTCM). The ITCM stores all the code that runs on the processor. The DTCM stores the data required for the operation being performed. All cores share, via DMA, an external SDRAM clocked at 133 MHz that holds all synaptic connections and parameters. A System Network-on-Chip (NOC) connects the processor cores to each other, to the SDRAM and to the outside world through the Ethernet protocol. This NOC is a globally asynchronous, locally synchronous block that operates in an independent time domain [88]. It uses delay-insensitive communication to reach speeds of up to 1 Gbps. A second NOC, the Communication Network-on-Chip, connects the processors to a multicast router and carries spikes between cores on the same chip or on different chips. The
multicast router accepts incoming spike events from other cores and routes them according to one of several routing algorithms [86].

SpiNNaker uses event-based processing to achieve real-time simulation in hardware. Each spike is represented as an event packet whose payload contains packet management data. Computation takes place only when an event is generated. In this way, each processor can be multiplexed to simulate many neurons. Their states are updated only when a spike arrives, and they output spikes at intervals of 1 ms. To generalize the implementation of different models, SpiNNaker uses a three-tiered model:

- The lowest tier is the device level. On the processor, interrupt requests and service routines are defined for packet-received, DMA-operation-complete and timer events.
- The next tier is the system level. Here the neural and synapse models are defined, such as spiking neuron models, leaky integrate-and-fire models [87] and STDP synapses [65]. These models invoke the device-level routines as required.
- The top tier is the model level. Here the network connections and configuration are defined, and they invoke the system-level models.

In this way, arbitrary neural networks can be mapped onto SpiNNaker chips [66]. SpiNNaker aims to give the end user a complete software stack. It includes interfaces for defining networks from pre-defined neuron model libraries and for developing new libraries [64]. Most published results are simulation results obtained with Verilog and ARM simulators. Currently, a 2-processor-core chip has been fabricated and a 20-processor-core chip is being fabricated. Each processor can implement about 1000 neurons with 1 ms updates, assuming an average firing rate of 100 Hz. Each router has about 1000 entries, and each neuron has about 1000 synapses.

The SpiNNaker project closely follows the UF Time Machine's aims of configurability, flexibility and generalization. However, it pays for its general-purpose use and massive parallelism with design complexity. Compared with the FPGA and analog chips, these chips are very complicated to design. Also, the neuron outputs are
quantized to a 1 ms clock. Code is easier to write for ARM9 processors. However, these cores are sequential processors, which limits how many neurons can be multiplexed on them for real-time operation.

2.4 Neuromorphic Chip Communication

Communicating with neuromorphic chips is crucial for monitoring chip activity, for sending external input to the network and for communicating with other chips. Wiring and routing limitations in VLSI fabrication restrict the number of connections that can be made on the chip, so connections must be programmed externally. Communication schemes such as random access and time multiplexing have been used between chips. Several techniques for maximizing bandwidth, including polling, free-for-all and arbitration, have been investigated [75, 81, 102]. Boahen [20] compared these approaches. For spiking neuron networks with sparse activity, the arbitrated address-event representation (AER) turns out to provide the highest bandwidth and scalability. The details of AER and its implementation are discussed later. Here, we list some end-to-end systems used to capture spikes, send them to the computer and re-map them onto the chip.

One of the first demonstrations of capturing spikes from neuromorphic chips over USB 2.0 was by Merolla et al. [77]. The authors captured 7 million 16-bit events per second, or 112 Mbit/s. Instead of sending a timestamp with each event, they sent a special heartbeat address at regular intervals. This used less USB bandwidth but required interpolation to reconstruct the timestamps. Berner et al. [17] built a more advanced board with improved functionality. The board allowed precisely timed AER data to be monitored and sequenced simultaneously. A complete host-side software package named jAER [5, 38] captures, sequences and displays events in real time from chips connected to the board.
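
The heartbeat approach of Merolla et al. [77] saves USB bandwidth at the cost of reconstructing timestamps on the host. The publication's exact reconstruction procedure is not described here. The following is therefore only a minimal Python sketch under stated assumptions:

- A reserved address (the hypothetical HEARTBEAT) is sent every PERIOD_US microseconds (also a hypothetical value).
- Events received between two heartbeats are spread evenly over that interval by linear interpolation.

```python
# Sketch: estimate event timestamps from periodic heartbeat markers.
# HEARTBEAT and PERIOD_US are hypothetical values, not taken from [77].
HEARTBEAT = 0xFFFF  # reserved 16-bit address marking a heartbeat
PERIOD_US = 1000    # assumed interval between heartbeats, in microseconds


def reconstruct(stream):
    """Return (address, estimated_time_us) pairs for non-heartbeat words.

    Events between two consecutive heartbeats are spread evenly across that
    interval (linear interpolation). Events seen before the first heartbeat
    or after the last one cannot be bounded and are dropped.
    """
    out = []
    pending = []
    beats = -1  # index of the most recent heartbeat; -1 means none seen yet
    for word in stream:
        if word == HEARTBEAT:
            if beats >= 0 and pending:
                start = beats * PERIOD_US
                step = PERIOD_US / len(pending)
                # place the k-th buffered event at start + k * step
                out.extend((addr, start + k * step) for k, addr in enumerate(pending))
            pending = []
            beats += 1
        elif beats >= 0:
            pending.append(word)
    return out


if __name__ == "__main__":
    words = [HEARTBEAT, 5, 7, HEARTBEAT, 9, HEARTBEAT]
    print(reconstruct(words))  # [(5, 0.0), (7, 500.0), (9, 1000.0)]
```

Every event between two heartbeats truly occurred within that interval, and its estimate also falls inside it. The timestamp error is therefore always smaller than one heartbeat period. A shorter period improves accuracy but spends more USB bandwidth on heartbeat words.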

The CAVIAR project [3] has also developed AER communication hardware. Its first board is a PCI-AER interface board that provides a very fast communication channel between the AER chip and the PC software. It has a configurable timestamp from 30 ns to 480 ns. A portable board, the USB-AER board, was also developed. It can:

- generate synthetic events for testing (AER generator);
- monitor and store AER events from a chip (Data-logger and Player);
- map events from one chip to another (AER Mapper);
- capture image frames from vision chips (AER frame-grabber).

For details about these boards, see Rivas et al. [92] and Paz et al. [84]. Another board, also called the PCI-AER [7], was developed by a group in Rome. It can monitor up to 4 AER chips and has sequencing and mapping capability. Drivers for the Microsoft Windows® and Linux operating systems and MATLAB® interfaces are available. For more details about this board, see Chicca et al. [34] and Dante [35].

These USB-AER boards must also be able to temporally delay spikes. Delays are seen in biology and are useful for certain algorithms and for synchrony [12]. Linares-Barranco et al. [71] implemented a time-warping AER mapper. Besides mapping AER events, this board can delay them, either by a global delay or by an arbitrary delay for each event. The authors state that with arbitrary delays, keeping processing delays short becomes difficult as the number of events or mappings increases.

2.5 Discussion

This review of spiking neuron hardware architectures reveals a number of key themes and trade-offs:

1. Configurable vs. dedicated hardware. Dedicated synapse hardware is attractive because it can model rich dynamics and implement local connections between neighboring neurons. However, giving each synapse an independent weight and changing it is difficult. Fixed connections limit the types of algorithms that can be implemented. Depending on the algorithm, the connectivity pattern and a neuron's fan-in and fan-out may need to change for the hardware to be used efficiently.
2. Flexible communication and learning. Flexible communication allows spikes to be routed arbitrarily within the neuron array and learning or adaptation rules such as STDP to be implemented between synapses. In most of the works reviewed so far, however, learning is limited to a few synapses on the chip, and these synapses share the same parameters. In analog designs the learning mechanisms cannot be changed once the chip is fabricated.
3. Scalability. One objective of these spiking networks is to place a large number of neurons on a chip or in a multi-chip network. Communication between neuromorphic chips then requires additional hardware, which most designs do not consider, with a few exceptions [78].
4. Accurate modeling. Most circuits that model biology closely rely on precise design and control of analog circuits. Because of noise, mismatch, parasitics and crosstalk, these circuits are difficult to design and unreliable. Most designs need considerable effort to search for and set a large number of parameters with high precision before the hardware becomes useful.
5. Parallelism. Analog neuron arrays are powerful because they exploit parallelism. On FPGAs, a spiking network can either simulate every neuron in parallel or multiplex many neurons onto shared resources. The former runs in real time, interfaces with spiking sensors that produce asynchronous output and is simpler to design. The latter is more complex but gives higher throughput.
6. Dedicated versus virtual synapses. Most of the architectures implement dedicated synapses. These require either a global weight or a complex method for setting the weight of a particular connection. SpiNNaker provides a mechanism for instantiating virtual synapses. Virtual synapses allow synapses to be used maximally, because inactive or unused synapses between particular neurons can be reassigned to connect those neurons to other neurons.

Chapter 3 describes a novel spiking architecture called the UF Time Machine (UFTM). It provides a flexible hybrid analog-digital alternative that tackles some of the issues presented above.
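
Item 6 contrasts dedicated and virtual synapses. The following minimal Python sketch illustrates the virtual alternative in the style of the LUT-based systems reviewed in Section 2.2 [33, 110]. Connectivity lives in a table rather than in wiring. As in Cauwenberghs et al. [33], a connection is transmitted only if a random number falls below its release probability $b$, and the efficacy is $w = abc$. The table contents, the names LUT, route_spike and rewire, and all parameter values are hypothetical.

```python
# Sketch of LUT-based virtual synapses with probabilistic release.
# Table contents and names are hypothetical; the efficacy model w = a*b*c
# follows the LUT-based design reviewed in Section 2.2.
import random

# pre-synaptic address -> [(post-synaptic address, a = quantal amount, b = release probability)]
LUT = {
    0: [(1, 2.0, 1.0), (2, 1.0, 0.5)],
    1: [(0, 1.0, 0.25)],
}
C_EFFECT = 1.0  # c, the post-synaptic effect, held constant


def route_spike(pre, rng):
    """Return [(post, charge)] delivered by one spike from address `pre`.

    For each LUT entry a uniform random number is compared with b; a released
    connection delivers a * c units of charge, so the expected charge per
    spike equals w = a * b * c.
    """
    delivered = []
    for post, a, b in LUT.get(pre, []):
        if rng.random() < b:
            delivered.append((post, a * C_EFFECT))
    return delivered


def rewire(pre, entries):
    """Reconnect a pre-synaptic neuron by rewriting its LUT row (no hardware change)."""
    LUT[pre] = list(entries)


if __name__ == "__main__":
    rng = random.Random(1)
    trials = 10_000
    charge_to_2 = sum(q for _ in range(trials) for post, q in route_spike(0, rng) if post == 2)
    print(charge_to_2 / trials)  # close to w = 1.0 * 0.5 * 1.0 = 0.5
```

The table, not the silicon, defines which neurons connect. Rewriting one row therefore reassigns a neuron's synapses without any hardware change, which is the resource advantage item 6 attributes to virtual synapses.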

CHAPTER 3
THE UF TIME MACHINE: A SPIKING NEURON ARCHITECTURE

This chapter presents the design of a novel time-based computational architecture. It first gives a high-level description of the components: neurons, synapses, weights and synapse routing. It then gives a functional description of the digital controller, which is common to both the analog and the digital neuron arrays. A concrete implementation of the neuron array on an analog chip is explained next, including the mechanisms for implementing synapses and weights in analog circuits. A fully digital implementation of the architecture is also described. Implementation details for the USB communication and the digital controller follow. Most of the digital implementation work in Verilog HDL was done in collaboration with Ravi Shekhar, another student in CNEL in the Department of Electrical Engineering at the University of Florida.

3.1 Description

Figure 3-1. Block diagram of spiking neuron network

The architecture is named the UF Time Machine (UFTM) for two reasons. The obvious reason is that it computes with timing events. The other, subtler reason, as will be seen later, is that the weights of synaptic connections are stored in time rather than as an analog or digital value. The idea behind the design of UFTM is a simple one.
It is advantageous to ride the wave of digital technology for fast processing, memory lookup, weight storage and decision making. Analog is then used for what it does best: implementing massively parallel continuous-time differential equations to simulate neurons. Figure 3-1 shows a block diagram of the UF Time Machine. The Neuron Array consists of parallel, independent neurons and their synapses. As discussed in Section 3.1.1, the neuron model can be quite generic and independent of the architecture design. Section 3.1.1 also discusses the synaptic weights, and Section 3.1.2 discusses the synaptic connection mechanism. The neurons communicate their spikes using the arbitrated address-event representation (AER) mentioned in Chapter 2. For continuity, the AER protocol is described in Section 3.1.2.1.

The neuron array can be implemented in analog, with many neurons computing their membrane potentials in parallel and generating spikes. This is best for neuron models with a leak, synaptic and dendritic dynamics and adaptation. Without loss of generality, the neuron array can also be replaced by a digital one on an FPGA or a digital ASIC, as long as it uses the AER protocol to communicate with the controller. The output AER bus connects to a digital controller. The controller acknowledges incoming spikes, looks up connections and weights and sends spikes to the target neurons on the input AER bus. External spiking input can also be fed to the digital controller via an input bus. It may come from a sensor, such as a silicon cochlea or retina, or be generated on a computer. The controller then arbitrates, or prioritizes, between the neuron array and the external inputs.

3.1.1 Neuron, Synapses and Weights

Figure 3-2 shows a block-level diagram of the synapses and a leaky integrate-and-fire neuron. The neuron consists of a capacitor $C$ whose voltage represents the membrane voltage $V_{mem}$. The capacitor integrates input current from a current-based DAC representing the synaptic injection current. Resistance $R$ models the voltage-dependent leak current. Whenever the membrane voltage reaches a threshold $V_{th}$, the comparator fires a spike
and resets the membrane voltage to its initial value, the resting potential of the neuron.

Figure 3-2. Block diagram of a neuron and synapses

For $V_{mid} = 0$, the leaky integrate-and-fire neuron is described by:

$$C\,\frac{dV_{mem}}{dt} + \frac{V_{mem}}{R} = I_{in}$$

Storing and applying a weight for each connection is extremely resource intensive and difficult to implement. That is why most implementations either set a global weight for all connections or group connections into pools with common weights. The weight generally represents the efficacy with which a spike is delivered to a neuron. It can be realized by controlling the amplitude of the current injected in response to an incoming spike. A naive approach stores a digital value in a memory cell for each connection. When a connection is active, the value is routed to the target synapse, where an in situ DAC sets the current magnitude. Alternatively, a single high-speed DAC can be used, with its analog output voltage routed to the target synapse. The former requires very dense routing of digital
weight signals and a small DAC in each neuron. The latter requires an analog voltage to be multiplexed and transferred across the chip to a particular synapse, which is very difficult to do reliably and robustly. A third option is on-chip storage for each synapse, using either a RAM-like structure or analog memories. RAM not only consumes real estate but also needs supporting infrastructure for reading and writing. Analog memories are attractive but unreliable, because they are susceptible to crosstalk, charge injection and leakage. Floating gates can serve as memory elements [55, 105], but they also require extensive infrastructure and a time-consuming programming process [52]. Analog memories [83] require very careful design and can have very high mismatch across a large chip. They also need periodic refresh, which uses complex analog circuits to re-sample the data reliably and store it back.

Figure 3-3. Each spike_on input to the neuron increments the injected current by one unit. After a delay set by the weight in time (e.g. $W_1$), a spike_off is issued and the injected current is decremented by one unit. Two successive spike_on signals increment the injected current by 2 units.

To implement weights more efficiently, we propose a novel scheme that represents weights in time. A variation in synaptic weight means a variation in the amount of charge dumped onto the neuron. The charge profile over time depends on the synaptic dynamics. Since $\int_0^t I\,dt = Q$, we switch on the current $I_{in}$ at each event, keep it on for a time $t$ and then switch it off. For a constant current, the delivered charge is then proportional to $t$. Figure 3-3 illustrates this idea. Two pre-synaptic neurons, $N_1$ and $N_2$, connect to a post-synaptic neuron $N_3$ with weights $W_1$ and $W_2$ respectively. When $N_1$ fires at time $t_0$, a spike_on signal is sent to $N_3$, and the current $I_{in}$ in $N_3$ rises by 1 unit. Subsequently, when $N_2$ also
fires at time $t_1$, another spike_on signal is sent to $N_3$, and $I_{in}$ jumps by another unit to 2 units. At time $t_0 + W_1$, a spike_off signal is sent to $N_3$, and the current falls to 1 unit. The synaptic contribution of $N_1$ is therefore proportional to $W_1$. At time $t_1 + W_2$, the second spike is turned off and $I_{in}$ returns to zero. Contributions from different synapses thus add linearly. Inhibitory connections are implemented by interchanging the spike_on and spike_off actions for a spike. In the controller of Figure 3-1, inhibitory weights can be represented as negative numbers or marked with a special flag. These virtual synapses offer a good compromise between the advantages of dedicated hardware synapses and flexibility.

Figure 3-2 shows how a weight in time can be implemented with an asynchronous counter and a current-based digital-to-analog converter. Every spike_on makes the counter count up by one, and every spike_off makes it count down by one. Swapping the order of the count signals for a spike creates the effect of a simple inhibitory synapse. Each neuron therefore has two addresses: one for count_up and one for count_down. To deliver a spike signal, the corresponding address line receives a short pulse, as shown in Figure 3-1. The counter saturates: once it reaches its maximum or minimum value, it stays there instead of wrapping around. Biology behaves similarly: as the number of input spikes increases, their effect gradually diminishes, or saturates. The number of bits in the counter determines how many synapses of a neuron can be active at the same time. For example, a 5-bit counter allows $2^5 = 32$ simultaneously active synapses. The counter output drives the binary current DAC, as shown in Figure 3-2. For neurons with both excitatory and inhibitory synapses, the MSB of the counter serves as a sign bit. When the counter value is positive, the binary DAC sources current onto the capacitor. When it is negative, the DAC sinks current and the capacitor voltage drops.
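
The mechanism of this section can be checked numerically. The following Python sketch is an illustration only, not the fabricated circuit. It works in four steps:

1. Each spike becomes a spike_on/spike_off pair separated by its weight in time.
2. The pairs drive a saturating up/down counter.
3. The counter value is used as the DAC current.
4. $C\,dV_{mem}/dt + V_{mem}/R = I_{in}$ is integrated with forward Euler, and the neuron fires and resets at $V_{th}$.

The signed 5-bit range (-16 to +15), the unit DAC current and the values of C, R, V_th and dt are assumptions chosen for illustration.

```python
# Sketch: "weights in time" driving a saturating up/down counter, a current DAC
# and a leaky integrate-and-fire neuron  C dV/dt + V/R = I_in.
# Assumptions (illustrative, not the fabricated circuit): unit DAC current,
# forward-Euler integration, a signed 5-bit counter range of -16..+15 and
# arbitrary values for C, R, V_th and dt.
from dataclasses import dataclass


@dataclass
class SaturatingCounter:
    """Up/down counter that clamps at its limits instead of wrapping around."""
    lo: int = -16
    hi: int = 15
    value: int = 0

    def count_up(self):
        self.value = min(self.value + 1, self.hi)

    def count_down(self):
        self.value = max(self.value - 1, self.lo)


def counter_events(spikes):
    """Convert (time, weight, excitatory) spikes into time-sorted counter events.

    Excitatory: count_up at t, count_down at t + weight.
    Inhibitory: the two actions are swapped.
    """
    events = []
    for t, w, excitatory in spikes:
        first, second = ("up", "down") if excitatory else ("down", "up")
        events.append((t, first))
        events.append((t + w, second))
    return sorted(events)


def simulate(spikes, t_end, dt=0.01, C=1.0, R=10.0, v_th=1.0):
    """Integrate the neuron; return (injected charge, output spike times)."""
    counter = SaturatingCounter()
    events = counter_events(spikes)
    v, charge, k = 0.0, 0.0, 0
    fired = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        # apply every counter event scheduled at or before this time step
        while k < len(events) and events[k][0] <= t + 1e-9:
            if events[k][1] == "up":
                counter.count_up()
            else:
                counter.count_down()
            k += 1
        i_in = float(counter.value)       # DAC sources (+) or sinks (-) current
        charge += i_in * dt               # running integral of I_in dt
        v += dt * (i_in - v / R) / C      # Euler step of C dV/dt + V/R = I_in
        if v >= v_th:                     # comparator fires ...
            fired.append(t)
            v = 0.0                       # ... and resets to resting potential
    return charge, fired
```

Take the two excitatory spikes of Figure 3-3 with, say, $W_1 = 2$ and $W_2 = 1$ time units. Calling simulate([(0.0, 2.0, True), (0.5, 1.0, True)], t_end=5.0) returns an injected charge of about 3 units, i.e. $W_1 + W_2$. This confirms that the contributions add linearly and that each is proportional to its weight in time.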

3.1.2 Communication and Control

As shown in Figure 3-1, a digital controller controls and runs the neuron array. This controller is implemented on an FPGA. FPGAs offer flexibility for rapid prototyping and reconfigurability that speeds up the design process. Because the FPGA is programmed in a hardware description language such as Verilog or VHDL, the same code can be used to synthesize and fabricate a custom digital ASIC. Such an ASIC would be faster and smaller than an FPGA. The controller must accomplish three important tasks:

1. Recording: acknowledge spikes from the neuron array. This process does two jobs:
(a) The neuron array and the digital controller are connected via an AER bus. For every spike generated, the neuron array issues a Request and puts the address on the bus, where the controller reads it.
(b) Time-stamp the incoming spike and pass it to a USB communication process, which sends it to a computer for monitoring and recording.
2. Look up connections for a particular pre-synaptic spike. This process does three things:
(a) Mapping and scheduling: look up in RAM the post-synaptic connections, synaptic delays and weights for the incoming pre-synaptic spike address. Queue all count_ups into a weight buffer, or count_downs for negative weights (inhibitory connections). Each event's time is the current time plus the synaptic delay read from the RAM.
(b) For each connection, queue the matching count_downs (count_ups for inhibitory connections) in the weight buffer, each with a time indicating when to send it out. This time is the current time plus the connection's weight in time. An earlier spike can have a larger weight than a later one, so entries must be inserted in a way that keeps the buffer sorted in time.
(c) Monitor the weight buffer and send out spikes on the input AER bus.
3. External inputs: monitor the external input bus for input from other devices such as spiking sensors. Initially, the host computer can provide this input over the USB bus. Mapping and scheduling are processed as described above.
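
Task 2 amounts to maintaining a priority queue keyed by time. The following minimal Python sketch shows one way to do it. CONNECTIONS stands in for the connection RAM; its layout, the names WeightBuffer and map_and_schedule, and all values are hypothetical. The text computes the count_down time from the current time plus the weight. Here we assume instead that the closing event is scheduled |weight| after the opening event (now + delay + |weight|), which agrees with the text when the delay is zero. A binary heap serves as a convenient software stand-in for the time-sorted weight buffer.

```python
# Sketch of controller task 2: mapping, scheduling and a time-sorted weight buffer.
# CONNECTIONS stands in for the connection RAM; its layout and contents are
# hypothetical. Negative weights mark inhibitory connections. We assume the
# closing event is scheduled |weight| after the opening event, i.e. at
# now + delay + |weight|, so the pulse width equals the weight in time.
import heapq
from itertools import count

# pre-synaptic address -> [(post-synaptic neuron, synaptic delay, weight in time)]
CONNECTIONS = {
    0: [(2, 0, 5), (3, 1, -2)],
    1: [(2, 0, 3)],
}


class WeightBuffer:
    """Min-heap of (time, seq, neuron, action); always yields events in time order."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: equal times keep insertion order

    def push(self, time, neuron, action):
        heapq.heappush(self._heap, (time, next(self._seq), neuron, action))

    def pop_due(self, now):
        """Remove and return every (time, neuron, action) with time <= now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            time, _, neuron, action = heapq.heappop(self._heap)
            due.append((time, neuron, action))
        return due


def map_and_schedule(buf, pre_address, now):
    """Queue the opening and closing count events for every connection of one spike."""
    for post, delay, weight in CONNECTIONS.get(pre_address, []):
        start = now + delay
        if weight >= 0:   # excitatory: count_up opens the pulse, count_down closes it
            opening, closing = "count_up", "count_down"
        else:             # inhibitory: the two actions are swapped
            opening, closing = "count_down", "count_up"
        buf.push(start, post, opening)
        buf.push(start + abs(weight), post, closing)


if __name__ == "__main__":
    buf = WeightBuffer()
    map_and_schedule(buf, 0, now=10)  # earlier spike, longer weight on neuron 2
    map_and_schedule(buf, 1, now=11)  # later spike, shorter weight on neuron 2
    print(buf.pop_due(20))
```

In the example, the later spike's count_down for neuron 2 (t = 14) leaves the buffer before the earlier spike's (t = 15). This is exactly the case that forces the buffer to stay sorted in time. Each call to pop_due corresponds to task 2(c): the due events are the ones to send out on the input AER bus.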

Thanks to digital technology and the ease of programming an FPGA, other features such as learning mechanisms can easily be added to this controller. One example is spike-timing dependent plasticity between connections. Such rules have been shown to support learning and adaptation in biology.

3.1.2.1 Arbitrated address-event representation

Analog VLSI neurons, like biological neurons, spike sparsely: from a few Hz to a few hundred Hz. Digital buses running at tens of MHz can therefore time-multiplex the output of many neurons over only a few wires. As mentioned earlier, the address-event representation (AER) is the most efficient protocol for transmitting spikes. AER represents spikes as binary-encoded address events, using $\lceil \log_2 N \rceil$ bits to uniquely identify $N$ neurons. The time at which an address appears on the bus automatically encodes the timing of the spike. When more than one neuron tries to send a spike at the same time (an event collision), an arbitration scheme selects one of them. Because the events are asynchronous, each step of the spike communication uses a handshake. This gives an almost delay-insensitive design. It also avoids the timing jitter that a regularly sampled system would introduce by capturing asynchronous events only at clock edges.

Figure 3-4 shows how the neurons in a neuron array communicate with the AER block. Each neuron has one Request signal and one Acknowledge signal. Each arbiter has two inputs and three outputs and is connected to two neurons. Its two inputs are connected to the two neurons' Request lines, and two of its outputs connect to their Acknowledge lines. The third output drives one of the Request lines of the next arbiter. The arbiters are arranged as a binary tree, as shown in the figure. When a neuron spikes, it raises a request on its Req line, and the nearest arbiter relays the request up the tree. The final arbiter selects itself, and as the select signal
(Ack) propagates downwards, each arbiter that is selected by its parent selects one of its requesting lines arbitrarily. At the leaf nodes only one neuron is selected. In Figure 3-4, blue lines indicate requests made by the neurons. Red lines indicate the path followed by the select (acknowledge) signal to select neuron N1. The arbiter implementation is fair. A neuron's second request is not selected or acknowledged until every other neuron in the tree that requested before that second request has been serviced. Once a neuron is selected, an address encoder (not shown in the figure) puts its address on the bus. An interface block is required between the neuron block and the AER block. It generates the correct handshake signals for the AER block, generates the neuron array's request signal to the digital controller and receives the controller's acknowledge signal.

Figure 3-4. AER consists of a binary tree of 2-input arbiters. Each arbiter takes two requests and relays a request upwards (blue arrows). The top node selects itself and proceeds downwards (red arrows). At each node, arbitration selects one of the requests. Finally, at the leaf nodes one neuron is selected (N1_ack). An interface block implements the handshake between the neuron and the AER block.

Figure 3-5 shows the handshake sequence between a neuron and the AER block. It also shows how the signals for communicating with the off-chip controller are generated. Briefly:

1. The neuron makes a request by taking N_req high.
Figure3-5.Handshakesequencebetweenaneuronandcorrespondingarbiter.Seetext forexplanation 2.Theinterfaceblockmakesarequesttothearbiterimmediatelybytaking Arb req high. 3.Thearbiterselectsthisneuronbymaking Arb sel high.The3processeshappen simultaneously: aTheinterfaceblockacknowledgestheneuronbytaking N ack high.The interfaceblockmakessurethat N ack ismadehighonlywhen Chip ack ishigh i.e.lastcommunicationwiththecontrolleriscomplete. bTheencoderencodestheaddressoftheselectedneuron. c Chip req ismadehighindicatingtothecontrollerthatanewaddressis available. 4.Thecontroller,onseeing Chip req high,takes Chip ack lowcausingtheencodertoput dataontheaddressbus. 5.Simultaneously,onreceivingthe N ack ,theneuronresetsitselfpulling N req low. 6.When N req goeslow,theinterfaceblockspullsthe Arb req low.Theinterfaceblocks makessurethat Arb req ismadelowonlywhen Chip ack islowindicatingthatthe controllerhasacknowledgedreceiptofthechiprequest. 7.Thearbiterremovesitsselectsignal Arb sel causingtwothingstohappen simultaneously: 38
   a. The acknowledge to the neuron is removed by making N_ack low.
   b. The chip request Chip_req goes low.
8. When Chip_req goes low and the controller has read the address, the controller takes Chip_ack high, completing the handshake.

The AER scheme illustrated here is adapted from many previously published works; details about the implementation can be found in [20, 21, 37, 75]. Neurons in the array can also be organized as a two-dimensional array in which each neuron is uniquely identified by a row and a column address. Implementing a two-dimensional AER protocol requires extra logic to be built into each neuron and a more complicated interface block, in contrast to the one-dimensional design presented above. Boahen [22] presented a two-dimensional AER design in which the area overhead of arbitrating, encoding and decoding is reduced from N to √N. The overhead in time can be reduced by exploiting the locality, observed in biology, in the clustered activity of neurons. This is done by servicing all neurons in a particular column before selecting the next column, so that the column address is output only once for all neurons firing in the same column. Later, to further optimize hardware resources and speed up AER event handling, Boahen [23] also presented a design in which all row addresses of the same column are transmitted serially, followed by the column address, reducing the number of hardware pins required for the column address.

3.2 Hardware Implementation

The hardware implementation of the UF Time Machine is divided broadly into two parts. The first is a custom-designed mixed-signal chip, consisting of circuits for the decoder, counters, neurons, current sources and the AER block, designed and fabricated in ON Semiconductor's 0.5 µm technology. As will be seen later, this offers advantages in terms of power consumption and area-to-precision ratio. The second is a completely digital version in which the neuron array, the synapse mechanism and the AER block are implemented on an FPGA. It provides
advantages in terms of ease of development and reconfigurability. This section presents details of both implementations, followed by the controller and USB communication implementations, which are common to both the custom mixed-signal and the FPGA-based implementations of UFTM.

3.2.1 Mixed-Signal Implementation

A custom chip with 32 neurons, each neuron having 32 synapses that can be active simultaneously (16 excitatory and 16 inhibitory), was fabricated and tested. This chip also contained current sources, buffers, a decoder at the input and the AER at the output. The measured results are presented in Chapter 5.

3.2.1.1 Analog spiking neuron array

Figure 3-6. Low-power IF neuron circuit as presented in [60].

Figure 3-6 shows the circuit for the IF neuron that has been implemented in the neuron array. This circuit was proposed by Indiveri et al. [60] and included other
capabilities such as spike-rate adaptation. An input current I_in from pre-synaptic spikes is integrated linearly by the capacitor C_1, giving the membrane voltage V_mem. M17 and M16 form a source follower. If the source follower operates in the sub-threshold region, then V_1 = κ(V_mem − V_th) [72], where κ is the sub-threshold slope factor. As V_mem increases, V_1 increases, approaching the trip point of the inverter formed by M11-M15. As V_1 gets close to the trip point, a sub-threshold current flows in the inverter as M15 is just turning on, and V_2 starts going low. This current is mirrored via M11 and M10 and further increases V_mem, setting up a positive feedback that switches the inverter M11-M15 rapidly and reduces its short-circuit power dissipation. V_th can be used to exercise some control over the spiking threshold of the neuron, as it shifts V_1, which changes linearly in response to changes in V_mem. Inverter M4-M5 shapes voltage V_2 to generate a spike, Req, which is sent as a request to the AER interface block. On receiving Ack, Reset goes high, switching on M3 and M2 and pulling V_mem and V_1 to ground, which resets the neuron. Once the neuron is reset after the receipt of Ack, V_2 goes high and Req goes low, completing the spike and withdrawing the request from the interface block. As V_2 goes high, M14 tries to discharge Reset to ground, but the discharge rate is controlled by the current through M6, which is set by V_rfr. Thus, as long as Reset is sufficiently high, C_1 is held at ground, putting the neuron in a refractory period. M1 and V_lk implement, if required, a constant leak current. M9 is required to control the onset of positive feedback, and M8 controls the onset of reset to maintain the stability of the feedback loop. M17 and M16 have large widths and lengths for better matching and to reduce the effect of a variable sub-threshold slope factor κ. The reset transistor M3 discharges V_mem quickly, but V_1 discharges very slowly because M16 is in sub-threshold. This slows down the reset and causes a large short-circuit current to flow in M11-M15; M2 was added to reset V_1 simultaneously and remove this problem. The
equation for this neuron can be written as [16]:

$$C_1 \frac{dV_{mem}}{dt} = I_{in} + I_{fb} - I_{lk} = I_{in} + I_{sc}\, e^{-\kappa_{16} V_{th}/U_T}\, e^{\kappa_{16} V_{mem}/U_T} - I_{n0,1}\, e^{\kappa_1 V_{lk}/U_T}\left(1 - e^{-V_{mem}/U_T}\right)$$

where:
U_T is the thermal voltage (≈ 25 mV);
κ_i are the respective sub-threshold factors of the transistors;
I_fb is the first-order model of the positive-feedback current that flows in M9-M10;
I_sc is the current that flows in the inverter M11-M15 during positive feedback;
I_{n0,1} is the accumulated pre-exponential factor in the sub-threshold equation for M1.

Figure 3-7. Simulation results using AMI 0.5 µm technology for the neuron presented above. The top-left panel shows how the spiking rate varies with increasing input current. The top-right panel shows how V_th can be changed to control the time of the spike. The bottom panel shows the variation in spike timing with the refractory period set by V_rfr.

Figure 3-7 shows simulation results of the neuron shown in Figure 3-6 in ON Semiconductor's 0.5 µm process. If we ignore the time spent during positive feedback in the circuit, we
can see that the range of the neuron threshold V_thresh depends on V_th (see Figure 3-6) as:

$$V_{th}^{min} + \frac{V_{trip}}{\kappa} \le V_{thresh} \le V_{th}^{max} + \frac{V_{trip}}{\kappa}, \qquad V_{th}^{min} \le V_{th} \le V_{th}^{max}$$

where V_trip is the trip point of the inverter and κ is the sub-threshold factor. For AMI 0.5 µm and V_dd = 5 V, 0.35 V ≤ V_th ≤ 0.7 V, V_trip ≈ 0.77 V and κ = 0.94, which gives 1.17 V ≤ V_thresh ≤ 1.6 V.

3.2.1.2 Custom asynchronous counter design

An n-bit counter is constructed from n counter blocks. The counter is designed to saturate at its maximum and minimum values rather than wrap around, and its transistor count grows linearly with n. The design of this counter was first initiated by Tom Holz at CNEL in the Department of Electrical and Computer Engineering at the University of Florida. Each counter block shares a global count-direction signal and a global clock. The central idea of the counter is that each bit tells the higher bit when to toggle, and the higher bit tells the lower bit when it is saturated and cannot toggle any more. When the higher bit is saturated, the lower bit does not ask the higher bit to toggle. Each bit has two input-output pairs to propagate the toggle and saturation information up and down the counter.

Each i-th bit has an input T_in,i connected to the previous bit's T_out,i-1, an input S_in,i connected to the next bit's S_out,i+1, and a state Q_i. The T_in,i signal tells a bit that it should toggle if it can. The S_in,i signal lets a bit know that the upper bits are saturated. The T_out,i and S_out,i signals respectively ask the next upper bit to toggle and inform the next lower bit whether all the bits above it are saturated. Formally:

$$T_{in,0} = 1, \qquad T_{in,i} = T_{out,i-1}, \qquad S_{in,n} = 1, \qquad S_{in,i} = S_{out,i+1}$$
i.e. the first bit is always asked to toggle and the MSB is always told that the higher bits are saturated. The counter outputs are generated according to:

$$T_{out,i} = T_{in,i} \cdot \overline{(Q_i \oplus Up)} \cdot \overline{S_{out,i}}, \qquad S_{out,i} = S_{in,i} \cdot \overline{(Q_i \oplus Up)}$$

A signal T_i, internal to each bit, indicates whether the bit should be toggled on the next count pulse. A bit's state is updated on every rising edge of the clock according to:

$$T_i = T_{in,i} \cdot \overline{S_{out,i}}, \qquad Q_i \leftarrow Q_i\,\overline{T_i} + \overline{Q_i}\,T_i = Q_i \oplus T_i$$

XORing Q_i with Up enables the counter to work bidirectionally without separate logic for up and down. The truth table for a counter bit is given in Table 3-1, and Figure 3-8 shows the implementation of a one-bit counter.

Table 3-1. Truth table for a 1-bit counter
T_in,i | S_in,i | T_i | T_out,i | S_out,i
0 | 0 | 0 | 0 | 0
0 | 1 | 0 | 0 | $\overline{Q_i \oplus Up}$
1 | 0 | 1 | $\overline{Q_i \oplus Up}$ | 0
1 | 1 | $Q_i \oplus Up$ | 0 | $\overline{Q_i \oplus Up}$

n of these are cascaded in series to build an n-bit counter. The counter outputs Q_i are used to switch the gates controlling the current DAC.

Since activity in the neuron array is asynchronous, or event-based, each counter associated with a neuron needs a separate clock. A counter needs to count up or down only on the arrival of an up or down signal from the controller, so, to make the counter asynchronous, a small logic block was designed to generate a clock rising edge from the Up or Down signal. Figure 3-9 shows the logic blocks used to implement this logic. The truth table for this clockGen circuit is given in Table 3-2.

Figure 3-8. Each one-bit counter receives S_in and T_in and outputs S_out and T_out. Q and $\overline{Q}$ are the state of this bit. Set and Reset are used for asynchronously setting and resetting the bit. CLK and $\overline{CLK}$ cause the bit to toggle on the rising edge.

Table 3-2. Truth table for the clockGen logic of the counter
CntUp | CntDn | Clk | Up | $\overline{Up}$
0 | 0 | 0 | hold | hold
0 | 1 | 1 | 0 | 1
1 | 0 | 1 | 1 | 0
1 | 1 | X | illegal | illegal

As mentioned earlier, the MSB of the counter is used as a sign bit when inhibitory synapses are to be implemented. For this, in an n-bit counter we need:

$$Q_n = 0 \;\Rightarrow\; -Q_i = \overline{Q_i}, \qquad Q_n = 1 \;\Rightarrow\; +Q_i = Q_i$$

because when the counter value is negative (Q_n = 0) we need to connect the appropriate bits of the current DAC to ground (see Figure 3-2) to sink current, and when the counter value is positive (Q_n = 1) we need to source current by connecting the current DAC to the power supply. Figure 3-9 shows the circuit used to implement this. A signed counter that uses its MSB as a sign bit has two zero states, one for each sign, in which the remaining bits encode a magnitude of zero. When the counter is counting down from positive to negative, it will take two count-down
pulses to cross zero and become negative one, and vice versa, i.e. the counter state will go from 1000...1 (+1) to 1000...0 (+0) to 0111...1 (-0) to 0111...0 (-1). To avoid two states for zero, a small logic block is designed which monitors the T_in,n signal and forces the counter to skip one of the states using the Set and Reset signals. Set and Reset can be overridden by the external reset and set signals. The counter is started in the state 1000...0 (+0) by swapping the input Set and Reset signals to the MSB block. Figure 3-9 shows the circuit that is used to implement this block. Formally:

$$Reset = Up \cdot T_{in,n} \cdot CLK + extReset, \qquad Set = \overline{Up} \cdot T_{in,n} \cdot CLK + extSet$$

Figure 3-10 shows simulation results of the counter designed in AMI 0.5 µm technology. The counter is given count-up pulses at regular intervals; it counts up to its maximum value and then saturates.

Figure 3-9. The top block in the figure shows how the Set and Reset signals are generated. The bottom-left block generates the CLK, $\overline{CLK}$, Up and $\overline{Up}$ signals from the input CntUp and CntDn signals. The bottom-right block shows how the appropriate signals for the current DAC to source (+) or to sink (-) are generated.

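The toggle/saturation chain described above can be checked with a small behavioral model. The Python sketch below is a bit-level model of the per-bit equations, not of the transistor-level circuit in Figure 3-8; it assumes Up = 1 means counting up and bit 0 is the LSB, and it leaves out the sign-bit and zero-skipping logic. The helper names `count_pulse`, `to_bits` and `to_int` are hypothetical.

```python
def count_pulse(q, up):
    """Apply one count pulse to the bit list q (q[0] is the LSB) and return the new bits.

    Per-bit logic, following the equations above:
      sat_i   = NOT(Q_i XOR Up)              bit i is at its limit for this direction
      S_out,i = S_in,i AND sat_i              S_in of the MSB is tied to 1
      T_i     = T_in,i AND NOT S_out,i        T_in of the LSB is tied to 1
      T_out,i = T_in,i AND sat_i AND NOT S_out,i
      Q_i    <- Q_i XOR T_i
    """
    n = len(q)
    sat = [1 if q[i] == up else 0 for i in range(n)]

    # Saturation information ripples down from the MSB.
    s_out = [0] * n
    s_in = 1
    for i in reversed(range(n)):
        s_out[i] = s_in & sat[i]
        s_in = s_out[i]

    # Toggle requests ripple up from the LSB.
    new_q = list(q)
    t_in = 1
    for i in range(n):
        t_i = t_in & (1 - s_out[i])
        new_q[i] = q[i] ^ t_i
        t_in = t_in & sat[i] & (1 - s_out[i])  # T_out,i becomes T_in,i+1
    return new_q


def to_bits(value, n):
    """Integer -> list of n bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def to_int(bits):
    """List of bits, LSB first -> integer."""
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    q = to_bits(0, 4)
    for _ in range(20):  # more up pulses than a 4-bit counter can absorb
        q = count_pulse(q, up=1)
    print(to_int(q))  # 15: saturated instead of wrapping
    for _ in range(3):
        q = count_pulse(q, up=0)
    print(to_int(q))  # 12
```

Because the saturation flag travels down from the MSB while the toggle request travels up from the LSB, no bit ever needs global knowledge of the count; this is what keeps the transistor count linear in n.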

Figure 3-10. 8-bit counter simulated with only CntUp given at regular intervals. All the signals stay high after the counter saturates.

3.2.1.3 Analog synapses and digital weights

To implement the current DAC for synapses as shown in Section 3.1.1, a current splitter circuit is used which splits a master current generated by a bootstrapped current source [39]. Figure 3-11 shows the circuit for the DAC; the reader is referred to Delbrück and van Schaik [39] for more details. Briefly, the circuit consists of a splitter chain which, like an R-2R ladder, can exactly split a current into binary-weighted currents irrespective of the transistors' region of operation [28]. The circuit shown in the figure has been slightly modified so as to create currents in both directions, which enables us to implement excitatory and inhibitory connections. A bootstrapped master current bias, based on a Widlar current source [112], is used to supply current to the current splitter chain. For
operation in subthreshold in Figure 3-11:

$$I_m = \frac{U_T \ln M}{\kappa R}, \qquad g_m = \frac{\kappa I_m}{U_T} = \frac{\ln M}{R}$$

where:
I_m is the master current that is to be split;
M is the ratio of W/L of transistors M_n21 and M_n21;
U_T is the thermal voltage;
R is the resistance in the figure;
g_m is the transconductance of transistor M_n21;
κ is the subthreshold factor.

Figure 3-11. A current bias generator with binary current splitters to build a current DAC. It has two splitter chains to generate a sinking current or a sourcing current.

Hence, apart from U_T and κ, the current output of the DAC depends only on the ratio M and the resistance R, and is therefore configurable via R. It is also a constant-g_m circuit, since g_m = ln M / R has no first-order dependence on temperature. A
start-up circuit is required to kick the master bias circuit out of a stable state where I_m is almost zero.

For each neuron, we create a copy of all the branches of the current splitter. We then use the source switching technique [32] to select a particular combination of currents depending on the counter output, as described in Section 3.1.1. Figure 3-12 shows how the currents are sourced into (right) and sunk from (left) the membrane capacitor. The inputs -Q_0 to -Q_{n-1} and +Q_0 to +Q_{n-1} are derived from a different circuit, shown in Figure 3-9. The bias voltages V_bn to V_bn0 and V_bp to V_bp0 are derived from the current splitter circuit in Figure 3-11.

Figure 3-12. Circuits used to copy current from the current DAC to each neuron. The left block is used for sinking current when the counter is negative and the right block is used for sourcing current for positive values of the counter.

One problem in copying currents with source switching is feed-through due to parasitic capacitance. The counter produces a very sharp rising edge at the source of the copy transistors Q_i in Figure 3-12, which causes a transient in the mirrored voltages v_bni and v_bpi via the gate-drain or gate-source parasitic capacitance. These transients take a long time to die out, as the only discharge path is through the parasitic C_gd or C_gs via the very small current that is being mirrored. This causes an erroneous value of current to be copied from the master splitter. To alleviate this problem, each of the currents generated by the master splitter is buffered; this is done once per chip. A folded cascode amplifier [8] is used here, as it can provide a very high gain-bandwidth as well as an input voltage range extending beyond one of the power rails, depending on the type of the input differential pair. This is important because we
generate very small currents from the splitter, which cause the gate voltages of the current mirrors to be buffered to sit very close to the power rails. Figure 3-13 shows the circuits that were designed and Table 3-3 lists some simulated characteristics of the buffer. Emphasis was placed on very low offset and low bias current during design.

Figure 3-13. Folded-cascode-based buffers are used in unity-gain configuration to buffer the gate voltages generated by the current splitter circuit in Figure 3-11.

Table 3-3. Buffer amplifier design characteristics
Characteristic | Value
Power supply | 5 V
Bias current | 10 µA
DC open-loop gain | 71.63 dB
Bandwidth | 12.41 MHz
Phase margin | 75.31°
Offset (Monte Carlo) | Max: 20 mV, μ = 423 µV, σ = 8.76 mV
Max offset (mismatch analysis) | 11.6 mV

Figure 3-14 shows a neuron that is simulated together with an 8-bit synapse.

3.2.1.4 Custom AER circuit design

Figure 3-15 shows the schematic for the implementation of the AER scheme shown in Section 3.1.2.1. The two-input arbiter is shown in the left block of the figure. The arbiter is pipelined for higher efficiency, and it is designed to use the same selection signal from the upper level (Arb from next level) to process the daughter cells making requests at the same time. Also, no new request is accepted from the daughter cells before any other pending requests are serviced. This makes sure that the arbitration is fair, i.e. a neuron gets to transmit a second time only when all other neurons requesting after its first spike and before its second spike have been serviced. The reader is referred to Boahen [21] for more details about the implementation. The interface circuit, shown in the top right, relays the neuron request to the arbiter and, in the reverse direction, the arbiter select signal to the neuron. The address encoder, the bottom-right block, is a 1-in-m one-hot binary encoder which encodes m neurons into an N-bit address, N = log2 m.

Figure 3-14. An 8-bit counter and current splitter are used for synapses with the neuron. Regular up-count pulses are sent to the neuron. The spiking frequency increases with time, as the injected current increases with the increasing value of the counter, until the counter saturates. S_7,out shows when the counter is saturated.

Figure 3-15. The left block shows a two-input arbiter. The top-right block shows the interface circuit. The bottom-right block is the implementation of the address encoder.

Figure 3-16 shows simulation results of the AER blocks implemented in ON Semiconductor's 0.5 µm process. Two neurons generate requests, N1_req and N2_req, at the same time. The arbiter tree selects neuron 1 and raises Chip_req, and after it receives an acknowledgment from the receiver it services neuron 2. The address encoder outputs the addresses accordingly.

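The fairness rule and the 1-in-m encoder can be illustrated with a small Python model. It is a sketch under stated assumptions, not the circuit of Figure 3-15: m is a power of two, each 2-input arbiter favors its left (lower-address) child when both request (the text only says the choice is arbitrary), and fairness is modeled by latching a batch of requests and holding back new ones until that batch has drained. The names `arbitrate` and `FairArbiterTree` are hypothetical. Because each level of the tree contributes one left/right decision, the root-to-leaf path spells out the selected neuron's log2 m bit address.

```python
def arbitrate(requests):
    """Descend a binary tree of 2-input arbiters and return the selected leaf address.

    requests: list of 0/1 request lines; its length m is assumed to be a power of two.
    Each left (0) / right (1) decision is one address bit, MSB first, so the path
    through the tree is the binary address that the encoder would output.
    Returns None when no line is requesting.
    """
    if not any(requests):
        return None
    lo, hi = 0, len(requests)
    addr = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        go_right = not any(requests[lo:mid])  # left child wins a tie (assumption)
        addr = (addr << 1) | int(go_right)
        lo, hi = (mid, hi) if go_right else (lo, mid)
    return addr


class FairArbiterTree:
    """Batch-fair arbitration: requests that arrive while a batch is pending wait."""

    def __init__(self, m):
        self.m = m
        self.latched = [0] * m  # requests currently competing inside the tree
        self.waiting = [0] * m  # requests not yet admitted to the tree

    def request(self, neuron):
        self.waiting[neuron] = 1

    def service(self):
        """Select and acknowledge one request; return its address or None."""
        if not any(self.latched):  # tree idle: admit everything that has arrived
            self.latched, self.waiting = self.waiting, [0] * self.m
        addr = arbitrate(self.latched)
        if addr is not None:
            self.latched[addr] = 0  # acknowledged neuron withdraws its request
        return addr


if __name__ == "__main__":
    arb = FairArbiterTree(4)
    arb.request(0)
    arb.request(3)
    print(arb.service())  # 0
    arb.request(0)        # neuron 0 spikes again before neuron 3 is serviced
    print(arb.service())  # 3: the second request from neuron 0 has to wait
    print(arb.service())  # 0
```

Without the batch rule, the left-first tie-break would grant neuron 0 the bus on every spike and starve neuron 3; latching a batch is one simple way to obtain the behavior described in the text.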

Figure 3-16. Simulation of AER circuits showing the handshake as explained in Section 3.1.2.1.

3.2.2 FPGA Based Implementation

Figure 3-17. Digital implementation of a leaky integrate-and-fire neuron with refractory period.

A complete digital implementation of the neuron array, decoder and AER has been done using Verilog HDL and implemented on a Xilinx FPGA-based board made by Opal Kelly, described later in Section 3.3.1. As discussed previously, an FPGA-based implementation decreases development time and provides faster results and flexibility in design. The neuron hardware can be multiplexed many times owing to the higher speed of digital FPGAs, and a new configuration of the neuron can easily be downloaded onto the FPGA. Figure 3-17 shows the block diagram of the leaky integrate-and-fire neuron that has been implemented. An asynchronous saturating counter accepts either an up or a down signal, which comes from a digital decoder. The counter is a signed saturating counter whose MSB is used as a sign bit. The value of the counter is accumulated at a clock speed of f_NCLK: it is either added to or subtracted from the accumulator depending on the sign bit of the counter. The output of the accumulator is compared to a pre-determined threshold. If the value is higher than the threshold, a spike request is generated and forwarded to an AER block. The neuron waits for its request to be serviced and, on receiving the acknowledgment from the AER, issues a reset of the accumulator. If the leak is enabled for a neuron, then the value of the accumulator is right-shifted by an amount specified by the user in the leak register and subtracted from the accumulator. This method of calculating
the leak, though approximate, saves a lot of expensive hardware and time by using a power of 1/2 instead of an exponential. This approach is used in many previous digital implementations, such as in Schrauwen et al. [99]. If a refractory period is also enabled, then the refractory block is activated whenever there is a spike and keeps the accumulator reset until the refractory period is over. The refractory period is measured by a counter in the refractory block that starts when the neuron spikes and counts up at a particular rate until it reaches a pre-defined refractory value; the refractory period is over once the counter reaches this value, and the reset is removed.

The threshold and the leak factor of the neuron have to be scaled to account for the values of current and capacitance so that the spike timings are correct on the real-world time scale. The digital neurons are meant to run on a real time scale. This means that, unlike a simulation on a computer, the neuron does not speed up its calculation but integrates over the same time as an analog neuron. This allows us to interface this digital neuron chip with spiking sensors that produce inter-spike intervals on a real-time scale. The dynamic range of inter-spike intervals that can be generated by this neuron depends on the width of the accumulator, W_acc bits, and the neuron clock f_NCLK, which determines how fast the state of a neuron is updated. For a constant firing rate with DC input, T_min, the minimum inter-spike interval, occurs when the counter is at its maximum value and the threshold is set at its minimum nonzero value; T_max, the maximum inter-spike interval, occurs when the counter is at its minimum nonzero value and the threshold is set at its maximum value. So for a counter with a width of W_ctr bits:

$$T_{max} = \frac{2^{W_{acc}} - 1}{f_{NCLK}}, \qquad T_{min} = \frac{1}{f_{NCLK}\left(2^{W_{ctr}-1} - 1\right)}$$

For example, for an f_NCLK of 1 MHz and a 16-bit accumulator, the maximum spike interval for constant firing is about 65 ms. f_NCLK determines the resolution of the neuron-equation update. It also determines the timing precision of the effect of an incoming spike to a
neuron, i.e. even though the counter is asynchronous, the neuron takes an incoming spike into account only at the next clock. So there is a trade-off between the precision of the neuron updates and the dynamic range of the inter-spike intervals. Typically the neurons fire at a very slow rate, and thus very high precision is not required for neuron updates. For known values of the capacitance, the threshold and the injection current of an analog neuron, the threshold of the digital neuron can be scaled according to equation 3.

The neuron clock can be run much slower than the FPGA supports to save power and increase the dynamic range. However, the digital decoder and the AER can be run as fast as possible, as their interaction with the neuron is asynchronous. So multiple synaptic inputs that arrive at a neuron at the same time can be delivered to the asynchronous counter with very high precision, and they all appear to arrive simultaneously at the neuron running on the slower clock. The digital implementation of AER is very similar to the one discussed in Section 3.2.1.4. However, instead of computing the state of all the arbiters in one clock cycle, each depth of the tree takes one cycle to compute. This allows a faster clock for the AER and higher throughput, though a single request takes as many cycles to propagate down the tree as the depth of the AER tree. The different clocks in the FPGA are generated using the digital clock managers (DCMs) available on Xilinx FPGAs [114].

Figure 3-18 shows a typical simulation of the whole system developed on the FPGA. The up-count synapses for neurons 6 and 7, identified by synapse numbers 12 and 14 respectively, are given 3 up-counts each by placing their address on the bus and sending the decoder enable signal. The value of the counters is then held constant by not sending any further up or down counts. The refractory period is set to 4 clocks and the leakage factor is set to 4, which means that 1/16 of the accumulator's value is subtracted from it at each neuron clock. The shape of the membrane voltage curve is very similar to that of an RC charging circuit. When the membrane reaches the set
threshold of 50, the neuron fires and creates a request. The two neurons spike very close together but at different times, owing to the difference in the times of their inputs. The AER receives two requests, services them one by one and puts the addresses of the neurons on the AER address bus in the order in which the spikes were created.

Figure 3-18. Digital simulation of neuron array, decoder and AER.

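A cycle-level Python model of the neuron in Figure 3-17 makes the update order and the interval bound concrete. This is a sketch, not the Verilog design: it assumes that on each neuron clock the counter value is added, the shift-based leak is then subtracted, and a spike fires when the accumulator reaches the threshold; the AER acknowledge is treated as immediate; the accumulator is clamped to the range 0 to 2^W_acc - 1; and a refractory setting of r clocks holds the accumulator at zero for r clocks after a spike. The names `simulate_neuron` and `t_max_seconds` and their parameters are hypothetical.

```python
def simulate_neuron(counter, threshold, n_clocks, leak_shift=None, refractory=0, w_acc=16):
    """Cycle-level LIF sketch; returns (spike_clocks, accumulator_trace).

    counter is the (signed) synaptic counter value, held constant here.
    """
    acc_max = (1 << w_acc) - 1
    acc = 0
    refr_left = 0
    spikes, trace = [], []
    for t in range(n_clocks):
        if refr_left > 0:
            refr_left -= 1  # refractory period: accumulator held in reset
            acc = 0
        else:
            # Add the counter value (subtract if negative); clamp to the accumulator range.
            acc = min(max(acc + counter, 0), acc_max)
            if leak_shift is not None:
                acc -= acc >> leak_shift  # leak as a power of 1/2 of the accumulator
            if acc >= threshold:
                spikes.append(t)
                acc = 0  # reset on the AER acknowledge, assumed immediate
                refr_left = refractory
        trace.append(acc)
    return spikes, trace


def t_max_seconds(w_acc, f_nclk):
    """Longest constant-input interval: counter = 1, threshold = 2**w_acc - 1."""
    return ((1 << w_acc) - 1) / f_nclk


if __name__ == "__main__":
    spikes, _ = simulate_neuron(counter=5, threshold=50, n_clocks=40, refractory=4)
    print(spikes)  # [9, 23, 37]: 10 clocks to reach threshold, plus 4 refractory clocks between spikes
    spikes, _ = simulate_neuron(counter=8, threshold=50, n_clocks=10, leak_shift=4)
    print(spikes)  # [7], one clock later than the leak-free case
    print(t_max_seconds(16, 1e6))  # about 0.0655 s, the 65 ms quoted above
```

The shift-based leak grows with the accumulator, so it slows the approach to threshold the same way an RC charging curve flattens; in this model the threshold and leak settings therefore have to be chosen together for a given counter value.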

3.3 USB Event Monitoring

An important component of the UF Time Machine is the USB monitoring device. It is required to record spikes that occur within the neural array so that a user can observe what is happening inside it. The monitoring device captures the address of the spiking neuron, adds a timestamp and sends it to the computer for further storage and visualization. The device has to be capable of transferring data at high rates with reasonable timestamp resolution, and must provide a mechanism for real-time visualization. The Universal Serial Bus (USB) interface is a popular choice for data communication, as it is ubiquitous in computers and provides acceptable data rates (up to 40 MBps for USB 2.0) compared to serial or parallel ports. The pioneering work on such a device was first published by Berner et al. [17]. As mentioned earlier, this was a custom-designed board for which firmware on the device side and software on the computer had to be developed. This not only requires expertise with the USB protocol, but is also time-consuming and difficult for other users to reproduce. We adopted a different approach, buying an off-the-shelf board from Opal Kelly [6] on which most of the code for USB communication is already implemented. The finite state machines which control the timestamping and the handshake with the AER interface have been adapted from the source code of the USB-AER board available at [5], which is explained in great detail in [18]. Some modifications have been made to adapt the USB-AER design to work on the Opal Kelly boards.

3.3.1 Opal Kelly's XEM3050

Figure 3-19 shows the Opal Kelly board which has been used extensively in the development process. The board has a Xilinx Spartan-3 FPGA with 4 million logic gates. It has a Cypress FX2LP USB microcontroller and up to 64 MB of SDRAM. It also has a configurable Cypress Semiconductor PLL which can produce clocks up to 150 MHz. Details about the board are available online at Opal Kelly's website. The
board has two high-density 80-pin connectors to which the FPGA's I/Os, the on-board regulators and the PLL outputs are connected.

Opal Kelly provides a very good application programming interface (API) named FrontPanel for communicating with and controlling the board. Instead of using the slow JTAG programming interface, FrontPanel provides libraries and utilities to download the FPGA configuration directly via USB. On the computer side, the low-level drivers are already provided, with wrapper interfaces available in C, C++, MATLAB®, Java and Python. On the board side, FrontPanel contains pre-compiled Verilog code which takes care of setting up and communicating with the Cypress USB microcontroller. The board ships with the firmware for the USB microcontroller preloaded.

Figure 3-19. Xilinx FPGA-based board with USB microcontroller. Source: www.opalkelly.com

USB offers four types of transfers [13], one of which, the control transfer, is used during initial identification and configuration. The other three data transfer modes, Bulk, Interrupt and Isochronous, are used with devices such as external hard drives, mice and keyboards, and streaming audio, respectively. The first two data transfer types are used extensively. Interrupt transfers offer guaranteed latency but are very slow compared to bulk transfers. Bulk transfers are designed to transfer data in bursts of 512 bytes using the idle time of the USB bus. The theoretical maximum for bulk transfers is 53.248 MBps, and a high-speed host
can achieve up to 38 MBps. The FrontPanel API abstracts these underlying mechanisms on the device side, so that code running on the FPGA can easily transfer data to the computer while treating the whole USB communications controller as a black box. These abstractions are:

1. Wires: There are sixteen 16-bit wide buses whose values can be set or read from the computer (host) side by giving the address of the wire and then setting or reading the value, depending on the direction. All wire values are updated at once by issuing a command from the host side. These are useful for setting constants, reading status words, etc.
2. Pipes: Pipes use the bulk transfer mode to transfer data over USB. Sixteen pipes are available, which can be linked to a buffer or memory block on the device side or on the computer side. The device-side user code needs to be ready whenever a pipe transfer is requested, either by providing data at a given clock rate or by reading the data at the USB clock rate (48 MHz).
3. Triggers: Triggers are synonymous with switches on a board. They are active for only one clock cycle. They can be used to start or stop an operation, or to signal that a state has been reached.

3.3.2 Event Monitoring and Time Stamping

Figure 3-20 shows how the USB monitoring machine is organized on the FPGA board. The function of the board is performed by three separate state machines, namely the Monitor Machine, the ReadFromUSB Machine and the Main Machine. The Main Machine is responsible for synchronizing the operations of all machines. The Monitor Machine is responsible for receiving requests from an AER interface, latching the address and returning the acknowledgement as soon as possible. The ReadFromUSB Machine waits for the read signal from the host computer and transfers the stored events from on-board memory to the computer via the Opal Kelly USB interface. The events are stored using a ping-pong buffering scheme in the Block RAMs present on the FPGA. A timestamp counter with configurable frequency is used to timestamp the incoming events. A brief description of each of the blocks is given below.

Figure 3-20. Top-level diagram showing the components of the USB monitoring device with the signal flow between them.

3.3.2.1 Timestamp and event storage

As described in [18], this implementation also uses a 16-bit timestamp. The timestamp clock generator can generate timestamp clocks by dividing the main FPGA clock by an integer between 2 and 127. This divider is set via a wire from the FrontPanel API, making the timestamp resolution configurable. The timestamp counter can overflow very quickly (65 ms for a 1 µs, 16-bit timestamp). Using a wider timestamp wastes bandwidth, as most of the higher bits will be zero. To overcome this, a special overflow event is sent with the MSB set to 1, indicating to the host that an overflow has occurred. The host then unwraps these events into 32-bit timestamps, taking care of the overflow. The overflow detection mechanism is the same as in [18].

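On the host side, unwrapping amounts to keeping a running offset. The Python sketch below assumes an already decoded event stream in which each entry is either an `(address, timestamp16)` pair or an overflow marker, and that each marker stands for one full wrap of 2^16 timestamp ticks (consistent with the 65 ms figure at 1 µs resolution); how the marker bit is packed into the event word is left abstract. `OVERFLOW` and `unwrap_timestamps` are hypothetical names.

```python
OVERFLOW = "overflow"  # hypothetical marker emitted by the event decoder for a wrap event


def unwrap_timestamps(events, wrap=1 << 16):
    """Turn (address, 16-bit timestamp) events into (address, 32-bit timestamp) events.

    Every OVERFLOW marker advances a running offset by one wrap period.
    """
    offset = 0
    out = []
    for ev in events:
        if ev == OVERFLOW:
            offset += wrap
            continue
        addr, ts16 = ev
        out.append((addr, (offset + ts16) & 0xFFFFFFFF))  # keep 32 bits
    return out


if __name__ == "__main__":
    stream = [(5, 100), OVERFLOW, (6, 10), OVERFLOW, OVERFLOW, (7, 0)]
    print(unwrap_timestamps(stream))  # [(5, 100), (6, 65546), (7, 196608)]
```

Two consecutive markers with no event between them simply advance the offset twice, which is why the FPGA must send a marker on every overflow even when the array is silent.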

The implementation in [18] implements the firmware on the USB microcontroller and has access to the buffers on the host side, which are filled by concurrent threads managed by the operating system's kernel. This functionality is provided by Thesycon's USB drivers. Our implementation works one level up in abstraction and does not have access to these buffers. This means that the events have to be buffered on the FPGA itself and transferred to the computer when a pipe read is issued. The maximum sustained event rate that can be achieved is thus a function of the available memory on the FPGA. We have implemented a ping-pong buffering scheme to maximize the throughput: while one buffer is being read (ping), the other buffer is being filled (pong). The buffers switch when the next read is issued, irrespective of whether the buffer is full or not. We used Xilinx's CORE Generator tool to generate the on-board 32-bit wide simple dual-port Block RAMs. Using these RAMs we can write on one port and read on the other. The writing width is 32 bits, storing the address and timestamp in just one clock cycle. The reading width is 16 bits, as the pipe transfer width using FrontPanel is 16 bits; hence the number of read clock cycles used for piping data out of the FPGA is twice the number of events.

The Opal Kelly FrontPanel API requires that the user specify beforehand how many bytes are going to be piped out from the FPGA during a pipe read. Hence we have to know how many events have occurred before reading. This may seem like an overhead, but it turns out that it can be used to control the rate at which the events are read out of the FPGA. This is achieved as follows. The current value of the event counter, which keeps track of how many events have occurred since the last read, is always available on a wire through the FrontPanel API. The software on the host side can periodically check the number of events and issue a pipe read if it is greater than a user-specified value. When a read is issued, a trigger is sent first, which switches the buffer and updates the number of events on a separate wire, as it may have incremented since the last peek. The software then reads the final count and issues a pipe read for that many events.

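The buffer swap and the count-then-read protocol can be simulated end to end in Python. Nothing here is the FrontPanel API: `PingPongEventBuffer`, `poll_once` and their methods are hypothetical stand-ins for the wire (event count), trigger (swap) and pipe (16-bit words) endpoints described above, and the half-word order of each 32-bit event (timestamp in the low 16 bits, sent first) is an assumption.

```python
class PingPongEventBuffer:
    """Two event buffers: the monitor fills one while the host reads the other."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.filling = []  # buffer currently written by the monitor
        self.reading = []  # buffer handed to the host at the last swap
        self.missed = 0    # events dropped because the filling buffer was full

    def write_event(self, address, timestamp):
        """Store one 32-bit word: address in the high half, timestamp in the low half."""
        if len(self.filling) >= self.capacity:
            self.missed += 1
            return False
        self.filling.append(((address & 0xFFFF) << 16) | (timestamp & 0xFFFF))
        return True

    def event_count(self):
        """Stands in for the wire that reports events stored since the last swap."""
        return len(self.filling)

    def swap(self):
        """Stands in for the read trigger: switch buffers and return the final count."""
        self.reading, self.filling = self.filling, []
        return len(self.reading)

    def pipe_out(self, n_events):
        """Stands in for the 16-bit pipe: two words per event, low half first."""
        words = []
        for word in self.reading[:n_events]:
            words += [word & 0xFFFF, word >> 16]
        return words


def poll_once(dev, min_events):
    """Host side: peek at the count; swap and read only if enough events are waiting."""
    if dev.event_count() < min_events:
        return []
    n = dev.swap()
    words = dev.pipe_out(n)
    # Reassemble (address, timestamp) pairs from consecutive 16-bit words.
    return [(words[i + 1], words[i]) for i in range(0, len(words), 2)]


if __name__ == "__main__":
    dev = PingPongEventBuffer(capacity=4)
    for t in range(6):
        dev.write_event(address=t, timestamp=100 + t)
    print(dev.missed)         # 2: the last two events found the buffer full
    print(poll_once(dev, 2))  # [(0, 100), (1, 101), (2, 102), (3, 103)]
    print(poll_once(dev, 2))  # []: no new events since the swap
```

A low `min_events` gives fresher data for visualization, while a value near `capacity` makes each pipe read longer and favors throughput. For scale, at the 38 MBps a high-speed host can achieve and 4 bytes per event, the USB link alone limits the sustained rate to roughly 9.5 million events per second, before overflow markers and polling overhead are counted.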

With this polling scheme, if an application needs quick updates of events for visualization or processing, the minimum number of events to be read can be set to a lower value. For recording purposes, this value can be set close to the maximum size of the buffer so that the highest USB transfer rate is achieved.

3.3.2.2 Monitor state machine

When the board is powered up, this state machine enters an idle state in which requests are acknowledged immediately and no events are recorded. A run signal, set by the USB interface using one bit of a 16-bit FrontPanel wire, moves it into a waiting state. As soon as a request is received, the machine enters the write-event state, provided that the Main Machine has signaled, by setting the event-ready signal, that it has completed processing the previous event. The write-event state latches the address, acknowledges the request and indicates to the Main Machine, via the setEventReady signal, that an event is available for processing. The machine then waits for the request to be removed. When it is removed, the machine removes the acknowledgement and returns to waiting. However, if the buffer becomes full, the machine disables the event counter and acknowledges requests immediately. If another event is received while the buffer is full, a missed-event counter is incremented, which is available to the host via another wire in the FrontPanel API. This machine is very similar to the one described in [18]. Figure 3-21 shows the state diagram for this machine, and Table 3-4 lists its inputs and outputs with the default values of the outputs.

3.3.2.3 Main state machine

The Main state machine is responsible for coordinating the functioning of the other state machines. We have not implemented many of the other functions, such as sequencing, multiple-board monitoring and pass-through, that are provided in [18]. Hence this machine, shown in Figure 3-22, turns out to be very simple; it has been included so that the design can be extended in the future. A sub-component, not shown in the figure, is defined to detect an event just before the timestamp overflows. This is required as
the timestamp overflow is also sent as an event and is given priority in the state machine.

Figure 3-21. Monitor Machine state diagram, showing only the states and the outputs when they change.

Table 3-4. Monitor state machine inputs and outputs
Signal | Direction | Description | Default value
clkxI | in | FPGA clock | -
resetxI | in | System reset | -
RunxI | in | Start monitoring | -
AERReqxI | in | Chip request | -
eventReadyxI | in | Last event processed | -
eventBufferFullxI | in | Event buffer overflow | -
setEventReadyxO | out | Event available | 0
AERAckxO | out | Chip acknowledgement | 1
AERAddrRegEnxO | out | Enable AER address register | 0
EvtCtrRegEnxO | out | Enable event counter register | 1
missedEventsxP | out | Number of missed events | -

While sitting in the idle state, if an event occurs, the machine moves to the write-event state, where it latches the timestamp counter value and increments the event counter, which also serves as the address for the buffer memory. It then clears the monitor event flag, indicating to the Monitor Machine that it can accept new events. However, if there is a timestamp overflow, the machine goes to the overflow state, where the MSB of the timestamp is set to 1 and the event counter is incremented. An event that occurred just before
overflow and was not serviced is then stored as described above. If the event-buffer-full flag goes high, the machine sits idle, writing no further events until the buffer is read and cleared. Table 3-5 shows the inputs and outputs of this machine and their default values.

Figure 3-22. Main Machine. The state diagram only shows the states and the outputs when they change.

Table 3-5. Main state machine inputs and outputs
Signal | Direction | Description | Default value
clockxI | in | FPGA clock | -
ResetxI | in | System reset | -
timeStampOverFlowxI | in | Timestamp overflow occurred | -
MonitorEventReadyxI | in | Event available | -
EventBeforeOverFlowxI | in | Event occurred before timestamp overflow | -
eventBufferFullxI | in | Event buffer overflow | -
clearMonitorEventxO | out | Event processing done | 0
incrementEventCounterxO | out | Increment event counter | 0
timeStampRegEnxO | out | Enable timestamp address register | 0
timeStampMSBxO | out | Set timestamp MSB if overflow | 0

3.4 Digital Controller

A digital controller is responsible for routing spikes with the appropriate synaptic parameters (see Section 3.1.2). It has been implemented on the same Opal Kelly board described in Section 3.3.1. The configuration, control and download of synapse information is done via the Opal Kelly USB interface using the FrontPanel API provided for the same board. The basic task of this controller, as explained earlier, is to accept an incoming spike, send a copy of it to the USB monitoring module for recording, look up all the synaptic connections from the memory and, since the weights are in time, schedule the up-down and down-up signals for the neural array accordingly.

Figure 3-23 shows the top-level block diagram of this controller module, along with its input and output signals and some important internal signals between the major sub-modules. The operation of the controller can be understood by describing its three major components: the main controller finite state machine, the connection table management module and the weight event management module.

Figure 3-23. The digital controller as implemented on an FPGA. This figure shows the various modules that make up the controller and their interactions.

Figure 3-24. Organization of memory buffers for the digital controller. The connection table stores synaptic connections. The controller schedules events into the event buffer depending on the synaptic delay and weight, which are both in time.

3.4.1 Connection Table Module

This module is responsible for storing synaptic connections in a memory, receiving an input spiking neuron address and reading out, one by one, all the post-synaptic neuron addresses and their associated delays and weights. For simplicity, the memory has so far been built using Block RAM modules available on the FPGA itself. The major components of this module are also shown in Figure 3-23. The functioning of this block is described below.

Initially, the user tells the Controller FSM via USB that it wants to write into the connection table. The Controller FSM issues CTBuffWrite, which tells the connection table FSM that a write from the USB will begin. The user, via USB, then provides the synaptic parameters, which are loaded into the memory. The memory organization for the connection table is shown in Figure 3-24. The connection table is divided into blocks indexed by pre-synaptic neuron address. Each block is then subdivided to store the post-synaptic connections from that particular neuron. Figure 3-24 also shows how a data word for the connection table is organized. The fifteen most significant bits are the post-synaptic neuron address, the next bit identifies the connection as excitatory or inhibitory, the next six bits give the synaptic delay for this connection and the remaining ten bits specify the weight. After the connection table is loaded, another buffer, also indexed by pre-synaptic neuron address, stores the number of synaptic connections that each neuron has in its corresponding block of the connection table.

Once the connection table has been initialized, it waits for a CTMEnable signal from the Controller FSM, which is generated when a spike arrives at its input. The connection table address generator then reads the number of synaptic connections from the synapse count buffer and starts counting from zero until it reaches that value. It generates the read addresses for the connection table by concatenating the pre-synaptic neuron address and the value of the counter. Every time the counter is incremented, a new row from the block corresponding to the pre-synaptic neuron is read out as CTBufffReadxO to the weight buffer module. The connection table FSM issues CTCountEqual once it is done reading out all addresses from the block.

3.4.2 Weight Buffer Module

This module is responsible for scheduling up and down counts and then reading them out to the neuron array at the right time. A circular FIFO is used to store the scheduled events; Figure 3-24 shows how this FIFO is organized. The FIFO is divided into time slots, where each time slot represents the minimum resolution of the weights, and many events can be stored in each time slot. All events in one time slot are events that have to occur at the same time, with a time resolution set by how fast the FIFO time slots are traversed. Each event in a time slot stores the address of the post-synaptic neuron that needs to be given either an up count or a down count at that time slot. The number of time slots defines the maximum possible weight. As in the connection table, another small buffer, WeightCountBuff, indexed by the time slot address, stores how many events are stored in that time slot. The function of this module can be described as follows:

• The WT AddressGenerator module generates the address of the slots. Its output, the BuffAddr signal, represents the current time.
• The WT ReadModule is responsible for reading all the events in the FIFO via the signals weightBuffReadAddr (address) and WeightBuffRead (read enable). This module reads all events in the time slot currently selected by the WT AddressGenerator module and sends them to the neuron array via the signals NextNeuronAddr and Inhib/Excit. The clock speed at which this module reads events is an exact multiple of the clock speed of the WT AddressGenerator, the multiple being the total number of events possible in a time slot (e.g. 64). As a time slot is not always full, this module reads the number of events scheduled for that slot from the weight count buffer, processes them and disables its output for the rest of the slot.
• The WT AddressCalculator receives the Delay, Weight and type of the connection from the weight buffer FSM. It first calculates the time slot for scheduling the up count for the spike by adding the current time slot address and the delay. It then looks in the weight count buffer to find where in the calculated time slot this event is to be stored. It forms the address WeightBuffAddr by concatenating the sum and the output of the weight count buffer, and writes the neuron address into the weight buffer using the weightBuffData signal. For the weight, it follows a similar process, but instead of scheduling an up count it flips the LSB and stores a down count. If the incoming connection is inhibitory, the directions of the counts are reversed. Once it is done scheduling an event, it notifies the Controller FSM via the signal WTDone.
• The WeightBufferFSM is responsible for synchronizing the modules mentioned above. It receives a WTEnable signal from the Controller indicating that events need to be scheduled. The FSM enables writing to the buffer once the WT AddrCalculator has finished its processing. It is also responsible for updating the weight count buffer whenever a new event is added to a time slot.

3.4.3 Controller FSM

The connection table look-up and the scheduling of events are pipelined for maximum throughput, so that a request from a neuron is serviced as fast as possible. The pipeline is controlled by the Controller FSM. The job of this FSM is to enable the connection table when a spike arrives and to wait for the right number of cycles before enabling the weight buffer module. It waits for the connection table to finish reading the whole block and then tells the weight buffer module when to stop writing.
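
The pipeline of Sections 3.4.1 to 3.4.3 can be summarized in a behavioral Python sketch. The 15/1/6/10-bit word layout follows the text; the block size (`BLOCK_BITS`), the number of time slots and the slot capacity are assumptions, as are all names. Each Python list stands in for one FIFO time slot and its length plays the role of WeightCountBuff. As an assumption consistent with a weight carried as a pulse width, the up count lands at the current slot plus the delay and the down count `weight` slots later; an inhibitory connection swaps the two.

```python
# Word layout from the text, MSB first: post address | type | delay | weight = 32 bits.
ADDR_BITS, TYPE_BITS, DELAY_BITS, WEIGHT_BITS = 15, 1, 6, 10
BLOCK_BITS = 6  # assumption: at most 64 connections per pre-synaptic block


def pack_word(post, inhibitory, delay, weight):
    """Pack one connection-table word."""
    assert 0 <= post < 1 << ADDR_BITS
    assert 0 <= delay < 1 << DELAY_BITS and 0 <= weight < 1 << WEIGHT_BITS
    word = (post << TYPE_BITS) | int(inhibitory)
    word = (word << DELAY_BITS) | delay
    return (word << WEIGHT_BITS) | weight


def unpack_word(word):
    """Inverse of pack_word: returns (post, inhibitory, delay, weight)."""
    weight = word & ((1 << WEIGHT_BITS) - 1)
    delay = (word >> WEIGHT_BITS) & ((1 << DELAY_BITS) - 1)
    inhibitory = bool((word >> (WEIGHT_BITS + DELAY_BITS)) & 1)
    post = word >> (WEIGHT_BITS + DELAY_BITS + TYPE_BITS)
    return post, inhibitory, delay, weight


class ConnectionTable:
    def __init__(self):
        self.memory = {}  # read address -> word (stands in for Block RAM)
        self.count = {}   # synapse count buffer, indexed by pre-synaptic address

    def load(self, pre, words):
        assert len(words) <= 1 << BLOCK_BITS
        for i, word in enumerate(words):
            self.memory[(pre << BLOCK_BITS) | i] = word
        self.count[pre] = len(words)

    def read_block(self, pre):
        """Yield the block row by row; read address = {pre, counter}."""
        for counter in range(self.count.get(pre, 0)):
            yield self.memory[(pre << BLOCK_BITS) | counter]


class WeightBuffer:
    """Circular FIFO of time slots; len(slot) plays the role of WeightCountBuff."""

    def __init__(self, num_slots=2048, slot_capacity=64):
        # assumption: 2048 > 63 + 1023 slots, so a pulse never wraps onto its own start
        self.num_slots = num_slots
        self.capacity = slot_capacity
        self.slots = [[] for _ in range(num_slots)]

    def _put(self, time, post, up):
        slot = self.slots[time % self.num_slots]
        if len(slot) >= self.capacity:
            raise OverflowError(f"time slot {time % self.num_slots} is full")
        slot.append((post, up))

    def schedule(self, now, word):
        post, inhibitory, delay, weight = unpack_word(word)
        start = now + delay                              # pulse onset
        self._put(start, post, up=not inhibitory)        # excitatory: up count first
        self._put(start + weight, post, up=inhibitory)   # opposite count ends the pulse

    def read_slot(self, now):
        """Drain the events of the current slot (what the read module sends out)."""
        index = now % self.num_slots
        events, self.slots[index] = self.slots[index], []
        return events


def on_spike(pre, now, table, wbuf):
    """One controller pass: read the pre-synaptic block and schedule every connection."""
    for word in table.read_block(pre):
        wbuf.schedule(now, word)
```

In hardware the look-up and the scheduling overlap in a pipeline; the sketch runs them sequentially, which gives the same schedule but not the same cycle count.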
After the weight buffer module is done scheduling all events, the controller FSM finally acknowledges the request and is ready to accept new spike requests.

3.5 Discussion

In this chapter, a functional description followed by hardware implementation details of the spiking computation architecture, the UF Time Machine (UFTM), was presented.

UFTM provides advantages over some of the existing architectures discussed in Chapter 2. It provides flexibility for routing spikes by allowing arbitrary, configurable connections between neurons that can be set at the beginning of the computation or changed on the fly. This is in contrast to dedicated hardware synapses which, once fabricated, connect a set of neurons irrespective of whether the synapse is being used or not. Implementing a recurrent connection is as easy as adding an entry in the connection table. Allowing weights to be set independently for each connection solves the problem of storing weights at the synapse in the neuron array. These weights are stored digitally, which makes them easier to compute and store than analog biases used as weights.

This weighting technique also has advantages in terms of the number of possible synapses. The limiting factor here is the number of simultaneously active synapses rather than the total number of synapses. For other systems with dedicated synapses and virtual routing, the hardware needs to provide for as many connections as are possible, i.e. synapses have to be fabricated as addressable units on the chip. In UFTM, a synapse is not identified by an address, but by an up or down count to a neuron activated at a particular time. The address space, or number of bits required to address a synapse, is equal to $\log_2 N + 1$, where $N$ is the number of neurons. When a large number of synapses is present, this address space is considerably smaller than that required for addressing each synapse separately. This saves pins on a chip and also the memory required to represent a connection. The architecture takes advantage of the fact that in biology, too, only a small subset of the large number of synapses is active at any given time. An example of this feature is shown in Section 4.3.3.

Another major advantage is that the resolution of the weights is only limited by the resolution of weight storage in the digital controller. The weights are quantized by the global timer implemented on the FPGA, so their resolution is set by its maximum clock frequency, and this resolution step becomes smaller as technology scales. The weights can also have a high dynamic range, as we take advantage of the fact that the time between spikes is on the order of milliseconds. Strong connections between neurons can be emulated using multiple connections, increasing the dynamic range even further. With this weighting technique, it is straightforward to implement inhibitory connections. Using time as weight also gives an advantage over architectures such as IFAT [110], where a neuron computes only when a spike is delivered to it. Since in this architecture the neuron integrates over time, more interesting algorithms that depend on precise spike timing can be implemented. For example, if a neuron is integrating over the period of its weight and is about to reach its threshold, an inhibitory spike can arrive at the right moment to prevent it from spiking. In IFAT, to achieve the same effect, the charge update of the neuron has to be fine grained, i.e. run at a much higher clock, requiring higher bandwidth (see Chapter 2).

The hardware design of the architecture allows the low-power integrate-and-fire neuron to be swapped with any other neuron model that accepts a synaptic injection current directly onto the membrane capacitor. The saturating counter allows multiple simultaneous synapses to be active and saturates when the maximum number of inputs at a given time is received. The novel design of the counter allows for a linear increase in size with an increasing number of bits, and also reduces switching activity as the counter approaches saturation, saving power. The use of the AER protocol allows the architecture to interface with any AER-based output system. It also allows the analog array to be swapped out for a digital array implemented on an FPGA, where neurons can be redesigned quickly.

The current UFTM does have its limitations. There are no synaptic dynamics in this architecture, as the synapses are represented by the counter as step inputs. Richer dynamics, such as rise time, fall time and conductance-based synapses, are not implemented. This can be a roadblock for algorithms that depend on these dynamics. Some dynamics may be implemented by using log-domain filtering circuits to shape the input current pulse, but this introduces more biases to control and a much bigger parameter space, which goes against the philosophy of keeping things simple. Also, the weights cannot be very large, as their values would become comparable to the time between spikes. The dynamic range between the resolution of the weight and the neuron update in a digital neuron has to be large so as not to alter the timing difference between two consecutive spikes. Sending two events (an up count and a down count) for every spike also stresses the bandwidth of the input bus to the neuron array. UFTM does not provide local connections inside the neuron array. In the brain, locality is a common feature: most of the connections of a neuron are with its neighbors, while global connections are few and sparse. So even if a neuron only needs to communicate with its immediate neighbor, UFTM carries out the expensive operation of routing the spike through the digital controller. However, by sacrificing locality we gain flexibility in connectivity: the pattern of local connectivity can be changed in memory, as opposed to dedicated local connectivity. We hope that the generality and flexibility of the UF Time Machine can overcome its limitations.
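
The address-space argument in the discussion above can be checked with a few lines of Python. The sketch assumes a hypothetical network of N = 1024 neurons with all-to-all connectivity (N² synapses) and rounds the logarithm up to whole bits; that rounding is an assumption, and the $\log_2 N + 1$ expression is exact when N is a power of two.

```python
import math


def uftm_event_bits(num_neurons):
    """Bits per routed UFTM event: target-neuron address plus one up/down bit."""
    return math.ceil(math.log2(num_neurons)) + 1


def per_synapse_bits(num_synapses):
    """Bits needed if every synapse were a separately addressable unit."""
    return math.ceil(math.log2(num_synapses))


n = 1024                          # hypothetical network size
print(uftm_event_bits(n))         # 11
print(per_synapse_bits(n * n))    # 20 for all-to-all connectivity
```

For all-to-all connectivity the per-synapse address grows as $2\log_2 N$ while the UFTM event address grows as $\log_2 N + 1$, so the saving increases with network size.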

CHAPTER 4
SPIKESIM: SIMULATOR FOR THE NEURON ARCHITECTURE

Building hardware designs is not only expensive but also very time consuming. It is often inflexible as well, since any change to add or remove a feature requires the whole process of design and fabrication to be repeated. If a design fault, such as an incorrect value of synaptic current, threshold voltage or membrane capacitance, is discovered after fabrication, the fabricated chip may either not work at all or give undesired results. Therefore, it is a prudent choice to develop a behavioral simulation of the hardware to simulate the spike-based array and converge on a set of parameters that aids hardware design. Furthermore, simulation allows the architecture parameters to be tuned for performance on particular applications.

In this chapter, the design of a spike-based simulator called SpikeSim is discussed. It was developed using JAVA as the programming language. A host of spike-based simulators have been published; for an excellent and comprehensive review, see Brette et al. [27]. Due to the large range of computational problems concerning spiking neurons, these simulators vary in the complexity of the neuron models they use, ranging from detailed biophysical representations, to conductance-based models such as Hodgkin-Huxley models, to simpler integrate-and-fire models. Simulators can also be divided into two main categories: synchronous (clock-driven) and asynchronous (event-driven). Clock-driven, or time-step, simulators update the neuron state at every clock edge and can be applied to any neuron model. Since neuron states are computed only at discrete time steps, a smaller time step leads to better accuracy. Computing at every time step, and using small time steps, can quickly strain resources for large neuron networks. Event-based simulators, on the other hand, rely on an exact solution of the neuron model and hence are suited to simpler neuron models whose analytical solutions are available.

Simulators such as NEURON [29], GENESIS [25] and SPLIT [53] can simulate multi-compartment neuron models, implementing detailed anatomical and physiological models. Single-compartment simulators such as NEST [45] and CSIM [82, 85] can also implement detailed conductance-based models. Event-driven simulators such as MVASPIKE and SpikeNet [40] implement simpler models for asynchronous simulations. A new, highly flexible simulator, BRIAN [49, 50], has been written in Python. It is designed mostly as a single-compartment simulator. In BRIAN, different neuron models can be defined, with various levels of control over connectivity and synaptic weights. The main goals of these simulators are flexibility, scalability, high speed and interfaces for analysis and visualization of results. To scale to large networks, most simulators implement parallelized operations that take advantage of multi-processor machines and multi-computer clusters.

Neuromorphic engineers face obstacles when porting simulation setups from these spike simulators to hardware spiking architectures, because many of the features and much of the architectural flexibility assumed in software modeling are not available in hardware. Simulation in hardware raises issues such as resolution, accuracy, precision, memory space, connectivity and computation power. To alleviate this problem, SpikeSim closely models the behavior of the UF Time Machine (UFTM), presented in Chapter 3, at a higher level of abstraction than circuit or device simulations. Running full-scale simulations of the hardware with circuit simulators is virtually intractable in time and computing resources. SpikeSim models the current sources, counters and neurons at a behavioral level, allowing the user to run algorithms on the simulator and expect the results to match those on the actual hardware. Such simulations then provide values for the capacitor, threshold, counter length and the number of neurons that need to be built in hardware.

4.1 Software Architecture

SpikeSim has been built using object-oriented programming concepts to simplify the inclusion of new features, such as new neuron models, stochastic models of current sources and different implementations of the digital controller. Since MATLAB natively supports JAVA, it is very easy to use the simulator's JAVA libraries in MATLAB for further analysis and visualization. Wrapper code on both the JAVA and MATLAB sides has been developed to make the simulator convenient to use from MATLAB. The simulator also supports a stand-alone file mode, where all the configuration and setup is provided through text files and the output is written to files. SpikeSim supports a batch mode that allows multiple iterations with different inputs and initial conditions to be run on the same neuron network setup. Two modes of simulation are implemented in SpikeSim: a time step simulation mode and an event-based simulation mode.

4.1.1 Time Step Simulation

In time step mode, the simulation time is divided into non-zero discrete time slots defined by the user. At each time step, every neuron in the array is evaluated. This evaluation includes evaluating the counters, current sources and threshold, each of which can vary as a function of time and other state variables. After that, the neuron equation is evaluated to update the membrane voltage, and the neuron is checked to see whether it spiked. At each time step, a list of all spiking neurons is made, the connections of every pre-synaptic spike are looked up in the synapse table, and the corresponding post-synaptic neuron counters are updated. If an AER is enabled, the outgoing spikes are first passed through an AER block, which can delay the spikes further. The equations of the implemented neurons (spiking neurons and leaky integrate-and-fire neurons) are updated using Euler's method, which is simpler than other differential equation solvers and accurate enough for the simple differential equations in our applications.
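
A minimal Python sketch of the time step loop just described, not SpikeSim's code. It assumes integrate-and-fire neurons driven by a UFTM-style counter, where each count sources a unit current `i_unit` onto a capacitance `cap`, and connections given as `(post, delay, weight, excitatory)` in steps, so a connection produces a pulse that starts `delay` steps after the spike and lasts `weight` steps. All names and default values are hypothetical; AER delay and counter saturation are not modeled.

```python
from collections import defaultdict


def run_time_step(n, connections, ext_inputs, steps, dt=1e-4,
                  i_unit=1e-9, cap=1e-12, v_th=0.5, tau=None):
    """Clock-driven sketch: every neuron is evaluated at every step.

    connections: source -> list of (post, delay, weight, excitatory), in steps
    ext_inputs:  step -> list of external source ids routed through connections
    tau:         membrane time constant in seconds, or None for a non-leaky neuron
    Returns a list of (step, neuron) output spikes."""
    v = [0.0] * n                   # membrane voltages
    counter = [0] * n               # up/down counters (saturation not modeled)
    pending = defaultdict(list)     # step -> list of (neuron, +1 or -1) counts
    spikes = []

    def route(src, t):
        for post, delay, weight, excitatory in connections.get(src, []):
            assert delay >= 1, "delays must be at least one step"
            sign = 1 if excitatory else -1
            pending[t + delay].append((post, sign))             # pulse starts
            pending[t + delay + weight].append((post, -sign))   # pulse ends

    for t in range(steps):
        for src in ext_inputs.get(t, []):
            route(src, t)
        for post, delta in pending.pop(t, []):
            counter[post] += delta
        fired = []
        for k in range(n):
            leak = v[k] / tau if tau else 0.0
            v[k] += dt * (counter[k] * i_unit / cap - leak)    # forward Euler update
            if v[k] >= v_th:
                v[k] = 0.0                                     # reset after firing
                fired.append(k)
        for k in fired:                                        # route after the sweep
            spikes.append((t, k))
            route(k, t)
    return spikes
```

With the defaults, one count raises the membrane by 0.1 V per step, so a weight of five steps crosses a 0.35 V threshold while a weight of three steps does not. Every neuron is visited at every step even when no counter changes, which is the cost that event-based simulation avoids.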

The speed of time step simulations is highly dependent on the step size and the size of the neuron array. Time step simulations are inherently inefficient for spiking neuron simulations, because even when the neuron array is large, the spiking activity is generally sparse in space and time. At most time steps, none of the neurons spike and there is no activity. Still, the simulator has to poll each neuron and evaluate all the objects associated with it, so valuable computation time is wasted. A better alternative, described below, is event-based simulation, which evaluates a neuron only when required. Time step simulations are still required to simulate neurons whose equations do not have a closed-form solution, or to simulate non-linear effects such as noise and mismatch in the currents and thresholds of neurons.

4.1.2 Event-Based Simulation

In UFTM, every spike causes an up and a down count to be issued. These transitions can be thought of as events, which are sparse and asynchronous. A neuron's state needs to be updated only when events occur, which creates a computational speedup for sparsely spiking neuron networks. Event-based simulation has been built into SpikeSim. The underlying software architecture for setting up and configuring the simulator is the same as for time step simulation; only the implementation of the controller is different.

For event-based simulations, two buffers are maintained at any given time in SpikeSim. The first, named the weight buffer, is initialized with the spike timings of external inputs and stores when an up or down count needs to be sent to a post-synaptic neuron, which depends on the synaptic delays and weights. The other, named the synapse buffer, is initialized as empty and stores when in the future, if at all, a neuron will spike. The maximum size of the synapse buffer is equal to the total number of neurons in the system because, in the worst case, every neuron may have a pending future spike at a given time instant. Both buffers store times in ascending order, and the data structure used to implement them supports sorted insertion. Each neuron, given the state of its components, can directly evaluate its next spike time by solving its differential equation. The neuron stores when it was last evaluated and what its next spike time was. The process of event-based simulation is explained below.

1. Compare the first elements in the two buffers and pick the one that occurs first.
2. If it is a weight buffer event (which will be the case initially), update the counter value, either up or down, of the neuron the event is addressed to. Remove this event from the weight buffer. Evaluate this neuron and calculate its next spike time. If a finite spike time is calculated, schedule this neuron to fire at that time as an event in the synapse buffer. Before scheduling, however, check whether an entry for this neuron already exists in the synapse buffer; if it does, remove that entry and store the new updated spike time. This can happen because a neuron may receive more than one input at different time instants before it fires.
3. If the event is from the synapse buffer, look up all the connections of the neuron the event belongs to. For each connection, read the synaptic delay and schedule an up count (a down count for inhibitory connections) in the weight buffer. Also schedule the corresponding down count (an up count for inhibitory connections) in the weight buffer. Remove this event from the synapse buffer.
4. Go back and look for the next event in the two buffers.
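
The four steps above can be sketched with two binary heaps standing in for the weight buffer and the synapse buffer. This is a hypothetical Python model, not SpikeSim's implementation. It assumes non-leaky integrate-and-fire neurons, so the next spike time has a closed form, and a reset of the membrane to zero after a spike (the text does not specify the reset). Instead of deleting an outdated synapse-buffer entry as in step 2, the sketch bumps a per-neuron version number and skips stale entries when they reach the top of the heap; this lazy deletion is a swapped-in technique, because Python's `heapq` has no removal operation.

```python
import heapq
import math


def run_event_based(n, connections, ext_events, t_end, slope=1000.0, v_th=0.35):
    """Two-buffer event-driven sketch for non-leaky IF neurons (hypothetical units).

    connections: source -> list of (post, delay, weight, excitatory), times in seconds
    ext_events:  list of (time, source) external spikes routed through connections
    slope:       dV/dt contributed by one unit of counter value (I/C)
    Returns a list of (time, neuron) output spikes."""
    weight_buf = []                      # heap of (time, seq, neuron, +1/-1) count events
    synapse_buf = []                     # heap of (predicted spike time, neuron, version)
    version = [0] * n                    # a bump marks older predictions as stale
    v, counter, last_t = [0.0] * n, [0] * n, [0.0] * n
    seq = 0
    spikes = []

    def route(src, t):
        nonlocal seq
        for post, delay, weight, excitatory in connections.get(src, []):
            sign = 1 if excitatory else -1
            heapq.heappush(weight_buf, (t + delay, seq, post, sign))
            heapq.heappush(weight_buf, (t + delay + weight, seq + 1, post, -sign))
            seq += 2

    def advance(k, t):                   # exact integration since the last evaluation
        v[k] += slope * counter[k] * (t - last_t[k])
        last_t[k] = t

    def predict(k, t):                   # replaces any earlier synapse-buffer entry
        version[k] += 1
        rate = slope * counter[k]
        if rate > 0:
            heapq.heappush(synapse_buf, (t + (v_th - v[k]) / rate, k, version[k]))

    for t, src in ext_events:
        route(src, t)

    while True:
        while synapse_buf and synapse_buf[0][2] != version[synapse_buf[0][1]]:
            heapq.heappop(synapse_buf)   # skip stale predictions (lazy deletion)
        next_w = weight_buf[0][0] if weight_buf else math.inf
        next_s = synapse_buf[0][0] if synapse_buf else math.inf
        t = min(next_w, next_s)          # step 1: earliest event in either buffer
        if math.isinf(t) or t > t_end:
            break
        if next_w <= next_s:             # step 2: apply an up/down count
            _, _, k, delta = heapq.heappop(weight_buf)
            advance(k, t)
            counter[k] += delta
            predict(k, t)
        else:                            # step 3: neuron k fires; route its spike
            _, k, _ = heapq.heappop(synapse_buf)
            advance(k, t)
            v[k] = 0.0                   # reset after firing (assumption)
            spikes.append((t, k))
            route(k, t)
            predict(k, t)
    return spikes
```

Between events nothing is computed: a neuron whose prediction is overtaken by a new count simply gets a fresh entry, and the simulation time jumps directly from one event to the next.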

This simulation method may be viewed as a model of deferred event processing, where the effect of an event is calculated but appears only later, when and only if the neuron spikes. There is no time step in this simulation; it jumps from one event time to the next, and the time differences can be arbitrarily small or arbitrarily large. As will be seen later, the event-based simulator is faster than time step simulation. However, event-based simulation cannot easily be used to simulate time-varying functions for which a closed-form solution is not possible. SpikeSim also provides a feature where the internal variables of the neurons can be recorded for further analysis. Recording these variables is not always possible in event-based simulation. In time step simulation, recording happens at every step, whereas in event-based simulation internal variables are recorded only when an event happens. For example, suppose the membrane voltage of a neuron is recorded at one event and then recorded again only when the neuron spikes. The only values recorded will be the initial and final values of the membrane voltage, with no other points on the curve in between.

4.1.3 Software Architecture

The structure of, and interaction between, the different components of the simulator is illustrated using the Unified Modeling Language (UML) [95]. UML is the most widely known and used notation for object-oriented analysis and design. Figure 4-1 illustrates some of the commonly used symbols in UML notation. In UML, classes are represented by a box with their name in the title. The names of abstract classes are written in italics. The attributes represent variables associated with the class, and the operations are the functions defined in the class. Associations between classes are represented using arrows. A specific association called composition, where one class is contained in another, is represented by a line with a filled diamond at the end attached to the containing class. UML also provides a way to represent generalizations, i.e. interfaces and abstract classes and their actual implementations, as shown in Figure 4-1. As the complexity of the software grows, it is common to group classes representing some common logic or function into a package. In UML, as shown in Figure 4-1, dependencies between packages can also be defined using special symbols for packages and dashed arrows. Figure 4-2 shows the packages defined in SpikeSim and their dependencies.

Figure 4-1. A UML diagram showing static objects of a Customer-Order example. UML representations for different associations, classes and attributes are shown. Taken from: http://edn.embarcadero.com/article/31863

Figure 4-2. A UML diagram showing which packages are defined in SpikeSim and their dependencies.

• SimRun: At the top level, the SimRun package contains classes for communicating with MATLAB and running the simulator. It depends on three sub-packages, namely Controller, Setup and Recorder.
• SimRun.Controller: This package contains classes that implement the digital controller for routing spikes in and out of the neuron array, as discussed in Section 3.1.2. It contains two concrete implementations of the digital controller, namely BasicController, which is a time step controller, and EvenBasedtBasicController, which is an event-based controller.
• SimRun.Setup: This package is used to configure the simulator. All the user-provided values, such as the number of neurons, the simulation time, the time step and a host of other parameters, are captured by this package and used to configure the other components.
• SimRun.Recorder: This package contains classes used to instantiate recorders on internal variables, such as the membrane voltage, synaptic current and counter value of the neurons, for further visualization.

• SimComponents: This package contains classes that represent the neuron array, connection tables and external inputs. It depends on the basic components of the system, represented by the packages Neuron, AER and Event.
• SimExceptions: This package represents the various errors that can be encountered during configuration and execution of the simulator and is used by almost all packages.
• SimOutput: This package contains classes used to store the output of the simulator. It has two methods for storing outputs: the output can be stored in files and read back later into tools such as MATLAB for visualization, or it can be stored in memory and passed as a variable to MATLAB or to other JAVA programs that invoke SpikeSim.

In SpikeSim, the object-oriented programming concepts of inheritance and dynamic class loading are used extensively. Inheritance allows new classes to be derived from a parent class, inheriting its properties, and lets more functionality be added to the child classes. Using JAVA keywords such as interface and abstract, a child class can be forced to provide implementations of functions inherited from the parent class. Dynamic class loading allows JAVA to determine at run time which class has been requested by the user and then load it. Most of the components of the simulator are defined as abstract classes or interfaces. The simulation engine uses these components to simulate the network, but actually accesses the provided concrete implementations of these classes. The choice of which concrete implementation is used can be made at run time, and the class is loaded dynamically. This allows users to define their own implementations and have them discovered at run time, as long as the interface and abstract class rules are followed; there is no need to modify the existing code to add functionality. For example, the class Neuron is an abstract class representing a neuron. It contains the definitions of functions such as evalNeuron, which represent the functionality offered by a neuron and are called by the simulation engine. The class IFNeuron, derived from the class Neuron, represents all the time-step based neurons. LeakyIF is a concrete implementation of the IFNeuron class, which models the leaky integrate-and-fire neuron by implementing the methods of its abstract parent class.

Figure 4-3. The SimulationSetup class is used to configure, instantiate and initialize the various components of the simulator.

A model of the simulation setup is shown in Figure 4-3. The class SimulationSetup is an abstract class for which two implementations are provided: SimulationSetupFromVars, which is used to set up online simulations (e.g. from MATLAB and other JAVA programs), and SimulationSetupFromFile, which sets up stand-alone simulations from files. The abstraction allows any other implementation of SimulationSetup to extend the way the simulator is initialized. As mentioned earlier, MATLAB wrapper code is available which uses the SimulationSetupFromVars class and initializes the SimulationSetup class with a simulatorProperties object. The SimulationSetup class initializes the following components:

• AER: This class simulates the AER behavior. AER is also an abstract class, with implementations provided for a trivial no-AER (NoAER) and a random-service-time AER (RandomAER), as shown in Figure 4-4.
• Controller: This class simulates the digital controller. As shown in Figure 4-4, the class BasicController provides a concrete implementation of the Controller class. This class performs only the basic reception, connection lookup and routing of spikes. More complex controllers, with features such as learning, can be implemented easily by extending the Controller class. Another concrete implementation, EventBasicController, implements the event-based simulation of SpikeSim.
• ConnectionTable: This table stores the connections between the neurons, which are read either from a file or from a variable passed to it during initialization. This simulates the connection storage in the memory. The connections are stored indexed by the pre-synaptic neuron, using JAVA's hash map data structure.
• SimOutputControl: This abstract class allows the user to control how the output is generated. The implementation OutputVarControl stores output and input spikes in memory so that they can be passed to MATLAB, and OutputFileControl allows data to be written to files. The choice of class is made by the implementation of the SimulationSetup class.
• Recorder: This class is designed to let the user query internal variables of SpikeSim's components. Storing every possible variable puts a heavy strain on resources, and the memory required grows quickly with network size. For each basic component that can have recordable values, such as the neuron and the counter, functions following a known naming convention must be defined in these classes, and the Recorder class can then be asked to record an internal variable by specifying its name. As shown in Figure 4-4, this is again an abstract class, with one concrete implementation that stores the data in memory to be passed to MATLAB.
• NeuronArray: This class represents the whole neuron array and is instantiated by the SimulationSetup class. As shown in Figure 4-5, it can consist of one or more neuron pools, represented by the NeuronPool class. Each neuron pool is a logical collection of neurons that have the same properties and can represent a set of neurons involved in a particular task. At the lower level, all neurons are represented physically as a linear array, with each neuron identified by a unique index number. Each pool consists of neurons represented by the abstract class Neuron. Two classes further inherit the Neuron class: IFNeuron for time step neurons and EventIFNeuron for event-based neurons. Concrete implementations of IFNeuron and EventIFNeuron are provided, as shown in Figure 4-5. Again, the choice of which neuron to use, and its corresponding properties, can be made at run time through the SimulationSetup class. Each neuron, just as in the hardware (see Section 3.1.1), consists of a comparator (Threshold), a configurable current source (CurrentSrc) and a configurable counter (Counter).

Figure 4-4. Relationship between different classes in SpikeSim.
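
Python has no direct counterpart of JAVA's interface keyword or of dynamic class loading (in JAVA, for example, through Class.forName), but the same pattern can be sketched with the standard abc, inspect and importlib modules. The class names mirror those in the text (evalNeuron becomes eval_neuron); the constructor parameters and the helper load_implementation are hypothetical and do not reproduce SpikeSim's API.

```python
import importlib
import inspect
from abc import ABC, abstractmethod


class Neuron(ABC):
    """Abstract neuron: the engine only ever calls eval_neuron (cf. evalNeuron)."""

    @abstractmethod
    def eval_neuron(self, dt):
        """Advance the neuron by dt seconds; return True if it spiked."""


class IFNeuron(Neuron, ABC):
    """Abstract base for time step neurons: membrane voltage plus counter-driven input."""

    def __init__(self, i_unit=1e-9, cap=1e-12, v_th=0.5):
        self.v, self.counter = 0.0, 0
        self.i_unit, self.cap, self.v_th = i_unit, cap, v_th


class LeakyIF(IFNeuron):
    """Concrete leaky integrate-and-fire neuron using a forward Euler step."""

    def __init__(self, tau=0.02, **kwargs):
        super().__init__(**kwargs)
        self.tau = tau

    def eval_neuron(self, dt):
        self.v += dt * (self.counter * self.i_unit / self.cap - self.v / self.tau)
        if self.v >= self.v_th:
            self.v = 0.0
            return True
        return False


def load_implementation(base, spec):
    """Resolve 'ClassName' or 'some.module:ClassName' to a concrete subclass at run time."""
    module_name, _, class_name = spec.rpartition(":")
    if module_name:
        importlib.import_module(module_name)   # importing defines the module's subclasses
    pending = list(base.__subclasses__())
    while pending:
        cls = pending.pop()
        if cls.__name__ == class_name and not inspect.isabstract(cls):
            return cls
        pending.extend(cls.__subclasses__())
    raise LookupError(f"no concrete {base.__name__} named {class_name!r}")
```

A user-supplied module only has to subclass `Neuron` and implement `eval_neuron`; the engine code that calls `load_implementation(Neuron, "my_models:MyNeuron")` does not change, which mirrors the design goal stated above.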

Figure 4-5. Relationship between the NeuronArray class and other classes in SpikeSim.

4.2 Summary of Features

The main goal of developing the simulator is to simulate the behavior of UFTM close to the hardware and to provide insight into designing better hardware. The simulator accomplishes this in the following ways:

• First, the simulator models all the hardware components required for the spiking architecture. It does not assume the availability of any other computational resource. The only differential equation solved using Euler's method is that of synaptic current integration, which in hardware is done naturally by the analog circuit.
• The neuron hardware is closely modeled, including its components: the counter, the current source and the threshold. The counter can be configured for the number of bits, the delay between input and output, and an arbitrary initial value. The current source is modeled as a binary current DAC, and a noise source and a mismatch model can be included to model noise in the current source and mismatch between its branches. Similarly, the threshold object can model noise and mismatch in the capacitance value to account for variation in fabrication.
• A model of the AER is implemented. The delay of the spikes going through the AER before reaching the controller can be modeled. By measuring the activity and the average delay experienced by spikes, the bandwidth requirement and its effect on the spike computation can be estimated.
• Modeling the controller the way it will be implemented on the FPGA helps specify the memory requirements for storing connections and events. The implementation can also help estimate resource requirements, such as buffer sizes, for implementing more complex algorithms.
• An additional feature that can easily be implemented is to force the use of time-quantized weight values and to quantize spike outputs to a clock edge, to simulate the effects of time quantization more closely.

As stated earlier, extensibility and flexibility are the two main features of SpikeSim for continued development. The features of the simulator's components can be extended by writing new classes and simply plugging them into the simulator. Writing the simulator in JAVA helps speed up simulations and also provides flexibility, for example:

• JAVA being an operating-system-independent language, SpikeSim is easily portable to different environments such as Microsoft Windows, Linux/Unix and Apple Mac OS.
• Enough hooks are provided in the simulator for it to be invoked from different languages and software, such as MATLAB, for further analysis and visualization.
• An event-based simulation mode is also provided, which can speed up simulation by orders of magnitude for large, sparsely spiking neuron networks.
• SpikeSim provides two modes of operation: online, through other tools such as MATLAB, and offline, using files. Offline mode is useful for long simulations that generate a large amount of data. By dumping the output to files, SpikeSim avoids any memory overflow.
• Limited functionality for generating Verilog HDL code from the configuration of SpikeSim, to be compiled and burned directly onto the FPGA using JAVA libraries, is already available in SpikeSim. Currently the Verilog HDL code for the AER, the decoder and the neuron array can be generated directly from JAVA. A complete one-click solution can be developed later.
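
The binary current DAC with mismatch and noise mentioned in the feature list can be modeled in a few lines. The following is a hypothetical Python sketch, not SpikeSim's model. It assumes that branch i nominally sources 2^i unit currents, that mismatch is a Gaussian relative error drawn once per branch (like a fabrication error), and that noise is a Gaussian relative error redrawn on every evaluation; all names and parameters are illustrative.

```python
import random


class BinaryCurrentDAC:
    """Binary-weighted current DAC with per-branch mismatch and optional noise (sketch).

    Branch i nominally sources (2**i) * i_unit. mismatch_sigma and noise_sigma are
    relative standard deviations; mismatch is drawn once per branch at construction,
    noise is redrawn on every call to current()."""

    def __init__(self, bits, i_unit, mismatch_sigma=0.0, noise_sigma=0.0, seed=None):
        self.bits = bits
        self.i_unit = i_unit
        self.noise_sigma = noise_sigma
        self.rng = random.Random(seed)
        self.branch_gain = [1.0 + self.rng.gauss(0.0, mismatch_sigma) for _ in range(bits)]

    def current(self, code):
        """Output current for a digital input code in [0, 2**bits)."""
        if not 0 <= code < 1 << self.bits:
            raise ValueError(f"code {code} does not fit in {self.bits} bits")
        branch_units = sum((1 << i) * gain for i, gain in enumerate(self.branch_gain)
                           if (code >> i) & 1)
        noise = self.rng.gauss(0.0, self.noise_sigma) if self.noise_sigma else 0.0
        return branch_units * self.i_unit * (1.0 + noise)


# Usage: a 4-bit DAC with 5 % branch mismatch, seeded for repeatability.
dac = BinaryCurrentDAC(bits=4, i_unit=1e-9, mismatch_sigma=0.05, seed=1)
print([dac.current(c) for c in (1, 2, 4, 8)])   # each branch deviates slightly from 2**i nA
```

Drawing the mismatch once keeps a given simulated chip repeatable, while changing the seed emulates a different fabricated die.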
A popular open-source JAVA-based framework, jAER [5], has been developed by Tobi Delbrück for capturing, routing and visualizing spikes from spike-based sensors and neuron chips. Since SpikeSim is written in JAVA, it can be integrated with jAER to receive spikes from external spike sensors such as a retina [38] or a cochlea [108] and perform computation on them. This makes jAER and SpikeSim a powerful combination, as spike computation algorithms can be simulated on real-world data.

4.3 Sample Applications

A compelling model of neural computation is the Liquid State Machine (LSM) [73]. This model provides a conceptual framework for working with biologically realistic pulsed neuron models (integrate-and-fire neurons) as the basic computational element within a nonlinear recurrent architecture, where connections and weights can be set randomly. The LSM classifier has been used for speech recognition, assuming spike inputs from a cochlear model [98, 106, 107]. An entirely different computation technique using steps was proposed by Vishnu Ravinuthula in his PhD dissertation at the University of Florida [90, 91]. The information is contained in the start time of the step, and these steps can be processed on a frame-by-frame basis. Instead of processing the start and stop times of pulses, these time-mode circuits process using infinite states that get reset at the end of each frame. CMOS imagers that give step-based outputs have also been fabricated [51].

4.3.1 Liquid State Machine

The UFTM is general enough to allow implementation of either time-based or step-based algorithms. Fig. 4-6 shows a raster plot of a simulated spiking network in SpikeSim. It consists of 270 leaky integrate-and-fire neurons. The inhibitory and excitatory connections, weights, synaptic delays and spike input timings are taken as described in [73]; however, there are no dynamic synapses. The network exhibits a rich dynamic response to the input. It has local memory that spreads inputs over time and also a fading memory due to the leaky neurons. Such a network can be used as the liquid in the LSM classifier, with the readout blocks implemented in the digital controller. This suggests that a network of neurons with a very simple model of synapses can also be used for computation, as opposed to complicated and biologically plausible synapses [57].

Figure 4-6. A Liquid State Machine simulated in SpikeSim. The internal activity of the neurons for the experiment is very similar to that predicted by CSIM [73, 82].

4.3.2 Edge Detection Example

A step-based edge detection circuit described in [91] is implemented on UFTM and simulated in SpikeSim. Brighter pixels from a CMOS imager produce a step earlier than darker pixels. These pixel outputs are then smoothed using a 1-2-1 convolution mask to remove noise, and threshold differentiation is performed. For the smoothing step, the output time is:

$$t_{OUT} = \frac{t_1 + 2t_2 + t_3}{4} + \frac{C V_{TH}}{4I}$$

where $t_1$, $t_2$ and $t_3$ are the output times of the three neighboring pixels. Figure 4-7 shows the circuit required to implement the smoothing.

Figure 4-7. A conceptual circuit implemented for 1-2-1 smoothing, where a neuron has three inputs with the second input having twice the weight.

To implement this for a 16-pixel imager, 16 neurons were used, each connected with three synaptic inputs. To get twice the current for the $2t_2$ term, the counter receives two UP count signals in very quick succession for each input received from the second pixel. This is easily possible, as the time differences between input pulses are much larger than the delays due to the counters and the AER.

Figure 4-8. Setup used to achieve a negative threshold in edge detection.

To detect an edge, the outputs from two consecutive smoothing neurons are fed into a threshold block. This circuit compares the time between the two spikes, and if the difference exceeds a threshold, an output spike is fired. The circuit in [91] uses a comparator with a negative threshold to detect a negative (dark-to-bright) edge, which is not available in this architecture. To implement a negative threshold, the signs of the weights are reversed in the connections to the second neuron, as shown in Fig. 4-8. As mentioned earlier, a negative weight means a down count. The equation required for correct output is:

$$t_{OUT} = t_1 + \frac{C V_{TH}}{I}$$

that is, if a smoothed spike arrives at $t_1$ and the adjacent neuron does not spike for the next $C V_{TH}/I$ seconds, then there is a spike at $t_{OUT}$ indicating an edge. In Fig. 4-8, if $t_1 < t_2$ and $t_2 - t_1 > C V_{TH}/I$, then neuron 1 fires, indicating a bright edge; otherwise, if $t_2 < t_1$ and $t_1 - t_2 > C V_{TH}/I$, then neuron 2 fires, indicating a dark edge. If $|t_2 - t_1| < C V_{TH}/I$, both neurons first receive an up count and then a down count before either one can fire, and no edge is detected. As images are always processed on a frame-by-frame basis, the weights need to be large enough to represent a spike as a step, because everything is reset at the end of a frame. This example further supports the claim that the UF Time Machine is not restricted to implementing just one class of algorithms.

Figure 4-9. A simple edge detection algorithm implemented on spike outputs simulated using SpikeSim. (A) Original image converted to spike timings. (B) Smoothed spike timings and detected edge.

Fig. 4-9A shows the input image and its conversion to spike timings. Fig. 4-9B shows the output after smoothing and the detected edges. Neuron outputs for detected edges are shown in groups of two, as if one neuron did the thresholding. Spikes in neuron sets 5 and 6 show the presence of edges on the 4th, 5th and 6th pixels. Since the first neuron in both sets fired, the image goes from a brighter to a darker region from the top pixel to the bottom pixel. Multiple spikes on the 6th set represent a large intensity change.

4.3.3 Playing Card Recognition

As part of the prestigious Telluride Neuromorphic Cognition Engineering Workshop held every year, a grand challenge was devised to play the card game of Hearts using spike-based processing [1]. Card recognition and intelligent game playing by machines were to be achieved, as far as possible, with spike-based computation. The experiment had reasonable success, and the results can be found at [103]. The experiment setup was as follows:

Figure 4-10. Setup to capture the image of the card.

For acquisition of the image, the MATLAB® Image Acquisition Toolbox was used to capture images from a Logitech® Webcam Pro 9000 camera. Using the toolbox, the image was downsampled to 100 by 100 pixels. The setup designed to capture the image is shown in Figure 4-10. The position of the card relative to the camera is fixed, so there is no translational or rotational variation. This was done to keep the experiment simple.

The acquired image is then high-pass filtered to smooth it and is subtracted from the original, so that the edges become brighter and the rest of the card darker. An example of an image after this processing is shown in Figure 4-11A.

Taking the intensity of each pixel as the input to a software emulation of a Time To First Spike CMOS imager [51], spike timings for one spike per pixel per frame are calculated. The threshold of the neuron is decreased linearly over time to make sure all pixels eventually spike [51]. This setup differs from the one implemented at the workshop, where the neuron firing rate, rather than the actual spike time, represented the information. Figure 4-11B shows the spiking activity for the card shown in Figure 4-11A, where the imager neuron parameters are set such that all pixels fire by the end of 32 ms. The 100 × 100 neuron array has been linearized to show the raster plot. Most of the bright pixels, which correspond to the edges, fire very early (within 5 ms) and contain most of the information.

Fifty-two integrate-and-fire neurons are used as classifiers. Each pixel connects to each of the 52 classifiers, so each classifier has 10,000 input connections. The weights are trained only once for each classifier: the weight in time is set for the connection from each pixel that has a spike in the first 5 ms of data, and the rest of the synapses to the other classifiers are set to be inhibitory. The weights are normalized by the total number of spikes present in the first 5 ms of data.
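The following rough Java sketch covers the time-to-first-spike encoding and one-shot training described above. It assumes, as our own simplification and not the imager model of [51], that a pixel integrates a current proportional to its intensity against a threshold that falls linearly from theta0 to zero over the 32 ms frame. The 5 ms window and the per-card normalization follow the text. The inhibitory weight magnitude is a hypothetical parameter. As a further simplification, the classifier that receives the largest net drive within the window stands in for the one that would spike first.

```java
import java.util.Arrays;

/** Hypothetical sketch of TTFS encoding and matched-classifier training (not the thesis code). */
class CardClassifierSketch {
    static final double FRAME_MS = 32.0;  // all pixels have fired by the end of the frame
    static final double WINDOW_MS = 5.0;  // early-spike window used for training

    /** Integrating pixel: intensity * t meets a threshold falling linearly from theta0 to 0. */
    static double spikeTimeMs(double intensity, double theta0) {
        // intensity * t = theta0 * (1 - t / FRAME_MS)  =>  t = theta0 * FRAME_MS / (intensity * FRAME_MS + theta0)
        return theta0 * FRAME_MS / (intensity * FRAME_MS + theta0);
    }

    static double[] encode(double[] image, double theta0) {
        double[] times = new double[image.length];
        for (int p = 0; p < image.length; p++) times[p] = spikeTimeMs(image[p], theta0);
        return times;
    }

    /** One-shot training: early pixels excitatory, normalized by their count; the rest inhibitory. */
    static double[] train(double[] spikeTimes, double inhibitoryWeight) {
        long early = Arrays.stream(spikeTimes).filter(t -> t <= WINDOW_MS).count();
        double[] w = new double[spikeTimes.length];
        for (int p = 0; p < w.length; p++)
            w[p] = spikeTimes[p] <= WINDOW_MS ? 1.0 / early : -inhibitoryWeight;
        return w;
    }

    /** Index of the classifier with the largest net drive from spikes inside the window. */
    static int classify(double[][] weights, double[] spikeTimes) {
        int best = -1;
        double bestDrive = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            double drive = 0.0;
            for (int p = 0; p < spikeTimes.length; p++)
                if (spikeTimes[p] <= WINDOW_MS) drive += weights[c][p];
            if (drive > bestDrive) { bestDrive = drive; best = c; }
        }
        return best;
    }
}
```

A dark pixel (intensity 0) fires exactly at the end of the 32 ms frame, and brighter pixels fire earlier.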

When a card is placed in the setup, the connections to its classifier become active and drive enough current into that neuron to make it spike earlier than the others. The other classifiers may share some common spiking pixels and may eventually fire, but they fire later. This is illustrated in Figure 4-12, which shows that the network acts as a matched classifier. The intensity in the image represents spike times. The earliest spike times are on the diagonal, showing that the classifier works by detecting the time to first spike. Classifiers for cards such as the seven, eight and nine of any suit share edges and therefore fire close together, but later in time. These can be suppressed with strong inhibitory connections between classifiers, so that the first spiking classifier suppresses the activity of the others.

Figure 4-11. (A) Edges for the 8 of Hearts card. (B) Spikes from the imager for the 8 of Hearts card. The left figure shows a 100 × 100 image of the 8 of Hearts card after high-pass filtering and subtraction from the original image. The right figure shows spikes generated by a simulated time-to-first-spike imager. Bright pixels fire early. As the threshold reduces linearly to zero, all the remaining pixels fire between 30 ms and 32 ms.

Figure 4-12. Time to first spike during testing. All red pixels are zero, which means no spike occurred. The matched classifier always spikes first.

Using the card recognition experiment as an example, we can look at some useful features of the simulator and its performance. In the card setup, each input pixel is connected to 52 neurons with either an excitatory or an inhibitory connection, so for each incoming spike SpikeSim has to process that many connections. For example, if there are a total of 3000 spikes in 6 ms, SpikeSim processes 3000 × 52 = 156,000 connections, which it completes in about 5 seconds for event-based simulation and 18 seconds for time-step simulation. Figure 4-13 compares time-step and event-based simulation performance for the card recognition experiment. Each of the 52 cards generates a different number of spikes. The step size for time-step simulation was set to 100 ns for accurate simulation. The smaller the time step, the closer the time-step results are to the event-based results. The difference in simulation time between the two modes is expected to grow further when more complicated neuron models are used, since they require a finer time step for accurate simulation. The large gap in the data arises because none of the 52 cards generates a spike count between 3200 and 4000. The simulation setup time for about 10,000 neurons and 520,000 connections is about 4 seconds through MATLAB.
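The accuracy trend with step size can be illustrated with a small Java sketch. The sketch assumes a leaky integrate-and-fire neuron driven by a constant current. That model has a closed-form first-spike time, which plays the role of the exact, event-based answer. The model, names and parameter values are assumptions and do not describe the SpikeSim neuron.

```java
/**
 * Forward-Euler time-step simulation vs. the closed-form first spike time of a leaky
 * integrate-and-fire neuron, tau dV/dt = -V + I R, starting from V = 0 (hypothetical sketch).
 */
class TimeStepVsExact {
    /** Closed form: V(t) = I R (1 - exp(-t / tau)) reaches vth at t = -tau ln(1 - vth / (I R)). */
    static double exactSpikeTime(double tau, double r, double i, double vth) {
        if (i * r <= vth) throw new IllegalArgumentException("neuron never reaches threshold");
        return -tau * Math.log(1.0 - vth / (i * r));
    }

    /** Fixed-step simulation; returns the end of the first step in which V reaches vth. */
    static double timeStepSpikeTime(double tau, double r, double i, double vth, double dt) {
        if (i * r <= vth) throw new IllegalArgumentException("neuron never reaches threshold");
        double v = 0.0, t = 0.0;
        while (v < vth) {
            v += dt * (i * r - v) / tau; // forward Euler update
            t += dt;
        }
        return t;
    }
}
```

With tau = 20 ms and I R = 2 V_th, a 100 ns step lands within a fraction of a microsecond of the closed-form time. A 100 µs step is off by tens of microseconds. The finer step needs a thousand times more loop iterations.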

Figure 4-13. Simulation time vs. number of spikes for the same experiment in SpikeSim, for time-step and event-based simulations. The time step is 100 ns for the time-step simulations. The runs used an Intel® Core™ 2 Duo 3.00 GHz processor with 4 GB RAM. The variation in time is probably due to the JAVA garbage collector kicking in. These times were measured using MATLAB; the actual times are slightly lower.

Figure 4-14 shows how the internal variables of a simulation can be recorded from SpikeSim. The membrane voltage, counter value and current value of a particular neuron can be recorded. This helps in debugging and designing a neuron network and in specifying requirements such as counter width and current values. As the figure shows, the counter value in this experiment ranges between a maximum of 20 and a minimum of -30. These results also show an important feature of the UF Time Machine. Even though the number of synapses per neuron is very large, only a small number of them are active at any given time, which is typically the case in biological networks as well. Therefore, unlike other designs, the UFTM does not need to build a dedicated synapse for every synapse listed in the connection table.
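Turning a recorded counter range into a counter width takes only a one-line calculation. The Java helper below is a sketch. It assumes a two's-complement encoding, which the text does not specify. For the range observed here, -30 to 20, it returns 6 bits.

```java
/** Sizing helper (sketch): smallest two's-complement counter that never saturates for a recorded range. */
class CounterWidth {
    static int signedBitsFor(int min, int max) {
        if (min > max) throw new IllegalArgumentException("min > max");
        int bits = 1;
        // An n-bit two's-complement counter holds -2^(n-1) .. 2^(n-1) - 1.
        while (min < -(1L << (bits - 1)) || max > (1L << (bits - 1)) - 1) bits++;
        return bits;
    }
}
```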

Figure 4-14. On the right, internal variables (counter, current and membrane voltage) of the classifier neuron for the 8 of Hearts (as in Figure 4-11A) are shown. On the left, internal variables for the 2 of Hearts classifier are shown when the stimulus is the 8 of Hearts (as in Figure 4-11B).

CHAPTER 5
RESULTS AND PERFORMANCE METRICS

5.1 Custom analog circuits

Figure 5-1. A micrograph of a 32-neuron chip fabricated in ON Semiconductor's 0.5 µm technology. To keep the aspect ratio close to one, the 32 neurons are folded about the vertical axis.

A fully functional custom-designed mixed-signal chip with 32 neurons, each having 32 synapses (16 excitatory and 16 inhibitory), was built in ON Semiconductor's 0.5 µm C5N process through MOSIS. The chip micrograph, excluding the pads, is shown in Figure 5-1. Instead of building a linear array of 32 neurons, 16 neurons are placed on each side of the vertical axis, with the AER block in between. This helps keep the chip as square as possible. The decoder is placed on the outside edge of the neuron array to receive a direct input from the pads. The neurons are placed on the inner edge so as to connect directly to the AER block in the center. The counter and the capacitor sit between the neuron and the decoder. Due to the large feature size of the process, the capacitor area dominates the other blocks. The capacitor is kept large so as to generate spike times on a biological time scale. The process used is quite old and does not reliably support generation and mirroring of very low currents. The design of these components is described in Section 3.2. Many other chips were fabricated previously to test and characterize the components of this chip. The measured results from this chip and others are presented in this chapter. Special emphasis was placed on matching, area and signal integrity during layout. Some important characteristics of the chip are given in Table 5-1.

Table 5-1. Fabricated 32-neuron chip specifications
Specification | Value | Units
Number of neurons | 32 |
Number of simultaneously active synapses, inhibitory and excitatory each [a] | 512 |
Chip area [b] | 1.77 mm × 1.272 mm = 2.25 | mm²
Neuron area | 30.6 µm × 46.8 µm = 1432.08 | µm²
Capacitor area | 0.0203 | mm²
Capacitor value | 13.88 | pF
Power supply | 5 | V
Bias block static current | 100 | µA
[a] 16 per neuron.
[b] Does not include the area of the pads.

Figure 5-2 shows results measured from a fabricated 8-bit counter chip. Only selected pins are measured to verify functionality. On the left of the figure, the counter is made to count up using continuous pulses. $S_{out7}$, the output of the most significant bit of the counter, which indicates that the counter is saturated, goes high after the correct number of count-up pulses. Similarly, for count-down pulses the counter counts down to the lowest value and saturates. The recorded chip data is noisy because a very low sampling rate of 100 Hz is used, so as to capture as many data points as possible from the oscilloscope.

Figure 5-2. Selected bits of the 8-bit counter chip show that the counter can count up (left figure) and count down (right figure) and saturates. See text for explanation.

Table 5-2 lists some important measured results for the counter. The measured power of the chip when the counter is saturated is negligible. This is because, by design, all bits of the counter know when they are saturated, and the incoming up or down counts are gated, causing no activity in the chip. As seen in Table 5-2, the delay of the counter is very small and the counter can run at a very high speed. It was not possible to verify the maximum speed because such high-speed testing equipment was unavailable. Each counter bit is laid out so that it can be stacked vertically and horizontally, leading to a very compact layout for the full 5-bit counter. Figure 5-3 shows the membrane output voltage recorded from a neuron in the chip. An active-low neuron request is generated whenever a neuron spikes. The acknowledgment (top plot in the figure) is returned by LabView® software interfaced to the chip via a PCIe-6133 card manufactured by National Instruments. The Opal Kelly board cannot be used, since the chip works at 5 V and the Xilinx Spartan-3E FPGA on the board is not compliant with 5 V devices. The same figure shows that the computer takes a relatively long time, about 340 µs, to recognize the request and return the acknowledgment. This further underscores the need for a high-speed monitoring device that can service a neuron as fast as possible.

Table 5-2. Fabricated counter specifications
Specification | Value | Units
Area per bit | 37.8 µm × 54.0 µm = 2041.2 | µm²
5-bit counter area [c] | 0.1575 mm × 0.0936 mm = 0.014742 | mm²
Power supply | 5 | V
Standby current (no activity) | 20 | pA
Min. switching power at 1 kHz | 19.5 | nW
Delay | 9 | ns
[c] Includes the area for the circuits shown in Figure 3-9.

Figure 5-3. Membrane voltage, the request generated by the neuron, and the returned ACK.

Figure 5-4 is generated when the voltage controlling the refractory period, $V_{rfr}$ in Figure 3-6, is varied from 0.30 V to 0.4 V. The spike rate increases as the voltage is increased.
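The saturating behaviour measured on the counter chip can be captured in a small behavioural model. Counts stop at the top and bottom of the range, and further pulses are gated so that nothing switches. This Java sketch is hypothetical. It models an unsigned S-bit counter, assumed to reset to zero, and describes nothing about the transistor-level design.

```java
/** Behavioural sketch of an S-bit saturating up/down counter with range 0 .. 2^S - 1. */
class SaturatingCounter {
    private final int max;
    private int value; // assumed reset state: 0

    SaturatingCounter(int bits) {
        if (bits < 1 || bits > 30) throw new IllegalArgumentException("bits out of range");
        this.max = (1 << bits) - 1;
    }

    /** Returns false when the pulse is gated because the counter is already saturated high. */
    boolean up() {
        if (value == max) return false;
        value++;
        return true;
    }

    /** Returns false when the pulse is gated because the counter is already saturated low. */
    boolean down() {
        if (value == 0) return false;
        value--;
        return true;
    }

    boolean saturatedHigh() { return value == max; }
    boolean saturatedLow()  { return value == 0; }
    int value() { return value; }
}
```

Driving an 8-bit instance with 300 up pulses accepts exactly 255 of them and leaves `saturatedHigh()` true, which mirrors $S_{out7}$ going high.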

Figure 5-4. Variation of the spike rate with the refractory period voltage $V_{rfr}$ in Figure 3-6.

Figure 5-5 shows how the spike rate varies with the current that is split by the current splitter circuit.

Figure 5-5. Variation in spike timings with the main current that is split by the current splitter circuit.

Figure 5-6. Variation in power and in energy per spike consumed by the neuron.

Figure 5-6 shows the variation in power and in energy per spike consumed by the chip. The power is obtained by measuring the average power drawn from the supply that feeds only the neuron circuit, including the counter and the current copy cells in Figure 3-12. As expected, the power consumed increases exponentially with firing rate. The firing rate is limited by the refractory period. The power is then divided by the spike rate to obtain the energy per spike. The energy per spike shows a minimum at a certain spike rate. At lower spike rates the current consumed is low, but the neuron integrates over a longer time before the positive feedback starts, requiring more energy. At much higher rates it integrates for less time but consumes more current during that time in order to spike faster. In between there is a minimum, where the trade-off between spike rate and energy is optimal. This kind of behavior is also observed in biology, where an optimal firing rate is selected for minimum energy consumption. The data are approximate, as measurements at very low current values are noisy.
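The energy-per-spike curve in Figure 5-6 comes from dividing the measured power by the firing rate. The Java helper below sketches that step and picks the most efficient rate. The numbers in the usage note are made up for illustration and are not the measured data.

```java
/** Energy per spike E = P / f from paired power and firing-rate measurements (sketch). */
class EnergyPerSpike {
    static double[] energyJoules(double[] powerWatts, double[] rateHz) {
        if (powerWatts.length != rateHz.length) throw new IllegalArgumentException("length mismatch");
        double[] e = new double[powerWatts.length];
        for (int k = 0; k < e.length; k++) e[k] = powerWatts[k] / rateHz[k];
        return e;
    }

    /** Index of the firing rate with the lowest energy per spike. */
    static int mostEfficient(double[] energy) {
        int best = 0;
        for (int k = 1; k < energy.length; k++) if (energy[k] < energy[best]) best = k;
        return best;
    }
}
```

Consider made-up inputs of 1, 2 and 8 µW at 10, 100 and 200 Hz. These give 100, 20 and 40 nJ per spike, so index 1 (100 Hz) is the most efficient.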

5.2 FPGA-Based Digital Circuits

A breakout PCB for the Opal Kelly FPGA board, shown in Figure 5-7, was fabricated to connect the digital controller to either the digital or the analog neuron array. This board has an AER input interface, used if a digital controller is running on the FPGA, and an AER output bus, used if a neuron array is implemented on the FPGA. An optional bus is provided for level conversion from 5 V to 3.3 V to connect the fabricated chips to the FPGA.

Figure 5-7. PCB designed to connect the neuron array (analog or digital) to the digital controller and USB monitoring.

5.2.1 Neuron Array

Table 5-3 lists the specifications of a digital neuron array built on an FPGA with 4 million gates. The neuron array consists of a linear array of digital neurons, a decoder for receiving input, and a digital AER for sending spikes out to the digital controller. The neuron implements a leak factor and a refractory period, which can be enabled or disabled by the user. Each neuron uses just under one percent of the total FPGA area, as listed in Table 5-4. The neuron consists of the saturating counter, an accumulator and a comparator. Digital implementations offer advantages in terms of speed. The table shows that the neuron membrane voltage can be calculated with very high accuracy owing to the fast clock. However, because of the slow time scale and high tolerance of spiking networks, the FPGA can be run much slower than its maximum speed. This provides two advantages. First, a slower clock means lower power consumption. Second, when implementing spike timings on a biological time scale to interface with real-time spike-based sensors, the required accumulator width depends on the clock speed, so slowing down the clock allows fewer accumulator bits and significant area savings.

Table 5-3. Digital neuron array specifications
Specification | Value | Units
Number of neurons | 196 |
Number of neurons if multiplexed [d] | 6,144 |
Neuron state update speed | 153 | MHz
Neuron state update speed if multiplexed | 1 | MHz
Maximum number of simultaneous synapses, inhibitory and excitatory each | 50,176 |
Maximum number of simultaneous synapses, inhibitory and excitatory each, if multiplexed | 1,572,864 |
Threshold resolution | 16 | bits
Input decoder delay | 2.04 | ns
AER delay per depth of arbiter tree | 5 | ns
Leak factor resolution | 6 | bits
Refractory period resolution | 8 | bits
[d] 48 neurons multiplexed onto 1 neuron.

Table 5-3 also shows that even though the decoder and the AER are clocked circuits, they appear as asynchronous inputs and outputs respectively, owing to their high speed. Many simultaneous up or down counts can be applied very quickly to the decoder, which increments the asynchronous counter. The slow neuron clock then updates the neuron with the combined effect of all counts. Similarly, the neuron cannot generate spikes at such a high rate, so from the neuron's point of view the AER block works almost asynchronously.

The resource-hungry blocks in the neuron are the accumulator and the blocks used to calculate the resistive leak at every time step. Each physical neuron can be multiplexed to simulate many neurons. The FPGA neuron clock is run at the highest possible speed, and the state of each neuron is updated only at the divided clock speed. The counter, accumulator and leak subtractor can be shared between the multiplexed neurons. At each clock cycle the state of a neuron is fetched from a register array and stored back into the array after the update. The number of neurons that can be multiplexed is limited by the FPGA clock speed and memory capabilities. As shown in Table 5-3, each physical neuron can multiplex 64 neurons, giving a total of 6,144 neurons and 1,572,864 synapses. The update clock of each neuron is then 1 MHz, calculated by accounting for the multiplexing overhead and dividing the remaining time by the number of multiplexed neurons. The update clock can be made slower only if the weight resolution is changed accordingly, so that the neuron update period stays smaller than the weight times for accuracy. This multiplexed neuron array has not been implemented yet and is part of future work.

Table 5-4. Digital logic requirements for the neuron on the Xilinx XC3S4000 FPGA
Logic utilization | Used | Utilization [a]
Number of slice flip-flops | 36 | 0%
Number of 4-input LUTs | 172 | 0%
Number of slices | 93 | 0%
Number of 16-bit adders | 3 | 0%
Number of 16-bit comparators | 2 | 0%
Number of 16-bit shifters | 1 | 0%
[a] This is for the Xilinx XC3S4000 FPGA.

5.2.2 Controller

The digital controller is an integral component of the UF Time Machine. It is the controller that provides the scalability and flexibility claimed for this architecture.
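The details below describe the controller's per-spike work. It looks up the presynaptic neuron's fan-out in a connection table. For each connection, it schedules a count pulse at the start and at the end of the weight pulse in a circular buffer indexed by time. The following hypothetical Java sketch models that flow. The tick size, reading Table 5-5 as one tick per microsecond, and all names and data layouts are assumptions. The real design is a pipelined FPGA circuit, not a Java object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Behavioural sketch: map an incoming AER spike through a connection table and schedule
 * count pulses in a circular, time-indexed weight buffer (hypothetical model).
 */
class ControllerSketch {
    /** Connection-table entry; delay and weight are in buffer ticks (e.g. 1 us each). */
    record Connection(int post, int delayTicks, int weightTicks, boolean excitatory) {}

    /** A count pulse for the neuron array's input decoder: up (true) or down (false). */
    record CountEvent(int post, boolean up) {}

    private final Map<Integer, List<Connection>> table = new HashMap<>();
    private final List<List<CountEvent>> wheel = new ArrayList<>();
    private long now = 0;

    ControllerSketch(int slots) {
        for (int s = 0; s < slots; s++) wheel.add(new ArrayList<>());
    }

    void connect(int pre, Connection c) {
        // Both the pulse start and its end must land within one turn of the buffer.
        if (c.delayTicks() < 0 || c.weightTicks() < 1 || c.delayTicks() + c.weightTicks() >= wheel.size())
            throw new IllegalArgumentException("delay + weight must fit in the buffer");
        table.computeIfAbsent(pre, k -> new ArrayList<>()).add(c);
    }

    /** Excitatory: up at pulse start, down at pulse end. Inhibitory: the reverse. */
    void onSpike(int pre) {
        for (Connection c : table.getOrDefault(pre, List.of())) {
            schedule(c.delayTicks(), new CountEvent(c.post(), c.excitatory()));
            schedule(c.delayTicks() + c.weightTicks(), new CountEvent(c.post(), !c.excitatory()));
        }
    }

    private void schedule(int ticksAhead, CountEvent e) {
        wheel.get((int) ((now + ticksAhead) % wheel.size())).add(e);
    }

    /** Returns the count events due in the current tick, then advances time by one tick. */
    List<CountEvent> tick() {
        List<CountEvent> slot = wheel.get((int) (now % wheel.size()));
        List<CountEvent> due = new ArrayList<>(slot);
        slot.clear();
        now++;
        return due;
    }
}
```

With 1 µs ticks, a buffer of 2048 slots would hold any 10-bit delay plus a 10-bit weight.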

The digital controller must be as fast as possible in mapping incoming spikes, scheduling events and then sending the events on time. Table 5-5 shows the achievable specifications of this controller when implemented on the Opal Kelly FPGA board. The weights are quantized to 1 µs with a total dynamic range of 1 ms. The dynamic range of the weights depends on the memory available on the FPGA. Weight scheduling and event reading on the FPGA need to be very fast, which is very difficult to achieve with an SDRAM. For simplicity, the synaptic table described in Figure 3-24 is currently implemented on the FPGA itself.

Table 5-5. Digital controller specifications
Specification | Value | Units
Total number of synapses in connection table [e] | 16,384 |
Number of connections per presynaptic neuron (fan-out) [e] | 158 |
Weight resolution [f] | 10 | bits
Dynamic range for weights | 60 | dB
Synaptic delay resolution [f] | 10 | bits
Dynamic range for synaptic delay | 60 | dB
Maximum delay [g] | 1 | ms
Weight jitter | 1 | LSB
[e] Currently implemented using Block RAMs on the FPGA. Can be very large if an external SDRAM is used.
[f] Can be higher for an FPGA with more Block RAMs.
[g] Depends on how fast the weight buffer is traversed. A slower clock trades off higher range against resolution.

For each incoming spike, a pipeline is set up to read a connection table entry and schedule an event in the weight buffer. The number of clock cycles a spike has to wait before all of its connections are serviced equals the pipeline setup time plus 5 times the total number of post-synaptic connections, because the controller requires 5 cycles to update the weight buffer and the associated counter. Hence, the throughput increases with the number of post-synaptic connections: as the number of connections becomes large, the number of connections serviced per 5 FPGA clock cycles approaches the ideal value of 1. Table 5-6 lists these values. The total number of clock cycles required to service a spike is:

$$N_{clkCycles} = 3 + 5\,N_{connections}$$

Therefore, the number of synapses serviced per second is a property of the controller design and the FPGA used, and is constant. However, the maximum constant firing rate for DC input that can be supported without making a neuron wait for its request to be serviced depends on the fan-out of that neuron, and so varies with the design of the spiking network that UFTM is running.

The connection table can be made very large using the SDRAM available on the board. Most SDRAMs support reading and writing in block or page mode. There is an overhead in setting up the transfer, but after that data is available at every clock cycle, which in this case is limited to 133 MHz. The controller itself can run at a much higher speed than the SDRAM. However, the limiting step in processing a spike is fetching the data from memory, which is limited to 133 MHz. Hence the controller is clocked with the same clock as the SDRAM, to avoid any buffering or waiting for data. Table 5-7 lists the speed at which connections can be serviced when stored in external SDRAM. Table 5-8 lists the hardware requirements of the digital controller. The main resource-hungry blocks are the Block RAMs, which currently hold both the weight buffer and the connection table.

Table 5-6. Digital controller speed specifications when using block RAM on the FPGA
Specification | Value | Units
Clock speed | 171 | MHz
Minimum overhead to service one connection [h] | 3 | clock cycles
Number of synapses per second for each spike, not including overhead [h] | 34 | million per second
[h] This scales with higher-speed FPGAs.

Table 5-7. Digital controller speed specifications when using external SDRAM
Specification | Value | Units
Clock speed | 133 | MHz
Minimum overhead to service one connection | 3 | clock cycles
Number of synapses per second for each spike, not including overhead [i] | 26.6 | million per second
[i] This scales with higher-speed FPGAs.

Table 5-8. Digital logic requirements for the controller on the Xilinx XC3S4000 FPGA
Logic utilization | Used | Utilization [j]
Number of slice flip-flops | 115 | 0%
Number of 4-input LUTs | 531 | 0%
Number of slices | 266 | 0%
Number of Block RAMs | 90 | 94%
[j] This is for the Xilinx XC3S4000 FPGA.

5.2.3 USB communication

An event-based module that captures incoming events, timestamps them and transfers them over USB to a computer, as described in Section 3.3, has been implemented on the Opal Kelly board described in the same section. Table 5-9 lists the specifications of this design, and Table 5-10 lists its resource requirements on the FPGA.

Table 5-9. USB timestamp board specifications
Specification | Value | Units
Clock speed | 100 | MHz
Timestamp range | 32 | bits
Timestamp resolution | 0.02 to 1.27 | µs
Address bus width | 16 | bits
Sustained event rate | 2 | Meps
Average data rate | 3 | Meps
Minimum time between events for no wait | 40 | ns

Table 5-10. USB timestamp board resource requirements
Logic utilization | Used | Utilization [k]
Number of slice flip-flops | 367 | 1%
Number of 4-input LUTs | 569 | 1%
Number of slices | 442 | 1%
Number of Block RAM modules [l] | 60 | 62%
[k] This is for the Xilinx XC3S4000 FPGA.
[l] This determines the maximum sustained event rate that can be recorded and is configurable. See text for details.

As mentioned earlier in Section 3.3.2, the Opal Kelly board does not provide low-level access to the buffers implemented by the operating system kernel. Therefore, the data to be sent via bulk transfer has to be available in a buffer on the FPGA itself when the computer issues the read command. This makes the maximum sustained event rate dependent on the maximum buffer size that can be built on the board. For the Opal Kelly XEM3050 board, two buffers of at most 64 KB each can be built. Assuming about 6 ms (highly dependent on the machine and operating system) before another data transfer begins, on an Intel® Core™ 2 Duo 3.00 GHz processor with 4 GB RAM running Windows 7, this gives an upper limit of about 2.4 Meps on the sustained event rate. In practice, about 2 Meps can be achieved on slower computers. This is not a severe limitation, as the design can easily be implemented on an FPGA board that provides more on-board Block RAMs. A more complicated design could also store the data in the SDRAM module available outside the FPGA. The more interesting specification is the time a request has to wait before it is captured and acknowledged, which is 4 cycles in this design. Neurons generally tend to fire sparsely. In the event of a burst, however, the board can keep up as long as the number of events is not large enough to fill the 64 KB buffer before the next read is issued. The timestamp resolution is configurable via software on the computer. The user can also specify the number of events that must occur before a read is issued. This provides flexibility in controlling the rate at which events are delivered to the host, which can be set either for real-time visualization or for data recording. It trades off the maximum sustained rate against real-time delivery of events to the host.
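The buffer-size limit on the sustained event rate can be estimated with a back-of-envelope formula, sketched below in Java. It assumes that the host drains one on-FPGA buffer per read and that reads are spaced a fixed interval apart. The number of bytes per event is an assumed parameter, because the packing of timestamp and address is not stated here. The result is therefore not expected to reproduce the 2.4 Meps figure exactly.

```java
/** Back-of-envelope bound (sketch): events per drained buffer divided by the host read interval. */
class UsbRateBound {
    static double maxSustainedEventsPerSecond(int bufferBytes, int bytesPerEvent, double readIntervalSeconds) {
        if (bufferBytes <= 0 || bytesPerEvent <= 0 || readIntervalSeconds <= 0)
            throw new IllegalArgumentException("inputs must be positive");
        long eventsPerBuffer = bufferBytes / bytesPerEvent; // whole events only
        return eventsPerBuffer / readIntervalSeconds;
    }
}
```

For example, a 64 KB buffer, a hypothetical 4 bytes per event and a 6 ms read interval bound the sustained rate at roughly 2.7 Meps.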

5.3 Scaling and Comparison

The simultaneously active synapses are attractive in terms of scalability. For UFTM, assume N neurons, an S-bit counter and area A per counter bit. Then

$$\text{Total simultaneously active synapses} = N \cdot 2^{S}$$
$$\text{Synapse address length} = \lceil \log_2 N \rceil + 1 \ \text{bits}$$
$$\text{Area per synapse} = \frac{S \cdot A}{2^{S}}$$

For the current FPGA implementation, with N = 6,144 (multiplexed) and S = 9, the total number of synapses is 3.14 million (multiplexed), and a 14-bit synapse address space is required. A conventional synaptic connection representation, on the other hand, would require 22 bits for the same number of synapses. This saves 8 bits per connection, that is, almost 1 MB of memory for every 1 million connections, and 8 physical pins on the chip or FPGA. Since these virtual synapses are built using digital circuits, the number of synapses is highly scalable.

The digital implementation of the neuron array and the controller also scales very well with technology. On the latest Xilinx Virtex-6 XC6VLX760-1 FPGA, about 2560 physical neurons or 230,000 multiplexed neurons can be implemented. With an 8-bit counter, this gives 1.3 million physical and 118 million multiplexed simultaneously active synapses. The analog version scales with technology, but not as aggressively as the digital one. It is better suited to implementing complex neuron models.

Most neuron array architectures are built for a specific task or purpose. The details of neurobiology can be abstracted at various levels, leading to very different architecture implementations. It is thus very difficult to compare these architectures in terms of their capabilities and performance. A modest attempt is made in Table 5-11. The analog version of UFTM is compared to the IFAT transceiver architecture discussed in Chapter 2, and the digital version is compared to the SpiNNaker architecture, also discussed in that chapter.
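The scaling formulas above translate directly into code. This Java sketch evaluates them. `ceilLog2` rounds the logarithm up to whole bits, which is how the 14-bit figure for N = 6,144 arises. The method names are illustrative only.

```java
/** Sizing formulas of Section 5.3 (sketch): N neurons, S-bit counters, area A per counter bit. */
class UftmScaling {
    /** ceil(log2(x)) for x >= 1. */
    static int ceilLog2(long x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        return 64 - Long.numberOfLeadingZeros(x - 1);
    }

    /** Simultaneously active synapses: N * 2^S. */
    static long simultaneousSynapses(long n, int s) { return n << s; }

    /** UFTM synapse address: neuron address plus one extra bit, as in the formula above. */
    static int uftmAddressBits(long n) { return ceilLog2(n) + 1; }

    /** Conventional scheme: one address per synapse. */
    static int conventionalAddressBits(long synapses) { return ceilLog2(synapses); }

    /** Counter area S * A shared by the 2^S synapses it can hold. */
    static double areaPerSynapse(int s, double areaPerBit) { return s * areaPerBit / (1L << s); }
}
```

With N = 6,144 and S = 9, this reproduces the 3,145,728 synapses and the 14-bit versus 22-bit addresses quoted above. Using the 2041.2 µm² per bit from Table 5-2 with a 5-bit counter, the formula gives about 319 µm² of counter area per simultaneously active synapse.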

Table 5-11. Comparison of UFTM with other architectures
Specification | UFTM Analog [m] | IFAT [109] | UFTM Digital [m] | SpiNNaker [65]
Number of neurons | 32 in 5 mm² | 2,400 in 9 mm² | 6,144 on XC3S4000 | 2,800 on 2 ARM cores
Synapses per second | 34 M/s | 1 M/s | 34 M/s | 12,800/s (NMDA)
Weight resolution | 10 bits | 3 bits | 10 bits | 16 bits
Simultaneously active synapses | 1024 | None | 3,145,728 | Yes
Power | 540 µW (10^6 events/s) + 10 µW @ 100 Hz spike rate | 625 µW (10^6 events/s) | 220 mW min. | 12.5 mW to 25 mW (ARM968 core)
Biological time scale | Yes | No | Yes | Yes
Synaptic delay | Yes | No | Yes | Yes
Refractory and leaky | Yes | No | Yes | Yes
[m] This work.

The most important performance metric for these architectures is the number of neurons. The table shows that FPGAs, owing to their parallelism, allow a larger number of neurons than sequential ARM processors. However, a penalty is paid in terms of complexity and flexibility. For the analog version, this work may seem to compare poorly with IFAT implemented in the same technology. However, IFAT is built for accelerated simulation rather than real-time, biological-scale operation. This allows its membrane capacitors to be much smaller than those of UFTM, leading to a higher neuron density. UFTM outperforms both IFAT and SpiNNaker in the number of synapses processed per second, due to its novel implementation of weights. Since no explicit routing of weights is required in UFTM, a higher weight resolution can be implemented in the FPGA. This resolution can easily be increased by using an FPGA with more on-board block RAM.

The novel idea of simultaneously active synapses has been emphasized repeatedly in this work. It allows for richer synaptic and temporal dynamics. This feature is not possible in IFAT and could possibly be implemented on SpiNNaker, but it is not explicitly mentioned in the published work. For the analog version of UFTM, the power reported in the table for the analog array includes the power of the synapses when handling 1 mega-event per second with neurons spiking at 100 Hz. The reported power for IFAT does not include the power of the external DAC required to create the resting potential at every clock cycle. UFTM avoids this significant power consumption by using the current DAC and by representing weights in time. Digital versions of neuron architectures require much more power than analog ones. Because of its routing fabric and other peripherals, the FPGA requires an order of magnitude more power than ARM cores.

5.4 Discussion

This chapter has presented detailed results and performance metrics for the analog and digital implementations of the UF Time Machine. The analog neuron is attractive because of its low power consumption. Many more analog neurons can be packed into a chip fabricated in a state-of-the-art process with a smaller feature size and more routing layers. Even denser networks can be built using 3-D VLSI technology [43]. FPGAs, on the other hand, require more resources to implement the same neurons at the same number of bits of resolution. However, FPGAs benefit from the continuous scaling of digital technology and the robustness of digital design. Neurons can easily be multiplexed, or faster FPGAs can be used at the cost of more power. It is very difficult to control parameter variation across a chip in a large analog neuron array. The quick reprogramming of an FPGA is also a big advantage over the long fabrication time of a custom analog chip. It is best to implement the simpler neuron models in the FPGA and the more complex models in analog. The UF Time Machine supports both digital and analog versions of neuron arrays.

Digital technology provides all the advantages for the digital controller. The speed of the controller improves with advances in technology, and more capabilities can be added to it as speeds increase and designs shrink. In a multi-neuron chip organized as a binary tree, every node except the leaf nodes is just this digital controller, which is responsible for routing spikes and synaptic parameters. With advances in technology, sequential processors running simple operating systems, such as ARM cores and TI's OMAP processors, have become very fast and are accompanied by peripherals such as DMA controllers and DSP processors. These processors can be programmed in a higher-level language to implement a digital controller with many more capabilities.

PAGE 112

CHAPTER6 CONCLUSIONANDFUTUREWORK Inthisthesis,thedesignofanovelarchitectureforcomputationwithspikeshas beenpresented.Firstly,someoftherelevantrecentpublishedworkwerereviewed. Numerousspikingneuronnetworkchipshavebeenfabricatedbyvariousgroups toexplorevariousneurosciencequestionsinareassuchasvision,auditionand attention.NeuronarchitecturesdesignedandoptimizedforFPGAswerereviewed. Thearchitecturepresentedhere,calledtheUFTimeMachineUFTM,introduces anovelconceptofstoringweightsintime.Thismakesiteasiertoimplementvirtual connectionswithindependentweightsforeachconnection. Toimplementtheweightsandmultiplesynapticconnections,anovelsaturating up-downcounterwasdesigned.Thecounterhasastackingdesigntoincreasethe numberofbitsandthetotalsizeandtransistorcountscaleslinearlywithnumberof bits.Thearchitecturecanimplementbothexcitatoryandinhibitoryconnections.An integrate-and-reneuroncircuitandabinarycurrent-basedDACwereimplemented, fabricatedandtested.Tocommunicatespikesbetweenthechiptoadigitalcontroller, anasynchronousaddresseventrepresentationschemeandaninputdecoderwasalso designedandfabricated. AfullydigitalcounterpartoftheneuronwasdesignedinVerilogHDLandimplemented onanFPGA.Thisincludedleakyintegrate-and-reneurons,aclockedversionof theAERandaninputdecoder.Thedigitalimplementationiseasiertoprogramand testascomparedtoanalogcircuitsandcanbeverydenseifmultipleneuronsare multiplexedononephysicalneuron.ResultsinChapter5comparethespecications andrequirementsofthetwodifferentdesigns.Adigitalcontrollerwhichisresponsible forroutingspikestoandfromtheneuralarraywasdesignedandimplementedonan FPGA.Thecontrollercaneasilyprocessesupto34megaconnectionspersecondand canscheduleweightsanddelaysupto1mswith1 s resolution.AUSBmonitoring 112

PAGE 113

devicefortimestampingincomingspikesandsendingthemtoacomputerforstorage andvisualizationwasdesigned.Itsupportsasustainedeventrateupto2megaevents persecond. Tomodelthehardwarearchitecture,asimulatornamed SpikeSim wasdeveloped inJAVA.Itwasdemonstratedthatthesimulatorcancloselymodelthehardwareand ishighlyextensibleforfuturedevelopment.Itprovidestwomodesofsimulation.One isatimestepmodewhichhasmoreexibilityinimplementinganyneuronmodelbutis slowercomparedtoanevent-basedmodewhichusestheconceptofexactsimulation anddeferredeventprocessing.Simulationofalgorithmsallowforrapiddevelopmentand alsoprovideinsightintothehardwarerequirements.Differentspiked-basedapplications suchastheliquidstatemachine,anedgedetectionalgorithmandaplayingcard recognitionalgorithmwereimplementedonthesimulator.SpikeSimprovidesan interfacetoMATLABalso.Thissimulatorwillbemadeavailabletoothergroupsworking onspikealgorithmstofurtherexplorethecapabilitiesofthespikingarchitecture. SomekeyadvantagesoftheUFTimeMachineUFTMaregivenbelow: 1.UFTMprovidesanovelwayofstoringandapplyingweightstotheneurons. Bystoringweightsintimeandusingpulsewidthasmeasureofweight,UFTM providesaverylargedynamicrangeforweightswhichisgenerallyverydifcult toimplementinotherspike-basedarchitectures.Implementingweightdynamics suchasahigheramplitudeweightinlesstimeascomparedtoalowamplitude weightoveralargetimecanbeeasilyimplementedbyhavingmultipleconnections betweenthesameneurons.Byimplementingweightsintimeratherthanan instantaneousupdate,UFTMallowsfortemporalsequencingofspiketimingstobe implemented. 2.UFTMtogetherwithvirtualsynapseandstorageofweightsintimeallowsformany moresynapsesperneuron.Ittakesadvantageofthefactthatnotallsynapses areactiveatanygiventime.Thewidthofthecounterinaneurondeterminesthe maximumnumberofsimultaneouslyactiveexcitatoryorinhibitorysynapses.The addressspacerequiredforaddressingsynapsesissmallersavingpincountand memory. 
3. UFTM can also implement a step-based mode of computation, which is not implemented by any other architecture. Because weights are implemented as pulse widths, a weight can be made longer than a frame (effectively infinite), so that the input acts as a step for step-based computation. The neurons reset at the end of each frame.
4. The digital controller and synapses in UFTM are independent of the design of the neuron array, as long as the neuron model accepts an input current and produces a spike. The digital controller can therefore interface with either an analog or a digital neuron array.

The major accomplishments of this work are as follows:

1. A novel architecture for computing with spikes, named The UF Time Machine, was proposed and implemented in hardware. An analog chip with 32 neurons and 512 simultaneously active excitatory and 512 simultaneously active inhibitory synapses was designed and fabricated. A digital implementation with 196 neurons and 50,176 simultaneously active excitatory and 50,176 simultaneously active inhibitory synapses was realized on a 4-million-gate FPGA.
2. A simulator named SpikeSim was written in Java. This simulator provides a behavioral model of the architecture and helps in designing the architecture for particular applications. Several applications highlighting the advantages of the architecture were simulated in SpikeSim.
3. A detailed performance analysis of the hardware implementation is provided, including a comparison of the resource utilization of the analog and digital versions of the neuron array.

In the future, the implementation of UFTM can be optimized and extended to provide more capabilities. Some optimizations and capabilities that can be added are:

1. Implement the analog neuron array in a more competitive fabrication process for higher density. Implement the digital neuron array on a more recent FPGA with higher speeds and larger gate counts. Multiplexing many neurons onto one physical neuron can also lead to a very high density of neurons on an FPGA.
2. Implement the connection table in SDRAM to store a large number of connections.
3. Implement learning mechanisms such as spike-timing-dependent plasticity (STDP) on the digital controller as well as in SpikeSim.
4. Implement synaptic dynamics in the neuron arrays by using log-domain filtering to obtain input synaptic currents with exponential rise and fall times.
5. Implement neuron models with richer dynamics in hardware and model them in SpikeSim.
6. Integrate the digital controller more tightly with SpikeSim so that the user can download the configuration to an FPGA at the touch of a button.
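The idea of storing weights in time, described in the key advantages listed above, can be illustrated with a small behavioral model. The following Java sketch is not the UFTM circuit, the Verilog implementation or SpikeSim; it rests on stated assumptions: time advances in discrete ticks (for example 1 µs), the rising edge of each input pulse increments a saturating up-down counter and the falling edge decrements it, excitatory and inhibitory pulses use separate counters, and the counter difference is integrated by a non-leaky integrate-and-fire neuron that resets to zero after firing. The names `SaturatingCounter`, `TimeWeightNeuron`, `Pulse` and `simulate` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical saturating up-down counter: clamps to [0, 2^bits - 1] instead of wrapping. */
final class SaturatingCounter {
    private final int max;
    private int value;

    SaturatingCounter(int bits) {
        this.max = (1 << bits) - 1;
    }

    void up()   { if (value < max) value++; }  // rising edge of an input pulse
    void down() { if (value > 0) value--; }    // falling edge of an input pulse
    int value() { return value; }
}

/** Hypothetical behavioral model: each weight is the width of an input pulse, in ticks. */
final class TimeWeightNeuron {

    /** An input pulse that goes high at tick 'start' and stays high for 'width' ticks. */
    static final class Pulse {
        final int start;
        final int width;
        final boolean excitatory;

        Pulse(int start, int width, boolean excitatory) {
            if (width < 1) {
                throw new IllegalArgumentException("pulse width must be at least one tick");
            }
            this.start = start;
            this.width = width;
            this.excitatory = excitatory;
        }
    }

    /** Returns the ticks in [0, ticks) at which the neuron fires. */
    static List<Integer> simulate(List<Pulse> pulses, int ticks, int counterBits, int threshold) {
        SaturatingCounter exc = new SaturatingCounter(counterBits);
        SaturatingCounter inh = new SaturatingCounter(counterBits);
        List<Integer> spikes = new ArrayList<>();
        long membrane = 0;

        for (int t = 0; t < ticks; t++) {
            // Apply falling edges first so a pulse ending this tick frees room in the counter.
            for (Pulse p : pulses) {
                if (p.start + p.width == t) {
                    (p.excitatory ? exc : inh).down();
                }
            }
            for (Pulse p : pulses) {
                if (p.start == t) {
                    (p.excitatory ? exc : inh).up();
                }
            }
            // Net count of active pulses acts as the input current; membrane is clamped at zero.
            membrane = Math.max(0, membrane + exc.value() - inh.value());
            if (membrane >= threshold) {
                spikes.add(t);
                membrane = 0;  // reset after firing
            }
        }
        return spikes;
    }
}
```

Under these assumptions, a single 10-tick excitatory pulse with a threshold of 5 fires the neuron at ticks 4 and 9, whereas a 4-tick pulse never reaches threshold, so the pulse width alone sets the effective weight. Two simultaneous 5-tick pulses on a 4-bit counter fire the neuron at tick 2; with a 1-bit counter the second pulse is lost to saturation and the neuron fires only at tick 4, which mirrors how the counter width bounds the number of simultaneously active synapses.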

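The digital controller's role of scheduling weights and delays (up to 1 ms with 1 µs resolution, as stated earlier in this chapter) can also be sketched in Java. This is an illustrative assumption rather than the controller's actual design: a hypothetical connection table maps each source neuron to (target, delay, weight) entries, delays and weights are whole microseconds limited to 1000, and each delivery is deferred by placing a timestamped event in a priority queue, in the spirit of deferred event processing. The names `SpikeScheduler`, `connect`, `onSpike` and `drainUntil` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch of a controller that defers weighted spike deliveries (1 tick = 1 us). */
final class SpikeScheduler {
    static final int MAX_US = 1000;  // assumed limit: delays and weights up to 1 ms

    /** One outgoing connection: after delayUs, drive 'target' with a pulse weightUs wide. */
    static final class Connection {
        final int target;
        final int delayUs;
        final int weightUs;

        Connection(int target, int delayUs, int weightUs) {
            if (delayUs < 0 || delayUs > MAX_US || weightUs < 1 || weightUs > MAX_US) {
                throw new IllegalArgumentException("delay must be 0..1000 us, weight 1..1000 us");
            }
            this.target = target;
            this.delayUs = delayUs;
            this.weightUs = weightUs;
        }
    }

    /** A pending delivery: at timeUs, start a pulse of weightUs on neuron 'target'. */
    static final class Event {
        final long timeUs;
        final int target;
        final int weightUs;

        Event(long timeUs, int target, int weightUs) {
            this.timeUs = timeUs;
            this.target = target;
            this.weightUs = weightUs;
        }
    }

    private final Map<Integer, List<Connection>> table = new HashMap<>();
    // Earliest event first; ties broken by target address so the order is deterministic.
    private final PriorityQueue<Event> queue = new PriorityQueue<>(
            Comparator.comparingLong((Event e) -> e.timeUs).thenComparingInt(e -> e.target));

    void connect(int source, int target, int delayUs, int weightUs) {
        table.computeIfAbsent(source, k -> new ArrayList<>())
             .add(new Connection(target, delayUs, weightUs));
    }

    /** Neuron 'source' spiked at timeUs: defer one event per outgoing connection. */
    void onSpike(int source, long timeUs) {
        for (Connection c : table.getOrDefault(source, List.of())) {
            queue.add(new Event(timeUs + c.delayUs, c.target, c.weightUs));
        }
    }

    /** Removes and returns, in time order, every event due at or before nowUs. */
    List<Event> drainUntil(long nowUs) {
        List<Event> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().timeUs <= nowUs) {
            due.add(queue.poll());
        }
        return due;
    }
}
```

For example, if neuron 0 spikes at 5 µs and has a connection with a 250 µs delay and a 40 µs weight, the sketch emits an event at 255 µs asking the array to drive the target with a 40 µs pulse. A hardware controller might instead use a circular buffer of 1000 one-microsecond slots; the priority queue only keeps the sketch short.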

One would quickly hit the physical limits of technology if a large number of neurons and connections were fabricated on a single chip. The architecture should take advantage of dense local connectivity and sparse global connectivity, and a multi-chip architecture is necessary for building very large spiking networks. One way of doing this is to build a binary tree in which the leaf nodes are the actual neuron arrays and the parent nodes are modified forms of the digital controller. Each parent node accepts an incoming scheduling event from the controller of a daughter node and forwards it either up to its own parent node or down to a daughter node. These nodes can be very compact, and many of them can be fabricated as small chips.

For most of the designs that were reviewed, the implementation is tied to a specific neuroscience question. However, there is a need for a general-purpose spike processor to implement new algorithms on a common platform, which UFTM provides. It is envisioned that this architecture will allow a user to think in terms of basic building blocks that can be used repeatedly to build larger computing blocks. Because the number of parameters needed to configure the neurons and synapses is reduced, programming this architecture is simpler and faster. UFTM will provide a universal front end that allows an algorithm to be simulated in Java and then ported onto the FPGA for hardware emulation.

Some issues still need further analysis. Synaptic connections are implemented as step responses and added linearly, which is very different from what is observed in biology, and the effect of synaptic dynamics on computational power is not fully understood. The dynamic range of weights implemented in time, which are themselves non-biological, can also play an important role. A large weight can be realized either by activating multiple synaptic connections at the same time or by using a larger weight value. The former reduces the number of available synapses, while the latter can skew the timing between spikes. An important question that arises with the increasing computational power of computers and the ever-increasing gate densities of FPGAs is whether it is even competitive to build analog neuron arrays. The answer lies in the scalability of these networks. Hardware neurons take advantage of massive parallelism, which is not possible on microprocessor-based systems, and on an FPGA, implementing more complex, higher-dimensional neuron models requires many more resources than it does with analog circuits.
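The binary-tree multi-chip organization proposed above can be illustrated with a short routing sketch. It is an assumption, not a design from this work: it assumes a complete binary tree with a power-of-two number of leaf chips, heap-style node numbering, and routing that climbs to the lowest common ancestor of the source and destination leaves before descending. `TreeRouter` and `route` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: leaves are neuron-array chips, inner nodes are routing controllers. */
final class TreeRouter {
    private final int leaves;  // assumed to be a power of two (a complete binary tree)

    TreeRouter(int leaves) {
        if (leaves < 1 || Integer.bitCount(leaves) != 1) {
            throw new IllegalArgumentException("leaf count must be a power of two");
        }
        this.leaves = leaves;
    }

    /**
     * Nodes visited by an event travelling from leaf src to leaf dst, as heap indices:
     * node 1 is the root, node k has children 2k and 2k+1, and leaf i is node leaves + i.
     */
    List<Integer> route(int src, int dst) {
        if (src < 0 || src >= leaves || dst < 0 || dst >= leaves) {
            throw new IllegalArgumentException("leaf index out of range");
        }
        int up = leaves + src;
        int down = leaves + dst;
        List<Integer> path = new ArrayList<>();
        Deque<Integer> descent = new ArrayDeque<>();
        // Both leaves are at the same depth, so halving both indices in lockstep
        // meets at their lowest common ancestor.
        while (up != down) {
            path.add(up);        // this node forwards the event to its parent
            descent.push(down);  // remembered for the downward leg
            up /= 2;
            down /= 2;
        }
        path.add(up);            // common ancestor sends the event back down
        while (!descent.isEmpty()) {
            path.add(descent.pop());  // each parent forwards to the correct daughter
        }
        return path;
    }
}
```

With four leaves, an event from leaf 0 to its sibling leaf 1 visits nodes 4, 2 and 5 and never reaches the root, whereas an event to leaf 3 visits 4, 2, 1, 3 and 7. Local traffic therefore stays low in the tree, which suits the dense local and sparse global connectivity argued for above; in this sketch the root can become a bottleneck for heavy global traffic.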

REFERENCES
[1] Telluride Neuromorphic Cognition Engineering Workshop, 2010. [Online]. Available: https://neuromorphs.net/nm/wiki/2010
[2] Anadigm FPAA. [Online]. Available: http://www.anadigm.com/fpaa.asp
[3] CAVIAR Project. [Online]. Available: http://www2.imse-cnm.csic.es/caviar/
[4] FACETS (Fast Analog Computing with Emergent Transient States). [Online]. Available: http://facets.kip.uni-heidelberg.de/
[5] jAER Open Source Project. [Online]. Available: http://sourceforge.net/apps/trac/jaer/wiki
[6] Opal Kelly Incorporated, 2010. [Online]. Available: http://www.opalkelly.com/
[7] PCI-AER board Driver, Library & Documentation. [Online]. Available: http://www.ini.uzh.ch/~amw/pciaer/
[8] P. E. Allen and D. R. Holberg, CMOS Analog Circuit Design, 2nd ed. Oxford University Press, USA, 2002.
[9] R. Ananthanarayanan, S. K. Esser, H. D. Simon, and D. S. Modha, "The cat is out of the bag: cortical simulations with 10⁹ neurons, 10¹³ synapses," in Conference on High Performance Networking and Computing. Portland, Oregon: ACM, 2009, pp. 1.
[10] J. V. Arthur and K. A. Boahen, "Recurrently connected silicon neurons with active dendrites for one-shot learning," in IEEE International Joint Conference on Neural Networks, vol. 3, 2004, pp. 1699.
[11] J. V. Arthur and K. A. Boahen, "Learning in silicon: Timing is everything," Advances in Neural Information Processing Systems, vol. 18, p. 75, 2006.
[12] J. V. Arthur and K. A. Boahen, "Synchrony in silicon: the gamma rhythm," IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1815, 2007.
[13] J. Axelson, USB Complete: Everything You Need to Develop Custom USB Peripherals (Complete Guides series). Lakeview Research, 2005.
[14] C. Bartolozzi and G. Indiveri, "Synaptic dynamics in analog VLSI," Neural Computation, vol. 19, no. 10, pp. 2581, Oct. 2007.
[15] C. Bartolozzi and G. Indiveri, "Selective Attention in Multi-Chip Address-Event Systems," Sensors, vol. 9, no. 6, pp. 5076, 2009.
[16] D. Ben Dayan Rubin, E. Chicca, and G. Indiveri, "Characterizing the firing properties of an adaptive analog VLSI neuron," Lecture Notes in Computer Science, pp. 189, 2004.

[17] R. Berner, T. Delbrück, A. Civit-Balcells, and A. Linares-Barranco, "A 5 Meps $100 USB2.0 Address-Event Monitor-Sequencer Interface," in Proceedings of the IEEE International Symposium on Circuits and Systems, 2007, pp. 2451.
[18] R. Berner, "Highspeed USB2.0 AER Interfaces," Master's Thesis, ETH Zurich, 2006.
[19] T. V. Bliss and G. L. Collingridge, "A synaptic model of memory: long-term potentiation in the hippocampus," Nature, vol. 361, no. 6407, pp. 31, Jan. 1993.
[20] K. A. Boahen, Communicating neuronal ensembles between neuromorphic chips. Kluwer Academic Publishers, 1998, ch. 11, pp. 229.
[21] K. A. Boahen, "A throughput-on-demand address-event transmitter for neuromorphic chips," in IEEE 20th Anniversary Conference on Advanced Research in VLSI, 1999, pp. 72.
[22] K. A. Boahen, "Point-to-point connectivity between neuromorphic chips using address events," IEEE Transactions on Circuits and Systems Part II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 416, 2000.
[23] K. A. Boahen, "A burst-mode word-serial address-event link-I: transmitter design," IEEE Transactions on Circuits and Systems Part I: Regular Papers, vol. 51, no. 7, pp. 1269, 2004.
[24] K. A. Boahen, "A burst-mode word-serial address-event link-III: analysis and test results," IEEE Transactions on Circuits and Systems Part I: Regular Papers, vol. 51, no. 7, pp. 1292, 2004.
[25] J. M. Bower and D. Beeman, The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System, 2nd ed. Springer, 1998.
[26] J. M. Brader, W. Senn, and S. Fusi, "Learning real-world stimuli in a neural network with spike-driven synaptic dynamics," Neural Computation, vol. 19, no. 11, pp. 2881, 2007.
[27] R. Brette, M. Rudolph, T. Carnevale, M. L. Hines, D. Beeman, J. M. Bower, M. Diesmann, A. Morrison, P. H. Goodman, F. C. Harris, M. Zirpe, T. Natschläger, D. Pecevski, B. Ermentrout, M. Djurfeldt, A. Lansner, O. Rochel, T. Vieville, E. Muller, A. P. Davison, S. El Boustani, and A. Destexhe, "Simulation of networks of spiking neurons: a review of tools and strategies," Journal of Computational Neuroscience, vol. 23, no. 3, pp. 349, Dec. 2007.
[28] K. Bult and G. Geelen, "An inherently linear and compact MOST-only current division technique," IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1730, 1992.

[29] N. T. Carnevale and M. L. Hines, The NEURON Book, 1st ed. Cambridge University Press, 2009.
[30] A. Cassidy and A. Andreou, "Dynamical digital silicon neurons," in IEEE Biomedical Circuits and Systems Conference, Baltimore, MD, 2008, pp. 289.
[31] A. Cassidy, S. Denham, P. Kanold, and A. G. Andreou, "FPGA Based Silicon Spiking Neural Array," in IEEE Biomedical Circuits and Systems Conference. IEEE, 2007, pp. 75.
[32] G. Cauwenberghs and A. Yariv, "Fault-tolerant dynamic multilevel storage in analog VLSI," IEEE Transactions on Circuits and Systems Part II: Analog and Digital Signal Processing, vol. 41, no. 12, pp. 827, 1994.
[33] G. Cauwenberghs, D. H. Goldberg, and A. G. Andreou, "Probabilistic synaptic weighting in a reconfigurable network of VLSI integrate-and-fire neurons," Neural Networks, vol. 14, no. 6-7, pp. 781, 2001.
[34] E. Chicca, A. M. Whatley, P. Lichtsteiner, V. Dante, T. Delbrück, P. Del Giudice, R. J. Douglas, and G. Indiveri, "A Multichip Pulse-Based Neuromorphic Infrastructure and Its Application to a Model of Orientation Selectivity," IEEE Transactions on Circuits and Systems Part I: Regular Papers, vol. 54, no. 5, pp. 981, 2007.
[35] V. Dante, "Hardware and software for interfacing to address-event based neuromorphic systems," The Neuromorphic Engineer, vol. 2, no. 1, pp. 5, 2005.
[36] P. Dayan and L. F. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. The MIT Press, 2005.
[37] S. R. Deiss, R. J. Douglas, and A. M. Whatley, A pulse-coded communications infrastructure for neuromorphic systems, 1st ed. MIT Press, 1998, ch. 6, pp. 157-178.
[38] T. Delbrück, "Frame-free dynamic digital vision," in International Symposium on Secure-Life Electronics, Advanced Electronics for Quality Life and Society, Tokyo, Japan, 2008, pp. 21.
[39] T. Delbrück and A. V. Schaik, "Bias Current Generators with Wide Dynamic Range," Analog Integrated Circuits and Signal Processing, vol. 43, no. 3, pp. 247, Jun. 2005.
[40] A. Delorme and S. J. Thorpe, "SpikeNET: an event-driven simulation package for modelling large networks of spiking neurons," Network (Bristol, England), vol. 14, no. 4, pp. 613, Nov. 2003.
[41] R. Etienne-Cummings, E. Culurciello, and K. A. Boahen, "A biomorphic digital image sensor," IEEE Journal of Solid-State Circuits, vol. 38, no. 2, pp. 281, 2003.

[42] J. Fieres, J. Schemmel, and K. Meier, "Realizing biological spiking network models in a configurable wafer-scale hardware system," in IEEE International Joint Conference on Neural Networks. IEEE, 2008, pp. 969.
[43] F. O. Folowosele, "Neuromorphic Systems: Silicon neurons and neural arrays for emulating the nervous system," 2010. [Online]. Available: http://www.neurdon.com/2010/08/12/neuromorphic-systems-silicon-neurons-and-neural-arrays-for-emulating-the-nervous-system/
[44] W. Gerstner and W. M. Kistler, Spiking Neuron Models, 1st ed. Cambridge University Press, 2002.
[45] M.-O. Gewaltig and M. Diesmann, "NEST (NEural Simulation Tool)," Scholarpedia, vol. 2, no. 4, p. 1430, 2007.
[46] B. Girau and C. Torres-Huitzil, "Massively distributed digital implementation of an integrate-and-fire LEGION network for visual scene segmentation," Neurocomputing, vol. 70, no. 7-9, pp. 1186, 2007.
[47] M. Giulioni, P. Camilleri, V. Dante, D. Badoni, G. Indiveri, J. Braun, and P. Del Giudice, "A VLSI network of spiking neurons with plastic fully configurable stop-learning synapses," in IEEE International Conference on Electronics, Circuits and Systems, St. Julien's, Malta, 2008, pp. 678.
[48] M. Giulioni, M. Pannunzi, D. Badoni, V. Dante, and P. Del Giudice, "Classification of correlated patterns with a configurable analog VLSI neural network of spiking neurons and self-regulating plastic synapses," Neural Computation, vol. 21, no. 11, pp. 3106, Aug. 2009.
[49] D. F. M. Goodman and R. Brette, "Brian: a simulator for spiking neural networks in Python," Frontiers in Neuroinformatics, vol. 2, p. 5, 2008.
[50] D. F. M. Goodman and R. Brette, "The Brian simulator," Frontiers in Neuroscience, vol. 3, no. 2, pp. 192, 2009.
[51] X. Guo, X. Qi, and J. G. Harris, "A Time-to-First-Spike CMOS Image Sensor," IEEE Sensors Journal, vol. 7, no. 8, pp. 1165, Aug. 2007.
[52] T. Hall, C. M. Twigg, J. Gray, P. Hasler, and D. Anderson, "Large-scale field-programmable analog arrays for analog signal processing," IEEE Transactions on Circuits and Systems Part I: Regular Papers, vol. 52, no. 11, pp. 2298, 2005.
[53] P. Hammarlund, O. Ekeberg, T. Wilhelmsson, and A. Lansner, "Large neural network simulations on multiple hardware platforms," in Annual Conference on Computational Neuroscience: Trends in Research. New York: Plenum Press, 1997, pp. 919.

[54] J. Harkin, F. Morgan, L. McDaid, S. Hall, B. McGinley, and S. Cawley, "A Reconfigurable and Biologically Inspired Paradigm for Computation Using Network-On-Chip and Spiking Neural Networks," International Journal of Reconfigurable Computing, vol. 2009, pp. 1, 2009.
[55] R. Harrison, J. Bragg, P. Hasler, B. Minch, and S. Deweerth, "A CMOS programmable analog memory-cell array using floating-gate circuits," IEEE Transactions on Circuits and Systems Part II: Analog and Digital Signal Processing, vol. 48, no. 1, pp. 4, 2001.
[56] K. Hynna and K. A. Boahen, "Neuronal ion-channel dynamics in silicon," in Proceedings of the IEEE International Symposium on Circuits and Systems, 2006, p. 4.
[57] G. Indiveri, "Modeling Selective Attention Using a Neuromorphic Analog VLSI Device," Neural Computation, vol. 12, no. 12, pp. 2857, Dec. 2000.
[58] G. Indiveri, "A low-power adaptive integrate-and-fire neuron circuit," in Proceedings of the IEEE International Symposium on Circuits and Systems, 2003, pp. 12.
[59] G. Indiveri, A. Whatley, and J. Kramer, "A reconfigurable neuromorphic VLSI multi-chip system applied to visual motion computation," in Proceedings of the Seventh International Conference on Microelectronics for Neural, Fuzzy and Bio-Inspired Systems, 1999, pp. 37.
[60] G. Indiveri, E. Chicca, and R. Douglas, "A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity," IEEE Transactions on Neural Networks, vol. 17, no. 1, pp. 211, 2006.
[61] G. Indiveri, E. Chicca, and R. J. Douglas, "Artificial Cognitive Systems: From VLSI Networks of Spiking Neurons to Neuromorphic Cognition," Cognitive Computation, vol. 1, no. 2, pp. 119, 2009.
[62] E. Izhikevich, "Which model to use for cortical spiking neurons?," IEEE Transactions on Neural Networks, vol. 15, no. 5, pp. 1063, Sep. 2004.
[63] A. Jahnke, U. Roth, and H. Klar, "A SIMD/Dataflow Architecture for a Neurocomputer for Spike-Processing Neural Networks (NESPINN)," in MICRONEURO: 5th International Conference on Microelectronics for Neural Networks and Fuzzy Systems. IEEE Computer Society, 1996, p. 232.
[64] X. Jin, F. Galluppi, C. Patterson, A. Rast, S. Davies, T. Steve, and S. Furber, "Algorithm and Software for Simulation of Spiking Neural Networks on the Multi-Chip SpiNNaker System," in IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Barcelona, Spain, 2010, pp. 649.

[65] X. Jin, A. Rast, F. Galluppi, S. Davies, and S. Furber, "Implementing Spike-Timing-Dependent Plasticity on SpiNNaker Neuromorphic Hardware," in IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Barcelona, Spain, 2010, pp. 2302.
[66] M. Khan, D. Lester, L. Plana, A. Rast, X. Jin, E. Painkras, and S. Furber, "SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor," in IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE, Jun. 2008, pp. 2849.
[67] T. J. Koickal, L. C. Gouveia, and A. Hamilton, "A programmable spike-timing based circuit block for reconfigurable neuromorphic computing," Neurocomputing, vol. 72, no. 16-18, pp. 3609, Oct. 2009.
[68] T. S. Lande, Ed., Neuromorphic Systems Engineering: Neural Networks in Silicon, ser. The Springer International Series in Engineering and Computer Science. Kluwer Academic, 1998.
[69] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, and D. Gillespie, "Silicon auditory processors as computer peripherals," IEEE Transactions on Neural Networks, vol. 4, no. 3, pp. 523, 1993.
[70] P. Lichtsteiner, C. Posch, and T. Delbrück, "A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor," IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566, 2008.
[71] A. Linares-Barranco, F. Gomez-Rodriguez, G. Jimenez, T. Delbruck, R. Berner, and S. C. Liu, "Implementation of a time-warping AER mapper," in Proceedings of the IEEE International Symposium on Circuits and Systems. IEEE, May 2009, pp. 2886.
[72] S.-C. Liu, J. Kramer, G. Indiveri, T. Delbrück, and R. Douglas, Eds., Analog VLSI: Circuits and Principles. The MIT Press, 2002.
[73] W. Maass, T. Natschläger, and H. Markram, "Real-time computing without stable states: a new framework for neural computation based on perturbations," Neural Computation, vol. 14, no. 11, pp. 2531, 2002.
[74] L. Maguire, T. Mcginnity, B. Glackin, A. Ghani, A. Belatreche, and J. Harkin, "Challenges for large-scale implementations of spiking neural networks on FPGAs," Neurocomputing, vol. 71, no. 1-3, pp. 13, Dec. 2007.
[75] M. Mahowald, An Analog VLSI System for Stereoscopic Vision, 1st ed. Springer, 1994.
[76] C. Mead, Analog VLSI and Neural Systems, 1st ed. Addison Wesley Publishing Company, 1989.

[77] P. A. Merolla, J. V. Arthur, and J. J. Wittig, The USB Revolution, Dec. 2005, pp. 10.
[78] P. A. Merolla, J. V. Arthur, B. E. Shi, and K. A. Boahen, "Expandable Networks for Neuromorphic Chips," IEEE Transactions on Circuits and Systems Part I: Regular Papers, vol. 54, no. 2, pp. 301, Feb. 2007.
[79] S. Mitra, S. Fusi, and G. Indiveri, "A VLSI spike-driven dynamic synapse which learns only when necessary," in Proceedings of the IEEE International Symposium on Circuits and Systems. IEEE, 2006, p. 4.
[80] S. Mitra, G. Indiveri, and S. Fusi, "Learning to classify complex patterns using a VLSI network of spiking," Advances in Neural Information Processing Systems, vol. 20, pp. 1, 2008.
[81] A. Mortara, E. Vittoz, and P. Venier, "A communication scheme for analog VLSI perceptive systems," IEEE Journal of Solid-State Circuits, vol. 30, no. 6, pp. 660, 1995.
[82] T. Natschläger, H. Markram, and W. Maass, Computer models and analysis tools for neural microcircuits. Boston: Kluwer Academic Publishers, 2002, ch. 9, pp. 123.
[83] M. O'Halloran and R. Sarpeshkar, "A 10-nW 12-bit accurate analog storage cell with 10-aA leakage," IEEE Journal of Solid-State Circuits, vol. 39, no. 11, pp. 1985, 2004.
[84] R. Paz, F. Gomez-Rodriguez, M. A. Rodriguez, A. Linares-Barranco, G. Jimenez-Moreno, and A. Civit, Test Infrastructure for Address-Event-Representation Communications, ser. Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2005, pp. 518.
[85] D. Pecevski, T. Natschläger, and K. Schuch, "PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python," Frontiers in Neuroinformatics, vol. 3, p. 11, 2009.
[86] L. A. Plana, S. B. Furber, S. Temple, M. Khan, Y. Shi, J. Wu, and S. Yang, "A GALS Infrastructure for a Massively Parallel Multiprocessor," IEEE Design & Test of Computers, vol. 24, no. 5, pp. 454, Sep. 2007.
[87] A. Rast, F. Galluppi, X. Jin, and S. Furber, "The Leaky Integrate-and-Fire Neuron: A Platform for Synaptic Model Exploration on the SpiNNaker Chip," in IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2010, pp. 3959.
[88] A. D. Rast, M. Khan, and S. B. Furber, "Virtual synaptic interconnect using an asynchronous network-on-chip," in IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE, Jun. 2008, pp. 2727.
[89] A. D. Rast, X. Jin, F. Galluppi, L. A. Plana, C. Patterson, and S. Furber, "Scalable event-driven native parallel processing," in Proceedings of the 7th ACM international conference on Computing frontiers - CF '10. New York, New York, USA: ACM Press, 2010, p. 21.
[90] V. Ravinuthula and J. G. Harris, "Time-based arithmetic using step functions," in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 1, 2004, pp. 305.
[91] V. Ravinuthula, V. Garg, J. G. Harris, and J. A. B. Fortes, "Time-mode circuits for analog computation," International Journal of Circuit Theory and Applications, vol. 37, no. 5, pp. 631, Jun. 2009.
[92] M. Rivas, F. Gomez-Rodriguez, R. Paz, A. Linares-Barranco, S. Vicente, and D. Cascado, Tools for Address-Event-Representation Communication Systems and Debugging, ser. Lecture Notes in Computer Science. Berlin/Heidelberg: Springer-Verlag, 2005, vol. 3696, pp. 289.
[93] D. Roggen, S. Hohmann, Y. Thoma, and D. Floreano, "Hardware spiking neural network with run-time reconfigurable connectivity in an autonomous robot," in NASA/DoD Conference on Evolvable Hardware. Proceedings. Los Alamitos, CA, USA: IEEE Computer Society, 2003, pp. 189.
[94] E. Ros, E. M. Ortigosa, R. Agís, R. Carrillo, and M. Arnold, "Real-time computing platform for spiking neurons (RT-spike)," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 1050, 2006.
[95] J. Rumbaugh, I. Jacobson, and G. Booch, The Unified Modeling Language Reference Manual, 2nd ed. Addison-Wesley Professional, 2004.
[96] J. Schemmel, J. Fieres, and K. Meier, "Wafer-scale integration of analog neural networks," in IEEE International Joint Conference on Neural Networks, 2008, pp. 431.
[97] T. Schoenauer, N. Mehrtash, A. Jahnke, and H. Klar, "MASPINN: novel concepts for a neuroaccelerator for spiking neural networks," in Proceedings of SPIE, vol. 3728, no. 1, Stockholm, Sweden, Mar. 1999, pp. 87.
[98] B. Schrauwen, M. Dhaene, D. Verstraeten, and J. V. Campenhout, "Compact hardware for real-time speech recognition using a liquid state machine," in International Joint Conference on Neural Networks, 2007.
[99] B. Schrauwen, D. Verstraeten, and J. V. Campenhout, "An overview of reservoir computing: theory, applications and implementations," in 15th European Symposium on Artificial Neural Networks, 2007, pp. 471.

[100] B. Schrauwen, M. D'Haene, D. Verstraeten, and J. V. Campenhout, "Compact hardware liquid state machines on FPGA for real-time speech recognition," Neural Networks, vol. 21, no. 2-3, pp. 511, 2008.
[101] T. Serrano-Gotarredona, A. G. Andreou, and B. Linares-Barranco, "AER image filtering architecture for vision-processing systems," IEEE Transactions on Circuits and Systems Part I: Fundamental Theory and Applications, vol. 46, no. 9, pp. 1064, 1999.
[102] M. A. Sivilotti, "Wiring considerations in analog VLSI systems, with application to field-programmable networks," Ph.D. dissertation, California Institute of Technology, 1992.
[103] Telluride Neuromorphic Cognition Engineering Workshop, Spike Computation Group Results, 2010. [Online]. Available: https://neuromorphs.net/nm/wiki/2010/results/spike
[104] O. Türel, J. H. Lee, X. Ma, and K. K. Likharev, "Neuromorphic architectures for nanoelectronic circuits," International Journal of Circuit Theory and Applications, vol. 32, no. 5, pp. 277, 2004.
[105] C. M. Twigg, J. D. Gray, and P. E. Hasler, "Programmable floating gate FPAA switches are not dead weight," in Proceedings of the IEEE International Symposium on Circuits and Systems. IEEE, May 2007, pp. 169.
[106] I. Uysal and J. G. Harris, "Biologically plausible speech recognition using spike-based phase locking cues," in Proceedings of the IEEE International Symposium on Circuits and Systems, May 2009, pp. 101.
[107] I. Uysal, H. Sathyendra, and J. G. Harris, "Spike-Based Feature Extraction for Noise Robust Speech Recognition Using Phase Synchrony Coding," in Proceedings of the IEEE International Symposium on Circuits and Systems. IEEE, May 2007, pp. 1529.
[108] C. Vincent, L. Shih-Chii, and S. André Van, "AER EAR: A Matched Silicon Cochlea Pair With Address Event Representation Interface," IEEE Transactions on Circuits and Systems Part I: Regular Papers, vol. 54, no. 1, pp. 48, 2007.
[109] R. J. Vogelstein, U. Mallik, E. Culurciello, G. Cauwenberghs, and R. Etienne-Cummings, "A multichip neuromorphic system for spike-based visual information processing," Neural Computation, vol. 19, no. 9, pp. 2281, 2007.
[110] R. J. Vogelstein, U. Mallik, J. T. Vogelstein, and G. Cauwenberghs, "Dynamically reconfigurable silicon array of spiking neurons with conductance-based synapses," IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 253, 2007.

[111] R. Vogelstein, U. Mallik, and G. Cauwenberghs, "Silicon spike-based synaptic array and address-event transceiver," in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 1. IEEE, 2004, pp. 385.
[112] R. Widlar, "Some Circuit Design Techniques for Linear Integrated Circuits," IEEE Transactions on Circuit Theory, vol. 12, no. 4, pp. 586, 1965.
[113] C. Wolff, G. Hartmann, and U. Rückert, "ParSPIKE - A Parallel DSP-Accelerator for Dynamic Simulation of Large Spiking Neural Networks," in MICRONEURO: 7th International Conference on Microelectronics for Neural, Fuzzy and Bio-Inspired Systems. IEEE Computer Society, 1999, p. 324.
[114] Xilinx, Digital Clock Manager (DCM) Module, p. 6, 2009. [Online]. Available: http://www.xilinx.com/support/documentation/ip_documentation/dcm_module.pdf
[115] A. Zador and L. Dobrunz, "Dynamic Synapses in the Cortex," Neuron, vol. 19, no. 1, pp. 1, Jul. 1997.
[116] K. A. Zaghloul and K. A. Boahen, "A silicon retina that reproduces signals in the optic nerve," Journal of Neural Engineering, vol. 3, no. 4, pp. 257, 2006.
[117] R. S. Zucker and W. G. Regehr, "Short-term synaptic plasticity," Annual Review of Physiology, vol. 64, pp. 355, 2002.

BIOGRAPHICAL SKETCH

Vaibhav Garg was born in Delhi, India, in 1982 to Savita and Ravinder Kumar Garg. He has one younger sister, Dishi Garg. He has been married since 2007 to Varsha Chitnis, whom he met during his undergraduate years. Vaibhav received a Bachelor of Technology degree in Information and Communication Technology from the Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India, in May 2005.

Since Fall 2005, Vaibhav has been a research assistant in the Computational NeuroEngineering Laboratory (CNEL) at the University of Florida, working with Dr. John G. Harris on a spike-based computation architecture. He received the Outstanding International Student of the Year award from the University of Florida International Center in two consecutive years, 2009 and 2010. His research interests include spike-based computation and low-power and mixed-signal circuit design. He is also interested in writing software and likes to learn new computer languages and technologies. Vaibhav was responsible for designing and developing an online graduate application website, automating tasks for the personnel office and developing an online database management system for current students in the Department of Electrical Engineering at the University of Florida. In his spare time, he likes to work on automobiles.

Vaibhav received his Master of Science (MS) degree in electrical engineering from the University of Florida in 2007 and his Ph.D. degree in electrical engineering, also from the University of Florida, in December 2010. He now works for Texas Instruments Inc., Dallas, as an analog circuit designer in its mixed-signal automotive group.