University Press of Florida

Computer Networking: Principles, Protocols, and Practice

Material Information

Title:
Computer Networking: Principles, Protocols, and Practice
Physical Description:
Book
Language:
en-US
Creator:
Bonaventure, Olivier, Saylor

Subjects

Subjects / Keywords:
Local Area Networks,TCP/IP,Email,DNS,HTTP,TCP/UDP,LAN
Mathematics

Notes

Abstract:
Computer Networking: Principles, Protocols, and Practice was written and submitted to the Open Textbook Challenge by Dr. Olivier Bonaventure of the Université catholique de Louvain (UCL) in Louvain-la-Neuve, Belgium. He also serves as the Education Director of ACM SIGCOMM. Computer Networking has already been used by several universities around the world, including UCL.
General Note:
Expositive
General Note:
Higher Education
General Note:
Bonaventure, Olivier
General Note:
Narrative text
General Note:
http://florida.theorangegrove.org/og/file/be18c155-0a47-e83e-72c6-24ca415b45da/1/Computer-Networking-Principles-Bonaventure-1-30-31-OTC1.pdf

Record Information

Source Institution:
University of Florida
Rights Management:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/. You are free to copy, distribute and transmit this work and to adapt this work if you attribute authorship and use the work for non-commercial purposes. If you …
Resource Identifier:
oclc - no OCLC record
System ID:
AA00011742:00001


Full Text

PAGE 1

Computer Networking: Principles, Protocols and Practice
Release 0.25
Olivier Bonaventure
October 30, 2011

PAGE 3

Contents

1 Preface .... 3
2 Introduction .... 5
  2.1 Services and protocols .... 11
  2.2 The reference models .... 20
  2.3 Organisation of the book .... 25
3 The application layer .... 27
  3.1 Principles .... 27
  3.2 Application-level protocols .... 32
  3.3 Writing simple networked applications .... 55
  3.4 Summary .... 61
  3.5 Exercises .... 61
4 The transport layer .... 67
  4.1 Principles of a reliable transport protocol .... 67
  4.2 The User Datagram Protocol .... 87
  4.3 The Transmission Control Protocol .... 89
  4.4 Summary .... 113
  4.5 Exercises .... 114
5 The network layer .... 127
  5.1 Principles .... 127
  5.2 Internet Protocol .... 140
  5.3 Routing in IP networks .... 170
  5.4 Summary .... 195
  5.5 Exercises .... 195
6 The datalink layer and the Local Area Networks .... 211
  6.1 Principles .... 211
  6.2 Medium Access Control .... 214
  6.3 Datalink layer technologies .... 228
  6.4 Summary .... 246
  6.5 Exercises .... 246
7 Glossary .... 249
8 Bibliography .... 255

PAGE 4

9 Indices and tables .... 257
Bibliography .... 259
Index .... 273

PAGE 5

Computer Networking: Principles, Protocols and Practice, Release 0.25

PAGE 6

PAGE 7

CHAPTER 1

Preface

This textbook came from a frustration of its main author. Many authors chose to write a textbook because there are no textbooks in their field or because they are not satisfied with the existing textbooks. This frustration has produced several excellent textbooks in the networking community. At a time when networking textbooks were mainly theoretical, Douglas Comer chose to write a textbook entirely focused on the TCP/IP protocol suite [Comer1988], a difficult choice at that time. He later extended his textbook by describing a complete TCP/IP implementation, adding practical considerations to the theoretical descriptions in [Comer1988]. Richard Stevens approached the Internet like an explorer and explained the operation of protocols by looking at all the packets that were exchanged on the wire [Stevens1994]. Jim Kurose and Keith Ross reinvented the networking textbooks by starting from the applications that the students use and later explained the Internet protocols by removing one layer after the other [KuroseRoss09].

The frustrations that motivated this book are different. When I started to teach networking in the late 1990s, students were already Internet users, but their usage was limited. Students were still using reference textbooks and spent time in the library. Today's students are completely different. They are avid and experienced web users who find lots of information on the web. This is a positive attitude since they are probably more curious than their predecessors. Thanks to the information that is available on the Internet, they can check or obtain additional information about the topics explained by their teachers. This abundant information creates several challenges for a teacher. Until the end of the nineteenth century, a teacher was by definition more knowledgeable than his students and it was very difficult for the students to verify the lessons given by their teachers. Today, given the amount of information available at the fingertips of each student through the Internet, verifying a lesson or getting more information about a given topic is sometimes only a few clicks away. Websites such as wikipedia provide lots of information on various topics and students often consult them. Unfortunately, the organisation of the information on these websites is not well suited to allow students to learn from them. Furthermore, there are huge differences in the quality and depth of the information that is available for different topics.

The second reason is that the computer networking community is a strong participant in the open-source movement. Today, there are high-quality and widely used open-source implementations for most networking protocols. This includes the TCP/IP implementations that are part of linux, freebsd or the uIP stack running on 8-bit controllers, but also servers such as bind, unbound, apache or sendmail and implementations of routing protocols such as xorp or quagga. Furthermore, the documents that define almost all of the Internet protocols have been developed within the Internet Engineering Task Force (IETF) using an open process. The IETF publishes its protocol specifications in the publicly available RFCs and new proposals are described in Internet drafts.

This open textbook aims to fill the gap between the open-source implementations and the open-source network specifications by providing a detailed but pedagogical description of the key principles that guide the operation of the Internet. The book is released under a creative commons licence. Such an open-source licence is motivated by two reasons. The first is that we hope that this will allow many students to use the book to learn computer networks. The second is that I hope that other teachers will reuse, adapt and improve it. Time will tell if it is possible to build a community of contributors to improve and develop the book further. As a starting point, the first release contains all the material for a one-semester first upper undergraduate or a graduate networking course.

As of this writing, most of the text has been written by Olivier Bonaventure.

PAGE 8

Laurent Vanbever, Virginie Vanden Schriek, Damien Saucez and Mickael Hoerdt have contributed to exercises. Pierre Reinbold designed the icons used to represent switches and Nipaul Long has redrawn many figures in the SVG format. Stephane Bortzmeyer sent many suggestions and corrections to the text. Additional information about the textbook is available at http://inl.info.ucl.ac.be/CNP3

PAGE 9

CHAPTER 2

Introduction

When the first computers were built during the Second World War, they were expensive and isolated. However, after about twenty years, as their prices gradually decreased, the first experiments began to connect computers together. In the early 1960s, researchers including Paul Baran, Donald Davies or Joseph Licklider independently published the first papers describing the idea of building computer networks [Baran] [Licklider1963]. Given the cost of computers, sharing them over a long distance was an interesting idea. In the US, the ARPANET started in 1969 and continued until the mid 1980s [LCCD09]. In France, Louis Pouzin developed the Cyclades network [Pouzin1975]. Many other research networks were built during the 1970s [Moore]. At the same time, the telecommunication and computer industries became interested in computer networks. The telecommunication industry bet on X.25. The computer industry took a completely different approach by designing Local Area Networks (LAN). Many LAN technologies such as Ethernet or Token Ring were designed at that time.

During the 1980s, the need to interconnect more and more computers led most computer vendors to develop their own suite of networking protocols. Xerox developed [XNS], DEC chose DECNet [Malamud1991], IBM developed SNA [McFadyen1976], Microsoft introduced NetBIOS [Winston2003], Apple bet on Appletalk [SAO1990]. In the research community, ARPANET was decommissioned and replaced by TCP/IP [LCCD09] and the reference implementation was developed inside BSD Unix [McKusick1999]. Universities who were already running Unix could thus adopt TCP/IP easily and vendors of Unix workstations such as Sun or Silicon Graphics included TCP/IP in their variant of Unix. In parallel, the ISO, with support from the governments, worked on developing an open¹ suite of networking protocols. In the end, TCP/IP became the de facto standard that is not only used within the research community. During the 1990s and the early 2000s, the growth of the usage of TCP/IP continued, and today proprietary protocols are seldom used. As shown by the figure below, which provides the estimation of the number of hosts attached to the Internet, the Internet has sustained large growth throughout the last 20+ years.

Figure 2.1: Estimation of the number of hosts on the Internet

¹ Open in ISO terms was in contrast with the proprietary protocol suites whose specification was not always publicly available. The US government even mandated the usage of the OSI protocols (see RFC 1169), but this was not sufficient to encourage all users to switch to the OSI protocol suite that was considered by many as too complex compared to other protocol suites.

PAGE 10

Recent estimations of the number of hosts attached to the Internet show continuing growth for more than 20 years. However, although the number of hosts attached to the Internet is high, it should be compared to the number of mobile phones that are in use today. More and more of these mobile phones will be connected to the Internet. Furthermore, thanks to the availability of TCP/IP implementations requiring limited resources such as uIP [Dunkels2003], we can expect to see a growth of TCP/IP enabled embedded devices.

Figure 2.2: Estimation of the number of mobile phones

Before looking at the services provided by computer networks, it is useful to agree on some terminology that is widely used in networking literature. First of all, computer networks are often classified according to the geographical area that they cover:

• LAN: a local area network typically interconnects hosts that are up to a few or maybe a few tens of kilometers apart.
• MAN: a metropolitan area network typically interconnects devices that are up to a few hundred kilometers apart.
• WAN: a wide area network interconnects hosts that can be located anywhere on Earth².

Another classification of computer networks is based on their physical topology. In the following figures, physical links are represented as lines while boxes show computers or other types of networking equipment.

Computer networks are used to allow several hosts to exchange information between themselves. To allow any host to send messages to any other host in the network, the easiest solution is to organise them as a full-mesh, with a direct and dedicated link between each pair of hosts. Such a physical topology is sometimes used, especially when high performance and high redundancy are required for a small number of hosts. However, it has two major drawbacks:

• for a network containing n hosts, each host must have n-1 physical interfaces. In practice, the number of physical interfaces on a node will limit the size of a full-mesh network that can be built.
• for a network containing n hosts, n(n-1)/2 links are required. This is possible when there are a few nodes in the same room, but rarely when they are located several kilometers apart.

The second possible physical organisation, which is also used inside computers to connect different extension cards, is the bus. In a bus network, all hosts are attached to a shared medium, usually a cable, through a single interface. When one host sends an electrical signal on the bus, the signal is received by all hosts attached to the bus. A drawback of bus-based networks is that if the bus is physically cut, then the network is split into two isolated networks. For this reason, bus-based networks are sometimes considered to be difficult to operate and maintain, especially when the cable is long and there are many places where it can break. Such a bus-based topology was used in early Ethernet networks.

A third organisation of a computer network is a star topology. In such topologies, hosts have a single physical interface and there is one physical link between each host and the center of the star. The node at the center of the star can be either a piece of equipment that amplifies an electrical signal, or an active device, such as a piece of equipment that understands the format of the messages exchanged through the network.

² In this book, we focus on networks that are used on Earth. These networks sometimes include satellite links. Besides the network technologies that are used on Earth, researchers develop networking techniques that could be used between nodes located on different planets. Such an InterPlanetary Internet requires different techniques than the ones discussed in this book. See RFC 4838 and the references therein for information about these techniques.

PAGE 11

Figure 2.3: A full-mesh network
Figure 2.4: A network organised as a bus

Of course, the failure of the central node implies the failure of the network. However, if one physical link fails (e.g. because the cable has been cut), then only one node is disconnected from the network. In practice, star-shaped networks are easier to operate and maintain than bus-shaped networks. Many network administrators also appreciate the fact that they can control the network from a central point. Administered from a Web interface, or through a console-like connection, the center of the star is a useful point of control (enabling or disabling devices) and an excellent observation point (usage statistics).

Figure 2.5: A network organised as a star

A fourth physical organisation of a network is the ring topology. As in the bus organisation, each host has a single physical interface connecting it to the ring. Any signal sent by a host on the ring will be received by all hosts attached to the ring. From a redundancy point of view, a single ring is not the best solution, as the signal only travels in one direction on the ring; thus if one of the links composing the ring is cut, the entire network fails. In practice, such rings have been used in local area networks, but are now often replaced by star-shaped networks. In metropolitan networks, rings are often used to interconnect multiple locations. In this case, two parallel links, composed of different cables, are often used for redundancy. With such a dual ring, when one ring fails all the traffic can be quickly switched to the other ring.

A fifth physical organisation of a network is the tree. Such networks are typically used when a large number of customers must be connected in a very cost-effective manner. Cable TV networks are often organised as trees.

In practice, most real networks combine part of these topologies. For example, a campus network can be organised as a ring between the key buildings, while smaller buildings are attached as a tree or a star to important buildings.
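The two scaling drawbacks of the full mesh listed earlier (n-1 interfaces per host and n(n-1)/2 links) can be checked numerically. The following is only a minimal sketch in Python; the function names full_mesh_cost and star_cost are illustrative and do not come from the book. The star figures assume, as in the text, one interface per host and one link from each host to the central node.

```python
def full_mesh_cost(n):
    """Return (interfaces per host, total links) for a full mesh of n hosts."""
    if n < 1:
        raise ValueError("a network needs at least one host")
    # Every host has a dedicated link to each of the n-1 other hosts,
    # and each link is shared by exactly two hosts.
    return n - 1, n * (n - 1) // 2


def star_cost(n):
    """Return (interfaces per host, total links) for a star of n hosts."""
    if n < 1:
        raise ValueError("a network needs at least one host")
    # Each host has one interface and one link to the central node.
    return 1, n


for hosts in (4, 10, 100):
    print(hosts, full_mesh_cost(hosts), star_cost(hosts))
```

For 100 hosts, the full mesh needs 99 interfaces on every host and 4950 links, while the star needs a single interface per host and 100 links, which helps explain why full meshes are reserved for small sets of hosts.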

PAGE 12

Figure 2.6: A network organised as a ring
Figure 2.7: A network organised as a tree

PAGE 13

Similarly, an ISP network may have a full mesh of devices in the core of its network, and trees to connect remote users.

Throughout this book, our objective will be to understand the protocols and mechanisms that are necessary for a network such as the one shown below.

Figure 2.8: A simple internetwork

The figure above illustrates an internetwork, i.e. a network that interconnects other networks. Each network is illustrated as an ellipse containing a few devices. We will explain throughout the book the different types of devices and their respective roles enabling all hosts to exchange information. In addition, we will discuss how networks are interconnected, and the rules that guide these interconnections. We will also analyse how the bus, ring and mesh topologies are used to build real networks.

The last point of terminology we need to discuss is the transmission modes. When exchanging information through a network, we often distinguish between three transmission modes. In TV and radio transmission, broadcast is often used to indicate a technology that sends a video or radio signal to all receivers in a given geographical area. Broadcast is sometimes used in computer networks, but only in local area networks where the number of recipients is limited.

The first and most widespread transmission mode is called unicast. In the unicast transmission mode, information is sent by one sender to one receiver. Most of today's Internet applications rely on the unicast transmission mode. The example below shows a network with two types of devices: hosts (drawn as computers) and intermediate nodes (drawn as cubes). Hosts exchange information via the intermediate nodes. In the example below, when host S uses unicast to send information, it sends it via three intermediate nodes. Each of these nodes receives the information from its upstream node or host, then processes and forwards it to its downstream node or host. This is called store and forward and we will see later that this concept is key in computer networks.
A second transmission mode is the multicast transmission mode. This mode is used when the same information must be sent to a set of recipients. It was first used in LANs but later became supported in wide area networks. When a sender uses multicast to send information to N receivers, the sender sends a single copy of the information and the network nodes duplicate this information whenever necessary, so that it can reach all recipients belonging to the destination group.

To understand the importance of multicast transmission, consider source S that sends the same information to destinations A, C and E. With unicast, the same information passes three times on intermediate nodes 1 and 2 and twice on node 4. This is a waste of resources on the intermediate nodes and on the links between them. With multicast transmission, host S sends the information to node 1 that forwards it downstream to node 2. This node creates a copy of the received information and sends one copy directly to host E and the other downstream to node 4. Upon reception of the information, node 4 produces a copy and forwards one to host A and another to host C. Thanks to multicast, the same information can reach a large number of receivers while being sent only once on each link.
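The saving described above can be made concrete by counting how many copies each link carries. The sketch below is illustrative only: the TREE dictionary is an assumption reconstructed from the textual description of the example (S forwards to node 1, node 1 to node 2, node 2 to host E and node 4, node 4 to hosts A and C), and the function names are hypothetical.

```python
from collections import Counter

# Assumed forwarding tree, reconstructed from the description of the example.
# Each key forwards information to the listed downstream neighbours.
TREE = {
    "S": ["1"],
    "1": ["2"],
    "2": ["E", "4"],
    "4": ["A", "C"],
}


def path(tree, source, destination):
    """Return the list of nodes from source to destination, or None."""
    if source == destination:
        return [source]
    for child in tree.get(source, []):
        rest = path(tree, child, destination)
        if rest is not None:
            return [source] + rest
    return None


def unicast_link_load(tree, source, destinations):
    """Count the copies carried by each link when one copy is sent per destination."""
    load = Counter()
    for destination in destinations:
        hops = path(tree, source, destination)
        if hops is None:
            raise ValueError(f"{destination!r} is not reachable from {source!r}")
        for link in zip(hops, hops[1:]):
            load[link] += 1
    return load


def multicast_link_load(tree, source, destinations):
    """With multicast, each link on the delivery tree carries exactly one copy."""
    used_links = unicast_link_load(tree, source, destinations)
    return Counter({link: 1 for link in used_links})


unicast = unicast_link_load(TREE, "S", ["A", "C", "E"])
multicast = multicast_link_load(TREE, "S", ["A", "C", "E"])
print("unicast copies per link:", dict(unicast))
print("total transmissions:", sum(unicast.values()), "vs", sum(multicast.values()))
```

Under this assumed topology, unicast places three copies on the links S-1 and 1-2 and two copies on the link 2-4, for 11 link transmissions in total, while multicast uses each of the six links once.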

PAGE 14

Figure 2.9: Unicast transmission
Figure 2.10: Multicast transmission

PAGE 15

The last transmission mode is the anycast transmission mode. It was initially defined in RFC 1546. In this transmission mode, a set of receivers is identified. When a source sends information towards this set of receivers, the network ensures that the information is delivered to one receiver that belongs to this set. Usually, the receiver closest to the source is the one that receives the information sent by this particular source. The anycast transmission mode is useful to ensure redundancy, as when one of the receivers fails, the network will ensure that information will be delivered to another receiver belonging to the same group. However, in practice supporting the anycast transmission mode can be difficult.

Figure 2.11: Anycast transmission

In the example above, the three marked hosts are part of the same anycast group. When host S sends information to this anycast group, the network ensures that it will reach one of the members of the anycast group. The dashed lines show a possible delivery via nodes 1, 2 and 4. A subsequent anycast transmission from host S to the same anycast group could reach the host attached to intermediate node 3 as shown by the plain line. An anycast transmission reaches a member of the anycast group that is chosen by the network according to the current network conditions.

2.1 Services and protocols

An important aspect to understand before studying computer networks is the difference between a service and a protocol.
In order to understand the difference between the two, it is useful to start with real world examples. The traditional Post provides a service where a postman delivers letters to recipients. The Post defines precisely which types of letters (size, weight, etc.) can be delivered by using the Standard Mail service. Furthermore, the format of the envelope is specified (position of the sender and recipient addresses, position of the stamp). Someone who wants to send a letter must either place the letter at a Post Office or inside one of the dedicated mailboxes. The letter will then be collected and delivered to its final recipient. Note that for the regular service the Post usually does not guarantee the delivery of each particular letter: some letters may be lost, and some letters are delivered to the wrong mailbox. If a letter is important, then the sender can use the registered service to ensure that the letter will be delivered to its recipient. Some Post services also provide an acknowledged service or an express mail service that is faster than the regular service.

In computer networks, the notion of service is more formally defined in [X200]. It can be better understood by considering a computer network, whatever its size or complexity, as a black box that provides a service to users as shown in the figure below. These users could be human users or processes running on a computer system.

Many users can be attached to the same service provider. Through this provider, each user must be able to exchange messages with any other user. To be able to deliver these messages, the service provider must be able to unambiguously identify each user. In computer networks, each user is identified by a unique address; we will discuss later how these addresses are built and used. At this point, and when considering unicast transmission, the main characteristic of these addresses is that they are unique. Two different users attached to the network cannot use the same address.

PAGE 16

Figure 2.12: Users and service provider

Throughout this book, we will define a service as a set of capabilities provided by a system (and its underlying elements) to its user. A user interacts with a service through a service access point. Note that as shown in the figure above, users interact with one service provider. In practice, the service provider is distributed over several hosts, but these are implementation details that are not important at this stage. These interactions between a user and a service provider are expressed in [X200] by using primitives, as shown in the figure below. These primitives are an abstract representation of the interactions between a user and a service provider. In practice, these interactions could be implemented as system calls for example.

Figure 2.13: The four types of primitives

Four types of primitives are defined:

• X.request. This type of primitive corresponds to a request issued by a user to a service provider.
• X.indication. This type of primitive is generated by the service provider and delivered to a user (often related to an earlier and remote X.request primitive).
• X.response. This type of primitive is generated by a user to answer an earlier X.indication primitive.
• X.confirm. This type of primitive is delivered by the service provider to confirm to a user that a previous X.request primitive has been successfully processed.

Primitives can be combined to model different types of services. The simplest service in computer networks is called the connectionless service³. This service can be modelled by using two primitives:

• Data.request(source, destination, SDU). This primitive is issued by a user that specifies, as parameters, its (source) address, the address of the recipient of the message and the message itself. We will use Service Data Unit (SDU) to name the message that is exchanged transparently between two users of a service.
• Data.indication(source, destination, SDU). This primitive is delivered by a service provider to a user. It contains as parameters a Service Data Unit as well as the addresses of the sender and the destination users.
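The two primitives of the connectionless service can be sketched as a toy in-memory service provider. This is only an illustration under simplifying assumptions: the class ConnectionlessProvider, its attach method and the callback-based delivery are hypothetical names chosen for this sketch, and delivery is assumed to be immediate and perfect, which a real network does not guarantee.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DataIndication:
    """Parameters of a Data.indication(source, destination, SDU) primitive."""
    source: str
    destination: str
    sdu: bytes


class ConnectionlessProvider:
    """Toy service provider (hypothetical), seen by its users as a black box."""

    def __init__(self):
        # Maps each unique user address to the callback receiving indications.
        self._users: Dict[str, Callable[[DataIndication], None]] = {}

    def attach(self, address, on_indication):
        """Attach a user; two users cannot use the same address."""
        if address in self._users:
            raise ValueError(f"address {address!r} is already in use")
        self._users[address] = on_indication

    def data_request(self, source, destination, sdu):
        """Data.request issued by a user; no connection is created first."""
        handler = self._users.get(destination)
        if handler is not None:
            # The provider delivers a Data.indication to the destination user.
            handler(DataIndication(source, destination, sdu))
        # An unknown destination is silently ignored: the sender gets no feedback.


provider = ConnectionlessProvider()
received = []
provider.attach("S", lambda indication: None)
provider.attach("D", received.append)
provider.data_request("S", "D", b"M")
print(received)
```

Modelling the indication as a callback mirrors the remark above that primitives could be implemented as system calls: the user hands an SDU to the provider and is later notified of incoming SDUs.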
When discussing the service provided in a computer network, it is often useful to be able to describe the interactions between the users and the provider graphically. A frequently used representation is the time-sequence diagram. In this chapter and later throughout the book, we will often use diagrams such as the figure below. A time-sequence diagram describes the interactions between two users and a service provider. By convention, the users are represented in the left and right parts of the diagram while the service provider occupies the middle of the diagram. In such a time-sequence diagram, time flows from the top to the bottom of the diagram. Each primitive is represented by a plain horizontal arrow, to which the name of the primitive is attached.

³ This service is called the connectionless service because there is no need to create a connection before transmitting any data, in contrast with the connection-oriented service.

PAGE 17

The dashed lines are used to represent the possible relationship between two (or more) primitives. Such a diagram provides information about the ordering of the different primitives, but the distance between two primitives does not represent a precise amount of time.

The figure below provides a representation of the connectionless service as a time-sequence diagram. The user on the left, having address S, issues a Data.request primitive containing SDU M that must be delivered by the service provider to destination D. The dashed line between the two primitives indicates that the Data.indication primitive that is delivered to the user on the right corresponds to the Data.request primitive sent by the user on the left.

Figure 2.14: A simple connectionless service

There are several possible implementations of the connectionless service, which we will discuss later in this book. Before studying these realisations, it is useful to discuss the possible characteristics of the connectionless service. A reliable connectionless service is a service where the service provider guarantees that all SDUs submitted in Data.requests by a user will eventually be delivered to their destination. Such a service would be very useful for users, but guaranteeing perfect delivery is difficult in practice. For this reason, computer networks usually support an unreliable connectionless service.

An unreliable connectionless service may suffer from various types of problems compared to a reliable connectionless service. First of all, an unreliable connectionless service does not guarantee the delivery of all SDUs. This can be expressed graphically by using the time-sequence diagram below.

In practice, an unreliable connectionless service will usually deliver a large fraction of the SDUs. However, since the delivery of SDUs is not guaranteed, the user must be able to recover from the loss of any SDU.
A second imperfection that may affect an unreliable connectionless service is that it may duplicate SDUs. Some unreliable connectionless service providers may deliver an SDU sent by a user twice or even more. This is illustrated by the time-sequence diagram below.

Finally, some unreliable connectionless service providers may deliver to a destination a different SDU than the one that was supplied in the Data.request. This is illustrated in the figure below.

When a user interacts with a service provider, it must precisely know the limitations of the underlying service to be able to overcome any problem that may arise. This requires a precise definition of the characteristics of the underlying service.

Another important characteristic of the connectionless service is whether it preserves the ordering of the SDUs sent by one user. From the user's viewpoint, this is often a desirable characteristic. This is illustrated in the figure below.

However, many connectionless services, and in particular the unreliable services, do not guarantee that they will always preserve the ordering of the SDUs sent by each user. This is illustrated in the figure below.
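The imperfections discussed above (loss, duplication, erroneous delivery and reordering) can be simulated to see what a user of an unreliable connectionless service has to cope with. The sketch below is an assumption-laden toy model, not a description of any real network: the function name unreliable_delivery, the independent probabilities and the single-bit corruption are all choices made for this illustration.

```python
import random


def unreliable_delivery(sdus, loss=0.1, duplicate=0.1, corrupt=0.1,
                        reorder=0.1, seed=None):
    """Return the SDUs (bytes objects) as a toy unreliable provider might deliver them."""
    rng = random.Random(seed)  # seeded generator so that runs can be reproduced
    delivered = []
    for sdu in sdus:
        if rng.random() < loss:
            continue  # the SDU is lost: no Data.indication is generated
        if sdu and rng.random() < corrupt:
            # Deliver a different SDU: flip one bit of the first byte.
            sdu = bytes([sdu[0] ^ 0x01]) + sdu[1:]
        copies = 2 if rng.random() < duplicate else 1
        delivered.extend([sdu] * copies)
    # Reordering: occasionally swap two neighbouring SDUs.
    for i in range(len(delivered) - 1):
        if rng.random() < reorder:
            delivered[i], delivered[i + 1] = delivered[i + 1], delivered[i]
    return delivered


sent = [b"SDU-%d" % i for i in range(10)]
print(unreliable_delivery(sent, seed=1))
```

Setting all four probabilities to zero gives the behaviour of a reliable, order-preserving service, which makes the function convenient for testing recovery mechanisms against each imperfection separately.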

PAGE 18

Figure 2.15: An unreliable connectionless service may lose SDUs
Figure 2.16: An unreliable connectionless service may duplicate SDUs

PAGE 19

Figure 2.17: An unreliable connectionless service may deliver erroneous SDUs
Figure 2.18: A connectionless service that preserves the ordering of SDUs sent by a given user
Figure 2.19: A connectionless service that does not preserve the ordering of SDUs sent by a given user

PAGE 20

The connectionless service is widely used in computer networks as we will see later in this book. Several variations to this basic service have been proposed. One of these is the confirmed connectionless service. This service uses a Data.confirm primitive in addition to the classical Data.request and Data.indication primitives. This primitive is issued by the service provider to confirm to a user the delivery of a previously sent SDU to its recipient. Note that, like the registered service of the post office, the Data.confirm only indicates that the SDU has been delivered to the destination user. The Data.confirm primitive does not indicate whether the SDU has been processed by the destination user. This confirmed connectionless service is illustrated in the figure below.

Figure 2.20: A confirmed connectionless service

The connectionless service we have described earlier is frequently used by users who need to exchange small SDUs. Users needing to either send or receive several different and potentially large SDUs, or who need structured exchanges, often prefer the connection-oriented service.

An invocation of the connection-oriented service is divided into three phases. The first phase is the establishment of a connection. A connection is a temporary association between two users through a service provider. Several connections may exist at the same time between any pair of users. Once established, the connection is used to transfer SDUs. Connections usually provide one bidirectional stream supporting the exchange of SDUs between the two users that are associated through the connection. This stream is used to transfer data during the second phase of the connection called the data transfer phase. The third phase is the termination of the connection. Once the users have finished exchanging SDUs, they request the service provider to terminate the connection. As we will see later, there are also some cases where the service provider may need to terminate a connection itself.
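The three phases of a connection described above can be summarised as a small state machine. The sketch below is a deliberately simplified illustration: the Phase and Connection names are hypothetical, establishment and release are collapsed into single method calls instead of exchanges of primitives, and delivery inside the data transfer phase is assumed to be perfect.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"                    # no connection yet
    DATA_TRANSFER = "data transfer"  # connection established, SDUs may flow
    RELEASED = "released"            # connection terminated


class Connection:
    """Temporary association between two users through a (toy) service provider."""

    def __init__(self, user_a, user_b):
        self.users = (user_a, user_b)
        self.phase = Phase.IDLE
        self.delivered = {user_a: [], user_b: []}  # SDUs received by each user

    def establish(self):
        """First phase: establishment of the connection."""
        if self.phase is not Phase.IDLE:
            raise RuntimeError("connection was already established")
        self.phase = Phase.DATA_TRANSFER

    def send(self, sender, sdu):
        """Second phase: transfer an SDU over the bidirectional stream."""
        if self.phase is not Phase.DATA_TRANSFER:
            raise RuntimeError("SDUs can only be sent during the data transfer phase")
        if sender not in self.users:
            raise ValueError(f"{sender!r} is not attached to this connection")
        receiver = self.users[1] if sender == self.users[0] else self.users[0]
        self.delivered[receiver].append(sdu)

    def release(self):
        """Third phase: termination of the connection."""
        if self.phase is not Phase.DATA_TRANSFER:
            raise RuntimeError("only an established connection can be released")
        self.phase = Phase.RELEASED


connection = Connection("A", "B")
connection.establish()
connection.send("A", b"hello")
connection.send("B", b"hi")
connection.release()
print(connection.phase, connection.delivered)
```

Keeping the phase explicit makes it easy to reject operations issued at the wrong time, such as sending an SDU before the connection is established or after it has been released.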
Theestablishmentofaconnectioncanbemodelledbyusingfourprimitives: Connect.request, Connect.indication, Connect.response and Connect.conrm.The Connect.request primitiveisusedtorequesttheestablishmentofa connection.Themainparameterofthisprimitiveisthe address ofthedestinationuser.Theserviceprovider deliversa Connect.indication primitivetoinformthedestinationuseroftheconnectionattempt.Ifitacceptsto establishaconnection,itrespondswitha Connect.response primitive.Atthispoint,theconnectionisconsidered tobeopenandthedestinationusercanstartsendingSDUsovertheconnection.Theserviceproviderprocesses the Connect.response andwilldelivera Connect.conrm totheuserwhoinitiatedtheconnection.Thedelivery ofthisprimitiveterminatestheconnectionestablishmentphase.Atthispoint,theconnectionisconsideredtobe openandbothuserscansendSDUs.Asuccessfulconnectionestablishmentisillustratedbelow. Theexampleaboveshowsasuccessfulconnectionestablishment.However,inpracticenotallconnectionsare successfullyestablished.Onereasonisthatthedestinationusermaynotagree,forpolicyorperformancereasons, toestablishaconnectionwiththeinitiatinguseratthistime.Inthiscase,thedestinationuserrespondstothe Connect.indication primitivebya Disconnect.request primitivethatcontainsaparametertoindicatewhythe connectionhasbeenrefused.Theserviceproviderwillthendelivera Disconnect.indication primitivetoinform theinitiatinguser.Asecondreasoniswhentheserviceproviderisunabletoreachthedestinationuser.This mighthappenbecausethedestinationuserisnotcurrentlyattachedtothenetworkorduetocongestion.Inthese 16 Chapter2.Introduction

PAGE 21

ComputerNetworking:Principles,ProtocolsandPractice,Release0.25 Figure2.21:Connectionestablishment cases,theserviceproviderrespondstothe Connect.request witha Disconnect.indication primitivewhose reason parametercontainsadditionalinformationaboutthefailureoftheconnection. Figure2.22:Twotypesofrejectionforaconnectionestablishmentattempt Oncetheconnectionhasbeenestablished,theserviceprovidersuppliestwodatastreamstothecommunicating users.TherstdatastreamcanbeusedbytheinitiatingusertosendSDUs.Theseconddatastreamallows therespondingusertosendSDUstotheinitiatinguser.Thedatastreamscanbeorganisedindifferentways.A rstorganisationisthe message-mode transfer.Withthe message-mode transfer,theserviceproviderguarantees thatoneandonlyone Data.indication willbedeliveredtotheendpointofthedatastreamforeach Data.request primitiveissuedbytheotherendpoint.The message-mode transferisillustratedinthegurebelow.Themain advantageofthe message-transfer modeisthattherecipientreceivesexactlytheSDUsthatweresentbytheother user.IfeachSDUcontainsacommand,thereceivingusercanprocesseachcommandassoonasitreceivesa SDU. Unfortunately,the message-mode transferisnotwidelyusedontheInternet.OntheInternet,themostpopular connection-orientedservicetransfersSDUsin stream-mode.Withthe stream-mode,theserviceprovidersuppliesa bytestreamthatlinksthetwocommunicatingusers.Thesendingusersendsbytesbyusing Data.request primitives thatcontainsequencesofbytesasSDUs.TheserviceproviderdeliversSDUscontainingconsecutivebytestothe receivinguserbyusing Data.indication primitives.Theserviceproviderensuresthatallthebytessentatoneend 2.1.Servicesandprotocols 17


of the stream are delivered correctly in the same order at the other endpoint. However, the service provider does not attempt to preserve the boundaries of the SDUs. There is no relation enforced by the service provider between the number of Data.request and the number of Data.indication primitives. The stream-mode is illustrated in the figure below. In practice, a consequence of the utilisation of the stream-mode is that if the users want to exchange structured SDUs, they will need to provide the mechanisms that allow the receiving user to separate successive SDUs in the byte stream that it receives. As we will see in the next chapter, application layer protocols often use specific delimiters such as the end of line character to delineate SDUs in a byte stream.

Figure 2.23: Message-mode transfer in a connection-oriented service

Figure 2.24: Stream-mode transfer in a connection-oriented service

The third phase of a connection is when it needs to be released. As a connection involves three parties (two users and one service provider), any of them can request the termination of the connection. Usually, connections are terminated upon request of one user once the data transfer is finished. However, sometimes the service provider may be forced to terminate a connection. This can be due to lack of resources inside the service provider or because one of the users is not reachable anymore through the network. In this case, the service provider will issue Disconnect.indication primitives to both users. These primitives will contain, as parameter, some information about the reason for the termination of the connection. Unfortunately, as illustrated in the figure below, when a


service provider is forced to terminate a connection it cannot guarantee that all SDUs sent by each user have been delivered to the other user. This connection release is said to be abrupt as it can cause losses of data.

Figure 2.25: Abrupt connection release initiated by the service provider

An abrupt connection release can also be triggered by one of the users. If a user needs, for any reason, to terminate a connection quickly, it can issue a Disconnect.request primitive and request an abrupt release. The service provider will process the request, stop the two data streams and deliver the Disconnect.indication primitive to the remote user as soon as possible. As illustrated in the figure below, this abrupt connection release may cause losses of SDUs.

Figure 2.26: Abrupt connection release initiated by a user

To ensure a reliable delivery of the SDUs sent by each user over a connection, we need to consider the two streams that compose a connection as independent. A user should be able to release the stream that it uses to send SDUs once it has sent all the SDUs that it planned to send over this connection, but still continue to receive SDUs over the opposite stream. This graceful connection release is usually performed as shown in the figure below. One user issues a Disconnect.request primitive to its provider once it has issued all its Data.request primitives. The service provider will wait until all Data.indication primitives have been delivered to the receiving user before issuing the Disconnect.indication primitive. This primitive informs the receiving user that it will no longer receive SDUs over this connection, but it is still able to issue Data.request primitives on the stream in the opposite direction. Once the user has issued all of its Data.request primitives, it issues a Disconnect.request primitive to request the termination of the remaining stream. The service provider will process the request and deliver the corresponding Disconnect.indication to the other user once it has delivered all the pending Data.indication primitives. At this


point, all data has been delivered, the two streams have been released successfully and the connection is completely closed.

Figure 2.27: Graceful connection release

Note: Reliability of the connection-oriented service
An important point to note about the connection-oriented service is its reliability. A connection-oriented service can only guarantee the correct delivery of all SDUs provided that the connection has been released gracefully. This implies that while the connection is active, there is no guarantee for the actual delivery of the SDUs exchanged as the connection may need to be released abruptly at any time.

2.2 The reference models

Given the growing complexity of computer networks, during the 1970s network researchers proposed various reference models to facilitate the description of network protocols and services. Of these, the Open Systems Interconnection (OSI) model [Zimmermann80] was probably the most influential. It served as the basis for the standardisation work performed within the ISO to develop global computer network standards. The reference model that we use in this book can be considered as a simplified version of the OSI reference model [4].

[4] An interesting historical discussion of the OSI-TCP/IP debate may be found in [Russel06].

2.2.1 The five layers reference model

Our reference model is divided into five layers, as shown in the figure below.

Figure 2.28: The five layers of the reference model

Starting from the bottom, the first layer is the Physical layer. Two communicating devices are linked through a physical medium. This physical medium is used to transfer an electrical or optical signal between two directly connected devices. Several types of physical mediums are used in practice:

- electrical cable. Information can be transmitted over different types of electrical cables. The most common ones are twisted pairs, which are used in the telephone network but also in enterprise networks, and coaxial cables. Coaxial cables are still used in cable TV networks, but are no longer used in enterprise networks. Some networking technologies operate over the classical electrical cable.
- optical fiber. Optical fibers are frequently used in public and enterprise networks when the distance between the communication devices is larger than one kilometer. There are two main types of optical fibers: multimode and monomode. Multimode fiber is much cheaper than monomode fiber because an LED can be used to send a signal over a multimode fiber while a monomode fiber must be driven by a laser. Due to the different modes of propagation of light, multimode fibers are limited to distances of a few kilometers while monomode fibers can be used over distances greater than several tens of kilometers. In both cases, repeaters can be used to regenerate the optical signal at one endpoint of a fiber to send it over another fiber.
- wireless. In this case, a radio signal is used to encode the information exchanged between the communicating devices. Many types of modulation techniques are used to send information over a wireless channel and there is a lot of innovation in this field with new techniques appearing every year. While most wireless networks rely on radio signals, some use a laser that sends light pulses to a remote detector. These optical techniques make it possible to create point-to-point links while radio-based techniques, depending on the directionality of the antennas, can be used to build networks containing devices spread over a small geographical area.

An important point to note about the Physical layer is the service that it provides. This service is usually an unreliable connection-oriented service that allows the users of the Physical layer to exchange bits. The unit of information transfer in the Physical layer is the bit. The Physical layer service is unreliable because:

- the Physical layer may change, e.g. due to electromagnetic interferences, the value of a bit being transmitted
- the Physical layer may deliver more bits to the receiver than the bits sent by the sender
- the Physical layer may deliver fewer bits to the receiver than the bits sent by the sender

The last two points may seem strange at first glance. When two devices are attached through a cable, how is it possible for bits to be created or lost on such a cable?
This is mainly due to the fact that the communicating devices use their own clock to transmit bits at a given bit rate. Consider a sender having a clock that ticks one million times per second and sends one bit every tick. Every microsecond, the sender sends an electrical or optical signal that encodes one bit. The sender's bit rate is thus 1 Mbps. If the receiver clock ticks exactly [5] every microsecond, it will also deliver 1 Mbps to its user. However, if the receiver's clock is slightly faster (resp. slower), then it will deliver slightly more (resp. fewer) than one million bits every second. This explains why the physical layer may lose or create bits.

[5] Having perfectly synchronised clocks running at a high frequency is very difficult in practice. Some physical layers introduce a feedback loop that allows the receiver's clock to synchronise itself automatically to the sender's clock. However, not all physical layers include this kind of synchronisation.

Note: Bit rate
In computer networks, the bit rate of the physical layer is always expressed in bits per second. One Mbps is one million bits per second and one Gbps is one billion bits per second. This is in contrast with memory specifications that are usually expressed in bytes (8 bits), KiloBytes (1024 bytes) or MegaBytes (1048576 bytes). Thus transferring one MByte through a 1 Mbps link lasts 8.39 seconds.
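The two calculations above can be checked with a few lines of code. The sketch below is illustrative only: the function names and the 50-ticks-per-second clock offset are assumptions chosen for the example, not values taken from the text.

```python
MBYTE = 1024 * 1024   # bytes in one MByte, as used for memory sizes
MBPS = 10**6          # bits per second in one Mbps


def transfer_time(size_bytes, bit_rate_bps):
    """Seconds needed to send size_bytes over a link of bit_rate_bps."""
    return size_bytes * 8 / bit_rate_bps


def surplus_bits(sender_hz, receiver_hz, seconds=1.0):
    """Bits created (> 0) or lost (< 0) when the receiver's clock ticks at
    receiver_hz while the sender emits one bit per tick at sender_hz."""
    return round((receiver_hz - sender_hz) * seconds)


# One MByte over a 1 Mbps link: 8 388 608 bits at 10^6 bits per second.
print(round(transfer_time(MBYTE, MBPS), 2))   # 8.39

# Hypothetical receiver clocks that are 50 ticks per second too fast or slow.
print(surplus_bits(1_000_000, 1_000_050))     # 50 extra bits per second
print(surplus_bits(1_000_000, 999_950))       # -50, i.e. 50 bits lost
```

Even a tiny clock offset therefore creates or loses bits every second, which is why the feedback loop mentioned in the footnote is useful.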


Bit rate    Bits per second
1 Kbps      10^3
1 Mbps      10^6
1 Gbps      10^9
1 Tbps      10^12

Figure 2.29: The Physical layer

The physical layer thus allows two or more entities that are directly attached to the same transmission medium to exchange bits. Being able to exchange bits is important as virtually any information can be encoded as a sequence of bits. Electrical engineers are used to processing streams of bits, but computer scientists usually prefer to deal with higher level concepts. A similar issue arises with file storage. Storage devices such as hard-disks also store streams of bits. There are hardware devices that process the bit stream produced by a hard-disk, but computer scientists have designed filesystems to allow applications to easily access such storage devices. These filesystems are typically divided into several layers as well. Hard-disks store sectors of 512 bytes or more. Unix filesystems group sectors in larger blocks that can contain data or inodes representing the structure of the filesystem. Finally, applications manipulate files and directories that are translated into blocks, sectors and eventually bits by the operating system.

Computer networks use a similar approach. Each layer provides a service that is built above the underlying layer and is closer to the needs of the applications.

The Datalink layer builds on the service provided by the underlying physical layer. The Datalink layer allows two hosts that are directly connected through the physical layer to exchange information. The unit of information exchanged between two entities in the Datalink layer is a frame. A frame is a finite sequence of bits. Some Datalink layers use variable-length frames while others only use fixed-length frames. Some Datalink layers provide a connection-oriented service while others provide a connectionless service. Some Datalink layers provide reliable delivery while others do not guarantee the correct delivery of the information.
An important point to note about the Datalink layer is that although the figure below indicates that two entities of the Datalink layer exchange frames directly, in reality this is slightly different. When the Datalink layer entity on the left needs to transmit a frame, it issues as many Data.request primitives to the underlying physical layer as there are bits in the frame. The physical layer will then convert the sequence of bits into an electromagnetic or optical signal that will be sent over the physical medium. The physical layer on the right hand side of the figure will decode the received signal, recover the bits and issue the corresponding Data.indication primitives to its Datalink layer entity. If there are no transmission errors, this entity will receive the frame sent earlier.

Figure 2.30: The Datalink layer

The Datalink layer allows directly connected hosts to exchange information, but it is often necessary to exchange information between hosts that are not attached to the same physical medium. This is the task of the network layer. The network layer is built above the datalink layer. Network layer entities exchange packets. A packet is a finite sequence of bytes that is transported by the datalink layer inside one or more frames. A packet usually


contains information about its origin and its destination, and usually passes through several intermediate devices called routers on its way from its origin to its destination.

Figure 2.31: The network layer

Most realisations of the network layer, including the Internet, do not provide a reliable service. However, many applications need to exchange information reliably and so using the network layer service directly would be very difficult for them. Ensuring the reliable delivery of the data produced by applications is the task of the transport layer. Transport layer entities exchange segments. A segment is a finite sequence of bytes that is transported inside one or more packets. A transport layer entity issues segments (or sometimes part of segments) as Data.request to the underlying network layer entity.

There are different types of transport layers. The most widely used transport layers on the Internet are TCP, which provides a reliable connection-oriented bytestream transport service, and UDP, which provides an unreliable connectionless transport service.

Figure 2.32: The transport layer

The upper layer of our architecture is the Application layer. This layer includes all the mechanisms and data structures that are necessary for the applications. We will use Application Data Unit (ADU) to indicate the data exchanged between two entities of the Application layer.

Figure 2.33: The Application layer

2.2.2 The TCP/IP reference model

In contrast with OSI, the TCP/IP community did not spend a lot of effort defining a detailed reference model; in fact, the goals of the Internet architecture were only documented after TCP/IP had been deployed [Clark88]. RFC 1122, which defines the requirements for Internet hosts, mentions four different layers. Starting from the top, these are:

- an Application layer
- a Transport layer
- an Internet layer which is equivalent to the network layer of our reference model
- a Link layer which combines the functionalities of the physical and datalink layers of our five-layer reference model

Besides this difference in the lower layers, the TCP/IP reference model is very close to the five layers that we use throughout this document.


2.2.3 The OSI reference model

Compared to the five layers reference model explained above, the OSI reference model defined in [X200] is divided in seven layers. The four lower layers are similar to the four lower layers described above. The OSI reference model refined the application layer by dividing it in three layers:

- the Session layer. The Session layer contains the protocols and mechanisms that are necessary to organize and to synchronize the dialogue and to manage the data exchange of presentation layer entities. While one of the main functions of the transport layer is to cope with the unreliability of the network layer, the Session layer's objective is to hide the possible failures of transport-level connections from the upper layers. For this, the Session layer provides services that make it possible to establish a session-connection, to support orderly data exchange (including mechanisms to recover from the abrupt release of an underlying transport connection), and to release the connection in an orderly manner.
- the Presentation layer was designed to cope with the different ways of representing information on computers. There are many differences in the way computers store information. Some computers store integers as 32-bit fields, others use 64-bit fields, and the same problem arises with floating point numbers. For textual information, this is even more complex with the many different character codes that have been used [6]. The situation is even more complex when considering the exchange of structured information such as database records. To solve this problem, the Presentation layer provides for a common representation of the data transferred. The ASN.1 notation was designed for the Presentation layer and is still used today by some protocols.
- the Application layer that contains the mechanisms that fit in neither the Presentation nor the Session layer. The OSI Application layer was itself further divided in several generic service elements.

Note: Where are the missing layers in the TCP/IP reference model?
The TCP/IP reference model places the Presentation and the Session layers implicitly in the Application layer. The main motivations for simplifying the upper layers in the TCP/IP reference model were pragmatic. Most Internet applications started as prototypes that evolved and were later standardised. Many of these applications assumed that they would be used to exchange information written in American English and for which the 7-bit US-ASCII character code was sufficient. This was the case for email, but as we'll see in the next chapter, email was able to evolve to support different character encodings. Some applications considered the different data representations explicitly. For example, ftp contained mechanisms to convert a file from one format to another and the HTML language was defined to represent web pages. On the other hand, many ISO specifications were developed by committees composed of people who did not all participate in actual implementations. ISO spent a lot of effort analysing the requirements and defining a solution that meets all of these requirements. Unfortunately, some of the specifications were so complex that it was difficult to implement them completely and the standardisation bodies defined recommended profiles that contained the implemented sets of options...

Figure 2.34: The seven layers of the OSI reference model

[6] There is now a rough consensus for the greater use of the Unicode character format. Unicode can represent more than 100,000 different characters from the known written languages on Earth. Maybe one day, all computers will only use Unicode to represent all their stored characters and Unicode could become the standard format to exchange characters, but we are not yet at this stage today.
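The representation problem that the Presentation layer was meant to solve can be made concrete with a short sketch. It only uses Python's built-in codecs and the standard `struct` module; the helper name `encodable` and the sample strings are assumptions made for the illustration.

```python
import struct


def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


plain = "Hello"
accented = "caf\u00e9"   # "cafe" with an acute accent on the final letter

# 7-bit US-ASCII is enough for the first string but not for the second.
print(encodable(plain, "ascii"), encodable(accented, "ascii"))

# The same character has different byte representations in different codes.
print("\u00e9".encode("latin-1"))   # one byte
print("\u00e9".encode("utf-8"))     # two bytes

# The same integer occupies 4 or 8 bytes depending on the chosen field width.
print(len(struct.pack("!i", 1)), len(struct.pack("!q", 1)))
```

Two hosts that disagree on any of these choices would misread each other's data, which is why the sender and the receiver must agree on a common representation.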


2.3 Organisation of the book

This document is organised according to the TCP/IP reference model and follows a top-down approach. Most of the classical networking textbooks chose a bottom-up approach, i.e. they first explained all the electrical and optical details of the physical layer then moved to the datalink layer. This approach worked well during the infancy of computer networks and until the late 1990s. At that time, most students were not users of computer networks and it was useful to explain computer networks by building the corresponding protocols from the simplest, in the physical layer, up to the application layer. Today, all students are active users of Internet applications, and starting to learn computer networking by looking at bits is not very motivating. Starting from [KuroseRoss09], many textbooks and teachers have chosen a top-down approach. This approach starts from applications such as email and web that students already know and explores the different layers, starting from the application layer. This approach works quite well with today's students. The traditional bottom-up approach could in fact be considered as an engineering approach as it starts from the simple network that allows the exchange of bits, and explains how to combine different protocols and mechanisms to build the most complex applications. The top-down approach could on the other hand be considered as a scientific approach. Like biologists, it starts from an existing (man-built) system and explores it layer by layer.
Besides the top-down versus bottom-up organisation, computer networking books can either aim at having an in-depth coverage of a small number of topics, or at having a limited coverage of a wide range of topics. Covering a wide range of topics is interesting for introductory courses or for students who do not need a detailed knowledge of computer networks. It allows the students to learn a little about everything and then start from this basic knowledge later if they need to understand computer networking in more detail. This book chose to cover, in detail, a smaller number of topics than other textbooks. This is motivated by the fact that computer networks often need to be pushed to their limits. Understanding the details of the main networking protocols is important to be able to fully grasp how a network behaves or extend it to provide innovative services [7].

The book is organised as follows: We first describe the application layer in chapter The application Layer. Given the large number of Internet-based applications, it is of course impossible to cover them all in detail. Instead we focus on three types of Internet-based applications. We first study the Domain Name System (DNS) and then explain some of the protocols involved in the exchange of electronic mail. The discussion of the application layer ends with a description of the key protocols of the world wide web.

All these applications rely on the transport layer that is explained in chapter The transport layer. This is a key layer in today's networks as it contains all the mechanisms necessary to provide a reliable delivery of data over an unreliable network. We cover the transport layer by first developing a simple reliable transport layer protocol and then explain the details of the TCP and UDP protocols used in TCP/IP networks.

After the transport layer, we analyse the network layer in chapter The network layer. This is also a very important layer as it is responsible for the delivery of packets from any source to any destination through intermediate routers. In the network layer, we describe the two possible organisations of the network layer and the routing protocols based on link-state and distance vectors. Then we explain in detail the IPv4, IPv6, RIP, OSPF and BGP protocols that are actually used in today's Internet.
The last chapter of the book is devoted to the datalink layer. In chapter The datalink layer and the Local Area Networks, we begin by explaining the principles of the datalink layers on point-to-point links. Then, we focus on the Local Area Networks. We first describe the Medium Access Control algorithms that allow multiple hosts to share one transmission medium. We consider both opportunistic and deterministic techniques. We then explain in detail two types of LANs that are important from a deployment viewpoint today: Ethernet and WiFi.

[7] A popular quote says, the devil is in the details. This quote reflects very well the operation of many network protocols, where the change of a single bit may have huge consequences. In computer networks, understanding all the details is sometimes necessary.




CHAPTER 3

The application Layer

The Application Layer is the most important and most visible layer in computer networks. Applications reside in this layer and human users interact via those applications through the network.

In this chapter, we first briefly describe the main principles of the application layer and focus on the two most important application models: the client-server and the peer-to-peer models. Then, we review in detail two families of protocols that have proved to be very useful in the Internet: electronic mail and the protocols that allow access to information on the world wide web. We also describe the Domain Name System that allows humans to use user-friendly names while the hosts use 32-bit or 128-bit IP addresses.

3.1 Principles

There are two important models used to organise a networked application. The first and oldest model is the client-server model. In this model, a server provides services to clients that exchange information with it. This model is highly asymmetrical: clients send requests and servers perform actions and return responses. It is illustrated in the figure below.

Figure 3.1: The client-server model

The client-server model was the first model to be used to develop networked applications. This model comes naturally from the mainframes and minicomputers that were the only networked computers used until the 1980s. A minicomputer is a multi-user system that is used by tens or more users at the same time. Each user interacts with the minicomputer by using a terminal. Those terminals were mainly a screen, a keyboard and a cable directly connected to the minicomputer.
There are various types of servers as well as various types of clients. A web server provides information in response to the query sent by its clients. A print server prints documents sent as queries by the client. An email server will forward towards their recipient the email messages sent as queries while a music server will deliver the music requested by the client. From the viewpoint of the application developer, the client and the server applications directly exchange messages (the horizontal arrows labelled Queries and Responses in the above figure), but in practice these messages are exchanged thanks to the underlying layers (the vertical arrows in the above figure). In this chapter, we focus on these horizontal exchanges of messages.
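The query/response exchange can be sketched in a few lines. This is only a toy under stated assumptions: `socket.socketpair()` stands in for a real network connection, the names `serve_one` and `ask` and the message contents are invented for the example, and a single `recv` call is assumed to return the whole message, which holds for such short messages but is not guaranteed in general.

```python
import socket
import threading


def serve_one(conn):
    """Toy server: read one query, send back one response, then close."""
    with conn:
        query = conn.recv(1024)
        conn.sendall(b"response to " + query)


def ask(conn, query):
    """Toy client: send one query and wait for the server's response."""
    with conn:
        conn.sendall(query)
        return conn.recv(1024)


# Two connected endpoints play the role of the underlying layers.
client_end, server_end = socket.socketpair()
worker = threading.Thread(target=serve_one, args=(server_end,))
worker.start()
reply = ask(client_end, b"time?")
worker.join()
print(reply)
```

The asymmetry of the model is visible in the code: the client always speaks first and the server only reacts to what it receives.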


Networked applications do not exchange random messages. In order to ensure that the server is able to understand the queries sent by a client, and also that the client is able to understand the responses sent by the server, they must both agree on a set of syntactical and semantic rules. These rules define the format of the messages exchanged as well as their ordering. This set of rules is called an application-level protocol.

An application-level protocol is similar to a structured conversation between humans. Assume that Alice wants to know the current time but does not have a watch. If Bob passes close by, the following conversation could take place:

- Alice: Hello
- Bob: Hello
- Alice: What time is it?
- Bob: 11:55
- Alice: Thank you
- Bob: You're welcome

Such a conversation succeeds if both Alice and Bob speak the same language. If Alice meets Tchang who only speaks Chinese, she won't be able to ask him the current time. A conversation between humans can be more complex. For example, assume that Bob is a security guard whose duty is to only allow trusted secret agents to enter a meeting room. If all agents know a secret password, the conversation between Bob and Trudy could be as follows:

- Bob: What is the secret password?
- Trudy: 1234
- Bob: This is the correct password, you're welcome

If Alice wants to enter the meeting room but does not know the password, her conversation could be as follows:

- Bob: What is the secret password?
- Alice: 3.1415
- Bob: This is not the correct password.
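The first conversation above can be turned into a toy protocol to show how format and ordering rules interact. The sketch below is an illustration, not a real protocol: the class name `TimeResponder` is invented, messages must match the dialogue exactly, and the rule that a greeting must come first is an assumption made to illustrate ordering.

```python
class TimeResponder:
    """Bob's side of the toy 'what time is it' conversation."""

    def __init__(self, current_time="11:55"):
        self.current_time = current_time
        self.greeted = False   # ordering rule: a greeting must come first

    def reply(self, message):
        """Return Bob's answer, or None if the message breaks the rules."""
        if message == "Hello":
            self.greeted = True
            return "Hello"
        if not self.greeted:
            return None        # right words, wrong moment
        if message == "What time is it?":
            return self.current_time
        if message == "Thank you":
            return "You're welcome"
        return None            # unknown message: not part of the protocol


bob = TimeResponder()
for line in ["Hello", "What time is it?", "Thank you"]:
    print("Alice:", line)
    print("Bob:  ", bob.reply(line))
```

A message that Bob does not know, or a valid message sent at the wrong moment, gets no answer, just as Alice would get no answer from someone who does not speak her language.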
Human conversations can be very formal, e.g. when soldiers communicate with their hierarchy, or informal such as when friends discuss. Computers that communicate are more akin to soldiers and require well-defined rules to ensure a successful exchange of information. There are two types of rules that define how information can be exchanged between computers:

- syntactical rules that precisely define the format of the messages that are exchanged. As computers only process bits, the syntactical rules specify how information is encoded as bit strings
- organisation of the information flow. For many applications, the flow of information must be structured and there are precedence relationships between the different types of information. In the time example above, Alice must greet Bob before asking for the current time. Alice would not ask for the current time first and greet Bob afterwards. Such precedence relationships exist in networked applications as well. For example, a server must receive a username and a valid password before accepting more complex commands from its clients.

Let us first discuss the syntactical rules. We will later explain how the information flow can be organised by analysing real networked applications.

Application-layer protocols exchange two types of messages. Some protocols such as those used to support electronic mail exchange messages expressed as strings or lines of characters. As the transport layer allows hosts to exchange bytes, they need to agree on a common representation of the characters. The first and simplest method to encode characters is to use the ASCII table. RFC 20 provides the ASCII table that is used by many protocols on the Internet. For example, the table defines the following binary representations:

- A : 1000001b
- 0 : 0110000b
- z : 1111010b


- @ : 1000000b
- space : 0100000b

In addition, the ASCII table also defines several non-printable or control characters. These characters were designed to allow an application to control a printer or a terminal. These control characters include CR and LF, that are used to terminate a line, and the Bell character which causes the terminal to emit a sound.

- carriage return (CR) : 0001101b
- line feed (LF) : 0001010b
- Bell : 0000111b

The ASCII characters are encoded as a seven-bit field, but transmitted as an eight-bit byte whose high order bit is usually set to 0. Bytes are always transmitted starting from the high order or most significant bit.

Most applications exchange strings that are composed of fixed or variable numbers of characters. A common solution to define the character strings that are acceptable is to define them as a grammar using a Backus-Naur Form (BNF) such as the Augmented BNF defined in RFC 5234. A BNF is a set of production rules that generate all valid character strings. For example, consider a networked application that uses two commands, where the user can supply a username and a password. The BNF for this application could be defined as shown in the figure below.

Figure 3.2: A simple BNF specification

The example above defines several terminals and two commands: usercommand and passwordcommand. The ALPHA terminal contains all letters in upper and lower case. In the ALPHA rule, %x41 corresponds to ASCII character code 41 in hexadecimal, i.e. capital A. The CR and LF terminals correspond to the carriage return and linefeed control characters. The CRLF rule concatenates these two terminals to match the standard end of line termination. The DIGIT terminal contains all digits. The SP terminal corresponds to the white space characters. The usercommand is composed of two strings separated by white space. In the ABNF rules that define the messages used by Internet applications, the commands are case-insensitive. The rule "user" corresponds to all possible cases of the letters that compose the word between brackets, e.g.
user, uSeR, USER, usER, ... A username contains at least one letter and up to 8 letters. Usernames are case-sensitive as they are not defined as a string between brackets. The password rule indicates that a password starts with a letter and can contain any number of letters or digits. The white space and the control characters cannot appear in a password defined by the above rule.

Besides character strings, some applications also need to exchange 16-bit and 32-bit fields such as integers. A naive solution would have been to send the 16- or 32-bit field as it is encoded in the host's memory. Unfortunately, there are different methods to store 16- or 32-bit fields in memory. Some CPUs store the most significant byte of a 16-bit field in the first address of the field while others store the least significant byte at this location. When networked applications running on different CPUs exchange 16-bit fields, there are two possibilities to transfer them over the transport service:

- send the most significant byte followed by the least significant byte
- send the least significant byte followed by the most significant byte

The first possibility was named big-endian in a note written by Cohen [Cohen1980] while the second was named little-endian. Vendors of CPUs that used big-endian in memory insisted on using big-endian encoding in networked applications while vendors of CPUs that used little-endian recommended the opposite. Several studies were written on the relative merits of each type of encoding, but the discussion became almost a religious issue [Cohen1980]. Eventually, the Internet chose the big-endian encoding, i.e. multi-byte fields are always transmitted by sending the most significant byte first. RFC 791 refers to this encoding as the network-byte order. Most


libraries [1] used to write networked applications contain functions to convert multi-byte fields from memory to the network byte order and vice versa.

Besides 16- and 32-bit words, some applications need to exchange data structures containing bit fields of various lengths. For example, a message may be composed of a 16-bit field followed by eight one-bit flags, a 24-bit field and two 8-bit bytes. Internet protocol specifications will define such a message by using a representation such as the one below. In this representation, each line corresponds to 32 bits and the vertical lines are used to delineate fields. The numbers above the lines indicate the bit positions in the 32-bit word, with the high order bit at position 0.

Figure 3.3: Message format

The message mentioned above will be transmitted starting from the upper 32-bit word in network byte order. The first field is encoded in 16 bits. It is followed by eight one-bit flags (A-H), a 24-bit field whose high order byte is shown in the first line and whose two low order bytes appear in the second line, followed by two one-byte fields. This ASCII representation is frequently used when defining binary protocols. We will use it for all the binary protocols that are discussed in this book.

We will discuss several examples of application-level protocols in this chapter.

3.1.1 The peer-to-peer model

The peer-to-peer model emerged during the last ten years as another possible architecture for networked applications. In the traditional client-server model, hosts act either as servers or as clients and a server serves a large number of clients. In the peer-to-peer model, all hosts act as both servers and clients. The peer-to-peer model has been used to develop various networked applications, ranging from Internet telephony to file sharing or Internet-wide filesystems. A detailed description of peer-to-peer applications may be found in [BYL2008]. Surveys of peer-to-peer protocols and applications may be found in [AS2004] and [LCP2005].
3.1.2 The transport services

Networked applications are built on top of the transport service. As explained in the previous chapter, there are two main types of transport services:

- the connectionless or datagram service
- the connection-oriented or byte-stream service

The connectionless service allows applications to easily exchange messages or Service Data Units. On the Internet, this service is provided by the UDP protocol that will be explained in the next chapter. The connectionless transport service on the Internet is unreliable, but is able to detect transmission errors. This implies that an application will not receive an SDU that has been corrupted due to transmission errors.

The connectionless transport service allows networked applications to exchange messages. Several networked applications may be running at the same time on a single host. Each of these applications must be able to exchange SDUs with remote applications. To enable these exchanges of SDUs, each networked application running on a host is identified by the following information:

- the host on which the application is running
- the port number on which the application listens for SDUs

[1] For example, the htonl(3) (resp. ntohl(3)) function of the standard C library converts a 32-bit unsigned integer from the byte order used by the CPU to the network byte order (resp. from the network byte order to the CPU byte order). Similar functions exist in other programming languages.
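The byte-order conversions mentioned in the footnote, and the mixed-width message described earlier in this section, can be illustrated with Python's standard `struct` module, where the `!` prefix selects network byte order. The sketch is a hedged illustration: the function name `pack_message`, the sample values and the choice of flag A as the high order bit of the flags byte are assumptions, since the exact layout of Figure 3.3 is not reproduced here.

```python
import struct

value = 0x0A0B0C0D
big = struct.pack("!I", value)      # network byte order (big-endian)
little = struct.pack("<I", value)   # little-endian, for comparison
print(big.hex(), little.hex())      # 0a0b0c0d 0d0c0b0a


def pack_message(field16, flags, field24, byte_a, byte_b):
    """Pack a 16-bit field, eight one-bit flags, a 24-bit field and two
    bytes into 64 bits, most significant byte first."""
    flag_byte = 0
    for bit in flags:               # assumed: flags[0] is A, the high bit
        flag_byte = (flag_byte << 1) | (1 if bit else 0)
    return (struct.pack("!HB", field16, flag_byte)
            + field24.to_bytes(3, "big")   # struct has no 24-bit format code
            + struct.pack("!BB", byte_a, byte_b))


message = pack_message(0x1234, [1, 0, 0, 0, 0, 0, 0, 1], 0xABCDEF, 1, 2)
print(message.hex())                # 123481abcdef0102
```

The packed message is exactly eight bytes long, i.e. two 32-bit words, which matches the two-line representation described for the figure.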


On the Internet, the port number is an integer and the host is identified by its network address. As we will see in chapter The network layer, there are two types of Internet addresses:

- IP version 4 addresses that are 32 bits wide
- IP version 6 addresses that are 128 bits wide

IPv4 addresses are usually represented by using a dotted decimal representation where each decimal number corresponds to one byte of the address, e.g. 203.0.113.56. IPv6 addresses are usually represented as a set of hexadecimal numbers separated by colons, e.g. 2001:db8:3080:2:217:f2ff:fed6:65c0. Today, most Internet hosts have one IPv4 address. A small fraction of them also have an IPv6 address. In the future, we can expect that more and more hosts will have IPv6 addresses and that some of them will not have an IPv4 address anymore. A host that only has an IPv4 address cannot communicate with a host having only an IPv6 address. The figure below illustrates two applications that are using the datagram service provided by UDP on hosts that are using IPv4 addresses.

Figure 3.4: The connectionless or datagram service

The second transport service is the connection-oriented service. On the Internet, this service is often called the byte-stream service as it creates a reliable byte stream between the two applications that are linked by a transport connection. Like the datagram service, the networked applications that use the byte-stream service are identified by the host on which they run and a port number. These hosts can be identified by an IPv4 address, an IPv6 address or a name. The figure below illustrates two applications that are using the byte-stream service provided by the TCP protocol on IPv6 hosts. The byte stream service provided by TCP is reliable and bidirectional.

Figure 3.5: The connection-oriented or byte-stream service
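The identification of an application by a host address and a port number can be made concrete with Python's standard ipaddress module. The sketch below is only illustrative: the describe function and the port number 53 are assumptions made for this example, while the two addresses are the ones quoted above.

```python
import ipaddress

def describe(address_text):
    """Return (IP version, address width in bits, raw bytes in network byte order)."""
    addr = ipaddress.ip_address(address_text)
    return addr.version, addr.max_prefixlen, addr.packed

# An application is identified by a (host address, port number) pair.
endpoints = [
    ("203.0.113.56", 53),
    ("2001:db8:3080:2:217:f2ff:fed6:65c0", 53),
]

for host, port in endpoints:
    version, bits, raw = describe(host)
    print(f"IPv{version} address, {bits} bits ({len(raw)} bytes), port {port}")
```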


3.2 Application-level protocols

Many protocols have been defined for networked applications. In this section, we describe some of the important applications that are used on the Internet. We first explain the Domain Name System (DNS) that enables hosts to be identified by human-friendly names instead of the IPv4 or IPv6 addresses that are used by the network. Then, we describe the operation of electronic mail, one of the first killer applications on the global Internet, and the protocols used on the world wide web.

3.2.1 The Domain Name System

In the early days of the Internet, there were only a few hosts (mainly minicomputers) connected to the network. The most popular applications were remote login and file transfer. By 1983, there were already five hundred hosts attached to the Internet. Each of these hosts was identified by a unique IPv4 address. Forcing human users to remember the IPv4 addresses of the remote hosts that they wanted to use was not user-friendly. Human users prefer to remember names, and use them when needed. Using names as aliases for addresses is a common technique in Computer Science. It simplifies the development of applications and allows the developer to ignore the low level details. For example, by using a programming language instead of writing machine code, a developer can write software without knowing whether the variables that it uses are stored in memory or inside registers.

Because names are at a higher level than addresses, they allow (both in the example of programming above, and on the Internet) addresses to be treated as mere technical identifiers, which can change at will. Only the names are stable. On today's Internet, where switching to another ISP means changing your IP addresses, the user-friendliness of domain names is less important (they are not often typed by users) but their stability remains a very important, maybe their most important, property.
The first solution that allowed applications to use names was the hosts.txt file. This file is similar to the symbol table found in compiled code. It contains the mapping between the name of each Internet host and its associated IP address [2]. It was maintained by SRI International, which coordinated the Network Information Center (NIC). When a new host was connected to the network, the system administrator had to register its name and IP address at the NIC. The NIC updated the hosts.txt file on its server. All Internet hosts regularly retrieved the updated hosts.txt file from the server maintained by SRI. This file was stored at a well-known location on each Internet host (see RFC 952) and networked applications could use it to find the IP address corresponding to a name.

A hosts.txt file can be used when there are up to a few hundred hosts on the network. However, it is clearly not suitable for a network containing thousands or millions of hosts. A key issue in a large network is to define a suitable naming scheme. The ARPANet initially used a flat naming space, i.e. each host was assigned a unique name. To limit collisions between names, these names usually contained the name of the institution and a suffix to identify the host inside the institution (a kind of poor man's hierarchical naming scheme). On the ARPANet, few institutions had several hosts connected to the network.
However, the limitations of a flat naming scheme became clear before the end of the ARPANet and RFC 819 proposed a hierarchical naming scheme. While RFC 819 discussed the possibility of organising the names as a directed graph, the Internet eventually opted for a tree structure capable of containing all names. In this tree, the top-level domains are those that are directly attached to the root. The first top-level domain was .arpa [3]. This top-level name was initially added as a suffix to the names of the hosts attached to the ARPANet and listed in the hosts.txt file. In 1984, the .gov, .edu, .com, .mil and .org generic top-level domain names were added and RFC 1032 proposed the utilisation of the two-letter ISO-3166 country codes as top-level domain names. Since ISO-3166 defines a two-letter code for each country recognised by the United Nations, this allowed all countries to automatically have a top-level domain. These domains include .be for Belgium, .fr for France, .us for the USA, .ie for Ireland, .tv for Tuvalu, a group of small islands in the Pacific, and .tm for Turkmenistan. Today, the set of top-level domain names is managed by the Internet Corporation for Assigned Names and Numbers (ICANN). Recently, ICANN added a dozen generic top-level domains that are not related to a country, and the .cat top-level domain has been registered for the Catalan language. There are ongoing discussions within ICANN to increase the number of top-level domains.

[2] The hosts.txt file is not maintained anymore. A historical snapshot retrieved on April 15th, 1984 is available from http://ftp.univie.ac.at/netinfo/netinfo/hosts.txt
[3] See http://www.donelan.com/dnstimeline.html for a timeline of DNS related developments.


Each top-level domain is managed by an organisation that decides how sub-domain names can be registered. Most top-level domain names use a first-come first-served system, and allow anyone to register domain names, but there are some exceptions. For example, .gov is reserved for the US government, .int is reserved for international organisations and names in the .ca domain are mainly reserved for companies or users who are present in Canada.

Figure 3.6: The tree of domain names

RFC 1035 recommended the following BNF for fully qualified domain names, to allow host names with a syntax which works with all applications (the domain names themselves have a much richer syntax).

Figure 3.7: BNF of the fully qualified host names

This grammar specifies that a host name is an ordered list of labels separated by the dot (.) character. Each label can contain letters, numbers and the hyphen character (-) [4]. Fully qualified domain names are read from left to right. The first label is a hostname or a domain name followed by the hierarchy of domains and ending with the root implicitly at the right. The top-level domain name must be one of the registered TLDs [5]. For example, in the above figure, www.whitehouse.gov corresponds to a host named www inside the whitehouse domain that belongs to the gov top-level domain. info.ucl.ac.be corresponds to the info domain inside the ucl domain that is included in the ac sub-domain of the be top-level domain.

This hierarchical naming scheme is a key component of the Domain Name System (DNS). The DNS is a distributed database that contains mappings between fully qualified domain names and IP addresses. The DNS uses the client-server model. The clients are hosts that need to retrieve the mapping for a given name. Each nameserver stores part of the distributed database and answers the queries sent by clients. There is at least one nameserver that is responsible for each domain. In the figure below, domains are represented by circles and there are three hosts inside domain dom (h1, h2 and h3) and three hosts inside domain a.sdom1.dom. As shown in the figure below, a sub-domain may contain both host names and sub-domains.
Figure 3.8: A simple tree of domain names

[4] This specification evolved later to support domain names written by using other character sets than us-ASCII (RFC 5890). This extension is important to support languages other than English, but a detailed discussion is outside the scope of this document.
[5] The official list of top-level domain names is maintained by IANA at http://data.iana.org/TLD/tlds-alpha-by-domain.txt Additional information about these domains may be found at http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
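The label structure of fully qualified host names can be checked with a few lines of code. The sketch below assumes the preferred syntax of RFC 1035, in which a label starts with a letter, ends with a letter or digit, contains only letters, digits and hyphens, and is at most 63 characters long. The function name split_labels is hypothetical.

```python
import re

# One label: a letter, optionally followed by up to 61 letters/digits/hyphens
# and a final letter or digit (63 characters at most).
LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def split_labels(name):
    """Split a fully qualified host name into its labels, leftmost first.

    Raises ValueError when a label does not match the grammar above.
    """
    if name.endswith("."):          # the root may be written explicitly
        name = name[:-1]
    labels = name.split(".")
    for label in labels:
        if not LABEL.fullmatch(label):
            raise ValueError(f"invalid label: {label!r}")
    return labels

print(split_labels("info.ucl.ac.be"))   # host or domain first, top-level domain last
```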


A nameserver that is responsible for domain dom can directly answer the following queries:

- the IP address of any host residing directly inside domain dom (e.g. h2.dom in the figure above)
- the nameserver(s) that are responsible for any direct sub-domain of domain dom (i.e. sdom1.dom and sdom2.dom in the figure above, but not z.sdom1.dom)

To retrieve the mapping for host h2.dom, a client sends its query to the nameserver that is responsible for domain .dom. The nameserver directly answers the query. To retrieve a mapping for h3.a.sdom1.dom, a DNS client first sends a query to the nameserver that is responsible for the .dom domain. This nameserver returns the nameserver that is responsible for the sdom1.dom domain. This nameserver can now be contacted to obtain the nameserver that is responsible for the a.sdom1.dom domain. This nameserver can be contacted to retrieve the mapping for the h3.a.sdom1.dom name. Thanks to this organisation of the nameservers, it is possible for a DNS client to obtain the mapping of any host inside the .dom domain or any of its subdomains. To ensure that any DNS client will be able to resolve any fully qualified domain name, there are special nameservers that are responsible for the root of the domain name hierarchy. These nameservers are called root nameservers. There are currently about a dozen root nameservers [6].

Each root nameserver maintains the list [7] of all the nameservers that are responsible for each of the top-level domain names and their IP addresses [8]. All root nameservers are synchronised and provide the same answers. By querying any of the root nameservers, a DNS client can obtain the nameserver that is responsible for any top-level domain name. From this nameserver, it is possible to resolve any domain name.
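The walk from the root towards the nameserver that holds a mapping can be simulated without any network traffic. The sketch below is a toy model under stated assumptions: the ZONES table, the addresses (taken from the 192.0.2.0/24 documentation range) and the resolve function are all hypothetical, and each "nameserver" is reduced to a dictionary entry.

```python
# Hypothetical data: each nameserver knows the hosts residing directly inside
# its domain and the direct sub-domains that it delegates.
ZONES = {
    "": {"hosts": {}, "subdomains": {"dom"}},                     # root
    "dom": {"hosts": {"h1.dom": "192.0.2.1", "h2.dom": "192.0.2.2",
                      "h3.dom": "192.0.2.3"},
            "subdomains": {"sdom1.dom", "sdom2.dom"}},
    "sdom1.dom": {"hosts": {}, "subdomains": {"a.sdom1.dom", "z.sdom1.dom"}},
    "a.sdom1.dom": {"hosts": {"h3.a.sdom1.dom": "192.0.2.33"},
                    "subdomains": set()},
}

def resolve(name):
    """Return (address, list of nameservers contacted), starting at the root."""
    zone, contacted = "", []
    while True:
        contacted.append(zone or "(root)")
        data = ZONES.get(zone)
        if data is None:
            raise LookupError(f"no nameserver data for {zone}")
        if name in data["hosts"]:
            return data["hosts"][name], contacted
        # Otherwise follow the delegation to the sub-domain containing the name.
        for sub in data["subdomains"]:
            if name.endswith("." + sub):
                zone = sub
                break
        else:
            raise LookupError(f"{name} not found")

print(resolve("h3.a.sdom1.dom"))
```

Resolving h3.a.sdom1.dom contacts four nameservers in turn (root, dom, sdom1.dom, a.sdom1.dom), whereas h2.dom is answered by the nameserver of dom itself.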
To be able to contact the root nameservers, each DNS client must know their IP addresses. This implies that DNS clients must maintain an up-to-date list of the IP addresses of the root nameservers [9]. Without this list, it is impossible to contact the root nameservers. Forcing all Internet hosts to maintain the most recent version of this list would be difficult from an operational point of view. To solve this problem, the designers of the DNS introduced a special type of DNS server: the DNS resolvers. A resolver is a server that provides the name resolution service for a set of clients. A network usually contains a few resolvers. Each host in these networks is configured to send all its DNS queries via one of its local resolvers. These queries are called recursive queries as the resolver must recurse through the hierarchy of nameservers to obtain the answer.

DNS resolvers have several advantages over letting each Internet host query nameservers directly. Firstly, regular Internet hosts do not need to maintain the up-to-date list of the IP addresses of the root servers. Secondly, regular Internet hosts do not need to send queries to nameservers all over the Internet. Furthermore, as a DNS resolver serves a large number of hosts, it can cache the received answers. This allows the resolver to quickly return answers for popular DNS queries and reduces the load on all DNS servers [JSBM2002].

The last component of the Domain Name System is the DNS protocol. The DNS protocol runs above both the datagram service and the byte stream service. In practice, the datagram service is used when short queries and responses are exchanged, and the byte stream service is used when longer responses are expected. In this section, we will only discuss the utilisation of the DNS protocol above the datagram service. This is the most frequent utilisation of the DNS.
DNS messages are composed of five parts that are named sections in RFC 1035. The first three sections are mandatory and the last two sections are optional. The first section of a DNS message is its Header. It contains information about the type of message and the content of the other sections. The second section contains the Question sent to the nameserver or resolver. The third section contains the Answer to the Question. When a client sends a DNS query, the Answer section is empty. The fourth section, named Authority, contains information about the servers that can provide an authoritative answer if required. The last section contains additional information that is supplied by the resolver or server but was not requested in the question.

The header of DNS messages is composed of 12 bytes and its structure is shown in the figure below.

Figure 3.9: DNS header

The ID (identifier) is a 16-bit random value chosen by the client. When a client sends a question to a DNS server, it remembers the question and its identifier. When a server returns an answer, it returns in the ID field the identifier chosen by the client. Thanks to this identifier, the client can match the received answer with the question that it sent.

[6] There are currently 13 root servers. In practice, some of these root servers are themselves implemented as a set of distinct physical servers. See http://www.root-servers.org/ for more information about the physical location of these servers.
[7] A copy of the information maintained by each root nameserver is available at http://www.internic.net/zones/root.zone
[8] Until February 2008, the root DNS servers only had IPv4 addresses. IPv6 addresses were added to the root DNS servers slowly to avoid creating problems as discussed in http://www.icann.org/en/committees/security/sac018.pdf In 2010, several DNS root servers are still not reachable by using IPv6.
[9] The current list of the IP addresses of the root nameservers is maintained at http://www.internic.net/zones/named.root . These IP addresses are stable and root nameservers seldom change their IP addresses. DNS resolvers must however maintain an up-to-date copy of this file.

The QR flag is set to 0 in DNS queries and 1 in DNS answers. The Opcode is used to specify the type of query. For instance, a standard query is when a client sends a name and the server returns the corresponding data and an update request is when the client sends a name and new data and the server then updates its database.

The AA bit is set when the server that sent the response has authority for the domain name found in the question section. In the original DNS deployments, two types of servers were considered: authoritative servers and non-authoritative servers. The authoritative servers are managed by the system administrators responsible for a given domain. They always store the most recent information about a domain. Non-authoritative servers are servers or resolvers that store DNS information about external domains without being managed by the owners of a domain. They may thus provide answers that are out of date. From a security point of view, the authoritative bit is not an absolute indication about the validity of an answer. Securing the Domain Name System is a complex problem that was only addressed satisfactorily recently by the utilisation of cryptographic signatures in the DNSSEC extensions to DNS described in RFC 4033. However, these extensions are outside the scope of this chapter.

The RD (recursion desired) bit is set by a client when it sends a query to a resolver. Such a query is said to be recursive because the resolver will recurse through the DNS hierarchy to retrieve the answer on behalf of the client.
In the past, all resolvers were configured to perform recursive queries on behalf of any Internet host. However, this exposes the resolvers to several security risks. The simplest one is that the resolver could become overloaded by having too many recursive queries to process. As of this writing, most resolvers [10] only allow recursive queries from clients belonging to their company or network and discard all other recursive queries. The RA bit indicates whether the server supports recursion. The RCODE is used to distinguish between different types of errors. See RFC 1035 for additional details. The last four fields indicate the number of entries in the Question, Answer, Authority and Additional sections of the DNS message.

The last three sections of the DNS message contain Resource Records (RR). All RRs have the same top level format shown in the figure below.

Figure 3.10: DNS Resource Records

In a Resource Record (RR), the Name indicates the name of the node to which this resource record pertains. The two-byte Type field indicates the type of resource record. The Class field was used to support the utilisation of the DNS in other environments than the Internet.

The TTL field indicates the lifetime of the Resource Record in seconds. This field is set by the server that returns an answer and indicates for how long a client or a resolver can store the Resource Record inside its cache. A long TTL indicates a stable RR. Some companies use short TTL values for mobile hosts and also for popular servers. For example, a web hosting company that wants to spread the load over a pool of a hundred servers can configure its nameservers to return different answers to different clients. If each answer has a small TTL, the clients will be forced to send DNS queries regularly. The nameserver will reply to these queries by supplying the address of the least loaded server.

[10] Some DNS resolvers allow any host to send queries. OpenDNS and GoogleDNS are examples of open resolvers.

The RDLength field is the length of the RData field that contains the information of the type specified in the Type field.

Several types of DNS RR are used in practice. The A type is used to encode the IPv4 address that corresponds to the specified name. The AAAA type is used to encode the IPv6 address that corresponds to the specified name. A NS record contains the name of the DNS server that is responsible for a given domain. For example, a query for the A record associated to the www.ietf.org name returns the following answer.

This answer contains several pieces of information. First, the name www.ietf.org is associated to IP address 64.170.98.32. Second, the ietf.org domain is managed by six different nameservers. Three of these nameservers are reachable via IPv4 and IPv6. Two of them are not reachable via IPv6 and ns0.ietf.org is only reachable via IPv6. A query for the AAAA record associated to www.ietf.org returns 2001:1890:1112:1::20 and the same authority and additional sections.

CNAME (or canonical names) are used to define aliases. For example www.example.com could be a CNAME for pc12.example.com that is the actual name of the server on which the web server for www.example.com runs.
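To make the 12-byte header more concrete, the sketch below builds a DNS query and decodes its header again. It is a minimal sketch under stated assumptions: the function names are hypothetical, and the encoding of the Question section (each label preceded by its length, a terminating zero byte, then a 16-bit type and a 16-bit class) follows RFC 1035 rather than anything described in this section.

```python
import secrets
import struct

TYPE_A, TYPE_AAAA, CLASS_IN = 1, 28, 1   # numeric codes from RFC 1035 / RFC 3596

def build_query(name, qtype=TYPE_A, query_id=None):
    """Return the bytes of a recursive query for (name, qtype)."""
    if query_id is None:
        query_id = secrets.randbelow(1 << 16)      # random 16-bit ID
    flags = 0x0100                                 # QR=0, Opcode=0, RD=1
    # ID, flags, then the four counters: one question, no other records.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", qtype, CLASS_IN)

def parse_header(message):
    """Decode the 12-byte header of a DNS message into a dictionary."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "qr": flags >> 15,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "rcode": flags & 0xF,
        "counts": (qd, an, ns, ar),
    }

query = build_query("www.ietf.org")
print(len(query), parse_header(query))
```

A client would send these bytes in a UDP datagram to its resolver and match the ID of the response against the one it chose.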
Note: Reverse DNS and in-addr.arpa

The DNS is mainly used to find the IP address that corresponds to a given name. However, it is sometimes useful to obtain the name that corresponds to an IP address. This is done by using the PTR (pointer) RR. The RData part of a PTR RR contains the name while the Name part of the RR contains the IP address encoded in the in-addr.arpa domain. IPv4 addresses are encoded in the in-addr.arpa domain by reversing the four decimal numbers that compose the dotted decimal representation of the address. For example, consider IPv4 address 192.0.2.11. The hostname associated to this address can be found by requesting the PTR RR that corresponds to 11.2.0.192.in-addr.arpa. A similar solution is used to support IPv6 addresses, see RFC 3596.
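The construction of the name used in a PTR query can be sketched in a few lines. The function reverse_name below is a hypothetical helper written for this example; Python's ipaddress module offers the same result through the reverse_pointer attribute.

```python
import ipaddress

def reverse_name(ipv4_text):
    """Build the in-addr.arpa name used to look up the PTR RR of an IPv4 address."""
    # Validate the address, then reverse its four decimal numbers.
    numbers = str(ipaddress.IPv4Address(ipv4_text)).split(".")
    return ".".join(reversed(numbers)) + ".in-addr.arpa"

print(reverse_name("192.0.2.11"))   # 11.2.0.192.in-addr.arpa
```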


Figure 3.11: Query for the A record of www.ietf.org

An important point to note regarding the Domain Name System is its extensibility. Thanks to the Type and RDLength fields, the format of the Resource Records can easily be extended. Furthermore, a DNS implementation that receives a new Resource Record that it does not understand can ignore the record while still being able to process the other parts of the message. This allows, for example, a DNS server that only supports IPv4 to ignore the IPv6 addresses listed in the DNS reply for www.ietf.org while still being able to correctly parse the Resource Records that it understands. This extensibility allowed the Domain Name System to evolve over the years while still preserving the backward compatibility with already deployed DNS implementations.

3.2.2 Electronic mail

Electronic mail, or email, is a very popular application in computer networks such as the Internet. Email appeared in the early 1970s and allows users to exchange text based messages. Initially, it was mainly used to exchange short messages, but over the years its usage has grown. It is now used to exchange not only short messages, but also long messages that can be composed of several parts, as we will see later.

Before looking at the details of Internet email, let us consider a simple scenario illustrated in the figure below, where Alice sends an email to Bob. Alice prepares her email by using an email client and sends it to her email server. Alice's email server extracts Bob's address from the email and delivers the message to Bob's server. Bob retrieves Alice's message on his server and reads it by using his favourite email client or through his webmail interface.

Figure 3.12: Simplified architecture of the Internet email

The email system that we consider in this book is composed of four components:

- a message format, that defines how valid email messages are encoded
- protocols, that allow hosts and servers to exchange email messages


- client software, that allows users to easily create and read email messages
- software, that allows servers to efficiently exchange email messages

We will first discuss the format of email messages followed by the protocols that are used on today's Internet to exchange and retrieve emails. Other email systems have been developed in the past [Bush1993] [Genilloud1990] [GC2000], but today most email solutions have migrated to the Internet email. Information about the software that is used to compose and deliver emails may be found on Wikipedia among others, for both email clients and email servers. More detailed information about the full Internet Mail Architecture may be found in RFC 5598.

Email messages, like postal mail, are composed of two parts:

- a header that plays the same role as the letterhead in regular mail. It contains metadata about the message.
- the body that contains the message itself.

Email messages are entirely composed of lines of ASCII characters. Each line can contain up to 998 characters and is terminated by the CR and LF control characters (RFC 5322). The lines that compose the header appear before the message body. An empty line, containing only the CR and LF characters, marks the end of the header. This is illustrated in the figure below.

Figure 3.13: The structure of email messages

The email header contains several lines that all begin with a keyword followed by a colon and additional information. The format of email messages and the different types of header lines are defined in RFC 5322. Two of these header lines are mandatory and must appear in all email messages:

- The sender address. This header line starts with From:. It contains the (optional) name of the sender followed by its email address between < and >. Email addresses are always composed of a username followed by the @ sign and a domain name.
- The date. This header line starts with Date:. RFC 5322 precisely defines the format used to encode a date.
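The structure described above (header lines, an empty line, then the body) can be sketched as follows. This is a simplified illustration, not a complete RFC 5322 parser: it ignores header lines folded over several lines and keeps only the last occurrence of a repeated header. The function names and the sample address bob@example.org are hypothetical.

```python
def split_message(raw):
    """Split a raw message into (headers, body) at the first empty line."""
    head, _, body = raw.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n"):
        # Each header line is "keyword: additional information".
        keyword, _, value = line.partition(":")
        headers[keyword.strip().lower()] = value.strip()
    return headers, body

def has_mandatory_headers(headers):
    """Check that the two mandatory header lines, From: and Date:, are present."""
    return "from" in headers and "date" in headers

raw = ("From: Bob Smith <bob@example.org>\r\n"
       "Date: Mon, 8 Mar 2010 19:55:06 -0600\r\n"
       "\r\n"
       "First line of the body\r\n")

headers, body = split_message(raw)
print(headers, has_mandatory_headers(headers))
```

Production code would rather rely on the email package of the Python standard library, which handles folded and repeated header lines.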
Other header lines appear in most email messages. The Subject: header line allows the sender to indicate the topic discussed in the email. Three types of header lines can be used to specify the recipients of a message:

- the To: header line contains the email addresses of the primary recipients of the message [11]. Several addresses can be separated by using commas.
- the cc: header line is used by the sender to provide a list of email addresses that must receive a carbon copy of the message. Several addresses can be listed in this header line, separated by commas. All recipients of the email message receive the To: and cc: header lines.
- the bcc: header line is used by the sender to provide a list of comma separated email addresses that must receive a blind carbon copy of the message. The bcc: header line is not delivered to the recipients of the email message.

A simple email message containing the From:, To:, Subject: and Date: header lines and two lines of body is shown below.

[11] It could be surprising that the To: header line is not mandatory inside an email message. While most email messages will contain this header line, an email that does not contain a To: header line and that relies on the bcc: header line to specify the recipient is valid as well.


    From: Bob Smith
    To: Alice Doe, Alice Smith
    Subject: Hello
    Date: Mon, 8 Mar 2010 19:55:06 -0600

    This is the "Hello world" of email messages.
    This is the second line of the body

Note the empty line after the Date: header line; this empty line contains only the CR and LF characters, and marks the boundary between the header and the body of the message.

Several other optional header lines are defined in RFC 5322 and elsewhere [12]. Furthermore, many email clients and servers define their own header lines starting from X-. Several of the optional header lines defined in RFC 5322 are worth being discussed here:

- the Message-Id: header line is used to associate a unique identifier to each email. Email identifiers are usually structured like string@domain where string is a unique character string or sequence number chosen by the sender of the email and domain is the domain name of the sender. Since domain names are unique, a host can generate globally unique message identifiers by concatenating a locally unique identifier with its domain name.
- the In-reply-to: header line is used when a message was created in reply to a previous message. In this case, the end of the In-reply-to: line contains the identifier of the original message.
- the Received: header line is used when an email message is processed by several servers before reaching its destination. Each intermediate email server adds a Received: header line. These header lines are useful to debug problems in delivering email messages.

The figure below shows the header lines of one email message. The message originated at a host named wira.firstpr.com.au and was received by smtp3.sgsi.ucl.ac.be. The Received: lines have been wrapped for readability.
    Received: from smtp3.sgsi.ucl.ac.be (Unknown [10.1.5.3])
        by mmp.sipr-dc.ucl.ac.be
        (Sun Java(tm) System Messaging Server 7u3-15.01 64bit (built Feb 12 2010))
        with ESMTP id <0KYY00L85LI5JLE0@mmp.sipr-dc.ucl.ac.be>; Mon,
        08 Mar 2010 11:37:17 +0100 (CET)
    Received: from mail.ietf.org (mail.ietf.org [64.170.98.32])
        by smtp3.sgsi.ucl.ac.be (Postfix) with ESMTP id B92351C60D7; Mon,
        08 Mar 2010 11:36:51 +0100 (CET)
    Received: from [127.0.0.1] (localhost [127.0.0.1]) by core3.amsl.com (Postfix)
        with ESMTP id F066A3A68B9; Mon, 08 Mar 2010 02:36:38 -0800 (PST)
    Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix)
        with ESMTP id A1E6C3A681B for ; Mon,
        08 Mar 2010 02:36:37 -0800 (PST)
    Received: from mail.ietf.org ([64.170.98.32])
        by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
        with ESMTP id erw8ih2v8VQa for ; Mon,
        08 Mar 2010 02:36:36 -0800 (PST)
    Received: from gair.firstpr.com.au (gair.firstpr.com.au [150.101.162.123])
        by core3.amsl.com (Postfix) with ESMTP id 03E893A67ED for ; Mon,
        08 Mar 2010 02:36:35 -0800 (PST)
    Received: from [10.0.0.6] (wira.firstpr.com.au [10.0.0.6])
        by gair.firstpr.com.au (Postfix) with ESMTP id D0A49175B63; Mon,
        08 Mar 2010 21:36:37 +1100 (EST)
    Date: Mon, 08 Mar 2010 21:36:38 +1100
    From: Robin Whittle
    Subject: Re: [rrg] Recommendation and what happens next
    In-reply-to:
    To: RRG
    Message-id: <4B94D336.7030504@firstpr.com.au>

[12] The list of all standard email header lines may be found at http://www.iana.org/assignments/message-headers/message-header-index.html


    Message content removed

Initially, email was used to exchange small messages of ASCII text between computer scientists. However, with the growth of the Internet, supporting only ASCII text became a severe limitation for two reasons. First of all, non-English speakers wanted to write emails in their native language that often required more characters than those of the ASCII character table. Second, many users wanted to send other content than just ASCII text by email such as binary files, images or sound.

To solve this problem, the IETF developed the Multipurpose Internet Mail Extensions (MIME). These extensions were carefully designed to allow Internet email to carry non-ASCII characters and binary files without breaking the email servers that were deployed at that time. This requirement for backward compatibility forced the MIME designers to develop extensions to the existing email message format (RFC 822) instead of defining a completely new format that would have been better suited to support the new types of emails.

RFC 2045 defines three new types of header lines to support MIME:

- The MIME-Version: header indicates the version of the MIME specification that was used to encode the email message. The current version of MIME is 1.0. Other versions of MIME may be defined in the future. Thanks to this header line, the software that processes email messages will be able to adapt to the MIME version used to encode the message. Messages that do not contain this header are supposed to be formatted according to the original RFC 822 specification.
- The Content-Type: header line indicates the type of data that is carried inside the message (see below).
- The Content-Transfer-Encoding: header line is used to specify how the message has been encoded. When MIME was designed, some email servers were only able to process messages containing characters encoded using the 7-bit ASCII character set. MIME allows the utilisation of other character encodings.

Inside the email header, the Content-Type: header line indicates how the MIME email message is structured.
RFC 2046 defines the utilisation of this header line. The two most common structures for MIME messages are:

- Content-Type: multipart/mixed. This header line indicates that the MIME message contains several independent parts. For example, such a message may contain a part in plain text and a binary file.
- Content-Type: multipart/alternative. This header line indicates that the MIME message contains several representations of the same information. For example, a multipart/alternative message may contain both a plain text and an HTML version of the same text.

To support these two types of MIME messages, the recipient of a message must be able to extract the different parts from the message. In RFC 822, an empty line was used to separate the header lines from the body. Using an empty line to separate the different parts of an email body would be difficult as the body of email messages often contains one or more empty lines. Another possible option would be to define a special line, e.g. *-LAST_LINE-*, to mark the boundary between two parts of a MIME message. Unfortunately, this is not possible as some emails may contain this string in their body (e.g. emails sent to students to explain the format of MIME messages). To solve this problem, the Content-Type: header line contains a second parameter that specifies the string that has been used by the sender of the MIME message to delineate the different parts. In practice, this string is often chosen randomly by the mail client.

The email message below, adapted from RFC 2046, shows a MIME message containing two parts that are both in plain text and encoded using the ASCII character set. The string simple boundary is defined in the Content-Type: header as the marker for the boundary between two successive parts. Another example of MIME messages may be found in RFC 2046.

    Date: Mon, 20 Sep 1999 16:33:16 +0200
    From: Nathaniel Borenstein
    To: Ned Freed
    Subject: Test
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="simple boundary"

    preamble, to be ignored
    --simple boundary
    Content-Type: text/plain; charset=us-ascii


    First part
    --simple boundary
    Content-Type: text/plain; charset=us-ascii

    Second part
    --simple boundary--

The Content-Type: header can also be used inside a MIME part. In this case, it indicates the type of data placed in this part. Each data type is specified as a type followed by a subtype. A detailed description may be found in RFC 2046. Some of the most popular Content-Type: header lines are:

- text. The message part contains information in textual format. There are several subtypes: text/plain for regular ASCII text, text/html defined in RFC 2854 for documents in HTML format or the text/enriched format defined in RFC 1896. The Content-Type: header line may contain a second parameter that specifies the character set used to encode the text. charset=us-ascii is the standard ASCII character table. Other frequent character sets include charset=UTF-8 or charset=iso-8859-1. The list of standard character sets is maintained by IANA.
- image. The message part contains a binary representation of an image. The subtype indicates the format of the image such as gif, jpeg or png.
- audio. The message part contains an audio clip. The subtype indicates the format of the audio clip like wav or mp3.
- video. The message part contains a video clip. The subtype indicates the format of the video clip like avi or mp4.
- application. The message part contains binary information that was produced by the particular application listed as the subtype. Email clients use the subtype to launch the application that is able to decode the received binary information.
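The way a recipient extracts the parts of a multipart message can be sketched with the email package of the Python standard library. The sketch below is only an illustration: the RAW message is a shortened variant of the example above, the last delimiter carries the two trailing hyphens that RFC 2046 uses to close a multipart body, and the function name extract_parts is hypothetical.

```python
from email import message_from_string

RAW = "\n".join([
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="simple boundary"',
    '',
    'preamble, to be ignored',
    '--simple boundary',
    'Content-Type: text/plain; charset=us-ascii',
    '',
    'First part',
    '--simple boundary',
    'Content-Type: text/plain; charset=us-ascii',
    '',
    'Second part',
    '--simple boundary--',
    '',
])

def extract_parts(raw):
    """Return a list of (content type, text) pairs, one per MIME part."""
    message = message_from_string(raw)
    if not message.is_multipart():
        return [(message.get_content_type(), message.get_payload())]
    # For a multipart message, get_payload() returns the list of parts.
    return [(part.get_content_type(), part.get_payload().strip())
            for part in message.get_payload()]

for content_type, text in extract_parts(RAW):
    print(content_type, "->", text)
```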
Note: From ASCII to Unicode

The first computers used different techniques to represent characters in memory and on disk. During the 1960s, computers began to exchange information via tape or telephone lines. Unfortunately, each vendor had its own proprietary character set and exchanging data between computers from different vendors was often difficult. The 7-bit ASCII character table (RFC 20) was adopted by several vendors and by many Internet protocols. However, ASCII became a problem with the internationalisation of the Internet and the desire of more and more users to use character sets that support their own written language. A first attempt at solving this problem was the definition of the ISO-8859 character sets by ISO. This family of standards specified various character sets that allowed the representation of many European written languages by using 8-bit characters. Unfortunately, an 8-bit character set is not sufficient to support some widely used languages, such as those used in Asian countries. Fortunately, at the end of the 1980s, several computer scientists proposed to develop a standard that supports all written languages used on Earth today. The Unicode standard [Unicode] has now been adopted by most computer and software vendors. For example, Java uses Unicode natively to manipulate characters, and Python can handle both ASCII and Unicode characters. Internet applications are slowly moving towards complete support for the Unicode character sets, but moving from ASCII to Unicode is an important change that can have a huge impact on current deployed implementations. See for example, the work to completely internationalise email (RFC 4952) and domain names (RFC 5890).

The last MIME header line is Content-Transfer-Encoding:. This header line is used after the Content-Type: header line, within a message part, and specifies how the message part has been encoded. The default encoding is to use 7-bit ASCII. The most frequent encodings are quoted-printable and Base64. Both support encoding a sequence of bytes into a set of ASCII lines that can be safely transmitted by email servers. quoted-printable is defined in RFC 2045. We briefly describe base64 which is defined in RFC 2045 and RFC 4648.
Base64 divides the sequence of bytes to be encoded into groups of three bytes (with the last group possibly being partially filled). Each group of three bytes is then divided into four six-bit fields and each six-bit field is encoded as a character from the table below.
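The grouping step just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `b64encode_sketch` is an assumption made for this example, the alphabet string simply spells out the encoding table, and the `=` padding that it emits for an incomplete last group is the padding rule of RFC 4648. In practice, the `base64` module of the standard library should be preferred.

```python
import base64

# The 64 characters of the Base64 table, indexed by six-bit value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64encode_sketch(data: bytes) -> str:
    """Encode bytes as Base64, three bytes (four characters) at a time."""
    output = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        # Right-pad a short last group with zero bytes to obtain 24 bits.
        value = int.from_bytes(group + b"\x00" * (3 - len(group)), "big")
        # Split the 24 bits into four six-bit fields.
        fields = [(value >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # One input byte yields two characters, two bytes three, three bytes four.
        kept = len(group) + 1
        output.append("".join(ALPHABET[f] for f in fields[:kept]))
        output.append("=" * (4 - kept))
    return "".join(output)


if __name__ == "__main__":
    sample = bytes.fromhex("14fb9c03d97e")
    print(b64encode_sketch(sample))
    # Cross-check against the standard library implementation.
    print(base64.b64encode(sample).decode("ascii"))
```

The sketch processes each group independently, which mirrors the description above and makes the effect of a partially filled last group easy to see.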

    Value Encoding   Value Encoding   Value Encoding   Value Encoding
      0      A         17      R        34      i        51      z
      1      B         18      S        35      j        52      0
      2      C         19      T        36      k        53      1
      3      D         20      U        37      l        54      2
      4      E         21      V        38      m        55      3
      5      F         22      W        39      n        56      4
      6      G         23      X        40      o        57      5
      7      H         24      Y        41      p        58      6
      8      I         25      Z        42      q        59      7
      9      J         26      a        43      r        60      8
     10      K         27      b        44      s        61      9
     11      L         28      c        45      t        62      +
     12      M         29      d        46      u        63      /
     13      N         30      e        47      v
     14      O         31      f        48      w
     15      P         32      g        49      x
     16      Q         33      h        50      y

The example below, from RFC 4648, illustrates the Base64 encoding.

    Input data   0x14fb9c03d97e
    8-bit        00010100 11111011 10011100 00000011 11011001 01111110
    6-bit        000101 001111 101110 011100 000000 111101 100101 111110
    Decimal      5 15 46 28 0 61 37 62
    Encoding     F P u c A 9 l +

The last point to be discussed about Base64 is what happens when the length of the sequence of bytes to be encoded is not a multiple of three. In this case, the last group of bytes may contain one or two bytes instead of three. Base64 reserves the = character as a padding character. This character is used twice when the last group contains one byte and once when it contains two bytes, as illustrated by the two examples below.

    Input data   0x14
    8-bit        00010100
    6-bit        000101 000000
    Decimal      5 0
    Encoding     F A = =

    Input data   0x14fb
    8-bit        00010100 11111011
    6-bit        000101 001111 101100
    Decimal      5 15 44
    Encoding     F P s =

Now that we have explained the format of the email messages, we can discuss how these messages can be exchanged through the Internet. The figure below illustrates the protocols that are used when Alice sends an email message to Bob. Alice prepares her email with an email client or on a webmail interface. To send her email to Bob, Alice's client will use the Simple Mail Transfer Protocol (SMTP) to deliver her message to her SMTP server.
Alice's email client is configured with the name of the default SMTP server for her domain. There is usually at least one SMTP server per domain. To deliver the message, Alice's SMTP server must find the SMTP server that contains Bob's mailbox. This can be done by using the Mail eXchange (MX) records of the DNS. A set of MX records can be associated with each domain. Each MX record contains a numerical preference and the fully qualified domain name of an SMTP server that is able to deliver email messages destined to all valid email addresses of this domain. The DNS can return several MX records for a given domain. In this case, the server with the lowest preference is used first. If this server is not reachable, the second most preferred server is used, and so on. Bob's SMTP server will store the message sent by Alice until Bob retrieves it using a webmail interface or protocols such as the Post Office Protocol (POP) or the Internet Message Access Protocol (IMAP).
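The selection rule for MX records (lowest preference first, then fall back to the next one) can be sketched as follows. This is a sketch under stated assumptions: the record values, the host names `mx1.example.com` and `mx2.example.com`, and the `is_reachable` callback are all hypothetical, and no real DNS query is performed.

```python
from typing import Callable, List, NamedTuple, Optional


class MXRecord(NamedTuple):
    preference: int   # lower values are preferred
    exchange: str     # fully qualified domain name of the SMTP server


def delivery_order(records: List[MXRecord]) -> List[str]:
    """Return the SMTP server names ordered from most to least preferred."""
    return [r.exchange for r in sorted(records, key=lambda r: r.preference)]


def pick_server(records: List[MXRecord],
                is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the most preferred server that is reachable, or None."""
    for name in delivery_order(records):
        if is_reachable(name):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical MX records for a hypothetical domain.
    records = [MXRecord(20, "mx2.example.com"), MXRecord(10, "mx1.example.com")]
    print(delivery_order(records))
    # Pretend that the preferred server is down.
    print(pick_server(records, lambda name: name != "mx1.example.com"))
```

A real mail server would obtain the records from a DNS resolver and would test reachability by attempting a connection to the SMTP port of each candidate.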

Figure 3.14: Email delivery protocols

The Simple Mail Transfer Protocol

The Simple Mail Transfer Protocol (SMTP) defined in RFC 5321 is a client-server protocol. The SMTP specification distinguishes between five types of processes involved in the delivery of email messages. Email messages are composed on a Mail User Agent (MUA). The MUA is usually either an email client or a webmail interface. The MUA sends the email message to a Mail Submission Agent (MSA). The MSA processes the received email and forwards it to the Mail Transmission Agent (MTA). The MTA is responsible for the transmission of the email, directly or via intermediate MTAs, to the MTA of the destination domain. This destination MTA will then forward the message to the Mail Delivery Agent (MDA) where it will be accessed by the recipient's MUA. SMTP is used for the interactions between MUA and MSA [13], MSA-MTA and MTA-MTA.

[13] During the last years, many Internet Service Providers, campus and enterprise networks have deployed SMTP extensions (RFC 4954) on their MSAs. These extensions force the MUAs to be authenticated before the MSA accepts an email message from the MUA.

SMTP is a text-based protocol like many other application-layer protocols on the Internet. It relies on the byte-stream service. Servers listen on port 25. Clients send commands that are each composed of one line of ASCII text terminated by CR+LF. Servers reply by sending ASCII lines that contain a three-digit numerical error/success code and optional comments.

The SMTP protocol, like most text-based protocols, is specified as a BNF. The full BNF is defined in RFC 5321. The main SMTP commands are defined by the BNF rules shown in the figure below.

Figure 3.15: BNF specification of the SMTP commands

In this BNF, atext corresponds to printable ASCII characters. This BNF rule is defined in RFC 5322. The five main commands are EHLO, MAIL FROM:, RCPT TO:, DATA and QUIT [14]. Postmaster is the alias of the system administrator who is responsible for a given domain or SMTP server. All domains must have a Postmaster alias.

[14] The first versions of SMTP used HELO as the first command sent by a client to an SMTP server. When SMTP was extended to support newer features such as 8-bit characters, it was necessary to allow a server to recognise whether it was interacting with a client that supported the extensions or not. EHLO became mandatory with the publication of RFC 2821.

The SMTP responses are defined by the BNF shown in the figure below.

Figure 3.16: BNF specification of the SMTP responses

SMTP servers use structured reply codes containing three digits and an optional comment. The first digit of the reply code indicates whether the command was successful or not. A reply code of 2xy indicates that the command has been accepted. A reply code of 3xy indicates that the command has been accepted, but additional information from the client is expected. A reply code of 4xy indicates a transient negative reply. This means that for some reason, which is indicated by either the other digits or the comment, the command cannot be processed immediately, but there is some hope that the problem will only be transient. This is basically telling the client to try the same command again later. In contrast, a reply code of 5xy indicates a permanent failure or error. In this case, it is useless for the client to retry the same command later. Other application layer protocols such as FTP (RFC 959) or HTTP (RFC 2616) use a similar structure for their reply codes. Additional details about the other reply codes may be found in RFC 5321.

Examples of SMTP reply codes include the following:

    500 Syntax error, command unrecognized
    501 Syntax error in parameters or arguments
    502 Command not implemented
    503 Bad sequence of commands
    220 Service ready
    221 Service closing transmission channel
    421 Service not available, closing transmission channel
    250 Requested mail action okay, completed
    450 Requested mail action not taken: mailbox unavailable
    452 Requested action not taken: insufficient system storage
    550 Requested action not taken: mailbox unavailable
    354 Start mail input; end with <CRLF>.<CRLF>
The first four reply codes correspond to errors in the commands sent by the client. The fourth reply code would be sent by the server when the client sends commands in an incorrect order (e.g. the client tries to send an email before providing the destination address of the message). Reply code 220 is used by the server as the first message when it agrees to interact with the client. Reply code 221 is sent by the server before closing the underlying transport connection. Reply code 421 is returned when there is a problem (e.g. lack of memory/disk resources) that prevents the server from accepting the transport connection. Reply code 250 is the standard positive reply that indicates the success of the previous command. Reply codes 450 and 452 indicate that the destination mailbox is temporarily unavailable, for various reasons, while reply code 550 indicates that the mailbox does not exist or cannot be used for policy reasons. Reply code 354 indicates that the client can start transmitting its email message.

The transfer of an email message is performed in three phases. During the first phase, the client opens a transport connection with the server. Once the connection has been established, the client and the server exchange greeting messages (EHLO command). Most servers insist on receiving valid greeting messages and some of them drop the underlying transport connection if they do not receive a valid greeting. Once the greetings have been exchanged, the email transfer phase can start. During this phase, the client transfers one or more email messages by indicating the email address of the sender (MAIL FROM: command), the email address of the recipient (RCPT TO: command) followed by the headers and the body of the email message (DATA command). Once the client has finished sending all its queued email messages to the SMTP server, it terminates the SMTP association (QUIT command).

A successful transfer of an email message is shown below.

    S: 220 smtp.example.com ESMTP MTA information
    C: EHLO mta.example.org
    S: 250 Hello mta.example.org, glad to meet you
    C: MAIL FROM:
    S: 250 Ok
    C: RCPT TO:
    S: 250 Ok
    C: DATA
    S: 354 End data with <CR><LF>.<CR><LF>
    C: From: "Alice Doe"
    C: To: Bob Smith
    C: Date: Mon, 9 Mar 2010 18:22:32 +0100
    C: Subject: Hello
    C:
    C: Hello Bob
    C: This is a small message containing 4 lines of text.
    C: Best regards,
    C: Alice
    C: .
    S: 250 Ok: queued as 12345
    C: QUIT
    S: 221 Bye

In the example above, the MTA running on mta.example.org opens a TCP connection to the SMTP server on host smtp.example.com. The lines prefixed with S: (resp. C:) are the responses sent by the server (resp. the commands sent by the client). The server sends its greetings as soon as the TCP connection has been established. The client then sends the EHLO command with its fully qualified domain name. The server replies with reply code 250 and sends its greetings. The SMTP association can now be used to exchange an email.

To send an email, the client must first provide the address of the sender with MAIL FROM:. Then it uses the RCPT TO: command with the address of the recipient. Both the sender and the recipient are accepted by the server. The client can now issue the DATA command to start the transfer of the email message. After having received the 354 reply code, the client sends the headers and the body of its email message. The client indicates the end of the message by sending a line containing only the . (dot) character [15]. The server confirms that the email message has been queued for delivery or transmission with a reply code of 250. The client issues the QUIT command to close the session and the server confirms with reply code 221, before closing the TCP connection.
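The way a client interprets the first digit of each reply in such a session can be sketched as follows. The function names `parse_reply`, `reply_class` and `worth_retrying` are assumptions made for this illustration, and the sketch only handles single-line replies of the form "code comment", which is the only form used in the session above.

```python
from typing import Tuple

# Meaning of the first digit of an SMTP reply code, as described in the text.
REPLY_CLASSES = {
    "2": "command accepted",
    "3": "command accepted, more information expected",
    "4": "transient negative reply",
    "5": "permanent failure",
}


def parse_reply(line: str) -> Tuple[int, str]:
    """Split a single-line reply into its numeric code and its comment."""
    code, _, comment = line.partition(" ")
    return int(code), comment


def reply_class(code: int) -> str:
    """Classify a reply code according to its first digit."""
    return REPLY_CLASSES.get(str(code)[0], "unknown")


def worth_retrying(code: int) -> bool:
    """Only transient (4xy) failures justify sending the same command later."""
    return 400 <= code <= 499


if __name__ == "__main__":
    server_lines = [
        "220 smtp.example.com ESMTP MTA information",
        "250 Ok",
        "354 End data with <CR><LF>.<CR><LF>",
        "221 Bye",
    ]
    for line in server_lines:
        code, comment = parse_reply(line)
        print(code, "->", reply_class(code))
```

A client built on this idea would stop a transaction on a 5xy reply and queue the message for a later attempt on a 4xy reply.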
Note: Open SMTP relays and spam

Since its creation in 1971, email has been a very useful tool that is used by many users to exchange lots of information. In the early days, all SMTP servers were open and anyone could use them to forward emails towards their final destination. Unfortunately, over the years, some unscrupulous users have found ways to use email for marketing purposes or to send malware. The first documented abuse of email for marketing purposes occurred in 1978 when a marketer who worked for a computer vendor sent a marketing email to many ARPANET users. At that time, the ARPANET could only be used for research purposes and this was an abuse of the acceptable use policy. Unfortunately, given the extremely low cost of sending emails, the problem of unsolicited emails has not stopped. Unsolicited emails are now called spam and a study carried out by ENISA in 2009 reveals that 95% of email was spam and this number seems to continue to grow. This places a burden on the email infrastructure of Internet Service Providers and large companies that need to process many useless messages.

Given the amount of spam messages, SMTP servers are no longer open (RFC 5068). Several extensions to SMTP have been developed in recent years to deal with this problem. For example, the SMTP authentication scheme defined in RFC 4954 can be used by an SMTP server to authenticate a client. Several techniques have also been proposed to allow SMTP servers to authenticate the messages sent by their users (RFC 4870, RFC 4871).

The Post Office Protocol

When the first versions of SMTP were designed, the Internet was composed of minicomputers that were used by an entire university department or research lab. These minicomputers were used by many users at the same time.
Email was mainly used to send messages from a user on a given host to another user on a remote host. At that time, SMTP was the only protocol involved in the delivery of the emails as all hosts attached to the network were running an SMTP server. On such hosts, an email destined to local users was delivered by placing the email in a special directory or file owned by the user. However, the introduction of personal computers in the 1980s changed this environment. Initially, users of these personal computers used applications such as telnet to open a remote session on the local minicomputer to read their email. This was not user-friendly. A better solution appeared with the development of user-friendly email client applications on personal computers. Several protocols were designed to allow these client applications to retrieve the email messages destined to a user from his/her server. Two of these protocols became popular and are still used today. The Post Office Protocol (POP), defined in RFC 1939, is the simplest one. It allows a client to download all the messages destined to a given user from his/her email server. We describe POP briefly in this section. The second protocol is the Internet Message Access Protocol (IMAP), defined in RFC 3501. IMAP is more powerful, but also more complex than POP. IMAP was designed to allow client applications to efficiently access, in real time, messages stored in various folders on servers. IMAP assumes that all the messages of a given user are stored on a server and provides the functions that are necessary to search, download, delete or filter messages.

POP is another example of a simple line-based protocol. POP runs above the byte-stream service. A POP server usually listens to port 110. A POP session is composed of three parts: an authorisation phase during which the server verifies the client's credentials, a transaction phase during which the client downloads messages and an update phase that concludes the session. The client sends commands and the server replies are prefixed by +OK to indicate a successful command or by -ERR to indicate errors.

When a client opens a transport connection with the POP server, the latter sends as banner an ASCII line starting with +OK. The POP session is at that time in the authorisation phase. In this phase, the client can send its username (resp. password) with the USER (resp. PASS) command. The server replies with +OK if the username (resp. password) is valid and -ERR otherwise.

Once the username and password have been validated, the POP session enters the transaction phase. In this phase, the client can issue several commands. The STAT command is used to retrieve the status of the server. Upon reception of this command, the server replies with a line that contains +OK followed by the number of messages in the mailbox and the total size of the mailbox in bytes. The RETR command, followed by a space and an integer n, is used to retrieve the nth message of the mailbox. The DELE command is used to mark for deletion the nth message of the mailbox.

Once the client has retrieved and possibly deleted the emails contained in the mailbox, it must issue the QUIT command. This command terminates the POP session and allows the server to delete all the messages that have been marked for deletion by using the DELE command.

The figure below provides a simple POP session. All lines prefixed with C: (resp. S:) are sent by the client (resp. server).

    S: +OK POP3 server ready
    C: USER alice
    S: +OK
    C: PASS 12345pass
    S: +OK alice's maildrop has 2 messages (620 octets)
    C: STAT
    S: +OK 2 620
    C: LIST
    S: +OK 2 messages (620 octets)
    S: 1 120
    S: 2 500
    S: .
    C: RETR 1
    S: +OK 120 octets
    S:
    S: .
    C: DELE 1
    S: +OK message 1 deleted
    C: QUIT
    S: +OK POP3 server signing off (1 message left)

In this example, a POP client contacts a POP server on behalf of the user named alice. Note that in this example, Alice's password is sent in clear by the client. This implies that if someone is able to capture the packets sent by Alice, he will know Alice's password [16]. Then Alice's client issues the STAT command to know the number of messages that are stored in her mailbox. It then retrieves and deletes the first message of the mailbox.

[15] This implies that a message line consisting of a single dot followed by CR and LF cannot be transmitted as is. If a user types such a line in an email, the email client must alter it before sending the message over SMTP. RFC 5321 specifies that the client inserts an additional dot at the beginning of any line that starts with a dot, and that the receiving server removes it.

[16] RFC 1939 defines the APOP authentication scheme that is not vulnerable to such attacks.

3.2.3 The HyperText Transfer Protocol

In the early days, the Internet was mainly used for remote terminal access with telnet, email and file transfer. The default file transfer protocol, FTP, defined in RFC 959, was widely used and FTP clients and servers are still included in most operating systems.

Many FTP clients offer a user interface similar to a Unix shell and allow the client to browse the file system on the server and to send and retrieve files. FTP servers can be configured in two modes:

- authenticated: in this mode, the FTP server only accepts users with a valid username and password. Once authenticated, they can access the files and directories according to their permissions.
- anonymous: in this mode, clients supply the anonymous userid and their email address as password. These clients are granted access to a special zone of the file system that only contains public files.

FTP was very popular in the 1990s and early 2000s, but today it has mostly been superseded by more recent protocols. Authenticated access to files is mainly done by using the Secure Shell (ssh) protocol defined in RFC 4251 and supported by clients such as scp or sftp. Nowadays, anonymous access is mainly provided by web protocols.

In the late 1980s, high energy physicists working at CERN had to efficiently exchange documents about their ongoing and planned experiments. Tim Berners-Lee evaluated several of the document sharing techniques that were available at that time [B1989]. As none of the existing solutions met CERN's requirements, they chose to develop a completely new document sharing system. This system was initially called the mesh, but was quickly renamed the world wide web. The starting point for the world wide web is the hypertext document. A hypertext document is a document that contains references (hyperlinks) to other documents that the reader can immediately access. Hypertext was not invented for the world wide web. The idea of hypertext documents was proposed in 1945 [Bush1945] and the first experiments were done during the 1960s [Nelson1965] [Myers1998]. Compared to the hypertext documents that were used in the late 1980s, the main innovation introduced by the world wide web was to allow hyperlinks to reference documents stored on remote machines.

Figure 3.17: World-wide web clients and servers

A document sharing system such as the world wide web is composed of three important parts.
1. A standardised addressing scheme that allows unambiguous identification of documents
2. A standard document format: the HyperText Markup Language
3. A standardised protocol that facilitates efficient retrieval of documents stored on a server

Note: Open standards and open implementations

Open standards have played, and still play, a key role in the success of the world wide web as we know it today. Without open standards, the world wide web would never have reached its current size. In addition to open standards, another important factor for the success of the web was the availability of open and efficient implementations of these standards. When CERN started to work on the web, their objective was to build a running system that could be used by physicists. They developed open-source implementations of the first web servers and web clients. These open-source implementations were powerful and could be used as is by institutions willing to share information on the web. They were also extended by other developers who contributed new features. For example, NCSA added support for images in their Mosaic browser that was eventually used to create Netscape Communications.

The first component of the world wide web is the Uniform Resource Identifier (URI), defined in RFC 3986. A URI is a character string that unambiguously identifies a resource on the world wide web. Here is a subset of the BNF for URIs:

    URI         = scheme ":" "//" authority path [ "?" query ] [ "#" fragment ]
    scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    authority   = [ userinfo "@" ] host [ ":" port ]
    query       = *( pchar / "/" / "?" )
    fragment    = *( pchar / "/" / "?" )
    pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
    pct-encoded = "%" HEXDIG HEXDIG
    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    reserved    = gen-delims / sub-delims
    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

The first component of a URI is its scheme. A scheme can be seen as a selector, indicating the meaning of the fields after it. In practice, the scheme often identifies the application-layer protocol that must be used by the client to retrieve the document, but it is not always the case. Some schemes do not imply a protocol at all and some do not indicate a retrievable document [17]. The most frequent scheme is http, which will be described later. A URI scheme can be defined for almost any application layer protocol. The character : follows the scheme of any URI, and the characters // introduce the authority when the URI contains one.
The second part of the URI is the authority. With a retrievable URI, this includes the DNS name or the IP address of the server where the document can be retrieved using the protocol specified via the scheme. This name can be preceded by some information about the user (e.g. a username) who is requesting the information. Earlier definitions of the URI allowed the specification of a username and a password before the @ character (RFC 1738), but this is now deprecated as placing a password inside a URI is insecure. The host name can be followed by the colon character and a port number. A default port number is defined for some protocols and the port number should only be included in the URI if a non-default port number is used (for other protocols, techniques like service DNS records are used).

The third part of the URI is the path to the document. This path is structured as filenames on a Unix host (but it does not imply that the files are indeed stored this way on the server). If the path is not specified, the server will return a default document. The last two optional parts of the URI are used to provide a query and indicate a specific part (e.g. a section in an article) of the requested document. Sample URIs are shown below.
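The decomposition of a URI into scheme, authority, path, query and fragment can be explored with the `urllib.parse` module of the Python standard library. The sketch below is only an illustration: the helper name `uri_parts` and the choice of fields in the returned dictionary are assumptions made for this example.

```python
from urllib.parse import urlsplit


def uri_parts(uri: str) -> dict:
    """Split a URI into the components discussed in the text."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,     # information placed before the @ character
        "host": parts.hostname,     # DNS name or IP address
        "port": parts.port,         # None when the default port is implied
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    for uri in (
        "http://tools.ietf.org/html/rfc3986.html",
        "telnet://[2001:6a8:3080:3::2]:80/",
        # Everything before the @ character is user information, not a host name.
        "ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm",
    ):
        print(uri_parts(uri))
```

The last URI shows why the position of the @ character matters: the text that looks like a server name is in fact the user information, and the host is the IP address that follows it.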
    http://tools.ietf.org/html/rfc3986.html
    mailto:infobot@example.com?subject=current-issue
    http://docs.python.org/library/basehttpserver.html?highlight=http#BaseHTTPServer.BaseHTTPRequestHandler
    telnet://[2001:6a8:3080:3::2]:80/
    ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm

The first URI corresponds to a document named rfc3986.html that is stored on the server named tools.ietf.org and can be accessed by using the http protocol on its default port. The second URI corresponds to an email message, with subject current-issue, that will be sent to user infobot in domain example.com. The mailto: URI scheme is defined in RFC 6068. The third URI references the portion BaseHTTPServer.BaseHTTPRequestHandler of the document basehttpserver.html that is stored in the library directory on server docs.python.org. This document can be retrieved by using the http protocol. The query highlight=http is associated with this URI. The fourth example is a server that operates the telnet protocol, uses IPv6 address 2001:6a8:3080:3::2 and is reachable on port 80. The last URI is somewhat special. Most users will assume that it corresponds to a document stored on the cnn.example.com server. However, to parse this URI, it is important to remember that the @ character is used to separate the username from the host name in the authority part of a URI. This implies that the URI points to a document named top_story.htm on the host having IPv4 address 10.0.0.1. The document will be retrieved by using the ftp protocol with the username set to cnn.example.com&story=breaking_news.

[17] An example of a non-retrievable URI is urn:isbn:0-380-81593-1, which is a unique identifier for a book, through the urn scheme (see RFC 3187). Of course, any URI can be made retrievable via a dedicated server or a new protocol, but this one has no explicit protocol. The same holds for the tag scheme (see RFC 4151), often used in Web syndication (see RFC 4287 about the Atom syndication format). Even when the scheme is retrievable (for instance with http), it is often used only as an identifier, not as a way to get a resource. See http://norman.walsh.name/2006/07/25/namesAndAddresses for a good explanation.

The second component of the world wide web is the HyperText Markup Language (HTML). HTML defines the format of the documents that are exchanged on the web. The first version of HTML was derived from the Standard Generalized Markup Language (SGML) that was standardised in 1986 by ISO. SGML was designed to allow large project documents in industries such as government, law or aerospace to be shared efficiently in a machine-readable manner. These industries require documents to remain readable and editable for tens of years and insisted on a standardised format supported by multiple vendors. Today, SGML is no longer widely used beyond specific applications, but its descendants including HTML and XML are now widespread.

A markup language is a structured way of adding annotations about the formatting of the document within the document itself. Example markup languages include troff, which is used to write the Unix man pages, or LaTeX.
HTML uses markers to annotate text and a document is composed of HTML elements. Each element is usually composed of three items: a start tag that potentially includes some specific attributes, some text (often including other elements), and an end tag. An HTML tag is a keyword enclosed in angle brackets. The generic form of an HTML element is

    <tag>Some text to be displayed</tag>

More complex HTML elements can also include optional attributes in the start tag

    <tag attribute1="value1" attribute2="value2">some text to be displayed</tag>

The HTML document shown below is composed of two parts: a header, delineated by the <head> and </head> markers, and a body (between the <body> and </body> markers). In the example below, the header only contains a title, but other types of information can be included in the header. The body contains an image, some text and a list with three hyperlinks. The image is included in the web page by indicating its URI between brackets inside the <img src="..."> marker. The image can, of course, reside on any server and the client will automatically download it when rendering the web page. The <h1>...</h1> marker is used to specify the first level of headings. The <ul> marker indicates an unnumbered list while the <li> marker indicates a list item. The <a href="URI">text</a> marker indicates a hyperlink. The text will be underlined in the rendered web page and the client will fetch the specified URI when the user clicks on the link.

Figure 3.18: A simple HTML page

Additional details about the various extensions to HTML may be found in the official specifications maintained by W3C.

The third component of the world wide web is the HyperText Transfer Protocol (HTTP). HTTP is a text-based protocol, in which the client sends a request and the server returns a response. HTTP runs above the byte-stream service and HTTP servers listen by default on port 80. The design of HTTP has largely been inspired by the Internet email protocols. Each HTTP request contains three parts:

- a method, that indicates the type of request, a URI, and the version of the HTTP protocol used by the client

- a header, that is used by the client to specify optional parameters for the request. An empty line is used to mark the end of the header
- an optional MIME document attached to the request

The response sent by the server also contains three parts:

- a status line, that indicates whether the request was successful or not
- a header, that contains additional information about the response. The response header ends with an empty line.
- a MIME document

Figure 3.19: HTTP requests and responses

Several types of method can be used in HTTP requests. The three most important ones are:

- the GET method is the most popular one. It is used to retrieve a document from a server. The GET method is encoded as GET followed by the path of the URI of the requested document and the version of HTTP used by the client. For example, to retrieve the http://www.w3.org/MarkUp/ URI, a client must open a TCP connection on port 80 with host www.w3.org and send an HTTP request containing the following line:

      GET /MarkUp/ HTTP/1.0

- the HEAD method is a variant of the GET method that allows the retrieval of the header lines for a given URI without retrieving the entire document. It can be used by a client to verify if a document exists, for instance.
- the POST method can be used by a client to send a document to a server. The sent document is attached to the HTTP request as a MIME document.

HTTP clients and servers can include many different HTTP headers in HTTP requests and responses. Each HTTP header is encoded as a single ASCII line terminated by CR and LF. Several of these headers are briefly described below. A detailed discussion of all standard headers may be found in RFC 1945. The MIME headers can appear in both HTTP requests and HTTP responses.

- the Content-Length: header is the MIME header that indicates the length of the MIME document in bytes.
- the Content-Type: header is the MIME header that indicates the type of the attached MIME document. HTML pages use the text/html type.
- the Content-Encoding: header indicates how the MIME document has been encoded. For example, this header would be set to x-gzip for a document compressed using the gzip software.
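The structure of an HTTP request described above (a request line, header lines, an empty line and an optional document) can be sketched as a small function that assembles the bytes to be sent on the TCP connection. The function name `build_request` and its parameters are assumptions made for this illustration; the sketch does not open any connection.

```python
from typing import Dict, Optional


def build_request(method: str, path: str,
                  headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"",
                  version: str = "HTTP/1.0") -> bytes:
    """Assemble an HTTP request: request line, headers, empty line, document."""
    all_headers = dict(headers or {})
    if body:
        # The length of the attached document is announced in a MIME header.
        all_headers.setdefault("Content-Length", str(len(body)))
    lines = ["{} {} {}".format(method, path, version)]
    lines += ["{}: {}".format(name, value) for name, value in all_headers.items()]
    # Each line ends with CR LF; an empty line marks the end of the header.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "/MarkUp/"))
    print(build_request("POST", "/form", {"Content-Type": "text/plain"}, b"hello"))
```

Sending the first request over a TCP connection to port 80 of www.w3.org would correspond to the GET example given in the text.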
RFC 1945 and RFC 2616 define headers that are specific to HTTP responses. These server headers include:

- the Server: header indicates the version of the web server that has generated the HTTP response. Some servers provide information about their software release and optional modules that they use. For security reasons, some system administrators disable these headers to avoid revealing too much information about their server to potential attackers.
- the Date: header indicates when the HTTP response has been produced by the server.
- the Last-Modified: header indicates the date and time of the last modification of the document attached to the HTTP response.

Similarly, the following header lines can only appear inside HTTP requests sent by a client:

- the User-Agent: header provides information about the client that has generated the HTTP request. Some servers analyse this header line and return different headers and sometimes different documents for different user agents.
- the If-Modified-Since: header is followed by a date. It enables clients to cache in memory or on disk the recent or most frequently used documents. When a client needs to request a URI from a server, it first checks whether the document is already in its cache. If it is, the client sends an HTTP request with the If-Modified-Since: header indicating the date of the cached document. The server will only return the document attached to the HTTP response if it is newer than the version stored in the client's cache.
- the Referer: header (the header name is spelled this way in the HTTP specifications) is followed by a URI. It indicates the URI of the document that the client visited before sending this HTTP request. Thanks to this header, the server can know the URI of the document containing the hyperlink followed by the client, if any. This information is very useful to measure the impact of advertisements containing hyperlinks placed on websites.
- the Host: header contains the fully qualified domain name of the URI being requested.
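The header lines described above are simple "name: value" lines, which makes them easy to process. The sketch below parses a header block and applies the If-Modified-Since: rule on the server side. The function names `parse_headers` and `must_send_document`, as well as the sample request headers, are assumptions made for this illustration; real servers handle more cases (repeated headers, malformed dates).

```python
from email.utils import parsedate_to_datetime
from typing import Dict


def parse_headers(block: str) -> Dict[str, str]:
    """Parse CR LF terminated header lines up to the first empty line."""
    headers = {}
    for line in block.split("\r\n"):
        if not line:          # an empty line marks the end of the header
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def must_send_document(last_modified: str, request_headers: Dict[str, str]) -> bool:
    """Return True if the document is newer than the client's cached copy."""
    cached = request_headers.get("if-modified-since")
    if cached is None:
        return True           # the client has no cached copy
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(cached)


if __name__ == "__main__":
    # Hypothetical request headers sent by a client that has a cached copy.
    block = ("Host: www.example.net\r\n"
             "User-Agent: sketch/0.1\r\n"
             "If-Modified-Since: Tue, 09 Mar 2010 21:26:53 GMT\r\n"
             "\r\n")
    headers = parse_headers(block)
    print(headers["host"])
    print(must_send_document("Mon, 15 Mar 2010 13:40:38 GMT", headers))
```

Header names are stored in lower case here because HTTP header names are not case sensitive.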
Note: The importance of the Host: header line

The first version of HTTP did not include the Host: header line. This was a severe limitation for web hosting companies. For example, consider a web hosting company that wants to serve both web.example.com and www.example.net on the same physical server. Both websites contain a /index.html document. When a client sends a request for either http://web.example.com/index.html or http://www.example.net/index.html, the HTTP 1.0 request contains the following line:

    GET /index.html HTTP/1.0

By parsing this line, a server cannot determine which index.html file is requested. Thanks to the Host: header line, the server knows whether the request is for http://web.example.com/index.html or http://www.example.net/index.html. Without the Host: header, this is impossible. The Host: header line allowed web hosting companies to develop their business by supporting a large number of independent web servers on the same physical server.

The status line of the HTTP response begins with the version of HTTP used by the server (usually HTTP/1.0 defined in RFC 1945 or HTTP/1.1 defined in RFC 2616) followed by a three-digit status code and additional information in English. HTTP status codes have a structure similar to that of the reply codes used by SMTP.

- All status codes starting with digit 2 indicate a valid response. 200 Ok indicates that the HTTP request was successfully processed by the server and that the response is valid.
- All status codes starting with digit 3 indicate that the client must take further action to obtain the document, typically because it is no longer available at the requested URI. 301 Moved Permanently indicates that the requested document is no longer available on this server. A Location: header containing the new URI of the requested document is inserted in the HTTP response. 304 Not Modified is used in response to an HTTP request containing the If-Modified-Since: header. This status line is used by the server if the document stored on the server is not more recent than the date indicated in the If-Modified-Since: header.
- All status codes starting with digit 4 indicate that the server has detected an error in the HTTP request sent by the client. 400 Bad Request indicates a syntax error in the HTTP request. 404 Not Found indicates that the requested document does not exist on the server.
- All status codes starting with digit 5 indicate an error on the server. 500 Internal Server Error indicates that the server could not process the request due to an error on the server itself.

In both the HTTP request and the HTTP response, the MIME document refers to a representation of the document with the MIME headers indicating the type of document and its size.

As an illustration of HTTP/1.0, the transcript below shows an HTTP request for http://www.ietf.org and the corresponding HTTP response. The HTTP request was sent using the curl command line tool. The User-Agent: header line contains more information about this client software. There is no MIME document attached to this HTTP request, and it ends with a blank line.

    GET / HTTP/1.0
    User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
    Host: www.ietf.org

The HTTP response indicates the version of the server software used with the modules included. The Last-Modified: header indicates that the requested document was modified about one week before the request. An HTML document (not shown) is attached to the response. Note the blank line between the header of the HTTP response and the attached MIME document. The Server: header line has been truncated in this output.

    HTTP/1.1 200 OK
    Date: Mon, 15 Mar 2010 13:40:38 GMT
    Server: Apache/2.2.4 (Linux/SUSE) mod_ssl/2.2.4 OpenSSL/0.9.8e (truncated)
    Last-Modified: Tue, 09 Mar 2010 21:26:53 GMT
    Content-Length: 17019
    Content-Type: text/html

HTTP was initially designed to share self-contained text documents. For this reason, and to ease the implementation of clients and servers, the designers of HTTP chose to open a TCP connection for each HTTP request. This implies that a client must open one TCP connection for each URI that it wants to retrieve from a server, as illustrated in the figure below. For a web page containing only text documents this was a reasonable design choice as the client usually remains idle while the (human) user is reading the retrieved document.
Figure 3.20: HTTP 1.0 and the underlying TCP connection

However, as the web evolved to support richer documents containing images, opening a TCP connection for each URI became a performance problem [Mogul1995]. Indeed, besides its HTML part, a web page may include dozens of images or more. Forcing the client to open a TCP connection for each component of a web page has two important drawbacks. First, the client and the server must exchange packets to open and close a TCP connection as we will see later. This increases the network overhead and the total delay of completely retrieving all the components of a web page. Second, a large number of established TCP connections may be a performance bottleneck on servers.

This problem was solved by extending HTTP to support persistent TCP connections RFC 2616. A persistent connection is a TCP connection over which a client may send several HTTP requests. This is illustrated in the figure below.
Figure 3.21: HTTP 1.1 persistent connections

To allow the clients and servers to control the utilisation of these persistent TCP connections, HTTP 1.1 RFC 2616 defines several new HTTP headers:

- The Connection: header is used with the Keep-Alive argument by the client to indicate that it expects the underlying TCP connection to be persistent. When this header is used with the Close argument, it indicates that the entity that sent it will close the underlying TCP connection at the end of the HTTP response.
- The Keep-Alive: header is used by the server to inform the client about how it agrees to use the persistent connection. A typical Keep-Alive: contains two parameters: the maximum number of requests that the server agrees to serve on the underlying TCP connection and the timeout (in seconds) after which the server will close an idle connection.

The example below shows the operation of HTTP/1.1 over a persistent TCP connection to retrieve three URIs stored on the same server. Once the connection has been established, the client sends its first request with the Connection: keep-alive header to request a persistent connection.

    GET / HTTP/1.1
    Host: www.kame.net
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us)
    Connection: keep-alive

The server replies with the Connection: Keep-Alive header and indicates that it accepts a maximum of 100 HTTP requests over this connection and that it will close the connection if it remains idle for 15 seconds.

    HTTP/1.1 200 OK
    Date: Fri, 19 Mar 2010 09:23:37 GMT
    Server: Apache/2.0.63 (FreeBSD) PHP/5.2.12 with Suhosin-Patch
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Content-Length: 3462
    Content-Type: text/html
    ...

The client sends a second request for the style sheet of the retrieved web page.

    GET /style.css HTTP/1.1
    Host: www.kame.net
    Referer: http://www.kame.net/
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us)
    Connection: keep-alive
The server replies with the requested style sheet and maintains the persistent connection. Note that the server only accepts 99 remaining HTTP requests over this persistent connection.

    HTTP/1.1 200 OK
    Date: Fri, 19 Mar 2010 09:23:37 GMT
    Server: Apache/2.0.63 (FreeBSD) PHP/5.2.12 with Suhosin-Patch
    Last-Modified: Mon, 10 Apr 2006 05:06:39 GMT
    Content-Length: 2235
    Keep-Alive: timeout=15, max=99
    Connection: Keep-Alive
    Content-Type: text/css
    ...

Then the client automatically requests the web server's icon [18], that could be displayed by the browser. This server does not contain such URI and thus replies with a 404 HTTP status. However, the underlying TCP connection is not closed immediately.

    GET /favicon.ico HTTP/1.1
    Host: www.kame.net
    Referer: http://www.kame.net/
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us)
    Connection: keep-alive

    HTTP/1.1 404 Not Found
    Date: Fri, 19 Mar 2010 09:23:40 GMT
    Server: Apache/2.0.63 (FreeBSD) PHP/5.2.12 with Suhosin-Patch
    Content-Length: 318
    Keep-Alive: timeout=15, max=98
    Connection: Keep-Alive
    Content-Type: text/html; charset=iso-8859-1
    ...

As illustrated above, a client can send several HTTP requests over the same persistent TCP connection. However, it is important to note that all of these HTTP requests are considered to be independent by the server. Each HTTP request must be self-contained. This implies that each request must include all the header lines that are required by the server to understand the request. The independence of these requests is one of the important design choices of HTTP. As a consequence of this design choice, when a server processes an HTTP request, it does not use any information other than what is contained in the request itself. This explains why the client adds its User-Agent: header in all of the HTTP requests it sends over the persistent TCP connection.
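The two points above (every request repeats all the headers the server needs, and the server advertises its limits in the Keep-Alive: header) can be sketched in a few lines. This is only an illustration written for Python 3: build_request and parse_keep_alive are hypothetical helper names, the User-Agent value is made up, and no network traffic is generated.

```python
def build_request(path, host, user_agent="example-client/0.1"):
    """Return one self-contained HTTP/1.1 request as a string."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "User-Agent: %s" % user_agent,   # repeated in every request
        "Connection: keep-alive",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"   # a blank line ends the request

def parse_keep_alive(value):
    """Parse a value such as 'timeout=15, max=100' into a dictionary."""
    params = {}
    for item in value.split(","):
        name, sep, number = item.strip().partition("=")
        if sep and number.strip().isdigit():
            params[name.strip().lower()] = int(number)
    return params

# The three requests of the transcript carry the same headers each time.
for path in ("/", "/style.css", "/favicon.ico"):
    print(build_request(path, "www.kame.net").replace("\r\n", "\n"))

print(parse_keep_alive("timeout=15, max=99"))
```

Because each request is complete on its own, the server can process it without remembering anything about the previous requests received over the same connection.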
However, in practice, some servers want to provide content tuned for each user. For example, some servers can provide information in several languages or other servers want to provide advertisements that are targeted to different types of users. To do this, servers need to maintain some information about the preferences of each user and use this information to produce content matching the user's preferences. HTTP contains several mechanisms that make it possible to solve this problem. We discuss three of them below.

A first solution is to force the users to be authenticated. This was the solution used by FTP to control the files that each user could access. Initially, user names and passwords could be included inside URIs RFC 1738. However, placing passwords in the clear in a potentially publicly visible URI is completely insecure and this usage has now been deprecated RFC 3986. HTTP supports several extension headers RFC 2617 that can be used by a server to request the authentication of the client by providing his/her credentials. However, user names and passwords have not been popular on web servers as they force human users to remember one user name and one password per server. Remembering a password is acceptable when a user needs to access protected content, but users will not accept the need for a user name and password only to receive targeted advertisements from the web sites that they visit.

[18] Favorite icons are small icons that are used to represent web servers in the toolbar of Internet browsers. Microsoft added this feature in their browsers without taking into account the W3C standards. See http://www.w3.org/2005/10/howto-favicon for a discussion on how to cleanly support such favorite icons.

A second solution to allow servers to tune that content to the needs and capabilities of the user is to rely on the different types of Accept-* HTTP headers. For example, the Accept-Language: can be used by the client to
indicate its preferred languages. Unfortunately, in practice this header is usually set based on the default language of the browser and it is not possible for a user to indicate the language it prefers to use by selecting options on each visited web server.

The third, and widely adopted, solution is HTTP cookies. HTTP cookies were initially developed as a private extension by Netscape. They are now part of the standard RFC 6265. In a nutshell, a cookie is a short string that is chosen by a server to represent a given client. Two HTTP headers are used: Cookie: and Set-Cookie:. When a server receives an HTTP request from a new client (i.e. an HTTP request that does not contain the Cookie: header), it generates a cookie for the client and includes it in the Set-Cookie: header of the returned HTTP response. The Set-Cookie: header contains several additional parameters including the domain names for which the cookie is valid. The client stores all received cookies on disk and every time it sends an HTTP request, it verifies whether it already knows a cookie for this domain. If so, it attaches the Cookie: header to the HTTP request. This is illustrated in the figure below with HTTP 1.1, but cookies also work with HTTP 1.0.

Figure 3.22: HTTP cookies

Note: Privacy issues with HTTP cookies

The HTTP cookies introduced by Netscape are key for large e-commerce web sites. However, they have also raised many discussions concerning their potential misuses. Consider ad.com, a company that delivers lots of advertisements on web sites. A web site that wishes to include ad.com's advertisements next to its content will add links to ad.com inside its HTML pages. If ad.com is used by many web sites, ad.com could be able to track the interests of all the users that visit its client web sites and use this information to provide targeted advertisements.
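The exchange of the Set-Cookie: and Cookie: headers described before this note can be sketched with the http.cookies module of the Python 3 standard library. This is a simplified model under stated assumptions: the three function names, the cookie name session and the domain www.example.net are chosen for illustration only, HTTP headers are represented as plain dictionaries, and the client keeps its cookies in memory instead of on disk.

```python
import secrets
from http.cookies import SimpleCookie

def server_response_headers(request_headers):
    """Server side: issue a cookie only when the request carries none."""
    if "Cookie" in request_headers:
        return {}                                   # client already known
    cookie = SimpleCookie()
    cookie["session"] = secrets.token_hex(8)        # hypothetical cookie name
    cookie["session"]["domain"] = "www.example.net"
    return {"Set-Cookie": cookie["session"].OutputString()}

def client_store(jar, response_headers):
    """Client side: remember a cookie received in a Set-Cookie: header."""
    if "Set-Cookie" in response_headers:
        jar.load(response_headers["Set-Cookie"])

def client_request_headers(jar):
    """Client side: attach the Cookie: header if a cookie is already known."""
    if not jar:
        return {}
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    return {"Cookie": pairs}

jar = SimpleCookie()                                 # the client's cookie store
first_response = server_response_headers(client_request_headers(jar))
client_store(jar, first_response)
second_request = client_request_headers(jar)
print(first_response)
print(second_request)
```

The first request carries no Cookie: header, so the server generates one; the second request returns the same value, which is how the server recognises the client.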
Privacy advocates have even sued online advertisement companies to force them to comply with the privacy regulations. More recent related technologies also raise privacy concerns.

3.3 Writing simple networked applications

Networked applications were usually implemented by using the socket API. This API was designed when TCP/IP was first implemented in the Unix BSD operating system [Sechrest] [LFJLMT], and has served as the model for many APIs between applications and the networking stack in an operating system. Although the socket API is very popular, other APIs have also been developed. For example, the STREAMS API has been added to several Unix System V variants [Rago1993]. The socket API is supported by most programming languages and several textbooks have been devoted to it. Users of the C language can consult [DC2009], [Stevens1998], [SFR2004] or [Kerrisk2010]. The Java implementation of the socket API is described in [CD2008] and in the Java tutorial. In this section, we will use the python implementation of the socket API to illustrate the key concepts. Additional information about this API may be found in the socket section of the python documentation.
The socket API is quite low-level and should be used only when you need a complete control of the network access. If your application simply needs, for instance, to retrieve data with HTTP, there are much simpler and higher-level APIs.

A detailed discussion of the socket API is outside the scope of this section and the references cited above provide a detailed discussion of all the details of the socket API. As a starting point, it is interesting to compare the socket API with the service primitives that we have discussed in the previous chapter. Let us first consider the connectionless service that consists of the following two primitives:

- DATA.request(destination, message) is used to send a message to a specified destination. In this socket API, this corresponds to the send method.
- DATA.indication(message) is issued by the transport service to deliver a message to the application. In the socket API, this corresponds to the return of the recv method that is called by the application.

The DATA primitives are exchanged through a service access point. In the socket API, the equivalent to the service access point is the socket. A socket is a data structure which is maintained by the networking stack and is used by the application every time it needs to send or receive data through the networking stack. The socket method in the python API takes two main arguments:

- an address family that specifies the type of address family and thus the underlying networking stack that will be used with the socket. This parameter can be either socket.AF_INET or socket.AF_INET6. socket.AF_INET, which corresponds to the TCP/IPv4 protocol stack is the default. socket.AF_INET6 corresponds to the TCP/IPv6 protocol stack.
- a type indicates the type of service which is expected from the networking stack. socket.SOCK_STREAM (the default) corresponds to the reliable bytestream connection-oriented service. socket.SOCK_DGRAM corresponds to the connectionless service.

A simple client that sends a request to a server is often written as follows in descriptions of the socket API.

    # A simple client of the connectionless service
    import socket
    import sys
    HOSTIP = sys.argv[1]
    PORT = int(sys.argv[2])
    MSG = "Hello, World!"
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.sendto(MSG, (HOSTIP, PORT))

A typical usage of this application would be

    python client.py 127.0.0.1 12345

where 127.0.0.1 is the IPv4 address of the host (in this case the localhost) where the server is running and 12345 the port of the server.

The first operation is the creation of the socket. Two parameters must be specified while creating a socket. The first parameter indicates the address family and the second the socket type. The second operation is the transmission of the message by using sendto to the server. It should be noted that sendto takes as arguments the message to be transmitted and a tuple that contains the IPv4 address of the server and its port number.

The code shown above supports only the TCP/IPv4 protocol stack. To use the TCP/IPv6 protocol stack the socket must be created by using the socket.AF_INET6 address family. Forcing the application developer to select TCP/IPv4 or TCP/IPv6 when creating a socket is a major hurdle for the deployment and usage of TCP/IPv6 in the global Internet [Cheshire2010]. While most operating systems support both TCP/IPv4 and TCP/IPv6, many applications still only use TCP/IPv4 by default. In the long term, the socket API should be able to handle TCP/IPv4 and TCP/IPv6 transparently and should not force the application developer to always specify whether it uses TCP/IPv4 or TCP/IPv6.

Another important issue with the socket API as supported by python is that it forces the application to deal with IP addresses instead of dealing directly with domain names. This limitation dates from the early days of the socket API in Unix 4.2BSD. At that time, the DNS was not widely available and only IP addresses could be used. Most applications rely on DNS names to interact with servers and this utilisation of the DNS plays a very important role to scale web servers and content distribution networks. To use domain names, the application needs
to perform the DNS resolution by using the getaddrinfo method. This method queries the DNS and builds the sockaddr data structure which is used by other methods of the socket API. In python, getaddrinfo takes several arguments:

- a name that is the domain name for which the DNS will be queried
- an optional port number which is the port number of the remote server
- an optional address family which indicates the address family used for the DNS request. socket.AF_INET (resp. socket.AF_INET6) indicates that an IPv4 (IPv6) address is expected. Furthermore, the python socket API allows an application to use socket.AF_UNSPEC to indicate that it is able to use either IPv4 or IPv6 addresses.
- an optional socket type which can be either socket.SOCK_DGRAM or socket.SOCK_STREAM

On today's Internet hosts, which are capable of supporting both IPv4 and IPv6, all applications should be able to handle both IPv4 and IPv6 addresses. When used with the socket.AF_UNSPEC parameter, the socket.getaddrinfo method returns a list of tuples containing all the information to create a socket.

    import socket
    socket.getaddrinfo('www.example.net', 80, socket.AF_UNSPEC, socket.SOCK_STREAM)
    [(30, 1, 6, '', ('2001:db8:3080:3::2', 80, 0, 0)),
     (2, 1, 6, '', ('203.0.113.225', 80))]

In the example above, socket.getaddrinfo returns two tuples. The first one corresponds to the sockaddr containing the IPv6 address of the remote server and the second corresponds to the IPv4 information. Due to some peculiarities of IPv6 and IPv4, the format of the two tuples is not exactly the same, but the key information in both cases is the network layer address (2001:db8:3080:3::2 and 203.0.113.225) and the port number (80). The other parameters are seldom used.
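The structure of the tuples returned by socket.getaddrinfo can be explored without querying the DNS by passing literal addresses together with the socket.AI_NUMERICHOST flag. The sketch below is written for Python 3; list_endpoints is a hypothetical helper name, and 2001:db8::1 is a documentation address used only as an example.

```python
import socket

def list_endpoints(host, port):
    """Return (stack, address, port) for each result of getaddrinfo."""
    results = socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                                 socket.SOCK_STREAM, 0,
                                 socket.AI_NUMERICHOST)  # literal address, no DNS query
    endpoints = []
    for family, socktype, proto, canonname, sockaddr in results:
        # IPv4 sockaddr: (address, port)
        # IPv6 sockaddr: (address, port, flow info, scope id)
        stack = "IPv6" if family == socket.AF_INET6 else "IPv4"
        endpoints.append((stack, sockaddr[0], sockaddr[1]))
    return endpoints

for literal in ("127.0.0.1", "2001:db8::1"):
    try:
        print(list_endpoints(literal, 80))
    except socket.gaierror as error:      # e.g. a platform built without IPv6 support
        print("cannot parse", literal, ":", error)
```

In both formats the address and the port are the first two elements of sockaddr, so code that only needs these two values can treat IPv4 and IPv6 results in the same way.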
socket.getaddrinfo can be used to build a simple client that queries the DNS and contacts the server by using either IPv4 or IPv6 depending on the addresses returned by the socket.getaddrinfo method. The client below iterates over the list of addresses returned by the DNS and sends its request to the first destination address for which it can create a socket. Other strategies are of course possible. For example, a host running in an IPv6 network might prefer to always use IPv6 when IPv6 is available [19]. Another example is the happy eyeballs approach which is being discussed within the IETF [WY2011]. For example, [WY2011] mentions that some web browsers try to use the first address returned by socket.getaddrinfo. If there is no answer within some small delay (e.g. 300 milliseconds), the second address is tried.

    import socket
    import sys
    HOSTNAME = sys.argv[1]
    PORT = int(sys.argv[2])
    MSG = "Hello, World!"
    for a in socket.getaddrinfo(HOSTNAME, PORT, socket.AF_UNSPEC, socket.SOCK_DGRAM, 0, socket.AI_PASSIVE):
        address_family, sock_type, protocol, canonicalname, sockaddr = a
        try:
            s = socket.socket(address_family, sock_type)
        except socket.error:
            s = None
            print "Could not create socket"
            continue
        if s is not None:
            s.sendto(MSG, sockaddr)
            break

[19] Most operating systems today by default prefer to use IPv6 when the DNS returns both an IPv4 and an IPv6 address for a name. See http://ipv6int.net/systems/ for more detailed information.

Now that we have described the utilisation of the socket API to write a simple client using the connectionless transport service, let us have a closer look at the reliable byte stream transport service. As explained above, this service is invoked by creating a socket of type socket.SOCK_STREAM. Once a socket has been created, a client will typically connect to the remote server, send some data, wait for an answer and eventually close the connection. These operations are performed by calling the following methods:
- socket.connect: this method takes a sockaddr data structure, typically returned by socket.getaddrinfo, as argument. It may fail and raise an exception if the remote server cannot be reached.
- socket.send: this method takes a string as argument and returns the number of bytes that were actually sent. The string will be transmitted as a sequence of consecutive bytes to the remote server. Applications are expected to check the value returned by this method and should resend the bytes that were not sent.
- socket.recv: this method takes an integer as argument that indicates the size of the buffer that has been allocated to receive the data. An important point to note about the utilisation of the socket.recv method is that as it runs above a bytestream service, it may return any amount of bytes (up to the size of the buffer provided by the application). The application needs to collect all the received data and there is no guarantee that some data sent by the remote host by using a single call to the socket.send method will be received by the destination with a single call to the socket.recv method.
- socket.shutdown: this method is used to release the underlying connection. On some platforms, it is possible to specify the direction of transfer to be released (e.g. socket.SHUT_WR to release the outgoing direction or socket.SHUT_RDWR to release both directions).
- socket.close: this method is used to close the socket. It calls socket.shutdown if the underlying connection is still open.
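The remarks above about socket.send and socket.recv (check how many bytes were actually sent, and collect the data over as many recv calls as needed) can be tried locally with a pair of connected sockets. The sketch below is written for Python 3 and uses socket.socketpair in place of a real client and server; send_all and recv_until_closed are hypothetical helper names, and the message and buffer sizes are arbitrary.

```python
import socket

def send_all(sock, data):
    """Call send repeatedly until every byte of data has been sent."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])     # may send fewer bytes than requested
        if sent == 0:
            raise ConnectionError("connection closed by peer")
        total += sent
    return total

def recv_until_closed(sock, bufsize):
    """Collect data until the peer releases its direction of the connection."""
    chunks, calls = [], 0
    while True:
        chunk = sock.recv(bufsize)         # returns at most bufsize bytes
        calls += 1
        if not chunk:                      # empty result: no more data will come
            break
        chunks.append(chunk)
    return b"".join(chunks), calls

left, right = socket.socketpair()          # two connected byte stream sockets
message = b"x" * 3000
send_all(left, message)
left.shutdown(socket.SHUT_WR)              # release the outgoing direction
received, calls = recv_until_closed(right, bufsize=1024)
print(len(received), "bytes received in", calls, "recv calls")
left.close()
right.close()
```

A single logical message was sent here, but the receiver needs several recv calls to read it, which is the behaviour an application running above a byte stream service has to expect. Python also provides socket.sendall, which performs the same loop as send_all.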
With these methods, it is now possible to write a simple HTTP client. This client operates over both IPv6 and IPv4 and writes the home page of the remote server on the standard output. It also reports the number of socket.recv calls that were used to retrieve the home page [20].

[20] Experiments with the client indicate that the number of socket.recv calls can vary at each run. There are various factors that influence the number of such calls that are required to retrieve some information from a server. We'll discuss some of them after having explained the operation of the underlying transport protocol.

    #!/usr/bin/python
    # A simple http client that retrieves the first page of a web site
    import socket, sys

    if len(sys.argv) != 3 and len(sys.argv) != 2:
        print "Usage:", sys.argv[0], "hostname [port]"
        sys.exit(1)
    hostname = sys.argv[1]
    if len(sys.argv) == 3:
        port = int(sys.argv[2])
    else:
        port = 80

    READBUF = 16384  # size of data read from web server
    s = None

    for res in socket.getaddrinfo(hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        # create socket
        try:
            s = socket.socket(af, socktype, proto)
        except socket.error:
            s = None
            continue
        # connect to remote host
        try:
            print "Trying " + sa[0]
            s.connect(sa)
        except socket.error, msg:
            # socket failed
            s.close()
            s = None
            continue
        if s:
            print "Connected to " + sa[0]
            s.send('GET / HTTP/1.1\r\nHost: ' + hostname + '\r\n\r\n')
            finished = False
            count = 0
            while not finished:
                data = s.recv(READBUF)
                count = count + 1
                if len(data) != 0:
                    print repr(data)
                else:
                    finished = True
            s.shutdown(socket.SHUT_WR)
            s.close()
            print "Data was received in ", count, " recv calls"
            break

As mentioned above, the socket API is very low-level. This is the interface to the transport service. For a common and simple task, like retrieving a document from the Web, there are much simpler solutions. For example, the python standard library includes several high-level APIs to implementations of various application layer protocols including HTTP. For example, the httplib module can be used to easily access documents via HTTP.

    #!/usr/bin/python
    # A simple http client that retrieves the first page of a web site, using
    # the standard httplib library
    import httplib, sys

    if len(sys.argv) != 3 and len(sys.argv) != 2:
        print "Usage:", sys.argv[0], "hostname [port]"
        sys.exit(1)
    path = '/'
    hostname = sys.argv[1]
    if len(sys.argv) == 3:
        port = int(sys.argv[2])
    else:
        port = 80
    conn = httplib.HTTPConnection(hostname, port)
    conn.request("GET", path)
    r = conn.getresponse()
    print "Response is %i (%s)" % (r.status, r.reason)
    print r.read()

Another module, urllib2, allows the programmer to directly use URLs. This is much simpler than directly using sockets.

But simplicity is not the only advantage of using high-level libraries. They allow the programmer to manipulate higher-level concepts (e.g. "I want the content pointed by this URL") but also include many features such as transparent support for the utilisation of TLS or IPv6.
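To give an idea of what "directly using URLs" looks like, the sketch below fetches a document with a single call. It is written for Python 3, where the httplib and urllib2 modules mentioned above are named http.client and urllib.request. So that it can run without Internet access, it first starts a small local web server with http.server; the HelloHandler class, the loopback address and the response body are assumptions made for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text."""
    def do_GET(self):
        body = b"Hello, World!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                    # keep the example output quiet

server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: one URL, no explicit socket, connect, send or recv.
url = "http://127.0.0.1:%d/" % server.server_port
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxy settings
with opener.open(url) as response:
    status, body = response.status, response.read()
print(status, body)

server.shutdown()
server.server_close()
```

The client part is three lines long: name resolution, connection establishment, the request syntax and the parsing of the response are all handled by the library.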
The second type of applications that can be written by using the socket API are the servers. A server typically runs forever waiting to process requests coming from remote clients. A server using the connectionless service will typically start with the creation of a socket with the socket.socket method. This socket can be created above the TCP/IPv4 networking stack (socket.AF_INET) or the TCP/IPv6 networking stack (socket.AF_INET6), but not both by default. If a server is willing to use the two networking stacks, it must create two threads, one to handle the TCP/IPv4 socket and the other to handle the TCP/IPv6 socket. It is unfortunately impossible to define a socket that can receive data from both networking stacks at the same time with the python socket API.

A server using the connectionless service will typically use two methods from the socket API in addition to those that we have already discussed.

- socket.bind is used to bind a socket to a port number and optionally an IP address. Most servers will bind their socket to all available interfaces on the servers, but there are some situations where the server
may prefer to be bound only to specific IP addresses. For example, a server running on a smartphone might want to be bound to the IP address of the WiFi interface but not on the 3G interface that is more expensive.
- socket.recvfrom is used to receive data from the underlying networking stack. This method returns both the sender's address and the received data.

The code below illustrates a very simple server running above the connectionless transport service that simply prints on the standard output all the received messages. This server uses the TCP/IPv6 networking stack.

    import socket, sys

    PORT = int(sys.argv[1])
    BUFF_LEN = 8192
    s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    s.bind(('', PORT, 0, 0))
    while True:
        data, addr = s.recvfrom(BUFF_LEN)
        if data == "STOP":
            print "Stopping server"
            sys.exit(0)
        print "received from ", addr, " message:", data

A server that uses the reliable byte stream service can also be built above the socket API. Such a server starts by creating a socket that is bound to the port that has been chosen for the server. Then the server calls the socket.listen method. This informs the underlying networking stack of the number of transport connection attempts that can be queued in the underlying networking stack waiting to be accepted and processed by the server. The server typically has a thread waiting on the socket.accept method. This method returns as soon as a connection attempt is received by the underlying stack. It returns a socket that is bound to the established connection and the address of the remote host. With these methods, it is possible to write a very simple web server that always returns a 404 error to all GET requests and a 501 error to all other requests.
    # An extremely simple HTTP server
    import socket, sys, time

    # Server runs on all IP addresses by default
    HOST = ''
    # 8080 can be used without root privileges
    PORT = 8080
    BUFLEN = 8192  # buffer size

    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        print "Starting HTTP server on port ", PORT
        s.bind((HOST, PORT, 0, 0))
    except socket.error:
        print "Cannot bind to port :", PORT
        sys.exit(-1)

    s.listen(10)  # maximum 10 queued connections

    while True:
        # a real server would be multithreaded and would catch exceptions
        conn, addr = s.accept()
        print "Connection from ", addr
        data = ''
        while not '\n' in data:  # wait until first line has been received
            data = data + conn.recv(BUFLEN)
        if data.startswith('GET'):
            # GET request
            conn.send('HTTP/1.0 404 Not Found\r\n')
            # a real server should serve files
        else:
            # other type of HTTP request
            conn.send('HTTP/1.0 501 Not implemented\r\n')
        now = time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime())
        conn.send('Date: ' + now + '\r\n')
        conn.send('Server: Dummy-HTTP-Server\r\n')
        conn.send('\r\n')
        conn.shutdown(socket.SHUT_RDWR)
        conn.close()

This server is far from a production-quality web server. A real web server would use multiple threads and/or non-blocking IO to process a large number of concurrent requests [21]. Furthermore, it would also need to handle all the errors that could happen while receiving data over a transport connection. These are outside the scope of this section and additional information on more complex networked applications may be found elsewhere. For example, [RG2010] provides an in-depth discussion of the utilisation of the socket API with python while [SFR2004] remains an excellent source of information on the socket API in C.

3.4 Summary

In this chapter, we began by describing the client-server and peer-to-peer models. We then described, in detail, three important families of protocols in the application layer. The Internet identifies hosts by using 32 bits IPv4 or 128 bits IPv6 addresses. However, using these addresses directly inside applications would be difficult for the humans that use them. We have explained how the Domain Name System allows the mapping of names to corresponding addresses. We have described both the DNS protocol that runs above UDP and the naming hierarchy. We have then discussed one of the oldest applications on the Internet: electronic mail. We have described the format of email messages and described the SMTP protocol that is used to send email messages as well as the POP protocol that is used by email recipients to retrieve their email messages from their server. Finally, we have explained the protocols that are used in the world wide web and the HyperText Transfer Protocol in particular.

3.5 Exercises

This section contains several exercises and small challenges about the application layer protocols.
3.5.1 The Domain Name System

The Domain Name System (DNS) plays a key role in the Internet today as it allows applications to use fully qualified domain names (FQDN) instead of IPv4 or IPv6 addresses. Many tools allow queries to be performed through DNS servers. For this exercise, we will use dig which is installed on most Unix systems.

A typical usage of dig is as follows

    dig @server -t type fqdn

where

- server is the IP address or the name of a DNS server or resolver
- type is the type of DNS record that is requested by the query such as NS for a nameserver, A for an IPv4 address, AAAA for an IPv6 address, MX for a mail relay, ...
- fqdn is the fully qualified domain name being queried

1. What are the IP addresses of the resolvers that the dig implementation you are using relies on [22]?

[21] There are many production quality web server software packages available. apache is a very complex but widely used one. thttpd and lighttpd are less complex and their source code is probably easier to understand.
[22] On a Linux machine, the Description section of the dig man page tells you where dig finds the list of nameservers to query.
2. What is the IP address that corresponds to inl.info.ucl.ac.be? Which type of DNS query does dig send to obtain this information?
3. Which type of DNS request do you need to send to obtain the nameservers that are responsible for a given domain?
4. What are the nameservers that are responsible for the be top-level domain? Where are they located? Is it possible to use IPv6 to query them?
5. When run without any parameter, dig queries one of the root DNS servers and retrieves the list of the names of all root DNS servers. For technical reasons, there are only 13 different root DNS servers. This information is also available as a text file from http://www.internic.net/zones/named.root. What are the IP addresses of all these servers? Can they be queried by using IPv6 [23]?
6. Assume now that you are residing in a network where there is no DNS resolver and that you need to start your query from the DNS root.
   - Use dig to send a query to one of these root servers to find the IP address of the DNS server(s) (NS record) responsible for the org top-level domain
   - Use dig to send a query to one of these DNS servers to find the IP address of the DNS server(s) (NS record) responsible for root-servers.org
   - Continue until you find the server responsible for www.root-servers.org
   - What is the lifetime associated to this IP address?
7. Perform the same analysis for a popular web site such as www.google.com. What is the lifetime associated to this IP address? If you perform the same request several times, do you always receive the same answer? Can you explain why a lifetime is associated to the DNS replies?
8. Use dig to find the mail relays used by the uclouvain.be and gmail.com domains. What is the TTL of these records (use the +ttlid option when using dig)? Can you explain the preferences used by the MX records? You can find more information about the MX records in RFC 974.
9. Use dig to query the IPv6 address (DNS record AAAA) of the following hosts
   - www.sixxs.net
   - www.google.com
   - ipv6.google.com
10. When dig is run, the header section in its output indicates the id, the DNS identifier used to send the query. Does your implementation of dig generate random identifiers?
    dig -t MX gmail.com

    ; <<>> DiG 9.4.3-P3 <<>> -t MX gmail.com
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25718

11. A DNS implementation such as dig and more importantly a name resolver such as bind or unbound, always checks that the received DNS reply contains the same identifier as the DNS request that it sent. Why is this so important?
   - Imagine an attacker who is able to send forged DNS replies to, for example, associate www.bigbank.com to his own IP address. How could he attack a DNS implementation that
      - sends DNS requests containing always the same identifier
      - sends DNS requests containing identifiers that are incremented by one after each request
      - sends DNS requests containing random identifiers

[23] You may obtain additional information about the root DNS servers from http://www.root-servers.org
12. The DNS protocol can run over UDP and over TCP. Most DNS servers prefer to use UDP because it consumes fewer resources on the server. However, TCP is useful when a large answer is expected. You can force the utilisation of TCP by using dig +tcp. Use TCP and UDP to query a root DNS server. Is it faster to receive an answer via TCP or via UDP?

3.5.2 Internet email protocols

Many Internet protocols are ASCII-based protocols where the client sends requests as one line of ASCII text terminated by CRLF and the server replies with one or more lines of ASCII text. Using such ASCII messages has several advantages compared to protocols that rely on binary encoded messages

- the messages exchanged by the client and the server can be easily understood by a developer or network engineer by simply reading the messages
- it is often easy to write a small prototype that implements a part of the protocol
- it is possible to test a server manually by using telnet

Telnet is a protocol that allows a user to obtain a terminal on a remote server. For this, telnet opens a TCP connection with the remote server on port 23. However, most telnet implementations allow the user to specify an alternate port as telnet host port. When used with a port number as parameter, telnet opens a TCP connection to the remote host on the specified port. telnet can thus be used to test any server using an ASCII-based protocol on top of TCP. Note that if you need to stop a running telnet session, Ctrl-C will not work as it will be sent by telnet to the remote host over the TCP connection. On many telnet implementations you can type Ctrl-] to freeze the TCP connection and return to the telnet interface.

1. Assume that Alice sends an email from her alice@yahoo.com account to Bob who uses bob@yahoo.com. Which protocols are involved in the transmission of this email?
2. Same question when Alice sends an email to her friend Trudy, trudy@gmail.com.
3. Before the advent of webmail and feature rich mailers, email was written and read by using command line tools on servers. Using your account on sirius.info.ucl.ac.be use the /bin/mail command line tool to send an email to yourself on this host. This server stores local emails in the /var/mail directory with one file per user. Check with /bin/more the content of your mail file and try to understand which lines have been added by the server in the header of your email.
4. Use your preferred email tool to send an email message to yourself containing a single line of text. Most email tools have the ability to show the source of the message, use this function to look at the message that you sent and the message that you received. Can you find an explanation for all the lines that have been added to your single line email [24]?
5. The first version of the SMTP protocol was defined in RFC 821. The current standard for SMTP is defined in RFC 5321. Considering only RFC 821 what are the main commands of the SMTP protocol [25]?
6. When using SMTP, how do you recognise a positive reply from a negative one?
7. An SMTP server is a daemon process that can fail due to a bug or lack of resources (e.g. memory). Network administrators often install tools [26] that regularly connect to their servers to check that they are operating correctly. A simple solution is to open a TCP connection on port 25 to the SMTP server's host [27]. If the connection is established, this implies that there is a process listening. What is the reply sent by the SMTP server when you type the following command?

    telnet cnp3.info.ucl.ac.be 25

[24] Since RFC 821, SMTP has evolved a lot due notably to the growing usage of email and the need to protect the email system against spammers. It is unlikely that you will be able to explain all the additional lines that you will find in email headers, but we'll discuss them together.
[25] A shorter description of the SMTP protocol may be found on wikipedia at http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol
[26] There are many monitoring tools available. nagios is a very popular open source monitoring system.
27 Notethatusing telnet toconnecttoaremotehostonport25maynotworkinallnetworks.Duetothe spam problem,many ISP networks donotallowtheircustomerstouseportTCP25directlyandforcethemtousetheISP'smailrelaytoforwardtheiremail.Thankstothis,if asoftwaresendingspamhasbeeninstalledonthePCofoneoftheISP'scustomers,thissoftwarewillnotbeabletosendahugeamountof spam.Ifyouconnectto nostromo.info.ucl.ac.be fromthexedstationsinINGI'slab,youshouldnotbeblocked. 3.5.Exercises 63


Warning: Do not try this on a random SMTP server. The exercises proposed in this section should only be run on the SMTP server dedicated for these exercises: cnp3.info.ucl.ac.be. If you try them on a production SMTP server, the administrator of this server may become angry.

1. Continue the SMTP session that you started above by sending the greetings command (HELO followed by the fully qualified domain name of your host) and terminate the session by sending the QUIT command.
2. The minimum SMTP session above allows you to verify that the SMTP server is running. However, this does not always imply that mail can be delivered. For example, large SMTP servers often use a database to store all the email addresses that they serve. To verify the correct operation of such a server, one possibility is to use the VRFY command. Open a SMTP session on the lab's SMTP server (cnp3.info.ucl.ac.be) and use this command to verify that your account is active.
3. Now that you know the basics of opening and closing an SMTP session, you can send email manually by using the MAIL FROM:, RCPT TO: and DATA commands. Use these commands to manually send an email to INGI2141@cnp3.info.ucl.ac.be. Do not forget to include the From:, To: and Subject: lines in your header.

1. By using SMTP, is it possible to send an email that contains exactly the following ASCII art?

..
...

1. Most email agents allow you to send email in carbon-copy (cc:) and also in blind-carbon-copy (bcc:) to a recipient. How does a SMTP server support these two types of recipients?
2. In the early days, email was read by using tools such as /bin/mail or more advanced text-based mail readers such as pine or elm. Today, emails are stored on dedicated servers and retrieved by using protocols such as POP or IMAP. From the user's viewpoint, can you list the advantages and drawbacks of these two protocols?
3. The TCP protocol supports 65536 different port numbers. Many of these port numbers have been reserved for some applications. The official repository of the reserved port numbers is maintained by the Internet Assigned Numbers Authority (IANA) on http://www.iana.org/assignments/port-numbers [28]. Using this information, what is the default port number for the POP3 protocol? Does it run on top of UDP or TCP?
4. The Post Office Protocol (POP) is a rather simple protocol described in RFC 1939. POP operates in three phases. The first phase is the authorization phase where the client provides a username and a password. The second phase is the transaction phase where the client can retrieve emails. The last phase is the update phase where the client finalises the transaction. What are the main POP commands and their parameters? When a POP server returns an answer, how can you easily determine whether the answer is positive or negative?
5. On smartphones, users often want to avoid downloading large emails over a slow wireless connection. How could a POP client only download emails that are smaller than 5 KBytes?
6. Open a POP session with the lab's POP server (nostromo.info.ucl.ac.be) by using the username and password that you received. Verify that your username and password are accepted by the server.
7. The lab's POP server contains a script that runs every minute and sends two email messages to your account if your email folder is empty. Use POP to retrieve these two emails and provide the secret message to your teaching assistant.

3.5.3 The HyperText Transfer Protocol

1. What are the main methods supported by the first version of the HyperText Transfer Protocol (HTTP) defined in RFC 1945 [29]? What are the main types of replies sent by a http server [30]?
2. System administrators who are responsible for web servers often want to monitor these servers and check that they are running correctly. As a HTTP server uses TCP on port 80, the simplest solution is to open a TCP connection on port 80 and check that the TCP connection is accepted by the remote host. However, as HTTP is an ASCII-based protocol, it is also very easy to write a small script that downloads a web page on the server and compares its content with the expected one. Use telnet to verify that a web server is running on host rembrandt.info.ucl.ac.be [31].
3. Instead of using telnet on port 80, it is also possible to use a command-line tool such as curl. Use curl with the --trace-ascii tracefile option to store in tracefile all the information exchanged by curl when accessing the server.
   • what is the version of HTTP used by curl?
   • can you explain the different headers placed by curl in the request?
   • can you explain the different headers found in the response?
4. HTTP 1.1, specified in RFC 2616, forces the client to use the Host: header in all its requests. HTTP 1.0 does not define the Host: header, but most implementations support it. By using telnet and curl, retrieve the first page of the http://totem.info.ucl.ac.be web server by sending http requests with and without the Host: header. Explain the difference between the two [32].
5. By using dig and curl, determine on which physical host the http://www.info.ucl.ac.be, http://inl.info.ucl.ac.be and http://totem.info.ucl.ac.be are hosted.
6. Use curl with the --trace-ascii filename option to retrieve http://www.google.com. Explain what a browser such as Firefox would do when retrieving this URL.

[28] On Unix hosts, a subset of the port assignments is often placed in /etc/services
[29] See section 5 of RFC 1945
[30] See section 6.1 of RFC 1945
7. The headers sent in a HTTP request allow the client to provide additional information to the server. One of these headers is the Accept-Language header that allows the client to indicate its preferred language [33]. For example, curl -H "Accept-Language: en" http://www.google.be will send to http://www.google.be a HTTP request indicating English (en) as the preferred language. Does google provide a different page in French (fr) and Walloon (wa)? Same question for http://www.uclouvain.be (given the size of the homepage, use diff to compare the different pages retrieved from www.uclouvain.be).
8. Compare the size of the http://www.yahoo.com and http://www.google.com web pages by downloading them with curl.
9. What is a http cookie? List some advantages and drawbacks of using cookies on web servers.
10. You are now responsible for the http://www.belgium.be website. The government has built two datacenters containing 1000 servers each in Antwerp and Namur. This website contains static information and your objective is to balance the load between the different servers and ensure that the service remains up even if one of the datacenters is disconnected from the Internet due to flooding or other natural disasters. What are the techniques that you can use to achieve this goal?

[31] The minimum command sent to a HTTP server is GET / HTTP/1.0 followed by CRLF and a blank line
[32] Use dig to find the IP address used by totem.info.ucl.ac.be
[33] The list of available language tags can be found at http://www.loc.gov/standards/iso639-2/php/code_list.php Additional information about the support of multiple languages in Internet protocols may be found in RFC 5646


CHAPTER 4

The transport layer

As the transport layer is built on top of the network layer, it is important to know the key features of the network layer service. There are two types of network layer services: connectionless and connection-oriented. The connectionless network layer service is the most widespread. Its main characteristics are:

• the connectionless network layer service can only transfer SDUs of limited size [1]
• the connectionless network layer service may discard SDUs
• the connectionless network layer service may corrupt SDUs
• the connectionless network layer service may delay, reorder or even duplicate SDUs

[1] Many network layer services are unable to carry SDUs that are larger than 64 KBytes.

Figure 4.1: The transport layer in the reference model

These imperfections of the connectionless network layer service will become much clearer once we have explained the network layer in the next chapter. At this point, let us simply assume that these imperfections occur without trying to understand why they occur.

Some transport protocols, such as class 0 of the ISO Transport Protocol (TP0) defined in [X224], can be used on top of a connection-oriented network service, but they have not been widely used. We do not discuss in further detail such utilisation of a connection-oriented network service in this book.

This chapter is organised as follows. We will first explain how it is possible to provide a reliable transport service on top of an unreliable connectionless network service. For this, we explain the main mechanisms found in such protocols. Then, we will study in detail the two transport protocols that are used in the Internet. We begin with the User Datagram Protocol (UDP) which provides a simple connectionless transport service. Then, we will describe in detail the Transmission Control Protocol (TCP), including its congestion control mechanism.

4.1 Principles of a reliable transport protocol

In this section, we depict a reliable transport protocol running above a connectionless network layer service. For this, we first assume that the network layer provides a perfect service, i.e.:

• the connectionless network layer service never corrupts SDUs
• the connectionless network layer service never discards SDUs
• the connectionless network layer service never delays, reorders or duplicates SDUs
• the connectionless network layer service can support SDUs of any size

We will then remove each of these assumptions one after the other in order to better understand the mechanisms used to solve each imperfection.

4.1.1 Reliable data transfer on top of a perfect network service

The transport layer entity interacts with both a user in the application layer and an entity in the network layer. According to the reference model, these interactions will be performed using DATA.req and DATA.ind primitives. However, to simplify the presentation and to avoid confusion between a DATA.req primitive issued by the user of the transport layer entity, and a DATA.req issued by the transport layer entity itself, we will use the following terminology:

• the interactions between the user and the transport layer entity are represented by using the classical DATA.req, DATA.ind primitives
• the interactions between the transport layer entity and the network layer service are represented by using send instead of DATA.req and recvd instead of DATA.ind

This is illustrated in the figure below.

Figure 4.2: Interactions between the transport layer, its user, and its network layer provider

When running on top of a perfect connectionless network service, a transport level entity can simply issue a send(SDU) upon arrival of a DATA.req(SDU). Similarly, the receiver issues a DATA.ind(SDU) upon receipt of a recvd(SDU). Such a simple protocol is sufficient when a single SDU is sent.
Unfortunately, this is not always sufficient to ensure a reliable delivery of the SDUs. Consider the case where a client sends tens of SDUs to a server. If the server is faster than the client, it will be able to receive and process all the segments sent by the client and deliver their content to its user. However, if the server is slower than the client, problems may arise. The transport layer entity contains buffers to store SDUs that have been received as a Data.request from the application but have not yet been sent via the network service. If the application is faster than the network layer, the buffer becomes full and the operating system suspends the application to let the transport entity empty its transmission queue. The transport entity also uses a buffer to store the segments received from the network layer that have not yet been processed by the application. If the application is slow to process the data, this buffer becomes full and the transport entity is no longer able to accept segments from the network layer. The buffers of the transport entity have a limited size [2] and if they overflow, the transport entity is forced to discard received segments.

[2] In the application layer, most servers are implemented as processes. The network and transport layer on the other hand are usually implemented inside the operating system and the amount of memory that they can use is limited by the amount of memory allocated to the entire kernel.

Figure 4.3: The simplest transport protocol

To solve this problem, our transport protocol must include a feedback mechanism that allows the receiver to inform the sender that it has processed a segment and that another one can be sent. This feedback is required even though the network layer provides a perfect service. To include such a feedback, our transport protocol must process two types of segments:

• data segments carrying a SDU
• control segments carrying an acknowledgment indicating that the previous segment was processed correctly

These two types of segments can be distinguished using a segment composed of two parts:

• the header that contains one bit set to 0 in data segments and set to 1 in control segments
• the payload that contains the SDU supplied by the user application

The transport entity can then be modelled as a finite state machine, containing two states for the receiver and two states for the sender. The figure below provides a graphical representation of this state machine with the sender above and the receiver below.

Figure 4.4: Finite state machine of the simplest transport protocol
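The two-part segment format described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text specifies a one-bit type field, while the sketch stores that bit in a whole header byte for simplicity, and the names DATA, ACK, encode_segment and decode_segment are hypothetical.

```python
DATA, ACK = 0, 1  # value of the one-bit type field (assumption: stored in one header byte)


def encode_segment(seg_type, payload=b""):
    """Build a segment: a one-byte header carrying the type bit, then the payload."""
    if seg_type not in (DATA, ACK):
        raise ValueError("unknown segment type")
    return bytes([seg_type]) + payload


def decode_segment(segment):
    """Split a received segment into (type bit, payload)."""
    if not segment:
        raise ValueError("empty segment")
    return segment[0] & 1, segment[1:]
```

A data segment carrying the SDU b"abc" is then encode_segment(DATA, b"abc"), and the acknowledgement returned by the receiver is encode_segment(ACK), which has an empty payload.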


The above FSM shows that the sender has to wait for an acknowledgement from the receiver before being able to transmit the next SDU. The figure below illustrates the exchange of a few segments between two hosts.

Figure 4.5: Time sequence diagram illustrating the operation of the simplest transport protocol

4.1.2 Reliable data transfer on top of an imperfect network service

The transport layer must deal with the imperfections of the network layer service. There are three types of imperfections that must be considered by the transport layer:

1. Segments can be corrupted by transmission errors
2. Segments can be lost
3. Segments can be reordered or duplicated

To deal with these types of imperfections, transport protocols rely on different types of mechanisms. The first problem is transmission errors. The segments sent by a transport entity are processed by the network and datalink layers and finally transmitted by the physical layer. All of these layers are imperfect. For example, the physical layer may be affected by different types of errors:

• random isolated errors where the value of a single bit has been modified due to a transmission error
• random burst errors where the values of n consecutive bits have been changed due to transmission errors
• random bit creations and random bit removals where bits have been added or removed due to transmission errors

The only solution to protect against transmission errors is to add redundancy to the segments that are sent. Information Theory defines two mechanisms that can be used to transmit information over a transmission channel affected by random errors. These two mechanisms add redundancy to the information sent, to allow the receiver to detect or sometimes even correct transmission errors. A detailed discussion of these mechanisms is outside the scope of this chapter, but it is useful to consider a simple mechanism to understand its operation and its limitations.
Information theory defines coding schemes. There are different types of coding schemes, but let us focus on coding schemes that operate on binary strings. A coding scheme is a function that maps information encoded as a string of m bits into a string of n bits. The simplest coding scheme is the even parity coding. This coding scheme takes an m bits source string and produces an m+1 bits coded string where the first m bits of the coded string are the bits of the source string and the last bit of the coded string is chosen such that the coded string will always contain an even number of bits set to 1. For example:

• 1001 is encoded as 10010
• 1101 is encoded as 11011
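The even parity coding can be sketched as follows. The sketch represents bit strings as Python strings of "0" and "1" characters, which is a choice made here for readability; the function names are hypothetical.

```python
def parity_encode(bits):
    """Append one bit so that the coded string contains an even number of 1s."""
    return bits + str(bits.count("1") % 2)


def parity_ok(coded):
    """Return True when the coded string still has an even number of 1s."""
    return coded.count("1") % 2 == 0
```

With these helpers, parity_encode("1001") gives "10010" and parity_encode("1101") gives "11011", as in the two examples above. Flipping a single bit of "10010" makes parity_ok return False, while flipping two bits (for instance "10111") leaves the parity even, so that error goes unnoticed.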


This parity scheme has been used in some RAMs as well as to encode characters sent over a serial line. It is easy to show that this coding scheme allows the receiver to detect a single transmission error, but it cannot correct it. However, if two or more bits are in error, the receiver may not always be able to detect the error.

Some coding schemes allow the receiver to correct some transmission errors. For example, consider the coding scheme that encodes each source bit as follows:

• 1 is encoded as 111
• 0 is encoded as 000

For example, consider a sender that sends 111. If there is one bit in error, the receiver could receive 011 or 101 or 110. In these three cases, the receiver will decode the received bit pattern as a 1 since it contains a majority of bits set to 1. If there are two bits in error, the receiver will no longer be able to recover from the transmission error.

This simple coding scheme forces the sender to transmit three bits for each source bit. However, it allows the receiver to correct single bit errors. More advanced coding systems that allow the receiver to recover from errors are used in several types of physical layers.

Transport protocols use error detection schemes, but none of the widely used transport protocols rely on error correction schemes. To detect errors, a segment is usually divided into two parts:

• a header that contains the fields used by the transport protocol to ensure reliable delivery. The header contains a checksum or Cyclical Redundancy Check (CRC) [Williams1993] that is used to detect transmission errors
• a payload that contains the user data passed by the application layer.

Some segment headers also include a length, which indicates the total length of the segment or the length of the payload.
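Returning to the three-bit repetition code described above, its majority decoding can be sketched as follows. Bit strings are again represented as Python strings, and the function names are hypothetical.

```python
def rep3_encode(bits):
    """Encode each source bit as three copies of itself (1 -> 111, 0 -> 000)."""
    return "".join(b * 3 for b in bits)


def rep3_decode(coded):
    """Decode each group of three bits by majority vote."""
    if len(coded) % 3 != 0:
        raise ValueError("coded string length must be a multiple of 3")
    decoded = []
    for i in range(0, len(coded), 3):
        group = coded[i:i + 3]
        decoded.append("1" if group.count("1") >= 2 else "0")
    return "".join(decoded)
```

The three single-error patterns 011, 101 and 110 all decode to 1. A pattern with two errors, such as 001 received in place of 111, decodes to 0: the receiver silently delivers the wrong bit, which is the limitation mentioned in the text.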
The simplest error detection scheme is the checksum. A checksum is basically an arithmetic sum of all the bytes that a segment is composed of. There are different types of checksums. For example, an eight bit checksum can be computed as the arithmetic sum, modulo 256, of all the bytes of (both the header and trailer of) the segment. The checksum is computed by the sender before sending the segment and the receiver verifies the checksum upon reception of each segment. The receiver discards segments received with an invalid checksum. Checksums can be easily implemented in software, but their error detection capabilities are limited. Cyclical Redundancy Checks (CRC) have better error detection capabilities [SGP98], but require more CPU when implemented in software.

Note: Checksums, CRCs, ...
Most of the protocols in the TCP/IP protocol suite rely on the simple Internet checksum in order to verify that the received segment has not been affected by transmission errors. Despite its popularity and ease of implementation, the Internet checksum is not the only available checksum mechanism. Cyclical Redundancy Checks (CRC) are very powerful error detection schemes that are used notably on disks, by many datalink layer protocols and file formats such as zip or png. They can easily be implemented efficiently in hardware and have better error-detection capabilities than the Internet checksum [SGP98]. However, when the first transport protocols were designed, CRCs were considered to be too CPU-intensive for software implementations and other checksum mechanisms were used instead. The TCP/IP community chose the Internet checksum, while the OSI community chose the Fletcher checksum [Sklower89]. Now, there are efficient techniques to quickly compute CRCs in software [Feldmeier95]. The SCTP protocol initially chose the Adler-32 checksum but replaced it recently with a CRC (see RFC 3309).
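As an illustration, the sketch below shows an eight bit arithmetic checksum and the Internet checksum. The second function follows the usual description of the Internet checksum in RFC 1071 (the 16-bit ones' complement of the ones' complement sum of the data taken as 16-bit words); the text above does not detail this algorithm, so treat it as background rather than as part of this chapter. The function names are hypothetical.

```python
def checksum8(data):
    """Eight bit checksum: arithmetic sum of all bytes, modulo 256."""
    return sum(data) % 256


def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the eight bytes 00 01 f2 03 f4 f5 f6 f7, internet_checksum returns 0x220d. The limited detection capability of simple sums is easy to observe: checksum8(b"ab") and checksum8(b"ba") are equal, so swapped bytes are not detected.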
The second imperfection of the network layer is that segments may be lost. As we will see later, the main cause of packet losses in the network layer is the lack of buffers in intermediate routers. Since the receiver sends an acknowledgement segment after having received each data segment, the simplest solution to deal with losses is to use a retransmission timer. When the sender sends a segment, it starts a retransmission timer. The value of this retransmission timer should be larger than the round-trip-time, i.e. the delay between the transmission of a data segment and the reception of the corresponding acknowledgement. When the retransmission timer expires, the sender assumes that the data segment has been lost and retransmits it. This is illustrated in the figure below.

Figure 4.6: Using retransmission timers to recover from segment losses

Unfortunately, retransmission timers alone are not sufficient to recover from segment losses. Let us consider, as an example, the situation depicted below where an acknowledgement is lost. In this case, the sender retransmits the data segment that has not been acknowledged. Unfortunately, as illustrated in the figure below, the receiver considers the retransmission as a new segment whose payload must be delivered to its user.

Figure 4.7: Limitations of retransmission timers

To solve this problem, transport protocols associate a sequence number to each data segment. This sequence number is one of the fields found in the header of data segments. We use the notation D(S,...) to indicate a data segment whose sequence number field is set to S. The acknowledgements also contain a sequence number indicating the data segments that they are acknowledging. We use OKS to indicate an acknowledgement segment that confirms the reception of D(S,...). The sequence number is encoded as a bit string of fixed length. The simplest transport protocol is the Alternating Bit Protocol (ABP).

The Alternating Bit Protocol uses a single bit to encode the sequence number. It can be implemented easily. The sender and the receiver only require a four-state Finite State Machine.
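To give an idea of how small such an implementation can be, here is a minimal sketch of the receiving side of the Alternating Bit Protocol. It is an illustration only: the class and method names are hypothetical, the CRC verification is reduced to a boolean flag, and acknowledgements are returned as strings such as "OK0" instead of being sent on a network.

```python
class AbpReceiver:
    """Receiver side of the Alternating Bit Protocol (illustrative sketch)."""

    def __init__(self):
        self.expected = 0     # sequence bit of the next new data segment
        self.last_ack = None  # last acknowledgement sent, None before any reception
        self.delivered = []   # SDUs passed to the user

    def on_segment(self, seq, sdu, crc_ok=True):
        """Process D(seq, sdu) and return the acknowledgement to send, if any."""
        if crc_ok and seq == self.expected:
            self.delivered.append(sdu)  # new in-sequence segment: deliver it
            self.last_ack = "OK%d" % seq
            self.expected ^= 1          # alternate between 0 and 1
        # A duplicate or corrupted segment is not delivered; the previous
        # acknowledgement is repeated so that the sender can recover from its loss.
        return self.last_ack
```

In this sketch a retransmitted D(0,...) that arrives after D(0,...) has been accepted is acknowledged again with OK0 but its payload is not delivered a second time, which is exactly the problem that the sequence number solves.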


Figure 4.8: Alternating bit protocol: Sender FSM

The initial state of the sender is Wait for D(0,...). In this state, the sender waits for a Data.request. The first data segment that it sends uses sequence number 0. After having sent this segment, the sender waits for an OK0 acknowledgement. A segment is retransmitted upon expiration of the retransmission timer or if an acknowledgement with an incorrect sequence number has been received.

Figure 4.9: Alternating bit protocol: Receiver FSM

The receiver first waits for D(0,...). If the segment contains a correct CRC, it passes the SDU to its user and sends OK0. If the segment contains an invalid CRC, it is immediately discarded. Then, the receiver waits for D(1,...). In this state, it may receive a duplicate D(0,...) or a data segment with an invalid CRC. In both cases, it returns an OK0 segment to allow the sender to recover from the possible loss of the previous OK0 segment.

Note: Dealing with corrupted segments
The receiver FSM of the Alternating bit protocol discards all segments that contain an invalid CRC. This is the safest approach since the received segment can be completely different from the segment sent by the remote host. A receiver should not attempt to extract information from a corrupted segment because it cannot know which portion of the segment has been affected by the error.

The figure below illustrates the operation of the alternating bit protocol.

Figure 4.10: Operation of the alternating bit protocol

The Alternating Bit Protocol can recover from transmission errors and segment losses. However, it has one important drawback. Consider two hosts that are directly connected by a 50 Kbits/sec satellite link that has a 250 milliseconds propagation delay. If these hosts send 1000 bits segments, each segment takes 1000/50000 seconds, i.e. 20 milliseconds, to transmit. The maximum throughput that can be achieved by the alternating bit protocol is thus one segment every 20+250+250=520 milliseconds if we ignore the transmission time of the acknowledgement. This is less than 2 Kbits/sec!

Go-back-n and selective repeat

To overcome the performance limitations of the alternating bit protocol, transport protocols rely on pipelining. This technique allows a sender to transmit several consecutive segments without being forced to wait for an acknowledgement after each segment. Each data segment contains a sequence number encoded in an n bits field.

Figure 4.11: Pipelining to improve the performance of transport protocols

Pipelining allows the sender to transmit segments at a higher rate, but we need to ensure that the receiver does not become overloaded. Otherwise, the segments sent by the sender are not correctly received by the destination. The transport protocols that rely on pipelining allow the sender to transmit W unacknowledged segments before being forced to wait for an acknowledgement from the receiving entity.

This is implemented by using a sliding window. The sliding window is the set of consecutive sequence numbers that the sender can use when transmitting segments without being forced to wait for an acknowledgement. The figure below shows a sliding window containing five segments (6, 7, 8, 9 and 10). Two of these sequence numbers (6 and 7) have been used to send segments and only three sequence numbers (8, 9 and 10) remain in the sliding window. The sliding window is said to be closed once all sequence numbers contained in the sliding window have been used.

Figure 4.12: The sliding window

The figure below illustrates the operation of the sliding window. The sliding window shown contains three segments. The sender can thus transmit three segments before being forced to wait for an acknowledgement. The sliding window moves to the higher sequence numbers upon reception of acknowledgements. When the first acknowledgement (OK0) is received, it allows the sender to move its sliding window to the right and sequence number 3 becomes available. This sequence number is used later to transmit SDU d.

Figure 4.13: Sliding window example

In practice, as the segment header encodes the sequence number in an n bits string, only the sequence numbers between 0 and $2^n - 1$ can be used. This implies that the same sequence number is used for different segments and that the sliding window will wrap. This is illustrated in the figure below assuming that 2 bits are used to encode the sequence number in the segment header. Note that upon reception of OK1, the sender slides its window and can use sequence number 0 again.

Unfortunately, segment losses do not disappear because a transport protocol is using a sliding window. To recover from segment losses, a sliding window protocol must define:

• a heuristic to detect segment losses
• a retransmission strategy to retransmit the lost segments.
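Before looking at loss recovery, the bookkeeping of a sending window that wraps can be sketched as follows. This is a simplified model under stated assumptions: it ignores losses and timers, it assumes that acknowledgements arrive in order and that an acknowledgement for sequence number x covers every outstanding segment up to x, and all names are hypothetical.

```python
class SlidingWindowSender:
    """Sequence number bookkeeping of a sliding window sender (no loss recovery)."""

    def __init__(self, n_bits, window):
        self.maxseq = 2 ** n_bits  # number of different sequence numbers
        self.window = window       # maximum number of unacknowledged segments
        self.base = 0              # oldest unacknowledged sequence number
        self.next_seq = 0          # next sequence number to use
        self.in_flight = 0         # segments sent but not yet acknowledged

    def can_send(self):
        """The window is closed when all its sequence numbers have been used."""
        return self.in_flight < self.window

    def send(self):
        """Use the next sequence number of the window and return it."""
        if not self.can_send():
            raise RuntimeError("sliding window is closed")
        seq = self.next_seq
        self.next_seq = (self.next_seq + 1) % self.maxseq
        self.in_flight += 1
        return seq

    def on_ack(self, seq):
        """Slide the window upon reception of an acknowledgement for seq."""
        acked = (seq - self.base) % self.maxseq + 1
        if acked <= self.in_flight:  # otherwise the acknowledgement is ignored
            self.base = (seq + 1) % self.maxseq
            self.in_flight -= acked
```

With 2 bits and a window of three segments, the sender uses sequence numbers 0, 1 and 2, after which the window is closed. Receiving OK0 makes sequence number 3 available, and receiving OK1 lets the sender reuse sequence number 0.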


Figure 4.14: Utilisation of the sliding window with modulo arithmetic

The simplest sliding window protocol uses go-back-n recovery. Intuitively, go-back-n operates as follows. A go-back-n receiver is as simple as possible. It only accepts the segments that arrive in-sequence. A go-back-n receiver discards any out-of-sequence segment that it receives. When a go-back-n receiver receives a data segment, it always returns an acknowledgement containing the sequence number of the last in-sequence segment that it has received. This acknowledgement is said to be cumulative. When a go-back-n receiver sends an acknowledgement for sequence number x, it implicitly acknowledges the reception of all segments whose sequence number is earlier than x. A key advantage of these cumulative acknowledgements is that it is easy to recover from the loss of an acknowledgement. Consider for example a go-back-n receiver that received segments 1, 2 and 3. It sent OK1, OK2 and OK3. Unfortunately, OK1 and OK2 were lost. Thanks to the cumulative acknowledgements, when the sender receives OK3, it knows that all three segments have been correctly received.

The figure below shows the FSM of a simple go-back-n receiver. This receiver uses two variables: lastack and next. next is the next expected sequence number and lastack the sequence number of the last data segment that has been acknowledged. The receiver only accepts the segments that are received in sequence. maxseq is the number of different sequence numbers ($2^n$).

Figure 4.15: Go-back-n: receiver FSM

A go-back-n sender is also very simple. It uses a sending buffer that can store an entire sliding window of segments [3]. The segments are sent with increasing sequence number (modulo maxseq). The sender must wait for an acknowledgement once its sending buffer is full. When a go-back-n sender receives an acknowledgement, it removes from the sending buffer all the acknowledged segments and uses a retransmission timer to detect segment losses. A simple go-back-n sender maintains one retransmission timer per connection. This timer is started when the first segment is sent. When the go-back-n sender receives an acknowledgement, it restarts the retransmission timer only if there are still unacknowledged segments in its sending buffer. When the retransmission timer expires, the go-back-n sender assumes that all the unacknowledged segments currently stored in its sending buffer have been lost. It thus retransmits all the unacknowledged segments in the buffer and restarts its retransmission timer.

[3] The size of the sliding window can be either fixed for a given protocol or negotiated during the connection establishment phase. We'll see later that it is also possible to change the size of the sliding window during the connection's lifetime.

Figure 4.16: Go-back-n: sender FSM

The operation of go-back-n is illustrated in the figure below. In this figure, note that upon reception of the out-of-sequence segment D(2,c), the receiver returns a cumulative acknowledgement C(OK,0) that acknowledges all the segments that have been received in sequence. The lost segment is retransmitted upon the expiration of the retransmission timer.

Figure 4.17: Go-back-n: example

The main advantage of go-back-n is that it can be easily implemented, and it can also provide good performance when only a few segments are lost. However, when there are many losses, the performance of go-back-n quickly drops for two reasons:

• the go-back-n receiver does not accept out-of-sequence segments
• the go-back-n sender retransmits all unacknowledged segments once it has detected a loss

Selective repeat is a better strategy to recover from segment losses. Intuitively, selective repeat allows the receiver to accept out-of-sequence segments. Furthermore, when a selective repeat sender detects losses, it only retransmits the segments that have been lost and not the segments that have already been correctly received.
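Before moving on to selective repeat, the go-back-n receiver described earlier can be sketched in Python. The variables next, lastack and maxseq mirror those used in the text; the class name, the method name and the choice of returning None while nothing has been received in sequence are assumptions of this sketch, and CRC verification is left out.

```python
class GoBackNReceiver:
    """Go-back-n receiver: accepts in-sequence segments only (illustrative sketch)."""

    def __init__(self, n_bits):
        self.maxseq = 2 ** n_bits  # number of different sequence numbers
        self.next = 0              # next expected sequence number
        self.lastack = None        # last in-sequence number acknowledged
        self.delivered = []        # payloads passed to the user

    def on_data(self, seq, sdu):
        """Process D(seq, sdu) and return the cumulative acknowledgement."""
        if seq == self.next:
            self.delivered.append(sdu)
            self.lastack = seq
            self.next = (self.next + 1) % self.maxseq
        # An out-of-sequence segment is discarded; in both cases the receiver
        # acknowledges the last segment that was received in sequence.
        return self.lastack
```

If D(0,a) arrives, D(1,b) is lost and D(2,c) arrives, the receiver returns 0 twice and discards c, as in the example of the figure. Once the sender has retransmitted D(1,b) and D(2,c), the receiver returns 1 and then 2.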


A selective repeat receiver maintains a sliding window of W segments and stores in a buffer the out-of-sequence segments that it receives. The figure below shows a five segment receive window on a receiver that has already received segments 7 and 9.

Figure 4.18: The receiving window with selective repeat

A selective repeat receiver discards all segments having an invalid CRC, and maintains the variable lastack as the sequence number of the last in-sequence segment that it has received. The receiver always includes the value of lastack in the acknowledgements that it sends. Some protocols also allow the selective repeat receiver to acknowledge the out-of-sequence segments that it has received. This can be done for example by placing the list of the sequence numbers of the correctly received, but out-of-sequence segments in the acknowledgements together with the lastack value.

When a selective repeat receiver receives a data segment, it first verifies whether the segment is inside its receiving window. If yes, the segment is placed in the receive buffer. If not, the received segment is discarded and an acknowledgement containing lastack is sent to the sender. The receiver then removes all consecutive segments starting at lastack (if any) from the receive buffer. The payloads of these segments are delivered to the user, lastack and the receiving window are updated, and an acknowledgement acknowledging the last segment received in sequence is sent.

The selective repeat sender maintains a sending buffer that can store up to W unacknowledged segments. These segments are sent as long as the sending buffer is not full. Several implementations of a selective repeat sender are possible. A simple implementation is to associate a retransmission timer to each segment. The timer is started when the segment is sent and cancelled upon reception of an acknowledgement that covers this segment. When a retransmission timer expires, the corresponding segment is retransmitted and this retransmission timer is restarted. When an acknowledgement is received, all the segments that are covered by this acknowledgement are removed from the sending buffer and the sliding window is updated.
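The receiver behaviour described above can be sketched as follows. This is a simplified model: CRC verification is omitted, the receive buffer is a dictionary indexed by sequence number, and each call returns both lastack and the list of buffered out-of-sequence numbers, which corresponds to the optional variant that also reports out-of-sequence segments. All names except lastack are hypothetical.

```python
class SelectiveRepeatReceiver:
    """Selective repeat receiver with a window of `window` segments (sketch)."""

    def __init__(self, n_bits, window):
        self.maxseq = 2 ** n_bits  # number of different sequence numbers
        self.window = window       # size of the receiving window
        self.next = 0              # first sequence number of the receiving window
        self.lastack = None        # last segment received in sequence
        self.buffer = {}           # out-of-sequence segments: seq -> payload
        self.delivered = []        # payloads passed to the user

    def in_window(self, seq):
        """Check whether seq falls inside the receiving window (modulo maxseq)."""
        return (seq - self.next) % self.maxseq < self.window

    def on_data(self, seq, sdu):
        """Process D(seq, sdu); return (lastack, buffered out-of-sequence numbers)."""
        if self.in_window(seq):
            self.buffer[seq] = sdu  # segments outside the window are discarded
        while self.next in self.buffer:  # deliver consecutive segments, if any
            self.delivered.append(self.buffer.pop(self.next))
            self.lastack = self.next
            self.next = (self.next + 1) % self.maxseq
        return self.lastack, sorted(self.buffer)
```

With 3 bits and a window of four segments, receiving D(0,a) and then D(2,c) yields the acknowledgements (0, []) and (0, [2]). When D(1,b) finally arrives, b and c are delivered in order and the acknowledgement becomes (2, []).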
The figure below illustrates the operation of selective repeat when segments are lost. In this figure, C(OK,x) is used to indicate that all segments, up to and including sequence number x, have been received correctly.

Figure 4.19: Selective repeat: example

Pure cumulative acknowledgements work well with the go-back-n strategy. However, with only cumulative acknowledgements a selective repeat sender cannot easily determine which data segments have been correctly received after a data segment has been lost. For example, in the figure above, the second C(OK,0) does not explicitly inform the sender of the reception of D(2,c) and the sender could retransmit this segment although it has already been received. A possible solution to improve the performance of selective repeat is to provide additional information about the received segments in the acknowledgements that are returned by the receiver. For example, the receiver could add in the returned acknowledgement the list of the sequence numbers of all segments that have already been received. Such acknowledgements are sometimes called selective acknowledgements. This is illustrated in the figure below.

In the figure above, when the sender receives C(OK,0,[2]), it knows that all segments up to and including D(0,...) have been correctly received. It also knows that segment D(2,...) has been received and can cancel the retransmission timer associated to this segment. However, this segment should not be removed from the sending buffer before the reception of a cumulative acknowledgement (C(OK,2) in the figure above) that covers this segment.

Note: Maximum window size with go-back-n and selective repeat
A transport protocol that uses n bits to encode its sequence number can send up to $2^n$ segments with different sequence numbers. However, to ensure a reliable delivery of the segments, go-back-n and selective repeat cannot use a sending window of $2^n$ segments. Consider first go-back-n and assume that a sender sends $2^n$ segments. These segments are received in-sequence by the destination, but all the returned acknowledgements are lost. The sender will retransmit all segments and they will all be accepted by the receiver and delivered a second time to the user. It is easy to see that this problem can be avoided if the maximum size of the sending window is $2^n - 1$ segments. A similar problem occurs with selective repeat. However, as the receiver accepts out-of-sequence segments, a sending window of $2^n - 1$ segments is not sufficient to ensure a reliable delivery of all segments. It can be easily shown that to avoid this problem, a selective repeat sender cannot use a window that is larger than $\frac{2^n}{2}$ segments.
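The go-back-n scenario of this note can be replayed with a few lines of Python. The sketch below is a toy model, not a protocol implementation: a full window is sent, every acknowledgement is assumed lost, the whole window is retransmitted once, and the function counts how many SDUs a go-back-n receiver delivers twice. The function names are hypothetical.

```python
def duplicates_delivered(n_bits, window):
    """Count SDUs delivered twice when a full window is sent, all the
    acknowledgements are lost and the window is retransmitted once."""
    maxseq = 2 ** n_bits
    expected = 0   # next sequence number expected by the go-back-n receiver
    delivered = 0
    seqs = [i % maxseq for i in range(window)]
    for _attempt in range(2):  # first transmission, then retransmission
        for seq in seqs:
            if seq == expected:  # only in-sequence segments are accepted
                delivered += 1
                expected = (expected + 1) % maxseq
    return delivered - window


def max_window_go_back_n(n_bits):
    """Largest safe sending window for go-back-n, as stated in the note."""
    return 2 ** n_bits - 1


def max_window_selective_repeat(n_bits):
    """Largest safe sending window for selective repeat, as stated in the note."""
    return 2 ** n_bits // 2
```

With 2 bits, duplicates_delivered(2, 4) returns 4 because the receiver expects sequence number 0 again when the retransmissions arrive, while duplicates_delivered(2, 3) returns 0 because the receiver is then waiting for sequence number 3 and rejects the retransmitted segments.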
Go-back-n or selective repeat are used by transport protocols to provide a reliable data transfer above an unreliable network layer service. Until now, we have assumed that the size of the sliding window was fixed for the entire lifetime of the connection. In practice, a transport layer entity is usually implemented in the operating system and shares memory with other parts of the system. Furthermore, a transport layer entity must support several (possibly hundreds or thousands of) transport connections at the same time. This implies that the memory which can be used to support the sending or the receiving buffer of a transport connection may change during the lifetime of the connection [4]. Thus, a transport protocol must allow the sender and the receiver to adjust their window sizes.

To deal with this issue, transport protocols allow the receiver to advertise the current size of its receiving window in all the acknowledgements that it sends. The receiving window advertised by the receiver bounds the size of the sending buffer used by the sender. In practice, the sender maintains two state variables: swin, the size of its sending window (that may be adjusted by the system), and rwin, the size of the receiving window advertised by the receiver. At any time, the number of unacknowledged segments cannot be larger than min(swin,rwin) [5]. The utilisation of dynamic windows is illustrated in the figure below.

The receiver may adjust its advertised receive window based on its current memory consumption, but also to limit the bandwidth used by the sender. In practice, the receive buffer can also shrink as the application may not be able to process the received data quickly enough. In this case, the receive buffer may be completely full and the advertised receive window may shrink to 0. When the sender receives an acknowledgement with a receive window set to 0, it is blocked until it receives an acknowledgement with a positive receive window. Unfortunately, as shown in the figure below, the loss of this acknowledgement could cause a deadlock as the sender waits for an acknowledgement while the receiver is waiting for a data segment.
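The rule that the number of unacknowledged segments cannot exceed min(swin,rwin) can be sketched as follows. This is a minimal model under our own assumptions: windows are counted in segments, the class and method names are hypothetical, and no timers or losses are modelled.

```python
from dataclasses import dataclass


@dataclass
class SenderWindow:
    swin: int            # sending window, in segments (may be adjusted by the system)
    rwin: int            # last receive window advertised by the receiver
    in_flight: int = 0   # segments sent but not yet acknowledged

    def usable(self) -> int:
        """Number of new segments that may be sent right now."""
        return max(0, min(self.swin, self.rwin) - self.in_flight)

    def on_ack(self, newly_acked: int, advertised_rwin: int) -> None:
        """Process an acknowledgement that carries a new advertised window."""
        self.in_flight = max(0, self.in_flight - newly_acked)
        self.rwin = advertised_rwin

    def blocked(self) -> bool:
        """True when the receiver has advertised a receive window of 0."""
        return self.rwin == 0


window = SenderWindow(swin=4, rwin=3)
print(window.usable())        # limited by rwin
window.in_flight = 3
window.on_ack(newly_acked=1, advertised_rwin=0)
print(window.blocked())       # the sender must now wait for a positive window
```

Note that `usable()` never returns a negative value: if the receive window shrinks below the number of segments already in flight, the sender simply cannot send anything new.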
To solve this problem, transport protocols rely on a special timer: the persistence timer. This timer is started by the sender whenever it receives an acknowledgement advertising a receive window set to 0. When the timer expires, the sender retransmits an old segment in order to force the receiver to send a new acknowledgement, and hence send the current receive window size.

To conclude our description of the basic mechanisms found in transport protocols, we still need to discuss the impact of segments arriving in the wrong order. If two consecutive segments are reordered, the receiver relies on their sequence numbers to reorder them in its receive buffer. Unfortunately, as transport protocols reuse the same sequence number for different segments, if a segment is delayed for a prolonged period of time, it might still be accepted by the receiver. This is illustrated in the figure below where segment D(1,b) is delayed.

[4] For a discussion on how the sending buffer can change, see e.g. [SMM1998].

[5] Note that if the receive window shrinks, it might happen that the sender has already sent a segment that is no longer inside its window. This segment will be discarded by the receiver and the sender will retransmit it later.


Figure 4.20: Dynamic receiving window

Figure 4.21: Risk of deadlock with dynamic windows


Figure 4.22: Ambiguities caused by excessive delays

To deal with this problem, transport protocols combine two solutions. First, they use 32 bits or more to encode the sequence number in the segment header. This increases the overhead, but also increases the delay between the transmission of two different segments having the same sequence number. Second, transport protocols require the network layer to enforce a Maximum Segment Lifetime (MSL). The network layer must ensure that no packet remains in the network for more than MSL seconds. In the Internet, the MSL is assumed [6] to be 2 minutes (RFC 793). Note that this limits the maximum bandwidth of a transport protocol. If it uses n bits to encode its sequence numbers, then it cannot send more than $2^n$ segments every MSL seconds.

Transport protocols often need to send data in both directions. To reduce the overhead caused by the acknowledgements, most transport protocols use piggybacking. Thanks to this technique, a transport entity can place inside the header of the data segments that it sends the acknowledgements and the receive window that it advertises for the opposite direction of the data flow. The main advantage of piggybacking is that it reduces the overhead, as it is not necessary to send a complete segment to carry an acknowledgement. This is illustrated in the figure below where the acknowledgement number is underlined in the data segments. Piggybacking is only used when data flows in both directions. A receiver will generate a pure acknowledgement when it does not send data in the opposite direction, as shown at the bottom of the figure.

Figure 4.23: Piggybacking

[6] As we will see in the next chapter, the Internet does not strictly enforce this MSL. However, it is reasonable to expect that most packets on the Internet will not remain in the network for more than 2 minutes. There are a few exceptions to this rule, such as RFC 1149, whose implementation is described in http://www.blug.linux.no/rfc1149/, but there are few real links supporting RFC 1149 in the Internet.
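The bandwidth limit imposed by the MSL, mentioned above, is easy to quantify. The sketch below assumes one sequence number per segment, as in the protocols discussed so far, and a segment size of 1000 bytes that we chose only for illustration.

```python
def max_segment_rate(n_bits: int, msl_seconds: float) -> float:
    """Upper bound in segments per second: 2^n segments every MSL seconds."""
    return 2 ** n_bits / msl_seconds


def max_throughput_bps(n_bits: int, msl_seconds: float, segment_bytes: int) -> float:
    """Same bound expressed in bits per second for a given segment size."""
    return max_segment_rate(n_bits, msl_seconds) * segment_bytes * 8


MSL_SECONDS = 120.0      # the 2 minutes assumed in the Internet
SEGMENT_BYTES = 1000     # illustrative value, not taken from the text

for n_bits in (16, 32):
    rate = max_segment_rate(n_bits, MSL_SECONDS)
    bps = max_throughput_bps(n_bits, MSL_SECONDS, SEGMENT_BYTES)
    print(f"{n_bits} bits: {rate:.1f} segments/s, {bps / 1e6:.1f} Mbit/s")
```

Running the loop shows why a short sequence number field is a problem on fast networks: each additional bit doubles the bound.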


The last point to be discussed about the data transfer mechanisms used by transport protocols is the provision of a byte stream service. As indicated in the first chapter, the byte stream service is widely used in the transport layer. The transport protocols that provide a byte stream service associate a sequence number to all the bytes that are sent and place the sequence number of the first byte of the segment in the segment's header. This is illustrated in the figure below. In this example, the sender chooses to put two bytes in each of the first three segments. This is due to graphical reasons; a real transport protocol would use larger segments in practice. However, the division of the byte stream into segments, combined with the losses and retransmissions, explains why the byte stream service does not preserve the SDU boundaries.

Figure 4.24: Provision of the byte stream service

Connection establishment and release

The last points to be discussed about the transport protocol are the mechanisms used to establish and release a transport connection.

We explained in the first chapters the service primitives used to establish a connection. The simplest approach to establish a transport connection would be to define two special control segments: CR and CA. The CR segment is sent by the transport entity that wishes to initiate a connection. If the remote entity wishes to accept the connection, it replies by sending a CA segment. The transport connection is considered to be established once the CA segment has been received and data segments can be sent in both directions.
Figure 4.25: Naive transport connection establishment

Unfortunately, this scheme is not sufficient for several reasons. First, a transport entity usually needs to maintain several transport connections with remote entities. Sometimes, different users (i.e. processes) running above a given transport entity request the establishment of several transport connections to different users attached to the same remote transport entity. These different transport connections must be clearly separated to ensure that data from one connection is not passed to the other connections. This can be achieved by using a connection identifier, chosen by the transport entities and placed inside each segment to allow the entity which receives a segment to easily associate it to one established connection.


Second, as the network layer is imperfect, the CR or CA segment can be lost, delayed, or suffer from transmission errors. To deal with these problems, the control segments must be protected by using a CRC or checksum to detect transmission errors. Furthermore, since the CA segment acknowledges the reception of the CR segment, the CR segment can be protected by using a retransmission timer.

Unfortunately, this scheme is not sufficient to ensure the reliability of the transport service. Consider for example a short-lived transport connection where a single, but important transfer (e.g. money transfer from a bank account) is sent. Such a short-lived connection starts with a CR segment acknowledged by a CA segment, then the data segment is sent and acknowledged, and the connection terminates. Unfortunately, as the network layer service is unreliable, delays combined with retransmissions may lead to the situation depicted in the figure below, where a delayed CR and data segments from a former connection are accepted by the receiving entity as valid segments, and the corresponding data is delivered to the user. Duplicating SDUs is not acceptable, and the transport protocol must solve this problem.

Figure 4.26: Duplicate transport connections?

To avoid these duplicates, transport protocols require the network layer to bound the Maximum Segment Lifetime (MSL). The organisation of the network must guarantee that no segment remains in the network for longer than MSL seconds. On today's Internet, MSL is expected to be 2 minutes. To avoid duplicate transport connections, transport protocol entities must be able to safely distinguish between a duplicate CR segment and a new CR segment, without forcing each transport entity to remember all the transport connections that it has established in the past.
A classical solution to avoid remembering the previous transport connections to detect duplicates is to use a clock inside each transport entity. This transport clock has the following characteristics:

- the transport clock is implemented as a k bits counter and its clock cycle is such that $2^k \times cycle \gg MSL$. Furthermore, the transport clock counter is incremented every clock cycle and after each connection establishment. This clock is illustrated in the figure below.
- the transport clock must continue to be incremented even if the transport entity stops or reboots

Figure 4.27: Transport clock

It should be noted that transport clocks do not need to be, and usually are not, synchronised to the real-time clock. Precisely synchronising real-time clocks is an interesting problem, but it is outside the scope of this document. See [Mills2006] for a detailed discussion on synchronising the real-time clock.
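The behaviour of the transport clock described above can be sketched as a small Python class. The class and method names are hypothetical, and the sketch does not model the requirement that the counter keeps advancing across reboots, which in practice needs hardware or persistent storage.

```python
class TransportClock:
    """k-bit counter incremented every clock cycle and after each
    connection establishment (a sketch, not a real implementation)."""

    def __init__(self, k_bits: int, cycle_seconds: float, start: int = 0) -> None:
        self.modulus = 2 ** k_bits
        self.cycle_seconds = cycle_seconds
        self.value = start % self.modulus

    def tick(self) -> None:
        """Called once per clock cycle."""
        self.value = (self.value + 1) % self.modulus

    def next_connection_seq(self) -> int:
        """Return the sequence number for a new connection attempt and
        increment the counter, as required after each establishment."""
        seq = self.value
        self.tick()
        return seq

    def wrap_time(self) -> float:
        """Seconds before the counter wraps when only clock cycles advance
        it, i.e. 2^k x cycle. This value must be much larger than the MSL."""
        return self.modulus * self.cycle_seconds


clock = TransportClock(k_bits=32, cycle_seconds=4e-6)
print(clock.wrap_time() > 120.0)   # compare with an MSL of 2 minutes
```

Connection establishments advance the counter in addition to the clock cycles, so `wrap_time()` is an upper bound on the real wrap period.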


The transport clock is combined with an exchange of three segments, called the three way handshake, to detect duplicates. This three way handshake occurs as follows:

1. The initiating transport entity sends a CR segment. This segment requests the establishment of a transport connection. It contains a connection identifier (not shown in the figure) and a sequence number (seq=x in the figure below) whose value is extracted from the transport clock. The transmission of the CR segment is protected by a retransmission timer.
2. The remote transport entity processes the CR segment and creates state for the connection attempt. At this stage, the remote entity does not yet know whether this is a new connection attempt or a duplicate segment. It returns a CA segment that contains an acknowledgement number to confirm the reception of the CR segment (ack=x in the figure below) and a sequence number (seq=y in the figure below) whose value is extracted from its transport clock. At this stage, the connection is not yet established.
3. The initiating entity receives the CA segment. The acknowledgement number of this segment confirms that the remote entity has correctly received the CR segment. The transport connection is considered to be established by the initiating entity and the numbering of the data segments starts at sequence number x. Before sending data segments, the initiating entity must acknowledge the received CA segment by sending another CA segment.
4. The remote entity considers the transport connection to be established after having received the segment that acknowledges its CA segment. The numbering of the data segments sent by the remote entity starts at sequence number y.

The three way handshake is illustrated in the figure below.

Figure 4.28: Three-way handshake

Thanks to the three way handshake, transport entities avoid duplicate transport connections. This is illustrated by the three scenarios below.
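Before looking at these scenarios, the normal exchange described in the four steps above can be sketched in Python. This is a simplified model: the class names are ours, connection identifiers, timers and the REJECT segment are not modelled, and an unexpected CA is simply not accepted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    kind: str                   # "CR" or "CA"
    seq: int
    ack: Optional[int] = None


class Initiator:
    def __init__(self, clock_value: int) -> None:
        self.x = clock_value    # sequence number taken from the transport clock
        self.established = False

    def connect(self) -> Segment:
        return Segment("CR", seq=self.x)                     # step 1

    def on_ca(self, ca: Segment) -> Optional[Segment]:
        if ca.kind != "CA" or ca.ack != self.x:
            return None         # does not acknowledge our CR: not accepted
        self.established = True                              # step 3
        return Segment("CA", seq=self.x, ack=ca.seq)


class Remote:
    def __init__(self, clock_value: int) -> None:
        self.y = clock_value    # sequence number taken from its own clock
        self.established = False

    def on_cr(self, cr: Segment) -> Segment:
        return Segment("CA", seq=self.y, ack=cr.seq)         # step 2

    def on_ca(self, ca: Segment) -> None:
        if ca.kind == "CA" and ca.ack == self.y:
            self.established = True                          # step 4


initiator, remote = Initiator(clock_value=100), Remote(clock_value=500)
reply = remote.on_cr(initiator.connect())
remote.on_ca(initiator.on_ca(reply))
print(initiator.established, remote.established)
```

The only checks performed are the comparisons of the acknowledgement numbers with x and y. These two comparisons are what allow each entity to reject segments left over from an earlier connection.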
The first scenario is when the remote entity receives an old CR segment. It considers this CR segment as a connection establishment attempt and replies by sending a CA segment. However, the initiating host cannot match the received CA segment with a previous connection attempt. It sends a control segment (REJECT in the figure below) to cancel the spurious connection attempt. The remote entity cancels the connection attempt upon reception of this control segment.

Figure 4.29: Three-way handshake: recovery from a duplicate CR

A second scenario is when the initiating entity sends a CR segment that does not reach the remote entity and receives a duplicate CA segment from a previous connection attempt. This duplicate CA segment cannot contain a valid acknowledgement for the CR segment as the sequence number of the CR segment was extracted from the transport clock of the initiating entity. The CA segment is thus rejected and the CR segment is retransmitted upon expiration of the retransmission timer.

Figure 4.30: Three-way handshake: recovery from a duplicate CA

The last scenario is less likely, but it is important to consider it as well. The remote entity receives an old CR segment. It notes the connection attempt and acknowledges it by sending a CA segment. The initiating entity does not have a matching connection attempt and replies by sending a REJECT. Unfortunately, this segment never reaches the remote entity. Instead, the remote entity receives a retransmission of an older CA segment that contains the same sequence number as the first CR segment. This CA segment cannot be accepted by the remote entity as a confirmation of the transport connection as its acknowledgement number cannot have the same value as the sequence number of the first CA segment.

Figure 4.31: Three-way handshake: recovery from duplicates CR and CA

When we discussed the connection-oriented service, we mentioned that there are two types of connection releases: abrupt release and graceful release.

The first solution to release a transport connection is to define a new control segment (e.g. the DR segment) and consider the connection to be released once this segment has been sent or received. This is illustrated in the figure below.

Figure 4.32: Abrupt connection release

As the entity that sends the DR segment cannot know whether the other entity has already sent all its data on the connection, SDUs can be lost during such an abrupt connection release.

The second method to release a transport connection is to release independently the two directions of data transfer. Once a user of the transport service has sent all its SDUs, it performs a DISCONNECT.req for its direction of data transfer. The transport entity sends a control segment to request the release of the connection after the delivery of all previous SDUs to the remote user. This is usually done by placing in the DR the next sequence number and by delivering the DISCONNECT.ind only after all previous DATA.ind. The remote entity confirms the reception of the DR segment and the release of the corresponding direction of data transfer by returning an acknowledgement. This is illustrated in the figure below.


Figure 4.33: Graceful connection release

4.2 The User Datagram Protocol

The User Datagram Protocol (UDP) is defined in RFC 768. It provides an unreliable connectionless transport service on top of the unreliable network layer connectionless service. The main characteristics of the UDP service are:

- the UDP service cannot deliver SDUs that are larger than 65507 bytes [7]
- the UDP service does not guarantee the delivery of SDUs (losses and desequencing can occur)
- the UDP service will not deliver a corrupted SDU to the destination

Compared to the connectionless network layer service, the main advantage of the UDP service is that it allows several applications running on a host to exchange SDUs with several other applications running on remote hosts. Let us consider two hosts, e.g. a client and a server. The network layer service allows the client to send information to the server, but if an application running on the client wants to contact a particular application running on the server, then an additional addressing mechanism is required, other than the IP address that identifies a host, in order to differentiate the applications running on a host. This additional addressing is provided by port numbers. When a server using UDP is enabled on a host, this server registers a port number. This port number will be used by the clients to contact the server process via UDP.

The figure below shows a typical usage of the UDP port numbers. The client process uses port number 1234 while the server process uses port number 5678. When the client sends a request, it is identified as originating from port number 1234 on the client host and destined to port number 5678 on the server host. When the server process replies to this request, the server's UDP implementation will send the reply as originating from port 5678 on the server host and destined to port 1234 on the client host.

UDP uses a single segment format shown in the figure below.
Figure 4.34: Usage of the UDP port numbers

Figure 4.35: UDP header format

The UDP header contains four fields:

- a 16 bits source port
- a 16 bits destination port
- a 16 bits length field
- a 16 bits checksum

As the port numbers are encoded as a 16 bits field, there can be up to only 65535 different server processes that are bound to a different UDP port at the same time on a given server. In practice, this limit is never reached. However, it is worth noticing that most implementations divide the range of allowed UDP port numbers into three different ranges:

- the privileged port numbers (1

[7] This limitation is due to the fact that the network layer (IPv4 and IPv6) cannot transport packets that are larger than 64 KBytes. As UDP does not include any segmentation/reassembly mechanism, it cannot split a SDU before sending it.

a UDP application that is often used in the wide area. However, in local area networks, many distributed systems rely on Remote Procedure Call (RPC) that is often used on top of UDP. In Unix environments, the Network File System (NFS) is built on top of RPC and runs frequently on top of UDP. A second class of UDP-based applications are the interactive computer games that need to frequently exchange small messages, such as the player's location or their recent actions. Many of these games use UDP to minimise the delay and can recover from losses. A third class of applications are multimedia applications such as interactive Voice over IP or interactive Video over IP. These interactive applications expect a delay shorter than about 200 milliseconds between the sender and the receiver and can recover from losses directly inside the application.

4.3 The Transmission Control Protocol

The Transmission Control Protocol (TCP) was initially defined in RFC 793. Several parts of the protocol have been improved since the publication of the original protocol specification [9]. However, the basics of the protocol remain and an implementation that only supports RFC 793 should inter-operate with today's implementations.

TCP provides a reliable byte stream, connection-oriented transport service on top of the unreliable connectionless network service provided by IP. TCP is used by a large number of applications, including:

- Email (SMTP, POP, IMAP)
- World wide web (HTTP, ...)
- Most file transfer protocols (ftp, peer-to-peer file sharing applications, ...)
- remote computer access: telnet, ssh, X11, VNC, ...
- non-interactive multimedia applications: flash

On the global Internet, most of the applications used in the wide area rely on TCP. Many studies [10] have reported that TCP was responsible for more than 90% of the data exchanged in the global Internet.

To provide this service, TCP relies on a simple segment format that is shown in the figure below. Each TCP segment contains a header described below and, optionally, a payload. The default length of the TCP header is twenty bytes, but some TCP headers contain options.
Figure 4.36: TCP header format

[9] A detailed presentation of all standardisation documents concerning TCP may be found in RFC 4614.

[10] Several researchers have analysed the utilisation of TCP and UDP in the global Internet. Most of these studies have been performed by collecting all the packets transmitted over a given link during a period of a few hours or days and then analysing their headers to infer the transport protocol used, the type of application, ... Recent studies include http://www.caida.org/research/traffic-analysis/tcpudpratio/, https://research.sprintlabs.com/packstat/packetoverview.php or http://www.nanog.org/meetings/nanog43/presentations/Labovitz_internetstats_N43.pdf

A TCP header contains the following fields:

- Source and destination ports. The source and destination ports play an important role in TCP, as they allow the identification of the connection to which a TCP segment belongs. When a client opens a TCP connection, it typically selects an ephemeral TCP port number as its source port and contacts the server by using the server's port number. All the segments that are sent by the client on this connection have the same source and destination ports. The server sends segments that contain as source (resp. destination) port the destination (resp. source) port of the segments sent by the client (see figure Utilization of the TCP source and destination ports). A TCP connection is always identified by five pieces of information:
    - the IP address of the client
    - the IP address of the server
    - the port chosen by the client
    - the port chosen by the server
    - TCP
- the sequence number (32 bits), acknowledgement number (32 bits) and window (16 bits) fields are used to provide a reliable data transfer, using a window-based protocol. In a TCP byte stream, each byte of the stream consumes one sequence number. Their utilisation will be described in more detail in section TCP reliable data transfer
- the Urgent pointer is used to indicate that some data should be considered as urgent in a TCP byte stream. However, it is rarely used in practice and will not be described here. Additional details about the utilisation of this pointer may be found in RFC 793, RFC 1122 or [Stevens1994]
- the flags field contains a set of bit flags that indicate how a segment should be interpreted by the TCP entity receiving it:
    - the SYN flag is used during connection establishment
    - the FIN flag is used during connection release
    - the RST flag is used in case of problems or when an invalid segment has been received
    - when the ACK flag is set, it indicates that the acknowledgment field contains a valid number. Otherwise, the content of the acknowledgment field must be ignored by the receiver
    - the URG flag is used together with the Urgent pointer
    - the PSH flag is used as a notification from the sender to indicate to the receiver that it should pass all the data it has received to the receiving process. However, in practice TCP implementations do not allow TCP users to indicate when the PSH flag should be set and thus there are few real utilizations of this flag.
- the checksum field contains the value of the Internet checksum computed over the entire TCP segment and a pseudo-header as with UDP
- the Reserved field was initially reserved for future utilization. It is now used by RFC 3168.
- the TCP Header Length (THL) or Data Offset field is a four bits field that indicates the size of the TCP header in 32 bit words. The maximum size of the TCP header is thus 60 bytes (15 words of 4 bytes).
- the Optional header extension is used to add optional information to the TCP header. Thanks to this header extension, it is possible to add new fields to the TCP header that were not planned in the original specification. This allowed TCP to evolve since the early eighties. The details of the TCP header extension are explained in sections TCP connection establishment and TCP reliable data transfer

Figure 4.37: Utilization of the TCP source and destination ports

The rest of this section is organised as follows. We first explain the establishment and the release of a TCP connection, then we discuss the mechanisms that are used by TCP to provide a reliable byte stream service. We end the section with a discussion of network congestion and explain the mechanisms that TCP uses to avoid congestion collapse.

4.3.1 TCP connection establishment

A TCP connection is established by using a three-way handshake. The connection establishment phase uses the sequence number, the acknowledgment number and the SYN flag. When a TCP connection is established, the two communicating hosts negotiate the initial sequence number to be used in both directions of the connection. For this, each TCP entity maintains a 32 bits counter, which is supposed to be incremented by one at least every 4 microseconds and after each connection establishment [11]. When a client host wants to open a TCP connection with a server host, it creates a TCP segment with:

- the SYN flag set
- the sequence number set to the current value of the 32 bits counter of the client host's TCP entity

Upon reception of this segment (which is often called a SYN segment), the server host replies with a segment containing:

- the SYN flag set
- the sequence number set to the current value of the 32 bits counter of the server host's TCP entity
- the ACK flag set
- the acknowledgment number set to the sequence number of the received SYN segment incremented by 1 (mod $2^{32}$). When a TCP entity sends a segment having x+1 as acknowledgment number, this indicates that it has received all data up to and including sequence number x and that it is expecting data having sequence number x+1. As the SYN flag was set in a segment having sequence number x, this implies that setting the SYN flag in a segment consumes one sequence number.

This segment is often called a SYN+ACK segment. The acknowledgment confirms to the client that the server has correctly received the SYN segment. The sequence number of the SYN+ACK segment is used by the server host to verify that the client has received the segment. Upon reception of the SYN+ACK segment, the client host replies with a segment containing:

- the ACK flag set
- the acknowledgment number set to the sequence number of the received SYN+ACK segment incremented by 1 (mod $2^{32}$)

At this point, the TCP connection is open and both the client and the server are allowed to send TCP segments containing data. This is illustrated in the figure below.

In the figure above, the connection is considered to be established by the client once it has received the SYN+ACK segment, while the server considers the connection to be established upon reception of the ACK segment. The first data segment sent by the client (server) has its sequence number set to x+1 (resp. y+1).
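The sequence and acknowledgment numbers exchanged during the three-way handshake can be reproduced with a few lines of Python. The sketch below represents segments as plain dictionaries, which is our own simplification. It only illustrates the arithmetic modulo $2^{32}$ and does not build real TCP segments.

```python
MOD = 2 ** 32


def syn(client_isn: int) -> dict:
    """SYN segment sent by the client, carrying its initial sequence number x."""
    return {"flags": {"SYN"}, "seq": client_isn % MOD}


def syn_ack(server_isn: int, received_syn: dict) -> dict:
    """SYN+ACK sent by the server: seq=y, ack=x+1 (the SYN consumes one number)."""
    return {"flags": {"SYN", "ACK"},
            "seq": server_isn % MOD,
            "ack": (received_syn["seq"] + 1) % MOD}


def ack(received_syn_ack: dict) -> dict:
    """Final ACK sent by the client: seq=x+1, ack=y+1."""
    return {"flags": {"ACK"},
            "seq": received_syn_ack["ack"],
            "ack": (received_syn_ack["seq"] + 1) % MOD}


# A client whose counter is about to wrap shows why the arithmetic is modular.
first = syn(MOD - 1)
second = syn_ack(1000, first)
third = ack(second)
print(first["seq"], second["ack"], third["ack"])
```

In this example the client's initial sequence number is the largest 32 bits value, so the acknowledgment number returned by the server wraps around to 0.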
Figure 4.38: Establishment of a TCP connection

[11] This 32 bits counter was specified in RFC 793. A 32 bits counter that is incremented every 4 microseconds wraps in about 4.5 hours. This period is much larger than the Maximum Segment Lifetime that is fixed at 2 minutes in the Internet (RFC 791, RFC 1122).

Note: Computing TCP's initial sequence number

In the original TCP specification RFC 793, each TCP entity maintained a clock to compute the initial sequence number (ISN) placed in the SYN and SYN+ACK segments. This made the ISN predictable and caused a security issue. The typical security problem was the following. Consider a server that trusts a host based on its IP address and allows the system administrator to login from this host without giving a password [12]. Consider now an attacker who knows this particular configuration and is able to send IP packets having the client's address as source. He can send fake TCP segments to the server, but does not receive the server's answers. If he can predict the ISN that is chosen by the server, he can send a fake SYN segment and shortly after the fake ACK segment confirming the reception of the SYN+ACK segment sent by the server. Once the TCP connection is open, he can use it to send any command to the server. To counter this attack, current TCP implementations add randomness to the ISN. One of the solutions, proposed in RFC 1948, is to compute the ISN as

ISN = M + H(localhost, localport, remotehost, remoteport, secret)

where M is the current value of the TCP clock and H is a cryptographic hash function. localhost and remotehost (resp. localport and remoteport) are the IP addresses (port numbers) of the local and remote host and secret is a random number only known by the server. This method allows the server to use different ISNs for different clients at the same time. Measurements performed with the first implementations of this technique showed that it was difficult to implement it correctly, but today's TCP implementations now generate good ISNs.

A server could, of course, refuse to open a TCP connection upon reception of a SYN segment. This refusal may be due to various reasons. There may be no server process that is listening on the destination port of the SYN segment. The server could always refuse connection establishments from this particular client (e.g. due to security reasons) or the server may not have enough resources to accept a new TCP connection at that time. In this case, the server would reply with a TCP segment having its RST flag set and containing the sequence number of the received SYN segment as its acknowledgment number. This is illustrated in the figure below. We discuss the other utilizations of the TCP RST flag later (see TCP connection release).
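The ISN computation given in the note above, ISN = M + H(localhost, localport, remotehost, remoteport, secret), can be sketched as follows. Several choices here are our own assumptions and not part of the text: SHA-256 is used as the hash function H, only the first 32 bits of the digest are kept, the secret is 16 random bytes, and the TCP clock M is derived from the wall clock with one increment every 4 microseconds.

```python
import hashlib
import os
import time
from typing import Optional

MOD = 2 ** 32
SECRET = os.urandom(16)   # random secret known only to this host (assumed size)


def tcp_clock(now: float) -> int:
    """M: a 32 bits counter incremented every 4 microseconds."""
    return int(now * 1_000_000 / 4) % MOD


def isn(localhost: str, localport: int, remotehost: str, remoteport: int,
        secret: bytes = SECRET, now: Optional[float] = None) -> int:
    """Return M + H(localhost, localport, remotehost, remoteport, secret) mod 2^32."""
    if now is None:
        now = time.time()
    material = f"{localhost}|{localport}|{remotehost}|{remoteport}|".encode() + secret
    digest = hashlib.sha256(material).digest()
    h = int.from_bytes(digest[:4], "big")      # keep 32 bits of the hash
    return (tcp_clock(now) + h) % MOD


print(isn("192.0.2.1", 80, "198.51.100.7", 40000))
print(isn("192.0.2.1", 80, "198.51.100.7", 40001))
```

For a given pair of hosts and ports the ISN still increases with the clock, while an attacker who does not know the secret cannot derive the offset used for another client.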
Figure 4.39: TCP connection establishment rejected by peer

TCP connection establishment can be described as the four state Finite State Machine shown below. In this FSM, !X (resp. ?Y) indicates the transmission of segment X (resp. reception of segment Y) during the corresponding transition. Init is the initial state.

Figure 4.40: TCP FSM for connection establishment

A client host starts in the Init state. It then sends a SYN segment and enters the SYN Sent state where it waits for a SYN+ACK segment. Then, it replies with an ACK segment and enters the Established state where data can be exchanged. On the other hand, a server host starts in the Init state. When a server process starts to listen to a destination port, the underlying TCP entity creates a TCP control block and a queue to process incoming SYN segments. Upon reception of a SYN segment, the server's TCP entity replies with a SYN+ACK and enters the SYN RCVD state. It remains in this state until it receives an ACK segment that acknowledges its SYN+ACK segment, at which point it enters the Established state.

Apart from these two paths in the TCP connection establishment FSM, there is a third path that corresponds to the case when both the client and the server send a SYN segment to open a TCP connection [13]. In this case, the client and the server send a SYN segment and enter the SYN Sent state. Upon reception of the SYN segment sent by the other host, they reply by sending a SYN+ACK segment and enter the SYN RCVD state. The SYN+ACK that arrives from the other host allows each of them to transition to the Established state. The figure below illustrates such a simultaneous establishment of a TCP connection.

Figure 4.41: Simultaneous establishment of a TCP connection

[12] On many departmental networks containing Unix workstations, it was common to allow users on one of the hosts to use rlogin (RFC 1258) to run commands on any of the workstations of the network without giving any password. In this case, the remote workstation authenticated the client host based on its IP address. This was a bad practice from a security viewpoint.

[13] Of course, such a simultaneous TCP establishment can only occur if the source port chosen by the client is equal to the destination port chosen by the server. This may happen when a host can serve both as a client and as a server or in peer-to-peer applications when the communicating hosts do not use ephemeral port numbers.
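The three paths through the connection establishment FSM described above can be written as a transition table. The sketch below follows the text only: the event name "active open" is our own label for the client's decision to connect, and timeouts, RST segments and connection release are not modelled.

```python
# (current state, event) -> (next state, segment to send or None)
# "?X" denotes the reception of segment X, as in the FSM notation above.
TRANSITIONS = {
    ("Init", "active open"):  ("SYN Sent", "SYN"),
    ("Init", "?SYN"):         ("SYN RCVD", "SYN+ACK"),
    ("SYN Sent", "?SYN+ACK"): ("Established", "ACK"),
    ("SYN Sent", "?SYN"):     ("SYN RCVD", "SYN+ACK"),   # simultaneous open
    ("SYN RCVD", "?ACK"):     ("Established", None),
    ("SYN RCVD", "?SYN+ACK"): ("Established", None),     # simultaneous open
}


def run(events, state="Init"):
    """Apply a sequence of events and return (final state, segments sent)."""
    sent = []
    for event in events:
        try:
            state, segment = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state!r}") from None
        if segment is not None:
            sent.append(segment)
    return state, sent


print(run(["active open", "?SYN+ACK"]))            # client path
print(run(["?SYN", "?ACK"]))                       # server path
print(run(["active open", "?SYN", "?SYN+ACK"]))    # simultaneous establishment
```

Any event that is not listed for the current state raises an error, which makes it easy to see that, for example, an ACK segment received in the Init state does not open a connection.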


Denial of Service attacks

When a TCP entity opens a TCP connection, it creates a Transmission Control Block (TCB). The TCB contains the entire state that is maintained by the TCP entity for each TCP connection. During connection establishment, the TCB contains the local IP address, the remote IP address, the local port number, the remote port number, the current local sequence number and the last sequence number received from the remote entity. Until the mid 1990s, TCP implementations had a limit on the number of TCP connections that could be in the SYN RCVD state at a given time. Many implementations set this limit to about 100 TCBs. This limit was considered sufficient even for heavily loaded HTTP servers given the small delay between the reception of a SYN segment and the reception of the ACK segment that terminates the establishment of the TCP connection. When the limit of 100 TCBs in the SYN RCVD state is reached, the TCP entity discards all received TCP SYN segments that do not correspond to an existing TCB.

This limit of 100 TCBs in the SYN RCVD state was chosen to protect the TCP entity from the risk of overloading its memory with too many TCBs in the SYN RCVD state. However, it was also the reason for a new type of Denial of Service (DoS) attack (RFC 4987). A DoS attack is defined as an attack where an attacker can render a resource unavailable in the network. For example, an attacker may cause a DoS attack on a 2 Mbps link used by a company by sending more than 2 Mbps of packets through this link. In this case, the DoS attack was more subtle. As a TCP entity discards all received SYN segments as soon as it has 100 TCBs in the SYN RCVD state, an attacker simply had to send a few hundred SYN segments every second to a server and never reply to the received SYN+ACK segments. To avoid being caught, attackers were of course sending these SYN segments with a different address than their own IP address [a]. On most TCP implementations, once a TCB entered the SYN RCVD state, it remained in this state for several seconds, waiting for a retransmission of the initial SYN segment. This attack was later called a SYN flood attack and the servers of the ISP named panix were among the first to be affected by this attack.
To avoid the SYN flood attacks, recent TCP implementations no longer enter the SYN RCVD state upon reception of a SYN segment. Instead, they reply directly with a SYN+ACK segment and wait until the reception of a valid ACK. This implementation trick is only possible if the TCP implementation is able to verify that the received ACK segment acknowledges the SYN+ACK segment sent earlier without storing the initial sequence number of this SYN+ACK segment in a TCB. The solution to this problem, which is known as SYN cookies, is to compute the 32 bits of the ISN as follows:

- the high order bits contain the low order bits of a counter that is incremented slowly
- the low order bits contain a hash value computed over the local and remote IP addresses and ports and a random secret only known to the server

The advantage of the SYN cookies is that by using them, the server does not need to create a TCB upon reception of the SYN segment and can still check the returned ACK segment by recomputing the SYN cookie.

[a] Sending a packet with a different source IP address than the address allocated to the host is called sending a spoofed packet.

Retransmitting the first SYN segment

As IP provides an unreliable connectionless service, the SYN and SYN+ACK segments sent to open a TCP connection could be lost. Current TCP implementations start a retransmission timer when they send the first SYN segment. This timer is often set to three seconds for the first retransmission and then doubles after each retransmission (RFC 2988). TCP implementations also enforce a maximum number of retransmissions for the initial SYN segment.

As explained earlier, TCP segments may contain an optional header extension. In the SYN and SYN+ACK segments, these options are used to negotiate some parameters and the utilisation of extensions to the basic TCP specification.
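Returning to the SYN cookies described above, the following sketch shows how a server could build such an ISN and later validate the returned ACK without keeping any state. It is only an illustration and does not reproduce any real implementation: the split into 8 counter bits and 24 hash bits, the use of SHA-256, and the acceptance of cookies generated with the current or the previous counter value are all assumptions of ours.

```python
import hashlib

MOD = 2 ** 32
COUNTER_BITS = 8                  # assumed split: 8 bits of counter ...
HASH_BITS = 32 - COUNTER_BITS     # ... and 24 bits of hash


def _hash(local_ip: str, local_port: int, remote_ip: str, remote_port: int,
          secret: bytes) -> int:
    """Hash of the addresses, ports and secret, truncated to HASH_BITS bits."""
    material = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}|".encode() + secret
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[:4], "big") >> COUNTER_BITS


def make_cookie(counter: int, local_ip: str, local_port: int,
                remote_ip: str, remote_port: int, secret: bytes) -> int:
    """ISN placed in the SYN+ACK: counter in the high bits, hash in the low bits."""
    high = counter % (2 ** COUNTER_BITS)
    return (high << HASH_BITS) | _hash(local_ip, local_port, remote_ip, remote_port, secret)


def check_ack(ack_number: int, counter: int, local_ip: str, local_port: int,
              remote_ip: str, remote_port: int, secret: bytes, max_age: int = 1) -> bool:
    """Recompute the cookie and compare it with the acknowledged ISN."""
    cookie = (ack_number - 1) % MOD          # the ACK acknowledges ISN + 1
    for age in range(max_age + 1):           # accept a slightly older counter
        expected = make_cookie(counter - age, local_ip, local_port,
                               remote_ip, remote_port, secret)
        if cookie == expected:
            return True
    return False


secret = b"only-known-to-the-server"
cookie = make_cookie(5, "192.0.2.1", 80, "198.51.100.7", 40000, secret)
print(check_ack((cookie + 1) % MOD, 5, "192.0.2.1", 80, "198.51.100.7", 40000, secret))
```

Because the counter occupies the high order bits, a cookie stops being accepted once the counter has advanced by more than `max_age`, which bounds the lifetime of an unanswered SYN+ACK.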
The first parameter which is negotiated during the establishment of a TCP connection is the Maximum Segment Size (MSS). The MSS is the size of the largest segment that a TCP entity is able to process. According to RFC 879, all TCP implementations must be able to receive TCP segments containing 536 bytes of payload. However, most TCP implementations are able to process larger segments. Such TCP implementations use the TCP MSS Option in the SYN/SYN+ACK segment to indicate the largest segment they are able to process. The MSS value indicates the maximum size of the payload of the TCP segments. The client (resp. server) stores in its TCB the MSS value announced by the server (resp. the client).


Another utilisation of TCP options during connection establishment is to enable TCP extensions. For example, consider RFC 1323 (which is discussed in TCP reliable data transfer). RFC 1323 defines TCP extensions to support timestamps and larger windows. If the client supports RFC 1323, it adds an RFC 1323 option to its SYN segment. If the server understands this RFC 1323 option and wishes to use it, it replies with an RFC 1323 option in the SYN+ACK segment and the extension defined in RFC 1323 is used throughout the TCP connection. Otherwise, if the server's SYN+ACK does not contain the RFC 1323 option, the client is not allowed to use this extension and the corresponding TCP header options throughout the TCP connection. TCP's option mechanism is flexible and it allows the extension of TCP while maintaining compatibility with older implementations.

The TCP options are encoded by using a Type Length Value (TLV) format where:

- the first byte indicates the type of the option
- the second byte indicates the total length of the option (including the first two bytes) in bytes
- the last bytes are specific for each type of option

RFC 793 defines the Maximum Segment Size (MSS) TCP option that must be understood by all TCP implementations. This option (type 2) has a length of 4 bytes and contains a 16-bit word that indicates the MSS supported by the sender of the SYN segment. The MSS option can only be used in TCP segments having the SYN flag set.

RFC 793 also defines two special options that must be supported by all TCP implementations. The first option is End of option. It is encoded as a single byte having value 0x00 and can be used to ensure that the TCP header extension ends on a 32-bit boundary. The No-Operation option, encoded as a single byte having value 0x01, can be used when the TCP header extension contains several TCP options that should be aligned on 32-bit boundaries. All other options [14] are encoded by using the TLV format.
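The TLV encoding described above can be illustrated with a small parser. This is a minimal sketch, not the code of a real TCP stack: the function name `parse_tcp_options` and the sample bytes (including the made-up option type 99) are assumptions chosen for the example. Unknown option types are skipped by using their length byte, which is what allows an implementation to accept options that it does not understand.

```python
import struct

END_OF_OPTIONS = 0   # single byte 0x00
NO_OPERATION = 1     # single byte 0x01
MSS = 2              # type 2, total length 4, 16-bit value


def parse_tcp_options(data):
    """Return the (type, value) pairs found in a TCP header extension."""
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == END_OF_OPTIONS:
            break                      # the rest of the extension is padding
        if kind == NO_OPERATION:
            i += 1                     # one-byte option, no length field
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option")
        length = data[i + 1]           # total length, including the first two bytes
        if length < 2 or i + length > len(data):
            raise ValueError("invalid option length")
        options.append((kind, data[i + 2:i + length]))
        i += length
    return options


# An MSS option announcing 1460 bytes, a No-Operation byte, a hypothetical
# unknown option (type 99, one byte of value) and End of option.
raw = bytes([MSS, 4]) + struct.pack("!H", 1460) + bytes([NO_OPERATION, 99, 3, 7, END_OF_OPTIONS])
for kind, value in parse_tcp_options(raw):
    print(kind, value.hex())
```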
[14] The full list of all TCP options may be found at http://www.iana.org/assignments/tcp-parameters/

Note: The robustness principle

The handling of the TCP options by TCP implementations is one of the many applications of the robustness principle, which is usually attributed to Jon Postel and is often quoted as "Be liberal in what you accept, and conservative in what you send" RFC 1122.

Concerning the TCP options, the robustness principle implies that a TCP implementation should be able to accept TCP options that it does not understand, in particular in received SYN segments, and that it should be able to parse any received segment without crashing, even if the segment contains an unknown TCP option. Furthermore, a server should not send in the SYN+ACK segment or later, options that have not been proposed by the client in the SYN segment.

4.3.2 TCP connection release

TCP, like most connection-oriented transport protocols, supports two types of connection release:

- graceful connection release, where each TCP user can release its own direction of data transfer
- abrupt connection release, where either one user closes both directions of data transfer or one TCP entity is forced to close the connection (e.g. because the remote host does not reply anymore or due to lack of resources)

The abrupt connection release mechanism is very simple and relies on a single segment having the RST bit set. A TCP segment containing the RST bit can be sent for the following reasons:

- a non-SYN segment was received for a non-existing TCP connection RFC 793
- by extension, some implementations respond with an RST segment to a segment that is received on an existing connection but with an invalid header RFC 3360. This causes the corresponding connection to be closed and has caused security attacks RFC 4953
- by extension, some implementations send an RST segment when they need to close an existing TCP connection (e.g. because there are not enough resources to support this connection or because the remote host is considered to be unreachable). Measurements have shown that this usage of TCP RST was widespread [AW05]


When an RST segment is sent by a TCP entity, it should contain the current value of the sequence number for the connection (or 0 if it does not belong to any existing connection) and the acknowledgement number should be set to the next expected in-sequence sequence number on this connection.

Note: TCP RST wars

TCP implementers should ensure that two TCP entities never enter a TCP RST war where host A is sending an RST segment in response to a previous RST segment that was sent by host B in response to a TCP RST segment sent by host A ... To avoid such an infinite exchange of RST segments that do not carry data, a TCP entity is never allowed to send an RST segment in response to another RST segment.

The normal way of terminating a TCP connection is by using the graceful TCP connection release. This mechanism uses the FIN flag of the TCP header and allows each host to release its own direction of data transfer. As for the SYN flag, the utilisation of the FIN flag in the TCP header consumes one sequence number. The figure FSM for TCP connection release shows the part of the TCP FSM used when a TCP connection is released.

Figure 4.42: FSM for TCP connection release

Starting from the Established state, there are two main paths through this FSM.

The first path is when the host receives a segment with sequence number x and the FIN flag set. The utilisation of the FIN flag indicates that the byte before sequence number x was the last byte of the byte stream sent by the remote host. Once all of the data has been delivered to the user, the TCP entity sends an ACK segment whose ack field is set to $(x + 1) \bmod 2^{32}$ to acknowledge the FIN segment. The FIN segment is subject to the same retransmission mechanisms as a normal TCP segment. In particular, its transmission is protected by the retransmission timer.
At this point, the TCP connection enters the CLOSE_WAIT state. In this state, the host can still send data to the remote host. Once all its data have been sent, it sends a FIN segment and enters the LAST_ACK state. In this state, the TCP entity waits for the acknowledgement of its FIN segment. It may still retransmit unacknowledged data segments, e.g. if the retransmission timer expires. Upon reception of the acknowledgement for the FIN segment, the TCP connection is completely closed and its TCB can be discarded.

The second path is when the host decides first to send a FIN segment. In this case, it enters the FIN_WAIT1 state. In this state, it can retransmit unacknowledged segments but cannot send new data segments. It waits for an acknowledgement of its FIN segment, but may receive a FIN segment sent by the remote host. In the first case (its FIN segment is acknowledged), the TCP connection enters the FIN_WAIT2 state. In this state, new data segments from the remote host are still accepted until the reception of the FIN segment. The acknowledgement for this FIN segment is sent once all data received before the FIN segment have been delivered to the user and the connection enters the TIME_WAIT state. In the second case, a FIN segment is received and the connection enters the Closing state once all data received from the remote host have been delivered to the user. In this state, no new data segments can be sent and the host waits for an acknowledgement of its FIN segment before entering the TIME_WAIT state.

The TIME_WAIT state is different from the other states of the TCP FSM. A TCP entity enters this state after having sent the last ACK segment on a TCP connection. This segment indicates to the remote host that all the data that it has sent have been correctly received and that it can safely release the TCP connection and discard


the corresponding TCB. After having sent the last ACK segment, a TCP connection enters the TIME_WAIT state and remains in this state for 2 × MSL seconds. During this period, the TCB of the connection is maintained. This ensures that the TCP entity that sent the last ACK maintains enough state to be able to retransmit this segment if this ACK segment is lost and the remote host retransmits its last FIN segment or another one. The delay of 2 × MSL seconds ensures that any duplicate segments on the connection would be handled correctly without causing the transmission of an RST segment. Without the TIME_WAIT state and the 2 × MSL seconds delay, the connection release would not be graceful when the last ACK segment is lost.

Note: TIME_WAIT on busy TCP servers

The 2 × MSL seconds delay in the TIME_WAIT state is an important operational problem on servers having thousands of simultaneously opened TCP connections [FTY99]. Consider for example a busy web server that processes 10,000 TCP connections every second. If each of these connections remains in the TIME_WAIT state for 4 minutes, this implies that the server would have to maintain more than 2 million TCBs at any time (10,000 × 240 = 2.4 million). For this reason, some TCP implementations prefer to perform an abrupt connection release by sending an RST segment to close the connection [AW05] and immediately discard the corresponding TCB. However, if the RST segment is lost, the remote host continues to maintain a TCB for a connection that no longer exists. This optimisation reduces the number of TCBs maintained by the host sending the RST segment but at the potential cost of increased processing on the remote host when the RST segment is lost.

4.3.3 TCP reliable data transfer

The original TCP data transfer mechanisms were defined in RFC 793. Based on the experience of using TCP on the growing global Internet, this part of the TCP specification has been updated and improved several times, always while preserving the backward compatibility with older TCP implementations. In this section, we review the main data transfer mechanisms used by TCP.
TCP is a window-based transport protocol that provides a bi-directional byte stream service. This has several implications on the fields of the TCP header and the mechanisms used by TCP. The three relevant fields of the TCP header are:

- sequence number. TCP uses a 32-bit sequence number. The sequence number placed in the header of a TCP segment containing data is the sequence number of the first byte of the payload of the TCP segment.
- acknowledgement number. TCP uses cumulative positive acknowledgements. Each TCP segment contains the sequence number of the next byte that the sender of the acknowledgement expects to receive from the remote host. In theory, the acknowledgement number is only valid if the ACK flag of the TCP header is set. In practice almost all TCP segments have their ACK flag set (only the SYN segments do not).
- window. A TCP receiver uses this 16-bit field to indicate the current size of its receive window expressed in bytes.

Note: The Transmission Control Block

For each established TCP connection, a TCP implementation must maintain a Transmission Control Block (TCB). A TCB contains all the information required to send and receive segments on this connection RFC 793. A complete TCP implementation contains additional information in its TCB, notably to support the urgent pointer. However, this part of TCP is not discussed in this book. Refer to RFC 793 and RFC 2140 for more details about the TCB. The TCB includes:

- the local IP address
- the remote IP address
- the local TCP port number
- the remote TCP port number
- the current state of the TCP FSM
- the maximum segment size (MSS)


- snd.nxt: the sequence number of the next byte in the byte stream (the first byte of a new data segment that you send uses this sequence number)
- snd.una: the earliest sequence number that has been sent but has not yet been acknowledged
- snd.wnd: the current size of the sending window (in bytes)
- rcv.nxt: the sequence number of the next byte that is expected to be received from the remote host
- rcv.wnd: the current size of the receive window advertised by the remote host
- sending buffer: a buffer used to store all unacknowledged data
- receiving buffer: a buffer to store all data received from the remote host that has not yet been delivered to the user. Data may be stored in the receiving buffer because either it was not received in sequence or because the user is too slow to process it

The original TCP specification can be categorised as a transport protocol that provides a byte stream service and uses go-back-n.

To send new data on an established connection, a TCP entity performs the following operations on the corresponding TCB. It first checks that the sending buffer does not contain more data than the receive window advertised by the remote host (rcv.wnd). If the window is not full, up to MSS bytes of data are placed in the payload of a TCP segment. The sequence number of this segment is the sequence number of the first byte of the payload. It is set to the first available sequence number, snd.nxt, and snd.nxt is incremented by the length of the payload of the TCP segment. The acknowledgement number of this segment is set to the current value of rcv.nxt and the window field of the TCP segment is computed based on the current occupancy of the receiving buffer. The data is kept in the sending buffer in case it needs to be retransmitted later.

When a TCP segment with the ACK flag set is received, the following operations are performed.
rcv.wnd is set to the value of the window field of the received segment. The acknowledgement number is compared to snd.una. The newly acknowledged data is removed from the sending buffer and snd.una is updated. If the TCP segment contained data, the sequence number is compared to rcv.nxt. If they are equal, the segment was received in sequence and the data can be delivered to the user and rcv.nxt is updated. The contents of the receiving buffer are checked to see whether other data already present in this buffer can be delivered in sequence to the user. If so, rcv.nxt is updated again. Otherwise (the segment was received out of sequence), the segment's payload is placed in the receiving buffer.

Segment transmission strategies

In a transport protocol such as TCP that offers a byte stream, a practical issue that was left as an implementation choice in RFC 793 is to decide when a new TCP segment containing data must be sent. There are two simple and extreme implementation choices. The first implementation choice is to send a TCP segment as soon as the user has requested the transmission of some data. This allows TCP to provide a low delay service. However, if the user is sending data one byte at a time, TCP would place each user byte in a segment containing 20 bytes of TCP header (this segment is then preceded by an IP header whose minimum size is 20 bytes for IPv4 and 40 bytes for IPv6; IPv4 and IPv6 are described in the next chapter). This is a huge overhead that is not acceptable in wide area networks. A second simple solution would be to only transmit a new TCP segment once the user has produced MSS bytes of data. This solution reduces the overhead, but at the cost of a potentially very high delay.

An elegant solution to this problem was proposed by John Nagle in RFC 896. John Nagle observed that the overhead caused by the TCP header was a problem in wide area connections, but less in local area connections where the available bandwidth is usually higher. He proposed the following rules to decide whether to send a new data segment when new data has been produced by the user or a new ack segment has been received:

    if rcv.wnd >= MSS and len(data) >= MSS:
        send one MSS-sized segment
    else:
        if there are unacknowledged data:
            place data in buffer until acknowledgement has been received
        else:
            send one TCP segment containing all buffered data
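The rules above can also be written as a small runnable function. This is a sketch under assumptions: the function name `nagle_decision` and its parameters are invented for this example, and the cap of a partial segment at the advertised window is an extra safety check that is not part of the rules as stated in the text.

```python
def nagle_decision(rcv_wnd, mss, pending, unacked):
    """Return the number of bytes to send now; 0 means keep buffering.

    rcv_wnd: receive window advertised by the remote host (bytes)
    mss:     maximum segment size (bytes)
    pending: bytes produced by the user and not yet sent
    unacked: bytes sent but not yet acknowledged
    """
    if rcv_wnd >= mss and pending >= mss:
        return mss                    # first rule: send one full segment
    if unacked > 0:
        return 0                      # wait for the acknowledgement
    # Second rule: send everything that is buffered in one partial segment.
    # Assumption: never exceed the window advertised by the remote host.
    return min(pending, rcv_wnd)


print(nagle_decision(65535, 1460, 3000, 0))    # bulk transfer: full segment
print(nagle_decision(65535, 1460, 1, 500))     # one byte, data in flight: wait
print(nagle_decision(65535, 1460, 1, 0))       # one byte, nothing in flight: send it
```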


The first rule ensures that a TCP connection used for bulk data transfer always sends full TCP segments. The second rule sends one partially filled TCP segment every round-trip-time.

This algorithm, called the Nagle algorithm, takes a few lines of code in all TCP implementations. These lines of code have a huge impact on the packets that are exchanged in TCP/IP networks. Researchers have analysed the distribution of the packet sizes by capturing and analysing all the packets passing through a given link. These studies have shown several important results:

- in TCP/IPv4 networks, a large fraction of the packets are TCP segments that contain only an acknowledgement. These packets usually account for 40-50% of the packets passing through the studied link
- in TCP/IPv4 networks, most of the bytes are exchanged in long packets, usually packets containing up to 1460 bytes of payload, which is the default MSS for hosts attached to an Ethernet network, the most popular type of LAN

The figure below provides a distribution of the packet sizes measured on a link. It shows a trimodal distribution of the packet size. 50% of the packets contain pure TCP acknowledgements and occupy 40 bytes. About 20% of the packets contain about 500 bytes [18] of user data and 12% of the packets contain 1460 bytes of user data. However, most of the user data is transported in large packets. This packet size distribution has implications on the design of routers as we discuss in the next chapter.

Figure 4.43: Packet size distribution in the Internet

Recent measurements indicate that these packet size distributions are still valid in today's Internet, although the packet distribution tends to become bimodal with small packets corresponding to TCP pure acks (40-64 bytes depending on the utilisation of TCP options) and large 1460-byte packets carrying most of the user data.
[18] When these measurements were taken, some hosts had a default MSS of 552 bytes (e.g. BSD Unix derivatives) or 536 bytes (the default MSS specified in RFC 879). Today, most TCP implementations derive the MSS from the maximum packet size of the LAN interface they use (Ethernet in most cases).

TCP windows

From a performance point of view, one of the main limitations of the original TCP specification is the 16-bit window field in the TCP header. As this field indicates the current size of the receive window in bytes, it limits the TCP receive window to 65535 bytes. This limitation was not a severe problem when TCP was designed since at that time high-speed wide area networks offered a maximum bandwidth of 56 kbps. However, in today's networks, this limitation is not acceptable anymore. The table below provides the rough [19] maximum throughput that can be achieved by a TCP connection with a 64 KBytes window as a function of the connection's round-trip-time.

    RTT          Maximum throughput
    1 msec       524 Mbps
    10 msec      52.4 Mbps
    100 msec     5.24 Mbps
    500 msec     1.05 Mbps

[19] A precise estimation of the maximum bandwidth that can be achieved by a TCP connection should take into account the overhead of the TCP and IP headers as well.

To solve this problem, a backward compatible extension that allows TCP to use larger receive windows was proposed in RFC 1323. Today, most TCP implementations support this option. The basic idea is that instead of


storing snd.wnd and rcv.wnd as 16-bit integers in the TCB, they should be stored as 32-bit integers. As the TCP segment header only contains 16 bits to place the window field, it is impossible to copy the value of snd.wnd in each sent TCP segment. Instead the header contains snd.wnd >> S where S is the scaling factor ($0 \le S \le 14$) negotiated during connection establishment. The client adds its proposed scaling factor as a TCP option in the SYN segment. If the server supports RFC 1323, it places in the SYN+ACK segment the scaling factor that it uses when advertising its own receive window. The local and remote scaling factors are included in the TCB. If the server does not support RFC 1323, it ignores the received option and no scaling is applied.

By using the window scaling extensions defined in RFC 1323, TCP implementations can use a receive buffer of up to 1 GByte. With such a receive buffer, the maximum throughput that can be achieved by a single TCP connection becomes:

    RTT          Maximum throughput
    1 msec       8590 Gbps
    10 msec      859 Gbps
    100 msec     86 Gbps
    500 msec     17 Gbps

These throughputs are acceptable in today's networks. However, there are already servers having 10 Gbps interfaces... Early TCP implementations had fixed receiving and sending buffers [20]. Today's high performance implementations are able to automatically adjust the size of the sending and receiving buffers to better support high bandwidth flows [SMM1998].

TCP's retransmission timeout

In a go-back-n transport protocol such as TCP, the retransmission timeout must be correctly set in order to achieve good performance. If the retransmission timeout expires too early, then bandwidth is wasted by retransmitting segments that have already been correctly received; whereas if the retransmission timeout expires too late, then bandwidth is wasted because the sender is idle waiting for the expiration of its retransmission timeout.
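As a side note on the two throughput tables of the TCP windows discussion, their entries follow from a single relation: a sender can have at most one window of data in flight per round-trip-time, so the throughput is bounded by window / RTT. The short sketch below recomputes the tables under that assumption; the function name `max_throughput_bps` is chosen for this example, and header overhead is ignored as in the tables.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput: one full window per round-trip-time."""
    return window_bytes * 8 / rtt_seconds


for rtt_ms in (1, 10, 100, 500):
    rtt = rtt_ms / 1000
    small = max_throughput_bps(2**16 - 1, rtt)   # 16-bit window field
    large = max_throughput_bps(2**30, rtt)       # about 1 GByte with window scaling
    print(f"{rtt_ms:>4} msec  {small / 1e6:8.2f} Mbps  {large / 1e9:8.1f} Gbps")
```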
[20] See http://fasterdata.es.net/tuning.html for more information on how to tune a TCP implementation.

A good setting of the retransmission timeout clearly depends on an accurate estimation of the round-trip-time of each TCP connection. The round-trip-time differs between TCP connections, but may also change during the lifetime of a single connection. For example, the figure below shows the evolution of the round-trip-time between two hosts during a period of 45 seconds.

Figure 4.44: Evolution of the round-trip-time between two hosts

The easiest solution to measure the round-trip-time on a TCP connection is to measure the delay between the transmission of a data segment and the reception of a corresponding acknowledgement. (In theory, a TCP implementation could store the timestamp of each data segment transmitted and compute a new estimate for the round-trip-time upon reception of the corresponding acknowledgement. However, using such frequent measurements introduces a lot of noise in practice and many implementations still measure the round-trip-time once per round-trip-time by recording the transmission time of one segment at a time RFC 2988.) As illustrated in the


figure below, this measurement works well when there are no segment losses.

Figure 4.45: How to measure the round-trip-time?

However, when a data segment is lost, as illustrated in the bottom part of the figure, the measurement is ambiguous as the sender cannot determine whether the received acknowledgement was triggered by the first transmission of segment 123 or its retransmission. Using incorrect round-trip-time estimations could lead to incorrect values of the retransmission timeout. For this reason, Phil Karn and Craig Partridge proposed, in [KP91], to ignore the round-trip-time measurements performed during retransmissions.

To avoid this ambiguity in the estimation of the round-trip-time when segments are retransmitted, recent TCP implementations rely on the timestamp option defined in RFC 1323. This option allows a TCP sender to place two 32-bit timestamps in each TCP segment that it sends. The first timestamp, TS Value (TSval), is chosen by the sender of the segment. It could for example be the current value of its real-time clock [22]. The second value, TS Echo Reply (TSecr), is the last TSval that was received from the remote host and stored in the TCB. The figure below shows how the utilisation of this timestamp option allows for the disambiguation of the round-trip-time measurement when there are retransmissions.

Figure 4.46: Disambiguating round-trip-time measurements with the RFC 1323 timestamp option

Once the round-trip-time measurements have been collected for a given TCP connection, the TCP entity must compute the retransmission timeout. As the round-trip-time measurements may change during the lifetime of a connection, the retransmission timeout may also change. At the beginning of a connection [23], the TCP entity that sends a SYN segment does not know the round-trip-time to reach the remote host and the initial retransmission timeout is usually set to 3 seconds RFC 2988.
[22] Some security experts have raised concerns that using the real-time clock to set the TSval in the timestamp option can leak information such as the system's up-time. Solutions proposed to solve this problem may be found in [CNPI09].

[23] As a TCP client often establishes several parallel or successive connections with the same server, RFC 2140 has proposed to reuse for a new connection some information that was collected in the TCB of a previous connection, such as the measured rtt. However, this solution has not been widely implemented.

The original TCP specification proposed in RFC 793 to include two additional variables in the TCB:

- srtt: the smoothed round-trip-time computed as $srtt = (\alpha \times srtt) + ((1 - \alpha) \times rtt)$ where rtt is the round-trip-time measured according to the above procedure and $\alpha$ a smoothing factor (e.g. 0.8 or 0.9)


- rto: the retransmission timeout is computed as $rto = \min(60, \max(1, \beta \times srtt))$ where $\beta$ is used to take into account the delay variance (value: 1.3 to 2.0). The 60 and 1 constants are used to ensure that the rto is not larger than one minute nor smaller than 1 second.

However, in practice, this computation for the retransmission timeout did not work well. The main problem was that the computed rto did not correctly take into account the variations in the measured round-trip-time. Van Jacobson proposed in his seminal paper [Jacobson1988] an improved algorithm to compute the rto and implemented it in the BSD Unix distribution. This algorithm is now part of the TCP standard RFC 2988.

Jacobson's algorithm uses two state variables, srtt the smoothed rtt and rttvar the estimation of the variance of the rtt, and two parameters: $\alpha$ and $\beta$. When a TCP connection starts, the first rto is set to 3 seconds. When a first estimation of the rtt is available, the srtt, rttvar and rto are computed as

$srtt = rtt$
$rttvar = rtt / 2$
$rto = srtt + 4 \times rttvar$

Then, when other rtt measurements are collected, srtt and rttvar are updated as follows:

$rttvar = (1 - \beta) \times rttvar + \beta \times |srtt - rtt|$
$srtt = (1 - \alpha) \times srtt + \alpha \times rtt$
$rto = srtt + 4 \times rttvar$

The proposed values for the parameters are $\alpha = \frac{1}{8}$ and $\beta = \frac{1}{4}$. This allows a TCP implementation, implemented in the kernel, to perform the rtt computation by using shift operations instead of the more costly floating point operations [Jacobson1988]. The figure below illustrates the computation of the rto upon rtt changes.
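The update rules of Jacobson's algorithm can be sketched in a few lines. This is an illustration that uses floating point for readability rather than the shift operations mentioned above; the class name `RtoEstimator` is an assumption made for this example, and the lower and upper bounds that an implementation may apply to the rto are left out.

```python
class RtoEstimator:
    """Retransmission timeout estimator following the update rules in the text."""

    ALPHA = 1 / 8   # weight of a new measurement in srtt
    BETA = 1 / 4    # weight of a new deviation in rttvar

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0              # initial rto before any measurement (seconds)

    def update(self, rtt):
        """Take one rtt measurement (seconds) and return the new rto."""
        if self.srtt is None:       # first measurement on this connection
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # rttvar is updated first, with the previous value of srtt
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self.srtt + 4 * self.rttvar
        return self.rto


estimator = RtoEstimator()
for sample in (1.0, 1.0, 2.0):
    print(sample, estimator.update(sample))
```

With a stable rtt, rttvar decays and the rto moves closer to srtt; a single larger measurement immediately raises rttvar and therefore the rto.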
Figure 4.47: Example computation of the rto

Advanced retransmission strategies

The default go-back-n retransmission strategy was defined in RFC 793. When the retransmission timer expires, TCP retransmits the first unacknowledged segment (i.e. the one having sequence number snd.una). After each expiration of the retransmission timeout, RFC 2988 recommends to double the value of the retransmission timeout. This is called an exponential backoff. This doubling of the retransmission timeout after a retransmission was included in TCP to deal with issues such as network/receiver overload and incorrect initial estimations of the retransmission timeout. If the same segment is retransmitted several times, the retransmission timeout is doubled after every retransmission until it reaches a configured maximum. RFC 2988 suggests a maximum retransmission timeout of at least 60 seconds. Once the retransmission timeout reaches this configured maximum, the remote host is considered to be unreachable and the TCP connection is closed.

This retransmission strategy has been refined based on the experience of using TCP on the Internet. The first refinement was a clarification of the strategy used to send acknowledgements. As TCP uses piggybacking, the


easiest and least costly method to send acknowledgements is to place them in the data segments sent in the other direction. However, few application layer protocols exchange data in both directions at the same time and thus this method rarely works. For an application that is sending data segments in one direction only, the remote TCP entity returns empty TCP segments whose only useful information is their acknowledgement number. This may cause a large overhead in wide area networks if a pure ACK segment is sent in response to each received data segment. Most TCP implementations use a delayed acknowledgement strategy. This strategy ensures that piggybacking is used whenever possible, otherwise pure ACK segments are sent for every second received data segment when there are no losses. When there are losses or reordering, ACK segments are more important for the sender and they are sent immediately RFC 813, RFC 1122. This strategy relies on a new timer with a short delay (e.g. 50 milliseconds) and one additional flag in the TCB. It can be implemented as follows:

    reception of a data segment:
        if pkt.seq == rcv.nxt:      # segment received in sequence
            if delayedack:
                send pure ack segment
                cancel acktimer
                delayedack = False
            else:
                delayedack = True
                start acktimer
        else:                       # out of sequence segment
            send pure ack segment
            if delayedack:
                delayedack = False
                cancel acktimer

    transmission of a data segment: # piggyback ack
        if delayedack:
            delayedack = False
            cancel acktimer

    acktimer expiration:
        send pure ack segment
        delayedack = False

Due to this delayed acknowledgement strategy, during a bulk transfer, a TCP implementation usually acknowledges every second TCP segment received.

The default go-back-n retransmission strategy used by TCP has the advantage of being simple to implement, in particular on the receiver side, but when there are losses, a go-back-n strategy provides a lower performance than a selective repeat strategy. The TCP developers have designed several extensions to TCP to allow it to use a selective repeat strategy while maintaining backward compatibility with older TCP implementations. These TCP extensions assume that the receiver is able to buffer the segments that it receives out-of-sequence.
The first extension that was proposed is the fast retransmit heuristic. This extension can be implemented on TCP senders and thus does not require any change to the protocol. It only assumes that the TCP receiver is able to buffer out-of-sequence segments.

From a performance point of view, one issue with TCP's retransmission timeout is that when there are isolated segment losses, the TCP sender often remains idle waiting for the expiration of its retransmission timeouts. Such isolated losses are frequent in the global Internet [Paxson99]. A heuristic to deal with isolated losses without waiting for the expiration of the retransmission timeout has been included in many TCP implementations since the early 1990s. To understand this heuristic, let us consider the figure below that shows the segments exchanged over a TCP connection when an isolated segment is lost.

As shown above, when an isolated segment is lost the sender receives several duplicate acknowledgements since the TCP receiver immediately sends a pure acknowledgement when it receives an out-of-sequence segment. A duplicate acknowledgement is an acknowledgement that contains the same acknowledgement number as a previous segment. A single duplicate acknowledgement does not necessarily imply that a segment was lost, as a simple reordering of the segments may cause duplicate acknowledgements as well. Measurements [Paxson99] have shown that segment reordering is frequent in the Internet. Based on these observations, the fast retransmit heuristic has been included in most TCP implementations. It can be implemented as follows:


    ack arrival:
        if tcp.ack == snd.una:      # duplicate acknowledgement
            dupacks++
            if dupacks == 3:
                retransmit segment(snd.una)
        else:
            dupacks = 0
            # process acknowledgement

Figure 4.48: Detecting isolated segment losses

This heuristic requires an additional variable in the TCB (dupacks). Most implementations set the default number of duplicate acknowledgements that trigger a retransmission to 3. It is now part of the standard TCP specification RFC 2581. The fast retransmit heuristic improves the TCP performance provided that isolated segments are lost and the current window is large enough for the sender to receive three duplicate acknowledgements.

The figure below illustrates the operation of the fast retransmit heuristic.

Figure 4.49: TCP fast retransmit heuristics

When losses are not isolated or when the windows are small, the performance of the fast retransmit heuristic decreases. In such environments, it is necessary to allow a TCP sender to use a selective repeat strategy instead of the default go-back-n strategy. Implementing selective-repeat requires a change to the TCP protocol as the receiver needs to be able to inform the sender of the out-of-order segments that it has already received. This can be done by using the Selective Acknowledgements (SACK) option defined in RFC 2018. This TCP option is


negotiated during the establishment of a TCP connection. If both TCP hosts support the option, SACK blocks can be attached by the receiver to the segments that it sends. SACK blocks allow a TCP receiver to indicate the blocks of data that it has received correctly but out of sequence. The figure below illustrates the utilisation of the SACK blocks.

Figure 4.50: TCP selective acknowledgements

A SACK option contains one or more blocks. A block corresponds to all the sequence numbers between the left edge and the right edge of the block. The two edges of the block are encoded as 32-bit numbers (the same size as the TCP sequence number) in a SACK option. As the SACK option contains one byte to encode its type and one byte for its length, a SACK option containing b blocks is encoded as a sequence of $2 + 8 \times b$ bytes. In practice, the size of the SACK option can be problematic as the optional TCP header extension cannot be longer than 40 bytes. As the SACK option is usually combined with the RFC 1323 timestamp extension, this implies that a TCP segment cannot usually contain more than three SACK blocks. This limitation implies that a TCP receiver cannot always place in the SACK option that it sends, information about all the received blocks.

To deal with the limited size of the SACK option, a TCP receiver currently having more than 3 blocks inside its receiving buffer must select the blocks to place in the SACK option. A good heuristic is to put in the SACK option the blocks that have most recently changed, as the sender is likely to be already aware of the older blocks.

When a sender receives a SACK option indicating a new block and thus a new possible segment loss, it usually does not retransmit the missing segment(s) immediately. To deal with reordering, a TCP sender can use a heuristic similar to fast retransmit by retransmitting a gap only once it has received three SACK options indicating this gap.
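The size constraint on SACK blocks can be checked with a short sketch. The helper names `encode_sack` and `max_sack_blocks` are invented for this example. The option type 5 is the value assigned to SACK in RFC 2018; the default of 40 bytes of option space assumes the limit that follows from the 4-bit header length field (a TCP header of at most 60 bytes, of which 20 are fixed), and the 12 bytes used for the timestamp option assume the common layout of a 10-byte option padded with two No-Operation bytes.

```python
import struct

SACK_KIND = 5   # option type assigned to SACK in RFC 2018


def encode_sack(blocks):
    """Encode (left edge, right edge) pairs as a SACK option of 2 + 8 * b bytes."""
    body = b"".join(
        struct.pack("!II", left % 2**32, right % 2**32) for left, right in blocks
    )
    return bytes([SACK_KIND, 2 + len(body)]) + body


def max_sack_blocks(other_option_bytes=0, option_space=40):
    """Number of SACK blocks that fit next to the other options of a segment."""
    return (option_space - other_option_bytes - 2) // 8


option = encode_sack([(1000, 2000), (3000, 4000)])
print(len(option), option.hex())
print(max_sack_blocks())      # no other option in the segment
print(max_sack_blocks(12))    # together with a padded timestamp option
```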
It should be noted that the SACK option does not supersede the acknowledgement number of the TCP header. A TCP sender can only remove data from its sending buffer once they have been acknowledged by TCP's cumulative acknowledgements. This design was chosen for two reasons. First, it allows the receiver to discard parts of its receiving buffer when it is running out of memory without losing data. Second, as the SACK option is not transmitted reliably, the cumulative acknowledgements are still required to deal with losses of ACK segments carrying only SACK information. Thus, the SACK option only serves as a hint to allow the sender to optimise its retransmissions.

TCP congestion control

In the previous sections, we have explained the mechanisms that TCP uses to deal with transmission errors and segment losses. In a heterogeneous network such as the Internet or enterprise IP networks, end systems have very different levels of performance. Some end systems are high-end servers attached to 10 Gbps links while others are mobile devices attached to a very low bandwidth wireless link. Despite these huge differences in performance, a mobile device should be able to efficiently exchange segments with a high-end server.

To understand this problem better, let us consider the scenario shown in the figure below, where a server (A)

    PAGE 110

attached to a 10 Mbps link is sending TCP segments to another computer (C) through a path that contains a 2 Mbps link.

Figure 4.51: TCP over heterogeneous links

In this network, the TCP segments sent by the server reach router R1. R1 forwards the segments towards router R2. Router R2 can potentially receive segments at 10 Mbps, but it can only forward them at 2 Mbps towards host C. Router R2 contains buffers that allow it to store the packets that cannot immediately be forwarded to their destination. To understand the operation of TCP in this environment, let us consider a simplified model of this network where host A is attached to a 10 Mbps link to a queue that represents the buffers of router R2. This queue is emptied at a rate of 2 Mbps.

Figure 4.52: TCP self clocking

Let us consider that host A uses a window of three segments. It thus sends three back-to-back segments at 10 Mbps and then waits for an acknowledgement. Host A stops sending segments when its window is full. These segments reach the buffers of router R2. The first segment stored in this buffer is sent by router R2 at a rate of 2 Mbps to the destination host. Upon reception of this segment, the destination sends an acknowledgement. This acknowledgement allows host A to transmit a new segment. This segment is stored in the buffers of router R2 while it is transmitting the second segment that was sent by host A... Thus, after the transmission of the first window of segments, TCP sends one data segment after the reception of each acknowledgement returned by the destination [24]. In practice, the acknowledgements sent by the destination serve as a kind of clock that allows the sending host to adapt its transmission rate to the rate at which segments are received by the destination. This TCP self-clocking is the first mechanism that allows TCP to adapt to heterogeneous networks [Jacobson1988]. It depends on the availability of buffers to store the segments that have been sent by the sender but have not yet been transmitted to the destination.
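The self-clocking behaviour can be reproduced with a small model of the queue. The sketch below is a simplification under stated assumptions: propagation delays and the transmission time of acknowledgements are ignored, the destination acknowledges every segment, segments contain 10,000 bits, and the function and parameter names are hypothetical.

```python
def self_clocking(n_segments, window, seg_bits=10_000,
                  fast_bps=10e6, slow_bps=2e6):
    """Return the times (in ms) at which host A starts sending each segment."""
    t_fast = seg_bits * 1000 / fast_bps   # transmission time on the 10 Mbps link
    t_slow = seg_bits * 1000 / slow_bps   # transmission time on the 2 Mbps link
    send_times = []     # start of transmission of segment i by host A
    depart_times = []   # time at which segment i has left the bottleneck queue
    for i in range(n_segments):
        if i < window:
            start = i * t_fast            # first window: back-to-back segments
        else:
            # segment i is released by the acknowledgement of segment i - window
            start = max(depart_times[i - window], send_times[-1] + t_fast)
        arrival = start + t_fast          # segment fully stored in the queue
        previous = depart_times[-1] if depart_times else 0.0
        depart_times.append(max(arrival, previous) + t_slow)
        send_times.append(start)
    return send_times


times = self_clocking(n_segments=8, window=3)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
print(times)  # [0.0, 1.0, 2.0, 6.0, 11.0, 16.0, 21.0, 26.0]
print(gaps)   # [1.0, 1.0, 4.0, 5.0, 5.0, 5.0, 5.0]
```

After the first window, the gap between two transmissions by host A settles at 5 ms, which is the time needed to send one segment on the 2 Mbps link in this model.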
[24] If the destination is using delayed acknowledgements, the sending host sends two data segments after each acknowledgement.

However, TCP is not always used in this environment. In the global Internet, TCP is used in networks where a large number of hosts send segments to a large number of receivers. For example, let us consider the network
depicted below, which is similar to the one discussed in [Jacobson1988] and RFC 896. In this network, we assume that the buffers of the router are infinite to ensure that no packet is lost.

Figure 4.53: The congestion collapse problem

If many TCP senders are attached to the left part of the network above, they all send a window full of segments. These segments are stored in the buffers of the router before being transmitted towards their destination. If there are many senders on the left part of the network, the occupancy of the buffers quickly grows. A consequence of the buffer occupancy is that the round-trip-time, measured by TCP, between the sender and the receiver increases. Consider a network where 10,000-bit segments are sent. When the buffer is empty, such a segment requires 1 millisecond to be transmitted on the 10 Mbps link and 5 milliseconds to be transmitted on the 2 Mbps link. Thus, the round-trip-time measured by TCP is roughly 6 milliseconds if we ignore the propagation delay on the links. Most routers manage their buffers as a FIFO queue [25]. If the buffer contains 100 segments, the round-trip-time becomes $1 + 100 \times 5 + 5$ milliseconds as new segments are only transmitted on the 2 Mbps link once all previous segments have been transmitted. Unfortunately, TCP uses a retransmission timer and performs go-back-n to recover from transmission errors. If the buffer occupancy is high, the retransmission timer may expire before the acknowledgement arrives. TCP then assumes that some segments have been lost and retransmits a full window of segments. This increases the occupancy of the buffer and the delay through the buffer... Furthermore, the buffer may store and send on the low bandwidth links several retransmissions of the same segment. This problem is called congestion collapse. It occurred several times in the late 1980s. For example, [Jacobson1988] notes that in 1986, the usable bandwidth of a 32 Kbps link dropped to 40 bits per second due to congestion collapse [26].

The congestion collapse is a problem that all heterogeneous networks face. Different mechanisms have been proposed in the scientific literature to avoid or control network congestion. Some of them have been implemented and deployed in real networks. To understand this problem in more detail, let us first consider a simple network
with two hosts attached to a high bandwidth link that are sending segments to destination C attached to a low bandwidth link as depicted below.

Figure 4.54: The congestion problem

To avoid congestion collapse, the hosts must regulate their transmission rate [27] by using a congestion control mechanism. Such a mechanism can be implemented in the transport layer or in the network layer. In TCP/IP networks, it is implemented in the transport layer, but other technologies such as Asynchronous Transfer Mode (ATM) or Frame Relay include congestion control mechanisms in lower layers.

[25] We discuss in another chapter other possible organisations of the router's buffers.

[26] At this time, TCP implementations were mainly following RFC 793. The round-trip-time estimations and the retransmission mechanisms were very simple. TCP was improved after the publication of [Jacobson1988].

[27] In this section, we focus on congestion control mechanisms that regulate the transmission rate of the hosts. Other types of mechanisms have been proposed in the literature. For example, credit-based flow-control has been proposed to avoid congestion in ATM networks [KR1995]. With a credit-based mechanism, hosts can only send packets once they have received credits from the routers and the credits depend on the occupancy of the router's buffers.

Let us first consider the simple problem of a set of i hosts that share a single bottleneck link as shown in the example above. In this network, the congestion control scheme must achieve the following objectives [CJ1989]:


1. The congestion control scheme must avoid congestion. In practice, this means that the bottleneck link cannot be overloaded. If $r_i(t)$ is the transmission rate allocated to host i at time t and R the bandwidth of the bottleneck link, then the congestion control scheme should ensure that, on average, $\forall t: \sum_i r_i(t) \le R$.

2. The congestion control scheme must be efficient. The bottleneck link is usually both a shared and an expensive resource. Usually, bottleneck links are wide area links that are much more expensive to upgrade than the local area networks. The congestion control scheme should ensure that such links are efficiently used. Mathematically, the control scheme should ensure that $\forall t: \sum_i r_i(t) \approx R$.

3. The congestion control scheme should be fair. Most congestion schemes aim at achieving max-min fairness. An allocation of transmission rates to sources is said to be max-min fair if:
   - no link in the network is congested
   - the rate allocated to source j cannot be increased without decreasing the rate allocated to a source i whose allocation is smaller than the rate allocated to source j [Leboudec2008].

Depending on the network, a max-min fair allocation may not always exist. In practice, max-min fairness is an ideal objective that cannot necessarily be achieved. When there is a single bottleneck link as in the example above, max-min fairness implies that each source should be allocated the same transmission rate.

To visualise the different rate allocations, it is useful to consider the graph shown below. In this graph, we plot on the x-axis (resp. y-axis) the rate allocated to host B (resp. A). A point in the graph $(r_B, r_A)$ corresponds to a possible allocation of the transmission rates. Since there is a 2 Mbps bottleneck link in this network, the graph can be divided into two regions. The lower left part of the graph contains all allocations $(r_B, r_A)$ such that the bottleneck link is not congested ($r_A + r_B < 2$). The right border of this region is the efficiency line, i.e. the set of allocations that completely utilise the bottleneck link ($r_A + r_B = 2$). Finally, the fairness line is the set of fair allocations.
Figure 4.55: Possible allocated transmission rates

As shown in the graph above, a rate allocation may be fair but not efficient (e.g. $r_A = 0.7, r_B = 0.7$), fair and efficient (e.g. $r_A = 1, r_B = 1$) or efficient but not fair (e.g. $r_A = 1.5, r_B = 0.5$). Ideally, the allocation should be both fair and efficient. Unfortunately, maintaining such an allocation with fluctuations in the number of flows that use the network is a challenging problem. Furthermore, there might be several thousands of TCP connections or more that pass through the same link [28].

[28] For example, the measurements performed in the Sprint network in 2004 reported more than 10k active TCP connections on a link, see https://research.sprintlabs.com/packstat/packetoverview.php. More recent information about backbone links may be obtained from CAIDA's realtime measurements, see e.g. http://www.caida.org/data/realtime/passive/

To deal with these fluctuations in demand, which result in fluctuations in the available bandwidth, computer networks use a congestion control scheme. This congestion control scheme should achieve the three objectives listed above. Some congestion control schemes rely on a close cooperation between the end hosts and the routers, while others are mainly implemented on the end hosts with limited support from the routers.
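The regions of the rate allocation graph discussed above can be expressed as a small classification function. This is a minimal sketch for the two-host, single-bottleneck case only; the function name, the dictionary keys and the tolerance are assumptions made for the illustration.

```python
def classify_allocation(r_a, r_b, capacity=2.0, tolerance=1e-9):
    """Classify an allocation (rates in Mbps) on a single bottleneck link."""
    total = r_a + r_b
    return {
        "congested": total > capacity + tolerance,         # beyond the efficiency line
        "efficient": abs(total - capacity) <= tolerance,   # on the efficiency line
        "fair": abs(r_a - r_b) <= tolerance,               # on the fairness line
    }


for rates in [(0.7, 0.7), (1.0, 1.0), (1.5, 0.5)]:
    print(rates, classify_allocation(*rates))
```

The three allocations used in the loop are the examples given in the text: fair but not efficient, fair and efficient, and efficient but not fair.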


A congestion control scheme can be modelled as an algorithm that adapts the transmission rate ($r_i(t)$) of host i based on the feedback received from the network. Different types of feedback are possible. The simplest scheme is a binary feedback [CJ1989] [Jacobson1988] where the hosts simply learn whether the network is congested or not. Some congestion control schemes allow the network to regularly send an allocated transmission rate in Mbps to each host [BF1995].

Let us focus on the binary feedback scheme which is the most widely used today. Intuitively, the congestion control scheme should decrease the transmission rate of a host when congestion has been detected in the network, in order to avoid congestion collapse. Furthermore, the hosts should increase their transmission rate when the network is not congested. Otherwise, the hosts would not be able to efficiently utilise the network. The rate allocated to each host fluctuates with time, depending on the feedback received from the network. The figure below illustrates the evolution of the transmission rates allocated to two hosts in our simple network. Initially, two hosts have a low allocation, but this is not efficient. The allocations increase until the network becomes congested. At this point, the hosts decrease their transmission rate to avoid congestion collapse. If the congestion control scheme works well, after some time the allocations should become both fair and efficient.

Figure 4.56: Evolution of the transmission rates

Various types of rate adaptation algorithms are possible. Dah Ming Chiu and Raj Jain have analysed, in [CJ1989], different types of algorithms that can be used by a source to adapt its transmission rate to the feedback received from the network. Intuitively, such a rate adaptation algorithm increases the transmission rate when the network is not congested (to ensure that the network is efficiently used) and decreases the transmission rate when the network is congested (to avoid congestion collapse).
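The evolution sketched in Figure 4.56 can be reproduced with a short simulation. The sketch below assumes one specific adaptation rule, additive increase and multiplicative decrease, with hypothetical parameter values (an increment of 0.1 Mbps and a decrease factor of 0.5). It also assumes that both hosts receive the same binary feedback at the same time, which is a simplification.

```python
def simulate_two_hosts(r_a, r_b, capacity=2.0, increment=0.1,
                       factor=0.5, steps=200):
    """Return the successive (r_a, r_b) allocations, in Mbps."""
    history = [(r_a, r_b)]
    for _ in range(steps):
        if r_a + r_b > capacity:          # binary feedback: network congested
            r_a, r_b = r_a * factor, r_b * factor
        else:                             # binary feedback: not congested
            r_a, r_b = r_a + increment, r_b + increment
        history.append((r_a, r_b))
    return history


history = simulate_two_hosts(0.1, 1.3)
final_a, final_b = history[-1]
print(f"final rates: {final_a:.3f} {final_b:.3f}")
print(f"largest load: {max(a + b for a, b in history):.3f}")
```

In this model, each decrease halves the difference between the two rates while each increase leaves it unchanged, so the allocations drift towards the fairness line while the total load oscillates around the capacity of the bottleneck.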
The simplest form of feedback that the network can send to a source is a binary feedback (the network is congested or not congested). In this case, a linear rate adaptation algorithm can be expressed as:

   - $rate(t+1) = \alpha_C + \beta_C \times rate(t)$ when the network is congested
   - $rate(t+1) = \alpha_N + \beta_N \times rate(t)$ when the network is not congested

With a linear adaptation algorithm, $\alpha_C$, $\alpha_N$, $\beta_C$ and $\beta_N$ are constants. The analysis of [CJ1989] shows that to be fair and efficient, such a binary rate adaptation mechanism must rely on Additive Increase and Multiplicative Decrease. When the network is not congested, the hosts should slowly increase their transmission rate ($\beta_N = 1$ and $\alpha_N > 0$). When the network is congested, the hosts must multiplicatively decrease their transmission rate ($\beta_C < 1$ and $\alpha_C = 0$). Such an AIMD rate adaptation algorithm can be implemented by the pseudo-code below:

    # Additive Increase Multiplicative Decrease
    if congestion:
        rate = rate * betaC    # multiplicative decrease, betaC < 1
    else:
        rate = rate + alphaN   # additive increase, alphaN > 0

Note: Which binary feedback?

Two types of binary feedback are possible in computer networks. A first solution is to rely on implicit feedback. This is the solution chosen for TCP. TCP's congestion control scheme [Jacobson1988] does not require any cooperation from the router. It only assumes that they use buffers and that they discard packets when there is congestion.


TCP uses the segment losses as an indication of congestion. When there are no losses, the network is assumed to be not congested. This assumes that congestion is the main cause of packet losses. This is true in wired networks, but unfortunately not always true in wireless networks. Another solution is to rely on explicit feedback. This is the solution proposed in the DECBit congestion control scheme [RJ1995] and used in Frame Relay and ATM networks. This explicit feedback can be implemented in two ways. A first solution would be to define a special message that could be sent by routers to hosts when they are congested. Unfortunately, generating such messages may increase the amount of congestion in the network. Such a congestion indication packet is thus discouraged by RFC 1812. A better approach is to allow the intermediate routers to indicate, in the packets that they forward, their current congestion status. Binary feedback can be encoded by using one bit in the packet header. With such a scheme, congested routers set a special bit in the packets that they forward while non-congested routers leave this bit unmodified. The destination host returns the congestion status of the network in the acknowledgements that it sends. Details about such a solution in IP networks may be found in RFC 3168. Unfortunately, as of this writing, this solution is still not deployed despite its potential benefits.

The TCP congestion control scheme was initially proposed by Van Jacobson in [Jacobson1988]. The current specification may be found in RFC 5681. TCP relies on Additive Increase and Multiplicative Decrease (AIMD). To implement AIMD, a TCP host must be able to control its transmission rate. A first approach would be to use timers and adjust their expiration times in function of the rate imposed by AIMD. Unfortunately, maintaining such timers for a large number of TCP connections can be difficult. Instead, Van Jacobson noted that the rate of a TCP connection can be artificially controlled by constraining its sending window. A TCP connection cannot send data faster than $\frac{window}{rtt}$ where window is the minimum between the host's sending window and the window advertised by the receiver.
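The bound on the sending rate imposed by the window can be illustrated numerically. The sketch below uses a hypothetical function name and assumes that the usable window is the smaller of the sender's window and the window advertised by the receiver.

```python
def max_sending_rate(swin_bytes, rwin_bytes, rtt_seconds):
    """Upper bound, in bits per second, imposed by the window and the rtt."""
    window = min(swin_bytes, rwin_bytes)   # assumption: the smaller window applies
    return 8 * window / rtt_seconds


print(max_sending_rate(65535, 65535, 0.5))   # 1048560.0
print(max_sending_rate(65535, 16384, 0.5))   # 262144.0
```

With a round-trip-time of half a second, a 65535-byte window caps the connection at about 1 Mbps whatever the bandwidth of the links, which is why constraining the window is an effective way to control the rate.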
TCP's congestion control scheme is based on a congestion window. The current value of the congestion window (cwnd) is stored in the TCB of each TCP connection and the window that can be used by the sender is constrained by $\min(cwnd, rwin, swin)$ where swin is the current sending window and rwin the last received receive window. The Additive Increase part of the TCP congestion control increments the congestion window by MSS bytes every round-trip-time. In the TCP literature, this phase is often called the congestion avoidance phase. The Multiplicative Decrease part of the TCP congestion control divides the current value of the congestion window once congestion has been detected.

When a TCP connection begins, the sending host does not know whether the part of the network that it uses to reach the destination is congested or not. To avoid causing too much congestion, it must start with a small congestion window. [Jacobson1988] recommends an initial window of MSS bytes. As the additive increase part of the TCP congestion control scheme increments the congestion window by MSS bytes every round-trip-time, the TCP connection may have to wait many round-trip-times before being able to efficiently use the available bandwidth. This is especially important in environments where the $bandwidth \times rtt$ product is high. To avoid waiting too many round-trip-times before reaching a congestion window that is large enough to efficiently utilise the network, the TCP congestion control scheme includes the slow-start algorithm. The objective of the TCP slow-start is to quickly reach an acceptable value for the cwnd. During slow-start, the congestion window is doubled every round-trip-time. The slow-start algorithm uses an additional variable in the TCB: ssthresh (slow-start threshold). The ssthresh is an estimation of the last value of the cwnd that did not cause congestion. It is initialised at the sending window and is updated after each congestion event.

In practice, a TCP implementation considers the network to be congested once it needs to retransmit a segment.
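To see why slow-start matters on paths with a large bandwidth × rtt product, the sketch below counts the round-trip-times needed to reach a given congestion window with each growth rule. It is a simplified model: windows are counted in MSS-sized segments, no loss occurs, and the function names are hypothetical.

```python
def rtts_additive_increase(target, initial=1):
    """Round-trip-times needed when cwnd grows by one segment per rtt."""
    cwnd, rtts = initial, 0
    while cwnd < target:
        cwnd += 1
        rtts += 1
    return rtts


def rtts_slow_start(target, initial=1):
    """Round-trip-times needed when cwnd doubles every rtt."""
    cwnd, rtts = initial, 0
    while cwnd < target:
        cwnd *= 2
        rtts += 1
    return rtts


print(rtts_additive_increase(64))  # 63
print(rtts_slow_start(64))         # 6
```

Starting from one segment, reaching a window of 64 segments takes 63 round-trip-times with additive increase alone but only 6 with slow-start.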
The TCP congestion control scheme distinguishes between two types of congestion:

   - mild congestion. TCP considers that the network is lightly congested if it receives three duplicate acknowledgements and performs a fast retransmit. If the fast retransmit is successful, this implies that only one segment has been lost. In this case, TCP performs multiplicative decrease and the congestion window is divided by 2. The slow-start threshold is set to the new value of the congestion window.
   - severe congestion. TCP considers that the network is severely congested when its retransmission timer expires. In this case, TCP retransmits the first segment and sets the slow-start threshold to 50% of the congestion window. The congestion window is reset to its initial value and TCP performs a slow-start.

The figure below illustrates the evolution of the congestion window when there is severe congestion. At the beginning of the connection, the sender performs slow-start until the first segments are lost and the retransmission timer expires. At this time, the ssthresh is set to half of the current congestion window and the congestion window is reset at one segment. The lost segments are retransmitted as the sender again performs slow-start until the
congestion window reaches the ssthresh. It then switches to congestion avoidance and the congestion window increases linearly until segments are lost and the retransmission timer expires...

Figure 4.57: Evolution of the TCP congestion window with severe congestion

The figure below illustrates the evolution of the congestion window when the network is lightly congested and all lost segments can be retransmitted using fast retransmit. The sender begins with a slow-start. A segment is lost but successfully retransmitted by a fast retransmit. The congestion window is divided by 2 and the sender immediately enters congestion avoidance as this was a mild congestion.

Figure 4.58: Evolution of the TCP congestion window when the network is lightly congested

Most TCP implementations update the congestion window when they receive an acknowledgement. If we assume that the receiver acknowledges each received segment and the sender only sends MSS sized segments, the TCP congestion control scheme can be implemented using the simplified pseudo-code [29] below:

    # Initialisation
    cwnd = MSS
    ssthresh = swin

    # Ack arrival
    if tcp.ack > snd.una:    # new ack, no congestion
        if cwnd < ssthresh:
            # slow-start: cwnd doubles every rtt
            cwnd = cwnd + MSS
        else:
            # congestion avoidance: cwnd grows by one MSS every rtt
            cwnd = cwnd + MSS * (MSS / cwnd)
    else:                    # duplicate or old ack
        if tcp.ack == snd.una:    # duplicate acknowledgement
            dupacks = dupacks + 1
            if dupacks == 3:
                retransmitsegment(snd.una)
                ssthresh = max(cwnd / 2, 2 * MSS)
                cwnd = ssthresh
        else:
            dupacks = 0
            # ack for old segment, ignored

    # Expiration of the retransmission timer
    send(snd.una)    # retransmit first lost segment
    ssthresh = max(cwnd / 2, 2 * MSS)
    cwnd = MSS

Furthermore, when a TCP connection has been idle for more than its current retransmission timer, it should reset its congestion window to the congestion window size that it uses when the connection begins, as it no longer knows the current congestion state of the network.

Note: Initial congestion window

The original TCP congestion control mechanism proposed in [Jacobson1988] recommended that each TCP connection should begin by setting cwnd = MSS. However, in today's higher bandwidth networks, using such a small initial congestion window severely affects the performance for short TCP connections, such as those used by web servers. Since the publication of RFC 3390, TCP hosts are allowed to use an initial congestion window of about 4 KBytes, which corresponds to 3 segments in many environments.

Thanks to its congestion control scheme, TCP adapts its transmission rate to the losses that occur in the network. Intuitively, the TCP transmission rate decreases when the percentage of losses increases. Researchers have proposed detailed models that allow the prediction of the throughput of a TCP connection when losses occur [MSMO1997]. To have some intuition about the factors that affect the performance of TCP, let us consider a very simple model. Its assumptions are not completely realistic, but it gives us good intuition without requiring complex mathematics.

This model considers a hypothetical TCP connection that suffers from equally spaced segment losses. If p is the segment loss ratio, then the TCP connection successfully transfers $\frac{1}{p} - 1$ segments and the next segment is lost.
If we ignore the slow-start at the beginning of the connection, TCP in this environment is always in congestion avoidance as there are only isolated losses that can be recovered by using fast retransmit. The evolution of the congestion window is thus as shown in the figure below. Note that the x-axis of this figure represents time measured in units of one round-trip-time, which is supposed to be constant in the model, and the y-axis represents the size of the congestion window measured in MSS-sized segments.

Figure 4.59: Evolution of the congestion window with regular losses

As the losses are equally spaced, the congestion window always starts at some value ($\frac{W}{2}$), and is incremented by one MSS every round-trip-time until it reaches twice this value (W). At this point, a segment is retransmitted and the cycle starts again. If the congestion window is measured in MSS-sized segments, a cycle lasts $\frac{W}{2}$ round-trip-times. The bandwidth of the TCP connection is the number of bytes that have been transmitted during a given period of time. During a cycle, the number of segments that are sent on the TCP connection is equal to the area of the yellow trapezoid in the figure. Its area is thus:

$area = (\frac{W}{2})^2 + \frac{1}{2} \times (\frac{W}{2})^2 = \frac{3 \times W^2}{8}$
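The area formula can be checked numerically. The sketch below, with a hypothetical function name, sums the trapezoids under the sawtooth, assuming that the window grows linearly by one segment during each round-trip-time and that W is even.

```python
def segments_per_cycle(w):
    """Area under the congestion window between W/2 and W, in segments."""
    half = w // 2
    area = 0.0
    cwnd = half
    for _ in range(half):                  # a cycle lasts W/2 round-trip-times
        area += (cwnd + (cwnd + 1)) / 2    # mean window during this rtt
        cwnd += 1
    return area


for w in (8, 16, 64):
    print(w, segments_per_cycle(w), 3 * w ** 2 / 8)
```

For each value of W, both columns agree (24, 96 and 1536 segments), as expected from the closed-form expression.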


However, given the regular losses that we consider, the number of segments that are sent between two losses (i.e. during a cycle) is by definition equal to $\frac{1}{p}$. Thus, $W = \sqrt{\frac{8}{3 \times p}} = \frac{k}{\sqrt{p}}$. The throughput (in bytes per second) of the TCP connection is equal to the number of bytes transmitted during a cycle divided by the duration of the cycle:

$Throughput = \frac{area \times MSS}{time} = \frac{\frac{3 \times W^2}{8} \times MSS}{\frac{W}{2} \times rtt}$

or, after having eliminated W,

$Throughput = \sqrt{\frac{3}{2}} \times \frac{MSS}{rtt \times \sqrt{p}}$

More detailed models and the analysis of simulations have shown that a first order model of the TCP throughput when losses occur is $Throughput \approx \frac{k \times MSS}{rtt \times \sqrt{p}}$. This is an important result which shows that:

   - TCP connections with a small round-trip-time can achieve a higher throughput than TCP connections having a longer round-trip-time when losses occur. This implies that the TCP congestion control scheme is not completely fair since it favors the connections that have the shorter round-trip-time.
   - TCP connections that use a large MSS can achieve a higher throughput than the TCP connections that use a shorter MSS. This creates another source of unfairness between TCP connections. However, it should be noted that today most hosts are using almost the same MSS, roughly 1460 bytes.

In general, the maximum throughput that can be achieved by a TCP connection depends on its maximum window size and the round-trip-time if there are no losses. If there are losses, it depends on the MSS, the round-trip-time and the loss ratio.

$Throughput < \min(\frac{window}{rtt}, \frac{k \times MSS}{rtt \times \sqrt{p}})$
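The first order model can be turned into a small calculator to explore how the round-trip-time and the loss ratio affect the throughput. This is a sketch of the simple model only: the function name and the sample values are assumptions, and k defaults to the value $\sqrt{3/2}$ derived above.

```python
import math


def tcp_throughput(mss_bytes, rtt_seconds, loss_ratio,
                   window_bytes=None, k=math.sqrt(3 / 2)):
    """First order estimate of the TCP throughput, in bytes per second."""
    loss_bound = k * mss_bytes / (rtt_seconds * math.sqrt(loss_ratio))
    if window_bytes is None:
        return loss_bound
    # with a finite window, the throughput is also bounded by window / rtt
    return min(window_bytes / rtt_seconds, loss_bound)


base = tcp_throughput(1460, 0.1, 0.01)
print(round(base))
print(round(tcp_throughput(1460, 0.2, 0.01)))   # rtt doubled
print(round(tcp_throughput(1460, 0.1, 0.04)))   # loss ratio multiplied by four
```

In this model, doubling the round-trip-time or multiplying the loss ratio by four both halve the estimated throughput.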

... we explained how a transport connection can be safely released. We then discussed the mechanisms that are used in TCP, the reliable transport protocol used by most applications on the Internet. Most notably, we described the congestion control mechanism that has been included in TCP since the late 1980s and explained how the reliability mechanisms used by TCP have been tuned over the years.

4.5 Exercises

This section is divided in two parts. The first part contains exercises on the principles of transport protocols, including TCP. The second part contains programming challenges and packet analysis tools to observe the behaviour of transport protocols.

4.5.1 Principles

1. Consider the Alternating Bit Protocol as described in this chapter.
   - How does the protocol recover from the loss of a data segment?
   - How does the protocol recover from the loss of an acknowledgement?

2. A student proposed to optimise the Alternating Bit Protocol by adding a negative acknowledgment, i.e. the receiver sends a NAK control segment when it receives a corrupted data segment. What kind of information should be placed in this control segment and how should the sender react when receiving such a NAK?

3. Transport protocols rely on different types of checksums to verify whether segments have been affected by transmission errors. The most frequently used checksums are:
   - the Internet checksum used by UDP, TCP and other Internet protocols, which is defined in RFC 1071 and implemented in various modules, e.g. http://ilab.cs.byu.edu/cs460/code/ftp/ichecksum.py for a python implementation
   - the 16 bits or the 32 bits Cyclical Redundancy Checks (CRC) that are often used on disks, in zip archives and in datalink layer protocols. See http://docs.python.org/library/binascii.html for a python module that contains the 32 bits CRC
   - the Adler checksum defined in RFC 2960 for the SCTP protocol but later replaced by a CRC in RFC 3309
   - the Fletcher checksum [Fletcher1982], see http://drdobbs.com/database/184408761 for implementation details

   By using your knowledge of the Internet checksum, can you find a transmission error that will not be detected by the Internet checksum?
4. The CRCs are efficient error detection codes that are able to detect:
   - all errors that affect an odd number of bits
   - all errors that affect a sequence of bits which is shorter than the length of the CRC

   Carry out experiments with one implementation of CRC-32 to verify that this is indeed the case.

5. Checksums and CRCs should not be confused with secure hash functions such as MD5 defined in RFC 1321 or SHA-1 described in RFC 4634. Secure hash functions are used to ensure that files or sometimes packets/segments have not been modified. Secure hash functions aim at detecting malicious changes while checksums and CRCs only detect random transmission errors. Perform some experiments with hash functions such as those defined in the http://docs.python.org/library/hashlib.html python hashlib module to verify that this is indeed the case.

6. A version of the Alternating Bit Protocol supporting variable length segments uses a header that contains the following fields:
   - a number (0 or 1)
   - a length field that indicates the length of the data
   - a CRC

   To speed up the transmission of the segments, a student proposes to compute the CRC over the data part of the segment but not over the header. What do you think of this optimisation?

7. On Unix hosts, the ping(8) command can be used to measure the round-trip-time to send and receive packets from a remote host. Use ping(8) to measure the round-trip-time to a remote host. Choose a remote destination which is far from your current location, e.g. a small web server in a distant country. There are implementations of ping in various languages, see e.g. http://pypi.python.org/pypi/ping/0.2 for a python implementation of ping.

8. How would you set the retransmission timer if you were implementing the Alternating Bit Protocol to exchange files with a server such as the one that you measured above?

9. What are the factors that affect the performance of the Alternating Bit Protocol?

10. Links are often considered as symmetrical, i.e. they offer the same bandwidth in both directions. Symmetrical links are widely used in Local Area Networks and in the core of the Internet, but there are many asymmetrical link technologies. The most common examples are the various types of ADSL and CATV technologies. Consider an implementation of the Alternating Bit Protocol that is used between two hosts that are directly connected by using an asymmetric link. Assume that a host is sending segments containing 10 bytes of control information and 90 bytes of data and that the acknowledgements are 10 bytes long. If the round-trip-time is negligible, what is the minimum bandwidth required on the return link to ensure that the transmission of acknowledgements is not a bottleneck?
11. Derive a mathematical expression that provides the goodput achieved by the Alternating Bit Protocol assuming that:
   - Each segment contains D bytes of data and c bytes of control information
   - Each acknowledgement contains c bytes of control information
   - The bandwidth of the two directions of the link is set to B bits per second
   - The delay between the two hosts is s seconds in both directions

   The goodput is defined as the amount of SDUs (measured in bytes) that is successfully transferred during a period of time.

12. Consider an Alternating Bit Protocol that is used over a link that suffers from deterministic errors. When the error ratio is set to $\frac{1}{p}$, this means that $p - 1$ bits are transmitted correctly and the p-th bit is corrupted. Discuss the factors that affect the performance of the Alternating Bit Protocol over such a link.

13. Amazon provides the S3 storage service where companies and researchers can store lots of information and perform computations on the stored information. Amazon allows users to send files through the Internet, but also by sending hard-disks. Assume that a 1 Terabyte hard-disk can be delivered within 24 hours to Amazon by courier service. What is the minimum bandwidth required to match the bandwidth of this courier service?

14. Several large data center operators (e.g. Microsoft and Google) have announced that they install servers as containers with each container hosting up to 2000 servers. Assuming a container with 2000 servers and each storing 500 GBytes of data, what is the time required to move all the data stored in one container over one 10 Gbps link? What is the bandwidth of a truck that needs 10 hours to move one container from one data center to another?

15. What are the techniques used by a go-back-n sender to recover from:
   - transmission errors
   - losses of data segments
   - losses of acknowledgements

16. Consider a b bits per second link between two hosts that has a propagation delay of t seconds. Derive a formula that computes the time elapsed between the transmission of the first bit of a d bytes segment from a sending host and the reception of the last bit of this segment on the receiving host.


17. Consider a go-back-n sender and a go-back-n receiver that are directly connected with a 10 Mbps link that has a propagation delay of 100 milliseconds. Assume that the retransmission timer is set to three seconds. If the window has a length of 4 segments, draw a time-sequence diagram showing the transmission of 10 segments (each segment contains 10000 bits):
   - when there are no losses
   - when the third and seventh segments are lost
   - when the second, fourth, sixth, eighth, ... acknowledgements are lost
   - when the third and fourth data segments are reordered (i.e. the fourth arrives before the third)

18. Same question when using selective repeat instead of go-back-n. Note that the answer is not necessarily the same.

19. Consider two high-end servers connected back-to-back by using a 10 Gbps interface. If the delay between the two servers is one millisecond, what is the throughput that can be achieved by a transport protocol that is using 10,000 bits segments and a window of
   - one segment
   - ten segments
   - hundred segments

20. Consider two servers that are directly connected by using a b bits per second link with a round-trip-time of r seconds. The two servers are using a transport protocol that sends segments containing s bytes and acknowledgements composed of a bytes. Can you derive a formula that computes the smallest window (measured in segments) that is required to ensure that the servers will be able to completely utilise the link?

21. Same question as above if the two servers are connected through an asymmetrical link that transmits bu bits per second in the direction used to send data segments and bd bits per second in the direction used to send acknowledgements.

22. The Trivial File Transfer Protocol is a very simple file transfer protocol that is often used by disk-less hosts when booting from a server. Read the TFTP specification in RFC 1350 and explain how TFTP recovers from transmission errors and losses.

23. Is it possible for a go-back-n receiver to inter-operate with a selective-repeat sender? Justify your answer.

24. Is it possible for a selective-repeat receiver to inter-operate with a go-back-n sender? Justify your answer.
25. The go-back-n and selective repeat mechanisms that are described in the book exclusively rely on cumulative acknowledgements. This implies that a receiver always returns to the sender information about the last segment that was received in-sequence. If there are frequent losses or reordering, a selective repeat receiver could return several times the same cumulative acknowledgment. Can you think of other types of acknowledgements that could be used by a selective repeat receiver to provide additional information about the out-of-sequence segments that it has received? Design such acknowledgements and explain how the sender should react upon reception of this information.

26. The goodput achieved by a transport protocol is usually defined as the number of application layer bytes that are exchanged per unit of time. What are the factors that can influence the goodput achieved by a given transport protocol?

27. When used with IPv4, Transmission Control Protocol (TCP) attaches 40 bytes of control information to each segment sent. Assuming an infinite window and no losses nor transmission errors, derive a formula that computes the maximum TCP goodput in function of the size of the segments that are sent.

28. A go-back-n sender uses a window size encoded in an n bits field. How many segments can it send without receiving an acknowledgement?

29. Consider the following situation. A go-back-n sender has sent a full window of data segments. All the segments have been received correctly and in-order by the receiver, but all the returned acknowledgements have been lost. Show by using a time sequence diagram (e.g. by considering a window of four segments) what happens in this case. Can you fix the problem on the go-back-n sender?


30. Same question as above, but assume now that both the sender and the receiver implement selective repeat. Note that the answer will be different from the above question.

31. Consider a transport protocol that supports a window of one hundred 1250 Bytes segments. What is the maximum bandwidth that this protocol can achieve if the round-trip-time is set to one second? What happens if, instead of advertising a window of one hundred segments, the receiver decides to advertise a window of 10 segments?

32. Explain under which circumstances a transport entity could advertise a window of 0 segments.

33. To understand the operation of the TCP congestion control mechanism, it is useful to draw some time sequence diagrams. Let us consider a simple scenario of a web client connected to the Internet that wishes to retrieve a simple web page from a remote web server. For simplicity, we will assume that the delay between the client and the server is 0.5 seconds and that the packet transmission times on the client and the server are negligible (e.g. they are both connected to a 1 Gbps network). We will also assume that the client and the server use 1 KBytes segments.

   1. Compute the time required to open a TCP connection, send an HTTP request and retrieve a 16 KBytes web page. This page size is typical of the results returned by search engines like Google or Bing. An important factor in this delay is the initial size of the TCP congestion window on the server. Assume first that the initial window is set to 1 segment as defined in RFC 2001, then 4 KBytes (i.e. 4 segments in this case) as proposed in RFC 3390, and finally 16 KBytes as proposed in a recent paper.
   2. Perform the same analysis with an initial window of one segment if the third segment sent by the server is lost and the retransmission timeout is fixed and set to 2 seconds.
   3. Same question as above but assume now that the 6th segment is lost.
   4. Same question as above, but consider now the loss of the second and seventh acknowledgements sent by the client.
   5. Does the analysis above change if the initial window is set to 16 KBytes instead of one segment?
34. Several MBytes have been sent on a TCP connection and it becomes idle for several minutes. Discuss which values should be used for the congestion window, slow start threshold and retransmission timers.

35. To operate reliably, a transport protocol that uses go-back-n (resp. selective repeat) and encodes its sequence numbers on n bits cannot use a window that is larger than $2^{n}-1$ (resp. $2^{n-1}$) segments. Does this limitation affect TCP? Explain your answer.

36. Consider the simple network shown in the figure below. In this network, the router between the client and the server can only store on each outgoing interface one packet in addition to the packet that it is currently transmitting. It discards all the packets that arrive while its buffer is full. Assuming that you can neglect the transmission time of acknowledgements and that the server uses an initial window of one segment and has a retransmission timer set to 500 milliseconds, what is the time required to transmit 10 segments from the client to the server? Does the performance increase if the server uses an initial window of 16 segments instead?

Figure 4.60: Simple network

37. The figure below describes the evolution of the congestion window of a TCP connection. Can you find the reasons for the three events that are marked in the figure?

38. The figure below describes the evolution of the congestion window of a TCP connection. Can you find the reasons for the three events that are marked in the figure?


Figure 4.61: Evolution of the congestion window

Figure 4.62: Evolution of the congestion window

39. A web server serves mainly HTML pages that fit inside 10 TCP segments. Assuming that the transmission time of each segment can be neglected, compute the total transfer time of such a page (in round-trip-times) assuming that:

    - the TCP stack uses an initial window size of 1 segment
    - the TCP stack uses an initial window size of three segments

40. RFC 3168 defines mechanisms that allow routers to mark packets by setting one bit in the packet header when they are congested. When a TCP destination receives such a marking in a packet, it returns the congestion marking to the source, which reacts by halving its congestion window and performing congestion avoidance. Consider a TCP connection where the fourth data segment experiences congestion. Compare the delay to transmit 8 segments in a network where routers discard packets during congestion and a network where routers mark packets during congestion.

4.5.2 Practice

1. The socket interface allows you to use the UDP protocol on a Unix host. UDP provides a connectionless unreliable service that in theory allows you to send SDUs of up to 64 KBytes.

    - Implement a small UDP client and a small UDP server (in Python, you can start from the example provided in http://docs.python.org/library/socket.html but you can also use C or Java).
    - Run the client and the server on different workstations to determine experimentally the largest SDU that is supported by your language and OS. If possible, use different languages and Operating Systems in each group.

2. By using the socket interface, implement on top of the connectionless unreliable service provided by UDP a simple client that sends the message shown in the figure below.

    In this message, the bit flags should be set to 01010011b, the value of the 16 bits field must be the square root of the value contained in the 32 bits field, and the character string must be an ASCII representation (without any trailing \0) of the number contained in the 32 bits field. The last 16 bits of the message contain an Internet checksum that has been computed over the entire message.

    Upon reception of a message, the server verifies that:


    - the flag has the correct value
    - the 32 bits integer is the square of the 16 bits integer
    - the character string is an ASCII representation of the 32 bits integer
    - the Internet checksum is correct

    If the verification succeeds, the server returns a SDU containing 11111111b. Otherwise it returns 01010101b.

    Your implementation must be able to run on both little-endian and big-endian machines. If you have access to different types of machines (e.g. x86 laptops and SPARC servers), try to run your implementation on both types of machines.

Figure 4.63: Simple SDU format

3. The socket library is also used to develop applications above the reliable bytestream service provided by TCP. We have installed on the cnp3.info.ucl.ac.be server a simple server that provides a simple client-server service. The service operates as follows:

    - the server listens on port 62141 for a TCP connection
    - upon the establishment of a TCP connection, the server sends an integer by using the following TLV format:
        - the first two bits indicate the type of information (01 for ASCII, 10 for boolean)
        - the next six bits indicate the length of the information (in bytes)
        - An ASCII TLV has a variable length and the next bytes contain one ASCII character per byte. A boolean TLV has a length of one byte. The byte is set to 00000000b for true and 00000001b for false.
    - the client replies by sending the received integer encoded as a 32 bits integer in network byte order
    - the server returns a TLV containing true if the integer was correct and a TLV containing false otherwise, and closes the TCP connection

    Implement a client to interact with this server in C, Java or Python.

4. It is now time to implement a small transport protocol. The protocol uses a sliding window to transmit more than one segment without being forced to wait for an acknowledgement. Your implementation must support a variable size sliding window as the other end of the flow can send its maximum window size. The window size is encoded as a three bits unsigned integer.
    The protocol identifies the DATA segments by using sequence numbers. The sequence number of the first segment must be 0. It is incremented by one for each new segment. The receiver must acknowledge the delivered segments by sending an ACK segment. The sequence number field in the ACK segment always contains the sequence number of the next expected in-sequence segment at the receiver. The flow of data is unidirectional, meaning that the sender only sends DATA segments and the receiver only sends ACK segments.

    To deal with segment losses, the protocol must implement a recovery technique such as go-back-n or selective repeat and use retransmission timers. You can select the technique that best suits your needs and start from a simple technique that you improve later.
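The acknowledgement rule described above (the ACK always carries the sequence number of the next expected in-sequence segment) can be sketched as follows. This is only a minimal illustration of a go-back-n style receiver, assuming 8 bits sequence numbers that wrap around modulo 256; the class name `GoBackNReceiver` and its methods are hypothetical and are not part of the exercise statement.

```python
SEQ_MODULUS = 256  # assumption: 8 bits unsigned sequence numbers wrap around


class GoBackNReceiver:
    """Hypothetical helper that accepts only the next in-sequence DATA segment."""

    def __init__(self):
        self.expected = 0    # sequence number of the next expected segment
        self.delivered = []  # payloads delivered in order to the user

    def on_data(self, seq, payload):
        """Process one DATA segment and return the value to place in the ACK."""
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected = (self.expected + 1) % SEQ_MODULUS
        # Out-of-sequence segments are discarded: the ACK then repeats the
        # sequence number that the receiver is still waiting for.
        return self.expected
```

A selective repeat receiver would instead buffer the out-of-sequence segments that fall inside its receiving window, which this sketch deliberately does not do.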


Figure 4.64: Segment format

    This segment format contains the following fields:

    - Type: segment type
        - 0x1 DATA segment
        - 0x2 ACK segment
    - WIN: the size of the current window (an integer encoded as a 3 bits field). In DATA segments, this field indicates the size of the sending window of the sender. In ACK segments, this field indicates the current value of the receiving window.
    - Sequence: sequence number (8 bits unsigned integer), starts at 0. The sequence number is incremented by 1 for each new DATA segment sent by the sender. Inside an ACK segment, the sequence field carries the sequence number of the next in-sequence segment that is expected by the receiver.
    - Length: length of the payload in bytes. All DATA segments contain a payload with 512 bytes of data, except the last DATA segment of a transfer that can be shorter. The reception of a DATA segment whose length is different from 512 indicates the end of the data transfer.
    - Payload: the data to send

    The client and the server exchange UDP datagrams that contain the DATA and ACK segments. They must provide a command-line interface that allows one binary file to be transmitted and supports the following parameters:

        sender
        receiver

    In order to test the reactions of your protocol against errors and losses, you can use a random number generator to probabilistically drop received segments and introduce random delays upon the arrival of a segment.

Packet trace analysis

When debugging networking problems or analysing performance problems, it is sometimes useful to capture the segments that are exchanged between two hosts and to analyse them.

Several packet trace analysis tools are available, either as commercial or open-source tools. These tools are able to capture all the packets exchanged on a link. Of course, capturing packets requires administrator privileges. They can also analyse the content of the captured packets and display information about them. The captured packets can be stored in a file for offline analysis.
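To give an idea of what such a stored capture looks like, the sketch below reads the classic pcap file layout: a 24 bytes global header followed, for each packet, by a 16 bytes record header and the captured bytes. It is only an illustration under stated assumptions: it handles the microsecond-resolution classic format and not pcapng, and the function names `read_pcap` and `demo_capture` are hypothetical.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # magic number of the classic pcap format


def read_pcap(stream):
    """Yield (timestamp_in_seconds, raw_bytes) for each record of a pcap stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order used by the capturing host.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        order = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        order = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(order + "IIII", record)
        yield ts_sec + ts_usec / 1e6, stream.read(incl_len)


def demo_capture():
    """Build a tiny in-memory capture with two fake packets (illustration only)."""
    out = io.BytesIO()
    # magic, version 2.4, timezone offset, accuracy, snapshot length, link type
    out.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1))
    for ts_sec, data in ((10, b"first"), (11, b"second!")):
        out.write(struct.pack("<IIII", ts_sec, 500000, len(data), len(data)))
        out.write(data)
    out.seek(0)
    return out
```

With a real trace, `read_pcap(open(path, "rb"))` would yield the raw frames, which still need to be decoded (datalink, IP and TCP headers) before they can be analysed.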
tcpdump is probably one of the best known packet capture tools. It is able to both capture packets and display their content. tcpdump is a text-based tool that can display the value of the most important fields of the captured packets. Additional information about tcpdump may be found in tcpdump(1). The text below is an example of the output of tcpdump for the first TCP segments exchanged on an scp transfer between two hosts.

    21:05:56.230737 IP 192.168.1.101.54150 > 130.104.78.8.22: S 1385328972:1385328972(0) win 65535
    21:05:56.251468 IP 130.104.78.8.22 > 192.168.1.101.54150: S 3627767479:3627767479(0) ack 1385328973 win 49248
    21:05:56.251560 IP 192.168.1.101.54150 > 130.104.78.8.22: . ack 1 win 65535
    21:05:56.279137 IP 130.104.78.8.22 > 192.168.1.101.54150: P 1:21(20) ack 1 win 49248
    21:05:56.279241 IP 192.168.1.101.54150 > 130.104.78.8.22: . ack 21 win 65535
    21:05:56.279534 IP 192.168.1.101.54150 > 130.104.78.8.22: P 1:22(21) ack 21 win 65535
    21:05:56.303527 IP 130.104.78.8.22 > 192.168.1.101.54150: . ack 22 win 49248
    21:05:56.303623 IP 192.168.1.101.54150 > 130.104.78.8.22: P 22:814(792) ack 21 win 65535

You can easily recognise in the output above the SYN segment containing the MSS, window scale, timestamp and sackOK options, the SYN+ACK segment whose wscale option indicates that the server uses window scaling for this connection and then the first few segments exchanged on the connection.

wireshark is more recent than tcpdump. It evolved from the ethereal packet trace analysis software. It can be used as a text tool like tcpdump. For a TCP connection, wireshark would provide almost the same output as tcpdump. The main advantage of wireshark is that it also includes a graphical user interface that allows you to perform various types of analysis on a packet trace.

Figure 4.65: Wireshark: default window

The wireshark window is divided in three parts. The top part of the window is a summary of the first packets from the trace. By clicking on one of the lines, you can show the detailed content of this packet in the middle part of the window. The middle of the window allows you to inspect all the fields of the captured packet. The bottom part of the window is the hexadecimal representation of the packet, with the field selected in the middle window being highlighted.

wireshark is very good at displaying packets, but it also contains several analysis tools that can be very useful. The first tool is Follow TCP stream. It is part of the Analyze menu and allows you to reassemble and display all the payload exchanged during a TCP connection. This tool can be useful if you need to analyse for example the commands exchanged during a SMTP session.

The second tool is the flow graph that is part of the Statistics menu. It provides a time sequence diagram of the packets exchanged with some comments about the packet contents. See below for an example.
Figure 4.66: Wireshark: flow graph

The third set of tools are the TCP stream graph tools that are part of the Statistics menu. These tools allow you to plot various types of information extracted from the segments exchanged during a TCP connection. A first interesting graph is the sequence number graph that shows the evolution of the sequence number field of the captured segments with time. This graph can be used to detect retransmissions graphically.
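On a sequence number graph, a retransmission shows up as a point that lies below the highest sequence number already plotted. The sketch below applies the same idea to a list of segments; it is a rough heuristic under stated assumptions, not the method used by wireshark. The function name `find_retransmissions` and the `(time, seq, length)` tuple layout are hypothetical, sequence number wrap-around is ignored, and a segment that was merely reordered by the network would also be reported.

```python
def find_retransmissions(segments):
    """Return the data segments whose bytes were all seen earlier in the trace.

    `segments` is an iterable of (time, seq, length) tuples for one direction
    of a TCP connection, listed in capture order.
    """
    highest_end = None  # sequence number following the last byte seen so far
    suspects = []
    for time, seq, length in segments:
        if length == 0:
            continue  # pure acknowledgements carry no data
        end = seq + length
        if highest_end is not None and end <= highest_end:
            suspects.append((time, seq, length))
        else:
            highest_end = end
    return suspects
```

For example, in a trace where the segment starting at sequence number 101 appears a second time after later data has been captured, only that second occurrence is reported.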


Figure 4.67: Wireshark: sequence number graph

A second interesting graph is the round-trip-time graph that shows the evolution of the round-trip-time as a function of time. This graph can be used to check whether the round-trip-time remains stable or not. Note that from a packet trace, wireshark can plot two round-trip-time graphs, one for the flow from the client to the server and the other for the flow from the server to the client. wireshark will plot the round-trip-time graph that corresponds to the selected packet in the top wireshark window.

Figure 4.68: Wireshark: round-trip-time graph

Emulating a network with netkit

Netkit is a network emulator based on User Mode Linux. It allows you to easily set up an emulated network of Linux machines that can act as end-hosts or routers.

Note: Where can I find Netkit?
Netkit is available at http://www.netkit.org. Files can be downloaded from http://wiki.netkit.org/index.php/Download_Official, and instructions for the installation are available here: http://wiki.netkit.org/download/netkit/INSTALL

There are two ways to use Netkit: the manual way, and by using pre-configured labs. In the first case, you boot and control each machine individually, using the commands starting with a v (for virtual machine). In the second case, you can start a whole network in a single operation. The commands for controlling the lab start with a l. The man pages of those commands are available from http://wiki.netkit.org/man/man7/netkit.7.html


You must be careful not to forget to stop your virtual machines and labs, using either vhalt or lhalt.

A netkit lab is simply a directory containing at least a configuration file called lab.conf, and one directory for each virtual machine. In the case of the lab available on iCampus, the network is composed of two PCs, pc1 and pc2, both of them being connected to a router r1. The lab.conf file contains the following lines:

    pc1[0]=A
    pc2[0]=B
    r1[0]=A
    r1[1]=B

This means that pc1 and r1 are connected to a virtual LAN named A via their interface eth0, while pc2 and r1 are connected to the virtual LAN B via respectively their interfaces eth0 and eth1.

The directory of each device is initially empty, but will be used by Netkit to store their filesystem.

The lab directory can contain optional files. In the lab provided to you, the pc1.startup file contains the shell instructions to be executed on startup of the virtual machine. In this specific case, the script configures the interface eth0 to allow traffic exchanges between pc1 and r1, as well as the routing table entry to reach pc2.

Starting a lab thus simply consists in unpacking the provided archive, going into the lab directory and typing lstart to start the network.

Note: File sharing between virtual machines and host
Virtual machines can access the directory of the lab they belong to. This directory is mounted in their filesystem at the path /hostlab.

In the netkit lab (exercises/netkit/netkit_lab_2hosts_1rtr_ipv4.tar.tar.gz), you can find a simple Python client/server application that establishes TCP connections. Feel free to re-use this code to perform your analysis.

Note: netkit tools
As the virtual machines run Linux, standard networking tools such as hping, tcpdump, netstat etc. are available as usual.
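The mapping between the lab.conf lines shown above and the virtual LANs can be made explicit with a few lines of code. The sketch below is only an illustration: the function name `parse_lab_conf` is hypothetical, it assumes that interface number k corresponds to interface ethk as in the description above, and it simply ignores any line that does not have the form machine[number]=LAN.

```python
import re
from collections import defaultdict

# One line of the form machine[interface number]=LAN name, e.g. "r1[1]=B".
LINE = re.compile(r"^\s*(\w+)\[(\d+)\]\s*=\s*(\w+)\s*$")


def parse_lab_conf(text):
    """Return a dict mapping each virtual LAN to its (machine, interface) pairs."""
    lans = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # other lines are ignored in this sketch
            machine, index, lan = match.groups()
            lans[lan].append((machine, "eth" + index))
    return dict(lans)
```

Applied to the four lines of the lab described above, this groups pc1 and r1 (both through eth0) on LAN A, and pc2 (eth0) and r1 (eth1) on LAN B.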
Note that capturing network traces can be facilitated by using the uml_dump extension available at http://kartoch.msi.unilim.fr/blog/?p=19. This extension is already installed in the Netkit installation on the student lab. In order to capture the traffic exchanged on a given 'virtual LAN', you simply need to issue the command vdump on the host. If you want to pipe the trace to wireshark, you can use vdump A | wireshark -i - -k

1. A TCP/IP stack receives a SYN segment with the sequence number set to 1234. What will be the value of the acknowledgement number in the returned SYN+ACK segment?

2. Is it possible for a TCP/IP stack to return a SYN+ACK segment with the acknowledgement number set to 0? If no, explain why. If yes, what was the content of the received SYN segment?

3. Open the tcpdump packet trace exercises/traces/trace.5connections_opening_closing.pcap and identify the number of different TCP connections that are established and closed. For each connection, explain by which mechanism it is closed. Analyse the initial sequence numbers that are used in the SYN and SYN+ACK segments. How do these initial sequence numbers evolve? Are they increased every 4 microseconds?

4. The tcpdump packet trace exercises/traces/trace.5connections.pcap contains several connection attempts. Can you explain what is happening with these connection attempts?

5. The tcpdump packet trace exercises/traces/trace.ipv6.google.com.pcap was collected from a popular website that is accessible by using IPv6. Explain the TCP options that are supported by the client and the server.


6. The tcpdump packet trace exercises/traces/trace.sirius.info.ucl.ac.be.pcap was collected on the departmental server. What are the TCP options supported by this server?

7. A TCP implementation maintains a Transmission Control Block (TCB) for each TCP connection. This TCB is a data structure that contains the complete state of each TCP connection. The TCB is described in RFC 793. It contains first the identification of the TCP connection:

    - localip: the IP address of the local host
    - remoteip: the IP address of the remote host
    - remoteport: the TCP port used for this connection on the remote host
    - localport: the TCP port used for this connection on the local host. Note that when a client opens a TCP connection, the local port will often be chosen in the ephemeral port range (49152 <= localport <= 65535).

    It then contains the state variables of the connection:

    - sndnxt: the sequence number of the next byte in the byte stream (the first byte of a new data segment that you send will use this sequence number)
    - snduna: the earliest sequence number that has been sent but has not yet been acknowledged
    - rcvnxt: the sequence number of the next byte that your implementation expects to receive from the remote host. For this exercise, you do not need to maintain a receive buffer and your implementation can discard the out-of-sequence segments that it receives.
    - sndwnd: the current sending window
    - rcvwnd: the current window advertised by the receiver

    Using the exercises/traces/trace.sirius.info.ucl.ac.be.pcap packet trace, what is the TCB of the connection on host 130.104.78.8 when it sends the third segment of the trace?

8. The tcpdump packet trace exercises/traces/trace.maps.google.com was collected by contacting a popular web site that provides mapping information. How many TCP connections were used to retrieve the information from this server?
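The TCB fields listed in exercise 7 can be grouped in a small data structure when you analyse a trace by hand or by program. The sketch below is one possible layout, not the structure mandated by RFC 793: the class name `TCB` and the two helper methods are hypothetical, and the only protocol fact it relies on is that TCP sequence numbers are 32 bits wide and wrap around.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


@dataclass
class TCB:
    # Identification of the connection
    localip: str
    remoteip: str
    localport: int
    remoteport: int
    # State variables, using the names of the exercise
    sndnxt: int = 0
    snduna: int = 0
    rcvnxt: int = 0
    sndwnd: int = 0
    rcvwnd: int = 0

    def connection_id(self):
        """Return the four values that identify the connection."""
        return (self.localip, self.localport, self.remoteip, self.remoteport)

    def bytes_in_flight(self):
        """Number of bytes sent but not yet acknowledged (handles wrap-around)."""
        return (self.sndnxt - self.snduna) % SEQ_SPACE
```

The addresses used when instantiating such an object are up to you; filling in the values for the trace of exercise 7 remains the purpose of the exercise.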
9. Some network monitoring tools such as ntop collect all the TCP segments sent and received by a host or a group of hosts and provide interesting statistics such as the number of TCP connections, the number of bytes exchanged over each TCP connection, ... Assuming that you can capture all the TCP segments sent by a host, propose the pseudo-code of an application that would list all the TCP connections established and accepted by this host and the number of bytes exchanged over each connection. Do you need to count the number of bytes contained inside each segment to report the number of bytes exchanged over each TCP connection?

10. There are two types of firewalls [30]: special devices that are placed at the border of campus or enterprise networks and software that runs on end hosts. Software firewalls typically analyse all the packets that are received by a host and decide based on the packet's header and contents whether it can be processed by the host's network stack or must be discarded. System administrators often configure firewalls on laptop or student machines to prevent students from installing servers on their machines. How would you design a simple firewall that blocks all incoming TCP connections but still allows the host to establish TCP connections to any remote server?

11. Using the netkit lab explained above, perform some tests by using hping3(8). hping3(8) is a command line tool that allows anyone (having system administrator privileges) to send special IP packets and TCP segments. hping3(8) can be used to verify the configuration of firewalls [33] or diagnose problems. We will use it to test the operation of the Linux TCP stack running inside netkit.

    1. On the server host, launch tcpdump(1) with -vv as parameter to collect all packets received from the client and display them. Using hping3(8) on the client host, send a valid SYN segment to one unused port on the server host (e.g. 12345). What are the contents of the segment returned by the server?
[30] A firewall is a software or hardware device that analyses TCP/IP packets and decides, based on a set of rules, to accept or discard the packets received or sent. The rules used by a firewall usually depend on the value of some fields of the packets (e.g. type of transport protocols, ports, ...). We will discuss in more detail the operation of firewalls in the network layer chapter.


    2. Perform the same experiment, but now send a SYN segment towards port 7. This port is the default port for the echo service (see services(5)) launched by xinetd(8). What segment does the server send in reply? What happens upon reception of this segment? Explain your answer.

12. The Linux TCP/IP stack can be easily configured by using sysctl(8) to change kernel configuration variables. See http://fasterdata.es.net/TCP-tuning/ip-sysctl-2.6.txt for a recent list of the sysctl variables on the Linux TCP/IP stack. Try to disable the selective acknowledgements and the RFC 1323 timestamp and large window options and open a TCP connection on port 7 on the server by using telnet(1). Check by using tcpdump(1) the effect of these kernel variables on the segments sent by the Linux stack in netkit.

13. Network administrators sometimes need to verify which networking daemons are active on a server. When logged on the server, several tools can be used to verify this. A first solution is to use the netstat(8) command. This command allows you to extract various statistics from the networking stack of the Linux kernel. For TCP, netstat can list all the active TCP connections with the state of their FSM. netstat supports the following options that could be useful during this exercise:

    - -t requests information about the TCP connections
    - -n requests numeric output (by default, netstat sends DNS queries to resolve IP addresses into host names and uses /etc/services to convert port numbers into service names; -n is recommended on netkit machines)
    - -e provides more information about the state of the TCP connections
    - -o provides information about the timers
    - -a provides information about all TCP connections, not only those in the Established state

    On the netkit lab, launch a daemon and start a TCP connection by using telnet(1) and use netstat(8) to verify the state of these connections.

    A second solution to determine which network daemons are running on a server is to use a tool like nmap(1).
    nmap(1) can be run remotely and thus can provide information about a host on which the system administrator cannot log in. Use tcpdump(1) to collect the segments sent by nmap(1) running on the client and explain how nmap(1) operates.

14. Long lived TCP connections are susceptible to the so-called RST attacks. Try to find additional information about this attack and explain how a TCP stack could mitigate such attacks.

15. For the exercises below, we have performed measurements in an emulated [31] network similar to the one shown below.

Figure 4.69: Emulated network

    The emulated network is composed of three UML machines [32]: a client, a server and a router. The client and the server are connected via the router. The client sends data to the server. The link between the router and the client is controlled by using the netem Linux kernel module. This module allows us to insert additional delays, reduce the link bandwidth and insert random packet losses.

[31] With an emulated network, it is more difficult to obtain quantitative results than with a real network since all the emulated machines need to share the same CPU and memory. This creates interactions between the different emulated machines that do not happen in the real world. However, since the objective of this exercise is only to allow the students to understand the behaviour of the TCP congestion control mechanism, this is not a severe problem.

[32] For more information about the TCP congestion control schemes implemented in the Linux kernel, see http://linuxgazette.net/135/pfeiffer.html and http://www.cs.helsinki.fi/research/iwtcp/papers/linuxtcp.pdf or the source code of a recent Linux. A description of some of the sysctl variables that allow you to tune the TCP implementation in the Linux kernel may be found in http://fasterdata.es.net/TCP-tuning/linux.html. For this exercise, we have configured the Linux kernel to use the NewReno scheme (RFC 3782), which is very close to the official standard defined in RFC 5681.


    We used netem to collect several traces:

    - exercises/traces/trace0.pcap
    - exercises/traces/trace1.pcap
    - exercises/traces/trace2.pcap
    - exercises/traces/trace3.pcap

    Using wireshark or tcpdump, carry out the following analyses:

    1. Identify the TCP options that have been used on the TCP connection.
    2. Try to find explanations for the evolution of the round-trip-time on each of these TCP connections. For this, you can use the round-trip-time graph of wireshark, but be careful with its estimations as some versions of wireshark are buggy.
    3. Verify whether the TCP implementation used implements delayed acknowledgements.
    4. Inside each packet trace, find:
        1. one segment that has been retransmitted by using fast retransmit. Explain this retransmission in detail.
        2. one segment that has been retransmitted because of the expiration of TCP's retransmission timeout. Explain why this segment could not have been retransmitted by using fast retransmit.
    5. wireshark contains two useful graphs: the round-trip-time graph and the time sequence graph. Explain how you would compute the same graphs from such a trace.
    6. When displaying TCP segments, recent versions of wireshark contain expert analysis heuristics that indicate whether the segment has been retransmitted, whether it is a duplicate ack or whether the retransmission timeout has expired. Explain how you would implement the same heuristics as wireshark.
    7. Can you find which file has been exchanged during the transfer?
16. You have been hired as a networking expert by a company. In this company, users of a networked application complain that the network is very slow. The developers of the application argue that any delays are caused by packet losses and a buggy network. The network administrator argues that the network works perfectly and that the delays perceived by the users are caused by the application or the servers where the application is running. To resolve the case and determine whether the problem is due to the network or the server on which the application is running, the network administrator has collected a representative packet trace that you can download from exercises/traces/trace9.pcap. By looking at the trace, can you resolve this case and indicate whether the network or the application is the culprit?


CHAPTER 5

The network layer

The transport layer enables the applications to efficiently and reliably exchange data. Transport layer entities expect to be able to send segments to any destination without having to understand anything about the underlying subnetwork technologies. Many subnetwork technologies exist. Most of them differ in subtle details (frame size, addressing, ...). The network layer is the glue between these subnetworks and the transport layer. It hides from the transport layer all the complexity of the underlying subnetworks and ensures that information can be exchanged between hosts connected to different types of subnetworks.

In this chapter, we first explain the principles of the network layer. These principles include the datagram and virtual circuit modes, the separation between the data plane and the control plane and the algorithms used by routing protocols. Then, we explain, in more detail, the network layer in the Internet, starting with IPv4 and IPv6 and then moving to the routing protocols (RIP, OSPF and BGP).

5.1 Principles

The main objective of the network layer is to allow end systems, connected to different networks, to exchange information through intermediate systems called routers. The unit of information in the network layer is called a packet.

Figure 5.1: The network layer in the reference model

Before explaining the network layer in detail, it is useful to begin by analysing the service provided by the datalink layer. There are many variants of the datalink layer. Some provide a connection-oriented service while others provide a connectionless service. In this section, we focus on connectionless datalink layer services as they are the most widely used. Using a connection-oriented datalink layer causes some problems that are beyond the scope of this chapter. See RFC 3819 for a discussion on this topic.
There are three main types of datalink layers. The simplest datalink layer is when there are only two communicating systems that are directly connected through the physical layer. Such a datalink layer is used when there is a point-to-point link between the two communicating systems. The two systems can be end systems or routers. PPP (Point-to-Point Protocol), defined in RFC 1661, is an example of such a point-to-point datalink layer. Datalink layers exchange frames and a datalink frame sent by a datalink layer entity on the left is transmitted through the physical layer, so that it can reach the datalink layer entity on the right. Point-to-point datalink layers can either provide an unreliable service (frames can be corrupted or lost) or a reliable service (in this case, the datalink layer includes retransmission mechanisms similar to the ones used in the transport layer). The unreliable service is frequently used above physical layers (e.g. optical fiber, twisted pairs) having a low bit error ratio while reliability mechanisms are often used in wireless networks to recover locally from transmission errors.

Figure 5.2: The point-to-point datalink layer

The second type of datalink layer is the one used in Local Area Networks (LAN). Conceptually, a LAN is a set of communicating devices such that any two devices can directly exchange frames through the datalink layer. Both end systems and routers can be connected to a LAN. Some LANs only connect a few devices, but there are LANs that can connect hundreds or even thousands of devices.

Figure 5.3: A local area network

In the next chapter, we describe the organisation and the operation of Local Area Networks. An important difference between the point-to-point datalink layers and the datalink layers used in LANs is that in a LAN, each communicating device is identified by a unique datalink layer address. This address is usually embedded in the hardware of the device and different types of LANs use different types of datalink layer addresses. A communicating device attached to a LAN can send a datalink frame to any other communicating device that is attached to the same LAN. Most LANs also support special broadcast and multicast datalink layer addresses. A frame sent to the broadcast address of the LAN is delivered to all communicating devices that are attached to the LAN. The multicast addresses are used to identify groups of communicating devices. When a frame is sent towards a multicast datalink layer address, it is delivered by the LAN to all communicating devices that belong to the corresponding group.
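The three kinds of destination addresses supported by most LANs can be summarised with a small model. This is a conceptual sketch only: the class `Lan` and its methods are hypothetical, the Ethernet-style broadcast address is used merely as an example, and the sketch assumes that the sender does not receive a copy of its own frame.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"  # example: the Ethernet broadcast address


class Lan:
    """Hypothetical model of the delivery service offered by a LAN."""

    def __init__(self):
        self.devices = set()  # datalink layer addresses of attached devices
        self.groups = {}      # multicast address -> set of member addresses

    def attach(self, address):
        self.devices.add(address)

    def join(self, group, address):
        self.groups.setdefault(group, set()).add(address)

    def receivers(self, source, destination):
        """Return the set of devices that receive a frame sent by `source`."""
        if destination == BROADCAST:
            return self.devices - {source}      # every other attached device
        if destination in self.groups:
            return self.groups[destination] - {source}  # members of the group
        if destination in self.devices:
            return {destination}                # plain unicast delivery
        return set()                            # unknown address: nobody
```

An NBMA network could be modelled in the same way by keeping only the unicast branch.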
The third type of datalink layer is used in Non-Broadcast Multi-Access (NBMA) networks. These networks are used to interconnect devices like a LAN. All devices attached to an NBMA network are identified by a unique datalink layer address. However, and this is the main difference between an NBMA network and a traditional LAN, the NBMA service only supports unicast. The datalink layer service provided by an NBMA network supports neither broadcast nor multicast.

Unfortunately no datalink layer is able to send frames of unlimited size. Each datalink layer is characterised by a maximum frame size. There are more than a dozen different datalink layers and unfortunately most of them use a different maximum frame size. The network layer must cope with the heterogeneity of the datalink layer.

The network layer itself relies on the following principles:

1. Each network layer entity is identified by a network layer address. This address is independent of the datalink layer addresses that it may use.


2. The service provided by the network layer does not depend on the service or the internal organisation of the underlying datalink layers.
3. The network layer is conceptually divided into two planes: the data plane and the control plane. The data plane contains the protocols and mechanisms that allow hosts and routers to exchange packets carrying user data. The control plane contains the protocols and mechanisms that enable routers to efficiently learn how to forward packets towards their final destination.

The independence of the network layer from the underlying datalink layer is a key principle of the network layer. It ensures that the network layer can be used to allow hosts attached to different types of datalink layers to exchange packets through intermediate routers. Furthermore, this allows the datalink layers and the network layer to evolve independently from each other. This enables the network layer to be easily adapted to a new datalink layer every time a new datalink layer is invented.

There are two types of service that can be provided by the network layer:

- an unreliable connectionless service
- a connection-oriented, reliable or unreliable, service

Connection-oriented services have been popular with technologies such as X.25 and ATM or frame-relay, but nowadays most networks use an unreliable connectionless service. This is our main focus in this chapter.

5.1.1 Organisation of the network layer

There are two possible internal organisations of the network layer:

- datagram
- virtual circuits

The internal organisation of the network is orthogonal to the service that it provides, but most of the time a datagram organisation is used to provide a connectionless service while a virtual circuit organisation is used in networks that provide a connection-oriented service.
Datagram organisation

The first and most popular organisation of the network layer is the datagram organisation. This organisation is inspired by the organisation of the postal service. Each host is identified by a network layer address. To send information to a remote host, a host creates a packet that contains:

- the network layer address of the destination host
- its own network layer address
- the information to be sent

The network layer limits the maximum packet size. Thus, the information must have been divided in packets by the transport layer before being passed to the network layer.

To understand the datagram organisation, let us consider the figure below. A network layer address, represented by a letter, has been assigned to each host and router. To send some information to host J, host A creates a packet containing its own address, the destination address and the information to be exchanged.

With the datagram organisation, routers use hop-by-hop forwarding. This means that when a router receives a packet that is not destined to itself, it looks up the destination address of the packet in its routing table. A routing table is a data structure that maps each destination address (or set of destination addresses) to the outgoing interface over which a packet destined to this address must be forwarded to reach its final destination.

The main constraint imposed on the routing tables is that they must allow any host in the network to reach any other host. This implies that each router must know a route towards each destination, but also that the paths composed from the information stored in the routing tables must not contain loops. Otherwise, some destinations would be unreachable.
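Hop-by-hop forwarding and the two constraints on routing tables (a route towards each destination, no loops) can be illustrated with a few lines of code. This is a sketch under stated assumptions: the function name `forward` is hypothetical, each outgoing interface is represented by the name of the neighbour reached through it, and the routing tables used in the example below are invented for illustration.

```python
def forward(tables, source, destination, max_hops=16):
    """Follow the routing tables hop by hop and return the path that is used.

    `tables` maps each node to a dict {destination: next hop}. A LookupError
    signals a missing route and a RuntimeError signals a forwarding loop.
    """
    path = [source]
    node = source
    while node != destination:
        next_hop = tables.get(node, {}).get(destination)
        if next_hop is None:
            raise LookupError(f"{node} has no route towards {destination}")
        if next_hop in path or len(path) > max_hops:
            raise RuntimeError(f"forwarding loop detected at {node}")
        path.append(next_hop)
        node = next_hop
    return path


# Invented tables: only the entries towards destination J are shown.
EXAMPLE_TABLES = {
    "A": {"J": "R1"},
    "R1": {"J": "R2"},
    "R2": {"J": "R5"},
    "R5": {"J": "J"},
}
```

With these tables, `forward(EXAMPLE_TABLES, "A", "J")` follows the path A, R1, R2, R5, J. Replacing the entry of R2 by `{"J": "R1"}` creates a loop between R1 and R2, and J becomes unreachable from A.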


Figure 5.4: A simple internetwork

In the example above, host A sends its packet to router R1. R1 consults its routing table and forwards the packet towards R2. Based on its own routing table, R2 decides to forward the packet to R5 that can deliver it to its destination.

To allow hosts to exchange packets, a network relies on two different types of protocols and mechanisms. First, there must be a precise definition of the format of the packets that are sent by hosts and processed by routers. Second, the algorithm used by the routers to forward these packets must be defined. This protocol and this algorithm are part of the data plane of the network layer. The data plane contains all the protocols and algorithms that are used by hosts and routers to create and process the packets that contain user data.

The data plane, and in particular the forwarding algorithm used by the routers, depends on the routing tables that are maintained on each router. These routing tables can be maintained by using various techniques (manual configuration, distributed protocols, centralised computation, etc). These techniques are part of the control plane of the network layer. The control plane contains all the protocols and mechanisms that are used to compute and install routing tables on the routers.

The datagram organisation has been very popular in computer networks. Datagram based network layers include IPv4 and IPv6 in the global Internet, CLNP defined by the ISO, IPX defined by Novell or XNS defined by Xerox [Perlman2000].

Virtual circuit organisation

The main advantage of the datagram organisation is its simplicity. The principles of this organisation can easily be understood. Furthermore, it allows a host to easily send a packet towards any destination at any time. However, as each packet is forwarded independently by intermediate routers, packets sent by a host may not follow the same path to reach a given destination. This may cause packet reordering, which may be annoying for transport protocols. Furthermore, as a router using hop-by-hop forwarding always forwards packets sent towards the same destination over the same outgoing interface, this may cause congestion over some links.
The second organisation of the network layer, called virtual circuits, has been inspired by the organisation of telephone networks. Telephone networks have been designed to carry phone calls that usually last a few minutes. Each phone is identified by a telephone number and is attached to a telephone switch. To initiate a phone call, a telephone first needs to send the destination's phone number to its local switch. The switch cooperates with the other switches in the network to create a bi-directional channel between the two telephones through the network. This channel will be used by the two telephones during the lifetime of the call and will be released at the end of the call. Until the 1960s, most of these channels were created manually, by telephone operators, upon request of the caller. Today's telephone networks use automated switches and allow several channels to be carried over the same physical link, but the principles remain roughly the same.


In a network using virtual circuits, all hosts are identified with a network layer address. However, a host must explicitly request the establishment of a virtual circuit before being able to send packets to a destination host. The request to establish a virtual circuit is processed by the control plane, which installs state to create the virtual circuit between the source and the destination through intermediate routers. All the packets that are sent on the virtual circuit contain a virtual circuit identifier that allows the routers to determine to which virtual circuit each packet belongs. This is illustrated in the figure below with one virtual circuit between host A and host I and another one between host A and host J.

Figure 5.5: A simple internetwork using virtual-circuits

The establishment of a virtual circuit is performed using a signalling protocol in the control plane. Usually, the source host sends a signalling message to indicate to its router the address of the destination and possibly some performance characteristics of the virtual circuit to be established. The first router can process the signalling message in two different ways.

A first solution is for the router to consult its routing table, remember the characteristics of the requested virtual circuit and forward it over its outgoing interface towards the destination. The signalling message is thus forwarded hop-by-hop until it reaches the destination and the virtual circuit is opened along the path followed by the signalling message. This is illustrated with the red virtual circuit in the figure below.
Figure 5.6: Virtual circuit establishment

A second solution can be used if the routers know the entire topology of the network. In this case, the first router can use a technique called source routing. Upon reception of the signalling message, the first router chooses the path of the virtual circuit in the network. This path is encoded as the list of the addresses of all intermediate routers to reach the destination. It is included in the signalling message and intermediate routers can remove their address from the signalling message before forwarding it. This technique enables routers to spread the virtual circuits throughout the network better. If the routers know the load of remote links, they can also select the least loaded path when establishing a virtual circuit. This solution is illustrated with the blue circuit in the figure above.

The last point to be discussed about the virtual circuit organisation is its data plane. The data plane mainly defines the format of the data packets and the algorithm used by routers to forward packets. The data packets contain a virtual circuit identifier, encoded as a fixed number of bits. These virtual circuit identifiers are usually called labels.


Each host maintains a flow table that associates a label with each virtual circuit that it has established. When a router receives a packet containing a label, it extracts the label and consults its label forwarding table. This table is a data structure that maps each couple (incoming interface, label) to the outgoing interface to be used to forward the packet as well as the label that must be placed in the outgoing packets. In practice, the label forwarding table can be implemented as a vector and the couple (incoming interface, label) is the index of the entry in the vector that contains the outgoing interface and the outgoing label. Thus a single memory access is sufficient to consult the label forwarding table. The utilisation of the label forwarding table is illustrated in the figure below.

Figure 5.7: Label forwarding tables in a network using virtual circuits

The virtual circuit organisation has been mainly used in public networks, starting from X.25 and then Frame Relay and Asynchronous Transfer Mode (ATM) networks.

Both the datagram and virtual circuit organisations have advantages and drawbacks. The main advantage of the datagram organisation is that hosts can easily send packets to any number of destinations while the virtual circuit organisation requires the establishment of a virtual circuit before the transmission of a data packet. This solution can be costly for hosts that exchange small amounts of data. On the other hand, the main advantage of the virtual circuit organisation is that the forwarding algorithm used by routers is simpler than when using the datagram organisation. Furthermore, the utilisation of virtual circuits may allow the load to be better spread through the network thanks to the utilisation of multiple virtual circuits. The MultiProtocol Label Switching (MPLS) technique that we will discuss in another revision of this book can be considered as a good compromise between datagram and virtual circuits. MPLS uses virtual circuits between routers, but does not extend them to the end hosts. Additional information about MPLS may be found in [ML2011].
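The vector-based label forwarding table described earlier in this section can be sketched in a few lines of Python. The label width (`LABEL_BITS`), the number of interfaces and all function names are assumptions chosen for the illustration. The point is only that the couple (incoming interface, label) can be turned into a single index.

```python
LABEL_BITS = 4          # assumed width of a label, in bits
NUM_INTERFACES = 3      # assumed number of interfaces on this router


def slot(in_interface, label):
    """Index of the entry for the couple (incoming interface, label)."""
    return (in_interface << LABEL_BITS) | label


# One entry per possible couple; None means that no virtual circuit
# has been established for this couple.
table = [None] * (NUM_INTERFACES << LABEL_BITS)


def install(in_interface, in_label, out_interface, out_label):
    """Control plane side: record the state of a new virtual circuit."""
    table[slot(in_interface, in_label)] = (out_interface, out_label)


def forward(in_interface, in_label):
    """Data plane side: a single lookup returns (out interface, out label)."""
    entry = table[slot(in_interface, in_label)]
    if entry is None:
        raise LookupError("unknown virtual circuit")
    return entry


install(0, 5, 2, 9)
print(forward(0, 5))
```

The trade-off of this layout is memory: the vector reserves an entry for every possible couple, whether or not a virtual circuit uses it.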
5.1.2 The control plane

One of the objectives of the control plane in the network layer is to maintain the routing tables that are used on all routers. As indicated earlier, a routing table is a data structure that contains, for each destination address (or block of addresses) known by the router, the outgoing interface over which the router must forward a packet destined to this address. The routing table may also contain additional information such as the address of the next router on the path towards the destination or an estimation of the cost of this path.

In this section, we discuss the three main techniques that can be used to maintain the routing tables in a network.

Static routing

The simplest solution is to pre-compute all the routing tables of all routers and to install them on each router. Several algorithms can be used to compute these tables.


A simple solution is to use shortest path routing and to minimise the number of intermediate routers to reach each destination. More complex algorithms can take into account the expected load on the links to ensure that congestion does not occur for a given traffic demand. These algorithms must all ensure that:
• all routers are configured with a route to reach each destination
• none of the paths composed with the entries found in the routing tables contain a cycle. Such a cycle would lead to a forwarding loop.

The figure below shows sample routing tables in a network of five routers.

Figure 5.8: Routing tables in a simple network

The main drawback of static routing is that it does not adapt to the evolution of the network. When a new router or link is added, all routing tables must be recomputed. Furthermore, when a link or router fails, the routing tables must be updated as well.

Distance vector routing

Distance vector routing is a simple distributed routing protocol. Distance vector routing allows routers to automatically discover the destinations reachable inside the network as well as the shortest path to reach each of these destinations. The shortest path is computed based on metrics or costs that are associated to each link. We use l.cost to represent the metric that has been configured for link l on a router.

Each router maintains a routing table. The routing table R can be modelled as a data structure that stores, for each known destination address d, the following attributes:
• R[d].link is the outgoing link that the router uses to forward packets towards destination d
• R[d].cost is the sum of the metrics of the links that compose the shortest path to reach destination d
• R[d].time is the timestamp of the last distance vector containing destination d

A router that uses distance vector routing regularly sends its distance vector over all its interfaces. The distance vector is a summary of the router's routing table that indicates the distance towards each known destination. This distance vector can be computed from the routing table by using the pseudo-code below.
    Every N seconds:
        v = Vector()
        for d in R[]:
            # add destination d to vector
            v.add(Pair(d, R[d].cost))
        for i in interfaces:
            # send vector v on this interface
            send(v, i)

When a router boots, it does not know any destination in the network and its routing table only contains itself. It thus sends to all its neighbours a distance vector that contains only its address at a distance of 0. When a router receives a distance vector on link l, it processes it as follows.

    # V : received Vector
    # l : link over which vector is received
    def received(V, l):
        # received vector from link l
        for d in V[]:
            if not (d in R[]):
                # new route
                R[d].cost = V[d].cost + l.cost
                R[d].link = l
                R[d].time = now
            else:
                # existing route, is the new better ?
                if (((V[d].cost + l.cost)

Figure 5.10: Routing tables computed by distance vector in a simple network

its distance vector to its neighbours to inform them. The route can then be removed from the routing table after some time (e.g. 3 × N seconds), to ensure that the neighbouring routers have received the bad news, even if some distance vectors do not reach them due to transmission errors.

Consider the example above and assume that the link between routers A and B fails. Before the failure, A used B to reach destinations B, C and E while B only used the A-B link to reach A. The affected entries timeout on routers A and B and they both send their distance vector.
• A sends its distance vector [A = 0; B = ∞; C = ∞; D = 1; E = ∞]. D knows that it cannot reach B anymore via A.
• D sends its distance vector [D = 0; B = ∞; A = 1; C = 2; E = 1] to A and E. A recovers routes towards C and E via D.
• B sends its distance vector [B = 0; A = ∞; C = 1; D = 2; E = 1] to E and C. D learns that there is no route anymore to reach A via B.
• E sends its distance vector [E = 0; A = 2; C = 1; D = 1; B = 1] to D, B and C. D learns a route towards B. C and B learn a route towards A.

At this point, all routers have a routing table allowing them to reach all other routers, except router A, which cannot yet reach router B. A recovers the route towards B once router D sends its updated distance vector [A = 1; B = 2; C = 2; D = 0; E = 1]. This last step is illustrated in figure Routing tables computed by distance vector after a failure, which shows the routing tables on all routers.

Figure 5.11: Routing tables computed by distance vector after a failure

Consider now that the link between D and E fails. The network is now partitioned into two disjoint parts: (A, D) and (B, E, C). The routes towards B, C and E expire first on router D. At this time, router D updates its routing table.

If D sends [D = 0; A = 1; B = ∞; C = ∞; E = ∞], A learns that B, C and E are unreachable and updates its routing table.

Unfortunately, if the distance vector sent to A is lost or if A sends its own distance vector ([A = 0; D = 1; B = 3; C = 3; E = 2]) at the same time as D sends its distance vector, D updates its routing table to use the shorter routes advertised by A towards B, C and E. After some time D sends a new distance vector: [D = 0; A = 1; E = 3; C = 4; B = 4]. A updates its routing table and after some time sends its own distance vector [A = 0; D = 1; B = 5; C = 5; E = 4], etc. This problem is known as the count to infinity problem in networking literature. Routers A and D exchange distance vectors with increasing costs until these costs reach ∞. This problem may occur in other scenarios than the one depicted in the above figure. In fact, distance vector routing may suffer from count to infinity problems as soon as there is a cycle in the network. Cycles are necessary to have enough redundancy to deal with link and router failures. To mitigate the impact of counting to infinity, some distance vector protocols consider that 16 = ∞. Unfortunately, this limits the metrics that network operators can use and the diameter of the networks using distance vectors.

This count to infinity problem occurs because router A advertises to router D a route that it has learned via router D. A possible solution to avoid this problem could be to change how a router creates its distance vector. Instead of computing one distance vector and sending it to all its neighbours, a router could create a distance vector that is specific to each neighbour and only contains the routes that have not been learned via this neighbour. This could be implemented by the following pseudo-code.

    Every N seconds:
        # one vector for each interface
        for l in interfaces:
            v = Vector()
            for d in R[]:
                if (R[d].link != l):
                    v = v + Pair(d, R[d].cost)
            # end for d in R[]
            send(v, l)
        # end for l in interfaces

This technique is called split-horizon. With this technique, the count to infinity problem would not have happened in the above scenario, as router A would have advertised [A = 0], since it learned all its other routes via router D. Another variant called split-horizon with poison reverse is also possible. Routers using this variant advertise a cost of ∞ for the destinations that they reach via the router to which they send the distance vector. This can be implemented by using the pseudo-code below.
    Every N seconds:
        for l in interfaces:
            # one vector for each interface
            v = Vector()
            for d in R[]:
                if (R[d].link != l):
                    v = v + Pair(d, R[d].cost)
                else:
                    v = v + Pair(d, infinity)
            # end for d in R[]
            send(v, l)
        # end for l in interfaces

Unfortunately, split-horizon is not sufficient to avoid all count to infinity problems with distance vector routing. Consider the failure of link A-B in the network of four routers below.

Figure 5.12: Count to infinity problem
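The two per-neighbour variants, split-horizon and split-horizon with poison reverse, can be compared with a small runnable Python sketch. The representation of the routing table as a dictionary, the link names `"local"` and `"A-D"` and the function name `build_vector` are assumptions made for the illustration. The costs are those of router A in the scenario where the D-E link has failed.

```python
import math

INFINITY = math.inf


def build_vector(routing_table, out_link, poison_reverse=False):
    """Return the distance vector to be sent over out_link.

    routing_table maps each destination to a (link, cost) pair.
    """
    vector = {}
    for destination, (link, cost) in routing_table.items():
        if link != out_link:
            # Route not learned over out_link: advertise it normally.
            vector[destination] = cost
        elif poison_reverse:
            # Route learned over out_link: advertise it as unreachable.
            vector[destination] = INFINITY
        # Plain split-horizon: the route is simply omitted.
    return vector


# Hypothetical routing table of router A: every route except the one
# towards itself was learned over the link towards D.
table_a = {
    "A": ("local", 0),
    "D": ("A-D", 1),
    "B": ("A-D", 3),
    "C": ("A-D", 3),
    "E": ("A-D", 2),
}

print(build_vector(table_a, "A-D"))
print(build_vector(table_a, "A-D", poison_reverse=True))
```

With plain split-horizon the vector sent towards D only contains A itself, while poison reverse lists every destination learned via D with an infinite cost.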


After having detected the failure, router B sends its distance vectors:
• [A = ∞; B = 0; C = ∞; E = 1] to router C
• [A = ∞; B = 0; C = 1; E = ∞] to router E

If, unfortunately, the distance vector sent to router C is lost due to a transmission error or because router C is overloaded, a new count to infinity problem can occur. If router C sends its distance vector [A = 2; B = 1; C = 0; E = ∞] to router E, this router installs a route of distance 3 to reach A via C. Router E sends its distance vectors [A = 3; B = ∞; C = 1; E = 0] to router B and [A = ∞; B = 1; C = ∞; E = 0] to router C. This distance vector allows B to recover a route of distance 4 to reach A.

Link state routing

Link state routing is the second family of routing protocols. While distance vector routers use a distributed algorithm to compute their routing tables, link-state routers exchange messages to allow each router to learn the entire network topology. Based on this learned topology, each router is then able to compute its routing table by using a shortest path computation [Dijkstra1959].

For link-state routing, a network is modelled as a directed weighted graph. Each router is a node, and the links between routers are the edges in the graph. A positive weight is associated to each directed edge and routers use the shortest path to reach each destination. In practice, different types of weight can be associated to each directed edge:
• unit weight. If all links have a unit weight, shortest path routing prefers the paths with the least number of intermediate routers.
• weight proportional to the propagation delay on the link. If all link weights are configured this way, shortest path routing uses the paths with the smallest propagation delay.
• weight = C / bandwidth where C is a constant larger than the highest link bandwidth in the network. If all link weights are configured this way, shortest path routing prefers higher bandwidth paths over lower bandwidth paths.

Usually, the same weight is associated to the two directed edges that correspond to a physical link (i.e.
R1 → R2 and R2 → R1). However, nothing in the link state protocols requires this. For example, if the weight is set as a function of the link bandwidth, then an asymmetric ADSL link could have a different weight for the upstream and downstream directions. Other variants are possible. Some networks use optimisation algorithms to find the best set of weights to minimise congestion inside the network for a given traffic demand [FRT2002].

When a link-state router boots, it first needs to discover to which routers it is directly connected. For this, each router sends a HELLO message every N seconds on all of its interfaces. This message contains the router's address. Each router has a unique address. As its neighbouring routers also send HELLO messages, the router automatically discovers to which neighbours it is connected. These HELLO messages are only sent to neighbours that are directly connected to a router, and a router never forwards the HELLO messages that it receives. HELLO messages are also used to detect link and router failures. A link is considered to have failed if no HELLO message has been received from the neighbouring router for a period of k × N seconds.

Figure 5.13: The exchange of HELLO messages


Once a router has discovered its neighbours, it must reliably distribute its local links to all routers in the network to allow them to compute their local view of the network topology. For this, each router builds a link-state packet (LSP) containing the following information:
• LSP.Router : identification (address) of the sender of the LSP
• LSP.age : age or remaining lifetime of the LSP
• LSP.seq : sequence number of the LSP
• LSP.Links[] : links advertised in the LSP. Each directed link is represented with the following information:
  - LSP.Links[i].Id : identification of the neighbour
  - LSP.Links[i].cost : cost of the link

These LSPs must be reliably distributed inside the network without using the router's routing table since these tables can only be computed once the LSPs have been received. The Flooding algorithm is used to efficiently distribute the LSPs of all routers. Each router that implements flooding maintains a link state database (LSDB) containing the most recent LSP sent by each router. When a router receives an LSP, it first verifies whether this LSP is already stored inside its LSDB. If so, the router has already distributed the LSP earlier and it does not need to forward it. Otherwise, the router forwards the LSP on all links except the link over which the LSP was received. Reliable flooding can be implemented by using the following pseudo-code.

    # links is the set of all links on the router
    # Router R's LSP arrival on link l
    if newer(LSP, LSDB(LSP.Router)):
        LSDB.add(LSP)
        for i in links:
            if i != l:
                send(LSP, i)
    else:
        # LSP has already been flooded
        pass

In this pseudo-code, LSDB(r) returns the most recent LSP originating from router r that is stored in the LSDB. newer(lsp1, lsp2) returns true if lsp1 is more recent than lsp2. See the note below for a discussion on how newer can be implemented.

Note: Which is the most recent LSP?

A router that implements flooding must be able to detect whether a received LSP is newer than the stored LSP.
This requires a comparison between the sequence number of the received LSP and the sequence number of the LSP stored in the link state database. The ARPANET routing protocol [MRR1979] used a 6 bits sequence number and implemented the comparison as follows (RFC 789):

    def newer(lsp1, lsp2):
        return (((lsp1.seq > lsp2.seq) and ((lsp1.seq - lsp2.seq) <= 32)) or
                ((lsp1.seq < lsp2.seq) and ((lsp2.seq - lsp1.seq) > 32)))

This comparison takes into account the modulo 2^6 arithmetic used to increment the sequence numbers. Intuitively, the comparison divides the circle of all sequence numbers into two halves. Usually, the sequence number of the received LSP is equal to the sequence number of the stored LSP incremented by one, but sometimes the sequence numbers of two successive LSPs may differ, e.g. if one router has been disconnected from the network for some time. The comparison above worked well until October 27, 1980. On this day, the ARPANET crashed completely. The crash was complex and involved several routers. At one point, LSP 40 and LSP 44 from one of the routers were stored in the LSDB of some routers in the ARPANET. As LSP 44 was the newest, it should have replaced LSP 40 on all routers. Unfortunately, one of the ARPANET routers suffered from a memory problem and sequence number 40 (101000 in binary) was replaced by 8 (001000 in binary) in the buggy router and flooded. Three LSPs were present in the network and 44 was newer than 40 which is newer than 8, but unfortunately 8 was considered to be newer than 44... All routers started to exchange these three link state packets forever and the only solution to recover from this problem was to shut down the entire network (RFC 789).

Current link state routing protocols usually use 32 bits sequence numbers and include a special mechanism for the unlikely case that a sequence number reaches the maximum value (exhausting a 32 bits sequence number space takes 136 years if a link state packet is generated every second).
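The circular ordering behind the ARPANET incident can be reproduced with a short Python sketch. It applies the comparison quoted above directly to integer sequence numbers instead of LSP objects, which is a simplification made for the illustration.

```python
def newer(seq1, seq2):
    """ARPANET-style comparison of two 6 bits sequence numbers."""
    return ((seq1 > seq2 and seq1 - seq2 <= 32) or
            (seq1 < seq2 and seq2 - seq1 > 32))


# The three sequence numbers that were present during the incident.
for a, b in [(44, 40), (40, 8), (8, 44)]:
    print(f"newer({a}, {b}) = {newer(a, b)}")
```

All three calls return true: 44 is newer than 40, 40 is newer than 8 and 8 is newer than 44. None of the three LSPs can therefore win, which is why they were flooded forever.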


To deal with the memory corruption problem, link state packets contain a checksum. This checksum is computed by the router that generates the LSP. Each router must verify the checksum when it receives or floods an LSP. Furthermore, each router must periodically verify the checksums of the LSPs stored in its LSDB.

Flooding is illustrated in the figure below. By exchanging HELLO messages, each router learns its direct neighbours. For example, router E learns that it is directly connected to routers D, B and C. Its first LSP has sequence number 0 and contains the directed links E->D, E->B and E->C. Router E sends its LSP on all its links and routers D, B and C insert the LSP in their LSDB and forward it over their other links.

Figure 5.14: Flooding: example

Flooding allows LSPs to be distributed to all routers inside the network without relying on routing tables. In the example above, the LSP sent by router E is likely to be sent twice on some links in the network. For example, routers B and C receive E's LSP at almost the same time and forward it over the B-C link. To avoid sending the same LSP twice on each link, a possible solution is to slightly change the pseudo-code above so that a router waits for some random time before forwarding an LSP on each link. The drawback of this solution is that the delay to flood an LSP to all routers in the network increases. In practice, routers immediately flood the LSPs that contain new information (e.g. addition or removal of a link) and delay the flooding of refresh LSPs (i.e. LSPs that contain exactly the same information as the previous LSP originating from this router) [FFEB2005].

To ensure that all routers receive all LSPs, even when there are transmission errors, link state routing protocols use reliable flooding. With reliable flooding, routers use acknowledgements and if necessary retransmissions to ensure that all link state packets are successfully transferred to all neighbouring routers. Thanks to reliable flooding, all routers store in their LSDB the most recent LSP sent by each router in the network. By combining the received LSPs with its own LSP, each router can compute the entire network topology.

Figure 5.15: Link state databases received by all routers


Note: Static or dynamic link metrics?

As link state packets are flooded regularly, routers are able to measure the quality (e.g. delay or load) of their links and adjust the metric of each link according to its current quality. Such dynamic adjustments were included in the ARPANET routing protocol [MRR1979]. However, experience showed that it was difficult to tune the dynamic adjustments and ensure that no forwarding loops occur in the network [KZ1989]. Today's link state routing protocols use metrics that are manually configured on the routers and are only changed by the network operators or network management tools [FRT2002].

When a link fails, the two routers attached to the link detect the failure by the lack of HELLO messages received in the last k × N seconds. Once a router has detected a local link failure, it generates and floods a new LSP that no longer contains the failed link and the new LSP replaces the previous LSP in the network. As the two routers attached to a link do not detect this failure exactly at the same time, some links may be announced in only one direction. This is illustrated in the figure below. Router E has detected the failure of link E-B and flooded a new LSP, but router B has not yet detected the failure.

Figure 5.16: The two-way connectivity check

When a link is reported in the LSP of only one of the attached routers, routers consider the link as having failed and they remove it from the directed graph that they compute from their LSDB. This is called the two-way connectivity check. This check allows link failures to be flooded quickly as a single LSP is sufficient to announce such bad news. However, when a link comes up, it can only be used once the two attached routers have sent their LSPs. The two-way connectivity check also allows for dealing with router failures. When a router fails, all its links fail by definition. Unfortunately, it does not, of course, send a new LSP to announce its failure. The two-way connectivity check ensures that the failed router is removed from the graph.
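The two-way connectivity check can be sketched as a filter applied to the LSDB before the graph is used. In this Python sketch the LSDB is modelled as a dictionary that maps each router to the links advertised in its most recent LSP. This representation, the function name `usable_links` and the sample topology are assumptions made for the illustration.

```python
def usable_links(lsdb):
    """Keep a directed link only if the reverse direction is advertised too.

    lsdb maps each router to a {neighbour: cost} dictionary taken from
    the most recent LSP of that router.
    """
    graph = {}
    for router, links in lsdb.items():
        graph[router] = {
            neighbour: cost
            for neighbour, cost in links.items()
            # Two-way check: the neighbour must also advertise this router.
            if router in lsdb.get(neighbour, {})
        }
    return graph


# Hypothetical LSDB: E has already flooded an LSP without the E-B link,
# while B still advertises B->E because it has not detected the failure.
lsdb = {
    "B": {"C": 1, "E": 1},
    "C": {"B": 1, "E": 1},
    "D": {"E": 1},
    "E": {"C": 1, "D": 1},
}

print(usable_links(lsdb)["B"])
```

The stale B->E entry disappears from the graph as soon as the new LSP of E is stored, without waiting for B to notice the failure. The same filter removes all the links of a failed router, since its neighbours no longer advertise it.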
When a router has failed, its LSP must be removed from the LSDB of all routers¹. This can be done by using the age field that is included in each LSP. The age field is used to bound the maximum lifetime of a link state packet in the network. When a router generates an LSP, it sets its lifetime (usually measured in seconds) in the age field. All routers regularly decrement the age of the LSPs in their LSDB and an LSP is discarded once its age reaches 0. Thanks to the age field, the LSP from a failed router does not remain in the LSDBs forever.

To compute its routing table, each router computes the spanning tree rooted at itself by using Dijkstra's shortest path algorithm [Dijkstra1959]. The routing table can be derived automatically from the spanning tree as shown in the figure below.

Figure 5.17: Computation of the routing table

¹ It should be noted that link state routing assumes that all routers in the network have enough memory to store the entire LSDB. The routers that do not have enough memory to store the entire LSDB cannot participate in link state routing. Some link state routing protocols allow routers to report that they do not have enough memory and must be removed from the graph by the other routers in the network.

5.2 Internet Protocol

The Internet Protocol (IP) is the network layer protocol of the TCP/IP protocol suite. IP allows the applications running above the transport layer (UDP/TCP) to use a wide range of heterogeneous datalink layers. IP was designed when most point-to-point links were telephone lines with modems. Since then, IP has been able to use Local Area Networks (Ethernet, Token Ring, FDDI, ...), new wide area datalink layer technologies (X.25, ATM, Frame Relay, ...) and more recently wireless networks (802.11, 802.15, UMTS, GPRS, ...). The flexibility of IP and its ability to use various types of underlying datalink layer technologies is one of its key advantages.

Figure 5.18: IP and the reference model

The current version of IP is version 4 specified in RFC 791. We first describe this version and later explain IP version 6, which is expected to replace IP version 4 in the not so distant future.

5.2.1 IP version 4

IP version 4 is the data plane protocol of the network layer in the TCP/IP protocol suite. The design of IP version 4 was based on the following assumptions:
• IP should provide an unreliable connectionless service (TCP provides reliability when required by the application)
• IP operates with the datagram transmission mode
• IP addresses have a fixed size of 32 bits
• IP must be usable above different types of datalink layers
• IP hosts exchange variable length packets

The addresses are an important part of any network layer protocol. In the late 1970s, the developers of IPv4 designed IPv4 for a research network that would interconnect some research labs and universities. For this utilisation, 32 bits wide addresses were much larger than the expected number of hosts on the network. Furthermore, 32 bits was a nice address size for software-based routers. None of the developers of IPv4 were expecting that IPv4 would become as widely used as it is today.

IPv4 addresses are encoded as a 32 bits field. IPv4 addresses are often represented in dotted-decimal format as a sequence of four integers separated by a dot. The first integer is the decimal representation of the most significant byte of the 32 bits IPv4 address, ... For example,
• 1.2.3.4 corresponds to 00000001000000100000001100000100


• 127.0.0.1 corresponds to 01111111000000000000000000000001
• 255.255.255.255 corresponds to 11111111111111111111111111111111

An IPv4 address is used to identify an interface on a router or a host. A router has thus as many IPv4 addresses as the number of interfaces that it has in the datalink layer. Most hosts have a single datalink layer interface and thus have a single IPv4 address. However, with the growth of wireless, more and more hosts have several datalink layer interfaces (e.g. an Ethernet interface and a WiFi interface). These hosts are said to be multihomed. A multihomed host with two interfaces has thus two IPv4 addresses.

An important point to be defined in a network layer protocol is the allocation of the network layer addresses. A naive allocation scheme would be to provide an IPv4 address to each host when the host is attached to the Internet on a first come first served basis. With this solution, a host in Belgium could have address 2.3.4.5 while another host located in Africa would use address 2.3.4.6. Unfortunately, this would force all routers to maintain a specific route towards each host. The figure below shows a simple enterprise network with two routers and three hosts and the associated routing tables if such isolated addresses were used.
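The dotted-decimal examples listed above can be checked with a few lines of Python. The helper name `to_bits` is an assumption made for this sketch. It converts each of the four integers to its 8 bits binary representation, most significant byte first.

```python
def to_bits(address):
    """Return the 32 bits string that corresponds to a dotted-decimal address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {address}")
    # Each integer becomes 8 bits, padded with leading zeros.
    return "".join(f"{octet:08b}" for octet in octets)


for address in ("1.2.3.4", "127.0.0.1", "255.255.255.255"):
    print(address, "corresponds to", to_bits(address))
```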
Figure 5.19: Scalability issues when using isolated IP addresses

To preserve the scalability of the routing system, it is important to minimise the number of routes that are stored on each router. A router cannot store and maintain one route for each of the almost 1 billion hosts that are connected to today's Internet. Routers should only maintain routes towards blocks of addresses and not towards individual hosts. For this, hosts are grouped in subnets based on their location in the network. A typical subnet groups all the hosts that are part of the same enterprise. An enterprise network is usually composed of several LANs interconnected by routers. A small block of addresses from the enterprise's block is usually assigned to each LAN. An IPv4 address is composed of two parts: a subnetwork identifier and a host identifier. The subnetwork identifier is composed of the high order bits of the address and the host identifier is encoded in the low order bits of the address. This is illustrated in the figure below.

Figure 5.20: The subnetwork and host identifiers inside an IPv4 address

When a router needs to forward a packet, it must know the subnet of the destination address to be able to consult its forwarding table to forward the packet. RFC 791 proposed to use the high-order bits of the address to encode the length of the subnet identifier. This led to the definition of three classes of unicast addresses².

² In addition to the A, B and C classes, RFC 791 also defined the D and E classes of IPv4 addresses. Class D (resp. E) addresses are those whose high order bits are set to 1110 (resp. 1111). Class D addresses are used by IP multicast and will be explained later. Class E addresses are currently unused, but there are some discussions on possible future usages [WMH2008] [FLM2008].


Class     High-order bits   Length of subnet id   Number of networks   Addresses per network
Class A   0                 8 bits                128                  16,777,216 (2^24)
Class B   10                16 bits               16,384               65,536 (2^16)
Class C   110               24 bits               2,097,152            256 (2^8)

However, these three classes of addresses were not flexible enough. A class A subnet was too large for most organisations and a class C subnet was too small. Flexibility was added by the introduction of variable-length subnets in RFC 1519. With variable-length subnets, the subnet identifier can be any size, from 1 to 31 bits. Variable-length subnets allow the network operators to use a subnet that better matches the number of hosts that are placed inside the subnet. A subnet identifier or IPv4 prefix is usually³ represented as A.B.C.D/p where A.B.C.D is the network address obtained by concatenating the subnet identifier with a host identifier containing only 0 and p is the length of the subnet identifier in bits. The table below provides examples of IP subnets.

Subnet           Number of addresses   Smallest address   Highest address
10.0.0.0/8       16,777,216            10.0.0.0           10.255.255.255
192.168.0.0/16   65,536                192.168.0.0        192.168.255.255
198.18.0.0/15    131,072               198.18.0.0         198.19.255.255
192.0.2.0/24     256                   192.0.2.0          192.0.2.255
10.0.0.0/30      4                     10.0.0.0           10.0.0.3
10.0.0.0/31      2                     10.0.0.0           10.0.0.1

The figure below provides a simple example of the utilisation of IPv4 subnets in an enterprise network. The length of the subnet identifier assigned to a LAN usually depends on the expected number of hosts attached to the LAN. For point-to-point links, many deployments have used /30 prefixes, but recent routers are now using /31 subnets on point-to-point links (RFC 3021) or do not even use IPv4 addresses on such links⁴.

Figure 5.21: IP subnets in a simple enterprise network

A second issue concerning the addresses of the network layer is the allocation scheme that is used to allocate blocks of addresses to organisations. The first allocation scheme was based on the different classes of addresses.
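The entries of the subnet table above can be recomputed with the `ipaddress` module of the Python standard library. The helper name `describe` is an assumption made for this sketch. It returns the number of addresses, the smallest address, the highest address and the netmask of a prefix written as A.B.C.D/p.

```python
import ipaddress


def describe(prefix):
    """Return (number of addresses, smallest, highest, netmask) of a prefix."""
    network = ipaddress.ip_network(prefix)
    return (
        network.num_addresses,
        str(network[0]),       # host identifier containing only 0
        str(network[-1]),      # host identifier containing only 1
        str(network.netmask),  # dotted-decimal netmask of length p
    )


for prefix in ("10.0.0.0/8", "198.18.0.0/15", "10.0.0.0/30", "10.0.0.0/31"):
    print(prefix, describe(prefix))
```

Note that `ip_network` rejects a prefix whose host identifier is not all zeros (for example `10.0.0.1/8`) unless `strict=False` is passed, which matches the definition of the network address given above.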
ThepoolofIPv4addresseswasmanagedbyasecretariatwhoallocatedaddressblocksonarst-comerstserved basis.LargeorganisationssuchasIBM,BBN,aswellasStanfordortheMITwereabletoobtainaclass A address block.Mostorganisationsrequestedaclass B addressblockcontaining65536addresses,whichwassuitablefor mostenterprisesanduniversities.ThetablebelowprovidesexamplesofsomeIPv4addressblocksintheclass B space. 3 AnotherwayofrepresentingIPsubnetsistousenetmasks.Anetmaskisa32bitseldwhose p highorderbitsaresetto 1 andthelow orderbitsaresetto 0.Thenumberofhighorderbitsset 1 indicatesthelengthofthesubnetidentier.Netmasksareusuallyrepresentedin thesamedotteddecimalformatasIPv4addresses.Forexample 10.0.0.0/8 wouldberepresentedas 10.0.0.0255.0.0.0 while 192.168.1.0/24 wouldberepresentedas 192.168.1.0255.255.255.0.Insomecases,thenetmaskcanberepresentedinhexadecimal. 4 Apoint-to-pointlinktowhichnoIPv4addresshasbeenallocatediscalledanunnumberedlink.See RFC1812 section2.2.7fora discussionofsuchunnumberedlinks. 5.2.InternetProtocol 143

Subnet           Organisation
130.100.0.0/16   Ericsson, Sweden
130.101.0.0/16   University of Akron, USA
130.102.0.0/16   The University of Queensland, Australia
130.103.0.0/16   Lotus Development, USA
130.104.0.0/16   Université catholique de Louvain, Belgium
130.105.0.0/16   Open Software Foundation, USA

However, the Internet was a victim of its own success and in the late 1980s, many organisations were requesting blocks of IPv4 addresses and started connecting to the Internet. Most of these organisations requested class B address blocks, as class A address blocks were too large and in limited supply while class C address blocks were considered to be too small. Unfortunately, there were only 16,384 different class B address blocks and this address space was being consumed quickly. As a consequence, the routing tables maintained by the routers were growing quickly and some routers had difficulties maintaining all these routes in their limited memory [5].

Figure 5.22: Evolution of the size of the routing tables on the Internet (Jul 1988 - Dec 1992 - source: RFC 1518)

Faced with these two problems, the Internet Engineering Task Force decided to develop the Classless Interdomain Routing (CIDR) architecture RFC 1518. This architecture aims at allowing IP routing to scale better than the class-based architecture. CIDR contains three important modifications compared to RFC 791.

1. IP address classes are deprecated. All IP equipment must use and support variable-length subnets.
2. IP address blocks are no longer allocated on a first-come-first-served basis. Instead, CIDR introduces a hierarchical address allocation scheme.
3. IP routers must use longest-prefix match when they look up a destination address in their forwarding table.

The last two modifications were introduced to improve the scalability of the IP routing system. The main drawback of the first-come-first-served address block allocation scheme was that neighbouring address blocks were allocated to very different organisations and conversely, very different address blocks were allocated to similar organisations. With CIDR, address blocks are allocated by Regional Internet Registries (RIR) in an aggregatable manner. An RIR is responsible for a large block of addresses and a region. For example, RIPE is the RIR that is responsible for Europe. An RIR allocates smaller address blocks from its large block to Internet Service Providers RFC 2050. Internet Service Providers then allocate smaller address blocks to their customers. When an organisation requests an address block, it must prove that it already has, or expects to have in the near future, a number of hosts or customers that is equivalent to the size of the requested address block.

The main advantage of this hierarchical address block allocation scheme is that it allows the routers to maintain fewer routes. For example, consider the address blocks that were allocated to some of the Belgian universities as shown in the table below.

[5] Example routers from this period include the Cisco AGS http://www.knossos.net.nz/don/wn1.html and AGS+ http://www.ciscopress.com/articles/article.asp?p=25296

Address block    Organisation
130.104.0.0/16   Université catholique de Louvain
134.58.0.0/16    Katholieke Universiteit Leuven
138.48.0.0/16    Facultés universitaires Notre-Dame de la Paix
139.165.0.0/16   Université de Liège
164.15.0.0/16    Université Libre de Bruxelles

These universities are all connected to the Internet exclusively via Belnet. As each university has been allocated a different address block, the routers of Belnet must announce one route for each university and all routers on the Internet must maintain a route towards each university. In contrast, consider all the high schools and the government institutions that are connected to the Internet via Belnet. An address block was assigned to these institutions after the introduction of CIDR in the 193.190.0.0/15 address block owned by Belnet. With CIDR, Belnet can announce a single route towards 193.190.0.0/15 that covers all of these institutions.

However, there is one difficulty with the aggregatable variable length subnets used by CIDR. Consider for example FEDICT, a government institution that uses the 193.191.244.0/23 address block. Assume that in addition to being connected to the Internet via Belnet, FEDICT also wants to be connected to another Internet Service Provider. The FEDICT network is then said to be multihomed. This is shown in the figure below.

Figure 5.23: Multihoming and CIDR

With such a multihomed network, routers R1 and R2 would have two routes towards IPv4 address 193.191.245.88: one route via Belnet (193.190.0.0/15) and one direct route (193.191.244.0/23). Both routes match IPv4 address 193.191.245.88. Since RFC 1519, when a router knows several routes towards the same destination address, it must forward packets along the route having the longest prefix length. In the case of 193.191.245.88, the route 193.191.244.0/23 is used to forward the packet. This forwarding rule is called the longest prefix match or the more specific match. All IPv4 routers implement this forwarding rule.
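The longest prefix match rule described above can be illustrated with a short Python sketch. It uses the standard `ipaddress` module and a plain linear scan, which is only one possible (and not the fastest) way to implement the lookup. The two prefixes come from the multihoming example; the next-hop labels are assumptions made for this illustration.

```python
import ipaddress

# Routes known by a router such as R1 in the multihoming example.
# The next-hop labels are hypothetical and only serve the illustration.
ROUTES = [
    (ipaddress.ip_network("193.190.0.0/15"), "via Belnet"),
    (ipaddress.ip_network("193.191.244.0/23"), "direct route"),
]


def longest_prefix_match(destination, routes):
    """Return the (prefix, next_hop) entry with the longest matching
    prefix, or None when no route matches the destination."""
    address = ipaddress.ip_address(destination)
    matching = [(net, hop) for net, hop in routes if address in net]
    if not matching:
        return None
    return max(matching, key=lambda entry: entry[0].prefixlen)


if __name__ == "__main__":
    for dest in ("193.191.245.88", "193.190.12.1", "130.104.1.1"):
        print(dest, "->", longest_prefix_match(dest, ROUTES))
```

With this table, 193.191.245.88 matches both prefixes and the /23 entry is selected, while an address that belongs only to the /15 block follows the route via Belnet.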
To understand the longest prefix match forwarding, consider the figure below. With this rule, the route 0.0.0.0/0 plays a particular role. As this route has a prefix length of 0 bits, it matches all destination addresses. This route is often called the default route.

- a packet with destination 192.168.1.1 received by router R is destined to the router itself. It is delivered to the appropriate transport protocol.
- a packet with destination 11.2.3.4 matches two routes: 11.0.0.0/8 and 0.0.0.0/0. The packet is forwarded on the West interface.
- a packet with destination 130.4.3.4 matches one route: 0.0.0.0/0. The packet is forwarded on the North interface.
- a packet with destination 4.4.5.6 matches two routes: 4.0.0.0/8 and 0.0.0.0/0. The packet is forwarded on the West interface.
- a packet with destination 4.10.11.254 matches three routes: 4.0.0.0/8, 4.10.11.0/24 and 0.0.0.0/0. The packet is forwarded on the South interface.

Figure 5.24: Longest prefix match example

The longest prefix match can be implemented by using different data structures. One possibility is to use a trie. The figure below shows a trie that encodes six routes having different outgoing interfaces.

Figure 5.25: A trie representing a routing table

Note: Special IPv4 addresses

Most unicast IPv4 addresses can appear as source and destination addresses in packets on the global Internet. However, it is worth noting that some blocks of IPv4 addresses have a special usage, as described in RFC 5735. These include:

- 0.0.0.0/8, which is reserved for self-identification. A common address in this block is 0.0.0.0, which is sometimes used when a host boots and does not yet know its IPv4 address.
- 127.0.0.0/8, which is reserved for loopback addresses. Each host implementing IPv4 must have a loopback interface (that is not attached to a datalink layer). By convention, IPv4 address 127.0.0.1 is assigned to this interface. This allows processes running on a host to use TCP/IP to contact other processes running on the same host. This can be very useful for testing purposes.
- 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 are reserved for private networks that are not directly attached to the Internet. These addresses are often called private addresses or RFC 1918 addresses.
- 169.254.0.0/16 is used for link-local addresses RFC 3927. Some hosts use an address in this block when they are connected to a network that does not allocate addresses as expected.

IPv4 packets

Now that we have clarified the allocation of IPv4 addresses and the utilisation of the longest prefix match to forward IPv4 packets, we can have a more detailed look at IPv4 by starting with the format of the IPv4 packets. The IPv4 packet format was defined in RFC 791. Apart from a few clarifications and some backward compatible changes, the IPv4 packet format has not changed significantly since the publication of RFC 791. All IPv4 packets use the 20 bytes header shown below. Some IPv4 packets contain an optional header extension that is described later.

Figure 5.26: The IP version 4 header

The main fields of the IPv4 header are:

- a 4 bits version that indicates the version of IP used to build the header. Using a version field in the header allows the network layer protocol to evolve.
- a 4 bits IP Header Length (IHL) that indicates the length of the IP header in 32 bits words. This field allows IPv4 to use options if required, but as it is encoded as a 4 bits field, the IPv4 header cannot be longer than 60 bytes (15 words of 4 bytes).
- an 8 bits DS field that is used for Quality of Service and whose usage is described later.
- an 8 bits Protocol field that indicates the transport layer protocol that must process the packet's payload at the destination. Common values for this field [6] are 6 for TCP and 17 for UDP.
- a 16 bits length field that indicates the total length of the entire IPv4 packet (header and payload) in bytes. This implies that an IPv4 packet cannot be longer than 65535 bytes.
- a 32 bits source address field that contains the IPv4 address of the source host.
- a 32 bits destination address field that contains the IPv4 address of the destination host.
- a 16 bits checksum that protects only the IPv4 header against transmission errors.

The other fields of the IPv4 header are used for specific purposes. The first is the 8 bits Time To Live (TTL) field. This field is used by IPv4 to avoid the risk of having an IPv4 packet caught in an infinite loop due to a transient or permanent error in routing tables [7]. Consider for example the situation depicted in the figure below where destination D uses address 11.0.0.56. If S sends a packet towards this destination, the packet is forwarded to router B which forwards it to router C that forwards it back to router A, etc.

Figure 5.27: Forwarding loops in an IP network

Unfortunately, such loops can occur for two reasons in IP networks. First, if the network uses static routing, the loop can be caused by a simple configuration error. Second, if the network uses dynamic routing, such a loop can occur transiently, for example during the convergence of the routing protocol after a link or router failure. The TTL field of the IPv4 header ensures that even if there are forwarding loops in the network, packets will not loop forever. Hosts send their IPv4 packets with a positive TTL (usually 64 or more [8]). When a router receives an IPv4 packet, it first decrements the TTL by one. If the TTL becomes 0, the packet is discarded and a message is sent back to the packet's source (see section ICMP). Otherwise, the router performs a lookup in its forwarding table to forward the packet.

A second problem for IPv4 is the heterogeneity of the datalink layer. IPv4 is used above many very different datalink layers. Each datalink layer has its own characteristics and, as indicated earlier, each datalink layer is characterised by a maximum frame size. From IP's point of view, a datalink layer interface is characterised by its Maximum Transmission Unit (MTU). The MTU of an interface is the largest IPv4 packet (including header) that it can send. The table below provides some common MTU sizes [9].

Datalink layer   MTU
Ethernet         1500 bytes
WiFi             2272 bytes
ATM (AAL5)       9180 bytes
802.15.4         102 or 81 bytes
Token Ring       4464 bytes
FDDI             4352 bytes

Although IPv4 can send 64 KBytes long packets, few datalink layer technologies that are used today are able to send a 64 KBytes IPv4 packet inside a frame. Furthermore, as illustrated in the figure below, another problem is that a host may send a packet that would be too large for one of the datalink layers used by the intermediate routers.

Figure 5.28: The need for fragmentation and reassembly

[6] See http://www.iana.org/assignments/protocol-numbers/ for the list of all assigned Protocol numbers.

[7] The initial IP specification in RFC 791 suggested that routers would decrement the TTL at least once every second. This would ensure that a packet would never remain for more than TTL seconds in the network. However, in practice most router implementations simply chose to decrement the TTL by one.

[8] The initial TTL value used to send IP packets varies from one implementation to another. Most current IP implementations use an initial TTL of 64 or more. See http://members.cox.net/~ndav1/self_published/TTL_values.html for additional information.

[9] Supporting IP over the 802.15.4 datalink layer technology requires special mechanisms. See RFC 4944 for a discussion of the special problems posed by 802.15.4.
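As an illustration of the header fields and of the TTL processing described in this section, the following Python sketch decodes the fixed 20 bytes of an IPv4 header with the standard `struct` module. The function names and the sample packet are assumptions made for this example. The sample leaves the checksum field at 0 for brevity, and a real router would also have to update the header checksum after decrementing the TTL.

```python
import socket
import struct

# Fixed 20-byte IPv4 header (RFC 791) in network byte order: version/IHL,
# DS, total length, identification, flags + fragment offset, TTL,
# protocol, checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(raw):
    """Decode the fixed part of an IPv4 header into a dictionary."""
    (ver_ihl, ds, length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = IPV4_HEADER.unpack(raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,    # IHL counts 32-bit words
        "ds": ds,
        "length": length,                        # header + payload, in bytes
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,  # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,                       # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


def ttl_after_router(ttl):
    """Return the TTL after one router, or None if the router must
    discard the packet because the TTL reached 0."""
    ttl -= 1
    return ttl if ttl > 0 else None


# Hypothetical sample: an option-less UDP packet with the DF flag set and
# the checksum left at 0 (not computed in this sketch).
SAMPLE = IPV4_HEADER.pack(0x45, 0, 28, 1, 0x4000, 64, 17, 0,
                          socket.inet_aton("10.0.1.22"),
                          socket.inet_aton("11.0.0.56"))

if __name__ == "__main__":
    print(parse_ipv4_header(SAMPLE))
```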

To cope with packets that are larger than the MTU of a link, IPv4 includes a packet fragmentation and reassembly mechanism. Both hosts and intermediate routers may fragment an IPv4 packet if the packet is too long to be sent via the datalink layer. In IPv4, fragmentation is completely performed in the IP layer and a large IPv4 packet is fragmented into two or more IPv4 packets (called fragments). The IPv4 fragments of a large packet are normal IPv4 packets that are forwarded towards the destination of the large packet by intermediate routers.

The IPv4 fragmentation mechanism relies on four fields of the IPv4 header: Length, Identification, the flags and the Fragment Offset. The IPv4 header contains two flags: More fragments and Don't Fragment (DF). When the DF flag is set, this indicates that the packet cannot be fragmented.

The basic operation of the IPv4 fragmentation is as follows. A large packet is fragmented into two or more fragments. All fragments, except the last one, are as large as the Maximum Transmission Unit of the link used to forward the packet allows, given that their payload must be a multiple of 8 bytes. Each IPv4 packet contains a 16 bits Identification field. When a packet is fragmented, the Identification of the large packet is copied in all fragments to allow the destination to reassemble the received fragments together. In each fragment, the Fragment Offset indicates, in units of 8 bytes, the position of the payload of the fragment in the payload of the original packet. The Length field in each fragment indicates the total length of the fragment (header and payload), as in a normal IPv4 packet. Finally, the More fragments flag is set in all the fragments of a large packet except the last one.

The following pseudo-code details the IPv4 fragmentation, assuming that the packet does not contain options.
    # mtu: maximum size of the packet (including header) of outgoing link
    if p.len <= mtu:
        send(p)                      # the packet fits, no fragmentation
        return
    # packet is too large
    if p.flags == 'DF':
        discard(p)                   # fragmentation is not allowed
        return
    # packet must be fragmented
    maxpayload = 8 * ((mtu - 20) // 8)   # must be n times 8 bytes
    payload = p[IP].payload
    pos = 0
    while len(payload) > 0:
        if len(payload) > maxpayload:
            # not the last fragment: More fragments is set
            toSend = IP(dest=p.dest, src=p.src,
                        ttl=p.ttl, id=p.id,
                        frag=p.frag + (pos // 8),
                        flags='MF',
                        len=20 + maxpayload, proto=p.proto) / payload[0:maxpayload]
            pos = pos + maxpayload
            payload = payload[maxpayload:]
        else:
            # last fragment: keeps the flags of the original packet
            toSend = IP(dest=p.dest, src=p.src,
                        ttl=p.ttl, id=p.id,
                        frag=p.frag + (pos // 8),
                        flags=p.flags,
                        len=20 + len(payload), proto=p.proto) / payload
            payload = ''
        forward(toSend)

The fragments of an IPv4 packet may arrive at the destination in any order, as each fragment is forwarded independently in the network and may follow different paths. Furthermore, some fragments may be lost and never reach the destination.

The reassembly algorithm used by the destination host is roughly as follows. First, the destination can verify whether a received IPv4 packet is a fragment or not by checking the value of the More fragments flag and the Fragment Offset. If the Fragment Offset is set to 0 and the More fragments flag is reset, the received packet has not been fragmented. Otherwise, the packet has been fragmented and must be reassembled. The reassembly algorithm relies on the Identification field of the received fragments to associate a fragment with the corresponding packet being reassembled. Furthermore, the Fragment Offset field indicates the position of the fragment payload in the original unfragmented packet. Finally, the packet with the More fragments flag reset allows the destination to determine the total length of the original unfragmented packet.

Note that the reassembly algorithm must deal with the unreliability of the IP network. This implies that a fragment may be duplicated or a fragment may never reach the destination. The destination can easily detect fragment duplication thanks to the Fragment Offset. To deal with fragment losses, the reassembly algorithm must bound the time during which the fragments of a packet are stored in its buffer while the packet is being reassembled. This can be implemented by starting a timer when the first fragment of a packet is received. If the packet has not been reassembled upon expiration of the timer, all fragments are discarded and the packet is considered to be lost.

The original IP specification, in RFC 791, defined several types of options that can be added to the IP header. Each option is encoded using a type length value format. They are not widely used today and are thus only briefly described. Additional details may be found in RFC 791.

The most interesting options in IPv4 are the three options that are related to routing. The Record route option was defined to allow network managers to determine the path followed by a packet. When the Record route option was present, routers on the packet's path had to insert their IP address in the option. This option was implemented, but as the optional part of the IPv4 header can only contain 40 bytes, it is impossible to discover an entire path on the global Internet. traceroute(8), despite its limitations, is a better solution to record the path towards a destination.

The other routing options are the Strict source route and the Loose source route option. The main idea behind these options is that a host may want, for any reason, to specify the path to be followed by the packets that it sends. The Strict source route option allows a host to indicate inside each packet the exact path to be followed. The Strict source route option contains a list of IPv4 addresses and a pointer to indicate the next address in the list. When a router receives a packet containing this option, it does not look up the destination address in its routing table but forwards the packet directly to the next router in the list and advances the pointer. This is illustrated in the figure below where S forces its packets to follow the RA-RB-RD path.
Figure 5.29: Usage of the Strict source route option

The maximum length of the optional part of the IPv4 header is a severe limitation for the Strict source route option as for the Record Route option. The Loose source route option does not suffer from this limitation. This option allows the sending host to indicate inside its packet some of the routers that must be traversed to reach the destination. This is shown in the figure below. S sends a packet containing a list of addresses and a pointer to the next router in the list. Initially, this pointer points to RB. When RA receives the packet sent by S, it looks up in its forwarding table the address pointed in the Loose source route option and not the destination address. The packet is then forwarded to router RB that recognises its address in the option and advances the pointer. As there is no address listed in the Loose source route option anymore, RB and other downstream routers forward the packet by performing a lookup for the destination address.

Figure 5.30: Usage of the Loose source route option

These two options are usually ignored by routers because they cause security problems RFC 6274.
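Returning to the reassembly algorithm described earlier in this section, the following Python sketch shows one way a destination host could collect fragments. It is a simplified illustration, not the algorithm of any particular implementation: the class name, the 30 seconds timer and the injectable `clock` parameter are assumptions, fragments are assumed not to overlap, and each packet being reassembled is identified by a key such as (source, destination, protocol, Identification).

```python
import time


class Reassembler:
    """Collects the fragments of IPv4 packets until they are complete."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout   # assumed reassembly timer, in seconds
        self.clock = clock
        self.pending = {}        # key -> state of a packet being rebuilt

    def receive(self, key, offset, more_fragments, payload):
        """Handle one fragment. `offset` is the Fragment Offset field
        (units of 8 bytes). Returns the whole payload once it is
        complete, otherwise None."""
        if offset == 0 and not more_fragments:
            return payload                     # packet was not fragmented
        now = self.clock()
        # Discard the packets whose reassembly timer has expired.
        self.pending = {k: s for k, s in self.pending.items()
                        if now - s["started"] < self.timeout}
        state = self.pending.setdefault(
            key, {"started": now, "parts": {}, "total": None})
        state["parts"][offset * 8] = payload   # a duplicate just overwrites
        if not more_fragments:                 # last fragment: length known
            state["total"] = offset * 8 + len(payload)
        if state["total"] is None:
            return None
        data = bytearray()
        while len(data) < state["total"]:
            part = state["parts"].get(len(data))
            if not part:
                return None                    # a fragment is still missing
            data += part
        del self.pending[key]
        return bytes(data)
```

Passing a fake `clock` makes the timer behaviour easy to exercise without waiting, which is why the clock is a parameter in this sketch.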

5.2.2 ICMP version 4

It is sometimes necessary for intermediate routers or the destination host to inform the sender of the packet of a problem that occurred while processing a packet. In the TCP/IP protocol suite, this reporting is done by the Internet Control Message Protocol (ICMP). ICMP is defined in RFC 792. ICMP messages are carried as the payload of IP packets (the protocol value reserved for ICMP is 1). An ICMP message is composed of an 8 byte header and a variable length payload that usually contains the first bytes of the packet that triggered the transmission of the ICMP message.

Figure 5.31: ICMP version 4 (RFC 792)

In the ICMP header, the Type and Code fields indicate the type of problem that was detected by the sender of the ICMP message. The Checksum protects the entire ICMP message against transmission errors and the Data field contains additional information for some ICMP messages.

The main types of ICMP messages are:

- Destination unreachable: a Destination unreachable ICMP message is sent when a packet cannot be delivered to its destination due to routing problems. Different types of unreachability are distinguished:
  - Network unreachable: this ICMP message is sent by a router that does not have a route for the subnet containing the destination address of the packet
  - Host unreachable: this ICMP message is sent by a router that is attached to the subnet that contains the destination address of the packet, but this destination address cannot be reached at this time
  - Protocol unreachable: this ICMP message is sent by a destination host that has received a packet, but does not support the transport protocol indicated in the packet's Protocol field
  - Port unreachable: this ICMP message is sent by a destination host that has received a packet destined to a port number, but no server process is bound to this port
  - Fragmentation needed: this ICMP message is sent by a router that receives a packet with the Don't Fragment flag set that is larger than the MTU of the outgoing interface
- Redirect: this ICMP message can be sent when there are two routers on the same LAN. Consider a LAN with one host and two routers: R1 and R2. Assume that R1 is also connected to subnet 130.104.0.0/16 while R2 is connected to subnet 138.48.0.0/16. If a host on the LAN sends a packet towards 130.104.1.1 to R2, R2 needs to forward the packet again on the LAN to reach R1. This is not optimal as the packet is sent twice on the same LAN. In this case, R2 could send an ICMP Redirect message to the host to inform it that it should have sent the packet directly to R1. This allows the host to send the other packets to 130.104.1.1 directly via R1.

Figure 5.32: ICMP redirect

- Parameter problem: this ICMP message is sent when a router or a host receives an IP packet containing an error (e.g. an invalid option)
- Source quench: a router was supposed to send this message when it had to discard packets due to congestion. However, sending ICMP messages in case of congestion was not the best way to reduce congestion and since the inclusion of a congestion control scheme in TCP, this ICMP message has been deprecated.
- Time Exceeded: there are two types of Time Exceeded ICMP messages
  - TTL exceeded: a TTL exceeded message is sent by a router when it discards an IPv4 packet because its TTL reached 0.
  - Reassembly time exceeded: this ICMP message is sent when a destination has been unable to reassemble all the fragments of a packet before the expiration of its reassembly timer.
- Echo request and Echo reply: these ICMP messages are used by the ping(8) network debugging software.

Note: Redirection attacks

ICMP redirect messages are useful when several routers are attached to the same LAN as hosts. However, they should be used with care as they also create an important security risk. One of the most annoying attacks in an IP network is called the man in the middle attack. Such an attack occurs if an attacker is able to receive, process, possibly modify and forward all the packets exchanged between a source and a destination. As the attacker receives all the packets it can easily collect passwords or credit card numbers or even inject fake information in an established TCP connection. ICMP redirects unfortunately enable an attacker to easily perform such an attack. In the figure above, consider host H that is attached to the same LAN as A and R1. If H sends to A an ICMP redirect for prefix 138.48.0.0/16, A forwards to H all the packets that it wants to send to this prefix. H can then forward them to R2. To avoid these attacks, hosts should ignore the ICMP redirect messages that they receive.

ping(8) is often used by network operators to verify that a given IP address is reachable. Each host is supposed [10] to reply with an ICMP Echo reply message when it receives an ICMP Echo request message. A sample usage of ping(8) is shown below.

    ping 130.104.1.1
    PING 130.104.1.1 (130.104.1.1): 56 data bytes
    64 bytes from 130.104.1.1: icmp_seq=0 ttl=243 time=19.961 ms
    64 bytes from 130.104.1.1: icmp_seq=1 ttl=243 time=22.072 ms
    64 bytes from 130.104.1.1: icmp_seq=2 ttl=243 time=23.064 ms
    64 bytes from 130.104.1.1: icmp_seq=3 ttl=243 time=20.026 ms
    64 bytes from 130.104.1.1: icmp_seq=4 ttl=243 time=25.099 ms
    --- 130.104.1.1 ping statistics ---
    5 packets transmitted, 5 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 19.961/22.044/25.099/1.938 ms

[10] Until a few years ago, all hosts replied to Echo request ICMP messages. However, due to the security problems that have affected TCP/IP implementations, many of these implementations can now be configured to disable answering Echo request ICMP messages.
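To make the ICMP message format more concrete, the following Python sketch builds the Echo request and Echo reply messages exchanged by a tool such as ping(8), including the checksum that covers the entire ICMP message. The type values 8 (Echo request) and 0 (Echo reply) come from RFC 792; the function names are assumptions made for this illustration, and the sketch only builds the messages without sending them on the network.

```python
import struct

ECHO_REPLY, ECHO_REQUEST = 0, 8   # ICMP Type values defined in RFC 792


def internet_checksum(data):
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"           # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:            # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(icmp_type, identifier, sequence, data=b""):
    """Build an Echo message: Type, Code, Checksum, Identifier and
    Sequence number (8 bytes) followed by the data."""
    header = struct.pack("!BBHHH", icmp_type, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + data)
    return struct.pack("!BBHHH", icmp_type, 0, checksum,
                       identifier, sequence) + data


def reply_to(request):
    """Return the Echo reply that answers `request`, or None if the
    message is not an Echo request with a valid checksum."""
    icmp_type, _, _, identifier, sequence = struct.unpack(
        "!BBHHH", request[:8])
    if icmp_type != ECHO_REQUEST or internet_checksum(request) != 0:
        return None
    return build_echo(ECHO_REPLY, identifier, sequence, request[8:])
```

Computing the checksum over a message that already carries a valid checksum yields 0, which is how `reply_to` verifies the request.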

Another very useful debugging tool is traceroute(8). The traceroute man page describes this tool as "print the route packets take to network host". traceroute uses the TTL exceeded ICMP messages to discover the intermediate routers on the path towards a destination. The principle behind traceroute is very simple. When a router receives an IP packet whose TTL is set to 1, it decrements the TTL and is forced to return to the sending host a TTL exceeded ICMP message containing the header and the first bytes of the discarded IP packet. To discover all routers on a network path, a simple solution is to first send a packet whose TTL is set to 1, then a packet whose TTL is set to 2, etc. A sample traceroute output is shown below.

    traceroute www.ietf.org
    traceroute to www.ietf.org (64.170.98.32), 64 hops max, 40 byte packets
     1  CsHalles3.sri.ucl.ac.be (192.168.251.230)  5.376 ms  1.217 ms  1.137 ms
     2  CtHalles.sri.ucl.ac.be (192.168.251.229)  1.444 ms  1.669 ms  1.301 ms
     3  CtPythagore.sri.ucl.ac.be (130.104.254.230)  1.950 ms  4.688 ms  1.319 ms
     4  fe.m20.access.lln.belnet.net (193.191.11.9)  1.578 ms  1.272 ms  1.259 ms
     5  10ge.cr2.brueve.belnet.net (193.191.16.22)  5.461 ms  4.241 ms  4.162 ms
     6  212.3.237.13 (212.3.237.13)  5.347 ms  4.544 ms  4.285 ms
     7  ae-11-11.car1.Brussels1.Level3.net (4.69.136.249)  5.195 ms  4.304 ms  4.329 ms
     8  ae-6-6.ebr1.London1.Level3.net (4.69.136.246)  8.892 ms  8.980 ms  8.830 ms
     9  ae-100-100.ebr2.London1.Level3.net (4.69.141.166)  8.925 ms  8.950 ms  9.006 ms
    10  ae-41-41.ebr1.NewYork1.Level3.net (4.69.137.66)  79.590 ms
        ae-43-43.ebr1.NewYork1.Level3.net (4.69.137.74)  78.140 ms
        ae-42-42.ebr1.NewYork1.Level3.net (4.69.137.70)  77.663 ms
    11  ae-2-2.ebr1.Newark1.Level3.net (4.69.132.98)  78.290 ms  83.765 ms  90.006 ms
    12  ae-14-51.car4.Newark1.Level3.net (4.68.99.8)  78.309 ms  78.257 ms  79.709 ms
    13  ex1-tg2-0.eqnwnj.sbcglobal.net (151.164.89.249)  78.460 ms  78.452 ms  78.292 ms
    14  151.164.95.190 (151.164.95.190)  157.198 ms  160.767 ms  159.898 ms
    15  ded-p10-0.pltn13.sbcglobal.net (151.164.191.243)  161.872 ms  156.996 ms  159.425 ms
    16  AMS-1152322.cust-rtr.swbell.net (75.61.192.10)  158.735 ms  158.485 ms  158.588 ms
    17  mail.ietf.org (64.170.98.32)  158.427 ms  158.502 ms  158.567 ms
The above traceroute(8) output shows a 17 hops path between a host at UCLouvain and one of the main IETF servers. For each hop, traceroute provides the IPv4 address of the router that sent the ICMP message and the measured round-trip-time between the source and this router. traceroute sends three probes with each TTL value. In some cases, such as at the 10th hop above, the ICMP messages may be received from different addresses. This is usually because different packets from the same source have followed different paths [11] in the network.

Another important utilisation of ICMP messages is to discover the maximum MTU that can be used to reach a destination without fragmentation. As explained earlier, when an IPv4 router receives a packet that is larger than the MTU of the outgoing link, it must fragment the packet. Unfortunately, fragmentation is a complex operation and routers cannot perform it at line rate [KM1995]. Furthermore, when a TCP segment is transported in an IP packet that is fragmented in the network, the loss of a single fragment forces TCP to retransmit the entire segment (and thus all the fragments). If TCP was able to send only packets that do not require fragmentation in the network, it could retransmit only the information that was lost in the network. In addition, IP reassembly causes several challenges at high speed as discussed in RFC 4963. Using IP fragmentation to allow UDP applications to exchange large messages raises several security issues [KPS2003].
ICMP, combined with the Don't fragment (DF) IPv4 flag, is used by TCP implementations to discover the largest MTU size that is allowed to reach a destination host without causing network fragmentation. This is the Path MTU discovery mechanism defined in RFC 1191. A TCP implementation that includes Path MTU discovery (most do) requests the IPv4 layer to send all segments inside IPv4 packets having the DF flag set. This prohibits intermediate routers from fragmenting these packets. If a router needs to forward an unfragmentable packet over a link with a smaller MTU, it returns a Fragmentation needed ICMP message to the source. This ICMP message contains the MTU of the router's outgoing link in its Data field. Upon reception of this ICMP message, the source TCP implementation adjusts its Maximum Segment Size (MSS) so that the packets containing the segments that it sends can be forwarded by this router without requiring fragmentation.

[11] A detailed analysis of traceroute output is outside the scope of this document. Additional information may be found in [ACO+2006] and [DT2007].
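The Path MTU discovery procedure described above can be sketched as a small simulation. The code below does not send any packet: the list of link MTUs stands in for the network, and `send_with_df` plays the role of the first router that would return a Fragmentation needed ICMP message. The function names and the example path are assumptions made for this illustration, and the MSS computation assumes 20 byte IPv4 and TCP headers without options.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def send_with_df(packet_size, link_mtus):
    """Simulate sending a packet with the DF flag set along a path.
    Returns None if the packet crosses every link, otherwise the MTU
    reported by the first router that cannot forward it."""
    for mtu in link_mtus:
        if packet_size > mtu:
            return mtu        # content of the Fragmentation needed message
    return None


def discover_path_mtu(first_hop_mtu, link_mtus):
    """Lower the packet size until a packet reaches the destination."""
    path_mtu = first_hop_mtu
    while True:
        reported = send_with_df(path_mtu, link_mtus)
        if reported is None:
            return path_mtu
        path_mtu = reported   # strictly smaller, so the loop terminates


def mss_for(path_mtu):
    """Largest TCP payload that fits in one unfragmented packet."""
    return path_mtu - IP_HEADER - TCP_HEADER


if __name__ == "__main__":
    path = [1500, 1400, 576, 1500]   # hypothetical MTUs of the links
    mtu = discover_path_mtu(1500, path)
    print("path MTU:", mtu, "MSS:", mss_for(mtu))
```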

Interactions between IPv4 and the datalink layer

As mentioned in the first section of this chapter, there are three main types of datalink layers: point-to-point links, LANs supporting broadcast and multicast, and NBMA networks. There are two important issues to be addressed when using IPv4 in these types of networks. The first issue is how an IPv4 device obtains its IPv4 address. The second issue is how IPv4 packets are exchanged over the datalink layer service.

On a point-to-point link, the IPv4 addresses of the communicating devices can be configured manually or by using a simple protocol. IPv4 addresses are often configured manually on point-to-point links between routers. When point-to-point links are used to attach hosts to the network, automatic configuration is often preferred in order to avoid problems with incorrect IPv4 addresses. For example, PPP, specified in RFC 1661, includes an IP network control protocol that can be used by the router in the figure below to send the IPv4 address that the attached host must configure for its interface. The transmission of IPv4 packets on a point-to-point link will be discussed in the chapter on the datalink layer and the Local Area Networks.

Figure 5.33: IPv4 on point-to-point links

Using IPv4 in a LAN introduces an additional problem. On a LAN, each device is identified by its unique datalink layer address. The datalink layer service can be used by any host attached to the LAN to send a frame to any other host attached to the same LAN. For this, the sending host must know the datalink layer address of the destination host. For example, the figure below shows four hosts attached to the same LAN configured with IPv4 addresses in the 10.0.1.0/24 subnet and datalink layer addresses represented as a single character [12]. In this network, if host 10.0.1.22/24 wants to send an IPv4 packet to the host having address 10.0.1.8, it must know that the datalink layer address of this host is C.
Figure 5.34: A simple LAN

In a simple network such as the one shown above, it could be possible to manually configure the mapping between the IPv4 addresses of the hosts and the corresponding datalink layer addresses. However, in a larger LAN this is impossible. To ease the utilisation of LANs, IPv4 hosts must be able to automatically obtain the datalink layer address corresponding to any IPv4 address on the same LAN. This is the objective of the Address Resolution Protocol (ARP) defined in RFC 826. ARP is a datalink layer protocol that is used by IPv4. It relies on the ability of the datalink layer service to easily deliver a broadcast frame to all devices attached to the same LAN.

[12] In practice, most local area networks use addresses encoded as a 48 bits field [802]. Some recent local area network technologies use 64 bits addresses.

The easiest way to understand the operation of ARP is to consider the simple network shown above and assume that host 10.0.1.22/24 needs to send an IPv4 packet to host 10.0.1.8. As this IP address belongs to the same subnet, the packet must be sent directly to its destination via the datalink layer service. To use this service, the sending host must find the datalink layer address that is attached to host 10.0.1.8. Each IPv4 host maintains an ARP cache containing the list of all mappings between IPv4 addresses and datalink layer addresses that it knows. When an IPv4 host boots, its ARP cache is empty. 10.0.1.22 thus first consults its ARP cache. As the cache does not contain the requested mapping, host 10.0.1.22 sends a broadcast ARP query frame on the LAN. The frame contains the datalink layer address of the sending host (A) and the requested IPv4 address (10.0.1.8). This broadcast frame is received by all devices on the LAN and only the host that owns the requested IPv4 address replies by returning a unicast ARP reply frame with the requested mapping. Upon reception of this reply, the sending host updates its ARP cache and sends the IPv4 packet by using the datalink layer service. To deal with devices that move or whose addresses are reconfigured, most ARP implementations remove the cache entries that have not been used for a few minutes. Some implementations re-validate ARP cache entries from time to time by sending ARP queries [13].

Note: Security issues with the Address Resolution Protocol

ARP is an old and widely used protocol that was unfortunately designed when security issues were not a concern. ARP is almost insecure by design. Hosts using ARP can be subject to several types of attack. First, a malicious host could create a denial of service attack on a LAN by sending random replies to the received ARP queries. This would pollute the ARP cache of the other hosts on the same LAN. On a fixed network, such attacks can be detected by the system administrator who can physically remove the malicious hosts from the LAN. On a wireless network, removing a malicious host is much more difficult.
A second type of attack is the man-in-the-middle attack. This name is used for network attacks where the attacker is able to read and possibly modify all the messages sent by the attacked devices. Such an attack is possible in a LAN. Assume, in the figure above, that host 10.0.1.9 is malicious and would like to receive and modify all the packets sent by host 10.0.1.22 to host 10.0.1.8. This can be achieved easily if host 10.0.1.9 manages, by sending fake ARP replies, to convince host 10.0.1.22 (resp. 10.0.1.8) that its own datalink layer address must be used to reach 10.0.1.8 (resp. 10.0.1.22).

ARP is used by all devices that are connected to a LAN and implement IPv4. Both routers and endhosts implement ARP. When a host needs to send an IPv4 packet to a destination outside of its local subnet, it must first send the packet to one of the routers that reside on this subnet. Consider for example the network shown in the figure below. Each host is configured with an IPv4 address in the 10.0.1.0/24 subnet and uses 10.0.1.1 as its default router. To send a packet to address 1.2.3.4, host 10.0.1.8 will first need to know the datalink layer address of the default router. It will thus send an ARP request for 10.0.1.1. Upon reception of the ARP reply, host 10.0.1.8 updates its ARP table and sends its packet in a frame to its default router. The router will then forward the packet towards its final destination.

Figure 5.35: A simple LAN with a router

In the early days of the Internet, IP addresses were manually configured on both hosts and routers and almost never changed. However, this manual configuration can be complex [14] and often causes errors that are sometimes difficult to debug. Recent TCP/IP implementations are able to detect some of these misconfigurations. For example, if two hosts are attached to the same subnet with the same IPv4 address they will be unable to communicate. To detect this problem, hosts send an ARP request for their configured address each time their address is changed RFC 5227. If they receive an answer to this ARP request, they trigger an alarm or inform the system administrator.

[13] See chapter 28 of [Benvenuti2005] for a description of the implementation of ARP in the Linux kernel.

[14] For example, consider all the options that can be specified for the ifconfig utility on Unix hosts.
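The ARP resolution procedure described in this section can be summarised by a small Python sketch. It is a simulation under stated assumptions, not an implementation of RFC 826: a dictionary stands in for the LAN (the broadcast query and the unicast reply), entries expire a fixed time after they were learned rather than after they were last used, and the datalink layer address "B" given to host 10.0.1.9 is hypothetical (the text only gives A for 10.0.1.22 and C for 10.0.1.8).

```python
import time


class ArpCache:
    """Mappings between IPv4 addresses and datalink layer addresses."""

    def __init__(self, lifetime=300.0, clock=time.monotonic):
        self.lifetime = lifetime   # assumed: entries live a few minutes
        self.clock = clock
        self.entries = {}          # IPv4 address -> (datalink address, time)

    def lookup(self, ip):
        entry = self.entries.get(ip)
        if entry is None:
            return None
        address, learned_at = entry
        if self.clock() - learned_at > self.lifetime:
            del self.entries[ip]   # stale entry: forget it
            return None
        return address

    def learn(self, ip, address):
        self.entries[ip] = (address, self.clock())


def resolve(cache, target_ip, lan):
    """Return the datalink layer address of target_ip, or None.
    `lan` maps the IPv4 address of each attached host to its datalink
    layer address and stands in for the ARP query and reply frames."""
    address = cache.lookup(target_ip)
    if address is None and target_ip in lan:
        address = lan[target_ip]   # only the owner of the address replies
        cache.learn(target_ip, address)
    return address


# Hosts of the simple LAN; "B" is a hypothetical address.
LAN = {"10.0.1.22": "A", "10.0.1.9": "B", "10.0.1.8": "C"}

if __name__ == "__main__":
    cache = ArpCache()             # empty when the host boots
    print(resolve(cache, "10.0.1.8", LAN))
```

Note that nothing in `resolve` checks who sent the reply, which mirrors the weakness discussed in the note above: whoever answers first populates the cache.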

To ease the attachment of hosts to subnets, most networks now support the Dynamic Host Configuration Protocol (DHCP) RFC 2131. DHCP allows a host to automatically retrieve its assigned IPv4 address. A DHCP server is associated to each subnet [15]. Each DHCP server manages a pool of IPv4 addresses assigned to the subnet. When a host is first attached to the subnet, it sends a DHCP request message in a UDP segment (the DHCP server listens on port 67). As the host knows neither its IPv4 address nor the IPv4 address of the DHCP server, this UDP segment is sent inside an IPv4 packet whose source and destination addresses are respectively 0.0.0.0 and 255.255.255.255. The DHCP request may contain various options such as the name of the host, its datalink layer address, etc. The server captures the DHCP request and selects an unassigned address in its address pool. It then sends the assigned IPv4 address in a DHCP reply message which contains the datalink layer address of the host and additional information such as the subnet mask of the IPv4 address, the address of the default router or the address of the DNS resolver. This DHCP reply message is sent in an IPv4 packet whose source and destination addresses are respectively the IPv4 address of the DHCP server and the 255.255.255.255 broadcast address. The DHCP reply also specifies the lifetime of the address allocation. This forces the host to renew its address allocation once it expires. Thanks to the limited lease time, IP addresses are automatically returned to the pool of addresses when hosts are powered off. This reduces the waste of IPv4 addresses.

In an NBMA network, the interactions between IPv4 and the datalink layer are more complex as the ARP protocol cannot be used as in a LAN. Such NBMA networks use special servers that store the mappings between IP addresses and the corresponding datalink layer address. Asynchronous Transfer Mode (ATM) networks for example can use either the ATMARP protocol defined in RFC 2225 or the Next Hop Resolution Protocol (NHRP) defined in RFC 2332. ATM networks are less frequently used today and we will not describe the detailed operation of these servers.
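The address pool and the limited lease time that DHCP relies on can be illustrated with a small model. The sketch below is not a DHCP server: it ignores the message formats entirely, and the class name, the method names and the policy of keeping the same address on renewal are assumptions made for the example.

```python
import ipaddress


class DhcpPool:
    """Toy model of the address pool managed by a DHCP server."""

    def __init__(self, first, last, lease_time):
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self.free = [ipaddress.IPv4Address(v) for v in range(start, end + 1)]
        self.lease_time = lease_time
        self.leases = {}  # datalink address -> (IPv4 address, expiry time)

    def expire(self, now):
        """Return to the pool the addresses whose lease was not renewed."""
        for client, (address, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[client]
                self.free.append(address)

    def request(self, client, now):
        """Allocate or renew an address; None means the pool is exhausted."""
        self.expire(now)
        if client in self.leases:
            address, _ = self.leases[client]   # renewal keeps the address
        elif self.free:
            address = self.free.pop(0)         # select an unassigned address
        else:
            return None
        self.leases[client] = (address, now + self.lease_time)
        return address
```

A host that is powered off simply stops renewing; once its lease expires, the next call to `request` makes its address available to another host.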
Operation of IPv4 devices

At this point of the description of IPv4, it is useful to have a detailed look at how an IPv4 implementation sends, receives and forwards IPv4 packets. The simplest case is when a host needs to send a segment in an IPv4 packet. The host performs two operations. First, it must decide on which interface the packet will be sent. Second, it must create the corresponding IP packet(s).

To simplify the discussion in this section, we ignore the utilisation of IPv4 options. This is not a severe limitation as today IPv4 packets rarely contain options. Details about the processing of the IPv4 options may be found in the relevant RFCs, such as RFC 791. Furthermore, we also assume that only point-to-point links are used. We defer the explanation of the operation of IPv4 over Local Area Networks until the next chapter.

An IPv4 host having n datalink layer interfaces manages n+1 IPv4 addresses:
- the 127.0.0.1/32 IPv4 address assigned by convention to its loopback address
- one A.B.C.D/p IPv4 address assigned to each of its n datalink layer interfaces

Such a host maintains a routing table containing one entry for its loopback address and one entry for each subnet identifier assigned to its interfaces. Furthermore, the host usually uses one of its interfaces as the default interface when sending packets that are not addressed to a directly connected destination. This is represented by the default route: 0.0.0.0/0 that is associated to one interface.

When a transport protocol running on the host requests the transmission of a segment, it usually provides the IPv4 destination address to the IPv4 layer in addition to the segment [16]. The IPv4 implementation first performs a longest prefix match with the destination address in its routing table. The lookup returns the identification of the interface that must be used to send the packet. The host can then create the IPv4 packet containing the segment.
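The lookup described above can be sketched as a naive linear scan over the routing table. The interface names (`lo`, `eth0`, `eth1`) and the two subnets are hypothetical; a real implementation would use a much more efficient data structure than this list.

```python
import ipaddress

# Hypothetical routing table of a host with two interfaces:
# (prefix, outgoing interface)
ROUTING_TABLE = [
    ("127.0.0.1/32", "lo"),        # loopback address
    ("10.0.1.0/24", "eth0"),       # subnet of the first interface
    ("192.168.5.0/24", "eth1"),    # subnet of the second interface
    ("0.0.0.0/0", "eth1"),         # default route
]


def longest_prefix_match(routing_table, destination):
    """Return the interface of the most specific matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, interface in routing_table:
        network = ipaddress.ip_network(prefix)
        if address in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, interface)
    return None if best is None else best[1]
```

Because 0.0.0.0/0 matches every address with a prefix length of zero, it is selected only when no more specific entry matches, which is exactly the role of the default route.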
The source IPv4 address of the packet is the IPv4 address of the host on the interface returned by the longest prefix match. The Protocol field of the packet is set to the identification of the local transport protocol which created the segment. The TTL field of the packet is set to the default TTL used by the host. The host must now choose the packet's Identification. This Identification is important if the packet becomes fragmented in the network, as it ensures that the destination is able to reassemble the received fragments. Ideally, a sending host should never send a packet twice with the same Identification to the same destination host, in order to ensure that all fragments are correctly reassembled by the destination. Unfortunately, with a 16 bits Identification field and an expected MSL of 2 minutes, this implies that the maximum bandwidth to a given destination is limited to roughly 286 Mbps. With a more realistic 1500 bytes MTU, that bandwidth drops to 6.4 Mbps RFC 4963 if fragmentation must be possible [17]. This is very low and is another reason why hosts are highly encouraged to avoid fragmentation. If, despite all of this, the MTU of the outgoing interface is smaller than the packet's length, the packet is fragmented. Finally, the packet's checksum is computed before transmission.

When a host receives an IPv4 packet destined to itself, there are several operations that it must perform. First, it must check the packet's checksum. If the checksum is incorrect, the packet is discarded. Then, it must check whether the packet has been fragmented. If yes, the packet is passed to the reassembly algorithm described earlier. Otherwise, the packet must be passed to the upper layer. This is done by looking at the Protocol field (6 for TCP, 17 for UDP). If the host does not implement the transport layer protocol corresponding to the received Protocol field, it sends a Protocol unreachable ICMP message to the sending host. If the received packet contains an ICMP message (Protocol field set to 1), the processing is more complex. An Echo-request ICMP message triggers the transmission of an ICMP Echo-reply message. The other types of ICMP messages indicate an error that was caused by a previously transmitted packet. These ICMP messages are usually forwarded to the transport protocol that sent the erroneous packet. This can be done by inspecting the contents of the ICMP message, which includes the IP header and the first 64 bits of the payload of the erroneous packet. If the IP packet did not contain options, which is the case for most IPv4 packets, the transport protocol can find in the first 32 bits of the transport header the source and destination ports to determine the affected transport flow. This is important for Path MTU discovery for example.

[15] In practice, there is usually one DHCP server per group of subnets and the routers capture on each subnet the DHCP messages and forward them to the DHCP server.
[16] A transport protocol implementation can also specify whether the packet must be sent with the DF flag set or not set. A TCP implementation using Path MTU Discovery would always request the transmission of IPv4 packets with the DF flag set.
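The bandwidth limit imposed by the 16 bits Identification field, mentioned earlier in this section, follows from a simple computation: at most 2^16 packets may be sent to a destination during one MSL. The sketch below reproduces it under the assumption that every packet has the given size in bytes; the function name is illustrative.

```python
ID_BITS = 16          # size of the IPv4 Identification field
MSL_SECONDS = 120     # expected MSL of 2 minutes


def max_rate_bps(packet_bytes, id_bits=ID_BITS, msl=MSL_SECONDS):
    """Highest rate at which no Identification is reused within one MSL."""
    return (2 ** id_bits) * packet_bytes * 8 / msl


# Maximum-size IPv4 packets (65535 bytes): roughly 286 Mbps.
print(round(max_rate_bps(65535) / 1e6, 1), "Mbps")
# 1500 bytes packets: a few Mbps only.
print(round(max_rate_bps(1500) / 1e6, 2), "Mbps")
```

Counting 1500 bytes per packet gives a value slightly above the 6.4 Mbps quoted from RFC 4963. The difference presumably comes from which bytes of the packet are counted; this is an assumption of the sketch, not something stated in the text.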
When a router receives an IPv4 packet, it must first check the packet's checksum. If the checksum is invalid, the packet is discarded. Otherwise, the router must check whether the destination address is one of the IPv4 addresses assigned to the router. If so, the router must behave as a host and process the packet as described above. Although routers mainly forward IPv4 packets, they sometimes need to be accessed as hosts by network operators or network management software.

If the packet is not addressed to the router, it must be forwarded on an outgoing interface according to the router's routing table. The router first decrements the packet's TTL. If the TTL reaches 0, a TTL Exceeded ICMP message is sent back to the source. As the packet header has been modified, the checksum must be recomputed. Fortunately, as IPv4 uses an arithmetic checksum, a router can incrementally update the packet's checksum as described in RFC 1624. Then, the router performs a longest prefix match for the packet's destination address in its forwarding table. If no match is found, the router must return a Destination unreachable ICMP message to the source. Otherwise, the lookup returns the interface over which the packet must be forwarded. Before forwarding the packet over this interface, the router must first compare the length of the packet with the MTU of the outgoing interface. If the packet fits within the MTU, it is forwarded. Otherwise, a Fragmentation needed ICMP message is sent if the DF flag was set, or the packet is fragmented if the DF flag was not set.

Note: Longest prefix match in IP routers
Performing the longest prefix match at line rate on routers requires highly tuned data structures and algorithms.
Consider for example an implementation of the longest match based on a Radix tree on a router with a 10 Gbps link. On such a link, a router can receive 31,250,000 40 bytes IPv4 packets every second. To forward the packets at line rate, the router must process one IPv4 packet every 32 nanoseconds. This cannot be achieved by a software implementation. For a hardware implementation, the main difficulty lies in the number of memory accesses that are necessary to perform the longest prefix match. 32 nanoseconds is very small compared to the memory accesses that are required by a naive longest prefix match implementation. Additional information about faster longest prefix match algorithms may be found in [Varghese2005].

[17] It should be noted that only the packets that can be fragmented (i.e. whose DF flag is reset) must have different Identification fields. The Identification field is not used in the packets having the DF flag set.

5.2.3 IP version 6

In the late 1980s and early 1990s the growth of the Internet was causing several operational problems on routers. Many of these routers had a single CPU and up to 1 MByte of RAM to store their operating system, packet buffers and routing tables. Given the rate of allocation of IPv4 prefixes to companies and universities willing to join the Internet, the routing tables were growing very quickly and some feared that all IPv4 prefixes would quickly be allocated. In 1987, a study cited in RFC 1752 estimated that there would be 100,000 networks in the near future. In August 1990, estimates indicated that the class B space would be exhausted by March 1994. Two types of solution were developed to solve this problem. The first short term solution was the introduction of Classless Inter Domain Routing (CIDR). A second short term solution was the Network Address Translation (NAT) mechanism, defined in RFC 1631. NAT allowed multiple hosts to share a single public IP address; it is explained in section Middleboxes.

However, in parallel with these short-term solutions, which have allowed the IPv4 Internet to continue to be usable until now, the Internet Engineering Task Force started to work on developing a replacement for IPv4. This work started with an open call for proposals, outlined in RFC 1550. Several groups responded to this call with proposals for a next generation Internet Protocol (IPng):
- TUBA proposed in RFC 1347 and RFC 1561
- PIP proposed in RFC 1621
- SIPP proposed in RFC 1710

The IETF decided to pursue the development of IPng based on the SIPP proposal. As IP version 5 was already used by the experimental ST-2 protocol defined in RFC 1819, the successor of IP version 4 is IP version 6. The initial IP version 6 defined in RFC 1752 was designed based on the following assumptions:
- IPv6 addresses are encoded as a 128 bits field
- The IPv6 header has a simple format that can easily be parsed by hardware devices
- A host should be able to configure its IPv6 address automatically
- Security must be part of IPv6

Note: The IPng address size
When the work on IPng started, it was clear that 32 bits was too small to encode an IPng address and all proposals used longer addresses. However, there were many discussions about the most suitable address length. A first approach, proposed by SIP in RFC 1710, was to use 64 bit addresses. A 64 bits address space was 4 billion times larger than the IPv4 address space and, furthermore, from an implementation perspective, 64 bit CPUs were being considered and 64 bit addresses would naturally fit inside their registers. Another approach was to use an existing address format. This was the TUBA proposal (RFC 1347) that reuses the ISO CLNP 20 bytes addresses. The 20 bytes addresses provided room for growth, but using ISO CLNP was not favored by the IETF partially due to political reasons, despite the fact that mature CLNP implementations were already available. 128 bits appeared to be a reasonable compromise at that time.

IPv6 addressing architecture

The experience of IPv4 revealed that the scalability of a network layer protocol heavily depends on its addressing architecture. The designers of IPv6 spent a lot of effort defining its addressing architecture RFC 3513. All IPv6 addresses are 128 bits wide. This implies that there are 340,282,366,920,938,463,463,374,607,431,768,211,456 (3.4 × 10^38) different IPv6 addresses. As the surface of the Earth is about 510,072,000 km^2, this implies that there are about 6.67 × 10^23 IPv6 addresses per square meter on Earth. Compared to IPv4, which offers only 8 addresses per square kilometer, this is a significant improvement on paper.

IPv6 supports unicast, multicast and anycast addresses. As with IPv4, an IPv6 unicast address is used to identify one datalink-layer interface on a host. If a host has several datalink layer interfaces (e.g. an Ethernet interface and a WiFi interface), then it needs several IPv6 addresses. In general, an IPv6 unicast address is structured as shown in the figure below.

An IPv6 unicast address is composed of three parts:
1. A global routing prefix that is assigned to the Internet Service Provider that owns this block of addresses
2. A subnet identifier that identifies a customer of the ISP
3. An interface identifier that identifies a particular interface on an endsystem
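The address-space figures quoted in this section can be checked with exact integer arithmetic. The short computation below uses the Earth surface given in the text; the variable names are illustrative.

```python
EARTH_SURFACE_KM2 = 510_072_000          # surface of the Earth, from the text
M2_PER_KM2 = 1_000_000

ipv6_addresses = 2 ** 128                # 128 bits wide addresses
ipv4_addresses = 2 ** 32                 # 32 bits wide addresses

ipv6_per_m2 = ipv6_addresses / (EARTH_SURFACE_KM2 * M2_PER_KM2)
ipv4_per_km2 = ipv4_addresses / EARTH_SURFACE_KM2

print(f"{ipv6_addresses:,} IPv6 addresses")
print(f"{ipv6_per_m2:.3g} IPv6 addresses per square meter")
print(f"{ipv4_per_km2:.2f} IPv4 addresses per square kilometer")
```

Note the different units in the comparison: IPv6 is counted per square meter while IPv4 is counted per square kilometer.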

Figure 5.36: Structure of IPv6 unicast addresses

In today's deployments, interface identifiers are always 64 bits wide. This implies that while there are 2^128 different IPv6 addresses, they must be grouped in 2^64 subnets. This could appear as a waste of resources. However, using 64 bits for the host identifier allows IPv6 addresses to be auto-configured and also provides some benefits from a security point of view, as explained in section ICMPv6.

Note: Textual representation of IPv6 addresses
It is sometimes necessary to write IPv6 addresses in text format, e.g. when manually configuring addresses or for documentation purposes. The preferred format for writing IPv6 addresses is x:x:x:x:x:x:x:x, where each x is the hexadecimal value of one of the eight 16-bit parts of the address. Here are a few examples of IPv6 addresses:
- ABCD:EF01:2345:6789:ABCD:EF01:2345:6789
- 2001:DB8:0:0:8:800:200C:417A
- FE80:0:0:0:219:E3FF:FED7:1204

IPv6 addresses often contain a long sequence of bits set to 0. In this case, a compact notation has been defined. With this notation, :: is used to indicate one or more groups of 16 bits blocks containing only bits set to 0. For example,
- 2001:DB8:0:0:8:800:200C:417A is represented as 2001:DB8::8:800:200C:417A
- FF01:0:0:0:0:0:0:101 is represented as FF01::101
- 0:0:0:0:0:0:0:1 is represented as ::1
- 0:0:0:0:0:0:0:0 is represented as ::

An IPv6 prefix can be represented as address/length, where length is the length of the prefix in bits. For example, the three notations below correspond to the same IPv6 prefix:
- 2001:0DB8:0000:CD30:0000:0000:0000:0000/60
- 2001:0DB8::CD30:0:0:0:0/60
- 2001:0DB8:0:CD30::/60

In practice, there are several types of IPv6 unicast address. Most of the IPv6 unicast addresses are allocated in blocks under the responsibility of IANA. The current IPv6 allocations are part of the 2000::/3 address block. Regional Internet Registries (RIR) such as RIPE in Europe, ARIN in North-America or AfriNIC in Africa have each received a block of IPv6 addresses that they sub-allocate to Internet Service Providers in their region. The ISPs then sub-allocate addresses to their customers.
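The textual notations described in the note above can be explored with the `ipaddress` module of the Python standard library. The two helper functions are illustrative wrappers, not part of the module. Python prints hexadecimal digits in lowercase, which denotes the same addresses as the uppercase forms used in the text.

```python
import ipaddress


def compact(text):
    """Return the compact (::) notation of an IPv6 address."""
    return ipaddress.IPv6Address(text).compressed


def same_prefix(*notations):
    """Check whether all address/length notations denote the same prefix."""
    return len({ipaddress.IPv6Network(n) for n in notations}) == 1


print(compact("2001:DB8:0:0:8:800:200C:417A"))   # 2001:db8::8:800:200c:417a
print(compact("FF01:0:0:0:0:0:0:101"))           # ff01::101
print(same_prefix(
    "2001:0DB8:0000:CD30:0000:0000:0000:0000/60",
    "2001:0DB8::CD30:0:0:0:0/60",
    "2001:0DB8:0:CD30::/60",
))                                               # True
```

The `exploded` attribute of an `IPv6Address` object performs the reverse operation and returns the full eight-group form.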
When considering the allocation of IPv6 addresses, two types of address allocations are often distinguished. The RIRs allocate provider-independent (PI) addresses. PI addresses are usually allocated to Internet Service Providers and large companies that are connected to at least two different ISPs [CSP2009]. Once a PI address block has been allocated to a company, this company can use its address block with the provider of its choice and change its provider at will. Internet Service Providers allocate provider-aggregatable (PA) address blocks from their own PI address block to their customers. A company that is connected to only one ISP should only use PA addresses.

The drawback of PA addresses is that when a company using a PA address block changes its provider, it needs to change all the addresses that it uses. This can be a nightmare from an operational perspective and many companies are lobbying to obtain PI address blocks even if they are small and connected to a single provider. The typical sizes of the IPv6 address blocks are:
- /32 for an Internet Service Provider
- /48 for a single company
- /64 for a single user (e.g. a home user connected via ADSL)
- /128 in the rare case when it is known that no more than one endhost will be attached

For the companies that want to use IPv6 without being connected to the IPv6 Internet, RFC 4193 defines the Unique Local Unicast (ULA) addresses (FC00::/7). These ULA addresses play a similar role as the private IPv4 addresses defined in RFC 1918. However, the size of the FC00::/7 address block allows ULA to be much more flexible than private IPv4 addresses.

Furthermore, the IETF has reserved some IPv6 addresses for a special usage. The two most important ones are:
- 0:0:0:0:0:0:0:1 (::1 in compact form) is the IPv6 loopback address. This is the address of a logical interface that is always up and running on IPv6 enabled hosts. This is the equivalent of 127.0.0.1 in IPv4.
- 0:0:0:0:0:0:0:0 (:: in compact form) is the unspecified IPv6 address. This is the IPv6 address that a host can use as source address when trying to acquire an official address.

The last type of unicast IPv6 addresses are the Link Local Unicast addresses. These addresses are part of the FE80::/10 address block and are defined in RFC 4291. Each host can compute its own link local address by concatenating the FE80::/64 prefix with the 64 bits identifier of its interface. Link local addresses can be used when hosts that are attached to the same link (or local area network) need to exchange packets. They are used notably for address discovery and auto-configuration purposes. Their usage is restricted to each link and a router cannot forward a packet whose source or destination address is a link local address. Link local addresses have also been defined for IPv4 RFC 3927. However, the IPv4 link local addresses are only used when a host cannot obtain a regular IPv4 address, e.g. on an isolated LAN.
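The concatenation of a 64 bits prefix with a 64 bits interface identifier, used above to build link local addresses, amounts to a bitwise OR on 128 bits integers. In the sketch below, the function name is an assumption and the identifier is taken from the example address FE80:0:0:0:219:E3FF:FED7:1204 given earlier; how a host obtains its identifier is left out.

```python
import ipaddress


def address_from_identifier(prefix, interface_id):
    """Concatenate a /64 prefix with a 64 bits interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("a 64 bits prefix is expected")
    if not 0 <= interface_id < 2 ** 64:
        raise ValueError("the interface identifier must fit in 64 bits")
    # The low-order 64 bits of the prefix are zero, so OR-ing the identifier
    # places it in the interface identifier part of the address.
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Identifier taken from the example address used in the text.
print(address_from_identifier("FE80::/64", 0x0219E3FFFED71204))
# fe80::219:e3ff:fed7:1204
```

The same function works for any other 64 bits prefix, for instance a globally routable prefix assigned to a subnet.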
Figure 5.37: IPv6 link local address structure

An important consequence of the IPv6 unicast addressing architecture and the utilisation of link-local addresses is that an IPv6 host has several IPv6 addresses. This implies that an IPv6 stack must be able to handle multiple IPv6 addresses. This was not always the case with IPv4.

RFC 4291 defines a special type of IPv6 anycast address. On a subnetwork having prefix p/n, the IPv6 address whose 128-n low-order bits are set to 0 is the anycast address that corresponds to all routers inside this subnetwork. This anycast address can be used by hosts to quickly send a packet to any of the routers inside their own subnetwork.

Finally, RFC 4291 defines the structure of the IPv6 multicast addresses [18]. This structure is depicted in the figure below.

The low order 112 bits of an IPv6 multicast address are the group's identifier. The high order bits are used as a marker to distinguish multicast addresses from unicast addresses. Notably, the 4 bits flag field indicates whether the address is temporary or permanent. Finally, the scope field indicates the boundaries of the forwarding of packets destined to a particular address. A link-local scope indicates that a router should not forward a packet destined to such a multicast address. An organisation local-scope indicates that a packet sent to such a multicast destination address should not leave the organisation. Finally, the global scope is intended for multicast groups spanning the global Internet.

[18] The full list of allocated IPv6 multicast addresses is available at http://www.iana.org/assignments/ipv6-multicast-addresses
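The fields of a multicast address can be extracted with a few shifts and masks. The sketch below assumes the layout of RFC 4291 (an 8 bits marker set to FF, then the 4 bits flags, the 4 bits scope and the 112 bits group identifier) and the scope values 2, 8 and E for the three scopes mentioned above; the function name and the returned dictionary are illustrative.

```python
import ipaddress

# Scope values assumed from RFC 4291 for the scopes discussed in the text.
SCOPES = {0x2: "link-local", 0x8: "organisation-local", 0xE: "global"}


def parse_multicast(text):
    """Split an IPv6 multicast address into its flags, scope and group."""
    value = int(ipaddress.IPv6Address(text))
    if value >> 120 != 0xFF:
        raise ValueError("not an IPv6 multicast address")
    flags = (value >> 116) & 0xF
    scope = (value >> 112) & 0xF
    return {
        # The low-order bit of the flags marks a temporary address.
        "temporary": bool(flags & 0x1),
        "scope": SCOPES.get(scope, hex(scope)),
        "group_id": value & ((1 << 112) - 1),
    }


print(parse_multicast("FF02::1"))
# {'temporary': False, 'scope': 'link-local', 'group_id': 1}
```

Scope values that are not listed in the table are returned in hexadecimal rather than guessed.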

Figure 5.38: IPv6 multicast address structure

Among these addresses, some are well known. For example, all endsystems automatically belong to the FF02::1 multicast group while all routers automatically belong to the FF02::2 multicast group. We discuss IPv6 multicast later.

IPv6 packet format

The IPv6 packet format was heavily inspired by the packet format proposed for the SIPP protocol in RFC 1710. The standard IPv6 header defined in RFC 2460 occupies 40 bytes and contains 8 different fields, as shown in the figure below.

Figure 5.39: The IP version 6 header (RFC 2460)

Apart from the source and destination addresses, the IPv6 header contains the following fields:
- version: a 4 bits field set to 6 and intended to allow IP to evolve in the future if needed
- Traffic class: this 8 bits field plays a similar role as the DS byte in the IPv4 header
- Flow label: this 20 bits field was initially intended to be used to tag packets belonging to the same flow. However, as of this writing, there is no clear guideline on how this field should be used by hosts and routers
- Payload length: this is the size of the packet payload in bytes. As the length is encoded as a 16 bits field, an IPv6 packet can contain up to 65535 bytes of payload.
- Next Header: this 8 bits field indicates the type [19] of header that follows the IPv6 header. It can be a transport layer header (e.g. 6 for TCP or 17 for UDP) or an IPv6 option. Handling options as a next header simplifies the processing of IPv6 packets compared to IPv4.
- Hop Limit: this 8 bits field indicates the number of routers that can forward the packet. It is decremented by one by each router and has the same purpose as the TTL field of the IPv4 header.

In comparison with IPv4, the IPv6 packets are much simpler and easier to process by routers. A first important difference is that there is no checksum inside the IPv6 header. This is mainly because all datalink layers and transport protocols include a checksum or a CRC to protect their frames/segments against transmission errors. Adding a checksum in the IPv6 header would have forced each router to recompute the checksum of all packets, with limited benefit in detecting errors. In practice, an IP checksum allows for catching errors that occur inside routers (e.g. due to memory corruption) before the packet reaches its destination. However, this benefit was found to be too small given the reliability of current memories and the cost of computing the checksum on each router.

A second difference with IPv4 is that the IPv6 header does not support fragmentation and reassembly. Experience with IPv4 has shown that fragmenting packets in routers was costly [KM1995] and the developers of IPv6 have decided that routers would not fragment packets anymore. If a router receives a packet that is too long to be forwarded, the packet is dropped and the router returns an ICMPv6 message to inform the sender of the problem. The sender can then either fragment the packet or perform Path MTU discovery. In IPv6, packet fragmentation is performed only by the source by using IPv6 options.

The third difference is the IPv6 options, which are simpler and easier to process than the IPv4 options.

[19] The IANA maintains the list of all allocated Next Header types at http://www.iana.org/assignments/protocol-numbers/ The same registry is used for the IPv4 protocol field and for the IPv6 Next Header.
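The fixed layout of the 40 bytes IPv6 header makes it straightforward to parse. The following sketch decodes the eight fields with the `struct` module, assuming the field order of RFC 2460 (version, traffic class and flow label packed in the first 32 bits, followed by the payload length, Next Header, Hop Limit and the two addresses). The function name and dictionary keys are illustrative.

```python
import ipaddress
import struct

IPV6_HEADER_LENGTH = 40


def parse_ipv6_header(packet):
    """Decode the fixed IPv6 header found at the start of a packet."""
    if len(packet) < IPV6_HEADER_LENGTH:
        raise ValueError("an IPv6 header occupies 40 bytes")
    first_word, payload_length, next_header, hop_limit = struct.unpack(
        "!IHBB", packet[:8])
    return {
        "version": first_word >> 28,                  # 4 bits
        "traffic_class": (first_word >> 20) & 0xFF,   # 8 bits
        "flow_label": first_word & 0xFFFFF,           # 20 bits
        "payload_length": payload_length,             # 16 bits
        "next_header": next_header,                   # e.g. 6 TCP, 17 UDP
        "hop_limit": hop_limit,
        "source": ipaddress.IPv6Address(packet[8:24]),
        "destination": ipaddress.IPv6Address(packet[24:40]),
    }
```

Since there is no header checksum and no variable-length part, a router only needs to rewrite the Hop Limit byte before forwarding the packet.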
Note: Header compression on low bandwidth links
Given the size of the IPv6 header, it can cause huge overhead on low bandwidth links, especially when small packets are exchanged such as for Voice over IP applications. In such environments, several techniques can be used to reduce the overhead. A first solution is to use data compression in the datalink layer to compress all the information exchanged [Thomborson1992]. These techniques are similar to the data compression algorithms used in tools such as compress(1) or gzip(1) RFC 1951. They compress streams of bits without taking advantage of the fact that these streams contain IP packets with a known structure. A second solution is to compress the IP and TCP header. These header compression techniques, such as the one defined in RFC 2507, take advantage of the redundancy found in successive packets from the same flow to significantly reduce the size of the protocol headers. Another solution is to define a compressed encoding of the IPv6 header that matches the capabilities of the underlying datalink layer RFC 4944.

IPv6 options

In IPv6, each option is considered as one header containing a multiple of 8 bytes to ensure that IPv6 options in a packet are aligned on 64 bit boundaries. IPv6 defines several types of options:
- the hop-by-hop options are options that must be processed by the routers on the packet's path
- the type 0 routing header, which is similar to the IPv4 loose source routing option
- the fragmentation option, which is used when fragmenting an IPv6 packet
- the destination options
- the security options that allow IPv6 hosts to exchange packets with cryptographic authentication (AH header) or encryption and authentication (ESP header)

RFC 2460 provides lots of detail on the encodings of the different types of options. In this section, we only discuss some of them. The reader may consult RFC 2460 for more information about the other options. The first point to note is that each option contains a Next Header field, which indicates the type of header that follows the option.
A second point to note is that in order to allow routers to efficiently parse IPv6 packets, the options that must be processed by routers (hop-by-hop options and type 0 routing header) must appear first in the packet. This allows the router to process a packet without being forced to analyse all the packet's options. A third point to note is that hop-by-hop and destination options are encoded using a type length value format. Furthermore, the type field contains bits that indicate whether a router that does not understand this option should ignore the option or discard the packet. This allows the introduction of new options into the network without forcing all devices to be upgraded to support them at the same time.

Two hop-by-hop options have been defined. RFC 2675 specifies the jumbogram that enables IPv6 to support packets containing a payload larger than 65535 bytes. These jumbo packets have their payload length set to 0 and the jumbogram option contains the packet length as a 32 bits field. Such packets can only be sent from a source to a destination if all the routers on the path support this option. However, as of this writing it does not seem that the jumbogram option has been implemented. The router alert option defined in RFC 2711 is the second example of a hop-by-hop option. The packets that contain this option should be processed in a special way by intermediate routers. This option is used for IP packets that carry Resource Reservation Protocol (RSVP) messages. Its usage is explained later.

The type 0 routing header defined in RFC 2460 is an example of an IPv6 option that must be processed by some routers. This option is encoded as shown below.

Figure 5.40: The Type 0 routing header (RFC 2460)

The type 0 routing option was intended to allow a host to indicate a loose source route that should be followed by a packet by specifying the addresses of some of the routers that must forward this packet. Unfortunately, further work with this routing header, including an entertaining demonstration with scapy [BE2007], revealed some severe security problems with this routing header. For this reason, loose source routing with the type 0 routing header has been removed from the IPv6 specification RFC 5095.

In IPv6, fragmentation is performed exclusively by the source host and relies on the fragmentation header. This 64 bits header is composed of six fields:
- a Next Header field that indicates the type of the header that follows the fragmentation header
- an 8 bits reserved field set to 0.
- the Fragment Offset is a 13-bit unsigned integer that contains the offset, in 8 bytes units, of the data following this header, relative to the start of the original packet.
- a second reserved field of 2 bits, set to 0.
- the More flag, which is set to 0 in the last fragment of a packet and to 1 in all other fragments.
- the 32 bits Identification field indicates to which original packet a fragment belongs. When a host sends fragmented packets, it should ensure that it does not reuse the same identification field for packets sent to the same destination during a period of MSL seconds. This is easier with the 32 bits identification used in the IPv6 fragmentation header than with the 16 bits identification field of the IPv4 header.
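The fragmentation header described above can be encoded and decoded with the `struct` module. This is a minimal sketch: the function names are illustrative, the bit positions (offset in the 13 high-order bits of the third field, More flag in its low-order bit) follow RFC 2460, and the sketch only splits a payload without building complete IPv6 packets.

```python
import struct


def pack_fragment_header(next_header, offset_bytes, more, identification):
    """Build the 64 bits fragmentation header."""
    if offset_bytes % 8 != 0:
        raise ValueError("fragment offsets are expressed in 8 bytes units")
    offset_and_flag = ((offset_bytes // 8) << 3) | (1 if more else 0)
    return struct.pack("!BBHI", next_header, 0, offset_and_flag,
                       identification)


def unpack_fragment_header(header):
    """Decode the fields of a fragmentation header."""
    next_header, _reserved, offset_and_flag, identification = struct.unpack(
        "!BBHI", header)
    return {
        "next_header": next_header,
        "offset_bytes": (offset_and_flag >> 3) * 8,
        "more": bool(offset_and_flag & 0x1),
        "identification": identification,
    }


def fragment(payload, next_header, identification, max_data):
    """Split a payload into fragments carrying at most max_data bytes."""
    if max_data <= 0 or max_data % 8 != 0:
        raise ValueError("max_data must be a positive multiple of 8")
    fragments = []
    for offset in range(0, len(payload), max_data):
        chunk = payload[offset:offset + max_data]
        more = offset + max_data < len(payload)   # 0 only in the last one
        header = pack_fragment_header(next_header, offset, more,
                                      identification)
        fragments.append(header + chunk)
    return fragments
```

All fragments of one packet share the same Identification, which is what allows the destination to group them before reassembly.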
Some IPv6 implementations send the fragments of a packet in increasing fragment offset order, starting from the first fragment. Others send the fragments in reverse order, starting from the last fragment. The latter solution can be advantageous for the host that needs to reassemble the fragments, as it can easily allocate the buffer required to reassemble all fragments of the packet upon reception of the last fragment. When a host receives the first fragment of an IPv6 packet, it cannot know a priori the length of the entire IPv6 packet.

The figure below provides an example of a fragmented IPv6 packet containing a UDP segment. The Next Header type reserved for the IPv6 fragmentation option is 44.

Figure 5.41: IPv6 fragmentation example

Finally, the last type of IPv6 options is the Encapsulating Security Payload (ESP) defined in RFC 4303 and the Authentication Header (AH) defined in RFC 4302. These two headers are used by IPSec RFC 4301. They are discussed in another chapter.

5.2.4 ICMP version 6

ICMPv6 defined in RFC 4443 is the companion protocol for IPv6 as ICMPv4 is the companion protocol for IPv4. ICMPv6 is used by routers and hosts to report problems when processing IPv6 packets. However, as we will see in chapter The datalink layer and the Local Area Networks, ICMPv6 is also used when auto-configuring addresses.

The traditional utilisation of ICMPv6 is similar to ICMPv4. ICMPv6 messages are carried inside IPv6 packets (the Next Header field for ICMPv6 is 58). Each ICMPv6 message starts with a 32 bits header containing an 8 bits type field, an 8 bits code field and a 16 bits checksum computed over the entire ICMPv6 message. The message body contains a copy of the IPv6 packet in error.

Figure 5.42: ICMP version 6 packet format

ICMPv6 specifies two classes of messages: error messages that indicate a problem in handling a packet and informational messages. Four types of error messages are defined in RFC 4443:
- 1: Destination Unreachable. Such an ICMPv6 message is sent when the destination address of a packet is unreachable. The code field of the ICMP header contains additional information about the type of unreachability. The following codes are specified in RFC 4443:
    - 0: No route to destination. This indicates that the router that sent the ICMPv6 message did not have a route towards the packet's destination.
    - 1: Communication with destination administratively prohibited. This indicates that a firewall has refused to forward the packet towards its destination.
    - 2: Beyond scope of source address. This message can be sent if the source is using link-local addresses to reach a global unicast address outside its subnet.
    - 3: Address unreachable. This message indicates that the packet reached the subnet of the destination, but the host that owns this destination address cannot be reached.
    - 4: Port unreachable. This message indicates that the IPv6 packet was received by the destination, but there was no application listening to the specified port.
- 2: Packet Too Big. The router that sends the ICMPv6 message received an IPv6 packet that is larger than the MTU of the outgoing link. The ICMPv6 message contains the MTU of this link in bytes. This allows the sending host to implement Path MTU discovery RFC 1981.
- 3: Time Exceeded. This error message can be sent either by a router or by a host. A router would set code to 0 to report the reception of a packet whose Hop Limit reached 0. A host would set code to 1 to report that it was unable to reassemble received IPv6 fragments.
- 4: Parameter Problem. This ICMPv6 message is used to report either the reception of an IPv6 packet with an erroneous header field (code 0) or an unknown Next Header or IP option (codes 1 and 2). In this case, the message body contains the erroneous IPv6 packet and the first 32 bits of the message body contain a pointer to the error.

Two types of informational ICMPv6 messages are defined in RFC 4443: echo request and echo reply, which are used to test the reachability of a destination by using ping6(8).

ICMPv6 also allows the discovery of the path between a source and a destination by using traceroute6(8). The output below shows a traceroute between a host at UCLouvain and one of the main IETF servers. Note that this IPv6 path is different than the IPv4 path that was described earlier although the two traceroutes were performed at the same time.
    traceroute6 www.ietf.org
    traceroute6 to www.ietf.org (2001:1890:1112:1::20) from 2001:6a8:3080:2:217:f2ff:fed6:65c0, 30 hops max, 12 byte packets
     1  2001:6a8:3080:2::1  13.821 ms  0.301 ms  0.324 ms
     2  2001:6a8:3000:8000::1  0.651 ms  0.51 ms  0.495 ms
     3  10ge.cr2.bruvil.belnet.net  3.402 ms  3.34 ms  3.33 ms
     4  10ge.cr2.brueve.belnet.net  3.668 ms  10ge.cr2.brueve.belnet.net  3.988 ms  10ge.cr2.brueve.belnet.net  3.699 ms
     5  belnet.rt1.ams.nl.geant2.net  10.598 ms  7.214 ms  10.082 ms
     6  so-7-0-0.rt2.cop.dk.geant2.net  20.19 ms  20.002 ms  20.064 ms
     7  kbn-ipv6-b1.ipv6.telia.net  21.078 ms  20.868 ms  20.864 ms
     8  s-ipv6-b1-link.ipv6.telia.net  31.312 ms  31.113 ms  31.411 ms
     9  s-ipv6-b1-link.ipv6.telia.net  61.986 ms  61.988 ms  61.994 ms
    10  2001:1890:61:8909::1  121.716 ms  121.779 ms  121.177 ms
    11  2001:1890:61:9117::2  203.709 ms  203.305 ms  203.07 ms
    12  mail.ietf.org  204.172 ms  203.755 ms  203.748 ms

Note: Rate limitation of ICMP messages
High-end hardware based routers use special purpose chips on their interfaces to forward IPv6 packets at line rate. These chips are optimised to process correct IP packets. They are not able to create ICMP messages at line rate. When such a chip receives an IP packet that triggers an ICMP message, it interrupts the main CPU of the router and the software running on this CPU processes the packet. This CPU is much slower than the hardware acceleration found on the interfaces [Gill2004]. It would be overloaded if it had to process IP packets at line rate and generate one ICMP message for each received packet. To protect this CPU, high-end routers limit the rate at which the hardware can interrupt the main CPU and thus the rate at which ICMP messages can be generated. This implies that not all erroneous IP packets cause the transmission of an ICMP message. The risk of overloading the main CPU of the router is also the reason why using hop-by-hop IPv6 options, including the router alert option, is discouraged [20].

[20] For a discussion of the issues with the router alert IP option, see http://tools.ietf.org/html/draft-rahman-rtg-router-alert-dangerous-00 or http://tools.ietf.org/html/draft-rahman-rtg-router-alert-considerations-03
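One common way to bound the rate at which ICMP messages are generated is a token bucket. The text does not say which mechanism routers actually use, so the sketch below is only an assumption chosen for illustration; the class name, the parameters and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are all hypothetical.

```python
class IcmpRateLimiter:
    """Token bucket bounding the number of ICMP messages generated."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second      # tokens added per second
        self.capacity = burst            # maximum number of stored tokens
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Return True if an ICMP message may be generated at time now."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False     # the erroneous packet is handled without any ICMP


limiter = IcmpRateLimiter(rate_per_second=2, burst=2)
print([limiter.allow(0.0) for _ in range(3)])   # [True, True, False]
```

When `allow` returns False, no ICMP message is sent for that packet, which matches the observation that not all erroneous IP packets cause the transmission of an ICMP message.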

Interactions between IPv6 and the datalink layer

There are several differences between IPv6 and IPv4 when considering their interactions with the datalink layer. In IPv6, the interactions between the network and the datalink layer are performed using ICMPv6.

First, ICMPv6 is used to resolve the datalink layer address that corresponds to a given IPv6 address. This part of ICMPv6 is the Neighbour Discovery Protocol (NDP) defined in RFC 4861. NDP is similar to ARP, but there are two important differences. First, NDP messages are exchanged in ICMPv6 messages while ARP messages are sent as datalink layer frames. Second, an ARP request is sent as a broadcast frame while an NDP solicitation message is sent as a multicast ICMPv6 packet that is transported inside a multicast frame. The operation of the NDP protocol is similar to ARP. To obtain an address mapping, a host sends a Neighbour Solicitation message. This message is sent inside an ICMPv6 message that is placed in an IPv6 packet whose source address is the IPv6 address of the requesting host and the destination address is the all-hosts IPv6 multicast address (FF02::1) to which all IPv6 hosts listen. The Neighbour Solicitation contains the requested IPv6 address. The owner of the requested address replies by sending a unicast Neighbour Advertisement message to the requesting host. NDP suffers from similar security issues as the ARP protocol. However, it is possible to secure NDP by using the Cryptographically Generated IPv6 Addresses (CGA) defined in RFC 3972. The Secure Neighbour Discovery Protocol is defined in RFC 3971, but a detailed description of this protocol is outside the scope of this chapter.
IPv6 networks also support the Dynamic Host Configuration Protocol. The IPv6 extensions to DHCP are defined in RFC 3315. The operation of DHCPv6 is similar to DHCP that was described earlier. In addition to DHCPv6, IPv6 networks support another mechanism to assign IPv6 addresses to hosts. This is the Stateless Address Autoconfiguration (SLAAC) defined in RFC 4862. When a host boots, it derives its identifier from its datalink layer address [21] and concatenates this 64 bits identifier to the FE80::/64 prefix to obtain its link-local IPv6 address. It then sends a Neighbour Solicitation with its link-local address as a target to verify whether another host is using the same link-local address on this subnet. If it receives a Neighbour Advertisement indicating that the link-local address is used by another host, it generates another 64 bits identifier and sends a Neighbour Solicitation again. If there is no answer, the host considers its link-local address to be valid. This address will be used as the source address for all NDP messages sent on the subnet. To automatically configure its global IPv6 address, the host must know the globally routable IPv6 prefix that is used on the local subnet. IPv6 routers regularly send ICMPv6 Router Advertisement messages that indicate the IPv6 prefix assigned to each subnet. Upon reception of this message, the host can derive its global IPv6 address by concatenating its 64 bits identifier with the received prefix. It concludes the autoconfiguration by sending a Neighbour Solicitation message targeted at its global IPv6 address to ensure that another host is not using the same IPv6 address.

5.2.5 Middleboxes

When the TCP/IP architecture and the IP protocol were defined, two types of devices were considered in the network layer: endhosts and routers. Endhosts are the sources and destinations of IP packets while routers forward packets. When a router forwards an IP packet, it consults its forwarding table, updates the packet's TTL, recomputes its checksum and forwards it to the next hop. A router does not need to read or change the contents of the packet's payload.
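The per-packet work of a router described above (update the TTL, recompute the checksum) can be sketched for an IPv4 header as follows. This is a minimal illustration, assuming a 20-byte header without options; the function names are hypothetical and the forwarding table lookup is left out.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's complement of the one's complement sum
    of the 16-bit words of the header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytes) -> bytes:
    """Return the header as it leaves the router: TTL decremented and
    checksum recomputed. The payload is never inspected."""
    ttl = header[8]                         # TTL is the 9th byte of the header
    if ttl <= 1:
        # A real router would discard the packet and send an ICMP message.
        raise ValueError("TTL expired")
    new = bytearray(header)
    new[8] = ttl - 1
    new[10:12] = b"\x00\x00"                # checksum field is zero while summing
    new[10:12] = struct.pack("!H", ipv4_checksum(bytes(new)))
    return bytes(new)
```

Computing the checksum over a header that already contains a valid checksum yields zero, which gives a simple way to verify the result. IPv6 has no header checksum, so an IPv6 router only decrements the Hop Limit.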
[21] Using a datalink layer address to derive a 64 bits identifier for each host raises privacy concerns as the host will always use the same identifier. Attackers could use this to track hosts on the Internet. An extension to the Stateless Address Autoconfiguration mechanism that does not raise privacy concerns is defined in RFC 4941. These privacy extensions allow a host to generate its 64 bits identifier randomly every time it attaches to a subnet. It then becomes impossible for an attacker to use the 64 bits identifier to track a host.

However, in today's Internet, there exist devices that are not strictly routers but which process, sometimes modify, and forward IP packets. These devices are often called middleboxes RFC 3234. Some middleboxes only operate in the network layer, but most middleboxes are able to analyse the payload of the received packets and extract the transport header, and in some cases the application layer protocols.

In this section, we briefly describe two types of middleboxes: firewalls and network address translation (NAT) devices. A discussion of the different types of middleboxes with references may be found in RFC 3234.

Firewalls

When the Internet was only a research network interconnecting research labs, security was not a concern, and most hosts agreed to exchange packets over TCP connections with most other hosts. However, as more and more

users and companies became connected to the Internet, allowing unlimited access to hosts that they managed started to concern companies. Furthermore, at the end of the 1980s, several security issues affected the Internet, such as the first Internet worm [RE1989] and some widely publicised security breaches [Stoll1988] [CB2003] [Cheswick1990].

Figure 5.43: IP middleboxes and the reference model

These security problems convinced the industry that IP networks are a key part of a company's infrastructure that should be protected by special devices, in the same way as security guards and fences are used to protect buildings. These special devices were quickly called firewalls. A typical firewall has two interfaces:

- an external interface connected to the global Internet
- an internal interface connected to a trusted network

The first firewalls included configurable packet filters. A packet filter is a set of rules defining the security policy of a network. In practice, these rules are based on the values of fields in the IP or transport layer headers. Any field of the IP or transport header can be used in a firewall rule, but the most common ones are:

- filter on the source address. For example, a company may decide to discard all packets received from one of its competitors. In this case, all packets whose source address belongs to the competitor's address block would be rejected
- filter on the destination address. For example, the hosts of the research lab of a company may receive packets from the global Internet, but not the hosts of the financial department
- filter on the Protocol number found in the IP header. For example, a company may only allow its hosts to use TCP or UDP, but not other, more experimental, transport protocols
- filter on the TCP or UDP port numbers. For example, only the DNS server of a company should receive UDP segments whose destination port is set to 53, or only the official SMTP servers of the company can send TCP segments whose source ports are set to 25
- filter on the TCP flags. For example, a simple solution to prohibit external hosts from opening TCP connections with hosts inside the company is to discard all TCP segments received from the external interface with only the SYN flag set.
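The filtering rules listed above can be sketched as an ordered rule list where the first matching rule decides. This is an illustrative sketch, not how any particular firewall is configured: the `Packet` and `Rule` classes, the default-drop policy and all addresses (taken from the documentation ranges) are assumptions, and the rules are assumed to apply to packets received on the external interface.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Packet:
    src: str
    dst: str
    proto: str                       # "tcp", "udp", "icmp", ...
    sport: int = 0
    dport: int = 0
    flags: frozenset = frozenset()   # TCP flags, e.g. frozenset({"SYN"})


@dataclass
class Rule:
    action: str                      # "accept" or "drop"
    src: str = "0.0.0.0/0"           # any source by default
    dst: str = "0.0.0.0/0"           # any destination by default
    proto: Optional[str] = None      # None matches any protocol
    dport: Optional[int] = None      # None matches any destination port
    flags: Optional[frozenset] = None  # exact set of TCP flags, None = any

    def matches(self, p: Packet) -> bool:
        return (ip_address(p.src) in ip_network(self.src)
                and ip_address(p.dst) in ip_network(self.dst)
                and self.proto in (None, p.proto)
                and self.dport in (None, p.dport)
                and self.flags in (None, p.flags))


def filter_packet(rules, packet, default="drop"):
    """Stateless filtering: each packet is judged on its own headers."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


RULES = [
    Rule("drop", src="198.51.100.0/24"),                        # competitor's block
    Rule("accept", dst="192.0.2.53/32", proto="udp", dport=53),  # the DNS server
    Rule("drop", proto="udp", dport=53),                        # DNS to anyone else
    Rule("drop", proto="tcp", flags=frozenset({"SYN"})),        # external TCP opens
    Rule("accept", proto="tcp"),                                # other TCP segments
]
```

Because no state is kept, the SYN rule cannot tell whether a segment with other flags really belongs to an established connection; it only blocks the first segment of a connection attempt.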
Such firewalls are often called stateless firewalls because they do not maintain any state about the TCP connections that pass through them.

Another type of firewall is the stateful firewall. A stateful firewall tracks the state of each TCP connection passing through it and maintains a TCB for each of these TCP connections. This TCB allows it to reassemble the received segments in order to extract their payload and perform verifications in the application layer. Some firewalls are able to inspect the URLs accessed using HTTP and log all URLs visited or block TCP connections where a dangerous URL is exchanged. Some firewalls can verify that SMTP commands are used when a TCP connection is established on port 25 or that a TCP connection on port 80 carries HTTP commands and responses.

Note: Beyond firewalls

Apart from firewalls, different types of security devices have been installed at the periphery of corporate networks. Intrusion Detection Systems (IDS), such as the popular snort, are stateful devices that are capable of matching reassembled segments against regular expressions corresponding to signatures of viruses, worms or other types of attacks. Deep Packet Inspection (DPI) is another type of middlebox that analyses the packet's payload and is able to reassemble TCP segments in order to detect inappropriate usages. While IDS are mainly used in corporate networks, DPI is mainly used in Internet Service Providers. Some ISPs use DPI to detect and limit

the bandwidth consumed by peer-to-peer applications. Some countries such as China or Iran use DPI to detect inappropriate Internet usage.

NAT

Network Address Translation (NAT) was proposed in [TE1993] and RFC 3022 as a short term solution to deal with the expected shortage of IPv4 addresses in the late 1980s - early 1990s. Combined with CIDR, NAT helped to significantly slow down the consumption of IPv4 addresses. A NAT is a middlebox that interconnects two networks that are using IPv4 addresses from different addressing spaces. Usually, one of these addressing spaces is the public Internet while the other is using the private IPv4 addresses defined in RFC 1918.

A very common deployment of NAT is in broadband access routers as shown in the figure below. The broadband access router interconnects a home network, either WiFi or Ethernet based, and the global Internet via one ISP over ADSL or CATV. A single IPv4 address is allocated to the broadband access router and network address translation allows all of the hosts attached to the home network to share a single public IPv4 address.

Figure 5.44: A simple NAT with one public IPv4 address

A second type of deployment is in enterprise networks as shown in the figure below. In this case, the NAT functionality is installed on a border router of the enterprise. A private IPv4 address is assigned to each enterprise host while the border router manages a pool containing several public IPv4 addresses.

Figure 5.45: An enterprise NAT with several public IPv4 addresses

As the name implies, a NAT is a device that translates IP addresses. A NAT maintains a mapping table between the private IP addresses used in the internal network and the public IPv4 addresses. NAT allows a large number of hosts to share a pool of IP addresses, as these hosts do not all access the global Internet at the same time.

The simplest NAT is a middlebox that uses a one-to-one mapping between a private IP address and a public IP address. To understand its operation, let us assume that a NAT, such as the one shown above, has just booted. When the NAT receives the first packet from source S in the internal network which is destined to the public Internet, it creates a mapping between internal address S and the first address of its pool of public addresses (P1).
Then, it translates the received packet so that it can be sent to the public Internet. This translation is performed as follows:

- the source address of the packet (S) is replaced by the mapped public address (P1)
- the checksum of the IP header is incrementally updated as its content has changed
- if the packet carried a TCP or UDP segment, the transport layer checksum found in the included segment must also be updated as it is computed over the segment and a pseudo-header that includes the source and destination addresses

When a packet destined to P1 is received from the public Internet, the NAT consults its mapping table to find S. The received packet is translated and forwarded in the internal network.

This works as long as the pool of public IP addresses of the NAT does not become empty. When the pool is empty, a mapping must be removed from the mapping table to allow a packet from a new host to be translated. This garbage collection can be implemented by adding to each entry in the mapping table a timestamp that contains the last utilisation time of a mapping entry. This timestamp is updated each time the corresponding entry is used. Then, the garbage collection algorithm can remove the oldest mapping entry in the table.

A drawback of such a simple enterprise NAT is the size of the pool of public IPv4 addresses, which is often too small to allow a large number of hosts to share such a NAT. In this case, a better solution is to allow the NAT to translate both IP addresses and port numbers.

Such a NAT maintains a mapping table that maps an internal IP address and TCP port number with an external IP address and TCP port number. When such a NAT receives a packet from the internal network, it performs a lookup in the mapping table with the packet's source IP address and source TCP port number. If a mapping is found, the source IP address and the source TCP port number of the packet are translated with the values found in the mapping table, the checksums are updated and the packet is sent to the global Internet. If no mapping is found, a new mapping is created with the first available pair (IP address, TCP port number) and the packet is translated. The entries of the mapping table are either removed at the end of the corresponding TCP connection, as the NAT tracks TCP connection state like a stateful firewall, or after some idle time.
When such a NAT receives a packet from the global Internet, it looks up its mapping table for the packet's destination IP address and destination TCP port number. If a mapping is found, the packet is translated and forwarded into the internal network. Otherwise, the packet is discarded as the NAT cannot determine to which particular internal host the packet should be forwarded.

With 2^16 different port numbers, a NAT may support a large number of hosts with a single public IPv4 address. However, it should be noted that some applications open a large number of TCP connections [Miyakawa2008]. Each of these TCP connections consumes one mapping entry in the NAT's mapping table.

NAT allows many hosts to share one or a few public IPv4 addresses. However, using NAT has two important drawbacks. First, it is difficult for external hosts to open TCP connections with hosts that are behind a NAT. Some consider this to be a benefit from a security perspective. However, a NAT should not be confused with a firewall as there are some techniques to traverse NATs. Second, NAT breaks the end-to-end transparency of the network and transport layers. The main problem arises when an application layer protocol uses IP addresses in some of the ADUs that it sends. A popular example is ftp defined in RFC 959. In this case, there is a mismatch between the packet header translated by the NAT and the packet payload. The only solution to this problem is to place an Application Level Gateway (ALG) on the NAT that understands the application layer protocol and can thus translate the IP addresses and port numbers found in the ADUs. However, defining an ALG for each application is costly and application developers should avoid using IP addresses in the messages exchanged in the application layer RFC 3235.
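The mapping table of a NAT that translates both addresses and port numbers, as described above, can be sketched as two dictionaries. This is a simplified illustration under several assumptions: a single public address, ports handed out in increasing order, no removal of idle or closed mappings, and no checksum update. The class and method names are hypothetical.

```python
class PortTranslatingNat:
    """Hypothetical NAT sharing one public IPv4 address between many hosts."""

    def __init__(self, public_ip, first_port=1024, last_port=65535):
        self.public_ip = public_ip
        self.free_ports = iter(range(first_port, last_port + 1))
        self.out = {}    # (private IP, private port) -> public port
        self.back = {}   # public port -> (private IP, private port)

    def outbound(self, src_ip, src_port):
        """Translate the source of a packet leaving the internal network."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = next(self.free_ports, None)   # first available port
            if port is None:
                raise RuntimeError("no free port: a mapping must be removed")
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        """Translate the destination of a packet coming from the Internet.
        Returns None when no mapping exists: the packet is discarded."""
        return self.back.get(dst_port)
```

The `inbound` method shows why external hosts cannot easily open connections through such a NAT: without an existing mapping there is no internal host to forward the packet to.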
Note: IPv6 and NAT

NAT has been very successful with IPv4. Given the size of the IPv6 addressing space, the IPv6 designers expected that NAT would never be useful with IPv6. The end-to-end transparency of IPv6 has been one of its key selling points compared to IPv4. However, the expected shortage of IPv4 addresses led enterprise network administrators to consider IPv6 more seriously. One of the results of this analysis is that the IETF defined NAT devices [WB2008] that are IPv6 specific. Another usage of NAT with IPv6 is to allow IPv6 hosts to access IPv4 destinations and conversely. The early IPv6 specifications included the Network Address Translation - Protocol Translation (NAT-PT) mechanism defined in RFC 2766. This mechanism was later deprecated in RFC 4966 but has been recently restarted under the name NAT64 [BMvB2009]. A NAT64 is a middlebox that performs the IPv6<->IPv4 packet translation to allow IPv6 hosts to contact IPv4 servers RFC 6144.

5.3 Routing in IP networks

In a large IP network such as the global Internet, routers need to exchange routing information. The Internet is an interconnection of networks, often called domains, that are under different responsibilities. As of this writing, the Internet is composed of more than 30,000 different domains and this number is still growing. A domain can be a small enterprise that manages a few routers in a single building, a larger enterprise with a hundred routers at multiple locations, or a large Internet Service Provider managing thousands of routers. Two classes of routing protocols are used to allow these domains to efficiently exchange routing information.

Figure 5.46: Organisation of a small Internet

The first class of routing protocols are the intradomain routing protocols (sometimes also called the interior gateway protocols or IGP). An intradomain routing protocol is used by all routers inside a domain to exchange routing information about the destinations that are reachable inside the domain. There are several intradomain routing protocols. Some domains use RIP, which is a distance vector protocol. Other domains use link-state routing protocols such as OSPF or IS-IS. Finally, some domains use static routing or proprietary protocols such as IGRP or EIGRP.

These intradomain routing protocols usually have two objectives. First, they distribute routing information that corresponds to the shortest path between two routers in the domain. Second, they should allow the routers to quickly recover from link and router failures.

The second class of routing protocols are the interdomain routing protocols (sometimes also called the exterior gateway protocols or EGP). The objective of an interdomain routing protocol is to distribute routing information between domains. For scalability reasons, an interdomain routing protocol must distribute aggregated routing information and consider each domain as a black box.
A very important difference between intradomain and interdomain routing is the routing policies that are used by each domain. Inside a single domain, all routers are considered equal, and when several routes are available to reach a given destination prefix, the best route is selected based on technical criteria such as the route with the shortest delay, the route with the minimum number of hops or the route with the highest bandwidth.

When we consider the interconnection of domains that are managed by different organisations, this is no longer true. Each domain implements its own routing policy. A routing policy is composed of three elements: an import filter that specifies which routes can be accepted by a domain, an export filter that specifies which routes can be advertised by a domain and a ranking algorithm that selects the best route when a domain knows several routes towards the same destination prefix. As we will see later, another important difference is that the objective of the interdomain routing protocol is to find the cheapest route towards each destination. There is only one interdomain routing protocol: BGP.

5.3.1 Intradomain routing

In this section, we briefly describe the key features of the two main intradomain unicast routing protocols: RIP and OSPF.

RIP

The Routing Information Protocol (RIP) is the simplest routing protocol that was standardised for the TCP/IP protocol suite. RIP is defined in RFC 2453. Additional information about RIP may be found in [Malkin1999].

RIP routers periodically exchange RIP messages. The format of these messages is shown below. A RIP message is sent inside a UDP segment whose destination port is set to 520 (the IPv6 version of RIP defined in RFC 2080 uses port 521). A RIP message contains several fields. The Cmd field indicates whether the RIP message is a request or a response. Routers send one or more RIP response messages every 30 seconds. These messages contain the distance vectors that summarize the router's routing table. The RIP request messages can be used by routers or hosts to query other routers about the content of their routing table. A typical usage is when a router boots and quickly wants to receive the RIP responses from its neighbours to compute its own routing table. The current version of RIP is version 2 defined in RFC 2453 for IPv4 and RFC 2080 for IPv6.

Figure 5.47: RIP message format

The RIP header contains an authentication field. This authentication can be used by network administrators to ensure that only the RIP messages sent by the routers that they manage are used to build the routing tables. RFC 2453 only supports a basic authentication scheme where all routers are configured with the same password and include this password in all RIP messages. This is not very secure since an attacker can learn the password by capturing a single RIP message. However, this password can protect against configuration errors. Stronger authentication schemes are described in RFC 2082 and RFC 4822, but the details of these mechanisms are outside the scope of this section.
Each RIP message contains a set of route entries. Each route entry is encoded as a 20 bytes field whose format is shown below. RIP was initially designed to be suitable for different network layer protocols. Some implementations of RIP were used in XNS or IPX networks. The first field of the RIP route entry is the Address Family Identifier (AFI). This identifier indicates the type of address found in the route entry [22]. In RIP messages, IPv4 uses AFI=2 (RFC 2453). The other important fields of the route entry are the IPv4 prefix, the netmask that indicates the length of the subnet identifier and is encoded as a 32 bits netmask, and the metric. Although the metric is encoded as a 32 bits field, the maximum RIP metric is 15 (for RIP, 16 = ∞).

Figure 5.48: Format of the RIP IPv4 route entries (RFC 2453)

With a 20 bytes route entry, it was difficult to use the same format as above to support IPv6. Instead of defining a variable length route entry format, the designers of RFC 2080 defined a new format that does not include an AFI field. The format of the route entries used by RFC 2080 is shown below. Plen is the length of the subnet identifier in bits and the metric is encoded as one byte. The maximum metric is still 15.

[22] The Address Family Identifiers are maintained by IANA at http://www.iana.org/assignments/address-family-numbers/
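The 20-byte IPv4 route entry described above can be encoded and decoded with a fixed-size structure. This is a sketch, not a RIP implementation: the layout follows RFC 2453, which also defines a route tag and a next hop field that the text above does not discuss, and which uses 2 as the address family identifier for IP. The function names and default values are hypothetical.

```python
import socket
import struct

RIP_INFINITY = 16   # a metric of 16 means that the destination is unreachable

# AFI (2 bytes), route tag (2), IPv4 prefix (4), netmask (4),
# next hop (4), metric (4): 20 bytes in network byte order.
ROUTE_ENTRY = struct.Struct("!HH4s4s4sI")


def pack_entry(prefix, netmask, metric, afi=2, tag=0, next_hop="0.0.0.0"):
    """Encode one RIPv2 route entry."""
    if not 1 <= metric <= RIP_INFINITY:
        raise ValueError("RIP metrics range from 1 to 16 (infinity)")
    return ROUTE_ENTRY.pack(afi, tag,
                            socket.inet_aton(prefix),
                            socket.inet_aton(netmask),
                            socket.inet_aton(next_hop),
                            metric)


def unpack_entry(data):
    """Decode one 20-byte RIPv2 route entry into a dictionary."""
    afi, tag, prefix, netmask, next_hop, metric = ROUTE_ENTRY.unpack(data)
    return {"afi": afi, "tag": tag,
            "prefix": socket.inet_ntoa(prefix),
            "netmask": socket.inet_ntoa(netmask),
            "next_hop": socket.inet_ntoa(next_hop),
            "metric": metric}
```

The metric occupies four bytes on the wire even though only the values 1 to 16 are meaningful, which is the waste that the one-byte metric of RFC 2080 avoids.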

Figure 5.49: Format of the RIP IPv6 route entries

Note: A note on timers

The first RIP implementations sent their distance vector exactly every 30 seconds. This worked well in most networks, but some researchers noticed that routers were sometimes overloaded because they were processing too many distance vectors at the same time [FJ1994]. They collected packet traces in these networks and found that after some time the routers' timers became synchronised, i.e. almost all routers were sending their distance vectors at almost the same time. This synchronisation of the transmission times of the distance vectors caused an overload on the routers' CPU but also increased the convergence time of the protocol in some cases. This was mainly due to the fact that all routers set their timers to the same expiration time after having processed the received distance vectors. Sally Floyd and Van Jacobson proposed in [FJ1994] a simple solution to solve this synchronisation problem. Instead of advertising their distance vector exactly after 30 seconds, a router should send its next distance vector after a delay chosen randomly in the [15,45] interval RFC 2080. This randomisation of the delays prevents the synchronisation that occurs with a fixed delay and is now a recommended practice for protocol designers.

OSPF

Link-state routing protocols are used in IP networks. Open Shortest Path First (OSPF), defined in RFC 2328, is the link state routing protocol that has been standardised by the IETF. The last version of OSPF, which supports IPv6, is defined in RFC 5340. OSPF is frequently used in enterprise networks and in some ISP networks. However, ISP networks often use the IS-IS link-state routing protocol [ISO10589], which was developed for the ISO CLNP protocol but was adapted to be used in IP RFC 1195 networks before the finalisation of the standardisation of OSPF. A detailed analysis of ISIS and OSPF may be found in [BMO2006] and [Perlman2000]. Additional information about OSPF may be found in [Moy1998].
Compared to the basics of link-state routing protocols that we discussed in section Link state routing, there are some particularities of OSPF that are worth discussing. First, in a large network, flooding the information about all routers and links to thousands of routers or more may be costly as each router needs to store all the information about the entire network. A better approach would be to introduce hierarchical routing. Hierarchical routing divides the network into regions. All the routers inside a region have detailed information about the topology of the region but only learn aggregated information about the topology of the other regions and their interconnections. OSPF supports a restricted variant of hierarchical routing. In OSPF's terminology, a region is called an area.

OSPF imposes restrictions on how a network can be divided into areas. An area is a set of routers and links that are grouped together. Usually, the topology of an area is chosen so that a packet sent by one router inside the area can reach any other router in the area without leaving the area [23]. An OSPF area contains two types of routers RFC 2328:

- Internal router: a router whose directly connected networks belong to the area
- Area border router: a router that is attached to several areas.

[23] OSPF can support virtual links to connect routers together that belong to the same area but are not directly connected. However, this goes beyond this introduction to OSPF.

For example, the network shown in the figure below has been divided into three areas: area 1, containing routers R1, R3, R4, R5 and RA, area 2, containing R7, R8, R9, R10, RB and RC, and the backbone area. OSPF areas are identified by a 32 bit integer, which is sometimes represented as an IP address. Among the OSPF areas, area 0, also called the backbone area, has a special role. The backbone area groups all the area border routers (routers RA, RB and RC in the figure below) and the routers that are directly connected to the backbone routers but do not belong to another area (router

RD in the figure below). An important restriction imposed by OSPF is that the path between two routers that belong to two different areas (e.g. R1 and R8 in the figure below) must pass through the backbone area.

Figure 5.50: OSPF areas

Inside each non-backbone area, routers distribute the topology of the area by exchanging link state packets with the other routers in the area. The internal routers do not know the topology of other areas, but each router knows how to reach the backbone area. Inside an area, the routers only exchange link-state packets for all destinations that are reachable inside the area. In OSPF, the inter-area routing is done by exchanging distance vectors. This is illustrated by the network topology shown below.

Let us first consider OSPF routing inside area 2. All routers in the area learn a route towards 192.168.1.0/24 and 192.168.10.0/24. The two area border routers, RB and RC, create network summary advertisements. Assuming that all links have a unit link metric, these would be:

- RB advertises 192.168.1.0/24 at a distance of 2 and 192.168.10.0/24 at a distance of 3
- RC advertises 192.168.1.0/24 at a distance of 3 and 192.168.10.0/24 at a distance of 2

These summary advertisements are flooded through the backbone area attached to routers RB and RC. In its routing table, router RA selects the summary advertised by RB to reach 192.168.1.0/24 and the summary advertised by RC to reach 192.168.10.0/24. Inside area 1, router RA advertises a summary indicating that 192.168.1.0/24 and 192.168.10.0/24 are both at a distance of 3 from itself.
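The selection performed by RA in this example works like a distance vector computation: RA adds its own distance to each area border router to the distance found in the summary and keeps the smallest total. The sketch below reproduces the numbers of the example. The backbone distances of 1 between RA and both RB and RC are an assumption, chosen because they are consistent with the distance of 3 that RA advertises; the function and variable names are hypothetical.

```python
def best_summaries(advertised, backbone_distance):
    """For each prefix, return (total distance, chosen area border router)."""
    best = {}
    for abr, prefixes in advertised.items():
        for prefix, distance in prefixes.items():
            total = backbone_distance[abr] + distance
            # Keep the summary that gives the shortest total distance.
            if prefix not in best or total < best[prefix][0]:
                best[prefix] = (total, abr)
    return best


# Summaries created by the area border routers of area 2 (from the text).
ADVERTISED = {
    "RB": {"192.168.1.0/24": 2, "192.168.10.0/24": 3},
    "RC": {"192.168.1.0/24": 3, "192.168.10.0/24": 2},
}

# Assumed distances from RA to RB and RC through the backbone area.
DISTANCE_FROM_RA = {"RB": 1, "RC": 1}

ROUTES_AT_RA = best_summaries(ADVERTISED, DISTANCE_FROM_RA)
```

Under these assumptions, `ROUTES_AT_RA` maps 192.168.1.0/24 to a distance of 3 via RB and 192.168.10.0/24 to a distance of 3 via RC, which matches the summary that RA advertises inside area 1.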
On the other hand, consider the prefixes 10.0.0.0/24 and 10.0.1.0/24 that are inside area 1. Router RA is the only area border router that is attached to this area. This router can create two different network summary advertisements:

- 10.0.0.0/24 at a distance of 1 and 10.0.1.0/24 at a distance of 2 from RA
- 10.0.0.0/23 at a distance of 2 from RA

The first summary advertisement provides precise information about the distance used to reach each prefix. However, all routers in the network have to maintain a route towards 10.0.0.0/24 and a route towards 10.0.1.0/24 that are both via router RA. The second advertisement would improve the scalability of OSPF by reducing the number of routes that are advertised across area boundaries. However, in practice this requires manual configuration on the border routers.

The second OSPF particularity that is worth discussing is the support of Local Area Networks (LAN). As shown in the example below, several routers may be attached to the same LAN.

Figure 5.51: Hierarchical routing with OSPF

Figure 5.52: An OSPF LAN containing several routers

A first solution to support such a LAN with a link-state routing protocol would be to consider that a LAN is equivalent to a full-mesh of point-to-point links, as each router can directly reach any other router on the LAN. However, this approach has two important drawbacks:

1. Each router must exchange HELLOs and link state packets with all the other routers on the LAN. This increases the number of OSPF packets that are sent and processed by each router.
2. Remote routers, when looking at the topology distributed by OSPF, consider that there is a full-mesh of links between all the LAN routers. Such a full-mesh implies a lot of redundancy in case of failure, while in practice the entire LAN may completely fail. In case of a failure of the entire LAN, all routers need to detect the failures and flood link state packets before the LAN is completely removed from the OSPF topology by remote routers.

To better represent LANs and reduce the number of OSPF packets that are exchanged, OSPF handles LANs differently. When OSPF routers boot on a LAN, they elect [24] one of them as the Designated Router (DR) RFC 2328. The DR router represents the local area network, and advertises the LAN's subnet (138.48.4.0/24 in the example above). Furthermore, LAN routers only exchange HELLO packets with the DR. Thanks to the utilisation of a DR, the topology of the LAN appears as a set of point-to-point links connected to the DR as shown in the figure below.

Figure 5.53: OSPF representation of a LAN

Note: How to quickly detect a link failure?

Network operators expect an OSPF network to be able to quickly recover from link or router failures [VPD2004].
In an OSPF network, the recovery after a failure is performed in three steps [FFEB2005]:

- the routers that are adjacent to the failure detect it quickly. The default solution is to rely on the regular exchange of HELLO packets. However, the interval between successive HELLOs is often set to 10 seconds... Setting the HELLO timer down to a few milliseconds is difficult as HELLO packets are created and processed by the main CPU of the routers and these routers cannot easily generate and process a HELLO packet every millisecond on each of their interfaces. A better solution is to use a dedicated failure detection protocol such as the Bidirectional Forwarding Detection (BFD) protocol defined in [KW2009] that can be implemented directly on the router interfaces. Another solution to be able to detect the failure is to instrument the physical and the datalink layer so that they can interrupt the router when a link fails. Unfortunately, such a solution cannot be used on all types of physical and datalink layers.
- the routers that have detected the failure flood their updated link state packets in the network
- all routers update their routing table

5.3.2 Interdomain routing

As explained earlier, the Internet is composed of more than 30,000 different networks [25] called domains. Each domain is composed of a group of routers and hosts that are managed by the same organisation. Example domains include belnet, sprint, level3, geant, abilene, cisco or google ...

[24] The OSPF Designated Router election procedure is defined in RFC 2328. Each router can be configured with a router priority that influences the election process since the router with the highest priority is preferred when an election is run.

[25] An analysis of the evolution of the number of domains on the global Internet during the last ten years may be found in http://www.potaroo.net/tools/asn32/

Each domain contains a set of routers. From a routing point of view, these domains can be divided into two classes: the transit and the stub domains. A stub domain sends and receives packets whose source or destination is one of its own hosts. A transit domain is a domain that provides a transit service for other domains, i.e. the routers in this domain forward packets whose source and destination do not belong to the transit domain. As of this writing, about 85% of the domains in the Internet are stub domains [26]. A stub domain that is connected to a single transit domain is called a single-homed stub. A multihomed stub is a stub domain connected to two or more transit providers.

Figure 5.54: Transit and stub domains

The stub domains can be further classified by considering whether they mainly send or receive packets. An access-rich stub domain is a domain that contains hosts that mainly receive packets. Typical examples include small ADSL- or cable modem-based Internet Service Providers or enterprise networks. On the other hand, a content-rich stub domain is a domain that mainly produces packets. Examples of content-rich stub domains include google, yahoo, microsoft, facebook or content distribution networks such as akamai or limelight. For the last few years, we have seen a rapid growth of these content-rich stub domains. Recent measurements [ATLAS2009] indicate that a growing fraction of all the packets exchanged on the Internet are produced in the data centers managed by these content providers.

Domains need to be interconnected to allow a host inside a domain to exchange IP packets with hosts located in other domains. From a physical perspective, domains can be interconnected in two different ways. The first solution is to directly connect a router belonging to the first domain with a router inside the second domain. Such links between domains are called private interdomain links or private peering links. In practice, for redundancy or performance reasons, distinct physical links are usually established between different routers in the two domains that are interconnected.
Figure 5.55: Interconnection of two domains via a private peering link

Such private peering links are useful when, for example, an enterprise or university network needs to be connected to its Internet Service Provider. However, some domains are connected to hundreds of other domains [27]. For some of these domains, using only private peering links would be too costly. A better solution to allow many domains to interconnect cheaply is the Internet eXchange Point (IXP). An IXP is usually some space in a data center that hosts routers belonging to different domains. A domain willing to exchange packets with other domains present at the IXP installs one of its routers on the IXP and connects it to other routers inside its own network. The IXP contains a Local Area Network to which all the participating routers are connected. When two domains that are present at the IXP wish [28] to exchange packets, they simply use the Local Area Network.

[26] Several web sites collect and analyse data about the evolution of BGP in the global Internet. http://bgp.potaroo.net provides lots of statistics and analyses that are updated daily.

[27] See http://as-rank.caida.org/ for an analysis of the interconnections between domains based on measurements collected in the global Internet.

[28] Two routers that are attached to the same IXP only exchange packets when the owners of their domains have an economical incentive to exchange packets on this IXP. Usually, a router on an IXP is only able to exchange packets with a small fraction of the routers that are present on the same IXP.

IXPs are very popular

in Europe and many Internet Service Providers and Content providers are present in these IXPs.

Figure 5.56: Interconnection of two domains at an Internet eXchange Point

In the early days of the Internet, domains would simply exchange all the routes they know to allow a host inside one domain to reach any host in the global Internet. However, in today's highly commercial Internet, this is no longer true as interdomain routing mainly needs to take into account the economical relationships between the domains. Furthermore, while intradomain routing usually prefers some routes over others based on their technical merits (e.g. prefer route with the minimum number of hops, prefer route with the minimum delay, prefer high bandwidth routes over low bandwidth ones, etc.), interdomain routing mainly deals with economical issues. For interdomain routing, the cost of using a route is often more important than the quality of the route measured by its delay or bandwidth.

There are different types of economical relationships that can exist between domains. Interdomain routing converts these relationships into peering relationships between domains that are connected via peering links.

The first category of peering relationship is the customer->provider relationship. Such a relationship is used when a customer domain pays an Internet Service Provider to be able to exchange packets with the global Internet over an interdomain link. A similar relationship is used when a small Internet Service Provider pays a larger Internet Service Provider to exchange packets with the global Internet.
Figure 5.57: A simple Internet with peering relationships

To understand the customer->provider relationship, let us consider the simple internetwork shown in the figure above. In this internetwork, AS7 is a stub domain that is connected to one provider: AS4. The contract between AS4 and AS7 allows a host inside AS7 to exchange packets with any host in the internetwork. To enable this exchange of packets, AS7 must know a route towards any domain and all the domains of the internetwork must know a route via AS4 that allows them to reach hosts inside AS7. From a routing perspective, the commercial contract between AS7 and AS4 leads to the following routes being exchanged:

 - over a customer->provider relationship, the customer domain advertises to its provider all its routes and all the routes that it has learned from its own customers.
 - over a provider->customer relationship, the provider advertises all the routes that it knows to its customer.

The second rule ensures that the customer domain receives a route towards all destinations that are reachable via its provider. The first rule allows the routes of the customer domain to be distributed throughout the Internet.


Coming back to the figure above, AS4 advertises to its two providers AS1 and AS2 its own routes and the routes learned from its customer, AS7. On the other hand, AS4 advertises to AS7 all the routes that it knows.

The second type of peering relationship is the shared-cost peering relationship. Such a relationship usually does not involve a payment from one domain to the other, in contrast with the customer->provider relationship. A shared-cost peering relationship is usually established between domains having a similar size and geographic coverage. For example, consider the figure above. If AS3 and AS4 exchange many packets via AS1, they both need to pay AS1. A cheaper alternative for AS3 and AS4 would be to establish a shared-cost peering. Such a peering can be established at IXPs where both AS3 and AS4 are present or by using private peering links. This shared-cost peering should be used to exchange packets between hosts inside AS3 and hosts inside AS4. However, AS3 does not want to receive on the AS3-AS4 shared-cost peering link packets whose destination belongs to AS1, as AS3 would have to pay to send these packets to AS1.

From a routing perspective, over a shared-cost peering relationship a domain only advertises its internal routes and the routes that it has learned from its customers. This restriction ensures that only packets destined to the local domain or one of its customers are received over the shared-cost peering relationship. This implies that the routes that have been learned from a provider or from another shared-cost peer are not advertised over a shared-cost peering relationship. This is motivated by economic reasons. If a domain were to advertise the routes that it learned from a provider over a shared-cost peering relationship that does not bring revenue, it would allow its shared-cost peer to use the link with its provider without any payment. If a domain were to advertise the routes it learned over a shared-cost peering over another shared-cost peering relationship, it would allow these shared-cost peers to use its own network (which may span one or more continents) freely to exchange packets.
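The export rules described so far for the customer->provider, provider->customer and shared-cost relationships can be summarised in a few lines of Python. This is only a minimal sketch: the relationship labels and the may_export function are hypothetical names chosen for the illustration, and real routers express such rules in their own configuration language.

```python
# Hypothetical labels: where a route was learned from, or to whom it would
# be advertised. "internal" denotes the routes of the local domain itself.
CUSTOMER, PROVIDER, PEER, INTERNAL = "customer", "provider", "shared-cost", "internal"

def may_export(learned_from: str, advertise_to: str) -> bool:
    """Return True if a route learned from `learned_from` may be
    advertised to a neighbour of type `advertise_to`."""
    if advertise_to == CUSTOMER:
        # provider->customer: the provider advertises all the routes it knows
        return True
    # towards a provider or a shared-cost peer: only the internal routes
    # and the routes learned from customers are advertised
    return learned_from in (INTERNAL, CUSTOMER)

# AS4 advertises the routes of its customer AS7 to its providers ...
print(may_export(CUSTOMER, PROVIDER))
# ... but does not offer routes learned from a provider to a shared-cost peer
print(may_export(PROVIDER, PEER))
```

The function returns a boolean instead of filtering a list so that it can be reused as the test inside any export filter.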
Finally, the last type of peering relationship is the sibling. Such a relationship is used when two domains exchange all their routes in both directions. In practice, such a relationship is only used between domains that belong to the same company.

These different types of relationships are implemented in the interdomain routing policies defined by each domain. The interdomain routing policy of a domain is composed of three main parts:

 - the import filter that specifies, for each peering relationship, the routes that can be accepted from the neighbouring domain (the non-acceptable routes are ignored and the domain never uses them to forward packets)
 - the export filter that specifies, for each peering relationship, the routes that can be advertised to the neighbouring domain
 - the ranking algorithm that is used to select the best route among all the routes that the domain has received towards the same destination prefix

A domain's import and export filters can be defined by using the Route Policy Specification Language (RPSL) specified in RFC 2622 [GAVE1999]. Some Internet Service Providers, notably in Europe, use RPSL to document [29] their import and export policies. Several tools help to easily convert a RPSL policy into router commands.

The figure below provides a simple example of import and export filters for two domains in a simple internetwork. In RPSL, the keyword ANY is used to represent any route from any domain. It is typically used by a provider to indicate that it announces all its routes to a customer over a provider->customer relationship. This is the case for AS4's export policy. The example below clearly shows the difference between a provider->customer and a shared-cost peering relationship. AS4's export filter indicates that it announces only its internal routes (AS4) and the routes learned from its clients (AS7) over its shared-cost peering with AS3, while it advertises all the routes that it uses (including the routes learned from AS3) to AS7.
Figure 5.58: Import and export policies

The Border Gateway Protocol

The Internet uses a single interdomain routing protocol: the Border Gateway Protocol (BGP). The current version of BGP is defined in RFC 4271. BGP differs from the intradomain routing protocols that we have already discussed in several ways. First, BGP is a path-vector protocol. When a BGP router advertises a route towards a prefix, it announces the IP prefix and the interdomain path used to reach this prefix. From BGP's point of view, each domain is identified by a unique Autonomous System (AS) number [30] and the interdomain path contains the AS numbers of the transit domains that are used to reach the associated prefix. This interdomain path is called the AS-Path. Thanks to these AS-Paths, BGP does not suffer from the count-to-infinity problems that affect distance vector routing protocols. Furthermore, the AS-Path can be used to implement some routing policies. Another difference between BGP and the intradomain routing protocols is that a BGP router does not send the entire contents of its routing table to its neighbours regularly. Given the size of the global Internet, routers would be overloaded by the number of BGP messages that they would need to process. BGP uses incremental updates, i.e. it only announces the routes that have changed to its neighbours.

[29] See ftp://ftp.ripe.net/ripe/dbase for the RIPE database that contains the import and export policies of many European ISPs.
[30] In this text, we consider Autonomous System and domain as synonyms. In practice, a domain may be divided into several Autonomous Systems, but we ignore this detail.

The figure below shows a simple example of the BGP routes that are exchanged between domains. In this example, prefix 1.0.0.0/8 is announced by AS1. AS1 advertises a BGP route towards this prefix to AS2. The AS-Path of this route indicates that AS1 is the originator of the prefix. When AS4 receives the BGP route from AS1, it re-announces it to AS2 and adds its AS number to the AS-Path. AS2 has learned two routes towards prefix 1.0.0.0/8. It compares the two routes and prefers the route learned from AS4 based on its own ranking algorithm. AS2 advertises to AS5 a route towards 1.0.0.0/8 with its AS-Path set to AS2:AS4:AS1. Thanks to the AS-Path, AS5 knows that if it sends a packet towards 1.0.0.0/8 the packet first passes through AS2, then through AS4 before reaching its destination inside AS1.
Figure 5.59: Simple exchange of BGP routes

BGP routers exchange routes over BGP sessions. A BGP session is established between two routers belonging to two different domains that are directly connected. As explained earlier, the physical connection between the two routers can be implemented as a private peering link or over an Internet eXchange Point. A BGP session between two adjacent routers runs above a TCP connection (the default BGP port is 179). In contrast with intradomain routing protocols that exchange IP packets or UDP segments, BGP runs above TCP because TCP ensures a reliable delivery of the BGP messages sent by each router without forcing the routers to implement acknowledgements, checksums, etc. Furthermore, the two routers consider the peering link to be up as long as the BGP session and the underlying TCP connection remain up [31]. The two endpoints of a BGP session are called BGP peers.

[31] The BGP sessions and the underlying TCP connection are typically established by the routers when they boot based on information found in their configuration. The BGP sessions are rarely released, except if the corresponding peering link fails or one of the endpoints crashes or needs to be rebooted.

Figure 5.60: A BGP peering session between two directly connected routers

In practice, to establish a BGP session between routers R1 and R2 in the figure above, the network administrator of AS3 must first configure on R1 the IP address of R2 on the R1-R2 link and the AS number of R2. Router R1 then regularly tries to establish the BGP session with R2. R2 only agrees to establish the BGP session with R1 once it has been configured with the IP address of R1 and its AS number. For security reasons, a router never establishes a BGP session that has not been manually configured on the router.

The BGP protocol (RFC 4271) defines several types of messages that can be exchanged over a BGP session:

 - OPEN: this message is sent as soon as the TCP connection between the two routers has been established. It initialises the BGP session and allows the negotiation of some options. Details about this message may be found in RFC 4271.
 - NOTIFICATION: this message is used to terminate a BGP session, usually because an error has been detected by the BGP peer. A router that sends or receives a NOTIFICATION message immediately shuts down the corresponding BGP session.
 - UPDATE: this message is used to advertise new or modified routes or to withdraw previously advertised routes.
 - KEEPALIVE: this message is used to ensure a regular exchange of messages on the BGP session, even when no route changes. When a BGP router has not sent an UPDATE message during the last 30 seconds, it shall send a KEEPALIVE message to confirm to the other peer that it is still up. If a peer does not receive any BGP message during a period of 90 seconds [32], the BGP session is considered to be down and all the routes learned over this session are withdrawn.

[32] 90 seconds is the default delay recommended by RFC 4271. However, two BGP peers can negotiate a different timer during the establishment of their BGP session. Using a too small interval to detect BGP session failures is not recommended. BFD [KW2009] can be used to replace BGP's KEEPALIVE mechanism if fast detection of interdomain link failures is required.

As explained earlier, BGP relies on incremental updates. This implies that when a BGP session starts, each router first sends BGP UPDATE messages to advertise to the other peer all the exportable routes that it knows. Once all these routes have been advertised, the BGP router only sends BGP UPDATE messages about a prefix if the route is new, one of its attributes has changed or the route became unreachable and must be withdrawn. The BGP UPDATE message allows BGP routers to efficiently exchange such information while minimising the number of bytes exchanged. Each UPDATE message contains:

 - a list of IP prefixes that are withdrawn
 - a list of IP prefixes that are (re-)advertised
 - the set of attributes (e.g. AS-Path) associated to the advertised prefixes

In the remainder of this chapter, and although all routing information is exchanged using BGP UPDATE messages, we assume for simplicity that a BGP message contains only information about one prefix and we use the words:

 - Withdraw message to indicate a BGP UPDATE message containing one route that is withdrawn
 - Update message to indicate a BGP UPDATE containing a new or updated route towards one destination prefix with its attributes
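The principle of incremental updates can be illustrated with a short Python sketch. It assumes, for the purpose of the example only, that a router remembers the routes it previously advertised to a peer as a dictionary mapping each prefix to its attributes; the build_update name and this representation are hypothetical and do not reflect the binary format of a real UPDATE message.

```python
def build_update(previously_advertised, exportable_now):
    """Compare what was advertised to a peer with what is exportable now.

    Both arguments map a prefix (string) to its attributes, here an
    AS-Path tuple. Returns (withdrawn, advertised): the prefixes to
    withdraw and the new or modified routes to (re-)advertise.
    """
    advertised = {prefix: attrs for prefix, attrs in exportable_now.items()
                  if previously_advertised.get(prefix) != attrs}
    withdrawn = sorted(prefix for prefix in previously_advertised
                       if prefix not in exportable_now)
    return withdrawn, advertised

before = {"1.0.0.0/8": (4, 1), "2.0.0.0/8": (3,)}
after = {"1.0.0.0/8": (1,), "3.0.0.0/8": (5,)}
withdrawn, advertised = build_update(before, after)
print(withdrawn)    # prefixes that became unreachable
print(advertised)   # new routes and routes whose attributes changed
```

When nothing has changed both results are empty and no UPDATE needs to be sent, which is exactly the situation in which KEEPALIVE messages keep the session alive.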


From a conceptual point of view, a BGP router connected to N BGP peers can be described as being composed of four parts as shown in the figure below.

Figure 5.61: Organisation of a BGP router

In this figure, the router receives BGP messages on the left part of the figure, processes these messages and possibly sends BGP messages on the right part of the figure. A BGP router contains four important data structures:

 - the Adj-RIB-In contains the BGP routes that have been received from each BGP peer. The routes in the Adj-RIB-In are filtered by the import filter before being placed in the Loc-RIB. There is one import filter per BGP peer.
 - the Local Routing Information Base (Loc-RIB) contains all the routes that are considered as acceptable by the router. The Loc-RIB may contain several routes, learned from different BGP peers, towards the same destination prefix.
 - the Forwarding Information Base (FIB) is used by the dataplane to forward packets towards their destination. The FIB contains, for each destination, the best route that has been selected by the BGP decision process. This decision process is an algorithm that selects, for each destination prefix, the best route according to the router's ranking algorithm that is part of its policy.
 - the Adj-RIB-Out contains the BGP routes that have been advertised to each BGP peer. The Adj-RIB-Out for a given peer is built by applying the peer's export filter on the routes that have been installed in the FIB. There is one export filter per BGP peer. For this reason, the Adj-RIB-Out of a peer may contain different routes than the Adj-RIB-Out of another peer.
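To make the role of these data structures more concrete, the sketch below models them with Python dictionaries. It is an illustration under simplifying assumptions, not the organisation of any real implementation: the class and method names are hypothetical, a route is reduced to its AS-Path tuple, and the filters and the ranking function are passed as parameters.

```python
from collections import defaultdict

class BGPRouterState:
    """Simplified model of the data structures of a BGP router."""

    def __init__(self):
        self.adj_rib_in = defaultdict(dict)   # peer -> {prefix: route received}
        self.loc_rib = defaultdict(dict)      # prefix -> {peer: acceptable route}
        self.fib = {}                         # prefix -> best route
        self.adj_rib_out = defaultdict(dict)  # peer -> {prefix: route advertised}

    def receive(self, peer, prefix, route, import_filter):
        """Store a received route and place it in the Loc-RIB if accepted."""
        self.adj_rib_in[peer][prefix] = route
        accepted = import_filter(peer, route)   # None means "rejected"
        if accepted is not None:
            self.loc_rib[prefix][peer] = accepted

    def decide(self, prefix, rank):
        """Install in the FIB the candidate with the smallest rank value."""
        candidates = list(self.loc_rib[prefix].values())
        if candidates:
            self.fib[prefix] = min(candidates, key=rank)
        else:
            self.fib.pop(prefix, None)

    def export(self, peer, prefix, export_filter):
        """Apply the peer's export filter to the route installed in the FIB."""
        best = self.fib.get(prefix)
        out = export_filter(peer, best) if best is not None else None
        if out is not None:
            self.adj_rib_out[peer][prefix] = out
        else:
            self.adj_rib_out[peer].pop(prefix, None)
        return out

state = BGPRouterState()
accept_all = lambda peer, route: route
state.receive("AS1", "1.0.0.0/8", (1,), accept_all)
state.receive("AS4", "1.0.0.0/8", (4, 1), accept_all)
state.decide("1.0.0.0/8", rank=len)    # here: prefer the shortest AS-Path
print(state.fib["1.0.0.0/8"])
```

Ranking by AS-Path length alone is only a placeholder for the decision process discussed later in this section.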
When a BGP session starts, the routers first exchange OPEN messages to negotiate the options that apply throughout the entire session. Then, each router extracts from its FIB the routes to be advertised to the peer. It is important to note that, for each known destination prefix, a BGP router can only advertise to a peer the route that it has itself installed inside its FIB. The routes that are advertised to a peer must pass the peer's export filter. The export filter is a set of rules that define which routes can be advertised over the corresponding session, possibly after having modified some of their attributes. One export filter is associated to each BGP session. For example, on a shared-cost peering, the export filter only selects the internal routes and the routes that have been learned from a customer. The pseudo-code below shows the initialisation of a BGP session.

    def initialize_BGP_session(RemoteAS, RemoteIP):
        # Initialize and start BGP session
        # Send BGP OPEN Message to RemoteIP on port 179
        # Follow BGP state machine
        # advertise local routes and routes learned from peers
        for d in BGPLocRIB:
            B = build_BGP_UPDATE(d)
            S = apply_export_filter(RemoteAS, B)
            if S is not None:
                send_UPDATE(S, RemoteAS, RemoteIP)
        # entire RIB has been sent
        # new Updates will be sent to reflect local or distant
        # changes in routers

In the above pseudo-code, the build_BGP_UPDATE(d) procedure extracts from the BGP Loc-RIB the best path towards destination d (i.e. the route installed in the FIB) and prepares the corresponding BGP UPDATE message.


This message is then passed to the export filter that returns None if the route cannot be advertised to the peer or the (possibly modified) BGP UPDATE message to be advertised. BGP routers allow network administrators to specify very complex export filters, see e.g. [WMS2004]. A simple export filter that implements the equivalent of split horizon is shown below.

    def apply_export_filter(RemoteAS, BGPMsg):
        # check if RemoteAS already received route
        if RemoteAS in BGPMsg.ASPath:
            BGPMsg = None
        # Many additional export policies can be configured:
        # Accept or refuse the BGPMsg
        # Modify selected attributes inside BGPMsg
        return BGPMsg

At this point, the remote router has received all the exportable BGP routes. After this initial exchange, the router only sends BGP UPDATE messages when there is a change (addition of a route, removal of a route or change in the attributes of a route) in one of these exportable routes. Such a change can happen when the router receives a BGP message. The pseudo-code below summarizes the processing of these BGP messages.

    def Recvd_BGPMsg(Msg, RemoteAS):
        B = apply_import_filter(RemoteAS, Msg)
        if B is None:  # Msg not acceptable
            return
        if IsUPDATE(Msg):
            Old_Route = BestRoute(Msg.prefix)
            Insert_in_RIB(Msg)
            Run_Decision_Process(RIB)
            if BestRoute(Msg.prefix) != Old_Route:
                # best route changed
                B = build_BGP_Message(Msg.prefix)
                S = apply_export_filter(RemoteAS, B)
                if S is not None:  # announce best route
                    send_UPDATE(S, RemoteAS, RemoteIP)
                elif Old_Route is not None:
                    send_WITHDRAW(Msg.prefix, RemoteAS, RemoteIP)
        else:  # Msg is WITHDRAW
            Old_Route = BestRoute(Msg.prefix)
            Remove_from_RIB(Msg)
            Run_Decision_Process(RIB)
            if BestRoute(Msg.prefix) != Old_Route:
                # best route changed
                B = build_BGP_Message(Msg.prefix)
                S = apply_export_filter(RemoteAS, B)
                if S is not None:  # still one best route towards Msg.prefix
                    send_UPDATE(S, RemoteAS, RemoteIP)
                elif Old_Route is not None:  # No best route anymore
                    send_WITHDRAW(Msg.prefix, RemoteAS, RemoteIP)

When a BGP message is received, the router first applies the peer's import filter to verify whether the message is
acceptable or not. If the message is not acceptable, the processing stops. The pseudo-code below shows a simple import filter. This import filter accepts all routes, except those that already contain the local AS in their AS-Path. If such a route was used, it would cause a routing loop. Another example of an import filter would be a filter used by an Internet Service Provider on a session with a customer to only accept routes towards the IP prefixes assigned to the customer by the provider. On real routers, import filters can be much more complex and some import filters modify the attributes of the received BGP UPDATE [WMS2004].

    def apply_import_filter(RemoteAS, BGPMsg):
        if MyAS in BGPMsg.ASPath:
            BGPMsg = None
        # Many additional import policies can be configured:
        # Accept or refuse the BGPMsg
        # Modify selected attributes inside BGPMsg
        return BGPMsg
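The two import filters mentioned above, loop detection and the restriction to the prefixes assigned to a customer, can be combined in a runnable sketch that relies on Python's standard ipaddress module. The local AS number, the function name and its parameters are assumptions made for the example.

```python
import ipaddress

MY_AS = 20  # hypothetical local AS number

def import_filter(as_path, prefix, allowed=None):
    """Return True when a received route is acceptable.

    as_path : sequence of AS numbers carried by the route
    prefix  : advertised prefix, e.g. "194.100.0.0/24"
    allowed : optional list of prefixes assigned to the customer on this
              session; when given, only these prefixes (or more specific
              ones) are accepted
    """
    if MY_AS in as_path:
        return False              # using this route would cause a loop
    if allowed is not None:
        net = ipaddress.ip_network(prefix)
        return any(net.version == a.version and net.subnet_of(a)
                   for a in map(ipaddress.ip_network, allowed))
    return True

print(import_filter((10, 20, 30), "1.0.0.0/8"))
print(import_filter((10,), "194.100.0.0/24", allowed=["194.100.0.0/23"]))
```

The version check before subnet_of avoids comparing an IPv4 prefix with an IPv6 one, which the ipaddress module refuses to do.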


Note: The bogon filters

Another example of frequently used import filters are the filters that Internet Service Providers use to ignore bogon routes. In the ISP community, a bogon route is a route that should not be advertised on the global Internet. Typical examples include the private IPv4 prefixes defined in RFC 1918, the loopback prefixes (127.0.0.0/8 and ::1/128) or the IP prefixes that have not yet been allocated by IANA. A well managed BGP router should ensure that it never advertises bogons on the global Internet. Detailed information about these bogons may be found at http://www.team-cymru.org/Services/Bogons/

If the import filter accepts the BGP message, the pseudo-code distinguishes two cases. If this is an Update message for prefix p, this can be a new route for this prefix or a modification of the route's attributes. The router first retrieves from its RIB the best route towards prefix p. Then, the new route is inserted in the RIB and the BGP decision process is run to find whether the best route towards destination p changes. A BGP message only needs to be sent to the router's peers if the best route has changed. For each peer, the router applies the export filter to verify whether the route can be advertised. If yes, the filtered BGP message is sent. Otherwise, a Withdraw message is sent. When the router receives a Withdraw message, it also verifies whether the removal of the route from its RIB caused its best route towards this prefix to change. It should be noted that, depending on the content of the RIB and the export filters, a BGP router may need to send a Withdraw message to a peer after having received an Update message from another peer and conversely.

Let us now discuss in more detail the operation of BGP in an IPv4 network. For this, let us consider the simple network composed of three routers located in three different ASes and shown in the figure below.
Figure 5.62: Utilisation of the BGP nexthop attribute

This network contains three routers: R1, R2 and R3. Each router is attached to a local IPv4 subnet that it advertises using BGP. There are two BGP sessions, one between R1 and R2 and the second between R2 and R3. A /30 subnet is used on each interdomain link (195.100.0.0/30 on R1-R2 and 195.100.0.4/30 on R2-R3). The BGP sessions run above TCP connections established between the neighbouring routers (e.g. 195.100.0.1 - 195.100.0.2 for the R1-R2 session).

Let us assume that the R1-R2 BGP session is the first to be established. A BGP Update message sent on such a session contains three fields:

 - the advertised prefix
 - the BGP nexthop
 - the attributes including the AS-Path

We use the notation U(prefix, nexthop, attributes) to represent such a BGP Update message in this section. Similarly, W(prefix) represents a BGP withdraw for the specified prefix. Once the R1-R2 session has been established, R1 sends U(194.100.0.0/24, 195.100.0.1, AS10) to R2 and R2 sends U(194.100.2.0/23, 195.100.0.2, AS20). At this point, R1 can reach 194.100.2.0/23 via 195.100.0.2 and R2 can reach 194.100.0.0/24 via 195.100.0.1.

Once the R2-R3 session has been established, R3 sends U(194.100.1.0/24, 195.100.0.6, AS30). R2 announces on the R2-R3 session all the routes inside its RIB. It thus sends to R3: U(194.100.0.0/24, 195.100.0.5, AS20:AS10) and U(194.100.2.0/23, 195.100.0.5, AS20). Note that when R2 advertises the route that it learned from R1, it updates the BGP nexthop and adds its AS number to the AS-Path. R2 also sends U(194.100.1.0/24, 195.100.0.2, AS20:AS30) to R1 on the R1-R2 session. At this point, all BGP routes have been exchanged and all routers can reach 194.100.0.0/24, 194.100.2.0/23 and 194.100.1.0/24.
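The way R2 rewrites the route learned from R1 before sending it to R3 can be reproduced with a few lines of Python. The U tuple mirrors the notation used above; the readvertise_ebgp helper and its parameter names are hypothetical and only serve this illustration.

```python
from typing import NamedTuple, Tuple

class U(NamedTuple):
    """BGP Update in the U(prefix, nexthop, attributes) notation,
    with the attributes reduced to the AS-Path."""
    prefix: str
    nexthop: str
    as_path: Tuple[int, ...]

def readvertise_ebgp(update: U, local_as: int, local_ip: str) -> U:
    """Re-advertise a route to a router of another AS: the nexthop becomes
    the local address on the link and the local AS number is prepended."""
    return U(update.prefix, local_ip, (local_as,) + update.as_path)

from_r1 = U("194.100.0.0/24", "195.100.0.1", (10,))
to_r3 = readvertise_ebgp(from_r1, local_as=20, local_ip="195.100.0.5")
print(to_r3.nexthop, ":".join(f"AS{n}" for n in to_r3.as_path))
```

The result corresponds to the message U(194.100.0.0/24, 195.100.0.5, AS20:AS10) that R2 sends to R3 in the example.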


If the link between R2 and R3 fails, R3 detects the failure as it did not receive KEEPALIVE messages recently from R2. At this time, R3 removes from its RIB all the routes learned over the R2-R3 BGP session. R2 also removes from its RIB the routes learned from R3. R2 also sends W(194.100.1.0/24) to R1 over the R1-R2 BGP session since it does not have a route anymore towards this prefix.

Note: Origin of the routes advertised by a BGP router

A frequent practical question about the operation of BGP is how a BGP router decides to originate or advertise a route for the first time. In practice, this occurs in two situations:

 - the router has been manually configured by the network operator to always advertise one or several routes on a BGP session. For example, on the BGP session between UCLouvain and its provider, belnet, UCLouvain's router always advertises the 130.104.0.0/16 IPv4 prefix assigned to the campus network
 - the router has been configured by the network operator to advertise over its BGP session some of the routes that it learns with its intradomain routing protocol. For example, an enterprise router may advertise over a BGP session with its provider the routes to remote sites when these routes are reachable and advertised by the intradomain routing protocol

The first solution is the most frequent. Advertising routes learned from an intradomain routing protocol is not recommended because, if the route flaps [33], this would cause a large number of BGP messages to be exchanged in the global Internet.

Most networks that use BGP contain more than one router. For example, consider the network shown in the figure below where AS20 contains two routers attached to interdomain links: R2 and R4. In this network, two routing protocols are used by R2 and R4. They use an intradomain routing protocol such as OSPF to distribute the routes towards the internal prefixes: 195.100.0.8/30, 195.100.0.0/30, ... R2 and R4 also use BGP. R2 receives the routes advertised by AS10 while R4 receives the routes advertised by AS30. These two routers need to exchange the routes that they have respectively received over their BGP sessions.
Figure 5.63: A larger network using BGP

A first solution to allow R2 and R4 to exchange the interdomain routes that they have learned over their respective BGP sessions would be to configure the intradomain routing protocol to distribute inside AS20 the routes learned over the BGP sessions. Although current routers support this feature, this is a bad solution for two reasons:

1. Intradomain routing protocols cannot distribute the attributes that are attached to a BGP route. If R4 received via the intradomain routing protocol a route towards 194.100.0.0/23 that R2 learned via BGP, it would not know that the route was originated by AS10 and the only advertisement that it could send to R3 would contain an incorrect AS-Path
2. Intradomain routing protocols have not been designed to support the hundreds of thousands of routes that a BGP router can receive on today's global Internet.

[33] A link is said to be flapping if it switches several times between an operational state and a disabled state within a short period of time. A router attached to such a link would need to frequently send routing messages.


The best solution to allow BGP routers to distribute, inside an AS, all the routes learned over BGP sessions is to establish BGP sessions among all the BGP routers inside the AS. In practice, there are two types of BGP sessions:

 - eBGP session or external BGP session. Such a BGP session is established between two routers that are directly connected and belong to two different domains.
 - iBGP session or internal BGP session. Such a BGP session is established between two routers belonging to the same domain. These two routers do not need to be directly connected.

In practice, each BGP router inside a domain maintains an iBGP session with every other BGP router in the domain [34]. This creates a full-mesh of iBGP sessions among all BGP routers of the domain. iBGP sessions, like eBGP sessions, run over TCP connections. Note that in contrast with eBGP sessions that are established between directly connected routers, iBGP sessions are often established between routers that are not directly connected.

An important point to note about iBGP sessions is that a BGP router only advertises a route over an iBGP session provided that:

 - the router uses this route to forward packets, and
 - the route was learned over one of the router's eBGP sessions

A BGP router does not advertise a route that it has learned over an iBGP session over another iBGP session. Note that a router can, of course, advertise over an eBGP session a route that it has learned over an iBGP session. This difference between the behaviour of a BGP router over iBGP and eBGP sessions is due to the utilisation of a full-mesh of iBGP sessions. Consider a network containing three BGP routers: A, B and C interconnected via a full-mesh of iBGP sessions. If router A learns a route towards prefix p from router B, router A does not need to advertise the received route to router C since router C also learns the same route over the C-B iBGP session.
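Both the cost of the full-mesh and the advertisement rule can be expressed compactly. The sketch below is an illustration only: the function names are hypothetical, session types are represented by the strings "eBGP" and "iBGP", and routes originated locally by the router are not modelled.

```python
def ibgp_sessions(n: int) -> int:
    """Number of iBGP sessions in a full-mesh of n BGP routers:
    one session per pair of routers, i.e. n(n-1)/2."""
    return n * (n - 1) // 2

def may_advertise(learned_over: str, send_over: str) -> bool:
    """A route learned over an iBGP session is never re-advertised over
    another iBGP session; every other combination is allowed (subject to
    the export filter and to the route being the one used to forward)."""
    return not (learned_over == "iBGP" and send_over == "iBGP")

print(ibgp_sessions(3))     # routers A, B and C of the example
print(ibgp_sessions(100))   # the full-mesh grows quadratically
print(may_advertise("iBGP", "iBGP"))
```

The quadratic growth shown by ibgp_sessions explains why a full-mesh is only practical in domains with a limited number of BGP routers.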
To understand the utilisation of an iBGP session, let us consider what happens when router R1 sends U(194.100.0.0/23, 195.100.0.1, AS10) in the network shown below. This BGP message is processed by R2 which advertises it over its iBGP session with R4. The BGP Update sent by R2 contains the same nexthop and the same AS-Path as in the BGP Update received by R2. R4 then sends U(194.100.0.0/23, 195.100.0.5, AS20:AS10) to R3. Note that the BGP nexthop and the AS-Path are only updated [35] when a BGP route is advertised over an eBGP session.

Figure 5.64: iBGP and eBGP sessions

[34] Using a full-mesh of iBGP sessions is suitable in small networks. However, this solution does not scale in large networks containing hundreds or more routers since n(n-1)/2 iBGP sessions must be established in a domain containing n BGP routers. Large domains use either Route Reflection (RFC 4456) or confederations (RFC 5065) to scale their iBGP, but this goes beyond this introduction.
[35] Some routers, when they receive a BGP Update over an eBGP session, set the nexthop of the received route to one of their own addresses. This is called nexthop-self. See e.g. [WMS2004] for additional details.

Note: Loopback interfaces and iBGP sessions

In addition to their physical interfaces, routers can also be configured with a special loopback interface. A loopback interface is a software interface that is always up. When a loopback interface is configured on a router, the address associated to this interface is advertised by the intradomain routing protocol. Consider for example a router with two point-to-point interfaces and one loopback interface. When a point-to-point interface fails, it becomes unreachable and the router cannot receive anymore packets via this IP address. This is not the case for the loopback interface. It remains reachable as long as at least one of the router's interfaces remains up. iBGP sessions are usually established using the router's loopback addresses as endpoints. This allows the iBGP session and its underlying TCP connection to remain up even if physical interfaces fail on the routers.

Now that routers can learn interdomain routes over iBGP and eBGP sessions, let us examine what happens when router R3 sends a packet destined to 194.100.1.234. R3 forwards this packet to R4. R4 uses an intradomain routing protocol and BGP. Its BGP routing table contains the following longest prefix match:

    194.100.0.0/23 via 195.100.0.1

This route indicates that to forward a packet towards 194.100.0.0/23, R4 needs to forward the packet along the route towards 195.100.0.1. However, R4 is not directly connected to 195.100.0.1. R4 learned a route that matches this address thanks to its intradomain routing protocol that distributed the following routes:

    195.100.0.0/30 via 195.100.0.10
    195.100.0.4/30 East
    195.100.0.8/30 North
    194.100.2.0/23 via 195.100.0.10
    194.100.4.0/23 West

To build its forwarding table, R4 must combine the routes learned from the intradomain routing protocol with the routes learned from BGP. Thanks to its intradomain routing table, for each interdomain route R4 replaces the BGP nexthop with its shortest path computed by the intradomain routing protocol. In the figure above, R4 forwards packets to 194.100.0.0/23 via 195.100.0.10 to which it is directly connected via its North interface.
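This resolution of a BGP nexthop through the intradomain routes can be sketched in Python with the standard ipaddress module. The tables reproduce the routes of R4 listed above, with the West prefix written as the valid /23 network address 194.100.4.0/23; the table layout and the function names are assumptions made for the example, and no error handling is provided for addresses that match no route.

```python
import ipaddress

# Intradomain routes of R4: prefix -> (gateway, interface). A directly
# connected prefix has an interface and no gateway.
IGP = {
    "195.100.0.0/30": ("195.100.0.10", None),
    "195.100.0.4/30": (None, "East"),
    "195.100.0.8/30": (None, "North"),
    "194.100.2.0/23": ("195.100.0.10", None),
    "194.100.4.0/23": (None, "West"),
}
BGP = {"194.100.0.0/23": "195.100.0.1"}   # prefix -> BGP nexthop

def longest_match(table, address):
    """Return the most specific prefix of `table` containing `address`."""
    addr = ipaddress.ip_address(address)
    matches = [ipaddress.ip_network(p) for p in table
               if addr in ipaddress.ip_network(p)]
    return str(max(matches, key=lambda n: n.prefixlen)) if matches else None

def resolve(address):
    """Return the (directly connected nexthop, outgoing interface) used to
    reach `address` according to the intradomain routes."""
    gateway, interface = IGP[longest_match(IGP, address)]
    if interface is not None:
        return address, interface            # directly connected
    return gateway, resolve(gateway)[1]      # reached through a gateway

def forwarding_entry(bgp_prefix):
    """Replace the BGP nexthop by a directly connected nexthop."""
    return resolve(BGP[bgp_prefix])

print(forwarding_entry("194.100.0.0/23"))
```

The recursion in resolve mirrors the two lookups performed by R4: one to find the route towards the BGP nexthop, and one to find the interface towards the gateway of that route.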
R4's resulting forwarding table, which associates an outgoing interface with each directly connected prefix, and a directly connected nexthop together with an outgoing interface with each of the other prefixes, including those learned via BGP, is shown below:

    194.100.0.0/23 via 195.100.0.10 (North)
    195.100.0.0/30 via 195.100.0.10 (North)
    195.100.0.4/30 East
    195.100.0.8/30 North
    194.100.2.0/23 via 195.100.0.10 (North)
    194.100.4.0/23 West

There is thus a coupling between the interdomain and the intradomain routing tables. If the intradomain routes change, e.g. due to link failures or changes in link metrics, then the forwarding table must be updated on each router as the shortest path towards a BGP nexthop may have changed.

The last point to be discussed before looking at the BGP decision process is that a network may contain routers that do not maintain any eBGP session. These routers can be stub routers attached to a single router in the network or core routers that reside on the path between two border routers that are using BGP as illustrated in the figure below.

Figure 5.65: How to deal with non-BGP routers?

In the scenario above, router R2 needs to be able to forward a packet towards any destination in the 12.0.0.0/8 prefix inside AS30. Such a packet would need to be forwarded by router R5 since this router resides on the path between R2 and its BGP nexthop attached to R4. Two solutions can be used to ensure that R2 is able to forward such interdomain packets:

 - enable BGP on router R5 and include this router in the iBGP full-mesh. Two iBGP sessions would be added in the figure above: R2-R5 and R4-R5. This solution works and is used by many ASes. However, it forces all routers to have enough resources (CPU and memory) to run BGP and maintain a large forwarding table
 - encapsulate the interdomain packets sent through the AS so that router R5 never needs to forward a packet whose destination is outside the local AS. Different encapsulation mechanisms exist. MultiProtocol Label Switching (MPLS) (RFC 3031) and the Layer 2 Tunneling Protocol (L2TP) (RFC 3931) are frequently used in large domains, but a detailed explanation of these techniques is outside the scope of this section. The simplest encapsulation scheme to understand is IP in IP, defined in RFC 2003. This encapsulation scheme places an IP packet (called the inner packet), including its payload, as the payload of a larger IP packet (called the outer packet). It can be used by border routers to forward packets via routers that do not maintain a BGP routing table. For example, in the figure above, if router R2 needs to forward a packet towards destination 12.0.0.1, it can add at the front of this packet an IPv4 header whose source address is set to one of its IPv4 addresses and whose destination address is one of the IPv4 addresses of R4. The Protocol field of the IP header is set to 4 to indicate that it contains an IPv4 packet. The packet is forwarded by R5 to R4 based on the forwarding table that it built thanks to its intradomain routing table. Upon reception of the packet, R4 removes the outer header and consults its (BGP) forwarding table to forward the packet towards R3.

The BGP decision process

Besides the import and export filters, a key difference between BGP and the intradomain routing protocols is that each domain can define its own ranking algorithm to determine which route is chosen to forward packets when several routes have been learned towards the same prefix. This ranking depends on several BGP attributes that can be attached to a BGP route.

The first BGP attribute that is used to rank BGP routes is the local-preference (local-pref) attribute. This attribute is an unsigned integer that is attached to each BGP route received over an eBGP session by the associated import filter.

When comparing routes towards the same destination prefix, a BGP router always prefers the routes with the highest local-pref. If the BGP router knows several routes with the same local-pref, it prefers among the routes having this local-pref the ones with the shortest AS-Path.

The local-pref attribute is often used to prefer some routes over others. This attribute is always present inside BGP Updates exchanged over iBGP sessions, but never present in the messages exchanged over eBGP sessions.

A common utilisation of local-pref is to support backup links. Consider the situation depicted in the figure below.
AS1 would always like to use the high bandwidth link to send and receive packets via AS2 and only use the backup link upon failure of the primary one.

As BGP routers always prefer the routes with the highest local-pref attribute, this policy can be implemented using the following import filter on R1:

    import: from AS2 RA at R1 set localpref=100;
            from AS2 RB at R1 set localpref=200;
            accept ANY

With this import filter, all the BGP routes learned from RB over the high bandwidth link are preferred over the routes learned over the backup link. If the primary link fails, the corresponding routes are removed from R1's RIB and R1 uses the route learned from RA. R1 reuses the routes via RB as soon as they are advertised by RB once the R1-RB link comes back.
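The effect of this import filter on route selection can be checked with a small Python sketch. It only covers the two criteria introduced so far (highest local-pref, then shortest AS-Path); the dictionary layout and the best_route name are hypothetical.

```python
def best_route(routes):
    """Select the route with the highest local-pref and, among routes with
    the same local-pref, the one with the shortest AS-Path.
    Each route is a dict with the keys 'via', 'local_pref' and 'as_path'."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# Routes received by R1 from AS2 after the import filter has been applied
routes = [
    {"via": "RA", "local_pref": 100, "as_path": (2,)},  # backup link
    {"via": "RB", "local_pref": 200, "as_path": (2,)},  # high bandwidth link
]
print(best_route(routes)["via"])      # primary link is up
print(best_route(routes[:1])["via"])  # primary link failed, route via RB removed
```

Because local-pref is compared first, the route via RB wins whenever it is present, whatever the length of its AS-Path.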


Figure 5.66: How to create a backup link with BGP?

The import filter above modifies the selection of the BGP routes inside AS1. Thus, it influences the route followed by the packets forwarded by AS1. In addition to using the primary link to send packets, AS1 would like to receive its packets via the high bandwidth link. For this, AS2 also needs to set the local-pref attribute in its import filter.

    import: from AS1 R1 at RA set localpref=100;
            from AS1 R1 at RB set localpref=200;
            accept AS1

Sometimes, the local-pref attribute is used to prefer a cheap link compared to a more expensive one. For example, in the network below, AS1 could wish to send and receive packets mainly via its interdomain link with AS4.

Figure 5.67: How to prefer a cheap link over a more expensive one?

AS1 can install the following import filter on R1 to ensure that it always sends packets via R2 when it has learned a route via AS2 and another via AS4.

    import: from AS2 RA at R1 set localpref=100;
            from AS4 R2 at R1 set localpref=200;
            accept ANY

However, this import filter does not influence how AS3, for example, prefers some routes over others. If the link between AS3 and AS2 is less expensive than the link between AS3 and AS4, AS3 could send all its packets via AS2 and AS1 would receive packets over its expensive link. An important point to remember about local-pref is that it can be used to prefer some routes over others to send packets, but it has no influence on the routes followed by received packets.

Another important utilisation of the local-pref attribute is to support the customer->provider and shared-cost peering relationships. From an economic point of view, there is an important difference between the provider->customer, customer->provider and shared-cost peering relationships. A domain usually earns money when it sends packets over a provider->customer relationship. On the other hand, it must pay its provider when it sends packets over a customer->provider relationship.


Using a shared-cost peering to send packets is usually neutral from an economic perspective. To take into account these economic issues, domains usually configure the import filters on their routers as follows:

- insert a high local-pref attribute in the routes learned from a customer
- insert a medium local-pref attribute in the routes learned over a shared-cost peering
- insert a low local-pref attribute in the routes learned from a provider

With such an import filter, the routers of a domain always prefer to reach destinations via their customers whenever such a route exists. Otherwise, they prefer to use shared-cost peering relationships and they only send packets via their providers when they do not know any alternate route. A consequence of setting the local-pref attribute like this is that Internet paths are often asymmetrical. Consider for example the internetwork shown in the figure below.

Figure 5.68: Asymmetry of Internet paths

Consider in this internetwork the routes available inside AS1 to reach AS5. AS1 learns the AS4:AS6:AS7:AS5 path from AS4, the AS3:AS8:AS5 path from AS3 and the AS2:AS5 path from AS2. The first path is chosen since it was learned from a customer. AS5 on the other hand receives three paths towards AS1 via its providers. It may select any of these paths to reach AS1, depending on how it prefers one provider over the others.
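The following Python sketch illustrates why AS1 selects the longest of its three AS paths. The numeric local-pref values are assumptions (only their relative order matters). The text only states that AS4 is a customer of AS1; treating AS3 as a shared-cost peer and AS2 as a provider is an assumption made for this example.

```python
# Assumed local-pref values; only their relative order matters.
LOCAL_PREF = {"customer": 300, "shared-cost": 200, "provider": 100}

# Relationships of AS1 with its neighbours. Only "AS4 is a customer"
# comes from the text; the other two entries are assumptions.
RELATIONSHIP = {"AS4": "customer", "AS3": "shared-cost", "AS2": "provider"}

# Routes towards AS5 known by AS1: (neighbour, AS path).
ROUTES_TO_AS5 = [
    ("AS4", ["AS4", "AS6", "AS7", "AS5"]),
    ("AS3", ["AS3", "AS8", "AS5"]),
    ("AS2", ["AS2", "AS5"]),
]


def local_pref(route):
    """local-pref inserted by the import filter for this route."""
    neighbour, _as_path = route
    return LOCAL_PREF[RELATIONSHIP[neighbour]]


# local-pref is compared before the AS-Path length, so the customer
# route wins although its AS path is the longest.
chosen = max(ROUTES_TO_AS5, key=local_pref)
print(chosen[0], len(chosen[1]))  # AS4 4
```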
Coming back to the organisation of a BGP router shown in figure Organisation of a BGP router, the last part to be discussed is the BGP decision process. The BGP Decision Process is the algorithm used by routers to select the route to be installed in the FIB when there are multiple routes towards the same prefix. The BGP decision process receives a set of candidate routes towards the same prefix and uses seven steps. At each step, some routes are removed from the candidate set and the process stops when the set only contains one route [36]:

1. Ignore routes having an unreachable BGP nexthop
2. Prefer routes having the highest local-pref
3. Prefer routes having the shortest AS-Path
4. Prefer routes having the smallest MED
5. Prefer routes learned via eBGP sessions over routes learned via iBGP sessions
6. Prefer routes having the closest next-hop
7. Tie breaking rules: prefer routes learned from the router with lowest router id

[36] Some BGP implementations can be configured to install several routes towards a single prefix in their FIB for load-balancing purposes. However, this goes beyond this introduction to BGP.

The first step of the BGP decision process ensures that a BGP router does not install in its FIB a route whose nexthop is considered to be unreachable by the intradomain routing protocol. This could happen, for example, when a router has crashed. The intradomain routing protocol usually advertises the failure of this router before the failure of the BGP sessions that it terminates. This rule implies that the BGP decision process must be re-run each time the intradomain routing protocol reports a change in the reachability of a prefix containing one or more BGP nexthops.

The second rule allows each domain to define its routing preferences. The local-pref attribute is set by the import filter of the router that learned a route over an eBGP session.

In contrast with intradomain routing protocols, BGP does not contain an explicit metric. This is because in the global Internet it is impossible for all domains to agree on a common metric that meets the requirements of all domains. Despite this, BGP routers prefer routes having a short AS-Path attribute over routes with a long AS-Path. This step of the BGP decision process is motivated by the fact that operators expect that a route with a long AS-Path is of lower quality than a route with a shorter AS-Path. However, studies have shown that there was not always a strong correlation between the quality of a route and the length of its AS-Path [HFPMC2002].

Before explaining the fourth step of the BGP decision process, let us first describe the fifth and the sixth steps of the BGP decision process. These two steps are used to implement hot potato routing. Intuitively, when a domain implements hot potato routing, it tries to forward packets that are destined to addresses outside of its domain to other domains as quickly as possible.

To understand hot potato routing, let us consider the two domains shown in the figure below. AS2 advertises prefix 1.0.0.0/8 over the R2-R6 and R3-R7 peering links. The routers inside AS1 learn two routes towards 1.0.0.0/8: one via R6-R2 and the second via R7-R3.
Figure 5.69: Hot and cold potato routing

With the fifth step of the BGP decision process, a router always prefers to use a route learned over an eBGP session compared to a route learned over an iBGP session. Thus, router R6 (resp. R7) prefers to use the route via router R2 (resp. R3) to reach prefix 1.0.0.0/8.

The sixth step of the BGP decision process takes into account the distance, measured as the length of the shortest intradomain path, between a BGP router and the BGP nexthop for routes learned over iBGP sessions. This rule is used on router R8 in the example above. This router has received two routes towards 1.0.0.0/8:

- 1.0.0.0/8 via R7 that is at a distance of 1 from R8
- 1.0.0.0/8 via R6 that is at a distance of 50 from R8

The first route, via R7, is the one that router R8 prefers, as this is the route that minimises the cost of forwarding packets inside AS1 before sending them to AS2.
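The seven steps of the decision process can be sketched in Python as a sequence of filters applied to the candidate set. This is a simplified model under stated assumptions, not the behaviour of any particular router: the `Route` fields, the numeric router ids and the local-pref value are hypothetical, and the MED is compared across all candidates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    nexthop: str
    local_pref: int
    as_path: tuple
    med: int
    ebgp: bool         # True if learned over an eBGP session
    igp_cost: float    # intradomain distance to the nexthop, math.inf if unreachable
    router_id: int     # id of the router that advertised the route


def decision_process(routes):
    """Return the best route among `routes`, or None if none is usable."""
    # Step 1: ignore routes whose nexthop is unreachable.
    candidates = [r for r in routes if r.igp_cost != math.inf]
    # Steps 2-7: each key is minimised; only the best routes survive a step.
    steps = [
        lambda r: -r.local_pref,       # 2. highest local-pref
        lambda r: len(r.as_path),      # 3. shortest AS-Path
        lambda r: r.med,               # 4. smallest MED (simplified)
        lambda r: 0 if r.ebgp else 1,  # 5. eBGP over iBGP
        lambda r: r.igp_cost,          # 6. closest nexthop
        lambda r: r.router_id,         # 7. lowest router id
    ]
    for key in steps:
        if len(candidates) <= 1:
            break
        best = min(key(r) for r in candidates)
        candidates = [r for r in candidates if key(r) == best]
    return candidates[0] if candidates else None


# Router R8 in the example: two iBGP routes towards 1.0.0.0/8.
via_r7 = Route("R7", 100, ("AS2",), 0, False, 1, 7)
via_r6 = Route("R6", 100, ("AS2",), 0, False, 50, 6)
best = decision_process([via_r6, via_r7])
print(best.nexthop)  # R7
```

The two routes are equal up to the fifth step, so the sixth step (distance 1 versus 50) decides, although the route via R6 would win the router id tie-break.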


Hot potato routing allows AS1 to minimise the cost of forwarding packets towards AS2. However, there are situations where this is not desirable. For example, assume that AS1 and AS2 are domains with routers on both the East and the West coast of the US. In these two domains, the high metric associated to links R6-R8 and R0-R2 corresponds to the cost of forwarding a packet across the USA. If AS2 is a customer that pays AS1, it would prefer to receive the packets destined to 1.0.0.0/8 via the R2-R6 link instead of the R7-R3 link. This is the objective of cold potato routing.

Cold potato routing is implemented using the Multi-Exit Discriminator (MED) attribute. This attribute is an optional BGP attribute that may be set [37] by border routers when advertising a BGP route over an eBGP session. The MED attribute is usually used to indicate over an eBGP session the cost to reach the BGP nexthop for the advertised route. The MED attribute is set by the router that advertises a route over an eBGP session. In the example above, router R2 sends U(1.0.0.0/8,R2,AS2,MED=1) while R3 sends U(1.0.0.0/8,R3,AS2,MED=98).

Assume that the BGP session R7-R3 is the first to be established. R7 sends U(1.0.0.0/8,R3,AS2,MED=98) to both R8 and R6. At this point, all routers inside AS1 send the packets towards 1.0.0.0/8 via R7-R3. Then, the R6-R2 BGP session is established and router R6 receives U(1.0.0.0/8,R2,AS2,MED=1). Router R6 runs its decision process for destination 1.0.0.0/8 and selects the route via R2 as its chosen route to reach this prefix, since this route contains the smallest MED. R6 sends U(1.0.0.0/8,R2,AS2,MED=1) to routers R8 and R7. They both run their decision process and prefer the route advertised by R6, as it contains the smallest MED. Now, all routers inside AS1 forward the packets to 1.0.0.0/8 via link R6-R2 as expected by AS2. As router R7 no longer uses the BGP route learned via R3, it must stop advertising it over iBGP sessions and sends W(1.0.0.0/8) over its iBGP sessions with R6 and R8. However, router R7 still keeps the route learned from R3 inside its Adj-RIB-In. If the R6-R2 link fails, R6 sends W(1.0.0.0/8) over its iBGP sessions and router R7 responds by sending U(1.0.0.0/8,R3,AS2,MED=98) over its iBGP sessions.

In practice, the fourth step of the BGP decision process is slightly more complex, as the routes towards a given prefix can be learned from different ASes. For example, assume that in figure Hot and cold potato routing, 1.0.0.0/8 is also advertised by AS3 (not shown in the figure) that has peering links with routers R6 and R8. If AS3 advertises a route whose MED attribute is set to 2 and another with a MED set to 3, how should AS1's routers compare the four BGP routes towards 1.0.0.0/8? Is a MED value of 1 from AS2 better than a MED value of 2 from AS3? The fourth step of the BGP decision process solves this problem by only comparing the MED attribute of the routes learned from the same neighbour AS. Additional details about the MED attribute may be found in RFC 4451. It should be noted that using the MED attribute may cause some problems in BGP networks as explained in [GW2002]. In practice, the MED attribute is not used on eBGP sessions unless the two domains agree to enable it.
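The rule that MED values are only compared among routes learned from the same neighbour AS can be sketched as follows. This is a minimal Python illustration that represents each route as a hypothetical `(neighbour_as, med)` pair; a real router would apply this rule to full BGP routes.

```python
from collections import defaultdict


def med_step(routes):
    """Keep, for each neighbour AS, only the routes with the smallest MED.

    `routes` is a list of (neighbour_as, med) pairs. MED values coming
    from different neighbour ASes are never compared with each other.
    """
    by_neighbour = defaultdict(list)
    for route in routes:
        by_neighbour[route[0]].append(route)
    survivors = []
    for group in by_neighbour.values():
        smallest = min(med for _neighbour, med in group)
        survivors.extend(r for r in group if r[1] == smallest)
    return survivors


# The four routes towards 1.0.0.0/8 discussed above.
routes = [("AS2", 1), ("AS2", 98), ("AS3", 2), ("AS3", 3)]
print(med_step(routes))  # [('AS2', 1), ('AS3', 2)]
```

Both surviving routes are then passed to the following steps of the decision process; the MED value of 1 from AS2 is not considered better than the MED value of 2 from AS3.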
The last step of the BGP decision process allows the selection of a single route when a BGP router has received several routes that are considered as equal by the first six steps of the decision process. This can happen for example in a dual-homed stub attached to two different providers. As shown in the figure below, router R1 receives two equally good BGP routes towards 1.0.0.0/8. To break the ties, each router is identified by a unique router-id which in practice is one of the IP addresses assigned to the router. On some routers, the lowest router id step in the BGP decision process is replaced by the selection of the oldest route (RFC 5004). Preferring the oldest route when breaking ties is used to prefer stable paths over unstable paths. However, a drawback of this approach is that the selection of the BGP routes depends on the arrival times of the corresponding messages. This makes the BGP selection process non-deterministic and can lead to problems that are difficult to debug.

Figure 5.70: A stub connected to two providers

[37] The MED attribute can be used on customer->provider peering relationships upon request of the customer. On shared-cost peering relationships, the MED attribute is only enabled when there is an explicit agreement between the two peers.


BGP convergence

In the previous sections, we have explained the operation of BGP routers. Compared to intradomain routing protocols, a key feature of BGP is its ability to support interdomain routing policies that are defined by each domain as its import and export filters and ranking process. A domain can define its own routing policies and router vendors have implemented many configuration tweaks to support complex routing policies. However, the routing policy chosen by a domain may interfere with the routing policy chosen by another domain. To understand this issue, let us first consider the simple internetwork shown below.

Figure 5.71: The disagree internetwork

In this internetwork, we focus on the route towards 1.0.0.0/8 which is advertised by AS1. Let us also assume that AS3 (resp. AS4) prefers, e.g. for economic reasons, a route learned from AS4 (resp. AS3) over a route learned from AS1. When AS1 sends U(1.0.0.0/8,AS1) to AS3 and AS4, three sequences of exchanges of BGP messages are possible:

1. AS3 first sends U(1.0.0.0/8,AS3:AS1) to AS4. AS4 has learned two routes towards 1.0.0.0/8. It runs its BGP decision process and selects the route via AS3 and does not advertise a route to AS3
2. AS4 first sends U(1.0.0.0/8,AS4:AS1) to AS3. AS3 has learned two routes towards 1.0.0.0/8. It runs its BGP decision process and selects the route via AS4 and does not advertise a route to AS4
3. AS3 sends U(1.0.0.0/8,AS3:AS1) to AS4 and, at the same time, AS4 sends U(1.0.0.0/8,AS4:AS1). AS3 prefers the route via AS4 and thus sends W(1.0.0.0/8) to AS4. In the meantime, AS4 prefers the route via AS3 and thus sends W(1.0.0.0/8) to AS3. Upon reception of the BGP Withdraws, AS3 and AS4 only know the direct route towards 1.0.0.0/8. AS3 (resp. AS4) sends U(1.0.0.0/8,AS3:AS1) (resp. U(1.0.0.0/8,AS4:AS1)) to AS4 (resp. AS3). AS3 and AS4 could in theory continue to exchange BGP messages for ever. In practice, one of them sends one message faster than the other and BGP converges.

The example above has shown that the routes selected by BGP routers may sometimes depend on the ordering of the BGP messages that are exchanged. Other similar scenarios may be found in RFC 4264.
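The dependence on message ordering can be reproduced with a small Python simulation. It is a deliberately abstract model and an assumption of this sketch: BGP messages are not represented, each AS simply re-runs its decision based on the path currently used by the other AS, and a schedule lists which ASes decide at the same time.

```python
def other(asn):
    """Return the peer of `asn` in the disagree internetwork."""
    return "AS4" if asn == "AS3" else "AS3"


def step(state, active):
    """All ASes in `active` re-run their decision process simultaneously.

    An AS prefers the route via its peer, but this route is only
    available while the peer itself uses its direct route to AS1.
    """
    new_state = dict(state)
    for asn in active:
        peer = other(asn)
        new_state[asn] = "via " + peer if state[peer] == "direct" else "direct"
    return new_state


def run(schedule):
    """Start from the direct routes and apply each activation in turn."""
    state = {"AS3": "direct", "AS4": "direct"}
    for active in schedule:
        state = step(state, active)
    return state


# Sequence 1: AS4 reacts first, then AS3, then AS4 again (stable).
print(run([["AS4"], ["AS3"], ["AS4"]]))
# Sequence 2: AS3 reacts first (the symmetric stable outcome).
print(run([["AS3"], ["AS4"], ["AS3"]]))
# Sequence 3: both always react at the same time (oscillation).
both = ["AS3", "AS4"]
print(run([both] * 3), run([both] * 4))
```

In the first two schedules the state no longer changes after the first activation, whichever AS is activated next. In the third one the two ASes alternate forever between their direct routes and the routes via each other, as in the third sequence above.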
From an operational perspective, the above configuration is annoying since the network operators cannot easily predict which paths are chosen. Unfortunately, there are even more annoying BGP configurations. For example, let us consider the configuration below which is often named Bad Gadget [GW1999].

Figure 5.72: The bad gadget internetwork

In this internetwork, there are four ASes. AS0 advertises one route towards one prefix and we only analyse the routes towards this prefix. The routing preferences of AS1, AS3 and AS4 are the following:

- AS1 prefers the path AS3:AS0 over all other paths
- AS3 prefers the path AS4:AS0 over all other paths
- AS4 prefers the path AS1:AS0 over all other paths

AS0 sends U(p,AS0) to AS1, AS3 and AS4. As this is the only route known by AS1, AS3 and AS4 towards p, they all select the direct path. Let us now consider one possible exchange of BGP messages:

1. AS1 sends U(p,AS1:AS0) to AS3 and AS4. AS4 selects the path via AS1 since this is its preferred path. AS3 still uses the direct path.
2. AS4 advertises U(p,AS4:AS1:AS0) to AS3.
3. AS3 sends U(p,AS3:AS0) to AS1 and AS4. AS1 selects the path via AS3 since this is its preferred path. AS4 still uses the path via AS1.
4. As AS1 has changed its path, it sends U(p,AS1:AS3:AS0) to AS4 and W(p) to AS3 since its new path is via AS3. AS4 switches back to the direct path.
5. AS4 sends U(p,AS4:AS0) to AS1 and AS3. AS3 prefers the path via AS4.
6. AS3 sends U(p,AS3:AS4:AS0) to AS1 and W(p) to AS4. AS1 switches back to the direct path and we are back at the first step.

This example shows that the convergence of BGP is unfortunately not always guaranteed as some interdomain routing policies may interfere with each other in complex ways. [GW1999] have shown that checking for global convergence is either NP-complete or NP-hard. See [GSW2002] for a more detailed discussion.

Fortunately, there are some operational guidelines [GR2001] [GGR2001] that can guarantee BGP convergence in the global Internet. To ensure that BGP will converge, these guidelines consider that there are two types of peering relationships: customer->provider and shared-cost. In this case, BGP convergence is guaranteed provided that the following conditions are fulfilled:

1. The topology composed of all the directed customer->provider peering links is an acyclic graph
2. An AS always prefers a route received from a customer over a route received from a shared-cost peer or a provider.

The first guideline implies that the provider of the provider of ASx cannot be a customer of ASx. Such a relationship would not make sense from an economic perspective as it would imply circular payments. Furthermore, providers are usually larger than customers.

The second guideline also corresponds to economic preferences. Since a provider earns money when sending packets to one of its customers, it makes sense to prefer such customer learned routes over routes learned from providers. [GR2001] also shows that BGP convergence is guaranteed even if an AS associates the same preference to routes learned from a shared-cost peer and routes learned from a customer.
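The first guideline is a property of a directed graph and can be checked with a standard depth-first search when the customer->provider relationships are known. The Python sketch below assumes that these relationships are available as a dictionary, which is an assumption made for illustration: domains do not normally disclose their policies. The AS names in the example are hypothetical.

```python
def has_customer_provider_cycle(providers):
    """Return True if the customer->provider graph contains a cycle.

    `providers` maps each AS to the list of its providers.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(asn):
        colour[asn] = GREY
        for provider in providers.get(asn, ()):
            state = colour.get(provider, WHITE)
            if state == GREY:
                return True  # we came back to an AS on the current path
            if state == WHITE and visit(provider):
                return True
        colour[asn] = BLACK
        return False

    return any(
        colour.get(asn, WHITE) == WHITE and visit(asn) for asn in list(providers)
    )


# ASz is the provider of the provider of ASx: no cycle.
hierarchy = {"ASx": ["ASy"], "ASy": ["ASz"]}
# Same topology, but ASz is also a customer of ASx: circular payments.
circular = {"ASx": ["ASy"], "ASy": ["ASz"], "ASz": ["ASx"]}

print(has_customer_provider_cycle(hierarchy))  # False
print(has_customer_provider_cycle(circular))   # True
```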
From a theoretical perspective, these guidelines should be verified automatically to ensure that BGP will always converge in the global Internet. However, such a verification cannot be performed in practice because this would force all domains to disclose their routing policies (and few are willing to do so) and furthermore the problem is known to be NP-hard [GW1999].

In practice, researchers and operators expect that these guidelines are verified [38] in most domains. Thanks to the large amount of BGP data that has been collected by operators and researchers [39], several studies have analysed the AS-level topology of the Internet. [SARK2002] is one of the first analyses. More recent studies include [COZ2008] and [DKF+2007].

Based on these studies and [ATLAS2009], the AS-level Internet topology can be summarised as shown in the figure below.

Figure 5.73: The layered structure of the global Internet

The domains on the Internet can be divided in about four categories according to their role and their position in the AS-level topology.

- the core of the Internet is composed of a dozen to twenty Tier-1 ISPs. A Tier-1 is a domain that has no provider. Such an ISP has shared-cost peering relationships with all other Tier-1 ISPs and provider->customer relationships with smaller ISPs. Examples of Tier-1 ISPs include sprint, level3 or opentransit
- the Tier-2 ISPs are national or continental ISPs that are customers of Tier-1 ISPs. These Tier-2 ISPs have smaller customers and shared-cost peering relationships with other Tier-2 ISPs. Examples of Tier-2 ISPs include France Telecom, Belgacom, British Telecom, ...
- the Tier-3 networks are either stub domains such as enterprise or campus networks, or smaller ISPs. They are customers of Tier-1 and Tier-2 ISPs and sometimes have shared-cost peering relationships
- the large content providers that are managing large datacenters. These content providers are producing a growing fraction of the packets exchanged on the global Internet [ATLAS2009]. Some of these content providers are customers of Tier-1 or Tier-2 ISPs, but they often try to establish shared-cost peering relationships, e.g. at IXPs, with many Tier-1 and Tier-2 ISPs.

Due to this organisation of the Internet and due to the BGP decision process, most AS-level paths on the Internet have a length of 3-5 AS hops.

[38] Some researchers such as [MUF+2007] have shown that modelling the Internet topology at the AS-level requires more than the shared-cost and customer->provider peering relationships. However, there is no publicly available model that goes beyond these classical peering relationships.

[39] BGP data is often collected by establishing BGP sessions between Unix hosts running a BGP daemon and BGP routers in different ASes. The Unix hosts store all BGP messages received and regular dumps of their BGP routing tables. See http://www.routeviews.org, http://www.ripe.net/ris, http://bgp.potaroo.net or http://irl.cs.ucla.edu/topology/


5.4 Summary

5.5 Exercises

5.5.1 Principles

1. Routing protocols used in data networks only use positive link weights. What would happen with a distance vector routing protocol in the network below that contains a negative link weight?

   Figure 5.74: Simple network

2. When a network specialist designs a network, one of the problems that he needs to solve is to set the metrics of the links in his network. In the USA, the Abilene network interconnects most of the research labs and universities. The figure below shows the topology [40] of this network in 2009.

   Figure 5.75: The Abilene network

   - In this network, assume that all the link weights are set to 1. What are the paths followed by a packet sent by the router located in Los Angeles to reach:
     - the router located in New York
     - the router located in Washington ?
   - Is it possible to configure the link metrics so that the packets sent by the router located in Los Angeles to the routers located in respectively New York and Washington do not follow the same path?
   - Is it possible to configure the link weights so that the packets sent by the router located in Los Angeles to the router located in New York follow one path while the packets sent by the router located in New York to the router located in Los Angeles follow a completely different path?

   [40] This figure was downloaded from the Abilene observatory http://www.internet2.edu/observatory/archive/data-views.html. This observatory contains a detailed description of the Abilene network including detailed network statistics and all the configuration of the equipment used in the network.


   - Assume that the routers located in Denver and Kansas City need to exchange lots of packets. Can you configure the link metrics such that the link between these two routers does not carry any packet sent by another router in the network?

3. In the five nodes network shown below, can you configure the link metrics so that the packets sent by router E to router A use link B->A while the packets sent by router B use links B->D and D->A?

   Figure 5.76: Simple five nodes network

4. In the five nodes network shown above, can you configure the link weights so that the packets sent by router E (resp. F) follow the E->B->A path (resp. F->D->B->A)?

5. In the above questions, you have worked on the stable state of the routing tables computed by routing protocols. Let us now consider the transient problems that may happen when the network topology changes [41]. For this, consider the network topology shown in the figure below and assume that all routers use a distance vector protocol that uses split horizon.

   Figure 5.77: Simple network with redundant links

   If you compute the routing tables of all routers in this network, you would obtain a table such as the table below:

   ===========  ===========  ===========  ===========  ===========  ===========
   Destination  Routes on A  Routes on B  Routes on C  Routes on D  Routes on E
   ===========  ===========  ===========  ===========  ===========  ===========
   A            0            1 via A      2 via B      3 via C      4 via D
   B            1 via B      0            1 via B      2 via C      3 via D
   C            2 via B      1 via C      0            1 via C      2 via D
   D            3 via B      2 via C      1 via D      0            1 via D
   E            4 via B      3 via C      2 via D      1 via E      0
   ===========  ===========  ===========  ===========  ===========  ===========

   Distance vector protocols can operate in two different modes: periodic updates and triggered updates. Periodic updates is the default mode for a distance vector protocol. For example, each router could advertise its distance vector every thirty seconds. With the triggered updates a router sends its distance vector when its routing table changes (and periodically when there are no changes).

   - Consider a distance vector protocol using split horizon and periodic updates. Assume that the link B-C fails. B and C update their local routing table but they will only advertise it at the end of their period. Select one ordering for the periodic updates and every time a router sends its distance vector, indicate the vector sent to each neighbor and update the table above. How many periods are required to allow the network to converge to a stable state?
   - Consider the same distance vector protocol, but now with triggered updates. When link B-C fails, assume that B updates its routing table immediately and sends its distance vector to A and D. Assume that both A and D process the received distance vector and that A sends its own distance vector, ... Indicate all the distance vectors that are exchanged and update the table above each time a distance vector is sent by a router (and received by other routers) until all routers have learned a new route to each destination. How many distance vector messages must be exchanged until the network converges to a stable state?

   [41] The main events that can affect the topology of a network are:

   - the failure of a link. Measurements performed in IP networks have shown that such failures happen frequently and usually for relatively short periods of time
   - the addition of one link in the network. This may be because a new link has been provisioned or more frequently because the link failed some time ago and is now back
   - the failure/crash of a router followed by its reboot
   - a change in the metric of a link by reconfiguring the routers attached to the link

   See http://totem.info.ucl.ac.be/lisis_tool/lisis-example/ for an analysis of the failures inside the Abilene network in June 2005 or http://citeseer.ist.psu.edu/old/markopoulou04characterization.html for an analysis of the failures affecting a larger ISP network

6. Consider the network shown below. In this network, the metric of each link is set to 1 except link A-B whose metric is set to 4 in both directions. In this network, there are two paths with the same cost between D and C. Old routers would randomly select one of these equal cost paths and install it in their forwarding table. Recent routers are able to use up to N equal cost paths towards the same destination.
   Figure 5.78: A simple network running OSPF

   On recent routers, a lookup in the forwarding table for a destination address returns a set of outgoing interfaces. How would you design an algorithm that selects the outgoing interface used for each packet, knowing that to avoid reordering, all segments of a given TCP connection should follow the same path?

7. Consider again the network shown above. After some time, OSPF converges and all routers compute the following routing tables:

   ===========  ===========  ===========  ===========  ===========  ===========
   Destination  Routes on A  Routes on B  Routes on C  Routes on D  Routes on E
   ===========  ===========  ===========  ===========  ===========  ===========
   A            0            2 via C      1 via A      3 via B,E    2 via C
   B            2 via C      0            1 via B      1 via B      2 via D,C
   C            1 via C      1 via C      0            2 via B,E    1 via C
   D            3 via C      1 via D      2 via B,E    0            1 via D
   E            2 via C      2 via C,D    1 via E      1 via E      0
   ===========  ===========  ===========  ===========  ===========  ===========


   An important difference between OSPF and RIP is that OSPF routers flood link state packets that allow the other routers to recompute their own routing tables while RIP routers exchange distance vectors. Consider that link B-C fails and that router B is the first to detect the failure. At this point, B can no longer reach A and C, and 50% of its paths towards E have failed. C can no longer reach B and half of its paths towards D have failed.

   Router B will flood its updated link state packet through the entire network and all routers will recompute their forwarding table. Upon reception of a link state packet, routers usually first flood the received link-state packet and then recompute their forwarding table. Assume that B is the first to recompute its forwarding table, followed by D, A, C and finally E.

8. After each update of a forwarding table, verify which pairs of routers are able to exchange packets. Provide your answer using a table similar to the one shown above.

9. Can you find an ordering of the updates of the forwarding tables that avoids all transient problems?

10. Consider the network shown in the figure below and explain the path that will be followed by the packets to reach 194.100.10.0/23

    Figure 5.79: A stub connected to one provider

11. Consider, now, as shown in the figure below, that the stub AS is also connected to provider AS789. Via which provider will the packets destined to 194.100.10.0/23 be received by AS4567? Should AS123 change its configuration?

    Figure 5.80: A stub connected to two providers

12. Consider that the stub shown in the figure below decides to advertise two /24 prefixes instead of its allocated /23 prefix.

    1. Via which provider does AS4567 receive the packets destined to 194.100.11.99 and 194.100.10.1?
    2. How is the reachability of these addresses affected when link R1-R3 fails?
    3. Propose a configuration on R1 that achieves the same objective as the one shown in the figure but also preserves the reachability of all IP addresses inside AS4567 if one of AS4567's interdomain links fails?

13. Consider the network shown in the figure below. In this network, each AS contains a single BGP router. Assume that R1 advertises a single prefix. R1 receives a lot of packets from R9. Without any help from R2, R9 or R4, how could R1 configure its BGP advertisement such that it receives the packets from R9 via R3? What happens when a link fails?

14. Consider the network shown in the figure below.

    1. Show which BGP messages are exchanged when router R1 advertises prefix 10.0.0.0/8.
    2. How many and which routes are known by router R5? Which route does it advertise to R6?
    3. Assume now that the link between R1 and R2 fails. Show the messages exchanged due to this event. Which BGP messages are sent to R6?

15. Consider the network shown in the figure below where R1 advertises a single prefix. In this network, the link between R1 and R2 is considered as a backup link. It should only be used when the primary link (R1-R4) fails. This can be implemented on R2 by setting a low local-pref to the routes received on link R2-R1

    1. In this network, what are the paths used by all routers to reach R1?
    2. Assume now that the link R1-R4 fails. Which BGP messages are exchanged and what are now the paths used to reach R1?
    3. Link R1-R4 comes back. Which BGP messages are exchanged and what do the paths used to reach R1 become?

16. On February 22, 2008, the Pakistan Telecom Authority issued an order to Pakistan ISPs to block access to three IP addresses belonging to youtube: 208.65.153.238, 208.65.153.253, 208.65.153.251. One operator noted that these addresses belonged to the same /24 prefix. Read http://www.ripe.net/news/study-youtube-hijacking.html to understand what really happened.

    1. What should youtube have done to avoid this problem?


    Figure 5.81: A stub connected to two providers

    Figure 5.82: A simple internetwork


    Figure 5.83: A simple internetwork

    Figure 5.84: A simple internetwork with a backup link


    2. What kind of solutions would you propose to improve the security of interdomain routing?

17. There are currently 13 IPv4 addresses that are associated to the root servers of the Domain Name System. However, http://www.root-servers.org/ indicates that there are more than 100 different physical servers that support them. This is a large anycast service. How would you configure BGP routers to provide such anycast service?

18. Consider the network shown in the figure below. In this network, R0 advertises prefix p and all link metrics are set to 1

    - Draw the iBGP and eBGP sessions
    - Assume that session R0-R8 is down when R0 advertises p over R0-R7. What are the BGP messages exchanged and the routes chosen by each router in the network?
    - Session R0-R8 is established and R0 advertises prefix p over this session as well
    - Do the routes selected by each router change if the MED attribute is used on the R7-R6 and R3-R10 sessions, but not on the R4-R9 and R6-R8 sessions?
    - Is it possible to configure the routers in the R1-R6 network such that R4 reaches prefix p via R6-R8 while R2 uses the R3-R10 link?

    Figure 5.85: A simple Internet

19. The BGP MED attribute is often set to the IGP cost to reach the BGP nexthop of the advertised prefix. However, routers can also be configured to always use the same MED values for all routes advertised over a given session. How would you use it in the figure above so that link R10-R3 is the primary link while R7-R6 is a backup link? Is there an advantage or drawback of using the MED attribute for this application compared to local-pref?

20. In the figure above, assume that the managers of R8 and R9 would like to use the R8-R6 link as a backup link, but the managers of R4 and R6 do not agree to use the BGP MED attribute nor to use a different local-pref for the routes learned from

5.5.2 Practice

1. For the following IPv4 subnets, indicate the smallest and the largest IPv4 address inside the subnet:

   - 8.0.0.0/8
   - 172.12.0.0/16
   - 200.123.42.128/25
   - 12.1.2.0/13

2. For the following IPv6 subnets, indicate the smallest and the largest IPv6 address inside the subnet:

   - FE80::/64
   - 2001:db8::/48
   - 2001:6a8:3080::/48
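   Answers to the two exercises above can be checked with the `ipaddress` module of the Python standard library. The sketch below uses two example prefixes that are not part of the exercises; the helper name `address_range` is an assumption. Passing `strict=False` accepts a prefix whose host bits are not all zero.

```python
import ipaddress


def address_range(prefix):
    """Return the smallest and the largest address inside `prefix`."""
    # strict=False masks the host bits instead of raising ValueError.
    network = ipaddress.ip_network(prefix, strict=False)
    return str(network[0]), str(network[-1])


print(address_range("192.0.2.77/26"))  # ('192.0.2.64', '192.0.2.127')
print(address_range("2001:db8:abcd::/64"))
```

   The same function works for IPv4 and IPv6 prefixes since `ip_network` returns the appropriate network type.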


3. Researchers and network operators collect and expose lots of BGP data. For this, they establish eBGP sessions between data collection routers and production routers located in operational networks. Several data collection routers are available, the most popular ones are:

   - http://www.routeviews.org
   - http://www.ripe.net/ris

   For this exercise, you will use one of the routeviews BGP routers. You can access one of these routers by using telnet. Once logged on the router, you can use the router's command line interface to analyse its BGP routing table.

       telnet route-views.routeviews.org
       Trying 128.223.51.103...
       Connected to route-views.routeviews.org.
       Escape character is '^]'.
       C
       **********************************************************************

                         Oregon Exchange BGP Route Viewer
               route-views.oregon-ix.net / route-views.routeviews.org

        route views data is archived on http://archive.routeviews.org

        This hardware is part of a grant from Cisco Systems.
        Please contact help@routeviews.org if you have questions or
        comments about this service, its use, or if you might be able to
        contribute your view.

        This router has views of the full routing tables from several ASes.
        The list of ASes is documented under "Current Participants" on
        http://www.routeviews.org/.

                                 **************

        route-views.routeviews.org is now using AAA for logins.  Login with
        username "rviews".  See http://routeviews.org/aaa.html

       **********************************************************************

       User Access Verification

       Username: rviews
       route-views.oregon-ix.net>

   This router has eBGP sessions with routers from several ISPs. See http://www.routeviews.org/peers/route-views.oregon-ix.net.txt for an up-to-date list of all eBGP sessions maintained by this router.

   Among all the commands supported by this router, the show ip bgp command is very useful. This command takes an IPv4 prefix as parameter and allows you to retrieve all the routes that this router has received in its Adj-RIB-In for the specified prefix.
   1. Use show ip bgp 130.104.0.0/16 to find the best path used by this router to reach UCLouvain
   2. Knowing that 130.104.0.0/16 is announced by belnet (AS2611), what are, according to this BGP routing table, the ASes that peer with belnet
   3. Do the same analysis for one of the IPv4 prefixes assigned to Skynet (AS5432): 62.4.128.0/17. The output of show ip bgp 62.4.128.0/17 reveals something strange as it seems that one of the paths towards this prefix passes twice via AS5432. Can you explain this?

          2905 702 1239 5432 5432
            196.7.106.245 from 196.7.106.245 (196.7.106.245)
              Origin IGP, metric 0, localpref 100, valid, external

4. netkit makes it easy to perform experiments by using an emulated environment that is composed of virtual machines running User Mode Linux. netkit allows you to set up a small network in a lab and configure it as if you had access to several PCs interconnected by using cables and network equipment.

   A netkit lab is defined as a few configuration files and scripts:

   - lab.conf is a text file that defines the virtual machines and the network topology. A simple lab.conf file is shown below.

         LAB_DESCRIPTION="a string describing the lab"
         LAB_VERSION=1.0
         LAB_AUTHOR="the author of the lab"
         LAB_EMAIL="email address of the author"

         h1[0]="lan"
         h2[0]="lan"

     This configuration file requests the creation of two virtual machines, named h1 and h2. Each of these hosts has one network interface (eth0) that is connected to the local area network named lan. netkit allows you to define several interfaces on a given host and attach them to different local area networks.
   - A host.startup file for each host (h1.startup and h2.startup in the example above). This file is a shell script that is executed at the end of the boot of the virtual host. This is typically in this script that the network interfaces are configured and the daemons are launched.
   - A directory for each host (h1 and h2 in the example above). This directory is used to store configuration files that must be copied on the virtual machine's filesystems when they are first created.

   netkit contains several scripts that can be used to run a lab. lstart allows you to launch a lab and lhalt allows you to halt the machines at the end of a lab. If you need to exchange files between the virtual machines and the Linux host on which netkit runs, note that the virtual hosts mount the directory that contains the running lab in /hostlab and your home directory in /hosthome.

   For this exercise, you will use a netkit lab containing 4 hosts and two routers. The configuration files are available in exercises/labs/lab-2routers.tar.gz. The network topology of this lab is shown in the figure below.

   Figure 5.86: The two routers lab

   The lab.conf file for this lab is shown below.

       h1[0]="lan1"
       h2[0]="lan1"
       h3[0]="lan2"
       router1[0]="lan1"
       router1[1]="lan2"
       router2[0]="lan2"
       router2[1]="lan3"
       h4[0]="lan3"

   In this network, we will use subnet 172.12.1.0/24 for lan1, 172.12.2.0/24 for lan2 and 172.12.3.0/24 for lan3.
On Linux, the IP addresses assigned to an interface can be configured by using ifconfig(8). When ifconfig(8) is used without parameters, it lists all the existing interfaces of the host with their configuration. A sample ifconfig(8) output is shown below.

    host:~# ifconfig
    eth0      Link encap:Ethernet  HWaddr FE:3A:59:CD:59:AD
              inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::fc3a:59ff:fecd:59ad/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:3 errors:0 dropped:0 overruns:0 frame:0
              TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:216 (216.0 b)  TX bytes:258 (258.0 b)
              Interrupt:5
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

This host has two interfaces: the loopback interface (lo, with IPv4 address 127.0.0.1 and IPv6 address ::1) and the eth0 interface. The 192.168.1.1/24 address and a link local IPv6 address (fe80::fc3a:59ff:fecd:59ad/64) have been assigned to interface eth0. The broadcast address is used in some particular cases that are outside the scope of this exercise. ifconfig(8) also provides statistics such as the number of packets sent and received over this interface. Another important piece of information provided by ifconfig(8) is the hardware address (HWaddr) used by the datalink layer of the interface. In the example above, the eth0 interface uses the 48-bit hardware address FE:3A:59:CD:59:AD.

You can configure the IPv4 address assigned to an interface by specifying the address and the netmask.

    ifconfig eth0 192.168.1.2 netmask 255.255.255.128 up

You can also specify the prefix length.

    ifconfig eth0 192.168.1.2/25 up

In both cases, ifconfig eth0 allows you to verify that the interface has been correctly configured.

    eth0      Link encap:Ethernet  HWaddr FE:3A:59:CD:59:AD
              inet addr:192.168.1.2  Bcast:192.168.1.127  Mask:255.255.255.128
              inet6 addr: fe80::fc3a:59ff:fecd:59ad/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:3 errors:0 dropped:0 overruns:0 frame:0
              TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:216 (216.0 b)  TX bytes:258 (258.0 b)
              Interrupt:5

Another important command on Linux is route(8), which allows you to look at the contents of the routing table stored in the Linux kernel and to change it. For example, route -n returns the contents of the IPv4 routing table. See route(8) for a detailed description of how you can configure routes by using this tool.

   1. Use ifconfig(8) to configure the following IPv4 addresses:
      - 172.16.1.11/24 on interface eth0 on h1
      - 172.16.1.12/24 on interface eth0 on h2
   2. Use route -n to look at the contents of the routing table on the two hosts.
   3. Verify by using ping(8) that h1 can reach 172.16.1.12.
   4. Use ifconfig(8) to configure IPv4 address 172.16.1.1/24 on the eth0 interface of router1 and 172.16.2.1/24 on the eth1 interface of this router.


   5. Since hosts h1 and h2 are attached to a local area network that contains a single router, this router can act as a default router. Add a default route on h1 and h2 so that they can use router1 as their default router to reach any remote IPv4 address. Verify by using ping(8) that h1 can reach address 172.16.2.1.
   6. What do you need to configure on router2, h3 and h4 so that all hosts and routers can reach all hosts and routers in the emulated network? Add the ifconfig and route commands to the .startup files of all the hosts so that the network is correctly configured when it is started by using lstart.

5. Use the network configured above to test how IP packets are fragmented. The ifconfig command allows you to specify the Maximum Transmission Unit (MTU), i.e. the largest size of the frames that are allowed on a given interface. The default MTU on the eth? interfaces is 1500 bytes.

   1. Force an MTU of 500 bytes on the three interfaces attached to lan2.
   2. Use ping -s 1000 to send a 1000-byte ping packet from h3 to one of the routers attached to lan2 and capture the packets on the other router by using tcpdump(8). In which order does the emulated host send the IP fragments?
   3. Use ping -s 2000 to send a 2000-byte ping packet from h1 to h4 and capture the packets on lan2 and lan3 by using tcpdump(8). In which order does the emulated host send the IP fragments?
   4. From your measurements, how does an emulated host generate the identifiers of the IP packets that it sends?
   5. Reset the MTU on the eth1 interface of router1 to 1500 bytes, but leave the MTU on the eth0 interface of router2 at 500 bytes. Check whether host h1 can ping host h4. Use tcpdump(8) to analyse what is happening.

6. The Routing Information Protocol (RIP) is a distance vector protocol that is often used in small IP networks. There are various implementations of RIP. For this exercise, you will use quagga, an open-source implementation of several IP routing protocols that runs on Linux and other Unix-compatible operating systems. quagga(8) is in fact a set of daemons that interact with each other and with the Linux kernel. For this exercise, you will use two of these daemons: zebra(8) and ripd(8).
zebra(8) is the master daemon that handles the interactions between the Linux kernel routing table and the routing protocols. ripd(8) is the implementation of the RIP protocol. It interacts with the Linux routing tables through the zebra(8) daemon.

To use a real or virtual Linux machine as a router, you first need to configure the IP addresses of the interfaces of the machine. Once this configuration has been verified, you can configure the zebra(8) and ripd(8) daemons. The configuration files for these daemons reside in /etc/zebra. The first configuration file is /etc/zebra/daemons. It lists the daemons that are launched when zebra is started by /etc/init.d/zebra. To enable ripd(8) and zebra(8), this file is configured as follows.

    # This file tells the zebra package
    # which daemons to start.
    # Entries are in the format: <daemon>=(yes|no|priority)
    # where 'yes' is equivalent to infinitely low priority, and
    # lower numbers mean higher priority. Read
    # /usr/doc/zebra/README.Debian for details.
    # Daemons are: bgpd zebra ospfd ospf6d ripd ripngd
    zebra=yes
    bgpd=no
    ospfd=no
    ospf6d=no
    ripd=yes
    ripngd=no

The second configuration file is the /etc/zebra/zebra.conf file. It defines the global configuration rules that apply to zebra(8). For this exercise, we use the default configuration file shown below.

    ! -*- zebra -*-
    !
    ! zebra configuration file
    !
    hostname zebra
    password zebra
    enable password zebra
    !
    ! Static default route sample.
    !
    !ip route 0.0.0.0/0 203.181.89.241
    !
    log file /var/log/zebra/zebra.log

In the zebra configuration file, lines beginning with ! are comments. This configuration defines the hostname as zebra and two passwords. The default password (password zebra) is the one that must be given when connecting to the zebra(8) management console over a TCP connection. This management console can be used like a shell on a Unix host to give commands to the zebra(8) daemons. The second one (enable password zebra) specifies the password to be provided before giving commands that change the configuration of the daemon. It is also possible to specify static routes in this configuration file, but we do not use this facility in this exercise. The last parameter that is specified is the log file where zebra(8) writes debugging information. Additional information about quagga is available from http://www.quagga.net/docs/docs-info.php

The most interesting configuration file for this exercise is the /etc/zebra/ripd.conf file. It contains all the parameters that are specific to the operation of the RIP protocol. A sample ripd(8) configuration file is shown below.

    hostname ripd
    password zebra
    enable password zebra
    router rip
     network 100.1.0.0/16
     redistribute connected
    log file /var/log/zebra/ripd.log

This configuration file shows the two different ways to configure ripd(8). The statement router rip indicates the beginning of the configuration for the RIP routing protocol. The indented lines that follow are part of the configuration of this protocol. The first line, network 100.1.0.0/16, is used to enable RIP on the interface whose IP subnet matches 100.1.0.0/16. The second line, redistribute connected, indicates that all the subnetworks that are directly connected to the router should be advertised. When this configuration line is used, ripd(8) interacts with the Linux kernel routing table and advertises all the subnetworks that are directly connected to the router. If a new interface is enabled and configured on the router, its subnetwork prefix will be automatically advertised. Similarly, the subnetwork prefix will be automatically removed if the corresponding interface is shut down.

To experiment with RIP, you will use the emulated routers shown in the figure below. You can download the entire lab from exercises/labs/lab-5routers-rip.tar.gz

Figure 5.87: The five routers lab


The lab.conf describing the topology and the interfaces used on all hosts is shown below.

    r1[0]="A"
    r1[1]="B"
    r1[2]="F"
    r1[3]="V"
    r2[0]="A"
    r2[1]="C"
    r2[2]="W"
    r3[0]="B"
    r3[1]="C"
    r3[2]="D"
    r3[3]="X"
    r4[0]="D"
    r4[1]="E"
    r4[2]="Y"
    r5[0]="E"
    r5[1]="F"
    r5[2]="Z"

There are two types of subnetworks in this topology. The subnetworks from the 172.16.0.0/16 prefix are used on the links between routers, while the subnetworks from the 192.168.0.0/16 prefix are used on the local area networks that are attached to a single router.

A router can be configured in two different ways: by specifying configuration files and by typing the commands directly on the router by using telnet(1). The first four routers have been configured in the provided configuration files. Look at r1.startup and the configuration files in r1/tmp/zebra in the lab's directory for router r1. The r?.startup files contain the ifconfig(8) commands that are used to configure the interfaces of each virtual router. The configuration files located in r?/tmp/zebra are also copied automatically to the virtual router when it boots.

   1. Launch the lab by using lstart and verify that router r1 can reach 192.168.1.1, 192.168.2.2, 192.168.3.3 and 192.168.4.4. You can also use traceroute(8) to determine the route followed by your packets.
   2. The ripd(8) daemon can also be configured by typing commands over a TCP connection. ripd(8) listens on port 2602. On router r1, use telnet 127.0.0.1 2602 to connect to the ripd(8) daemon. The default password is zebra. Once logged on to the ripd(8) daemon, you reach the > prompt where you can query the status of the router. By typing ? at the prompt, you will find the list of supported commands. The show command is particularly useful; type show ? to obtain the list of its sub-options. For example, show ip rip will return the routing table that is maintained by the ripd(8) daemon.
   3. Disable interface eth3 on router r1 by typing ifconfig eth3 down on this router. Verify the impact of this command on the routing tables of the other routers in the network. Re-enable this interface by typing ifconfig eth3 up.
   4. Do the same with the eth1 interface on router r3.
   5. Edit the /etc/zebra/ripd.conf configuration file on router r5 so that this router becomes part of the network. Verify that 192.168.5.5 is reachable by all routers inside the network.

7. The Open Shortest Path First (OSPF) protocol is a link-state protocol that is often used in enterprise IP networks. OSPF is implemented in the ospfd(8) daemon that is part of quagga. We use the same topology as in the previous exercise. The netkit lab may be downloaded from exercises/labs/lab-5routers-ospf.tar.gz.

The ospfd(8) daemon supports a more complex configuration than the ripd(8) daemon. A sample configuration is shown below.

    hostname ospfd
    password zebra
    enable password zebra
    interface eth0
      ip ospf cost 1
    interface eth1
      ip ospf cost 1
    interface eth2
      ip ospf cost 1
    interface eth3
      ip ospf cost 1
    router ospf
      router-id 192.168.1.1
      network 172.16.1.0/24 area 0.0.0.0
      network 172.16.2.0/24 area 0.0.0.0
      network 172.16.3.0/24 area 0.0.0.0
      network 192.168.1.0/24 area 0.0.0.0
      passive-interface eth3
    log file /var/log/zebra/ospfd.log

In this configuration file, the ip ospf cost 1 statements specify a metric of 1 for each interface. The ospfd(8) configuration is composed of three parts. First, each router must have one identifier that is unique inside the network. Usually, this identifier is one of the IP addresses assigned to the router. Second, each subnetwork on the router is associated with an area. In this example, we only use the backbone area (i.e. 0.0.0.0). The last command specifies that the OSPF Hello messages should not be sent over interface eth3, although its subnetwork will be advertised by the router. Such a command is often used on interfaces that are attached to end hosts, to ensure that no problem will occur if a student configures a software OSPF router on a laptop attached to this interface.

The netkit lab already contains the configuration for routers r1 to r4.

The ospfd(8) daemon listens on TCP port 2604. You can follow the evolution of the OSPF protocol by using the show ip ospf ? commands.

   1. Launch the lab by using lstart and verify that the 192.168.1.1, 192.168.2.2, 192.168.3.3 and 192.168.4.4 addresses are reachable from any router inside the network.
   2. Configure router r5 by changing the /etc/zebra/ospfd.conf file and restart the daemon. Verify that the 192.168.5.5 address is reachable from any router inside the network.
   3. How can you update the network configuration so that the packets sent by router r1 to router r5 use the direct link between the two routers while the packets sent by r5 are forwarded via r4?
   4. Disable interface eth3 on router r1 and see how quickly the network converges. You can follow the evolution of the routing table on a router by typing netstat -rnc. Re-enable interface eth3 on router r1.
   5. Change the MTU of eth0 on router r1 but leave it unchanged on interface eth0 of router r2. What is the impact of this change? Can you explain why?
   6. Disable interface eth1 on router r3 and see how quickly the network converges. Re-enable this interface.
   7. Halt router r2 by using vcrash r2. How quickly does the network react to this failure?
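
The address arithmetic behind the ifconfig(8) examples of the netkit exercise can be double-checked with a few lines of code. The sketch below uses Python's standard ipaddress module, which is a choice made here for illustration; the exercises themselves only rely on ifconfig(8), route(8) and ping(8). The helper name same_subnet is hypothetical.

```python
import ipaddress

# Interface configuration taken from the ifconfig example: 192.168.1.2/25
iface = ipaddress.ip_interface("192.168.1.2/25")
print(iface.network)                    # 192.168.1.0/25
print(iface.netmask)                    # 255.255.255.128
print(iface.network.broadcast_address)  # 192.168.1.127


def same_subnet(interface: str, address: str) -> bool:
    """Return True when address belongs to the subnet configured on interface.

    interface is written as address/prefix-length, e.g. "172.16.1.11/24".
    """
    network = ipaddress.ip_interface(interface).network
    return ipaddress.ip_address(address) in network


# h1 can reach h2 directly, but needs a router to reach 172.16.2.1
print(same_subnet("172.16.1.11/24", "172.16.1.12"))  # True
print(same_subnet("172.16.1.11/24", "172.16.2.1"))   # False
```

A destination that is not in a directly connected subnet is the case in which a host falls back on its default route, which is why a default route must be added on h1 and h2 in the exercise.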


CHAPTER 6

The datalink layer and the Local Area Networks

The datalink layer is the lowest layer of the reference model that we discuss in detail. As mentioned previously, there are two types of datalink layers. The first datalink layers that appeared are the ones that are used on point-to-point links between devices that are directly connected by a physical link. We will briefly discuss one of these datalink layers in this chapter. The second type of datalink layers are the ones used in Local Area Networks (LANs). The main difference between the point-to-point and the LAN datalink layers is that the latter need to regulate the access to the Local Area Network, which is usually a shared medium.

This chapter is organised as follows. We first discuss the principles of the datalink layer as well as the services that it uses from the physical layer. We then describe in more detail several Medium Access Control algorithms that are used in Local Area Networks to regulate the access to the shared medium. Finally, we discuss in detail important datalink layer technologies with an emphasis on Ethernet and WiFi networks.

6.1 Principles

The datalink layer uses the service provided by the physical layer. Although there are many different implementations of the physical layer from a technological perspective, they all provide a service that enables the datalink layer to send and receive bits between directly connected devices. The datalink layer receives packets from the network layer. Two datalink layer entities exchange frames. As explained in the previous chapter, most datalink layer technologies impose limitations on the size of the frames. Some technologies only impose a maximum frame size, others enforce both minimum and maximum frame sizes and finally some technologies only support a single frame size. In the latter case, the datalink layer will usually include an adaptation sublayer to allow the network layer to send and receive variable-length packets. This adaptation layer may include fragmentation and reassembly mechanisms.

Figure 6.1: The datalink layer and the reference model

The physical layer service facilitates the sending and receiving of bits. However, this service is usually far from perfect, as explained in the introduction:

- the Physical layer may change, e.g. due to electromagnetic interferences, the value of a bit being transmitted
- the Physical layer may deliver more bits to the receiver than the bits sent by the sender
- the Physical layer may deliver fewer bits to the receiver than the bits sent by the sender

The datalink layer must allow end systems to exchange frames containing packets despite all of these limitations. On point-to-point links and Local Area Networks, the first problem to be solved is how to encode a frame as a sequence of bits, so that the receiver can easily recover the received frame despite the limitations of the physical layer.

If the physical layer were perfect, the problem would be very simple. The datalink layer would simply need to define how to encode each frame as a sequence of consecutive bits. The receiver would then easily be able to extract the frames from the received bits. Unfortunately, the imperfections of the physical layer make this framing problem slightly more complex. Several solutions have been proposed and are used in practice in different datalink layer technologies.

6.1.1 Framing

The problem introduced above is the framing problem. It can be defined as: "How does a sender encode frames so that the receiver can efficiently extract them from the stream of bits that it receives from the physical layer?"

A first solution to the framing problem is to require the physical layer to remain idle for some time after the transmission of each frame. These idle periods can be detected by the receiver and serve as a marker to delineate frame boundaries. Unfortunately, this solution is not sufficient for two reasons. First, some physical layers cannot remain idle and always need to transmit bits. Second, inserting an idle period between frames decreases the maximum bandwidth that can be achieved by the datalink layer.

Some physical layers provide an alternative to this idle period. All physical layers are able to send and receive physical symbols that represent values 0 and 1. However, for various reasons that are outside the scope of this chapter, several physical layers are able to exchange other physical symbols as well. For example, the Manchester encoding used in several physical layers can send four different symbols. The Manchester encoding is a differential encoding scheme in which time is divided into fixed-length periods. Each period is divided in two halves and two different voltage levels can be applied. To send a symbol, the sender must set one of these two voltage levels during each half period. To send a 1 (resp. 0), the sender must set a high (resp. low) voltage during the first half of the period and a low (resp. high) voltage during the second half. This encoding ensures that there will be a transition at the middle of each period and allows the receiver to synchronise its clock to the sender's clock. Apart from the encodings for 0 and 1, the Manchester encoding also supports two additional symbols, InvH and InvB, where the same voltage level is used for the two half periods. By definition, these two symbols cannot appear inside a frame, which is only composed of 0 and 1. Some technologies use these special symbols as markers for the beginning or end of frames.

Figure 6.2: Manchester encoding

Unfortunately, multi-symbol encodings cannot be used by all physical layers, and a generic solution that can be used with any physical layer able to transmit and receive only 0 and 1 is required. This generic solution is called stuffing and two variants exist: bit stuffing and character stuffing. To enable a receiver to easily delineate the frame boundaries, these two techniques reserve special bit strings as frame boundary markers and encode the frames so that these special bit strings do not appear inside the frames.

Bit stuffing reserves the 01111110 bit string as the frame boundary marker and ensures that there will never be six consecutive 1 symbols transmitted by the physical layer inside a frame. With bit stuffing, a frame is sent as follows. First, the sender transmits the marker, i.e. 01111110. Then, it sends all the bits of the frame and inserts an additional bit set to 0 after each sequence of five consecutive 1 bits. This ensures that the sent frame never contains a sequence of six consecutive bits set to 1. As a consequence, the marker pattern cannot appear inside the frame sent. The marker is also sent to mark the end of the frame. The receiver performs the opposite to decode a received frame. It first detects the beginning of the frame thanks to the 01111110 marker. Then, it processes the received bits and counts the number of consecutive bits set to 1. If a 0 follows five consecutive bits set to 1, this bit is removed since it was inserted by the sender. If a 1 follows five consecutive bits set to 1, it indicates a marker if it is followed by a bit set to 0. The table below illustrates the application of bit stuffing to some frames.

    Original frame               Transmitted frame
    0001001001001001001000011    01111110000100100100100100100001101111110
    0110111111111111111110010    01111110011011111011111011111011001001111110
    01111110                     0111111001111101001111110

For example, consider the transmission of 0110111111111111111110010. The sender will first send the 01111110 marker followed by 011011111. After these five consecutive bits set to 1, it inserts a bit set to 0 followed by 11111. A new 0 is inserted, followed by 11111. A new 0 is inserted, followed by the end of the frame 110010 and the 01111110 marker.

Bit stuffing increases the number of bits required to transmit each frame. The worst case for bit stuffing is of course a long sequence of bits set to 1 inside the frame. If transmission errors occur, stuffed bits or markers can be in error. In these cases, the frame affected by the error and possibly the next frame will not be correctly decoded by the receiver, but it will be able to resynchronise itself at the next valid marker.

Bit stuffing can be easily implemented in hardware. However, implementing it in software is difficult given the higher overhead of bit manipulations in software. As software implementations prefer to process characters rather than bits, software-based datalink layers usually use character stuffing. This technique operates on frames that contain an integer number of 8-bit characters. Some characters are used as markers to delineate the frame boundaries.
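
The bit stuffing rules described above can be sketched in a few lines of Python. This is an illustrative sketch and not the implementation used by any particular datalink layer: bits are represented as strings of '0' and '1' characters for readability, the receiver is assumed to be given one already delineated frame, and the names FLAG, bit_stuff and bit_unstuff are hypothetical.

```python
FLAG = "01111110"  # frame boundary marker reserved by bit stuffing


def bit_stuff(payload: str) -> str:
    """Sender: insert a 0 after each run of five consecutive 1 bits, add markers."""
    out = []
    ones = 0
    for bit in payload:
        out.append(bit)
        if bit == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit
                ones = 0
        else:
            ones = 0
    return FLAG + "".join(out) + FLAG


def bit_unstuff(frame: str) -> str:
    """Receiver: strip both markers and drop the 0 that follows five 1 bits."""
    if len(frame) < 2 * len(FLAG) or not (frame.startswith(FLAG) and frame.endswith(FLAG)):
        raise ValueError("missing frame boundary marker")
    body = frame[len(FLAG):-len(FLAG)]
    out = []
    ones = 0
    i = 0
    while i < len(body):
        bit = body[i]
        out.append(bit)
        if bit == "1":
            ones += 1
            if ones == 5:
                i += 1  # the next bit must be a stuffed 0; skip it
                if i >= len(body) or body[i] != "0":
                    raise ValueError("expected a stuffed 0 after five 1 bits")
                ones = 0
        else:
            ones = 0
        i += 1
    return "".join(out)


sent = bit_stuff("0110111111111111111110010")
print(sent)               # 01111110011011111011111011111011001001111110
print(bit_unstuff(sent))  # 0110111111111111111110010
```

The printed frame is the second row of the table above. A receiver working on a continuous bit stream would additionally have to search for the marker itself, which this sketch leaves out.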
Many character stuffing techniques use the DLE, STX and ETX characters of the ASCII character set. DLE STX (resp. DLE ETX) is used to mark the beginning (resp. end) of a frame. When transmitting a frame, the sender adds a DLE character after each transmitted DLE character. This ensures that none of the markers can appear inside the transmitted frame. The receiver detects the frame boundaries and removes the second DLE when it receives two consecutive DLE characters. For example, to transmit frame 1 2 3 DLE STX 4, a sender will first send DLE STX as a marker, followed by 1 2 3 DLE. Then, the sender transmits an additional DLE character followed by STX 4 and the DLE ETX marker.

    Original frame       Transmitted frame
    1 2 3 4              DLE STX 1 2 3 4 DLE ETX
    1 2 3 DLE STX 4      DLE STX 1 2 3 DLE DLE STX 4 DLE ETX
    DLE STX DLE ETX      DLE STX DLE DLE STX DLE DLE ETX DLE ETX

Character stuffing, like bit stuffing, increases the length of the transmitted frames. For character stuffing, the worst frame is a frame containing many DLE characters. When transmission errors occur, the receiver may incorrectly decode one or two frames (e.g. if the errors occur in the markers). However, it will be able to resynchronise itself with the next correctly received markers.

6.1.2 Error detection

Besides framing, datalink layers also include mechanisms to detect and sometimes even recover from transmission errors. To allow a receiver to detect transmission errors, a sender must add some redundant information as an error detection code to the frame sent. This error detection code is computed by the sender on the frame that it transmits. When the receiver receives a frame with an error detection code, it recomputes it and verifies whether the received error detection code matches the computed error detection code. If they match, the frame is considered to be valid. Many error detection schemes exist and entire books have been written on the subject. A detailed discussion of these techniques is outside the scope of this book, and we will only discuss some examples to illustrate the key principles.
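
The compute, append, recompute and compare cycle described above can be illustrated with any error detection code. The sketch below uses the 32-bit CRC provided by Python's standard zlib module as a stand-in; this choice is an assumption made here for illustration and is not a code prescribed by the text. The function names add_crc and check_crc, and the decision to append the code as four big-endian bytes, are likewise hypothetical.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Sender side: compute the code over the payload and append it to the frame."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(frame: bytes) -> bool:
    """Receiver side: recompute the code and compare it with the received one."""
    if len(frame) < 4:
        return False
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


frame = add_crc(b"hello datalink")
print(check_crc(frame))      # True: the frame is considered valid

# Flip a single bit of the first byte to emulate a transmission error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(check_crc(corrupted))  # False: the mismatch reveals the error
```

Note that a matching code only means that no error was detected; the receiver cannot prove that the frame is error-free.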

To understand error detection codes, let us consider two devices that exchange bit strings containing N bits. To allow the receiver to detect a transmission error, the sender converts each string of N bits into a string of N+r bits. Usually, the r redundant bits are added at the beginning or the end of the transmitted bit string, but some techniques interleave redundant bits with the original bits. An error detection code can be defined as a function that computes the r redundant bits corresponding to each string of N bits. The simplest error detection code is the parity bit. There are two types of parity schemes: even and odd parity. With the even (resp. odd) parity scheme, the redundant bit is chosen so that an even (resp. odd) number of bits are set to 1 in the transmitted bit string of N+r bits. The receiver can easily recompute the parity of each received bit string and discard the strings with an invalid parity. The parity scheme is often used when 7-bit characters are exchanged. In this case, the eighth bit is often a parity bit. The table below shows the parity bits that are computed for bit strings containing three bits.

    3 bits string    Odd parity    Even parity
    000              1             0
    001              0             1
    010              0             1
    100              0             1
    111              0             1
    110              1             0
    101              1             0
    011              1             0

The parity bit allows a receiver to detect transmission errors that have affected a single bit among the transmitted N+r bits. If there are two or more bits in error, the receiver may not necessarily be able to detect the transmission error. More powerful error detection schemes have been defined. The Cyclical Redundancy Checks (CRC) are widely used in datalink layer protocols. An N-bit CRC can detect all transmission errors affecting a burst of less than N bits in the transmitted frame and all transmission errors that affect an odd number of bits. Additional details about CRCs may be found in [Williams1993].

It is also possible to design a code that allows the receiver to correct transmission errors. The simplest error correction code is the triple modular redundancy (TMR). To transmit a bit set to 1 (resp. 0), the sender transmits 111 (resp. 000). When there are no transmission errors, the receiver can decode 111 as 1. If transmission errors have affected a single bit, the receiver performs majority voting as shown in the table below. This scheme allows the receiver to correct all transmission errors that affect a single bit.

    Received bits    Decoded bit
    000              0
    001              0
    010              0
    100              0
    111              1
    110              1
    101              1
    011              1

Other more powerful error correction codes have been proposed and are used in some applications. The Hamming Code is a clever combination of parity bits that provides error detection and correction capabilities.
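
The parity and triple modular redundancy schemes described above are small enough to be written out in full. The following Python sketch reproduces the two tables; bit strings are represented as text for readability and the function names parity_bit, tmr_encode and tmr_decode are hypothetical.

```python
def parity_bit(bits: str, even: bool = True) -> str:
    """Return the redundant bit so that bits + parity has even (or odd) weight."""
    odd_number_of_ones = bits.count("1") % 2 == 1
    if even:
        return "1" if odd_number_of_ones else "0"
    return "0" if odd_number_of_ones else "1"


def tmr_encode(bits: str) -> str:
    """Triple modular redundancy: transmit every bit three times."""
    return "".join(bit * 3 for bit in bits)


def tmr_decode(coded: str) -> str:
    """Decode each group of three received bits by majority voting."""
    if len(coded) % 3 != 0:
        raise ValueError("length must be a multiple of three")
    decoded = []
    for i in range(0, len(coded), 3):
        triple = coded[i:i + 3]
        decoded.append("1" if triple.count("1") >= 2 else "0")
    return "".join(decoded)


print(parity_bit("110", even=False), parity_bit("110", even=True))  # 1 0
print(tmr_encode("10"))      # 111000
print(tmr_decode("101001"))  # 10 (one bit in error in each group is corrected)
```

Majority voting corrects one error per group of three bits; two errors in the same group are decoded to the wrong value without any warning.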

In practice, datalink layer protocols combine bit stuffing or character stuffing with a length indication in the frame header and a checksum or CRC. The checksum/CRC is computed by the sender and placed in the frame before applying bit/character stuffing.

6.2 Medium Access Control

Point-to-point datalink layers need to select one of the framing techniques described above and optionally add retransmission algorithms such as those explained for the transport layer to provide a reliable service. Datalink layers for Local Area Networks face two additional problems. A LAN is composed of several hosts that are attached to the same shared physical medium. From a physical layer perspective, a LAN can be organised in four different ways:

- a bus-shaped network where all hosts are attached to the same physical cable
- a ring-shaped network where all hosts are attached to an upstream and a downstream node so that the entire network forms a ring
- a star-shaped network where all hosts are attached to the same device
- a wireless network where all hosts can send and receive frames using radio signals

These four basic physical organisations of Local Area Networks are shown graphically in the figure below. We will first focus on one physical organisation at a time.

Figure 6.3: Bus, ring and star-shaped Local Area Network

The common problem among all of these network organisations is how to efficiently share the access to the Local Area Network. If two devices send a frame at the same time, the two electrical, optical or radio signals that correspond to these frames will appear at the same time on the transmission medium and a receiver will not be able to decode either frame. Such simultaneous transmissions are called collisions. A collision may involve frames transmitted by two or more devices attached to the Local Area Network. Collisions are the main cause of errors in wired Local Area Networks.

All Local Area Network technologies rely on a Medium Access Control algorithm to regulate the transmissions to either minimise or avoid collisions. There are two broad families of Medium Access Control algorithms:

1. Deterministic or pessimistic MAC algorithms. These algorithms assume that collisions are a very severe problem and that they must be completely avoided. These algorithms ensure that at any time, at most one device is allowed to send a frame on the LAN. This is usually achieved by using a distributed protocol which elects one device that is allowed to transmit at each time. A deterministic MAC algorithm ensures that no collision will happen, but there is some overhead in regulating the transmission of all the devices attached to the LAN.
2. Stochastic or optimistic MAC algorithms. These algorithms assume that collisions are part of the normal operation of a Local Area Network. They aim to minimise the number of collisions, but they do not try to avoid all collisions. Stochastic algorithms are usually easier to implement than deterministic ones.

We first discuss a simple deterministic MAC algorithm and then we describe several important optimistic algorithms, before coming back to a distributed and deterministic MAC algorithm.

6.2.1 Static allocation methods

A first solution to share the available resources among all the devices attached to one Local Area Network is to define, a priori, the distribution of the transmission resources among the different devices. If N devices need to share the transmission capacities of a LAN operating at b Mbps, each device could be allocated a bandwidth of b/N Mbps.

Limited resources need to be shared in other environments than Local Area Networks. Since the first radio transmissions by Marconi more than one century ago, many applications that exchange information through radio signals have been developed. Each radio signal is an electromagnetic wave whose power is centered around a given frequency. The radio spectrum corresponds to frequencies ranging between roughly 3 kHz and 300 GHz. Frequency allocation plans negotiated among governments reserve most frequency ranges for specific applications such as broadcast radio, broadcast television, mobile communications, aeronautical radio navigation, amateur radio, satellite, etc. Each frequency range is then subdivided into channels and each channel can be reserved for a given application, e.g. a radio broadcaster in a given region.

Frequency Division Multiplexing (FDM) is a static allocation scheme in which a frequency is allocated to each device attached to the shared medium. As each device uses a different transmission frequency, collisions cannot occur. In optical networks, a variant of FDM called Wavelength Division Multiplexing (WDM) can be used. An optical fiber can transport light at different wavelengths without interference. With WDM, a different wavelength is allocated to each of the devices that share the same optical fiber.

Time Division Multiplexing (TDM) is a static bandwidth allocation method that was initially defined for the telephone network. In the fixed telephone network, a voice conversation is usually transmitted as a 64 Kbps signal. Thus, a telephone conversation generates 8 KBytes per second or one byte every 125 microseconds. Telephone conversations often need to be multiplexed together on a single line. For example, in Europe, thirty 64 Kbps voice signals are multiplexed over a single 2 Mbps (E1) line. This is done by using Time Division Multiplexing (TDM). TDM divides the transmission opportunities into slots. In the telephone network, a slot corresponds to 125 microseconds. A position inside each slot is reserved for each voice signal. The figure below illustrates TDM on a link that is used to carry four voice conversations. The vertical lines represent the slot boundaries and the letters the different voice conversations. One byte from each voice conversation is sent during each 125 microsecond slot. The byte corresponding to a given conversation is always sent at the same position in each slot.

Figure 6.4: Time-division multiplexing

TDM as shown above can be completely static, i.e. the same conversations always share the link, or dynamic. In the latter case, the two endpoints of the link must exchange messages specifying which conversation uses which byte inside each slot. Thanks to these signalling messages, it is possible to dynamically add and remove voice conversations from a given link.
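
The figures quoted above for the telephone network, and the fixed position of each conversation inside a slot, can be checked with a short Python sketch. The function names tdm_multiplex and tdm_demultiplex are hypothetical, each conversation is modelled as a byte string of equal length, and the remark that an E1 line carries 32 time slots (30 for voice, the remaining two for framing and signalling) is background knowledge added here rather than something stated in the text.

```python
VOICE_RATE_BPS = 64_000  # one voice conversation, as given in the text

# One byte (8 bits) of a 64 Kbps signal is produced every 125 microseconds.
SLOT_DURATION_US = 8 * 1_000_000 // VOICE_RATE_BPS
print(SLOT_DURATION_US)  # 125

# Assumption: an E1 line carries 32 time slots of 64 Kbps, 30 of them for voice.
print(32 * VOICE_RATE_BPS)  # 2048000 bits per second, i.e. the "2 Mbps" line


def tdm_multiplex(conversations):
    """Build the link byte stream: in each slot, one byte per conversation."""
    length = len(conversations[0])
    if any(len(c) != length for c in conversations):
        raise ValueError("all conversations must provide one byte per slot")
    return bytes(c[slot] for slot in range(length) for c in conversations)


def tdm_demultiplex(link, count):
    """Recover each conversation from its fixed position inside every slot."""
    return [link[position::count] for position in range(count)]


link = tdm_multiplex([b"AAA", b"BBB", b"CCC", b"DDD"])
print(link)                      # b'ABCDABCDABCD'
print(tdm_demultiplex(link, 4))  # [b'AAA', b'BBB', b'CCC', b'DDD']
```

Because the position inside the slot identifies the conversation, no header is needed on the link; the price is that a silent conversation still occupies its position.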

TDM and FDM are widely used in telephone networks to support fixed bandwidth conversations. Using them in Local Area Networks that support computers would probably be inefficient. Computers usually do not send information at a fixed rate. Instead, they often have an on-off behaviour. During the on period, the computer tries to send at the highest possible rate, e.g. to transfer a file. During the off period, which is often much longer than the on period, the computer does not transmit any packet. Using a static allocation scheme for computers attached to a LAN would lead to huge inefficiencies, as they would only be able to transmit at 1/N of the total bandwidth during their on period, despite the fact that the other computers are in their off period and thus do not need to transmit any information. The dynamic MAC algorithms discussed in the remainder of this chapter aim to solve this problem.

6.2.2 ALOHA

In the 1960s, computers were mainly mainframes with a few dozen terminals attached to them. These terminals were usually in the same building as the mainframe and were directly connected to it. In some cases, the terminals were installed in remote locations and connected through a modem attached to a dial-up line. The University of Hawaii chose a different organisation. Instead of using telephone lines to connect the distant terminals, they developed the first packet radio technology [Abramson1970]. Until then, computer networks were built on top of either the telephone network or physical cables. ALOHANet showed that it was possible to use radio signals to interconnect computers.

The first version of ALOHANet, described in [Abramson1970], operated as follows. First, the terminals and the mainframe exchanged fixed-length frames composed of 704 bits. Each frame contained 80 8-bit characters, some control bits and parity information to detect transmission errors. Two channels in the 400 MHz range were reserved for the operation of ALOHANet. The first channel was used by the mainframe to send frames to all terminals.
The second channel was shared among all terminals to send frames to the mainframe. As all terminals share the same transmission channel, there is a risk of collision. To deal with this problem as well as transmission errors, the mainframe verified the parity bits of the received frame and sent an acknowledgement on its channel for each correctly received frame. The terminals, on the other hand, had to retransmit the unacknowledged frames. As for TCP, retransmitting these frames immediately upon expiration of a fixed timeout is not a good approach, as several terminals may retransmit their frames at the same time, leading to a network collapse. A better approach, but still far from perfect, is for each terminal to wait a random amount of time after the expiration of its retransmission timeout. This avoids synchronisation among multiple retransmitting terminals.


The pseudo-code below shows the operation of an ALOHANet terminal. We use this Python syntax for all Medium Access Control algorithms described in this chapter. The algorithm is applied to each new frame that needs to be transmitted. It attempts to transmit a frame at most max times (while loop). Each transmission attempt is performed as follows: First, the frame is sent. Each frame is protected by a timeout. Then, the terminal waits for either a valid acknowledgement frame or the expiration of its timeout. If the terminal receives an acknowledgement, the frame has been delivered correctly and the algorithm terminates. Otherwise, the terminal waits for a random time and attempts to retransmit the frame.

    # ALOHA
    N = 1
    while N <= max:
        send(frame)
        wait(ack_on_return_channel or timeout)
        if ack_on_return_channel:
            break  # transmission was successful
        else:
            # timeout
            wait(random_time)
            N = N + 1
    else:
        # Too many transmission attempts
        pass

[Abramson1970] analysed the performance of ALOHANet under particular assumptions and found that ALOHANet worked well when the channel was lightly loaded. In this case, the frames are rarely retransmitted and the channel traffic, i.e. the total number of (correct and retransmitted) frames transmitted per unit of time, is close to the channel utilization, i.e. the number of correctly transmitted frames per unit of time. Unfortunately, the analysis also reveals that the channel utilization reaches its maximum at $\frac{1}{2e} \approx 0.184$ times the channel bandwidth. At higher utilization, ALOHANet becomes unstable and the network collapses due to collided retransmissions.
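The $\frac{1}{2e}$ bound can be reproduced numerically. The sketch below assumes the classical model in which frames are offered to the channel as a Poisson process (our assumption here; the text only mentions "particular assumptions"). In that model a frame is received correctly only if no other frame starts within two frame times, which gives a utilization of $G e^{-2G}$ for an offered traffic of $G$ frames per frame time. The name `aloha_utilization` is illustrative.

```python
import math


def aloha_utilization(g: float) -> float:
    """Channel utilization of pure ALOHA for an offered traffic g (frames per frame time).

    Assumes Poisson arrivals: a frame succeeds only if no other frame
    starts during a vulnerable period of two frame times.
    """
    return g * math.exp(-2 * g)


# Scan offered loads from 0.01 to 2.00 and keep the one with the best utilization.
loads = [i / 100 for i in range(1, 201)]
best = max(loads, key=aloha_utilization)
print(best, round(aloha_utilization(best), 3))  # 0.5 0.184
print(round(1 / (2 * math.e), 3))               # 0.184
```

Beyond an offered traffic of 0.5 frames per frame time, sending more frames lowers the utilization, which is the instability described above.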
Note: Amateur packet radio

Packet radio technologies have evolved in various directions since the first experiments performed at the University of Hawaii. The Amateur packet radio service developed by amateur radio operators is one of the descendants of ALOHANet. Many amateur radio operators are very interested in new technologies and they often spend countless hours developing new antennas or transceivers. When the first personal computers appeared, several amateur radio operators designed radio modems and their own datalink layer protocols [KPD1985] [BNT1997]. This network grew and it was possible to connect to servers in several European countries by only using packet radio relays. Some amateur radio operators also developed TCP/IP protocol stacks that were used over the packet radio service. Some parts of the amateur packet radio network are connected to the global Internet and use the 44.0.0.0/8 prefix.

Many improvements to ALOHANet have been proposed since the publication of [Abramson1970], and this technique, or some of its variants, is still found in wireless networks today. The slotted technique proposed in [Roberts1975] is important because it shows that a simple modification can significantly improve channel utilization. Instead of allowing all terminals to transmit at any time, [Roberts1975] proposed to divide time into slots and allow terminals to transmit only at the beginning of each slot. Each slot corresponds to the time required to transmit one fixed-size frame. In practice, these slots can be imposed by a single clock that is received by all terminals. In ALOHANet, it could have been located on the central mainframe. The analysis in [Roberts1975] reveals that this simple modification improves the channel utilization by a factor of two.
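The effect of slotting can be observed with a small simulation. The sketch below is a simplified model of our own, not the analysis of [Roberts1975]: each station transmits in a slot with probability 1/stations, independently of the others, and a slot is useful only when exactly one station transmits. The function name, the number of stations and the seed are illustrative choices.

```python
import random


def slotted_aloha_utilization(stations: int, slots: int, seed: int = 1) -> float:
    """Fraction of slots that carry exactly one frame.

    Simplified model: in every slot, each station transmits independently
    with probability 1/stations (an offered load of one frame per slot).
    """
    rng = random.Random(seed)
    p = 1.0 / stations
    useful = 0
    for _ in range(slots):
        transmitters = sum(1 for _ in range(stations) if rng.random() < p)
        if transmitters == 1:
            useful += 1
    return useful / slots


util = slotted_aloha_utilization(stations=20, slots=20_000)
print(round(util, 2))
```

With these parameters the measured value should be close to $(1 - \frac{1}{20})^{19} \approx 0.377$, i.e. near $\frac{1}{e} \approx 0.368$ and about twice the maximum of pure ALOHA.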
6.2.3 Carrier Sense Multiple Access

ALOHA and slotted ALOHA can easily be implemented, but unfortunately, they can only be used in networks that are very lightly loaded. Designing a network for a very low utilisation is possible, but it clearly increases the cost of the network. To overcome the problems of ALOHA, many Medium Access Control mechanisms have been proposed which improve channel utilization. Carrier Sense Multiple Access (CSMA) is a significant improvement compared to ALOHA. CSMA requires all nodes to listen to the transmission channel to verify that it is free before transmitting a frame [KT1975]. When a node senses the channel to be busy, it defers its transmission until the channel becomes free again. The pseudo-code below provides a more detailed description of the operation of CSMA.


    # persistent CSMA
    N = 1
    while N <= max:
        wait(channel_becomes_free)
        send(frame)
        wait(ack or timeout)
        if ack:
            break  # transmission was successful
        else:
            # timeout
            N = N + 1
    # end of while loop
        # Too many transmission attempts

The above pseudo-code is often called persistent CSMA [KT1975] as the terminal will continuously listen to the channel and transmit its frame as soon as the channel becomes free. Another important variant of CSMA is the non-persistent CSMA [KT1975]. The main difference between persistent and non-persistent CSMA described in the pseudo-code below is that a non-persistent CSMA node does not continuously listen to the channel to determine when it becomes free. When a non-persistent CSMA terminal senses the transmission channel to be busy, it waits for a random time before sensing the channel again. This improves channel utilization compared to persistent CSMA. With persistent CSMA, when two terminals sense the channel to be busy, they will both transmit (and thus cause a collision) as soon as the channel becomes free. With non-persistent CSMA, this synchronisation does not occur, as the terminals wait a random time after having sensed the transmission channel. However, the higher channel utilization achieved by non-persistent CSMA comes at the expense of a slightly higher waiting time in the terminals when the network is lightly loaded.

    # Non-persistent CSMA
    N = 1
    while N <= max:
        listen(channel)
        if free(channel):
            send(frame)
            wait(ack or timeout)
            if received(ack):
                break  # transmission was successful
            else:
                # timeout
                N = N + 1
        else:
            wait(random_time)
    # end of while loop
        # Too many transmission attempts

[KT1975] analyzes in detail the performance of several CSMA variants. Under some assumptions about the transmission channel and the traffic, the analysis compares ALOHA, slotted ALOHA, persistent and non-persistent CSMA. Under these assumptions, ALOHA achieves a channel utilization of only 18.4% of the channel capacity. Slotted ALOHA is able to use 36.8% of this capacity. Persistent CSMA improves the utilization by reaching 52.9% of the capacity while non-persistent CSMA achieves 81.5% of the channel capacity.
6.2.4 Carrier Sense Multiple Access with Collision Detection

CSMA improves channel utilization compared to ALOHA. However, the performance can still be improved, especially in wired networks. Consider the situation of two terminals that are connected to the same cable. This cable could, for example, be a coaxial cable as in the early days of Ethernet [Metcalfe1976]. It could also be built with twisted pairs. Before extending CSMA, it is useful to understand more intuitively how frames are transmitted in such a network and how collisions can occur. The figure below illustrates the physical transmission of a frame on such a cable. To transmit its frame, host A must send an electrical signal on the shared medium. The first step is thus to begin the transmission of the electrical signal. This is point (1) in the figure below. This electrical signal will travel along the cable. Although electrical signals travel fast, we know that information cannot travel faster than the speed of light (i.e. 300,000 kilometers per second). On a coaxial cable, an electrical signal is slower than the speed of light and 200,000 kilometers per second is a reasonable estimation. This implies that if the cable has a length of one kilometer, the electrical signal will need 5 microseconds to travel from one end of the cable to the other. The ends of coaxial cables are equipped with termination points that ensure that the electrical signal is not reflected back to its source. This is illustrated at point (3) in the figure, where the electrical signal has reached the left endpoint and host B. At this point, B starts to receive the frame being transmitted by A. Notice that there is a delay between the transmission of a bit on host A and its reception by host B. If there were other hosts attached to the cable, they would receive the first bit of the frame at slightly different times. As we will see later, this timing difference is a key problem for MAC algorithms. At point (4), the electrical signal has reached both ends of the cable and occupies it completely. Host A continues to transmit the electrical signal until the end of the frame. As shown at point (5), when the sending host stops its transmission, the electrical signal corresponding to the end of the frame leaves the coaxial cable. The channel becomes empty again once the entire electrical signal has been removed from the cable.

Figure 6.5: Frame transmission on a shared bus

Now that we have looked at how a frame is actually transmitted as an electrical signal on a shared bus, it is interesting to look in more detail at what happens when two hosts transmit a frame at almost the same time. This is illustrated in the figure below, where hosts A and B start their transmission at the same time (point (1)). At this time, if host C senses the channel, it will consider it to be free. This will not last a long time and at point (2) the electrical signals from both host A and host B reach host C. The combined electrical signal (shown graphically as the superposition of the two curves in the figure) cannot be decoded by host C. Host C detects a collision, as it receives a signal that it cannot decode. Since host C cannot decode the frames, it cannot determine which hosts are sending the colliding frames. Note that host A (and host B) will detect the collision after host C (point (3) in the figure below).
As shown above, hosts detect collisions when they receive an electrical signal that they cannot decode. In a wired network, a host is able to detect such a collision both while it is listening (e.g. like host C in the figure above) and also while it is sending its own frame. When a host transmits a frame, it can compare the electrical signal that it transmits with the electrical signal that it senses on the wire. At points (1) and (2) in the figure above, host A senses only its own signal. At point (3), it senses an electrical signal that differs from its own signal and can thus detect the collision. At this point, its frame is corrupted and it can stop its transmission. The ability to detect collisions while transmitting is the starting point for the Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Medium Access Control algorithm, which is used in Ethernet networks [Metcalfe1976] [802.3]. When an Ethernet host detects a collision while it is transmitting, it immediately stops its transmission. Compared with pure CSMA, CSMA/CD is an important improvement since when collisions occur, they only last until the colliding hosts have detected them and stopped their transmission. In practice, when a host detects a collision, it sends a special jamming signal on the cable to ensure that all hosts have detected the collision.


Figure 6.6: Frame collision on a shared bus

To better understand these collisions, it is useful to analyse what would be the worst collision on a shared bus network. Let us consider a wire with two hosts attached at both ends, as shown in the figure below. Host A starts to transmit its frame and its electrical signal is propagated on the cable. Its propagation time depends on the physical length of the cable and the speed of the electrical signal. Let us use $\tau$ to represent this propagation delay in seconds. Slightly less than $\tau$ seconds after the beginning of the transmission of A's frame, i.e. at time $\tau - \epsilon$, B decides to start transmitting its own frame. After $\epsilon$ seconds, B senses A's frame, detects the collision and stops transmitting. The beginning of B's frame travels on the cable until it reaches host A. Host A can thus detect the collision at time $\tau - \epsilon + \tau \approx 2 \times \tau$. An important point to note is that a collision can only occur during the first $2 \times \tau$ seconds of its transmission. If a collision did not occur during this period, it cannot occur afterwards since the transmission channel is busy after $\tau$ seconds and CSMA/CD hosts sense the transmission channel before transmitting their frame.

Figure 6.7: The worst collision on a shared bus

Furthermore, on the wired networks where CSMA/CD is used, collisions are almost the only cause of transmission errors that affect frames. Transmission errors that only affect a few bits inside a frame seldom occur in these wired networks. For this reason, the designers of CSMA/CD chose to completely remove the acknowledgement frames in the datalink layer. When a host transmits a frame, it verifies whether its transmission has been affected by a collision. If not, given the negligible Bit Error Ratio of the underlying network, it assumes that the frame was received correctly by its destination. Otherwise the frame is retransmitted after some delay.
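The worst-case collision described above can be turned into numbers. The sketch below uses the 200,000 kilometers per second estimate quoted earlier for coaxial cable; the one-kilometer cable and the 10 Mbps bit rate are illustrative values of ours, and the function names are hypothetical.

```python
SIGNAL_SPEED_KM_S = 200_000  # estimate for coaxial cable quoted in the text


def propagation_delay_s(cable_km: float) -> float:
    """One-way propagation delay (tau) between the two ends of the cable."""
    return cable_km / SIGNAL_SPEED_KM_S


def collision_window_s(cable_km: float) -> float:
    """Longest time after which a sender can still learn of a collision (2 * tau)."""
    return 2 * propagation_delay_s(cable_km)


def bits_sent_during_window(cable_km: float, bit_rate_bps: float) -> float:
    """Number of bits a host transmits while a collision may still be detected."""
    return collision_window_s(cable_km) * bit_rate_bps


print(round(propagation_delay_s(1.0) * 1e6, 3))       # 5.0 (microseconds)
print(round(collision_window_s(1.0) * 1e6, 3))        # 10.0 (microseconds)
print(round(bits_sent_during_window(1.0, 10e6), 3))   # 100.0 (bits at 10 Mbps)
```

On this hypothetical network, a host that stops transmitting after fewer than 100 bits could finish its frame before the colliding signal reaches it.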
Removing acknowledgements is an interesting optimisation as it reduces the number of frames that are exchanged on the network and the number of frames that need to be processed by the hosts. However, to use this optimisation, we must ensure that all hosts will be able to detect all the collisions that affect their frames. The problem is important for short frames. Let us consider two hosts, A and B, that are sending a small frame to host C as illustrated in the figure below. If the frames sent by A and B are very short, the situation illustrated below may occur. Hosts A and B send their frame and stop transmitting (point (1)). When the two short frames arrive at the location of host C, they collide and host C cannot decode them (point (2)). The two frames are absorbed by the ends of the wire. Neither host A nor host B has detected the collision. They both consider their frame to have been received correctly by its destination.

Figure 6.8: The short-frame collision problem

To solve this problem, networks using CSMA/CD require hosts to transmit for at least $2 \times \tau$ seconds. Since the network transmission speed is fixed for a given network technology, this implies that a technology that uses CSMA/CD enforces a minimum frame size. In the most popular CSMA/CD technology, Ethernet, $2 \times \tau$ is called the slot time [1].

The last innovation introduced by CSMA/CD is the computation of the retransmission timeout. As for ALOHA, this timeout cannot be fixed, otherwise hosts could become synchronised and always retransmit at the same time. Setting such a timeout is always a compromise between the network access delay and the amount of collisions. A short timeout would lead to a low network access delay but with a higher risk of collisions. On the other hand, a long timeout would cause a long network access delay but a lower risk of collisions. The binary exponential back-off algorithm was introduced in CSMA/CD networks to solve this problem.

To understand binary exponential back-off, let us consider a collision caused by exactly two hosts. Once it has detected the collision, a host can either retransmit its frame immediately or defer its transmission for some time.
If each colliding host flips a coin to decide whether to retransmit immediately or to defer its retransmission, four cases are possible:

1. Both hosts retransmit immediately and a new collision occurs
2. The first host retransmits immediately and the second defers its retransmission
3. The second host retransmits immediately and the first defers its retransmission
4. Both hosts defer their retransmission and a new collision occurs

In the second and third cases, the two hosts have obtained different coin flips. The delay chosen by the host that defers its retransmission should be long enough to ensure that its retransmission will not collide with the immediate retransmission of the other host. However the delay should not be longer than the time necessary to avoid the collision, because if both hosts decide to defer their transmission, the network will be idle during this delay. The slot time is the optimal delay since it is the shortest delay that ensures that the first host will be able to retransmit its frame completely without any collision.

[1] This name should not be confused with the duration of a transmission slot in slotted ALOHA. In CSMA/CD networks, the slot time is the time during which a collision can occur at the beginning of the transmission of a frame. In slotted ALOHA, the duration of a slot is the transmission time of an entire fixed-size frame.

If two hosts are competing, the algorithm above will avoid a second collision 50% of the time. However, if the network is heavily loaded, several hosts may be competing at the same time. In this case, the hosts should be able to automatically adapt their retransmission delay. The binary exponential back-off performs this adaptation based on the number of collisions that have affected a frame. After the first collision, the host flips a coin and waits 0 or 1 slot time. After the second collision, it generates a random number and waits 0, 1, 2 or 3 slot times, etc. The maximum duration of the waiting time is doubled after each collision. The complete pseudo-code for the CSMA/CD algorithm is shown below.

    # CSMA/CD pseudo-code
    N = 1
    while N <= max:
        wait(channel_becomes_free)
        send(frame)
        wait_until(end_of_frame or collision)
        if collision_detected:
            stop_transmitting()
            send(jamming)
            k = min(10, N)
            r = random(0, 2**k - 1)  # random number of slot times
            wait(r * slotTime)
            N = N + 1
        else:
            wait(inter_frame_delay)
            break
    # end of while loop
        # Too many transmission attempts

The inter-frame delay used in this pseudo-code is a short delay corresponding to the time required by a network adapter to switch from transmit to receive mode. It is also used to prevent a host from sending a continuous stream of frames without leaving any transmission opportunities for other hosts on the network. This contributes to the fairness of CSMA/CD. Despite this delay, there are still conditions where CSMA/CD is not completely fair [RY1994]. Consider for example a network with two hosts: a server sending long frames and a client sending acknowledgments. Measurements reported in [RY1994] have shown that there are situations where the client could suffer from repeated collisions that lead it to wait for long periods of time due to the exponential back-off algorithm.
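The back-off computation of the CSMA/CD pseudo-code can be isolated in a small runnable function. This is a minimal sketch: `backoff_slots` is a name of our own, the cap at ten doublings follows the `k = min(10, N)` line of the pseudo-code, and the fixed seed is only there to make runs repeatable.

```python
import random


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Number of slot times to wait after the given number of collisions.

    The range doubles after each collision and stops growing after the tenth.
    """
    k = min(10, collisions)
    return rng.randint(0, 2 ** k - 1)  # randint includes both bounds


rng = random.Random(42)  # fixed seed so that runs are repeatable
for n in (1, 2, 3, 10, 16):
    upper = 2 ** min(10, n) - 1
    print(n, "collisions: wait between 0 and", upper, "slots, drawn", backoff_slots(n, rng))

# Two hosts after their first collision: they collide again only when they
# draw the same value, which happens in 2 of the 4 equally likely cases.
outcomes = [(a, b) for a in (0, 1) for b in (0, 1)]
print(sum(a == b for a, b in outcomes) / len(outcomes))  # 0.5
```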
6.2.5 Carrier Sense Multiple Access with Collision Avoidance

The Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) Medium Access Control algorithm was designed for the popular WiFi wireless network technology [802.11]. CSMA/CA also senses the transmission channel before transmitting a frame. Furthermore, CSMA/CA tries to avoid collisions by carefully tuning the timers used by CSMA/CA devices.

CSMA/CA uses acknowledgements like CSMA. Each frame contains a sequence number and a CRC. The CRC is used to detect transmission errors while the sequence number is used to avoid frame duplication. When a device receives a correct frame, it returns a special acknowledgement frame to the sender. CSMA/CA introduces a small delay, named Short Inter Frame Spacing (SIFS), between the reception of a frame and the transmission of the acknowledgement frame. This delay corresponds to the time that is required to switch the radio of a device between the reception and transmission modes.

Compared to CSMA, CSMA/CA defines more precisely when a device is allowed to send a frame. First, CSMA/CA defines two delays: DIFS and EIFS. To send a frame, a device must first wait until the channel has been idle for at least the Distributed Coordination Function Inter Frame Space (DIFS) if the previous frame was received correctly. However, if the previously received frame was corrupted, this indicates that there are collisions and the device must sense the channel idle for at least the Extended Inter Frame Space (EIFS), with SIFS < DIFS < EIFS.

The figure below shows the basic operation of CSMA/CA devices. Before transmitting, host A verifies that the channel is empty for a long enough period. Then, it sends its data frame. After checking the validity of the received frame, the recipient sends an acknowledgement frame after a short SIFS delay. Host C, which does not participate in the frame exchange, senses the channel to be busy at the beginning of the data frame. Host C can use this information to determine how long the channel will be busy for. Note that SIFS < DIFS < EIFS.

To deal with this problem, CSMA/CA relies on a backoff timer. This backoff timer is a random delay that is chosen by each device in a range that depends on the number of retransmissions for the current frame. The range grows exponentially with the retransmissions as in CSMA/CD. The minimum range for the backoff timer is $[0, 7 \times slotTime]$ where the slotTime is a parameter that depends on the underlying physical layer. Compared to CSMA/CD's exponential backoff, there are two important differences to notice. First, the initial range for the backoff timer is seven times larger. This is because it is impossible in CSMA/CA to detect collisions as they happen. With CSMA/CA, a collision may affect the entire frame while with CSMA/CD it can only affect the beginning of the frame. Second, a CSMA/CA device must regularly sense the transmission channel during its backoff timer. If the channel becomes busy (e.g. because another device is transmitting), then the backoff timer must be frozen until the channel becomes free again. Once the channel becomes free, the backoff timer resumes. This is in contrast with CSMA/CD where the backoff is recomputed after each collision. This is illustrated in the figure below. Host A chooses a smaller backoff than host C. When C senses the channel to be busy, it freezes its backoff timer and only resumes it once the channel is free again.

Figure 6.11: Detailed example with CSMA/CA

The pseudo-code below summarises the operation of a CSMA/CA device. The values of the SIFS, DIFS, EIFS and slotTime depend on the underlying physical layer technology [802.11].

    # CSMA/CA simplified pseudo-code
    N = 1
    while N <= max:
        waitUntil(free(channel))
        if correct(last_frame):
            wait(channel_free_during_t >= DIFS)
        else:
            wait(channel_free_during_t >= EIFS)
        backoff_time = int(random(0, min(255, 7 * 2**(N - 1)))) * slotTime
        wait(channel_free_during_backoff_time)
        # backoff timer is frozen while channel is sensed to be busy
        send(frame)
        wait(ack or timeout)
        if received(ack):
            # frame received correctly
            break
        else:
            # retransmission required
            N = N + 1
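Two aspects of this pseudo-code can be made concrete with runnable Python: how the backoff range grows, and how freezing the timer delays a transmission. The sketch below is an illustration under simplifying assumptions of ours: time is counted in whole slots, the channel state is given as a list of booleans, and both function names are hypothetical.

```python
def backoff_upper_bound(attempt: int) -> int:
    """Upper bound of the backoff range, in slots, for the given attempt (1 = first)."""
    return min(255, 7 * 2 ** (attempt - 1))


def slots_until_transmission(backoff: int, channel_busy: list[bool]) -> int:
    """Count the slots that elapse before a device with the given backoff may send.

    channel_busy[i] tells whether the channel is sensed busy during slot i.
    The counter only decreases during idle slots; it is frozen otherwise.
    """
    remaining = backoff
    elapsed = 0
    for busy in channel_busy:
        if remaining == 0:
            break
        if not busy:
            remaining -= 1
        elapsed += 1
    if remaining > 0:
        raise ValueError("channel trace too short for this backoff")
    return elapsed


print([backoff_upper_bound(n) for n in range(1, 8)])
# [7, 14, 28, 56, 112, 224, 255]

# A device that drew a backoff of 3 slots while another device transmits
# during slots 2 to 4 has to wait 6 slots in total.
trace = [False, False, True, True, True, False, False, False]
print(slots_until_transmission(3, trace))  # 6
```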


Another problem faced by wireless networks is often called the hidden station problem. In a wireless network, radio signals are not always propagated the same way in all directions. For example, two devices separated by a wall may not be able to receive each other's signal while they could both be receiving the signal produced by a third host. This is illustrated in the figure below, but it can happen in other environments. For example, two devices that are on different sides of a hill may not be able to receive each other's signal while they are both able to receive the signal sent by a station at the top of the hill. Furthermore, the radio propagation conditions may change with time. For example, a truck may temporarily block the communication between two nearby devices.

Figure 6.12: The hidden station problem

To avoid collisions in these situations, CSMA/CA allows devices to reserve the transmission channel for some time. This is done by using two control frames: Request To Send (RTS) and Clear To Send (CTS). Both are very short frames to minimize the risk of collisions. To reserve the transmission channel, a device sends an RTS frame to the intended recipient of the data frame. The RTS frame contains the duration of the requested reservation. The recipient replies, after a SIFS delay, with a CTS frame which also contains the duration of the reservation. As the duration of the reservation has been sent in both RTS and CTS, all hosts that could collide with either the transmission or the reception of the data frame are informed of the reservation. They can compute the total duration of the transmission and defer their access to the transmission channel until then. This is illustrated in the figure below where host A reserves the transmission channel to send a data frame to host B. Host C notices the reservation and defers its transmission.

Figure 6.13: Reservations with CSMA/CA

The utilization of the reservations with CSMA/CA is an optimisation that is useful when collisions are frequent. If there are few collisions, the time required to transmit the RTS and CTS frames can become significant, in particular when short frames are exchanged. Some devices only turn on RTS/CTS after transmission errors.
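The reason why the duration is carried in both control frames can be illustrated with a toy model of the hidden station scenario. Everything in the sketch below is an assumption made for illustration: the three-station topology, the `IN_RANGE` table, the `reserve` function, and the simplification that the RTS and CTS frames themselves take no time.

```python
# Which stations can hear which (hypothetical topology: A and C are hidden
# from each other, but both are in range of B).
IN_RANGE = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}


def reserve(sender: str, receiver: str, now: float, duration: float) -> dict[str, float]:
    """Return, for every third-party station, the time until which it must defer.

    The sender's RTS is heard by the stations in range of the sender and the
    receiver's CTS by the stations in range of the receiver. Both frames
    carry the duration of the reservation.
    """
    defer_until: dict[str, float] = {}
    for control_frame_source in (sender, receiver):
        for station in IN_RANGE[control_frame_source]:
            if station not in (sender, receiver):
                defer_until[station] = now + duration
    return defer_until


# C never hears A's RTS, but it learns about the reservation from B's CTS.
print(reserve("A", "B", now=0.0, duration=2.5))  # {'C': 2.5}
```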


6.2.6 Deterministic Medium Access Control algorithms

During the 1970s and 1980s, there were huge debates in the networking community about the best suited Medium Access Control algorithms for Local Area Networks. The optimistic algorithms that we have described until now were relatively easy to implement when they were designed. From a performance perspective, mathematical models and simulations showed the ability of these optimistic techniques to sustain load. However, none of the optimistic techniques is able to guarantee that a frame will be delivered within a given delay bound and some applications require predictable transmission delays. The deterministic MAC algorithms were considered by a fraction of the networking community as the best solution to fulfill the needs of Local Area Networks.

The proponents of both the deterministic and the optimistic techniques lobbied to develop standards for Local Area Networks that would incorporate their solution. Instead of trying to find an impossible compromise between these diverging views, the IEEE 802 committee that was chartered to develop Local Area Network standards chose to work in parallel on three different LAN technologies and created three working groups. The IEEE 802.3 working group became responsible for CSMA/CD. The proponents of deterministic MAC algorithms agreed on the basic principle of exchanging special frames called tokens between devices to regulate the access to the transmission medium. However, they did not agree on the most suitable physical layout for the network. IBM argued in favor of ring-shaped networks while the manufacturing industry, led by General Motors, argued in favor of a bus-shaped network. This led to the creation of the IEEE 802.4 working group to standardise Token Bus networks and the IEEE 802.5 working group to standardise Token Ring networks. Although these techniques are not widely used anymore today, the principles behind a token-based protocol are still important.
The IEEE 802.5 Token Ring technology is defined in [802.5]. We use Token Ring as an example to explain the principles of the token-based MAC algorithms in ring-shaped networks. Other ring-shaped networks include the almost defunct FDDI [Ross1989] or the more recent Resilient Packet Ring [DYGU2004]. A good survey of the token ring networks may be found in [Bux1989].

A Token Ring network is composed of a set of stations that are attached to a unidirectional ring. The basic principle of the Token Ring MAC algorithm is that two types of frames travel on the ring: tokens and data frames. When the Token Ring starts, one of the stations sends the token. The token is a small frame that represents the authorization to transmit data frames on the ring. To transmit a data frame on the ring, a station must first capture the token by removing it from the ring. As only one station can capture the token at a time, the station that owns the token can safely transmit a data frame on the ring without risking collisions. After having transmitted its frame, the station must remove it from the ring and resend the token so that other stations can transmit their own frames.

Figure 6.14: A Token Ring network

While the basic principles of the Token Ring are simple, there are several subtle implementation details that add complexity to Token Ring networks. To understand these details, let us analyse the operation of a Token Ring interface on a station. A Token Ring interface serves three different purposes. Like other LAN interfaces, it must be able to send and receive frames. In addition, a Token Ring interface is part of the ring, and as such, it must be able to forward the electrical signal that passes on the ring even when its station is powered off.

When powered on, Token Ring interfaces operate in two different modes: listen and transmit. When operating in listen mode, a Token Ring interface receives an electrical signal from its upstream neighbour on the ring, introduces a delay equal to the transmission time of one bit on the ring and regenerates the signal before sending it to its downstream neighbour on the ring.


The first problem faced by a Token Ring network is that as the token represents the authorization to transmit, it must continuously travel on the ring when no data frame is being transmitted. Let us assume that a token has been produced and sent on the ring by one station. In Token Ring networks, the token is a 24-bit frame whose structure is shown below.

Figure 6.15: 802.5 token format

The token is composed of three fields. First, the Starting Delimiter is the marker that indicates the beginning of a frame. The first Token Ring networks used Manchester coding and the Starting Delimiter contained both symbols representing 0 and symbols that do not represent bits. The last field is the Ending Delimiter which marks the end of the token. The Access Control field is present in all frames, and contains several flags. The most important is the Token bit that is set in token frames and reset in other frames.

Let us consider the five-station network depicted in the figure "A Token Ring network" above and assume that station S1 sends a token. If we neglect the propagation delay on the inter-station links, as each station introduces a one-bit delay, the first bit of the frame would return to S1 while it sends the fifth bit of the token. If station S1 is powered off at that time, only the first five bits of the token will travel on the ring. To avoid this problem, there is a special station called the Monitor on each Token Ring. To ensure that the token can travel forever on the ring, this Monitor inserts a delay that is equal to at least 24 bit transmission times. If station S3 was the Monitor in the figure "A Token Ring network", S1 would have been able to transmit the entire token before receiving the first bit of the token from its upstream neighbour.
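The role of the Monitor's delay can be checked with a short calculation. The sketch below is a simplified model of our own: it neglects propagation delay, as the text does, counts one bit of storage per station, and uses hypothetical function names.

```python
TOKEN_BITS = 24  # length of an 802.5 token, as stated in the text


def ring_latency_bits(stations: int, monitor_delay_bits: int = 0) -> int:
    """Number of bits that fit on the ring when propagation delay is neglected.

    Each station in listen mode holds one bit; the Monitor may add more.
    """
    return stations + monitor_delay_bits


def token_fits(stations: int, monitor_delay_bits: int = 0) -> bool:
    """True if the ring can hold the whole token at once."""
    return ring_latency_bits(stations, monitor_delay_bits) >= TOKEN_BITS


print(ring_latency_bits(5), token_fits(5))          # 5 False
print(ring_latency_bits(5, 24), token_fits(5, 24))  # 29 True
```

Without the extra delay, the five-station ring of the example can hold only five of the token's 24 bits, so the token survives only as long as its sender keeps transmitting it.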
Now that we have explained how the token can be forwarded on the ring, let us analyse how a station can capture a token to transmit a data frame. For this, we need some information about the format of the data frames. An 802.5 data frame begins with the Starting Delimiter followed by the Access Control field whose Token bit is reset, a Frame Control field that allows for the definition of several types of frames, destination and source address, a payload, a CRC, the Ending Delimiter and a Frame Status field. The format of the Token Ring data frames is illustrated below.

Figure 6.16: 802.5 data frame format

To capture a token, a station must operate in Listen mode. In this mode, the station receives bits from its upstream neighbour. If the bits correspond to a data frame, they must be forwarded to the downstream neighbour. If they correspond to a token, the station can capture it and transmit its data frame. Both the data frame and the token are encoded as a bit string beginning with the Starting Delimiter followed by the Access Control field. When the station receives the first bit of a Starting Delimiter, it cannot know whether this is a data frame or a token and must forward the entire delimiter to its downstream neighbour. It is only when it receives the fourth bit of the Access Control field (i.e. the Token bit) that the station knows whether the frame is a data frame or a token. If the Token bit is reset, it indicates a data frame and the remaining bits of the data frame must be forwarded to the downstream station. Otherwise (Token bit is set), this is a token and the station can capture it by resetting the bit that is currently in its buffer. Thanks to this modification, the beginning of the token is now the beginning of a data frame and the station can switch to Transmit mode and send its data frame starting at the fifth bit of the Access Control field. Thus, the one-bit delay introduced by each Token Ring station plays a key role in enabling the stations to efficiently capture the token.

After having transmitted its data frame, the station must remain in Transmit mode until it has received the last bit of its own data frame. This ensures that the bits sent by a station do not remain in the network forever. A data frame sent by a station in a Token Ring network passes in front of all stations attached to the network. Each station can detect the data frame and analyse the destination address to possibly capture the frame.

The Frame Status field that appears after the Ending Delimiter is used to provide acknowledgements without requiring special frames. The Frame Status contains two flags: A and C. Both flags are reset when a station sends a data frame. These flags can be modified by their recipients. When a station senses its address as the destination address of a frame, it can capture the frame, check its CRC and place it in its own buffers. The destination of a frame must set the A bit (resp. C bit) of the Frame Status field once it has seen (resp. copied) a data frame. By inspecting the Frame Status of the returning frame, the sender can verify whether its frame has been received correctly by its destination.
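The token capture and the interpretation of the Frame Status flags can be sketched in a few lines. The code below follows the conventions of the text rather than the bit layout of the standard: it assumes that the Access Control field is one byte, that bits are counted from the most significant one, and that the Token bit is set in tokens. The function names and the wording of the reports are ours.

```python
TOKEN_BIT = 0b0001_0000  # fourth bit of the Access Control byte, counted from the left


def is_token(access_control: int) -> bool:
    """Follow the convention of the text: the Token bit is set in tokens."""
    return bool(access_control & TOKEN_BIT)


def capture(access_control: int) -> int:
    """Reset the Token bit, turning the start of a token into the start of a data frame."""
    return access_control & ~TOKEN_BIT & 0xFF


def delivery_report(a_bit: bool, c_bit: bool) -> str:
    """Interpret the Frame Status flags of a frame that returned to its sender."""
    if a_bit and c_bit:
        return "destination saw the frame and copied it"
    if a_bit:
        return "destination saw the frame but did not copy it"
    return "no station recognised the destination address"


ac = 0b0001_0000
print(is_token(ac), is_token(capture(ac)))  # True False
print(delivery_report(True, False))         # destination saw the frame but did not copy it
```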
The text above describes the basic operation of a Token Ring network when all stations work correctly. Unfortunately, a real Token Ring network must be able to handle various types of anomalies and this increases the complexity of Token Ring stations. We briefly list the problems and outline their solutions below. A detailed description of the operation of Token Ring stations may be found in [802.5]. The first problem is when all the stations attached to the network start. One of them must bootstrap the network by sending the first token. For this, all stations implement a distributed election mechanism that is used to select the Monitor. Any station can become a Monitor. The Monitor manages the Token Ring network and ensures that it operates correctly. Its first role is to introduce a delay of 24 bit transmission times to ensure that the token can travel smoothly on the ring. Second, the Monitor sends the first token on the ring. It must also verify that the token passes regularly. According to the Token Ring standard [802.5], a station cannot retain the token to transmit data frames for a duration longer than the Token Holding Time (THT) (slightly less than 10 milliseconds). On a network containing N stations, the Monitor must receive the token at least every $N \times THT$ seconds. If the Monitor does not receive a token during such a period, it cuts the ring for some time and then reinitialises the ring and sends a token.

Several other anomalies may occur in a Token Ring network. For example, a station could capture a token and be powered off before having resent the token. Another station could have captured the token, sent its data frame and be powered off before receiving all of its data frame. In this case, the bit string corresponding to the end of a frame would remain in the ring without being removed by its sender. Several techniques are defined in [802.5] to allow the Monitor to handle all these problems. If, unfortunately, the Monitor fails, another station will be elected to become the new Monitor.

6.3 Datalink layer technologies

In this section, we review the key characteristics of several datalink layer technologies. We discuss in more detail the technologies that are widely used today. A detailed survey of all datalink layer technologies would be outside the scope of this book.
6.3.1 The Point-to-Point Protocol

Many point-to-point datalink layers [2] have been developed, starting in the 1960s [McFadyen1976]. In this section, we focus on the protocols that are often used to transport IP packets between hosts or routers that are directly connected by a point-to-point link. This link can be a dedicated physical cable, a leased line through the telephone network or a dial-up connection with modems on the two communicating hosts.

[2] LAPB and HDLC were widely used datalink layer protocols.


The first solution to transport IP packets over a serial line was proposed in RFC 1055 and is known as Serial Line IP (SLIP). SLIP is a simple character stuffing technique applied to IP packets. SLIP defines two special characters: END (decimal 192) and ESC (decimal 219). END appears at the beginning and at the end of each transmitted IP packet, and the sender uses ESC to escape each END character that appears inside a transmitted IP packet: RFC 1055 replaces such an END by the two characters ESC, decimal 220 and a data ESC by the two characters ESC, decimal 221. SLIP only supports the transmission of IP packets and it assumes that the two communicating hosts/routers have been manually configured with each other's IP address. SLIP was mainly used over links offering bandwidth of often less than 20 Kbps. On such a low bandwidth link, sending 20 bytes of IP header followed by 20 bytes of TCP header for each TCP segment takes a lot of time (40 bytes, i.e. 320 bits, need 16 milliseconds at 20 Kbps). This initiated the development of a family of compression techniques to efficiently compress the TCP/IP headers. The first header compression technique proposed in RFC 1144 was designed to exploit the redundancy between several consecutive segments that belong to the same TCP connection. In all these segments, the IP addresses and port numbers are always the same. Furthermore, fields such as the sequence and acknowledgement numbers do not change in a random way. RFC 1144 defined simple techniques to reduce the redundancy found in successive segments. The development of header compression techniques continued and there are still improvements being developed now RFC 5795.

While SLIP was implemented and used in some environments, it had several limitations discussed in RFC 1055. The Point-to-Point Protocol (PPP) was designed shortly after and is specified in RFC 1548. PPP aims to support IP and other network layer protocols over various types of serial lines. PPP is in fact a family of three protocols that are used together:

1. The Point-to-Point Protocol defines the framing technique to transport network layer packets.
2. The Link Control Protocol that is used to negotiate options and authenticate the session by using username and password or other types of credentials.
3. The Network Control Protocol that is specific for each network layer protocol. It is used to negotiate options that are specific for each protocol. For example, IPv4's NCP (RFC 1332) can negotiate the IPv4 address to be used and, with the extensions of RFC 1877, the IPv4 address of the DNS resolver. IPv6's NCP is defined in RFC 5072.

The PPP framing RFC 1662 was inspired by the datalink layer protocols standardised by ITU-T and ISO. A typical PPP frame is composed of the fields shown in the figure below. A PPP frame starts with a one byte flag containing 01111110. PPP can use bit stuffing or character stuffing depending on the environment where the protocol is used. The address and control fields are present for backward compatibility reasons. The 16 bit Protocol field contains the identifier [3] of the network layer protocol that is carried in the PPP frame. 0x002d is used for an IPv4 packet compressed with RFC 1144 while 0x002f is used for an uncompressed IPv4 packet. 0xc021 is used by the Link Control Protocol, 0xc023 is used by the Password Authentication Protocol (PAP). 0x0057 is used for IPv6 packets. PPP supports variable length packets, but LCP can negotiate a maximum packet length. The payload is followed by a Frame Check Sequence. The default is a 16 bits CRC, but some implementations can negotiate a 32 bits CRC. The frame ends with the 01111110 flag.

Figure 6.17: PPP frame format

PPP played a key role in allowing Internet Service Providers to provide dial-up access over modems in the late 1990s and early 2000s. ISPs operated modem banks connected to the telephone network. For these ISPs, a key issue was to authenticate each user connected through the telephone network. This authentication was performed by using the Extensible Authentication Protocol (EAP) defined in RFC 3748. EAP is a simple, but extensible protocol that was initially used by access routers to authenticate the users connected through dialup lines. Several authentication methods, starting from the simple username/password pairs to more complex schemes, have been defined and implemented. When ISPs started to upgrade their physical infrastructure to provide Internet access over Asymmetric Digital Subscriber Lines (ADSL), they tried to reuse their existing authentication (and billing) systems. To meet these requirements, the IETF developed specifications to allow PPP frames to be transported over other networks than the point-to-point links for which PPP was designed. Nowadays, most ADSL deployments use PPP over either ATM RFC 2364 or Ethernet RFC 2516.

[3] The IANA maintains the registry of all assigned PPP protocol fields at: http://www.iana.org/assignments/ppp-numbers

6.3.2 Ethernet

Ethernet was designed in the 1970s at the Palo Alto Research Center [Metcalfe1976]. The first prototype [4] used a coaxial cable as the shared medium and 3 Mbps of bandwidth. Ethernet was improved during the late 1970s and, in the 1980s, Digital Equipment, Intel and Xerox published the first official Ethernet specification [DIX]. This specification defines several important parameters for Ethernet networks. The first decision was to standardise the commercial Ethernet at 10 Mbps. The second decision was the duration of the slot time. In Ethernet, a long slot time enables networks to span a long distance but forces the host to use a larger minimum frame size. The compromise was a slot time of 51.2 microseconds, which corresponds to a minimum frame size of 64 bytes.

The third decision was the frame format. The experimental 3 Mbps Ethernet network built at Xerox used short frames containing 8 bit source and destination address fields, a 16 bit type indication, up to 554 bytes of payload and a 16 bit CRC. Using 8 bit addresses was suitable for an experimental network, but it was clearly too small for commercial deployments. Although the initial Ethernet specification [DIX] only allowed up to 1024 hosts on an Ethernet network, it also recommended three important changes compared to the networking technologies that were available at that time. The first change was to require each host attached to an Ethernet network to have a globally unique datalink layer address. Until then, datalink layer addresses were manually configured on each host.
[DP1981] went against that state of the art and noted "Suitable installation-specific administrative procedures are also needed for assigning numbers to hosts on a network. If a host is moved from one network to another it may be necessary to change its host number if its former number is in use on the new network. This is easier said than done, as each network must have an administrator who must record the continuously changing state of the system (often on a piece of paper tacked to the wall!). It is anticipated that in future office environments, hosts locations will change as often as telephones are changed in present-day offices." The second change introduced by Ethernet was to encode each address as a 48 bits field [DP1981]. 48 bit addresses were huge compared to the networking technologies available in the 1980s, but the huge address space had several advantages [DP1981] including the ability to allocate large blocks of addresses to manufacturers. Eventually, other LAN technologies opted for 48 bits addresses as well [802]. The third change introduced by Ethernet was the definition of broadcast and multicast addresses. The need for multicast Ethernet was foreseen in [DP1981] and thanks to the size of the addressing space it was possible to reserve a large block of multicast addresses for each manufacturer.

The datalink layer addresses used in Ethernet networks are often called MAC addresses. They are structured as shown in the figure below. The first bit of the address indicates whether the address identifies a network adapter or a multicast group. The upper 24 bits are used to encode an Organisation Unique Identifier (OUI). This OUI identifies a block of addresses that the secretariat [5] responsible for the uniqueness of Ethernet addresses has allocated to a manufacturer. Once a manufacturer has received an OUI, it can build and sell products with one of the 16 million addresses in this block.
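The address structure described above can be explored with a few lines of Python. This is only a sketch: the helper names and the sample addresses are chosen for the illustration, and it assumes the usual colon-separated hexadecimal notation, in which the first bit sent on the wire is the least significant bit of the first byte.

```python
def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 48 bits")
    return octets


def is_group_address(mac):
    # Assumption stated above: the first transmitted bit is the least
    # significant bit of the first byte. 1 = multicast or broadcast,
    # 0 = a single network adapter.
    return bool(mac[0] & 0x01)


def oui(mac):
    # The upper 24 bits identify the block allocated to a manufacturer.
    return mac[:3].hex()


# Each OUI leaves 24 bits to the manufacturer: about 16 million addresses.
ADDRESSES_PER_OUI = 2 ** 24

for text in ("00:1b:21:3c:4d:5e", "01:00:5e:00:00:01", "ff:ff:ff:ff:ff:ff"):
    mac = parse_mac(text)
    kind = "group" if is_group_address(mac) else "unicast"
    print(text, kind, "OUI", oui(mac))
```

The first sample address is an arbitrary unicast value, the second one lies in a multicast range and the third one is the broadcast address.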
Figure 6.18: 48 bits Ethernet address format

The original 10 Mbps Ethernet specification [DIX] defined a simple frame format where each frame is composed of five fields. The Ethernet frame starts with a preamble (not shown in the figure below) that is used by the physical layer of the receiver to synchronise its clock with the sender's clock. The first field of the frame is the destination address. As this address is placed at the beginning of the frame, an Ethernet interface can quickly verify whether it is the frame recipient and if not, cancel the processing of the arriving frame. The second field is the source address. While the destination address can be either a unicast or a multicast/broadcast address, the source address must always be a unicast address. The third field is a 16 bits integer that indicates which type of network layer packet is carried inside the frame. This field is often called the EtherType. Frequently used EtherType values [6] include 0x0800 for IPv4, 0x86DD for IPv6 [7] and 0x0806 for the Address Resolution Protocol (ARP).

The fourth part of the Ethernet frame is the payload. The minimum length of the payload is 46 bytes to ensure a minimum frame size, including the header, of 512 bits. The Ethernet payload cannot be longer than 1500 bytes. This size was found reasonable when the first Ethernet specification was written. At that time, Xerox had been using its experimental 3 Mbps Ethernet that offered 554 bytes of payload and RFC 1122 required a minimum MTU of 572 bytes for IPv4. 1500 bytes was large enough to support these needs without forcing the network adapters to contain overly large memories. Furthermore, simulations and measurement studies performed in Ethernet networks revealed that CSMA/CD was able to achieve a very high utilization. This is illustrated in the figure below based on [SH1980], which shows the channel utilization achieved in Ethernet networks containing different numbers of hosts that are sending frames of different sizes.

Figure 6.19: Impact of the frame length on the maximum channel utilisation [SH1980]

The last field of the Ethernet frame is a 32 bit Cyclical Redundancy Check (CRC). This CRC is able to catch a much larger number of transmission errors than the Internet checksum used by IP, UDP and TCP [SGP98]. The format of the Ethernet frame is shown below.

[4] Additional information about the history of the Ethernet technology may be found at http://ethernethistory.typepad.com/

[5] Initially, the OUIs were allocated by Xerox [DP1981]. However, once Ethernet became an IEEE and later an ISO standard, the allocation of the OUIs moved to IEEE. The list of all OUI allocations may be found at http://standards.ieee.org/regauth/oui/index.shtml

Note: Where should the CRC be located in a frame?
The transport and datalink layers usually choose different strategies to place their CRCs or checksums. Transport layer protocols usually place their CRCs or checksums in the segment header. Datalink layer protocols sometimes place their CRC in the frame header, but often in a trailer at the end of the frame. This choice reflects implementation assumptions, but also influences performance RFC 893. When the CRC is placed in the trailer, as in Ethernet, the datalink layer can compute it while transmitting the frame and insert it at the end of the transmission. All Ethernet interfaces use this optimisation today. When the checksum is placed in the header, as in a TCP segment, it is impossible for the network interface to compute it while transmitting the segment. Some network interfaces provide hardware assistance to compute the TCP checksum, but this is more complex than if the TCP checksum were placed in the trailer [8].

[6] The official list of all assigned Ethernet type values is available from http://standards.ieee.org/regauth/ethertype/eth.txt

[7] The attentive reader may question the need for different EtherTypes for IPv4 and IPv6 while the IP header already contains a version field that can be used to distinguish between IPv4 and IPv6 packets. Theoretically, IPv4 and IPv6 could have used the same EtherType. Unfortunately, developers of the early IPv6 implementations found that some devices did not check the version field of the IPv4 packets that they received and parsed frames whose EtherType was set to 0x0800 as IPv4 packets. Sending IPv6 packets to such devices would have caused disruptions. To avoid this problem, the IETF decided to apply for a distinct EtherType value for IPv6.

[8] These network interfaces compute the TCP checksum while a segment is transferred from the host memory to the network interface.
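As a rough illustration of the trailer placement discussed in the note above, the sketch below updates a running CRC while the pieces of a frame are being "sent" and only appends the result once everything else has left. Python's zlib.crc32 is used as a stand-in for the CRC hardware of a network interface. The function names, the chunked transmission and the byte order of the trailer are assumptions made for this example.

```python
import zlib


def send_with_trailer(chunks):
    """Simulate a transmission: chunks leave one by one, the CRC goes last."""
    wire = bytearray()
    crc = 0
    for chunk in chunks:
        wire += chunk                  # the chunk leaves the interface
        crc = zlib.crc32(chunk, crc)   # running CRC updated on the fly
    wire += crc.to_bytes(4, "little")  # trailer appended at the very end
    return bytes(wire)


def check_trailer(frame):
    """Recompute the CRC over the body and compare it with the trailer."""
    body, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == trailer


frame = send_with_trailer([b"header", b"payload-part-1", b"payload-part-2"])
print(check_trailer(frame))
```

The sender never needs to buffer the whole frame before starting to transmit. With a checksum located in the header, the complete segment would have to be inspected before its first byte could be sent.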


Figure 6.20: Ethernet DIX frame format

The Ethernet frame format shown above is specified in [DIX]. This is the format used to send both IPv4 RFC 894 and IPv6 packets RFC 2464. After the publication of [DIX], the Institute of Electrical and Electronic Engineers (IEEE) began to standardise several Local Area Network technologies. IEEE worked on several LAN technologies, starting with Ethernet, Token Ring and Token Bus. These three technologies were completely different, but they all agreed to use the 48 bits MAC addresses specified initially for Ethernet [802]. While developing its Ethernet standard [802.3], the IEEE 802.3 working group was confronted with a problem. Ethernet mandated a minimum payload size of 46 bytes, while some companies were looking for a LAN technology that could transparently transport short frames containing only a few bytes of payload. Such a frame can be sent by an Ethernet host by padding it to ensure that the payload is at least 46 bytes long. However, since the Ethernet header [DIX] does not contain a length field, it is impossible for the receiver to determine how many useful bytes were placed inside the payload field. To solve this problem, the IEEE decided to replace the Type field of the Ethernet [DIX] header with a length field [9]. This Length field contains the number of useful bytes in the frame payload. The payload must still contain at least 46 bytes, but padding bytes are added by the sender and removed by the receiver. In order to add the Length field without significantly changing the frame format, IEEE had to remove the Type field. Without this field, it is impossible for a receiving host to identify the type of network layer packet inside a received frame. To solve this new problem, IEEE developed a completely new sublayer called the Logical Link Control [802.2]. Several protocols were defined in this sublayer. One of them provided a slightly different version of the Type field of the original Ethernet frame format. Another contained acknowledgements and retransmissions to provide a reliable service... In practice, [802.2] is never used to support IP in Ethernet networks. The figure below shows the official [802.3] frame format.

Figure 6.21: Ethernet 802.3 frame format

Note: What is the Ethernet service?
An Ethernet network provides an unreliable connectionless service. It supports three different transmission modes [SH2004]: unicast, multicast and broadcast. While the Ethernet service is unreliable in theory, a good Ethernet network should, in practice, provide a service that:

- delivers frames to their destination with a very high probability of successful delivery
- does not reorder the transmitted frames

The first property is a consequence of the utilisation of CSMA/CD. The second property is a consequence of the physical organisation of the Ethernet network as a shared bus. These two properties are important and all evolutions of the Ethernet technology have preserved them.

[9] Fortunately, IEEE was able to define the [802.3] frame format while maintaining backward compatibility with the Ethernet [DIX] frame format. The trick was to only assign values above 1500 as EtherType values. When a host receives a frame, it can determine the frame's format by checking its EtherType/Length field. A value smaller than 1501 is clearly a length indicator and thus an [802.3] frame. A value larger than 1500 can only be a type and thus a [DIX] frame.

Several physical layers have been defined for Ethernet networks. The first physical layer, usually called 10Base5, provided 10 Mbps over a thick coaxial cable. The characteristics of the cable and the transceivers that were used then enabled the utilisation of 500 meter long segments. A 10Base5 network can also include repeaters between segments.

The second physical layer was 10Base2. This physical layer used a thin coaxial cable that was easier to install than the 10Base5 cable, but could not be longer than 185 meters. A 10BaseF physical layer was also defined to transport Ethernet over point-to-point optical links. The major change to the physical layer was the support of twisted pairs in the 10BaseT specification. Twisted pair cables are traditionally used to support the telephone service in office buildings. Most office buildings today are equipped with structured cabling. Several twisted pair cables are installed between any room and a central telecom closet per building or per floor in large buildings. These telecom closets act as concentration points for the telephone service but also for LANs.
The introduction of the twisted pairs led to two major changes to Ethernet. The first change concerns the physical topology of the network. 10Base2 and 10Base5 networks are shared buses: the coaxial cable typically passes through each room that contains a connected computer. A 10BaseT network is a star-shaped network. All the devices connected to the network are attached to a twisted pair cable that ends in the telecom closet. From a maintenance perspective, this is a major improvement. The cable is a weak point in 10Base2 and 10Base5 networks. Any physical damage on the cable broke the entire network and when such a failure occurred, the network administrator had to manually check the entire cable to detect where it was damaged. With 10BaseT, when one twisted pair is damaged, only the device connected to this twisted pair is affected and this does not affect the other devices. The second major change introduced by 10BaseT was that it was impossible to build a 10BaseT network by simply connecting all the twisted pairs together. All the twisted pairs must be connected to a relay that operates in the physical layer. This relay is called an Ethernet hub. A hub is thus a physical layer relay that receives an electrical signal on one of its interfaces, regenerates the signal and transmits it over all its other interfaces. Some hubs are also able to convert the electrical signal from one physical layer to another (e.g. 10BaseT to 10Base2 conversion).

Figure 6.22: Ethernet hubs in the reference model

Computers can directly be attached to Ethernet hubs. Ethernet hubs themselves can be attached to other Ethernet hubs to build a larger network. However, some important guidelines must be followed when building a complex network with hubs. First, the network topology must be a tree. As hubs are relays in the physical layer, adding a link between Hub2 and Hub3 in the network below would create an electrical shortcut that would completely disrupt the network. This implies that there cannot be any redundancy in a hub-based network. A failure of a hub or of a link between two hubs would partition the network into two isolated networks. Second, as hubs are relays in the physical layer, collisions can happen and must be handled by CSMA/CD as in a 10Base5 network.
This implies that the maximum delay between any pair of devices in the network cannot be longer than the 51.2 microseconds slot time. If the delay is longer, collisions between short frames may not be correctly detected. This constraint limits the geographical spread of 10BaseT networks containing hubs.

Figure 6.23: A hierarchical Ethernet network composed of hubs

In the late 1980s, 10 Mbps became too slow for some applications and network manufacturers developed several LAN technologies that offered higher bandwidth, such as the 100 Mbps FDDI LAN that used optical fibers. As the development of 10Base5, 10Base2 and 10BaseT had shown that Ethernet could be adapted to different physical layers, several manufacturers started to work on 100 Mbps Ethernet and convinced IEEE to standardise this new technology that was initially called Fast Ethernet. Fast Ethernet was designed under two constraints. First, Fast Ethernet had to support twisted pairs. Although it was easier from a physical layer perspective to support higher bandwidth on coaxial cables than on twisted pairs, coaxial cables were a nightmare from deployment and maintenance perspectives. Second, Fast Ethernet had to be perfectly compatible with the existing 10 Mbps Ethernets to allow Fast Ethernet technology to be used initially as a backbone technology to interconnect 10 Mbps Ethernet networks. This forced Fast Ethernet to use exactly the same frame format as 10 Mbps Ethernet. This implied that the minimum Fast Ethernet frame size remained at 512 bits. To preserve CSMA/CD with this minimum frame size and 100 Mbps instead of 10 Mbps, the duration of the slot time was decreased to 5.12 microseconds.
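The two slot time values quoted in this section follow directly from the 512 bit minimum frame size: the slot time is the time needed to transmit such a frame at the nominal bandwidth. The short sketch below redoes this arithmetic. The function name is only illustrative.

```python
def slot_time_us(min_frame_bits, bandwidth_bps):
    """Time needed to transmit the minimum frame, in microseconds."""
    return min_frame_bits / bandwidth_bps * 1e6


MIN_FRAME_BITS = 64 * 8  # 64 bytes = 512 bits

for name, bps in [("10 Mbps Ethernet", 10e6), ("Fast Ethernet", 100e6)]:
    print(f"{name}: {slot_time_us(MIN_FRAME_BITS, bps):.2f} microseconds")
```

Keeping the frame format unchanged while multiplying the bandwidth by ten therefore divides the slot time, and with it the maximum delay tolerated between two devices, by ten.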
The evolution of Ethernet did not stop. In 1998, the IEEE published a first standard to provide Gigabit Ethernet over optical fibers. Several other types of physical layers were added afterwards. The 10 Gigabit Ethernet standard appeared in 2002. Work is ongoing to develop standards for 40 Gigabit and 100 Gigabit Ethernet and some are thinking about Terabit Ethernet. The table below lists the main Ethernet standards. A more detailed list may be found at http://en.wikipedia.org/wiki/Ethernet_physical_layer

Standard       Comments
10Base5        Thick coaxial cable, 500 m
10Base2        Thin coaxial cable, 185 m
10BaseT        Two pairs of category 3+ UTP
10Base-F       10 Mb/s over optical fiber
100Base-Tx     Category 5 UTP or STP, 100 m maximum
100Base-FX     Two multimode optical fibers, 2 km maximum
1000Base-CX    Two pairs shielded twisted pair, 25 m maximum
1000Base-SX    Two multimode or single mode optical fibers with lasers
10 Gbps        Optical fiber but also Category 6 UTP
40-100 Gbps    Being developed, standard expected in 2010

Ethernet Switches

Increasing the physical layer bandwidth as in Fast Ethernet was only one of the solutions to improve the performance of Ethernet LANs. A second solution was to replace the hubs with more intelligent devices. As Ethernet hubs operate in the physical layer, they can only regenerate the electrical signal to extend the geographical reach of the network. From a performance perspective, it would be more interesting to have devices that operate in the datalink layer and can analyse the destination address of each frame and forward the frames selectively on the link that leads to the destination. Such devices are usually called Ethernet switches [10]. An Ethernet switch is a relay that operates in the datalink layer as is illustrated in the figure below.

Figure 6.24: Ethernet switches and the reference model

An Ethernet switch understands the format of the Ethernet frames and can selectively forward frames over each interface. For this, each Ethernet switch maintains a MAC address table. This table contains, for each MAC address known by the switch, the identifier of the switch's port over which a frame sent towards this address must be forwarded to reach its destination. This is illustrated below with the MAC address table of the bottom switch. When the switch receives a frame destined to address B, it forwards the frame on its South port. If it receives a frame destined to address D, it forwards it only on its North port.
Figure 6.25: Operation of Ethernet switches

One of the selling points of Ethernet networks is that, thanks to the utilisation of 48 bits MAC addresses, an Ethernet LAN is plug and play at the datalink layer. When two hosts are attached to the same Ethernet segment or hub, they can immediately exchange Ethernet frames without requiring any configuration. It is important to retain this plug and play capability for Ethernet switches as well. This implies that Ethernet switches must be able to build their MAC address table automatically without requiring any manual configuration. This automatic configuration is performed by the MAC address learning algorithm that runs on each Ethernet switch. This algorithm extracts the source address of the received frames and remembers the port over which a frame from each source Ethernet address has been received. This information is inserted into the MAC address table that the switch uses to forward frames. This allows the switch to automatically learn the ports that it can use to reach each destination address, provided that this host has previously sent at least one frame. This is not a problem since most upper layer protocols use acknowledgements at some layer and thus even an Ethernet printer sends Ethernet frames as well.

[10] The first Ethernet relays that operated in the datalink layers were called bridges. In practice, the main difference between switches and bridges is that bridges were usually implemented in software while switches are hardware-based devices. Throughout this text, we always use switch when referring to a relay in the datalink layer, but you might still see the word bridge.

The pseudo-code below details how an Ethernet switch forwards Ethernet frames. It first updates its MAC address table with the source address of the frame. The MAC address table used by some switches also contains a timestamp that is updated each time a frame is received from each known source address. This timestamp is used to remove from the MAC address table entries that have not been active during the last n minutes. This limits the growth of the MAC address table, but also allows hosts to move from one port to another. The switch uses its MAC address table to forward the received unicast frame. If there is an entry for the frame's destination address in the MAC address table, the frame is forwarded selectively on the port listed in this entry. Otherwise, the switch does not know how to reach the destination address and it must forward the frame on all its ports except the port from which the frame has been received. This ensures that the frame will reach its destination, at the expense of some unnecessary transmissions. These unnecessary transmissions will only last until the destination has sent its first frame. Multicast and Broadcast frames are also forwarded in a similar way.
    # Arrival of frame F on port P
    # Table : MAC address table dictionary : addr -> port
    # Ports : list of all ports on the switch
    src = F.SourceAddress
    dst = F.DestinationAddress
    Table[src] = P   # src heard on port P
    if isUnicast(dst):
        if dst in Table:
            ForwardFrame(F, Table[dst])
        else:
            for o in Ports:
                if o != P:
                    ForwardFrame(F, o)
    else:
        # multicast or broadcast destination
        for o in Ports:
            if o != P:
                ForwardFrame(F, o)

Note: Security issues with Ethernet hubs and switches

From a security perspective, Ethernet hubs have the same drawbacks as the older coaxial cable. A host attached to a hub will be able to capture all the frames exchanged between any pair of hosts attached to the same hub. Ethernet switches are much better from this perspective: thanks to the selective forwarding, a host will usually only receive the frames destined to itself as well as the multicast, broadcast and unknown frames. However, this does not imply that switches are completely secure. There are, unfortunately, attacks against Ethernet switches. From a security perspective, the MAC address table is one of the fragile elements of an Ethernet switch. This table has a fixed size. Some low-end switches can store a few tens or a few hundreds of addresses while higher-end switches can store tens of thousands of addresses or more. From a security point of view, a limited resource can be the target of Denial of Service attacks. Unfortunately, such attacks are also possible on Ethernet switches. A malicious host could overflow the MAC address table of the switch by generating thousands of frames with random source addresses. Once the MAC address table is full, the switch needs to broadcast all the frames that it receives. At this point, an attacker will receive unicast frames that are not destined to its address. The ARP attack discussed in the previous chapter could also occur with Ethernet switches [Vyncke2007]. Recent switches implement several types of defences against these attacks, but they need to be carefully configured by the network administrator. See [Vyncke2007] for a detailed discussion on security issues with Ethernet switches.
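The forwarding pseudo-code and the timestamp-based expiry described before the note can be combined in a small runnable simulation. This is a minimal sketch, not the behaviour of any particular product: the class name, the max_age parameter (expressed in seconds) and the convention of returning the list of output ports instead of actually sending frames are assumptions made for the example.

```python
class LearningSwitch:
    """Toy model of MAC address learning with entry expiry."""

    def __init__(self, ports, max_age=300.0):
        self.ports = list(ports)
        self.max_age = max_age   # seconds before an idle entry is removed
        self.table = {}          # addr -> (port, time the address was last heard)

    def _expire(self, now):
        stale = [addr for addr, (_, seen) in self.table.items()
                 if now - seen > self.max_age]
        for addr in stale:
            del self.table[addr]

    def receive(self, src, dst, in_port, now, is_unicast=True):
        """Return the list of ports on which the frame is forwarded."""
        self._expire(now)
        self.table[src] = (in_port, now)   # learn or refresh the source
        if is_unicast and dst in self.table:
            return [self.table[dst][0]]    # selective forwarding
        # unknown, multicast or broadcast destination: flood
        return [p for p in self.ports if p != in_port]


sw = LearningSwitch(["North", "South", "East"])
print(sw.receive("A", "B", "North", now=0))    # B unknown: flooded
print(sw.receive("B", "A", "South", now=1))    # A was learned on North
print(sw.receive("A", "B", "North", now=2))    # B was learned on South
print(sw.receive("A", "B", "North", now=400))  # entries expired: flooded again
```

The last call shows the effect of the timestamp: once the entry for B is older than max_age, the switch falls back to flooding until B sends another frame.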
The MAC address learning algorithm combined with the forwarding algorithm works well in a tree-shaped network such as the one shown above. However, to deal with link and switch failures, network administrators often add redundant links to ensure that their network remains connected even after a failure. Let us consider what happens in the Ethernet network shown in the figure below.

Figure 6.26: Ethernet switches in a loop

When all switches boot, their MAC address table is empty. Assume that host A sends a frame towards host C. Upon reception of this frame, switch1 updates its MAC address table to remember that address A is reachable via its West port. As there is no entry for address C in switch1's MAC address table, the frame is forwarded to both switch2 and switch3. When switch2 receives the frame, it updates its MAC address table for address A and forwards the frame to host C as well as to switch3. switch3 has thus received two copies of the same frame. As switch3 does not know how to reach the destination address, it forwards the frame received from switch1 to switch2 and the frame received from switch2 to switch1... The single frame sent by host A will be continuously duplicated by the switches until their MAC address table contains an entry for address C. Quickly, all the available link bandwidth will be used to forward all the copies of this frame. As Ethernet does not contain any TTL or HopLimit, this loop will never stop.

The MAC address learning algorithm allows switches to be plug-and-play. Unfortunately, the loops that arise when the network topology is not a tree are a severe problem. Forcing the switches to only be used in tree-shaped networks as hubs would be a severe limitation. To solve this problem, the inventors of Ethernet switches have developed the Spanning Tree Protocol. This protocol allows switches to automatically disable ports on Ethernet switches to ensure that the network does not contain any cycle that could cause frames to loop forever.

The Spanning Tree Protocol (802.1d)

The Spanning Tree Protocol (STP), proposed in [Perlman1985], is a distributed protocol that is used by switches to reduce the network topology to a spanning tree, so that there are no cycles in the topology. For example, consider the network shown in the figure below. In this figure, each bold line corresponds to an Ethernet to which two Ethernet switches are attached. This network contains several cycles that must be broken to allow Ethernet switches that are using the MAC address learning algorithm to exchange frames.

In this network, the STP will compute the following spanning tree. Switch1 will be the root of the tree. All the interfaces of Switch1, Switch2 and Switch7 are part of the spanning tree. Only the interface connected to LAN B will be active on Switch9.
LAN H will only be served by Switch7 and the port of Switch44 on LAN G will be disabled. A frame originating on LAN B and destined for LAN A will be forwarded by Switch7 on LAN C, then by Switch1 on LAN E, then by Switch44 on LAN F and eventually by Switch2 on LAN A.

Switches running the Spanning Tree Protocol exchange BPDUs. These BPDUs are always sent as frames whose destination MAC address is the ALL_BRIDGES reserved multicast MAC address. Each switch has a unique 64 bit identifier. To ensure uniqueness, the lower 48 bits of the identifier are set to the unique MAC address allocated to the switch by its manufacturer. The high order 16 bits of the switch identifier can be configured by the network administrator to influence the topology of the spanning tree. The default value for these high order bits is 32768.
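The layout of the 64 bit switch identifier can be sketched as follows. The function name and the sample MAC addresses are hypothetical. Only the split between the 16 configurable high order bits, the 48 bit MAC address and the default value of 32768 comes from the text above.

```python
DEFAULT_PRIORITY = 32768


def switch_identifier(mac, priority=DEFAULT_PRIORITY):
    """Build the 64 bit identifier: 16 configurable bits, then the 48 bit MAC."""
    if not 0 <= priority < 2 ** 16:
        raise ValueError("the configurable part must fit in 16 bits")
    mac_value = int(mac.replace(":", ""), 16)
    if mac_value >= 2 ** 48:
        raise ValueError("a MAC address must fit in 48 bits")
    return (priority << 48) | mac_value


a = switch_identifier("00:00:00:00:00:09")
b = switch_identifier("00:00:00:00:00:01")
c = switch_identifier("00:00:00:00:00:09", priority=4096)

# With the default high order bits, the MAC address decides the order.
print(b < a)
# Lowering the high order bits makes an identifier smaller than all defaults.
print(c < b)
```

Since the spanning tree construction selects the switch with the smallest identifier as the root, this is how an administrator can steer the choice of the root without touching any MAC address.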


Figure 6.27: Spanning tree computed in a switched Ethernet network

The switches exchange BPDUs to build the spanning tree. Intuitively, the spanning tree is built by first selecting the switch with the smallest identifier as the root of the tree. The branches of the spanning tree are then composed of the shortest paths that allow all of the switches that compose the network to be reached. The BPDUs exchanged by the switches contain the following information:

- the identifier of the root switch (R)
- the cost of the shortest path between the switch that sent the BPDU and the root switch (c)
- the identifier of the switch that sent the BPDU (T)
- the number of the switch port over which the BPDU was sent (p)

We will use the notation <R,c,T,p> to represent a BPDU whose root identifier is R, cost is c and that was sent on the port p of switch T. The construction of the spanning tree depends on an ordering relationship among the BPDUs. This ordering relationship could be implemented by the python function below.

    # returns True if bpdu b1 is better than bpdu b2
    def better(b1, b2):
        return ((b1.R < b2.R) or
                ((b1.R == b2.R) and (b1.c < b2.c)) or
                ((b1.R == b2.R) and (b1.c == b2.c) and (b1.T < b2.T)) or
                ((b1.R == b2.R) and (b1.c == b2.c) and (b1.T == b2.T) and (b1.p < b2.p)))

Each switch port has an associated cost and can be in one of three states: Root, Designated or Blocked. When a BPDU=<R,c,T,p> is received on port q, the switch computes the port's priority vector: V[q]=<R,c+cost[q],T,p>, where cost[q] is the cost associated to the port over which the BPDU was received. The switch stores in a table the last priority vector received on each port. The switch then compares its own identifier with the smallest root identifier stored in this table. If its own identifier is smaller, then the switch is the root of the spanning tree and is, by definition, at a distance 0 of the root. The BPDU of the switch is then <R,0,R,p>, where R is the switch identifier and p will be set to the port number over which the BPDU is sent. Otherwise, the switch chooses the best priority vector from its table, bv=<R,c,T,p>. The port over which this best priority vector was learned is the switch port that is closest to the root switch. This port becomes the Root port of the switch. There is only one Root port per switch. The switch can then compute its BPDU as BPDU=<R,c,S,p>, where R is the root identifier, c the cost of the best priority vector, S the identifier of the switch and p will be replaced by the number of the port over which the BPDU will be sent. The switch can then determine the state of all its ports by comparing its own BPDU with the priority vector received on each port. If the switch's BPDU is better than the priority vector of this port, the port becomes a Designated port. Otherwise, the port becomes a Blocked port.

The state of each port is important when considering the transmission of BPDUs. The root switch regularly sends its own BPDU over all of its (Designated) ports. This BPDU is received on the Root port of all the switches that are directly connected to the root switch. Each of these switches computes its own BPDU and sends this BPDU over all its Designated ports. These BPDUs are then received on the Root port of downstream switches, which then compute their own BPDU, etc. When the network topology is stable, switches send their own BPDU on all their Designated ports, once they receive a BPDU on their Root port. No BPDU is sent on a Blocked port. Switches listen for BPDUs on their Blocked and Designated ports, but no BPDU should be received over these ports when the topology is stable. The utilisation of the ports for both BPDUs and data frames is summarised in the table below.
Port state    Receives BPDUs    Sends BPDU    Handles data frames
Blocked       yes               no            no
Root          yes               no            yes
Designated    yes               yes           yes

To illustrate the operation of the Spanning Tree Protocol, let us consider the simple network topology in the figure below.

Figure 6.28: A simple Spanning tree computed in a switched Ethernet network

Assume that Switch4 is the first to boot. It sends its own BPDU=<4,0,4,?> on its two ports. When Switch1 boots, it sends BPDU=<1,0,1,1>. This BPDU is received by Switch4, which updates its table and computes a new BPDU=<1,3,4,?>. Port 1 of Switch4 becomes the Root port while its second port is still in the Designated state. Assume now that Switch9 boots and immediately receives Switch1's BPDU on port 1. Switch9 computes its own BPDU=<1,1,9,?> and port 1 becomes the Root port of this switch. This BPDU is sent on port 2 of Switch9 and reaches Switch4. Switch4 compares the priority vector built from this BPDU (i.e. <1,2,9,2>) and notices that it is better than Switch4's BPDU=<1,3,4,2>. Thus, port 2 becomes a Blocked port on Switch4.

During the computation of the spanning tree, switches discard all received data frames, as at that time the network topology is not guaranteed to be loop-free. Once that topology has been stable for some time, the switches again start to use the MAC learning algorithm to forward data frames. Only the Root and Designated ports are used to forward data frames. Switches discard all the data frames received on their Blocked ports and never forward frames on these ports.

Switches, ports and links can fail in a switched Ethernet network. When a failure occurs, the switches must be able to recompute the spanning tree to recover from the failure. The Spanning Tree Protocol relies on regular transmissions of the BPDUs to detect these failures. A BPDU contains two additional fields: the Age of the BPDU and the Maximum Age. The Age contains the amount of time that has passed since the root switch initially originated the BPDU. The root switch sends its BPDU with an Age of zero and each switch that computes its own BPDU increments its Age by one. The Age of the BPDUs stored on a switch's table is also incremented every second. A BPDU expires when its Age reaches the Maximum Age. When the network is stable, this does not happen as BPDUs are regularly sent by the root switch and downstream switches. However, if the root fails or the network becomes partitioned, BPDUs will expire and switches will recompute their own BPDU and restart the Spanning Tree Protocol. Once a topology change has been detected, the forwarding of the data frames stops as the topology is not guaranteed to be loop-free. Additional details about the reaction to failures may be found in [802.1d].

Virtual LANs

Another important advantage of Ethernet switches is the ability to create Virtual Local Area Networks (VLANs).
A virtual LAN can be defined as a set of ports attached to one or more Ethernet switches. A switch can support several VLANs and it runs one MAC learning algorithm for each Virtual LAN. When a switch receives a frame with an unknown or a multicast destination, it forwards it over all the ports that belong to the same Virtual LAN but not over the ports that belong to other Virtual LANs. Similarly, when a switch learns a source address on a port, it associates it to the Virtual LAN of this port and uses this information only when forwarding frames on this Virtual LAN.

The figure below illustrates a switched Ethernet network with three Virtual LANs. VLAN2 and VLAN3 only require a local configuration of switch S1. Host C can exchange frames with host D, but not with hosts that are outside of its VLAN. VLAN1 is more complex as there are ports of this VLAN on several switches. To support such VLANs, local configuration is not sufficient anymore. When a switch receives a frame from another switch, it must be able to determine the VLAN in which the frame originated to use the correct MAC table to forward the frame. This is done by assigning an identifier to each Virtual LAN and placing this identifier inside the headers of the frames that are exchanged between switches.

Figure 6.29: Virtual Local Area Networks in a switched Ethernet network

IEEE defined in the [802.1q] standard a special header to encode the VLAN identifiers. This 32 bit header includes a 12 bit VLAN field that contains the VLAN identifier of each frame. The format of the [802.1q] header is described below.

Figure 6.30: Format of the 802.1q header

The [802.1q] header is inserted immediately after the source MAC address in the Ethernet frame (i.e. before the EtherType field). The maximum frame size is increased by 4 bytes. The header is encoded in 32 bits and contains four fields. The Tag Protocol Identifier is set to 0x8100 to allow the receiver to detect the presence of this additional header. The Priority Code Point (PCP) is a three bit field that is used to support different transmission priorities for the frame. Value 0 is the lowest priority and value 7 the highest. Frames with a higher priority can expect to be forwarded earlier than frames having a lower priority. The C bit is used for compatibility between Ethernet and Token Ring networks. The last 12 bits of the 802.1q header contain the VLAN identifier. Value 0 indicates that the frame does not belong to any VLAN while value 0xFFF is reserved. This implies that 4094 different VLAN identifiers can be used in an Ethernet network.

6.3.3 802.11 wireless networks

The radio spectrum is a limited resource that must be shared by everyone. During most of the twentieth century, governments and international organisations have regulated most of the radio spectrum. This regulation controls the utilisation of the radio spectrum, in order to ensure that there is no interference between different users. A company that wants to use a frequency range in a given region must apply for a license from the regulator. Most regulators charge a fee for the utilisation of the radio spectrum and some governments have encouraged competition among companies bidding for the same frequency to increase the license fees.
In the 1970s, after the first experiments with ALOHANet, interest in wireless networks grew. Many experiments were done on and outside the ARPANet. One of these experiments was the first mobile phone, which was developed and tested in 1973. This experimental mobile phone was the starting point for the first generation analog mobile phones. Given the growing demand for mobile phones, it was clear that the analog mobile phone technology was not sufficient to support a large number of users. To support more users and new services, researchers in several countries worked on the development of digital mobile telephones. In 1987, several European countries decided to develop the standards for a common cellular telephone system across Europe: the Global System for Mobile Communications (GSM). Since then, the standards have evolved and more than three billion users are connected to GSM networks today.

While most of the frequency ranges of the radio spectrum are reserved for specific applications and require a special licence, there are a few exceptions. These exceptions are known as the Industrial, Scientific and Medical (ISM) radio bands. These bands can be used for industrial, scientific and medical applications without requiring a licence from the regulator. For example, some radio-controlled models use the 27 MHz ISM band and some cordless telephones operate in the 915 MHz ISM band. In 1985, the 2.400-2.500 GHz band was added to the list of ISM bands. This frequency range corresponds to the frequencies that are emitted by microwave ovens. Sharing this band with licensed applications would have likely caused interference, given the large number of microwave ovens that are used. Despite the risk of interference with microwave ovens, the opening of the 2.400-2.500 GHz band allowed the networking industry to develop several wireless network techniques to allow computers to exchange data without using cables. In this section, we discuss in more detail the most popular one, i.e. the WiFi [802.11] family of wireless networks. Other wireless networking techniques such as Bluetooth or HiperLAN use the same frequency range.
Today, WiFi is a very popular wireless networking technology. There are several hundred million WiFi devices. The development of this technology started in the late 1980s with the WaveLAN proprietary wireless network. WaveLAN operated at 2 Mbps and used different frequency bands in different regions of the world. In the early 1990s, the IEEE created the 802.11 working group to standardise a family of wireless network technologies. This working group was very prolific and produced several wireless networking standards that use different frequency ranges and different physical layers. The table below provides a summary of the main 802.11 standards.


Standard    Frequency    Typical throughput    Max bandwidth    Range (m) indoor/outdoor
802.11      2.4 GHz      0.9 Mbps              2 Mbps           20/100
802.11a     5 GHz        23 Mbps               54 Mbps          35/120
802.11b     2.4 GHz      4.3 Mbps              11 Mbps          38/140
802.11g     2.4 GHz      19 Mbps               54 Mbps          38/140
802.11n     2.4/5 GHz    74 Mbps               150 Mbps         70/250

When developing its family of standards, the IEEE 802.11 working group took a similar approach as the IEEE 802.3 working group that developed various types of physical layers for Ethernet networks. 802.11 networks use the CSMA/CA Medium Access Control technique described earlier and they all assume the same architecture and use the same frame format.

The architecture of WiFi networks is slightly different from the Local Area Networks that we have discussed until now. There are, in practice, two main types of WiFi networks: independent or ad hoc networks and infrastructure networks [11]. An independent or ad hoc network is composed of a set of devices that communicate with each other. These devices play the same role and the ad hoc network is usually not connected to the global Internet. Ad hoc networks are used when, for example, a few laptops need to exchange information or to connect a computer with a WiFi printer.

Figure 6.31: An 802.11 independent or ad hoc network

Most WiFi networks are infrastructure networks. An infrastructure network contains one or more access points that are attached to a fixed Local Area Network (usually an Ethernet network) that is connected to other networks such as the Internet. The figure below shows such a network with two access points and four WiFi devices. Each WiFi device is associated to one access point and uses this access point as a relay to exchange frames with the devices that are associated to another access point or reachable through the LAN.

An 802.11 access point is a relay that operates in the datalink layer like switches. The figure below represents the layers of the reference model that are involved when a WiFi host communicates with a host attached to an Ethernet network through an access point.
Figure 6.32: An 802.11 infrastructure network

Figure 6.33: An 802.11 access point

802.11 devices exchange variable length frames, which have a slightly different structure than the simple frame format used in Ethernet LANs. We review the key parts of the 802.11 frames. Additional details may be found in [802.11] and [Gast2002]. An 802.11 frame contains a fixed length header, a variable length payload that may contain up to 2324 bytes of user data and a 32 bit CRC. Although the payload can contain up to 2324 bytes, most 802.11 deployments use a maximum payload size of 1500 bytes as they are used in infrastructure networks attached to Ethernet LANs. An 802.11 data frame is shown below.

[11] The 802.11 working group defined the basic service set (BSS) as a group of devices that communicate with each other. We continue to use network when referring to a set of devices that communicate.

Figure 6.34: 802.11 data frame format

The first part of the 802.11 header is the 16 bit Frame Control field. This field contains flags that indicate the type of frame (data frame, RTS/CTS, acknowledgement, management frames, etc), whether the frame is sent to or from a fixed LAN, etc [802.11]. The Duration is a 16 bit field that is used to reserve the transmission channel. In data frames, the Duration field is usually set to the time required to transmit one acknowledgement frame after a SIFS delay. Note that the Duration field must be set to zero in multicast and broadcast frames. As these frames are not acknowledged, there is no need to reserve the transmission channel after their transmission. The Sequence control field contains a 12 bit sequence number that is incremented for each data frame.

The astute reader may have noticed that the 802.11 data frames contain three 48 bit address fields [12]. This is surprising compared to other protocols in the network and datalink layers whose headers only contain a source and a destination address. The need for a third address in the 802.11 header comes from the infrastructure networks. In such a network, frames are usually exchanged between routers and servers attached to the LAN and WiFi devices attached to one of the access points. The role of the three address fields is specified by bit flags in the Frame Control field.
When a frame is sent from a WiFi device to a server attached to the same LAN as the access point, the first address of the frame is set to the MAC address of the access point, the second address is set to the MAC address of the source WiFi device and the third address is the address of the final destination on the LAN. When the server replies, it sends an Ethernet frame whose source address is its MAC address and the destination address is the MAC address of the WiFi device. This frame is captured by the access point that converts the Ethernet header into an 802.11 frame header. The 802.11 frame sent by the access point contains three addresses: the first address is the MAC address of the destination WiFi device, the second address is the MAC address of the access point and the third address the MAC address of the server that sent the frame.

802.11 control frames are simpler than data frames. They contain a Frame Control, a Duration field and one or two addresses. The acknowledgement frames are very small. They only contain the address of the destination of the acknowledgement. There is no source address and no Sequence Control field in the acknowledgement frames. This is because the acknowledgement frame can easily be associated to the previous frame that it acknowledges. Indeed, each unicast data frame contains a Duration field that is used to reserve the transmission channel to ensure that no collision will affect the acknowledgement frame. The Sequence Control field is mainly used by the receiver to remove duplicate frames. Duplicate frames are detected as follows. Each data frame contains a 12 bit Sequence Control field and the Frame Control field contains the Retry bit flag that is set when a frame is retransmitted. Each 802.11 receiver stores the most recent sequence number received from each source address in frames whose Retry bit is reset. Upon reception of a frame with the Retry bit set, the receiver verifies its sequence number to determine whether it is a duplicated frame or not.
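The duplicate detection described above can be sketched as follows. The class DuplicateFilter and its accept method are hypothetical names introduced for this illustration. As an assumption that goes slightly beyond the text, the sketch also records the sequence number of an accepted retransmission, so that a second retransmission of the same frame is recognised as a duplicate.

```python
class DuplicateFilter:
    """Receiver-side duplicate detection based on the Retry bit (simplified sketch)."""

    def __init__(self):
        # Most recent sequence number accepted from each source address.
        self.last_seq = {}

    def accept(self, source, seq, retry):
        """Return True if the frame must be delivered, False if it is a duplicate.

        source: MAC address of the sender (any hashable value)
        seq:    12 bit sequence number of the frame
        retry:  value of the Retry bit of the frame
        """
        if retry and self.last_seq.get(source) == seq:
            # A retransmission of the frame that was already received.
            return False
        self.last_seq[source] = seq
        return True
```

A receiver would call accept for each data frame and deliver only the frames for which it returns True. A retransmitted frame whose first copy was lost carries a new sequence number and is therefore delivered.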
Figure 6.35: IEEE 802.11 ACK and CTS frames

802.11 RTS/CTS frames are used to reserve the transmission channel, in order to transmit one data frame and its acknowledgement. The RTS frames contain a Duration and the transmitter and receiver addresses. The Duration field of the RTS frame indicates the duration of the entire reservation (i.e. the time required to transmit the CTS, the data frame, the acknowledgements and the required SIFS delays). The CTS frame has the same format as the acknowledgement frame.

[12] In fact, the [802.11] frame format contains a fourth optional address field. This fourth address is only used when an 802.11 wireless network is used to interconnect bridges attached to two classical LAN networks.

Figure 6.36: IEEE 802.11 RTS frame format

Note: The 802.11 service
Despite the utilization of acknowledgements, the 802.11 layer only provides an unreliable connectionless service like Ethernet networks that do not use acknowledgements. The 802.11 acknowledgements are used to minimize the probability of frame duplication. They do not guarantee that all frames will be correctly received by their recipients. Like Ethernet, 802.11 networks provide a high probability of successful delivery of the frames, not a guarantee. Furthermore, it should be noted that 802.11 networks do not use acknowledgements for multicast and broadcast frames. This implies that in practice such frames are more likely to suffer from transmission errors than unicast frames.

In addition to the data and control frames that we have briefly described above, 802.11 networks use several types of management frames. These management frames are used for various purposes. We briefly describe some of these frames below. A detailed discussion may be found in [802.11] and [Gast2002].

A first type of management frames are the beacon frames. These frames are broadcast regularly by access points. Each beacon frame contains information about the capabilities of the access point (e.g. the supported 802.11 transmission rates) and a Service Set Identity (SSID). The SSID is a string of up to 32 bytes, usually printable ASCII characters. An access point may support several SSIDs and announce them in beacon frames. An access point may also choose to remain silent and not advertise beacon frames. In this case, WiFi stations may send Probe request frames to force the available access points to return a Probe response frame.
Note: IP over 802.11
Two types of encapsulation schemes were defined to support IP in Ethernet networks: the original encapsulation scheme, built above the Ethernet DIX format, is defined in RFC 894 and a second encapsulation scheme, built above the LLC/SNAP protocol [802.2], is defined in RFC 1042. In 802.11 networks, the situation is simpler and only the RFC 1042 encapsulation is used. In practice, this encapsulation adds 8 bytes after the 802.11 header. The first six bytes correspond to the LLC/SNAP header (a three byte LLC header followed by a three byte organisation code). They are followed by the two byte Ethernet Type field (0x0800 for IP and 0x0806 for ARP). The figure below shows an IP packet encapsulated in an 802.11 frame.

Figure 6.37: IP over IEEE 802.11

The second important utilisation of the management frames is to allow a WiFi station to be associated with an access point. When a WiFi station starts, it listens to beacon frames to find the available SSIDs. To be allowed to send and receive frames via an access point, a WiFi station must be associated to this access point. If the access point does not use any security mechanism to secure the wireless transmission, the WiFi station simply sends an Association request frame to its preferred access point (usually the access point that it receives with the strongest radio signal). This frame contains some parameters chosen by the WiFi station and the SSID that it requests to join. The access point replies with an Association response frame if it accepts the WiFi station.

6.4 Summary

In this chapter, we first explained the principles of the datalink layer. There are two types of datalink layers: those used over point-to-point links and those used over Local Area Networks. On point-to-point links, the datalink layer must at least provide a framing technique, but some datalink layer protocols also include reliability mechanisms such as those used in the transport layer. We have described the Point-to-Point Protocol that is often used over point-to-point links in the Internet.

Local Area Networks pose a different problem since several devices share the same transmission channel. In this case, a Medium Access Control algorithm is necessary to regulate the access to the transmission channel because whenever two devices transmit at the same time a collision occurs and none of these frames can be decoded by their recipients. There are two families of MAC algorithms. The statistical or optimistic MAC algorithms reduce the probability of collisions but do not completely prevent them. With such algorithms, when a collision occurs, the collided frames must be retransmitted. We have described the operation of the ALOHA, CSMA, CSMA/CD and CSMA/CA MAC algorithms. Deterministic or pessimistic MAC algorithms avoid all collisions. We have described the Token Ring MAC where stations exchange a token to regulate the access to the transmission channel.

Finally, we have described in more detail two successful Local Area Network technologies: Ethernet and WiFi.
Ethernet is now the de facto LAN technology. We have analysed the evolution of Ethernet including the operation of hubs and switches. We have also described the Spanning Tree Protocol that must be used when switches are interconnected. Over the last few years, WiFi has become the de facto wireless technology at home and inside enterprises. We have explained the operation of WiFi networks and described the main 802.11 frames.

6.5 Exercises

1. Consider the switched network shown in the figure below. What is the spanning tree that will be computed by 802.1d in this network assuming that all links have a unit cost? Indicate the state of each port.

Figure 6.38: A small network composed of Ethernet switches

2. Consider the switched network shown in the figure above. In this network, assume that the LAN between switches 3 and 12 fails. How should the switches update their port/address tables after the link failure?

3. Many enterprise networks are organized with a set of backbone devices interconnected by using a full mesh of links as shown in the figure below. In this network, what are the benefits and drawbacks of using Ethernet switches and IP routers running OSPF?

Figure 6.39: A typical enterprise backbone network

4. Most commercial Ethernet switches are able to run the Spanning tree protocol independently on each VLAN. What are the benefits of using per-VLAN spanning trees?




CHAPTER 7 Glossary

AIMD
    Additive Increase, Multiplicative Decrease. A rate adaption algorithm used notably by TCP where a host additively increases its transmission rate when the network is not congested and multiplicatively decreases it when congestion is detected.
anycast
    A transmission mode where information is sent from one source to one receiver that belongs to a specified group.
API
    Application Programming Interface
ARP
    The Address Resolution Protocol is a protocol used by IPv4 devices to obtain the datalink layer address that corresponds to an IPv4 address on the local area network. ARP is defined in RFC 826.
ARPANET
    The Advanced Research Project Agency (ARPA) Network is a network that was built by network scientists in the USA with funding from the ARPA of the US Department of Defense. ARPANET is considered as the grandfather of today's Internet.
ascii
    The American Standard Code for Information Interchange (ASCII) is a character-encoding scheme that defines a binary representation for characters. The ASCII table contains both printable characters and control characters. ASCII characters were encoded in 7 bits and only contained the characters required to write text in English. Other character sets such as Unicode have been developed later to support all written languages.
ASN.1
    The Abstract Syntax Notation One (ASN.1) was designed by ISO and ITU-T. It is a standard and flexible notation that can be used to describe data structures for representing, encoding, transmitting, and decoding data between applications. It was designed to be used in the Presentation layer of the OSI reference model but is now used in other protocols such as SNMP.
ATM
    Asynchronous Transfer Mode
BGP
    The Border Gateway Protocol is the interdomain routing protocol used in the global Internet.
BNF
    A Backus-Naur Form (BNF) is a formal way to describe a language by using syntactic and lexical rules. BNFs are frequently used to define programming languages, but also to define the messages exchanged between networked applications. RFC 5234 explains how a BNF must be written to specify an Internet protocol.
broadcast
    A transmission mode where the same information is sent to all nodes in the network.
CIDR
    Classless Inter Domain Routing is the current address allocation architecture for IPv4. It was defined in RFC 1518 and RFC 4632.
dial-up line
    A synonym for a regular telephone line, i.e. a line that can be used to dial any telephone number.
DNS
    The Domain Name System is a distributed database that can be queried by hosts to map names onto IP addresses. It is defined in RFC 1035.


eBGP
    An eBGP session is a BGP session between two directly connected routers that belong to two different Autonomous Systems. Also called an external BGP session.
EGP
    Exterior Gateway Protocol. Synonym of interdomain routing protocol.
EIGRP
    The Enhanced Interior Gateway Routing Protocol (EIGRP) is a proprietary intradomain routing protocol that is often used in enterprise networks. EIGRP uses the DUAL algorithm described in [Garcia1993].
frame
    A frame is the unit of information transfer in the datalink layer.
Frame-Relay
    A wide area networking technology using virtual circuits that is deployed by telecom operators.
FTP
    The File Transfer Protocol, defined in RFC 959, was the de facto protocol to exchange files over the Internet before the widespread adoption of HTTP (RFC 2616).
hosts.txt
    A file that initially contained the list of all Internet hosts with their IPv4 address. As the network grew, this file was replaced by the DNS, but each host still maintains a small hosts.txt file that can be used when DNS is not available.
HTML
    The HyperText Markup Language specifies the structure and the syntax of the documents that are exchanged on the world wide web. HTML is maintained by the HTML working group of the W3C.
HTTP
    The HyperText Transfer Protocol is defined in RFC 2616.
hub
    A relay operating in the physical layer.
IANA
    The Internet Assigned Numbers Authority (IANA) is responsible for the coordination of the DNS Root, IP addressing, and other Internet protocol resources.
iBGP
    An iBGP session is a BGP session between two routers belonging to the same Autonomous System. Also called an internal BGP session.
ICANN
    The Internet Corporation for Assigned Names and Numbers (ICANN) coordinates the allocation of domain names, IP addresses and AS numbers as well as protocol parameters. It also coordinates the operation and the evolution of the DNS root name servers.
IETF
    The Internet Engineering Task Force is a non-profit organisation that develops the standards for the protocols used in the Internet. The IETF mainly covers the transport and network layers. Several application layer protocols are also standardised within the IETF. The work in the IETF is organised in working groups. Most of the work is performed by exchanging emails and there are three IETF meetings every year. Participation is open to anyone. See http://www.ietf.org
IGP
    Interior Gateway Protocol. Synonym of intradomain routing protocol.
IGRP
    The Interior Gateway Routing Protocol (IGRP) is a proprietary intradomain routing protocol that uses distance vector. IGRP supports multiple metrics for each route but has been replaced by EIGRP.
IMAP
    The Internet Message Access Protocol (IMAP), defined in RFC 3501, is an application-level protocol that allows a client to access and manipulate the emails stored on a server. With IMAP, the email messages remain on the server and are not downloaded on the client.
Internet
    A public internet, i.e. a network composed of different networks that are running IPv4 or IPv6.
internet
    An internet is an internetwork, i.e. a network composed of different networks. The Internet is a very popular internetwork, but other internets have been used in the past.
inverse query
    For DNS servers and resolvers, an inverse query is a query for the domain name that corresponds to a given IP address.
IP
    Internet Protocol is the generic term for the network layer protocol in the TCP/IP protocol suite. IPv4 is widely used today and IPv6 is expected to replace IPv4.


IPv4
    Version 4 of the Internet Protocol, the connectionless network layer protocol used in most of the Internet today. IPv4 addresses are encoded as a 32 bit field.
IPv6
    Version 6 of the Internet Protocol, the connectionless network layer protocol which is intended to replace IPv4. IPv6 addresses are encoded as a 128 bit field.
IS-IS
    Intermediate System - Intermediate System. A link-state intradomain routing protocol that was initially defined for the ISO CLNP protocol but was extended to support IPv4 and IPv6. IS-IS is often used in ISP networks. It is defined in [ISO10589].
ISN
    The Initial Sequence Number of a TCP connection is the sequence number chosen by the client (resp. server) that is placed in the SYN (resp. SYN+ACK) segment during the establishment of the TCP connection.
ISO
    The International Organization for Standardization is an independent, non-governmental organisation that is based in Geneva and develops standards on various topics. Within ISO, country representatives vote to approve or reject standards. Most of the work on the development of ISO standards is done in expert working groups. Additional information about ISO may be obtained from http://www.iso.int
ISO-3166
    An ISO standard that defines codes to represent countries and their subdivisions. See http://www.iso.org/iso/country_codes.htm
ISP
    An Internet Service Provider, i.e. a network that provides Internet access to its clients.
ITU
    The International Telecommunication Union is a United Nations agency whose purpose is to develop standards for the telecommunication industry. It was initially created to standardise the basic telephone system but expanded later towards data networks. The work within ITU is mainly done by network specialists from the telecommunication industry (operators and vendors). See http://www.itu.int for more information.
IXP
    Internet eXchange Point. A location where routers belonging to different domains are attached to the same Local Area Network to establish peering sessions and exchange packets. See http://www.euro-ix.net/ or http://en.wikipedia.org/wiki/List_of_Internet_exchange_points_by_size for a partial list of IXPs.
LAN
    Local Area Network
leased line
    A telephone line that is permanently available between two endpoints.
MAN
    Metropolitan Area Network
MIME
    The Multipurpose Internet Mail Extensions (MIME) defined in RFC 2045 are a set of extensions to the format of email messages that allow the use of non-ASCII characters inside mail messages. A MIME message can be composed of several different parts, each having a different format.
MIME document
    A MIME document is a document encoded by using the MIME format.
minicomputer
    A minicomputer is a multi-user system that was typically used in the 1960s/1970s to serve departments. See the corresponding wikipedia article for additional information: http://en.wikipedia.org/wiki/Minicomputer
modem
    A modem (modulator-demodulator) is a device that encodes (resp. decodes) digital information by modulating (resp. demodulating) an analog signal. Modems are frequently used to transmit digital information over telephone lines and radio links. See http://en.wikipedia.org/wiki/Modem for a survey of various types of modems.
MSS
    A TCP option used by a TCP entity in SYN segments to indicate the Maximum Segment Size that it is able to receive.
multicast
    A transmission mode where information is sent efficiently to all the receivers that belong to a given group.
nameserver
    A server that implements the DNS protocol and can answer queries for names inside its own domain.
NAT
    A Network Address Translator is a middlebox that translates IP packets.
NBMA
    A Non-Broadcast Multiple Access network is a subnetwork that supports multiple hosts/routers but does not provide an efficient way of sending broadcast frames to all devices attached to the subnetwork. ATM subnetworks are an example of NBMA networks.


network-byte order
    Internet protocols transport sequences of bytes. These sequences of bytes are sufficient to carry ASCII characters. The network-byte order refers to the Big-Endian encoding of 16 and 32 bit integers. See http://en.wikipedia.org/wiki/Endianness
NFS
    The Network File System is defined in RFC 1094.
NTP
    The Network Time Protocol is defined in RFC 1305.
OSI
    Open Systems Interconnection. A set of networking standards developed by ISO including the 7 layers OSI reference model.
OSPF
    Open Shortest Path First. A link-state intradomain routing protocol that is often used in enterprise and ISP networks. OSPF is defined in RFC 2328 and RFC 5340.
packet
    A packet is the unit of information transfer in the network layer.
PBL
    Problem-based learning is a teaching approach that relies on problems.
POP
    The Post Office Protocol (POP), defined in RFC 1939, is an application-level protocol that allows a client to download email messages stored on a server.
resolver
    A server that implements the DNS protocol and can resolve queries. A resolver usually serves a set of clients (e.g. all hosts in a campus or all clients of a given ISP). It sends DNS queries to nameservers everywhere on behalf of its clients and stores the received answers in its cache. A resolver must know the IP addresses of the root nameservers.
RIP
    Routing Information Protocol. An intradomain routing protocol based on distance vectors that is sometimes used in enterprise networks. RIP is defined in RFC 2453.
RIR
    Regional Internet Registry. An organisation that manages IP addresses and AS numbers on behalf of IANA.
root nameserver
    A name server that is responsible for the root of the domain names hierarchy. There are currently a dozen root nameservers. See http://www.root-servers.org/ for more information about the operation of these root servers.
round-trip-time
    The round-trip-time (RTT) is the delay between the transmission of a segment and the reception of the corresponding acknowledgement in a transport protocol.
router
    A relay operating in the network layer.
RPC
    Several types of remote procedure calls have been defined. The RPC mechanism defined in RFC 5531 is used by applications such as NFS.
SDU (Service Data Unit)
    A Service Data Unit is the unit of information transferred between applications.
segment
    A segment is the unit of information transfer in the transport layer.
SMTP
    The Simple Mail Transfer Protocol is defined in RFC 821.
SNMP
    The Simple Network Management Protocol is a management protocol defined for TCP/IP networks.
socket
    A low-level API originally defined on Berkeley Unix to allow programmers to develop clients and servers.
spoofed packet
    A packet is said to be spoofed when the sender of the packet has used as source address a different address than its own.
SSH
    The Secure Shell (SSH) Transport Layer Protocol is defined in RFC 4253.
standard query
    For DNS servers and resolvers, a standard query is a query for an A or an AAAA record. Such a query typically returns an IP address.
switch
    A relay operating in the datalink layer.
SYN cookie
    SYN cookies are a technique used to compute the initial sequence number (ISN).
TCB
    The Transmission Control Block is the set of variables that are maintained for each established TCP connection by a TCP implementation.
TCP
    The Transmission Control Protocol is a protocol of the transport layer in the TCP/IP protocol suite that provides a reliable bytestream connection-oriented service on top of IP.

TCP/IP
    Refers to the TCP and IP protocols.

telnet
    The telnet protocol is defined in RFC 854.

TLD
    A top-level domain name. There are two types of TLDs. The ccTLDs are the TLDs that correspond to a two-letter ISO-3166 country code. The gTLDs are the generic TLDs that are not assigned to a country.

TLS
    Transport Layer Security, defined in RFC 5246, is a cryptographic protocol that is used to provide communication security for Internet applications. This protocol is used on top of the transport service, but a detailed description is outside the scope of this book.

UDP
    The User Datagram Protocol is a protocol of the transport layer in the TCP/IP protocol suite that provides an unreliable connectionless service that includes a mechanism to detect corruption.

unicast
    A transmission mode where information is sent from one source to one recipient.

vnc
    A networked application that allows a user to remotely access a computer's Graphical User Interface. See http://en.wikipedia.org/wiki/Virtual_Network_Computing

W3C
    The World Wide Web Consortium was created to standardise the protocols and mechanisms used in the global WWW. It is thus focused on a subset of the application layer. See http://www.w3c.org

WAN
    Wide Area Network.

X.25
    A wide area networking technology using virtual circuits that was deployed by telecom operators.

X11
    The X Window System and the associated protocols are defined in [SG1990].

XML
    The eXtensible Markup Language (XML) is a flexible text format derived from SGML. It was originally designed for the electronic publishing industry but is now used by a wide variety of applications that need to exchange structured data. The XML specifications are maintained by several working groups of the W3C.
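The network-byte order entry of this glossary can be illustrated with a short Python sketch. It packs a 16-bit port number and a 32-bit IPv4 address in big-endian order using the standard `struct` and `socket` modules. The helper names `encode_header` and `decode_header`, and the six-byte layout itself, are assumptions made for this illustration; they do not correspond to any protocol described in the book.

```python
import socket
import struct


def encode_header(port, ipv4):
    """Pack a 16-bit port and a 32-bit IPv4 address in network-byte order.

    Hypothetical helper: the (port, address) layout is only an illustration.
    """
    # inet_aton returns the four address bytes, already in network order.
    address = struct.unpack("!I", socket.inet_aton(ipv4))[0]
    # "!" selects network (big-endian) order; H = 16 bits, I = 32 bits.
    return struct.pack("!HI", port, address)


def decode_header(data):
    """Inverse of encode_header (hypothetical helper)."""
    port, address = struct.unpack("!HI", data)
    return port, socket.inet_ntoa(struct.pack("!I", address))


if __name__ == "__main__":
    wire = encode_header(8080, "192.168.0.1")
    print(wire.hex())            # 1f90c0a80001
    print(decode_header(wire))   # (8080, '192.168.0.1')
```

The most significant byte of each integer is sent first: 8080 is 0x1F90, so the first two bytes on the wire are 0x1F followed by 0x90, whatever the byte order of the host that runs the code.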


    CHAPTER 8
    Bibliography

    Whenever possible, the bibliography includes stable hypertext links to the references cited.




Bibliography

[802.11] LAN/MAN Standards Committee of the IEEE Computer Society. IEEE Standard for Information Technology - Telecommunications and information exchange between systems - local and metropolitan area networks - specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE, 1999.
[802.1d] LAN/MAN Standards Committee of the IEEE Computer Society, IEEE Standard for Local and metropolitan area networks - Media Access Control (MAC) Bridges, IEEE Std 802.1D-2004, 2004.
[802.1q] LAN/MAN Standards Committee of the IEEE Computer Society, IEEE Standard for Local and metropolitan area networks - Virtual Bridged Local Area Networks, 2005.
[802.2] IEEE 802.2-1998 (ISO/IEC 8802-2:1998), IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements - Part 2: Logical Link Control. Available from http://standards.ieee.org/getieee802/802.2.html
[802.3] LAN/MAN Standards Committee of the IEEE Computer Society. IEEE Standard for Information Technology - Telecommunications and information exchange between systems - local and metropolitan area networks - specific requirements - Part 3: Carrier Sense multiple access with collision detection (CSMA/CD) access method and physical layer specification. IEEE, 2000. Available from http://standards.ieee.org/getieee802/802.3.html
[802.5] LAN/MAN Standards Committee of the IEEE Computer Society. IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements - Part 5: Token Ring Access Method and Physical Layer Specification. IEEE, 1998. Available from http://standards.ieee.org/getieee802
[ACO+2006] Augustin, B., Cuvellier, X., Orgogozo, B., Viger, F., Friedman, T., Latapy, M., Magnien, C., Teixeira, R., Avoiding traceroute anomalies with Paris traceroute, Internet Measurement Conference, October 2006. See also http://www.paris-traceroute.net/
[AS2004] Androutsellis-Theotokis, S. and Spinellis, D. 2004. A survey of peer-to-peer content distribution technologies. ACM Comput. Surv. 36, 4 (December 2004), 335-371.
[ATLAS2009] Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, J. and Jahanian, F., Internet inter-domain traffic. In Proceedings of the ACM SIGCOMM 2010 conference on SIGCOMM (SIGCOMM '10). ACM, New York, NY, USA, 75-86.
[AW05] Arlitt, M. and Williamson, C. 2005. An analysis of TCP reset behaviour on the internet. SIGCOMM Comput. Commun. Rev. 35, 1 (Jan. 2005), 37-44.

[Abramson1970] Abramson, N., THE ALOHA SYSTEM: another alternative for computer communications. In Proceedings of the November 17-19, 1970, Fall Joint Computer Conference (Houston, Texas, November 17-19, 1970). AFIPS '70 (Fall). ACM, New York, NY, 281-285.
[B1989] Berners-Lee, T., Information Management: A Proposal, March 1989
[Baran] Baran, P., On distributed communications series, http://www.rand.org/about/history/baran.list.html
[BE2007] Biondi, P. and A. Ebalard, IPv6 Routing Header Security, CanSecWest Security Conference 2007, April 2007.
[BF1995] Bonomi, F. and Fendick, K.W., The rate-based flow control framework for the available bit rate ATM service, IEEE Network, Mar/Apr 1995, Volume: 9, Issue: 2, pages: 25-39
[BG1992] Bertsekas, D., Gallager, G., Data networks, second edition, Prentice Hall, 1992
[BMO2006] Bhatia, M., Manral, V., Ohara, Y., IS-IS and OSPF Difference Discussions, work in progress, Internet draft, Jan. 2006
[BMvB2009] Bagnulo, M., Matthews, P., van Beijnum, I., NAT64: Network Address and Protocol Translation from IPv6 Clients to IPv4 Servers, Internet draft, work in progress, October 2009
[BNT1997] Beech, W., Nielsen, D., Taylor, J., AX.25 Link Access Protocol for Amateur Packet Radio, version 2.2, Revision: July 1998
[BOP1994] Brakmo, L.S., O'Malley, S.W., and Peterson, L.L., TCP Vegas: new techniques for congestion detection and avoidance. In Proceedings of the Conference on Communications Architectures, Protocols and Applications (London, United Kingdom, August 31 - September 02, 1994). SIGCOMM '94. ACM, New York, NY, 24-35.
[Benvenuti2005] Benvenuti, C., Understanding Linux Network Internals, O'Reilly Media, 2005
[Bush1945] Bush, V., As we may think, The Atlantic Monthly 176 (July 1945), pp. 101
[Bush1993] Bush, R., FidoNet: technology, tools, and history. Commun. ACM 36, 8 (Aug. 1993), 31-35.
[Bux1989] Bux, W., Token-ring local-area networks and their performance, Proceedings of the IEEE, Vol 77, No 2, p. 238-259, Feb. 1989
[BYL2008] Buford, J., Yu, H., Lua, E.K., P2P Networking and Applications, Morgan Kaufmann, 2008
[CB2003] Cheswick, William R., Bellovin, Steven M., Rubin, Aviel D., Firewalls and internet security - Second edition - Repelling the Wily Hacker, Addison-Wesley, 2003
[CD2008] Calvert, K., Donahoo, M., TCP/IP Sockets in Java: Practical Guide for Programmers, Morgan Kaufman, 2008
[CJ1989] Chiu, D., Jain, R., Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks, Computer Networks and ISDN Systems Vol 17, pp 1-14, 1989
[CK74] Cerf, V., Kahn, R., A Protocol for Packet Network Intercommunication, IEEE Transactions on Communications, May 1974
[CNPI09] Gont, F., Security Assessment of the Transmission Control Protocol (TCP), Internet draft, work in progress, Jan. 2011
[COZ2008] Chi, Y., Oliveira, R., Zhang, L., Cyclops: The Internet AS-level Observatory, ACM SIGCOMM Computer Communication Review (CCR), October 2008
[CSP2009] Carr, B., Sury, O., Palet Martinez, J., Davidson, A., Evans, R., Yilmaz, F., Wijte, Y., IPv6 Address Allocation and Assignment Policy, RIPE document ripe-481, September 2009
[CT1980] Crane, R., Taft, E., Practical considerations in Ethernet local network design, Proc. of the 13th Hawaii International Conference on Systems Sciences, Honolulu, January 1980, pp. 166
[Cheshire2010] Cheshire, S., Connect-By-Name for IPv6, presentation at IETF 79th, November 2010
[Cheswick1990] Cheswick, B., An Evening with Berferd In Which a Cracker is Lured, Endured, and Studied, Proc. Winter USENIX Conference, 1990, pp. 163-174

[Clark88] Clark D., The Design Philosophy of the DARPA Internet Protocols, Computer Communications Review 18:4, August 1988, pp. 106-114
[Comer1988] Comer, D., Internetworking with TCP/IP: principles, protocols & architecture, Prentice Hall, 1988
[Comer1991] Comer D., Internetworking With TCP/IP: Design Implementation and Internals, Prentice Hall, 1991
[Cohen1980] Cohen, D., On Holy Wars and a Plea for Peace, IEN 137, April 1980, http://www.ietf.org/rfc/ien/ien137.txt
[DC2009] Donahoo, M., Calvert, K., TCP/IP Sockets in C: Practical Guide for Programmers, Morgan Kaufman, 2009
[DIX] Digital, Intel, Xerox, The Ethernet: a local area network: data link layer and physical layer specifications. SIGCOMM Comput. Commun. Rev. 11, 3 (Jul. 1981), 20-66.
[DKF+2007] Dimitropoulos, X., Krioukov, D., Fomenkov, M., Huffaker, B., Hyun, Y., Claffy, K., Riley, G., AS Relationships: Inference and Validation, ACM SIGCOMM Computer Communication Review (CCR), Jan. 2007
[DP1981] Dalal, Y.K. and Printis, R.S., 48-bit absolute internet and Ethernet host numbers. In Proceedings of the Seventh Symposium on Data Communications (Mexico City, Mexico, October 27-29, 1981). SIGCOMM '81. ACM, New York, NY, 240-245.
[Dunkels2003] Dunkels, A., Full TCP/IP for 8-Bit Architectures. In Proceedings of the first international conference on mobile applications, systems and services (MOBISYS 2003), San Francisco, May 2003.
[DT2007] Donnet, B. and Friedman, T., Internet Topology Discovery: a Survey. IEEE Communications Surveys and Tutorials, 9(4):2-15, December 2007
[DYGU2004] Davik, F., Yilmaz, M., Gjessing, S., Uzun, N., IEEE 802.17 resilient packet ring tutorial, IEEE Communications Magazine, Mar 2004, Vol 42, N 3, p. 112-118
[Dijkstra1959] Dijkstra, E., A Note on Two Problems in Connection with Graphs. Numerische Mathematik, 1:269-271, 1959
[FDDI] ANSI. Information systems - Fiber Distributed Data Interface (FDDI) - token ring media access control (MAC). ANSI X3.139-1987 (R1997), 1997
[Fletcher1982] Fletcher, J., An Arithmetic Checksum for Serial Transmissions, Communications, IEEE Transactions on, Jan. 1982, Vol. 30, N. 1, pp. 247-252
[FFEB2005] Francois, P., Filsfils, C., Evans, J., and Bonaventure, O., Achieving sub-second IGP convergence in large IP networks. SIGCOMM Comput. Commun. Rev. 35, 3 (Jul. 2005), 35-44.
[FJ1994] Floyd, S., and Jacobson, V., The Synchronization of Periodic Routing Messages, IEEE/ACM Transactions on Networking, V.2 N.2, p. 122-136, April 1994
[FLM2008] Fuller, V., Lear, E., Meyer, D., Reclassifying 240/4 as usable unicast address space, Internet draft, March 2008, work in progress
[FRT2002] Fortz, B., Rexford, J., Thorup, M., Traffic engineering with traditional IP routing protocols, IEEE Communication Magazine, October 2002
[FTY99] Theodore Faber, Joe Touch, and Wei Yue, The TIME-WAIT state in TCP and Its Effect on Busy Servers, Proc. Infocom '99, pp. 1573
[Feldmeier95] Feldmeier, D.C., Fast software implementation of error detection codes. IEEE/ACM Trans. Netw. 3, 6 (Dec. 1995), 640-651.
[GAVE1999] Govindan, R., Alaettinoglu, C., Varadhan, K., Estrin, D., An Architecture for Stable, Analyzable Internet Routing, IEEE Network Magazine, Vol. 13, No. 1, pp. 29, January 1999
[GC2000] Grier, D., Campbell, M., A social history of Bitnet and Listserv, 1985-1991, Annals of the History of Computing, IEEE, Volume 22, Issue 2, Apr-Jun 2000, pp. 32-41
[Genilloud1990] Genilloud, G., X.400 MHS: first steps towards an EDI communication standard. SIGCOMM Comput. Commun. Rev. 20, 2 (Apr. 1990), 72-86.

[GGR2001] Gao, L., Griffin, T., Rexford, J., Inherently safe backup routing with BGP, Proc. IEEE INFOCOM, April 2001
[GR2001] Gao, L., Rexford, J., Stable Internet routing without global coordination, IEEE/ACM Transactions on Networking, December 2001, pp. 681-692
[GSW2002] Griffin, T.G., Shepherd, F.B., and Wilfong, G., The stable paths problem and interdomain routing. IEEE/ACM Trans. Netw. 10, 2 (Apr. 2002), 232-243
[GW1999] Griffin, T.G. and Wilfong, G., An analysis of BGP convergence properties. SIGCOMM Comput. Commun. Rev. 29, 4 (Oct. 1999), 277-288.
[GW2002] Griffin, T. and Wilfong, G.T., Analysis of the MED Oscillation Problem in BGP. In Proceedings of the 10th IEEE International Conference on Network Protocols (November 12-15, 2002). ICNP. IEEE Computer Society, Washington, DC, 90-99
[Garcia1993] Garcia-Luna-Aceves, J., Loop-Free Routing Using Diffusing Computations, IEEE/ACM Transactions on Networking, Vol. 1, No. 1, Feb. 1993
[Gast2002] Gast, M., 802.11 Wireless Networks: The Definitive Guide, O'Reilly, 2002
[Gill2004] Gill, V., Lack of Priority Queuing Considered Harmful, ACM Queue, December 2004
[Goralski2009] Goralski, W., The Illustrated network: How TCP/IP works in a modern network, Morgan Kaufmann, 2009
[HFPMC2002] Huffaker, B., Fomenkov, M., Plummer, D., Moore, D., Claffy, K., Distance Metrics in the Internet, Presented at the IEEE International Telecommunications Symposium (ITS) in 2002.
[HRX2008] Ha, S., Rhee, I., and Xu, L., CUBIC: a new TCP-friendly high-speed TCP variant. SIGOPS Oper. Syst. Rev. 42, 5 (Jul. 2008), 64-74.
[ISO10589] Information technology - Telecommunications and information exchange between systems - Intermediate System to Intermediate System intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (ISO 8473), 2002
[Jacobson1988] Jacobson, V., Congestion avoidance and control. In Symposium Proceedings on Communications Architectures and Protocols (Stanford, California, United States, August 16-18, 1988). V. Cerf, Ed. SIGCOMM '88. ACM, New York, NY, 314-329.
[JSBM2002] Jung, J., Sit, E., Balakrishnan, H., and Morris, R. 2002. DNS performance and the effectiveness of caching. IEEE/ACM Trans. Netw. 10, 5 (Oct. 2002), 589-603.
[Kerrisk2010] Kerrisk, M., The Linux Programming Interface, No Starch Press, 2010
[KM1995] Kent, C.A. and Mogul, J.C., Fragmentation considered harmful. SIGCOMM Comput. Commun. Rev. 25, 1 (Jan. 1995), 75-87.
[KP91] Karn, P. and Partridge, C., Improving round-trip time estimates in reliable transport protocols. ACM Trans. Comput. Syst. 9, 4 (Nov. 1991), 364-373.
[KPD1985] Karn, P., Price, H., Diersing, R., Packet radio in amateur service, IEEE Journal on Selected Areas in Communications, 3, May 1985
[KPS2003] Kaufman, C., Perlman, R., and Sommerfeld, B., DoS protection for UDP-based protocols. In Proceedings of the 10th ACM Conference on Computer and Communications Security (Washington D.C., USA, October 27-30, 2003). CCS '03. ACM, New York, NY, 2-7.
[KR1995] Kung, N.T., Morris, R., Credit-based flow control for ATM networks, IEEE Network, Mar/Apr 1995, Volume: 9, Issue: 2, pages: 40-48
[KT1975] Kleinrock, L., Tobagi, F., Packet Switching in Radio Channels: Part I - Carrier Sense Multiple-Access Modes and their Throughput-Delay Characteristics, IEEE Transactions on Communications, Vol. COM-23, No. 12, pp. 1400-1416, December 1975.
[KW2009] Katz, D., Ward, D., Bidirectional Forwarding Detection, RFC 5880, June 2010
[KZ1989] Khanna, A. and Zinky, J. 1989. The revised ARPANET routing metric. SIGCOMM Comput. Commun. Rev. 19, 4 (Aug. 1989), 45-56.

[KuroseRoss09] Kurose J. and Ross K., Computer networking: a top-down approach featuring the Internet, Addison-Wesley, 2009
[Licklider1963] Licklider, J., Memorandum For Members and Affiliates of the Intergalactic Computer Network, 1963
[LCCD09] Leiner, B.M., Cerf, V.G., Clark, D.D., Kahn, R.E., Kleinrock, L., Lynch, D.C., Postel, J., Roberts, L.G., and Wolff, S., A brief history of the internet. SIGCOMM Comput. Commun. Rev. 39, 5 (Oct. 2009), 22-31.
[LCP2005] Eng Keong Lua, Crowcroft, J., Pias, M., Sharma, R., Lim, S., A survey and comparison of peer-to-peer overlay network schemes, Communications Surveys & Tutorials, IEEE, Volume: 7, Issue: 2, 2005, pp. 72-93
[LFJLMT] Leffler, S., Fabry, R., Joy, W., Lapsley, P., Miller, S., Torek, C., An Advanced 4.4BSD Interprocess Communication Tutorial, 4.4BSD Programmer's Supplementary Documentation
[LSP1982] Lamport, L., Shostak, R., and Pease, M., The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4, 3 (Jul. 1982), 382-401.
[Leboudec2008] Leboudec, J.-Y., Rate Adaptation Congestion Control and Fairness: a tutorial, Dec. 2008
[Malamud1991] Malamud, C., Analyzing DECnet/OSI phase V, Van Nostrand Reinhold, 1991
[McFadyen1976] McFadyen, J., Systems Network Architecture: An overview, IBM Systems Journal, Vol. 15, N. 1, pp. 4-23, 1976
[McKusick1999] McKusick, M., Twenty Years of Berkeley Unix: From AT&T-Owned to Freely Redistributable, in Open Sources: Voices from the Open Source Revolution, O'Reilly, 1999, http://oreilly.com/catalog/opensources/book/toc.html
[ML2011] Minei I. and Lucek J., MPLS-Enabled Applications: Emerging Developments and New Technologies (Wiley Series on Communications Networking & Distributed Systems), Wiley, 2011
[MRR1979] McQuillan, J.M., Richer, I., and Rosen, E.C., An overview of the new routing algorithm for the ARPANET. In Proceedings of the Sixth Symposium on Data Communications (Pacific Grove, California, United States, November 27-29, 1979). SIGCOMM '79. ACM, New York, NY, 63-68.
[MSMO1997] Mathis, M., Semke, J., Mahdavi, J., and Ott, T. 1997. The macroscopic behavior of the TCP congestion avoidance algorithm. SIGCOMM Comput. Commun. Rev. 27, 3 (Jul. 1997), 67-82.
[MSV1987] Molle, M., Sohraby, K., Venetsanopoulos, A., Space-Time Models of Asynchronous CSMA Protocols for Local Area Networks, IEEE Journal on Selected Areas in Communications, Volume: 5, Issue: 6, Jul 1987, Page(s): 956-96
[MUF+2007] Mühlbauer, W., Uhlig, S., Fu, B., Meulle, M., and Maennel, O., In search for an appropriate granularity to model routing policies. In Proceedings of the 2007 Conference on Applications, Technologies, Architectures, and Protocols For Computer Communications (Kyoto, Japan, August 27-31, 2007). SIGCOMM '07. ACM, New York, NY, 145-156.
[Malkin1999] Malkin, G., RIP: An Intra-Domain Routing Protocol, Addison Wesley, 1999
[Metcalfe1976] Metcalfe R., Boggs, D., Ethernet: Distributed packet-switching for local computer networks. Communications of the ACM, 19(7):395, 1976.
[Mills2006] Mills, D.L., Computer Network Time Synchronization: the Network Time Protocol. CRC Press, March 2006, 304 pp.
[Miyakawa2008] Miyakawa, S., From IPv4 only To v4/v6 Dual Stack, IETF72 IAB Technical Plenary, July 2008
[Mogul1995] Mogul, J., The case for persistent-connection HTTP. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols For Computer Communication (Cambridge, Massachusetts, United States, August 28 - September 01, 1995). D. Oran, Ed. SIGCOMM '95. ACM, New York, NY, 299-313.
[Moore] Moore, R., Packet switching history, http://rogerdmoore.ca/PS/

[Moy1998] Moy, J., OSPF: Anatomy of an Internet Routing Protocol, Addison Wesley, 1998
[Myers1998] Myers, B.A., A brief history of human-computer interaction technology. interactions 5, 2 (Mar. 1998), 44-54.
[Nelson1965] Nelson, T.H., Complex information processing: a file structure for the complex, the changing and the indeterminate. In Proceedings of the 1965 20th National Conference (Cleveland, Ohio, United States, August 24-26, 1965). L. Winner, Ed. ACM '65. ACM, New York, NY, 84-100.
[Paxson99] Paxson, V., End-to-end Internet packet dynamics. SIGCOMM Comput. Commun. Rev. 27, 4 (Oct. 1997), 139-152.
[Perlman1985] Perlman, R., An algorithm for distributed computation of a spanning tree in an extended LAN. SIGCOMM Comput. Commun. Rev. 15, 4 (Sep. 1985), 44-53.
[Perlman2000] Perlman, R., Interconnections: Bridges, routers, switches and internetworking protocols, 2nd edition, Addison Wesley, 2000
[Perlman2004] Perlman, R., RBridges: Transparent Routing, Proc. IEEE Infocom, March 2004.
[Pouzin1975] Pouzin, L., The CYCLADES Network - Present state and development trends, Symposium on Computer Networks, 1975, pp 8-13.
[Rago1993] Rago, S., UNIX System V network programming, Addison Wesley, 1993
[RE1989] Rochlis, J.A. and Eichin, M.W., With microscope and tweezers: the worm from MIT's perspective. Commun. ACM 32, 6 (Jun. 1989), 689-698.
[RFC20] Cerf, V., ASCII format for network interchange, RFC 20, Oct. 1969
[RFC768] Postel, J., User Datagram Protocol, RFC 768, Aug. 1980
[RFC789] Rosen, E., Vulnerabilities of network control protocols: An example, RFC 789, July 1981
[RFC791] Postel, J., Internet Protocol, RFC 791, Sep. 1981
[RFC792] Postel, J., Internet Control Message Protocol, RFC 792, Sep. 1981
[RFC793] Postel, J., Transmission Control Protocol, RFC 793, Sept. 1981
[RFC813] Clark, D., Window and Acknowledgement Strategy in TCP, RFC 813, July 1982
[RFC819] Su, Z. and Postel, J., Domain naming convention for Internet user applications, RFC 819, Aug. 1982
[RFC821] Postel, J., Simple Mail Transfer Protocol, RFC 821, Aug. 1982
[RFC822] Crocker, D., Standard for the format of ARPA Internet text messages, RFC 822, Aug. 1982
[RFC826] Plummer, D., Ethernet Address Resolution Protocol: Or Converting Network Protocol Addresses to 48.bit Ethernet Address for Transmission on Ethernet Hardware, RFC 826, Nov. 1982
[RFC879] Postel, J., TCP maximum segment size and related topics, RFC 879, Nov. 1983
[RFC893] Leffler, S. and Karels, M., Trailer encapsulations, RFC 893, April 1984
[RFC894] Hornig, C., A Standard for the Transmission of IP Datagrams over Ethernet Networks, RFC 894, April 1984
[RFC896] Nagle, J., Congestion Control in IP/TCP Internetworks, RFC 896, Jan. 1984
[RFC952] Harrenstien, K. and Stahl, M. and Feinler, E., DoD Internet host table specification, RFC 952, Oct. 1985
[RFC959] Postel, J. and Reynolds, J., File Transfer Protocol, RFC 959, Oct. 1985
[RFC974] Partridge, C., Mail routing and the domain system, RFC 974, Jan. 1986
[RFC1032] Stahl, M., Domain administrators guide, RFC 1032, Nov. 1987
[RFC1035] Mockapetris, P., Domain names - implementation and specification, RFC 1035, Nov. 1987
[RFC1042] Postel, J. and Reynolds, J., Standard for the transmission of IP datagrams over IEEE 802 networks, RFC 1042, Feb. 1988

[RFC1055] Romkey, J., Nonstandard for transmission of IP datagrams over serial lines: SLIP, RFC 1055, June 1988
[RFC1071] Braden, R., Borman D. and Partridge, C., Computing the Internet checksum, RFC 1071, Sep. 1988
[RFC1122] Braden, R., Requirements for Internet Hosts - Communication Layers, RFC 1122, Oct. 1989
[RFC1144] Jacobson, V., Compressing TCP/IP Headers for Low-Speed Serial Links, RFC 1144, Feb. 1990
[RFC1149] Waitzman, D., Standard for the transmission of IP datagrams on avian carriers, RFC 1149, Apr. 1990
[RFC1169] Cerf, V. and Mills, K., Explaining the role of GOSIP, RFC 1169, Aug. 1990
[RFC1191] Mogul, J. and Deering, S., Path MTU discovery, RFC 1191, Nov. 1990
[RFC1195] Callon, R., Use of OSI IS-IS for routing in TCP/IP and dual environments, RFC 1195, Dec. 1990
[RFC1258] Kantor, B., BSD Rlogin, RFC 1258, Sept. 1991
[RFC1321] Rivest, R., The MD5 Message-Digest Algorithm, RFC 1321, April 1992
[RFC1323] Jacobson, V., Braden R. and Borman, D., TCP Extensions for High Performance, RFC 1323, May 1992
[RFC1347] Callon, R., TCP and UDP with Bigger Addresses (TUBA), A Simple Proposal for Internet Addressing and Routing, RFC 1347, June 1992
[RFC1518] Rekhter, Y. and Li, T., An Architecture for IP Address Allocation with CIDR, RFC 1518, Sept. 1993
[RFC1519] Fuller V., Li T., Yu J. and Varadhan, K., Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy, RFC 1519, Sept. 1993
[RFC1542] Wimer, W., Clarifications and Extensions for the Bootstrap Protocol, RFC 1542, Oct. 1993
[RFC1548] Simpson, W., The Point-to-Point Protocol (PPP), RFC 1548, Dec. 1993
[RFC1550] Bradner, S. and Mankin, A., IP: Next Generation (IPng) White Paper Solicitation, RFC 1550, Dec. 1993
[RFC1561] Piscitello, D., Use of ISO CLNP in TUBA Environments, RFC 1561, Dec. 1993
[RFC1621] Francis, P., PIP Near-term architecture, RFC 1621, May 1994
[RFC1624] Rijsinghani, A., Computation of the Internet Checksum via Incremental Update, RFC 1624, May 1994
[RFC1631] Egevang K. and Francis, P., The IP Network Address Translator (NAT), RFC 1631, May 1994
[RFC1661] Simpson, W., The Point-to-Point Protocol (PPP), RFC 1661, Jul. 1994
[RFC1662] Simpson, W., PPP in HDLC-like Framing, RFC 1662, July 1994
[RFC1710] Hinden, R., Simple Internet Protocol Plus White Paper, RFC 1710, Oct. 1994
[RFC1738] Berners-Lee, T., Masinter, L., and McCahill M., Uniform Resource Locators (URL), RFC 1738, Dec. 1994
[RFC1752] Bradner, S. and Mankin, A., The Recommendation for the IP Next Generation Protocol, RFC 1752, Jan. 1995
[RFC1812] Baker, F., Requirements for IP Version 4 Routers, RFC 1812, June 1995
[RFC1819] Delgrossi, L., Berger, L., Internet Stream Protocol Version 2 (ST2) Protocol Specification - Version ST2+, RFC 1819, Aug. 1995
[RFC1889] Schulzrinne H., Casner S., Frederick, R. and Jacobson, V., RTP: A Transport Protocol for Real-Time Applications, RFC 1889, Jan. 1996
[RFC1896] Resnick P., Walker A., The text/enriched MIME Content-type, RFC 1896, Feb. 1996
[RFC1918] Rekhter Y., Moskowitz B., Karrenberg D., de Groot G. and Lear, E., Address Allocation for Private Internets, RFC 1918, Feb. 1996

[RFC1939] Myers, J. and Rose, M., Post Office Protocol - Version 3, RFC 1939, May 1996
[RFC1945] Berners-Lee, T., Fielding, R. and Frystyk, H., Hypertext Transfer Protocol - HTTP/1.0, RFC 1945, May 1996
[RFC1948] Bellovin, S., Defending Against Sequence Number Attacks, RFC 1948, May 1996
[RFC1951] Deutsch, P., DEFLATE Compressed Data Format Specification version 1.3, RFC 1951, May 1996
[RFC1981] McCann, J., Deering, S. and Mogul, J., Path MTU Discovery for IP version 6, RFC 1981, Aug. 1996
[RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, Oct. 1996
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and Romanow, A., TCP Selective Acknowledgment Options, RFC 2018, Oct. 1996
[RFC2045] Freed, N. and Borenstein, N., Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, RFC 2045, Nov. 1996
[RFC2046] Freed, N. and Borenstein, N., Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC 2046, Nov. 1996
[RFC2050] Hubbard, K. and Kosters, M. and Conrad, D. and Karrenberg, D. and Postel, J., Internet Registry IP Allocation Guidelines, RFC 2050, Nov. 1996
[RFC2080] Malkin, G. and Minnear, R., RIPng for IPv6, RFC 2080, Jan. 1997
[RFC2082] Baker, F. and Atkinson, R., RIP-2 MD5 Authentication, RFC 2082, Jan. 1997
[RFC2131] Droms, R., Dynamic Host Configuration Protocol, RFC 2131, March 1997
[RFC2140] Touch, J., TCP Control Block Interdependence, RFC 2140, April 1997
[RFC2225] Laubach, M., Halpern, J., Classical IP and ARP over ATM, RFC 2225, April 1998
[RFC2328] Moy, J., OSPF Version 2, RFC 2328, April 1998
[RFC2332] Luciani, J. and Katz, D. and Piscitello, D. and Cole, B. and Doraswamy, N., NBMA Next Hop Resolution Protocol (NHRP), RFC 2332, April 1998
[RFC2364] Gross, G. and Kaycee, M. and Li, A. and Malis, A. and Stephens, J., PPP Over AAL5, RFC 2364, July 1998
[RFC2368] Hoffman, P. and Masinter, L. and Zawinski, J., The mailto URL scheme, RFC 2368, July 1998
[RFC2453] Malkin, G., RIP Version 2, RFC 2453, Nov. 1998
[RFC2460] Deering S., Hinden, R., Internet Protocol, Version 6 (IPv6) Specification, RFC 2460, Dec. 1998
[RFC2464] Crawford, M., Transmission of IPv6 Packets over Ethernet Networks, RFC 2464, Dec. 1998
[RFC2507] Degermark, M. and Nordgren, B. and Pink, S., IP Header Compression, RFC 2507, Feb. 1999
[RFC2516] Mamakos, L. and Lidl, K. and Evarts, J. and Carrel, J. and Simone, D. and Wheeler, R., A Method for Transmitting PPP Over Ethernet (PPPoE), RFC 2516, Feb. 1999
[RFC2581] Allman, M. and Paxson, V. and Stevens, W., TCP Congestion Control, RFC 2581, April 1999
[RFC2616] Fielding, R. and Gettys, J. and Mogul, J. and Frystyk, H. and Masinter, L. and Leach, P. and Berners-Lee, T., Hypertext Transfer Protocol - HTTP/1.1, RFC 2616, June 1999
[RFC2617] Franks, J. and Hallam-Baker, P. and Hostetler, J. and Lawrence, S. and Leach, P. and Luotonen, A. and Stewart, L., HTTP Authentication: Basic and Digest Access Authentication, RFC 2617, June 1999
[RFC2622] Alaettinoglu, C. and Villamizar, C. and Gerich, E. and Kessens, D. and Meyer, D. and Bates, T. and Karrenberg, D. and Terpstra, M., Routing Policy Specification Language (RPSL), RFC 2622, June 1999
[RFC2675] Tsirtsis, G. and Srisuresh, P., Network Address Translation - Protocol Translation (NAT-PT), RFC 2766, Feb. 2000
[RFC2854] Connolly, D. and Masinter, L., The 'text/html' Media Type, RFC 2854, June 2000
[RFC2965] Kristol, D. and Montulli, L., HTTP State Management Mechanism, RFC 2965, Oct. 2000

[RFC2988] Paxson, V. and Allman, M., Computing TCP's Retransmission Timer, RFC 2988, Nov. 2000
[RFC2991] Thaler, D. and Hopps, C., Multipath Issues in Unicast and Multicast Next-Hop Selection, RFC 2991, Nov. 2000
[RFC3021] Retana, A. and White, R. and Fuller, V. and McPherson, D., Using 31-Bit Prefixes on IPv4 Point-to-Point Links, RFC 3021, Dec. 2000
[RFC3022] Srisuresh, P., Egevang, K., Traditional IP Network Address Translator (Traditional NAT), RFC 3022, Jan. 2001
[RFC3031] Rosen, E. and Viswanathan, A. and Callon, R., Multiprotocol Label Switching Architecture, RFC 3031, Jan. 2001
[RFC3168] Ramakrishnan, K. and Floyd, S. and Black, D., The Addition of Explicit Congestion Notification (ECN) to IP, RFC 3168, Sept. 2001
[RFC3243] Carpenter, B. and Brim, S., Middleboxes: Taxonomy and Issues, RFC 3234, Feb. 2002
[RFC3235] Senie, D., Network Address Translator (NAT)-Friendly Application Design Guidelines, RFC 3235, Jan. 2002
[RFC3309] Stone, J. and Stewart, R. and Otis, D., Stream Control Transmission Protocol (SCTP) Checksum Change, RFC 3309, Sept. 2002
[RFC3315] Droms, R. and Bound, J. and Volz, B. and Lemon, T. and Perkins, C. and Carney, M., Dynamic Host Configuration Protocol for IPv6 (DHCPv6), RFC 3315, July 2003
[RFC3330] IANA, Special-Use IPv4 Addresses, RFC 3330, Sept. 2002
[RFC3360] Floyd, S., Inappropriate TCP Resets Considered Harmful, RFC 3360, Aug. 2002
[RFC3390] Allman, M. and Floyd, S. and Partridge, C., Increasing TCP's Initial Window, RFC 3390, Oct. 2002
[RFC3490] Faltstrom, P. and Hoffman, P. and Costello, A., Internationalizing Domain Names in Applications (IDNA), RFC 3490, March 2003
[RFC3501] Crispin, M., Internet Message Access Protocol - Version 4 rev1, RFC 3501, March 2003
[RFC3513] Hinden, R. and Deering, S., Internet Protocol Version 6 (IPv6) Addressing Architecture, RFC 3513, April 2003
[RFC3596] Thomson, S. and Huitema, C. and Ksinant, V. and Souissi, M., DNS Extensions to Support IP Version 6, RFC 3596, October 2003
[RFC3748] Aboba, B. and Blunk, L. and Vollbrecht, J. and Carlson, J. and Levkowetz, H., Extensible Authentication Protocol (EAP), RFC 3748, June 2004
[RFC3819] Karn, P. and Bormann, C. and Fairhurst, G. and Grossman, D. and Ludwig, R. and Mahdavi, J. and Montenegro, G. and Touch, J. and Wood, L., Advice for Internet Subnetwork Designers, RFC 3819, July 2004
[RFC3828] Larzon, L-A. and Degermark, M. and Pink, S. and Jonsson, L-E. and Fairhurst, G., The Lightweight User Datagram Protocol (UDP-Lite), RFC 3828, July 2004
[RFC3927] Cheshire, S. and Aboba, B. and Guttman, E., Dynamic Configuration of IPv4 Link-Local Addresses, RFC 3927, May 2005
[RFC3931] Lau, J. and Townsley, M. and Goyret, I., Layer Two Tunneling Protocol - Version 3 (L2TPv3), RFC 3931, March 2005
[RFC3971] Arkko, J. and Kempf, J. and Zill, B. and Nikander, P., SEcure Neighbor Discovery (SEND), RFC 3971, March 2005
[RFC3972] Aura, T., Cryptographically Generated Addresses (CGA), RFC 3972, March 2005
[RFC3986] Berners-Lee, T. and Fielding, R. and Masinter, L., Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005
[RFC4033] Arends, R. and Austein, R. and Larson, M. and Massey, D. and Rose, S., DNS Security Introduction and Requirements, RFC 4033, March 2005

[RFC4193] Hinden, R. and Haberman, B., Unique Local IPv6 Unicast Addresses, RFC 4193, Oct. 2005
[RFC4251] Ylonen, T. and Lonvick, C., The Secure Shell (SSH) Protocol Architecture, RFC 4251, Jan. 2006
[RFC4264] Griffin, T. and Huston, G., BGP Wedgies, RFC 4264, Nov. 2005
[RFC4271] Rekhter, Y. and Li, T. and Hares, S., A Border Gateway Protocol 4 (BGP-4), RFC 4271, Jan. 2006
[RFC4291] Hinden, R. and Deering, S., IP Version 6 Addressing Architecture, RFC 4291, Feb. 2006
[RFC4301] Kent, S. and Seo, K., Security Architecture for the Internet Protocol, RFC 4301, Dec. 2005
[RFC4302] Kent, S., IP Authentication Header, RFC 4302, Dec. 2005
[RFC4303] Kent, S., IP Encapsulating Security Payload (ESP), RFC 4303, Dec. 2005
[RFC4340] Kohler, E. and Handley, M. and Floyd, S., Datagram Congestion Control Protocol (DCCP), RFC 4340, March 2006
[RFC4443] Conta, A. and Deering, S. and Gupta, M., Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification, RFC 4443, March 2006
[RFC4451] McPherson, D. and Gill, V., BGP MULTI_EXIT_DISC (MED) Considerations, RFC 4451, March 2006
[RFC4456] Bates, T. and Chen, E. and Chandra, R., BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP), RFC 4456, April 2006
[RFC4614] Duke, M. and Braden, R. and Eddy, W. and Blanton, E., A Roadmap for Transmission Control Protocol (TCP) Specification Documents, RFC 4614, Oct. 2006
[RFC4648] Josefsson, S., The Base16, Base32, and Base64 Data Encodings, RFC 4648, Oct. 2006
[RFC4822] Atkinson, R. and Fanto, M., RIPv2 Cryptographic Authentication, RFC 4822, Feb. 2007
[RFC4838] Cerf, V. and Burleigh, S. and Hooke, A. and Torgerson, L. and Durst, R. and Scott, K. and Fall, K. and Weiss, H., Delay-Tolerant Networking Architecture, RFC 4838, April 2007
[RFC4861] Narten, T. and Nordmark, E. and Simpson, W. and Soliman, H., Neighbor Discovery for IP version 6 (IPv6), RFC 4861, Sept. 2007
[RFC4862] Thomson, S. and Narten, T. and Jinmei, T., IPv6 Stateless Address Autoconfiguration, RFC 4862, Sept. 2007
[RFC4870] Delany, M., Domain-Based Email Authentication Using Public Keys Advertised in the DNS (DomainKeys), RFC 4870, May 2007
[RFC4871] Allman, E. and Callas, J. and Delany, M. and Libbey, M. and Fenton, J. and Thomas, M., DomainKeys Identified Mail (DKIM) Signatures, RFC 4871, May 2007
[RFC4941] Narten, T. and Draves, R. and Krishnan, S., Privacy Extensions for Stateless Address Autoconfiguration in IPv6, RFC 4941, Sept. 2007
[RFC4944] Montenegro, G. and Kushalnagar, N. and Hui, J. and Culler, D., Transmission of IPv6 Packets over IEEE 802.15.4 Networks, RFC 4944, Sept. 2007
[RFC4952] Klensin, J. and Ko, Y., Overview and Framework for Internationalized Email, RFC 4952, July 2007
[RFC4953] Touch, J., Defending TCP Against Spoofing Attacks, RFC 4953, July 2007
[RFC4954] Siemborski, R. and Melnikov, A., SMTP Service Extension for Authentication, RFC 4954, July 2007
[RFC4963] Heffner, J. and Mathis, M. and Chandler, B., IPv4 Reassembly Errors at High Data Rates, RFC 4963, July 2007
[RFC4966] Aoun, C. and Davies, E., Reasons to Move the Network Address Translator - Protocol Translator (NAT-PT) to Historic Status, RFC 4966, July 2007
[RFC4987] Eddy, W., TCP SYN Flooding Attacks and Common Mitigations, RFC 4987, Aug. 2007
[RFC5004] Chen, E. and Sangli, S., Avoid BGP Best Path Transitions from One External to Another, RFC 5004, Sept. 2007

    [RFC5065] Traina, P. and McPherson, D. and Scudder, J., Autonomous System Confederations for BGP, RFC 5065, Aug. 2007
    [RFC5068] Hutzler, C. and Crocker, D. and Resnick, P. and Allman, E. and Finch, T., Email Submission Operations: Access and Accountability Requirements, RFC 5068, Nov. 2007
    [RFC5072] Varada, S. and Haskins, D. and Allen, E., IP Version 6 over PPP, RFC 5072, Sept. 2007
    [RFC5095] Abley, J. and Savola, P. and Neville-Neil, G., Deprecation of Type 0 Routing Headers in IPv6, RFC 5095, Dec. 2007
    [RFC5227] Cheshire, S., IPv4 Address Conflict Detection, RFC 5227, July 2008
    [RFC5234] Crocker, D. and Overell, P., Augmented BNF for Syntax Specifications: ABNF, RFC 5234, Jan. 2008
    [RFC5321] Klensin, J., Simple Mail Transfer Protocol, RFC 5321, Oct. 2008
    [RFC5322] Resnick, P., Internet Message Format, RFC 5322, Oct. 2008
    [RFC5340] Coltun, R. and Ferguson, D. and Moy, J. and Lindem, A., OSPF for IPv6, RFC 5340, July 2008
    [RFC5598] Crocker, D., Internet Mail Architecture, RFC 5598, July 2009
    [RFC5646] Phillips, A. and Davis, M., Tags for Identifying Languages, RFC 5646, Sept. 2009
    [RFC5681] Allman, M. and Paxson, V. and Blanton, E., TCP congestion control, RFC 5681, Sept. 2009
    [RFC5735] Cotton, M. and Vegoda, L., Special Use IPv4 Addresses, RFC 5735, January 2010
    [RFC5795] Sandlund, K. and Pelletier, G. and Jonsson, L-E., The RObust Header Compression (ROHC) Framework, RFC 5795, March 2010
    [RFC6068] Duerst, M., Masinter, L. and Zawinski, J., The 'mailto' URI Scheme, RFC 6068, October 2010
    [RFC6144] Baker, F. and Li, X. and Bao, X. and Yin, K., Framework for IPv4/IPv6 Translation, RFC 6144, April 2011
    [RFC6265] Barth, A., HTTP State Management Mechanism, RFC 6265, April 2011
    [RFC6274] Gont, F., Security Assessment of the Internet Protocol Version 4, RFC 6274, July 2011
    [RG2010] Rhodes, B. and Goerzen, J., Foundations of Python Network Programming: The Comprehensive Guide to Building Network Applications with Python, Second Edition, Academic Press, 2004
    [RJ1995] Ramakrishnan, K. K. and Jain, R., A binary feedback scheme for congestion avoidance in computer networks with a connectionless network layer. SIGCOMM Comput. Commun. Rev. 25, 1 (Jan. 1995), 138-156.
    [RY1994] Ramakrishnan, K. K. and Henry Yang, The Ethernet Capture Effect: Analysis and Solution, Proceedings of IEEE 19th Conference on Local Computer Networks, MN, Oct. 1994.
    [Roberts1975] Roberts, L., ALOHA packet system with and without slots and capture. SIGCOMM Comput. Commun. Rev. 5, 2 (Apr. 1975), 28-42.
    [Ross1989] Ross, F., An overview of FDDI: The fiber distributed data interface, IEEE J. Selected Areas in Comm., vol. 7, no. 7, pp. 1043-1051, Sept. 1989
    [Russel06] Russell, A., Rough Consensus and Running Code and the Internet-OSI Standards War, IEEE Annals of the History of Computing, July-September 2006
    [SAO1990] Sidhu, G., Andrews, R., Oppenheimer, A., Inside AppleTalk, Addison-Wesley, 1990
    [SARK2002] Subramanian, L., Agarwal, S., Rexford, J., Katz, R., Characterizing the Internet hierarchy from multiple vantage points. In IEEE INFOCOM, 2002
    [Sechrest] Sechrest, S., An Introductory 4.4BSD Interprocess Communication Tutorial, 4.4BSD Programmer's Supplementary Documentation
    [SG1990] Scheifler, R., Gettys, J., X Window System: The Complete Reference to Xlib, X Protocol, ICCCM, XLFD, X Version 11, Release 4, Digital Press

    [SGP98] Stone, J., Greenwald, M., Partridge, C., and Hughes, J., Performance of checksums and CRC's over real data. IEEE/ACM Trans. Netw. 6, 5 (Oct. 1998), 529-543.
    [SH1980] Shoch, J. F. and Hupp, J. A., Measured performance of an Ethernet local network. Commun. ACM 23, 12 (Dec. 1980), 711-721.
    [SH2004] Senapathi, S., Hernandez, R., Introduction to TCP Offload Engines, March 2004
    [SMKKB2001] Stoica, I., Morris, R., Karger, D., Kaashoek, F., and Balakrishnan, H., Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01). ACM, New York, NY, USA, 149-160
    [SMM1998] Semke, J., Mahdavi, J., and Mathis, M., Automatic TCP buffer tuning. SIGCOMM Comput. Commun. Rev. 28, 4 (Oct. 1998), 315-323.
    [SPMR09] Stigge, M., Plotz, H., Muller, W., Redlich, J., Reversing CRC - Theory and Practice. Berlin: Humboldt University Berlin. pp. 24.
    [STBT2009] Sridharan, M., Tan, K., Bansal, D., Thaler, D., Compound TCP: A New TCP Congestion Control for High-Speed and Long Distance Networks, Internet draft, work in progress, April 2009
    [Seifert2008] Seifert, R., Edwards, J., The All-New Switch Book: The complete guide to LAN switching technology, Wiley, 2008
    [Selinger] Selinger, P., MD5 collision demo, http://www.mscs.dal.ca/~selinger/md5collision/
    [SFR2004] Stevens, R. and Fenner, B. and Rudoff, A., UNIX Network Programming: The sockets networking API, Addison Wesley, 2004
    [Sklower89] Sklower, K. 1989. Improving the efficiency of the OSI checksum calculation. SIGCOMM Comput. Commun. Rev. 19, 5 (Oct. 1989), 32-43.
    [Smm98] Semke, J., Mahdavi, J., and Mathis, M., Automatic TCP buffer tuning. SIGCOMM Comput. Commun. Rev. 28, 4 (Oct. 1998), 315-323.
    [Stevens1994] Stevens, R., TCP/IP Illustrated: the Protocols, Addison-Wesley, 1994
    [Stevens1998] Stevens, R., UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI, Prentice Hall, 1998
    [Stewart1998] Stewart, J., BGP4: Inter-Domain Routing In The Internet, Addison-Wesley, 1998
    [Stoll1988] Stoll, C., Stalking the wily hacker, Commun. ACM 31, 5 (May 1988), 484-497.
    [TE1993] Tsuchiya, P. F. and Eng, T., Extending the IP internet through address reuse. SIGCOMM Comput. Commun. Rev. 23, 1 (Jan. 1993), 16-33.
    [Thomborson1992] Thomborson, C., The V.42bis Standard for Data-Compressing Modems, IEEE Micro, September/October 1992 (vol. 12 no. 5), pp. 41-53
    [Unicode] The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007)
    [VPD2004] Vasseur, J., Pickavet, M., and Demeester, P., Network Recovery: Protection and Restoration of Optical, SONET-SDH, IP, and MPLS. Morgan Kaufmann Publishers Inc., 2004
    [Varghese2005] Varghese, G., Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices, Morgan Kaufmann, 2005
    [Vyncke2007] Vyncke, E., Paggen, C., LAN Switch Security: What Hackers Know About Your Switches, Cisco Press, 2007
    [WB2008] Wasserman, M., Baker, F., IPv6-to-IPv6 Network Address Translation (NAT66), Internet draft, November 2008, http://tools.ietf.org/html/draft-mrw-behave-nat66-02
    [WMH2008] Wilson, P., Michaelson, G., Huston, G., Redesignation of 240/4 from Future Use to Private Use, Internet draft, September 2008, work in progress, http://tools.ietf.org/html/draft-wilson-class-e-02
    [WMS2004] White, R., McPherson, D., Srihari, S., Practical BGP, Addison-Wesley, 2004

    [Watson1981] Watson, R., Timer-Based Mechanisms in Reliable Transport Protocol Connection Management. Computer Networks 5: 47-56 (1981)
    [Williams1993] Williams, R., A painless guide to CRC error detection algorithms, August 1993, unpublished manuscript, http://www.ross.net/crc/download/crc_v3.txt
    [Winston2003] Winston, G., NetBIOS Specification, 2003
    [WY2011] Wing, D. and Yourtchenko, A., Happy Eyeballs: Success with Dual-Stack Hosts, Internet draft, work in progress, July 2011, http://tools.ietf.org/html/draft-ietf-v6ops-happy-eyeballs-03
    [X200] ITU-T, recommendation X.200, Open Systems Interconnection - Model and Notation, 1994
    [X224] ITU-T, recommendation X.224, Information technology - Open Systems Interconnection - Protocol for providing the connection-mode transport service, 1995
    [XNS] Xerox, Xerox Network Systems Architecture, XNSG058504, 1985
    [Zimmermann80] Zimmermann, H., OSI Reference Model - The ISO Model of Architecture for Open Systems Interconnection, IEEE Transactions on Communications, vol. 28, no. 4, April 1980, pp. 425-432.






